
**BlueChip** **Moderator**

a % b

...is a very expensive operation, as it is essentially a division.

For values of b that are a power of 2 (e.g. 2, 4, 8, 16, 32, etc.), the same effect can be achieved with a very cheap AND (&) operation:

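a % b == (a & (b - 1))

(This holds for non-negative a. Written as C, the extra parentheses are needed because == binds tighter than &.)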
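As a minimal sketch of the identity above: the helper names `mod_pow2` and `is_pow2` are hypothetical and not part of GRRLIB, and the operands are assumed to be unsigned.

```c
#include <stdint.h>

/* Hypothetical helper: non-zero when b is a power of two (1, 2, 4, 8, ...). */
static inline int is_pow2(uint32_t b)
{
    return b != 0u && (b & (b - 1u)) == 0u;
}

/* Hypothetical helper: same result as a % b, but only valid when
 * b is a power of two and a is unsigned. */
static inline uint32_t mod_pow2(uint32_t a, uint32_t b)
{
    return a & (b - 1u);
}
```

For example, `mod_pow2(13, 4)` gives the same result as `13 % 4`. For a b that is not a power of two (say 12) the mask trick does not apply, which is what `is_pow2` is there to check.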

Therefore

offset = (((y >> 2)<<4)*tex->w) + ((x >> 2)<<6) + (((y%4 << 2) + x%4 ) << 1); // Fuckin equation found by NoNameNo ;)

as used by GRRLIB_GetPixelFromtexImg and GRRLIB_SetPixelTotexImg ...can be rewritten as:

offset = (((y >> 2)<<4)*tex->w) + ((x >> 2)<<6) + ((((y&3) << 2) + (x&3) ) << 1); // Fuckin equation found by NoNameNo ;)
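One thing to watch when swapping % for &: in C, & has lower precedence than both << and +, so each masked term needs its own parentheses, otherwise `y&3 << 2` is parsed as `y & (3 << 2)`. Here is a rough sketch comparing the two forms. The function names are made up for the comparison, and `w` stands in for `tex->w`.

```c
#include <stdint.h>

/* Offset computed with %, as in the original equation. */
static uint32_t offset_mod(uint32_t x, uint32_t y, uint32_t w)
{
    return (((y >> 2) << 4) * w) + ((x >> 2) << 6)
         + (((y % 4 << 2) + x % 4) << 1);
}

/* Same offset with % 4 replaced by & 3; note the parentheses
 * around each masked term. */
static uint32_t offset_and(uint32_t x, uint32_t y, uint32_t w)
{
    return (((y >> 2) << 4) * w) + ((x >> 2) << 6)
         + ((((y & 3) << 2) + (x & 3)) << 1);
}

/* Returns 1 if both versions agree for every pixel of a w x h texture. */
static int offsets_agree(uint32_t w, uint32_t h)
{
    uint32_t x, y;
    for (y = 0; y < h; y++)
        for (x = 0; x < w; x++)
            if (offset_mod(x, y, w) != offset_and(x, y, w))
                return 0;
    return 1;
}
```

Calling `offsets_agree(64, 64)` is a quick way to confirm that nothing changed before dropping the new expression into the pixel functions.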

Also

    u8 r, g, b, a;
    a=*(truc+offset);
    r=*(truc+offset+1);
    g=*(truc+offset+32);
    b=*(truc+offset+33);
    return ((r<<24) | (g<<16) | (b<<8) | a);

...is (sort of) readable code, but if the compiler does not realise that the CPU has enough registers to hold all four values, and does not optimise the temporaries away, it will probably store those values in RAM for a while.

This can be replaced by a slightly less friendly, but more efficient:

return (*(truc+offset+1) <<24) | (*(truc+offset+32) <<16) | (*(truc+offset+33) <<8) | *(truc+offset) ;
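A sketch of the single-expression read as a standalone function. The byte order follows the variable-based version (alpha at `offset`, red at `offset+1`, green at `offset+32`, blue at `offset+33`). The function name `get_pixel_rgba` is hypothetical, and the `uint32_t` casts are my addition: they stop a byte of 0x80 or more from being shifted into the sign bit of a promoted `int`.

```c
#include <stdint.h>

/* Hypothetical helper: packs one pixel as 0xRRGGBBAA.
 * truc points at the texture data, offset is the value
 * computed by the offset equation. */
static uint32_t get_pixel_rgba(const uint8_t *truc, uint32_t offset)
{
    const uint8_t *p = truc + offset;

    return ((uint32_t)p[1]  << 24)   /* red   */
         | ((uint32_t)p[32] << 16)   /* green */
         | ((uint32_t)p[33] << 8)    /* blue  */
         |  (uint32_t)p[0];          /* alpha */
}
```

Whether this is actually faster than the version with temporaries depends on the compiler and optimisation level, so it is worth checking the generated code.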

These optimisations should help many functions (including the new composition function) run faster.

BC

I can be found on efnet, freenode, msn, gtalk, aim, ychat & icq ...PM me for details


**Crayon** **Bad Mother Fucker**

Thanks for the information. On a forum I found these assembly listings for each operation. I don't know which architecture they are for, but it's interesting.

For input & 0xFF:

    movzx eax, BYTE PTR [esp + 4]
    ret

For input % 256:

    mov eax, DWORD PTR [esp + 4]
    and eax, -2147483393
    jge .B2.4
    sub eax, 1
    or eax, -256
    inc eax
    .B2.4:
    ret
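A likely reason the % 256 listing is longer, assuming `input` was declared as a signed int: in C a negative value gives a negative remainder, so the compiler has to preserve the sign and cannot emit a plain mask. With an unsigned input the two operations give the same result. A small sketch (function names are hypothetical):

```c
#include <stdint.h>

/* Signed remainder: the result keeps the sign of input,
 * so -1 % 256 is -1, not 255. */
static int32_t mod256_signed(int32_t input)
{
    return input % 256;
}

/* Unsigned remainder: always in 0..255. */
static uint32_t mod256_unsigned(uint32_t input)
{
    return input % 256u;
}

/* Plain mask: always in 0..255. */
static uint32_t and255(uint32_t input)
{
    return input & 0xFFu;
}
```

So `mod256_signed(-1)` returns -1 while `and255` on the same bit pattern returns 255. `mod256_unsigned` and `and255` agree for every input, which is why the & replacement is safe for pixel coordinates that are never negative.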

The changes have been made in revision 103.

Useful links: libOGC Documentation, libOGC Repository, GRRLIB Documentation, GRRLIB Repository, #GRRLIB on IRC
