
the purpose of always reading and writing was to try to use only 32-bit memory operations. of course one could optimize for the case where nothing has to be written at all.
maybe it even makes sense to optimize for cases where only 16 bits need to be updated.
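
a minimal sketch of the "skip the write when nothing changed" idea, not the actual routine. it assumes a 16-bit z-buffer with two depth values packed per 32-bit word and that a smaller z means closer. the names `ztest_pair` and `zbuffer_update_pair` are made up for illustration, and which half of the word holds the left pixel depends on endianness.

```c
#include <stdint.h>

/* Hypothetical helper: depth-test two 16-bit z values packed in one 32-bit
   word against the two values already stored. Assumes smaller z = closer.
   Bit 0 of *pass_mask is set if the low half passed, bit 1 if the high half
   passed. Returns the merged word that should end up in the z-buffer. */
static uint32_t ztest_pair(uint32_t old_word, uint32_t new_word,
                           unsigned *pass_mask)
{
    uint32_t old_lo = old_word & 0xFFFFu, old_hi = old_word >> 16;
    uint32_t new_lo = new_word & 0xFFFFu, new_hi = new_word >> 16;
    unsigned mask = 0;

    if (new_lo < old_lo) { old_lo = new_lo; mask |= 1u; }
    if (new_hi < old_hi) { old_hi = new_hi; mask |= 2u; }

    *pass_mask = mask;
    return (old_hi << 16) | old_lo;
}

/* One 32-bit load, and one 32-bit store only if at least one pixel passed.
   Returns the pass mask so the caller knows which color pixels to write. */
static unsigned zbuffer_update_pair(uint32_t *zword, uint32_t new_word)
{
    unsigned mask;
    uint32_t merged = ztest_pair(*zword, new_word, &mask);

    if (mask != 0)      /* nothing passed: skip the store entirely */
        *zword = merged;
    return mask;
}
```

this keeps every memory access 32 bits wide. the 16-bit case (only one half passed) still does a full 32-bit store here; whether a separate halfword store would be faster is something that would have to be measured.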
i tested it with a setup where every pixel is drawn, so skipping the work for invisible pixels would not have increased performance in this test. i just wanted to see how this version compares to one that does no z-testing/writing: it is ~4x slower.
maybe i should just use the intel gpp and not bother with optimizing ppc code...
DANIEL