by refractor » Jun 19, 2002 @ 4:47pm
Ummm... I really do doubt it on a PPC processor. On a "real" PC with a much larger cache, it might work out all right.
You could do the masks and shifts in very few cycles (roughly 4 or 5 per pair of pixels).
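For illustration, here is a rough C sketch of the mask-and-shift idea, handling two pixels per 32-bit word. It assumes the usual layouts (555 as 0RRRRRGGGGGBBBBB, 565 as RRRRRGGGGGGBBBBB) and leaves the new low green bit at zero; the function names are made up for the example, and the cycle count depends on what the compiler emits.

```c
#include <stdint.h>
#include <stddef.h>

/* Convert two packed 555 pixels (one 32-bit word) to two 565 pixels.
   Red and green move up one bit, blue stays where it is.
   The mask excludes bit 15 of each pixel, so the shift cannot
   carry from the low pixel into the high one. */
static inline uint32_t rgb555_pair_to_565(uint32_t w)
{
    return ((w & 0x7FE07FE0u) << 1) | (w & 0x001F001Fu);
}

/* Convert a buffer of pixel pairs; no table, so the only memory
   traffic is the sequential read of src and write of dst. */
static void convert_555_to_565(const uint32_t *src, uint32_t *dst, size_t pairs)
{
    for (size_t i = 0; i < pairs; i++)
        dst[i] = rgb555_pair_to_565(src[i]);
}
```

That is two ANDs, a shift and an OR per pair, and on ARM the shift can usually be folded into one of the other instructions.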
As I said earlier, the StrongARM will fill a cache-line for every load. So, in this case, you'll be doing:
LDR value,[base,pixel]
The moment that hits the processor, a cache-line is requested. Until that cache line is fully loaded into the cache, any subsequent loads will *stall*.
So:
LDR value,[base,pixel1]
(Big stall)
LDR value,[base,pixel2]
In this case, the (Big stall) is going to be larger than the 4 or 5 cycles for the shifting method.
On an area of memory that large, you've got to assume that a large majority of the lookups will miss the cache.
Of course, if you can fill the (Big stall) with useful data-processing operations that don't touch memory or the registers being loaded, then you're OK... but for something that converts a screenful of pixels from 555 to 565, the shifting method will walk all over a LUT (on a PPC).
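For comparison, this is roughly what the LUT version looks like in C, assuming the table is a plain array indexed by the 15-bit pixel value (the names here are made up for the example). That makes it 32768 entries of 2 bytes, i.e. 64 KB, and every pixel costs one extra load at an address that depends on the pixel data.

```c
#include <stdint.h>
#include <stddef.h>

enum { LUT_ENTRIES = 1 << 15 };               /* one entry per 555 value */
static uint16_t lut_555_to_565[LUT_ENTRIES];  /* 32768 * 2 bytes = 64 KB */

/* Fill the table once, using the same mask-and-shift conversion. */
static void build_lut(void)
{
    for (uint32_t p = 0; p < LUT_ENTRIES; p++)
        lut_555_to_565[p] = (uint16_t)(((p & 0x7FE0u) << 1) | (p & 0x001Fu));
}

/* One table load per pixel; the address is data-dependent, so on
   real image data the loads jump around the whole 64 KB table. */
static void convert_with_lut(const uint16_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = lut_555_to_565[src[i] & 0x7FFFu];
}
```

The table lookup replaces a handful of data-processing instructions with a load, which only pays off if that load usually hits the cache.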
Cheers,
Refractor