>It may be down to the granularity of your timer. You might need a higher-resolution timer.
I'm timing 100 and 1000 repetitions.
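For reference, a rough sketch of what timing over many repetitions looks like. It assumes the standard C `clock()` is available (substitute whatever timer the platform actually offers), and `seconds_per_call` and `dummy_work` are made-up names for illustration, not the real code being timed. Averaging over many calls is what makes a coarse timer usable.

```c
#include <time.h>

/* Hypothetical stand-in for the routine being measured. */
static volatile unsigned sink;
static void dummy_work(void)
{
    for (unsigned i = 0; i < 1000u; i++)
        sink += i;
}

/* Run fn reps times and return the average CPU time per call in seconds.
   The total is measured once, so the timer's granularity is spread
   across all reps instead of hitting each call separately. */
static double seconds_per_call(void (*fn)(void), long reps)
{
    clock_t start = clock();
    for (long i = 0; i < reps; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / (double)reps;
}
```

If the per-call figure for 100 and 1000 repetitions differs a lot, the total run is probably still too short for the timer, and more repetitions should help.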
>You could try copying longwords instead of shorts. This should halve the amount of time spent in the loop. Good to see your source data is consecutive in memory, saving an add per line.
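A minimal sketch of the longword idea, assuming the data is an array of 16-bit values and that `uint32_t` is the "longword". The function names are hypothetical. It uses a fixed-size `memcpy` for the 32-bit moves to stay clear of alignment and aliasing problems; whether the compiler turns that into a single load and store, and whether the loop time really halves, depends on the compiler and the memory system.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Baseline: copy count 16-bit values one at a time. */
static void copy_shorts(uint16_t *dst, const uint16_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}

/* Copy the same data two 16-bit values per iteration via a 32-bit word,
   so the loop runs roughly half as many times. */
static void copy_longs(uint16_t *dst, const uint16_t *src, size_t count)
{
    size_t i = 0;
    for (; i + 2 <= count; i += 2) {
        uint32_t w;
        memcpy(&w, &src[i], sizeof w);   /* load two shorts at once  */
        memcpy(&dst[i], &w, sizeof w);   /* store two shorts at once */
    }
    if (i < count)                       /* odd count: one short left */
        dst[i] = src[i];
}
```

Both functions produce the same result; only the number of loop iterations differs.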
>As for the cache, that'll probably depend on the CPU. 32 bytes sounds a bit weedy!
What is it for the StrongARM in the iPAQs?