"...The cache in the StrongARM CPU is 16k for code and 8k for data. The cache in the XScale CPU is 32k for code and 32k for data. So if the code and data that is being executed fits into the cache, the system is able to operate at the full clock speed of the CPU. If it does not fit then it will not execute at full speed. The actual performance of the system will vary dramatically between these 2 limits of 25 million instructions and 206 million instructions per second depending on whether or not the program and data fit into the cache..."
I'm always skeptical of claims like this one, which suggests that if my main body of code (and ideally data) fits into the cache, my program will run many times faster. Two or three times faster, quite possibly, but 8 times faster (206 versus 25 million instructions per second)? Hmm.
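To put rough numbers on the quoted limits, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simple model that is not from the quote: each instruction runs at either the fully cached rate or the fully uncached rate, and the two are blended by time per instruction. The name `effective_mips` and the `hit_fraction` parameter are made up for illustration; only the 25 and 206 figures come from the quote.

```python
# The two throughput limits from the quote, in million instructions per second.
MIPS_CACHED = 206.0    # code and data fit entirely in the cache
MIPS_UNCACHED = 25.0   # nothing is served from the cache


def effective_mips(hit_fraction: float) -> float:
    """Estimate throughput when hit_fraction of instructions run from cache.

    Assumed model (a simplification, not a measurement): average the *time*
    per instruction of the two limits, then invert to get a rate.
    """
    if not 0.0 <= hit_fraction <= 1.0:
        raise ValueError("hit_fraction must be between 0 and 1")
    time_per_instruction = (hit_fraction / MIPS_CACHED
                            + (1.0 - hit_fraction) / MIPS_UNCACHED)
    return 1.0 / time_per_instruction


if __name__ == "__main__":
    print(f"ratio of the two limits: {MIPS_CACHED / MIPS_UNCACHED:.2f}x")
    for hit in (0.0, 0.5, 0.9, 0.99, 1.0):
        print(f"hit fraction {hit:4.2f}: {effective_mips(hit):6.1f} MIPS")
```

Under this model the ratio between the limits is a little over 8x, but the gain falls off quickly once anything misses: a 50% hit fraction gives only around 45 MIPS, and 90% gives around 120 MIPS. So the full 8x would only show up when comparing a workload that misses almost all the time against one that fits completely.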
Does anybody have real-world experience of getting code and data to fit into the cache like this? If so, does it really produce the dramatic performance gains suggested by the above quote?
(As a side note: yes, I could go ahead and experiment to find out for myself, but I'm ... lazy ... like that...)
