by Digby » Jun 21, 2002 @ 9:00pm
Everything is fixable, but that really depends on what your expectations are. Do you expect the e740 to be twice as fast as an iPaq because the Toshiba device runs at nearly twice the clock rate? Do you expect all of your existing games written for the StrongARM to run at 2x the speed on an e740? That isn't going to happen.
Let me see if I can simplify my previous analysis for you. From the benchmarks I linked to earlier, here is the performance of the e740 relative to the iPaq. (Values >1.0 mean the feature is faster on the e740; values <1.0 mean it is slower.)
Floating point ops: 1.57
Integer ops: 1.57
BitBlt: 4.47
StretchBlt: 0.33
Filled ellipse: 2.52
Filled rect: 11.00
Filled round rect: 2.08
Memory allocation: 1.28
Memory fill: 1.77
Memory move: 0.44
Text: 0.64
From the list above, the operations that are slower on the e740 are StretchBlt, memory moves, and text drawing. All of the filling operations are faster, in some cases significantly so, probably because they are being accelerated by the ATI graphics controller.
The StretchBlt and text drawing results make me think that they are being performed in software rather than with the ATI. Perhaps the ATI can only do 1:1 blits? In any case, these are the sort of things that *might* be fixable with a new display driver provided that the hardware has support for the feature.
The memory move benchmark is the interesting number: the processor is running at a higher clock rate, yet it moves memory at less than half the speed of the iPaq (0.44). The memory fill result, though, is faster on the e740 (1.77), which is about what I would expect in moving from a 206 MHz chip to a 400 MHz chip. A fill only writes memory, while a move both reads and writes it, so that tells me there is something odd in *reading* memory.
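To put numbers on that comparison, here is a small sketch that divides a measured e740/iPaq ratio by the clock ratio. The clock speeds and the two memory ratios come from this post; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Clock speeds quoted in this post: iPaq (StrongARM) and e740 (XScale). */
#define IPAQ_MHZ 206.0
#define E740_MHZ 400.0

/* Divide a measured e740/iPaq speed ratio by the clock ratio.
   A result of 1.0 would mean the speedup tracks the clock exactly. */
double per_clock_ratio(double measured_ratio)
{
    assert(measured_ratio > 0.0);
    return measured_ratio / (E740_MHZ / IPAQ_MHZ);
}

/* Print the two memory results from the list above, normalized per clock. */
void print_memory_ratios(void)
{
    printf("memory fill: %.2f\n", per_clock_ratio(1.77));
    printf("memory move: %.2f\n", per_clock_ratio(0.44));
}
```

The fill comes out at roughly 0.91 of what the clock alone would predict, while the move comes out at roughly 0.23, which is why the move result stands out.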
This type of problem can affect an entire system, and if it's a hardware limitation there's not much that software can do to work around it other than reading fewer bytes. The important thing to note is that your existing apps that run fine on a StrongARM today would have to be rewritten for an XScale in order to "fix" this. For that matter, so would the operating system.
For applications that use the C runtime routines (memcpy, memmove) to move large chunks of memory around, it's possible that MS could deliver XScale-optimized versions of these, and that might help. The XScale does have a new instruction (PLD) that performs a prefetch to reduce the effect of a cache miss. I have no idea if cache misses are even the problem though, and besides, that would only help with reading memory via the CRT routines. There's so much other memory fetching going on in a system that doesn't use the CRT, and it would still be 1/2 as fast as on a StrongARM.
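For anyone wondering what a prefetching copy might look like, here is a rough sketch. It is not what MS ships or would ship; it assumes a GCC-compatible compiler, where __builtin_prefetch is a hint that the compiler may turn into a prefetch instruction on targets that have one and silently drops otherwise. The function name, the prefetch distance, and the line size are my own placeholder choices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed cache line size in bytes; adjust for the core being tested. */
#define LINE_BYTES 32
/* Hypothetical tuning value: how far ahead of the copy to prefetch. */
#define PREFETCH_AHEAD 64

#if defined(__GNUC__)
/* rw = 0 (read), locality = 0 (data is used once and not kept). */
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

/* Copy n bytes from src to dst (the regions must not overlap), hinting
   the cache about source data shortly before it is needed. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    assert(dst != NULL && src != NULL);
    while (n >= LINE_BYTES) {
        /* Only hint addresses that are still inside the source buffer. */
        if (n > PREFETCH_AHEAD)
            PREFETCH_READ(s + PREFETCH_AHEAD);
        memcpy(d, s, LINE_BYTES);
        d += LINE_BYTES;
        s += LINE_BYTES;
        n -= LINE_BYTES;
    }
    memcpy(d, s, n); /* remaining tail, fewer than LINE_BYTES bytes */
    return dst;
}
```

Whether this helps at all depends on cache misses actually being the bottleneck, which is exactly the part I don't know.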
If it weren't for this apparent memory read anomaly, I'd guess that this new XScale device should run existing StrongARM applications about 1.5 times faster than if they were run on an iPaq. I think that's a reasonable expectation.
If anyone has an e740 it would be pretty easy to positively determine if reading memory is the problem. I'd be willing to send you a test application to run that would measure the memory bandwidth on your device.
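To give an idea of what such a test would do, here is a minimal sketch in standard C, not the actual application. It times a read-only pass, a write-only pass, and a copy separately so that slow reads show up on their own. All of the names are made up, and it assumes clock() is available; on the device itself a system timer such as GetTickCount would probably have to stand in for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Results are stored through a volatile so the compiler keeps the passes. */
static volatile unsigned long g_sink;

typedef void (*pass_fn)(unsigned long *a, unsigned long *b, size_t words);

/* Read-only pass: sum every word in a. */
void pass_read(unsigned long *a, unsigned long *b, size_t words)
{
    unsigned long sum = 0;
    size_t i;
    (void)b;
    for (i = 0; i < words; i++)
        sum += a[i];
    g_sink = sum;
}

/* Write-only pass: fill a with a constant byte pattern. */
void pass_write(unsigned long *a, unsigned long *b, size_t words)
{
    (void)b;
    memset(a, 0x5A, words * sizeof *a);
    g_sink = a[words - 1];
}

/* Read-and-write pass: copy a into b. */
void pass_copy(unsigned long *a, unsigned long *b, size_t words)
{
    memmove(b, a, words * sizeof *a);
    g_sink = b[words - 1];
}

/* Repeat one pass until at least min_seconds have elapsed, then return
   the number of bytes processed per second, in millions. */
double mb_per_sec(pass_fn pass, unsigned long *a, unsigned long *b,
                  size_t words, double min_seconds)
{
    clock_t start = clock();
    double elapsed = 0.0;
    double bytes = 0.0;

    assert(words > 0 && min_seconds > 0.0);
    do {
        pass(a, b, words);
        bytes += (double)words * sizeof(unsigned long);
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    } while (elapsed < min_seconds);
    return bytes / elapsed / 1.0e6;
}
```

You would call mb_per_sec once each with pass_read, pass_write, and pass_copy on two initialized buffers that are well beyond the size of the data cache, then compare the three numbers between devices. If the read figure on the e740 is far below the iPaq's while the write figure is not, that points at memory reads.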
Hope that helps.