There are some benchmarks listed here: .
From the numbers, it looks like the device has a graphics chip that accelerates a number of the GDI operations. StretchBlt isn't being accelerated, though, and is actually much slower than on an iPaq, so I would guess that reading and writing the SRAM on the graphics chip from the CPU is slow.
You could verify this quite easily by timing memcpy/memset into the buffer returned by GXBeginDraw and comparing against the same operations on an ordinary buffer in system RAM.
Let us know what you find out.