That's the problem: not every device's display is natively portrait. The iPaq 3630's display is actually a landscape panel, so as you increment the pointer through its display buffer you are stepping up the screen (when viewed in the natural portrait orientation) rather than across it. A portrait-ordered back buffer therefore causes a lot of cache misses during the blit on that device, since consecutive source pixels end up far apart in display memory. The issue is more pronounced on some devices than others.
Game libraries like GapiDraw, to my knowledge, lay out all of their internal buffers to match the device's native display layout, so the GAPI blit can be a simple memcpy.
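To make that concrete, here is a rough sketch in plain C of the two cases. It does not call GAPI itself: the Layout struct and blit_portrait function are names I made up for illustration, and the x/y pitches are only meant to be similar in spirit to the pitch values GAPI reports for a display. It assumes 16-bit pixels and a tightly packed, row-major portrait source buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a 16-bit display surface (not a GAPI type).
   Pitches are byte steps and may be negative on a rotated panel. */
typedef struct {
    int width;          /* pixels across, as seen in portrait */
    int height;         /* pixels down, as seen in portrait */
    ptrdiff_t x_pitch;  /* bytes from pixel (x, y) to pixel (x + 1, y) */
    ptrdiff_t y_pitch;  /* bytes from pixel (x, y) to pixel (x, y + 1) */
} Layout;

/* Copy a packed, row-major portrait buffer to a display.
   dst_origin must point at the byte where portrait pixel (0, 0) lives. */
void blit_portrait(const uint16_t *src, uint8_t *dst_origin, const Layout *d)
{
    const ptrdiff_t bpp = (ptrdiff_t)sizeof(uint16_t);
    assert(src != NULL && dst_origin != NULL && d != NULL);

    /* Fast path: the display has the same layout as the source buffer,
       so the whole frame is one contiguous copy. */
    if (d->x_pitch == bpp && d->y_pitch == (ptrdiff_t)d->width * bpp) {
        memcpy(dst_origin, src, (size_t)d->width * (size_t)d->height * sizeof(uint16_t));
        return;
    }

    /* Slow path: every pixel is placed individually. On a rotated panel
       x_pitch is a whole native scanline, so walking along a source row
       jumps through display memory in large strides. */
    for (int y = 0; y < d->height; ++y) {
        for (int x = 0; x < d->width; ++x) {
            uint8_t *p = dst_origin + (ptrdiff_t)y * d->y_pitch
                                    + (ptrdiff_t)x * d->x_pitch;
            memcpy(p, &src[(size_t)y * (size_t)d->width + (size_t)x], sizeof(uint16_t));
        }
    }
}
```

If the back buffer is instead kept in the display's own order from the start, only the fast path is ever taken, which is the point of matching the internal buffers to the device.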
Dan East