Yeah, the SH3 is very slow, with memory bandwidth being the killer.
I think that on most SH3 devices the frame buffer is a DRAM buffer that gets copied to the display when you call GXEndDraw(). That means you can save the cost of a blit by not double buffering: the frame is only displayed at the end of the scene (wherever you call GXEndDraw()), so you can draw straight into that buffer.
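To make the saving concrete, here is a small portable C sketch. It is not real GAPI code: begin_draw() and end_draw() are hypothetical stand-ins for GXBeginDraw()/GXEndDraw(), modelled on my guess above about how SH3 devices behave, and the 240x320 16-bit screen is an assumption. All it does is count how many bytes get moved per frame with and without your own back buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed screen: 240x320 at 16 bits per pixel (not queried from a device). */
#define WIDTH  240
#define HEIGHT 320
#define FRAME_BYTES (WIDTH * HEIGHT * sizeof(uint16_t))

static uint16_t dram_buffer[WIDTH * HEIGHT]; /* what begin_draw() hands out */
static uint16_t display[WIDTH * HEIGHT];     /* stands in for display memory */
static uint16_t back_buffer[WIDTH * HEIGHT]; /* the app's own back buffer */
static size_t bytes_copied;

/* Hypothetical stand-in for GXBeginDraw(): returns the DRAM buffer. */
static uint16_t *begin_draw(void) { return dram_buffer; }

/* Hypothetical stand-in for GXEndDraw(): copies the DRAM buffer out. */
static void end_draw(void)
{
    memcpy(display, dram_buffer, FRAME_BYTES);
    bytes_copied += FRAME_BYTES;
}

/* Trivial "scene": fill the target with one colour. */
static void render_scene(uint16_t *dst, uint16_t colour)
{
    for (size_t i = 0; i < (size_t)WIDTH * HEIGHT; i++)
        dst[i] = colour;
}

/* Render to our own back buffer, blit it, then end the frame: two copies. */
static size_t frame_double_buffered(void)
{
    bytes_copied = 0;
    render_scene(back_buffer, 0x1234);
    uint16_t *fb = begin_draw();
    memcpy(fb, back_buffer, FRAME_BYTES);
    bytes_copied += FRAME_BYTES;
    end_draw();
    return bytes_copied;
}

/* Render straight into the buffer from begin_draw(): one copy. */
static size_t frame_direct(void)
{
    bytes_copied = 0;
    uint16_t *fb = begin_draw();
    render_scene(fb, 0x1234);
    end_draw();
    return bytes_copied;
}
```

Under those assumptions frame_double_buffered() moves 307200 bytes per frame and frame_direct() moves 153600, so if the device really does work this way, dropping your own back buffer halves the copy traffic.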
Try blitting just a few lines of the frame buffer and see how much of a speed increase you get.
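Something like this rough C sketch is what I mean by that experiment. The 240x320 16-bit size, the helper names blit_lines() and time_blit(), and the use of clock() are all my assumptions; on the device you would copy into the real frame buffer and use whatever timer you already have. Numbers from a desktop build mean nothing, only the on-device comparison matters.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define SCREEN_W 240 /* assumed width in pixels */
#define SCREEN_H 320 /* assumed height in pixels */

static uint16_t src_frame[SCREEN_W * SCREEN_H];
static uint16_t dst_frame[SCREEN_W * SCREEN_H];

/* Copy only the first `lines` scanlines; returns the number of bytes copied. */
static size_t blit_lines(uint16_t *dst, const uint16_t *src, int lines)
{
    if (lines < 0) lines = 0;
    if (lines > SCREEN_H) lines = SCREEN_H;
    size_t bytes = (size_t)lines * SCREEN_W * sizeof(uint16_t);
    memcpy(dst, src, bytes);
    return bytes;
}

/* Average CPU seconds per blit over `repeats` runs, measured with clock(). */
static double time_blit(int lines, int repeats)
{
    if (repeats < 1) repeats = 1;
    clock_t start = clock();
    for (int i = 0; i < repeats; i++) {
        src_frame[0] = (uint16_t)i; /* touch the source so each pass copies fresh data */
        blit_lines(dst_frame, src_frame, lines);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC / repeats;
}
```

Compare time_blit(SCREEN_H, 1000) against time_blit(10, 1000). If the frame rate jumps when you only copy 10 lines, the blit is where your time is going.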
Hope that makes some sense!