Rotated Blit Optimization Tips?

Posted: Jun 5, 2007 @ 3:51pm
by Ludimate

Posted: Jun 6, 2007 @ 2:16am
by Dan East
Is your app running at 240x320 or 480x640? You will take a major performance hit at the lower resolution, because every frame then has to be scaled up in real time to the device's native VGA resolution for backwards compatibility.

Dan East

Posted: Jun 6, 2007 @ 10:54am
by Ludimate
The game uses DirectDraw with the HI_RES_AWARE flag set, so it is running at 640x480.

After further research, it seems that accesses to video memory are indeed being cached in some way! A test that only sets every pixel in video memory to a fixed value (so that no read cache is involved) runs twice as fast when the writes are sequential, as in a memset.
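
A write-only test like the one described above could be sketched roughly as follows. This is only an illustration, assuming 16-bit pixels and a pitch given in pixels; the function names `fill_sequential` and `fill_strided` are made up here and are not from the game's code.

```c
#include <stdint.h>
#include <stddef.h>

/* Row-major fill: consecutive addresses, the same pattern as memset. */
static void fill_sequential(uint16_t *buf, size_t pitch,
                            size_t w, size_t h, uint16_t value)
{
    for (size_t y = 0; y < h; ++y)
        for (size_t x = 0; x < w; ++x)
            buf[y * pitch + x] = value;
}

/* Column-major fill: every write lands one full row further on,
 * which is the destination pattern of a naive rotated blit. */
static void fill_strided(uint16_t *buf, size_t pitch,
                         size_t w, size_t h, uint16_t value)
{
    for (size_t x = 0; x < w; ++x)
        for (size_t y = 0; y < h; ++y)
            buf[y * pitch + x] = value;
}
```

Both functions write exactly the same pixels, so timing each one over many iterations (for example with GetTickCount) isolates the cost of the write order alone.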
So the solution seems to be tile blitting: rotate-copy blocks of 16x16 pixels one at a time, so that read and write cache misses are minimized. And it works very well: the frame rate jumped from 11.7 to about 20 FPS, which is much more acceptable.
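
A minimal sketch of such a tiled rotate-copy, in C. It assumes 16-bit pixels, a fixed 90-degree clockwise rotation, and pitches given in pixels; the name `rotate90_tiled` and the `TILE` constant are invented for illustration, and this is not the game's actual code.

```c
#include <stdint.h>
#include <stddef.h>

#define TILE 16  /* tile edge in pixels, as in the post; worth tuning per device */

/* Rotate a w x h image 90 degrees clockwise into dst, which must be
 * h pixels wide and w pixels tall. Source pixel (x, y) lands at
 * destination column (h - 1 - y), row x. Pitches are in pixels. */
static void rotate90_tiled(const uint16_t *src, size_t src_pitch,
                           uint16_t *dst, size_t dst_pitch,
                           size_t w, size_t h)
{
    for (size_t ty = 0; ty < h; ty += TILE) {
        size_t y_end = (ty + TILE < h) ? ty + TILE : h;
        for (size_t tx = 0; tx < w; tx += TILE) {
            size_t x_end = (tx + TILE < w) ? tx + TILE : w;
            /* Within one tile both the reads and the writes stay
             * inside a small window of memory. */
            for (size_t y = ty; y < y_end; ++y) {
                const uint16_t *s = src + y * src_pitch + tx;
                for (size_t x = tx; x < x_end; ++x)
                    dst[x * dst_pitch + (h - 1 - y)] = *s++;
            }
        }
    }
}
```

With a locked DirectDraw surface, the pitch is reported in bytes (`lPitch`), so it would need to be divided by 2 before being passed to a function like this one.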

It is weird that video memory is being cached: maybe it's not the real video memory and DirectDraw is doing some magic mapping(?) there...

The only further improvement I can think of is reading 2 pixels (32 bits) at a time; maybe that's a little faster...
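
Reading two pixels per load might look something like the sketch below. It assumes 16-bit pixels and a little-endian CPU, so the first pixel sits in the low half of the 32-bit word; the helper name `row_to_column_2px` is hypothetical. Whether this is actually faster would have to be measured on the device.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Copy n pixels from a source row into a destination column,
 * fetching two source pixels with each 32-bit read.
 * Assumes little-endian byte order. dst_pitch is in pixels. */
static void row_to_column_2px(const uint16_t *src, uint16_t *dst,
                              size_t dst_pitch, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        uint32_t pair;
        /* memcpy avoids alignment and aliasing problems; compilers
         * usually reduce it to a single 32-bit load. */
        memcpy(&pair, src + i, sizeof pair);
        dst[i * dst_pitch]       = (uint16_t)(pair & 0xFFFFu); /* first pixel  */
        dst[(i + 1) * dst_pitch] = (uint16_t)(pair >> 16);     /* second pixel */
    }
    if (i < n)  /* odd count: copy the last pixel on its own */
        dst[i * dst_pitch] = src[i];
}
```

The writes still go to separate destination rows, so only the read side is halved.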

Thanks and Best Regards,

Jorge Diogo
http://Ludimate.com

Posted: Jun 6, 2007 @ 9:51pm
by drgoldie

Posted: Jun 6, 2007 @ 10:36pm
by Ludimate

Posted: Jun 7, 2007 @ 8:47am
by drgoldie

Posted: Jul 6, 2007 @ 10:30am
by pappaxray
Does it help to preload the next macro block? Do you know how much cache these devices tend to have?

Posted: Jul 6, 2007 @ 3:09pm
by Ludimate
I could not find any improvement from preloading pixels or blocks. But doing rotated block blits certainly improves performance *a lot* because it is so cache-friendly.

Best Regards,

Jorge Diogo

Posted: Jul 6, 2007 @ 5:48pm
by pappaxray

Posted: Jul 6, 2007 @ 5:54pm
by drgoldie

Posted: Jul 8, 2007 @ 11:49am
by pappaxray
You'd have to preload the destination block as well though, or else you'd still get a stall. (I meant 32x32-byte blocks rather than pixels, btw :) ).
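
For anyone who wants to try it, preloading both the source and the destination of the next block could be sketched as below. This assumes a GCC or Clang compiler, whose `__builtin_prefetch` hint is used here; on other compilers the macros do nothing. The helper name `prefetch_tile` is invented, and as reported earlier in the thread, a gain is not guaranteed.

```c
#include <stdint.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p)  __builtin_prefetch((p), 0)  /* hint: will be read    */
#define PREFETCH_WRITE(p) __builtin_prefetch((p), 1)  /* hint: will be written */
#else
#define PREFETCH_READ(p)  ((void)(p))  /* no-op on other compilers */
#define PREFETCH_WRITE(p) ((void)(p))
#endif

/* Issue one prefetch hint per row of the next source tile and the
 * next destination tile. Pitches are in pixels. With 16-bit pixels,
 * 16 pixels per row is 32 bytes, so one hint per row is assumed to
 * cover the row when it does not straddle a cache line. */
static void prefetch_tile(const uint16_t *next_src, size_t src_pitch,
                          uint16_t *next_dst, size_t dst_pitch,
                          size_t rows)
{
    for (size_t r = 0; r < rows; ++r) {
        PREFETCH_READ(next_src + r * src_pitch);
        PREFETCH_WRITE(next_dst + r * dst_pitch);
    }
}
```

The idea would be to call this for the following tile just before copying the current one, so the memory traffic overlaps with the copy.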