by Kzinti » Jan 13, 2002 @ 5:48am
Hello everyone, I just found this wonderful site and I am impressed to see that there are many knowledgeable programmers around here. You might know me as the author of the "GAPI emulator".
These days I'm working on my PocketPC, as I figured out that while I may never write a good game for the PC (lack of time / resources), it might be possible to do so on my PocketPC. The discussions you have here about "rotating" blits are really interesting. I've come to similar conclusions about the cache, but I have a few other points of information I wish to share with you guys.
As Digby already said, the video RAM is not cached by the memory controller or CPU, and the reason for this is simple: the video RAM is not normal RAM, it is internal to the display chip. This memory is mapped in the address space of applications so that you can access it (memory mapped I/O).
I am not sure that writing to uncached memory is slower than writing to cached memory. The thing is that when you write to cached memory, you evict cache lines that are in use for reading, thereby killing the read performance, not the write performance. Nevertheless, the result is the same: you get faster blits writing directly to the display than to a memory buffer.
Another thing that I haven't seen addressed yet is the use of DMA transfers by certain devices. You all know that the iPAQ devices allow direct access to the display memory (no caching), so in this case the fastest possible way to "flip" is to blit directly to the display. On some other devices, like my Casio E-125, the GAPI buffer is transferred to the display using some kind of DMA. What this means is that the actual blitting is done in the background while the processor is free to do other things. That's also the reason the Casio E-125 gets decent performance with a slower processor and with an intermediate buffer. (Note that most games use an intermediate buffer on the iPAQ anyway, while this is not needed on the E-125.)
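To make the "intermediate buffer vs. direct blit" distinction concrete, here is a rough sketch of copying a packed 16-bit back buffer to a destination surface described by byte pitches. The function name blit16 and its parameters are made up for illustration; x_pitch and y_pitch are meant to play the role of the cbxPitch / cbyPitch values that GAPI reports, and the destination pointer would be whatever GXBeginDraw() hands back. It assumes 16 bits per pixel and is not tuned for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a width x height block of 16-bit pixels from a
   tightly packed source buffer to a surface addressed with byte pitches.
   x_pitch = bytes to step one pixel to the right,
   y_pitch = bytes to step one pixel down (either may be negative on
   displays whose memory layout is rotated). */
static void blit16(void *dst_base, ptrdiff_t x_pitch, ptrdiff_t y_pitch,
                   const uint16_t *src, int width, int height)
{
    assert(dst_base != NULL && src != NULL);
    uint8_t *row = (uint8_t *)dst_base;
    for (int y = 0; y < height; ++y) {
        uint8_t *p = row;
        for (int x = 0; x < width; ++x) {
            /* memcpy avoids alignment assumptions about the destination */
            memcpy(p, &src[y * width + x], sizeof(uint16_t));
            p += x_pitch;
        }
        row += y_pitch;
    }
}
```

With x_pitch = 2 and y_pitch = 2 * width this is a plain row-by-row copy; swapping the roles of the two pitches gives the "rotated" layout discussed earlier in the thread.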
Direct access is also possible on the Casio E-125, but from the tests I have conducted, the frame rate drops. There are at least 2 reasons for this: 1) the DMA transfer is not used, 2) the MIPS is a 32/64-bit processor internally, but the external bus is only 16 bits wide.
There is also another benefit to DMA transfers: they are most likely synchronized with the "refresh" of the display so as to reduce tearing effects.
I've conducted a test using a "graphic-bandwidth" limited program:
Direct access --> 24 fps
DMA --> 28 fps
Understand that the test is really poorly written and doesn't do anything but GXBeginDraw()-Blits-GXEndDraw(). In an actual game, you would do many things after the GXEndDraw() and before the next GXBeginDraw(). Even so, you can see a considerable gain in performance (28 vs. 24 fps, about 17% faster).
Another important pointer: if you want to get the most out of the DMA transfer, you have to minimize the time spent between GXBeginDraw() and GXEndDraw(). As soon as GXEndDraw() is called, the DMA transfer starts, so this is where your game logic has to take place.
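The ordering described above could be sketched like this. To keep it compilable anywhere, the GAPI calls are replaced by hypothetical stand-ins (stub_begin_draw, stub_end_draw) that only record a trace; on a device you would call the real GXBeginDraw() / GXEndDraw() instead. blit_frame and update_game_logic are placeholder names as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for GXBeginDraw()/GXEndDraw(); they only log
   the call order into g_trace so the sequence can be inspected. */
static char g_trace[64];
static unsigned short g_fake_display[4];

static void trace(char c)
{
    size_t n = strlen(g_trace);
    if (n + 1 < sizeof g_trace) {
        g_trace[n] = c;
        g_trace[n + 1] = '\0';
    }
}

static void *stub_begin_draw(void) { trace('B'); return g_fake_display; }
static int   stub_end_draw(void)   { trace('E'); return 1; }

/* Placeholder for the actual blits; kept as short as possible. */
static void blit_frame(void *dst)
{
    trace('b');
    memset(dst, 0, sizeof g_fake_display);
}

/* Placeholder for AI, physics, input, sound and so on. */
static void update_game_logic(void) { trace('L'); }

static void run_frame(void)
{
    void *dst = stub_begin_draw();
    if (dst != NULL) {
        blit_frame(dst);   /* only drawing between Begin and End */
        stub_end_draw();   /* on DMA devices the transfer would start here */
    }
    update_game_logic();   /* overlaps with the transfer, if there is one */
}
```

The point is only the order: everything that is not a blit moves after the end-draw call, so the CPU has work to do while the transfer runs.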
Another thing you can see is that updating only part of the screen (1/2 or 1/4) doesn't affect performance much. I think the reason is that, as with most DMA controllers, the setup takes more time than the actual transfer.
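If that guess is right, a simple linear cost model shows why partial updates barely help. This is only a sketch of the reasoning: dma_time_us is a made-up function, and any setup and per-byte figures you feed it are invented inputs, not measurements from the E-125.

```c
#include <assert.h>

/* Hypothetical model: a DMA transfer costs a fixed setup time plus a
   per-byte time. All parameters are assumptions supplied by the caller. */
static double dma_time_us(double setup_us, double us_per_byte, long bytes)
{
    assert(setup_us >= 0.0 && us_per_byte >= 0.0 && bytes >= 0);
    return setup_us + us_per_byte * (double)bytes;
}
```

For example, with an invented setup cost of 30000 us and 0.02 us per byte, a full 240x320 16-bit frame (153600 bytes) and a half frame (76800 bytes) come out within about 5% of each other, because the fixed term dominates.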