Brendan,
The idea with "compiled sprites" is not that you write the code for them by hand. You write a simple generator, feed it your tiles, and it emits the code to plot each individual tile.
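Here is a minimal sketch of such a generator. It assumes 16-bit pixels, 32x32 tiles and a hypothetical colour key (0x0000) that marks transparent pixels. For readability it emits C rather than ARM assembly, and `emit_compiled_tile` is an illustrative name, not an existing tool. Each opaque pixel becomes one constant store at a constant offset. Transparent pixels produce no code at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TILE_W 32
#define TILE_H 32
#define TRANSPARENT 0x0000u /* assumed colour key meaning "don't draw" */

/* Writes C source for a function `name` that plots the tile using one
   constant store per opaque pixel. `pitch` (screen width in pixels) is
   baked into the offsets, so the emitted code has no loops or tests.
   Returns the number of stores emitted. */
static int emit_compiled_tile(FILE *out, const char *name,
                              const uint16_t *tile, int pitch)
{
    int stores = 0;
    assert(out != NULL && name != NULL && tile != NULL);
    fprintf(out, "void %s(uint16_t *dst)\n{\n", name);
    for (int y = 0; y < TILE_H; y++) {
        for (int x = 0; x < TILE_W; x++) {
            uint16_t p = tile[y * TILE_W + x];
            if (p == TRANSPARENT)
                continue; /* transparent pixels generate no code */
            fprintf(out, "    dst[%d] = 0x%04X;\n",
                    y * pitch + x, (unsigned)p);
            stores++;
        }
    }
    fputs("}\n", out);
    return stores;
}
```

For example, an opaque pixel with value 0x001F at (7, 5) on a 320-pixel-wide screen becomes `dst[1607] = 0x001F;` in the generated function.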
However, from my point of view, they have more disadvantages than advantages. The generated code will be large (between 1/4 and 3 instructions per *pixel*), which is not fun. Say an average pixel takes 2 instructions (a MOV and a STRH), and each 32-bit ARM instruction is 4 bytes:
32 * 32 = 1024 pixels
1024 pixels * 2 instructions * 4 bytes = 8192 bytes
= 8KB per tile,
which is going to *shaft* the cache.
The aforementioned RLE scheme is probably the one I'd go for, certainly for partially transparent tiles.
Another scheme that would work is to store a mask bit for each pixel in the tile and plot it by combining the tile with the screen data using a bitwise AND and OR. The only disadvantage is that you have to read back what's already on the screen when you plot (but if you're writing in scanlines, it should be in the cache). Obviously it's not as space-efficient as the RLE method.
I always strive for a zero-overdraw method, but for some tile-based engines it simply isn't efficient: blasting entire tiles to the screen and overdrawing is, I reckon, faster than scan-lining the tiles, and it's certainly less complicated.
Sorry, I'm waffling again,
Refractor.