I concur with Digby - for this kind of thing I'd use an RLE method (especially for "sparse" tiles).
However, historically (when I wrote tile routines for games using 8-bit graphics many, many years ago), I used to do it something like this (I think this is the EOR method you mention, although the operations involved are really an AND followed by an ORR):
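1. Make a mask the same size as your bitmap: set pixels become 0s and transparent pixels become 1s (the transparent pixels in the tile itself need to be 0, so the ORR below leaves the background alone).
2. Load the destination.
3. Load the mask.
4. AND the destination with the mask.
5. Load the tile.
6. ORR the result with the tile.
7. Then plot (store the result back to the destination).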
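In C, the steps above might look something like the sketch below. The function name, the argument layout and the assumption of four 8-bit pixels packed into each 32-bit word are all mine, purely for illustration. On the ARM you'd do the same thing on several registers at once between an LDMIA and an STMIA.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name and layout are assumptions, not from the post).
 * Each 32-bit word holds four 8-bit pixels. Mask bytes are 0x00 where the
 * tile has a pixel and 0xFF where it is transparent; transparent tile
 * pixels must themselves be 0 so the OR leaves the destination alone. */
static void masked_plot(uint32_t *dst, const uint32_t *tile,
                        const uint32_t *mask, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        uint32_t d = dst[i];   /* load the destination */
        d &= mask[i];          /* AND: punch a hole where the tile is solid */
        d |= tile[i];          /* ORR: drop the tile pixels into the hole */
        dst[i] = d;            /* plot */
    }
}
```

For example, a destination word of 0x11223344 with mask 0x00FFFF00 and tile 0xAA0000BB comes out as 0xAA2233BB: the two outer pixels are replaced and the two middle ones show through.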
When it was all 8-bit pixels (and masks) it wasn't a terrible way of doing things on the ARM because you could load a chunk into registers, munge them, and slam them straight out with an STMIA (doing it pixel by pixel would be ghastly).
However, with 16-bit pixels that's just too much data being thrown around for my liking (though you could use a 32-bit bit-mask per 32-pixel tile row and expand it into the mask on the fly). I'd investigate that approach if you have the time.
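A rough sketch of the bit-mask idea in C, to show what "expand on the fly" could mean. The function name and the bit order (bit 0 = leftmost pixel, bit set = transparent, to match the mask convention above) are my assumptions, and a real ARM version would unroll this rather than loop pixel by pixel.

```c
#include <stdint.h>

/* Hypothetical helper: plots one 32-pixel row of 16-bit pixels using a
 * single 32-bit row mask. Assumed convention: bit x set means pixel x is
 * transparent, and transparent tile pixels are stored as 0. */
static void masked_plot_row16(uint16_t *dst, const uint16_t *tile,
                              uint32_t rowmask)
{
    for (int x = 0; x < 32; x++) {
        /* Expand one mask bit to a full pixel mask:
         * 0xFFFF if bit x is set (transparent), 0x0000 otherwise. */
        uint16_t m = (uint16_t)(0u - ((rowmask >> x) & 1u));
        dst[x] = (uint16_t)((dst[x] & m) | tile[x]);
    }
}
```

The point is that the mask costs one word per row instead of one halfword per pixel, so only the tile and destination data have to be moved through memory.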
I'd still go for RLE unless most of the things you're plotting only have a few "holes" (I would think that a bitmask set would be faster to load and process than a heavily "fragmented" RLE set, IYSWIM).
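For comparison, here's a minimal RLE row plotter in C. The encoding is entirely my own assumption for the sake of illustration (it isn't necessarily what Digby has in mind): each row is a series of (skip, run) counts followed by "run" literal pixels, ended by a 0,0 pair.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical RLE format (an assumption, not from the thread):
 *   skip, run, <run pixels>, skip, run, <run pixels>, ..., 0, 0
 * "skip" transparent pixels are stepped over, then "run" pixels are copied.
 * Returns a pointer just past the row's 0,0 terminator. */
static const uint16_t *rle_plot_row(uint16_t *dst, const uint16_t *src)
{
    for (;;) {
        uint16_t skip = *src++;
        uint16_t run  = *src++;
        if (skip == 0 && run == 0)
            return src;                        /* end of row */
        dst += skip;                           /* leave background untouched */
        memcpy(dst, src, run * sizeof *dst);   /* copy the solid run */
        dst += run;
        src += run;
    }
}
```

Every hole in the tile costs another skip/run pair and another short copy, which is why a heavily fragmented tile may end up slower this way than with a bitmask.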
Cheers,
Ref.