Alpha Blending

by **RICoder** » Jul 26, 2001 @ 11:26am

Since it came up before... I thought I would ask. What is the best way to do an alpha blend? I have a 16-bit RGB565 image and an 8-bit alpha mask. I look at it in a few parts. 1) I figure that I need to take the alpha as a % so it's something like this - src = 255/Alpha; dest = 1 - src; 2) It's multiplied per color, so I need to take the src pixel and do the mult per color, and then the dest pixel and do the mult per color, and then add them together to get the result. Am I way off here? The result is weird looking... maybe I haven't thought about it enough...

by **Digby** » Jul 26, 2001 @ 1:12pm

The basic alpha blending equation is below.

NB: In this equation I'm using floating point values for the color components and the alpha that range from 0.0 to 1.0. This is only for explanation purposes. In your game, replace this range with 0 to your max value for the component + 1 (alpha would be 256, and color would be 64). For alpha, a value of 0.0 is transparent and 1.0 is opaque. For color components, 0.0 is no color and 1.0 is full color.

    dC = destination color in frame buffer
    sC = source color in sprite/texture
    sA = source alpha value

    dC = sC*sA + dC*(1.0 - sA);

Obviously you would need to use the above equation for each of the color components (R,G,B).

You can do a bit of precalculation and simplify the above equation a bit. If you store the premultiplied alpha in the source sprite/texture you can eliminate a multiply (actually 3 mults since you have to do it for R,G,B). This shortens the equation to:

    dC = sC + dC*(1.0 - sA);

You can further simplify the equation if you store the inverse alpha (1.0 - sA) in your alpha mask instead of alpha. This shortens the equation to:

    dC = sC + dC*sA;

This gets it down to one multiply and one addition per color component, per pixel. You can eliminate the multiply if you want to use a lookup table to calculate the dC*sA.

    #define ALPHA_LEVELS 256
    #define COLOR_LEVELS 64
    BYTE rgAlphaBlend[ALPHA_LEVELS][COLOR_LEVELS];

This array can obviously be calculated ahead of time. Now your equation looks like this:

    dC = sC + rgAlphaBlend[sA][dC];

OK, now a bit of discussion about the lookup table approach. When a compiler generates code to perform an indexing of a multidimensional array, it has to do something like this:

    l-value = *(&rgAlphaBlend[0][0] + sA*COLOR_LEVELS + dC);

See the multiply and add in there? That's more than you had to do if you didn't use the lookup table approach. Plus, jumping all over this array to look up values isn't going to help your cache coherency any. Therefore I would recommend that you don't use the lookup table approach and just go with the add and multiply per color component. As always, you should profile the code in question and see which is faster on the target platform.

Jacco, this might be a good topic for a future tutorial.

P.S. The final blending equation if you're using 8-bit alpha (without floating point as in my examples above) looks like this:

    dC = sC + ((dC*sA) >> 8);

Note the shift operation that needs to be performed after the multiply. The lookup table approach wouldn't need to do this, but I still contend that this would be faster than using the array. Benchmark it and let me know!
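As a rough illustration of the first, unsimplified equation applied to the setup from the original question (RGB565 pixels plus an 8-bit alpha mask), here is a minimal C sketch. The function name `blend565_a8` and the mapping of alpha from 0..255 onto 0..256 are assumptions made for this example, not something taken from the posts; the source is treated as straight (non-premultiplied) color.

```c
#include <stdint.h>

/* Blend one RGB565 source pixel over a destination pixel.
   alpha: 0 = fully transparent, 255 = fully opaque (hypothetical convention). */
static uint16_t blend565_a8(uint16_t src, uint16_t dst, uint8_t alpha)
{
    /* Map 0..255 onto 0..256 so that 255 selects the source exactly. */
    unsigned a  = alpha + (alpha >> 7);
    unsigned ia = 256u - a;

    /* Unpack both pixels: shift each component down, then mask it. */
    unsigned sr = (src >> 11) & 0x1Fu, sg = (src >> 5) & 0x3Fu, sb = src & 0x1Fu;
    unsigned dr = (dst >> 11) & 0x1Fu, dg = (dst >> 5) & 0x3Fu, db = dst & 0x1Fu;

    /* dC = sC*sA + dC*(1 - sA), in fixed point with 8 fractional bits. */
    unsigned r = (sr * a + dr * ia) >> 8;
    unsigned g = (sg * a + dg * ia) >> 8;
    unsigned b = (sb * a + db * ia) >> 8;

    /* Repack to RGB565. */
    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

Because `a + ia` is always 256, no component can exceed its maximum after the shift, so no clamping is needed. The premultiplied and inverse-alpha variants described above would drop the `sC * a` multiplies and the `256 - a` subtraction respectively.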

by **RICoder** » Jul 26, 2001 @ 2:57pm

Digby... thanks! I have actually been playing this morning and read a book called Graphics Gems. As I evaluated the situation, I realized that a precalculation on the source would be a huge help, so I did that. Precalcing the dest alpha value I hadn't thought of, but I will give that a shot too. For the most part, since I am using offscreen buffers, and it's not running at 30fps anyway, I don't see a hit no matter what I do, but I want to use this in a library, so the optimizations are good. The BIG crapper of all this is that it is RGB565, so extracting and re-constituting the color channels is a pain in the ass, and is probably not helping much. I am thinking of reverting to RGB888, and doing the translation on the final 'flip'. Need to profile a bit first though. Thanks again.

by **Phantom** » Aug 1, 2001 @ 6:26am

Guys,

There is a way to do this much cheaper, but I would have to investigate a bit first. The basic idea is that you first create 'bitgaps' between the color components by exploiting the fact that a 565 color occupies only 16 bits. If you shift red a lot to the left and green also a little, gaps of zeroes are created between the components. The good thing about gaps is that you can multiply R, G and B with a single multiply; after that you shift the whole bunch back and there you have your scaled color (which is basically what you want with alpha blending). This technique can be improved by precalculating the bitgaps. This is of course only useful when you use a palette.

Which brings me to another solution that I use often: If you use a palette anyway, why not precalculate 16 or 32 scaled versions of the entire palette? If you want to do 25%/75% blending of two colors, this is simply a matter of looking up the 25% version of color 1, and the 75% version of color 2, adding them, and voila. I used this technique for very fast bilinear filtering; my texture mapper with 5-bit bilerp ran faster without MMX code than the code that Intel did WITH their MMX.

About LUTs: If you have an array like this: int a[N][256], and you want item [10][1], this does not become *(a + 1 + 10 * 256), but rather *(a + 1 + (10 << 8)). It's just a matter of picking your array sizes smart. If you are uncertain about these compiler optimizations, simply always build 1D arrays and do the shift yourself.

Final note, about LUTs and the cache: I found that on the PC an integer multiply is almost always better than a lookup in a huge table (where 'huge' is 32K or above). On the PocketPC, I have no idea how this relates.

Greets, - Jacco.
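The palette idea and the "build a 1D array and do the shift yourself" advice can be combined in one small sketch. Everything named here (`scaled_pal`, `build_scaled_palette`, `blend_indexed`, a 256-entry RGB565 palette, 33 scale levels) is an assumption chosen for illustration rather than code from the posts.

```c
#include <stdint.h>

#define PAL_BITS 8                  /* assumed: 256-entry palette      */
#define PAL_SIZE (1u << PAL_BITS)
#define LEVELS   33                 /* scale factors 0..32 (0%..100%)  */

/* One flat table: row 'level' holds the whole palette scaled by level/32. */
static uint16_t scaled_pal[LEVELS << PAL_BITS];

/* Per-component scale of an RGB565 color by level/32 (setup time only). */
static uint16_t scale565(uint16_t c, unsigned level)
{
    unsigned r = (((c >> 11) & 0x1Fu) * level) >> 5;
    unsigned g = (((c >> 5) & 0x3Fu) * level) >> 5;
    unsigned b = ((c & 0x1Fu) * level) >> 5;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

static void build_scaled_palette(const uint16_t *pal)
{
    for (unsigned level = 0; level < LEVELS; ++level)
        for (unsigned i = 0; i < PAL_SIZE; ++i)
            scaled_pal[(level << PAL_BITS) | i] = scale565(pal[i], level);
}

/* Blend palette entries i1 and i2 with weights level/32 and (32-level)/32:
   two table lookups and one add, with the row index formed by a shift. */
static uint16_t blend_indexed(uint8_t i1, uint8_t i2, unsigned level)
{
    return (uint16_t)(scaled_pal[(level << PAL_BITS) | i1] +
                      scaled_pal[((32u - level) << PAL_BITS) | i2]);
}
```

With these sizes the table is about 16.5 KB, which stays under the 32K figure mentioned above as the point where a multiply tends to win on the PC. Whether that holds on a Pocket PC would need measuring.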

by **RICoder** » Aug 1, 2001 @ 11:08am

Yeah Jacco, thanks man. I am using RGB565 though, so you see that LUTs will be of limited use. I am also 'cheating' a bit. Pre-rendering is happening, but I am also looking at percentages and rounding a bit, so that I can do the math with << >> instead of *. Ya know? It's lossy but it's quick. Thanks again though.

by **Digby** » Aug 1, 2001 @ 1:03pm

Good points Jacco. If you're willing to reduce your color & alpha resolution you can implement a lot of this with lookup tables. I have a difficult time understanding why someone needs 256 levels of alpha when using 5-bit color components, but maybe I'm just thick.

It would be really good if someone could try a number of these schemes on a Pocket PC and measure the results. We're all good at throwing out ideas on what seems to be a better way to go, but if we don't have the data to back it up it's just conjecture.

BTW, the ARM compiler does produce code that performs the shifting instead of the multiply to index into a 2-dimensional array when the 2nd dimension is a power of 2. This results in the lookup amounting to 2 additions (the ARM's add instruction can perform a shift prior to the add): one add to compute the index and another to add this to the base address of the array. Not bad for a free compiler, eh?

I'm wondering now if the unpacking/packing of the 3 color components for every pixel is going to be the long pole in the tent? It seems doing these operations is going to take more time than performing the blend.

by **RICoder** » Aug 1, 2001 @ 1:17pm

Digby, YOU GOT IT! Doing the pack/unpack is intensive as all hell. To make matters worse, it's RGB565, soooo, the components are not full bytes, which means to unpack and pack them you must extract them with a shift, and then & them with a mask to get the components. You cannot just refer to the bytes like you can with an RGB888. So, basically it sucks.

Having said that, I use offscreen buffers in GAPIx, which is what all of this is for. (You can look at the posts on it, we are trying to make a GAPI based DirectX). The buffers were just straight RGB565, then they were mimics of the screen for fast blitting... now they may end up being RGB888s for doing fast calculations on alphas and other stuff. Now it is a matter of profiling to get the best way to go.

by **Phantom** » Aug 2, 2001 @ 5:41am

Here is some code that is supposed to do the blending using the packing / unpacking trick. The only problem is that this code doesn't work. I don't know why, and I left it because I was too busy with other things. It does give a good idea of the complexity involved in packing / unpacking though.

    // Generate gaps
    LONG p = color & (REDMASK|BLUEMASK);
    p = (p << 10)|((color & GREENMASK) >> 6);
    // Multiply
    p *= factor;
    // Pack to 565
    LONG result = (p >> 15) & (REDMASK|BLUEMASK);
    result |= (p & (63 << 4)) << 2;

So if anyone can see the problem here... I would like to hear it.

By the way, the gapped version contains the components in the order GRB, since that was faster, and it doesn't matter for the multiply. - Jacco.

by **Digby** » Aug 2, 2001 @ 1:04pm

This will do what you want. nFactor must not be greater than 32, otherwise you'll overflow the gaps between the shifted color components. 5 bits of alpha is probably enough for most games on a handheld device. If you need more bits, and can afford another multiply, you can put G in one LONG and RB in another.

Note that the multiply is performed on RBG and not GRB. The reason for this is that you don't have to shift G when converting back to 565.

Actually, you don't need the temporary wResult at all. That would eliminate two stores and one load. You could just glob all of the packing into the parameter of the return statement. The compiler's optimizer might do this already though.

    WORD FooBlend (WORD wRGB, int nFactor)
    {
        LONG lTemp;
        WORD wResult;

        lTemp = wRGB & (REDMASK | BLUEMASK);
        lTemp = (lTemp << 11) | ((wRGB & GREENMASK) >> 5);
        lTemp *= nFactor;
        wResult = (lTemp >> (11 + 5)) & (REDMASK | BLUEMASK);
        wResult |= lTemp & GREENMASK;
        return wResult;
    }

by **Malmer** » Aug 2, 2001 @ 9:55pm

Uhm... where in all this do the destination pixels come in? Or does this thing only work on black? (me = stupid now?)

by **Digby** » Aug 2, 2001 @ 10:25pm

Ask Jacco. That's one of his schemes. I just modified his C code so that it would handle the unpack/multiply/pack properly.

Proper alpha blending requires an addition as well, and that should be done after the components are unpacked (post-multiply). If you look at my post where I described the alpha blending equation with the shortcut:

    dC = sC + ((dC * sA) >> 8)

His code can get you everything on the right-hand side of the '+' in that equation (the alpha is restricted to 5 bits though, and so the shift will change to >>5). The wRGB parameter is the color of the destination pixel before the blend, and nFactor is the inverse alpha.

The more I've thought about this, the more I think you should try to eliminate all of this pack/unpack overhead as much as possible. Depending on your content, you can get big wins by RLE encoding runs of transparent or opaque pixels. At a minimum, you should investigate adding a test in the inner loop. If the pixel is transparent, then you do nothing. If the pixel is opaque, then you just copy the source pixel to the destination. If the pixel is translucent, then do the blend. I'm really reluctant to add code to my inner loops but in this case I believe it's warranted.
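The three-way test in the inner loop might look something like the following sketch. The names `blend_span` and `blend_pixel`, the separate 8-bit alpha array, and the 0 = transparent / 255 = opaque convention are assumptions for the example; the per-pixel blend is a plain per-component version and could be swapped for any of the faster schemes in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-component blend for the translucent case (straight alpha, 8-bit). */
static uint16_t blend_pixel(uint16_t src, uint16_t dst, unsigned alpha)
{
    unsigned a  = alpha + (alpha >> 7);   /* map 0..255 onto 0..256 */
    unsigned ia = 256u - a;
    unsigned r = (((src >> 11) & 0x1Fu) * a + ((dst >> 11) & 0x1Fu) * ia) >> 8;
    unsigned g = (((src >> 5) & 0x3Fu) * a + ((dst >> 5) & 0x3Fu) * ia) >> 8;
    unsigned b = ((src & 0x1Fu) * a + (dst & 0x1Fu) * ia) >> 8;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Blend n source pixels onto dst, skipping the unpack/pack work
   whenever a pixel is fully transparent or fully opaque. */
static void blend_span(uint16_t *dst, const uint16_t *src,
                       const uint8_t *alpha, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint8_t a = alpha[i];
        if (a == 0)
            continue;                     /* transparent: leave dst alone */
        if (a == 255) {
            dst[i] = src[i];              /* opaque: straight copy */
            continue;
        }
        dst[i] = blend_pixel(src[i], dst[i], a);   /* translucent */
    }
}
```

For sprites that are mostly solid with a soft edge, most pixels take one of the two cheap paths. RLE encoding the transparent and opaque runs would go one step further and remove the per-pixel test for those runs entirely.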

by **Malmer** » Aug 2, 2001 @ 11:04pm

Actually that is what we do in ag... and we have to have stuff in the inner loop since each pixel has to be checked if behind a hill, and they must receive shadows and stuff... that is actually the reason I added alpha blending in the first place. When I do all the other stuff I might as well do alpha blending, which makes it look soooo much better. The sprite drawing code probably can't get much more optimized. I might add RLE for the fully transparent pixels though to squeeze a little bit more out of it...

by **Phantom** » Aug 4, 2001 @ 4:18am

Hi, If you want to do alpha blending between two pixels, with, say, 25%/75% factors, then you use the code I presented earlier with 25% on color one and 75% on color two. The two results can be summed directly, since they can (summed) never exceed pure white. BTW, I'm pretty sure the code I presented did NOT work. I tested it myself. I couldn't find the bug though. I presented it merely to indicate the complexity of packing/unpacking, since this packing/unpacking was discussed earlier. - Jacco.
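The two-sided blend described here, scaling each pixel with the gapped multiply and summing the results, can be sketched as follows. The names `scale565_gapped` and `blend565_5bit` are made up for this example, and the weights are assumed to be 5-bit values that add up to 32.

```c
#include <assert.h>
#include <stdint.h>

/* Scale all three RGB565 components by factor/32 with a single multiply.
   Unpacked layout: G in bits 0..10, B in bits 11..20, R in bits 22..31. */
static uint16_t scale565_gapped(uint16_t c, unsigned factor)
{
    assert(factor <= 32);   /* larger factors would spill into the next gap */
    uint32_t p = (((uint32_t)c & 0xF81Fu) << 11) | ((c & 0x07E0u) >> 5);
    p *= factor;
    /* The shift by 16 repositions R and B and divides them by 32 at once;
       G's scaled value already sits in place at bits 5..10. */
    return (uint16_t)(((p >> 16) & 0xF81Fu) | (p & 0x07E0u));
}

/* result = src * alpha5/32 + dst * (32 - alpha5)/32 */
static uint16_t blend565_5bit(uint16_t src, uint16_t dst, unsigned alpha5)
{
    assert(alpha5 <= 32);
    return (uint16_t)(scale565_gapped(src, alpha5) +
                      scale565_gapped(dst, 32u - alpha5));
}
```

Since the two weights sum to 32, each component of the sum stays at or below its maximum, so the packed results can be added as whole 16-bit values without carries crossing between components. For a 25%/75% blend, `alpha5` would be 8.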

by **Phantom** » Aug 6, 2001 @ 4:58am

Oops, sorry, I didn't notice you actually corrected my code. Thanks a lot for that. Added it to the Nutcracker right away; the new code is more than twice as fast as my old per-component approach. Here's the final code:

    inline unsigned short scalecolor( unsigned short c, int mul )
    {
        unsigned int p = (((c&0xF81F)<<1)|((c&0x7E0)>>5)>>5))*mul;
        return (((p>>16)&0xF81F)|(p&0x7E0));
    }

This code scales one color to a percentage of the original color. 'Mul' must be a value from 0..31. I have removed all references to #defines that I use, so this should compile without problems for everyone.

Bonus code: Fast additive blending with saturation. This code can be used to add two pixels, and have the result clipped to pure white. Very useful for lighting effects.

    #define SHFTMASK ((15<<11)+(31<<5)+15)
    inline unsigned short addblend( unsigned short c1, unsigned short c2 )
    {
        unsigned short c = (unsigned short)(((c1>>1)&SHFTMASK)+((c2>>1)&SHFTMASK));
        if (c&0x8410)
        {
            if (c&0x8000) c=(c&0x7FF)+0x7800;
            if (c&0x400) c=(c&0xF81F)+0x3E0;
            if (c&0x10) c=(c&0xFFE0)+0xF;
        }
        return (unsigned short)(c<<1);
    }

The cool thing about this code is that it has only one 'if' to determine if there has been an overflow in any of the components. If so, it fixes the components one by one. The bad thing is that you lose a bit of precision.

- Jacco.
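For comparison with the additive blend above, here is a per-component saturating add that keeps every bit of precision, at the cost of unpacking and three separate clamps. The name `addsat565` is hypothetical; this is only a baseline to benchmark the faster version against, not a replacement for it.

```c
#include <stdint.h>

/* Add two RGB565 pixels component by component, clamping each
   component to its maximum (31 for R and B, 63 for G). */
static uint16_t addsat565(uint16_t c1, uint16_t c2)
{
    unsigned r = ((c1 >> 11) & 0x1Fu) + ((c2 >> 11) & 0x1Fu);
    unsigned g = ((c1 >> 5) & 0x3Fu) + ((c2 >> 5) & 0x3Fu);
    unsigned b = (c1 & 0x1Fu) + (c2 & 0x1Fu);

    if (r > 31u) r = 31u;
    if (g > 63u) g = 63u;
    if (b > 31u) b = 31u;

    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

The halved-precision version drops the lowest bit of each component before adding, so adding a blue of 1 to a blue of 1 gives 0 there and 2 here. Whether that difference is visible in a lighting effect is a judgment call.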

by **Digby** » Aug 6, 2001 @ 1:05pm

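Jacco,

Looks like you've got some typos in your implementation of scalecolor(). It should look like this:

    inline unsigned short scalecolor( unsigned short c, int mul )
    {
        /* Unpack to R..B..G with gaps, then scale all three with one multiply.
           The cast keeps the multiply unsigned so red cannot overflow into the sign bit. */
        unsigned int p = ((((unsigned int)c&0xF81F)<<11)|((c&0x7E0)>>5))*mul;
        /* The >>16 undoes the <<11 and divides R and B by 32; G is already in place. */
        return (unsigned short)(((p>>16)&0xF81F)|(p&0x7E0));
    }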
