fastest sprite routine

Postby StephC » Jun 12, 2003 @ 10:57am

Hello,

I've spent some time writing a fast sprite routine in ARM assembly, with RLE encoding and alpha blending.

Using the 32x32 animated sprite found in the GapiDraw and warmi's demos, I seem to get more than 1000 sprites displayed at 30 FPS, with the same results on my iPAQ 3600 and my Zaurus SL-5500.

I also have a sprite routine optimised for XScale (using the PLD instruction to preload cache lines), but no XScale device to test it on for now...


My benchmark only consists of:

- blit a background image on the whole screen (memcpy)
- blit all the sprites
- update their coordinates
- update screen (rotate blit on my ipaq)
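
To make the four steps concrete, here is a minimal C sketch of one benchmark frame. It is not the code from this post: the buffer sizes, the `Sprite` struct, the opaque stand-in `blit_sprite`, and the rotation direction in `present_rotated` are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum { W = 240, H = 320, SPR = 32 };  /* assumed portrait back buffer and sprite size */

typedef struct { int x, y, dx, dy; } Sprite;  /* hypothetical sprite state */

/* Stand-in for the real sprite routine: copies an opaque SPR x SPR block. */
static void blit_sprite(uint16_t *back, const uint16_t *img, int x, int y)
{
    for (int row = 0; row < SPR; row++)
        memcpy(back + (size_t)(y + row) * W + x, img + row * SPR, SPR * sizeof *img);
}

/* Move a sprite and bounce it off the buffer edges. */
static void update_sprite(Sprite *s)
{
    s->x += s->dx;
    s->y += s->dy;
    if (s->x < 0)            { s->x = 0;       s->dx = -s->dx; }
    else if (s->x > W - SPR) { s->x = W - SPR; s->dx = -s->dx; }
    if (s->y < 0)            { s->y = 0;       s->dy = -s->dy; }
    else if (s->y > H - SPR) { s->y = H - SPR; s->dy = -s->dy; }
}

/* Copy the W x H back buffer to an H x W screen, rotated by 90 degrees
   (the rotation direction is an assumption). */
static void present_rotated(uint16_t *screen, const uint16_t *back)
{
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            screen[(size_t)x * H + (H - 1 - y)] = back[(size_t)y * W + x];
}

/* One benchmark frame: background, sprites, coordinates, screen update. */
static void bench_frame(uint16_t *screen, uint16_t *back, const uint16_t *background,
                        const uint16_t *img, Sprite *sprites, int count)
{
    memcpy(back, background, (size_t)W * H * sizeof *back);
    for (int i = 0; i < count; i++)
        blit_sprite(back, img, sprites[i].x, sprites[i].y);
    for (int i = 0; i < count; i++)
        update_sprite(&sprites[i]);
    present_rotated(screen, back);
}
```

Timing a fixed number of `bench_frame` calls with a given sprite count would give a frames-per-second figure; the full-screen copy and the rotated update are a fixed cost per frame, whatever the sprite count.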

I don't know if this benchmark is acceptable for comparison with other demos.

If it's OK, what I want to know is: is there a faster sprite routine out there?

--
StephC / int13 production
StephC
pm Insider
 
Posts: 442
Joined: Jun 12, 2003 @ 10:41am
Location: Bordeaux - France


Postby warmi » Jun 12, 2003 @ 3:37pm

Ha, 1000 of them?

I think I got up to 700 color-keyed sprites (the ZSurface demo is capped at 500) on my Z, so getting an additional 300 with RLE would make perfect sense.
But if you were able to get 1000+ alpha-blended sprites at 30 fps then, hell, I don't know what kind of secrets you found about the ARM CPU, but that would be a huge number (even considering RLE: after all, you are paying a penalty for the blending process itself, and the 32x32 sprites used in the GapiDraw demos don't have that many "empty" pixels anyway).
BTW, what kind of alpha blending is that: 50/50 or arbitrary?
warmi
pm Insider
 
Posts: 518
Joined: Aug 24, 2002 @ 8:07am
Location: Chicago USA


Postby StephC » Jun 12, 2003 @ 7:37pm

I'm using arbitrary alpha (0-32), and I think the speed is mainly due to my RLE encoding: source data is 32-bit aligned whenever possible, and I can blit as many as 16 pixels at a time when there are no empty pixels.

As for the benchmark process, I remember that you got different results on the iPAQ and the Zaurus; I think that's due to a flaw in your benchmark on the Zaurus...


Postby warmi » Jun 12, 2003 @ 7:51pm



Postby StephC » Jun 12, 2003 @ 8:18pm

The worst case for RLE encoding is a grid of alternating opaque and transparent pixels, which is highly improbable.

Most sprites have empty pixels at their borders, and that is the case where RLE encoding is fast.
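
A rough way to see the difference is to count how many opaque runs (and therefore segments) a row produces. The C helper below is only an illustration; `count_runs` and the `key` value marking transparent pixels are hypothetical names, not part of the routine discussed here.

```c
#include <stdint.h>

/* Count the opaque runs an RLE encoder would have to emit for one row.
   `key` is an assumed color value that marks transparent pixels. */
static int count_runs(const uint16_t *row, int width, uint16_t key)
{
    int runs = 0;
    int in_run = 0;
    for (int x = 0; x < width; x++) {
        int opaque = (row[x] != key);
        if (opaque && !in_run)
            runs++;              /* a new run starts at this pixel */
        in_run = opaque;
    }
    return runs;
}
```

On a 32-pixel row, an alternating pattern gives 16 runs of one pixel each, while a row that is transparent only at its borders gives a single run.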

My RLE encoding translates the image data into a stream of 32-bit aligned segments of the form:

<pixel|skip|runcode><data>

where skip is the number of transparent pixels to skip, runcode is the number of pixels to blit,
pixel is the first pixel in the case of an odd segment,
and data are the pixels themselves.

runcode also contains special markers for end of line, end of sprite and alpha level.
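
As an illustration only, here is a C sketch of one way such a stream could be written and read back for a single row. The exact bit layout is not given in this post, so everything below is an assumption: the header packs the pixel in bits 31..16, skip in bits 15..8 and runcode in bits 7..0; "odd segment" is read as "odd run length"; the marker values are invented; the alpha marker is left out; and the row width must stay below 254.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical markers stored in the runcode field. */
enum { RLE_END_OF_LINE = 0xFE, RLE_END_OF_SPRITE = 0xFF };

/* Assumed header layout: pixel | skip | runcode. */
#define RLE_HEADER(pixel, skip, run) \
    (((uint32_t)(pixel) << 16) | ((uint32_t)(skip) << 8) | (uint32_t)(run))

/* Encode one row of 16-bit pixels; returns the number of 32-bit words written. */
static size_t rle_encode_row(const uint16_t *src, int width, uint16_t key, uint32_t *out)
{
    size_t n = 0;
    int x = 0;
    while (x < width) {
        int skip = 0, run = 0, i = 0;
        uint16_t first = 0;
        while (x < width && src[x] == key) { x++; skip++; }
        while (x + run < width && src[x + run] != key) run++;
        if (run == 0)
            break;                       /* only transparent pixels remain */
        const uint16_t *p = src + x;
        if (run & 1)
            first = p[i++];              /* odd run: first pixel rides in the header */
        out[n++] = RLE_HEADER(first, skip, run);
        for (; i < run; i += 2)          /* remaining pixels, two per aligned word */
            out[n++] = (uint32_t)p[i] | ((uint32_t)p[i + 1] << 16);
        x += run;
    }
    out[n++] = RLE_HEADER(0, 0, RLE_END_OF_LINE);
    return n;
}

/* Draw one encoded row onto dst; returns the number of words consumed. */
static size_t rle_blit_row(const uint32_t *in, uint16_t *dst)
{
    size_t n = 0;
    for (;;) {
        uint32_t h = in[n++];
        unsigned run = h & 0xFF;
        if (run == RLE_END_OF_LINE || run == RLE_END_OF_SPRITE)
            return n;
        dst += (h >> 8) & 0xFF;          /* step over transparent pixels */
        if (run & 1) { *dst++ = (uint16_t)(h >> 16); run--; }
        for (; run; run -= 2) {
            uint32_t w = in[n++];
            *dst++ = (uint16_t)w;
            *dst++ = (uint16_t)(w >> 16);
        }
    }
}
```

With this layout every data word holds exactly two pixels, so the source side stays 32-bit aligned and the inner loop never tests for transparency.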

I even think that this approach is as fast as compiled sprites on this architecture.
Last edited by StephC on Jun 12, 2003 @ 9:14pm, edited 1 time in total.


Postby StephC » Jun 12, 2003 @ 8:24pm

I have 3 multiplies per pixel for alpha blending; maybe this code is not optimal...

[code]

;// r6 = pixel 1 (p1)
;// r4 = pixel 2 (p2)
;// r5 = alpha (0-32)
;// result = p1 + ((p2 - p1) * alpha >> 5), computed per channel

and r7, r6, #0xF800 ;// p1M = p1 & REDMASK
and r8, r4, #0xF800 ;// p2M = p2 & REDMASK
sub r8, r8, r7 ;// p2M - p1M
mul r9, r8, r5 ;// * alpha
and r11, r6, #0x07E0 ;// p1M = p1 & GREENMASK - precalc to avoid pipeline stall
and r8, r4, #0x07E0 ;// p2M = p2 & GREENMASK - precalc to avoid pipeline stall
add r7, r7, r9, lsr #5 ;// >>5 + p1M
and r10, r7, #0xF800 ;// & REDMASK r10 = temp result

sub r8, r8, r11 ;// p2M - p1M
mul r9, r8, r5 ;// * alpha
and r8, r4, #0x001F ;// p2M = p2 & BLUEMASK - precalc to avoid pipeline stall
and r7, r6, #0x001F ;// p1M = p1 & BLUEMASK - precalc to avoid pipeline stall
add r11, r11, r9, lsr #5 ;// >>5 + p1M
and r11, r11, #0x07E0 ;// & GREENMASK
orr r10, r10, r11 ;// | temp result

sub r8, r8, r7 ;// p2M - p1M
mul r9, r8, r5 ;// * alpha
add r7, r7, r9, lsr #5 ;// >>5 + p1M
and r7, r7, #0x001F ;// & BLUEMASK
orr r10, r10, r7 ;// | temp result

;// r10 = blended pixel
[/code]
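
For comparison, here is a C sketch of the same per-channel arithmetic, next to a commonly used variant for RGB565 that spreads the pixel into a 32-bit word so that a single multiply covers all three channels. The second function is not from this post; it is a different, well-known technique, and both function names are made up for the example. Alpha is assumed to be in the range 0-32.

```c
#include <stdint.h>

/* Same arithmetic as the assembly: three multiplies, one per channel.
   Unsigned wrap-around plays the role of the logical shift on negative
   differences; the final masks discard the stray high bits. */
static uint16_t blend565_3mul(uint16_t p1, uint16_t p2, uint32_t alpha)
{
    uint32_t r1 = p1 & 0xF800u, r2 = p2 & 0xF800u;
    uint32_t g1 = p1 & 0x07E0u, g2 = p2 & 0x07E0u;
    uint32_t b1 = p1 & 0x001Fu, b2 = p2 & 0x001Fu;
    uint32_t r = (r1 + (((r2 - r1) * alpha) >> 5)) & 0xF800u;
    uint32_t g = (g1 + (((g2 - g1) * alpha) >> 5)) & 0x07E0u;
    uint32_t b = (b1 + (((b2 - b1) * alpha) >> 5)) & 0x001Fu;
    return (uint16_t)(r | g | b);
}

/* Variant with one multiply: green is moved to the upper half-word, which
   leaves at least 5 spare bits above each channel for the product. */
static uint16_t blend565_1mul(uint16_t p1, uint16_t p2, uint32_t alpha)
{
    uint32_t d = (p1 | ((uint32_t)p1 << 16)) & 0x07E0F81Fu;
    uint32_t s = (p2 | ((uint32_t)p2 << 16)) & 0x07E0F81Fu;
    uint32_t x = (d + (((s - d) * alpha) >> 5)) & 0x07E0F81Fu;
    return (uint16_t)(x | (x >> 16));
}
```

With alpha = 0 both functions return p1, and with alpha = 32 both return p2. Whether the single-multiply form is actually faster on a given ARM core would need measuring, since the extra shifts and masks eat into the saving.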


Postby warmi » Jun 12, 2003 @ 8:46pm



Postby StephC » Jun 12, 2003 @ 8:59pm


