Hi
I'm Avdpro, a newbie PPC coder.
In C/C++, I think the keys to optimizing the screen update are "data alignment" and unrolling the loop.
1> *ldst = *src++;
This compiles to one "strh" (store halfword) instruction for every pixel. You may want to copy 2 pixels at a time through 32-bit pointers instead (this needs both buffers to be 4-byte aligned). I think this will speed it up a lot.
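2> Use a loop that counts down, that is:
for(y=239;y>=0;y--) // 240 rows: y = 239 ... 0
Some people from Intel said this is faster because of how ARM instructions work: the decrement itself (SUBS) can set the condition flags, so no separate compare is needed to test for the end of the loop.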
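3> Unroll the loop, so each pass handles several pixels and the loop overhead is paid less often.
4> Use ASM instead of C++. In my experience this is several times faster, because the M$ compiler generates ugly code.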
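Tips 1 to 3 can be combined. Below is a minimal C sketch for a plain 16bit->16bit row copy. Assumptions that are not from the original thread: the helper name copy_row16 is hypothetical, both rows are stored as 32-bit words (so they are 4-byte aligned and each word holds 2 pixels), the row width is even, and destination pixels are contiguous (no rotated screen with a different horizontal pitch).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy one row of 16-bit pixels, two pixels per
 * 32-bit word (tip 1), so no 16-bit stores are needed.
 * Assumes both rows are stored as uint32_t words and n_pixels is even. */
void copy_row16(uint32_t *dst, const uint32_t *src, size_t n_pixels)
{
    size_t words, i;

    assert(n_pixels % 2 == 0);
    words = n_pixels / 2;

    /* Tips 2 and 3: count down to zero, 4 words (8 pixels) per pass. */
    for (i = words / 4; i != 0; i--) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
    }

    /* Leftover 0..3 words, also counting down. */
    for (i = words % 4; i != 0; i--)
        *dst++ = *src++;
}
```

For a 240-pixel row this is 120 words, i.e. 30 passes of the unrolled loop and no leftovers.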
I think you are luckier than me: I'm writing a palette-based game, and an 8bit->16bit screen update is more complex than 16bit->16bit.
Here is my 8bit->16bit code; I hope it gives you some ideas for optimizing yours. What I actually use is a version of this function rewritten in ASM, which is more than 2 times faster than this one.
I removed some of the setup code, sorry.
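while(y--) // one pass per row; set up elsewhere (removed): this,h,w,lpbase,lpdst,<lpFastPal,iScrPitchH,iScrPitchV>
{
    lpdbyte=lpbase;                 // destination as a byte pointer, so it can step by iScrPitchH bytes
    x=w;
    m4=((int)lpsrc)&3;              // bytes needed to bring lpsrc to a 4-byte boundary
    m4=m4?4-m4:0;
    if(m4>x) m4=x;                  // very narrow row: never convert more than w pixels
    x-=m4;
    while(m4--)                     // head: one pixel at a time until lpsrc is aligned
    {
        *(MAGWORD*)lpdbyte=(MAGWORD)lpFastPal[*lpsrc++];
        lpdbyte+=iScrPitchH;
    }
    while(x>3)                      // body: 4 pixels per aligned 32-bit read, unrolled
    {
        dwcolor=*(DWORD*)lpsrc;     // lpsrc is 4-byte aligned here
        lpsrc+=4;
        color=dwcolor&0xFF;         // little-endian: the lowest byte is the leftmost pixel
        dwcolor>>=8;
        *(MAGWORD*)lpdbyte=(MAGWORD)lpFastPal[color];
        lpdbyte+=iScrPitchH;
        color=dwcolor&0xFF;
        dwcolor>>=8;
        *(MAGWORD*)lpdbyte=(MAGWORD)lpFastPal[color];
        lpdbyte+=iScrPitchH;
        color=dwcolor&0xFF;
        dwcolor>>=8;
        *(MAGWORD*)lpdbyte=(MAGWORD)lpFastPal[color];
        lpdbyte+=iScrPitchH;
        color=dwcolor&0xFF;
        *(MAGWORD*)lpdbyte=(MAGWORD)lpFastPal[color];
        lpdbyte+=iScrPitchH;
        x-=4;
    }
    while(x--)                      // tail: the remaining 0..3 pixels
    {
        *(MAGWORD*)lpdbyte=(MAGWORD)lpFastPal[*lpsrc++];
        lpdbyte+=iScrPitchH;
    }
    lpsrc+=ofs;                     // skip to the start of the next source row
    lpbase+=iScrPitchV;             // next destination row
}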
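The same structure (byte-by-byte head until the source is 4-byte aligned, an unrolled body doing one 32-bit read per 4 pixels, then a byte-by-byte tail) can be tried outside the game as a self-contained C sketch. The function name convert_row_8to16, the uint16_t pixel type, and dst_step being counted in pixels rather than in bytes like iScrPitchH are assumptions for illustration only. Like the code above, it assumes a little-endian CPU. It uses memcpy for the 32-bit read, which is the portable C way to reinterpret 4 bytes as a word; the code above casts to a DWORD pointer instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert one row of 8-bit palette indices (src, n bytes)
 * to 16-bit pixels via a 256-entry lookup table (pal).
 * dst_step is the distance between neighbouring destination pixels, in
 * uint16_t units (1 for a normal row).
 * Assumes a little-endian CPU: the lowest byte of a 32-bit word is the
 * leftmost pixel. */
void convert_row_8to16(uint16_t *dst, ptrdiff_t dst_step,
                       const uint8_t *src, size_t n, const uint16_t pal[256])
{
    /* Head: bytes needed to bring src to a 4-byte boundary (0..3). */
    size_t head = (size_t)((4 - ((uintptr_t)src & 3)) & 3);
    if (head > n)
        head = n;                 /* very narrow row */
    n -= head;
    while (head--) {
        *dst = pal[*src++];
        dst += dst_step;
    }

    /* Body: one aligned 32-bit read per 4 pixels, unrolled. */
    while (n > 3) {
        uint32_t w;
        assert(((uintptr_t)src & 3) == 0);  /* the head loop aligned src */
        memcpy(&w, src, 4);
        src += 4;
        *dst = pal[w & 0xFF];          dst += dst_step;
        *dst = pal[(w >> 8) & 0xFF];   dst += dst_step;
        *dst = pal[(w >> 16) & 0xFF];  dst += dst_step;
        *dst = pal[w >> 24];           dst += dst_step;
        n -= 4;
    }

    /* Tail: the remaining 0..3 pixels. */
    while (n--) {
        *dst = pal[*src++];
        dst += dst_step;
    }
}
```

Starting the source one byte past a 4-byte boundary with 13 pixels exercises all three parts: a 3-pixel head, two passes of the body, and a 2-pixel tail.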