Some help to optimize my new ATARI ST emu for PocketPC...

by **AndrewGower** » Oct 21, 2002 @ 1:26am

These are the best I can manage for endian swap routines. I can't help but feel there should be an even better solution, but I can't work out what it is, if there is one.

-----
word endian swap
-input in r0 (bits 16-31 must be zero, as they are after an ldrh)
-big-endian answer returned in r1
-corrupts r2

mov r1, r0, lsr #8 ;move the high byte down into r1
mov r2, r0, lsl #24 ;move the low byte up to the top of r2
add r1, r1, r2, lsr #16 ;bring the low byte down to bits 8-15 and combine
-----
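To check the three-instruction sequence without hardware, it can be modelled in C with one statement per instruction. This is a sketch; `word_swap` is a hypothetical name, and like the assembly it assumes bits 16-31 of the input are zero.

```c
#include <stdint.h>

/* C model of the ARM word swap, one statement per instruction.
   Assumes bits 16-31 of r0 are zero (as after an ldrh); any stray
   high bits would leak into the result. */
static uint32_t word_swap(uint32_t r0)
{
    uint32_t r1 = r0 >> 8;   /* mov r1, r0, lsr #8      : high byte -> bits 0-7   */
    uint32_t r2 = r0 << 24;  /* mov r2, r0, lsl #24     : low byte  -> bits 24-31 */
    r1 += r2 >> 16;          /* add r1, r1, r2, lsr #16 : low byte  -> bits 8-15  */
    return r1;
}
```

For example, `word_swap(0x1234)` gives `0x3412`.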

----
long endian swap
-input in r0
-big-endian answer returned in r1
-corrupts r2,r0
(if you would rather keep the input intact, it could corrupt r3 instead of r0)

mov r1, r0, lsr #24 ;put byte 4 in position
add r1, r1, r0, lsl #24 ;put byte 1 in position
mov r0, r0, ror #16 ;move bytes 2 and 3 to edge
mov r2, r0, lsr #24 ;put byte 2 in position
add r2, r2, r0, lsl #24 ;put byte 3 in position
add r1, r1, r2, ror #16 ;combine final answer
-----
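The six-instruction routine can be modelled the same way. For comparison, the sketch also includes a widely used four-instruction ARM byte swap (eor/bic/ror/eor). That sequence is not from this thread, and its register usage differs: it leaves the result in r0 and uses r1 as scratch. Function names are hypothetical.

```c
#include <stdint.h>

/* Rotate right, matching ARM's "ror #n" shifter operand (0 < n < 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* C model of the six-instruction long swap, one statement per instruction. */
static uint32_t long_swap6(uint32_t r0)
{
    uint32_t r1 = r0 >> 24;        /* mov r1, r0, lsr #24     */
    r1 += r0 << 24;                /* add r1, r1, r0, lsl #24 */
    r0 = ror32(r0, 16);            /* mov r0, r0, ror #16     */
    uint32_t r2 = r0 >> 24;        /* mov r2, r0, lsr #24     */
    r2 += r0 << 24;                /* add r2, r2, r0, lsl #24 */
    return r1 + ror32(r2, 16);     /* add r1, r1, r2, ror #16 */
}

/* Four-instruction variant, for comparison:
     eor r1, r0, r0, ror #16
     bic r1, r1, #0xFF0000
     mov r0, r0, ror #8
     eor r0, r0, r1, lsr #8      */
static uint32_t long_swap4(uint32_t r0)
{
    uint32_t r1 = r0 ^ ror32(r0, 16);
    r1 &= ~0x00FF0000u;
    r0 = ror32(r0, 8);
    return r0 ^ (r1 >> 8);
}
```

Both functions map `0x12345678` to `0x78563412`.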

To use these in your reads and writes, just use the above snippets to swap the endianness of the data in the register before writing it normally, or after reading it normally.

e.g.:

for read16, do a normal 16 bit read, then run 'word endian swap' on the register

for write16 run 'word endian swap' on the register, and then do a normal 16 bit write

for read32, do a normal 32 bit read, then run 'long endian swap' on the register

for write32 run 'long endian swap' on the register, and then do a normal 32 bit write
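A minimal C sketch of those four accessors, assuming the emulated ST RAM is a host byte array kept in 68000 (big-endian) byte order and a little-endian host such as the ARM CPUs in PocketPC devices. The function names and the `ram` parameter are hypothetical. `memcpy` stands in for the "normal" host load or store; it also keeps 32-bit accesses safe at addresses that are only 2-byte aligned, which the 68000 allows.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint16_t swap16(uint16_t v) { return (uint16_t)((v >> 8) | (v << 8)); }

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* read16: normal 16-bit host read, then 'word endian swap'. */
static uint16_t read16(const uint8_t *ram, uint32_t addr)
{
    uint16_t v;
    assert((addr & 1u) == 0);             /* 68000 word accesses are even */
    memcpy(&v, ram + addr, sizeof v);
    return swap16(v);
}

/* write16: 'word endian swap' first, then a normal 16-bit write. */
static void write16(uint8_t *ram, uint32_t addr, uint16_t value)
{
    assert((addr & 1u) == 0);
    value = swap16(value);
    memcpy(ram + addr, &value, sizeof value);
}

/* read32: normal 32-bit read, then 'long endian swap'. */
static uint32_t read32(const uint8_t *ram, uint32_t addr)
{
    uint32_t v;
    assert((addr & 1u) == 0);             /* only word alignment is required */
    memcpy(&v, ram + addr, sizeof v);
    return swap32(v);
}

/* write32: 'long endian swap' first, then a normal 32-bit write. */
static void write32(uint8_t *ram, uint32_t addr, uint32_t value)
{
    assert((addr & 1u) == 0);
    value = swap32(value);
    memcpy(ram + addr, &value, sizeof value);
}
```

After `write16(ram, 0, 0x1234)` the buffer holds `0x12, 0x34`, exactly as the ST would store it, and `read16(ram, 0)` returns `0x1234`.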

I haven't tested this :-) But it's getting late here in the UK, so I'll leave it at that for now.

by **Dave H** » Oct 21, 2002 @ 1:38am

>These are the best I can manage for endian swap routines. I can't help but feel there should be an even better solution, but I can't work out what it is, if there is one.

Well there is a better solution - the best optimisation, as always, is to not need to byteswap in the first place :)

by **Guest** » Oct 21, 2002 @ 10:23am

The problems with not byte swapping, and just storing everything back to front, are:

a) Byte reads/writes on the 68000 are not word aligned, so if we just store it backwards and someone does:
move.w d0,(a0)
move.b 1(a0),d1
it's going to go wrong. This could be compensated for relatively easily in the byte access routine, but probably not in fewer than the 3 instructions it took to fix up the word access routine.

b) Memory access is only word aligned, not long aligned, so if someone does:
move.l d0,(a0)
move.l 2(a0),d1
it's going to go wrong. Again, it could be compensated for, but I'm not convinced it would be any quicker than the endian swap code I've already posted.

Perhaps you could post some code showing how to compensate for these issues quickly, because if it's shorter than the endian swap code, it would be really great.

Thanks
Andrew

by **schtruck** » Oct 21, 2002 @ 12:08pm

by **schtruck** » Oct 21, 2002 @ 12:14pm

by **Guest** » Oct 21, 2002 @ 12:17pm

by **refractor** » Oct 21, 2002 @ 1:49pm

by **Dave H** » Oct 21, 2002 @ 2:24pm

by **Guest** » Oct 21, 2002 @ 2:43pm

by **schtruck** » Oct 21, 2002 @ 3:30pm

by **AndrewGower** » Oct 21, 2002 @ 9:19pm

Hi,

here are the macros for the technique of storing words (but not long words) backwards.

#define ReadB(addr) (*(uint8*)((addr)^1))

#define WriteB(addr,value) (*(uint8*)((addr)^1)=(value))

#define ReadW(addr) (*(uint16*)(addr))

#define WriteW(addr,value) (*(uint16*)(addr)=(value))

#define ReadL(addr) ((*(uint32*)(addr)<<16)|(*(uint32*)(addr)>>16))

#define WriteL(addr,value) (*(uint32*)(addr)=((uint32)(value)<<16)|((uint32)(value)>>16))

Unfortunately, as far as I am aware, C++ doesn't have an operator for rotate (correct me if I'm wrong!), so I just hope the compiler is smart enough to spot that (value<<16)|(value>>16) is in fact just a simple ror #16.
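To try the layout outside the emulator, here is a sketch of the same technique as `static inline` functions (hypothetical names such as `st_read_b`), assuming a little-endian host. One deliberate difference from the macros: longs are built from two word accesses. A plain `*(uint32*)` dereference at an address that is only 2-byte aligned is undefined behaviour in C and is not guaranteed to load the expected bytes on older ARM cores; splitting the access avoids that. For 4-byte-aligned addresses on a little-endian host it gives the same value as the rotate-by-16 form.

```c
#include <stdint.h>
#include <string.h>

/* Word-swapped layout: each 68000 word is kept in host (little-endian)
   order at its own address, so the 68000 byte at address a is found at
   host offset a ^ 1. mem is the emulated RAM; addresses are offsets. */
static inline uint16_t st_read_w(const uint8_t *mem, uint32_t a)
{
    uint16_t v;
    memcpy(&v, mem + a, sizeof v);        /* plain host halfword load */
    return v;
}

static inline void st_write_w(uint8_t *mem, uint32_t a, uint16_t v)
{
    memcpy(mem + a, &v, sizeof v);        /* plain host halfword store */
}

static inline uint8_t st_read_b(const uint8_t *mem, uint32_t a)
{
    return mem[a ^ 1u];                   /* XOR the address, not the value */
}

static inline void st_write_b(uint8_t *mem, uint32_t a, uint8_t v)
{
    mem[a ^ 1u] = v;
}

/* Longs: high word at a, low word at a + 2, so a long that is only
   2-byte aligned never needs an unaligned 32-bit host load. */
static inline uint32_t st_read_l(const uint8_t *mem, uint32_t a)
{
    return ((uint32_t)st_read_w(mem, a) << 16) | st_read_w(mem, a + 2);
}

static inline void st_write_l(uint8_t *mem, uint32_t a, uint32_t v)
{
    st_write_w(mem, a, (uint16_t)(v >> 16));
    st_write_w(mem, a + 2, (uint16_t)v);
}
```

With `0x11223344` stored at 0 and `0x55667788` at 4, `st_read_l(mem, 2)` returns `0x33445566`, which is what a 68000 `move.l 2(a0),d1` expects.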

Note that if you store the words backwards like this, it's going to mess up any other code that accesses the memory. For instance, the screen draw code I gave earlier will now end up drawing the columns back to front; this could be corrected easily enough.

I guess the best thing to do would be to try the code, see if it works and how much faster it is (or isn't), and then, if it gives a good speed-up, I'll rewrite the screen redraw code (again) to handle getting everything backwards.

If you could post what the compiler produces from the above defines, it would help to see how good a job it has managed to do.

Thanks
Andrew

by **schtruck** » Oct 21, 2002 @ 10:26pm

by **schtruck** » Oct 21, 2002 @ 10:35pm

Just in case, here is what the compiler generates:

; 205 : WriteB(address + membase, value);

eor r3, r1, #1
ldr r1, [pc, #8] ; pc+8+8 = 00000014
ldr r1, [r1]
strb r3, [r1, +r0]

; 211 : WriteW(address + membase, value);

ldr r2, [pc, #8] ; pc+8+8 = 00000010
ldr r2, [r2]
strh r1, [r2, +r0]

; 217 : WriteL(address + membase, value);

mov r3, r1, lsr #16
orr r3, r3, r1, lsl #16
ldr r1, [pc, #8] ; pc+8+8 = 00000018
ldr r1, [r1]
str r3, [r1, +r0]

; 222 : return ReadB(address + membase);

ldr r1, [pc, #0xC] ; pc+8+12 = 00000014
ldr r1, [r1]
ldrb r3, [r1, +r0]
eor r0, r3, #1

; 227 : return ReadW(address + membase);

ldr r1, [pc, #8] ; pc+8+8 = 00000010
ldr r1, [r1]
ldrh r0, [r1, +r0]

; 232 : return ReadL(address + membase);

ldr r1, [pc, #0x10] ; pc+8+16 = 00000018
ldr r1, [r1]
ldr r0, [r1, +r0]
mov r3, r0, lsr #16
orr r0, r3, r0, lsl #16

by **AndrewGower** » Oct 21, 2002 @ 10:41pm

ReadB and WriteB looked strange, but they were in fact right :-) You have to XOR the address, not the value! It's to compensate for the fact that the bytes aren't stored in their normal place.
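The difference is easy to see in a small model. In the listing above the `eor` is applied to the data register (for example `eor r3, r1, #1` in WriteB, with the address in r0), i.e. to the value. This hypothetical sketch writes the two bytes of the 68000 word `0x1234` both ways into the word-swapped layout, then reads the host halfword back, assembled little-endian as an `ldrh` on a little-endian ARM would see it.

```c
#include <stdint.h>

/* Correct: XOR the address, so the 68000 byte at a goes to host a ^ 1. */
static void write_b_xor_addr(uint8_t *mem, uint32_t a, uint8_t v)
{
    mem[a ^ 1u] = v;
}

/* Wrong: XOR the value, which flips bit 0 of the data and leaves the
   byte at the unswapped offset. */
static void write_b_xor_value(uint8_t *mem, uint32_t a, uint8_t v)
{
    mem[a] = (uint8_t)(v ^ 1u);
}

/* The host halfword at a, assembled little-endian (what ldrh returns). */
static uint16_t host_halfword(const uint8_t *mem, uint32_t a)
{
    return (uint16_t)(mem[a] | (mem[a + 1] << 8));
}
```

With the address XORed, a ReadW of that word sees `0x1234`, as the 68000 expects; with the value XORed it sees `0x3513`.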

The compiler hasn't done an amazing job with ReadL and WriteL, but at least it's still better than what it was producing before, so it should still be a good speed-up.

by **schtruck** » Oct 21, 2002 @ 10:42pm

Just to confirm what I think: if we use this method, we'll have to apply it to the ROM (TOS) and to floppy disk sector reads too, no?

But how? Just with ReadB/WriteB, or do we need to apply ReadB/WriteB first and then ReadL/WriteL?
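One possible answer, reasoned from the macros rather than taken from the thread: ReadB and WriteB place the 68000 byte at address a at host offset a ^ 1, and ReadW/ReadL use the same layout, so a block loaded in 68000 byte order (a TOS image or a floppy sector) should only need each pair of bytes swapped once after loading, with no separate long-word pass. A sketch, with a hypothetical function name, assuming the block starts at an even address and has an even length:

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a buffer from 68000 byte order to the word-swapped layout by
   swapping each pair of bytes in place. Applying it twice restores the
   original order, so the same routine could prepare a sector for
   writing back to disk. */
static void to_word_swapped(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        uint8_t t = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = t;
    }
}
```

After conversion, the 68000 byte at offset 0 is found at host offset `0 ^ 1`, and a little-endian halfword load at offset 0 returns the original big-endian word.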
