
Posted: Dec 14, 2004 @ 6:45pm
by StephC

Posted: Dec 14, 2004 @ 7:55pm
by drgoldie
Hi StephC,

Thanks for that hint.
The strange thing is that with the _Preload function the code actually becomes slower.
I changed my previous code so that it no longer adds but just reads from memory, and the time went down to 2.04 ms. The PLD version, on the other hand, takes 3.71 ms.
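For reference, here is a minimal C sketch of a "just read from memory" loop like the one described above. It assumes 32-byte cache lines (matching the 0x20 stride in the compiler output quoted in this post) and a word-aligned buffer. `touch_lines` and `WORDS_PER_LINE` are hypothetical names, not taken from the original code. The loop reads one word per cache line, addresses four lines from the same base index per iteration, and returns the sum so the compiler cannot discard the loads.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache lines, i.e. 8 words of 4 bytes. */
#define WORDS_PER_LINE 8

/* Hypothetical helper: reads the first word of every cache line in
   buf[0 .. words-1] and returns their sum so the loads stay live. */
uint32_t touch_lines(const uint32_t *buf, size_t words)
{
    uint32_t sum = 0;
    size_t i = 0;

    /* Unrolled: four lines per iteration, all addressed from the same
       base index i (byte offsets 0x00, 0x20, 0x40, 0x60). */
    for (; i + 4 * WORDS_PER_LINE <= words; i += 4 * WORDS_PER_LINE) {
        sum += buf[i];
        sum += buf[i + 1 * WORDS_PER_LINE];
        sum += buf[i + 2 * WORDS_PER_LINE];
        sum += buf[i + 3 * WORDS_PER_LINE];
    }

    /* Remaining lines, one at a time. */
    for (; i < words; i += WORDS_PER_LINE)
        sum += buf[i];

    return sum;
}
```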

The problem seems to be that in the LDR case the compiler generates the following:

ldr r0, [r2]
ldr lr, [r2, #0x20]
ldr r11, [r2, #0x40]
ldr r10, [r2, #0x60]

while in the PLD case it keeps updating the address register, which prevents that register from being used for the next PLD in the following clock cycle.

Any ideas on this? I'd still be interested to see how fast an optimal PLD solution would be. Maybe somebody could write a 'precompiled' assembler function that preloads x bytes using PLD (I'm not an assembler programmer). I'd love to check how fast that would be...
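Not the hand-written assembler asked for, but a C sketch of the same idea, substituting GCC/Clang's `__builtin_prefetch` for the `_Preload` intrinsic (on ARM targets that support PLD, GCC typically emits a PLD for it). `preload_range` and `LINE_BYTES` are hypothetical names, and 32-byte lines are an assumption. Four hints per iteration use constant offsets from one base pointer that is advanced only once per 128 bytes, mirroring the LDR pattern quoted above. Whether a given compiler actually keeps that addressing is not guaranteed, so the disassembly should be checked. The function returns the number of hints issued only to make it testable.

```c
#include <stddef.h>

/* Assumption: 32-byte cache lines, as suggested by the 0x20 stride. */
#define LINE_BYTES 32

/* Hypothetical helper: issues one prefetch hint per cache line in
   [buf, buf + bytes) and returns how many hints were issued. */
size_t preload_range(const void *buf, size_t bytes)
{
    const char *p = (const char *)buf;
    size_t hints = 0;
    size_t i = 0;

    /* Four hints per iteration at constant offsets from base q; the base
       is only recomputed once per 4 * LINE_BYTES bytes. */
    for (; i + 4 * LINE_BYTES <= bytes; i += 4 * LINE_BYTES) {
        const char *q = p + i;
        __builtin_prefetch(q);
        __builtin_prefetch(q + 1 * LINE_BYTES);
        __builtin_prefetch(q + 2 * LINE_BYTES);
        __builtin_prefetch(q + 3 * LINE_BYTES);
        hints += 4;
    }

    /* Tail: any remaining (possibly partial) lines. */
    for (; i < bytes; i += LINE_BYTES) {
        __builtin_prefetch(p + i);
        hints += 1;
    }

    return hints;
}
```

Usage would be to call `preload_range(src, n)` some time before the loop that reads `src`, so the lines are (hopefully) in cache by the time they are loaded.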

Besides that, I'm happy to see that this code is now more than 3x faster than the original implementation, which already looked fully optimized...

Daniel

Posted: Dec 15, 2004 @ 6:35pm
by Tala

Posted: Dec 15, 2004 @ 6:57pm
by drgoldie

Posted: Dec 15, 2004 @ 7:05pm
by Tala

Posted: Dec 15, 2004 @ 7:08pm
by drgoldie

Posted: Dec 16, 2004 @ 12:35am
by Tala

Posted: Dec 16, 2004 @ 10:40am
by drgoldie

Posted: Dec 17, 2004 @ 12:31am
by Tala

Posted: Dec 18, 2004 @ 2:38pm
by drgoldie

Posted: Sep 8, 2005 @ 4:13pm
by pappaxray

Posted: Sep 16, 2005 @ 10:38pm
by joshbu [MSFT]

Posted: Sep 16, 2005 @ 11:12pm
by drgoldie

Posted: Sep 16, 2005 @ 11:38pm
by Kzinti

Posted: Sep 19, 2005 @ 6:46pm
by pappaxray