Register
Site Search
Forums
Welcome to PocketMatrix. PocketMatrix is dedicated to providing the best online community for mobile device developers and enthusiests. What's new?

## Optimizing Division...

#### Optimizing Division...

hi all,

I need to divide two numbers by the same value (all fixed-point). It is obvious that doing the inversion first and then applying it twice is faster than two division. So this is my current code (BITS=8)
Code: Select all
`1 2 3 I64 rev = ((I64)1<<(BITS+32))/h;x = (I32)((hx*rev)>>32);y = (I32)((hy*rev)>>32);3 lines; 0 keywds; 4 nums; 34 ops; 0 strs; 0 coms    Syntactic Coloring v0.4 - Dan East   `

Yet, i'd need to speed this up more since this routine is called quite often in my code.

Does anybody know a fast (accuracy is important) inversion routine (1/value)?

thx,
Daniel
drgoldie
pm Member

Posts: 330
Joined: Jan 10, 2003 @ 10:46am
Location: Vienna

What range does h have? If it's small then a LUT would be suitable. If not, we've covered division/reciprocal routines in ARM code <a href="http://www.pocketmatrix.com/forums/viewtopic.php?t=11940&start=0">here </a>before

refractor
pm Insider

Posts: 2304
Joined: Feb 5, 2002 @ 1:12pm
Location: Luxembourg

h is usually something like 300-800, but it is difficult to say for sure...

Daniel
drgoldie
pm Member

Posts: 330
Joined: Jan 10, 2003 @ 10:46am
Location: Vienna

#### Re: Optimizing Division...

drgoldie wrote:hi all,

I need to divide two numbers by the same value (all fixed-point). It is obvious that doing the inversion first and then applying it twice is faster than two division. So this is my current code (BITS=8)
Code: Select all
`1 2 3 I64 rev = ((I64)1<<(BITS+32))/h;x = (I32)((hx*rev)>>32);y = (I32)((hy*rev)>>32);3 lines; 0 keywds; 4 nums; 34 ops; 0 strs; 0 coms    Syntactic Coloring v0.4 - Dan East   `

Yet, i'd need to speed this up more since this routine is called quite often in my code.

Does anybody know a fast (accuracy is important) inversion routine (1/value)?

thx,
Daniel

Daniel,

have you checked the code in the Vincent lib? The inversion routine can easily be adjusted to different fixed point positions...

- HM

hm
pm Member

Posts: 201
Joined: Dec 28, 2003 @ 8:47pm
Location: Seattle, WA

i was trying code very similar to that in Vincent (more or less identical) and it was a lot slower than the compiler's division routine (and less accurate).

i had to do 4 iterations to get enough accuracy but even with 2 iterations it was much, much slower...

i didn't do any benchmarks but the difference in my application was so big that i just switched back.

Daniel
drgoldie
pm Member

Posts: 330
Joined: Jan 10, 2003 @ 10:46am
Location: Vienna

Ok, i couldn't let that post stand as it is, so here's some benchmarks i did:

Performing 3 million divides with this routine (nCount=300, doing a sum to prevent compiler from optimizing the division away):

Code: Select all
`1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 int benchDiv(int nCount){    int j=nCount+1,        sum=0;    while(--j)    {        int i = 10001;        while(--i)        {#ifdef _DO_ORIGINAL_            sum += (1<<16)/i;#else            sum += EGL_Inverse(i);#endif        }    }    return sum;}20 lines; 10 keywds; 5 nums; 37 ops; 0 strs; 0 coms    Syntactic Coloring v0.4 - Dan East   `

Results from benching on an HTC Tornado Smartphone (200 MHz ARM CPU, VS2005 with /O2) are:

- compiler divides: 1020ms
- EGL_Inverse with 2 iterations: 1450ms
- EGL_Inverse with 4 iterations: 1930ms

Even the version with 4 iterations does not always give complete acurate results.

Am I doing something wrong or is the compiler's code just better (which i have to admit is what i expect from a good compiler)?

Daniel
drgoldie
pm Member

Posts: 330
Joined: Jan 10, 2003 @ 10:46am
Location: Vienna

BTW: i also benched the gppInvLP_16_32s and gppInvHP_16_32s functions from the Intel GPP (on an XScale device though...) and they are faster then the C code (EGL_Inverse) but still slower then the compiler's divide routine.

For me it seems as if the time for custom divide functions is over, isn't it?

Daniel
drgoldie
pm Member

Posts: 330
Joined: Jan 10, 2003 @ 10:46am
Location: Vienna

drgoldie wrote:h is usually something like 300-800, but it is difficult to say for sure...

Daniel

If you have that small of a range then try a 512 entry LUT and use it if your value is in that range. Set the range of your LUT to your most commonly used values.

mm40
pm Member

Posts: 135
Joined: Feb 21, 2003 @ 9:11pm

the range isn't small since accuracy is important and those values are fractional rather than integer.

i'll have to check the actual accuracy requirements. maybe it is enough to make a table from 300-800 with 0.1 stepsize (would be 100kb) and do a normal division for all other cases. maybe it is also worth interpolating between LUT values to gain more accuracy.

i'll try that and report when i have results...

Daniel
drgoldie
pm Member

Posts: 330
Joined: Jan 10, 2003 @ 10:46am
Location: Vienna

drgoldie wrote:the range isn't small since accuracy is important and those values are fractional rather than integer.

You said it's usually 300-800, which is only about 500 different possibilities, which is a small range. You could pre-calculate 500 floats or doubles and use multiplication instead of division, which would still be faster. Then you could do something like:
Code: Select all
`1 2 3 4 5 6 7 8 9 10 //Pre-calculate valuesfloat hlookup[512];for(int i=0;i<512;i++)  hlookup[i] = 1.0/(300+i);//Use the lookup if "h" is in the normal range (300-811)//Otherwise, fall back to using divisionI64 rev = ((h < 300) || (h > 811)) ? ((I64)1<<(BITS+32))/h : ((I64)1<<(BITS+32))*hlookup[(h-300)];x = (I32)((hx*rev)>>32);y = (I32)((hy*rev)>>32);10 lines; 3 keywds; 15 nums; 81 ops; 0 strs; 3 coms    Syntactic Coloring v0.4 - Dan East   `
Using fixed-point, there should be a more processor-friendly way to handle this, but I don't have much experience with fixed-point. I've glanced at it in the Allegro source though, which is rather nifty.

Or did you mean that "h" is already a floating-point variable? If that's the case, ignore everything I just typed.

Presto
pm Insider

Posts: 763
Joined: Jan 20, 2003 @ 5:51am
Location: Kalesian Archipelago

and the data range is actually from 100-2500.

so i created a table from 0 to 2500 with a step size of 1/16 (4 fractional bits source accuracy).

i tweaked the numeric range of the stored data to optimally fit into 16 bits. this results in 80KB memory consumption for the table. if this turns out to be a problem in the future i can reduce the size at the upper end since accuracy is quite reduced there anyways. in the lower part of the table accuracy is very good...

the result is pretty impressive. although a had only a single (fixed-point) division in my code, the performance went up from 40 to 68 image analysis per second (computer vision algorithm) on the HTC tornado (and from 200 to 235 on the X50v at 728MHz). this also means that the part of the code that was the major bottleneck until now is no longer a major one and it makes sense for me now to look more into other code sections too...

so thanks to everybody who contributed to this improvement (which opens oportunities for even more improvements...)!

Daniel
drgoldie
pm Member

Posts: 330
Joined: Jan 10, 2003 @ 10:46am
Location: Vienna

drgoldie wrote:For me it seems as if the time for custom divide functions is over, isn't it?

The code in Vincent is not particularly "custom"; it's a 16-entry table lookup plus 2-4 iterations of RN. I understand it is code that traces back to the standard math library on SGIs. Any chance to re-engineer the source for those faster routines and make them available in a similar form? That's would really come in handy for specific versions (like the 8.24 -> 24.8 division I need at some point in my code).

- hm

hm
pm Member

Posts: 201
Joined: Dec 28, 2003 @ 8:47pm
Location: Seattle, WA

### Forum Description

A discussion forum for mobile device developers on the Windows Mobile platform. Any platform specific topics are welcome.

### Moderators:

Dan East, sponge, Digby, David Horn, Kevin Gelso, RICoder

### Forum permissions

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum