I use a table for divides up to 256; I get a lot of them in my rendering code.
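For illustration, a table like that could be sketched as below. This is only a guess at the shape of such a table, not the one from my renderer: the names `recip_table`, `init_recip_table` and `div_small` are made up, and it assumes 16.16 reciprocals, divisors 1..256 and numerators below 65536.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
static uint32_t recip_table[257];   /* recip_table[d] = ceil(65536 / d), d = 1..256 */

static void init_recip_table(void)
{
    recip_table[0] = 0;             /* unused: division by zero is not handled */
    for (uint32_t d = 1; d <= 256; ++d)
        recip_table[d] = (65536u + d - 1u) / d;   /* reciprocal rounded up */
}

/* Approximates x / d for x < 65536 with one multiply and one shift.
 * Because the reciprocal is rounded up, the result is floor(x / d)
 * or one more than that. */
static uint32_t div_small(uint32_t x, uint32_t d)
{
    assert(d >= 1 && d <= 256);
    return (uint32_t)(((uint64_t)x * recip_table[d]) >> 16);
}
```

Call `init_recip_table()` once at startup. Power-of-two divisors come out exact; others can be one too high, which is usually fine for rendering but worth knowing about.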
I also wonder if you can use a 256-entry lookup table to simulate a full 16.16 / 16.16 operation.
1/(a+b) = 1/[a(1+b/a)] = 1/a * 1/(1+b/a)
Say you are dividing by 534 = 256*2 + 22 and set a=22. Then you have:
x/534 = x/22 * 1/(1+512/22)
or something. That might be rubbish, but it might work.
I'm sure you should be able to get really fast 16.16 division with just a 256-entry lookup table and multiplications.
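Here is a rough sketch of how the 1/(a+b) = 1/a * 1/(1+b/a) idea might look in code. It makes several assumptions that go beyond the post: a is taken as the leading 9 bits of the normalized divisor (so b/a stays below 1/256), 1/(1+b/a) is approximated by 1 - b/a, the values are unsigned, and the names `inv_table`, `init_inv_table`, `leading_zeros` and `fix_div_approx` are invented. With those assumptions the relative error should be about 2^-16 in the worst case, so treat it as an approximation rather than an exact divide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
static uint32_t inv_table[256];     /* inv_table[i] = round(2^32 / (256 + i)) */

static void init_inv_table(void)
{
    for (uint32_t i = 0; i < 256; ++i) {
        uint64_t a = 256u + i;
        inv_table[i] = (uint32_t)(((1ULL << 32) + a / 2) / a);
    }
}

/* Number of leading zero bits; v must be nonzero. */
static int leading_zeros(uint32_t v)
{
    int n = 0;
    while (!(v & 0x80000000u)) { v <<= 1; ++n; }
    return n;
}

/* Approximate unsigned 16.16 division: returns about (x << 16) / d.
 * The result is 64 bits wide because a small d can overflow 16.16. */
static uint64_t fix_div_approx(uint32_t x, uint32_t d)
{
    assert(d != 0);
    int      shift = leading_zeros(d);
    uint32_t n     = d << shift;                 /* top bit now set       */
    uint32_t i     = (n >> 23) - 256u;           /* table index, 0..255   */
    uint32_t b     = n & 0x007FFFFFu;            /* low 23 bits: the "b"  */
    uint64_t t     = inv_table[i];               /* about 2^55 / a        */
    uint64_t f     = ((uint64_t)b * t) >> 23;    /* b/a scaled by 2^32    */
    uint64_t r     = t - ((t * f) >> 32);        /* (1/a) * (1 - b/a)     */
    return ((uint64_t)x * r) >> (39 - shift);    /* undo the scaling      */
}
```

After calling `init_inv_table()` once, `fix_div_approx(x, d)` takes raw 16.16 values and uses one table lookup, three multiplications and some shifts. If more accuracy is needed, adding the next term of the series (+ (b/a)^2) or a Newton-Raphson step would be the obvious things to try, though I have not measured either.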