When you say efficient, do you mean native CPU support, like on MIPS? Or do you mean just doing 64-bit integer math efficiently on a 32-bit platform (i.e. every other newer processor out there, excluding UltraSPARC, Itanium, and AMD's Hammer)? You probably mean both, but I don't know exactly what kind of performance penalty you'd take doing 64-bit math on a 32-bit system. If I do any optimizations anytime soon, I think I'll stick to 32-bit.
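For anyone wondering what "64-bit math on a 32-bit system" looks like under the hood, here's a rough sketch in C. The `u64_pair` type and `add64` function are names I made up for illustration. A real compiler does this for you when you use a 64-bit type, typically with an add followed by an add-with-carry, so this only shows the idea and measures nothing:

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit value stored as two 32-bit halves,
   the way a 32-bit CPU has to hold it in two registers. */
typedef struct {
    uint32_t lo;  /* low 32 bits  */
    uint32_t hi;  /* high 32 bits */
} u64_pair;

/* Add two 64-bit values using only 32-bit arithmetic. */
u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;          /* wraps around modulo 2^32 on overflow */
    carry = (r.lo < a.lo);       /* wrapped result is smaller, so carry = 1 */
    r.hi = a.hi + b.hi + carry;  /* propagate the carry into the high half */
    return r;
}
```

So an add costs about two 32-bit operations instead of one. Multiplies and especially divides need more steps than that, which is probably where most of the penalty would show up.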
BTW, my square-root algorithm works fine with 64-bit as well! Hehe... what would actually need to find square roots a lot? Maybe 3D transformations or something like that... Hey, maybe my algorithm can be used for something! Well, I looked at GBA dev sites that have all this info on fixed-point math and so on, and, well, there are about 10 other examples of doing what I did with my algorithm...

Now I don't feel so "special" anymore!
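For reference, here's a sketch of the kind of integer square root those sites tend to show. To be clear, this is the common bit-by-bit (binary digit-by-digit) method, not necessarily what my own routine does, and `isqrt64` is just a name I picked. It uses only shifts, adds, and compares, which is why it's popular on hardware with no FPU:

```c
#include <stdint.h>

/* Floor of the square root of a 64-bit unsigned integer,
   computed one result bit at a time (no multiply or divide). */
uint32_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;  /* highest power of four in 64 bits */

    /* Start from the largest power of four that is <= n. */
    while (bit > n)
        bit >>= 2;

    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;           /* this result bit is 1 */
            root = (root >> 1) + bit;
        } else {
            root >>= 1;                /* this result bit is 0 */
        }
        bit >>= 2;
    }
    return (uint32_t)root;             /* result always fits in 32 bits */
}
```

The same loop works for 32-bit input if you swap the types and start `bit` at `1 << 30`. For fixed-point values you'd still have to adjust the scaling of the result yourself.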

Die, Palm, Die. If that offended you, then get rid of your Palm OS device.