I don't have a clue what's going wrong; it looks OK to me.
But what you can do to find out yourself is to turn your C macro into a function and calculate the result, calculate the same result using your asm function, and then compare the two values. If the results differ, log the parameters to a file and exit the program.
That way you have a concrete numerical example where one version fails, and you can trace down exactly what's going wrong.
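A rough sketch of such a comparison harness might look like this. Since your actual macro and asm routine aren't shown, everything here is an assumption for illustration: `FIXMUL` is a made-up 16.16 fixed-point multiply standing in for your macro, `fixmul_asm` is a plain-C placeholder for your assembly function, and `fixmul_ref`, `next_random` and `compare_versions` are hypothetical helper names. A small deterministic PRNG (xorshift32) is used so that a failing run can be reproduced.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the macro under test (16.16 fixed-point multiply).
   Replace this with your real macro. */
#define FIXMUL(a, b) ((int32_t)(((int64_t)(a) * (int64_t)(b)) >> 16))

/* Wrap the macro in a real function so it evaluates its arguments once
   and can be called exactly like the asm version. */
static int32_t fixmul_ref(int32_t a, int32_t b)
{
    return FIXMUL(a, b);
}

/* Placeholder for the assembly routine. In the real program this would be
   something like `extern int32_t fixmul_asm(int32_t, int32_t);` with the
   body in a separate .s/.asm file; here it is plain C so the sketch compiles. */
static int32_t fixmul_asm(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 16);
}

/* xorshift32: tiny deterministic PRNG, so the same inputs are produced every run. */
static uint32_t rng_state = 2463534242u;

static uint32_t next_random(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* Compute both results for `iterations` random inputs. On the first mismatch,
   write the parameters and both results to `log_path` and exit the program.
   Returns 0 if no mismatch was found. */
int compare_versions(unsigned long iterations, const char *log_path)
{
    assert(log_path != NULL);
    for (unsigned long i = 0; i < iterations; ++i) {
        int32_t a = (int32_t)next_random();
        int32_t b = (int32_t)next_random();
        int32_t expected = fixmul_ref(a, b);
        int32_t actual = fixmul_asm(a, b);
        if (expected != actual) {
            FILE *log = fopen(log_path, "w");
            if (log != NULL) {
                fprintf(log, "iteration %lu: a=%ld b=%ld ref=%ld asm=%ld\n",
                        i, (long)a, (long)b, (long)expected, (long)actual);
                fclose(log);
            }
            exit(EXIT_FAILURE);
        }
    }
    return 0;
}
```

You would call something like `compare_versions(1000000, "mismatch.log")` early in `main`. If the program exits, the log file holds the exact `a`/`b` pair to feed into a debugger or a single-step trace of the asm code. Random inputs are only a starting point; it is also worth adding edge cases such as 0, -1, `INT32_MIN` and `INT32_MAX` by hand.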
Nils