
Re: Assembly On iPAQ

Posted: Dec 20, 2001 @ 6:28am
by Mole
The point of these optimisations is that they are academic. I come from an age of 4 MHz ARM 2 processors, where saving a handful of instructions per frame was very important. The extra development time this kind of work takes would not get past the accountants today! I just love messing with code. I hope most peeps here do too.<br><br>BTW<br>Gruk, you don't have to worry about branching as a technical issue, but it does flush the pipeline (it did anyway; ARM may have changed that by now), so you will/did lose a few cycles refilling it, which makes it a bit inefficient for very tight loops.<br>laterz<br>:)<br>Last modification: Mole - 12/20/01 at 03:28:14

Re: Assembly On iPAQ

Posted: Dec 20, 2001 @ 9:39am
by Dan East
Hey guys. I just figured out a way to do inline assembly with ARM. I've not tested this, because I haven't had time to write the tiny extra program it requires. But here's the technique:<br><br>1) Add the /FAs compiler flag to your C/C++ Project Options. That causes the compiler to produce an ASM listing for each source file, with the original C/C++ source included in the asm file as comments.<br><br>2) In your C/C++ source, call a predefined dummy function to mark where you want to start inlining. The call is needed to indicate exactly where in the output asm the inline asm is to be inserted. _asm() is a good choice.<br><br>3) Immediately after the _asm() call, place a comment block that contains your asm. Something like this:<br>[fixed]<br>int i=0;<br>_asm( ) ;<br>/*<br>  {<br>    mul       r8, r3, r6<br>    ;This is my asm comment<br>    mov       r3, r8, asr #5<br>    and       r7, r3, #0x3E, 22<br>  }<br>*/<br>[/fixed]<br>4) Now for the part that makes this work. A simple program runs as a custom build step. It scans the .asm file looking for calls to the _asm() function. When it finds one, it modifies the asm as follows:<br><br>a) Comment out the actual call to _asm, including any stack / register setup done before / after the branch.<br><br>b) Uncomment the block of asm the user has in their comments. This would be the lines within the /* {   } */ block.<br><br>c) Call armasm to reassemble the asm file, which now includes your inline asm code.<br><br>The main caveat is that the comments containing the inline asm must fall immediately after the branch to the _asm() routine. The custom program should probably check for that, and move the inline asm up if need be. A quick test showed that comments after a function call do appear directly after the branch. However, there may be cases where the compiler puts other asm before the comments, so that needs to be taken into consideration.<br><br>Dan East<br><br>Last modification: Dan East - 12/20/01 at 06:39:21
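Steps a) and b) above could be sketched roughly as the line-by-line rewriter below. It is only an illustration: the listing layout (source lines echoed as `; <number> : <text>`), the `bl` form of the marker call, and every name in the code (`State`, `echoed_source`, `is_marker_call`, `opens_block`, `rewrite_line`) are assumptions, not something checked against real /FAs output. Step c), running armasm on the result, is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Where we are relative to a marker call and its commented asm block. */
typedef enum { SCAN, AFTER_CALL, IN_BLOCK } State;

static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t') p++;
    return p;
}

/* ASSUMPTION: the listing echoes source lines as "; <number> : <text>".
   Returns the echoed text, or NULL for any other kind of line. */
const char *echoed_source(const char *line)
{
    const char *p = skip_blanks(line);
    if (*p++ != ';') return NULL;
    p = skip_blanks(p);
    if (!isdigit((unsigned char)*p)) return NULL;
    while (isdigit((unsigned char)*p)) p++;
    p = skip_blanks(p);
    if (*p++ != ':') return NULL;
    return skip_blanks(p);
}

/* ASSUMPTION: the marker call is emitted as "bl" with an operand naming _asm. */
int is_marker_call(const char *line)
{
    const char *p = skip_blanks(line);
    if (p[0] != 'b' || p[1] != 'l') return 0;
    if (p[2] != ' ' && p[2] != '\t') return 0;
    return strstr(p + 2, "_asm") != NULL;
}

/* True if the echoed text is the opening brace, optionally preceded by
   the C comment opener on the same line. */
int opens_block(const char *src)
{
    if (strncmp(src, "/*", 2) == 0) src = skip_blanks(src + 2);
    return *src == '{';
}

/* Rewrites one listing line from `in` into `out` (distinct buffers) and
   returns the state to use for the next line. */
State rewrite_line(State s, const char *in, char *out, size_t n)
{
    const char *src;
    assert(in != NULL && out != NULL && n > 0);
    src = echoed_source(in);
    snprintf(out, n, "%s", in);           /* default: pass the line through */

    if (is_marker_call(in)) {             /* step a: comment out the branch */
        snprintf(out, n, "; %s", in);
        return AFTER_CALL;
    }
    if (src == NULL) return s;            /* generated code: state unchanged */
    if (s == AFTER_CALL) {
        if (opens_block(src)) return IN_BLOCK;
        return strncmp(src, "/*", 2) == 0 ? AFTER_CALL : SCAN;
    }
    if (s == IN_BLOCK) {
        if (*src == '}') return SCAN;     /* a line starting with } ends it */
        snprintf(out, n, "\t%s", src);    /* step b: uncomment the asm line */
        return IN_BLOCK;
    }
    return SCAN;
}
```

A driver would read the .asm file one line at a time (newline stripped), feed each line through `rewrite_line`, and write `out` to the new file. Generated instructions that land between the branch and the comment block are passed through unchanged here; moving the inline asm above them, as the caveat suggests, would need extra buffering.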

Re: Assembly On iPAQ

Posted: Dec 20, 2001 @ 12:15pm
by Phantom
Very cool. That leaves us with just one problem: Addressing variables used in the C code... Any ideas how we could pass data to the asm at all? Perhaps we can extract variable addresses using the simple preprocessor that you mentioned?

Re: Assembly On iPAQ

Posted: Dec 20, 2001 @ 12:33pm
by Mole
Assembler is like Oracle, a right bugger to work with, but very good once it's working! :)<br>You could just write the whole of ur app in Notepad using assembler, then there would be no need for this messing about with C!<br><br>BTW pocketmatrix dudes, when's the developer section of the site going to open?<br>later<br>:)

Re: Assembly On iPAQ

Posted: Dec 20, 2001 @ 12:34pm
by Dan East
You're right. The ARM compiler assigns vars to registers and doesn't even hint at which variable went where. So that really compromises the usefulness of this add-on inline assembly.<br><br>Digby, is the } character valid / ever used in ARM ASM statements?<br><br>Dan East

Re: Assembly On iPAQ

Posted: Dec 20, 2001 @ 12:37pm
by Mole
Yes, it is, on multi-register loads:<br>LDMIA r12,{r1,r2,r3,r4}<br>soz
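Since `}` can appear inside an instruction, a scanner cannot treat any `}` as the end of the inline block. One possible rule, sketched here, is to accept only a line where `}` stands on its own, optionally followed by the C comment closer. The function name `closes_inline_block` is made up for this example, and the rule itself is an assumption about how the end marker might be recognised.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 only when the text is a lone "}" (optionally followed by the
   C comment closer), so register lists such as {r1,r2,r3,r4} inside an
   instruction are not mistaken for the end of the inline block. */
int closes_inline_block(const char *text)
{
    assert(text != NULL);
    while (*text == ' ' || *text == '\t') text++;
    if (*text != '}') return 0;
    text++;
    while (*text == ' ' || *text == '\t') text++;
    if (strncmp(text, "*/", 2) == 0) text += 2;
    while (*text == ' ' || *text == '\t' || *text == '\r' || *text == '\n')
        text++;
    return *text == '\0';
}
```

With this rule the `LDMIA r12,{r1,r2,r3,r4}` line is treated as ordinary asm, while both `}` on its own line and `}*/` end the block.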

Re: Assembly On iPAQ

Posted: Dec 20, 2001 @ 12:48pm
by Digby
Jacco, you can get the compiler to do this with a little trick.  The first 4 function parameters are passed in R0..R3.  If you pass these variables to your dummy function, then you'll know what registers they will be in before the call.<br><br>_asm(foo, bar);<br>/*<br>;R0 = foo<br>;R1 = bar<br>*/
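On the C side, the trick might look like the sketch below. The marker is named `asm_here` instead of `_asm` only so the file also builds on a desktop compiler, where it is an empty stub; that name, `multiply_demo`, and the asm inside the comment are all hypothetical. The asm writes to r2 because r0-r3 are scratch registers across a call under the ARM calling standard, so the compiler already assumes the call may overwrite them.

```c
#include <assert.h>

/* Hypothetical marker function. Its only job is to make the compiler load
   its arguments into R0 and R1 just before the branch; the body is empty. */
void asm_here(int a, int b)
{
    (void)a;
    (void)b;
}

int multiply_demo(int foo, int bar)
{
    int result = foo * bar;   /* plain C version kept as the reference */

    asm_here(foo, bar);
    /*
      {
        ; R0 = foo
        ; R1 = bar
        mul r2, r0, r1
      }
    */
    return result;
}
```

For this variant the register loads the compiler emits before the branch have to stay in the listing; only the branch itself would be commented out. Getting a value back out of the asm block is not covered by this sketch.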

Re: Assembly On iPAQ

Posted: Dec 20, 2001 @ 12:51pm
by Dan East
Another problem. It appears it is not guaranteed that the compiler will output every line of source to the asm file. It seems to drop portions of the source during the optimization phase. In this example:<br>[fixed]<br>void f() {<br>  _asm();<br>  /* {<br>  add r1, r2<br>  div r1, r2<br>  }*/<br>}<br>[/fixed]<br>the compiler does not dump the commented lines to the asm file. However, if I include meaningful C code underneath the inline section (code that will not get optimized out), then the comments go through.<br><br>I have the preprocessor done. I still need to have it call the ARM assembler on the modified asm file, which is a trivial step.<br><br>Dan East
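One candidate for "meaningful C code" that the optimizer has to keep is a store to a volatile variable placed right after the comment block. Whether that is enough to make the ARM compiler emit the comments into the listing is untested here; the sketch only shows the shape of the workaround, and `asm_anchor`, `asm_mark` and `f_with_anchor` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical anchor variable: volatile stores may not be optimized away. */
volatile int asm_anchor = 0;

/* Hypothetical marker function standing in for the thread's _asm(). */
void asm_mark(void)
{
}

void f_with_anchor(void)
{
    asm_mark();
    /* {
      add r1, r1, r2
    } */
    asm_anchor = 1;   /* real code after the block, so the function
                         does not end at the comment */
}
```

The store costs an extra instruction or two per inline block, which matters if the block sits inside a tight loop.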

Re: Assembly On iPAQ

Posted: Dec 20, 2001 @ 1:17pm
by Dan East
Digby, that's excellent - and provides another practical use for the dummy function. However, I want to have the preprocessor (for lack of a better term - it is actually a pre-link processor) strip out any register / stack stuff before the _asm() function. The compiler has to copy those vars into the registers immediately before the inline code, correct? So we still aren't going to be as efficient as inlining natively supported by the compiler.<br><br>Another question. Exactly what opcodes are emitted before / after a branch/link to prepare the registers, etc? It looks like stmdb is used to save registers / stack state. Which of those would be needed to set r0-r3, and which relate to the stack and should be removed?<br><br>Dan East

Re: Assembly On iPAQ

Posted: Dec 20, 2001 @ 3:21pm
by Digby
CE currently only supports the _cdecl calling convention, which means the caller is responsible for any cleanup associated with the call. In your case, however, the caller shouldn't be doing anything but loading registers prior to the call to your dummy function, provided that you don't pass more than 4 parameters.  The compiler should emit a BL instruction to call your routine.  If the function returns a value, it comes back through R0.<br><br>More info on this can be found in the eVC Docs under Microprocessor Reference/ARM/ARM Calling Standard.<br><br>Good luck guys.  Personally, I think what you're trying to do is overkill and will probably end up costing you more than you'll gain.  However, it is interesting to see your approach to working around a compiler limitation, and I'll contribute info when I can.<br><br>I thought about doing something similar to this months ago, to generate blocks of assembly code at run-time for the inner loop of a rasterizer, depending on what render states were set for the polygons. In the end, I just hand-optimized a few of the more popular cases, and the others have tests in the inner loop.<br><br>Oh, a few other things: The latest version of the ARM compiler from MS still doesn't support inline assembly and probably never will.  So that means if you guys get something to work, it won't be meaningless when Talisker releases next month.  Did you know that every time you use inline assembly on the x86 compiler, the compiler disables optimization for that entire function?  That means none of the C/C++ code in the function will be optimized.  That, very often, is not a good thing.

Re: Assembly On iPAQ

Posted: Dec 21, 2001 @ 4:27am
by Phantom
Digby,<br><br>For me the main reason to want an inline assembly thingy is the way I develop assembly code: Usually I start with some good C code, then I prepare the C code for conversion to asm, then I convert some portions and test whether or not the whole thing still works. This is where inline assembly comes in handy: I can convert code line-by-line, which is a Good Thing. Once the entire function is converted, it can be moved to a separate .asm file of course, but even then, if it's one of the members of a class, I would rather have it in the same source file as the rest of the methods.<br><br>But the line-by-line conversion is what I really need it for. And if Dan's approach means it's going to be a bit slower than a pure asm file, so be it. BTW, I doubt that it will be significantly slower. If I draw a 64x64 sprite with alpha blending, I expect a huge gain from asm. 8 assignments (4 to load the registers, 4 to restore them) will not undo that advantage.<br><br>- Jacco.