Society of Robots  Robot Forum
Software => Software => Topic started by: Admin on June 09, 2008, 01:24:23 PM

This is where my mechE skills fail . . . programming . . .
I'm programming my first realtime, processor-intensive algorithm, and I'm definitely pushing the limits . . . A solution needs to be found in milliseconds, or large PID error will result . . .
Anyway, my algorithm involves high-precision trig (ruling out small lookup tables), doubles, long signed ints, multiplication/division of the aforementioned, and large arrays storing the results. Here are just a few examples of a much more complex calculation:
unsigned int Rib_1_Left_Adapt[9];
int amp_avg_F=11;
int amp_avg_R=15;
int amp_avg_U=5;
int amp_avg_D=8;
signed long int theta;//some angle in degrees
double weightA;//some value below 1
double weightB;
//once per loop
weightA=((double)amp_avg_U/amp_avg_F)/(tan(theta)+((double)amp_avg_U/amp_avg_F));//cast needed: plain int/int truncates to 0
weightB=tan(theta)/(tan(theta)+((double)amp_avg_U/amp_avg_F));
//10 equations like this per loop
Rib_1_Left_Adapt[time_position]= Rib_1_Left_F[time_position]*weightA + Rib_1_Left_U[time_position]*weightB;
So my question . . . are there advanced programming techniques to speed these things up? Anything that I could trip up on? Is there a way to predict the total processing time given a particular math equation and a given processor? Know of tutorials to refer me to?
I probably got my variable types wrong, haven't paid much attention to that yet, and I always get confused using mixed type equations . . . if it looks wrong let me know . . .

What precision do you need, and on what processor does this need to run ?

(oops, forgot to mention that)
It's running on my Axon (but with an ATmega2560) at 16MHz.
Precision for calculating 'weightA' and 'weightB' should be at worst +/- 0.025 in the final result.
That bottom equation with the arrays can have an error more like +/- 10.

I think this totally depends on the compiler and its ready-made functions;
there are multiple algorithms to implement any given function.
The important thing in a processor is just its clock: 16MHz would do it twice as fast as 8MHz,
because the processor only understands machine language.
There are some DSPs that let you run 2 programs at the same time; that can add up a lot too.
So the thing is, maybe it's more a software thing than a hardware thing:
if your C compiler is efficient enough to give optimal assembly (or machine code), then that helps.
Another thing is the programmer himself. The better algorithm doesn't always mean the more precise one; sometimes you need fast, not very accurate, answers, and here come the tricks of programming.
Example:
One day I needed to use the SINE function in a program.
While debugging I noticed that this function took about 2ms, and that was too long for the application,
so I built a lookup table of the angles I had (there were about 12 angles), and that made the algorithm run much faster.
Sometimes the ready-made functions in a high-level language are built for general purposes; a good programmer should write another according to his program's needs (e.g. make it faster with some sacrifices, maybe like losing the ability to add double-type variables).
So if you program in C and use the compiler's or a library's ready-made functions, try to modify them.

I never use floating point libraries or hardware, but do use lookup tables and prescale my integer calculations in a way similar to floating point math, prescaling values and making certain that I don't end up with overflows. I would have to see the range of actual numbers you expect to process to suggest exactly how to structure your equations, but as an example, if I needed 1% accuracy on a calculation that had an expected result around 1.00, I would scale things up by 1000 (or 1024).
Here's an example of a calculation used in some neural net code, where all of the values are scaled to a range of 0-1024 instead of 0.0-1.0. In fact, the integer divides by 1024 could instead be performed with a right shift by 10.
E_HIDDEN(h) = (((N_HIDDEN(h) * (1024 - N_HIDDEN(h))) / 1024) * err) / 1024;
By knowing in advance the expected range of inputs and outputs, I can dial in the necessary accuracy with some very fast code.
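For concreteness, a minimal C sketch of that calculation (the function and variable names are made up; only the scaling scheme comes from the post above):

```c
#include <stdint.h>

#define SCALE_SHIFT 10          /* 2^10 = 1024 */
#define SCALE       (1 << SCALE_SHIFT)

/* Integer version of e = n * (1 - n) * err, where n and err are
 * "real" values in 0.0-1.0 stored as integers in 0-1024.  The
 * divides by 1024 become right shifts by 10. */
int32_t hidden_error(int32_t n_hidden, int32_t err)
{
    return (((n_hidden * (SCALE - n_hidden)) >> SCALE_SHIFT) * err) >> SCALE_SHIFT;
}
```

For example, n_hidden = 512 (0.5) and err = 1024 (1.0) gives 256, i.e. 0.25 scaled up. Note a right shift only matches an integer divide for non-negative values, so this assumes the intermediate terms stay positive.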

This may be too specific, but does theta have a min and max range for the application? Then you could have a smaller, more detailed lookup table if theta only falls in a specific range.
Also changing this line
weightA=(amp_avg_U/amp_avg_F)/(tan(theta)+(amp_avg_U/amp_avg_F));
weightB=tan(theta)/(tan(theta)+(amp_avg_U/amp_avg_F));
to:
double denominator = tan(theta)+(amp_avg_U/amp_avg_F);
weightA=(amp_avg_U/amp_avg_F)/denominator;
weightB=tan(theta)/denominator;
may speed things up a bit. That way it does less calculation. Just have to watch for truncation error on the denominator.
The only way I know to speed math calculations up, other than developing a better algorithm, is to bust out the machine language and code in that. Have fun with that, because it could take a while to code and you might not speed things up all that much. It just lets you control the math functions better, so you can streamline them for your algorithm.

I would go a bit further than Asellith, as there are still some values you are calculating more than once. If your compiler isn't a good optimiser, or you haven't enabled optimisation, then it will generate code to do the same calculations many times:
double tanTheta = tan(theta);
double uDivF = (double)amp_avg_U/amp_avg_F;//cast so the division isn't done in integer math
double denominator = tanTheta + uDivF;
weightA = uDivF/denominator;
weightB = tanTheta/denominator;
If the 'tan' function is taking up too much time and theta doesn't change every time this code is run then you could try having a global variable to store the old value. ie
// global variables initialised to a known value
double lastTheta = 0.5;
double lastTanTheta = tan(lastTheta);
// Then in your code
double tanTheta;
if( theta == lastTheta ){
// it's the same as last time - so no need to recalculate
tanTheta = lastTanTheta;
}else{
// it's changed - so recalculate, and store again for next time
lastTanTheta = tanTheta = tan(theta);
lastTheta = theta;
}

then you could have a smaller more detailed lookup table if theta only falls in a specific range.
The range is from -180 degrees to 180 degrees. Forces me to use a rather large lookup table . . .
I would scale things up by 1000
Yea this is my thought too.
As such, this brings up another question . . . so I have tan(theta), but I assume theta must be in radians for it to calculate properly, and the solution will also be in radians, right? Doesn't this force me to use floating point? tan(1.5) != tan(1500) :P
Anyway, starting over from scratch I found a simpler yet more accurate/fast algorithm to solve the same problem. It uses a lot of constants and a lot less trig:
weightA=(Lu-Tu*tan(theta))/((Tf-Tu)*tan(theta)-Lf+Lu);
weightB=1-weightA;

you guys blow me away with all this math. I wish I could keep up.
I'm actually in Linear Algebra right now, but still I have no clue when it comes to applying everything to the real world.

As such, this brings up another question . . . so I have tan(theta), but I assume theta must be in radians for it to calculate properly, and the solution will also be in radians, right? Doesn't this force me to use floating point? tan(1.5) != tan(1500) :P
No - you just prescale the 1.5 to an integer value. If you are using radians instead of degrees, your range is 0 to 6.28 radians, so you could multiply the radians by 100 to create 628 table entries. Or you could note that there is redundant information in the table for 0 to 157, 158 to 314, 315 to 472, and 473 to 627, and set up the calculation based on your quadrant.

Use a lookup table. They (if done properly) end up in FLASH, and you've got 256 KB of the stuff.
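On the ATmega2560 a const table can be pushed into that flash with PROGMEM; a hedged sketch (the tiny 5-entry table is purely illustrative, and the #else branch just lets the same code compile on a PC):

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>       /* PROGMEM + pgm_read_word() */
#else
#define PROGMEM                 /* on a PC, just read from RAM */
#define pgm_read_word(addr) (*(const uint16_t *)(addr))
#endif

/* round(tan(deg) * 1024) for deg = 0..4 -- an illustrative slice;
 * a real table would cover the whole working range. */
static const uint16_t tan_q10[] PROGMEM = { 0, 18, 36, 54, 72 };

uint16_t tan_lookup_q10(uint8_t deg)    /* deg 0..4 */
{
    return pgm_read_word(&tan_q10[deg]);
}
```

Without PROGMEM (and the pgm_read_word accessor), avr-gcc copies const arrays into the 8KB of SRAM at startup, which a big trig table would quickly exhaust.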
 Jon

As such, this brings up another question . . . so I have tan(theta), but I assume theta must be in radians for it to calculate properly, and the solution will also be in radians, right? Doesn't this force me to use floating point? tan(1.5) != tan(1500)
No - you just prescale the 1.5 to an integer value. If you are using radians instead of degrees, your range is 0 to 6.28 radians, so you could multiply the radians by 100 to create 628 table entries. Or you could note that there is redundant information in the table for 0 to 157, 158 to 314, 315 to 472, and 473 to 627, and set up the calculation based on your quadrant.
Oh, I meant if I was to *not* use a lookup table.
Use a lookup table. They (if done properly) end up in FLASH, and you've got 256 KB of the stuff.
I think I got my algorithm working fairly fast now . . . but mostly I'm avoiding the lookup table out of a combination of laziness and the fact that I'm still in a prototyping stage (I change the code very often).
I already know all these basic tricks of lookup tables, scaling by 100, compiler optimization, using assembly, and storing common calculations in a variable. In fact, the forum has dozens of posts where I suggest this to other people already . . . heck, I wrote the lookup table tutorial years ago! :P
Specifically, I wanted to optimize in terms of order of operations, whether dividing or multiplying is faster, bit operation efficiency, etc. I was looking for advanced techniques, stuff they teach only in advanced CS classes, etc.
I assume there are tutorials on this, but I really don't know where to look . . .

I bumped into this recently. (http://www.research.scea.com/research/pdfs/RGREENfastermath_GDC02.pdf) Looks like tangent is twice as evil as sine or cosine with respect to CPU utilization (tan(x)=sin(x)/cos(x)). I only skimmed it, but there are some really good tips in here, if you can get through the fairly advanced math.
One possibility is to generate the tangent lookup table at startup and just store it somewhere. This way you programmatically decide your accuracy.
Regarding your question about tan(1.5) != tan(1500): no matter what value you pass to the tan() function, it is passed in as a floating point number, so you're still using the FP libraries. No way around it unless you write your own tan() function that expects to be passed a long integer. Even then, I'm not sure that the algorithm required to generate the tangent will work with a value that large or come up with the right answer. Perhaps instead of multiplying by a factor of 10's, multiply by a factor of 6.28's. That'll probably get you the right answer, but I'm not sure if that'll get you anything on the CPU utilization side of things.
EDIT: Here's something else that might be useful. (http://www.myreckonings.com/Dead_Reckoning/Online/Materials/Fast_Approximation_of_Elementary_Functions.pdf) It's targeted towards those who want to learn how to do tangents in their head, but it looks like it deals mostly with integer math.

krich this is exactly what I was looking for, thanks!
It's quite a lot to absorb though, and I've only skimmed through it quickly so far. It'll probably take me weeks to assimilate all the new ideas . . . someone should write a tutorial :P