Author Topic: tips to speed up processor intensive algorithm  (Read 5378 times)

0 Members and 1 Guest are viewing this topic.

• Supreme Robot
• Posts: 11,702
tips to speed up processor intensive algorithm
« on: June 09, 2008, 01:24:23 PM »
This is where my mechE skills fail . . . programming . . .

I'm programming my first real-time processor intensive algorithm, and I'm definitely pushing the limits . . . A solution needs to be found measured in milliseconds, or large PID error will result . . .

Anyway, my algorithm involves high precision trig (ruling out small lookup tables), doubles, long signed ints, multiplication/division of the before mentioned, and large arrays storing the results. Here are just a few examples of a much more complex calculation:
Code: [Select]

int amp_avg_F=11;
int amp_avg_R=15;
int amp_avg_U=5;
int amp_avg_D=8;

signed long int theta;//some angle in degrees

double weightA;//some value below 1
double weightB;

//once per loop
weightA=(amp_avg_U/amp_avg_F)/(tan(theta)+(amp_avg_U/amp_avg_F));
weightB=tan(theta)/(tan(theta)+(amp_avg_U/amp_avg_F));

//10 equations like this per loop

So my question . . . are there advanced programming techniques to speed these things up? Anything that I could trip up on? Is there a way to predict the total processing time given a particular math equation and a given processor? Know of tutorials to refer me to?

I probably got my variable types wrong, haven't paid much attention to that yet, and I always get confused using mixed type equations . . . if it looks wrong let me know . . .

hgordon

• Expert Roboticist
• Supreme Robot
• Posts: 373
Re: tips to speed up processor intensive algorithm
« Reply #1 on: June 09, 2008, 02:38:25 PM »
What precision do you need, and on what processor does this need to run ?
Surveyor Corporation
www.surveyor.com

• Supreme Robot
• Posts: 11,702
Re: tips to speed up processor intensive algorithm
« Reply #2 on: June 09, 2008, 02:50:46 PM »
(oops, forgot to mention that)

It's running on my Axon (but with an ATmega2560) at 16MHz.

Precision for calculating 'weightA' and 'weightB' should be at worst +/- .025 in the final result.

That bottom equation with the arrays can have an error more like +/- 10.

benji

• Supreme Robot
• Posts: 830
Re: tips to speed up processor intensive algorithm
« Reply #3 on: June 09, 2008, 02:59:33 PM »
i think this totally depends on the compiler and the already made functions ,
there are multiple algorithms to do a specific function,
the important thing in processors are just their clocking , 16 mhz would do it twice faster than 8mhz.
thats because the processor only understands machine language
there are some DSPs that lets you run 2 programs at the same time,, that can add up a lot too.
so the thing is maybe its more a software thing than a hardware thing.

so if your C compiler is efficient enough to give optimal assembly (or machine code) then this helps.

another thing is about the programmer himself,the better the algorithm doesnt always means the more precise, somtimes you need fast not very acurate answers,, or here comes the tricks with programming

example:
one day i needed to use the SINE function in a program
in debugging i noticed that this function takes about 2ms, and that was too long for the application
so i did a lookup table with the angles i have (they were about 12 angle ) so this made the algorithm run much faster

somtimes there are already built functions in a high level language made for general purpose, a good programmer should do another according to his program needs (ex: make it faster with some sacrifices maybe like loosing addition of double kind variables)

so if you program in C and you use the compiler or a library already made functions, try to modify them.

good ol' BeNNy

hgordon

• Expert Roboticist
• Supreme Robot
• Posts: 373
Re: tips to speed up processor intensive algorithm
« Reply #4 on: June 09, 2008, 05:10:58 PM »
I never use floating point libraries or hardware, but do use lookup tables and prescale my integer calculations in a way which is similar to floating point math by prescaling values and making certain that I don't end up with overflows.  I would have to see the range of actual numbers you expect to process to suggest exactly how to structure your equations, but as an example, if I needed 1% accuracy on a calculation that had a expected result around 1.00, I would scale things up by 1000 (or 1024).

Here's an example of a calculation used in some neural net code, where all of the values are scaled to a range of 0-1024 instead of 0.0-1.0.  In fact, the integer divides by 1024 could instead be performed with a right shift by 10.

E_HIDDEN(h) = (((N_HIDDEN(h) * (1024 - N_HIDDEN(h))) / 1024) * err) / 1024;

By knowing in advance the expected range of inputs and outputs, I can dial in the necessary accuracy with some very fast code.
« Last Edit: June 09, 2008, 05:12:54 PM by hgordon »
Surveyor Corporation
www.surveyor.com

Asellith

• Contest Winner
• Supreme Robot
• Posts: 648
• "I'm a leaf on the wind. Watch how I soar"
Re: tips to speed up processor intensive algorithm
« Reply #5 on: June 11, 2008, 11:33:50 AM »
This may be two specific but does theta have a min and max range for the application? then you could have a smaller more detailed lookup table if theta only falls in a specific range.

Also changing this line

weightA=(amp_avg_U/amp_avg_F)/(tan(theta)+(amp_avg_U/amp_avg_F));
weightB=tan(theta)/(tan(theta)+(amp_avg_U/amp_avg_F));

to:
denominator = (tan(theta)+(amp_avg_U/amp_avg_F);
weightA=(amp_avg_U/amp_avg_F)/denominator);
weightB=tan(theta)/denominator);

may speed up things a pit. That way it does less calculation. Just have to watch for truncation error on the denominator.

The only way I know to speed math calculations up other then developing a better algorithm is to bust out the machine language and code in that. Have fun with that because it could take a while to code and you might not speed it up all that much. Just lets you control the math functions better to streamline them to your algorithm.
Jonathan Bowen
CorSec Engineering
www.corseceng.com

Webbot

• Expert Roboticist
• Supreme Robot
• Posts: 2,165
Re: tips to speed up processor intensive algorithm
« Reply #6 on: June 11, 2008, 07:10:36 PM »
I would go a bit further than Asellith as there are still some values you are calculating more than once. If your compiler isn't a good optimiser, or you haven't enabled optimisation, then it will generate code to do the same calculations many times:

double tanTheta = tan(theta);
double uDivF = amp_avg_U/amp_avg_F;
double denominator = tanTheta +uDivF ;
weightA=uDivF /denominator);
weightB=tanTheta /denominator);

If the 'tan' function is taking up too much time and theta doesn't change every time this code is run then you could try having a global variable to store the old value. ie
// global variables initialised to a known value
double lastTheta = 0.5;
double lastTanTheta = tan(lastTheta);

double tanTheta;
if( theta == lastTheta ){
// its the same as last time - so no need to recalculate
tanTheta = lastTanTheta;
}else{
// its changed - so recalculate - and store again for next time
lastTanTheta = tanTheta = tan(theta);
lastTheta = theta;
}

Webbot Home: http://webbot.org.uk/
WebbotLib online docs: http://webbot.org.uk/WebbotLibDocs
If your in the neighbourhood: http://www.hovinghamspa.co.uk

• Supreme Robot
• Posts: 11,702
Re: tips to speed up processor intensive algorithm
« Reply #7 on: June 12, 2008, 07:44:32 AM »
Quote
then you could have a smaller more detailed lookup table if theta only falls in a specific range.
The range is from 180 degrees to -180 degrees. Forces me to use a rather large lookup table . . .

Quote
I would scale things up by 1000
Yea this is my thought too.

As such, this brings up another question . . . so I have tan(theta), but I assume theta must be in radians for it to calculate properly, and the solution will also be in radians, right? Doesn't this force me to use floating point? tan(1.5) != tan(1500)

Anyway, starting over from scratch I found a simpler yet more accurate/fast algorithm to solve the same problem. It uses a lot of constants and a lot less trig:
Code: [Select]
weightA=(Lu-Tu*tan(theta))/((Tf-Tu)*tan(theta)-Lf+Lu);
weightB=1-weightA;

pomprocker

• Supreme Robot
• Posts: 1,431
• Sorry miss, I was giving myself an oil-job.
Re: tips to speed up processor intensive algorithm
« Reply #8 on: June 12, 2008, 10:58:22 AM »
you guys blow me away with all this math. I wish I could keep up.

I'm actually in Linear Algebra right now, but still I have no clue when it comes to applying everything to the real world.

hgordon

• Expert Roboticist
• Supreme Robot
• Posts: 373
Re: tips to speed up processor intensive algorithm
« Reply #9 on: June 12, 2008, 11:53:13 AM »
As such, this brings up another question . . . so I have tan(theta), but I assume theta must be in radians for it to calculate properly, and the solution will also be in radians, right? Doesn't this force me to use floating point? tan(1.5) != tan(1500)
No - you just prescale the 1.5 to an integer value.  If you are using radians instead of degrees, your range is 0 to 6.28 radians, so you could multiply the radians by 100 to create 628 table entries.  Or you could note that there is redundant information in the table for 0 to 157, 158 to 314, 315 to 472, and 473 to 627, and set up the calculation based on your quadrant.
Surveyor Corporation
www.surveyor.com

JonHylands

• Expert Roboticist
• Supreme Robot
• Posts: 562
• Robot Builder/ Software Developer
Re: tips to speed up processor intensive algorithm
« Reply #10 on: June 12, 2008, 02:39:39 PM »
Use a lookup table. They (if done properly) end up in FLASH, and you've got 256 KB of the stuff.

- Jon

• Supreme Robot
• Posts: 11,702
Re: tips to speed up processor intensive algorithm
« Reply #11 on: June 13, 2008, 07:09:29 AM »
Quote
Quote
As such, this brings up another question . . . so I have tan(theta), but I assume theta must be in radians for it to calculate properly, and the solution will also be in radians, right? Doesn't this force me to use floating point? tan(1.5) != tan(1500)
No - you just prescale the 1.5 to an integer value.  If you are using radians instead of degrees, your range is 0 to 6.28 radians, so you could multiply the radians by 100 to create 628 table entries.  Or you could note that there is redundant information in the table for 0 to 157, 158 to 314, 315 to 472, and 473 to 627, and set up the calculation based on your quadrant
Oh, I meant if I was to *not* use a lookup table.

Quote
Use a lookup table. They (if done properly) end up in FLASH, and you've got 256 KB of the stuff.
I think I got my algorithm working fairly fast now . . . but mostly I'm avoiding the lookup table from a combination of laziness and the fact I'm still in a prototyping stage (I change the code very often).

I already know all these basic tricks of lookup tables, scaling by 100, compiler optimization, using assembly, and storing common calculations as a stored variable. In fact, the forum is has dozens of posts where I suggest this to other people already . . . heck I wrote the lookup table tutorial years ago!

Specifically, I wanted to optimize in terms of order of operations, whether dividing or multiplying is faster, bit operation efficiency, etc. I was looking for advanced techniques, stuff they teach only in advanced CS classes, etc.

I assume there are tutorials on this, but I really don't know where to look . . .

krich

• Robot Overlord
• Posts: 165
Re: tips to speed up processor intensive algorithm
« Reply #12 on: June 13, 2008, 10:30:19 AM »
I bumped into this recently.  Looks like Tangent is twice as evil as Sine or Cosine with respect to CPU utilization( tan(x)=sin(x)/cos(x) ).  I only skimmed it, but there's some really good tips in here, if you can get through the fairly advanced math.

One possibility is to generate the Tangent lookup table at startup and just store it somewhere.  This way you programatically decide your accuracy.

Regarding your question about tan(1.5) != tan(1500), no matter what the value you pass to the tan() function, the number is passed into the function as a floating point number, so you're still using the FP libraries.  No way around it unless you write your own tan() function that is expecting to be passed a long integer.  Even then, I'm not sure that the algorithm required to generate the tangent will work with a value that large or come up with the right answer.  Perhaps instead of multiplying by a factor of 10's, multiply by a factor of 6.28's.  That'll probably get you the right answer, but I'm not sure if that'll get you anything on the CPU utilization side of things.

EDIT:  Here's something that might be useful.  It's targeted towards those who want to learn how to do Tangent's in their head, but it looks like it deals mostly with integer math.
« Last Edit: June 13, 2008, 10:42:08 AM by krich »