02 - Optimisation fundamentals

Now we know how memory is used within a micro-controller then lets look at the options a compiler can use for optimising your code.

 

We will start by considering a very simple (but useless!) piece of code:


int j=0;

for(int i=0; i<100;i++){

j+=i;

}

postIt(++j);


Most compilers will decompose this C syntax by replacing all of the constructs like (for, while, do until, ++, --, etc) to produce a more simple language made up only from assignments and arithmetic:-

int i;

int j;

j = 0;

i = 0;

goto while;

loop:

i = i + 1;

 

while:

if i >= 100 then goto end

j = j + i;

goto loop

end:

j = j + 1;

postIt(j);


Why does it do this? Well you can decompose the syntax of most languages into such simple steps. Having done so then the process of converting this ‘simple’ language into machine code can then be common. So it makes life easier when creating say a Basic and a C compiler – as once the program is in this ‘common’ language then the same compiler can be used to convert it into machine code.

 

If you compile the original program with no optimisation then you will get something similar to the above.

 

How can the compiler do it better?

 

The variables ‘i’ and ‘j’ in our program are stored in RAM (either in the Data Segment if they are global, or in the Stack Segment if they are local to a method). So the processor will have created enough space to store them there. Lets assume for now that an ‘int’ is stored as a single byte (this is not the real world case!).

 

The ‘post-boy’ analogy

 

Think of memory as a long line of pigeon-holes in the post room. Each slot has its own unique location (or ‘address’ in geek speak) and can store, in our case, one piece of paper (or a byte) that has a current value written on it. So our program allocates two new pigeon-holes:- one for ‘i’ and one for ‘j’ and stores a blank sheet of paper into each one (ie the variables start off un-initialised).

 

So if I told the ‘post boy’ that ‘j = j + i’ he would have to run/walk to get the paper in slot I, look at the value written on it and put it back, go to slot ‘j’, look at the value on it, put it back, add the two values together in his head, go to slot ‘j’ and write the answer on the sheet in slot ’j’ and put it back.

 

Let’s convert our program into ‘post-boy’:


Our program

Un-optimised Post-Boy code

int i;

Find an empty pigeon-hole, label it as ‘i’, and insert a blank sheet of paper – ie the variable is ‘un-initialised’.

int j;

Find an empty pigeon-hole, label it as ‘j’, and insert a blank sheet of paper – ie the variable is ‘un-initialised’.

j = 0;

Walk to ‘j’, write 0 on the paper, and put it back. Now the variable has an initial value.

i = 0;

Walk to ‘i’, write 0 on the paper, and put it back. Now the variable has an initial value

goto while;

Goto while:

loop:

Loop:

i = i + 1;

Walk to ‘i’, add 1 to the value, and put it back

 

while:

While:

if i >= 100 then goto end

walk to ‘i’, look at the value, and put it back. If its >=100 then goto end

j = j + i;

walk to ‘j’, look at the value, walk to ‘i’, look at the value, walk back to ‘j’, write the result, and put it back

goto loop

Goto loop

end:

End:

j = j + 1;

walk to ‘j’, add one to it, and put it back

postIt(j);

walk to ‘j’, get the paper, and ‘postIt’


 

Phew! That’s slow – and has a lot of pointless running around. Welcome to ‘non-optimised code’. It does what you’ve asked it to do - but not very well .We’ll come back to this in a minute.

 

Just like the post-boy a processor doesn’t like having to access RAM – all that running around is slow.

 

To make things faster the processor has a very few other items of memory called registers. These sit right at the heart of the chip, and are very fast to access. But there ain’t many of them and some are allocated to special things like stack pointers, program pointers etc. Since there are a finite number of them available then they are given names like: A,X,Y or R1,R2,R3 etc depending on the specific processor. Since each processor has a different number of registers then this is one of the reasons why you have to tell the compiler what sort of processor you are compiling the program to run on. We will assume that there are two registers and that they are called R1 and R2.

 

N.B. If you look at the instructions available on an AVR, and most processors, then anything that requires memory access (ie a walk to the ‘pigeon-hole’) typically takes twice the amount of cycles (time) than operators that do something with registers. Also there are normally no machine code instructions that allow you to fetch something from memory, do something to it (like adding a fixed value) and then writing it back to the same, or a different, memory location without using a register. Everything has to go via a register. You can load from memory to a register, add/subtract etc registers together, then write the answer back to a memory variable. So all processors must have at least one register.

 

So back to our post-boy…

 

He has two registers as well but he calls them his ‘left hand’ and his ‘right hand’ – on which he can write a value with a pen. Once he has copied the value of a pigeon-hole onto his hand then he can see its value very quickly without having to run back to the pigeon-hole every time.  Therefore it makes sense if he keeps both hands full at any given time. 

 

If he then needs to access a third pigeon hole he will have to ‘dump’ one of the ones he has written on his hands so that he can write down the new value from the third pigeon hole.

 

Obviously: the decision as to which hand he should free up is an important decision.

A common technique is ‘least frequently used’. This means that he puts back the value from the hand that he hasn’t needed to look at for a while – by writing it back down on the slot where it belongs.

Other solutions may take the variable size into account – ie a ‘byte’ only needs one hand but a ‘long’ may require two hands so freeing a ‘long’ may release more ‘hands’.

Alternatively: the value on one hand may not have changed – in which case he can just rub it out without having to go back to the pigeon hole to save the new value.

 

Before we proceed:- there is one more very important thing the compiler needs to be able to do. In ‘Post-Boy’ speak I will call this ‘Wash your hands’. This means: copy any changed values written on your hands back into the pigeon-holes where they belong. Why do we need this? If you have a loop in your code (for, while etc) then every time the loop starts we have no idea what information is stored on what hand at the previous end of the loop and so we need to start with a clean pair of hands. Also if we are going to call another method then we have no idea what will be in our hands when we call it (since it may be called by 100 different places in our code) so we need to ‘wash our hands’ before calling a method. For the same reason a method will normally ‘wash our hands’ when it exits – so that any modified values are written back since the 100 places that call it cannot depend on what may be in each hand. The only exception is a function – ie a method that returns a value – in which case it is assumed that the return value is always in a given register – your left hand say. Other branches such as GOTO then need to do the same thing – since, for the code where we end up, we don’t know what will be in each hand.

 

So now lets look at the previous example but this time we want to optimise it so that we don’t do as much slow walking about:-

Our program

Optimised Post-Boy code

Left Hand

Right Hand

int i;

Find an empty pigeon-hole, label it as ‘i’, and insert a blank sheet of paper – ie the variable is ‘un-initialised’.

Unused

Unused

int j;

Find an empty pigeon-hole, label it as ‘j’, and insert a blank sheet of paper – ie the variable is ‘un-initialised’.

Unused

Unused

j = 0;

Find a free hand (Left) and write ‘0’ on it remembering that it should be stored back in j.

j=0 (changed)

Unused

i = 0;

Find a free hand (Right) and write ‘0’ on it remembering that it should be stored back in i.

j=0 (changed)

i=0 (changed)

goto while;

About to do a goto so: Wash-your-hands:

Left hand has changed so walk to j and put the value ‘0’.

Right hand has changed so walked to i and put the value 0.

Goto while:

j=0 (unchanged)

i=0 (unchanged)

loop:

Loop:

Start of a loop so assume its empty

Start of a loop so assume its empty

i = i + 1;

Walk to ‘i’, find a free hand (Left) and write it down

i (unchanged)

Empty

 

add 1 to the value

i + 1 (changed)

Empty

 

About to do a goto so: Wash-your-hands:

Left hand has changed so walk to I and put its new value of I=I+1. Left hand now stores new value of I

Goto while

i (unchanged)

Empty

while:

While:

Start of a loop so assume its empty

Start of a loop so assume its empty

if i >= 100 then goto end

walk to ‘i’, find a free hand (Left) and write it down.

 

i (unchanged)

Empty

 

If its >=100 then goto end

The goto will ‘Wash your hands’ but since no value has changed then nothing needs to be written. Otherwise the left hand keeps the same value.

i (unchanged)

Empty

j = j + i;

walk to ‘j’, find a free hand (Right) and write it down

i (unchanged)

j (unchanged)

 

Add left hand to right hand

i (unchanged)

j = j + i (changed)

goto loop

Wash your hands

Left hand is unchanged so do nothing.

Right hand is changed so walk to j and store right hand.

Goto loop

Empty

Empty

end:

Could have got here by various Gotos (which will have washed-their-hands) so assume everything is empty

End:

Empty

Empty

j = j + 1;

walk to ‘j’, find a free hand (Left) and write it down

j (unchanged)

Empty

 

Add one to it

j=j+1 (changed)

Empty

postIt(j);

Calling a method so wash-your-hands: save left hand to j

 

j (unchanged)

Empty

 

Call ‘postit’ passing variablej

j (unchanged)

Empty

 

Since ‘postit’ doesn’t return a value then assume all hands are empty upon return

Empty

Empty

 

I know it seems more complex – but believe me we have saved a very small amount of walking and so it runs a bit faster. But it’s still not very efficient – ie there’s still a lot more ‘walking’ than we need.

 

How can we do better?

Examining the code above then the variable ‘I’ is used quite a lot. We only have to walk to/from it when after/before we ‘wash-your-hands’. Wouldn’t it be great if we could tell the compiler that we actually always wanted to keep it in our hands and avoid all that walking about. Well you can do this by using the keyword ‘register’ before the name of the variable. Ie change

int i;

To

register int i;

 

This will then try to reserve the first register (left hand) to be used exclusively for storing this variable and so the normal wash-your-hands can ignore it. The only exception is when calling another method which may still use any registers (for example: it may define its own ‘register’ variables). So before calling a method then your routine will still save any unchanged value for your register variables prior to calling another method, and will assume that the register is empty upon return. So now our code looks like this:-

 

Our program

Optimised Post-Boy code

Left Hand

Right Hand

register int i;

Find an empty pigeon-hole, label it as ‘i’, and insert a blank sheet of paper – ie the variable is ‘un-initialised’.

If there is an available register (yes – Left Hand) then reserve it ;for variable i.

i (unchanged)

Unused

int j;

Find an empty pigeon-hole, label it as ‘j’, and insert a blank sheet of paper – ie the variable is ‘un-initialised’

i (unchanged)

Unused

j = 0;

Find a free hand (Right) and write ‘0’ on it remembering that it should be stored back in j.

i (unchanged)

j=0 (changed)

i = 0;

I is in the left hand so set it to zero

i=0 (changed)

j=0 (changed)

goto while;

About to do a goto so: Wash-your-hands:

Left hand is a ‘register’ variable so leave it alone.

Right hand has changed so walked to j and put the value 0.

Goto while:

i=0 (changed)

j=0(unchanged)

loop:

Loop:

I (changed)

Start of a loop so assume its empty

i = i + 1;

We have i in our left hand so just add 1 to it

i=i+1 (changed)

Empty

 

About to do a goto so: Wash-your-hands:

Left hand is a register variable so leave it alone.

Right hand is empty so do nothing

i (changed)

Empty

while:

While:

Start of a loop so assume it has the current value of i

Start of a loop so assume its empty

if i >= 100 then goto end

i is already in left-hand

 

i (changed)

Empty

 

If its >=100 then goto end

The goto will ‘Wash your hands’ but since the left hand is a register variable and the right hand is empty then do nothing.

i (changed)

Empty

j = j + i;

walk to ‘j’, find a free hand (Right) and write it down

i (changed)

j (unchanged)

 

Add left hand to right hand

i (changed)

j = j + i (changed)

goto loop

Wash your hands

Left hand is a register variable so do nothing.

Right hand is changed so walk to J and store right hand.

Goto loop

Empty

Empty

end:

Could have got here by various Gotos (which will have washed-their-hands) so assume everything is empty except left hand always stores i

End:

i (changed)

Empty

j = j + 1;

walk to ‘j’, find a free hand (Right) and write it down

i (changed)

j (unchanged)

 

Add one to it

i (changed)

j=j+1 (changed)

postIt(j);

Calling a method so wash-your-hands including any register variables.

Walk to i and save left hand, walk to j and save right hand

 

Empty

Empty

 

Call ‘postit’ passing variable j

Empty

Empty

 

Since ‘postit’ doesn’t return a value then assume all hands are empty upon return.

But reload ‘Left’ with register variable ‘i’ since its always meant to be there

i (unchanged)

Empty

 

What you should find is that almost all of those expensive walk-to-i steps have vanished. So this code will be:-

a)      smaller – as it doesn’t have to include the instructions to keep loading from i, and saving to j

b)     faster – because all of the instructions in a) are quite slow and are no longer used.

 

So now you may think that putting ‘register’ in front of EVERY variable will be great. Well the problem is you may have 16 variables but only 3 registers – in which case the compiler cannot honour your request. So you should consider the ‘register’ keyword as a ‘hint’ to the compiler – ie you are trying to help it optimise the code but it may not be able to honour what you have asked it to do in which case it may ignore some/all of your ‘register’ keywords. So use them sparingly. In fact most modern compilers will inspect the code and automatically try to decide which variables are best kept in registers and may disregard the ‘register’ keyword all together. But whether your compiler does it automatically, or respects your ‘register’ keywords, the fact is: the code is much faster.

 

So, in summary, we can make the program smaller and thus faster by reducing the number of round trips to the RAM to read and/or write variables un-necessarily.


What is different then when you optimise for speed?

The speed at which the program runs will depend not only on the number of bytes of program code that it has to execute but may also be affected by something called pipelining.

 

We already have post-boy running around doing things – but how does he know what it is that he should be doing next? He’s controlled by the program. But the program is also stored in Flash Memory and someone or something needs to be reading the next command from the program in order to tell post-boy what to do next. Most processors have a dedicated register called the Program Counter that contains the offset into the program for the next instruction. Of course we could potentially tell post-boy that after he has done something: he should look at the the program counter and run to that memory location to get the next command, and do what it asks him to do. Again this is slightly inefficient. Why not have a post-boy-supervisor who tells him what to do?  Then whilst post-boy is running around doing what he has been told to do (such as walk to ‘j’, and come back with its current value in your left hand) then post-boy-supervisor could, simultaneously, be running to the program to find out what he should tell post-boy to do next. Makes sense. Ooh and by the way – the processor just does this for you so just take it as read.

 

So where does ‘pipelining’ come in?

 

Well lets assume post-boy is very busy running around with bits of data. In the meantime ‘post-boy-supervisor’ may already have come back with the next instruction. Rather than waiting for post-boy to come back the supervisor goes and gets as many more program instructions as he can. So he reads ahead (called prefetch) and thus helps predict the next instructions that post-boy will need. This is fine until supervisor reaches a branching instruction (if, for, while, do, switch, goto, call another method etc). Since most of these will depend upon the contents of the variables that post-boy comes back with then supervisor will not know where the program will actually go until post-boy comes back with the actual answers. So all he can do is keep pre-fetching the next instruction in the program. When you come to branches then, more often than not, the supervisor has got it wrong and the program actually ends up going somewhere different. In which case all of the prefetched instructions are from the wrong part of the program and must be discarded. Now supervisor has to start getting the instructions from the new position.

 

Optimising for speed can therefore generate code which tries to reduce the number of branches. This often means duplicating some code into both the ‘then’ and ‘else’ elements of an ‘if’ statement - for example. So the program is bigger but because there are less branches then the supervisor can be more efficient at pre-fetching and so post-boy doesn’t spend as much time waiting for the supervisor to come back with the next command for him to do.

 

So if you tell your compiler to ‘optimise for speed’ then it will re-arrange your code for you automatically. Just be tolerant of the fact that it may actually make your program bigger in size.

 

Things to avoid so that post-boy and post-boy-supervisor can work at a reasonable speed

 

Here is an example of where trying to optimise stuff yourself by changing your code can have hidden problems. Lets assume that your method does ‘i = i + 1’ quite a lot. You may be tempted to create a method:

int increment(int i){

return i+1;

}

 

and change the rest of your code so that it instead of:

i = i +1

It does

i = increment(i);

 

This is bad for a number of reasons:

  1. Post-boy-supervisor cannot pre-fetch very effectively as the program keeps calling your method so he has to keep throwing away the instructions he has already retrieved.
  2. Post-boy keeps having to ‘wash-your-hands’ since it has no idea what registers the method is going to need. So if he has ‘i’ as a register variable then he will have to keep storing it back into its pigeon hole every time the routine is called and reloading it upon return.
  3. Every method has some code automatically inserted by your compiler at the start and at the end. So this additional code, plus the code caused by ‘wash-your-hands’ whenever its called will probably actually mean that your program is bigger and runs slower.
  4. A lot of very small methods tend to make your code very hard to understand to the reader.

 

Methods should do something sensible and not just be one-line bits of code. If you really want to do something like this then see ‘#define’ later on.

 

Simple things that make a big difference - use the correct variable type

Although the compiler can help to optimise the code based on what you have written – it is very difficult for it to correct any mistakes you have made. So the big thing you can do to help is to choose your variable types sensibly. Most microcontroller code deals with either ‘numbers’ or ‘characters’. Characters = ‘text’ and is normally logged out to an LCD, UART, computer etc. The standard definition for a ‘char’ is an 8 bit byte and is the smallest lump of data we can deal with – so cannot be optimised further. But ‘Numbers’ can be a minefield. C, like most languages, allows you to define ‘number’ variables that can store different ranges of numbers. Hence: a ‘long’ can store a bigger integer number than you can store in an ‘int’. But, in my opinion, the biggest problem with C is that the size of an ‘int’ depends on the hardware it is running on – (before the purists beat me up I know ‘why’ this is the case). As a programmer you normally know what the range of legal values will be and so you need to make sure your program will cope with this range no matter what platform it actually runs on. The ‘avr-lib’ helps you out here by creating some type definitions for different sized variables. I recommend that you use them. They include:-

int8_t – Can store a number from –128 to +127 (requires 1 byte)

uint8_t – Can store a number from 0 to 255 (requires 1 byte)

int16_t – Can store a number from –32,768 to +32,767 (requires 2 bytes)

uint16_t – Can store a number from 0 to 65,535 (requires 2 bytes)

int32_t – Can store a number from –2,147,483,648 to + 2,147,483,647 (requires 4 bytes)

uint32_t – Can store a number from 0 to +4,294,967,295  (requires 4 bytes)

 

Then there are the other standard ‘C’ types:-

Float – stores a floating point number.

Double – like a float but can store the number with a greater precision.

Both of these should be avoided whenever possible. They both requires a lot of space (say 8 bytes or more). More importantly: on platforms without dedicated floating point hardware such as microcontrollers it means that a whole bunch of extra ‘floating point code’ gets added to your program and is quite big. Note that this code is added as soon as you use one 'float' or 'double'. If you add further floats/doubles then no more code is added.

 

Make sure that you choose one of the above types that can store the range of numbers you need but requires the fewest bytes of memory. Bear in mind that ‘post-boy’ has hands of a fixed size – they can only store one byte. So if you use a 2 byte variable type then you are limiting his options to keep things in his hands and will cause him to do more walking and slow down your program.

 

 

Summary

 

This section has tried to describe the basics of how optimisation works – across any processor. This has included descriptions of our ‘post-boy’ and ‘post-boy-supervisor’ analogies to try and explain what the compiler is doing to make your program run differently. A later section will show you how to tell the compiler what optimisation method you want it to use.

 

Optimise for speed may make your program bigger but makes post-boy-supervisor more efficient at pre-fetching the next instructions, by minimising any branching, meaning post-boy spends less time waiting for him to get back with the next instruction.

 

Optimising for size, and by using register variables, means that the compiler can issue less ‘walks’ for the post-boy.

 

Always try to choose the ‘smallest’ data type for each of your variables. Failure to do so will make your program bigger and slower.

 

Once we start to get our hands dirty with real examples then we can show the actual effects of these options.

 

Of course you may not be very interested in all this detail and just want to ‘make stuff work better’. This, of course, is an option – but to get the most out of your compiler and hardware then the greater your understanding then the better results you can achieve.