Society of Robots - Robot Forum

Software => Software => Topic started by: obiwanjacobi on April 23, 2013, 02:53:09 AM

Title: Distributed algorithms
Post by: obiwanjacobi on April 23, 2013, 02:53:09 AM
Hi,

I am researching a 'distributed design' for a robot. See also this thread for my question on the hardware (http://www.societyofrobots.com/robotforum/index.php?topic=16870.0).

The idea is to divide a big problem (of all the functions a robot might have) up in autonomous small problem. So suppose we have a walking robot where each leg has its own MCU managing just that one leg. All these slave controllers are connected onto a bus to receive high-level commands and/or send back events (leg stuck, too much force etc.). The central brain (MCU) - also connected to the bus- only has to deal with these high level commands and events. It orchestrates /coordinates the operations of all these intelligent slave MCU's.

The idea is to use very simple/small MCU for each slave controller and use a multi-tasking MCU as the brain. I haven't decided yet on which exact brands/types/models to use. I assume I will be programming in C++ (or perhaps C#/.NET micro). So some overhead will require a minor increase in available (programming) memory.

My question is, what algorithms or (code) design patterns would be appropriate for this type of design.

I have some idea's of my own but I was wondering what you guys think.

Thanx,
Marc
Title: Re: Distributed algorithms
Post by: jwatte on April 23, 2013, 11:51:44 AM
This topology has been used a lot in robotics. ROS has significant support for this topology, for example.

In general, where you put your smarts has a lot to do with economics. Building reliable busses to communicate status and commands with low latency adds cost, and support functions for a MCU (power regulation, filtering, etc) also adds cost.

Most systems that go for this approach end up putting one controller per separately-built or separately-sold item. For example, if you build a vision sensor, you end up building image processing into that sensor and selling it all as a single device.

The example of one MCU per leg is a little stretched, because the processing power needed for IK and servo control is trivial even for the cheapest microcontrollers today. Running four legs from a single MCU ends up being cheaper, and often more robust, for current systems. If you want a system that has some degree of autonomy even when the executive functions are disabled ("reflexes") then you'd want an MCU per extremity, though. So, it depends on what requirements you have and what you want to pay for.
Title: Re: Distributed algorithms
Post by: obiwanjacobi on April 24, 2013, 01:23:20 AM
I understand the economics and efficiency part of this design. The 'one MCU per leg' was only and example to clarify what I meant (clearly you got it ). At this moment I am only researching the possibilities and options. The decision on how many MCU per 'part' will be made later.

One of the reasons why I am researching this design is to make things modular. Now I know that there are other ways to do that (and it is not the only reason) but those other ways always seem to involve a limited amount of 'connectors' on the main board - only allowing you to connect so many legs/wheels/sensors etc.

The way around that is to use a bus that can address each attached 'device'. Each 'device' board would have the connectors to connect to the next 'device' - so at least 2 connectors.

Another reason is just for the fun of it. I want to explore the implications of using a distributed design (rule 1 of distribution: don't do it ;-) and in this particular post I am asking for the software implications.

What are the commonly used practices for communicating with other MCU's and how do you let multiple MCU's accomplish one task? What can you do when one or more 'node' are offline? That sort of stuff...
Title: Re: Distributed algorithms
Post by: obiwanjacobi on April 24, 2013, 01:56:39 AM
This topology has been used a lot in robotics. ROS has significant support for this topology, for example. [..]

I just looked up ROS (Robotic Operating System) and it runs on Linux! That is not quite the level I was aiming for. I was aiming on a more small scale, lower level solution. So not quite as professional probably :-P

But a good source of 'inspiration' - no doubt.
Title: Re: Distributed algorithms
Post by: Jon_Thompson on April 24, 2013, 04:18:20 AM
You could implement your own non pre-emptive scheduler straight onto the tin. It's just a loop with a clock variable and a set of priorities. When clock % priority = 0, then run the associated task. Keep the tasks small and quick, and it's a very efficient paradigm.
Title: Re: Distributed algorithms
Post by: obiwanjacobi on April 24, 2013, 05:51:43 AM
Yep cooperative multi tasking was what I had in mind for the main MCU. You can keep it (overhead for tasking) very lean. Unfortunately I haven't bumped into a good C++ library yet... so I guess I will have to roll my own.

Another option I have considered is using a multi-core processor. I read some on the Propeller Parallax and that looks very promising. That way you can have really separate task that run truly parallel.

Have true multi-threading also introduces some synchronization and locking issues but I was hoping to adopt a message based paradigm and avoid shared state (that should take the pain out of most issues).

What do you guys think?
Title: Re: Distributed algorithms
Post by: Jon_Thompson on April 24, 2013, 07:11:45 AM
To be frank, I think you're over-complicating things, but my overriding philosophy is always to get the most work out of the least computing power.
Title: Re: Distributed algorithms
Post by: obiwanjacobi on April 24, 2013, 08:29:18 AM
Yes, I tend to do that  ;D

I usually let my 'imagination' run wild in the design phase to end up with the 'perfect picture'.
But when building time comes I simplify it and only build what I need - but following the design of my perfect picture as much as possible/sensible.

I know that once you start building you will get new ideas - or discard old ideas - things will change. That is evolution and that is part of the fun.

My questions here are more geared toward 'what experience of others who've been there can I use'.

Thanx for your comments.
Title: Re: Distributed algorithms
Post by: jwatte on April 24, 2013, 01:23:41 PM
In general, for safety, you want a "dead man's" system. Two nodes communicating will exchange data at a certain rate, and when one end has missed its time slot by more than X time, assume that that node is dead until you find out otherwise.
Thus, for example, a node driving a motor controller should stop the motors if it goes for 100 milliseconds without getting any new commands from the higher-order function that decides on desired motor speeds.

If you use a network like CAN or I2C, they already have built-in functions for addressing particular nodes. You just assign node IDs ("front left leg == 5") and go.
Title: Re: Distributed algorithms
Post by: Duane Degn on April 25, 2013, 01:55:00 AM
Another option I have considered is using a multi-core processor. I read some on the Propeller Parallax and that looks very promising. That way you can have really separate task that run truly parallel.

I was going to suggest the Propeller. The multiple processors are a great fit for lots of robotic projects.

For example, I used the Propeller in my Mecanum wheeled Rover 5. One processor (cog) is dedicated to reading the four quadrature encoders (without causing insane numbers of interrupts(or any for that matter)). Another cog produces the four PWM signals (a single cog can produce many more than just the four). Another cog used used as a UART (a single cog can easily be used to provide four UARTs).

Breaking the tasks down into independent cogs keeps the main loop nice and clean. You don't have to worry about interrupts messing up the timing of other parts of the code.

Post #85 of this thread has a fun (IMO) video of the robot moving around.

http://forums.parallax.com/showthread.php/130797-Mecanum-Wheeled-Robot-with-Machine-Vision?p=1166320&viewfull=1#post1166320 (http://forums.parallax.com/showthread.php/130797-Mecanum-Wheeled-Robot-with-Machine-Vision?p=1166320&viewfull=1#post1166320)

I've listed a bunch of my Propeller projects in post #2 of this thread:

http://forums.parallax.com/showthread.php/135706-Index-Test?p=1049895&viewfull=1#post1049895 (http://forums.parallax.com/showthread.php/135706-Index-Test?p=1049895&viewfull=1#post1049895)

I'll force myself to stop here. I often get kind of carried away when I start talking about the Propeller. I'm a really big fan of the chip.
Title: Re: Distributed algorithms
Post by: obiwanjacobi on April 25, 2013, 02:46:08 AM
Excellent!

I started reading more on the propeller and I (still) have some questions about it. Like how big a program can you load into each cog. From what I understand it runs from (cog) RAM!? I was also wondering if you could leave out the Spin interpreter from ROM and stuff some of your own code down there...? What is a good way to extend the memory? I saw it has I2C pins and saw some 'talk' of an external (E)EPROM.

Now I will check out your links. Thanx.
Title: Re: Distributed algorithms
Post by: obiwanjacobi on April 25, 2013, 11:31:30 AM
Found XMOS as another manufacturer of multi-core MCU's. Anybody used these?

http://www.xmos.com/discover/products/xcore (http://www.xmos.com/discover/products/xcore)
Title: Re: Distributed algorithms
Post by: Duane Degn on April 25, 2013, 01:03:02 PM
Excellent!

I started reading more on the propeller and I (still) have some questions about it. Like how big a program can you load into each cog. From what I understand it runs from (cog) RAM!? I was also wondering if you could leave out the Spin interpreter from ROM and stuff some of your own code down there...? What is a good way to extend the memory? I saw it has I2C pins and saw some 'talk' of an external (E)EPROM.

Now I will check out your links. Thanx.

Cog RAM can be limiting issue. Normally only drivers that need to work at high speed are need to completely fit in a cog. For example the SC card driver uses almost all the 2K of cog RAM. These high speed drivers are written in assembly. I haven't had this issue myself but I haven't written any very intense drivers.

I have had problems of running out the 32K of hub RAM but these have always been when using some sort of video driver since video uses RAM as a screen buffer. In these cases I've either broken the program into smaller pieces that can be loaded from a SD card quickly or used a second Propeller as a video slave.

Parts to the program written in Spin can be much longer. Apparently the Spin byte code is pretty compact and you're not limited to 2K of Spin code per code since a cog using Spin uses cog RAM to hold the Spin interpreter, the interpreter then executes the Spin byte code from the larger hub RAM.

There are lots of ways to add memory such as EEPROM, SD cards, flash RAM etc. One of my experiments was to use eight 23K256 SRAM chips as a group and used an 8-bit data bus to communicate with all the chips at once (each chip would store one bit of a byte).

Any external memory added wont be treated the same as hub RAM though, the external memory will need to be accessed through a driver for the particular memory being used. The various cogs wont be able to directly read and write to it like they can hub RAM.

Parallax is working on a C compiler for the Propeller which makes it easier to use large programs. It's in Beta testing now.

I personally like SPIN and PASM myself.
Title: Re: Distributed algorithms
Post by: obiwanjacobi on May 03, 2013, 12:02:38 AM
I have decided that I will make a cooperative multitasking prototype first to get some real-life experience with the problem-domain. I just ordered a simple robot car platform (DFRobot 4WD+E) - including a motor driver board- I will use for my prototype that will start out as an object avoidance robot.

I will simulate each separate MCU as a software Task that runs in conjunction with other Tasks in the system: motor-control, sensor feedback and brains will probably be the 3 Tasks that will be running.

I will use my Arduino Uno (I already have) as its brains. I have already  some nice code to perform lightweight cooperative multitasking (https://atl.codeplex.com/SourceControl/changeset/view/102658#2066544).

Thank you for your responses and feedback.
Title: Re: Distributed algorithms
Post by: Jon_Thompson on May 03, 2013, 08:04:30 AM
Just remember that you have lots of program space and very little variable space to play with, so don't waste too many bytes on the scheduler. If you can use a flag in a register byte instead of a whole variable just to store true or false, do so and take the coding hit.  ;)
Title: Re: Distributed algorithms
Post by: obiwanjacobi on May 03, 2013, 08:21:00 AM
Agreed. My Arduino library is made with that in mind. I try to keep things 'static' and constant as much as possible - so it is stored in prog mem - without sacrificing the OO concept too much. After seeing a lot of dynamic code for things that were not dynamic (like buttons!) in other libs, I decided to write my own.

The Task(.h) macros are recently added and need to be tested and tweaked. It is a clever (to my eyes) trick with a switch statement and one int variable per task. Seems a reasonable price to pay. The 'scheduler' will be the way (flow) you call the tasks in the main loop. I have not tried it yet but it sounds very cool and lightweight.

One thing I may want to add is some sense of time/timing. I want to avoid that all tasks call methods to retrieve the time separately. Not sure yet how to fix that.
Title: Re: Distributed algorithms
Post by: Jon_Thompson on May 03, 2013, 09:47:34 AM
I might be wrong, but I don't think static and const store things in progmem. If they do, once you read them they're copied into RAM even if they did. You need to check out the PROGMEM directive and use the associated function to read the values/strings.
Title: Re: Distributed algorithms
Post by: obiwanjacobi on May 03, 2013, 10:27:38 AM
Really? Thanx, I will.
Title: Re: Distributed algorithms
Post by: jwatte on May 03, 2013, 11:52:20 AM
Yes; "static" and "const" still uses SRAM.

PROGMEM can be used to put data in program flash, but then you have to use a special function to copy it to a temporary working buffer before you actually work with it.
Title: Re: Distributed algorithms
Post by: obiwanjacobi on May 03, 2013, 12:50:45 PM
I always thought that PROGMEM was for arrays (strings etc) and that a simple const value would be some assembly code register load with that literal value - thus ending up in prog-mem.

Do template params also end up in ram? like: template<byte constValue> class {  ...  };

Thanx.
Title: Re: Distributed algorithms
Post by: jwatte on May 04, 2013, 12:16:14 AM
A static const int that you do not pass by reference or pointer may be optimized out, if I recall correctly. An easier way of making sure that happens is to use an enum.
Note that a static const int may be passed by reference, and you can get a pointer to it; if there's any chance of that happening, then the compiler/linker will have to reserve memory for it.
Also, not all types are eliminatable -- floats are not, strings are not, etc. On some architectures ints above a certain size also are not.

Template parameters and enum values do not generate memory storage. However, template parameters may generate a large amount of code, so be careful about that. If you only ever instantiate one of something (say, "the servo driver" or "the display driver" or whatever) then template argument values are totally fine, as you'll only need one copy of the code anyway.

Title: Re: Distributed algorithms
Post by: obiwanjacobi on May 04, 2013, 01:23:24 AM
Ok thanx, then I'm in pretty good shape with my library, I think.

I use a lot of these: template<BaseT> class SomeService : BaseT and not using any virtual methods. I 'think' that eliminates the v-table.
Title: Re: Distributed algorithms
Post by: jwatte on May 05, 2013, 08:51:45 PM
Templates and vtables are totally separate. Any class instance that has "virtual" something somewhere will have a vtable, template or not. Meanwhile, a template is mainly a way of writing code with better type checking than a macro. Templates will have vtables if and only if any component class of the final instantiated class has a virtual anything (virtual base type, or virtual member function.)

The warning I made was not against vtable space, but instead against template argument code bloat.

Here's an illustration:

Code: [Select]
template<unsigned char Size> class CharArray {
public:
  CharArray() { memset(data_, 0, Size); }
  unsigned char &operator[](unsigned char ix) {
    assert(ix < Size);
    return data_[ix];
  }
  unsigned char operator[](unsigned char ix) const {
    assert(ix < Size);
    return data_[ix];
  }
private:
  unsigned char data_[Size];
};

CharArray<10> myFirstArray;
CharArray<20> mySecondArray;

The compiler now has to emit two copies of the member functions, specialized for Size == 10 and for Size == 20. Once you use a lot of templates, this can generate a lot of code bloat in a hurry, and because the code is actually different between the function versions, Common-data folding of function sections would not help. (Otherwise a popular optimization among linkers to get a handle on bloat of trivial code.)
Title: Re: Distributed algorithms
Post by: obiwanjacobi on May 06, 2013, 12:03:04 AM
I know vtables and templates are separate concepts. I meant that using templates where the means by which I can afford to not use virtuals.

About the template code generation: will all code of a template be duplicated for each unique template parameter? Or just the methods that use the template parameter?

Anyway your point is taken. I probably will make two versions (static with template params and dynamic with ctor params) so the user (developer) can choose whats best for that specific situation.

Thanx!
Title: Re: Distributed algorithms
Post by: jwatte on May 06, 2013, 12:02:56 PM
Quote
will all code of a template be duplicated for each unique template parameter? Or just the methods that use the template parameter?

All of them will generally be duplicated, because the mangled name will be different for the two instances.
If you have comdat folding capability, then the linker can see that the functions that don't use the template argument are the same, and merge them at linking.
To make this happen in GCC, you have to compile with "-ffunction-sections" (which really ought to always be the default!) and then link with --icf using gold, not gld. And gold doesn't support non-ELF outputs, so I don't know if that works for AVR code ...