I saw a post from a guy who made a 3D vision system but was wondering what to do with it, so here's an attempt at explaining my system, which can make a simulation out of real video.
The main idea is that we are classifying distinct moving pieces. They can be attached to each other or not; it works either way.
You only need a new classification if a piece is moving slightly differently from the rest, though of course basic shape is important too. Both are discovered with 3D photography.
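To make "moving slightly differently" a bit more concrete, here is a rough sketch, not my actual code. It assumes you already have a 3D displacement per tracked point between two frames (from the depth data), and it flags the points whose motion is far from the typical motion. The function name, the median reference, and the `tolerance` value are all made up for illustration.

```python
import math
from statistics import median

def points_moving_differently(displacements, tolerance):
    """Return indices of points whose 3D displacement is far from the typical one.

    displacements: list of (dx, dy, dz) tuples, one per tracked point.
    tolerance: hypothetical threshold on the distance from the median motion.
    """
    # Per-axis median as the "typical" motion of the piece (an assumption).
    ref = tuple(median(d[i] for d in displacements) for i in range(3))
    return [i for i, d in enumerate(displacements) if math.dist(d, ref) > tolerance]

# Four points moving together, one point moving its own way.
displacements = [(1.0, 0.0, 0.0)] * 4 + [(0.0, 2.0, 0.0)]
odd_ones = points_moving_differently(displacements, tolerance=0.5)
```

The points that come back are candidates for a new classification; everything else stays with the piece it was already on.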
Train a net (a Boltzmann machine or even hierarchical temporal memory would do) on some video. It can just be 2D; you can neglect 3D for this bit. You can do this prior to the classification step or at the same time, it's up to you.
Training detects some redundancies in the video, which improves your similarity matching.
This basically just gives you your cell activations with a bit of redundancy taken care of. Now every different photo will have a slightly different response up your layers, and more similar photos will have more similar responses.
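As a small illustration of "more similar photos have more similar responses", here is a sketch that treats a response as a flat list of activations and compares two of them with plain Euclidean distance. The distance measure and the activation values are assumptions for the example; any sensible distance would do.

```python
import math

def response_distance(a, b):
    """Euclidean distance between two activation vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("activation vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical activations for three frames (values made up for illustration).
frame_a = [0.9, 0.1, 0.4]
frame_b = [0.8, 0.2, 0.4]  # nearly the same picture as frame_a
frame_c = [0.1, 0.9, 0.0]  # a very different picture

near = response_distance(frame_a, frame_b)
far = response_distance(frame_a, frame_c)
```

Here `near` comes out smaller than `far`, which is all the later classification step needs from the net.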
It needn't be noise free. You don't need the net for reconstructive purposes; we employ classification later to make much better reconstruction possible.
Now we start classifying.
We start off with one module (or distinct piece). That means no matter what the response of the network is, it'll arrive at this module id; it has 100% error leniency. We detect which module a frame belongs to by the closest distance over all of the module's snapshots. We add one snapshot per frame, and the module starts with 0 snapshots.
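Here is a minimal sketch of that lookup, under my own assumptions: a snapshot is just the stored network response, distance is Euclidean, and the `Module` class and `classify` function are names invented for the example.

```python
import math

def distance(a, b):
    """Euclidean distance between two responses of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Module:
    """One distinct piece: an id plus the snapshots collected for it."""
    def __init__(self, module_id):
        self.module_id = module_id
        self.snapshots = []  # starts with 0 snapshots

    def closest(self, response):
        """Distance to the nearest snapshot, or infinity if there are none yet."""
        if not self.snapshots:
            return math.inf
        return min(distance(response, s) for s in self.snapshots)

def classify(modules, response):
    """Pick the module whose nearest snapshot is closest to the response.

    With a single module every response lands on it, whatever the distance,
    which is the 100% error leniency case.
    """
    return min(modules, key=lambda m: m.closest(response))

modules = [Module(0)]
for response in ([0.1, 0.2], [0.9, 0.8], [0.5, 0.5]):
    winner = classify(modules, response)
    winner.snapshots.append(response)  # one snapshot added per frame
```

Once a second module exists, the same `classify` call starts discriminating between them without any change.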
Every time we add the module state, we detect whether we need a new part, that is, whether it has performed a mechanical posture change. We do this employing 3D photography, and we split off the part into another image on the new id, leaving a stencil of it behind. This stencilled-off area is an automatic pass when counting error during a comparison.
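The stencil as an automatic pass can be sketched like this. I'm assuming flat lists of pixel intensities and a mean absolute difference as the error; both are stand-ins, and `masked_error` is a made-up name.

```python
def masked_error(frame, snapshot, stencil):
    """Mean absolute difference between frame and snapshot, skipping stencilled pixels.

    frame, snapshot: flat lists of pixel intensities of equal length.
    stencil: flat list of bools, True where a part was split off to another id.
    """
    if not (len(frame) == len(snapshot) == len(stencil)):
        raise ValueError("frame, snapshot and stencil must have the same length")
    total = 0.0
    counted = 0
    for f, s, cut_out in zip(frame, snapshot, stencil):
        if cut_out:
            continue  # stencilled area: automatic pass, contributes no error
        total += abs(f - s)
        counted += 1
    return total / counted if counted else 0.0

frame = [10, 10, 200, 200]
snapshot = [10, 12, 50, 50]
stencil = [False, False, True, True]  # the last two pixels belong to the split-off part

with_stencil = masked_error(frame, snapshot, stencil)
without_stencil = masked_error(frame, snapshot, [False] * 4)
```

With the stencil, the pixels belonging to the split-off part no longer drag the comparison down, so the remaining piece still matches its old snapshots.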
You only need a new module (a new id) if it's a distinct moving piece, or if an I/O pair is doubled.
There is an I/O pair with every state: I is the surrounds (a description of the surrounding modules), O is the velocity. You won't have this at the start; it has to be inserted once all moving objects have been separated. A doubled I/O means it's an extra complex function that needs a new category.
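One way to read "doubled" is that the same surrounds show up with two clearly different velocities, so a single function can't cover both. That reading is an assumption, and so are the encoding of the surrounds as a set of module ids, the `IOPair` class, and the `tolerance` value in this sketch.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class IOPair:
    surrounds: frozenset  # I: ids of the surrounding modules (assumed encoding)
    velocity: tuple       # O: (vx, vy, vz)

def is_doubled(known_pairs, new_pair, tolerance):
    """True if the same surrounds were already stored with a clearly different velocity."""
    return any(
        p.surrounds == new_pair.surrounds
        and math.dist(p.velocity, new_pair.velocity) > tolerance
        for p in known_pairs
    )

known = [IOPair(frozenset({2, 3}), (1.0, 0.0, 0.0))]
same = IOPair(frozenset({2, 3}), (1.05, 0.0, 0.0))      # same surrounds, about the same velocity
conflict = IOPair(frozenset({2, 3}), (-1.0, 0.0, 0.0))  # same surrounds, opposite velocity
```

Under that reading, `conflict` would call for a new category while `same` would not.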
If it's continually flipping back and forth between 2 modules, you're better off just averaging the I/O together than splitting off, as it's just an error.
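A rough sketch of that check, assuming you keep a per-frame history of which module id won and that the O part is a velocity tuple. The function names and the `max_flips` threshold are invented for the example.

```python
def count_flips(id_history, a, b):
    """Count frame-to-frame switches between two different module ids a and b."""
    return sum(
        1 for prev, cur in zip(id_history, id_history[1:]) if {prev, cur} == {a, b}
    )

def should_merge(id_history, a, b, max_flips):
    """True if the classification bounces between a and b more than max_flips times."""
    return count_flips(id_history, a, b) > max_flips

def average_velocity(v1, v2):
    """Average two velocity tuples component by component."""
    return tuple((x + y) / 2 for x, y in zip(v1, v2))

history = [1, 2, 1, 2, 1, 2, 2]  # winning module id per frame
if should_merge(history, 1, 2, max_flips=3):
    merged = average_velocity((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

How many flips count as "continually" is a tuning question; the threshold of 3 here is arbitrary.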
The network's purpose is for training only, but it employs the snapshots in more of a realtime database way to get an error reading of "closeness". Reconstruction is left to the snapshots stored at each categorization id: you have to rebuild the novel posture out of a patchwork of many of the states, using the closest states possible. (It requires depth to do this. If it were just 2D, its derivatives would all be flat wrong, the pieces wouldn't come apart well at all, and the reconstruction would look like Mario Kart.
So tell that to Jeff Hawkins when he thinks he can do image recognition in 2D... but actually it's not a bad area of thought, doing it in 2D, because depth maps are buggy as hell.)
If you provide a 3D scene of ids, you'll now be able to get the colours and displacements.
When I finally get this thing working, it'll make a video game out of stereo pair video.