You can also do other neat tricks with images, such as thresholding only
a particular color like red.
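A red-only threshold can be sketched in a few lines of plain Python. The image format and threshold values below are made up for illustration; a real implementation would read pixels from your camera library:

```python
# Keep a pixel only if its red channel is strong AND clearly dominates
# green and blue; otherwise mark it as background.
def threshold_red(image, min_red=100, margin=50):
    """image is a list of rows of (r, g, b) tuples; returns a binary mask."""
    mask = []
    for row in image:
        mask_row = []
        for (r, g, b) in row:
            is_red = r >= min_red and r - g >= margin and r - b >= margin
            mask_row.append(1 if is_red else 0)
        mask.append(mask_row)
    return mask

image = [
    [(200, 30, 40), (90, 90, 90)],
    [(10, 10, 10), (220, 60, 50)],
]
print(threshold_red(image))  # only the two reddish pixels survive
```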

The basic shapes are very easy, but as you get into more complex shapes (pattern recognition) you will have to use probability analysis. For example, suppose your algorithm needed to distinguish between 10 different fruits (by shape only), such as an apple, an orange, a pear, and a cherry. How would you do it? All of them are roughly circular, but none perfectly circular. And not all apples look the same, either.

By using probability, you can run an analysis that says 'oh, this fruit fits 90% of the characteristics of an apple, but only 60% of the characteristics of an orange, so it's more likely an apple.' It's the computational version of an 'educated guess.' You could also say 'if this particular feature is present, then it has a 20% higher probability of being an apple.' The feature could be a stem such as on an apple, fuzziness like on a coconut, or spikes like on a pineapple. This method is known as feature detection.
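This 'educated guess' can be sketched as a simple scoring loop. The feature lists and weights below are invented for illustration; a real system would measure features from the image and learn the expected values from data:

```python
# Hypothetical expected feature strengths (0.0 to 1.0) for each fruit.
FRUIT_FEATURES = {
    "apple":   {"round": 0.9, "stem": 0.8, "fuzzy": 0.0, "spiky": 0.0},
    "orange":  {"round": 1.0, "stem": 0.1, "fuzzy": 0.0, "spiky": 0.0},
    "coconut": {"round": 0.8, "stem": 0.0, "fuzzy": 0.9, "spiky": 0.0},
}

def best_match(observed):
    """Score each fruit by how well the observed features fit, pick the best."""
    scores = {}
    for fruit, expected in FRUIT_FEATURES.items():
        # Higher score when observed feature strengths match expectations.
        score = sum(1.0 - abs(observed.get(f, 0.0) - v) for f, v in expected.items())
        scores[fruit] = score / len(expected)
    return max(scores, key=scores.get), scores

observed = {"round": 0.85, "stem": 0.7}   # a round blob with a stem
label, scores = best_match(observed)
print(label)   # the stem feature tips the guess toward 'apple'
```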

What the blob detection algorithm does is label each blob with a number, counting up for every new blob it encounters. Then, to find the center of mass, you can just compute it for each individual blob.
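A minimal sketch of that labeling step, assuming a binary mask as input (the function names here are mine): flood-fill each unvisited foreground pixel so the whole connected blob gets one label, then average each label's pixel coordinates to get its center of mass.

```python
from collections import deque

def label_blobs(mask):
    """Number each 4-connected blob of 1s in a binary mask, counting up."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                next_label += 1                      # a new blob was found
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:                         # flood-fill this blob
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label

def centers_of_mass(labels, count):
    """Average pixel coordinates per label: {label: (x_center, y_center)}."""
    sums = {i: [0, 0, 0] for i in range(1, count + 1)}   # x-sum, y-sum, n
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab:
                sums[lab][0] += x; sums[lab][1] += y; sums[lab][2] += 1
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
labels, count = label_blobs(mask)
print(count)                           # 2 separate blobs
print(centers_of_mass(labels, count))  # one center per blob
```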

In the video below, I ran a few algorithms in tandem. First, I removed all non-red objects. Next, I blurred the video a bit to make blobs more connected. Then, using blob detection, I kept only the blob that had the most pixels (the largest red object). This removed background objects such as the fire extinguisher. Lastly, I computed the center of mass to track the actual location of the object. I also ran a population threshold algorithm that made the object edges really sharp. It doesn't improve the tracking in this case, but it does make the video look nicer.

CASE 1: Parallel Cameras
Now, moving on to two parallel-facing cameras (L for the left camera and R for the right camera), we have this diagram:

The Z-axis is the optical axis (the direction the cameras are pointing). b is
the distance between cameras, while f is still the focal length.
The equations of stereo triangulation (so called because the geometry forms a triangle) are:

Z_actual = (b * focal_length) / (x_camL - x_camR)
X_actual = x_camL * Z_actual / focal_length
Y_actual = y_camL * Z_actual / focal_length
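The three equations above translate directly into code. A sketch (function name is mine; image coordinates and focal length must be in the same units, e.g. pixels, while b sets the units of the result):

```python
def triangulate_parallel(x_camL, y_camL, x_camR, b, focal_length):
    """Recover (X, Y, Z) from matched image points in two parallel cameras."""
    disparity = x_camL - x_camR          # must be nonzero (point not at infinity)
    Z = (b * focal_length) / disparity
    X = x_camL * Z / focal_length
    Y = y_camL * Z / focal_length
    return X, Y, Z

# Example: 0.1 m baseline, 500 px focal length, 10 px disparity -> Z = 5 m
print(triangulate_parallel(20, 10, 10, 0.1, 500))
```

Note that depth resolution falls off with distance: the farther the point, the smaller the disparity, so small pixel errors cause large depth errors.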
CASE 2a: Non-Parallel Cameras, Rotation About Y-axis
And lastly, what if the cameras are pointing in different, non-parallel directions? In the diagram below, the Z-axis is the optical axis of the left camera, while the Zo-axis is the optical axis of the right camera. Both cameras lie on the XZ plane, but the right camera is rotated by some angle phi. The point (0, 0, Zo) where both optical axes intersect is called the fixation point.
Note that the fixation point could also be behind the cameras, when Zo < 0.

Calculating the alien's location:

Zo = b / tan(phi)
Z_actual = (b * focal_length) / (x_camL - x_camR + focal_length * b / Zo)
X_actual = x_camL * Z_actual / focal_length
Y_actual = y_camL * Z_actual / focal_length
CASE 2b: Non-Parallel Cameras, Rotation About X-axis
Calculating the alien's location:

Z_actual = (b * focal_length) / (x_camL - x_camR)
X_actual = x_camL * Z_actual / focal_length
Y_actual = y_camL * Z_actual / focal_length + tan(phi) * Z_actual
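These Case 2b equations can be sketched the same way (function name is mine; phi in radians). With phi = 0 it reduces exactly to the parallel case:

```python
import math

def triangulate_rotated_x(x_camL, y_camL, x_camR, b, focal_length, phi):
    """Triangulate when the right camera is rotated by phi about the X-axis.

    The tilt adds a tan(phi) * Z correction to the recovered Y coordinate.
    """
    Z = (b * focal_length) / (x_camL - x_camR)
    X = x_camL * Z / focal_length
    Y = y_camL * Z / focal_length + math.tan(phi) * Z
    return X, Y, Z
```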
CASE 2c: Non-Parallel Cameras, Rotation About Z-axis
For simplicity, rotation around the optical axis is usually dealt with by rotating the image before applying matching and triangulation.
Given the translation vector T and rotation matrix R describing the transformation from left camera coordinates to right camera coordinates, the equation to solve for stereo triangulation is:

a * p - b * (R^T * p') + c * (p × R^T * p') = T

where p and p' are the coordinates of P in the left and right camera coordinates respectively, R^T is the transpose (and, since R is a rotation, also the inverse) of R, and a, b, and c are the scalar unknowns to solve for.
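One standard way to solve this is the midpoint method: find scalars a, b, c satisfying a*p - b*(R^T p') + c*(p × R^T p') = T, then take the midpoint between the closest points on the two rays. A plain-Python sketch using Cramer's rule for the 3x3 system (the function and variable names are mine):

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix whose columns are c1, c2, c3."""
    return (c1[0]*(c2[1]*c3[2] - c2[2]*c3[1])
          - c2[0]*(c1[1]*c3[2] - c1[2]*c3[1])
          + c3[0]*(c1[1]*c2[2] - c1[2]*c2[1]))

def triangulate_general(p, pp, R, T):
    """Midpoint triangulation: p, pp are the left/right rays, R rotates
    left-camera coordinates into right-camera coordinates, T translates."""
    # R^T * p' brings the right camera's ray direction into left coordinates.
    Rt_pp = tuple(sum(R[j][i] * pp[j] for j in range(3)) for i in range(3))
    w = cross(p, Rt_pp)                  # direction perpendicular to both rays
    # Solve a*p + b*(-Rt_pp) + c*w = T by Cramer's rule.
    c1, c2, c3 = p, tuple(-x for x in Rt_pp), w
    D = det3(c1, c2, c3)
    a = det3(T, c2, c3) / D
    b = det3(c1, T, c3) / D
    # Midpoint between the closest points on the two rays (left coordinates).
    left_pt = tuple(a * x for x in p)
    right_pt = tuple(T[i] + b * Rt_pp[i] for i in range(3))
    return tuple((left_pt[i] + right_pt[i]) / 2 for i in range(3))
```

With noise-free, perfectly matched points the two rays intersect and the midpoint is the exact 3D location; with real data the rays miss each other slightly and the midpoint is a reasonable compromise.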