I'd say that even with image processing and neural networks, or Hausdorff distances for shape recognition, it will be hard to achieve what you need, and I wouldn't expect the success rate to get higher than 20%. Not to mention that the processing load is a killer even for a fast PC.
So from my point of view, the logical conclusion is to drop the idea of using a camera completely, and use some other identification mechanism for the bottom of the plant.
Things like UV fluorescence, if it works on your plants, or even some small detectors implanted near the bottom of each plant.
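
For reference, here is a rough sketch of what the Hausdorff distance mentioned above involves, just to give a feel for the cost. It is a naive pure-Python version, not a tuned implementation; the function names and the toy point sets are my own assumptions for illustration. Every point in one outline is compared against every point in the other, so the work grows with the product of the two point counts, and real camera contours can easily have thousands of points per frame.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two 2D point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy example (hypothetical data): a unit square and the same square moved 3 units right.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(x + 3, y) for x, y in square]

print(hausdorff(square, shifted))  # 3.0
```

A small distance means the two outlines are close to each other; in practice you would also have to deal with scale, rotation, and occlusion before the number means much, which is part of why I would not go down this road.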