I think I have a rough idea of what you're attempting. It's a fairly common problem, I think. It's not one that I've been interested in tackling myself, so I don't know of specific examples offhand.
I'm pretty sure the biggest challenge is going to be the object recognition software, especially if you write it yourself. This has been done before, though, so you might look into ROS, which I linked to. I don't know much about ROS, but it may well include visualization code, a framework, and so on.
Also keep in mind that you will need additional hardware to interface a regular computer with servos, sensors, etc.