One major issue (among many, many others) is that stereoscopic vision struggles when your background is not textured enough. The overall goal, to simplify a bit, is to segment a vision stream into OBJECTS and BACKGROUND. If your background is just a plain white wall, for example, pixel matching or feature matching has nothing distinctive to lock onto, so the wall may come back with no valid depth at all, which often shows up as a distance of 0 away from the camera.
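To make the white-wall problem concrete, here is a minimal sketch of window matching along a single scanline, assuming a rectified stereo pair and a sum-of-absolute-differences (SAD) cost. The helper names (`sad_costs`, `shift_right`) and the toy intensity values are made up for illustration and are not from any particular library.

```python
def sad_costs(left, right, x, half_window, max_disparity):
    """SAD cost of matching the window centred at column x in the left
    scanline against the right scanline, for each candidate disparity."""
    costs = []
    for d in range(max_disparity + 1):
        cost = sum(abs(left[x + k] - right[x + k - d])
                   for k in range(-half_window, half_window + 1))
        costs.append(cost)
    return costs


def shift_right(scanline, disparity):
    """Fake a left view: a point at column i in the right image appears at
    column i + disparity in the left image (edge pixels are repeated)."""
    return [scanline[max(i - disparity, 0)] for i in range(len(scanline))]


TRUE_DISPARITY = 5  # assumed ground truth for this toy example

# Arbitrary, strongly varying values stand in for a textured surface.
textured_right = [i * i for i in range(64)]
# A plain white wall: every pixel looks the same.
flat_right = [255] * 64

textured_costs = sad_costs(shift_right(textured_right, TRUE_DISPARITY),
                           textured_right, x=32, half_window=3, max_disparity=8)
flat_costs = sad_costs(shift_right(flat_right, TRUE_DISPARITY),
                       flat_right, x=32, half_window=3, max_disparity=8)

# Textured: exactly one disparity has the lowest cost.
best_textured = textured_costs.index(min(textured_costs))
# Flat: every disparity has the same cost, so there is no way to choose.
flat_is_ambiguous = len(set(flat_costs)) == 1
```

On the textured scanline the cost curve has a single clear minimum at the true disparity. On the flat one every candidate costs the same, so whatever value the matcher reports for the wall is arbitrary.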
Another major issue is segmentation. How do you define an object? When two books are on top of one another, are they two books or one object? Why, and how do you explain that to a computer?
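Here is a rough sketch of why the stacked-books question has no obvious answer, assuming you already have a depth map and try the simplest thing: group neighbouring pixels whose depths are close. The depth values, the `count_objects` helper and its `tolerance` parameter are all hypothetical, chosen only to illustrate the ambiguity.

```python
from collections import deque


def count_objects(depth, background_depth, tolerance):
    """Count 4-connected groups of non-background pixels, joining neighbours
    whose depth differs by at most `tolerance`."""
    rows, cols = len(depth), len(depth[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or depth[r][c] >= background_depth:
                continue
            count += 1  # found an unvisited foreground pixel: new object
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc),
                               (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and depth[nr][nc] < background_depth
                            and abs(depth[nr][nc] - depth[cr][cc]) <= tolerance):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return count


# Toy top-down view: table at 200, a large book at 100,
# and a smaller book lying on top of it at 97.
depth = [
    [200, 200, 200, 200, 200, 200],
    [200, 100, 100, 100, 100, 200],
    [200, 100,  97,  97, 100, 200],
    [200, 100,  97,  97, 100, 200],
    [200, 100, 100, 100, 100, 200],
    [200, 200, 200, 200, 200, 200],
]

loose = count_objects(depth, background_depth=200, tolerance=5)
strict = count_objects(depth, background_depth=200, tolerance=1)
```

With the loose tolerance the two books merge into one object, and with the strict one they split into two. Nothing in the depth map itself says which answer is right, and that choice is exactly the part you have to explain to the computer.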
Ultimately, stereo vision is great for giving you a depth map, as long as your background is textured enough. But that's about it. The rest is all processing!