Typically, you read an image stream from a camera using a platform-specific API. On Windows, that's either DirectShow or Windows Media Foundation, IIRC. On Linux, it's Video4Linux2 (V4L2), which boils down to a single header file (linux/videodev2.h) with some ioctl() structures and control codes.
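To give a feel for what "ioctl() structures and control codes" means on Linux, here is a rough sketch that asks a capture device to identify itself. It is not a full capture loop. The struct layout and the VIDIOC_QUERYCAP request are taken from linux/videodev2.h; the helper names (ioc_read, parse_capability, query_capability) and the /dev/video0 path are my own assumptions, and the ioctl encoding shown is the generic one used on x86 and ARM.

```python
import fcntl
import os
import struct

# Layout of struct v4l2_capability in <linux/videodev2.h>:
# driver[16], card[32], bus_info[32], version, capabilities,
# device_caps, reserved[3]
CAPABILITY_FORMAT = "16s32s32sIII3I"
CAPABILITY_SIZE = struct.calcsize(CAPABILITY_FORMAT)


def ioc_read(type_char, number, size):
    """Build a read-direction ioctl code like the kernel's _IOR() macro.

    Assumption: generic Linux encoding (x86, ARM); a few architectures
    use different direction bits.
    """
    ioc_read_direction = 2
    return (ioc_read_direction << 30) | (size << 16) | (ord(type_char) << 8) | number


VIDIOC_QUERYCAP = ioc_read("V", 0, CAPABILITY_SIZE)


def parse_capability(raw):
    """Decode the bytes the driver writes into struct v4l2_capability."""
    driver, card, bus_info, version, caps, device_caps, *_reserved = struct.unpack(
        CAPABILITY_FORMAT, raw
    )

    def text(field):
        # Fixed-size C strings are NUL-padded; keep what precedes the first NUL.
        return field.split(b"\0", 1)[0].decode("utf-8", errors="replace")

    return {
        "driver": text(driver),
        "card": text(card),
        "bus_info": text(bus_info),
        # The kernel packs the version as (major << 16) | (minor << 8) | patch.
        "version": ((version >> 16) & 0xFF, (version >> 8) & 0xFF, version & 0xFF),
        "capabilities": caps,
        "device_caps": device_caps,
    }


def query_capability(device_path="/dev/video0"):
    """Ask the driver behind device_path to describe itself (Linux only)."""
    fd = os.open(device_path, os.O_RDWR)
    try:
        raw = fcntl.ioctl(fd, VIDIOC_QUERYCAP, bytes(CAPABILITY_SIZE))
    finally:
        os.close(fd)
    return parse_capability(raw)
```

Calling query_capability() on a machine with a webcam attached should return the driver and card names. Actually streaming frames takes several more ioctls (format negotiation, buffer setup, stream on), which is a good part of why people reach for a wrapper library.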
OpenCV has classes for working with webcams, Kinect cameras, and a few others. This means you only have to learn the OpenCV API instead of your platform-specific video capture API.
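For comparison, a minimal sketch of the OpenCV route using its Python bindings. cv2.VideoCapture with its isOpened(), read(), and release() methods is OpenCV's API; the helper names (grab_frames, capture_from_default_camera) and the assumption that camera index 0 is your webcam are mine.

```python
def grab_frames(capture, count):
    """Read up to `count` frames from an object with VideoCapture's read() method."""
    frames = []
    for _ in range(count):
        ok, frame = capture.read()  # returns (success flag, image)
        if not ok:
            break  # camera unplugged, or the stream ended
        frames.append(frame)
    return frames


def capture_from_default_camera(count=10, index=0):
    """Open camera `index` (assumed to be the default webcam) and grab frames."""
    import cv2  # imported here so grab_frames() is usable without OpenCV installed

    capture = cv2.VideoCapture(index)
    if not capture.isOpened():
        raise RuntimeError(f"could not open camera {index}")
    try:
        return grab_frames(capture, count)
    finally:
        capture.release()  # always hand the device back to the OS
```

Each frame comes back as a NumPy array, and the same code should work on Windows and Linux because OpenCV picks the platform backend for you.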
Any camera will give you pictures. The more expensive cameras typically have various features not found on the cheaper ones:
- higher-quality sensors with less noise
- better frame rates
- better optics
- spatially calibrated for a specific capture transform (for specific computer vision cameras)
- stereo capable (for specific computer vision cameras)
- more robust
- etc
Often, because of the cost advantage of the mass market, a more expensive camera may give you one particular feature (say, stereo) but actually be worse in all the other respects. However, if you need that feature, well, you're stuck with it :-)