With a Wi-Fi IP camera, you would probably do the video processing on a computer and then send the derived information back to the Arduino. These cameras do not interface directly with an Arduino.
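To make the "process on a computer, send derived information" idea concrete, here is a minimal sketch of the computer side in Python. Everything in it is an assumption for illustration: the function names `bright_centroid` and `encode_message`, the brightness threshold, and the `T,x,y` / `N` line format are all hypothetical, and grabbing frames from the camera is not shown (a tiny hand-written grayscale grid stands in for a decoded frame).

```python
def bright_centroid(frame, threshold=200):
    """Return the (x, y) centroid of pixels at or above threshold, or None."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value >= threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # nothing bright enough in this frame
    # Integer division keeps the result easy to parse on a small controller.
    return sum_x // count, sum_y // count


def encode_message(centroid):
    """Format the result as one short ASCII line (hypothetical protocol)."""
    if centroid is None:
        return b"N\n"  # "no target"
    x, y = centroid
    return f"T,{x},{y}\n".encode("ascii")


# Tiny synthetic 4x4 grayscale "frame" standing in for a decoded camera image.
frame = [
    [0, 0, 0, 0],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 0],
]
message = encode_message(bright_centroid(frame))
```

For this frame, `message` is `b"T,2,1\n"`. The point is that the Arduino only ever sees a few bytes per frame rather than image data; the computer would write those bytes to whatever link connects the two (a serial port or a network socket, for example), and the Arduino would parse the line and act on it.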
Vision processing *on* the Arduino (with a different sort of camera) while also doing other robot tasks is generally too much for the Arduino's processing capabilities. Camera modules with their own onboard processing are usually used for this purpose (e.g. the Blackfin camera, which might not be easy to obtain anymore and costs roughly $100-$200, I think), or one would use an ARM-based controller, which has more processing power. Vision processing is also not something I'd typically recommend a beginner jump into. Beyond that, I can't say more, as I've not done vision processing myself.