Bear in mind that cameras like the Blackfin generate a frame of pixels which is far too big to transmit quickly to another processor, e.g. 320 x 200 = 64,000 pixels, which is 64,000 bytes even at only one byte per pixel.
Equally, 64k is a lot of RAM for a typical small microcontroller.
Therefore all image processing is normally done on the camera.
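To put rough numbers on that, here is a quick back-of-envelope sketch. The 320 x 200 frame is from the figures above; the bytes per pixel, the 115200 baud serial link and the 10 bits per byte on the wire (8N1 framing) are my own assumptions for illustration, not figures taken from the Blackfin documentation.

```python
# Rough frame-size and transfer-time arithmetic.
# Assumed values (not from the camera's documentation): bytes per pixel,
# a 115200 baud serial link, and 10 bits on the wire per byte (8N1 framing).
WIDTH = 320
HEIGHT = 200


def frame_bytes(width, height, bytes_per_pixel=1):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def transfer_seconds(n_bytes, baud=115200, bits_per_byte=10):
    """Time to push n_bytes over a serial link at the given baud rate."""
    return n_bytes * bits_per_byte / baud


grey = frame_bytes(WIDTH, HEIGHT)      # 1 byte per pixel
rgb = frame_bytes(WIDTH, HEIGHT, 3)    # 3 bytes per pixel
print(grey, rgb)                       # 64000 192000
print(round(transfer_seconds(grey), 2), "seconds per frame")  # 5.56 seconds per frame
```

So under those assumptions even the smallest version of the frame takes several seconds to send, which is why you normally only ask the camera for the results.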
The most common feature is blob detection, e.g. finding any rectangles whose pixels are within a min/max colour range.
These blobs are detected as the camera is transmitting the scan lines, i.e. the camera hardware doesn't need to store the whole image either.
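As a rough illustration of detecting a blob one scan line at a time, here is a minimal sketch. It is not the Blackfin's actual firmware; the class name BlobTracker, the RGB pixel format and the colour limits are all made up for the example, and it only tracks a single bounding rectangle per colour range.

```python
def in_range(pixel, lo, hi):
    """True if every channel of pixel lies within [lo, hi] for that channel."""
    return all(l <= p <= h for p, l, h in zip(pixel, lo, hi))


class BlobTracker:
    """Tracks one bounding rectangle for one colour range, a scan line at a time."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.box = None   # (min_x, min_y, max_x, max_y), or None if nothing matched
        self.count = 0    # number of matching pixels seen so far

    def feed_line(self, y, line):
        """Update the rectangle from one scan line; the line is not kept."""
        for x, pixel in enumerate(line):
            if in_range(pixel, self.lo, self.hi):
                self.count += 1
                if self.box is None:
                    self.box = (x, y, x, y)
                else:
                    x0, y0, x1, y1 = self.box
                    self.box = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))


# Tiny made-up 5 x 4 test frame with a red patch in the middle.
RED = (200, 20, 20)
BLACK = (0, 0, 0)
frame = [
    [BLACK, BLACK, BLACK, BLACK, BLACK],
    [BLACK, RED,   RED,   BLACK, BLACK],
    [BLACK, RED,   RED,   RED,   BLACK],
    [BLACK, BLACK, BLACK, BLACK, BLACK],
]

tracker = BlobTracker(lo=(150, 0, 0), hi=(255, 80, 80))  # assumed "red" limits
for y, line in enumerate(frame):   # each line could be thrown away after this call
    tracker.feed_line(y, line)
print(tracker.box, tracker.count)  # (1, 1, 3, 2) 5
```

The only state kept between scan lines is the rectangle and a pixel count, so the memory needed is a handful of bytes rather than the whole frame.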
So if the car is red and the person is wearing blue then you stand a chance of detecting and distinguishing them both. But if you need to detect a car of any colour and a person dressed in any colour then you've got a much bigger job, as you would be performing shape detection. You would probably need to write that code yourself and then upload it to the Blackfin to replace its default image processing.
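To show why the colour-only approach works for the red car and blue person but not in the general case, here is a small sketch. The labels and the min/max RGB limits are hypothetical values picked for the example; real limits would need tuning for your camera and lighting.

```python
# Hypothetical colour ranges as (min RGB, max RGB); real values need tuning.
COLOUR_RANGES = {
    "car": ((150, 0, 0), (255, 80, 80)),      # reddish
    "person": ((0, 0, 150), (80, 80, 255)),   # bluish
}


def classify_pixel(pixel):
    """Return the label of the first colour range containing pixel, else None."""
    for label, (lo, hi) in COLOUR_RANGES.items():
        if all(l <= p <= h for p, l, h in zip(pixel, lo, hi)):
            return label
    return None


print(classify_pixel((200, 30, 30)))     # car
print(classify_pixel((20, 40, 220)))     # person
print(classify_pixel((120, 120, 120)))   # None
```

The last case is the problem: a grey car matches neither range, so colour alone can't find it and you are back to needing shape detection.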