After the robot recognizes the "Come Here" command, it should scan its vicinity with a camera and a PIR sensor (and perhaps a distance sensor as well) to locate the person. If the voice recognition software is trained on specific voices, the robot will know who issued the command and can use face recognition to search for that person. The PIR sensor lets the robot tell a picture from a real person, and a distance sensor or stereo vision can measure the distance to the person.
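The check described above could be sketched roughly as follows. This is only an illustration: the `Reading` record, its fields, and `find_caller` are hypothetical names, and it assumes the camera, PIR sensor, and distance sensor have already been read into one record per scan direction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    """One scan direction (hypothetical record, not a real sensor API)."""
    bearing_deg: float            # direction the robot was facing
    face_id: Optional[str]        # name returned by face recognition, if any
    pir_triggered: bool           # True if the PIR sensor saw body heat
    distance_m: Optional[float]   # range from distance sensor or stereo vision


def find_caller(readings: List[Reading], caller_id: str) -> Optional[Reading]:
    """Return the nearest reading that shows the caller as a live person."""
    # A matching face without a PIR hit is treated as a picture and skipped.
    live = [r for r in readings if r.face_id == caller_id and r.pir_triggered]
    if not live:
        return None
    # Readings without a measured distance sort last.
    return min(
        live,
        key=lambda r: r.distance_m if r.distance_m is not None else float("inf"),
    )


scan = [
    Reading(bearing_deg=30.0, face_id="alice", pir_triggered=False, distance_m=1.0),
    Reading(bearing_deg=120.0, face_id="alice", pir_triggered=True, distance_m=2.5),
    Reading(bearing_deg=200.0, face_id="bob", pir_triggered=True, distance_m=1.2),
]
target = find_caller(scan, "alice")
```

Here the reading at 30 degrees is rejected as a picture because the PIR sensor did not fire, so the robot would head toward 120 degrees.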
If the person is not in the room but is still audible, the robot can look for entrances, go to them, and locate the person with the method above. Alternatively, it can say "Please specify your location!" The robot should definitely do that if you talk to it through a wireless mic, since the sound of your voice then gives no clue to where you are.
If the voice software is not user trained (it recognizes any voice) and there are several people in the room, finding the right one will be more difficult. In that case the robot will need to triangulate the sound source.
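One common building block for locating a sound source is the time difference of arrival between two microphones. The sketch below is a minimal illustration, not a complete solution: `bearing_from_tdoa` is a hypothetical helper, it assumes the delay between the two mics has already been measured, and it assumes the speaker is far away compared with the mic spacing. A single pair of mics only gives a bearing; a real triangulation would need a second pair (or a measurement from a second robot position) so the two bearings can be intersected.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)


def bearing_from_tdoa(delay_s: float, mic_spacing_m: float,
                      speed: float = SPEED_OF_SOUND) -> float:
    """Estimate the bearing of a far-away sound source from two microphones.

    delay_s is (arrival time at left mic) - (arrival time at right mic),
    so a positive delay means the sound reached the right mic first.
    Returns degrees: 0 is straight ahead, positive is toward the right mic.
    """
    # Far-field model: path difference = spacing * sin(angle).
    ratio = speed * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# Mics 20 cm apart, sound reaches the right mic about 0.29 ms earlier.
angle = bearing_from_tdoa(0.1 / SPEED_OF_SOUND, 0.2)
```

With these numbers the path difference is half the mic spacing, so the estimated bearing is about 30 degrees to the right. The robot could turn that way and then confirm the person with the camera and PIR sensor.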