Society of Robots - Robot Forum

Software => Software => Topic started by: SeagullOne on May 20, 2010, 04:15:40 PM

Title: Face Detection and Speech recognition
Post by: SeagullOne on May 20, 2010, 04:15:40 PM
I'm in the process of reprogramming my robot, NINA.

It's programmed in Python.

Instead of using RoboRealm, as I did for previous versions of my robot, I've decided to switch to OpenCV. I'm pretty impressed, although I still have a lot to learn and it isn't as graphically easy to use.

The objective right now is for my robot to detect my face, say hello when greeted, and tell me where my face is (left or right of the camera).

I've done a lot of problem solving thus far, but one thing I can't seem to tackle: the speech synthesis is delayed or sometimes doesn't occur at all. The face detection and voice recognition run just fine, and when the speech synthesis does kick in, it gives the appropriate response. It's just that sometimes the TTS specifically won't kick in.

I tested the speech recognition and voice synthesis on their own in a separate .py file, and both run very smoothly. It's just when they're accompanied by the vision processing...
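For reference, the standalone test looked roughly like this (a minimal sketch, assuming the say() and listenfor() functions below are saved as an importable module, e.g. speech.py):

Code: [Select]
# speech_test.py -- minimal standalone check of TTS plus recognition
import time
from speech import say, listenfor

def on_phrase(phrase, listener):
    # echo whatever was recognized back through the TTS engine
    say("You said " + phrase)
    listener.stoplistening()

say("Testing text to speech.")
listener = listenfor(["Hello Nina", "Where am I"], on_phrase)

# block until the callback fires and stops the listener
while listener.islistening():
    time.sleep(.1)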

The code I'm using is listed below. Basically, I modified two existing Python scripts (one for face detection and the other for speech recognition) and added my own code to make them work for my robotics application. Credit to Michael Gundlach for the speech recognition.

Any idea how I could improve the performance of the voice synthesis? Anything wrong with the software here? Could it possibly be my hardware? I'm using an analogue wireless microphone, a Hercules webcam, and Python 2.6 with OpenCV, all on a Dell Studio XPS 8100.
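If it helps with diagnosis, timing the per-frame vision work would show whether the main loop is starving the speech thread; a minimal check (time.clock() is the high-resolution timer on Windows in Python 2):

Code: [Select]
import time

t0 = time.clock()
detect(frame) # the face-detection call from the main loop below
print "detect() took %.3f s" % (time.clock() - t0)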

Code: [Select]
import sys
import cv


##################Speech Recognition and Voice Synthesis#############################

from win32com.client import constants as _constants
import win32com.client
import pythoncom
import time
import thread

# Make sure that we've got our COM wrappers generated.
from win32com.client import gencache
gencache.EnsureModule('{C866CA3A-32F7-11D2-9602-00C04F8EE628}', 0, 5, 0)

_voice = win32com.client.Dispatch("SAPI.SpVoice")
_recognizer = win32com.client.Dispatch("SAPI.SpSharedRecognizer")
_listeners = []
_handlerqueue = []
_eventthread=None

class Listener(object):

    """Listens for speech and calls a callback on a separate thread."""

    _all = set()

    def __init__(self, context, grammar, callback):
        """
        This should never be called directly; use speech.listenfor()
        and speech.listenforanything() to create Listener objects.
        """
        self._grammar = grammar
        Listener._all.add(self)

        # Tell event thread to create an event handler to call our callback
        # upon hearing speech events
        _handlerqueue.append((context, self, callback))
        _ensure_event_thread()

    def islistening(self):
        """True if this Listener is listening for speech."""
        return self in Listener._all

    def stoplistening(self):
        """Stop listening for speech.  Returns True if we were listening."""

        try:
            Listener._all.remove(self)
        except KeyError:
            return False

        # This removes all refs to _grammar so the event handler can die
        self._grammar = None

        if not Listener._all:
            global _eventthread
            _eventthread = None # Stop the eventthread if it exists

        return True

_ListenerBase = win32com.client.getevents("SAPI.SpSharedRecoContext")
class _ListenerCallback(_ListenerBase):

    """Created to fire events upon speech recognition.  Instances of this
    class automatically die when their listener loses a reference to
    its grammar.  TODO: we may need to call self.close() to release the
    COM object, and we should probably make goaway() a method of self
    instead of letting people do it for us.
    """

    def __init__(self, oobj, listener, callback):
        _ListenerBase.__init__(self, oobj)
        self._listener = listener
        self._callback = callback

    def OnRecognition(self, _1, _2, _3, Result):
        # When our listener stops listening, it's supposed to kill this
        # object.  But COM can be funky, and we may have to call close()
        # before the object will die.
        if self._listener and not self._listener.islistening():
            self.close()
            self._listener = None

        if self._callback and self._listener:
            newResult = win32com.client.Dispatch(Result)
            phrase = newResult.PhraseInfo.GetText()
            self._callback(phrase, self._listener)

def say(phrase):
    """Say the given phrase out loud."""
    _voice.Speak(phrase)
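
# A hedged alternative (not in the original script): SAPI's Speak() takes a
# flags argument, and SVSFlagsAsync (== 1) makes the call return immediately
# instead of blocking until the phrase finishes playing.  Worth trying if
# say() is stalling the thread that calls it.
def say_async(phrase):
    """Say the given phrase without blocking the calling thread."""
    _voice.Speak(phrase, 1) # 1 == SVSFlagsAsync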


def input(prompt=None, phraselist=None):
    """
    Print the prompt if it is not None, then listen for a string in phraselist
    (or anything, if phraselist is None.)  Returns the string response that is
    heard.  Note that this will block the thread until a response is heard or
    Ctrl-C is pressed.
    """
    def response(phrase, listener):
        if not hasattr(listener, '_phrase'):
            listener._phrase = phrase # so outside caller can find it
        listener.stoplistening()

    if prompt:
        print prompt

    if phraselist:
        listener = listenfor(phraselist, response)
    else:
        listener = listenforanything(response)

    while listener.islistening():
        time.sleep(.1)

    return listener._phrase # hacky way to pass back a response...
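
# Example use of input() (a hedged illustration, not from the original
# script): block until one of two words is heard --
#     answer = input("Say left or right...", ["left", "right"])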

def stoplistening():
    """
    Cause all Listeners to stop listening.  Returns True if at least one
    Listener was listening.
    """
    listeners = set(Listener._all) # clone so stoplistening can pop()
    returns = [l.stoplistening() for l in listeners]
    return any(returns) # was at least one listening?

def islistening():
    """True if any Listeners are listening."""
    return bool(Listener._all)

def listenforanything(callback):
    """
    When anything resembling English is heard, callback(spoken_text, listener)
    is executed.  Returns a Listener object.

    The first argument to callback will be the string of text heard.
    The second argument will be the same listener object returned by
    listenforanything().

    Execution takes place on a single thread shared by all listener callbacks.
    """
    return _startlistening(None, callback)

def listenfor(phraselist, callback):
    """
    If any of the phrases in the given list are heard,
    callback(spoken_text, listener) is executed.  Returns a Listener object.

    The first argument to callback will be the string of text heard.
    The second argument will be the same listener object returned by
    listenfor().

    Execution takes place on a single thread shared by all listener callbacks.
    """
    return _startlistening(phraselist, callback)

def _startlistening(phraselist, callback):
    """
    Starts listening in Command-and-Control mode if phraselist is
    not None, or dictation mode if phraselist is None.  When a phrase is
    heard, callback(phrase_text, listener) is executed.  Returns a
    Listener object.

    The first argument to callback will be the string of text heard.
    The second argument will be the same listener object returned by
    listenfor().

    Execution takes place on a single thread shared by all listener callbacks.
    """
    # Make a command-and-control grammar       
    context = _recognizer.CreateRecoContext()
    grammar = context.CreateGrammar()

    if phraselist:
        grammar.DictationSetState(0)
        # SRATopLevel makes the rule recognizable on its own;
        # SRADynamic lets it be modified after it has been committed
        rule = grammar.Rules.Add("rule",
                _constants.SRATopLevel + _constants.SRADynamic, 0)
        rule.Clear()

        for phrase in phraselist:
            rule.InitialState.AddWordTransition(None, phrase)

        # Commit the grammar changes, then activate the rule
        grammar.Rules.Commit()
        grammar.CmdSetRuleState("rule", 1) # active
        grammar.Rules.Commit() # second commit is likely redundant but harmless
    else:
        grammar.DictationSetState(1)

    return Listener(context, grammar, callback)

def _ensure_event_thread():
    """
    Make sure the eventthread is running, which checks the handlerqueue
    for new eventhandlers to create, and runs the message pump.
    """
    global _eventthread
    if not _eventthread:
        def loop():
            while _eventthread:
                pythoncom.PumpWaitingMessages()
                if _handlerqueue:
                    (context,listener,callback) = _handlerqueue.pop()
                    # Just creating a _ListenerCallback object makes events
                    # fire till listener loses reference to its grammar object
                    _ListenerCallback(context, listener, callback)
                time.sleep(.5)
        _eventthread = 1 # so loop doesn't terminate immediately
        _eventthread = thread.start_new_thread(loop, ())


##########################Proceed with Vision Algorithms#########################

# Load the Haar cascade once at module scope; reloading the XML file on
# every frame stalls the main loop.  The raw string keeps the backslashes
# in the Windows path from being read as escape sequences.
cascade = cv.Load(r"C:\OpenCV2.1\data\haarcascades\haarcascade_frontalface_default.xml")

def detect(image):
    image_size = cv.GetSize(image)

    # create grayscale version
    grayscale = cv.CreateImage(image_size, 8, 1)
    cv.CvtColor(image, grayscale, cv.CV_BGR2GRAY)

    # create storage
    storage = cv.CreateMemStorage(0)

    # equalize histogram
    cv.EqualizeHist(grayscale, grayscale)

    # detect objects: scale factor 1.2, at least 2 neighbors,
    # minimum face size 50x50 pixels
    global faces
    faces = cv.HaarDetectObjects(grayscale, cascade, storage, 1.2, 2, 0, (50, 50))

    if faces:
        for (x,y,w,h),n in faces:
            pt1 = (x,y)
            pt2 = (x+w,y+h)
            cv.Rectangle(image, pt1, pt2, 255)

def Loc_Callback(phrase, listener):
    if phrase == "Where am I":
        if faces:
            for (x,y,w,h),n in faces:
                # x is the left edge of the face box in the 320-pixel frame
                if x >= 140:
                    say("You are at my right.")
                elif x <= 120:
                    say("You are at my left.")
                else: # 121 <= x <= 139
                    say("You are in front of me.")
        else:
            say("I can not find you")
    if phrase == "Hello Nina":
        if faces:
            say("Hello")
        else:
            say("Where are you?")

def listen(name, delay):
    # the arguments are unused placeholders for thread.start_new_thread;
    # the listener itself runs on the shared SAPI event thread
    listener1 = listenfor(["Where am I", "Hello Nina"], Loc_Callback)

if __name__ == "__main__":

    print "Press ESC to exit ..."
    # create windows
    cv.NamedWindow('Camera')
 
    # create capture device
    device = 0 # assume we want first device
    capture = cv.CreateCameraCapture(device)

    # check that the capture device opened before configuring it
    if not capture:
        print "Error opening capture device"
        sys.exit(1)

    cv.SetCaptureProperty(capture, cv.CV_CAP_PROP_FRAME_WIDTH, 320)
    cv.SetCaptureProperty(capture, cv.CV_CAP_PROP_FRAME_HEIGHT, 240)
 
    # start the speech listener once, on its own thread; starting a new
    # listener every frame would flood SAPI with duplicate grammars
    thread.start_new_thread(listen, ("speech thread", 2))

    while 1:
        # do forever
        # capture the current frame
        frame = cv.QueryFrame(capture)
        if frame is None:
            break
 
        # mirror
        cv.Flip(frame, None, 1)
 
        # face detection
        detect(frame)
 
        # display webcam image
        cv.ShowImage('Camera', frame)
 
        # handle events
        k = cv.WaitKey(10)
 
        if k == 0x1b: # ESC
            print 'ESC pressed. Exiting ...'
            break
Title: Re: Face Detection and Speech recognition
Post by: SeagullOne on May 26, 2010, 05:51:30 PM
Okay, I found the problem.

I had to insert a line of code to assert that listener1 was listening.  8)
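
For anyone hitting the same issue, the fix looks roughly like this in listen() (a sketch; the islistening() loop is the added part):

Code: [Select]
def listen(name, delay):
    listener1 = listenfor(["Where am I", "Hello Nina"], Loc_Callback)
    # keep this thread alive while listener1 is active, so the listener
    # and its grammar stay referenced and keep receiving events
    while listener1.islistening():
        time.sleep(.5)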