Tuesday, March 28, 2017

Easy Speech Recognition in Python with PyAudio and Pocketsphinx

If you remember, I started looking into audio processing in Python a couple of weeks back, with the idea of implementing an audio classification system (my earlier post). I got the PyAudio package set up and was having some success with it. As you know, one of the more interesting areas of audio processing in machine learning is speech recognition. So, although it wasn't the original intention of the project, I thought of trying out some speech recognition code as well.

I searched around to see what Python packages are available for the task, and found the SpeechRecognition package.

Python Speech Recognition running with Sphinx
SpeechRecognition is a library for speech recognition (as the name suggests), which can work with many speech engines and APIs. The current version supports the following engines and APIs:
  • CMU Sphinx
  • Google Speech Recognition
  • Google Cloud Speech API
  • Wit.ai
  • Microsoft Bing Voice Recognition
  • Houndify API
  • IBM Speech to Text
I decided to start out with the Sphinx engine, since it was the only one that worked offline. But keep in mind that Sphinx is not as accurate as something like Google Speech Recognition.

First, let's set up the SpeechRecognition package.


To start, you need to have the PyAudio package. SpeechRecognition requires PyAudio in order to interact with the microphone of your computer. If you don't have PyAudio installed already, you can follow the instructions from my earlier post to set it up.

Next, since we will be using the Sphinx engine, we need to install the pocketsphinx package,
 pip install pocketsphinx  

Finally, you can install SpeechRecognition, again from pip,
 pip install SpeechRecognition  
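
Before moving on, it can help to confirm that all three packages are actually importable. The snippet below is only a quick sketch: the helper name check_packages is my own, and it assumes the usual import names (pyaudio, pocketsphinx, speech_recognition) for these packages.

```python
import importlib.util


def check_packages(names=("pyaudio", "pocketsphinx", "speech_recognition")):
    """Return a dict mapping each module name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one line per package so a missing install is easy to spot
    for name, found in check_packages().items():
        print("{0}: {1}".format(name, "OK" if found else "missing"))
```

If any of them shows up as missing, re-run the matching pip install command before trying the script.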

With everything set up, we are ready to code our speech recognition script.

The basic code is quite simple,
 import speech_recognition as sr  
   
 # obtain audio from the microphone  
 r = sr.Recognizer()  
 with sr.Microphone() as source:  
   print("Say something!")  
   audio = r.listen(source)  
   
 # recognize speech using Sphinx  
 try:  
   print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")  
 except sr.UnknownValueError:  
   print("Sphinx could not understand audio")  
 except sr.RequestError as e:  
   print("Sphinx error; {0}".format(e))  

The code will create a Recognizer object, create a Microphone object, listen to the microphone to hear a spoken phrase, and use the appropriate recognizer engine ('recognize_sphinx' here) to recognize the phrase.

Sounds quite simple, right?

But if you run this code, you may find that it sometimes hangs and never recognizes that you are speaking.

Speech Recognition hangs, not recognizing you speaking
This happens due to ambient noise.

A typical microphone picks up a lot of noise from the background, even when we don't notice it ourselves, and this interferes with the speech recognition.

We need to filter out this ambient noise to make the speech recognition more reliable. You do this by setting the energy threshold of the Recognizer object. The energy threshold defines which sound levels count as noise and which count as speech. We need to set it so that the recognizer ignores the ambient noise in our environment and can focus on the speech. But how do we know which value to set the threshold to?
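
To make the idea of an energy threshold more concrete, here is a small sketch of the comparison involved. It is not the library's own code: the function names rms_energy and is_speech are mine, the audio is assumed to be 16-bit mono PCM, and the threshold of 300 is just an illustrative value.

```python
import math
import struct


def rms_energy(frame_bytes):
    """Root-mean-square level of a chunk of 16-bit little-endian mono PCM audio."""
    count = len(frame_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<{0}h".format(count), frame_bytes[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame_bytes, energy_threshold=300):
    """Treat a chunk as speech only if its energy is above the threshold."""
    return rms_energy(frame_bytes) > energy_threshold


# Two tiny hand-made chunks: low-level background hiss and a much louder signal
quiet = struct.pack("<4h", 50, -50, 50, -50)
loud = struct.pack("<4h", 2000, -2000, 2000, -2000)

print(is_speech(quiet), is_speech(loud))
```

If the threshold sits below the level of the room noise, every chunk looks like speech and the recognizer never sees the phrase end, which is consistent with the hanging described above.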

Luckily, the SpeechRecognition package has a built-in method to help us with that.

We just need to call the adjust_for_ambient_noise method. It listens to the environment for a short period, then calculates and sets an energy threshold suited to it.
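
As a rough illustration of what such a calibration step could look like, the sketch below nudges a threshold toward a level slightly above the measured background energy. This is a simplified model under my own assumptions, not the actual implementation inside SpeechRecognition: the function name calibrate_threshold, the ratio and smoothing parameters, and the energy readings are all made up for the example.

```python
def calibrate_threshold(ambient_energies, start=300.0, ratio=1.5, smoothing=0.5):
    """Move the threshold toward ratio * ambient energy, one chunk at a time."""
    threshold = start
    for energy in ambient_energies:
        target = energy * ratio  # aim a bit above the background level
        threshold = threshold * smoothing + target * (1 - smoothing)
    return threshold


# Energy readings from a few chunks of background noise (made-up numbers)
noisy_room = [800.0, 820.0, 790.0, 810.0]

print(calibrate_threshold(noisy_room))
```

In this model, listening for longer means more chunks are averaged in, so the threshold settles closer to the target level for the room.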

Here, I've set the duration to 5 seconds, so the recognizer listens to the ambient noise for that long before it starts listening for speech,
 import speech_recognition as sr  
   
 # obtain audio from the microphone  
 r = sr.Recognizer()  
 with sr.Microphone() as source:  
   print("Please wait. Calibrating microphone...")  
   # listen for 5 seconds and create the ambient noise energy level  
   r.adjust_for_ambient_noise(source, duration=5)  
   print("Say something!")  
   audio = r.listen(source)  
   
 # recognize speech using Sphinx  
 try:  
   print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")  
 except sr.UnknownValueError:  
   print("Sphinx could not understand audio")  
 except sr.RequestError as e:  
   print("Sphinx error; {0}".format(e))  

Now, when you run the code, you should see it recognize your speech.

Speech Recognition running with ambient noise cancelling

With that working, you can use this simple piece of code to build a program to respond to voice commands.
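
As a starting point for that, here is a minimal sketch of how recognized text could be mapped to actions. Everything in it is hypothetical: the handle_command function, the trigger phrases, and the messages are placeholders, and the input string stands in for whatever r.recognize_sphinx(audio) returns.

```python
def handle_command(text, commands):
    """Run the first action whose trigger phrase appears in the recognized text."""
    spoken = text.lower()
    for phrase, action in commands.items():
        if phrase in spoken:
            return action()
    return "Sorry, I did not catch a known command."


# Hypothetical commands; each action just returns a message here
commands = {
    "hello": lambda: "Hello to you too!",
    "lights on": lambda: "Turning the lights on.",
}

# In the real script, this string would come from r.recognize_sphinx(audio)
print(handle_command("please turn the lights on", commands))
```

Matching on a phrase inside the text, rather than on the whole sentence, leaves some room for the extra or misheard words that Sphinx tends to produce.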

Summary:
  • The SpeechRecognition library needs the PyAudio package to be installed in order for it to interact with the microphone input.
  • The SpeechRecognition library supports multiple speech engines and APIs. However, the CMU Sphinx engine, with the pocketsphinx library for Python, is the only one that works offline.
  • The pocketsphinx library was not as accurate as other engines like Google Speech Recognition in my testing. There may be ways to tweak it to be more accurate, but I need to explore it further.
  • If your code is not detecting speech when run, it's most probably due to the ambient noise the microphone might be picking up. 
  • To counter the ambient noise, you need to set a proper energy threshold on the Recognizer object. The easiest way to do it is to use the adjust_for_ambient_noise method.
  • There are other ways to adjust the energy threshold, which I will explain in a later post.

Next, I'm going to try out some of the other Speech Engines / APIs supported by SpeechRecognition.

Related posts:
Stepping into audio classification - Getting started with PyAudio

4 comments:

  1. hai
    i am getting raise AttributeError("PyAudio 0.2.11 or later is required (found version {})".format(pyaudio.__version__))
    AttributeError: PyAudio 0.2.11 or later is required (found version 0.2.8)

    i cant find new version

    1. Hi,
      PyAudio 0.2.11 is available from PIP,
      https://pypi.python.org/pypi/PyAudio/0.2.11

      Try running,
      pip install --upgrade pyaudio

  2. All these tutorials are using the microphone.

    How can I speech-to-text a recording which is saved in a .wav file?
