Tuesday, March 28, 2017

Easy Speech Recognition in Python with PyAudio and Pocketsphinx

If you remember, I started exploring audio processing in Python a couple of weeks back, with the idea of implementing an audio classification system (see my earlier post). I got the PyAudio package set up and was having some success with it. One of the more interesting areas of audio processing in machine learning is speech recognition. So, although it wasn't the original goal of the project, I thought of trying out some speech recognition code as well.

I searched around to see what Python packages are available for the task and found the SpeechRecognition package.

Python Speech Recognition running with Sphinx
SpeechRecognition is a library for speech recognition (as the name suggests) that can work with many speech engines and APIs. The current version supports the following engines and APIs:
  • CMU Sphinx
  • Google Speech Recognition
  • Google Cloud Speech API
  • Wit.ai
  • Microsoft Bing Voice Recognition
  • Houndify API
  • IBM Speech to Text
I decided to start with the Sphinx engine since it was the only one that worked offline. But keep in mind that Sphinx is not as accurate as something like Google Speech Recognition.

First, let's set up the SpeechRecognition package.

To start, you need to have the PyAudio package. SpeechRecognition requires PyAudio to interact with the microphone of your computer. If you don't have PyAudio installed already, you can follow the instructions from my earlier post to set it up.

Next, since we will be using the Sphinx engine, we need to install the pocketsphinx package,
 pip install pocketsphinx  

Finally, you can install SpeechRecognition, again from pip,
 pip install SpeechRecognition  

With everything set up, we are ready to code our speech recognition script.

The basic code is quite simple,
 import speech_recognition as sr  
 # obtain audio from the microphone  
 r = sr.Recognizer()  
 with sr.Microphone() as source:  
   print("Say something!")  
   audio = r.listen(source)  
 # recognize speech using Sphinx  
 try:  
   print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")  
 except sr.UnknownValueError:  
   print("Sphinx could not understand audio")  
 except sr.RequestError as e:  
   print("Sphinx error; {0}".format(e))  

The code will create a Recognizer object, create a Microphone object, listen to the microphone to hear a spoken phrase, and use the appropriate recognizer engine ('recognize_sphinx' here) to recognize the phrase.

Sounds quite simple, right?

But, if you run this code, you may find that the code hangs sometimes, not recognizing you speaking.

Speech Recognition hangs, not recognizing you speaking
This happens due to ambient noise.

A typical microphone picks up a lot of background noise, even when we don't notice it, and this interferes with the speech recognition.

We need to filter out this ambient noise to make the speech recognition more accurate. You do this by setting the energy threshold of the Recognizer object. The energy threshold is the loudness level that separates noise from speech: audio below it is treated as silence or background noise, and audio above it is treated as speech. We need to set the threshold high enough that the recognizer ignores the ambient noise in our environment and can focus on the speech. But how do we know what value to set the threshold to?
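
To get a feel for what an energy threshold does, here is a minimal sketch in plain Python. It assumes that "energy" means the RMS (root-mean-square) amplitude of a chunk of audio samples; the function names, the sample values, and the threshold of 300 are all made up for illustration, and the library's internals may differ.

```python
import math

def rms_energy(samples):
    """Root-mean-square amplitude of a chunk of signed audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(samples, energy_threshold):
    """Treat a chunk as speech only if its energy is above the threshold."""
    return rms_energy(samples) > energy_threshold

# Hypothetical chunks: low-level background hiss vs. a loud spoken word
quiet_chunk = [20, -15, 10, -25] * 256
loud_chunk = [3000, -2800, 3100, -2900] * 256

threshold = 300  # illustrative value only
print(is_speech(quiet_chunk, threshold))
print(is_speech(loud_chunk, threshold))
```

If the threshold is set too low, the background hiss also counts as "speech" and the recognizer never sees a pause that marks the end of a phrase, which matches the hanging behaviour described above.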

Luckily, the SpeechRecognition package has a built-in method to help us with that.

We just need to use the adjust_for_ambient_noise method, which listens to the environment for a short time and then calculates and sets a suitable energy threshold for it.
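
Conceptually, this kind of calibration can be pictured as nudging the threshold toward a level a bit above the measured background energy, one audio chunk at a time. The sketch below is a simplified illustration of that idea, not the library's actual code; the function name and the ratio and damping values are assumptions.

```python
def calibrate_threshold(chunk_energies, initial_threshold=300.0,
                        ratio=1.5, damping=0.85):
    """Move the threshold toward (ambient energy * ratio), chunk by chunk.

    ratio   -- how far above the ambient level the threshold should sit
    damping -- how much of the previous threshold is kept on each step
    """
    threshold = initial_threshold
    for energy in chunk_energies:
        target = energy * ratio
        threshold = threshold * damping + target * (1 - damping)
    return threshold

# Hypothetical ambient energy readings taken while nobody is speaking
ambient = [100.0] * 200
print(round(calibrate_threshold(ambient), 2))
```

The longer the calibration runs, the less the starting value matters, which is one reason to give it a few seconds of quiet before speaking.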

Here, I've set the duration to 5 seconds, so the recognizer listens to the ambient noise for that long before it starts listening for speech,
 import speech_recognition as sr  
 # obtain audio from the microphone  
 r = sr.Recognizer()  
 with sr.Microphone() as source:  
   print("Please wait. Calibrating microphone...")  
   # listen for 5 seconds and calibrate the energy threshold to the ambient noise  
   r.adjust_for_ambient_noise(source, duration=5)  
   print("Say something!")  
   audio = r.listen(source)  
 # recognize speech using Sphinx  
 try:  
   print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")  
 except sr.UnknownValueError:  
   print("Sphinx could not understand audio")  
 except sr.RequestError as e:  
   print("Sphinx error; {0}".format(e))  

Now, when you run the code, you will see it recognize your speech.

Speech Recognition running with ambient noise canceling

With that working, you can use this simple piece of code to build a program to respond to voice commands.
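
As a rough idea of how that could look, the sketch below matches the recognized text against a few trigger phrases and runs the first one that fits. The function name, the phrases, and the responses are all hypothetical; in a real program the text would come from r.recognize_sphinx(audio).

```python
def handle_command(text, commands):
    """Run the first command whose trigger phrase appears in the recognized text."""
    normalized = text.lower().strip()
    for phrase, action in commands.items():
        if phrase in normalized:
            return action()
    return "Sorry, I did not understand that."

# Hypothetical trigger phrases mapped to the actions they should run
commands = {
    "hello": lambda: "Hello there!",
    "what time": lambda: "Checking the clock...",
}

# A fixed string stands in for the recognizer's output here
print(handle_command("Hello computer", commands))
```

Matching on short phrases rather than whole sentences leaves some room for the recognizer getting a few of the surrounding words wrong.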

  • The SpeechRecognition library needs the PyAudio package to be installed for it to interact with the microphone input.
  • The SpeechRecognition library supports multiple speech engines and APIs. However, the CMU Sphinx engine, with the pocketsphinx library for Python, is the only one that works offline.
  • The pocketsphinx library was not as accurate as other engines like Google Speech Recognition in my testing. There may be ways to tweak it to be more accurate, but I need to explore it further.
  • If your code is not detecting speech when run, it's most probably due to the ambient noise the microphone is picking up.
  • To counter the ambient noise, you need to set a proper energy threshold on the Recognizer object. The easiest way to do it is to use the adjust_for_ambient_noise method.
  • There are other ways to adjust the energy threshold, which I will explain in a later post.
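
As one illustration of adjusting the threshold by hand, the sketch below sets the Recognizer's energy_threshold attribute directly and turns off dynamic_energy_threshold so the value stays fixed. The helper name set_fixed_threshold and the value 4000 are assumptions for illustration, and a SimpleNamespace stands in for the Recognizer so the snippet runs without a microphone or the library installed.

```python
from types import SimpleNamespace

def set_fixed_threshold(recognizer, threshold):
    """Pin the energy threshold and stop the recognizer from auto-adjusting it."""
    recognizer.energy_threshold = threshold
    recognizer.dynamic_energy_threshold = False
    return recognizer

# Stand-in object; with the real library you would pass sr.Recognizer() instead.
r = SimpleNamespace(energy_threshold=300, dynamic_energy_threshold=True)

set_fixed_threshold(r, 4000)  # 4000 is an illustrative value, not a recommendation
print(r.energy_threshold, r.dynamic_energy_threshold)
```

A fixed value like this only makes sense when the environment is stable; in a room where the noise level changes, calibrating with adjust_for_ambient_noise is likely the safer starting point.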

Next, I'm going to try out some of the other Speech Engines / APIs supported by SpeechRecognition.


  1. hai
    i am getting raise AttributeError("PyAudio 0.2.11 or later is required (found version {})".format(pyaudio.__version__))
    AttributeError: PyAudio 0.2.11 or later is required (found version 0.2.8)

    i cant find new version

    1. Hi,
      PyAudio 0.2.11 is available from PIP,

      Try running,
      pip install --upgrade pyaudio

  2. All these tutorials are using the microphone.

    How can I speech-to-text a recording which is saved in a .wav file?

    1. Hi George,

      Here is an example that shows how to use an audio file instead of the microphone,

      I'm hoping to do a tutorial on it soon.

  3. I am getting error as below-
    Traceback (most recent call last):
    File "/home/sneha123/.local/lib/python2.7/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
    File "/home/sneha123/.local/lib/python2.7/site-packages/pip/commands/install.py", line 342, in run
    File "/home/sneha123/.local/lib/python2.7/site-packages/pip/req/req_set.py", line 784, in install
    File "/home/sneha123/.local/lib/python2.7/site-packages/pip/req/req_install.py", line 851, in install
    self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
    File "/home/sneha123/.local/lib/python2.7/site-packages/pip/req/req_install.py", line 1064, in move_wheel_files
    File "/home/sneha123/.local/lib/python2.7/site-packages/pip/wheel.py", line 345, in move_wheel_files
    clobber(source, lib_dir, True)
    File "/home/sneha123/.local/lib/python2.7/site-packages/pip/wheel.py", line 323, in clobber
    shutil.copyfile(srcfile, destfile)
    File "/usr/lib/python2.7/shutil.py", line 83, in copyfile
    with open(dst, 'wb') as fdst:
    IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/pocketsphinx/__init__.pyc'

    How to remove it??

  4. Please reply as early as possible!!

    1. Hi Sneha,

      I haven't come across this error. But it seems to be a file permission issue. So, try installing with 'sudo'
      sudo pip install pocketsphinx

      Hope this helps.

  5. Above code when run shows error :

    Sphinx error; missing PocketSphinx module: ensure that PocketSphinx is set up correctly.

    The code without the adjust_method also shows the same error.
    Please Reply asap

  6. I got this error:

    python: pcm.c:2757: snd_pcm_area_copy: Assertion `src < dst || src >= dst + bytes' failed.

    However, if I run below command on my terminal, it worked:

    python -m speech_recognition

    How can I solve this?

  7. how to identify language automatically from audio.. using python. please tell me

  8. Hello, I keep getting the following error:

    File "C:\Users\ccatx\Downloads\pystuff\Lib\site-packages\pocketsphinx\pocketsphinx.py", line 275, in __init__
    this = _pocketsphinx.new_Decoder(*args)

    RuntimeError: new_Decoder returned -1

    Do you have any suggestions on how to fix this issue? Thanks in advance!

  9. Thanks for this web. I am trying to run your simple codes on this page but got
    OSError: [Errno -9988] Stream closed

    I think I installed pyaudio correctly. What shall I do? thanks,

    1. Hi,
      What is the version of Python you're using, and the version of PyAudio?

    2. I use python 3.6.3 (spider 3.2.4). I installed PyAudio recently. It shall be the latest version. I couldn't find where I installed it. It is not in Anaconda3 folder. I can use PyAudio to transfer Speech to Wav file, and then use Wav file as the source for speechRecognizer. That was not problem. However, I want to directly use the source from Microphone as done in your script. Is it because the microphone device setup? thanks.

  10. CMU sphinx speech recognition is an open source it works offline . so if unplug my internet cable it should work ..but it is not working..why ?

  12. Hello, thanks for your tutorial. It's awesome. I checked your other tutorials also, all are helpful.

    Anyway, I made a speech recognition using Google Speech Recognition api. Everything works as expected but I find out that it is always listening. I just want to activate it when I say "Hello Mark". For example, Amazon Alexa. Alexa isn't always listening my voice. When I say "Alexa", it only then activate and take my voice. I want to implement the same technique in my voice recognition app. Is it possible? How?

    Thanks again

  13. what if I wan't to do it with my own acoustic model?

  14. Hi sir,
    while doing pip install pocketsphinx I am getting below error please help me to fix this.Thanks in advance for the help.
    tils\dist.py:274: UserWarning: Unknown distribution option: 'long_description_co
    running install
    running build_ext
    building 'sphinxbase._sphinxbase' extension
    swigging deps/sphinxbase/swig/sphinxbase.i to deps/sphinxbase/swig/sphinxbas
    -python -modern -threads -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sph
    inxbase -Ideps/sphinxbase/include/win32 -Ideps/sphinxbase/swig -outdir sphinxbas
    e -o deps/sphinxbase/swig/sphinxbase_wrap.c deps/sphinxbase/swig/sphinxbase.i
    (1) : Error: Unable to find 'swig.swg'
    (3) : Error: Unable to find 'python.swg'
    deps\sphinxbase\swig\typemaps.i(1) : Error: Unable to find 'exception.i'
    error: command 'C:\\Users\\kiran.koribilli\\AppData\\Local\\Programs\\Python
    \\Python37-32\\swig.exe' failed with exit status 1

    Command "c:\users\kiran.koribilli\appdata\local\programs\python\python37-32\pyth
    on.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\KIRAN~1.KOR\\AppD
    ze, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(c
    ompile(code, __file__, 'exec'))" install --record C:\Users\KIRAN~1.KOR\AppData\L
    ocal\Temp\pip-record-vkug6dg6\install-record.txt --single-version-externally-man
    aged --compile" failed with error code 1 in C:\Users\KIRAN~1.KOR\AppData\Local\T


  15. hello
    can i specify and determining voice commands in pocketsphinx?
    I almost need just 10 commands

  16. how can i specify and determining voice commands in PocketSphinx which i almost need to recognize between ten words

  17. Please wait. Calibrating microphone...
    Traceback (most recent call last):
    File "c:\Users\Mayank\Final_Project\new.py", line 8, in <module>
    r.adjust_for_ambient_noise(source, duration=5)
    AttributeError: 'Recognizer' object has no attribute 'adjust_for_ambient_noise'

    Bro I am getting a error like this.