Tuesday, March 28, 2017

Easy Speech Recognition in Python with PyAudio and Pocketsphinx

A couple of weeks back I started experimenting with audio processing in Python, with the idea of building an audio classification system (see my earlier post). I got the PyAudio package set up and was having some success with it. Speech recognition is one of the more interesting applications of machine learning to audio, so although it wasn't part of my original plan for the project, I decided to try out some speech recognition code as well.

I searched around to see what Python packages are available for the task and found the SpeechRecognition package.

Python Speech Recognition running with Sphinx

SpeechRecognition is a library for speech recognition (as the name suggests) that can work with many speech engines and APIs. The current version supports the following engines and APIs:

CMU Sphinx
Google Speech Recognition
Google Cloud Speech API
Wit.ai
Microsoft Bing Voice Recognition
Houndify API
IBM Speech to Text

I decided to start with the Sphinx engine since it was the only one that worked offline. But keep in mind that Sphinx is not as accurate as something like Google Speech Recognition.

First, let's set up the SpeechRecognition package.

To start, you need to have the PyAudio package. SpeechRecognition requires PyAudio to interact with the microphone of your computer. If you don't have PyAudio installed already, you can follow the instructions from my earlier post to set it up.

Next, since we will be using the Sphinx engine, we need to install the pocketsphinx package:

 pip install pocketsphinx

Finally, you can install SpeechRecognition, again from pip:

 pip install SpeechRecognition
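
Before moving on, it can be worth checking that all three packages are importable. The snippet below is a minimal sketch of such a check, not part of the original setup. The helper name check_modules is made up for illustration; the module names are the import names of PyAudio (pyaudio), pocketsphinx, and SpeechRecognition (speech_recognition).

```python
import importlib.util

# Import names of the three packages installed above.
REQUIRED = ("pyaudio", "pocketsphinx", "speech_recognition")


def check_modules(names=REQUIRED):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(name, "OK" if found else "MISSING")
```

If any line reports MISSING, repeat the matching pip install step before running the recognition script.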

With everything set up, we are ready to code our speech recognition script.

The basic code is quite simple:

 import speech_recognition as sr  
   
 # obtain audio from the microphone  
 r = sr.Recognizer()  
 with sr.Microphone() as source:  
   print("Say something!")  
   audio = r.listen(source)  
   
 # recognize speech using Sphinx  
 try:  
   print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")  
 except sr.UnknownValueError:  
   print("Sphinx could not understand audio")  
 except sr.RequestError as e:  
   print("Sphinx error; {0}".format(e))

The code will create a Recognizer object, create a Microphone object, listen to the microphone to hear a spoken phrase, and use the appropriate recognizer engine ('recognize_sphinx' here) to recognize the phrase.

Sounds quite simple, right?

But if you run this code, you may find that it sometimes hangs and never registers that you have spoken.

Speech Recognition hangs, not recognizing you speaking

This happens due to ambient noise.

A typical microphone picks up a lot of background noise, even when we don't notice it ourselves, and that noise interferes with the speech recognition.

We need to filter out this ambient noise to make the speech recognition more reliable. You do this by setting the energy threshold of the Recognizer object. The energy threshold defines which audio levels count as noise and which count as speech. If it is set too low, the recognizer treats the background noise as speech and keeps waiting for the phrase to end, which is why the script appears to hang. We need to set the threshold high enough that the recognizer ignores the ambient noise in our environment and focuses on the speech. But how do we know which value to use?

Luckily, the SpeechRecognition package has a built-in method to help us with that.

We just need to call the adjust_for_ambient_noise method. It listens to the environment for a short period, then calculates and sets a suitable energy threshold from what it hears.
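
To make the idea of an energy threshold more concrete, here is a small sketch that runs without a microphone. It is not the library's actual implementation. It assumes 16-bit signed PCM frames, measures each frame's loudness as its root-mean-square (RMS) value, and sets the threshold a little above the loudest ambient frame. The function names and the margin factor of 1.5 are assumptions made for illustration.

```python
import array
import math


def rms_energy(frame_bytes):
    """Root-mean-square energy of a frame of 16-bit signed PCM samples."""
    samples = array.array("h", frame_bytes)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(ambient_frames, margin=1.5):
    """Pick a threshold somewhat above the loudest ambient frame.

    The margin factor is an assumed value, not taken from SpeechRecognition.
    """
    return margin * max(rms_energy(frame) for frame in ambient_frames)


def is_speech(frame_bytes, threshold):
    """Treat a frame as speech only if its energy exceeds the threshold."""
    return rms_energy(frame_bytes) > threshold


# Synthetic frames standing in for microphone input.
quiet = array.array("h", [100, -100] * 50).tobytes()    # background noise
loud = array.array("h", [5000, -5000] * 50).tobytes()   # someone speaking

threshold = calibrate_threshold([quiet])
print("threshold:", threshold)
print("quiet frame is speech:", is_speech(quiet, threshold))
print("loud frame is speech:", is_speech(loud, threshold))
```

With a threshold calibrated this way, the quiet frame is ignored and only the loud frame counts as speech, which is the behaviour we want from the recognizer.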

In the script below, I've set the calibration duration to 5 seconds:

 import speech_recognition as sr  
   
 # obtain audio from the microphone  
 r = sr.Recognizer()  
 with sr.Microphone() as source:  
   print("Please wait. Calibrating microphone...")  
   # listen for 5 seconds and create the ambient noise energy level  
   r.adjust_for_ambient_noise(source, duration=5)  
   print("Say something!")  
   audio = r.listen(source)  
   
 # recognize speech using Sphinx  
 try:  
   print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")  
 except sr.UnknownValueError:  
   print("Sphinx could not understand audio")  
 except sr.RequestError as e:  
   print("Sphinx error; {0}".format(e))

Now, when you run the code, you will see it recognize your speech.


Speech Recognition running with ambient noise canceling

With that working, you can use this simple piece of code to build a program to respond to voice commands.
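
As a rough sketch of what that could look like, the snippet below maps recognized text to actions. Everything in it is hypothetical: the handle_command function, the COMMANDS table, and the trigger phrases are made up for illustration, and the input string stands in for whatever r.recognize_sphinx(audio) returns.

```python
def normalize(text):
    """Lower-case the text and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def handle_command(text, commands):
    """Run the first action whose trigger phrase appears in the text.

    Returns the action's result, or None if no trigger matches.
    """
    phrase = normalize(text)
    for trigger, action in commands.items():
        if trigger in phrase:
            return action()
    return None


# Hypothetical command table: trigger phrase -> function to call.
COMMANDS = {
    "hello": lambda: "Hi there!",
    "lights on": lambda: "Turning the lights on",
}

# In the real script this string would come from r.recognize_sphinx(audio).
print(handle_command("Hello   computer", COMMANDS))
print(handle_command("please turn the LIGHTS ON", COMMANDS))
print(handle_command("something else", COMMANDS))
```

Matching on substrings is forgiving of extra words around the command, which helps when the recognizer's output is not exact.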

Summary:

The SpeechRecognition library needs the PyAudio package to be installed for it to interact with the microphone input.
The SpeechRecognition library supports multiple Speech Engines and APIs. However, the CMU Sphinx engine, with the pocketsphinx library for Python, is the only one that works offline.
The pocketsphinx library was not as accurate as other engines like Google Speech Recognition in my testing. There may be ways to tweak it to be more accurate, but I need to explore it further.
If your code is not detecting speech when run, it's most probably due to the ambient noise the microphone might be picking up.
To counter the ambient noise, you need to set a proper energy threshold on the Recognizer object. The easiest way to do it is to use the adjust_for_ambient_noise method.
There are other ways to adjust the energy threshold, which I will explain in a later post.
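
One such way, sketched here as an assumption rather than a tested recipe, is to set the threshold by hand. The Recognizer object exposes energy_threshold and dynamic_energy_threshold attributes. The helper name set_fixed_threshold and the value 4000 are made up for illustration, and a stand-in object is used so the sketch runs without the library or a microphone.

```python
from types import SimpleNamespace


def set_fixed_threshold(recognizer, threshold):
    """Set a fixed energy threshold and turn off automatic adjustment."""
    recognizer.energy_threshold = threshold
    # With dynamic adjustment left on, the threshold would keep changing
    # while the recognizer listens.
    recognizer.dynamic_energy_threshold = False
    return recognizer


# Stand-in object for the demo; with the real library you would pass
# sr.Recognizer() instead. The value 4000 is an arbitrary example.
demo = set_fixed_threshold(SimpleNamespace(), 4000)
print(demo.energy_threshold, demo.dynamic_energy_threshold)
```

A fixed value only makes sense when the noise level of the room stays roughly constant. Otherwise the automatic calibration is the safer choice.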

Next, I'm going to try out some of the other Speech Engines / APIs supported by SpeechRecognition.


24 comments:

Unknown, September 19, 2017 at 1:57 PM
Love it !!

Vikneshwar Thandeswaran, September 22, 2017 at 4:52 PM
hai
i am getting raise AttributeError("PyAudio 0.2.11 or later is required (found version {})".format(pyaudio.__version__))
AttributeError: PyAudio 0.2.11 or later is required (found version 0.2.8)

i cant find new version

Unknown, September 26, 2017 at 2:39 PM
All these tutorials are using the microphone.

How can I speech-to-text a recording which is saved in a .wav file?

Sneha M., October 3, 2017 at 10:26 AM
I am getting error as below-
Exception:
Traceback (most recent call last):
File "/home/sneha123/.local/lib/python2.7/site-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/home/sneha123/.local/lib/python2.7/site-packages/pip/commands/install.py", line 342, in run
prefix=options.prefix_path,
File "/home/sneha123/.local/lib/python2.7/site-packages/pip/req/req_set.py", line 784, in install
**kwargs
File "/home/sneha123/.local/lib/python2.7/site-packages/pip/req/req_install.py", line 851, in install
self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
File "/home/sneha123/.local/lib/python2.7/site-packages/pip/req/req_install.py", line 1064, in move_wheel_files
isolated=self.isolated,
File "/home/sneha123/.local/lib/python2.7/site-packages/pip/wheel.py", line 345, in move_wheel_files
clobber(source, lib_dir, True)
File "/home/sneha123/.local/lib/python2.7/site-packages/pip/wheel.py", line 323, in clobber
shutil.copyfile(srcfile, destfile)
File "/usr/lib/python2.7/shutil.py", line 83, in copyfile
with open(dst, 'wb') as fdst:
IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/pocketsphinx/__init__.pyc'

How to remove it??

Sneha M., October 3, 2017 at 10:29 AM
Please reply as early as possible!!

DAVE, October 7, 2017 at 10:53 PM
Above code when run shows error :

Sphinx error; missing PocketSphinx module: ensure that PocketSphinx is set up correctly.

The code without the adjust_method also shows the same error.
Please Reply asap

Chandra, November 23, 2017 at 2:52 PM
I got this error:

python: pcm.c:2757: snd_pcm_area_copy: Assertion `src < dst || src >= dst + bytes' failed.
Aborted

However, if I run below command on my terminal, it worked:

python -m speech_recognition

How can I solve this?

Unknown, January 9, 2018 at 11:16 AM
How can I identify the language automatically from audio using Python? Please tell me.

Unknown, April 2, 2018 at 3:58 AM
Hello, I keep getting the following error:

File "C:\Users\ccatx\Downloads\pystuff\Lib\site-packages\pocketsphinx\pocketsphinx.py", line 275, in __init__
this = _pocketsphinx.new_Decoder(*args)

RuntimeError: new_Decoder returned -1

Do you have any suggestions on how to fix this issue? Thanks in advance!

Unknown, April 10, 2018 at 8:26 PM
Thanks for this web. I am trying to run your simple codes on this page but got
OSError: [Errno -9988] Stream closed

I think I installed pyaudio correctly. What shall I do? thanks,

Unknown, April 11, 2018 at 5:01 PM
CMU Sphinx speech recognition is open source and works offline, so if I unplug my internet cable it should still work. But it is not working. Why?

Kamrul Ahsan, July 12, 2018 at 11:28 PM
Hello, thanks for your tutorial. It's awesome. I checked your other tutorials also, all are helpful.

Anyway, I made a speech recognition app using the Google Speech Recognition API. Everything works as expected, but I found that it is always listening. I just want to activate it when I say "Hello Mark". Take Amazon Alexa, for example. Alexa isn't always listening to my voice. Only when I say "Alexa" does it activate and take my voice input. I want to implement the same technique in my voice recognition app. Is it possible? How?

Thanks again

Unknown, August 5, 2018 at 8:18 PM
What if I want to do it with my own acoustic model?

Unknown, August 13, 2018 at 10:16 AM
Hi sir,
While doing pip install pocketsphinx I am getting the error below. Please help me fix this. Thanks in advance for the help.
c:\users\kiran.koribilli\appdata\local\programs\python\python37-32\lib\distutils\dist.py:274: UserWarning: Unknown distribution option: 'long_description_content_type'
warnings.warn(msg)
running install
running build_ext
building 'sphinxbase._sphinxbase' extension
swigging deps/sphinxbase/swig/sphinxbase.i to deps/sphinxbase/swig/sphinxbase_wrap.c
C:\Users\kiran.koribilli\AppData\Local\Programs\Python\Python37-32\swig.exe -python -modern -threads -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sphinxbase -Ideps/sphinxbase/include/win32 -Ideps/sphinxbase/swig -outdir sphinxbase -o deps/sphinxbase/swig/sphinxbase_wrap.c deps/sphinxbase/swig/sphinxbase.i
(1) : Error: Unable to find 'swig.swg'
(3) : Error: Unable to find 'python.swg'
deps\sphinxbase\swig\typemaps.i(1) : Error: Unable to find 'exception.i'
error: command 'C:\\Users\\kiran.koribilli\\AppData\\Local\\Programs\\Python\\Python37-32\\swig.exe' failed with exit status 1

----------------------------------------
Command "c:\users\kiran.koribilli\appdata\local\programs\python\python37-32\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\KIRAN~1.KOR\\AppData\\Local\\Temp\\pip-install-roviraq_\\pocketsphinx\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\KIRAN~1.KOR\AppData\Local\Temp\pip-record-vkug6dg6\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\KIRAN~1.KOR\AppData\Local\Temp\pip-install-roviraq_\pocketsphinx\

C:\Users\kiran.koribilli>

Habib Kenjrawy, March 24, 2019 at 11:54 PM
hello
can i specify and determining voice commands in pocketsphinx?
I almost need just 10 commands

Habib Kenjrawy, April 1, 2019 at 11:06 PM
how can i specify and determining voice commands in PocketSphinx which i almost need to recognize between ten words

Unknown, September 25, 2020 at 8:38 PM
Please wait. Calibrating microphone...
Traceback (most recent call last):
File "c:\Users\Mayank\Final_Project\new.py", line 8, in
r.adjust_for_ambient_noise(source, duration=5)
AttributeError: 'Recognizer' object has no attribute 'adjust_for_ambient_noise'

Bro I am getting a error like this.