Showing posts with label PyAudio. Show all posts

Wednesday, April 26, 2017

Energy Threshold Calibration in Speech Recognition

In my last post on Speech Recognition, I showed how to set up the Python SpeechRecognition package with PyAudio and pocketsphinx to recognize speech with just a few lines of code. And, as you may remember, we ran into an issue where the speech recognition would just hang there, unable to recognize that we were speaking.

Speech Recognition just hanging there, not recognizing that you're speaking

We found out that this was happening due to ambient noise.

Although we humans are able to distinguish speech from noise naturally, to a computer program they are all just audio levels. It needs to know which levels should be considered speech (which it needs to process in order to recognize what's being said), and which levels should be considered silence or background noise. So, libraries like SpeechRecognition have an energy threshold, which defines the audio level at and above which the input is considered speech.
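To make the idea concrete, here is a minimal sketch of an energy threshold check in plain Python. It is not the SpeechRecognition package's actual code: the helper names `rms_energy` and `is_speech`, the sample values, and the threshold of 300 are all assumptions chosen for illustration.

```python
import math


def rms_energy(samples):
    """Return the root-mean-square level of a chunk of signed PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples, energy_threshold=300):
    """Treat a chunk as speech when its RMS level reaches the threshold.

    The default of 300 is only an illustrative value.
    """
    return rms_energy(samples) >= energy_threshold


# Made-up chunks: a quiet room versus someone talking close to the microphone.
quiet_chunk = [20, -15, 10, -25] * 256
loud_chunk = [2000, -1800, 2200, -2100] * 256

print("quiet chunk is speech:", is_speech(quiet_chunk))
print("loud chunk is speech:", is_speech(loud_chunk))
```

If the room gets noisier, the level of the "quiet" chunk rises, and a fixed threshold eventually starts treating background noise as speech.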

Now, this default energy threshold works most of the time. If your environment is sufficiently quiet, it will be able to recognize you talking without problems. But, if your environment is noisy - e.g. an office with many people talking, or a room with machinery running - then the program will have trouble distinguishing speech from noise, which causes the issue we observed.

So, in a situation like that, we should adjust the energy threshold to properly distinguish speech from noise. The SpeechRecognition package has a couple of parameters that help you with this.
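As a rough sketch of how that adjustment could look, the snippet below lets the recognizer sample the ambient noise before listening. It assumes the SpeechRecognition package, PyAudio and pocketsphinx are installed and that a working microphone is attached. The function name `recognize_with_calibration` and the one-second calibration window are choices made for this illustration only.

```python
def recognize_with_calibration(calibration_seconds=1.0):
    """Calibrate the energy threshold against ambient noise, then listen once.

    Returns the recognized text, or None if Sphinx could not understand
    the audio. Requires the third-party SpeechRecognition, PyAudio and
    pocketsphinx packages plus a microphone.
    """
    # Imported here so that merely defining the function needs no extras.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # Let the recognizer keep adapting the threshold while it listens.
    recognizer.dynamic_energy_threshold = True

    with sr.Microphone() as source:
        # Sample the background noise and set energy_threshold from it.
        recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
        print("Calibrated energy threshold:", recognizer.energy_threshold)
        print("Say something!")
        audio = recognizer.listen(source)

    try:
        return recognizer.recognize_sphinx(audio)
    except sr.UnknownValueError:
        # Sphinx could not make sense of the audio.
        return None
```

Calling `recognize_with_calibration()` in a noisy room should print a higher calibrated threshold than in a quiet one. The threshold can also be set by hand by assigning a number to `recognizer.energy_threshold`, which may work better when the noise level is known and steady.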

Tuesday, March 28, 2017

Easy Speech Recognition in Python with PyAudio and Pocketsphinx

If you remember, I was getting started with Audio Processing in Python (thinking of implementing an audio classification system) a couple of weeks back (see my earlier post). I got the PyAudio package set up and was having some success with it. As you know, one of the more interesting applications of machine learning to audio processing is Speech Recognition. So, although it wasn't the original intention of the project, I thought of trying out some speech recognition code as well.

I searched around to see what Python packages are available for the task and found the SpeechRecognition package.

Python Speech Recognition running with Sphinx
SpeechRecognition is a library for Speech Recognition (as the name suggests), which can work with many Speech Engines and APIs. The current version supports the following engines and APIs:
  • CMU Sphinx
  • Google Speech Recognition
  • Google Cloud Speech API
  • Wit.ai
  • Microsoft Bing Voice Recognition
  • Houndify API
  • IBM Speech to Text
I decided to start with the Sphinx engine since it was the only one that worked offline. But keep in mind that Sphinx is not as accurate as something like Google Speech Recognition.

First, let's set up the SpeechRecognition package.

Friday, January 13, 2017

Stepping into audio classification - Getting started with PyAudio

I've had an idea to attempt to train a deep learning model that can classify different kinds of sounds. A system that can accurately identify different sounds would have implications in many fields, from medical diagnostics to echolocation systems, among others. I don't expect building such a system to be an easy task. But, let's give it a try.

I first had the idea about 8 years back. But back then, libraries that enabled audio processing were scarce, and what was available required a massive effort to get installed. Now it seems that things have improved a lot. I was searching online for audio processing libraries for Python, and I came across this excellent post - Realtime Audio Visualization in Python - by Scott Harden of the University of Florida.

Realtime Audio Visualization with PyAudio

The tutorial uses the PyAudio Python package (PyAudio homepage), which provides Python bindings for PortAudio (PortAudio homepage) - a cross-platform audio I/O library - allowing PyAudio to give a consistent interface for processing audio across platforms. So, I decided to give PyAudio a try. First of all, I needed to install PyAudio.

Installing PyAudio


At the time of this writing, the latest version of PyAudio is v0.2.9, and the PyAudio team has made the installation as simple as possible.