Thursday, May 18, 2017

What is Deep Learning? - Updated

What is Deep Learning? And how does it relate to Machine Learning and Artificial Intelligence?

I wrote an article answering these questions some time back.

Now, thanks to the feedback I got from you all, I was able to update it with more clarifications, improved examples, and answers to more Deep Learning questions.


Check out the updated article here,


Your feedback is always welcome.

Tuesday, May 9, 2017

image_data_format vs. image_dim_ordering in Keras v2

If you have been using Keras for some time, you probably know its image_dim_ordering parameter, especially if you frequently switch between the TensorFlow and Theano backends.

When I first started using Keras for image classification, most of my experiments failed because I had set the image_dim_ordering incorrectly. Learning from my mistakes, last year I wrote a post on what image_dim_ordering is and why it is important.

In short, image_dim_ordering instructed Keras to properly rearrange the image data structure when passing it to the backend:
Both TensorFlow and Theano expect 4D tensors of image data as input. But while TensorFlow expects the structure/shape to be (samples, rows, cols, channels), Theano expects it to be (samples, channels, rows, cols). So, setting image_dim_ordering to 'tf' made Keras use the TensorFlow ordering, while setting it to 'th' made it use the Theano ordering.
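As a rough illustration (the batch size and image dimensions below are just made-up numbers), here's how the same batch of images would be shaped under each ordering:

  import numpy as np

  # A hypothetical batch of 32 RGB images, each 28x28 pixels
  batch_tf = np.zeros((32, 28, 28, 3))        # 'tf' ordering: (samples, rows, cols, channels)
  batch_th = batch_tf.transpose(0, 3, 1, 2)   # 'th' ordering: (samples, channels, rows, cols)

  print(batch_tf.shape)   # (32, 28, 28, 3)
  print(batch_th.shape)   # (32, 3, 28, 28)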

At least, that's how it used to work.

But if you have recently updated to the latest version of Keras, you might have run into dimension ordering issues, even if you're sure you set the image_dim_ordering correctly.

You may have gotten errors like,
 ValueError: The shape of the input to "Flatten" is not fully defined (got (0, 7, 50).
 Make sure to pass a complete "input_shape" or "batch_input_shape" argument
 to the first layer in your model.

It may seem to you that Keras has started to ignore your image_dim_ordering setting.

And you're right.
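As the post title gives away, Keras 2 has replaced image_dim_ordering with a new setting named image_data_format, which takes the value 'channels_last' in place of 'tf' and 'channels_first' in place of 'th'. As a minimal sketch, you can check and change it from code like this (the persistent setting lives in the ~/.keras/keras.json file):

  from keras import backend as K

  # Check which data format Keras is currently using
  print(K.image_data_format())   # 'channels_last' or 'channels_first'

  # Set it for the current session (the permanent setting is in ~/.keras/keras.json)
  K.set_image_data_format('channels_last')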

Wednesday, May 3, 2017

Visualizing Keras Models - Updated

About 2 months back, I wrote a post on how you can visualize the structure of a Keras model. As I mentioned there, when the machine learning (or deep learning) model you're building is complex, it can be easier to understand if you can see a visual representation of it.

I showed you how to use the visualization utility in Keras to draw the structure of a model, such as this visualization of the LeNet model,

Visualizing the LeNet model

But a few days back, several people got errors when following the steps I explained. I dug in a bit to find out why the errors were happening, and found that the API of the visualization utility has been changed in the latest version of Keras (v2.*).

The following are the main changes,
  • The module has been renamed, from visualize_util to vis_utils.
  • The function name for plotting has been renamed, from plot to plot_model.
So, here's the updated guide on how to visualize a Keras model.
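For reference, here's a minimal sketch of the updated call under Keras 2 (assuming you already have a built Keras model named model, and have pydot and graphviz installed):

  from keras.utils.vis_utils import plot_model

  # 'model' is assumed to be an already-built Keras model
  plot_model(model, to_file='model.png', show_shapes=True)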

Wednesday, April 26, 2017

Energy Threshold Calibration in Speech Recognition

In my last post on Speech Recognition, I showed how to set up the Python SpeechRecognition package with PyAudio and pocketsphinx to recognize speech with just a few lines of code. And, as you may remember, we ran into an issue where the speech recognition just hangs there, unable to recognize that we're speaking.

Speech Recognition just hanging there, not recognizing that you're speaking

We found out that this was happening due to ambient noise.

Although we humans are able to distinguish speech from noise naturally, to a computer program they are just audio levels. It needs to know which levels should be considered speech (which it needs to process in order to recognize what's being said), and which levels should be considered silence or background noise. So, libraries like SpeechRecognition have an energy threshold setting, which defines the audio level at or above which sound is considered speech.

Now, this default energy threshold works most of the time. If your environment is sufficiently quiet, it will be able to recognize you talking without problems. But, if your environment is noisy - e.g. an office environment with many people talking, or there's machinery around - then the program will have issues distinguishing speech from noise, which will cause the issue we observed.

So, in a situation like that, we should adjust the energy threshold to properly distinguish speech from noise. The SpeechRecognition package has a couple of parameters that help you with this.
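As a quick sketch of what adjusting them looks like (the commented-out manual threshold value is just an example; you'd tune it for your own environment):

  import speech_recognition as sr

  r = sr.Recognizer()

  with sr.Microphone() as source:
      # Listen to the ambient noise for a second and set the
      # energy threshold automatically based on it
      r.adjust_for_ambient_noise(source, duration=1)

      # Or set the threshold manually (example value; tune it for your room)
      # r.energy_threshold = 4000

      # Keep re-adjusting the threshold dynamically as the noise level changes
      r.dynamic_energy_threshold = True

      audio = r.listen(source)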

Thursday, April 13, 2017

How deep should it be to be called Deep Learning?

If you remember, some time back I wrote an article on What is Deep Learning?, in which I explored the confusion that many have about the terms Artificial Intelligence, Machine Learning, and Deep Learning. We talked about how those terms relate to each other: how the drive to build an intelligent machine started the field of Artificial Intelligence; how, when building an intelligence from scratch proved too ambitious, the field evolved into Machine Learning; and how, with the expansion of both the capabilities of computer hardware and our understanding of the natural brain, the field of Deep Learning dawned.

We learned that the deeper and more complex models of Deep Learning (compared to traditional models) are able to consume massive amounts of data, and to learn complex features through Hierarchical Feature Learning across multiple layers of abstraction. We saw that Deep Learning algorithms don't have a "plateau in performance" the way traditional machine learning algorithms do: they don't have a limit on the amount of data they can ingest. Simply put, the more data they are given, the better they perform.

The Plateau in Performance in Traditional vs. Deep Learning


Once the capabilities of Deep Learning are grasped, there's one question that usually comes up when first learning about it:

If we say that deeper and more complex models give Deep Learning the ability to surpass even human capabilities, then how deep should a machine learning model be to be considered a Deep Learning model?

I had the same question when I was first getting started with Deep Learning, and a few other Deep Learning enthusiasts have asked me the same thing.

It turns out, we were asking the wrong question. We need to look at Deep Learning from a different angle to understand it.

Let’s take a step back and see how a Deep Learning model works.

Tuesday, April 4, 2017

Extracting individual Facial Features from Dlib Face Landmarks

If you remember, in my last post on Dlib, I showed how to get the Face Landmark Detection feature of Dlib working with OpenCV. We saw how to use the pre-trained 68-point facial landmark model that comes with Dlib with its shape predictor functionality, and then convert the output into a NumPy array to use it in an OpenCV context. We were able to get all 68 feature points onto our face image.

Dlib detecting the 68 Face Landmarks
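To recap that conversion, here's roughly the kind of helper we ended up with (a sketch; the function name is my own, not part of Dlib):

  import numpy as np

  def shape_to_np(shape, dtype="int"):
      # Convert Dlib's shape predictor output (68 landmark points)
      # into a 68x2 NumPy array of (x, y) coordinates
      coords = np.zeros((68, 2), dtype=dtype)
      for i in range(68):
          coords[i] = (shape.part(i).x, shape.part(i).y)
      return coords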

The 68 feature points which the Dlib model detects include the jawline of the face, the left and right eyes, the left and right eyebrows, the nose, and the mouth. So, what if you only want to detect a few of those features on a face? E.g. you may only want to detect the positions of the eyes and the nose. Is there a way to extract only a few of the features from the Dlib shape predictor?

There is actually a very simple way to do that. Here’s how.
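The trick is that the 68 points always come in a fixed order, so each facial feature corresponds to a fixed range of indices in the landmark array. Here's a rough sketch (the dictionary and helper names are my own, and 'landmarks' is assumed to be the 68x2 NumPy array from the conversion above):

  # Index ranges of each feature in the standard 68-point annotation
  FACIAL_FEATURES = {
      "jaw": (0, 17),
      "right_eyebrow": (17, 22),
      "left_eyebrow": (22, 27),
      "nose": (27, 36),
      "right_eye": (36, 42),
      "left_eye": (42, 48),
      "mouth": (48, 68),
  }

  def get_feature_points(landmarks, feature_name):
      start, end = FACIAL_FEATURES[feature_name]
      return landmarks[start:end]

  # E.g. only the eyes and the nose
  left_eye = get_feature_points(landmarks, "left_eye")
  right_eye = get_feature_points(landmarks, "right_eye")
  nose = get_feature_points(landmarks, "nose")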

Tuesday, March 28, 2017

Easy Speech Recognition in Python with PyAudio and Pocketsphinx

If you remember, I was getting started with Audio Processing in Python (thinking of implementing an audio classification system) a couple of weeks back (my earlier post). I got the PyAudio package set up and was having some success with it. As you know, one of the more interesting areas of audio processing in machine learning is Speech Recognition. So, although it wasn't the original intention of the project, I thought of trying out some speech recognition code as well.

I searched around to see what Python packages are available for the task, and found the SpeechRecognition package.

Python Speech Recognition running with Sphinx
SpeechRecognition is a library for Speech Recognition (as the name suggests), which can work with many Speech Engines and APIs. The current version supports the following engines and APIs,
  • CMU Sphinx
  • Google Speech Recognition
  • Google Cloud Speech API
  • Wit.ai
  • Microsoft Bing Voice Recognition
  • Houndify API
  • IBM Speech to Text
I decided to start out with the Sphinx engine, since it was the only one that worked offline. But keep in mind that Sphinx is not as accurate as something like Google Speech Recognition.

First, let's set up the SpeechRecognition package.
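To give you an idea of where we're heading, here's a minimal sketch of what recognizing speech with Sphinx ends up looking like once SpeechRecognition, PyAudio, and pocketsphinx are installed (e.g. with pip install SpeechRecognition pocketsphinx):

  import speech_recognition as sr

  r = sr.Recognizer()

  # Capture a phrase from the default microphone
  with sr.Microphone() as source:
      print("Say something...")
      audio = r.listen(source)

  # Recognize the captured audio offline using CMU Sphinx
  try:
      print("You said: " + r.recognize_sphinx(audio))
  except sr.UnknownValueError:
      print("Sphinx could not understand the audio")
  except sr.RequestError as e:
      print("Sphinx error: {0}".format(e))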