Friday, June 9, 2017

Wink Detection using Dlib and OpenCV

A couple of weeks ago, I was going through a tutorial for eye blink detection by Adrian at PyImageSearch. It was an excellent tutorial, which explained the use of Eye Aspect Ratio (EAR) in order to detect when an eye gets closed. Then, few weeks back, I was having a chat with Shirish Ranade, a reader of this blog and a fellow computer vision and machine learning enthusiast, on whether we can perform an action by winking at the computer. So, I decided to try out a code to detect winking.

Wink Detection Running with Dlib and OpenCV
Wink Detection Running with Dlib and OpenCV
It's an interesting idea to perform an action or a task just by winking at your computer. It can be thought as a form of gesture detection or facial expression detection as well. So, here's how you can build your own 'wink' detector for it.

We start by importing all the necessary packages,
 import numpy as np  
 import cv2  
 import dlib  
 from scipy.spatial import distance as dist  

Tuesday, May 23, 2017

OpenCV 3.2 and Dlib 19.4 Packages Now Available from Conda-Forge

Anaconda is an asset for us Machine Learning enthusiasts. Not only does it have the ability to create fully isolated Python environments, its pre-built packages for many operating systems and architectures helps you to spend less time setting up, and more time doing actual machine learning stuff.

In some scenarios, to get some Python packages to work in certain environments, getting them from Anaconda was the only way that worked. If you tried installing Dlib or OpenCV 3 on Windows 64-Bit, then you know the effort it takes to get them setup, if it wasn't for Anaconda.

OpenCV and Dlib working perfectly together, thanks to Conda
OpenCV and Dlib working perfectly together, thanks to Conda

If you check my posts on Installing Dlib on Anaconda Python on Windows, and Installing OpenCV 3 on Anaconda Python 3.5 on Windows, you know how easy it is to use conda to install them on Python 3.5 64-Bit on Windows.

But there was a catch: The Anaconda registry only had OpenCV 3.1 and Dlib 19.0 - not the latest versions.

Thursday, May 18, 2017

What is Deep Learning? - Updated

What is Deep Learning? And, how does it relates to Machine Learning, and Artificial Intelligence?

I did an article to answer these questions some time back.

Now, thanks to the feedback I got from you all, I was able to updated it, with more clarifications, improved examples, and answers to more questions in Deep Learning.


Check out the updated article here,


Your feedback are always welcome.

Tuesday, May 9, 2017

image_data_format vs. image_dim_ordering in Keras v2

If you have been using Keras for some time, then you would probably know the image_dim_ordering parameter of Keras. Specially, if you switch between TensorFlow and Theano backends frequently when using Keras.

When I first started using Keras for image classification, most of my experiments failed because I have set the image_dim_ordering incorrectly. Learning from my mistakes, last year I did a post on what image_dim_ordering is and why is it important.

In short, image_dim_ordering instructed Keras to properly rearrange the image data structure when passing to the backend:
Both TensorFlow and Theano expects 4D tensors of image data as input. But, while TensorFlow expects its structure/shape to be (samples, rows, cols, channels), Theano expects it to be (samples, channels, rows, cols). So, setting the image_dim_ordering to 'tf' made Keras use the TensorFlow ordering, while setting it to 'th' made it Theano ordering.

At least, that's how it used to work.

But recently, if you have updated to the latest version of Keras, you might have run into issues with the dimension ordering, even if you're sure that you set the image_dim_ordering correctly.

You may have gotten errors like,
 ValueError: The shape of the input to "Flatten" is not fully defined (got (0, 7,  
  50). Make sure to pass a complete "input_shape" or "batch_input_shape" argument  
  to the first layer in your model.  

It may seem to you that Keras has started to ignore your image_dim_ordering setting.

And you're right.

Wednesday, May 3, 2017

Visualizing Keras Models - Updated

About 2 months back, I did a post on how you can visualize the structure of a Keras model. As I mentioned, when the machine learning (or deep learning) model you're building is complex, then it may be easier to understand it if you can see a visual representation of it.

I showed you how to use the Visualization utility in Keras in order to draw the structure of a model in Keras, such as this visualization of the LeNet model,

Visualizing the LeNet model
Visualizing the LeNet model

But a few days back, several people had got some errors when following the steps I explained. I digged a bit to find why the errors are happening, and found that with the latest version of Keras (v 2.*) they have changed the API of the visualization utility.

The following are the main changes,
  • The module has been renamed, from visualize_util to vis_utils.
  • The function name for plotting has been renamed, from plot to plot_model.
So, here's the updated guide on how to visualize a Keras model.

Wednesday, April 26, 2017

Energy Threshold Calibration in Speech Recognition

In my last post on Speech Recognition, I showed how to setup the Python SpeechRecognition package with PyAudio, and pocketsphinx to recognize speech with just a few lines of code. And, as you can remember, we ran into issues where the speech recognition just hangs there unable to recognize our speaking.

Speech Recognition just hanging there, not recognizing that you're speaking
Speech Recognition just hanging there, not recognizing that you're speaking

We found out that this was happening due to ambient noise.

Although we humans are able to distinguish speech from noise naturally, for a computer program they are just audio levels. It needs to know which levels should be considered speech (which it needs to process in order to recognize what's being said), and which levels should be considered silence or background noise. So, libraries like the SpeechRecognition has an energy threshold set which defines what audio level and above should be considered speech.

Now, this default energy threshold works most of the time. If your environment is sufficiently quiet, it will be able to recognize you talking without problems. But, if your environment is noisy - e.g. an office environment with many people talking, or there's machinery around - then the program will have issues distinguishing speech from noise, which will cause the issue we observed.

So, in a situation like that, we should adjust the energy threshold to properly distinguish the speech from noise. The SpeechRecognition package has a couple of parameters that helps you with this.

Thursday, April 13, 2017

How deep should it be to be called Deep Learning?

If you remember, some time back, I made an article on What is Deep Learning?, in which I explored the confusion that many have on terms Artificial Intelligence, Machine Learning, and Deep Learning. We talked about how those terms relate to each other: how the drive to build an intelligent machine started the field of Artificial Intelligence, when building an intelligence from scratch proved too ambitious, how the field evolved into Machine Learning, and with the expansion of both the capabilities of computer hardware and our understanding of the natural brain, dawned the field of Deep Learning

We learned that the deeper and more complex models (compared to traditional models) of Deep Learning are able to consume massive amounts of data, and able to learn complex features by Hierarchical Feature Learning through multiple layers of abstraction. We saw that Deep Learning algorithms don’t have a "plateau in performance" compared to traditional machine learning algorithms: that they don’t have a limit on the amount of data they can ingest. Simply, the more data they are given, the better they would perform.

The Plateau in Performance in Traditional vs. Deep Learning
The Plateau in Performance in Traditional vs. Deep Learning


With the capabilities of Deep Learning grasped, there’s one question that usually comes up when one first learns about Deep Learning:

If we say that deeper and more complex models gives Deep Learning models the capabilities to surpass even human capabilities, then how deep a machine learning model should be to be considered a Deep Learning model?

I’ve had the same question when I was first getting started with Deep Learning, and I had few other Deep Learning enthusiasts asking me the same question.

It turns out, we were asking the wrong question. We need to look at Deep Learning from a different angle to understand it.

Let’s take a step back and see how a Deep Learning model works.