Sunday, June 3, 2018

Setting up the OpenMV Cam

A few weeks back, we talked about OpenMV - the small embedded computer vision module with a built-in camera that can be programmed to perform various vision tasks. It gives you the ability to bring computer vision into your embedded projects.

After I first read about it, I was eager to get my hands on a kit. Their official site - openmv.io - offers international shipping, as well as links to local distributors in some countries. The shipping was quite fast.

The OpenMV Cam M7

First Impressions


The cam comes with a really nice clear plastic case. The headers (used to connect other shields onto the board) come separately (pictured above). This gives you the option to solder them - or other types of headers - yourself if needed. Headers aren't essential for the basic usage of OpenMV. Both the case and the board itself have an excellent build quality.

The cam itself is smaller than I expected: about 2/3 the size of a credit card.

The size of the OpenMV Cam
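
Before getting into the setup, here's a taste of how it's programmed. This is a minimal sketch along the lines of the 'hello world' example that ships with the OpenMV IDE - it just grabs frames from the camera and prints the frame rate (sensor and time are modules provided by the OpenMV firmware):

import sensor
import time

sensor.reset()                       # initialize the camera sensor
sensor.set_pixformat(sensor.RGB565)  # capture color images
sensor.set_framesize(sensor.QVGA)    # at 320x240 resolution
sensor.skip_frames(time=2000)        # give the sensor time to settle

clock = time.clock()
while True:
    clock.tick()
    img = sensor.snapshot()          # capture a frame
    print(clock.fps())               # print the current frame rate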

Tuesday, February 27, 2018

Track any object in a video with Dlib Correlation Trackers

Training an object detector is a bit of a complicated task. You need a proper training dataset with the relevant bounding boxes, and then something like a HOG feature extractor with an SVM classifier for the detection - such as the Dlib Object Detection classes (link).

But that's a lot of work if you just need to track an object across a limited number of frames, or just need to detect motion or direction of movement. For that, we can easily use the Correlation Trackers feature in Dlib.

Object Tracking

See it in action,

Object Tracking in Action

Correlation Trackers - as their name suggests - work by correlating a set of pixels from one frame to the next.

Let's see how to build it.
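
To give you an idea of where we're headed, here's a minimal sketch of the Dlib correlation tracker API (the video file name and the initial bounding box coordinates here are just placeholders):

import cv2
import dlib

tracker = dlib.correlation_tracker()

video = cv2.VideoCapture('video.mp4')  # placeholder video file
ok, frame = video.read()

# Start tracking from a known initial bounding box (placeholder coordinates).
tracker.start_track(frame, dlib.rectangle(100, 100, 200, 200))

while True:
    ok, frame = video.read()
    if not ok:
        break
    tracker.update(frame)         # correlate the tracked pixels to the new frame
    pos = tracker.get_position()  # the new bounding box, as a dlib.drectangle
    cv2.rectangle(frame,
                  (int(pos.left()), int(pos.top())),
                  (int(pos.right()), int(pos.bottom())),
                  (0, 255, 0), 2)
    cv2.imshow('Tracking', frame)
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break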

Saturday, February 17, 2018

Using Data Augmentations in Keras

When I did the article on Using Bottleneck Features for Multi-Class Classification in Keras and TensorFlow, a few of you asked about using data augmentation in the model. So, I decided to do a few articles experimenting with various data augmentations on a bottleneck model. As a start, here's a quick tutorial explaining what data augmentation is, and how to do it in Keras.

The idea of augmenting the data is simple: we perform random transformations and normalization on the input data so that the model we’re training never sees the same input twice. With little data, this can greatly reduce the chance of the model overfitting.

But manually adding transformations to the input data would be a tedious task.

That's why Keras has built-in functions to do just that.

The Keras Preprocessing package has the ImageDataGenerator class, which can be configured to perform the random transformations and the normalization of input images as needed. Coupled with its flow() and flow_from_directory() methods, it can automatically load the data, apply the augmentations, and feed the results into the model.

Let's write a small script to see the data augmentation capabilities of ImageDataGenerator.
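
Something along these lines - a minimal sketch that applies random transformations to a single image and saves the results so we can inspect them (the image file name, the 'preview' output directory, and the parameter values are just examples):

from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

# Configure the random transformations to apply.
datagen = ImageDataGenerator(
    rotation_range=40,       # random rotations of up to 40 degrees
    width_shift_range=0.2,   # random horizontal shifts
    height_shift_range=0.2,  # random vertical shifts
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

# Load a sample image and add a batch dimension.
img = img_to_array(load_img('sample.jpg'))
img = img.reshape((1,) + img.shape)

# Generate a few augmented variations and save them to the 'preview'
# directory (which needs to exist beforehand).
i = 0
for batch in datagen.flow(img, batch_size=1, save_to_dir='preview',
                          save_prefix='aug', save_format='jpeg'):
    i += 1
    if i >= 10:
        break  # the generator loops forever, so we stop it manually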

Wednesday, January 3, 2018

Visualizing the Convolutional Filters of the LeNet Model

First of all, Happy New Year to you all!

We have a great year ahead. And, let's start it with something interesting.

We've talked about how Convolutional Neural Networks (CNNs) are able to learn progressively more complex features from their input through the convolutional filters in each layer.

But what does a convolutional filter really look like?

In today's post, let's try to visualize the convolutional filters of the LeNet model trained on the MNIST dataset (handwritten digit classification) - often considered the 'hello world' program of deep learning.

We can use a technique to visualize the filters from the article "How convolutional neural networks see the world" by François Chollet (the author of the Keras library). The original article is available at the Keras Blog: https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html.

The original code is designed to work with the VGG16 model. Let’s modify it a bit to work with our LeNet model.

We need to load the LeNet model with its weights. You can follow the code here to train the model yourself and get the weights. Let's name the weights file 'lenet_weights.hdf5'.

We'll start with the imports,

from scipy.misc import imsave
import numpy as np
import time
from keras import backend as K

# The layers needed to re-build the LeNet model
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense

from keras.optimizers import SGD

We need to build the LeNet model and load the weights into it. So, we define a function - build_lenet - for it.
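
Here's roughly what that function looks like - a standard LeNet architecture in Keras with the weights loaded at the end. (The layer sizes here follow the usual LeNet-for-MNIST setup; adjust them if the model you trained differs.)

def build_lenet(width=28, height=28, depth=1, classes=10,
                weights_path='lenet_weights.hdf5'):
    model = Sequential()

    # First CONV => RELU => POOL block
    model.add(Conv2D(20, (5, 5), padding='same',
                     input_shape=(height, width, depth)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    # Second CONV => RELU => POOL block
    model.add(Conv2D(50, (5, 5), padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    # Fully-connected layers
    model.add(Flatten())
    model.add(Dense(500))
    model.add(Activation('relu'))
    model.add(Dense(classes))
    model.add(Activation('softmax'))

    # Load the trained weights
    model.load_weights(weights_path)
    return model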

Tuesday, August 8, 2017

Using Bottleneck Features for Multi-Class Classification in Keras and TensorFlow

Training an Image Classification model - even with Deep Learning - is not an easy task. Getting sufficient accuracy without overfitting requires a lot of training data. If you try to train a deep learning model from scratch, and hope to build a classification system with a similar level of capability to an ImageNet-level model, then you'll need a dataset of about a million training examples (plus validation examples). Needless to say, it's not easy to acquire or build such a dataset in practice.

So, is there any hope for us to build a good image classification system ourselves?

Yes, there is!

Luckily, Deep Learning supports an immensely useful feature called 'Transfer Learning'. Basically, you are able to take a pre-trained deep learning model - which is trained on a large-scale dataset such as ImageNet - and re-purpose it to handle an entirely different problem. The idea is that since the model has already learned certain features from a large dataset, it may be able to use those features as a base to learn the particular classification problem we present it with.

This task is further simplified since popular deep learning models such as VGG16, along with their pre-trained ImageNet weights, are readily available. The Keras framework even has them built into the keras.applications package.

An image classification system built with transfer learning


The basic technique to get transfer learning working is to take a pre-trained model (with the weights loaded) and remove the final fully-connected layers from it. We then use the remaining portion of the model as a feature extractor for our smaller dataset. These extracted features are called "Bottleneck Features" (i.e. the last activation maps before the fully-connected layers in the original model). We then train a small fully-connected network on those extracted bottleneck features to get the classes we need as outputs for our problem.
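
In Keras, that roughly translates to the sketch below (the data directory, sample counts, and layer sizes are placeholders):

from keras.applications import VGG16
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout
from keras.preprocessing.image import ImageDataGenerator

num_classes = 3          # placeholder: the number of classes in your problem
nb_train_samples = 2000  # placeholder: the number of training images
batch_size = 16

# Pre-trained VGG16 with its final fully-connected layers removed.
model = VGG16(include_top=False, weights='imagenet')

# Run our small dataset through it once to extract the bottleneck features.
datagen = ImageDataGenerator(rescale=1. / 255)
generator = datagen.flow_from_directory('data/train',  # placeholder path
                                        target_size=(224, 224),
                                        batch_size=batch_size,
                                        class_mode=None,  # images only, no labels
                                        shuffle=False)
bottleneck_features = model.predict_generator(generator, nb_train_samples // batch_size)

# A small fully-connected network trained on the bottleneck features
# produces the class outputs for our problem.
top_model = Sequential()
top_model.add(Flatten(input_shape=bottleneck_features.shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(num_classes, activation='softmax'))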

Friday, July 21, 2017

Snapchat like Image Overlays with Dlib, OpenCV, and Python

You're probably familiar with Snapchat and its filters feature, where you can put cool and funny image overlays on your face images. As computer vision enthusiasts, we typically look at applications like these and try to understand how they're done, and whether we can build something similar.

It turns out, we can make our own application with Snapchat like image overlays using Python, OpenCV, and Dlib.

Snapchat like Image Overlays with Dlib, OpenCV, and Python

So, how do we build it?
  1. We'll first load the webcam feed using OpenCV.
  2. We'll load an image (in our example, an image of an 'eye') to be used as the overlay.
  3. We'll use Dlib's face detection to localize the faces, and then its facial landmarks to find where the eyes are.
  4. We'll calculate the size and the position of the overlay for each eye.
  5. Finally, we'll place the overlay image over each eye, resized to the correct size.

Let's start.
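
Here's a rough skeleton of those steps to orient ourselves (the overlay image file name is a placeholder, and you'll need Dlib's pre-trained 68-point landmark model file):

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

overlay = cv2.imread('eye_overlay.png', -1)  # placeholder overlay image (with alpha)

cap = cv2.VideoCapture(0)  # the webcam feed
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        shape = predictor(gray, face)
        # Landmarks 36-41 and 42-47 cover the two eyes in the 68-point model;
        # from these points we can compute each overlay's size and position.
        eye_one = [(shape.part(i).x, shape.part(i).y) for i in range(36, 42)]
        eye_two = [(shape.part(i).x, shape.part(i).y) for i in range(42, 48)]
        # ... resize 'overlay' and blend it over each eye region here ...
    cv2.imshow('Overlay', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break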

Friday, June 9, 2017

Wink Detection using Dlib and OpenCV

A couple of weeks ago, I was going through a tutorial for eye blink detection by Adrian at PyImageSearch. It was an excellent tutorial, which explained the use of the Eye Aspect Ratio (EAR) to detect when an eye gets closed. Then, a few weeks back, I was having a chat with Shirish Ranade, a reader of this blog and a fellow computer vision and machine learning enthusiast, about whether we can perform an action by winking at the computer. So, I decided to try writing some code to detect winking.

Wink Detection Running with Dlib and OpenCV

It's an interesting idea to perform an action or a task just by winking at your computer. It can be thought of as a form of gesture detection or facial expression detection as well. So, here's how you can build your own 'wink' detector.

We start by importing all the necessary packages,
import numpy as np
import cv2
import dlib
from scipy.spatial import distance as dist  # for the distances in the EAR calculation
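
The heart of the approach is the Eye Aspect Ratio from the blink-detection tutorial. As a sketch, using the distance function we just imported (the parameter is the six landmark points of one eye, in the order they appear in Dlib's 68-point model):

def eye_aspect_ratio(eye):
    # Vertical distances between the two pairs of upper/lower eye landmarks.
    A = dist.euclidean(eye[1], eye[5])
    B = dist.euclidean(eye[2], eye[4])
    # Horizontal distance between the two eye corners.
    C = dist.euclidean(eye[0], eye[3])
    # The ratio drops towards zero as the eye closes.
    return (A + B) / (2.0 * C)

For a wink - as opposed to a blink - we'd look for frames where one eye's EAR drops while the other eye's stays open.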

Wednesday, May 3, 2017

Visualizing Keras Models - Updated

About 2 months back, I did a post on how you can visualize the structure of a Keras model. As I mentioned, when the machine learning (or deep learning) model you're building is complex, it may be easier to understand if you can see a visual representation of it.

I showed you how to use the visualization utility in Keras to draw the structure of a model, such as this visualization of the LeNet model,

Visualizing the LeNet model

But a few days back, several people got errors when following the steps I explained. I dug around a bit to find why the errors were happening, and found that the latest version of Keras (v2.*) has changed the API of the visualization utility.

The following are the main changes,
  • The module has been renamed, from visualize_util to vis_utils.
  • The function name for plotting has been renamed, from plot to plot_model.
So, here's the updated guide on how to visualize a Keras model.
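
In short, the updated code looks something like this (with a trivial model built just for illustration):

from keras.models import Sequential
from keras.layers import Dense
from keras.utils.vis_utils import plot_model  # was: keras.utils.visualize_util

model = Sequential()
model.add(Dense(10, input_shape=(20,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# 'plot' has been renamed to 'plot_model'
plot_model(model, to_file='model.png', show_shapes=True)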

Tuesday, April 4, 2017

Extracting individual Facial Features from Dlib Face Landmarks

If you remember, in my last post on Dlib, I showed how to get the Face Landmark Detection feature of Dlib working with OpenCV. We saw how to use the pre-trained 68 facial landmark model that comes with Dlib together with its shape predictor functionality, and how to convert the output into a numpy array to use in an OpenCV context. We were able to get all 68 feature points onto our face image.

Dlib detecting the 68 Face Landmarks

The 68 feature points which the Dlib model detects include the jawline of the face, the left and right eyes, the left and right eyebrows, the nose, and the mouth. So, what if you only want to detect a few of those features on a face? E.g. you may only want to detect the positions of the eyes and the nose. Is there a way to extract only a few of the features from the Dlib shape predictor?

There is actually a very simple way to do that. Here’s how.
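
The trick is that the 68 points always arrive in the same fixed order, so we can simply slice out the ranges we need. A minimal sketch (the index ranges below follow the standard layout of the 68-point model; the helper function name is mine):

# Fixed index ranges of each facial feature in the 68-point model.
FACIAL_LANDMARK_RANGES = {
    'jaw': (0, 17),
    'right_eyebrow': (17, 22),
    'left_eyebrow': (22, 27),
    'nose': (27, 36),
    'right_eye': (36, 42),
    'left_eye': (42, 48),
    'mouth': (48, 68),
}

def extract_feature(shape, feature):
    # 'shape' is the prediction returned by Dlib's shape predictor.
    start, end = FACIAL_LANDMARK_RANGES[feature]
    return [(shape.part(i).x, shape.part(i).y) for i in range(start, end)]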

Tuesday, March 28, 2017

Easy Speech Recognition in Python with PyAudio and Pocketsphinx

If you remember, I was getting started with Audio Processing in Python (thinking of implementing an audio classification system) a couple of weeks back (see my earlier post). I got the PyAudio package set up and was having some success with it. As you know, one of the more interesting areas in audio processing with machine learning is Speech Recognition. So, although it wasn't my original intention for the project, I thought of trying out some speech recognition code as well.

I searched around to see what Python packages are available for the task and found the SpeechRecognition package.

Python Speech Recognition running with Sphinx
SpeechRecognition is a library for Speech Recognition (as the name suggests), which can work with many Speech Engines and APIs. The current version supports the following engines and APIs,
  • CMU Sphinx
  • Google Speech Recognition
  • Google Cloud Speech API
  • Wit.ai
  • Microsoft Bing Voice Recognition
  • Houndify API
  • IBM Speech to Text
I decided to start with the Sphinx engine since it was the only one that worked offline. But keep in mind that Sphinx is not as accurate as something like Google Speech Recognition.

First, let's set up the SpeechRecognition package.
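
Once the package is installed (pip install SpeechRecognition, plus pocketsphinx for the Sphinx engine), the basic usage looks roughly like this:

import speech_recognition as sr

r = sr.Recognizer()

# Listen for a phrase from the default microphone.
with sr.Microphone() as source:
    print("Say something...")
    audio = r.listen(source)

# Recognize the speech offline with the Sphinx engine.
try:
    print("You said: " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand the audio")
except sr.RequestError as e:
    print("Sphinx error: {0}".format(e))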

Friday, March 3, 2017

How to Graph Model Training History in Keras

When we are training a machine learning model in Keras, we usually keep track of how well the training is going (the accuracy and the loss of the model) using the values printed out in the console. Wouldn't it be great if we can visualize the training progress? Not only would it be easier to see how well the model trained, but it would also allow us to compare models.

Something like this?
Training accuracy and loss for 100 epochs


Well, you can actually do it quite easily by using the History object of Keras along with Matplotlib.
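
The gist of it is in the sketch below, which trains a toy model on random data just to produce a History object. (Note: with older Keras versions the metric keys are 'acc' and 'val_acc'; in newer versions they are 'accuracy' and 'val_accuracy'.)

import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense

# A toy model on random data, just to get a History object to plot.
X = np.random.rand(200, 10)
y = np.random.randint(0, 2, size=(200,))

model = Sequential()
model.add(Dense(8, input_shape=(10,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

# fit() returns a History object recording the per-epoch metrics.
history = model.fit(X, y, validation_split=0.2, epochs=100, verbose=0)

# Plot accuracy and loss over the epochs.
plt.plot(history.history['acc'], label='train accuracy')
plt.plot(history.history['val_acc'], label='validation accuracy')
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.legend()
plt.show()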

Saturday, November 12, 2016

Getting the LeNet model working with Face Recognition

In my last post, I talked about how the LeNet Convolutional Neural Network model is capable of handling much more complex data than the intended MNIST dataset. We saw how it got ~99% accuracy when it learned to identify 10 faces from the raw pixel intensities.

So, let’s see the code I used to get it working.

First of all, I needed a training dataset. For that, I created a set of face images of 10 subjects with around 500 images each.

A few of the images from the training dataset (yep, that's my face)

I used a file naming convention of <subject_label>-<subject_name>-<unique_number>.jpg (e.g. 0-Thimira-1475137898.65.jpg) for the training images, to make it easier to read the images and their metadata in one go. (I will do a separate post on how to easily create training datasets of face images like this.)
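
With that convention, pulling the label and name back out of a filename is just a split. A quick sketch (the helper name is mine; it assumes the filenames follow the pattern exactly):

import os

def parse_filename(path):
    # e.g. '0-Thimira-1475137898.65.jpg' -> (0, 'Thimira')
    filename = os.path.basename(path)
    label, name, _ = filename.split('-', 2)
    return int(label), name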

We'll mainly be using Keras to build the model, and scikit-learn for some utility functions. We’ll need to import the following packages,
from sklearn.cross_validation import train_test_split  # 'sklearn.model_selection' in newer scikit-learn versions
from keras.optimizers import SGD
from keras.utils import np_utils
import numpy as np
import argparse
import cv2
import os
import sys
from PIL import Image