Tuesday, August 8, 2017

Using Bottleneck Features for Multi-Class Classification in Keras and TensorFlow

Training an image classification model - even with deep learning - is not an easy task. Getting sufficient accuracy without overfitting requires a lot of training data. If you try to train a deep learning model from scratch, and hope to build a classification system with a level of capability similar to an ImageNet-level model, then you'll need a dataset of about a million training examples (plus validation examples as well). Needless to say, it's not practical to acquire, or build, such a dataset.

So, is there any hope for us to build a good image classification system ourselves?

Yes, there is!

Luckily, Deep Learning supports an immensely useful feature called 'Transfer Learning'. Basically, you are able to take a pre-trained deep learning model - which is trained on a large-scale dataset such as ImageNet - and re-purpose it to handle an entirely different problem. The idea is that since the model has already learned certain features from a large dataset, it may be able to use those features as a base to learn the particular classification problem we present it with.

This task is further simplified since popular deep learning models such as VGG16 and their pre-trained ImageNet weights are readily available. The Keras framework even has them built-in in the keras.applications package.

An image classification system built with transfer learning

The basic technique to get transfer learning working is to take a pre-trained model (with its weights loaded) and remove the final fully-connected layers from that model. We then use the remaining portion of the model as a feature extractor for our smaller dataset. These extracted features are called "bottleneck features" (i.e. the last activation maps before the fully-connected layers in the original model). We then train a small fully-connected network on those extracted bottleneck features in order to get the classes we need as outputs for our problem.

How bottleneck feature extraction works on the VGG16 model (Image from: https://blog.keras.io)

The Keras Blog has an excellent guide on how to build an image classification system for binary classification ('Cats' and 'Dogs' in their example) using bottleneck features. You can find the guide here: Building powerful image classification models using very little data.

However, the Keras guide doesn't show how to use the same technique for multi-class classification, or how to use the finalized model to make predictions.

So, here's my tutorial on how to build a multi-class image classifier using bottleneck features in Keras running on TensorFlow, and how to use it to predict classes once trained.

Let's get started.

In this tutorial, I'm going to build a classifier for images of 10 different species of birds. I only had around 150 images per class, which is nowhere near enough data to train a model from scratch.

First of all, we need to structure our training and validation datasets. We'll be using the ImageDataGenerator and flow_from_directory() functionality of Keras, so we need to create a directory structure where the images of each class sit within their own sub-directory in the training and validation directories. So, I created the following directory structure,

Training and Validation datasets, structured into their own directories

Make sure all the sub-directories (classes) in the training set are present in the validation set also. And, remember that the names of the sub-directories will be the names of your classes.
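For example, with a few of the bird classes used in this post, the layout would look something like this (the image file names are just illustrative):

```
data/
  train/
    Egret/
      egret_001.jpg
      egret_002.jpg
      ...
    Owl/
      ...
    Toucan/
      ...
  validation/
    Egret/
      ...
    Owl/
      ...
    Toucan/
      ...
```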

In order to build our model, we need to go through the following steps,
  1. Save the bottleneck features from the VGG16 model.
  2. Train a small network using the saved bottleneck features to classify our classes, and save the model (we call this the 'top model').
  3. Use both the VGG16 model along with the top model to make predictions.

We start the code by importing the necessary packages,
 import numpy as np  
 from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img  
 from keras.models import Sequential  
 from keras.layers import Dropout, Flatten, Dense  
 from keras import applications  
 from keras.utils.np_utils import to_categorical  
 import matplotlib.pyplot as plt  
 import math  
 import cv2  

We'll be using OpenCV to display the result of a prediction. You can omit it if not needed.
Matplotlib is used to graph the model training history, so that we can see how well the model trained. See How to Graph Model Training History in Keras for more details on it.

We then define a couple of parameters,
 # dimensions of our images.  
 img_width, img_height = 224, 224  
 top_model_weights_path = 'bottleneck_fc_model.h5'  
 train_data_dir = 'data/train'  
 validation_data_dir = 'data/validation'  
 # number of epochs to train top model  
 epochs = 50  
 # batch size used by flow_from_directory and predict_generator  
 batch_size = 16  

We add a function - save_bottlebeck_features() - to save the bottleneck features from the VGG16 model.

In the function, we create the VGG16 model - without the final fully-connected layers (by specifying include_top=False) - and load the ImageNet weights,
 model = applications.VGG16(include_top=False, weights='imagenet')  

We then create the data generator for training images, and run them on the VGG16 model to save the bottleneck features for training.
 datagen = ImageDataGenerator(rescale=1. / 255)  
 generator = datagen.flow_from_directory(  
     train_data_dir,  
     target_size=(img_width, img_height),  
     batch_size=batch_size,  
     class_mode=None,  
     shuffle=False)  
 nb_train_samples = len(generator.filenames)  
 num_classes = len(generator.class_indices)  
 predict_size_train = int(math.ceil(nb_train_samples / batch_size))  
 bottleneck_features_train = model.predict_generator(  
     generator, predict_size_train)  
 np.save('bottleneck_features_train.npy', bottleneck_features_train)  

generator.filenames contains all the filenames of the training set. By getting its length, we can get the size of the training set.
generator.class_indices is the map/dictionary for the class-names and their indexes. Getting its length gives us the number of classes.

There is a small bug in predict_generator, where it can't determine the correct number of iterations when the number of training samples isn't divisible by the batch size. So, we calculate it ourselves with the 'predict_size_train = int(math.ceil(nb_train_samples / batch_size))' line. (If you're on Python 2, remember that '/' between two integers performs integer division, so use float(batch_size) there for the ceil to take effect.)
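As a quick sanity check of that calculation (the sample counts here are just illustrative):

```python
import math

# Illustrative numbers: 150 training samples with a batch size of 16.
# Plain integer division gives 9 batches (144 samples), silently
# dropping the last partial batch of 6 samples.
nb_train_samples = 150
batch_size = 16

# float() keeps this correct under Python 2 as well
predict_size_train = int(math.ceil(nb_train_samples / float(batch_size)))

print(nb_train_samples // batch_size)  # 9  -> last 6 samples dropped
print(predict_size_train)              # 10 -> nine full batches plus one partial
```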

We do the same for the validation data,
 generator = datagen.flow_from_directory(  
     validation_data_dir,  
     target_size=(img_width, img_height),  
     batch_size=batch_size,  
     class_mode=None,  
     shuffle=False)  
 nb_validation_samples = len(generator.filenames)  
 predict_size_validation = int(math.ceil(nb_validation_samples / batch_size))  
 bottleneck_features_validation = model.predict_generator(  
     generator, predict_size_validation)  
 np.save('bottleneck_features_validation.npy', bottleneck_features_validation)  

With the bottleneck features saved, now we're ready to train our top model. We define a function for that also - train_top_model().

In order to train the top model, we need the class labels for each of the training/validation samples. We use a data generator for that also. We also need to convert the labels to categorical vectors.

 datagen_top = ImageDataGenerator(rescale=1./255)  
 generator_top = datagen_top.flow_from_directory(  
         train_data_dir,  
         target_size=(img_width, img_height),  
         batch_size=batch_size,  
         class_mode='categorical',  
         shuffle=False)  
 nb_train_samples = len(generator_top.filenames)  
 num_classes = len(generator_top.class_indices)  
 # load the bottleneck features saved earlier  
 train_data = np.load('bottleneck_features_train.npy')  
 # get the class labels for the training data, in the original order  
 train_labels = generator_top.classes  
 # convert the training labels to categorical vectors  
 train_labels = to_categorical(train_labels, num_classes=num_classes)  
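Under the hood, to_categorical simply one-hot encodes the integer labels. Here's a minimal NumPy sketch of what it does (the helper name is mine, not from Keras):

```python
import numpy as np

def to_categorical_sketch(labels, num_classes):
    """One-hot encode integer class labels - a minimal version of
    what keras' to_categorical does."""
    labels = np.asarray(labels, dtype=int)
    categorical = np.zeros((len(labels), num_classes))
    categorical[np.arange(len(labels)), labels] = 1
    return categorical

print(to_categorical_sketch([0, 2, 1], 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```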

We do the same for validation features as well,
 generator_top = datagen_top.flow_from_directory(  
         validation_data_dir,  
         target_size=(img_width, img_height),  
         batch_size=batch_size,  
         class_mode=None,  
         shuffle=False)  
 nb_validation_samples = len(generator_top.filenames)  
 validation_data = np.load('bottleneck_features_validation.npy')  
 validation_labels = generator_top.classes  
 validation_labels = to_categorical(validation_labels, num_classes=num_classes)  

Now create and train a small fully-connected network - the top model - using the bottleneck features as input, with our classes as the classifier output.

 model = Sequential()  
 model.add(Flatten(input_shape=train_data.shape[1:]))  
 model.add(Dense(256, activation='relu'))  
 model.add(Dropout(0.5))  
 model.add(Dense(num_classes, activation='sigmoid'))  
 model.compile(optimizer='rmsprop',  
              loss='categorical_crossentropy', metrics=['accuracy'])  
 history = model.fit(train_data, train_labels,  
          epochs=epochs,  
          batch_size=batch_size,  
          validation_data=(validation_data, validation_labels))  
 model.save_weights(top_model_weights_path)  
 (eval_loss, eval_accuracy) = model.evaluate(  
     validation_data, validation_labels, batch_size=batch_size, verbose=1)  

 print("[INFO] accuracy: {:.2f}%".format(eval_accuracy * 100))  
 print("[INFO] Loss: {}".format(eval_loss))  

It's always better to see how well a model gets trained. So, we graph the training history,
 # summarize history for accuracy  
 plt.plot(history.history['acc'])  
 plt.plot(history.history['val_acc'])  
 plt.title('model accuracy')  
 plt.legend(['train', 'test'], loc='upper left')  
 plt.show()  
 # summarize history for loss  
 plt.plot(history.history['loss'])  
 plt.plot(history.history['val_loss'])  
 plt.title('model loss')  
 plt.legend(['train', 'test'], loc='upper left')  
 plt.show()  

Now we're ready to train our model. We call the two functions in sequence,
 save_bottlebeck_features()  
 train_top_model()  

The top model training

The training takes about 2 minutes on a GPU. On CPU however, it may take about 30 minutes.

The accuracy and loss of training and validation

I got around 90% accuracy, and it doesn't look like the model is overfitting. Which is awesome, since I only had around 150 images per class.

How to make a prediction from the trained model?

With our classification model trained - from the bottleneck features of a pre-trained model - the next question would be how we can use it. That is, how do we make a prediction using the model we just built?

In order to predict the class of an image, we need to run it through the same pipeline as before. That means,
  1. We first run the image through the pre-trained VGG16 model (without the fully-connected layers again) and get the bottleneck predictions.
  2. We then run the bottleneck prediction through the trained top model - which we created in the previous step - and get the final classification.

We first load and pre-process the image,
 image_path = 'data/eval/Malabar_Pied_Hornbill.png'  
 orig = cv2.imread(image_path)  
 print("[INFO] loading and preprocessing image...")  
 image = load_img(image_path, target_size=(224, 224))  
 image = img_to_array(image)  
 # important! otherwise the predictions will be '0'  
 image = image / 255  
 image = np.expand_dims(image, axis=0)  

Pay close attention to the 'image = image / 255' step. Otherwise, all your predictions will be '0'.
Why is this needed? Remember that in our ImageDataGenerator we set rescale=1. / 255, which means all data is re-scaled from a [0 - 255] range to [0 - 1.0]. So, we need to do the same to the image we're trying to predict.
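A quick illustration of the re-scaling (the pixel values here are just examples):

```python
import numpy as np

# A dummy pixel array in the usual [0, 255] range
pixels = np.array([0.0, 127.5, 255.0])

# The same re-scaling the ImageDataGenerator applied during training
rescaled = pixels / 255
print(rescaled)  # [0.  0.5 1. ]
```

Skipping this step means the top model sees inputs on a completely different scale than what it was trained on, which is why the predictions degenerate to '0'.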

Now we run the image through the same pipeline,
 # build the VGG16 network  
 model = applications.VGG16(include_top=False, weights='imagenet')  
 # get the bottleneck prediction from the pre-trained VGG16 model  
 bottleneck_prediction = model.predict(image)  
 # build the top model, with the same architecture used in training  
 model = Sequential()  
 model.add(Flatten(input_shape=bottleneck_prediction.shape[1:]))  
 model.add(Dense(256, activation='relu'))  
 model.add(Dropout(0.5))  
 model.add(Dense(num_classes, activation='sigmoid'))  
 # load the weights we saved when training the top model  
 model.load_weights(top_model_weights_path)  
 # use the bottleneck prediction on the top model to get the final classification  
 class_predicted = model.predict_classes(bottleneck_prediction)  
Finally, we decode the prediction and show the result,
 inID = class_predicted[0]  
 class_dictionary = generator_top.class_indices  
 inv_map = {v: k for k, v in class_dictionary.items()}  
 # get the prediction label  
 label = inv_map[inID]  
 print("Image ID: {}, Label: {}".format(inID, label))  
 # display the prediction with the image  
 cv2.putText(orig, "Predicted: {}".format(label), (10, 30), cv2.FONT_HERSHEY_PLAIN, 1.5, (43, 99, 255), 2)  
 cv2.imshow("Classification", orig)  
 cv2.waitKey(0)  
 cv2.destroyAllWindows()  

The class_indices has the label as the key, and the index as the value.
e.g. In my example it looks like this,
{'Owl': 6, 'Wood Duck': 9, 'Toucan': 8, 'Puffin': 7, 'Malabar Pied Hornbill': 4, 'Egret': 1, 'Cotton Pygmy Goose': 0, 'Great Cormorant': 2, 'Mandarin': 5, 'Lesser Whistling Duck': 3}
Since the prediction from the model gives the index, it would be easier for us if we had a dictionary with the index as the key and the label as the value. So, we invert the class_indices like this: inv_map = {v: k for k, v in class_dictionary.items()}
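Here's the inversion on its own, using a shortened version of the class_indices mapping above:

```python
# A shortened version of the class_indices mapping from this post
class_dictionary = {'Owl': 6, 'Egret': 1, 'Toucan': 8}

# Swap keys and values so we can look a label up by its predicted index
inv_map = {v: k for k, v in class_dictionary.items()}

print(inv_map[6])  # Owl
```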

We use OpenCV to show the image, with its predicted class.

The result of a classification

The complete code for this tutorial can be found here at GitHub.

You'll notice that the code isn't the most optimized. We can easily extract some of the repeated code - such as the multiple image data generators - out to some functions. But, I kept them as is since it's easier to walk through the code like that. You can share your thoughts on what's the best way to streamline the code.

We didn't cover fine-tuning the model to achieve even higher accuracy. I will cover it in a future tutorial.

  1. Very good tutorial. What about doing random transformations in the image generation phase?

    1. Thank you.
      Yes, adding some random transformations to the training images should improve the accuracy in theory (at least, reduce the chance of overfitting). We can easily enable transformations by adding some data augmentation parameters (shear_range, zoom_range, horizontal_flip) to the ImageDataGenerator used for training dataset.
      I haven't tested it yet though. I'll try it out and post the results on how data augmentations affect the accuracy. If you get to try it out, let us know the results also.