Thursday, April 13, 2017

How deep should it be to be called Deep Learning?

If you remember, some time back, I made an article on What is Deep Learning?, in which I explored the confusion that many have on terms Artificial Intelligence, Machine Learning, and Deep Learning. We talked about how those terms relate to each other: how the drive to build an intelligent machine started the field of Artificial Intelligence, when building an intelligence from scratch proved too ambitious, how the field evolved into Machine Learning, and with the expansion of both the capabilities of computer hardware and our understanding of the natural brain, dawned the field of Deep Learning

We learned that the deeper and more complex models (compared to traditional models) of Deep Learning are able to consume massive amounts of data, and able to learn complex features by Hierarchical Feature Learning through multiple layers of abstraction. We saw that Deep Learning algorithms don’t have a "plateau in performance" compared to traditional machine learning algorithms: that they don’t have a limit on the amount of data they can ingest. Simply, the more data they are given, the better they would perform.

The Plateau in Performance in Traditional vs. Deep Learning
The Plateau in Performance in Traditional vs. Deep Learning

With the capabilities of Deep Learning grasped, there’s one question that usually comes up when one first learns about Deep Learning:

If we say that deeper and more complex models gives Deep Learning models the capabilities to surpass even human capabilities, then how deep a machine learning model should be to be considered a Deep Learning model?

I’ve had the same question when I was first getting started with Deep Learning, and I had few other Deep Learning enthusiasts asking me the same question.

It turns out, we were asking the wrong question. We need to look at Deep Learning from a different angle to understand it.

Let’s take a step back and see how a Deep Learning model works.

Convolutional Neural Networks are a prime example for Deep Learning. They were inspired by how the neurons are arranged in the visual cortex (the area of the brain which processes visual input). Here, not all neurons are connected to all of the inputs from the visual field. Instead, the visual field is ‘tiled’ with groups of neurons (called Receptive fields) which partially overlap each other.

Convolutional Neural Networks (CNNs) work in a similar way. They process in overlapping blocks of the input using mathematical convolution operators (which approximates how a receptive field works).

A Convolutional Neural Network
A Convolutional Neural Network

The first convolution layer users a set of convolution filters to identify a set of low level features from the input image. These identified low level features are then pooled (from the pooling layers) and given as the input to the next convolution layer, which uses another set of convolution filters to identify a set of higher level features from the lower level features identified earlier. This continues for several layers, where each convolution layer uses the inputs from the previous layer to identify higher level features than the previous layer. Finally, the output of the last convolution layer is passed on to a set of fully connected layers for the final classification.

This is the Hierarchical Feature Learning we talked about earlier, and it is the key of Deep Learning, and what differentiates it from traditional Machine learning algorithms.

Hierarchical Feature Learning
Hierarchical Feature Learning

A Deep Learning model (such as a Convolutional Neural Network) does not try to understand the entire problem at once. I.e. it does not try to grasp all the features of the input at once, as traditional algorithms did. What it does is look at the input piece by piece, and derive lower level patterns/features from it. It then uses these lower level features to gradually identify higher level features, through many layers, hierarchically. This allows Deep Learning models to learn complicated patterns, by gradually building them up from simpler ones. This also allows Deep Learning models to comprehend the world better, and they not only ‘see’ the features, but also see the hierarchy of how those features are built upon.

And of course, having to learn features hierarchically means that the model must have many layers in it. Which means that such a model will be ‘deep’.

That brings us back to our original question: It is not that we call deep models as Deep Learning. It is that in order to achieve hierarchical learning the models need to be deep. The deepness is a byproduct of implementing Hierarchical Feature Learning.

So, how do we identify whether a model is a Deep Learning model or now?

Simply, if the model uses Hierarchical Feature Learning – identifying lower level features first, and then build upon them to identify higher level features - (e.g. by using convolution filters) then it is a Deep Learning model. If not, then no matter how many layers your model has (e.g. a neural network with only fully connected layers) then it’s not considered a Deep Learning model.

Related posts:
What is Deep Learning?

Related links:

Build Deeper: Deep Learning Beginners' Guide is the ultimate guide for anyone taking their first step into Deep Learning.

Get your copy now!


  1. Unsupervised feature learning can be quite passive and doesn't require search to connect concepts together. Time will tell how useful or not it is and whether the human brain uses much of it or not. I wonder how it relates to "feedback alignment?"

    1. This paper is very interesting. Thanks a lot for posting it :)
      I was not familiar with the "feedback alignment" concept. Going through it now.