What is Deep Learning?

You might be coming from a traditional computer science background and learning about AI and Machine Learning. You could have been in the field of Artificial Intelligence for many years, or maybe you are just getting your feet wet in the field. Or, you could be a technology enthusiast diving into the latest technology trends. In any of those cases, you might be confronted with a multitude of terms such as "Artificial Intelligence", "Machine Learning", and "Deep Learning".

In recent times, the term Deep Learning seems to be the trend, so much so that it is now associated with consumer technologies as well. Google, Apple, Amazon, Microsoft, IBM, and many others are talking more and more about Deep Learning. You might be trying to figure out what each of these terms – Artificial Intelligence, Machine Learning, Deep Learning – means, how they relate to each other, and whether they can be used interchangeably.

I had the same problem when I started diving into this field. And things weren't easy to figure out, mainly because of the sheer amount of information available out there.

The confusion of Deep Learning

So, after much research, here’s a simplified explanation of how the terms Artificial Intelligence, Machine Learning, and Deep Learning came to be, and how they relate to each other.

The evolution of Artificial Intelligence, Machine Learning, and Deep Learning


Artificial Intelligence


Artificial Intelligence is the idea that machines (or computers) can be built that have intelligence parallel to (or greater than) that of a human, giving them the capability to perform tasks that require human intelligence.

The idea of an intelligent machine has been around since the 1300s, and was refined through the 19th century, when mathematicians like George Boole and Gottlob Frege formalized the concepts of Propositional Logic and Predicate Calculus. But the Dartmouth Conference in 1956 is what’s commonly considered the starting point of the formal research field of Artificial Intelligence. Since then, the field of AI has gone through many ups and downs and has branched out into many sub-fields. There have been attempts at applying AI to various fields – such as medicine, finance, aviation, and machinery – with varying degrees of success.

Around the late 1990s and early 2000s, researchers identified a problem in their approach to AI, one that was slowing down its progress: in order to artificially create an intelligent machine, one must first understand how intelligence works. But even today, we do not have a complete definition of what we call "intelligence".

To tackle this problem, they decided to go from the ground up – rather than trying to build intelligence directly, they could look into building a system that can grow its own intelligence. This idea created a new sub-field of AI called Machine Learning.


Machine Learning


Machine Learning is a subset of Artificial Intelligence which aims at giving machines the ability to learn without being explicitly programmed. The idea is that such machines (or computer programs), once built, will be able to evolve and adapt when they are exposed to new data.

The main idea behind Machine Learning is the ability of a learner to generalize from its experience. The learner (or the program), once given a set of training cases, must be able to build a generalized model from them, which allows it to decide on new cases with sufficient accuracy.

Based on the approach to learning, Machine Learning systems fall into three categories:

  • Supervised Learning – the system is given a set of labelled cases (the training set) and asked to build a generalized model from them in order to act on unseen cases (a quick sketch follows this list).
  • Unsupervised Learning – the system is given a set of unlabelled cases and asked to find a pattern in them. This is good for discovering hidden patterns.
  • Reinforcement Learning – the system takes actions and receives rewards. It must learn which actions yield the most reward in a given situation.
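
To make the supervised case concrete, here is a minimal sketch using scikit-learn (the library choice, the tiny made-up dataset, and the labels are my own illustrative assumptions). The learner is given a few labelled cases, builds a generalized model from them, and is then asked to decide on a case it has never seen:

```python
# A minimal supervised-learning sketch (data and labels are made up).
from sklearn.tree import DecisionTreeClassifier

# Labelled training cases: [height_cm, weight_kg] -> size label
X_train = [[150, 50], [160, 60], [180, 85], [190, 95]]
y_train = ["small", "small", "large", "large"]

# The learner generalizes from the training cases...
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# ...and decides on a new, unseen case.
print(model.predict([[175, 78]]))  # expected to print ['large']
```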

With these techniques, the field of Machine Learning flourished, with particular success in the areas of Computer Vision and Text Analysis.

By around 2010, a few things happened that influenced Machine Learning further. With ever-advancing technology, more computing power became available, making it easier to evaluate more complex machine learning models. Data processing and storage became cheaper, so more data became available for machine learning systems to consume. In parallel, our understanding of how the natural brain works also increased, allowing us to model new machine learning algorithms around it.

All these findings propelled a new area of Machine Learning called Deep Learning.


Deep Learning


Deep Learning is a subset of Machine Learning which focuses on a family of algorithms inspired by our understanding of how the brain works to acquire knowledge. It’s also referred to as Deep Structured Learning or Hierarchical Learning.

One of the definitions of Deep Learning is,
"A sub-field within machine learning that is based on algorithms for learning multiple levels of representation in order to model complex relationships among data. Higher-level features and concepts are thus defined in terms of lower-level ones, and such a hierarchy of features is called a deep architecture" - Deep Learning: Methods and Applications
Deep Learning builds upon the idea of Artificial Neural Networks and scales it up to consume large amounts of data by deepening the networks (adding more layers). With a large number of layers, a deep learning model can extract features from raw data and "learn" about those features little by little in each layer, building up to higher-level knowledge of the data. This technique is called Hierarchical Feature Learning, and it allows such systems to automatically learn complex features through multiple levels of abstraction with minimal human intervention.
"The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning." - Deep Learning. MIT Press, Ian Goodfellow and Yoshua Bengio and Aaron Courville.
One of the most distinct characteristics of Deep Learning – and one that made it quite popular and practical – is that it scales well: the more data given to it, the better it performs. Unlike many older machine learning algorithms, which have an upper bound to the amount of data they can benefit from – often called a "plateau in performance" – Deep Learning models have no such limitation (theoretically), and they may even exceed human-level performance, as is evident with modern deep learning based image processing systems outperforming humans on certain tasks.

The Plateau in Performance in Traditional vs. Deep Learning

Convolutional Neural Networks


Convolutional Neural Networks are a prime example of Deep Learning. They were inspired by how neurons are arranged in the visual cortex (the area of the brain which processes visual input). There, not all neurons are connected to all of the inputs from the visual field. Instead, the visual field is ‘tiled’ with groups of neurons (called receptive fields) which partially overlap each other.

Convolutional Neural Networks (CNNs) work in a similar way. They process the input in overlapping blocks using mathematical convolution operations (which approximate how a receptive field works).
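
To show what that convolution operation actually does, here is a toy sketch (the image and the filter values are made up for illustration): a small filter is slid over overlapping blocks of the input – each block playing the role of a receptive field – producing a map of where the feature encoded by the filter appears.

```python
# A toy 2D convolution: sliding a small filter over overlapping
# blocks of an "image" (all values made up for illustration).
import numpy as np
from scipy.signal import convolve2d

# A 4x4 "image" with a vertical edge between columns 1 and 2.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# A hand-crafted 2x2 edge filter (a stand-in for a learned low-level feature).
kernel = np.array([[1, -1],
                   [1, -1]], dtype=float)

# Each output value is computed from one overlapping 2x2 block of the input.
feature_map = convolve2d(image, kernel, mode='valid')
print(feature_map)  # the edge location shows up as the strongest responses
```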

A Convolutional Neural Network

The first convolution layer uses a set of convolution filters to identify a set of low-level features from the input image. These low-level features are then pooled (by the pooling layers) and given as the input to the next convolution layer, which uses another set of convolution filters to identify a set of higher-level features from the lower-level features identified earlier. This continues for several layers, where each convolution layer uses the inputs from the previous layer to identify higher-level features than the layer before it. Finally, the output of the last convolution layer is passed on to a set of fully connected layers for the final classification.
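
As an illustration of that structure, here is a toy CNN sketched in Keras (my own hypothetical example – the layer counts, filter sizes, and the assumption of 28x28 grayscale inputs with 10 output classes are all arbitrary choices for illustration):

```python
# A toy CNN mirroring the structure described above (illustrative only).
from tensorflow.keras import layers, models

model = models.Sequential([
    # First convolution layer: a set of filters identifying low-level features.
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),  # pooling layer

    # Next convolution layer: builds higher-level features from the ones below.
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    # Fully connected layers for the final classification.
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
```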

How Deep?


With the capabilities of Deep Learning grasped, there’s one question that usually comes up when one first learns about it:

If deeper and more complex models give Deep Learning the capability to surpass even humans, then how deep should a machine learning model be to be considered a Deep Learning model?

It turns out, we were asking the wrong question. We need to look at Deep Learning from a different angle to understand it.

Let’s take a step back and see how a Deep Learning model works, taking Convolutional Neural Networks as the example again.

As mentioned above, the convolution filters of a CNN attempt to identify lower-level features first, and use those identified features to gradually identify higher-level features through multiple steps.

This is the Hierarchical Feature Learning we talked about earlier, and it is the key to Deep Learning and what differentiates it from traditional Machine Learning algorithms.

Hierarchical Feature Learning

A Deep Learning model (such as a Convolutional Neural Network) does not try to understand the entire problem at once. That is, it does not try to grasp all the features of the input at once, as traditional algorithms did. Instead, it looks at the input piece by piece and derives lower-level patterns/features from it. It then uses these lower-level features to gradually identify higher-level features, through many layers, hierarchically. This allows Deep Learning models to learn complicated patterns by gradually building them up from simpler ones. It also allows them to comprehend the world better – they not only ‘see’ the features, but also the hierarchy of how those features are built upon one another.

And of course, having to learn features hierarchically means that the model must have many layers in it, which means that such a model will be ‘deep’.

That brings us back to our original question: it is not that we call deep models Deep Learning; it is that, in order to achieve hierarchical learning, the models need to be deep. The deepness is a byproduct of implementing Hierarchical Feature Learning.

So, how do we identify whether a model is a Deep Learning model or not?

Simply put, if the model uses Hierarchical Feature Learning – identifying lower-level features first, and then building upon them to identify higher-level features (e.g. by using convolution filters) – then it is a Deep Learning model. If not, then no matter how many layers your model has (e.g. a neural network with only fully connected layers), it is not considered a Deep Learning model.
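
As a rough illustration of that criterion (the models below are made-up toys, and the judgement follows the distinction drawn above), compare a stack of only fully connected layers with a convolutional stack:

```python
# Many layers, but only fully connected ones: by the criterion above,
# adding more layers alone does not make this a Deep Learning model.
from tensorflow.keras import layers, models

dense_only = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax'),
])

# Convolution and pooling layers learn features hierarchically,
# so this one counts as a Deep Learning model under the same criterion.
hierarchical = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
```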

Putting it all together


Getting back to our original question: how do the areas of Artificial Intelligence, Machine Learning, and Deep Learning relate to each other?

How Artificial Intelligence, Machine Learning, and Deep Learning relate to each other


Simply put, Machine Learning is a subset of Artificial Intelligence, and Deep Learning is a subset of Machine Learning, all working towards the common goal of creating an intelligent machine.

With the capabilities demonstrated and the success achieved by Deep Learning, we may be a step closer to the ultimate goal of Artificial Intelligence – building a machine with human-level (or greater) intelligence.

Now that you have an idea of what Deep Learning is, head over to the Codes of Interest Blog to get hands-on with Deep Learning.


Books on Deep Learning:
  • Deep Learning (Adaptive Computation and Machine Learning series) – Ian Goodfellow, Yoshua Bengio, and Aaron Courville – The MIT Press

    http://amzn.to/2nHhCUj

  • Deep Learning: Methods and Applications (Foundations and Trends® in Signal Processing) – Li Deng and Dong Yu – Now Publishers

    http://amzn.to/2oGoXkW


