Feature Learning , Deep Learning and Machine learning

Machine learning is a very successful technology but applying it today often requires spending substantial effort hand-designing features. This is true for applications in vision, audio, and text

Any machine learning algorithm performs as good as provided features are. Let’s understand this by using image classification example. When we try to classify an image into “motorcycle” and “Not Motorcycle”.

Algorithm needs features so that it can draw information from them.

Earlier Several researchers have spent decades to hand design these features . Below image shows sample process of creating features.

fig 1 – Feature vector creation

In the case of images, audio, and text , coming up with features is difficult , time-consuming and it requires expert knowledge. When we work on applications of learning, we spend a lot of time in tuning these features.

So this is where usual machine learning fails us. What about if we can automate this feature learning task instead of hand -engineering them ?

Self-taught learning/ Unsupervised Feature Learning

In particular, the promise of self-taught learning and unsupervised feature learning is that if we can get our algorithms to learn from ”unlabeled” data, then we can easily obtain and learn from massive amounts of it. Even though a single unlabeled example is less informative than a single labeled example, if we can get tons of the former—for example, by downloading random unlabeled images/audio clips/text documents off the internet—and if our algorithms can exploit this unlabeled data effectively, then we might be able to achieve better performance than the massive hand-engineering and massive hand-labeling approaches.

In Self-taught learning and Unsupervised feature learning, we will give our algorithms a large amount of unlabeled data with which to learn a good feature representation of the input.

Deep Learning

“Deep Learning” algorithms can automatically learn feature representations (often from unlabeled data) thus avoiding a lot of time-consuming engineering. These algorithms are based on building massive artificial neural networks that were loosely inspired by cortical (brain) computations. Below image shows comparison of deep learning feature discovery process among other algorithms.

fig 2 – Deep Learning Feature creation

To simulate the brain’s visual processing, sparse coding was developed to explain early visual processing in the brain(edge – detection).

Input: Images x (1) , x (2) , …, x (m) (each in Rn* n)

Learn: Dictionary of bases f1 , f2 , …, fk (also Rn*n), so that each input x can be approximately decomposed as:

x = å aj fj s.t. aj ’s are mostly zero (“sparse”)

Sparse coding algorithm automatically learns to represent an image in terms of the edges that appear in it. It gives a more succinct, higher-level representation than the raw pixels.

Lets understand this by using below example.

fig 3 – Face detection using deep learning

In the first layer, deep learning algorithm uses sparse coding and express images in succinct, higher-level representation. Rectangles are shown in each layer look like the same size but higher level up features look at the bigger version of the image. In 2nd layer , we can say that some neuron detects an eye which looks like that, similarly, some neuron found ear feature too. And in highest layer, neurons find a way to detect faces.

Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas.

Conclusion :

Usual Machine learning is simply a curve fitting. It is capable of producing great results but we spent a lot of time in feature discovery. While deep learning is closer to AI. It automatically learns features instead to creating it manually. We might miss some important features while creating them but deep learning tries to learn higher level features by itself.

we can use Theono(Python) for implementation of deep learning.

Theano is a Python library that lets you to define, optimize, and evaluate mathematical expressions, especially ones with multi-dimensional arrays (numpy.ndarray). Using Theano it is possible to attain speeds rivaling hand-crafted C implementations for problems involving large amounts of data. It can also surpass C on a CPU by many orders of magnitude by taking advantage of recent GPUs.

To know more about Theono follow this link – http://deeplearning.net/software/theano/introduction.html

References :

fig 1- http://www.cs.stanford.edu/people/ang//slides/DeepLearning-Mar2013.pptx

fig 2 – http://videolectures.net/deeplearning2015_bengio_theoretical_motivations/

fig 3 – http://www.cs.stanford.edu/people/ang//slides/DeepLearning-Mar2013.pptx