Classes of Deep Learning Networks There are three basic types of deep networks: Deep networks for unsupervised or generative learning, which capture high-order correlations of the data (no class labels). Deep networks for supervised learning, which model the posterior distribution of the target variable for classification purposes (discriminative deep networks). Hybrid deep networks, which combine the two approaches above.
Deep Networks for Unsupervised Learning There are no class labels during the learning process. There are many types of generative or unsupervised deep networks. Energy-based deep networks are very popular. Example: Deep Auto Encoder.
Deep Learning: Auto Encoder
Deep Networks for Unsupervised Learning Auto Encoder The number of output features equals the number of input features. Intermediate nodes encode the original data. (Diagram: encoder on the input side, decoder on the output side.)
Deep Networks for Unsupervised Learning Auto Encoder The network is trained using backpropagation, with the output nodes replicating the input nodes. Originally used to reconstruct noisy signals. The first layer of weights “encodes” the signal; the second layer of weights “decodes” it. The intermediate layer contains a “new” feature representation.
Deep Networks for Unsupervised Learning Auto Encoder (Diagram: the intermediate layer provides the new feature representation.)
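As a concrete illustration (not from the slides), here is a minimal single-hidden-layer autoencoder in NumPy; the layer sizes, learning rate, and sigmoid activation are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 3          # assumed sizes: 8 input features, 3-unit bottleneck
X = rng.random((100, n_in))    # toy data: 100 examples

# Encoder and decoder weights (biases omitted for brevity)
W_enc = rng.normal(0, 0.1, (n_in, n_hidden))
W_dec = rng.normal(0, 0.1, (n_hidden, n_in))
lr = 0.5

for epoch in range(1000):
    # Forward pass: encode, then decode (the output tries to replicate the input)
    H = sigmoid(X @ W_enc)        # "new" feature representation (bottleneck)
    X_hat = sigmoid(H @ W_dec)    # reconstruction

    # Backpropagation of the squared reconstruction error
    err = X_hat - X
    delta_out = err * X_hat * (1 - X_hat)
    delta_hid = (delta_out @ W_dec.T) * H * (1 - H)

    W_dec -= lr * H.T @ delta_out / len(X)
    W_enc -= lr * X.T @ delta_hid / len(X)

# After training, sigmoid(X @ W_enc) gives the learned feature representation.
```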
Deep Networks for Unsupervised Learning “Deep” Auto Encoder Using a Step-Wise Mechanism Key idea: pre-train each layer as an auto-encoder.
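A minimal sketch of this step-wise (greedy layer-wise) idea, using the same kind of sigmoid autoencoder as above; the layer widths, learning rate, and number of epochs are illustrative assumptions only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.5, epochs=500, seed=0):
    """Train one sigmoid autoencoder layer and return its encoder weights."""
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(0, 0.1, (X.shape[1], n_hidden))
    W_dec = rng.normal(0, 0.1, (n_hidden, X.shape[1]))
    for _ in range(epochs):
        H = sigmoid(X @ W_enc)
        X_hat = sigmoid(H @ W_dec)
        d_out = (X_hat - X) * X_hat * (1 - X_hat)
        d_hid = (d_out @ W_dec.T) * H * (1 - H)
        W_dec -= lr * H.T @ d_out / len(X)
        W_enc -= lr * X.T @ d_hid / len(X)
    return W_enc

# Greedy layer-wise pre-training: each layer is trained as an autoencoder
# on the codes produced by the previously trained layers.
X = np.random.default_rng(1).random((200, 16))
layer_sizes = [8, 4, 2]             # assumed widths of the deep autoencoder
weights, codes = [], X
for n_hidden in layer_sizes:
    W = train_autoencoder(codes, n_hidden)
    weights.append(W)
    codes = sigmoid(codes @ W)      # input for the next layer's pre-training
# `weights` can then initialize a deep network that is fine-tuned end to end.
```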
An Example of Deep Learning Learn a “concept” (sedimentary rocks) from many images until a high-level representation is achieved.
An Example of Deep Learning Learn a hierarchy of abstract concepts using deep learning: lower layers capture local properties, higher layers capture global properties.
Deep Networks for Unsupervised Learning Deep Autoencoders to Extract Speech Features
Deep Learning for Supervised Learning Convolutional Neural Networks Primarily used for image analysis. Inspired by the animal visual cortex: each neuron responds to input from nearby neurons in what is called its receptive field.
Deep Learning for Supervised Learning Design of a Convolutional Neural Network (CNN) A CNN has input, output and hidden units. Hidden units can be of 3 types: convolutional, pooling, and fully connected. (Diagram: input → convolutional → pooling → fully connected → output.)
Deep Learning for Supervised Learning Design of a Convolutional Neural Network (CNN) Why are CNNs important when dealing with images? Assume an image of size 500x500 pixels where each pixel has 3 color channels. A single neuron in a fully connected hidden layer would need 500 x 500 x 3 = 750,000 weights!
Deep Learning for Supervised Learning Design of a Convolutional Neural Network (CNN) Why are CNNs important when dealing with images? Instead, a CNN uses only local sets of weights: each neuron is connected only to a few nearby neurons (the idea of the receptive field).
Deep Learning for Supervised Learning Design of a Convolutional Neural Network (CNN) The first difference from a traditional NN is that the neurons are arranged in 3 dimensions (width, height, depth).
Deep Learning for Supervised Learning Convolutional Neural Networks Local weights imply a sparse representation.
Deep Learning for Supervised Learning Convolutional Neural Networks Convolution Operation We need to learn the kernel K and share those parameters across the entire image.
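The slide's formula did not survive extraction; a standard way to write the operation (as cross-correlation, which is what most CNN implementations actually compute) for an image I and a learned kernel K is:

\[
S(i,j) \;=\; (I \ast K)(i,j) \;=\; \sum_{m}\sum_{n} I(i+m,\, j+n)\, K(m,n),
\]

with the same kernel K applied at every position (i, j), which is how the parameters are shared across the entire image.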
Deep Learning for Supervised Learning Convolutional Neural Networks Vertical and horizontal filters:
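As one illustration (the slide's actual filters are not reproduced here), the classic Sobel kernels are a common choice of vertical- and horizontal-edge filters; the naive convolution loop below is only a sketch of how they are applied.

```python
import numpy as np

# Sobel kernels: respond to vertical and horizontal intensity edges.
K_vertical = np.array([[-1, 0, 1],
                       [-2, 0, 2],
                       [-1, 0, 1]], dtype=float)
K_horizontal = K_vertical.T

def conv2d(image, kernel):
    """Naive 'valid' cross-correlation of a 2-D image with a small kernel."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.random.default_rng(0).random((6, 6))
vertical_edges = conv2d(image, K_vertical)
horizontal_edges = conv2d(image, K_horizontal)
```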
Deep Learning for Supervised Learning Convolutional Neural Networks Convolution Operation Rotated versions of an object can also be learned using convolution (for example, with additional rotated kernels).
Deep Learning for Supervised Learning Design of a Convolutional Neural Network (CNN) Layers alternate between convolutional layers and pooling layers:
Deep Learning for Supervised Learning Design of a Convolutional Neural Network (CNN) Pooling aggressively reduces the dimensionality of the feature space. The idea is as follows: we partition the image into a set of non-overlapping rectangles, and for each region we simply output the maximum value of that region (set of pixels). This is called “max pooling”.
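A minimal max-pooling sketch in NumPy, assuming a square feature map whose side is divisible by the pooling size:

```python
import numpy as np

def max_pool(feature_map, size=2):
    """2x2 (by default) max pooling over non-overlapping regions."""
    H, W = feature_map.shape
    pooled = feature_map.reshape(H // size, size, W // size, size)
    return pooled.max(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fm))   # 4x4 map -> 2x2 map; each entry is the max of a 2x2 block
```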
Deep Learning for Supervised Learning Design of a Convolutional Neural Network (CNN) Full convolutional neural network: apply convolution and pooling (or subsampling) iteratively, and finally apply a fully connected neural network.
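One possible way to write such a stack (convolution and pooling repeated, then a fully connected classifier). This PyTorch sketch is not from the slides; the channel counts, kernel sizes, 32x32 input, and 10-class output are assumptions.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolution
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling / subsampling
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # convolution again
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # fully connected

    def forward(self, x):                   # x: (batch, 3, 32, 32)
        x = self.features(x)                # -> (batch, 32, 8, 8)
        return self.classifier(x.flatten(1))

logits = SmallCNN()(torch.randn(4, 3, 32, 32))   # -> (4, 10)
```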
Introduction To Recurrent Neural Networks RNNs were introduced in the late 1980s. Hochreiter identified the “vanishing gradients” problem in 1991. Long Short-Term Memory (LSTM), a recurrent architecture designed to overcome this problem, was published in 1997.
Motivation Feed-forward networks accept a fixed-size vector as input and produce a fixed-size vector as output, using a fixed number of computational steps. Recurrent nets allow us to operate over sequences of vectors.
RNN Forward Pass At each time t we compute: the network input to the hidden units, the activation of the hidden units, the network input to the output units, and the output of the network.
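The equations themselves did not survive extraction from the slides; the following is a standard reconstruction under assumed notation (inputs x_i^t, hidden activations b_h^t, outputs y_k^t, weights w, activation functions θ), not necessarily the exact notation used on the slide:

\[
a_h^t = \sum_i w_{ih}\, x_i^t + \sum_{h'} w_{h'h}\, b_{h'}^{t-1}
\qquad \text{(network input to hidden unit } h\text{)}
\]
\[
b_h^t = \theta_h\!\left(a_h^t\right)
\qquad \text{(activation of hidden unit } h\text{)}
\]
\[
a_k^t = \sum_h w_{hk}\, b_h^t
\qquad \text{(network input to output unit } k\text{)}
\]
\[
y_k^t = \theta_k\!\left(a_k^t\right)
\qquad \text{(output of the network)}
\]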
RNN Architecture If a network training sequence starts at time t0 and ends at time t1, the total loss function is the sum over time of the squared-error function at each step.
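Written out (again a reconstruction, with d_k(t) the target and y_k(t) the network output at time t):

\[
E_{total}(t_0, t_1) = \sum_{t=t_0}^{t_1} E(t),
\qquad
E(t) = \tfrac{1}{2} \sum_k \big(d_k(t) - y_k(t)\big)^2 .
\]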
RNN Architecture The recurrent network can be converted into a feed-forward network by unfolding it over time.
Back Propagation Through Time The BPTT learning algorithm is an extension of standard backpropagation that performs gradient descent on the unfolded network. The gradient-descent weight updates have contributions from each time step, so the errors have to be back-propagated through time as well as through the network.
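A compact NumPy sketch of BPTT on an unrolled vanilla RNN (tanh hidden units, linear outputs, squared-error loss at every step); the sizes, sequence length, and loss choice are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 3, 5, 2, 4           # assumed sizes and sequence length
W_in = rng.normal(0, 0.1, (n_in, n_hid))     # input -> hidden
W_rec = rng.normal(0, 0.1, (n_hid, n_hid))   # hidden -> hidden (recurrent)
W_out = rng.normal(0, 0.1, (n_hid, n_out))   # hidden -> output

x = rng.random((T, n_in))                    # one toy input sequence
d = rng.random((T, n_out))                   # targets at every time step

# Forward pass, unrolled over time
h = np.zeros((T + 1, n_hid))                 # h[0] is the initial state
y = np.zeros((T, n_out))
for t in range(T):
    h[t + 1] = np.tanh(x[t] @ W_in + h[t] @ W_rec)
    y[t] = h[t + 1] @ W_out

# Backward pass: errors flow back through the network AND through time
gW_in, gW_rec, gW_out = (np.zeros_like(W) for W in (W_in, W_rec, W_out))
dh_next = np.zeros(n_hid)                    # gradient arriving from time t+1
for t in reversed(range(T)):
    dy = y[t] - d[t]                                  # dE/dy at step t
    gW_out += np.outer(h[t + 1], dy)
    dh = dy @ W_out.T + dh_next                       # from output + from future
    da = dh * (1.0 - h[t + 1] ** 2)                   # through tanh
    gW_in += np.outer(x[t], da)
    gW_rec += np.outer(h[t], da)
    dh_next = da @ W_rec.T                            # propagate to step t-1

lr = 0.1                                     # gradient-descent update
W_in -= lr * gW_in; W_rec -= lr * gW_rec; W_out -= lr * gW_out
```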
RNN Backward Pass For recurrent networks, the loss function depends on the activation of the hidden layer both through its influence on the output layer and through its influence on the hidden layer at the next time step; the two contributions are combined using ⊙, the Hadamard (element-wise) product.
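A standard vectorized form of the hidden-layer delta, reconstructed under assumed notation (W_hy the hidden-to-output weights, W_hh the recurrent weights, θ' the derivative of the hidden activation, linear output units, and δ^{t_1+1} = 0):

\[
\delta^{t} \;=\; \theta'\!\left(a^{t}\right) \odot
\left( W_{hy}^{\top}\big(y^{t} - d^{t}\big) \;+\; W_{hh}^{\top}\,\delta^{t+1} \right),
\]

where the first term carries the influence on the output layer and the second the influence on the hidden layer at the next time step.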
RNN Backward Pass We sum over the whole sequence to get the derivatives with respect to the network weights, and then update the weights by gradient descent.
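In the same assumed notation (column vectors, weights applied from the left, learning rate η), the summed gradients and the gradient-descent updates are:

\[
\frac{\partial E_{total}}{\partial W_{xh}} = \sum_{t=t_0}^{t_1} \delta^{t}\,\big(x^{t}\big)^{\!\top},
\qquad
\frac{\partial E_{total}}{\partial W_{hh}} = \sum_{t=t_0}^{t_1} \delta^{t}\,\big(h^{t-1}\big)^{\!\top},
\qquad
\frac{\partial E_{total}}{\partial W_{hy}} = \sum_{t=t_0}^{t_1} \big(y^{t} - d^{t}\big)\big(h^{t}\big)^{\!\top},
\]
\[
W \;\leftarrow\; W - \eta\,\frac{\partial E_{total}}{\partial W}
\quad \text{for each of } W_{xh},\; W_{hh},\; W_{hy}.
\]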