Neural Networks
By:
Dr. Abhinav Sharma
Assistant Professor
UPES
Neural Networks
▪Neural networks are a computational model that shares some properties with the animal brain, in which many simple units (neurons) work in parallel with no centralized control unit
▪An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way the biological nervous system works
A neural network's main function is to
receive a set of inputs,
perform progressively complex calculations,
and then use the output to solve a problem
Artificial Neural Networks(ANN)
▪The behaviour of an artificial neural network is shaped by its network architecture. A network’s architecture can be defined by:
–number of neurons
–number of layers
–types of connections between layers
Input layer
Hidden layer
Output layer
A neural net can be viewed as the result of
spinning classifiers together in a layered web.
This is because each node in the hidden and
output layers has its own classifier
How it Works?
▪The weights between the units are the primary means of long-term information storage in neural networks
▪Updating the weights is the primary way the neural network learns new information
▪let’s see how this plays out end to end across the entire network
▪Each node has its own activation function, a set of inputs, and an output
[Figure: a layered network whose output nodes correspond to Class 1 and Class 2.]
A set of inputs is passed to the first hidden layer, the activations from that layer are passed to the
next layer and so on, until you reach the output layer
Biological Neuron
Neuron
Dendrite: Receives signals from
other neurons
Cell Body: Sums all the inputs
Axon: It is used to transmit signals
to the other cells
Structure of a Biological Neuron
Cytoarchitectural map of the cerebral cortex
•Dendrites: They have an irregular surface and receive signals from neighbouring neurons.
•Soma: The main body of the neuron, which accumulates the signals coming from the different dendrites. It fires when a sufficient amount of signal has accumulated.
•Axon: It has a smoother surface, fewer branches and greater length. It is the last part of the neuron; once the neuron fires, it receives the signal from the soma and passes it on to the neighbouring neurons through the axon terminals.
Structural Organization of Levels in the Brain (from largest to smallest):
Central nervous system → interregional circuits → local circuits → neurons → dendritic trees → neural microcircuits → synapses → molecules
A neural microcircuit refers to an assembly of synapses organized into patterns of connectivity to produce a functional operation of interest.
•An artificial neural network (ANN) is a machine learning approach that models the human brain and consists of a number of artificial neurons.
•The brain is a highly complex, non-linear and parallel computer.
•Neurons in ANNs tend to have fewer connections than biological neurons.
•Each neuron in an ANN receives a number of inputs.
•An activation function is applied to these inputs, which results in the activation level of the neuron (the output value of the neuron).
•Knowledge about the learning task is given in the form of examples called training examples.
Plasticity permits the developing nervous system to adapt to its surrounding environment.
In its most general form, a neural network is a machine that is designed to model the way in which the brain performs a particular task or function of interest; the network is usually implemented by using electronic components or is simulated in software on a digital computer.
A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use.
It resembles the brain in two respects:
(a) Knowledge is acquired by the network from its environment through a learning process.
(b) Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.
Non Linear Model of a Neuron
[Figure: inputs x_1, x_2, …, x_m are multiplied by synaptic weights w_k1, w_k2, …, w_km and combined at a summing junction together with a bias b_k, giving the induced field v_k; the activation function φ(·) then produces the output y_k.]
In symbols: u_k = Σ_{j=1}^{m} w_kj · x_j,  v_k = u_k + b_k,  y_k = φ(v_k)
Role of Weights and Bias
▪For a perceptron, there can be one more input called bias
▪While the weights determine the slope of the classifier line, the bias allows us to shift the line towards the left or right
▪Normally the bias is treated as another weighted input, with input value x_0 = 1
[Figure: inputs x_0 = 1, x_1, …, x_n with weights w_0, w_1, …, w_n feeding the summation and activation function f(·).]
Affine Transformation produced by the presence of a bias
Another Non Linear model of a Neuron
Activation or Transformation Function
▪Activation function translates the inputs into outputs
▪It uses a threshold to produce an output
▪Let’s now take a look at some useful activation functions in neural networks
1.Linear or Identity
2.Unit or Binary Step
3.Sigmoid or Logistic
4.Tanh
5.ReLU
6.Softmax
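As a quick illustration (a NumPy sketch, not from the original slides), each of these six functions can be written in a few lines:

import numpy as np

def linear(x):                  # 1. Linear or Identity
    return x

def binary_step(x):             # 2. Unit or Binary Step
    return np.where(x >= 0, 1, 0)

def sigmoid(x):                 # 3. Sigmoid or Logistic: squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                    # 4. Tanh: squashes to (-1, 1)
    return np.tanh(x)

def relu(x):                    # 5. ReLU: max(0, x) element-wise
    return np.maximum(0, x)

def softmax(x):                 # 6. Softmax: turns a vector into probabilities
    e = np.exp(x - np.max(x))   # shift by the max for numerical stability
    return e / e.sum()

print(sigmoid(np.array([-1.0, 0.0, 1.0])))   # [0.26894142 0.5 0.73105858]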
NETWORK ARCHITECTURE
There are three different classes of network architectures:
✓Single-layer feed-forward network
✓Multi-layer feed-forward network
✓Recurrent network
The manner in which the neurons of a neural network are structured is intimately
linked with the learning algorithm used to train the network.
Single Layer Feed-Forward Neural Network
[Figure: feedforward network with a single layer of neurons — an input layer of source nodes projecting onto an output layer of neurons.]
In a layered neural network the neurons are organized in the form of layers. In the simplest form of a layered network, we have an input layer of source nodes that projects onto an output layer of neurons, but not vice versa. This is a feedforward or acyclic network.
Multi Layer Feed-Forward Neural Network
•The MFFNN is a more general network architecture, where there are hidden layers between the input and output layers.
•Hidden nodes do not directly receive inputs from, nor send outputs to, the external environment.
•MFFNNs overcome the limitation of single-layer NNs: they can handle non-linearly separable learning tasks.
[Figure: feedforward network with one hidden layer and one output layer.]
Recurrent Neural Network
•A recurrent neural network distinguishes itself from a feedforward neural network in that it has at least one feedback loop.
[Figures: a recurrent neural network with no hidden neurons, and a recurrent neural network with hidden neurons.]
Major Aspects in ANN
•The number of layers in the network
•The direction of signal flow
•The number of nodes in each layer
•The values of the weights attached to each interconnection between neurons
McCulloch-Pitts Model of a Neuron
•The McCulloch-Pitts neuron model is one of the earliest ANN models; it has only two types of inputs: excitatory and inhibitory.
•The excitatory inputs have weights of positive magnitude and the inhibitory inputs have weights of negative magnitude.
•The inputs of the McCulloch-Pitts model can be either 0 or 1.
•It has a threshold function as its activation function, and the output is 1 if the weighted input is greater than or equal to a given threshold, else 0.
•The McCulloch-Pitts neuron model can be used to design logical operations. For that purpose, the connection weights need to be correctly decided along with the threshold.
With both connection weights set to 1 and a threshold of 1, the model computes the OR operation:

Situation | X1 | X2 | Ysum | Yout
    1     | 0  | 0  |  0   |  0
    2     | 0  | 1  |  1   |  1
    3     | 1  | 0  |  1   |  1
    4     | 1  | 1  |  2   |  1

Ysum = Σ_{i=1}^{2} w_i · X_i
Yout = f(Ysum) = 1 if Ysum ≥ 1, else 0
(A short Python sketch of this neuron follows.)
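As a short sketch (illustrative, not part of the original slides), the table above can be reproduced in plain Python with weights of 1 and a threshold of 1:

def mcculloch_pitts(inputs, weights, threshold):
    # weighted sum of the inputs ...
    y_sum = sum(w * x for w, x in zip(weights, inputs))
    # ... followed by the threshold activation
    return 1 if y_sum >= threshold else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcculloch_pitts([x1, x2], [1, 1], threshold=1))
# prints the Yout column of the table: 0, 1, 1, 1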
Perceptron
[Figure: inputs X1, X2, X3, …, Xn with weights W1, W2, W3, …, Wn feed a transfer (summation) function followed by an activation function.]
Schematic for a neuron in a neural net
Each neuron has a set of inputs, each of which is given a specific weight. The neuron computes some
function on these weighted inputs and gives the output.
Perceptron
▪The Perceptron is a linear model used for binary classification. It models a neuron
▪It receives n inputs (one corresponding to each feature)
▪It then sums those inputs, applies a transformation and produces an output
▪It has two functions,
•Summation
•Transformation(Activation)
[Figure: inputs x_1, x_2, …, x_n with weights w_1, w_2, …, w_n feeding the summation function and then the activation function f(z).]
Summation function: z = Σ_{i=1}^{n} w_i · x_i
Activation function: output = f(z)
Perceptron
▪The Perceptron consists of weights, the summation processor, and an activation function
▪A perceptron takes the weighted sum of the inputs and compares it with a threshold value θ (theta), giving the output:
•1 if the sum > θ
•0 otherwise
w_1·x_1 + w_2·x_2 + … + w_n·x_n > θ → output 1
w_1·x_1 + w_2·x_2 + … + w_n·x_n ≤ θ → output 0
Here the inputs x_i and weights w_i are real values.
[Figure: the same summation function followed by the activation function as before.]
A neuron consists of a linear combiner followed by a hard limiter (signum
activation function).
The decision boundary, a hyperplane, is defined by: Σ_{i=1}^{m} w_i · x_i + b = 0
For the perceptron to function properly, the two
classes C1 and C2 must be linearly separable.
Perceptron Example
It is used to classify any linearly separable set of inputs.
[Figure: successive weight updates shrink the number of misclassified points: Error = 2 → Error = 1 → Error = 0.]
Perceptron Example
It can be used to implement Logic Gates.
OR:
X1 | X2 | Y
0  | 0  | 0
0  | 1  | 1
1  | 0  | 1
1  | 1  | 1

AND:
X1 | X2 | Y
0  | 0  | 0
0  | 1  | 0
1  | 0  | 0
1  | 1  | 1
Perceptron Example
It can be used to implement Logic Gates.
OR
X1 X2 Y
0 0 0
0 1 1
1 0 1
1 1 1
t = 0.5
W
1= 1
W
2= 1
X1 X2
0 0
0 1
10
11
1
0 1
X1
X2
Unit step function
with threshold
value 0.5
W
1and W
2are the weights
Perceptron Example
It can be used to implement Logic Gates.
Using the AND truth table above, the same perceptron works with weights W1 = 1 and W2 = 1 and a unit step function with threshold value t = 1.5: only the input (1, 1) gives a weighted sum above 1.5.
[Figure: the perceptron and the resulting decision line separating (1, 1) from the other three points.]
(A sketch of both gate perceptrons follows.)
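A minimal sketch of both gate perceptrons, using the weights and thresholds stated above (the code itself is illustrative):

def perceptron(x1, x2, w1, w2, t):
    # unit step: output 1 if the weighted sum exceeds the threshold t
    return 1 if w1 * x1 + w2 * x2 > t else 0

gates = {"OR": (1, 1, 0.5), "AND": (1, 1, 1.5)}   # (W1, W2, t)
for name, (w1, w2, t) in gates.items():
    outputs = [perceptron(x1, x2, w1, w2, t) for x1 in (0, 1) for x2 in (0, 1)]
    print(name, outputs)
# OR  -> [0, 1, 1, 1]
# AND -> [0, 0, 0, 1]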
Training a Perceptron
▪By training we are trying to find a line / plane / hyperplane which can correctly separate two classes by adjusting the weights and biases.
▪We train the perceptron to respond to each input vector with a corresponding target
value of 0or 1.
▪Let’s understand the perceptron training process.
Perceptron Learning Algorithm
1. Initialize the weights Wj and the threshold.
2. Provide the input X and calculate the output Y.
3. Update the weights: Wj(t+1) = Wj(t) + n (d − y) x
   where Wj(t+1) is the updated weight, Wj(t) the old weight, d the desired output, y the actual output, x the input, and n the learning rate.
4. Repeat steps 2 and 3.
(A small sketch of this training loop follows.)
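A small sketch of this algorithm on the AND-gate data (the data and learning rate are illustrative; the bias is folded in as an extra weight with input 1, as discussed earlier):

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
d = np.array([0, 0, 0, 1])          # desired outputs (AND gate)
w = np.zeros(2)                     # step 1: initialize the weights
b = 0.0                             # bias, treated as another weight
n = 0.1                             # learning rate

for epoch in range(10):             # step 4: repeat steps 2 and 3
    for xi, di in zip(X, d):
        y = 1 if np.dot(w, xi) + b > 0 else 0   # step 2: calculate the output
        w += n * (di - y) * xi                  # step 3: Wj(t+1) = Wj(t) + n(d - y)x
        b += n * (di - y)                       # the bias update uses input 1

print(w, b)   # converges to w = [0.2 0.1], b = -0.2: a line separating AND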
Training Network Weights
▪We can estimate the weight values for our training data using the stochastic gradient descent optimizer.
▪Stochastic gradient descent requires two parameters:
▪Learning rate: used to limit the amount each weight is corrected each time it is updated.
▪Epochs: the number of times to run through the training data while updating the weights.
▪These, along with the training data will be the arguments to the function.
▪Let’s learn Learning Rate and Epoch in detail ahead.
Learning Rate
▪The learning rate is used to control how much we change the bias and the weights in order to reduce the error
▪It is used to analyse how the error changes when we change the values of the weights and biases by a unit
[Plot: error versus weights.]
Here we can see that with a unit increase in weights the error decreases by some value, but after a point, a further increase in weights translates into an increase in error
Setting up the Learning Rate
▪Learning rate: it is one of the most important hyperparameters, as it decides how quickly your network learns, i.e. how quickly it changes its weights to fit new data; this is called the learning process.
▪Also note that if you set the learning rate too small, your model might take ages to converge; make it too large, and within the first few training examples your loss might shoot up to the sky. Generally, a learning rate of 0.01 is a safe bet.
Epoch
▪One epoch consists of one full training cycle on the training set. Once every sample in the set has been seen, you start again, marking the beginning of the 2nd epoch.
▪A commonly used number of training epochs is 1000.
▪An epoch is basically a way to retrain your model on the same data again and again; the model improves after each cycle.
Example
▪We want to classify handwritten digits, into digits ( 0 to 9 ) based upon
their characteristics, can this problem be solved using Perceptron ?
▪Answer: No
▪As in this case we have more than two classes to classify into, a single perceptron can’t solve this problem; in such cases the perceptron fails
MULTI LAYER PERCEPTRON NEURAL NETWORK
•The multilayer perceptron neural network is an important class of feedforward neural network that consists of an input layer, hidden layers and an output layer.
•The input signal propagates through the network in a forward direction, on a layer-by-layer basis, which is why it is referred to as a multilayer perceptron neural network.
•It is a generalization of the single-layer perceptron.
•Multilayer perceptrons have been successfully applied to solve difficult and diverse problems by training in a supervised manner with a highly popular algorithm known as the error back-propagation (BP) algorithm.
•The BP algorithm is based on the error-correction learning rule and may be viewed as a generalization of the least-mean-square algorithm.
•The BP algorithm consists of two phases: a forward pass and a backward pass.
•In the forward pass the weights are fixed; in the backward pass the weights are adjusted in accordance with an error-correction rule.
•The error signal is propagated backward through the network, against the direction of the synaptic connections, hence the name error back-propagation.
•The synaptic weights are adjusted to make the actual response of the network move closer to the desired response in a statistical sense.
Multi-Layer Perceptron
▪Consider a Multi-Layer Perceptron having one input layer, one hidden layer and one output layer
▪Input vector and weights are passed onto the hidden layer
▪Output of the hidden layer is then passed to output layer
▪The activation function used here is Sigmoid Function
[Figure: inputs x_1, x_2, x_3 (plus a bias) feed the hidden layer; the hidden layer (plus a bias) feeds the output layer.]
Input layer → Hidden layer → Output layer
Now let’s see how the MLP solves the non-linearity problem.
How MLP works?
▪As we know, an MLP contains multiple perceptrons grouped together
▪We can represent a trained perceptron as:
[Figure: a trained perceptron shown as a node: inputs x and y with weights w_1 and w_2 and an activation f(·), which corresponds to a line in the (x, y) plane.]
How MLP works?
▪Suppose we have non-linearly separable data, as given below
▪A single perceptron can separate the data with one line or another, but no single line classifies all points correctly
[Figures: two alternative linear boundaries, neither of which separates the data fully.]
How MLP works?
▪The MLP adds the results of multiple perceptrons to get the desired output
How MLP works?
▪It calculates the probability of a point under multiple models and adds them:
P = 0.9
P = 0.7
P = 0.9 + 0.7 = 1.6
Here also the summed value is passed through an activation function to get a value between 0 and 1:
P = 0.87
How weights affect the result ?
▪The effect of weights on a neural network is shown below
[Figure: two classifier nodes combined with weights 7 and 2.]
We can see that the weights increase the importance of nodes; here the combined output almost looks like the second node
How weights affect the result ?
▪Using probability we can see:
Weights and bias are the coefficients of the lines of each node.
[Figure: the node probabilities P = 0.9 and P = 0.7 are combined using the weights and a bias, e.g. P = 2·0.9 + 7·0.7 − b, and the result is passed through the activation function to give the final probability.]
MLP model
▪The nodes (perceptrons) can be represented as:
[Figure: one node computes the line 7x − 3y = −9 (inputs x and y with weights 7 and −3 and bias −9); another computes 5x − 4y = 1 (weights 5 and −4, bias 1).]
MLP model
▪We can represent the neural network as:
[Figure: the two line nodes above feed an output node; the diagram’s second-layer values are 8, 2 and 7.]
MLP model
▪By rearranging, we can represent them as:
[Figure: the same network drawn with x and y as shared inputs: first-layer weights 7, −3 (bias −9) and 5, −4 (bias 1), and the second-layer values 8, 2 and 7 as before.]
How it Works?
•The weights between the units are the primary means of long-term information storage in neural networks
•Updating the weights is the primary way the neural network learns new information
A set of inputs is passed to the first hidden layer, the activations from that layer
are passed to the next layer and so on, until you reach the output layer.
Backpropagation Algorithm
Example: leads generated from various sources enter the input layer, and the network classifies the leads on the basis of priority.
Input layer → Hidden layer → Output layer
Backpropagation
What is Backpropagation?
In order to classify the leads on the basis of priorities, we need to provide the maximum weight to the most
important lead.
For that we can calculate the difference between the actual output and the desired output.
According to that difference we can update the weights.
The Backpropagation algorithm is a supervised learning method for Multilayer Perceptron.
Backpropagation –Learning Algorithm
The most common deep learning algorithm for supervised training of multilayer perceptrons is known as backpropagation: after the weighted sum of inputs passes through the activation function, we propagate backwards and update the weights to reduce the error (desired output − model output). Consider the example below:
Input | Desired Output
  0   |       0
  1   |       2
  2   |       4
Backpropagation -Example
Let’s consider the initial value of the weight as 3 and see the model output:
Input | Desired Output | Model Output (W=3)
  0   |       0        |        0
  1   |       2        |        3
  2   |       4        |        6
Backpropagation -Example
Now, we will see the error (absolute and squared):
Input | Desired Output | Model Output (W=3) | Absolute Error | Square Error
  0   |       0        |        0           |       0        |      0
  1   |       2        |        3           |       1        |      1
  2   |       4        |        6           |       2        |      4
Backpropagation -Example
Let’s update the weight value and make it 4:
Input | Desired Output | Model Output (W=3) | Absolute Error | Square Error | Model Output (W=4)
  0   |       0        |        0           |       0        |      0       |        0
  1   |       2        |        3           |       1        |      1       |        4
  2   |       4        |        6           |       2        |      4       |        8
Backpropagation -Example
Input | Desired Output | Model Output (W=3) | Absolute Error | Square Error | Model Output (W=4) | Square Error (W=4)
  0   |       0        |        0           |       0        |      0       |        0           |        0
  1   |       2        |        3           |       1        |      1       |        4           |        4
  2   |       4        |        6           |       2        |      4       |        8           |       16
There is still error, and it has increased: moving the weight from 3 to 4 took us further from the desired outputs, so the update should go in the opposite direction, towards W = 2. (A tiny sketch reproducing this arithmetic follows.)
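A tiny sketch that reproduces the arithmetic of this example (model output = W × input, desired output = 2 × input):

inputs, desired = [0, 1, 2], [0, 2, 4]

def total_squared_error(W):
    return sum((d - W * x) ** 2 for x, d in zip(inputs, desired))

for W in (3, 4, 2):
    print("W =", W, "squared error =", total_squared_error(W))
# W = 3 -> 5  (0 + 1 + 4, as in the table)
# W = 4 -> 20 (0 + 4 + 16, as in the table)
# W = 2 -> 0  (the weight the update rule should move towards)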
Use-Case Implementation Steps
Start → Read the dataset → Pre-process the dataset → Define features and labels → Divide the dataset into two parts, for training and testing → Create TensorFlow data structures for holding features, labels, etc. → Implement the model → Train the model → Reduce the MSE (actual output − desired output), repeating the process to decrease the loss → Make predictions on the test data → End
Backpropagation –Learning Algorithm
Net input for h1:
net_h1 = w1*i1 + w2*i2 + b1*1 = 0.15*0.05 + 0.2*0.1 + 0.35*1 = 0.3775
Output of h1:
out_h1 = 1/(1 + e^(-net_h1)) = 1/(1 + e^(-0.3775)) = 0.593269992
Output of h2:
out_h2 = 0.596884378
Backpropagation –Learning Algorithm
We repeat this process for the output layer neurons, using the output from the hidden layer neurons as
inputs.
Output for o1:
net_o1 = w5*out_h1 + w6*out_h2 + b2*1 = 0.4*0.593269992 + 0.45*0.596884378 + 0.6*1 = 1.105905967
out_o1 = 1/(1 + e^(-net_o1)) = 1/(1 + e^(-1.105905967)) = 0.75136507
Output for o2:
out_o2 = 0.772928465
Backpropagation –Learning Algorithm
Error for o1:
E_o1 = Σ 1/2 (target − output)² = 1/2 (0.01 − 0.75136507)² = 0.274811083
Error for o2:
E_o2 = 0.023560026
Total error:
E_total = E_o1 + E_o2 = 0.274811083 + 0.023560026 = 0.298371109
Backpropagation –Learning Algorithm
Update each of the weights in the network so that they cause the actual output to be closer to the target output.
Consider w5; we will calculate the change in total error w.r.t. w5 using the chain rule:
∂E_total/∂w5 = (∂E_total/∂out_o1) × (∂out_o1/∂net_o1) × (∂net_o1/∂w5)
[Figure: out_h1 and out_h2 feed neuron o1 through w5 and w6, together with bias b2; net_o1 → out_o1 → E_total.]
Backpropagation –Learning Algorithm
How much does the total error change with respect to the output?
E_total = 1/2 (target_o1 − out_o1)² + 1/2 (target_o2 − out_o2)²
∂E_total/∂out_o1 = −(target_o1 − out_o1) = −(0.01 − 0.75136507) = 0.74136507
How much does the output of o1 change w.r.t. its total net input?
out_o1 = 1/(1 + e^(-net_o1))
∂out_o1/∂net_o1 = out_o1 (1 − out_o1) = 0.75136507 × (1 − 0.75136507) = 0.186815602
Backpropagation –Learning Algorithm
How much does the total net input of o1 change w.r.t. w5?
net_o1 = w5*out_h1 + w6*out_h2 + b2*1
∂net_o1/∂w5 = 1 × out_h1 × w5^(1−1) + 0 + 0 = out_h1 = 0.593269992
Putting all these values together:
∂E_total/∂w5 = (∂E_total/∂out_o1) × (∂out_o1/∂net_o1) × (∂net_o1/∂w5) = 0.74136507 × 0.186815602 × 0.593269992 = 0.082167041
Backpropagation –Learning Algorithm
Decrease the error by stepping against the gradient (learning rate n = 0.5):
w5⁺ = w5 − n × ∂E_total/∂w5 = 0.4 − 0.5 × 0.082167041 = 0.35891648
Similarly, we can calculate the other weights as well.
(A short numeric check of this whole example follows.)
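A quick numeric check of the whole worked example in plain Python. The slides do not state w3 and w4; the values 0.25 and 0.30 are assumed here because they reproduce the stated out_h2:

import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

i1, i2 = 0.05, 0.10
w1, w2, b1 = 0.15, 0.20, 0.35
w3, w4 = 0.25, 0.30          # assumed (not given on the slides)
w5, w6, b2 = 0.40, 0.45, 0.60
target_o1 = 0.01

out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)           # 0.593269992
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)           # 0.596884378
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)   # 0.75136507

# chain rule: dE/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
dE_dw5 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1) * out_h1
print(dE_dw5)              # 0.082167041
print(0.4 - 0.5 * dE_dw5)  # updated w5 = 0.35891648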
What is Tensorflow?
❑Tensors are the standard way of representing data in deep learning.
❑Tensors are just multidimensional arrays, an extension of two-dimensional tables (matrices) to data with
higher dimension.
[Figure: a tensor of dimensions [6] (a vector), a tensor of dimensions [6,4] (a matrix), and a tensor of dimensions [6,4,2] (a 3-D array).]
What is Tensorflow?
In TensorFlow, computation is approached as a dataflow graph.
[Figure: a matrix of values flows through a graph of operations: MatMul(W, X) → Add(…, B) → ReLU.]
TensorFlow Code Basics
TensorFlow core programs consist of two discrete sections:
✓Building a computational graph
✓Running a computational graph
A computational graph is a series of TensorFlow operations arranged into a graph of nodes.
TensorFlow Building and Running a Graph
Building a computational graph:
import tensorflow as tf
node1 = tf.constant(3.0, tf.float32)
node2 = tf.constant(4.0)
print(node1, node2)   # constant nodes

Running a computational graph:
sess = tf.Session()
print(sess.run([node1, node2]))

To actually evaluate the nodes, we must run the computational graph within a session, as the session encapsulates the control and state of the TensorFlow runtime.
Tensorflow Example
import tensorflow as tf
# Build a graph
a = tf.constant(5.0)
b = tf.constant(6.0)
c = a * b
# Launch the graph in a session
sess = tf.Session()
# Evaluate the tensor 'c'
print(sess.run(c))
[Figure: the computational graph is built up node by node: Const a (5.0), Const b (6.0) and a Mul node c; running the graph prints 30.0.]
Graph Visualization
❑For visualizing TensorFlow graphs, we use TensorBoard.
❑The first argument when creating the FileWriter is an output directory name, which will be created if it doesn't exist.
file_writer = tf.summary.FileWriter('log_simple_graph', sess.graph)
TensorBoard runs as a local web app, on port 6006 (this is the default port; "6006" is "goog" upside-down).
Execute this command in the cmd:
tensorboard --logdir="path_to_the_graph"
Constant
One type of node is a constant. It takes no inputs, and it outputs a value it stores internally.
import tensorflow as tf
node1 = tf.constant(3.0, tf.float32)
node2 = tf.constant(4.0)
print(node1, node2)   # constant nodes
What if I want the graph to accept external inputs?
Placeholder
A graph can be parameterized to accept external inputs, known as placeholders. A placeholder is a promise to provide a value later.
How do I modify the graph if I want a new output for the same input?
Variable
To make the model trainable, we need to be able to modify the graph to get new outputs with the same input. Variables allow us to add trainable parameters to a graph.
Simple Linear Model
import tensorflow as tf
W = tf.Variable([.3], tf.float32)
b = tf.Variable([-.3], tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W * x + b
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print(sess.run(linear_model, {x: [1, 2, 3, 4]}))
We've created a model, but we don't
know how good it is yet
How to Increase the Efficiency of the Model?
Model → Calculate the loss → Update the variables → Repeat the process until the loss becomes very small.
A loss function measures how far apart the current model is from the provided data.
Calculating the Loss
In order to understand how good the Model is, we should know the loss/error.
To evaluate the model on training data, we need a y placeholder to provide the desired values, and we need to write a loss function.
We'll use a standard loss model for linear regression:
(linear_model − y) creates a vector where each element is the corresponding example's error delta;
tf.square is used to square that error;
tf.reduce_sum is used to sum all the squared errors.
y = tf.placeholder(tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)
print(sess.run(loss, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}))
Reducing the Loss
An optimizer modifies each variable according to the magnitude of the derivative of the loss with respect to that variable. Here we will use the Gradient Descent Optimizer.
How Gradient Descent Actually
Works?
Let’s understand this
with an analogy
Batch Gradient Descent
The weights are updated incrementally after each epoch. The cost function J(·), the sum of squared errors (SSE), can be written as:
J(w) = 1/2 · Σ_i (y(i) − ŷ(i))²
The magnitude and direction of the weight update are computed by taking a step in the opposite direction of the cost gradient:
Δw = −η · ∇J(w)
The weights are then updated after each epoch via the following update rule:
w := w + Δw
Here, Δw is a vector that contains the weight updates of each weight coefficient w_j, which are computed as follows:
Δw_j = η · Σ_i (y(i) − ŷ(i)) · x_j(i)
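A minimal NumPy sketch of batch gradient descent with this SSE cost (the toy data is assumed for illustration; one weight update per epoch, computed from the whole training set):

import numpy as np

X = np.array([[0.0], [1.0], [2.0]])   # toy inputs, target function y = 2x
y = np.array([0.0, 2.0, 4.0])
w = np.zeros(1)
eta = 0.1                             # learning rate

for epoch in range(100):
    y_hat = X @ w                     # predictions for the whole batch
    delta_w = eta * X.T @ (y - y_hat) # Δw_j = η Σ_i (y(i) − ŷ(i)) x_j(i)
    w += delta_w                      # w := w + Δw

print(w)   # converges towards [2.]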
Reducing the Loss
Suppose we want to find the best parameters (W and b) for our learning algorithm. We can apply the same analogy and find the best possible values for those parameters. Consider the example below:
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
sess.run(init)
for i in range(1000):
    sess.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]})
print(sess.run([W, b]))
Use Case –1 Sonar Dataset
[Figure: sonar output fed to an analyst; the output is Mine or Rock.]
Predict whether the object is a mine or a rock using the ship’s sonar.
Use Case –1 Sonar Dataset
•Sonar output gives us 60 different energy levels between zero and one.
•Using these energy levels we need to predict whether it’s a mine or a rock.
•Below is a part of the dataset/Sonar output (this is what it looks like) :
These features are obtained by
bouncing sonar signals off a metal
cylinder at various angles and under
various conditions.
Use Case –1 Sonar Dataset
With so many inputs, it is impossible for a human to process them; a machine that uses deep neural networks can do this task for us.
Use Case –1 Sonar Dataset
Note: This problem cannot be solved well by traditional machine learning algorithms, for the following reasons:
❑Large number of inputs.
❑Non-linearly separable data points.
❑A high-level hypothesis between input and output.
❑It is not a logistic regression problem.
Importing the libraries
•Let’s import the necessary libraries using the code below:
•Define the below function to load the sonar data set
Reading the Sonar dataset
Features and Labels
▪For each dataset we have two parts, features and labels. Features are the independent variables,
labels are the dependent variables.
▪In the Sonar data set let’s look at the feature and labels
Features
Labels
Defining Feature and Labels
▪According to the Sonar data set we have 59 features and 1 label.
▪So the data set X (sonar) has shape 207×61.
•Our target variable is a string, i.e. ‘M’ or ‘R’; we need to convert it to a binary numerical value, 1 or 0.
•To achieve this we’ll use a label encoder, as in the sketch below:
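The slide's code screenshot is not included in this copy; a hedged sketch of the encoding step (the file name and column index follow the later slides):

from sklearn.preprocessing import LabelEncoder
import pandas as pd

df = pd.read_csv("sonar.csv")   # path assumed, as in the later slides
y = df[df.columns[60]]          # target column: 'M' (mine) or 'R' (rock)

encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)        # 'M'/'R' become 0/1 (alphabetical order)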
Encoding the Target Variable
One hot encoding is used in multinomial regression or a multi-class classification problem, where the data needs to
be categorized into more than two categories.
In such cases we extend the concept of output being [1/0] into a set of output “bits” where only one of them turns
on (i.e. has the value of 1) while the rest are turned off (value of 0). This represents that the particular input vector
belongs to the category for which the bit was turned on.
One Hot Encoding example
•If we have three independent variables, such as number of rooms, sea-facing or not, and area of the flat, the output is whether to buy it or not.
•Before applying one-hot encoding to independent variables it is good practice to drop one column, which reduces the complexity; this is not followed in the case of a dependent variable such as the output.
Normalizing Data Set
▪Normalization usually means scaling a variable to have values between 0 and 1. Certain algorithms work better on normalized data sets, which is why normalizing the data is a good practice most people follow.
▪To normalize the dataset, see the sketch below:
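The normalization code itself was not included in this copy; a minimal sketch of min-max scaling, assuming X is the NumPy feature matrix:

def normalize(X):
    # scale every feature column into the [0, 1] range
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

X = normalize(X)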
Plotting the data
•To plot the values in the data set, we can define a plotting function (a full example appears in the deep-network section later).
Creating Training and Test set
•To split the data set into training and testing sets, and to check the dimensions by printing the shapes of the data frames, see the sketch below:
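A hedged sketch of the split (the 80/20 ratio is from the slide; variable names follow the later slides):

from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(X, Y, test_size=0.20)

# check the dimensions by printing the shape of the data frames
print(train_x.shape, train_y.shape, test_x.shape, test_y.shape)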
Training Model
▪Let’s train the model now
Prediction and Accuracy
▪After training the model, let’s look at the prediction and accuracy
Plotting Loss Function
▪Code for plotting the loss as a function of number of epochs is as shown below:
Calculating the MSE
▪We can calculate the Mean Squared Error using the below function
Plotting change in MSE
▪To view the change in MSE after each epoch, we will make a tensor which
stores the MSE history and then plot it, use the code as shown below
Perceptron -Drawbacks
▪The perceptron is very good at classifying data by plotting a separating line
▪If the classes are not linearly separable, the learning process of the perceptron will never reach a point where all points are classified properly
▪One example of a non-linearly separable case is the XOR problem
To solve this problem, a multilayer perceptron with backpropagation can be used
Perceptron -Drawbacks
▪A single perceptron won’t be able to solve complex problems such as image classification.
▪In such problems, the dimensionality and complexity of the classification task are too high.
▪Let’s move ahead and see such a use case.
E-commerce Use Case
▪To understand things ahead, let’s take an example:
▪As an e-commerce firm, you have noticed a decline in your sales, so you form a marketing team to market the products and increase sales.
▪The marketing team can market your product through various ways, such as:
•Google Ads
•Personal emails
•Sale advertisement on relevant sites
•Reference program
•Blogs and so on . . .
E-commerce Use Case
Considering all the factors and options available, we have to decide on a strategy for optimal and efficient marketing, but this task is too complex for a human to analyse because the number of parameters is quite high. This problem will have to be solved using deep learning.
Let’s see how Deep Learning works.
E-commerce Use Case
We can either use just one means to market our products or use a variety of
them.
Each way has different advantages and disadvantages as well; we will have to focus on a variety of factors and options, such as:
Category: Marketing
Sub-categories: Emails, Paid, Referral Program, Organic, Direct
Types (paid channels): Google, LinkedIn, Facebook, Instagram, Twitter — with ad types such as 1. Search ads, 2. Remarketing ads, 3. Interested ads, 4. Look-alike ads
Parameters to consider: 1. Customer acquisition cost, 2. Money spent, 3. Click rate / traffic generated, 4. Leads generated, 5. Customers generated, 6. Time taken to become a customer
Using Single Perceptron
▪The number of sales depends on different categorical inputs, their sub-categories and their parameters; computing the output from so many inputs and sub-parameters is not possible with just one neuron (perceptron).
▪That is why more than one neuron is used to solve this problem.
More inputs can be handled with more perceptrons; let’s see the multi-perceptron representation.
Using Multiple Neurons to Represent the Problem
▪Every source behaves as an input to the neural network
▪A neural network is a network of one or more neurons
▪Let’s understand the neural network in detail
[Figure: input nodes for Emails, Referral programs, Paid Ads, Direct, Social media and Organic Search; the output is 1 or 0.]
Using Multiple Neurons to Represent the Problem
▪Once all the sources are fed into the system, the neural network calculates the output after computation.
[Figure: the same input sources feeding the network; Input → Output (1 or 0).]
Using Multiple Neurons to Represent the Problem
▪Every source has a certain weightage, based on which there might be some change to the final output.
▪To get the final output as required, we might change the weightage given to the different sources (inputs).
▪Let’s see how that works.
[Figure: the same network, with heavier weights on the Paid Ads and Organic Search inputs.]
Using Multiple Neurons to Represent the Problem
▪Each layer might have certain sub-parts; for example, ‘Paid Ads’ might have sub-parts such as paid social ads, paid search-engine ads, paid YouTube ads and so on.
▪The sub-part through which the user comes might change the output.
[Figure: the same network with the Paid Ads input expanded into its sub-parts.]
Using Multiple Neurons to Represent the Problem
▪The final output is the resultant of all the parameters and input values, as depicted by the network.
[Figure: the full network from input sources to the 1/0 output.]
MLP creation in TensorFlow
▪To solve the handwritten digit classification problem, let’s create an MLP
▪First let’s set up the data and TensorFlow (see the sketch after the parameter list below):
Setting up the necessary parameters
▪Let’s set up some necessary parameters, as sketched below:
▪Learning rate: one of the most important hyperparameters, as it decides how quickly your network learns.
▪Epochs: as we know, one epoch consists of one full training cycle on the training set. Once every sample in the set has been seen, you start again, marking the beginning of the 2nd epoch.
▪Batch size: the number of training examples in one forward/backward pass.
▪Display step: whether to display the log after each epoch (1 for yes and 0 for no).
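A hedged sketch of this setup, using the legacy TF 1.x API that the rest of these slides use (the data path and parameter values are illustrative assumptions):

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# load MNIST with one-hot labels (directory is an example)
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

learning_rate = 0.001   # how quickly the network updates its weights
training_epochs = 15    # full passes over the training set
batch_size = 100        # training examples per forward/backward pass
display_step = 1        # 1: print the log after each epoch, 0: don't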
Setting up Placeholders and Network Parameters
▪Let’s create some placeholders and the network parameters; a hedged sketch follows:
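A hedged sketch (the layer sizes are common MNIST choices, assumed here rather than taken from the slides):

# network parameters
n_input = 784       # each 28x28 image flattened to a vector
n_hidden_1 = 256    # neurons in the first hidden layer
n_hidden_2 = 256    # neurons in the second hidden layer
n_classes = 10      # digits 0-9

# placeholders for the images and their one-hot labels
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])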
Creating the MLP model
▪Let’s create the MLP model; a sketch follows.
▪Notice that we haven’t defined ‘weights’ and ‘biases’ yet; we define them right after.
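One possible sketch of the model function; as the slide notes, it refers to the 'weights' and 'biases' dictionaries defined right after (the ReLU activations are an assumption):

def multilayer_perceptron(x, weights, biases):
    # hidden layer 1: affine transform followed by ReLU
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    # hidden layer 2
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    # output layer: raw logits (the softmax is applied inside the loss)
    return tf.add(tf.matmul(layer_2, weights['out']), biases['out'])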
Defining Weights and Biases
▪First let’s define ‘weights’ and ‘biases’, as sketched below:
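A matching sketch of the parameter dictionaries (random-normal initialization is an assumption):

weights = {
    'h1':  tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2':  tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1':  tf.Variable(tf.random_normal([n_hidden_1])),
    'b2':  tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}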
Constructing the Model and Optimizer
▪To construct the model and optimizer, a sketch is given below:
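A hedged sketch continuing the snippets above (the choice of Adam as the optimizer is an assumption; GradientDescentOptimizer would be used the same way):

logits = multilayer_perceptron(x, weights, biases)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)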
Constructing the Training Cycle
▪To create the training cycle, we’ll use the sketch shown below; the training cycle is used to train the model using backpropagation. We will learn the backpropagation algorithm in detail in the upcoming sessions.
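A hedged sketch of the cycle, continuing the snippets above (mnist.train.next_batch is the TF 1.x tutorial data feeder):

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for epoch in range(training_epochs):
    avg_cost = 0.0
    total_batch = int(mnist.train.num_examples / batch_size)
    for _ in range(total_batch):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # one backpropagation step, returning the cost for this mini-batch
        _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
        avg_cost += c / total_batch
    if display_step and epoch % display_step == 0:
        print("Epoch:", epoch + 1, "cost =", avg_cost)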
Using Optimizer to Minimize Error
We’ll optimize our weights to minimize the error, after each epoch the model will learn and print the cost
Validating our Model
▪To validate our model and calculate its accuracy, a sketch is given below:
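A hedged sketch of the validation step (continues the session from the training sketch above):

correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy:", sess.run(accuracy, feed_dict={x: mnist.test.images,
                                                 y: mnist.test.labels}))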
Accuracy plot
▪Let’s plot ‘Accuracy’ to see how ‘Accuracy’ changes after each epoch
▪Use the below code to plot the ‘Accuracy’ graph
Cost History plot
▪Let’s plot ‘Cost History’ to see how ‘Cost History’ changes after each epoch
▪Use the below code to plot the ‘Cost History’ graph
Complete Code for MNIST digit Classification
▪Let’s run the complete code for MNIST digit Classification
▪To download the code file use the link : MNIST code
Visualizing Graph Using TensorBoard
▪For visualizing graphs in TensorFlow, we use TensorBoard.
▪We can write the output with a FileWriter.
▪The first argument when creating the FileWriter is an output directory name, which will be created if
it doesn't exist.
TensorBoard
•Let’s start by setting up TensorBoard and integrating it with TensorFlow.
•To do that, we will first write a simple program that writes TensorFlow summaries, which are essentially logs; in order to write logs we need a summary writer.
•A summary writer can be attached as in the sketch below:
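A minimal sketch (TF 1.x API; the graph and log directory mirror the D:/graph1 example used below):

import tensorflow as tf

a = tf.constant(5.0)
b = tf.constant(6.0)
c = a * b

sess = tf.Session()
writer = tf.summary.FileWriter("D:/graph1", sess.graph)   # creates the folder if needed
print(sess.run(c))
writer.close()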
•This will create a log folder with that name, if it doesn’t exist and save the
graph structure.
•We can now start Tensorboard.
•Now let’s start the Tensorboard, go to the command prompt (cmd) and
enter the following code:
•tensorboard --logdir="D:/graph1"
•You will see Tensorboard starting message as shown below:
•The address shown there might be different, just go and paste it in your
browser to view the Tensorboard.
Starting the TensorBoard
•Solution:
•TensorBoard has some unpredictable behaviour while visualizing the data.
•Just in case you are getting a blank graph, try:
1. Changing the folder name and directory.
2. Keeping the python file and the graph file in the same folder.
3. Making sure you are giving the correct path, and checking that the log files are created after successfully running the python program. (Re-run the python program if no file is created.)
Getting a Blank Graph Problem
•The Graphs tab in TensorBoard shows the computational graph you created.
•For example, for the preceding code, the graph created is as follows:
The Graphs Tab
Graph output
•To reload the TensorFlow graph (i.e. visualize the updated one or see the changes), you need to follow the steps below:
1. Stop the running code and the TensorBoard.
2. Delete the TensorFlow logdir, or change the graph name, or change the graph directory.
3. Restart the code and launch TensorBoard.
Reloading the Graph
Why Deep Networks
We have learned about Neural Networks up till now, let’s move a step ahead
and see what Deep Networks are:
▪Suppose we have important financial data, such as stock market data.
▪There are two important factors when analysing such datasets:
1. Accuracy matters a lot.
2. The data depends on a variety of factors and computations; various parameters need to be considered, which requires more neurons.
▪To add more neurons, more hidden layers are required.
Why Deep Networks give better accuracy?
▪Adding more neurons allows more computations, and hence the accuracy can increase.
▪Each neuron adds some non-linearity through its activation function; non-linearity is required to classify the data better.
▪Deep networks have more neurons, which also facilitates better weight setting using backpropagation: more neurons add more weights to adjust. Deep networks are thus able to adjust their weights better, giving more precise output.
Accuracy on the sonar dataset earlier was:
Use Case: SONAR Data Classification
Using Deep Network Model
[Figure: sonar output fed to an analyst; the output is Mine or Rock.]
Predict whether the object is a mine or a rock using the ship’s sonar.
Use Case: SONAR Data Classification Using Deep Networks
•Sonar output gives us 60 different energy levels between zero and one.
•Using these energy levels we need to predict whether it’s a mine or a rock.
•Below is a part of the dataset/Sonar output (this is what it looks like):
Use Case: SONAR Data Classification Using Deep Network
With so many inputs, it is impossible for a human to process them; a machine that uses deep neural networks can do this task for us.
Note: Traditional machine learning algorithms are not effective on this problem because of the high number of inputs.
Use Case: SONAR Data Classification Using Deep Network
▪We will begin by importing the required libraries
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
Use Case: SONAR Data Classification Using Deep Networks
▪Create a function to read the dataset and segregate the independent and dependent variables
▪The categorical values in our dependent (target) variable (string values ‘M’/‘R’) are converted into labels using
LabelEncoder()
def read_dataset():
    df = pd.read_csv("sonar.csv")
    print(len(df.columns))
    X = df[df.columns[1:60]].values
    y = df[df.columns[60]]
    encoder = LabelEncoder()
    encoder.fit(y)
    y = encoder.transform(y)
    Y = one_hot_encode(y)
    print(X.shape)
    return (X, Y, y)
Use Case: SONAR Data Classification Using Deep Networks
▪Create a function for one-hot encoding the created labels
def one_hot_encode(labels):
    n_labels = len(labels)
    n_unique_labels = len(np.unique(labels))
    one_hot_encode = np.zeros((n_labels, n_unique_labels))
    one_hot_encode[np.arange(n_labels), labels] = 1
    return one_hot_encode
Use Case: SONAR Data Classification Using Deep Networks
▪Plot data points based on labels using Blue color for label 0 data points (normal) and Red color for label 1
data points (outliers)
def plot_points(features, labels):
    normal = np.where(labels == 0)
    outliers = np.where(labels == 1)
    fig = plt.figure(figsize=(10, 8))
    plt.plot(features[normal, 0], features[normal, 1], 'bx')
    plt.plot(features[outliers, 0], features[outliers, 1], 'ro')
    plt.xlabel('Latency (ms)')
    plt.ylabel('Throughput (mb/s)')
    plt.show()
Use Case: SONAR Data Classification Using Deep Networks
▪Read the dataset and plot the datapoints
X, Y, y = read_dataset()
plot_points(X, y)
Use Case: SONAR Data Classification Using Deep Networks
X, Y = shuffle(X, Y, random_state=1)
train_x, test_x, train_y, test_y = train_test_split(X, Y, test_size=0.20, random_state=415)
print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
▪Split the dataset into two subsets such that 20% of the dataset will be used for test subset and the
remaining will be used for training the model.
▪Define the learning rate, the total number of epochs, and the dimensions for the placeholders
▪Create the placeholder x and the variables W and b
Use Case: SONAR Data Classification Using Deep Networks
learning_rate = 0.1
training_epochs = 2500
cost_history = np.empty(shape=[1], dtype=float)
n_dim = X.shape[1]
n_class = 2
n_hidden_1 = 60
n_hidden_2 = 60
n_hidden_3 = 60
n_hidden_4 = 60
x = tf.placeholder(tf.float32, [None, n_dim])
W = tf.Variable(tf.zeros([n_dim, n_class]))
b = tf.Variable(tf.zeros([n_class]))
▪Assign the function that initializes all the variables to init
Use Case: SONAR Data Classification Using Deep Networks
init = tf.global_variables_initializer()
y_ = tf.placeholder(tf.float32, [None, n_class])
logits = tf.matmul(x, W) + b   # raw scores; the loss applies softmax internally
y = tf.nn.softmax(logits)
cost_function = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
training_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_function)
▪Create placeholders for the predicted y and the actual y_
▪Calculate the cost and optimize it using GradientDescentOptimizer; note that the raw logits, not the softmax output, are passed to softmax_cross_entropy_with_logits
sess = tf.Session()
sess.run(init)
▪Evaluate the node init within the session to initialize all the variables
▪Train the model in successive epochs while reducing the error (cost)
Use Case: SONAR Data Classification Using Deep Networks
for epoch in range(training_epochs):
    sess.run(training_step, feed_dict={x: train_x, y_: train_y})
    cost_history = np.append(cost_history, sess.run(cost_function, feed_dict={x: train_x, y_: train_y}))

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy: ", sess.run(accuracy, feed_dict={x: test_x, y_: test_y}))
▪Calculate the accuracy of the model based on correct predictions
Recall the accuracy obtained using a single perceptron, shown earlier
▪Plot the cost over successive epochs using the cost history
Use Case: SONAR Data Classification Using Deep Networks
plt.plot(range(len(cost_history)), cost_history)
plt.axis([0, training_epochs, 0, np.max(cost_history)])
plt.show()

pred_y = sess.run(y, feed_dict={x: test_x})
mse = tf.reduce_mean(tf.square(pred_y - test_y))
print("MSE: %.4f" % sess.run(mse))
▪Calculate the overall MSE (Mean Squared Error)