Module1 (2).pptx

vallepubalaji66 · 84 slides · Mar 01, 2025


Slide Content

Slide 1: CSEN3011 Artificial Neural Networks – O. V. Ramana Murthy

Slide 2: Course Outcome 1 – Understand the origin, ideological basics, learning process, and various neural architectures of ANN.

Slide 3: Contents – Artificial Neuron; Neural Network Training with Backpropagation; Practical Issues; Common Neural Architectures.

Slide 4: Reference – Chapters 1 and 2 of Charu C. Aggarwal, "Neural Networks and Deep Learning", Springer International Publishing AG. Video lectures: Deep Learning – Charu Aggarwal, https://www.youtube.com/playlist?list=PLLo1RD8Vbbb_6gCyqxG_qzCLOj9EKubw7

Slide 5: Neuron.

Slide 6: Perceptron – The simplest neural network is referred to as the perceptron. This network contains a single input layer and an output node.

Slide 7: Artificial Neuron – diagram of a single neuron: inputs x_1, x_2, …, x_n with weights w_1, w_2, …, w_n, a bias input of 1 with weight b, a summing junction, and an activation function f_act producing the output y.

Slide 8: Numerical Example 1 – Calculate the output assuming a binary step activation function. Inputs x_1 = 0.6 and x_2 = 0.2, weights w_1 = 0.7 and w_2 = 0.3, bias 0.45. Net input y_in = 0.6(0.7) + 0.2(0.3) + 0.45 = 0.93; with a binary step function (threshold 0), the output is 1.

Slide 9: Artificial Neuron Model – a two-input neuron (inputs x_1, x_2 with weights w_1, w_2, output y) with a sigmoidal activation function f(x) = 1/(1 + e^(-λx)); λ = 1 gives the binary sigmoidal function.

Slide 10: Activation Functions – figure showing the common activation functions: identity/linear, binary and bipolar step, and binary and bipolar sigmoidal. Source [2].
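
A minimal Python sketch of these activation functions, assuming a steepness parameter λ = 1 and a threshold of 0 for the step functions (both are assumptions chosen to match the examples in this deck):

import numpy as np

def identity(x):
    return x

def binary_step(x):
    return np.where(x >= 0, 1, 0)      # assumed threshold at 0

def bipolar_step(x):
    return np.where(x >= 0, 1, -1)

def binary_sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * x))

def bipolar_sigmoid(x, lam=1.0):
    return 2.0 / (1.0 + np.exp(-lam * x)) - 1.0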

Slide 11: Numerical Example 2 – Calculate the output assuming the binary sigmoidal activation function f(x) = 1/(1 + e^(-λx)) with λ = 1, for the same neuron as in Example 1 (inputs 0.6 and 0.2, weights 0.7 and 0.3, bias 0.45, net input y_in = 0.93).
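
A quick numerical check of Examples 1 and 2, assuming inputs 0.6 and 0.2 paired with weights 0.7 and 0.3 (the pairing consistent with the net input of 0.93 shown on slide 8):

import numpy as np

x = np.array([0.6, 0.2])   # inputs (assumed pairing)
w = np.array([0.7, 0.3])   # weights (assumed pairing)
b = 0.45                   # bias

y_in = x @ w + b                       # net input = 0.93
y_step = 1 if y_in >= 0 else 0         # Example 1: binary step output = 1
y_sig = 1.0 / (1.0 + np.exp(-y_in))    # Example 2: binary sigmoid output ~ 0.717
print(y_in, y_step, y_sig)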

Slide 12: Pre- and Post-Activation Values.

Slide 13: Linear Separability.

Slide 14: Linearly Separable – AND gate truth table: x_1 = 0, x_2 = 0 gives Y = 0; x_1 = 0, x_2 = 1 gives Y = 0; x_1 = 1, x_2 = 0 gives Y = 0; x_1 = 1, x_2 = 1 gives Y = 1.

Slide 15: Linearly Separable – AND gate. Two input sources => two input neurons; one output => one output neuron. The activation function is the binary sigmoidal f(x) = 1/(1 + e^(-x)), with derivative f'(x) = f(x)(1 − f(x)).

Slide 16: Linearly Separable – AND gate network diagram: inputs x_1, x_2 with weights w_1, w_2, a bias input 1 with weight w, and a summation followed by the activation f(.) producing the output Y.

Slide 17: Back-propagation training/algorithm – Given: input vector (x_1, x_2) at the i-th instant and target t. Initialize weights w, w_1, w_2 and the learning rate α with random values in the range [0, 1]. Output: y = f(y_in), where y_in = w + w_1 x_1 + w_2 x_2 and f is the sigmoidal activation function. Compute the error: e = t − y. Backpropagate the error across the activation function: δ = e f'(y_in), where f'(.) is the derivative of the selected activation function; for the sigmoidal activation, f'(y_in) = f(y_in)(1 − f(y_in)) = y(1 − y).

Slide 18: Back-propagation training/algorithm – Compute the change in weights and bias: Δw_1 = α δ x_1, Δw_2 = α δ x_2, Δw = α δ. Update the weights and bias: w_1(new) = w_1 + Δw_1, w_2(new) = w_2 + Δw_2, w(new) = w + Δw. Keep repeating steps 1–6 for all input combinations (4 in total); this is one epoch. Run multiple epochs until the error decreases and stabilizes.
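
A minimal NumPy sketch of the per-sample training loop of slides 17–18 for the AND gate; the learning rate, the random initialization, and the epoch count are assumptions not fixed by the slides:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# AND-gate data: the four input combinations and targets (slide 14)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 0, 0, 1], dtype=float)

w = rng.uniform(0, 1, size=2)   # w_1, w_2
b = rng.uniform(0, 1)           # bias weight w
alpha = 0.1                     # learning rate (assumed)

for epoch in range(10000):
    for x, t in zip(X, T):
        y_in = x @ w + b                 # net input
        y = sigmoid(y_in)                # output
        delta = (t - y) * y * (1 - y)    # backpropagated error
        w += alpha * delta * x           # weight changes
        b += alpha * delta               # bias change

print(sigmoid(X @ w + b))   # outputs move toward [0, 0, 0, 1] as training proceeds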

Slide 19: Matrix Notation – AND gate. Forward pass: Y_in = XW + w. After the activation function: Y = f(Y_in). Loss function: E = (1/2) Σ (T − Y)².

Slide 20: Matrix Notation – AND gate. Backpropagation: Δ = (T − Y) * f'(Y_in) (element-wise). Update weights: W(new) = W + α X^T Δ, w(new) = w + α Σ Δ. This iterative process continues until convergence.

Slide 21: Backpropagating Error (4 Rules) – Rule 1, output neuron: multiply the error by the derivative of the activation function, δ = (t − y) f'(y_in). Rule 2, across a link: pass the error term back along a connection by multiplying it by the link weight, δ_in = δ w_i. Rule 3, weights update: Δw_i = α δ x_i.

Slide 22: Backpropagating Error (4 Rules) – Rule 4, across a link with more than one hidden layer: a neuron feeding several downstream neurons receives the sum of their backpropagated errors weighted by the connecting weights, δ_in = Σ_k δ_k w_k, before being multiplied by its own activation derivative.

Slide 23: The power of nonlinear activation functions in transforming a data set to linear separability.

Slide 24: Linearly Not Separable – XOR gate truth table: x_1 = 0, x_2 = 0 gives Y = 0; x_1 = 0, x_2 = 1 gives Y = 1; x_1 = 1, x_2 = 0 gives Y = 1; x_1 = 1, x_2 = 1 gives Y = 0.

Slide 25: Linearly Not Separable – XOR gate. Two input sources => two input neurons; one output => one output neuron; one hidden layer => 2 neurons. The activation function is the binary sigmoidal f(x) = 1/(1 + e^(-x)), with derivative f'(x) = f(x)(1 − f(x)).

Slide 26: Linearly Not Separable – XOR gate.

Slide 27: Network diagram for the XOR gate – input layer (x_1, x_2 and a bias input 1), hidden layer (Z_1, Z_2 and a bias input 1), output layer (Y); input-to-hidden weights v_11, v_21, v_12, v_22 with bias weights v_01, v_02, and hidden-to-output weights w_1, w_2 with bias weight w.

Slide 28: Back-propagation Training – Feed-forward phase. Given: inputs (x_1, x_2) and target t. Initialize the weights and the learning rate α with random values. Hidden unit Z_j, j = 1 to p hidden neurons: z_in_j = v_0j + Σ_i x_i v_ij; output z_j = f(z_in_j), with f the sigmoidal activation function. Output unit: y_in = w + Σ_j z_j w_j; output y = f(y_in), with the sigmoidal activation function.

Slide 29: Back-propagation Training – Back-propagation of error phase. Compute the error correction term δ_k = (t_k − y_k) f'(y_in_k), where f'(.) is the derivative of the activation function. Compute the change in output-layer weights and bias, Δw_jk = α δ_k z_j and Δw_0k = α δ_k, and send δ_k to the previous layer. Hidden unit: calculate the error term δ_j = (Σ_k δ_k w_jk) f'(z_in_j). Compute the change in hidden-layer weights and bias, Δv_ij = α δ_j x_i and Δv_0j = α δ_j.

Slide 30: Back-propagation Training – Weights and bias update phase. Each output unit, k = 1 to m: update weights and bias, w_jk(new) = w_jk + Δw_jk, w_0k(new) = w_0k + Δw_0k. Each hidden unit, j = 1 to p: update weights and bias, v_ij(new) = v_ij + Δv_ij, v_0j(new) = v_0j + Δv_0j. Check the stopping criterion, e.g. a fixed number of epochs or targets equal/close to the network outputs.
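
A per-sample (online) sketch of the three phases of slides 28–30 for the 2-2-1 XOR network of slide 27; the initialization, learning rate, and epoch count are assumptions:

import numpy as np

def f(x):                        # binary sigmoid
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 0], dtype=float)

V = rng.uniform(-1, 1, (2, 2))   # v_ij: input i -> hidden j
v0 = rng.uniform(-1, 1, 2)       # v_01, v_02
w = rng.uniform(-1, 1, 2)        # w_1, w_2
w0 = rng.uniform(-1, 1)          # output bias w
alpha = 0.5                      # learning rate (assumed)

for epoch in range(20000):
    for x, t in zip(X, T):
        # Feed-forward phase
        z = f(v0 + x @ V)                    # hidden outputs z_1, z_2
        y = f(w0 + z @ w)                    # network output
        # Back-propagation of error phase
        delta = (t - y) * y * (1 - y)        # output error term
        delta_j = delta * w * z * (1 - z)    # hidden error terms
        # Weights and bias update phase
        w += alpha * delta * z
        w0 += alpha * delta
        V += alpha * np.outer(x, delta_j)
        v0 += alpha * delta_j

# Outputs should approach [0, 1, 1, 0]; convergence depends on the random initialization.
print([round(float(f(w0 + f(v0 + x @ V) @ w)), 3) for x in X])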

Slide 31: XOR network diagram – hidden neuron input computation: z_in_j = v_0j + x_1 v_1j + x_2 v_2j.

Slide 32: XOR network diagram – hidden neuron output computation: z_j = f(z_in_j).

Slide 33: XOR network diagram – output neuron input computation: y_in = w + z_1 w_1 + z_2 w_2.

Slide 34: XOR network diagram – output neuron output computation: y = f(y_in).

Slide 35: XOR network diagram – output error correction computation: δ = (t − y) f'(y_in).

Slide 36: XOR network diagram – output neuron weight and bias change computation: Δw_1 = α δ z_1, Δw_2 = α δ z_2, Δw = α δ.

Slide 37: XOR network diagram – hidden neuron error propagation computation: δ_in_j = δ w_j.

Slide 38: XOR network diagram – hidden neuron error correction computation: δ_j = δ_in_j f'(z_in_j).

Slide 39: XOR network diagram – hidden neuron weight and bias change computation: Δv_1j = α δ_j x_1, Δv_2j = α δ_j x_2, Δv_0j = α δ_j.

Slide 40: Matrix Notation – XOR gate. Given: the input matrix X (the four input combinations), the target vector T, and randomly initialized weight matrices V (input to hidden) and W (hidden to output) with their bias vectors.

Slide 41: Matrix Notation – XOR gate. Forward pass, hidden layer: Z_in = XV + v_0; after the activation function: Z = f(Z_in).

Slide 42: Matrix Notation – XOR gate. Forward pass, output layer: Y_in = ZW + w; after the activation function: Y = f(Y_in). Loss function: E = (1/2) Σ (T − Y)².

Slide 43: Matrix Notation – XOR gate. Backpropagation: output-layer delta Δ_out = (T − Y) * f'(Y_in); hidden-layer gradient Δ_hidden = (Δ_out W^T) * f'(Z_in) (element-wise products).

Slide 44: Matrix Notation – XOR gate. Update using W(new) = W + α Z^T Δ_out and V(new) = V + α X^T Δ_hidden, with the corresponding bias updates. This iterative process continues until convergence.
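
A vectorized NumPy sketch of slides 40–44, using the same variable names as the code fragments on slides 56–57; the initialization, learning rate, and epoch count are assumptions:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)

# XOR data (slide 24)
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([[0], [1], [1], [0]], dtype=float)

input_layer_weights = rng.uniform(-1, 1, (2, 2))    # V: input -> hidden
input_layer_bias = rng.uniform(-1, 1, (1, 2))
hidden_layer_weights = rng.uniform(-1, 1, (2, 1))   # W: hidden -> output
hidden_layer_bias = rng.uniform(-1, 1, (1, 1))
learning_rate = 0.5                                 # assumed

for epoch in range(10000):
    # Forward pass
    hidden_layer_output = sigmoid(inputs @ input_layer_weights + input_layer_bias)    # Z
    output = sigmoid(hidden_layer_output @ hidden_layer_weights + hidden_layer_bias)  # Y
    # Backpropagation
    output_layer_delta = (targets - output) * output * (1 - output)
    hidden_layer_delta = (output_layer_delta @ hidden_layer_weights.T) * hidden_layer_output * (1 - hidden_layer_output)
    # Weight and bias updates
    hidden_layer_weights += learning_rate * hidden_layer_output.T @ output_layer_delta
    input_layer_weights += learning_rate * inputs.T @ hidden_layer_delta
    hidden_layer_bias += learning_rate * np.sum(output_layer_delta, axis=0, keepdims=True)
    input_layer_bias += learning_rate * np.sum(hidden_layer_delta, axis=0, keepdims=True)

print(output.round(3))   # should approach [0, 1, 1, 0], depending on the initialization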

Slide 45: NN with Two Hidden Layers (homework).

Slide 46: Practical Issues – Softmax Layer.

Slide 47: Example – z = [2.0, 1.0, 0.1]. Then the softmax output is e^(z_i) / Σ_j e^(z_j) ≈ [0.659, 0.242, 0.099]. Let the targets be T = [1, 0, 0]. Define the loss function between the softmax output and T.
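
A quick numerical check of the softmax output for z = [2.0, 1.0, 0.1], together with the cross-entropy loss against T = [1, 0, 0] that slide 48 asks for as homework:

import numpy as np

z = np.array([2.0, 1.0, 0.1])
T = np.array([1.0, 0.0, 0.0])

exp_z = np.exp(z - z.max())                    # subtract the max for numerical stability
softmax = exp_z / exp_z.sum()                  # ~ [0.659, 0.242, 0.099]
cross_entropy = -np.sum(T * np.log(softmax))   # ~ 0.417
print(softmax, cross_entropy)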

Slide 48: Example – Homework; the loss function is cross-entropy.

Slide 49: Regularization (to avoid overfitting) – One of the primary causes of degraded generalization is overfitting. The objective is to determine a curve that defines the boundary between the two groups using the training data.

Slide 50: Overfitting – One of the primary causes of degraded generalization is overfitting. The objective is to determine a curve that defines the boundary between the two groups using the training data.

Slide 51: Overfitting – Some outliers penetrate the area of the other group and disturb the boundary. Because machine learning considers all the data, even the noise, it ends up producing an improper model (a curve, in this case). This would be penny-wise and pound-foolish.

Slide 52: Remedy – Regularization. Regularization is a numerical method that attempts to construct a model structure that is as simple as possible. The simplified model can avoid the effects of overfitting at a small cost in performance. Cost function (sum of squared errors): E = (1/2) Σ (t − y)².

Slide 53: Remedy – Regularization. For this reason, overfitting of the neural network can be reduced by adding the sum of the weights to the cost function, giving a new cost function of E plus a weight-penalty term (the L1 and L2 versions follow on the next two slides). To drop the value of the cost function, both the error and the weights should be kept as small as possible. However, if a weight becomes small enough, the associated nodes will be practically disconnected. As a result, unnecessary connections are eliminated, and the neural network becomes simpler.

Slide 54: Add L1 Regularization to the XOR Network. New loss function: L = E + λ Σ |w|. The gradient of the regularized loss w.r.t. a weight w is ∂L/∂w = ∂E/∂w + λ sign(w). Update rule for the weights: w(new) = w − α (∂E/∂w + λ sign(w)).

Slide 55: Add L2 Regularization to the XOR Network. New loss function: L = E + (λ/2) Σ w². The gradient of the regularized loss w.r.t. a weight w is ∂L/∂w = ∂E/∂w + λw. Update rule for the weights: w(new) = w − α (∂E/∂w + λw).

Slide 56: XOR implementation with L1 –
# Apply L1 regularization to weights
hidden_layer_weights += learning_rate * (np.dot(hidden_layer_output.T, output_layer_delta) - np.sign(hidden_layer_weights))
input_layer_weights += learning_rate * (np.dot(inputs.T, hidden_layer_delta) - np.sign(input_layer_weights))
# Update biases (no regularization applied to biases)
hidden_layer_bias += np.sum(output_layer_delta, axis=0, keepdims=True) * learning_rate
input_layer_bias += np.sum(hidden_layer_delta, axis=0, keepdims=True) * learning_rate

Slide 57: XOR implementation with L2 –
# Apply L2 regularization to weights
hidden_layer_weights += learning_rate * (np.dot(hidden_layer_output.T, output_layer_delta) - hidden_layer_weights)
input_layer_weights += learning_rate * (np.dot(inputs.T, hidden_layer_delta) - input_layer_weights)
# Update biases (no regularization applied to biases)
hidden_layer_bias += np.sum(output_layer_delta, axis=0, keepdims=True) * learning_rate
input_layer_bias += np.sum(hidden_layer_delta, axis=0, keepdims=True) * learning_rate

Slide 58: Matrix Notation – XOR gate with L2 Regularization. Expanding from slide 44, only the weight updates change: they acquire an extra penalty term, W(new) = W + α (Z^T Δ_out − λW) and V(new) = V + α (X^T Δ_hidden − λV); the bias values are updated as before, without regularization. This iterative process continues until convergence. L2 regularization penalizes large weights, resulting in slightly smaller weight updates compared with the non-regularized case.

Slide 59: Common Neural Architectures – Autoencoder (Module 2), Deep Neural Network (Module 3), Attractor Neural Networks (Module 4), Self-Organizing Maps (Module 5).

Slide 60: Autoencoder.

Slide 61: Autoencoder.

Slide 62: Deep Neural Network.

Slide 63: Attractor Neural Networks.

Slide 64: Self-Organizing Maps.

Slide 65: Thank You All Very Much.

Slide 66: Appendix – Example Implementation. Using a back-propagation network, find the new weights for the network shown on the slide. Input = [0, 1] and the target output is 1. Use learning rate 0.25 and the binary sigmoidal activation function.

Slide 67: 1. Consolidate the information. Given: inputs [0, 1], target 1. Hidden-layer weights [v_11, v_21, v_01] = [0.6, −0.1, 0.3] and [v_12, v_22, v_02] = [−0.3, 0.4, 0.5]; output-layer weights [w_1, w_2, w] = [0.4, 0.1, −0.2]. Learning rate α = 0.25. The activation function is the binary sigmoidal f(x) = 1/(1 + e^(-x)), with derivative f'(x) = f(x)(1 − f(x)).

Slide 68: 2. Feed-forward Phase. Hidden unit Z_j, j = 1, 2: z_in_j = v_0j + x_1 v_1j + x_2 v_2j; output z_j = f(z_in_j) with the sigmoidal activation function. Output unit: y_in = w + z_1 w_1 + z_2 w_2; output y = f(y_in) with the sigmoidal activation function.

Slide 69: 2. Feed-forward Phase (numerical). Hidden unit inputs: z_in_1 = 0.3 + 0(0.6) + 1(−0.1) = 0.2, z_in_2 = 0.5 + 0(−0.3) + 1(0.4) = 0.9; outputs z_1 = f(0.2) = 0.5498, z_2 = f(0.9) = 0.7109. Output unit: y_in = −0.2 + 0.5498(0.4) + 0.7109(0.1) = 0.0910; output y = f(0.0910) = 0.5227.

Slides 70–78: 3. Back-propagation of Error Phase (built up step by step across these slides).

Output unit: error correction term δ = (t − y) f'(y_in) = (1 − 0.5227) × 0.5227 × (1 − 0.5227) = 0.1191. Changes in output-layer weights and bias: Δw_1 = α δ z_1 = 0.25 × 0.1191 × 0.5498 = 0.0164, Δw_2 = α δ z_2 = 0.25 × 0.1191 × 0.7109 = 0.0212, Δw = α δ = 0.0298.

Hidden units: propagated errors δ_in_1 = δ w_1 = 0.1191 × 0.4 = 0.0476, δ_in_2 = δ w_2 = 0.1191 × 0.1 = 0.0119. Error terms δ_1 = δ_in_1 f'(z_in_1) = 0.0476 × 0.5498 × (1 − 0.5498) = 0.0118, δ_2 = δ_in_2 f'(z_in_2) = 0.0119 × 0.7109 × (1 − 0.7109) = 0.00245. Changes in hidden-layer weights and bias: Δv_11 = α δ_1 x_1 = 0.0, Δv_21 = α δ_1 x_2 = 0.25 × 0.0118 = 0.00295, Δv_01 = α δ_1 = 0.00295; Δv_12 = 0.0, Δv_22 = α δ_2 x_2 = 0.25 × 0.00245 = 0.00061, Δv_02 = α δ_2 = 0.00061.

Slides 79–83: 4. Weights and Bias Update Phase (built up step by step across these slides).

Output unit: w_1(new) = 0.4 + 0.0164 = 0.416, w_2(new) = 0.1 + 0.0212 = 0.121, w(new) = −0.2 + 0.0298 = −0.17.

Hidden units: v_11(new) = 0.6 + 0 = 0.6, v_21(new) = −0.1 + 0.00295 = −0.097, v_01(new) = 0.3 + 0.00295 = 0.303; v_12(new) = −0.3 + 0 = −0.3, v_22(new) = 0.4 + 0.00061 = 0.401, v_02(new) = 0.5 + 0.00061 = 0.501.

Slide 84: Weight and output values before (epoch 0) and after one epoch.

Hidden-layer weights:
Epoch   v_11   v_21     v_01    v_12   v_22    v_02
0       0.6    -0.1     0.3     -0.3   0.4     0.5
1       0.6    -0.097   0.303   -0.3   0.401   0.501

Hidden outputs and output-layer weights:
Epoch   z_1      z_2      w_1     w_2     w       y
0       0.549    0.711    0.4     0.1     -0.2    0.523
1       0.5513   0.7113   0.416   0.121   -0.17   0.5363

Write a program for this case and cross-verify your answers. After how many epochs will the output converge?
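
The slide asks for a program to cross-verify the table; here is a minimal sketch that reproduces one epoch with the values of slides 66–67 (the row-per-input weight layout is an implementation choice, not given on the slides):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Given data (slides 66-67)
x = np.array([0.0, 1.0])        # input
t = 1.0                         # target
alpha = 0.25                    # learning rate

V = np.array([[0.6, -0.3],      # v_11, v_12
              [-0.1, 0.4]])     # v_21, v_22
v0 = np.array([0.3, 0.5])       # v_01, v_02
w = np.array([0.4, 0.1])        # w_1, w_2
w0 = -0.2                       # output bias w

for epoch in range(1):          # one epoch; increase to study convergence
    # Feed-forward phase
    z_in = v0 + x @ V           # [0.2, 0.9]
    z = sigmoid(z_in)           # [0.5498, 0.7109]
    y_in = w0 + z @ w           # 0.0910
    y = sigmoid(y_in)           # 0.5227
    # Back-propagation of error phase
    delta = (t - y) * y * (1 - y)            # 0.1191
    delta_hidden = delta * w * z * (1 - z)   # [0.0118, 0.00245]
    # Weights and bias update phase
    w += alpha * delta * z
    w0 += alpha * delta
    V += alpha * np.outer(x, delta_hidden)
    v0 += alpha * delta_hidden

print(V, v0, w, w0)             # matches the epoch-1 row of the table above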