Artificial neural networks - A gentle introduction to ANNS.pptx

AttaNox1 – 50 slides – May 08, 2024

About This Presentation

Artificial neural networks and how they work. Diagrams explain the main components used in an ANN.


Slide Content

Today’s Lecture?

PATTERN RECOGNITION: A Gentle Introduction to Artificial Neural Networks (Part I)

What is it? An artificial neural network is a crude way of trying to simulate the human brain (digitally). The human brain has approximately 10 billion neurons, each connected with thousands of others. Parts of a neuron: the cell body; dendrites, which receive input signals; and axons, which give the output.

Introduction: An ANN is made up of artificial neurons – digitally modeled biological neurons. Each input into the neuron has its own weight associated with it. As each input enters the nucleus (blue circle) it is multiplied by its weight.

Introduction: The nucleus sums all these new input values, which gives us the activation. For n inputs and n weights, the weights are multiplied by the inputs and summed: a = x₁w₁ + x₂w₂ + x₃w₃ + … + xₙwₙ.
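
A minimal Python sketch of this weighted sum; the input and weight values below are arbitrary illustrative numbers, not values from the slides:

```python
# Activation of an artificial neuron: a = x1*w1 + x2*w2 + ... + xn*wn
def activation(inputs, weights):
    """Weighted sum of the inputs (no threshold applied yet)."""
    return sum(x * w for x, w in zip(inputs, weights))

# Example with arbitrary values: 1*0.5 + 0*(-0.2) + 1*0.3 = 0.8
print(activation([1, 0, 1], [0.5, -0.2, 0.3]))
```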

Introduction: If the activation is greater than a threshold value, the neuron outputs a signal (for example, 1). If the activation is less than the threshold, the neuron outputs zero. This is typically called a step function.

Introduction: The combination of summation and thresholding is called a node. For the step (activation) function, the output is 1 if: x₁w₁ + x₂w₂ + x₃w₃ + … + xₙwₙ > T. http://www-cse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg

Introduction: x₁w₁ + x₂w₂ + x₃w₃ + … + xₙwₙ > T, i.e. x₁w₁ + x₂w₂ + x₃w₃ + … + xₙwₙ − T > 0. Let w₀ = −T and x₀ = 1. Then D = x₀w₀ + x₁w₁ + x₂w₂ + x₃w₃ + … + xₙwₙ > 0. The output is 1 if D > 0, and 0 otherwise. w₀ is called a bias weight.
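
A short sketch of the same neuron with the threshold folded into a bias weight, as derived above; the input, weight, and threshold values are illustrative assumptions:

```python
# Step-function neuron with the threshold T rewritten as a bias weight:
# D = x0*w0 + x1*w1 + ... + xn*wn, where x0 = 1 and w0 = -T.
def step_neuron(inputs, weights, threshold):
    x = [1] + list(inputs)             # prepend the constant bias input x0 = 1
    w = [-threshold] + list(weights)   # prepend the bias weight w0 = -T
    d = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if d > 0 else 0

# Illustrative values: activation 0.8 against a threshold of 0.5 fires the neuron.
print(step_neuron([1, 0, 1], [0.5, -0.2, 0.3], threshold=0.5))  # -> 1
```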

Typical activation functions – they control when a unit is “active” or “inactive”.

An artificial neuron – summary so far: it receives n inputs, multiplies each input by its weight, applies an activation function to the sum of the results, and outputs the result. http://www-cse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg

Simplest classifier: Can a single neuron learn a task?

A motivating example: Each day you get lunch at the cafeteria. Your diet consists of fish, chips, and drink, and you get several portions of each. The cashier only tells you the total price of the meal. After several days, you should be able to figure out the price of each portion. Each meal price gives a linear constraint on the prices of the portions: price = x_fish·w_fish + x_chips·w_chips + x_drink·w_drink.

Solving the problem: The prices of the portions are like the weights of a linear neuron. We will start with guesses for the weights and then adjust the guesses to give a better fit to the prices given by the cashier.

The cashier’s brain: a linear neuron whose weights are the true portion prices. Inputs: 2 portions of fish, 5 portions of chips, 3 portions of drink; weights: fish = 150, chips = 50, drink = 100. Price of meal = 2·150 + 5·50 + 3·100 = 850.

A model of the cashier’s brain with arbitrary initial weights: all three weights (fish, chips, drink) set to 50. With the same inputs (2, 5, 3 portions), the predicted price of the meal = 2·50 + 5·50 + 3·50 = 500, so the residual error = 850 − 500 = 350. Apply the learning rule and update the weights.
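
A rough Python sketch of this learning process, using the incremental weight-update rule described in the later gradient-descent slides. Only the first meal (2, 5, 3 portions for 850) comes from the slides; the extra meals, learning rate, and number of passes are assumptions added so the example runs end to end:

```python
# Learning the portion prices with a linear neuron.
# True prices (unknown to the learner): fish = 150, chips = 50, drink = 100.
meals = [
    ((2, 5, 3), 850),   # the meal from the slide: 2*150 + 5*50 + 3*100 = 850
    ((1, 2, 1), 350),   # assumed extra meals, consistent with the same prices
    ((3, 1, 2), 700),
]
weights = [50.0, 50.0, 50.0]   # arbitrary initial guesses (predicts 500 for meal 1)
eta = 0.02                     # assumed learning rate

for _ in range(5000):          # assumed number of passes over the data
    for portions, price in meals:
        predicted = sum(x * w for x, w in zip(portions, weights))
        error = price - predicted                 # residual error (350 on the first try)
        weights = [w + eta * error * x for x, w in zip(portions, weights)]

print([round(w, 2) for w in weights])  # close to the true prices 150, 50, 100
```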

Perceptron: In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron. A two-input perceptron.

Perceptron: A perceptron takes several inputs, x₁, x₂, …, and produces a single binary output. The model consists of a linear combiner followed by a hard limiter. The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and −1 if it is negative (1/0 in some models).

Perceptron [diagram: a two-input perceptron with inputs x₁ and x₂, a bias input of 1, example weight values −10, 1, 4, and output Y]. This is the equation of a line – the decision boundary.

Perceptron: This is the equation of a line – the decision boundary.

Perceptron learning: A perceptron (threshold unit) can learn anything that it can represent (i.e. anything separable with a hyperplane). Example truth table (X1, X2, Y): (0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1) – the OR function.

OR function: The two-input perceptron can implement the OR function when we set the weights w₀ = −0.3, w₁ = w₂ = 0.5. Decision hyperplane: w₀ + w₁x₁ + w₂x₂ = 0, i.e. −0.3 + 0.5x₁ + 0.5x₂ = 0. Training data (X1, X2, Y): (0, 0, −1), (0, 1, +1), (1, 0, +1), (1, 1, +1).

OR function: Decision hyperplane: w₀ + w₁x₁ + w₂x₂ = 0, i.e. −0.3 + 0.5x₁ + 0.5x₂ = 0. Test results (X1, X2, weighted sum, Y): (0, 0, −0.3, −1), (0, 1, 0.2, +1), (1, 0, 0.2, +1), (1, 1, 0.7, +1).

A single perceptron can be used to represent many Boolean functions. AND function: Decision hyperplane: w₀ + w₁x₁ + w₂x₂ = 0, i.e. −0.8 + 0.5x₁ + 0.5x₂ = 0. Training examples (X1, X2, Y): (0, 0, −1), (0, 1, −1), (1, 0, −1), (1, 1, +1). Test results (X1, X2, weighted sum, Y): (0, 0, −0.8, −1), (0, 1, −0.3, −1), (1, 0, −0.3, −1), (1, 1, 0.2, +1).
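
A small Python check of these two weight settings; the hard limiter outputs +1/−1 as described in the earlier perceptron slide:

```python
# Perceptron with a hard limiter: +1 if w0 + w1*x1 + w2*x2 > 0, else -1.
def perceptron(x1, x2, w0, w1, w2):
    s = w0 + w1 * x1 + w2 * x2
    return s, (+1 if s > 0 else -1)

print("OR  (w0=-0.3, w1=w2=0.5):")
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, perceptron(x1, x2, -0.3, 0.5, 0.5))  # sums -0.3, 0.2, 0.2, 0.7

print("AND (w0=-0.8, w1=w2=0.5):")
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, perceptron(x1, x2, -0.8, 0.5, 0.5))  # sums -0.8, -0.3, -0.3, 0.2
```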

XOR function: A perceptron cannot represent exclusive OR since it is not linearly separable. XOR function (X1, X2, Y): (0, 0, −1), (0, 1, +1), (1, 0, +1), (1, 1, −1).

XOR function: It is impossible to implement the XOR function with a single perceptron. Two perceptrons? XOR function (X1, X2, Y): (0, 0, −1), (0, 1, +1), (1, 0, +1), (1, 1, −1).

2D plot of basic logical operators: (a) AND (x₁ ∩ x₂). A perceptron can learn the operations AND and OR, but not Exclusive-OR.

Perceptron: The aim of the perceptron is to classify inputs x₁, x₂, …, xₙ into one of two classes, say A1 and A2. In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the function x₁w₁ + x₂w₂ + … + xₙwₙ − T = 0 (equivalently w₀ + x₁w₁ + … + xₙwₙ = 0, with the bias weight w₀ = −T).

Linear separability with perceptron

Perceptron Learning

Gradient descent: Error surface. Use gradient descent to find the minimum value of E.

Training rule derivation – Gradient descent. Objective: find the values of the weights which minimize the error function E = ½ Σ_d (T(d) − O(d))², where O(d) is the observed output and T(d) is the target output for training example d.

Batch gradient descent: Gradient-Descent(training_examples, η). Each training example is a pair of the form <(x₁, …, xₙ), t>, where (x₁, …, xₙ) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1).
Initialize each wᵢ to some small random value.
Until the termination condition is met, do:
  Initialize each Δwᵢ to zero.
  For each <(x₁, …, xₙ), t> in training_examples, do:
    Input the instance (x₁, …, xₙ) to the linear unit and compute the output o.
    For each linear unit weight wᵢ, do: Δwᵢ ← Δwᵢ + η(t − o)xᵢ.
  For each linear unit weight wᵢ, do: wᵢ ← wᵢ + Δwᵢ.
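
A Python sketch of this batch procedure for a linear unit; the training data, learning rate, and the fixed epoch count used as the termination condition are illustrative assumptions:

```python
import random

def batch_gradient_descent(training_examples, eta=0.1, epochs=100):
    """training_examples: list of ((x1, ..., xn), t) pairs for a linear unit."""
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]   # small random initial weights
    for _ in range(epochs):                               # simple termination condition
        delta_w = [0.0] * n                               # initialize each Δwi to zero
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))      # linear unit output
            for i in range(n):
                delta_w[i] += eta * (t - o) * x[i]        # Δwi ← Δwi + η(t − o)xi
        w = [wi + dwi for wi, dwi in zip(w, delta_w)]     # wi ← wi + Δwi
    return w

# Illustrative use: learn t = 2*x1 - 1*x2 from a few examples.
examples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
print(batch_gradient_descent(examples))   # approaches [2, -1]
```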

Incremental gradient descent: The batch gradient descent training rule computes its weight updates by summing over all the training examples. Stochastic gradient descent approximates batch gradient descent by updating the weights incrementally, calculating the error for each example.

Incremental gradient descent: Gradient-Descent(training_examples, η). Each training example is a pair of the form <(x₁, …, xₙ), t>, where (x₁, …, xₙ) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1).
Initialize each wᵢ to some small random value.
Until the termination condition is met, do:
  For each <(x₁, …, xₙ), t> in training_examples, do:
    Input the instance (x₁, …, xₙ) to the linear unit and compute the output o.
    For each linear unit weight wᵢ, do: wᵢ ← wᵢ + η(t − o)xᵢ.
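
The incremental (stochastic) variant in the same sketch style, updating the weights after every example instead of once per pass; again the data, learning rate, and epoch count are assumptions:

```python
import random

def incremental_gradient_descent(training_examples, eta=0.1, epochs=100):
    """Stochastic variant: apply wi ← wi + η(t − o)xi after each training example."""
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))               # linear unit output
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]  # per-example update
    return w

examples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
print(incremental_gradient_descent(examples))   # also approaches [2, -1]
```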

Gradient Descent Algorithm

Perceptron learning: logical operation AND

[Worked-example slides: the perceptron is trained on the AND truth table one training example at a time. For each example, the weighted sum is computed and compared with 0 to give the output (0 or 1), and when the output is wrong the weights are adjusted with the update rule wᵢ ← wᵢ + η(t − o)xᵢ; the process repeats until all examples are classified correctly.]

[Russell & Norvig, 1995] XOR - Revisited Piece-wise linear separation 0,0 0,1 1,0 1,1 0,0 0,1 1,0 1,1 AND XOR

[Figure: three plots (1, 2, 3) of the XOR points (0,0), (0,1), (1,0), (1,1), showing a piece-wise linear separation of XOR built up in stages.]

Multi-layer perceptron (MLP): Minsky & Papert (1969) offered a solution to the XOR problem by combining perceptron unit responses using a second layer of units. Piecewise linear classification using an MLP with threshold (perceptron) units.
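
A hedged Python sketch of such a two-layer solution, reusing the OR and AND weight settings from the earlier slides and combining them as "x₁ OR x₂, but not (x₁ AND x₂)"; the output-layer weights are one standard choice, not necessarily the one shown on the slide:

```python
# Threshold (perceptron) unit: 1 if w0 + w1*a + w2*b > 0, else 0.
def step(w0, w1, w2, a, b):
    return 1 if w0 + w1 * a + w2 * b > 0 else 0

def xor_mlp(x1, x2):
    h1 = step(-0.3, 0.5, 0.5, x1, x2)     # hidden unit 1: OR  (weights from the OR slide)
    h2 = step(-0.8, 0.5, 0.5, x1, x2)     # hidden unit 2: AND (weights from the AND slide)
    return step(-0.3, 0.5, -0.5, h1, h2)  # output unit: h1 AND NOT h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_mlp(x1, x2))  # prints 0, 1, 1, 0
```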

More to come… Part II