Artificial neural networks - A gentle introduction to ANNS.pptx
AttaNox1
About This Presentation
Artificial neural networks and how they work. Diagrams explain the main components used in an ANN.
Size: 1.84 MB
Language: en
Added: May 08, 2024
Slides: 50 pages
Slide Content
Today’s Lecture?
Pattern Recognition: A Gentle Introduction to Artificial Neural Networks (Part I)
What is it? An artificial neural network is a crude attempt to simulate the human brain digitally. The human brain has roughly 10 billion neurons, each connected to thousands of others. Parts of a neuron: the cell body, the dendrites (which receive input signals), and the axon (which gives the output).
Introduction An ANN is made up of artificial neurons: digitally modelled biological neurons. Each input into the neuron has its own weight associated with it; as each input enters the nucleus, it is multiplied by its weight.
Introduction The nucleus sums all these weighted input values, which gives us the activation. For n inputs and n weights, each input is multiplied by its weight and the products are summed: a = x1w1 + x2w2 + x3w3 + ... + xnwn
Introduction If the activation is greater than a threshold value, the neuron outputs a signal (for example, 1); if the activation is less than the threshold, the neuron outputs zero. This is typically called a step function.
Introduction The combination of summation and thresholding is called a node. For the step (activation) function, the output is 1 if: x1w1 + x2w2 + x3w3 + ... + xnwn > T
Introduction x1w1 + x2w2 + x3w3 + ... + xnwn > T is equivalent to x1w1 + x2w2 + x3w3 + ... + xnwn - T > 0. Let w0 = -T and x0 = 1, and define D = x0w0 + x1w1 + x2w2 + x3w3 + ... + xnwn. The output is 1 if D > 0 and 0 otherwise; w0 is called a bias weight.
Typical activation functions These control when the unit is "active" or "inactive".
An artificial neuron – summary so far It receives n inputs, multiplies each input by its weight, applies an activation function to the sum of the results, and outputs the result.
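A minimal sketch in Python of the artificial neuron just described, with the threshold folded into a bias weight as on the previous slide (the example weights are arbitrary):

```python
# Minimal sketch of an artificial neuron: weighted sum of the inputs plus a
# bias weight, followed by a step activation function.

def step(activation):
    """Step activation: 1 if the activation is positive, 0 otherwise."""
    return 1 if activation > 0 else 0

def neuron_output(inputs, weights, bias_weight):
    """Compute D = x0*w0 + x1*w1 + ... + xn*wn with x0 fixed at 1."""
    activation = bias_weight + sum(x * w for x, w in zip(inputs, weights))
    return step(activation)

# Example: two inputs with arbitrary weights
print(neuron_output([1, 0], weights=[0.5, 0.5], bias_weight=-0.3))  # -> 1
print(neuron_output([0, 0], weights=[0.5, 0.5], bias_weight=-0.3))  # -> 0
```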
Simplest classifier Can a single neuron learn a task?
A motivating example Each day you get lunch at the cafeteria. Your diet consists of fish, chips, and drink, and you get several portions of each. The cashier only tells you the total price of the meal. After several days, you should be able to figure out the price of each portion. Each meal price gives a linear constraint on the prices of the portions: price = xfish wfish + xchips wchips + xdrink wdrink
Solving the problem The prices of the portions are like the weights of a linear neuron. We start with guesses for the weights and then adjust the guesses to give a better fit to the prices given by the cashier.
The cashier's brain (a linear neuron) With 2 portions of fish, 5 portions of chips, and 3 drinks at true prices of 150, 50, and 100, the price of the meal = 2×150 + 5×50 + 3×100 = 850.
A model of the cashier's brain with arbitrary initial weights of 50, 50, and 50 The predicted price of the meal = 2×50 + 5×50 + 3×50 = 500, so the residual error = 850 − 500 = 350. Apply the learning rule and update the weights.
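A minimal sketch of this learning process; the true prices and initial guesses come from the slides, while the learning rate and the sequence of meals are assumed for illustration:

```python
# The cashier example: a linear neuron learns the portion prices from total
# meal prices using the delta rule  w_i += eta * (target - output) * x_i.
import random

true_prices = [150, 50, 100]        # fish, chips, drink (the cashier's "weights")
weights = [50.0, 50.0, 50.0]        # arbitrary initial guesses
eta = 0.01                          # assumed learning rate

for _ in range(1000):
    portions = [random.randint(1, 5) for _ in true_prices]      # a random meal
    target = sum(p * w for p, w in zip(portions, true_prices))  # price the cashier quotes
    output = sum(p * w for p, w in zip(portions, weights))      # our predicted price
    error = target - output                                     # residual error
    weights = [w + eta * error * p for w, p in zip(weights, portions)]

print([round(w) for w in weights])  # approaches [150, 50, 100]
```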
Perceptron In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: the perceptron. (Figure: a two-input perceptron.)
Perceptron A perceptron takes several inputs, x1, x2, …, and produces a single binary output. The model consists of a linear combiner followed by a hard limiter. The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and -1 if it is negative (1/0 in some models).
Perceptron (Figure: a two-input perceptron and its decision boundary.) Setting the weighted sum equal to the threshold gives the equation of a line: the decision boundary.
Perceptron learning A perceptron (threshold unit) can learn anything that it can represent (i.e. anything separable with a hyperplane).
OR function The two-input perceptron can implement the OR function when we set the weights w0 = -0.3, w1 = w2 = 0.5. Decision hyperplane: w0 + w1x1 + w2x2 = 0, i.e. -0.3 + 0.5x1 + 0.5x2 = 0. Training data: (x1, x2) = (0, 0) → y = -1; (0, 1) → +1; (1, 0) → +1; (1, 1) → +1.
OR function Decision hyperplane: w0 + w1x1 + w2x2 = 0, i.e. -0.3 + 0.5x1 + 0.5x2 = 0. Test results: (0, 0): sum = -0.3 → y = -1; (0, 1): sum = 0.2 → +1; (1, 0): sum = 0.2 → +1; (1, 1): sum = 0.7 → +1.
AND function A single perceptron can be used to represent many Boolean functions. AND function: decision hyperplane w0 + w1x1 + w2x2 = 0, i.e. -0.8 + 0.5x1 + 0.5x2 = 0. Training examples: (0, 0) → -1; (0, 1) → -1; (1, 0) → -1; (1, 1) → +1. Test results: (0, 0): sum = -0.8 → -1; (0, 1): sum = -0.3 → -1; (1, 0): sum = -0.3 → -1; (1, 1): sum = 0.2 → +1.
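A quick sketch verifying these two decision boundaries, using only the weights given above and a hard limiter that outputs +1 or -1:

```python
# Check the OR and AND perceptrons with the weights from the slides.

def hard_limiter(x):
    return +1 if x > 0 else -1

def perceptron(x1, x2, w0, w1, w2):
    return hard_limiter(w0 + w1 * x1 + w2 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        y_or = perceptron(x1, x2, w0=-0.3, w1=0.5, w2=0.5)   # OR weights
        y_and = perceptron(x1, x2, w0=-0.8, w1=0.5, w2=0.5)  # AND weights
        print(f"x1={x1} x2={x2}  OR -> {y_or:+d}  AND -> {y_and:+d}")
```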
XOR function A perceptron cannot represent exclusive OR, since it is not linearly separable. XOR truth table: (0, 0) → -1; (0, 1) → +1; (1, 0) → +1; (1, 1) → -1.
XOR function It is impossible to implement the XOR function with a single perceptron. Two perceptrons? XOR truth table: (0, 0) → -1; (0, 1) → +1; (1, 0) → +1; (1, 1) → -1.
2D plot of basic logical operators (Figure: (a) AND (x1 ∧ x2).) A perceptron can learn the operations AND and OR, but not exclusive-OR.
Perceptron The aim of the perceptron is to classify inputs x1, x2, ..., xn into one of two classes, say A1 and A2. In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the function: x1w1 + x2w2 + ... + xnwn − T = 0
Linear separability with perceptron
Perceptron Learning
Gradient descent (Figure: the error surface.) Use gradient descent to find the minimum value of the error E.
Training rule derivation – gradient descent Objective: find the values of the weights which minimize the error function, where o(d) is the observed output and t(d) is the target output for training example d.
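The error function itself is not spelled out above; assuming the usual sum-of-squared-errors definition over the training set D, the objective and the resulting gradient-descent weight update are:

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{d \in D} \bigl(t^{(d)} - o^{(d)}\bigr)^{2},
\qquad
\Delta w_i = -\eta \frac{\partial E}{\partial w_i}
           = \eta \sum_{d \in D} \bigl(t^{(d)} - o^{(d)}\bigr)\, x_i^{(d)}
```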
Batch gradient descent
Gradient-Descent(training_examples, η)
Each training example is a pair of the form ⟨(x1, …, xn), t⟩, where (x1, …, xn) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1).
Initialize each wi to some small random value
Until the termination condition is met, do:
  Initialize each Δwi to zero
  For each ⟨(x1, …, xn), t⟩ in training_examples, do:
    Input the instance (x1, …, xn) to the linear unit and compute the output o
    For each linear unit weight wi, do: Δwi ← Δwi + η(t − o)xi
  For each linear unit weight wi, do: wi ← wi + Δwi
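A minimal sketch of the batch rule above for a linear unit; the example data, number of epochs, and learning rate are assumed for illustration:

```python
# Batch gradient descent for a linear unit: accumulate delta_w over all
# training examples, then apply one weight update per pass.
import random

def train_linear_unit(training_examples, eta=0.05, epochs=100):
    """training_examples: list of ((x1, ..., xn), t) pairs."""
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]        # small random weights
    for _ in range(epochs):
        delta_w = [0.0] * n
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))           # linear unit output
            for i in range(n):
                delta_w[i] += eta * (t - o) * x[i]
        w = [wi + dwi for wi, dwi in zip(w, delta_w)]          # one update per pass
    return w

# Example: learn t = 2*x1 + 3*x2 from four noiseless examples
examples = [((1, 0), 2), ((0, 1), 3), ((1, 1), 5), ((2, 1), 7)]
print(train_linear_unit(examples))                             # approaches [2, 3]
```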
Incremental gradient descent The gradient-descent training rule updates the weights after summing over all the training examples. Stochastic gradient descent approximates gradient descent by updating the weights incrementally, calculating the error for each example.
Incremental gradient descent
Gradient-Descent(training_examples, η)
Each training example is a pair of the form ⟨(x1, …, xn), t⟩, where (x1, …, xn) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1).
Initialize each wi to some small random value
Until the termination condition is met, do:
  For each ⟨(x1, …, xn), t⟩ in training_examples, do:
    Input the instance (x1, …, xn) to the linear unit and compute the output o
    For each linear unit weight wi, do: wi ← wi + η(t − o)xi
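The incremental (stochastic) variant changes only where the update happens; a sketch under the same assumptions as the batch version:

```python
# Incremental (stochastic) gradient descent: update the weights immediately
# after each training example instead of once per pass.

def train_linear_unit_incremental(training_examples, eta=0.05, epochs=100):
    n = len(training_examples[0][0])
    w = [0.0] * n                                              # assumed initial weights
    for _ in range(epochs):
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]  # per-example update
    return w
```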
Gradient descent algorithm
Perceptron learning: logical operation AND (A worked example over several slides: for each training example in turn, the weighted sum is computed and thresholded at 0 to give an output of 0 or 1, and the update rule adjusts the weights whenever the output differs from the target.)
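A sketch of this walkthrough with the perceptron update rule; the initial weights and learning rate here are assumed for illustration:

```python
# Train a perceptron (0/1 output, step function at threshold 0) on AND.

def train_and_perceptron(eta=0.1, epochs=20):
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # AND, 0/1 targets
    w0, w1, w2 = 0.2, 0.3, -0.1                                   # assumed initial weights
    for _ in range(epochs):
        for (x1, x2), target in data:
            output = 1 if (w0 + w1 * x1 + w2 * x2) > 0 else 0     # step activation
            w0 += eta * (target - output) * 1                     # bias input x0 = 1
            w1 += eta * (target - output) * x1
            w2 += eta * (target - output) * x2
    return w0, w1, w2

w0, w1, w2 = train_and_perceptron()
for (x1, x2), target in [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]:
    output = 1 if (w0 + w1 * x1 + w2 * x2) > 0 else 0
    print((x1, x2), "->", output, "(target", target, ")")         # all outputs match
```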
XOR revisited [Russell & Norvig, 1995] Piece-wise linear separation. (Figure: the input points (0,0), (0,1), (1,0), (1,1) plotted for AND and for XOR.)
Multi-layer perceptron (MLP) Minsky & Papert (1969) offered a solution to the XOR problem by combining perceptron unit responses using a second layer of units: piecewise linear classification using an MLP with threshold (perceptron) units.
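A sketch of such an MLP with threshold units computing XOR; these particular weights are one workable choice (reusing the OR and AND weights from the earlier slides), not values taken from this slide:

```python
# Two-layer threshold network for XOR: the hidden units compute OR and NAND,
# and the output unit ANDs their responses together.

def step(a):
    return 1 if a > 0 else 0

def xor_mlp(x1, x2):
    h_or   = step(-0.3 + 0.5 * x1 + 0.5 * x2)       # fires unless both inputs are 0
    h_nand = step( 0.8 - 0.5 * x1 - 0.5 * x2)       # fires unless both inputs are 1
    return step(-0.8 + 0.5 * h_or + 0.5 * h_nand)   # AND of the two hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))        # 0 1 1 0
```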