Make lots of neurons, identical except for weights. To keep our picture clear, weights will be either 1.0 (white), -1.0 (black), or 0.0 (missing).
Receptive fields get more complex
Repeat for additional layers
Receptive fields get still more complex
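A minimal sketch of this layered picture in Python: each neuron is a weighted sum of the values in the layer below it, and every weight is limited to 1.0, -1.0, or 0.0. The four-pixel input and the particular weight patterns are made up for illustration; only the weight restriction comes from the slides.

import numpy as np

x = np.array([1.0, -1.0, -1.0, 1.0])   # a small flattened input image (illustrative)

layer_1_weights = np.array([
    [1.0,  0.0,  0.0,  1.0],           # each row is one neuron's weights
    [0.0, -1.0, -1.0,  0.0],
    [1.0, -1.0,  0.0,  0.0],
])
layer_1 = layer_1_weights @ x          # first-layer receptive fields

layer_2_weights = np.array([
    [1.0, 1.0,  0.0],                  # second-layer neurons combine first-layer
    [0.0, 1.0, -1.0],                  # neurons, so their receptive fields get more complex
])
layer_2 = layer_2_weights @ layer_1

print(layer_1, layer_2)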
Repeat with a variation
Rectified linear units (ReLUs)
[plot of the ReLU function over inputs from -2.0 to 2.0]
If your number is positive, keep it. Otherwise you get a zero.
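The ReLU rule takes only a couple of lines of Python; a minimal sketch, with illustrative example inputs.

import numpy as np

def relu(x):
    # If your number is positive, keep it. Otherwise you get a zero.
    return np.maximum(x, 0.0)

print(relu(np.array([-2.0, -0.5, 0.5, 1.5])))   # [0.   0.   0.5  1.5]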
Add an output layer, with one output for each category: solid, vertical, diagonal, horizontal.
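A sketch of that output layer: one neuron per category, each a weighted sum of the last layer's activations. The activations and weights below are invented for illustration.

import numpy as np

hidden = np.array([0.5, 1.5, 0.0])     # activations of the last hidden layer (illustrative)

output_weights = np.array([
    [ 1.0,  0.0, -1.0],                # "solid"
    [ 0.0,  1.0,  0.0],                # "vertical"
    [-1.0,  1.0,  1.0],                # "diagonal"
    [ 0.0, -1.0,  1.0],                # "horizontal"
])

scores = output_weights @ hidden
labels = ["solid", "vertical", "diagonal", "horizontal"]
print(labels[int(np.argmax(scores))])  # the category with the highest score wins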
Learn all the weights: Gradient descent
[plot: error as a function of weight, marked at the original weight, a lower weight, and a higher weight]
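The plot suggests a simple recipe: evaluate the error at the current weight and at slightly lower and higher weights, then keep whichever weight gives the smallest error. A minimal sketch, with a made-up error function and step size:

def error(weight):
    return (weight - 0.3) ** 2             # an error curve with its minimum at 0.3 (illustrative)

weight, step = -1.0, 0.1
for _ in range(30):
    candidates = [weight - step, weight, weight + step]
    weight = min(candidates, key=error)    # move toward lower error
print(weight)                              # ends up near 0.3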
Numerically calculating the gradient is very expensive
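To see why: estimating the slope numerically takes at least one extra evaluation of the error for every weight, and each evaluation means running the whole network. A sketch under that assumption; compute_error here is a placeholder standing in for a full pass over the network and data.

def numerical_gradient(compute_error, weights, delta=1e-4):
    base_error = compute_error(weights)
    gradient = []
    for i in range(len(weights)):
        nudged = list(weights)
        nudged[i] += delta                 # one extra error evaluation per weight:
        slope = (compute_error(nudged) - base_error) / delta
        gradient.append(slope)             # millions of weights means millions of passes
    return gradient

# Toy example: error = sum of squared weights, so the true gradient is 2 * weight.
print(numerical_gradient(lambda w: sum(wi ** 2 for wi in w), [-1.0, 2.0]))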
Calculate the gradient (slope) directly
[plot: error as a function of weight; starting at the original weight, a change in weight of +1 moves along the curve and produces a change in error of -2]
slope = change in error / change in weight = ∆ error / ∆ weight = d(error) / d(weight) = ∂e/∂w = -2 / +1
Slope
You have to know your error function. For example:
error = weight^2
∂e/∂w = 2 * weight = 2 * (-1) = -2 at the original weight of -1
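A minimal sketch of putting that analytic slope to work in gradient descent. The learning rate and starting weight are illustrative choices, not from the slides.

def error(weight):
    return weight ** 2                 # error = weight^2

def slope(weight):
    return 2 * weight                  # ∂e/∂w = 2 * weight

weight = -1.0                          # at the original weight the slope is -2
learning_rate = 0.1
for _ in range(20):
    weight -= learning_rate * slope(weight)   # step downhill along the error curve
print(weight, error(weight))           # the weight heads toward 0, the error minimum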
Chaining
[diagram: the input x is multiplied by w1 to give the intermediate value y, which is multiplied by w2 to give the output e]
y = x * w1, so ∂y/∂w1 = x
e = y * w2, so ∂e/∂y = w2
e = x * w1 * w2, so ∂e/∂w1 = x * w2
∂e/∂w1 = (∂y/∂w1) * (∂e/∂y)
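A short check of the chaining result in code, with made-up numbers: the derivative built from the two local slopes matches x * w2.

x, w1, w2 = 2.0, -1.0, 0.5

y = x * w1                     # intermediate value
e = y * w2                     # output

dy_dw1 = x                     # from y = x * w1
de_dy = w2                     # from e = y * w2
de_dw1 = dy_dw1 * de_dy        # chaining: ∂e/∂w1 = ∂y/∂w1 * ∂e/∂y

print(de_dw1, x * w2)          # both print 1.0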
Chaining
[diagram: a long chain of values, weight → a → b → c → … → x → y → z → err]
∂err/∂weight = (∂a/∂weight) * (∂b/∂a) * (∂c/∂b) * (∂d/∂c) * … * (∂y/∂x) * (∂z/∂y) * (∂err/∂z)
Backpropagation
[diagram: the same chain, weight → a → b → c → … → x → y → z → err, with the derivatives computed backward from err toward weight]
∂err/∂weight = (∂a/∂weight) * (∂b/∂a) * (∂c/∂b) * (∂d/∂c) * … * (∂y/∂x) * (∂z/∂y) * (∂err/∂z)
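A minimal sketch of backpropagation on a short, made-up chain of operations (a = weight * x, b = relu(a), err = (b - target)^2): remember the intermediate values on the forward pass, then multiply the local slopes together, last operation first.

x, weight, target = 1.5, 0.8, 2.0

# Forward pass: compute and keep each intermediate value.
a = weight * x
b = max(a, 0.0)                        # ReLU
err = (b - target) ** 2

# Backward pass: local slopes, multiplied from the error back to the weight.
derr_db = 2 * (b - target)             # from err = (b - target)^2
db_da = 1.0 if a > 0 else 0.0          # slope of the ReLU
da_dweight = x                         # from a = weight * x

derr_dweight = derr_db * db_da * da_dweight
print(derr_dweight)                    # ∂err/∂weight for this chain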