REVIEW
- Perceptron
- Sigmoid neurons (Logistic Regression)
- What is a Neural Network anyway?
PERCEPTRON
CLASSIFY BY DRAWING A "LINE" (OR HYPERPLANE)
- This can only be done when the data is linearly separable (why though?)
- Idea: some "features" are more informative, and should be used more for classification => feature selection problems (a hard one LOL)
- Better question: how to choose that line? => Choose a good "bias and coefficients" (see the sketch right below)
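As a concrete illustration (not in the original slides): a minimal sketch of the perceptron decision rule, assuming someone has already handed us a weight vector `w` and bias `b`. The "line" is the set of points where w·x + b = 0, and classification just asks which side a point falls on.

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Classify x by which side of the hyperplane w.x + b = 0 it falls on."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical 2-D example: the "line" is x1 + x2 - 1 = 0.
w, b = np.array([1.0, 1.0]), -1.0
print(perceptron_predict(np.array([2.0, 2.0]), w, b))    # 1  (above the line)
print(perceptron_predict(np.array([-1.0, -1.0]), w, b))  # -1 (below the line)
```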
THE LOSS FUNCTION OF THE PERCEPTRON
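The formula on this slide was presumably an image and didn't survive extraction. A standard way to write the perceptron loss, summing over the set M of currently misclassified points with labels y_i in {-1, +1}, is:

$$
L(\mathbf{w}, b) = \sum_{x_i \in \mathcal{M}} -\,y_i \left( \mathbf{w}^\top x_i + b \right)
$$

Each misclassified point contributes a positive amount proportional to how far it sits on the wrong side of the hyperplane; correctly classified points contribute nothing, which is why the classic update rule only touches misclassified points.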
HOW TO DRAW IT AS A "NEURAL" GRAPH? (DON'T ASK WHY)
SIGMOID NEURON
WHAT IS THE NOVELTY HERE?
In reality, when we learn something, a small change in what we learned should result in only a small change in what we show. The perceptron's hard threshold breaks this: a tiny change in the weights can flip the output completely, while the sigmoid's output moves smoothly.
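A quick numerical check of that claim (my own sketch, not from the slides): nudge one weight slightly and compare a step-activated perceptron against a sigmoid neuron.

```python
import numpy as np

def step(z):
    """Hard threshold: the perceptron's activation."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Smooth S-curve: the sigmoid neuron's activation."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 1.0])
w = np.array([0.5, -0.5])             # w.x = 0: exactly on the boundary
w_nudged = w - np.array([0.01, 0.0])  # a tiny change to one weight

# The step output flips from 1 to 0, while the sigmoid barely moves:
print(step(w @ x), step(w_nudged @ x))        # 1.0 0.0
print(sigmoid(w @ x), sigmoid(w_nudged @ x))  # 0.5 ~0.4975
```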
MATHEMATICAL MODEL FOR THE HEURISTIC => LOGISTIC REGRESSION!
The result of logistic regression is a probability that an object is classified into a "group".
Loss function: (see below)
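The loss itself was likely an image on the slide; the standard cross-entropy (log) loss for logistic regression, with labels y_i in {0, 1} and predicted probabilities ŷ_i = σ(wᵀx_i + b), is:

$$
L(\mathbf{w}, b) = -\sum_{i=1}^{N} \Big[ y_i \log \hat{y}_i + (1 - y_i) \log\!\big(1 - \hat{y}_i\big) \Big]
$$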
DISCUSSION: WHY NOT MEAN-SQUARED ERROR?
For some intuition: when we use the output as a probability, we want to punish confidently wrong behaviour hard. The mean-squared loss is smooth but bounded (at most 1 per example), and, "expectedly", it doesn't punish in-the-middle or confidently wrong decisions as much as the cross-entropy loss, which blows up as the prediction approaches the wrong extreme (see the numbers below).
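A small numeric check of that intuition (my own sketch): compare the two losses on a single example with true label y = 1 as the predicted probability p gets confidently wrong. MSE tops out near 1, while cross-entropy keeps growing.

```python
import math

def mse(y, p):
    """Squared error on one example."""
    return (y - p) ** 2

def cross_entropy(y, p):
    """Log loss on one example (y in {0, 1}, p the predicted probability)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

for p in [0.9, 0.5, 0.1, 0.01]:
    print(f"p={p:<5} MSE={mse(1, p):.3f}  CE={cross_entropy(1, p):.3f}")
# p=0.9   MSE=0.010  CE=0.105
# p=0.5   MSE=0.250  CE=0.693
# p=0.1   MSE=0.810  CE=2.303
# p=0.01  MSE=0.980  CE=4.605
```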
MULTILAYER PERCEPTRON & WHAT IS A NEURAL NETWORK?
MULTICLASS CLASSIFICATION IS MY NEW FRIEND (or nightmare)
Idea: we try to classify an object as belonging or not belonging to a group => this partially solves the multiclass problem. Use multiple binary classifications => multiple Sigmoid neurons/Perceptrons => Neural Networks (!!!) (a one-vs-rest sketch follows below)
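To make "multiple binary classifications" concrete (an illustrative sketch with made-up weights, not the slides' own example): run one sigmoid neuron per class, then pick the class whose neuron is most confident.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_vs_rest_predict(x, W, b):
    """One sigmoid neuron per class: each row of W asks 'is x in my group?'.
    The winning class is the one whose neuron is most confident."""
    scores = sigmoid(W @ x + b)   # one "probability" per class
    return int(np.argmax(scores)), scores

# Hypothetical setup: 3 classes, 2 features, made-up weights.
W = np.array([[ 2.0, -1.0],
              [-1.0,  2.0],
              [ 0.5,  0.5]])
b = np.zeros(3)
cls, scores = one_vs_rest_predict(np.array([1.0, 0.0]), W, b)
print(cls, scores.round(3))  # 0 [0.881 0.269 0.622]
```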
HOW WE DEFINE A SOLUTION OF A "MULTICLASS" PROBLEM
- One-hot coding: calculate the "score" that an object belongs to each group => an array of "scores", one per class
- Binary coding: like MUX selection, define the choice by a binary number (both encodings are sketched below)
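A tiny sketch of the two target encodings for, say, 4 classes (my own illustration):

```python
import numpy as np

def one_hot(label, num_classes):
    """One output unit per class; exactly one of them is 'on'."""
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

def binary_code(label, num_bits):
    """MUX-style: encode the class index as a binary number,
    so 4 classes need only 2 output units instead of 4."""
    return np.array([(label >> i) & 1 for i in range(num_bits)][::-1], dtype=float)

for label in range(4):
    print(label, one_hot(label, 4), binary_code(label, 2))
# 0 [1. 0. 0. 0.] [0. 0.]
# 1 [0. 1. 0. 0.] [0. 1.]
# 2 [0. 0. 1. 0.] [1. 0.]
# 3 [0. 0. 0. 1.] [1. 1.]
```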
HOW WE DEFINE A SOLUTION OF A "MULTICLASS" PROBLEM
- Hierarchical: group "identical" or "related" desired outputs into fewer (and larger) groups, then run binary classification on them
WHAT IS A NEURAL NETWORK NOW?
Inspired by how the brain works, we create a "web" (or, a graph) to represent the connections between different units, compute the importance of each feature, then use it to guide our decision => It seems like the more layers one has, the more complicated the problems one can solve
MATHEMATICAL PREPARATION
- Layers: "steps" to reach the output; an array of numbers that helps guide the next decision
- Activation function: like sigmoid, linear, etc.
- Units: a node in a layer
MATHEMATICAL PREPARATION
The idea here: $a_i^{(l)}$ is the output of unit $i$ in layer $l$. So when we consider a weight matrix $W^{(l)}$, we are basically saying that we choose a linear transformation for each layer, followed by the activation:

$$
a^{(l)} = \sigma\!\left( W^{(l)} a^{(l-1)} + b^{(l)} \right)
$$
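A minimal forward pass under those definitions (an illustrative sketch with made-up sizes: 2 inputs, one hidden layer of 3 units, 1 output):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """a(l) = sigmoid(W(l) @ a(l-1) + b(l)), applied layer by layer."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]  # 2 -> 3 -> 1
biases = [np.zeros(3), np.zeros(1)]
print(forward(np.array([1.0, -1.0]), weights, biases))  # a single "score" in (0, 1)
```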
CONCLUSION
Neural network design is just putting a bunch of hidden layers in between, then trying to find good coefficients.
Problems: gradient descent in this case is costly (especially for a badly designed cost function); the learning rate can also be a problem.
Ingredients that help: usage of ReLU or Softmax (sketched below), and the chain rule (which gives backpropagation).
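For completeness, the two activations named in the conclusion, as plain NumPy (standard definitions, not from the slides):

```python
import numpy as np

def relu(z):
    """max(0, z) element-wise: cheap, and its gradient doesn't saturate for z > 0."""
    return np.maximum(0.0, z)

def softmax(z):
    """Turn a vector of scores into a probability distribution over classes.
    Subtracting max(z) first is the usual trick for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5])
print(relu(z))             # [2.  0.  0.5]
print(softmax(z).round(3)) # [0.786 0.039 0.175]
```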
SADLY, NO CODE, IT’S ALL TALK Sorry :((
THANK YOU FOR LISTENING!
Resources for the presentation:
- Machine Learning cơ bản
- Michael Nielsen, Neural Networks and Deep Learning
- Charu C. Aggarwal, Neural Networks and Deep Learning: A Textbook