Fundamentals of Deep Neural Networks
Jiaul Paik
Email: [email protected]
First Things to Note
In machine learning / deep learning we almost always deal with mathematical functions: a model almost always refers to a mathematical function.
Input -> F() -> Output / Target
ML: Design Pattern
- Problem statement: know your data well. What is the input? What is your output / target?
- Define your model: choose a reasonable mathematical function; it contains unknown parameters.
- Define the objective function: error / loss / profit.
- Optimize the objective: use training data and an optimization method to estimate the parameters.
- Now you have everything for your model: freeze the model and start using it.
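The steps above can be sketched end to end on a toy problem. This is a hypothetical illustration, not part of the slides: the data, the one-parameter model f(x) = w * x, the squared-error loss, and the learning rate are all made up for the example.

```python
# Hypothetical walk-through of the design pattern:
# data -> model (unknown parameter) -> objective -> optimize -> freeze.

# 1. Know your data: toy (input, target) pairs where target ~ 2 * input.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# 2. Define the model: a one-parameter function f(x) = w * x.
def model(w, x):
    return w * x

# 3. Define the objective: mean squared error over the training data.
def loss(w):
    return sum((model(w, x) - y) ** 2 for x, y in data) / len(data)

# 4. Optimize: plain gradient descent on the single parameter w.
w = 0.0
r = 0.05  # learning rate
for _ in range(200):
    grad = sum(2 * (model(w, x) - y) * x for x, y in data) / len(data)
    w -= r * grad

# 5. Freeze the model and start using it.
prediction = model(w, 10.0)
```

After training, w has been estimated from the data and the frozen model can be applied to new inputs.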
Let us Start with a Problem
Movie Recommendation
You have two friends, John and Mary. You have prior data: for movies they rated, you know which ones you liked or disliked. Now a new movie named 'Gravity' has been released, they have rated it, and you want to decide whether you should watch it.
Know Your Data
- Sample / example
- Features / input variables
- Output / target
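A minimal sketch of how the movie data might be encoded, assuming (hypothetically) that each sample's features are John's and Mary's ratings (1 = liked, 0 = disliked) and the target is whether you liked the movie. The movie titles and values below are invented for illustration.

```python
# Hypothetical encoding of the movie-recommendation data.
# Each sample: features = [John_liked, Mary_liked], target = you_liked.
samples = [
    {"movie": "Movie A", "features": [1, 1], "target": 1},
    {"movie": "Movie B", "features": [1, 0], "target": 1},
    {"movie": "Movie C", "features": [0, 1], "target": 0},
    {"movie": "Movie D", "features": [0, 0], "target": 0},
]

feature_names = ["John liked", "Mary liked"]
n_features = len(feature_names)
```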
Logistic Regression and Decision Boundary
Preliminaries
- Logistic regression is inherently a 2-class classifier. The classes are traditionally called 0 and 1, or negative and positive.
- Logistic regression is a probabilistic classifier: for an input feature vector x, it computes the probability that x is from class 1 (positive).
Logistic Regression Model
Main assumption: the logarithm of the odds is a linear function of the features of the input x.
Mathematically (assume that x has n features x1, ..., xn, and let p = P(class 1 | x)):
    log( p / (1 - p) ) = w0 + w1 x1 + ... + wn xn
Solving the above for p, you get
    p = 1 / (1 + e^(-z)),   where z = w0 + w1 x1 + ... + wn xn
For convenience we write sigma(z) = 1 / (1 + e^(-z)) instead.
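The model formula can be written directly in code. This is a minimal sketch of the logistic (sigmoid) function and the resulting class-1 probability; the helper names `sigmoid` and `predict_proba` are my own, not from the slides.

```python
import math

def sigmoid(z):
    # Logistic function: maps any real score z to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    # P(class 1 | x) under logistic regression:
    # sigmoid of the linear score z = b + w . x  (b plays the role of w0).
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return sigmoid(z)
```

Note that sigmoid(0) = 0.5: a score of exactly zero leaves the classifier undecided.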
What kind of boundary can it model?
A linear boundary: the hyperplane z = 0 separates the two classes.
- z > 0  =>  sigma(z) > 0.5  =>  class 1
- z < 0  =>  sigma(z) < 0.5  =>  class 0
What about a non-linear boundary?
Dealing with a Non-linear Boundary
- z determines the boundary.
- In this case, z needs to be a quadratic function of the features, e.g.
    z = w0 + w1 x1 + w2 x2 + w3 x1^2 + w4 x2^2 + w5 x1 x2
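A short sketch of the idea: expand the raw 2-D input into quadratic features, so that a score that is *linear in the expanded features* traces a quadratic boundary in the original space. The circular boundary x1^2 + x2^2 = 1 and the weight values below are my own example, not from the slides.

```python
def quadratic_features(x1, x2):
    # Expand a 2-D input into quadratic terms; a linear score over these
    # can represent any quadratic (e.g. circular) decision boundary.
    return [x1, x2, x1 * x1, x2 * x2, x1 * x2]

def score(w, b, feats):
    # Linear score z = b + w . feats over the expanded features.
    return b + sum(wj * fj for wj, fj in zip(w, feats))

# Example: the circle x1^2 + x2^2 = 1 as a boundary.
# Weights pick out the two squared terms; b shifts the radius.
w = [0.0, 0.0, 1.0, 1.0, 0.0]
b = -1.0

inside = score(w, b, quadratic_features(0.2, 0.2))   # z < 0 -> class 0
outside = score(w, b, quadratic_features(2.0, 0.0))  # z > 0 -> class 1
```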
Getting Model Parameters
Estimating Model Parameters
Take one sample from the training data:
- Let x be the input sample / feature vector.
- Let y be its label / class (0 or 1).
Given the input sample x and the model parameters w:
- What is the probability that x belongs to class 1?   P(y = 1 | x; w) = sigma(z)
- What is the probability that x belongs to class 0?   P(y = 0 | x; w) = 1 - sigma(z)
Putting the two together into one expression:
    P(y | x; w) = sigma(z)^y * (1 - sigma(z))^(1 - y)
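The compact single-expression form can be checked in code. A minimal sketch, assuming the same sigma(z)^y * (1 - sigma(z))^(1 - y) formula; the function name `prob_of_label` is mine.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob_of_label(w, b, x, y):
    # P(y | x; w) in the single compact form from the slide:
    # p^y * (1 - p)^(1 - y), where p = sigmoid(b + w . x).
    # For y = 1 this reduces to p; for y = 0 it reduces to 1 - p.
    p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
    return p ** y * (1.0 - p) ** (1 - y)
```

For any fixed x and w, the probabilities of the two labels sum to 1, as they must.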
Estimating Model Parameters
Likelihood function (we have n training examples (x1, y1), ..., (xn, yn)).
Assume that the training examples are independent. Applying independence, the likelihood factorizes:
    L(w) = P(y1 | x1; w) * P(y2 | x2; w) * ... * P(yn | xn; w)
Estimating Model Parameters
We now have the likelihood function:
    L(w) = product over i of  sigma(z_i)^(y_i) * (1 - sigma(z_i))^(1 - y_i)
What do we have to do now? Maximize the likelihood function. But the product L(w) is complex to manage.
Alternative solution: maximize log L(w) instead (the logarithm is monotonic, so the maximizing w is the same).
So the final objective is the log-likelihood:
    l(w) = log L(w) = sum over i of  [ y_i log sigma(z_i) + (1 - y_i) log(1 - sigma(z_i)) ]
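The log-likelihood objective translates directly into code. A minimal sketch, assuming data is a list of (feature_vector, label) pairs; the function name `log_likelihood` is mine.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, b, data):
    # l(w) = sum over samples of y*log(p) + (1 - y)*log(1 - p),
    # where p = sigmoid(b + w . x). This is the function we maximize.
    total = 0.0
    for x, y in data:
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total
```

Each term is the log of a probability, so l(w) is always <= 0; better parameters push it closer to 0.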
Estimating Model Parameters
Apply stochastic gradient ascent (we will see it in a moment): update the parameters using one training sample at a time.
Gradient ascent update:
    wj := wj + r * (partial l / partial wj)
We just need to find partial l / partial wj for all j. Differentiating the log-likelihood, a single sample (x, y) contributes
    partial l / partial wj = (y - sigma(z)) * xj
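The whole estimation procedure can be sketched as a short training loop. This is a minimal illustration, not the slides' code: the toy 1-D dataset, the learning rate r = 0.5, the epoch count, and the function name `sgd_train` are all my own choices. The per-sample gradient (y - p) * xj (and y - p for the bias) is the standard logistic-regression gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_train(data, n_features, r=0.5, epochs=100):
    # Stochastic gradient ASCENT on the log-likelihood, one sample at a
    # time: w_j += r * (y - p) * x_j,  b += r * (y - p).
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(n_features):
                w[j] += r * (y - p) * x[j]
            b += r * (y - p)
    return w, b

# Toy linearly separable data: class 1 when the single feature is positive.
data = [([2.0], 1), ([1.0], 1), ([-1.0], 0), ([-2.0], 0)]
w, b = sgd_train(data, n_features=1)

p_pos = sigmoid(b + w[0] * 1.5)   # should be near 1
p_neg = sigmoid(b + w[0] * -1.5)  # should be near 0
```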
Optimization: Gradient Descent
Review of Differential Calculus
Consider the univariate function y = f(x) (for example, y = x^2).
dy/dx at x = a means the rate of change of y w.r.t. x at that point.
- dy/dx < 0 means the function is decreasing.
- dy/dx > 0 means the function is increasing.
[Figure: a curve with two marked points; what is the sign of dy/dx at each point?]
Review of Differential Calculus
Multivariate function: a function containing more than one independent variable.
Example: y = x^2 + z^2.
Partial differentiation: vary one variable at a time and keep the others fixed:
    partial y / partial x = 2x,   partial y / partial z = 2z
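"Vary one variable and keep the others fixed" can be demonstrated numerically with a finite-difference approximation. A minimal sketch; the helper name `partial` and the evaluation point (3, 4) are my own, and the example function y = x^2 + z^2 matches the partials 2x and 2z above.

```python
def partial(f, args, i, h=1e-6):
    # Numerically bump only variable i, keeping the others fixed,
    # to approximate the partial derivative of f at the given point.
    bumped = list(args)
    bumped[i] += h
    return (f(*bumped) - f(*args)) / h

def y(x, z):
    # Example from the slide: y = x^2 + z^2.
    return x * x + z * z

dy_dx = partial(y, (3.0, 4.0), 0)  # analytically 2 * 3 = 6
dy_dz = partial(y, (3.0, 4.0), 1)  # analytically 2 * 4 = 8
```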
Local and Global Minima
[Figure: plot of F(x) versus x showing a local minimum and the global minimum.]
Gradient Descent
Goal: minimize a function w.r.t. the parameters that the function contains, e.g. the error / objective function computed on the training data of (input, output) pairs.
Two objectives:
- Find the minimum value of the function.
- Find the values of the parameters that minimize the function.
Gradient Descent
Role of gradient descent: to find the values of the parameters w that minimize an objective function F(w).
1. If we start from the red circle, what is the direction of movement? Ans: downhill, toward the minimum (the point we want to reach).
2. How can we get that? Ans: by looking at the derivative.
3. What is the sign of dF/dw at the red circle? Ans: positive. Then in which direction should we move? Ans: the negative direction.
How to change the parameter to reach the minimum:
    w := w - r * (dF/dw)
where r > 0 is a scalar parameter called the rate constant (learning rate).
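The update rule above can be sketched on a one-parameter objective. This is my own toy example, not from the slides: F(w) = (w - 5)^2 has its minimum at w = 5, and we start where the slope is positive (the red-circle situation), so the update moves w in the negative direction.

```python
def F(w):
    # Simple convex objective with its minimum at w = 5.
    return (w - 5.0) ** 2

def dF(w):
    # Derivative of F: positive to the right of the minimum,
    # negative to the left.
    return 2.0 * (w - 5.0)

w = 12.0   # start where dF/dw > 0, like the red circle on the slide
r = 0.1    # rate constant (learning rate), r > 0

for _ in range(100):
    w = w - r * dF(w)  # step against the sign of the derivative
```

Each step shrinks the distance to the minimum by a constant factor (here 1 - 2r = 0.8), so w converges to 5.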