INT255: Mathematics behind Machine Learning
Unit 3: Dimensionality Reduction and Regression Techniques
Contents to be covered
- Principal Component Analysis
- Linear Discriminant Analysis
- Least Squares Approximation
- Minimum-Norm Solution
- Regression Analysis: Linear, Multiple, Logistic
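Principal Component Analysis (PCA)
- Aims to reduce dimensionality while keeping as much of the data's variance as possible.
- Captures the directions that explain maximum variance in the data.
- Center the data and compute its covariance matrix; the eigenvectors of this matrix are the principal components.
- Select the top k eigenvectors, corresponding to the k largest eigenvalues.
- Project the data onto the selected eigenvectors.
- PCA is useful for visualizing high-dimensional data.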
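The PCA steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name pca, the random data X, and the choice k = 2 are assumptions made for the example.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    X_centered = X - X.mean(axis=0)           # center each feature
    cov = np.cov(X_centered, rowvar=False)    # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    components = eigvecs[:, order]            # columns are the principal components
    return X_centered @ components, components, eigvals[order]

# Hypothetical data: 200 samples, 3 features with very different variances
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])

Z, components, variances = pca(X, k=2)
print(Z.shape)  # (200, 2)
```

The variance of each projected column equals the corresponding eigenvalue, which is why sorting by eigenvalue picks the most informative directions.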
Linear Discriminant Analysis (LDA)
- Reduces dimensionality while maximizing the separability of classes.
- Calculate the within-class scatter matrix (S_W).
- Calculate the between-class scatter matrix (S_B).
- Compute the eigenvalues and eigenvectors of S_W^{-1} S_B.
- Select the top k eigenvectors.
- Project the data onto the selected eigenvectors.
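The LDA steps can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the function name lda and the two synthetic Gaussian classes are invented for the example, and a pseudo-inverse stands in for the inverse of the within-class scatter matrix so the code still runs if that matrix is singular.

```python
import numpy as np

def lda(X, y, k):
    """Project X onto the k most discriminative directions."""
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((n_features, n_features))   # within-class scatter
    S_B = np.zeros((n_features, n_features))   # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        S_W += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)
    # Eigen-decomposition of S_W^{-1} S_B (pseudo-inverse used for robustness)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1][:k]  # largest eigenvalues first
    W = eigvecs[:, order].real
    return X @ W, W

# Hypothetical data: two Gaussian classes in 2-D
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(4.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

Z, W = lda(X, y, k=1)
print(Z.shape)  # (100, 1)
```

For C classes the between-class scatter matrix has rank at most C - 1, so at most C - 1 directions carry discriminative information; with two classes a single direction suffices.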
Linear Discriminant Analysis (LDA)
- LDA is a method used for both classification and dimensionality reduction.
- It finds the linear combination of features that best separates two or more classes of data.
- Geometrically, it finds a line (or plane) onto which the projected classes are best separated.
- It uses the class means and scatter matrices to compute the optimal projection.
- The projected data is then used for classification.
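Least Squares Approximation
- Gives the best-fit solution to an overdetermined system of equations Ax = b (more equations than unknowns), which generally has no exact solution.
- Minimizes the sum of squared differences (i.e., errors): ||Ax - b||^2.
- When the columns of A are linearly independent, the solution satisfies the normal equations A^T A x = A^T b.
- Typically used in regression analysis.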
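A small numerical sketch of a least squares fit, assuming a made-up system of four equations in two unknowns. It solves the normal equations A^T A x = A^T b directly and compares the result with NumPy's np.linalg.lstsq; the matrix A and vector b are illustrative values only.

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns (hypothetical values)
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0],
              [4.0, 1.0]])
b = np.array([2.1, 3.9, 6.2, 7.8])

# Solve the normal equations A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Same problem via NumPy's least squares routine
x_lstsq, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

# The error vector is orthogonal to every column of A at the solution
error = b - A @ x_normal
print(x_normal, A.T @ error)
```

In practice np.linalg.lstsq is usually the safer choice, because forming A^T A squares the condition number of the problem.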
Minimum-Norm Solution
- Applies to underdetermined systems of linear equations (more unknowns than equations).
- Infinitely many solutions are possible when the system is consistent.
- Among them, choose the solution with the smallest norm (i.e., magnitude).
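A brief sketch of the minimum-norm idea, assuming a made-up system of two equations in three unknowns whose rows are linearly independent. In that case the minimum-norm solution can be written as x = A^T (A A^T)^{-1} b, which matches the pseudo-inverse solution; the values of A, b, and null_vec are illustrative.

```python
import numpy as np

# Underdetermined system: 2 equations, 3 unknowns (hypothetical values)
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0]])
b = np.array([6.0, 14.0])

# Minimum-norm solution: x = A^T (A A^T)^{-1} b
x_min = A.T @ np.linalg.solve(A @ A.T, b)

# The pseudo-inverse gives the same answer
x_pinv = np.linalg.pinv(A) @ b

# Any other solution differs by a null-space vector and has a larger norm
null_vec = np.array([1.0, -2.0, 1.0])   # satisfies A @ null_vec = 0
x_other = x_min + null_vec

print(x_min, np.linalg.norm(x_min), np.linalg.norm(x_other))
```

Both x_min and x_other satisfy the equations exactly; only their lengths differ.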
Regression Analysis
- Models the relationship between a dependent variable (target) and one or more independent variables (predictors).
- Linear Regression: relationship between a dependent variable and one independent variable.
- Straight-line model (y = mx + c), e.g., house_price = m(size_sq_ft) + c
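The straight-line model can be fitted with the closed-form least squares estimates for the slope and intercept. The sketch below uses the house price example; the data values are invented for illustration (prices in thousands) and the variable names follow the slide.

```python
import numpy as np

# Hypothetical data: house size in square feet and price in thousands
size_sq_ft = np.array([800.0, 1000.0, 1200.0, 1500.0, 2000.0])
house_price = np.array([160.0, 195.0, 245.0, 300.0, 400.0])

x_mean = size_sq_ft.mean()
y_mean = house_price.mean()

# Slope: covariance of x and y divided by the variance of x
m = np.sum((size_sq_ft - x_mean) * (house_price - y_mean)) / np.sum((size_sq_ft - x_mean) ** 2)
# Intercept: the fitted line passes through the point of means
c = y_mean - m * x_mean

# Predict the price of a hypothetical 1800 sq ft house
predicted_price = m * 1800.0 + c
print(m, c, predicted_price)
```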
Regression Analysis
- Multiple Linear Regression: relationship between the dependent variable and two or more independent variables.
- General form: y = b0 + b1 x1 + b2 x2 + ... + bn xn
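Multiple linear regression can be sketched as a least squares problem on a design matrix that includes a column of ones for the intercept. The example below is an assumption-laden illustration: the two predictors (size and number of bedrooms), the coefficients in true_coeffs, and the noise-free targets are all made up so that the fit can be checked against known values.

```python
import numpy as np

# Hypothetical predictors: [size_sq_ft, bedrooms]
X = np.array([[800.0, 2.0],
              [1000.0, 2.0],
              [1200.0, 3.0],
              [1500.0, 3.0],
              [2000.0, 4.0],
              [2200.0, 5.0]])

# Made-up coefficients: intercept, weight for size, weight for bedrooms
true_coeffs = np.array([50.0, 0.2, 10.0])

# Design matrix: prepend a column of ones so the intercept is learned too
X_design = np.column_stack([np.ones(len(X)), X])

# Noise-free targets generated from the made-up coefficients
y = X_design @ true_coeffs

# Least squares fit recovers one coefficient per column of the design matrix
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(coeffs)
```

With real, noisy data the recovered coefficients would only approximate the underlying relationship.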
Regression Analysis
- Logistic Regression: the dependent variable is categorical, e.g., binary classification such as "yes" or "no".
- Model: p = 1 / (1 + e^-(b0 + b1 x)); p is the probability of the outcome.
Logistic Regression
- Logistic Regression is a supervised machine learning algorithm used for binary classification tasks.
- The goal is to predict the probability of an input belonging to one of two classes (e.g., 0 or 1).
- Unlike Linear Regression, which predicts continuous values, Logistic Regression predicts probabilities.
- It classifies data points by comparing the predicted probability with a threshold (e.g., 0.5).
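A minimal sketch of logistic regression trained by plain gradient descent on the average log-loss. Everything specific here is an assumption for illustration: the function names, the learning rate and iteration count, and the toy data relating hours studied to a pass (1) or fail (0) outcome.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.2, n_iters=20000):
    """Fit weights (intercept first) by gradient descent on the log-loss."""
    X_design = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(X_design.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X_design @ w)                  # predicted probabilities
        gradient = X_design.T @ (p - y) / len(y)   # gradient of the average log-loss
        w -= lr * gradient
    return w

def predict_proba(X, w):
    """Return the predicted probability of class 1 for each row of X."""
    X_design = np.column_stack([np.ones(len(X)), X])
    return sigmoid(X_design @ w)

# Hypothetical data: hours studied vs. pass (1) / fail (0)
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5],
                  [3.5], [4.0], [4.5], [5.0], [5.5]])
passed = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

w = fit_logistic(hours, passed)
probs = predict_proba(hours, w)
labels = (probs >= 0.5).astype(int)   # classify with a 0.5 threshold
print(probs.round(3), labels)
```

The threshold of 0.5 is a convention; a different threshold trades off false positives against false negatives.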