Machine learning
Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention.
Why do we need machine learning?
Healthcare, government systems, marketing and sales, e-commerce and social media, transportation, etc.
Application
How machine learning works
Types of Machine learning
Supervised learning is the type of machine learning that learns a function mapping an input to an output from example input-output pairs. It infers the function from labeled training data consisting of a set of training examples.
Example algorithms: Linear Regression, Logistic Regression, Decision Trees, Support Vector Machines (SVM)
Unsupervised learning is a type of machine learning used to draw inferences from datasets consisting of input data without labeled responses. The most common unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data.
Example techniques: K-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA)
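As a rough illustration of cluster analysis, the sketch below implements a very small K-Means on 2-D points using only the standard library. The function name kmeans, the toy points, and the choice of the first k points as initial centroids are assumptions made for this example; real projects typically rely on a library implementation.

```python
import math

def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters (minimal sketch, not optimized)."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids

# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.2, 0.8), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

No labels are supplied; the grouping comes only from distances between the points.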
Steps involved in building a machine learning model
1. Import required packages
2. Load the dataset
3. Identify/group features and target attributes
4. Visualize the data to choose an efficient algorithm
5. Split the dataset into a training set and a testing set
6. Load the model
7. Train the model by providing training data
8. Predict based on test data
9. Score the model
10. Determine the error and accuracy
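The steps above can be sketched end to end with the standard library alone. Everything concrete here is an assumption for illustration: the toy dataset, the 70/30 split, and the NearestMeanClassifier class, which stands in for whatever model a real project would load from a library.

```python
# 1. Import required packages (standard library only in this sketch).
import random

# 2. Load the dataset (hypothetical toy data: [hours_studied, passed]).
dataset = [[1, 0], [2, 0], [3, 0], [4, 0], [5, 0],
           [11, 1], [12, 1], [13, 1], [14, 1], [15, 1]]

# 3. Separate the feature (X) from the target attribute (y).
X = [row[0] for row in dataset]
y = [row[1] for row in dataset]

# 4. Visualization is skipped here; a plotting library would normally be used.

# 5. Split into a training set (70%) and a testing set (30%).
indices = list(range(len(X)))
random.Random(0).shuffle(indices)
cut = len(indices) * 7 // 10
train_idx, test_idx = indices[:cut], indices[cut:]

# 6. Load the model: a hypothetical nearest-mean classifier.
class NearestMeanClassifier:
    def fit(self, X, y):
        # Store the mean feature value of each class.
        self.means = {}
        for label in set(y):
            values = [x for x, t in zip(X, y) if t == label]
            self.means[label] = sum(values) / len(values)
        return self

    def predict(self, X):
        # Assign each sample to the class whose mean is closest.
        return [min(self.means, key=lambda c: abs(x - self.means[c])) for x in X]

model = NearestMeanClassifier()

# 7. Train the model on the training data.
model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])

# 8. Predict on the test data.
y_test = [y[i] for i in test_idx]
y_pred = model.predict([X[i] for i in test_idx])

# 9-10. Score the model and determine the error and accuracy.
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
error = 1 - accuracy
print(f"accuracy={accuracy:.2f}, error={error:.2f}")
```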
Evaluation Metrics for Regression
Evaluation metrics for classification
In the formulas below, TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.
Accuracy
Measures the percentage of correctly predicted instances.
Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision
Measures the correctness of positive predictions.
Formula: Precision = TP / (TP + FP)
Recall (Sensitivity)
Measures the model's ability to capture actual positive cases.
Formula: Recall = TP / (TP + FN)
F1-Score
The harmonic mean of precision and recall; useful for imbalanced datasets.
Formula: F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
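The four formulas above translate directly into code. The function name classification_metrics and the example counts are assumptions for illustration, and the sketch does not guard against division by zero (for example when there are no positive predictions).

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts from a binary classifier evaluated on 100 samples.
accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```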
ROC-AUC Score
Measures model performance across different classification thresholds.
ROC Curve: Plots True Positive Rate vs. False Positive Rate.
AUC Score: 1 = Perfect, 0.5 = Random Guessing.
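One way to compute the AUC without drawing the curve is to use its equivalent interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The function name roc_auc and the sample scores are assumptions for this sketch.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical true labels and predicted scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
perfect = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(auc, perfect)
```

This pairwise version is quadratic in the number of samples, so it is only meant for small illustrative inputs.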
Confusion Matrix
Summarizes model predictions compared to actual values.
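A minimal sketch of building a binary confusion matrix from actual and predicted labels follows. The layout (rows are actual classes, columns are predicted classes, class 0 first) and the sample labels are assumptions chosen for this example.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        # Row index = actual class, column index = predicted class.
        matrix[actual][predicted] += 1
    return matrix

# Hypothetical labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
matrix = confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = matrix
print(matrix)
```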
Supervised Machine Learning
Regression
Regression analysis involves one set of independent variables and one dependent variable. It is applied when we have to model the relationship between the dependent and independent variables, and it can be used whenever the dependent variable is continuous (numerical).
Linear Regression
The dependent variable is continuous in nature.
Applied when there is a linear relationship between the independent and dependent variables.
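For a single independent variable, a straight line can be fitted with ordinary least squares. The sketch below assumes that method; the function name fit_line and the sample data are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
intercept, slope = fit_line(xs, ys)
prediction = intercept + slope * 6   # predict the continuous output for x = 6
print(intercept, slope, prediction)
```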
Logistic Regression
Logistic regression is similar to linear regression but is used when the dependent variable is categorical rather than numeric (for example, a Yes/No response). It is called regression but performs classification: it fits a regression model and uses the result to assign the dependent variable to one of the classes.
Logistic regression is used to predict a binary output, for example when a credit card company builds a model to decide whether or not to issue a credit card to a customer. First, a linear model of the relationship between the variables is fitted. The threshold for classification is assumed to be 0.5.
Figure: linear regression line compared with the logistic sigmoid function.
The logistic function is applied to the output of the linear model to get the probability of the observation belonging to each class. The linear output is the log odds: the log of the ratio of the probability of the event occurring to the probability of it not occurring. In the end, the observation is assigned to the class with the higher probability.
Log odds = log(p / (1 - p))
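The relationship between the linear output, the sigmoid, and the log odds can be sketched as follows. The coefficients are hypothetical values chosen for illustration, not fitted to any data, and the feature (income in thousands) is an assumed stand-in for the credit card example.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Log of the ratio p / (1 - p); the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients of the linear part (not learned from data).
INTERCEPT = -4.0
SLOPE = 0.1   # per unit of the assumed feature, income in thousands

def predict_proba(income):
    # Linear output (the log odds) passed through the logistic function.
    return sigmoid(INTERCEPT + SLOPE * income)

def classify(income, threshold=0.5):
    # Issue the card (1) when the probability reaches the threshold.
    return 1 if predict_proba(income) >= threshold else 0

print(predict_proba(60), classify(60), classify(20))
```

A probability of 0.5 corresponds to log odds of 0, which is why the 0.5 threshold is equivalent to checking the sign of the linear output.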
K-Nearest Neighbours (K-NN)
The K-NN algorithm is one of the simplest classification algorithms. Given data points that are separated into several classes, it predicts the class of a new sample point. K-NN is a non-parametric, lazy learning algorithm: it does not build a model during training but simply stores the data and classifies new cases based on a similarity measure (e.g. distance functions). K-NN works well with a small number of input variables (p) but struggles when the number of inputs is very large.
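A minimal sketch of K-NN classification, assuming Euclidean distance as the similarity measure and a majority vote among the k nearest points. The function name knn_predict and the toy data are illustrative.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # "Training" is just keeping the data; all work happens at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labeled points forming two classes.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (5.0, 5.0), (5.5, 4.5), (6.0, 5.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.5, 1.5)))
print(knn_predict(train_points, train_labels, (5.0, 4.5)))
```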
Decision Tree Classification
A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. The tree is constructed using entropy and information gain.
Information Gain
Information gain measures the change in the entropy of the target attribute after the data is split on an independent attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
Entropy
Entropy is the degree of uncertainty or randomness in a set of elements; in other words, it is a measure of impurity. The entropy of an attribute is the sum, over the attribute's values, of the probability of each value multiplied by the information (the entropy of the target) within that value. Of all the feature attributes, the one with the highest information gain becomes the root node.
age   competition   type   profit
old   yes           s/w    down
old   no            s/w    down
old   no            h/w    down
mid   yes           s/w    down
mid   yes           h/w    down
mid   no            h/w    up
mid   no            s/w    up
new   yes           s/w    up
new   no            h/w    up
new   no            s/w    up
First, find the entropy of Age:
       Down   Up
Old    3      0
Mid    2      2
New    0      3
P(old) = 3/10
P(mid) = 4/10
P(new) = 3/10
I(target) = 1 (information of the target: 5 down, 5 up)
E(Age) = P(old)*I(old) + P(mid)*I(mid) + P(new)*I(new)
E(Age) = 3/10*0 + 4/10*1 + 3/10*0
E(Age) = 0.4
Gain(Age) = I(target) - E(Age) = 1 - 0.4 = 0.6
Gain(Competition) = 0.124
Gain(Type) = 0
Age has the highest gain, so it becomes the root node.
Resulting tree: Age = old -> down; Age = new -> up; Age = mid -> split on Competition (yes -> down, no -> up).
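The gain values above can be checked with a short script. This is a sketch using base-2 entropy on the ten rows of the example table; the function names entropy and information_gain are chosen for this illustration.

```python
import math
from collections import Counter

# The example table: (age, competition, type, profit).
rows = [
    ("old", "yes", "s/w", "down"), ("old", "no", "s/w", "down"),
    ("old", "no", "h/w", "down"), ("mid", "yes", "s/w", "down"),
    ("mid", "yes", "h/w", "down"), ("mid", "no", "h/w", "up"),
    ("mid", "no", "s/w", "up"), ("new", "yes", "s/w", "up"),
    ("new", "no", "h/w", "up"), ("new", "no", "s/w", "up"),
]
columns = {"age": 0, "competition": 1, "type": 2}

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, col):
    """Entropy of the target minus the weighted entropy after splitting on col."""
    target = [row[-1] for row in rows]
    weighted = 0.0
    for value in set(row[col] for row in rows):
        subset = [row[-1] for row in rows if row[col] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return entropy(target) - weighted

gains = {name: information_gain(rows, col) for name, col in columns.items()}
root = max(gains, key=gains.get)
print(gains, root)
```

The computed gain for competition is about 0.1245, which matches the 0.124 quoted above up to rounding.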
Decision Making
Decision-making is the anticipation of conditions occurring during the execution of a program and the actions taken according to those conditions.
if statements: An if statement consists of a Boolean expression followed by one or more statements.
if...else statements: An if statement can be followed by an optional else statement, which executes when the Boolean expression is FALSE.
nested if statements: You can use one if or elif statement inside another if or elif statement(s).
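A small Python sketch of the three forms described above. The function name describe_temperature and the thresholds are hypothetical.

```python
def describe_temperature(celsius):
    # if statement: runs its body only when the condition is True.
    if celsius < 0:
        return "freezing"
    # elif adds a further condition, tested only if the previous one was False.
    elif celsius < 20:
        return "cool"
    # else runs when none of the conditions above were True.
    else:
        # nested if: a second decision inside the else branch.
        if celsius < 30:
            return "warm"
        else:
            return "hot"

print(describe_temperature(-5), describe_temperature(25))
```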
Loops
A loop statement allows us to execute a statement or group of statements multiple times.
while loop: Repeats a statement or group of statements while a given condition is TRUE. It tests the condition before executing the loop body.
for loop: Executes a sequence of statements multiple times and abbreviates the code that manages the loop variable.
nested loops: You can use one or more loops inside another while or for loop.
The infinite loop: A loop becomes an infinite loop if its condition never becomes FALSE.
The range() function: The built-in function range() is the usual way to iterate over a sequence of numbers. It produces the numbers of an arithmetic progression.
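The loop forms above, sketched in Python with illustrative variable names:

```python
# while loop: the condition is tested before each pass through the body.
count = 0
total = 0
while count < 5:
    total += count
    count += 1   # without this update the condition would never become False

# for loop with range(): iterate over the numbers 1 to 5.
squares = []
for n in range(1, 6):
    squares.append(n * n)

# range(start, stop, step) yields an arithmetic progression.
evens = list(range(0, 10, 2))

# nested loops: the inner loop runs fully for each pass of the outer loop.
pairs = []
for i in range(2):
    for j in range(3):
        pairs.append((i, j))

print(total, squares, evens, len(pairs))
```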
Loop Control Statements
break statement: Terminates the loop statement and transfers execution to the statement immediately following the loop.
continue statement: Causes the loop to skip the remainder of its body and immediately retest its condition prior to reiterating.
pass statement: The pass statement in Python is used when a statement is required syntactically but you do not want any command or code to execute.
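A short sketch of the three statements; the data and names are illustrative.

```python
# break: stop the loop as soon as the first even number is found.
first_even = None
for n in [3, 7, 10, 13]:
    if n % 2 == 0:
        first_even = n
        break

# continue: skip even numbers and keep only the odd ones.
odds = []
for n in range(6):
    if n % 2 == 0:
        continue
    odds.append(n)

# pass: a placeholder where a statement is required but nothing should happen.
def not_implemented_yet():
    pass

print(first_even, odds)
```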
Functions
A function is a block of organized, reusable code that is used to perform a single, related action. Functions provide better modularity for your application and a high degree of code reuse.
Defining a Function
Calling a Function
Function Arguments
Required arguments
Keyword arguments
Default arguments
Variable-length arguments
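The sketch below defines and calls functions using each of the four argument kinds listed above. All function names and values are hypothetical.

```python
# Defining a function with required arguments.
def describe(name, age):
    return f"{name} is {age}"

# Default argument: greeting is optional and falls back to "Hello".
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

# Variable-length arguments: *numbers collects any number of positional values.
def add_all(*numbers):
    return sum(numbers)

# Calling a function with required (positional) arguments.
by_position = describe("Ana", 30)

# Keyword arguments: named in the call, so their order does not matter.
by_keyword = describe(age=30, name="Ana")

default_greeting = greet("Ana")
custom_greeting = greet("Ana", greeting="Welcome")
total_sum = add_all(1, 2, 3, 4)

print(by_position, default_greeting, custom_greeting, total_sum)
```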
File handling
Limitations of Traditional Machine Learning
Feature engineering is manual
Scalability issues
Poor performance on unstructured data
Inability to handle high-dimensional data
Lack of hierarchical representations
What are Deep Neural Networks (DNNs)?
A type of Artificial Neural Network (ANN) with multiple hidden layers.
Designed to automatically learn features and representations from data.
Used in fields like computer vision, speech recognition, and natural language processing (NLP).