LEARNING OBJECTIVES
- Analyze various problem statements and accurately categorize them as either classification or regression tasks.
- Develop critical thinking skills by evaluating the implications of categorizing a problem correctly or incorrectly.
- Apply an understanding of classification and regression to real-world scenarios.
Traditional Programming Vs Machine Learning
Traditional Program: Input → Processing (based on rules) → Output
The logic is defined explicitly by the programmer, and the outcome is deterministic.

Machine Learning Program: Input Data → Training → Model → Prediction
The model learns its logic from training data and can generalize to new, unseen data. Its predictions are probabilistic: they are estimates that can be wrong, not guaranteed results.
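To make the contrast concrete, here is a minimal Python sketch. The spam rule, the "number of links" feature, and the tiny dataset are all hypothetical and chosen only for illustration; real systems use many features and far more data.

```python
# Traditional program: the rule (threshold of 3 links) is written by hand.
def is_spam_rule(num_links):
    return num_links > 3  # hypothetical hand-picked threshold


# Machine learning program: the threshold is learned from labeled examples.
def train_threshold(examples):
    """Return the threshold that classifies the most training examples correctly.

    examples: list of (num_links, is_spam) pairs.
    """
    candidates = sorted({x for x, _ in examples})
    best_t, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum((x > t) == label for x, label in examples)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


# Hypothetical training data: (number of links, is it spam?)
training_data = [(0, False), (1, False), (2, False),
                 (5, True), (7, True), (9, True)]

learned_threshold = train_threshold(training_data)


def is_spam_learned(num_links):
    return num_links > learned_threshold


print(learned_threshold)   # threshold chosen from the data
print(is_spam_learned(6))  # prediction for a new, unseen email
```

In the first function the programmer supplies the rule. In the second, the programmer supplies only examples and a training procedure, and the rule comes out of the data.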
Why Is Machine Learning Important?
1. Identify Use Cases
Business Problem: Understand the specific problem you are addressing (e.g., customer segmentation, predictive maintenance).
1.1. Define the Problem
Type of Problem: Determine whether it is a classification, regression, clustering, or other type of problem.
Business Objectives: Understand what you want to achieve and how success will be measured.
Define Project Requirements
Objective: Clearly state what you want to achieve (e.g., classification, regression, clustering).
Data Availability: Assess the type and volume of data available.
Performance Metrics: Decide how you will measure success (e.g., accuracy, precision, recall).
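The three example metrics named above can be computed by hand for a binary classifier. The sketch below assumes labels are encoded as 1 (positive) and 0 (negative); the label lists are made-up values for illustration only.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when there are no predicted/actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical actual labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Accuracy is the share of all predictions that are correct, precision is the share of predicted positives that are truly positive, and recall is the share of actual positives the model found. These metrics apply to classification; regression tasks are usually scored with error measures instead.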
TYPES OF LEARNING
Supervised Learning is a type of machine learning where an algorithm is trained on a labeled dataset. The goal of supervised learning is to learn a mapping from inputs to outputs so that the model can accurately predict the output for new, unseen data.

Key Characteristics:
Labeled Data: The training data consists of input-output pairs, where the output (or label) is known.
Training Process: The model learns by adjusting its parameters to minimize the difference between its predictions and the actual labels.
Types of Problems: Supervised learning can be used for:
- Classification: Predicting discrete categories (e.g., spam detection).
- Regression: Predicting continuous values (e.g., house prices).
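The two problem types can be sketched side by side in plain Python. This is an illustration under stated assumptions, not a recommended implementation: the classifier is a simple 1-nearest-neighbor lookup, the regressor is a least-squares line fit, and both datasets are invented.

```python
def nearest_neighbor_classify(train, x):
    """Classification: return the label of the training example closest to x.

    train: list of (feature, label) pairs.
    """
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Classification: hypothetical (number of links, label) pairs.
emails = [(0, "not spam"), (1, "not spam"), (6, "spam"), (8, "spam")]
predicted_label = nearest_neighbor_classify(emails, 5)  # a discrete category

# Regression: hypothetical house sizes (in 1000 sq ft) and prices (in $1000s).
sizes = [1, 2, 3, 4]
prices = [100, 150, 200, 250]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 5 + intercept  # a continuous value

print(predicted_label)
print(predicted_price)
```

Both models learn from input-output pairs. The difference is the kind of output: the classifier returns one of a fixed set of labels, while the regressor returns a number on a continuous scale.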
Unsupervised Learning is a type of machine learning where the algorithm is trained on a dataset that does not have labeled outputs. In this approach, the model attempts to identify patterns, structures, or relationships in the data without any guidance on what the output should be.

Key Characteristics:
Unlabeled Data: The training data consists only of input features, with no corresponding output labels.
Pattern Discovery: The model seeks to learn the underlying structure or distribution of the data, grouping similar data points together or identifying anomalies.
Types of Problems: Unsupervised learning can be used for:
- Clustering: Grouping similar data points (e.g., customer segmentation).
- Dimensionality Reduction: Reducing the number of features while retaining important information (e.g., PCA).
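Clustering can be illustrated with a small k-means sketch on one-dimensional data. The spending values and the starting centers are hypothetical, and the function is a simplified version of k-means written for readability rather than production use.

```python
def kmeans_1d(values, centers, iterations=10):
    """Simplified k-means on 1D data.

    values: list of numbers (no labels are provided).
    centers: initial guesses for the cluster centers.
    Returns the final centers and the clusters from the last assignment step.
    """
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spending of six customers (no labels).
spending = [10, 12, 11, 50, 52, 51]

final_centers, final_clusters = kmeans_1d(spending, centers=[10, 50])
print(final_centers)
print(final_clusters)
```

No customer was labeled "low spender" or "high spender" in advance. The two groups emerge from the data alone, which is what distinguishes this from the supervised case.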
Activity 1
Categorize each problem statement as a classification or regression task.

Problem Statements:
1. Predicting whether an email is spam or not.
2. Estimating the price of a used car based on its features.
3. Classifying customer reviews as positive, negative, or neutral.
4. Predicting the temperature for tomorrow.
5. Identifying whether an image contains a cat or a dog.
6. Forecasting sales numbers for the next quarter.
7. Determining if a loan application is approved or denied.
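Problem Statement
You have the following dataset consisting of three points in a 2D space (each point has two features: x and y):
Point A: (2, 3)
Point B: (4, 5)
Point C: (6, 7)

Task: Calculate the centroid of these three points.

Steps to Solve the Problem
1. List the Points: Identify the coordinates of each point: A = (2, 3), B = (4, 5), C = (6, 7).
2. Calculate the Centroid: Average the x-coordinates and the y-coordinates separately.
   Centroid x = (2 + 4 + 6) / 3 = 12 / 3 = 4
   Centroid y = (3 + 5 + 7) / 3 = 15 / 3 = 5
3. Compute the Final Result: The centroid is (4, 5), which here coincides with Point B.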
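The centroid calculation can be checked with a short Python sketch. The function name `centroid` and the dictionary layout are illustrative choices, not part of the original exercise.

```python
def centroid(points):
    """Return the centroid of 2D points: the mean x and the mean y."""
    points = list(points)
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    return (mean_x, mean_y)


# The three points from the problem statement.
dataset = {"A": (2, 3), "B": (4, 5), "C": (6, 7)}

result = centroid(dataset.values())
print(result)  # (4.0, 5.0)
```

The same averaging step is what a clustering algorithm such as k-means uses to update each cluster center.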
SUMMARY
Classification deals with categorical outputs.
Regression deals with continuous outputs.