Introduction to Machine Learning - Basic Concepts, Models and Description
ShitalBedse4
67 slides
Aug 29, 2025
About This Presentation
This PDF provides a comprehensive introduction to Machine Learning (ML), covering its definitions, real-world applications, and comparison with traditional programming, AI, and Data Science. It explains different learning paradigms including supervised, unsupervised, semi-supervised, reinforcement, and self-supervised learning. The document also explores models of ML such as geometric, probabilistic, logical, grouping, grading, parametric, and non-parametric models. Additionally, it covers feature transformation techniques like PCA and LDA for dimensionality reduction. Illustrated examples like email spam detection, chess learning, recommendation systems, and healthcare applications make the concepts more practical and relatable. This serves as an excellent resource for students, researchers, and beginners in the field of AI and ML.
Size: 5.1 MB
Language: en
Slide Content
Introduction to Machine Learning
Prof. Shital R. Bedse
What is Machine Learning?
Arthur Samuel, an American leader in the field of computer gaming and Artificial Intelligence, coined the term “Machine Learning” in 1959 at IBM.
Definition 1 (by Arthur Samuel, 1959):
“Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.”
What is Machine Learning?
Definition 2 (by Tom Mitchell, 1997):
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Example 1: Email Spam Detection
● Task (T): Classifying emails as “spam” or “not spam”.
● Performance Measure (P): Accuracy (percentage of correctly classified emails).
● Experience (E): Training on a dataset of labeled emails (i.e., emails already marked as spam or not spam).
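To make T, P and E concrete, here is a minimal Python sketch. The emails and the word-count scoring rule are hypothetical illustrations (not a real spam filter), and the helper names are illustrative: the labeled training emails play the role of experience E, classify() performs task T, and accuracy() computes performance measure P on emails held out from training.

```python
from collections import Counter

# Experience E: a tiny, hypothetical set of labeled emails (1 = spam, 0 = not spam).
train_emails = [
    ("win a free prize now", 1),
    ("claim your free money", 1),
    ("meeting agenda for monday", 0),
    ("project report attached", 0),
]

def learn_word_counts(emails):
    """Count how often each word appears in spam and in non-spam emails."""
    spam, ham = Counter(), Counter()
    for text, label in emails:
        (spam if label == 1 else ham).update(text.split())
    return spam, ham

def classify(text, spam, ham):
    """Task T: call an email spam if its words were seen more often in spam."""
    words = text.split()
    spam_score = sum(spam[w] for w in words)
    ham_score = sum(ham[w] for w in words)
    return 1 if spam_score > ham_score else 0

def accuracy(predictions, labels):
    """Performance measure P: fraction of correctly classified emails."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

spam_counts, ham_counts = learn_word_counts(train_emails)
# Hypothetical held-out emails used only to measure P.
test_emails = [
    ("free prize inside", 1),
    ("monday project meeting", 0),
    ("report your money", 1),
    ("free lunch at the meeting", 0),
]
preds = [classify(text, spam_counts, ham_counts) for text, _ in test_emails]
labels = [label for _, label in test_emails]
print(preds, accuracy(preds, labels))  # [1, 0, 1, 1] 0.75
```

The last test email is misclassified because the word "free" was only ever seen in spam; measuring P on emails held out from training is what exposes such errors.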
What is Machine Learning?
Example 2: Chess Learning Program
● Task (T): Playing chess against human or computer opponents.
● Performance Measure (P): Percentage of games won against opponents.
● Experience (E): Playing thousands of games against other players or against itself (self-play), or studying past professional games.
Simplified Definition:
Machine Learning is a subset of AI that enables a system to learn
patterns from data and make decisions or predictions without being
explicitly told what to do.
Applications of Machine Learning
Recommendation engines are ubiquitous in our daily digital lives. They are algorithms that suggest
products, content, or services to users based on their past behavior and preferences, or the behavior of
similar users.
● Example: Platforms like Netflix and Amazon use sophisticated recommendation systems to suggest movies, TV shows, and products to their users.
● Impact: A staggering 80% of content viewed on Netflix comes through its recommendation engine, highlighting its critical role in user engagement.
● Method: These engines often employ collaborative filtering (suggesting items based on what similar users liked) and content-based filtering (suggesting items similar to those a user previously enjoyed).
● Benefit: For businesses, recommendations drive increased user engagement and significant revenue growth; an often-cited estimate attributes about 35% of Amazon's sales to its recommendation system.
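The collaborative-filtering idea (predict what a user will like from users with similar tastes) can be sketched in Python as follows. The users, movies and ratings are hypothetical, the helper names are illustrative, and user-based cosine similarity with a similarity-weighted average is just one simple variant, not the method of any particular platform.

```python
import math

# Hypothetical ratings (1-5 stars); a missing entry means "not rated yet".
ratings = {
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bob":   {"Movie A": 4, "Movie B": 5, "Movie D": 4, "Movie E": 1},
    "carol": {"Movie A": 1, "Movie C": 5, "Movie D": 2, "Movie E": 5},
}

def cosine_similarity(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(user, ratings):
    """Predict ratings for items the user has not seen as a similarity-weighted
    average of other users' ratings, and return them best first."""
    weighted_sum, total_weight = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item in ratings[user]:
                continue  # only score items the user has not rated
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            total_weight[item] = total_weight.get(item, 0.0) + sim
    predicted = {
        item: weighted_sum[item] / total_weight[item]
        for item in weighted_sum
        if total_weight[item] > 0
    }
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)

recs = recommend("alice", ratings)
print(recs)
```

Alice's ratings match Bob's more closely than Carol's, so Bob's high rating for Movie D outweighs Carol's low one and Movie D is ranked first.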
Area | Application Example
Healthcare | Disease prediction, medical image diagnosis (e.g., cancer detection in X-rays)
Finance | Credit scoring, fraud detection, stock market prediction
Retail | Recommendation systems (Amazon, Flipkart), demand forecasting
Transportation | Self-driving cars (Tesla), route optimization (Google Maps)
Social Media | Friend suggestions (Facebook), content moderation
Agriculture | Crop disease detection, yield prediction
Education | Personalized learning platforms, student performance prediction
Entertainment | Movie/music recommendation (Netflix, Spotify)
Learning Paradigms: Learning Tasks - Descriptive and Predictive Tasks
Descriptive and Predictive Tasks
Descriptive and predictive tasks represent two fundamental categories of learning tasks.
Descriptive tasks focus on summarizing and understanding past data to identify patterns and insights. They typically use visualizations such as bar graphs, pie charts, and line graphs, combine them into dashboards with tools such as Power BI or Tableau, and then generate reports.
E.g., performance of L'Oréal shampoo in 2020.
Predictive tasks aim to forecast future outcomes based on historical data. E.g., sales of ABC shampoo in 2026.
Prescriptive analytics: What actions should be taken to achieve the predicted result? (Predictive + Prescriptive)
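A minimal Python sketch of the difference, using hypothetical yearly sales figures: the descriptive part only summarizes the past, while the predictive part fits a least-squares straight-line trend (an assumed, deliberately simple model) and extrapolates it to 2026.

```python
# Hypothetical yearly unit sales of a product (made-up numbers).
sales = {2020: 120, 2021: 135, 2022: 150, 2023: 160, 2024: 175}

# Descriptive task: summarize what has already happened.
total = sum(sales.values())
mean = total / len(sales)
best_year = max(sales, key=sales.get)
print(f"total={total}, mean={mean:.1f}, best year={best_year}")

# Predictive task: fit a least-squares straight-line trend and extrapolate it.
years, values = list(sales), list(sales.values())
x_bar = sum(years) / len(years)
y_bar = sum(values) / len(values)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values)) / sum(
    (x - x_bar) ** 2 for x in years
)
forecast_2026 = y_bar + slope * (2026 - x_bar)
print(f"trend: {slope:+.1f} units/year, forecast for 2026: {forecast_2026:.1f}")
```

This prints a total of 740 units, a mean of 148.0 and a forecast of 202.0 units for 2026. A real forecast would use more history and account for effects such as seasonality.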
Supervised, Unsupervised, Semi-supervised and Reinforcement Learning
Reinforcement Learning
Other Real-World Examples of RL:
● Self-driving Cars: Learning to drive safely through traffic signals and roads.
● Stock Trading Bots: Buying/selling stocks based on market conditions.
● Robotics: Robot arm learning to pick up objects correctly.
● Chatbots: Learning to reply to users in a helpful way.
Semi-Supervised Learning
Models of Machine Learning: Geometric Models, Probabilistic Models, Logical Models, Grouping and Grading Models, Parametric and Non-Parametric Models
Geometric Model
These models treat data points as vectors in a multidimensional space and attempt to
solve classification/regression by geometric constructs like lines, planes or hyperplanes.
Key Models:
1. Linear Regression:
○ Fits a straight line (a hyperplane when there are several features) through the data points.
○ Equation: $y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
○ Objective: Minimize the error (Mean Squared Error).
2. Logistic Regression:
○ Classification using a sigmoid function.
○ Decision boundary is a linear separator (hyperplane).
3. Support Vector Machine (SVM):
○ Finds the best separating hyperplane with the maximum margin between classes.
○ Uses the kernel trick for non-linear separation.
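A minimal NumPy sketch of the first two key models, on hypothetical data (helper names are illustrative). For linear regression it uses np.linalg.lstsq, which solves the least-squares (minimum MSE) problem directly rather than iteratively; for logistic regression the weights are hypothetical rather than fitted, to show how the sigmoid turns the same linear score into a class decision whose boundary is a hyperplane. SVM is not shown.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]

def add_bias(X):
    """Prepend a column of ones so that w0 plays the role of the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

# Linear regression: least squares finds the w that minimizes the MSE.
w, *_ = np.linalg.lstsq(add_bias(X), y, rcond=None)
mse = np.mean((add_bias(X) @ w - y) ** 2)
print("w0, w1, w2 =", np.round(w, 3), "MSE =", round(float(mse), 6))

# Logistic regression: same linear score z, squashed by the sigmoid into a
# probability; sigmoid(z) >= 0.5 exactly when z >= 0, so the boundary is the
# hyperplane z = 0.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(X, w):
    return (sigmoid(add_bias(X) @ w) >= 0.5).astype(int)

w_logistic = np.array([-1.0, 1.0, 1.0])  # hypothetical weights: boundary x1 + x2 = 1
print(predict_class(np.array([[0.0, 0.0], [2.0, 2.0]]), w_logistic))  # [0 1]
```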
Probabilistic Model
These models assume the data is generated from a known probability distribution. They learn by
estimating parameters of that distribution.
❖ Advantages:
➢ Can handle uncertainty and missing data.
➢ Easy to update with new data (Bayesian updating).
❖ Limitations:
➢ Requires assumptions about the data distribution.
➢ Sensitive to incorrect assumptions (e.g., independence in Naive Bayes).
❖ Applications:
➢ Text classification
➢ Speech recognition
➢ Anomaly detection
Probabilistic Model
Key Models:
1. Naive Bayes Classifier:
○ Based on Bayes' Theorem.
○ Assumes the features are conditionally independent given the class.
○ Equation: $P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$
2. Hidden Markov Models (HMM):
○ Used for sequential data (like speech/text).
○ Contains hidden states and observable outputs.
○ Used in POS tagging and speech recognition.
3. Gaussian Mixture Models (GMM):
○ Assumes the data comes from a mixture of several Gaussian distributions.
○ Common in unsupervised learning (like clustering).
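A minimal Python sketch of a Naive Bayes text classifier, one of the applications listed for probabilistic models. It assumes the multinomial variant with add-one (Laplace) smoothing, which the slides do not specify, and works with log-probabilities to avoid numeric underflow. The training sentences and helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """Collect class counts (for P(y)) and per-class word counts (for P(x_i | y))."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    """Return the class y maximizing log P(y) + sum_i log P(x_i | y).

    P(x) is the same for every class, so it is dropped from Bayes' theorem.
    Add-one (Laplace) smoothing avoids zero probabilities for unseen words.
    """
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled training sentences.
docs = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("lunch with team today", "ham"),
    ("team meeting today", "ham"),
]
model = train_naive_bayes(docs)
print(predict("buy cheap pills", *model), predict("team lunch", *model))  # spam ham
```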
Logical Model
Logical models use a logical expression to divide the instance space into segments and hence construct grouping models. A logical expression is an expression that returns a Boolean value, i.e., a True or False outcome. Once the data is grouped using a logical expression, the data is divided into homogeneous groupings for the problem we are trying to solve.
For example, for a classification problem, all the instances in the group belong to one class.
Logical Model
These are rule-based models that make decisions using logical operations or decision paths.
1. Decision Trees:
○ Hierarchical model with internal nodes (tests), branches (outcomes) & leaves (results).
○ Uses metrics like Gini Index or Entropy for splitting.
2. Rule-Based Systems:
○ Set of IF-THEN rules.
○ e.g., IF “age < 18” THEN “not eligible”.
3. Inductive Logic Programming (ILP):
○ Combines machine learning with logic programming.
○ Learns rules from relational data.
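The splitting metrics and the IF-THEN rule mentioned above can be sketched in Python (helper names are illustrative). gini() and entropy() compute the impurity of a node from its class labels, weighted_gini() scores a candidate split the way a decision tree compares splits, and eligibility() encodes the slide's age rule; its ELSE branch ("eligible") is an assumption, since the slide only states the IF part.

```python
import math
from collections import Counter

def gini(labels):
    """Gini index 1 - sum_k p_k^2; 0 for a pure node, higher for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy -sum_k p_k log2(p_k); 0 for a pure node, 1 bit for a 50/50 split."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a candidate split; a tree prefers the split with the lowest value."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def eligibility(age):
    """The slide's IF-THEN rule; the ELSE branch is an assumed default."""
    return "not eligible" if age < 18 else "eligible"

parent = ["yes", "yes", "no", "no"]
print(gini(parent), entropy(parent))                 # 0.5 1.0
print(weighted_gini(["yes", "yes"], ["no", "no"]))   # 0.0 (perfect split)
print(weighted_gini(["yes", "no"], ["yes", "no"]))   # 0.5 (useless split)
print(eligibility(16), eligibility(30))              # not eligible eligible
```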
Logical Model
❖ Advantages:
➢ High interpretability.
➢ Easy to visualize and explain.
❖ Limitations:
➢ Can overfit training data (especially decision trees).
➢ May become complex with large rule sets.
❖ Applications:
➢ Rule-based expert systems
➢ Medical decision support
➢ Fraud detection
Grouping and Gradient Models
Grouping refers to clustering or unsupervised learning techniques where the model learns to group data points based on similarity or shared characteristics, without using labeled outcomes.
Key Concepts:
● No labeled data (unsupervised)
● Based on distance (Euclidean, cosine, etc.)
● Often used for exploratory data analysis
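A minimal NumPy sketch of grouping by distance, using k-means, one common clustering algorithm (and one of the grouping examples named in this deck). The points, k = 2, the random seed and the helper name k_means are hypothetical choices; note that k has to be chosen by hand.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Plain k-means: repeatedly assign each point to its nearest centroid
    (Euclidean distance), then move each centroid to the mean of its points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k distinct starting points
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep a centroid in place if no point was assigned to it.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D points forming two well-separated groups.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
labels, centroids = k_means(X, k=2)
print(labels)     # cluster ids (their numbering depends on the initialization)
print(centroids)
```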
Grouping and Gradient Models
❖ Grading Models: Rank or score items based on some criteria.
❖ Recommendation Systems:
➢ Grades or ranks items for a user (like movies, books).
➢ Techniques: Collaborative filtering, Matrix Factorization.
❖ Learning to Rank (LTR):
➢ Used in search engines to rank web pages.
➢ Examples: RankNet, LambdaRank.
Gradient Models
These models use gradient-based optimization techniques (like gradient descent) to minimize a loss function and improve performance on supervised tasks (regression/classification).
Key Concepts:
● Loss function (e.g., MSE, cross-entropy)
● Optimization via gradient descent
● Used in deep learning, regression, and classification
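A minimal NumPy sketch of gradient descent minimizing an MSE loss for a one-feature linear model. The data, the learning rate of 0.05 and the 2,000 iterations are hypothetical choices for illustration; the gradients are the partial derivatives of the MSE with respect to w and b.

```python
import numpy as np

# Hypothetical 1-D data lying exactly on y = 3x + 2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0

w, b = 0.0, 0.0        # parameters, initialized to zero
learning_rate = 0.05   # assumed step size; too large a value makes it diverge

for step in range(2000):
    error = (w * x + b) - y            # prediction minus target
    grad_w = 2.0 * np.mean(error * x)  # d(MSE)/dw
    grad_b = 2.0 * np.mean(error)      # d(MSE)/db
    w -= learning_rate * grad_w        # step against the gradient
    b -= learning_rate * grad_b

mse = np.mean(((w * x + b) - y) ** 2)
print(f"w = {w:.3f}, b = {b:.3f}, MSE = {mse:.2e}")
```

The learned parameters approach the true values w = 3 and b = 2; with this data, a learning rate above roughly 0.15 would make the updates overshoot and diverge.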
Gradient Models
Grouping and Gradient Models
❖ Advantages:
➢ Useful for unsupervised data.
➢ Effective in customer segmentation, ranking tasks.
❖ Limitations:
➢ Number of clusters (k) often needs manual setting.
➢ Sensitive to scaling and noise.
❖ Applications:
➢ Product recommendations
➢ Social network analysis
➢ Market segmentation
Grouping VS Gradient
Parametric & Non-Parametric Models
Parametric Model
● Definition: Models with a fixed number of parameters.
● Examples:
○ Linear Regression
○ Logistic Regression
○ Neural Networks (with fixed architecture)
● Advantages: Fast, less data required.
● Limitations: May underfit complex data.
Non-Parametric Model
● Definition: Models whose number of parameters is not fixed in advance but grows with the amount of training data.
● Examples:
○ K-Nearest Neighbors (KNN)
○ Decision Trees
○ Kernel SVM
● Advantages: More flexible, can model complex patterns.
● Limitations: More data and computation needed.
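To contrast the two families, here is a minimal Python sketch of K-Nearest Neighbors, a non-parametric model: it keeps the whole training set and decides by majority vote among the k closest points (Euclidean distance). A parametric model such as linear regression would instead keep only its n + 1 weights, however much data it saw. The points, k = 3 and the helper name knn_predict are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """k-Nearest Neighbors: nothing is fitted; the stored training set is
    the model, so its size grows with the data (non-parametric)."""
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_k_labels = [label for _, label in neighbors[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]

# Hypothetical 2-D training points with two classes.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (1.5, 1.5)))  # A
print(knn_predict(train_X, train_y, (6.5, 6.5)))  # B
```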
Model Type | Examples | Core Idea
Geometric | Linear Regression, SVM, Logistic Regression | Geometry-based separation or fit
Probabilistic | Naive Bayes, HMM, GMM | Based on probability distributions
Logical | Decision Trees, Rule-based Systems | Logic and rules drive learning
Grouping & Grading | K-Means, DBSCAN, Recommender Systems | Cluster or rank based on similarity/relevance
Parametric | Linear Models, Neural Nets (fixed size) | Fixed number of learnable parameters
Non-Parametric | KNN, Decision Trees, Kernel SVM | Parameters grow with data
Feature Transformation: Dimensionality Reduction Techniques - PCA and LDA
Feature transformation involves modifying or projecting features into a new space
that simplifies the problem without losing important information.
Why is it important?
● Reduces dimensionality of data
● Removes redundancy and noise
● Improves model performance and training speed
● Helps in visualization of high-dimensional data
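As an example of projecting features into a new space, here is a minimal NumPy sketch of PCA computed via the singular value decomposition of the centered data. The data set and the helper name pca are hypothetical, with the data built so that one feature is nearly a copy of another; LDA, which additionally uses class labels, is not shown.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its first n_components principal components.

    The data is centered, then the SVD X_c = U S V^T is taken: the rows of V^T
    are the principal directions, and S**2 is proportional to the variance
    each direction explains (singular values come sorted in descending order).
    """
    X_centered = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)
    Z = X_centered @ Vt[:n_components].T
    return Z, explained_ratio[:n_components]

# Hypothetical data: the third feature is almost a copy of the first,
# so one of the three dimensions is largely redundant.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=100)])

Z, ratio = pca(X, n_components=2)
print(Z.shape, ratio.round(3), ratio.sum().round(4))
```

Because the third feature is redundant, two components retain almost all of the variance, so the data can be reduced from three dimensions to two with little loss.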
Dimensionality Reduction
Dimensionality reduction in machine learning is the process of reducing the number of features or variables in a dataset while retaining as much of the original information as possible. In other words, it is a way of simplifying the data by reducing its complexity.
The need for dimensionality reduction arises when a dataset has a large number of features or variables. Having too many features can lead to overfitting and increase the complexity of the model. It can also make it difficult to visualize the data and can slow down the training process.
● Feature Selection: Selects a subset of the original features.
Feature Selection
Feature selection keeps only the most relevant variables from the original dataset. Common techniques:
1. Correlation
2. Forward Selection
3. Backward Elimination
4. Select K Best
5. Missing Value Ratio
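Two of these filters can be sketched in NumPy: a Missing Value Ratio filter and a correlation-based Select K Best filter that ranks features by their absolute Pearson correlation with the target. The data, the 0.4 missing-value threshold, k = 2 and the helper names are hypothetical; forward selection and backward elimination, which repeatedly retrain a model while adding or removing features, are not shown.

```python
import numpy as np

def missing_value_filter(X, max_missing_ratio=0.4):
    """Keep the indices of columns whose fraction of NaNs is at most the threshold."""
    missing_ratio = np.isnan(X).mean(axis=0)
    return np.where(missing_ratio <= max_missing_ratio)[0]

def select_k_best_by_correlation(X, y, k):
    """Rank columns by |Pearson correlation| with y and return the top-k indices.
    Assumes X has no NaNs left (impute or filter them first)."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

# Hypothetical data: 5 samples, 4 features, NaN marks a missing value.
X = np.array([
    [1.0, 5.0, np.nan, 0.2],
    [2.0, 3.0, np.nan, 0.1],
    [3.0, 4.0, 1.0, 0.4],
    [4.0, 1.0, np.nan, 0.3],
    [5.0, 2.0, 2.0, 0.5],
])
y = np.array([1.1, 2.0, 2.9, 4.2, 5.0])

kept = missing_value_filter(X)                        # drops column 2 (60% missing)
best = kept[select_k_best_by_correlation(X[:, kept], y, k=2)]
print(kept, best)                                     # [0 1 3] [0 1]
```

Column 2 is dropped because 60% of its values are missing; among the remaining columns, column 0 tracks the target most closely, followed by column 1 (negatively correlated, hence the absolute value).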