Introduction to Machine Learning: Basic Concepts, Models and Description

Shital Bedse, Aug 29, 2025 (67 slides)

About This Presentation

This PDF provides a comprehensive introduction to Machine Learning (ML), covering its definitions, real-world applications, and a comparison with traditional programming, AI, and Data Science. It explains the different learning paradigms, including supervised, unsupervised, semi-supervised, reinforcement, and self-supervised learning, along with models of machine learning and dimensionality reduction techniques (PCA and LDA).


Slide Content

Introduction to Machine Learning
Prof. Shital R. Bedse

Contents
Introduction: What is Machine Learning, Definitions and Real-life applications, Comparison of Machine Learning with traditional programming, ML vs AI vs Data Science.
Learning Paradigms: Learning Tasks - Descriptive and Predictive Tasks; Supervised, Unsupervised, Semi-supervised and Reinforcement Learning.
Models of Machine Learning: Geometric models, Probabilistic models, Logical models, Grouping and grading models, Parametric and non-parametric models.
Feature Transformation: Dimensionality reduction techniques - PCA and LDA.

What is Machine Learning?
Arthur Samuel, an American pioneer in the fields of computer gaming and artificial intelligence, coined the term "Machine Learning" in 1959 at IBM.
Definition 1 (Arthur Samuel, 1959):
"Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed."

What is Machine Learning?
Definition 2 (Tom Mitchell, 1997):
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Example 1: Email Spam Detection
● Task (T): Classifying emails as "spam" or "not spam".
● Performance Measure (P): Accuracy (percentage of correctly classified emails).
● Experience (E): Training on a dataset of labeled emails (i.e., emails already marked as spam or not spam).

What is Machine Learning?
Example 2: Chess Learning Program
● Task (T): Playing chess against human or computer opponents.
● Performance Measure (P): Percentage of games won against opponents.
● Experience (E): Playing thousands of games against other players or itself (self-play), or studying past professional games.
Simplified Definition:
Machine Learning is a subset of AI that enables a system to learn patterns from data and make decisions or predictions without being explicitly told what to do.
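Mitchell's T/P/E framing can be made concrete with a toy spam detector. This is only an illustrative sketch: the word-overlap rule, the threshold of 2, and the four-email dataset are all invented, not a real spam filter.

```python
# Toy illustration of Mitchell's framing: Task T, Performance P, Experience E.
# All data and the classification rule are invented for illustration.

def train(emails):
    """Experience E: collect the words seen in emails labeled spam."""
    spam_words = set()
    for text, label in emails:
        if label == "spam":
            spam_words.update(text.lower().split())
    return spam_words

def classify(text, spam_words):
    """Task T: label an email spam if it shares enough words with past spam."""
    words = set(text.lower().split())
    return "spam" if len(words & spam_words) >= 2 else "not spam"

def accuracy(emails, spam_words):
    """Performance measure P: fraction of correctly classified emails."""
    correct = sum(classify(t, spam_words) == y for t, y in emails)
    return correct / len(emails)

data = [("win free prize now", "spam"),
        ("free prize waiting claim now", "spam"),
        ("meeting agenda for monday", "not spam"),
        ("lunch at noon tomorrow", "not spam")]
spam_words = train(data)
acc = accuracy(data, spam_words)   # P improves as E (the dataset) grows
```

Here the "experience" is just a word list, but the structure is the same as in real systems: more labeled emails (E) improve accuracy (P) at the classification task (T).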

Applications of Machine Learning
Recommendation engines are ubiquitous in our daily digital lives. They are algorithms that suggest
products, content, or services to users based on their past behavior and preferences, or the behavior of
similar users.
● Example: Platforms like Netflix and Amazon use sophisticated recommendation systems to suggest movies, TV shows, and products to their users.
●Impact:A staggering 80% of content viewed on Netflix comes through its recommendation engine,
highlighting its critical role in user engagement.
●Method:These engines often employ collaborative filtering (suggesting items based on what similar
users liked) and content-based filtering (suggesting items similar to those a user previously
enjoyed).
● Benefit: For businesses, recommendations drive increased user engagement and significant revenue growth, with Amazon reporting up to a 35% increase in sales attributed to its recommendation system.

Area | Application Example
Healthcare | Disease prediction, medical image diagnosis (e.g., cancer detection in X-rays)
Finance | Credit scoring, fraud detection, stock market prediction
Retail | Recommendation systems (Amazon, Flipkart), demand forecasting
Transportation | Self-driving cars (Tesla), route optimization (Google Maps)
Social Media | Friend suggestions (Facebook), content moderation
Agriculture | Crop disease detection, yield prediction
Education | Personalized learning platforms, student performance prediction
Entertainment | Movie/music recommendation (Netflix, Spotify)

Learning Paradigms: Learning Tasks - Descriptive and Predictive Tasks

Descriptive and Predictive Tasks
Descriptive and predictive tasks represent fundamental categories of learning.
Descriptive tasks focus on summarizing and understanding past data to identify patterns and insights. They typically use visualizations such as bar graphs, pie charts and line graphs, then build dashboards (Power BI, Tableau, etc.) and generate reports.
E.g., performance of L'Oreal shampoo in 2020.
Predictive tasks aim to forecast future outcomes based on historical data. E.g., sales of ABC shampoo in 2026.
Prescriptive analytics: what actions should be taken to achieve the predicted result? (Predictive + Prescriptive)

Supervised, Unsupervised, Semi-supervised and Reinforcement Learning

Machine Learning Types
Machine Learning is mainly divided into three core types: Supervised, Unsupervised and Reinforcement Learning, along with two additional types, Semi-Supervised and Self-Supervised Learning.

Supervised Learning: Trains models on labeled data to predict or classify new, unseen data.

Unsupervised Learning: Finds patterns or groups in unlabeled data, like clustering or dimensionality reduction.

Reinforcement Learning: Learns through trial and error to maximize rewards; ideal for decision-making tasks.

Self-Supervised Learning: Self-supervised learning is often considered a subset of unsupervised learning, but it has grown into its own field due to its success in training large-scale models. It generates its own labels from the data, without any manual labeling.

Semi-Supervised Learning: This approach combines a small amount of labeled data with a large amount of unlabeled data. It's useful when labeling data is expensive or time-consuming.

Supervised Learning

Reinforcement Learning
Reinforcement Learning is a type of machine learning where an agent learns to make decisions by interacting with an environment, taking actions, and receiving rewards or penalties as feedback.
The goal is to maximize total reward over time by learning the best strategy (policy).

Reinforcement Learning
Other Real-World Examples of RL:
● Self-driving Cars: Learning to drive safely through traffic signals and roads.
● Stock Trading Bots: Buying/selling stocks based on market conditions.
● Robotics: Robot arm learning to pick up objects correctly.
● Chatbots: Learning to reply to users in a helpful way.
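The agent/environment/reward loop described above can be sketched with a toy two-armed bandit. This is an invented example: the payout rates (0.3 vs 0.8), the exploration rate, and the step count are illustrative, not taken from the slides.

```python
import random

# Minimal reward-maximizing agent: an epsilon-greedy two-armed bandit.
# The environment's hidden payout rates are invented for illustration.
random.seed(1)
reward_prob = [0.3, 0.8]        # hidden reward probability of each action
values = [0.0, 0.0]             # the agent's running reward estimates
counts = [0, 0]

for step in range(2000):
    # explore with probability 0.1, otherwise exploit the best estimate
    if random.random() < 0.1:
        a = random.randrange(2)
    else:
        a = 0 if values[0] > values[1] else 1
    # environment feedback: reward 1 with the action's hidden probability
    r = 1.0 if random.random() < reward_prob[a] else 0.0
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]   # incremental mean update

best = values.index(max(values))   # the learned "policy": always play best
```

Trial and error (exploration) plus feedback (rewards) is enough for the agent to discover the higher-paying action, which is the core idea behind full RL methods like Q-learning.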

Semi-Supervised Learning

Models of Machine Learning: Geometric models, Probabilistic models, Logical models, Grouping and grading models, Parametric and non-parametric models

Models of Machine Learning

In machine learning, model learning refers to the process where a machine learning algorithm learns from data to make predictions or decisions, essentially building a model that can generalize to new, unseen data. This involves training the algorithm on a dataset, allowing it to identify patterns and relationships within the data, and then using that learned knowledge to make predictions on new, incoming data.
● Geometric models focus on the spatial relationships between data points, while probabilistic models use probability distributions to represent uncertainty.
● Logical models employ logical rules, and grouping/grading models cluster or classify data. Finally, parametric models have a fixed number of parameters, while non-parametric models can adapt their complexity to the data.

Geometric Model
These models treat data points as vectors in a multidimensional space and attempt to
solve classification/regression by geometric constructs like lines, planes or hyperplanes.

Key Models:
1. Linear Regression:
○ Fits a straight line through data points.
○ Equation: y = w0 + w1x1 + ... + wnxn
○ Objective: minimize error (Mean Squared Error).
2. Logistic Regression:
○ Classification using a sigmoid function.
○ Decision boundary is a linear separator (hyperplane).
3. Support Vector Machine (SVM):
○ Finds the best separating hyperplane with the maximum margin between classes.
○ Uses the kernel trick for non-linear separation.
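The linear-regression equation y = w0 + w1x1 + ... + wnxn can be fitted in closed form with the least-squares normal equation w = (XᵀX)⁻¹Xᵀy. A minimal NumPy sketch on synthetic one-feature data (the true line y = 2x + 1 is invented for illustration):

```python
import numpy as np

# Fit y = w0 + w1*x by solving the normal equation (X^T X) w = X^T y.
# The data is synthetic: it lies exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
X = np.column_stack([np.ones_like(x), x])   # bias column for w0, then x
w = np.linalg.solve(X.T @ X, X.T @ y)       # w = [w0, w1]
```

Because the synthetic data is noise-free, the recovered weights match the generating line exactly; with noisy data the same formula gives the MSE-minimizing fit.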

Geometric Model

Advantages:
○ Easy to visualize in 2D or 3D.
○ Fast and interpretable.

Limitations:
○ Not effective on non-linearly separable data without kernel methods.

Applications:
○ Spam detection
○ Medical diagnosis
○ Stock price prediction

Probabilistic Model
These models assume the data is generated from a known probability distribution. They learn by
estimating parameters of that distribution.

Advantages:
● Can handle uncertainty and missing data.
● Easy to update with new data (Bayesian updating).

Limitations:
● Requires assumptions about the data distribution.
● Sensitive to incorrect assumptions (e.g., independence in Naive Bayes).

Applications:
● Text classification
● Speech recognition
● Anomaly detection

Probabilistic Model
Key Models:
1. Naive Bayes Classifier:
● Based on Bayes' Theorem.
● Assumes independence between features.
● Equation: P(y|x) = P(x|y) P(y) / P(x)
2. Hidden Markov Models (HMM):
● Used for sequential data (like speech/text).
● Contain hidden states and observable outputs.
● Used in POS tagging, speech recognition.
3. Gaussian Mixture Models (GMM):
● Assume data comes from multiple Gaussian distributions.
● Common in unsupervised learning (like clustering).
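Bayes' theorem P(y|x) = P(x|y)P(y)/P(x) can be worked through numerically. The counts below are invented for illustration: suppose 40 of 100 emails are spam, the word "free" appears in 30 of the spam emails and in 5 of the 60 ham emails.

```python
# Hand-computed Bayes' theorem with invented counts:
# P(spam | "free") = P("free" | spam) P(spam) / P("free")
p_spam = 40 / 100
p_ham = 60 / 100
p_free_given_spam = 30 / 40
p_free_given_ham = 5 / 60

# Denominator by the law of total probability:
p_free = p_free_given_spam * p_spam + p_free_given_ham * p_ham
p_spam_given_free = p_free_given_spam * p_spam / p_free   # posterior ≈ 0.857
```

Seeing "free" raises the spam probability from the prior 0.4 to about 0.86; a Naive Bayes classifier multiplies such per-word likelihoods together under the independence assumption.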

Logical Model
Logical models use a logical expression to divide the instance space into segments and hence construct grouping models. A logical expression is an expression that returns a Boolean value, i.e., a True or False outcome. Once the data is grouped using a logical expression, the data is divided into homogeneous groupings for the problem we are trying to solve.
For example, for a classification problem, all the instances in the group belong to one class.

Logical Model

There are mainly two kinds of logical models: tree models and rule models.
● Rule models consist of a collection of implications or IF-THEN rules.
● For tree-based models, the 'if-part' defines a segment and the 'then-part' defines the behaviour of the model for this segment.
● Tree models can be seen as a particular type of rule model where the if-parts of the rules are organised in a tree structure.
● Both tree models and rule models use the same approach to supervised learning. The approach can be summarised in two strategies: first find the body of the rule (the concept) that covers a sufficiently homogeneous set of examples, and then find a label to represent the body.

Logical Model
These are rule-based models that make decisions using logical operations or decision paths.
1. Decision Trees:
○ Hierarchical model with internal nodes (tests), branches (outcomes) and leaves (results).
○ Uses metrics like the Gini Index or Entropy for splitting.
2. Rule-Based Systems:
○ Set of IF-THEN rules.
○ e.g., IF "age < 18" THEN "not eligible".
3. Inductive Logic Programming (ILP):
○ Combines machine learning with logic programming.
○ Learns rules from relational data.
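The eligibility rule above can be written as a tiny hand-built decision tree: each `if` is an internal node (a test), each branch an outcome, each `return` a leaf. The second test (`has_id`) is an invented extra node, added only to show nesting.

```python
# A hand-written decision tree as nested IF-THEN rules, mirroring the
# "age < 18" rule above. The has_id test is invented for illustration.
def eligible(age, has_id):
    if age < 18:                 # internal node: test on age
        return "not eligible"    # leaf
    if not has_id:               # second internal node on the other branch
        return "not eligible"
    return "eligible"
```

A learned decision tree has exactly this shape; algorithms like CART choose the tests and thresholds automatically using Gini or entropy instead of a human writing them.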

Logical Model

Advantages:
● High interpretability.
● Easy to visualize and explain.

Limitations:
● Can overfit training data (especially decision trees).
● May become complex with large rule sets.

Applications:
● Rule-based expert systems
● Medical decision support
● Fraud detection

Grouping and Grading Models
Grouping refers to clustering or unsupervised learning techniques where the model learns to group data points based on similarity or shared characteristics, without using labeled outcomes.
Key Concepts:
● No labeled data (unsupervised)
● Based on distance (Euclidean, cosine, etc.)
● Often used for exploratory data analysis
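The grouping idea — similarity-based clusters from unlabeled data — can be sketched with a minimal k-means implementation. The 1-D data, k = 2, and iteration count below are all invented for illustration.

```python
import numpy as np

# Minimal k-means on toy 1-D data: assign points to the nearest centre,
# then move each centre to the mean of its group, and repeat.
def kmeans(points, k=2, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assignment step: nearest centre by (Euclidean) distance
        labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        # update step: centre = mean of its assigned points (keep old if empty)
        centers = np.array([points[labels == j].mean()
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

points = np.array([1.0, 1.2, 0.8, 9.0, 9.5, 8.7])
labels, centers = kmeans(points)   # two clusters: near 1.0 and near 9.1
```

No labels are used anywhere: the two groups emerge purely from distances, which is exactly the exploratory, unsupervised setting described above.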

Grouping and Grading Models

Grading Models: rank or score items based on some criteria.
● Recommendation Systems:
○ Grade or rank items for a user (like movies, books).
○ Techniques: collaborative filtering, matrix factorization.
● Learning to Rank (LTR):
○ Used in search engines to rank web pages.
○ Examples: RankNet, LambdaRank.

Gradient Models
These models use gradient-based optimization techniques (like gradient descent) to minimize a loss function and improve performance on supervised tasks (regression/classification).
Key Concepts:
● Loss function (e.g., MSE, cross-entropy)
● Optimization via gradient descent
● Used in deep learning, regression, and classification
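The loss-minimization loop can be shown end to end on the simplest possible case: one weight w, the MSE loss L(w) = mean((w·x − y)²), and its gradient 2·mean((w·x − y)·x). The data (true slope 3), learning rate, and step count are invented for illustration.

```python
# Gradient descent on a one-parameter MSE loss. The synthetic data
# satisfies y = 3x exactly, so the loss is minimized at w = 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w, lr = 0.0, 0.02
for _ in range(500):
    # dL/dw = 2 * mean((w*x - y) * x)
    grad = 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad      # step downhill against the gradient
```

Deep learning frameworks run exactly this loop, only with millions of weights and gradients computed by backpropagation instead of by hand.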


Grouping and Grading Models

Advantages:
● Useful for unsupervised data.
● Effective in customer segmentation and ranking tasks.

Limitations:
● Number of clusters (k) often needs manual setting.
● Sensitive to scaling and noise.

Applications:
● Product recommendations
● Social network analysis
● Market segmentation

Grouping vs Grading

Parametric and Non-Parametric Models

Parametric Model
● Definition: Models with a fixed number of parameters.
● Examples:
○ Linear Regression
○ Logistic Regression
○ Neural Networks (with a fixed architecture)
● Advantages: Fast; less data required.
● Limitations: May underfit complex data.

Non-Parametric Model
● Definition: Models that grow with the data; no fixed number of parameters.
● Examples:
○ K-Nearest Neighbors (KNN)
○ Decision Trees
○ Kernel SVM
● Advantages: More flexible; can model complex patterns.
● Limitations: More data and computation needed.
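KNN makes the "grows with data" point concrete: the model *is* the stored training set, so its size scales with the data rather than with a fixed parameter count. The 2-D points and labels below are invented for illustration.

```python
from collections import Counter

# Minimal k-nearest-neighbours classifier. There is no training step:
# the "model" is just the stored data, so it grows with the dataset.
def knn_predict(train, query, k=3):
    # sort stored points by squared Euclidean distance to the query
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2
                                              for a, b in zip(p[0], query)))[:k]
    # majority vote among the k nearest labels
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
```

Contrast this with the parametric linear-regression fit earlier, which compresses any amount of data into the same fixed weight vector — that is exactly the parametric/non-parametric distinction in the two slides above.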

Model Type | Examples | Core Idea
Geometric | Linear Regression, SVM, Logistic Regression | Geometry-based separation or fit
Probabilistic | Naive Bayes, HMM, GMM | Based on probability distributions
Logical | Decision Trees, Rule-based Systems | Logic and rules drive learning
Grouping & Grading | K-Means, DBSCAN, Recommender Systems | Cluster or rank based on similarity/relevance
Parametric | Linear Models, Neural Nets (fixed size) | Fixed number of learnable parameters
Non-Parametric | KNN, Decision Trees, Kernel SVM | Parameters grow with data

Feature Transformation: Dimensionality Reduction Techniques - PCA and LDA

Feature Transformation: Dimensionality Reduction Techniques - PCA and LDA
Feature transformation involves modifying or projecting features into a new space that simplifies the problem without losing important information.
Why is it important?
● Reduces the dimensionality of the data
● Removes redundancy and noise
● Improves model performance and training speed
● Helps in visualization of high-dimensional data

Dimensionality Reduction
Dimensionality reduction in machine learning is the process of reducing the number of features or variables in a dataset while retaining as much of the original information as possible. In other words, it is a way of simplifying the data by reducing its complexity.
The need for dimensionality reduction arises when a dataset has a large number of features or variables. Having too many features can lead to overfitting and increase the complexity of the model. It can also make it difficult to visualize the data and can slow down the training process.

Feature Selection: selects a subset of the original features.

Dimensionality Reduction
● Feature Selection
1. Filter method
2. Wrapper method
3. Embedded method
● Feature Extraction
1. Principal Component Analysis
2. Missing Value Ratio
3. Backward Feature Selection
4. Forward Feature Selection
5. Factor Analysis
6. Independent Component Analysis

Feature Selection

Feature selection
Feature selection chooses the most relevant features from the dataset without altering them. It helps remove redundant or irrelevant features, improving model efficiency. Some common methods are:
● Filter methods rank the features based on their relevance to the target variable.
● Wrapper methods use the model performance as the criterion for selecting features.
● Embedded methods combine feature selection with the model training process.
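A filter method can be sketched in a few lines: rank features by their absolute Pearson correlation with the target, independently of any model. The two-feature synthetic dataset (feature 0 drives the target, feature 1 is pure noise) is invented for illustration.

```python
import numpy as np

# Filter-method sketch: score each feature by |correlation with target|.
# Synthetic data: feature 0 determines y, feature 1 is irrelevant noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
ranked = np.argsort(scores)[::-1]   # feature indices, most relevant first
```

Because the score needs only the data and the target, filter methods are fast and model-agnostic; wrapper methods would instead retrain a model for each candidate feature subset.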

Feature Selection
By keeping only the most relevant variables from the original dataset:
1. Correlation
2. Forward Selection
3. Backward Elimination
4. Select K Best
5. Missing Value Ratio

Feature extraction

Feature Extraction vs Feature Selection

Feature extraction
Feature extraction involves creating new features by combining or transforming the original features. These new features retain most of the dataset's important information in fewer dimensions. Common feature extraction methods are:
1. Principal Component Analysis (PCA): converts correlated variables into uncorrelated 'principal components', reducing dimensionality while maintaining as much variance as possible, enabling more efficient analysis.
2. Missing Value Ratio: variables with missing data beyond a set threshold are removed, improving dataset reliability.
3. Backward Feature Elimination: starts with all features and removes the least significant ones in each iteration. The process continues until only the most impactful features remain, optimizing model performance.

Feature extraction
4. Forward Feature Selection: begins with one feature, adds others incrementally, and keeps those that improve model performance.
5. Random Forest: uses decision trees to evaluate feature importance, automatically selecting the most relevant features without the need for manual coding, enhancing model accuracy.
6. Factor Analysis: groups variables by correlation and keeps the most relevant ones for further analysis.
7. Independent Component Analysis (ICA): identifies statistically independent components, ideal for applications like 'blind source separation' where traditional correlation-based methods fall short.
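PCA as described above — project onto uncorrelated directions of maximum variance — can be done from scratch with an eigendecomposition of the covariance matrix. The 2-D synthetic dataset (two strongly correlated features, so it is nearly one-dimensional) is invented for illustration.

```python
import numpy as np

# PCA from scratch: centre the data, eigendecompose the covariance
# matrix, and project onto the top principal component.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
# two correlated features: the data lies close to the line x2 = 2*x1
X = np.column_stack([t, 2.0 * t + 0.05 * rng.normal(size=100)])

Xc = X - X.mean(axis=0)                  # centre each feature
cov = np.cov(Xc, rowvar=False)           # 2x2 covariance matrix
vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
top = vecs[:, -1]                        # direction of maximum variance
explained = vals[-1] / vals.sum()        # fraction of variance retained
Z = Xc @ top                             # the 1-D projected dataset
```

Because the two features are almost perfectly correlated, one principal component retains nearly all the variance: dimensionality drops from 2 to 1 with almost no information loss, which is PCA's whole point.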


Advantages:
● Faster computation
● Better visualization
● Prevents overfitting

Disadvantages:
● Data loss and reduced accuracy
● Choosing the right number of components

Source: Introduction to Dimensionality Reduction, GeeksforGeeks