Introduction to Machine Learning: Basic Concepts, Models and Description

Shital Bedse, Aug 29, 2025 (67 slides)

About This Presentation

This PDF provides a comprehensive introduction to Machine Learning (ML), covering its definitions, real-world applications, and a comparison with traditional programming, AI, and Data Science. It explains the different learning paradigms, including supervised, unsupervised, semi-supervised, reinforcement, and self-supervised learning, along with models of machine learning and dimensionality reduction techniques (PCA and LDA).


Slide Content

Introduction to Machine Learning
Prof. Shital R. Bedse

Contents
Introduction: What is Machine Learning, Definitions and Real-life applications, Comparison of Machine Learning with traditional programming, ML vs AI vs Data Science.
Learning Paradigms: Learning Tasks - Descriptive and Predictive Tasks; Supervised, Unsupervised, Semi-supervised and Reinforcement Learning.
Models of Machine Learning: Geometric models, Probabilistic models, Logical models, Grouping and grading models, Parametric and non-parametric models.
Feature Transformation: Dimensionality reduction techniques - PCA and LDA.

What is Machine Learning?
Arthur Samuel, an American pioneer in the fields of computer gaming and artificial intelligence, coined the term "Machine Learning" in 1959 at IBM.
Definition 1 (Arthur Samuel, 1959):
"Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed."

What is Machine Learning?
Definition 2 (Tom Mitchell, 1997):
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Example 1: Email Spam Detection
● Task (T): Classifying emails as "spam" or "not spam".
● Performance Measure (P): Accuracy (percentage of correctly classified emails).
● Experience (E): Training on a dataset of labeled emails (i.e., emails already marked as spam or not spam).

What is Machine Learning?
Example 2: Chess Learning Program
● Task (T): Playing chess against human or computer opponents.
● Performance Measure (P): Percentage of games won against opponents.
● Experience (E): Playing thousands of games against other players or itself (self-play), or studying past professional games.
Simplified Definition:
Machine Learning is a subset of AI that enables a system to learn patterns from data and make decisions or predictions without being explicitly told what to do.
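Mitchell's T/P/E framing can be made concrete with a toy spam detector. This is only an illustrative sketch: the word-overlap rule, the threshold of 2, and the four-email dataset are all invented, not a real spam filter.

```python
# Toy illustration of Mitchell's framing: Task T, Performance P, Experience E.
# All data and the classification rule are invented for illustration.

def train(emails):
    """Experience E: collect the words seen in emails labeled spam."""
    spam_words = set()
    for text, label in emails:
        if label == "spam":
            spam_words.update(text.lower().split())
    return spam_words

def classify(text, spam_words):
    """Task T: label an email spam if it shares enough words with past spam."""
    words = set(text.lower().split())
    return "spam" if len(words & spam_words) >= 2 else "not spam"

def accuracy(emails, spam_words):
    """Performance measure P: fraction of correctly classified emails."""
    correct = sum(classify(t, spam_words) == y for t, y in emails)
    return correct / len(emails)

data = [("win free prize now", "spam"),
        ("free prize waiting claim now", "spam"),
        ("meeting agenda for monday", "not spam"),
        ("lunch at noon tomorrow", "not spam")]
spam_words = train(data)
acc = accuracy(data, spam_words)   # P improves as E (the dataset) grows
```

Here the "experience" is just a word list, but the structure is the same as in real systems: more labeled emails (E) improve accuracy (P) at the classification task (T).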

Applications of Machine Learning
Recommendation engines are ubiquitous in our daily digital lives. They are algorithms that suggest
products, content, or services to users based on their past behavior and preferences, or the behavior of
similar users.
● Example: Platforms like Netflix and Amazon use sophisticated recommendation systems to suggest movies, TV shows, and products to their users.
●Impact:A staggering 80% of content viewed on Netflix comes through its recommendation engine,
highlighting its critical role in user engagement.
●Method:These engines often employ collaborative filtering (suggesting items based on what similar
users liked) and content-based filtering (suggesting items similar to those a user previously
enjoyed).
● Benefit: For businesses, recommendations drive increased user engagement and significant revenue growth, with Amazon reporting up to a 35% increase in sales attributed to its recommendation system.

Area | Application Example
Healthcare | Disease prediction, medical image diagnosis (e.g., cancer detection in X-rays)
Finance | Credit scoring, fraud detection, stock market prediction
Retail | Recommendation systems (Amazon, Flipkart), demand forecasting
Transportation | Self-driving cars (Tesla), route optimization (Google Maps)
Social Media | Friend suggestions (Facebook), content moderation
Agriculture | Crop disease detection, yield prediction
Education | Personalized learning platforms, student performance prediction
Entertainment | Movie/music recommendation (Netflix, Spotify)

Learning Paradigms: Learning Tasks - Descriptive and Predictive Tasks

Descriptive and Predictive Tasks
Descriptive and predictive tasks represent fundamental categories of learning.
Descriptive tasks focus on summarizing and understanding past data to identify patterns and insights. They typically use visualizations such as bar graphs, pie charts and line graphs, then build dashboards (Power BI, Tableau, etc.) and generate reports.
E.g., performance of L'Oreal shampoo in 2020.
Predictive tasks aim to forecast future outcomes based on historical data. E.g., sales of ABC shampoo in 2026.
Prescriptive analytics: what actions should be taken to achieve the predicted result? (Predictive + Prescriptive)

Supervised, Unsupervised, Semi-supervised and Reinforcement Learning

Machine Learning Types
Machine Learning is mainly divided into three core types: Supervised, Unsupervised and Reinforcement Learning, along with two additional types, Semi-Supervised and Self-Supervised Learning.

Supervised Learning: Trains models on labeled data to predict or classify new, unseen data.

Unsupervised Learning: Finds patterns or groups in unlabeled data, like clustering or dimensionality reduction.

Reinforcement Learning: Learns through trial and error to maximize rewards; ideal for decision-making tasks.

Self-Supervised Learning: Self-supervised learning is often considered a subset of unsupervised learning, but it has grown into its own field due to its success in training large-scale models. It generates its own labels from the data, without any manual labeling.

Semi-Supervised Learning: This approach combines a small amount of labeled data with a large amount of unlabeled data. It's useful when labeling data is expensive or time-consuming.

Supervised Learning

Reinforcement Learning
Reinforcement Learning is a type of machine learning where an agent learns to make decisions by interacting with an environment, taking actions, and receiving rewards or penalties as feedback.
The goal is to maximize total reward over time by learning the best strategy (policy).

Reinforcement Learning
Other Real-World Examples of RL:
● Self-driving Cars: Learning to drive safely through traffic signals and roads.
● Stock Trading Bots: Buying/selling stocks based on market conditions.
● Robotics: Robot arm learning to pick up objects correctly.
● Chatbots: Learning to reply to users in a helpful way.
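The agent/environment/reward loop described above can be sketched with a toy two-armed bandit. This is an invented example: the payout rates (0.3 vs 0.8), the exploration rate, and the step count are illustrative, not taken from the slides.

```python
import random

# Minimal reward-maximizing agent: an epsilon-greedy two-armed bandit.
# The environment's hidden payout rates are invented for illustration.
random.seed(1)
reward_prob = [0.3, 0.8]        # hidden reward probability of each action
values = [0.0, 0.0]             # the agent's running reward estimates
counts = [0, 0]

for step in range(2000):
    # explore with probability 0.1, otherwise exploit the best estimate
    if random.random() < 0.1:
        a = random.randrange(2)
    else:
        a = 0 if values[0] > values[1] else 1
    # environment feedback: reward 1 with the action's hidden probability
    r = 1.0 if random.random() < reward_prob[a] else 0.0
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]   # incremental mean update

best = values.index(max(values))   # the learned "policy": always play best
```

Trial and error (exploration) plus feedback (rewards) is enough for the agent to discover the higher-paying action, which is the core idea behind full RL methods like Q-learning.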

Semi-Supervised Learning

Models of Machine Learning: Geometric models, Probabilistic models, Logical models, Grouping and grading models, Parametric and non-parametric models

Models of Machine Learning

In machine learning, model learning refers to the process where a machine learning algorithm learns from data to make predictions or decisions, essentially building a model that can generalize to new, unseen data. This involves training the algorithm on a dataset, allowing it to identify patterns and relationships within the data, and then using that learned knowledge to make predictions on new, incoming data.
● Geometric models focus on the spatial relationships between data points, while probabilistic models use probability distributions to represent uncertainty.
● Logical models employ logical rules, and grouping/grading models cluster or classify data. Finally, parametric models have a fixed number of parameters, while non-parametric models can adapt their complexity to the data.

Geometric Model
These models treat data points as vectors in a multidimensional space and attempt to
solve classification/regression by geometric constructs like lines, planes or hyperplanes.

Key Models:
1. Linear Regression:
○ Fits a straight line through data points.
○ Equation: y = w0 + w1x1 + ... + wnxn
○ Objective: minimize error (Mean Squared Error).
2. Logistic Regression:
○ Classification using a sigmoid function.
○ Decision boundary is a linear separator (hyperplane).
3. Support Vector Machine (SVM):
○ Finds the best separating hyperplane with the maximum margin between classes.
○ Uses the kernel trick for non-linear separation.
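The linear-regression equation y = w0 + w1x1 + ... + wnxn can be fitted in closed form with the least-squares normal equation w = (XᵀX)⁻¹Xᵀy. A minimal NumPy sketch on synthetic one-feature data (the true line y = 2x + 1 is invented for illustration):

```python
import numpy as np

# Fit y = w0 + w1*x by solving the normal equation (X^T X) w = X^T y.
# The data is synthetic: it lies exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
X = np.column_stack([np.ones_like(x), x])   # bias column for w0, then x
w = np.linalg.solve(X.T @ X, X.T @ y)       # w = [w0, w1]
```

Because the synthetic data is noise-free, the recovered weights match the generating line exactly; with noisy data the same formula gives the MSE-minimizing fit.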

Geometric Model

Advantages:
○ Easy to visualize in 2D or 3D.
○ Fast and interpretable.

Limitations:
○ Not effective on non-linearly separable data without kernel methods.

Applications:
○ Spam detection
○ Medical diagnosis
○ Stock price prediction

Probabilistic Model
These models assume the data is generated from a known probability distribution. They learn by
estimating parameters of that distribution.

Advantages:
● Can handle uncertainty and missing data.
● Easy to update with new data (Bayesian updating).

Limitations:
● Requires assumptions about the data distribution.
● Sensitive to incorrect assumptions (e.g., independence in Naive Bayes).

Applications:
● Text classification
● Speech recognition
● Anomaly detection

Probabilistic Model
Key Models:
1. Naive Bayes Classifier:
● Based on Bayes' Theorem.
● Assumes independence between features.
● Equation: P(y|x) = P(x|y) P(y) / P(x)
2. Hidden Markov Models (HMM):
● Used for sequential data (like speech/text).
● Contain hidden states and observable outputs.
● Used in POS tagging, speech recognition.
3. Gaussian Mixture Models (GMM):
● Assume data comes from multiple Gaussian distributions.
● Common in unsupervised learning (like clustering).
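Bayes' theorem P(y|x) = P(x|y)P(y)/P(x) can be worked through numerically. The counts below are invented for illustration: suppose 40 of 100 emails are spam, the word "free" appears in 30 of the spam emails and in 5 of the 60 ham emails.

```python
# Hand-computed Bayes' theorem with invented counts:
# P(spam | "free") = P("free" | spam) P(spam) / P("free")
p_spam = 40 / 100
p_ham = 60 / 100
p_free_given_spam = 30 / 40
p_free_given_ham = 5 / 60

# Denominator by the law of total probability:
p_free = p_free_given_spam * p_spam + p_free_given_ham * p_ham
p_spam_given_free = p_free_given_spam * p_spam / p_free   # posterior ≈ 0.857
```

Seeing "free" raises the spam probability from the prior 0.4 to about 0.86; a Naive Bayes classifier multiplies such per-word likelihoods together under the independence assumption.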

Logical Model
Logical models use a logical expression to divide the instance space into segments and hence construct grouping models. A logical expression is an expression that returns a Boolean value, i.e., a True or False outcome. Once the data is grouped using a logical expression, the data is divided into homogeneous groupings for the problem we are trying to solve.
For example, for a classification problem, all the instances in the group belong to one class.

Logical Model

There are mainly two kinds of logical models: tree models and rule models.
● Rule models consist of a collection of implications or IF-THEN rules.
● For tree-based models, the 'if-part' defines a segment and the 'then-part' defines the behaviour of the model for this segment.
● Tree models can be seen as a particular type of rule model where the if-parts of the rules are organised in a tree structure.
● Both tree models and rule models use the same approach to supervised learning. The approach can be summarised in two strategies: first find the body of the rule (the concept) that covers a sufficiently homogeneous set of examples, and then find a label to represent the body.

Logical Model
These are rule-based models that make decisions using logical operations or decision paths.
1. Decision Trees:
○ Hierarchical model with internal nodes (tests), branches (outcomes) and leaves (results).
○ Uses metrics like the Gini Index or Entropy for splitting.
2. Rule-Based Systems:
○ Set of IF-THEN rules.
○ e.g., IF "age < 18" THEN "not eligible".
3. Inductive Logic Programming (ILP):
○ Combines machine learning with logic programming.
○ Learns rules from relational data.
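The eligibility rule above can be written as a tiny hand-built decision tree: each `if` is an internal node (a test), each branch an outcome, each `return` a leaf. The second test (`has_id`) is an invented extra node, added only to show nesting.

```python
# A hand-written decision tree as nested IF-THEN rules, mirroring the
# "age < 18" rule above. The has_id test is invented for illustration.
def eligible(age, has_id):
    if age < 18:                 # internal node: test on age
        return "not eligible"    # leaf
    if not has_id:               # second internal node on the other branch
        return "not eligible"
    return "eligible"
```

A learned decision tree has exactly this shape; algorithms like CART choose the tests and thresholds automatically using Gini or entropy instead of a human writing them.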

Logical Model

Advantages:
● High interpretability.
● Easy to visualize and explain.

Limitations:
● Can overfit training data (especially decision trees).
● May become complex with large rule sets.

Applications:
● Rule-based expert systems
● Medical decision support
● Fraud detection

Grouping and Grading Models
Grouping refers to clustering or unsupervised learning techniques where the model learns to group data points based on similarity or shared characteristics, without using labeled outcomes.
Key Concepts:
● No labeled data (unsupervised)
● Based on distance (Euclidean, cosine, etc.)
● Often used for exploratory data analysis
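The grouping idea — similarity-based clusters from unlabeled data — can be sketched with a minimal k-means implementation. The 1-D data, k = 2, and iteration count below are all invented for illustration.

```python
import numpy as np

# Minimal k-means on toy 1-D data: assign points to the nearest centre,
# then move each centre to the mean of its group, and repeat.
def kmeans(points, k=2, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assignment step: nearest centre by (Euclidean) distance
        labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        # update step: centre = mean of its assigned points (keep old if empty)
        centers = np.array([points[labels == j].mean()
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

points = np.array([1.0, 1.2, 0.8, 9.0, 9.5, 8.7])
labels, centers = kmeans(points)   # two clusters: near 1.0 and near 9.1
```

No labels are used anywhere: the two groups emerge purely from distances, which is exactly the exploratory, unsupervised setting described above.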

Grouping and Grading Models

Grading Models: rank or score items based on some criteria.
● Recommendation Systems:
○ Grade or rank items for a user (like movies, books).
○ Techniques: collaborative filtering, matrix factorization.
● Learning to Rank (LTR):
○ Used in search engines to rank web pages.
○ Examples: RankNet, LambdaRank.

Gradient Models
These models use gradient-based optimization techniques (like gradient descent) to minimize a loss function and improve performance on supervised tasks (regression/classification).
Key Concepts:
● Loss function (e.g., MSE, cross-entropy)
● Optimization via gradient descent
● Used in deep learning, regression, and classification
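The loss-minimization loop can be shown end to end on the simplest possible case: one weight w, the MSE loss L(w) = mean((w·x − y)²), and its gradient 2·mean((w·x − y)·x). The data (true slope 3), learning rate, and step count are invented for illustration.

```python
# Gradient descent on a one-parameter MSE loss. The synthetic data
# satisfies y = 3x exactly, so the loss is minimized at w = 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w, lr = 0.0, 0.02
for _ in range(500):
    # dL/dw = 2 * mean((w*x - y) * x)
    grad = 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad      # step downhill against the gradient
```

Deep learning frameworks run exactly this loop, only with millions of weights and gradients computed by backpropagation instead of by hand.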


Grouping and Grading Models

Advantages:
● Useful for unsupervised data.
● Effective in customer segmentation and ranking tasks.

Limitations:
● Number of clusters (k) often needs manual setting.
● Sensitive to scaling and noise.

Applications:
● Product recommendations
● Social network analysis
● Market segmentation

Grouping vs Grading

Parametric and Non-Parametric Models

Parametric Model
● Definition: Models with a fixed number of parameters.
● Examples:
○ Linear Regression
○ Logistic Regression
○ Neural Networks (with a fixed architecture)
● Advantages: Fast; less data required.
● Limitations: May underfit complex data.

Non-Parametric Model
● Definition: Models that grow with the data; no fixed number of parameters.
● Examples:
○ K-Nearest Neighbors (KNN)
○ Decision Trees
○ Kernel SVM
● Advantages: More flexible; can model complex patterns.
● Limitations: More data and computation needed.
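KNN makes the "grows with data" point concrete: the model *is* the stored training set, so its size scales with the data rather than with a fixed parameter count. The 2-D points and labels below are invented for illustration.

```python
from collections import Counter

# Minimal k-nearest-neighbours classifier. There is no training step:
# the "model" is just the stored data, so it grows with the dataset.
def knn_predict(train, query, k=3):
    # sort stored points by squared Euclidean distance to the query
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2
                                              for a, b in zip(p[0], query)))[:k]
    # majority vote among the k nearest labels
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
```

Contrast this with the parametric linear-regression fit earlier, which compresses any amount of data into the same fixed weight vector — that is exactly the parametric/non-parametric distinction in the two slides above.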

Model Type | Examples | Core Idea
Geometric | Linear Regression, SVM, Logistic Regression | Geometry-based separation or fit
Probabilistic | Naive Bayes, HMM, GMM | Based on probability distributions
Logical | Decision Trees, Rule-based Systems | Logic and rules drive learning
Grouping & Grading | K-Means, DBSCAN, Recommender Systems | Cluster or rank based on similarity/relevance
Parametric | Linear Models, Neural Nets (fixed size) | Fixed number of learnable parameters
Non-Parametric | KNN, Decision Trees, Kernel SVM | Parameters grow with data

Feature Transformation: Dimensionality Reduction Techniques - PCA and LDA

Feature Transformation: Dimensionality Reduction Techniques - PCA and LDA
Feature transformation involves modifying or projecting features into a new space that simplifies the problem without losing important information.
Why is it important?
● Reduces the dimensionality of the data
● Removes redundancy and noise
● Improves model performance and training speed
● Helps in visualization of high-dimensional data

Dimensionality Reduction
Dimensionality reduction in machine learning is the process of reducing the number of features or variables in a dataset while retaining as much of the original information as possible. In other words, it is a way of simplifying the data by reducing its complexity.
The need for dimensionality reduction arises when a dataset has a large number of features or variables. Having too many features can lead to overfitting and increase the complexity of the model. It can also make it difficult to visualize the data and can slow down the training process.

Feature Selection: selects a subset of the original features.

Dimensionality Reduction
● Feature Selection
1. Filter method
2. Wrapper method
3. Embedded method
● Feature Extraction
1. Principal Component Analysis
2. Missing Value Ratio
3. Backward Feature Selection
4. Forward Feature Selection
5. Factor Analysis
6. Independent Component Analysis

Feature Selection

Feature selection
Feature selection chooses the most relevant features from the dataset without altering them. It helps remove redundant or irrelevant features, improving model efficiency. Some common methods are:
● Filter methods rank the features based on their relevance to the target variable.
● Wrapper methods use the model performance as the criterion for selecting features.
● Embedded methods combine feature selection with the model training process.
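A filter method can be sketched in a few lines: rank features by their absolute Pearson correlation with the target, independently of any model. The two-feature synthetic dataset (feature 0 drives the target, feature 1 is pure noise) is invented for illustration.

```python
import numpy as np

# Filter-method sketch: score each feature by |correlation with target|.
# Synthetic data: feature 0 determines y, feature 1 is irrelevant noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
ranked = np.argsort(scores)[::-1]   # feature indices, most relevant first
```

Because the score needs only the data and the target, filter methods are fast and model-agnostic; wrapper methods would instead retrain a model for each candidate feature subset.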

Feature Selection
By keeping only the most relevant variables from the original dataset:
1. Correlation
2. Forward Selection
3. Backward Elimination
4. Select K Best
5. Missing Value Ratio

Feature extraction

Feature Extraction vs Feature Selection

Feature extraction
Feature extraction involves creating new features by combining or transforming the original features. These new features retain most of the dataset's important information in fewer dimensions. Common feature extraction methods are:
1. Principal Component Analysis (PCA): converts correlated variables into uncorrelated 'principal components', reducing dimensionality while maintaining as much variance as possible, enabling more efficient analysis.
2. Missing Value Ratio: variables with missing data beyond a set threshold are removed, improving dataset reliability.
3. Backward Feature Elimination: starts with all features and removes the least significant ones in each iteration. The process continues until only the most impactful features remain, optimizing model performance.

Feature extraction
4. Forward Feature Selection: begins with one feature, adds others incrementally, and keeps those that improve model performance.
5. Random Forest: uses decision trees to evaluate feature importance, automatically selecting the most relevant features without the need for manual coding, enhancing model accuracy.
6. Factor Analysis: groups variables by correlation and keeps the most relevant ones for further analysis.
7. Independent Component Analysis (ICA): identifies statistically independent components, ideal for applications like 'blind source separation' where traditional correlation-based methods fall short.
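PCA as described above — project onto uncorrelated directions of maximum variance — can be done from scratch with an eigendecomposition of the covariance matrix. The 2-D synthetic dataset (two strongly correlated features, so it is nearly one-dimensional) is invented for illustration.

```python
import numpy as np

# PCA from scratch: centre the data, eigendecompose the covariance
# matrix, and project onto the top principal component.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
# two correlated features: the data lies close to the line x2 = 2*x1
X = np.column_stack([t, 2.0 * t + 0.05 * rng.normal(size=100)])

Xc = X - X.mean(axis=0)                  # centre each feature
cov = np.cov(Xc, rowvar=False)           # 2x2 covariance matrix
vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
top = vecs[:, -1]                        # direction of maximum variance
explained = vals[-1] / vals.sum()        # fraction of variance retained
Z = Xc @ top                             # the 1-D projected dataset
```

Because the two features are almost perfectly correlated, one principal component retains nearly all the variance: dimensionality drops from 2 to 1 with almost no information loss, which is PCA's whole point.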


Advantages:
● Faster computation
● Better visualization
● Prevents overfitting

Disadvantages:
● Data loss and reduced accuracy
● Choosing the right number of components

Source: Introduction to Dimensionality Reduction, GeeksforGeeks