Machine learning basics: Unit 1 for third-year CSE students

sachinjadhav990783, 80 slides, Mar 02, 2025

PCET’S Pimpri Chinchwad University
Department of Computer Science and Engineering
Course Name : Machine Learning
Course Code/Course Type : UBTML303A/PEC
TY. B.Tech
Prepared By: Dr. Sachin Jadhav

Course Objectives (CO):
•The objectives of Machine Learning are:
1.To explore the knowledge of machine learning and its types.
2.To analyze various data pre-processing methods.
3.To learn supervised learning methods.
4.To analyze the need for unsupervised learning methods.
5.To learn fundamental neural network algorithms.

Course Learning Outcomes (CLO):
Students would be able to:
1.Identify the needs and challenges of machine learning for real-time applications.
2.Apply various data pre-processing techniques to simplify and speed up machine learning algorithms.
3.Apply appropriate supervised machine learning algorithms for real-time applications.
4.Compare and contrast different clustering algorithms.
5.Design a neural network for solving engineering problems.

UNIT I Hours-9
Syllabus contents:
•Introduction To Machine Learning:
•Introduction to Machine Learning, Comparison of Machine Learning with traditional programming, ML vs AI vs Data Science.
•Types of learning: Supervised, Unsupervised, and Semi-supervised, Reinforcement learning techniques.

Introduction
•Brief overview of Traditional Programming and
Machine Learning.
•Key question: How do these paradigms differ in solving
problems?

What is Traditional Programming?
•Definition: A programming paradigm where rules and logic
are explicitly coded by developers.
•Example Workflow:
▫Input Data + Rules → Output.
•Applications:
▫Accounting systems.
▫Static websites.
▫Games with fixed logic.

What is Machine Learning?
•Definition: A programming paradigm where systems learn
patterns from data to make predictions or decisions.
•Example Workflow:
▫Input Data + Output → Algorithm generates Rules.
•Applications:
▫Image recognition.
▫Autonomous vehicles.
▫Recommendation systems.

Key Differences
•Aspect | Traditional Programming | Machine Learning
•Logic | Explicitly coded by developers. | Learned automatically from data.
•Adaptability | Static: needs manual updates. | Dynamic: adapts when retrained on new data.
•Data Dependency | Limited dependency on data. | Heavily reliant on large datasets.
•Output Behavior | Deterministic: follows strict rules. | Probabilistic: predictions may vary.
•Development Focus | Writing rules and conditions. | Preparing data and tuning models.
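To make the "Logic" row concrete, here is a minimal Python sketch, assuming a hypothetical task of labelling a temperature as "hot". In the traditional version the developer writes the threshold by hand; in the machine learning version the threshold is chosen from labelled examples. The function names, the 30 °C rule, and the data are illustrative assumptions, not part of the course material.

```python
# Traditional programming: the developer writes the rule explicitly.
def is_hot_rule(temp_c):
    # Hand-coded threshold chosen by the developer (hypothetical value).
    return temp_c > 30.0


# Machine learning: the rule (here, a threshold) is learned from labelled data.
def learn_threshold(examples):
    """Return the threshold t that misclassifies the fewest (temp, label) pairs.

    Candidate thresholds are the observed temperatures; a temperature is
    predicted 'hot' when it is strictly greater than t.
    """
    best_t, best_errors = None, None
    for t, _ in examples:
        errors = sum((temp > t) != label for temp, label in examples)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical labelled data: (temperature in °C, reported as hot?)
data = [(18.0, False), (22.0, False), (25.0, False),
        (27.0, True), (31.0, True), (35.0, True)]

threshold = learn_threshold(data)
print(threshold)           # 25.0: learned rule is "hot if temp > 25.0"
print(is_hot_rule(28.0))   # False: the hand-coded rule
print(28.0 > threshold)    # True: the learned rule
```

If the labelled data changes, re-running `learn_threshold` produces a new rule without editing any code, whereas the hand-coded rule needs a manual update (the "Adaptability" row).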

Advantages of Machine Learning
•Handles complex, data-rich problems.
•Learns and improves over time.
•Automates pattern recognition tasks.

Advantages of Traditional Programming
•Simplicity for rule-based tasks.
•Easier to debug and understand.
•Works well for predefined tasks.

When to Use Machine Learning?
•Predicting outcomes (e.g., stock prices).
•Recognizing patterns (e.g., facial recognition).
•Automating decision-making (e.g., fraud detection).

When to Use Traditional Programming?
•Tasks with clearly defined rules.
•Problems where outcomes are fixed and predictable.
•Systems requiring full transparency and control.

Comparison between AI/ML/DS
•Feature/Aspect | Artificial Intelligence (AI) | Machine Learning (ML) | Data Science
•Definition | The simulation of human intelligence in machines. | A subset of AI that involves the use of algorithms to enable machines to learn from data. | An interdisciplinary field that uses statistics, algorithms, and technology to extract insights from data.
•Goal | To create systems capable of performing tasks that normally require human intelligence. | To develop models that allow computers to learn from and make decisions based on data. | To analyze and interpret complex data to aid decision-making.
•Techniques Used | Neural networks, natural language processing, computer vision, robotics. | Supervised learning, unsupervised learning, reinforcement learning. | Data cleaning, data transformation, statistical modeling, machine learning.
•Applications | Self-driving cars, virtual assistants, robotics, game playing. | Image recognition, recommendation systems, predictive analytics. | Business intelligence, market analysis, healthcare analytics, scientific research.

Comparison between AI/ML/DS (continued)
•Feature/Aspect | Artificial Intelligence (AI) | Machine Learning (ML) | Data Science
•Tools/Frameworks | TensorFlow, Keras, PyTorch, OpenCV. | Scikit-learn, XGBoost, LightGBM, TensorFlow, Keras. | Python, R, SQL, Hadoop, Apache Spark, Jupyter Notebook.
•Data Requirements | Requires large amounts of diverse data for training. | Requires labeled data for supervised learning and large datasets for training. | Can work with structured, unstructured, and semi-structured data.
•Expertise Needed | Knowledge of algorithms, data structures, advanced mathematics, domain-specific knowledge. | Proficiency in programming, statistics, and understanding of specific ML algorithms. | Strong foundation in statistics, programming, and domain-specific knowledge.
•Outcome | Intelligent systems that can mimic human behavior. | Models that can predict or classify data. | Actionable insights and data-driven decision-making.

What is Machine Learning?

A human can learn from past experience and make decisions on their own.

What is this object?
(Example images labelled CAR, CAR, BIKE, BIKE)
It is a CAR.

Let us ask the same question to him.
What is this object?
?

[But he is a human being. He can observe and learn.]

Let us make him learn: show him
(Example images labelled CAR, CAR, BIKE, BIKE)

Let us ask the same question now.
What is this object?
Past experience: (images labelled CAR, CAR, BIKE, BIKE)
Answer: CAR

Machines follow instructions
What about a machine?
[It cannot take decisions on its own.]
We can ask a machine:
•To perform arithmetic operations such as
▫Addition
▫Multiplication
▫Division
•Comparison
•Print
•Plotting a chart

What is Machine Learning?
[We want a machine to act like a human:]
•[to identify this object]
•[to predict the price in the future] (Price in 2025?)
•[to understand natural language and correct grammar] (e.g. the sentence "I made met him yesterday")
•[to recognize faces]

What is Machine Learning?
[What do we do? Just like what we did for the human, we need to provide experience to the machine.]

Dataset
[This is what we call data, or the training dataset. So, we first need to provide the training dataset to the machine.]

Dataset + algorithms/programs
[Then, devise algorithms and execute programs on the data with respect to the underlying target tasks.]

Dataset + programs + rules, patterns, relations
[Then, using the programs, identify the required rules, extract the required patterns, and identify relations,]

Dataset + programs + rules, patterns, relations = inferences
[so that the machine can derive inferences from the data.]

In summary, what is machine learning?
Given a machine learning problem:
•Identify and create the appropriate dataset.
•Perform computation to learn the required rules, patterns, and relations.
•Output the decision.
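The three summary steps can be illustrated with a small Python sketch. It assumes two hypothetical numeric features per vehicle (length in metres and weight in hundreds of kilograms) and uses a nearest-centroid rule, chosen here only for simplicity: the "pattern" learned is the average feature vector of each class, and the decision is the class whose average is closest to a new sample.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Step 1: identify and create the dataset.
# Hypothetical features: (length in m, weight in hundreds of kg), with labels.
dataset = [
    ((4.5, 12.0), "CAR"),
    ((4.2, 11.0), "CAR"),
    ((2.1, 1.8), "BIKE"),
    ((1.9, 1.5), "BIKE"),
]


# Step 2: perform computation to learn a pattern: the centroid (mean) of each class.
def learn_centroids(data):
    sums, counts = {}, {}
    for features, label in data:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in total)
            for label, total in sums.items()}


# Step 3: output the decision: the label whose centroid is nearest to the sample.
def decide(centroids, sample):
    return min(centroids, key=lambda label: dist(centroids[label], sample))


centroids = learn_centroids(dataset)
print(decide(centroids, (4.0, 10.5)))  # CAR
print(decide(centroids, (2.0, 1.6)))   # BIKE
```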

Machine Learning Paradigms
[We as human beings solve various types of problems in our day-to-day life, and various decisions need to be taken. Depending on the nature of the problem, machine learning tasks can be broadly divided into:]
•Supervised Learning
•Unsupervised Learning
•Reinforcement Learning

What is Supervised Learning?
[In supervised learning, we need something called a labelled training dataset.]
Samples (images) + Labels (CAR, CAR, BIKE, BIKE) = Training Dataset

[Given a labelled dataset, the task is to devise a function which takes the dataset and a new sample, and produces an output value.]
$f(D, x) = \text{CAR}$, where $D$ is the labelled training dataset and $x$ is the new sample.

Classification
[If the possible output values of the function are predefined and discrete/categorical, it is called Classification.]
Samples + Labels (CAR, CAR, BIKE, BIKE) = Training Dataset
Classification: $f(D, x) = \text{CAR}$

[Predefined classes means the function will produce an output only from the labels defined in the dataset. For example, even if we input a bus, it will produce either CAR or BIKE.]

Example: Identify the Animal?
Dataset: images labelled Elephant and Tiger
New image → Classifier → Elephant
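The function $f(D, x)$ can be sketched in Python with a 1-nearest-neighbour classifier, one simple choice among many, assuming hypothetical two-feature vehicle data (length in metres, weight in hundreds of kilograms). It returns the label of the training sample closest to $x$, so its output is always one of the labels present in $D$: a bus-like input still comes out as CAR or BIKE.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def f(D, x):
    """1-nearest-neighbour classifier: label of the training sample closest to x."""
    _, label = min(D, key=lambda pair: dist(pair[0], x))
    return label


# Hypothetical labelled training dataset: ((length m, weight in 100 kg), label)
D = [((4.5, 12.0), "CAR"), ((4.2, 11.0), "CAR"),
     ((2.1, 1.8), "BIKE"), ((1.9, 1.5), "BIKE")]

print(f(D, (4.4, 11.5)))   # CAR
print(f(D, (2.0, 1.7)))    # BIKE

bus = (12.0, 120.0)        # hypothetical bus, unlike anything in D
print(f(D, bus))           # CAR: only the predefined labels can be returned
```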

Regression
[If the possible output values of the function are continuous real values, then it is called Regression.]
Regression: $f(D, x) = 20500.50$

[The classification and regression problems are supervised because the decision depends on the characteristics of the ground-truth labels or values present in the dataset, which we define as experience.]
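For regression the output is a real number rather than a class label. A minimal sketch, assuming a straight-line relationship and hypothetical yearly prices, fits $y = a x + b$ by ordinary least squares, with $a = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $b = \bar{y} - a\bar{x}$, then predicts the value for 2025 (echoing the earlier "Price in 2025?" question). The data and the linear model are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: year -> price (chosen to be exactly linear)
years = [2020, 2021, 2022, 2023, 2024]
prices = [18000.0, 18600.0, 19200.0, 19800.0, 20400.0]

slope, intercept = fit_line(years, prices)
print(slope)                     # 600.0 (price rise per year)
print(slope * 2025 + intercept)  # 21000.0: a continuous, real-valued output
```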

What is Unsupervised Learning?
[In unsupervised learning, we do not need to know the labels or ground-truth values.]
Dataset (images of cars and bikes, without labels)

[The task is to identify patterns, e.g. group similar objects together.]
Clustering
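Clustering can be sketched with k-means, one common clustering algorithm (the slide does not name a specific one). This minimal Python version assumes hypothetical vehicle features without labels, k = 2, and a simple deterministic start from the first k points; practical implementations use better initialisation, because k-means can settle on a poor grouping from a bad start.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def k_means(points, k, iterations=10):
    """Group unlabelled points into k clusters (minimal k-means sketch)."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


# Hypothetical unlabelled data: (length in m, weight in hundreds of kg)
points = [(4.5, 12.0), (2.1, 1.8), (4.2, 11.0), (1.9, 1.5)]
for cluster in k_means(points, k=2):
    print(cluster)
```

No labels are used anywhere: the two groups (car-like and bike-like) emerge only from the similarity of the feature values.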

[Patterns such as association rules can also be found, for example:]
Association Rules Mining
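Association rule mining looks for rules of the form "transactions containing X also contain Y". Two standard measures are support (the fraction of transactions containing an itemset) and confidence (the support of X ∪ Y divided by the support of X). The sketch below computes both for a hypothetical four-transaction shopping dataset; a full algorithm such as Apriori would also search over candidate itemsets, which is omitted here.

```python
# Hypothetical shopping transactions (each is a set of items bought together).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"bread", "milk"}))       # 0.5
print(confidence({"bread"}, {"milk"}))  # 2 of the 3 bread baskets contain milk
```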

More Examples: Unsupervised Learning
[Figures: further unsupervised learning examples on a dataset]

What is Reinforcement Learning?
[It is also known as learning from trials and errors.]
[Figures: reinforcement learning examples]

Another Example
[Figure: an Agent performs a Task in an Environment]

Reinforcement Learning
[Figures: Punishment, Reward]
A baby learns from trials and errors.
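Learning from rewards and punishments can be sketched as an epsilon-greedy agent, a simple technique chosen here for illustration and not necessarily the one the slides intend. The agent usually repeats the action with the best average reward so far (exploit) and occasionally tries a random action (explore). The two actions, their fixed rewards, and the parameter values are hypothetical.

```python
import random


def environment(action):
    # Hypothetical environment: a fixed punishment or reward for each action.
    return {"touch_hot_stove": -1.0, "play_with_toy": +1.0}[action]


def learn_by_trial_and_error(episodes=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = ["touch_hot_stove", "play_with_toy"]
    value = {a: 0.0 for a in actions}  # estimated average reward of each action
    count = {a: 0 for a in actions}    # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore: try something at random
        else:
            action = max(actions, key=value.get)  # exploit: best action so far
        reward = environment(action)
        count[action] += 1
        # Incremental running average of the rewards received for this action.
        value[action] += (reward - value[action]) / count[action]
    return value


values = learn_by_trial_and_error()
print(max(values, key=values.get))  # play_with_toy
```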

2/3/2025 PIMPRI CHINCHWAD UNIVERSITY

Machine Learning Activities
•The first step in machine learning activity starts with data. In the case of supervised learning, this is the labelled training dataset, followed by test data that the model does not see during training.
•In the case of unsupervised learning, there is no question of labelled data; the task is to find patterns in the input data.
•A thorough review and exploration of the data is needed to understand the type of the data, the quality of the data, and the relationships between the different data elements.
•Based on that, multiple pre-processing activities may need to be done on the input data before we can go ahead with core machine learning activities.

Machine Learning Activities
•Following are the typical preparation activities done once the input data comes into the machine
learning system:
1.Understand the type of data in the given input data set.
2.Explore the data to understand the nature and quality.
3.Explore the relationships amongst the data elements, e.g. inter-feature relationship.
4.Find potential issues in data.
5.Do the necessary remediation, e.g. impute missing data values, etc., if needed.
6.Apply pre-processing steps, as necessary.
7.Once the data is prepared for modelling, then the learning tasks start off. As a part of it, do the
following activities:
1.The input data is first divided into two parts: the training data and the test data (called holdout). This step is applicable for supervised learning only.
2.Consider different models or learning algorithms for selection.
3.For a supervised learning problem, train the model on the training data and then apply it to unknown data. For an unsupervised learning problem, directly apply the chosen unsupervised model to the input data.
8.After the model is selected, trained (for supervised learning), and applied on input data, the
performance of the model is evaluated. Based on options available, specific actions can be taken to
improve the performance of the model, if possible.
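Steps 7 and 8 for a supervised problem can be sketched in plain Python: shuffle the labelled data, hold out a test set, train on the rest, and evaluate on the held-out part. The data (temperatures labelled "hot" above 25 °C), the 25 % test fraction, the seed, and the threshold model are all hypothetical choices for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the labelled rows and hold out a fraction as test data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (training data, test/holdout data)


def train_threshold(train_rows):
    """Learn 'positive if x > t', choosing t among training values to minimise errors."""
    return min((x for x, _ in train_rows),
               key=lambda t: sum((x > t) != y for x, y in train_rows))


def accuracy(threshold, test_rows):
    """Fraction of held-out rows that the learned rule classifies correctly."""
    return sum((x > threshold) == y for x, y in test_rows) / len(test_rows)


# Hypothetical labelled data: temperatures 10..40 °C, labelled hot when above 25 °C.
data = [(t, t > 25) for t in range(10, 41)]
train, test = train_test_split(data)
t = train_threshold(train)
print(len(train), len(test), t, accuracy(t, test))
```

The accuracy is measured only on rows the model never saw while learning its threshold, which is the purpose of the holdout set.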

Machine Learning process steps
[Figures: Machine Learning process steps]

BASIC TYPES OF DATA IN MACHINE LEARNING
•Before starting with types of data, let's first understand what a dataset is and what the elements of a dataset are. A dataset is a collection of related information or records. The information may be on some entity or some subject area. For example (Fig.), we may have a dataset on students in which each record consists of information about a specific student. Again, we can have a dataset on student performance which has records providing performance, i.e. marks on the individual subjects.
•Each row of a dataset is called a record. Each dataset also has multiple attributes, each of which gives information on a specific characteristic.
•For example, in the dataset on students, there are four attributes, namely Roll Number, Name, Gender, and Age, each of which is a specific characteristic of the student entity.
•Attributes can also be termed features, variables, dimensions, or fields. Both datasets, Student and Student Performance, have four features or dimensions; hence they are said to have a four-dimensional data space.
•A row or record represents a point in the four-dimensional data space, as each row has specific values for each of the four attributes or features. The value of an attribute may vary from record to record. For example, if we refer to the first two records in the Student dataset, the values of the attributes Name, Gender, and Age are different (Fig.).
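A dataset with records and attributes maps naturally onto a list of dictionaries. This minimal sketch assumes hypothetical Student records with the four attributes named above; each record can then be read as a point in a four-dimensional data space.

```python
# Hypothetical Student dataset: each dict is one record (row).
students = [
    {"roll_number": 1, "name": "Student A", "gender": "F", "age": 20},
    {"roll_number": 2, "name": "Student B", "gender": "M", "age": 21},
]

# The attributes (features / dimensions) are the keys shared by every record.
attributes = list(students[0])
print(attributes)       # ['roll_number', 'name', 'gender', 'age']
print(len(attributes))  # 4 -> a four-dimensional data space

# Each record is a point in that space: one value per attribute.
points = [tuple(record[a] for a in attributes) for record in students]
print(points[0])        # (1, 'Student A', 'F', 20)
```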


Data types
•Data can broadly be divided into the following two types:
•1. Qualitative data
•2. Quantitative data
•Qualitative data provides information about the quality of an object, or information which cannot be measured. For example, if we consider the quality of performance of students in terms of ‘Good’, ‘Average’, and ‘Poor’, it falls under the category of qualitative data. Also, the name or roll number of a student is information that cannot be measured using some scale of measurement, so it falls under qualitative data. Qualitative data is also called categorical data.
•Qualitative data can be further subdivided into two types as follows:
▫1. Nominal data
▫2. Ordinal data
•Nominal data is data which has no numeric value, but a named value. It is used for assigning named values to attributes. Nominal values cannot be quantified. Examples of nominal data are
▫1. Blood group: A, B, O, AB, etc.
▫2. Nationality: Indian, American, British, etc.
▫3. Gender: Male, Female, Other

Data types-Qualitative data
•Obviously, mathematical operations such as addition, subtraction, multiplication, etc. cannot be performed on nominal data. For that reason, statistical functions such as mean, variance, etc. also cannot be applied to nominal data. However, a basic count is possible, so the mode, i.e. the most frequently occurring value, can be identified for nominal data.
•Ordinal data, in addition to possessing the properties of nominal data, can also be naturally ordered. This means ordinal data also assigns named values to attributes, but unlike nominal data, the values can be arranged in a sequence of increasing or decreasing value, so that we can say whether one value is better than or greater than another value.
•Examples of ordinal data are
▫1. Customer satisfaction: ‘Very Happy’, ‘Happy’, ‘Unhappy’, etc.
▫2. Grades: A, B, C, etc.
▫3. Hardness of Metal: ‘Very Hard’, ‘Hard’, ‘Soft’, etc.
•Like nominal data, basic counting is possible for ordinal data; hence, the mode can be identified. Since ordering is possible in the case of ordinal data, the median and quartiles can also be identified. The mean still cannot be calculated.
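The permitted statistics can be checked with Python's standard `statistics` module. This sketch assumes hypothetical blood-group (nominal) and customer-satisfaction (ordinal) responses. The ordinal labels are mapped to ranks so they can be ordered, and `median_low` is used so that the median is always an actual category rather than the average of two ranks.

```python
from statistics import mode, median_low

# Nominal data: only counting makes sense, so the mode is the usual summary.
blood_groups = ["A", "B", "O", "O", "AB", "O"]  # hypothetical
print(mode(blood_groups))  # O

# Ordinal data: categories have a natural order, so a median can be found.
order = ["Unhappy", "Happy", "Very Happy"]  # lowest to highest
rank = {label: i for i, label in enumerate(order)}
responses = ["Happy", "Very Happy", "Unhappy", "Happy", "Happy"]  # hypothetical
median_rank = median_low([rank[r] for r in responses])
print(order[median_rank])  # Happy
```

Averaging the ranks would give a number, but not a meaningful mean: the gap between ‘Unhappy’ and ‘Happy’ is not known to equal the gap between ‘Happy’ and ‘Very Happy’.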

Data types-Quantitative data
•Quantitative data relates to information about the quantity of an object, hence it can be measured. For example, if we consider the attribute ‘marks’, it can be measured using a scale of measurement. Quantitative data is also termed numeric data. There are two types of quantitative data:
▫1. Interval data
▫2. Ratio data
•Interval data is numeric data for which not only the order is known, but the exact difference between values is also known.
•An ideal example of interval data is Celsius temperature. The difference between successive values remains the same throughout the Celsius scale.
•For example, the difference between 12°C and 18°C is measurable and is 6°C, as is the difference between 15.5°C and 21.5°C. Other examples include date, time, etc.

Data types-Quantitative data
•For interval data, mathematical operations such as addition
and subtraction are possible. For that reason, for interval
data, the central tendency can be measured by mean,
median, or mode. Standard deviation can also be calculated.
•Ratio data is numeric data that has an absolute (true) zero, so in addition to differences, ratios between values are meaningful. These variables can be added, subtracted, multiplied, or divided. The central tendency can be measured by mean, median, or mode, and dispersion by methods such as standard deviation. Examples of ratio data include height, weight, age, salary, etc.
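The difference between interval and ratio data shows up when dividing values. A short sketch, assuming two hypothetical temperatures: differences in Celsius are meaningful, but the ratio 20 °C / 10 °C = 2 does not mean "twice as hot", because 0 °C is not an absolute zero. Converting to Kelvin, a scale with an absolute zero, gives a ratio-scale comparison.

```python
def celsius_to_kelvin(c):
    """Kelvin has an absolute zero, so ratios of Kelvin values are meaningful."""
    return c + 273.15


a_c, b_c = 10.0, 20.0  # hypothetical temperatures in °C

# Interval scale: differences are meaningful (and unchanged in Kelvin).
print(b_c - a_c)  # 10.0

# The raw Celsius ratio is 2.0, but this is NOT "twice as hot".
print(b_c / a_c)  # 2.0

# On the Kelvin scale the ratio is only about 1.035.
print(celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c))

# Ratio data such as weight has a true zero, so 60 kg really is twice 30 kg.
print(60.0 / 30.0)  # 2.0
```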

Data types-Quantitative data
•Apart from the approach detailed above, attributes can also be categorized into types based on the number of values that can be assigned to them. On this basis, attributes can be either discrete or continuous.
•Discrete attributes can assume a finite or countably infinite number of values. Nominal attributes such as roll number, street number, pin code, etc. can have a finite number of values, whereas numeric attributes such as count, rank of students, etc. can have countably infinite values. A special type of discrete attribute which can assume only two values is called a binary attribute. Examples of binary attributes include male/female, positive/negative, yes/no, etc.
•Continuous attributes can assume any possible value which is a real
number. Examples of continuous attribute include length, height, weight,
price, etc.
•In general, nominal and ordinal attributes are discrete. On the other
hand, interval and ratio attributes are continuous, barring a few
exceptions, e.g. ‘count’ attribute.

Data Exploration

Thank You