Machine Learning Basics: Unit 1 for Third-Year CSE Students

PCET’S Pimpri Chinchwad University
Department of Computer Science and Engineering
Course Name : Machine Learning
Course Code/Course Type : UBTML303A/PEC
TY. B.Tech
Prepared By: Dr. Sachin Jadhav

Course Objectives (CO):
•The objectives of Machine Learning are:
1.To explore the knowledge of machine learning and its types.
2.To analyze various data pre-processing methods.
3.To learn supervised learning methods.
4.To analyze the need for unsupervised learning methods.
5.To learn fundamental neural network algorithms.

Course Learning Outcomes (CLO):
Students would be able to:
1.Identify the needs and challenges of machine learning for real time
applications.
2.Apply various data pre-processing techniques to simplify and
speed up machine learning algorithms.
3.Apply appropriately supervised machine learning algorithms for
real time applications.
4.Compare and contrast different clustering algorithms.
5.Design a neural network for solving engineering problems.

UNIT I Hours-9
Syllabus contents:
•Introduction To Machine Learning:
•Introduction to Machine Learning, Comparison of
Machine learning with traditional programming, ML vs
AI vs Data Science.
•Types of learning: Supervised, Unsupervised, and semi-
supervised, reinforcement learning techniques.

Introduction
•Brief overview of Traditional Programming and
Machine Learning.
•Key question: How do these paradigms differ in solving
problems?

What is Traditional Programming?
•Definition: A programming paradigm where rules and logic
are explicitly coded by developers.
•Example Workflow:
•-Input Data + Rules →Output.
•Applications:
•-Accounting systems.
•-Static websites.
•-Games with fixed logic.

What is Machine Learning?
•Definition: A programming paradigm where systems learn
patterns from data to make predictions or decisions.
•Example Workflow:
•-Input Data + Output →Algorithm generates Rules.
•Applications:
•-Image recognition.
•-Autonomous vehicles.
•-Recommendation systems.
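
The two workflows above can be contrasted in a few lines of Python. This is only a minimal sketch: the spam scenario, the function names `is_spam_rule_based` and `learn_threshold`, and all data values are assumptions made for illustration and are not part of the original slides. The first function is a hand-written rule (traditional programming); the second derives a rule, here a simple threshold, from input data plus known outputs (machine learning).

```python
# Traditional programming: the developer writes the rule by hand.
def is_spam_rule_based(num_links: int) -> bool:
    # Hand-coded rule (hypothetical threshold chosen by the developer).
    return num_links > 3


# Machine learning: the rule (here, a threshold) is derived from examples.
def learn_threshold(samples: list[int], labels: list[bool]) -> float:
    """Pick the threshold that classifies the most training examples correctly."""
    candidates = sorted(set(samples))
    best_threshold, best_correct = 0.0, -1
    for i in range(len(candidates) - 1):
        # Try the midpoint between each pair of neighbouring values.
        threshold = (candidates[i] + candidates[i + 1]) / 2
        correct = sum((x > threshold) == y for x, y in zip(samples, labels))
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


# Hypothetical training data: links per message, and whether it was spam.
links = [0, 1, 2, 6, 7, 9]
spam = [False, False, False, True, True, True]

threshold = learn_threshold(links, spam)
print(threshold)      # the learned rule is "spam if links > threshold"
print(8 > threshold)  # prediction for a new message with 8 links
```

In the first function the rule never changes unless a developer edits it; in the second, supplying different data yields a different rule.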

Advantages of Machine Learning
•Handles complex, data-rich problems.
•Learns and improves over time.
•Automates pattern recognition tasks.

Advantages of Traditional Programming
•Simplicity for rule-based tasks.
•Easier to debug and understand.
•Works well for predefined tasks.

When to Use Machine Learning?
•Predicting outcomes (e.g., stock prices).
•Recognizing patterns (e.g., facial recognition).
•Automating decision-making (e.g., fraud detection).

When to Use Traditional Programming?
•Tasks with clearly defined rules.
•Problems where outcomes are fixed and predictable.
•Systems requiring full transparency and control.

Comparison between AI/ML/DS
•Definition:
•-Artificial Intelligence (AI): The simulation of human intelligence in machines.
•-Machine Learning (ML): A subset of AI that involves the use of algorithms to enable machines to learn from data.
•-Data Science: An interdisciplinary field that uses statistics, algorithms, and technology to extract insights from data.
•Goal:
•-Artificial Intelligence (AI): To create systems capable of performing tasks that normally require human intelligence.
•-Machine Learning (ML): To develop models that allow computers to learn from and make decisions based on data.
•-Data Science: To analyze and interpret complex data to aid decision-making.
•Techniques Used:
•-Artificial Intelligence (AI): Neural networks, natural language processing, computer vision, robotics.
•-Machine Learning (ML): Supervised learning, unsupervised learning, reinforcement learning.
•-Data Science: Data cleaning, data transformation, statistical modeling, machine learning.
•Applications:
•-Artificial Intelligence (AI): Self-driving cars, virtual assistants, robotics, game playing.
•-Machine Learning (ML): Image recognition, recommendation systems, predictive analytics.
•-Data Science: Business intelligence, market analysis, healthcare analytics, scientific research.

Comparison between AI/ML/DS (continued)
•Tools/Frameworks:
•-Artificial Intelligence (AI): TensorFlow, Keras, PyTorch, OpenCV.
•-Machine Learning (ML): Scikit-learn, XGBoost, LightGBM, TensorFlow, Keras.
•-Data Science: Python, R, SQL, Hadoop, Apache Spark, Jupyter Notebook.
•Data Requirements:
•-Artificial Intelligence (AI): Requires large amounts of diverse data for training.
•-Machine Learning (ML): Requires labeled data for supervised learning and large datasets for training.
•-Data Science: Can work with structured, unstructured, and semi-structured data.
•Expertise Needed:
•-Artificial Intelligence (AI): Knowledge of algorithms, data structures, advanced mathematics, domain-specific knowledge.
•-Machine Learning (ML): Proficiency in programming, statistics, and understanding of specific ML algorithms.
•-Data Science: Strong foundation in statistics, programming, and domain-specific knowledge.
•Outcome:
•-Artificial Intelligence (AI): Intelligent systems that can mimic human behavior.
•-Machine Learning (ML): Models that can predict or classify data.
•-Data Science: Actionable insights and data-driven decision-making.

What is Machine Learning?

A human can learn from past experience and make decisions on their own.

What is this object?
Examples seen before: CAR, CAR, BIKE, BIKE
It is a CAR.

Let us ask the same question to him.
What is this object?
?
[But he is a human being. He can observe and learn.]

Let us make him learn: show him
CAR, CAR, BIKE, BIKE

Let us ask the same question now.
What is this object?
Past experience: CAR, CAR, BIKE, BIKE
CAR

Machines follow instructions
What about a machine?
[It cannot take decisions on its own.]
We can ask a machine to perform:
•Arithmetic operations such as addition, multiplication, and division
•Comparison
•Print
•Plotting a chart

What is Machine Learning?
[We want a machine to act like a human:]
•[to identify this object.]
•[to predict the price in future.] Price in 2025?
•[to understand natural language and correct grammar.] "I made met him yesterday"
•[to recognize faces.]

What is Machine Learning?
[What do we do? Just as we did with the human, we need to provide experience to the machine.]
Dataset
[This is what we call data, or the training dataset. So, we first need to provide a training dataset to the machine.]
[Then, devise algorithms and execute programs on the data with respect to the underlying target tasks.]
[Then, using the programs, identify the required rules,]
[extract the required patterns,]
[and identify relations,]
[so that the machine can derive inferences from the data.]

In summary, what is machine learning?
Given a machine learning problem:
•Identify and create the appropriate dataset
•Perform computation to learn
•-Required rules, patterns and relations
•Output the decision

Machine Learning Paradigms
[We as human beings solve various types of problems in our day-to-day life, and various decisions need to be taken. Depending on the nature of the problem, machine learning tasks can be broadly divided into:]
•Supervised learning
•Unsupervised learning
•Reinforcement learning

What is Supervised Learning?
[In supervised learning, we need something called a labelled training dataset.]
Samples + Labels = Training Dataset
Labels in the example: CAR, CAR, BIKE, BIKE
[Given a labelled dataset, the task is to devise a function which takes the dataset and a new sample, and produces an output value.]
f(Training Dataset, new sample) = CAR

Classification
[If the possible output values of the function are predefined and discrete/categorical, it is called classification.]
[Predefined classes means it will produce output only from the labels defined in the dataset. For example, even if we input a bus, it will produce either CAR or BIKE.]
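
The idea of a function that takes the labelled dataset and a new sample and returns one of the predefined labels can be sketched as follows. This is a minimal illustration using a 1-nearest-neighbour rule, which is just one possible choice and is not named in the slides; the features (length in metres, number of wheels) and all values are hypothetical.

```python
import math

# Hypothetical labelled training dataset: (length in m, number of wheels) -> label.
training_dataset = [
    ((4.2, 4), "CAR"),
    ((4.5, 4), "CAR"),
    ((1.9, 2), "BIKE"),
    ((2.1, 2), "BIKE"),
]


def classify(dataset, new_sample):
    """Return the label of the training sample closest to new_sample (1-nearest neighbour)."""
    _, nearest_label = min(dataset, key=lambda item: math.dist(item[0], new_sample))
    return nearest_label


print(classify(training_dataset, (4.0, 4)))   # a car-like sample
print(classify(training_dataset, (12.0, 6)))  # a bus: the answer is still CAR or BIKE
```

Because the output can only come from the labels present in the dataset, the bus-like sample is forced into one of the two known classes.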

Classifier example
Dataset: Elephant, Tiger
Identify the animal? → Classifier → Elephant

Regression
[If the possible output values of the function are continuous real values, then it is called regression.]
f(Dataset, new sample) = 20500.50
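
A regression function returns a continuous value rather than a class label. The sketch below fits a straight line by ordinary least squares, which is one common regression method and is an assumption here (the slides do not name a method). The data set of house areas and prices is made up for illustration.

```python
from statistics import mean

# Hypothetical dataset: house area (sq. m) -> price. Values are made up.
areas = [50.0, 60.0, 80.0, 100.0]
prices = [10000.0, 12000.0, 16000.0, 20000.0]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(areas, prices)
predicted = slope * 90.0 + intercept  # continuous output for a new sample
print(round(predicted, 2))
```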

[Classification and regression problems are supervised because the decision depends on the characteristics of the ground-truth labels or values present in the dataset, which we define as experience.]

What is Unsupervised Learning?
[In unsupervised learning, we do not need to know the labels or ground-truth values.]
Dataset
Clustering
[The task is to identify patterns, such as grouping similar objects together.]
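
Grouping similar objects without labels can be sketched with a very small one-dimensional k-means routine. k-means is one common clustering method and is an assumption here; the function name `k_means_1d`, the starting centroids, and the data values are all hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group 1-D values around the given starting centroids (simple k-means)."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return clusters, centroids


# Hypothetical unlabelled data: vehicle lengths in metres (no CAR/BIKE labels given).
lengths = [1.9, 2.0, 2.1, 4.2, 4.4, 4.6]
clusters, centroids = k_means_1d(lengths, centroids=[1.0, 5.0])
print(clusters)
```

The routine never sees a label; it only discovers that the values fall into two groups.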

What is Unsupervised Learning?
Dataset
Association Rules Mining
[Another unsupervised task is mining association rules from the dataset.]

[Figure: more examples of unsupervised learning]

What is Reinforcement Learning?
[It is also known as learning from trial and error.]

Another Example
•Agent
•Task
•Environment

Reinforcement Learning
•Punishment
•Reward
A baby learns from trial and error.
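
Learning from trial and error with rewards and punishments can be sketched as a tiny agent that repeatedly tries actions and keeps a running estimate of how good each one is. The epsilon-greedy strategy used here is one simple choice and is an assumption, as are the function name, the reward probabilities, and the +1/-1 reward scheme.

```python
import random


def learn_by_trial_and_error(reward_probabilities, episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: tries actions, receives +1 (reward) or -1 (punishment)."""
    rng = random.Random(seed)
    n_actions = len(reward_probabilities)
    value_estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: value_estimates[a])  # exploit
        # The environment responds with a reward or a punishment.
        reward = 1 if rng.random() < reward_probabilities[action] else -1
        counts[action] += 1
        value_estimates[action] += (reward - value_estimates[action]) / counts[action]
    return value_estimates


# Hypothetical environment: action 1 is rewarded far more often than action 0.
estimates = learn_by_trial_and_error([0.1, 0.9])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action)
```

The agent is never told which action is correct; it only observes the reward or punishment that follows each trial.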

2/3/2025 PIMPRI CHINCHWAD UNIVERSITY

Machine Learning Activities
•The first step in any machine learning activity starts with data. In case of supervised learning, it is the labelled training data set followed by test data which is not labelled.
•In case of unsupervised learning, there is no question of labelled data, but the task is to find patterns in the input data.
•A thorough review and exploration of the data is needed to understand the type of the data, the quality of the data and the relationship between the different data elements.
•Based on that, multiple pre-processing activities may need to be done on the input data before we can go ahead with core machine learning activities.

Machine Learning Activities
•Following are the typical preparation activities done once the input data comes into the machine
learning system:
1.Understand the type of data in the given input data set.
2.Explore the data to understand the nature and quality.
3.Explore the relationships amongst the data elements, e.g. inter-feature relationship.
4.Find potential issues in data.
5.Do the necessary remediation, e.g. impute missing data values, etc., if needed.
6.Apply pre-processing steps, as necessary.
7.Once the data is prepared for modelling, the learning tasks start off. As a part of it, do the
following activities:
   1.Divide the input data into two parts: the training data and the test data (called
   holdout). This step is applicable for supervised learning only.
   2.Consider different models or learning algorithms for selection.
   3.Train the model based on the training data for a supervised learning problem and apply it to
   unknown data. For an unsupervised learning problem, directly apply the chosen unsupervised
   model to the input data.
8.After the model is selected, trained (for supervised learning), and applied on input data, the
performance of the model is evaluated. Based on options available, specific actions can be taken to
improve the performance of the model, if possible.
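
The holdout step and the final evaluation step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `holdout_split` and `accuracy`, the 25% test fraction, the pass/fail data, and the very simple "model" are all hypothetical and are not taken from the slides.

```python
import random


def holdout_split(records, test_fraction=0.25, seed=42):
    """Shuffle the records and divide them into training data and test data (holdout)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, labelled_records):
    """Fraction of labelled records for which predict(features) equals the true label."""
    correct = sum(predict(features) == label for features, label in labelled_records)
    return correct / len(labelled_records)


# Hypothetical labelled data: (hours studied, result).
data = [(h, "PASS" if h >= 5 else "FAIL") for h in range(1, 13)]
train, test = holdout_split(data)

# "Training": learn the smallest number of hours seen with a PASS label.
pass_hours = min(h for h, label in train if label == "PASS")


def model(hours):
    return "PASS" if hours >= pass_hours else "FAIL"


print(len(train), len(test))
print(accuracy(model, test))  # performance is measured on data not used for training
```

Evaluating on the held-out test data, rather than on the training data, gives a fairer picture of how the model behaves on unknown data.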

[Figure: Machine Learning process steps]

BASIC TYPES OF DATA IN MACHINE LEARNING
•Before starting with types of data, let’s first understand what a data set is and what the elements of a data set are. A data set is a collection of related information or records. The information may be on some entity or some subject area. For example (Fig.), we may have a data set on students in which each record consists of information about a specific student. Again, we can have a data set on student performance which has records providing performance, i.e. marks on the individual subjects.
•Each row of a data set is called a record. Each data set also has multiple attributes, each of which gives information on a specific characteristic.
•For example, in the data set on students, there are four attributes, namely Roll Number, Name, Gender, and Age, each of which is a specific characteristic of the student entity.
•Attributes can also be termed feature, variable, dimension or field. Both data sets, Student and Student Performance, have four features or dimensions; hence they are said to have a four-dimensional data space.
•A row or record represents a point in the four-dimensional data space, as each row has specific values for each of the four attributes or features. The value of an attribute may vary from record to record. For example, if we refer to the first two records in the Student data set, the values of the attributes Name, Gender, and Age are different (Fig.).
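
The terms record, attribute and dimension can be made concrete with a small in-memory data set. This is only a sketch: the attribute names follow the Student example in the text, but every value below is made up, since the original figure is not reproduced here.

```python
# Hypothetical Student data set: each dict is one record (row),
# and each key is an attribute (feature). All values are made up.
students = [
    {"Roll Number": 1, "Name": "Asha", "Gender": "F", "Age": 20},
    {"Roll Number": 2, "Name": "Ravi", "Gender": "M", "Age": 21},
    {"Roll Number": 3, "Name": "Meera", "Gender": "F", "Age": 20},
]

attributes = list(students[0].keys())
n_records = len(students)
n_dimensions = len(attributes)  # four attributes -> four-dimensional data space

# One record is one point in the four-dimensional data space.
first_point = tuple(students[0][a] for a in attributes)

print(attributes)
print(n_records, n_dimensions)
print(first_point)
```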


Data types
•Data can broadly be divided into the following two types:
•1.Qualitative data
•2.Quantitative data
•Qualitative data provides information about the quality of an object or information which cannot be measured. For example, if we consider the quality of performance of students in terms of ‘Good’, ‘Average’, and ‘Poor’, it falls under the category of qualitative data. Also, name or roll number of students are information that cannot be measured using some scale of measurement. So they would fall under qualitative data. Qualitative data is also called categorical data.
•Qualitative data can be further subdivided into two types as follows:
▫1.Nominal data
▫2.Ordinal data
•Nominal data is one which has no numeric value, but a named value. It is used for assigning named values to attributes. Nominal values cannot be quantified. Examples of nominal data are
▫1.Blood group: A, B, O, AB, etc.
▫2.Nationality: Indian, American, British, etc.
▫3.Gender: Male, Female, Other

Data types-Qualitative data
•It is obvious that mathematical operations such as addition, subtraction, multiplication, etc. cannot be performed on nominal data. For that reason, statistical functions such as mean, variance, etc. can also not be applied on nominal data. However, a basic count is possible. So the mode, i.e. the most frequently occurring value, can be identified for nominal data.
•Ordinal data, in addition to possessing the properties of nominal data, can also be naturally ordered. This means ordinal data also assigns named values to attributes, but unlike nominal data, they can be arranged in a sequence of increasing or decreasing value so that we can say whether a value is better than or greater than another value.
•Examples of ordinal data are
▫1. Customer satisfaction: ‘Very Happy’, ‘Happy’, ‘Unhappy’, etc.
▫2. Grades: A, B, C, etc.
▫3. Hardness of Metal: ‘Very Hard’, ‘Hard’, ‘Soft’, etc.
•Like nominal data, basic counting is possible for ordinal data. Hence, the
mode can be identified. Since ordering is possible in case of ordinal data,
median, and quartiles can be identified in addition. Mean can still not be
calculated.
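
The statistics that are meaningful for each qualitative type can be sketched with Python's standard library. The sample values below are made up; the ordering of the satisfaction levels follows the example in the text.

```python
from collections import Counter
from statistics import median_low

# Nominal data: only counting is meaningful, so the mode can be found.
blood_groups = ["A", "B", "O", "O", "AB", "O", "A"]
mode_value = Counter(blood_groups).most_common(1)[0][0]

# Ordinal data: values have a natural order, so the median can also be found.
order = ["Unhappy", "Happy", "Very Happy"]  # lowest to highest
responses = ["Happy", "Very Happy", "Unhappy", "Happy", "Very Happy"]
ranks = sorted(order.index(r) for r in responses)
median_value = order[median_low(ranks)]  # median_low always returns an actual rank

print(mode_value)
print(median_value)
```

The ranks are used only to sort the values; averaging them would wrongly assume that the gaps between levels are equal, which is why the mean is still not computed.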

Data types-Quantitative data
•Quantitative data relates to information about the quantity of an object, hence it can be measured. For example, if we consider the attribute ‘marks’, it can be measured using a scale of measurement. Quantitative data is also termed numeric data. There are two types of quantitative data:
▫1.Interval data
▫2.Ratio data
•Interval data is numeric data for which not only the order is known, but the exact difference between values is also known.
•An ideal example of interval data is Celsius temperature. The difference between each value remains the same in Celsius temperature.
•For example, the difference between 12°C and 18°C is measurable and is 6°C, as in the case of the difference between 15.5°C and 21.5°C. Other examples include date, time, etc.

Data types-Quantitative data
•For interval data, mathematical operations such as addition
and subtraction are possible. For that reason, for interval
data, the central tendency can be measured by mean,
median, or mode. Standard deviation can also be calculated.
•Ratio data represents numeric data for which the exact value
can be measured. Absolute zero is available for ratio data.
Also, these variables can be added, subtracted, multiplied, or
divided. The central tendency can be measured by mean,
median, or mode, and dispersion by methods such as
standard deviation. Examples of ratio data include height,
weight, age, salary, etc.
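
The practical difference between interval and ratio data can be sketched numerically. The values below are made up. The Kelvin conversion is added here as an illustration of a temperature scale that does have an absolute zero; it does not appear in the original slides.

```python
from statistics import mean, stdev

# Interval data (Celsius): differences are meaningful, ratios are not.
t1_c, t2_c = 10.0, 20.0
difference_c = t2_c - t1_c  # a 10-degree difference is meaningful
# 20 °C is not "twice as hot" as 10 °C; on a scale with absolute zero the ratio is about 1.035.
ratio_kelvin = (t2_c + 273.15) / (t1_c + 273.15)

# Ratio data (weights in kg, made-up values): all four operations are meaningful.
weights = [50.0, 60.0, 70.0]
ratio_weights = weights[2] / weights[0]  # 70 kg is 1.4 times 50 kg

print(difference_c, round(ratio_kelvin, 3))
print(mean(weights), stdev(weights), ratio_weights)
```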

Data types-Quantitative data
•Apart from the approach detailed above, attributes can also be categorized
into types based on the number of values that can be assigned. The
attributes can be either discrete or continuous based on this factor.
•Discrete attributes can assume a finite or countably infinite number of
values. Nominal attributes such as roll number, street number, pin code, etc.
can have a finite number of values whereas numeric attributes such as count,
rank of students, etc. can have countably infinite values. A special type of
discrete attribute which can assume two values only is called binary
attribute. Examples of binary attribute include male/ female,
positive/negative, yes/no, etc.
•Continuous attributes can assume any possible value which is a real
number. Examples of continuous attribute include length, height, weight,
price, etc.
•In general, nominal and ordinal attributes are discrete. On the other
hand, interval and ratio attributes are continuous, barring a few
exceptions, e.g. ‘count’ attribute.


Thank You
