Natural Language Processing of applications.pdf


Module 3
Supervised Learning Part 2:
• KNN Algorithm
• Support Vector Machine (SVM)
• Decision Tree

K-Nearest Neighbors Algorithm
• The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier, which uses proximity to make classifications or predictions about the grouping of an individual data point.
• While it can be used for either regression or classification problems, it is typically used as a classification algorithm, working off the assumption that similar points can be found near one another.
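As a quick illustration of classification by proximity, here is a minimal sketch using scikit-learn's KNeighborsClassifier; the bundled iris dataset and k = 5 are illustrative choices, not something prescribed by these slides.

# Minimal KNN classification sketch (assumes scikit-learn; iris and k=5 are illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # label by majority vote of the 5 nearest points
knn.fit(X_train, y_train)                   # "lazy" learner: fitting mostly just stores the data
print(knn.score(X_test, y_test))            # accuracy on held-out data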

K-Nearest Neighbors Algorithm
• For classification problems, a class label is assigned on the basis of a majority vote, i.e. the label that is most frequently represented around a given data point is used.
• While this is technically considered "plurality voting", the term "majority vote" is more commonly used in the literature.
• The distinction between these terminologies is that "majority voting" technically requires a majority of greater than 50%, which primarily works when there are only two categories. When you have multiple classes, e.g. four categories, you don't necessarily need 50% of the vote to make a conclusion about a class; you could assign a class label with a vote of greater than 25%.

K-Nearest Neighbors Algorithm
• Regression problems use a similar concept to classification problems, but in this case the average of the k nearest neighbors is taken to make a prediction.
• The main distinction is that classification is used for discrete values, whereas regression is used with continuous ones. However, before a prediction can be made, the distance must be defined.
• Euclidean distance is the most commonly used, which we'll delve into more below.

K-Nearest Neighbors Algorithm
• It's also worth noting that the KNN algorithm is part of a family of "lazy learning" models, meaning that it only stores a training dataset versus undergoing a training stage.
• This also means that all the computation occurs when a classification or prediction is being made. Since it heavily relies on memory to store all its training data, it is also referred to as an instance-based or memory-based learning method.

K-Nearest Neighbors Algorithm
• Evelyn Fix and Joseph Hodges are credited with the initial ideas around the KNN model in their 1951 paper, while Thomas Cover expands on their concept in his research "Nearest Neighbor Pattern Classification."
• While it's not as popular as it once was, it is still one of the first algorithms one learns in data science/data analytics due to its simplicity and accuracy.
• However, as a dataset grows, KNN becomes increasingly inefficient, compromising overall model performance.

K-Nearest Neighbors Algorithm
• It is commonly used for
• simple recommendation systems,
• pattern recognition,
• data mining,
• financial market predictions,
• intrusion detection, and more.

Compute KNN: distance metrics
• To recap, the goal of the k-nearest neighbor algorithm is to identify the nearest neighbors of a given query point, so that we can assign a class label to that point. In order to do this, KNN has a few requirements:
Determine your distance metrics
• In order to determine which data points are closest to a given query point, the distance between the query point and the other data points will need to be calculated.
• These distance metrics help to form decision boundaries, which partition query points into different regions.

Compute KNN: distance metrics
Determine your distance metrics
• You will commonly see decision boundaries visualized with Voronoi diagrams.

Compute KNN: distance metrics
Determine your distance metrics
• While there are several distance measures that you can choose from, we will only cover the following:
Euclidean distance:
• This is the most commonly used distance measure, and it is limited to real-valued vectors.
• This is nothing but the Cartesian distance between two points in the plane/hyperplane.

Compute KNN: distance metrics
Euclidean distance:
• Euclidean distance can also be visualized as the length of the straight line that joins the two points under consideration.
• This metric gives the net displacement between the two states of an object.

Compute KNN: distance metrics
Manhattan distance
• This distance metric is generally used when we are interested in the total distance traveled by the object instead of the displacement.
• This metric is calculated by summing the absolute differences between the coordinates of the points in n dimensions.

Compute KNN: distance metrics
Minkowski distance
• We can say that the Euclidean, as well as the Manhattan distance, are special cases of the Minkowski distance, D(x, y) = (Σi |xi − yi|^p)^(1/p).
• From this formula we can see that when p = 2 it is the same as the formula for the Euclidean distance, and when p = 1 we obtain the formula for the Manhattan distance.

Compute KNN: distance metrics
• The above-discussed metrics are the most common while dealing with a machine learning problem, but there are other distance metrics as well, like Hamming distance, which come in handy while dealing with problems that require overlapping comparisons between two vectors whose contents can be Boolean as well as string values.
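To make the metrics above concrete, here is a small sketch of the Euclidean, Manhattan, Minkowski and Hamming distances using NumPy/SciPy; the sample vectors are made up for illustration.

# Distance metric sketch (NumPy/SciPy; the vectors a and b are illustrative).
import numpy as np
from scipy.spatial import distance

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))           # straight-line (net displacement)
manhattan = np.sum(np.abs(a - b))                   # total distance along the axes
minkowski = np.sum(np.abs(a - b) ** 3) ** (1 / 3)   # p = 3; p = 2 gives Euclidean, p = 1 gives Manhattan
hamming = distance.hamming([1, 0, 1], [0, 0, 1])    # fraction of positions that differ

print(euclidean, manhattan, minkowski, hamming)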

How to choose the value of k for the KNN Algorithm?
• The value of k is crucial in the KNN algorithm: it defines the number of neighbors considered.
• The value of k in the k-nearest neighbors (k-NN) algorithm should be chosen based on the input data. If the input data has more outliers or noise, a higher value of k would be better.
• It is recommended to choose an odd value for k to avoid ties in classification.
• Cross-validation methods can help in selecting the best k value for the given dataset.
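One way to pick k by cross-validation, as suggested above, is a plain grid search; this is only a sketch, and the dataset and range of odd k values are arbitrary choices.

# Choosing k by cross-validation (sketch; iris and the k range are arbitrary).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
param_grid = {"n_neighbors": list(range(1, 31, 2))}   # odd values of k to avoid ties
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)   # best k found for this dataset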

Applications of the KNN Algorithm
• Data Preprocessing – While dealing with any machine learning problem, we first perform the EDA part, in which we may find that the data contains missing values; multiple imputation methods are available to handle them. One such method is the KNNImputer, which is quite effective and generally used for sophisticated imputation methodologies (see the sketch after this list).
• Pattern Recognition – KNN algorithms work very well here: if you train a KNN classifier on the MNIST dataset (Modified National Institute of Standards and Technology database; a collection of 70,000 28x28 images of handwritten digits from 0 to 9, already divided into training and testing sets) and then evaluate it, you will find that the accuracy is quite high.

Applications of the KNN Algorithm
• Recommendation Engines – The main task performed by a KNN algorithm is to assign a new query point to a pre-existing group that has been created using a huge corpus of data.
• This is exactly what is required in recommender systems: assign each user to a particular group and then provide them recommendations based on that group's preferences.
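The KNNImputer mentioned above fills a missing value from the k nearest complete rows; a minimal sketch (the tiny array is made up):

# KNN-based imputation sketch (scikit-learn KNNImputer; the data are illustrative).
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0]])
imputer = KNNImputer(n_neighbors=2)    # impute from the 2 nearest rows
print(imputer.fit_transform(X))        # the NaN is replaced by an average of its neighbours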

Advantages of the KNN Algorithm
• Easy to implement, as the complexity of the algorithm is not that high.
• Adapts easily – The KNN algorithm stores all the data in memory, so whenever a new example or data point is added, the algorithm adjusts itself to that new example and lets it contribute to future predictions as well.
• Few hyperparameters – The only parameters required in training a KNN algorithm are the value of k and the choice of distance metric.

Disadvantages of the KNN Algorithm
• Does not scale – The KNN algorithm is considered a lazy algorithm.
• The main significance of this term is that it takes a lot of computing power as well as data storage, which makes the algorithm both time-consuming and resource-exhausting.
• Curse of dimensionality – Because of the peaking phenomenon, the KNN algorithm is affected by the curse of dimensionality, which implies the algorithm has a hard time classifying data points properly when the dimensionality is too high.

Disadvantages of the KNN Algorithm
• Prone to overfitting – As the algorithm is affected by the curse of dimensionality, it is prone to the problem of overfitting as well.
• Hence, feature selection as well as dimensionality reduction techniques are generally applied to deal with this problem.


Introduction to
Support Vector Machines (SVM)

History of SVM
◼ SVM is related to statistical learning theory
◼ SVM was first introduced in 1992
◼ SVM became popular because of its success in handwritten digit recognition
◼ 1.1% test error rate for SVM, the same as the error rate of a carefully constructed neural network, LeNet-4.

Support Vector Machines (SVM)
Can be used for both kinds of supervised learning problems:
• Classification (SVM classifier), for binary as well as multi-class classification
• Regression (SVM regressor), which works similarly to linear regression but with some additional features.
One of the most effective and popular algorithms in industry and academia.

Cost Function
In this cost function:
- "C" is a hyperparameter that controls how many misclassified points are tolerated.
- "ξ" (pronounced "xi") is the summation of the distances of the misclassified points from their margin boundaries.
Together they determine what the best-fit separating line should look like.
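The formula itself did not survive extraction; the standard soft-margin SVM objective the slide is describing has the form below, where C weighs the total slack Σξi against the margin term.

\min_{w,\,b,\,\xi}\ \ \tfrac{1}{2}\lVert w\rVert^{2} \;+\; C\sum_{i}\xi_{i}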

Pros & Cons of SVM/C
Pros
• Accuracy
• Works well on smaller, cleaner datasets
• It can be more efficient because it uses a subset of training points
Cons
• Isn't suited to larger datasets as the training time with SVMs can be high
• Less effective on noisier datasets with overlapping classes

Pros & Cons of SVM/C
SVM Uses
SVM is used for
• text classification tasks such as category assignment,
• detecting spam, and
• sentiment analysis.
It is also commonly used for
• image recognition challenges,
• performing particularly well in aspect-based recognition and color-based classification.
SVM also plays a vital role in many areas of
• handwritten digit recognition, such as postal automation services.

Available Data Sets in Sklearn
Scikit-learn makes available a host of datasets for testing learning algorithms.
They come in three flavors:
Packaged Data: these small datasets are packaged with the scikit-learn installation, and can be loaded using the tools in sklearn.datasets.load_*
Downloadable Data: these larger datasets are available for download, and scikit-learn includes tools which streamline this process. These tools can be found in sklearn.datasets.fetch_*
Generated Data: there are several datasets which are generated from models based on a random seed. These are available in sklearn.datasets.make_*

Available Data Sets in Sklearn
You can explore the available dataset loaders, fetchers, and generators using IPython's tab-completion functionality. After importing the datasets submodule from sklearn, type
datasets.load_<TAB>
or
datasets.fetch_<TAB>
or
datasets.make_<TAB>
to see a list of available functions.
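A short sketch of the three flavours; the specific loaders chosen here are just examples (the fetch_* call needs network access on first use).

# scikit-learn dataset access sketch: packaged, downloadable, and generated data.
from sklearn import datasets

iris = datasets.load_iris()                    # packaged with the installation
faces = datasets.fetch_olivetti_faces()        # downloaded and cached on first use
X, y = datasets.make_classification(n_samples=100, n_features=5, random_state=0)  # generated

print(iris.data.shape, faces.images.shape, X.shape)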

Linear Classifiers
f(x, w, b) = sign(w · x + b)
(Figure: 2-D data points, where one marker denotes class +1 and the other denotes class −1.)
How would you classify this data?
Estimation:
w: weight vector
x: data vector


Linear Classifiers
f(x, w, b) = sign(w · x + b)
(Figure: the same data with several candidate separating lines drawn.)
Any of these would be fine..
..but which is best?

Classifier Margin
f(x, w, b) = sign(w · x + b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Maximum Margin
f(x, w, b) = sign(w · x + b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
This is the simplest kind of SVM (called an LSVM): the Linear SVM.
Support vectors are those datapoints that the margin pushes up against.

Why Maximum Margin?
f(x, w, b) = sign(w · x + b)
The maximum margin linear classifier is the linear classifier with the maximum margin; the support vectors are the datapoints that the margin pushes up against.
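A minimal sketch of a maximum-margin (linear) SVM in scikit-learn; the toy points are made up, and support_vectors_ exposes exactly the data points the margin pushes up against.

# Linear SVM sketch (scikit-learn SVC with a linear kernel; the toy data are illustrative).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2],     # class -1
              [4, 4], [5, 4], [4, 5]])    # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)         # very large C approximates a hard margin
clf.fit(X, y)
print(clf.coef_, clf.intercept_)          # w and b in f(x) = sign(w . x + b)
print(clf.support_vectors_)               # the points lying on the margin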

How to calculate the distance from a point to a line?
◼ http://mathworld.wolfram.com/Point-LineDistance2-Dimensional.html
◼ In our case the line is w1·x1 + w2·x2 + b = 0,
◼ thus w = (w1, w2), x = (x1, x2)
(Figure: the separating line w·x + b = 0 with its normal vector W.)
X – vector
W – normal vector
b – scale value

Estimate the Margin
• What is the distance expression for a point x to the line w·x + b = 0?
d(x) = |w·x + b| / ||w||₂ = |w·x + b| / sqrt(w1² + … + wd²)
X – vector
W – normal vector
b – scale value
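A quick numeric check of the distance expression above; the values of w, b and x are made up.

# Point-to-hyperplane distance sketch: d = |w . x + b| / ||w||  (values are illustrative).
import numpy as np

w = np.array([3.0, 4.0])
b = -5.0
x = np.array([2.0, 1.0])

d = abs(np.dot(w, x) + b) / np.linalg.norm(w)
print(d)   # |3*2 + 4*1 - 5| / 5 = 1.0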

Large-margin Decision Boundary
• The decision boundary should be as far away from the data of both classes as possible
• We should maximize the margin, m
• The distance between the origin and the line wᵀx + b = 0 is |b| / ||w||
(Figure: Class 1 and Class 2 separated by the decision boundary, with margin m between them.)

Finding the Decision Boundary
• Let {x1, ..., xn} be our data set and let yi ∈ {1, −1} be the class label of xi
• The decision boundary should classify all points correctly
• To see this: when yi = −1, we want (w·xi + b) ≤ −1; when yi = 1, we want (w·xi + b) ≥ 1. For support vectors, we want y(w·x + b) = 1.
• The decision boundary can be found by solving the constrained optimization problem sketched below

Next step… Optional
•Converting SVM to a form we can solve
•Dual form
•Allowing a few errors
•Soft margin
•Allowing nonlinear boundary
•Kernel functions

The Dual Problem (we ignore the derivation)
• The new objective function is in terms of the αi only
• It is known as the dual problem: if we know w, we know all αi; if we know all αi, we know w
• The original problem is known as the primal problem
• The objective function of the dual problem needs to be maximized!
• The dual problem is therefore stated with constraints that come from the properties of the αi when we introduce the Lagrange multipliers, and from the result when we differentiate the original Lagrangian w.r.t. b (see the sketch below)

The Dual Problem
• This is a quadratic programming (QP) problem
• A global maximum of the αi can always be found
• w can be recovered from the αi (see below)
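The dual problem these slides refer to (the formulas were lost in extraction) is, in its standard form:

\max_{\alpha}\ \ \sum_{i}\alpha_{i} \;-\; \tfrac{1}{2}\sum_{i}\sum_{j}\alpha_{i}\alpha_{j}\,y_{i}y_{j}\,x_{i}^{\top}x_{j}
\quad \text{subject to} \quad
\alpha_{i} \ge 0, \qquad \sum_{i}\alpha_{i}y_{i} = 0,

with w recovered as w = \sum_{i}\alpha_{i}y_{i}x_{i}.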

Characteristics of the Solution
• Many of the αi are zero (see the next slide for an example)
• w is a linear combination of a small number of data points
• This "sparse" representation can be viewed as data compression, as in the construction of a k-NN classifier
• The xi with non-zero αi are called support vectors (SV)
• The decision boundary is determined only by the SVs
• Let tj (j = 1, ..., s) be the indices of the s support vectors; we can write w in terms of them
• For testing with new data z: compute the decision function (see below) and classify z as class 1 if the sum is positive, and class 2 otherwise
• Note: w need not be formed explicitly
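The quantity to compute for a new point z (the slide's formula was lost) is the standard support-vector form of the decision function:

f(z) \;=\; \sum_{j=1}^{s} \alpha_{t_{j}}\, y_{t_{j}}\, \big(x_{t_{j}}^{\top} z\big) \;+\; b,

and z is assigned to class 1 if f(z) > 0, class 2 otherwise.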

A Geometrical Interpretation
(Figure: Class 1 and Class 2 with ten training points; only the points touching the margin receive non-zero multipliers, e.g. α1 = 0.8, α6 = 1.4, α8 = 0.6, while α2 = α3 = α4 = α5 = α7 = α9 = α10 = 0.)

Allowing errors in our solutions
• We allow an "error" ξi in classification; it is based on the output of the discriminant function wᵀx + b
• ξi approximates the number of misclassified samples
(Figure: Class 1 and Class 2 overlapping, with a few points on the wrong side of the margin.)

Soft Margin Hyperplane
• If we minimize Σi ξi, the ξi can be computed from the data
• The ξi are "slack variables" in the optimization
• Note that ξi = 0 if there is no error for xi
• Σi ξi is an upper bound on the number of errors
• We want to minimize the margin term plus the total slack
• C: tradeoff parameter between error and margin
• The optimization problem becomes the one sketched below
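Written out in the standard soft-margin form, the optimization problem the slide ends on becomes:

\min_{w,\,b,\,\xi}\ \ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i}\xi_{i}
\quad \text{subject to} \quad
y_{i}\,(w^{\top}x_{i} + b) \ \ge\ 1 - \xi_{i}, \qquad \xi_{i} \ge 0.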

Extension to Non-linear Decision Boundary
• So far, we have only considered a large-margin classifier with a linear decision boundary
• How to generalize it to become nonlinear?
• Key idea: transform xi to a higher dimensional space to "make life easier"
• Input space: the space where the points xi are located
• Feature space: the space of φ(xi) after transformation

Transforming the Data
• Computation in the feature space can be costly because it is high dimensional
• The feature space is typically infinite-dimensional!
• The kernel trick comes to the rescue
(Figure: points in the input space mapped by φ(·) into the feature space; note that the feature space is of higher dimension than the input space in practice.)

The Kernel Trick
• Recall the SVM optimization problem
• The data points only appear as inner products
• As long as we can calculate the inner product in the feature space, we do not need the mapping explicitly
• Many common geometric operations (angles, distances) can be expressed by inner products
• Define the kernel function K by K(xi, xj) = φ(xi)ᵀφ(xj)

An Example for φ(·) and K(·,·)
• Suppose φ(·) is given explicitly
• An inner product in the feature space can then be computed directly from the original inputs
• So, if we define the kernel function accordingly, there is no need to carry out φ(·) explicitly
• This use of a kernel function to avoid carrying out φ(·) explicitly is known as the kernel trick

More on Kernel Functions
• Not all similarity measures can be used as kernel functions, however
• The kernel function needs to satisfy the Mercer condition, i.e., the function is "positive-definite"
• This implies that the n-by-n kernel matrix, in which the (i, j)-th entry is K(xi, xj), is always positive semi-definite
• This also means that the optimization problem can be solved in polynomial time!

Examples of Kernel Functions
• Polynomial kernel with degree d
• Radial basis function (RBF) kernel with width σ
• Closely related to radial basis function neural networks
• The feature space is infinite-dimensional
• Sigmoid kernel with parameters κ and θ
• It does not satisfy the Mercer condition for all κ and θ
(The formulas are sketched below.)
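The formulas behind the three bullets above (lost in extraction) are the standard ones:

K(x, y) = (x^{\top}y + 1)^{d}                                        % polynomial, degree d
K(x, y) = \exp\!\big(-\lVert x - y\rVert^{2} / (2\sigma^{2})\big)    % RBF, width \sigma
K(x, y) = \tanh\!\big(\kappa\, x^{\top}y + \theta\big)               % sigmoid, parameters \kappa, \theta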

Non-linear SVMs: Feature Spaces
◼ General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)

Example
• Suppose we have 5 one-dimensional data points: x1 = 1, x2 = 2, x3 = 4, x4 = 5, x5 = 6, with 1, 2, 6 as class 1 and 4, 5 as class 2, i.e. y1 = 1, y2 = 1, y3 = −1, y4 = −1, y5 = 1
• We use the polynomial kernel of degree 2: K(x, y) = (xy + 1)²
• C is set to 100
• We first find the αi (i = 1, …, 5) by solving the dual problem

Example
• By using a QP solver, we get α1 = 0, α2 = 2.5, α3 = 0, α4 = 7.333, α5 = 4.833
• Note that the constraints are indeed satisfied
• The support vectors are {x2 = 2, x4 = 5, x5 = 6}
• The discriminant function is built from these support vectors
• b is recovered by solving f(2) = 1, or f(5) = −1, or f(6) = 1, since x2 and x5 lie on the margin line f(x) = 1 and x4 lies on the line f(x) = −1
• All three give b = 9
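The worked example can be reproduced (approximately) with scikit-learn; with gamma=1, coef0=1, degree=2 the built-in polynomial kernel is exactly (xy + 1)², and the printed support vectors and intercept should roughly match the values above.

# Sketch reproducing the 1-D polynomial-kernel example with scikit-learn.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0], [2.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, -1, 1])

clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=100.0)   # K(x, y) = (xy + 1)^2
clf.fit(X, y)

print(clf.support_vectors_)   # expected per the slides: x = 2, 5, 6
print(clf.dual_coef_)         # y_i * alpha_i for the support vectors
print(clf.intercept_)         # expected per the slides: about 9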

Example
(Figure: the value of the discriminant function plotted over x = 1 … 6; the points x = 4, 5 fall in class 2, while x = 1, 2 and x = 6 fall in class 1.)

(Figure: degree of polynomial vs. the polynomial features x¹, x², x³, x⁴, x⁵, x⁶.)

Choosing the Kernel Function
•Probably the most tricky part of using SVM.

Software
• A list of SVM implementations can be found at http://www.kernel-machines.org/software.html
• Some implementations (such as LIBSVM) can handle multi-class classification
• SVMLight is among the earliest implementations of SVM
• Several Matlab toolboxes for SVM are also available

Summary: Steps for Classification
• Prepare the pattern matrix
• Select the kernel function to use
• Select the parameters of the kernel function and the value of C
• You can use the values suggested by the SVM software, or you can set apart a validation set to determine the values of the parameters
• Execute the training algorithm and obtain the αi
• Unseen data can be classified using the αi and the support vectors
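These steps map directly onto a small scikit-learn sketch; the dataset and parameter grid below are arbitrary examples of using a validation strategy to pick the kernel, its parameters and C.

# End-to-end SVM classification sketch: choose kernel parameters and C by cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                  # the "pattern matrix" and labels
param_grid = {"kernel": ["rbf", "poly"],
              "C": [0.1, 1, 10, 100],
              "gamma": ["scale", 0.1, 1.0]}
search = GridSearchCV(SVC(), param_grid, cv=5)     # a validation strategy picks the parameters
search.fit(X, y)                                   # training yields the alpha_i / support vectors
print(search.best_params_, search.best_score_)
# Unseen data would then be classified with search.predict(new_X).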

Summary
SVM regression / SVR:
• SVM regression or Support Vector Regression (SVR) is a machine learning algorithm used for regression analysis.
• It is different from traditional linear regression methods as it finds a hyperplane that best fits the data points in a continuous space, instead of fitting a line to the data points.
• The SVR algorithm aims to find the hyperplane that passes through as many data points as possible within a certain distance, called the margin.

Summary
SVM regression / SVR:
• This approach helps to reduce the prediction error and allows SVR to handle non-linear relationships between input variables and the target variable using a kernel function.
• As a result, SVM regression is a powerful tool for regression tasks where there may be complex relationships between the input variables and the target variable.
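A minimal SVR sketch in scikit-learn; the noisy sine data, RBF kernel and epsilon value are illustrative choices.

# Support Vector Regression sketch (scikit-learn SVR; data and parameters are illustrative).
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)      # noisy, non-linear continuous target

svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)     # epsilon sets the tube (margin) around the fit
svr.fit(X, y)
print(svr.predict([[2.5]]))                      # continuous prediction for a new input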

Summary
Difference between SVM/C and SVR:
• SVM/C is a classification algorithm that separates data points into different classes with a hyperplane while minimizing the misclassification error.
• On the other hand, SVR (Support Vector Regression) is a regression algorithm that finds a hyperplane that best fits data points in a continuous space while minimizing the prediction error.
• SVM/C is used for categorical target variables, while SVR is used for continuous target variables.

Summary
Applications of SVM regression / SVR:
• SVM regression or Support Vector Regression (SVR) has a wide range of applications in various fields.
• It is commonly used in
• finance for predicting stock prices,
• engineering for predicting machine performance, and
• bioinformatics for predicting protein structures.

Summary
Applications of SVM regression / SVR:
• SVR is also used in
• natural language processing for text classification and
• sentiment analysis.
• Additionally, it is used in
• image processing for object recognition and
• healthcare for predicting medical outcomes.
• Overall, SVM regression is a versatile algorithm that can be used in many domains for making accurate predictions.

Summary
• SVM is a useful alternative to neural networks
• Two key concepts of SVM: maximize the margin and the kernel trick
• Many SVM implementations are available on the web for you to try on your dataset!
• SVR acknowledges the presence of non-linearity in the data and provides a proficient prediction model.

DECISION TREES
• Definition 1: A decision tree is a supervised learning algorithm, which is utilized for both classification and regression tasks. It has a hierarchical, tree structure, which consists of a root node, branches, internal nodes and leaf nodes.
• Definition 2: It is a tree-structured model that splits the data into smaller subsets based on the most significant input feature, recursively dividing the data until a pure subset is achieved. Each split of the tree results in a decision node, and each leaf of the tree represents a prediction.
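A minimal decision-tree classification sketch in scikit-learn; the dataset and depth limit are illustrative choices.

# Decision tree classification sketch (scikit-learn; iris and max_depth=3 are illustrative).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)   # root, internal and leaf nodes
tree.fit(data.data, data.target)
print(export_text(tree, feature_names=list(data.feature_names)))   # the learned splits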


Decision Tree Regression
Decision tree regression observes the features of an object and trains a model in the structure of a tree to predict data in the future, producing meaningful continuous output. Continuous output means that the output/result is not discrete, i.e., it is not represented just by a discrete, known set of numbers or values.
Continuous output example: a profit prediction model that states the probable profit that can be generated from the sale of a product.
Here, continuous values are predicted with the help of a decision tree regression model.
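A minimal decision-tree regression sketch producing continuous output; the synthetic, profit-like data are made up.

# Decision tree regression sketch (scikit-learn; the synthetic data are illustrative).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(10 * rng.rand(100, 1), axis=0)       # e.g. units sold
y = 3.0 * X.ravel() + rng.randn(100)             # e.g. profit, a continuous target

reg = DecisionTreeRegressor(max_depth=4, random_state=0)
reg.fit(X, y)
print(reg.predict([[7.5]]))                      # continuous prediction for a new input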