Machine Learning, Deep Learning, and Artificial Intelligence


Machine Learning

Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming, coined the term "Machine Learning" as the "field of study that gives computers the capability to learn without being explicitly programmed".

How it is different from traditional programming:

In traditional programming, we feed the input and the program logic, and run the program to get the output.
In machine learning, we feed the input and the output and run it on the machine during training; the machine creates its own logic, which is then evaluated during testing.

Terminologies that one should know before starting Machine Learning:

Model: A model is a specific representation learned from data by applying some machine learning algorithm. A model is also called a hypothesis.
Feature: A feature is an individual measurable property of our data. A set of numeric features can be conveniently described by a feature vector. Feature vectors are fed as input to the model. For example, in order to predict a fruit, there may be features like color, smell, taste, etc.
Target (Label): A target variable or label is the value to be predicted by our model. For the fruit example discussed in the features section, the label with each set of inputs would be the name of the fruit, like apple, orange, banana, etc.
Training: The idea is to give a set of inputs (features) and their expected outputs (labels), so after training, we will have a model (hypothesis) that will then map new data to one of the categories it was trained on.
Prediction: Once our model is ready, it can be fed a set of inputs to which it will provide a predicted output (label).

Types of Learning

Supervised Learning
Unsupervised Learning
Semi-Supervised Learning

1. Supervised Learning: Supervised learning is when the model is trained on a labelled dataset. A labelled dataset is one that has both input and output parameters. In this type of learning, both the training and validation datasets are labelled, as shown in the figures below.

(Figure A: Classification example; Figure B: Regression example)



Types of Supervised Learning:
Classification
Regression

Classification: It is a supervised learning task where the output has defined labels (discrete values). For example, in Figure A above, the output "Purchased" has defined labels, i.e. 0 or 1; 1 means the customer will purchase and 0 means the customer won't purchase. It can be either binary or multiclass classification. In binary classification, the model predicts either 0 or 1 (yes or no), but in multiclass classification, the model predicts one of more than two classes.
Example: Gmail classifies mail into more than one class, like social, promotions, updates, offers.
Regression: It is a supervised learning task where the output has a continuous value. In the regression figure, the output "Wind Speed" does not have any discrete value but is continuous within a particular range. The goal here is to predict a value as close to the actual output value as our model can, and evaluation is done by calculating the error value. The smaller the error, the greater the accuracy of our regression model.







Examples of Supervised Learning algorithms:
Linear Regression
Nearest Neighbor
Gaussian Naive Bayes
Decision Trees
Support Vector Machine (SVM)
Random Forest

Unsupervised Learning:
Unsupervised learning is the training of a machine using information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data. Unsupervised machine learning is more challenging than supervised learning due to the absence of labels.

Types of Unsupervised Learning:
Clustering
Association

Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour.
Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people who buy X also tend to buy Y.

Examples of unsupervised learning algorithms:
k-means for clustering problems.
Apriori algorithm for association rule learning problems.

The most basic disadvantage of any supervised learning algorithm is that the dataset has to be hand-labelled either by a Machine Learning Engineer or a Data Scientist. This is a very costly process, especially when dealing with large volumes of data. The most basic disadvantage of any unsupervised learning is that its application spectrum is limited.

Semi-supervised machine learning:
To counter these disadvantages, the concept of Semi-Supervised Learning was introduced. In this type of learning, the algorithm is trained upon a combination of labelled and unlabelled data. Typically, this combination will contain a very small amount of labelled data and a very large amount of unlabelled data.
• In semi-supervised learning, labelled data is used to learn a model, and that model is then used to label the unlabelled data (this is called pseudo-labelling); the whole dataset is then used to train a model for further use, as in the sketch below.

Intuitively, one may imagine the three types of learning algorithms as follows: supervised learning, where a student is under the supervision of a teacher at both home and school; unsupervised learning, where a student has to figure out a concept by himself; and semi-supervised learning, where a teacher teaches a few concepts in class and gives questions as homework which are based on similar concepts.
(Figure: a model trained with labelled data only vs a model trained with both labelled and unlabelled data)

REGRESSION
Regression is a statistical measurement used in finance, investing, and other disciplines that attempts to determine the strength of the relationship between one dependent variable and a series of other changing (independent) variables.

Types of regression

Linear regression
  Simple linear regression
  Multiple linear regression
Polynomial regression
Decision tree regression
Random forest regression

Simple Linear Regression

Simple linear regression models are used to show or predict the relationship between two variables or factors.
The factor that is being predicted is called the dependent variable, and the factors that are used to predict the dependent variable are called independent variables.

Predicting CO2 emission from the engine size feature using simple linear regression:

import numpy as np
from sklearn import linear_model
regr = linear_model.LinearRegression()
# 'train' is the training split of the fuel-consumption dataframe used in the slides
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(train_x, train_y)
# The coefficients
print('Coefficients:', regr.coef_)
print('Intercept:', regr.intercept_)
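As a small follow-up, a sketch of evaluating the fitted model on a held-out test split (assuming a 'test' dataframe with the same columns exists, as in the polynomial example later):

from sklearn.metrics import mean_absolute_error, r2_score
test_x = np.asanyarray(test[['ENGINESIZE']])
test_y = np.asanyarray(test[['CO2EMISSIONS']])
test_y_hat = regr.predict(test_x)
print('MAE:', mean_absolute_error(test_y, test_y_hat))
print('R2 :', r2_score(test_y, test_y_hat))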

Multiple Linear Regression
Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the values of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes the outcome, target or criterion variable).

Simple linear regression
Predict CO2 emission vs engine size of all cars
- Independent variable (x): engine size
- Dependent variable (y): CO2 emission

Multiple linear regression
Predict CO2 emission vs engine size and cylinders of all cars
- Independent variables (x): engine size, cylinders
- Dependent variable (y): CO2 emission

import numpy as np
from sklearn import linear_model
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['ENGINESIZE', 'CYLINDERS']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(train_x, train_y)
# The coefficients
print('Coefficients:', regr.coef_)
print('Intercept:', regr.intercept_)

Polynomial Regression
Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y|x).

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model
train_x = np.asanyarray(train[['ENGINESIZE', 'CYLINDERS']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
test_x = np.asanyarray(test[['ENGINESIZE', 'CYLINDERS']])
test_y = np.asanyarray(test[['CO2EMISSIONS']])
poly = PolynomialFeatures(degree=2)
train_x_poly = poly.fit_transform(train_x)
train_x_poly.shape

fit_transform takes our x values and outputs our data raised from power 0 to power 2 (since we set the degree of our polynomial to 2).
Now we can deal with it as a 'linear regression' problem. Therefore, this polynomial regression is considered to be a special case of traditional multiple linear regression, so you can use the same mechanism as linear regression to solve such problems.
So we can use the LinearRegression() function to solve it:

clf = linear_model.LinearRegression()
train_y_ = clf.fit(train_x_poly, train_y)
# The coefficients
print('Coefficients:', clf.coef_)
print('Intercept:', clf.intercept_)
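A small follow-up sketch (under the same assumptions about the train/test dataframes) that evaluates the fitted polynomial model on the test split:

from sklearn.metrics import r2_score
test_x_poly = poly.transform(test_x)   # reuse the fitted PolynomialFeatures transformer
test_y_hat = clf.predict(test_x_poly)
print('R2-score:', r2_score(test_y, test_y_hat))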

Decision Tree Regression
A decision tree builds regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing values for the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.

Decision tree regression observes features of an object and trains a model in the structure of a tree to predict data in the future and produce meaningful continuous output. Continuous output means that the output/result is not discrete, i.e., it is not represented just by a discrete, known set of numbers or values.

Discrete output example: a weather prediction model that predicts whether or not there will be rain on a particular day.
Continuous output example: a profit prediction model that states the probable profit that can be generated from the sale of a product.

Code:
# import the regressor
from sklearn.tree import DecisionTreeRegressor
# create a regressor object
regressor = DecisionTreeRegressor(random_state=0)
# fit the regressor with X and y data (X = feature matrix, y = continuous target)
regressor.fit(X, y)

Random Forest Regression
The Random Forest is one of the most effective machine learning models for predictive analytics, making it an industrial workhorse for machine learning.
The random forest model is a type of additive model that makes predictions by combining decisions from a sequence of base models. Here, each base classifier is a simple decision tree. This broad technique of using multiple models to obtain better predictive performance is called model ensembling. In random forests, all the base models are constructed independently using a different subsample of the data.

Approach:
1. Pick at random K data points from the training set.
2. Build the decision tree associated with those K data points.
3. Choose the number Ntree of trees you want to build and repeat steps 1 and 2.
4. For a new data point, make each one of your Ntree trees predict the value of Y for the data point, and assign the new data point the average across all of the predicted Y values.

Code
# import the regressor
from sklearn.ensemble import RandomForestRegressor
# create a regressor object (n_estimators is the number of trees, Ntree)
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
# fit the regressor with X and y data (X = feature matrix, y = continuous target)
regressor.fit(X, y)

Pros and cons

Linear regression
  Pros: works on any size of dataset; gives information about the features.
  Cons: relies on the linear regression assumptions.

Polynomial regression
  Pros: works on any size of dataset; works very well on nonlinear problems.
  Cons: need to choose the right polynomial degree for a good bias/variance trade-off.

SVR
  Pros: easily adaptable; works very well on nonlinear problems; not biased by outliers.
  Cons: compulsory to apply feature scaling; not well known; more difficult to understand.

Decision tree regression
  Pros: interpretability; no need for feature scaling; works on both linear and nonlinear problems.
  Cons: poor results on small datasets; overfitting can easily occur.

Random forest regression
  Pros: powerful and accurate; good performance on many problems, including nonlinear ones.
  Cons: no interpretability; overfitting can easily occur; need to choose the number of trees.

LOGISTIC REGRESSION
In statistics, the logistic model is used to model the probability of a certain class or event existing, such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc.
Based on the number of categories, logistic regression can be classified as:
binomial: the target variable can have only 2 possible types, "0" or "1", which may represent "win" vs "loss", "pass" vs "fail", "dead" vs "alive", etc.
multinomial: the target variable can have 3 or more possible types which are not ordered (i.e. the types have no quantitative significance), like "disease A" vs "disease B" vs "disease C".
ordinal: it deals with target variables with ordered categories. For example, a test score can be categorized as "very poor", "poor", "good", "very good". Here, each category can be given a score like 0, 1, 2, 3.






Start with binary class problems
How do we develop a classification algorithm?
Example: tumour size vs malignancy (0 or 1).
We could use linear regression, then threshold the classifier output (i.e. anything over some value is yes, else no).
In the example below, linear regression with thresholding seems to work.

We can see above that this does a reasonable job of stratifying the data points into one of two classes.
But what if we had a single "yes" with a very small tumour? This would lead to classifying all the existing yeses as nos.
Another issue with linear regression: we know y is 0 or 1, but the hypothesis can give values larger than 1 or less than 0.
So instead we use logistic regression, which generates a hypothesis value that always lies between 0 and 1.
Logistic regression is a classification algorithm - don't be confused by the name.








Hypothesis representation
What function is used to represent our hypothesis in classification?
We want our classifier to output values between 0 and 1.
When using linear regression we had hθ(x) = θ^T x.
For the classification hypothesis representation we use hθ(x) = g(θ^T x),
where we define g(z) = 1 / (1 + e^(-z)) for a real number z.
This is the sigmoid function, or the logistic function.
If we combine these equations we can write out the hypothesis as

  hθ(x) = 1 / (1 + e^(-θ^T x))

What does the sigmoid function look like?
It crosses 0.5 at the origin, then flattens out, with asymptotes at 0 and 1.
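A tiny numpy sketch (not from the slides) that evaluates this sigmoid and checks the 0/1 asymptote behaviour:

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))                           # 0.5 at the origin
print(sigmoid(np.array([-10, -1, 1, 10])))  # approaches 0 for large negative z, 1 for large positive z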












Interpreting hypothesis output
When our hypothesis hθ(x) outputs a number, we treat that value as the estimated probability that y = 1 on input x.
Example:
If x is a feature vector with x0 = 1 (as always) and x1 = tumour size, then hθ(x) = 0.7 tells a patient they have a 70% chance of the tumour being malignant.
hθ(x) = P(y = 1 | x; θ)
What does this mean? The probability that y = 1, given x, parameterized by θ.
Since this is a binary classification task we know y = 0 or 1, so the following must be true:
  P(y = 1 | x; θ) + P(y = 0 | x; θ) = 1
  P(y = 0 | x; θ) = 1 - P(y = 1 | x; θ)








Decision boundary
This gives a better sense of what the hypothesis function is computing.
One way of using the sigmoid function is:
  When the probability of y being 1 is greater than 0.5, we predict y = 1; else we predict y = 0.
When exactly is hθ(x) greater than 0.5? Look at the sigmoid function:
  g(z) is greater than or equal to 0.5 when z is greater than or equal to 0.
So if z is positive, g(z) is greater than 0.5. Here z = θ^T x, so when θ^T x >= 0, then hθ(x) >= 0.5.
So what we've shown is that the hypothesis predicts y = 1 when θ^T x >= 0.
The corollary is that when θ^T x <= 0, the hypothesis predicts y = 0.
Let's use this to better understand how the hypothesis makes its predictions.

Consider hθ(x) = g(θ0 + θ1*x1 + θ2*x2).

For example, take θ0 = -3, θ1 = 1, θ2 = 1.
Our parameter vector is a column vector with the above values, so θ^T is the row vector [-3, 1, 1].
What does this mean? The z here becomes θ^T x, so we predict "y = 1" if
  -3*x0 + 1*x1 + 1*x2 >= 0
  -3 + x1 + x2 >= 0
We can re-write this as: if x1 + x2 >= 3, then we predict y = 1.
If we plot x1 + x2 = 3 we graphically plot our decision boundary.
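A quick numpy check of this rule (illustrative only; the θ values are the ones from the example above):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-3.0, 1.0, 1.0])   # [θ0, θ1, θ2]
x = np.array([1.0, 2.0, 2.0])        # [x0 = 1, x1, x2]; x1 + x2 = 4 >= 3
print(sigmoid(theta @ x) >= 0.5)     # True -> predict y = 1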

Now consider hθ(x) = g(θ0 + θ1*x1 + θ2*x2 + θ3*x1^2 + θ4*x2^2).

Say θ^T was [-1, 0, 0, 1, 1]; then we predict "y = 1" if
  -1 + x1^2 + x2^2 >= 0, or
  x1^2 + x2^2 >= 1.
If we plot x1^2 + x2^2 = 1, the decision boundary is a circle of radius 1 around the origin.

Cost function for logistic regression
Linear regression uses the squared-error cost function to determine θ:

  J(θ) = (1/2m) * Σ_{i=1..m} (hθ(x^(i)) - y^(i))^2

We could use this function for logistic regression, but it turns out to be a non-convex function for parameter optimization.

What do we mean by non-convex?
We have some function, J(θ), for determining the parameters.
Our hypothesis function has a non-linearity (the sigmoid inside hθ(x)); this is a complicated non-linear function.
If you take hθ(x) and plug it into the Cost() function, then plug the Cost() function into J(θ) and plot J(θ), we find many local optima -> a non-convex function.
Why is this a problem? Lots of local minima mean gradient descent may not find the global optimum - it may get stuck in a local minimum.
We would like a convex function, so that if you run gradient descent you converge to the global minimum.

A convex logistic regression cost function
To get around this we need a different, convex Cost() function, which means we can apply gradient descent:

  Cost(hθ(x), y) = -log(hθ(x))       if y = 1
  Cost(hθ(x), y) = -log(1 - hθ(x))   if y = 0

The above two functions can be compressed into a single function, i.e.

  Cost(hθ(x), y) = -y*log(hθ(x)) - (1 - y)*log(1 - hθ(x))
Gradient Descent
Now the question arises: how do we reduce the cost value? This can be done by using Gradient Descent. The main goal of gradient descent is to minimize the cost value, i.e. min J(θ).
To minimize our cost function we need to run the gradient descent update on each parameter:

  θj := θj - α * ∂J(θ)/∂θj   (simultaneously for all j)

Gradient descent has an analogy in which we have to imagine ourselves at the top of a mountain valley, left stranded and blindfolded; our objective is to reach the bottom of the hill. Feeling the slope of the terrain around you is what everyone would do. This action is analogous to calculating the gradient, and taking a step is analogous to one iteration of the update to the parameters.
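A compact numpy sketch of these update steps for logistic regression (illustrative only; the toy dataset, the learning rate α and the iteration count are made up for this example):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 examples, 2 features plus the x0 = 1 intercept column
X = np.array([[1, 0.5, 1.5], [1, 1.0, 1.0], [1, 3.0, 2.5], [1, 2.5, 3.0]])
y = np.array([0, 0, 1, 1])
theta = np.zeros(3)
alpha, m = 0.1, len(y)

for _ in range(1000):
    h = sigmoid(X @ theta)          # hypothesis for all examples
    grad = (X.T @ (h - y)) / m      # ∂J/∂θ for the log-loss cost
    theta -= alpha * grad           # simultaneous update of all θj

print(theta, (sigmoid(X @ theta) >= 0.5).astype(int))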




Multiclass classification problems
Getting logistic regression to work for multiclass classification using one-vs-all.
Multiclass means more than yes or no (1 or 0): classification with multiple classes.

Given a dataset with three classes, how do we get a learning algorithm to work?
Use one-vs-all classification to make binary classification work for multiclass classification.

One-vs-all classification: split the training set into three separate binary classification problems, i.e. create a new "fake" training set for each class:
  Triangles (1) vs crosses and squares (0):  hθ^(1)(x) = P(y = 1 | x; θ)
  Crosses (1) vs triangles and squares (0):  hθ^(2)(x) = P(y = 2 | x; θ)
  Squares (1) vs crosses and triangles (0):  hθ^(3)(x) = P(y = 3 | x; θ)

Train a logistic regression classifier hθ^(i)(x) for each class i to predict the probability that y = i.
On a new input x, to make a prediction, pick the class i that maximizes hθ^(i)(x).
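A short scikit-learn sketch of one-vs-all logistic regression on a toy 3-class dataset (the dataset and settings here are illustrative, not from the slides):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy 3-class problem
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

# One binary logistic regression classifier per class; prediction picks the class
# whose classifier outputs the highest probability
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(ovr.predict(X[:5]), ovr.predict_proba(X[:5]).round(2))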

K-Nearest Neighbors

This algorithm classifies cases based on their similarity to other cases.
In K-Nearest Neighbors, data points that are near each other are said to be neighbors.
K-Nearest Neighbors is based on this paradigm: similar cases with the same class labels are near each other.
Thus, the distance between two cases is a measure of their dissimilarity.
There are different ways to calculate the similarity, or conversely the distance or dissimilarity, of two data points. For example, this can be done using Euclidean distance.

The K-Nearest Neighbors algorithm works as follows:
1. Pick a value for K.
2. Calculate the distance from the new (held-out) case to each of the cases in the dataset.
3. Search for the K observations in the training data that are nearest to the measurements of the unknown data point.
4. Predict the response of the unknown data point using the most popular response value from the K nearest neighbors.

There are two parts of this algorithm that might be a bit confusing:
first, how to select the correct K; second, how to compute the similarity between cases.
Let's first start with the second concern.

How to select the correct K
As mentioned, K in K-Nearest Neighbors is the number of nearest neighbors to examine. It is supposed to be specified by the user. So, how do we choose the right K?
Assume that we want to find the class of the customer noted by the question mark on the chart.
What happens if we choose a very low value of K, say K equals one? The first nearest point would be blue, which is class one. This would be a bad prediction, since more of the points around it are magenta, or class four.

In fact, since its nearest neighbor is blue, we can say that we captured the noise in the data, or we chose one of the points that was an anomaly in the data.
A low value of K causes a highly complex model as well, which might result in overfitting of the model. It means the prediction process is not generalized enough to be used for out-of-sample cases. Out-of-sample data is data that is outside of the dataset used to train the model. In other words, the model cannot be trusted to be used for prediction of unknown samples. It's important to remember that overfitting is bad, as we want a general model that works for any data, not just the data used for training.
Now, on the opposite side of the spectrum, if we choose a very high value of K, such as K equals 20, then the model becomes overly generalized.

So, how can we find the best value for K?
The general solution is to reserve a part of your data for testing the accuracy of the model. Once you've done so, choose K equals one, use the training part for modeling, and calculate the accuracy of prediction using all the samples in your test set.
Repeat this process, increasing K, and see which K is best for your model. For example, in our case, K equals four gives us the best accuracy.
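A scikit-learn sketch of this K-selection loop (a toy dataset stands in here for the customer data on the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)

# Try K = 1..10 and keep the test-set accuracy for each
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, round(knn.score(X_test, y_test), 3))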

Advantages of KNN

1. No training period: KNN is called a Lazy Learner (instance-based learning). It does not learn anything in the training period and does not derive any discriminative function from the training data. In other words, there is no training period for it. It stores the training dataset and learns from it only at the time of making real-time predictions. This makes the KNN algorithm much faster than other algorithms that require training, e.g. SVM, linear regression, etc.

2. Since the KNN algorithm requires no training before making predictions, new data can be added seamlessly, which will not impact the accuracy of the algorithm.

3. KNN is very easy to implement. There are only two parameters required to implement KNN, i.e. the value of K and the distance function (e.g. Euclidean or Manhattan).

Disadvantages of KNN

1. Does not work well with large datasets: in large datasets, the cost of calculating the distance between the new point and each existing point is huge, which degrades the performance of the algorithm.

2. Does not work well with high dimensions: the KNN algorithm doesn't work well with high-dimensional data because, with a large number of dimensions, it becomes difficult for the algorithm to calculate the distance in each dimension.

3. Sensitive to noisy data, missing values and outliers: KNN is sensitive to noise in the dataset. We need to manually impute missing values and remove outliers.

SUPPORT VECTOR MACHINE (SVM)
A Support Vector Machine is a supervised algorithm that can classify cases by finding a separator.
SVM works by first mapping data to a high-dimensional feature space so that data points can be categorized, even when the data are not linearly separable. Then a separator is estimated for the data. The data should be transformed in such a way that a separator can be drawn as a hyperplane.

Therefore, the SVM algorithm outputs an optimal hyperplane that categorizes new examples.

DATA TRANSFORMATION
For the sake of simplicity, imagine that our dataset is one-dimensional; this means we have only one feature x. As you can see, it is not linearly separable. We can transfer it into a two-dimensional space: for example, you can increase the dimension of the data by mapping x into a new space using a function with outputs x and x squared.
Basically, mapping data into a higher-dimensional space is called kernelling. The mathematical function used for the transformation is known as the kernel function, and it can be of different types, such as linear, polynomial, Radial Basis Function (RBF), or sigmoid.
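A brief scikit-learn sketch of an SVM classifier with an RBF kernel (a toy nonlinear dataset is used here purely for illustration):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A dataset that is not linearly separable in its original space
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# The RBF kernel implicitly maps the data to a higher-dimensional feature space
clf = SVC(kernel='rbf', gamma='scale', C=1.0).fit(X_train, y_train)
print('Test accuracy:', round(clf.score(X_test, y_test), 3))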

SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes, as shown here. As we're in a two-dimensional space, you can think of the hyperplane as a line that linearly separates the blue points from the red points.

ADVANTAGES
- Accurate in high-dimensional spaces
- Memory efficient
DISADVANTAGES
- Suited only to small datasets (not efficient on large ones)
- Prone to overfitting
APPLICATIONS
- Image recognition
- Spam detection

Naive Bayes Classifiers
A COLLECTION OF CLASSIFICATION ALGORITHMS

Principle of the Naive Bayes Classifier:

A Naive Bayes classifier is a probabilistic machine learning model that is used for classification tasks. The crux of the classifier is based on Bayes' theorem, which can be written as:

  P(y|X) = P(X|y) * P(y) / P(X)

It is not a single algorithm but a family of algorithms which all share a common principle, i.e. every pair of features being classified is independent of each other.

Example:
Let us take an example to get some better intuition. Consider the problem of playing golf. The dataset is represented as below.

We classify whether the day is suitable for playing golf, given the features of the day. The columns represent these features and the rows represent individual entries. If we take the first row of the dataset, we can observe that the day is not suitable for playing golf if the outlook is rainy, the temperature is hot, the humidity is high and it is not windy. We make two assumptions here: first, as stated above, we consider that these predictors are independent; that is, if the temperature is hot, it does not necessarily mean that the humidity is high. The second assumption is that all the predictors have an equal effect on the outcome; that is, the day being windy does not have more importance in deciding whether to play golf or not.
According to this example, Bayes' theorem can be rewritten as:

  P(y|X) = P(X|y) * P(y) / P(X)

The variable y is the class variable (play golf), which represents whether it is suitable to play golf or not given the conditions. The variable X represents the parameters/features.

X is given as X = (x_1, x_2, ..., x_n), where x_1, x_2, ..., x_n represent the features, i.e. they can be mapped to outlook, temperature, humidity and windy. By substituting for X and expanding using the chain rule we get:

  P(y|x_1, ..., x_n) = P(x_1|y) * P(x_2|y) * ... * P(x_n|y) * P(y) / (P(x_1) * P(x_2) * ... * P(x_n))

Now, you can obtain the values for each term by looking at the dataset and substituting them into the equation. For all entries in the dataset, the denominator does not change; it remains static. Therefore, the denominator can be removed and a proportionality can be introduced:

  P(y|x_1, ..., x_n) ∝ P(y) * Π_i P(x_i|y)

In our case, the class variable (y) has only two outcomes, yes or no, but there could be cases where the classification is multiclass. Therefore, we need to find the class y with maximum probability:

  y = argmax_y P(y) * Π_i P(x_i|y)

Using the above function, we can obtain the class, given the predictors.

We need to find P(x_i | y_j) for each x_i in X and y_j in y. All these calculations have been demonstrated in the tables below: in the figure above, we have calculated P(x_i | y_j) for each x_i in X and y_j in y manually in tables 1-4. For example, the probability of playing golf given that the temperature is cool, i.e. P(temp = cool | play golf = Yes), is 3/9.

Also, we need to find the class probabilities P(y), which have been calculated in table 5. For example, P(play golf = Yes) = 9/14.
So now we are done with our pre-computations and the classifier is ready!
Let us test it on a new set of features (let us call it "today"):

Types of Naive Bayes Classifier:

Multinomial Naive Bayes: This is mostly used for document classification problems, i.e. whether a document belongs to the category of sports, politics, technology, etc. The features/predictors used by the classifier are the frequencies of the words present.
Bernoulli Naive Bayes: This is similar to multinomial naive Bayes, but the predictors are boolean variables. The parameters that we use to predict the class variable take up only the values yes or no, for example whether a word occurs in the text or not.
Gaussian Naive Bayes: When the predictors take a continuous value and are not discrete, we assume that these values are sampled from a Gaussian distribution.
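A short scikit-learn sketch of a Gaussian Naive Bayes classifier on a toy continuous-feature dataset (illustrative; the golf table above would instead use categorical counts):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Each feature is modelled with a per-class Gaussian; class priors come from the training labels
nb = GaussianNB().fit(X_train, y_train)
print('Test accuracy:', round(nb.score(X_test, y_test), 3))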

Gaussian Distribution (Normal Distribution)

Conclusion:
Naive Bayes algorithms are mostly used in sentiment analysis, spam filtering, recommendation systems, etc. They are fast and easy to implement, but their biggest disadvantage is the requirement that the predictors be independent. In most real-life cases the predictors are dependent, and this hinders the performance of the classifier.

Decision Tree
CLASSIFICATION ALGORITHM

The decision tree algorithm falls under the category of supervised learning. It can be used to solve both regression and classification problems.
A decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy). A leaf node (e.g., Play) represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
We can represent any boolean function on discrete attributes using a decision tree.

Types of decision trees
Categorical Variable Decision Tree: a decision tree which has a categorical target variable is called a categorical variable decision tree.
Continuous Variable Decision Tree: a decision tree which has a continuous target variable is called a continuous variable decision tree.

Root Node: It represents the entire population or sample, and this further gets divided into two or more homogeneous sets.
Splitting: It is the process of dividing a node into two or more sub-nodes.
Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
Leaf/Terminal Node: Nodes with no children (no further split) are called leaf or terminal nodes.
Pruning: When we reduce the size of a decision tree by removing nodes (the opposite of splitting), the process is called pruning.
Branch/Sub-Tree: A subsection of a decision tree is called a branch or sub-tree.
Parent and Child Node: A node which is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.

Algorithm

Algorithms used in decision trees:
ID3
Gini Index
Chi-Square
Reduction in Variance

The core algorithm for building decision trees is called ID3. It was developed by J. R. Quinlan and it uses Entropy and Information Gain to construct a decision tree.
The ID3 algorithm begins with the original set S as the root node. On each iteration, it iterates through every unused attribute of the set S and calculates the entropy H(S) or information gain IG(S) of that attribute. It then selects the attribute which has the smallest entropy (or largest information gain) value. The set S is then split or partitioned by the selected attribute to produce subsets of the data.

Entropy
Entropy is a measure of the randomness in the information being processed. The higher the entropy, the harder it is to draw any conclusions from that information. The decision tree algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous, the entropy is zero, and if the sample is equally divided, it has an entropy of one:

  E(S) = -Σ_i p_i * log2(p_i)
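A tiny Python sketch of this calculation, using the play-golf label counts referenced later in the slides (9 Yes, 5 No) as an example:

import math

def entropy(counts):
    # E(S) = -sum(p_i * log2(p_i)) over the class proportions
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([9, 5]))   # entropy of the Play Golf target (~0.94)
print(entropy([7, 7]))   # an equally divided sample -> 1.0
print(entropy([14, 0]))  # a completely homogeneous sample -> 0.0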

Example:

To build a decision tree, we need to calculate two types of entropy using frequency tables, as follows:
a) Entropy using the frequency table of one attribute.
b) Entropy using the frequency table of two attributes.

Information gain
The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

Step 1: Calculate the entropy of the target.
Step 2: The dataset is then split on the different attributes. The entropy for each branch is calculated, then added proportionally to get the total entropy for the split. The resulting entropy is subtracted from the entropy before the split. The result is the information gain, or decrease in entropy.
Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches and repeat the same process on every branch.
Step 4a: A branch with entropy of 0 is a leaf node.
Step 4b: A branch with entropy greater than 0 needs further splitting.
Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.

Decision Tree to Decision Rules
A decision tree can easily be transformed into a set of rules by mapping from the root node to the leaf nodes one by one.

Limitations of Decision Trees
Decision trees tend to have high variance when they utilize different training and test sets of the same data, since they tend to overfit on training data. This leads to poor performance on unseen data. Unfortunately, this limits the usage of decision trees in predictive modeling.
To overcome these problems we use ensemble methods: we can create models that utilize underlying (weak) decision trees as a foundation for producing powerful results, and this is done in the Random Forest algorithm.

Random Forest

Definition:
The random forest algorithm is a supervised classification algorithm based on decision trees. Random forests, also known as random decision forests, are a popular ensemble method that can be used to build predictive models for both classification and regression problems.
By "ensemble" we mean (in the random forest context) the collective decisions of different decision trees. In a random forest, we make a prediction about the class not simply based on one decision tree, but by an (almost) unanimous prediction made by 'K' decision trees.

Construction:
'K' individual decision trees are made from the given dataset by randomly dividing the dataset and the feature subspace through a process called bootstrap aggregation (bagging), which is a process of random selection with replacement. Generally 2/3rd of the dataset (row-wise) is selected by bagging, and on that selected dataset we perform what is called attribute bagging.

Attribute bagging is done to select 'm' features from the given M features (this process is also called random subspace creation). Generally the value of 'm' is the square root of M. We then select, say, 10 such values of m, build 10 decision trees based on them, and test the remaining 1/3rd of the dataset on these 10 decision trees. We then select the best decision tree out of these, and repeat the whole process 'K' times to build 'K' such decision trees.

Classification:
Prediction in a random forest (a collection of 'K' decision trees) is truly ensemble, i.e., each decision tree predicts the class of the instance, and the class which was predicted most often is returned.

Using RandomForestClassifier

# Importing libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

TRAIN_DIR = "../train-mails"
TEST_DIR = "../test-mails"
# make_Dictionary and extract_features are helper functions from the email example
dictionary = make_Dictionary(TRAIN_DIR)
print("reading and processing emails from file.")
features_matrix, labels = extract_features(TRAIN_DIR)
test_feature_matrix, test_labels = extract_features(TEST_DIR)
# Creating the model
model = RandomForestClassifier()
print("Training model.")
# Training the model
model.fit(features_matrix, labels)
# Predicting
predicted_labels = model.predict(test_feature_matrix)
print("FINISHED classifying. accuracy score:")
print(accuracy_score(test_labels, predicted_labels))

We will get an accuracy of around 95.7%.

Parameters

Let's understand and experiment with some of the tuning parameters:
n_estimators: the number of trees in the forest. Default is 10.
criterion: "gini" or "entropy", same as the decision tree classifier.
min_samples_split: the minimum number of samples required at a node to split it. Default is 2.
Play with these parameters by changing values individually and in combination, and check if you can improve accuracy. Trying the following combination gave the accuracy shown in the image on the next slide.

Final Thoughts

The Random Forest Classifier, being an ensemble algorithm, tends to give a more accurate result. This is because it works on the principle that a number of weak estimators, when combined, form a strong estimator. Even if one or a few decision trees are prone to noise, the overall result tends to be correct. Even with a small number of estimators (n_estimators = 30), it gave us an accuracy as high as 97%.

Clustering
UNSUPERVISED LEARNING

Clustering

A cluster is a subset of data points which are similar.
Clustering (also called unsupervised learning) is the process of dividing a dataset into groups such that the members of each group are as similar (close) as possible to one another, and different groups are as dissimilar (far) as possible from one another.
Generally, it is used as a process to find meaningful structure, generative features, and groupings inherent in a set of examples. Clustering can uncover previously undetected relationships in a dataset.
There are many applications for cluster analysis. For example, in business, cluster analysis can be used to discover and characterize customer segments for marketing purposes, and in biology, it can be used for the classification of plants and animals given their features.

Clustering Algorithms

K-means Algorithm
The simplest among the unsupervised learning algorithms, it works on the principle of k-means clustering: the clustered groups (clusters) for a given set of data are represented by a variable 'k'. For each cluster, a centroid (the arithmetic mean of all the data points that belong to that cluster) is defined.
The centroid is a point at the centre of each cluster (considering Euclidean distance). The trick is to define the centroids far away from each other so that the variation is less. After this, each data point is assigned to the nearest centroid such that the sum of the squared distances between the data points and their cluster's centroid is at the minimum.

Algorithm
1. Cluster the data into k groups, where k is predefined.
2. Select k points at random as cluster centers.
3. Assign objects to their closest cluster center according to the Euclidean distance function.
4. Calculate the centroid, or mean, of all objects in each cluster.
5. Repeat steps 3 and 4 until the same points are assigned to each cluster in consecutive rounds.
The Euclidean distance between two points, in either the plane or 3-dimensional space, measures the length of a segment connecting the two points.
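A minimal scikit-learn sketch of these steps (synthetic blob data stands in for the sensor/image examples mentioned below):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 natural groupings
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_init runs the random-initialisation + assign + update loop several times and keeps the best result
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final centroids
print(km.labels_[:10])       # cluster assignment of the first 10 points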

The step-by-step process:

The k-means clustering algorithm has been found to be very useful in grouping new data. Some practical applications which use k-means clustering are sensor measurements, activity monitoring in a manufacturing process, audio detection and image segmentation.

Disadvantages of K-Means:
K-Means forms spherical clusters only. The algorithm fails when the data is not spherical (i.e. does not have the same variance in all directions).
The K-Means algorithm is sensitive to outliers; outliers can skew the clusters in K-Means to a very large extent.
The K-Means algorithm requires one to specify the number of clusters, and there is no global method to choose the best value.

Hierarchical Clustering Algorithms
Last but not least are the hierarchical clustering algorithms. These algorithms have clusters sorted in an order based on the hierarchy of data similarity observations. Hierarchical clustering is categorised into two types: divisive (top-down) clustering and agglomerative (bottom-up) clustering.
Agglomerative hierarchical clustering technique: In this technique, initially each data point is considered an individual cluster. At each iteration, the similar clusters merge with other clusters until one cluster or K clusters are formed.
Divisive hierarchical clustering technique: Divisive hierarchical clustering is exactly the opposite of agglomerative hierarchical clustering. In divisive hierarchical clustering, we consider all the data points as a single cluster, and in each iteration we separate the data points from the cluster which are not similar. Each data point which is separated is considered an individual cluster.
Most of the hierarchical algorithms, such as single linkage, complete linkage, median linkage and Ward's method, among others, follow the agglomerative approach.
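A short scikit-learn sketch of the agglomerative (bottom-up) variant; the linkage parameter corresponds to the MIN/MAX/group-average criteria discussed next (the blob data is only an illustration):

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# linkage='single' ~ MIN, 'complete' ~ MAX, 'average' ~ group average, 'ward' ~ Ward's method
agg = AgglomerativeClustering(n_clusters=3, linkage='average').fit(X)
print(agg.labels_[:10])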

Calculating the similarity between two clusters is important in order to merge or divide clusters. There are certain approaches which are used to calculate the similarity between two clusters:
MIN: Also known as the single-linkage algorithm, it defines the similarity of two clusters C1 and C2 as the minimum of the similarity between points Pi and Pj such that Pi belongs to C1 and Pj belongs to C2.
This approach can separate non-elliptical shapes as long as the gap between two clusters is not small.
The MIN approach cannot separate clusters properly if there is noise between the clusters.

MAX: Also known as the complete-linkage algorithm, this is exactly opposite to the MIN approach. The similarity of two clusters C1 and C2 is equal to the maximum of the similarity between points Pi and Pj such that Pi belongs to C1 and Pj belongs to C2.
The MAX approach does well in separating clusters if there is noise between the clusters, but it tends to break large clusters.

Group Average: Take all the pairs of points, compute their similarities, and calculate the average of the similarities.
The group average approach does well in separating clusters if there is noise between the clusters, but it is a less popular technique in the real world.

Limitations of the hierarchical clustering technique:
There is no mathematical objective for hierarchical clustering.
All of the approaches to calculate the similarity between clusters have their own disadvantages.
Hierarchical clustering has high space and time complexity; hence, this clustering algorithm cannot be used when we have huge amounts of data.

Density-based spatial clustering of applications with noise (DBSCAN) is a well-known data clustering algorithm that is commonly used in data mining and machine learning.
Unlike K-means, DBSCAN does not require the user to specify the number of clusters to be generated.
DBSCAN can find any shape of clusters; the cluster doesn't have to be circular.
DBSCAN can identify outliers.
The basic idea behind the density-based clustering approach is derived from a human intuitive clustering method: by looking at the figure below, one can easily identify four clusters along with several points of noise, because of the differences in the density of points.

The DBSCAN algorithm has two parameters:
ε: the radius of the neighborhood around a data point p.
minPts: the minimum number of data points we want in a neighborhood to define a cluster.
Using these two parameters, DBSCAN categorizes the data points into three categories:
Core points: a data point p is a core point if Nbhd(p, ε) [the ε-neighborhood of p] contains at least minPts points; |Nbhd(p, ε)| >= minPts.
Border points: a data point q is a border point if Nbhd(q, ε) contains fewer than minPts data points, but q is reachable from some core point p.
Outliers: a data point o is an outlier if it is neither a core point nor a border point. Essentially, this is the "other" class.

The steps of the DBSCAN algorithm are:
1. Pick a point at random that has not been assigned to a cluster or been designated as an outlier. Compute its neighborhood to determine if it is a core point. If yes, start a cluster around this point. If no, label the point as an outlier.
2. Once we find a core point and thus a cluster, expand the cluster by adding all directly reachable points to it. Perform "neighborhood jumps" to find all density-reachable points and add them to the cluster. If an outlier is added, change that point's status from outlier to border point.
3. Repeat these two steps until all points are either assigned to a cluster or designated as outliers.

Below is the DBSCAN clustering algorithm in pseudocode:

DBSCAN(dataset, eps, MinPts) {
    # cluster index
    C = 1
    for each unvisited point p in dataset {
        mark p as visited
        # find neighbors
        Neighbors N = find the neighboring points of p
        if |N| >= MinPts:
            # p is a core point: expand cluster C around it
            for each point p' in N {
                Neighbors N' = find the neighboring points of p'
                if |N'| >= MinPts:
                    N = N U N'          # p' is also a core point; keep expanding the search set
                if p' is not a member of any cluster:
                    add p' to cluster C
            }
            C = C + 1
        else:
            mark p as an outlier (noise)
    }
}
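For comparison, a minimal scikit-learn sketch of the same idea (eps and min_samples map to ε and minPts; the moons dataset is just an illustration):

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))   # cluster ids; -1 marks points labelled as outliers/noise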

CROSS VALIDATION:
Cross-validation is a technique in which we train our model using a subset of the dataset and then evaluate it using the complementary subset of the dataset.
The three steps involved in cross-validation are as follows:
1. Split the dataset into a training set and a test set.
2. Train the model using the training set.
3. Test the model using the test set.
USE: to get good out-of-sample accuracy.
Even when we use the cross-validation technique, we get variation in accuracy when we train our model; to deal with that, we use the K-fold cross-validation technique.

In K-fold cross-validation, we split the dataset into k subsets (known as folds), then we perform training on k-1 of the subsets and leave one subset out for the evaluation of the trained model. In this method, we iterate k times, with a different subset reserved for testing each time.

Code in Python for k-fold cross-validation:

from sklearn.model_selection import cross_val_score
# estimator = your model object, X = training inputs, y = correct outputs, cv = number of folds (k)
scores = cross_val_score(estimator=model, X=X_train, y=y_train, cv=10)

The first line imports the k-fold cross-validation function from the model_selection sub-library of the sklearn library.
The second line gives a list of accuracies based on the k value; we need to average them to get the model's accuracy.

How do we choose optimal values for the hyperparameters?
Hyperparameters are the parameters that cannot be directly learned from the regular training process. They are usually fixed before the actual training process begins. These parameters express important properties of the model, such as its complexity or how fast it should learn.
Example: the k in k-nearest neighbours.
Models can have many hyperparameters, and finding the best combination of parameters can be treated as a search problem. One of the best strategies for hyperparameter tuning is grid search.

GridSearchCV:
In the GridSearchCV approach, the machine learning model is evaluated for a range of hyperparameter values. The approach is called GridSearchCV because it searches for the best set of hyperparameters from a grid of hyperparameter values.
For example, if we want to set the hyperparameter of a K-nearest neighbours model with different sets of values, the grid search technique will check the model with all possible combinations of hyperparameters and will return the best one.
From the graph we can say the best value for k is 10; grid search will search all the values of k that we give in the range and return the best one.

Code in Python for getting optimal hyperparameters using grid search for a support vector machine:

# importing SVC (support vector classifier) from sklearn
from sklearn.svm import SVC
classifier = SVC()
# import the GridSearchCV class from the sklearn library
from sklearn.model_selection import GridSearchCV
# creating a list of dictionaries that will be the input for grid search
parameters = [{'C': [1, 10, 100, 1000], 'kernel': ['linear']},
              {'C': [1, 10, 100, 1000], 'kernel': ['rbf'], 'gamma': [0.5, 0.1, 0.01, 0.001]}]
# creating the grid search object
gridsearch = GridSearchCV(estimator=classifier, param_grid=parameters,
                          scoring='accuracy', cv=10, n_jobs=-1)
# fitting grid search with the dataset
gd = gridsearch.fit(X_train, y_train)
# best score among all models in the grid search
best_accuracy = gd.best_score_
# return the parameters of the best model
best_param = gd.best_params_

XGBOOST
XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.
In this algorithm, decision trees are created in sequential form. Weights play an important role in XGBoost: weights are assigned to all the independent variables, which are then fed into the decision tree that predicts results.

The weights of variables predicted wrongly by the tree are increased, and these variables are then fed to the second decision tree. These individual classifiers/predictors then ensemble to give a strong and more precise model. XGBoost can work on regression, classification, ranking, and user-defined prediction problems.
The final prediction combines the outputs of all the sequential trees, each new tree correcting the errors of the previous ones: the combined score gives the predicted value for a regression problem, or is mapped to a predicted class for a classification problem.

Code in Python for XGBoost:
# Fitting XGBoost to the training data for a classifier
import xgboost as xgb
my_model = xgb.XGBClassifier()
my_model.fit(X_train, y_train)
# predicting the test set results
y_pred = my_model.predict(X_test)
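A brief follow-up sketch (assuming the same X_test/y_test split) to check how well the boosted model does:

from sklearn.metrics import accuracy_score
print('Test accuracy:', accuracy_score(y_test, y_pred))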