Machine Learning, Deep Learning, and Artificial Intelligence


Machine Learning

Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming, coined the term "Machine Learning" as the "field of study that gives computers the capability to learn without being explicitly programmed".

How it is different from traditional programming:

In traditional programming, we feed the input and the program logic, and run the program to get the output.
In machine learning, we feed the input and the output and run it on the machine during training; the machine creates its own logic, which is then evaluated during testing.

Terminologies that one should know before starting Machine Learning:

Model: A model is a specific representation learned from data by applying some machine learning algorithm. A model is also called a hypothesis.
Feature: A feature is an individual measurable property of our data. A set of numeric features can be conveniently described by a feature vector. Feature vectors are fed as input to the model. For example, in order to predict a fruit, there may be features like color, smell, taste, etc.
Target (Label): A target variable or label is the value to be predicted by our model. For the fruit example discussed in the features section, the label with each set of inputs would be the name of the fruit, like apple, orange, banana, etc.
Training: The idea is to give a set of inputs (features) and their expected outputs (labels), so after training, we will have a model (hypothesis) that will then map new data to one of the categories it was trained on.
Prediction: Once our model is ready, it can be fed a set of inputs to which it will provide a predicted output (label).

Types of Learning

Supervised Learning
Unsupervised Learning
Semi-Supervised Learning

1. Supervised Learning: Supervised learning is when the model is trained on a labelled dataset. A labelled dataset is one that has both input and output parameters. In this type of learning, both the training and validation datasets are labelled, as shown in the figures below.

(Figure A: Classification example; Figure B: Regression example)



Types of Supervised Learning:
Classification
Regression

Classification: It is a supervised learning task where the output has defined labels (discrete values). For example, in Figure A above, the output "Purchased" has defined labels, i.e. 0 or 1; 1 means the customer will purchase and 0 means the customer won't purchase. It can be either binary or multiclass classification. In binary classification, the model predicts either 0 or 1 (yes or no), but in multiclass classification, the model predicts one of more than two classes.
Example: Gmail classifies mail into more than one class, like social, promotions, updates, offers.
Regression: It is a supervised learning task where the output has a continuous value. In the regression figure, the output "Wind Speed" does not have any discrete value but is continuous within a particular range. The goal here is to predict a value as close to the actual output value as our model can, and evaluation is done by calculating the error value. The smaller the error, the greater the accuracy of our regression model.







Examples of Supervised Learning algorithms:
Linear Regression
Nearest Neighbor
Gaussian Naive Bayes
Decision Trees
Support Vector Machine (SVM)
Random Forest

Unsupervised Learning:
Unsupervised learning is the training of a machine using information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data. Unsupervised machine learning is more challenging than supervised learning due to the absence of labels.

Types of Unsupervised Learning:
Clustering
Association

Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour.
Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people who buy X also tend to buy Y.

Examples of unsupervised learning algorithms:
k-means for clustering problems.
Apriori algorithm for association rule learning problems.

The most basic disadvantage of any supervised learning algorithm is that the dataset has to be hand-labelled either by a Machine Learning Engineer or a Data Scientist. This is a very costly process, especially when dealing with large volumes of data. The most basic disadvantage of any unsupervised learning is that its application spectrum is limited.

Semi-supervised machine learning:
To counter these disadvantages, the concept of Semi-Supervised Learning was introduced. In this type of learning, the algorithm is trained upon a combination of labelled and unlabelled data. Typically, this combination will contain a very small amount of labelled data and a very large amount of unlabelled data.
• In semi-supervised learning, labelled data is used to learn a model, and that model is then used to label the unlabelled data (this is called pseudo-labelling); the whole dataset is then used to train a model for further use, as in the sketch below.

Intuitively, one may imagine the three types of learning algorithms as follows: supervised learning, where a student is under the supervision of a teacher at both home and school; unsupervised learning, where a student has to figure out a concept by himself; and semi-supervised learning, where a teacher teaches a few concepts in class and gives questions as homework which are based on similar concepts.
(Figure: a model trained with labelled data only vs a model trained with both labelled and unlabelled data)

REGRESSION
Regression is a statistical measurement used in finance, investing, and other disciplines that attempts to determine the strength of the relationship between one dependent variable and a series of other changing (independent) variables.

Types of regression

Linear regression
  Simple linear regression
  Multiple linear regression
Polynomial regression
Decision tree regression
Random forest regression

Simple Linear Regression

Simple linear regression models are used to show or predict the relationship between two variables or factors.
The factor that is being predicted is called the dependent variable, and the factors that are used to predict the dependent variable are called independent variables.

Predicting CO2 emission from the engine size feature using simple linear regression:

import numpy as np
from sklearn import linear_model
regr = linear_model.LinearRegression()
# 'train' is the training split of the fuel-consumption dataframe used in the slides
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(train_x, train_y)
# The coefficients
print('Coefficients:', regr.coef_)
print('Intercept:', regr.intercept_)
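As a small follow-up, a sketch of evaluating the fitted model on a held-out test split (assuming a 'test' dataframe with the same columns exists, as in the polynomial example later):

from sklearn.metrics import mean_absolute_error, r2_score
test_x = np.asanyarray(test[['ENGINESIZE']])
test_y = np.asanyarray(test[['CO2EMISSIONS']])
test_y_hat = regr.predict(test_x)
print('MAE:', mean_absolute_error(test_y, test_y_hat))
print('R2 :', r2_score(test_y, test_y_hat))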

Multiple Linear Regression
Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the values of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes the outcome, target or criterion variable).

Simple linear regression
Predict CO2 emission vs engine size of all cars
- Independent variable (x): engine size
- Dependent variable (y): CO2 emission

Multiple linear regression
Predict CO2 emission vs engine size and cylinders of all cars
- Independent variables (x): engine size, cylinders
- Dependent variable (y): CO2 emission

import numpy as np
from sklearn import linear_model
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['ENGINESIZE', 'CYLINDERS']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(train_x, train_y)
# The coefficients
print('Coefficients:', regr.coef_)
print('Intercept:', regr.intercept_)

Polynomial Regression
Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y|x).

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model
train_x = np.asanyarray(train[['ENGINESIZE', 'CYLINDERS']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
test_x = np.asanyarray(test[['ENGINESIZE', 'CYLINDERS']])
test_y = np.asanyarray(test[['CO2EMISSIONS']])
poly = PolynomialFeatures(degree=2)
train_x_poly = poly.fit_transform(train_x)
train_x_poly.shape

fit_transform takes our x values and outputs our data raised from power 0 to power 2 (since we set the degree of our polynomial to 2).
Now we can deal with it as a 'linear regression' problem. Therefore, this polynomial regression is considered to be a special case of traditional multiple linear regression, so you can use the same mechanism as linear regression to solve such problems.
So we can use the LinearRegression() function to solve it:

clf = linear_model.LinearRegression()
train_y_ = clf.fit(train_x_poly, train_y)
# The coefficients
print('Coefficients:', clf.coef_)
print('Intercept:', clf.intercept_)
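A small follow-up sketch (under the same assumptions about the train/test dataframes) that evaluates the fitted polynomial model on the test split:

from sklearn.metrics import r2_score
test_x_poly = poly.transform(test_x)   # reuse the fitted PolynomialFeatures transformer
test_y_hat = clf.predict(test_x_poly)
print('R2-score:', r2_score(test_y, test_y_hat))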

Decision Tree Regression
A decision tree builds regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing values for the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.

Decision tree regression observes features of an object and trains a model in the structure of a tree to predict data in the future and produce meaningful continuous output. Continuous output means that the output/result is not discrete, i.e., it is not represented just by a discrete, known set of numbers or values.

Discrete output example: a weather prediction model that predicts whether or not there will be rain on a particular day.
Continuous output example: a profit prediction model that states the probable profit that can be generated from the sale of a product.

Code:
# import the regressor
from sklearn.tree import DecisionTreeRegressor
# create a regressor object
regressor = DecisionTreeRegressor(random_state=0)
# fit the regressor with X and y data (X = feature matrix, y = continuous target)
regressor.fit(X, y)

Random Forest Regression
The Random Forest is one of the most effective machine learning models for predictive analytics, making it an industrial workhorse for machine learning.
The random forest model is a type of additive model that makes predictions by combining decisions from a sequence of base models. Here, each base classifier is a simple decision tree. This broad technique of using multiple models to obtain better predictive performance is called model ensembling. In random forests, all the base models are constructed independently using a different subsample of the data.

Approach:
1. Pick at random K data points from the training set.
2. Build the decision tree associated with those K data points.
3. Choose the number Ntree of trees you want to build and repeat steps 1 and 2.
4. For a new data point, make each one of your Ntree trees predict the value of Y for the data point, and assign the new data point the average across all of the predicted Y values.

Code
# import the regressor
from sklearn.ensemble import RandomForestRegressor
# create a regressor object (n_estimators is the number of trees, Ntree)
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
# fit the regressor with X and y data (X = feature matrix, y = continuous target)
regressor.fit(X, y)

Pros and cons

Linear regression
  Pros: works on any size of dataset; gives information about the features.
  Cons: relies on the linear regression assumptions.

Polynomial regression
  Pros: works on any size of dataset; works very well on nonlinear problems.
  Cons: need to choose the right polynomial degree for a good bias/variance trade-off.

SVR
  Pros: easily adaptable; works very well on nonlinear problems; not biased by outliers.
  Cons: compulsory to apply feature scaling; not well known; more difficult to understand.

Decision tree regression
  Pros: interpretability; no need for feature scaling; works on both linear and nonlinear problems.
  Cons: poor results on small datasets; overfitting can easily occur.

Random forest regression
  Pros: powerful and accurate; good performance on many problems, including nonlinear ones.
  Cons: no interpretability; overfitting can easily occur; need to choose the number of trees.

LOGISTIC REGRESSION
In statistics, the logistic model is used to model the probability of a certain class or event existing, such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc.
Based on the number of categories, logistic regression can be classified as:
binomial: the target variable can have only 2 possible types, "0" or "1", which may represent "win" vs "loss", "pass" vs "fail", "dead" vs "alive", etc.
multinomial: the target variable can have 3 or more possible types which are not ordered (i.e. the types have no quantitative significance), like "disease A" vs "disease B" vs "disease C".
ordinal: it deals with target variables with ordered categories. For example, a test score can be categorized as "very poor", "poor", "good", "very good". Here, each category can be given a score like 0, 1, 2, 3.






Start with binary class problems
How do we develop a classification algorithm?
Example: tumour size vs malignancy (0 or 1).
We could use linear regression, then threshold the classifier output (i.e. anything over some value is yes, else no).
In the example below, linear regression with thresholding seems to work.

We can see above that this does a reasonable job of stratifying the data points into one of two classes.
But what if we had a single "yes" with a very small tumour? This would lead to classifying all the existing yeses as nos.
Another issue with linear regression: we know y is 0 or 1, but the hypothesis can give values larger than 1 or less than 0.
So instead we use logistic regression, which generates a hypothesis value that always lies between 0 and 1.
Logistic regression is a classification algorithm - don't be confused by the name.








Hypothesis representation
What function is used to represent our hypothesis in classification?
We want our classifier to output values between 0 and 1.
When using linear regression we had hθ(x) = θ^T x.
For the classification hypothesis representation we use hθ(x) = g(θ^T x),
where we define g(z) = 1 / (1 + e^(-z)) for a real number z.
This is the sigmoid function, or the logistic function.
If we combine these equations we can write out the hypothesis as

  hθ(x) = 1 / (1 + e^(-θ^T x))

What does the sigmoid function look like?
It crosses 0.5 at the origin, then flattens out, with asymptotes at 0 and 1.
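A tiny numpy sketch (not from the slides) that evaluates this sigmoid and checks the 0/1 asymptote behaviour:

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))                           # 0.5 at the origin
print(sigmoid(np.array([-10, -1, 1, 10])))  # approaches 0 for large negative z, 1 for large positive z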












Interpreting hypothesis output
When our hypothesis hθ(x) outputs a number, we treat that value as the estimated probability that y = 1 on input x.
Example:
If x is a feature vector with x0 = 1 (as always) and x1 = tumour size, then hθ(x) = 0.7 tells a patient they have a 70% chance of the tumour being malignant.
hθ(x) = P(y = 1 | x; θ)
What does this mean? The probability that y = 1, given x, parameterized by θ.
Since this is a binary classification task we know y = 0 or 1, so the following must be true:
  P(y = 1 | x; θ) + P(y = 0 | x; θ) = 1
  P(y = 0 | x; θ) = 1 - P(y = 1 | x; θ)








Decision boundary
This gives a better sense of what the hypothesis function is computing.
One way of using the sigmoid function is:
  When the probability of y being 1 is greater than 0.5, we predict y = 1; else we predict y = 0.
When exactly is hθ(x) greater than 0.5? Look at the sigmoid function:
  g(z) is greater than or equal to 0.5 when z is greater than or equal to 0.
So if z is positive, g(z) is greater than 0.5. Here z = θ^T x, so when θ^T x >= 0, then hθ(x) >= 0.5.
So what we've shown is that the hypothesis predicts y = 1 when θ^T x >= 0.
The corollary is that when θ^T x <= 0, the hypothesis predicts y = 0.
Let's use this to better understand how the hypothesis makes its predictions.

Consider hθ(x) = g(θ0 + θ1*x1 + θ2*x2).

For example, take θ0 = -3, θ1 = 1, θ2 = 1.
Our parameter vector is a column vector with the above values, so θ^T is the row vector [-3, 1, 1].
What does this mean? The z here becomes θ^T x, so we predict "y = 1" if
  -3*x0 + 1*x1 + 1*x2 >= 0
  -3 + x1 + x2 >= 0
We can re-write this as: if x1 + x2 >= 3, then we predict y = 1.
If we plot x1 + x2 = 3 we graphically plot our decision boundary.
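A quick numpy check of this rule (illustrative only; the θ values are the ones from the example above):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-3.0, 1.0, 1.0])   # [θ0, θ1, θ2]
x = np.array([1.0, 2.0, 2.0])        # [x0 = 1, x1, x2]; x1 + x2 = 4 >= 3
print(sigmoid(theta @ x) >= 0.5)     # True -> predict y = 1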

Now consider hθ(x) = g(θ0 + θ1*x1 + θ2*x2 + θ3*x1^2 + θ4*x2^2).

Say θ^T was [-1, 0, 0, 1, 1]; then we predict "y = 1" if
  -1 + x1^2 + x2^2 >= 0, or
  x1^2 + x2^2 >= 1.
If we plot x1^2 + x2^2 = 1, the decision boundary is a circle of radius 1 around the origin.

Cost function for logistic regression
Linear regression uses the squared-error cost function to determine θ:

  J(θ) = (1/2m) * Σ_{i=1..m} (hθ(x^(i)) - y^(i))^2

We could use this function for logistic regression, but it turns out to be a non-convex function for parameter optimization.

What do we mean by non-convex?
We have some function, J(θ), for determining the parameters.
Our hypothesis function has a non-linearity (the sigmoid inside hθ(x)); this is a complicated non-linear function.
If you take hθ(x) and plug it into the Cost() function, then plug the Cost() function into J(θ) and plot J(θ), we find many local optima -> a non-convex function.
Why is this a problem? Lots of local minima mean gradient descent may not find the global optimum - it may get stuck in a local minimum.
We would like a convex function, so that if you run gradient descent you converge to the global minimum.

A convex logistic regression cost function
To get around this we need a different, convex Cost() function, which means we can apply gradient descent:

  Cost(hθ(x), y) = -log(hθ(x))       if y = 1
  Cost(hθ(x), y) = -log(1 - hθ(x))   if y = 0

The above two functions can be compressed into a single function, i.e.

  Cost(hθ(x), y) = -y*log(hθ(x)) - (1 - y)*log(1 - hθ(x))
Gradient Descent
Now the question arises: how do we reduce the cost value? This can be done by using Gradient Descent. The main goal of gradient descent is to minimize the cost value, i.e. min J(θ).
To minimize our cost function we need to run the gradient descent update on each parameter:

  θj := θj - α * ∂J(θ)/∂θj   (simultaneously for all j)

Gradient descent has an analogy in which we have to imagine ourselves at the top of a mountain valley, left stranded and blindfolded; our objective is to reach the bottom of the hill. Feeling the slope of the terrain around you is what everyone would do. This action is analogous to calculating the gradient, and taking a step is analogous to one iteration of the update to the parameters.
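A compact numpy sketch of these update steps for logistic regression (illustrative only; the toy dataset, the learning rate α and the iteration count are made up for this example):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 examples, 2 features plus the x0 = 1 intercept column
X = np.array([[1, 0.5, 1.5], [1, 1.0, 1.0], [1, 3.0, 2.5], [1, 2.5, 3.0]])
y = np.array([0, 0, 1, 1])
theta = np.zeros(3)
alpha, m = 0.1, len(y)

for _ in range(1000):
    h = sigmoid(X @ theta)          # hypothesis for all examples
    grad = (X.T @ (h - y)) / m      # ∂J/∂θ for the log-loss cost
    theta -= alpha * grad           # simultaneous update of all θj

print(theta, (sigmoid(X @ theta) >= 0.5).astype(int))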




Multiclass classification problems
Getting logistic regression to work for multiclass classification using one-vs-all.
Multiclass means more than yes or no (1 or 0): classification with multiple classes.

Given a dataset with three classes, how do we get a learning algorithm to work?
Use one-vs-all classification to make binary classification work for multiclass classification.

One-vs-all classification: split the training set into three separate binary classification problems, i.e. create a new "fake" training set for each class:
  Triangles (1) vs crosses and squares (0):  hθ^(1)(x) = P(y = 1 | x; θ)
  Crosses (1) vs triangles and squares (0):  hθ^(2)(x) = P(y = 2 | x; θ)
  Squares (1) vs crosses and triangles (0):  hθ^(3)(x) = P(y = 3 | x; θ)

Train a logistic regression classifier hθ^(i)(x) for each class i to predict the probability that y = i.
On a new input x, to make a prediction, pick the class i that maximizes hθ^(i)(x).
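A short scikit-learn sketch of one-vs-all logistic regression on a toy 3-class dataset (the dataset and settings here are illustrative, not from the slides):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy 3-class problem
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

# One binary logistic regression classifier per class; prediction picks the class
# whose classifier outputs the highest probability
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(ovr.predict(X[:5]), ovr.predict_proba(X[:5]).round(2))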

K-Nearest Neighbors

This algorithm classifies cases based on their similarity to other cases.
In K-Nearest Neighbors, data points that are near each other are said to be neighbors.
K-Nearest Neighbors is based on this paradigm: similar cases with the same class labels are near each other.
Thus, the distance between two cases is a measure of their dissimilarity.
There are different ways to calculate the similarity, or conversely the distance or dissimilarity, of two data points. For example, this can be done using Euclidean distance.

The K-Nearest Neighbors algorithm works as follows:
1. Pick a value for K.
2. Calculate the distance from the new (held-out) case to each of the cases in the dataset.
3. Search for the K observations in the training data that are nearest to the measurements of the unknown data point.
4. Predict the response of the unknown data point using the most popular response value from the K nearest neighbors.

There are two parts of this algorithm that might be a bit confusing:
first, how to select the correct K; second, how to compute the similarity between cases.
Let's first start with the second concern.

How to select the correct K
As mentioned, K in K-Nearest Neighbors is the number of nearest neighbors to examine. It is supposed to be specified by the user. So, how do we choose the right K?
Assume that we want to find the class of the customer noted by the question mark on the chart.
What happens if we choose a very low value of K, say K equals one? The first nearest point would be blue, which is class one. This would be a bad prediction, since more of the points around it are magenta, or class four.

In fact, since its nearest neighbor is blue, we can say that we captured the noise in the data, or we chose one of the points that was an anomaly in the data.
A low value of K causes a highly complex model as well, which might result in overfitting of the model. It means the prediction process is not generalized enough to be used for out-of-sample cases. Out-of-sample data is data that is outside of the dataset used to train the model. In other words, the model cannot be trusted to be used for prediction of unknown samples. It's important to remember that overfitting is bad, as we want a general model that works for any data, not just the data used for training.
Now, on the opposite side of the spectrum, if we choose a very high value of K, such as K equals 20, then the model becomes overly generalized.

So, how can we find the best value for K?
The general solution is to reserve a part of your data for testing the accuracy of the model. Once you've done so, choose K equals one, use the training part for modeling, and calculate the accuracy of prediction using all the samples in your test set.
Repeat this process, increasing K, and see which K is best for your model. For example, in our case, K equals four gives us the best accuracy.
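A scikit-learn sketch of this K-selection loop (a toy dataset stands in here for the customer data on the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)

# Try K = 1..10 and keep the test-set accuracy for each
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, round(knn.score(X_test, y_test), 3))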

Advantages of KNN

1. No training period: KNN is called a Lazy Learner (instance-based learning). It does not learn anything in the training period and does not derive any discriminative function from the training data. In other words, there is no training period for it. It stores the training dataset and learns from it only at the time of making real-time predictions. This makes the KNN algorithm much faster than other algorithms that require training, e.g. SVM, linear regression, etc.

2. Since the KNN algorithm requires no training before making predictions, new data can be added seamlessly, which will not impact the accuracy of the algorithm.

3. KNN is very easy to implement. There are only two parameters required to implement KNN, i.e. the value of K and the distance function (e.g. Euclidean or Manhattan).

Disadvantages of KNN

1. Does not work well with large datasets: in large datasets, the cost of calculating the distance between the new point and each existing point is huge, which degrades the performance of the algorithm.

2. Does not work well with high dimensions: the KNN algorithm doesn't work well with high-dimensional data because, with a large number of dimensions, it becomes difficult for the algorithm to calculate the distance in each dimension.

3. Sensitive to noisy data, missing values and outliers: KNN is sensitive to noise in the dataset. We need to manually impute missing values and remove outliers.

SUPPORT VECTOR MACHINE (SVM)
A Support Vector Machine is a supervised algorithm that can classify cases by finding a separator.
SVM works by first mapping data to a high-dimensional feature space so that data points can be categorized, even when the data are not linearly separable. Then a separator is estimated for the data. The data should be transformed in such a way that a separator can be drawn as a hyperplane.

Therefore, the SVM algorithm outputs an optimal hyperplane that categorizes new examples.

DATA TRANSFORMATION
For the sake of simplicity, imagine that our dataset is one-dimensional; this means we have only one feature x. As you can see, it is not linearly separable. We can transfer it into a two-dimensional space: for example, you can increase the dimension of the data by mapping x into a new space using a function with outputs x and x squared.
Basically, mapping data into a higher-dimensional space is called kernelling. The mathematical function used for the transformation is known as the kernel function, and it can be of different types, such as linear, polynomial, Radial Basis Function (RBF), or sigmoid.
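A brief scikit-learn sketch of an SVM classifier with an RBF kernel (a toy nonlinear dataset is used here purely for illustration):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A dataset that is not linearly separable in its original space
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# The RBF kernel implicitly maps the data to a higher-dimensional feature space
clf = SVC(kernel='rbf', gamma='scale', C=1.0).fit(X_train, y_train)
print('Test accuracy:', round(clf.score(X_test, y_test), 3))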

SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes, as shown here. As we're in a two-dimensional space, you can think of the hyperplane as a line that linearly separates the blue points from the red points.

ADVANTAGES
- Accurate in high-dimensional spaces
- Memory efficient
DISADVANTAGES
- Suited only to small datasets (not efficient on large ones)
- Prone to overfitting
APPLICATIONS
- Image recognition
- Spam detection

Naive Bayes Classifiers
A COLLECTION OF CLASSIFICATION ALGORITHMS

Principle of the Naive Bayes Classifier:

A Naive Bayes classifier is a probabilistic machine learning model that is used for classification tasks. The crux of the classifier is based on Bayes' theorem, which can be written as:

  P(y|X) = P(X|y) * P(y) / P(X)

It is not a single algorithm but a family of algorithms which all share a common principle, i.e. every pair of features being classified is independent of each other.

Example:
Let us take an example to get some better intuition. Consider the problem of playing golf. The dataset is represented as below.

We classify whether the day is suitable for playing golf, given the features of the day. The columns represent these features and the rows represent individual entries. If we take the first row of the dataset, we can observe that the day is not suitable for playing golf if the outlook is rainy, the temperature is hot, the humidity is high and it is not windy. We make two assumptions here: first, as stated above, we consider that these predictors are independent; that is, if the temperature is hot, it does not necessarily mean that the humidity is high. The second assumption is that all the predictors have an equal effect on the outcome; that is, the day being windy does not have more importance in deciding whether to play golf or not.
According to this example, Bayes' theorem can be rewritten as:

  P(y|X) = P(X|y) * P(y) / P(X)

The variable y is the class variable (play golf), which represents whether it is suitable to play golf or not given the conditions. The variable X represents the parameters/features.

X is given as X = (x_1, x_2, ..., x_n), where x_1, x_2, ..., x_n represent the features, i.e. they can be mapped to outlook, temperature, humidity and windy. By substituting for X and expanding using the chain rule we get:

  P(y|x_1, ..., x_n) = P(x_1|y) * P(x_2|y) * ... * P(x_n|y) * P(y) / (P(x_1) * P(x_2) * ... * P(x_n))

Now, you can obtain the values for each term by looking at the dataset and substituting them into the equation. For all entries in the dataset, the denominator does not change; it remains static. Therefore, the denominator can be removed and a proportionality can be introduced:

  P(y|x_1, ..., x_n) ∝ P(y) * Π_i P(x_i|y)

In our case, the class variable (y) has only two outcomes, yes or no, but there could be cases where the classification is multiclass. Therefore, we need to find the class y with maximum probability:

  y = argmax_y P(y) * Π_i P(x_i|y)

Using the above function, we can obtain the class, given the predictors.

We need to find P(x_i | y_j) for each x_i in X and y_j in y. All these calculations have been demonstrated in the tables below: in the figure above, we have calculated P(x_i | y_j) for each x_i in X and y_j in y manually in tables 1-4. For example, the probability of playing golf given that the temperature is cool, i.e. P(temp = cool | play golf = Yes), is 3/9.

Also, we need to find the class probabilities P(y), which have been calculated in table 5. For example, P(play golf = Yes) = 9/14.
So now we are done with our pre-computations and the classifier is ready!
Let us test it on a new set of features (let us call it "today"):

Types of Naive Bayes Classifier:

Multinomial Naive Bayes: This is mostly used for document classification problems, i.e. whether a document belongs to the category of sports, politics, technology, etc. The features/predictors used by the classifier are the frequencies of the words present.
Bernoulli Naive Bayes: This is similar to multinomial naive Bayes, but the predictors are boolean variables. The parameters that we use to predict the class variable take up only the values yes or no, for example whether a word occurs in the text or not.
Gaussian Naive Bayes: When the predictors take a continuous value and are not discrete, we assume that these values are sampled from a Gaussian distribution.
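A short scikit-learn sketch of a Gaussian Naive Bayes classifier on a toy continuous-feature dataset (illustrative; the golf table above would instead use categorical counts):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Each feature is modelled with a per-class Gaussian; class priors come from the training labels
nb = GaussianNB().fit(X_train, y_train)
print('Test accuracy:', round(nb.score(X_test, y_test), 3))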

Gaussian Distribution (Normal Distribution)

Conclusion:
Naive Bayes algorithms are mostly used in sentiment analysis, spam filtering, recommendation systems, etc. They are fast and easy to implement, but their biggest disadvantage is the requirement that the predictors be independent. In most real-life cases the predictors are dependent, and this hinders the performance of the classifier.

Decision Tree
CLASSIFICATION ALGORITHM

The decision tree algorithm falls under the category of supervised learning. It can be used to solve both regression and classification problems.
A decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy). A leaf node (e.g., Play) represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
We can represent any boolean function on discrete attributes using a decision tree.

Types of decision trees
Categorical Variable Decision Tree: a decision tree which has a categorical target variable is called a categorical variable decision tree.
Continuous Variable Decision Tree: a decision tree which has a continuous target variable is called a continuous variable decision tree.

Root Node: It represents the entire population or sample, and this further gets divided into two or more homogeneous sets.
Splitting: It is the process of dividing a node into two or more sub-nodes.
Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
Leaf/Terminal Node: Nodes with no children (no further split) are called leaf or terminal nodes.
Pruning: When we reduce the size of a decision tree by removing nodes (the opposite of splitting), the process is called pruning.
Branch/Sub-Tree: A subsection of a decision tree is called a branch or sub-tree.
Parent and Child Node: A node which is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.

Algorithm

Algorithms used in decision trees:
ID3
Gini Index
Chi-Square
Reduction in Variance

The core algorithm for building decision trees is called ID3. It was developed by J. R. Quinlan and it uses Entropy and Information Gain to construct a decision tree.
The ID3 algorithm begins with the original set S as the root node. On each iteration, it iterates through every unused attribute of the set S and calculates the entropy H(S) or information gain IG(S) of that attribute. It then selects the attribute which has the smallest entropy (or largest information gain) value. The set S is then split or partitioned by the selected attribute to produce subsets of the data.

Entropy
Entropy is a measure of the randomness in the information being processed. The higher the entropy, the harder it is to draw any conclusions from that information. The decision tree algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous, the entropy is zero, and if the sample is equally divided, it has an entropy of one:

  E(S) = -Σ_i p_i * log2(p_i)
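A tiny Python sketch of this calculation, using the play-golf label counts referenced later in the slides (9 Yes, 5 No) as an example:

import math

def entropy(counts):
    # E(S) = -sum(p_i * log2(p_i)) over the class proportions
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([9, 5]))   # entropy of the Play Golf target (~0.94)
print(entropy([7, 7]))   # an equally divided sample -> 1.0
print(entropy([14, 0]))  # a completely homogeneous sample -> 0.0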

Example:

To build a decision tree, we need to calculate two types of entropy using frequency tables, as follows:
a) Entropy using the frequency table of one attribute.
b) Entropy using the frequency table of two attributes.

Information gain
The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

Step 1: Calculate the entropy of the target.
Step 2: The dataset is then split on the different attributes. The entropy for each branch is calculated, then added proportionally to get the total entropy for the split. The resulting entropy is subtracted from the entropy before the split. The result is the information gain, or decrease in entropy.
Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches and repeat the same process on every branch.
Step 4a: A branch with entropy of 0 is a leaf node.
Step 4b: A branch with entropy greater than 0 needs further splitting.
Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.

Decision Tree to Decision Rules
A decision tree can easily be transformed into a set of rules by mapping from the root node to the leaf nodes one by one.

Limitations of Decision Trees
Decision trees tend to have high variance when they utilize different training and test sets of the same data, since they tend to overfit on training data. This leads to poor performance on unseen data. Unfortunately, this limits the usage of decision trees in predictive modeling.
To overcome these problems we use ensemble methods: we can create models that utilize underlying (weak) decision trees as a foundation for producing powerful results, and this is done in the Random Forest algorithm.

Random Forest

Definition:
The random forest algorithm is a supervised classification algorithm based on decision trees. Random forests, also known as random decision forests, are a popular ensemble method that can be used to build predictive models for both classification and regression problems.
By "ensemble" we mean (in the random forest context) the collective decisions of different decision trees. In a random forest, we make a prediction about the class not simply based on one decision tree, but by an (almost) unanimous prediction made by 'K' decision trees.

Construction:
'K' individual decision trees are made from the given dataset by randomly dividing the dataset and the feature subspace through a process called bootstrap aggregation (bagging), which is a process of random selection with replacement. Generally 2/3rd of the dataset (row-wise) is selected by bagging, and on that selected dataset we perform what is called attribute bagging.

Attribute bagging is done to select 'm' features from the given M features (this process is also called random subspace creation). Generally the value of 'm' is the square root of M. We then select, say, 10 such values of m, build 10 decision trees based on them, and test the remaining 1/3rd of the dataset on these 10 decision trees. We then select the best decision tree out of these, and repeat the whole process 'K' times to build 'K' such decision trees.

Classification:
Prediction in a random forest (a collection of 'K' decision trees) is truly ensemble, i.e., each decision tree predicts the class of the instance, and the class which was predicted most often is returned.

Using RandomForestClassifier

# Importing libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

TRAIN_DIR = "../train-mails"
TEST_DIR = "../test-mails"
# make_Dictionary and extract_features are helper functions from the email example
dictionary = make_Dictionary(TRAIN_DIR)
print("reading and processing emails from file.")
features_matrix, labels = extract_features(TRAIN_DIR)
test_feature_matrix, test_labels = extract_features(TEST_DIR)
# Creating the model
model = RandomForestClassifier()
print("Training model.")
# Training the model
model.fit(features_matrix, labels)
# Predicting
predicted_labels = model.predict(test_feature_matrix)
print("FINISHED classifying. accuracy score:")
print(accuracy_score(test_labels, predicted_labels))

We will get an accuracy of around 95.7%.

Parameters

Let's understand and experiment with some of the tuning parameters:
n_estimators: the number of trees in the forest. Default is 10.
criterion: "gini" or "entropy", same as the decision tree classifier.
min_samples_split: the minimum number of samples required at a node to split it. Default is 2.
Play with these parameters by changing values individually and in combination, and check if you can improve accuracy. Trying the following combination gave the accuracy shown in the image on the next slide.

Final Thoughts

The Random Forest Classifier, being an ensemble algorithm, tends to give a more accurate result. This is because it works on the principle that a number of weak estimators, when combined, form a strong estimator. Even if one or a few decision trees are prone to noise, the overall result tends to be correct. Even with a small number of estimators (n_estimators = 30), it gave us an accuracy as high as 97%.

Clustering
UNSUPERVISED LEARNING

Clustering

A cluster is a subset of data points which are similar.
Clustering (also called unsupervised learning) is the process of dividing a dataset into groups such that the members of each group are as similar (close) as possible to one another, and different groups are as dissimilar (far) as possible from one another.
Generally, it is used as a process to find meaningful structure, generative features, and groupings inherent in a set of examples. Clustering can uncover previously undetected relationships in a dataset.
There are many applications for cluster analysis. For example, in business, cluster analysis can be used to discover and characterize customer segments for marketing purposes, and in biology, it can be used for the classification of plants and animals given their features.

Clustering Algorithms

K-means Algorithm
The simplest among the unsupervised learning algorithms, it works on the principle of k-means clustering: the clustered groups (clusters) for a given set of data are represented by a variable 'k'. For each cluster, a centroid (the arithmetic mean of all the data points that belong to that cluster) is defined.
The centroid is a point at the centre of each cluster (considering Euclidean distance). The trick is to define the centroids far away from each other so that the variation is less. After this, each data point is assigned to the nearest centroid such that the sum of the squared distances between the data points and their cluster's centroid is at the minimum.

Algorithm
1. Cluster the data into k groups, where k is predefined.
2. Select k points at random as cluster centers.
3. Assign objects to their closest cluster center according to the Euclidean distance function.
4. Calculate the centroid, or mean, of all objects in each cluster.
5. Repeat steps 3 and 4 until the same points are assigned to each cluster in consecutive rounds.
The Euclidean distance between two points, in either the plane or 3-dimensional space, measures the length of a segment connecting the two points.
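A minimal scikit-learn sketch of these steps (synthetic blob data stands in for the sensor/image examples mentioned below):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 natural groupings
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# n_init runs the random-initialisation + assign + update loop several times and keeps the best result
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final centroids
print(km.labels_[:10])       # cluster assignment of the first 10 points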

The step-by-step process:

The k-means clustering algorithm has been found to be very useful in grouping new data. Some practical applications which use k-means clustering are sensor measurements, activity monitoring in a manufacturing process, audio detection and image segmentation.

Disadvantages of K-Means:
K-Means forms spherical clusters only. The algorithm fails when the data is not spherical (i.e. does not have the same variance in all directions).
The K-Means algorithm is sensitive to outliers; outliers can skew the clusters in K-Means to a very large extent.
The K-Means algorithm requires one to specify the number of clusters, and there is no global method to choose the best value.

Hierarchical Clustering Algorithms
Last but not least are the hierarchical clustering algorithms. These algorithms have clusters sorted in an order based on the hierarchy of data similarity observations. Hierarchical clustering is categorised into two types: divisive (top-down) clustering and agglomerative (bottom-up) clustering.
Agglomerative hierarchical clustering technique: In this technique, initially each data point is considered an individual cluster. At each iteration, the similar clusters merge with other clusters until one cluster or K clusters are formed.
Divisive hierarchical clustering technique: Divisive hierarchical clustering is exactly the opposite of agglomerative hierarchical clustering. In divisive hierarchical clustering, we consider all the data points as a single cluster, and in each iteration we separate the data points from the cluster which are not similar. Each data point which is separated is considered an individual cluster.
Most of the hierarchical algorithms, such as single linkage, complete linkage, median linkage and Ward's method, among others, follow the agglomerative approach.
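A short scikit-learn sketch of the agglomerative (bottom-up) variant; the linkage parameter corresponds to the MIN/MAX/group-average criteria discussed next (the blob data is only an illustration):

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# linkage='single' ~ MIN, 'complete' ~ MAX, 'average' ~ group average, 'ward' ~ Ward's method
agg = AgglomerativeClustering(n_clusters=3, linkage='average').fit(X)
print(agg.labels_[:10])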

Calculating the similarity between two clusters is important in order to merge or divide clusters. There are certain approaches which are used to calculate the similarity between two clusters:
MIN: Also known as the single-linkage algorithm, it defines the similarity of two clusters C1 and C2 as the minimum of the similarity between points Pi and Pj such that Pi belongs to C1 and Pj belongs to C2.
This approach can separate non-elliptical shapes as long as the gap between two clusters is not small.
The MIN approach cannot separate clusters properly if there is noise between the clusters.

MAX: Also known as the complete-linkage algorithm, this is exactly opposite to the MIN approach. The similarity of two clusters C1 and C2 is equal to the maximum of the similarity between points Pi and Pj such that Pi belongs to C1 and Pj belongs to C2.
The MAX approach does well in separating clusters if there is noise between the clusters, but it tends to break large clusters.

Group Average: Take all the pairs of points, compute their similarities, and calculate the average of the similarities.
The group average approach does well in separating clusters if there is noise between the clusters, but it is a less popular technique in the real world.

Limitations of the hierarchical clustering technique:
There is no mathematical objective for hierarchical clustering.
All of the approaches to calculate the similarity between clusters have their own disadvantages.
Hierarchical clustering has high space and time complexity; hence, this clustering algorithm cannot be used when we have huge amounts of data.

Density-based spatial clustering of applications with noise (DBSCAN) is a well-known data clustering algorithm that is commonly used in data mining and machine learning.
Unlike K-means, DBSCAN does not require the user to specify the number of clusters to be generated.
DBSCAN can find any shape of clusters; the cluster doesn't have to be circular.
DBSCAN can identify outliers.
The basic idea behind the density-based clustering approach is derived from a human intuitive clustering method: by looking at the figure below, one can easily identify four clusters along with several points of noise, because of the differences in the density of points.

The DBSCAN algorithm has two parameters:
ε: the radius of the neighborhood around a data point p.
minPts: the minimum number of data points we want in a neighborhood to define a cluster.
Using these two parameters, DBSCAN categorizes the data points into three categories:
Core points: a data point p is a core point if Nbhd(p, ε) [the ε-neighborhood of p] contains at least minPts points; |Nbhd(p, ε)| >= minPts.
Border points: a data point q is a border point if Nbhd(q, ε) contains fewer than minPts data points, but q is reachable from some core point p.
Outliers: a data point o is an outlier if it is neither a core point nor a border point. Essentially, this is the "other" class.

The steps of the DBSCAN algorithm are:
1. Pick a point at random that has not been assigned to a cluster or been designated as an outlier. Compute its neighborhood to determine if it is a core point. If yes, start a cluster around this point. If no, label the point as an outlier.
2. Once we find a core point and thus a cluster, expand the cluster by adding all directly reachable points to it. Perform "neighborhood jumps" to find all density-reachable points and add them to the cluster. If an outlier is added, change that point's status from outlier to border point.
3. Repeat these two steps until all points are either assigned to a cluster or designated as outliers.

Below is the DBSCAN clustering algorithm in pseudocode:

DBSCAN(dataset, eps, MinPts) {
    # cluster index
    C = 1
    for each unvisited point p in dataset {
        mark p as visited
        # find neighbors
        Neighbors N = find the neighboring points of p
        if |N| >= MinPts:
            # p is a core point: expand cluster C around it
            for each point p' in N {
                Neighbors N' = find the neighboring points of p'
                if |N'| >= MinPts:
                    N = N U N'          # p' is also a core point; keep expanding the search set
                if p' is not a member of any cluster:
                    add p' to cluster C
            }
            C = C + 1
        else:
            mark p as an outlier (noise)
    }
}
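For comparison, a minimal scikit-learn sketch of the same idea (eps and min_samples map to ε and minPts; the moons dataset is just an illustration):

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))   # cluster ids; -1 marks points labelled as outliers/noise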

CROSS VALIDATION:
Cross-validation is a technique in which we train our model using a subset of the dataset and then evaluate it using the complementary subset of the dataset.
The three steps involved in cross-validation are as follows:
1. Split the dataset into a training set and a test set.
2. Train the model using the training set.
3. Test the model using the test set.
USE: to get good out-of-sample accuracy.
Even when we use the cross-validation technique, we get variation in accuracy when we train our model; to deal with that, we use the K-fold cross-validation technique.

In K-fold cross-validation, we split the dataset into k subsets (known as folds), then we perform training on k-1 of the subsets and leave one subset out for the evaluation of the trained model. In this method, we iterate k times, with a different subset reserved for testing each time.

Code in Python for k-fold cross-validation:

from sklearn.model_selection import cross_val_score
# estimator = your model object, X = training inputs, y = correct outputs, cv = number of folds (k)
scores = cross_val_score(estimator=model, X=X_train, y=y_train, cv=10)

The first line imports the k-fold cross-validation function from the model_selection sub-library of the sklearn library.
The second line gives a list of accuracies based on the k value; we need to average them to get the model's accuracy.

How do we choose optimal values for the hyperparameters?
Hyperparameters are the parameters that cannot be directly learned from the regular training process. They are usually fixed before the actual training process begins. These parameters express important properties of the model, such as its complexity or how fast it should learn.
Example: the k in k-nearest neighbours.
Models can have many hyperparameters, and finding the best combination of parameters can be treated as a search problem. One of the best strategies for hyperparameter tuning is grid search.

GridSearchCV:
In the GridSearchCV approach, the machine learning model is evaluated for a range of hyperparameter values. The approach is called GridSearchCV because it searches for the best set of hyperparameters from a grid of hyperparameter values.
For example, if we want to set the hyperparameter of a K-nearest neighbours model with different sets of values, the grid search technique will check the model with all possible combinations of hyperparameters and will return the best one.
From the graph we can say the best value for k is 10; grid search will search all the values of k that we give in the range and return the best one.

Code in Python for getting optimal hyperparameters using grid search for a support vector machine:

# importing SVC (support vector classifier) from sklearn
from sklearn.svm import SVC
classifier = SVC()
# import the GridSearchCV class from the sklearn library
from sklearn.model_selection import GridSearchCV
# creating a list of dictionaries that will be the input for grid search
parameters = [{'C': [1, 10, 100, 1000], 'kernel': ['linear']},
              {'C': [1, 10, 100, 1000], 'kernel': ['rbf'], 'gamma': [0.5, 0.1, 0.01, 0.001]}]
# creating the grid search object
gridsearch = GridSearchCV(estimator=classifier, param_grid=parameters,
                          scoring='accuracy', cv=10, n_jobs=-1)
# fitting grid search with the dataset
gd = gridsearch.fit(X_train, y_train)
# best score among all models in the grid search
best_accuracy = gd.best_score_
# return the parameters of the best model
best_param = gd.best_params_

XGBOOST
XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.
In this algorithm, decision trees are created in sequential form. Weights play an important role in XGBoost: weights are assigned to all the independent variables, which are then fed into the decision tree that predicts results.

The weights of variables predicted wrongly by the tree are increased, and these variables are then fed to the second decision tree. These individual classifiers/predictors then ensemble to give a strong and more precise model. XGBoost can work on regression, classification, ranking, and user-defined prediction problems.
The final prediction combines the outputs of all the sequential trees, each new tree correcting the errors of the previous ones: the combined score gives the predicted value for a regression problem, or is mapped to a predicted class for a classification problem.

Code in Python for XGBoost:
# Fitting XGBoost to the training data for a classifier
import xgboost as xgb
my_model = xgb.XGBClassifier()
my_model.fit(X_train, y_train)
# predicting the test set results
y_pred = my_model.predict(X_test)
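A brief follow-up sketch (assuming the same X_test/y_test split) to check how well the boosted model does:

from sklearn.metrics import accuracy_score
print('Test accuracy:', accuracy_score(y_test, y_pred))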