MLT Unit 4


Unsupervised Learning
Clustering – K-means – K-mode – K-median – Hierarchical clustering – DBSCAN – Principal Component Analysis – Independent Component Analysis

ADVANCED LEARNING
Sampling – Basic sampling methods – Monte Carlo – Gibbs Sampling – Computational Learning theory – Reinforcement learning – Markov Decision Processes.

INTRODUCTION
What Is Clustering?
• Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.
• Intra-Cluster Similarity: points within the same cluster are highly similar.
• Inter-Cluster Dissimilarity: points from different clusters are significantly different.
• Centroid: the center of a cluster, often representing the "average" of all points in the cluster.

Types of Clustering
1. Connectivity-based clustering: this type of clustering algorithm builds clusters based on the connectivity between the data points.
2. Centroid-based clustering: this type of clustering algorithm forms clusters around the centroids of the data points.
3. Distribution-based clustering: this type of clustering algorithm is modeled using statistical distributions. It assumes that the data points in a cluster are generated from a particular probability distribution, and the algorithm aims to estimate the parameters of the distribution to group similar data points into clusters.
4. Density-based clustering: this type of clustering algorithm groups together data points that lie in high-density concentrations and separates points in low-density regions. The basic idea is that it identifies regions in the data space that have a high density of data points and groups those points together into clusters.

K-MEANS CLUSTERING
• The K-means algorithm is an unsupervised algorithm for partitioning a dataset into 'k' groups based on the similarity of features, where k < n.
• Data points within the same group (cluster) are more similar to each other than to those in other groups.
• Doesn't require labeled data.
• The centroid is the central point of a cluster, calculated as the mean of all points in the cluster.
• K-means aims to minimize the sum of squared errors within clusters:

  J = \sum_{j=1}^{K} \sum_{x_n \in S_j} \lVert x_n - u_j \rVert^2

where x_n is a vector representing the n-th data point, u_j is the geometric centroid of the data points in S_j, K is the number of clusters, and S_j is the set of points in cluster j.
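As an illustration only, the objective above can be evaluated directly with NumPy; the array names, the example data, and the cluster assignment below are assumptions made for this sketch, not part of the slides.

import numpy as np

def kmeans_sse(X, labels, centroids):
    """Sum of squared distances of each point to its assigned centroid (the K-means objective J)."""
    return sum(np.sum((X[labels == j] - centroids[j]) ** 2) for j in range(len(centroids)))

# Hypothetical data: two groups around x = 1 and x = 10
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])
centroids = np.array([X[labels == 0].mean(axis=0), X[labels == 1].mean(axis=0)])
print(kmeans_sse(X, labels, centroids))  # 16.0 for this assignment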

Common Distance Measures
The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters. They include:
1. The Euclidean distance (also called 2-norm distance), given by:
   d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}
2. The Manhattan distance (also called taxicab norm or 1-norm), given by:
   d(x, y) = \sum_{i=1}^{m} |x_i - y_i|
3. Minkowski distance
4. Cosine similarity
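A minimal sketch of these four measures for NumPy vectors; the function names and test vectors are illustrative, not taken from the slides.

import numpy as np

def euclidean(x, y):           # 2-norm distance
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):           # 1-norm / taxicab distance
    return np.sum(np.abs(x - y))

def minkowski(x, y, p=3):      # general p-norm; p=1 gives Manhattan, p=2 gives Euclidean
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def cosine_similarity(x, y):   # 1 means same direction, 0 means orthogonal
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(x, y), manhattan(x, y), minkowski(x, y), cosine_similarity(x, y))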

Steps of the K-means Algorithm
• Initialization:
  • Choose the number of clusters, K.
  • Randomly select K initial centroids (either randomly chosen data points or randomly generated points).
• Assignment Step:
  • Assign each data point to the nearest centroid using a distance metric (usually Euclidean distance).
  • This forms K clusters.
• Update Step:
  • Calculate the new centroid for each cluster by finding the mean of all data points assigned to that cluster.
• Convergence Check:
  • Repeat the assignment and update steps until:
  • The centroids no longer change significantly, or
  • A maximum number of iterations is reached.
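The steps above translate almost line for line into NumPy. The following is a minimal sketch under simplifying assumptions (random initialisation from the data, Euclidean distance, empty clusters not handled), not a production implementation.

import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization: pick K random data points as the initial centroids
    centroids = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: new centroid = mean of the points assigned to that cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(K)])
        # Convergence check: stop when centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids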

Advantages of K-means
• Simple & Fast: easy to implement and computationally efficient, especially for large datasets.
• Scalable: works well with high-dimensional data.
• Flexible: can be adapted with different distance measures.

Limitations of K-means
• Requires K: you need to specify the number of clusters in advance.
• Sensitive to Initialization: poor initialization can lead to suboptimal solutions (addressed by K-means++ initialization).
• Assumes Spherical Clusters: works best when clusters are of similar size and density.
• Outlier Impact: sensitive to outliers, which can distort the centroid positions.

Applications of K-means
• Image Compression: reducing the number of colors in an image.
• Customer Segmentation: grouping customers based on purchasing behavior.
• Anomaly Detection: identifying outliers in data.
• Document Clustering: organizing documents with similar content.

K-means Example Dataset
The following is a 2D dataset with 6 data points representing (x, y) coordinates:

Point  X   Y
A      1   2
B      1   4
C      1   0
D      10  2
E      10  4
F      10  0

• Step 1: Choose the Number of Clusters (K)
  • Let's set K = 2 (expecting two clusters based on the data distribution).
• Step 2: Initialize Centroids
  • Randomly select two points as initial centroids:
  • Centroid 1 (C1) = A (1, 2)
  • Centroid 2 (C2) = D (10, 2)

• Step 3: Assign Points to the Nearest Centroid
Using the Euclidean distance formula:
  d = sqrt{(x2 - x1)^2 + (y2 - y1)^2}
• Distance Calculations:
  • Point A (1, 2):
    • To C1 (1, 2): d = 0 (same point)
    • To C2 (10, 2): sqrt{(10-1)^2 + (2-2)^2} = 9
  • Point B (1, 4):
    • To C1: d = sqrt{(1-1)^2 + (4-2)^2} = 2
    • To C2: d = sqrt{(10-1)^2 + (4-2)^2} ≈ 9.22

  • Point C (1, 0):
    • To C1: sqrt{(1-1)^2 + (0-2)^2} = 2
    • To C2: sqrt{(10-1)^2 + (0-2)^2} ≈ 9.22
  • Point D (10, 2):
    • To C1: d = 9
    • To C2: d = 0
  • Point E (10, 4):
    • To C1: sqrt{(10-1)^2 + (4-2)^2} ≈ 9.22
    • To C2: sqrt{(10-10)^2 + (4-2)^2} = 2
  • Point F (10, 0):
    • To C1: sqrt{(10-1)^2 + (0-2)^2} ≈ 9.22
    • To C2: sqrt{(10-10)^2 + (0-2)^2} = 2

• Cluster Assignments (Iteration 1):
  • Cluster 1 (C1): A (1, 2), B (1, 4), C (1, 0)
  • Cluster 2 (C2): D (10, 2), E (10, 4), F (10, 0)
• Step 4: Update Centroids
  • Calculate the mean position for each cluster.
  • New C1: ((1+1+1)/3, (2+4+0)/3) = (1, 2)  // no change from the initial centroid
  • New C2: ((10+10+10)/3, (2+4+0)/3) = (10, 2)  // no change from the initial centroid
• Step 5: Check for Convergence
  • Since the centroids didn't change, the algorithm converges and clustering is complete.
• Final Clusters:
  • Cluster 1: A (1, 2), B (1, 4), C (1, 0)
  • Cluster 2: D (10, 2), E (10, 4), F (10, 0)
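As a cross-check, the same grouping can be reproduced with scikit-learn (assuming it is available); fixing init to the centroids chosen in Step 2 mirrors the worked example, and this snippet is only a sketch of that idea.

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
# Use the Step 2 centroids A (1, 2) and D (10, 2) as the explicit initialization
km = KMeans(n_clusters=2, init=np.array([[1, 2], [10, 2]]), n_init=1).fit(X)
print(km.labels_)           # e.g. [0 0 0 1 1 1]: A, B, C together; D, E, F together
print(km.cluster_centers_)  # [[ 1.  2.] [10.  2.]]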

K-Mode Clustering
K-mode clustering is an unsupervised machine-learning technique used to group a set of data objects into a specified number of clusters based on their categorical attributes. The algorithm is called "K-Mode" because it uses modes (i.e., the most frequent values) instead of means or medians to represent the clusters.

Why K-Mode Clustering?
When K-means clustering is applied to categorical data that has been converted into a numerical form, it does not give good results, especially for high-dimensional data. So, some changes are made for categorical data:
• Replace Euclidean distance with a dissimilarity metric.
• Replace the mean by the mode for cluster centers.
• Apply a frequency-based method in each iteration to update the mode.
Because it uses the MODE, this is called K-MODE clustering.

K-Modes vs K-Means
• Type of Data: K-Means works with continuous (numerical) data; K-Modes is designed for categorical data.
• Measure of Similarity: K-Means uses distance measures (e.g., Euclidean distance); K-Modes uses dissimilarity measures (total mismatches).
• Similarity Interpretation: in K-Means, the lesser the distance, the more similar the data points are; in K-Modes, the lesser the dissimilarity (mismatches), the more similar the data points are.
• Centroid Update Mechanism: K-Means updates centroids using the mean of the data points; K-Modes updates them using the mode of the data points.
• Application / Use Case: K-Means is suitable for clustering numerical data, such as heights or weights; K-Modes is suitable for clustering categorical data, such as colors, brands, or genres.

Dissimilarity Measurement Between Two Data Objects
• The K-modes algorithm clusters categorical data by partitioning a dataset into a predefined number of clusters.
• Each cluster is defined by its mode, which represents the most frequent categorical value in that cluster.
• The algorithm calculates the distance between data objects using similarity and dissimilarity measurements.
• For K-modes, the Hamming distance is used as the dissimilarity measure.
• The Hamming distance counts the number of categorical attributes that differ between two data objects.
• Let x and y be two categorical data objects defined by m features or attributes. Then

  d(x, y) = \sum_{j=1}^{m} \delta(x_j, y_j)

where \delta(x_j, y_j) = 0 if x_j = y_j, and 1 otherwise.
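A minimal sketch of this matching (Hamming-style) dissimilarity between two categorical records; the attribute values used here are made up for illustration.

def dissimilarity(x, y):
    """Number of attributes on which two categorical records differ (Hamming-style mismatch count)."""
    return sum(a != b for a, b in zip(x, y))

p1 = ("blonde", "amber", "fair")      # hypothetical (hair color, eye color, skin color)
p2 = ("brunette", "gray", "brown")
print(dissimilarity(p1, p2))          # 3: all three attributes differ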

How Does the K-Modes Algorithm Work?
Unlike hierarchical clustering methods, we need to specify K upfront.
• Pick K observations at random and use them as leaders/clusters.
• Calculate the dissimilarities and assign each observation to its closest cluster.
• Define new modes for the clusters.
• Repeat steps 2–3 until no re-assignment is required.
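Putting these steps together, a bare-bones K-modes loop might look like the sketch below. It is an assumption-laden illustration (the mismatch count is re-defined inline, ties in the mode are broken by Counter order rather than randomly, and the example records are invented), not the algorithm as implemented in any particular library.

import random
from collections import Counter

def dissimilarity(x, y):
    # number of mismatching attribute values between two categorical records
    return sum(a != b for a, b in zip(x, y))

def k_modes(data, K, max_iters=10, seed=0):
    random.seed(seed)
    leaders = random.sample(data, K)                  # step 1: pick K observations as leaders
    for _ in range(max_iters):
        clusters = [[] for _ in range(K)]
        for row in data:                              # step 2: assign to the closest leader
            j = min(range(K), key=lambda k: dissimilarity(row, leaders[k]))
            clusters[j].append(row)
        new_leaders = [                               # step 3: column-wise mode of each cluster
            tuple(Counter(col).most_common(1)[0][0] for col in zip(*c)) if c else leaders[k]
            for k, c in enumerate(clusters)
        ]
        if new_leaders == leaders:                    # stop when no leader changes
            break
        leaders = new_leaders
    return leaders, clusters

# Hypothetical categorical records: (hair color, eye color, skin color)
people = [("brunette", "amber", "fair"), ("blonde", "gray", "brown"),
          ("brunette", "hazel", "fair"), ("black", "brown", "dark"),
          ("brunette", "amber", "fair"), ("black", "brown", "dark")]
print(k_modes(people, K=3))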

Example: Imagine we have a dataset with information about the hair color, eye color, and skin color of persons. We aim to group them based on the available information. Hair color, eye color, and skin color are all categorical variables. (The dataset table of observations P1–P8 appears on the slide.)
Number of clusters (K) = 3

Step 1: Pick K = 3 observations at random and use them as the initial leaders/clusters (the chosen leaders are shown in the slide's table).

Step 2:
Iteratively compare the cluster leaders to each of the observations. Similar attribute values give 0, dissimilar values give 1.

Comparing leader/cluster P1 to the observation P1 gives 0 dissimilarities.
Comparing leader/cluster P1 to the observation P2 gives 3 (1 + 1 + 1) dissimilarities.

Likewise, calculate all the dissimilarities, put them in a matrix (shown on the slide), and assign each observation to its closest cluster (the cluster with the least dissimilarity).
After step 2, the observations P1, P2, P5 are assigned to Cluster 1; P3, P7 are assigned to Cluster 2; and P4, P6, P8 are assigned to Cluster 3.

Step 3:
The mode is simply the most frequently occurring value.
Mark the observations according to the cluster they belong to. Observations of Cluster 1 are marked in yellow, Cluster 2 in brick red, and Cluster 3 in purple.

Considering one cluster at a time, for each feature, look for the mode and update the new leaders.
Explanation: Cluster 1 observations (P1, P2, P5) have brunette as the most observed hair color, amber as the most observed eye color, and fair as the most observed skin color.
Note: If values occur equally often, pick the mode at random. In this case, the observations of Cluster 2 (P3, P7) have one occurrence each of brown and fair skin color, so brown was chosen as the mode.
Below are the new leaders after the update.
Repeat steps 2–4.

After obtaining the new leaders, again calculate the dissimilarities between the observations and the newly obtained leaders.

Comparing Cluster 1 to the observation P1 gives 1 dissimilarity.
Comparing Cluster 1 to the observation P2 gives 2 dissimilarities.

Likewise, calculate all the dissimilarities and put them in a matrix. Assign each observation to its closest cluster.
The observations P1, P2, P5 are assigned to Cluster 1; P3, P7 are assigned to Cluster 2; and P4, P6, P8 are assigned to Cluster 3.
Since there are no changes in the assignment of observations, we can stop here.

Hierarchical Clustering

What is Hierarchical Clustering?
• Hierarchical clustering is a connectivity-based clustering model used to group data points based on their similarity or distance.
• The model assumes that data points closer to each other are more similar than those farther apart.
• The clustering process operates on the principle of proximity, grouping points into clusters in a step-by-step manner.

What is the Distance Measure?
The distance measure determines the similarity between two elements and influences the shape of the clusters. Some of the ways we can calculate distance measures include:
• Euclidean distance measure
• Squared Euclidean distance measure
• Manhattan distance measure
• Cosine distance measure

Euclidean Distance Measure
• The most common method to calculate a distance measure is to determine the straight-line distance between two points. Let's say we have a point P and a point Q: the Euclidean distance is the direct distance between the two points.
• As the points may have more than two dimensions, we compute the squared difference along each dimension, sum them, and then take the square root of that sum to get the actual distance between the points.
The formula for the distance between two points is shown below:

  d(P, Q) = \sqrt{\sum_{i} (P_i - Q_i)^2}

Squared Euclidean Distance Measurement
This is identical to the Euclidean measurement method, except we don't take the square root at the end. Whether the points are farther apart or closer together, the differences in distances can be computed faster with the squared Euclidean measurement. While the plain Euclidean method gives the exact distance, omitting the square root makes no difference when we only need to know which distance is smaller and which is larger, and removing it makes the computation faster.
The formula is shown below:

  d(P, Q) = \sum_{i} (P_i - Q_i)^2

Manhattan Distance Measurement
This method is a simple sum of the horizontal and vertical components, i.e., the distance between two points measured along axes at right angles. It is different because we are not looking at the direct line, and in certain cases the individual distances measured will give a better result. Since it is faster, the squared Euclidean method is used more often; but when using the Manhattan distance, you measure the X difference and the Y difference and take the absolute value of each.
The formula is shown below:

  d(P, Q) = \sum_{i} |P_i - Q_i|

Cosine Distance Measure
The cosine distance measures the angle between two vectors. As the two vectors separate, the cosine distance becomes greater. This method is similar to the Euclidean distance measure, and you can expect to get similar results with both of them.
Note: The Manhattan measurement method will produce a very different result. It can end up with bias if the data is very skewed or if both sets of values have a dramatic size difference.
The formula is shown below:

  \cos(\theta) = \frac{P \cdot Q}{\lVert P \rVert \, \lVert Q \rVert}, \qquad \text{cosine distance} = 1 - \cos(\theta)
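For reference, SciPy already provides all four measures; a short sketch on made-up points (note that squared Euclidean is simply the Euclidean distance without the final square root, and distance.cosine returns the cosine distance, i.e. 1 minus the cosine similarity):

from scipy.spatial import distance

p, q = [1, 2], [4, 6]
print(distance.euclidean(p, q))    # 5.0
print(distance.sqeuclidean(p, q))  # 25.0
print(distance.cityblock(p, q))    # 7  (Manhattan / taxicab)
print(distance.cosine(p, q))       # cosine distance = 1 - cosine similarity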

The Role of the Dendrogram
• A dendrogram is a tree-like diagram produced by hierarchical clustering.
• Bottom of the dendrogram: represents individual data points.
• Top of the dendrogram: represents the largest cluster, which includes all data points.
• The dendrogram shows the hierarchical relationships between clusters.
• By slicing the dendrogram at various heights, different numbers of clusters can be obtained.
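Slicing the dendrogram at a chosen height can be done with scipy.cluster.hierarchy.fcluster; the data and cut height below are made-up values for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
Z = linkage(X, method='ward')
# Cut the tree so that only merges below distance 5 are kept
labels = fcluster(Z, t=5, criterion='distance')
print(labels)  # two clusters: the three left points and the three right points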

How is the Dendrogram Constructed?
• Iterative Process:
  • Clusters are either merged (agglomerative clustering) or split (divisive clustering) based on their similarity or distance.
  • This is repeated until:
    • All data points form a single cluster, or
    • The predefined number of clusters is achieved.
• Similarity or Distance Measures:
  • Common measures include Euclidean distance, Manhattan distance, or cosine similarity.
  • These measures determine how closely related the data points or clusters are.

Types of Hierarchical Clustering
• Agglomerative (Bottom-Up):
  • Starts with each data point as its own cluster.
  • Gradually merges the closest clusters until one large cluster is formed.
• Divisive (Top-Down):
  • Starts with all data points in a single cluster.
  • Recursively splits clusters into smaller clusters based on dissimilarity.

Hierarchical Agglomerative Clustering
• Hierarchical agglomerative clustering is also known as the bottom-up approach.
• It produces a structure that is more informative than the unstructured set of clusters returned by flat clustering.
• This clustering algorithm does not require us to prespecify the number of clusters.
• Bottom-up algorithms treat each data point as a singleton cluster at the outset and then successively agglomerate pairs of clusters until all clusters have been merged into a single cluster that contains all the data.

Agglomerative Clustering Steps
1. Compute the dissimilarity matrix using a particular distance metric.
2. Assign each data point to a cluster.
3. Merge the clusters based on a linkage criterion for the similarity between clusters.
4. Update the distance matrix.
5. Repeat steps 3 and 4 until a single cluster remains or a stopping criterion is met.

Agglomerative Clustering Steps
Algorithm:
given a dataset (d1, d2, d3, ... dN) of size N
# compute the distance matrix
for i = 1 to N:
    # as the distance matrix is symmetric about the primary diagonal,
    # we compute only the lower part of the primary diagonal
    for j = 1 to i:
        dis_mat[i][j] = distance(di, dj)
each data point is a singleton cluster
repeat
    merge the two clusters having minimum distance
    update the distance matrix
until only a single cluster remains
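A direct, unoptimised translation of this pseudocode into Python is sketched below; it assumes single linkage (minimum pairwise point distance) and is O(n³), so it is purely illustrative.

import numpy as np

def naive_agglomerative(X):
    """Repeatedly merge the two closest clusters (single linkage) until one remains.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [[i] for i in range(len(X))]           # each point starts as a singleton cluster
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: minimum distance between any point of a and any point of b
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        history.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]        # merge cluster b into cluster a
        del clusters[b]
    return history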

Hierarchical Agglomerative Clustering: Example
1. Consider each letter as its own cluster and calculate the distance between each cluster.
2. In the second step, similar clusters are merged together.
3. For example, clusters (B) and (C) are similar, so they are merged. The same happens with clusters (D) and (E).
4. After this, we have the clusters [(A), (BC), (DE), (F)].
5. We recalculate the distances and merge the closest clusters, which are (DE) and (F), to create [(A), (BC), (DEF)].
6. Repeating this process, clusters (DEF) and (BC) are merged into [(A), (BCDEF)].
7. Finally, the last two clusters are merged into one: [(ABCDEF)].

Hierarchical Divisive Clustering
• It is also known as the top-down approach.
• This algorithm also does not require prespecifying the number of clusters.
• Top-down clustering requires a method for splitting a cluster that contains the whole data, and proceeds by splitting clusters recursively until individual data points have been split into singleton clusters.

Hierarchical Divisive Clustering Steps
1. Start with all data points of a dataset of size N (d1, d2, d3, ... dN) in one cluster.
2. Split the cluster into two dissimilar or heterogeneous clusters by using a flat clustering method such as the k-means algorithm.
3. Repeat step 2, choosing the best cluster to split and removing the outliers from the least cohesive cluster after each iteration.
4. Stop when each example is in its own single cluster; otherwise repeat step 3.

Hierarchical Divisive Clustering
Algorithm:
1. Given a dataset (d1, d2, d3, ... dN) of size N.
2. At the top we have all data in one cluster.
3. The cluster is split using a flat clustering method, e.g., K-Means.
4. Repeat:
5.   Choose the best cluster among all the clusters to split.
6.   Split that cluster using the flat clustering algorithm.
7. Until each data point is in its own singleton cluster.
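A rough sketch of this top-down procedure, assuming scikit-learn's KMeans with k = 2 as the flat splitter and simply choosing the largest remaining cluster as the "best" cluster to split; outlier removal and other refinements from the steps above are omitted.

import numpy as np
from sklearn.cluster import KMeans

def divisive(X, n_clusters):
    clusters = [np.arange(len(X))]                     # start with all points in one cluster
    while len(clusters) < n_clusters:
        # "best" cluster to split: here, simply the largest remaining cluster
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[members])  # flat 2-way split
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
print(divisive(X, 2))  # e.g. the three left points and the three right points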

Computing the Distance Matrix
While merging two clusters, we check the distance between every pair of clusters and merge the pair with the least distance / most similarity. But how is that distance determined? There are different ways of defining the inter-cluster distance/similarity. Some of them are:
• Min Distance: find the minimum distance between any two points of the clusters.
• Max Distance: find the maximum distance between any two points of the clusters.
• Group Average: find the average distance between every pair of points from the two clusters.
• Ward's Method: the similarity of two clusters is based on the increase in squared error when the two clusters are merged.
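These linkage criteria correspond to the method argument of scipy.cluster.hierarchy.linkage ('single' for min distance, 'complete' for max distance, 'average' for group average, 'ward' for Ward's method). A short sketch comparing them on the same made-up data:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
for method in ['single', 'complete', 'average', 'ward']:
    Z = linkage(X, method=method)
    # force the tree into two flat clusters so the methods can be compared
    print(method, fcluster(Z, t=2, criterion='maxclust'))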

Computing the Distance Matrix
For example, if we group the given data using different methods, we may get different results (comparison figure shown on the slide).

Implementation Code and Output

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# randomly chosen dataset
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

# Perform hierarchical clustering
Z = linkage(X, 'ward')

# Plot dendrogram
dendrogram(Z)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data point')
plt.ylabel('Distance')
plt.show()

Hierarchical Agglomerative vs Divisive Clustering
• Approach: agglomerative clustering is simpler, starting with all data points as singletons and merging them; divisive clustering is more complex, as it requires a flat clustering method as a subroutine for splitting clusters.
• Efficiency: agglomerative clustering has O(n³) time complexity without optimizations; divisive clustering is more efficient if a complete hierarchy is not generated.
• Time complexity: agglomerative is O(n³) for the naive approach, O(n² log n) with priority queues, and can be reduced to O(n²) with optimizations; divisive can be linear (O(n)) with efficient flat algorithms like K-Means when a fixed number of levels is used.
• Accuracy: agglomerative is less accurate, as decisions are based on local patterns or nearby points and early decisions cannot be changed; divisive is more accurate, as it considers the global distribution of data for top-level partitioning.
• Decision basis: agglomerative makes decisions based on local patterns without considering the overall distribution of the data at first; divisive makes top-level decisions considering the global data distribution.
• Neighbor dependence: agglomerative relies heavily on local neighbor points to make decisions; divisive doesn't rely on neighboring points in the same way.

K-means vs Hierarchical Clustering
• Number of Clusters: in K-means it is pre-defined and specified by "K"; in hierarchical clustering the number of clusters can range from one to the total number of data observations.
• Dataset Size: K-means is more effective for larger datasets; hierarchical clustering is better suited for smaller datasets.
• Cluster Shape: K-means works well for spheroidal (round) clusters; hierarchical clustering is less effective for spheroidal clusters.
• Scalability: K-means is highly scalable for large datasets; hierarchical clustering has limited scalability due to higher computational complexity.
• Algorithm Type: K-means is partitional clustering (divides the data into K groups); hierarchical clustering creates a tree-like structure.
