Machine Learning for Indoor Localization: Regression


About This Presentation

Indoor localization using machine learning regressors


Slide Content

Machine Learning for Indoor Localization: Regression
Dwi Joko Suroso
School of Engineering
King Mongkut's Institute of Technology Ladkrabang
16 November 2021

Machine Learning Project
Project Scope
• Individual project
• Regression or classification
• A dataset with more than 5 features and 1,000 samples
• Adopt more than 4 machine learning techniques to compare
Project Scoring [50 pts]
• Motivation, Objectives, and Literature Review [5 pts]
• Dataset Explanation, Feature Selection, and Data Pre-processing [10 pts]
• Proper Machine Learning Techniques with Parameter Tuning [15 pts]
• Proper Evaluation Methods [10 pts]
• Outstanding Output [10 pts]

Table of Contents
1. Machine Learning Workflow
2. Machine Learning Taxonomy: Supervised and Unsupervised Learning
3. Machine Learning Training and Testing
4. Machine Learning Evaluation: in Regression, in Classification
5. Machine Learning for Indoor Localization: Review and Challenges
6. Project Description and Results
7. Conclusions

Machine Learning Workflow
[Figure: machine learning workflow. The training dataset feeds the model/algorithm, a testing dataset feeds evaluation, and production data feeds prediction.]
https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
We can define the machine learning workflow in five stages:
1) Gathering data
2) Data pre-processing
3) Researching the model that will be best for the type of data
4) Training and testing the model
5) Evaluation
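As a concrete illustration of these five stages, here is a minimal scikit-learn sketch; the synthetic data and parameter values are illustrative stand-ins, not the project's actual pipeline.

# Minimal sketch of the five workflow stages (synthetic data as a stand-in).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=1000, n_features=10, noise=0.5, random_state=0)  # 1) gather
X = StandardScaler().fit_transform(X)                                             # 2) pre-process
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)    # 4) split
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)    # 3)-4) train
print("MSE:", mean_squared_error(y_te, model.predict(X_te)))                      # 5) evaluate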

Machine Learning Taxonomy
Machine Learning
• Supervised Learning: Regression and Classification (Decision Trees, Random Forest, ANN, Deep Learning, Bayesian, SVM, k-NN)
• Unsupervised Learning: Clustering (k-means, Hierarchical, Fuzzy c-means) and Dimensionality Reduction (SVD, ICA, PCA)
• Semi-supervised Learning
• Reinforcement Learning: Q-learning
D. Praveen Kumar, T. Amgoth, and C. S. R. Annavarapu, "Machine learning algorithms for wireless sensor networks: A survey," Information Fusion, vol. 49, pp. 1–25, 2019.

Machine Learning: Supervised Learning
Supervised learning covers Decision Trees, Random Forest, ANN, Deep Learning, Bayesian methods, SVM, and k-NN:
• Decision Trees: form learning trees for classification or regression problems; the training data is split into labels, and new data is predicted by running it through the learned tree.
• Random Forest: consists of decision trees, each tree giving a classification; the output is the mode (classification case) or the mean (regression).
• ANN: trained by backpropagation; mimics the human brain's neurons; consists of input, hidden, and output layers; robust to noise.
• Deep Learning: a branch of ML based on the ANN concept.
• SVM: works as a margin calculator, plotting data items so that they are maximally separated from the hyperplane.
• k-NN: simple and effective; classifies data in feature space according to distance.
D. Praveen Kumar, T. Amgoth, and C. S. R. Annavarapu, "Machine learning algorithms for wireless sensor networks: A survey," Information Fusion, vol. 49, pp. 1–25, 2019.
A. Nessa, B. Adhikari, F. Hussain and X. N. Fernando, "A Survey of Machine Learning for Indoor Positioning," IEEE Access, vol. 8, pp. 214945–214965, 2020, doi: 10.1109/ACCESS.2020.3039271.

Machine Learning: Unsupervised Learning
• Clustering (k-means, Hierarchical, Fuzzy c-means): k-means partitions the dataset according to the features into K clusters, where K is a positive integer.
• Dimensionality Reduction (SVD, ICA, PCA): Principal Component Analysis is a multivariate technique for data compression that uses an orthogonal transformation to identify the principal components.
D. Praveen Kumar, T. Amgoth, and C. S. R. Annavarapu, "Machine learning algorithms for wireless sensor networks: A survey," Information Fusion, vol. 49, pp. 1–25, 2019.

Machine Learning: Training and Testing the Model on Data
• For training a model, we initially split the data into three sections: 'training data', 'validation data', and 'testing data'.
• You train the classifier using the 'training dataset', tune the parameters using the 'validation set', and then test the performance of your classifier on the unseen 'test dataset'.
• An important point to note is that during training only the training and/or validation set is available. The test dataset must not be used during training; it becomes available only when testing the classifier. A minimal split is sketched below.
https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
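A minimal sketch of this three-way split using scikit-learn's train_test_split; the 60/20/20 ratio and the placeholder data are illustrative assumptions, not the project's settings.

# Three-way split: train (60%) / validation (20%) / test (20%).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))  # placeholder features
y = rng.normal(size=1000)        # placeholder targets

# Hold out 20% as the test set, untouched until final evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Carve 25% of the remainder off as the validation set (0.25 * 0.8 = 0.2).
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)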

Machine Learning: Training and Testing the Model on Data
https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
Training set: the material through which the computer learns how to process information; machine learning uses algorithms to perform the training. It is the set of data used for learning, that is, to fit the parameters of the classifier.
Validation set: a set of unseen data, held out from the training data, used to tune the parameters of the classifier. Cross-validation is primarily used in applied machine learning to estimate the skill of a model on unseen data (see the sketch below).
Test set: a set of unseen data used only to assess the performance of a fully specified classifier.
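A short cross-validation sketch with scikit-learn; the 5-fold setting, the k-NN estimator, and the placeholder data are illustrative choices.

# 5-fold cross-validation to estimate model skill on unseen data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = rng.normal(size=1000)

scores = cross_val_score(KNeighborsRegressor(n_neighbors=2), X, y,
                         cv=5, scoring="neg_mean_squared_error")
print(scores.mean())  # average held-out score across the 5 folds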

Machine Learning: Training and Testing the Model on Data
https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
• In a dataset, a training set is used to build up a model, while a test (or validation) set is used to validate the model built.
• Data points in the training set are excluded from the test (validation) set. Usually, a dataset is divided into a training set and a validation set (some people use 'test set' instead) in each iteration, or divided into a training set, a validation set, and a test set in each iteration.
• Once the model is trained, we can use the same trained model to predict on the testing data, i.e., the unseen data.
• Once this is done, we can evaluate the model (for regression we usually have error analysis; for classification we have a confusion matrix).

Machine Learning: Regression Model Evaluation
https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
• The skill or performance of a regression model must be reported as an error in its predictions.
• We want to know how close the predictions were to the expected values.
• Error addresses exactly this and summarizes, on average, how close the predictions were to their expected values.
• Three error metrics are commonly used for evaluating and reporting the performance of a regression model (computed in the sketch below):
  - Mean Squared Error (MSE)
  - Root Mean Squared Error (RMSE)
  - Mean Absolute Error (MAE)
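A small sketch computing all three metrics with scikit-learn; the y_true/y_pred values are made up for illustration.

# MSE, RMSE, and MAE for a regression model's predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # illustrative targets
y_pred = np.array([2.5,  0.0, 2.0, 8.0])  # illustrative predictions

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE is simply the square root of MSE
mae = mean_absolute_error(y_true, y_pred)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")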

Machine Learning: Confusion Matrix (Classification Model Evaluation)
https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
• True positives: cases in which we predicted TRUE and our predicted output is correct.
• True negatives: we predicted FALSE and our predicted output is correct.
• False positives: we predicted TRUE, but the actual output is FALSE.
• False negatives: we predicted FALSE, but the actual output is TRUE.
We can also find the accuracy of the model using the confusion matrix:
Accuracy = (True Positives + True Negatives) / (Total number of predictions)
i.e., for the example above:
Accuracy = (100 + 50) / 165 = 0.9090 (90.9% accuracy)
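The same accuracy computation in code; the confusion-matrix counts are an assumed split (TP = 100, TN = 50, with FP/FN filling the remaining 15) chosen to reproduce the 165-prediction example above.

# Accuracy from a 2x2 confusion matrix.
import numpy as np

cm = np.array([[50, 10],    # [[TN, FP],
               [ 5, 100]])  #  [FN, TP]]  (assumed counts, total = 165)
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
print(accuracy)  # (100 + 50) / 165 ≈ 0.909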

Machine Learning: Evaluation
https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94
• Model evaluation is an integral part of the model development process. It helps to find the best model that represents our data and to judge how well the chosen model will work in the future.
• To improve the model, we might tune its hyper-parameters to try to improve the accuracy, while also looking at the confusion matrix to try to increase the number of true positives and true negatives.

Machine Learning for Indoor Localization
• Machine learning algorithms can effectively solve many of the limitations of the conventional techniques used for localization in indoor environments.
• Conventional methods often lack scalability and therefore cannot perform well in large-scale indoor localization, such as airports, shopping malls, and multi-storey buildings with large training datasets.
• Traditional indoor localization methods are not very flexible in adapting to dynamically changing environments and to the presence of multi-dimensional and heterogeneous data applications.

Machine Learning for Indoor Localization: Motivation
• Fluctuation in RSSI is the most challenging problem in indoor localization, and it adversely affects the location accuracy. The most significant advantage of ML is its ability to learn useful information from input data with known or unknown statistics.
• For instance, recurrent neural networks can effectively exploit the sequential correlation of time-varying RSSI measurements and use the trajectory information to mitigate RSSI fluctuations.
M. T. Hoang, B. Yuen, X. Dong, T. Lu, R. Westendorp, and K. Reddy, "Recurrent neural networks for accurate RSSI indoor localization," IEEE Internet Things J., vol. 6, no. 6, pp. 10639–10651, Dec. 2019.

Machine Learning for Indoor Localization: RSSI
• RSSI information can be collected from an access point easily, without extra hardware; however, the fluctuation of RSSI often leads to severe performance degradation.
• Fluctuation in the received signal strength indicator (RSSI) is the most challenging problem in indoor localization, and it adversely affects the location accuracy.
• The most significant advantage of ML is its ability to learn useful information from input data with known or unknown statistics.
[Figure: ML models applied to RSSI-based localization, e.g., SVM, convolutional neural networks, deep auto-encoders, and recurrent neural networks.]
M. T. Hoang, B. Yuen, X. Dong, T. Lu, R. Westendorp, and K. Reddy, "Recurrent neural networks for accurate RSSI indoor localization," IEEE Internet Things J., vol. 6, no. 6, pp. 10639–10651, Dec. 2019.

Fingerprint Technique
[Figure: general fingerprint technique illustration. Offline phase: signal measurements at known fingerprint nodes across the area of interest build the (location, fingerprint) database. Online phase: the user/target's measurement (e.g., RSSI values −36, −48, −51) is passed to the localization algorithm, which matches it against the fingerprint database to produce the location estimate.]
S. He and S.-H. G. Chan, "Wi-Fi Fingerprint-Based Indoor Positioning: Recent Advances and Comparisons," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 466–490, First quarter 2016, doi: 10.1109/COMST.2015.2464084.

Machine Learning for Indoor Localization: Pattern Matching Algorithm
[Figure: the same offline/online fingerprint pipeline, with the online localization algorithm realized as a pattern matching step; candidate ML algorithms include SVM, k-NN, ANN, K-Means, and Random Forest.]
A. Nessa, B. Adhikari, F. Hussain and X. N. Fernando, "A Survey of Machine Learning for Indoor Positioning," IEEE Access, vol. 8, pp. 214945–214965, 2020, doi: 10.1109/ACCESS.2020.3039271.

Device-free Indoor Localization: RSSI-based
• We consider device-free indoor localization, where the target does not need to carry a device.
• The dataset/fingerprint database is the data recorded when a person is in the room, minus the empty/vacant room's data (the environmental change), built as sketched below:

\[ \Delta \mathrm{RSSI}_{\mathrm{target},i} = \mathrm{RSSI}_{\mathrm{target},i} - \mathrm{RSSI}_{\mathrm{vacant}}, \quad i = 1, 2, \ldots, N \]

where N is the number of fingerprint points in the area of interest.
[Figure: device-free vs. device-based localization; the device-free database is built from the environmental change.]
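A tiny NumPy sketch of building this environmental-change database; the RSSI values and array shapes are invented for illustration.

# Device-free fingerprint: subtract the vacant-room baseline per node.
import numpy as np

rssi_target = np.array([[-40.0, -55.0, -60.0],   # person at fingerprint point 1
                        [-42.0, -50.0, -63.0]])  # person at fingerprint point 2
rssi_vacant = np.array([-45.0, -52.0, -61.0])    # vacant-room baseline per node

delta_rssi = rssi_target - rssi_vacant           # Delta RSSI_target,i for i = 1..N
print(delta_rssi)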

Fingerprint Technique: RSSI-based
[Figure: RSSI-based fingerprint technique illustration.]
• Site survey (known fingerprint locations): at each fingerprint location i = 1, ..., N, store the coordinate together with the fingerprint
\[ \mathrm{FP}_i : \mathrm{RSSI}_i - \mathrm{RSSI}_{\mathrm{vacant}} \]
in the fingerprint database.
• Target at an unknown location: record the RSSI when the target is inside and form the target fingerprint
\[ T : \mathrm{RSSI}_T - \mathrm{RSSI}_{\mathrm{vacant}} \]
• The pattern matching algorithm matches T against the stored FP_i; the best-matching FP_k gives the estimated target location.

Research Problems in Localization and Applied ML Techniques
[Table from the survey below, mapping research problems in localization to the ML techniques applied to them.]
A. Nessa, B. Adhikari, F. Hussain and X. N. Fernando, "A Survey of Machine Learning for Indoor Positioning," IEEE Access, vol. 8, pp. 214945–214965, 2020, doi: 10.1109/ACCESS.2020.3039271.

Future Challenges and Limitations of ML
1. Availability and standardization of training data
• The success of ML is data dependent; most DL algorithms need adequate data.
2. Cost of training and estimation time
• Two time metrics, training time and response time, are indispensable parts of an ML model.
• During training, the algorithm trains itself to predict the output for future test data. During the response time, the model predicts the output for a given input.
3. Challenges of deep learning
• Difficult to implement on device-based targets (limited storage capacity and computational ability to perform the location estimation).
• DL models are very much application specific.
4. Lack of variability
• A machine-learning approach lacks variability in cases where historical data is unavailable.
• Therefore, it is difficult to ascertain that predictions made by ML systems are suitable in all scenarios.

Project Description
1. Dataset from measurement
• Dataset for training (the fingerprint database)
• Test data for indoor localization performance (datatest)
2. Machine learning: regressors
• Random Forest
• k-NN
• MLP
• NN
3. Evaluation for performance comparison
• Mean sum error
• Standard deviation

Data
• The dataset and test data come from our own measurement campaign.
• The dataset consists of 56 RSSI features, 8 illumination features, and 2 coordinate features (x and y).
• The test data has only the 56 RSSI features and 8 illumination features (64 in total), without the coordinates, because we want to predict the coordinates as the target position.
• The RSSI collection is done using 8 nodes; each node collects the RSSI and illumination readings and sends them to the server/sink node.
[Figure: measurement layout with Reference Nodes 1–8 placed around the area of interest and a Sink Node in the middle.]

Datasets Collection
Fingerprint database / training data (dataset): each row holds [NodeID, RSSI readings from the other seven nodes, lux, Location]; for example, Node 1's row carries RSSI2–RSSI8 and lux, while Node 8's row carries RSSI1–RSSI7 and lux. The location labels (known coordinates) cover the 25 fingerprint points, laid out on a 5 × 5 grid:
 1  2  3  4  5
 6  7  8  9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
Unknown-location test data (datatest): rows with the same [NodeID, RSSI, lux] structure but without the location label; the pattern matching algorithm maps each row to a predicted location.

Brief: Random Forest
Random Forest = Decision Trees (x number of trees) + bagging + random selection of features to split each node
• A Random Forest is a set of decision trees ensembled with the "bagging" method to obtain classification and regression outputs. In classification, the output is calculated by majority voting, whereas in regression the mean is calculated.
• Loss function: entropy / Gini score to calculate the loss value on the dataset.
• Advantages:
  - Accurate and powerful model.
  - Handles overfitting efficiently.
  - Supports implicit feature selection and derives feature importance.
• Disadvantages:
  - Computationally complex and slower as the forest becomes large.
  - Not a very descriptive model of its predictions.
Breiman, L., "Random Forests," Machine Learning, vol. 45, pp. 5–32, 2001. https://doi.org/10.1023/A:1010933404324
https://medium.com/@dannymvarghese/comparative-study-on-classic-machine-learning-algorithms-part-2-5ab58b683ec0

Brief: Random Forest
Hyperparameters (a fitting sketch follows this list):
• n_estimators: the number of trees in the forest. A large number of trees brings high accuracy, but also high computational complexity.
• max_features: the maximum number of features allowed in an individual tree.
• min_samples_leaf: the minimum number of samples required at a leaf node (the related min_samples_split is the minimum number of samples required to split an internal node).
Random Forest vs. neural networks:
• Both are very powerful, high-accuracy algorithms.
• Both model feature interactions internally and are less explainable.
• Random Forest needs no feature scaling, whereas an NN needs features to be scaled.
• An ensemble version of both models will be powerful.
Breiman, L., "Random Forests," Machine Learning, vol. 45, pp. 5–32, 2001. https://doi.org/10.1023/A:1010933404324
https://medium.com/@dannymvarghese/comparative-study-on-classic-machine-learning-algorithms-part-2-5ab58b683ec0
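A hedged sketch of a Random Forest regressor predicting (x, y) coordinates; the feature/target shapes mirror the project's data description, but the arrays and parameter values here are invented stand-ins.

# Random Forest regression over RSSI + illumination features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))         # 56 RSSI + 8 lux features (stand-in)
y = rng.uniform(0, 5, size=(1000, 2))   # (x, y) coordinates in metres (stand-in)

rf = RandomForestRegressor(n_estimators=50, min_samples_leaf=1, max_depth=8)
rf.fit(X, y)                            # scikit-learn handles multi-output regression
print(rf.predict(X[:3]))                # predicted (x, y) for three samples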

Brief: k-NN and NN
• k-NN requires no training time, whereas training neural networks (NN) is rather time-intensive. However, k-NN will probably take much longer at evaluation time, especially if you have many data points and do not resort to approximate search (see the sketch below).
• k-NN is very simple and requires tuning only one hyperparameter (the value of k), while NN training involves many hyperparameters controlling the size and structure of the network and the optimization procedure.
• NN have achieved the state of the art in more domains than k-NN. (This doesn't necessarily mean neural networks will work better on your particular problem, but empirically neural networks are effective in many settings.)
• There are more theoretical guarantees for k-NN than for NN, although as we know there is a large gap between the theoretical and empirical performance of neural networks. (Aside: it is a good idea to carefully examine the assumptions made by theoretical work in machine learning, to make sure that they are reasonable for the problem being approached.)
• Once a neural network is trained, the training data is no longer needed to produce new predictions. This is obviously not the case with k-NN.
• Once a neural network is trained on one task, its parameters can be used as a good initializer for another (similar) task. This is a form of transfer learning that cannot be achieved with k-NN.
https://www.quora.com/How-does-KNN-classification-compare-to-classification-by-neural-networks
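A k-NN regression sketch matching the discussion above: "fitting" just stores the data, and the distance search happens at prediction time. Data shapes and parameter values are invented stand-ins.

# k-NN regression: no real training phase, all the work is at query time.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))
y = rng.uniform(0, 5, size=(1000, 2))

knn = KNeighborsRegressor(n_neighbors=2, weights="distance")
knn.fit(X, y)              # merely stores the training set
print(knn.predict(X[:3]))  # neighbor search happens here, at evaluation time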

Brief: Multilayer Perceptron (MLP)
• An MLP is a network consisting of an input layer, some hidden layers, and an output layer.
• It is a feed-forward neural network primarily used in pattern recognition, prediction, and classification.
• As it is feed-forward, the flow of data processing is: the input layer receives the input signal; the hidden layers between the input and output layers are the central part of the MLP, where the computation takes place; and the output layer performs the prediction or classification. Learning in the MLP neurons is done by backpropagation.
• An MLP can therefore approximate continuous functions and solve nonlinearly separable problems. Each layer has an activation function, e.g., ReLU, sigmoid, softmax, tanh.
• The activation function is applied to the weighted sum and decides whether a neuron will be activated or not. Its sole purpose is to add non-linearity to the neuron's output. (A regression sketch follows.)
S. Abirami and P. Chitra, "Energy-efficient edge based real-time healthcare support system," Adv. Comput., vol. 117, no. 1, pp. 339–368, Jan. 2020.
J. Tang, C. Deng, and G. Bin Huang, "Extreme Learning Machine for Multilayer Perceptron," IEEE Trans. Neural Networks Learn. Syst., vol. 27, no. 4, pp. 809–821, 2016.
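A minimal MLP regression sketch with scikit-learn; the layer sizes, solver, and data are illustrative (they echo, but do not reproduce, the project settings reported later).

# Feed-forward MLP trained with backpropagation (SGD solver).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))
y = rng.uniform(0, 5, size=(1000, 2))

mlp = MLPRegressor(hidden_layer_sizes=(10,), activation="relu",
                   solver="sgd", alpha=5e-5, max_iter=2000)
mlp.fit(X, y)              # backpropagation updates the layer weights
print(mlp.predict(X[:3]))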

Localization System Evaluation
• The performance metric for the comparison of localization algorithms is the mean sum error, \( \mu \). Consider that the estimated location of an unknown user i is \( (x_{\mathrm{est},i}, y_{\mathrm{est},i}) \) and the actual position of the user is \( (x_{\mathrm{true},i}, y_{\mathrm{true},i}) \). For N estimated locations, the mean sum error is computed as

\[ \mu = \frac{1}{N} \sum_{i=1}^{N} \sqrt{(x_{\mathrm{est},i} - x_{\mathrm{true},i})^2 + (y_{\mathrm{est},i} - y_{\mathrm{true},i})^2} \]

• The standard deviation (std) is also considered, to see how much the predicted locations vary compared to the real/actual locations (both metrics are implemented in the sketch below):

\[ \sigma_{\mathrm{std}} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( \sqrt{(x_{\mathrm{est},i} - x_{\mathrm{true},i})^2 + (y_{\mathrm{est},i} - y_{\mathrm{true},i})^2} - \mu \right)^{2}} \]

X. Wang, L. Gao and S. Mao, "CSI Phase Fingerprinting for Indoor Localization With a Deep Learning Approach," IEEE Internet of Things Journal, vol. 3, no. 6, pp. 1113–1123, Dec. 2016, doi: 10.1109/JIOT.2016.2558659.
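The two metrics above in code; the est/true arrays are invented examples, and ddof=1 gives the N − 1 denominator used in the std formula.

# Mean sum error and standard deviation of Euclidean localization errors.
import numpy as np

def localization_errors(est, true):
    """est, true: arrays of shape (N, 2) holding (x, y) per evaluated point."""
    d = np.sqrt(((est - true) ** 2).sum(axis=1))  # per-point Euclidean error
    return d.mean(), d.std(ddof=1)                # mu and sigma_std (N - 1)

est = np.array([[1.2, 0.9], [2.1, 2.0], [3.4, 3.1]])   # illustrative estimates
true = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])  # illustrative ground truth
print(localization_errors(est, true))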

Parameterization
• Train and test data split in the percentage 80% : 20%.
• The dataset consists of 1480 × 66 (rows × columns).
• The test data for validation is 444 × 64 (rows × columns), without the coordinate labels (location information).
• The best parameters found for the ML algorithms (a search sketch follows the table):

ML Algorithm  | Best parameters
Random Forest | 'n_estimators': 50, 'min_samples_split': 5, 'min_samples_leaf': 1, 'max_depth': 8, 'bootstrap': True
k-NN          | 'weights': 'distance', 'n_neighbors': 2
MLP           | 'solver': 'sgd', 'max_iter': 2000, 'hidden_layer_sizes': (10,), 'alpha': 5e-05
NN            | 'hidden_layers': 3, 'neurons0': 50, 'neurons1': 100, 'neurons2': 100
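The slides do not state how these best parameters were obtained; a common approach is a cross-validated grid search, sketched here under that assumption (the grid values and data are illustrative).

# Hyperparameter search that could produce a "best parameter" table.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = rng.uniform(0, 5, size=(200, 2))

search = GridSearchCV(
    RandomForestRegressor(),
    param_grid={"n_estimators": [50, 100],
                "max_depth": [8, None],
                "min_samples_split": [2, 5]},
    cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)  # e.g., the row reported for Random Forest above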

Performance Comparison: Training Results
• Using the database (dataset) for train and test, the error and standard deviation are acceptable, since the error is less than 1 m (the fingerprint grid size is 1 m × 1 m).
• k-NN has the best (minimum) error; NN is slightly better in data-point precision (less variation), as shown by its lowest deviation error.

Performance Comparison: Testing Results
• Simulation of the online phase of the fingerprint technique, using the test data as validation data for our ML models.
• The results show that the error is higher than 1 m for all four ML algorithms (this needs further checking).
• k-NN is still better in error, but Random Forest is superior in the deviation error.

Conclusions
• Applying ML algorithms to indoor localization is promising.
• These baseline ML algorithms do not need a large amount of data and still yield acceptable accuracy.
• In training, k-NN is better in terms of error, and NN is good in deviation error.
• In testing, the error performance is poor, but the trend still favors k-NN for the best (minimum) error.
• Random Forest shows better precision than the other algorithms in terms of deviation error (less than 1 m).
• We need to check the dataset further and to first apply a simpler dataset, using another set of data from the internet.
• The system model still needs improvement.

Thank you very much!