MachineLearningForBigDataInHealthcare.pdf

UAVProcess 47 views 34 slides Sep 09, 2024


Session 5: Machine Learning for Big Data in Healthcare
Cloud-Based Machine & Deep Learning
Pr. Jean-Claude Franchitti

This presentation is devoted to studying how machine learning and data analytics
can be applied to healthcare applications. We start with an introduction to the
medical and healthcare problem environments. Then we present IoT-assisted
healthcare monitoring systems. In data analytics for medical applications, we focus
on cloud-based detection of chronic diseases. This presentation is highly relevant to
supervised and unsupervised machine learning (ML) algorithms. We will not repeat the
theories behind those ML algorithms; only biomedical and healthcare applications are
studied in this presentation, including both system settings and reported performance.
In particular, we present an intelligent human-machine interface approach with smart
clothing and robotic interaction in healthcare clouds.
Overview

Healthcare and Chronic Disease Detection Problem
Figure: Factors that affect the detection accuracy in detecting chronic diseases
Factors that cause chronic diseases:
Changeable factors: insufficient physical exercise; unhealthy eating habits; smoking and drinking; unstable emotional status
Unchangeable factors: age and gender; heredity
Factors hard to change: country; living standard; burning the midnight oil; other factors ...
Social and environmental impact: globalization; urbanization; population aging
Typical chronic diseases: hyperlipemia; cardiovascular disease; diabetes; obesity; tumor; chronic respiratory disease
Figure: Determinants of health (statistics from the Centers for Disease Control, 2003)

Software Libraries for Machine Learning Applications
Here, we identify a few software toolkits that can analyze datasets and help
users put together running programs. Notably, many of these ML packages are
open source. Readers may check the developer websites for more details of the
functionality and capability of those programs or of the runtime support
systems provided.
We identify other software libraries and toolkits that can analyze datasets and
help users put together programs for running deep learning tasks. Only brief
introductory information is given here. However, we will dig much deeper into
the Google TensorFlow framework in subsequent sections. The Spark and
TensorFlow libraries have enriched our capability to develop new ML or DL
applications. Many cognitive activities that humans (even a newborn baby) can
perform easily, but not always with certainty, can now be handled by a trained
computer. It can carry out those screening and filtering tasks routinely to
save us time and augment our decision-making processes with better evidence
and support.

IoT-based Healthcare Systems and Applications
The Health Internet of Things (Health-IoT) is an important path to solving medical health problems. It also has practical
significance for promoting the development of the medical health industry and improving people's quality of life. Compared
with the traditional things-centered IoT, the Health-IoT is "human-centered": all network accesses, data analyses and
services are built around humans. The sensor at the data collection layer is not a common sensor but a human body
sensor that collects physiological health parameters.
Previous Health-IoT work emphasized the design of human body sensors and the collection of physiological
data, but did not fully consider users' mobility. It is therefore inconvenient to use in daily life and may even affect
daily life adversely. The development of the mobile internet integrates the physical world, the virtual world and social networks, thus
generating Cyber-Physical Society Systems (CPSS). Integrating the Health-IoT into CPSS allows users to obtain the services
and convenience brought by mobile health and mobile medical treatment while they are highly mobile in the
physical world and the social network space. This is an inevitable trend in the development of Health-IoT.
Traditional IoT has been widely applied in the traffic, logistics and retail industries. As it has matured, the IoT has attracted
attention in the field of healthcare. However, many applications that promoted health services to families or
individuals using IoT technology later proved unsuccessful. Because of its importance for improving medical treatment
quality and service efficiency, the Health-IoT is a milestone in health information development. It will play an important role in
improving people's health and enhancing their quality of life.

IoT Sensing for Body Signals
■ Embedded sensors: The MHS device adopts a dedicated sensor and has the
advantage of high collection precision, but it also has disadvantages,
including high cost and insufficient portability and usability. This kind of device
has the following features:
■ Wearability: Most MHS devices must be placed on the human body to collect data
precisely, because they take human vital signs as their collection targets. Therefore, almost
all existing medical health collection devices treat wearability as a basic requirement.
This improves the user's comfort and helps guarantee the accuracy of the data
collected.
■ Long working time: A dedicated medical health collection device works
differently from a universal mobile collection device. Its purpose is to collect
data from the human body over a relatively long period, which requires the MHS
to have a high-capacity power supply.
■ Stability: An MHS can still collect data normally when users are exercising strenuously
or are in an extreme environment.
■ Low degree of user participation: Unlike GMD, the functions of an MHS are relatively independent. Most
MHS devices need no user intervention during data collection; users only need to switch on the power,
and the MHS starts collecting.
■ Interim data storage mechanism: The weight and dimensions of an MHS may be strictly limited to keep it wearable.
Therefore, most MHS devices do not integrate a data transmission module. Instead, they use a relatively small data storage
module to store the collected data temporarily, and then transmit
the data through other network access devices.
The layout of common human body sensors

Healthcare Monitoring System
Common health monitoring system based on community services.
Common health monitoring services
Several common health monitoring devices

➢ Health Cyber-Physical System: The health-oriented mobile Cyber-Physical System (CPS) plays a vital role in existing medical
monitoring applications, such as diagnosis, disease treatment and emergency rescue. Some electronic medical intelligent
network systems suitable for a large number of patients have been designed. The end-to-end delay of medical information
delivery is the main concern, especially in the event of an accident or during an epidemic outbreak.
➢ Mobile Health Monitoring: Several years ago, a mobile health monitoring system based on portable medical equipment and
smartphones was proposed. Smartphones collect physiological signals of the human body from a variety of health
monitoring devices through dedicated smartphone applications. Those physiological signals are then transmitted to
medical centers. If necessary, the smartphones can also notify caregivers and medical emergency institutions using the short message
service of the mobile phone.
➢ Wearable Computing for Health Monitoring: Wearable devices and wearable computing have long been key
research topics for enabling health monitoring. As new kinds of body sensor nodes, smartphones and smart watches are adapted
to measure SpO2 and heart rate; however, such measurement data have low accuracy, few signal types and limited medical uses.
➢ Health Internet of Things: Health IoT is another way to provide health monitoring services. Mobile sensing, localization and
network analysis based on IoT technologies can be used for healthcare.
➢ Ambient Assisted Living: Ambient Assisted Living (AAL) aims at improving the quality of life of patients, and it can notify
relatives, caregivers and healthcare experts. AAL-related technologies include sensing technology, physiological signal
monitoring, home environment monitoring, video-based sensing, smart home technology, pattern analysis and machine learning.
➢ Body Area Network based Health Monitoring: Existing work on body area networks (BAN) focuses on sensor node energy
saving, intra-BAN network design, implantable micro-sensors, physiological signal acquisition, etc. Portable smart wearable
health monitoring systems based on BAN have been developed. However, the stability, sustainability and reliability of such systems need
to be improved.

Physical Exercise Promotion and Smart Clothing
Communication architecture of
exercise promotion devices
Exercise promotion products available in 2016
(a) Mobile application software
(b) The testbed settings
Smart clothing application software and testbed settings

Example: Robotics and cloud-assisted healthcare system
Healthcare Robotics and Mobile Health Cloud
As front-end equipment, the robot takes charge of signal collection, the performance of specific
actions and some simple analysis and processing tasks, while the more
complicated tasks that need a large-scale computing cluster depend on the cloud.
The cloud has strong storage and computing capacity. It trains and builds
effective models with advanced machine learning algorithms and transmits the results of
its computation or analysis back to the robots. In that way, the robot is provided with a
second brain, thanks to the cloud's strong analysis and processing abilities.
Cloud computing is a new type of computing and service
mode based on the Internet. Through this method, pooled hardware
and software resources and information can be supplied to the service
requester on demand. Traditional robots are always
restricted by their hardware and software functions, which causes
serious problems. Cloud computing is a good support for
robot technology: the two can easily be combined
to build a cloud robot.
A typical health monitoring system built with smart clothing and backend cloud

Big Data Analytics for Healthcare Applications
Example: Predictive Disease Diagnosis using Logistic Regression
Table 7.5 lists a dataset of triglyceride, total cholesterol content, high-density lipoprotein, low-density lipoprotein, and
whether the person has hyperlipemia (1 for yes and 0 for no). These data were collected from health examination records in a hospital in Wuhan, China.
Let's attempt a preliminary judgment on whether a person has hyperlipemia if his or her health examination data
are {3.16, 5.20, 0.97, 3.49}, in that order.
In this example, we need to judge whether an unknown
person who received a health examination has hyperlipemia.
From the data in the table, this is a binary classification
problem (1 for hyperlipemia or 0 for healthy) with
four attributes (features). Therefore, we may conduct
prediction and classification using logistic regression.
First, combine the four attributes into one attribute:
$$z = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$$
where $x_1, x_2, x_3, x_4$ stand for triglyceride, total cholesterol content,
high-density lipoprotein and low-density lipoprotein
respectively, $\theta_0, \ldots, \theta_4$ are the weights, and $z$ stands for the combined feature.
Second, estimate the weights by the maximum likelihood
method, using MATLAB here, and solve the likelihood
equations iteratively with the Newton-Raphson
method.
Patient ID  Triglyceride  Total Cholesterol  High-Density Lipoprotein  Low-Density Lipoprotein  Hyperlipemia (1) or not (0)
1 3.62 7 2.75 3.13 1
2 1.65 6.06 1.1 5.15 1
3 1.81 6.62 1.62 4.8 1
4 2.26 5.58 1.67 3.49 1
5 2.65 5.89 1.29 3.83 1
6 1.88 5.4 1.27 3.83 1
7 5.57 6.12 0.98 3.4 1
8 6.13 1 4.14 1.65 0
9 5.97 1.06 4.67 2.82 0
10 6.27 1.17 4.43 1.22 0
11 4.87 1.47 3.04 2.22 0
12 6.2 1.53 4.16 2.84 0
13 5.54 1.36 3.63 1.01 0
14 3.24 1.35 1.82 0.97 0
Table 7.5 Health examination data for patients with hyperlipemia

Figure: Classification results using logistic regression in the example (axes labeled Feature and Class; regions labeled Hyperlipemia, More Examination and Healthy; points labeled with patient IDs 1 to 14)
The estimated weights of the combined feature are
$$\theta_0 = -132.3,\quad \theta_1 = -3.1,\quad \theta_2 = 39.6,\quad \theta_3 = -2.9,\quad \theta_4 = 3.2$$
In accordance with these results, $\theta_2$ is relatively large; thus
whether a person has hyperlipemia is largely
influenced by the total cholesterol content in the health examination.
Then determine the class of each sample in the training dataset using the
sigmoid function. The results are
$$\text{class} = [1,1,1,1,1,1,1,0,0,0,0,0,0,0]$$
The numbers in the figure are the IDs of the persons tested, and the
dotted circles mark the classes. The figure shows
that the accuracy of classification with logistic regression in this
instance is 100%, so this model can be adopted for prediction.
Lastly, let's predict whether a person whose data are {3.16, 5.20,
0.97, 3.49} has hyperlipemia. Substituting these values into the model above
gives $\text{class} = 1$.
Therefore, that person is predicted to have hyperlipemia.
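The substitution step can be checked with a short sketch. It assumes the weights reported in the example, the rows of Table 7.5, and a decision threshold of 0.5 on the sigmoid output; the threshold and the names `THETA`, `TABLE_7_5`, `sigmoid` and `predict` are illustrative assumptions, not part of the slides.

```python
import math

# Weights reported in the example: bias first, then one weight per feature.
THETA = [-132.3, -3.1, 39.6, -2.9, 3.2]

# Table 7.5 rows: (triglyceride, total cholesterol, HDL, LDL, label).
TABLE_7_5 = [
    (3.62, 7.00, 2.75, 3.13, 1), (1.65, 6.06, 1.10, 5.15, 1),
    (1.81, 6.62, 1.62, 4.80, 1), (2.26, 5.58, 1.67, 3.49, 1),
    (2.65, 5.89, 1.29, 3.83, 1), (1.88, 5.40, 1.27, 3.83, 1),
    (5.57, 6.12, 0.98, 3.40, 1), (6.13, 1.00, 4.14, 1.65, 0),
    (5.97, 1.06, 4.67, 2.82, 0), (6.27, 1.17, 4.43, 1.22, 0),
    (4.87, 1.47, 3.04, 2.22, 0), (6.20, 1.53, 4.16, 2.84, 0),
    (5.54, 1.36, 3.63, 1.01, 0), (3.24, 1.35, 1.82, 0.97, 0),
]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(features, theta=THETA, threshold=0.5):
    """Return 1 (hyperlipemia) or 0 (healthy); the 0.5 threshold is an assumption."""
    z = theta[0] + sum(w * x for w, x in zip(theta[1:], features))
    return 1 if sigmoid(z) >= threshold else 0


predicted = [predict(row[:4]) for row in TABLE_7_5]
new_patient_class = predict((3.16, 5.20, 0.97, 3.49))
```

With these weights, `predicted` matches the label column of Table 7.5 and `new_patient_class` is 1, in line with the result stated above.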

Big Data Analytics for Healthcare Applications
Example: Use of Bayesian Classifier in Diabetic Analysis and Prediction
This example analyzes diabetic patients and predicts whether a person has acquired the disease. The prediction is
based on training with labelled sample data on patients' obesity and blood sugar content. The sample data
are given in Table 7.6. Here, Yes stands for obese or diabetic patients and No for normal-weight or healthy persons.
id  Obesity (A)  Blood Sugar Content (B) (mmol/L)  Diabetic Patient or Not
1 No 14.3 Yes
2 No 4.7 No
3 Yes 17.5 Yes
4 Yes 7.9 Yes
5 Yes 5.0 No
6 No 4.6 No
7 No 5.1 No
8 Yes 7.6 Yes
9 Yes 5.3 No
Table 7.6 Health examination data of diabetic patients
For simplicity, we denote attribute A for obesity and attribute B for blood sugar content. Based on the statistics from
Table 7.6, we obtain the probability distributions of patient obesity and blood sugar content shown in Table 7.7.
Diabetic  P(Obesity = Yes)  P(Obesity = No)  Blood Sugar Mean (mmol/L)  Blood Sugar Variance
Yes  3/4  1/4  11.83  18.15
No  2/5  3/5  4.94  0.07
Table 7.7 Probabilistic results on patient obesity and blood sugar content

To predict the class label of a person who received a health examination, with X = (A = Yes, B = 7.9), the calculation of $P(\text{Yes} \mid X)$ and
$P(\text{No} \mid X)$ is required. Using the statistics in Table 7.7, we have:
$$P(\text{Yes}) = \frac{4}{9},\quad P(A=\text{Yes} \mid \text{Yes}) = \frac{3}{4},\quad P(A=\text{No} \mid \text{Yes}) = \frac{1}{4}$$
$$P(A=\text{Yes} \mid \text{No}) = \frac{2}{5},\quad P(A=\text{No} \mid \text{No}) = \frac{3}{5},\quad P(\text{No}) = \frac{5}{9}$$
As for the blood sugar content, if the class is Yes, then:
$$\bar{x}_{yes} = \frac{14.3 + 17.5 + 7.9 + 7.6}{4} = 11.83$$
$$s^2_{yes} = \frac{(14.3 - 11.83)^2 + (17.5 - 11.83)^2 + \cdots + (7.6 - 11.83)^2}{4} = 18.15$$
If the class is No, then:
$$\bar{x}_{no} = \frac{4.7 + 5.0 + 4.6 + 5.1 + 5.3}{5} = 4.94$$
$$s^2_{no} = \frac{(4.7 - 4.94)^2 + (5.0 - 4.94)^2 + \cdots + (5.3 - 4.94)^2}{5} = 0.07$$


With a Gaussian distribution assumed for blood sugar content, we have
$$P(B=7.9 \mid \text{Yes}) = \frac{1}{\sqrt{2\pi \times 18.15}}\, e^{-\frac{(7.9 - 11.83)^2}{2 \times 18.15}} = 0.062$$
$$P(B=7.9 \mid \text{No}) = \frac{1}{\sqrt{2\pi \times 0.07}}\, e^{-\frac{(7.9 - 4.94)^2}{2 \times 0.07}} = 9.98 \times 10^{-28}$$
Next, classify X with the naive Bayesian classification method:
$$P(X \mid \text{Yes}) = P(A=\text{Yes} \mid \text{Yes}) \times P(B=7.9 \mid \text{Yes}) = \frac{3}{4} \times 0.062 = 0.0465$$
In a similar way, the probability $P(X \mid \text{No})$ is obtained as follows (values are rounded):
$$P(X \mid \text{No}) = P(A=\text{Yes} \mid \text{No}) \times P(B=7.9 \mid \text{No}) = \frac{2}{5} \times 9.98 \times 10^{-28} = 3.99 \times 10^{-28}$$
Applying Bayes' theorem to both classes gives
$$P(\text{Yes} \mid X) = \frac{P(X \mid \text{Yes})\, P(\text{Yes})}{P(X)} = \frac{4}{9} \times 0.0465 \times \frac{1}{P(X)} = \frac{0.0207}{P(X)}$$
$$P(\text{No} \mid X) = \frac{P(X \mid \text{No})\, P(\text{No})}{P(X)} = \frac{5}{9} \times 3.99 \times 10^{-28} \times \frac{1}{P(X)} = \frac{2.218 \times 10^{-28}}{P(X)}$$
We get $P(\text{Yes} \mid X)\, P(X) = 0.0207 > 2.218 \times 10^{-28} = P(\text{No} \mid X)\, P(X)$. Therefore, the class of the person is Yes if
$X = (A = \text{Yes}, B = 7.9)$. Thus, the person is predicted to have acquired diabetes.
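The calculation above can be reproduced with a small sketch. It assumes the rows of Table 7.6, a Gaussian model for blood sugar, and a variance computed by dividing by the class size as in the slides; the names `TABLE_7_6`, `gaussian_pdf` and `class_score` are illustrative. Because the sketch uses unrounded means and variances, its intermediate values differ slightly from the rounded figures quoted above, although the predicted class is the same.

```python
import math

# Table 7.6 rows: (obese, blood sugar in mmol/L, diabetic).
TABLE_7_6 = [
    ("No", 14.3, "Yes"), ("No", 4.7, "No"), ("Yes", 17.5, "Yes"),
    ("Yes", 7.9, "Yes"), ("Yes", 5.0, "No"), ("No", 4.6, "No"),
    ("No", 5.1, "No"), ("Yes", 7.6, "Yes"), ("Yes", 5.3, "No"),
]


def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def class_score(obese, sugar, label, rows=TABLE_7_6):
    """Return P(X | label) * P(label), i.e. the posterior up to the factor 1 / P(X)."""
    members = [r for r in rows if r[2] == label]
    prior = len(members) / len(rows)
    p_obese = sum(1 for r in members if r[0] == obese) / len(members)
    sugars = [r[1] for r in members]
    mean = sum(sugars) / len(sugars)
    # Divide by n (not n - 1), following the slides.
    variance = sum((s - mean) ** 2 for s in sugars) / len(sugars)
    return prior * p_obese * gaussian_pdf(sugar, mean, variance)


scores = {label: class_score("Yes", 7.9, label) for label in ("Yes", "No")}
predicted_label = max(scores, key=scores.get)
```

Comparing the two unnormalized scores is enough to pick the class, because both posteriors share the same denominator P(X).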

Big Data Analytics for Healthcare Applications
Example: Selection for Hyperlipemia Detection Methods over Medical Data
Patient ID  Triglyceride (mmol/L)  Total Cholesterol (mmol/L)  High-Density Lipoprotein (mmol/L)  Low-Density Lipoprotein (mmol/L)  Hyperlipemia or not
1 3.07 5.45 0.9 4.02 1
2 0.57 3.59 1.43 2.14 0
3 2.24 6 1.27 4.43 1
4 1.95 6.18 1.57 4.16 1
5 0.87 4.96 1.36 3.61 0
6 8.11 5.08 0.73 2.05 1
7 1.33 5.73 1.88 3.71 1
8 7.77 3.84 0.53 1.63 1
9 8.84 6.09 0.95 2.28 0
10 4.17 5.87 1.33 3.61 1
11 1.52 6.11 1.29 4.58 1
12 1.11 4.62 1.63 2.85 0
13 1.67 5.11 1.64 3.06 0
14 0.87 3.45 1.25 1.92 0
15 0.61 4.05 1.87 2.05 0
16 9.96 4.57 0.53 1.73 1
17 1.38 5.61 1.77 3.62 0
18 1.65 5.1 1.77 3.16 0
19 1.22 5.71 1.53 3.93 1
20 1.65 5.24 1.47 3.41 1
Table 7.8 Labeled samples from examination reports of 20 hyperlipemia patients
ML Algorithm  Memory Demand (KB)  Training Time (s)  Accuracy
Decision Tree  1,768  1.226  90%
KNN  556  0.741  100%
SVM  256  0.196  100%
Table 7.9 Measured performance of three competing classifier choices
To determine whether students suffer from hyperlipemia,
a physical examination is conducted to measure triglyceride, total
cholesterol, high-density lipoprotein, low-density lipoprotein and
other items. Table 7.8 lists the results for 20 students under testing.
Here, those students found to have hyperlipemia are
marked by a "1" and those who have not acquired hyperlipemia by
a "0" in the rightmost column.
All samples in the dataset carry class
labels, so the problem can be solved by a supervised classification method.
Table 7.9 summarizes the memory demand, training time and
accuracy measured for the three candidate machine learning
methods. In terms of accuracy, the KNN and SVM methods both reach 100% and
serve the purpose. If memory demand and training time are also
important, the SVM method is the better choice.

Performance Analysis of Five Disease Detection Methods
Big data can be applied to predict whether a person is among the high-risk population of a certain chronic disease,
based on personal information such as age, gender, the prevalence of symptoms, medical history and living habits (e.g.,
smoking or not). Figure 7.11 lists 5 distinct machine learning methods, namely naive Bayesian (NB), k-nearest neighbor
(KNN), SVM, neural network (NN) and decision trees (DT), that we evaluate for disease detection.
We apply the NB, KNN, SVM, NN and DT
models to predict the risk of chronic disease. The basic framework is shown in Figure 7.11. We randomly divided the data
into training data and test data, with a ratio of training set to test set of 3:1. The methods mentioned above
were used to train the model.
Figure 7.11 Five machine learning models for disease prediction based on medical big data (hospital data are randomly divided into a training set and a test set; the training set feeds the Naïve Bayesian, K-Nearest Neighbour, Support Vector Machine, Neural Network and Decision Tree models, which are then evaluated on the test set)
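The random 3:1 division can be sketched as follows. The function name `split_train_test`, the fixed seed and the use of integer IDs as stand-ins for patient records are assumptions made for illustration; the slides do not say how the split was implemented.

```python
import random


def split_train_test(records, train_ratio=0.75, seed=0):
    """Shuffle a copy of the records and cut it at train_ratio (0.75 gives 3:1)."""
    shuffled = list(records)
    # A private Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for hospital records: 20 patient IDs.
patient_ids = list(range(20))
train_set, test_set = split_train_test(patient_ids)
```

Fixing the seed makes the comparison between the five models repeatable, since every model then sees the same training and test records.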

A. Prediction Using Naive Bayesian Classification
NB classification is a simple probabilistic classifier presented in earlier slides. Based on a patient's input feature
vector $x = (x_1, x_2, \ldots, x_n)$, we can calculate $p(x \mid c_i)$ and the prior probability distribution $p(c_i)$. Bayes' theorem,
$$p(c_i \mid x) = \frac{p(c_i)\, p(x \mid c_i)}{p(x)}$$
is applied to obtain the posterior probability distribution $p(c_i \mid x)$. By solving $\arg\max_{c_i} p(c_i \mid x)$, the NB
classifier can predict the disease of a patient.
B. Risk Prediction Using the Nearest Neighbor Algorithm
KNN was discussed in earlier slides. In this example, we use the Euclidean distance. Based on the medical big data,
$x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ are the characteristic vectors of two given patients, each containing n
characteristics. The Euclidean distance between the two patients is calculated as follows:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (y_i - x_i)^2}$$
The model performance is sensitive to the parameter K. We choose K from 5 to 25 in typical healthcare applications. For the dataset
we used, the model exhibits the highest performance when K = 10. Thus, we set K to 10.
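A minimal sketch of the distance and the majority vote follows. The tiny two-feature dataset, the labels and the choice k = 3 are hypothetical and only keep the example small; the slides use K = 10 on hospital data that is not reproduced here.

```python
import math
from collections import Counter


def euclidean_distance(x, y):
    """d(x, y) = sqrt(sum_i (y_i - x_i)^2) for two equal-length vectors."""
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train_x, train_y, query, k=10):
    """Label the query by majority vote among its k nearest training vectors."""
    order = sorted(range(len(train_x)),
                   key=lambda i: euclidean_distance(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical patients described by two normalized characteristics.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["low-risk", "low-risk", "low-risk", "high-risk", "high-risk", "high-risk"]

label_near_low = knn_predict(train_x, train_y, (1.1, 1.0), k=3)
label_near_high = knn_predict(train_x, train_y, (5.1, 5.0), k=3)
```

Because every prediction scans all stored training vectors, classification time grows with the size of the training set, which matches the slow classification speed noted for KNN later in the deck.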
C. Prediction Using Support Vector Machine
SVM was studied in earlier slides. It finds a maximum-margin hyperplane that divides an n-dimensional space into subspaces. In
typical medical applications, the patient's characteristics vector $x = (x_1, x_2, \ldots, x_n)$ is linearly inseparable. Kernel-based
learning maps the data to a transformed feature space in which a linear decision surface can separate the classes more easily;
the problem is reformulated so that the data are mapped implicitly to this space. The kernel function can take many forms. Here,
we use the radial basis function (RBF) kernel. The SVM classifier can be implemented using the LibSVM library.

D. Prediction Using Neural Network
NN classifiers were invented by mimicking biological neural networks. In this example, we need to set parameters first: i)
the number of layers. The NN model generally contains four layers: an input layer, two hidden layers and an
output layer; and ii) the number of neurons in each layer. Here, the dimension of the input layer is equal to the number of
patient characteristics. The input is denoted by $x = (x_1, x_2, \ldots, x_n)$. In this example, we set 10 neurons in the first hidden layer
and 5 neurons in the second hidden layer. The output has only two results, i.e. high-risk or low-risk.
Thus, the output layer contains only two neurons.
After constructing the structure of the NN, we need to train the model. For each connection weight w and bias b in each
layer, we use the backpropagation algorithm. For the activation function, we apply the sigmoid function.
E. Prediction Using Decision Tree
Decision tree (DT) based classification was introduced earlier. Its basic idea is that an object is classified by
minimizing the data impurity, which is determined using information gain. The information gain is based on the
concept of entropy, defined as follows:
$$H(S) = -\sum_{i} p_i \log p_i$$
in which $p_i = |C_{i,S}| / |S|$ is the non-zero probability of class $C_i$.
The expected information required for the classification of S according to attribute A is denoted by $H_A(S)$. Then, we can
obtain
$$H_A(S) = \sum_{v \in V} \frac{|S_v|}{|S|}\, H(S_v)$$
where v indexes the subsets $S_v$ into which S is divided according to attribute A. We can then obtain
the information gain as follows:
$$\mathrm{Gain}(S, A) = H(S) - H_A(S)$$
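The entropy and information gain formulas can be tried on a small labelled set. The sketch below assumes base-2 logarithms (the slides do not state the base) and reuses the obesity and diabetes columns of Table 7.6 as example data; the function and variable names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i) over the classes present in labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Gain(S, A) = H(S) - sum_v |S_v| / |S| * H(S_v)."""
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    expected = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - expected


# Obesity (attribute A) and diabetic label for the nine persons in Table 7.6.
obesity = ["No", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "Yes"]
diabetic = ["Yes", "No", "Yes", "Yes", "No", "No", "No", "Yes", "No"]

gain_obesity = information_gain(obesity, diabetic)
```

On this data the gain from splitting on obesity is small (roughly 0.09 bits), which is consistent with obesity alone being a weak predictor in that table.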

To improve the model, 10-fold cross-validation is used on the training set; data from the test participants
are not used in the training phase. Let TP, FP, TN and FN be the true positives (the number of positive instances correctly
predicted as positive), false positives (the number of negative instances incorrectly predicted as positive), true negatives (the number of negative
instances correctly predicted as negative) and false negatives (the number of positive instances incorrectly predicted as negative), respectively. We
define four measurements: accuracy, precision, recall and F1-Measure as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
The F1-Measure is the harmonic mean of precision and recall and represents the overall performance. In addition to
the evaluation criteria above, we most often use the receiver operating characteristic (ROC) curve and the area under the curve
(AUC) to evaluate the pros and cons of the classifier. The ROC curve shows the trade-off between the true positive rate (TPR)
and the false positive rate (FPR), in which $\text{TPR} = TP / (TP + FN)$ and $\text{FPR} = FP / (FP + TN)$. The closer the area is to 1, the better the model.
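These measurements follow directly from the four confusion-matrix counts. The sketch below uses hypothetical counts (45, 5, 40, 10) purely for illustration, and the helper `_ratio`, which returns 0.0 for an empty denominator, is an added assumption rather than part of the definitions.

```python
def _ratio(numerator, denominator):
    """Safe division; returning 0.0 on an empty denominator is an assumption."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall (TPR), F1 and FPR from counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,  # identical to the true positive rate
        "f1": _ratio(2 * precision * recall, precision + recall),
        "fpr": _ratio(fp, fp + tn),
    }


# Hypothetical confusion-matrix counts for one classifier on a test set.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

Sweeping the decision threshold of a classifier and recording the (FPR, recall) pair at each setting yields the points of its ROC curve.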

The inputs to the model are the attribute values of the patient, denoted by $x = (x_1, x_2, \ldots, x_n)$. The output value is $C = \{C_0, C_1\}$,
where $C_0$ indicates that the patient is in the hyperlipemia high-risk population class, and $C_1$ indicates that the
patient is in the hyperlipemia low-risk population class. We are concerned with the accuracy, precision, recall and
F1-Measure on the hospital's dataset. The DT had the highest accuracy on both the training set and the test set. The relative
performance and training time of the five machine learning models are given in Figure 7.12.
Figure 7.12 Relative performance of 5 machine learning methods for disease prediction: (a) relative performance, in percent (%); (b) training and testing times, in seconds
Figure 7.12(a) plots the accuracy, precision, recall rate and F1 performance of all 5 prediction methods. Based on the
datasets we have processed, they all perform in the same range, between 82% and 95%. Considering accuracy alone, the
SVM and DT methods are higher at around 92%, while the other 3 methods stay at around 90%. By precision
measurements, we find that NN and DT are better and KNN is lowest at around 80%. In terms of recall rate, the KNN
method is the worst and the remaining methods are at about the same level, above 90%. Finally, the DT has the highest F1 measure
of 95%, while the others stay around 90%.
Big Data Analytics for Healthcare Applications
Example: Prediction of High-Risk Disease with Five Machine Learning Algorithms

In summary, in terms of training time, as plotted in Figure
7.12(b), we find that KNN takes much longer to train, while the rest
have much lower training times. Based on these results, we rank the DT
method as the highest in performance and the KNN method as the lowest
in overall scores. However, this ranking does not necessarily
hold in general. The relative performance is very
sensitive to the dataset size and characteristics. From the ROC results, we find
that SVM exhibits high performance for high-dimensional cases, whereas
the DT works better for low-dimensional cases (Figure 7.13). Finally, we
summarize the pros and cons of using these five machine learning models
in Table 7.10.
Figure 7.13 ROC curve of the disease prediction results using hospital data
Algorithm | Pros | Cons
Naïve Bayesian | Easy to implement; robust to independent attributes and noise points; training is fast. | Assumes that the attributes of the data set are independent of one another; generally, the classification accuracy is not that high.
K-Nearest Neighbour | Easy to understand; makes no assumption about the distribution of the data set; the data can be multi-dimensional. | Classification speed is slow; all training sets are stored in memory, which raises a storage problem; sensitive to noise.
Support Vector Machine | Can handle high-dimensional data; generally, the accuracy is high; handles abnormal values well. | With high dimensions, it is necessary to choose a good kernel function; training takes longer; demands for storage and CPU are both high.
Neural Network | Handles data with multiple features; classification speed is fast; can address redundant characteristics. | Training time is relatively long; relatively sensitive to noise in the training data.
Decision Tree | Makes no distribution assumption about the data set; classification of the data sets is fast; results are easy to explain. | Is prone to the problem of data fragmentation; the best DT is difficult to identify.
Table 7.10 Strengths and weaknesses of disease detection methods

Mobile Big Data for Disease Control
Figure: Methods for the high-risk patient prediction process encompassing exploration, preprocessing and evaluation stages (pipeline: databases → cleaning and integration → data warehouse → selection and transformation → data mining → patterns → evaluation and presentation → knowledge; stages: problem understanding and data exploration, data preprocessing, classification model, model evaluation, and visualization, interpretation and validation with a domain expert)
This study used a general data mining process, including data pre-processing, data
mining models and data post-processing. Medical big data must be discussed
with the doctor to obtain an understanding of the problem and the data. The
hospital's data were stored in the cloud. To protect the users' privacy and
security, we created a security access mechanism. We first pre-processed the
data, which included handling missing values, duplicate values and abnormal
values, as well as dimension reduction. After extracting feature values according
to the doctor's opinion, we used machine learning algorithms to evaluate the
patient's risk model; finally, the best model was selected via evaluation
using mathematical methods.
The training data were aggregated by
demographics, risk factors and vulnerability, after which the input data were
pre-processed and transformed. Data cleaning included
deciding which strategies to use to handle missing
fields and altering the data according to the requirements. We first identified
uncertain, inaccurate, incomplete or unreasonable medical data and then
modified or deleted them to improve the data quality.
In the clean-up process, we examined the format, integrity, reasonableness and
limitations of the data. Data cleaning is of vital importance for maintaining the
consistency and accuracy of the data analysis. The accuracy of risk prediction
depends on the diversity of the features in the hospital data. We can integrate the
medical data to guarantee data atomicity; e.g., we combined height and
weight to obtain BMI. Following discussion with a domain expert and
Pearson's correlation analyses, we extracted the users' statistical characteristics
and some of the characteristics associated with hyperlipemia and living habits
(such as smoking).

Emotion-Control Healthcare Applications
To provide proper and effective emotion care, we need to
develop an emotion model trained in the cloud on physiological data.
The system should establish unique responses for different user emotion
patterns. For example, with ECG (electrocardiography), the signal is
transmitted to the cloud via smart clothing with ECG acquisition and
transmission functions. When the cloud receives the ECG data, it conducts
analysis and processing in real time. Next, according to the user's unique
identification, the user's emotional state is predicted by the trained model, while
other data collected from the mobile terminal can assist emotion prediction.
When the system detects that the user has negative emotions, it immediately calls on
the relevant equipment and resources to interact emotionally with the
user. For example, when sadness is detected, the system can play music that
eases the user's grief, or even send a command to a robot in
the home so that the robot interacts emotionally with the user through
actions, voice, etc. In this way, the system achieves the effect of
emotional care. The population that needs emotional care includes empty-nest
people, depressive patients, children with autism, long-distance drivers, pilots and
spacemen, prisoners or slaves, etc.
Mental healthcare for special groups of populations

Data Collection and Feature Extraction
Data Style | Data Type | Usage Cue
Physical Data | Physiological data | Heart rate, breathing rate, skin temperature, duration of sleep
Physical Data | Activity level | Static, walking, running
Physical Data | Location | Latitude and longitude coordinates, user retention time
Physical Data | Environmental | Temperature, humidity
Physical Data | Phone screen on/off | The time the screen is on/off
Physical Data | Body video | Facial expression video, head movement video, eye blink video, behavioral video
Cyber Data | Calls | No. of incoming/outgoing calls, average duration of incoming/outgoing calls, no. of missed calls
Cyber Data | SMS | No. of sent/received messages, the length of the messages, content of each SMS
Cyber Data | Emails | No. of sent/received emails
Cyber Data | Application | No. of uses of Office apps, Maps apps, Games apps, Chat apps, Camera app and Video/Music apps
Social Network Data | SNS | The user ID and screen name, no. of friends, content post, repost and comment, image post, repost and comment, content or image creation time
Table 7.11 Various data types in providing emotion control services
Wearable devices and mobile phones are used to
collect data every 30 minutes. The data collected are then
categorized into physical data, cyber data and social
network data. Physical data consist of physiological data,
activity level, location information, environmental data, phone
screen on/off and body videos. Cyber data include phone
call logs, SMS logs, email logs and application usage
logs. Social network data include SNS data. The
user's emotional status is obtained mainly
through the following two methods: i) self-labeling by the user;
and ii) labeling through transfer learning. Table 7.11 shows
the data collection in detail. Data preprocessing mainly
covers the following four aspects: data cleaning,
redundancy elimination, data integration and time series
normalization.

Transfer Learning based Labeling for Emotion DetectionSource
Domain
Target
Domain
Instance Space
Feature Space
FS FT
XT
XS Source Domain
Feature Space
FS
Translator
Target Domain
Feature Space
FT
bridge
T(FS,FT)
T(FS,FT) P(FS|FT)
Calculate the similarity of FS and FT
label
(a) The instance space (b) The concept of transfer learning
Figure 7.16 Instance space and feature space for transfer machine learning
Typically, each person has his or her own behavioral pattern in terms of behavioral states and living habits, i.e. different people may show different physiological signals and living habits under the same emotion. As shown in Figure 7.16, different people express the emotion of happiness through different behaviors, which can be sensed by multimodal person-centric data. One key point is to match a single type of emotion with the behaviors of various users through transfer learning. The concept of transfer learning is illustrated below. Various data types are given in Table 7.11 in providing emotion control services.
Let $X_S$ be the source instance space, i.e. the collected data that have mood labels, and let $X_T$ be the target instance space, i.e. the collected data that do not have mood labels. $F_S$ and $F_T$ are the feature spaces corresponding to $X_S$ and $X_T$, respectively. As shown in Figure 7.16(a), $C$ denotes the label space of a number of emotional modes: {happiness, sadness, fear, anger, disgust, surprise}. The transfer learning model applies a Markov chain ($c \rightarrow f_s \rightarrow f_t \rightarrow x_t$), where $x_t \in X_T$, $f_t \in F_T$, $f_s \in F_S$ and $c \in C$.

Transfer Learning based Labeling for Emotion Detection
Our goal is to estimate the conditional probability $p(c \mid x_t)$. First, we need to find a translator $T(f_t, f_s) \propto p(f_t \mid f_s)$ to link the two feature spaces. The similarity of features is used to judge the similarity of feature domains. As shown in Figure 7.16(b), we link the features $f_s$ and $f_t$ through the following equation:

$$D_{JS}(P_T \,\|\, P_S) = \frac{1}{2}\left( D_{KL}(P_T \,\|\, M) + D_{KL}(P_S \,\|\, M) \right)$$

where $M = \frac{1}{2}(P_S + P_T)$ and the KL-divergence $D_{KL}$ is defined as:

$$D_{KL}(P_T \,\|\, P_S) = \sum_{x \in X} P_T(x) \log \frac{P_T(x)}{P_S(x)}$$
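The two divergences above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that $P_T$ and $P_S$ are given as lists of probabilities over the same bins, it uses the natural logarithm (the slides do not state the base), and it treats terms with zero probability as contributing zero.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions over the same bins.

    Bins where p is zero contribute nothing (the usual convention).
    Natural logarithm is an assumption; the source does not give the base.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p_t, p_s):
    """D_JS(P_T || P_S) = 1/2 * (D_KL(P_T || M) + D_KL(P_S || M)), M = (P_S + P_T) / 2."""
    m = [0.5 * (a + b) for a, b in zip(p_t, p_s)]
    return 0.5 * (kl_divergence(p_t, m) + kl_divergence(p_s, m))


# Hypothetical histograms of a feature in the target and source domains.
p_target = [0.1, 0.4, 0.5]
p_source = [0.2, 0.3, 0.5]
d = js_divergence(p_target, p_source)
```

Because $M$ is positive wherever either input is positive, the KL terms inside `js_divergence` never divide by zero, which is one practical reason to prefer the Jensen–Shannon form over a direct KL comparison.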
For time-series data, we first normalize them into [0, 1]. Time series collected from the source domain and the target domain are denoted by $M_S$ and $M_T$, respectively. Dynamic time warping (DTW) is used to measure the similarity of $M_S$ and $M_T$ as

$$D(i, j) = d(m_i, n_j) + \min\{ D(i-1, j),\; D(i, j-1),\; D(i-1, j-1) \}$$

where $d(m_i, n_j) = (m_i - n_j)^2$, $m_i \in M_S$, $n_j \in M_T$. As $D(i, j)$ decreases, $M_S$ and $M_T$ become more similar, so we pick out the n sequences with the lowest distance. Having linked $f_t$ and $f_s$, we can compute the labels of the top-N most probable sequences.

For the text, we extract the words whose scores lie in [−1, −0.4] ∪ [0.4, 1]. According to SentiWordNet, the vectors of scores from the source and target domains are denoted by $V_S$ and $V_T$. We then use cosine similarity to measure the similarity between $V_S$ and $V_T$ as:

$$\cos(\theta) = \frac{V_S \cdot V_T}{\|V_S\| \, \|V_T\|}$$
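The two similarity measures above can be sketched as follows. This is a hedged illustration rather than the authors' code: the boundary conditions of the DTW table ($D(0,0) = 0$ and infinity along the other borders) are the standard textbook choice and are an assumption here, since the slides give only the recurrence; returning 0 for a zero-length score vector is likewise an assumption.

```python
import math


def dtw_distance(m_s, m_t):
    """Dynamic time warping distance between two numeric sequences.

    Implements D(i, j) = d(m_i, n_j) + min(D(i-1, j), D(i, j-1), D(i-1, j-1))
    with d(m_i, n_j) = (m_i - n_j) ** 2. Border values are an assumption.
    """
    inf = float("inf")
    rows, cols = len(m_s), len(m_t)
    table = [[inf] * (cols + 1) for _ in range(rows + 1)]
    table[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            cost = (m_s[i - 1] - m_t[j - 1]) ** 2
            table[i][j] = cost + min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
    return table[rows][cols]


def cosine_similarity(v_s, v_t):
    """cos(theta) = (V_S . V_T) / (||V_S|| * ||V_T||); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(v_s, v_t))
    norm = math.sqrt(sum(a * a for a in v_s)) * math.sqrt(sum(b * b for b in v_t))
    return dot / norm if norm else 0.0


# Hypothetical normalized series: the second is the first with one sample repeated.
same_shape = dtw_distance([0.0, 1.0], [0.0, 0.0, 1.0])
# Hypothetical sentiment-score vectors.
similarity = cosine_similarity([0.5, -0.6], [0.5, -0.6])
```

Note the opposite orientations: a smaller DTW distance means more similar sequences, while a larger cosine value means more similar score vectors.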

Transfer Learning based Labeling for Emotion Detection
Flowchart (Figure 7.17): the input feature vector Xt is divided by data type, and each type follows its own branch:
◆ SD (Statistical Data): estimate probability distribution, get relative entropy, output labels of top-N distributions.
◆ TD (Time-series Data): normalize SF data, measure DTW distance, output labels of top-N sequences.
◆ CD (Content Data): calculate score vector, get cosine similarity, output labels of top-N vectors.
The result is the most frequently outputted label.
Having linked $f_t$ and $f_s$, we can compute the labels of the top-N most probable vectors. As shown in Figure 7.17, $P_S$ is the probability distribution of $f_s$ from the source domain, obtained for example by plotting the frequency of body temperature values. For physiological data, call, SMS, email and application data, we adopt the same method to estimate the distribution. $P_T$ is the probability distribution of $f_t$ from the target domain. The Jensen–Shannon divergence is widely used for measuring the similarity between two probability distributions: $D_{JS}(P_T \,\|\, P_S)$ equals zero if and only if the two distributions $P_T$ and $P_S$ are identical. So we pick out the n distributions with the lowest divergence. Having linked $f_t$ and $f_s$, we can compute the labels of the top-N most probable distributions.
Figure 7.17 The concept of transfer learning for emotion labeling
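The final step of Figure 7.17, choosing the most frequently outputted label, can be sketched as below. The slides do not say how the three branches are combined or how ties are broken, so pooling the top-N labels from every branch and taking a simple majority is an assumption, and the function names and scores are hypothetical.

```python
from collections import Counter


def top_n_labels(scored, n, smaller_is_closer=True):
    """Return the labels of the n labeled source items closest to a target item.

    scored: list of (score, label) pairs. For divergences and DTW distances a
    smaller score means closer; for cosine similarity pass smaller_is_closer=False.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=not smaller_is_closer)
    return [label for _, label in ranked[:n]]


def most_frequent_label(*label_lists):
    """Pool the labels from all branches and return the most frequent one (None if empty)."""
    counts = Counter(label for labels in label_lists for label in labels)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical scores of one unlabeled target sample against labeled source samples.
sd = top_n_labels([(0.02, "happiness"), (0.30, "sadness"), (0.05, "happiness")], n=2)
td = top_n_labels([(1.2, "fear"), (0.4, "happiness")], n=1)
cd = top_n_labels([(0.9, "sadness"), (0.1, "anger")], n=1, smaller_is_closer=False)
label = most_frequent_label(sd, td, cd)
```

In case of a tie, `Counter.most_common` returns the label that was counted first, so the branch order would matter; a real system would need an explicit tie-breaking rule.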

Emotion Interaction through IoT and Clouds
Example 8.7 The AIWAC emotion monitoring system developed at Huazhong University of Science and Technology
Layered architecture of the AIWAC emotion monitoring system (reprinted with permission from Zhang et al., 2015)
Traditional affective prediction relies on analyzing a single type of emotional data, which may lead to inaccurate detection results. To overcome this difficulty, we present an emotion detection architecture named AIWAC. AIWAC stands for Affective Interaction through Wearable Computing And Cloud. The system collects emotional data from multiple sources, namely the cyber, physical and social spaces. In the physical space, the user's physiological data is collected, including various body signals such as EEG, ECG, electromyography (EMG), blood pressure and blood oxygen.
In the cyber space, we use a computer to collect, store and transfer the user's facial and/or behavioral video contents. In the social space, the user's profile, behavioral data and interactive social contents are extracted. With the availability of social networking services, IoT frameworks, and 4G/5G mobile networks, the affective data collected over a long observation period is truly a big data source. AIWAC provides users with physiological and psychological healthcare support. AIWAC is developed in three layers: (1) a user terminal layer with wearable devices for physiological data collection and emotional feedback; (2) a communication layer; and (3) a cloud layer for affective interaction.

Emotion-Control via Robotics Technologies
Humanoid robotics for affective interactions
between AIWAC and clients.
Humanoid robots have made great progress, but they still face many technical challenges before they can be fully integrated into human life. Equipping the humanoid robot with the ability to interact emotionally is one of the most challenging of these problems.
Robot affection interaction based on cloud computing
With cloud computing technology, users do not need to understand every detail of the cloud computing infrastructure, to have the corresponding professional knowledge, or to control it directly.

A 5G Cloud-Centric Healthcare System
The concept of the smart cognitive system is illustrated with the following features:
◆ Through future 5G telecommunication technologies, sensors, cognitive devices and robots interact smoothly over ultra-reliable low-latency communications.
◆ The design of networking is enhanced so that it can move data quickly. For the retrieval or access of stored big data, 5G networks connect terminal devices and data centers at a very fast rate, facilitating quick learning response.
◆ Learning from data is the heart of cognitive computing, and a cloud data center is the main hardware facility for advanced learning.
◆ Cognitive computing requires a wealth of available data, and clouds are implemented and configured to store and process those data.
To build a smart cognitive system in the 5G era, the system needs to include three functional components:
1) Behavioral interaction terminal: cognitive behaviors in a cognitive system should be displayed in terminals; to achieve this, robots of varied types and increasingly powerful functions are favorable alternatives;
2) Environmental perception component: realization of cognition should be based on big data, and the cognitive component should realize the comprehensive perception of hearing, vision, touch and human emotion;
3) Cognitive reasoning component: the intelligent cognitive reasoning model can effectively simulate the human cognitive process, and related technologies, including AI, machine learning, deep learning, cloud computing and other effective tools, are used to establish the cognitive reasoning model.

A 5G Cloud-Centric Healthcare System
Architecture of a smart cloud/IoT/5G based cognitive system
The system is divided into three layers. The first layer is built with smart terminals, a cloud-based RAN and a cloud-based core network. The heterogeneous access networks interconnect smart terminals, such as smartphones, smart watches, robots, smart cars and other devices. The edge cloud and remote cloud are the infrastructures that support the realization of cognitive functions in terms of storage and computing resources. The second layer is for resource management; it supports a resource cognitive engine to achieve resource optimization and high energy efficiency. The third layer provides data cognitive capability. In the data cognitive engine, AI and big-data learning techniques are employed for cognitive big-data analytics, such as in the domain of healthcare. The big data flow represents the process of massive data collection, storage and analysis with the support of cloud or IoT. The traffic flow consists of packets and control messages during users' end-to-end communications.

Two applications of a smart cognitive system
A 5G Cloud-Centric Healthcare System
Here, we show two archetypal applications of the smart cognitive system. In telemedicine, remote surgery may be designed to save lives. Using the 5G network, the critical operating actions and haptic perception of the surgeon are mapped to the robot arm at the remote operating table with very short delay and high reliability. In addition, all vital data of the patient can be processed with analytics tools at the remote cloud in real time, to guide the rescue team in carrying out some preliminary life-saving operations before transporting the patient to the hospital. The second archetypal application is to detect human emotions with the help of smart robots, which interact with clouds to execute responsive actions that calm patients. Many research experiments have been suggested in the past. The cloud/IoT based system may help to solve emotion control problems in the future.

Conclusion
In this presentation, we have focused on big-data applications in the biomedical and healthcare areas. However, the datasets we have tested in the illustrated example cases are not sufficiently large in scale to draw a general conclusion on TB or PB datasets. This chapter needs the background from previous chapters. Big data and clouds both demand a major overhaul of our educational programs in science and technology. There is no unique or general solution to big-data problems, due to their heavy dependence on specific application domains.
We must leverage the use of clouds and big-data analytics in storing, processing and mining big data, which changes rapidly in time and space. Clouds, mobile, IoT and social networks are changing our world, reshaping human relations, promoting the global economy and triggering societal and political reforms on a world scale. These machine learning methods may perform differently if non-medical or non-healthcare datasets are processed or tested. However, learning the machine learning methodologies is more important in general big data science and cloud computing applications.