Data Science Bootcamp Capstone Project - Wong Chee Fah.pptx

WongCheeFah, 17 slides, Jul 30, 2024

About This Presentation

Capstone project presentation


Slide Content

Caregiver Quality of Life by National Council of Social Service
Identifying Potentially At-Risk Caregivers Through Machine Learning: An Investigation into Its Feasibility
Data Science Bootcamp Capstone Project
Wong Chee Fah

Introduction
- Overview of the NCSS Quality of Life Study on Caregivers
  - Conducted in 2018
  - Examined the well-being of caregivers
  - Provided a holistic view of the aspects of life that caregivers deem important
- Importance of caregiver well-being and support
  - When caregivers struggle, care recipients risk early institutionalization
  - Risk of potentially harmful behaviour towards care recipients when caregivers experience:
    - cognitive impairments
    - increased physical symptoms
    - a higher risk of clinical depression

Problem Statement
- Potentially at-risk caregivers: defined as caregivers who gave low scores on questions A1 and A2
  - Question A1: How would you rate your quality of life?
    “Very poor”, “Poor”, “Neither poor nor good”, “Good”, “Very good”
  - Question A2: How satisfied are you with your health?
    “Very dissatisfied”, “Dissatisfied”, “Neither satisfied nor dissatisfied”, “Satisfied”, “Very satisfied”
- Are there good demographic and caregiving-arrangement indicators for flagging potentially at-risk caregivers?
- If no strong indicators are found, will the aggregate or raw WHOQOL scores provide any insights?
  - How well do they identify caregivers who gave low scores to A1 and A2?

The Study
- Study population
  - Informal caregivers aged 21 and above
  - Provided care to an individual requiring support due to age, disability, illness or special needs
  - Paid caregivers excluded
- Participant recruitment (sampling)
  - Care recipients from hospitals and Special Education (SPED) schools
  - Users of various social service agencies
  - Ministry of Social and Family Development (MSF) Disability Office's database
- Survey content
  - Caregiver and care recipient characteristics and caregiving arrangements
  - World Health Organization Quality of Life (WHOQOL) instrument
  - Respondents’ awareness of, usage of, and need for services for themselves and their care recipients

Machine Learning Strategy
- Labels/targets: A1 and A2
  - Not factored into the WHOQOL aggregate scores
- WHOQOL questions A3 to A26 cover facets including:
  - Physical: A3, A10, A16
  - Psychological: A5, A7, A11, A19, A26
  - Level of Independence: A20, A21, A22
  - Social Relationships: A4, A15, A17, A18
  - Environment: A8, A9, A12, A13, A14, A23, A24, A25
  - Personal Beliefs: A6
  - Both raw scores and aggregated WHOQOL scores are available
- Predictors
  - Caregiver-care recipient demographics and caregiving arrangement
  - Raw scores from A3 to A26 (excluding A21, which is almost 50% NULL)
  - Aggregate WHOQOL scores
  - Caregiver-care recipient demographics and caregiving arrangement, combined with either the raw scores or the WHOQOL scores
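The slide lists which items feed each WHOQOL domain but not how the dataset's aggregate columns were computed. The sketch below assumes each domain score is simply the mean of its answered 1-5 items, with missing answers such as A21 skipped. The official WHOQOL scoring procedures also reverse-code some negatively worded items and rescale domain scores, so treat this as illustrative only. The function name `domain_scores` is hypothetical, and the output keys mirror the `WHOQOL.*` predictor names used in the model formulas.

```python
# Domain-to-item mapping copied from the slide. Scoring each domain by a plain
# mean of answered items is an ASSUMPTION for illustration, not the study's method.
WHOQOL_DOMAINS = {
    "Physical": ["A3", "A10", "A16"],
    "Psychological": ["A5", "A7", "A11", "A19", "A26"],
    "Independence": ["A20", "A21", "A22"],
    "Social.Relationships": ["A4", "A15", "A17", "A18"],
    "Environment": ["A8", "A9", "A12", "A13", "A14", "A23", "A24", "A25"],
    "Beliefs": ["A6"],
}


def domain_scores(responses):
    """Average the non-missing 1-5 item scores within each domain.

    responses: dict mapping item name (e.g. "A3") to an int 1-5, or None if unanswered.
    Returns a dict mapping "WHOQOL.<domain>" to the mean, or None if no item was answered.
    """
    scores = {}
    for domain, items in WHOQOL_DOMAINS.items():
        answered = [responses[i] for i in items if responses.get(i) is not None]
        scores[f"WHOQOL.{domain}"] = sum(answered) / len(answered) if answered else None
    return scores


# Usage: a respondent who skipped A21 still gets an Independence score from A20 and A22.
example = {f"A{i}": 3 for i in range(3, 27)}
example["A20"], example["A21"] = 4, None
print(domain_scores(example)["WHOQOL.Independence"])  # 3.5
```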

Machine Learning Strategy
- Random Forest
  - Can handle mixed data types
  - As an ensemble, generalizes better and is more accurate than a single decision tree
- Cross-validation
  - Helps guard against over-fitting
  - Gives a more reliable estimate of performance
- Dataset: 80-20 train-test split
- Train models for A1 and A2 separately
- Consider combining A1 and A2, as they are somewhat correlated: the bands in the A2 vs A1 filled bar plot on this slide show a diagonal slope pattern
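The slide names an 80-20 split and cross-validation but gives no further details. The sketch below shows one common way to set both up. Several choices here are assumptions: stratifying the split by label (so rare low-score answers appear in both sets), using 5 folds, and the fixed seed. The function names are hypothetical, and this is not the author's code.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class separately so both
    sets keep roughly the original class proportions (stratification is assumed)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def kfold_indices(n, k=5, seed=42):
    """Yield (fit_idx, val_idx) position pairs for k-fold cross-validation over n rows."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]
```

Cross-validation would then run on the training rows only. For example, `kfold_indices(len(train_idx))` yields positions into `train_idx`, which keeps the 20% test set untouched for the final accuracy and sensitivity figures.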

Preliminary Results (Separate A1 and A2 Models)
- Sensitivity is low for the low-score classes of A1 and A2 across all models and predictor sets
- The best-performing models use the WHOQOL aggregate scores: model_A1_WHOQOL and model_A2_WHOQOL

Per-class sensitivity columns give the A1 level / A2 level.

| Model | Formula | Accuracy | Very poor / Very dissatisfied | Poor / Dissatisfied | Neither poor nor good / Neither satisfied nor dissatisfied | Good / Satisfied | Very good / Very satisfied |
|---|---|---|---|---|---|---|---|
| model_A1 | A1 ~ A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.6489 | 0.000000 | 0.058824 | 0.522200 | 0.861500 | 0.217950 |
| model_A2 | A2 ~ A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.6519 | 0.100000 | 0.220930 | 0.361810 | 0.877600 | 0.338240 |
| model_A1_mtry | A1 ~ A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.6656 | 0.100000 | 0.137255 | 0.625900 | 0.814700 | 0.282050 |
| model_A1_all | A1 ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.6589 | 0.200000 | 0.215690 | 0.614800 | 0.798400 | 0.282050 |
| model_A1_top4 | A1 ~ A5 + F12 + F21 + F20_1_Period | 0.6122 | 0.000000 | 0.000000 | 0.511100 | 0.796300 | 0.282050 |
| model_A1_WHOQOL | A1 ~ WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.6167 | 0.300000 | 0.137255 | 0.514800 | 0.773900 | 0.333330 |
| model_A2_WHOQOL | A2 ~ WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.6297 | 0.200000 | 0.220930 | 0.331660 | 0.859000 | 0.264710 |
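Sensitivity in these results is per-class recall: among respondents whose true answer falls in a class, it is the fraction the model assigned to that class. The short sketch below uses toy labels, not study data, to compute both metrics. It shows how overall accuracy can look acceptable while a rare class gets zero sensitivity, the same pattern seen for the "Very poor" and "Poor" classes above. The helper names are hypothetical.

```python
from collections import Counter


def per_class_sensitivity(y_true, y_pred):
    """Recall per class: correct predictions of a class / true members of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example: the rare "Very good" class is never predicted correctly.
y_true = ["Poor", "Poor", "Good", "Good", "Good", "Very good"]
y_pred = ["Good", "Poor", "Good", "Good", "Good", "Good"]
print(accuracy(y_true, y_pred))               # about 0.667
print(per_class_sensitivity(y_true, y_pred))  # {'Poor': 0.5, 'Good': 1.0, 'Very good': 0.0}
```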

Combine A1 and A2
- We don’t need the granularity of 5 classes; we just want to catch the low scores
- Bin A1-A2 pairs into two classes:
  - All A1-A2 pairs with at least a “Very poor” or “Poor” score in A1, or a “Very dissatisfied” or “Dissatisfied” score in A2: labelled “Poor and/or Dissatisfied”
  - All other pairs: labelled “OK or Better”
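The binning rule above can be written as a small labelling function. This is a sketch only. The function and list names are hypothetical, while the label strings and answer levels are taken from the slides (the study's target variable is called A1_A2_grp). A pair is placed in the low class if either answer is one of the bottom two levels of its scale.

```python
# Answer levels in ascending order, as listed on the Problem Statement slide.
A1_LEVELS = ["Very poor", "Poor", "Neither poor nor good", "Good", "Very good"]
A2_LEVELS = ["Very dissatisfied", "Dissatisfied",
             "Neither satisfied nor dissatisfied", "Satisfied", "Very satisfied"]


def a1_a2_group(a1, a2):
    """Two-class label: low if A1 or A2 is in the bottom two levels of its scale."""
    low = A1_LEVELS.index(a1) <= 1 or A2_LEVELS.index(a2) <= 1
    return "Poor and/or Dissatisfied" if low else "OK or Better"


print(a1_a2_group("Good", "Dissatisfied"))  # Poor and/or Dissatisfied
```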


Combine A1 and A2 (Expanded)
- Expand the low-score class to include A1-A2 pairs that are “Neither poor nor good” and “Neither satisfied nor dissatisfied”
  - Low-score class: labelled “Poor, Dissatisfied or So-so”
  - All other pairs: labelled “Good or Better”
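The expanded rule can be sketched the same way. The slide's wording is open to interpretation. This sketch assumes a pair is low if either answer is at or below the neutral midpoint, which fits the other class being called "Good or Better". A narrower reading, where only the neutral/neutral pair is added, is also possible. The function name is hypothetical; the target variable in the study is A1_A2_grp2.

```python
A1_LEVELS = ["Very poor", "Poor", "Neither poor nor good", "Good", "Very good"]
A2_LEVELS = ["Very dissatisfied", "Dissatisfied",
             "Neither satisfied nor dissatisfied", "Satisfied", "Very satisfied"]


def a1_a2_group2(a1, a2):
    """Expanded two-class label (ASSUMED reading): low if either answer is at or
    below the neutral midpoint (index 2) of its 5-level scale."""
    low = A1_LEVELS.index(a1) <= 2 or A2_LEVELS.index(a2) <= 2
    return "Poor, Dissatisfied or So-so" if low else "Good or Better"


print(a1_a2_group2("Neither poor nor good", "Very satisfied"))  # Poor, Dissatisfied or So-so
```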

Combined A1-A2 Results
- Sensitivity for the low combined class is better than in the separate A1 and A2 models, except for models using only demographic predictors
- The best-performing model is for the expanded combined class with all predictors: model_A1_A2_grp2

| Model | Formula | Accuracy | Sensitivity (low-score class) |
|---|---|---|---|
| model_A1_A2_grp | A1_A2_grp ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.8769 | 0.27778 |
| model_A1_A2_grp2 | A1_A2_grp2 ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.8415 | 0.56120 |
| model_A1_A2_grp_dem | A1_A2_grp ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition | 0.8603 | 0.00000 |
| model_A1_A2_grp2_dem | A1_A2_grp2 ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition | 0.7373 | 0.00000 |
| model_A1_A2_grp_WHOQOL | A1_A2_grp ~ WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.8825 | 0.31746 |
| model_A1_A2_grp2_WHOQOL | A1_A2_grp2 ~ WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.8259 | 0.51050 |
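A sensitivity of 0 for the demographic-only models means they never correctly flagged a low-score respondent. Their accuracy can therefore be no higher than the test-set share of the majority class. The sketch below scores a trivial classifier that always predicts the most frequent class, using made-up counts (14 low, 86 OK) rather than the study's real class sizes. It shows why accuracy alone is misleading here. The function name is hypothetical.

```python
def majority_baseline(y_true, positive):
    """Score a classifier that always predicts the most frequent class.

    Returns (accuracy, sensitivity for the `positive` class); sensitivity is None
    if `positive` does not occur in y_true.
    """
    majority = max(set(y_true), key=y_true.count)
    y_pred = [majority] * len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    n_pos = sum(t == positive for t in y_true)
    hits = sum(t == p == positive for t, p in zip(y_true, y_pred))
    return acc, (hits / n_pos if n_pos else None)


# Hypothetical test set: 14 low-score and 86 "OK or Better" respondents.
print(majority_baseline(["low"] * 14 + ["ok"] * 86, positive="low"))  # (0.86, 0.0)
```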

Investigation into Imbalance
- Far fewer low A1 or A2 scores
- Would correcting the imbalance improve model sensitivity to low-score A1-A2 pairs?
  - Split the dataset into train and test sets
  - Get the sample count for every A1-A2 pair in the train set
  - Get the maximum sample count
  - Over-sample each A1-A2 pair in the train set up to the maximum sample count
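The steps above can be sketched as random over-sampling within each A1-A2 pair. The slide does not say how the extra rows were drawn. This sketch assumes rows are duplicated at random with replacement until every pair reaches the count of the most common pair. As the slide specifies, it is applied to the training set only, so the test set keeps its natural distribution. The function name and row format are hypothetical.

```python
import random
from collections import defaultdict


def oversample_pairs(rows, seed=42):
    """Duplicate rows (sampled with replacement) within each (A1, A2) pair until
    every pair has as many rows as the most frequent pair.

    rows: list of dicts with at least "A1" and "A2" keys (training set only).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[(row["A1"], row["A2"])].append(row)
    target = max(len(g) for g in groups.values())  # the maximum sample count
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced
```

Because the duplicates are exact copies, a random forest can partly memorise them. That may be one reason this kind of balancing yields only modest gains.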

Balanced Pairs Results
- Demographic-only models are still unable to predict with a usable level of sensitivity
- Sensitivity is marginally better in some models; the best model is model_A1A2_bal2_dem_WHOQOL

| Model | Formula | Accuracy | Sensitivity (low-score class) |
|---|---|---|---|
| model_A1A2_bal_dem | A1A2_grp ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition | 0.8404 | 0.03968 |
| model_A1A2_bal2_dem | A1A2_grp2 ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition | 0.7084 | 0.18143 |
| model_A1A2_bal_A | A1A2_grp ~ A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.8703 | 0.23810 |
| model_A1A2_bal2_A | A1A2_grp2 ~ A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.8182 | 0.55700 |
| model_A1A2_bal_WHOQOL | A1A2_grp ~ WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.8470 | 0.32540 |
| model_A1A2_bal2_WHOQOL | A1A2_grp2 ~ WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.7949 | 0.59490 |
| model_A1A2_bal_all | A1A2_grp ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.8825 | 0.24603 |
| model_A1A2_bal2_all | A1A2_grp2 ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.8259 | 0.55270 |
| model_A1A2_bal_dem_WHOQOL | A1A2_grp ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition + WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.8858 | 0.30952 |
| model_A1A2_bal2_dem_WHOQOL | A1A2_grp2 ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition + WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.8492 | 0.60340 |

Balanced Class Results
- Demographic-only models are still unable to predict with a usable level of sensitivity
- Sensitivity is better than with pair balancing; the best model is model_A1A2_balc2_WHOQOL

| Model | Formula | Accuracy | Sensitivity (low-score class) |
|---|---|---|---|
| model_A1A2_balc_dem | A1A2_grp ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition | 0.8326 | 0.08730 |
| model_A1A2_balc2_dem | A1A2_grp2 ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition | 0.7007 | 0.23207 |
| model_A1A2_balc_A | A1A2_grp ~ A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.8681 | 0.34921 |
| model_A1A2_balc2_A | A1A2_grp2 ~ A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.8248 | 0.63290 |
| model_A1A2_balc_WHOQOL | A1A2_grp ~ WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.8459 | 0.41270 |
| model_A1A2_balc2_WHOQOL | A1A2_grp2 ~ WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.8126 | 0.64980 |
| model_A1A2_balc_all | A1A2_grp ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.8747 | 0.34127 |
| model_A1A2_balc2_all | A1A2_grp2 ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17 + A18 + A19 + A20 + A22 + A23 + A24 + A25 + A26 | 0.8293 | 0.61180 |
| model_A1A2_balc_dem_WHOQOL | A1A2_grp ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition + WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.8780 | 0.45238 |
| model_A1A2_balc2_dem_WHOQOL | A1A2_grp2 ~ cg_age_band + cg_gender + cg_edu + cg_employ_5cat + F9 + F12 + F15 + cg.relation.r + F20_1_Period + F21 + F25a + Condition + WHOQOL.Physical + WHOQOL.Psychological + WHOQOL.Social.Relationships + WHOQOL.Independence + WHOQOL.Environment + WHOQOL.Beliefs | 0.8525 | 0.64560 |
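The slides do not describe how class balancing was done. By analogy with the pair method, the sketch below assumes random over-sampling of the binary target class in the training set. Minority-class rows are resampled with replacement until each class matches the majority-class count. Alternatives such as under-sampling the majority class or using class weights would also count as "class balancing". The function name is hypothetical.

```python
import random
from collections import Counter


def oversample_classes(X, y, seed=42):
    """Random over-sampling by class label (ASSUMED method): resample rows of each
    smaller class with replacement until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for i in rng.choices(idx, k=target - n):  # k == 0 for the majority class
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out
```

Balancing on the two-class target directly, rather than on all 25 A1-A2 pairs, concentrates the duplicated rows on the class the models need to detect. That is consistent with the larger sensitivity gains reported for class balancing.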

Findings
- Challenges in identifying caregivers with low A1 or A2 scores
  - Visualizations show that certain groups of caregivers (e.g. the unemployed) have lower quality of life
  - The reverse is not straightforward: at least as many caregivers with low A1 and/or A2 scores did not have low scores on other dimensions, or belonged to different brackets of a demographic dimension
- Adjustments to the machine learning strategy
  - Modeling A1 and A2 separately gave the worst performance
  - Binning into two classes improved performance substantially, especially with the expanded low-score class
  - A1-A2 pair balancing improved model performance little, if at all
  - Class balancing improved sensitivity across all models
  - Models using only demographic and caregiving-arrangement predictors consistently performed worst

Conclusion & Recommendations
- The given dataset produced models that were inadequate for accurate and reliable identification of potentially at-risk caregivers. Data gaps need to be looked into.
- Follow-up questions for future surveys
  - Go back to low-score respondents to discover what factors, other than the ones they already responded to, led them to choose low A1 or A2 scores
  - This information can be a basis for formulating follow-up questions in future surveys
- Encourage/incentivise caregiver communication
  - Provide non-intrusive, anonymous channels for voluntary information sharing by caregivers
  - Make use of technology (e.g. a simple app that regularly asks “How are you today?”)
- Obtain care recipient perspectives
  - Care recipients have a different perspective and may see what caregivers themselves are blind to

Thank You