Data Science Bootcamp Capstone Project - Wong Chee Fah.pptx
WongCheeFah
17 slides
Jul 30, 2024
About This Presentation
Capstone project presentation
Slide Content
Caregiver Quality of Life by National Council of Social Services
Identifying Potentially At-Risk Caregivers Through Machine Learning – An Investigation into Its Feasibility
Data Science Bootcamp Capstone Project
Wong Chee Fah
Introduction
- Overview of the NCSS Quality of Life Study on Caregivers
  - Conducted in 2018
  - Examines the wellbeing of caregivers
  - Provides a holistic view of the aspects of life deemed important to caregivers
- Importance of caregiver well-being and support
  - When caregivers struggle, care recipients risk early institutionalization
  - Risk of potentially harmful behaviour towards care recipients when caregivers experience:
    - cognitive impairments
    - increased physical symptoms
    - higher risk of clinical depression
Problem Statement
- Potentially at-risk caregivers
  - Defined as caregivers who scored low on questions A1 and A2
  - Question A1: How would you rate your quality of life?
    - “Very poor”, “Poor”, “Neither poor nor good”, “Good”, “Very good”
  - Question A2: How satisfied are you with your health?
    - “Very dissatisfied”, “Dissatisfied”, “Neither satisfied nor dissatisfied”, “Satisfied”, “Very satisfied”
- Are there good demographic and caregiving-arrangement indicators for flagging potentially at-risk caregivers?
- If no strong indicators are found, will the aggregate or raw WHOQOL scores provide any insights? How well do they identify caregivers who gave low scores to A1 and A2?
The Study
- Study population
  - Informal caregivers aged 21 and above
  - Provided care to an individual requiring support due to age, disability, illness or special needs
  - Exclusion of paid caregivers
- Participant recruitment methods - Sampling
  - Care recipients from hospitals and Special Education (SPED) schools
  - Users of various social service agencies
  - Ministry of Social and Family Development (MSF) Disability Office's database
- Survey content
  - Caregiver and care recipient characteristics and caregiving arrangements
  - World Health Organization Quality of Life (WHOQOL) instrument
  - Respondents’ awareness, usage, and required services for themselves and their care recipients
Machine Learning Strategy
- Labels/Targets: A1 and A2
  - Not factored into WHOQOL aggregate scores
- WHOQOL questions A3 to A26 cover facets including
  - Physical: A3, A10, A16
  - Psychological: A5, A7, A11, A19, A26
  - Level of Independence: A20, A21, A22
  - Social Relationships: A4, A15, A17, A18
  - Environment: A8, A9, A12, A13, A14, A23, A24, A25
  - Personal Beliefs: A6
- Both raw scores and aggregated WHOQOL scores are available
- Predictors
  - Caregiver-care recipient demographic and caregiving arrangement
  - Raw scores from A3 to A26 (excluding A21, which has almost 50% NULLs)
  - Aggregate WHOQOL scores
  - Combination of caregiver-care recipient demographic and caregiving arrangement with raw scores or WHOQOL scores
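The facet grouping above can be written down as a small lookup table. The sketch below is only illustrative: the dataset already ships aggregate WHOQOL scores, and the slide does not say how they were computed, so the plain per-facet mean used here is an assumption and not the official WHOQOL scoring procedure. The function name `facet_means` and the sample respondent are hypothetical.

```python
from statistics import mean

# Facet-to-question mapping as listed on the slide.
FACETS = {
    "Physical": ["A3", "A10", "A16"],
    "Psychological": ["A5", "A7", "A11", "A19", "A26"],
    "Level of Independence": ["A20", "A21", "A22"],
    "Social Relationships": ["A4", "A15", "A17", "A18"],
    "Environment": ["A8", "A9", "A12", "A13", "A14", "A23", "A24", "A25"],
    "Personal Beliefs": ["A6"],
}
EXCLUDED = {"A21"}  # dropped as a predictor on the slide: almost 50% NULLs


def facet_means(responses):
    """Average the available raw scores per facet.

    ASSUMPTION: a plain mean stands in for the real aggregation,
    which the slides do not describe.
    """
    result = {}
    for facet, questions in FACETS.items():
        values = [
            responses[q]
            for q in questions
            if q not in EXCLUDED and responses.get(q) is not None
        ]
        result[facet] = mean(values) if values else None
    return result


# Hypothetical respondent: every item scored 3, except A3 scored 5.
respondent = {f"A{i}": 3 for i in range(3, 27)}
respondent["A3"] = 5
scores = facet_means(respondent)
```

Keeping the mapping in one dictionary makes it easy to switch between raw-score and facet-level predictor sets when comparing models.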
Machine Learning Strategy
- Random Forest
  - Can handle mixed data types
  - Ensemble for better generalization and accuracy compared to single decision trees
- Cross-validation
  - Reduces the risk of over-fitting
  - More reliable estimate of performance
- Dataset: 80-20 train-test split
- Train models for A1 and A2 separately
- Consider combining A1 and A2, as there is some correlation between them, visible in the diagonal slope pattern of the bands in the A2 vs A1 filled bar plot
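A minimal sketch of this setup, assuming scikit-learn and NumPy are available. The data is synthetic stand-in data, not the NCSS survey, and the hyperparameters (`n_estimators=100`, 5 folds, stratified split) are illustrative choices that the slides do not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 respondents, 10 ordinal predictors scored 1-5.
X = rng.integers(1, 6, size=(200, 10))
# Synthetic binary target loosely tied to the first two predictors.
y = (X[:, 0] + X[:, 1] <= 4).astype(int)

# 80-20 split; stratifying keeps the minority class in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Cross-validate on the training portion only, then fit and score once
# on the held-out 20%.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
```

Running cross-validation inside the training split keeps the test set untouched until the final check, which is what makes the held-out score a fair estimate.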
Combine A1 and A2
- Don’t need the granularity of 5 classes
- We just want to catch the low scores
- Bin A1-A2 pairs into two classes
  - All A1-A2 pairs with at least a “Very poor” or “Poor” score in A1, or a “Very dissatisfied” or “Dissatisfied” score in A2: labelled “Poor and/or Dissatisfied”
  - All other pairs: labelled “OK or Better”
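The two-class binning rule can be sketched as a small function. The answer strings and class labels come from the slides; the function name `bin_pair` is hypothetical.

```python
LOW_A1 = {"Very poor", "Poor"}
LOW_A2 = {"Very dissatisfied", "Dissatisfied"}


def bin_pair(a1, a2):
    """Map an (A1, A2) answer pair to one of the two classes."""
    # A low answer on either question is enough to flag the pair.
    if a1 in LOW_A1 or a2 in LOW_A2:
        return "Poor and/or Dissatisfied"
    return "OK or Better"


# A good quality-of-life rating paired with low health satisfaction
# still lands in the low class.
example = bin_pair("Good", "Dissatisfied")
```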
Combine A1 and A2 (Expanded)
- Expand the low score class to include A1-A2 pairs that are “Neither poor nor good” and “Neither satisfied nor dissatisfied”
- Low score class: labelled “Poor, Dissatisfied or So-so”
- All other pairs: labelled “Good or Better”
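A sketch of the expanded rule follows. The slide wording is ambiguous about whether a pair needs both answers to be neutral or only one, so the default below takes the literal reading (both neutral). The `either_neutral` flag and the function name `bin_pair_expanded` are hypothetical additions for illustration.

```python
LOW_A1 = {"Very poor", "Poor"}
LOW_A2 = {"Very dissatisfied", "Dissatisfied"}
NEUTRAL_A1 = "Neither poor nor good"
NEUTRAL_A2 = "Neither satisfied nor dissatisfied"


def bin_pair_expanded(a1, a2, either_neutral=False):
    """Map an (A1, A2) pair to the expanded two-class labels.

    ASSUMPTION: by default only pairs where BOTH answers are neutral join
    the low class; either_neutral=True uses the looser reading instead.
    """
    low = a1 in LOW_A1 or a2 in LOW_A2
    if either_neutral:
        so_so = a1 == NEUTRAL_A1 or a2 == NEUTRAL_A2
    else:
        so_so = a1 == NEUTRAL_A1 and a2 == NEUTRAL_A2
    if low or so_so:
        return "Poor, Dissatisfied or So-so"
    return "Good or Better"
```

Which reading matches the original analysis changes the class sizes, so it is worth confirming before comparing results.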
Investigation into Imbalance
- Far fewer low A1 or A2 scores
- Would correcting the imbalance improve model sensitivity to low-score A1-A2 pairs?
- Split dataset into train and test sets
- Get sample count for every A1-A2 pair in train set
- Get maximum sample count
- Over-sample each A1-A2 pair in train set to the maximum sample count
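The over-sampling steps above can be sketched with the standard library alone. This assumes random resampling with replacement, which the slide does not confirm. The function name `oversample_pairs`, the row layout, and the numeric 1-5 coding of A1 and A2 are hypothetical. Only the training rows should be passed in, so the test set stays untouched.

```python
import random
from collections import Counter, defaultdict


def oversample_pairs(train_rows, seed=0):
    """Resample every (A1, A2) group up to the size of the largest group."""
    rng = random.Random(seed)

    # Group training rows by their A1-A2 pair.
    groups = defaultdict(list)
    for row in train_rows:
        groups[(row["A1"], row["A2"])].append(row)

    # The largest pair count is the target size for every group.
    target = max(len(members) for members in groups.values())

    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Draw extra rows with replacement to close the gap (none if k == 0).
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical training rows: one common pair and two rare pairs.
train = (
    [{"A1": 4, "A2": 4}] * 5
    + [{"A1": 2, "A2": 2}] * 2
    + [{"A1": 1, "A2": 3}]
)
balanced = oversample_pairs(train)
counts = Counter((row["A1"], row["A2"]) for row in balanced)
```

After balancing, every pair appears as often as the most common one, so rare low-score pairs carry the same weight during training.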
Findings
- Challenges in identifying caregivers with low A1 or A2
  - Visualizations show certain groups of caregivers have lower quality of life (e.g. unemployed)
  - The reverse is not so straightforward: as many or more of those with low A1 and/or A2 scores did not have low scores for other dimensions, or came from different brackets of a demographic dimension
- Adjustments to the machine learning strategy
  - Modeling A1 and A2 alone gave the worst performance
  - Binning into two classes improved performance significantly, especially with the expanded low score class
  - A1-A2 pair balancing did not improve model performance much, if at all
  - Class balancing improved performance of all models
  - Models using only demographic and caregiving-arrangement predictors consistently had the worst performance
Conclusion & Recommendations
- The given dataset produced models that were inadequate for accurate and reliable identification of potentially at-risk caregivers. Data gaps need to be looked into.
- Follow-up questions for future surveys
  - Go back to low score respondents to discover what factors, other than the ones they already responded to, caused them to choose low A1 or A2 scores
  - The information can be a basis for formulating follow-up questions in future surveys
- Encourage/Incentivise caregiver communication
  - Provide non-intrusive, anonymous channels for voluntary information sharing by caregivers
  - Make use of technology (e.g. simple app that asks “How are you today?” regularly)
- Obtain care recipient perspectives
  - Care recipients have a different perspective and may see what caregivers themselves are blind to