A Predictive Diagnosis Assistant in an Electronic Medical Records Platform
FaithNassiwa1
16 slides
Jun 29, 2024
About This Presentation
This presentation addresses the critical issue of misdiagnosis in healthcare through the development of a Predictive Diagnosis Assistant (PDA). Utilizing the DDXPlus dataset developed by Tchango et al., which includes over 1.2 million records covering 49 diseases and 219 features, this research focuses on structuring and optimizing the data for machine learning applications. By leveraging eXtreme Gradient Boosting (XGBoost) and various feature selection techniques, the PDA is integrated into a demo utility EMR platform developed using Streamlit. The PDA predicts the top three possible diagnoses based on patient records, aiming to support healthcare providers in enhancing diagnostic accuracy and patient outcomes.
Slide Content
A Predictive Diagnosis Assistant in an Electronic Medical Records Platform
DS5500: Phase 2 Presentation
Team 14: Sri Charitha Narra, Graziano Peregrino Cezario, Kelly Uwase Rubangura, Faith Nassiwa
●Background
●Goal and Objectives
●Dataset Overview
●Feature Selection and Modeling
●User Interface Build
●Database Modeling
●Takeaways
Background
Misdiagnosis in healthcare is a critical issue that leads to patient harm and significant financial waste: nearly 25% of expenditures go to low-value care, costing up to $101 billion annually.
[Figure: Diagnostic Errors; labels: Delayed Treatment, Inappropriate Lab Test / Results]
Project Goal
Our solution, a machine learning (ML) based Predictive Diagnosis Assistant, aims to minimize diagnostic errors by analyzing past, accurately diagnosed patient records to suggest potential conditions and enhance treatment plans.
DDXPlus Dataset Overview
Data Summary
Data Source: Extracted from the article "DDXPlus: A New Dataset for Automatic Medical Diagnosis" (Tchango et al., 2022)
Type             Description            Counts
Patient Records  Train & Test           1,025,602 & 134,526
Features         Categorical & Boolean  10 & 209
Diseases         Categorical            49
DDXPlus EDA
10 most frequent evidences
● Do you have pain somewhere, related to your reason for consulting?
● Are you experiencing shortness of breath or difficulty breathing in a significant way?
● Do you have a cough?
● Do you smoke cigarettes?
● Do you have a fever (either felt or measured with a thermometer)?
● Do you drink alcohol excessively or do you have an addiction to alcohol?
● Are you feeling nauseous or do you feel like vomiting?
● Do you have asthma or have you ever had to use a bronchodilator in the past?
● Are you significantly overweight compared to people of the same height as you?
Modeling Evaluation Results
Feature Selection                  No. of Features  Accuracy  ROC-AUC
49 Diseases, FI at 0.995 - CV      102              93%       0.99
Top 10 Diseases, FI >0.005 - CV    20               91%       0.99
Top 10 Diseases, PCA               83               96.85%    0.99
49 Diseases, Chi square            100              90.3%     0.99
49 Diseases, Chi square            50               88.51%    0.50
49 Diseases, FI at 0.95 - CV       58               66.74%    0.51
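The "FI" rows in the table select features by feature importance, but the slides do not spell out the exact rule. The sketch below assumes two readings: "FI at 0.995" keeps the highest-ranked features until their cumulative normalized importance reaches 0.995, and "FI >0.005" keeps every feature whose own normalized importance exceeds 0.005. The function names and the toy scores are hypothetical; real scores would come from a trained model.

```python
def select_by_cumulative_importance(importances, threshold):
    """Keep top-ranked features until cumulative normalized importance reaches threshold."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        if cumulative >= threshold:
            break  # enough importance mass is already covered
        selected.append(name)
        cumulative += score / total
    return selected


def select_by_min_importance(importances, min_share):
    """Keep each feature whose normalized importance is strictly above min_share."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score / total > min_share]


# Made-up raw importance scores for illustration only (not from DDXPlus).
toy_importances = {"cough": 500, "fever": 300, "pain": 150, "smoker": 46, "age": 4}

print(select_by_cumulative_importance(toy_importances, 0.995))
print(select_by_min_importance(toy_importances, 0.005))
```

Under either reading, a looser threshold keeps fewer features, which matches the table's drop from 102 features at 0.995 to 58 features at 0.95.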
Model Discussion
Multi-stage / model approach
●Model 1: Trained on a subset of data, Top 10 Diseases and 20 Important Features
●Model 2: Trained on a subset of data, all 49 Diseases and 102 Important Features
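One way to picture the two-model setup is to run both models on the same patient record and report the top three diagnoses from each. This is a minimal sketch, not the team's implementation: `top_k`, `predict_both`, and the two stand-in models are hypothetical, and the disease names and probabilities are illustrative values only.

```python
def top_k(probabilities, k=3):
    """Return the k (disease, probability) pairs with the highest probability."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


def predict_both(record, model_10, model_49, k=3):
    """Run both models on one record and collect each model's top-k diagnoses."""
    return {
        "model_1_top10_diseases": top_k(model_10(record), k),
        "model_2_all49_diseases": top_k(model_49(record), k),
    }


# Stand-in models (assumption): each maps a record to {disease: probability}.
# A real version would wrap the trained classifiers instead.
def fake_model_10(record):
    return {"URTI": 0.60, "Bronchitis": 0.25, "Pneumonia": 0.10, "Influenza": 0.05}


def fake_model_49(record):
    return {"URTI": 0.40, "Pneumonia": 0.30, "Bronchitis": 0.20, "Anemia": 0.10}


record = {"cough": True, "fever": True}  # illustrative patient record
result = predict_both(record, fake_model_10, fake_model_49)
for model_name, diagnoses in result.items():
    print(model_name, diagnoses)
```

With a scikit-learn-style classifier, the probability dictionaries could be built by pairing the output of `predict_proba` with the class labels.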
Home Page: Model 1 (20 MIF Predictions) & Model 2 (102 MIF Predictions) Prediction Results for 20 Test Patients
EMR Platform Development
01 Integrated Pre-Trained Models: xgboost_10, xgboost_49
02 Subset Test Data: 1000 Test Patients
03 Streamlit Pages: Home, Predict_10 and Predict_49
EMR Frontend Development
Pages:
-Home
-Predict 10
-Predict 49
DB Discussion
Why?
●Scalability
●Data retrieval and querying
●Reporting and analysis
●Report issues
Questions:
●Do you have swollen or painful lymph nodes?
●Are you taking any new oral anticoagulants?
●Are you immunosuppressed?
●Do you have heart failure?
Questions represented in the table for patient_id 112385
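As a rough illustration of how such questions could be stored and queried per patient, the sketch below uses an in-memory SQLite database. The `patient_evidence` table, its columns, and the helper functions are assumptions, since the slides do not show the schema. The answers are placeholders, not the patient's actual data.

```python
import sqlite3

QUESTIONS = [
    "Do you have swollen or painful lymph nodes?",
    "Are you taking any new oral anticoagulants?",
    "Are you immunosuppressed?",
    "Do you have heart failure?",
]


def build_demo_db():
    """Create an in-memory database with one row per (patient, question)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE patient_evidence (
            patient_id INTEGER NOT NULL,
            question   TEXT    NOT NULL,
            answer     INTEGER NOT NULL CHECK (answer IN (0, 1)),
            PRIMARY KEY (patient_id, question)
        )
        """
    )
    # Placeholder answers (all 1); the slides do not give the real values.
    conn.executemany(
        "INSERT INTO patient_evidence VALUES (?, ?, ?)",
        [(112385, question, 1) for question in QUESTIONS],
    )
    conn.commit()
    return conn


def questions_for_patient(conn, patient_id):
    """Return the questions recorded as positive for one patient."""
    cursor = conn.execute(
        "SELECT question FROM patient_evidence "
        "WHERE patient_id = ? AND answer = 1 ORDER BY question",
        (patient_id,),
    )
    return [row[0] for row in cursor.fetchall()]


conn = build_demo_db()
for question in questions_for_patient(conn, 112385):
    print(question)
```

A long table like this, keyed on patient and question, keeps retrieval and reporting queries simple as the number of questions grows.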
Overall Project Timeline and Milestones
P1: Wk1
Planning, Data Gathering and Initial EDA
> Proposal 1 (Completed)
P1: Wk3
Dataset 1: Preprocessing and Modelling
> Preliminary Report 1 (Completed)
P1: Wk6
Dataset 2: Preprocessing, Modelling and Model Selection
> Report 1 (Completed)
P2: Wk8
EMR Development and Model Integration
> Preliminary Report 2 (Completed)
P2: Wk10
Data Pre-Population in EMR and Testing
> MVP (Deprioritized)
P2: Wk11
Final Improvements
> Report 2 (In Progress)
MVP: Minimum Viable Product
Takeaways
Lessons Learned
●Utility of multistage predictive models in healthcare
●Effectiveness of feature engineering and selection
●Impact of domain knowledge in model refinement
Future Work
●Database Integration
●Refining User Interface
●Exploring Advanced ML Techniques