A Predictive Diagnosis Assistant in an Electronic Medical Records Platform

FaithNassiwa1 30 views 16 slides Jun 29, 2024

About This Presentation

This presentation addresses the critical issue of misdiagnosis in healthcare through the development of a Predictive Diagnosis Assistant (PDA). Utilizing the DDXPlus dataset developed by Tchango et al., which includes over 1.2 million records covering 49 diseases and 219 features, this research focu...


Slide Content

A Predictive Diagnosis Assistant in
an Electronic Medical Records
Platform
DS5500: Phase 2 Presentation

Team 14: Sri Charitha Narra, Graziano Peregrino Cezario, Kelly Uwase Rubangura, Faith
Nassiwa

Github: https://github.com/faithNassiwa/predictive-diagnosis-assistant

Agenda

●Background
●Goal and Objectives
●Dataset Overview
●Feature Selection and Modeling
●User Interface Build
●Database Modeling
●Takeaways

Background

Misdiagnosis in healthcare is a critical issue that leads to
patient harm and significant financial waste: nearly 25% of
expenditures go to low-value care, costing up to
$101 billion annually.




[Figure: Diagnostic Errors, Delayed Treatment, Inappropriate Lab Test / Results]

Project Goal
Our solution, a machine learning (ML)-based
Predictive Diagnosis Assistant, aims to minimize
diagnostic errors by analyzing past, accurately
diagnosed patient records to suggest potential
conditions and improve treatment plans.

DDXPlus Dataset Overview
Data Summary











Data Source: Extracted from the article “DDXPlus: new dataset
for medical diagnosis. Tchango et al. 2022”


Type             Description            Counts
Patient Records  Train & Test           1,025,602 & 134,526
Features         Categorical & Boolean  10 & 209
Diseases         Categorical            49

DDXPlus EDA
10 most frequent evidences
● Do you have pain somewhere, related to your
consulting?
● Are you experiencing shortness of breath or
difficulty breathing in a significant way?
● Do you have a cough?
● Do you smoke cigarettes?
● Do you have a fever (either felt or measured
with a thermometer)?
● Do you drink alcohol excessively or do you
have an addiction to alcohol?
● Are you feeling nauseous or do you feel like
vomiting?
● Do you have asthma or have you ever had to
use a bronchodilator in the past?
● Are you significantly overweight compared to
people of the same height as you?

Feature Selection & Modeling
Feature Selection Techniques
●Correlation Analysis (CA)
●Feature Importance (FI)
●Chi-Square
Feature Extraction
●Principal Component Analysis (PCA)
Modeling - XGBoost
●Iterative Modeling - Train and evaluate performance with varying feature selections
●Cross-Validation (CV) - Ensure reproducibility of evaluation results
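
The slides list chi-square among the feature selection techniques without showing how it was computed. As a minimal sketch, assuming the Pearson chi-square statistic on a feature-by-disease contingency table (the team's actual implementation may differ, for example a library routine), each evidence column can be scored against the diagnosis label and the top-scoring columns kept. The function names and the toy data in the usage note are hypothetical.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic for one categorical feature vs. class labels."""
    n = len(labels)
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    observed = Counter(zip(feature_values, labels))

    score = 0.0
    for f, f_count in feature_counts.items():
        for c, c_count in label_counts.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def top_k_features(columns, labels, k):
    """Rank feature columns by chi-square score and keep the k highest."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, with `columns = {"cough": [1, 1, 0, 0], "smoker": [1, 0, 1, 0]}` and `labels = ["A", "A", "B", "B"]`, `top_k_features(columns, labels, 1)` keeps `cough`, since it separates the two labels perfectly while `smoker` is independent of them.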

Modeling Evaluation Results
Feature Selection                  No. of Features  Accuracy  ROC-AUC
49 Diseases, FI at 0.995 - CV      102              93%       0.99
Top 10 Diseases, FI >0.005 - CV    20               91%       0.99
Top 10 Diseases, PCA               83               96.85%    0.99
49 Diseases, Chi square            100              90.3%     0.99
49 Diseases, Chi square            50               88.51%    0.50
49 Diseases, FI at 0.95 - CV       58               66.74%    0.51
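
The labels "FI at 0.995" and "FI >0.005" are not defined in the slides. One plausible reading, offered here only as an assumption, is that "FI at 0.995" keeps the highest-ranked features until they cover 99.5% of the total importance, while "FI >0.005" keeps every feature whose individual importance exceeds 0.005. The sketch below illustrates both rules on a plain dictionary of importance scores; the function names are hypothetical, and in practice the scores would come from the trained model.

```python
def select_by_min_importance(importances, threshold):
    """Keep features whose individual importance exceeds the threshold."""
    return [name for name, score in importances.items() if score > threshold]


def select_by_cumulative_importance(importances, target):
    """Keep top-ranked features until they cover `target` of total importance."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(importances.values())

    selected, running = [], 0.0
    for name, score in ranked:
        if running >= target * total:
            break  # Coverage reached; drop the remaining low-importance features.
        selected.append(name)
        running += score
    return selected
```

Under this reading, raising the cumulative target from 0.95 to 0.995 admits more low-importance features, which is consistent with the feature counts in the table (58 versus 102).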

Model Discussion
Multi-stage / model approach
●Model 1: Trained on a subset of data, Top 10 Diseases and 20 Important Features

●Model 2: Trained on a subset of data, all 49 Diseases and 102 Important Features
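
The slides do not say how the two models are combined. As a hedged illustration only, one common way to chain such models is to accept Model 1's prediction when it is confident and otherwise fall back to Model 2. In the sketch below, the `min_confidence` cutoff, the function name, and the model interface (a callable that returns a mapping of disease to probability) are all assumptions, not the team's design.

```python
from typing import Callable, Dict, Tuple

# Assumed interface: a model maps a patient record to {disease: probability}.
Predictor = Callable[[dict], Dict[str, float]]


def two_stage_predict(
    patient: dict,
    model_top10: Predictor,
    model_all49: Predictor,
    min_confidence: float = 0.8,  # hypothetical cutoff, not from the slides
) -> Tuple[str, float, str]:
    """Return (disease, probability, stage) using a confidence-based fallback."""
    probs = model_top10(patient)
    disease, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence >= min_confidence:
        return disease, confidence, "model_1"

    # Model 1 is unsure, so consult the model trained on all 49 diseases.
    probs = model_all49(patient)
    disease, confidence = max(probs.items(), key=lambda kv: kv[1])
    return disease, confidence, "model_2"
```

The returned stage label makes it possible to show in the interface which model produced a given prediction.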

Home Page: Model 1 (20 MIF Predictions) & Model 2 (102 MIF Predictions) Prediction Results for 20 Test Patients

EMR Platform Development
●Integrated Pre-Trained Models: xgboost_10, xgboost_49
●Subset Test Data: 1000 Test Patients
●Streamlit Pages: Home, Predict_10 and Predict_49

EMR Frontend Development
Pages:
-Home
-Predict 10
-Predict 49

Database Modeling - ERD
ERD: Entity Relationship Diagram

DB Discussion
Why?
●Scalability
●Data retrieval and querying
●Reporting and analysis
●Report issues

Questions:
●Do you have swollen or painful lymph nodes?
●Are you taking any new oral anticoagulants?
●Are you immunosuppressed?
●Do you have heart failure?
Questions represented in the table for patient_id 112385
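
The ERD itself is not reproduced in this transcript, so the schema is unknown. Purely as an illustration of the "data retrieval and querying" benefit, the sketch below assumes a hypothetical three-table layout (`patients`, `questions`, `patient_evidences`) in an in-memory SQLite database and retrieves the questions recorded for patient_id 112385. All table, column, and function names are assumptions.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database using a hypothetical schema (not the team's ERD)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (patient_id INTEGER PRIMARY KEY);
        CREATE TABLE questions (
            question_id INTEGER PRIMARY KEY,
            text TEXT NOT NULL
        );
        CREATE TABLE patient_evidences (
            patient_id INTEGER NOT NULL REFERENCES patients(patient_id),
            question_id INTEGER NOT NULL REFERENCES questions(question_id),
            PRIMARY KEY (patient_id, question_id)
        );
        """
    )
    # Question texts taken from the slide for patient_id 112385.
    questions = [
        "Do you have swollen or painful lymph nodes?",
        "Are you taking any new oral anticoagulants?",
        "Are you immunosuppressed?",
        "Do you have heart failure?",
    ]
    conn.execute("INSERT INTO patients (patient_id) VALUES (?)", (112385,))
    conn.executemany(
        "INSERT INTO questions (question_id, text) VALUES (?, ?)",
        list(enumerate(questions, start=1)),
    )
    conn.executemany(
        "INSERT INTO patient_evidences (patient_id, question_id) VALUES (?, ?)",
        [(112385, qid) for qid in range(1, len(questions) + 1)],
    )
    conn.commit()
    return conn


def questions_for_patient(conn, patient_id):
    """Return the question texts recorded for one patient, in question order."""
    rows = conn.execute(
        """
        SELECT q.text
        FROM patient_evidences AS pe
        JOIN questions AS q ON q.question_id = pe.question_id
        WHERE pe.patient_id = ?
        ORDER BY q.question_id
        """,
        (patient_id,),
    ).fetchall()
    return [text for (text,) in rows]
```

Calling `questions_for_patient(build_demo_db(), 112385)` returns the four questions listed above; a patient with no recorded evidences yields an empty list.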

Overall Project Timeline and Milestones
P1: Wk1
Planning, Data Gathering and Initial EDA
> Proposal 1 (Completed)
P1: Wk3
Dataset 1: Preprocessing and Modelling
> Preliminary Report 1 (Completed)
P1: Wk6
Dataset 2: Preprocessing, Modelling and Model Selection
> Report 1 (Completed)
P2: Wk8
EMR Development and Model Integration
> Preliminary Report 2 (Completed)
P2: Wk10
Data Pre-Population in EMR and Testing
> MVP (Deprioritized)
P2: Wk11
Final Improvements
> Report 2 (In Progress)

MVP: Minimum Viable Product

Takeaways
Lessons Learned
●Utility of multistage predictive models in healthcare
●Effectiveness of feature engineering and selection
●Impact of domain knowledge in model refinement

Future Work
●Database Integration
●Refining User Interface
●Exploring Advanced ML Techniques

Thank You