Prof_Nisheeth_for_dedial_deep_learning.pptx


About This Presentation

deep learning


Slide Content

Digital public goods for AI in Health in India. Nisheeth, CSE/CGS/CDIS @ IITK. 12/17/2024

Failure mode: data leakage
- Epic Systems is one of the largest EHR vendors in the US.
- They released a sepsis onset prediction model in 2017, which is deployed in hundreds of US hospitals.
- A 2021 external validation study of the algorithm's performance found:
  - It detects only 7% of sepsis cases
  - It generated sepsis predictions for about 20% of all hospitalized patients
Anatomy of a failure

Failure mode: data leakage
- The Epic sepsis model is proprietary, so nobody knew that it used whether a doctor has prescribed antibiotics as a predictive feature.
- If a doctor is prescribing antibiotics, they already suspect sepsis: the feature leaks the very outcome the model is supposed to predict.
- So the model performed well in internal testing and pilot demonstrations on back-tested data, but had low PPV in real-world settings.
Closed testing produces black swans

Trusting AI systems can be hard
- AI systems cannot be fully evaluated using conventional software-engineering quality testing.
- Governance of AI systems must recognize this basic fact.
- To show they are trustworthy, AI systems must demonstrate that they do what they claim to do.
- Modern AI systems can sometimes fail in unexpected ways (Raji et al., 2022).

Governing misbehaving AI systems
How could this have been avoided? Third-party testing. But that is expensive, presupposes an ecosystem, etc.
- Say a standard, ABC, standardizes third-party testing procedures for a class of models.
- Epic can choose to follow ABC, but how do vendors know they should follow ABC? How do vendors know they have correctly followed ABC?
- A regulator can require vendors delivering sepsis prediction models to follow ABC, but how does the regulator know which models, sectors, or products should follow ABC?
How do we trust AI systems?

Make AI in India: a short story
Meet Dr. Sharma and her friend. They aim to develop a reliable AI model for heart failure prediction. They obtain data from their state's premier superspeciality hospital, secure some BIRAC funding, and start rolling.

Initial success...
Heart failure dataset. Model: random forest. Results: accuracy 90%, ROC-AUC 0.93, cross-validation accuracy 86.7% ± 4.3%.
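A minimal sketch, assuming a local CSV and scikit-learn, of how metrics like these are typically computed; the file name and label column are illustrative, not details from the presentation:

```python
# Sketch: random-forest evaluation of the kind reported above.
# Assumptions: a CSV "heart_failure.csv" with a binary "HF" label column;
# these names are illustrative only.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, roc_auc_score

df = pd.read_csv("heart_failure.csv")
X, y = df.drop(columns=["HF"]), df["HF"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("Accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("ROC-AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

cv = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation accuracy
print(f"CV accuracy: {cv.mean():.1%} ± {cv.std():.1%}")
```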

... and scaling failure
Dr. Sharma trained a model using the data she had. Here is the model's performance when tested on some unseen held-out data.

Classical solution: standardization and certification
- BIS could standardize third-party testing procedures for AI systems used to make healthcare decisions.
- Hospital regulators could require that algorithms deployed in their hospitals meet this standard.
- Some agency could certify that an algorithm meets this standard.
But AI systems quality testing is very challenging.

The AI quality testing trilemma
- Testing a variety of use cases centrally enhances reliability, but this requires a central private data repository, restricting openness.
- Letting people test independently permits openness and enables coverage, but people are then likely to overfit independently.
- Testing on publicly available benchmarks is open, and reliable for data distributions consistent with existing benchmarks, but not for use cases outside the coverage of those benchmarks.
Trilemma: you can get any two of openness, coverage, and reliability, but not all three.

The trilemma explained
- Open datasets for model testing quickly lose statistical validity.
- Closed datasets for model testing fail to take long-tail coverage scenarios into account.
- Model providers have no incentive to perform true out-of-sample validation.
- Model consumers frequently have no clear understanding of the potential failure modes of complex AI models.

The statistical validity problem (Ioannidis, 2005)

Positive Predictive Value
The probability that an effect reported as statistically significant is true.
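For reference, the formulas behind this definition, from the cited Ioannidis (2005) paper: with pre-study odds R that a probed relationship is true, statistical power 1 − β, and significance level α,

```latex
% PPV of a reported research finding (Ioannidis, 2005):
\mathrm{PPV} = \frac{(1-\beta)\,R}{R - \beta R + \alpha}
% With bias u (the fraction of analyses that would not otherwise be
% findings but get reported as such):
\mathrm{PPV} = \frac{(1-\beta)\,R + u\beta R}{R + \alpha - \beta R + u - u\alpha + u\beta R}
```

In the bias-free case a finding is more likely true than false only when PPV > 1/2, i.e., when (1 − β)R > α.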

Table: PPV of research findings for various combinations of power (1 − β), ratio of true to not-true relationships (R), and bias (u).
Ioannidis JPA (2005). Why Most Published Research Findings Are False. PLOS Medicine 2(8): e124. https://doi.org/10.1371/journal.pmed.0020124

A chance encounter
Dr. Sharma and her friend meet Prof. Srivastava at a conference; he suggests a solution to their problem: a set of three AI DPGs addressing the AI quality testing trilemma.

Health information exchange consent management (HIECM)
- The architecture of the ABDM National Health Stack already has the concept of consent management built in.
- Patients can provide consent tokens to registered medical entities to review their electronic health records held by other medical entities.
- We propose to extend the concept of consent management from clinical settings to research settings:
  - Electronic unbundling of EHRs
  - Informed consent for medical data usage for research purposes
  - Provable differential privacy guarantees for anonymization (toy example below)
Stakeholder: Citizens
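As a toy illustration of the kind of differential privacy guarantee mentioned above (not the actual HIECM anonymization pipeline), a Laplace-mechanism release of a count query:

```python
# Toy ε-differentially-private count release (Laplace mechanism).
# Illustrative only; not the HIECM implementation.
import numpy as np

def dp_count(records, predicate, epsilon: float) -> float:
    """Release the number of records satisfying `predicate` with ε-DP.

    A counting query has L1 sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise of scale 1/ε suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Example: noisy count of patients over 60 in a toy cohort.
cohort = [{"age": a} for a in (45, 62, 71, 58, 66)]
print(dp_count(cohort, lambda r: r["age"] > 60, epsilon=0.5))
```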

Quality preserving databases (QPD)
A QPD is a management layer on top of a public research database that:
- Requires users to specify test characteristics
- Requires a manager to allocate the significance level for each test
- Requires users to compensate for dataset usage in the form of additional data samples, restoring statistical validity commensurate to that lost by their testing
In theory, a QPD:
- Can serve an infinite series of requests
- Satisfies fairness and stability requirements
- Controls statistical validity levels using alpha-investing (see the sketch below)
Stakeholder: Model providers
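A minimal sketch of the alpha-investing rule (Foster and Stine, 2008) that a QPD manager could use to allocate significance levels across a stream of tests; the wealth parameters and spending policy here are illustrative defaults, not the QPD's actual settings:

```python
# Sketch of alpha-investing for allocating significance levels across a
# sequence of tests against a shared dataset. Parameters are illustrative.

class AlphaInvestor:
    def __init__(self, initial_wealth: float = 0.05, payout: float = 0.025):
        self.wealth = initial_wealth  # remaining alpha-wealth
        self.payout = payout          # wealth earned per rejection (omega)

    def next_alpha(self) -> float:
        # One simple spending policy: invest half the current wealth.
        # With a = 0.5W / (1 + 0.5W), the cost a/(1-a) = 0.5W never
        # exceeds the wealth, so the budget stays non-negative.
        return 0.5 * self.wealth / (1.0 + 0.5 * self.wealth)

    def test(self, p_value: float) -> bool:
        alpha_j = self.next_alpha()
        reject = p_value <= alpha_j
        if reject:
            self.wealth += self.payout               # earn wealth on a discovery
        else:
            self.wealth -= alpha_j / (1 - alpha_j)   # pay for a failed test
        return reject

investor = AlphaInvestor()
for p in (0.001, 0.20, 0.04, 0.60):
    print(investor.test(p), round(investor.wealth, 4))
```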

An overview of the framework…
- FedClient: supports data pre-processing and model training on private data
- FedServer: responsible for learning the global model and benchmarking it on the held-out data (a minimal aggregation sketch follows)
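A minimal sketch of the server-side federated-averaging step such a FedServer might run to learn a global model without seeing client data; the function name and weighting scheme are illustrative, not the framework's actual API:

```python
# FedAvg-style aggregation: average per-layer weights across clients,
# weighting each client by its sample count. Illustrative only.
import numpy as np

def fed_avg(client_weights: list[list[np.ndarray]],
            client_sizes: list[int]) -> list[np.ndarray]:
    """Return the sample-size-weighted average of per-layer weights."""
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]

# Two toy clients, one "layer" each:
w_a, w_b = [np.array([1.0, 2.0])], [np.array([3.0, 4.0])]
print(fed_avg([w_a, w_b], client_sizes=[100, 300]))  # -> [array([2.5, 3.5])]
```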

Upload and pre-process your data
One can view the test data format on the server, then pre-process the private data as needed.

1. Register as a client
2. Select the model

Define the model config and request training
- Based on the model config and one's claims, the server will calculate the price of the training (in terms of data points), and one has to pay that price to start the training (a hypothetical request is sketched below).
- Other registered clients can see this request and participate in the training.
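A hypothetical example of what such a training request might look like; every field name here is an assumption for illustration, not the framework's actual schema:

```python
# Hypothetical training request; all field names are illustrative,
# not the framework's actual schema.
training_request = {
    "model": "random_forest",
    "task": "heart_failure_prediction",
    "config": {"n_estimators": 300, "max_depth": 10},
    "claims": {"accuracy": 0.90, "roc_auc": 0.93},  # claimed performance
    "payment_data_points": 500,  # price quoted by the server, paid in samples
}
```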

Okay... I'll start by pre-processing our data. Let's start training our heart failure prediction model.

Overall Training Steps

After training…
- FedServer will update the benchmarks if the recent model beats the previous benchmark (rule sketched below). This benchmarking is trusted and publicly available.
- After benchmarking is done, data points equivalent to the training cost will be collected from the client.
*This image is not exactly from our framework; we are currently developing this feature.
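A minimal sketch of the benchmark-update rule described above; names are illustrative, not the framework's actual implementation:

```python
# Publish a new model's score only if it beats the current benchmark.
benchmarks: dict[str, float] = {"heart_failure_prediction": 0.88}

def maybe_update_benchmark(task: str, new_score: float) -> bool:
    """Update and publish the benchmark when the new model is better."""
    if new_score > benchmarks.get(task, float("-inf")):
        benchmarks[task] = new_score
        return True
    return False

print(maybe_update_benchmark("heart_failure_prediction", 0.91))  # True
```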

The model learned after training (figure)

The model learned after training
- No data is shared between stakeholders
- Model performance improves
- Third-party testing
- Public benchmarks on a statistically robust private dataset

Solving the quality testing trilemma

Trilemma resolved with AI DPGs
- There is a central private test set, but it is open to testing while retaining statistical validity, using QPD.
- Leveraging ABDM enables access to data across a nationwide context through HIECM, providing rich and comprehensive data access.
- The OBP enables trustworthy testing and benchmarking, simultaneously serving regulatory and quality-assurance purposes.

Key partners

Moving forward
- We are thinking about how to enable asymmetric federated learning on our platform.
- We are also adding model pipelines and datasets to the platform.
- There is lots of scope for digital forensics and authentication tools.
- We're working with NHA to define a stage-wise deployment plan. Inputs from other institutions are extremely welcome at this stage.
- We need beta-testers willing to play with models and datasets on our platform. We will be launching it as part of an AI for health hackathon during Techkriti@IITK. People interested in testing earlier are welcome to contact me.

Questions? Comments? www.iitk.ac.in/cdis [email protected]