Developing Distributed High-performance Computing Capabilities of an Open Science Platform for Robust Epidemic Analysis

globusonline · 41 slides · May 30, 2024

About This Presentation

COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and the broad response from the scientific community have forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting...


Slide Content

DEVELOPING DISTRIBUTED HIGH-PERFORMANCE COMPUTING CAPABILITIES OF AN OPEN SCIENCE PLATFORM FOR ROBUST EPIDEMIC ANALYSIS (OSPREY)

Jonathan Ozik (1,2)

Work with: Valerie Hayot-Sasson (1,2), Nick Collier (1,2), Justin M. Wozniak (1,2), Abby Stevens (1,2), Yadu Babuji (1,2), Mickaël Binois (3), Arindam Fadikar (1,2), Alexandra Würth (3), Kyle Chard (1,2), Chick Macal (1,2)

1. Argonne, 2. UChicago, 3. Inria

[Slides 2-5: images from https://www.chicagomag.com/chicago-magazine/march-2021/67-days-to-lockdown/]

Rapidly Evolving Policy and Epidemiology Questions (March-July 2020)

CityCOVID model for Chicago: 2.7 M people (agents) move hourly between 1.2 M locations.

Policy questions:
•What mitigation strategies should be considered?
•How should we ease mitigations?
•How can we reopen schools/universities?

Epidemiology questions:
•How will COVID-19 affect populations?
•How does mobility affect transmission?
•How do behaviors affect transmission?
•What are place/occupation based risks?
•How do different age groups behave differently?

It’s not a matter of if, it’s a matter of when, so can we make this routine?

National Virtual Biotechnology Laboratory: Cross-DOE Lab Effort

Illinois Department of Public Health + Illinois Governor’s
COVID-19 Modeling Taskforce
●Four modeling groups met twice weekly, generating forecasts,
scenarios, analyses:
○University of Chicago (Sarah Cobey)
○Northwestern University (Jaline Gerardin)
○UIUC (Nigel Goldenfeld and Sergei Maslov)
○Argonne (Chick Macal and JO)
●Facilitated by the Chicago-based data analytics company Civis Analytics
○Provided a secure data platform, including line-list data from IDPH
○Civis manually aggregated and curated surveillance data
○Modeling groups provided results, which Civis aggregated
■Hospitalizations, Deaths, R(t), Hosp. Capacity
○Meetings with IDPH, Governor’s office
■Regular epidemic updates and hospital capacity forecasts
■Emerging topics, e.g., sentinel surveillance

Many messy data streams
[Charts: hospital occupancy (percent capacity) and ventilators in use]

https://www.anl.gov/dis/citycovid

[Scenario matrix: relaxing protective behaviors (0-20%) vs. increasing out-of-household activities (0-10%). The Alpha variant becomes the dominant strain between mid-April and June 1 for all scenarios; variant prevalence ranges from 4.5% to 7.6%.]

Lessons learned in the modeling and data space
▪Individual research groups generally worked independently to use HPC, data management, ML/AI, and automation methods to develop, calibrate, modify, verify, and validate their epidemiologic models.
–Large amounts of heroic, overlapping work
–Lacked robustness, security, scalability, or efficiency
▪Data were heterogeneous, changing, and incomplete, requiring complex integration across diverse and novel surveillance signals
–Created significant challenges for use in epidemiologic workflows
▪The Open Science Platform for Robust Epidemic analYsis (OSPREY) seeks to lower the barriers to, and to automate, epidemiologic model analyses on HPC and cloud resources

Automation for Public Health

Requirements for An Open Science Platform for Robust Epidemic Analysis

H. Pollack
IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems (ParSocial 2023), IPDPS Workshops, May 15-19, 2023, St. Petersburg, Florida, USA

OSPREY Goals and Requirements
▪Integrated, algorithm-driven HPC workflows (Goal 1)
–Coordinated multi-resource task execution
•Epi-modeling needs a range of tasks to be executed across distributed/heterogeneous ecosystem resources
–Robust, secure, and automated access to distributed HPC resources
•Wide range of access, authentication, and security protocols
–Scalable, fault-tolerant task execution
•Demand can vary dramatically over time
–Fast time-to-solution workflows
•Analyses need to provide actionable insights quickly
–Multi-language workflows
•Need inclusive, multi-language APIs
–Efficient wide-area data staging
•Uniform access to epi-data and model artifacts
▪Data ingestion, curation, and management (Goal 2)
–Data stream ingestion
•Move and track data sets from origin to their site of use
–Automated data curation
•Data analysis pipelines for data de-biasing, data integration, uncertainty quantification, and metadata and provenance tracking
–Managing algorithm and model artifacts
•Serve algorithm and model artifacts for, e.g., data assimilation
▪Shared development environment for rapid response and collaboration (Goal 3)
–Model and workflow sharing
•Portable workflows that run on federated HPC systems users have access to
–Model validation and publishing
•Post complete models with the data used to validate them for reproduction, extension, or scaling by others

Canonical automated epidemiological analyses

We’ve developed automated data ingestion flows

Data ingestion flow (started by a periodic trigger for data retrieval):
1. Fetch data from the remote source
2. Validate the input source
3. Transform the data
4. Store the transformed data in GCS
5. Update the metadata database

Ingestion and curation flows:
•monitor remote sources (frequent/sporadic updates)
•run arbitrary validation and transformation steps (schema updates & different schemas between sources)
•persist data in long-term storage (data updates and deletions)
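A minimal sketch of how such an ingestion flow might be registered with Globus Flows, assuming globus-sdk 3.x; the action URLs, parameter names, and input fields here are illustrative rather than OSPREY's actual flow definition:

# Sketch: registering a three-step ingestion flow with Globus Flows.
import globus_sdk

definition = {
    "StartAt": "FetchData",
    "States": {
        "FetchData": {          # 1. fetch data from the remote source
            "Type": "Action",
            "ActionUrl": "https://compute.actions.globus.org",
            "Parameters": {"endpoint.$": "$.input.compute_endpoint",
                           "function.$": "$.input.fetch_function"},
            "ResultPath": "$.FetchResult",
            "Next": "ValidateTransform",
        },
        "ValidateTransform": {  # 2-3. validation and transformation UDFs
            "Type": "Action",
            "ActionUrl": "https://compute.actions.globus.org",
            "Parameters": {"endpoint.$": "$.input.compute_endpoint",
                           "function.$": "$.input.transform_function"},
            "ResultPath": "$.TransformResult",
            "Next": "StoreOnGCS",
        },
        "StoreOnGCS": {         # 4. store the transformed data in GCS
            "Type": "Action",
            "ActionUrl": "https://transfer.actions.globus.org/transfer",
            "Parameters": {"source_endpoint.$": "$.input.staging_collection",
                           "destination_endpoint.$": "$.input.gcs_collection",
                           "DATA": [{"source_path.$": "$.input.src_path",
                                     "destination_path.$": "$.input.dst_path"}]},
            "ResultPath": "$.TransferResult",
            "End": True,
        },
    },
}

flows = globus_sdk.FlowsClient(authorizer=...)  # auth setup omitted
flow = flows.create_flow(title="OSPREY data ingestion",
                         definition=definition, input_schema={})
print(flow["id"])

The metadata-database update (step 5) would follow as an additional state; it is omitted here for brevity.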

We’ve developed “real-time” data analysis

Data analysis flow:
1. Get data location
2. Request data
3. Receive data
4. Run analysis
5. Add new analysis metadata, including provenance
6. Publish analysis

Event-based triggers enable analysis of data as new data are added.

Executed on: Polaris
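A sketch of how a client (or a trigger) might launch such an analysis flow, again assuming globus-sdk 3.x; the flow ID and input body are illustrative, not OSPREY's actual schema:

import globus_sdk

FLOW_ID = "..."  # ID returned when the analysis flow was registered
flow = globus_sdk.SpecificFlowClient(FLOW_ID, authorizer=...)  # auth omitted

run = flow.run_flow(
    body={"input": {
        "source_id": 42,            # dataset to analyze (hypothetical field)
        "compute_endpoint": "...",  # e.g., an endpoint on Polaris
        "analysis_function": "...", # a registered Globus Compute function
    }},
    label="hospitalization analysis",
    tags=["osprey", "analysis"],
)
print(run["run_id"])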

Scaling is achieved via decentralized computing and storage
•Externally managed compute and storage resources
•Internally managed metadata management and policies (web portal and API)
Globus’ hybrid model enables a “bring-your-own-resources” architecture, ensuring scalability as the community of users grows.

Periodic and event-based policies enable autonomous science

Periodic: at 00:00 on Monday the trigger condition is met and the data ingestion flow is triggered.

Event-based: once dependencies are met (hospitalization records, location data, and employment data have all been collected), the analysis flow is triggered, metadata are updated with the outputs, and further flows can be conditionally triggered based on previous results. Data portals surface the outputs.
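An illustrative stand-in for the periodic policy, shown as a local Python loop for clarity; in OSPREY the schedule is managed by the Globus Timers service rather than by client-side code, and the flow ID is a placeholder:

import time
import schedule  # third-party "schedule" package
import globus_sdk

INGESTION_FLOW_ID = "..."  # hypothetical flow UUID

def run_ingestion():
    flow = globus_sdk.SpecificFlowClient(INGESTION_FLOW_ID, authorizer=...)
    flow.run_flow(body={"input": {"source_url": "..."}},
                  label="weekly ingestion")

# Trigger condition: 00:00 every Monday.
schedule.every().monday.at("00:00").do(run_ingestion)
while True:
    schedule.run_pending()
    time.sleep(60)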

Making ingested data and model results FAIR
•Findable: Globus Search and Django web portal
•Accessible: Globus and HTTPS transfers enabled by GCS 5
•Interoperable: metadata server to standardize data representation
•Reusable: provenance capture stored in the metadata server
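A sketch of the "findable" piece: publishing dataset metadata to a Globus Search index, assuming globus-sdk 3.x; the index ID, subject scheme, and content fields are illustrative:

import globus_sdk

INDEX_ID = "..."  # hypothetical Globus Search index UUID
search = globus_sdk.SearchClient(authorizer=...)  # auth setup omitted

entry = {
    "ingest_type": "GMetaEntry",
    "ingest_data": {
        "subject": "osprey/sources/42/v3",   # stable dataset identifier
        "visible_to": ["public"],            # or a restricted principal list
        "content": {
            "name": "hospitalization records",
            "version": 3,
            "collected": "2024-05-27",
            "provenance": {"derived_from": [], "contributed_to": []},
        },
    },
}
search.ingest(INDEX_ID, entry)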

Privacy and Security
The decentralized resource approach gives users fine-grained control:
-Only users authorized to access storage/GCS 5 can retrieve the data
-Only users authorized to use a Globus Compute endpoint can run on it
-Data owners may modify permissions at their own discretion

In the future, the OSPREY API may restrict access to metadata to authorized users.
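A sketch of how a data owner might grant one collaborator read-only access to a dataset directory via the Globus Transfer ACL API (globus-sdk 3.x); the UUIDs and path are placeholders:

import globus_sdk

tc = globus_sdk.TransferClient(authorizer=...)  # auth setup omitted
rule = {
    "DATA_TYPE": "access",
    "principal_type": "identity",
    "principal": "COLLABORATOR_IDENTITY_UUID",
    "path": "/datasets/hospitalizations/",
    "permissions": "r",  # read-only; the owner can revoke at any time
}
tc.add_endpoint_acl_rule("COLLECTION_UUID", rule)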

Automation requires coordination through cloud services
•Compute and storage resources: Globus Connect Server and Globus Compute
•Automated triggers and actions: Globus Flows and Timers
•Arbitrary policies and actions: Globus Compute
•Metadata management: Globus Search and a metadata database
•Privacy and security: Globus Auth

EMEWS DB: On-Demand Distributed Analyses
A Jupyter Notebook, R Markdown Notebook, or other “client” code submits tasks through EQSQL (EMEWS Queues for SQL) to a Globus Compute endpoint on LCRC Improv@ANL; the R reticulate library is used to invoke Globus Compute via an R API.

EMEWS DB: Tracking Epidemics with Tasks and Futures
[Plots: simulated epidemic trajectories over days 20-33 and days 34-47]
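A minimal sketch of the tasks-and-futures pattern from such a notebook client, using the Globus Compute Executor; the endpoint UUID and model function are placeholders, and in EMEWS DB the task queue is mediated by EQSQL rather than invoked this directly:

from concurrent.futures import as_completed
from globus_compute_sdk import Executor

ENDPOINT_ID = "..."  # hypothetical Globus Compute endpoint UUID

def run_model(params):
    # Runs remotely on the endpoint; imports must live inside the function.
    from my_epi_model import simulate  # hypothetical model package
    return simulate(**params)

with Executor(endpoint_id=ENDPOINT_ID) as gce:
    futures = [gce.submit(run_model, {"seed": s, "r0": 2.5})
               for s in range(100)]
    for fut in as_completed(futures):  # consume results as tasks finish
        print(fut.result())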

Funding Acknowledgements
▪NSF 2200234: PIPP Phase I: Robust Epidemic Surveillance and
Modeling (RESUME)
▪DOE Office of Science: National Virtual Biotechnology
Laboratory, a consortium of DOE national laboratories focused
on response to COVID-19, with funding provided by the
Coronavirus CARES Act
▪DOE ASCR BRaVE: Calibration Techniques and a
High-Performance Computing Workflow for Disease Models
▪France and Chicago Collaborating in the Sciences (FACCTS):
Developing Next Generation Tools for Large-scale
Computational Science

OSPREY: Chicago Department of Public Health
“Health departments, clinicians, and policy makers alike are proud to partner
with researchers like you who are designing cutting-edge computational
approaches…”

“CDPH has appreciated your willingness to provide model-based analyses in
a timely manner during this crisis to help our understanding of the
implications of the COVID-19 epidemic to our city…”

“CDPH has used your modeling work to explain outbreak modeling to residents, government officials, and many other stakeholders… research that has been invaluable to our citywide planning efforts and integral for decision making.”

“Your work has the potential to make impacts well beyond COVID-19, to a
general rapid response scientific platform.”

Actions via Globus Flows
•Data ingestion
•User-defined analysis

Flow registration:
$ dsaas create [-h] -n NAME -u URL [-t TIMER] [-d DESCRIPTION] [-v VERIFIER] [-m MODIFIER] -e EMAIL
$ dsaas register [-h] [-e ENDPOINT_UUID] [-f FUNCTION_UUID] [-s SOURCE_ID=VERSION_ID [SOURCE_ID=VERSION_ID ...]] [-k KEY=VALUE [KEY=VALUE ...] | -c CONFIG] [-d DESCRIPTION]

Using DSaaS in your code
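A hypothetical sketch of a Python interface mirroring the CLI above; the module, class, and method names are invented for illustration and may differ from the actual DSaaS client:

from dsaas_client import Client  # hypothetical import

VERIFIER_UUID = "..."   # registered Globus Compute function UUIDs
MODIFIER_UUID = "..."
ENDPOINT_UUID = "..."
ANALYSIS_UUID = "..."

client = Client()

# Register a source (cf. `dsaas create`): fetch weekly, verify, transform.
source = client.create_source(
    name="idph-hospitalizations",
    url="https://example.org/idph/hospitalizations.csv",
    timer=7 * 24 * 3600,          # seconds between periodic fetches
    verifier=VERIFIER_UUID,
    modifier=MODIFIER_UUID,
    email="owner@example.org",    # notified on ingestion failures
)

# Register an analysis flow (cf. `dsaas register`) against that source.
client.register_flow(
    endpoint_uuid=ENDPOINT_UUID,
    function_uuid=ANALYSIS_UUID,
    sources={source.id: "latest"},
    description="weekly hospitalization analysis",
)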

Illinois Department of Public Health + Illinois Governor’s COVID-19 Modeling Taskforce (revisited)

▪This was the output (a 3-4 week lookahead) that, after many months of modeler and stakeholder interactions, ended up being most informative.
▪This had to do with the combination of:
–how uncertainties propagate for COVID-19 models
–how overflow decisions could be made

Capturing output provenance and data provenance

Output record schema:
  id              int
  function_id     int
  function_args   string
  description     string
  timer           int
  timer_job_id    string
  derived_from    SourceVersions
  contributed_to  Output

Flow: run analysis → add/update metadata → store outputs on GCS 5
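The same record expressed as a Python dataclass, a sketch assuming only the fields shown above; the types for derived_from and contributed_to are simplified to lists of IDs:

from dataclasses import dataclass, field

@dataclass
class Output:
    id: int
    function_id: int             # analysis function that produced the output
    function_args: str           # serialized arguments, for reproducibility
    description: str
    timer: int                   # re-run period, if periodically scheduled
    timer_job_id: str
    derived_from: list = field(default_factory=list)    # input SourceVersions
    contributed_to: list = field(default_factory=list)  # downstream Outputs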

Compute resources
Include Polaris, Chameleon, and Perlmutter


Policies for autonomous science

Types:
1. Timer-based policies
2. Lambda policies

Policy → enabled by:
•Periodic source retrieval: Globus Timers
•Flow execution when dependencies are met: Globus Compute (user lambda) + database update
•Flow execution determined by research output: Globus Compute (user lambda) + database update
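A sketch of a lambda policy under these assumptions: the user supplies a predicate, registered as a Globus Compute function, that is evaluated after each metadata update to decide whether the analysis flow should run. All names, IDs, and the metadata shape are illustrative:

import globus_sdk
from globus_compute_sdk import Executor

def dependencies_met(metadata: dict) -> bool:
    # Fire only once hospitalization, location, and employment data
    # have all been collected.
    needed = {"hospitalizations", "locations", "employment"}
    return needed <= set(metadata.get("collected_sources", []))

latest_metadata = {"collected_sources": ["hospitalizations", "locations",
                                         "employment"], "week": "2024-W21"}

# Schematic server side: evaluate the user lambda remotely, then trigger.
with Executor(endpoint_id="...") as gce:  # placeholder endpoint UUID
    if gce.submit(dependencies_met, latest_metadata).result():
        globus_sdk.SpecificFlowClient("...", authorizer=...).run_flow(
            body={"input": {"week": latest_metadata["week"]}})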

Inside the data ingestion flow

A periodic trigger initiates data retrieval; the flow then (1) fetches data from the remote source, (2) validates the input source, and (3) transforms the data, before storing the transformed data in GCS and updating the metadata database.

Requirements:
-Periodic fetching

Actions (AKA Globus Flows)
•Data ingestion: fetch data from the source → run validation/transformation UDFs → update the database + GCS collection
•User-defined analysis and/or policy

Note: data ingestion output publishing should be incorporated in the UDF, which is executed in a single-step flow. An example of how users can define and launch actions follows.
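A sketch of defining and launching an action under these assumptions: the UDF is registered with Globus Compute and then referenced from a flow run; the flow ID, endpoint UUID, and input field names are placeholders:

import globus_sdk
from globus_compute_sdk import Client

def transform(raw_rows):
    # User-defined transformation; per the note above, it would also
    # publish its own outputs when run as a single-step flow.
    return [r for r in raw_rows if r.get("region") == "chicago"]

gcc = Client()
function_uuid = gcc.register_function(transform)  # returns a function UUID

flow = globus_sdk.SpecificFlowClient("INGESTION_FLOW_ID", authorizer=...)
flow.run_flow(body={"input": {"transform_function": function_uuid,
                              "compute_endpoint": "ENDPOINT_UUID"}})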