Developing Distributed High-performance Computing Capabilities of an Open Science Platform for Robust Epidemic Analysis
About This Presentation
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and the scientific community’s broad response to it have forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Slide Content
Developing Distributed High-Performance Computing Capabilities of an Open Science Platform for Robust Epidemic Analysis (OSPREY)
Jonathan Ozik¹,²
Rapidly Evolving Policy and Epidemiology Questions
[Timeline, March-July 2020; CityCOVID model for Chicago: 2.7 M people (agents) move hourly between 1.2 M locations]
Policy:
• What mitigation strategies should be considered?
• How should we ease mitigations?
• How can we reopen schools/universities?
Epidemiology:
• How will COVID-19 affect populations?
• How does mobility affect transmission?
• How do behaviors affect transmission?
• What are place/occupation-based risks?
• How do different age groups behave differently?
It’s not a matter of if, it’s a matter of when, so can we make this routine?
National Virtual Biotechnology Laboratory: Cross-DOE Lab Effort
Illinois Department of Public Health + Illinois Governor’s COVID-19 Modeling Taskforce
● Four modeling groups met twice weekly, generating forecasts, scenarios, and analyses:
  ○ University of Chicago (Sarah Cobey)
  ○ Northwestern University (Jaline Gerardin)
  ○ UIUC (Nigel Goldenfeld and Sergei Maslov)
  ○ Argonne (Chick Macal and JO)
● Facilitated by the Chicago-based data analytics company Civis Analytics
  ○ Provided a secure data platform, including line-list data from IDPH
  ○ Civis manually aggregated and curated surveillance data
  ○ Modeling groups provided results; Civis aggregated them
    ■ Hospitalizations, deaths, R(t), hospital capacity
  ○ Meetings with IDPH and the Governor’s office
    ■ Regular epidemic updates and hospital capacity forecasts
    ■ Emerging topics, e.g., sentinel surveillance
Many messy data streams
[Charts: hospital occupancy (percent capacity) and ventilator usage]
https://www.anl.gov/dis/citycovid
[Scenario charts: relaxing protective behaviors (0-20%) and increasing out-of-household activities (0-10%); the Alpha variant becomes the dominant strain mid-April to June 1 for all scenarios, with variant prevalence from 4.5% to 7.6%]
Lessons learned in the modeling and data space
▪ Individual research groups generally worked independently to use HPC, data management, ML/AI, and automation methods to develop, calibrate, modify, verify, and validate their epidemiologic models.
  → Large amounts of heroic, overlapping work
  → Lacked robustness, security, scalability, or efficiency
▪ Data were heterogeneous, changing, and incomplete, requiring complex integration across diverse and novel surveillance signals
  → Created significant challenges for use in epidemiologic workflows
▪ The Open Science Platform for Robust Epidemic analYsis (OSPREY) seeks to lower the barriers to, and automate, epidemiologic model analyses on HPC and cloud resources
Automation for Public Health
Requirements for an Open Science Platform for Robust Epidemic Analysis
H. Pollack
IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems (ParSocial 2023), IPDPS Workshop, May 15-19, 2023, St. Petersburg, Florida, USA
OSPREY Goals and Requirements
▪ Integrated, algorithm-driven HPC workflows (Goal 1)
  – Coordinated multi-resource task execution
    • Epi-modeling needs a range of tasks to be executed across distributed/heterogeneous ecosystem resources
  – Robust, secure, and automated access to distributed HPC resources
    • Wide range of access, authentication, and security protocols
  – Scalable, fault-tolerant task execution
    • Demand can vary dramatically over time
  – Fast time-to-solution workflows
    • Analyses need to provide actionable insights quickly
  – Multi-language workflows
    • Need inclusive, multi-language APIs
  – Efficient wide-area data staging
    • Uniform access to epi-data and model artifacts
▪ Data ingestion, curation, and management (Goal 2)
  – Data stream ingestion
    • Move and track data sets from origin to their site of use
  – Automated data curation
    • Data analysis pipelines for data de-biasing, data integration, uncertainty quantification, and metadata and provenance tracking
  – Managing algorithm and model artifacts
    • Serve algorithm and model artifacts for, e.g., data assimilation
▪ Shared development environment for rapid response and collaboration (Goal 3)
  – Model and workflow sharing
    • Portable workflows that run on federated HPC systems users have access to
  – Model validation and publishing
    • Post complete models with the data used to validate them for reproduction, extension, or scaling by others
Canonical automated epidemiological analyses
We’ve developed automated data ingestion flows
Data ingestion flow (a sketch in code follows below):
Periodic trigger for data retrieval →
1. Fetch data from remote source
2. Input source validation
3. Data transformation
→ Store transformed data in GCS (Globus Connect Server)
→ Update metadata database
Ingestion and curation flows:
• monitor remote sources (frequent/sporadic updates)
• run arbitrary validation and transformation steps (schema updates and different schemas between sources)
• persist data in long-term storage (data updates and deletions)
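To make the ingestion pattern concrete, here is a minimal sketch of how such a flow could be expressed as a Globus Flows definition: a Globus Compute step for fetch/validate/transform, then a Transfer step into a GCS collection. The action URLs follow Globus Flows conventions, but every UUID, path, and the transform function are hypothetical placeholders, not the actual OSPREY flow.

```python
# Sketch of an ingestion flow definition for the Globus Flows service.
definition = {
    "StartAt": "Transform",
    "States": {
        "Transform": {
            # Run the fetch/validate/transform step as a Globus Compute function
            "Type": "Action",
            "ActionUrl": "https://compute.actions.globus.org",
            "Parameters": {
                "endpoint": "COMPUTE_ENDPOINT_UUID",    # placeholder
                "function": "TRANSFORM_FUNCTION_UUID",  # placeholder
                "kwargs": {"source_url.$": "$.source_url"},
            },
            "ResultPath": "$.transform",
            "Next": "Store",
        },
        "Store": {
            # Move the transformed data into long-term storage on a collection
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {
                "source_endpoint_id": "STAGING_COLLECTION_UUID",       # placeholder
                "destination_endpoint_id": "ARCHIVE_COLLECTION_UUID",  # placeholder
                "transfer_items": [{
                    "source_path.$": "$.transform.details.result_path",
                    "destination_path": "/ingested/",  # placeholder path
                }],
            },
            "End": True,
        },
    },
}

# Registration requires an authenticated globus-sdk client, e.g.:
# flows = globus_sdk.FlowsClient(authorizer=authorizer)
# flow = flows.create_flow("OSPREY ingestion (sketch)", definition, input_schema={})
```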
We’ve developed “real-time” data analysis
Data analysis flow (a submission sketch follows below):
1. Get data location
2. Request data
3. Receive data
4. Run analysis
5. Add new analysis metadata, including provenance
6. Publish analysis
Event-based triggers enable analysis of data as new data is added
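As an illustration of the run-analysis step, here is a minimal sketch of submitting a function to a Globus Compute endpoint with the globus-compute-sdk Executor. The endpoint UUID, data URL, and the run_analysis body are hypothetical placeholders.

```python
from globus_compute_sdk import Executor

def run_analysis(data_url: str) -> dict:
    # Runs remotely on the endpoint: fetch newly ingested data and summarize it.
    import json
    import urllib.request
    with urllib.request.urlopen(data_url) as f:
        records = json.load(f)
    return {"n_records": len(records)}

with Executor(endpoint_id="ANALYSIS_ENDPOINT_UUID") as gce:  # placeholder UUID
    future = gce.submit(run_analysis, "https://example.org/new-data.json")
    print(future.result())  # blocks until the endpoint returns the result
```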
Executed on: Polaris
Scaling is achieved via decentralized computing and storage
• Externally managed compute and storage resources
• Internally managed metadata management and policies (metadata web portal and API)
Globus’ hybrid model enables a “bring-your-own-resources” architecture, ensuring scalability as the community of users grows
Periodic and event-based policies enable autonomous science
• Periodic: at 00:00 Monday the trigger condition is met and the data ingestion flow is triggered (see the scheduling sketch below)
• Event-based: once dependencies are met (hospitalization records, location data, and employment data have all been collected), the analysis flow is triggered and the metadata is updated with the outputs
• Flows can also be conditionally triggered based on previous results, feeding data portals
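For intuition, here is a local, plain-Python stand-in for the periodic policy semantics; in OSPREY the schedule itself is managed by Globus Timers, and the action would launch the ingestion flow rather than print.

```python
# Standard-library stand-in for the "00:00 Monday" periodic trigger.
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)
WEEK_SECONDS = 7 * 24 * 3600

def launch_ingestion_flow() -> None:
    print("trigger condition met: launching data ingestion flow")
    # Placeholder for a real run_flow(...) call against the Flows service.
    scheduler.enter(WEEK_SECONDS, 1, launch_ingestion_flow)  # reschedule next week

scheduler.enter(0, 1, launch_ingestion_flow)  # first firing, due immediately
scheduler.run(blocking=False)  # process due events once, for illustration
```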
Making ingested data and model results FAIR
• Findable: Globus Search and a Django web portal (a Search ingest sketch follows below)
• Accessible: Globus and HTTPS transfers enabled by GCS 5
• Interoperable: metadata server to standardize data representation
• Reusable: provenance capture stored in the metadata server
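A minimal sketch of the Findable piece: publishing an analysis record to a Globus Search index with globus-sdk. The token, index UUID, subject URI, and metadata content are all hypothetical placeholders.

```python
import globus_sdk

sc = globus_sdk.SearchClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("SEARCH_TOKEN")  # placeholder token
)
entry = {
    "ingest_type": "GMetaEntry",
    "ingest_data": {
        "subject": "osprey://analysis/2021-05-03",  # hypothetical subject URI
        "visible_to": ["public"],
        "content": {
            "title": "weekly hospitalization analysis",
            "provenance": {"derived_from": ["hospitalizations:v12"]},
        },
    },
}
sc.ingest("SEARCH_INDEX_UUID", entry)  # placeholder index id
```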
Privacy and Security
The decentralized resource approach gives users fine-grained control (see the ACL sketch below):
- Only users authorized to access storage/GCS 5 can retrieve the data
- Only users authorized to use a Globus Compute endpoint can run on that endpoint
- Data owners may modify permissions at their own discretion
In the future, the OSPREY API may restrict access to metadata to authorized users
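A minimal sketch of data-owner-managed permissions: granting a single identity read access to a path on a collection through the Globus Transfer API. The token and all UUIDs are placeholders.

```python
import globus_sdk

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("TRANSFER_TOKEN")  # placeholder token
)
rule = {
    "DATA_TYPE": "access",
    "principal_type": "identity",
    "principal": "USER_IDENTITY_UUID",  # placeholder: the authorized user
    "path": "/ingested/",               # placeholder: chosen by the data owner
    "permissions": "r",                 # read-only access
}
tc.add_endpoint_acl_rule("COLLECTION_UUID", rule)  # placeholder collection id
```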
Automation requires coordination through cloud services
• Compute and storage resources: Globus Connect Server and Globus Compute
• Automated triggers and actions: Globus Flows and Timers
• Arbitrary policies and actions: Globus Compute
• Metadata management: Globus Search and a metadata database
• Privacy and security: Globus Auth
EMEWS DB: On-Demand Distributed Analyses
A Jupyter notebook, R Markdown notebook, or other “client” code submits tasks through EQSQL (EMEWS Queues for SQL) for execution on a Globus Compute endpoint on LCRC Improv@ANL; R clients use the reticulate library to invoke Globus Compute through its Python API. (A sketch of the queue-and-futures pattern follows below.)
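Here is a minimal local sketch of the EQ/SQL idea: client code enqueues tasks in a SQL table and treats the row id as a future, while workers pop tasks and write results back. This illustrates the pattern only; it is not the actual EMEWS DB API, and the table and function names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, payload TEXT, "
    "status TEXT DEFAULT 'queued', result TEXT)"
)

def submit(payload: dict) -> int:
    """Client side: enqueue a task and return its id (a 'future' handle)."""
    cur = conn.execute("INSERT INTO tasks (payload) VALUES (?)",
                       (json.dumps(payload),))
    conn.commit()
    return cur.lastrowid

def work_one() -> None:
    """Worker side (e.g., on an HPC node): pop one task, run it, store the result."""
    row = conn.execute(
        "SELECT id, payload FROM tasks WHERE status='queued' LIMIT 1").fetchone()
    if row is None:
        return
    task_id, payload = row[0], json.loads(row[1])
    result = {"doubled": payload["x"] * 2}  # stand-in for a model run
    conn.execute("UPDATE tasks SET status='done', result=? WHERE id=?",
                 (json.dumps(result), task_id))
    conn.commit()

def result_of(task_id: int):
    """Client side: resolve the future once the task is done."""
    row = conn.execute("SELECT result FROM tasks WHERE id=? AND status='done'",
                       (task_id,)).fetchone()
    return json.loads(row[0]) if row else None

tid = submit({"x": 21})
work_one()
print(result_of(tid))  # {'doubled': 42}
```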
EMEWS DB: Tracking Epidemics with Tasks and Futures
[Panels: Day 20-33 and Day 34-47]
Funding Acknowledgements
▪ NSF 2200234: PIPP Phase I: Robust Epidemic Surveillance and Modeling (RESUME)
▪ DOE Office of Science: National Virtual Biotechnology Laboratory, a consortium of DOE national laboratories focused on response to COVID-19, with funding provided by the Coronavirus CARES Act
▪ DOE ASCR BRaVE: Calibration Techniques and a High-Performance Computing Workflow for Disease Models
▪ France and Chicago Collaborating in the Sciences (FACCTS): Developing Next Generation Tools for Large-scale Computational Science
OSPREY: Chicago Department of Public Health
“Health departments, clinicians, and policy makers alike are proud to partner with researchers like you who are designing cutting-edge computational approaches…”
“CDPH has appreciated your willingness to provide model-based analyses in a timely manner during this crisis to help our understanding of the implications of the COVID-19 epidemic to our city…”
“CDPH has used your modeling work to explain outbreak modeling to residents, government officials, and many other stakeholders… research that has been invaluable to our citywide planning efforts and integral for decision making.”
“Your work has the potential to make impacts well beyond COVID-19, to a general rapid response scientific platform.”
Illinois Department of Public Health + Illinois Governor’s COVID-19 Modeling Taskforce (recap of the earlier slide)
▪ This was the output (a 3-4 week lookahead) that, after many months of modeler and stakeholder interactions, ended up being the most informative.
▪ This had to do with the combination of:
  – how uncertainties propagate for COVID-19 models
  – how overflow decisions could be made
Capturing output provenance and data provenance
Metadata record schema (a dataclass sketch follows below):
  id: int
  function_id: int
  function_args: string
  description: string
  timer: int
  timer_job_id: string
  derived_from: SourceVersions
  contributed_to: Output
Flow: run analysis → add/update metadata → store outputs on GCS 5
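A minimal sketch of the provenance record implied by the schema above, using a Python dataclass. The field semantics in the comments are assumptions, and the derived_from/contributed_to relations are simplified to lists of version strings.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProvenanceRecord:
    id: int
    function_id: int    # which registered analysis function ran
    function_args: str  # serialized arguments, for reproducibility
    description: str
    timer: int          # id of the timer that triggered the run, if any
    timer_job_id: str
    derived_from: List[str] = field(default_factory=list)    # source data versions
    contributed_to: List[str] = field(default_factory=list)  # downstream outputs

record = ProvenanceRecord(
    id=1,
    function_id=42,
    function_args='{"region": "chicago"}',
    description="weekly hospitalization analysis",
    timer=7,
    timer_job_id="TIMER_JOB_UUID",  # placeholder
    derived_from=["hospitalizations:v12"],
    contributed_to=["forecast:2021-05-03"],
)
print(record.derived_from)
```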
Compute resources include Polaris, Chameleon, and Perlmutter
Policies for autonomous science
Types:
1. Timer-based policies
2. Lambda policies
Policy: enabled by
• Periodic source retrieval: Globus Timers
• Flow execution when dependencies are met: Globus Compute (user lambda) + database update
• Flow execution determined by research output: Globus Compute (user lambda) + database update
(A lambda-policy sketch follows below.)
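A minimal sketch of a user lambda policy: a small function, which would be registered with Globus Compute, evaluated after each database update to decide whether a follow-on flow fires. The data keys and the 0.8 occupancy threshold are assumed for illustration.

```python
def hospitalization_policy(latest: dict) -> dict:
    """Return a decision record; a surrounding flow launches the
    follow-on analysis when 'trigger' is True."""
    # Dependency rule: all required sources have been collected...
    deps_met = all(latest.get(k) for k in
                   ("hospitalizations", "locations", "employment"))
    # ...and a result rule: re-run only if ICU occupancy crossed a threshold.
    over_threshold = latest.get("icu_occupancy", 0.0) > 0.8  # assumed threshold
    return {"trigger": deps_met and over_threshold}

# Local illustration of the decision logic:
print(hospitalization_policy({
    "hospitalizations": "v12", "locations": "v3",
    "employment": "v5", "icu_occupancy": 0.85,
}))  # {'trigger': True}
```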
A look inside the data ingestion flow
Periodic trigger for data retrieval →
1. Fetch data from remote source
2. Input source validation
3. Data transformation
→ Store transformed data in GCS
→ Update metadata database
Requirements:
- Periodic fetching
Actions (AKA Globus Flows)
Data ingestion flow: fetch data from source → run validation/transformation UDFs → update database + GCS collection
The same action pattern supports user-defined analysis and/or policy flows.
Note: data ingestion output publishing should be incorporated in a UDF executed as a single-step flow.
An example of how users can define and launch actions is sketched below.
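As one example of launching a registered action, here is a minimal sketch using globus-sdk's SpecificFlowClient; the flow UUID, token, and input body are placeholders, not the actual OSPREY flow.

```python
import globus_sdk

flow = globus_sdk.SpecificFlowClient(
    "FLOW_UUID",  # placeholder: the registered ingestion flow
    authorizer=globus_sdk.AccessTokenAuthorizer("FLOW_SCOPED_TOKEN"),  # placeholder
)
run = flow.run_flow(
    body={"source_url": "https://example.org/hospitalizations.csv"},  # placeholder
    label="ad hoc ingestion run",
)
print(run["run_id"], run["status"])
```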