The Department of Energy's Integrated Research Infrastructure (IRI)
About This Presentation
We will provide an overview of DOE’s IRI initiative as it moves into early implementation, what drives the IRI vision, and the role of DOE in the larger national research ecosystem.
Slide Content
The Department of Energy’s
Integrated Research Infrastructure (IRI)
GlobusWorld, Chicago, Illinois
Ben Brown, Director, Facilities Division, ASCR
May 7, 2024
The imperative of integration
DOE’s Research Infrastructure
Integration … what does it really mean?
IRI: is it just the Grid, or something more?
Researchers and their data
IRI = Integrated Research Infrastructure
Computing = high performance computing, data, and networking
Our Mission: To deliver scientific discoveries and major scientific tools to transform our understanding of nature and advance the energy, economic, and national security of the United States.
More than 34,000 researchers supported at more than 300 institutions and 17 DOE national laboratories
More than 39,500 users of 28 Office of Science scientific user facilities
$8.24B FY 2024 enacted budget
Stewards 10 of the 17 DOE national laboratories
DOE is a system of independent national laboratories
“It is one of the happy incidents of the federal system that a single courageous state may, if its citizens choose, serve as a laboratory; and try novel social and economic experiments without risk to the rest of the country.”
— Justice Louis D. Brandeis, 1932
DOE is a laboratory of laboratories.
Integration challenges us to meld ideas, practices, and even cultures.
ASCR Facilities: Major systems
The ASCR Facilities enterprise thrives in partnerships that
accelerate discovery and innovation.
Argonne Leadership Computing Facility
Energy Sciences Network
Oak Ridge Leadership Computing Facility
National Energy Research Scientific Computing Center
High Performance Data Facility
Integration origins: The Office of Energy Research created ESnet and NERSC to democratize the National Laboratories’ access to HPC ($41.6M today).
The ASCR Facilities are Scientific User Facilities
FY 2023: 28 scientific user facilities, >37,000 users:
OLCF, ALCF, NERSC, ESnet, EMSL, ARM, JGI, SNS, HFIR, ALS, APS, LCLS, NSLS-II, SSRL, CFN, CINT, CNM, CNMS, TMF, DIII-D, NSTX-U, FACET, ATF, Fermilab AC, CEBAF, ATLAS, RHIC, FRIB
IRI
LCLS-II First Light
September 13, 2023: First light of LCLS-II at SLAC National Accelerator Laboratory
The double meaning of IRI:
Integrated Research Infrastructure (research infrastructure that is integrated)
Integrated Research Infrastructure (infrastructure for integrated research)
Linking distributed resources is becoming paramount to
modern collaborative science, to integrated science.
Accelerating discovery & innovation
Democratizing access
Drawing new talent
Advancing open science
The challenges of our time call upon DOE and its national
laboratories to be an open innovation ecosystem.
DOE’s Integrated Research Infrastructure (IRI) Vision: To empower researchers to meld DOE’s world-class research tools, infrastructure, and user facilities seamlessly and securely in novel ways to radically accelerate discovery and innovation.
[Diagram: researchers at the center of the IRI ecosystem, surrounded by Experimental and Observational User Facilities, Advanced Computing, Advanced Networking, High Performance Data Facility, Cloud Computing, Local Campus Computing, Edge Sensors, Computing Testbeds, AI Tools, Digital Twins, Software and Applications, Data Management, Data Repositories, and PuRE Data Assets.]
New modes of integrated science: AI-enabled insight from integrating vast data sources; rapid data analysis and steering of experiments; novel workflows using multiple user facilities.
The IRI Vision:
It’s about empowering people.
It’s about data.
The IRI Architecture Blueprint Activity
established a framework for serious planning
The IRI Blueprint Activity created a framework for IRI implementation
IRI Science Patterns (3)
Time-sensitive pattern has urgency, requiring real-time or end-to-end performance with high reliability, e.g., for timely decision-making, experiment steering, and virtual proximity.
Data integration-intensive pattern requires combining and analyzing data from multiple sources, e.g., sites, experiments, and/or computational runs.
Long-term campaign pattern requires sustained access to resources over a long period to accomplish a well-defined objective.
IRI Practice Areas (6)
User experience practice will ensure relentless attention to user perspectives and needs through requirements gathering, user-centric (co-)design, continuous feedback, and other means.
Resource co-operations practice is focused on creating new modes of cooperation, collaboration, co-scheduling, and joint planning across facilities and DOE programs.
Cybersecurity and federated access practice is focused on creating novel solutions that enable seamless scientific collaboration within a secure and trusted IRI ecosystem.
Workflows, interfaces, and automation practice is focused on creating novel solutions that facilitate the dynamic assembly of components across facilities into end-to-end IRI pipelines.
Scientific data life cycle practice is focused on ensuring that users can manage their data and metadata across facilities from inception to curation, archiving, dissemination, and publication.
Portable/scalable solutions practice is focused on ensuring that transitions can be made across heterogeneous facilities (portability) and from smaller to larger resources (scalability).
The Blueprint Activity convened over 150 DOE national laboratory experts from all 28 SC user facilities across 13 national laboratories to consider the technological, policy, and sociological challenges to implementing IRI.
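To make the time-sensitive pattern and the workflows/automation practice concrete, here is a minimal sketch of an automated end-to-end pipeline. Every name in it (acquire_shot, transfer_to_hpc, run_analysis, report_back) is a hypothetical stand-in for illustration, not part of any IRI or facility API.

```python
# Illustrative sketch only: a minimal end-to-end pipeline in the shape of the
# time-sensitive pattern. Every name here is a hypothetical stand-in.
from dataclasses import dataclass

@dataclass
class ShotData:
    shot_id: int
    payload: bytes

def acquire_shot(shot_id: int) -> ShotData:
    """Stand-in for data capture at an experimental facility."""
    return ShotData(shot_id, b"...detector frames...")

def transfer_to_hpc(data: ShotData) -> str:
    """Stand-in for a managed transfer (e.g., over ESnet) to a compute site."""
    return f"/scratch/shots/{data.shot_id}.h5"

def run_analysis(path: str) -> dict:
    """Stand-in for an automated analysis job on an HPC system."""
    return {"input": path, "status": "ok"}

def report_back(shot_id: int, result: dict) -> None:
    """Stand-in for returning results to the control room between shots."""
    print(f"shot {shot_id}: {result}")

# The whole chain must finish inside the inter-shot window, so every stage is
# automated and composed end to end with no human in the loop.
for shot_id in range(1, 4):
    result = run_analysis(transfer_to_hpc(acquire_shot(shot_id)))
    report_back(shot_id, result)
```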
Cross-facility partnerships are yielding early results
LBNL’s Superfacility project, ORNL’s INTERSECT project, ANL’s NEXUS project, and several other collaborations are active incubators for IRI design patterns. Here are a few cherry-picked highlights from the SC23 conference (November 12-17, 2023, in Denver):
FES: DIII-D user facility
• Has worked with ALCF to run post-shot analysis on Polaris at 16X the prior resolution and completed the analysis between shots, allowing the analysis result to be considered with every shot instead of every other shot.
• Has worked with NERSC to automate rapid plasma state reconstruction on Perlmutter. Previously these reconstructions were handcrafted, with 4,000 produced in the 15 years between 2008 and 2022; over 20,000 automated reconstructions were created in the first 6 months.
BES: x-ray light sources
• LCLS is streaming data to NERSC (Perlmutter) and OLCF (Frontier) via ESnet to achieve wall-clock speedups of data analysis; what would have taken ~30 minutes at LCLS is now reduced to 5 minutes, fast enough to make adjustments between experiments.
• APS has worked with ALCF and has multiple beamlines running analyses in production on Polaris: X-ray Photon Correlation Spectroscopy, Laue Depth reconstructions, X-ray Ptychography, and High-Energy Diffraction Microscopy.
BES: electron microscopy
• The National Center for Electron Microscopy at the Molecular Foundry regularly streams data from their high-resolution electron microscope directly into NERSC's compute nodes for immediate processing; this process is 14x faster than previous file-transfer methods, with a more consistent transfer time.
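Globus, noted later in this deck as an important IRI partner, is one technology used for this kind of automated cross-facility data movement. A minimal sketch with the Globus Python SDK might look like the following; the client ID, collection UUIDs, and paths are placeholders, and this is not the specific tooling any of the projects above uses.

```python
# Illustrative sketch of cross-facility data movement with the Globus Python
# SDK (pip install globus-sdk). Client ID, UUIDs, and paths are placeholders.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"  # placeholder: a registered Globus native app

# Interactive native-app login to obtain a transfer token.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(requested_scopes=globus_sdk.scopes.TransferScopes.all)
print("Log in at:", auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input("Auth code: "))
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

SOURCE_COLLECTION = "aaaaaaaa-0000-0000-0000-000000000000"  # placeholder UUID
DEST_COLLECTION = "bbbbbbbb-0000-0000-0000-000000000000"    # placeholder UUID

# Describe the transfer: recursively move one run's output to the compute site.
task_doc = globus_sdk.TransferData(
    source_endpoint=SOURCE_COLLECTION,
    destination_endpoint=DEST_COLLECTION,
    label="detector run to HPC scratch",
)
task_doc.add_item("/detector/run42/", "/scratch/run42/", recursive=True)

# Submission is asynchronous: the service manages the transfer and retries.
task = tc.submit_transfer(task_doc)
print("Submitted transfer, task id:", task["task_id"])
```

Because submission is asynchronous and the service handles retries, the same pattern can run unattended inside a pipeline, with status polled later by task ID.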
IRI Program value propositions (authored by the SC IRI Coordination Group)
For the taxpayer, for all of us:
Achieve greater productivity and avoid duplication of effort.
For the researcher:
Achieve a transformational reduction in time to insight and in complexity.
For program/RI/institutional leaders:
Achieve greater effectiveness and efficiency in coordinating efforts;
Achieve more nimble solutions than would be possible alone;
Gain leverage with partners who possess like requirements;
Avoid single points of failure; and
Gain access to expertise and shared experience.
Timeline of IRI Program Development (Jan 2020 – Jan 2024), from vision to strategy to implementation:
FY 2021 President’s Budget Request includes the Integrated Computation and Data Infrastructure Initiative
ASCR IRI Task Force launch
ASCR IRI Task Force report
SC IRI Blueprint Activity launch
IRI Blueprint Activity results
FY 2024 PBR advances IRI and the High Performance Data Facility
HPDF selection
IRI Program Development
Standup of the IRI Program is a DOE FY24-25 Agency Priority Goal.
ASCR is implementing IRI through these four major elements:
1. Invest in IRI foundational infrastructure
2. Stand up the IRI Program governance and FY24 workstreams
3. Bring IRI projects into formal coordination
4. Deploy an IRI Science Testbed across the ASCR Facilities
These are all connected. These are each essential.
IRI Program launch is a DOE FY24-25 Agency Priority Goal.
HPDF: A Brief Overview
• First-of-its-kind DOE Office of Science user facility
• Distributed operations model will be essential to long-term success and required performance levels
• Project structure integrated with JLab and LBNL staff
HPDF: Meeting the Greatest Needs
The DOE envisions a revolutionary ecosystem – the Integrated Research Infrastructure – to deliver seamless, secure interoperability across National Laboratory facilities.
The 2023 IRI Architecture Blueprint Activity identified three broad science patterns that demand research infrastructure interoperability:
• Time-sensitive patterns
• Data-integration-intensive patterns
• Long-term campaign patterns
HPDF will enable analysis, preservation, and accessibility of the staggering amounts of experimental data produced by SC facilities.
Our mission: To enable and accelerate scientific discovery by delivering state-of-the-art data management infrastructure, capabilities, and tools.
Data science requires curated and annotated data that adheres to
FAIR principles, and data reuse will be an HPDF metric. Office of
Scientific and Technical Information services will complement HPDF
to provide full life cycle coverage.
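To illustrate what FAIR-aligned metadata capture might involve in practice, here is a minimal sketch of a dataset record. The field set is an illustrative assumption chosen to mirror the four FAIR principles, not an HPDF or OSTI schema.

```python
# Illustrative sketch of a FAIR-aligned dataset record. Field names are
# assumptions for illustration, not an actual HPDF or OSTI metadata schema.
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    # Findable: a persistent identifier and rich, searchable description
    doi: str
    title: str
    keywords: list[str]
    # Accessible: where and how the data can be retrieved
    access_url: str
    access_protocol: str = "https"
    # Interoperable: community formats and vocabularies
    data_format: str = "HDF5"
    metadata_standard: str = "DataCite"
    # Reusable: license and provenance so others can reuse with confidence
    license: str = "CC-BY-4.0"
    provenance: list[str] = field(default_factory=list)

record = DatasetRecord(
    doi="10.xxxx/placeholder",
    title="Example beamline dataset",
    keywords=["ptychography", "APS"],
    access_url="https://data.example.gov/run42",
    provenance=["acquired at beamline", "reduced on HPC"],
)
print(record.doi, record.license)
```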
Flexible & Full Life Cycle Coverage
• Management – A dynamic and scalable data management infrastructure integrated with the DOE computing ecosystem
• Capture – Dynamically allocatable data storage and edge computing at the point of generation
• Staging – Dynamic placement of data in proximity to appropriate computing for reduction, analysis, and processing
• Archiving – Extreme-scale distributed archiving and cataloging of data with FAIR principles – findability, accessibility, interoperability, and reusability
• Processing – Resources for workflow and automation for processing and analyses of data at scale
[Diagram: the HPDF data life cycle. User Facilities and Science Gateways feed a Raw & Derived Data Hub (tiered storage and compute, connected by ESnet), around which data flows: Acquire and Prepare (ingest experimental, observational, and reprocessed data using standard APIs); Clean and Process (scalable scientific and AI/ML workflows); Transfer (move and manage data and dynamic data streams); Analyze, Share, Refine, Release; Preserve (long-term data curation and archival); Publish (fulfill FAIR principles for scientific data).]
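One way to read the life cycle above is as a small state machine over stages. The sketch below encodes it that way; the transition table is an assumption for illustration, not an HPDF specification.

```python
# Illustrative sketch of the life cycle as a tiny state machine. Stage names
# mirror the diagram; the transition table is an illustrative assumption.
from enum import Enum, auto

class Stage(Enum):
    ACQUIRE_AND_PREPARE = auto()
    CLEAN_AND_PROCESS = auto()
    ANALYZE = auto()
    PRESERVE = auto()
    PUBLISH = auto()

# Allowed transitions: data can loop between processing and analysis
# (refine/release) before being preserved and published.
TRANSITIONS = {
    Stage.ACQUIRE_AND_PREPARE: {Stage.CLEAN_AND_PROCESS},
    Stage.CLEAN_AND_PROCESS: {Stage.ANALYZE},
    Stage.ANALYZE: {Stage.CLEAN_AND_PROCESS, Stage.PRESERVE},
    Stage.PRESERVE: {Stage.PUBLISH},
    Stage.PUBLISH: set(),
}

def advance(current: Stage, nxt: Stage) -> Stage:
    """Move a dataset to the next stage, refusing undefined transitions."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.name} to {nxt.name}")
    return nxt

stage = Stage.ACQUIRE_AND_PREPARE
for nxt in (Stage.CLEAN_AND_PROCESS, Stage.ANALYZE, Stage.PRESERVE, Stage.PUBLISH):
    stage = advance(stage, nxt)
    print("now in:", stage.name)
```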
Hub & Spoke Architecture: Seamless Service
• HPDF distributed infrastructure will be designed to maximize planned availability and resilience
• Resources will be optimized for data processing and analysis
• Partnering with spoke sites will provide seamless data life cycle services to scientific users worldwide
• Working with IRI ensures a secure, high-performance mesh data fabric that enables data and workloads to flow freely among the hub, spokes, and HPC facilities
• Pilot activities and partnerships will help refine the design as hub software and hardware technology evolve
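As a loose illustration of the hub-and-spoke idea, the sketch below routes a dataset to a suitable spoke. The Spoke attributes and the selection rule are assumptions made up for illustration, not HPDF design decisions.

```python
# Illustrative sketch: a hub coordinates placement, staging data to whichever
# spoke best fits the workload. Fields and selection rule are assumptions.
from dataclasses import dataclass

@dataclass
class Spoke:
    name: str
    free_tb: float   # available storage in terabytes
    has_gpu: bool    # local analysis capability

def choose_spoke(spokes: list[Spoke], size_tb: float, needs_gpu: bool) -> Spoke:
    """Pick a spoke with room (and a GPU if the analysis needs one)."""
    candidates = [s for s in spokes
                  if s.free_tb >= size_tb and (s.has_gpu or not needs_gpu)]
    if not candidates:
        raise RuntimeError("no spoke can host this dataset; fall back to hub")
    # Prefer the spoke with the most headroom, to balance load.
    return max(candidates, key=lambda s: s.free_tb)

spokes = [Spoke("spoke-a", free_tb=120.0, has_gpu=True),
          Spoke("spoke-b", free_tb=40.0, has_gpu=False)]
print(choose_spoke(spokes, size_tb=30.0, needs_gpu=True).name)
```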
Facets of Spoke Design
Community Structure – How strongly governed or united a community is around a set of policies or goals for its data products
Organizational Structure – How the organization or institution is designed to support its user community’s full data life cycle
Funding Model(s) – How the spoke is funded, for what lifespan, and how end users are supported (sub-awards, allocations, etc.) to leverage its resources
Size – The size of a particular spoke will be shaped by the confluence of anticipated user base, data volume and velocity, and resources (staff, compute)
Data/Compute Resources – The types and extent of technical functionality a spoke supports for its user community
Quotes from participants at one of the last Exascale Computing Project
meetings, reflecting on the journey (with some paraphrasing)
“Integration is not optional anymore.”
“ECP was the time to challenge assumptions … [and embark on a] holistic rethinking and
restructuring.”
“Be technically ambitious.”
“Dare to try, no matter what, because business as usual is almost guaranteed to fail.”
“You have to build not just the software, but also the communities around the software.”
“Invest seriously in automation.”
“For a scientist, code is not their main focus; it is a tool…. But nobody wants their code to
break.”
“In order to make progress, we developers have to be able to drop [support for old things].”
Summing up where we stand today with IRI
IRI is envisioned as a long-term program, with a formal governance structure, to frame rich
partnerships in a seamless computing and data ecosystem for researchers.
The ASCR Facilities (ALCF, ESnet, NERSC, OLCF) are nucleating the IRI governance. Globus is
an important partner.
HPDF is a major new project to create a DOE User Facility with a budget of $300M; the project is just getting under way.
In a deep sense, IRI is about creating – but not inventing from scratch –
a software (and middleware) ecosystem.
Leverage and lessons learned from well-executed and well-stewarded software and
middleware (like Globus!) are essential to developing a robust IRI.
Software is infrastructure!
The dawn of the AI era. The dawn of the nuclear era.
Understand risk. Harness potential.
Eras of DOE: the era of integrated research is now