Climate Science Flows - Enabling Petabyte-Scale Climate Analysis with the Earth System Grid Federation and Globus Flows
globusonline
8 slides
May 29, 2024
About This Presentation
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located near large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed predefined data workflows that can be run on demand and apply a range of data-reduction and data-analysis operations to the large ESGF data archives, transferring only the resulting analysis products (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Slide Content
CLIMATE SCIENCE FLOWS WITH THE EARTH SYSTEM GRID FEDERATION
ENABLING DATA-PROXIMATE CLIMATE ANALYSIS
MAXWELL GROVER
Atmospheric Data Scientist
Argonne National Laboratory
BENOIT COTE
Data Services Software Developer
Argonne National Laboratory
NATHAN COLLIER
Computational Earth System Scientist
Oak Ridge National Laboratory
Wednesday, May 8, 2024
GlobusWorld, Chicago, Illinois
AN INTRODUCTION TO THE EARTH SYSTEM GRID FEDERATION
A visual representation of the Earth system, from NOAA GFDL
WHAT DATASETS ARE WE WORKING WITH?
An Introduction to Earth System Models
Earth System Model components, from the DOE Energy Exascale Earth System Model
HOW BIG IS THE DATA?
Petabyte-scale Datasets
• CMIP5 totals >5 PB (including replicas)
• CMIP6 totals >25 PB (including replicas)
• CMIP7 is expected to have more high-resolution output and ensembles, totaling ~100 PB
• ESGF2-US will expand Federation holdings by adding other Earth science data projects for AI/ML, large ensembles, etc.
• Great use case for data-proximate computing!
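To make the data-proximate argument concrete, here is a back-of-the-envelope transfer-time estimate. The sustained 10 Gbit/s link, decimal petabytes, the 1 MB result size, and the helper name `transfer_seconds` are illustrative assumptions, not figures from the talk; real transfers also pay protocol and storage overheads.

```python
# Back-of-the-envelope transfer times. ASSUMPTIONS (hypothetical, not from
# the talk): a sustained 10 Gbit/s link, decimal units (1 PB = 10**15 bytes),
# and no protocol or storage overhead.
SECONDS_PER_DAY = 86_400
PB = 10**15
MB = 10**6
LINK_BPS = 10e9  # hypothetical sustained link speed, bits per second


def transfer_seconds(size_bytes: float, link_bps: float = LINK_BPS) -> float:
    """Idealized time to move size_bytes over a link of link_bps bits/s."""
    return size_bytes * 8 / link_bps


sizes = {
    "CMIP5 (>5 PB)": 5 * PB,
    "CMIP6 (>25 PB)": 25 * PB,
    "CMIP7 (~100 PB, projected)": 100 * PB,
    "1 MB analysis result (hypothetical)": 1 * MB,
}

for name, size in sizes.items():
    s = transfer_seconds(size)
    print(f"{name}: {s:.3g} s = {s / SECONDS_PER_DAY:.3g} days")
```

Under these assumptions, copying CMIP6 alone would take roughly 231 days, while a 1 MB result moves in under a millisecond. That gap is what motivates running the reduction next to the archive.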
HOW IS THE DATA DISTRIBUTED?
The Federation
WE START WITH YES.
AND END WITH THANK YOU.
DO YOU HAVE ANY BIG QUESTIONS?
THE CALCULATION: ENSO WITH GLOBUS!
BIT.LY/ESGF-GLOBUS-EXAMPLE
WHAT ARE WE CALCULATING?
The El Niño Southern Oscillation (ENSO) Index
1. Subset sea surface temperature along the equator
   a) 5°N-5°S, 170°W-120°W (the Niño 3.4 region)
2. Calculate a 5-month running mean, then calculate anomalies
3. Define El Niño / La Niña
   a) > +0.4 °C → El Niño
   b) < -0.4 °C → La Niña
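The three steps above can be sketched with NumPy on a synthetic monthly SST array. This is not the authors' implementation (their example is linked at BIT.LY/ESGF-GLOBUS-EXAMPLE). The function name `enso_index`, the 0-360 longitude grid, the unweighted box mean, the whole-record monthly climatology, and the made-up demo data are all assumptions.

```python
import numpy as np

# Niño 3.4 box from the slide (5°N-5°S, 170°W-120°W), written for a
# 0-360 longitude grid (an assumption): 170°W = 190°E and 120°W = 240°E.
LAT_MIN, LAT_MAX = -5.0, 5.0
LON_MIN, LON_MAX = 190.0, 240.0
THRESHOLD = 0.4  # °C, from the slide


def enso_index(sst, lat, lon, window=5):
    """Return (anomalies, phase labels) from monthly SST of shape (time, lat, lon).

    Assumes the record starts in January and spans whole years.
    """
    # 1. Subset to the box and average over space (unweighted for brevity;
    #    a real analysis would weight by grid-cell area).
    lat_sel = (lat >= LAT_MIN) & (lat <= LAT_MAX)
    lon_sel = (lon >= LON_MIN) & (lon <= LON_MAX)
    series = sst[:, lat_sel][:, :, lon_sel].mean(axis=(1, 2))

    # 2a. Centered running mean; months without a full window become NaN.
    smoothed = np.full(series.shape, np.nan)
    half = window // 2
    valid = np.convolve(series, np.ones(window) / window, mode="valid")
    smoothed[half:half + valid.size] = valid

    # 2b. Anomalies relative to each calendar month's mean over the record.
    by_month = smoothed.reshape(-1, 12)
    climatology = np.nanmean(by_month, axis=0)
    anomalies = (by_month - climatology).ravel()

    # 3. Classify each month against the +/-0.4 °C threshold.
    phase = np.full(anomalies.shape, "Neutral", dtype="<U8")
    phase[np.isnan(anomalies)] = "N/A"
    with np.errstate(invalid="ignore"):
        phase[anomalies > THRESHOLD] = "El Niño"
        phase[anomalies < -THRESHOLD] = "La Niña"
    return anomalies, phase


# Synthetic demo: 4 years of spatially uniform SST with a seasonal cycle
# and an imposed +1.2 °C warm third year. All values are made up.
months = np.arange(48)
lat = np.arange(-10.0, 10.1, 2.5)
lon = np.arange(150.0, 281.0, 10.0)
sst_series = (27.0 + np.sin(2 * np.pi * months / 12)
              + np.where((months >= 24) & (months < 36), 1.2, 0.0))
sst = np.broadcast_to(sst_series[:, None, None], (months.size, lat.size, lon.size))

index, phase = enso_index(sst, lat, lon)
print(phase[6], phase[30])  # Neutral El Niño
```

A production version would likely weight by grid-cell area and take the climatology from a fixed base period rather than the whole record.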
ESGF FLOWS AT ALCF
The User Workflow
1. A function is defined at a Leadership Computing Facility (LCF), or an HPC system of your choice
2. Access to this function is managed via group memberships (data access is abstracted away from the user)
3. Users call the function from the compute environment of their choice (e.g., notebooks)
4. The aggregated, virtual dataset is computed* and returned to the user
*using a service account on ALCF
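The user side of this workflow can be sketched in Python. The sketch assumes the remote function is exposed through Globus Compute; the slides do not show the service wiring, and the authors' ALCF deployment may differ. `run_analysis`, `reduce_remotely`, and the endpoint ID are hypothetical. `globus_compute_sdk.Executor(endpoint_id=...)` with `submit(...).result()` is the SDK's executor interface. Because that executor follows the standard `concurrent.futures.Executor` interface, a local `ThreadPoolExecutor` can stand in for it during a dry run.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable, Optional

# Hypothetical placeholder: substitute the UUID of a real Globus Compute endpoint.
ENDPOINT_ID = "00000000-0000-0000-0000-000000000000"


def reduce_remotely(values):
    """Hypothetical analysis function that would run on the HPC side.

    Imports live inside the body because Globus Compute serializes the
    function and executes it on the endpoint. In the real workflow this step
    would read ESGF data near the compute; here `values` is a stand-in.
    """
    import statistics

    return {"n": len(values), "mean": statistics.fmean(values)}


def run_analysis(fn: Callable[..., Any], *args: Any,
                 executor: Optional[Executor] = None,
                 endpoint_id: str = ENDPOINT_ID) -> Any:
    """Submit fn(*args) and block until the small result comes back."""
    if executor is not None:
        # Any concurrent.futures.Executor works, e.g. a local thread pool.
        return executor.submit(fn, *args).result()
    # Requires `pip install globus-compute-sdk` and a Globus login.
    from globus_compute_sdk import Executor as ComputeExecutor

    with ComputeExecutor(endpoint_id=endpoint_id) as gce:
        return gce.submit(fn, *args).result()


# Local dry run with a thread pool standing in for the remote endpoint.
with ThreadPoolExecutor(max_workers=1) as pool:
    print(run_analysis(reduce_remotely, [1.0, 2.0, 3.0], executor=pool))
    # {'n': 3, 'mean': 2.0}
```

Against a real endpoint the same call would omit `executor` and pass a real `endpoint_id`. The group-based access control from step 2 does not appear in this client-side code.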