Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Earth System Grid Federation and Globus Flows
globusonline
May 31, 2024
About This Presentation
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located close to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring the data and then running computations on a different system. As part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows that can be run on demand. These workflows apply data reduction and data analysis operations directly to the large ESGF data archives and transfer only the resulting analysis products (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Slide Content
ENABLING DATA-PROXIMATE CLIMATE ANALYSIS
CLIMATE SCIENCE FLOWS WITH THE EARTH SYSTEM GRID FEDERATION
MAXWELL GROVER
Atmospheric Data Scientist
Argonne National Laboratory
BENOIT COTE
Data Services Software Developer
Argonne National Laboratory
NATHAN COLLIER
Computational Earth System Scientist
Oak Ridge National Laboratory
Wednesday May 8, 2024
GlobusWorld, Chicago, Illinois
AN INTRODUCTION TO THE EARTH SYSTEM GRID FEDERATION
A visual representation of the Earth system, from NOAA GFDL
WHAT DATASETS ARE WE WORKING WITH?
An Introduction to Earth System Models
Earth System Model components, from the DOE Energy Exascale Earth System Model
HOW BIG IS THE DATA?
Petabyte-scale Datasets
• CMIP5 totals >5 PB (including replicas)
• CMIP6 totals >25 PB (including replicas)
• CMIP7 is expected to have more high-resolution output and ensembles, totaling ~100 PB
• Great use case for data-proximate computing!
HOW IS THE DATA DISTRIBUTED?
The Federation
THE CALCULATION: ENSO WITH GLOBUS!
BIT.LY/ESGF-GLOBUS-EXAMPLE
WHAT ARE WE CALCULATING?
The El Niño-Southern Oscillation (ENSO) Index
1. Subset sea surface temperature along the equator
   a) 5°N - 5°S, 170°W - 120°W
2. Calculate a 5-month running mean, then calculate anomalies
3. Define El Niño / La Niña
   a) > +0.4°C → El Niño
   b) < -0.4°C → La Niña
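The three steps above can be sketched with plain NumPy. This is a minimal illustration, not the code behind the linked example: the function names, the cos(latitude) area weighting, and the choice to measure anomalies against each calendar month's mean over the record are all assumptions, since the slide does not specify them. It also assumes monthly data that start in January, span more than one year, and use longitudes in degrees east on [-180, 180).

```python
import numpy as np


def nino34_index(sst, lat, lon, window=5):
    """Smoothed SST anomalies (°C) for the 5°N-5°S, 170°W-120°W box.

    sst: array of monthly means with shape (time, lat, lon), starting in January.
    lat: latitudes in degrees north; lon: longitudes in degrees east, [-180, 180).
    """
    # Step 1: subset the box along the equator.
    lat_mask = (lat >= -5) & (lat <= 5)
    lon_mask = (lon >= -170) & (lon <= -120)
    box = sst[:, lat_mask][:, :, lon_mask]

    # Area-weighted mean over the box (cos(latitude) weights are an assumption).
    weights = np.cos(np.deg2rad(lat[lat_mask]))
    series = np.average(box.mean(axis=2), axis=1, weights=weights)

    # Step 2a: running mean; "valid" drops window // 2 months at each end.
    smoothed = np.convolve(series, np.ones(window) / window, mode="valid")

    # Step 2b: anomalies relative to each calendar month's mean over the record.
    month = (np.arange(smoothed.size) + window // 2) % 12
    climatology = np.array([smoothed[month == m].mean() for m in range(12)])
    return smoothed - climatology[month]


def classify(anomaly, threshold=0.4):
    """Step 3: label each month using the ±0.4 °C thresholds from the slide."""
    labels = np.full(anomaly.shape, "neutral", dtype=object)
    labels[anomaly > threshold] = "El Niño"
    labels[anomaly < -threshold] = "La Niña"
    return labels
```

Calling `classify(nino34_index(sst, lat, lon))` returns one label per month of the smoothed series, which is four months shorter than the input because of the 5-month window.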
ESGF FLOWS AT ALCF
The User Workflow
1. Function is defined at LCF (or HPC of your choice)
2. Access to this function is managed via group memberships (data access abstracted away from user)
3. Users call this from the compute environment of their choice (e.g., notebooks)
4. The aggregated, virtual dataset is computed* and returned to the user
*using a service account on ALCF
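From the user's side, steps 3 and 4 might look roughly like the sketch below. It assumes the function was registered with Globus Compute and is reached through the `globus-compute-sdk` `Executor`, which the slides do not confirm. The helper name `call_registered_function`, the `executor_cls` parameter (added only so the helper can be exercised offline), and the endpoint and function IDs are hypothetical.

```python
def call_registered_function(endpoint_id, function_id, *args,
                             executor_cls=None, **kwargs):
    """Run a pre-registered function on a remote endpoint and return its result.

    Assumes the caller's group membership grants access to the function and
    endpoint. executor_cls is a hypothetical hook for offline testing; when it
    is None, the Executor class from globus-compute-sdk is used.
    """
    if executor_cls is None:
        # Imported lazily so this module loads even without the SDK installed.
        from globus_compute_sdk import Executor as executor_cls

    with executor_cls(endpoint_id=endpoint_id) as executor:
        future = executor.submit_to_registered_function(
            function_id, args=args, kwargs=kwargs
        )
        # Blocks until the remote side returns the reduced result.
        return future.result()
```

A notebook user would pass the endpoint and function UUIDs they were given, for example `call_registered_function(ENDPOINT_ID, FUNCTION_ID, region="nino34")`. Here `ENDPOINT_ID`, `FUNCTION_ID`, and the `region` argument are placeholders. Only the small analysis result travels back to the user; the archive itself stays next to the compute resource.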