Climate Science Flows - Enabling Petabyte-Scale Climate Analysis with the Earth System Grid Federation and Globus Flows.pdf

globusonline 8 views 8 slides May 29, 2024

About This Presentation

The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximit...


Slide Content

WE START WITH YES.

CLIMATE SCIENCE FLOWS WITH THE EARTH SYSTEM GRID FEDERATION
ENABLING DATA-PROXIMATE CLIMATE ANALYSIS
MAXWELL GROVER
Atmospheric Data Scientist
Argonne National Laboratory
BENOIT COTE
Data Services Software Developer
Argonne National Laboratory

NATHAN COLLIER
Computational Earth System Scientist
Oak Ridge National Laboratory

Wednesday May 8, 2024
GlobusWorld, Chicago, Illinois

AN INTRODUCTION TO THE EARTH SYSTEM GRID FEDERATION

A visual representation of the Earth system, from NOAA GFDL

WHAT DATASETS ARE WE WORKING WITH?
An Introduction to Earth System Models
Earth System Model components, from the DOE Energy Exascale Earth System Model

HOW BIG IS THE DATA?
Petabyte-scale Datasets

• CMIP5 totals >5 PB (including replicas)
• CMIP6 totals >25 PB (including replicas)
• CMIP7 is expected to have more high-resolution output and ensembles, totaling ~100 PB
• ESGF2-US will expand Federation holdings by adding other Earth science data projects for AI/ML, large ensembles, etc.
• Great use-case for data-proximate computing!
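
To make the case for data-proximate computing concrete, here is a rough back-of-the-envelope sketch of how long it would take to move an archive of this size over a network link. The 10 Gbit/s sustained link speed is an illustrative assumption, not a figure from the talk, and the calculation uses decimal petabytes and ignores protocol overhead.

```python
def transfer_days(size_pb: float, gbit_per_s: float) -> float:
    """Days needed to move `size_pb` petabytes over a link of `gbit_per_s` Gbit/s.

    Assumes decimal units (1 PB = 1e15 bytes) and a perfectly sustained
    link rate, so real transfers would take longer.
    """
    bits = size_pb * 1e15 * 8               # petabytes -> bits
    seconds = bits / (gbit_per_s * 1e9)     # bits / (bits per second)
    return seconds / 86_400                 # seconds -> days


# Archive sizes from the slide; the link speed is a hypothetical value.
for name, size_pb in [("CMIP5", 5), ("CMIP6", 25), ("CMIP7 (expected)", 100)]:
    print(f"{name}: ~{transfer_days(size_pb, gbit_per_s=10):.0f} days at 10 Gbit/s")
```

Under these assumptions, copying the CMIP6 holdings alone would take most of a year, which is why running the analysis next to the data is attractive.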

HOW IS THE DATA DISTRIBUTED?
The Federation

WE START WITH YES.
AND END WITH THANK YOU.
DO YOU HAVE ANY BIG QUESTIONS?

THE CALCULATION: ENSO WITH GLOBUS!

BIT.LY/ESGF-GLOBUS-EXAMPLE

WHAT ARE WE CALCULATING?
The El Niño-Southern Oscillation (ENSO) Index
1. Subset sea surface temperature along the equator
   a) 5°N-5°S, 170°W-120°W
2. Calculate a 5-month running mean, then calculate anomalies
3. Define El Niño / La Niña
   a) > +0.4 °C → El Niño
   b) < -0.4 °C → La Niña
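
The three steps above can be sketched with NumPy as follows. This is a minimal illustration, not the code from the talk: the function names are hypothetical, the input is assumed to be a monthly `(time, lat, lon)` array in °C starting in January with longitudes in degrees east (0-360), the regional mean is cosine-latitude weighted, and anomalies are taken relative to each calendar month's mean over the whole record. The slide does not specify the weighting or the baseline, so both are assumptions.

```python
import numpy as np


def enso_index(sst, lat, lon, window=5):
    """Return smoothed SST anomalies for the 5°N-5°S, 170°W-120°W box.

    sst: array of shape (time, lat, lon), monthly means in deg C,
         assumed to start in January.
    lat: latitudes in degrees north; lon: longitudes in degrees east (0-360).
    """
    # Step 1: subset the equatorial box (170°W-120°W is 190°E-240°E).
    lat_mask = (lat >= -5) & (lat <= 5)
    lon_mask = (lon >= 190) & (lon <= 240)
    box = sst[:, lat_mask, :][:, :, lon_mask]

    # Area-weighted regional mean (cosine-latitude weights are an assumption).
    weights = np.cos(np.deg2rad(lat[lat_mask]))
    regional = np.average(box.mean(axis=2), axis=1, weights=weights)

    # Step 2a: centered running mean; "valid" drops window // 2 months per end.
    kernel = np.ones(window) / window
    smoothed = np.convolve(regional, kernel, mode="valid")

    # Step 2b: anomalies relative to each calendar month's mean.
    months = (np.arange(smoothed.size) + window // 2) % 12
    anomalies = np.empty_like(smoothed)
    for m in range(12):
        sel = months == m
        if sel.any():
            anomalies[sel] = smoothed[sel] - smoothed[sel].mean()
    return anomalies


def classify(anomalies, threshold=0.4):
    """Step 3: label each month using the +/-0.4 deg C thresholds."""
    labels = np.full(anomalies.shape, "neutral", dtype=object)
    labels[anomalies > threshold] = "El Niño"
    labels[anomalies < -threshold] = "La Niña"
    return labels
```

A field that contains only a repeating seasonal cycle should yield anomalies of zero, which is a quick sanity check before applying the function to model output.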

ESGF FLOWS AT ALCF
The User Workflow
1. A function is defined at an LCF (or the HPC facility of your choice)
2. Access to this function is managed via group memberships (data access is abstracted away from the user)
3. Users call the function from the compute environment of their choice (e.g., notebooks)
4. The aggregated, virtual dataset is computed* and returned to the user
*using a service account on ALCF
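
From the user's side, this workflow reduces to "submit a function, wait for the result." The sketch below shows that pattern with Python's standard `concurrent.futures` interface, using a local thread pool as a stand-in for the remote facility so that it runs anywhere. The function names and sample values are hypothetical. It assumes a Globus Compute-style setup, where the SDK's `Executor` follows the same `submit()` / `result()` interface and could be passed in place of the local pool; the talk's actual service configuration is not shown here.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def regional_mean(values):
    """Hypothetical stand-in for the analysis function defined at the facility."""
    return sum(values) / len(values)


def run_near_data(executor: Executor, func, *args):
    """Submit `func` to wherever `executor` runs and block until the result returns.

    Only the (small) result travels back to the caller; the input data
    stays on the executor's side.
    """
    future = executor.submit(func, *args)
    return future.result()


# Local stand-in for the remote endpoint. With a remote executor, the call
# below would be unchanged and only the executor object would differ.
with ThreadPoolExecutor(max_workers=1) as local_executor:
    result = run_near_data(local_executor, regional_mean, [26.5, 27.0, 27.5])

print(result)
```

Keeping the call site independent of the executor type is what lets users run the same notebook cell against a laptop during testing and against the facility afterwards.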