Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Earth System Grid Federation and Globus Flows

globusonline, 8 slides, May 31, 2024

About This Presentation

The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximit...


Slide Content

CLIMATE SCIENCE FLOWS WITH THE EARTH SYSTEM GRID FEDERATION
ENABLING DATA-PROXIMATE CLIMATE ANALYSIS
MAXWELL GROVER
Atmospheric Data Scientist
Argonne National Laboratory
BENOIT COTE
Data Services Software Developer
Argonne National Laboratory
NATHAN COLLIER
Computational Earth System Scientist
Oak Ridge National Laboratory
Wednesday, May 8, 2024
GlobusWorld, Chicago, Illinois

AN INTRODUCTION TO THE EARTH SYSTEM GRID FEDERATION

A visual representation of the Earth system, from NOAA GFDL
WHAT DATASETS ARE WE WORKING WITH?
An Introduction to Earth System Models
Earth System Model components, from the DOE Energy Exascale Earth System Model (E3SM)

HOW BIG IS THE DATA?
Petabyte-scale Datasets
• CMIP5 totals >5 PB (including replicas)
• CMIP6 totals >25 PB (including replicas)
• CMIP7 is expected to have more high-resolution output and ensembles, totaling ~100 PB
• Great use case for data-proximate computing!
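To put the data-proximate argument in numbers, here is a back-of-the-envelope sketch of how long moving each archive total over a network would take. The sustained 10 Gb/s rate is an assumption chosen for illustration, not a figure from the talk, and the totals are treated as decimal petabytes (1 PB = 10^15 bytes). Because the slide's totals include replicas, this overstates what a single copy would need, but the order of magnitude is the point.

```python
def transfer_days(petabytes: float, gbit_per_s: float) -> float:
    """Days needed to move `petabytes` (decimal PB) at a sustained `gbit_per_s`."""
    bits = petabytes * 1e15 * 8              # PB -> bytes -> bits
    seconds = bits / (gbit_per_s * 1e9)      # Gb/s -> bits per second
    return seconds / 86_400                  # seconds -> days


# Archive sizes from the slide; the 10 Gb/s link speed is an illustrative assumption.
for name, size_pb in [("CMIP5", 5), ("CMIP6", 25), ("CMIP7 (projected)", 100)]:
    print(f"{name}: ~{transfer_days(size_pb, 10):.0f} days at 10 Gb/s")
```

At that assumed rate this prints roughly 46, 231, and 926 days respectively, which is why moving the computation to the data, rather than the data to the user, is attractive.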

HOW IS THE DATA DISTRIBUTED?
The Federation

THE CALCULATION: ENSO WITH GLOBUS!
BIT.LY/ESGF-GLOBUS-EXAMPLE

WHAT ARE WE CALCULATING?
The El Niño-Southern Oscillation (ENSO) Index
1. Subset sea surface temperature along the equator
   a) 5°N-5°S, 170°W-120°W (the Niño 3.4 region)
2. Calculate a 5-month running mean, then calculate anomalies
3. Define El Niño / La Niña
   a) > +0.4°C → El Niño
   b) < -0.4°C → La Niña
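A minimal pure-Python sketch of the three steps above, assuming each month's SST is available as a list of (lat, lon, sst) tuples on a 0-360° longitude grid. The data layout, the function names, and the "neutral" label for months between the thresholds are illustrative assumptions, not the code used in the talk. It follows the slide's order (running mean first, then anomalies), uses an unweighted box mean rather than an area-weighted one, takes anomalies relative to each calendar month's mean over the record (a real analysis would typically use a fixed, multi-decade base period), and applies the ±0.4°C thresholds month by month with no minimum-duration requirement.

```python
from statistics import mean

# Box from the slide: 5°N-5°S, 170°W-120°W.
# Assumption: longitudes are stored on a 0-360° grid, so 170°W -> 190°E, 120°W -> 240°E.
LAT_MIN, LAT_MAX = -5.0, 5.0
LON_MIN, LON_MAX = 190.0, 240.0
THRESHOLD = 0.4  # °C, from the slide


def in_box(lat, lon):
    """True if a grid point lies inside the box; lon % 360 also accepts -180..180 input."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon % 360 <= LON_MAX


def box_mean(field):
    """Step 1: unweighted mean SST over points inside the box for one month.

    `field` is a list of (lat, lon, sst) tuples (hypothetical layout).
    Raises statistics.StatisticsError if no point falls inside the box.
    """
    return mean(sst for lat, lon, sst in field if in_box(lat, lon))


def running_mean(series, window=5):
    """Step 2a: centered running mean; months without a full window are None."""
    half = window // 2
    out = [None] * len(series)
    for i in range(half, len(series) - half):
        out[i] = mean(series[i - half:i + half + 1])
    return out


def anomalies(smoothed):
    """Step 2b: subtract each calendar month's mean (series assumed to start in January)."""
    clim = {}
    for m in range(12):
        vals = [v for v in smoothed[m::12] if v is not None]
        if not vals:
            raise ValueError(f"no smoothed values for calendar month {m + 1}; need a longer record")
        clim[m] = mean(vals)
    return [None if v is None else v - clim[i % 12] for i, v in enumerate(smoothed)]


def classify(anom):
    """Step 3: label each month using the slide's ±0.4°C thresholds."""
    labels = []
    for a in anom:
        if a is None:
            labels.append(None)        # incomplete running-mean window
        elif a > THRESHOLD:
            labels.append("El Niño")
        elif a < -THRESHOLD:
            labels.append("La Niña")
        else:
            labels.append("neutral")   # neither threshold exceeded
    return labels
```

Chained together, `classify(anomalies(running_mean([box_mean(f) for f in monthly_fields])))` yields one label per month, with `None` where the 5-month window is incomplete; `monthly_fields` here stands for a hypothetical list of per-month grids.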

ESGF FLOWS AT ALCF
The User Workflow
1. A function is defined at a Leadership Computing Facility (LCF), or the HPC system of your choice
2. Access to this function is managed via group memberships (data access is abstracted away from the user)
3. Users call the function from the compute environment of their choice (e.g., notebooks)
4. The aggregated, virtual dataset is computed* and returned to the user
*using a service account on ALCF
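The four steps can be modeled as a toy, in-process Python sketch: a facility registers a function, access is gated by group membership, and each call runs under the facility's service account, so the user never handles archive paths or credentials. Every name here (`FacilityFunctionService`, `enso_index_job`, the group and path strings) is hypothetical; this is not the Globus API, only a model of the steps listed on the slide.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class RegisteredFunction:
    func: Callable[..., Any]
    allowed_groups: frozenset


class FacilityFunctionService:
    """Hypothetical local stand-in for a function service hosted at an HPC facility."""

    def __init__(self, service_account: str, data_root: str):
        # Credentials and archive location live with the facility, not with the user.
        self.service_account = service_account
        self.data_root = data_root
        self._registry = {}

    def register(self, func: Callable[..., Any], allowed_groups: Iterable[str]) -> str:
        """Step 1: the facility defines a function and gets back an ID to share."""
        function_id = str(uuid.uuid4())
        self._registry[function_id] = RegisteredFunction(func, frozenset(allowed_groups))
        return function_id

    def call(self, function_id: str, user_groups: Iterable[str], **kwargs: Any) -> Any:
        """Steps 2-4: check group membership, then run the function as the service account."""
        entry = self._registry[function_id]
        if entry.allowed_groups.isdisjoint(user_groups):
            raise PermissionError("caller is not in a group authorized for this function")
        # The caller never supplies data_root or credentials; the service injects them.
        return entry.func(data_root=self.data_root, run_as=self.service_account, **kwargs)


def enso_index_job(data_root: str, run_as: str, model: str) -> dict:
    """Hypothetical registered function; a real one would read and aggregate SST files."""
    return {"model": model, "source": f"{data_root}/{model}", "run_as": run_as}
```

In the real deployment the call crosses the network from the user's notebook to the facility; here it is a local method call so the access-control and service-account logic can be read in isolation.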