Cloud Optimized HDF5 for the ICESat-2 mission

HDFEOS, 18 slides, Aug 02, 2024

About This Presentation

HDF and HDF-EOS Workshop XXVII (2024)

Slide Content

Cloud Optimized HDF5
for the
ICESat-2 mission
ESIP Summer meeting 2024
Luis López
Research Software Engineer
NSIDC

Andrew Barrett
Aleksandar Jelenak
Lisa Kaser
Jeff Lee
Amy Steiker

Credit: NASA's Goddard Space Flight Center

Important questions about our planet can now be answered by
integrating years of data from different missions.
Global Sea Ice Concentration
Boreal Forest Biomass

The data coming from these missions is now available in the cloud!**
NASA and other agencies have started to migrate their data to the cloud.
**Caveat: it is by and large in archival formats, HDF5 and NetCDF.

Problem: Accessing HDF5 in the cloud is slow. How slow?

Improving performance of HDF5 in the cloud is key to
enable science at scale.
●Data is becoming too large to work with locally.
●I/O libraries are optimized for local and supercomputing workflows.
●HDF as a format was not designed for the cloud.

The problem = Size + Tools + Format

Cloud-optimized HDF5?
https://www.hdfgroup.org/2024/01/strategies-and-software-to-optimize-hdf5-netcdf-4-files-for-the-cloud/
●Metadata is consolidated
●Custom caching buffer size
●Global API lock is still in place

Why HDF is not Performant in the Cloud
●Metadata is scattered throughout the file, and each nested group makes the problem worse.
●By default, metadata is written to (and read from) the file in fixed blocks of 4 KB, so 1 MB of metadata takes about 250 requests.
●Because of the global API lock, those 250 requests are sequential!
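
The request count quoted above is simple arithmetic, and a short sketch makes it easy to try other block sizes. The helper names, the 8 MiB page size and the 100 ms per-request latency below are illustrative assumptions, not figures from the slides. With binary units the 4 KB case gives 256 requests, consistent with the slide's rounded "~250".

```python
import math

KIB = 1024
MIB = 1024 * KIB


def metadata_requests(metadata_bytes: int, block_bytes: int) -> int:
    """Number of fixed-size block reads needed to cover all the metadata."""
    return math.ceil(metadata_bytes / block_bytes)


def sequential_seconds(n_requests: int, latency_s: float) -> float:
    """Total wait when requests cannot overlap (the global API lock case)."""
    return n_requests * latency_s


# Default behaviour described on the slide: 4 KiB blocks, 1 MiB of metadata.
default_requests = metadata_requests(1 * MIB, 4 * KIB)

# Hypothetical cloud-optimized layout: one 8 MiB page holds all the metadata.
paged_requests = metadata_requests(1 * MIB, 8 * MIB)

# Assumed round-trip latency per request (illustrative only).
ASSUMED_LATENCY_S = 0.1

print(default_requests, sequential_seconds(default_requests, ASSUMED_LATENCY_S))
print(paged_requests, sequential_seconds(paged_requests, ASSUMED_LATENCY_S))
```

Because the requests are sequential, the total wait scales linearly with the request count, which is why larger blocks matter more than raw bandwidth here.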

Paged Aggregation (data + metadata)

Metadata Blocks (user or dedicated page)

Trying Big Files from the ICESat-2 Mission
Source: https://github.com/nsidc/earthaccess/discussions/251

Accidental Complexity
NASA Policies
data wrangling library
file format library
I/O driver (or ROS3)
AWS S3
ds = xr.open_dataset("s3://nasa-data.hdf5")

It’s APIs All the Way Down

Cloud-Optimized HDF5 Works!*
Code: https://gist.github.com/betolink/b545c364f80882c113b8cc27b763c729
Source: Andrew Barrett

Remote I/O Visualized
https://github.com/ajelenak/ros3vfd-log-info
●Cloud optimizations to HDF5 reduce requests by an order of magnitude
●Data that’s not cloud optimized or is read with out-of-the-box parameters produces a lot of I/O

What could we do with CO-HDF?
●Xpublish
●Kerchunk
●SlideRule
●Happy researchers

Improving performance of HDF5 in the cloud is key to
enable science at scale. Thanks!
