Providing Globus Services to Users of JASMIN for Environmental Data Analysis
globusonline
14 slides
May 31, 2024
About This Presentation
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Slide Content
Providing Globus services to users of JASMIN for environmental data analysis
GlobusWorld 2024
Dr Matt Pritchard, STFC RAL, UK
About JASMIN
- UK data analysis platform for environmental science
- Operated by STFC on behalf of NERC
- Infrastructure team: STFC Scientific Computing
- User services team: CEDA/RAL Space
- Tenants
  - CEDA Archive (uses JASMIN as host infrastructure)
  - >350 other science projects
- “Bring compute to the data”
- Community grown around the platform
About JASMIN
~2000 users
350+ projects
Storage:
~50 PB SOF for GWS volumes
~8 PB PFS for scratch, transfer cache & specialist volumes
~8 PB Object storage for access from anywhere
~90 PB Tape capacity
SSD storage for user home directories & small-file volumes
BLK storage for cloud system storage & databases
18,500 CPU nodes (+ soon)
~5M CPU hrs/mo
Slurm cluster
+ 8 interactive “sci” nodes
1x x 4-card GPU nodes
2 x 8-card GPU nodes
~60 cloud tenancies
Current projects
Example workflow: user/group project
Example workflow: climate science community
Coupled Model Intercomparison Project (CMIP)
- IPCC assessment reports AR5, AR6, …
- inform government policy, decision making
Climate modelling centres around the world
Met Office Hadley Centre
Example workflow: remote instruments
NCAS Mobile X-band weather radar
BT Tower Observatory, London
COZI lab, Univ. of York
3 main functions
Data transfer services
- Data download from CEDA Archive [RO]
- Ingest of data to CEDA Archive for long-term curation [W]
- Data flow in/out of JASMIN group workspaces [RW]
ftp, http, gridftp, globus
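
The three data transfer functions and their access modes ([RO], [W], [RW]) can be pictured as a small permission table. The sketch below is only an illustration of that slide, not how JASMIN or Globus implement access control: the function labels (`archive_download`, `archive_ingest`, `gws_data_flow`) and the helper names are hypothetical, chosen here for readability.

```python
from enum import Flag, auto


class Access(Flag):
    """Access bits corresponding to the R and W markers on the slide."""
    READ = auto()
    WRITE = auto()


# Access modes as marked on the slide: [RO], [W], [RW].
# The keys are hypothetical labels, not real JASMIN service names.
TRANSFER_FUNCTIONS = {
    "archive_download": Access.READ,               # download from CEDA Archive
    "archive_ingest": Access.WRITE,                # ingest for long-term curation
    "gws_data_flow": Access.READ | Access.WRITE,   # group workspace in/out
}


def allows(function: str, requested: Access) -> bool:
    """Return True if every requested access bit is granted to the function."""
    granted = TRANSFER_FUNCTIONS[function]
    return (granted & requested) == requested


def functions_allowing(requested: Access) -> list[str]:
    """List, in alphabetical order, the functions that grant the requested access."""
    return sorted(name for name in TRANSFER_FUNCTIONS if allows(name, requested))
```

For example, `functions_allowing(Access.WRITE)` selects the ingest and group workspace functions, while asking for both read and write leaves only the group workspace data flow.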