Extending Globus into a Site-wide Automated Data Infrastructure
8 slides
May 29, 2024
About This Presentation
The Rosalind Franklin Institute hosts a variety of scientific instruments, which allow us to capture a multifaceted and multilevel view of biological systems, generating around 70 terabytes of data a month. Distributed solutions, such as Globus and Ceph, facilitate storage, access, and transfer of large amounts of data. However, we still must deal with the heterogeneity of the file formats and directory structures at acquisition, which are optimised for fast recording rather than for efficient storage and processing.
Our data infrastructure includes local storage at the instruments and workstations, distributed object stores with POSIX and S3 access, remote storage on HPCs, and tape backup. This can pose a challenge in ensuring fast, secure, and efficient data transfer. Globus allows us to handle this heterogeneity, while its Python SDK allows us to automate our data infrastructure using Globus microservices integrated with our data access models.
Our data management workflows are becoming increasingly complex and heterogeneous, including desktop PCs, virtual machines, and offsite HPCs, as well as several open-source software tools with different computing and data structure requirements. This complexity demands that data be annotated with enough detail about the experiments and the analysis to ensure efficient and reproducible workflows. This talk explores how we extend Globus into different parts of our data lifecycle to create a secure, scalable, and high-performing automated data infrastructure that can provide FAIR[1,2] data for all our science.
1. https://doi.org/10.1038/sdata.2016.18
2. https://www.go-fair.org/fair-principles
Slide Content
Extending Globus into a site-wide automated data infrastructure
Tibor Auer, Dimitrios Bellos, Laura Shemilt
Advanced Research Computing
The Rosalind Franklin Institute
The Rosalind Franklin Institute
• Research
  – Aim: image, interpret, and intervene in biological systems
  – Integrative scope
  – Multilevel resolution: macroscopic to atomic
The Rosalind Franklin Institute
• Infrastructural requirements
  – Fast, secure, and efficient data transfer
  – Efficient and reproducible analysis workflows
• Data management challenge
  – Amount: 70 TB per month
  – Heterogeneous data: formats, structures, and collection rates
Fast, secure, and efficient data transfer
• Globus
  – Access management using guest collections and ACL rules (ORCID links local and Globus identities)
• RFI Globus
  – Automated configuration
  – Service accounts avoid the need for a human in the loop
  – Setup steps (done only a few times) → automated with Ansible
  – Transfer steps (done regularly) → RFI Globus API based on the Globus SDK for Python
[Diagram: Globus connects Instruments, Workstation (SSHFS), VM, HPC, and Storage (POSIX & S3)]
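The regular transfer steps could be scripted with the Globus SDK for Python along the following lines. This is a minimal sketch, not the actual RFI Globus API: the collection UUIDs, the path layout, and the helper names are illustrative assumptions.

```python
"""Sketch of an automated instrument-to-storage transfer as a service
account (client-credentials grant, so no human in the loop)."""

# Hypothetical collection UUIDs -- replace with real Globus collection IDs.
INSTRUMENT_COLLECTION = "aaaaaaaa-0000-0000-0000-000000000000"
STORAGE_COLLECTION = "bbbbbbbb-0000-0000-0000-000000000000"


def build_transfer_spec(instrument: str, dataset: str) -> dict:
    """Map an instrument dataset to source/destination paths.

    Kept as a pure helper so the path logic is testable without
    credentials; the directory layout here is an assumption.
    """
    return {
        "source_endpoint": INSTRUMENT_COLLECTION,
        "destination_endpoint": STORAGE_COLLECTION,
        "source_path": f"/{instrument}/{dataset}/",
        "destination_path": f"/raw/{instrument}/{dataset}/",
    }


def submit(spec: dict, client_id: str, client_secret: str) -> str:
    """Submit the transfer; requires the globus-sdk package and a
    registered Globus service account (confidential client)."""
    import globus_sdk  # imported lazily so the sketch loads without it

    auth = globus_sdk.ConfidentialAppAuthClient(client_id, client_secret)
    authorizer = globus_sdk.ClientCredentialsAuthorizer(
        auth, globus_sdk.scopes.TransferScopes.all
    )
    tc = globus_sdk.TransferClient(authorizer=authorizer)
    tdata = globus_sdk.TransferData(
        tc,
        spec["source_endpoint"],
        spec["destination_endpoint"],
        sync_level="checksum",  # skip files already present and identical
    )
    tdata.add_item(spec["source_path"], spec["destination_path"], recursive=True)
    return tc.submit_transfer(tdata)["task_id"]
```

Running this regularly (e.g. from a timer on a VM) is what removes the human from the loop; the one-time endpoint setup is handled separately by Ansible.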
Efficient and reproducible analysis workflows
• Automated data acquisition and annotation
• Automated data retrieval
Integrate with microservices
[Diagram: microservices attach transactional and scientific metadata as data flows between Instruments, Workstation, and Storage (POSIX & S3); user permissions resolved via LDAP/Keycloak]
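One way a microservice could record the transactional and scientific metadata is a JSON sidecar written next to each dataset. A minimal sketch; the schema and field names are assumptions for illustration, not the RFI's actual metadata model.

```python
"""Sketch of dataset annotation at acquisition (illustrative schema)."""
import json
from datetime import datetime, timezone
from pathlib import Path


def build_record(instrument: str, operator: str, scientific: dict) -> dict:
    """Combine transactional and scientific metadata into one record."""
    return {
        "transactional": {
            "instrument": instrument,
            "operator": operator,  # identity resolved via LDAP/Keycloak
            "acquired_at": datetime.now(timezone.utc).isoformat(),
        },
        "scientific": scientific,  # e.g. sample, modality, pixel size
    }


def write_sidecar(dataset_dir: Path, record: dict) -> Path:
    """Drop the record as metadata.json next to the raw data, so it
    travels with the dataset through every later transfer."""
    path = dataset_dir / "metadata.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```

Because the sidecar moves with the data, downstream analysis and retrieval steps can read the annotations without calling back to the acquisition system.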
Efficient and reproducible analysis workflows
• Automated analysis
Integrate with microservices
1. Check for new data: wait for the ‘sentinel’ directory, run sanity checks, and generate a UDID (the directory name on the HPC)
2. Transfer data
3. Create a ‘sentinel’ directory
4. Submit the analysis script to the scheduler
5. The job deletes the ‘sentinel’ directory
6. If the ‘sentinel’ directory is deleted, transfer results back
7. (Optionally) delete data from the HPC
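The sentinel handshake above can be sketched in a few lines. The ‘.sentinel’ name, the UDID format, and the local-filesystem stand-in for the HPC are assumptions for illustration; the real transfers go through Globus.

```python
"""Sketch of the sentinel-directory handshake on the HPC."""
import tempfile
import uuid
from pathlib import Path


def generate_udid(instrument: str) -> str:
    """Unique dataset ID, used as the directory name on the HPC
    (format is an assumption)."""
    return f"{instrument}-{uuid.uuid4().hex[:12]}"


def start_job(hpc_root: Path, udid: str) -> Path:
    """Create the sentinel; while it exists, the analysis is in flight."""
    sentinel = hpc_root / udid / ".sentinel"
    sentinel.mkdir(parents=True)
    return sentinel


def finish_job(sentinel: Path) -> None:
    """The scheduler job deletes the sentinel when the analysis is done."""
    sentinel.rmdir()


def results_ready(hpc_root: Path, udid: str) -> bool:
    """Poller check: dataset present but sentinel gone -> fetch results."""
    dataset = hpc_root / udid
    return dataset.is_dir() and not (dataset / ".sentinel").exists()


# One round trip against a throwaway directory standing in for the HPC.
root = Path(tempfile.mkdtemp())
udid = generate_udid("tomo")
sentinel = start_job(root, udid)
in_flight = not results_ready(root, udid)  # sentinel still present
finish_job(sentinel)                       # the analysis job has finished
done = results_ready(root, udid)           # now safe to pull results back
```

Using directory existence as the completion signal keeps the coupling loose: the poller needs no access to the scheduler, only to the filesystem (or a Globus listing of it).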