GlobusWorld 2024 Opening Keynote session

globusonline 76 views 54 slides May 30, 2024

About This Presentation

In this keynote address, Rachana Ananthakrishnan and Ian Foster review the latest updates to the Globus platform and services, and discuss the relevance of Globus to the scientific community as an automation platform for accelerating scientific discovery.


Slide Content

State of the Globus World
Rachana Ananthakrishnan
Ian Foster

Globus: a hybrid model for research IT
Standards-compliant security fabric
Compute facility
On-prem & cloud storage
Laptop/desktop
Institutional resources
Instrument facility/lab
Custom services
Global management & orchestration
Hosted, persistent, scalable, resilient services
Local agents with plug-in
Action Provider
Globus Compute
Globus Connect

33M+ compute tasks
241K+ flow runs
60K+ connected storage
15K+ guest collections
542K+ registered users/apps
192M+ documents indexed

New Globus users

Data ecosystem
•Web addressable data (HTTP/S access)
•Secure, reliable and managed transfer
•Collaborative data sharing, with fine-grained access control
•Consistent UX across diverse storage systems
Data collections
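
To make "web addressable data" concrete, here is a minimal sketch of building an authenticated HTTPS request for one file in a collection. It assumes access works as a plain HTTPS GET with a bearer token. The base URL, path, and token are placeholders rather than real Globus values, and the request is built but never sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_download_request(base_url: str, path: str, token: str) -> Request:
    """Build (but do not send) an HTTPS GET request for a file in a collection."""
    # Percent-encode the path while keeping "/" separators intact.
    url = base_url.rstrip("/") + "/" + quote(path.lstrip("/"), safe="/")
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Hypothetical values: substitute a collection's HTTPS base URL and a real token.
req = build_download_request(
    "https://collection.example.org", "/results/run 01/output.csv", "PLACEHOLDER_TOKEN"
)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the download once real values are supplied.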

Globus transfer users

Globus Connect Server deployments

Some of the updates for data ecosystem
•Mapped collections management
–update owners
–delete protected by default
•Facilitate collection lifecycle management
–Mapped collection administrators can modify guest collection metadata and delete roles and permissions
–Time of last use of collection
–CLI tools to manage collections based on creation/last use time
•IPv6-only and dual-stack support for data transfer and sharing
–Globus Transfer, Auth, and Groups services; GCS and GCP
•Expanded set of Linux distributions supported for Globus Connect Server
–11 distros and 26 versions
•ARM AArch64 supported for Globus Connect Personal
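
As an illustration of lifecycle management based on last-use time, the sketch below picks out collections that have been idle longer than a threshold. The record layout (`display_name`, `last_used`) and the sample data are invented for this example and do not reflect the actual CLI output.

```python
from datetime import datetime, timedelta, timezone


def stale_collections(collections, now, max_idle_days):
    """Return names of collections not used within the last max_idle_days days."""
    cutoff = now - timedelta(days=max_idle_days)
    return [c["display_name"] for c in collections if c["last_used"] < cutoff]


# Made-up records for illustration only.
now = datetime(2024, 5, 30, tzinfo=timezone.utc)
collections = [
    {"display_name": "beamline-raw", "last_used": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"display_name": "old-project", "last_used": datetime(2022, 11, 3, tzinfo=timezone.utc)},
]
print(stale_collections(collections, now, max_idle_days=365))
```

An administrator could review the resulting list before deleting or archiving anything.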

Continue to grow supported storage systems

Connector highlights
•OneDrive: update to use preferred metadata checksums
•HPSS: error reporting and caching improvements
•Google Drive: skip duplicate files
•Google/Dropbox/Box/MS: support use of any user account without the need for mapping
•Grow S3-compatible storage system partners: Storj

Protected data management

Protected data management updates
•Support use of Timers with protected data
•Expiration time on permissions for guest collections
•Administrative controls on permission expiration policy
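
One way to picture an expiring permission under an administrative policy is sketched below. It assumes the policy acts as a maximum lifetime that caps whatever expiration a user requests. The slide does not spell out the policy semantics, so this is an interpretation, and the rule's field names are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def permission_expiration(requested, now, max_lifetime_days):
    """Clamp a requested expiration time to an administrator-set maximum lifetime."""
    latest_allowed = now + timedelta(days=max_lifetime_days)
    if requested is None or requested > latest_allowed:
        return latest_allowed
    return requested


now = datetime(2024, 5, 30, 12, 0, tzinfo=timezone.utc)
# The user asks for 90 days, but the assumed policy allows at most 30.
expires = permission_expiration(now + timedelta(days=90), now, max_lifetime_days=30)
rule = {
    "principal": "collaborator@example.org",  # hypothetical identity
    "path": "/shared/",
    "permissions": "r",
    "expiration": expires.isoformat(),  # illustrative field name
}
print(rule["expiration"])
```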



Compliance
Increased contractual requirements around information security and privacy.
We are hiring! Governance, Risk and Compliance Lead

Compute ecosystem
•Programmatic access to compute resources
•Reliable and managed execution
•Consistent user interface across diverse execution systems
•Fine-grained access control
Compute endpoint

User interaction with Globus Compute
1. You request a function be executed on endpoints A and B
2. Globus Compute manages the reliable and secure execution on these endpoints
3. Globus Compute returns results or stores them until requested
Compute Service
A: a compute resource
B: another compute resource
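
The submit-then-collect pattern in the three steps above can be sketched with Python's standard `concurrent.futures` module. This is a local stand-in only: two thread pools play the role of endpoints A and B, and nothing here contacts the Globus Compute service. The function `simulate` and its inputs are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(x):
    """The function we want executed remotely; here it just squares its input."""
    return x * x


# Stand-ins for endpoints A and B: each local thread pool plays one endpoint.
endpoints = {"A": ThreadPoolExecutor(max_workers=1), "B": ThreadPoolExecutor(max_workers=1)}

# Step 1: request execution on both endpoints; submit returns a future immediately.
futures = {name: pool.submit(simulate, n) for (name, pool), n in zip(endpoints.items(), (3, 4))}

# Steps 2-3: execution is handled for us, and results are held until requested.
results = {name: fut.result() for name, fut in futures.items()}
print(results)

for pool in endpoints.values():
    pool.shutdown()
```

The point of the futures style is that the caller keeps a handle to each task and decides when to collect the result.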

Globus Compute Multiuser Endpoint
•Deployed and operated by administrators
–Launches processes as user’s local account
•Preconfigured templates for local site options and policies
•Same AuthN & AuthZ as Globus Connect Server
–Domain-based authentication policy
–Authorization via mapping of user to local account
Launches an endpoint process for user
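
The two checks named above, a domain-based policy followed by mapping the user to a local account, can be sketched as follows. This is a deliberately simplified illustration rather than the actual identity-mapping mechanism: it assumes the local account name is simply the part of the identity before the "@", and the domains are placeholders.

```python
import re


def map_identity(username, allowed_domain):
    """Map 'user@domain' to a local account name, or None if not permitted."""
    match = re.fullmatch(r"([A-Za-z0-9._-]+)@([A-Za-z0-9.-]+)", username)
    if match is None:
        return None
    local_part, domain = match.groups()
    # Domain-based authentication policy: only the allowed domain passes.
    if domain.lower() != allowed_domain.lower():
        return None
    # Authorization via mapping: assume the local account is the part before "@".
    return local_part


print(map_identity("alice@example.edu", "example.edu"))
print(map_identity("mallory@elsewhere.org", "example.edu"))
```

A process would then be launched under the returned local account, and a `None` result means the request is refused.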

Globus Compute Multiuser Endpoint
Compute Service
Globus Compute Multiuser Endpoint: Identity Mapping, Configuration Templates
Launches an endpoint process for user
User Endpoint Process (as local user): Globus Compute Engine
User Endpoint Process (as local user): Globus Compute Engine
Node 1, Node 2, Node N

Globus Compute Multiuser Endpoint
Learn more at the tutorial tomorrow

Automation ecosystem
•Event-driven invocation of actions on diverse services
•Reliable and managed orchestration
•Extensible to support custom service APIs
•Delegated execution and monitoring
Flows

Flows highlights
•Better error handling for consents and authentication
•Discovery of flows and runs via search service
•Improvements to guided start of run
•In-depth flow validation prior to deploy

$ globus flows validate definition.json
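
For context, here is a sketch of what a small `definition.json` might contain, together with a purely local structural check. It assumes flow definitions follow a state-machine layout with `StartAt` and `States`. The `ActionUrl` and parameter names are hypothetical, and the `check_flow` helper is an illustration that does not reproduce what the validate command checks.

```python
import json

definition = {
    "StartAt": "MoveData",
    "States": {
        "MoveData": {
            "Type": "Action",
            "ActionUrl": "https://actions.example.org/transfer",  # hypothetical URL
            "Parameters": {"source.$": "$.source", "destination.$": "$.destination"},
            "ResultPath": "$.MoveDataResult",
            "Next": "Done",
        },
        "Done": {"Type": "Pass", "End": True},
    },
}


def check_flow(defn):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    states = defn.get("States", {})
    if defn.get("StartAt") not in states:
        problems.append("StartAt does not name a state")
    for name, state in states.items():
        nxt = state.get("Next")
        if nxt is not None and nxt not in states:
            problems.append(f"{name}: Next points to unknown state {nxt!r}")
        if nxt is None and not state.get("End"):
            problems.append(f"{name}: needs either Next or End")
    return problems


print(check_flow(definition))
text = json.dumps(definition, indent=2)  # the text one would save as definition.json
```

A local check like this only catches dangling state references, so service-side validation remains the authoritative step.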

Python SDK/CLI
•Globus Connect Server commands
–endpoint, collection management, guest collection creation
•Timer pause/resume
•Flows run management

docs.globus.org/cli/reference/changelog/
globus-sdk-python.readthedocs.io/en/stable/changelog.html

JavaScript (JS) SDK
•Simplify integration with web applications, JS runtimes
•Support for all the services in the Globus platform
•Globus web app (app.globus.org) uses the JS SDK
github.com/globus/globus-sdk-javascript

What are we seeing the community invest in?
•(Secure) Data distribution/publication
•FAIR data/ML ready data/…
•Migration across storage systems
•Dealing with data from instruments/experiments
•Applications discovering and using “for purpose” resources
•Managed run of a compute campaign (a bag of tasks)
•Offering accessible user interfaces for complex capabilities

Some of our focus areas…
•Harmonize terminology and model for the data ecosystem
•Connector enhancements
•Increased limits for transfer, driven by automation
•MPI support for compute
•Web interfaces for compute task management for users and admins
•Expanded policy support for search indices
•Additional services for use with protected data
•…

Lowering barriers for authoring flows
globus.github.io/flows-ide/
•Schema validation
•Visualization of the flow definition
•Integration with action provider schema
•Leverage validation tools

Building portals/science gateways/applications
Platform APIs
SDKs, CLI, Helper pages
Globus Django Portal Framework
Sample portal
Globus Static Portal Framework

Globus Static Portal Framework
•Single Page Application (SPA) portals
•No-code solution
–Globus-provided generators for common use cases
–Customizable configuration of portal using JSON
•Served from any static content hosting solution
–E.g. GitHub Pages, AWS S3
•Pre-built continuous integration and deployment (CI/CD)
–Using GitHub Actions

1. Register the portal with Globus
Register an application with Globus Auth, so the portal has its own identity.

2. Create new repository from template
Template repository contains:
•Configuration template for specific use case
•Configuration of GitHub Actions to use generator and automatically deploy using GitHub Pages
•Dependabot configuration to manage dependencies

Globus-provided template repositories

3. Configure the project repository to use GitHub Actions
Configure the repository’s Pages to be deployed using GitHub Actions

4. Update configuration to customize
Configuration:
•Client ID
•Portal features:
–Title
–Privacy Policy
–Terms of Service
–Tagline
•Data served:
–Collection ID
–Path
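
The settings listed above might be gathered into a JSON document along the following lines. The key names and nesting are assumptions made for illustration and are not the template repositories' actual schema, and the IDs are placeholders.

```python
import json

REQUIRED = ("client_id", "title", "collection_id", "path")


def build_portal_config(**settings):
    """Assemble a portal configuration dict, rejecting missing required settings."""
    missing = [key for key in REQUIRED if not settings.get(key)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return {
        "client_id": settings["client_id"],
        "features": {
            "title": settings["title"],
            "tagline": settings.get("tagline", ""),
            "privacy_policy": settings.get("privacy_policy", ""),
            "terms_of_service": settings.get("terms_of_service", ""),
        },
        "data": {"collection_id": settings["collection_id"], "path": settings["path"]},
    }


config = build_portal_config(
    client_id="00000000-0000-0000-0000-000000000000",  # placeholder
    title="Example Data Portal",
    tagline="Browse and download our datasets",
    collection_id="11111111-1111-1111-1111-111111111111",  # placeholder
    path="/public/",
)
print(json.dumps(config, indent=2))
```

Failing early on a missing client ID or collection ID avoids deploying a portal that cannot authenticate or find its data.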

5. Portal is automatically deployed
•Uses GitHub Actions to build
•Deployed using GitHub Pages

6. And kept updated
•Dependency updates are managed via Dependabot

Sustaining and growing Globus

Subscriber growth

Subscribers by Subscription Type

Self-managed subscription management
Subscription groups to manage roles and privileges
Group policy and membership managed by the institution
Please update default text in your subscription group description!

Engage with the Globus team
Globus Discuss community mailing list

Our Mission
Increase the efficiency and effectiveness of researchers engaged in data-driven science and scholarship through sustainable software

An automation ecosystem
•Event-driven invocation of actions on diverse services
•Reliable and managed orchestration
•Extensible to support custom service APIs
•Delegated execution and monitoring
Flows

The continued evolution of the scientific method
https://doi.org/10.1038/s41524-022-00765-z
Increasing automation, connectivity, and scale
Timeline: 1600s, 1950s, 2000s, 2020s
•1st Paradigm, Empirical Science: observations, experimentation
•2nd Paradigm, Theoretical Science: scientific laws in physics, biology, chemistry, etc.
•3rd Paradigm, Computational Science: simulations, molecular dynamics, mechanistic models
•4th Paradigm, Big Data-driven Science: big data, machine learning; patterns, anomalies; visualization
•Accelerated Discovery: scientific knowledge at scale, AI-generated hypotheses, autonomous testing

Accelerating discovery using AI, HPC, and robotics
https://doi.org/10.1038/s41524-022-00765-z
Accelerated Scientific Method
•Extraction, integration and reasoning with knowledge at scale
•Tools help identify new questions based on needs and gaps in knowledge
•Machine representation of knowledge leads to new hypotheses and questions
•Generative models automatically propose new hypotheses that expand the discovery space
•Robotic labs automate experimentation and bridge digital models and physical testing
•Pattern and anomaly detection integrated with simulation and experiment to extract insights

Accelerating discovery using AI, HPC, and robotics
Access & integrate data, computing, instruments, services
Anywhere, any time; securely, reliably, rapidly, scalably

AI: Open science foundation model(s)
•Science and Engineering Datasets: Mathematics, Biology, Materials, Chemistry, Particle Physics, Nuclear Physics, Computer Science, Climate, Medicine, Cosmology, Fusion Energy, Accelerators, Reactors, Energy Systems, Manufacturing
•Text and Code Corpora: General Text, Media, News, Humanities, History, Law, Digital Libraries, OSTI Archive, Scientific Journals, arXiv, Code repositories, Data.gov, PubMed, Agency Archives
•Open Science Foundation Model Training
•Tuned and Adapted Downstream Models; Co-Design
•Downstream Scientific Tasks: Autonomous Experiments, Scientific Discovery, Digital Twins, Inverse Design, Code Optimization, Accelerated Simulations

AuroraGPT: A foundation model for open science
•General purpose scientific LLM: broadly trained on general corpora; scientific papers and texts; structured science data
•Explore pathways towards a “Scientific Assistant”
•Built with international partners
•Multilingual: English, 日本語, French, German, Spanish, Italian, …
•Multimodal: Images, tables, equations, proofs, time-series, graphs, fields, sequences, …
A founding member of: Trillion Parameter Consortium

AuroraGPT dFMs UX-LLM
Hybrid AI models (Community of Experts information flows)

Accelerated discovery processes
For example: A peptide expert
(Prototyped with PubMed and ChatGPT)
•Query PubMed for ChatGPT feedstock: retrieve abstracts A from PubMed that reference specified peptide
•Use ChatGPT to build hypotheses by using retrieval-augmented generation, e.g.: “Given A, on which organism is {peptide} acting?”
Arvind Ramanathan, Priyanka Setty, et al.
•We want a model with deep expertise regarding peptides and related topics
•We want to be able to make millions of such requests
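
The retrieval-augmented step described above amounts to placing the retrieved abstracts A in front of the question before sending it to the language model. Below is a minimal sketch of just that prompt-assembly step. The function name, the prompt layout, and the placeholder abstracts are all assumptions, and no PubMed or ChatGPT call is made.

```python
def build_prompt(peptide, abstracts, max_abstracts=3):
    """Combine retrieved abstracts with the question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(abstracts[:max_abstracts], start=1)
    )
    question = f"Given the abstracts above, on which organism is {peptide} acting?"
    return f"Abstracts:\n{context}\n\nQuestion: {question}"


# Placeholder abstracts; a real run would retrieve these from PubMed.
abstracts = ["First placeholder abstract.", "Second placeholder abstract."]
prompt = build_prompt("peptide-X", abstracts)
print(prompt)
```

Capping the number of abstracts keeps each prompt bounded, which matters when the goal is millions of such requests.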

Peptide agent may be used with other agents to identify antimicrobial peptides
Agents: PMC Agent, BC-BRC Agent, UniProt Agent
•Set of peptides as input
•Query PubMed for ChatGPT feedstock
•Align proteins, predict structure, rank results
•Evaluate structures and filter results
•Generate additional experiments?
Agents run on HPC/AI resources
Self-driving lab performs experiments
Candidates for experimental evaluation

Argonne Autonomous Research Laboratories (AARL)
•AARL-P: Rapid Prototyping Lab, Bldg 240
•AARL-C: Polybot, CNM User Facility, Building 440
•AARL-X: APS Sector 8-ID
•AARL-A: Airfree, Building 200
•AARL-B: Biology BSL-2, Bldg 350

Accelerating discovery using AI, HPC, and robotics
•Access & integrate data, computing, instruments, and services
•Anywhere, any time; securely, reliably, rapidly, scalably