Reactive Documents and Computational Pipelines - Bridging the Gap

globusonline, 12 slides, May 29, 2024

About This Presentation

As scientific discovery and experimentation become increasingly reliant on computational methods, the static nature of traditional publications renders them progressively fragmented and unreproducible. How can workflow automation tools, such as Globus, be leveraged to address these issues and potent...


Slide Content

LivePublication
Integrating live, distributed computational workflows with research articles
Augustus Ellerm, University of Canterbury
Mark Gahegan, University of Auckland
Benjamin Adams, University of Canterbury
Nelis Drost, University of Auckland

Beyond Prediction: Explanatory and Transparent Data Science
•NZ cross-university, multi-year project
•Explanatory data science
•Model introspection
•Provenance
•Scientific transparency

Scientists and Authors: What do we have in common (not much)
•Products of our research workflows (such as tables, graphs, descriptions of the workflow, code) should be easily, and in compelling ways, incorporated into our publications
•Preferably, articles would automagically generate well-encoded aspects of our work
•A research article should be a collection of linked research outputs

Stokel-Walker, Chris. 2023. “ChatGPT Listed as Author on Research Papers: Many Scientists Disapprove.” Nature Publishing Group UK. January 18, 2023. https://doi.org/10.1038/d41586-023-00107-z.

The digital age challenge
•Paradigm shift in Research
•Growing use of computational tools and ‘born digital’ research
•Reproducibility and Transparency
•Fragmentation of scientific record
•Limitations of traditional publication containers
•Static
•Go out of date

LivePublication: From Data, To Computation, To Publication

Workflow Management Systems
•Provide [1]
•Abstraction from infrastructure complexity
•Reuse and reproducibility frameworks
•Reporting of computational methodologies & implicit decisions regarding scientific processes
•LivePublication uses WMS to …
•Containerise real, executing scientific experiments within scientific articles
•Gather implicit and explicit data on methodologies
•Enable consistently ‘fresh’ publications
[1] Goble, Carole, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober. 2020. “FAIR Computational Workflows.” Data Intelligence 2 (1-2): 108–21.

Provenance & WMS
•Provenance can be divided into two perspectives
•The workflow plan (How) – prospective provenance
•The workflow run (What) – retrospective provenance
•LivePublication uses provenance to
•Represent runs of workflows, intermediate inputs/outputs, data sources, results, and other domain-specific metadata
•Containerise an executable workflow description through prospective provenance
•Enrich publications with usually obscured methodological information (performance, timings, scripts, & other introspective information)
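The split between prospective and retrospective provenance can be illustrated with a small sketch. This is only an assumption about how the two perspectives might be modelled; the class names `StepPlan` and `StepRun` and their fields are hypothetical and do not come from LivePublication or any WMS.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StepPlan:
    """Prospective provenance: how a step is meant to run (the plan)."""
    name: str
    script: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class StepRun:
    """Retrospective provenance: what happened when the plan was executed."""
    plan: StepPlan
    started: datetime
    ended: datetime
    produced: list[str] = field(default_factory=list)

    @property
    def duration_seconds(self) -> float:
        # Timing is one of the "usually obscured" details a run can expose.
        return (self.ended - self.started).total_seconds()

    def missing_outputs(self) -> list[str]:
        """Outputs promised by the plan that this run did not produce."""
        return [o for o in self.plan.outputs if o not in self.produced]


# Illustrative values only.
plan = StepPlan("clean", "clean.py", ["raw.csv"], ["clean.csv", "report.txt"])
run = StepRun(
    plan,
    started=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    ended=datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc),
    produced=["clean.csv"],
)
```

Keeping the plan as a separate object means one workflow description can be paired with many runs, which is the property a continuously refreshed article would rely on.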

Globus Flows & Globus Compute
•Integrating Globus Flows within the LivePublication Framework
•Two primary mechanisms:
•Globus Compute integration for generating provenance crates
•Gladier integration for generating workflow definitions to automatically manage generated provenance data
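One way to picture provenance capture around a compute function is a wrapper that records timing and host details alongside the result. The sketch below runs locally and makes no Globus API calls; the decorator name `with_provenance` and the record fields are hypothetical, and the actual LivePublication integration may work quite differently.

```python
import functools
import socket
import time
from datetime import datetime, timezone


def with_provenance(func):
    """Wrap a step so it returns its result plus a small provenance record.

    Hypothetical helper for illustration; not part of any Globus SDK.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = datetime.now(timezone.utc)
        t0 = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - t0
        record = {
            "step": func.__name__,
            "host": socket.gethostname(),       # which node ran the step
            "startTime": started.isoformat(),
            "elapsedSeconds": elapsed,
            "arguments": repr((args, kwargs)),  # kept as text for portability
        }
        return {"result": result, "provenance": record}
    return wrapper


@with_provenance
def mean(values):
    return sum(values) / len(values)


output = mean([1, 2, 3])
```

Returning the record together with the result keeps the provenance attached to the step's output, so a later stage can collect it without needing access to the node that ran the step.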

Distributed Step Crate
•Intermediate format for portable distributed node provenance data
•Extensions for performance monitoring and introspection
•Extensions for access control requirements for parsing re-use / re-execution of distributed flow
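As a rough illustration of what a per-step crate could contain, the sketch below builds RO-Crate 1.1 style JSON-LD for a single step, using a schema.org `CreateAction` to describe the run. The function `build_step_crate` and the `example:cpuSeconds` property are placeholders invented for this sketch; they are not taken from the Distributed Step Crate definition, whose actual extension terms are not given in the slides.

```python
import json


def build_step_crate(step_name, script, inputs, outputs, start, end, cpu_seconds):
    """Return RO-Crate style metadata for one distributed step (illustrative)."""
    files = [*inputs, *outputs]
    graph = [
        # Metadata descriptor and root dataset, as required by RO-Crate 1.1.
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset",
         "name": f"Step crate for {step_name}",
         "hasPart": [{"@id": p} for p in [script, *files]],
         "mentions": {"@id": "#run"}},
        {"@id": script, "@type": ["File", "SoftwareSourceCode"], "name": step_name},
        # Retrospective provenance for this step's execution.
        {"@id": "#run", "@type": "CreateAction",
         "name": f"Run of {step_name}",
         "instrument": {"@id": script},
         "object": [{"@id": p} for p in inputs],
         "result": [{"@id": p} for p in outputs],
         "startTime": start, "endTime": end,
         # Placeholder performance extension; the term name is an assumption.
         "example:cpuSeconds": cpu_seconds},
    ]
    graph += [{"@id": p, "@type": "File"} for p in files]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = build_step_crate(
    "clean", "clean.py", ["raw.csv"], ["clean.csv"],
    "2024-01-01T12:00:00Z", "2024-01-01T12:00:30Z", cpu_seconds=12.5,
)
metadata_text = json.dumps(crate, indent=2)
```

Because the output is plain JSON, a crate like this could be written next to the step's files on whichever node executed it and transferred with them.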

Provenance Run Crate | Distributed Step Crate
•Already existing Provenance specification – Provenance Run Crate
•Integration of Distributed Step Crate into the Provenance Run Crate Profile
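Folding several step crates into one run-level crate can be sketched as a merge of JSON-LD graphs. This is an assumed approach for illustration only: `merge_step_crates` is a hypothetical helper, the `#step-N-run` identifiers are made up, and the real integration into the Provenance Run Crate profile may follow different rules.

```python
import copy

# Each crate carries its own descriptor and root dataset; only the
# run-level crate's copies are kept.
SKIP_IDS = {"ro-crate-metadata.json", "./"}


def merge_step_crates(run_crate, step_crates):
    """Return a new crate containing the run crate plus all step entities."""
    merged = copy.deepcopy(run_crate)
    by_id = {e["@id"]: e for e in merged["@graph"]}
    root = by_id["./"]
    existing = root.get("mentions", [])
    mentions = existing if isinstance(existing, list) else [existing]
    root["mentions"] = mentions

    for index, step in enumerate(step_crates):
        for entity in step["@graph"]:
            if entity["@id"] in SKIP_IDS:
                continue
            entity = copy.deepcopy(entity)
            if entity.get("@type") == "CreateAction":
                # Give each step's action a unique id so steps do not collide.
                entity["@id"] = f"#step-{index}-run"
                mentions.append({"@id": entity["@id"]})
            if entity["@id"] not in by_id:  # shared files are added once
                by_id[entity["@id"]] = entity
                merged["@graph"].append(entity)
    return merged


def _step(result_file):
    # Tiny stand-in for a step crate; values are illustrative.
    return {"@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork"},
        {"@id": "./", "@type": "Dataset"},
        {"@id": "#run", "@type": "CreateAction", "result": [{"@id": result_file}]},
        {"@id": "shared.csv", "@type": "File"},
    ]}


run_crate = {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Workflow run"},
]}
merged = merge_step_crates(run_crate, [_step("a.csv"), _step("b.csv")])
```

The merge copies rather than mutates its inputs, so the individual step crates remain usable on their own after being combined.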

LivePublication summary
•LivePublication provides
•A low(er) barrier to entry for live representations of born digital research
•A framework, and tooling, which interfaces between already existing eScience technologies
•A way of keeping research fresh, relevant, and useful for longer
•LivePublication enables
•New ways of publishing longitudinal studies
•Biodiversity surveys, climate science, pandemic modelling
•The maintenance of rich metadata and provenance information of born digital research
•Reproducibility (replicability), Transparency, Reuse
•Programmatic articles which can describe multiple states of an ongoing experiment
•A user with their own data can be provided an analysis of that data from the authors’ perspective!
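The idea of a programmatic article describing multiple states of an experiment can be sketched as text re-rendered from each run's results. The template wording, field names, and numbers below are placeholders made up for illustration; they are not results from any study and not LivePublication's rendering mechanism.

```python
from string import Template

# Article text with slots that are filled from the latest workflow run.
ARTICLE = Template(
    "As of $run_date, the survey has recorded $n_species species "
    "across $n_sites sites."
)


def render_article(run_result):
    """Render the article paragraph for one state of the experiment."""
    return ARTICLE.substitute(run_result)


# Two states of the same ongoing experiment (placeholder values).
runs = [
    {"run_date": "2024-01-01", "n_species": 12, "n_sites": 3},
    {"run_date": "2024-04-01", "n_species": 17, "n_sites": 5},
]
versions = [render_article(r) for r in runs]
```

The same function applied to a reader's own run results would yield the authors' analysis text for that reader's data, which is the scenario the final bullet describes.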

•Questions?