Reactive Documents and Computational Pipelines - Bridging the Gap
globusonline
12 slides
May 29, 2024
About This Presentation
As scientific discovery and experimentation become increasingly reliant on computational methods, the static nature of traditional publications leaves them progressively fragmented and unreproducible. How can workflow automation tools, such as Globus, be leveraged to address these issues and potentially create a new, higher-value form of publication? LivePublication leverages Globus’s custom Action Provider integrations and Compute nodes to capture semantic and provenance information during distributed flow executions. This information is then embedded within an RO-Crate and interfaced with a programmatic document, creating a seamless pipeline from instruments to computation to publication.
Slide Content
LivePublication
Integrating live, distributed computational workflows with research articles
Augustus Ellerm, University of Canterbury
Mark Gahegan, University of Auckland
Benjamin Adams, University of Canterbury
Nelis Drost, University of Auckland
Beyond Prediction: Explanatory and Transparent Data Science
•NZ cross university, multi-year project
•Explanatory data science
•Model introspection
•Provenance
•Scientific transparency
Scientists and Authors: What do we have in common (not much)
•Products of our research workflows (such as tables, graphs, descriptions of the workflow, code) should be easily, and in compelling ways, incorporated into our publications
•Preferably, articles would automagically generate well-encoded aspects of our work
•A research article should be a collection of linked research outputs
Stokel-Walker, Chris. 2023. “ChatGPT Listed as Author on Research Papers: Many Scientists Disapprove.” Nature Publishing Group UK. January 18, 2023. https://doi.org/10.1038/d41586-023-00107-z.
The digital age challenge
•Paradigm shift in Research
•Growing use of computational tools and ‘born digital’ research
•Reproducibility and Transparency
•Fragmentation of scientific record
•Limitations of traditional publication containers
•Static
•Go out of date
LivePublication: From Data, To Computation, To Publication
Workflow Management Systems
•Provide [1]
•Abstraction from infrastructure complexity
•Reuse and reproducibility frameworks
•Reporting of computational methodologies & implicit decisions regarding scientific processes
•LivePublication uses WMS to …
•Containerise real, executing scientific experiments within scientific articles
•Gather implicit and explicit data on methodologies
•Enable consistently ‘fresh’ publications
[1] Goble, Carole, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober. 2020. “FAIR Computational Workflows.” Data Intelligence 2 (1-2): 108–21.
Provenance & WMS
•Provenance can be divided into two perspectives
•The workflow plan (How) – prospective provenance
•The workflow run (What) – retrospective provenance
•LivePublication uses provenance to
•Represent runs of workflows, intermediate inputs/outputs, data sources, results, and other domain-specific meta-data
•Containerise an executable workflow description through prospective provenance
•Enrich publications with usually obscured methodological information (performance, timings, scripts, & other introspective information)
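The split between prospective and retrospective provenance can be illustrated with a minimal Python sketch. The class names `PlannedStep` and `ExecutedStep`, their fields, and the example file names are hypothetical and chosen only for illustration; they are not taken from LivePublication or from any provenance specification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PlannedStep:
    """Prospective provenance: what the workflow plan says should happen."""
    name: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class ExecutedStep:
    """Retrospective provenance: what actually happened in one run."""
    plan: PlannedStep
    started: datetime
    ended: datetime
    # Maps each planned output name to where it was actually written.
    produced: dict[str, str] = field(default_factory=dict)

    @property
    def duration_seconds(self) -> float:
        return (self.ended - self.started).total_seconds()


# Hypothetical example: one planned step and one recorded run of it.
plan = PlannedStep(name="classify", inputs=["image.tif"], outputs=["labels.csv"])
run = ExecutedStep(
    plan=plan,
    started=datetime(2024, 5, 29, 10, 0, 0, tzinfo=timezone.utc),
    ended=datetime(2024, 5, 29, 10, 0, 42, tzinfo=timezone.utc),
    produced={"labels.csv": "results/labels.csv"},
)
print(run.plan.name, run.duration_seconds)
```

Keeping the plan as a separate object that each run points back to means many runs can share one plan, which is roughly what a publication describing multiple states of an ongoing experiment would need.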
Globus Flows & Globus Compute
•Integrating Globus Flows within the LivePublication Framework
•Two primary mechanisms:
•Globus Compute integration for generating provenance crates
•Gladier integration for generating workflow definitions to automatically manage generated provenance data
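One way to picture provenance capture around a compute function is a wrapper that records timings and an input fingerprint alongside the result. The sketch below is plain Python and does not call Globus Compute or Gladier; the decorator `with_provenance` and the record fields are assumptions made for illustration, not LivePublication's actual mechanism.

```python
import functools
import hashlib
import json
import time
from datetime import datetime, timezone


def with_provenance(func):
    """Wrap func so each call returns (result, provenance record)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = datetime.now(timezone.utc)
        t0 = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - t0
        record = {
            "step": func.__name__,
            "started": started.isoformat(),
            "elapsed_seconds": elapsed,
            # Fingerprint of the arguments; assumes they are JSON-serialisable.
            "inputs_sha256": hashlib.sha256(
                json.dumps([args, kwargs], sort_keys=True).encode()
            ).hexdigest(),
            "result": result,
        }
        return result, record
    return wrapper


@with_provenance
def mean(values):
    # Hypothetical analysis step standing in for real scientific code.
    return sum(values) / len(values)


value, record = mean([1, 2, 3, 6])
print(value, record["step"])
```

In a distributed setting the record would have to travel back from the remote node with the result, which is why the wrapper returns both together rather than writing to local state.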
Distributed Step Crate
•Intermediate format for portable distributed node provenance data
•Extensions for performance monitoring and introspection
•Extensions for access control requirements for parsing re-use / re-execution of distributed flow
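As a rough picture of what per-step provenance packaged as an RO-Crate might look like, the sketch below builds RO-Crate 1.1 style JSON-LD with a schema.org `CreateAction` describing one executed step. The function `step_crate_metadata`, the `#run-` identifier scheme, and the example values are hypothetical; the actual Distributed Step Crate schema and its performance and access-control extensions are not given in the slides and are not reproduced here. The output is illustrative and is not a complete, validating crate.

```python
import json


def step_crate_metadata(step_name, start_time, end_time, result_file):
    """Build minimal RO-Crate-style JSON-LD describing one executed step."""
    action_id = "#run-" + step_name  # hypothetical local identifier scheme
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor that identifies the root dataset.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: the crate itself.
                "@id": "./",
                "@type": "Dataset",
                "name": f"Step crate for {step_name}",
                "hasPart": [{"@id": result_file}],
                "mentions": [{"@id": action_id}],
            },
            {"@id": result_file, "@type": "File"},
            {
                # Retrospective record of the step execution.
                "@id": action_id,
                "@type": "CreateAction",
                "name": step_name,
                "startTime": start_time,
                "endTime": end_time,
                "result": [{"@id": result_file}],
            },
        ],
    }


meta = step_crate_metadata(
    "classify", "2024-05-29T10:00:00Z", "2024-05-29T10:00:42Z", "labels.csv"
)
print(json.dumps(meta, indent=2))
```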
Provenance Run Crate | Distributed Step Crate
•Already existing Provenance specification – Provenance Run Crate
•Integration of Distributed Step Crate into the Provenance Run Crate Profile
LivePublication summary
•LivePublication provides
•A low(er) barrier to entry for live representations of born digital research
•A framework, and tooling, which interfaces between already existing eScience technologies
•A way of keeping research fresh, relevant, and useful for longer
•LivePublication enables
•New ways of publishing longitudinal studies
•Biodiversity surveys, climate science, pandemic modelling
•The maintenance of rich metadata and provenance information of born digital research
•Reproducibility (replicability), Transparency, Reuse
•Programmatic articles which can describe multiple states of an ongoing experiment
•A user with their own data can be provided an analysis of that data from the authors' perspective!