MarcelRibeiroDantas
25 slides
May 07, 2024
About This Presentation
PyData Meetup Natal April 2024
Language: en
Added: May 07, 2024
Slides: 25 pages
Slide Content
April 2024
Bringing data provenance
and reproducibility to
your data analyses in
Python
Using computers to collect, store, analyze, and disseminate data and information
Large files
100 GB for one raw
human genome…
Many languages
Bash, Python, R, PERL…
Complex interactions
Networks of software
and their dependencies…
Workflows
Reproducibility
Hidden reproducibility issues are like an iceberg
"First, we tried to re-run the
analysis with the code and data
provided by the authors.
Second, we reimplemented the whole
method in a Python package..."
Experimenting with reproducibility:
a case study of robustness in bioinformatics
Kim et al., GigaScience
Nextflow
Managing modern workflows is complicated
A reactive workflow framework and a programming DSL
Processes
Channels
Workflow
Nextflow
A reactive workflow framework and a programming DSL
data x data y data z
Input Channel
Process A
Task 1
Task 2
Task 3
data x
data y
data z
output x
output y
output z
Nextflow
output x output y output z
Output Channel
A reactive workflow framework and a programming DSL
Process A
output x
output y
output z
Nextflow
output x output y output z
Output Channel
Process B
Task 1
Task 2
Task 3
data x
data y
data z
A reactive workflow framework and a programming DSL
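The channel and process diagrams above can be approximated in plain Python. This is only a sketch of the idea, not Nextflow's implementation: the helpers `make_channel` and `run_process` are hypothetical names, a channel is modeled as a queue, and each item taken from the channel becomes one task. Unlike Nextflow, which starts tasks as items arrive, this simplified version drains the input channel before launching tasks.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

DONE = object()  # sentinel marking the end of a channel


def make_channel(items):
    """Return a queue preloaded with the items plus a closing sentinel."""
    channel = Queue()
    for item in items:
        channel.put(item)
    channel.put(DONE)
    return channel


def run_process(func, in_channel, max_workers=3):
    """Run func once per channel item (one task each) and emit an output channel."""
    items = []
    while (item := in_channel.get()) is not DONE:
        items.append(item)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Tasks run concurrently; pool.map returns results in input order.
        results = list(pool.map(func, items))
    return make_channel(results)


def process_a(data):
    # Stand-in for the work done by "Process A" on one item.
    return data.replace("data", "output")


def process_b(data):
    # Stand-in for the work done by "Process B" on one item.
    return data.upper()


input_channel = make_channel(["data x", "data y", "data z"])
output_channel = run_process(process_a, input_channel)  # Process A
final_channel = run_process(process_b, output_channel)  # Process B

final = []
while (item := final_channel.get()) is not DONE:
    final.append(item)
print(final)
```

The output channel of one process is the input channel of the next, which is what lets the two processes be connected without either knowing about the other.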
Nextflow
A reactive workflow framework and a programming DSL
Nextflow on the Cloud
A Nextflow pipeline for audio transcription using Whisper from OpenAI
nf-whisper
YouTube
URL
Video
File
Audio
File
Transcript
File
pytube Whisper
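The stages in the diagram (YouTube URL, video file, audio file, transcript file) can be read as a chain of steps that hand files to each other. The sketch below shows only that shape. Every function is a hypothetical stand-in that writes a placeholder text file; none of them calls pytube or Whisper, and the URL is a dummy value.

```python
import tempfile
from pathlib import Path


def download_video(url, workdir):
    """Placeholder for the download step (pytube in the slide)."""
    video = Path(workdir) / "video.mp4"
    video.write_text(f"video from {url}")
    return video


def extract_audio(video, workdir):
    """Placeholder for turning the video file into an audio file."""
    audio = Path(workdir) / "audio.wav"
    audio.write_text(f"audio of {video.name}")
    return audio


def transcribe(audio, workdir):
    """Placeholder for the transcription step (Whisper in the slide)."""
    transcript = Path(workdir) / "transcript.txt"
    transcript.write_text(f"transcript of {audio.name}")
    return transcript


def run_pipeline(url, workdir):
    # Each stage consumes the file produced by the previous one.
    video = download_video(url, workdir)
    audio = extract_audio(video, workdir)
    return transcribe(audio, workdir)


with tempfile.TemporaryDirectory() as workdir:
    transcript_path = run_pipeline("https://example.com/watch", workdir)
    transcript_text = transcript_path.read_text()

print(transcript_text)
```

Because every stage communicates only through files, each one could run in its own container or on a different machine, which is the property a workflow manager builds on.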
A reactive workflow framework and a programming DSL
Parallelism
Reentrancy
(Resume partial runs)
Reusability
Subworkflow foo
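Reentrancy, resuming a partial run, can be illustrated with a small cache keyed by a hash of what a task depends on. This is a minimal sketch of the general idea and does not reproduce Nextflow's actual caching logic. The names `task_key` and `run_cached` are hypothetical, and the "script" is just a version string standing in for the task's code.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def task_key(name, script, inputs):
    """Hash the task name, its script text, and its inputs into a cache key."""
    payload = json.dumps(
        {"name": name, "script": script, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def run_cached(cache_dir, name, script, inputs, func):
    """Run func(inputs) unless a result for the same key is already stored."""
    result_file = Path(cache_dir) / f"{task_key(name, script, inputs)}.json"
    if result_file.exists():
        return json.loads(result_file.read_text()), True  # cache hit, task skipped
    result = func(inputs)
    result_file.write_text(json.dumps(result))
    return result, False


calls = []  # records every real execution of the task


def count_words(inputs):
    calls.append(inputs)
    return {"words": len(inputs["text"].split())}


with tempfile.TemporaryDirectory() as cache_dir:
    first, hit1 = run_cached(cache_dir, "count", "v1", {"text": "a b c"}, count_words)
    # Same name, script, and inputs: the stored result is reused.
    second, hit2 = run_cached(cache_dir, "count", "v1", {"text": "a b c"}, count_words)
    # Changed input: the key differs, so the task runs again.
    third, hit3 = run_cached(cache_dir, "count", "v1", {"text": "a b c d"}, count_words)

print(hit1, hit2, hit3, len(calls))
```

Only tasks whose name, code, or inputs changed are executed again; everything else is served from the cache, which is what makes resuming a long run cheap.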
Nextflow
Nextflow
Nextflow is a language, a runtime, and a community
pipeline
runtime
Task orchestration
and execution
Built-in version control
with Git
Write code
in any language
Define software
dependencies via containers
Orchestrate tasks with
dataflow programming
Nextflow is a language, a runtime, and a community
Reproducible
Integration with code
management tools, with
versioned releases.
Portable
Docker, Singularity,
Conda, works with most
compute environments.
Scalable
5 samples on your
laptop, 5k on an HPC or
5 million in the cloud.
Nextflow
Provenance report
Generates provenance
reports in different
formats
Enables partial
execution in different
platforms
Makes it easier to
connect workflows and
different technologies
Nextflow plugins
nf-prov: Nextflow plugin to render provenance reports for pipeline runs.
https://github.com/nextflow-io/nf-prov
nextflow-io/nf-prov
Different platforms
Workflow integration
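To make the idea of a provenance report concrete, here is a minimal sketch of what such a record might capture for a single task: the command, checksums of input and output files, and the environment. The field names and the `provenance_record` helper are assumptions chosen for illustration. This is not the format nf-prov produces.

```python
import hashlib
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Checksum a file so later runs can verify they saw the same bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def describe(paths):
    """List each file with its name and checksum."""
    return [{"name": Path(p).name, "sha256": sha256_of(p)} for p in paths]


def provenance_record(task, command, inputs, outputs):
    """Collect what ran, on which files, and in which environment."""
    return {
        "task": task,
        "command": command,
        "inputs": describe(inputs),
        "outputs": describe(outputs),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }


with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "input.txt"
    result = Path(workdir) / "output.txt"
    source.write_text("hello")
    result.write_text(source.read_text().upper())  # the "analysis" step
    record = provenance_record("uppercase", "str.upper", [source], [result])

print(json.dumps(record, indent=2))
```

Storing checksums rather than only file names lets a reader confirm later that a result was produced from exactly the same input bytes.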
nf-core
A community effort to collect a curated set of analysis pipelines built using Nextflow
8k+ Slack users
40k GitHub commits
2k+ GitHub contributors
16k+ Pull requests
120 GitHub repositories
7k+ GitHub issues
Pipelines
95 pipelines and a
base template
Linting
Choose conventions to
test for consistency
Modules
1150 modules
Tooling
Development and
deployment
Subworkflows
55 subworkflows
Schema
Validation, channels
and user interface
nf-core components
Pick and choose which component you need
Pipelines
Create from template,
sync to get updates
Schema
Build your pipeline
schema with a GUI
Modules
Create, install, update,
patch, test
Download
Fetch with singularity
images for offline use
Subworkflows
Create, install and
update
Linting
Test nf-core standards
and best practices
nf-core/tools
Command line tools to help you build your pipeline with ease
Participate
Seminars, training, hackathons, and more
●Bytesize seminars
●Training sessions
●Hackathons
●Social media
●Blogs
●Community Forum and Slack
●Documentation
●Mentorships
Thank you
Marcel Ribeiro-Dantas, Ph.D.
Developer Advocate at Seqera
Barcelona | Natal