PyData Meetup Presentation in Natal April 2024

Marcel Ribeiro-Dantas, 25 slides, May 07, 2024


April 2024
Bringing data provenance and reproducibility to your data analyses in Python

Using computers to collect, store, analyze, and disseminate data and information
Large files
100 GB for one raw human genome…
Many languages
Bash, Python, R, Perl…
Complex interactions
Networks of software and their dependencies…
Workflows

Reproducibility
Hidden reproducibility issues are like an iceberg
"First, we tried to re-run the analysis with the code and data provided by the authors. Second, we reimplemented the whole method in a Python package..."
Experimenting with reproducibility: a case study of robustness in bioinformatics
Kim et al., GigaScience

Nextflow
Managing modern workflows is complicated

A reactive workflow framework and a programming DSL
Processes
Channels
Workflow
Nextflow

A reactive workflow framework and a programming DSL
Input Channel: data x, data y, data z
Process A
Task 1: data x → output x
Task 2: data y → output y
Task 3: data z → output z
Nextflow
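
The diagram above, in which every item on the input channel becomes its own task of Process A, can be approximated in plain Python. This is only a sketch and not how Nextflow is implemented: `process_a` and `run_process` are hypothetical names, and a thread pool stands in for Nextflow's task scheduling.

```python
from concurrent.futures import ThreadPoolExecutor


def process_a(item: str) -> str:
    """Hypothetical task body: turn one input item into one output item."""
    return item.replace("data", "output")


def run_process(process, input_channel):
    """Run one task per channel item in parallel and collect the outputs.

    executor.map preserves input order, so the result list lines up with
    the input channel even if the tasks finish out of order.
    """
    with ThreadPoolExecutor() as executor:
        return list(executor.map(process, input_channel))


input_channel = ["data x", "data y", "data z"]
output_channel = run_process(process_a, input_channel)
print(output_channel)
```

The point of the sketch is the shape of the computation: the process is written once, and the number of tasks is decided by how many items arrive on the channel.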

Output Channel: output x, output y, output z

A reactive workflow framework and a programming DSL
Process A: output x, output y, output z
Output Channel: output x, output y, output z
Process B
Task 1: data x
Task 2: data y
Task 3: data z
Nextflow
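
Chaining two processes through a channel, as in the diagram above, can be sketched with standard-library queues. The names `run_process`, `process_a`, `process_b`, and the `DONE` sentinel are assumptions made for this illustration. Unlike Nextflow, each process here handles one item at a time, which keeps the output order deterministic.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a channel


def run_process(task, inbox, outbox):
    """Consume items from inbox, apply task, and emit results to outbox."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # propagate end-of-stream downstream
            return
        outbox.put(task(item))


def process_a(item):
    return item.replace("data", "output")


def process_b(item):
    return item.upper()


input_channel = queue.Queue()
middle_channel = queue.Queue()  # output of Process A, input of Process B
final_channel = queue.Queue()

workers = [
    threading.Thread(target=run_process, args=(process_a, input_channel, middle_channel)),
    threading.Thread(target=run_process, args=(process_b, middle_channel, final_channel)),
]
for worker in workers:
    worker.start()

# Process B starts working as soon as Process A emits its first item.
for item in ["data x", "data y", "data z"]:
    input_channel.put(item)
input_channel.put(DONE)

for worker in workers:
    worker.join()

results = []
while True:
    item = final_channel.get()
    if item is DONE:
        break
    results.append(item)
print(results)
```

Neither process knows about the other. They only share a channel, which is what makes it possible to rewire or reuse them.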

A reactive workflow framework and a programming DSL
Nextflow on the Cloud

A Nextflow pipeline for audio transcription using Whisper from OpenAI
nf-whisper
YouTube URL → Video File → Audio File → Transcript File
Tools: pytube, Whisper
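
The stages of this pipeline can be sketched as a chain of Python functions. This is not the nf-whisper code: `run_transcription_pipeline` is a hypothetical helper, and the three `fake_*` functions are stand-ins for the real download (pytube), audio extraction, and transcription (Whisper) steps, so the sketch runs without those libraries installed. The URL is a placeholder.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_transcription_pipeline(
    url: str,
    download_video: Callable[[str], Path],
    extract_audio: Callable[[Path], Path],
    transcribe: Callable[[Path], str],
    workdir: Path,
) -> Path:
    """Chain the stages: URL -> video file -> audio file -> transcript file."""
    video_file = download_video(url)
    audio_file = extract_audio(video_file)
    transcript_file = workdir / (audio_file.stem + ".txt")
    transcript_file.write_text(transcribe(audio_file), encoding="utf-8")
    return transcript_file


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)

    def fake_download(url: str) -> Path:
        # Stand-in for a pytube-based download.
        path = workdir / "talk.mp4"
        path.write_bytes(b"video")
        return path

    def fake_extract(video: Path) -> Path:
        # Stand-in for extracting the audio track from the video.
        path = video.with_suffix(".wav")
        path.write_bytes(b"audio")
        return path

    def fake_transcribe(audio: Path) -> str:
        # Stand-in for a Whisper transcription call.
        return "hello PyData"

    transcript = run_transcription_pipeline(
        "https://example.com/video", fake_download, fake_extract, fake_transcribe, workdir
    )
    transcript_name = transcript.name
    transcript_text = transcript.read_text(encoding="utf-8")

print(transcript_name, transcript_text)
```

Each stage exchanges files rather than in-memory objects, which mirrors how workflow tasks hand results to one another.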

A reactive workflow framework and a programming DSL
Parallelism
Reentrancy (resume partial runs)
Reusability
Subworkflow foo
Nextflow
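
Resuming a partial run amounts to skipping tasks whose inputs have not changed. A minimal sketch of that idea follows, assuming a JSON file cache keyed by a hash of the task name and its inputs. `task_key`, `run_cached`, and `count_words` are hypothetical names, and Nextflow's own cache is more elaborate than this.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def task_key(name: str, inputs: dict) -> str:
    """Derive a stable cache key from the task name and its inputs."""
    payload = json.dumps({"task": name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(name, func, inputs, cache_dir: Path):
    """Return (result, was_cached); func is only called on a cache miss."""
    cache_file = cache_dir / (task_key(name, inputs) + ".json")
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8")), True
    result = func(**inputs)
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result, False


calls = []  # records every real execution of the task


def count_words(text: str) -> int:
    calls.append(text)
    return len(text.split())


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    first = run_cached("count_words", count_words, {"text": "a b c"}, cache_dir)
    second = run_cached("count_words", count_words, {"text": "a b c"}, cache_dir)
    third = run_cached("count_words", count_words, {"text": "a b c d"}, cache_dir)

print(first, second, third, len(calls))
```

The second call is served from the cache, while the third has different inputs and therefore runs again.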

Nextflow
Nextflow is a language, a runtime, and a community
pipeline
runtime
Task orchestration and execution
Built-in version control with Git
Write code in any language
Define software dependencies via containers
Orchestrate tasks with dataflow programming

Nextflow is a language, a runtime, and a community
Reproducible
Integration with code management tools, with versioned releases.
Portable
Docker, Singularity, Conda; works with most compute environments.
Scalable
5 samples on your laptop, 5k on an HPC or 5 million in the cloud.
Nextflow

Nextflow plugins
Provenance report
Generates provenance reports in different formats
Different platforms
Enables partial execution on different platforms
Workflow integration
Makes it easier to connect workflows and different technologies
nf-prov: Nextflow plugin to render provenance reports for pipeline runs.
https://github.com/nextflow-io/nf-prov
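
To make the idea of a provenance report concrete, the sketch below records what a single analysis step consumed and produced. It does not reproduce the nf-prov output format: `sha256_of`, `provenance_record`, and the field names are assumptions chosen for illustration.

```python
import hashlib
import json
import platform
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Fingerprint a file so later runs can verify they used the same data."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def provenance_record(task: str, inputs: list, outputs: list) -> dict:
    """Describe one task: when it ran, where, and on which files."""
    return {
        "task": task,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "inputs": {p.name: sha256_of(p) for p in inputs},
        "outputs": {p.name: sha256_of(p) for p in outputs},
    }


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"
    clean = Path(tmp) / "clean.csv"
    raw.write_text("a,b\n1,2\n", encoding="utf-8")

    # The "analysis" step: uppercase the file contents.
    clean.write_text(raw.read_text(encoding="utf-8").upper(), encoding="utf-8")

    record = provenance_record("uppercase", [raw], [clean])
    report = json.dumps(record, indent=2)

print(report)
```

Storing a report like this next to the results lets someone check later which inputs and environment produced a given output.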

nf-core
A community effort to collect a curated set of analysis pipelines built using Nextflow
8k+ Slack users
40k GitHub commits
2k+ GitHub contributors
16k+ Pull requests
120 GitHub repositories
7k+ GitHub issues

Pipelines
95 pipelines and a base template
Linting
Choose conventions to test for consistency
Modules
1150 modules
Tooling
Development and deployment
Subworkflows
55 subworkflows
Schema
Validation, channels and user interface
nf-core components
Pick and choose which component you need

Pipelines
Create from template, sync to get updates
Schema
Build your pipeline schema with a GUI
Modules
Create, install, update, patch, test
Download
Fetch with Singularity images for offline use
Subworkflows
Create, install and update
Linting
Test nf-core standards and best practices
nf-core/tools
Command line tools to help you build your pipeline with ease

Participate
Seminars, training, hackathons, and more
● Bytesize seminars
● Training sessions
● Hackathons
● Social media
● Blogs
● Community Forum and Slack
● Documentation
● Mentorships

Thank you
Marcel Ribeiro-Dantas, Ph.D.
Developer Advocate at Seqera
Barcelona | Natal