M1-WhatIsRAP, why NSOs should invest in it

AhmedElKordy19 9 views 17 slides Mar 07, 2025

What is a RAP?
Principles of Reproducible Analytical Pipelines
Christophe Bontemps
SIAP-2024

Fundamental Principles of Official Statistics
· Clear mention of the process used to produce statistics
· To retain trust in official statistics, the statistical agencies need to decide according to strictly professional considerations, including scientific principles and professional ethics, on the methods and procedures for the collection, processing, storage and presentation of statistical data.

Usual practice: Theory vs reality

Usual practice: In the end

What are the issues?
· Lots of files
· Cut and paste is not a reliable, reproducible approach!
· Mistakes are hard to track
· Each operator has his/her own approach
· Several versions of the code may coexist
· The steps aren’t recorded
· Testing is hard
· Reproducibility is not guaranteed
· Quality is controlled only at the end

What is a Reproducible Analytical Pipeline (RAP)?
· It is a process
· It is easily repeatable
· It is easily extendable
· It is automated
· It minimises mistakes
· It is fast
· It builds trust
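
One way to make "easily repeatable" concrete is to run the same coded step twice on the same input and check that both outputs are identical. The sketch below is only an illustration: produce_table, fingerprint and the sample records are hypothetical names and data, not part of the presentation.

```python
import hashlib
import json


def produce_table(records):
    """Hypothetical pipeline step: total value per category, keys sorted."""
    totals = {}
    for category, value in records:
        totals[category] = totals.get(category, 0) + value
    # sort_keys makes the serialised output independent of input order
    return json.dumps(totals, sort_keys=True)


def fingerprint(text):
    """Return a SHA-256 digest that identifies one exact output."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    data = [("urban", 12), ("rural", 7), ("urban", 3)]  # made-up sample input
    first = fingerprint(produce_table(data))
    second = fingerprint(produce_table(data))
    print("reproducible:", first == second)
```

Storing the digest next to a published table is one possible way to confirm later that a rerun produced the same result.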

What does a RAP look like?
It is a simple process:
· linking inputs (data)
· to outputs (publication)

What does a RAP look like?
This process can be decomposed:
· Succession of tasks
· Direct linkage of actions
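
The decomposition into a succession of directly linked tasks can be sketched in Python, one of the tools named later in the deck. This is a minimal sketch under assumptions: the task names (load, clean, summarise, publish), the column names and the sample data are all hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical raw input standing in for a survey extract.
RAW_CSV = """region,income
North,100
North,140
South,90
South,
"""


def load(text):
    """Task 1: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Task 2: drop rows with a missing income and convert income to float."""
    return [
        {"region": r["region"], "income": float(r["income"])}
        for r in rows
        if r["income"].strip()
    ]


def summarise(rows):
    """Task 3: compute the mean income per region."""
    by_region = {}
    for r in rows:
        by_region.setdefault(r["region"], []).append(r["income"])
    return {region: mean(values) for region, values in sorted(by_region.items())}


def publish(summary):
    """Task 4: format the summary as a plain-text table."""
    lines = ["Region  Mean income"]
    lines += [f"{region:<7} {value:.1f}" for region, value in summary.items()]
    return "\n".join(lines)


def run_pipeline(text):
    """Link the tasks directly: the output of each is the input of the next."""
    return publish(summarise(clean(load(text))))


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Because every step is a function call, rerunning the whole chain on new data involves no copy and paste.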

What does a RAP look like?
This process can be decomposed:
· Each task is coded
  ↪ No manual actions
· Each task uses inputs
· Each task produces outputs
  ↪ Easy to test tasks individually
  ↪ Each output is identified
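
To illustrate why a coded task with explicit inputs and outputs is easy to test individually, here is a small sketch. The task (unemployment_rate), its validation rules and the test values are assumptions chosen for the example, not taken from the slides.

```python
def unemployment_rate(unemployed, labour_force):
    """Task: turn two input counts into one output, the rate in percent."""
    if labour_force <= 0:
        raise ValueError("labour_force must be positive")
    if not 0 <= unemployed <= labour_force:
        raise ValueError("unemployed must lie between 0 and labour_force")
    return round(100 * unemployed / labour_force, 1)


def test_known_value():
    # 50 unemployed out of a labour force of 1000 is 5 percent
    assert unemployment_rate(50, 1000) == 5.0


def test_rejects_impossible_input():
    # More unemployed people than the labour force must raise an error
    try:
        unemployment_rate(1200, 1000)
    except ValueError:
        return
    raise AssertionError("expected a ValueError")


if __name__ == "__main__":
    test_known_value()
    test_rejects_impossible_input()
    print("all task tests passed")
```

Checks like these run at every step, so quality is no longer controlled only at the end.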

What does a RAP look like?
This process is documented:
· Each piece of code is versioned
· Versions are annotated
  ↪ Easy to follow each task's development
  ↪ Easy to track mistakes

What does a RAP look like?
This process is easy to save:
· Each piece of code is securely saved
· Each version can be reverted
  ↪ Easy to undo/revert to a past version
  ↪ Easy to test

What are the benefits?
Analyses within a RAP are:
· Easy to use
· Easy to find information
· Easy for others to use
· Easy to revise and adapt
· Easy to reuse
· Automated and fast
· Open and promoting trust

What do we need?
· A good knowledge of the process
· A good organisation:
  - of files
  - of code
  - of documentation
· Open-source software
· A versioning system
· Time to learn
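
A good organisation of files, code and documentation can itself be created by code rather than by hand. The sketch below assumes a folder layout (data/raw, data/processed, code, docs, output) that is purely illustrative; the presentation does not prescribe one.

```python
import tempfile
from pathlib import Path

# Hypothetical project layout: one place each for files, code and documentation.
LAYOUT = ["data/raw", "data/processed", "code", "docs", "output"]


def create_layout(root):
    """Create the folders under root and return the directories that now exist."""
    root = Path(root)
    for folder in LAYOUT:
        # exist_ok lets the function be rerun safely on an existing project
        (root / folder).mkdir(parents=True, exist_ok=True)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing is left behind
    with tempfile.TemporaryDirectory() as tmp:
        for name in create_layout(tmp):
            print(name)
```

Every operator who runs this starts from the same structure, which addresses the "each operator has his/her own approach" issue raised earlier.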

RAP in practice
· Implemented in some NSOs (Vanuatu)
· Can be done easily with R/RStudio
· Can also be done with Python/Jupyter notebooks, Quarto (R, Python, Julia, others…)
· Large community to help

Let’s Start!

Useful resources
· The UK government RAP website.
· UK best practice documentation.
· A free RAP course to teach you all you need to know.
· How the Data Science Campus sets its coding standards.
· A new open-source book from the Alan Turing Institute setting out how to do reproducible data science.

Citing The Turing Way
Many of the beautiful images used in this presentation were taken from The
Turing Way book.
Full citation:
The Turing Way Community, Becky Arnold, Louise Bowler, Sarah Gibson, Patricia
Herterich, Rosie Higman, … Kirstie Whitaker. (2019, March 25). The Turing Way: A
Handbook for Reproducible Data Science (Version v0.0.4). Zenodo.
http://doi.org/10.5281/zenodo.3233986