M3-ReproducibleDocuments Reproducability analytical pipelines.pdf

AhmedElKordy19 9 views 57 slides Mar 07, 2025

About This Presentation

RAP


Slide Content

Reproducibility | Organize your work | Code for others | DRY | Takeaways
Principles of
Reproducible Analytical Pipelines
Good practices for
Reproducibility
Christophe Bontemps
Statistical Institute for Asia and the Pacific
Christophe Bontemps 1

REPRODUCIBILITY
The term "reproducible" simply means that:
the same analysis with
the same data should lead to
the same output (results)
Source: The Turing Way project
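As a minimal sketch of this definition in R: when an analysis has a random step, fixing the seed is one common way to make the same analysis on the same data return the same output. The function name `bootstrap_mean`, the data and the seed value are assumptions made for illustration only.

```r
# Hypothetical analysis with a random component: a bootstrap mean.
bootstrap_mean <- function(x, n_draws = 200, seed = 2024) {
  set.seed(seed)  # fix the state of the random number generator
  draws <- replicate(n_draws, mean(sample(x, replace = TRUE)))
  mean(draws)
}

trade_values <- c(12.5, 8.1, 15.3, 9.9, 11.2)  # made-up data

first_run  <- bootstrap_mean(trade_values)
second_run <- bootstrap_mean(trade_values)

# Same analysis + same data + same seed: the two runs agree exactly.
identical(first_run, second_run)
```

Without the `set.seed()` call the two runs would generally differ, which is exactly the situation a reproducible pipeline tries to avoid.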

REPRODUCIBILITY
Related ideas are also interesting:
Replicable: The analysis should work with other data
Robust: A different analysis (same data) leads to similar conclusions
Reusable: Some elements of the code can be used with other data
Adapted from The Turing Way project

3 MAIN PRINCIPLES:
1. Organize your work
2. Code for others
3. Do not Repeat Yourself
Apply this in context (colleagues, code, software, ...)

ORGANIZE YOUR WORK
Have a clear directory structure
- Separate files into data (raw, transformed), programs, results and documentation
- Make directories portable (relative paths)
[Figure: example of a well-organized directory structure.]
Usual:
    Mydata <- read.csv("c://document/2024/RAPCourse/Data/TradeData.csv")
Better:
    Mydata <- read.csv("Data/TradeData.csv")
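The two points above can be sketched together in base R. The folder names and the tiny data file below are assumptions chosen to mirror the slide, and the project is created in a temporary folder so the sketch can run anywhere. `setwd()` is used here only to simulate starting R from the project root.

```r
# Build a hypothetical project skeleton inside a temporary folder.
project_root <- file.path(tempdir(), "RAPCourse")
folders <- c("Data/Raw", "Data/Transformed", "Programs", "Results", "Documentation")
for (folder in folders) {
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# Write a tiny made-up data file so that it can be read back.
write.csv(data.frame(Export = c("Beef", "Kava"), Value = c(10, 20)),
          file.path(project_root, "Data", "Raw", "TradeData.csv"),
          row.names = FALSE)

# Start from the project root, then use relative paths only.
old_wd <- setwd(project_root)
Mydata <- read.csv(file.path("Data", "Raw", "TradeData.csv"))
setwd(old_wd)  # restore the previous working directory

nrow(Mydata)
```

Because every path is written relative to the project root, the whole folder can be copied to another machine and the code should still find its data.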

ORGANIZE YOUR WORK
Use naming conventions:
For files
- Avoid lazy names
- Use meaningful file names
- Show the order of execution
Usual:
    prog1.R
    prog2.R
    table.R
    model.R
Better:
    Cleaning_Data.R
    Stat_Desc.R
    Statistics_Trade.R
    Regression_Trade.R
Even better:
    01_Cleaning_data.R
    02_Stat_Desc.R
    03_Statistics_Trade.R
    04_Regression_Trade.R
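One practical benefit of zero-padded numeric prefixes is that plain alphabetical sorting gives the order of execution. The sketch below uses a hypothetical, shuffled list of script names; the commented lines show how the same idea might be applied to a real `Programs` folder, which is an assumption about the project layout.

```r
# Hypothetical script names, deliberately listed out of order.
scripts <- c("03_Statistics_Trade.R", "01_Cleaning_data.R", "02_Stat_Desc.R")

# With zero-padded prefixes, alphabetical order is the order of execution.
run_order <- sort(scripts)
run_order

# In a real project the list could come from the folder itself, e.g.:
# run_order <- sort(list.files("Programs", pattern = "\\.R$", full.names = TRUE))
# for (script in run_order) source(script)
```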

ORGANIZE YOUR WORK
Use naming conventions:
For outputs
- Avoid numbering
- Make the type of output explicit
Usual:
    Table1.pdf
    Table2.pdf
    Reg1.jpg
    Model.csv
Better:
    Stat_Desc_Table.pdf
    Trade_Stat_Table.pdf
    Reg_Trade_Graphic.jpg
    Reg_Trade_Results.csv
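A naming convention is easier to follow when the names are generated rather than typed. The helper `output_name()` below is hypothetical; it simply assembles the analysis, the type of output and the file extension in a fixed order.

```r
# Hypothetical helper: build "<analysis>_<output type>.<extension>".
output_name <- function(analysis, type, ext) {
  paste0(analysis, "_", type, ".", ext)
}

output_name("Stat_Desc", "Table", "pdf")
output_name("Reg_Trade", "Results", "csv")
```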

ORGANIZE YOUR WORK
Keep track of the workflow:
- Cut and paste should be avoided
- Every step of the process is coded
- Manage (and draw) the workflow
[Figure: example of a simple workflow.]
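A simple way to make every step of the process coded is to write one small function per step and chain them. The sketch below is only an illustration: the three function names and the made-up data are assumptions, and the import step is simulated instead of reading a file.

```r
# Step 1: import (simulated here with made-up values instead of a file).
import_data <- function() {
  data.frame(Export = c("Beef", "Kava", "Beef", NA),
             Value  = c(10, 20, 30, 40))
}

# Step 2: clean, dropping rows with a missing export type.
clean_data <- function(raw) {
  raw[!is.na(raw$Export), ]
}

# Step 3: analyse, computing the total value by export type.
summarise_exports <- function(clean) {
  aggregate(Value ~ Export, data = clean, FUN = sum)
}

# The workflow itself is code: raw -> clean -> results.
raw     <- import_data()
clean   <- clean_data(raw)
results <- summarise_exports(clean)
results
```

Nothing is copied and pasted between steps, so rerunning the last four lines rebuilds the results from the raw data.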

ORGANIZE YOUR WORK
Use a version control system (Git/GitHub)

CODE FOR OTHERS (INCLUDING YOUR "future self")
Program with style:
- Use literate programming
  “Let us concentrate rather on explaining to human beings what we want a computer to do”
  D. Knuth (1984)
- Use conventions on layout (comments, indentation, ...)
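As a small illustration of coding for a human reader, the function below states its purpose, inputs and output before any code, and comments each step with the reason for it. The function `export_share()`, the column names `Export` and `Value`, and the data are assumptions for the example. The `#'` header follows the roxygen2 comment style, which plain R treats as ordinary comments.

```r
#' Share of one export type in the total export value.
#'
#' @param data A data frame with columns `Export` and `Value` (assumed names).
#' @param type The export type of interest, e.g. "Beef".
#' @return A single number between 0 and 1.
export_share <- function(data, type) {
  # Keep only the rows for the requested export type.
  selected <- data[data$Export == type, ]

  # A share is the part divided by the whole.
  sum(selected$Value) / sum(data$Value)
}

TradeData <- data.frame(Export = c("Beef", "Kava", "Beef"),
                        Value  = c(10, 20, 30))  # made-up values
export_share(TradeData, "Beef")
```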

CODE FOR OTHERS
Program with style:
Use the pipe operator %>% (tidyverse)
Classic R programming:
    nrow(filter(TradeData, Export == "Beef"))
With pipe operator %>%:
    TradeData %>%
      filter(Export == "Beef") %>%
      nrow()
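The same contrast can be tried without installing any package, using the native pipe `|>` that base R provides from version 4.1. This is a sketch with made-up data, and `subset()` stands in for the tidyverse `filter()`; it is not the example from the slide.

```r
# Made-up data standing in for TradeData.
TradeData <- data.frame(Export = c("Beef", "Kava", "Beef", "Copra"),
                        Value  = c(10, 20, 30, 5))

# Nested calls read from the inside out.
n_nested <- nrow(subset(TradeData, Export == "Beef"))

# The native pipe reads from top to bottom, one step per line.
n_piped <- TradeData |>
  subset(Export == "Beef") |>
  nrow()

c(nested = n_nested, piped = n_piped)
```

Both forms compute the same count; the piped form lists the steps in the order in which they happen.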

CODE FOR OTHERS
Program with style
- Avoid ambiguities
Usual:
    sex <- ifelse(gender == "1001", 1, 2)
Better:
    female <- ifelse(gender == "1001", 1, 0)
    male <- ifelse(gender != "1001", 1, 0)
- Avoid changing units
Usual:
    gdp <- gdp / 118.722
Better:
    gdp_US <- gdp / 118.722
Even better:
    US_Vanu_exch_rate <- 118.722
    gdp_US <- gdp / US_Vanu_exch_rate
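Going one step further, the conversion can be wrapped in a small function so that the rate is named once and the original values are never overwritten. In this sketch the helper `local_to_usd()`, the variable `gdp_local` and the amounts are assumptions; only the rate 118.722 comes from the slide, and its reference date is not given there.

```r
# Named constant: the exchange rate used on the slide
# (local currency units per US dollar; its date should be documented in real work).
US_Vanu_exch_rate <- 118.722

# Hypothetical helper: convert an amount in local currency to US dollars.
local_to_usd <- function(amount_local, rate = US_Vanu_exch_rate) {
  amount_local / rate
}

gdp_local <- c(100000, 250000)   # made-up values, in local currency
gdp_US    <- local_to_usd(gdp_local)

# The original variable is untouched, and each name states its unit.
data.frame(gdp_local, gdp_US)
```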

DO NOT REPEAT YOURSELF
Create reusable objects
- Store values
- Avoid repetitions
- Use functions
Usual:
    Current_Data <- subset(Mydata, year == 2023)
Better:
    Current_year <- 2023
    Current_Data <- subset(Mydata, year == Current_year)
Usual:
    data <- Mydata[Mydata$export == "Beef", ]
    plot(data$Year, data$Value,
         main = "Export for Beef")
    data <- Mydata[Mydata$export == "Kava", ]
    plot(data$Year, data$Value,
         main = "Export for Kava")
    ...
Better:
    type <- "Beef"
    Mydata %>%
      filter(export == type) %>%
      ggplot() +
      aes(x = Year, y = Value) +
      geom_point() +
      ggtitle(paste("Export for", type))
Even better:
    # One function draws the graphic for any export type
    Exports_graphic <- function(type) {
      Mydata %>%
        filter(export == type) %>%
        ggplot() +
        aes(x = Year, y = Value) +
        geom_point() +
        ggtitle(paste("Export for", type))
    }
    Exports_graphic("Beef")
    Exports_graphic("Kava")
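Once the repeated code lives in a function, the remaining repetition (one call per export type) can also be removed by applying the function to every type found in the data. The sketch below uses base R only, with made-up data and a hypothetical helper `total_for()` that returns a number instead of a graphic so the result is easy to check.

```r
# Made-up data standing in for Mydata.
Mydata <- data.frame(export = c("Beef", "Kava", "Beef", "Kava"),
                     Year   = c(2022, 2022, 2023, 2023),
                     Value  = c(10, 20, 30, 25))

# Hypothetical helper: total export value for one type.
total_for <- function(type, data = Mydata) {
  sum(data$Value[data$export == type])
}

# One call handles every type found in the data; nothing is copied by hand.
types  <- unique(Mydata$export)
totals <- sapply(types, total_for)
totals
```

If a new export type appears in the data, it is picked up automatically without editing the code.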

OTHER PRINCIPLES
- Discuss with colleagues who may use your work
- Automate as much as you can
  → Reduces your brain's memory burden
- There are easy steps everybody can do
  → Write small programs, one for each task
- Use open-source programs
  → Easier to share, easier to automate
  → Also cost-effective
- Test your work regularly:
“Do what has been said, say what has been done, and check that what has been said has really been done!”
“Code what has been said, say what has been coded, and check that what has been said has really been coded!”
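Checking that what has been said has really been coded can start with a few assertions placed right after each step. This is a minimal sketch with made-up data; the rule being checked ("rows with a missing export type are dropped") is an assumption chosen for the example, and base R's `stopifnot()` stops the script as soon as one check fails.

```r
# Made-up input with one problem row (a missing export type).
raw <- data.frame(Export = c("Beef", "Kava", NA),
                  Value  = c(10, 20, 30))

# What has been said: "rows with a missing export type are dropped".
clean <- raw[!is.na(raw$Export), ]

# Check that what has been said has really been coded.
stopifnot(
  !anyNA(clean$Export),          # no missing export type is left
  nrow(clean) == nrow(raw) - 1,  # exactly one row was dropped
  all(clean$Value >= 0)          # values are non-negative
)
```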