Reproducibility | Organize your work | Code for others | DRY | Takeaways
Principles of
Reproducible Analytical Pipelines
Good practices for
Reproducibility
Christophe Bontemps
Statistical Institute for Asia and the Pacific
REPRODUCIBILITY
The term reproducible simply means that:
the same analysis with
the same data should lead to
the same output (results)
Source: The Turing Way project
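A minimal sketch of what this means in practice, assuming a made-up data set and a hypothetical helper `run_analysis()` (neither comes from the course material). Any random step is given a fixed seed, so running the same code on the same data twice should return identical results.

```r
# Made-up trade data, for illustration only
TradeData <- data.frame(
  Year  = 2015:2024,
  Value = c(12, 15, 14, 18, 21, 19, 23, 26, 25, 30)
)

# Hypothetical analysis: bootstrap the mean export value.
# The seed is fixed inside the function so the resampling is repeatable.
run_analysis <- function(data, seed = 2024) {
  set.seed(seed)
  boot_means <- replicate(200, mean(sample(data$Value, replace = TRUE)))
  c(mean = mean(data$Value), boot_se = sd(boot_means))
}

first_run  <- run_analysis(TradeData)
second_run <- run_analysis(TradeData)

# Same analysis + same data => same output
identical(first_run, second_run)
```

Without the call to `set.seed()`, the two runs would generally differ, which is one common way an analysis stops being reproducible.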
REPRODUCIBILITY
Related ideas are also interesting:
Replicable: The analysis should work with other data
Robust: A different analysis (same data) leads to similar conclusions
Reusable: Some elements of the code can be used with other data
Adapted from The Turing Way project
3 MAIN PRINCIPLES:
1. Organize your work
2. Code for others
3. Do not Repeat Yourself (DRY)
Apply this in context (colleagues, code, software, ...)
ORGANIZE YOUR WORK
Have a clear directory structure
- Separate files into data (raw, transformed), programs, results and documentation
- Make directories portable (relative paths)
[Figure: example of a well-organized directory structure]
Usual
Mydata <- read.csv("c://document/2024/RAPCourse/Data/TradeData.csv")
Better
Mydata <- read.csv("Data/TradeData.csv")
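As a rough sketch, the folder layout and the relative path can be set up in code. The folder names, the demo location inside `tempdir()` and the one-row data file are assumptions made for illustration; in a real project the root would be the project folder itself.

```r
# Demo project root in a temporary folder (hypothetical location)
project_root <- file.path(tempdir(), "RAPCourse_demo")

# One folder per kind of file; the names are illustrative
folders <- c("Data/Raw", "Data/Transformed", "Programs", "Results", "Documentation")
for (folder in folders) {
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# Paths are built relative to the root, so moving the project
# only requires changing project_root
raw_file <- file.path(project_root, "Data", "Raw", "TradeData.csv")
write.csv(data.frame(Year = 2023, Export = "Beef", Value = 10),
          raw_file, row.names = FALSE)

Mydata <- read.csv(raw_file)
```

`file.path()` also takes care of the path separator, so the same script should run unchanged on Windows, macOS and Linux.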
ORGANIZE YOUR WORK
Use naming conventions:
For files
- Avoid lazy names
- Meaningful file names
- Order of execution
Usual
prog1.R
prog2.R
table.R
model.R
Better
Cleaning_Data.R
Stat_Desc.R
Statistics_Trade.R
Regression_Trade.R
Even better
01_Cleaning_data.R
02_Stat_Desc.R
03_Statistics_Trade.R
04_Regression_Trade.R
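One benefit of numbered file names is that a short master script can run every step in order. The sketch below assumes a demo folder with placeholder scripts whose only job is to record that they ran; the folder location and the script contents are hypothetical.

```r
# Demo folder with placeholder scripts (hypothetical contents)
programs_dir <- file.path(tempdir(), "Programs_demo")
dir.create(programs_dir, showWarnings = FALSE)

script_names <- c("02_Stat_Desc.R", "01_Cleaning_data.R", "03_Statistics_Trade.R")
for (name in script_names) {
  # Each placeholder script appends its own name to steps_run
  writeLines(sprintf('steps_run <- c(steps_run, "%s")', name),
             file.path(programs_dir, name))
}

# Master script: find the numbered scripts and run them in order
steps_run <- character(0)
scripts <- sort(list.files(programs_dir, pattern = "^[0-9]{2}_.*\\.R$",
                           full.names = TRUE))
for (script in scripts) {
  source(script, local = TRUE)
}

steps_run
```

The numeric prefix is what makes the alphabetical sort match the intended order of execution.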
ORGANIZE YOUR WORK
Use naming conventions:
For outputs
- Avoid numbering
- Explicit type of output
Usual
Table1.pdf
Table2.pdf
Reg1.jpg
Model.csv
Better
Stat_Desc_Table.pdf
Trade_Stat_Table.pdf
Reg_Trade_Graphic.jpg
Reg_Trade_Results.csv
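One possible way to keep output names consistent is to build them in code from what the file contains. The helper `output_name()`, the demo folder and the regression numbers below are all hypothetical.

```r
# Hypothetical helper: build an output file name from what the file contains
output_name <- function(analysis, type, extension) {
  paste0(analysis, "_", type, ".", extension)
}

results_dir <- file.path(tempdir(), "Results_demo")  # demo location only
dir.create(results_dir, showWarnings = FALSE)

# Made-up regression results, saved under an explicit name
reg_results <- data.frame(term = c("(Intercept)", "Year"),
                          estimate = c(1.5, 0.2))
results_file <- file.path(results_dir,
                          output_name("Reg_Trade", "Results", "csv"))
write.csv(reg_results, results_file, row.names = FALSE)

basename(results_file)
```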
ORGANIZE YOUR WORK
Keep track of the workflow:
- Cut and paste should be avoided
- Every step of the process is coded
- Manage (and draw) the workflow
[Figure: example of a simple workflow]
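A small sketch of a workflow in which every step is code rather than manual editing. The data are made up, and the function names `clean_data()` and `describe_data()` are hypothetical.

```r
# Made-up raw data with a duplicated row and a missing value
raw_data <- data.frame(
  Year   = c(2021, 2022, 2022, 2023, 2023),
  Export = c("Beef", "Beef", "Beef", "Kava", "Kava"),
  Value  = c(10, 12, 12, NA, 8)
)

# Step 1: cleaning (drop duplicated rows, then rows with missing values)
clean_data <- function(data) {
  data <- unique(data)
  data[complete.cases(data), ]
}

# Step 2: descriptive statistics (total value per export type)
describe_data <- function(data) {
  aggregate(Value ~ Export, data = data, FUN = sum)
}

# The workflow itself is code: raw data -> clean data -> summary table
cleaned <- clean_data(raw_data)
summary_table <- describe_data(cleaned)
summary_table
```

Because no step relies on cut and paste, rerunning the script after the raw data change updates every downstream result.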
ORGANIZE YOUR WORK
Use a version control system (Git/GitHub)
CODE FOR OTHERS (INCLUDING YOUR "future self")
Program with style:
- Use literate programming
  "Let us concentrate rather on explaining to humans what we want the computer to do"
  D. Knuth (1984)
- Use conventions on layout (comments, indentation, ...)
CODE FOR OTHERS
Program with style:
Use pipe operator %>% (tidyverse)
Classic R programming
library(dplyr)  # provides filter() and %>%
nrow(filter(TradeData, Export == "Beef"))
With pipe operator %>%
TradeData %>%
  filter(Export == "Beef") %>%
  nrow()
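The same idea can be tried without installing any package. The sketch below swaps in base R's native pipe `|>` (available from R 4.1) and `subset()` in place of the tidyverse `%>%` and `filter()`; the data are made up.

```r
# Made-up data, for illustration only
TradeData <- data.frame(
  Export = c("Beef", "Kava", "Beef", "Copra"),
  Value  = c(10, 8, 12, 5)
)

# Nested call: has to be read from the inside out
n_nested <- nrow(subset(TradeData, Export == "Beef"))

# Native pipe (R >= 4.1): reads from top to bottom
n_piped <- TradeData |>
  subset(Export == "Beef") |>
  nrow()

c(nested = n_nested, piped = n_piped)
```

Both forms count the same rows; the piped version lists the steps in the order in which they happen.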
CODE FOR OTHERS
Program with style
- Avoid ambiguities
- Avoid changing units
Usual
sex <- ifelse(gender == "1001", 1, 2)
Better
female <- ifelse(gender == "1001", 1, 0)
male <- ifelse(gender != "1001", 1, 0)
Usual
gdp <- gdp/118.722
Better
gdp_US <- gdp / 118.722
Even better
US_Vanu_exch_rate <- 118.722
gdp_US <- gdp / US_Vanu_exch_rate
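Put together, the two recommendations might look like the sketch below. The GDP figures and the `gender` codes other than "1001" are made up, and the rate is assumed to be expressed in vatu per US dollar, which the slide does not state.

```r
# Made-up GDP figures in local currency, for illustration only
gdp_vatu <- c(110000, 118722, 125000)

# Value from the slide, assumed here to be vatu per 1 US dollar
US_Vanu_exch_rate <- 118.722

# The original vector is left untouched; the new name carries the unit
gdp_US <- gdp_vatu / US_Vanu_exch_rate

# Unambiguous 0/1 indicators instead of a 1/2 code
gender <- c("1001", "1002", "1001")
female <- ifelse(gender == "1001", 1, 0)
male   <- 1 - female

data.frame(gdp_vatu, gdp_US, female, male)
```

Keeping `gdp_vatu` and `gdp_US` side by side means the conversion can be rerun safely; overwriting `gdp` would divide it again on every run.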
DO NOT REPEAT YOURSELF
Create reusable objects
- Store values
- Avoid repetitions
- Use functions
Usual
Current_Data <- subset(Mydata, year == 2023)
Better
Current_year <- 2023
Current_Data <- subset(Mydata, year == Current_year)
Usual
data <- Mydata[Mydata$exports == "Beef", ]
plot(data$Year, data$Value,
     main = "Export for Beef")
data <- Mydata[Mydata$exports == "Kava", ]
plot(data$Year, data$Value,
     main = "Export for Kava")
...
Better
library(dplyr)    # filter() and %>%
library(ggplot2)  # ggplot(), aes(), geom_point(), ggtitle()
type <- "Beef"
Mydata %>%
  filter(exports == type) %>%
  ggplot() +
  aes(x = Year, y = Value) +
  geom_point() +
  ggtitle(paste("Export for", type))
Even better
# Requires library(dplyr) and library(ggplot2)
Exports_graphic <- function(type) {
  Mydata %>%
    filter(exports == type) %>%
    ggplot() +
    aes(x = Year, y = Value) +
    geom_point() +
    ggtitle(paste("Export for", type))
}
Exports_graphic("Beef")
Exports_graphic("Kava")
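Once the plot lives in a function, the remaining repetition (one call per export type) can also be removed by looping over the types found in the data. The sketch below is a base-graphics variant, not the ggplot2 function from the slide, so that it runs without packages; the data, the name `Exports_graphic_base()` and the extra `data` argument are assumptions.

```r
# Made-up data, for illustration only
Mydata <- data.frame(
  Year    = rep(2021:2023, times = 2),
  exports = rep(c("Beef", "Kava"), each = 3),
  Value   = c(10, 12, 15, 8, 7, 9)
)

# Base-graphics variant of the plotting function.
# It draws the plot and invisibly returns the rows it used.
Exports_graphic_base <- function(data, type) {
  subset_data <- data[data$exports == type, ]
  plot(subset_data$Year, subset_data$Value,
       xlab = "Year", ylab = "Value",
       main = paste("Export for", type))
  invisible(subset_data)
}

pdf(NULL)  # null graphics device, so the demo does not write a file
export_types <- unique(Mydata$exports)
plotted <- lapply(export_types,
                  function(type) Exports_graphic_base(Mydata, type))
invisible(dev.off())

names(plotted) <- export_types
sapply(plotted, nrow)
```

If a new export type appears in the data, it is plotted automatically without editing the script.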
OTHER PRINCIPLES
- Discuss with colleagues who may use your work
- Automate as much as you can
  -> Reduces your brain's memory burden
- There are easy steps everybody can do
  -> Write small programs, one for each task
- Use open-source software
  -> Easier to share, easier to automate
  -> Also cost-effective
- Test your work regularly:
Do what has been said, say what has been done, and
check that what has been said has really been done!
Code what has been said, say what has been coded, and
check that what has been said has really been coded!
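A minimal sketch of checking "that what has been said has really been coded", using base R's `stopifnot()` with named conditions (this form needs R 4.0 or later). The data, the function `check_trade_data()` and the three rules are made up for illustration.

```r
# Made-up cleaned data, for illustration only
Mydata <- data.frame(
  Year    = c(2021, 2022, 2023),
  exports = c("Beef", "Beef", "Kava"),
  Value   = c(10, 12, 8)
)

# Hypothetical checks: each one turns a statement about the data into code
check_trade_data <- function(data) {
  stopifnot(
    "Value must not be missing"  = !anyNA(data$Value),
    "Value must be non-negative" = all(data$Value >= 0),
    "Year must be in 2000-2100"  = all(data$Year >= 2000 & data$Year <= 2100)
  )
  invisible(TRUE)
}

check_trade_data(Mydata)  # passes silently

# A broken data set is caught with an explicit error message
bad_data <- transform(Mydata, Value = c(10, NA, 8))
result <- tryCatch(check_trade_data(bad_data),
                   error = function(e) conditionMessage(e))
result
```

Running such checks at the end of each script is one simple way to test regularly; dedicated packages such as testthat offer more structure for larger projects.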