Replication Crisis in Psychology.ppt

Slide Content

2011: Daryl Bem published a paper in JPSP (a top journal) claiming to have found evidence that ESP exists
Post-publication, the significant effects were attributed to an excessive familywise error rate
Bem did not correct alpha for multiple comparisons (e.g., through a Bonferroni correction)
His evidence for ESP was chalked up to false positives (i.e., Type I errors)
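
To see why uncorrected multiple comparisons inflate the familywise error rate, here is a minimal Python sketch of the problem and the Bonferroni fix; the number of tests is hypothetical, not taken from Bem's paper.

```python
# Minimal sketch: familywise error rate (FWER) under multiple comparisons.
# The number of tests is illustrative only.
alpha = 0.05
n_tests = 10  # hypothetical number of independent significance tests

# Probability of at least one false positive when each test uses alpha = .05
fwer_uncorrected = 1 - (1 - alpha) ** n_tests
print(f"Uncorrected FWER: {fwer_uncorrected:.3f}")  # ~0.401

# Bonferroni correction: divide alpha by the number of tests
alpha_bonf = alpha / n_tests
fwer_corrected = 1 - (1 - alpha_bonf) ** n_tests
print(f"Bonferroni-corrected FWER: {fwer_corrected:.3f}")  # ~0.049
```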

2011: Cases of scientific fraud
Diederik Stapel caught fabricating data
2011-present: Major findings in social psychology have not replicated
Ego depletion, embodied cognition, power posing
Behavioural priming under fire
2015: The Open Science Collaboration failed to replicate most studies in psychology
Strack: a fake smile boosts actual happiness
Cuddy: power posing boosts testosterone and decreases cortisol

Replicability of results is essential for science
If results do not replicate, how can we be sure that the effects exist at all?
The Open Science Collaboration attempted to replicate 100 studies published in 3 top psychology journals in 2008
Replications used materials supplied by the original authors and were high-powered
Results: 39% of the original studies were successfully replicated
25% of social psychology studies replicated
50% of cognitive psychology studies replicated
Effect sizes were overestimated in the original studies

Sampling variability: even with a direct replication, no two samples are exactly the same
Hidden moderators across studies
E.g., different cultural contexts, time in history, demographic characteristics
Low statistical power
False positives in the original study
False negatives in the replication study
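
To put a number on the low-power point, a hedged sketch using Python's statsmodels (the effect size and sample size here are hypothetical): even a real effect will often fail to reach significance in an underpowered replication.

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical scenario: a real effect of d = 0.4, replicated with
# 30 participants per group in a two-sided independent-samples t-test
power = TTestIndPower().power(effect_size=0.4, nobs1=30, alpha=0.05)
print(f"Chance the replication comes out significant: {power:.2f}")  # ~0.33
# Two out of three such replications would "fail" despite a real effect
```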

Traditionally, journals were biased in favour of publishing:
1. Significant findings
Non-significant findings relegated to a researcher’s “file drawer”
2. Novel findings
Counterintuitive, surprising findings more likely to be published in top journals

Academics are incentivized to publish “flashy” results in top journals
Earns jobs, promotions, editorships at journals, traditional media coverage, esteem
Academics may engage in questionable research practices or, in the worst-case scenario, engage in data fraud, all to drive p-values below .05
Results in a plethora of false positives in the research literature that don’t replicate

False positives are Type I errors
Claim an effect exists when it actually doesn’t
i.e., incorrectly reject a null hypothesis that is actually true (there really is no effect)
Setting alpha at .05 means that you are willing to accept a 5% chance of a false positive
False negatives are Type II errors
Claim an effect doesn’t exist when it actually does
Simmons, Nelson, & Simonsohn (2011) demonstrated that it’s easy to statistically support a hypothesis that is actually false (i.e., find a false positive)
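
A minimal simulation in the spirit of Simmons et al. (2011), though not their actual code: every data point below is pure noise, yet allowing either of two reasonable-looking analyses to “count” pushes the false positive rate well above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_per_group = 10_000, 20
hits_one = hits_either = 0

for _ in range(n_sims):
    # Two groups, two measures per person, no true effect anywhere
    a = rng.normal(size=(n_per_group, 2))
    b = rng.normal(size=(n_per_group, 2))
    p1 = stats.ttest_ind(a[:, 0], b[:, 0]).pvalue                # DV 1 alone
    p2 = stats.ttest_ind(a.mean(axis=1), b.mean(axis=1)).pvalue  # or a composite
    hits_one += p1 < 0.05
    hits_either += (p1 < 0.05) or (p2 < 0.05)

print(f"One planned analysis:   {hits_one / n_sims:.3f}")     # ~0.05
print(f"Either analysis counts: {hits_either / n_sims:.3f}")  # clearly > 0.05
```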

Can be dangerous
E.g., health research: claim that a treatment is effective when actually it isn’t, or even has negative side effects
Wastes resources
Researchers waste time, effort, and money conducting research on effects that don’t actually exist
Hard to excise false positives from the literature
Not enough incentives for researchers to conduct replications that debunk the false positives
Erodes the credibility of psychological science
“Fake science”

Researchers may engage in questionable research practices not because they intend to be dishonest
Rather, they are motivated to make decisions that support their hypotheses
Make decisions so that the p-value falls below .05 and they can claim statistical significance
Pervasive among researchers
But with every self-serving decision made, the chance of a false positive increases

Researchers have many decisions to make when conducting a study:
Choosing a sample size and when to stop data collection
How to deal with outliers/illegitimate responses
Which conditions/groups should be compared
Creating variables
Which items? Transformations?
Which variables should be included in analyses
IVs, DVs, controls, mediators, moderators
Each of these flexible decision points increases the false positive rate

Common for researchers to collect data, conduct analyses, and, if results are not significant, to collect more data
Stop collecting data when p < .05
The more often you test for significance, the higher the likelihood of a false positive
The lower the initial sample size, and the fewer the participants added after each subsequent data collection, the higher the likelihood of a false positive
Lakens: sequential analysis allows planned interim analyses with adjusted alpha levels
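
A hedged sketch of why optional stopping inflates false positives (the batch sizes and stopping rule are illustrative): both groups are drawn from the same distribution, so every “significant” result here is a Type I error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, start_n, step, max_n = 5_000, 20, 10, 100
false_positives = 0

for _ in range(n_sims):
    # The null is true: both groups come from the same distribution
    a = list(rng.normal(size=start_n))
    b = list(rng.normal(size=start_n))
    while True:
        if stats.ttest_ind(a, b).pvalue < 0.05:  # "significant" -> stop, publish
            false_positives += 1
            break
        if len(a) >= max_n:                      # out of resources -> give up
            break
        a.extend(rng.normal(size=step))          # collect more data, re-test
        b.extend(rng.normal(size=step))

print(f"False positive rate with optional stopping: {false_positives / n_sims:.3f}")
# Well above .05 (roughly .10-.14 with these settings)
```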

Researchers may increase the false positive rate even when not engaging in questionable research practices
They have multiple choices to make when analysing data, depending on the data at hand
The choices follow from their hypotheses, and are not an egregious “p-hacking/data fishing” expedition
However, the fact that there are so many analytic paths to take increases the researcher degrees of freedom and the likelihood of false positives
Small sample sizes and measurement error make it likely that results will not replicate
Can we trust any published findings anymore?
What can we do to reduce false positives?

P-hacking: a “data fishing expedition”
Try different types of analyses until the p-value is driven below .05
P-curve analysis attempts to detect the presence of p-hacking and publication bias/the file drawer problem (Simonsohn, Nelson, & Simmons, 2014)
The p-curve is the distribution of significant p-values in a body of research

P-curve for p-hacked data will be left-skewed (tail on the left side)
More p-values around .04 or .05 than .01 or .02
P-curve for non-p-hacked data will be right-skewed (tail on the right side)
More p-values around .01 or .02 than .04 or .05
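
A toy illustration of reading a p-curve (simplified; the actual p-curve method of Simonsohn et al. uses formal tests, and these p-values are invented for the example):

```python
import numpy as np

# Hypothetical significant p-values from two bodies of research
true_effect = [0.001, 0.004, 0.008, 0.012, 0.019, 0.025, 0.033, 0.041]
p_hacked = [0.021, 0.032, 0.038, 0.041, 0.044, 0.046, 0.048, 0.049]

bins = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]  # bin edges across the .05 range
for label, pvals in (("true effect", true_effect), ("p-hacked", p_hacked)):
    counts, _ = np.histogram(pvals, bins=bins)
    print(label, counts)
# true effect [3 2 1 1 1] -> mass near .01: right-skewed p-curve
# p-hacked    [0 0 1 2 5] -> mass near .05: left-skewed p-curve
```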

HARKing: Hypothesizing After the Results are Known
Look at the data first and then create a post-hoc hypothesis; present it as if it were developed a priori
Instead, state your hypotheses before collecting data
Make your hypotheses as clear as possible in your dissertation proposals
They should be directional, and comprehensive yet parsimonious

Bad hypothesis: “There will be a significant difference in collectivism between Americans and Indians” (non-directional)
Good hypothesis: “Indians will be significantly higher in interdependence than Americans”
Even better: build a process model into your hypotheses (mediation and/or moderation)
Example: “Indians will be significantly higher in interdependence than Americans and, in turn, will demonstrate a more holistic cognitive style”

Researchers need to disclose their degrees of freedom so reviewers/other researchers can fully evaluate their work
Preregistration
List all variables, materials, hypotheses, and analysis plans before collecting data
Eliminates selective reporting of results and HARKing
Materials and procedure archived online (e.g., the Open Science Framework: https://osf.io)
Open access to data
A public repository of findings to reduce the file drawer problem

Simmons et al. (2011) suggest at least N = 20 per group, but this isn’t large enough
Within-subjects designs rather than between-subjects designs
More power for the same number of participants

Cohen’s effect size guidelines:
Effect size as a correlation (r): small = .10, medium = .30, large = .50
Effect size as a mean difference (d): small = 0.2, medium = 0.5, large = 0.8
Small effect sizes can only be accurately detected with high statistical power
Download G*Power for power analyses
http://www.gpower.hhu.de/en.html
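
For those who prefer scripting to G*Power’s point-and-click interface, the same kind of calculation can be done with the statsmodels library (an alternative tool, not mentioned on the slide):

```python
from statsmodels.stats.power import TTestIndPower

# Per-group N needed for 80% power at alpha = .05, two-sided t-test
analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):  # Cohen's small, medium, large
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"d = {d}: ~{n:.0f} participants per group")
# d = 0.2: ~393 per group; d = 0.5: ~64; d = 0.8: ~26
```

Note how quickly the required sample grows as effects get smaller, which is why N = 20 per group is inadequate for most effects in psychology.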

Exploratory work is necessary and important
However, it should be stated in the paper that the work was exploratory. Do not try to pass off exploratory work as confirmatory.
Best to do an exact replication of the exploratory work to see if it can be confirmed
But this is resource- and time-intensive
Two studies: the first is exploratory but still based on theory; the second is confirmatory and preregistered

Behavioural (or social) priming: exposing people to incidental cues/primes (e.g., words, pictures) influences other behaviour without awareness
E.g., expose participants to words related to the elderly and they tend to walk slower (Bargh, Chen, & Burrows, 1996); N = 60
Doyen et al. (2012) were able to replicate the effect only when the experimenters expected participants to walk slower; it did not replicate when experimenters did not expect this. Demand characteristics play a role.
Study 1: N = 120

Within-subjects design; supraliminal and subliminal presentation of primes
Across 6 studies, N = 988 (high power)
Study 1: N = 153 college students
Study 2: N = 219 MTurk users
Study 3: N = 115 MTurk users
Found that primes influenced gambling decisions
So it may be premature to declare that behavioural priming doesn’t exist; we just need powerful research designs to detect the effects

Replication should be just as important as innovation in science
Registered replication reports
Science is a continual process of updating what we know, self-correcting as we go along: innovation → fails to replicate → figure out why → more innovation → replication
Psychology has changed in the last 5-6 years. Open science is becoming the norm. Social media has been instrumental. No excuses!