Biostatistics and epidemiology 01stats20

nokwazimhlongo02 15 views 35 slides Aug 26, 2024

About This Presentation

More information on statistics



Introduction to Biostatistics and
Epidemiology
Vashini Pillay

Philosophical background
• Basic premise: there is an external, objective
“truth” that applies to the whole population
• We will never know the Truth
• We can estimate the Truth by testing a sample
of the population
• Make some inferences about the whole
population
• Question:
– How well does this estimate represent the Truth?

3 Questions
1) Is the sample data representative of the
population (i.e. free of bias)?
– Can’t answer this question with statistical methods
– Need to examine how the data was collected
2) “Is there an association”
– Look at point estimate (RR / OR / RRR / ARR / NNT)
3) How likely is it that this result occurred by
chance?
– P values and confidence intervals may help

1. Bias
• A systematic error in the design, conduct
or analysis of the study which results in a
mistaken estimate of the exposure-
outcome relationship.
• It is NOT due to random variability
(i.e. chance).

1. Bias
• Selection Bias:
-systematic errors in the selection of subjects
(the manner in which subjects are selected into the study leads to
systematic differences in the distributions of these subjects in the
exposure/outcome groups compared to the original source
population)
• Information Bias:
-systematic errors in the collection of information
from study subjects
(some of the information collected with regard to either the exposure
or the outcome is incorrect, resulting in study subjects being
misclassified into the incorrect study group)
Anna Grimsrud March 2009

1. Bias
• Types of Selection Bias:
-prevalence bias
-participation bias
-sampling bias
-loss to follow-up (LTFU)
• Types of Information Bias:
-recall bias
-measurement bias
-observer bias
-assessment bias

Study Designs
• Observational Studies:
- Case Study
- Case Series
- Cross-sectional
- Cohort
- Case Control
• Experimental Studies:
- Randomized Controlled Trials

Study Designs
• Influence the way in which we sample
study population
• Influence the way in which we measure /
collect data
• Influence the manner in which we analyse
data

2. Is there an Association?
• Risk Ratio
the ratio of the risk of developing the outcome of
interest (e.g. disease) in the exposed subjects to
the risk of developing the outcome of interest in
the non-exposed subjects.
RR > 1 : Risk of disease is greater among the exposed than
among the non-exposed
RR = 1: Risk of disease is the same among the exposed and
the non-exposed
RR < 1: Risk of disease is less among the exposed than
among the non-exposed (ie: protective effect)
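
The definition above can be sketched as a small calculation. This is a minimal illustration only: the function name `risk_ratio` and the cohort counts are hypothetical and do not come from the slides.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed divided by risk in the non-exposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 non-exposed develop disease
rr = risk_ratio(30, 100, 10, 100)
print(round(rr, 2))  # RR > 1: risk is greater among the exposed
```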

2. Is there an Association?
• Odds Ratio
the ratio of the odds of developing the outcome of
interest (e.g. disease) in the exposed subjects to
the odds of developing the outcome of interest in
the non-exposed subjects.
OR > 1: Odds of disease are greater among the exposed than
among the non-exposed
OR = 1: Odds of disease are the same among the exposed and
the non-exposed
OR < 1: Odds of disease are lower among the exposed than
among the non-exposed (i.e. protective effect)
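
A minimal sketch of the odds ratio from a 2x2 table, under the same caveat: the function name `odds_ratio` and the counts are hypothetical. With these counts the risk ratio would be 3.0, so the example also shows that the OR and RR differ when the outcome is common.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a = exposed with disease,     b = exposed without disease
    c = non-exposed with disease, d = non-exposed without disease
    """
    odds_exposed = a / b
    odds_unexposed = c / d
    return odds_exposed / odds_unexposed


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 non-exposed develop disease
or_estimate = odds_ratio(30, 70, 10, 90)
print(round(or_estimate, 2))
```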

2. Is the Association real ?
– To “detect an association” that isn’t real = type 1 error
(False Positive in Diagnostic Testing)
– To “miss an association” that is actually there = type 2
error
(False Negative in Diagnostic Testing)
• This decision should be reviewed and quantified
on every analysis

3. How likely is it that this result
occurred by chance?
• P-values deal with the probability of obtaining
the estimate by chance
• Confidence Intervals also deal with the
precision of the estimate

Null Hypothesis
• The Null Hypothesis traditionally states
that there is no difference, association or
relationship between 2 measured
phenomena (default / reference point)
• The Alternative Hypothesis states that there
is a difference, association or relationship
between 2 measured phenomena.

P-Value
• “The probability of obtaining a result at least as
extreme as this, assuming the Null Hypothesis is true”
• A measure of how compatible the result obtained
is with the Null Hypothesis
(i.e. how often a result at least this extreme would occur by
pure statistical chance if the Null Hypothesis were true).
• The smaller the p-value, the more likely you are
to reject the Null (i.e. the observed association is
very unlikely to have occurred by chance alone)
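
To make the definition concrete, here is a minimal sketch of an exact two-sided p-value. The scenario is an assumption chosen for simplicity and is not from the slides: a fair coin (the Null Hypothesis) tossed 20 times gives 15 heads. The function name `two_sided_binomial_p` is hypothetical.

```python
from math import comb


def two_sided_binomial_p(successes, n):
    """Exact two-sided p-value for H0: success probability = 0.5.

    Adds up the probability of every outcome that lies at least as far
    from the expected count (n / 2) as the observed count does.
    """
    expected = n / 2
    observed_distance = abs(successes - expected)
    extreme_outcomes = sum(
        comb(n, k) for k in range(n + 1) if abs(k - expected) >= observed_distance
    )
    return extreme_outcomes / 2 ** n


p_value = two_sided_binomial_p(15, 20)
print(round(p_value, 4))
```

The value is the share of all possible outcomes, under the Null Hypothesis, that are at least as extreme as the one observed.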

P-Value
• P-value = 0.5 means that, if the Null Hypothesis were true, a result
at least this extreme would happen by chance 1 time in 2.
• P-value = 0.05 means that, if the Null Hypothesis were true, a result
at least this extreme would happen by chance 1 time in 20.
• P-value = 0.01 means that, if the Null Hypothesis were true, a result
at least this extreme would happen by chance 1 time in 100.
• P-value = 0.001 means that, if the Null Hypothesis were true, a result
at least this extreme would happen by chance 1 time in 1000.

P-Value
• Traditionally a p-value of less than 0.05
rejects the Null Hypothesis at the 5%
significance level, suggesting statistical
significance.
BUT… in terms of clinical significance:
• Is a p-value of 0.049 very different from
that of 0.05???
• Is a p-value of 0.051 very different from
that of 0.05???

Problems with P-values
• Statisticians hate them (for many complex reasons)
– Major abuse of p value:
• Label variable S or NS (significant or not significant)
-based on a single threshold value
-without looking at the magnitude of the effect
-without looking at the clinical significance of the effect
– Eg a cancer etiology study shows
• Suggestive evidence of an enormous increase in risk with chemical A
–Risk ratio 13.4, p=0.051
• Strong evidence of a small increase in risk with chemical B
–Risk ratio 1.10, p=0.001
– Chemical A: 13 times increased risk
– Chemical B: 10% increased risk
Never mind the p value, which chemical are you more afraid of?

Common misconceptions of P-Values
• P = 0.05 does not mean there is only a 5% chance that
the null hypothesis is true.
• P = 0.05 does not mean there is a 5% chance of a Type I
error (i.e. false positive).
• P = 0.05 does not mean there is a 95% chance that the
results would replicate if the study were repeated.
• P > 0.05 does not mean there is no difference between
groups.
• P < 0.05 does not mean you have proved your
experimental hypothesis.
Goodman S.A. Ann Intern Med. 1999;130:995-1004

Confidence interval
• Emphasis on precision of the estimate
-provides a range of values, around the estimate
obtained through one's analysis, that is likely to
include the true population value.
• Derived from same underlying parameters (variance and
sample size)
• Provides us with more information than a p-value

Normal distribution
• For a normally shaped distribution, 1
standard deviation on either side of the
mean contains about 68% of the estimates
• 2 standard deviations on either side of
the mean contain about 95.4% of the
estimates
• 1.96 standard deviations on either side
contain 95% of the estimates
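
The coverage of a normal distribution within a given number of standard deviations can be checked with the error function from Python's standard library. This is a sketch for verification only; the function name `coverage_within` is hypothetical.

```python
import math


def coverage_within(z):
    """Fraction of a normal distribution lying within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))


for z in (1, 1.96, 2):
    print(z, round(coverage_within(z), 4))
```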

Sampling distribution showing effect of sampling error (SE).
Sheldon T A Evid Based Nurs 2000;3:36-39
©2000 by BMJ Publishing Group Ltd and RCN Publishing Company Ltd

Calculate 95% confidence interval
• Calculate sample statistic (point estimate)
• Calculate “standard error” (SE) of the statistic
– “Standard error” is similar to standard deviation
(i.e. the standard deviation of the statistic across repeated samples)
– Measure of the “spread” of the estimate
– Affected by sample size
• Calculate 1.96 x SE
• Upper limit: Point estimate + 1.96 x SE
• Lower limit: Point estimate - 1.96 x SE
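
The steps above can be sketched for the simplest case, a sample mean, where the standard error is the sample standard deviation divided by the square root of the sample size. The function name `ci95_mean` and the data are hypothetical, and the 1.96 multiplier assumes the normal approximation holds (for small samples a t-distribution multiplier would give a wider interval).

```python
import math
import statistics


def ci95_mean(values):
    """95% confidence interval for a mean using point estimate +/- 1.96 x SE."""
    n = len(values)
    point_estimate = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = 1.96 * standard_error
    return point_estimate - margin, point_estimate + margin


# Hypothetical measurements, for illustration only
sample = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = ci95_mean(sample)
print(round(lower, 2), round(upper, 2))
```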

Sample size
• Larger sample size – smaller standard
error – narrower confidence intervals
• Greater precision
• If the interval between the lower limit and upper
limit includes / crosses the value 1 (for a ratio
such as RR or OR), then the result is not
statistically significant (at 5% level)
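
A short sketch of how sample size drives interval width, using the normal-approximation standard error of a proportion, sqrt(p(1 - p) / n). The proportion 0.3 and the sample sizes are hypothetical, as is the function name `ci95_width_proportion`.

```python
import math


def ci95_width_proportion(p, n):
    """Width of the normal-approximation 95% CI for a proportion p observed in n subjects."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return 2 * 1.96 * standard_error


# Same observed proportion, increasing sample size
for n in (100, 400, 1600):
    print(n, round(ci95_width_proportion(0.3, n), 3))
```

Because the standard error shrinks with the square root of n, quadrupling the sample size halves the width of the interval.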

The larger the sample (n), the smaller the sampling error.
Sheldon T A Evid Based Nurs 2000;3:36-39

Reporting a confidence interval
• “The truth” existed before you took your sample, it is
what it is, and it is unchanging
• Your estimates may be variable (depending on how you
took the sample, how many times you repeat the test)
• The truth is fixed, your estimates are flexible
• “We can say with 95% confidence that this interval
includes/covers/overlaps the Truth”
• You cannot say: “The truth falls within this interval”
– Implies that your borders are fixed, and the truth is variable, may
“fall” here or “fall” there
– The truth is fixed! Your intervals are variable

Advantage of confidence intervals
• Can see size of effect
• Width of confidence interval gives idea of
the “stability” of the estimate
–Sample size?
–Effect of some extreme outlier values?
–Random error?
• Narrow CIs are always better!

P Value vs CI
• Hypothetical disease:
• Which exposures are statistically significant? (p values)
• Which has widest CI? (most affected by random error)
• Which are most precise (most trustworthy, less likely to change with repeat testing)?
Exposure   Relative risk   95% CI        P value
A          2.1             0.6 – 7.8     0.24
B          1.6             1.3 – 2.0     0.001
C          4.4             1.5 – 12.4    0.002
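
One way to work through the table is sketched below. The rows are copied from the slide; the variable names are hypothetical, and interval width is measured here as a simple difference (upper minus lower), which is an assumption, since width on a ratio scale could also be compared as upper divided by lower.

```python
# Rows copied from the table: exposure, relative risk, CI lower, CI upper, p value
rows = [
    ("A", 2.1, 0.6, 7.8, 0.24),
    ("B", 1.6, 1.3, 2.0, 0.001),
    ("C", 4.4, 1.5, 12.4, 0.002),
]

# A ratio measure is significant at the 5% level when its 95% CI excludes 1
significant_by_ci = [name for name, rr, lo, hi, p in rows if not (lo <= 1 <= hi)]
significant_by_p = [name for name, rr, lo, hi, p in rows if p < 0.05]

# Narrowest interval = most precise estimate (absolute width used here)
narrowest = min(rows, key=lambda row: row[3] - row[2])[0]

print(significant_by_ci, significant_by_p, narrowest)
```

For these three exposures the confidence interval and the p value lead to the same significance decision; the interval also shows how large and how precise each effect is.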

Past paper examples: Mar 2008
• You are interested in interventions that could be used in the area that could prevent
relapse after discharge in children with severe malnutrition treated at the hospital.
You find the following article during an evidence-base search
• “Home based therapy for severe malnutrition with ready-to-use food”
– M J Manary, M J Ndkeha, P Ashorn, K Maleta, A Briend
• Background:
– The standard treatment of severe malnutrition in Malawi often utilises prolonged inpatient
care, and after discharge results in high rates of relapse.
• Aims:
– To test the hypothesis that the recovery rate, defined as catch-up growth such that weight-
for-height z score >0 (WHZ, based on initial height) for ready-to-use food (RTUF) is greater
than two other home based dietary regimens in the treatment of malnutrition.
• Methods:
– HIV negative children >1 year old discharged from the nutrition unit in Blantyre, Malawi were
randomised to one of three dietary regimens: RTUF, RTUF supplement, or blended
maize/soy flour. RTUF and maize/soy flour provided 730 kJ/kg/day, while the RTUF
supplement provided a fixed amount of energy, 2100 kJ/day.
– Children were followed fortnightly. Children completed the study when they reached WHZ
>0, relapsed, or died.
– Outcomes were compared using a time-event model.

•Results:
– A total of 282 children were enrolled.
– Children receiving RTUF were more likely to reach WHZ >0 than those
receiving RTUF supplement or maize/soy flour (95% v 78%, RR 1.2,
95% CI 1.1 to 1.3).
– Intention to treat analyses also showed that more children receiving
RTUF reached graduation weight than those receiving RTUF
supplement or maize/soy (86% v 66%, 20% difference, 95% CI 8% to
33%).
– The average weight gain was 5.2 g/kg/day in the RTUF group compared
to 3.1 g/kg/day for the maize/soy and RTUF supplement groups. Six
months later, 96% of all children who reached graduation weight and
returned for follow up, had normal anthropometric indices
• Abbreviations:
– MUAC, mid-upper arm circumference; NRU, nutritional rehabilitation
unit; RTUF, ready-to-use food; WHZ, weight-for-height z score
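
As a rough check on the quoted results, the point estimates can be recomputed from the percentages in the abstract. This sketch assumes the rounded percentages stand in for the group risks; the published figures may have been derived from unrounded counts or from the time-event model, so small discrepancies are expected. The number needed to treat is not reported in the abstract and is derived here only as an illustration.

```python
# Proportions reaching WHZ > 0, as quoted in the Results
risk_rtuf = 0.95
risk_other = 0.78
rr = risk_rtuf / risk_other

# Intention-to-treat proportions reaching graduation weight
itt_rtuf = 0.86
itt_other = 0.66
risk_difference = itt_rtuf - itt_other

# Children who need RTUF for one additional child to reach graduation weight
number_needed_to_treat = 1 / risk_difference

print(round(rr, 2), round(risk_difference, 2), round(number_needed_to_treat, 1))
```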

• a) What type of study design was used? (1)
• b) What was the main study outcome? (1)
• c) How would you interpret a weight-for-height z-score of 0 and -1? (2)
• d) How would you interpret the relative risk of 1.2 in the statement
“Children receiving RTUF were more likely to reach WHZ >0 than those
receiving RTUF supplement or maize/soy flour (95% v 78%, RR 1.2, 95%
CI 1.1 to 1.3).”? (1)
• e) How would you interpret the 95% confidence interval of 1.1 to 1.3 in
the same sentence? (1)
• f) Was this difference statistically significant? Explain. (1)
• g) What statistical test would you use to decide if the average weight
gain in the RTUF group was significantly different to the maize/soy and
RTUF supplement groups? (1)
• h) What would your conclusion be about managing children with severe
malnutrition at home, based on this study? (3)

Sept 2010:

Sept 2008: ex prem, spastic di