Public Health Lecture 3 Introduction to Biostatistics

phuakl 57 views 23 slides Jun 20, 2024

About This Presentation

A short introduction to Biostatistics


Slide Content

Foundations of
Public Health
Lecture 3: Introduction to
Biostatistics
Phua Kai Lit, PhD (Johns Hopkins)
Retired public health professor
(Monash University Malaysia School
of Medicine and Health Sciences)

Lecture Objectives
What is “Biostatistics”?
Sampling: simple random, systematic, stratified,
cluster sampling
Descriptive Statistics: measures of central
tendency, measures of dispersion
Inferential Statistics
Normal Distribution
Basics of hypothesis testing
Basic stats tests: Chi-squared test, Student’s t-test,
Pearson’s r, simple regression, multiple
regression
Parametric and non-parametric tests
Further examples

What is “Biostatistics”?
Biostatistics is the application of statistical tools and
techniques to the collection, analysis and
presentation of health-related data (e.g. for the
development of public health policy).

Sampling
Since it is usually not possible to collect data
on each and every person in a population, a
representative sample (of sufficient size)
needs to be drawn from the population, and
the findings from this sample are then used
to draw conclusions about the population
(with a high degree of certainty).
Note that the samples drawn must be
representative. This is done through random
selection using different kinds of sampling
techniques such as simple random,
systematic, stratified, and cluster sampling.

Simple Random Sampling
For a small population, the proverbial simple
random sample is drawn as follows: write the
name of a person on a piece of paper, roll up
the paper and throw it into a shoe box. Repeat
this for every person in the population.
The rolled papers in the box are then
thoroughly shaken and a sample of a certain
size is drawn from the shoe box.
This constitutes the simple random
sample. (Of course, more sophisticated ways
are used these days to draw a sample.)
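
The shoe-box procedure can be imitated in software. The sketch below is only an illustration: the population of 20 made-up names, the sample size of 5 and the fixed seed are all assumptions, not part of the lecture.

```python
import random

# Hypothetical small population: 20 people identified by made-up names.
population = [f"Person {i}" for i in range(1, 21)]

# A fixed seed is used only so that the draw can be reproduced.
rng = random.Random(42)

# Draw 5 names without replacement; every person has the same chance
# of being selected, just like drawing rolled papers from the box.
sample = rng.sample(population, k=5)
print(sample)
```

Each run with the same seed returns the same five names; a different seed gives a different, equally valid simple random sample.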

Systematic Sampling
The names of all persons in the population
are written down and placed in some kind of
order. A rule is then followed to collect the
sample, e.g. include every ith person on the
list of ordered names.
An example of such a rule: write down the
name of the 5th person, 10th person, 15th
person, 20th person etc. These names make
up the systematic sample.
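
The "every 5th person" rule can be sketched in a few lines. The function name systematic_sample and the list of 20 made-up names are assumptions for illustration. The optional start argument is also an addition: in practice the starting point is often chosen at random rather than fixed.

```python
def systematic_sample(ordered_names, step, start=None):
    """Return every `step`-th name from an ordered list.

    By default the sample starts at the `step`-th name, which gives
    the 5th, 10th, 15th, ... person when step is 5.
    """
    if start is None:
        start = step - 1  # list index of the step-th name
    return ordered_names[start::step]

# Hypothetical ordered list of 20 people.
names = [f"Person {i}" for i in range(1, 21)]
sample = systematic_sample(names, step=5)
print(sample)  # ['Person 5', 'Person 10', 'Person 15', 'Person 20']
```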

Stratified Sampling
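A stratified sample reflects the composition
of the entire population: the population is
divided into subgroups (strata) and a random
sample is drawn from each subgroup in
proportion to its size.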
For example, if the population of Malaysia
consists of 55% Malays, 25% Chinese, 10%
Indians and 10% “Others”, then the stratified
sample of 1000 should be chosen as follows:
select 550 Malays randomly from all Malays,
select 250 Chinese randomly from all
Chinese, select 100 Indians randomly from
all Indians, and select 100 “Others” randomly
from all “Others” in the Malaysian population.
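
The allocation in this example can be sketched as follows. The function name stratified_sample and the population of 10,000 made-up identifiers are assumptions; only the percentages and the sample size of 1000 come from the example above.

```python
import random

def stratified_sample(strata, total_size, rng):
    """Draw a random sample from each stratum in proportion to its size.

    `strata` maps a group name to the list of its members. Rounding can
    make the sizes add up to slightly more or less than `total_size`
    when the proportions are not whole numbers.
    """
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for group, members in strata.items():
        k = round(total_size * len(members) / population_size)
        sample[group] = rng.sample(members, k)
    return sample

# Hypothetical population of 10,000 with the composition from the example.
strata = {
    "Malays": [f"M{i}" for i in range(5500)],
    "Chinese": [f"C{i}" for i in range(2500)],
    "Indians": [f"I{i}" for i in range(1000)],
    "Others": [f"O{i}" for i in range(1000)],
}
sample = stratified_sample(strata, total_size=1000, rng=random.Random(1))
sizes = {group: len(members) for group, members in sample.items()}
print(sizes)  # {'Malays': 550, 'Chinese': 250, 'Indians': 100, 'Others': 100}
```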

Cluster Sampling
Divide the entire population up into clusters,
and then randomly select a fixed number of
clusters.
Include every individual from each selected
cluster in the sample.
Example: divide a small town into a number
of clusters, randomly select a number of
clusters, then include every individual from
each selected cluster in your sample.
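
A minimal sketch of the small-town example, assuming six made-up clusters of four residents each and a choice of two clusters (all of these numbers are illustrative):

```python
import random

# Hypothetical town divided into 6 clusters with 4 residents each.
clusters = {
    f"Cluster {c}": [f"C{c}-resident{r}" for r in range(1, 5)]
    for c in range(1, 7)
}

rng = random.Random(7)  # fixed seed only for reproducibility

# Step 1: randomly select a fixed number of clusters.
chosen = rng.sample(sorted(clusters), k=2)

# Step 2: include every individual from each selected cluster.
sample = [person for name in chosen for person in clusters[name]]
print(chosen, len(sample))
```

Note that randomness enters only at the cluster level; nobody inside a chosen cluster is left out.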

Descriptive Statistics
Measures of Central Tendency: mean, median,
mode. The mean (arithmetic mean) is the sum of a
series of numbers divided by how many numbers
there are. The median is the number that lies in
the exact middle of an ordered series of numbers
(with an even count, the average of the two middle
numbers). The mode is the most commonly
occurring number in a series.
Measures of Dispersion: range, inter-quartile
range, variance, standard deviation. The range is the
difference between the highest number and the
lowest number in a series. The
IQR is the difference between the 75th percentile
and the 25th percentile. The variance is the square
of the standard deviation.
Note: the higher the std deviation, the more spread
out the data.
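
These measures can be computed with Python's standard statistics module. The seven data values are made up for illustration. Two assumptions are worth noting: statistics.variance and statistics.stdev use the sample formulas (dividing by n - 1), and quartile conventions differ between software packages, so the IQR may vary slightly elsewhere.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # made-up values

# Measures of central tendency
mean = statistics.mean(data)        # 6
median = statistics.median(data)    # 5
mode = statistics.mode(data)        # 4

# Measures of dispersion
data_range = max(data) - min(data)  # 9
q1, q2, q3 = statistics.quantiles(data, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1
variance = statistics.variance(data)  # sample variance: 10
std_dev = statistics.stdev(data)      # square root of the variance

print(mean, median, mode, data_range, iqr, variance, round(std_dev, 3))
```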

Inferential Statistics
Drawing conclusions about a population on the
basis of findings derived from a random sample
drawn from the entire population.

Normal Distribution
Many variables are approximately normally
distributed e.g. height, weight of everyone in an
entire population.
The Normal Distribution curve is bell-shaped, and
symmetrical about its mean.
In a Normal Distribution, about 68% of
all data lies within 1 std deviation of the mean,
about 95% lies within 2 std deviations of the mean,
and 99.7% lies within 3 std deviations of the
mean.
Formula for conversion into standard deviation
units: z = (x - mu) / sigma, where x is the
unstandardised value, mu is the mean and sigma is
the standard deviation.
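
The conversion formula and the percentages above can be checked with the standard library. The function name z_score and the example values (a height of 180 in a population with mean 170 and standard deviation 10) are assumptions for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Convert an unstandardised value into standard deviation units."""
    return (x - mu) / sigma

# Hypothetical example: height 180, population mean 170, std deviation 10.
print(z_score(180, mu=170, sigma=10))  # 1.0

# Share of a normal distribution lying within k std deviations of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
shares = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}
for k, share in shares.items():
    print(f"within {k} std deviation(s): {share:.1%}")
```

The loop prints roughly 68.3%, 95.4% and 99.7%, which is why the rule is often quoted as "68, 95, 99.7".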

Hypothesis Testing
Rules:
1) State your Null Hypothesis (the hypothesis you
test and hope to reject)
2) State your Research Hypothesis (also called
Alternative Hypothesis)
3) Reject the Null Hypothesis if the p-value,
calculated by the stats software programme, is less
than 0.05 (p < 0.05). Fail to reject if p >= 0.05
4) Statistically significant if p < 0.05
5) Highly statistically significant if p < 0.01
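
The decision rules for the p-value can be written as a small function. The function name interpret_p_value and the exact wording of the returned labels are illustrative; the 0.05 and 0.01 cut-offs are the ones given in the rules.

```python
def interpret_p_value(p_value):
    """Apply the lecture's decision rules to a p-value.

    Returns a (decision, label) pair.
    """
    if p_value < 0.01:
        return "reject the null hypothesis", "highly statistically significant"
    if p_value < 0.05:
        return "reject the null hypothesis", "statistically significant"
    return "fail to reject the null hypothesis", "not statistically significant"

print(interpret_p_value(0.003))
print(interpret_p_value(0.03))
print(interpret_p_value(0.20))
```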

Basic Statistical Tests
Chi-squared “goodness-of-fit” test
Chi-squared test of association between
variable X and variable Y
Student’s t-test (of difference between two
population means i.e. mu1 and mu2)
ANOVA (analysis of variance)
Pearson’s correlation coefficient r
Simple regression: one dependent variable
(DV), one independent variable (IV)
Multiple regression: more than one IV

Basic Statistical Tests
Chi-squared “goodness-of-fit” test: to see how
well your data fits a hypothesised distribution
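
A minimal sketch of the goodness-of-fit calculation, assuming a made-up experiment of 100 coin tosses tested against a fair-coin distribution. The function name is illustrative. The p-value shortcut via math.erfc is valid only for 1 degree of freedom (two categories); for other cases statistical software or a chi-squared table would be used.

```python
import math

def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up data: 100 tosses gave 60 heads and 40 tails.
observed = [60, 40]
expected = [50, 50]  # hypothesised distribution: a fair coin

chi2 = chi_squared_statistic(observed, expected)  # 4.0

# Degrees of freedom = number of categories - 1 = 1.
# With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(chi2, round(p_value, 4))
```

Here the p-value is a little under 0.05, so the null hypothesis of a fair coin would be rejected at the 0.05 level.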

Basic Statistical Tests
Chi-squared test of association: to test if there is
an association between Variable X and Variable
Y, or if the observed pattern could have arisen by
chance alone.
Example:
Null Hypothesis: there is no
association between gender and the eating
disorder called anorexia nervosa
Research Hypothesis: there is a relationship
between gender and anorexia nervosa
(It is also called the chi-squared test of
independence.)
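
For a 2 x 2 table the test can be sketched by hand. The counts below are invented purely to show the arithmetic and do not describe any real study. The function name chi_squared_2x2 is illustrative, no continuity correction is applied, and the math.erfc shortcut for the p-value holds only because a 2 x 2 table has 1 degree of freedom.

```python
import math

def chi_squared_2x2(table):
    """Chi-squared test of association for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if X and Y were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # valid for 1 degree of freedom
    return chi2, p_value

# Invented counts: rows are the two categories of X,
# columns are "has condition" and "does not have condition".
table = [[30, 70],
         [10, 90]]
chi2, p_value = chi_squared_2x2(table)
print(chi2, p_value)
```

With these invented counts the statistic is 12.5 and the p-value is well below 0.01, so the null hypothesis of no association would be rejected.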

Basic Statistical Tests
Student’s t-test (of difference between two
population means): to test if the two samples you
are comparing have been drawn from two
underlying populations with different population
means
Example:
Null Hypothesis: there is no
difference between population mean 1 and
population mean 2
Research Hypothesis: there is a difference
between population mean 1 and population
mean 2
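
A sketch of the t statistic for two independent samples, using the pooled-variance form (which assumes the two populations have equal variances). The function name and both samples are made up. The standard library has no t distribution, so the sketch compares the statistic with the tabulated two-tailed critical value for 8 degrees of freedom at the 0.05 level (2.306) instead of computing a p-value.

```python
import math
import statistics

def pooled_t_statistic(sample1, sample2):
    """Return (t, degrees of freedom) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled estimate of the common variance.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Made-up samples.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

t, df = pooled_t_statistic(group_a, group_b)
critical_value = 2.306  # two-tailed, alpha = 0.05, df = 8 (from a t table)
print(t, df, abs(t) > critical_value)  # 4.0 8 True
```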

Basic Statistical Tests
ANOVA (analysis of variance): to compare the
means of three or more populations
simultaneously, by comparing the variation
between groups with the variation within groups.
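
One-way ANOVA is used to decide whether group means differ, and it does so through the F statistic: the ratio of between-group variance to within-group variance. The sketch below computes F by hand for three made-up groups; the function name is illustrative, and the critical value of 5.14 (alpha = 0.05, 2 and 6 degrees of freedom) is taken from a standard F table.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - statistics.mean(g)) ** 2 for g in groups for x in g
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Made-up data for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(round(f_stat, 2), df_between, df_within)  # 13.0 2 6
```

Since 13 exceeds 5.14, the null hypothesis that all three population means are equal would be rejected at the 0.05 level.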

Basic Statistical Tests
Pearson’s r (in full: Pearson’s product-moment
correlation coefficient): to test the degree of
linear relationship between Variable X and
Variable Y.
Linear relationships can be positive (as Variable
X increases, Variable Y also increases) or
negative (as Variable X increases, Variable Y
decreases)
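
Pearson's r can be computed directly from its definition. The function name pearson_r and the data are made up; the two y series are chosen to show a perfect positive and a perfect negative linear relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]   # Y increases as X increases
y_falling = [10, 8, 6, 4, 2]  # Y decreases as X increases

print(pearson_r(x, y_rising))   # 1.0
print(pearson_r(x, y_falling))  # -1.0
```

Values of r always lie between -1 and +1; values near 0 indicate little or no linear relationship.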

Basic Statistical Tests
Simple regression: to predict Variable Y (the
dependent variable) using one Variable X (the
independent variable)
For example, predict Income (Y) from years of
Schooling (X)
Multiple regression: to predict Variable Y (the
dependent variable) using more than one
independent variable (X1, X2 etc)
For example, predict Income (Y) from years of
schooling (X1) and Gender (X2)
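
Simple regression can be sketched with the ordinary least-squares formulas. The function name, the years of schooling and the incomes (in thousands) are all invented for illustration. Multiple regression follows the same idea with more than one X, but is normally fitted with statistical software rather than by hand.

```python
def fit_simple_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented data: years of schooling (X) and income in thousands (Y).
schooling = [8, 10, 12, 14, 16]
income = [20, 25, 30, 35, 40]

intercept, slope = fit_simple_regression(schooling, income)
predicted = intercept + slope * 18  # predicted income for 18 years of schooling
print(intercept, slope, predicted)  # 0.0 2.5 45.0
```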

Parametric and Non-parametric tests
Parametric tests: make assumptions about
the underlying population distribution e.g. the
population is normally distributed
Non-parametric tests: do not assume that the
underlying population from which samples are
drawn follows a particular distribution

Further examples
see: phuakl.tripod.com/biostatistics1.html
see: phuakl.tripod.com/biostatistics2.html
see: phuakl.tripod.com/chisquare.doc

Additional Resources

Thank You