GIBSON MANDOZANA
BIOSTATISTICIAN-UZ-CRC
COMMUNITY MEDICINE
Outline
Descriptive Statistics-Definition
Types of data
-Quantitative and Qualitative data
Data presentation
-Bar graph, pie chart, histogram, line graph and
boxplot
Descriptive Statistics
Utilizes numerical, tabular and
graphical methods to look for patterns
in a data set
-to summarize the information revealed in a
data set
-to present that information in a convenient
form
Descriptive Statistics
1. Involves
Presenting data
Characterizing data
2. Purpose
Describe data
Example: X̄ = 30.5, S² = 113
[Figure: bar chart of $ by quarter (Q1 to Q4); vertical axis from 0 to 50]
Types of data
Quantitative data are numerical values that measure
some characteristic of an individual, such as height
or salary.
There are two types of numerical data:
Continuous data - occur when there is no
limitation on the values that the characteristic
being measured can take (other than the limits
imposed by the measuring instrument).
Example: weight can be 171.2, 171.3, 171.4, etc.
Discrete data - are numeric data that can take only
a finite or countable number of distinct values.
Example: shoe size, number of brothers (when
data represent counts they are discrete)
Types of data
Qualitative/Categorical: occur when each individual can
belong to only one of a number of distinct categories,
such as male/female.
Categorical data are expressed not as numbers but in
natural language, e.g. favorite color = blue.
They can be further classified into two types depending on
ordering:
Nominal - the categories are not ordered but simply have names
(e.g. blood group A, AB, O or marital status
(married/widowed/single)). In this case there is no reason
to suppose that being married is better (or worse) than being single.
Ordinal - the categories are ordered in some way, e.g. disease staging
(advanced, moderate, mild) or degree of pain (severe,
moderate, mild, none)
Types of data
Categorical (Qualitative)
-Nominal (no ranking)
-Ordinal (ranked)
Numerical (Quantitative)
-Discrete
-Continuous
-Interval data
-Ratio data
Note: Interval data is numerical data
expressed as an interval, e.g. age
15-25, 25-35
Ratio data is derived from a
ratio of numerical data, e.g.
BMI
Univariate Analysis
involves the examination across cases of one
variable at a time. There are three major
characteristics of a single variable that we tend to
look at:
Frequency distribution
Central tendency
Dispersion
In most situations, we would describe all three of
these characteristics for each of the variables in
our study.
Frequency distribution
is a presentation of the number of times (or the
frequency) that each value (or group of values)
occurs in the study population.
helps to give a picture of the shape of the
distribution of
the data.
A frequency distribution can be displayed as a
table, a bar chart, a histogram, or a frequency
polygon
The method usually depends on the type of
variable being described.
Frequency distribution -
Qualitative data
Categorical variables are qualitative in nature and
are best displayed as a table or a bar chart.
Example 1: Frequency table; simply shows the
number of times each specific observation
appears in a sample or population.
Example 1
In the month of April, the number of accidents occurring in the workplace was
recorded as follows:
No. of Accidents    Frequency
0                   10
1                   7
2                   6
3                   5
4                   2
Total               30
Relative Frequency Distribution
The relative frequency of a class is the fraction or
proportion of the total number of data items
belonging to the class.
A relative frequency distribution is a tabular
summary of a set of data showing the relative
frequency for each class.
Percent Frequency Distribution
The percent frequency of a class is the relative
frequency multiplied by 100.
A percent frequency distribution is a tabular
summary of a set of data showing the percent
frequency for each class.
Relative Frequency and Percent Frequency Distributions
No. of Accidents    Relative Frequency    Percent Frequency
0                   .333                  33.3
1                   .233                  23.3
2                   .200                  20.0
3                   .167                  16.7
4                   .067                  6.7
Total               1.000                 100.0
For example, 2/30 = .067 and .333(100) = 33.3%
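The relative and percent frequencies for Example 1 can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the frequencies are the ones recorded in Example 1.

```python
# Frequencies from Example 1: number of accidents -> frequency
frequencies = {0: 10, 1: 7, 2: 6, 3: 5, 4: 2}

total = sum(frequencies.values())  # 30 data items in all

# Relative frequency = class frequency / total number of data items
relative = {k: f / total for k, f in frequencies.items()}

# Percent frequency = relative frequency multiplied by 100
percent = {k: 100 * r for k, r in relative.items()}

for k in frequencies:
    print(f"{k}  {frequencies[k]:>2}  {relative[k]:.3f}  {percent[k]:.1f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any hand-built table.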
Bar Chart
A bar chart is a graph used to display frequency
distributions for ordinal and nominal data.
The various categories into which the
observations fall are presented along the
horizontal axis.
A vertical bar is drawn above each category and
the height of the bar represents the frequency or
relative frequency of observations in that class.
The bars should be of equal width and separated
from one another (so as not to imply continuity).
[Figure: Bar graph for Example 1. Horizontal axis: No. of Accidents (0 to 4); vertical axis: Frequency (1 to 10)]
Pie Chart
The pie chart is a commonly used graphical device
for presenting relative frequency distributions for
qualitative data.
First draw a circle; then use the relative
frequencies to subdivide the circle
into sectors that correspond to the
relative frequency for each class.
Since there are 360 degrees in a circle,
a class with a relative frequency of .25 would
consume .25(360) = 90 degrees of the circle.
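As a rough illustration of the sector calculation, the sketch below converts the Example 1 frequencies into sector angles. The names are illustrative only; the angle is computed as 360 times the relative frequency, as described above.

```python
# Frequencies from Example 1: number of accidents -> frequency
frequencies = {0: 10, 1: 7, 2: 6, 3: 5, 4: 2}
total = sum(frequencies.values())

# Sector angle in degrees = relative frequency * 360
angles = {k: 360 * f / total for k, f in frequencies.items()}

for k, angle in angles.items():
    print(f"{k} accidents: {angle:.0f} degrees")

# The slide's own example: a relative frequency of .25 takes up 90 degrees
quarter = 0.25 * 360
```

The sector angles should add up to 360 degrees, just as the relative frequencies add up to 1.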
Pie Chart: Example 1
[Figure: pie chart for Example 1 with sectors for 0, 1, 2, 3 and 4 accidents: 33.3%, 23.3%, 20%, 16.7% and 6.7%]
Frequency distribution-
Numeric variable
Numerical variables are quantitative in nature
and are best displayed as a frequency histogram
or a frequency polygon.
A frequency histogram shows the frequencies
relative to each other.
The horizontal axis displays the true limits of the
various intervals
The width of the bar is in proportion with the
class interval that it represents.
Typically there are no spaces between bars in a
frequency histogram.
Frequency Distribution-
Quantitative data
Guidelines for Selecting Number of
Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements
usually require a larger number of classes.
• Smaller data sets usually require fewer classes.
Frequency Distribution
Guidelines for Selecting Width of Classes
• Use classes of equal width.
• Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes
Frequency Distribution
For Example 2, if we choose six classes:
Approximate Class Width = (65 - 16)/6 ≈ 8.2, rounded up to 9
We first prepare a Tally Sheet
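The class-width calculation for Example 2 can be sketched in Python as follows. The function names are hypothetical, and the starting lower limit of 15 is an assumption taken from the class labels on the Example 2 histogram (15-23, 24-32, and so on) rather than from the raw data.

```python
import math

def approximate_class_width(largest, smallest, n_classes):
    """(Largest data value - smallest data value) / number of classes."""
    return (largest - smallest) / n_classes

raw_width = approximate_class_width(65, 16, 6)  # 49 / 6, about 8.2
width = math.ceil(raw_width)                    # round up to a whole number

def class_limits(lowest, width, n_classes):
    """Equal-width classes of whole numbers, e.g. (15, 23), (24, 32), ..."""
    return [(lowest + i * width, lowest + (i + 1) * width - 1)
            for i in range(n_classes)]

# Assumed starting lower limit of 15, as on the histogram's horizontal axis
classes = class_limits(15, width, 6)
print(width, classes)
```

Rounding up rather than to the nearest whole number helps ensure that the chosen number of classes covers the whole range of the data.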
Histogram
Another common graphical presentation of
quantitative data is a histogram.
The variable of interest is placed on the horizontal
axis.
A rectangle is drawn above each class interval with
its height corresponding to the interval's frequency,
relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural
separation between rectangles of adjacent classes.
Histogram: Example 2
[Figure: histogram of Age with classes 15-23, 24-32, 33-41, 42-50, 51-59 and 60-68 on the horizontal axis; vertical axis: Frequency (4 to 36)]
Histogram
Symmetric
Left tail is the mirror image of the right tail
[Figure: symmetric histogram; vertical axis: Relative Frequency (0 to .35)]
Histogram
Moderately Skewed Left
A longer tail to the left
[Figure: moderately left-skewed histogram; vertical axis: Relative Frequency (0 to .35)]
Histogram
Moderately Skewed Right
A longer tail to the right
[Figure: moderately right-skewed histogram; vertical axis: Relative Frequency (0 to .35)]
Histogram
Highly Skewed Right
A very long tail to the right
[Figure: highly right-skewed histogram; vertical axis: Relative Frequency (0 to .35)]
Frequency polygon
The area under a frequency polygon equals the
total area of the bars of the corresponding
histogram.
It is constructed by placing a point at the center of
each interval, at a height equal to that interval's frequency.
The points are then connected by straight lines.
Though a frequency polygon may look like a line
graph, a frequency polygon must be closed at the
ends.
Histogram and Frequency Polygon
[Figure: histogram with overlaid frequency polygon; horizontal axis: Birth weight (kg), 2 to 5.25 in steps of 0.25; vertical axis: Frequency (0 to 300); the mode is marked]
Other ways of presenting
data
Quantitative data
Scatter plot
Box-plot
Line graph
Ogive
Scatter plot
Used to depict the relationship between two different
continuous measurements.
Each point on the graph represents a pair of values.
[Figure: scatter plot of FVC (vertical axis, 2.05 to 4.89) against FEV1 (horizontal axis, 1.55333 to 4.00667)]
Box plot
Uses summary measures such as the minimum, maximum,
median and interquartile range to summarize a
set of continuous or discrete observations.
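The summary measures behind a box plot can be computed with Python's standard library. This is only a sketch: the data values are made up for illustration, and the quartiles use the "inclusive" method of `statistics.quantiles`, which is one of several conventions (other software may give slightly different quartiles).

```python
import statistics

# Hypothetical observations, for illustration only (not from the lecture)
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

# Quartiles: cut points that divide the sorted data into four equal parts
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = {
    "min": min(data),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": max(data),
    "iqr": q3 - q1,  # interquartile range = Q3 - Q1
}
print(summary)
```

In a box plot the box spans Q1 to Q3, with a line inside the box at the median.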
Line graph
Similar to a scatter plot, but each value on the horizontal axis has a
single corresponding measurement on the vertical axis.
Adjacent points are connected by straight lines.
Commonly the horizontal axis is the time variable.
Cumulative Distributions
Cumulative frequency distribution - shows the
number of items with values less than or equal to
the upper limit of each class.
Cumulative relative frequency distribution - shows
the proportion of items with values less than or
equal to the upper limit of each class.
Cumulative percent frequency distribution - shows
the percentage of items with values less than or
equal to the upper limit of each class.
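As an illustration, the three cumulative distributions can be built from a list of class frequencies with a running total. The sketch below reuses the Example 1 accident frequencies (an assumption made for convenience, since each "class" there is a single count); the variable names are illustrative.

```python
from itertools import accumulate

# Example 1 frequencies for 0, 1, 2, 3 and 4 accidents
classes = [0, 1, 2, 3, 4]
frequencies = [10, 7, 6, 5, 2]
total = sum(frequencies)

# Cumulative frequency: number of items with values <= each class
cumulative = list(accumulate(frequencies))

# Cumulative relative and cumulative percent frequency
cumulative_relative = [c / total for c in cumulative]
cumulative_percent = [100 * r for r in cumulative_relative]

for k, c, r, p in zip(classes, cumulative, cumulative_relative, cumulative_percent):
    print(f"<= {k}: {c:>2}  {r:.3f}  {p:.1f}")
```

The last cumulative frequency always equals the total number of items, so the last cumulative relative frequency is 1 and the last cumulative percent frequency is 100.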
Ogive
An ogive is a graph of a cumulative distribution.
The data values are shown on the horizontal axis.
Shown on the vertical axis are the:
• cumulative frequencies, or
• cumulative relative frequencies, or
• cumulative percent frequencies
The frequency (one of the above) of each class is
plotted as a point.
The plotted points are connected by straight lines.
• Because the class limits for the age data are 15-23,
24-32, and so on, there appear to be one-unit gaps
from 23 to 24, 32 to 33, and so on.
• These gaps are eliminated by plotting points
halfway between the class limits.
• Thus, 23.5 is used for the 15-23 class, 32.5 is used
for the 24-32 class, and so on.
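The horizontal positions of the ogive points can be worked out from the class limits alone. The following is a minimal sketch using the age classes of Example 2; it assumes that the gap between adjacent classes is the same throughout (one unit here), and the function name is hypothetical.

```python
# Class limits for the age data in Example 2
classes = [(15, 23), (24, 32), (33, 41), (42, 50), (51, 59), (60, 68)]

def ogive_x_positions(classes):
    """Plot each class at the point halfway between its upper limit
    and the lower limit of the next class."""
    # Gap between adjacent classes, assumed constant (1 for these data)
    gap = classes[1][0] - classes[0][1]
    return [upper + gap / 2 for _, upper in classes]

x_positions = ogive_x_positions(classes)
print(x_positions)
```

Each x position is then paired with the cumulative frequency (or cumulative relative or percent frequency) of its class, and the resulting points are joined by straight lines.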
Ogive with Cumulative Percent Frequencies: Example 2
[Figure: ogive for the age data. Horizontal axis: Age (15 to 68); vertical axis: Cumulative Percent Frequency (20 to 100); the point (50.5, 66) is marked]