Data Types and Descriptive Statistics.ppt

Uploaded by Kelly568272, Feb 25, 2025 (47 slides)

About This Presentation

Notes on data types and statistics


Slide Content

GIBSON MANDOZANA
BIOSTATISTICIAN-UZ-CRC
COMMUNITY MEDICINE

Outline
Descriptive Statistics-Definition
Types of data
-Quantitative and Qualitative data
Data presentation
-Bar graph, pie chart, histogram, line graph and boxplot

Descriptive Statistics
Utilizes numerical, tabular and graphical methods to look for patterns in a data set:
-to summarize the information revealed in a data set
-to present that information in a convenient form

Descriptive Statistics
1. Involves
-Presenting data
-Characterizing data
2. Purpose
-Describe data
(Figure: example summary values $\bar{X} = 30.5$ and $S^2 = 113$, shown with a chart of values for quarters Q1 to Q4 on a 0 to 50 scale.)

Types of data
Quantitative data are numerical values that measure some characteristic of an individual, such as height or salary.
There are two types of numerical data:
Continuous data occur when there is no limitation on the values that the characteristic being measured can take (other than the precision of the instrument used to take the measurement).
Example: weight can be 171.2, 171.3, 171.4, etc.
Discrete data are numerical data that can take only separate, countable values.
Example: shoe size, number of brothers (when data represent counts, they are discrete).

Types of data
Qualitative/Categorical data occur when each individual can belong to only one of a number of distinct categories, such as male/female.
Categorical data are expressed not as numbers but in natural-language descriptions, e.g. favorite color = blue.
They can be further classified into two types depending on ordering:
Nominal: the categories are not ordered but simply have names (e.g. blood group A, B, AB, O, or marital status: married/widowed/single). In this case there is no reason to suspect that being married is better (or worse) than being single.
Ordinal: the categories are ordered in some way, e.g. disease staging (advanced, moderate, mild) or degree of pain (severe, moderate, mild, none).

Types of data
-Categorical (Qualitative)
  -Nominal (no ranking)
  -Ordinal (ranked)
-Numerical (Quantitative)
  -Discrete
  -Continuous
    -Interval data
    -Ratio data
Note: Interval data are measured on a scale with equal intervals but no true zero (e.g. temperature in °C), so differences are meaningful but ratios are not.
Ratio data have a true zero, so ratios are also meaningful (e.g. weight, BMI).

Univariate Analysis
Univariate analysis involves examining one variable at a time across all cases. There are three major characteristics of a single variable that we tend to look at:
Frequency distribution
Central tendency
Dispersion
In most situations, we would describe all three of
these characteristics for each of the variables in
our study.
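As a minimal sketch, the three characteristics can be computed for one variable with Python's standard library. The data are the workplace accident counts from Example 1 below. Using the mean, median and mode for central tendency and the range and sample standard deviation for dispersion is our assumption; the slides do not name specific measures. The function name `describe` is hypothetical.

```python
from collections import Counter
import statistics

# Accident counts recorded in April (Example 1), 30 values
accidents = [1, 1, 2, 3, 2, 0,
             3, 0, 1, 1, 1, 3,
             4, 0, 2, 2, 1, 1,
             2, 0, 0, 3, 0, 0,
             0, 3, 4, 0, 0, 2]

def describe(values):
    """Summarize one variable: frequency distribution, central tendency, dispersion."""
    return {
        # Frequency distribution: value -> number of occurrences, sorted by value
        "frequency": dict(sorted(Counter(values).items())),
        # Central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Dispersion
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
    }

summary = describe(accidents)
print(summary)
```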

Frequency distribution
A frequency distribution is a presentation of the number of times (the frequency) that each value (or group of values) occurs in the study population.
It helps to give a picture of the shape of the distribution of the data.
A frequency distribution can be displayed as a
table, a bar chart, a histogram, or a frequency
polygon
The method usually depends on the type of
variable being described.

Frequency distribution -
Qualitative data
Categorical variables are qualitative in nature and
are best displayed as a table or a bar chart.
Example 1: Frequency table; simply shows the
number of times each specific observation
appears in a sample or population.

Example 1
In the month of April, the number of accidents occurring in the workplace was
recorded as follows:

1 1 2 3 2 0
3 0 1 1 1 3
4 0 2 2 1 1
2 0 0 3 0 0
0 3 4 0 0 2

Tally Sheet
No. of Accidents   Tally
0                  ||||| |||||
1                  ||||| ||
2                  ||||| |
3                  |||||
4                  ||
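The tallying step can be sketched in Python, assuming `collections.Counter` does the counting. The helper `tally_marks` is hypothetical and only redraws each count as marks in groups of five, like the sheet above.

```python
from collections import Counter

# Accident counts recorded in April (Example 1)
accidents = [1, 1, 2, 3, 2, 0,
             3, 0, 1, 1, 1, 3,
             4, 0, 2, 2, 1, 1,
             2, 0, 0, 3, 0, 0,
             0, 3, 4, 0, 0, 2]

def tally_marks(count):
    """Write a count as tally marks in groups of five, e.g. 7 -> '||||| ||'."""
    groups, rest = divmod(count, 5)
    parts = ["|||||"] * groups
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)

freq = Counter(accidents)  # value -> number of times it occurs

for value in sorted(freq):
    print(f"{value}  {tally_marks(freq[value])}  ({freq[value]})")
```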

Frequency Distribution
No. of Accidents   Frequency
0                  10
1                  7
2                  6
3                  5
4                  2
Total              30

Relative Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

Relative Frequency and Percent Frequency Distributions
No. of Accidents   Relative Frequency   Percent Frequency
0                  .333                 33.3
1                  .233                 23.3
2                  .200                 20.0
3                  .167                 16.7
4                  .067                 6.7
Total              1.000                100.0
e.g. 2/30 = .067; .333(100) = 33.3%
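A short sketch of the same arithmetic, starting from the frequency table above: each relative frequency is the class frequency divided by the total (30), and each percent frequency is that value times 100. Python prints leading zeros (0.333 rather than .333).

```python
# Frequency distribution from Example 1 (number of accidents -> frequency)
freq = {0: 10, 1: 7, 2: 6, 3: 5, 4: 2}
n = sum(freq.values())  # total number of data items (30)

# Relative frequency: fraction of all data items that fall in the class
relative = {k: f / n for k, f in freq.items()}
# Percent frequency: relative frequency multiplied by 100
percent = {k: 100 * r for k, r in relative.items()}

for k in freq:
    print(f"{k}  {relative[k]:.3f}  {percent[k]:.1f}")
```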

Bar Chart
A bar chart is a graph used to display frequency distributions for nominal and ordinal data.
The various categories into which the observations fall are presented along the horizontal axis.
A vertical bar is drawn above each category, and the height of the bar represents the frequency or relative frequency of observations in that class.
The bars should be of equal width and separated from one another (so as not to imply continuity).

Bar Graph (Example 1)
(Figure: bar graph of frequency, 0 to 10, against number of accidents, 0 to 4.)
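Without a plotting library, the construction rules above can be sketched as a text chart. This is an illustration only, not how the slide's figure was produced; `text_bar_chart` is a hypothetical helper. Categories run along the horizontal axis, each bar's height equals its frequency, and the bars are separated by a space.

```python
# Frequency distribution from Example 1 (number of accidents -> frequency)
freq = {0: 10, 1: 7, 2: 6, 3: 5, 4: 2}

def text_bar_chart(freqs):
    """Return the lines of a vertical bar chart drawn with '#' characters.

    Each category gets one column; columns are separated by a space so that
    the bars do not touch (a bar chart should not imply continuity).
    """
    categories = list(freqs)
    tallest = max(freqs.values())
    lines = []
    for level in range(tallest, 0, -1):       # draw from the top row down
        row = " ".join("#" if freqs[c] >= level else " " for c in categories)
        lines.append(f"{level:>2} | {row}")   # y-axis label, then the row
    lines.append("   +" + "-" * (2 * len(categories)))          # horizontal axis
    lines.append("     " + " ".join(str(c) for c in categories))  # category labels
    return lines

print("\n".join(text_bar_chart(freq)))
```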

Pie Chart
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.

First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.

Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
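Applying the same rule to Example 1 gives each sector's angle. A minimal sketch; the angle is computed as frequency × 360 / total, which equals relative frequency × 360.

```python
# Frequency distribution from Example 1 (number of accidents -> frequency)
freq = {0: 10, 1: 7, 2: 6, 3: 5, 4: 2}
n = sum(freq.values())

# Sector angle = relative frequency x 360 degrees (computed as f * 360 / n)
angles = {k: f * 360 / n for k, f in freq.items()}

for k, a in angles.items():
    print(f"{k} accidents: {a:.0f} degrees")
```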

Pie Chart (Example 1)
(Figure: pie chart of number of accidents: 0 = 33.3%, 1 = 23.3%, 2 = 20.0%, 3 = 16.7%, 4 = 6.7%.)

Frequency distribution-
Numeric variable
Numerical variables are quantitative in nature and are best displayed as a frequency histogram or a frequency polygon.
A frequency histogram shows the frequencies of the classes relative to each other.
The horizontal axis displays the true limits (class boundaries) of the various intervals.
The width of each bar is proportional to the class interval it represents.
Typically there are no spaces between bars in a frequency histogram.

Frequency histogram

Example 2
31 16 22 50 30 42 63 33 56 64 41 37
41 63 31 17 61 53 30 52 32 28 54 36
65 24 41 26 54 49 19 31 56 32 20 54
64 54 52 58 17 19 64 42 23 43 34 33
42 21 41 24 41 64 61 46 34 40 30 43
54 43 45 53 30 43 40 30 25 52 58 28
60 32 24 34 43 23 42 59 54 45 51 41
50 58 44 40 64 24 42 62 46 52 21 25
51 56 50 60 65 40 41 61 16 40 25 55
52 19 41 26 52 56 62 57 16 21 26 29
48 26 62 43 58 25 21 52 42 33 39 26

Frequency Distribution - Quantitative data
Guidelines for Selecting Number of Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of classes.
• Smaller data sets usually require fewer classes.

Frequency Distribution
Guidelines for Selecting Width of Classes
• Use classes of equal width.
• Approximate Class Width =
$$\frac{\text{Largest Data Value} - \text{Smallest Data Value}}{\text{Number of Classes}}$$

Frequency Distribution
For Example 2, if we choose six classes:
Approximate Class Width = (65 - 16)/6 = 8.2, which we round up to 9.
We first prepare a tally sheet.
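The class-width rule can be sketched as follows. Starting the first class at 15 (one below the smallest value, 16) follows the tally sheet for Example 2; it is the author's choice rather than part of the formula.

```python
import math

smallest, largest = 16, 65   # smallest and largest ages in Example 2
num_classes = 6

approx_width = (largest - smallest) / num_classes   # 49 / 6, about 8.2
width = math.ceil(approx_width)                      # round up to 9

# Class limits, starting at 15 as on the Example 2 tally sheet
lower = 15
limits = [(lower + i * width, lower + (i + 1) * width - 1) for i in range(num_classes)]
print(limits)
```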

Tally Sheet
Age       Tally
15 - 23   IIIII IIIII IIIII I
24 - 32   IIIII IIIII IIIII IIIII IIIII II
33 - 41   IIIII IIIII IIIII IIIII II
42 - 50   IIIII IIIII IIIII IIIII II
51 - 59   IIIII IIIII IIIII IIIII IIIII III
60 - 68   IIIII IIIII IIIII II

Frequency Distribution
Age     Frequency
15-23   16
24-32   27
33-41   22
42-50   22
51-59   28
60-68   17
Total   132
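The tallying for Example 2 can be sketched by computing each age's class index directly, assuming equal-width classes of 9 starting at 15 and that every age lies between 15 and 68. The data are the 132 ages listed in Example 2.

```python
# The 132 ages from Example 2
ages = [
    31, 16, 22, 50, 30, 42, 63, 33, 56, 64, 41, 37,
    41, 63, 31, 17, 61, 53, 30, 52, 32, 28, 54, 36,
    65, 24, 41, 26, 54, 49, 19, 31, 56, 32, 20, 54,
    64, 54, 52, 58, 17, 19, 64, 42, 23, 43, 34, 33,
    42, 21, 41, 24, 41, 64, 61, 46, 34, 40, 30, 43,
    54, 43, 45, 53, 30, 43, 40, 30, 25, 52, 58, 28,
    60, 32, 24, 34, 43, 23, 42, 59, 54, 45, 51, 41,
    50, 58, 44, 40, 64, 24, 42, 62, 46, 52, 21, 25,
    51, 56, 50, 60, 65, 40, 41, 61, 16, 40, 25, 55,
    52, 19, 41, 26, 52, 56, 62, 57, 16, 21, 26, 29,
    48, 26, 62, 43, 58, 25, 21, 52, 42, 33, 39, 26,
]

lower, width, num_classes = 15, 9, 6
counts = [0] * num_classes
for age in ages:
    # Integer division maps 15-23 -> 0, 24-32 -> 1, ..., 60-68 -> 5
    counts[(age - lower) // width] += 1

labels = [f"{lower + i * width}-{lower + (i + 1) * width - 1}" for i in range(num_classes)]
for label, c in zip(labels, counts):
    print(f"{label}  {c}")
print("Total", sum(counts))
```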

Relative Frequency and Percent Frequency Distribution
Age     Relative Frequency   Percent Frequency
15-23   .121                 12.1
24-32   .205                 20.5
33-41   .167                 16.7
42-50   .167                 16.7
51-59   .212                 21.2
60-68   .129                 12.9
Total   1.00                 100.0
e.g. 16/132 = .121; .121(100) = 12.1

Histogram
• Another common graphical presentation of quantitative data is a histogram.
• The variable of interest is placed on the horizontal axis.
• A rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency.
• Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

Histogram (Example 2)
(Figure: histogram of age with classes 15-23, 24-32, 33-41, 42-50, 51-59 and 60-68; vertical axis frequency, 0 to 36.)

Histogram
Symmetric
Left tail is the mirror image of the right tail
(Figure: relative frequency histogram, vertical axis 0 to .35.)

Histogram
Moderately Skewed Left
A longer tail to the left
(Figure: relative frequency histogram, vertical axis 0 to .35.)

Histogram
Moderately Right Skewed
A longer tail to the right
(Figure: relative frequency histogram, vertical axis 0 to .35.)

Histogram
Highly Skewed Right
A very long tail to the right
(Figure: relative frequency histogram, vertical axis 0 to .35.)

Frequency polygon
A frequency polygon encloses the same area under the line that a histogram displays within its bars.
It is constructed by placing a point at the midpoint of each class interval, at a height equal to the class frequency.
The points are then connected by straight lines.
Though a frequency polygon may look like a line graph, it must be closed at the ends (by bringing the line down to zero frequency at the midpoints of the empty classes on either side).
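A minimal sketch of the construction for Example 2, assuming the polygon is closed with zero-frequency points one class width below the first midpoint and one above the last. The last two lines check the equal-area statement: with equal class widths, the area under the closed polygon (summed as trapezoids) equals the total histogram area.

```python
# Example 2 class limits and frequencies
classes = [(15, 23), (24, 32), (33, 41), (42, 50), (51, 59), (60, 68)]
freqs = [16, 27, 22, 22, 28, 17]
width = 9

# One point at the midpoint of each class, at height = class frequency
points = [((lo + hi) / 2, f) for (lo, hi), f in zip(classes, freqs)]

# Close the polygon: zero-frequency points at the midpoints of the
# empty classes just below the first class and just above the last one
first_mid, last_mid = points[0][0], points[-1][0]
points = [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

# Area under the polygon (trapezoids between adjacent points) vs histogram area
poly_area = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
hist_area = width * sum(freqs)
print(points, poly_area, hist_area)
```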

Histogram and Frequency Polygon
(Figure: histogram of birth weight (kg), 2 to 5.25 kg, with the frequency polygon overlaid and the mode marked; vertical axis frequency, 0 to 300.)

Other ways of presenting quantitative data
Scatter plot
Box-plot
Line graph
Ogive

Scatter plot
Used to depict the relationship between two different
continuous measurements.
Each point on the graph represents a pair of values.
(Figure: scatter plot of FVC against FEV1.)

Box plot
Uses summary measures (minimum, first quartile, median, third quartile and maximum) to summarize a continuous or discrete variable; the box spans the interquartile range, from Q1 to Q3.
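The summary measures behind a box plot can be sketched with `statistics.quantiles` (Python 3.8+). Quartile conventions differ between textbooks and software; the `method="inclusive"` choice here is an assumption, so quartiles from other tools may differ slightly. The function names are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: the numbers a box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def iqr(values):
    """Interquartile range Q3 - Q1: the length of the box."""
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```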

Line graph
Similar to a scatter plot, but each value on the horizontal axis has a single corresponding measurement on the vertical axis.
Adjacent points are connected by straight lines.
The horizontal axis is commonly the time variable.

Cumulative Distributions
Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
Cumulative relative frequency distribution: shows the proportion of items with values less than or equal to the upper limit of each class.
Cumulative percent frequency distribution: shows the percentage of items with values less than or equal to the upper limit of each class.

Cumulative Distributions
Example 2
Age    Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 23   16                     .121                            12.1
≤ 32   43                     .326                            32.6
≤ 41   65                     .492                            49.2
≤ 50   87                     .659                            65.9
≤ 59   115                    .871                            87.1
≤ 68   132                    1.00                            100.0
e.g. 16 + 27 = 43; 43/132 = .326; .326(100) = 32.6
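The cumulative columns are running totals of the Example 2 frequencies. A minimal sketch using `itertools.accumulate`; rounding 87/132 to three places gives .659.

```python
from itertools import accumulate

classes = ["15-23", "24-32", "33-41", "42-50", "51-59", "60-68"]
freqs = [16, 27, 22, 22, 28, 17]
n = sum(freqs)  # 132

cum_freq = list(accumulate(freqs))        # items <= upper limit of each class
cum_rel = [c / n for c in cum_freq]       # as a proportion of all items
cum_pct = [100 * r for r in cum_rel]      # as a percentage

for label, c, r, p in zip(classes, cum_freq, cum_rel, cum_pct):
    print(f"{label}  {c}  {r:.3f}  {p:.1f}")
```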

Ogive

An ogive is a graph of a cumulative distribution.

The data values are shown on the horizontal axis.

Shown on the vertical axis are the:
•cumulative frequencies, or
•cumulative relative frequencies, or
•cumulative percent frequencies

The frequency (one of the above) of each class is plotted as a point.

The plotted points are connected by straight lines.

Ogive
•Because the class limits for the age data are 15-23, 24-32, and so on, there appear to be one-unit gaps from 23 to 24, 32 to 33, and so on.
•These gaps are eliminated by plotting points halfway between the class limits.
•Thus, 23.5 is used for the 15-23 class, 32.5 is used for the 24-32 class, and so on.
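The ogive's points for Example 2 can be sketched by pairing each class's upper boundary (upper limit + 0.5) with its cumulative percent frequency. The starting point (14.5, 0) at the first class's lower boundary is a common convention that we assume here; the slides do not state it.

```python
from itertools import accumulate

upper_limits = [23, 32, 41, 50, 59, 68]   # Example 2 class upper limits
freqs = [16, 27, 22, 22, 28, 17]
n = sum(freqs)

# Plot halfway between a class's upper limit and the next class's lower limit
boundaries = [u + 0.5 for u in upper_limits]
cum_pct = [100 * c / n for c in accumulate(freqs)]

# Assumed starting point: 0% at the lower boundary of the first class
points = [(14.5, 0.0)] + list(zip(boundaries, cum_pct))

for x, y in points:
    print(f"({x}, {y:.1f})")
```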

Ogive with Cumulative Percent Frequencies (Example 2)
(Figure: ogive of cumulative percent frequency, 0 to 100, against age, 15 to 68; the point (50.5, 66) is labelled.)

THANK YOU
SIYABONGA (Ndebele: thank you)
TATENDA (Shona: thank you)