Descriptive Statistics for Graduate School

JollyAceDayag1 32 views 91 slides Oct 04, 2024


St. Paul University Philippines
Graduate School
A Course Presentation in
STATISTICS

Note: Most of the Slides were taken from
Elementary Statistics: A Handbook of Slide
Presentation prepared by Z.V.J. Albacea, C.E.
Reano, R.V. Collado, L.N. Comia and N.A.
Tandang in 2005 for the Institute of Statistics,
CAS, UP Los Banos
INTRODUCTION TO
STATISTICS AND
STATISTICAL
INFERENCE

Session 1.3
TEACHING BASIC STATISTICS ….
Definition of Statistics
plural sense: numerical facts, e.g.
number of patients in a Saint Paul
Hospital, profile of patients, etc.
singular sense: the science that deals
with the collection,
presentation, analysis and
interpretation of data.

Session 1.4
Areas of Statistics
Descriptive statistics
methods concerned with
collecting, describing, and
analyzing a set of data
without drawing
conclusions (or inferences)
about a larger group
Inferential statistics
methods concerned
with the analysis of a
subset of data leading
to predictions or
inferences about the
entire set of data

Session 1.5
Example of Descriptive Statistics
Present the Philippine population by constructing a
graph indicating the total number of Filipinos counted
during the last census by age group and sex

Session 1.6
Example of Inferential Statistics
A new milk formulation designed to improve the psychomotor
development of infants was tested on randomly selected infants.
Based on the results, it was concluded that the new milk formulation is
effective in improving the psychomotor development of infants.

Session 1.7
Inferential Statistics
[Diagram: a smaller set (n units/observations) is taken from a larger set (N units/observations); inferences and generalizations are made from the smaller set about the larger set]

Session 1.8
Key Definitions
Parameters are numerical measures
that describe the population or universe
of interest. They are usually denoted by Greek
letters: μ (mu), σ (sigma), ρ (rho), λ (lambda),
τ (tau), θ (theta), α (alpha) and β (beta).
Statistics are numerical measures of a
sample.

Session 1.9
Types of Variables
[Diagram: VARIABLES divide into Qualitative and Quantitative; Quantitative variables divide into Discrete and Continuous]
Qualitative variable
Describes the quality or
character of something
Quantitative variable
Describes the amount or
number of something
a. Discrete
countable
b. Continuous
Measurable (measured
using a continuous scale
such as kilos, cms, grams)
c. Constant

Session 1.10
Levels of Measurement
1. Nominal scale
Numbers or symbols used to classify units
into distinct categories
2. Ordinal scale
Accounts for order; no indication of distance
between positions
3. Interval scale
Equal intervals (fixed unit of measurement);
no absolute zero
4. Ratio scale
Has absolute zero

Session 1.11
Methods of Collecting Data

Objective Method
Subjective Method
Use of Existing Records

Session 1.12
Methods of Presenting Data
Textual
Tabular
Graphical

Note: The Slides were taken from Elementary
Statistics: A Handbook of Slide Presentation
prepared by Z.V.J. Albacea, C.E. Reano, R.V.
Collado, L.N. Comia and N.A. Tandang in 2005
for the Institute of Statistics, CAS, UP Los
Banos
BASIC CONCEPTS IN
SAMPLING AND
SAMPLING TECHNIQUES

Session 3.14
INFERENTIAL STATISTICS
[Diagram: a sample is drawn from the universe through the sampling process; the sample data lead to inferences/generalizations about the universe (subject to uncertainty)]

Session 3.15
WHY DO WE USE SAMPLES?
1. Reduced Cost
2. Greater Speed or Timeliness
3. Greater Efficiency and Accuracy
4. Greater Scope
5. Convenience
6. Necessity

Session 3.16
TWO TYPES OF SAMPLES
1. Probability sample
2. Non-probability sample

Session 3.17
BASIC SAMPLING TECHNIQUES
Simple Random Sampling
Stratified Random Sampling
Systematic Random Sampling
Cluster Sampling

Session 3.18
SIMPLE RANDOM SAMPLING
Most basic method of drawing a
probability sample
Assigns equal probabilities of
selection to each possible sample
Results in a simple random sample
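
A minimal Python sketch of simple random sampling, assuming a hypothetical sampling frame of 20 numbered units and a fixed seed so the draw can be repeated. The function name and the frame are illustrative only. `random.sample` draws without replacement, so every possible sample of size n is equally likely.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every subset of size n has the same chance."""
    rng = random.Random(seed)      # seeded generator (an assumption, for repeatability)
    return rng.sample(frame, n)    # sampling without replacement

frame = list(range(1, 21))         # hypothetical sampling frame: units 1..20
sample = simple_random_sample(frame, n=5, seed=1)
print(sample)
```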

Session 3.19
STRATIFIED RANDOM SAMPLING
The universe is divided into L
mutually exclusive sub-universes
called strata.
Independent simple random
samples are obtained from each
stratum.
Note: $N = \sum_{h=1}^{L} N_h$ and $n = \sum_{h=1}^{L} n_h$

Session 3.20
ILLUSTRATION
[Figure: stratified random sampling; only the group labels A, B, C and D are recoverable]

Determining Adequate Sample Size

Sampling Formula
(Slovin’s)
$n = \dfrac{N}{1 + N e^2}$
Where n = sample size
N = population size
e = margin of error

Example for Slovin’s
Formula
If N = 3000 and e = .05, then n is
$n = \dfrac{3000}{1 + (3000)(.05)^2} = \dfrac{3000}{8.5} = 352.94 \approx 353$
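
The computation above can be sketched in Python as follows. The function name `slovin` is illustrative, and rounding up to the next whole respondent is an assumption (the slide only shows 352.94 becoming 353).

```python
import math

def slovin(N, e):
    """Slovin's formula n = N / (1 + N * e**2), rounded up (assumed convention)."""
    return math.ceil(N / (1 + N * e ** 2))

print(slovin(3000, 0.05))   # 3000 / 8.5 = 352.94..., rounded up to 353
```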

Strata/Department   Number of respondents   Number of samples
Surgery              800                      94
Medical              500                      59
Pedia               1000                     118
OB-Gyne              700                      82
Total               3000                     353
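
The per-department sample sizes in the table are consistent with proportional allocation, where each stratum receives n × N_h / N units. The slide does not name the allocation rule, so the sketch below assumes it; the function name is illustrative and the fourth department is written here as "OB-Gyne".

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of n to strata in proportion to stratum size."""
    N = sum(strata_sizes.values())
    return {name: round(n * size / N) for name, size in strata_sizes.items()}

departments = {"Surgery": 800, "Medical": 500, "Pedia": 1000, "OB-Gyne": 700}
allocation = proportional_allocation(departments, n=353)
print(allocation)                 # {'Surgery': 94, 'Medical': 59, 'Pedia': 118, 'OB-Gyne': 82}
print(sum(allocation.values()))   # 353
```

Rounding each stratum separately happens to add up to 353 here; with other figures the rounded parts may differ from n by a unit or two and need a manual adjustment.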

Session 3.25
SYSTEMATIC SAMPLING
Adopts a skipping pattern in the selection
of sample units
Gives a better cross-section if the listing is
linear in trend but has high risk of bias if
there is periodicity in the listing of units in
the sampling frame
Allows the simultaneous listing and
selection of samples in one operation
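
A minimal sketch of the skipping pattern, assuming a hypothetical list of 100 units, a sampling interval k = N // n, and a random start within the first k units. These choices are common conventions, not details given on the slide.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th unit after a random start among the first k units."""
    k = len(frame) // n              # sampling interval (the skipping pattern)
    rng = random.Random(seed)
    start = rng.randrange(k)         # random start: position 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))          # hypothetical frame: units 1..100
sample = systematic_sample(frame, n=10, seed=3)
print(sample)
```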

Session 3.26
ILLUSTRATION
[Figure: a population and the systematic sample drawn from it]

Session 3.27
CLUSTER SAMPLING
It considers a universe divided into N
mutually exclusive sub-groups called
clusters.
A random sample of n clusters is selected
and their elements are completely
enumerated.
It has simpler frame requirements.
It is administratively convenient to
implement.
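
A minimal sketch of cluster sampling: n whole clusters are selected at random and every element inside them is enumerated. The ward names and elements are hypothetical, as is the function name.

```python
import random

def cluster_sample(clusters, n, seed=None):
    """Select n whole clusters at random and keep all of their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n)          # random sample of cluster names
    return {name: clusters[name] for name in chosen}  # complete enumeration

# Hypothetical universe of 4 mutually exclusive clusters
clusters = {
    "Ward 1": ["a1", "a2", "a3"],
    "Ward 2": ["b1", "b2"],
    "Ward 3": ["c1", "c2", "c3", "c4"],
    "Ward 4": ["d1", "d2"],
}
sample = cluster_sample(clusters, n=2, seed=7)
print(sample)
```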

Session 3.28
ILLUSTRATION
[Figure: a population and the cluster sample drawn from it]

Session 3.29
SIMPLE TWO-STAGE SAMPLING
In the first stage, the units are grouped into N sub-groups,
called primary sampling units (psu’s), and
a simple random sample of n psu’s is selected.
[Illustration: a primary sampling unit]



Session 3.30
SIMPLE TWO-STAGE SAMPLING
In the second stage, from each of the n psu’s
selected, with $M_i$ elements, a simple random sample
of $m_i$ units, called secondary sampling units (ssu’s),
is obtained.
[Illustration: a secondary sampling unit within the sample]
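
The two stages can be sketched as below. For simplicity the sketch draws the same number m of ssu's from every selected psu, whereas the slide allows a different $m_i$ per psu. The psu names, household labels and function name are hypothetical.

```python
import random

def two_stage_sample(psus, n, m, seed=None):
    """Stage 1: SRS of n psu's. Stage 2: SRS of up to m ssu's in each selected psu."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(psus), n)                 # first stage
    return {name: rng.sample(psus[name], min(m, len(psus[name])))
            for name in selected}                          # second stage

# Hypothetical universe: 4 psu's, each listing its elements
psus = {
    "psu 1": ["h1", "h2", "h3", "h4"],
    "psu 2": ["h5", "h6", "h7"],
    "psu 3": ["h8", "h9", "h10", "h11", "h12"],
    "psu 4": ["h13", "h14", "h15"],
}
sample = two_stage_sample(psus, n=2, m=2, seed=11)
print(sample)
```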

Session 1.31
Summary Measures
Central Tendency: Mean, Median, Mode
Location: Maximum, Minimum, Percentile, Quartile, Decile
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Skewness
Kurtosis

Session 1.32
Measures of Central Tendency
A single value that is used to identify
the “center” of the data
it is thought of as a typical value of
the distribution
precise yet simple
most representative value of the
data

Session 1.33
Mean
Most common measure of the center
Also known as arithmetic average
Population Mean:
$\mu = \dfrac{\sum_{i=1}^{N} X_i}{N} = \dfrac{X_1 + X_2 + \cdots + X_N}{N}$
Sample Mean:
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$

Mean:
•It comes in 2 different
forms:
1) Simple Mean
2) Weighted Mean

Example 1:
A study was done on 5 typical fast-food
meals in Metro Manila. The following table
shows the amount of fat, in number of
teaspoons, present in each meal. Calculate
the mean amount of fat for these 5 fast-
food meals.
Fast-food meal   A    B    C    D    E
Fat (in tsp)     14   18   22   10   16

How to solve the simple
mean:
•The simple mean is obtained by
adding all the values/
observations of a certain
variable and dividing the sum by
the total number of values,
cases or observations.

•To obtain the simple mean amount
of fat for the 5 fast-food meals:
•Mean = (14+18+22+10+16)/5
•Mean = 80/5 = 16
•That is, the 5 fast-food meals
contain 16 teaspoons of fat on
average, which is too much.
Fast-food meal   A    B    C    D    E
Fat (in tsp)     14   18   22   10   16
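
The same calculation as a short Python sketch, using the Example 1 data; the variable names are illustrative.

```python
from statistics import mean

fat_tsp = [14, 18, 22, 10, 16]             # meals A to E, in teaspoons

simple_mean = sum(fat_tsp) / len(fat_tsp)  # 80 / 5
print(simple_mean)                         # 16.0
print(mean(fat_tsp))                       # same result from the standard library
```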

Example 2:
•The following represents the final
grades obtained by a nursing
student one summer term:
•Anatomy (5 units) - - - 93
•Chemistry (3 units) - - - 88
•PE 2 (2 units) - - - 89
–Find the student's weighted
average.

To solve for the weighted average
of the student we have...
$\text{Mean} = \dfrac{\sum w_i x_i}{\sum w_i}$
$\text{Mean} = \dfrac{93(5) + 88(3) + 89(2)}{10}$
$\text{Mean} = \dfrac{465 + 264 + 178}{10} = \dfrac{907}{10} = 90.7$ (Excellent)
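
A short sketch of the weighted mean for the Example 2 grades, with the course units as weights; the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grades = [93, 88, 89]   # Anatomy, Chemistry, PE 2
units = [5, 3, 2]       # weights
print(weighted_mean(grades, units))   # 907 / 10 = 90.7
```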

Example 3:
•The following represents the responses of
50 randomly chosen respondents in one
item of a research questionnaire:
•Very Strongly Agree (5) - - - 17
•Strongly Agree (4) - - - 11
•Agree (3) - - - 9
•Disagree (2) - - - 12
•Strongly Disagree (1) - - - 1
–Find the weighted mean response of the
respondents.

To solve for the weighted
response we have...
$\text{Mean} = \dfrac{\sum w_i x_i}{\sum w_i}$
$\text{Mean} = \dfrac{5(17) + 4(11) + 3(9) + 2(12) + 1(1)}{50}$
$\text{Mean} = \dfrac{85 + 44 + 27 + 24 + 1}{50} = \dfrac{181}{50} = 3.62$ (Strongly Agree)

Table of Interpretation
(5 pt. Likert Scale)
4.20 – 5.00   Very Strongly Agree
3.40 – 4.19   Strongly Agree
2.60 – 3.39   Agree
1.80 – 2.59   Disagree
1.00 – 1.79   Strongly Disagree
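
The Example 3 computation and the table lookup can be sketched together. The function names are illustrative, and the cut-offs are taken from the lower bounds in the table of interpretation.

```python
def weighted_response(counts):
    """Mean scale value, weighting each scale point by its number of respondents."""
    total = sum(counts.values())
    return sum(point * freq for point, freq in counts.items()) / total

def interpret(score):
    """Map a mean score to the 5-point Likert labels in the table."""
    if score >= 4.20:
        return "Very Strongly Agree"
    if score >= 3.40:
        return "Strongly Agree"
    if score >= 2.60:
        return "Agree"
    if score >= 1.80:
        return "Disagree"
    return "Strongly Disagree"

responses = {5: 17, 4: 11, 3: 9, 2: 12, 1: 1}   # scale point -> respondents
score = weighted_response(responses)
print(score, interpret(score))                  # 3.62 Strongly Agree
```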

Session 1.43
Properties of the Mean
may not be an actual
observation in the data set
can be applied to data measured at the
interval level or higher
easy to compute
every observation contributes to
the value of the mean

Session 1.44
Properties of the Mean
subgroup means can be combined to come up
with a group mean (use weighted mean)
easily affected by extreme values
[Figure: two dot plots, one on an axis from 0 to 10 with Mean = 5, the other on an axis from 0 to 14 with Mean = 6]

Session 1.45
Median
Divides the observations into two equal
parts
If the number of observations is odd, the
median is the middle number.
If the number of observations is even, the
median is the average of the 2 middle
numbers.
The sample median is denoted as $\tilde{x}$,
while the population median is denoted as $\tilde{\mu}$.

Median for Odd Sample

The array for the data is:
10, 14, 16, 18, 22
•To obtain the median fat content
of the 5 meals we have to use
the median formula for odd
sample since n = 5.
•Median = the [(n + 1)/2]th item
•Median = the [(5 + 1)/2]th item
•Median = 3rd item = 16

Median for Even Sample

The following are sample scores
obtained from a 75-item summative test:
(n = 12) 48, 53, 63, 65, 45, 47, 52, 48,
63, 54, 63, 55
Array: 45, 47, 48, 48, 52, 53, 54, 55, 63, 63, 63, 65
•Since n = 12 (even),
•Median = (6th item + 7th item)/2
•Median = (53 + 54)/2 = 53.5
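
Both median rules can be sketched in one small function. The even-sample call uses the array exactly as listed on the slide; the function name is illustrative.

```python
def median(values):
    """Middle item for odd n; mean of the two middle items for even n."""
    data = sorted(values)                    # arrange the data into an array
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                     # the [(n + 1)/2]th item
    return (data[mid - 1] + data[mid]) / 2   # mean of the (n/2)th and (n/2 + 1)th items

print(median([14, 18, 22, 10, 16]))                                # 16
print(median([45, 47, 48, 48, 52, 53, 54, 55, 63, 63, 63, 65]))    # 53.5
```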

Session 1.50
Properties of a Median
•may not be an actual observation in
the data set
•can be applied to data measured at the
ordinal level or higher
•a positional measure; not affected
by extreme values
[Figure: two dot plots, on axes from 0 to 10 and from 0 to 14; Median = 5]

Session 1.51
Mode
the value that occurs most frequently
nominal average
may or may not exist
[Figure: two dot plots, one on an axis from 0 to 14 with Mode = 9, the other on an axis from 0 to 6 with No Mode]

A set of data is said to be …
Unimodal or monomodal
if it has only one mode.
Example: 33, 35, 35, 38,
40, 46
Its mode is 35.

A set of data is said to be …
Bimodal if it has two
modes.
Example: 33, 35, 35, 38,
40, 40, 46
Its modes are 35 and 40.

A set of data is said to be …
Multimodal if it has more than
two modes.
Example: 33, 35, 35, 38, 40, 40,
46, 46, 51, 58, 58, 60
Its modes are 35, 40, 46 and 58.
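
A small sketch that finds every mode of a data set and reproduces the three examples above. Returning an empty list when all values occur equally often follows the "no mode" convention used in these slides; it is a choice of this sketch, and the function name is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values; [] when every value is equally frequent."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                                     # treated as "no mode"
    return sorted(v for v, c in counts.items() if c == top)

print(modes([33, 35, 35, 38, 40, 46]))                                  # [35]
print(modes([33, 35, 35, 38, 40, 40, 46]))                              # [35, 40]
print(modes([33, 35, 35, 38, 40, 40, 46, 46, 51, 58, 58, 60]))          # [35, 40, 46, 58]
```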

Session 1.55
Properties of a Mode
can be used for qualitative as
well as quantitative data
may not be unique
not affected by extreme values
can be computed for
ungrouped and grouped data


Session 1.57
Mean, Median & Mode
Use the mean when:
sampling stability is desired
other measures are to be
computed

Session 1.58
Mean, Median & Mode
Use the median when:
the exact midpoint of the
distribution is desired
there are extreme
observations

Session 1.59
Mean, Median & Mode
Use the mode when:
the "typical" value is
desired
the dataset is measured
on a nominal scale

Session 1.60
Measures of Location
A Measure of Location summarizes a
data set by giving a value within the
range of the data values that describes
its location relative to the entire data set
arranged according to magnitude (called
an array).
Some Common Measures:
Minimum, Maximum
Percentiles, Deciles, Quartiles

Session 1.61
Maximum and Minimum
Minimum is the smallest value in the
data set, denoted as MIN.
Maximum is the largest value in the
data set, denoted as MAX.

Session 1.62
Percentiles
Numerical measures that give the
relative position of a data value
relative to the entire data set.
Divide an array (raw data arranged
in increasing or decreasing order of
magnitude) into 100 equal parts.
The jth percentile, denoted as $P_j$, is
the data value in the data set
that separates the bottom j% of the
data from the top (100-j)%.

Session 1.63
EXAMPLE
Suppose LJ was told that relative
to the other scores on a certain
test, his score was the 95th percentile.
This means that (at least) 95%
of those who took the test had
scores less than or equal to LJ’s
score, while (at least) 5% had
scores higher than LJ’s.

Session 1.64
Deciles
Divide an array into ten equal
parts, each part having ten
percent of the distribution of
the data values, denoted by $D_j$.
The 1st decile is the 10th percentile;
the 2nd decile is the 20th percentile,
and so on.

Session 1.65
Quartiles
Divide an array into four equal
parts, each part having 25% of
the distribution of the data
values, denoted by $Q_j$.
The 1st quartile is the 25th percentile;
the 2nd quartile is the 50th percentile,
also the median; and the 3rd quartile is
the 75th percentile.
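
The relationships among quartiles, deciles and percentiles can be checked with a short sketch. The 19 scores are hypothetical, and `statistics.quantiles` is used with its default "exclusive" method, which is only one of several percentile conventions (the slides do not state which one they use).

```python
from statistics import quantiles

scores = list(range(1, 20))              # hypothetical array of 19 scores: 1..19

quartiles = quantiles(scores, n=4)       # Q1, Q2, Q3
deciles = quantiles(scores, n=10)        # D1 .. D9
percentiles = quantiles(scores, n=100)   # P1 .. P99

print(quartiles)                                        # [5.0, 10.0, 15.0]
print(quartiles[0] == percentiles[24])                  # Q1 is P25
print(quartiles[1] == deciles[4] == percentiles[49])    # Q2 = D5 = P50 (the median)
print(deciles[0] == percentiles[9])                     # D1 is P10
```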

Session 1.66
Measures of Variation
A measure of variation is a
single value that is used to
describe the spread of the
distribution
A measure of central tendency
alone does not uniquely
describe a distribution

Session 1.67
A look at dispersion…
[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57

Session 1.68
Two Types of Measures of
Dispersion
Absolute Measures of Dispersion:
Range
Inter-quartile Range
Variance
Standard Deviation
Relative Measure of Dispersion:
Coefficient of Variation

Session 1.69
Range (R)
The difference between the maximum and
minimum value in a data set, i.e.

R = MAX – MIN
Example: Pulse rates of 15 male residents of a
certain village
54 58 58 60 62 65 66 71
74 75 77 78 80 82 85
R = 85 - 54 = 31

Session 1.70
Some Properties of the Range
The larger the value of the
range, the more dispersed
the observations are.
It is quick and easy to
understand.
A rough measure of
dispersion.

Session 1.71
Inter-Quartile Range (IQR)
The difference between the third quartile and
first quartile, i.e.

$IQR = Q_3 - Q_1$

Example: Pulse rates of 15 residents of a
certain village
54 58 58 60 62 65 66 71
74 75 77 78 80 82 85
IQR = 78 - 60 = 18
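
A short sketch that reproduces the range and the inter-quartile range for the 15 pulse rates. The quartiles come from `statistics.quantiles` with its default "exclusive" method, which places Q1 and Q3 at the 4th and 12th items of this array and so matches the slide's 60 and 78; other quartile conventions can give slightly different values.

```python
from statistics import quantiles

pulse = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]

data_range = max(pulse) - min(pulse)   # R = MAX - MIN = 85 - 54
q1, q2, q3 = quantiles(pulse, n=4)     # quartiles (default "exclusive" method)
iqr = q3 - q1                          # IQR = Q3 - Q1 = 78 - 60

print(data_range)   # 31
print(q1, q3, iqr)  # 60.0 78.0 18.0
```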

Session 1.72
Some Properties of IQR
Reduces the influence of
extreme values.
Not as easy to calculate
as the Range.

Session 1.73
Variance
important measure of variation
shows variation about the mean
Population variance
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample variance
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Session 1.74
Standard Deviation (SD)
most important measure of variation
square root of Variance
has the same units as the original data
Population SD
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample SD
$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Session 1.75
Computation of Standard Deviation
(Sample) Data: 10 12 14 15 17 18 18 24
n = 8 Mean = 16
$s = \sqrt{\dfrac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{7}} = \sqrt{\dfrac{130}{7}} = 4.309$
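
A sketch of the sample standard deviation computed step by step. The slide's stated mean of 16 and s = 4.309 with a divisor of 7 are reproduced when the value 18 occurs twice (eight observations, so n − 1 = 7), and the sketch assumes that reading of the data.

```python
from statistics import mean, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]   # assumed reading: 18 appears twice, n = 8

xbar = mean(data)                                    # 128 / 8 = 16
sum_sq = sum((x - xbar) ** 2 for x in data)          # squared deviations add to 130
s = (sum_sq / (len(data) - 1)) ** 0.5                # sqrt(130 / 7)

print(xbar, sum_sq)          # 16 130
print(round(s, 3))           # 4.309
print(round(stdev(data), 3)) # 4.309, same result from the standard library
```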

Session 1.76
Remarks on Standard Deviation
If there is a large amount of variation,
then on average, the data values will be
far from the mean. Hence, the SD will be
large.
If there is only a small amount of
variation, then on average, the data
values will be close to the mean. Hence,
the SD will be small.

Session 1.77
Comparing Standard Deviations
(comparable only when units of measure are the same and
the means are not too different from each other)
[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57

Session 1.78
Comparing Standard Deviations
Example: Team A - Heights of five marathon players in inches
Heights: 65", 65", 65", 65", 65"
Mean = 65"
s = 0

Session 1.79
Comparing Standard Deviations
Example: Team B - Heights of five marathon players in inches
Heights: 62", 67", 66", 70", 60"
Mean = 65"
s = 4.0"
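
The two teams can be compared with a short sketch: both have the same mean height, but only Team B shows any spread. Team A is entered as five players of 65 inches each, following the slide title.

```python
from statistics import mean, stdev

team_a = [65, 65, 65, 65, 65]   # heights in inches
team_b = [62, 67, 66, 70, 60]

print(mean(team_a), stdev(team_a))   # mean 65, s = 0: no variation at all
print(mean(team_b), stdev(team_b))   # mean 65, s = 4.0: same center, more spread
```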

Session 1.80
Properties of Standard Deviation
It is the most widely used measure of
dispersion. (Chebyshev’s Inequality)
It is based on all the items and is rigidly
defined.
It is used to test the reliability of measures
calculated from samples.
The standard deviation is sensitive to the
presence of extreme values.
It is not easy to calculate by hand (unlike the
range).

Session 1.81
Chebyshev’s Rule
It permits us to make statements about
the percentage of observations that
must be within a specified number of
standard deviations from the mean.
The proportion of any distribution that
lies within k standard deviations of the
mean is at least $1 - 1/k^2$, where k is any
positive number larger than 1.
This rule applies to any distribution.

Session 1.82
Chebyshev’s Rule
For any data set with mean (μ) and
standard deviation (SD), the following
statements apply:
At least 75% of the observations are
within 2SD of its mean.
At least 88.9% of the observations are
within 3SD of its mean.

Session 1.83
Illustration
[Figure: at least 75% of the observations are within 2SD of the mean]

Session 1.84
Example
The midterm exam scores of 100 STAT 1 students
last semester had a mean of 65 and a standard
deviation of 8 points.
Applying the Chebyshev’s Rule, we can say that:
1. At least 75% of the students had scores
between 49 and 81.
2. At least 88.9% of the students had scores
between 41 and 89.
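
The two statements can be reproduced with a small sketch of Chebyshev's Rule, using the mean of 65 and standard deviation of 8 from the example; the function names are illustrative.

```python
def chebyshev_bound(k):
    """Minimum proportion of observations within k SDs of the mean: 1 - 1/k**2."""
    if k <= 1:
        raise ValueError("k must be larger than 1")
    return 1 - 1 / k ** 2

def chebyshev_interval(mean, sd, k):
    """Interval from mean - k*SD to mean + k*SD."""
    return mean - k * sd, mean + k * sd

for k in (2, 3):
    low, high = chebyshev_interval(65, 8, k)
    print(f"k = {k}: at least {chebyshev_bound(k):.1%} of scores lie between {low} and {high}")
# k = 2: at least 75.0% of scores lie between 49 and 81
# k = 3: at least 88.9% of scores lie between 41 and 89
```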

Session 1.85
Coefficient of Variation (CV)
measure of relative variation
usually expressed in percent
shows variation relative to mean
used to compare 2 or more groups
Formula:
$CV = \dfrac{SD}{\text{Mean}} \times 100\%$

Session 1.86
Comparing CVs
Stock A: Average Price = P50
SD = P5
CV = 10%
Stock B: Average Price = P100
SD = P5
CV = 5%
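
The comparison can be sketched in a few lines: both stocks have the same SD, but Stock A varies more relative to its average price. The function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV = (SD / Mean) x 100, expressed in percent."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(sd=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)   # Stock B
print(cv_a, cv_b)   # 10.0 5.0
```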

Session 1.87
Measure of Skewness
Describes the degree of departure of the
distribution of the data from symmetry.
The degree of skewness is measured by
the coefficient of skewness, denoted as SK
and computed as
$SK = \dfrac{3(\text{Mean} - \text{Median})}{SD}$
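
A sketch of the coefficient of skewness applied to the 15 pulse rates used earlier for the range. The slide does not say whether SD is the sample or the population value; the sample SD is assumed here, and the function name is illustrative.

```python
from statistics import mean, median, stdev

def pearson_skewness(data):
    """SK = 3 * (Mean - Median) / SD, with the sample SD (an assumption)."""
    return 3 * (mean(data) - median(data)) / stdev(data)

pulse = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]
sk = pearson_skewness(pulse)
print(sk)                                   # negative: the mean (about 69.7) is below the median (71)
print(pearson_skewness([1, 2, 3, 4, 5]))    # 0.0 for a symmetric data set
```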

Session 1.88
What is Symmetry?
A distribution is said to be
symmetric about the mean,
if the distribution to the left
of mean is the “mirror
image” of the distribution to
the right of the mean.
Likewise, a symmetric
distribution has SK=0 since
its mean is equal to its
median and its mode.

Session 1.89
Measure of Skewness
SK > 0: positively skewed
SK < 0: negatively skewed

Session 1.90
Measure of Kurtosis
Describes the extent of peakedness or
flatness of the distribution of the data.
Measured by coefficient of kurtosis (K)
computed as
$K = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^4}{N \sigma^4} - 3$

Session 1.91
Measure of Kurtosis
K = 0: mesokurtic
K > 0: leptokurtic
K < 0: platykurtic
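
A sketch of the coefficient of kurtosis and the three labels above, assuming the population form of the formula (fourth moment about the mean divided by the squared population variance, minus 3). Both data sets and the function names are hypothetical.

```python
def excess_kurtosis(data):
    """K = sum((X_i - mu)**4) / (N * sigma**4) - 3, with population mu and sigma."""
    N = len(data)
    mu = sum(data) / N
    variance = sum((x - mu) ** 2 for x in data) / N     # sigma squared
    fourth = sum((x - mu) ** 4 for x in data) / N       # fourth moment about the mean
    return fourth / variance ** 2 - 3

def classify(K, tol=1e-9):
    """Label a distribution by the sign of K."""
    if abs(K) <= tol:
        return "mesokurtic"
    return "leptokurtic" if K > 0 else "platykurtic"

flat = [1, 2, 3, 4, 5]                # hypothetical evenly spread values
peaked = [3, 3, 3, 3, 3, 3, 1, 5]     # hypothetical values piled at the center

for data in (flat, peaked):
    K = excess_kurtosis(data)
    print(round(K, 2), classify(K))   # -1.3 platykurtic, then 1.0 leptokurtic
```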