DATA ANALYSIS FOR BUSINESS ch02-Discriptive Statistics_Tabular and Graphical Methods.ppt
kellymeinhold327
44 slides
May 09, 2024
About This Presentation
DATA ANALYSIS FOR BUSINESS CH2
Slide Content
Chapter 2
Descriptive Statistics: Tabular and Graphical Methods
Graphically Summarizing Qualitative Data
Graphically Summarizing Quantitative Data
Stem-and-Leaf Displays
Misleading Graphs and Charts
2.1 Graphically Summarizing Qualitative Data
With qualitative data, names identify the different categories.
This data can be summarized using a frequency distribution.
Frequency distribution: A table that summarizes the number (or frequency) of items in each of several non-overlapping classes.
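For qualitative data, building a frequency distribution amounts to counting how often each category label occurs. A minimal Python sketch, assuming the responses are stored as a list of strings; the labels below are hypothetical placeholders, not the Table 2.1 pizza data.

```python
from collections import Counter

# Hypothetical survey responses (placeholders, NOT the Table 2.1 data)
responses = ["Brand A", "Brand B", "Brand A", "Brand C", "Brand A", "Brand B"]


def frequency_distribution(labels):
    """Map each category to its frequency, most frequent category first."""
    return dict(Counter(labels).most_common())


freq = frequency_distribution(responses)
for category, count in freq.items():
    print(f"{category:<10}{count:>5}")
print(f"{'Total':<10}{sum(freq.values()):>5}")
```

Because the classes are the category names themselves, they are automatically non-overlapping, and the frequencies always sum to the number of responses.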
Describing Pizza Preferences
A business entrepreneur plans to open a pizza restaurant in a college town and wishes to study the pizza preferences of the college students.
Table 2.1 lists the pizza preferences of 50 college students.
As a raw list, Table 2.1 does not reveal much useful information.
Table 2.1
Example 2.1
A frequency distribution is a useful summary.
The frequency distribution shows us how the preferences are distributed among the six restaurants.
Papa John's is the most popular restaurant.
Papa John's is roughly twice as popular as the next three runners-up: Bruno's, Little Caesars, and Will's.
Pizza Hut and Domino's are the least preferred restaurants.
Table 2.2
Relative Frequency and Percent Frequency
Relative frequency summarizes the proportion (or fraction) of items in each class.
If the data set consists of n observations, then
$\text{Relative frequency of a class} = \dfrac{\text{frequency of the class}}{n}$
Multiply the relative frequency by 100 to obtain the percent frequency.
Table 2.3
Bar Charts and Pie Charts
Bar chart: A vertical or horizontal rectangle represents the frequency for each category.
The height can be frequency, relative frequency, or percent frequency.
Pie chart: A circle divided into slices, where the size of each slice represents its relative frequency or percent frequency.
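Bar heights and pie slices come from the same relative frequencies: a slice's angle is its relative frequency times 360 degrees. A minimal Python sketch, borrowing the Jeep sales counts from Table 2.4 purely for illustration; the helper `chart_values` is a name chosen for this sketch, and the crude text bar stands in for the chart a spreadsheet or plotting library would draw.

```python
# Frequencies from Table 2.4 (Jeep vehicles sold in 2006)
jeep_sales = {"Commander": 71, "Grand Cherokee": 70, "Liberty": 80, "Wrangler": 30}


def chart_values(freq):
    """Return {category: (bar height in percent, pie slice angle in degrees)}."""
    n = sum(freq.values())
    return {cat: (100 * count / n, 360 * count / n) for cat, count in freq.items()}


values = chart_values(jeep_sales)
for cat, (pct, angle) in values.items():
    bar = "#" * round(pct / 2)  # crude text bar: one '#' per 2 percentage points
    print(f"{cat:<15}{bar:<18}{pct:6.2f}%   slice {angle:5.1f} deg")
```

The slice angles add up to 360 degrees for the same reason the percent frequencies add up to 100%.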
Excel Bar and Pie Charts of the Pizza Preference Data
Figures 2.1 and 2.2
Exercise 2.1
Jeep Model        Frequency   Relative Frequency   Percent Frequency
Commander         71          0.2829               28.29%
Grand Cherokee    70          0.2789               27.89%
Liberty           80          0.3187               31.87%
Wrangler          30          0.1195               11.95%
Total             251         1.0000               100.00%
Table 2.4
Table 2.4 is the frequency distribution of vehicles sold in 2006 by the Greater Cincinnati Jeep dealers.
Please find the relative frequency and percent frequency.
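Exercise 2.1 can be checked by dividing each class frequency by n = 251. A minimal Python sketch; the helper name `relative_frequencies` is chosen here for illustration.

```python
# Frequencies from Table 2.4
jeep_sales = {"Commander": 71, "Grand Cherokee": 70, "Liberty": 80, "Wrangler": 30}


def relative_frequencies(freq):
    """Relative frequency = class frequency / n, for every class."""
    n = sum(freq.values())
    return {cls: count / n for cls, count in freq.items()}


rel = relative_frequencies(jeep_sales)
print(f"{'Jeep Model':<16}{'Frequency':>10}{'Relative':>10}{'Percent':>10}")
for model, r in rel.items():
    # the '%' format spec multiplies by 100, giving the percent frequency
    print(f"{model:<16}{jeep_sales[model]:>10}{r:>10.4f}{r:>10.2%}")
print(f"{'Total':<16}{sum(jeep_sales.values()):>10}{sum(rel.values()):>10.4f}")
```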
Comparison
Percentage of Automobiles Sold by Manufacturer, 1970 versus 1997
Figures 2.3 and 2.4
2.2 Graphically Summarizing Quantitative Data
We often need to summarize and describe the shape of the distribution of a population or sample of measurements.
Summarize quantitative data by using a frequency distribution: a list of data classes with the count, or “frequency,” of values that belong to each class (“classify and count”).
The frequency distribution is a table.
Histogram: a picture of the frequency distribution.
Constructing the frequency distribution
Steps in making a frequency distribution:
1. Determine the number of classes K
2. Determine the class length
3. Form non-overlapping classes of equal width
4. Tally and count the number of measurements in each class
5. Graph the histogram
Example 2.2
The Payment Time Case: Reducing Payment Times
In order to assess the effectiveness of the system, the consulting firm will study the payment times for invoices processed during the first three months of the system’s operation.
During this period, 7,823 invoices are processed using the new system. To study the payment times of these invoices, the consulting firm numbers the invoices from 0001 to 7823 and uses random numbers to select a random sample of 65 invoices. The resulting 65 payment times are given in Table 2.5.
Table 2.5 A Sample of Payment Times (in Days) for 65 Randomly Selected Invoices
22 29 16 15 18 17 12 13 17 16 15
19 17 10 21 15 14 17 18 12 20 14
16 15 16 20 22 14 25 19 23 15 19
18 23 22 16 16 19 13 18 24 24 26
13 18 17 15 24 15 17 14 18 17 21
16 21 25 19 20 27 16 17 16 21
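Example 2.2 draws its 65 invoices with random numbers. One way to sketch that step in Python is `random.sample`, which draws distinct items (sampling without replacement) from the invoice numbers 1 to 7823. The helper name `select_invoices` and the seed value are assumptions made for this sketch, not the consulting firm's actual procedure.

```python
import random

N_INVOICES = 7823   # invoices are numbered 0001 to 7823
SAMPLE_SIZE = 65


def select_invoices(seed=None):
    """Draw a simple random sample of invoice numbers, without replacement."""
    rng = random.Random(seed)  # a fixed seed only makes this sketch reproducible
    return sorted(rng.sample(range(1, N_INVOICES + 1), SAMPLE_SIZE))


sample = select_invoices(seed=2024)
print([f"{k:04d}" for k in sample[:5]], "...")
```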
Step 1: The number of classes K
Group all of the n data values into K classes.
K is the smallest whole number for which $2^K \ge n$.
In Example 2.2, $n = 65$:
For $K = 6$, $2^6 = 64 < n$.
For $K = 7$, $2^7 = 128 > n$.
So use $K = 7$ classes.
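The rule in Step 1 is a simple search for the first power of two that reaches n. A minimal Python sketch; the function name `number_of_classes` is chosen for illustration.

```python
def number_of_classes(n):
    """Smallest whole number K such that 2**K >= n (the rule on this slide)."""
    k = 0
    while 2 ** k < n:
        k += 1
    return k


print(number_of_classes(65))  # 7, since 2**6 = 64 < 65 <= 128 = 2**7
```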
Step 2: Class Length L
Class length L is the step size from one class boundary to the next.
In Example 2.2, The Payment Time Case, the largest value is 29 days and the smallest value is 10 days, so
$L = \dfrac{\text{largest value} - \text{smallest value}}{K} = \dfrac{(29 - 10)\ \text{days}}{7\ \text{classes}} = \dfrac{19\ \text{days}}{7\ \text{classes}} = 2.7143\ \text{days/class}$
Arbitrarily round the class length up to 3 days/class.
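The class-length formula in Step 2 is easy to check numerically. A minimal Python sketch, assuming the raw length is rounded up to a whole number of days (matching the slide's choice of 3); the function name `class_length` is hypothetical.

```python
import math


def class_length(largest, smallest, k):
    """Return the raw length (largest - smallest) / K and that length rounded up."""
    raw = (largest - smallest) / k
    # Rounding up to a whole day mirrors the slide's choice of 3 days/class.
    # Caveat: if raw is already a whole number, smallest + K * L equals the
    # largest value, which would fall outside the last class (lower <= x < upper).
    return raw, math.ceil(raw)


raw, L = class_length(29, 10, 7)
print(f"raw L = {raw:.4f} days/class, rounded up to {L}")
```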
Step 3: Form non-overlapping classes of equal width (define the class boundaries)
The classes start at the smallest data value. This is the lower boundary of the first class. The upper boundary of the first class is the smallest value + L.
• In Example 2.2, the lower boundary of the first class is 10 and the upper boundary is 10 + 3 = 13. So the first class (10 days and less than 13 days, 10 ≤ x < 13) includes 10, 11, and 12 days.
The lower boundary of the second class is the upper boundary of the first class. The upper boundary of the second class is found by adding L to this lower boundary.
• In Example 2.2, the second class (13 days and less than 16 days, 13 ≤ x < 16) includes 13, 14, and 15 days.
And so on.
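Step 3 generates each boundary by repeatedly adding L to the previous one. A minimal Python sketch; the function name `class_boundaries` is chosen for illustration, and each pair means lower ≤ x < upper.

```python
def class_boundaries(smallest, length, k):
    """(lower, upper) pairs; each class holds values with lower <= x < upper."""
    return [(smallest + i * length, smallest + (i + 1) * length) for i in range(k)]


for lower, upper in class_boundaries(10, 3, 7):
    print(f"{lower} < {upper}")
```

For Example 2.2 this prints the seven classes 10 < 13 through 28 < 31 used in Table 2.6.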
Step 4: Tallies and Frequencies
Classes (days)   Tally                          Frequency
10 < 13          |||                            3
13 < 16          ||||| ||||| ||||               14
16 < 19          ||||| ||||| ||||| ||||| |||    23
19 < 22          ||||| ||||| ||                 12
22 < 25          ||||| |||                      8
25 < 28          ||||                           4
28 < 31          |                              1
Total                                           65
Check: All frequencies must sum to n.
Table 2.6
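Steps 1 to 4 can be checked end to end in Python with the 65 payment times from Table 2.5. The helper `tally` is a name chosen for this sketch; it assumes classes of the form lower ≤ x < upper and raises an error for any value outside the K classes instead of silently dropping it.

```python
# Table 2.5: payment times (days) for 65 randomly selected invoices
payment_times = [
    22, 29, 16, 15, 18, 17, 12, 13, 17, 16, 15,
    19, 17, 10, 21, 15, 14, 17, 18, 12, 20, 14,
    16, 15, 16, 20, 22, 14, 25, 19, 23, 15, 19,
    18, 23, 22, 16, 16, 19, 13, 18, 24, 24, 26,
    13, 18, 17, 15, 24, 15, 17, 14, 18, 17, 21,
    16, 21, 25, 19, 20, 27, 16, 17, 16, 21,
]


def tally(data, smallest, length, k):
    """Count how many values fall in each class [smallest + i*L, smallest + (i+1)*L)."""
    counts = [0] * k
    for x in data:
        index = int((x - smallest) // length)  # which class x falls in
        if not 0 <= index < k:
            raise ValueError(f"{x} lies outside the classes")
        counts[index] += 1
    return counts


freqs = tally(payment_times, smallest=10, length=3, k=7)
for i, f in enumerate(freqs):
    lower = 10 + 3 * i
    print(f"{lower} < {lower + 3}: {f}")
print("total:", sum(freqs))  # must equal n = 65
```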
Step 5: Graph the histogram
Show the frequency distribution in a histogram
Figure 2.5
Histogram
A graph in which rectangles represent the classes
The base of each rectangle represents the class length.
The height of each rectangle represents:
the frequency in a frequency histogram, or
the relative frequency in a relative frequency histogram
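Without a plotting library, the histogram idea can still be sketched as text: one row per class, with a bar whose length equals the class frequency. A minimal Python sketch using the Table 2.6 frequencies; the function name `text_histogram` and the sideways layout are choices made for this illustration.

```python
# Classes and frequencies from Table 2.6 (Example 2.2)
classes = [(10, 13), (13, 16), (16, 19), (19, 22), (22, 25), (25, 28), (28, 31)]
freqs = [3, 14, 23, 12, 8, 4, 1]


def text_histogram(classes, freqs, symbol="*"):
    """Sideways text histogram: one row per class, bar length = frequency.

    With equal class lengths, bar length is proportional to rectangle area,
    so this reads like a frequency histogram turned on its side.
    """
    return [
        f"{lower:>2} < {upper:<2} | {symbol * f} {f}"
        for (lower, upper), f in zip(classes, freqs)
    ]


for row in text_histogram(classes, freqs):
    print(row)
```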
Relative Frequency, Percent Frequency
The relative frequency of a class is the proportion or fraction of data that is contained in that class.
It is calculated by dividing the class frequency by the total number of data values.
Relative frequency may be expressed as either a decimal or a percent (percent frequency distribution).
A relative frequency distribution is a list of all the data classes and their associated relative frequencies.
For example:
Classes (days)   Frequency   Relative Frequency   Percent Frequency
10 < 13          3           3/65 = 0.0462        4.62%
13 < 16          14          14/65 = 0.2154       21.54%
…                …           …                    …
Relative Frequency: Example 2.2
Classes (days)   Frequency   Relative Frequency
10 < 13          3           3/65 = 0.0462
13 < 16          14          14/65 = 0.2154
16 < 19          23          0.3538
19 < 22          12          0.1846
22 < 25          8           0.1231
25 < 28          4           0.0615
28 < 31          1           0.0154
Total            65          1.0000
Check: All relative frequencies must sum to 1.
Table 2.7
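Table 2.7 can be reproduced by dividing each Table 2.6 frequency by n = 65. A minimal Python sketch; the class lower boundaries are generated from the Example 2.2 choices (start 10, length 3).

```python
freqs = [3, 14, 23, 12, 8, 4, 1]  # Table 2.6 frequencies
n = sum(freqs)                     # 65
rel = [f / n for f in freqs]       # relative frequency = frequency / n

for lower, f, r in zip(range(10, 31, 3), freqs, rel):
    print(f"{lower} < {lower + 3}  {f:>3}/{n} = {r:.4f}  ({r:.2%})")
print(f"sum of relative frequencies = {sum(rel):.4f}")
```

The rounded values may not always sum to exactly 1.0000; the unrounded ones do, up to floating-point error.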
Relative Frequency Histogram
Example 2.2: The Payment Times Case
Figure 2.6
The tail on the right appears to be longer than the tail on the left, so we say the distribution is skewed to the right.
Remarks
The procedure introduced here is not the only way to construct a histogram.
e.g., it is not necessary to set the lower boundary of the first class equal to the smallest measurement.
Sometimes it is desirable to let the nature of the problem determine the histogram classes, e.g., 10-year class lengths for the ages of the residents in a city.
Sometimes a histogram with unequal class lengths is better, e.g., when there are open-ended classes.
Figure 2.7
Some common distribution shapes
Figure 2.8: Right skewed, left skewed, and symmetric distributions
Skewness
Skewed distributions are not symmetrical about their center. Rather, they are lopsided, with a longer tail on one side or the other.
• A population is distributed according to its relative frequency curve.
• The skew is the side with the longer tail.
Figure 2.9: Right skewed, left skewed, and symmetric distributions
Frequency Polygons
Plot a point above each class midpoint at a height equal to the frequency of the class.
Useful when comparing two or more distributions.
Table 2.8
Example 2.3 Comparing Two Grade Distributions
32 63 69 85 91
45 64 69 86 92
50 64 72 87 92
56 65 76 87 93
58 66 78 88 93
60 67 81 89 94
61 67 83 90 96
61 68 83 90 98
Scores for Statistics Exam 1 (in increasing order)
Classes   Frequency   Percent Frequency
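The frequency polygon rule needs only each class midpoint, (lower + upper) / 2, paired with the class frequency. This minimal Python sketch uses the payment-time classes from Example 2.2 rather than the exam scores, purely as an illustration; the function name `polygon_points` is hypothetical.

```python
# Payment-time classes and frequencies (Table 2.6), used only for illustration
classes = [(10, 13), (13, 16), (16, 19), (19, 22), (22, 25), (25, 28), (28, 31)]
freqs = [3, 14, 23, 12, 8, 4, 1]


def polygon_points(classes, freqs):
    """(class midpoint, frequency) pairs; joining them in order gives the polygon."""
    return [((lower + upper) / 2, f) for (lower, upper), f in zip(classes, freqs)]


for x, y in polygon_points(classes, freqs):
    print(f"({x}, {y})")
```

To compare two distributions, one would compute a point list for each and draw both polygons on the same axes.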
Cumulative Distributions
Another way to summarize a distribution is to construct a cumulative distribution.
To do this, use the same number of classes, class lengths, and class boundaries used for the frequency distribution.
Rather than a count, we record the number of measurements that are less than the upper boundary of each class.
In other words, a running total.
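The running total described above is exactly what `itertools.accumulate` computes. A minimal Python sketch applied to the Example 2.2 payment-time frequencies (Table 2.6), chosen here only as illustrative input.

```python
from itertools import accumulate

upper_boundaries = [13, 16, 19, 22, 25, 28, 31]   # Example 2.2 classes
freqs = [3, 14, 23, 12, 8, 4, 1]                  # Table 2.6

cumulative = list(accumulate(freqs))  # running total of the frequencies
for upper, cum in zip(upper_boundaries, cumulative):
    print(f"less than {upper} days: {cum}")
```

The last cumulative frequency always equals n, which is a quick check on the table.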
Various Frequency Distributions
Table 2.10
Ogive
Ogive: A graph of a cumulative distribution
Plot a point above each upper class boundary at a height equal to the cumulative frequency.
Connect the points with line segments.
Can also be drawn using:
Cumulative relative frequencies
Cumulative percent frequencies
Figure 2.14
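The points of an ogive pair each upper class boundary with a cumulative count, or with a cumulative percent when the percent version is drawn. A minimal Python sketch using the Example 2.2 payment-time classes as illustrative input; the function name `ogive_points` and its `percent` flag are hypothetical.

```python
upper_boundaries = [13, 16, 19, 22, 25, 28, 31]   # Example 2.2 classes
freqs = [3, 14, 23, 12, 8, 4, 1]                  # Table 2.6


def ogive_points(upper_boundaries, freqs, percent=True):
    """(upper class boundary, cumulative percent or cumulative frequency) pairs."""
    n = sum(freqs)
    points, running = [], 0
    for upper, f in zip(upper_boundaries, freqs):
        running += f
        points.append((upper, 100 * running / n if percent else running))
    return points


for x, y in ogive_points(upper_boundaries, freqs):
    print(f"({x}, {y:.2f}%)")
```

Connecting these points with line segments gives the cumulative percent ogive.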
2.3 Stem-and-Leaf Displays
The purpose is to see the overall pattern of the data by grouping the data into classes, which shows:
the variation from class to class
the amount of data in each class
the distribution of the data within each class
Best for small to moderately sized data distributions.
Car Mileage Example
Table 2.11
Example 2.4
The stem-and-leaf display of car mileages:
29 8
30 1 3 4 5 5 6 7 7 8 8 8
31 0 0 1 2 3 3 4 4 4 4 4 5 5 6 6 7 7 7 8 8 9 9
32 0 1 1 1 2 3 3 4 4 5 5 7 7 8
33 0 3
Stem unit = 1, leaf unit = 0.1. For example:
29 + 0.8 = 29.8
33 + 0.0 = 33.0
33 + 0.3 = 33.3
Figure 2.15
Splitting The Stems
There are no rules that dictate the number of stem values, so we can split the stems as needed.
Starred classes (*) extend from 0.0 to 0.4.
Unstarred classes extend from 0.5 to 0.9.
29 8
30 * 1 3 4
30 5 5 6 7 7 8 8 8
31 * 0 0 1 2 3 3 4 4 4 4 4
31 5 5 6 6 7 7 7 8 8 9 9
32 * 0 1 1 1 2 3 3 4 4
32 5 5 7 7 8
33 * 0 3
Figure 2.16
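Splitting stems can be sketched in Python by working in integer tenths and sending leaves 0 to 4 to the starred row and leaves 5 to 9 to the unstarred row. The mileages below are read back from the split-stem display in Figure 2.16 rather than taken from Table 2.11, and the `|` separator and the function name `split_stem_display` are choices made for this sketch.

```python
# Car mileages (mpg) read back from the split-stem display in Figure 2.16
mileages = [
    29.8,
    30.1, 30.3, 30.4, 30.5, 30.5, 30.6, 30.7, 30.7, 30.8, 30.8, 30.8,
    31.0, 31.0, 31.1, 31.2, 31.3, 31.3, 31.4, 31.4, 31.4, 31.4, 31.4,
    31.5, 31.5, 31.6, 31.6, 31.7, 31.7, 31.7, 31.8, 31.8, 31.9, 31.9,
    32.0, 32.1, 32.1, 32.1, 32.2, 32.3, 32.3, 32.4, 32.4,
    32.5, 32.5, 32.7, 32.7, 32.8,
    33.0, 33.3,
]


def split_stem_display(values):
    """Stem unit 1, leaf unit 0.1; '*' rows hold leaves 0-4, plain rows 5-9."""
    rows = {}
    for x in sorted(values):
        stem, leaf = divmod(round(x * 10), 10)  # integer tenths avoid float noise
        rows.setdefault((stem, leaf >= 5), []).append(leaf)

    (stem, high), last = min(rows), max(rows)
    lines = []
    while True:
        label = f"{stem}" if high else f"{stem} *"
        leaves = " ".join(str(leaf) for leaf in rows.get((stem, high), []))
        lines.append(f"{label:<5}| {leaves}".rstrip())
        if (stem, high) == last:
            return lines
        # advance: starred row -> unstarred row of same stem -> next stem's starred row
        stem, high = (stem, True) if not high else (stem + 1, False)


for line in split_stem_display(mileages):
    print(line)
```

Empty rows between the first and last occupied rows would still be printed, so gaps in the data stay visible.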
Looking at the last stem-and-leaf display, the distribution appears almost “symmetrical.”
The upper portion of the display (stems 29, 30*, 30, and 31*) is almost a mirror image of the lower portion of the display (stems 31, 32*, 32, and 33*).
Constructing a Stem-and-Leaf Display
1. Decide what units will be used for the stems and the leaves. As a general rule, choose units for the stems so that there will be somewhere between 5 and 20 stems.
2. Place the stems in a column with the smallest stem at the top of the column and the largest stem at the bottom.
3. Enter the leaf for each measurement into the row corresponding to the proper stem. The leaves should be single-digit numbers (rounded values).
4. If desired, rearrange the leaves so that they are in increasing order from left to right.
Constructing a Stem-and-Leaf Display
It is possible to construct a stem-and-leaf display from measurements containing any number of digits.
Example 2.5
Table 2.13: Number of DVD players sold for each of the last 12 months
13,502   15,932   14,739
15,249   14,312   17,111
19,010   16,121   16,708
17,886   15,665   16,475
Stem-and-leaf plot for Players Sold (stem unit = 1000, leaf unit = 100)
Frequency   Stem   Leaf
1           13     5
2           14     3 7
3           15     2 7 9
3           16     1 5 7
2           17     1 9
0           18
1           19     0
12
Figure 2.17
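The construction steps can be sketched in Python for the DVD data in Table 2.13. The sketch assumes each value is rounded half-up to the nearest 100 (the leaf unit) before it is split into a stem (thousands) and a single-digit leaf; the function name `stem_and_leaf` is hypothetical. Empty stems such as 18 are kept so gaps stay visible, and the frequency column is the number of leaves on each stem.

```python
# Table 2.13: DVD players sold in each of the last 12 months
dvd_sales = [13502, 15932, 14739, 15249, 14312, 17111,
             19010, 16121, 16708, 17886, 15665, 16475]


def stem_and_leaf(values, leaf_unit=100):
    """Return rows 'frequency  stem | leaves' (stem unit = 10 * leaf_unit)."""
    # round half up to the nearest leaf unit (valid for non-negative integers)
    rounded = [(v + leaf_unit // 2) // leaf_unit for v in values]
    stems = {}
    for r in sorted(rounded):
        stem, leaf = divmod(r, 10)
        stems.setdefault(stem, []).append(leaf)
    rows = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems visible
        leaves = stems.get(stem, [])
        leaf_text = " ".join(str(leaf) for leaf in leaves)
        rows.append(f"{len(leaves):>3}  {stem:>3} | {leaf_text}".rstrip())
    return rows


for row in stem_and_leaf(dvd_sales):
    print(row)
```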
Back-to-Back Stem-and-Leaf Display
         Exam 1 | Stem | Exam 2
              2 |  3   |
                |  3   |
                |  4   |
              5 |  4   |
              0 |  5   |
            8 6 |  5   | 5
    4 4 3 1 1 0 |  6   | 2 3
  9 9 8 7 7 6 5 |  6   | 6 7 7
              2 |  7   | 1 3 4 4 4
            8 6 |  7   | 5 6 7 7 8
          3 3 1 |  8   | 0 2 3 4
    9 8 7 7 6 5 |  8   | 5 6 6 7 7 8 9
4 3 3 2 2 1 0 0 |  9   | 0 1 1 2 3 3 4 4
            8 6 |  9   | 5 7 9
We can construct a back-to-back stem-and-leaf display if we wish to compare two distributions.
Conclusion:
Exam 1: two concentrations of scores (bimodal)
Exam 2: almost single-peaked and somewhat skewed to the left
Figure 2.18
Example 2.6
Description of Quantitative Data: Tables and Graphs
Stem-and-leaf display
Frequency distributions
Histogram
Dot plot
2.4 Misleading Graphs and Charts
Scale Break
Break the vertical scale to exaggerate an effect
Mean Salaries at a Major University, 2002-2005
Figure 2.19
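A broken vertical axis changes how tall the bars look relative to each other. With the axis starting at zero, drawn bar heights are proportional to the values; starting the axis at some higher value c makes them proportional to (value − c) instead. A minimal Python sketch with hypothetical salaries, not the data behind Figure 2.19; the function name `apparent_ratio` is chosen for illustration.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights for values a and b when the axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)


# Hypothetical mean salaries (NOT the Figure 2.19 data)
low, high = 50_000, 55_000
print(apparent_ratio(low, high))          # 1.1: axis starts at zero, true ratio
print(apparent_ratio(low, high, 45_000))  # 2.0: axis broken at 45,000
```

A 10% difference is drawn as a bar twice as tall, which is exactly the exaggeration the slide warns about.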
Misleading Graphs and Charts: Scale Effects
Compress vs. stretch the vertical axis to exaggerate or minimize the effect
Mean Salary Increases at a Major University, 2002-2005
Figure 2.20
Chapter Summary
Frequency distribution
Bar chart and pie chart
Histogram
Shape of the distribution
Stem-and-leaf display
Misleading graphs and charts