AbdullahAbdullah76320
32 slides
Aug 07, 2024
About This Presentation
Data analysis
Measures of variation
Ex. 1. Heights of the 5 starting players from two basketball teams:
A: 72, 73, 76, 76, 78
B: 67, 72, 76, 76, 84
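Mean, Median & Mode: both teams have the same mean (75), median (76) and mode (76), so these measures of center do not distinguish the two teams.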
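A minimal Python sketch, assuming only the standard-library `statistics` module (the variable names are illustrative), that confirms the two teams share the same center:

```python
from statistics import mean, median, mode

# Heights of the five starting players on each team (Ex. 1)
team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]

for name, data in [("A", team_a), ("B", team_b)]:
    # mean(): arithmetic average; median(): middle value; mode(): most common value
    print(name, mean(data), median(data), mode(data))
```

Both lines report a mean of 75, a median of 76 and a mode of 76, which is why a measure of spread is needed to tell the teams apart.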
Measures of Variation
Ex. 1 continued. To describe the difference between the two data sets, we use a descriptive measure that indicates the amount of spread, or dispersion, in a data set.
Range: difference between maximum and
minimum values of the data set.
Range of team A: 78-72=6
Range of team B: 84-67=17
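The range computation can be sketched in Python as follows; `data_range` is a hypothetical helper name chosen for illustration:

```python
def data_range(values):
    # Range = maximum - minimum; every other value in the data set is ignored
    return max(values) - min(values)

team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]
print(data_range(team_a), data_range(team_b))  # 6 17
```

Because only `max` and `min` are consulted, changing any of the middle values leaves the range unchanged, which is the disadvantage noted below.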
Advantage of the range: it is easy to compute.
Disadvantage: only two values (the maximum and the minimum) are considered.
Unlike the range, the sample standard deviation
takes into account all data values. The following
procedure is used to find the sample standard
deviation:
Step 1: Find the mean of the data:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{72 + 73 + 76 + 76 + 78}{5} = 75$$
Step 2: Find the deviation of each score from the mean.
x      x - x̄
72     72 - 75 = -3
73     73 - 75 = -2
76     76 - 75 = 1
76     76 - 75 = 1
78     78 - 75 = 3
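Steps 1 and 2 for team A can be sketched in a few lines of Python (variable names are illustrative):

```python
team_a = [72, 73, 76, 76, 78]

# Step 1: mean of the data
x_bar = sum(team_a) / len(team_a)            # 75.0

# Step 2: deviation of each score from the mean
deviations = [x - x_bar for x in team_a]
print(deviations)        # [-3.0, -2.0, 1.0, 1.0, 3.0]
print(sum(deviations))   # 0.0
```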
Note that the sum of the deviations is zero:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = (-3) + (-2) + 1 + 1 + 3 = 0$$
The sum of the deviations from the mean will always be zero. This can be used as a check to determine whether your calculations are correct.
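The reason the sum is always zero follows in one line from the definition of the mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, so that $\sum_{i=1}^{n} x_i = n\bar{x}$:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```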
Step 3: Square each deviation from the mean. Find the sum of the squared deviations.
Height   deviation   squared deviation
72       -3          9
73       -2          4
76       1           1
76       1           1
78       3           9
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 24$$
Step 4: The sample variance is determined by dividing the sum of the squared deviations by (n-1) (the number of scores minus one).
Note that the sum of squared deviations is 24. The sample variance is
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{24}{5-1} = 6$$
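Steps 3 and 4 can be sketched in Python, with the standard-library `statistics.variance` (which also divides by n - 1) used as a cross-check:

```python
from statistics import variance

team_a = [72, 73, 76, 76, 78]
n = len(team_a)
x_bar = sum(team_a) / n

# Step 3: square each deviation and add them up
ss = sum((x - x_bar) ** 2 for x in team_a)   # 24.0

# Step 4: divide by n - 1 to get the sample variance
s2 = ss / (n - 1)                            # 6.0

print(ss, s2, variance(team_a))
```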
The four steps can be combined into one mathematical formula for the sample standard deviation. The sample standard deviation is the square root of the quotient of the sum of the squared deviations and (n-1):
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
Sample standard deviation: $s = \sqrt{6} \approx 2.449$
Four-step procedure to calculate the sample standard deviation:
1. Find the mean of the data.
2. Set up a table that lists the data in the left-hand column and the deviations from the mean in the next column.
3. In the third column from the left, square each deviation, and then find the sum of the squared deviations.
4. Divide the sum of the squared deviations by (n-1), and then take the positive square root of the result.
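The four-step procedure maps directly onto a short function. This is a sketch; `sample_std` is a hypothetical name, and `statistics.stdev` is used only to check the result:

```python
import math
from statistics import stdev

def sample_std(data):
    """Sample standard deviation via the four-step procedure (illustrative helper)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(data) / n                      # 1. mean
    deviations = [x - x_bar for x in data]     # 2. deviations from the mean
    ss = sum(d * d for d in deviations)        # 3. sum of squared deviations
    return math.sqrt(ss / (n - 1))             # 4. divide by n - 1, take the square root

team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]
print(round(sample_std(team_a), 3), round(sample_std(team_b), 3))  # 2.449 6.245
```

Team B's larger standard deviation reflects the greater spread that the identical means and medians hide.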
Problem for students:
By hand: Find the variance and standard deviation of the data: 5, 8, 9, 7, 6
Answer: The sum of the squared deviations from the mean (7) is 10, so the variance is 10/4 = 2.5 and the standard deviation is √2.5 ≈ 1.581.
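A quick way to check a hand calculation, assuming the standard-library `statistics` module (both functions use the n - 1 divisor):

```python
from statistics import stdev, variance

data = [5, 8, 9, 7, 6]
print(variance(data))         # 2.5
print(round(stdev(data), 3))  # 1.581
```

Squaring the rounded value 1.581 gives slightly less than 2.5, so it is safer to report the variance from the exact quotient 10/4.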
Standard deviation of grouped data:
1. Find each class midpoint.
2. Find the mean of the grouped data, $\bar{x} = \sum f_i x_i / n$, and the deviation of each class midpoint from the mean.
3. Square each deviation and multiply it by the class frequency.
4. Find the sum of these values and divide the result by (n-1) (one less than the total number of observations); the standard deviation is the square root of this quotient.
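$$s = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{n-1}}$$
where $x_i$ is the midpoint and $f_i$ the frequency of class $i$, $k$ is the number of classes, and $n = \sum_{i=1}^{k} f_i$.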
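A sketch of the grouped-data formula in Python; `grouped_sample_std` is a hypothetical helper name. As a sanity check, treating each frequency as a repeat count reproduces the ungrouped result for team A (76 appears twice), so $s = \sqrt{6}$:

```python
import math

def grouped_sample_std(midpoints, freqs):
    """s = sqrt( sum f_i * (x_i - x_bar)^2 / (n - 1) ), with n = sum of frequencies.

    midpoints: class midpoints x_i; freqs: class frequencies f_i (illustrative helper).
    """
    n = sum(freqs)
    x_bar = sum(x * f for x, f in zip(midpoints, freqs)) / n          # grouped mean
    ss = sum(f * (x - x_bar) ** 2 for x, f in zip(midpoints, freqs))   # sum of f*(x - mean)^2
    return math.sqrt(ss / (n - 1))

print(round(grouped_sample_std([72, 73, 76, 78], [1, 1, 2, 1]), 3))  # 2.449
```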
Here is the frequency distribution of the number of rounds of golf played by a group of golfers. The class midpoints are in the second column. The mean is 29.35. The third column is the square of the difference between the class midpoint and the mean, and the fifth column is the product of the frequency and the values in the third column. The last column (x*f) is used to compute the mean. The final result is shown below the table.
class     midpoint x   (x-mean)^2   frequency f   (x-mean)^2*f   x*f
[0,7)     3.5          668.3948     0             0              0
[7,14)    10.5         355.4482     2             710.8963556    21
[14,21)   17.5         140.5015     10            1405.015111    175
[21,28)   24.5         23.55484     21            494.6517333    514.5
[28,35)   31.5         4.608178     23            105.9880889    724.5
[35,42)   38.5         83.66151     14            1171.261156    539
[42,49)   45.5         260.7148     5             1303.574222    227.5
Total                               75            5191.386667    mean = 29.35333
s = √(5191.386667 / 74) ≈ 8.37579094
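The golf table can be reproduced in Python from the midpoints and frequencies alone (a sketch; variable names are illustrative):

```python
import math

# Class midpoints and frequencies from the golf table
midpoints = [3.5, 10.5, 17.5, 24.5, 31.5, 38.5, 45.5]
freqs     = [0, 2, 10, 21, 23, 14, 5]

n = sum(freqs)                                                     # 75 golfers
x_bar = sum(x * f for x, f in zip(midpoints, freqs)) / n           # 2201.5 / 75
ss = sum(f * (x - x_bar) ** 2 for x, f in zip(midpoints, freqs))   # sum of (x-mean)^2*f
s = math.sqrt(ss / (n - 1))                                        # grouped sample std

print(n, round(x_bar, 3), round(ss, 3), round(s, 3))  # 75 29.353 5191.387 8.376
```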
Variance for grouped data: $s^2 = 5191.386667 / 74 \approx 70.154$.
Interpreting the standard deviation
1. The more variation in a data set, the greater the
standard deviation.
2. The larger the standard deviation, the more
“spread” in the shape of the histogram representing
the data.
3. Standard deviation is used for quality control in business and industry. If there is too much variation in the manufacturing of a certain product, the process is out of control, and adjustments to the machinery must be made to ensure more uniformity in the production process.
Three standard deviations rule
“Almost all” the data will lie within 3 standard deviations of the mean.
Mathematically, nearly 100% of the data will fall in the interval
$$(\bar{x} - 3s,\ \bar{x} + 3s)$$
Empirical Rule
If a data set is “mound shaped” or “bell-shaped”,
then:
1. Approximately 68% of the data lie within one standard deviation of the mean.
2. Approximately 95% of the data lie within two standard deviations of the mean.
3. About 99.7% of the data lie within three standard deviations of the mean.
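The three percentages are rounded areas under the normal curve. A sketch that recovers them with the standard-library `statistics.NormalDist` (Python 3.8+), assuming an exactly normal distribution:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)   # fraction of the area within k standard deviations
    print(k, round(100 * share, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```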
[Figure: bell-shaped curve. The yellow region, containing all data within one standard deviation of the mean, is 68% of the total area; the yellow plus brown regions, containing all data within two standard deviations of the mean, are 95% of the total area.]
Question
A company produces a lightweight valve that is
specified to weigh 1365 grams. Unfortunately, because
of imperfections in the manufacturing process not all of
the valves produced weigh exactly 1365 grams. In fact,
the weights of the valves produced are normally
distributed with a mean weight of 1365 grams and a
standard deviation of 294 grams. Within what range of
weights would approximately 95% of the valve weights
fall? Approximately 16% of the weights would be more
than what value? Approximately 0.15% of the weights
would be less than what value?
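Solution
By the empirical rule, approximately 95% of the weights fall within two standard deviations of the mean: 1365 ± 2(294), that is, between 777 and 1953 grams. Since about 68% of the weights lie within one standard deviation, about (100% - 68%)/2 = 16% exceed 1365 + 294 = 1659 grams. Since about 99.7% lie within three standard deviations, about (100% - 99.7%)/2 = 0.15% are below 1365 - 3(294) = 483 grams.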
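A sketch that computes the three cut-offs and, assuming the weights are exactly normal, compares the empirical-rule shares with exact normal areas via the standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

mu, sigma = 1365, 294                 # grams, from the question
weights = NormalDist(mu, sigma)

low, high = mu - 2 * sigma, mu + 2 * sigma
print(low, high, round(weights.cdf(high) - weights.cdf(low), 4))   # share inside ±2 sd
print(mu + sigma, round(1 - weights.cdf(mu + sigma), 4))           # share above mean + 1 sd
print(mu - 3 * sigma, round(weights.cdf(mu - 3 * sigma), 4))       # share below mean - 3 sd
```

The exact areas (about 0.9545, 0.1587 and 0.0013) show that the empirical-rule figures of 95%, 16% and 0.15% are rounded approximations.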
Chebyshev’s Theorem
The empirical rule applies only when data are known to be approximately normally distributed.
Chebyshev’s theorem states that, for any k > 1, at least 1 - 1/k² of the data lie within k standard deviations of the mean. It applies to all distributions regardless of their shape and thus can be used whenever the shape of the data distribution is unknown or non-normal.
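A minimal sketch of Chebyshev's bound; `chebyshev_min_share` is a hypothetical helper name:

```python
def chebyshev_min_share(k):
    """Lower bound on the fraction of any data set within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

for k in (1.5, 2, 3):
    print(k, round(chebyshev_min_share(k), 4))
# 1.5 0.5556
# 2 0.75
# 3 0.8889
```

These guarantees are weaker than the empirical rule's 95% and 99.7% because they must hold for every distribution shape.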
Question
In the computing industry the average age of
professional employees tends to be younger than in
many other business professions. Suppose the
average age of a professional employed by a
particular computer firm is 28 with a standard
deviation of 6 years. A histogram of professional
employee ages with this firm reveals that the data
are not normally distributed but rather are amassed
in the 20s and that few workers are over 40. Apply
Chebyshev’s theorem to determine within what
range of ages would at least 80% of the workers’
ages fall.
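One way to work the age question, sketched in Python: solve 1 - 1/k² = 0.80 for k, then step k standard deviations either side of the mean.

```python
import math

mean_age, sd_age = 28, 6     # from the question
share = 0.80                 # "at least 80%"

# 1 - 1/k**2 = 0.80  ->  k = 1 / sqrt(1 - 0.80) = sqrt(5)
k = 1 / math.sqrt(1 - share)
low, high = mean_age - k * sd_age, mean_age + k * sd_age
print(round(k, 3), round(low, 2), round(high, 2))   # 2.236 14.58 41.42
```

So at least 80% of the workers' ages fall between roughly 14.6 and 41.4 years.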
Coefficient of Variation
The coefficient of variation, denoted CV, is the ratio of the standard deviation to the mean, expressed as a percentage:
$$CV = \frac{\sigma}{\mu} \times 100$$
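For sample data, a common choice is to substitute s and x̄ for σ and μ; the sketch below assumes that substitution, and `coefficient_of_variation` is a hypothetical name applied to the basketball data from Ex. 1:

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV = s / x_bar * 100, using the sample standard deviation (illustrative)."""
    return stdev(data) / mean(data) * 100

team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]
print(round(coefficient_of_variation(team_a), 2), round(coefficient_of_variation(team_b), 2))
# 3.27 8.33
```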
Interquartile Range
The interquartile range is the range of values between the first and third quartiles.
Essentially, it is the range of the middle 50% of the data and is determined by computing Q3 - Q1.
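A sketch using the standard-library `statistics.quantiles`, applied here to the team B heights for illustration. Quartile conventions differ between textbooks and software; this assumes the function's default ("exclusive") method, so hand-computed quartiles may differ slightly:

```python
from statistics import quantiles

def iqr(data):
    # quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3]
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1

team_b = [67, 72, 76, 76, 84]
print(quantiles(team_b, n=4), iqr(team_b))   # [69.5, 76.0, 80.0] 10.5
```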
The interquartile range is especially useful in
situations where data users are more interested in
values toward the middle and less interested in
extremes.
In describing a real estate housing market, Realtors
might use the interquartile range as a measure of
housing prices when describing the middle half of the
market for buyers who are interested in houses in
the midrange.