One Way ANOVA

By inciroglu · 39 slides · Oct 25, 2011


Analysis of Variance
Chapter 12

Introduction
•Analysis of variance compares two or more
populations of interval data.
•Specifically, we are interested in determining
whether differences exist between the population
means.
•The procedure works by analyzing the sample
variance.

•The analysis of variance is a procedure that
tests to determine whether differences exist
between two or more population means.
•To do this, the technique analyzes the sample
variances.
12.1 One Way Analysis of
Variance

One Way Analysis of Variance:
Example
•A magazine publisher wants to compare three
different styles of covers for a magazine that will
be offered for sale at supermarket checkout
lines. She assigns 60 stores at random to the
three styles of covers and records the number of
magazines that are sold in a one-week period.

One Way Analysis of Variance:
Example
•How do five bookstores in the same city differ in
the demographics of their customers? A market
researcher asks 50 customers of each store to
respond to a questionnaire. One variable of
interest is the customer’s age.

Graphical demonstration:
Employing two types of variability

Idea Behind ANOVA

[Figure: two dot plots of the same three treatments. Both plots have the
same sample means, x̄₁ = 10, x̄₂ = 15, x̄₃ = 20, but the spread of the
observations within each sample differs between the two plots.]

The sample means are the same as before,
but the larger within-sample variability
makes it harder to draw a conclusion
about the population means.

A small variability within
the samples makes it easier
to draw a conclusion about the
population means.

Idea behind ANOVA: recall the two-sample t-statistic
•Difference between 2 means, pooled variances, sample sizes
both equal to n
•Numerator of t
2
: measures variation between the groups in
terms of the difference between their sample means
•Denominator: measures variation within groups by the pooled
estimator of the common variance.
•If the within-group variation is small, the same variation
between groups produces a larger statistic and a more
significant result.
t² = (x̄ − ȳ)² / (s_p² (1/n + 1/n)) = n (x̄ − ȳ)² / (2 s_p²)

•Example 12.1
–An apple juice manufacturer is planning to develop a new
product -a liquid concentrate.
–The marketing manager has to decide how to market the
new product.
–Three strategies are considered
•Emphasize convenience of using the product.
•Emphasize the quality of the product.
•Emphasize the product’s low price.
One Way Analysis of Variance:
Example

•Example 12.1 - continued
–An experiment was conducted as follows:
•In three cities an advertisement campaign was launched.
•In each city only one of the three characteristics
(convenience, quality, and price) was emphasized.
•The weekly sales were recorded for twenty weeks
following the beginning of the campaigns.
One Way Analysis of Variance

One Way Analysis of Variance
Weekly sales by marketing strategy:

Convenience   Quality   Price
529           804       672
658           630       531
793           774       443
514           717       596
663           679       602
719           604       502
711           620       659
606           697       689
461           706       675
529           615       512
498           492       691
663           719       733
604           787       698
495           699       776
485           572       561
557           523       572
353           584       469
557           634       581
542           580       679
614           624       532

•Solution
–The data are interval
–The problem objective is to compare sales in three
cities.
–We hypothesize that the three population means are
equal
One Way Analysis of Variance

H₀: μ₁ = μ₂ = μ₃
H₁: At least two means differ
To build the statistic needed to test the
hypotheses use the following notation:
•Solution
Defining the Hypotheses

Notation

Independent samples are drawn from k populations (treatment
groups), j = 1, 2, …, k.

x_ij denotes observation i in sample j: x₁₁ is the first observation in
the first sample, x₂₂ is the second observation in the second sample,
and so on up to x_{n_j, j}. Sample j has sample size n_j and sample
mean x̄_j.

X is the “response variable”.
The variable’s values are called “responses”.

Terminology
•In the context of this problem…
Response variable – weekly sales
Responses – actual sale values
Experimental unit – weeks in the three cities when we
record sales figures.
Factor – the criterion by which we classify the populations
(the treatments). In this problem the factor is the marketing
strategy.
Factor levels – the population (treatment) names. In this
problem factor levels are the 3 marketing strategies: 1)
convenience, 2) quality, 3) price

Two types of variability are employed when
testing for the equality of the population
means
The rationale of the test statistic

The rationale behind the test statistic – I
•If the null hypothesis is true, we would expect all
the sample means to be close to one another
(and as a result, close to the grand mean).
•If the alternative hypothesis is true, at least
some of the sample means would differ.
•Thus, we measure variability between sample
means.

•The variability between the sample means is
measured as the sum of squared distances
between each mean and the grand mean.
This sum is called the Sum of Squares for Groups (SSG).
In our example treatments are represented by the different
advertising strategies.
Variability between sample means

Sum of squares for treatment groups (SSG)

SSG = Σ_{j=1}^{k} n_j (x̄_j − x̄)²

where there are k treatments, n_j is the size of sample j, and x̄_j is
the mean of sample j.

Note: When the sample means are close to
one another, their distance from the grand
mean is small, leading to a small SSG. Thus,
large SSG indicates large variation between
sample means, which supports H₁.

Sum of squares for treatment groups (SSG)

•Solution – continued
Calculate SSG

The sample means are x̄₁ = 577.55, x̄₂ = 653.00, x̄₃ = 608.65.

The grand mean is calculated by

x̄ = (n₁x̄₁ + n₂x̄₂ + … + n_k x̄_k) / (n₁ + n₂ + … + n_k) = 613.07

SSG = Σ_{j=1}^{k} n_j (x̄_j − x̄)²
    = 20(577.55 − 613.07)² + 20(653.00 − 613.07)² + 20(608.65 − 613.07)²
    = 57,512.23

Sum of squares for treatment groups (SSG)

Is SSG = 57,512.23 large enough to reject H₀ in favor of H₁?
See next.

•Large variability within the samples weakens the
“ability” of the sample means to represent their
corresponding population means.
•Therefore, even though sample means may
markedly differ from one another, SSG must be
judged relative to the “within samples variability”.
The rationale behind test statistic – II

•The variability within samples is measured by
adding all the squared distances between
observations and their sample means.
This sum is called the Sum of Squares for Error (SSE).
In our example this is the sum of all squared differences
between sales in city j and the sample mean of city j (over all
the three cities).
Within samples variability

Sum of squares for errors (SSE)

•Solution – continued
Calculate SSE

SSE = Σ_{j=1}^{k} Σ_{i=1}^{n_j} (x_ij − x̄_j)²

The sample variances are s₁² = 10,775.00, s₂² = 7,238.11,
s₃² = 8,670.24, so

SSE = (n₁ − 1)s₁² + (n₂ − 1)s₂² + (n₃ − 1)s₃²
    = (20 − 1)10,775.00 + (20 − 1)7,238.11 + (20 − 1)8,670.24
    = 506,983.50
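Because SSE here is just the pooled within-group sum of squares, it can be computed directly from the sample variances; a minimal sketch using the variances reported in the Excel summary:

```python
# Sample sizes and sample variances (from the Excel summary table).
n = [20, 20, 20]
s2 = [10775.00, 7238.11, 8670.24]     # s_j^2

# SSE = sum of (n_j - 1) * s_j^2
sse = sum((nj - 1) * s2j for nj, s2j in zip(n, s2))
print(round(sse, 2))  # approximately 506,983.5 (matches the slide up to rounding)
```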

Is SSG = 57,512.23 large enough
relative to SSE = 506,983.50 to reject
the null hypothesis that specifies that
all the means are equal?
Sum of squares for errors (SSE)

The mean sum of squares

To perform the test we need to calculate
the mean squares as follows:

Calculation of MSG – Mean Square for treatment Groups:
MSG = SSG / (k − 1) = 57,512.23 / (3 − 1) = 28,756.12

Calculation of MSE – Mean Square for Error:
MSE = SSE / (n − k) = 506,983.50 / (60 − 3) = 8,894.45

Calculation of the test statistic

F = MSG / MSE = 28,756.12 / 8,894.45 = 3.23

with the following degrees of freedom:
v₁ = k − 1 and v₂ = n − k
Required Conditions:
1. The populations tested
are normally distributed.
2. The variances of all the
populations tested are
equal.
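The mean squares and the F statistic follow directly from SSG and SSE; a quick check in Python using the values from the slides:

```python
ssg, sse = 57512.23, 506983.50   # sums of squares from the slides
k, n = 3, 60                     # number of groups, total sample size

msg = ssg / (k - 1)              # mean square for groups
mse = sse / (n - k)              # mean square for error
F = msg / mse                    # test statistic

print(round(F, 2))               # 3.23
```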

The F test rejection region

And finally the hypothesis test:

H₀: μ₁ = μ₂ = … = μ_k
H₁: At least two means differ

Test statistic: F = MSG / MSE
R.R.: F > F_{α, k−1, n−k}

The F test

H₀: μ₁ = μ₂ = μ₃
H₁: At least two means differ

Test statistic: F = MSG / MSE = 28,756.12 / 8,894.45 = 3.23
R.R.: F > F_{α, k−1, n−k} = F_{.05, 3−1, 60−3} ≈ 3.15

Since 3.23 > 3.15, there is sufficient evidence
to reject H₀ in favor of H₁, and argue that at least one
of the mean sales is different than the others.

The F test p-value

[Figure: density of the F distribution, with the area to the right of the
observed statistic shaded.]

•Use Excel to find the p-value
  FDIST(3.23, 2, 57) = .0467

p-value = P(F > 3.23) = .0467
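Outside Excel, the same p-value can be read off the F distribution's survival function (a sketch assuming scipy is available):

```python
from scipy.stats import f

# P(F > 3.23) with 2 and 57 degrees of freedom -- the scipy equivalent
# of Excel's FDIST(3.23, 2, 57).
p_value = f.sf(3.23, 2, 57)
print(round(p_value, 4))  # about 0.0467
```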

Excel single factor ANOVA

SS(Total) = SSG + SSE

Anova: Single Factor

SUMMARY
Groups        Count   Sum     Average   Variance
Convenience   20      11551   577.55    10775.00
Quality       20      13060   653.00    7238.11
Price         20      12173   608.65    8670.24

ANOVA
Source of Variation   SS       df   MS      F      P-value   F crit
Between Groups        57512    2    28756   3.23   0.0468    3.16
Within Groups         506984   57   8894
Total                 564496   59
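The Excel output can be reproduced from the raw data; a sketch using scipy.stats.f_oneway on the weekly-sales columns listed earlier:

```python
from scipy.stats import f_oneway

# Weekly sales for the three marketing strategies (from the data table).
convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
quality = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
           492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
price = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]

F, p = f_oneway(convenience, quality, price)
print(round(F, 2), round(p, 4))  # 3.23 and about 0.0468
```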

Multiple Comparisons
•When the null hypothesis is rejected, it may be
desirable to find which mean(s) is (are) different,
and at what ranking order.
•Two statistical inference procedures, geared at
doing this, are presented:
–“regular” confidence interval calculations
–Bonferroni adjustment

•Two means are considered different if the
confidence interval for the difference between
the corresponding sample means does not
contain 0. In this case the larger sample mean
is believed to be associated with a larger
population mean.
•How do we calculate the confidence intervals?
Multiple Comparisons

“Regular” Method
•This method builds on the equal variances confidence
interval for the difference between two means.
•The CI is improved by using MSE rather than s_p² (we use
ALL the data to estimate the common variance instead of
only the data from 2 samples):

(x̄_i − x̄_j) ± t_{α/2, n−k} · s · √(1/n_i + 1/n_j),
where s = √MSE and d.f. = n − k

Experiment-wise Type I error rate
(the effective Type I error)
•The preceding “regular” method may result in an increased
probability of committing a Type I error.
•The experiment-wise Type I error rate is the probability of
committing at least one Type I error at significance level α. It
is calculated by:

experiment-wise Type I error rate = 1 − (1 − α)^g

where g is the number of pairwise comparisons (i.e., g = C(k, 2) =
k(k−1)/2).
•For example, if α = .05 and k = 4, then g = 6 and the
experiment-wise Type I error rate = 1 − (.95)⁶ = 1 − .735 = .265.
•The Bonferroni adjustment determines the required Type I error
probability per pairwise comparison (α*) to secure a pre-
determined overall α.
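The experiment-wise rate formula is easy to check numerically; a minimal sketch (the helper name is ours, not from the slides):

```python
def experimentwise_rate(alpha, k):
    """Probability of at least one Type I error across all pairwise comparisons."""
    g = k * (k - 1) // 2          # number of pairwise comparisons
    return 1 - (1 - alpha) ** g

# The slide's example: alpha = .05, k = 4 gives g = 6 comparisons.
print(round(experimentwise_rate(0.05, 4), 3))  # 0.265
```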

Bonferroni Adjustment

•The procedure:
–Compute the number of pairwise comparisons (g)
[g = k(k−1)/2], where k is the number of populations.
–Set α* = α/g, where α is the true probability of
making at least one Type I error (called the experiment-
wise Type I error).
–Calculate the following CI for μ_i − μ_j:

(x̄_i − x̄_j) ± t_{α*/2, n−k} · s · √(1/n_i + 1/n_j),
where s = √MSE and d.f. = n − k

Bonferroni Method

•Example - continued
–Rank the effectiveness of the marketing strategies
(based on mean weekly sales).
–Use the Bonferroni adjustment method
•Solution
–The sample mean sales were 577.55, 653.00, 608.65.
–The pairwise differences are:

x̄₁ − x̄₂ = 577.55 − 653.00 = −75.45
x̄₁ − x̄₃ = 577.55 − 608.65 = −31.10
x̄₂ − x̄₃ = 653.00 − 608.65 = 44.35

–We calculate g = k(k−1)/2 to be 3(2)/2 = 3.
–We set α* = .05/3 = .0167, thus t_{.0167/2, 60−3} = 2.467 (Excel).
–Note that s = √8,894.45 = 94.31, so the half-width of each interval is

t_{α*/2, n−k} · s · √(1/n_i + 1/n_j) = 2.467 × 94.31 × √(1/20 + 1/20) = 73.57
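The Bonferroni half-width above can be reproduced in Python (a sketch assuming scipy; the slide rounds α* to .0167, here .05/3 is used exactly):

```python
from math import sqrt
from scipy.stats import t

mse = 8894.45                     # mean square for error, from the ANOVA table
n, k, g = 60, 3, 3                # total sample size, groups, comparisons
alpha_star = 0.05 / g             # per-comparison significance level

t_crit = t.ppf(1 - alpha_star / 2, n - k)   # two-sided critical value, df = 57
half_width = t_crit * sqrt(mse) * sqrt(1 / 20 + 1 / 20)

print(round(t_crit, 3), round(half_width, 2))  # about 2.466 and 73.56
```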
Bonferroni Method

Bonferroni Method: The Three Confidence
Intervals

Each interval is (x̄_i − x̄_j) ± t_{α*/2, n−k} · s · √(1/n_i + 1/n_j),
with s = √MSE and d.f. = n − k, so the half-width is
2.467 × 94.31 × √(1/20 + 1/20) = 73.57.

μ₁ − μ₂: −75.45 ± 73.57 = (−149.02, −1.88)
μ₁ − μ₃: −31.10 ± 73.57 = (−104.67, 42.47)
μ₂ − μ₃: 44.35 ± 73.57 = (−29.22, 117.92)

There is a significant difference between μ₁ and μ₂.

Bonferroni Method: Conclusions
Resulting from Confidence Intervals

Do we have evidence to distinguish two means?
•Group 1 Convenience: sample mean 577.55
•Group 2 Quality: sample mean 653.00
•Group 3 Price: sample mean 608.65

μ₁ − μ₂: −75.45 ± 73.57 = (−149.02, −1.88)
μ₁ − μ₃: −31.10 ± 73.57 = (−104.67, 42.47)
μ₂ − μ₃: 44.35 ± 73.57 = (−29.22, 117.92)

•List the group numbers in increasing order of their sample
means; connecting overhead lines mean no significant difference:

1   3   2
(lines connect groups 1 and 3 and groups 3 and 2; groups 1 and 2
are not connected, so only μ₁ and μ₂ are distinguished.)