Session-47 - Inferential Statistics-2 - Testing of Difference - Parametric Tests.pdf

drpriyankaswasthavri 8 views 101 slides Jul 23, 2024

TESTING OF DIFFERENCE:
PARAMETRIC TESTS

Dr Kalpana K Mahajan
Former Professor, Deptt of Statistics
Panjab University, Chandigarh

About Statistics

“There are three types of lies -- lies,
damn lies, and statistics.”
— Benjamin Disraeli

About Statistics

Torture numbers, and they'll confess to
anything. ~Gregg Easterbrook

About Statistics

“A single death is a tragedy; a million
deaths is a statistic.”

About Statistics

“99 percent of all statistics only tell 49
percent of the story.”

DATA is all pervasive

Sports
Politics
Economics
Physics
Chemistry
Botany

• National
• International
• Space

Data from around

Gender, employment status, marital status, family size, Age, weight, wt of
father

M, Y, N, 4, 25 yrs, 76 kg, 66 kg
M, N, N, 3, 21 yrs, 66 kg, 76 kg
F, N, N, 5, 22 yrs, 56 kg, 66 kg
M, Y, N, 5, 26 yrs, 78 kg, 76 kg
F, Y, N, 4, 25 yrs, 58 kg, 77 kg
M, N, N, 4, 21 yrs, 56 kg, 78 kg
M, Y, Y, 6, 28 yrs, 68 kg, 79 kg
M, Y, N, 4, 25 yrs, 76 kg, 66 kg
F, N, N, 3, 21 yrs, 49 kg, 79 kg

What is the basic purpose of statistics?

Purpose of Statistics

• The basic purpose of statistical analysis is to find out:

• how we can talk about the data in the minimum possible terms or words,

OR
• what is the typical message hidden in the data.

To Describe and Infer

FOCUS is on WEIGHT of participant

76 kg
66 kg
56 kg
78 kg
58 kg
56 kg
68 kg
76 kg
49 kg

Could the average weight be said to be 65 kg?

So we are COMPARING the average weight with 65 kg.

So CASE I: COMPARING the average with a FIXED value.

CASE II (WEIGHT of participants MALE and FEMALE)

Female, Male

76 kg, 66 kg
66 kg, 76 kg
56 kg, 66 kg
78 kg, 76 kg
58 kg, 77 kg
56 kg, 78 kg
68 kg, 79 kg
76 kg,
49 kg,

CASE III: WEIGHT of participants "Before & After Treatment"

Before, After

76 kg, 66 kg
66 kg, 76 kg
56 kg, 66 kg
78 kg, 76 kg
58 kg, 77 kg
56 kg, 78 kg
68 kg, 79 kg
76 kg, 78 kg
49 kg, 62 kg

FOCUS is on WEIGHT of participants

CASE I          CASE II          CASE III
Participants    Female, Male     Before & After Treatment
76 kg           76 kg, 66 kg     76 kg, 66 kg
66 kg           66 kg, 76 kg     66 kg, 76 kg
56 kg           56 kg, 66 kg     56 kg, 66 kg
78 kg           78 kg, 76 kg     78 kg, 76 kg
58 kg           58 kg, 77 kg     58 kg, 77 kg
56 kg           56 kg, 78 kg     56 kg, 78 kg
68 kg           68 kg, 79 kg     68 kg, 79 kg
76 kg           76 kg,           76 kg, 78 kg
49 kg           49 kg,           49 kg, 62 kg

Other Situations

Example

• A keen observer computes that the average content in 10 bottles of soft drink of Brand A was 295 ml, with a standard deviation of 5 ml.

• He files a case against the company for cheating.

• But the company justifies its production, saying that the contents supplied are as per specifications: arithmetic mean (A.M.) 300 ml and s.d. 10 ml.

Taking two Samples

Sample 1

• 10 bottles of soft drink of Brand A
• Average content 295 ml with standard deviation of 5 ml

Sample 2

• Another sample of 10 bottles may give an average content of, say, 305 ml with standard deviation of 5 ml
• Company justifies its specifications: A.M. 300 ml and s.d. 10 ml

Catch is ..

• Sample 1: average content 295 ml with standard deviation of 5 ml.
• Sample 2: another sample of 10 bottles may give an average content of, say, 305 ml.

The variation could be due to cause and effect, or "by chance"; chance variation is due to sampling.

Company justifies its specifications: A.M. 300 ml and s.d. 10 ml
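
A rough way to judge whether sample means of 295 ml or 305 ml are within chance variation is to standardise them against the company's specification. The sketch below assumes the specification (mean 300 ml, s.d. 10 ml) is the true population and that sample means are approximately normal; the function name is illustrative only.

```python
import math

def z_for_sample_mean(sample_mean, spec_mean, spec_sd, n):
    """Standardised distance of a sample mean from the specified mean."""
    standard_error = spec_sd / math.sqrt(n)  # s.d. of the mean of n bottles
    return (sample_mean - spec_mean) / standard_error

# Values from the slides: specification 300 ml, s.d. 10 ml, samples of 10 bottles
z_295 = z_for_sample_mean(295, 300, 10, 10)
z_305 = z_for_sample_mean(305, 300, 10, 10)

# Two-sided 5% cut-off for a standard normal variate
within_chance_295 = abs(z_295) <= 1.96
within_chance_305 = abs(z_305) <= 1.96
print(round(z_295, 2), round(z_305, 2), within_chance_295, within_chance_305)
```

Under these assumptions both sample means lie within 1.96 standard errors of 300 ml, so neither sample on its own contradicts the specification for the mean.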

Message

• Based on sample observations we take decisions for the population (totality).

• Statistical analysis is carried out to draw inferences about the population on the basis of samples collected from the population.

How to Test?....

t, F, ANOVA

The test statistic in the t-test is known as the t-statistic; it is used to determine whether the population means differ.

The t-test is one of a number of hypothesis tests.

To compare the means of three or more groups, use an analysis of variance (ANOVA).

If the sample size is large, use a z-test.

Other tests include the chi-square test and the F-test.

Right statistical test!

Choosing the right statistical test depends on:

• Nature of the data
• Sample characteristics
• Inferences to be made

Inference: fundamental terms

• Statistics and Parameters
• Statistical Hypotheses
• Errors
• Test Statistic
• Word "Statistics":
  - Plural sense: data
  - Singular sense: subject
  - Statistic(s): formula(e)
  - Statistics: characteristics of a sample

Statistics and Parameters

• Statistical generalisations are carried out to gain some knowledge about certain characteristics of a population on the basis of known facts about a sample drawn from that population.

• In order to avoid confusion over whose characteristics are being focused on, we refer to the characteristics of the population as parameters and those of the sample as statistics.

We are mostly interested in the population rather than the outcome of a particular sample.

We select a sample and, on the basis of the characteristics of the sample, i.e., statistics, we draw inferences about the parameters.

If we say that "the fraction of defectives in an output is less than or equal to 10 per cent", it is a case of a composite hypothesis, as it contains two or more elements of the parameter set.

A STATISTICAL HYPOTHESIS

A decision procedure is mainly concerned with the selection of either of two courses of action, and in a testing procedure attention is thus focused on two possible sets of values of the parameters, or two statistical hypotheses. Such a pair of sets are called the null hypothesis and the alternative hypothesis. The designation of null and alternative hypothesis is arbitrary. However, conventionally the null hypothesis is specified in an exact manner and it usually corresponds to the absence of effects of the variable being investigated.

Suppose we wish to decide whether one procedure is better than another. We formulate the hypothesis that there is no difference between the procedures; such hypotheses are often called null hypotheses. Usually the null hypothesis is denoted by $H_0$, and any hypothesis which differs from a given hypothesis is called an alternative hypothesis, denoted by $H_1$ or $H_a$. In contrast to the null hypothesis, an alternative hypothesis is usually less sharply formulated. For example, if the null hypothesis ($H_0$) is $p = 0.40$, then we can formulate a number of alternative hypotheses, such as:

$H_1: p \neq 0.40$, or

$H_1: p > 0.40$, or

$H_1: p < 0.40$

Two errors

Situation      Decision: Accept $H_0$     Decision: Reject $H_0$
$H_0$ true     Correct                    Type I error
$H_0$ false    Type II error              Correct

Probabilities of committing Type I and Type II errors are considered as the risks of wrong decisions.
Our aim is to minimize error in our tests of hypothesis.

Minimizing error in our tests of hypothesis is not simple, because for a given sample size an attempt to decrease Type I error results in an increase in Type II error and vice versa, so a balance has to be struck.

Level of significance?

• The probability of Type I error is also known as the level of significance of the test. It is denoted by $\alpha$.

• I.e., Prob. (Reject $H_0$ when it is true) = $\alpha$, and

• Prob. (Accept $H_0$ when it is false) = $\beta$

Level of significance?

• In testing of hypothesis the levels of significance usually employed are 5% and 1%.

• When $\alpha = 0.05$, we accept a 5% risk of rejecting a true null hypothesis; equivalently, when $H_0$ is true the test leads to the right decision 95% of the time.

STANDARD ERROR

• The statistical constants of the population, i.e., the parameters, are fixed but usually unknown, and we take the help of sample characteristics, i.e., the statistics, to (approximately) determine the characteristics of the population.

• But from a population of size $N$ we can have $\binom{N}{n}$ samples of size $n$. For each of these samples we can compute a statistic, e.g., the mean ($\bar{X}$) and/or the variance ($s^2$), which more often than not will vary from sample to sample. Thus for different samples we get different values of the statistic.

For example, from a data set of five observations (16, 19, 20 and two others, say), if we take samples of size 3, we shall get $\binom{5}{3} = 10$ values of the statistic.

The values thus obtained may be grouped into a frequency distribution, and the frequency distribution of the statistic is known as the sampling distribution of the statistic. The standard error is the standard deviation of the sampling distribution of the statistic.
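
The idea of a sampling distribution can be sketched by listing every sample of size 3 from a population of five values and taking the standard deviation of the resulting sample means. The five values below are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from statistics import pstdev

# Hypothetical population of five observations (illustrative values only)
population = [16, 19, 17, 20, 18]
sample_size = 3

# Every possible sample of size 3 drawn without replacement: C(5, 3) = 10
samples = list(combinations(population, sample_size))
sample_means = [sum(s) / sample_size for s in samples]

# Standard error = standard deviation of the sampling distribution of the mean
standard_error = pstdev(sample_means)

population_mean = sum(population) / len(population)
mean_of_sample_means = sum(sample_means) / len(sample_means)
print(len(samples), round(mean_of_sample_means, 3), round(standard_error, 4))
```

The mean of the ten sample means equals the population mean, while their spread (the standard error) is smaller than the spread of the individual observations.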

For a large sample size, the standard error forms the basis of testing of hypothesis. Taking a cue from the fact that for a normal distribution its mean and variance completely specify the distribution, we define $Z$ for any statistic $t$ such that:

$$Z = \frac{t - E(t)}{\mathrm{S.E.}(t)} \qquad (1)$$

where $t$ is any statistic, say the mean,
$E(t)$ is the expected value (mean) of $t$, and
$\mathrm{S.E.}(t)$ is the standard error of $t$.

Then $Z$ is normally distributed with mean 0 and variance 1.

Decision

If we decide to take our decision at the 5% level of significance, then the critical value of $Z$ is 1.96 (two-sided), and our test procedure states that if the absolute difference between the observed and expected value of $t$ is greater than 1.96 times the $\mathrm{S.E.}(t)$, the null hypothesis is rejected (at the 5% level of significance).

p-value.

• The p-value is a probability, with a value ranging from zero to one. It measures how much evidence we have against the null hypothesis: the smaller the p-value, the stronger the evidence against it. Suppose the test statistic is equal to S. The p-value is the probability of observing a test statistic at least as extreme as S, assuming the null hypothesis is true. If the p-value is less than the significance level, we reject the null hypothesis.
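
For a test statistic that is standard normal under the null hypothesis, the two-sided p-value can be sketched with the complementary error function from the Python standard library. The function names and the example value of Z are illustrative assumptions, not part of the slides.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal variate Z."""
    return math.erfc(abs(z) / math.sqrt(2))

def reject_null(z, alpha=0.05):
    """Reject H0 when the p-value falls below the significance level."""
    return two_sided_p_value(z) < alpha

# Illustrative values of the test statistic
p_at_critical = two_sided_p_value(1.96)   # close to 0.05 by construction
p_example = two_sided_p_value(2.5)
decision_example = reject_null(2.5, alpha=0.05)
print(round(p_at_critical, 4), round(p_example, 4), decision_example)
```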

Before we proceed with specific tests

TEST CONCERNING PROPORTIONS

Rooted in the concept of Bernoulli variates, i.e., a variable that can take either of two values: 0 or 1, Yes or No, Success or No Success, etc.,

where Prob(1) = $p$ (say) and Prob(0) = $q$, so that $p + q = 1$.
We extend the scope to a population divided into two groups with mutually exclusive features.

Thus, quite logically, we can say that in $n$ independent Bernoulli trials (where the probability of success is constant, say $p$) the probability of $X$ successes, or of a proportion $X/n$ of successes, is given by the binomial distribution. We know that for the binomial distribution (where $X$ is the number of successes, $p$ is the probability of success and $n$ is the number of trials) the expected value is $E(X) = np$ and the standard deviation is $\sqrt{npq}$, where $q = 1 - p$.

We also know that for a statistic $t$, $Z = \dfrac{t - E(t)}{\mathrm{S.E.}(t)}$ is normally distributed with mean 0 and variance 1.

Thus for $p$ we can define

$$Z = \frac{X - np}{\sqrt{npq}} = \frac{X/n - p}{\sqrt{pq/n}}$$

and $Z$ is (approximately, for large $n$) normally distributed with mean 0 and variance 1.

From the normal probability tables we have:

$P(-3 < Z < 3) = 0.9973$, or $P(|Z| \le 3) = 0.9973$
$P(-1.96 < Z < 1.96) = 0.95$, or $P(|Z| \le 1.96) = 0.95$
$P(-2.58 < Z < 2.58) = 0.99$, or $P(|Z| \le 2.58) = 0.99$

In other words, a standard normal variate is in all probability expected to lie between -3 and 3. On the basis of the standard normal variate we can take the following possible decisions in regard to the null hypothesis:

(i) if $|Z| > 3$, the null hypothesis is always rejected;

(ii) if $|Z| > 1.96$, the null hypothesis is rejected at the 5% level of significance; in that case check against 2.58 to decide at the 1% level;

(iii) if $|Z| \le 1.96$, the null hypothesis is not rejected at the 5% level of significance.
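
The proportion test and the 1.96 / 2.58 decision rule can be sketched as follows. The counts (52 successes in 100 trials) and the hypothesised proportion 0.40 are hypothetical numbers chosen for illustration; the normal approximation is assumed to hold.

```python
import math

def proportion_z(successes, n, p0):
    """Z = (X/n - p0) / sqrt(p0 * q0 / n) under H0: p = p0."""
    q0 = 1 - p0
    return (successes / n - p0) / math.sqrt(p0 * q0 / n)

def decide(z):
    """Apply the two-sided cut-offs 1.96 (5%) and 2.58 (1%)."""
    if abs(z) <= 1.96:
        return "not rejected at 5%"
    if abs(z) <= 2.58:
        return "rejected at 5%, not rejected at 1%"
    return "rejected at 1%"

# Hypothetical sample: 52 successes out of 100, testing H0: p = 0.40
z_stat = proportion_z(52, 100, 0.40)
decision = decide(z_stat)
print(round(z_stat, 3), decision)
```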

Testing Situations ..

At times we may be interested to test:

if the sample mean differs significantly from a
hypothetical value of the population mean $\mu$

if the given sample has been drawn from a population
with a specific mean.

if the two population means differ significantly or not.

if the given sample has been drawn from a population
with a specific variance

if the two population variances differ significantly or not.
Tests for proportions...

Steps in testing procedure

State a null hypothesis.

State an alternative hypothesis.

Define a test statistic along with its distribution under the null hypothesis.

Compute the value of the test statistic from the information given in the sample data.

Compare the calculated value of the test statistic with the theoretical (tabulated) value of the test statistic and decide about the rejection or non-rejection of the null hypothesis at a pre-defined level of significance.

If the calculated value of the test statistic is greater than the tabulated value of the statistic, then the null hypothesis (usually of equality or no difference) is rejected at the pre-defined level of significance.

If the calculated value of the test statistic is less than the tabulated value of the statistic, then the null hypothesis (usually of equality or no difference) is not rejected and we attribute the deviation between the observed and hypothesized value to sampling fluctuations. We may then say that the data do not provide any evidence against the null hypothesis.

It is important to know that the sample size plays a very important role in the testing procedure. Precisely, we have different procedures for large samples and small samples. In case the sample size is more than 30, then for the testing procedure the sample size is considered large.

t-Test Statistic

Background

The t-test is used to test hypotheses about means
when the population variance is unknown (the
usual case). Closely related to z, the unit normal.
Developed by Gosset for the quality control of
beer.

Comes in 3 varieties:

Single sample, independent samples, and
dependent samples.

Kinds of t-tests

Formula is slightly different for each:

Single sample t: tests whether a sample mean is significantly different from a pre-existing value (e.g. norms)

Dependent samples (paired) t: tests the relationship between 2 linked samples, e.g. means obtained in 2 conditions by a single group of participants

Independent samples t: tests the relationship between 2 independent populations

Could AVERAGE weight be said to be 65 kg?

Weights: 76, 66, 56, 78, 58, 56, 68, 76, 49 kg

Assumptions
• Normality

Could Average weight be said to be 65 kg?

Weights: 76, 66, 56, 78, 58, 56, 68, 76, 49 kg

• Null Hyp: $H_0: \mu = 65$ kg (read: Mu = 65 kg)
• Alt Hyp: $H_a: \mu > 65$ kg
OR
• Alt Hyp: $H_a: \mu < 65$ kg
OR
• Alt Hyp: $H_a: \mu \neq 65$ kg

Could Average weight be said to be 65 kg?

Weights: 76, 66, 56, 78, 58, 56, 68, 76, 49 kg

Null Hyp: $H_0: \mu = 65$ kg
Alt Hyp: $H_a: \mu \neq 65$ kg
Level of Significance
Test Statistic: single-sample t
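
CASE I can be worked through as a single-sample t-test on the nine weights. This is a minimal sketch: the 5% two-sided level is an assumed choice, and the critical value 2.306 is taken from a standard t table for 8 degrees of freedom.

```python
import math
from statistics import mean, stdev

weights = [76, 66, 56, 78, 58, 56, 68, 76, 49]  # CASE I data from the slides
mu_0 = 65                                        # hypothesised mean (kg)

n = len(weights)
sample_mean = mean(weights)
sample_sd = stdev(weights)                       # divisor n - 1
t_stat = (sample_mean - mu_0) / (sample_sd / math.sqrt(n))

T_CRIT = 2.306  # t table, 8 d.f., two-sided 5% (assumed level)
reject_h0 = abs(t_stat) > T_CRIT
print(round(sample_mean, 2), round(sample_sd, 2), round(t_stat, 3), reject_h0)
```

With these data the sample mean is very close to 65 kg, so the null hypothesis is not rejected at the assumed 5% level.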

FOCUS is on WEIGHT of participants

CASE I (FOCUS was on WEIGHT of participant)

76 kg
66 kg
56 kg
78 kg
58 kg
56 kg
68 kg
76 kg
49 kg

CASE II (FOCUS is on WEIGHT of
participants MALE and FEMALE)

Female , Male
76 kg, 66 kg
66 kg, 76 kg
56 kg, 66 kg
78 kg, 76 kg
58 kg, 77 kg
56 kg, 78 kg
68 kg, 79 kg
76 kg,

49 kg,

CASE II (WEIGHT of participants MALE and FEMALE)

Female, Male
76 kg, 66 kg
66 kg, 76 kg
56 kg, 66 kg
78 kg, 76 kg
58 kg, 77 kg
56 kg, 78 kg
68 kg, 79 kg
76 kg,
49 kg,

• $H_0: \mu(f) = \mu(m)$ (read as Mu(f) = Mu(m))
• $H_a: \mu(f) \neq \mu(m)$
• Level of Significance
• Test Statistic: t for independent samples
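
CASE II can be sketched as an independent-samples t-test with a pooled variance. The sketch assumes equal population variances and a two-sided 5% level; the critical value 2.145 is taken from a standard t table for 14 degrees of freedom. The two blank male entries on the slide are simply left out.

```python
import math
from statistics import mean, variance

female = [76, 66, 56, 78, 58, 56, 68, 76, 49]
male = [66, 76, 66, 76, 77, 78, 79]  # last two male entries are blank on the slide

n1, n2 = len(female), len(male)
df = n1 + n2 - 2

# Pooled variance: weighted average of the two sample variances (divisor n - 1)
pooled_var = ((n1 - 1) * variance(female) + (n2 - 1) * variance(male)) / df
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean(female) - mean(male)) / std_error

T_CRIT = 2.145  # t table, 14 d.f., two-sided 5% (assumed level)
reject_h0 = abs(t_stat) > T_CRIT
print(df, round(t_stat, 3), reject_h0)
```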

FOCUS is on WEIGHT of participants "Before & After Treatment"

CASE III
Before, After

76 kg, 66 kg
66 kg, 76 kg
56 kg, 66 kg
78 kg, 76 kg
58 kg, 77 kg
56 kg, 78 kg
68 kg, 79 kg
76 kg, 78 kg
49 kg, 62 kg

CASE of Paired Observations

$H_0: \mu(d) = 0$
$H_a: \mu(d) \neq 0$
Level of Significance
Test Statistic: paired t
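
CASE III can be sketched as a paired t-test on the differences (after minus before). The two-sided 5% level is an assumed choice, and the critical value 2.306 comes from a standard t table for 8 degrees of freedom.

```python
import math
from statistics import mean, stdev

before = [76, 66, 56, 78, 58, 56, 68, 76, 49]
after = [66, 76, 66, 76, 77, 78, 79, 78, 62]

# Work with the within-participant differences d = after - before
differences = [a - b for a, b in zip(after, before)]
n = len(differences)
mean_d = mean(differences)
sd_d = stdev(differences)                  # divisor n - 1
t_stat = mean_d / (sd_d / math.sqrt(n))    # tests H0: Mu(d) = 0

T_CRIT = 2.306  # t table, 8 d.f., two-sided 5% (assumed level)
reject_h0 = abs(t_stat) > T_CRIT
print(differences, round(mean_d, 2), round(t_stat, 3), reject_h0)
```

On these figures the mean change is about 8.3 kg and the calculated t exceeds the tabulated value, so the null hypothesis of no change is rejected at the assumed 5% level.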

Example: Hypertension/Cholesterol

• Mean cholesterol in hypertensive men

• Mean cholesterol in the male general population (20-74 years old)

• In the 20-74 year old male population the mean serum cholesterol is 211 mg/ml with a standard deviation of 46 mg/ml

Cholesterol Hypotheses

• $H_0: \mu = 211$ mg/ml
  - $\mu$ = population mean serum cholesterol for male hypertensives
  - Mean cholesterol for hypertensive men = mean for general male population
• $H_a: \mu \neq 211$ mg/ml

t-Test Statistic

• Want to test continuous outcome
• Unknown variance
• Under $H_0$: $T = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{(n-1)}$
• Critical values: statistics books or computer

• t-distribution approximately normal for degrees of freedom (df) > 30

Cholesterol: t-statistic

$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{220 - 211}{38.6/\sqrt{25}} = 1.17$$

• For $\alpha = 0.05$, two-sided test, from the t(24) distribution the critical value = 2.064
• $|T| = 1.17 < 2.064$

• The difference is not statistically significant at the $\alpha = 0.05$ level

• Fail to reject $H_0$

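CI for the Mean, Unknown Variance

• Uses the t distribution
• Degrees of freedom

$$\left(\bar{X} - t_{n-1,\,0.025}\frac{s}{\sqrt{n}},\ \ \bar{X} + t_{n-1,\,0.025}\frac{s}{\sqrt{n}}\right) = \left(220 - \frac{2.064 \times 38.6}{\sqrt{25}},\ \ 220 + \frac{2.064 \times 38.6}{\sqrt{25}}\right) \approx (204.07,\ 235.93)$$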
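
The cholesterol t-statistic and the 95% confidence interval can be checked with a few lines of arithmetic. The summary figures (mean 220, s.d. 38.6, n = 25) and the critical value 2.064 are those quoted on the slides; the variable names are illustrative.

```python
import math

sample_mean = 220.0   # mean cholesterol in the sample of hypertensive men
sample_sd = 38.6
n = 25
mu_0 = 211.0          # mean in the general male population
T_CRIT = 2.064        # t(24), two-sided alpha = 0.05, as quoted on the slide

std_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu_0) / std_error
reject_h0 = abs(t_stat) > T_CRIT

# 95% confidence interval for the mean
margin = T_CRIT * std_error
ci_lower, ci_upper = sample_mean - margin, sample_mean + margin
print(round(t_stat, 2), reject_h0, round(ci_lower, 2), round(ci_upper, 2))
```

The interval contains 211, which agrees with the decision not to reject the null hypothesis at the 5% level.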

• The single sample t-test is quite easy to understand, but it is rarely used in practice because we do not often have a population mean to compare against an observed sample mean. However, a more common situation is to compare two different samples of subjects in order to decide if they come from the same population.

Suppose we have two random samples of sizes $n_1$ and $n_2$ respectively, and these are drawn from (normal or approximately normal) populations whose standard deviations are equal.

Let $\bar{x}_1$ and $\bar{x}_2$ be the two sample means respectively and let $s_1$ and $s_2$ be their respective standard deviations.

Then if we wish to test the null hypothesis that the samples come from the same population we use the test statistic given here.

$H_0: \mu_1 = \mu_2$ (i.e., there is no difference between the groups)

against the alternative

$H_1: \mu_1 \neq \mu_2$ (i.e., there is a significant difference between the groups)

Under the null hypothesis the statistic is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad \text{where } S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$

(with $s_1^2$ and $s_2^2$ computed using divisors $n_1$ and $n_2$).

Under the null hypothesis $t$ is distributed as Student's $t$ with $n_1 + n_2 - 2$ degrees of freedom.

Two fundamental assumptions are made on the t-test
for testing for differences of means.
(i) Parent populations from which the samples
have been drawn are normal.

(ii) The population variances are equal

Problem

• A psychologist wishes to compare the intelligence quotient of students in the courses of laws and commerce. The sample results are:

Course      Sample size   Mean   s.d.
Commerce    16            114    9
Laws        11            109    7

• Can we, on the basis of the above data, conclude that there is no significant difference between the I.Q.s of the students in the two courses?

Here we are interested in testing the null hypothesis that the samples come from the same population. If we denote the population mean I.Q.s of the students from commerce and laws by $\mu_1$ and $\mu_2$ respectively, then we are in fact testing the null hypothesis

$H_0: \mu_1 = \mu_2$ (i.e., there is no difference between the groups) against the alternative

$H_1: \mu_1 \neq \mu_2$ (i.e., there is a significant difference between the groups)

We shall use the statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

(which under the null hypothesis follows the t-distribution)

where

$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{1296 + 539}{25} = 73.4, \qquad S = 8.57$$

and

$$t = \frac{114 - 109}{8.57\sqrt{\dfrac{1}{16} + \dfrac{1}{11}}} = \frac{5}{8.57 \times 0.39} = 1.50$$

From the table of $t$ we find that the tabulated value of $t$ at the 0.05 (two-tailed) level of significance with 25 (16 + 11 - 2) degrees of freedom is 2.06.

Since the calculated value of $t$ is less than the tabulated value of $t$ at the 0.05 level of significance, we do not reject the null hypothesis and conclude that there is no significant difference in the I.Q. scores of the students in the two courses.
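
The I.Q. example can be re-computed from the summary table alone. This sketch assumes the pooled variance form $S^2 = (n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2 - 2)$, which reproduces the intermediate value 8.57 shown on the slide; the helper name is hypothetical.

```python
import math

def pooled_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t from summary statistics, with s.d.s computed using divisor n."""
    pooled_var = (n1 * sd1 ** 2 + n2 * sd2 ** 2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / std_error, math.sqrt(pooled_var)

# Commerce: n = 16, mean 114, s.d. 9; Laws: n = 11, mean 109, s.d. 7
t_stat, pooled_sd = pooled_t_from_summary(16, 114, 9, 11, 109, 7)

T_CRIT = 2.06  # tabulated t, 25 d.f., two-tailed 0.05, as quoted on the slide
reject_h0 = abs(t_stat) > T_CRIT
print(round(pooled_sd, 2), round(t_stat, 2), reject_h0)
```

Without rounding the intermediate values, t comes out at about 1.49 rather than 1.50; the conclusion is unchanged.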

Unpaired Tests: Common Variance

• Same idea
• Known variance: Z test statistic
• Unknown variance: t test statistic
• $H_0: \mu_x = \mu_y$ vs. $H_a: \mu_x \neq \mu_y$
• Assume common variance

$$Z = \frac{\bar{x} - \bar{y}}{\sigma\sqrt{1/n + 1/m}} \quad \text{or} \quad T = \frac{\bar{x} - \bar{y}}{S\sqrt{1/n + 1/m}}$$

Paired Tests: Difference
Two Continuous Outcomes
Exact same idea
Known variance: Z test statistic
Unknown variance: t test statistic
$H_0: \mu_d = 0$ vs. $H_a: \mu_d \neq 0$
Paired Z-test or Paired t-test

ANOVA

Analysis of Variance (ANOVA)

• One-way ANOVA is used to determine whether there are any statistically significant differences between the means of two or more independent (unrelated) groups.

• Usually we tend to use it when there are a minimum of three, rather than two, groups.

Example

• Testing whether exam performance differs based on test anxiety levels amongst students, dividing students into three independent groups (e.g., low, medium and high-stressed students).

Assumptions

• The dependent variable should be measured at the interval or ratio level

• Independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves

• There should be no significant outliers

Assumptions

• The dependent variable should be approximately normally distributed for each category of the independent variable.

• Homogeneity of variances (Levene's test for homogeneity of variances).
If the data fail this assumption, then use a Welch ANOVA instead of a one-way ANOVA, and also use a different post hoc test.

FOCUS is on WEIGHT of four shades of
participants (Four languages, four regions)

76 kg, 66 kg, 71 kg, 62 kg
66 kg, 76 kg, 56 kg, 66 kg
56 kg, 66 kg, 78 kg, 76 kg
78 kg, 76 kg, 49 kg, 79 kg
58 kg, 77 kg, 66 kg, 71 kg
56 kg, 78 kg, 58 kg, 77 kg
68 kg, 79 kg, 76 kg, 49 kg
76 kg, 66 kg, 56 kg, 78 kg
49 kg, 79 kg, 77 kg, 66 kg

One-way ANOVA

• $H_0: \mu(\text{Gp1}) = \mu(\text{Gp2}) = \mu(\text{Gp3}) = \mu(\text{Gp4})$

• $H_a$: Not all $\mu$'s are equal
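
The one-way ANOVA F-ratio for the four groups can be sketched with plain arithmetic. The sketch assumes each column of the weight table is one group; the 5% critical value for (3, 32) degrees of freedom, roughly 2.90, is an approximate table value and should be checked against a table or software.

```python
# Each list is one column of the four-group weight table (assumed grouping)
groups = [
    [76, 66, 56, 78, 58, 56, 68, 76, 49],
    [66, 76, 66, 76, 77, 78, 79, 66, 79],
    [71, 56, 78, 49, 66, 58, 76, 56, 77],
    [62, 66, 76, 79, 71, 77, 49, 78, 66],
]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRIT = 2.90  # approximate F table value, (3, 32) d.f., 5% level (assumption)
reject_h0 = f_stat > F_CRIT
print(df_between, df_within, round(f_stat, 3), reject_h0)
```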

Is the packing machine working properly?

• Suppose people have lodged complaints about the weight of the 12.5 Kg mealie-meal bags.

• A consultant took a sample of mealie-meal bags and did not find any problem with the average weight. That is, she could not reject the null hypothesis that the population mean weight $\mu = 12.5$ Kg.

What could be the problem?

Why study variance?

• Although the mean is OK in the above example, there could be a problem with the variance

• Packaging plants are designed to operate within certain specified precision

• Ideally it would be desirable to have the machine pack exactly 12.5 Kg in every bag, but this is practically impossible. So a certain pre-specified variation is tolerated

Testing for a single variance

• After years of operation it is always important to check whether the machine variation $\sigma^2$ is still at the initially set level of precision (say $\sigma_0^2$)

• This implies testing the hypothesis
$H_0: \sigma^2 = \sigma_0^2$

against the alternative
$H_1: \sigma^2 > \sigma_0^2$

Comparing variances

• A similar problem could occur if a factory manager is considering whether to buy packaging Machine A or Machine B.

• During test runs, Machine A produced sample variance $s_A^2$ while Machine B produced sample variance $s_B^2$.

Question:
• Are these variances significantly different?

Test for comparing variances

• Suppose the population variances for weights of mealie-meal bags packaged from machines A and B are respectively $\sigma_A^2$ and $\sigma_B^2$

• We can answer the question concerning whether the variances are different by testing the null hypothesis $H_0: \sigma_A^2 = \sigma_B^2$

against the alternative $H_1: \sigma_A^2 \neq \sigma_B^2$

We will return to this later in the session.

Other applications

Other applications where testing for variance may be important include the following:

• Foreign exchange stability is important in any economy. Too much variation of a currency is not good.

• Price stability of other commodities is also important.

Question: Can you name other possible areas of
application where testing that the variation
remains stable at a pre-set value is important?

The chi-square test

• This test applies when we want to test for a single variance.

• The null hypothesis is of the form

$H_0: \sigma^2 = \sigma_0^2$

• Need to test this against the alternative

$H_1: \sigma^2 > \sigma_0^2$

• The test is based on the comparison between $s^2$ and $\sigma_0^2$, using the ratio $s^2 / \sigma_0^2$

Conducting the test
• Calculate the chi-square test statistic

$$X^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

Under $H_0$, this is known to have a chi-square distribution with $n-1$ d.f.

• Compare this with chi-square tables, or use statistics software to get the p-value.

• Here, p-value $= P(\chi^2_{n-1} > X^2)$, where $\chi^2_{n-1}$ is a chi-square random variable.

Form of Chi-square distn

[Figure: chi-square density curve; the shaded area to the right of the calculated test statistic represents the p-value]

Back to Example

• Suppose the mealie-meal packaging machine is designed to operate with precision of $\sigma_0^2 = 0.0016$ Kg²

• Suppose that data from a sample of 12 mealie-meal bags gave $s^2 = 0.0025$ Kg²

• Does the data indicate a significant increase in the variation?

Test computations and results
• The calculated chi-square value

$$X^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{11 \times 0.0025}{0.0016} = 17.2$$

• The p-value (based on a chi-square with 11 d.f.) is

$P(\chi^2_{11} > 17.2) \approx 0.10$

indicating no significant increase in the variance at the 5% level.
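
The chi-square statistic and its upper-tail probability can be sketched with the standard library only. The helper `chi2_upper_tail` is a hypothetical name; it uses the closed-form series for integer degrees of freedom rather than any statistics package. Under these assumptions the computed tail probability for 11 d.f. comes out near 0.10.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with integer df >= 1 exceeds x), via closed-form series."""
    half = x / 2.0
    if df % 2 == 0:
        # Even d.f.: finite Poisson-type sum
        return math.exp(-half) * sum(half ** k / math.factorial(k)
                                     for k in range(df // 2))
    # Odd d.f.: normal tail term plus a finite sum with half-integer gammas
    series = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                 for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series

n = 12                    # bags sampled
sample_var = 0.0025       # s^2 in Kg^2
design_var = 0.0016       # sigma_0^2 in Kg^2

chi_sq = (n - 1) * sample_var / design_var
p_value = chi2_upper_tail(chi_sq, n - 1)
significant_at_5pct = p_value < 0.05
print(round(chi_sq, 2), round(p_value, 3), significant_at_5pct)
```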

The F-test

• The F-test is used for comparing two variances, say $\sigma_1^2$ and $\sigma_2^2$

• The hypothesis being tested is
$H_0: \sigma_1^2 = \sigma_2^2$

with either a one-sided alternative
$H_1: \sigma_1^2 > \sigma_2^2$; $H_1: \sigma_1^2 < \sigma_2^2$
or a two-sided alternative
$H_1: \sigma_1^2 \neq \sigma_2^2$

The F-test

• The null hypothesis is rejected for large values of the F-statistic, $F = s_1^2 / s_2^2$, in the case of a one-sided test

For a 2-sided test, need to pay attention to both sides of the F-distribution (see below).

[Figure: example of an F-distribution; axis marks at 0.49 and 2.11; label "1% region (0.5% x 2)"]

To use just the upper tail F value, ensure the F-ratio is calculated so it is > 1, then use the upper tail 2½% F-tabled value when testing at 5% significance.