BY
VISWANTH REDDY.S
DEPARTMENT OF PHARMACOLOGY
GOKARAJU RANGARAJU COLLEGE OF PHARMACY
Analysis of variance (ANOVA)
Experimental designs
CRD
RCBD
LSD
Applications of biostatistics
ANOVA is mainly employed to compare the means of three or more samples, taking into account the variation within each sample.
This statistical technique was first developed by R. A. Fisher and was used extensively in agricultural experiments.
The analysis of variance is a method of estimating the contribution made by each factor to the total variation. The total variation splits into the following two components:
1. Variation within the samples
2. Variation between the samples
There are two classifications for the analysis of variance.
When the data are classified on the basis of one factor, the analysis is known as one-way ANOVA.
When the data are classified on the basis of two factors, the analysis is known as two-way ANOVA.
The technique of analysing variance is similar for one factor and for two factors. However, in one-factor analysis the total variance is divided into two parts only:
1. Variance between samples
2. Variance within the samples
The variance within the samples is the residual variance.
In two-factor analysis, the total variance is divided into three parts, viz.:
1. Variance due to factor one
2. Variance due to factor two
3. Residual variance
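The partition described above can be written compactly. The following is a sketch in standard textbook notation; the symbols ($x_{ij}$ for observation $i$ in sample $j$, $\bar{x}_j$ for the mean of sample $j$, $\bar{x}$ for the overall mean, $n_j$ for the size of sample $j$) are introduced here for illustration and do not appear on the slides.

```latex
% One-factor case: total variation = between samples + within samples (residual)
\sum_{j}\sum_{i} (x_{ij} - \bar{x})^2
  = \sum_{j} n_j (\bar{x}_j - \bar{x})^2
  + \sum_{j}\sum_{i} (x_{ij} - \bar{x}_j)^2

% Two-factor case (one observation per cell): three parts
SS_{\text{total}} = SS_{\text{factor 1}} + SS_{\text{factor 2}} + SS_{\text{residual}}
```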
PROCEDURE FOR CALCULATING F-STATISTIC:
The t-test is employed to compare the means of two samples.
The F-test is employed to compare the means of three or more samples. In this case, the treatments and the replicates are shown in columns and rows, respectively. We now have to find out whether the variations between them are significant and, if so, at what level of significance. For this purpose we calculate the F-statistic, which is a ratio of variances. The detailed procedure is as follows:
                         TREATMENTS
                     1       2       3       Row total
REPLICATES    1      X11     X21     X31     ∑XR1
              2      X12     X22     X32     ∑XR2
              3      X13     X23     X33     ∑XR3
Column total         ∑XC1    ∑XC2    ∑XC3    GRAND TOTAL (G)

$A = \sum X^2 = \sum X_{C1}^2 + \sum X_{C2}^2 + \sum X_{C3}^2$
$B = \frac{(\sum X_{C1})^2}{n_{C1}} + \frac{(\sum X_{C2})^2}{n_{C2}} + \frac{(\sum X_{C3})^2}{n_{C3}}$
$C = \frac{(\sum X_{R1})^2}{n_{R1}} + \frac{(\sum X_{R2})^2}{n_{R2}} + \frac{(\sum X_{R3})^2}{n_{R3}}$
$D = \text{C.F.} = \frac{(\sum X)^2}{n} = \frac{G^2}{n}$
Now total sum of squares = A - D
Between treatments sum of squares = B - D
Between rows sum of squares = C - D
Residual sum of squares = (A - D) - [(B - D) + (C - D)]
SOURCE OF VARIATION    DEGREES OF FREEDOM (d.f.)    SUM OF SQUARES (SS)       MEAN SQUARE (MS)
BETWEEN TREATMENTS     c-1                          B-D                       (B-D)/(c-1)
BETWEEN ROWS           r-1                          C-D                       (C-D)/(r-1)
RESIDUAL               (c-1)(r-1)                   (A-D)-[(B-D)+(C-D)]       {(A-D)-[(B-D)+(C-D)]}/[(c-1)(r-1)]
TOTAL                  cr-1                         A-D
                         TREATMENTS
                     1       2       3
REPLICATES    1      X11     X21     X31
              2      X12     X22     X32
              3      X13     X23     X33
Column total         ∑XC1    ∑XC2    ∑XC3    GRAND TOTAL (G)
1. Find the sum of the squares of all observations:
$\sum X^2 = \sum X_{C1}^2 + \sum X_{C2}^2 + \sum X_{C3}^2$ ... (A)
2. Square each column total and divide it by the number of observations in that column (denoted $n_{C1}, n_{C2}, n_{C3}$, etc.), then add the results:
$\frac{(\sum X_{C1})^2}{n_{C1}} + \frac{(\sum X_{C2})^2}{n_{C2}} + \frac{(\sum X_{C3})^2}{n_{C3}}$ ... (B)
3. Find the grand total:
$\sum X = \sum X_{C1} + \sum X_{C2} + \sum X_{C3} = G$
4. Square the grand total and divide it by the total number of observations (n) to obtain the correction factor:
$\text{C.F.} = \frac{(\sum X)^2}{n} = \frac{G^2}{n}$ ... (D)
5. Calculate the F value:
F = BETWEEN TREATMENTS MEAN SQUARE / RESIDUAL MEAN SQUARE
SOURCE OF VARIATION    DEGREES OF FREEDOM (d.f.)    SUM OF SQUARES (SS)    MEAN SQUARE (MS)     F VALUE
BETWEEN TREATMENTS     c-1                          B-D                    (B-D)/(c-1)          [(B-D)/(c-1)] / [(A-B)/(c(r-1))]
RESIDUAL               c(r-1)                       A-B                    (A-B)/[c(r-1)]
TOTAL                  cr-1                         A-D
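The one-way procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_way_anova` and the sample data are hypothetical, while the quantities A, B and D follow the definitions on the slides.

```python
def one_way_anova(columns):
    """One-way ANOVA from raw data; `columns` holds one list of observations per treatment."""
    n = sum(len(col) for col in columns)        # total number of observations
    c = len(columns)                            # number of treatments (columns)
    G = sum(sum(col) for col in columns)        # grand total

    A = sum(x * x for col in columns for x in col)         # sum of squared observations
    B = sum(sum(col) ** 2 / len(col) for col in columns)   # squared column totals / column sizes
    D = G ** 2 / n                                         # correction factor

    ms_between = (B - D) / (c - 1)     # between-treatments mean square
    ms_residual = (A - B) / (n - c)    # residual mean square; n - c equals c(r - 1) for equal r
    return {
        "ss_total": A - D,
        "ss_between": B - D,
        "ss_residual": A - B,
        "ms_between": ms_between,
        "ms_residual": ms_residual,
        "F": ms_between / ms_residual,
    }


# Hypothetical data: three treatments with three replicates each.
example = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]
result = one_way_anova(example)
print(result["ss_between"], result["ss_residual"], result["F"])
```

The computed F would then be compared with the tabulated F value for (c-1) and c(r-1) degrees of freedom at the chosen level of significance.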
In one-way classification we studied the influence of one factor. In two-way classification we study the influence of two factors.
In such cases the data are classified on two criteria. For example, the yield of different varieties of wheat may be affected by the application of different fertilizers.
Analysis of variance can therefore be used to test the effects of these two factors simultaneously.
The calculation in two-factor analysis is more or less the same, with the addition of a calculation based on rows.
In one-way classification only the columns are taken into consideration. In two-way analysis both columns and rows are considered.
                         TREATMENTS
                     1       2       3       Row total
REPLICATES    1      X11     X21     X31     ∑XR1
              2      X12     X22     X32     ∑XR2
              3      X13     X23     X33     ∑XR3
Column total         ∑XC1    ∑XC2    ∑XC3    GRAND TOTAL (G)

$A = \sum X^2 = \sum X_{C1}^2 + \sum X_{C2}^2 + \sum X_{C3}^2$
$B = \frac{(\sum X_{C1})^2}{n_{C1}} + \frac{(\sum X_{C2})^2}{n_{C2}} + \frac{(\sum X_{C3})^2}{n_{C3}}$
$C = \frac{(\sum X_{R1})^2}{n_{R1}} + \frac{(\sum X_{R2})^2}{n_{R2}} + \frac{(\sum X_{R3})^2}{n_{R3}}$
$D = \text{C.F.} = \frac{(\sum X)^2}{n} = \frac{G^2}{n}$
Now total sum of squares = A - D
Between treatments sum of squares = B - D
Between rows sum of squares = C - D
Residual sum of squares = (A - D) - [(B - D) + (C - D)]
SOURCE OF VARIATION    DEGREES OF FREEDOM (d.f.)    SUM OF SQUARES (SS)       MEAN SQUARE (MS)                        F VALUE
BETWEEN TREATMENTS     c-1                          B-D                       (B-D)/(c-1)                             treatments MS / residual MS
BETWEEN ROWS           r-1                          C-D                       (C-D)/(r-1)                             rows MS / residual MS
RESIDUAL               (c-1)(r-1)                   (A-D)-[(B-D)+(C-D)]       {(A-D)-[(B-D)+(C-D)]}/[(c-1)(r-1)]
TOTAL                  cr-1                         A-D
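The two-way table above can be illustrated with a short Python sketch. It assumes one observation per cell (as in the layout shown), and the function name `two_way_anova` and the data are hypothetical; A, B, C and D are the quantities defined on the slides.

```python
def two_way_anova(table):
    """Two-way ANOVA with one observation per cell.

    table[i][j] is the observation for replicate (row) i under treatment (column) j.
    """
    r = len(table)        # number of rows (replicates)
    c = len(table[0])     # number of columns (treatments)
    n = r * c

    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    G = sum(row_totals)   # grand total

    A = sum(x * x for row in table for x in row)   # sum of squared observations
    B = sum(t * t for t in col_totals) / r         # each column holds r observations
    C = sum(t * t for t in row_totals) / c         # each row holds c observations
    D = G * G / n                                  # correction factor

    ss_residual = (A - D) - ((B - D) + (C - D))
    ms_residual = ss_residual / ((c - 1) * (r - 1))
    return {
        "ss_total": A - D,
        "ss_treatments": B - D,
        "ss_rows": C - D,
        "ss_residual": ss_residual,
        "F_treatments": ((B - D) / (c - 1)) / ms_residual,
        "F_rows": ((C - D) / (r - 1)) / ms_residual,
    }


# Hypothetical 3 x 3 data set (rows = replicates, columns = treatments).
data = [[4, 6, 8], [5, 8, 8], [6, 7, 11]]
two_way = two_way_anova(data)
print(two_way["F_treatments"], two_way["F_rows"])
```

Both F values share the same residual mean square as denominator, which is why the residual line of the table has no F entry of its own.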
A statistical design is a plan for the collection and analysis of data.
It mainly deals with the following parameters.
However, the selection of an efficient design requires careful planning in advance of both data collection and analysis.
[Figure: example plot layouts with treatments A, B, C and D]
Randomization:
To eliminate bias
To ensure independence among observations
Required for valid significance tests and interval estimates
[Figure: four pairs of plots, each pair labelled Old and New, on a fertility gradient from Low to High]
In each pair of plots, although replicated, the new variety is consistently assigned to the plot with the higher fertility level.
Replication: the repetition of a treatment in an experiment.
[Figure: plot layout with treatments A, B, C and D each repeated three times]
Ex:
A physician wants to know whether a newly invented drug will be beneficial in the treatment of a particular disease.
A farmer wants to know whether a new type of fertilizer will give him better yields. He will frame his investigation in terms of some suitable hypothesis.
There are many types of experimental designs; the most important are as follows.
In a completely randomized design (CRD), the treatments are assigned completely at random so that each experimental unit has the same chance of receiving any one treatment.
This is suitable only when the experimental material is homogeneous (ex: laboratory experiments, greenhouse studies, etc.).
It is not suitable for heterogeneous material (ex: field experiments).
Advantages:
Simple and easy
Provides the maximum number of degrees of freedom
Disadvantages:
Only suitable for a small number of treatments and for homogeneous experimental material.
Low precision if the plots are not uniform
Simplest and least restrictive
Every plot is equally likely to be assigned to any treatment
[Figure: plots with treatments A, B, C and D assigned at random]
We have an experiment to test three varieties, the top line from each of Oregon, Washington and Idaho, to find which grows best in our area (t = 3, r = 4).
[Figure: 12 plots numbered 1 to 12 in three rows of four, with variety A assigned to four plots chosen at random]
Layout of CRD:
The step-by-step procedure for randomization and layout of a CRD is given for a field experiment with five treatments and four replications.
Determine the total number of experimental units (n) as the product of the number of treatments (t) and the number of replications (r):
n = r × t = 4 × 5 = 20
The entire experimental material is divided into n experimental units.
Ex: five treatments with four replications need 20 experimental units. The 20 units are numbered as follows:
 1    2    3    4    5
 6    7    8    9   10
11   12   13   14   15
16   17   18   19   20
Assign the treatments to the experimental units using three-digit random numbers selected from a random number table.
The random numbers are written down in order and ranked: the lowest random number receives rank 1 and the highest random number receives the highest rank. These ranks correspond to the unit numbers.
The first set of r units is allotted to treatment T1.
The next set of r units is allotted to treatment T2.
The following set of r units is allotted to treatment T3, and so on.
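Random number   Rank   Treatment
937             17     T1
149             02     T1
908             15     T1
361             07     T1
953             19     T2
749             13     T2
180             04     T2
951             18     T2
957             20     T5
157             03     T5
571             11     T5
226             05     T5
(The rows for T3 and T4 follow the same pattern; in the final layout T3 occupies units 1, 8, 14 and 16, and T4 occupies units 6, 9, 10 and 12.)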
Final layout:
 1 T3    2 T1    3 T5    4 T2    5 T5
 6 T4    7 T1    8 T3    9 T4   10 T4
11 T5   12 T4   13 T2   14 T3   15 T1
16 T3   17 T1   18 T2   19 T2   20 T5
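The ranking procedure used to build a CRD layout can be sketched as follows. This is a minimal illustration, not the slides' own code: the function name `crd_layout` is hypothetical, the random numbers are assumed to be distinct, and the toy example uses only two treatments with two replications.

```python
def crd_layout(random_numbers, treatments, r):
    """Map unit numbers to treatments by ranking distinct random numbers.

    The rank of each random number is the unit number; the first r numbers
    drawn go to the first treatment, the next r to the second, and so on.
    """
    if len(random_numbers) != len(treatments) * r:
        raise ValueError("need exactly r random numbers per treatment")
    if len(set(random_numbers)) != len(random_numbers):
        raise ValueError("random numbers must be distinct so that ranks are unique")

    ordered = sorted(random_numbers)
    layout = {}
    for position, number in enumerate(random_numbers):
        unit = ordered.index(number) + 1              # rank 1 = lowest random number
        layout[unit] = treatments[position // r]      # consecutive sets of r draws share a treatment
    return layout


# Toy example (hypothetical): two treatments, two replications, four units.
numbers = [937, 149, 908, 361]
layout = crd_layout(numbers, ["T1", "T2"], r=2)
print(sorted(layout.items()))   # [(1, 'T1'), (2, 'T2'), (3, 'T2'), (4, 'T1')]
```

Requiring distinct random numbers is a design choice made here so that every unit receives exactly one rank; a repeated number would have to be redrawn.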
Analysis of variance:
There are two sources of variation among the observations obtained from a CRD trial:
1. Treatment variation
2. Experimental error
The relative size of the two is used to indicate whether the observed differences among the treatments are real or due to chance.
Calculations:
1. Correction factor: $\text{C.F.} = \frac{(GT)^2}{n}$
2. Total sum of squares: $\text{Total SS} = \sum X^2 - \text{C.F.}$
3. Treatment sum of squares: $\text{TSS} = \sum_i \frac{T_i^2}{r} - \text{C.F.}$, where $T_i$ is the total of treatment i and r is the number of replications
4. Error sum of squares: $\text{ESS} = \text{Total SS} - \text{TSS}$
These results are summarized in the ANOVA table, and the mean squares and F are calculated.
ANOVA table:
Source of variation    df     SS          MS                   F
Treatments             t-1    TSS         TMS = TSS/(t-1)      TMS/EMS
Error                  n-t    ESS         EMS = ESS/(n-t)
Total                  n-1    Total SS
The randomized complete block design (RCBD) is the most widely used experimental design in agricultural research.
The design is also used extensively in biology, medicine, the social sciences and business research.
The experimental material is grouped into homogeneous subgroups; each subgroup is commonly termed a block. Since each block contains the entire set of treatments, a block is equivalent to a replication.
Ex: in field experiments, soil fertility is an important character that influences crop response.
When the treatments are applied at random to relatively homogeneous units within each block and replicated over all the blocks, the design is known as a randomized block design (RBD).
The design divides the experimental units into n homogeneous groups of size t.
These homogeneous groups are called blocks.
The treatments are then randomly assigned to the experimental units in each block, one treatment to a unit in each block.
Advantages of RCBD:
This design has been shown to be more efficient and accurate than the CRD for most types of experimental work. Removing the between-blocks SS from the residual SS usually decreases the error mean square.
Flexibility is another advantage of the RCBD: the design itself places no restriction on the number of treatments or replications.
Disadvantages of RCBD:
In practice it is not suitable for a large number of treatments, because if the block size is large it may be difficult to maintain homogeneity within blocks. Consequently the error will increase.
Layout of RCBD:
Let us consider an experiment conducted on 4 blocks of land, each having 5 plots. With five treatments, each replicated 4 times, we divide the whole experimental area into 4 relatively homogeneous blocks and each block into five plots or units. Treatments are allocated at random to the units within each block.
                PLOTS
BLOCKS     1    2    3    4    5
1          A    E    B    D    C
2          E    D    C    B    A
3          C    B    A    E    D
4          A    D    E    C    B
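Randomizing treatments separately within each block can be sketched as below. This is an illustrative assumption about how one might automate the step: it uses Python's pseudo-random generator rather than a random number table, and the function name `rcbd_layout` and the `seed` parameter are hypothetical.

```python
import random


def rcbd_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled copy of the treatments per block."""
    rng = random.Random(seed)       # a fixed seed makes the layout reproducible
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)    # every block receives the complete set of treatments
        rng.shuffle(block)          # a fresh randomization for each block
        layout.append(block)
    return layout


# Five treatments in four blocks, as in the layout above.
plan = rcbd_layout("ABCDE", n_blocks=4, seed=1)
for block_number, block in enumerate(plan, start=1):
    print(block_number, " ".join(block))
```

Because each block is shuffled on its own, every treatment appears exactly once per block, which is what makes a block equivalent to a replication.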
The ANOVA table for a randomized block experiment
Source of variation    d.f.          S.S.        M.S.                   F
Treatments             t-1           SST         SST/(t-1)              [SST/(t-1)] / [SSE/((t-1)(r-1))]
Blocks                 r-1           SSB         SSB/(r-1)              [SSB/(r-1)] / [SSE/((t-1)(r-1))]
Error                  (t-1)(r-1)    SSE         SSE/[(t-1)(r-1)]
Total                  rt-1          Total SS
By comparing the variance ratio for treatments with the critical value of F, we can find out whether the different treatments are significantly different.
This conclusion holds irrespective of the differences due to blocks.
A Latin square experiment is treated as a three-factor experiment.
The factors are rows, columns and treatments.
It is assumed that there is no interaction between rows, columns and treatments.
The degrees of freedom for the interactions are used to estimate error.
Latin square designs differ from randomized complete block designs in that the experimental units are grouped in blocks in two different ways, that is, by rows and by columns.
A requirement of the Latin square is that the number of treatments, the number of rows and the number of columns (replications) must all be equal; therefore, the total number of experimental units must be a perfect square. For example, if there are 4 treatments, there must be 4 replicates, or 4 rows and 4 columns.
Latin Square Designs
Selected Latin Squares
3 x 3
A B C
B C A
C A B
4 x 4
A B C D    A B C D    A B C D    A B C D
B A D C    B C D A    B D A C    B A D C
C D B A    C D A B    C A D B    C D A B
D C A B    D A B C    D C B A    D C B A
5 x 5          6 x 6
A B C D E      A B C D E F
B A E C D      B F D C A E
C D A E B      C D E F B A
D E B A C      D A F E C B
E C D B A      E C A B F D
               F E B A D C
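The defining property of these squares (every treatment occurs exactly once in each row and each column) can be checked mechanically. The helper below is a hypothetical sketch; the 5 x 5 and 6 x 6 squares are copied from the list above.

```python
def is_latin_square(square):
    """Return True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    if n == 0:
        return False
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    # Every row must have n entries and use each symbol once.
    if any(len(row) != n or set(row) != symbols for row in square):
        return False
    # Every column must also use each symbol once.
    return all({row[j] for row in square} == symbols for j in range(n))


square_5 = ["ABCDE", "BAECD", "CDAEB", "DEBAC", "ECDBA"]
square_6 = ["ABCDEF", "BFDCAE", "CDEFBA", "DAFECB", "ECABFD", "FEBADC"]
not_latin = ["ABC", "BCA", "CBA"]   # B is repeated in the second column

print(is_latin_square(square_5), is_latin_square(square_6), is_latin_square(not_latin))
```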
The layout of an LSD is shown below for an experiment with five treatments A, B, C, D and E. The 5×5 LSD plan is given as follows.
The plan is then randomized with the help of a table of random numbers. For this, select five three-digit random numbers.
A B C D E
B A E C D
C D A E B
D E B A C
E C D B A
Random number   Sequence   Rank
628             1          3
846             2          4
475             3          2
902             4          5
452             5          1
Now use the rank to represent the existing row number of the selected plan and the sequence to represent the row number of the new plan.
Thus the third row of the selected plan (rank = 3) becomes the first row (sequence = 1) of the new plan, and so on.
The columns are then randomized by the same procedure. The plan with rearranged rows, and the five random numbers selected for the columns, are as follows:
C D A E B
D E B A C
B A E C D
E C D B A
A B C D E
Random number   Sequence   Rank
792             1          4
032             2          1
947             3          5
293             4          3
196             5          2
The rank is now used to represent the column number of the plan obtained above, and the sequence represents the column number of the final plan.
In this way, the fourth column of the above plan becomes the first column of the final plan. Likewise the first column becomes the second, the fifth becomes the third, the third becomes the fourth and the second becomes the fifth. The final plan, which becomes the layout of the design, is as follows:
Row number    Column number
              1    2    3    4    5
1             E    C    B    A    D
2             A    D    C    B    E
3             C    B    D    E    A
4             B    E    A    D    C
5             D    A    E    C    B
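The row and column randomization walked through above can be reproduced with a short sketch. The function names `rank_order` and `randomize_latin_square` are hypothetical; the starting plan and the random numbers are the ones given on the slides (032 is written as 32 because Python integers cannot carry a leading zero).

```python
def rank_order(random_numbers):
    """Rank of each number in drawing order (1 = smallest); assumes distinct numbers."""
    ordered = sorted(random_numbers)
    return [ordered.index(x) + 1 for x in random_numbers]


def randomize_latin_square(plan, row_numbers, col_numbers):
    """Rearrange rows, then columns, of a Latin square plan using ranked random numbers."""
    # Row with number `rank` in the old plan becomes the next row of the new plan.
    rows = [plan[rank - 1] for rank in rank_order(row_numbers)]
    # The same rule is then applied to the columns of the row-rearranged plan.
    col_ranks = rank_order(col_numbers)
    return ["".join(row[rank - 1] for rank in col_ranks) for row in rows]


plan = ["ABCDE", "BAECD", "CDAEB", "DEBAC", "ECDBA"]
final_plan = randomize_latin_square(
    plan,
    row_numbers=[628, 846, 475, 902, 452],   # ranks 3, 4, 2, 5, 1
    col_numbers=[792, 32, 947, 293, 196],    # ranks 4, 1, 5, 3, 2
)
for row in final_plan:
    print(" ".join(row))
```

Permuting whole rows and whole columns keeps the Latin square property intact, so the final plan still has each treatment once per row and once per column.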
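ANALYSIS OF VARIANCE FOR LSD:
Here n is the number of treatments (equal to the number of rows and the number of columns), so there are n² observations in all. R, C and T denote the row, column and treatment totals.
$\text{C.F.} = \frac{(GT)^2}{n^2}$
$\text{Total SS} = \sum X^2 - \text{C.F.}$
$\text{Row SS} = \frac{1}{n}\sum R^2 - \text{C.F.}$
$\text{Column SS} = \frac{1}{n}\sum C^2 - \text{C.F.}$
$\text{Treatment SS} = \frac{1}{n}\sum T^2 - \text{C.F.}$
$\text{Error SS} = \text{Total SS} - \text{Row SS} - \text{Column SS} - \text{Treatment SS}$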
The ANOVA table for a Latin square experiment
Source    d.f.          SS          M.S.    F
Treat     n-1           TSS         TMS     TMS/EMS
Rows      n-1           RSS         RMS     RMS/EMS
Cols      n-1           CSS         CMS     CMS/EMS
Error     (n-1)(n-2)    ESS         EMS
Total     n²-1          Total SS
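The Latin square analysis can be sketched in Python as follows. This is an illustration under stated assumptions: n is taken to be the side of the square (so the correction factor divides by n², the total number of observations), the function name `latin_square_anova` is hypothetical, and the 3 x 3 data set is invented for the example.

```python
def latin_square_anova(plan, values):
    """ANOVA for an n x n Latin square.

    plan[i][j] is the treatment letter and values[i][j] the observation
    in row i, column j.
    """
    n = len(plan)
    GT = sum(sum(row) for row in values)                    # grand total
    CF = GT ** 2 / n ** 2                                   # correction factor over n*n observations

    total_ss = sum(x * x for row in values for x in row) - CF
    row_ss = sum(sum(row) ** 2 for row in values) / n - CF
    col_ss = sum(sum(row[j] for row in values) ** 2 for j in range(n)) / n - CF

    treatment_totals = {}
    for plan_row, value_row in zip(plan, values):
        for letter, x in zip(plan_row, value_row):
            treatment_totals[letter] = treatment_totals.get(letter, 0) + x
    treatment_ss = sum(t * t for t in treatment_totals.values()) / n - CF

    error_ss = total_ss - row_ss - col_ss - treatment_ss
    ems = error_ss / ((n - 1) * (n - 2))                    # error mean square
    return {
        "total_ss": total_ss,
        "row_ss": row_ss,
        "col_ss": col_ss,
        "treatment_ss": treatment_ss,
        "error_ss": error_ss,
        "F_treatments": (treatment_ss / (n - 1)) / ems,
        "F_rows": (row_ss / (n - 1)) / ems,
        "F_cols": (col_ss / (n - 1)) / ems,
    }


# Hypothetical 3 x 3 experiment.
lsd_plan = ["ABC", "BCA", "CAB"]
lsd_values = [[10, 12, 17], [13, 17, 12], [18, 11, 16]]
lsd = latin_square_anova(lsd_plan, lsd_values)
print(round(lsd["F_treatments"], 3), round(lsd["F_rows"], 3), round(lsd["F_cols"], 3))
```

With a 3 x 3 square the error term has only (n-1)(n-2) = 2 degrees of freedom, which illustrates why small Latin squares leave little information for estimating error.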
Advantages
Controls more variation than CR or RCB designs because of two-way stratification. This results in a smaller mean square for error.
Simple analysis of data
Analysis is simple even with missing plots.
Disadvantages
The number of treatments is limited to the number of replicates, which seldom exceeds 10.
With fewer than 5 treatments, the df for controlling random variation is relatively large and the df for error is small.
Applications of biostatistics in pharmacy:
Public health, including epidemiology, health services research, nutrition,
environmental health and healthcare policy & management.
Design and analysis of clinical trials in medicine
Population genetics, and statistical genetics in order to link variation in genotype with a
variation in phenotype. This has been used in agriculture to improve crops and farm
animals (animal breeding). In biomedical research, this work can assist in finding
candidates for gene alleles that can cause or influence predisposition to disease in
human genetics
Analysis of genomics data, for example from microarray or proteomics experiments, often concerning diseases or disease stages.
Ecology, ecological forecasting
Biological sequence analysis
Systems biology for gene network inference or pathways analysis
Statistical methods are beginning to be integrated into medical informatics, public health
informatics, bioinformatics and computational biology.
Applications of biostatistics in pharmacy:
Testing whether a new treatment, diagnostic or vaccine works.
Ideally a clinical trial should include all patients. Is that practically possible? No. We test the new treatment, diagnostic or vaccine on a representative sample of the population.
Statistics allows us to draw conclusions about the likely effect on the population using data from the sample.
BUT ALWAYS REMEMBER…
Statistics can never PROVE or DISPROVE a hypothesis; it only suggests whether to accept or reject the hypothesis on the basis of the available evidence.