Experimental design in Data Science in capstone project

ckibiwott, 32 slides, Oct 08, 2025

Experimental Design
•Introduction:
• It is a plan used to collect the data relevant to the problem under
study in such a way as to provide the basis for valid and objective
inference about the stated problem.

•The plan consists of:
•1) Selection of treatments whose effects are to be studied
•2) The specification of the experimental layout
•3) The assignment of treatments to the experimental units
•4) Collection of observations for analysis

•An experiment is planned to:
•a) Get maximum information for minimum expenditure in the minimum possible time
•b) Avoid systematic errors
•c) Evaluate the outcomes critically and logically
•d) Eliminate spurious effects, if any

Considerations used in planning an experiment:
•1) What is the experiment intended to do?
•2) What is the nature of the treatments and of the dependent (response) variable, and how are they to be measured?
•3) How is the independent variable (the treatment) likely to affect the dependent variable?
•4) Are the factors to be held constant or varied? If varied, is the variation quantitative or qualitative?

Analysis of experiment
•The significance of the difference between the means of 2 different samples can be tested by a t-test (a paired t-test when the observations are matched pairs) or a Z-test, depending on the sample size:
•If the sample size is less than 30 ---- t-test
•If the sample size is 30 or more ---- Z-test
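A minimal sketch of this rule of thumb, using only Python's standard library. The function name `compare_two_means` and its `large_sample` threshold parameter are hypothetical. It assumes two independent (not paired) samples, uses the pooled-variance t statistic (which assumes equal population variances), and, for large samples, substitutes the sample variances for the population variances in the Z statistic. The standard library has no t distribution, so for the t case the statistic and its degrees of freedom are returned for comparison with a t table.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def compare_two_means(x, y, large_sample=30):
    """Test statistic for H0: mean(x) == mean(y), two independent samples.

    If either sample has fewer than `large_sample` observations, return the
    pooled two-sample t statistic and its degrees of freedom; otherwise return
    the large-sample Z statistic and its two-sided p-value.
    """
    n1, n2 = len(x), len(y)
    m1, m2 = mean(x), mean(y)
    v1, v2 = variance(x), variance(y)  # sample variances (divisor n - 1)
    if min(n1, n2) < large_sample:
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df  # common-variance estimate
        t = (m1 - m2) / sqrt(pooled * (1 / n1 + 1 / n2))
        return "t", t, df  # compare |t| with a t table at df degrees of freedom
    # Large samples: sample variances stand in for the population variances.
    z = (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return "z", z, p


# Made-up sales figures for two salesmen over 4 time periods (illustration only).
print(compare_two_means([10, 12, 11, 13], [8, 9, 7, 10]))  # ('t', 3.286..., 6)
```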

•Example: the factor is the salesman, with 2 salesmen, each observed over 4 time periods.
•Objective: to test the significance of the difference between the mean sales revenues of the 2 salesmen, using a t-test or Z-test.
•If the number of salesmen (levels of the factor) is more than 2, the comprehensive technique of Analysis of Variance (ANOVA), which uses the F-test, can be used instead.

ANALYSIS OF VARIANCE (ANOVA)
•Decomposition of the total variability into its components is called analysis of variance.
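For a single factor (for example, several salesmen), the decomposition can be sketched as a one-way ANOVA: the total sum of squares SST splits into a between-treatment part SSB and a within-treatment (error) part SSW, so that SST = SSB + SSW, and the ratio of their mean squares is the F statistic. The function name is hypothetical and the numbers in the usage line are made up for illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Split total variability into between-treatment and within-treatment
    (error) sums of squares, and form the F ratio of their mean squares."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    # Between treatments: spread of the treatment means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within treatments: spread of observations around their own treatment mean.
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {"SST": ss_total, "SSB": ss_between, "SSW": ss_within,
            "df_between": df_between, "df_within": df_within, "F": f_ratio}


result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)  # SST = 60 = SSB (54) + SSW (6); F = 27.0
```

The computed F is then compared with the F table value at (df_between, df_within) degrees of freedom.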

Types of factors:
•A factor is a variable that has an effect on the response variable of an experiment.
•1) Fixed factor: In an experiment, if a specific set of treatments of a factor is selected deliberately (with certainty), then that factor is termed a fixed factor. In such a case, the inference from the analysis of the experiment applies only to that set of treatments of the factor. e.g. if 4 salesmen A, B, C and D are studied for their effect on sales, then the inference applies only to those 4.

2) Random Factor:
•In an experiment, if a set of treatments of a factor is selected randomly from among the available treatments, then that factor is termed a random factor. In such a situation, the inference from the analysis of the experiment can be generalized to all of the treatments of the factor.
•e.g. if four salesmen A, D, M and X are selected randomly from the available salesmen A, B, C, …, Z to study their effect on sales revenue, the inference can be generalized to all of the salesmen A to Z.
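The random selection of factor levels can be sketched with Python's `random.Random.sample`, which draws distinct items without replacement. The pool A to Z mirrors the example above; the function name `select_random_levels` and the seed are hypothetical choices, the seed only making the draw reproducible.

```python
import random
import string


def select_random_levels(pool, k, seed=None):
    """Draw k distinct levels of a factor at random from the available pool,
    as for a random factor. A fixed factor would list its levels explicitly."""
    rng = random.Random(seed)
    return sorted(rng.sample(pool, k))


salesmen = list(string.ascii_uppercase)  # hypothetical pool: salesmen A..Z
chosen = select_random_levels(salesmen, 4, seed=42)
print(chosen)  # four distinct salesmen; inference extends to all of A..Z
```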

Effect of different explant sources on shoot induction
Explant | No. of explants cultured | Days to shoot initiation | No. of t.t showing result
Apical meristem | 10 | 8-9 | 9
Axillary meristem | 10 | 10-12 | 7.2
Nodal region | 10 | 12-15 | 6.8

•Factor: explant (number of factors = 1)
•Levels of the factor (treatments) = 3
•Replicates = 5, carried out to minimize the error
•Response variables = 2: days to shoot initiation and no. of t.t showing result
•e.g. replicates of 8-9 = 10, 8, 9, 10, 9

Types of Design:
•2 types: 1) Systematic 2) Randomized
•Analysis of variance techniques are suitable for randomized designs only. The basic randomized designs are:
•1) Completely randomized design
•2) Randomized complete block design
•3) Latin square design
•4) Factorial design
•Duncan's multiple range test, also covered below, is not a design but a multiple comparison procedure applied after ANOVA.

Basic Principles of Experimental Design:
•1) Randomization 2) Replication 3) Local control
•1) Randomization: It is a random process of assigning treatments to the experimental units. Random means that every unit has an equal chance of being selected.
•2) Replication: It is the repetition of the basic experiment, i.e. a complete run of all the treatments to be tested in an experiment. An individual repetition is called a replicate. Replication is used to account for the variation in an experiment:
•1) To secure a more accurate estimate of the experimental error
•2) To decrease the experimental error and increase the precision

•3) Local Control: It is a term referring to the amount of balancing, blocking and grouping of the experimental units. The main purpose is to increase the efficiency of an experimental design by decreasing the experimental error.
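Why replication increases precision can be seen from the standard error of a treatment mean. Assuming r independent replicates per treatment with a common error variance σ² (notation not used in the slides), the standard error of the mean of treatment i is:

```latex
\operatorname{SE}(\bar{y}_i) = \sqrt{\frac{\sigma^2}{r}} = \frac{\sigma}{\sqrt{r}}
```

For example, increasing r from 4 to 16 halves the standard error, so precision improves only with the square root of the number of replicates.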

1) Completely Randomized design
•The CR design is the simplest of the basic designs: the treatments are assigned to experimental units completely at random, i.e. the randomization is done without any restriction. The design is completely flexible, i.e. any number of treatments and any number of units per treatment may be used.
•A CR design is used when: a) the experimental units are homogeneous, and b) the experiments are small, on a laboratory scale.
•Experimental layout: The layout of an experiment is the actual placement of the treatments on the experimental units, which may pertain to time, space or type of material. An example of the experimental layout for a CR design using 4 treatments A, B, C, D, each repeated 3 times:

•1) C A B D  2) C B C A  3) A D D B
•Advantages of CR:
•1) The design is very simple and is easily laid out.
•2) It has the simplest statistical analysis.
•3) It provides the maximum number of degrees of freedom for the error sum of squares.
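The unrestricted randomization of a CR design can be sketched as one random permutation of all treatment replicates. The helper names are hypothetical; the rows in the returned layout are only for display and are not blocks. The two error-df helpers use the standard formulas t(r - 1) for a CR design and (t - 1)(r - 1) for an RCB design with t treatments and r replicates, which illustrates advantage 3.

```python
import random


def crd_layout(treatments, reps, row_len, seed=None):
    """Completely randomized layout: every treatment is replicated `reps`
    times and all units are randomized together, with no restriction."""
    rng = random.Random(seed)
    units = [t for t in treatments for _ in range(reps)]
    rng.shuffle(units)  # any permutation of the units is possible
    # Split into rows only for display; rows are NOT blocks in a CR design.
    return ["".join(units[i:i + row_len]) for i in range(0, len(units), row_len)]


def error_df_crd(t, r):
    return t * (r - 1)        # N - t, with N = t * r units


def error_df_rcb(t, r):
    return (t - 1) * (r - 1)  # blocking uses up (r - 1) error degrees of freedom


print(crd_layout("ABCD", reps=3, row_len=4, seed=1))
print(error_df_crd(4, 3), error_df_rcb(4, 3))  # 8 6
```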

2) Randomized complete block design
•An RCB design may be defined as one in which:
•1) The experimental material is divided into groups or blocks in such a manner that the experimental units within a particular block are relatively homogeneous.
•2) Each block contains a complete set of treatments, i.e. it constitutes a replication of the treatments.
•3) The treatments are assigned at random to the experimental units within each block, which means that randomization is restricted within blocks.
•It is one of the most frequently used experimental designs.
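The restricted randomization of an RCB design can be sketched by shuffling a complete set of treatments separately inside each block, in contrast with the single unrestricted shuffle of a CR design. The function name and seed are hypothetical.

```python
import random


def rcb_layout(treatments, n_blocks, seed=None):
    """Randomized complete block layout: each block holds every treatment
    exactly once, and randomization is done separately within each block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)  # a complete set of treatments
        rng.shuffle(block)        # randomization restricted to this block
        layout.append("".join(block))
    return layout


print(rcb_layout("ABCD", 3, seed=7))  # 3 blocks, each a permutation of ABCD
```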

Advantages and disadvantages:
•1) A source of variation is controlled by grouping the experimental material, and hence the estimate of the experimental error is decreased.
•2) The design is flexible, i.e. any number of replications (not less than 2) may be run and any number of treatments may be tested.
•3) The experiment can be set up easily.
•4) It is easy to adjust for missing observations.

•Disadvantages:
•1) It controls variability in only one direction.
•2) It is not a suitable design when the number of treatments is very large or when the blocks are not homogeneous.

3) Latin Square design:

• The experimental error in an RCB design is reduced by controlling the source of extraneous variation in one direction, i.e. by grouping the experimental units in one way. When variation is found in two directions, it becomes necessary to remove these two sources of variation simultaneously. This is achieved by simultaneously blocking the experimental units in two mutually perpendicular directions, called rows and columns.

•Each row and each column is therefore a complete block, and the grouping is balanced so that each treatment appears once in each row and once in each column. If there are k treatments, the experimental area is divided into k rows and k columns, giving k² plots (experimental units), as the experiment is laid out in a square pattern. The treatments are then assigned at random to the plots or experimental units.

•Such a double blocking of experimental units, with a correspondingly doubly restricted random assignment, is called a Latin Square design. An LS design is an arrangement of k treatments in a k×k square, where the treatments are grouped in blocks in two directions and each treatment appears once and only once in each direction. In an LS design, the number of rows, the number of columns and the number of treatments must all be equal.
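One common way to obtain such a square is cyclic rotation: each row is the previous row shifted by one position. The sketch below builds a cyclic square and, optionally, randomizes it by permuting whole rows and whole columns, which keeps the Latin property. Function names and the seed are hypothetical, and this is one construction among several, not the only valid one.

```python
import random


def latin_square(treatments, seed=None):
    """Build a k x k Latin square by cyclic rotation, then optionally
    randomize it by permuting whole rows and whole columns."""
    k = len(treatments)
    # Row i is the treatment sequence rotated left by i positions.
    square = [[treatments[(i + j) % k] for j in range(k)] for i in range(k)]
    if seed is not None:
        rng = random.Random(seed)
        rng.shuffle(square)        # permute rows
        cols = list(range(k))
        rng.shuffle(cols)          # permute columns
        square = [[row[c] for c in cols] for row in square]
    return ["".join(row) for row in square]


def is_latin(square):
    """True if every treatment appears exactly once in each row and column."""
    k = len(square)
    symbols = set(square[0])
    if len(symbols) != k:
        return False
    rows_ok = all(len(row) == k and set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(k))
    return rows_ok and cols_ok


print(latin_square("ABC"))  # ['ABC', 'BCA', 'CAB']
print(latin_square("ABCDE", seed=3))  # a randomized 5 x 5 square
```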

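•Experimental layout:
•1) It is commonly constructed by rotation, i.e. by cyclically shifting the order of the treatments by one position in each successive row.
•e.g. Five fertilizers A, B, C, D and E were tested by arranging plants in an LS design in the field. The yield is shown as: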

Advantages and disadvantages:
•1) The LS design reduces the error variance by controlling two sources of variation.
•2) It is generally more efficient than an RCB design when variation is present in two directions.
•3) It is less flexible than an RCB design. It is practical only for about 5-10 treatments and is seldom used with more than 10.

Disadvantages:
•1) Replication in an LS design is costly.
•2) In agricultural experimentation, the land requirement is rigid, the actual layout may be laborious, and access to the central plots is difficult.

•4) Factorial Design:
•If there is more than one factor, a generalized design of experiment is needed; it is called a factorial design. e.g. Experiments are often planned to investigate the effect of different rates of fertilizers, different dates of planting, different categories of education, different intensities of a stimulus, etc.

•The independent variables (fertilizers, planting, education, stimulus) are called factors.
•The values they take, such as rates, dates, categories and intensities, are known as levels.
•An experiment is called a factorial experiment if the treatments consist of all possible combinations of the levels of several factors.
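The set of treatments of a full factorial experiment is the Cartesian product of the factors' levels, which `itertools.product` generates directly. The factor names and levels below are hypothetical examples echoing the fertilizer and planting-date case.

```python
from itertools import product


def factorial_treatments(factors):
    """Full factorial: one treatment for every combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


# Hypothetical factors and levels, for illustration only.
treatments = factorial_treatments({
    "fertilizer_rate": ["low", "high"],
    "planting_date": ["early", "mid", "late"],
})
print(len(treatments))  # 2 x 3 = 6 treatment combinations
```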

5) Duncan's Multiple Range Test:
•In statistics, Duncan's new multiple range test (MRT) is a multiple comparison procedure developed by David B. Duncan in 1955. Duncan's MRT belongs to the general class of multiple comparison procedures that use the studentized range statistic $q_r$ to compare sets of means.

•If ANOVA shows a significant difference between the treatment means of a factor (a component of the model) with respect to the response variable, then Duncan's multiple range test can be used to compare the means of the treatments of that factor.

•Duncan's multiple range test is convenient because it combines the ease of hypothesis testing with the power of comparing each mean with every other mean.
•There are a number of methods for testing which treatment means differ; Duncan's test is one of them.
•It is used to compare the means of the treatments of that component in the model. This test was developed by Duncan (1955).

•The steps of this test are:
•Step-1: Arrange the treatment averages in ascending order from left to right.
•Step-2: Find the standard error of each treatment mean.
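Steps 1 and 2 can be sketched as below, assuming every treatment has the same number of replicates r and that the error mean square MS_E comes from the preceding ANOVA; the standard error of a treatment mean is then sqrt(MS_E / r). The function name, treatment labels and numbers are hypothetical. The later steps, which multiply this standard error by significant studentized ranges from Duncan's tables, are not included here.

```python
from math import sqrt


def duncan_steps_1_2(treatment_means, ms_error, reps):
    """Step 1: order the treatment means ascending.
    Step 2: standard error of a treatment mean, sqrt(MS_E / r)."""
    ordered = sorted(treatment_means.items(), key=lambda item: item[1])
    se = sqrt(ms_error / reps)
    return ordered, se


# Hypothetical treatment means and ANOVA error mean square.
ordered, se = duncan_steps_1_2({"T1": 12.0, "T2": 10.5, "T3": 14.2}, ms_error=2.0, reps=5)
print(ordered)  # [('T2', 10.5), ('T1', 12.0), ('T3', 14.2)]
print(se)       # sqrt(2.0 / 5)
```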