Design of Experiments

DrKeertiJain1, 60 slides, Feb 07, 2020

About This Presentation

Experiments
A Quick History of Design of Experiments
Why We Use Experimental Designs
What Is Design of Experiments
How Design of Experiments Contributes
Terminology
Analysis of Variance (ANOVA)
Basic Principles of Design of Experiments
Some Experimental Designs


Slide Content

An Overview of Design of Experiments
Dr. Keerti Jain, 6/5/2019


Experiments
Experiments involve manipulating one or more independent variables and observing the effect on some outcome (the dependent variable). Experiments can be done in the field or in a laboratory.

A Quick History of Design of Experiments
The agricultural origins, 1918 – 1940s: R. A. Fisher and his co-workers; profound impact on agricultural science; factorial designs, ANOVA.
The first industrial era, 1951 – late 1970s: Box and Wilson, response surfaces; applications in the chemical and process industries.

A Quick History of Design of Experiments (Contd.)
The second industrial era, late 1970s – 1990: quality improvement initiatives in many companies; Total Quality Management (TQM) became an important idea and a management goal; Taguchi and robust parameter design, process robustness.
The modern era, beginning 1990: Six Sigma, Lean Six Sigma; clinical trials, mathematical biology; algorithm design and analysis, networking, group testing and cryptography.

Why We Use Experimental Designs
"All experiments are designed experiments; it is just that some are poorly designed and some are well designed."
Experimental designs are used so that treatments are assigned to experimental units in an organized manner that allows a valid statistical analysis of the resulting data.

What Is Design of Experiments
It is the logical planning (or construction) of an experiment: a complete sequence of steps, fixed ahead of time, that ensures the appropriate data will be obtained in a way that permits an objective analysis of a particular problem, leading to valid and precise inferences in the most economical and useful form.

Subject Matter of Design of Experiments
It includes:
Planning the experiment
Obtaining data from it
Carrying out a statistical analysis of the data obtained

How Design of Experiments Contributes
Reduce time to design and develop new products and processes
Improve performance of existing processes
Improve reliability and performance of products
Achieve product and process robustness
Evaluate materials and design alternatives, set component and system tolerances, etc.


Terminology
Control group: a group that takes part in the experiment but is not exposed to the treatment. Its performance serves as a baseline.
Treatment group: the group in an experiment that receives the specified treatment.
Factor: an independent variable whose effect on the response is studied. The term is typically used when an experiment involves more than one such variable.
Level: the degree or intensity of a factor.
Randomness: the property of purely chance events, which cannot be predicted.
Replication: the repetition of a treatment under consideration, i.e. applying it to more than one experimental unit.
Blocks: groups of experimental units (subjects) that are similar to one another; treatments are compared within each block.

Experimental Error
Experimental error is the variation in the responses among experimental units that are assigned the same treatment and observed under the same experimental conditions. It is measured by SSE (or MSE).
Ideally, experimental error would be zero. This is impossible for one or more of the following reasons:
There are inherent differences among the experimental units before they receive treatments.
There is variation in the devices that record the measurements.
There is variation in applying or setting the treatments.
Extraneous factors other than the treatments affect the response.
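The definition of experimental error can be made concrete with a minimal Python sketch (standard library only). It computes SSE directly as the sum of squared deviations of each response from its own treatment mean, then divides by the error degrees of freedom (total observations minus number of treatments) to obtain MSE. The function name `error_sum_of_squares` is a hypothetical helper, and the data are the repair counts from the computer example used for the CRD.

```python
from statistics import mean

def error_sum_of_squares(groups):
    """Return (SSE, error df, MSE) for a dict mapping treatment -> responses.

    SSE only measures variation among units that received the SAME treatment,
    so each response is compared with its own treatment mean.
    """
    sse = 0.0
    for responses in groups.values():
        m = mean(responses)
        sse += sum((x - m) ** 2 for x in responses)
    n_total = sum(len(r) for r in groups.values())
    df_error = n_total - len(groups)     # N minus number of treatments
    return sse, df_error, sse / df_error

# Repair counts for makes A, B and C (computer example in these slides).
repairs = {"A": [5, 6, 8, 9, 7], "B": [8, 10, 11, 12, 4], "C": [7, 3, 5, 4, 1]}
sse, df_e, mse = error_sum_of_squares(repairs)
print(sse, df_e, round(mse, 2))
```

This definitional route should agree with the SSE of 70 and MSE of 5.83 that the correction-factor shortcut gives for the same data.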

Analysis of Variance (ANOVA)
This statistical technique was first developed by R. A. Fisher and was used extensively in agricultural experiments. It is mainly used to compare the means of three or more groups, by comparing the variation between the group means with the variation within each group. ANOVA partitions the total variation into the contributions of each factor (source of variation) and of experimental error.

One-way ANOVA table (n_t = number of treatments, n_T = total number of observations)
Source of variation   SS     df                    MS                     F
Treatment             SS_t   df_t = n_t - 1        MS_TR = SS_t / df_t    MS_TR / MS_E
Error                 SS_e   df_e = df_T - df_t    MS_E = SS_e / df_e
Total                 SS_T   df_T = n_T - 1

The Steps in Designing an Experiment
Step 1: Identify the problem or claim to be studied. The statement of the problem needs to be as specific as possible; it must "identify the response variable and the population to be studied".
Step 2: Determine the factors affecting the response variable. This is best done by an expert in the field, but for most of the examples considered here we will be able to do it ourselves.

The Steps in Designing an Experiment (Contd.)
Step 3: Determine the number of experimental units. In general, more experimental units are better. Unfortunately, time and money are always limiting factors, so we have to decide on an appropriate number.

The Steps in Designing an Experiment (Contd.)
Step 4: Determine the level(s) of each factor. We split factors into three categories:
Control: if possible, we fix the levels of factors that we are not interested in.
Manipulate: this is the treatment; we manipulate the levels of the variable that we think will affect the response variable.
Randomize: often there are factors we simply cannot control. To mitigate their effect on the data, we randomize the groups. When experimental units are assigned at random, these factors should be spread roughly equally among all groups.

The Steps in Designing an Experiment (Contd.)
Step 5: Conduct the experiment.
Step 6: Test the claim.
Step 7: Interpret the results.

Basic Principles of Design of Experiments
Randomization
Replication
Local control (blocking)


Some Experimental Designs
Completely Randomized Design (CRD)
Randomized Block Design (RBD)
Latin Square Design (LSD)
Factorial Designs
Balanced Incomplete Block Design (BIBD)
Nested Balanced Incomplete Block Designs (NBIBD)
Balanced Incomplete Block Design with Nested Rows and Columns


Completely Randomized Design (CRD)
The completely randomized design is the simplest design: the treatments are assigned to the experimental units completely at random, so every experimental unit has an equal probability of receiving any given treatment. In a CRD, any difference among experimental units receiving the same treatment is regarded as experimental error.

CRD is the simplest design to use.
CRD is appropriate only for experiments with homogeneous experimental units, such as laboratory experiments, where environmental effects are relatively easy to control.
The CRD is best suited for experiments with a small number of treatments.
For field experiments, where there is generally large variation among experimental plots in environmental factors such as soil, the CRD is rarely used.
Every experimental unit has the same probability of receiving any treatment.
Treatments are assigned to experimental units completely at random, using a random number table, a computer program, etc.
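As a sketch of the "computer program" route for completely random assignment, the Python function below builds a list with each treatment label repeated `reps` times and shuffles it with the standard `random` module, so every unit is equally likely to receive any treatment. `crd_assignment` is a hypothetical helper, it assumes equal replication, and the seed only makes the layout reproducible.

```python
import random
from collections import Counter

def crd_assignment(units, treatments, reps, seed=None):
    """Assign each treatment to `reps` units completely at random (CRD).

    Assumes equal replication: len(units) must equal len(treatments) * reps.
    """
    if len(units) != len(treatments) * reps:
        raise ValueError("need exactly len(treatments) * reps experimental units")
    labels = [trt for trt in treatments for _ in range(reps)]  # A x reps, B x reps, ...
    rng = random.Random(seed)      # seeded only so the layout can be reproduced
    rng.shuffle(labels)            # uniformly random permutation of the labels
    return dict(zip(units, labels))

# 15 units, 3 treatments, 5 replicates each (same shape as the computer example).
plan = crd_assignment(units=list(range(1, 16)), treatments=["A", "B", "C"], reps=5, seed=2019)
print(Counter(plan.values()))      # each treatment is used exactly 5 times
```

Shuffling a fixed list of labels, rather than drawing a treatment independently for each unit, keeps the replication exactly equal while still making every assignment equally likely.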

Example (CRD)
To determine whether there is a significant difference in the durability of 3 makes of computers, samples of size 5 are selected from each make and the number of repairs during the first year is observed. The results are as follows:
A    B    C
5    8    7
6   10    3
8   11    5
9   12    4
7    4    1


Hypothesis
H_0: The three makes of computers do not differ significantly in durability.
H_1: At least one of the makes of computers differs significantly in durability.

Table for calculation
Make    X_ij              T_i   n_i   T_i^2   T_i^2/n_i   ΣX_ij^2
A       5  6  8  9  7      35     5    1225         245       255
B       8 10 11 12  4      45     5    2025         405       445
C       7  3  5  4  1      20     5     400          80       100
Total                     100    15    3650         730       800

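With G the grand total and N = 15 the total number of observations:
$$CF = \frac{G^2}{N} = \frac{\left(\sum_i T_i\right)^2}{N} = \frac{100^2}{15} = 666.67$$
$$SS_T = \sum_i \sum_j X_{ij}^2 - CF = 800 - 666.67 = 133.33$$
$$SS_M = \sum_i \frac{T_i^2}{n_i} - CF = 730 - 666.67 = 63.33$$
$$SS_E = SS_T - SS_M = 133.33 - 63.33 = 70$$
Null hypothesis H_0: the 3 makes of computers do not differ in durability.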

ANOVA Table
Source of variation   SS       df   MS      F
Between makes          63.33    2   31.67   31.67 / 5.83 = 5.43
Within makes           70.00   12    5.83
Total                 133.33   14
From F tables, F_5%(v_1 = 2, v_2 = 12) = 3.89. Since F = 5.43 > 3.89, the null hypothesis is rejected: there is a significant difference between the makes of computers.
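The correction-factor arithmetic of the CRD example can be checked with a small Python sketch. `one_way_anova` is a hypothetical helper that applies the same shortcut formulas (CF, SS_T, SS_M, SS_E). The 5% critical value is typed in from F tables rather than computed, since the Python standard library has no F distribution.

```python
def one_way_anova(groups):
    """One-way (CRD) ANOVA using the correction-factor shortcut."""
    all_obs = [x for g in groups.values() for x in g]
    n_total = len(all_obs)
    cf = sum(all_obs) ** 2 / n_total                                  # CF = G^2 / N
    ss_total = sum(x * x for x in all_obs) - cf                       # SS_T
    ss_treat = sum(sum(g) ** 2 / len(g) for g in groups.values()) - cf  # SS_M
    ss_error = ss_total - ss_treat                                    # SS_E
    df_treat, df_error = len(groups) - 1, n_total - len(groups)
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {"SS_treat": ss_treat, "SS_error": ss_error, "SS_total": ss_total,
            "df": (df_treat, df_error), "F": ms_treat / ms_error}

repairs = {"A": [5, 6, 8, 9, 7], "B": [8, 10, 11, 12, 4], "C": [7, 3, 5, 4, 1]}
result = one_way_anova(repairs)
F_CRIT_5PCT = 3.89   # F(2, 12) at 5%, read from F tables
print(round(result["F"], 2), "reject H0" if result["F"] > F_CRIT_5PCT else "retain H0")
```

The F ratio should come out at about 5.43, matching the table and leading to the same decision.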

Advantages of CRD
Very flexible design: the number of treatments and replicates is limited only by the number of available experimental units.
Statistical analysis is simple compared with other designs.
Loss of information due to missing data is small compared with other designs, because the error source of variation has a larger number of degrees of freedom.
Provides the maximum number of degrees of freedom for error.

Disadvantages of CRD
If the experimental units are not homogeneous and this variation is not reduced by blocking, there may be a loss of precision.
Usually the least efficient design unless the experimental units are homogeneous.
Not suited for a large number of treatments.


Randomized Block Design (RBD)
Any experimental design in which the randomization of treatments is restricted to groups of experimental units, within predefined blocks of units assumed to be internally homogeneous, is called a randomized block design.
The experimental units are divided into n homogeneous groups of equal or unequal sizes; these homogeneous groups are called blocks.
The treatments are then assigned at random to the experimental units within each block, one treatment to a unit in each block.

A randomized block experiment is treated as a two-factor experiment; the factors are blocks and treatments.
The experimental units within each block are uniform.
There is one observation per cell.
It is assumed that there is no interaction between blocks and treatments; the degrees of freedom for the interaction are used to estimate error.
Treatments are randomly assigned to the experimental units within each block.
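The restricted randomization of an RBD differs from a CRD in one respect: the shuffle is done separately inside each block, so every block receives every treatment exactly once. A minimal Python sketch, where `rbd_layout` and the block names are hypothetical:

```python
import random

def rbd_layout(blocks, treatments, seed=None):
    """Randomly order all treatments, independently within each block (RBD)."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)          # a fresh, independent permutation per block
        layout[block] = order       # unit i of this block receives order[i]
    return layout

plan = rbd_layout(["Block 1", "Block 2", "Block 3", "Block 4"], ["A", "B", "C", "D"], seed=42)
for block, order in plan.items():
    print(block, order)
```

Because each block is a complete set of treatments, block-to-block differences can later be removed as a separate source of variation in the ANOVA.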

ANOVA table for RBD (n_b = number of blocks, n_t = number of treatments, n_T = total number of observations)
Source of variation   SS     df                           MS                      F
Blocks                SS_b   df_b = n_b - 1               MS_B = SS_b / df_b      MS_B / MS_Err
Treatment             SS_t   df_t = n_t - 1               MS_TR = SS_t / df_t     MS_TR / MS_Err
Error                 SS_e   df_e = df_T - df_b - df_t    MS_Err = SS_e / df_e
Total                 SS_T   df_T = n_T - 1

Example (RBD)
Four doctors each test 4 treatments for a certain disease and observe the number of days each patient takes to recover. The results are:
               Treatment
Doctor     1    2    3    4
A         10   14   19   20
B         11   15   17   21
C          9   12   16   19
D          8   13   17   20

Hypothesis (two-way analysis)
H_0A: There is no significant difference between the doctors.
H_1A: At least one of the doctors is significantly different.
H_0B: There is no significant difference between the treatments.
H_1B: At least one of the treatments is significantly different.

Table for calculations (k = 4 treatments per doctor, h = 4 doctors per treatment)
Doctor         1      2        3      4     T_i    k    T_i^2/k    ΣX_ij^2
A             10     14       19     20      63    4     992.25       1057
B             11     15       17     21      64    4    1024.00       1076
C              9     12       16     19      56    4     784.00        842
D              8     13       17     20      58    4     841.00        922
T_j           38     54       69     80     241   16    Σ = 3641.25   Σ = 3897
T_j^2/h      361    729  1190.25   1600    Σ = 3880.25
ΣX_ij^2      366    734     1195   1602    Σ = 3897

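With G the grand total and N = 16 the total number of observations:
$$CF = \frac{G^2}{N} = \frac{241^2}{16} = 3630.06$$
$$SS_{Total} = \sum_i \sum_j X_{ij}^2 - CF = 3897 - 3630.06 = 266.94$$
$$SS_D = \sum_i \frac{T_i^2}{k} - CF = 3641.25 - 3630.06 = 11.19$$
$$SS_t = \sum_j \frac{T_j^2}{h} - CF = 3880.25 - 3630.06 = 250.19$$
$$SS_e = SS_{Total} - SS_D - SS_t = 266.94 - 11.19 - 250.19 = 5.56$$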

ANOVA Table
Source of variation   SS       df   MS      F
Doctors                11.19    3    3.73    3.73 / 0.62 = 6.02
Treatments            250.19    3   83.40   83.40 / 0.62 = 134.52
Error                   5.56    9    0.62
Total                 266.94   15
From F tables, F_5%(v_1 = 3, v_2 = 9) = 3.86. Both F values exceed 3.86: the difference between the doctors is significant, and the difference between the treatments is highly significant.
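The two-way calculation for the doctors example can be reproduced with the Python sketch below. `rbd_anova` is a hypothetical helper; rows are doctors (blocks) and columns are treatments. Because it keeps the mean squares unrounded, it gives F of about 6.03 for doctors and about 134.93 for treatments; the table's 6.02 and 134.52 come from rounding MS_Err to 0.62, and the conclusions are the same.

```python
def rbd_anova(table):
    """Two-way ANOVA for an RBD with one observation per cell.

    table[i][j] is the response for row i (block, here a doctor)
    and column j (treatment).
    """
    h, k = len(table), len(table[0])             # h rows (doctors), k columns (treatments)
    cf = sum(map(sum, table)) ** 2 / (h * k)     # CF = G^2 / N
    ss_total = sum(x * x for row in table for x in row) - cf
    ss_rows = sum(sum(row) ** 2 for row in table) / k - cf            # sum T_i^2 / k
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    ss_cols = sum(t * t for t in col_totals) / h - cf                 # sum T_j^2 / h
    ss_error = ss_total - ss_rows - ss_cols
    ms_error = ss_error / ((h - 1) * (k - 1))    # error df = (h - 1)(k - 1) = 9 here
    return {"SS_rows": ss_rows, "SS_cols": ss_cols, "SS_error": ss_error,
            "F_rows": ss_rows / (h - 1) / ms_error,
            "F_cols": ss_cols / (k - 1) / ms_error}

recovery_days = [[10, 14, 19, 20],   # doctor A, treatments 1-4
                 [11, 15, 17, 21],   # doctor B
                 [9, 12, 16, 19],    # doctor C
                 [8, 13, 17, 20]]    # doctor D
F_CRIT_5PCT = 3.86                   # F(3, 9) at 5%, read from F tables
result = rbd_anova(recovery_days)
print(round(result["F_rows"], 2), round(result["F_cols"], 2))
```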

Advantages of RBD
Complete flexibility: any number of treatments and blocks can be used.
Provides more accurate results than the completely randomized design, owing to the grouping.
Statistical analysis is relatively easy, even with missing data.
Some treatments may be replicated more times than others.
Whole treatments or entire replicates may be deleted from the analysis.

Disadvantages of RBD
Not suitable for large numbers of treatments, because blocks become too large and the experimental units within a block may become heterogeneous.
Interactions between block and treatment effects increase error.
There is a serious problem with the analysis if a block-by-treatment interaction actually exists and no replication within blocks has been included (solution: use replication within blocks when possible).


Latin Square Design (LSD)
A Latin square is a square array of objects (letters A, B, C, …) such that each object appears once and only once in each row and in each column. Example: a 4 x 4 Latin square.
A B C D
B C D A
C D A B
D A B C
The Latin square design is intended for situations in which there are two extraneous sources of variation. If the rows and columns of the square are thought of as the levels of the two extraneous variables, then in a Latin square each treatment appears exactly once in each row and each column. With the Latin square design we can control variation in two directions.
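The 4 x 4 example is a cyclic square: each row is the previous row shifted one place to the left. The Python sketch below builds such a square for any list of symbols and checks the defining property (each symbol exactly once per row and per column). Both function names are hypothetical; in practice the rows, columns and treatment labels of a square are usually also randomized before use, which this sketch does not do.

```python
def cyclic_latin_square(symbols):
    """Row i is the symbol sequence shifted left by i positions."""
    t = len(symbols)
    return [[symbols[(i + j) % t] for j in range(t)] for i in range(t)]

def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    t = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(t)} == symbols for j in range(t))
    return len(symbols) == t and rows_ok and cols_ok

sq = cyclic_latin_square("ABCD")
for row in sq:
    print(" ".join(row))
print(is_latin_square(sq))
```

The printed rows should match the 4 x 4 example above.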

Characteristics of LSD
In LSD there are three factors: treatments, rows and columns.
The number of treatments = the number of rows = the number of columns = t (say).
The row-column combinations are represented by the cells of a t x t array.
The treatments are assigned to row-column combinations using a Latin-square arrangement, so each row contains every treatment and each column contains every treatment.
Every treatment occurs exactly once in each row and each column.

ANOVA table for LSD (n_t, n_r, n_c = numbers of treatments, rows and columns; n_T = total number of observations)
Source of variation   SS     df                                  MS                       F
Treatment             SS_t   df_t = n_t - 1                      MS_TR = SS_t / df_t      MS_TR / MS_Err
Rows                  SS_r   df_r = n_r - 1                      MS_Row = SS_r / df_r     MS_Row / MS_Err
Columns               SS_c   df_c = n_c - 1                      MS_Col = SS_c / df_c     MS_Col / MS_Err
Error                 SS_e   df_e = df_T - df_t - df_r - df_c    MS_Err = SS_e / df_e
Total                 SS_T   df_T = n_T - 1

Example (LSD)
The following data resulted from an experiment to compare three burners, B1, B2 and B3. A Latin square design was used because the tests were made on 3 engines and were spread over 3 days.
          Engine 1   Engine 2   Engine 3
Day 1     B1: 16     B2: 17     B3: 20
Day 2     B2: 16     B3: 21     B1: 15
Day 3     B3: 15     B1: 12     B2: 13

Hypothesis
H_0A: There is no significant difference between the burners.
H_1A: At least one of the burners is significantly different.
H_0B: There is no significant difference between the days.
H_1B: At least one of the days is significantly different.
H_0C: There is no significant difference between the engines.
H_1C: At least one of the engines is significantly different.

Table for calculations (n = 3, the number of observations in each row, column and burner)
           E1        E2        E3        T_i    T_i^2/n       ΣX_ij^2
Day 1      16 (B1)   17 (B2)   20 (B3)    53      936.33          945
Day 2      16 (B2)   21 (B3)   15 (B1)    52      901.33          922
Day 3      15 (B3)   12 (B1)   13 (B2)    40      533.33          538
T_j        47        50        48        145    Σ = 2371.00   Σ = 2405
T_j^2/n    736.33    833.33    768.00    Σ = 2337.67
ΣX_ij^2    737       874       794       Σ = 2405

Rearranging the data values according to the burners:
Burner   X_k          T_k   T_k^2/n
B1       16 15 12      43     616.33
B2       17 16 13      46     705.33
B3       20 21 15      56    1045.33
                      Σ =    2367.00

With G the grand total and N = 9 the total number of observations:
$$CF = \frac{G^2}{N} = \frac{145^2}{9} = 2336.11$$
$$SS_{Total} = \sum_i \sum_j X_{ij}^2 - CF = 2405 - 2336.11 = 68.89$$
$$SS_{Days} = \sum_i \frac{T_i^2}{n} - CF = 2371.00 - 2336.11 = 34.89$$
$$SS_{Engines} = \sum_j \frac{T_j^2}{n} - CF = 2337.67 - 2336.11 = 1.56$$
$$SS_{Burners} = \sum_k \frac{T_k^2}{n} - CF = 2367.00 - 2336.11 = 30.89$$
$$SS_E = SS_{Total} - SS_{Days} - SS_{Engines} - SS_{Burners} \approx 1.56$$
(SS_E is computed from unrounded values; subtracting the rounded figures gives 1.55.)

ANOVA Table
Source of variation   SS      df   MS      F
Days                  34.89    2   17.44   22.43
Engines                1.56    2    0.78    1.00
Burners               30.89    2   15.44   19.86
Error                  1.56    2    0.78
Total                 68.89    8
F = MS / MS_Error, computed from unrounded mean squares.

From F tables, F_5%(v_1 = 2, v_2 = 2) = 19.00.
Burners: F = 19.86 > 19.00, so there is a significant difference between the burners.
Days: F = 22.43 > 19.00, so the difference between the days is significant.
Engines: F = 1.00 < 19.00, so the difference between the engines is not significant.
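The Latin square calculation for the burner data can be reproduced with a Python sketch. `latin_square_anova` is a hypothetical helper that takes each cell as a (treatment, response) pair, so the burner totals T_k can be collected while the row (day) and column (engine) totals come from the grid itself. Working from unrounded sums of squares, it gives F of about 22.43 for days, 1.00 for engines and about 19.86 for burners; the 5% critical value 19.00 is typed in from F tables.

```python
def latin_square_anova(layout):
    """ANOVA for a t x t Latin square: rows, columns and treatments.

    layout[i][j] is a (treatment, response) pair for row i, column j.
    """
    t = len(layout)
    y = [[resp for _, resp in row] for row in layout]          # responses only
    cf = sum(map(sum, y)) ** 2 / (t * t)                        # CF = G^2 / N
    ss_total = sum(v * v for row in y for v in row) - cf
    ss_rows = sum(sum(row) ** 2 for row in y) / t - cf          # rows (days)
    ss_cols = sum(sum(y[i][j] for i in range(t)) ** 2 for j in range(t)) / t - cf
    trt_totals = {}                                             # T_k per treatment
    for row in layout:
        for trt, resp in row:
            trt_totals[trt] = trt_totals.get(trt, 0) + resp
    ss_trt = sum(v ** 2 for v in trt_totals.values()) / t - cf
    ss_error = ss_total - ss_rows - ss_cols - ss_trt
    df = t - 1                                                  # same for rows, columns, treatments
    ms_error = ss_error / ((t - 1) * (t - 2))                   # error df = (t - 1)(t - 2)
    return {"F_rows": ss_rows / df / ms_error,
            "F_cols": ss_cols / df / ms_error,
            "F_trt": ss_trt / df / ms_error}

burners = [
    [("B1", 16), ("B2", 17), ("B3", 20)],   # Day 1, engines 1-3
    [("B2", 16), ("B3", 21), ("B1", 15)],   # Day 2
    [("B3", 15), ("B1", 12), ("B2", 13)],   # Day 3
]
F_CRIT_5PCT = 19.00   # F(2, 2) at 5%, read from F tables
res = latin_square_anova(burners)
for name, f in res.items():
    print(name, round(f, 2), "significant" if f > F_CRIT_5PCT else "not significant")
```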

Advantages of LSD
Variation can be controlled in two directions, so when both sources of variation are present the LSD is more efficient than the CRD and RBD.
Being a 3-way design, it is economical compared with the corresponding complete 3-way design: instead of t^3 experimental units, only t^2 experimental units are sufficient.
The analysis remains relatively simple even with missing data.

Disadvantages of LSD
The number of treatments is limited to the number of replicates, which seldom exceeds 10.
With fewer than 5 treatments, the df used for controlling random variation (rows and columns) is relatively large and the df for error is small.
The number of treatments must equal the number of replicates.
The experimental error is likely to increase with the size of the square.
Interactions between rows and columns, rows and treatments, and columns and treatments cannot be evaluated separately.
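The point about small squares follows from the degrees of freedom. In a t x t square the total df is t^2 - 1; rows, columns and treatments each take t - 1, leaving (t - 1)(t - 2) for error. A short Python sketch (the function name is hypothetical) tabulates this for a few sizes:

```python
def lsd_degrees_of_freedom(t):
    """Return (units, df used by rows + columns, df for error) for a t x t Latin square."""
    total_df = t * t - 1
    df_rows_cols = 2 * (t - 1)          # rows and columns, t - 1 each
    df_treat = t - 1
    df_error = total_df - df_rows_cols - df_treat   # simplifies to (t - 1)(t - 2)
    return t * t, df_rows_cols, df_error

for t in range(3, 9):
    units, df_rc, df_e = lsd_degrees_of_freedom(t)
    print(f"t={t}: {units} units, rows+columns df={df_rc}, error df={df_e}")
```

For t = 3 the rows and columns use 4 df while only 2 remain for error (as in the burner example), and for t = 4 the two are equal at 6; only from t = 5 onward does error df clearly dominate.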

Factorial Experiment
Factorial designs include two or more factors, each having more than one level or treatment. Participants are typically randomized to a combination that includes one treatment or level from each factor.
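The treatment combinations of a full factorial design are the Cartesian product of the factors' levels, which Python's `itertools.product` enumerates directly. The factor names and levels below are hypothetical, chosen only for illustration; randomizing participants to these combinations would then proceed as in a CRD.

```python
from itertools import product

def treatment_combinations(factors):
    """Full factorial: every combination taking one level from each factor."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

# Hypothetical factors and levels, for illustration only.
factors = {"temperature": ["low", "high"], "catalyst": ["X", "Y", "Z"]}
runs = treatment_combinations(factors)
print(len(runs))            # 2 levels x 3 levels = 6 combinations
for run in runs:
    print(run)
```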

Balanced Incomplete Block Designs (BIBD)
Used when the number of treatments exceeds the number of units per block (or logistics do not allow all treatments to be assigned to all blocks).
Number of treatments: v
Number of blocks: b
Replicates per treatment: r < b
Block size: k < v
Total number of units: N = kb = rv
Every pair of treatments appears together in $\lambda = \frac{r(k-1)}{v-1}$ blocks, where λ must be an integer.
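The BIBD conditions can be checked mechanically: all blocks of size k, every treatment replicated r times, λ = r(k - 1)/(v - 1) an integer, and every pair of treatments together in exactly λ blocks. The Python sketch below (`check_bibd` is a hypothetical helper) applies these checks to a classical design with v = 7, b = 7, r = 3, k = 3 and λ = 1, the design derived from the Fano plane.

```python
from collections import Counter
from itertools import combinations

def check_bibd(v, blocks):
    """Return (b, k, r, lam) if `blocks` form a BIBD on treatments 0..v-1.

    Raises ValueError if any BIBD condition fails.
    """
    b, k = len(blocks), len(blocks[0])
    if any(len(block) != k for block in blocks):
        raise ValueError("all blocks must have the same size k")
    reps = Counter(trt for block in blocks for trt in block)
    r = reps[0]
    if len(reps) != v or any(count != r for count in reps.values()):
        raise ValueError("every treatment must appear in exactly r blocks")
    if (r * (k - 1)) % (v - 1) != 0:
        raise ValueError("lambda = r(k-1)/(v-1) is not an integer")
    lam = r * (k - 1) // (v - 1)
    pairs = Counter(pair for block in blocks for pair in combinations(sorted(block), 2))
    if len(pairs) != v * (v - 1) // 2 or any(count != lam for count in pairs.values()):
        raise ValueError("not every pair of treatments occurs together lambda times")
    # N = kb = rv holds automatically once the checks above pass.
    return b, k, r, lam

fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
print(check_bibd(7, fano))   # (7, 3, 3, 1)
```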

Nested Designs
In certain multifactor experiments, the levels of one factor are similar but not identical for different levels of another factor (that is, each level of the first factor is unique to a particular level of the other). This is called a hierarchical or nested design. http://jrss.in/data/5I12.pdf
