Learning Objectives
By the end of this lesson, you will be able to:
Conduct regression analysis
Use scatter plots and correlation coefficients
Generate and use regression equations
Correlation
It is defined as the process of establishing a relationship or connection between two
or more items.
It is a statistical measure that indicates the degree of linear association between two variables.
Correlation
It is a measure of the strength of association between two quantitative variables.
For example, correlation shows the relationship between Pressure and Yield.
Correlation: Scatter Plots
Correlation can be visually represented in the form of a graph.
Scatter plots help you see patterns in data.
Scatter Plots
Scatter plots can be used to:
• Create or refine hypotheses
• Support or refute theories about the data
• Predict the effects of various causes
Scatter Plots
To use a scatter plot, you must have the measures of two factors for a single item.
[Scatter plot: Deliveries on the X axis (suspected influence) vs. Wait time on the Y axis (result), plotting the paired data]
Scatter Plots
Two data sets are said to be paired when each observation in one set corresponds to an observation in the other; plotting paired data on a graph reveals whether a correlation exists.
[Scatter plot: Diameter of the tree (X axis) vs. Height of the tree (Y axis)]
Correlation vs. Causation
Correlation does not imply a cause-and-effect relationship.
[Example charts: Stock price vs. No. of years; Average life expectancy vs. No. of decades]
Correlation vs. Causation
Correlation is often used to infer causation; however, it is not a sufficient condition for doing so:
• Causal analysis is the field of statistics that pertains to the establishment of cause and effect.
• Correlation does not always suggest causation.
Correlation vs. Causation
The possible relationships that can exist for any two correlated events A and B include:
• A causes B (direct causation)
• B causes A (reverse causation)
• A and B are both caused by C
• A causes B and B causes A (bidirectional or cyclic causation)
• There is zero or no connection between A and B, and the correlation is purely coincidental
No definite conclusion is possible regarding the cause-and-effect relationship between A and B based only on a correlation between A and B.
Correlation vs. Causation
Determining whether there is an actual cause-and-effect relationship may need additional investigation.
A correlation alone does not conclusively prove that there is a cause-and-effect relationship between the two variables, A and B.
Correlation vs. Causation
When is it correct to infer causation?
Where there is a causation, there is a correlation. However,
a correlation does not imply a causation.
Causation also implies a sequence in time from cause to effect along
with some common and intermediate causes, if applicable.
Correlation may often be used when inferring causation because it is
a necessary condition but not a sufficient condition.
Input, Process, and Output Context
Input (X) → Process (X) → Output (Y)
X Axis – Independent Variable (predictor measures):
• Arrival Time
• Accuracy
• Cost
• Key Specs
• Time Per Task
• In-Process Errors
• Labor Hours
• No. of Exceptions
Y Axis – Dependent Variable (result measures):
• Customer Satisfaction
• Total Defects
• Cycle Time
• Cost
• Profit
Predictor measures focus on the inputs and the process used.
Scatter Plots: Types
Scatter plots can be used to develop or verify hypotheses.
Various types of correlations can be interpreted through the patterns displayed on scatter plots.
Scatter Plots: Types
• Positive: a relationship between two variables where both variables move in the same direction.
• Negative: a relationship between two variables where an increase in one variable invariably leads to a decrease in the other variable.
• No correlation: there is no relationship between the two variables.
• Curvilinear: a relationship between two variables where, as one variable increases, the other variable also increases up to a certain point, after which, as one variable continues to increase, the other variable decreases.
Positive Correlation: Example
The more time you spend on a treadmill, the more calories you burn.
The more petrol you fill your car with, the more distance the car can travel.
Negative Correlation: Example
The higher you go above sea level, the lower the temperature.
Zero or No Correlation: Example
There is no relationship between the amount of
coffee people drink and their intelligence.
The price of a pencil may or may not increase with
an increase in the price of milk.
Curvilinear Correlation: Example
The more cheerful the service staff is, the greater the customer satisfaction, but only up to a certain point, after which it may not increase further.
Correlation vs. Causation
If a causal relationship exists, then there is a correlation; but if there is a correlation, it does not mean there is a causal relationship.
[Example chart: Assets vs. Intelligence]
Correlation: Coefficients
Correlation coefficients are used to indicate the strength of the correlation between two factors.
The Pearson's correlation coefficient is the most common measure of correlation and is represented by the letter r.
The r value indicates the strength and the direction of the relationship between two variables and always lies between −1 and +1.
An r value below −0.65 or above +0.65 is considered a meaningful correlation.
An r value of +1 indicates a perfect positive correlation, and an r value of −1 indicates a perfect negative correlation.
[Example scatter plots: r = 1, r = −1, r = −.8, r = 0]
Correlation Coefficients: Interpretation
An r value of +1 indicates a perfect positive correlation; an r value of −1 indicates a perfect negative correlation.
An r value of −.8 indicates a strong negative correlation.
An r value of 0 indicates that there is no linear correlation.
Correlation Coefficients: Interpretation
The r value can be used to calculate r-square (r²), which represents the coefficient of determination. The coefficient of determination:
• Makes a precise inference of a correlation
• Tests the hypothesis
• Determines the degree of influence of one factor over another
• Helps understand whether there are any other factors that may affect an outcome
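To make the idea concrete, here is a minimal sketch of computing r and r² with NumPy; the pressure and yield numbers are made-up illustration values, not data from the lesson.

```python
# A minimal sketch: Pearson's r and the coefficient of determination (r²).
import numpy as np

pressure = np.array([50, 55, 60, 65, 70, 75, 80])          # hypothetical data
yield_ = np.array([12.1, 13.0, 14.2, 14.8, 16.1, 16.9, 18.0])

r = np.corrcoef(pressure, yield_)[0, 1]  # Pearson correlation coefficient
r_square = r ** 2                        # coefficient of determination

print(f"r = {r:.3f}")        # strength and direction of the linear relationship
print(f"r^2 = {r_square:.3f}")  # share of variation in yield explained by pressure
```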
Regression Analysis
In statistical modeling, regression analysis is defined as a set of statistical processes that are used for
estimating the relationships between a dependent variable and one or more independent variables.
[Scatter plot with a fitted regression line]
It is a tool that uses data on relevant variables to develop a prediction equation or a prediction model.
It is used in conjunction with correlation and scatter plots to predict future performance from past results.
Regression Analysis: Example
Regression analysis can be used to quantify how height is impacted
by factors such as:
•Age
•Gender
•Environment
•Genetics
•Diet
Regression Analysis: Example
Correlation:
• Shows how much linearity exists between two variables
Regression:
• Defines the relationship more precisely
• Can be used when there is existing data over a defined range
Regression Analysis: Linear Equation
In simple linear regression, a single variable, X, is used to define or predict Y.
Simple Regression Equation: Y = β₀ + β₁X₁ + ε
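As an illustration, the simple regression equation can be fitted with SciPy's linregress; the deliveries and wait-time values below are hypothetical stand-ins for the lesson's data set.

```python
# A minimal sketch of fitting Y = b0 + b1*X by least squares.
import numpy as np
from scipy import stats

deliveries = np.array([5, 8, 10, 12, 15, 18, 20, 22])       # X, illustrative
wait_time = np.array([15.6, 18.9, 21.0, 23.1, 26.4, 29.7, 31.8, 34.0])  # Y

fit = stats.linregress(deliveries, wait_time)

print(f"Wait time = {fit.intercept:.2f} + {fit.slope:.3f} * deliveries")
print(f"R-Sq = {fit.rvalue**2:.1%}")  # comparable to a fitted line plot's R-Sq
```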
Regression Analysis: Residuals
[Fitted line plot: Wait time vs. Deliveries; S = 1.04987, R-Sq = 93.2%, R-Sq(adj) = 93.1%]
Regression equation or model: Wait time = 10.18 + 1.083 × deliveries
The plot marks each true observation, its fitted observation on the line, and the residual value between them.
•The difference between a data point and the
regression line is called a residual.
•A residual value is positive if the data point is
above the regression line.
•A residual value is negative if the data point is
below the regression line.
•The residual value will be zero if the regression
line passes through the point.
Regression Analysis: Residuals
Since there is a difference between a data point and the regression line, residuals are
sometimes called errors.
However, this does not mean there is a mistake within the regression equation. It is the
difference that cannot be explained by the regression equation.
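A small sketch of this idea, using the fitted line from the plot above (Wait time = 10.18 + 1.083 × deliveries) with made-up observations:

```python
# Residual = observed value - fitted value on the regression line.
import numpy as np

deliveries = np.array([10, 15, 20])          # illustrative X values
observed = np.array([22.0, 25.5, 32.5])      # illustrative true observations

fitted = 10.18 + 1.083 * deliveries          # fitted observations on the line
residuals = observed - fitted                # positive above the line, negative below

print(residuals)  # approx. [ 0.99, -0.925, 0.66 ]
```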
Regression Analysis: Residuals
Residuals or errors should be random in nature. They should not indicate a pattern.
[Residuals versus fits plot (response is wait time): residuals scattered randomly around zero]
Regression Analysis: Residuals
When you roll a die, you cannot predict the number that will appear next. If you start seeing a pattern, there is something wrong with the die.
Regression Analysis: Residuals
[Residuals versus fits plot (response is wait time): residuals forming a visible pattern]
When you see a pattern in residuals, it indicates a problem in the regression analysis and the equation.
Key Takeaways
Correlation coefficients are used to indicate the strength
of the correlation between two factors.
Correlation is defined as the process of establishing a
relationship or connection between two or more items.
The r value indicates the strength and the direction of the
relationship between two variables.
Scatter plots can be used to support or refute theories
about the data.
Key Takeaways
Correlation can be visually represented in the form of a graph.
Regression analysis is defined as a set of statistical processes
that are used for estimating the relationships between
dependent and independent variables.
The four types of correlation are:
1. Positive correlation
2. Negative correlation
3. Zero or no correlation
4. Curvilinear correlation
Multiple Regression Analysis
Learning Objectives
By the end of this lesson, you will be able to:
Conduct multi-vari analysis
Design a multi-vari sampling plan
Calculate the variance inflation factor (VIF)
Examine the effect of more than one independent variable on a dependent variable using multiple regression
Analyze data using the Box-Cox transformation
Multi-Vari Analysis
By applying the core concept of statistical process control, you monitor several variables at various intervals.
The assumption here is that each such variable has one value when measured. In reality, however, that is not the case.
Multi-Vari Analysis
Example: The temperature of a furnace at various sections varies significantly with the thickness.
• The variation in measurements is observed within a piece or an item
• The underlying source of this variation will vary from piece to piece
[Illustration: furnace]
Multi-Vari Analysis
The multi-vari chart is a powerful statistical tool that you can use to analyze and exhibit the three types of variations (a computational sketch follows the list):
Piece to piece
Within a piece
Time to time
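Here is a rough computational sketch of those three variation types, assuming a long-format table with hypothetical time, piece, and measurement columns (the lesson itself uses MINITAB's multi-vari chart):

```python
# A rough sketch: summarizing the three variation types a multi-vari
# chart displays. Column names and data are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "time":        [1, 1, 1, 1, 2, 2, 2, 2],
    "piece":       [1, 1, 2, 2, 3, 3, 4, 4],
    "measurement": [9.8, 10.4, 10.1, 10.9, 11.2, 11.9, 10.6, 11.0],
})

# Within-piece variation: spread of repeat readings on the same piece
within_piece = df.groupby("piece")["measurement"].agg(lambda s: s.max() - s.min())

# Piece-to-piece variation: spread of piece averages within each time slot
piece_means = df.groupby(["time", "piece"])["measurement"].mean()
piece_to_piece = piece_means.groupby("time").agg(lambda s: s.max() - s.min())

# Time-to-time variation: spread of the time-slot averages
time_means = df.groupby("time")["measurement"].mean()
time_to_time = time_means.max() - time_means.min()

print(within_piece, piece_to_piece, time_to_time, sep="\n")
```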
Multi-Vari Sampling Plan
1. Select the underlying process and the attributes to be analyzed
2. Select the right sample size and time interval
3. Create a table to record the time and measurements from each sample set
4. Plot the multi-vari chart using MINITAB
5. Analyze the chart for all three types of variations
Multi-Vari Sampling Plan
Conduct the multi-vari analysis again to confirm the results once the process improvements are made.
Nonlinear Regression
•It is one of the most important types of
regression models.
•It uses sample data for analysis and to
build and fit a model.
Nonlinear Regression
Simple linear regression works with the relationship of two variables in a straight line. It is represented by the equation:
y = ax + b
where x is the independent variable and y is the dependent variable.
Nonlinear regression looks at the relationship of the two variables in a nonlinear or curved relationship.
Nonlinear Regression
A nonlinear relationship equation, or regression model, aims at minimizing the value of the sum of the squares.
The sum of squares shows by how much the y observations differ from the values predicted by the nonlinear or curved function.
[Scatter plot with a fitted curve]
Nonlinear Regression
The smaller the sum of these squared values, the better the function fits and predicts y values in the data set.
• Aim to minimize the sum of squares
• Square each difference and add all the squared values
[Scatter plot: fitted curve with squared residual distances]
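As a sketch of this same least-squares idea, scipy.optimize.curve_fit chooses parameters that minimize the sum of squared differences; the quadratic model and data below are illustrative assumptions, not the lesson's example.

```python
# A minimal sketch of nonlinear (curved) regression by least squares.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    """A curved (quadratic) function to fit: y = a*x**2 + b*x + c."""
    return a * x**2 + b * x + c

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)          # illustrative
y = np.array([2.1, 4.9, 9.2, 15.8, 24.6, 35.5, 49.1, 63.8])  # illustrative

params, _ = curve_fit(model, x, y)        # least-squares parameter estimates
sse = np.sum((y - model(x, *params))**2)  # the minimized sum of squares

print("a, b, c =", params)
print("sum of squared residuals =", sse)
```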
Multiple Linear Regression
It helps in modeling the relationship between one dependent variable (y) and two or more
independent variables (x).
It fits the data set into a linear equation.
Simple vs. Multiple Linear Regression
Simple Linear Regression
You model the relationship
between one dependent
variable and one independent
variable or predictor
Multiple Linear Regression
You model the relationship
between one dependent variable
and two or more independent
variables or predictors
Multiple Linear Regression: Equation
Y = a₁X₁ + a₂X₂ + … + aₙXₙ + b + e
• Y = dependent variable or response
• X₁, X₂, …, Xₙ = independent variables or predictors; there are n predictors in the equation
• b = the intercept, that is, the value of Y when all Xs are zero
• a₁, a₂, …, aₙ = coefficients of the predictors
• e = error or the residual
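A minimal sketch of fitting this equation with the statsmodels OLS API; the two predictors and the response values are made-up illustration data.

```python
# A minimal sketch of multiple linear regression with statsmodels.
import numpy as np
import statsmodels.api as sm

X = np.column_stack([
    [81, 79, 85, 90, 77, 88, 83, 92],   # X1, e.g. trial one scores (hypothetical)
    [70, 72, 80, 86, 69, 84, 78, 90],   # X2, e.g. trial two scores (hypothetical)
])
y = np.array([150, 149, 163, 175, 145, 171, 160, 181])  # response (hypothetical)

X = sm.add_constant(X)        # adds the intercept term b
results = sm.OLS(y, X).fit()  # least-squares fit

print(results.params)   # b, a1, a2: intercept and predictor coefficients
print(results.pvalues)  # significance of each term
```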
Variance Inflation Factor (VIF)
It measures the multicollinearity of each individual independent variable in the model and quantifies its degree of severity.
To calculate a VIF, consider that you are building a multiple linear regression model using n independent variables or predictors. In this case, it takes two simple steps to calculate the VIF for X₁:
1. Build a linear regression model for X₁ by using the remaining predictors X₂, …, Xₙ as independent variables.
2. Use the R² computed by the linear model in step one to calculate the VIF for X₁: VIF = 1 / (1 − R²).
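A rough sketch of this two-step recipe with NumPy, using the standard formula VIF = 1 / (1 − R²); the predictor matrix is hypothetical.

```python
# A rough sketch: VIF for each predictor via the two-step recipe.
import numpy as np

def vif(X, j):
    """VIF for column j of predictor matrix X (rows = observations)."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    others = np.column_stack([np.ones(len(X)), others])  # add an intercept
    beta, *_ = np.linalg.lstsq(others, target, rcond=None)
    pred = others @ beta
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot          # step one: R^2 of X_j on the others
    return 1.0 / (1.0 - r2)           # step two: the VIF

X = np.array([[81, 70, 95], [79, 72, 90], [85, 80, 99],
              [90, 86, 104], [77, 69, 88], [88, 84, 101]], dtype=float)
print([round(vif(X, j), 2) for j in range(X.shape[1])])
```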
Variance Inflation Factor (VIF): Inferences
The standard guideline to understand and analyze the VIF:
VIF Value | Inference
VIF = 1 | No multicollinearity
1 < VIF < 5 | Small multicollinearity
5 ≤ VIF < 10 | Medium multicollinearity
VIF ≥ 10 | Large multicollinearity
Variance Inflation Factor (VIF): Inferences
There are different ways to deal with multicollinearity:
• Remove the variable with high multicollinearity and a high p-value
• Increase the sample size
• Change your sampling strategy and collect samples with a specific focus on some predictors
• Remove variables that are included more than once in the model
• Combine correlated variables and create a new variable
Variance Inflation Factor (VIF): Inferences
How do you remove variables with high VIF and p-values from the model for better predictability? Remove only one independent variable at a time.
• In this example, none of the three predictors has a VIF higher than 5.
• Trial 1 has the highest VIF value of 4.36, and all three have the same p-value.
• So, you don't need to remove any independent variable from the model.
R²(adj) = 97.97%: 97.97% of the variation in the final score can be explained by the predictor variables trial one, trial two, and trial three.
Variance Inflation Factor (VIF): Inferences
Here is the multiple regression equation along with an interpretation of the results.
The p-value of the F-test is 0.000, which means the model is statistically significant.
Variance Inflation Factor (VIF): Inferences
The p-values of all variables are less than 0.05.
• This indicates that all of them are significant.
• The VIFs for trial one, trial two, and trial three are less than five.
Variance Inflation Factor (VIF): Inferences
• 6.31 is the Y intercept
• 0.660 is the trial one coefficient; multiply it by the trial one score
• 0.6286 is the trial two coefficient; multiply it by the trial two score
• 0.633 is the trial three coefficient; multiply it by the trial three score
The final equation is: Final score = 6.31 + 0.660 (trial one) + 0.6286 (trial two) + 0.633 (trial three)
Variance Inflation Factor: Example
Your students are in the finals of the world athletics championship. As a coach, you want to understand and predict the final score your athlete might get based on the trials.
Assume that:
• Trial one = 81
• Trial two = 79
• Trial three = 105
Predicted final score = 6.31 + 0.660 (81) + 0.6286 (79) + 0.633 (105) = 175.8944
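A quick check of the arithmetic in Python, plugging the three trial scores into the final equation:

```python
# Plug the trial scores into the fitted regression equation.
intercept = 6.31
coef = {"trial1": 0.660, "trial2": 0.6286, "trial3": 0.633}
scores = {"trial1": 81, "trial2": 79, "trial3": 105}

predicted = intercept + sum(coef[k] * scores[k] for k in coef)
print(predicted)  # ≈ 175.8944
```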
Confidence Interval for Multiple Linear Regression
In simple linear regression, you examine the effect of a single independent variable (X) on the dependent variable (Y).
In multiple linear regression, you:
• Examine the effect of more than one independent variable (X₁, …, Xₖ) on the dependent variable (Y)
• Know the confidence interval of one of the independent variables
• Estimate if a particular independent variable has a significant effect on the dependent variable
• Estimate the effect of the independent variables as a group on the dependent variable
Confidence Interval for Multiple Linear Regression
The true values of the coefficients β₀ and βᵢ depend largely on the sample size.
You can estimate the confidence intervals for the intercept and the slope parameters in the regression equation.
Confidence Interval for Multiple Linear Regression
95%confidence interval forβ
ican be estimated using two approaches:
Hypothesis
testing
Confidence
level
•The estimated confidence interval is the set of values for which the null
hypothesis cannot be rejected.
•The null hypothesis is tested at 5% level of significance.
•The estimated confidence interval contains the true value of β
i
(probability of 95%).
•For 95% of all possible samples drawn from the population, the
confidence interval will contain the true value of β
k.
Confidence Interval for Multiple Linear Regression
To calculate the confidence interval at 95% (5% significance level) for βᵢ:
CI₀.₉₅(βᵢ) = [β̂ᵢ − 1.96 × SE(β̂ᵢ), β̂ᵢ + 1.96 × SE(β̂ᵢ)]
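A minimal sketch of this interval in Python; the coefficient estimate and its standard error are hypothetical numbers.

```python
# 95% CI for a coefficient: beta_hat ± 1.96 * SE(beta_hat).
beta_hat = 0.660   # estimated coefficient (hypothetical)
se = 0.042         # its standard error (hypothetical)

lower = beta_hat - 1.96 * se
upper = beta_hat + 1.96 * se
print(f"95% CI for beta: ({lower:.4f}, {upper:.4f})")
```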
Box-Cox Transformation
• You are watching the 2019 World Championship final of the 100 meters sprint.
• The time difference between the first few runners, including the winner, is very small.
• Your favorite runner lost the race by a difference of 1/10th of a second.
For the other runners, the time difference in reaching the finish line is larger than for the first 3-4 runners.
This pattern of small variation in one region and large variation in another is referred to as heteroscedasticity or nonconstant variance.
Box-Cox Transformation
Data points from the previous example do not generate a normal distribution.
Transform the data into normality using a Box-Cox transformation in MINITAB.
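The lesson performs the transformation in MINITAB; as an equivalent sketch, SciPy's boxcox fits the transformation parameter lambda. The right-skewed sprint-time values are illustrative.

```python
# A minimal sketch of a Box-Cox transformation with SciPy.
# Note: boxcox requires strictly positive data.
import numpy as np
from scipy import stats

times = np.array([9.6, 9.7, 9.8, 9.9, 10.1, 10.4, 10.9, 11.8, 13.5, 16.2])

transformed, lam = stats.boxcox(times)   # lam is the fitted lambda
print("lambda =", round(lam, 3))
print(transformed)                       # data pulled toward normality
```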
Key Takeaways
The multi-vari chart is a powerful statistical tool that you can use to analyze and exhibit all three types of variations.
A nonlinear relationship equation or the regression model aims
at minimizing the value of the sum of the squares.
Multiple linear regression is one of the common regression
methods for modeling the relationship between one dependent
variable (y) and two or more independent variables (x).
Nonlinear regression uses sample data for analysis and to build
and fit a model.
Key Takeaways
Simple linear regression works with the relationship of two
variables in a straight line.
The variance inflation factor measures the multicollinearity
of each individual independent variable in the model and
quantifies its degree of severity.
Designed Experiments
Learning Objectives
By the end of this lesson, you will be able to:
Comprehend design of experiments (DOE)
Identify the various phases of DOE
Classify the types of DOE strategies
Summarize the key considerations in an experimental
design
Design of Experiments (DOE)
Hypothesis tests only help in drawing conclusions about the relationship between variables.
However, this is not enough to bring about improvements in a process.
Experiments help Six Sigma teams analyze a process and identify the required improvements.
Design of Experiments (DOE)
DOE is a systematic set of experiments where the effects of one or more
factors can be evaluated without subjective judgments.
•It is known as designed experiments in which the
factors are altered to determine their effect on the
output.
•It involves running a series of experiments in
which the input factors are varied in an organized
way.
•It begins by stating the experimental objectives
and ends when the results are reported.
Design of Experiments (DOE)
Ambiguous results are avoided, and inferred cause-effect relationship is more accurate.
DOE eliminates nonsignificant factors in a process.
Design of experiments often leads to further experimentation, which ultimately leads to optimization.
Design of Experiments: Uses
It is used when the relationship between Y and X cannot be defined with equations:
Y = f(X)
DOE helps to determine the variable that has the most significant influence on the output.
Design of experiments is used to infer cause and effect in the problem-solving equation.
Benefits of DOE
Provides a directed approach and saves time
Uses a mathematical model to relate the variables and the responses
Determines how multiple input variables interact to affect results
Minimizes or maximizes the mean
Achieves a target while minimizing the variance
There is much greater confidence in the results obtained through designed experiments,
since the statistical significance of the results is known.
Design of Experiments (DOE)
Factors:
• Variables set to a particular value during the experiment
• Can be quantitative or qualitative
• Also called parameters, inputs, controlled variables, independent variables, causes, or X variables
Response variables:
• Are the result of an experiment
• Also called outputs, uncontrollable variables, dependent variables, effects, or Y variables
Four Phases of the DOE Process
1. Objective or Planning Phase: define the experimental objective and purpose
2. Screening Phase: identify the factors that change the response variables
3. Optimization Phase: optimize the response variables
4. Confirmation Phase: validate the experimental findings
Factor-level settings are the values assigned to the factors or input variables.
Objective Phase
The objective phase defines the objective of the experiment.
•Identify the response variable
•Analyze the measurement system
•Understand the state of control before
beginning the experiment
•Ask if the purpose is consistent with the
practical problem statement
•Conduct performance measurements and
stability assessments
The objective phase tends to be overlooked, which may lead to a significant error.
Objective Phase: Questions
Ask the following questions in this phase:
• What is the practical problem?
• What is the response variable?
• Is there only one response variable?
• Can we measure the expected changes with our response variables?
• What is the desired response?
• What is the objective in terms of response?
• Is the process stable?
Screening Phase: Purpose
• Identifies variables that have a significant effect on the response
• Does not define a mathematical relationship between the factors and response variables
• Determines the factors that must be altered for further experimentation
Screening Phase: Tasks
• Identifies the potential factors or the X variables
• Selects the most appropriate experimental design
• Determines the factor-level settings
• Identifies the noise, control, and signal factors for the experiment
• Performs a series of screening experiments
Optimization Phase
The purpose of this phase is to identify the input variable settings that optimize the response.
The tasks of the optimization phase are:
• Realizing the goal of the DOE method
• Identifying noise and control factors for the experiment
• Choosing an optimization strategy
• Using vital factors that optimize the response variables
Confirmation Phase
•Assesses whether the results obtained in the
experiment correlate with the real-life conditions
•Involves performing subsequent experiments to
confirm the optimization results
•Is sometimes conducted directly after the
screening phase
DOE Strategies: Types
• Trial and Error
• One Factor at a Time (OFAT)
• Full Factorial
• Fractional Factorial
Trial-and-Error Strategy
• Is also called traditional DOE
• Relies on running separate experiments with separate time and resources
• Is generally expensive
• The response range is usually outside the acceptable value
In a trial-and-error approach, the solution may not truly solve the problem.
It only offers an improved chance for breakthrough achievement.
Trial-and-Error Strategy: Example
Problem: The gas mileage for a car is 20 miles per gallon. You would like to attain a mileage greater than 30 miles per gallon.
You may make a number of changes to obtain the desired mileage, such as:
• Change the brand of gas
• Drive slower
• Change the tire pressure
• Buy new tires
One-Factor-at-a-Time Approach
One factor or variable is changed at a time, from minimum to maximum, while keeping all
other factors constant
•Used in basic problem-solving to resolve
confusing issues
•Intuitive and simple method for screening
significant factors
•Requires many experimental runs
•Does not consider the interaction between
variables
•Commonly used for verification runs
DOE Strategies
• Full factorial approach
• Fractional factorial approach
Full Factorial Approach
It is used to determine the factors that have a statistically significant effect on the response
variables.
•Factors may be quantitative or qualitative
•At least one value of the response is observed at
each treatment combination
•Lengthy experiment due to the multiple
treatment combinations used
•Eliminates the possibility of any confounding
issues
•Rarely used due to the large number of runs
required
Full Factorial Approach
Full factorial experiments evaluate all factors included in the experiment.
Here, all possible treatment combinations are considered.
Minimum number of tests = X^k
where X = number of levels and k = number of factors.
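A minimal sketch that enumerates the X^k treatment combinations with itertools, here for X = 2 levels and k = 3 factors:

```python
# Enumerate all treatment combinations of a full factorial design.
from itertools import product

levels = [-1, 1]            # X = 2 levels (low, high)
factors = ["P", "Q", "R"]   # k = 3 factors

runs = list(product(levels, repeat=len(factors)))
print(len(runs))            # 8 = 2**3, the minimum number of tests
for run in runs:
    print(dict(zip(factors, run)))
```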
Full Factorial Approach
Advantages
•Is more efficient than other approaches seen
earlier
•Provides information about all possible effects
and interactions
•Quantifies the equation Y= f(X) appropriately
Drawback
•Requires a lot of time and resources
Fractional Factorial Approach
Determines which factors have a statistically significant effect on the response variables
•Factors may be quantitative or qualitative
•Determines which factors should be included for
further experimentation
•Preferred choice for initial screening design
•Initial experimentation requires fewer runs than
the full factorial method
Fractional Factorial Approach
Advantages
•Information can be obtained with smaller
investment when many factors are being
investigated
•Requires fewer resources than a full factorial
experiment
•Saves time and money as it requires fewer runs
Drawbacks
•Tends to miss some interactions
•Confounding issues may occur
Principles of Experimental Design
Randomization
•Is the process of assigning treatments randomly in an experiment
•Has the same probability for each treatment
Example: Each individual in a group of people divided into experimental and control groups will have an equal chance of being in either group.
•Removes bias from other sources of uncontrollable variation
Principles of Experimental Design
•Refers to the repetition of the same experiment
•Removes variations by enhancing the number of units
•Repeats an experiment multiple times to obtain a statistically significant
result
•Provides an accurate estimate of the experimental error
•Helps in decreasing the experimental error
Replication
Principles of Experimental Design
•Refers to the balancing, blocking, and grouping of the experimental units
•Balancing ensures that all treatments produce a balanced design of
experiment
•Blocking ensures that all treatments form a homogenous group of similar
units
•Decreases the experimental error
Local control
Randomization and replication will not remove all external sources of variation.
Key Takeaways
Design of experiments is a systematic set of experiments in which the effect of one or more factors can be evaluated without subjective judgments.
Design of experiments consists of factors and response variables.
The four phases of DOE process are planning, screening,
optimization, and confirmation.
The different DOE strategies include trial and error, one factor at
a time, full factorial, and fractional factorial.
The basic principles of experimental designs are randomization,
replication, and local control.
Factorial Experiments
Learning Objectives
By the end of this lesson, you will be able to:
Perform a full factorial experiment
Use central composite and Box-Behnken methods to
determine the curvature of a response surface
Measure each effect in a factorial experiment using an
orthogonal design
Identify the confounding effects in a fractional factorial
experiment
Factorial Design
Factorial design is a type of designed experiment in which the effects of several
factors are estimated on a response variable.
All factor levels must be varied simultaneously to estimate the effect.
Factorial Design
• Two-factor design: Factor A = 2 levels, Factor B = 3 levels
• Three-factor design: A, B, and C have three levels each
[Diagrams: level grids for the two designs]
In these images, you can see a unique combination of various factors and levels.
Full Factorial Experiments
In a full factorial design of experiment, the response of every possible combination of
factors and factor levels is measured.
Number of runs = 2^k, where 2 = number of levels and k = number of input variables.
The number of experimental runs increases with the increase in the number of factors. For example:
• A 2-level full factorial design with 4 factors requires 16 runs.
• A design with 6 factors requires 64 runs.
Full Factorial Experiments: Benefits
•Easy to understand
•Form the basis of fractional factorial designs
•Can be augmented to form composite designs
•Require fewer runs per factor
In case of more than five factors, testing all combinations of factor levels will be extremely
expensive, time-consuming, and complex.
Quadratic Models
A quadratic model describes the curvature in the response surface in the images given:
• A response surface with no curvature is represented by a linear model.
• A response surface with curvature can be represented by a quadratic model.
Quadratic Models
A quadratic model:
• Helps in mapping a particular region of the response surface
• Identifies levels of variables that can optimize a response
• Identifies the operating conditions required to meet the design specifications
Response Surface Method
The presence of squared terms in the equation is the primary difference between a
response surface method and a factorial design.
Central Composite Design
A central composite design is one of the most common approaches taken when designing a response surface experiment.
A well-planned factorial experiment acts as a source of information for the central composite design.
• Used when a design plan includes sequential experimentation
• Contains an embedded full factorial or fractional factorial design
• Has center points and axial points to estimate the curvature of the surface
• Used to estimate the first- and second-order terms
• Can have up to five levels for each factor
[Diagram: factorial points, center points, and star points on axes X1, X2, X3]
Central Composite Design: Example
Estimate the optimum conditions for injection molding of a rubber object.
On performing a factorial experiment, the following factor levels are obtained: temperature = 190°C and 210°C, and pressure = 50 MPa and 100 MPa. Based on these values, center points and axial points are added.
[Design points: corners at (190°C, 50 MPa), (210°C, 50 MPa), (190°C, 100 MPa), and (210°C, 100 MPa); axial points at (185.9°C, 75 MPa), (214.1°C, 75 MPa), (200°C, 39.6 MPa), and (200°C, 110.4 MPa)]
Box-Behnken Design
A Box-Behnken response surface design does not include a full factorial or
fractional factorial design.
• More economically viable than central composite designs
• Gives a more accurate estimate of the first- and second-order coefficients than central composite designs
Box-Behnken Design: Example
Consider the example of injection moulding of a rubber object, estimating the optimum conditions:
• Temperature = 190°C and 210°C
• Pressure = 50 MPa and 100 MPa
• Speed of injection = 10 mm/s and 50 mm/s
Box-Behnken Design: Example
A Box-Behnken design will have its design points at high factor levels,
low factor levels, and their midpoints.
The image shows a three-factor Box-Behnken design. The
points here represent the experimental runs.
• Temperature = 190°C, 200°C, and 210°C
• Pressure = 50 MPa, 75 MPa, and 100 MPa
• Speed of injection = 10 mm/s, 30 mm/s, and 50 mm/s
Balanced Design
Balanced design (equal number of observations): all possible combinations of the three factors appear exactly once.
P Q R
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
Unbalanced design (unequal number of observations): the combination (1, 0, 0) is missing, and the (0, 1, 0) combination appears twice.
P Q R
0 0 0
0 1 0
0 1 0
0 0 1
0 1 1
1 1 0
1 0 1
1 1 1
Orthogonal Design
An orthogonal design allows for the measurement of each effect in an experiment.
P = [2, 3, 5, 0] and Q = [−4, 1, 1, 4]
P · Q = 2(−4) + 3(1) + 5(1) + 0(4)
P · Q = −8 + 3 + 5 + 0
P · Q = 0
You can assess the orthogonality for two vectors by adding the product of their values.
If the result is zero, orthogonality is said to be present.
Example: Check the orthogonality by multiplying the corresponding values. The sum of the products is zero, which indicates that the two vectors are orthogonal.
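The same check as a one-line dot product in NumPy:

```python
# Orthogonality check: the dot product of the two vectors must be zero.
import numpy as np

P = np.array([2, 3, 5, 0])
Q = np.array([-4, 1, 1, 4])

print(np.dot(P, Q))  # 2(-4) + 3(1) + 5(1) + 0(4) = 0 -> orthogonal
```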
Orthogonal Design
Orthogonality is an important concept in DOE, as it identifies independence.
In an orthogonal design, the main effects and interactions can be estimated independently.
It is difficult to analyze designs that are not orthogonal.
Orthogonal Design: Experiment
A 2³ experiment with eight runs:
X Y Z
1 –1 –1
1 –1 1
–1 –1 1
–1 1 –1
–1 1 1
–1 –1 –1
1 1 1
1 1 –1
X * Y = 1 (–1) +1 (–1) –1 (–1) –1 (1) –1 (1) –1 (–1) + 1 (1) + 1 (1) = 0
X * Z = 1 (–1) +1 (1) –1 (1) –1 (–1) –1 (1) –1 (–1) + 1 (1) + 1 (–1) = 0
Y * Z = –1 (–1) –1 (1) –1 (1) + 1 (–1) + 1 (1) –1 (–1) + 1 (1) + 1 (–1) = 0
Detect orthogonality with the same approach used in the previous example by multiplying the corresponding column values.
Based on the calculations, orthogonality is present: factor X is independent of Y and Z, and vice versa.
Center Points
Center points are midway experimental runs between the low and high settings.
These are additional experimental runs made at the physical center of the design.
Factors | Low Setting | High Setting
Temp | 100 Degrees | 150 Degrees
Time | 20 Seconds | 40 Seconds
Center point: Temperature = 125°C, Time = 30 s
Why Center Points
A factorial design assumes that the relationship between X and Y is linear.
Center points help determine if the relationship between X and Y is actually linear.
Why Center Points
If the relationship is curved, a response surface design can be used to detect the curvature and build a model.
Recall the p-value principle: if the p-value of the center point is less than the level of alpha, then you can statistically conclude that a curvature relationship exists between X and Y.
Center Points: True Variation
When sampling is done right, true variation cannot be detected.
Addition of center points in the design increases the probability of detecting true variation.
Time-consumingExpensive
Replicating an
entire DOE is:
Fractional Factorial Experiment
In a fractional design, the experiment is conducted only on a selected fraction of the full factorial design.
Fractional factorial design determines the factors that must be included in further experimentation.
A fractional factorial is a good choice when there are:
• Limited resources
• Multiple design factors
Fractional Factorial Design
The main effects will be confounded with two-way interactions.
In the images, the full factorial design contains twice as many design points as the half factorial, or fractional factorial, design.
In the fractional factorial, the response is measured at a subset of the eight corner points of the design.
Fractional Factorial Design: Minitab Selection
Look at how fractions of a design are selected using Minitab. Assign signs to
design generators to select the fraction of a design.
Fraction | Standard (Yates) Order | Design Generators
1 | − − − | X = −PQ, Y = −PR, Z = −QR
2 | + − − | X = +PQ, Y = −PR, Z = −QR
3 | − + − | X = −PQ, Y = +PR, Z = −QR
4 | + + − | X = +PQ, Y = +PR, Z = −QR
5 | − − + | X = −PQ, Y = −PR, Z = +QR
6 | + − + | X = +PQ, Y = −PR, Z = +QR
7 | − + + | X = −PQ, Y = +PR, Z = +QR
8 | + + + | X = +PQ, Y = +PR, Z = +QR
Confounding
Confounding, also called aliasing, addresses the issue of losing information on the interactions.
In a fractional design, the main effects and two-way interactions are confounded: they cannot be separated from the effects of other higher-order interactions.
The confounded interactions and main effects cannot be separated.
Example: assume that factor P (a main effect) is confounded with the three-way interaction QRS, where A is the response variable on which you measure the effect of the P, Q, R, and S variables.
Confounding
The alias structure describes the confounding pattern that occurs in a design.
To determine confounded effects, multiply the term of interest by the identity statement, and then eliminate the squared terms:
(QR)(I + PQRST)
= QR + PQ²R²ST
= QR + PST
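A rough sketch of this multiply-and-cancel rule in Python; the helper function `alias` is hypothetical, not a Minitab feature.

```python
# Multiply a term by each word of the identity statement, then drop
# letters that appear twice (squared factors cancel to the identity).
from collections import Counter

def alias(term, identity_words):
    """Aliases of `term` under the identity I = I + words (hypothetical helper)."""
    results = []
    for word in identity_words:
        counts = Counter(term) + Counter(word)
        # keep only letters appearing an odd number of times
        reduced = "".join(sorted(c for c, n in counts.items() if n % 2))
        results.append(reduced or "I")
    return results

print(alias("QR", ["", "PQRST"]))  # ['QR', 'PST'] -> (QR)(I + PQRST) = QR + PST
```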
Key Takeaways
In full factorial experiments, every possible combination of
factors and factor levels is measured.
The response surface method helps identify the operating
conditions required to meet the design specifications.
A Box-Behnken design is more economically viable than a central composite design for an experiment with the same number of factors.
You can assess the orthogonality for two vectors by adding
the product of their values.
Key Takeaways
Addition of center points in the design increases the
probability of detecting true variation.
In a fractional design, the experiment is conducted only on
the selected fraction of a full factorial design.
Confounding addresses the issue of loss of information on
the interaction of all factors.