Design of Experiments By Dr. Virendra Kumar (Ph.D, IITD) Email: [email protected] Website: https://sites.google.com/view/virendra Experimental Design (or DOE) economically maximizes information LECTURE 1
Syllabus
Recommended Books Textbooks: D.C. Montgomery, Design and Analysis of Experiments, Wiley India, 5th Edition, 2006, ISBN: 812651048X. Madhav S. Phadke, Quality Engineering Using Robust Design, Prentice Hall PTR, Englewood Cliffs, New Jersey, 1989, ISBN: 0137451679. Reference Books: Robert H. Lochner, Joseph E. Matar, Designing for Quality: An Introduction to the Best of Taguchi and Western Methods of Statistical Experimental Design, Chapman and Hall, 1990, ISBN: 0412400200. Philip J. Ross, Taguchi Techniques for Quality Engineering: Loss Function, Orthogonal Experiments, Parameter and Tolerance Design, McGraw-Hill, 2nd Edition, 1996, ISBN: 0070539588.
What is an Experiment? The term experiment is defined as a systematic procedure carried out under controlled conditions in order to discover an unknown effect, to test or establish a hypothesis, or to illustrate a known effect. When analyzing a process, experiments are often used to evaluate which process inputs have a significant impact on the process output, and what the target level of those inputs should be to achieve a desired result (output). Experiments can be designed in many different ways to collect this information. Design of Experiments (DOE) is also referred to as Designed Experiments or Experimental Design; all of these terms have the same meaning.
Aims of Design of Experiments
What is experimental design?
Black box process model
Definition of Design of Experiments (DOE) Design of experiments (DOE) can be defined as a set of statistical tools that deal with the planning, execution, analysis, and interpretation of controlled tests to determine which factors will impact and drive the outcomes of your process.
Development of DOE The agricultural origins, 1908 – 1940s W.S. Gossett and the t-test (1908) R. A. Fisher & his co-workers Profound impact on agricultural science Factorial designs, ANOVA The first industrial era, 1951 – late 1970s Box & Wilson, response surfaces Applications in the chemical & process industries The second industrial era, late 1970s – 1990 Quality improvement initiatives in many companies Taguchi and robust parameter design, process robustness The modern era, beginning circa 1990
LECTURE 2
DOE Approaches? Two of the most common approaches to DOE are the full factorial DOE and the fractional factorial DOE. Full factorial DOE: the aim is to determine at what settings of your process inputs you will optimize the values of your process outcomes. For example, which combination of machine speed, fill speed, and carbonation level will give you the most consistent fill? Experimentation using all possible factor combinations is called a full factorial design, and these combinations are called runs. With three variables (machine speed, fill speed, and carbonation level), how many different unique combinations would you have to test to explore all the possibilities?
For a two-level design the number of runs is 2^k, where k is the number of variables and 2 is the number of levels, such as High/Low or 100 ml per minute/200 ml per minute. What if you aren't able to run the entire set of combinations of a full factorial? What if you have monetary or time constraints, or too many variables? This is when you might choose to run a fractional factorial, also referred to as a screening DOE, which uses only a fraction of the total runs. That fraction can be one-half, one-quarter, one-eighth, and so forth, depending on the number of factors or variables. While there is a formula to calculate the number of runs, suffice it to say you can just calculate your full factorial runs and divide by the fraction that you and your Black Belt or Master Black Belt determine is best for your experiment.
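As a minimal sketch of these run counts (the Python and the function names are illustrative, not part of the lecture), a two-level full factorial needs 2^k runs and a 1/2^p fraction needs 2^(k - p) runs:

```python
# Hypothetical helper functions for counting runs in two-level designs.

def full_factorial_runs(k, levels=2):
    """Number of runs when every combination of k factors is tested."""
    return levels ** k

def fractional_factorial_runs(k, p, levels=2):
    """Number of runs for a 1/levels**p fraction of the full factorial."""
    return levels ** (k - p)

if __name__ == "__main__":
    # Three two-level factors: machine speed, fill speed, carbonation level
    print(full_factorial_runs(3))           # 8 runs
    print(fractional_factorial_runs(3, 1))  # 4 runs (half fraction)
    print(full_factorial_runs(5))           # 32 runs
    print(fractional_factorial_runs(5, 2))  # 8 runs (quarter fraction)
```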
Factorial Designs Example In a factorial experiment, all possible combinations of factor levels can be tested. • The golf experiment: Type of driver Type of ball Walking vs. riding Type of beverage Time of round Weather Type of golf spike Etc, etc, etc
Factorial Designs Example Consider the golf experiment and suppose that only two factors, type of driver and type of ball, are of interest. Figure shows a two-factor factorial experiment for studying the joint effects of these two factors on golf score. Notice that this factorial experiment has both factors at two levels and that all possible combinations of the two factors across their levels are used in the design. Geometrically, the four runs form the corners of a square. This particular type of factorial experiment is called a 2^2 factorial design (two factors, each at two levels). Because I can reasonably expect to play eight rounds of golf to investigate these factors, a reasonable plan would be to play two rounds of golf at each combination of factor levels shown in Figure. An experimental designer would say that we have replicated the design twice. This experimental design would enable the experimenter to investigate the individual effects of each factor (or the main effects) and to determine whether the factors interact. Fig. A two-factor factorial experiment involving type of driver and type of ball
Factorial Designs Example Figure (a) shows the results of performing the factorial experiment. The scores from each round of golf played at the four test combinations are shown at the corners of the square. Notice that there are four rounds of golf that provide information about using the regular-sized driver and four rounds that provide information about using the oversized driver. By finding the average difference in the scores on the right- and left-hand sides of the square (as in Figure b), we have a measure of the effect of switching from the oversized driver to the regular-sized driver: Driver effect = (average of the four right-hand scores) - (average of the four left-hand scores) = 3.25. That is, on average, switching from the oversized to the regular-sized driver increases the score by 3.25 strokes per round.
Factorial Designs Example Similarly, the average difference in the four scores at the top of the square and the four scores at the bottom measures the effect of the type of ball used (see Figure c): Ball effect = (average of the four top scores) - (average of the four bottom scores). Finally, a measure of the interaction effect between the type of ball and the type of driver can be obtained by subtracting the average scores on the left-to-right diagonal in the square from the average scores on the right-to-left diagonal (see Figure d): Interaction effect = (average of the right-to-left diagonal scores) - (average of the left-to-right diagonal scores).
Conclusion of Factorial Designs Example The results of this factorial experiment indicate that the driver effect is larger than either the ball effect or the interaction. Statistical testing could be used to determine whether any of these effects differ from zero. In fact, it turns out that there is reasonably strong statistical evidence that the driver effect differs from zero and the other two effects do not. Therefore, this experiment indicates that I should always play with the oversized driver. As this simple example shows, factorials make the most efficient use of the experimental data. Notice that this experiment included eight observations, and all eight observations are used to calculate the driver, ball, and interaction effects. No other strategy of experimentation makes such efficient use of the data. This is an important and useful feature of factorials.
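A small Python sketch of the effect calculations above. The scores and the factor coding are hypothetical placeholders (chosen so the driver effect comes out to 3.25, consistent with the example), not the figure's actual data:

```python
import numpy as np

# Coded factor levels: driver -1 = oversized, +1 = regular-sized;
# ball -1 = type A, +1 = type B. Two replicates per factor combination.
runs = np.array([
    # driver, ball, score (hypothetical)
    [-1, -1, 88], [-1, -1, 91],
    [+1, -1, 92], [+1, -1, 94],
    [-1, +1, 88], [-1, +1, 90],
    [+1, +1, 93], [+1, +1, 91],
])

driver, ball, score = runs[:, 0], runs[:, 1], runs[:, 2]

# Main effect = mean response at the high level minus mean at the low level
driver_effect = score[driver == +1].mean() - score[driver == -1].mean()
ball_effect = score[ball == +1].mean() - score[ball == -1].mean()

# Interaction = difference between the two diagonal averages of the square
interaction = score[driver * ball == +1].mean() - score[driver * ball == -1].mean()

print(driver_effect, ball_effect, interaction)  # 3.25 for the driver with these numbers
```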
Benefits of DOE Doing a designed experiment as opposed to using a trial-and-error approach has a number of benefits.
Why is DOE important to understand? Choosing Between Alternatives: a common use is planning an experiment to gather data to make a decision between two or more alternatives; these are types of comparative studies. Selecting the Key Factors Affecting a Response: selecting the few factors that matter from the many possible factors. Response Surface Modeling of a Process: some reasons to model a process are below. Hitting a Target: often we want to "fine tune" a process to consistently hit a target. Maximizing or Minimizing a Response: optimizing a process output is a common goal. Reducing Variation: processes that are on target, on average, may still have too much variability. Making a Process Robust: the less a process or product is affected by external conditions, the better it is; this is called "robustness". Seeking Multiple Goals: sometimes we have multiple outputs and we have to compromise to achieve desirable outcomes; DOE can help here. Regression Modeling: regression models are used to fit more precise models.
Best practices when thinking about DOE Experiments take planning and proper execution; otherwise the results may be meaningless. Here are a few hints for making sure you properly run your DOE. Carefully identify your variables: your process variables have different impacts on your output. Some are statistically important, and some are just noise; you need to understand which is which. Use existing data and data analysis to try to identify the most logical factors for your experiment. Regression analysis is often a good source for selecting potentially significant factors. Prevent contamination of your experiment: during your experiment, you will have your experimental factors as well as other environmental factors around you that you aren't interested in testing. You will need to control those to reduce the noise and contamination that might occur (which would reduce the value of your DOE).
Best practices when thinking about DOE Use screening experiments to reduce cost and time: unless you've done some prior screening of your potential factors, you might want to start your DOE with a screening or fractional factorial design. This will provide information about potentially significant factors without consuming your whole budget. Once you've identified the best potential factors, you can do a full factorial with the reduced number of factors.
What are the steps of DOE? Obtaining good results from a DOE involves these seven steps:
A checklist of practical considerations Important practical considerations in planning and running experiments are
LECTURE 3
Basic principles of experimental design The three basic principles of experimental design are randomization, replication, and blocking. Sometimes we add the factorial principle to these three.
Randomization Randomization is the cornerstone underlying the use of statistical methods in experimental design. By randomization we mean that both the allocation of the experimental material and the order in which the individual runs of the experiment are to be performed are randomly determined. Statistical methods require that the observations (or errors) be independently distributed random variables. Randomization usually makes this assumption valid. By properly randomizing the experiment, we also assist in "averaging out" the effects of extraneous factors that may be present. For example, suppose that the specimens in the hardness experiment are of slightly different thicknesses and that the effectiveness of the quenching medium may be affected by specimen thickness. If all the specimens subjected to the oil quench are thicker than those subjected to the saltwater quench, we may be introducing systematic bias into the experimental results. This bias handicaps one of the quenching media and consequently invalidates our results. Randomly assigning the specimens to the quenching media alleviates this problem.
Computer software programs are widely used to assist experimenters in selecting and constructing experimental designs. These programs often present the runs in the experimental design in random order. This random order is created by using a random number generator. Even with such a computer program, it is still often necessary to assign units of experimental material, operators, gauges or measurement devices, and so forth for use in the experiment. Sometimes experimenters encounter situations where randomization of some aspect of the experiment is difficult.
Replication By replication we mean an independent repeat run of each factor combination. For example, in the metallurgical experiment, replication would consist of treating a specimen by oil quenching and treating a specimen by saltwater quenching. Thus, if five specimens are treated in each quenching medium, we say that five replicates have been obtained. Each of the 10 observations should be run in random order. Replication has two important properties. First, it allows the experimenter to obtain an estimate of the experimental error. This estimate of error becomes a basic unit of measurement for determining whether observed differences in the data are really statistically different. Second, if the sample mean (ȳ) is used to estimate the true mean response for one of the factor levels in the experiment, replication permits the experimenter to obtain a more precise estimate of this parameter. For example, if σ² is the variance of an individual observation and there are n replicates, the variance of the sample mean ȳ is σ²/n.
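A quick simulation sketch (not part of the lecture; the mean of 50 and standard deviation of 2 are assumed values) illustrating that with n replicates the variance of the sample mean is σ²/n:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, trials = 2.0, 5, 100_000          # 5 replicates per simulated experiment

# Simulate many experiments, each averaging n observations with std dev sigma
sample_means = rng.normal(loc=50.0, scale=sigma, size=(trials, n)).mean(axis=1)

print(sample_means.var())   # observed variance of the sample mean
print(sigma**2 / n)         # theoretical value, sigma^2 / n = 0.8
```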
Blocking Blocking is a design technique used to improve the precision with which comparisons among the factors of interest are made. Often blocking is used to reduce or eliminate the variability transmitted from nuisance factors—that is, factors that may influence the experimental response but in which we are not directly interested. For example , an experiment in a chemical process may require two batches of raw material to make all the required runs. However, there could be differences between the batches due to supplier-to-supplier variability, and if we are not specifically interested in this effect, we would think of the batches of raw material as a nuisance factor. Generally, a block is a set of relatively homogeneous experimental conditions. In the chemical process example, each batch of raw material would form a block, because the variability within a batch would be expected to be smaller than the variability between batches. Typically, as in this example, each level of the nuisance factor becomes a block. Then the experimenter divides the observations from the statistical design into groups that are run in each block.
Guidelines for Designing Experiments To use the statistical approach in designing and analyzing an experiment, it is necessary for everyone involved in the experiment to have a clear idea in advance of exactly what is to be studied, how the data are to be collected, and at least a qualitative understanding of how these data are to be analyzed. STEP 1: Recognition of and statement of the problem. The first step is to realize that a problem requiring experimentation exists and to develop a clear and generally accepted statement of the problem. It is important to solicit input from all concerned parties: engineering, quality assurance, manufacturing, marketing, management, customers, and operating personnel. It is helpful to prepare a list of specific problems or questions that are to be addressed by the experiment, and to always keep the overall objectives of the experiment in mind. There are several broad reasons for running experiments, and each type of experiment will generate its own list of specific questions that need to be addressed.
Guidelines for Designing Experiments STEP 1: Recognition of and statement of the problem (continued). There are several broad reasons for running experiments; some are as follows. Factor screening or characterization: determine which factors have the most influence on the response(s) of interest. Optimization: find the settings or levels of the important factors that result in desirable values of the response. Confirmation: verify that the system operates or behaves in a manner that is consistent with some theory or past experience. Discovery: in discovery experiments, the experimenters are usually trying to determine what happens when we explore new materials, new factors, or new ranges for factors. Robustness: under what conditions do the response variables of interest seriously degrade, or what conditions would lead to unacceptable variability in the response variables?
STEP 2: Selection of the response variable Guidelines for Designing Experiments In selecting the response variable, the experimenter should be certain that this variable really provides useful information about the process under study. Most often, the average or standard deviation (or both) of the measured characteristic will be the response variable. Multiple responses are not unusual. The experimenters must decide how each response will be measured, and address issues such as how any measurement system will be calibrated and how this calibration will be maintained during the experiment. The gauge or measurement system capability (or measurement error) is also an important factor. It is usually critically important to identify issues related to defining the responses of interest and how they are to be measured before conducting the experiment. Sometimes designed experiments are employed to study and improve the performance of measurement systems.
STEP 3: Choice of factors, levels, and range Guidelines for Designing Experiments The experimenter will discover that the factors of interest are either potential design factors or nuisance factors. It is helpful to further classify the potential design factors as design factors, held-constant factors, and allowed-to-vary factors. The design factors are the factors actually selected for study in the experiment. Held-constant factors are variables that may exert some effect on the response, but for purposes of the present experiment these factors are not of interest, so they will be held at a specific level. For allowed-to-vary factors, the experimental units or the "materials" to which the design factors are applied are usually nonhomogeneous, yet we often ignore this unit-to-unit variability and rely on randomization to balance out any material or experimental unit effect. We often assume that the effects of held-constant factors and allowed-to-vary factors are relatively small. Nuisance factors, in contrast, may have large effects that must be accounted for. Nuisance factors are often classified as controllable, uncontrollable, or noise factors.
STEP 3: Choice of factors, levels, and range Guidelines for Designing Experiments When the objective of the experiment is factor screening or process characterization, it is usually best to keep the number of factor levels low (generally two levels). The cause-and-effect diagram (fishbone diagram) is a useful technique for organizing some of the information generated in pre-experimental planning. FIGURE: A cause-and-effect diagram for the etching process experiment. FIGURE: A cause-and-effect diagram for the CNC machine experiment.
STEP 4: Choice of experimental design. Guidelines for Designing Experiments Choice of design involves consideration of sample size (number of replicates), selection of a suitable run order for the experimental trials, and determination of whether or not blocking or other randomization restrictions are involved. There are several interactive statistical software packages that support this phase of experimental design. The experimenter can enter information about the number of factors, levels, and ranges, and these programs will either present a selection of designs for consideration or recommend a particular design. We usually prefer to see several alternatives instead of relying entirely on a computer recommendation in most cases. Most software packages also provide some diagnostic information about how each design will perform, which helps in finding the best design alternative. These programs will usually also provide a worksheet (with the order of the runs randomized) for use in conducting the experiment.
STEP 4: Choice of experimental design. Guidelines for Designing Experiments Design selection also involves thinking about and selecting a tentative empirical model to describe the results. The model is just a quantitative relationship (equation) between the response and the important design factors. In many cases, a low-order polynomial model will be appropriate. A first-order model in two variables is y = β₀ + β₁x₁ + β₂x₂ + ε, where y is the response, the x's are the design factors, the β's are unknown parameters that will be estimated from the data in the experiment, and ε is a random error term that accounts for the experimental error in the system that is being studied. The first-order model is also sometimes called a main effects model. First-order models are used extensively in screening or characterization experiments.
STEP 4: Choice of experimental design. Guidelines for Designing Experiments A common extension of the first-order model is to add an interaction term, say y = β₀ + β₁x₁ + β₂x₂ + β₁₂x₁x₂ + ε, where the cross-product term x₁x₂ represents the two-factor interaction between the design factors. Because interaction between factors is relatively common, the first-order model with interaction is widely used. Higher-order interactions can also be included in experiments with more than two factors if necessary. Another widely used model is the second-order model y = β₀ + β₁x₁ + β₂x₂ + β₁₂x₁x₂ + β₁₁x₁² + β₂₂x₂² + ε. Second-order models are often used in optimization experiments. In selecting the design, it is important to keep the experimental objectives in mind.
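A hedged sketch of fitting the first-order model with interaction to coded 2² data by least squares; the response values are hypothetical and the coefficient names follow the equations above:

```python
import numpy as np

# Coded factor settings of a 2^2 design and hypothetical responses
x1 = np.array([-1.0, +1.0, -1.0, +1.0])
x2 = np.array([-1.0, -1.0, +1.0, +1.0])
y = np.array([20.0, 30.0, 25.0, 45.0])

# Design matrix: intercept, x1, x2, and the cross-product x1*x2
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

b0, b1, b2, b12 = beta
print(f"y_hat = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2 + {b12:.2f}*x1*x2")
```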
STEP 5: Performing the experiment Guidelines for Designing Experiments When running the experiment, it is vital to monitor the process carefully to ensure that everything is being done according to plan. Errors in experimental procedure at this stage will usually destroy experimental validity. One of the most common mistakes is that the people conducting the experiment failed to set the variables to the proper levels on some runs. Someone should be assigned to check factor settings before each run. Up-front planning to prevent mistakes like this is crucial to success. It is easy to underestimate the logistical and planning aspects of running a designed experiment in a complex manufacturing or research and development environment. Coleman and Montgomery (1993) suggest that prior to conducting the experiment a few trial runs or pilot runs are often helpful. These runs provide information about consistency of experimental material, a check on the measurement system, a rough idea of experimental error, and a chance to practice the overall experimental technique. This also provides an opportunity to revisit the decisions made in steps 1–4 , if necessary.
STEP 6: Statistical analysis of the data Guidelines for Designing Experiments Statistical methods should be used to analyze the data so that results and conclusions are objective rather than judgmental in nature. There are many excellent software packages designed to assist in data analysis, and many of the programs used in step 4 to select the design provide a seamless, direct interface to the statistical analysis. Often, we find that simple graphical methods play an important role in data analysis and interpretation. It also helps to present the results of many experiments in terms of an empirical model. Statistical methods only provide guidelines as to the reliability and validity of results. When properly applied, statistical methods do not allow anything to be proved experimentally, but they do allow us to measure the likely error in a conclusion. The primary advantage of statistical methods is that they add objectivity to the decision-making process. Statistical techniques coupled with good engineering or process knowledge and common sense will usually lead to sound conclusions.
STEP 7: Conclusions and recommendations Guidelines for Designing Experiments Once the data have been analyzed, the experimenter must draw practical conclusions about the results and recommend a course of action. Graphical methods are often useful at this stage, particularly in presenting the results to others. Follow-up runs and confirmation testing should also be performed to validate the conclusions from the experiment. Experimentation is an iterative process. It is usually a major mistake to design a single, large, comprehensive experiment at the start of a study. A successful experiment requires knowledge of the important factors, the ranges over which these factors should be varied, the appropriate number of levels to use, and the proper units of measurement for these variables. Generally, we do not know the answers to these questions perfectly, but we learn about them as we go along. As an experimental program progresses, we often drop some input variables, add others, change the region of exploration for some factors, or add new response variables.
STEP 7: Conclusions and recommendations Guidelines for Designing Experiments Consequently, we usually experiment sequentially, and as a general rule, no more than about 25 percent of the available resources should be invested in the first experiment. This will ensure that sufficient resources are available to perform confirmation runs and ultimately accomplish the final objective of the experiment. Finally, it is important to recognize that all experiments are designed experiments. The important issue is whether they are well designed or not. Good pre-experimental planning will usually lead to a good, successful experiment. Failure to do such planning usually leads to wasted time, money, and other resources and often poor or disappointing results.
Regression analysis It is a method of analysis that enables you to quantify the relationship between one or more input variables (X) and an output variable (Y) by fitting a line or plane through all the points such that they are evenly distributed about the line or plane.
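A minimal illustration of this idea with made-up data: fitting a straight line y = b0 + b1·x by least squares, so the points scatter evenly about the fitted line:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input variable X (hypothetical)
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])   # output variable Y (hypothetical)

b1, b0 = np.polyfit(x, y, deg=1)          # slope and intercept by least squares
print(f"fitted line: y = {b0:.2f} + {b1:.2f}*x")
```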
LECTURE 4
Concepts of random variable Random means unpredictable. Hence, a random variable is a variable whose future value is unpredictable despite knowing its past performance. A random variable is a variable whose possible values are the numerical outcomes of a random experiment. Therefore, it is a function which associates a unique numerical value with every outcome of an experiment. Further, its value varies with every trial of the experiment. For example, when you toss an unbiased coin, the outcome can be a head or a tail. Even if you keep tossing the coin indefinitely, the outcomes are either of the two. Also, you would never know the outcome in advance. Random Experiment: A random experiment is a process which leads to an uncertain outcome. Usually, it is assumed that the experiment is repeated indefinitely under homogeneous conditions. While the result of a random experiment is not unique, it is one of the possible outcomes.
In a random experiment, the outcomes are not always numerical. But we need numbers as outcomes for calculations. Therefore, we define a random variable as a function which associates a unique numerical value with every outcome of a random experiment. For example, in the case of the tossing of an unbiased coin, if there are 3 trials, then the number of times a ‘head’ appears can be a random variable. This has values 0, 1, 2, or 3 since, in 3 trials, you can get a minimum of 0 heads and a maximum of 3 heads. Concepts of random variable
Types of Random variables Random variables are classified based on their probability distribution. A random variable either has an associated probability distribution (discrete random variable) or a probability density function (continuous random variable). Therefore, we have two types of random variables: discrete and continuous. Discrete Random Variables: Discrete random variables take on only a countable number of distinct values. Usually, these variables are counts (not necessarily though). If a random variable can take only a finite number of distinct values, then it is discrete. The number of members in a family, the number of defective light bulbs in a box of 10 bulbs, etc. are some examples of discrete random variables. The probability distribution of these variables is a list of probabilities associated with each of its possible values. It is also called the probability function or the probability mass function.
Types of Random variables Example of Discrete Random Variables You toss a coin 10 times. The random variable X is the number of times you get a 'tail'. X can only take values 0, 1, 2, ..., 10. Therefore, X is a discrete random variable. Let's look at the probability of getting 8 tails: p8 (the probability of getting 8 tails) falls in the range 0 to 1, and the sum of probabilities for all possible numbers of tails is p0 + p1 + ... + p10 = 1. In general, if a random variable X takes k different values, with the probability that X = xi defined as P(X = xi) = pi, then it must satisfy the following: 0 < pi < 1 (for each i), and p1 + p2 + p3 + ... + pk = 1.
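A short check of these conditions using the standard binomial formula (this code is illustrative, not from the lecture):

```python
from math import comb

def binom_pmf(k, n=10, p=0.5):
    """P(X = k) tails in n fair coin tosses (binomial probability mass function)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p8 = binom_pmf(8)                                  # probability of exactly 8 tails
total = sum(binom_pmf(k) for k in range(11))       # sum over all possible values of X

print(p8)      # about 0.0439, which lies between 0 and 1
print(total)   # 1.0
```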
Types of Random variables Continuous Random Variables: Continuous random variables take on an infinite number of possible values which are usually in a given range. Typically, these are measurements like weight, height, the time needed to finish a task, etc. For example, the life of an individual in a community is a continuous random variable. Let's say that the average lifespan of an individual in a community is 110 years. Therefore, a person can die immediately on birth (where life = 0 years) or after he attains an age of 110 years. Within this range, he can die at any age. Therefore, the variable 'Age' can take any value between 0 and 110. Hence, continuous random variables do not have specific values since the number of values is infinite. Also, the probability at a specific value is essentially zero. However, there is always a non-negative probability that a certain outcome will lie within the interval between two values.
Probability Probability means possibility. Probability is a measure of the likelihood that an event will occur. Many events cannot be predicted with total certainty; we can only predict the chance of an event occurring, i.e. how likely it is to happen. Probability ranges from 0 to 1, where 0 means the event is impossible and 1 indicates a certain event. The probabilities of all the events in a sample space add up to 1. The best example for understanding probability is flipping a coin: there are two possible outcomes, heads or tails. What is the probability of the coin landing on heads? We can find it using the equation Probability = (number of favourable outcomes) / (total number of possible outcomes), so P(H) = 1/2. You might intuitively know that the likelihood is half/half, or 50%.
Probability Density Function (PDF) / Density of a continuous random variable Example (from the figure): let the random variable Y be the expected amount of rain (in inches) tomorrow, with its density plotted over values from 0 to 5 inches. What is the probability that Y is exactly 2 inches? Not 2.01 or 2.0001, not 1.99 or 1.9999, but exactly 2; we do not even have a tool that can measure exactly 2 inches, and for a continuous random variable the probability of any single exact value is zero. Instead we ask for the probability that Y is close to 2 inches within some tolerance, say P(1.9 < Y < 2.1). This probability is the area under the probability density function f(x) over that interval, which in the figure is less than 0.1.
It is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would be close to that sample. In other words, while the absolute likelihood for a continuous random variable to take on any particular value is 0 (since there is an infinite set of possible values to begin with), the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would be close to one sample compared to the other sample. Probability Density Function (PDFs)/ Density of a continuous random variable
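A small numerical sketch of this idea, assuming (purely for illustration) that the rainfall variable Y follows a normal density centered at 2 inches: the probability of landing in the narrow interval (1.9, 2.1) is the area under the density over that interval.

```python
import numpy as np

mu, sigma = 2.0, 1.0   # assumed Normal(2, 1) density, for illustration only

def pdf(y):
    """Normal probability density function with mean mu and std dev sigma."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Approximate the area under the density between 1.9 and 2.1 with a Riemann sum
y = np.linspace(1.9, 2.1, 1001)
dy = y[1] - y[0]
area = (pdf(y) * dy).sum()

print(area)   # roughly 0.08: a small but nonzero probability
```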
LECTURE 5
Cumulative Distribution Function (CDF) The CDF gives the area under the curve to the left of a point of interest, i.e. the accumulated probability. For continuous probability distributions, probability = area under the curve, and the total area = 1. The probability density function (PDF) f(x) describes the shape of the distribution (uniform, exponential, normal, etc.). For example, for a uniform distribution on the interval [a, b], the PDF is f(x) = 1/(b - a), so the area to the left of a point x (with a ≤ x ≤ b) is Area = base × height = (x - a) × 1/(b - a), and hence the CDF is F(x) = (x - a)/(b - a).
Cumulative Distribution Function (CDF) This accumulated area is what is called the CDF. For the exponential distribution with rate λ, the density decreases as x grows: PDF f(x) = λe^(-λx) for x ≥ 0. The CDF is the area under this curve to the left of x: F(x) = 1 - e^(-λx).
Cumulative Distribution Function (CDF) For two points a < b, Area = P(a < x < b) = P(x < b) - P(x < a) = F(b) - F(a). For the exponential distribution, the probability that x < a (the area to the left of a) is F(a) = 1 - e^(-λa), and the probability that x < b (the area to the left of b) is F(b) = 1 - e^(-λb), so P(a < x < b) = [1 - e^(-λb)] - [1 - e^(-λa)] = e^(-λa) - e^(-λb). Remember that P(a ≤ x ≤ b) = P(a < x < b) for a continuous probability distribution, because P(x = a) = 0: x = a is only a line, which has height but no width.
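A short numerical check of these formulas for an assumed rate λ = 0.5 and the interval a = 1, b = 3 (the values are arbitrary):

```python
from math import exp

lam, a, b = 0.5, 1.0, 3.0

def cdf(x):
    """F(x) = P(X < x) for the exponential distribution with rate lam."""
    return 1 - exp(-lam * x)

p_interval = cdf(b) - cdf(a)   # P(a < X < b) = F(b) - F(a)
print(p_interval)              # e^(-0.5) - e^(-1.5), about 0.383
```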
Sampling and Sampling Distributions A sampling distribution is the probability distribution of a statistic obtained through repeated sampling from a larger population. It describes the range of possible outcomes of a statistic, such as the mean or mode of some variable, as it truly exists in the population. The majority of data analyzed by researchers are actually drawn from samples, not populations. In statistics, a population is the entire pool from which a statistical sample is drawn. A population may refer to an entire group of people, objects, events, hospital visits, or measurements.
Understanding Sampling Distribution A lot of data drawn and used by academicians, statisticians, researchers, marketers, analysts, etc. are actually samples, not populations. A sample is a subset of a population. For example, a medical researcher who wanted to compare the average weight of all babies born in Uttar Pradesh from 1995 to 2005 to those born in Delhi within the same time period cannot, within a reasonable amount of time, draw the data for the entire population of over a million childbirths that occurred over the ten-year time frame. He will instead only use the weight of, say, 100 babies in each region to make a conclusion. The weight of the 200 babies used is the sample and the average weight calculated is the sample mean. Now suppose that instead of taking just one sample of 100 newborn weights from each region, the medical researcher takes repeated random samples from the general population and computes the sample mean for each sample group.
The average weight computed for each sample set is the sampling distribution of the mean. Not just the mean can be calculated from a sample. Other statistics, such as the standard deviation, variance, proportion, and range can be calculated from sample data. The standard deviation and variance measure the variability of the sampling distribution. The number of observations in a population, the number of observations in a sample, and the procedure used to draw the sample sets determine the variability of a sampling distribution. The standard deviation of a sampling distribution is called the standard error. While the mean of a sampling distribution is equal to the mean of the population, the standard error depends on the standard deviation of the population, the size of the population, and the size of the sample. Knowing how spread apart the means of the sample sets are from each other and from the population mean gives an indication of how close the sample mean is to the population mean. The standard error of the sampling distribution decreases as the sample size increases. Understanding Sampling Distribution
For example, suppose that y1, y2, . . . , yn represents a sample. Then the sample mean is ȳ = (y1 + y2 + ... + yn)/n, and the sample variance is S² = Σ(yi - ȳ)²/(n - 1), summed over i = 1, ..., n. These quantities are measures of the central tendency and dispersion of the sample, respectively. Sometimes S, the square root of the sample variance, called the sample standard deviation, is used as a measure of dispersion. Experimenters often prefer to use the standard deviation to measure dispersion because its units are the same as those for the variable of interest y.
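A minimal sketch computing these sample statistics, plus the estimated standard error s/√n mentioned above, for a small hypothetical sample:

```python
import numpy as np

y = np.array([12.0, 15.0, 11.0, 14.0, 13.0])   # hypothetical observations
n = len(y)

y_bar = y.mean()                                 # sample mean
s2 = ((y - y_bar) ** 2).sum() / (n - 1)          # sample variance, divisor n - 1
s = np.sqrt(s2)                                  # sample standard deviation
std_error = s / np.sqrt(n)                       # estimated standard error of the mean

print(y_bar, s2, s, std_error)                   # 13.0  2.5  1.58...  0.70...
```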
LECTURE 6
Measures of Central Tendency: Mean, Median, and Mode A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset . These measures indicate where most values in a distribution fall and are also referred to as the central location of a distribution . You can think of it as the tendency of data to cluster around a middle value. In statistics, the three most common measures of central tendency are the mean, median, and mode. Each of these measures calculates the location of the central point using a different method. Choosing the best measure of central tendency depends on the type of data you have.
The three distributions below represent different data conditions. In each distribution, look for the region where the most common values fall. Even though the shapes and types of data are different, you can find that central location: the area in the distribution where the most common values are located.
Mean The mean is the arithmetic average, and it is probably the measure of central tendency that you are most familiar with. Calculating the mean is very simple. You just add up all of the values and divide by the number of observations in your dataset. The calculation of the mean incorporates all values in the data. If you change any value, the mean changes. However, the mean doesn't always locate the center of the data accurately. Observe the histograms where I showed the mean in the distributions: extreme values in an extended tail pull the mean away from the center.
Median The median is the middle value. It is the value that splits the dataset in half. To find the median, order your data from smallest to largest, and then find the data point that has an equal number of values above it and below it. The method for locating the median varies slightly depending on whether your dataset has an even or odd number of values. In the dataset with the odd number of observations, notice how the number 12 has six values above it and six below it; therefore, 12 is the median of that dataset. When there is an even number of values, you count in to the two innermost values and then take their average. The average of 27 and 29 is 28; consequently, 28 is the median of that dataset. In the examples, I used whole numbers for simplicity, but you can have decimal places.
Outliers and skewed data have a smaller effect on the median. For example, suppose we have the Median dataset below and find that the median is 46. However, we discover data entry errors and need to change four values, which are shaded in the Median Fixed dataset. We make them all significantly higher so that we now have a skewed distribution with large outliers. As you can see, the median doesn't change at all; it is still 46. Unlike the mean, the median value doesn't depend on all the values in the dataset. Consequently, when some of the values are more extreme, the effect on the median is smaller. Of course, with other types of changes, the median can change. When you have a skewed distribution, the median is a better measure of central tendency than the mean.
Comparing the mean and median In a symmetric distribution, the mean and median both find the center accurately. They are approximately equal. In a skewed distribution, the outliers in the tail pull the mean away from the center towards the longer tail. For this example, the mean and median differ by over 9000, and the median better represents the central tendency for the distribution.
Mode The mode is the value that occurs the most frequently in your data set. On a bar chart, the mode is the highest bar. If the data have multiple values that are tied for occurring the most frequently, you have a multimodal distribution. If no value repeats, the data do not have a mode. In the dataset, the value 5 occurs most frequently, which makes it the mode. These data might represent a 5-point Likert scale. Typically, you use the mode with categorical, ordinal, and discrete data. In fact, the mode is the only measure of central tendency that you can use with categorical data —such as the most preferred flavor of ice cream. However, with categorical data, there isn’t a central value because you can’t order the groups. With ordinal and discrete data, the mode can be a value that is not in the center . Again, the mode represents the most common value.
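A quick illustration with made-up numbers of how an outlier pulls the mean while barely moving the median, and how the mode is simply the most frequent value:

```python
from statistics import mean, median, mode

data = [3, 5, 5, 6, 7, 8, 9]
skewed = [3, 5, 5, 6, 7, 8, 90]   # same data with one large outlier

print(mean(data), median(data), mode(data))   # 6.14..., 6, 5
print(mean(skewed), median(skewed))           # 17.71..., 6  -> median barely affected
```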
When should you use the mean, median or mode?
Confidence Level: What is it? When a poll is reported in the media, a confidence level is often included in the results. For example, a survey might report a 95 percent confidence level. But what exactly does this mean? At first glance you might think that it means the survey is 95 percent accurate. That is close to the truth, but like many things in statistics, the actual meaning is a little more precise. The result is often expressed as a percentage together with a confidence interval (CI): a range, with an upper and a lower limit, within which the population mean is expected to lie. Due to natural sampling variability, the sample mean (the center of the CI) will vary from sample to sample.
As the sample size increases, the range of interval values narrows, meaning that you know the mean with much more accuracy compared with a smaller sample. Accordingly, at a 95% confidence level there is a 5% chance that the population mean lies outside of the upper and lower confidence limits (as illustrated by the 2.5% of outliers on either side of the ±1.96 z-scores).
Why do researchers use confidence intervals? It is more or less impossible to study every single person in a population, so researchers select a sample or sub-group of the population. This means that the researcher can only estimate the parameters (i.e. characteristics) of a population, the estimated range being calculated from a given set of sample data. Therefore, a confidence interval is simply a way to measure how well your sample represents the population you are studying. The probability that the confidence interval includes the true mean value within a population is called the confidence level of the CI. You can calculate a CI for any confidence level you like, but the most commonly used value is 95%. A 95% confidence level means you can be 95% certain.
Factors that Affect Confidence Intervals (CI) Population size: this does not usually affect the CI but can be a factor if you are working with small and known groups of people. Sample Size: the smaller your sample, the less likely it is you can be confident the results reflect the true population parameter. Percentage: Extreme answers come with better accuracy. For example, if 99 percent of voters are for gay marriage, the chances of error are small. However, if 49.9 percent of voters are “for” and 50.1 percent are “against” then the chances of error are bigger.
0% and 100% Confidence Level A 0% confidence level means you have no faith at all that if you repeated the survey you would get the same results. A 100% confidence level means there is no doubt at all that if you repeated the survey you would get the same results. In reality, you would never publish the results from a survey where you had no confidence at all that your statistics were accurate (you would probably repeat the survey with better techniques). A 100% confidence level doesn't exist in statistics, unless you surveyed an entire population, and even then you probably couldn't be 100 percent sure that your survey wasn't open to some kind of error or bias. The confidence coefficient is the confidence level stated as a proportion, rather than as a percentage. For example, if you had a confidence level of 99%, the confidence coefficient would be 0.99.
How do I calculate a confidence interval? To calculate the confidence interval, start by computing the mean and standard deviation of the sample. Then compute the lower and upper limits of the confidence interval using the z-score for the chosen confidence level (see table below). Confidence Interval Formula: X̄ ± Z × s/√n, where: X̄ is the sample mean, Z is the chosen z-value (1.96 for 95%), s is the sample standard deviation, and n is the sample size.
An Example X̄ (mean) = 86, Z = 1.960 (from the table above, for 95%), s (standard deviation) = 6.2, n (sample size) = 46. Lower value: 86 - 1.960 × 6.2/√46 = 86 - 1.79 = 84.21. Upper value: 86 + 1.960 × 6.2/√46 = 86 + 1.79 = 87.79. So the population mean is likely to be between 84.21 and 87.79.
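The same calculation written as a small reusable sketch (the function name is illustrative):

```python
from math import sqrt

def confidence_interval(x_bar, s, n, z=1.960):
    """Return (lower, upper) for x_bar +/- z * s / sqrt(n)."""
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = confidence_interval(x_bar=86, s=6.2, n=46)
print(round(lower, 2), round(upper, 2))   # 84.21 87.79
```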