This presentation gives a basic introduction to ANOVA in R, along with the R code.
Both one-way and two-way ANOVA are covered.
Size: 705.71 KB
Language: en
Added: Jan 06, 2021
Slides: 10 pages
Slide Content
ANOVA: Analysis of Variance
What is ANOVA?
Analysis of Variance (ANOVA) is a statistical technique commonly used to study differences between two or more group means. The ANOVA test is based on splitting the total variation in a response variable into its different sources: variation between the groups and variation within the groups. ANOVA in R tests whether there is evidence that the group means are not all equal. The method extends the two-sample t-test to situations where the factor variable has more than two groups.
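To make the link to the t-test concrete, here is a minimal sketch on simulated data (the data frame `df`, its column names and the group means are illustrative assumptions, not from the slides). With exactly two groups, a one-way ANOVA gives an F value equal to the square of the pooled-variance t statistic, and the two p-values agree.

```r
set.seed(42)

# Hypothetical simulated data: two groups of 10 observations each
df <- data.frame(
  group = factor(rep(c("A", "B"), each = 10)),
  y     = c(rnorm(10, mean = 5), rnorm(10, mean = 6))
)

# Two-sample t-test assuming equal variances (the same assumption ANOVA makes)
tt <- t.test(y ~ group, data = df, var.equal = TRUE)

# One-way ANOVA on the same data; [[1]] is the ANOVA table as a data frame
tab     <- summary(aov(y ~ group, data = df))[[1]]
f_value <- tab[["F value"]][1]
p_anova <- tab[["Pr(>F)"]][1]

# With two groups, F equals t squared and the p-values coincide
cat("t^2 =", unname(tt$statistic)^2, "  F =", f_value, "\n")
cat("t-test p =", tt$p.value, "  ANOVA p =", p_anova, "\n")
```

With three or more groups there is no single t statistic to compare against, which is where ANOVA becomes necessary.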
One-way ANOVA
There are many situations where you need to compare the means of multiple groups. For instance, the marketing department wants to know whether three teams have the same sales performance:
Team: a factor with 3 levels (A, B, and C)
Sale: a measure of performance
The ANOVA test can tell whether the three groups have similar mean performance. To check whether the data could come from populations with the same mean, you can perform a one-way analysis of variance.
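A minimal sketch of the team example, assuming simulated sales figures: the `sales` data frame, the team means and the sample size of 15 per team are hypothetical. It computes each team's mean sale and then fits the one-way ANOVA with `aov()`.

```r
set.seed(1)

# Hypothetical simulated sales data for three teams (not real data)
sales <- data.frame(
  team = factor(rep(c("A", "B", "C"), each = 15)),
  sale = c(rnorm(15, mean = 100, sd = 10),
           rnorm(15, mean = 100, sd = 10),
           rnorm(15, mean = 115, sd = 10))
)

# Sample mean of each team
team_means <- tapply(sales$sale, sales$team, mean)
print(team_means)

# One-way ANOVA: does the mean sale differ across the three teams?
team_fit <- aov(sale ~ team, data = sales)
summary(team_fit)
```

Here the factor has 3 levels, so the team row has 3 - 1 = 2 degrees of freedom and the residuals have 45 - 3 = 42.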
Hypotheses in the one-way ANOVA test
H0: The means of all groups are identical.
H1: At least one group mean is different from the others.
In other words, failing to reject H0 means there is not enough evidence to conclude that the mean of any group (factor level) differs from another; rejecting H0 means at least one group mean appears to differ. The test is similar to the t-test, but ANOVA is recommended when there are more than two groups.
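The decision between H0 and H1 can be sketched as a comparison of the ANOVA p-value with a significance level. `anova_decision` is a hypothetical helper written for illustration, and alpha = 0.05 is a conventional assumption rather than something the slides specify. Two toy data sets are used: one where every group has exactly the same values, and one where group C is shifted far away.

```r
# Hypothetical helper: compare the ANOVA p-value with a significance level
anova_decision <- function(formula, data, alpha = 0.05) {
  p <- summary(aov(formula, data = data))[[1]][["Pr(>F)"]][1]
  if (p < alpha) "Reject H0" else "Fail to reject H0"
}

# Identical values in every group: the group means are equal, so F = 0, p = 1
same <- data.frame(
  g = factor(rep(c("A", "B", "C"), each = 5)),
  y = rep(1:5, times = 3)
)

# Simulated data where group C is shifted far from A and B
set.seed(7)
shifted <- data.frame(
  g = factor(rep(c("A", "B", "C"), each = 20)),
  y = c(rnorm(20, mean = 0), rnorm(20, mean = 0), rnorm(20, mean = 5))
)

anova_decision(y ~ g, same)
anova_decision(y ~ g, shifted)
```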
Loading Data in R
crop.data <- read.csv("path/to/your/file/crop.data.csv", header = TRUE, colClasses = c("factor", "factor", "factor", "numeric"))
The colClasses argument specifies within the command whether each variable should be read as quantitative ("numeric") or categorical ("factor").
summary(crop.data)
Alternatively, choose the file interactively:
crop.data <- read.csv(file.choose(), colClasses = c("factor", "factor", "factor", "numeric"))
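The effect of colClasses can be checked without the original file. This sketch writes a tiny hypothetical CSV to a temporary file; the column names and order (density, block, fertilizer, yield) and all values are assumptions chosen to match the four classes above, not the contents of the real crop.data.csv.

```r
# Hypothetical toy CSV so the example runs without the original file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "density,block,fertilizer,yield",
  "1,1,1,10.5",
  "2,2,2,11.2",
  "1,3,3,10.9",
  "2,4,1,11.4"
), csv_path)

# Read with explicit column classes: three factors and one numeric column
crop.data <- read.csv(csv_path, header = TRUE,
                      colClasses = c("factor", "factor", "factor", "numeric"))
str(crop.data)
summary(crop.data)   # factors are summarised as counts per level

# Without colClasses, the coded columns are read as integers instead
sapply(read.csv(csv_path), class)
```

Reading grouping columns as factors matters because `aov()` treats an integer column as a single numeric predictor rather than as a set of groups.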
Applying in R
one.way <- aov(yield ~ fertilizer, data = crop.data)
summary(one.way)
The model summary first lists the independent variables being tested in the model (in this case only one, 'fertilizer') and the model residuals ('Residuals'). All of the variation that is not explained by the independent variables is called residual variance.
The rest of the values in the output table describe the independent variable and the residuals:
The Df column displays the degrees of freedom for the independent variable (the number of levels in the variable minus 1) and the degrees of freedom for the residuals (the total number of observations minus 1, minus the degrees of freedom of the independent variables; for a one-way ANOVA with N observations and k groups this is N - k).
The Sum Sq column displays the sum of squares. For the independent variable this is the variation between the group means and the overall mean; for the residuals it is the variation of the observations around their own group means.
The Mean Sq column is the mean square: the sum of squares divided by the degrees of freedom in the same row.
The F value column is the test statistic of the F test: the mean square of the independent variable divided by the mean square of the residuals. The larger the F value, the less likely it is that the variation explained by the independent variable is due to chance alone.
The Pr(>F) column is the p-value of the F statistic: the probability of obtaining an F value at least as large as the one calculated if the null hypothesis of no difference among group means were true.
The p-value of the fertilizer variable is low (p < 0.001), so it appears that the type of fertilizer used has a real impact on the final crop yield.
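The columns of the summary table can be reproduced by hand, which shows where each number comes from. A minimal sketch on simulated data; the `crop` data frame with 3 fertilizer types and 12 plots each is hypothetical.

```r
set.seed(123)

# Hypothetical simulated data: 3 fertilizer types, 12 plots each
crop <- data.frame(
  fertilizer = factor(rep(1:3, each = 12)),
  yield      = c(rnorm(12, mean = 10), rnorm(12, mean = 11), rnorm(12, mean = 11.5))
)

y           <- crop$yield
grp         <- crop$fertilizer
N           <- length(y)
k           <- nlevels(grp)
grand       <- mean(y)
group_means <- tapply(y, grp, mean)
n_j         <- tapply(y, grp, length)

# Sum Sq: between-group (fertilizer row) and within-group (Residuals row)
ss_between <- sum(n_j * (group_means - grand)^2)
ss_within  <- sum((y - group_means[grp])^2)   # indexing by factor uses level order

# Df: k - 1 for the factor, N - k for the residuals
df_between <- k - 1
df_within  <- N - k

# Mean Sq = Sum Sq / Df, F = MS_between / MS_within, p = upper tail of F
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_value    <- ms_between / ms_within
p_value    <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Compare with R's own ANOVA table
tab <- summary(aov(yield ~ fertilizer, data = crop))[[1]]
print(tab)
cat("By hand: F =", f_value, " p =", p_value, "\n")
```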
Two-way ANOVA
In the two-way ANOVA example, we model crop yield as a function of fertilizer type and planting density.
two.way <- aov(yield ~ fertilizer + density, data = crop.data)
summary(two.way)
Adding planting density to the model seems to have made the model better: it reduced the residual variance (the residual sum of squares went from 35.89 to 30.765), and both planting density and fertilizer are statistically significant (p-values < 0.001).
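A sketch of the one-way versus two-way comparison on simulated data, assuming a hypothetical `crop` data frame with 3 fertilizers, 2 planting densities and 8 plots per combination. Because the one-way model is nested inside the two-way model, its residual sum of squares can never be smaller; calling `anova()` on the two fitted models gives an F test of whether adding density is worthwhile.

```r
set.seed(2021)

# Hypothetical simulated design: 3 fertilizers x 2 densities x 8 plots
crop <- expand.grid(rep = 1:8, fertilizer = factor(1:3), density = factor(1:2))
crop$yield <- 10 + 0.5 * as.integer(crop$fertilizer) +
  1.0 * as.integer(crop$density) + rnorm(nrow(crop), sd = 0.5)

one.way <- aov(yield ~ fertilizer, data = crop)
two.way <- aov(yield ~ fertilizer + density, data = crop)
summary(two.way)

# Residual sum of squares of each model
rss_one <- sum(residuals(one.way)^2)
rss_two <- sum(residuals(two.way)^2)
cat("RSS one-way:", rss_one, "  RSS two-way:", rss_two, "\n")

# F test comparing the nested models
anova(one.way, two.way)
```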
Adding interactions between variables
Sometimes you have reason to think that two of your independent variables have an interaction effect rather than an additive effect. For example, in the crop yield experiment, planting density might affect the plants' ability to take up fertilizer. This could change the effect of fertilizer type in a way that the additive two-way model does not account for. To test whether two variables have an interaction effect in ANOVA, use an asterisk (*) instead of a plus sign (+) in the model formula.
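A minimal sketch, again on a hypothetical simulated `crop` data frame. The `*` operator in an R formula expands to both main effects plus their interaction, so `yield ~ fertilizer * density` is shorthand for `yield ~ fertilizer + density + fertilizer:density`. The fitted model is stored as `interaction.model` (a name chosen here) to avoid masking base R's `interaction()` function.

```r
set.seed(99)

# Hypothetical simulated design: 3 fertilizers x 2 densities x 8 plots
crop <- expand.grid(rep = 1:8, fertilizer = factor(1:3), density = factor(1:2))
crop$yield <- 10 + rnorm(nrow(crop), sd = 0.5)

# `*` expands to both main effects plus their interaction term
f <- yield ~ fertilizer * density
attr(terms(f), "term.labels")
# "fertilizer" "density" "fertilizer:density"

interaction.model <- aov(f, data = crop)
summary(interaction.model)   # rows: fertilizer, density, fertilizer:density, Residuals
```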