2030 Goal Planning & Vision Board - Solis & Myers Inc. October 2030

Republic of Iraq
Ministry of Higher Education & Scientific Research
University of Kerbala
College of Engineering
Civil Engineering Department

CHI SQUARE DISTRIBUTION

Prepared by Zaid Aqeel Alalawi
Supervisor Asst. Prof. Dr. Riyadh Jasim
Definition
A chi-square test is a statistical test used to compare observed results with expected results. Its goal is to determine whether a disparity between observed and expected data is due to chance or to a relationship between the variables under consideration. This makes the chi-square test well suited to understanding and interpreting the connection between two categorical variables.
A chi-square test, or a comparable nonparametric test, is required to test a hypothesis about the distribution of a categorical variable. Categorical variables, which indicate categories such as animals or countries, can be nominal or ordinal. They cannot follow a normal distribution because they can take only a few particular values.
Characteristics of chi-square
The chi-square test has several key characteristics:
- It is non-parametric, meaning it does not assume a specific probability distribution for the data.
- It is sensitive to sample size: with larger samples, even small discrepancies can become statistically significant.
- It works with categorical data and is used for hypothesis testing and analysing associations.
- The test yields a p-value: the probability, if there were no real relationship between the variables, of a discrepancy at least as large as the one observed.
- The p-value can be compared with different levels of significance (e.g., 0.05 or 0.01) to determine statistical significance.
CHI SQUARE DISTRIBUTION
If X1, X2, ..., Xn are independent normal variates, each distributed normally with mean zero and standard deviation unity, then the sum of squares
χ² = (X1)² + (X2)² + ... + (Xn)²
is distributed as chi-square (χ²) with n degrees of freedom (d.f.), for any positive integer n.
The shape of the chi-square curve (drawn in the original slide for d.f. = 1, 5 and 9) depends on the degrees of freedom:
- If d.f. > 2: the distribution is unimodal and positively skewed, and becomes closer to bell shaped as the d.f. increase.
- If d.f. = 2: the distribution is L shaped with maximum ordinate at zero.
- If 0 < d.f. < 2: the distribution is L shaped with infinite ordinate at the origin.
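The three shapes can be checked numerically. The sketch below is an illustration only: it assumes the standard chi-square density formula, and the function name `chi2_pdf` is chosen here and is not taken from the slides.

```python
import math

def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom at x."""
    if x < 0:
        return 0.0
    if x == 0:
        # The ordinate at the origin depends on the degrees of freedom.
        if df < 2:
            return math.inf   # infinite ordinate at the origin
        if df == 2:
            return 0.5        # finite maximum ordinate at zero
        return 0.0            # curve starts at zero and rises to a peak
    half = df / 2
    log_pdf = (half - 1) * math.log(x) - x / 2 - half * math.log(2) - math.lgamma(half)
    return math.exp(log_pdf)

# Print a few ordinates for several degrees of freedom.
for df in (1, 2, 5, 9):
    ordinates = ", ".join(f"{chi2_pdf(x, df):.4f}" for x in (0.1, 1.0, 3.0, 7.0, 12.0))
    print(f"d.f. = {df}: {ordinates}")
```

For d.f. = 1 and 2 the ordinates only decrease as x grows (the L shape), while for d.f. = 5 and 9 they first rise and then fall.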
General formula for chi square test
χ²c = Σ [(O - E)² / E]
Where,
c = Degrees of freedom
O = Observed Value
E = Expected Value
Chi-square distribution table
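Entries of a chi-square table can be reproduced numerically. The following is a minimal sketch that assumes the chi-square cumulative distribution equals the regularized lower incomplete gamma function P(df/2, x/2), evaluated here by its power series and inverted by bisection. The function names are hypothetical, and the series is only intended for the moderate values found in such tables.

```python
import math

def regularized_gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), via its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x)."""
    return regularized_gamma_p(df / 2, x / 2)

def chi2_critical(alpha, df):
    """Value exceeded with probability alpha, found by bisection on the CDF."""
    target = 1 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:   # expand until the target is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Small table: rows are degrees of freedom, columns are significance levels.
alphas = (0.10, 0.05, 0.01)
print("d.f.  " + "  ".join(f"{a:>7.2f}" for a in alphas))
for df in range(1, 7):
    print(f"{df:>4}  " + "  ".join(f"{chi2_critical(a, df):7.3f}" for a in alphas))
```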
Properties of Chi-square distribution
- The mean of the χ² distribution is equal to the number of degrees of freedom (n).
- The variance is equal to two times the number of degrees of freedom, i.e. the variance of χ² is equal to 2n.
- The median of the χ² distribution divides the area under the curve into two equal parts.
- The mode of the χ² distribution is equal to (n - 2), for n ≥ 2.
- Since chi-square values are never negative and the curve has a long right tail, the chi-square curve is always positively skewed.
- Since the distribution depends on the degrees of freedom, there is a new chi-square distribution with every increase in the number of degrees of freedom.
- The lowest value of chi-square is zero and there is no upper limit.
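These properties can be checked by simulation. The sketch below assumes the definition of chi-square as a sum of n squared standard normal variates and draws 50,000 such sums for n = 5. The seed and sample size are arbitrary choices made for this illustration.

```python
import random
import statistics

def simulate_chi_square(df, samples, seed=0):
    """Draw chi-square variates as sums of df squared standard normal variates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(samples)]

n = 5
draws = simulate_chi_square(df=n, samples=50_000)

sample_mean = statistics.fmean(draws)
sample_variance = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
sample_median = statistics.median(draws)

print(f"mean     = {sample_mean:.3f} (theory: {n})")
print(f"variance = {sample_variance:.3f} (theory: {2 * n})")
print(f"median   = {sample_median:.3f} (below the mean, as expected for positive skew)")
print(f"minimum  = {min(draws):.3f} (never negative)")
```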
Conditions for applying χ² test
The following conditions should be satisfied before the χ² test can be applied:
- The data must be in the form of frequencies.
- The frequency data must have precise numerical values and must be organized into categories or groups.
- The observations recorded and used must be collected on a random basis.
- All the items in the sample must be independent.
- No group should contain very few items, say fewer than 10. Where frequencies are less than 10, regrouping is done by combining the frequencies of adjoining groups so that the new frequencies are at least 10. (Some statisticians take this number as 5, but 10 is regarded as better by most of the statisticians.)
- The overall number of items must also be reasonably large. It should normally be at least 50.
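The regrouping rule can be sketched as a single pass over the ordered groups. This is one possible pooling strategy, not a prescribed procedure: the function name `pool_small_groups` and the choice to merge any leftover into the last pooled group are assumptions made for this illustration.

```python
def pool_small_groups(frequencies, minimum=10):
    """Combine adjoining groups until every pooled frequency is at least `minimum`."""
    pooled = []
    running = 0
    for frequency in frequencies:
        running += frequency
        if running >= minimum:
            pooled.append(running)
            running = 0
    if running > 0:
        # A leftover tail that is still too small joins the last pooled group.
        if pooled:
            pooled[-1] += running
        else:
            pooled.append(running)
    return pooled

raw = [3, 4, 12, 20, 2]
pooled = pool_small_groups(raw, minimum=10)
print(pooled)                      # pooled frequencies
print(sum(pooled) == sum(raw))     # the total count is unchanged
print(sum(pooled) >= 50)           # overall sample-size condition
```

When expected frequencies are involved, the same groups would have to be combined in the expected series as well, so that the two series stay aligned.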
APPLICATIONS OF A CHI SQUARE TEST
This test can be used for:
- Goodness of fit of distributions
- Test of independence of attributes
- Test of homogeneity
Chi-Square Test for Discrete Data
For discrete data, the chi-square test assesses the difference between observed and expected frequencies within one categorical variable:
χ² = Σ [(Oi - Ei)² / Ei]
Where,
χ² = chi-square statistic
Oi = observed frequency for each category
Ei = expected frequency for each category
Question-1
A textile company is testing the quality of three different types of fabrics: cotton, polyester, and silk. They collected data on the number of defects found in a sample of each fabric type. The observed frequencies are as follows:

| Fabric | Defects |
|---|---|
| Cotton | 45 |
| Polyester | 30 |
| Silk | 25 |

The company believes that the expected distribution of defects should be uniform, with each fabric type having an equal chance of defects. Use the chi-square test to determine whether the observed frequencies differ significantly from the expected frequencies at a 5% significance level.
Solution:
Step 1: State the hypothesis.
Null hypothesis (H0): The observed frequencies do not differ significantly from the expected frequencies.
Alternative hypothesis (H1): The observed frequencies differ significantly from the expected frequencies.
Step 2: Calculate the expected frequencies.
Since the expected distribution is uniform, each fabric type should ideally have 100/3 = 33.33 defects.
Step 3: Calculate the chi-square statistic.
Using the formula: χ² = Σ [(O - E)² / E]
where,
O = Observed frequency
E = Expected frequency
For the given data, with E = 33.33 for every fabric type, the calculations are as follows:

| Fabric | Defects (O) | (O - E)² | (O - E)² / E |
|---|---|---|---|
| Cotton | 45 | 136.11 | 4.083 |
| Polyester | 30 | 11.11 | 0.333 |
| Silk | 25 | 69.44 | 2.083 |
| Total | 100 | | 6.50 |

χ² = 4.083 + 0.333 + 2.083 = 6.50
Step 4: Find the degrees of freedom (df).
Since there are 3 fabric types, df = n - 1 = 3 - 1 = 2.
Step 5: Find the critical value.
With a 5% significance level and 2 degrees of freedom, the critical value of chi-square from the table is approximately 5.99.
Step 6: Compare the chi-square statistic with the critical value.
Since 6.50 > 5.99, we reject the null hypothesis.
Step 7: Conclusion.
Based on the chi-square test, there is significant evidence at the 5% level that the observed frequencies of defects in the fabrics differ from the expected uniform frequencies.
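The steps of this goodness-of-fit test can be sketched in a few lines, using the observed counts as given in the question statement (45, 30 and 25). The closed-form tail probability exp(-x/2) holds only for 2 degrees of freedom and is used here to avoid a table lookup. The variable names are chosen for this illustration.

```python
import math

observed = {"Cotton": 45, "Polyester": 30, "Silk": 25}

# Step 2: uniform expectation, the same for every fabric type.
total = sum(observed.values())
expected = total / len(observed)

# Step 3: chi-square statistic, summed over the categories.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())

# Step 4: degrees of freedom.
df = len(observed) - 1

# Steps 5 and 6: for df = 2 the upper-tail probability is exp(-x / 2),
# so the 5% critical value is -2 * ln(0.05).
alpha = 0.05
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi_square / 2)
reject_null = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, df = {df}")
print(f"critical value = {critical_value:.2f}, p-value = {p_value:.4f}")
print("reject H0" if reject_null else "fail to reject H0")
```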
Chi-Square Test for Contingency Tables (Independence Test)
For contingency tables (cross-tabulation), or when dealing with associations between two categorical variables:
χ² = Σi Σj [(Oij - Eij)² / Eij]
Where,
χ² = chi-square statistic
Oij = observed frequency in cell (i, j) of the contingency table
Eij = expected frequency in cell (i, j) assuming no association (independence) between the variables, i.e. Eij = (total of row i * total of column j) / grand total
Question-2
A textile company produces and sells three types of fabric (Cotton, Silk, and Wool) in four different colours (Red, Blue, Green, and Yellow). The company is interested in determining if there is a significant association between the choice of fabric type and the colour of the fabric purchased by the customers. They collected data from 670 customers. The observed frequencies are as follows:

| Material | Red | Blue | Green | Yellow | Total |
|---|---|---|---|---|---|
| Cotton | 100 | 90 | 70 | 40 | 300 |
| Silk | 60 | 80 | 50 | 30 | 220 |
| Wool | 40 | 60 | 30 | 20 | 150 |
| Total | 200 | 230 | 150 | 90 | 670 |

Solution
Step 1: State the hypothesis.
Null Hypothesis (H0): There is no association between the choice of fabric type and the colour of the fabric purchased by the customers.
Alternative Hypothesis (H1): There is an association between the choice of fabric type and the colour of the fabric purchased by the customers.
Step 2: Calculate the expected frequencies from the contingency table of observed frequencies, using E = (row total * column total) / grand total:

| Material | Red | Blue | Green | Yellow | Total |
|---|---|---|---|---|---|
| Cotton | 89.55 | 102.99 | 67.16 | 40.30 | 300 |
| Silk | 65.67 | 75.52 | 49.25 | 29.55 | 220 |
| Wool | 44.78 | 51.49 | 33.58 | 20.15 | 150 |
| Total | 200 | 230 | 150 | 90 | 670 |

Step 3: Calculate the chi-square statistic.
Summing (O - E)² / E over the 12 cells gives 2.98 from the Cotton row, 0.77 from the Silk row and 2.30 from the Wool row, so χ² = 6.05.
Step 4: Find the degrees of freedom.
Degrees of freedom (df) = (R - 1) * (C - 1), where R is the number of rows and C is the number of columns in the contingency table. In this case, df = (3 - 1) * (4 - 1) = 6.
Step 5: Find the critical value.
To determine if the chi-squared statistic is statistically significant, we compare it to a critical value from the chi-squared distribution table. Assuming a significance level (alpha) of 0.05 and df = 6, the critical value is approximately 12.59.
Step 6: Compare.
The calculated chi-squared statistic (6.05) is smaller than the critical value (12.59) for a significance level of 0.05. Therefore, we fail to reject the null hypothesis (H0).
Conclusion: At the 5% level there is no significant association between the choice of fabric type and the colour of the fabric purchased. The data are consistent with the choice of fabric type being independent of the colour of the fabric purchased.
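The independence test can be recomputed directly from the printed cell counts, which makes it easy to check the expected frequencies and the statistic quoted in the solution. The sketch below assumes the usual rule E = (row total * column total) / grand total. The closed-form tail probability it uses is valid only for an even number of degrees of freedom, and the helper name `chi2_sf_even_df` is hypothetical.

```python
import math

observed = [
    [100, 90, 70, 40],  # Cotton: Red, Blue, Green, Yellow
    [60, 80, 50, 30],   # Silk
    [40, 60, 30, 20],   # Wool
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

def chi2_sf_even_df(x, df):
    """Upper-tail probability for even df: exp(-x/2) * sum of (x/2)^j / j!, j < df/2."""
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

p_value = chi2_sf_even_df(chi_square, df)

print(f"grand total = {grand_total}")
print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```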
Question-4
Is there a significant association between the choice of manufacturing process (New Textile Manufacturing Method or Conventional Textile Manufacturing Method) and the quality of textile products (Favourable Quality or Non-Favourable Quality) in a sample of 250 textile products?
Observed Data:

| | Favourable Quality | Non-Favourable Quality | Total |
|---|---|---|---|
| New Manufacturing Method | 140 | 30 | 170 |
| Conventional Manufacturing Method | 60 | 20 | 80 |
| Total | 200 | 50 | 250 |
Solution
Step 1: State the hypothesis.
Null Hypothesis (H0): There is no significant association between the choice of manufacturing process and the quality of textile products.
Alternative Hypothesis (H1): There is an association between the choice of manufacturing process and the quality of textile products.
Create a contingency table with the expected values (assuming no association between manufacturing process and quality):

| | Favourable Quality | Non-Favourable Quality | Total |
|---|---|---|---|
| New Manufacturing Method | (170*200)/250 | (170*50)/250 | 170 |
| Conventional Manufacturing Method | (80*200)/250 | (80*50)/250 | 80 |
| Total | 200 | 50 | 250 |
Step 2: Calculate the expected values.
Expected value for "Favourable" in the "New Manufacturing Method" group: (170 * 200) / 250 = 136
Expected value for "Non-Favourable" in the "New Manufacturing Method" group: (170 * 50) / 250 = 34
Expected value for "Favourable" in the "Conventional Manufacturing Method" group: (80 * 200) / 250 = 64
Expected value for "Non-Favourable" in the "Conventional Manufacturing Method" group: (80 * 50) / 250 = 16
Step 3: Calculate the chi-square statistic.
χ² = Σ [(O - E)² / E]
Where O is the observed value, and E is the expected value.
χ² = [(140 - 136)² / 136] + [(30 - 34)² / 34] + [(60 - 64)² / 64] + [(20 - 16)² / 16]
Now, calculate χ²:
χ² = (16/136) + (16/34) + (16/64) + (16/16) = 0.1176 + 0.4706 + 0.25 + 1 = 1.8382
| Observed value (O) | Expected value (E) | O - E | (O - E)² | (O - E)² / E |
|---|---|---|---|---|
| 140 | 136 | 4 | 16 | 0.12 |
| 60 | 64 | -4 | 16 | 0.25 |
| 30 | 34 | -4 | 16 | 0.47 |
| 20 | 16 | 4 | 16 | 1.00 |
| Total | | | | 1.84 |

Step 4: Determine the degrees of freedom.
df = (number of rows - 1) * (number of columns - 1) = (2 - 1) * (2 - 1) = 1
Step 5: Look up the critical chi-square value in a chi-square table with 1 degree of freedom and the desired significance level (e.g., 0.05). Here the chi-square table value = 3.84.
Step 6: Compare the calculated chi-square value (1.8382) with the critical chi-square value. Here the table value (3.84) is greater than the calculated value.
Conclusion: H0 is not rejected. At the 5% level there is no significant association between the choice of manufacturing process (New Textile Manufacturing Method or Conventional Textile Manufacturing Method) and the quality of textile products (Favourable Quality or Non-Favourable Quality).
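For a 2 x 2 table the same statistic can be obtained without building the expected table, using the well-known shortcut χ² = N(ad - bc)² / [(a + b)(c + d)(a + c)(b + d)]. The sketch below applies it to the observed data of this question as a cross-check. The cell labels a, b, c, d are introduced here, and the p-value uses the identity P(χ² > x) = erfc(sqrt(x/2)), which is valid for 1 degree of freedom only.

```python
import math

# Observed 2 x 2 table: rows are manufacturing methods, columns are quality.
a, b = 140, 30   # New method: favourable, non-favourable
c, d = 60, 20    # Conventional method: favourable, non-favourable
n = a + b + c + d

# Shortcut formula for a 2 x 2 table.
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 degree of freedom, chi-square is the square of a standard normal variate.
p_value = math.erfc(math.sqrt(chi_square / 2))

critical_value = 3.84  # table value for df = 1 at the 5% level
print(f"chi-square = {chi_square:.4f}, p-value = {p_value:.3f}")
print("reject H0" if chi_square > critical_value else "fail to reject H0")
```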
Limitation of a chi square test
- The data must come from a random sample.
- Applied to a fourfold (2*2) table, which has one degree of freedom, the test will not give a reliable result if the expected value in any cell is less than 5.
- In contingency tables larger than 2*2, Yates' correction cannot be applied.
- The test should be interpreted with caution if the sample total, i.e. the total of the values in all the cells, is less than 50.
- The test indicates the presence or absence of an association between the events but does not measure the strength of the association.
- The test does not indicate cause and effect; it only gives the probability that the association occurred by chance.
- The test is to be applied only when the individual observations of the sample are independent, which means that the occurrence of one individual observation (event) has no effect upon the occurrence of any other observation (event) in the sample under consideration.
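Two of these limitations can be illustrated on a 2 x 2 table: checking for expected values below 5, and applying Yates' continuity correction. The sketch below assumes the commonly quoted corrected formula χ² = N(|ad - bc| - N/2)² / [(a + b)(c + d)(a + c)(b + d)]. The function names are hypothetical, and the example table reuses the observed data of Question-4.

```python
def expected_2x2(table):
    """Expected cell values of a 2 x 2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]

def yates_chi_square(table):
    """Chi-square statistic of a 2 x 2 table with Yates' continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    # The correction is not allowed to push the difference below zero.
    adjusted = max(abs(a * d - b * c) - n / 2, 0)
    return n * adjusted ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

table = [[140, 30], [60, 20]]
smallest_expected = min(min(row) for row in expected_2x2(table))

print(f"smallest expected value = {smallest_expected:.0f}")
print(f"Yates-corrected chi-square = {yates_chi_square(table):.4f}")
```

The corrected statistic is always somewhat smaller than the uncorrected one, so it makes the test more conservative.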