Chi-Square Test and Relationship Between Variables.pptx

abelyegon7 10 views 25 slides Nov 02, 2025

About This Presentation

Presentation Description: Chi-Square Test and Relationship Between Variables
This presentation covers the Chi-Square Test (χ²), explaining how to use it to determine the relationship or association between two categorical variables.


Slide Content

Chi-Square Test and Relationship Between Variables
Lecturer: Nyamu Waweru

Punnett Square - Inheritance
Example: If two carriers (both with sickle cell trait, genotype AS) have a child

                | A (from Father) | S (from Father)
A (from Mother) | AA              | AS
S (from Mother) | AS              | SS

Probability of a child with Sickle Cell Disease (SS): 1/4 or 25%
Probability of a child with Sickle Cell Trait (AS): 1/2 or 50%
Probability of a normal child (AA): 1/4 or 25%
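
The Punnett square above can be reproduced by enumerating every pairing of one paternal and one maternal allele. The following is a minimal sketch; the function name `cross` is an illustrative assumption, and it assumes each parent passes on either allele with equal probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(father, mother):
    """Return genotype probabilities for one child of the given parents."""
    # Pair every paternal allele with every maternal allele (4 equally likely cells).
    cells = ["".join(sorted(pair)) for pair in product(father, mother)]
    counts = Counter(cells)
    return {genotype: Fraction(n, len(cells)) for genotype, n in counts.items()}


probs = cross("AS", "AS")
for genotype in ("AA", "AS", "SS"):
    print(genotype, probs[genotype])  # AA 1/4, AS 1/2, SS 1/4
```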

Sickle Cell Disease, Inheritance, and Statistical Analysis
Sickle Cell Disease (SCD) Inheritance Pattern

1. Genetic Basis:
- Cause: A point mutation in the β-globin gene (on chromosome 11).
- Effect: This mutation leads to a single amino acid substitution (glutamic acid → valine) in the hemoglobin protein. This produces Hemoglobin S (HbS) instead of the normal Hemoglobin A (HbA).
- Consequence: Under low oxygen conditions, HbS polymerizes, causing red blood cells to deform into a characteristic "sickle" shape. This leads to hemolytic anemia, vaso-occlusive crises, and organ damage.

2. Inheritance Pattern: Autosomal Recessive
Alleles Involved:
- A: Normal allele (codes for HbA)
- S: Mutant sickle allele (codes for HbS)
Genotypes and Phenotypes:
- AA (Homozygous Dominant): Normal phenotype. No sickle cell disease or trait.
- AS (Heterozygous): Sickle Cell Trait. Individuals are generally healthy carriers. They produce both HbA and HbS. They are resistant to severe malaria (Plasmodium falciparum), which explains the high allele frequency in malaria-endemic regions (an example of balanced polymorphism).
- SS (Homozygous Recessive): SCD. Individuals have the full-blown disease.

Chi-square test statistic = χ²
The Chi-square (χ²) test is a non-parametric statistical test used to determine whether there is a significant association between two categorical variables. In health and genetics, it helps determine whether observed outcomes differ from expected outcomes due to chance or because of a true relationship between variables.
Chi is a letter of the Greek alphabet; the symbol is χ and it's pronounced like KYE, the sound in "kite." The chi-square test uses the statistic chi squared, written χ². The "test" that uses this statistic helps an investigator determine whether an observed set of results matches an expected outcome.
In some types of research (genetics provides many examples) there may be a theoretical basis for expecting a particular result: not a guess, but a predicted outcome based on a sound theoretical foundation.

Purpose of Chi-Square Test in Health Research
The Chi-square test is a statistical hypothesis test used to determine if there is a significant association (relationship) between two categorical variables in a sample. It assesses whether observed differences in frequencies are due to chance or reflect a real relationship in the population.
- To test independence between two categorical variables: Is Sickle Cell disease status associated with gender?
- To test goodness of fit (how well observed data fit expected ratios): Do Sickle Cell inheritance patterns follow Mendelian ratios?
- To test association between exposure and outcome: Is malaria protection related to presence of Sickle Cell trait?

Chance Factor in a Trial/Experiment
Chance (tossing a coin)
We need to consider for a moment what might cause the observed outcome to differ from the expected outcome. You know what all the possible outcomes are (only two: head and tail), and you know what the probability of each is. However, in any single trial (toss) you can't say what the outcome will be. Why? Because of the element of chance, which is a random factor. Saying that chance is a random factor just means that you can't control it, but it's there every time you flip that coin.
Chance is a factor that must always be considered; it's often present but not recognized. Since it may affect experimental work, it must be taken into account when results are interpreted.
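
To make the role of chance concrete, the sketch below simulates a few sets of 100 fair coin tosses. The function name `toss_coins` and the choice of 100 tosses per trial are illustrative assumptions; the point is only that observed counts wander around the expected 50/50 split from one trial to the next.

```python
import random


def toss_coins(n_tosses, seed):
    """Simulate n_tosses fair coin flips and return (heads, tails)."""
    rng = random.Random(seed)  # seeded so each trial can be reproduced
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads, n_tosses - heads


for seed in range(3):
    heads, tails = toss_coins(100, seed)
    print(f"trial {seed}: heads={heads}, tails={tails} (expected 50/50)")
```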

Inherent and Error Factors in a Trial/Experiment
Inherent
What else might cause an observed outcome to differ from the expected? Suppose that at your last physical exam, your doctor told you that your resting pulse rate was 60 (per minute) and that that's good, that's normal for you. When you measure it yourself later you find it's 58 at one moment, 63 ten minutes later, 57 ten minutes later. Why isn't it the same every time, and why isn't it 60 every time? When measurements involving living organisms are under study, there will always be the element of inherent variability. Your resting pulse rate may vary a bit, but it's consistently about 60, and those slightly different values are still normal.
Error
In addition to these factors, there's the element of error. You've done enough lab work already to realize that people introduce error into experimental work in performing steps of procedures and in making measurements. Instruments, tools, and implements themselves may have built-in limitations that contribute to error.
Putting all of these factors together, it's not hard to see how an observed result may differ a bit from an expected result. But these small departures from expectation are not significant departures. That is, we don't regard the small differences observed as being important.

Key Components of Chi-Square:
- Null Hypothesis (H₀): There is no association between the two variables. They are independent.
- Alternative Hypothesis (H₁): There is an association between the two variables.
- Categorical Variables: Variables that take categories instead of numerical values (e.g., genotype type, disease presence, gender). Data is organized in a contingency table (e.g., 2x2 table).
- Observed Frequencies (O): The actual counts found in the sample or experiment.
- Expected Frequencies (E): The counts we would expect to see if the null hypothesis were true, that is, if there were no association (based on theory or probability). Formula for expected frequency in a cell: E = (Row Total × Column Total) / Grand Total
- p-value: Probability of obtaining results at least as extreme as those observed if the null hypothesis were true (that is, by chance alone). If p < 0.05, the result is statistically significant.
- Degrees of Freedom (df): Calculated as (rows − 1) × (columns − 1)

Steps to Perform the Test:
1. State the Hypotheses. Null (H₀): No association between variables. Alternative (H₁): There is an association between variables.
2. Collect and Tabulate Data (construct a contingency table with observed frequencies).
3. Calculate the expected frequency for each cell and the χ² statistic: χ² = Σ [(O − E)² / E].
4. Determine the Degrees of Freedom (df): df = (number of rows − 1) × (number of columns − 1).
5. Compare the χ² value with the critical value, or find the p-value using the χ² value and df (from a χ² distribution table or software).
   If χ² calculated > χ² critical → Reject H₀
   If χ² calculated ≤ χ² critical → Fail to reject H₀
6. Make a Decision and Interpret Results.
   If p-value ≤ significance level (α, usually 0.05), reject H₀. The association is statistically significant.
   If p-value > α, fail to reject H₀. There is not enough evidence to conclude a significant association.
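
The steps above can be sketched end to end in a few lines. This is a minimal illustration, not a substitute for statistical software: the function name `chi_square_independence` and the 2x2 counts are hypothetical, and the critical value 3.84 is the standard table value for df = 1 at α = 0.05.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if the two variables were independent.
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical counts: rows are two groups, columns are outcome yes/no.
table = [[20, 30], [30, 20]]
chi2, df = chi_square_independence(table)

CRITICAL_05_DF1 = 3.84  # table value for df = 1, alpha = 0.05
decision = "reject H0" if chi2 > CRITICAL_05_DF1 else "fail to reject H0"
print(chi2, df, decision)  # 4.0 1 reject H0
```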

Calculation of chi square
The formula for calculating χ² is: χ² = Σ [(o − e)² / e], where "o" is observed and "e" is expected. The sigma symbol, Σ, means "sum of what follows."
For each category (type or group such as "heads") of outcome that is possible, we would have an expected value and an observed value (for the number of heads and the number of tails, e.g.). For each one of those categories (outcomes) we would calculate the quantity (o − e)² / e and then add them for all the categories, which was two in the coin toss example (head category and tail category).
It is convenient to organize the data in table form, as shown below for two coin toss experiments
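
As a small worked instance of the formula, the sketch below applies it to one coin toss experiment. The counts (47 heads and 53 tails in 100 tosses) are an assumption chosen for illustration, since the slide's own coin toss table did not survive extraction; the helper name `chi_square` is likewise hypothetical.

```python
def chi_square(observed, expected):
    """Sum (o - e)^2 / e over all outcome categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumed data: 100 tosses of a fair coin, categories are [heads, tails].
observed = [47, 53]
expected = [50, 50]

chi2 = chi_square(observed, expected)
print(round(chi2, 2))  # 0.36, i.e. 9/50 + 9/50
```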

Example: Sickle Cell Disease Inheritance Pattern
Sickle Cell Disease (SCD) follows Mendelian inheritance involving two alleles:
A = Normal hemoglobin gene
S = Sickle cell gene
When two carriers (AS × AS) have children, the expected Mendelian ratio of genotypes is:

Genotype | Expected Ratio | Expected %
AA       | 1              | 25%
AS       | 2              | 50%
SS       | 1              | 25%
Total    | 4              | 100%

Example: Sickle Cell Disease Inheritance Pattern
Observed Data Example
Let's say in a hospital genetics clinic, 160 children of carrier parents (AS × AS) were tested:

Genotype | Observed (O) | Expected % | Expected (E) = 160 × %
AA       | 48           | 25%        | 40
AS       | 80           | 50%        | 80
SS       | 32           | 25%        | 40
Total    | 160          | 100%       | 160
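
The goodness-of-fit statistic for this table can be checked with a short sketch. It assumes the usual goodness-of-fit convention df = (number of categories − 1) and uses the standard table value 5.99 for df = 2 at α = 0.05; variable names are illustrative.

```python
observed = {"AA": 48, "AS": 80, "SS": 32}
mendelian_ratio = {"AA": 1, "AS": 2, "SS": 1}

total = sum(observed.values())
ratio_sum = sum(mendelian_ratio.values())

# Expected count per genotype under the 1:2:1 ratio.
expected = {g: total * r / ratio_sum for g, r in mendelian_ratio.items()}

chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
df = len(observed) - 1  # 3 genotype categories -> 2 degrees of freedom

CRITICAL_05_DF2 = 5.99  # table value for df = 2, alpha = 0.05
print(expected)                 # {'AA': 40.0, 'AS': 80.0, 'SS': 40.0}
print(round(chi2, 2), df)       # 3.2 2
print(chi2 > CRITICAL_05_DF2)   # False: consistent with Mendelian ratios
```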

Example: Sickle Cell Disease Inheritance Pattern

Example #2
Scenario: A researcher wants to know if the prevalence of Sickle Cell Trait (a categorical variable: Yes/No) is associated with a history of severe malaria in childhood (another categorical variable: Yes/No) in a population.
1. Hypotheses:
H₀: There is no association between sickle cell trait status and history of severe malaria. (They are independent)
H₁: There is an association between sickle cell trait status and history of severe malaria
2. Data Collection (Observed Frequencies):
A sample of 400 individuals from a malaria-endemic region is surveyed

                       | History of Severe Malaria: YES | History of Severe Malaria: NO | Row Total
Sickle Cell Trait: YES | 30                             | 170                           | 200
Sickle Cell Trait: NO  | 80                             | 120                           | 200
Column Total           | 110                            | 290                           | 400

Example #2 (cont'd)
3. Calculate Expected Frequencies (E):
Expected for (Trait=YES, Malaria=YES): (200 × 110) / 400 = 55
Expected for (Trait=YES, Malaria=NO): (200 × 290) / 400 = 145
Expected for (Trait=NO, Malaria=YES): (200 × 110) / 400 = 55
Expected for (Trait=NO, Malaria=NO): (200 × 290) / 400 = 145
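
The slide that carried the χ² calculation for this example did not survive extraction, so the following is a sketch of how the statistic would follow from the observed and expected counts above, assuming the same formula χ² = Σ [(O − E)² / E] and df = (rows − 1) × (columns − 1). It yields a value of about 31.35, in line with the figure quoted in step 6.

```python
# Rows: trait YES / NO; columns: severe malaria YES / NO.
observed = [[30, 170], [80, 120]]
expected = [[55, 145], [55, 145]]

# Contribution of each cell: (O - E)^2 / E.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

chi2 = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)

for row in contributions:
    print([round(c, 2) for c in row])  # [11.36, 4.31] for both rows
print(round(chi2, 2), df)              # 31.35 1
```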

Example #2 (cont'd)

Example #2 (cont'd)
6. Find the p-value and Interpret:
For df = 1, a χ² value of about 31.35 is highly significant (p-value < 0.001).
Interpretation: Since the p-value is much less than 0.05, we reject the null hypothesis. There is a statistically significant association between having sickle cell trait and a history of severe malaria.
7. Epidemiological Conclusion:
Looking at the observed data, individuals with the sickle cell trait (AS) were less likely to have a history of severe malaria (30/200 = 15%) compared to those without the trait (80/200 = 40%). The chi-square test confirms that this observed protective effect is statistically significant and unlikely to be due to random chance. This supports the established theory of the heterozygote advantage against malaria.

Summary of Key Points
- SCD Inheritance: Autosomal recessive. Carriers (AS) have the sickle cell trait and are generally healthy but are resistant to severe malaria.
- Chi-Square Test: A tool to test for an association between two categorical variables.
- Application: Used in epidemiology to validate observed relationships, such as the link between sickle cell trait and malaria resistance, by providing a p-value to judge statistical significance.
- Interpretation: A low p-value (≤ 0.05) indicates that the relationship observed in the sample data is unlikely to be due to chance alone and is likely to exist in the broader population.

Types of Chi-Square Tests

Type                 | Purpose                                                            | Example
Goodness of Fit Test | To test if observed data fit a theoretical distribution            | Sickle Cell inheritance ratios (AA, AS, SS)
Test of Independence | To test association between two categorical variables              | Association between gender and Sickle Cell status
Homogeneity Test     | To test if two populations have the same distribution of a variable | Comparing genotype distribution between two ethnic groups

Interpretation in Health Context
How do you interpret it? What do the results look like?
Usually, the higher the chi-square statistic, the greater the likelihood that the finding is significant, but you must look at the corresponding p-value to determine significance.
In our Sickle Cell example: χ² = 3.2 (df = 2, critical value 5.99 at α = 0.05) → not significant → the observed genotype counts are consistent with Mendelian ratios.
In a public health example: if χ² shows a significant association between gender and HIV status → infection risk differs by gender (an association, which on its own does not establish causation).
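
For the two smallest cases, the p-value that goes with a χ² statistic has a closed form, so it can be sketched without statistical software. The helper name `chi2_p_value` is an illustrative assumption, and it deliberately covers only df = 1 and df = 2; for other degrees of freedom a table or a statistics library is needed.

```python
import math


def chi2_p_value(chi2, df):
    """Upper-tail probability of a chi-square distribution (df = 1 or 2 only)."""
    if df == 1:
        # A chi-square variable with 1 df is the square of a standard normal.
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        # With 2 df the distribution is exponential with mean 2.
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch only covers df = 1 or df = 2")


p_inheritance = chi2_p_value(3.2, 2)   # genotype goodness-of-fit example
p_malaria = chi2_p_value(31.35, 1)     # trait vs severe malaria example

print(round(p_inheritance, 3))  # 0.202, above 0.05: not significant
print(p_malaria < 0.001)        # True: highly significant
```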

Selection of critical value of chi square
Having calculated a χ² value for the data in experiment #2, we now need to evaluate that χ² value. To do so we must compare our calculated χ² with the appropriate critical value of χ² from the table shown on slide 22. All of the critical values in the table have been predetermined by statisticians.
To select a value from the table, we need to know 2 things:
1. The number of degrees of freedom. For a goodness-of-fit test, that is one less than the number of categories (groups). For the coin toss experiment that is 2 categories − 1 = 1, so the critical value of χ² will be in the first row of the table. (The three-genotype inheritance example has 3 − 1 = 2 degrees of freedom; a 2x2 contingency table, such as the sickle cell trait and malaria example, has (2 − 1) × (2 − 1) = 1.)
2. The probability value, which reflects the degree of confidence we want to have in our interpretation. The column headings 0.05 and 0.01 are significance levels, often described as 95% and 99% confidence levels. Using 0.05 means accepting a 5% probability of rejecting the null hypothesis when it is actually true. That shows that we can't be certain, but 95% is very good. Using 0.01 reduces that risk to 1%.
We will now agree that, unless told otherwise, we will always use the 0.05 probability column (95% confidence level).
For 1 degree of freedom, in our coin toss experiment, the table χ² value is 3.84. We compare the calculated χ² (0.36) to that.
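
The table lookup described above can be sketched as a small dictionary keyed by degrees of freedom and probability column. The values are the standard χ² critical values to three decimals for the first three rows only; the names `CRITICAL_VALUES`, `critical_value`, and `exceeds_critical` are illustrative assumptions.

```python
# Standard chi-square critical values: {df: {alpha: critical value}}.
CRITICAL_VALUES = {
    1: {0.05: 3.841, 0.01: 6.635},
    2: {0.05: 5.991, 0.01: 9.210},
    3: {0.05: 7.815, 0.01: 11.345},
}


def critical_value(df, alpha=0.05):
    """Look up the critical value; 0.05 is the default column, as agreed above."""
    return CRITICAL_VALUES[df][alpha]


def exceeds_critical(chi2, df, alpha=0.05):
    """True when the calculated chi-square is greater than the critical value."""
    return chi2 > critical_value(df, alpha)


print(critical_value(1))          # 3.841
print(exceeds_critical(0.36, 1))  # False: coin toss result is within chance
```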

The interpretation
In every χ²-test the calculated χ² value will either be (i) less than or equal to the critical χ² value OR (ii) greater than the critical χ² value.
If calculated χ² ≤ critical χ², then we conclude that there is no statistically significant difference between the two distributions. That is, the observed results are not significantly different from the expected results, and the numerical difference between observed and expected can be attributed to chance.
If calculated χ² > critical χ², then we conclude that there is a statistically significant difference between the two distributions. That is, the observed results are significantly different from the expected results, and the numerical difference between observed and expected cannot be attributed to chance. That much difference means that some other factor is at work, and we may be 95% confident of that. The χ²-test won't identify that other factor, only that there is some factor other than chance responsible for the difference between the two distributions.

Hypothesis testing
Biostatisticians formally describe what we've just done in terms of testing a hypothesis. This process begins with stating the "null hypothesis." The null hypothesis says that the difference found between the observed distribution and the expected distribution is not significant, i.e. that the difference is just due to random chance. Then we use the χ²-test to test the validity of that null hypothesis.
If calculated χ² ≤ critical χ², then we accept (more precisely, fail to reject) the null hypothesis. That means that the two distributions are not significantly different, that the difference we see is due to chance, not some other factor.
On the other hand, if calculated χ² > critical χ², then we reject the null hypothesis. That means that the two distributions are significantly different, that the difference we see is not due to chance alone.
Note this well: In performing the chi squared test in this course, it is not sufficient in your interpretation to say "accept null hypothesis" or "reject null hypothesis." You will be expected to fully state whether the distributions being compared are significantly different or not and whether the difference is due to chance alone or other factors.
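
The note above asks for a full statement rather than a bare "accept" or "reject". As a minimal sketch of that habit, the hypothetical helper `interpret` below turns the comparison into a complete sentence; the wording of the returned strings is an illustration, not prescribed text from the course.

```python
def interpret(chi2, critical):
    """Return a full interpretation of a chi-square test result."""
    if chi2 <= critical:
        return (
            "Fail to reject the null hypothesis: the observed and expected "
            "distributions are not significantly different, and the "
            "difference can be attributed to chance."
        )
    return (
        "Reject the null hypothesis: the observed and expected "
        "distributions are significantly different, and the difference "
        "is not due to chance alone."
    )


print(interpret(0.36, 3.84))   # coin toss value quoted earlier, df = 1
print(interpret(31.35, 3.84))  # trait vs malaria example, df = 1
```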

Application of Chi-Square in Epidemiological Data
Chi-square tests are widely applied in public health to test associations between categorical variables, such as:

Application Area  | Example Question Tested with Chi-Square
Genetics          | Is Sickle Cell trait related to malaria resistance?
Epidemiology      | Is gender associated with HIV infection rates?
Health Programs   | Does vaccination status affect infection prevalence?
Clinical Research | Is there an association between blood group and disease outcome?