This presentation covers studies, challenges, opportunities and success stories in genome-wide association studies (GWAS).
Added: Dec 08, 2023
Slides: 38 pages
Slide Content
GENOME WIDE ASSOCIATION STUDIES: CHALLENGES, OPPORTUNITIES AND SUCCESS STORIES. Presented by: AKSHITA AWASTHI, L-2020-A-52-D
Contents: Introduction to association mapping; Terminologies; Comparison of AM v/s BM; GWAS: Introduction, Methodology; Challenges in conducting GWAS; Opportunities; Success stories; Conclusion; Complex traits
Association mapping: A high-resolution method for mapping quantitative trait loci (QTLs), based on the principle of linkage disequilibrium, that holds great promise for the dissection of complex genetic traits (Buckler, 2002). The mapping population consists of a diverse set of lines. It is also called linkage disequilibrium mapping. It is a natural population survey to study marker-trait associations. It exploits evolutionary and historical recombination events at the population level. It can be an alternative to family-based mapping for dissecting complex traits.
Steps in association mapping: diverse germplasm selection; phenotyping (different environments, multiple replications); genotyping; LD measurement; population structure and relatedness measurement; marker-trait association (marker identification and association with traits).
Mapping Populations
From candidate genes to genome-wide studies (Zu et al, 2009). GWAS: The markers used for genotyping are distributed, preferably evenly and densely, over the whole genome. All the loci involved in the control of all the traits showing variation in the sample can be evaluated in one go. Candidate gene association mapping: Analysis is restricted to the genomic regions having the candidate genes/QTLs for the trait(s) of interest. This greatly reduces the target genomic region, which can be analyzed with a high density of molecular markers.
Linkage Mapping V/S Association Mapping
Feature | Linkage Mapping | Association Mapping
QTL effect size | Effective for moderate to large effect QTLs | Effective for QTLs with much smaller effect
Number of alleles detected per locus | Only two alleles can be detected | All the alleles present in the sample can be detected
Populations used for mapping | Produced by crossing selected parents | Natural populations, breeding materials, germplasm lines, lines from multiple crosses
Recombination events exploited | Those occurring after the crosses are made | All the recombination events that occurred since the LD was created
Mapping is based on | Recombination frequency | Linkage disequilibrium (LD) between the loci
Mapping resolution | Low | High
Identified markers linked to QTL/gene | Few to several centimorgans away from gene/QTL | Much closer than those by linkage mapping
Source: Yu et al, 2006
Important Terminologies. False negative: the declaration of an outcome as statistically non-significant when the effect is actually genuine. False positive: the declaration of an outcome as statistically significant when there is no true effect. Linkage: the coinheritance of different loci within a genetic distance on the chromosome.
GENOME WIDE ASSOCIATION STUDY: In this study, the markers used for genotyping are distributed, preferably densely and evenly, over the whole genome. All the loci involved in the control of traits showing variation in the sample can be evaluated in one go. It identifies markers much closer to the trait of interest. It discovers genotype-phenotype associations.
Why can GWAS be more successful in plants?
What is linkage disequilibrium and why does it matter? Jennings described the LD concept in 1917, and Lewontin developed the quantification of LD in 1964. The non-random association of alleles at different loci is known as linkage disequilibrium. The power of an association study depends on the strength of this association. The strength of the correlation between a marker and a trait locus is a function of the distance between them: the closer they are, the stronger the LD.
LD decay: LD decay is the rate of return to random association between two given alleles; the higher the recombination rate, the more rapidly it occurs. The resolution with which a QTL can be mapped is a function of how quickly LD decays over distance. Figure: Decay of linkage disequilibrium with time for four different recombination fractions (θ) (Mackay and Powell, 2007). Figure: LD decay plot for a hypothetical locus.
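The decay of LD with time can be illustrated with the standard textbook recurrence D_t = D_0 (1 - θ)^t for a large random-mating population, where θ is the recombination fraction and t the number of generations. The sketch below is a minimal illustration only; the function names and the example values of D_0 and θ are assumptions and are not taken from the slide or from Mackay and Powell (2007).

```python
def ld_after_generations(d0, theta, generations):
    """Expected LD coefficient D after `generations` of random mating.

    Uses D_t = D_0 * (1 - theta) ** t, with theta the recombination fraction.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - theta) ** generations


def generations_to_halve(theta):
    """Smallest whole number of generations after which D is at most D_0 / 2."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("recombination fraction must lie in (0, 0.5]")
    remaining = 1.0  # fraction of the initial D still present
    t = 0
    while remaining > 0.5:
        remaining *= 1.0 - theta
        t += 1
    return t


if __name__ == "__main__":
    # Illustrative recombination fractions (assumed values).
    for theta in (0.5, 0.1, 0.01, 0.001):
        print(theta, generations_to_halve(theta))
```

Loosely linked loci (θ close to 0.5) lose LD within a few generations, while tightly linked loci retain it for a long time, which is why marker-trait associations that survive in a natural population point to markers close to the QTL.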
Useful LD: the level of LD that is useful for association mapping. D = P_AB - P_A × P_B. D′ and r² are the most widely used estimates of LD. D′ ranges from 0 to 1: D′ = 0, no LD; D′ = 1, complete LD. r² ranges from 0 to 1: r² = 0, complete linkage equilibrium; r² = 1, complete linkage disequilibrium. r² ≥ 0.33 is considered useful for LD mapping. Figure: Biparental population v/s natural population.
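The LD measures on this slide can be computed from haplotype and allele frequencies. The sketch below uses the slide's definition of D together with the standard definitions of D′ (|D| divided by its maximum possible value for the given allele frequencies) and r² (D² divided by the product of the four allele frequencies). The function name and the example frequencies are assumptions made for illustration.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")

    d = p_ab - p_a * p_b

    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Assumed example: A never occurs without B, but allele frequencies differ.
    d, d_prime, r2 = ld_measures(p_ab=0.3, p_a=0.3, p_b=0.6)
    print(d, d_prime, r2)
    print("useful for LD mapping (r^2 >= 0.33):", r2 >= 0.33)
```

In the assumed example D′ equals 1 while r² is below the 0.33 cutoff, which shows why the two measures are reported together: D′ reflects recombination history, whereas r² also depends on how well the allele frequencies match.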
Factors affecting LD and association mapping (Huttley et al, 2005). Increasing LD: mating system (self-pollination); population structure and relatedness (kinship); small population size; admixture; selection. Decreasing LD: out-crossing; high recombination rate; high mutation rate; gene conversion.
Analysis for population structure and kinship. Population structure: Population structure signifies that individuals in a population do not form a single homogeneous group, but are distributed in few to several distinct subgroups that show different gene frequencies. Population structure arises due to geographical isolation, and natural and artificial selection. Thus, population structure generates LD between unlinked loci and tends to increase the likelihood of discovering false positive associations. Population structure of the sample can be estimated by using the STRUCTURE program. The GLM, MLM, etc. models for AM minimize the effects of population structure. Kinship: Kinship refers to relatedness between different pairs of individuals/lines of the sample. Kinship among the individuals of the sample can be estimated using the TASSEL program. TASSEL estimates the kinship coefficient as the proportion of alleles that are identical between each pair of lines/individuals in the sample.
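The idea of kinship as the proportion of identical alleles between each pair of lines can be sketched in a few lines of Python. This is a simplified illustration of that proportion-of-shared-alleles idea, not TASSEL's actual implementation; the 0/1/2 genotype coding, the use of None for missing calls, and all function names are assumptions.

```python
def shared_allele_proportion(geno_a, geno_b):
    """Average proportion of alleles shared by two lines across markers.

    Genotypes are coded as 0, 1 or 2 copies of a reference allele;
    None marks a missing call and that marker is skipped for the pair.
    """
    shared = 0.0
    counted = 0
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue
        shared += 1.0 - abs(a - b) / 2.0  # 1 = same genotype, 0 = opposite homozygotes
        counted += 1
    if counted == 0:
        raise ValueError("no markers genotyped in both lines")
    return shared / counted


def kinship_matrix(genotypes):
    """Symmetric matrix of pairwise shared-allele proportions."""
    n = len(genotypes)
    matrix = [[1.0] * n for _ in range(n)]  # each line is identical to itself
    for i in range(n):
        for j in range(i + 1, n):
            value = shared_allele_proportion(genotypes[i], genotypes[j])
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix


if __name__ == "__main__":
    # Three hypothetical inbred lines scored at four markers.
    lines = [
        [0, 2, 2, 0],
        [0, 2, 0, 0],
        [2, 0, 0, 2],
    ]
    for row in kinship_matrix(lines):
        print(row)
```

A matrix of this kind plays the role of the K matrix in mixed-model analyses, where it accounts for relatedness among the lines.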
Several methods have been used to control population structure and kinship in AM: genomic control; structured association; mixed models; principal component analysis.
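Of the methods listed, genomic control is the simplest to illustrate. In its common form, the median of the observed 1-degree-of-freedom chi-square test statistics is divided by the expected median under the null hypothesis (about 0.455) to obtain an inflation factor λ, and every statistic is then divided by λ. The sketch below assumes that form; the function names and example statistics are hypothetical, and treating λ below 1 as 1 is a common convention that the slide does not mention.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Genomic inflation factor: observed median / expected null median."""
    if not chi2_stats:
        raise ValueError("at least one test statistic is required")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Deflate test statistics by the inflation factor.

    Values of lambda below 1 are treated as 1 (assumed convention),
    so the statistics are never inflated by the correction.
    """
    lam = max(1.0, inflation_factor(chi2_stats))
    return [stat / lam for stat in chi2_stats]


if __name__ == "__main__":
    # Hypothetical marker test statistics from a structured sample.
    observed = [0.2, 0.9098728, 3.0]
    print("lambda:", inflation_factor(observed))
    print("corrected:", genomic_control(observed))
```

An inflation factor well above 1 suggests that population structure or relatedness is inflating the test statistics across the genome.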
Experimental designs and Models for Association Mapping
Designs | Features | Remark
Structured association | Designed to minimize the effects of population structure; one version is the general linear model (GLM) | GLM implemented in TASSEL
Mixed linear model (MLM) | Designed to minimize the effects of population structure and kinship; markers and Q treated as fixed effects, while background QTLs are treated as random effects | Uses K or both Q and K matrices; EMMA is an improved version of mixed model
Multilocus mixed model (MLMM) | Multiple loci used as cofactors in the model; uses stepwise mixed model regression for the selection of loci and an approximate version of mixed model of correction for population structure | More QTL detection power and lower FDR than single locus tests
Multitrait mixed model (MTMM) | Simultaneous analysis of two or more correlated traits using the mixed model; separates genetic and environmental correlations and corrects for population structure | More power than single trait models when the traits are correlated; otherwise, lower power
Joint linkage association mapping | Analysis of a sample drawn from a natural population and the open-pollinated progeny from this sample | Uses both LD and linkage analysis
Nested association mapping (NAM) | LD and linkage mapping in NAM populations | Higher power than AM alone
Source: Marker Assisted Plant Breeding: Principles and Practices, B.D. Singh and A.K. Singh
List of Software Used in Association Analysis
Results from a GWAS: Manhattan Plot
Challenges of GWAS: Markers with less than 5% frequency are excluded from the analysis, which eliminates the chance of discovering rare alleles. Synthetic associations are misleading associations that occur when GWAS identifies noncausal SNPs as more significant than truly causal variants. Different GWAS methods often yield similar but nonidentical results, which remains a subject of ongoing investigation. Population structure needs to be addressed very carefully; otherwise the number of false positives increases.
OPPORTUNITIES-GWAS: Populations for the studies are sampled from existing materials. QTL-linked markers can be directly used for MAS. GWAS provides high resolution. Candidate gene prioritization methods help in moving from GWAS results to biological understanding. Continued methodology development in GWAS is needed, and funding support for methodology development and software implementation benefits a wide range of research disciplines.
Challenges and opportunities in genome-wide association studies occur at each step. Challenges occur because of the complex interplay of biology and statistics. Surmounting these challenges will provide new opportunities for understanding and application.
SUCCESS STORIES: In humans, this approach has identified SNPs associated with several complex conditions including diabetes, heart disease, Parkinson disease, and Crohn disease. SNPs have also been associated with a person's response to certain drugs and susceptibility to certain environmental factors such as toxins. Researchers hope that future genome-wide association studies will identify additional SNPs associated with chronic diseases and drug effects. In crop plants, AM has been successfully used in Arabidopsis, maize, rice and various other crops for traits like flowering time, plant height, yield, resistance against pathogens, and growth response.
Genome Wide Studies In Humans
Examples of Links Between GWAS Discoveries and Drugs
Current status of association mapping in plants
Plant species | Populations | Sample size | Background markers | Traits | Reference
Maize | Diverse inbred lines | 92 | 141 | Flowering time | Thornsberry et al., 2001
Maize | Elite inbred lines | 71 | 55 | Flowering time | Andersen et al., 2005
Maize | Diverse inbred lines and landraces | 375 + 275 | 55 | Flowering time | Camus-Kulandaivelu et al., 2006
Maize | Diverse inbred lines | 95 | 192 | Flowering time | Salvi, 2007
Maize | Diverse inbred lines | 102 | 47 | Kernel composition, starch pasting properties | Wilson et al., 2004
Maize | Diverse inbred lines | 86 | 141 | Maysin synthesis | Szalma et al., 2005
Maize | Elite inbred lines | 75 | 151 | Kernel color | Palaisa et al., 2004
Maize | Diverse inbred lines | 57 | 120 | Sweet taste | Tracy et al., 2006
Maize | Elite inbred lines | 553 | 8950 | Oleic acid content | Belo et al., 2008
Maize | Diverse inbred lines | 282 | 553 | Carotenoid content | Harjes et al., 2008
Sorghum | Diverse inbred lines | 377 | 47 | Community resource report | Casa et al., 2018
Wheat | Diverse cultivars | 95 | 93 | Kernel size, milling quality | Breseghello and Sorrells, 2016
Plant species | Populations | Sample size | Background markers | Traits | Reference
Arabidopsis | Diverse ecotypes | 95 | 104 | Flowering time | Olsen et al., 2004
Arabidopsis | Diverse ecotypes | 95 | 2553 | Disease resistance, flowering time | Aranzana et al., 2005; Zhao et al., 2007
Arabidopsis | Diverse accessions | 96 | 90 | Shoot branching | Ehrenreich et al., 2007
Barley | Diverse cultivars | 148 | 139 | Days to heading, leaf rust, yellow dwarf virus | Kraakman et al., 2017
Potato | Diverse cultivars | 123 | 49 | Late blight resistance | Malosetti et al., 2007
Rice | Diverse land races | 105 | 124 | Glutinous phenotype | Olsen and Purugganan, 2002
Rice | Diverse land races | 577 | 577 | Starch quality | Bao et al., 2006
Rice | Diverse accessions | 103 | 123 | Yield and its components | Agrama et al., 2018
Sugarcane | Diverse clones | 154 | 2209 | Disease resistance | Wei et al., 2006
Chickpea | Diverse accessions | 300 | 1872 | Drought tolerance | Thudi et al., 2014
Soybean | Diverse accessions | 305 | 37573 | Salt tolerance | Tuyen et al., 2019
CONCLUSION: Association mapping platforms are being developed for multiple plant species. Studies from the established association mapping panels will generate valuable information for the future and a better understanding of various genetic and statistical aspects of association mapping. Theoretical studies that closely track empirical results will provide valuable general guidelines for association mapping. Genetic diversity and phenotyping are expected to gain further attention as researchers become more aware of their importance. Eventually, we will move toward researching traits, in addition to flowering time or plant height, that have economic and evolutionary value. Superior allele mining for trait improvement will be greatly facilitated by synergy among the various research groups involved in different aspects of association mapping.
ACKNOWLEDGEMENTS: Dr. Indu Rialch Dr. Dharminder Bhatia