Genome wide Association studies.pptx

3,049 views 38 slides Dec 08, 2023

About This Presentation

This presentation pertains to various studies, challenges, opportunities and success stories on Genome Wide Association Studies (GWAS)


Slide Content

GENOME WIDE ASSOCIATION STUDIES: CHALLENGES, OPPORTUNITIES AND SUCCESS STORIES
Presented by: AKSHITA AWASTHI, L-2020-A-52-D

Contents
- Introduction to association mapping
- Terminologies
- Comparison of AM v/s BM
- GWAS: introduction
- Methodology
- Challenges in conducting GWAS
- Opportunities
- Success stories
- Conclusion
- Complex traits

Association mapping
- A high-resolution method for mapping quantitative trait loci (QTLs), based on the principle of linkage disequilibrium, that holds great promise for the dissection of complex genetic traits (Buckler, 2002).
- The mapping population consists of a diverse set of lines.
- Also called linkage disequilibrium mapping.
- A survey of a natural population to study marker-trait associations.
- Exploits evolutionary and historical recombination events at the population level.
- Could be an answer and an alternative to family-based mapping for dissecting complex traits.

Steps in association mapping
- Diverse germplasm selection
- Phenotyping (different environments, multiple replications)
- Genotyping
- LD measurement
- Population structure and relatedness measurement
- Marker-trait association: marker identification and association with traits

Mapping Populations

From candidate genes to genome-wide studies

| GWAS | Candidate gene association mapping |
|---|---|
| The markers used for genotyping are distributed, preferably evenly and densely, over the whole genome. | Analysis is restricted to the genomic regions having the candidate genes/QTLs for the trait(s) of interest. |
| All the loci involved in the control of all the traits showing variation in the sample can be evaluated in one go. | This greatly reduces the target genomic region, which can be analyzed with a high density of molecular markers. |

Source: Zu et al, 2009

Linkage Mapping V/S Association Mapping

| Feature | Linkage mapping | Association mapping |
|---|---|---|
| QTL effect size | Effective for moderate to large effect QTLs | Effective for QTLs with much smaller effect |
| Number of alleles detected per locus | Only two alleles can be detected | All the alleles present in the sample can be detected |
| Populations used for mapping | Produced by crossing selected parents | Natural populations, breeding materials, germplasm lines, lines from multiple crosses |
| Recombination events exploited | Those occurring after the crosses are made | All the recombination events that occurred since the LD was created |
| Mapping is based on | Recombination frequency | Linkage disequilibrium (LD) between the loci |
| Mapping resolution | Low | High |
| Identified markers linked to QTL/gene | Few to several centimorgans away from the gene/QTL | Much closer than those identified by linkage mapping |

Source: Yu et al, 2006

Important Terminologies
- False negative: the declaration of an outcome as statistically non-significant when the effect is actually genuine.
- False positive: the declaration of an outcome as statistically significant when there is no true effect.
- Linkage: the coinheritance of different loci within a genetic distance on the chromosome.

GENOME WIDE ASSOCIATION STUDY
- The markers used for genotyping are distributed, preferably densely and evenly, over the whole genome.
- All the loci involved in the control of traits showing variation in the sample can be evaluated in one go.
- Identifies markers much closer to the trait of interest.
- Discovers genotype-phenotype associations.

Why can GWAS be more successful in plants?

What is linkage disequilibrium and why does it matter?
- Jennings described the LD concept in 1917, and Lewontin developed a quantification of LD in 1964.
- The non-random association of alleles at different loci is known as linkage disequilibrium.
- The power of an association study depends on the strength of this association.
- The strength of the correlation between a marker and a trait locus is a function of the distance between them: the closer they are, the stronger the LD.

LD decay
- LD decay is the rate of return to random association between two given alleles. The higher the recombination rate, the more rapidly LD decays.
- The resolution with which a QTL can be mapped is a function of how quickly LD decays over distance.
Figure: Decay of linkage disequilibrium with time for four different recombination fractions (θ) (Mackay and Powell, 2007).
Figure: LD decay plot for a hypothetical locus.
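The decay of LD with time can be sketched with the standard population-genetics recursion $D_t = D_0 (1 - \theta)^t$, which holds for a large random-mating population. The slide does not state this formula or the four recombination fractions plotted in its figure, so the θ values and the starting value of D below are illustrative assumptions only.

```python
def ld_after_generations(d0: float, theta: float, generations: int) -> float:
    """Expected LD coefficient D after some generations of random mating.

    Uses D_t = D_0 * (1 - theta)**t, where theta is the recombination
    fraction between the two loci (0 <= theta <= 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - theta) ** generations


if __name__ == "__main__":
    d0 = 0.25  # assumed starting LD, the maximum possible value of D
    # Hypothetical recombination fractions, not the ones used in the slide.
    for theta in (0.5, 0.1, 0.01, 0.001):
        trajectory = [ld_after_generations(d0, theta, t) for t in (0, 10, 50, 100)]
        print(theta, [round(d, 4) for d in trajectory])
```

Unlinked loci (θ = 0.5) lose half of their LD every generation, while tightly linked loci retain it for many generations. This is why markers in LD with a QTL tend to lie close to it.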

Useful LD
- The level of LD that is useful for association mapping.
- $D = P_{AB} - P_A P_B$
- $D'$ and $r^2$ are the most widely used estimates of LD.
- $D'$ ranges from 0 to 1: $D' = 0$ means no LD; $D' = 1$ means complete LD.
- $r^2$ ranges from 0 to 1: $r^2 = 0$ means complete linkage equilibrium; $r^2 = 1$ means complete linkage disequilibrium.
- $r^2 \geq 0.33$ is considered useful for LD mapping.
Figure: Biparental population v/s natural population.
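As a worked illustration of the three LD measures, here is a minimal sketch that computes D, D' and r² from one haplotype frequency and two allele frequencies. The slide gives only the formula for D. The normalisations used for D' (dividing |D| by its maximum attainable value) and for r² (dividing D² by the product of the four allele frequencies) are the standard textbook definitions and are assumed here. The function name and the example frequencies are hypothetical.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b

    # The largest |D| attainable for these allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies: allele A only ever occurs together with allele B.
    d, d_prime, r2 = ld_measures(p_ab=0.3, p_a=0.3, p_b=0.6)
    print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

In this example D' is at its maximum while r² stays below the 0.33 threshold. The two measures can disagree when allele frequencies at the two loci differ.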

Factors affecting LD and Association Mapping
Increasing LD
- Mating system (self-pollination)
- Population structure and relatedness (kinship)
- Small population size
- Admixture
- Selection
Decreasing LD
- Out-crossing
- High recombination rate
- High mutation rate
- Gene conversion
Source: Huttley et al, 2005

Analysis for population structure and kinship
Population structure
- Population structure signifies that individuals in a population do not form a single homogeneous group; they are distributed in a few to several distinct subgroups that show different gene frequencies.
- Population structure arises from geographical isolation and from natural and artificial selection.
- Population structure therefore generates LD between unlinked loci and tends to increase the likelihood of discovering false positive associations.
- Population structure of the sample can be estimated using the STRUCTURE program.
- The GLM, MLM and similar models for AM minimize the effects of population structure.
Kinship
- Kinship refers to relatedness between different pairs of individuals/lines of the sample.
- Kinship among the individuals of the sample can be estimated using the TASSEL program.
- TASSEL estimates the kinship coefficient as the proportion of alleles that are identical between each pair of lines/individuals in the sample.
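The idea of kinship as "the proportion of alleles that are identical between each pair of lines" can be sketched as follows. This is a simple identity-by-state calculation written for illustration. It is not TASSEL's code and may differ from the estimator TASSEL actually applies. Genotypes are assumed to be coded as the number of copies (0, 1 or 2) of one allele at each biallelic marker, with `None` for a missing call. The line names and genotypes are made up.

```python
from itertools import combinations


def ibs_kinship(genotypes: dict[str, list]) -> dict[tuple[str, str], float]:
    """Pairwise proportion of shared alleles, averaged over markers.

    genotypes maps a line name to its allele counts (0, 1, 2 or None)
    at the same ordered set of biallelic markers.
    """
    names = list(genotypes)
    kinship = {(name, name): 1.0 for name in names}
    for a, b in combinations(names, 2):
        shared = [
            1.0 - abs(x - y) / 2.0  # 1 = both alleles shared, 0 = none shared
            for x, y in zip(genotypes[a], genotypes[b])
            if x is not None and y is not None  # skip markers with a missing call
        ]
        value = sum(shared) / len(shared) if shared else float("nan")
        kinship[(a, b)] = kinship[(b, a)] = value
    return kinship


if __name__ == "__main__":
    # Hypothetical inbred lines genotyped at four markers.
    lines = {
        "L1": [0, 2, 2, 0],
        "L2": [0, 2, 0, 0],
        "L3": [2, 0, 0, 2],
    }
    for pair, value in sorted(ibs_kinship(lines).items()):
        print(pair, round(value, 2))
```

The resulting matrix plays the role of the K matrix used by the mixed models described in this presentation.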

Experimental designs and models for association mapping
Several methods have been used to control population structure and kinship in AM:
- Genomic control
- Structured association
- Mixed models
- Principal component analysis
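Of the methods listed, genomic control is the simplest to illustrate. The slide only names it, so the following is a sketch of the usual formulation rather than anything taken from the presentation. The inflation factor λ is estimated as the median of the observed 1-df chi-square test statistics divided by the median expected under the null (about 0.455), and every statistic is then divided by λ. The function name and the example statistics are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats: list[float]) -> tuple[float, list[float]]:
    """Return (lambda, corrected statistics) for 1-df chi-square tests."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumed convention: only correct for inflation, never for deflation.
    lam = max(lam, 1.0)
    return lam, [stat / lam for stat in chi2_stats]


if __name__ == "__main__":
    # Made-up statistics from a structured sample: most markers are mildly
    # inflated, and one marker stands out.
    observed = [0.8, 1.1, 0.9, 1.3, 0.7, 25.0, 1.0]
    lam, corrected = genomic_control(observed)
    print("lambda =", round(lam, 2))
    print([round(stat, 2) for stat in corrected])
```

Because the same λ is applied to every marker, genomic control shrinks all signals uniformly. Structured association and mixed models instead model the structure directly.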

Experimental designs and models for association mapping

| Design | Features | Remark |
|---|---|---|
| Structured association | Designed to minimize the effects of population structure; one version is the general linear model (GLM) | GLM implemented in TASSEL |
| Mixed linear model (MLM) | Designed to minimize the effects of population structure and kinship; markers and Q are treated as fixed effects, while background QTLs are treated as random effects | Uses K or both Q and K matrices; EMMA is an improved version of the mixed model |
| Multilocus mixed model (MLMM) | Multiple loci used as cofactors in the model; uses stepwise mixed model regression for the selection of loci and an approximate version of the mixed model for correction of population structure | More QTL detection power and lower FDR than single-locus tests |
| Multitrait mixed model (MTMM) | Simultaneous analysis of two or more correlated traits using the mixed model; separates genetic and environmental correlations and corrects for population structure | More power than single-trait models when the traits are correlated; otherwise, lower power |
| Joint linkage association mapping | Analysis of a sample drawn from a natural population and the open-pollinated progeny from this sample | Uses both LD and linkage analysis |
| Nested association mapping (NAM) | LD and linkage mapping in NAM populations | Higher power than AM alone |

Source: Marker Assisted Plant Breeding: Principles and Practices, B.D. Singh and A.K. Singh
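To show why treating population structure as a fixed effect matters, here is a deliberately simplified sketch. It assumes each line belongs to exactly one subpopulation, whereas a Q matrix from STRUCTURE holds fractional memberships. Under that assumption, fitting subpopulation as a categorical fixed effect is equivalent to centring both marker and trait within each subpopulation before testing their association. All names and data below are hypothetical, and the correlation shown is a stand-in for the full GLM test.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def center_within_groups(values: list[float], groups: list[str]) -> list[float]:
    """Subtract each subpopulation's mean from its members' values."""
    totals, counts = defaultdict(float), defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return [value - totals[group] / counts[group] for value, group in zip(values, groups)]


def marker_trait_r(marker, trait, groups=None) -> float:
    """Marker-trait correlation, optionally adjusted for subpopulation."""
    if groups is not None:
        marker = center_within_groups(marker, groups)
        trait = center_within_groups(trait, groups)
    return pearson_r(marker, trait)


if __name__ == "__main__":
    # Two subpopulations that differ in allele frequency and in trait mean,
    # with no marker effect inside either subpopulation.
    groups = ["A"] * 4 + ["B"] * 4
    marker = [0, 0, 0, 2, 2, 2, 2, 0]
    trait = [10, 11, 12, 11, 20, 21, 22, 21]
    print("naive r:   ", round(marker_trait_r(marker, trait), 3))
    print("adjusted r:", round(marker_trait_r(marker, trait, groups), 3))
```

The naive correlation is clearly positive even though the marker has no effect within either group, and it disappears once structure is accounted for. This is the kind of false positive the models in the table are meant to prevent.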

List of Software Used in Association Analysis

Result from a GWAS: Manhattan Plot

Challenges – GWAS
- Markers with a frequency below 5% are excluded from the analysis, which eliminates the chance of discovering rare alleles.
- Synthetic associations are misleading associations that occur when GWAS identifies noncausal SNPs as more significant than the truly causal variants.
- Different GWAS methods often yield similar but nonidentical results, which is a subject of ongoing investigation.
- Population structure needs to be addressed very carefully; otherwise the number of false positives increases.
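The 5% frequency cut-off mentioned above can be sketched as a minor allele frequency (MAF) filter. The coding of genotypes as allele counts (0, 1 or 2, with `None` for a missing call), the function names and the example markers are assumptions made for illustration.

```python
def minor_allele_frequency(genotypes: list) -> float:
    """MAF of one biallelic marker from per-line allele counts (0, 1, 2 or None)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1.0 - freq)


def filter_by_maf(markers: dict[str, list], threshold: float = 0.05):
    """Split markers into (kept, dropped) according to the MAF threshold."""
    kept, dropped = {}, {}
    for name, genotypes in markers.items():
        target = kept if minor_allele_frequency(genotypes) >= threshold else dropped
        target[name] = genotypes
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical panel of 20 lines genotyped at two markers.
    markers = {
        "rare_snp": [0] * 19 + [1],     # one allele copy out of 40
        "common_snp": [0, 1, 2, 1] * 5,  # 20 allele copies out of 40
    }
    kept, dropped = filter_by_maf(markers)
    print("kept:", list(kept))
    print("dropped:", list(dropped))
```

The rare marker is discarded before any association test is run. An allele carried by a single line in the panel therefore cannot be detected, however large its effect.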

OPPORTUNITIES – GWAS
- Populations for the studies are samples from existing materials.
- QTL-linked markers can be used directly for MAS.
- Provides high resolution.
- Candidate gene prioritization methods help in moving from GWAS results to biological understanding.
- Continued methodology development in GWAS is needed; funding support for methodology development and software implementation benefits a wide range of research disciplines.

Challenges and opportunities in genome-wide association studies occur at each step. The challenges arise from the complex interplay of biology and statistics. Surmounting these challenges will provide new opportunities for understanding and application.

SUCCESS STORIES
- In humans, this approach has identified SNPs associated with several complex conditions, including diabetes, heart disease, Parkinson disease, and Crohn disease. SNPs have also been associated with a person's response to certain drugs and with susceptibility to certain environmental factors such as toxins. Researchers hope that future genome-wide association studies will identify additional SNPs associated with chronic diseases and drug effects.
- In crop plants, AM has been successfully used in Arabidopsis, maize, rice and various other crops for traits such as flowering time, plant height, yield, resistance against pathogens, and growth response.

Genome Wide Studies In Humans

Examples of Links Between GWAS Discoveries and Drugs

Current status of association mapping in plants

| Plant species | Populations | Sample size | Background markers | Traits | Reference |
|---|---|---|---|---|---|
| Maize | Diverse inbred lines | 92 | 141 | Flowering time | Thornsberry et al., 2001 |
| Maize | Elite inbred lines | 71 | 55 | Flowering time | Andersen et al., 2005 |
| Maize | Diverse inbred lines and landraces | 375 + 275 | 55 | Flowering time | Camus-Kulandaivelu et al., 2006 |
| Maize | Diverse inbred lines | 95 | 192 | Flowering time | Salvi, 2007 |
| Maize | Diverse inbred lines | 102 | 47 | Kernel composition, starch pasting properties | Wilson et al., 2004 |
| Maize | Diverse inbred lines | 86 | 141 | Maysin synthesis | Szalma et al., 2005 |
| Maize | Elite inbred lines | 75 | 151 | Kernel color | Palaisa et al., 2004 |
| Maize | Diverse inbred lines | 57 | 120 | Sweet taste | Tracy et al., 2006 |
| Maize | Elite inbred lines | 553 | 8950 | Oleic acid content | Belo et al., 2008 |
| Maize | Diverse inbred lines | 282 | 553 | Carotenoid content | Harjes et al., 2008 |
| Sorghum | Diverse inbred lines | 377 | 47 | Community resource report | Casa et al., 2018 |
| Wheat | Diverse cultivars | 95 | 93 | Kernel size, milling quality | Breseghello and Sorrells, 2016 |

| Plant species | Populations | Sample size | Background markers | Traits | Reference |
|---|---|---|---|---|---|
| Arabidopsis | Diverse ecotypes | 95 | 104 | Flowering time | Olsen et al., 2004 |
| Arabidopsis | Diverse ecotypes | 95 | 2553 | Disease resistance, flowering time | Aranzana et al., 2005; Zhao et al., 2007 |
| Arabidopsis | Diverse accessions | 96 | 90 | Shoot branching | Ehrenreich et al., 2007 |
| Barley | Diverse cultivars | 148 | 139 | Days to heading, leaf rust, yellow dwarf virus | Kraakman et al., 2017 |
| Potato | Diverse cultivars | 123 | 49 | Late blight resistance | Malosetti et al., 2007 |
| Rice | Diverse land races | 105 | 124 | Glutinous phenotype | Olsen and Purugganan, 2002 |
| Rice | Diverse land races | 577 | 577 | Starch quality | Bao et al., 2006 |
| Rice | Diverse accessions | 103 | 123 | Yield and its components | Agrama et al., 2018 |
| Sugarcane | Diverse clones | 154 | 2209 | Disease resistance | Wei et al., 2006 |
| Chickpea | Diverse accessions | 300 | 1872 | Drought tolerance | Thudi et al., 2014 |
| Soybean | Diverse accessions | 305 | 37573 | Salt tolerance | Tuyen et al., 2019 |

CONCLUSION
- Association mapping platforms are being developed for multiple plant species.
- Studies on the established association mapping panels will generate valuable information for the future and a better understanding of the genetic and statistical aspects of association mapping.
- Theoretical studies that closely track empirical results will provide valuable general guidelines for association mapping.
- Genetic diversity and phenotyping are expected to gain further attention as researchers become more aware of their importance.
- Eventually, research will move beyond flowering time and plant height toward other traits that have economic and evolutionary value.
- Superior allele mining for trait improvement will be greatly facilitated by synergy among the research groups involved in different aspects of association mapping.

ACKNOWLEDGEMENTS: Dr. Indu Rialch, Dr. Dharminder Bhatia