GGWS_M3_L5_Estimation_of_heritability_from_GWAS_summary_statistics.pptx

bhagwatbiotech 69 views 46 slides May 28, 2024



Slide Content

Estimation of Heritability from GWAS Summary Statistics
Genetics & Genomics Winter School
A/Prof Loic Yengo, Institute for Molecular Bioscience, The University of Queensland

Outline
- Overview of Genome-Wide Association Studies (GWAS)
- Linkage Disequilibrium (LD) Score Regression
- Other methods
- Partitioned heritability


Genome-Wide Association Studies (Quantitative Traits)
Association = mean differences between genotypes
Genome-Wide = test for a large number of variants
[Figure: alleles A, G and T; Chailurkit et al., PeerJ 2022]

Genome-Wide Association Studies (Binary/Disease Traits)
Association = allele frequency differences between cases and controls
Example: more T alleles in cases, fewer T alleles in controls
Large sample sizes are required…

Manhattan plot
Mapping the human genetic architecture of COVID-19 (COVID-19 host genetics consortium, Nature 2021)
- Detected genetic associations implicate interferon genes
- 125,584 cases vs 2.5 M controls
- 60 studies from 25 countries
- 26 associations detected

Another popular plot in GWAS = QQ-plot

GWAS summary statistics
Summary statistics come in different flavours.
Minimum available:
- SNP ID (e.g., rs number or chromosome:position:genome build)
- Alleles tested (e.g., effect allele / non-effect allele)
- Allele frequency
- Marginal SNP effect (a.k.a. "BETA")
- Standard error
- Per-SNP sample size
- P-value
More data sometimes:
- Imputation accuracy
- Genotype frequencies
- Frequencies in cases and controls
- Hardy-Weinberg equilibrium test statistics
Most GWAS are conducted using regression methods: linear / logistic (mixed) models.

GWAS summary statistics
Sharing summary-level data and making them publicly available has become standard when publishing a GWAS (Nat Genet editorial, July 2012).


Challenges with GWAS summary statistics
- The test used is not always specified
- Sample size may (substantially) vary across SNPs (consortium / imputation)
- Imputation accuracy is not always available (effective sample size)
- Summary statistics may be truncated (identifiability issue) => creates noise
- Reported allele frequencies may not always match those of the individuals in the GWAS

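Notations and Nomenclature
Estimated effect of SNP $j$: $\hat{\beta}_j$
Standard error of $\hat{\beta}_j$: $\mathrm{SE}(\hat{\beta}_j)$
Z-score of SNP $j$: $Z_j = \hat{\beta}_j / \mathrm{SE}(\hat{\beta}_j)$
Chi-square of SNP $j$: $\chi^2_j = Z_j^2$
Under the null hypothesis of no association, this statistic is expected to follow, asymptotically (i.e., as the sample size grows large), a $\chi^2$ distribution with 1 degree of freedom.
Genomic control: $\lambda_{GC} = \mathrm{median}(\chi^2_j) / 0.456$
Large values (i.e., > 1.1) may indicate confounding due to population stratification.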
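The notation above can be turned into a few lines of code. This is a minimal sketch, assuming the effect estimates and standard errors are already loaded as arrays; the function names and the toy values are illustrative only and do not come from the lecture.

```python
import numpy as np

def gwas_test_statistics(beta_hat, se):
    """Return per-SNP Z-scores and chi-square statistics (Z squared)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    z = beta_hat / se
    return z, z ** 2

def genomic_control_lambda(chi2, null_median=0.456):
    """Median chi-square divided by the median of a 1-df chi-square (0.456 on the slide)."""
    return float(np.median(chi2) / null_median)

# Toy values (hypothetical), chosen only so the arithmetic is easy to follow.
beta_hat = np.array([0.02, -0.01, 0.05, 0.00, -0.03])
se = np.array([0.01, 0.01, 0.02, 0.01, 0.01])

z, chi2 = gwas_test_statistics(beta_hat, se)
lambda_gc = genomic_control_lambda(chi2)
```

With only five SNPs the value of `lambda_gc` is meaningless; in practice the median is taken over all SNPs genome-wide.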


LD score regression
Initial motivation: distinguish polygenicity from confounding (e.g., due to population stratification)
Extension(s):
- Estimation of SNP-based heritability and genetic correlations (Bulik-Sullivan 2014, 2015)
- Functional enrichment (Finucane 2015)
- Estimation of polygenicity
- Etc.
Credit to Bulik-Sullivan (online lecture)

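LD scores
LD score of SNP $j$: $\ell_j = \sum_k r^2_{jk}$, where $r^2_{jk}$ is the squared correlation between SNP $j$ and SNP $k$, summed over the SNPs $k$ in a window around SNP $j$ (the figure shows 100 Kb on either side, in Population 1).
Credit to Bulik-Sullivan (online lecture)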
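As an illustration of the LD score definition (sum of squared correlations with nearby SNPs), here is a minimal sketch that computes LD scores from a small genotype matrix. The function name, the 100 Kb default window and the toy genotypes are assumptions made for the example. The LDSC software additionally applies a small-sample correction to each $r^2$ and usually defines windows in centimorgans; neither is done here.

```python
import numpy as np

def ld_scores(genotypes, positions, window_bp=100_000):
    """LD score of each SNP: sum of r^2 with all SNPs within window_bp (itself included).

    genotypes: (n_individuals, n_snps) array of allele counts (0/1/2).
    positions: (n_snps,) array of base-pair positions on one chromosome.
    """
    g = np.asarray(genotypes, dtype=float)
    positions = np.asarray(positions)
    # Centre and scale each SNP so that the cross-product gives correlations.
    g = (g - g.mean(axis=0)) / g.std(axis=0)
    r = (g.T @ g) / g.shape[0]
    in_window = np.abs(positions[:, None] - positions[None, :]) <= window_bp
    return (r ** 2 * in_window).sum(axis=1)

# Toy data (hypothetical): SNP 0 and SNP 1 are in perfect LD, SNP 2 is far away.
genotypes = np.array([
    [0, 0, 1],
    [1, 1, 0],
    [2, 2, 1],
    [1, 1, 2],
    [0, 0, 1],
    [2, 2, 0],
])
positions = np.array([1_000, 50_000, 500_000])
scores = ld_scores(genotypes, positions)
```

SNPs 0 and 1 each tag themselves and each other, while SNP 2 only tags itself because no other SNP falls inside its window.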

Under genetic drift…
Credit to Bulik-Sullivan (online lecture)

…the more you tag, the more likely you are to tag a causal variant!
Key assumption: each SNP explains the same amount of trait variance.
Credit to Bulik-Sullivan (online lecture)

Simulated polygenicity
Credit to Bulik-Sullivan (online lecture)

Simulated population stratification (UK vs Sweden)

LD score regression theory
$E[\chi^2_j \mid \ell_j] = \frac{N h^2}{M}\,\ell_j + N a + 1$
$N$ is the GWAS sample size.
$h^2/M$ is the average heritability explained per SNP.
$N a + 1$ is the LD score regression intercept. Deviations from 1 indicate confounding.
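A worked numerical example may help connect the regression slope to heritability. The sketch below assumes the standard form of the LD score regression equation from Bulik-Sullivan et al., $E[\chi^2_j] = N h^2 \ell_j / M + N a + 1$; the function names and the values of `n`, `m` and `h2` are made up for illustration, and the chi-square values are noise-free expectations rather than real data.

```python
import numpy as np

def expected_chi2(ld_score, n, h2, m, a=0.0):
    """Expected chi-square under LD score regression: slope N*h2/M, intercept N*a + 1."""
    return n * h2 / m * np.asarray(ld_score, dtype=float) + n * a + 1.0

def h2_from_slope(slope, n, m):
    """Invert slope = N * h2 / M to recover the SNP-based heritability."""
    return slope * m / n

# Hypothetical study: 50,000 individuals, 1,000,000 SNPs, h2 = 0.5, no confounding.
n, m, h2 = 50_000, 1_000_000, 0.5
ld = np.array([1.0, 10.0, 50.0, 100.0, 200.0])
chi2 = expected_chi2(ld, n, h2, m)

# np.polyfit returns coefficients from the highest degree down: slope, then intercept.
slope, intercept = np.polyfit(ld, chi2, deg=1)
h2_hat = h2_from_slope(slope, n, m)
```

Here the slope is 0.025 per unit of LD score, so a SNP with LD score 200 has an expected chi-square of 6 even though there is no confounding (intercept of 1).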

Proof (more details in the Supplementary Note of Bulik-Sullivan et al. 2014)

Key ideas behind the proof 1) Population stratification Model 2) Heritability Model

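Stratification Model
$F_{ST}$ model (Balding-Nichols): an ancestral population with allele frequency $p$ gives rise to derived populations 1 and 2 with allele frequencies $p_1$ and $p_2$.
The phenotypic mean is shifted by $+S/2$ in one derived population and $-S/2$ in the other: mean difference = $S$.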
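The Balding-Nichols model can be simulated in a few lines. This is a sketch under the usual parameterisation, in which each derived-population frequency is drawn from a Beta distribution with mean $p$ and variance $F_{ST}\,p(1-p)$; the function name and all numerical values are assumptions chosen for illustration.

```python
import numpy as np

def balding_nichols_frequencies(p_ancestral, fst, n_pops, rng):
    """Draw derived-population allele frequencies around the ancestral frequencies.

    Each frequency is Beta(p(1-F)/F, (1-p)(1-F)/F), which has mean p and
    variance F * p * (1 - p).
    """
    p = np.asarray(p_ancestral, dtype=float)
    scale = (1.0 - fst) / fst
    return rng.beta(p * scale, (1.0 - p) * scale, size=(n_pops, p.size))

rng = np.random.default_rng(1)
fst = 0.01
p_ancestral = np.full(20_000, 0.3)           # same ancestral frequency at every SNP
freqs = balding_nichols_frequencies(p_ancestral, fst, n_pops=2, rng=rng)

# Genotypes of 100 individuals from derived population 1 (allele counts 0/1/2).
genotypes_pop1 = rng.binomial(2, freqs[0], size=(100, freqs.shape[1]))
```

Across SNPs, the drawn frequencies average to the ancestral frequency, and their spread grows with $F_{ST}$, which is what creates allele frequency differences between the two derived populations.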

What does it look like?
PC1 explains most of the variance.

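Heritability Model
$y_i = \sum_{j=1}^{M} x_{ij}\beta_j + e_i$, with $E[\beta_j] = 0$ and $\mathrm{var}(\beta_j) = h^2/M$
All SNPs contribute equally to the trait heritability.
Under this assumption (+ genotypes centred and scaled): $E[\chi^2_j \mid \ell_j] = \frac{N h^2}{M}\,\ell_j + 1$
You can complete the proof…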

LD score regression in practice
Regress the $\chi^2_j$'s on the $\ell_j$'s.
Use weights to account for:
- High-LD-score SNPs contributing too much
- Heteroskedasticity (i.e., residuals don't have the same variance)
Block jackknife to assess standard errors (300 blocks).
Practical 4 will use the LDSC software.
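The steps above (weighted regression, then block jackknife) can be sketched on simulated data. This is not the LDSC implementation: the weights below follow the general idea of down-weighting high-LD-score SNPs and SNPs with large expected variance, the SNPs are simulated as independent so blocks are just consecutive index ranges, and every name and parameter value is an assumption made for the example.

```python
import numpy as np

def weighted_fit(chi2, ld, weights):
    """Weighted least squares of chi2 on LD score; returns (intercept, slope)."""
    design = np.column_stack([np.ones_like(ld), ld])
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], chi2 * root_w, rcond=None)
    return coef

def block_jackknife(chi2, ld, weights, n_blocks=300):
    """Leave-one-block-out jackknife estimate and standard error of (intercept, slope)."""
    n_snps = len(chi2)
    full = weighted_fit(chi2, ld, weights)
    pseudo_values = np.empty((n_blocks, 2))
    for b, block in enumerate(np.array_split(np.arange(n_snps), n_blocks)):
        keep = np.ones(n_snps, dtype=bool)
        keep[block] = False
        leave_one_out = weighted_fit(chi2[keep], ld[keep], weights[keep])
        pseudo_values[b] = n_blocks * full - (n_blocks - 1) * leave_one_out
    estimate = pseudo_values.mean(axis=0)
    std_error = np.sqrt(pseudo_values.var(axis=0, ddof=1) / n_blocks)
    return estimate, std_error

# Simulated summary statistics (hypothetical values): E[chi2] = 1 + N * h2 * ld / M.
rng = np.random.default_rng(2024)
n, m, h2_true = 20_000, 20_000, 0.4
ld = rng.uniform(1.0, 50.0, size=m)
chi2 = (1.0 + n * h2_true / m * ld) * rng.chisquare(1, size=m)

# Crude first-pass h2, used only to build the heteroskedasticity part of the weights.
h2_init = float(np.clip(m * (chi2.mean() - 1.0) / (n * ld.mean()), 0.0, 1.0))
weights = 1.0 / (ld * (1.0 + n * h2_init * ld / m) ** 2)

(intercept_hat, slope_hat), (intercept_se, slope_se) = block_jackknife(chi2, ld, weights)
h2_hat, h2_se = slope_hat * m / n, slope_se * m / n
```

With real data the jackknife blocks would be contiguous genomic segments, so that correlated neighbouring SNPs are dropped together.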

Caveat
What is the correct "M"?
[Figure: LD score window of 100 Kb on either side of the SNP, Population 1]
LDSC estimates of heritability are biased (yet still useful)!

Estimation of genetic correlations (more details in the Supplementary Note of Bulik-Sullivan et al. 2014)


Formally: Bulik-Sullivan et al. (2015)
$E[z_{1j} z_{2j}] = \sqrt{N_1 N_2}\; r_g \sqrt{\frac{h_1^2}{M}\cdot\frac{h_2^2}{M}}\;\ell_j + \frac{N_s\, r_p}{\sqrt{N_1 N_2}}$
$N_1$ and $N_2$ are the GWAS sample sizes of study/trait 1 and 2.
$N_s$ is the number of participants overlapping study 1 and 2.
$h_k^2/M$ is the average heritability explained per SNP for trait $k$.
$r_p$ is the phenotypic correlation between trait 1 and 2.
$r_g$ is the genetic correlation (i.e., correlation between true effects of scaled genotypes).
Estimation uses weighted least-squares:
- Step 1: two univariate analyses
- Step 2: estimation of the genetic covariance
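To make the two-step logic concrete, the sketch below uses the cross-trait LD score regression equation of Bulik-Sullivan et al. (2015), in which the slope of $z_{1j} z_{2j}$ on $\ell_j$ is $\sqrt{N_1 N_2}\, r_g \sqrt{h_1^2 h_2^2}/M$ and the intercept is $N_s r_p / \sqrt{N_1 N_2}$. The inputs are noise-free expectations, the heritabilities are treated as known from step 1, and all names and values are hypothetical.

```python
import numpy as np

def expected_z1z2(ld, n1, n2, n_overlap, h2_1, h2_2, rg, rp, m):
    """Expected product of Z-scores for two traits as a function of LD score."""
    slope = np.sqrt(n1 * n2) * rg * np.sqrt(h2_1 * h2_2) / m
    intercept = n_overlap * rp / np.sqrt(n1 * n2)
    return slope * np.asarray(ld, dtype=float) + intercept

def rg_from_slope(slope, n1, n2, h2_1, h2_2, m):
    """Step 2: turn the cross-trait slope into a genetic correlation."""
    genetic_covariance = slope * m / np.sqrt(n1 * n2)
    return genetic_covariance / np.sqrt(h2_1 * h2_2)

# Hypothetical studies: partial sample overlap, true genetic correlation of 0.6.
n1, n2, n_overlap, m = 100_000, 60_000, 20_000, 1_000_000
h2_1, h2_2, rg_true, rp = 0.5, 0.3, 0.6, 0.4
ld = np.array([1.0, 20.0, 80.0, 150.0, 300.0])

z1z2 = expected_z1z2(ld, n1, n2, n_overlap, h2_1, h2_2, rg_true, rp, m)
slope, intercept = np.polyfit(ld, z1z2, deg=1)
rg_hat = rg_from_slope(slope, n1, n2, h2_1, h2_2, m)
```

Sample overlap only moves the intercept, which is why the slope (and hence the genetic correlation) can still be estimated when the two GWAS share participants.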

Formally: Yengo, Yang & Visscher (2018)
The additional term only matters when N is large!

Browsing genetic correlation in UK Biobank
https://ukbb-rg.hail.is/

Other methods

HDL method
- Same heritability model as LD score regression
- Lower standard errors (SE): SE are reduced by ~3-fold!
- Splits the genome into 1700 independent LD blocks
- SE are still estimated using block jackknife

Generalized Random Effect (GRE)
Minimal assumptions about the distribution of SNP effects
The estimator combines the marginal effects with the inverse of the in-sample LD matrix.

Bayesian Models (e.g., SBayesC)
Prior distribution: $\beta_j \sim \pi\,N(0, \sigma^2_\beta) + (1-\pi)\,\delta_0$
$\pi$: proportion of SNPs with non-zero effects
$\delta_0$: Dirac point mass at 0
Prior distribution + data (GWAS summary statistics) = posterior distribution (Bayes' rule)
Inference is based on Markov chain Monte Carlo (MCMC) sampling (Lecture 6)
Standard errors are obtained from sampling the posterior distribution
A bit more computationally intensive
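The mixture prior with a point mass at zero can be illustrated by drawing SNP effects from it. This sketch only samples from the prior; it does not perform the MCMC inference that SBayesC relies on. The function name and the values of `pi` and `sigma2` are assumptions made for the example.

```python
import numpy as np

def sample_spike_and_slab(n_snps, pi, sigma2, rng):
    """Draw effects that are N(0, sigma2) with probability pi and exactly 0 otherwise."""
    is_nonzero = rng.random(n_snps) < pi
    slab = rng.normal(0.0, np.sqrt(sigma2), size=n_snps)
    return np.where(is_nonzero, slab, 0.0)

rng = np.random.default_rng(0)
beta = sample_spike_and_slab(n_snps=100_000, pi=0.01, sigma2=1e-4, rng=rng)

# Proportion of SNPs with a non-zero effect; should be close to pi.
prop_nonzero = float(np.mean(beta != 0.0))
```

The posterior distribution of `pi` given the summary statistics is what provides an estimate of polygenicity in this family of models.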

Bayesian Model (SBayesS)

SumHer method
A different heritability model, in which the expected heritability contributed by each SNP depends on its allele frequency and on the local LD level.

Extensions of LD score regression

Partitioned Heritability
Model: $E[\chi^2_j] = N \sum_c \tau_c\,\ell(j,c) + N a + 1$
$\tau_c$: effect of annotation $c$ on heritability
$\ell(j,c)$: LD between SNP $j$ and SNPs with that annotation
Applications:
- Quantify enrichment of heritability in certain annotations
- Prioritize tissues/cell types
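Partitioned heritability replaces the single LD score with one LD score per annotation, so the regression becomes a multiple regression. The sketch below assumes the stratified LD score regression form of Finucane et al. (2015), $E[\chi^2_j] = N \sum_c \tau_c\,\ell(j,c) + N a + 1$, and fits it by ordinary least squares on noise-free simulated values. The function name, the three annotations and the $\tau_c$ values are hypothetical, and the weighting and jackknife used in practice are omitted.

```python
import numpy as np

def fit_partitioned_ldsc(chi2, annot_ld, n):
    """Regress chi2 on annotation-specific LD scores; returns (intercept, tau per annotation)."""
    design = np.column_stack([np.ones(len(chi2)), annot_ld])
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    # Each fitted slope equals N * tau_c, so divide by N to recover tau_c.
    return coef[0], coef[1:] / n

rng = np.random.default_rng(7)
n = 50_000
annot_ld = rng.uniform(0.0, 100.0, size=(500, 3))   # l(j, c) for 500 SNPs, 3 annotations
tau_true = np.array([2e-7, 5e-8, 1e-6])
chi2 = 1.0 + n * annot_ld @ tau_true                 # expected values, no confounding

intercept_hat, tau_hat = fit_partitioned_ldsc(chi2, annot_ld, n)
```

An annotation is enriched when the share of heritability attributed to it exceeds the share of SNPs it contains; here the third annotation has the largest per-SNP contribution.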

A few more extensions
- Estimation of genetic correlation between populations (POPCORN method; Brown et al. AJHG 2016)
- Estimation of the % of heritability mediated by gene expression (Yao et al. Nat Genet 2020)
- Estimation of polygenicity of traits (O'Connor et al. AJHG 2019)
- Causal inference (O'Connor & Price, Nat Genet 2018)

Summary and conclusions
- The variation in strength of association across SNPs depends on local LD and heritability
- That variation can be leveraged to estimate heritability using a method like LD score regression (although biased)
- Genetic correlations can be estimated similarly (no bias)
- Other methods exist (different heritability model, Bayesian, etc.)