Ortholog assignment

melvinzhang 2,600 views 32 slides May 03, 2011

Slide Content

Computational Prediction of Orthologs
Melvin Zhang
School of Computing,
National University of Singapore
May 4, 2011

A gene is a unit of heredity in a living organism

One gene may encode multiple proteins

Two genes are homologous if they descended from a common ancestral gene [1]
In practice, homology is determined using sequence similarity.
Have you seen phrases like "... homology"?
[1] with respect to a specific speciation event

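Orthologs are due to speciation, paralogs are due to duplication
Figure: gene tree rooted at the MRCA of G and H, with a speciation node and a duplication node; genes g, h, and h′; relationships labeled main orthologs, orthologs, and paralogs.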

Orthologs maintain their function
Annotate genes with unknown functions.
Infer protein-protein interactions.

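Orthologs are not one-to-one due to lineage-specific gene duplications
Main orthologs
position. [2]
Figure: gene tree rooted at the MRCA of G and H, with a speciation node and a duplication node; genes g, h, and h′; relationships labeled main orthologs, orthologs, and paralogs.
[2] Burgetz et al., Evolutionary Bioinformatics 2006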

Problem of identifying main orthologs
Input
Output
For each gene in the common ancestor of G and H, find its direct descendant in G and H
Complications
- gene duplication
- gene loss
- horizontal gene transfer
- gene fusion, fission

Three main approaches for finding orthologs
Graph based
Tree based
Rearrangement based

Bidirectional Best Hit and variants
Most popular approach. High level of functional relatedness. [a]
Reciprocal smallest distance: use evolutionary distance estimate instead of BLAST scores
OMA stable pairs: introduce a tolerance interval and stable matching
[a] Altenhoff et al., PLoS CB 2009
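
The bidirectional best hit idea can be sketched in a few lines. This is a minimal illustration, not any published tool's implementation: the score table `toy_scores`, the gene names, and the rule that ties go to the first pair seen are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (gene in genome G, gene in genome H)


def bidirectional_best_hits(scores: Dict[Pair, float]) -> Set[Pair]:
    """Return the pairs (g, h) where h is g's best hit and g is h's best hit."""
    best_for_g: Dict[str, str] = {}
    best_for_h: Dict[str, str] = {}
    for (g, h), s in scores.items():
        # Strict '>' means ties keep the first pair seen (an assumption).
        if g not in best_for_g or s > scores[(g, best_for_g[g])]:
            best_for_g[g] = h
        if h not in best_for_h or s > scores[(best_for_h[h], h)]:
            best_for_h[h] = g
    return {(g, h) for g, h in best_for_g.items() if best_for_h[h] == g}


# Hypothetical similarity scores (higher is more similar).
toy_scores = {
    ("g1", "h1"): 90, ("g1", "h2"): 40,
    ("g2", "h1"): 70, ("g2", "h2"): 60,
    ("g3", "h2"): 65,
}
bbh_pairs = bidirectional_best_hits(toy_scores)
print(sorted(bbh_pairs))
```

Here g2 is left unpaired: its best hit is h1, but h1's best hit is g1, so the relation is not reciprocal.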

EnsemblCompara GeneTrees [3]
Based on reconciliation of gene trees with species tree.
[3] Vilella et al., Genome Res 2009
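
To make "reconciliation" concrete, here is a small sketch of the textbook LCA-mapping rule: each gene tree node is mapped to the lowest species tree node that contains all of its leaves, and a node is called a duplication when it maps to the same species node as one of its children. This is an illustration only and not the EnsemblCompara pipeline; the tree encoding, the names `reconcile` and `lca`, and the toy genes are assumptions.

```python
from typing import Dict, List, Optional, Tuple, Union

GeneTree = Union[str, tuple]  # a leaf is a gene name, an internal node is (left, right)


def lca(a: str, b: str, parent: Dict[str, Optional[str]]) -> str:
    """Lowest common ancestor of two species-tree nodes."""
    ancestors = set()
    node: Optional[str] = a
    while node is not None:
        ancestors.add(node)
        node = parent.get(node)
    while b not in ancestors:
        b = parent[b]
    return b


def leaves(tree: GeneTree) -> Tuple[str, ...]:
    return (tree,) if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def reconcile(tree: GeneTree, species_of: Dict[str, str],
              parent: Dict[str, Optional[str]],
              events: List[Tuple[Tuple[str, ...], str]]) -> str:
    """Map `tree` to a species-tree node and record an event per internal node."""
    if isinstance(tree, str):
        return species_of[tree]
    m_left = reconcile(tree[0], species_of, parent, events)
    m_right = reconcile(tree[1], species_of, parent, events)
    m = lca(m_left, m_right, parent)
    kind = "duplication" if m in (m_left, m_right) else "speciation"
    events.append((leaves(tree), kind))
    return m


# Toy species tree: human and mouse under a common root.
species_parent = {"human": "root", "mouse": "root", "root": None}
# Toy gene tree: g is a human gene, h and h2 are mouse genes.
gene_tree = ("g", ("h", "h2"))
gene_species = {"g": "human", "h": "mouse", "h2": "mouse"}

events: List[Tuple[Tuple[str, ...], str]] = []
root_mapping = reconcile(gene_tree, gene_species, species_parent, events)
print(events)
```

In this toy case the node joining h and h2 is labeled a duplication (both map to mouse), and the root is labeled a speciation, so g is orthologous to both h and h2 while h and h2 are paralogs.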

MSOAR2 [4]
(inversion, translocation, fusion, fission, duplication)
[4] Fu et al., JCB 2007

Can conserved gene neighborhood improve
ortholog predictions?

Human-mouse synteny blocks
Conserved synteny blocks between human and mouse genomes generated by the Cinteny web server [5]
[5] Sinha and Meller, BMC Bioinformatics 2007

Local synteny criteria [6]
Homology defined as BLASTP E-value < 1e-5.
94% of sampled inter-species pairs are identified as orthologs by Inparanoid (based on BBH) and local synteny criteria.
[6] Jin Jun et al., BMC Genomics 2009

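Local synteny score (LC)
Figure: neighborhoods of gene g in genome G and gene h in genome H, with edges joining matched genes.
The local synteny score of g and h is 4 since there are 4 edges in the maximum matching.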
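
The score described above is the size of a maximum matching, which can be computed with a standard augmenting-path search. The sketch below makes several assumptions that the slide does not state: the neighborhood is a window of `k` genes on each side, g and h themselves are included in the window, an edge joins two genes when the pair appears in a given `homologous` set, and gene names within a window are unique. All function and variable names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def max_bipartite_matching(left: List[str], right: List[str],
                           homologous: Set[Tuple[str, str]]) -> int:
    """Size of a maximum matching (Kuhn's augmenting-path algorithm)."""
    match_of_right: Dict[str, str] = {}

    def try_augment(u: str, visited: Set[str]) -> bool:
        for v in right:
            if (u, v) in homologous and v not in visited:
                visited.add(v)
                # Take v if it is free, or if its current partner can move elsewhere.
                if v not in match_of_right or try_augment(match_of_right[v], visited):
                    match_of_right[v] = u
                    return True
        return False

    return sum(try_augment(u, set()) for u in left)


def neighborhood(chromosome: List[str], gene: str, k: int) -> List[str]:
    """Genes within k positions of `gene`, including `gene` itself (assumption)."""
    i = chromosome.index(gene)
    return chromosome[max(0, i - k): i + k + 1]


def local_synteny_score(chrom_g: List[str], g: str, chrom_h: List[str], h: str,
                        homologous: Set[Tuple[str, str]], k: int = 2) -> int:
    return max_bipartite_matching(neighborhood(chrom_g, g, k),
                                  neighborhood(chrom_h, h, k), homologous)


# Toy gene orders and homology edges.
chrom_G = ["a1", "a2", "g", "a3", "a4"]
chrom_H = ["b1", "b2", "h", "b3", "b4"]
homologs = {("a1", "b1"), ("a2", "b1"), ("a2", "b2"), ("g", "h"), ("a3", "b4")}
lc_score = local_synteny_score(chrom_G, "g", chrom_H, "h", homologs)
print(lc_score)
```

In the toy data a1 and a2 compete for b1; the matching resolves this by pairing a2 with b2, giving 4 matched edges in total.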

Smith-Waterman alignment score (SW)
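
For reference, the Smith-Waterman score is the best local alignment score between two sequences, computed by dynamic programming. The sketch below is simplified on purpose: it uses a flat match/mismatch score and a linear gap penalty, whereas protein comparisons normally use a substitution matrix and affine gaps. The parameter values are illustrative assumptions, not the ones used in the talk.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


identical = smith_waterman_score("ACGT", "ACGT")
unrelated = smith_waterman_score("AAAA", "TTTT")
shared_core = smith_waterman_score("TTACGTTT", "GGACGGG")
print(identical, unrelated, shared_core)
```

Only two rows of the table are kept in memory, which is enough when the score alone is needed and the alignment itself is not.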

BBH-LS: bidirectional best hits based on a linear combination of SW and LC
$\mathrm{sim}(g, h) = (1 - f)\,\mathrm{SW}(g, h) + f\,\mathrm{LC}(g, h)$
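
A small sketch of how the combined score could change the pairing. It assumes that SW and LC have already been scaled to the range [0, 1] (the slide does not say how the two scores are normalized), and the score tables, gene names, and function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (gene in genome G, gene in genome H)


def combined_similarity(sw: Dict[Pair, float], lc: Dict[Pair, float],
                        f: float) -> Dict[Pair, float]:
    """sim(g, h) = (1 - f) * SW(g, h) + f * LC(g, h), with missing LC treated as 0."""
    return {p: (1 - f) * sw[p] + f * lc.get(p, 0.0) for p in sw}


def bidirectional_best_hits(sim: Dict[Pair, float]) -> Set[Pair]:
    best_for_g: Dict[str, str] = {}
    best_for_h: Dict[str, str] = {}
    for (g, h), s in sim.items():
        if g not in best_for_g or s > sim[(g, best_for_g[g])]:
            best_for_g[g] = h
        if h not in best_for_h or s > sim[(best_for_h[h], h)]:
            best_for_h[h] = g
    return {(g, h) for g, h in best_for_g.items() if best_for_h[h] == g}


# Hypothetical normalized scores: g1 resembles h2 most by sequence,
# but shares its gene neighborhood with h1.
sw_scores = {("g1", "h1"): 0.6, ("g1", "h2"): 0.9,
             ("g2", "h1"): 0.3, ("g2", "h2"): 0.5}
lc_scores = {("g1", "h1"): 1.0, ("g1", "h2"): 0.2,
             ("g2", "h1"): 0.2, ("g2", "h2"): 1.0}

bbh_only = bidirectional_best_hits(combined_similarity(sw_scores, lc_scores, f=0.0))
bbh_ls = bidirectional_best_hits(combined_similarity(sw_scores, lc_scores, f=0.5))
print(sorted(bbh_only), sorted(bbh_ls))
```

With f = 0 the method reduces to plain BBH on sequence similarity and pairs g1 with h2; with f = 0.5 the synteny term moves g1 to h1 and frees h2 to pair with g2.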

Human-Mouse-Rat dataset
Input
Human, mouse, and rat genes downloaded from Ensembl.
Benchmark
No "golden" benchmark for true orthology.
Assume that orthologs are assigned the same gene symbol.
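
Under this assumption, a predicted pair can be scored by comparing gene symbols. The sketch below is one possible reading: it compares symbols case-insensitively (human and mouse symbols often differ only in capitalization) and sets aside pairs where a symbol is missing. Both choices, along with the identifiers and the symbol tables, are assumptions for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


def evaluate_by_symbol(predicted: Iterable[Tuple[str, str]],
                       symbol_g: Dict[str, Optional[str]],
                       symbol_h: Dict[str, Optional[str]]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, unscored) for predicted pairs."""
    tp = fp = unscored = 0
    for g, h in predicted:
        sg, sh = symbol_g.get(g), symbol_h.get(h)
        if not sg or not sh:
            unscored += 1  # no symbol to compare against
        elif sg.upper() == sh.upper():
            tp += 1
        else:
            fp += 1
    return tp, fp, unscored


# Hypothetical gene identifiers and symbol annotations.
human_symbols = {"hs_1": "RASGRF2", "hs_2": "CTSH", "hs_3": None}
mouse_symbols = {"mm_1": "Rasgrf2", "mm_2": "Msh3", "mm_3": "Ckmt2"}
predictions = [("hs_1", "mm_1"), ("hs_2", "mm_2"), ("hs_3", "mm_3")]

counts = evaluate_by_symbol(predictions, human_symbols, mouse_symbols)
print(counts)
```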

Tuning the BBH-LS method
$\mathrm{sim}(g, h) = (1 - f)\,\mathrm{SW}(g, h) + f\,\mathrm{LC}(g, h)$
Figure: ... similarity to sequence similarity on the human-mouse dataset.

Results for various methods on Human-Mouse
Figure:
More true positives and fewer false positives than MSOAR2.

Results for various methods on Human-Rat
Figure:

Results for various methods on Mouse-Rat
Figure:

How local synteny helps
Figure: gene neighborhoods on human chr 15, human chr 5, mouse chr 9, and mouse chr 13, containing CTSH, MSH3, CKMT2, RASGRF1, RASGRF2, and ANKRD34C; edge labels: sw = 5265, ls = 1; sw = 2003, ls = 5; sw = 2466, ls = 5.
Bold edges are the pairing from BBH-LS, thin edges are the pairing from BBH.
BBH paired RASGRF2 (human) to RASGRF1 (mouse) due to high SW, corrected by BBH-LS with LC.

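Summary: Identifying main orthologs
Figure: gene tree rooted at the MRCA of G and H, with a speciation node and a duplication node; genes g, h, and h′; relationships labeled main orthologs, orthologs, and paralogs.
For each gene in their common ancestor, find its direct descendant in G and H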

Summary: Three approaches
Graph based
Tree based
Rearrangement based

BBH-LS: bidirectional best hits based on a linear combination of SW and LC
