Phylogenetic alignment analysis: an important tool in computational biology
HemaNandini4
34 slides
Oct 09, 2024
About This Presentation
Phylogenetic tree analysis
Phylogenetics is the branch of biology that deals
with evolutionary relatedness.
Uses some measure of evolutionary relatedness:
e.g., morphological features.
Phylogenetics on sequence data is an attempt to
reconstruct the evolutionary history of those
sequences.
Relationships between individual sequences are
not necessarily the same as those between the
organisms in which they are found.
The ultimate goal is to be able to use sequence
data from many sequences to give information
about phylogenetic history of organisms.
Phylogenetic relationships are usually depicted
as trees, in which branches connect ancestors to
their “children”; the tips at the bottom of the tree
(individual organisms) are leaves. Individual
branch points are nodes.
Aim:
To construct a visual representation (a tree) to
describe the assumed evolution occurring
between and among different groups
(individuals, populations, species, etc.) and to
study the reliability of the consensus tree.
Relationship of Phylogenetic Analysis to Sequence Alignment
When two sequences of nucleic acid or protein
molecules found in two different organisms are
similar they are likely to have been derived from
a common ancestor sequence.
A sequence alignment reveals which positions in
the sequences were conserved and which
diverged from a common ancestor sequence.
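As a minimal sketch of this idea, the Python snippet below walks over the columns of a small alignment and marks which positions are conserved and which have diverged. The three sequences and the helper name column_status are made up for illustration; they are not data from the presentation.

```python
def column_status(alignment):
    """Return '*' for each conserved column and '.' for each varied one."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*alignment):
        # Conserved: every sequence carries the same residue, and it is not a gap.
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else ".")
    return "".join(marks)


# Hypothetical aligned sequences (gaps written as '-').
aligned = [
    "ATGCTA-GA",
    "ATGTTACGA",
    "ATGCTACGT",
]
status = column_status(aligned)
print(status)  # prints ***.**.*.
```

Columns marked with a dot are the ones that carry information about how the sequences diverged.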
Concept of Evolutionary Tree
An evolutionary tree is a two-dimensional
graph showing evolutionary relationships
among organisms or, in the case of sequences,
among certain genes from separate organisms.
The separate sequences are referred to as
taxa, which are phylogenetically distinct
units on the tree.
The tree consists of outer branches
representing the taxa, and of nodes and inner
branches representing relationships among the taxa.
A tree is a graphical representation of the relationships
between organisms, species, or genomic sequences
In bioinformatics, it is based on genomic sequences
Root: origin of evolution
Leaves: current organisms, species, or genomic
sequences
Branches: relationships between organisms, species, or
genomic sequences
Branch length: evolutionary time or amount of change,
depending on the type of tree
Parts of a phylogenetic tree
Node
Root
Outgroup
Ingroup
Branch
Rooted tree
The root of a phylogenetic tree represents the common
ancestor of the sequences.
Some trees are unrooted, and thus do not specify the
common ancestor.
Cladograms: branch lengths have no meaning
Phylograms: branch lengths represent evolutionary
change
Ultrametric trees: branch lengths represent time, and the
lengths from the root to the leaves are all the same
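The ultrametric property can be checked mechanically. The sketch below assumes a simple, hypothetical tree encoding (each node is a name plus a list of child and branch-length pairs) and tests whether all root-to-leaf path lengths agree. The two example trees and their branch lengths are invented for illustration.

```python
def root_to_leaf_distances(node, depth=0.0):
    """Yield (leaf_name, distance_from_root) for every leaf below node."""
    name, children = node
    if not children:
        yield name, depth
    for child, length in children:
        yield from root_to_leaf_distances(child, depth + length)


def is_ultrametric(root, tol=1e-9):
    """True when every leaf lies at the same distance from the root."""
    distances = [d for _, d in root_to_leaf_distances(root)]
    return max(distances) - min(distances) <= tol


def leaf(name):
    return (name, [])


# Hypothetical trees: node = (name, [(child_node, branch_length), ...]).
clock_like = ("root", [
    (("AB", [(leaf("A"), 1.0), (leaf("B"), 1.0)]), 2.0),
    (("CD", [(leaf("C"), 1.5), (leaf("D"), 1.5)]), 1.5),
])
uneven = ("root", [
    (("AB", [(leaf("A"), 1.0), (leaf("B"), 2.5)]), 2.0),
    (("CD", [(leaf("C"), 1.5), (leaf("D"), 1.5)]), 1.5),
])

print(is_ultrametric(clock_like))  # True: every leaf is 3.0 from the root
print(is_ultrametric(uneven))      # False: B is 4.5 from the root
```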
Sequences A and B are derived from a
common ancestor sequence represented by a
node and C and D are similarly related.
The A/B and C/D common ancestors also
share a common ancestor represented at the
lowest level of the tree.
The length of each branch to the next node
represents the number of sequence changes
that occurred prior to the next level of
separation.
(Here the branch length between the A/B node
and A is approximately equal to that between the
A/B node and B, indicating the species are
evolving at the same rate).
However, it is also likely that for some biological
or environmental reason unique to each species,
one taxon may have undergone more mutations
since diverging from the ancestor than the other.
In this case, different branch lengths would be
shown on the tree.
A root has been placed indicating that in the
evolutionary model of the sequences this basal
node is the common ancestor of all the other
sequences.
Unrooted Tree
An unrooted tree also shows the evolutionary relationships
among sequences A-D, but it does not reveal the
location of the oldest ancestor.
It can be converted to a rooted tree by placing a
root on any branch of the tree.
An unrooted, four-taxon tree has five branches, so it can theoretically
be rooted in five different places to produce five different rooted trees.
[Figure: the unrooted tree 1 (taxa A, B, C, D) and the five rooted trees 1a to 1e, numbered 1 to 5.]
These trees show five different evolutionary relationships among the taxa!
[Figure: an unrooted tree and rooted trees 1 to 5 for taxa A, B, C, D, with labels for internal nodes, branches, external nodes, and the hypothetical ancestor.]
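The five rootings of a four-taxon tree can be enumerated with a few lines of Python. This is only a sketch: the internal node names X and Y, the edge list, and the use of Newick-style strings for output are assumptions made for the example, not part of the slides.

```python
# Unrooted four-taxon tree: A and B join at X, C and D join at Y.
# X and Y are hypothetical labels for the two internal nodes.
EDGES = [("A", "X"), ("B", "X"), ("X", "Y"), ("C", "Y"), ("D", "Y")]


def neighbours(node):
    """All nodes connected to node by one branch."""
    found = []
    for u, v in EDGES:
        if u == node:
            found.append(v)
        elif v == node:
            found.append(u)
    return found


def subtree(node, parent):
    """Newick-style text for the part of the tree seen from node, away from parent."""
    kids = [n for n in neighbours(node) if n != parent]
    if not kids:
        return node  # a leaf
    return "(" + ",".join(subtree(k, node) for k in kids) + ")"


def root_on_edge(u, v):
    """Place the root in the middle of branch u-v."""
    return "(" + subtree(u, v) + "," + subtree(v, u) + ");"


rooted = [root_on_edge(u, v) for u, v in EDGES]
for text in rooted:
    print(text)
```

One rooted tree is produced per branch, so an unrooted tree with five branches yields five rooted trees.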
Phylogenetic Analysis
A phylogenetic analysis of a family of related
nucleic acid or protein sequences is a
determination of how the family might have
been derived during evolution.
The evolutionary relationships among the
sequences are depicted by placing the
sequences as outer branches on a tree.
The branching relationships on the inner part
of the tree then reflect the degree to which
different sequences are related.
The sequences that are very much alike will be
located as neighbouring branches and will be
joined to a common branch beneath them.
The object of phylogenetic analysis is to discover
all of the branching relationships in the tree and
the branch lengths.
When a gene family is found in an organism or a
group of organisms, phylogenetic relationships
among the genes can help to predict which ones
might have an equivalent function.
These functional predictions can then be tested
by genetic experiments.
Phylogenetic analysis may also be used to follow
the changes occurring in a rapidly changing
species such as a virus.
How to construct a phylogenetic tree?
Step 1:
Make a multiple alignment of the nucleotide or
amino acid sequences (using MUSCLE or another
multiple-alignment program; BLAST can be used
beforehand to collect related sequences)
Step 2:
Check that the multiple alignment reflects the
evolutionary process.
Step 3:
Choose the tree-building method, then calculate the
distances or use the alignment directly, depending on
the method.
Step 4:
Verify the result statistically.
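The slides do not name a statistical test for Step 4. One widely used option, assumed here, is bootstrapping: columns of the alignment are resampled with replacement, a tree is rebuilt from each pseudo-replicate, and groupings that recur in most replicates are considered well supported. The sketch below shows only the resampling step, on a made-up alignment; the function name bootstrap_replicate is hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


# Hypothetical aligned sequences of equal length.
alignment = ["ATGCTACGA", "ATGTTACGA", "ATGCTACGT"]
rng = random.Random(42)  # fixed seed so the run is repeatable
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]
print(replicates[0])
```

Each replicate would then be passed through the same tree-building method chosen in Step 3.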
What is a phylogeny?
A phylogeny is a type of
pedigree
Shows relationships
between species, not
individuals
Reconstructs pattern of
events leading to the
distribution and diversity
of life
Often shown as a network
or tree
[Figure: phylogeny of orangutan, gorilla, chimpanzee and human. From the Tree of Life website, University of Arizona.]
Distance Method
Feng and Doolittle method
The distance method employs the number of
changes between each pair in a group of
sequences to produce a phylogenetic tree of the
group.
The sequence pairs that have the smallest
number of sequence changes between them are
termed “neighbours”.
On a tree, these sequences share a node or common
ancestor position and are each joined to that node
by a branch.
The goal of distance methods is to identify a tree
that positions the neighbours correctly and that also
has branch lengths which reproduce the original
data as closely as possible.
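A minimal sketch of the first part of a distance analysis: counting changes between every pair of aligned sequences and picking the closest pair as neighbours. The proportion of differing sites (the p-distance) is used as the measure here, which is an assumption; the sequences are invented.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_table(seqs):
    """Map each unordered pair of names to its p-distance."""
    return {
        (i, j): p_distance(seqs[i], seqs[j])
        for i, j in combinations(sorted(seqs), 2)
    }


# Hypothetical aligned sequences.
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACGGAC",
    "D": "TCGAACGGAG",
}
table = distance_table(seqs)
closest = min(table, key=table.get)
print(closest, table[closest])  # the pair with the fewest changes
```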
The most commonly applied distance based
methods are the unweighted pair group method
with arithmetic mean (UPGMA) and neighbor-
joining (NJ).
Distance analysis programs in PHYLIP are FITCH,
KITSCH and NEIGHBOR.
Finding the closest neighbors among a group of
sequences by the distance method is often the
first step in producing a multiple sequence alignment (MSA).
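UPGMA, named above, can be sketched in a few lines: repeatedly join the two closest clusters and average their distances to every other cluster, weighting by cluster size. The code below is an illustrative sketch, not the PHYLIP implementation; the distance values and the nested-tuple tree encoding are assumptions.

```python
from itertools import combinations


def upgma(distances):
    """Build a UPGMA tree from {(a, b): distance} over all pairs of taxa.

    Returns (tree, height): tree is nested tuples of taxon names,
    height is the distance from the root down to the leaves.
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    # Active clusters: cluster -> (number of leaves, height above the leaves).
    info = {}
    for pair in distances:
        for name in pair:
            info[name] = (1, 0.0)
    while len(info) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(info, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = info[a][0], info[b][0]
        merged = (a, b)
        height = d[frozenset((a, b))] / 2
        # Size-weighted average equals the arithmetic mean over all leaf pairs.
        for other in info:
            if other not in (a, b):
                d[frozenset((merged, other))] = (
                    size_a * d[frozenset((a, other))]
                    + size_b * d[frozenset((b, other))]
                ) / (size_a + size_b)
        del info[a], info[b]
        info[merged] = (size_a + size_b, height)
    tree = next(iter(info))
    return tree, info[tree][1]


# Hypothetical pairwise distances for four taxa.
dist = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 6,
    ("B", "C"): 6, ("B", "D"): 6, ("C", "D"): 4,
}
tree, height = upgma(dist)
print(tree, height)  # (('A', 'B'), ('C', 'D')) 3.0
```

UPGMA places every leaf at the same distance from the root, so it implicitly assumes that all lineages evolve at the same rate.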
Parsimony method
Sometimes also called the minimum evolution method, although
that name is more often used for a distance-based criterion
This method predicts the evolutionary tree that
minimizes the number of steps required to
generate the observed variation in the sequences.
No assumptions on the evolutionary pattern.
May oversimplify evolution.
May produce several equally good trees.
This method is used to construct trees on the basis
of the minimum number of mutations required to
convert one sequence to another.
The main programs for maximum parsimony
analysis in the PHYLIP package are DNAPARS,
DNAPENNY, DNACOMP, DNAMOVE and
PROTPARS.
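To make "minimum number of steps" concrete, the sketch below scores candidate trees with Fitch's small-parsimony algorithm, a standard way to count the fewest changes a given tree needs at each alignment column. The slides do not name this algorithm, and the alignment and the nested-tuple tree encoding are made up for the example.

```python
def fitch(node, states):
    """Return (candidate_states, changes) for the subtree at node.

    node: a leaf name (str) or a 2-tuple of child nodes.
    states: dict mapping leaf name to its character at one column.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical alignment and the three possible groupings of four taxa.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCAA", "D": "TCAT"}
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
scores = [parsimony_score(t, alignment) for t in candidates]
print(scores)  # [4, 6, 5]: the first grouping needs the fewest changes
```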
Maximum likelihood method
Originally developed for statistics by Ronald Fisher between
1912 and 1922
The best tree is found based on assumptions about the evolutionary model
Nucleotide models are more advanced at the moment than
amino acid models
Programs require a lot of computing capacity from the system
Maximum likelihood is very expensive and extremely slow to
compute
PHYLIP includes two maximum likelihood programs, DNAML and
DNAMLK.
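A full maximum likelihood search optimises over whole trees, which is too large for a short example. The sketch below shows the principle on the smallest possible case: estimating the distance between two sequences by maximising the likelihood under the Jukes-Cantor (JC69) substitution model. The choice of JC69, the site counts, and the grid search are all assumptions made for illustration.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of two aligned sequences at JC69 distance d.

    d is the expected number of substitutions per site.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a site keeps the same base
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    return (
        (n_sites - n_diff) * math.log(0.25 * p_same)
        + n_diff * math.log(0.25 * p_diff)
    )


# Hypothetical data: 100 aligned sites, 20 of which differ.
n_sites, n_diff = 100, 20

# Crude grid search for the distance that maximises the likelihood.
grid = [i / 1000 for i in range(1, 2001)]
best = max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

# Closed-form JC69 distance for comparison.
analytic = -0.75 * math.log(1 - 4 * (n_diff / n_sites) / 3)
print(best, round(analytic, 4))
```

The grid maximum should land next to the closed-form value, which illustrates that the estimate depends directly on the assumed evolutionary model.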
Comparison of Methods
Distance | Maximum parsimony | Maximum likelihood
Uses only pairwise distances | Uses only shared derived characters | Uses all data
Minimizes distance between nearest neighbors | Minimizes total distance | Maximizes tree likelihood given specific parameter values
Very fast | Slow | Very slow
Easily trapped in local optima | Assumptions fail when evolution is rapid | Highly dependent on assumed evolution model
Good for generating tentative tree, or choosing among multiple trees | Best option when tractable (<30 taxa, homoplasy rare) | Good for very small data sets and for testing trees built using other methods