Phylogenetic alignment analysis: an important tool in computational biology

HemaNandini4, 34 slides, Oct 09, 2024

About This Presentation

Phylogenetic tree analysis


Slide Content

Phylogenetics is the branch of biology that deals with evolutionary relatedness.
It uses some measure of evolutionary relatedness, e.g., morphological features.
Phylogenetics on sequence data is an attempt to reconstruct the evolutionary history of those sequences.

Relationships between individual sequences are not necessarily the same as those between the organisms in which they are found.
The ultimate goal is to use data from many sequences to give information about the phylogenetic history of organisms.

Phylogenetic relationships are usually depicted as trees, with branches linking ancestors to their "children"; the individual organisms at the bottom of the tree are its leaves, and the individual branch points are nodes.

Aim:
To construct a visual representation (a tree) to
describe the assumed evolution occurring
between and among different groups
(individuals, populations, species, etc.) and to
study the reliability of the consensus tree.

Relationship of Phylogenetic Analysis to Sequence Alignment (SA)
When two nucleic acid or protein sequences found in two different organisms are similar, they are likely to have been derived from a common ancestor sequence.
A sequence alignment reveals which positions in the sequences were conserved and which diverged from the common ancestor sequence.

seq 1:              GAATC
seq 2:              GAGTT
ancestor sequence:  GA(A/G)T(C/T)
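The column-by-column comparison above can be sketched in Python. This is a minimal illustration that assumes an ungapped alignment of sequences with equal length. The helper name ancestor_pattern is hypothetical. It lists the residues seen at each diverged position, as in GA(A/G)T(C/T). It does not decide which residue the ancestor actually carried.

```python
def ancestor_pattern(seqs):
    """Summarize an ungapped alignment column by column (hypothetical helper).

    Conserved columns keep the shared residue; diverged columns are written
    as (X/Y), listing the residues seen in order of first appearance.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*seqs):
        residues = list(dict.fromkeys(column))  # unique residues, order kept
        if len(residues) == 1:
            out.append(residues[0])  # conserved position
        else:
            out.append("(" + "/".join(residues) + ")")  # diverged position
    return "".join(out)


print(ancestor_pattern(["GAATC", "GAGTT"]))  # GA(A/G)T(C/T)
```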

Concept of Evolutionary Tree
An evolutionary tree is a two-dimensional graph showing evolutionary relationships among organisms or, in the case of sequences, among certain genes from separate organisms.
The separate sequences are referred to as taxa, which are phylogenetically distinct units on the tree.
The tree consists of outer branches representing the taxa, and of nodes and branches representing relationships among the taxa.


A tree graphically represents the relationships between organisms, species, or genomic sequences.

In bioinformatics, it is based on genomic sequences.

Root: origin of evolution

Leaves: current organisms, species, or genomic sequences

Branches: relationships between organisms, species, or genomic sequences

Branch length: evolutionary time (or amount of change)
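The parts listed above (root, leaves, branches and branch lengths) fit naturally into a small recursive data structure. The sketch below shows one possible encoding. The Node class is our own hypothetical type and does not come from any particular library. The output uses the widely used Newick text format. The example tree and its branch lengths are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None  # leaves carry a taxon name
    length: float = 0.0  # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def to_newick(node: Node) -> str:
    """Serialize a tree in Newick format; the root has no branch length."""
    def rec(n: Node, is_root: bool) -> str:
        if n.is_leaf():
            label = n.name or ""
        else:
            label = "(" + ",".join(rec(c, False) for c in n.children) + ")"
            if n.name:
                label += n.name
        if not is_root:
            label += f":{n.length:g}"
        return label
    return rec(node, True) + ";"


def leaves(node: Node) -> List[str]:
    """Taxon names at the tips, left to right."""
    if node.is_leaf():
        return [node.name]
    return [x for c in node.children for x in leaves(c)]


# Root -> two internal nodes (common ancestors of A/B and C/D) -> four leaves.
tree = Node(children=[
    Node(length=2.0, children=[Node("A", 1.0), Node("B", 1.0)]),
    Node(length=1.5, children=[Node("C", 1.5), Node("D", 1.5)]),
])
print(to_newick(tree))  # ((A:1,B:1):2,(C:1.5,D:1.5):1.5);
print(leaves(tree))  # ['A', 'B', 'C', 'D']
```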

Parts of a phylogenetic tree
Node
Root
Outgroup
Ingroup
Branch

Rooted tree
The root of a phylogenetic tree represents the common
ancestor of the sequences.
Some trees are unrooted, and thus do not specify the
common ancestor.

Cladograms: branch lengths have no meaning

Phylograms: branch lengths represent evolutionary change

Ultrametric trees: branch lengths represent time, and the path length from the root to every leaf is the same
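You can check whether a tree is ultrametric by summing the branch lengths along every path from the root to a leaf. The minimal sketch below uses a hypothetical nested-list encoding. A leaf is a taxon name, and an internal node is a list of (subtree, branch_length) pairs. The branch lengths are made up.

```python
from typing import Dict, Optional, Union

# Hypothetical encoding: a leaf is a taxon name (str); an internal node is a
# list of (subtree, branch_length) pairs.
Tree = Union[str, list]


def root_to_leaf(tree: Tree, depth: float = 0.0,
                 out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return the summed branch length from the root to every leaf."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        out[tree] = depth
    else:
        for subtree, length in tree:
            root_to_leaf(subtree, depth + length, out)
    return out


def is_ultrametric(tree: Tree, tol: float = 1e-9) -> bool:
    """True if all root-to-leaf path lengths agree within tol."""
    depths = root_to_leaf(tree).values()
    return max(depths) - min(depths) <= tol


ultra = [([("A", 1.0), ("B", 1.0)], 2.0), ([("C", 1.5), ("D", 1.5)], 1.5)]
phylo = [([("A", 1.0), ("B", 3.0)], 2.0), ([("C", 1.5), ("D", 0.5)], 1.5)]
print(root_to_leaf(ultra))  # {'A': 3.0, 'B': 3.0, 'C': 3.0, 'D': 3.0}
print(is_ultrametric(ultra), is_ultrametric(phylo))  # True False
```

The second tree has the same branching pattern but unequal root-to-leaf distances. It can be drawn as a phylogram but not as an ultrametric tree.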

Sequences A and B are derived from a
common ancestor sequence represented by a
node, and C and D are similarly related.
The A/B and C/D common ancestors also
share a common ancestor represented at the
lowest level of the tree.
The length of each branch to the next node
represents the number of sequence changes
that occurred prior to the next level of
separation.

(Here the branch length between the A/B node
and A is approximately equal to that between the
A/B node and B, indicating the species are
evolving at the same rate).
However, it is also likely that for some biological
or environmental reason unique to each species,
one taxon may have undergone more mutations
since diverging from the ancestor than the other.

In this case, different branch lengths would be
shown on the tree.
A root has been placed indicating that in the
evolutionary model of the sequences this basal
node is the common ancestor of all the other
sequences.

Unrooted Tree
It also shows the evolutionary relationships
among sequences A-D, but it does not reveal the
location of the oldest ancestor.
It can be converted to a rooted tree by placing a
root on any of its branches.

An unrooted, four-taxon tree can theoretically be rooted in five
different places to produce five different rooted trees.
[Figure: unrooted tree 1, which groups A with B and C with D, and the rooted trees 1a to 1e obtained by placing the root at each of these five places.]
These trees show five different evolutionary relationships among the taxa!
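Rooting an unrooted tree means choosing one of its branches and placing the root on it. An unrooted binary tree with n taxa has 2n - 3 branches, which gives the five rooted trees for n = 4. The sketch below assumes that unrooted tree 1 is given as an edge list. X and Y are hypothetical labels for its two internal nodes. The output order does not follow the 1a to 1e labels of the figure.

```python
from collections import defaultdict
from typing import List, Tuple


def rooted_versions(edges: List[Tuple[str, str]]) -> List[str]:
    """Place a root on each branch of an unrooted tree and return every
    resulting rooted tree as a Newick string (topology only)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def subtree(node: str, parent: str) -> str:
        # Newick for the part of the tree reachable from `node`
        # without stepping back across the edge to `parent`.
        kids = [subtree(n, node) for n in adj[node] if n != parent]
        return node if not kids else "(" + ",".join(kids) + ")"

    # The root sits on edge (u, v): one side hangs from u, the other from v.
    return [f"({subtree(u, v)},{subtree(v, u)});" for u, v in edges]


# Unrooted tree 1: A and B meet at internal node X, C and D at internal node Y.
edges = [("A", "X"), ("B", "X"), ("X", "Y"), ("C", "Y"), ("D", "Y")]
for t in rooted_versions(edges):
    print(t)
```

This prints (A,(B,(C,D)));, (B,(A,(C,D)));, ((A,B),(C,D));, (C,((A,B),D)); and (D,((A,B),C));. All five share the same unrooted topology but imply different ancestors.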

[Figure: an unrooted tree and the corresponding rooted trees 1 to 5 for taxa A to D, with labels for internal nodes, branches, external nodes, and a hypothetical ancestor.]

Phylogenetic Analysis
A phylogenetic analysis of a family of related
nucleic acid or protein sequences is a
determination of how the family might have
been derived during evolution.
The evolutionary relationships among the
sequences are depicted by placing the
sequences as outer branches on a tree.
The branching relationships on the inner part
of the tree then reflect the degree to which
different sequences are related.

The sequences that are very much alike will be
located as neighbouring branches and will be
joined to a common branch beneath them.
The object of phylogenetic analysis is to discover
all of the branching relationships in the tree and
the branch lengths.

When a gene family is found in an organism or a
group of organisms, phylogenetic relationships
among the genes can help to predict which ones
might have an equivalent function.
These functional predictions can then be tested
by genetic experiments.

Phylogenetic analysis may also be used to follow
the changes occurring in a rapidly changing
species such as a virus.

How to construct a phylogenetic tree?


Step 1:
Make a multiple alignment of the nucleotide or
amino acid sequences (using MUSCLE, ClustalW, or
another method).

Step 2:
Check whether the multiple alignment reflects the
evolutionary process.

Step 3:
Choose a tree-building method and, depending on
the method, calculate pairwise distances or use
the alignment directly.

Step 4:
Verify the result statistically (e.g., by bootstrapping).
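Step 4 is commonly done by bootstrapping. The alignment columns are resampled with replacement and a tree is rebuilt from each replicate. The support for each group is the percentage of replicate trees that contain it. The sketch below covers only the resampling step and uses a small made-up alignment. The tree-building step for each replicate is left out.

```python
import random
from typing import List


def bootstrap_replicate(alignment: List[str], rng: random.Random) -> List[str]:
    """One bootstrap replicate: sample alignment columns with replacement.

    The same column indices are used for every sequence, so rows stay aligned.
    """
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


alignment = ["GAATCGTACG", "GAATCGTACA", "GAGTTGTTCG", "GAGTTGCTCG"]
rng = random.Random(42)  # fixed seed so the example is reproducible
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]
for row in replicates[0]:
    print(row)
```

Each replicate has the same number of columns as the original. Some columns appear more than once and others not at all. In a full analysis, you would run the method chosen in Step 3 on every replicate and count how often each group appears. The number of replicates here (100) is an arbitrary illustrative choice.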

What is a phylogeny?
A phylogeny is a type of pedigree.
It shows relationships between species, not individuals.
It reconstructs the pattern of events leading to the distribution and diversity of life.
It is often shown as a network or tree.

Phylogeny
[Figure: phylogeny of orangutan, gorilla, chimpanzee, and human. From the Tree of Life website, University of Arizona.]

Methods for Phylogenetic Analysis
Distance method
Parsimony method
Maximum likelihood method

Distance Method
Feng and Doolittle method
The distance method employs the number of
changes between each pair in a group of
sequences to produce a phylogenetic tree of the
group.
The sequence pairs that have the smallest
number of sequence changes between them are
termed "neighbours".
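Counting the changes between each pair of sequences can be sketched as follows. The sequences are made up and gap positions are ignored. The example adds the Jukes-Cantor correction as one common way to account for multiple substitutions at the same site. It is not the only option, and the slides themselves do not specify a distance formula. The pairs at the smallest distance are the neighbours.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a p-distance for multiple substitutions."""
    if p >= 0.75:
        return math.inf  # saturated: distance cannot be estimated
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    d = {}
    for a, b in combinations(seqs, 2):
        d[a, b] = d[b, a] = jukes_cantor(p_distance(seqs[a], seqs[b]))
    return d


seqs = {"A": "GAATCGTACG", "B": "GAATCGTACA",
        "C": "GAGTTGTTCG", "D": "GAGTTGCTCG"}
d = distance_matrix(seqs)
best = min(d.values())
neighbours = sorted({tuple(sorted(pair)) for pair, v in d.items() if v == best})
print(neighbours)  # [('A', 'B'), ('C', 'D')]
```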

On a tree, these sequences share a node or common
ancestor position and are each joined to that node
by a branch.
The goal of distance methods is to identify a tree
that positions the neighbours correctly and that also
has branch lengths which reproduce the original
data as closely as possible.
The most commonly applied distance based
methods are the unweighted pair group method
with arithmetic mean (UPGMA) and neighbor-
joining (NJ).
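UPGMA, the simpler of the two methods, can be sketched in a few lines. The algorithm repeats three steps until one cluster remains:

1. Join the closest pair of clusters.
2. Place their common ancestor at half the distance between them.
3. Replace the pair by the size-weighted average of their distances to every other cluster.

UPGMA assumes a constant rate of evolution (a molecular clock), so the tree it builds is always ultrametric. The function and the distance values are hypothetical illustrations. Neighbor-joining, which drops the clock assumption, is not shown.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def upgma(names: List[str], dist: Dict[FrozenSet[str], float]) -> str:
    """Build a rooted, ultrametric tree by UPGMA; return it as Newick.

    names: taxon labels.
    dist:  distance for every unordered pair, keyed by frozenset({a, b}).
    """
    # Active clusters: key -> (Newick text, node height, number of taxa).
    clusters = {n: (n, 0.0, 1) for n in names}
    d = dict(dist)  # working copy; merged clusters get new entries
    while len(clusters) > 1:
        # 1. Closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        # 2. Their common ancestor sits at half the distance between them.
        height = d[frozenset((a, b))] / 2
        na, ha, sa = clusters.pop(a)
        nb, hb, sb = clusters.pop(b)
        new = f"({na}:{height - ha:g},{nb}:{height - hb:g})"
        # 3. Distance to each remaining cluster: size-weighted mean.
        for c in clusters:
            d[frozenset((new, c))] = (
                sa * d[frozenset((a, c))] + sb * d[frozenset((b, c))]
            ) / (sa + sb)
        clusters[new] = (new, height, sa + sb)
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"


pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 6,
         ("B", "C"): 6, ("B", "D"): 6, ("C", "D"): 4}
dist = {frozenset(p): float(v) for p, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))  # ((A:1,B:1):2,(C:2,D:2):1);
```

A and B are the closest neighbours, so they are joined first. C and D follow, and the two clusters meet at height 3.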

Distance analysis programs in PHYLIP include FITCH,
KITSCH, and NEIGHBOR.
Finding the closest neighbors among a group of
sequences by the distance method is often the
first step in producing an MSA.

Parsimony method
Sometimes also called the minimum evolution method.
This method predicts the evolutionary tree that
minimizes the number of steps required to
generate the observed variation in the sequences.
It makes no explicit assumptions about the evolutionary model.
It may oversimplify evolution.
It may produce several equally good trees.
Trees are built on the basis of the minimum number
of mutations required to convert one sequence into another.
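The minimum number of changes for a given tree can be counted site by site with Fitch's algorithm. This is a standard small-parsimony method, although the slides do not name it. The sketch assumes binary trees written as nested tuples and uses made-up sequences. It scores each of the three possible unrooted topologies for four taxa. It does not search tree space the way the PHYLIP programs do.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a rooted binary tree as nested tuples of taxon names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's pass for one alignment column.

    Returns the candidate state set at this node and the number of changes
    needed in the subtree below it.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch_site(left, states)
    rs, rc = fitch_site(right, states)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # disjoint sets: one extra substitution


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Total number of changes over all columns of the alignment."""
    n_sites = len(next(iter(seqs.values())))
    total = 0
    for i in range(n_sites):
        column = {name: s[i] for name, s in seqs.items()}
        total += fitch_site(tree, column)[1]
    return total


seqs = {"A": "GAATCGTACG", "B": "GAATCGTACA",
        "C": "GAGTTGTTCG", "D": "GAGTTGCTCG"}
# The three unrooted four-taxon topologies, each written with an arbitrary
# root (the Fitch score does not depend on where the root is placed).
topologies = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}
for label, tree in topologies.items():
    print(label, parsimony_score(tree, seqs))  # AB|CD 5, AC|BD 8, AD|BC 8
```

Only alignment positions 3, 5 and 8 (counting from 1) favor one topology over another. Constant columns add nothing and single-taxon changes add one step to every tree. The most parsimonious tree is therefore AB|CD.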

The main programs for maximum parsimony
analysis in the PHYLIP package are DNAPARS,
DNAPENNY, DNACOMP, DNAMOVE, and
PROTPARS.

Maximum likelihood method
Originally developed for statistics by Ronald Fisher between
1912 and 1922.
The best tree is the one with the highest likelihood under an
assumed model of evolution.
Nucleotide substitution models are currently more advanced than
amino acid models.
The programs require substantial computing resources; maximum
likelihood is very expensive and extremely slow to compute.
The PHYLIP package includes two such programs, DNAML and
DNAMLK.
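Felsenstein's pruning algorithm illustrates the likelihood calculation behind these programs. For each site, it computes the probability of the observed bases given a tree, its branch lengths and a substitution model. The sketch below rests on several assumptions:

- the Jukes-Cantor (JC69) model with equal base frequencies;
- a hypothetical nested-list tree encoding;
- made-up sequences;
- all branch lengths fixed at 0.1.

Real programs such as DNAML also optimize branch lengths and search over tree topologies, which is where most of the computing cost comes from.

```python
import math
from typing import Dict, List, Tuple, Union

BASES = "ACGT"

# Hypothetical encoding: a leaf is a taxon name; an internal node is a list of
# (child, branch_length) pairs, lengths in expected substitutions per site.
Tree = Union[str, List[Tuple["Tree", float]]]


def jc_prob(same: bool, t: float) -> float:
    """JC69 probability of a given end base after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partials(tree: Tree, column: Dict[str, str]) -> List[float]:
    """Pruning step: P(data below node | node has base x) for each base x."""
    if isinstance(tree, str):
        return [1.0 if b == column[tree] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in tree:
        child_l = partials(child, column)
        for i, x in enumerate(BASES):
            result[i] *= sum(jc_prob(x == y, t) * child_l[j]
                             for j, y in enumerate(BASES))
    return result


def log_likelihood(tree: Tree, seqs: Dict[str, str]) -> float:
    """Sum of per-site log-likelihoods; sites are assumed independent."""
    n_sites = len(next(iter(seqs.values())))
    total = 0.0
    for k in range(n_sites):
        column = {name: s[k] for name, s in seqs.items()}
        site_l = sum(0.25 * p for p in partials(tree, column))  # root freqs 1/4
        total += math.log(site_l)
    return total


def quartet(w: str, x: str, y: str, z: str, t: float = 0.1) -> Tree:
    """Rooted tree ((w,x),(y,z)) with every branch length set to t."""
    return [([(w, t), (x, t)], t), ([(y, t), (z, t)], t)]


seqs = {"A": "GAATCGTACG", "B": "GAATCGTACA",
        "C": "GAGTTGTTCG", "D": "GAGTTGCTCG"}
ll = {
    "AB|CD": log_likelihood(quartet("A", "B", "C", "D"), seqs),
    "AC|BD": log_likelihood(quartet("A", "C", "B", "D"), seqs),
    "AD|BC": log_likelihood(quartet("A", "D", "B", "C"), seqs),
}
for label, value in ll.items():
    print(f"{label}: {value:.3f}")
```

A higher (less negative) log-likelihood means the data are more probable under that tree. Here AB|CD scores highest, because three alignment columns group A with B and C with D.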

Comparison of Methods
Distance: uses only pairwise distances; minimizes the distance between nearest neighbors; very fast; easily trapped in local optima; good for generating a tentative tree, or for choosing among multiple trees.
Maximum parsimony: uses only shared derived characters; minimizes total distance; slow; assumptions fail when evolution is rapid; best option when tractable (<30 taxa, homoplasy rare).
Maximum likelihood: uses all data; maximizes tree likelihood given specific parameter values; very slow; highly dependent on the assumed evolution model; good for very small data sets and for testing trees built using other methods.

http://www.genome.jp/tools/clustalw/