Phylogenetic analysis & their methods

DhanushV26, 34 slides, Aug 13, 2024

About This Presentation

About phylogenetic analysis and their methods


Slide Content

Phylogenetic Analysis
Def: Taxonomy, phenetics, cladistics
What is Phylogenetic analysis?
Phylogenetic methods: phenotypic and molecular
Mechanism of molecular phylogeny
What is phylogenetic tree?
Data: Morphological and genomic data
Def: node, branches, branch length, topology, root, OTU, rooted and unrooted tree
Difference between species and gene tree
Difference between bifurcating and multifurcating tree
Steps in phylogenetic analysis: Alignment strategy, Tree building, Tree evaluation

Phylogenetic tree construction methods: classified based on data used
1. Molecular data: character-based methods
2. Distance data: distance-based methods
Difference between character and distance based method
Distance based methods: UPGMA and neighbor joining
Character based methods: maximum parsimony and maximum likelihood method
Tree evaluation methods:
Interpretation of phylogenetic results
Software: PHYLIP

Taxonomy: Taxonomy is the science dealing with description, identification,
nomenclature, and classification of living things.
Phenetics: In biology, phenetics, also known as numerical taxonomy or
taximetrics, is an attempt to classify organisms based on overall
similarity, usually in morphology or other observable traits, regardless of
their phylogeny or evolutionary relation.
Cladistics: Cladistics, or phylogenetic systematics, is a system of classifying
living and extinct organisms based on evolutionary ancestry. Taxa are grouped
according to "derived characters", that is, characteristics or features shared
uniquely by the taxa and their common ancestor. Cladistics thus classifies
organisms into groups called clades, each consisting of an ancestral organism
and all of its descendants.

PHYLOGENETIC ANALYSIS
The word “phylogeny” is derived from two Greek words: phylon (tribe, race)
and genesis (origin).
Phylogeny describes the evolutionary history, or origin, of organisms.
There are two types of phylogeny methods: phenotypic phylogeny and
molecular phylogeny.
Phenotypic phylogeny: the traditional method of phylogeny, based on
phenotypic observations of a group of organisms. Over time, scientists found
it difficult to classify micro-organisms with this method, because phenotypic
resemblance or dissimilarity may be superficial.
This paved the way for the novel concept of molecular phylogeny. Linus
Pauling (with Emile Zuckerkandl) was among the first to propose that genetic
sequences could be used to infer phylogeny; this approach is known as
molecular phylogeny.

Mechanism of Molecular Phylogeny
The primary mechanism of evolution at the molecular level is based on the
nucleotide substitution during the process of DNA replication.
All other outward evidence of evolution (the phenotype) results from changes
in the DNA sequences of an organism.
These germ-line mutations arise through several interrelated mechanisms,
such as base substitution and exon shuffling.
Germ-line mutation: Cells that produce gametes (e.g. in the ovaries and testes)
are called germ-line cells. If mutations occur in these cells, they can be
inherited when fertilization takes place, and the offspring's cells will carry
the mutation. For this reason, mutations in germ-line cells contribute to the
species' gene pool and can influence whole populations of organisms and their
evolution.
 

Somatic mutation: Somatic mutations occur in body cells other than the
germ-line cells of the ovaries and testes. Somatic cells that undergo mutation
cannot pass it on to offspring, so these mutations have no effect on the
species and its evolution. An example is a mutation in skin cells caused by
exposure to UV rays, which can lead to skin cancer. The mutation stays within
that individual and does not affect the population as a whole.

Base substitutions:
A base substitution is a mutation in which one base in the DNA is replaced by
another. This changes the DNA template used to make RNA; the alteration is
carried into the corresponding codon of the mRNA and may change the amino
acid at that position of the polypeptide.
Example
Original DNA: CAG TAG GTA
Substituted copy: CAG AAG GTA (A has replaced the original T)
Original mRNA: GUC AUC CAU
Original amino acids: valine, isoleucine, histidine
Substituted mRNA: GUC UUC CAU
Substituted amino acids: valine, phenylalanine, histidine
Besides base substitution, DNA sequences also change through other
mechanisms, such as transposition, insertion and deletion.
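The substitution example can be checked mechanically. The sketch below assumes, as the slide does, that the DNA shown is the template strand and that it is complemented base by base in the order written (strand direction is ignored). The codon table is the standard genetic code; the function names are illustrative only.

```python
from itertools import product

# Standard genetic code: codons ordered with U, C, A, G at each position.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

NAMES = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartate",
    "C": "cysteine", "Q": "glutamine", "E": "glutamate", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
    "*": "stop",
}

# Template-strand base -> mRNA base (base pairing, U instead of T).
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(dna: str) -> str:
    """Complement the template strand base by base (direction ignored)."""
    return "".join(PAIR[b] for b in dna)


def translate(mrna: str) -> list[str]:
    """Translate each complete codon into an amino acid name."""
    return [NAMES[CODON_TABLE[mrna[i:i + 3]]] for i in range(0, len(mrna) - 2, 3)]


original = "CAGTAGGTA"
mutant = "CAGAAGGTA"  # T -> A in the second codon
for label, dna in (("original", original), ("substituted", mutant)):
    mrna = transcribe_template(dna)
    print(label, mrna, translate(mrna))
```

Running it reproduces the slide: GUCAUCCAU gives valine, isoleucine, histidine, while GUCUUCCAU gives valine, phenylalanine, histidine.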

Exon Shuffling:
Two genes exchange exons through chromosomal crossover, creating two new
genes with new combinations of exons.

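What is a phylogenetic tree?
A phylogenetic tree shows the evolutionary relationships between organisms and
the order in which they descended from common ancestors.
What kind of data can be used to build a phylogenetic tree?
Morphological and molecular (genomic) data.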

Representation of phylogeny: The most convenient way of visually representing
the evolutionary relationship among a group of organisms is phylogenetic tree.
Nodes represent taxonomic units or sequences. External nodes represent units
directly compared (eg. extant species), while internal nodes are ancestral or
hypothesized units. Units might be species, subspecies, order, in fact, any kind of
taxonomic unit (OTU: Operational Taxonomic Unit).
Branches define the relationships between the taxonomic units in terms of
descent and ancestry.
Branch length: the number of changes, or the rate of evolution, along a branch.
Topology: the branching pattern of the tree.
Root: the common ancestor of all taxa.
OTU (Operational Taxonomic Unit): any group of organisms or sequences
considered sufficiently distinct from the others to be treated as a separate
unit.
A tree can be rooted or unrooted:
Rooted tree: specifies the evolutionary relationships between organisms,
including the direction of evolution from a common ancestor (the root).
Unrooted tree: specifies the relationships among genes or species but not the
evolutionary paths, since no common ancestor is identified.

A tree in which every interior node has exactly two descendants is called a
bifurcating tree. A tree in which at least one interior node has more than two
descendants is called a multifurcating tree.
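The tree vocabulary above (external and internal nodes, branch lengths, root) maps naturally onto a small recursive data structure. This is a minimal sketch for rooted trees only; the `Node` class and the four hypothetical OTUs A to D are illustrative and not taken from any phylogenetics package.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node: leaves are external nodes (OTUs), others are internal."""
    name: str = ""
    length: float = 0.0  # length of the branch leading to this node
    children: list["Node"] = field(default_factory=list)


def internal_nodes(node):
    """Yield every internal node (a node with at least one child)."""
    if node.children:
        yield node
        for child in node.children:
            yield from internal_nodes(child)


def is_bifurcating(root):
    """True if every internal node has exactly two descendants."""
    return all(len(n.children) == 2 for n in internal_nodes(root))


# Hypothetical rooted trees over four OTUs.
bifurcating = Node(children=[
    Node(children=[Node("A", 1.0), Node("B", 1.0)], length=0.5),
    Node(children=[Node("C", 1.0), Node("D", 1.0)], length=0.5),
])
multifurcating = Node(children=[Node("A"), Node("B"), Node("C"), Node("D")])

print(is_bifurcating(bifurcating))     # True
print(is_bifurcating(multifurcating))  # False
```

The root of `bifurcating` is the common ancestor, its two children are hypothetical ancestors, and A to D are the external nodes.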

Steps in Phylogenetic Analysis:
The phylogenetic analysis in general is a three-step method:
1.Alignment Strategy
2.Tree building
3.Tree evaluation
1.Alignment Strategy:
The first step in producing a phylogenetic tree is the identification and
alignment of homologous sequences, using a multiple sequence alignment (MSA)
method. The number and types of changes in the residues of the MSA are the
starting point of a phylogenetic analysis. Each column in the alignment
represents the changes that occurred at one site during the evolution of the
sequence family, revealing which positions were conserved and which diverged
from a common ancestral sequence. Example: ClustalW
[Figure: ClustalW workflow: pairwise distance calculation, tree construction, multiple alignment]
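The simplest pairwise distance that can be read off an alignment is the proportion of sites at which two sequences differ (the p-distance). The sketch below assumes that gap columns are skipped for each pair; the sequences are hypothetical, and real tools such as ClustalW use their own scoring.

```python
from itertools import combinations


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    diffs = sum(a != b for a, b in pairs)
    return diffs / len(pairs)


def distance_matrix(aln: dict[str, str]) -> dict[tuple[str, str], float]:
    """All pairwise p-distances for a small multiple alignment."""
    return {(x, y): p_distance(aln[x], aln[y]) for x, y in combinations(aln, 2)}


# Hypothetical aligned sequences.
alignment = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTTC",
    "seq3": "ACGAACCTTC",
}
for pair, d in distance_matrix(alignment).items():
    print(pair, round(d, 2))
```

The resulting matrix is the input that distance-based methods such as UPGMA and neighbor joining start from.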

Tree building methods: Distance and character based methods

1. UPGMA Method (Unweighted Pair Group Method with Arithmetic Mean)
1. UPGMA is the simplest method of tree construction.
2. It is a cluster analysis method derived from the clustering algorithms
popularized by Sokal and Sneath (1973).
Steps:
1. Construct a distance matrix.
2. Cluster the two taxa with the smallest distance together to form a new OTU.
3. The branch length for each of the two taxa is taken to be half the distance
between them. Ex: if the number of changes between the two taxa is 7, the
branch length for each taxon is 3.5.
4. Construct a new distance matrix from the new OTU to the other taxa; the
taxa with the smallest distance are clustered together into a new OTU, and
their branch lengths are calculated.
5. Repeat until all taxa are clustered together.
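Steps 3 and 4 can be written compactly. The notation is the usual textbook form of UPGMA rather than something given on the slides: $n_i$ and $n_j$ are the numbers of taxa already in clusters $i$ and $j$, and $h$ is the height of a node above the tips.

```latex
h_{(ij)} = \frac{d_{ij}}{2}, \qquad
d_{(ij),k} = \frac{n_i\, d_{ik} + n_j\, d_{jk}}{n_i + n_j}
```

For two single taxa 7 changes apart, $h = 7/2 = 3.5$, matching step 3. When a cluster that already has a height is joined, its branch length is $h_{(ij)}$ minus that cluster's own height.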

For example, with 4 sequences, pairwise comparisons are made between
(1,1), (1,2), (1,3), (1,4), (2,2), (2,3), (2,4), (3,3), (3,4) and (4,4).
For (1,1), (2,2), (3,3) and (4,4), the distance is zero.
For (1,2), (1,3), (1,4), (2,3), (2,4) and (3,4), the distance (number of
changes) is measured.
Find the closest clusters:
If (1,3) has the smallest number of changes, these two sequences are clustered.
The branch length is half the distance between them. Ex: if the number of
changes between the two sequences is 7, the branch length for each is 3.5.
With (1,3) clustered, the comparison proceeds with 2 and 4.
For 2, the distance to the cluster is the average of (2,1) and (2,3); for 4, it is
the average of (4,1) and (4,3).
This process is repeated until all the sequences are clustered.
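The worked example can be turned into a short program. This is a minimal UPGMA sketch; the distance values are hypothetical, chosen so that (1,3) is the closest pair with 7 changes, and the internal labels `U1`, `U2`, ... are illustrative. The tree is returned as a Newick string.

```python
def upgma(dist):
    """Build a rooted UPGMA tree and return it as a Newick string.

    dist maps frozenset({x, y}) -> distance between taxa x and y.
    """
    taxa = sorted({t for pair in dist for t in pair})
    # label -> (Newick text of the subtree, number of taxa, height of its node)
    clusters = {t: (t, 1, 0.0) for t in taxa}
    d = dict(dist)
    step = 0
    while len(clusters) > 1:
        # Step 2: the closest pair of clusters (ties broken by sorted order).
        pair = min(sorted(d, key=sorted), key=d.get)
        a, b = sorted(pair)
        (ta, na, ha), (tb, nb, hb) = clusters.pop(a), clusters.pop(b)
        height = d.pop(pair) / 2  # Step 3: new node sits halfway between a and b
        step += 1
        new = f"U{step}"
        # Branch length = height of the new node minus height of the child node.
        subtree = f"({ta}:{height - ha:g},{tb}:{height - hb:g})"
        # Step 4: distance from the new cluster to each remaining cluster is
        # the average over all taxon pairs, i.e. a size-weighted mean.
        for c in clusters:
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((new, c))] = (na * dac + nb * dbc) / (na + nb)
        clusters[new] = (subtree, na + nb, height)
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


# Hypothetical numbers of changes between four sequences.
raw = {("1", "2"): 10, ("1", "3"): 7, ("1", "4"): 12,
       ("2", "3"): 9, ("2", "4"): 14, ("3", "4"): 13}
dist = {frozenset(p): v for p, v in raw.items()}
print(upgma(dist))  # (4:6.5,(2:4.75,(1:3.5,3:3.5):1.25):1.75);
```

Sequences 1 and 3 are joined first with branch lengths of 3.5 each. Sequence 2 then joins at distance (10 + 9) / 2 = 9.5.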

Advantage: Outputs a rooted tree. It is very simple and fast.
Disadvantage: It assumes a constant rate of evolution of the sequences in all
branches of the tree (a molecular clock). This assumption is unlikely to hold,
particularly if the sequences are separated by large evolutionary distances.

Neighbor Joining Method:
Another very popular distance method is the NJ method.
Advantage: Does not assume that the rate of evolution is the same in all
branches. Fast for large datasets. This method yields an unrooted tree.
Disadvantage: Sequence information is reduced.
Steps:
1. As in the UPGMA method, the distance matrix is calculated.
2. The net divergence of each OTU from all the other OTUs is calculated, i.e.
the sum of the changes between that taxon and every other taxon.
3. A new (adjusted) distance is then calculated for each pair of OTUs using the
following formula.
4. New distance for a pair = (number of changes between the two taxa) -
[(net divergence of the first OTU) + (net divergence of the second OTU)] /
(number of OTUs - 2). The pair with the smallest new distance is joined as
neighbours.

5. For example, if A and B are neighbours, a new internal node U is formed.
The branch lengths from U to the external OTUs A and B are then calculated:
the branch length of the first taxon from its internal node = (number of
changes between the first and second taxa) / 2 + [(net divergence of the first
taxon) - (net divergence of the second taxon)] / [2 x (number of OTUs - 2)],
and the branch length of the second taxon from its internal node = (number of
changes between the first and second taxa) - (branch length of the first
taxon from its internal node).
Then the new distances from U to every other terminal node are calculated.
The distance from U to a third taxon = [(number of changes between the first
and third taxa) + (number of changes between the second and third taxa) -
(number of changes between the first and second taxa)] / 2.
The matrix now has one fewer OTU, and the steps are repeated until all OTUs
are joined.
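The same neighbor-joining quantities, written as equations. The symbol names are chosen here rather than taken from the slides: $N$ is the current number of OTUs, $d$ the number of changes, $r_i$ the net divergence of OTU $i$, and $v_A$, $v_B$ the branch lengths from the new node $U$ to $A$ and $B$.

```latex
\begin{aligned}
r_i &= \sum_{k \neq i} d_{ik}, &
M_{ij} &= d_{ij} - \frac{r_i + r_j}{N - 2},\\
v_A &= \frac{d_{AB}}{2} + \frac{r_A - r_B}{2(N - 2)}, &
v_B &= d_{AB} - v_A,\\
d_{Uk} &= \frac{d_{Ak} + d_{Bk} - d_{AB}}{2}.
\end{aligned}
```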
Advantages: Fast and suited to large datasets; it permits lineages with very
different branch lengths.
Disadvantages: The result strongly depends on the model of evolution used,
and sequence information is reduced.
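The neighbor-joining steps can be sketched as follows. The 5-taxon distance matrix is a hypothetical example, ties in the adjusted distance are broken by sorted label order (an arbitrary choice), and the internal labels `U1`, `U2`, ... are illustrative. When three OTUs remain they are joined at a single central node, so the result is unrooted.

```python
def neighbor_joining(dist):
    """Neighbor joining on pairwise distances; returns an unrooted Newick tree.

    dist maps frozenset({x, y}) -> distance between OTUs x and y.
    """
    d = dict(dist)
    nodes = {t: t for pair in d for t in pair}  # label -> Newick text
    step = 0
    while len(nodes) > 3:
        n = len(nodes)
        # Step 2: net divergence r_i = sum of distances from i to all others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Steps 3-4: pick the pair with the smallest adjusted distance
        # d_ij - (r_i + r_j) / (n - 2).
        a, b = min(
            sorted(tuple(sorted(p)) for p in d),
            key=lambda p: d[frozenset(p)] - (r[p[0]] + r[p[1]]) / (n - 2),
        )
        # Step 5: branch lengths from the new internal node U to a and b.
        dab = d.pop(frozenset((a, b)))
        va = dab / 2 + (r[a] - r[b]) / (2 * (n - 2))
        vb = dab - va
        step += 1
        u = f"U{step}"
        subtree = f"({nodes.pop(a)}:{va:g},{nodes.pop(b)}:{vb:g})"
        # New distances from U to every remaining OTU k.
        for k in nodes:
            dak = d.pop(frozenset((a, k)))
            dbk = d.pop(frozenset((b, k)))
            d[frozenset((u, k))] = (dak + dbk - dab) / 2
        nodes[u] = subtree
    # Three OTUs remain: join them at one central node.
    x, y, z = sorted(nodes)
    dxy, dxz, dyz = d[frozenset((x, y))], d[frozenset((x, z))], d[frozenset((y, z))]
    vx = (dxy + dxz - dyz) / 2
    return f"({nodes[x]}:{vx:g},{nodes[y]}:{dxy - vx:g},{nodes[z]}:{dxz - vx:g});"


# Hypothetical distance matrix for five OTUs a-e.
raw = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
       ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
       ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
print(neighbor_joining({frozenset(p): v for p, v in raw.items()}))
```

On this matrix a and b are joined first. With 5 OTUs, their branch lengths come out as 5/2 + (31 - 34)/6 = 2 and 5 - 2 = 3.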

Character based methods: Determination of the substitution model:
A substitution model describes how one nucleotide substitutes for another.
Examples: 1. the Jukes-Cantor one-parameter model and 2. the Kimura
two-parameter model.
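The two models named above correct the observed proportion of differences for multiple substitutions at the same site. The sketch uses the standard distance formulas: Jukes-Cantor $d = -\tfrac34 \ln(1 - \tfrac43 p)$, and Kimura two-parameter with transition proportion $P$ and transversion proportion $Q$. It assumes ungapped, uppercase A/C/G/T sequences of equal length.

```python
import math


def jukes_cantor(p: float) -> float:
    """JC69 distance from the proportion p of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(P: float, Q: float) -> float:
    """K2P distance from transition (P) and transversion (Q) proportions."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


PURINES = set("AG")  # the pyrimidines are C and T


def k2p_from_seqs(s1: str, s2: str) -> float:
    """Count transitions and transversions, then apply the K2P formula."""
    transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    n = len(s1)
    return kimura_2p(transitions / n, transversions / n)


print(round(jukes_cantor(0.1), 4))
print(round(k2p_from_seqs("ACGTACGTAC", "GCGTACGTTC"), 4))
```

When transitions make up exactly one third of all differences, the Kimura distance reduces to the Jukes-Cantor distance.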

Maximum Parsimony method:
Maximum parsimony is a simple but popular technique used to infer a
phylogenetic tree for a set of taxa. It seeks the tree that requires the
smallest number of substitutions (evolutionary changes) for all the sequences
to derive from a common ancestor.
Invariant sites (no character changes) are not used in parsimony.
Informative sites (at least two different kinds of residues, each present in at
least two sequences) are used.
Singleton sites (variable, but not informative) cannot be used in this method.
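Classifying alignment columns as invariant, informative or singleton is mechanical. This is a minimal sketch using the definitions above; the four-sequence alignment is hypothetical.

```python
from collections import Counter


def classify_site(column: str) -> str:
    """Classify one alignment column for parsimony analysis."""
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"  # the same residue in every sequence
    # Informative: at least two residue types, each in at least two sequences.
    if sum(1 for c in counts.values() if c >= 2) >= 2:
        return "informative"
    return "singleton"  # variable, but cannot discriminate between trees


# Hypothetical alignment; each string is one sequence.
alignment = ["AAGAG", "AGCCG", "AGATA", "AGAGA"]
for i, col in enumerate(zip(*alignment)):
    print(i, "".join(col), classify_site("".join(col)))
```

Only the last column (G, G, A, A) is informative here, so only it would be used to score trees.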
Using the informative sites, a number of possible trees are generated and
scored using either the Fitch scheme or the transversion parsimony scheme.
In the Fitch scheme, no change (e.g. G to G) scores 0 and every other
substitution scores 1.
In the TP scheme, no change (G to G) scores 0, purine to pyrimidine and vice
versa (transversions) score 4, and purine to purine or pyrimidine to
pyrimidine (transitions) score 1.
The tree with the lowest total score requires the fewest changes; in this way,
the number of substitutions or evolutionary changes for all the sequences
can be estimated.
Maximum parsimony is much faster than the ML method.
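The Fitch scheme can be applied site by site with Fitch's set-based algorithm. At each internal node, the state sets of the two children are intersected if possible; otherwise their union is taken and one change is counted. This sketch scores the three possible topologies for four hypothetical sequences, each written with an arbitrary root; the Fitch count does not depend on where the root is placed.

```python
from collections import Counter


def fitch(tree, states):
    """Fitch small parsimony for one site on a fixed bifurcating tree.

    tree:   a leaf name (str) or a 2-tuple of subtrees
    states: leaf name -> residue at this site
    Returns (possible states at this node, number of changes in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    (left, n_left), (right, n_right) = (fitch(child, states) for child in tree)
    common = left & right
    if common:  # the children can agree: no extra change needed
        return common, n_left + n_right
    return left | right, n_left + n_right + 1  # one substitution needed


def informative_sites(seqs):
    """Columns with >= 2 residue types, each present in >= 2 sequences."""
    columns = zip(*seqs.values())
    return [i for i, col in enumerate(columns)
            if sum(c >= 2 for c in Counter(col).values()) >= 2]


def tree_length(tree, seqs, sites):
    """Total Fitch score (number of changes) of a tree over the given sites."""
    return sum(fitch(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in sites)


# Hypothetical toy alignment of four sequences.
seqs = {"s1": "AAGAGTTA", "s2": "AGCCGTTC", "s3": "AGATATCA", "s4": "AGAGATCC"}
trees = [(("s1", "s2"), ("s3", "s4")),
         (("s1", "s3"), ("s2", "s4")),
         (("s1", "s4"), ("s2", "s3"))]
sites = informative_sites(seqs)
for t in trees:
    print(t, tree_length(t, seqs, sites))
```

The first topology needs the fewest changes over the informative sites (4 against 5 and 6), so parsimony prefers it.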

Maximum Likelihood method:
In principle, ML methods evaluate all the possible trees containing the set of
organisms considered, and then use statistics to identify the most likely
tree. This exhaustive search is possible for a small number of organisms, but
not for a very large number of organisms.
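The reason exhaustive search breaks down is that the number of possible trees grows super-exponentially. For n taxa there are (2n-5)!! unrooted and (2n-3)!! rooted bifurcating topologies, where !! denotes the double factorial (product of odd numbers). This is a short sketch of those standard counts.

```python
def unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating topologies for n >= 3 taxa: (2n-5)!!"""
    count = 1
    for k in range(3, n + 1):
        count *= 2 * k - 5
    return count


def rooted_trees(n: int) -> int:
    """Number of rooted bifurcating topologies for n >= 2 taxa: (2n-3)!!"""
    count = 1
    for k in range(2, n + 1):
        count *= 2 * k - 3
    return count


for n in (4, 5, 10, 20):
    print(n, unrooted_trees(n), rooted_trees(n))
```

Four taxa give only 3 unrooted trees, but 10 taxa already give 2,027,025, so ML programs rely on heuristic searches for larger datasets.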
Steps:
1. A substitution model (e.g. the Jukes-Cantor model) is chosen for the
sequence data (alignment data).
2. The possible trees are generated, and the likelihood of the data is
calculated for each tree.
3. The tree with the maximum likelihood is taken as the best estimate of the
true tree.
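The likelihood calculation can be seen in miniature with only two sequences. There is then a single possible tree (one branch), so ML reduces to choosing the branch length d that makes the observed matches and mismatches most probable under the Jukes-Cantor model. This sketch uses a plain grid search (an assumption, not how real ML programs optimize) and hypothetical counts of 90 identical and 10 differing sites.

```python
import math


def jc_log_likelihood(d: float, n_same: int, n_diff: int) -> float:
    """Log-likelihood of two aligned sequences at JC69 distance d."""
    e = math.exp(-4 * d / 3)
    p_same = 0.25 * (0.25 + 0.75 * e)  # probability of a site with the same base
    p_diff = 0.25 * (0.25 - 0.25 * e)  # probability of one specific base pair that differs
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def ml_distance(n_same: int, n_diff: int, max_d: float = 3.0, steps: int = 30000) -> float:
    """Maximize the log-likelihood over a grid of distances in (0, max_d]."""
    grid = (max_d * i / steps for i in range(1, steps + 1))
    return max(grid, key=lambda d: jc_log_likelihood(d, n_same, n_diff))


d_ml = ml_distance(90, 10)
d_jc = -0.75 * math.log(1 - 4 * 0.1 / 3)  # closed-form Jukes-Cantor distance
print(round(d_ml, 4), round(d_jc, 4))
```

The grid maximum agrees with the closed-form Jukes-Cantor distance. With more sequences, the same idea is applied to every branch of every candidate tree.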

The basic output of all these methods is a text file containing a description
of the tree. The most widely used format for phylogenetic trees is the Newick
format, which PHYLIP uses for its tree files.
Software: PHYLIP (PHYLogeny Inference Package)
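A Newick tree description is nested parentheses with optional `:length` suffixes and a closing semicolon. This minimal writer assumes a hypothetical `(content, length)` tuple representation, where content is a leaf name or a list of subtrees. Dropping the lengths leaves only the branching pattern.

```python
def to_newick(tree, with_lengths=True):
    """Serialize a tree to a Newick string.

    tree: (content, length) where content is a leaf name (str) or a list of
    child trees, and length is the branch length (None for the root).
    """
    def fmt(node):
        content, length = node
        if isinstance(content, str):
            text = content
        else:
            text = "(" + ",".join(fmt(child) for child in content) + ")"
        if with_lengths and length is not None:
            text += f":{length:g}"
        return text

    return fmt(tree) + ";"


# Hypothetical rooted tree with branch lengths.
tree = ([("4", 6.5),
         ([("2", 4.75), ([("1", 3.5), ("3", 3.5)], 1.25)], 1.75)],
        None)
print(to_newick(tree))                      # (4:6.5,(2:4.75,(1:3.5,3:3.5):1.25):1.75);
print(to_newick(tree, with_lengths=False))  # (4,(2,(1,3)));
```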

Interpretation of Phylogenetic Data
A dendrogram is a broad term for the diagrammatic representation of a
phylogenetic tree.
A cladogram is a phylogenetic tree formed using cladistic methods. This type
of tree represents only the branching pattern; its branch lengths are drawn
equal and carry no information about the amount of change.
A phylogram is a phylogenetic tree whose branch lengths are proportional to
the amount of character change.

Applications of Phylogeny

Classification: Phylogenetics based on sequence data provides us with more
accurate descriptions of patterns of relatedness than was available before the
advent of molecular sequencing. Phylogenetics now informs the Linnaean
classification of new species.
 
Forensics: Phylogenetics is used to assess DNA evidence presented in court
cases, e.g. where someone has committed a crime, when food is contaminated,
or where the father of a child is unknown.

Identifying the origin of pathogens: Molecular sequencing technologies and
phylogenetic approaches can be used to learn more about a new pathogen
outbreak. This includes finding out which species the pathogen is related to
and, subsequently, the likely source of transmission. This can lead to new
recommendations for public health policy.
Conservation: Phylogenetics can help to inform conservation policy when
conservation biologists have to make tough decisions about which species they try
to prevent from becoming extinct.
Bioinformatics and computing: Many of the algorithms developed for
phylogenetics have been used to develop software in other fields.