phylogeny with modern methods explanation with examples
shumailabatool13
Jul 22, 2024
About This Presentation
Explain the new methods of phylogeny
Slide Content
ASSESSING MICROBIAL PHYLOGENY WITH MODERN METHODS
LECTURE 4-1
BY: DR. SAADIA IJAZ
•Macromolecular data, meaning gene (DNA) and protein sequences, are accumulating at
an increasing rate because of recent advances in molecular biology.
•For the evolutionary biologist, the rapid accumulation of sequence data from whole
genomes has been a major advance, because the very nature of DNA allows it to be
used as a "document" of evolutionary history.
•Comparisons of the DNA sequences of various genes between different organisms can
tell a scientist a lot about the relationships of organisms that cannot otherwise be
inferred from morphology, or an organism's outer form and inner structure.
•Because genomes evolve by the gradual accumulation of mutations, the amount of
nucleotide sequence difference between a pair of genomes from different organisms
should indicate how recently those two genomes shared a common ancestor.
•Two genomes that diverged in the recent past should have fewer differences than two genomes
whose common ancestor is more ancient.
•Therefore, by comparing different genomes with each other, it should be possible to derive
evolutionary relationships between them, the major objective of molecular phylogenetics.
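The reasoning above (fewer differences imply a more recent common ancestor) can be illustrated with a minimal Python sketch. The three sequences are hypothetical, already-aligned toy data, and the function names `p_distance` and `closest_pair` are made up for this example; real analyses use evolutionary models rather than raw difference counts.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical, already-aligned toy sequences (not real data).
toy_sequences = {
    "organism_1": "ATGCCGTAAGCT",
    "organism_2": "ATGCCGTTAGCT",
    "organism_3": "ATGTCGATAGGT",
}


def closest_pair(sequences):
    """Return (distance, name_x, name_y) for the least-diverged pair."""
    names = sorted(sequences)
    pairs = [
        (p_distance(sequences[x], sequences[y]), x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
    ]
    return min(pairs)


distance, first, second = closest_pair(toy_sequences)
print(f"{first} and {second} differ at {distance:.1%} of positions")
```

Under the simple assumption that differences accumulate steadily, the pair with the smallest distance is the pair inferred to share the most recent common ancestor.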
Homologs
•Studies of gene and protein evolution often involve the comparison of homologs, sequences
that have common origins but may or may not have common activity.
•Sequences that share an arbitrary level of similarity determined by alignment of matching bases
are homologous.
•These sequences are inherited from a common ancestor that possessed similar structure,
although the ancestor may be difficult to determine because it has been modified through
descent.
Homologs are most commonly classified as orthologs,
paralogs, or xenologs.
Orthologs are homologs produced by speciation—they
represent genes derived from a common ancestor that
diverged because of divergence of the organism. Orthologs
tend to have similar function.
Paralogs are homologs produced by gene duplication and
represent genes derived from a common ancestral gene
that duplicated within an organism and then diverged.
Paralogs tend to have different functions.
Xenologs are homologs resulting from the horizontal
transfer of a gene between two organisms. The function of
xenologs can be variable, depending on how significant the
change in context was for the horizontally moving gene. In
general, though, the function tends to be similar.
An ancestral gene duplicates to produce two paralogs (Genes A
and B). A speciation event produces orthologs in the two
daughter species.
❖Only orthologs can be used in the construction of
phylogenetic trees. The classical example is the 16S
ribosomal RNA gene.
Orthologs versus paralogs: haemoglobin as an example. Orthologs
are a consequence of speciation, whereas paralogs are a
consequence of gene duplication. Human α- and β-haemoglobin
share 43% identity, whereas human α-haemoglobin and mouse
α-haemoglobin share 87% identity. When performing phylogenetic
analyses, the orthologous α-haemoglobin subunits from different
animals branch together, separate from their paralogs, the
β-haemoglobin subunits. Taken together, all haemoglobin subunits are
homologs.
HORIZONTAL GENE TRANSFER
•Lateral or horizontal gene transfer is the movement of fragments of DNA between species not
closely related to each other.
•While most genes of an organism are received from their parent cell, a HGT event allows a gene
to be transferred from another species via transformation, transduction or conjugation.
•Of the genes transferred in this way, those incorporated into the recipient bacterium's genome
are usually 'contingency genes', meaning that their usefulness is specific to certain
environments, which is why they have not always been necessary in the cell throughout
evolutionary time.
•Commonly swapped genes include those for antibiotic resistance, metabolic enzymes for unusual
substrates, and virulence factors, which allow bacteria to quickly adapt to a
new environment or lifestyle.
•This is not a hard and fast rule however, as there is at least one example of an rRNA gene being
transferred and then maintained in the recipient cell.
•The ability of bacteria to take up foreign genes in this way is one of the cornerstones of the
biotechnology industry - but is a problem for evolutionary biologists trying to determine
relationships between organisms.
•Species that diverged billions of years ago can share a gene from an HGT event that occurred
only a matter of decades ago.
•Evolutionary trees are written on the assumption that there is a clonal pattern of evolution.
•Such models assume that there is no horizontal gene transfer between lineages.
•On the other hand, a reticulate model is needed if horizontal gene movement is significant.
•With minor levels of HGT, cobwebs on the tree of life are an appropriate metaphor for
evolutionary change.
•A diagram showing various degrees of lateral gene transfer.
HGT — gene exchange between non-related
organisms — appears commonplace among
bacteria, but contributes just small fragments of
genetic information, leaving the traditional tree
of life intact.
Examples of horizontal gene transfer
Antibiotic resistance
•In the years 1987-1996 in Vietnam there was a medically alarming appearance of
resistance to the antibiotic chloramphenicol in dangerous meningococci bacteria that
cause infectious meningitis, hampering the treatment of the disease in that country.
•Subsequent laboratory investigations revealed that the gene conferring chloramphenicol
resistance on pathogenic meningococcus bacteria was identical to a previously identified
mobile gene named Tn4451 found in a completely different bacterium, Clostridium
perfringens.
•Meningococci are Gram negative and aerobic microbes, while Clostridium is Gram
positive and anaerobic, about as different from one another as bacteria can be.
In order to determine true evolutionary relationships between organisms, it is essential that the
correct molecules be chosen for sequencing studies.
This is important for several reasons:
1.The molecule should be universally distributed across the group chosen for study.
2.It must be functionally homologous in each organism; phylogenetic comparisons must
start with molecules of identical function.
3.It is crucial in sequence comparisons to be able to properly align the two molecules in order to
identify regions of sequence homology and sequence heterogeneity.
4.Finally, the sequence of the molecule chosen should change at a rate commensurate with the
evolutionary distance measured. And in fact the broader the phylogenetic distance being
measured the slower must be the rate at which the sequence changes. A molecule that has
undergone too many sequence changes is useless.
Many molecules have been proposed as molecular chronometers but we only discuss the most
widely used: the rRNA.
Comparison of Proteins
•The amino acid sequences of proteins are direct reflections of mRNA sequences and therefore
closely related to the structures of the genes coding for their synthesis.
•For this reason, comparisons of proteins from different microorganisms are very useful
taxonomically.
•There are several ways to compare proteins. The most direct approach is to determine the
amino acid sequence of proteins with the same function.
•The sequences of proteins with dissimilar functions often change at different rates; some
sequences change quite rapidly, whereas others are very stable.
•Nevertheless, if the sequences of proteins with the same function are similar, the organisms
possessing them are probably closely related.
•The sequences of cytochromes and other electron transport proteins, histones, heat-shock
proteins, transcription and translation proteins, and a variety of metabolic enzymes have been
used in taxonomic studies.
Nucleic Acid Base Composition
•Microbial genomes can be directly compared, and taxonomic similarity can be estimated in many
ways.
•The first, and possibly the simplest, technique to be employed is the determination of DNA base
composition.
•DNA contains four purine and pyrimidine bases: adenine (A), guanine (G), cytosine (C), and
thymine (T).
•In double-stranded DNA, A pairs with T, and G pairs with C.
•Thus the (G+C)/(A+T) ratio or G+C content, the percent of G+C in DNA, reflects the base sequence
and varies with sequence changes as follows:
Mol% G+C = (G + C) / (G + C + A + T) × 100
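As a minimal sketch, the G+C content of a sequence can be computed by counting bases. The function name `gc_percent` and the input strings are hypothetical examples, not measured data.

```python
def gc_percent(dna: str) -> float:
    """Percent of G and C among all A, C, G and T bases in a DNA string."""
    dna = dna.upper()
    g_plus_c = dna.count("G") + dna.count("C")
    total = g_plus_c + dna.count("A") + dna.count("T")
    if total == 0:
        raise ValueError("no A, C, G or T found in the input")
    return 100.0 * g_plus_c / total


# Two different toy sequences with the same base proportions.
print(gc_percent("GGCCAATT"))
print(gc_percent("ATGCATGC"))
```

Both toy strings contain the same proportions of bases in a different order, which shows that a G+C value summarizes composition and not the sequence itself.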
•The base composition of DNA can be determined in several ways.
•Although the G+C content can be ascertained after hydrolysis of DNA and analysis of its bases
with high-performance liquid chromatography (HPLC), physical methods are easier and more
often used.
•The G+C content often is determined from the melting temperature (Tm) of DNA.
•In double-stranded DNA three hydrogen bonds join GC base pairs, and two bonds connect AT
base pairs.
•As a result DNA with a greater G+C content will have more hydrogen bonds, and its strands will
separate only at higher temperatures; that is, it will have a higher melting point.
•DNA melting can be easily followed spectrophotometrically because the absorbance of 260 nm
UV light by DNA increases during strand separation.
•When a DNA sample is slowly heated, the absorbance increases as hydrogen bonds are
broken and reaches a plateau when all the DNA has become single stranded.
•The midpoint of the rising curve gives the melting temperature, a direct measure of the
G+C content.
•Since the density of DNA also increases linearly with G+C content, the percent G+C can be
obtained by centrifuging DNA in a CsCl density gradient.
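The midpoint reading of a melting curve described above can be sketched as follows. The temperatures and absorbance values are invented for illustration, the function name `melting_temperature` is hypothetical, and linear interpolation between neighboring readings is an assumption of this sketch.

```python
def melting_temperature(temps, absorbances):
    """Temperature at which A260 has risen halfway from baseline to plateau.

    Uses linear interpolation between the two readings that bracket the midpoint.
    """
    low, high = min(absorbances), max(absorbances)
    half = low + (high - low) / 2
    points = list(zip(temps, absorbances))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 <= half <= a1 and a1 != a0:
            return t0 + (half - a0) * (t1 - t0) / (a1 - a0)
    raise ValueError("curve never crosses its midpoint")


# Hypothetical readings: temperature in degrees C, absorbance at 260 nm.
temps = [70, 75, 80, 85, 90, 95]
a260 = [1.00, 1.02, 1.10, 1.30, 1.38, 1.40]
print(f"Tm is about {melting_temperature(temps, a260):.1f} degrees C")
```

Converting a Tm into a percent G+C additionally requires an empirical calibration for the buffer used, which is not part of this sketch.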
•The G+C content of many microorganisms has been determined.
•The DNA of both eucaryotic and procaryotic microorganisms varies greatly in G+C
content; procaryotic G+C content is the most variable, ranging from around 25 to almost
80%.
•Despite such a wide range of variation, the G+C content of strains within a
particular species is constant.
•If two organisms differ in their G+C content by more than about 10%, their
genomes have quite different base sequences.
•On the other hand, it is not safe to assume that organisms with very similar
G+C contents also have similar DNA base sequences because two very
different base sequences can be constructed from the same proportions of
AT and GC base pairs.
•Only if two microorganisms also are alike phenotypically does their similar
G+C content suggest close relatedness.
Ribosomal RNA
•Ribosomal ribonucleic acid (rRNA) is the central component of the ribosome.
•Ribosomes are cellular machines for the construction of proteins and enzymes.
•The function of the rRNA is to provide a mechanism for decoding mRNA into amino acids and to
interact with the tRNAs during translation by providing peptidyl transferase activity.
•The tRNA then brings the necessary amino acids corresponding to the appropriate mRNA codon.
Ribosome
•The ribosome is composed of two subunits, named for how rapidly they sediment when subject
to centrifugation.
•mRNA is sandwiched between the small and large subunits, and the ribosome catalyzes the
formation of a peptide bond between the 2 amino acids carried by the tRNAs bound to it.
•The ribosome also has 3 binding sites called A, P, and E.
•The A site in the ribosome binds to an aminoacyl-tRNA (a tRNA bound to an
amino acid).
•The amino (NH2) group of the aminoacyl-tRNA, which contains the new
amino acid, attacks the ester linkage of peptidyl-tRNA (contained within the
P site), which contains the last amino acid of the growing chain, forming a
new peptide bond. This reaction is catalyzed by peptidyl transferase.
•The tRNA that was holding on the last amino acid is moved to the E site, and
what used to be the aminoacyl-tRNA is the peptidyl-tRNA.
•A single mRNA can be translated simultaneously by multiple ribosomes.
Prokaryotic VS. Eukaryotic Ribosomes
Type          Size   Large subunit          Small subunit
prokaryotic   70S    50S (5S, 23S)          30S (16S)
eukaryotic    80S    60S (5S, 5.8S, 28S)    40S (18S)
•Both prokaryotic and eukaryotic ribosomes can be broken down into two subunits
(the S in 16S represents Svedberg units):
•Bigger particles tend to sediment faster and thus have higher svedberg values.
•Sedimentation rate does not depend only on the mass or volume of a particle, and
when two particles bind together there is inevitably a loss of surface area.
•Thus when measured separately they will have svedberg values that may not add
up to that of the bound particle.
•The svedberg is the most important measure used to distinguish ribosomes, which
are important in phylogenetic studies.
•In prokaryotes a small 30S ribosomal subunit contains the 16S rRNA.
•The large 50S ribosomal subunit contains two rRNA species (the 5S and 23S
rRNAs).
•Bacterial 16S, 23S, and 5S rRNA genes are typically organized as a co-transcribed
operon.
•There may be one or more copies of the operon dispersed in the genome (for
example, Escherichia coli has seven).
•Archaea contain either a single rDNA operon or multiple copies of the operon.
•The 3' end of the 16S rRNA (in a ribosome) binds to a sequence on the 5' end of
mRNA called the Shine-Dalgarno sequence.
Why is rRNA important in Phylogenetic Analysis?
Due to the essential function of ribosomal nucleic acids:
• Mutation is often lethal
• Independent (constant) pressure of selection
• Highly conserved at many positions
• Comparison of analogous, but variable sequences
• Almost no gene transfer
Changes of sequences happen with a constant speed, but slowly
enough to mirror the whole time of bacterial evolution.
•This diagram shows conserved and variable regions of the small subunit rRNA (16S in
prokaryotes or 18S in eukaryotes).
•Each dot and triangle represents a position that holds a nucleotide in 95% of all organisms
sequenced, though the actual nucleotide present (A, U, C, or G) varies among species.
•The starred region from part A as it appears in a bacterium (Escherichia coli), an archaean
(Methanococcus vannielii), and a eukaryote (Saccharomyces cerevisiae).
•This region includes important signature sequences for the Bacteria and Archaea.
•Despite the usefulness of G+C content determination and nucleic acid hybridization studies,
genome structures can be directly compared only by sequencing DNA and RNA.
•Techniques for rapidly sequencing both DNA and RNA are now available; thus far RNA
sequencing has been used more extensively in microbial taxonomy.
•Most attention has been given to sequences of the 5S and 16S rRNAs isolated from the 50S and
30S subunits, respectively, of procaryotic ribosomes.
•The rRNAs are almost ideal for studies of microbial evolution and relatedness since they are
essential to a critical organelle found in all microorganisms.
Nucleic Acid Based Sequencing
•Their functional role is the same in all ribosomes.
•Furthermore, their structure changes very slowly with time, presumably because of their
constant and critical role.
•Because rRNA contains variable and stable sequences, both closely related and very distantly
related microorganisms can be compared.
•This is an important advantage as distantly related organisms can be studied only using
sequences that change little with time.
•There are several ways to sequence rRNA. Ribosomal RNAs can be characterized in terms of
partial sequences by the oligonucleotide cataloging method as follows.
•Purified, radioactive 16S rRNA is treated with the enzyme T1 ribonuclease, which cleaves it into
fragments.
•The fragments are separated, and all fragments composed of at least six nucleotides are
sequenced.
•The sequences of corresponding 16S rRNA fragments from different procaryotes are
then aligned and compared using a computer, and association coefficients (Sab values)
are calculated.
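The association coefficient mentioned above can be sketched in Python. The slides do not give the formula; this sketch assumes the commonly cited form Sab = 2·N_AB / (N_A + N_B), where N_A and N_B are the total nucleotides in the catalogued oligonucleotides (six or more residues) of each organism and N_AB is the total in oligonucleotides common to both. The fragment lists are hypothetical toy data and the function names are made up for illustration.

```python
from collections import Counter

MIN_LENGTH = 6  # only fragments of at least six nucleotides are catalogued


def catalog(oligos):
    """Multiset of fragments long enough to be included in the catalog."""
    return Counter(o for o in oligos if len(o) >= MIN_LENGTH)


def total_nucleotides(cat):
    return sum(len(oligo) * count for oligo, count in cat.items())


def s_ab(oligos_a, oligos_b):
    """Association coefficient under the assumed formula 2*N_AB / (N_A + N_B)."""
    cat_a, cat_b = catalog(oligos_a), catalog(oligos_b)
    n_a, n_b = total_nucleotides(cat_a), total_nucleotides(cat_b)
    if n_a + n_b == 0:
        raise ValueError("no fragments of sufficient length")
    n_ab = total_nucleotides(cat_a & cat_b)  # multiset intersection
    return 2 * n_ab / (n_a + n_b)


# Hypothetical T1 ribonuclease fragments (toy data).
organism_a = ["AUCCUG", "CUAAUACG", "UUCCCG", "AAG"]
organism_b = ["AUCCUG", "CUAAUACG", "UACACG", "CG"]
print(round(s_ab(organism_a, organism_b), 2))
```

A value of 1.0 means the two catalogs are identical, and values approach 0 as fewer fragments are shared.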
•Complete rRNAs now are sequenced using procedures like the following.
•First, RNA is isolated and purified. Then, reverse transcriptase is used to make
complementary DNA (cDNA) using primers that are complementary to conserved rRNA
sequences.
•Next, the polymerase chain reaction amplifies the cDNA.
•Finally, the cDNA is sequenced and the rRNA sequence deduced from the results.
•Recently, complete procaryotic genomes have been sequenced.
•Direct comparison of complete genome sequences undoubtedly will become important
in procaryotic taxonomy.
Polymerase Chain Reaction
•The commonly used method to obtain DNA for sequence analysis is Polymerase Chain
Reaction (PCR).
•PCR amplifies genes exponentially: a single molecule of a gene, embedded in the rest
of the genomic DNA, is specifically amplified to about a million molecules in just a
couple of hours.
•In a PCR reaction, 3 steps (denaturation, primer annealing, and DNA polymerization)
are cycled over and over, each time doubling the amount of the specific DNA fragment.
•The PCR product DNA is then "sequenced" (i.e. its nucleotide sequence is
determined), often using the same oligonucleotide primers that were used in the PCR
reaction.
•Sequencing involves denaturing the DNA, annealing an oligonucleotide primer, and
extending from this primer with DNA polymerase in the presence of dNTPs and small
amounts of 'chain terminator' dideoxynucleotides (analogs of dNTPs that DNA
polymerase cannot continue extending from).
•Usually this process is carried out by a commercial service rather than in a research
lab.
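The per-cycle doubling described for PCR above can be checked with a short calculation. This is an idealized sketch: the function names are hypothetical, and the `efficiency` parameter (1.0 meaning perfect doubling) is an assumption added for illustration.

```python
def copies_after(cycles: int, starting_copies: int = 1, efficiency: float = 1.0) -> float:
    """Copies of the target after a number of cycles; efficiency 1.0 is perfect doubling."""
    return starting_copies * (1 + efficiency) ** cycles


def cycles_needed(target_copies: float, starting_copies: int = 1, efficiency: float = 1.0) -> int:
    """Smallest number of cycles that reaches at least target_copies."""
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    copies = float(starting_copies)
    cycles = 0
    while copies < target_copies:
        copies *= 1 + efficiency
        cycles += 1
    return cycles


print(cycles_needed(1_000_000))  # cycles to pass a million copies from one molecule
print(copies_after(20))          # copies after 20 ideal cycles
```

With perfect doubling, one template molecule passes a million copies at cycle 20, since 2 to the power 20 is 1,048,576.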
1. Denature the DNA (separate the strands) with heat or high pH.
2. Anneal an oligonucleotide primer complementary to the DNA.
3. Add all 4 dNTPs, a small amount of each of the 4 dideoxynucleotides
(ddNTPs), each with a different fluorescent 'tag', and DNA polymerase.
4. Run the sample on a high-resolution gel or capillary tube that can
separate DNAs that differ by only a single base.
•A fluorometer at the bottom of the gel or end of the capillary detects the termination dyes
as they run past. The connected computer collects this data and 'reads' the
sequence from the pattern of peaks.
[Figure: example sequencing output from the computer; the colors used do not match the example.]
•Each reaction typically yields 500-800 bases of reliable sequence data, so it is usually
necessary to use several primers spaced along the length of the molecule to get the complete
sequence of an rRNA gene.
•It is also usually expected that you will sequence both strands of the DNA to confirm the
sequence.
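The chain-termination read-out described above can be modeled in a few lines. This is a simplified sketch under stated assumptions: the primer is ignored, every possible terminated product is present exactly once, and the template and function names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template_3_to_5: str):
    """Every chain-terminated product as (length, labelled terminal base).

    The template is written 3'->5', so the new strand is built 5'->3'.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_sequence(fragments):
    """Order fragments by length, as the gel or capillary does, and read the labels."""
    return "".join(base for _, base in sorted(fragments))


fragments = terminated_fragments("TACGGT")  # hypothetical template
random.Random(0).shuffle(fragments)         # products are mixed in the tube
print(read_sequence(fragments))
```

Because each fragment ends in a labelled dideoxynucleotide, sorting by size alone is enough to recover the order of bases in the newly made strand.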
Assembling sequences in a multiple sequence alignment -
identifying homologous residues
•The raw material used by a phylogenetic tree generating program is an
“alignment”.
•A sequence alignment is a 2-dimensional matrix of multiple sequences.
•Each sequence is in a line (row) of the matrix.
•Each position (column) in an alignment contains homologous (corresponding)
residues of each sequence.
•Gaps (usually shown as dashes) are added where needed to maintain the
alignment - these gaps represent bases "absent" from a sequence that are present in
some other sequence(s) in the alignment.
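The matrix view of an alignment described above can be sketched with plain Python strings. The sequences are hypothetical toy data, and the helper names `columns` and `classify` are made up for this example.

```python
# Each row is one sequence; dashes mark gaps. Hypothetical toy alignment.
alignment = {
    "seq_1": "ATGC-GTA",
    "seq_2": "ATGCAGTA",
    "seq_3": "ATCC-GTT",
}


def columns(aln):
    """Return the alignment column by column (one string per position)."""
    rows = list(aln.values())
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return ["".join(column) for column in zip(*rows)]


def classify(column: str) -> str:
    """Label a column as gapped, conserved, or variable."""
    if "-" in column:
        return "gapped"
    return "conserved" if len(set(column)) == 1 else "variable"


for position, column in enumerate(columns(alignment), start=1):
    print(position, column, classify(column))
```

Tree-building programs work on these columns, since each column is assumed to hold residues that descend from the same ancestral position.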
ASSESSING
MICROBIAL
PHYLOGENY
WITH
MODERN
METHODS
LECTURE 4-2
BY: DR. SAADIA IJAZ
F.I.S.H
•Fluorescent in situ hybridization (FISH) is a powerful technique for detecting RNA
or DNA sequences in cells, tissues, and tumors.
•FISH provides a unique link among the studies of cell biology, cytogenetics, and
molecular genetics.
•Fluorescent in situ hybridization is a technique in which single-stranded nucleic
acids (usually DNA, but RNA may also be used) are permitted to interact so that
complexes, or hybrids, are formed by molecules with sufficiently similar,
complementary sequences.
•Through nucleic acid hybridization, the degree of sequence identity can be
determined, and specific sequences can be detected and located on a given
chromosome.
•In this method, cells in an environmental sample are treated to make them
permeable (e.g. with toluene) and mixed with an oligonucleotide probe that
contains a fluorescent tag such as Texas Red or Acridine Orange.
•The probe will find matches in the DNA and/or RNA of the permeabilized cells and
stick.
•Unannealed probe is washed out, and the sample is examined by fluorescence
microscopy.
•If enough probe accumulates in a cell, i.e. if it contains the target to which the probe is
designed, it should be fluorescent.
•The most common probes for FISH in the microbial world target the rRNA.
•There are two reasons why rRNAs are a good target for fluorescent in situ
hybridization. Firstly, it allows you to search using phylogenetically relevant sequences
- often rRNA sequence is all you know about an organism.
•This also allows you to 'tune' the range of organisms you will label - the more
conserved the target region of the rRNA, the wider the phylogenetic range of cells you
will label.
•Secondly, because there are thousands of ribosomes in each cell, a lot of probe can
bind to each cell, giving a strong signal.
•In fact, it is generally seen that only metabolically-active cells contain enough
ribosomes to be labeled with an rRNA probe.
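The way probe choice 'tunes' the range of labelled organisms can be sketched as a string search. Everything here is a toy model: the rRNA fragments are invented, the function names are hypothetical, and binding is assumed to require a perfect match, whereas real hybridization depends on stringency and can tolerate mismatches.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def design_probe(target_rna: str) -> str:
    """DNA oligonucleotide (5'->3') complementary to an rRNA target region."""
    return "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(target_rna))


def probe_binds(probe_dna: str, rrna: str) -> bool:
    """True if the rRNA contains the exact complement of the probe."""
    target = "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(probe_dna))
    return target in rrna


def labelled(probe_dna: str, rrnas: dict) -> list:
    """Names of the organisms whose rRNA the probe would label."""
    return sorted(name for name, rrna in rrnas.items() if probe_binds(probe_dna, rrna))


# Hypothetical rRNA fragments: a shared conserved stretch plus a variable tail.
rrnas = {
    "taxon_A": "AAGCUACCUUGUCCAUGGCA",
    "taxon_B": "AAGCUACCUUGUCCUUAGCA",
    "taxon_C": "AAGCUACCUUGUGGAUCGCA",
}

broad_probe = design_probe("GCUACCUUGU")   # targets the conserved stretch
specific_probe = design_probe("CCAUGGCA")  # targets taxon_A's variable tail
print(labelled(broad_probe, rrnas))
print(labelled(specific_probe, rrnas))
```

The probe aimed at the conserved stretch labels all three toy taxa, while the probe aimed at the variable tail labels only one.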
Bacterial cells clustered around the filament of
Anabaena and hybridized with fluorescently
labeled, highly specific 16S rRNA probes (FISH).
Protein Profiles and Amino Acid Sequencing
•Every protein molecule consists of a specific sequence of amino acids and has a particular
shape with an assortment of surface charges.
•Modern laboratory methods allow cells or organisms to be compared according to these
properties of their proteins.
•Although variations in proteins among cells make these techniques difficult to apply to
multicellular organisms, they are quite helpful in studying unicellular organisms.
•A protein profile is a laboratory-prepared pattern of the proteins found in a cell.
•Because a cell's proteins are the products of its genes, the cells of each species
synthesize a unique array of proteins, as distinctive as a fingerprint is for humans.
•Analysis of the profiles of one or more proteins of different bacterial species provides a
reasonable basis for comparisons.