Alignment Recap
- What is the fundamental principle underpinning multiple sequence alignment? Homology: we want every column in our alignment to consist of characters that share common ancestry
- Alignments are made up of conserved and variable blocks
- Alignment programs aim to maximise conserved blocks
- CLUSTAL: very widely used, uses progressive sequence alignment
- MUSCLE: more sophisticated program, uses multiple processes to avoid the problem of ‘once a gap, always a gap’
- Gap penalties: gaps are rare in nature, so we want to penalize them in our alignments. Use different gap penalties for datasets of closely and distantly related sequences
Introduction to Phylogenetic Trees
Aims: To learn how to read and interpret phylogenetic trees and get a general overview of phylogenetic analysis
Objectives: at the end of this lecture you should:
- be able to understand what a phylogenetic tree is
- understand the standard terminology used in phylogenetics
- be able to differentiate between monophyletic, paraphyletic & polyphyletic groups
- be able to differentiate between orthologues & paralogues
- be able to write out phylogenetic trees in the Newick format
Terminology
- phylogenetic tree = evolutionary tree = phylogeny = dendrogram: a graphic display of predicted evolutionary relationships
  - may be created for genes, proteins or species
  - consists of “branches” and “nodes”
- branches, also called “edges”: internal (nodes to nodes) or terminal (nodes to terminals)
- terminals = “leaves” = operational taxonomic units, “OTUs”
  - OTUs = genes or proteins, tree = “gene tree”
  - OTUs = organisms (taxa), tree = “species tree”
[Figure: example tree with terminals labelled a to i]
Terminology
- node = point at which two branches diverge
  - corresponds to a hypothetical last common ancestor
  - the branching represents a divergence event
  - gene tree: divergence = gene duplication event
  - species tree: divergence = speciation event
- root = origin of the tree, or sub-tree
[Figure: example tree with terminals labelled a to i]
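The terminology above maps naturally onto a small data structure. The following is a minimal sketch, assuming a rooted tree; the class name `Node`, its methods and the four-taxon example are hypothetical and only illustrate how nodes, branches and terminals relate.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node in a rooted tree; a node with no children is a terminal (leaf, OTU)."""
    name: Optional[str] = None  # label, normally only set for terminals
    children: List["Node"] = field(default_factory=list)

    def is_terminal(self) -> bool:
        return not self.children

    def terminals(self) -> List[str]:
        """Names of all terminals (OTUs) descended from this node."""
        if self.is_terminal():
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.terminals())
        return names

    def count_branches(self) -> int:
        """Number of branches (edges) below this node, internal plus terminal."""
        return sum(1 + child.count_branches() for child in self.children)


# Hypothetical four-taxon tree: ((a,b),(c,d))
root = Node(children=[
    Node(children=[Node("a"), Node("b")]),
    Node(children=[Node("c"), Node("d")]),
])
print(root.terminals())       # ['a', 'b', 'c', 'd']
print(root.count_branches())  # 6 (2 internal branches + 4 terminal branches)
```

Each internal `Node` stands for a hypothetical last common ancestor, and the node that no other node lists as a child is the root.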
Terminology: Polytomy
- Unresolved phylogenetic relationships: three or more branches leading from one node
- Polytomies may result from a lack of phylogenetic data (soft polytomy)
- Soft polytomies may be resolved by increasing the phylogenetic signal, i.e. using more data
- Polytomies may arise if multiple speciation events take place instantaneously (hard polytomy); see the D. simulans complex
- Hard polytomies cannot be resolved by increasing the volume of data
[Figure: Drosophila simulans complex]
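Since a polytomy is simply a node with three or more descendant branches, it can be detected mechanically. A minimal sketch, assuming the tree is written as nested Python tuples with strings as terminals; the function name `find_polytomies` and the example tree are hypothetical.

```python
def find_polytomies(tree, path="root"):
    """Return the paths of all nodes with three or more descendant branches.

    A tree is a nested tuple; a terminal is a plain string.
    """
    if isinstance(tree, str):
        return []
    found = [path] if len(tree) >= 3 else []
    for i, child in enumerate(tree):
        found.extend(find_polytomies(child, f"{path}.{i}"))
    return found


# Hypothetical tree (A,(B,C,D)): B, C and D leave the same node
tree = ("A", ("B", "C", "D"))
print(find_polytomies(tree))  # ['root.1']
```

The check cannot say whether a polytomy is soft or hard; that depends on whether more data would resolve it.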
Rooting Phylogenetic Trees
- root = oldest point in the tree
- with a molecular clock, the root would be in the middle (i.e. common ancestor equidistant from everything)
- without a clock (i.e. in the real world), an external point of reference is needed = outgroup = anything not in your ingroup (= group of interest)
  - for gene trees, can use a distant relative/gene family
  - for species trees, use the sister group = closest relative to the ingroup
- the root doesn’t change distances, but shows the chronological order of events
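The "root in the middle" idea under a molecular clock can be illustrated with midpoint rooting: take the two most distant terminals and place the root halfway along the path between them. This is a minimal sketch under that assumption, not a full rooting procedure; the function name `midpoint` and the distances are hypothetical.

```python
def midpoint(distances):
    """Return the most distant pair of taxa and the root's distance from each.

    distances maps a pair of taxon names to the path length between them.
    """
    (taxon_a, taxon_b), longest = max(distances.items(), key=lambda item: item[1])
    return taxon_a, taxon_b, longest / 2


# Hypothetical pairwise path lengths (substitutions per site)
distances = {("A", "B"): 0.2, ("A", "C"): 0.6, ("B", "C"): 0.5}
print(midpoint(distances))  # root lies 0.3 from both A and C
```

Without a clock this placement can be misleading, which is why an outgroup is used in practice.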
Phylograms & Cladograms
Phylogram:
- Branch lengths informative
- Scale bar (number of substitutions per site)
Cladogram:
- Branch lengths non-informative
- No scale bar
- Easier to read
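In text form the difference is just whether branch lengths are recorded. In the Newick format covered later in this lecture, a branch length is written after a colon, so a phylogram description can be reduced to a cladogram description by dropping those values. A minimal sketch; the function name and the example tree are hypothetical.

```python
import re


def strip_branch_lengths(newick: str) -> str:
    """Remove ':<number>' branch lengths from a Newick string."""
    return re.sub(r":[0-9.eE+-]+", "", newick)


# Hypothetical phylogram: branch lengths in substitutions per site
phylogram = "((A:0.1,B:0.2):0.05,C:0.3);"
print(strip_branch_lengths(phylogram))  # ((A,B),C);
```

The reverse is not possible: a cladogram carries only the branching order, so the lengths cannot be recovered from it.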
Tree Conversion
[Figure: trees of the seven taxa Coffee, Chocolate, Caviar, Oyster, Lobster, Truffle and Nori, shown as equivalent (=) drawings]
Trees Are About Groupings
- nodes define “clades”
- clade = monophyletic group = node plus all descendants
  - share a unique common ancestor (relative to the rest of the tree) and a common history
- monophyletic (pure) group (clade)
- paraphyletic group (convenience)
- polyphyletic group (similarities = parallel / convergent evolution)
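The definition "node plus all descendants" gives a direct test for monophyly: a group is a clade if some node in the tree has exactly that group as its set of terminals. A minimal sketch, assuming trees written as nested tuples with strings as terminals; the function names and the example tree are hypothetical.

```python
def leaves(tree):
    """Set of terminal names below a node; terminals are plain strings."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some node's descendants are exactly the members of group."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical tree (((A,B),C),(D,E))
tree = ((("A", "B"), "C"), ("D", "E"))
print(is_monophyletic(tree, {"A", "B", "C"}))  # True: a node plus all its descendants
print(is_monophyletic(tree, {"B", "C"}))       # False: leaves out A
print(is_monophyletic(tree, {"A", "B", "D"}))  # False
```

The test only separates monophyletic from non-monophyletic groups. Telling paraphyly from polyphyly additionally needs a view on what the common ancestor looked like.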
Paraphyly & Polyphyly
- The two groupings are often difficult to differentiate
- The smaller number of parsimony steps indicates which is the more likely type of group
Adapted from James et al. (2006) Nature 443:818-822
Paraphyly & Polyphyly
- Chytridiomycota paraphyly: 6 steps
- Chytridiomycota polyphyly: 4 steps
[Figure: tree with each step marked G or L]
Adapted from James et al. (2006) Nature 443:818-822
Paraphyly & Polyphyly
- Zygomycota paraphyly: 3 steps
- Zygomycota polyphyly: 5 steps
[Figure: tree with each step marked G or L]
Adapted from James et al. (2006) Nature 443:818-822
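The slides compare hand-counted steps for two scenarios. A related, standard way to obtain the minimum number of changes of one character on a fixed tree is Fitch parsimony. The sketch below is a swapped-in illustration of step counting, not the procedure used by James et al.; the tree, the trait values and the function name are hypothetical, and the tree must be strictly bifurcating.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum steps) for a bifurcating tree.

    tree: nested 2-tuples with string terminals.
    states: dict mapping each terminal name to its character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The two descendants can agree: no change needed at this node
        return common, left_steps + right_steps
    # No shared state: one change is required on a branch below this node
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical tree and trait (1 = present, 0 = absent)
tree = ((("A", "B"), "C"), ("D", "E"))
states = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0}
_, steps = fitch(tree, states)
print(steps)  # 2
```

Counting steps separately for each hypothesis, as on the slides, and keeping the smaller total follows the same parsimony logic.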
Questions
What type of group are the mammals?
a) monophyletic, b) paraphyletic, c) polyphyletic
Answer: a) monophyletic
[Figure: tree with a single step, marked G]
Questions
What type of group are the reptiles?
a) monophyletic, b) paraphyletic, c) polyphyletic
Answer: b) paraphyletic (3 steps vs. 4 steps)
[Figure: tree with each step marked G or L]
Questions
What type of group are the slugs?
a) monophyletic, b) paraphyletic, c) polyphyletic
3 steps vs. 5 steps
[Figure: tree with each step marked G or L]
Questions
What type of group are the Monosiga?
a) monophyletic, b) paraphyletic, c) polyphyletic
2 steps vs. 7 steps
[Figure: tree with each step marked G or L]
Adapted from Nitsche et al. (2011) JEM 58:452-462
Questions
What type of group are the Salpingoeca?
a) monophyletic, b) paraphyletic, c) polyphyletic
Unknown: 4 steps vs. 4 steps
[Figure: tree with each step marked G or L]
Adapted from Nitsche et al. (2011) JEM 58:452-462
Homologues can be Orthologues or Paralogues
- X and X’ are paralogues, i.e. XX’ is a multigene family
- All the X genes are orthologues of each other
- All the X’ genes are orthologues of each other
- for orthologues, gene trees = species trees
- It is essential not to mix up orthologues and paralogues for species trees
[Figure: panels A and B]
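The distinction can be read off a gene tree once each internal node is labelled with the event it represents: two genes are orthologues if their last common ancestor node is a speciation, and paralogues if it is a duplication. A minimal sketch, assuming internal nodes are tuples of the form (event, left, right); the function names, gene names and species are hypothetical.

```python
def leaves(node):
    """Set of gene names below a node; terminals are plain strings."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def relationship(node, gene_a, gene_b):
    """Classify two genes by the event at their last common ancestor node."""
    if isinstance(node, str):
        raise ValueError("both genes must be distinct terminals of the tree")
    event, left, right = node
    pair = {gene_a, gene_b}
    if pair <= leaves(left):
        return relationship(left, gene_a, gene_b)
    if pair <= leaves(right):
        return relationship(right, gene_a, gene_b)
    if pair <= leaves(node):
        # The genes sit on opposite sides, so this node is their common ancestor
        return "orthologues" if event == "speciation" else "paralogues"
    raise ValueError("both genes must be distinct terminals of the tree")


# Hypothetical gene tree: an ancient duplication gave X and X', then the species split
gene_tree = (
    "duplication",
    ("speciation", "X_human", "X_mouse"),
    ("speciation", "X'_human", "X'_mouse"),
)
print(relationship(gene_tree, "X_human", "X_mouse"))   # orthologues
print(relationship(gene_tree, "X_human", "X'_mouse"))  # paralogues
```

Building a species tree from a mixture of X and X’ sequences would recover the duplication rather than the speciation events, which is why the two must not be mixed.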
Xenologues (Xenology)
- Results from Lateral Gene Transfer (LGT)
- very common in bacteria, e.g. pathogenicity islands, antibiotic resistance genes
- important in bacterial evolution -> new metabolic pathways, etc. (e.g. E. coli K12 vs E. coli O157, ~1.5 Mb difference in genome size)
[Figure: two trees of E. coli, Listeria, Salmonella, Bacillus, Mycoplasma and Strep]
Two Main Categories of Phylogenetic Methods
Distance methods
- also referred to as “clustering” or “algorithmic” methods
- tree based on a single metric: % difference (distance) between sequences
- take data (matrix of % D), plug into equation -> tree, one solution only
- fast, easy, reasonably accurate, good enough for many things
- methods: (UPGMA), neighbour-joining
Discrete data (tree searching) methods
- each column in alignment = discrete data point, i.e. a hypothesis for each column of the alignment
- look for the tree that best fits this collection of hypotheses
- for >8 OTUs = a lot of possible trees
- much more detail, precision..., much slower
- methods: parsimony, maximum likelihood, Bayesian inference
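Two points above lend themselves to a short calculation: the "% D" matrix that distance methods start from, and why tree searching becomes expensive as OTUs are added. The sketch below computes uncorrected percentage differences for a toy alignment and counts unrooted bifurcating trees with the standard formula (2n-5)!!. The alignment, the function names and the choice to skip gapped columns are assumptions made for illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Percentage of aligned positions that differ, skipping columns with a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    differences = sum(a != b for a, b in pairs)
    return 100.0 * differences / len(pairs)


def num_unrooted_trees(n_otus):
    """Number of unrooted bifurcating trees for n OTUs: (2n-5)!! = 3 * 5 * ... * (2n-5)."""
    count = 1
    for k in range(3, 2 * n_otus - 4, 2):
        count *= k
    return count


# Hypothetical three-sequence alignment
alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACCTAGGA"}
for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
    print(name_a, name_b, p_distance(seq_a, seq_b))  # A B 12.5, A C 37.5, B C 25.0

print(num_unrooted_trees(8))   # 10395
print(num_unrooted_trees(10))  # 2027025
```

A distance method such as neighbour-joining turns the matrix into one tree directly, whereas a discrete data method has to evaluate candidates from this rapidly growing set.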
Newick Format: Written Tree Description
- also called “New Hampshire format”
- common interchange format, read by most tree drawing programs
- describes a tree using set notation
  - all parentheses must be balanced
  - all taxa and groups are separated by commas
  - no spaces
  - ended by a semi-colon
- Very useful for producing representative phylogenies from different studies
[Figure: tree of taxa A to F written out group by group: (A,B), then ( ,C), (D,E), ( , ) and finally ( ,F);]
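The rules above are simple enough to apply in a few lines. The sketch below writes a nested-tuple tree as a Newick string and checks that the parentheses balance. The six-taxon topology is one assumed reading of the slide's figure fragments, and the function names are hypothetical.

```python
def to_newick(tree):
    """Write a nested-tuple tree as Newick (topology only, no branch lengths)."""
    def fmt(node):
        if isinstance(node, str):
            return node
        # Taxa and groups are separated by commas, with no spaces
        return "(" + ",".join(fmt(child) for child in node) + ")"
    return fmt(tree) + ";"  # the description ends with a semi-colon


def parentheses_balanced(newick):
    """True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Assumed topology for taxa A to F
tree = (((("A", "B"), "C"), ("D", "E")), "F")
newick = to_newick(tree)
print(newick)                        # ((((A,B),C),(D,E)),F);
print(parentheses_balanced(newick))  # True
```

Working from the innermost pairs outwards, as the figure does, is the easiest way to write a Newick string by hand.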
Summary
- A phylogenetic tree provides a graphical representation of evolutionary relationships between genes, proteins or species
- The root represents the oldest part of the tree from which all descendant sequences are derived
- An outgroup, of closely related sequences, is required in order to accurately place the root
- A monophyletic group, or clade, is a pure group which contains the ancestral node and all descendant sequences
- Orthologues are homologous genes derived by speciation events
- Paralogues are homologous genes derived by gene duplication events within a single genome
- Molecular phylogenies are created using either distance methods or discrete data methods
- The Newick format allows phylogenies to be written out in a single line of text