Dr. ANUPAM KUMAR ANVESHI, SR MICROBIOLOGY, VMMC & SJH
METAGENOMICS: Molecular, Escherichia coli, Thymine, Adenine, Guanine, Environment, Nucleus, Oligonucleotides, Metadata, In-situ, Cytosine, Sequencing
CONTENTS
- Great plate count anomaly
- Introduction
- Metagenomic study workflow
- Pre-sequencing considerations
- Sampling and data generation
- Screening of metagenomic data
- Applications
- Limitations
Great plate count anomaly * Staley & Konopka (1985)
Metagenomics* (environmental genomics / ecogenomics / community genomics) is the study of the collective genomes of the members of a microbial community WITHOUT CULTURING THE ORGANISMS IN THE COMMUNITY.
* Jo Handelsman et al. (1998)
METAGENOMIC STUDY WORKFLOW
[Figure: workflow from community of organisms to metagenomic library]
Pre-sequencing considerations
- Community composition
- Selection of sequencing technology
Community composition
- Microbial communities comprise combinations of bacteria, archaea, eukaryotes, and viruses, often co-occurring in a single habitat
- Currently, only communities with no eukaryotic or protist members are studied (because of the enormous genome sizes of these members)
- Community complexity is a function of the number of species in the community (richness) and their relative abundance (evenness)
- Two types of community:
  - With dominant species (acid mine drainage biofilms, gutless worm symbiont community, enhanced biological phosphorus-removing sludge, anaerobic ammonia-oxidising reactor)
  - Without dominant species (species-rich communities found in soil)
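Richness and evenness can be made concrete with a small calculation. The sketch below is illustrative only: the slide does not name a specific evenness measure, so the Shannon index and Pielou's evenness are assumed here, and the species names and counts are invented.

```python
import math

def richness(counts):
    """Number of species observed at least once."""
    return sum(1 for c in counts.values() if c > 0)

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over observed species."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

def pielou_evenness(counts):
    """Pielou's J = H / ln(richness); 1.0 means perfectly even."""
    s = richness(counts)
    if s < 2:
        return 0.0  # evenness is not meaningful for fewer than two species
    return shannon_index(counts) / math.log(s)

# Hypothetical communities with the same richness but different evenness
dominated = {"species_A": 970, "species_B": 10, "species_C": 10, "species_D": 10}
even = {"species_A": 250, "species_B": 250, "species_C": 250, "species_D": 250}

for name, community in (("dominated", dominated), ("even", even)):
    print(name, richness(community), round(pielou_evenness(community), 3))
```

Both toy communities have a richness of 4, but the community with a dominant species scores far lower on evenness.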
Selection of sequencing technology
- In metagenomics, LONGER READS ARE BETTER
- Longer reads = sequencing of more variable regions = more discriminatory power
- Greater taxa identification
Sampling and data generation
- Sample collection for metagenomes and metadata
- Pre-metagenome community composition profiling
- Nucleic acid extraction
- Metagenomic library preparation
Sample collection for metagenomes and metadata
- Sufficient samples are collected
- Complementary analyses are performed to enhance the metagenomic data (metatranscriptomics, metaproteomics, viral metagenomics, metabolomics, etc.)
- Collateral non-sequence data (metadata) are collected for comparative analysis of temporal differences within the same community, i.e. collection date, pH, temperature, salinity, geographical data, DNA extraction method and clone library details
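One way to keep such metadata attached to each sample is a small record type. This is a minimal sketch, not a standard schema: the class name, field names, units and sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class SampleMetadata:
    # Field names and units are hypothetical, chosen to mirror the slide's list
    sample_id: str
    site: str
    collected_on: date
    ph: float
    temperature_c: float
    salinity_psu: float
    extraction_method: str

def time_series(samples, site):
    """Samples from one site ordered by collection date, for temporal comparison."""
    return sorted((s for s in samples if s.site == site), key=lambda s: s.collected_on)

# Invented example records
samples = [
    SampleMetadata("S3", "pond_1", date(2020, 9, 1), 7.4, 24.0, 0.4, "direct"),
    SampleMetadata("S1", "pond_1", date(2020, 3, 1), 7.1, 12.5, 0.3, "direct"),
    SampleMetadata("S2", "soil_7", date(2020, 6, 1), 6.2, 18.0, 0.0, "indirect"),
]

for s in time_series(samples, "pond_1"):
    print(s.sample_id, s.collected_on, s.ph)
```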
Pre-metagenome community composition profiling
- Community composition is necessary for sequence allocation and processing
- Conserved marker gene analysis: 16S rRNA analysis
  - Structural RNA of the prokaryotic ribosome
  - Ribosomal RNAs and proteins tend to be very similar even in very different organisms
  - Contains highly conserved regions as well as variable regions
  - Conserved regions are used to construct primers for pan-bacterial PCR (polymerase chain reaction)
- 16S rRNA sequencing is the basis of microbial ecology studies
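The idea of conserved regions (candidate primer sites) flanking variable regions can be sketched with a per-column conservation score over an alignment. The three aligned sequences below are invented toy strings, not real 16S rRNA, and the scoring rule (fraction of sequences sharing the most common base) is an assumption chosen for simplicity.

```python
from collections import Counter

def column_conservation(aligned):
    """Fraction of sequences sharing the most common base, for each alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for column in zip(*aligned):
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(aligned))
    return scores

def conserved_windows(aligned, width, min_score=1.0):
    """Start positions of windows in which every column meets min_score."""
    scores = column_conservation(aligned)
    return [
        start
        for start in range(len(scores) - width + 1)
        if all(score >= min_score for score in scores[start:start + width])
    ]

# Toy alignment: conserved flanks around a 3-column variable region
alignment = [
    "ACGTACGTAAGGCTAGCT",
    "ACGTACGTCCTGCTAGCT",
    "ACGTACGTGTAGCTAGCT",
]

print(conserved_windows(alignment, width=6))
```

Windows reported here would be the places to look for primer-binding sites; the columns between them are where the sequences differ and so carry the discriminatory signal.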
Carl Woese
- American microbiologist and biophysicist (15 July 1928 - 30 December 2012)
- Pioneered phylogenetic taxonomy based on 16S rRNA
- Amplify, then sequence: species identification
Nucleic acid extraction
- The nucleic acid extracted should be representative of all cells present in the sample
- Sufficient amounts (micrograms) of high-quality nucleic acids must be obtained for subsequent library production
- Factors to be considered while choosing the method:
  - Extraction yield
  - Maintenance of nucleic acid integrity
  - Purity of the extracted material
Nucleic acid extraction methods can be
- DIRECT: cells are first lysed within the sample matrix, and the genetic material is then collected and purified
- INDIRECT: cells are first separated and collected from the matrix, then lysed, and the genetic material is purified (yield is low, as separation methods tend to degrade the cells and their genetic material)
Direct nucleic acid extraction methods

Ogram
  Cell lysis method: beads are used to break the cells
  DNA purification/concentration: DNA precipitation by PEG; extraction with phenol-chloroform; concentration with ethidium bromide
Tsai
  Cell lysis method: lysozyme, followed by freezing and thawing
  DNA purification/concentration: phenol-chloroform extraction; DNA precipitation with isopropanol; purification on a gel permeation column
Soft lysis
  Cell lysis method: lysis buffer, lysozyme and proteinase K
  DNA purification/concentration: extraction with chloroform; precipitation with isopropanol; purification by electrophoresis on 3% agarose gel
Harsh lysis
  Cell lysis method: zirconia/silica beads, followed by vortexing
  DNA purification/concentration: extraction with chloroform; precipitation with isopropanol; purification by electrophoresis on 3% agarose gel
Indirect nucleic acid extraction methods

Blending method
  Cell harvesting and lysis: harvesting by successive dilution in a specific mixer buffer; centrifugation at low speed, then high speed; lysis with lysozyme and proteinase K
  DNA purification/concentration: DNA precipitation by PEG; extraction with phenol-chloroform; concentration with ethidium bromide
Jacobsen method
  Cell harvesting and lysis: harvesting with cation exchange resin; lysis with lysozyme and pronase
  DNA purification/concentration: density gradient separation with cesium chloride; purification with ethidium bromide
Metagenomic library construction
- Library construction consists of cloning DNA fragments into specific vectors, inserting these into host cell strains, and then screening for the genes and/or functions of interest
Vectors
- Large-insert libraries can be constructed from DNA fragments of 25-200 Kb
- The vector used depends on the size of the insert:
  - Bacterial artificial chromosome (BAC): 100-200 Kb
  - Yeast artificial chromosome (YAC): typically 100 Kb to 1 Mb or more
  - Fosmids (DNA vectors that use the F-plasmid origin of replication): 25-40 Kb
  - Cosmids (hybrid plasmids containing lambda phage sequence): 25-35 Kb
  - Plasmids: <15 Kb (small-insert library)
Host
- Used for cloning and expression of genes
- Escherichia coli (easily transformable, well-defined genome): can express up to 40% of the genes from a metagenomic library
- Alternative hosts: Bacillus, Pseudomonas, Streptomyces, and archaea (Methanococcus, Pyrococcus, Sulfolobus, Thermococcus)
Host (contd.)
- Broad-host-range vectors can replicate and be expressed in more than one type of host, e.g. VECA (artificial chromosome vectors of E. coli - Streptomyces), used for Actinomycetes genes (high GC content)
- Metagenomic DNA is separated on the basis of G+C content by ultracentrifugation
  - Low G+C DNA is transformed into the E. coli host
  - High G+C DNA is used for the library in Streptomyces
Screening of metagenomic data
- Function based
- Sequence based
Function based screening
- Isolation of DNA from microbial communities to study the functions of the encoded proteins
- Involves cloning DNA fragments, expressing genes in a surrogate host, and screening for enzymatic activities
- Strategies involved:
  - Direct detection of gene products in individual clones, normally using fluorescent catabolic products to evaluate the enzymatic reaction
  - Heterologous complementation of host strains or mutants, in which only clones carrying the complementing gene in the insert can grow (clones grow in selective conditions)
  - Induced gene expression
  - Enzymatic assay
Newer advancements in functional metagenomics
- High-throughput functional screening methods (employ the resolving power of FACS (fluorescence-activated cell sorting) or fluorescence microscopy):
  - SIGEX (substrate-induced gene expression)
  - METREX (metabolite-regulated expression)
  - PIGEX (product-induced gene expression)
- Development of new host systems (currently being explored):
  - Pseudomonas, Rhizobium, Streptomyces, Ralstonia and Bacillus
  - Archaea: Methanococcus, Pyrococcus, Sulfolobus and Thermococcus
Sequence based screening
- Screening by PCR
- Screening by hybridization
- Screening by high-throughput sequencing (HTS)
SCREENING BY PCR
- Allows the identification of the members of a particular environment and their phylogenetic relationships
- Primers are designed to screen for specific characteristics of biotechnological interest, such as enzymes, antibiotics, or resistance genes
- Applied to the screening of metagenomic libraries held as clones
- Clones are screened in pools of 100
Genes commonly screened, e.g.:
- rRNA
- recA (DNA repair/recombination)
- radA (DNA repair)
- nif (nitrogen fixation)
- phenol hydroxylase
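Screening clones in pools rather than one at a time reduces the number of PCR reactions when positives are rare. The sketch below models that in silico under stated assumptions: a reaction "amplifies" when the forward primer and the reverse complement of the reverse primer both occur, in that order, on the given strand of an insert (exact matches, one orientation only). The primer sequences, clone names and inserts are invented.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def amplifies(insert, fwd_primer, rev_primer):
    """True if both primer sites occur in order on the given strand (exact match)."""
    insert = insert.upper()
    start = insert.find(fwd_primer.upper())
    if start == -1:
        return False
    downstream = start + len(fwd_primer)
    return insert.find(reverse_complement(rev_primer), downstream) != -1

def screen_in_pools(clones, fwd_primer, rev_primer, pool_size=100):
    """Test each pool once; re-test clones individually only in positive pools."""
    names = list(clones)
    positives, reactions = [], 0
    for i in range(0, len(names), pool_size):
        pool = names[i:i + pool_size]
        reactions += 1  # one reaction on the pooled DNA
        if any(amplifies(clones[n], fwd_primer, rev_primer) for n in pool):
            for n in pool:
                reactions += 1  # one reaction per clone in a positive pool
                if amplifies(clones[n], fwd_primer, rev_primer):
                    positives.append(n)
    return positives, reactions

# Hypothetical primers and a 250-clone library with a single positive clone
FWD = "ATGGCGTACC"
REV = "GGATCCTTAG"
library = {f"clone_{i}": "A" * 60 for i in range(250)}
library["clone_137"] = "AAAA" + FWD + "CCCC" + reverse_complement(REV) + "AAAA"

hits, reactions = screen_in_pools(library, FWD, REV)
print(hits, reactions)
```

With one positive clone among 250, pooling needs 3 pool reactions plus 100 individual reactions, compared with 250 reactions when every clone is tested separately.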
Screening by hybridisation
- Probes are constructed from homologous sequences present in online databases
- Targets: genes encoding enzymes such as dioxygenases, nitrite reductases, hydrogenases, hydrazine oxidoreductases, chitinases, and glycerol dehydratase; enzymes involved in pollutant compound degradation; genes for different antibiotics or taxonomic groups
- Currently, hybridisation analyses based on microarrays are the most widely applied
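At its core, hybridisation screening asks whether a probe can base-pair with some stretch of the target DNA. A minimal sketch of that check follows, assuming a simple mismatch tolerance in place of real hybridisation thermodynamics; the probe and target sequences are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def hybridises(probe, target, max_mismatches=1):
    """True if the probe can pair with some window of the target strand,
    tolerating up to max_mismatches mispaired positions."""
    site = reverse_complement(probe)  # the sequence the probe pairs with
    target = target.upper()
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(site, window) if a != b)
        if mismatches <= max_mismatches:
            return True
    return False

# Hypothetical probe and target fragments
probe = "GATTACAGGC"
perfect_target = "TTTT" + reverse_complement(probe) + "TTTT"
one_mismatch_target = "TTTTGCCTGTTATCTTTT"
unrelated_target = "T" * 18

print(hybridises(probe, perfect_target))
print(hybridises(probe, one_mismatch_target, max_mismatches=0))
print(hybridises(probe, unrelated_target))
```

A microarray applies the same test with thousands of fixed probes at once; the mismatch threshold here stands in for the stringency of the hybridisation conditions.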
GeoChip: fixed oligonucleotide chip
- 4 generations of GeoChips are available:
  - 1st generation, 1,662 probes: genes for degradation of organic contaminants and metal resistance
  - 2nd generation, 24,243 probes: gene families involved in C, N, and P cycles, sulfate reduction, and metal reduction
  - 3rd generation, 56,990 probes: antibiotic resistance genes, energy processing, and markers such as 16S rRNA and gyrB
  - 4th generation, 83,992 probes: functional genes, including sequences derived from fungi, archaea, bacteria, and viruses
Viral screening (Virochip): developed for detection of different viral families
Human Gut Chip (HuGChip) and Human Intestinal Tract Chip (HITChip)
- HuGChip's probes can identify microorganisms at the genus level
- HITChip can identify them at the species level
Chip for antibiotic resistance gene screening
- 8,746 probes for the 9 major groups of resistance genes
- Aminoglycosides, penicillin beta-lactamases, amphenicols, trimethoprim, macrolide-lincosamide-streptogramin B, sulphonamides, tetracyclines, vancomycin
Screening by high-throughput sequencing
- Next-generation sequencing (NGS) techniques are employed
- Whole metagenomic clones or direct metagenomic DNA are sequenced in full to elucidate the diversity of complex microbial communities
- A particular NGS platform has to be selected on the basis of its features: read length, degree of automation, throughput per run, data quality, ease of data analysis, and cost per run
Newer advancements in sequencing technologies
- Ion Torrent / Ion Proton: based on the principle that protons released during DNA polymerization can be detected to register nucleotide incorporation
- Single-molecule, real-time detection (SMRT): detects incorporation of nucleotides in real time
- Read lengths of > 100 bp
- https://youtu.be/_lD8JyAbwEo?t=67
DNA nanoball sequencing with probe-anchor ligation
- Read length: 35 bp
- https://youtu.be/oKuyso3FCGI?t=32
Assembly
- Assembly is the process of combining sequence reads into contiguous stretches of DNA called contigs, based on sequence similarity between reads
- Two strategies:
  - Reference-based assembly (co-assembly): depends on the availability of closely related reference genomes
  - De novo assembly: no reference sequence available
- Assembly software: AMOS, MIRA, MetaVelvet and Meta-IDBA
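Combining reads into contigs on the basis of sequence similarity can be illustrated with a greedy suffix-prefix merge. This is a teaching sketch under simplifying assumptions (error-free reads, one strand, exact overlaps) and is not meant to describe how the assemblers listed above work internally; the reads are cut from an invented sequence.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)

# Three overlapping reads cut from a hypothetical 25-base sequence
source = "ATGGCGTACGTTAGCCGATAGGCTA"
reads = [source[0:10], source[6:16], source[12:25]]

print(greedy_assemble(reads))
```

In a real metagenome, repeats and closely related strains make the longest overlap ambiguous, which is one reason dedicated metagenome assemblers exist.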
Binning
- The process of sorting DNA sequences into groups that might represent an individual genome or genomes from closely related organisms
- Two methods:
  - Compositional binning: makes use of the fact that genomes have a conserved nucleotide composition (GC content)
  - Similarity-based binning: an unknown DNA fragment might encode a gene, and the similarity of this gene to known genes in a reference database can be used to classify the fragment
- Binning software: PhyloPythia, S-GSOM, PCAHIER, TACOA, MG-RAST, MEGAN, CARMA
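Compositional binning can be sketched with GC content alone: contigs are sorted by GC fraction and a new bin is started wherever consecutive values differ by more than a chosen gap. The gap of 0.10 and the contig sequences are assumptions for illustration; practical tools use richer composition features than a single GC value.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def bin_by_gc(contigs, max_gap=0.10):
    """Group contig names into bins, splitting where sorted GC values jump by more than max_gap."""
    ordered = sorted(contigs, key=lambda name: gc_fraction(contigs[name]))
    bins = []
    previous_gc = None
    for name in ordered:
        gc = gc_fraction(contigs[name])
        if previous_gc is None or gc - previous_gc > max_gap:
            bins.append([])  # start a new bin at a large jump in GC content
        bins[-1].append(name)
        previous_gc = gc
    return bins

# Hypothetical contigs from a low-GC and a high-GC organism
contigs = {
    "contig_1": "ATATATTAGCATATTA",
    "contig_2": "TTAATAGCTTAAATAT",
    "contig_3": "GCGCCGGGCATGCCGC",
    "contig_4": "CCGGGCGCTAGCGGCC",
}

print(bin_by_gc(contigs))
```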
Gene annotation/gene calling
- The procedure of identifying protein and RNA sequences coded on the sample DNA
- Can be done on post-assembly contigs, on reads from the unassembled metagenome, and on a mixture of contigs and individual unassembled reads
- Two approaches:
  - Evidence based: uses homology searches to identify genes similar to those observed previously
  - Ab initio: relies on intrinsic features of the DNA sequence to discriminate between coding and noncoding regions, allowing the identification of genes without homologs
- Bioinformatic tools available: FragGeneScan, MetaGeneMark, MetaGeneAnnotator, CAMERA
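The simplest intrinsic feature an ab initio approach can use is an open reading frame: a start codon followed in the same frame by a stop codon. The sketch below scans the three forward frames only and uses an arbitrary minimum length; it is a deliberately simplified stand-in for what dedicated gene callers do, and the test sequence is invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=5):
    """Return (start, end, frame) for forward-strand ORFs.
    start is the index of ATG; end is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs

# Hypothetical fragment: 2 flanking bases, ATG, five GCT codons, TAA, 2 flanking bases
fragment = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"

for start, end, frame in find_orfs(fragment):
    print(start, end, frame, fragment[start:end])
```

Short metagenomic reads often contain only part of a gene, so an ORF may lack a start or stop codon within the read; this sketch ignores such partial genes.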
Data analysis and storage
- Statistical analysis tools are used for data analysis
- These include MG-RAST, MetAMOS, MEGAN 4, IMG/M, CAMERA and Galaxy
- These bioinformatic tools also provide a platform for data storage, centralized sharing, and comparative and functional analysis of metagenomic data
APPLICATIONS OF METAGENOMIC STUDIES
Metagenomics in human health
- Human Microbiome Project (2008)
- Viromes and human health
- Microbial pathogenesis (antimicrobial resistance and bacterial vaccines)
Human Microbiome Project (2008)
- Aims to characterise the human microbiome comprehensively and analyse its role in human health and disease
- Characterisation of microbial communities at different body sites: oral cavity, gut, skin and vagina
- Highlighted the role of microbiome dysbiosis in disease development
Two phases of the project
- Phase 1 (2007-2014): development of a reference set of microbial genome sequences and study of the relation between disease and changes in the human microbiome
- Phase 2 (2014-2016): Integrative Human Microbiome Project (iHMP)
  - Aims to create a complete characterization of the human microbiome, with a focus on understanding the presence of microbiota in health and disease states
  - Disease states in focus: type 2 diabetes, inflammatory bowel disease and preterm birth
Viromes and human health
- Virome studies are based on metagenomic research
- Zoonotic reservoirs are screened to predict and prevent viral pathogen outbreaks
  - Bat viromes: Ebola virus, SARS and MERS coronaviruses, polyomaviruses, hantavirus, picornavirus, papillomavirus, influenza A virus
  - Mosquito: orthobunyaviruses (Murrumbidgee and Salt Ash viruses) and rhabdoviruses (Beaumont and North Creek viruses) identified
- Epidemiological studies: metagenomics established the transmission of a bornavirus (VSBV-1) from variegated squirrels to humans, leading to fatal encephalitis
Microbial pathogenesis
- Rapid identification of patterns of drug resistance in slow-growing bacteria (Mycobacteria)
- Source tracking and foodborne outbreak control: Shiga toxin-producing E. coli, Salmonella Enteritidis, Salmonella Typhimurium
- Soil metagenomics: identification of novel antimicrobial compounds and antimicrobial resistance genes
- Bacterial vaccines as immunotherapy (Salmonella T3SS: tumor regression in colon cancer)
Applications in other fields
- Biofuel: targeted screening of enzymes with industrial applications in biofuel production, such as glycoside hydrolases acting on biomass
- Environmental remediation: improved strategies for monitoring the impact of pollutants on ecosystems and for cleaning up contaminated environments
- Agriculture: improved disease detection in crops and livestock and the adoption of enhanced farming practices
- Ecology: insights into the functional ecology of environmental communities
Limitations
- Enormous amount of data
- Most genes are not identifiable
- Contamination and chimeric clone sequences
- Extraction problems
- Requires proteomics or expression studies to demonstrate phenotypic characteristics
- Needs a standard method for annotating genomes
- Can only progress as library technology, including sequencing technology, progresses
- Requires high-throughput instrumentation not readily available to most institutions