DNA sequencing presentation covering Sanger sequencing, Maxam-Gilbert sequencing, Illumina sequencing, and single-molecule sequencing.
Size: 9.42 MB
Language: en
Added: May 05, 2024
Slides: 33 pages
Slide Content
DNA SEQUENCING TECHNIQUES
Submitted By: Manal Khan
Class: Masters of Science in Biotechnology, Final Year
Submitted To: Dr. Preeti Mishra, Department of Botany
CONTENT
Introduction and History
The Past - Sanger Sequencing and Maxam-Gilbert Sequencing
The Present - Illumina Sequencing
The Future - Single Molecule Sequencing
Applications
INTRODUCTION
DNA sequencing is the biochemical method for determining the order of the nucleotide bases (adenine, guanine, cytosine, and thymine) that make up DNA. The first DNA sequences were obtained by academic researchers in the early 1970s, using laborious methods based on two-dimensional chromatography. With the development of dye-based sequencing methods with automated analysis, DNA sequencing has become easier and faster. The DNA sequence is the blueprint that contains the instructions for building an organism, and no understanding of genetic function or evolution could be complete without obtaining this information.
Figure 1: DNA Backbone
Steps
1. Nucleic Acid Extraction and Isolation
2. Library Preparation
3. Clonal Amplification and Sequencing
4. Data Analysis Using Bioinformatics
Figure 2: Steps involved in Sequencing
Significance
Information obtained by DNA sequencing makes it possible to understand or alter the function of genes.
DNA sequence analysis reveals regulatory regions that control gene expression and genetic "hot spots" that are particularly susceptible to mutation.
Scientists can use sequence information to determine which stretches of DNA contain genes and which stretches carry regulatory instructions that turn genes on or off. Importantly, sequence data can also highlight changes in a gene that may cause disease.
DNA sequencing shows gene structure, which helps researchers work out the structure of gene products.
DNA sequencing has become sufficiently fast and inexpensive to allow laboratory determination of microbial sequences for the identification of microbes. Sequencing of the 16S ribosomal RNA gene can be used to identify specific bacteria. Sequencing of viruses can be used to identify a virus and distinguish between strains.
History of Sequencing Technology
The Past - Sanger Sequencing
In 1953, Watson and Crick described the structure of DNA based on the X-ray crystallography analyses of Franklin and Wilkins.
In DNA, genetic information is contained in the order of the bases (called the sequence).
The DNA structure provides the basis for replication and transcription by using a single strand as the template (through base pairing).
In 1957, Kornberg discovered DNA polymerase as the enzyme for DNA replication.
In the early 1970s, Berg, Boyer, and Cohen developed molecular cloning. Molecular cloning allows isolation, amplification, and manipulation of specific DNA fragments.
In 1977, Sanger developed chain-termination sequencing (Sanger sequencing). Gilbert and Maxam developed sequencing by base-specific chemical fragmentation.
Figure 3(a): Walter Gilbert (left side) and Frederick Sanger (right side). Figure 3(b): Allan Maxam
Frederick Sanger, and separately Allan Maxam and Walter Gilbert, developed two methods for determining the exact base sequence of a cloned piece of DNA. These methods made it possible to sequence all 3 billion base pairs of the human genome.
In 1980, Gilbert and Sanger shared half of the Nobel Prize in Chemistry "for their contributions concerning the determination of base sequences in nucleic acids".
In 1995, the first bacterial genome (Haemophilus influenzae) was sequenced.
In 2001, the first draft of the human genome was published.
The Sanger Chain Termination Sequencing Method
The method uses ddNTPs, primers, a single-stranded DNA template, DNA polymerase, and dNTPs.
ddNTPs are essentially the same as normal nucleotides except that they carry a hydrogen atom on the 3' carbon instead of a hydroxyl group (OH).
Once a ddNTP is incorporated, the strand cannot be extended any further, so ddNTPs are added in small amounts alongside an excess of normal dNTPs. Some strands will terminate early, some later.
The procedure runs as follows:
A sample containing copies of ssDNA is mixed with primers, which bind to the primer-annealing site.
Either the primers or the dNTPs are radiolabeled with 32P.
The dNTPs and DNA polymerase are then added and incubated, allowing DNA polymerase to extend the DNA strand.
Four tubes each contain a different ddNTP: ddATP, ddGTP, ddCTP, or ddTTP.
Figure 4: ddNTP and dNTP.
Chain Termination by ddNTP incorporation
Figure 5: ddATP is incorporated and the chain terminates.
For Sanger sequencing, the ratio of dNTP:ddNTP ≈ 100:1.
→ A mixture of terminated strands is produced.
→ The last incorporated base of each strand can be identified by gel electrophoresis with radioactive or fluorescent detection.
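The chain-termination idea can be illustrated with a small simulation. This is only a toy sketch under stated assumptions: the template sequence, the number of strands per tube, and the function names are invented for illustration, and the 100:1 ratio is modelled as a 1-in-101 chance of termination each time the matching base is added.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_fragments(template, dd_base, n_strands=5000, dd_fraction=1 / 101, seed=0):
    """Simulate one ddNTP tube and return the set of terminated fragment lengths.

    The template is written 3'->5', so the new strand grows 5'->3' left to right.
    """
    rng = random.Random(seed)
    lengths = set()
    for _ in range(n_strands):
        for position, template_base in enumerate(template, start=1):
            new_base = COMPLEMENT[template_base]
            # A ddNTP is picked instead of the dNTP about once per 101 additions.
            if new_base == dd_base and rng.random() < dd_fraction:
                lengths.add(position)  # the strand stops here
                break
    return lengths

def read_gel(tubes):
    """Read the new strand 5'->3': the shortest fragments run furthest on the gel."""
    bands = sorted((length, base) for base, lengths in tubes.items() for length in lengths)
    return "".join(base for _, base in bands)

template = "TACGGATCCGTA"  # hypothetical template, written 3'->5'
tubes = {base: sanger_fragments(template, base, seed=i) for i, base in enumerate("ACGT")}
synthesized = read_gel(tubes)
print(synthesized)
```

Each tube yields fragments ending only at its own base, and sorting all bands by length recovers the synthesized strand, which is the complement of the template.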
Figure 6: Sanger Sequencing 1.0 (1977-1980)
Figure 7: Sanger Sequencing 2.0 (1990-2000). Figure 8: ABI 3730xl DNA Analyzer (Capillary Sequencer): 96 DNA samples with ~700-nucleotide reads (~70,000 bases) in 2.5 hours.
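The throughput figure in the caption can be checked with a little arithmetic. The sketch below uses only the numbers quoted on the slides (96 samples, ~700 nucleotides, 2.5 hours, 3 billion base pairs). The final estimate assumes a single instrument running back to back at one-fold coverage, which is an illustrative assumption and not a claim about how genomes were actually sequenced.

```python
samples_per_run = 96
read_length_nt = 700
run_hours = 2.5

bases_per_run = samples_per_run * read_length_nt  # 67,200, i.e. roughly 70,000
bases_per_hour = bases_per_run / run_hours        # 26,880

# Assumption: one instrument, one-fold coverage of a 3-billion-bp genome.
human_genome_bp = 3_000_000_000
runs_needed = human_genome_bp / bases_per_run
years_needed = runs_needed * run_hours / (24 * 365)

print(bases_per_run, bases_per_hour, round(runs_needed), round(years_needed, 1))
```

Under these assumptions a single capillary sequencer would need on the order of a decade for one pass over a human genome, which helps explain the move to massively parallel methods.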
Figure 9: Results obtained from Sanger Sequencing
Maxam and Gilbert Method of DNA Sequencing
This method was developed by Allan Maxam and Walter Gilbert in 1976-1977.
Maxam-Gilbert sequencing is a method for chemically sequencing DNA. It used to be popular but has been superseded by Sanger sequencing and next-generation sequencing methods because it is slow, is low-throughput, and uses dangerous chemicals.
However, it still has niche uses. For example, it can detect DNA modifications such as methylation and acetylation and can be used with DNA footprinting to deduce protein-binding sequences.
The Maxam-Gilbert technique depends on the relative chemical lability of different nucleotide bonds, whereas the Sanger method interrupts elongation of DNA sequences by incorporating dideoxynucleotides into the sequences.
Steps
The chemical reactions in the Maxam-Gilbert method involve a two-step chemical degradation process using piperidine and two chemicals that selectively attack purines and pyrimidines.
Purines react with dimethyl sulfate (DMS), and pyrimidines react with hydrazine. These reactions break the glycosidic bond between the deoxyribose sugar and the base and displace the base.
Piperidine then catalyzes phosphodiester bond cleavage where the base has been displaced.
DMS alone selectively cleaves at guanine nucleotides, while DMS along with formic acid cleaves at both guanine and adenine nucleotides.
Similarly, hydrazine alone cleaves at both thymine and cytosine nucleotides, whereas hydrazine with 1.5 M NaCl selectively cleaves at cytosine nucleotides.
Figure 10: Chemical target cleaving Bases.
Figure 11: Chemical targets in the Maxam-Gilbert DNA sequencing strategy. Dimethyl sulfate or hydrazine will attack the purine or pyrimidine rings respectively, and piperidine will cleave the phosphate bond at the 3' carbon.
Thus, a series of labelled fragments is generated, running from the radiolabeled end to the first "cut" site in each molecule.
The fragments from the four reactions are run side by side in gel electrophoresis for size-based separation. The reactions are loaded on high-percentage polyacrylamide gels and the bands are resolved by electrophoresis.
The smallest DNA fragments migrate fastest and appear at the bottom of the gel, and larger ones sit towards the top. In each of the four lanes, there are bands of radiolabeled DNA strands ending at a specific base.
The gel is exposed to X-ray film for autoradiography to visualise the fragments. Wherever a labelled fragment stopped on the gel, its radioactive tag exposes the film.
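The logic of reading the four lanes can be sketched in code. This is a simplified illustration, not a model of the chemistry: bands are indexed by the position of the cleaved base, the lane names (G, A+G, C+T, C) follow the four reactions described above, and the example sequence and function names are hypothetical.

```python
def cleave_lanes(sequence):
    """Return, for each reaction lane, the positions at which cleavage occurs."""
    reactions = {"G": "G", "A+G": "AG", "C+T": "CT", "C": "C"}
    return {
        lane: {pos for pos, base in enumerate(sequence, start=1) if base in targets}
        for lane, targets in reactions.items()
    }

def read_autoradiogram(lanes):
    """Deduce the sequence from which lanes show a band at each position."""
    length = max(max(positions) for positions in lanes.values() if positions)
    calls = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            calls.append("G")   # band in both G and A+G lanes
        elif pos in lanes["A+G"]:
            calls.append("A")   # band in A+G lane only
        elif pos in lanes["C"]:
            calls.append("C")   # band in both C and C+T lanes
        elif pos in lanes["C+T"]:
            calls.append("T")   # band in C+T lane only
        else:
            calls.append("N")   # no band: base cannot be called
    return "".join(calls)

lanes = cleave_lanes("GATTACAGC")  # hypothetical end-labelled fragment
decoded = read_autoradiogram(lanes)
print(decoded)
```

A base is called A or T by elimination: a band in the A+G lane with no partner in the G lane must be an A, and likewise for T in the C+T lane.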
The Present - Illumina Sequencing (Next Generation Sequencing)
The Illumina sequencing technology was developed by British scientists Shankar Balasubramanian and David Klenerman in 1998.
The sequencing method is based on sequencing by synthesis and reversible dye-terminators that enable the identification of single nucleotides as they are washed over DNA strands.
Massively parallel sequencing and clever optics make Illumina sequencing the most cost-effective next-generation sequencing technology for most applications.
Illumina sequencing has made it possible to reach the $1000-per-human-genome milestone with its latest NovaSeq instruments.
Recent improvements in the sequencing chemistry enable unbiased and uniform coverage across difficult-to-sequence regions such as GC-rich or repetitive regions.
Figure 12: Illumina NovaSeq Series
Steps
Step A: Library preparation
Genomic DNA is sheared by ultrasonic fragmentation into fragments 200-500 bp in length.
5' and 3' adapters are added to the two ends of these small fragments. Alternatively, "tagmentation" combines the fragmentation and ligation reactions into a single step, which greatly increases the efficiency of the library preparation process.
Adapter-ligated fragments are then PCR amplified and gel purified.
The sequencing library is now constructed.
Figure 13: Library preparation
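The library preparation steps above can be mimicked with a toy sketch. Everything here is an assumption made for illustration: fragment sizes are drawn uniformly (real shearing is not uniform), the adapter strings are placeholders and not real Illumina adapter sequences, and the PCR and gel purification steps are reduced to a simple size filter.

```python
import random

ADAPTER_5 = "AAAAAAAA"  # placeholder adapter, not a real sequence
ADAPTER_3 = "CCCCCCCC"  # placeholder adapter, not a real sequence

def prepare_library(genome, min_len=200, max_len=500, seed=0):
    """Cut the genome into fragments, size-select them, and add adapters."""
    rng = random.Random(seed)
    library = []
    start = 0
    while start < len(genome):
        size = rng.randint(min_len, max_len)
        insert = genome[start:start + size]
        start += size
        # Size selection: drop leftovers shorter than the target range.
        if min_len <= len(insert) <= max_len:
            library.append(ADAPTER_5 + insert + ADAPTER_3)
    return library

# Hypothetical 5 kb "genome" of random bases.
genome = "".join(random.Random(1).choice("ACGT") for _ in range(5000))
library = prepare_library(genome)
print(len(library), "library molecules")
```

Every molecule in the resulting list has the same known sequence at each end, which is what later lets it bind the flow cell and prime sequencing.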
Step B: Cluster generation
The flow cell is a channel that adsorbs mobile DNA fragments, and it is also the core sequencing reaction vessel: all the sequencing happens here.
As the sequencing library passes through the flow cell, the DNA fragments attach randomly to the lanes on its surface.
Each flow cell has 8 lanes, and each lane has many adapter oligonucleotides attached to its surface. These match the adapters added to the ends of the DNA fragments during library construction, which is why the flow cell can capture the library fragments and support bridge PCR amplification of the DNA on its surface.
Bridge PCR is performed using the adapter oligonucleotides on the flow cell surface as primers. After repeated cycles of amplification and denaturation, each DNA fragment ends up as a cluster at its own location, with each cluster containing many copies of a single DNA template.
The purpose of this process is to amplify the signal intensity of each base to meet the signal requirements for sequencing.
When cluster generation is complete, the templates are ready for sequencing.
Figure 14: Cluster generation
Step C: Sequencing
The sequencing method is based on sequencing-by-synthesis (SBS).
DNA polymerase, adapter primers, and four dNTPs carrying base-specific fluorescent labels (reversible terminator nucleotides) are added to the reaction system.
The 3′-OH of these dNTPs is chemically blocked, which ensures that only one base is added at a time during the sequencing process.
All unused free dNTPs and DNA polymerase are washed away after the synthesis reaction finishes.
Then the buffer needed for fluorescence excitation is added, the fluorescence is excited by a laser, and the signal is recorded by the optical equipment.
Finally, the optical signal is converted into a base call by computer analysis.
Once the fluorescence signal has been recorded, a chemical reagent is added to quench the fluorescence signal and remove the 3′-OH protective group, so that the next round of the sequencing reaction can be performed.
Figure 15(a): Reversible terminator nucleotide (b): Sequencing
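The cycle described above (incorporate one blocked base, image, unblock) can be written as a short loop. This is a minimal sketch with invented intensity values and a hypothetical template. Real base calling works on images of millions of clusters and has to handle noise, which is ignored here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sequencing_by_synthesis(template, cycles):
    """Run a fixed number of SBS cycles on one cluster and return the read."""
    read = []
    for cycle in range(min(cycles, len(template))):
        # 1. Incorporation: the 3' block means exactly one base is added.
        incorporated = COMPLEMENT[template[cycle]]
        # 2. Imaging: the channel of the incorporated base is brightest
        #    (the intensity values are made up for illustration).
        intensities = {base: (1.0 if base == incorporated else 0.05) for base in "ACGT"}
        # 3. Base calling: pick the brightest channel.
        read.append(max(intensities, key=intensities.get))
        # 4. Cleavage: dye and 3' block are removed, so the loop can continue.
    return "".join(read)

read = sequencing_by_synthesis("TACGGA", cycles=4)  # hypothetical template
print(read)
```

The read length equals the number of cycles run, which is why Illumina read lengths are set by the run configuration and not by the fragment.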
Step D: Alignment & Data analysis
After sequencing is finished, the reads are aligned and analyzed.
Images are transformed into base calls and reads; the DNA sequence is analyzed base by base.
Reads are grouped based on their index sequences.
Reads with similar sequences are clustered.
Forward and reverse reads are paired to form contiguous sequences.
The reads are aligned to a reference sequence to look for matches or changes in the sequenced DNA, and sequencing differences are identified.
Illumina sequencing is a highly accurate method.
Figure 16: Data Analysis
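Two of the analysis steps above, grouping reads by index and comparing them to a reference, can be sketched as follows. This is a deliberately naive illustration: the reads, index names, and reference are invented, and the ungapped brute-force placement stands in for the much more sophisticated aligners used in practice.

```python
from collections import defaultdict

def demultiplex(reads):
    """Group (index, sequence) pairs by their index sequence."""
    groups = defaultdict(list)
    for index, sequence in reads:
        groups[index].append(sequence)
    return dict(groups)

def best_alignment(read, reference):
    """Find the ungapped placement with the fewest mismatches.

    Returns (offset, mismatches), where each mismatch is
    (reference_position, reference_base, read_base).
    """
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = [
            (offset + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        if best is None or len(mismatches) < len(best[1]):
            best = (offset, mismatches)
    return best

reference = "ACGTTGCAAGGCTA"  # hypothetical reference
reads = [("IDX1", "TGCTAGG"), ("IDX2", "ACGTTGC"), ("IDX1", "GCAAGGC")]

samples = demultiplex(reads)
variant_call = best_alignment("TGCTAGG", reference)
print(samples)
print(variant_call)
```

The first read places at offset 4 with a single difference from the reference, which is the kind of change a variant caller would then evaluate across many overlapping reads.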
The Future: Single Molecule Sequencing (Third Generation Sequencing)
Single-molecule sequencing refers to techniques that can read the base sequence directly from individual strands of DNA or RNA present in a sample of interest.
PacBio SMRT (single molecule real time) sequencing is one of the most commonly used third-generation sequencing technologies.
Compared with the previous two generations, PacBio long-read sequencing enabled by SMRT Sequencing technology requires no PCR amplification, and its reads can be up to about 100 times longer than those of NGS.
The main advantage of SMRT sequencing is the generation of long sequencing reads of high accuracy, which improves the assembly of whole genomes. This is because longer sequencing reads mean fewer pieces have to be joined together to assemble the genome.
Figure 17: PacBio RSII
Zero-mode waveguides (ZMWs), subwavelength optical nanostructures fabricated in a thin metallic film, are powerful analytical tools that can confine an excitation volume to the range of attoliters. This allows individual molecules to be isolated for optical analysis at physiologically relevant concentrations of fluorescently labeled biomolecules.
Arrays of such nanostructures can also be engineered into systems for real-time analysis of large numbers of single-molecule reactions or binding events, which is the principle of PacBio SMRT sequencing.
PacBio SMRT Sequencing uses ZMWs to distinguish the desired fluorescent signal from the strong fluorescent background caused by unincorporated free-floating nucleotides.
A complex of DNA polymerase bound to the template DNA strand is anchored to the bottom glass surface of a ZMW.
Laser light enters through the bottom surface of a ZMW but does not completely penetrate it, since the ZMW dimensions are smaller than the wavelength of the light. This allows selective excitation and detection of light emitted from nucleotides being incorporated during base elongation.
Figure 18: A single SMRT Cell. Each SMRT Cell contains 150,000 ZMWs. Approximately 35,000-75,000 of these wells produce a read in a run lasting 0.5-4 h, resulting in 0.5-1 Gb of sequence.
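The numbers in the Figure 18 caption imply a rough productive-well fraction and average read length. The sketch below only does that arithmetic. Pairing the low end of one range with the low end of the other is an assumption made for illustration, since the caption does not say how the ranges correspond.

```python
zmws_per_cell = 150_000
productive_wells = (35_000, 75_000)            # wells producing a read
yield_bases = (0.5e9, 1.0e9)                   # 0.5-1 Gb per run

# Fraction of wells that yield a read.
productive_fraction = tuple(n / zmws_per_cell for n in productive_wells)

# Implied mean read length, pairing low with low and high with high (assumption).
mean_read_length = tuple(
    round(bases / wells) for bases, wells in zip(yield_bases, productive_wells)
)

print(productive_fraction, mean_read_length)
```

Under this pairing, roughly a quarter to a half of the wells are productive and the implied average read is on the order of 13-14 kb, far longer than the short reads discussed for the earlier platforms.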
Steps
A. Library Construction
Figure 19: Template Preparation Workflow for PacBio RS II system. The template, called a SMRTbell, is a closed single-stranded circular DNA, which is created by ligating hairpin adapters to both ends of target double-stranded DNA (dsDNA) molecules.
B. Sequencing
Step 1: Fluorescent phospholinked labeled nucleotides (indicated in red, yellow, green, and blue for G, C, T, and A, respectively) are introduced into the ZMW.
Step 2: The base being incorporated is held in the detection volume for tens of milliseconds, producing a bright flash of light.
Step 3: The phosphate chain is cleaved, releasing the attached dye molecule.
Step 4-5: The process repeats.
Figure 20: Sequencing via light pulses.
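Turning light pulses into bases can be sketched as a lookup plus a duration filter. The colour-to-base mapping is the one given in Step 1. The pulse list, the 10 ms threshold, and the function name are hypothetical, chosen only to reflect the statement that an incorporated base is held for tens of milliseconds.

```python
# Colour-to-base mapping as stated in Step 1.
DYE_TO_BASE = {"red": "G", "yellow": "C", "green": "T", "blue": "A"}

def call_pulses(pulses, min_duration_ms=10.0):
    """Convert (colour, duration_ms) pulses into the incorporated bases.

    Pulses shorter than the threshold are treated as nucleotides that
    diffused through the detection volume without being incorporated.
    """
    return "".join(
        DYE_TO_BASE[colour]
        for colour, duration_ms in pulses
        if duration_ms >= min_duration_ms
    )

pulses = [("red", 35.0), ("blue", 2.0), ("yellow", 40.0), ("green", 28.0), ("blue", 51.0)]
called = call_pulses(pulses)
print(called)
```

The short blue pulse is discarded as background, so only the four sustained flashes contribute to the read.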
Nanopore Sequencing
Nanopore sequencing makes it possible to sequence DNA or RNA directly from biological samples in real time.
Approximately a quarter of all SARS-CoV-2 virus genomes sequenced worldwide to date have been sequenced on a nanopore device.
One of the latest generations of sequencing technologies, the technique determines the order of nucleotides in DNA or RNA by measuring fluctuations in an electric current as the molecule passes through a nanopore.
The nanopore, a tiny hole about one billionth of a meter in diameter, is embedded in a membrane that separates two chambers containing electrolyte solutions.
When a small voltage is applied, an ionic current flows through the nanopore and an enzyme steadily ratchets the molecule through it.
Specialised software works out the sequence from the tiny changes in electrical current that occur as short stretches of nucleotides block the flow of ions.
DNA and RNA both contain A, G, and C; DNA contains T where RNA contains U.
Figure 21: MinION Device for Nanopore Sequencing
Figure 22(a): Diagram of Nanopore. Diagram labels: Unzips the DNA helix into two strands. Adapter molecule keeps DNA bases in place long enough to be identified electronically. Ion flow creates electric current through nanopore. Another protein creates a pore in the membrane and holds the adapter molecule.
Figure 22(b): WORKING: It shows a double-stranded piece of DNA being unzipped and a single strand passing through a nanopore sensor. The pore produces an electrical signal that shows how much of the current running through the pore is blocked by individual nucleotides (the building blocks of the nucleic acids DNA and RNA). Specialised software is used to decode the signal to read the sequence.
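The decoding step can be illustrated with a toy model in which each short stretch of bases in the pore produces a characteristic current level. All values here are invented: the 2-base window, the current table, and the noise offset are assumptions for illustration, and production basecallers use trained statistical or neural network models, not a nearest-level lookup.

```python
from itertools import product

K = 2  # number of bases assumed to influence the current at once (toy value)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
# Invented current levels in picoamps, one per k-mer.
LEVEL_PA = {kmer: 60.0 + 4.0 * i for i, kmer in enumerate(KMERS)}

def squiggle(sequence):
    """Ideal current trace as the strand ratchets through one base at a time."""
    return [LEVEL_PA[sequence[i:i + K]] for i in range(len(sequence) - K + 1)]

def basecall(levels):
    """Assign each level to the nearest k-mer, then stitch the overlapping k-mers."""
    kmers = [min(LEVEL_PA, key=lambda k: abs(LEVEL_PA[k] - level)) for level in levels]
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])

noisy_trace = [level + 1.5 for level in squiggle("GATTACA")]  # small fixed offset as noise
decoded_strand = basecall(noisy_trace)
print(decoded_strand)
```

Because consecutive windows overlap, each new current level contributes one new base to the decoded strand.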
Applications
Newborn and Pediatric Disease - Newborn intensive care units and children's hospitals see many patients with severe, sometimes fatal diseases that have a genetic basis. Some of these are known genetic disorders, correctly diagnosed and confirmed by clinical genetic testing. A considerable number, however, resemble known diseases but affect patients with negative genetic test results. Numerous pilot programs, like the NIH's Undiagnosed Diseases Network, are applying exome sequencing to cases like these. On average, exome sequencing uncovers a pathogenic mutation in 25-30% of cases. Examples: cystic fibrosis, DiGeorge syndrome, Fragile X syndrome, Marfan syndrome, and Prader-Willi syndrome.
Drug Trials and Pharmacogenomics - One of the great promises of genomic research is personalized medicine: tailoring disease treatments to an individual's genetic makeup. Getting there will require studying the genetic variation underlying disease prognosis and pharmaceutical response. Many such pharmacogenomics projects are under way, though most are employing SNP (single nucleotide polymorphism) arrays or targeted sequencing.
Rare Tumor Types - Large-scale cancer sequencing efforts such as TCGA and ICGC have catalogued somatic mutations in a variety of common cancer types. Most of these projects had both a Sanger sequencing and a next-generation sequencing component. These studies have been incredibly useful for identifying recurrently mutated genes and pathways.
Detection of SARS-CoV-2 (COVID-19) Variants - The spread of significant SARS-CoV-2 variants was discovered by molecular surveillance of the novel coronavirus using novel sequencing technologies. Sanger sequencing has proven helpful in identifying significant SARS-CoV-2 variants. After a two-step RT-PCR test yielded product in samples positive for SARS-CoV-2, a sequencing protocol was developed to cover a region of base pairs. Consensus sequences were constructed and mutations identified (COVID-19 variants Omicron, Zeta, Gamma, and Delta). Every sequence obtained was aligned with a representative sequence of each variant.
DNA Sequencing in Food Microbiology - Whole-genome DNA sequencing is a rapidly developing technology in the field of food microbiology. DNA sequencing can quickly identify all organisms in a sample in one assay and can potentially give a rough estimate of the relative quantity of the organisms present. Uses include identifying and diagnosing pathogens, epidemiological investigation and tracing, rapid identification of pathogen characteristics, and analyzing and predicting disease prevalence.
Agriculture - Endogenous genes can be edited or foreign genes can be introduced to redirect metabolic networks or establish new pathways in plants. CRISPR-Cas (clustered regularly interspaced short palindromic repeats)-mediated multiplex gene editing and regulation could be used for synthetic biology. Photosynthesis systems in plants are inefficient, but CRISPR-mediated DNA insertion could increase efficiency by introducing components to bypass photorespiration or by redesigning Rubisco.
Genotyping of HIV-1 to detect drug resistance - The Applied Biosystems HIV-1 Genotyping Kit harnesses gold-standard Sanger sequencing technology to amplify and reliably sequence the diverse and rapidly evolving HIV-1 virus. The kit enables reliable genotyping of the genetically diverse HIV-1 virus from plasma and dried blood spot (DBS) samples, to detect resistance to protease inhibitors, nucleoside reverse-transcriptase inhibitors, and non-nucleoside reverse-transcriptase inhibitors.
De Novo Sequencing - For de novo sequencing using capillary electrophoresis, the target DNA is fragmented and cloned into a viral or plasmid vector. Cloning provides amplification of the target DNA (by bacterial growth) and allows sequencing primers to bind to known sequence in the vector and extend the sequence into the unknown target DNA.
References
1. Sanger F. 1980. Frederick Sanger - Biographical. (URL http://www.nobelprize.org/nobel_prizes/chemistry/laureates/1980/sanger-bio.html)
2. Watson J., Crick F. Molecular structure of nucleic acids. Nature. 1953;171:737–738. (URL http://www.nature.com/physics/looking-back/crick/)
3. Hutchison C. A. DNA sequencing: bench to bedside and beyond. Nucleic Acids Res. 2007;35:6227–6237.
4. Maxam A. M., Gilbert W. A new method for sequencing DNA. Proc Natl Acad Sci USA. 1977;74(2):560–564. https://doi.org/10.1073/pnas.74.2.560
5. Benner S, Chen RJ, Wilson NA, Abu-Shumays R, Hurt N, Lieberman KR, … Akeson M. Sequence-specific detection of individual DNA polymerase complexes in real time using a nanopore. Nat Nanotechnol. 2007;2(11):718–724. doi: 10.1038/nnano.2007.344.
6. Ardui S., Ameur A., Vermeesch J. R., Hestand M. S. Single molecule real-time (SMRT) sequencing comes of age: Applications and utilities for medical diagnostics. Nucleic Acids Res. 2018;46(5):2159–2168. https://doi.org/10.1093/nar/gky066