Gene Sequencing
Darshan Maheshbhai Patel, 1st sem M. Pharm
Dept. of Pharmacology, Anand Pharmacy College
Guide: Anjali Patel
What is a gene & DNA?
DNA is the molecule that carries the hereditary material in all living cells. Genes are made of DNA. A typical gene contains enough DNA to code for one protein (some genes code for functional RNA molecules instead), and a genome is the sum total of an organism's DNA.
What is the function of a gene?
DNA is pivotal to our growth, reproduction, and health. A gene is the basic physical and functional unit of heredity. Genes carry the instructions for building the proteins that the cell needs to perform all of its functions.
Why do we want to know the sequence of an entire genome?
To identify all the genes, then the proteins they encode, then the pathways those proteins act in. This helps us understand:
the biochemistry of the organism
genetic diseases
gene regulation
History of DNA sequencing
1953: Discovery of the structure of the DNA double helix.
1972: Development of recombinant DNA technology.
1977: The first complete DNA genome, that of bacteriophage φX174, is sequenced, and Frederick Sanger publishes "DNA sequencing with chain-terminating inhibitors".
1984: Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus (about 170 kb).
1987: Applied Biosystems markets the first automated sequencing machine, the ABI 370.
1990: The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on M. capricolum and E. coli.
1995: Craig Venter, Hamilton Smith and colleagues publish the first complete genome of a bacterium, H. influenzae, using whole-genome shotgun sequencing.
1996: Pål Nyrén and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm publish their method of pyrosequencing.
1998: Phil Green and Brent Ewing of the University of Washington publish "phred" for analysis of sequencer data.
2001: A draft sequence of the human genome is published.
2004: 454 Life Sciences markets a parallelized version of pyrosequencing.
2006: The era of next-generation sequencing begins (454 sequencing, Illumina, etc.).
Era of sequencing
1st generation sequencing: sequences many identical molecules; sequencing in large gels or capillary tubing limits scale.
Sanger chain termination (1977)
Maxam-Gilbert sequencing (1977)
Era of sequencing
2nd generation sequencing: sequences many clonally amplified (identical) molecules in massively parallel reactions on flow cells, beads or chips, without gels or capillary tubing, which greatly increases scale.
Illumina MiSeq
Life Technologies/Applied Biosystems SOLiD 5500
Roche/454 pyrosequencer
QIAGEN GeneReader
Gene sequencing techniques
Gene sequencing, also known as DNA sequencing, is the process of determining the nucleic acid sequence, that is, the order of nucleotides in DNA. It includes any method or technology used to determine the order of the four bases: adenine, guanine, cytosine, and thymine.
Generations of gene sequencing
1st generation sequencing: Maxam-Gilbert sequencing; Sanger sequencing.
Next-generation sequencing: sequencing by ligation; pyrosequencing; single-molecule real-time sequencing.
Advanced generation sequencing (shotgun): whole-genome shotgun; double-barrel shotgun; hierarchical shotgun.
Maxam-Gilbert
Walter Gilbert: a Harvard physicist who knew James Watson, became intrigued with the biological side, and became a biophysicist.
Allan Maxam
1.0 The Maxam-Gilbert technique
Principle: chemical degradation. The DNA is first radioactively labeled at one end; bases are then chemically modified (purines with dimethyl sulphate, pyrimidines with hydrazine) and the strand is cleaved at the modified positions (typically with hot piperidine). The labeled DNA is split into four aliquots:
1. Aliquot A + dimethyl sulphate, which methylates guanine residues.
2. Aliquot B + formic acid, which modifies adenine and guanine residues.
3. Aliquot C + hydrazine, which modifies thymine and cytosine residues.
4. Aliquot D + hydrazine + 5 mol/L NaCl, which makes the reaction specific for cytosine.
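To make the four-lane logic concrete, here is a minimal Python sketch of how the bands are read back into a sequence. It is an idealized model, not a lab protocol: it assumes a 5'-end label, that every susceptible base is cut in some molecule of the population, and that a labeled fragment's length equals the number of bases 5' of the cut. The names `REACTIONS`, `run_lanes` and `read_gel` are illustrative only.

```python
# Toy model of reading a Maxam-Gilbert sequencing gel (idealized, not a lab protocol).
# Assumptions: 5'-end label; every susceptible base is cut in some molecule;
# a labeled fragment's length equals the number of bases 5' of the cut.

# Bases attacked in each of the four reactions (aliquots A-D).
REACTIONS = {
    "G": {"G"},          # A: dimethyl sulphate
    "A+G": {"A", "G"},   # B: formic acid
    "C+T": {"C", "T"},   # C: hydrazine
    "C": {"C"},          # D: hydrazine + 5 mol/L NaCl
}


def run_lanes(seq: str) -> dict[str, set[int]]:
    """Labeled fragment lengths (gel bands) produced in each lane for a 5'->3' sequence."""
    return {
        lane: {i for i, base in enumerate(seq) if base in targets}
        for lane, targets in REACTIONS.items()
    }


def read_gel(lanes: dict[str, set[int]]) -> str:
    """Read bands from shortest to longest and call the base at each position."""
    calls = []
    for length in sorted(set().union(*lanes.values())):
        if length in lanes["G"]:
            calls.append("G")  # band in the G (and A+G) lane
        elif length in lanes["A+G"]:
            calls.append("A")  # band in the A+G lane only
        elif length in lanes["C"]:
            calls.append("C")  # band in the C (and C+T) lane
        else:
            calls.append("T")  # band in the C+T lane only
    return "".join(calls)


lanes = run_lanes("GATCCTAG")
print(sorted(lanes["A+G"]))  # [0, 1, 6, 7]
print(read_gel(lanes))       # GATCCTAG
```

The pairing of lanes is what resolves each base: a band shared by G and A+G is G, one only in A+G is A, one shared by C+T and C is C, and one only in C+T is T.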
Advantages/disadvantages of Maxam-Gilbert sequencing
Requires lots of purified DNA and many intermediate purification steps.
Relatively short reads.
Automation (automated sequencers) not available.
In contrast, the Sanger sequencing method requires little if any DNA purification, no restriction digests, and no labeling of the DNA sequencing template.
2.0 Sanger method
Fred Sanger was originally a protein chemist. He made his first mark in sequencing proteins (Nobel Prize, 1958) and his second in sequencing RNA, and went on to develop dideoxy DNA sequencing (Nobel Prize, 1980).
Sanger method process:
In vitro DNA synthesis using "terminators": dideoxynucleotides (ddNTPs), which lack the 3'-OH group needed to attach the next nucleotide and so do not permit chain elongation once incorporated. Synthesis therefore terminates at specific nucleotides.
Requires a primer, DNA polymerase, a template, a mixture of nucleotides (normal dNTPs plus a small proportion of ddNTPs), and a detection system.
Incorporation of a dideoxynucleotide into the growing strand terminates synthesis.
The sizes of the synthesized strands are determined for each dideoxynucleotide by gel or capillary electrophoresis.
Schematic of the Sanger method
Sequencing of DNA by the Sanger method
Sanger sequencing
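The chain-termination logic can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions: a perfect primer and polymerase, every possible termination product is formed, product length is counted from the first base added after the primer, and the template is written 5'->3'. `termination_products` and `read_electropherogram` are hypothetical helper names.

```python
# Toy simulation of Sanger chain-termination sequencing (idealized).
# Assumptions: template written 5'->3'; the polymerase copies it into the
# reverse-complementary strand; every termination product is formed; product
# length is counted from the first base added after the primer.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def termination_products(template: str) -> dict[str, list[int]]:
    """Product lengths in each of the four reactions (ddA, ddC, ddG, ddT)."""
    new_strand = reverse_complement(template)  # built 5'->3' by the polymerase
    products: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # Incorporating the ddNTP for this base stops the chain at this length.
        products[base].append(length)
    return products


def read_electropherogram(products: dict[str, list[int]]) -> str:
    """Sort every product by size (as electrophoresis does) and read its terminal base."""
    peaks = sorted((length, base) for base, lengths in products.items() for length in lengths)
    return "".join(base for _, base in peaks)


template = "AGGCTTACG"
products = termination_products(template)
print(products["G"])                         # [2, 6]
read = read_electropherogram(products)
print(read)                                  # CGTAAGCCT (the synthesized strand)
print(reverse_complement(read) == template)  # True
```

Reading the fragments from shortest to longest gives the synthesized strand; its reverse complement is the template.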
2.0 Advanced generation of sequencing
Clearly, sequencing about 1500 bases at a time is not enough if we ever want to sequence whole genomes. So what do professionals do? They use genome sequencing strategies. We will discuss three "classical" methods:
Whole-genome shotgun
Double-barrel shotgun
Hierarchical shotgun
2.1 Whole-genome shotgun
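Whole-genome shotgun sequencing shears the genome into random fragments, sequences them, and reassembles the sequence computationally from overlaps between reads. The Python sketch below uses simple greedy overlap merging to show the idea; it is not how production assemblers work (they use overlap-layout-consensus or de Bruijn graph methods), and it assumes error-free reads from one strand and no repeats longer than the minimum overlap. The reads are hypothetical.

```python
# Toy whole-genome shotgun assembly by greedy overlap merging.
# Assumptions: error-free reads from one strand, no repeats longer than the
# minimum overlap, and enough coverage for neighbouring reads to overlap.

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of pieces with the longest overlap; return the contigs."""
    pieces = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads that are fully contained in another read.
    pieces = [r for r in pieces if not any(r != o and r in o for o in pieces)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return pieces  # nothing left to join: each remaining piece is a contig
        i, j = best_pair
        merged = pieces[i] + pieces[j][best_len:]
        pieces = [p for k, p in enumerate(pieces) if k not in (i, j)] + [merged]


# Hypothetical reads sampled from the sequence ATTAGACCTGCCGGAATAC.
reads = ["CCTGCCGGAA", "ATTAGACCTG", "GCCGGAATAC", "AGACCTGCCG"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

If coverage is too low and some neighbouring reads never overlap, the function returns several contigs separated by gaps, which is the problem the double-barrel and hierarchical strategies try to reduce.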
2.2 Double-barrel shotgun
Double-barrel shotgun sequencing is also referred to as "pairwise-end sequencing". It is the same as whole-genome shotgun with one difference: sequencing is performed from both ends of each DNA insert as opposed to just one. The method was conceived to reduce gaps and assembly errors.
Disadvantage: more data is generated, which makes assembly more demanding.
Advantage: in theory it is very accurate.
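A minimal Python sketch of how paired reads are produced from both ends of each insert. It assumes a fixed insert size and error-free reads; `paired_end_reads` and the random genome are hypothetical. Read 2 comes from the opposite strand, so it is stored as a reverse complement.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(genome: str, insert_size: int, read_len: int,
                     n_pairs: int, seed: int = 0) -> list[tuple[int, str, str]]:
    """Sample inserts of fixed size and sequence both ends of each one.

    Read 1 is the left end on the forward strand; read 2 is the right end read
    from the opposite strand, so it is reverse-complemented.
    """
    if not read_len <= insert_size <= len(genome):
        raise ValueError("need read_len <= insert_size <= len(genome)")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(len(genome) - insert_size + 1)
        insert = genome[start:start + insert_size]
        pairs.append((start, insert[:read_len], reverse_complement(insert[-read_len:])))
    return pairs


rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(300))  # hypothetical genome
for start, read1, read2 in paired_end_reads(genome, insert_size=100, read_len=20, n_pairs=3):
    # The mates are a known ~100 bp apart, although the middle 60 bp is never read.
    print(start, read1, read2)
```

Because the distance between mates is known, an assembler can use a pair whose reads land on two different contigs to order and orient those contigs and estimate the size of the gap between them.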
2.3 Hierarchical shotgun
3.0 NEXT GENERATION SEQUENCING
Next-generation sequencing (NGS), also known as high-throughput sequencing, is the catch-all term for a number of different modern sequencing technologies, including:
Illumina (Solexa) sequencing
Roche 454 sequencing
SOLiD sequencing
Single-molecule real-time (SMRT) sequencing
Next-generation sequencing
NGS WORKFLOW
Sample extraction, DNA fragmentation and in vitro adapter ligation, followed by either:
clonal amplification by emulsion PCR, then sequencing by ligation (SOLiD platform) or pyrosequencing (454 sequencing); or
clonal amplification by bridge PCR, then sequencing by synthesis (Solexa technology).
NGS WORKFLOW
1. Create DNA fragments.
2. Add platform-specific adapter sequences to every fragment.
Adapter molecules: bind the library to a flow cell or bead, add sequencing-primer binding sites, and add barcodes for multiplexing.
(Figure: adapter molecule, adapter ligation point, adapter molecule bound to DNA.)
Adapter binding (figure: adapter ligation point on DNA)
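Barcodes let many samples share one run; afterwards the reads are sorted back into samples ("demultiplexed"). The Python sketch below assumes, for simplicity, that the barcode is the first few bases of each read and tolerates one mismatch; on many platforms the index is actually read in a separate index read, and the barcodes and sample names here are hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: list[str], barcodes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Sort reads into samples by a leading barcode and strip the barcode.

    `barcodes` maps barcode sequence -> sample name (all barcodes the same length).
    Reads with no match, or matching more than one barcode, go to "undetermined".
    """
    lengths = {len(bc) for bc in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes equal-length barcodes")
    k = lengths.pop()
    bins: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        tag, insert = read[:k], read[k:]
        hits = [sample for bc, sample in barcodes.items() if hamming(tag, bc) <= max_mismatches]
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(insert)
    return dict(bins)


barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}  # hypothetical samples
reads = ["ACGTGGATCC", "TGCATTAGGC", "ACCTAAACCC", "GGGGCCCAAA"]
print(demultiplex(reads, barcodes))
# {'patient_1': ['GGATCC', 'AAACCC'], 'patient_2': ['TTAGGC'], 'undetermined': ['CCCAAA']}
```

Allowing a mismatch only works safely when the barcodes differ from each other at several positions, as ACGT and TGCA do here; otherwise one sequencing error could send a read to the wrong sample.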
Cluster amplification (bridge PCR)
DNA fragments with adaptors attached form a library. A solid surface is coated with primers complementary to the two adaptor sequences. Amplification is isothermal, with one end of each "bridge" attached to the surface. Clusters of DNA molecules are generated on the chip. Each cluster originates from a single DNA fragment and is thus a clonal population.
Cluster amplification (emulsion PCR)
Fragments with adaptors (the library) are PCR-amplified within water droplets in oil. One PCR primer is attached to the surface of a bead. DNA molecules are synthesized on the beads in the water droplets. Each bead bears clonal DNA originating from a single DNA fragment. The beads (with attached DNA) are then deposited into the wells of sequencing chips, one bead per well.
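Getting one fragment per droplet depends on diluting the library. A common way to reason about this is to model the number of templates per droplet as Poisson-distributed with mean λ; the Python sketch below assumes that model and uses illustrative λ values, not a recommended protocol setting.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet receives exactly k templates when the mean is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def droplet_occupancy(lam: float) -> dict[str, float]:
    """Fractions of droplets with no template, exactly one (clonal), or two or more (mixed)."""
    empty = poisson_pmf(0, lam)
    clonal = poisson_pmf(1, lam)
    return {"empty": empty, "clonal": clonal, "mixed": 1.0 - empty - clonal}


for lam in (0.1, 0.3, 1.0):
    occ = droplet_occupancy(lam)
    clonal_share = occ["clonal"] / (1.0 - occ["empty"])  # among droplets with any template
    print(f"mean {lam}: " + ", ".join(f"{k} {v:.3f}" for k, v in occ.items())
          + f", clonal among occupied {clonal_share:.3f}")
```

Under this model, a mean of 1 template per droplet leaves only about 58% of occupied droplets clonal, while a mean of 0.1 raises that to about 95% at the cost of roughly 90% empty droplets, which is why empty beads are removed in an enrichment step.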
3.2 Pyrosequencing
Pyrosequencing is a detection technology based on the principle of sequencing by synthesis. It provides quantitative, real-time data without the need for gels, probes, or labels. It is a non-electrophoretic, bioluminescence method that measures the release of inorganic pyrophosphate by converting it, in proportion to the amount released, into visible light through a series of enzymatic reactions.
DNA capture bead containing millions of copies of a single clonally amplified fragment
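Because the light signal is proportional to the number of bases incorporated when a nucleotide is dispensed, the raw output is a "flowgram". The Python sketch below computes an ideal, noise-free flowgram; the dispensation order TACG is assumed for illustration, `read` stands for the sequence of the strand being synthesized, and `flowgram`/`base_call` are hypothetical names.

```python
def flowgram(read: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Ideal pyrosequencing signal: bases incorporated in each nucleotide flow.

    One nucleotide is dispensed per flow, cycling through `flow_order`. When it
    matches the next template position(s) it is incorporated; the pyrophosphate
    released gives light proportional to the number of bases added, so a
    homopolymer run of length n gives a signal of n in a single flow.
    """
    if set(read) - set(flow_order):
        raise ValueError("read contains bases not in the flow order")
    signals = []
    pos, flow = 0, 0
    while pos < len(read):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def base_call(signals: list[tuple[str, int]]) -> str:
    """Turn an ideal (integer) flowgram back into a sequence."""
    return "".join(base * n for base, n in signals)


signals = flowgram("TTACGGGA")
print(signals)             # [('T', 2), ('A', 1), ('C', 1), ('G', 3), ('T', 0), ('A', 1)]
print(base_call(signals))  # TTACGGGA
```

Real intensities are noisy, so telling a run of, say, five identical bases from six is harder than in this integer model; long homopolymers are a known weak point of pyrosequencing.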
3.1 Sequencing by ligation
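Sequencing by ligation (SBL) uses the enzyme DNA ligase to identify the nucleotide present at a given position in a DNA sequence. Ligase joins a labeled probe only when the probe pairs correctly with the template (according to the base-pairing rule), so the dye on the ligated probe (linker with dye) reveals the base at the interrogated position.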
3.3 Single-molecule real-time sequencing (SMRT)
Single-molecule real-time (SMRT) sequencing is an approach to DNA sequencing offered by Pacific Biosciences. It observes a single DNA polymerase as it incorporates nucleotides into a growing chain. SMRT uses four different dye-labeled, phospholinked nucleotides, one for each nucleotide type (A, G, T, C), so that the specific nucleotide being incorporated by the DNA polymerase during chain extension can be identified. To achieve this effectively and accurately, a specially designed excitation and detection chamber, called a zero-mode waveguide (ZMW), is used: its observation volume is small enough that essentially only the labeled nucleotide held by the polymerase during incorporation is excited and detected. Template + polymerase + phospholinked labeled dNTPs are deposited in the ZMWs of a specially designed microarray called a "Sequencing Chip". Detection occurs in real time in each ZMW, allowing real-time sequencing.
NGS technologies overview
Commercially available technologies:
Illumina/Solexa
Roche/454
Helicos Biosciences
Life/APG SOLiD system
Pacific Biosciences
Ion Torrent technology
Sequencing by synthesis
ILLUMINA/SOLEXA SEQUENCING
Illumina dye sequencing is a technique used to determine the series of base pairs in DNA (DNA sequencing). The reversible-terminator chemistry concept was invented by Bruno Canard and Simon Sarfati at the Pasteur Institute in Paris.
Run time: 1–10 days
Output: 2–1000 Gb of sequence
Read length: 2 × 50 bp to 2 × 250 bp (paired-end)
Cost: $0.05–$0.40/Mb
Clonal amplification: bridge PCR
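The output, read-length and cost figures above can be turned into a rough run plan with coverage arithmetic (mean coverage = total bases sequenced / genome size). The Python sketch below is illustrative only: the ~3.1 Gb human genome size and the 30× target are assumptions, the price range is the one quoted on this slide (and dates quickly), and the zero-coverage estimate uses the idealized Lander-Waterman (Poisson) model.

```python
import math


def total_bases(genome_size_bp: float, coverage: float) -> float:
    """Sequence needed for a given mean coverage: coverage = total bases / genome size."""
    return genome_size_bp * coverage


def read_pairs_needed(genome_size_bp: float, coverage: float, read_len_bp: int) -> int:
    """Paired-end fragments needed when each fragment yields 2 x read_len_bp bases."""
    return math.ceil(total_bases(genome_size_bp, coverage) / (2 * read_len_bp))


def cost_range_usd(n_bases: float, usd_per_mb=(0.05, 0.40)) -> tuple[float, float]:
    """Cost range at the per-megabase prices quoted on the slide."""
    mb = n_bases / 1e6
    return mb * usd_per_mb[0], mb * usd_per_mb[1]


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases covered by no read."""
    return math.exp(-coverage)


genome_bp = 3.1e9  # approximate human genome size (illustrative assumption)
target = 30        # hypothetical target mean coverage
needed = total_bases(genome_bp, target)
low, high = cost_range_usd(needed)
print(f"{needed / 1e9:.0f} Gb of sequence")                                   # 93 Gb of sequence
print(read_pairs_needed(genome_bp, target, 150), "read pairs at 2 x 150 bp")  # 310000000 read pairs at 2 x 150 bp
print(f"${low:,.0f} to ${high:,.0f}")                                         # $4,650 to $37,200
print(f"{fraction_uncovered(target):.1e} of bases expected uncovered")
```

In practice extra sequence is needed because coverage is never perfectly even (duplicates, GC bias, repeats), so the Poisson estimate of uncovered bases is optimistic.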
Applications
DNA sequencing
Gene regulation analysis
Sequencing-based transcriptome analysis
SNP and structural variant (SV) discovery
Cytogenetic analysis
ChIP sequencing (ChIP-seq)
Small RNA discovery and analysis
ROCHE/454 SEQUENCING
Produces much longer reads than other second-generation platforms, sequencing many reads at once by reading optical signals as bases are added. The DNA or RNA is fragmented into pieces of up to 1 kb. Uses emulsion PCR for clonal amplification and pyrosequencing as the sequencing approach.
All of the sequence reads obtained from 454 have different lengths, because a different number of bases is added in each cycle, depending on the template sequence.
Applications:
Whole-genome sequencing
Targeted resequencing
Sequencing-based transcriptome analysis
Metagenomics
LIFE/APG/ABI SOLiD SEQUENCING
The AB SOLiD™ 3 System generates over 20 gigabases and 400 M tags per run.
Workflow:
Library preparation
Emulsion PCR/bead enrichment
Bead deposition (chemical crosslinking to an amino-coated glass surface)
Sequencing by ligation
SOLiD DNA sequencing
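SOLiD's ligation probes use two-base encoding: each of four dye colors reports a pair of adjacent bases rather than a single base, so reads come out in "color space". With the codes A=0, C=1, G=2, T=3, the color of a base pair equals the XOR of the two codes. The Python sketch below shows encoding and decoding; it assumes the first base is known (in practice it comes from the known adapter sequence), and the function names are hypothetical.

```python
# Two-base ("color space") encoding as used by SOLiD: each color reports a pair
# of adjacent bases. With A=0, C=1, G=2, T=3 the color is the XOR of the two codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def to_colors(seq: str) -> list[int]:
    """Encode a base sequence as colors (one per adjacent pair of bases)."""
    codes = [BASE_TO_BITS[b] for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def from_colors(first_base: str, colors: list[int]) -> str:
    """Decode colors; requires the first base to be known."""
    code = BASE_TO_BITS[first_base]
    bases = [first_base]
    for color in colors:
        code ^= color  # next base = previous base XOR color
        bases.append(BITS_TO_BASE[code])
    return "".join(bases)


colors = to_colors("ATGGCA")
print(colors)                             # [3, 1, 0, 3, 1]
print(from_colors("A", colors))           # ATGGCA
print(from_colors("A", [3, 2, 0, 3, 1]))  # ATCCGT: one wrong color corrupts every later base
print(to_colors("ATAGCA"))                # [3, 3, 2, 3, 1]: a true SNP changes two adjacent colors
```

This difference is why color-space data are usually compared with a reference in color space: an isolated color change looks like a measurement error, while a real single-base change alters two adjacent colors.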
Applications of gene sequencing
Information obtained by sequencing allows researchers to identify changes in genes and associations with diseases and phenotypes, and to identify potential drug targets.
Used in evolutionary biology to study how different organisms are related and how they evolved.
Used in forensic science, e.g. the DNA fingerprinting technique.
Useful in determining the risk of genetic disorders.
DNA sequencing may be useful for identifying a specific bacterium, allowing more precise antibiotic treatment.
Viral sequencing can be used during epidemics to determine the origins of an outbreak using the molecular clock technique.
Mutation discovery
Transcriptome analysis (RNA-Seq)
Sequencing clinical isolates for strain-to-reference comparison
Discovering non-coding RNAs
Molecular diagnostics for oncology and inherited disease
Gene regulation analysis
Exploring chromatin packaging