What is DNA Sequencing? “Sequencing” means finding the order of nucleotides on a piece of DNA. Nucleotide order determines amino acid order and, by extension, protein structure and function (proteomics). An alteration in a DNA sequence can lead to an altered or non-functional protein, and hence to a harmful effect in a plant or animal.
Importance Understanding a particular DNA sequence can shed light on a genetic condition and offer hope for the eventual development of a treatment. DNA technology also extends to environmental, agricultural and forensic applications.
DNA sequence variation can change the protein produced by a particular gene. Simple point mutations can alter protein shape and function. Sickle cell anaemia, for example, is caused by a single point mutation. Cystic fibrosis is likewise caused by small changes in a single gene, most commonly a three-base deletion rather than a point mutation.
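To make the link between a single base change and an altered protein concrete, here is a minimal sketch in Python. The short sequence is illustrative only, modelled on the start of the beta-globin coding sequence (the gene affected in sickle cell anaemia). The codon table is deliberately partial, and the function names are hypothetical.

```python
# Partial codon table: only the codons needed for this example (an assumption
# made for brevity; the full genetic code has 64 codons).
CODON_TABLE = {
    "ATG": "Met", "GTG": "Val", "CAT": "His", "CTG": "Leu",
    "ACT": "Thr", "CCT": "Pro", "GAG": "Glu", "AAG": "Lys",
}


def translate(dna):
    """Translate a coding-strand DNA string into amino acids, codon by codon."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]


def point_mutate(dna, position, new_base):
    """Return a copy of dna with the base at a 0-based position replaced."""
    return dna[:position] + new_base + dna[position + 1:]


# Illustrative sequence (assumed, modelled on the start of beta-globin).
normal = "ATGGTGCATCTGACTCCTGAGGAGAAG"
# Change the middle base of the seventh codon: GAG (Glu) becomes GTG (Val).
mutant = point_mutate(normal, 19, "T")

print(translate(normal))
print(translate(mutant))
```

A single A-to-T substitution changes one glutamate to a valine while leaving every other residue untouched, which is the kind of change the slide describes.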
DNA Sequencing: General Principle DNA sequencing involves synthesizing DNA sub-fragments of all possible lengths and separating them on a gel. If the template were 200 bases long, there would be 200 different sub-fragments, ranging from 1 to 200 bases in length.
Methods in DNA Sequencing
DNA Sequencing Methods
Basic DNA sequencing:
- Sanger sequencing (chain-termination method using dideoxynucleotides): 99.9% accurate but costly, $1000 per 1 million bp
- Maxam and Gilbert sequencing (chemical sequencing, chemical termination method)
Advanced DNA sequencing:
- Shotgun sequencing
- Next-generation DNA sequencing:
  - SOLiD sequencing: 99.9% accurate and cost effective, $0.20-0.50 per 1 million bp
  - Illumina sequencing: 98% accurate, $0.05-0.15 per 1 million bp
  - Pyrosequencing
Frederick Sanger Developed DNA sequencing by the chain-termination method. Nobel Prize 1 (1958): complete amino acid sequence of insulin. Nobel Prize 2 (1980): DNA sequencing.
Let’s illustrate this using the eight-base sequence ACGATTAG as an example
In practice, the template, primers, radioactive dNTPs for all four bases, and DNA polymerase are all mixed together. This mixture is then distributed among four tubes, each with a different dideoxynucleotide. Since the largest DNA sub-fragments generated by sequencing reactions are usually only 200–300 base pairs in length, the fragments are too small to be resolved by the large pores of agarose and must be separated using a polyacrylamide gel.
Dideoxynucleotides Incorporation of a dideoxynucleotide into a growing DNA strand terminates its further extension. Dideoxynucleotides are added in small proportion alongside the normal nucleotides:
Normal nucleotide    Dideoxynucleotide
dATP                 ddATP
dGTP                 ddGTP
dCTP                 ddCTP
dTTP                 ddTTP
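Why the dideoxynucleotides must be a small proportion can be seen with a simple probability model. The sketch below assumes that, each time the relevant base is needed, the polymerase picks the dideoxy form independently with probability p. Both the model and the function name are assumptions for illustration, not measured enzyme behaviour. Under that assumption the chance of stopping at the k-th occurrence of the base is p(1 - p)^(k - 1).

```python
def termination_distribution(p, occurrences):
    """Probability that extension stops at the k-th occurrence of a base.

    Assumes independent incorporation of the dideoxy form with probability p
    at each occurrence (a simplified, hypothetical model).
    """
    return [p * (1 - p) ** (k - 1) for k in range(1, occurrences + 1)]


# Compare a small ddNTP proportion with a large one over 50 occurrences.
for p in (0.01, 0.5):
    dist = termination_distribution(p, 50)
    print(f"p={p}: first occurrence {dist[0]:.3g}, 50th occurrence {dist[-1]:.3g}")
```

With a small p, the probabilities for early and late occurrences stay within the same order of magnitude, so fragments of every length are represented. With a large p, almost every strand stops within the first few occurrences and the long fragments are effectively missing.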
The Sanger method requires:
- Multiple copies of single-stranded template DNA
- A suitable primer (a small piece of DNA that can pair with the template DNA to act as a starting point for replication)
- DNA polymerase (an enzyme that copies DNA, adding new nucleotides to the 3’ end of the growing strand)
- A ‘pool’ of normal nucleotides
- A small proportion of dideoxynucleotides labeled in some way (radioactively or with fluorescent dyes)
The template DNA pieces are replicated, incorporating normal nucleotides, but occasionally and at random a dideoxynucleotide is taken up. This stops replication on that piece of DNA. The result is a mix of DNA lengths, each ending with a particular labeled dideoxynucleotide. Because the different lengths ‘travel’ at different rates during electrophoresis, their order can be determined.
Originally, four separate reactions, each containing DNA, primer and a single different dideoxynucleotide, were prepared and run on a gel. Modern technology allows all the DNA, primers, etc. to be mixed, and the fluorescently labeled dideoxynucleotide ‘ends’ of the different lengths can be ‘read’ by a laser. Additionally, the gel slab has been replaced by polymer-filled capillary tubes in modern equipment.
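The logic of reading a chain-termination gel can be sketched in a few lines of Python. The sketch assumes that ACGATTAG is the newly synthesized strand read 5’ to 3’. It enumerates every possible termination point rather than simulating random incorporation, and the function names are hypothetical.

```python
def chain_termination_lanes(new_strand):
    """Map each dideoxynucleotide to the lengths of fragments ending in it."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A fragment of this length ends where a dideoxy form of `base`
        # was incorporated instead of the normal nucleotide.
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Recover the sequence by reading bands from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = chain_termination_lanes("ACGATTAG")
print(lanes)            # {'A': [1, 4, 7], 'C': [2], 'G': [3, 8], 'T': [5, 6]}
print(read_gel(lanes))  # ACGATTAG
```

Each fragment length appears in exactly one lane, so sorting the bands by size and noting which lane each falls in reproduces the sequence.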
DNA fragments were originally detected by radioactive labeling, although nowadays fluorescent dyes are normally used as labels.
Sample: Chain Termination Output
Dye Sequencing Four different labels: each of the four chain-terminating nucleotides carries a different dye. Individual dyes fluoresce at unique wavelengths. Used in the vast majority of sequencing projects because it is easier and cheaper.
Sample: Dye Sequencing Output
DNA Sequence Files
A 'Scan' of one gel lane: Nowadays, we don't even have to 'read' the sequence from the gel - the computer does that for us. This is an example of what the sequencer's computer shows for one sample. This is a plot of the colors detected in one 'lane' of a gel (one sample), scanned from smallest fragments to largest. The computer even interprets the colors by printing the nucleotide sequence across the top of the plot.
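How a computer turns coloured peaks into letters can be sketched as follows. The sketch assumes the trace has already been reduced to one set of four dye intensities per peak. The intensity values and the function name are made up for illustration, and real base-calling software also has to find the peaks and handle noise.

```python
def call_bases(trace):
    """Call one base per peak by picking the dye channel with the
    highest intensity (a deliberately simplified base caller)."""
    return "".join(max(peak, key=peak.get) for peak in trace)


# Hypothetical intensities for four peaks, smallest fragment first.
trace = [
    {"A": 0.92, "C": 0.05, "G": 0.03, "T": 0.04},
    {"A": 0.06, "C": 0.88, "G": 0.07, "T": 0.02},
    {"A": 0.04, "C": 0.03, "G": 0.95, "T": 0.05},
    {"A": 0.90, "C": 0.08, "G": 0.02, "T": 0.06},
]

print(call_bases(trace))  # ACGA
```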
Maxam & Gilbert Chemical Sequencing Chemical sequencing relies on base-specific chemical reagents. Four sets of deoxyoligonucleotides are generated by subjecting a purified 3′- or 5′-end-labeled deoxyoligonucleotide to a base-specific chemical reagent that randomly cleaves DNA at one or two specific nucleotides. Because only end-labeled fragments are observed following autoradiography of the sequencing gel, four DNA ladders are observed, as shown in the figure below.
Chemical sequencing method of Maxam and Gilbert Base-specific chemical reagents (hydrazine, dimethyl sulfate (DMS), or formic acid) are used to specifically modify bases within the DNA molecule. Piperidine is then added to catalyze strand breakage at these modified nucleotides. In the first reaction, the specificity resides with hydrazine, DMS, or formic acid, which react with only a few of the bases. In the second reaction, piperidine strand cleavage must be quantitative.
Chemical sequencing
The chemical mechanisms of the reactions are as follows. The shaded base at the 3′ end of the fragments to the right of the gel indicates a base that has been chemically modified and then displaced by piperidine during cleavage. For example, after a limited reaction with dimethyl sulfate (DMS), which is specific for G′s, followed by quantitative release of the modified G residues by piperidine, a set of oligonucleotides is generated that terminate at the base immediately 5′ of each G in the sequence.
Continued: Formic acid is specific for purines (G′s and A′s), so a fragment that terminates at a G or an A produces a band in the G+A lane. Hydrazine in the absence of NaCl cleaves at T′s and C′s, resulting in a band in the T+C lane. Hydrazine in the presence of NaCl cleaves only at C′s; thus, a band is observed in the C lane.
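The four-lane logic described above can be sketched in Python. As a simplification, each band is indexed by the position of the cleaved base rather than by the true fragment length, which ends one base earlier. The lane names follow the reagents' specificities, and the function names are hypothetical.

```python
# Which lanes show a band when cleavage happens at a given base.
LANES_FOR_BASE = {
    "G": ("G", "G+A"),   # DMS lane and formic acid lane
    "A": ("G+A",),       # formic acid lane only
    "C": ("C", "T+C"),   # hydrazine + NaCl lane and hydrazine lane
    "T": ("T+C",),       # hydrazine lane only
}


def maxam_gilbert_lanes(seq):
    """Return, for each lane, the set of positions that produce a band."""
    lanes = {"G": set(), "G+A": set(), "T+C": set(), "C": set()}
    for pos, base in enumerate(seq, start=1):
        for lane in LANES_FOR_BASE[base]:
            lanes[lane].add(pos)
    return lanes


def read_lanes(lanes, length):
    """Deduce each base from the combination of lanes showing a band."""
    seq = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            seq.append("G")      # band in both G and G+A
        elif pos in lanes["G+A"]:
            seq.append("A")      # band in G+A only
        elif pos in lanes["C"]:
            seq.append("C")      # band in both C and T+C
        elif pos in lanes["T+C"]:
            seq.append("T")      # band in T+C only
        else:
            seq.append("N")      # no band seen at this position
    return "".join(seq)


lanes = maxam_gilbert_lanes("ACGATTAG")
print(read_lanes(lanes, 8))  # ACGATTAG
```

An A is inferred from a band in the G+A lane with no matching band in the G lane, and a T is inferred the same way from the T+C and C lanes.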
Assembling Small Genomes by Shotgun Sequencing In shotgun sequencing the genome is broken randomly into short fragments (1 to 2 kbp long) suitable for sequencing. The fragments are ligated into a suitable vector and then partially sequenced. Around 400–500 bp of sequence can be generated from each fragment in a single sequencing run. Computerized searching for overlaps between individual sequences then assembles the complete sequence. Overlapping sequences are assembled to generate contigs. The term contig refers to a known DNA sequence that is contiguous and lacks gaps.
No genetic map or prior information is needed about the organism whose genome is to be sequenced. The original limitation of shotgun sequencing was the massive data handling required. The development of faster computers has overcome this problem. Nowadays, a more important issue is that repetitive sequences create ambiguities.
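The overlap search can be illustrated with a toy greedy assembler in Python. This is a minimal sketch on short made-up reads, not the algorithm used by any particular assembly program. It assumes error-free reads all taken from the same strand, and the function names are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if no such overlap is at least min_len long."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap.
    Returns the remaining contigs once no overlaps are left."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return reads
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


# Three overlapping reads from a made-up 14-base sequence.
reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

With real data the reads are hundreds of bases long and contain errors, so practical assemblers use far more elaborate overlap detection than this exact-match search.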
ShotGun Sequencing
Sequencing a Cloned DNA Fragment by Primer Walking
NEXT GENERATION SEQUENCING
Ion torrent sequencing machine
Barcode: fluorescently tagged molecule with a radioisotope, used to mark the genome fragment