DNA Barcoding and its application in species identification

supriyak20 3,348 views 59 slides Nov 19, 2019

Slide Content

Welcome to Seminar Series 2018-2019

DNA Barcoding and its Role in Discrimination of Plant Species
Presented by: Kaldate Supriya, M.Sc. (Agri.) III Sem, Reg. No. 2010117045, Dept. of Genetics and Plant Breeding
Major Guide: Dr. B. H. Kale, Assistant Professor, Dept. of Genetics and Plant Breeding, NMCA, NAU, Navsari-396 450
Minor Guide: Dr. V. B. Parekh, Assistant Professor, Dept. of Basic Science and Humanities, ACHF, NAU, Navsari-396 450

Contents
Concept
Gene fragments used in DNA barcoding
Process of DNA barcoding
Applications
Limitations
Bioinformatics tools used in DNA barcoding
Factors affecting species discrimination
Case studies
Conclusion

Concept of Barcode
A barcode is an optical, machine-readable representation of data, used to identify the original product.
Barcode = Bar + code

Barcode Metaphor
All products of one type on a supermarket shelf share exactly the same 13-digit barcode, which is distinct from all other barcodes.
Likewise, individuals of the same species show only a minor degree of variation, and this variation is much smaller than the differences among species.

Barcoding in Plants
Representation of data in the form of DNA sequences.
The technology required to isolate the part of an organism's DNA that contains a gene of interest and determine its sequence (made up of the bases A, G, C and T) has recently become widely accepted, cheap and easy to master.

Species discrimination really matters!
Cataloguing hidden biodiversity
Tracing the phylogeny of species
Environmental sustainability: sustaining natural resources
Protecting endangered species
Of interest to taxonomists, conservationists, ecologists, agriculturists, foresters, quarantine officers and breeders.

Commencement of Idea
At the second conference of the Convention on Biological Diversity (CBD), held in Jakarta in 1995, the major issue raised by the worldwide community of taxonomists was the increasing disinterest of governments and funding agencies in taxonomy.
Another challenge is the lack of consensus among taxonomists on the morphological characters to be used.
The Global Taxonomy Initiative (GTI) was launched in the context of the CBD early in 2002 and failed to reach the CBD goals.
Potential solution: the Barcode of Life project (iBOL), through the creation of a database system that serves as a repository of sequences.

Birth of DNA Barcoding
In 2003, Dr. P. D. N. Hebert et al. at the University of Guelph, Ontario, first used DNA barcoding as a tool for species identification.
Cytochrome oxidase I served as the core of bio-identification and was used to identify 200 closely allied species of Lepidoptera.
Dr. P. D. N. Hebert: father of DNA barcoding.
First time in plants: in 2005, W. John Kress at the Smithsonian Institution, Washington DC, evaluated 7 cp loci to discriminate flowering plants.

What is DNA barcoding? (P. D. N. Hebert, 2003; W. J. Kress et al., 2005)

Significant limitations of the traditional approach
Diagnostic morphological characters are limited by:
Phenotypic plasticity
Genetic variability
Immaturity (specific to life stage)
Crypticity
(Hebert et al., 2003, Ontario, Canada)

Why DNA barcoding
Number of species: flowering plants (angiosperms) 352,000; conifers (gymnosperms) 1,050; ferns and horsetails 15,000; mosses (cryptogams) 22,750.
Taxonomic impediment
Works with fragments of DNA.
Works for all stages of life.
Unmasks look-alikes.
Reduces ambiguity.
Opens the way for an electronic handheld field guide.

Linnaean classification vs. DNA barcoding
Linnaean classification: behaviour, morphology, distribution/habitat, anatomy, physiology.
DNA barcoding: selection of loci, sequence comparison to databases, amplification and sequencing, biodiversity assessment.

Fig. 1. Conceptual link between DNA barcoding and taxonomy (Hebert and Hanner, 2015, Ontario, Canada)

DNA barcoding markers vs. traditional molecular markers
1. Sequences. DNA barcoding markers: marker sequences must be conserved across species but differ enough that species can be told apart. Traditional molecular markers: marker sequences must be variable; sequence similarity across species is not required.
2. Primers. DNA barcoding markers: universal primers. Traditional molecular markers: primers differ with the marker technique.
3. Resolution. DNA barcoding markers: discrimination between and within species. Traditional molecular markers: molecular variability within the species.
Fig. 2. Comparison of DNA barcoding and genomics (P. M. Hollingsworth, 2011, Edinburgh, Scotland)

Selecting a core barcode
Universality: the region can be sequenced across the land plants.
Sequence quality and coverage: short sequence length and production of bidirectional sequences with few or no ambiguous base calls.
Species discrimination: two short regions that are highly conserved between all species, while the rest of the sequence should be variable.
MOTUs (Molecular Operational Taxonomic Units)
Core barcode
(Kress and Erickson, 2008, Washington DC)

Problems with mtDNA and nrDNA
Slower rate of cytochrome c oxidase 1 (CO1) gene evolution in higher plants
Recombination
Exception: ITS regions
Complexity
ITS: internal transcribed spacer
(W. Kress et al., 2005, Philadelphia, Pennsylvania)

Why chloroplast DNA
cpDNA includes Large Single-Copy (LSC) and Small Single-Copy (SSC) regions and Inverted Repeats (IR).
Variation in length is mainly due to the presence of the inverted repeat.
Conservative rate of nucleotide substitution.
Uniparentally inherited.
Conserved sequences ranging from 110 to 160 kbp.
(Palmer et al., 2012, Missouri, United States)

Kinds of DNA barcode markers
1. Single-locus DNA barcode markers
2. Candidate multilocus DNA barcode markers
3. Super DNA barcode markers

Gene fragments for DNA barcoding: single-locus DNA barcodes
matK: maturase K
rbcL: ribulose bisphosphate carboxylase large subunit
rpoB: RNA polymerase B subunit
rpoC1: RNA polymerase C subunit
trnH: tRNA-His
psbA: photosystem II D1 reaction centre protein
atpH: ATP synthase subunit delta
psbK-1: photosystem II reaction centre protein K precursor
(W. Kress et al., 2005, Philadelphia, Pennsylvania)

Fig. 3. Comparison between barcode regions. Categories shown: potential DNA barcodes; poor candidate DNA barcodes; pending investigation. Annotations: high secondary structure; fungal contamination.

Candidate multilocus DNA barcode markers
matK: rapidly evolving but amplifies poorly.
rbcL: easy to amplify but evolves slowly.
The CBOL Plant Working Group recommended matK + rbcL as the universal barcode combination.
Various combinations of plastid loci have been proposed, including rbcL + trnH-psbA, rpoC1 + rpoB + matK, and rpoC1 + matK + trnH-psbA.
(Li et al., 2014, Beijing, China)

Super-barcoding: a new way for plant discrimination
The complete cp genome contains as much variation as the CO1 locus in animals.
Conserved sequences ranging from 110 to 160 kbp.
(Li et al., 2014, Beijing, China)

Table 1. Characteristics of different barcoding markers that have been included in plant barcoding studies (Hollingsworth et al., 2011, Edinburgh, UK)

Process of DNA barcoding for species discrimination


DNA barcoding: toward the establishment of a global information system
DNA barcode data are:
Standardized and high quality. The CBOL (Consortium for the Barcode of Life) created a Database Working Group that has worked with the Global Biodiversity Information Facility (GBIF) and others to set data standards. Data that meet the standards carry the BARCODE flag. Every BARCODE record is assigned a recognized species name in a museum or repository. Each record includes the PCR primers used and the trace file from the DNA sequencer.
Accessible and secure in a permanent home. The International Nucleotide Sequence Database Collaboration (www.insdc.org) includes GenBank at the US National Institutes of Health, the European Molecular Biology Laboratory in Germany, and the DNA Data Bank of Japan. These databases agreed to be the global repository for DNA barcodes and to make them available to the public.

Sequence repositories and consortia involved in plant DNA barcoding
Consortia:
iBOL (International Barcode of Life): the largest biodiversity genomics initiative ever undertaken. A digital identification system for life. Involves 26 nations with varying levels of investment and responsibilities. Sequences: 5.3 million DNA barcodes; 580,000 species.
CBOL (Consortium for the Barcode of Life): an international initiative (2004) involving 50 countries, built on high-quality DNA barcode records in a public library of DNA sequences. CBOL promotes barcoding through working groups, networks, workshops, conferences, outreach and training.

Databases
BOLD (Barcode of Life Database): created and maintained by the University of Guelph in Ontario, Canada, to make DNA barcoding information universally and publicly accessible. Specimen records: 8,696,850. Species with barcodes: 281,000.
International Nucleotide Sequence Database Collaboration: works together with GenBank (USA), the European Molecular Biology Laboratory in Germany and the DNA Data Bank of Japan. Permanent public repository for barcode data records.
(Hebert and Hanner, 2015, Ontario, Canada)

Building the Global Reference Barcode Library
From voucher specimens in museums: over 300 years, taxonomists have collected and described more than 1.7 million species of plants. They have built collections of hundreds of millions of representatives of these species. These specimens have been studied and catalogued and now reside in museums.
In reference libraries: virtually all species have distinct barcode gene sequences. Unknown specimens can be identified by "looking up" their sequences in the reference library.
(Hebert and Hanner, 2015, Ontario, Canada)
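The "looking up" step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the search method used by BOLD or GenBank: the sequences are made-up toy strings that are already aligned to equal length, the function names `p_distance` and `identify` are hypothetical, and the match criterion is simply the smallest proportion of differing sites.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        # Skip gaps and ambiguous bases so that only A/C/G/T sites are compared.
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def identify(query: str, library: dict) -> tuple:
    """Return (species, distance) for the closest reference barcode."""
    return min(
        ((species, p_distance(query, ref)) for species, ref in library.items()),
        key=lambda hit: hit[1],
    )


# Hypothetical toy reference library (made-up sequences, not real barcodes).
LIBRARY = {
    "Species A": "ATGCCGTTAGCA",
    "Species B": "ATGACGTTGGCA",
    "Species C": "TTGCCGATAGCT",
}

best_species, best_distance = identify("ATGCCGTTAGCT", LIBRARY)
print(best_species, round(best_distance, 3))
```

A real lookup would also need a distance threshold, so that a query with no close relative in the library is reported as unidentified rather than assigned to its nearest neighbour.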

Data Analysis
Two methods:
Pairwise sequence alignment: BLAST
Multiple sequence alignment: ClustalW, T-Coffee, MUSCLE. Alignments can be manually edited to increase quality with BioEdit or Jalview (Bhargava et al., 2013).
Phylogeny: the relationship among species is based on the pattern of substitution at homologous bases that vary among taxa (Kress and Erickson, 2012).

Construction of Phylogeny
Genetic distances (number of substitutions per site) are calculated using the Kimura 2-parameter model.
The model incorporates the observation that transitions accumulate more rapidly than transversions.
It assumes that all four bases have equal frequencies but that there are 2 rate classes for substitutions.
Under this model, the distance between any two sequences is given by $d = \frac{1}{2}\ln\left[\frac{1}{1-2P-Q}\right] + \frac{1}{4}\ln\left[\frac{1}{1-2Q}\right]$, where P and Q are the proportional differences between the two sequences due to transitions and transversions, respectively.
Done with MEGA 4.0.
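The Kimura 2-parameter formula on this slide can be checked with a short Python sketch. It is an illustration only and is not MEGA's implementation; the function name `k2p_distance` and the toy sequences are assumptions, and the input is assumed to be two already aligned sequences of equal length.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        # Ignore gaps and ambiguous bases.
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # P in the formula
    q = transversions / compared  # Q in the formula
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy example: one transition (A->G) and one transversion (A->C) in 10 sites,
# so P = Q = 0.1.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
print(round(d, 4))
```

The code writes the formula as $-\frac{1}{2}\ln(1-2P-Q) - \frac{1}{4}\ln(1-2Q)$, which is the same expression as on the slide because $\ln(1/x) = -\ln x$.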

Construction of Phylogeny (continued)
Two methods of phylogeny construction:
Distance-based methods (DNA sequence variation between and within species): Unweighted Pair Group Method with Arithmetic Mean (UPGMA), Neighbour-joining (NJ)
Character-based methods (presence and absence of unique diagnostic characters): Parsimony, Maximum likelihood
(Bhargava et al., 2013, Lucknow, India)
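As an illustration of a distance-based method, the following Python sketch clusters four taxa by UPGMA. It is a simplified teaching example under stated assumptions: the taxon names and distances are invented, branch lengths are not computed, and the function name `upgma` is hypothetical. Tools such as MEGA implement UPGMA and NJ with many more options.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names: list of taxon labels.
    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    """
    # Each current cluster maps to the number of leaves it contains.
    sizes = {name: 1 for name in names}
    d = dict(dist)
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance from the new cluster to each remaining cluster is the
        # average weighted by cluster size (the "arithmetic mean" in UPGMA).
        for other in sizes:
            d[frozenset((merged, other))] = (
                na * d[frozenset((a, other))] + nb * d[frozenset((b, other))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Hypothetical pairwise distances (for example K2P values) among four taxa.
PAIRWISE = {
    ("A", "B"): 0.02,
    ("A", "C"): 0.10,
    ("B", "C"): 0.12,
    ("A", "D"): 0.30,
    ("B", "D"): 0.32,
    ("C", "D"): 0.28,
}
dist = {frozenset(pair): value for pair, value in PAIRWISE.items()}
tree = upgma(["A", "B", "C", "D"], dist)
print(tree)
```

Here A and B join first, C joins that pair next, and D is attached last.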

Workflow diagram (Kress and Erickson, 2012, Washington, USA): FASTA files for rbcL, matK and trnH/ITS; edited sequences; sequence alignment (tools shown: Sequencher/transAlign, transAlign/MAFFT, Muscle/MAFFT); matrix construction (Nexus file); phylogenetic reconstruction (parsimony: PAUP/TNT; maximum likelihood: Garli, Garli/RAxML). The example matrix covers species 1 to 9 plus an outgroup, with rbcL and matK blocks and several trnH/ITS blocks.

Fig. 3. Outline of supermatrix (or nested matrix) design. Coding genes can be aligned globally, across highly divergent clades, whereas the most rapidly evolving sequences are partitioned into smaller alignment blocks to improve the likelihood of correctly assessing homology among aligned nucleotides.
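The matrix-construction idea can be sketched as a concatenation of per-locus alignments, where taxa that lack a block are padded with a missing-data symbol. This is a minimal sketch under stated assumptions, not the pipeline of Kress and Erickson: the function name `build_supermatrix`, the taxon labels and the sequences are all hypothetical.

```python
def build_supermatrix(alignments: dict, missing: str = "?"):
    """Concatenate per-locus alignments into one matrix.

    alignments: dict mapping locus name to {taxon: aligned sequence}.
    Taxa absent from a locus are padded with the missing-data symbol.
    Returns (matrix, partitions), where partitions holds 1-based column ranges.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    matrix = {taxon: "" for taxon in taxa}
    partitions = {}
    start = 1
    for locus, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            matrix[taxon] += aln.get(taxon, missing * length)
        partitions[locus] = (start, start + length - 1)
        start += length
    return matrix, partitions


# Toy input: rbcL is aligned across all sampled taxa it was sequenced for,
# while the second block covers only a subset of taxa.
ALIGNMENTS = {
    "rbcL": {"sp1": "ATGC", "sp2": "ATGA"},
    "matK": {"sp1": "GG-TT", "sp3": "GGATT"},
}
matrix, partitions = build_supermatrix(ALIGNMENTS)
for taxon, row in matrix.items():
    print(taxon, row)
print(partitions)
```

The partition ranges are the kind of information a Nexus file would record so that each block can be analysed with its own model.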

New computational methods in DNA barcoding
1. Compensatory base changes (CBCs)
CBCs are mutations in which the nucleotides change at both positions of a paired structural site.
Reported in rRNA ITS2.
When even one CBC occurs at a conserved paired site in the ITS2 secondary structure, the two organisms are found to be sexually incompatible.
Used for identification of closely related species.
Done with CBCAnalyser.
2. DNA metabarcoding
Identifying a number of organisms simultaneously from eDNA.
Tools: ecoPrimers and OTUbase.
(Bhargava et al., 2013, Lucknow, India)
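The CBC definition above can be made concrete with a small Python sketch. This is a toy illustration and does not reproduce how CBCAnalyser works: the function name `count_cbcs`, the sequences and the list of paired positions are hypothetical, and the sketch assumes two aligned RNA sequences that share one secondary structure.

```python
# Watson-Crick pairs plus the G-U wobble pair, written for RNA.
VALID_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def count_cbcs(seq1: str, seq2: str, pairs) -> int:
    """Count compensatory base changes between two aligned RNA sequences.

    pairs: list of (i, j) zero-based positions that pair in the shared structure.
    A CBC is counted when both positions differ between the sequences and
    each sequence still forms a valid base pair at that site.
    """
    count = 0
    for i, j in pairs:
        pair1 = (seq1[i], seq1[j])
        pair2 = (seq2[i], seq2[j])
        both_changed = pair1[0] != pair2[0] and pair1[1] != pair2[1]
        if both_changed and pair1 in VALID_PAIRS and pair2 in VALID_PAIRS:
            count += 1
    return count


# Hypothetical helix: position 0 pairs with 7, position 1 pairs with 6.
PAIRS = [(0, 7), (1, 6)]
reference = "GCAAAAGC"   # G-C and C-G pairs
full_cbc = "ACAAAAGU"    # G-C became A-U: both sides changed, pairing kept
one_sided = "GCAAAAGU"   # G-C became G-U: only one side changed

print(count_cbcs(reference, full_cbc, PAIRS))
print(count_cbcs(reference, one_sided, PAIRS))
```

The second comparison returns zero because a change on only one side of the pair is not a full CBC under the definition on the slide.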

Table 2. Software available for barcoding of plants.

Factors influencing the discrimination success of plant barcodes (P. M. Hollingsworth et al., 2011)
Factor: situations where lower species discrimination success is expected
Hybridization: groups in which hybridization is frequent
Polyploidy: groups in which speciation frequently involves polyploidy
Life history: groups of long-lived organisms and/or those with slow mutation rates
Breeding system: species groups consisting of closely related agamospermous or autogamous lineages
Seed dispersal: angiosperm species groups in which seed dispersal is poor (plastid barcodes)

Fig. 4. Impacts of intraspecific gene flow on species discrimination success (Species 1 and Species 2).
(A) Intraspecific gene flow among populations is high. Where gene flow occurs between species, there is a barrier to extensive neutral introgression because establishment of immigrant alleles is prevented by a regular influx of conspecific alleles from other populations.
(B) Intraspecific gene flow among populations is low. Populations are therefore more differentiated from one another and are less likely to show taxon-specific barcode markers. In addition, the flux preventing establishment of introgressed alleles is lower because it involves only alleles in the (middle) recipient population and not the other populations of the "blue" species.
(P. M. Hollingsworth et al., 2011)

Applications
Species delineation.
Assisting in the process of assigning unknown specimens to known species.
Cryptic diversity and the discovery of new species.
Construction of phylogenetic trees.

Limitations
There is no single universal DNA barcode gene that is conserved in all domains of life and exhibits enough sequence divergence for species discrimination.
Cooperation among scientists to establish DNA barcode reference libraries is difficult.
Difficulties in using the various identified regions as barcodes are not yet resolved and need further study.

Case studies

DNA barcoding for land plants (CS-1)
CBOL (Consortium for the Barcode of Life) Plant Working Group, 2009, Philadelphia, Pennsylvania
Aim: to identify a standard DNA barcode for land plants.
Pooled sequence data from 907 samples. Angiosperm: 445; gymnosperm: 38; cryptogam: 67.
Fig. 6. Universality success.
A = atpF-atpH; B = rpoB; C = rpoC1; K = psbK-psbI; M = matK; P = trnH-psbA; R = rbcL

Fig. 7. Assessment of sequence quality. 95% confidence intervals are indicated. Colours reflect sequence quality (red, worse; green, better).
A = atpF-atpH; B = rpoB; C = rpoC1; K = psbK-psbI; M = matK; P = trnH-psbA; R = rbcL
(CBOL Plant Working Group, 2009, Philadelphia, Pennsylvania)

Fig. 8. Discrimination success for the 7 single-locus barcodes and for 1- to 3-locus combinations. Outer error bars (thin lines) demarcate 95% confidence intervals. Inner error bars (thick lines) indicate the relative magnitude of discrimination failure as measured by the interquartile range (IQR) for the number of species that are indistinguishable from a given query sequence.
A = atpF-atpH; B = rpoB; C = rpoC1; K = psbK-psbI; M = matK; P = trnH-psbA; R = rbcL
(CBOL Plant Working Group, 2009, Philadelphia, Pennsylvania)

Result
The 2-locus combination rbcL + matK is recommended as the standard barcode.
rbcL: high universality but not outstanding discriminating power.
matK and trnH-psbA: higher resolution, but each requires further development.
Other regions can be used as supplementary barcodes.
(CBOL Plant Working Group, 2009, Philadelphia, Pennsylvania)

Comparative analysis of a large dataset to incorporate ITS (internal transcribed spacer) into the core barcode of seed plants (CS-2)
China Plant Barcode of Life (BOL) Group, 2011, Beijing (China)
Materials and methods
Data pooled from research groups enrolled in the DNA Barcoding Chinese Plants project.
6286 samples from 1757 species: 5897 samples from 1675 angiosperm species and 389 samples from 82 gymnosperm species.
Amplification and sequencing of four loci (matK, rbcL, trnH-psbA, ITS).
Level of species discrimination assessed by: 1. Tree-building 2. Distance 3. BLAST 4. PWG-Distance

Fig. 9. Comparison of discrimination success for the four single-locus markers (plus ITS2) and all 2- to 4-marker combinations (I, ITS; M, matK; P, trnH-psbA; R, rbcL). (China Plant Barcode of Life (BOL) Group, 2011, Beijing, China)

Fig. 10. Discrimination success at the ordinal level (1 order of gymnosperms and 23 orders of angiosperms). (China Plant Barcode of Life (BOL) Group, 2011, Beijing, China)
Result: ITS has the highest discrimination power and should be incorporated into the core barcode.
Three-marker combination: rbcL + matK + ITS/ITS2 (ITS2 as backup).

Species discrimination through DNA barcoding in the genus Salacia of the Western Ghats in India (CS-3)
Dev et al., 2015, Thrissur (Kerala)
Materials and methods
Taxon sampling: 8 species and 1 variety of the genus Salacia (S. beddomei, S. chinensis, S. fruticosa, S. macrosperma, S. malabarica, S. oblonga var. oblonga, S. oblonga var. kakkayamana, S. vellaniana, S. agasthiamalana).
DNA extraction, amplification (rbcL, matK, trnH-psbA and ITS2) and sequencing.
Sequence alignment and data analysis (CLUSTAL X).
Genetic divergence analysis in accordance with the K2P model using MEGA 4.0.
Generated sequences were submitted to GenBank as well as BOLD.

Table 3. Analysis of interspecific divergence and intraspecific variation of eight Salacia species based on different potential barcoding regions. (Dev et al., 2015, Thrissur, Kerala)
DNA region: rbcL; matK; trnH-psbA; ITS2
Average interspecific distance: 0.0012 ± 0.003; 0.005 ± 0.005; 0.026 ± 0.019; 1.362 ± 1.008
Average intraspecific distance: 0.01 ± 0.004; 0.011 ± 0.015; 0.052 ± 0.1484

Fig. 11. Phylogram of Salacia species based on ITS2 sequences, reconstructed with the NJ clustering method and the Kimura 2-parameter model in MEGA 4.0. Branch lengths represent substitution rate. Bootstrap values ≥ 35 are shown at the branch points. (Dev et al., 2015, Thrissur, Kerala)

Applying DNA barcodes for identification of economically important species in Brassicaceae (CS-4)
Sun et al., 2015, Beijing (China)
Materials and methods
58 individual samples belonging to 27 species.
DNA extraction, amplification and sequencing.
BLAST1 and the nearest genetic distance were used to assess correct discrimination.

Table 4. Analysis of inter-specific divergence between species and intra-specific variation. (Sun et al., 2015, Beijing, China)
Marker: all inter-specific distance; all intra-specific distance; PCR amplification success rate (%)
matK: 0.0183 ± 0.0170; 0.0135 ± 0.0120; 89.66
rbcL: 0.0028 ± 0.0043; 0.0012 ± 0.0035; 100
trnH-psbA: 0.0388 ± 0.0524; 0.0325 ± 0.0567; 98.28
ITS: 0.0846 ± 0.0700; 0.0607 ± 0.0797; 100
ITS2: 0.0878 ± 0.0702; 0.0706 ± 0.0972; 100
Note: genetic distance is a measure of genetic divergence.
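Inter-specific and intra-specific distances of the kind reported in the table can be computed from labelled, aligned sequences as sketched below. This is a simplified illustration under stated assumptions and not the procedure of Sun et al.: it uses the uncorrected proportion of differing sites instead of a K2P distance, and the species labels, sequences and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites (aligned, equal length)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def intra_inter_distances(samples):
    """Split all pairwise distances into within-species and between-species.

    samples: list of (species, aligned_sequence) tuples.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        target = intra if sp1 == sp2 else inter
        target.append(p_distance(s1, s2))
    return intra, inter


# Hypothetical toy data: two individuals for each of two species.
SAMPLES = [
    ("sp1", "AAAAAAAAAA"),
    ("sp1", "AAAAAAAAAT"),
    ("sp2", "AAAAACCCCC"),
    ("sp2", "AAAAACCCCG"),
]
intra, inter = intra_inter_distances(SAMPLES)
print("mean intra-specific:", round(mean(intra), 3))
print("mean inter-specific:", round(mean(inter), 3))
print("barcode gap present:", min(inter) > max(intra))
```

A marker discriminates well when the smallest between-species distance exceeds the largest within-species distance. Where the two ranges overlap, as the large standard deviations in the table suggest for several markers, some samples cannot be assigned unambiguously.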

Table 5. Comparison of identification efficiency for candidate barcodes using different methods of species identification. (Sun et al., 2015, Beijing, China)
Columns: marker and method; no. of species; no. of samples; successful identification (%) at species and genus level; incorrect or ambiguous identification (%) at species and genus level
rbcL, BLAST1: 27; 58; 56.9, 78.6; 43.1, 21.4
rbcL, Distance: 27; 58; 56.9, 67.9; 43.1, 32.1
matK, BLAST1: 25; 52; 61.1, 78.9; 38.9, 21.1
matK, Distance: 25; 52; 61.1, 71.2; 38.9, 28.8
trnH-psbA, BLAST1: 27; 57; 63.2, 76.4; 36.8, 23.6
trnH-psbA, Distance: 27; 57; 56.1, 65.5; 43.9, 34.5
ITS, BLAST1: 27; 58; 67.2, 73.2; 32.8, 26.8
ITS, Distance: 27; 58; 60.3, 67.9; 39.7, 32.1
ITS2, BLAST1: 27; 58; 60.3, 64.3; 39.7, 35.7
ITS2, Distance: 27; 58; 53.4, 62.5; 46.6, 37.5

Upshots of the Study
Among the five strongly recommended candidate DNA barcode regions [rbcL, matK, trnH-psbA, internal transcribed spacer (ITS), ITS2], ITS showed superiority in species discrimination, with accurate identification of 67.2% at the species level using the BLAST1 method.

Conclusion
DNA barcoding is of very recent origin and has undergone major development in the last decade, linking the DNA world with traditional taxonomic methods for species delineation and species identification.
Among the suggested barcode regions, the plastid regions matK and rbcL (core barcode) and trnH-psbA, together with nrITS, have been widely used in plant species discrimination.
Among the nine candidate barcode regions, ITS has the highest species discrimination power.

Future Thrust
Find a locus that is universally linked to speciation across the different plant groups.
Newer methods or algorithms for searching barcode databases have to be investigated to remove ambiguity in species identification.
Whole-plastid-based barcodes have shown great potential in species discrimination, especially for closely related taxa.
Continuing advances in sequencing technology may make these super-barcodes the method of choice for plant identification.

Let's join hands for the conservation of biodiversity for a better tomorrow.
Thank you.