GenBank Database and its different sections (Bioinformatics)
Sep 28, 2024
GenBank Database
Introduction: GenBank is a US-based primary nucleotide sequence database, established in 1982 and maintained by the NCBI (National Center for Biotechnology Information) as part of the INSDC (International Nucleotide Sequence Database Collaboration). It collects DNA sequences submitted by scientists around the world. It is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations.
GenBank Sequence Format: GenBank is a relational database. However, the search output for sequence records is produced as flat files for easy reading. Each flat file contains three sections: Header, Features, and Sequence. The Header and Features sections contain many fields, and each field has a unique identifier so that software can index it easily. Understanding the structure of GenBank files helps in designing effective search strategies.
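The three-section layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: it assumes that the Features section starts at a line beginning with FEATURES, the Sequence section at a line beginning with ORIGIN, and that the record ends at a `//` line. The record `SAMPLE` and the helper `split_sections` are invented for this example and do not correspond to a real GenBank entry.

```python
# Toy record, invented for illustration; not a real GenBank entry.
SAMPLE = """\
LOCUS       XX000001    24 bp    DNA    linear
DEFINITION  Hypothetical example sequence, partial.
ACCESSION   XX000001
FEATURES             Location/Qualifiers
     source          1..24
                     /organism="Examplea hypothetica"
ORIGIN
        1 atggccaagt tgaccagtgc ctaa
//
"""


def split_sections(record: str) -> dict:
    """Split one flat-file record into header, features and sequence lines."""
    sections = {"header": [], "features": [], "sequence": []}
    current = "header"
    for line in record.splitlines():
        if line.startswith("FEATURES"):
            current = "features"
        elif line.startswith("ORIGIN"):
            current = "sequence"
        elif line.startswith("//"):
            break  # end-of-record marker
        sections[current].append(line)
    return sections


sections = split_sections(SAMPLE)
for name, lines in sections.items():
    print(name, "->", len(lines), "lines")
```

For real work, an established parser such as Biopython's `SeqIO` module is likely a better choice than hand-written splitting.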
Flat file sections: Header, Features, Sequence
1st section, Header Part: The “DEFINITION” field provides summary information for the sequence record, including the name of the sequence, the name and taxonomy of the source organism if known, and whether the sequence is complete or partial.
1st section, Header Part: The accession number is a unique identifier assigned to a sequence record when it is first submitted to GenBank, and it remains permanently associated with that sequence.
1st section, Header Part: The “ORGANISM” field gives the source organism of the DNA, with the scientific name of the species and sometimes the tissue type, along with the taxonomic classification of the organism.
1st section, Header Part: The “REFERENCE” field provides the publication citation related to the sequence entry. It includes the author and title information of the published work (or a tentative title for unpublished work).
1st section, Header Part: The “JOURNAL” field includes the citation information as well as the date of sequence submission. The citation is often hyperlinked to the PubMed record for access to the original literature information.
1st section, Header Part: The last part of the Header is the “COMMENT” section, where the sequencing method or technology is mentioned.
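The Header fields described above can be collected into a dictionary with a short Python sketch. It assumes that every field starts with its keyword at, or near, the left margin and that continuation lines are indented by at least ten spaces. The text in `SAMPLE_HEADER` and the helper `parse_header` are invented for illustration and are not taken from a real record.

```python
# Toy header, invented for illustration; not a real GenBank entry.
SAMPLE_HEADER = """\
DEFINITION  Hypothetical example sequence,
            partial.
ACCESSION   XX000001
SOURCE      Examplea hypothetica
  ORGANISM  Examplea hypothetica
            Unclassified.
REFERENCE   1  (bases 1 to 24)
  AUTHORS   Doe,J.
  TITLE     An invented title
  JOURNAL   Unpublished
COMMENT     Invented comment text.
"""


def parse_header(text: str) -> dict:
    """Map each header keyword to the list of values found for it."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(" " * 10):
            # Continuation line: append to the most recent value.
            if key is not None:
                fields[key][-1] += " " + line.strip()
        else:
            # New field: the first token is the keyword, the rest is the value.
            key, _, value = line.strip().partition(" ")
            fields.setdefault(key, []).append(value.strip())
    return fields


fields = parse_header(SAMPLE_HEADER)
for keyword in ("DEFINITION", "ACCESSION", "ORGANISM", "JOURNAL", "COMMENT"):
    print(keyword, "=", fields[keyword])
```

Values are stored as lists because a record can carry several REFERENCE blocks, each with its own AUTHORS, TITLE, and JOURNAL lines.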
2nd section, Features Part: The “Features” section includes annotation information about the gene and gene product, as well as regions of biological significance reported in the sequence, each described with identifiers and qualifiers.
2nd section, Features Part: The “Source” field provides the length of the sequence, the scientific name of the organism, the taxonomy, and the strain. Optional information includes the clone source, the tissue type, and the cell line.
2nd section, Features Part: The “gene” field gives information about the nucleotide coding sequence and its name.
2nd section, Features Part: For DNA entries, there is a “CDS” field, which gives the boundaries of the sequence that can be translated into amino acids, along with information about its function.
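The identifiers and qualifiers in the Features section can be read with a short Python sketch. It makes simplifying assumptions: each feature line holds a key and a location, each qualifier starts with `/` and fits on one line, and locations are not wrapped. The table in `SAMPLE_FEATURES`, the gene name `exmA`, and the helper `parse_features` are invented for illustration.

```python
# Toy feature table, invented for illustration; not a real GenBank entry.
SAMPLE_FEATURES = """\
FEATURES             Location/Qualifiers
     source          1..24
                     /organism="Examplea hypothetica"
     gene            1..24
                     /gene="exmA"
     CDS             1..24
                     /gene="exmA"
                     /product="hypothetical protein"
"""


def parse_features(text: str) -> list:
    """Return one dict per feature with its key, location and qualifiers."""
    features = []
    for line in text.splitlines()[1:]:  # skip the FEATURES title line
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("/"):
            # Qualifier line such as /gene="exmA"; attach to the last feature.
            if features:
                name, _, value = stripped[1:].partition("=")
                features[-1]["qualifiers"][name] = value.strip('"')
        else:
            # Feature line: key followed by its location.
            key, location = stripped.split(None, 1)
            features.append({"key": key, "location": location, "qualifiers": {}})
    return features


features = parse_features(SAMPLE_FEATURES)
for feature in features:
    print(feature["key"], feature["location"], feature["qualifiers"])
```

Real records often wrap long locations and qualifier values (for example `/translation`) across several lines, which this sketch does not handle.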
3rd section, Sequence Part: The third section of the flat file is the sequence itself, starting with the label “ORIGIN”. For DNA entries, there is also a BASE COUNT report that gives the numbers of A, G, C, and T in the sequence.
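A base count like the one mentioned above can be recomputed from the sequence lines. The Python sketch below assumes that the sequence lies between the ORIGIN line and the `//` end marker, and that position numbers and spaces can be discarded by keeping only letters. The text in `SAMPLE_SEQUENCE` and the helper `read_sequence` are invented for illustration.

```python
from collections import Counter

# Toy sequence section, invented for illustration; not a real GenBank entry.
SAMPLE_SEQUENCE = """\
ORIGIN
        1 atggccaagt tgaccagtgc ctaa
//
"""


def read_sequence(text: str) -> str:
    """Join the bases found between the ORIGIN line and the // marker."""
    bases = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # Drop position numbers and spaces, keep only the letters.
            bases.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(bases)


sequence = read_sequence(SAMPLE_SEQUENCE)
counts = Counter(sequence)
for base in "agct":
    print(base.upper(), counts[base])
```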