Primary Bioinformatics Database.pptx

1,188 views 34 slides May 20, 2023

About This Presentation

This file covers: introduction, classification, primary databases, nucleic acid databases, protein sequence databases, and protein structure databases.


Slide Content

Primary Bioinformatics Database

Contents
  Introduction
  Classification of databases
  Primary databases
  Nucleic acid databases: GenBank, EMBL, DDBJ
  Protein sequence databases: SWISS-PROT, UniProt, PIR
  Protein structure database: PDB
  Conclusion
  References

Introduction
Bioinformatics databases, or biological databases, are storehouses of biological information. They can be defined as libraries containing data collected from scientific experiments, published literature and computational analysis. They provide users with an interface for recording, storing, analyzing and retrieving biological data efficiently through computer software. Biological data come in several different formats, such as text, sequence data, structures and links, and these need to be taken into account when creating the databases.

Classification of Databases
Databases can be classified into three categories on the basis of the information stored:
  Primary database
  Secondary database
  Composite database

Primary Database
Primary databases (also known as data repositories) are highly organised, user-friendly gateways to the huge amount of biological data produced by researchers around the world. They were first developed in the 1980s and 1990s to store experimentally determined DNA and protein sequences. Nowadays, sequence submissions are made by individual laboratories, as well as "in bulk" by sequencing centres around the world. Most protein sequences found in databases are the product of conceptual translation of genes and genomes determined by DNA sequencing.
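Conceptual translation, mentioned above, means deriving a protein sequence computationally from a DNA sequence rather than determining it experimentally. The following is a minimal sketch, assuming the standard genetic code, a coding strand read in frame from the first base, and translation that stops at the first stop codon. The names `CODON_TABLE` and `translate` are illustrative and not taken from any database's tooling.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence in frame 1 until the first stop codon.

    Codons containing ambiguous bases (for example N) are reported as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon ends the translation
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Made-up toy sequence: ATG GCC AAA TAA
    print(translate("ATGGCCAAATAA"))  # MAK
```

Real annotation pipelines also handle alternative genetic codes, reading frames, and introns, none of which this sketch attempts.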

Primary databases
Primary databases are also called archival databases. They are populated with experimentally derived data such as nucleotide sequences, protein sequences or macromolecular structures. Experimental results are submitted directly into the database by researchers, and the data are essentially archival in nature. Once given a database accession number, the data in primary databases are never changed: they form part of the scientific record.

Once data are deposited in primary databases, they can be accessed freely by anyone around the world. For example, suppose researchers are working on a Staphylococcus aureus strain that was isolated from a patient. After some investigation, they suspect that this strain might be genetically different from previously identified strains. They decide to sequence it and, after comparing it with the DNA sequences already in the public repository (the "known" strains), they conclude that their strain is indeed different. The research community benefits from having this new sequence in the public repository: the next time a researcher finds the same strain, they will be able to tell whether their isolate is novel or related to strains previously sequenced.
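The comparison step in this example can be illustrated with a toy percent-identity calculation. This is only a sketch: it assumes the two sequences are already aligned and of equal length, whereas real strain comparisons rely on alignment tools such as BLAST. The function name and the sequences are made up for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of positions at which two pre-aligned,
    equal-length sequences carry the same base (case-insensitive)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    known_strain = "ACGTACGTAC"  # hypothetical repository sequence
    new_isolate = "ACGTACGTAA"   # hypothetical new isolate, one mismatch
    print(percent_identity(known_strain, new_isolate))
```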

There are three nucleotide repositories, or primary databases, for the submission of nucleotide and genome sequences:
  GenBank, hosted by the National Center for Biotechnology Information (NCBI).
  The European Nucleotide Archive (ENA), hosted by the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL-EBI).
  The DNA Data Bank of Japan (DDBJ), hosted by the National Institute of Genetics (NIG).

GenBank
The GenBank sequence database is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations. It is produced and maintained by the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database Collaboration (INSDC).
Data formats: XML; ASN.1; GenBank flat-file format
Data types captured: nucleotide sequence; protein sequence
A GenBank release occurs every two months and is available from the NCBI FTP site.
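To give a feel for the GenBank flat-file format listed above, here is a minimal parsing sketch. It assumes only the general layout of the format: keywords in the first 12 columns, the sequence after an `ORIGIN` line, and `//` as the record terminator. The record itself (`TOY00001`) is invented for illustration and is not a real GenBank entry, and the parser ignores continuation lines and the feature table.

```python
# Invented toy record; not a real GenBank entry.
TOY_RECORD = """\
LOCUS       TOY00001                  24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Made-up example record.
ACCESSION   TOY00001
VERSION     TOY00001.1
ORIGIN
        1 atggccaaat aagctagcta gcta
//
"""


def parse_genbank_record(text: str) -> dict:
    """Extract top-level keyword lines and the sequence from one record."""
    record = {"sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end of record
            break
        if in_origin:
            # Sequence lines hold a position number and blocks of bases;
            # keep only the letters.
            record["sequence"] += "".join(ch for ch in line if ch.isalpha())
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line and not line.startswith(" "):
            # Top-level keyword occupies the first 12 columns.
            record[line[:12].strip().lower()] = line[12:].strip()
    record["sequence"] = record["sequence"].upper()
    return record


if __name__ == "__main__":
    parsed = parse_genbank_record(TOY_RECORD)
    print(parsed["accession"], parsed["version"], len(parsed["sequence"]))
```

For real work, an established parser such as the one in Biopython is a safer choice than a hand-written one.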

Access to GenBank
There are several ways to search and retrieve data from GenBank:
  Search GenBank for sequence identifiers and annotations with Entrez Nucleotide.
  Search and align GenBank sequences to a query sequence using BLAST (Basic Local Alignment Search Tool). See the BLAST information pages for more about the numerous BLAST databases.
  Search, link, and download sequences programmatically using NCBI E-utilities.

GenBank Data Usage
NCBI places no restrictions on the use or distribution of the GenBank data. However, some submitters may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted.
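The programmatic route through NCBI E-utilities can be sketched as follows. This is a minimal example, assuming the EFetch endpoint with the `db`, `id`, `rettype` and `retmode` parameters. The helper names `build_efetch_url` and `fetch_record` are hypothetical, and the accession `U49845` is used purely as an illustrative identifier. Usage limits and API keys are not handled here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, rettype: str = "gb") -> str:
    """Build an EFetch URL for one nucleotide record.

    rettype "gb" requests the GenBank flat file; "fasta" requests FASTA.
    """
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_record(accession: str, rettype: str = "gb") -> str:
    """Download the record as text (requires network access)."""
    with urlopen(build_efetch_url(accession, rettype), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_record("U49845") to download.
    print(build_efetch_url("U49845"))
```

Keeping URL construction separate from the download makes the request easy to inspect or test without touching the network.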

EMBL
The European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank (USA). EMBL itself was founded in 1974; the sequence database began as the EMBL Data Library in the early 1980s. Data are exchanged among the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. WEBIN is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO).

Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via Internet and WWW interfaces. EBI's Sequence Retrieval System (SRS) is a network browser for databanks in molecular biology, integrating and linking the main nucleotide and protein databases plus many specialised databases. For sequence similarity searching, a variety of tools (e.g., BLITZ, FASTA, BLAST) are available that allow external users to compare their own sequences against the most current data in the EMBL Nucleotide Sequence Database and SWISS-PROT. The database is accessed through the URL http://www.ebi.ac.uk/embl

PIR database
The Protein Information Resource (PIR) database was established in 1984 by the National Biomedical Research Foundation (NBRF). It is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. It assists researchers in the identification and interpretation of protein sequence information. PIR can be searched for entries or used for sequence similarity searches. It can be accessed at http://www.pir.georgetown.edu/. PIR offers a variety of resources mainly oriented towards assisting the propagation and standardization of protein annotation.

Conclusion
Bioinformatics databases are storehouses of biological information. Primary databases are populated with experimentally derived data such as nucleotide and protein sequences. Experimental results are submitted directly into the database by researchers, and the data are essentially archival in nature. Once given a database accession number, the data in primary databases are never changed: they form part of the scientific record. Examples include GenBank, EMBL, DDBJ, PIR, SWISS-PROT, UniProt and PDB.
