Protein Databases

13,337 views 20 slides Apr 20, 2021

About This Presentation

Bioinformatics


Slide Content

PROTEIN DATABASES: PDB, PIR, SWISS-PROT

Outline
What are proteins?
Types of protein databases
Protein Information Resource (PIR)
SWISS-PROT
Protein Data Bank (PDB)
Importance of protein databases


Protein Information Resource (PIR): History
The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic, proteomic, and systems biology research. PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) to assist researchers in identifying and interpreting protein sequence information. For over four decades, beginning with the Atlas of Protein Sequence and Structure, PIR has provided protein databases and analysis tools, including the Protein Sequence Database (PSD), that are freely accessible to the scientific community.

In 2002, PIR and its international partners, EBI (European Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics), were awarded a grant from NIH to create UniProt, a single worldwide database of protein sequence and function, by unifying the PIR-PSD, Swiss-Prot, and TrEMBL databases. Today, PIR maintains staff at UD (University of Delaware) and GUMC (Georgetown University Medical Center) and continues to offer resources that assist with proteomic and genomic data integration and with the propagation and standardization of protein annotation.

PRO (Protein Ontology)
PRO provides an ontological representation of protein-related entities by explicitly defining them and showing the relationships between them. Each PRO term represents a distinct class of entities (including specific modified forms, orthologous isoforms, and protein complexes), ranging from the taxon-neutral to the taxon-specific. For example, the entity representing all protein products of the human SMAD2 gene is described in PR:Q15796, and one particular human SMAD2 protein form, phosphorylated on the last two serines of a conserved C-terminal SSxS motif, is defined by PR:000025934. Current release: 62.0, December 11, 2020.
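The idea of terms linked by relationships can be sketched with a toy data structure. This is only an illustration, not PRO's actual data model or API: the `Term` class, the `ONTOLOGY` dictionary, and the `ancestors` helper are hypothetical names, and the parent link between the two SMAD2 terms is an assumption based on the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """One ontology term with links to its more general parent terms."""
    term_id: str
    label: str
    parents: List[str] = field(default_factory=list)


# Toy ontology holding only the two SMAD2 terms mentioned in the slide.
# The parent link is assumed for illustration.
ONTOLOGY: Dict[str, Term] = {
    "PR:Q15796": Term(
        "PR:Q15796",
        "all protein products of the human SMAD2 gene",
    ),
    "PR:000025934": Term(
        "PR:000025934",
        "human SMAD2 form phosphorylated on the C-terminal SSxS motif",
        parents=["PR:Q15796"],
    ),
}


def ancestors(term_id: str, ontology: Dict[str, Term]) -> List[str]:
    """Return every more general term reachable through parent links."""
    found: List[str] = []
    stack = list(ontology[term_id].parents)
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(ontology[current].parents)
    return found


print(ancestors("PR:000025934", ONTOLOGY))
```

Walking the parent links from a specific modified form up to the gene-level term is the kind of query an ontological representation makes straightforward.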

iPTMnet
iPTMnet is a bioinformatics resource for the integrated understanding of protein post-translational modifications (PTMs) in a systems biology context. It connects multiple disparate bioinformatics tools and systems (text mining, data mining, analysis and visualization tools, databases, and ontologies) into an integrated, cross-cutting research resource that addresses knowledge gaps in exploring and discovering PTM networks.

Protein Data Bank (PDB)
PDB is a primary protein structure database. It archives the three-dimensional structures of large biological molecules, such as proteins. In spite of the name, PDB archives the structures of not only proteins but also other biologically important molecules, such as nucleic acid fragments, RNA molecules, large peptides such as the antibiotic gramicidin, and complexes of proteins and nucleic acids. The structures come mainly from experimental methods: X-ray crystallography, NMR spectroscopy, and, more recently, cryo-electron microscopy. Theoretical models from molecular modeling were included in the past but are no longer accepted into the archive.
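To make "three-dimensional structure" concrete, here is a minimal sketch that reads one atom record in the legacy fixed-column PDB file format. The column positions follow the published ATOM record layout as best recalled here, so check them against the format specification before relying on them. The sample record uses made-up coordinates, and `parse_atom_record` is a hypothetical helper, not part of any library.

```python
# A made-up ATOM record, assembled piece by piece so the columns are explicit.
SAMPLE_ATOM = (
    "ATOM  "      # cols 1-6   record name
    "    1"       # cols 7-11  atom serial number
    " "           # col  12    unused
    " N  "        # cols 13-16 atom name
    " "           # col  17    alternate location indicator
    "MET"         # cols 18-20 residue name
    " "           # col  21    unused
    "A"           # col  22    chain identifier
    "   1"        # cols 23-26 residue sequence number
    "    "        # cols 27-30 insertion code + unused
    "  27.340"    # cols 31-38 x coordinate (angstroms)
    "  24.430"    # cols 39-46 y coordinate
    "   2.614"    # cols 47-54 z coordinate
    "  1.00"      # cols 55-60 occupancy
    "  9.67"      # cols 61-66 temperature factor
    "          "  # cols 67-76 unused
    " N"          # cols 77-78 element symbol
)


def parse_atom_record(line: str) -> dict:
    """Slice one ATOM/HETATM line into named fields (0-based slices)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "element": line[76:78].strip(),
    }


print(parse_atom_record(SAMPLE_ATOM))
```

For real work, an established parser is a safer choice than hand-written slicing, and the archive's current master format is PDBx/mmCIF, not this legacy layout.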

SWISS-PROT
The other well-known and extensively used protein database is SWISS-PROT. The data in each entry can be considered in two parts: core data and annotation. The core data consists of the sequence, entered in the common single-letter amino acid code, and the related references and bibliography. The taxonomy of the organism from which the sequence was obtained also forms part of this core information.

The annotation contains information on the function or functions of the protein; post-translational modifications such as phosphorylation and acetylation; functional and structural domains and sites, such as calcium-binding regions, ATP-binding sites, and zinc fingers; known secondary structural features, for example alpha helices and beta sheets; the quaternary structure of the protein; similarities to other proteins, if any; and diseases associated with the protein. Sequence conflicts that arise because different authors published different sequences for the same protein, and variants due to mutations in different strains of an organism, are also described as part of the annotation.
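The core/annotation split can be sketched against the flat-file layout, where each line starts with a two-letter code. The entry below is a toy with a made-up accession, reference, and sequence, not a real record. Which line codes count as "core" or "annotation" is an assumption chosen to mirror the description above, and `split_entry` is a hypothetical helper.

```python
# Assumed mapping of line codes to the two parts described above.
CORE_CODES = {"OS", "OC", "RN", "RA", "RT", "RL"}   # taxonomy and references
ANNOTATION_CODES = {"CC", "FT", "KW"}               # comments, features, keywords

# Toy entry for illustration only; every value here is made up.
TOY_ENTRY = """\
ID   TOY_HUMAN               Reviewed;          12 AA.
AC   X00000;
OS   Homo sapiens (Human).
RT   "A made-up reference title.";
CC   -!- FUNCTION: Hypothetical function text.
FT   MOD_RES         3
FT                   /note="Phosphoserine"
SQ   SEQUENCE   12 AA;
     MKSAYIAKQR QI
//
"""


def split_entry(text: str):
    """Split one flat-file entry into core data and annotation lines."""
    core = {"sequence": "", "lines": []}
    annotation = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):       # end-of-entry marker
            break
        code, content = line[:2], line[5:].strip()
        if code == "SQ":                # sequence block starts on the next line
            in_sequence = True
        elif in_sequence:
            core["sequence"] += line.replace(" ", "")
        elif code in CORE_CODES:
            core["lines"].append((code, content))
        elif code in ANNOTATION_CODES:
            annotation.append((code, content))
    return core, annotation


core, annotation = split_entry(TOY_ENTRY)
print(core["sequence"])
print(annotation)
```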

TrEMBL (Translated EMBL)
TrEMBL is a computer-annotated protein sequence database that is released as a supplement to SWISS-PROT. It contains the translations of all coding sequences in the EMBL nucleotide database that have not yet been fully annotated. It may therefore contain sequences of proteins that are never expressed and have never actually been identified in the organism.
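Translating a coding sequence is a mechanical step, which is why such entries can exist without experimental evidence for the protein. The sketch below uses the standard genetic code and assumes the input is already in frame and starts at the first codon. `translate_cds` is a hypothetical helper and is not how TrEMBL itself is produced.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")   # accept RNA input as well
    protein = []
    for i in range(0, len(cds) - 2, 3):   # ignore any incomplete trailing codon
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for unrecognized codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCTAA"))
```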

UniProtKB has two sections: UniProtKB/Swiss-Prot, which is manually annotated and reviewed, and UniProtKB/TrEMBL, which is automatically annotated and not reviewed.
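In UniProt FASTA files the section is visible in the header prefix: `sp` for Swiss-Prot and `tr` for TrEMBL. A minimal sketch of telling them apart follows. `classify_header` is a hypothetical helper, and the TrEMBL-style header in the usage lines is made up for illustration.

```python
import re

# Matches the ">db|accession|entry_name" start of a UniProt FASTA header.
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)")


def classify_header(header: str) -> dict:
    """Report accession, entry name, and review status from a FASTA header."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError("not a UniProtKB FASTA header")
    db, accession, entry_name = match.groups()
    return {
        "accession": accession,
        "entry_name": entry_name,
        "section": "Swiss-Prot" if db == "sp" else "TrEMBL",
        "reviewed": db == "sp",
    }


reviewed = classify_header(
    ">sp|Q15796|SMAD2_HUMAN Mothers against decapentaplegic homolog 2 "
    "OS=Homo sapiens OX=9606 GN=SMAD2 PE=1 SV=1"
)
# Made-up accession and entry name, used only to show the "tr" prefix.
unreviewed = classify_header(">tr|A0A000TOY0|A0A000TOY0_HUMAN Uncharacterized protein")
print(reviewed["section"], unreviewed["section"])
```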

Importance of Protein Databases
Huge amounts of data on protein structures, functions, and particularly sequences are being generated. Searching databases is often the first step in the study of a new protein. One important use is comparison: comparing proteins or protein families reveals relationships between proteins within a genome or across species, and hence offers much more information than can be obtained by studying an isolated protein.
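One of the simplest comparisons between two proteins is percent identity over an existing alignment. The sketch below assumes the two sequences are already aligned to equal length with `-` as the gap character, and it counts only columns where neither sequence has a gap. Other definitions of identity exist, and `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":   # skip columns containing a gap
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


print(percent_identity("MKSA", "MKTA"))
```

Database search tools report richer statistics than this, but the same column-by-column comparison underlies them.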

Secondary databases derived from experimental databases are also widely available. These databases reorganize and annotate the data or provide predictions. Using multiple databases together often helps researchers understand the structure and function of a protein.

Thanking You