naveedulmushtaq
16 slides
Feb 09, 2017
About This Presentation
PROTEIN SEQUENCE DATABASES, INTRODUCTION, TYPES, UNIVERSAL CURATED DATABASE, SWISS-PROT, CATH, SCOP
Slide Content
Protein sequence databases NAVEED UL MUSHTAQ DEPT OF BIORESOURCES KU
INTRODUCTION With the availability of over 165 completed genome sequences from both eukaryotic and prokaryotic organisms, efforts are now being focused on the identification and functional analysis of the proteins encoded by these genomes. The large-scale analysis of these proteins has started to generate huge amounts of data due to the new information provided by the genome projects and to a range of new technologies in protein science.
INTRODUCTION For example, mass spectrometry approaches are being used in protein identification and in determining the nature of post-translational modifications. These and other methods make it possible to quickly identify large numbers of proteins, to map their interactions, to determine their location within the cell and to analyze their biological activities. Protein sequence databases play a vital role as a central resource for storing the data generated by these and more conventional efforts, and for making them available to the scientific community.
TYPES Universal protein databases cover proteins from all species, whereas specialized data collections contain information about a particular protein family or group of proteins, or about a specific organism. Universal protein sequence databases can be further subdivided into two categories. The first is sequence repositories (depositories), in which data are stored with little or no manual intervention in the creation of the records.
TYPES The second is expertly curated databases, in which the original data are enhanced by the addition of further information.
Sequence repositories Several protein sequence databases act as repositories of protein sequences. These databases add little or no additional information to the sequence records they contain. Examples include GenPept, NCBI’s Entrez Protein and the Reference Sequence (RefSeq) collection.
Universal curated databases Although repositories are essential for providing users with sequences as quickly as possible, adding further information to a sequence greatly increases the value of the resource. Curated databases enrich the sequence data with additional information that expert biologists validate before it is added, so the data in these collections can be considered highly reliable.
Swiss-Prot SWISS-PROT is a universal protein sequence database established in 1986 and maintained collaboratively, since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library. It is the leading universal curated protein sequence database and contained 140 000 curated sequence entries from over 8300 different species as of November 2003.
Swiss-Prot The database is non-redundant, which means that all reports for a given protein are merged into a single entry, and it is highly integrated with other databases. Each entry in Swiss-Prot is thoroughly analyzed and annotated by biologists to ensure that the database is of high quality. The SWISS-PROT database distinguishes itself from other protein sequence databases by three criteria: a high level of annotation, a minimal level of redundancy and a high level of integration with other databases.
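The idea of non-redundancy, merging all reports for one protein into a single entry, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields, the sample reports and the choice of (protein, species) as the grouping key are hypothetical, and the sketch does not represent the actual Swiss-Prot curation process, which relies on expert biologists.

```python
from collections import defaultdict

# Hypothetical sequence reports; the field names are assumptions for illustration.
reports = [
    {"protein": "insulin", "species": "Homo sapiens", "source": "report A"},
    {"protein": "insulin", "species": "Homo sapiens", "source": "report B"},
    {"protein": "insulin", "species": "Mus musculus", "source": "report C"},
]


def merge_reports(reports):
    """Group reports that describe the same protein in the same species."""
    merged = defaultdict(list)
    for report in reports:
        # Assumed grouping key: one entry per (protein, species) pair.
        key = (report["protein"], report["species"])
        merged[key].append(report["source"])
    return dict(merged)


# Three reports collapse into two entries, each listing its source reports.
entries = merge_reports(reports)
for key, sources in entries.items():
    print(key, sources)
```

A repository would keep all three reports as separate records, while a non-redundant database keeps one entry per protein and records where each report came from.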
The Protein Information Resource (PIR) The PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist in the identification and understanding of protein sequence information. The PIR database evolved from the original NBRF Protein Sequence Database, developed over a 20-year period by the late Margaret O. Dayhoff and published as the ‘Atlas of Protein Sequence and Structure’.
The Protein Information Resource (PIR) The database is partitioned into four sections: PIR1, PIR2, PIR3 and PIR4. These differ in terms of data quality. Currently PIR1 and PIR2 account for ∼99% of all entries. Entries in PIR1 are fully classified, fully merged and extensively annotated.
Protein structure databases SCOP: Structural Classification of Proteins database. CATH: Class, Architecture, Topology, Homologous superfamily database.
SCOP: a Structural Classification of Proteins database This database provides a detailed and comprehensive description of the structural and evolutionary relationships of proteins of known structure. The fundamental unit of classification in SCOP is the protein domain. The first release of SCOP in 1995 comprised 3179 domains, 498 families, 366 superfamilies and 279 folds.
SCOP Proteins are classified on four hierarchical levels: family, superfamily, common fold and class.
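The four SCOP levels form a hierarchy, which can be modelled as a simple record with one field per level. The Python sketch below is a minimal illustration, not the SCOP data format: the class name `ScopPlacement`, the helper `deepest_shared_level` and all labels such as "class A" are hypothetical placeholders.

```python
from dataclasses import dataclass

# The four levels from the slide, ordered from broadest to most specific.
SCOP_LEVELS = ("class", "fold", "superfamily", "family")


@dataclass(frozen=True)
class ScopPlacement:
    """Position of one protein domain in the hierarchy (hypothetical model)."""
    scop_class: str
    fold: str
    superfamily: str
    family: str

    def lineage(self):
        # Labels ordered to match SCOP_LEVELS.
        return [self.scop_class, self.fold, self.superfamily, self.family]


def deepest_shared_level(a, b):
    """Return the most specific level at which two domains share a group, or None."""
    deepest = None
    for level, label_a, label_b in zip(SCOP_LEVELS, a.lineage(), b.lineage()):
        if label_a != label_b:
            break
        deepest = level
    return deepest


# Placeholder labels only; these are not real SCOP groups.
domain_1 = ScopPlacement("class A", "fold A1", "superfamily A1a", "family A1a-i")
domain_2 = ScopPlacement("class A", "fold A1", "superfamily A1b", "family A1b-i")
print(deepest_shared_level(domain_1, domain_2))
```

Two domains that diverge at the superfamily level still share a fold and a class, which is the kind of relationship the hierarchy is meant to expose.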
CATH The CATH database is a classification of protein domains based not only on sequence information, but also on structural and functional properties. The first CATH release, from 1997, contained only 8,078 domains. In addition to the four main levels (Class, Architecture, Topology and Homologous superfamily), CATH comprises five more layers, called S, O, L, I and D. The first four of these layers (S, O, L and I) group domains according to increasing sequence overlap and similarity, whereas the D level assigns a unique identifier to every domain.
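The nine CATH levels can be read as a nine-part address for each domain. The Python sketch below assumes the address is written as nine dot-separated integers in the order C, A, T, H, S, O, L, I, D; that string layout, the function names and the sample addresses are assumptions for illustration and should be checked against the CATH documentation before use.

```python
# Level names in the order described on the slide: four main levels, then S, O, L, I, D.
CATH_LEVELS = ("C", "A", "T", "H", "S", "O", "L", "I", "D")


def parse_cath_address(address):
    """Split an assumed nine-part dotted address into a level-to-number mapping."""
    parts = address.split(".")
    if len(parts) != len(CATH_LEVELS):
        raise ValueError(f"expected {len(CATH_LEVELS)} fields, got {len(parts)}")
    return dict(zip(CATH_LEVELS, (int(part) for part in parts)))


def shared_levels(address_a, address_b):
    """Return the leading levels on which two addresses agree."""
    a = parse_cath_address(address_a)
    b = parse_cath_address(address_b)
    shared = []
    for level in CATH_LEVELS:
        if a[level] != b[level]:
            break
        shared.append(level)
    return shared


# Made-up addresses, used only to exercise the functions.
first = "1.10.8.10.1.1.1.1.1"
second = "1.10.8.10.1.2.1.1.1"
print(parse_cath_address(first))
print(shared_levels(first, second))
```

Because the D level is unique to each domain, two different domains can agree on at most the first eight levels.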