INTRODUCTION:
•Structural databases are essential tools for all
crystallographic work.
•They are used in producing, solving, refining and
publishing the structure of a new material.
THE COMMON INFORMATION FOUND IN A
STRUCTURAL DATABASE INCLUDES:
•Bibliographic information: author names, journal reference.
•The chemical compound name, formula and oxidation states
of the elements present.
•Number of formula units per unit cell (contents).
•Dimensions and symmetry of the unit cell.
•Symmetry of the structure.
•Atomic coordinates, occupancies and thermal parameters.
•Any special features of the experiment used to collect the
diffraction data.
•The structures in the database have been determined by X-ray,
neutron or electron diffraction on samples, by computational
modelling, or by NMR.
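The unit-cell entries listed above are often used together. As a worked sketch (not taken from any particular database), the code below computes the cell volume from the cell dimensions a, b, c, α, β, γ with the general triclinic formula, and then the calculated density from Z (formula units per cell) and the formula mass M, using ρ = Z·M / (N_A·V). The function names and the NaCl-type example values are illustrative assumptions.

```python
import math

AVOGADRO = 6.02214076e23  # Avogadro constant, mol^-1


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in cubic angstroms from lengths (Å) and angles (degrees).

    Uses the general (triclinic) formula, which also covers higher-symmetry cells.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def calculated_density(z, formula_mass, volume):
    """Density in g/cm^3 from Z, formula mass (g/mol) and cell volume (Å^3)."""
    volume_cm3 = volume * 1e-24  # 1 Å^3 = 1e-24 cm^3
    return z * formula_mass / (AVOGADRO * volume_cm3)


# Illustrative cubic NaCl-type cell: a = 5.64 Å, Z = 4, M = 58.44 g/mol
v = cell_volume(5.64, 5.64, 5.64, 90, 90, 90)
rho = calculated_density(4, 58.44, v)
print(f"V = {v:.1f} Å^3, density = {rho:.3f} g/cm^3")
```

The triclinic formula is used so that one function handles every crystal system; for cubic cells it reduces to a³.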
PDB (PROTEIN DATA BANK):
•The Protein Data Bank contains information about the 3-D
structures of proteins.
•The structure of a protein can be determined by
X-ray crystallography or nuclear magnetic resonance (NMR)
spectroscopy.
•The PDB is overseen by an organisation called the Worldwide
Protein Data Bank (wwPDB).
•It is available at
•www.wwpdb.org
•www.pdbe.org
•www.pdbj.org
•Each entry in the PDB is given a unique identifier called
the PDB ID. It is a 4-character alphanumeric code whose first
character is a digit, such as 1CBS.
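A quick format check for PDB IDs can be written as a regular expression. This is a minimal sketch assuming the classic four-character ID (a digit from 1 to 9 followed by three letters or digits, compared case-insensitively); other ID formats are not handled, and the function name is hypothetical.

```python
import re

# Assumed classic PDB ID layout: digit 1-9, then three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_valid_pdb_id(pdb_id: str) -> bool:
    """Return True if pdb_id looks like a classic four-character PDB ID."""
    # fullmatch rejects extra characters, including a trailing newline.
    return PDB_ID_PATTERN.fullmatch(pdb_id) is not None


for candidate in ["1CBS", "1cbs", "0ABC", "CBS1", "1CB"]:
    print(candidate, is_valid_pdb_id(candidate))
```

`fullmatch` is used instead of `match` with `^...$` because `$` also matches just before a trailing newline.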
PDB FILE FORMAT:
The PDB file format is a standard text format for protein
structure files. It describes the 3-D structure of a protein:
the positions of its atoms and how they are connected.
•The file contains hundreds or thousands of lines called
records. Each record type provides a different kind of
information, such as:
•HEADER: This record contains the classification of the molecule,
the deposition date and the PDB ID of the entry.
•TITLE: This record contains the title of the PDB entry.
•COMPND: This record includes the protein name.
•SOURCE: This record contains the name of the organism from
which the protein was obtained.
•KEYWDS: This record contains keywords that describe
the protein.
•EXPDTA: This record contains the experimental method used to
determine the protein structure.
•AUTHOR: This record contains the names of the
contributors who deposited the data in the database.
•REVDAT: This record contains the revision history of the
entry (dates of modification).
•JRNL: This record contains the journal citation of the
literature describing the protein structure.
•REMARK: This record contains remarks about the
protein structure.
•DBREF: This record contains references to the protein
in sequence databases.
•SEQRES: This record contains the amino acid sequence
of the protein.
•HET: This record contains details about the non-protein
substances in the entry.
•HETNAM: This record contains the chemical names of
the non-protein substances.
•HETSYN: This record contains synonyms for the compound
names of the non-protein substances.
•FORMUL: This record contains the chemical formulas of
the non-protein substances.
•HELIX: This record identifies the location of helices
in the structure.
•LINK: This record identifies inter-residue bonds.
•ATOM: This record contains the atomic coordinates for the
protein residues.
•HETATM: This record contains the atomic coordinates for
non-protein substances.
•CONECT: This record contains details about the bonds
involving non-protein atoms.
•MASTER: This record contains the counts of REMARK, HET,
HELIX, CONECT and SEQRES records, etc.
•END: This record marks the end of the file.
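Because every record starts with its name in columns 1-6, a file can be sorted into record types by fixed-column slicing. The sketch below is a minimal illustration, not a full parser: it counts record names and extracts x, y, z from ATOM and HETATM records, assuming the standard coordinate columns (x in 31-38, y in 39-46, z in 47-54). The function names and the sample lines are hypothetical; for real files an established library such as Biopython's `Bio.PDB` module is the safer choice.

```python
from collections import Counter


def record_name(line: str) -> str:
    """Record name from columns 1-6, with trailing spaces removed."""
    return line[0:6].strip()


def atom_coordinates(line: str) -> tuple[float, float, float]:
    """x, y, z in angstroms from an ATOM or HETATM record.

    Columns 31-38, 39-46 and 47-54 (1-based, inclusive) correspond to the
    Python slices [30:38], [38:46] and [46:54].
    """
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


def summarize(lines):
    """Count record types and collect coordinates of ATOM/HETATM records."""
    counts = Counter(record_name(line) for line in lines)
    coords = [atom_coordinates(line) for line in lines
              if record_name(line) in ("ATOM", "HETATM")]
    return counts, coords


# Hypothetical fragment laid out in the standard fixed columns.
sample = [
    "ATOM      1  N   PRO A   1      16.967  12.784   4.338  1.00 46.23           N",
    "HETATM 1001  O   HOH A 201       5.123  -3.456  20.789",
    "END",
]
counts, coords = summarize(sample)
print(counts)
print(coords)
```

Slicing by column, rather than splitting on whitespace, matters because adjacent numeric fields can run together when a value fills its whole column width.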
THE PDB FORMAT
123456789+123456789+123456789+123456789+123456789+123456789+123456789+123456789+
HEADER    RETINOIC-ACID TRANSPORT                 28-SEP-94   1CBS      1CBS   2
COMPND    CELLULAR RETINOIC-ACID-BINDING PROTEIN TYPE II COMPLEXED      1CBS   3
COMPND   2 WITH ALL-TRANS-RETINOIC ACID (THE PRESUMED PHYSIOLOGICAL     1CBS   4
COMPND   3 LIGAND)                                                      1CBS   5
SOURCE    HUMAN (HOMO SAPIENS)                                          1CBS   6
SOURCE   2 EXPRESSION SYSTEM: (ESCHERICHIA COLI) BL21 (DE3)             1CBS   7
SOURCE   3 PLASMID: PET-3A                                              1CBS   8
SOURCE   4 GENE: HUMAN CRABP-II                                         1CBS   9
AUTHOR    G.J.KLEYWEGT,T.BERGFORS,T.A.JONES                             1CBS  10
REVDAT   1   26-JAN-95 1CBS    0                                        1CBS  11
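In this example every field sits in fixed columns, which is why the column ruler is shown. As a sketch, assuming the HEADER column positions of the PDB format specification (classification in columns 11-50, deposition date in 51-59, PDB ID in 63-66), the HEADER line can be split by slicing. The function name is hypothetical, and the date parsing assumes English month abbreviations.

```python
from datetime import datetime


def parse_header(line: str) -> dict:
    """Split a PDB HEADER record into its fixed-column fields.

    Assumed columns (1-based, inclusive): classification 11-50,
    deposition date 51-59 (DD-MON-YY), ID code 63-66.
    """
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")
    return {
        "classification": line[10:50].strip(),
        # %b matches month abbreviations case-insensitively (e.g. "SEP").
        "deposition_date": datetime.strptime(line[50:59], "%d-%b-%y").date(),
        "id_code": line[62:66],
    }


# Built with padding so that each field lands in its assumed columns.
header = ("HEADER".ljust(10)
          + "RETINOIC-ACID TRANSPORT".ljust(40)
          + "28-SEP-94".ljust(12)
          + "1CBS")
print(parse_header(header))
```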
CATH:
•CATH stands for Class, Architecture, Topology and
Homologous superfamily; it is a database that classifies
protein domain structures.
•It was created by Janet Thornton and colleagues at
University College London.
•It is available at
•http://www.biochem.ucl.ac.uk/bsm/cath
•http://www.cathdb.info
•It is a protein structure classification tool.
IT CONSISTS OF FOUR LEVELS:
•Class: This level groups proteins by their secondary
structure content (alpha, beta, alpha/beta, etc.).
•Architecture: It describes the overall shape of the domain,
given by the gross orientation of its secondary structures,
without regard to how they are connected.
•Topology: It groups structures by both the arrangement and
the connectivity of their secondary structures (the fold).
Each topology is further divided into homologous superfamilies.
•Homologous superfamily: It compares the sequence and
structure of proteins to group domains that are thought to
share a common ancestor. It helps to trace the evolutionary
relationships among proteins.
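Each CATH node is identified by a dotted numeric code with one number per level (C.A.T.H). The sketch below splits such a code into its four levels and checks whether two codes agree down to a given level. The example codes and function names are illustrative; the sketch only checks the format, not whether a node actually exists in CATH.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    class_: int                  # C: secondary-structure content
    architecture: int            # A: overall shape (orientation of secondary structures)
    topology: int                # T: fold (arrangement and connectivity)
    homologous_superfamily: int  # H: domains thought to share an ancestor


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted C.A.T.H code such as '3.40.50.300' into its four levels."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def shares_level(a: str, b: str, depth: int) -> bool:
    """True if two codes agree on the first `depth` levels (1 = C ... 4 = H)."""
    return parse_cath_code(a)[:depth] == parse_cath_code(b)[:depth]


print(parse_cath_code("3.40.50.300"))
print(shares_level("3.40.50.300", "3.40.50.720", 3))  # same C, A and T numbers
```

Because the levels are nested, comparing the first `depth` numbers is enough to tell how far down the hierarchy two domains are grouped together.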
•CATH aims to provide an official release of its protein
structure classification every 12 months.
•It is a free, publicly available online resource.
•At the time of writing, the latest version of CATH contains
114,215 domains, 2,178 homologous superfamilies and 1,110
fold groups.
THE CATH SERVER
•The CATH team has set up a server that allows users to
submit the coordinates of a newly determined structure for
automatic classification in CATH.
•DOMAIN BOUNDARIES AND SEQUENCE COMPARISON:
•The server uses a program called DETECTIVE, which is good at
identifying domain boundaries in multidomain proteins.
•The results from DETECTIVE are returned to the user in
less than a minute.
•Identified domains are scanned against non-identical
representatives from CATH using a global sequence
alignment method.
•If the sequence identity is 95% or higher, the domain is
considered identical to one already in CATH.
•If the sequence identity is below 30%, the structure is
compared with all the sequence families (S-level).
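The sequence step can be illustrated with a textbook global alignment (Needleman-Wunsch) followed by the identity thresholds above. This is a sketch under simple assumptions: a +1/-1/-1 match/mismatch/gap scheme instead of a substitution matrix, and percent identity computed over all alignment columns. CATH's actual alignment method and parameters are not specified here, so the function names, scores and the label for identities between 30% and 95% are hypothetical.

```python
def global_align(a: str, b: str, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped sequences."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical (non-gap) positions as a percentage of alignment columns."""
    if not aln_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)


def triage(identity: float) -> str:
    """Map a percent identity to the next step, using the thresholds above."""
    if identity >= 95:
        return "identical to an existing CATH domain"
    if identity < 30:
        return "compare structures with all sequence families (S-level)"
    return "intermediate identity: not covered by these two rules"


aln_a, aln_b = global_align("HEAGAWGHEE", "PAWHEAE")
pid = percent_identity(aln_a, aln_b)
print(aln_a, aln_b, sep="\n")
print(f"{pid:.1f}% -> {triage(pid)}")
```

A global (end-to-end) method is shown because the text names one; a local method such as Smith-Waterman would instead score only the best-matching segment.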
•ASSESSING STRUCTURAL SIMILARITY:
•TOPSCAN compares the secondary structures of the new
structure with those in each fold family to identify the
fold families to which it may belong.
•Subsequently, a fast version of the structure comparison
program SSAP scans representatives from all these families.
•Structural pairs with an SSAP score above 80 are possible
homologues, while pairs scoring 70-80 are structurally similar
but show no sequence or functional similarity.
•Finally, the SSAP structural alignment is displayed using a
graphical display package.
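The SSAP cut-offs above can be written as a small decision rule. This sketch encodes only the two score bands stated in this section, assuming scores on SSAP's 0-100 scale; how CATH combines the score with other evidence, and what it does below 70, is not described here, so the label for other scores and the example domain names and scores are hypothetical.

```python
def interpret_ssap(score: float) -> str:
    """Interpret an SSAP score using the two bands described in the text."""
    if not 0 <= score <= 100:
        raise ValueError("SSAP scores are expected on a 0-100 scale")
    if score > 80:
        return "possible homologue"
    if score >= 70:
        return "structurally similar, no sequence or functional similarity"
    return "below the bands described here"


# Hypothetical scan results: representative domain -> SSAP score
hits = {"domain_a": 86.2, "domain_b": 74.5, "domain_c": 52.0}
for name, s in sorted(hits.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {s:5.1f} -> {interpret_ssap(s)}")
```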
CSD
•The Cambridge Structural Database is both a repository
and a validated resource for 3-D structural data of
molecules containing carbon and hydrogen.
•It is used to study the structures of organic,
metal-organic and organometallic molecules.
•The entries in the CSD are complementary to those in the
PDB and the Inorganic Crystal Structure Database (ICSD).
•The data in the CSD are typically obtained by X-ray
crystallography and less frequently by neutron
diffraction.
•The data in the CSD is submitted by crystallographers and
chemists from all over the world.
•The CSD is maintained by an incorporated company called the
Cambridge Crystallographic Data Centre (CCDC).
•Structures deposited with the CCDC are publicly available for
download at the point of publication.
•The CSD is updated with about 50,000 new structures each
year, and these structures are freely available to support
teaching and other activities.
•The CSD is available at
•www.ccdc.cam.ac.uk
•webcsd.ccdc.cam.ac.uk
[Figure: Structural database applications: prediction, analysis, mining, comparison, classification, structure refinement, annotation.]