PIR - Protein Information Resource

17,168 views 15 slides Sep 11, 2020

Protein Information Resource (PIR)

Introduction
• An integrated, publicly accessible bioinformatics resource to support genomic/proteomic research and scientific discovery.
• Established in 1984 by the National Biomedical Research Foundation (NBRF), Georgetown University Medical Center, Washington D.C., USA.
• It is a source of annotated protein databases and analysis tools for researchers.
• Serves as a primary resource for the exploration of protein information.
• Accessible by text search for entry and list retrieval, and also by BLAST search and peptide match.

Features of PIR
• A comprehensive, non-redundant, annotated database containing protein sequences of prokaryotes, eukaryotes, viruses, phages and archaea.
• Data is well organized: entries are classified into protein families and superfamilies.
• The Protein Sequence Database (PSD) provides cross-references to other genomic and proteomic public databases.
• Updated weekly, with full releases published quarterly.
• Provides cross-references between its own databases.

Database Organization and Annotation
• The basis of database organization and annotation lies in structuring the data properly according to protein family relationships.
• According to protein family relationships, the database can be structured at three levels:
1. Superfamilies and families, for full-length sequence similarity
2. Homology domains, for local functional and structural units
3. Motifs, for functional and structural sites

Resources of PIR
The resources of PIR can be broadly classified into two categories:
1. Data retrieval systems
2. Databases

Data Retrieval in PIR
Data retrieval in PIR consists of three types of search engines:
1. Interactive text-based search engine
  • Boolean queries of text fields
2. Standard sequence similarity search engines
  • Peptide match
  • Pattern match
  • BLAST
  • FASTA
  • Pair-wise alignment
  • Multiple alignment
3. Advanced search engines
  • Combine sequence similarity and annotation searches
  • Evaluation of gene-family relationships
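To make two of the standard search types more concrete, here is a minimal Python sketch of peptide match (exact substring search) and pattern match. It assumes a simplified PROSITE-style pattern syntax (`x`, `[..]`, `{..}`, `(n)` and `(n,m)` repeats) that is translated to a regular expression; the sequences and function names are hypothetical, and this is not PIR's actual implementation.

```python
import re

# Hypothetical toy sequences; the IDs are placeholders, not real database entries.
SEQUENCES = {
    "SEQ1": "MKTAYIAKQRQISFVKSHFSRQ",
    "SEQ2": "MSTNPKPQRKTKRNTNRRPQDVKFPGG",
}


def peptide_match(peptide, sequences):
    """Return (sequence id, 1-based start) for every exact occurrence of a peptide."""
    hits = []
    for seq_id, seq in sequences.items():
        start = seq.find(peptide)
        while start != -1:
            hits.append((seq_id, start + 1))
            start = seq.find(peptide, start + 1)  # also finds overlapping occurrences
    return hits


# One simplified PROSITE-style element: x, a residue, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or a range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simplified pattern such as 'C-x(2)-[DE]-{P}.' into a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."  # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"  # any residue except these
        if high is not None:
            residue += "{" + low + "," + high + "}"
        elif low is not None:
            residue += "{" + low + "}"
        parts.append(residue)
    return "".join(parts)


def pattern_match(pattern, sequences):
    """Return (sequence id, 1-based start, matched text) for non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [
        (seq_id, m.start() + 1, m.group())
        for seq_id, seq in sequences.items()
        for m in regex.finditer(seq)
    ]
```

For example, `peptide_match("KSH", SEQUENCES)` reports one hit in `SEQ1` starting at position 16, and `pattern_match("P-x-[QR]", SEQUENCES)` finds `PQR` in `SEQ2`. Real pattern searches support more syntax (terminal anchors, for instance) than this sketch does.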

Databases of PIR
UniProt - Universal Protein Resource
PIR + EBI (European Bioinformatics Institute) + SIB (Swiss Institute of Bioinformatics) = UniProt
UniProt unites their protein data into a single database and serves as the central resource of protein sequence and function.

UniProt-Universal Protein Resource
The UniProt database consists of the following three databases:
1. UniProt Knowledgebase (UniProtKB)
2. UniProt Reference Clusters (UniRef)
3. UniProt Archive (UniParc)

UniProt Knowledgebase (UniProtKB)
• Central database of protein sequences with annotation and functional information.
• Provides a single record for all protein products derived from a certain gene in a certain species.
• Gives details of the accession number, alternative splicing, proteolytic cleavage and post-translational modifications for each form of the derived protein.
UniProtKB has 2 parts:
1. UniProt/Swiss-Prot: contains manually annotated records.
2. UniProt/TrEMBL: contains computationally analyzed records, which are awaiting manual annotation.
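The "one record per gene per species" idea can be sketched as a simple grouping step. This is a hypothetical Python data model, not UniProt's schema: `ProteinProduct`, `KnowledgebaseRecord` and the `REC00001` accession format are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ProteinProduct:
    gene: str
    species: str
    form: str  # e.g. a splice isoform or processed chain (hypothetical labels)
    sequence: str


@dataclass
class KnowledgebaseRecord:
    accession: str  # hypothetical accession format, not UniProt's
    gene: str
    species: str
    products: list = field(default_factory=list)


def build_records(products):
    """Merge all products of the same gene in the same species into one record."""
    grouped = defaultdict(list)
    for product in products:
        grouped[(product.gene, product.species)].append(product)
    records = []
    # Sort the (gene, species) keys so accession numbering is deterministic.
    for i, key in enumerate(sorted(grouped), start=1):
        gene, species = key
        records.append(KnowledgebaseRecord(f"REC{i:05d}", gene, species, grouped[key]))
    return records
```

Two isoforms of the same gene in the same species end up in one record, while the same gene in another species gets its own record.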

UniProt Reference Clusters (UniRef)
• Provides non-redundant data collections based on the UniProt Knowledgebase and UniParc to obtain complete coverage of sequence space at several resolutions.
3 separate datasets compress sequence space at different resolutions:
• Sequences that are 100% identical (UniRef100 database)
• Sequences that are ≥90% identical (UniRef90 database)
• Sequences that are ≥50% identical (UniRef50 database)
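Here is a minimal sketch of how sequence space might be compressed at a chosen identity threshold, using greedy incremental clustering in which longer sequences become cluster representatives. It is a toy under stated assumptions: identity is approximated by the longest-common-subsequence length divided by the shorter sequence's length (no real alignment, no gap penalties), and the production UniRef pipeline uses dedicated clustering software that may apply additional criteria.

```python
def identity(a, b):
    """Crude identity: longest common subsequence length / shorter sequence length.

    A stand-in for a real alignment-based identity; it ignores gap penalties.
    """
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / min(len(a), len(b))


def greedy_cluster(sequences, threshold):
    """Greedy incremental clustering: the longest sequences become representatives."""
    clusters = []  # list of (representative id, member ids)
    for seq_id, seq in sorted(sequences.items(), key=lambda kv: -len(kv[1])):
        for rep_id, members in clusters:
            if identity(sequences[rep_id], seq) >= threshold:
                members.append(seq_id)
                break
        else:  # no representative was similar enough: start a new cluster
            clusters.append((seq_id, [seq_id]))
    return clusters
```

Running the same input at thresholds 1.0 and 0.9 mimics the UniRef100 and UniRef90 idea: a sequence with a single substitution falls into its own cluster at 1.0 but joins its near-identical neighbor at 0.9.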

UniProt Archive (UniParc)
• Provides a stable, comprehensive, non-redundant sequence collection by storing the complete body of publicly available protein sequence data.
• When new or revised protein sequences are added, a UniParc sequence version is assigned or incremented, which makes it possible to track the history of sequence changes in all the source databases.
• To avoid redundancy, each unique sequence is assigned a unique identifier and is stored only once.
• The basic information stored with each UniParc entry is the identifier, the sequence, a cyclic redundancy check (CRC) number, the source database with accession and version numbers, and a timestamp.
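The "store each unique sequence once" rule can be sketched with a dictionary keyed by sequence. Assumptions: the `UPI` prefix follows UniParc's identifier style but the numbering here is hypothetical, `zlib.crc32` stands in for the checksum algorithm UniParc actually uses, and the source database names are placeholders. Sequence history is kept simply as a list of timestamped cross-references.

```python
import zlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ArchiveEntry:
    upi: str        # hypothetical numbering in a UniParc-like "UPI" style
    sequence: str
    checksum: str   # zlib.crc32 used as a stand-in CRC
    xrefs: list = field(default_factory=list)  # (source db, accession, version, timestamp)


class SequenceArchive:
    def __init__(self):
        self._by_sequence = {}  # each unique sequence is stored only once
        self._next_id = 1

    def add(self, sequence, source_db, accession, version):
        """Record a sequence from a source database; reuse the entry if already present."""
        sequence = sequence.upper()
        entry = self._by_sequence.get(sequence)
        if entry is None:
            entry = ArchiveEntry(
                upi=f"UPI{self._next_id:010X}",
                sequence=sequence,
                checksum=f"{zlib.crc32(sequence.encode()):08X}",
            )
            self._by_sequence[sequence] = entry
            self._next_id += 1
        # Every submission is logged, so the entry's history can be reconstructed.
        entry.xrefs.append((source_db, accession, version, datetime.now(timezone.utc)))
        return entry
```

Submitting the same sequence from two sources yields one entry with two cross-references, while a different sequence receives a new identifier.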

iProClass - Integrated Protein Knowledgebase
• Provides a comprehensive description of protein family, function and structure for UniProt protein sequences, and serves as a framework for data integration in a distributed networking environment.
• Contains non-redundant protein sequences from PIR-PSD, Swiss-Prot and TrEMBL.
iProClass covers:
• Family relationships
  - Global level (superfamily, family)
  - Local level (domain, motif, site)
• Structural classifications
• Functional classifications

Types of Protein Sequence Reports
iProClass provides 2 types of reports:
1. The first type covers information on structure, function, family, genetics, disease, ontology, taxonomy and literature, with references to relevant molecular databases.
2. The second type is the superfamily report, with length, taxonomy and keyword statistics and a complete member listing.
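The statistics in a superfamily report can be illustrated with a short Python sketch over hypothetical member entries. The field names (`length`, `taxon`, `keywords`) and the member data are assumptions for illustration, not the iProClass report format.

```python
from collections import Counter
from statistics import mean

# Hypothetical members of one superfamily.
MEMBERS = [
    {"id": "P1", "length": 120, "taxon": "Bacteria", "keywords": ["oxidoreductase", "NAD"]},
    {"id": "P2", "length": 136, "taxon": "Eukaryota", "keywords": ["oxidoreductase"]},
    {"id": "P3", "length": 128, "taxon": "Bacteria", "keywords": ["oxidoreductase", "zinc"]},
]


def superfamily_report(members):
    """Summarize length, taxonomy and keyword statistics plus a member listing."""
    lengths = [m["length"] for m in members]
    return {
        "length": {"min": min(lengths), "max": max(lengths), "mean": mean(lengths)},
        "taxonomy": Counter(m["taxon"] for m in members),
        "keywords": Counter(k for m in members for k in m["keywords"]),
        "members": sorted(m["id"] for m in members),
    }
```

For the three toy members, the report gives lengths from 120 to 136 (mean 128), two bacterial and one eukaryotic member, and "oxidoreductase" as the keyword shared by all three.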

PIRSF - Protein Family Classification System
• PIR extended its superfamily concept and developed the PIRSF (PIR SuperFamily) classification system.
• It aims to facilitate the sensible propagation and standardization of protein annotation and the systematic detection of annotation errors.
• Consists of two datasets: preliminary clusters and curated families.
• Curated families include the family name, protein membership, parent-child relationships, domain architecture, an optional description and a bibliography.
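A curated PIRSF family record could be modeled as below. This is a hypothetical Python data structure that mirrors the fields listed for curated families, not PIRSF's actual file format; the family IDs are placeholders, and the helper `architecture_outliers` is an invented illustration of how comparing members against the family's domain architecture could flag candidates for annotation review.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PIRSFFamily:
    family_id: str                                    # placeholder IDs in the example
    name: str
    members: set = field(default_factory=set)         # protein membership
    parent_id: Optional[str] = None                   # parent-child relationship
    domain_architecture: tuple = ()                   # ordered domain names
    description: Optional[str] = None                 # optional description
    bibliography: list = field(default_factory=list)
    curated: bool = False                             # preliminary cluster vs curated family


def lineage(families, family_id):
    """Return [family, parent, grandparent, ...] by following parent links."""
    path = []
    current = family_id
    while current is not None:
        if current in path:
            raise ValueError("cycle in parent-child relationships")
        path.append(current)
        current = families[current].parent_id
    return path


def architecture_outliers(family, member_domains):
    """Members whose domain architecture differs from the family's (review candidates)."""
    return sorted(
        m for m in family.members
        if tuple(member_domains.get(m, ())) != family.domain_architecture
    )
```

In this sketch, a member whose domains do not match the family's architecture is only flagged, not reassigned; deciding whether the membership or the annotation is wrong is left to a curator.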

iProLINK
Integrated Protein Literature INformation and Knowledge
Provides annotated literature, a protein name dictionary, and other information to facilitate text mining in the areas of literature-based database curation, protein ontology development and named entity recognition.