Primary and secondary database

9,604 views 22 slides May 11, 2020

About This Presentation

INTRODUCTION
WHAT IS DATA AND DATABASE?
WHAT IS BIOLOGICAL DATABASE?
TYPES OF BIOLOGICAL DATABASE
PRIMARY DATABASE
Nucleic acid sequence database
Protein sequence database
SECONDARY DATABASE
COMPOSITE DATABASE
TERTIARY DATABASE
WHY NEED?
CONCLUSION
REFERENCES


Slide Content

PRIMARY AND SECONDARY BIOLOGICAL
DATABASE
By
KAUSHAL KUMAR SAHU
Assistant Professor (Ad Hoc)
Department of Biotechnology
Govt. Digvijay Autonomous P. G. College
Raj-Nandgaon ( C. G. )

CONTENTS
•INTRODUCTION
•WHAT IS DATA AND DATABASE?
•WHAT IS BIOLOGICAL DATABASE?
•TYPES OF BIOLOGICAL DATABASE
–PRIMARY DATABASE
•Nucleic acid sequence database
•Protein sequence database
–SECONDARY DATABASE
–COMPOSITE DATABASE
–TERTIARY DATABASE
•WHY NEED?
•CONCLUSION
•REFERENCES
5/11/2020
2

INTRODUCTION
Bioinformatics: the application of computational techniques to the management and analysis of biological data.
History:
•The first English use of the word "data" is from the 1640s.
•The word "data" was first used to mean "transmittable and storable computer information" in 1946.
•The first database was created in 1956.
•Insulin was the first protein to be sequenced.

DATA
•A series of observations, measurements, or facts; information. Also called: information (in computing).
DATABASE
•A large, systematized collection of data that can be expanded, updated, and retrieved rapidly for a specific purpose.

BIOLOGICAL DATABASE
•Storage of biological information (nucleic acid sequence, protein sequence and structure).

DEFINITION
Biological databases are computer sites that organise, store and disseminate files that contain information consisting of literature references, nucleic acid sequences, and protein sequences and structures.

SOURCES ON THE WEB FOR IMPORTANT DATABASES

TYPES OF BIOLOGICAL DATABASE
1. Primary Database
2. Secondary Database
3. Composite Database
4. Tertiary Database

Primary Database
•Stores biomolecular sequences (protein or nucleic acid) and associated annotation information (organism, species, mutations linked to particular diseases, bibliographic information, etc.).
•Primary sources are the original materials on which research is based.
•They are neither interpreted, condensed, nor evaluated by other writers.

PRIMARY
•Nucleotide sequences
–NCBI GenBank
–EMBL
–DDBJ
•Protein sequences
–PIR
–UniProt
–SWISS-PROT
–TrEMBL

NCBI
•Located in Bethesda, Maryland; founded in 1988 through legislation sponsored by Senator Claude Pepper.
•Was directed by David Lipman, one of the original authors of BLAST.
•The NCBI houses a series of databases. Examples:
–GenBank: DNA sequences.
–PubMed (a bibliographic database): the biomedical literature.
–Other databases: the Epigenomics database.

GenBank
•A part of the International Nucleotide Sequence Database Collaboration, which comprises EMBL, DDBJ, and GenBank at NCBI.
•The database was started in 1982 by Walter Goad and Los Alamos National Laboratory.
•As of 15 August 2017, GenBank release 221.0 had 203,180,606 loci and 240,343,378,258 bases, from 203,180,606 reported sequences.
https://www.revolvy.com/main/index.php?s=GenBank
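The release figures above give a rough sense of scale. As a small worked calculation (the variable names are only illustrative), dividing the total bases by the number of loci gives the average record length, which comes out a little under 1,200 bases:

```python
# Figures quoted on the slide for GenBank release 221.0 (15 August 2017).
loci = 203_180_606
bases = 240_343_378_258

# Average number of bases per sequence record.
average_length = bases / loci
print(f"Average sequence length: {average_length:.1f} bases")
```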

EMBL-EBI
•Established in 1980 at the EMBL laboratories in Heidelberg, Germany.
•An international, innovative and interdisciplinary research organisation funded by 23 member states and two associate member states.
•Location: Hinxton, Cambridge, UK.

DDBJ
•DDBJ release 1 was provided in 1987.
•Situated in Mishima, Japan.


SECONDARY DATABASE
•Derived from the analysis of primary data.
•Present in the form of regular expressions (patterns), fingerprints, or blocks.
•Secondary databases:
–PROSITE
–PRINTS

PROSITE
•It consists of entries describing protein families, domains and functional sites, as well as the amino acid patterns and profiles in them.
•Complemented by ProRule, a collection of rules based on profiles and patterns.
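PROSITE patterns are written in their own compact notation, which maps closely onto regular expressions. The following is a minimal sketch, not PROSITE's own software: the function name `prosite_to_regex` is hypothetical, and it assumes only the basic pattern syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, `<` and `>` for sequence ends).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    pattern = pattern.rstrip(".")          # patterns may end with a period
    parts = []
    for element in pattern.split("-"):     # elements are separated by '-'
        prefix = suffix = ""
        if element.startswith("<"):        # '<' anchors at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # '>' anchors at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if m:                              # repetition such as x(3) or x(2,4)
            element = m.group(1)
            upper = "," + m.group(3) if m.group(3) else ""
            repeat = "{" + m.group(2) + upper + "}"
        if element == "x":                 # any amino acid
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"   # excluded residues
        else:
            core = element                 # single residue or [ABC] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

# Hypothetical pattern and sequence, for illustration only.
regex = prosite_to_regex("[ST]-x(2)-{P}-D")
hit = re.search(regex, "MKSAAGDLL")
print(regex, hit.group(0) if hit else None)
```

Profiles, the other kind of PROSITE descriptor, are weight matrices and cannot be expressed this way; this sketch covers patterns only.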

PRINTS
•Collection of protein motif fingerprints.
•The motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D space.
•Fingerprints can encode protein folds and functionalities more flexibly and powerfully than single motifs can; their full diagnostic potency derives from the mutual context provided by motif neighbours.
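The idea of a fingerprint as several non-overlapping motifs found in order along a sequence can be sketched as follows. This is an illustration only: the function `match_fingerprint`, the motifs, and the sequence are all hypothetical, and motifs are simplified to regular expressions here, which is an assumption of this sketch rather than how PRINTS itself scores matches.

```python
import re

def match_fingerprint(sequence, motifs):
    """Search for each motif in order, starting after the previous hit.

    Returns a list with the start position of each motif, or None for a
    motif that was not found downstream of the preceding one.
    """
    positions = []
    cursor = 0                      # next search starts here, so hits cannot overlap
    for motif in motifs:
        hit = re.compile(motif).search(sequence, cursor)
        if hit is None:
            positions.append(None)
        else:
            positions.append(hit.start())
            cursor = hit.end()
    return positions

# Hypothetical three-motif fingerprint and sequence.
fingerprint = ["G..GKS", "DEAD", "W.H"]
sequence = "MAGAAGKSTLLDEADRRWSHK"
print(match_fingerprint(sequence, fingerprint))
```

A sequence matching all motifs in order is a stronger hit than one matching a single motif, which is the "mutual context" the slide refers to; a list containing `None` indicates a partial match.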

COMPOSITE DATABASE
•Represent an amalgamation of several primary database sources and are easy to use.
•Access all the relevant information from a single source rather than connecting to multiple resources.
Ex. NCBI, UniProt, etc.
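A minimal sketch of the amalgamation idea, assuming each primary source is a simple mapping from accession to sequence. The function `build_composite`, the source names, accessions, and sequences are all hypothetical; merging identical sequences into one entry is one possible design choice assumed here, not a description of how any particular composite database is built.

```python
def build_composite(sources):
    """Merge several primary sources into one lookup keyed by sequence.

    `sources` maps a source name to a dict of {accession: sequence}.
    Identical sequences from different sources collapse into one entry
    that remembers where each copy came from.
    """
    composite = {}
    for source_name, records in sources.items():
        for accession, sequence in records.items():
            composite.setdefault(sequence, []).append((source_name, accession))
    return composite

# Hypothetical records, for illustration only.
sources = {
    "source_A": {"A001": "MKTAYIAK", "A002": "MLLSDE"},
    "source_B": {"B100": "MKTAYIAK"},
}
composite = build_composite(sources)
for sequence, origins in composite.items():
    print(sequence, origins)
```

A user then queries the single merged table instead of each source in turn.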

CONCLUSION
•Bioinformatics is the application of information technology to store and organize biological data and to make it available in computer-readable form.
•We can easily analyze the vast amount of biological data available in the form of sequences and structures of proteins (the building blocks of organisms) and nucleic acids (the information carriers).
•The need for storing and communicating large datasets has grown.
•Databases make biological data available to scientists.

REFERENCES
•Books:
–C.S.V. Murthy, Bioinformatics, 1st edition, 2003.
–S.C. Rastogi, Bioinformatics, 1st edition, 2003.
•Other sources:
–https://www.ncbi.nlm.nih.gov/nuccore/NC_002371.2
–http://vle.du.ac.in/mod/book/print.php?id=8913&chapterid=12618
–https://web.expasy.org/docs/swiss-prot_guideline.html
–nd%20Managing%20Information%20Leicester/page_21.htm
–https://bioinf.comav.upv.es/courses/biotech3/theory/databases.html