Primary, secondary, tertiary biological database

6,440 views 33 slides May 11, 2020

Powerpoint Templates
Page 1
PRIMARY, SECONDARY, TERTIARY BIOLOGICAL DATABASE
By
KAUSHAL KUMAR SAHU
Assistant Professor (Ad Hoc)
Department of Biotechnology
Govt. Digvijay Autonomous P. G. College
Raj-Nandgaon ( C. G. )

Synopsis
Introduction
Biological database
Types:
1. Primary database
•Nucleic acid sequence databases: GenBank, EMBL, DDBJ
•Protein sequence databases: PIR, SWISS-PROT, TrEMBL
2. Secondary database
•PRINTS
•PROSITE
•PROFILES
•BLOCKS
•IDENTIFY
3. Composite database
•Non-redundant database (NRDB)
•Non-redundant protein sequence database (OWL)
•SWISS-PROT + TrEMBL
•MIPSX
Important database search tools
Applications

INTRODUCTION
DATABASE
•A convenient method of organizing vast amounts of information.
•Allows for proper storing, searching & retrieving of data.
•Before analyzing data, we need to assemble them into central, shareable resources.
•Different database types:
•Depend on the nature of the information stored (sequences, 2D gel or 3D structure images)
•Depend on the manner of storage (flat files, tables in a relational database, etc.)
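
As a rough illustration of the two storage styles named above, the sketch below holds the same record first as a flat-file text line and then as a row in a relational table. It uses Python's built-in sqlite3 module. The table name, column names and the example record are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical record: (accession, organism, sequence). Not a real database entry.
record = ("X00001", "Example organism", "ATGGCCATTGTAATG")

# Flat-file style: one tab-separated text line per record.
flat_line = "\t".join(record)

# Relational style: the same fields stored as a row in a table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
)
conn.execute("INSERT INTO sequences VALUES (?, ?, ?)", record)


def fetch_sequence(accession):
    """Retrieve a sequence by accession number, or None if it is absent."""
    row = conn.execute(
        "SELECT sequence FROM sequences WHERE accession = ?", (accession,)
    ).fetchone()
    return row[0] if row else None


print(flat_line)
print(fetch_sequence("X00001"))
```

The flat line is easy to read and exchange, while the table form lets the database engine do the searching and retrieving.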

BIOLOGICAL DATABASE
•It is a library of life science information collected from scientific experiments, published literature and computational analysis. As far as possible, a particular type of information should be available in one single place, and biological data should be made available in computer-readable form.
•Biological databases contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.

Biological Databases
Types of biological databases and the information they contain
Database type                                    Information contained
Bibliographic databases                          Literature
Taxonomic databases                              Classification
Nucleic acid databases                           DNA information
Genomic databases                                Gene-level information
Protein databases                                Protein information
Protein families, domains and functional sites   Classification of proteins and identifying domains
Enzymes/metabolic pathways                       Metabolic pathways

Types of Biological Databases
•Primary database:
•A primary sequence database stores biomolecular sequences (protein or nucleic acid) and associated annotation information (organism, species, function, mutations linked to particular diseases, functional/structural patterns, bibliographic details, etc.).
•Primary database tools are effective for identifying sequence similarities, but analysis of the output is sometimes difficult and cannot always answer some of the more sophisticated questions of sequence analysis.

Primary sequence databases
•Nucleic acid sequence databases: GenBank, EMBL, DDBJ
•Protein sequence databases: PIR, SWISS-PROT, TrEMBL

Nucleic acid sequence database
GenBank
•In general, the term "gene bank" refers to any system by which the genetic composition of some population is identified and stored.
•GenBank itself was set up in 1979 at the Los Alamos National Laboratory (LANL).
•Web server: http://www.ncbi.nlm.nih.gov
•GenBank is the main nucleotide sequence database held by the National Center for Biotechnology Information (NCBI).
•GenBank files contain information such as accession numbers, gene names, phylogenetic classification and references to published literature.
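
To make the flat-file layout more concrete, here is a minimal sketch that reads a few fields from a record laid out in the GenBank style (keyword in the first 12 columns, value after it, sequence after ORIGIN, record terminated by //). The record itself is hypothetical and heavily shortened, the function name is an assumption, and continuation lines and feature tables are deliberately ignored.

```python
# Hypothetical, heavily shortened record written in GenBank flat-file layout.
EXAMPLE_RECORD = """\
LOCUS       EXAMPLE01                 30 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ACCESSION   EXAMPLE01
SOURCE      Example organism
  ORGANISM  Example organism
ORIGIN
        1 atggccattg taatgggccg ctgaaagggt
//
"""


def parse_genbank_record(text):
    """Extract a few header fields and the sequence from one record.

    Simplification: only single-line header values are handled.
    """
    fields = {}
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Drop the leading position number, keep the blocks of bases.
            sequence_parts.extend(line.split()[1:])
            continue
        keyword = line[:12].strip()
        value = line[12:].strip()
        if keyword == "ORIGIN":
            in_sequence = True
        elif keyword in ("LOCUS", "DEFINITION", "ACCESSION", "ORGANISM"):
            fields[keyword] = value
    fields["SEQUENCE"] = "".join(sequence_parts)
    return fields


record = parse_genbank_record(EXAMPLE_RECORD)
print(record["ACCESSION"], record["ORGANISM"], len(record["SEQUENCE"]))
```

For real work an established parser is the safer choice; this sketch only shows how the accession number, organism and sequence sit in the file.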

FIG: GenBank file format

European Molecular Biology Laboratory
(EMBL)
•Established in 1978 at Heidelberg, Germany.
•Site: http://www.embl-heidelberg
•The EMBL nucleotide sequence database is a comprehensive database of DNA and RNA sequences collected from the scientific literature and patent applications, and submitted directly by researchers and sequencing groups.
•Data collection is done in collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ).

Homepage of EMBL

DNA Data Bank of Japan (DDBJ)
•It is located in Japan.
•Sites: http://www.ddbj.nig.ac.jp
•http://biodatabase.org/index.php/DDBJ
•Established in 1984 at the National Institute of Genetics (NIG) in Mishima, Japan.
•DDBJ has been functioning as an international nucleotide sequence database.

Protein sequence database
•SWISS-PROT protein sequence database
•SWISS-PROT was created in 1986 at the Department of Medical Biochemistry of the University of Geneva.
•From 1987, the European Molecular Biology Laboratory and the Swiss Institute of Bioinformatics (SIB) worked in collaboration, as equal partners, to develop and maintain this highly annotated repository of protein sequences.
•It provides high-quality annotation with minimum redundancy.
•The structure of a SWISS-PROT entry is similar to the EMBL nucleotide sequence database format.
•The format is convenient for humans to read and is used by several computer programs for analysis.
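
The entry format can be illustrated with a small sketch. Each line of a SWISS-PROT style entry starts with a two-letter line code (ID, AC, DE, OS, SQ), sequence lines start with blanks, and // ends the entry. The entry below is hypothetical and simplified (for instance, the SQ line of a real entry carries more fields), and the function name is an assumption.

```python
# Hypothetical, simplified entry in SWISS-PROT style flat-file layout.
EXAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;          12 AA.
AC   X99999;
DE   Hypothetical example protein.
OS   Homo sapiens (Human).
SQ   SEQUENCE   12 AA;
     MKTAYIAKQR QI
//
"""


def parse_entry(text):
    """Collect the entry name, accessions, description, organism and sequence."""
    entry = {"ID": "", "AC": [], "DE": "", "OS": "", "sequence": ""}
    for line in text.splitlines():
        code, data = line[:2], line[5:].strip()
        if code == "//":  # end-of-entry marker
            break
        if code == "ID":
            entry["ID"] = data.split()[0]
        elif code == "AC":
            entry["AC"] += [acc.strip() for acc in data.split(";") if acc.strip()]
        elif code in ("DE", "OS"):
            # Values may span several lines with the same code; join them.
            entry[code] = (entry[code] + " " + data).strip()
        elif code == "  ":
            # Sequence lines: remove the spaces between blocks of residues.
            entry["sequence"] += data.replace(" ", "")
    return entry


entry = parse_entry(EXAMPLE_ENTRY)
print(entry["ID"], entry["AC"], entry["sequence"])
```

Because every line announces its own content type, the same file is readable by eye and trivial to split by program.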

Translated EMBL (TrEMBL)
•It was created in 1996 with the objective of filling the gap between the flow of genomic data and annotated protein sequences.
•TrEMBL contains computer-annotated records generated by translating coding sequences (CDS) available in the EMBL nucleotide sequence database.
•It does not contain translations of those CDS that are already available in SWISS-PROT, and acts as a computer-annotated supplement to SWISS-PROT.
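
The idea of translating coding sequences and keeping only what is not already curated can be sketched as follows. This is an illustration of the principle using the standard genetic code, not the actual TrEMBL production pipeline. The function names and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def computer_annotated_supplement(cds_list, curated_proteins):
    """Translate each CDS and keep only proteins absent from the curated set."""
    translations = (translate_cds(cds) for cds in cds_list)
    return [protein for protein in translations if protein not in curated_proteins]


# Hypothetical data: the first translation is already "curated", the second is not.
print(computer_annotated_supplement(["ATGGCCTAA", "ATGAAATAG"], {"MA"}))
```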

Protein Information Resource
(PIR)
•PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers in the identification and interpretation of protein sequence information.
•The database is split into four sections, PIR1 to PIR4.

Secondary databases:
•These databases contain additional information derived from the analysis of data available in primary repositories.
Secondary (or pattern) databases:
•PROFILES
•PRINTS
•Pfam
•IDENTIFY
•BLOCKS
•PROSITE

1. PROSITE:
•It is a method of determining the function of uncharacterized proteins translated from genomic or cDNA sequences.
•It consists of a database of biologically significant sites, patterns and profiles that help to reliably identify to which known family of proteins (if any) a new sequence belongs.
•PROSITE was the first secondary database to be developed.
•Maintained collaboratively at the Swiss Institute of Bioinformatics.
•Site: http://ftp.expasy.ch
•It includes protein pattern motifs indicative of a protein's function, which are widely used for function prediction studies, cellular localization annotation, and sequence classification.
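
To show how such a pattern is applied to a new sequence, the sketch below converts PROSITE pattern syntax into a regular expression and scans a sequence with it. In that syntax, elements are separated by "-", x is any residue, [..] lists allowed residues, {..} lists forbidden residues, and (n) or (n,m) gives repetition. The pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern. The function names and the test sequence are hypothetical, and only non-overlapping matches are reported.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE pattern string into Python regular-expression syntax."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")  # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")   # repetition counts
    regex = regex.replace("x", ".")                     # any residue
    regex = regex.replace("<", "^").replace(">", "$")   # N- and C-terminal anchors
    return regex


def find_sites(pattern, sequence):
    """Return (start position, matched segment) for each non-overlapping match."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(match.start(), match.group()) for match in compiled.finditer(sequence)]


# Hypothetical sequence: "NGSA" matches, "NPTQ" does not (P is forbidden after N).
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_sites("N-{P}-[ST]-{P}", "MKNGSAANPTQ"))
```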

Home page of PROSITE

2. PRINTS:
–A different approach to pattern recognition, termed "fingerprinting", is used by this database.
–Diagnostically, it makes sense to use many, or all, of the conserved regions to build a family signature.

Direct PRINTS access:
•By accession number
•By PRINTS code
•By database code
•By text
•By sequence
•By title
•By number of motifs
•By author
•By query language

3. BLOCKS
•Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.
•The BLIMPS (BLocks IMProved Searcher) program searches the BLOCKS database.
4. Pfam
•Groups proteins into families.
•Its entries are particularly useful when analyzing multidomain proteins.
•The biggest drawback of Pfam is its lack of biological information (annotation) of the protein families.
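
The notion of a block, an ungapped and highly conserved stretch of a multiple alignment, can be sketched as below. This is a deliberately simple illustration and not the procedure used to build the BLOCKS database. The conservation threshold, minimum length, function names and the toy alignment are all assumptions.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=0.75):
    """Flag each column that has no gap and one residue in >= min_fraction of rows."""
    flags = []
    for column in zip(*alignment):
        if "-" in column:
            flags.append(False)
            continue
        top_count = Counter(column).most_common(1)[0][1]
        flags.append(top_count / len(column) >= min_fraction)
    return flags


def find_blocks(alignment, min_length=3, min_fraction=0.75):
    """Return (start column, aligned segments) for each run of conserved columns."""
    flags = conserved_columns(alignment, min_fraction)
    blocks, start = [], None
    # A trailing False closes a run that reaches the end of the alignment.
    for i, conserved in enumerate(flags + [False]):
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_length:
                blocks.append((start, [seq[start:i] for seq in alignment]))
            start = None
    return blocks


# Hypothetical toy alignment of three sequences ("-" marks a gap).
toy_alignment = ["ACDEF-GHIK", "ACDEFWGHIK", "ACDQF-GHIK"]
print(find_blocks(toy_alignment))
```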

Composite database
•A composite database combines information from various primary databases and makes it convenient to search for the desired information without querying all of these primary databases.
Composite protein sequence databases:
•Non-redundant database (NRDB)
•Non-redundant protein sequence database (OWL)
•MIPSX
•SWISS-PROT + TrEMBL

OWL:
•A composite protein sequence database.
•OWL supports fast similarity searches because it is non-redundant, which makes it highly compact.
Non-redundant database (NRDB)
•It is a composite database formed from PDB sequences, SWISS-PROT, PIR and TrEMBL.
•It contains non-identical sequences and hence is bigger than OWL, but it is less efficient to search.
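
A minimal sketch of how a composite, non-redundant collection could be assembled from several primary sources is given below. It removes only exactly identical sequences, which is an assumption made for illustration, and the actual redundancy criteria of OWL and NRDB are not reproduced here. The function name, identifiers and sequences are hypothetical.

```python
def build_nonredundant(sources):
    """Merge entries from several sources, keeping each distinct sequence once.

    sources maps a source name to a list of (identifier, sequence) pairs.
    Every merged record remembers all identifiers that shared its sequence.
    """
    merged = {}
    for source_name, entries in sources.items():
        for identifier, sequence in entries:
            key = sequence.upper()
            if key not in merged:
                merged[key] = {"sequence": key, "ids": []}
            merged[key]["ids"].append(f"{source_name}:{identifier}")
    return list(merged.values())


# Hypothetical entries: A1 and B1 carry the same sequence and are merged.
example_sources = {
    "SWISS-PROT": [("A1", "MKV"), ("A2", "MAA")],
    "PIR": [("B1", "mkv"), ("B2", "MGG")],
}
for merged_record in build_nonredundant(example_sources):
    print(merged_record["sequence"], merged_record["ids"])
```

A single search over the merged list then replaces separate searches over each source.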

MIPSX
•A merged database.
•Produced at the Max-Planck Institute in Martinsried. The database contains information merged from several resources.
SWISS-PROT + TrEMBL
•The combination of SWISS-PROT and TrEMBL provides a combined resource that contains fewer errors.
•Not truly non-redundant.

Important database search tools:
SEARCH TOOL                                  FUNCTION PROVIDED
BLAST (Basic Local Alignment Search Tool)    Used to analyze sequence information and detect homologous sequences.
ENTREZ                                       Used to access literature, sequence and structure databases.
DNAPLOT                                      Sequence alignment tool.
LOCUS LINK                                   Accessing information on homologous genes.
STRUCTURE                                    Supports the Molecular Modeling Database (MMDB) and software tools for structure analysis.
TAXONOMY BROWSER                             Taxonomic classification of various species as well as genetic information.
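
As one example of how such a search tool can be reached from a program, the sketch below builds a query URL for the esearch endpoint of NCBI's Entrez E-utilities. It only constructs the URL and sends nothing over the network. The helper name and the example search term are assumptions.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(database, term, max_results=20):
    """Build an Entrez esearch URL for a database (e.g. nucleotide, protein, pubmed)."""
    params = {
        "db": database,
        "term": term,
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query term chosen only for illustration.
print(esearch_url("nucleotide", "BRCA1[Gene Name]"))
```

Fetching the URL (for instance with urllib.request) would return the identifiers of matching records, which is the step a live program would add.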

Applications
•Protein sequence
•Determination of macromolecular structure
•Molecular evolution
•Biological databases in medicine
•Biological databases in agriculture
•Drug development
•Sequence alignment
•Evolutionary studies

Conclusion:
•Biological databases represent an invaluable resource in
support of biological research.
•Access to biological databases is so important that today virtually
every molecular biological project starts and ends with querying
biological databases.
