1. Databases for bioinformatics and their types

DrBeenishAftab 178 views 14 slides May 09, 2024


WHAT is a database?
•A collection of data that needs to be:
–Structured
–Searchable
–Updated (periodically)
–Cross referenced
•Challenge:
–To change “meaningless” data into useful information that can be
accessed and analysed in the best way possible.
For example:
HOW would YOU organise all biological sequences so that the
biological information is optimally accessible?
You need an appropriate database management system (DBMS)

DBMS
•Internal organization
–Controls speed and
flexibility
•A set of programs that
–Store
–Extract
–Modify
[Diagram: USER(S), DBMS (Store, Extract, Modify), Database]
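The three DBMS operations listed above (store, extract, modify) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name `sequence`, its columns and the example sequence are assumptions, not part of the slides.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (id INTEGER PRIMARY KEY, name TEXT, residues TEXT)"
)

# Store: add a new record.
conn.execute(
    "INSERT INTO sequence (name, residues) VALUES (?, ?)", ("seq1", "ATGC")
)

# Extract: read the record back.
stored = conn.execute(
    "SELECT residues FROM sequence WHERE name = ?", ("seq1",)
).fetchone()[0]

# Modify: change the record in place, then read it again.
conn.execute(
    "UPDATE sequence SET residues = ? WHERE name = ?", ("ATGCGG", "seq1")
)
modified = conn.execute(
    "SELECT residues FROM sequence WHERE name = ?", ("seq1",)
).fetchone()[0]

print(stored, modified)
```

The user never touches the storage layout directly; every operation goes through the DBMS.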

DBMS organisation types
•Flat file databases (flat DBMS)
–Simple, restrictive, table
•Hierarchical databases (hierarchical DBMS)
–Simple, restrictive, tree of parent-child records
•Relational databases (RDBMS)
–Complex, versatile, tables
•Object-oriented databases (ODBMS)
–Complex, versatile, objects

Relational databases
•Data is stored in multiple related tables
•Data relationships across tables can be
either many-to-one or many-to-many
•A few rules allow the database to be
viewed in many ways
•Let's convert the “course details” to a
relational database

Course details: our flat file database

Name       Depart.    Course   E1  E2  E3  P1  P2
Student 1  Chemistry  Biology  A   B   B   A   C   …..
Student 2  Ecology    Maths    A   D   A   A   A   …..
...
Student 2  Ecology    Biology  A   B   A   A   A   …..
Student 1  Chemistry  English  A   A   A   A   A   …..
...
Student 1  Chemistry  Maths    C   C   B   A   A   …..
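The redundancy in this flat file can be made visible with a short sketch. The comma-separated layout is an assumption made for illustration; the five rows are the ones shown above.

```python
import csv
import io
from collections import Counter

# The five visible rows of the flat file, written as CSV for illustration.
flat_file = """Name,Depart.,Course,E1,E2,E3,P1,P2
Student 1,Chemistry,Biology,A,B,B,A,C
Student 2,Ecology,Maths,A,D,A,A,A
Student 2,Ecology,Biology,A,B,A,A,A
Student 1,Chemistry,English,A,A,A,A,A
Student 1,Chemistry,Maths,C,C,B,A,A
"""

rows = list(csv.DictReader(io.StringIO(flat_file)))

# Count how often each (student, department) pair is written out in full.
pair_counts = Counter((row["Name"], row["Depart."]) for row in rows)

for pair, count in pair_counts.items():
    print(pair, "stored", count, "times")
```

Every repeated pair is information stored more than once, which is what the normalization steps remove.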

Normalize (1NF) …
•We remove repeating records (rows)

sID  Name      dID
1    Student1  1
2    Student2  2

cID  Course
1    Biology
2    Maths
3    English

dID  Department
1    Chemistry
2    Ecology

sID  cID  E1  E2  E3  P1  P2
1    1    A   B   B   A   C   …..
2    2    A   D   A   A   A   …..
...
2    1    A   B   A   A   A   …..
1    3    A   A   A   A   A   …..
...
1    2    C   C   B   A   A   …..

Primary keys: sID, cID and dID, each in the table that defines it
Foreign keys: dID in the student table; sID and cID in the grades table
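A minimal sketch of these tables in SQLite, showing how primary and foreign keys are declared and enforced. The column names follow the slide; the table names (`department`, `student`, `course`, `result`) are assumptions, since the slide leaves the tables unnamed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dID INTEGER PRIMARY KEY, Department TEXT);
CREATE TABLE student (sID INTEGER PRIMARY KEY, Name TEXT,
                      dID INTEGER REFERENCES department(dID));
CREATE TABLE course (cID INTEGER PRIMARY KEY, Course TEXT);
CREATE TABLE result (sID INTEGER REFERENCES student(sID),
                     cID INTEGER REFERENCES course(cID),
                     E1 TEXT, E2 TEXT, E3 TEXT, P1 TEXT, P2 TEXT,
                     PRIMARY KEY (sID, cID));
""")

conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Chemistry"), (2, "Ecology")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Student1", 1), (2, "Student2", 2)])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(1, "Biology"), (2, "Maths"), (3, "English")])
conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, "A", "B", "B", "A", "C"),
    (2, 2, "A", "D", "A", "A", "A"),
    (2, 1, "A", "B", "A", "A", "A"),
    (1, 3, "A", "A", "A", "A", "A"),
    (1, 2, "C", "C", "B", "A", "A"),
])

# A result row for a student that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO result VALUES (99, 1, 'A', 'A', 'A', 'A', 'A')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

result_count = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
print(rejected, result_count)
```

Each student and department name is now stored once, and the keys keep the tables consistent with each other.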

Normalize (2NF) …
•We remove redundant fields (columns)

sID  Name      dID
1    Student1  1
2    Student2  2

cID  Course
1    Biology
2    Maths
3    English

gID  Grade
1    A
2    B
3    C

dID  Department
1    Chemistry
2    Ecology

wID  Project
1    E1
2    E2
3    E3
4    P1
5    P2

sID  cID  gID  wID
1    1    1    1
1    1    2    2
1    1    2    3
1    1    1    4
1    1    3    5
2    1    1    1
2    1    1    2
2    1    2    3
2    1    1    4
2    1    1    5
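The key-only table is hard for a person to read, but a join restores the readable view. The sketch below assumes hypothetical table names (`student`, `course`, `grade`, `work`, `result`) and loads only the rows for student 1 in course 1 from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE course  (cID INTEGER PRIMARY KEY, Course TEXT);
CREATE TABLE grade   (gID INTEGER PRIMARY KEY, Grade TEXT);
CREATE TABLE work    (wID INTEGER PRIMARY KEY, Project TEXT);
CREATE TABLE result  (sID INTEGER, cID INTEGER, gID INTEGER, wID INTEGER);

INSERT INTO student VALUES (1, 'Student1'), (2, 'Student2');
INSERT INTO course  VALUES (1, 'Biology'), (2, 'Maths'), (3, 'English');
INSERT INTO grade   VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO work    VALUES (1, 'E1'), (2, 'E2'), (3, 'E3'), (4, 'P1'), (5, 'P2');
INSERT INTO result  VALUES (1,1,1,1), (1,1,2,2), (1,1,2,3), (1,1,1,4), (1,1,3,5);
""")

# Replace every key by the value it points to.
rows = conn.execute("""
    SELECT s.Name, c.Course, w.Project, g.Grade
    FROM result AS r
    JOIN student AS s ON s.sID = r.sID
    JOIN course  AS c ON c.cID = r.cID
    JOIN grade   AS g ON g.gID = r.gID
    JOIN work    AS w ON w.wID = r.wID
    WHERE r.sID = ? AND r.cID = ?
    ORDER BY w.wID
""", (1, 1)).fetchall()

grades = [grade for _, _, _, grade in rows]
print(grades)
```

The joined rows match the first line of the flat file (Student 1, Biology: A B B A C), so no information was lost by splitting the data into tables.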

Relational Databases
•What have we achieved?
–No repeating information
–Less storage space
–Better reality representation
–Easy modification/management
–Easy usage of any combination of records
Remember:
the DBMS has programs to access and edit this
information, so it does not matter that tables made of
keys are hard for a human to read

Accessing database information
•A request for data from a database is
called a query
•Queries can be of three forms:
–Choose from a list of parameters
–Query by example (QBE)
–Query language
Query by Example (QBE) reports allow end users to query, insert, update, and delete
values into a database table or view.
In the QBE build wizard, you choose which data to display in the report. Or, you can
allow end users to make their own queries in the QBE report's customization form.
Because the QBE system formulates the actual query, QBE is easier to learn than
formal query languages, such as the standard Structured Query Language (SQL).
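The difference between query by example and a query language can be sketched as follows. This is a toy illustration of the idea, not the QBE wizard described above: the `enrolment` table and the `query_by_example` helper are hypothetical. The user fills in the known fields of an example record, and the helper formulates the SQL.

```python
import sqlite3

# Hypothetical table for illustration; rows follow the course-details example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrolment (Name TEXT, Department TEXT, Course TEXT)")
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?, ?)",
    [
        ("Student 1", "Chemistry", "Biology"),
        ("Student 2", "Ecology", "Maths"),
        ("Student 2", "Ecology", "Biology"),
        ("Student 1", "Chemistry", "English"),
        ("Student 1", "Chemistry", "Maths"),
    ],
)

COLUMNS = ("Name", "Department", "Course")


def query_by_example(connection, example):
    """Formulate and run a SELECT from an example record.

    Fields set to None are left unconstrained; column names are checked
    against a fixed list and values are passed as bound parameters.
    """
    filters = {col: val for col, val in example.items() if val is not None}
    unknown = set(filters) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    sql = "SELECT Name, Department, Course FROM enrolment"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    sql += " ORDER BY Name, Course"
    return connection.execute(sql, tuple(filters.values())).fetchall()


# Query by example: fill in the fields you know.
by_example = query_by_example(
    conn, {"Name": None, "Department": "Chemistry", "Course": "Maths"}
)

# Query language: the same request written directly in SQL.
by_sql = conn.execute(
    "SELECT Name, Department, Course FROM enrolment "
    "WHERE Department = ? AND Course = ? ORDER BY Name, Course",
    ("Chemistry", "Maths"),
).fetchall()

print(by_example == by_sql)
```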

Distributed databases
•From local to global attitude
•Data appears to be in one location but is most definitely
not
•A definition: Two or more data files in different locations,
periodically synchronized by the DBMS to keep data in
all locations consistent (A,B,C)
•An intricate network for combining and sharing
information
•Administrators praise fast network technologies!!!
•Users praise the internet!!!
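The periodic synchronization in the definition above can be sketched with three in-memory copies (A, B, C). The "highest version wins" rule and the record keys are assumptions made for illustration; a real distributed DBMS uses considerably more elaborate protocols.

```python
def synchronize(replicas):
    """Bring all replicas to the same state.

    Each replica maps a record key to a (version, value) pair. For every
    key the copy with the highest version number wins (an assumed rule).
    """
    merged = {}
    for replica in replicas:
        for key, (version, value) in replica.items():
            if key not in merged or version > merged[key][0]:
                merged[key] = (version, value)
    for replica in replicas:
        replica.clear()
        replica.update(merged)
    return merged


# Three locations holding partly different data (hypothetical records).
site_a = {"seq1": (1, "draft annotation")}
site_b = {"seq1": (2, "reviewed annotation")}
site_c = {"seq2": (1, "new entry")}

synchronize([site_a, site_b, site_c])
print(site_a == site_b == site_c)
```

After synchronization a user gets the same answer from any of the three locations, which is why the data appears to be in one place.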

Three main Points
•Database proliferation
–Dozens to hundreds at the moment
•More and more scientific discoveries result
from inter-database analysis and mining
•Rising complexity of required data combinations
–E.g. translational medicine: “from bench to
bedside” (genomic data vs. clinical data)
Proliferation = great and rapid increase in numbers;
Grid = a network of evenly spaced horizontal and vertical lines;
Semantic = related to meaning

Biological databases
•Like any other database
–Data organization for optimal analysis
•Data is of different types
–Raw data (DNA, RNA, protein sequences)
–Curated data (DNA, RNA and protein
annotated sequences and structures,
expression data)

A few biological databases
•Nucleotide Databases
Alternative Splicing, EMBL-Bank, Ensembl, Genomes Server, Genome,
MOT, EMBL-Align, Simple Queries, dbSTS Queries, Parasites, Mutations,
IMGT
•Genome Databases
Human, Mouse, Yeast, C. elegans, FlyBase, Parasites
•Protein Databases
Swiss-Prot, TrEMBL, InterPro, CluSTr, IPI, GOA, GO, Proteome Analysis,
HPI, IntEnz, TrEMBLnew, SP_ML, NEWT, PANDIT
•Structure Databases
PDB, MSD, FSSP, DALI
•Microarray Database
ArrayExpress
•Literature Databases
MEDLINE, Software Biocatalog, FlyBase Archives
•Alignment Databases
BAliBASE, Homstrad, FSSP

A short word on problems
•Even today we face some key limitations
–There is no standard format
•Every database or program has its own format
–There is no standard nomenclature
•Every database has its own names
–Data is not fully optimized
•Some datasets have missing information without indications
of it
–Data errors
•Data is sometimes of poor quality, erroneous, misspelled
•Error propagation resulting from computer annotation
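Two of these problems, differing field names and missing information without any indication, can be illustrated with a small sketch. The two record layouts, the field maps and the identifiers below are hypothetical and do not correspond to any real database format.

```python
# Hypothetical records from two sources that name the same fields differently.
record_a = {"accession": "X0001", "organism": "Homo sapiens", "seq": "ATGC"}
record_b = {"id": "X0002", "species": "", "sequence": "GGCC"}

# Maps from a common field name to each source's own field name.
MAP_A = {"identifier": "accession", "organism": "organism", "sequence": "seq"}
MAP_B = {"identifier": "id", "organism": "species", "sequence": "sequence"}


def to_common(record, field_map):
    """Rename fields to a common schema and report missing values explicitly."""
    common = {
        target: (record.get(source) or None)
        for target, source in field_map.items()
    }
    missing = sorted(name for name, value in common.items() if value is None)
    return common, missing


common_a, missing_a = to_common(record_a, MAP_A)
common_b, missing_b = to_common(record_b, MAP_B)

print(missing_a, missing_b)
```

Reporting the empty organism field explicitly, instead of passing an empty string along, keeps the gap visible to downstream analyses.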