-The application of computer science to molecular
biology, in particular to the study of macromolecules
such as proteins and nucleic acids
-Bioinformatics is an interdisciplinary research area
at the interface between computer science and
biological science
Synonyms: Molecular Bioinformatics,
Computational Biology, Biocomputing
Bioinformatics
What is bioinformatics?
Definition: Application of computational and analysis tools to the
capture and interpretation of biological data
Computational Biology is sometimes considered to be
synonymous with Bioinformatics
More commonly, Bioinformatics and Computational Biology are regarded
as overlapping terms, as might be represented by a Venn diagram
What does that mean?
Mathematics
IT/Engineering
Statistics
Processor development
Network traffic improvement
Storage solutions
Artificial Intelligence
Pattern recognition
Text mining
Image processing
Simulation
3D structure visualisation
Surface modelling
ontologies
Databases
Sequence alignment
Comparative genomics
Drug design
Protein-protein interactions
Gene finding
Protein folding
Homology searching
Evolutionary modelling
Gene expression analysis
Non-coding RNA
GWAS
Annotation
Epidemiology
Personalised medicine
Biological networks
Bioinformatics Topics
Informatics Biology
Operating Systems
Windows, Macintosh, Linux
All OS options are conceptually identical …
enabling control over files, folders, and programs
Linux command line! … the only option for
compute-intensive software
Programming
Sufficient skill to effect basic management of
large datasets is important
Sufficient skill to construct simple customized pipelines
Statistics
A basic understanding of Statistics is just as vital when
designing an experiment
Interpreting large datasets demands a
working familiarity with a quality statistical package
Bioinformatics software commonly employs statistics to
select the most probable answer from a set of many possible
answers to a given question
Data Generation
Experimental Data types include:
Sequences - Typically Next-Generation DNA Sequencing (NGS)
3D Protein Structures - X-ray crystallography or Nuclear
magnetic resonance spectroscopy (NMR)
Gene Expression Data - Microarrays
Data Analysis
The Alignment of Pairs of Homologous DNA/Protein sequences
Fundamental to most forms of DNA/Protein Sequence analysis
Searching for Homologous Sequences in a Sequence Database
Database searching is the most common Bioinformatics
process by far
Database searching is pairwise comparison repeated many times
A list of matches is generated, ordered by how improbable it is
that each match occurred just by chance
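
The idea that database searching is pairwise comparison repeated many times can be sketched in a few lines of Python. This is a toy illustration only: it scores each database entry with a simple Smith-Waterman local alignment and ranks by raw score. The scoring values, the function names (`local_score`, `search`) and the example sequences are assumptions made for this sketch. Real search tools use faster heuristics and turn scores into a statistical measure of how likely a match is to occur by chance, which this sketch does not attempt.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b.

    The scoring values are illustrative assumptions, not a published matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Compare the query against every entry and return (score, name), best first."""
    hits = [(local_score(query, seq), name) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: -hit[0])


# Hypothetical three-entry "database" used only for this example.
toy_db = {
    "unrelated": "GGGGGGGG",
    "partial": "TTACGTTT",
    "exact": "ACGTACGT",
}
for score, name in search("ACGTACGT", toy_db):
    print(name, score)
```

The ranking step is the part that corresponds to the ordered list of matches; swapping the raw score for a proper significance estimate would bring the sketch closer to what real tools report.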
Data/Information Storage/Access
Raw Experimental Data can next be Annotated in the light of
analytical revelation
Data + Annotation = Information
Information can now be stored in Databases that allow
users easy and unrestricted access
Primary DNA Sequence Databases
Original submissions by experimentalists; content controlled by the
submitter
EMBL, NCBI-GenBank, DDBJ
Primary Protein Sequence Databases
PIR, Swiss-Prot, TrEMBL
Genome Databases store entire genome sequence(s) AND their
interpretation
Protein Structure Databases
PDB, PDBj, CATH, SCOP
Gene Ontology Database
The Gene Ontology (GO) database provides a hierarchy of formally agreed terms
to describe gene products accurately and unambiguously
Searching with these terms radically improves the efficacy of
annotation searching
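
Why a hierarchy of agreed terms helps annotation searching can be shown with a small sketch. The term names, gene names and the `PARENTS` / `ANNOTATIONS` tables below are invented for illustration and are not taken from the real GO database. The point is only that a query for a broad term also finds genes annotated with any more specific term beneath it.

```python
# Toy ontology: each term lists its parent terms (illustrative names only).
PARENTS = {
    "DNA replication": ["DNA metabolic process"],
    "DNA repair": ["DNA metabolic process"],
    "DNA metabolic process": ["metabolic process"],
    "metabolic process": [],
}

# Hypothetical gene annotations, each using the most specific term known.
ANNOTATIONS = {
    "geneA": ["DNA replication"],
    "geneB": ["DNA repair"],
    "geneC": ["metabolic process"],
}


def ancestors(term):
    """Return the term itself plus every term above it in the hierarchy."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def genes_for(query_term):
    """Genes annotated with the query term or with any of its descendants."""
    return sorted(
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(query_term in ancestors(term) for term in terms)
    )


print(genes_for("DNA metabolic process"))
```

A plain free-text search for "DNA metabolic process" would miss both genes here, because neither annotation contains that exact phrase; following the hierarchy recovers them.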
A simplistic ordering for the Bioinformatics Topics
Why is bioinformatics needed?
• Small- and large-scale biological analyses
• New laboratory technologies
• Move away from single gene to whole genome
• Genome sequencing
• Collection and storage of biological information
• Manipulation of biological information
• Computers are capable of both, and are cheap
Problems and Challenges
We may know the sequence of every possible
transcript but not understand the functions of
these transcripts and their corresponding
proteins!
How to make sense of all of the gene and
protein data in order to assign functions to
these genes and proteins and to understand
biological processes at the molecular level?
Challenges
Databases and data resources
Because we need to store and retrieve lots
of data
Search and analysis tools
Because we need to infer
function by comparison
Interfaces and visualisation tools
Because we need to look at
lots of data
From gene to protein and its function(s)
> DNA sequence
AATTCATGAAAATCGTATACTGGTCTGGTACCGGCAACAC
TGAGAAAATGGCAGAGCTCATCGCTAAAGGTATCATCGAA
TCTGGTAAAGACGTCAACACCATCAACGTGTCTGACGTTA
ACATCGATGAACTGCTGAACGAAGATATCCTGATCCTGGG
TTGCTCTGCCATGGGCGATGAAGTTCTCGAGGAAAGCGAA
TTTGAACCGTTCATCGAAGAGATCTCTACCAAAATCTCTG
GTAAGAAGGTTGCGCTGTTCGGTTCTTACGGTTGGGGCGA
CGGTAAGTGGATGCGTGACTTCGAAGAACGTATGAACGGC
TACGGTTGCGTTGTTGTTGAGACCCCGCTGATCGTTCAGA
ACGAGCCGGACGAAGCTGAGCAGGACTGCATCGAATTTGG
TAAGAAGATCGCGAACATCTAGTAGA
Gene
> Protein sequence
MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVS
DVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEIS
TKISGKKVALFGSYGWGDGKWMRDFEERMNGYG
CVVVETPLIVQNEPDEAEQDCIEFGKKIANI
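
The step from the DNA sequence to the protein sequence shown above can be sketched in Python. The sketch assumes the standard genetic code, starts reading at the first ATG it finds, and stops at the first stop codon. The names `CODON_TABLE` and `translate` are chosen for this example, and the input is the first line of the DNA sequence above.

```python
from itertools import product

# Standard genetic code in the compact T, C, A, G ordering ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate from the first ATG up to (not including) the first stop codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        # Codons with ambiguous bases (e.g. N) are reported as 'X'.
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


first_line = "AATTCATGAAAATCGTATACTGGTCTGGTACCGGCAACAC"
print(translate(first_line))  # MKIVYWSGTGN
```

The output matches the start of the protein sequence on the slide. A trailing incomplete codon is simply ignored.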
Function
What is the function of these structures?
What is the function of this sequence?
What is the function of this motif?
–the fold provides a scaffold, which can be decorated
in different ways by different sequences to confer
different functions
–knowing the fold & function allows us to rationalise
how the structure effects its function at the molecular
level
Goals of Functional Genomics
Tools currently available for genomics and
functional genomics studies
Standard molecular biology and protein analysis
techniques, e.g. hybridization, 2D gel
electrophoresis, SAGE, etc.
Advanced technologies, e.g. microarray, GeneChips,
proteomics, etc.
Bioinformatics: gene annotation, gene and genome
analysis, data mining, etc.
Molecular Biology
•Central Dogma of Molecular Biology:
–molecules and processes.
•Molecular biology studies:
–structure of macromolecules (DNA, RNA and protein)
–flow and expression of genetic information.
–metabolic steps that mediate the flow of information
from the genome to the phenotype of the organism
Transcription
DNA
5’ 3’
mRNA Splicing
Translation
Poly-peptide
Folding
Protein
•Transport / Localization
•Oligomerization
•Post-Translational Modification
Function Function
We need Bioinformatics at all levels
At Genome Level
Genome Projects
need to store and
organize DNA
sequences
At Transcription Level
How do we find protein
coding regions, introns
and exons in genomic
DNA sequences?
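
One very simple starting point for this question is to scan for open reading frames (ORFs): a start codon followed, in the same reading frame, by a stop codon. The sketch below is an assumption-laden toy. It looks only at the three forward frames, ignores introns entirely, and uses a hypothetical `min_codons` cut-off, so it is closer to a first pass on intron-free sequence than to real gene finding, which relies on statistical models.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of ATG...stop stretches in the forward frames.

    'end' is exclusive and includes the stop codon. min_codons counts the
    codons before the stop and is an arbitrary illustrative threshold.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


example = "CCATGAAATTTTAGGG"
for begin, end in find_orfs(example):
    print(begin, end, example[begin:end])
```

Finding exons and introns in genomic DNA needs more than this, since a coding region interrupted by introns will not appear as one uninterrupted ORF.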
At Transcription Level
Under which
conditions is a certain
gene transcribed?
At Translation Level
What do we
know about a
specific protein?
At Translation Level
How can we
compare protein
sequences?
At Structure Level
Can we predict
protein structures?
Impact of Genomics on Medicine
I. Diagnostics
Genomics: Identifying all known human genes
Functional Genomics: Functional analysis of genes
In what tissues are they important?
When in development are the genes used?
How are they regulated?
Novel diagnostics
Linking genes to diseases and to traits
Predisposition to diseases
Expression of genes and disease
Personal Genomics
Understanding the link between genomics and environment
Increased vigilance and taking action to prevent disease
Improving health care
Impact of Genomics on Medicine
II. Therapeutics
Novel Drug Development
Identifying novel drug targets
Validating drug targets
Predicting toxicity and adverse reactions
Improving clinical trials and testing
Gene therapy
Replacing the gene rather than the gene product
Stem cell therapies
Replacing the entire cell type or tissue to cure a disease
Pharmacogenomics
Personalized medicine
Adjusting drug choice, amounts and delivery to suit patients
Maximize efficacy and minimize side effects
Identify genetics of adverse reactions
Identify patients who respond optimally
Application of bioinformatics
To clinical problems
Understanding disease
Treatment and management
Development of medicines
Tailoring treatment
Applications of Bioinformatics
Molecular Interactions
Structure Prediction
Search for new drugs
data analysis, algorithms,
visualization, statistics, etc.
DNA chips
Biochemical Networks
Genetic Variations
Optimizing therapies
Sequence Analysis
Genomes
Proteins
d1dhfa_ LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQ -NLVIMGKKTWFSI
d8dfr__ LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQ -NAVIMGKKTWFSI
d4dfra_ ISLIAALAVDRVIGMENAMPWN -LPADLAWFKRNTL--------NKPVIMGRHTWESI
d3dfr__ TAFLWAQDRDGLIGKDGHLPWH -LPDDLHYFRAQTV--------GKIMVVGRRTYESF
d1dhfa_ LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQ -NLVIMGKKTWFSI
d8dfr__ LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQ -NAVIMGKKTWFSI
d4dfra_ ISLIAALAVDRVIGMENAMPW -NLPADLAWFKRNTLD--------KPVIMGRHTWESI
d3dfr__ TAFLWAQDRNGLIGKDGHLPW -HLPDDLHYFRAQTVG--------KIMVVGRRTYESF
caaaaatagggttaatatgaatctcgatctccattttgttcatcgtattcaacaacaagcc
aaaactcgtacaaatatgaccgcacttcgctataaagaacacggcttgtggcgagatatct
cttggaaaaactttcaagagcaactcaatcaactttctcgagcattgcttgctcacaatat
tgacgtacaagataaaatcgccatttttgcccataatatggaacgttgggttgttcatgaa
actttcggtatcaaagatggtttaatgaccactgttcacgcaacgactacaatcgttgaca
ttgcgaccttacaaattcgagcaatcacagtgcctatttacgcaaccaatacagcccagca
agcagaatttatcctaaatcacgccgatgtaaaaattctcttcgtcggcgatcaagagcaa
tacgatcaaacattggaaattgctcatcattgtccaaaattacaaaaaattgtagcaatga
aatccaccattcaattacaacaagatcctctttcttgcacttgg