BASICS OF BIOINFORMATICS.
PART-1
-JAYATI SHRIVASTAVA
“Bioinformatics”
“The mathematical, statistical and computing methods that
aim to solve biological problems using DNA and amino acid
sequences and related information.” ~ Fredj Tekaia.
• General definition: computational techniques for solving
biological problems.
–data problems: representation (graphics), storage and
retrieval (databases), analysis (statistics, artificial
intelligence, optimization, etc.)
–biology problems: sequence analysis, structure or function
prediction, data mining, etc.
•It essentially conceptualizes molecular biology in the sense
of physical chemistry and then applies “informatics” techniques,
derived from computer science, mathematics and statistics, to
understand the information associated with these molecules on a
large scale.
Need for Bioinformatics
•When methods for DNA sequencing became widely available in the early
1980s, molecular sequence data started to grow exponentially. After the
sequencing of the first microbial genome in 1995, the genomes of more than
100 organisms have been sequenced, and large-scale genome sequencing
projects have evolved into routine, though still non-trivial, procedures
(Janssen et al., 2003; Kanehisa and Bork, 2003). The need for efficient and
powerful tools and databases became obvious during the Human Genome
Project, which was completed several years ahead of schedule. The
accumulated data was stored in the first genomic databases, such as GenBank,
the European Molecular Biology Laboratory Nucleotide Sequence Database
(EMBL), and the DNA Data Bank of Japan (DDBJ).
•As an example, the number of gene sequence entries in GenBank has
increased from 1,765,847 to 22,318,883 in the last five years.
These entries tend to double every 15 months (Benson et al., 2002).
•There are two major challenging areas in bioinformatics:
(1) data management and
(2) knowledge discovery.
Fig. 1. The growth of data in GenBank
(source: http://www.ncbi.nih.gov/Genbank/genbankstats.html)
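The GenBank figures quoted above can be checked against the stated doubling time with a short calculation. This is only a sketch: it assumes the growth is purely exponential and that "five years" means exactly 60 months, and the function name doubling_time_months is made up for illustration.

```python
import math


def doubling_time_months(start_count, end_count, elapsed_months):
    """Doubling time implied by exponential growth between two counts."""
    # Number of times the count doubled over the elapsed period.
    doublings = math.log2(end_count / start_count)
    return elapsed_months / doublings


# Entry counts quoted in the text; 60 months is an assumed reading of "five years".
start_entries = 1_765_847
end_entries = 22_318_883

months = doubling_time_months(start_entries, end_entries, 60)
print(f"Implied doubling time: {months:.1f} months")
```

Under these assumptions the implied doubling time comes out a little over 16 months, which is in line with the roughly 15 months cited from Benson et al. (2002).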
•Our body is made up of trillions of cells. According to the Human Genome
Project, the number of genes in each cell is approximately 20,000.
•This microscopic cell has an ultramicroscopic command centre called the
nucleus, within which 2 m of DNA is elegantly packaged. The number of
nucleotides is approximately 3 × 10^9. That is an enormous amount of data in
a single cell! How could we store, access and analyze this data? Here comes
the use of computers.
•We developed and used computers for exactly this purpose: efficient data
storage, retrieval and analysis. With the advancement in sequencing
technology, each day thousands of nucleotides of different organisms are
sequenced and submitted to the databases worldwide.
•In bioinformatics, computers are used in the same way as before, but the
data is biological data, the letters of life.
•We are now facing an information overload. There are loads of sequence
data, but the real challenge is to make sense of this data.
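To give a feel for the scale mentioned above, the storage needed for about 3 × 10^9 nucleotides can be estimated in a few lines. This is a rough sketch under stated assumptions: one byte per base for plain text, two bits per base for a packed A/C/G/T encoding, and no headers, quality scores or compression. The helper name storage_bytes is hypothetical.

```python
GENOME_LENGTH = 3 * 10**9  # nucleotides, the figure quoted in the text


def storage_bytes(n_bases, bits_per_base):
    """Bytes needed to store n_bases at a fixed number of bits per base."""
    total_bits = n_bases * bits_per_base
    return (total_bits + 7) // 8  # round up to whole bytes


# Plain text: one 8-bit character per base.
text_bytes = storage_bytes(GENOME_LENGTH, 8)
# Packed: four possible bases (A, C, G, T) fit in 2 bits each.
packed_bytes = storage_bytes(GENOME_LENGTH, 2)

print(f"Plain text : {text_bytes / 1e9:.2f} GB")
print(f"2-bit pack : {packed_bytes / 1e6:.0f} MB")
```

The point is not the exact numbers but that a single genome already runs to gigabytes as text, and databases hold many such sequences.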
History and Landmark Events in the Field
of Bioinformatics
•1965 Margaret Dayhoff’s Atlas of Protein Sequences
•1970 Needleman-Wunsch algorithm
•1977 DNA sequencing and software to analyze it
•1981 Smith-Waterman algorithm developed
•1981 The concept of a sequence motif
•1982 GenBank release 3 made public
•1982 Phage lambda genome sequenced
•1983 Sequence database searching algorithm
•1985 FASTP/FASTN: fast sequence similarity searching
•1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM
•1988 EMBnet network for database distribution
•1990 BLAST: fast sequence similarity searching
•1991 EST: expressed sequence tag sequencing
•1993 Sanger Centre, Hinxton, UK
•1994 EMBL European Bioinformatics Institute, Hinxton, UK
•1995 First bacterial genome completely sequenced
•1996 Yeast genome completely sequenced
•1997 PSI-BLAST
•1998 Worm genome completely sequenced
•1999 Fly genome completely sequenced
Founders of Bioinformatics: Margaret O. Dayhoff,
Paulien Hogeweg and Ben Hesper
Scope Of Bioinformatics
1. Pharmacology and CADD
Pharmacology is the science of how drugs act on biological systems and how
the body responds to the drug. The study of pharmacology encompasses the
sources, chemical properties, biological effects and therapeutic uses of drugs.
Computer-Aided Drug Design (CADD) emerged as an efficient means of identifying
potential lead compounds and of aiding the development of possible drugs for a wide
range of diseases [8, 9]. Today, a number of computational approaches are being used to
identify potential lead molecules from huge compound libraries.
•Bioinformatics helps accelerate drug target identification and
validation, drug discovery and the characterization of side effects, and
also helps us predict drug resistance.
•It is also used in the development of biomarkers:
Toxicogenomics (how genes and proteins respond to toxic substances)
Pharmacogenomics (the role of the genome in drug response)
Both of these tools are used to maximize the therapeutic benefit of drugs.
In the Next 10 Years
We will see quantum computing, which will be highly beneficial for
CADD.
“Quantum computing is a rapidly-emerging technology that
harnesses the laws of quantum mechanics to solve problems too
complex for classical computers.”
2.Proteomics
•Proteomics is the “extensive study of proteins”.
Proteomics is used to investigate:
•when and where proteins are expressed
•rates of protein production, degradation, and steady-state abundance
•how proteins are modified (for example, post-translational modifications (PTMs) such
as phosphorylation)
•the movement of proteins between subcellular compartments
•the involvement of proteins in metabolic pathways
•how proteins interact with one another
Proteomics can provide significant biological information for many biological problems,
such as:
•which proteins interact with a particular protein of interest (for example, the tumour
suppressor protein p53)?
•which proteins are localised to a subcellular compartment (for example, cell
membrane)?
•which proteins are involved in a biological process (for example, circadian rhythm)?
•THESE PROCESSES OF PROTEOMICS DEPEND HEAVILY ON BIOINFORMATICS.
•This means that as the applications of proteomics expand, the field of
bioinformatics will expand with them.
3. Centralized Data Analysis
•Bioinformatics provides globally accessible databases that enable
scientists to search, submit and analyse information.
•This global collaboration will grow by leaps and bounds.
•Thus learning bioinformatics can put us on the global map of collaboration.
4. Cancer bioinformatics
Cancer bioinformatics provides a
unique and outstanding platform
and opportunity for scientists to
integrate omics science,
bioinformatics tools and data,
clinical research, disease-specific
biomarkers and dynamic networks
with precision medicine, together
fighting cancer and improving the
quality of life of patients with cancer.
Role of Bioinformatics in Cancer Research and drug Development
Source: https://doi.org/10.1016/B978-0-323-89824-9.00011-2
Figure. Example schematic of use of
personalised medicine. From ABPI 2016
5. Personalized Medicine
The concept of personalised medicine:
the right medicine at the right dose for
the right patient.
What is meant by personalised medicine?
A form of medicine that uses information
about a person's own genes or proteins to
prevent, diagnose, or treat disease.
Personalized medicine is an emerging practice
of medicine with strong potential for growth.
Summary diagrams for patient treatment in RA and psoriasis (A), Alzheimer’s disease (B) and
the scheme of future personalized therapy (C)
Future Prospects of Personalized Medicine for Each Disease
6. Agriculture
Within the agricultural industry,
bioinformatics has been used to
expand the current understanding of
various plant functions, protect plants
against harmful stressors, and
improve plant quality for human
consumption.
Bioinformatics is playing an increasingly
important role in the collection, storage, and
analysis of genomic data.
Some of the different ways in which
bioinformatics tools and methods are used in
agriculture, which is collectively referred to as
agri-informatics, primarily include the
improvement of plant resistance against both
biotic and abiotic stressors and enhancement
of the nutritional quality in depleted soils.
In addition to these purposes, gene discovery
through the use of computer software has
also allowed researchers to develop targeted
methods for the improvement of seed quality,
incorporate added micronutrients into plants
for enhanced human health, and engineer
plants with phytoremediation capabilities.
7. Systems Biology and Bioinformatics
Systems biology examines the interactions
between several components rather than the
individual features of the molecules, in order to
understand the phenotype resulting from the
components of the system. To this end,
computational approaches are employed in
systems biology to create possible in silico models
that can also be verified experimentally in
vivo or in vitro, thus allowing the analysis of
large amounts of data. In the study of biological
systems, various computational tools are used
including techniques for sequence alignment and
for recording molecular dynamics, molecular
interactions and discovering or predicting the
molecular structure.
Figure 1.
Hypothesis-driven research in systems biology
Systems biology includes the computational analysis of extensive experimental data in the field of
pharmacology, namely systems pharmacology. Systems pharmacology is focused on the study of
drugs, identifying new drug targets, repurposing existing drugs and analyzing the properties and
effects of known drugs at a systems level. Addressing the complexity of the cellular networks and
the mode of action of a drug can lead to a better understanding of the side effects and adverse events
of a drug and the identification of off-targets, improving the safety and effectiveness of disease
treatment. In the past decade, systems-based applications have provided better insights into
drug-drug interactions, drug-target networks, drug-target interactions, and drug side-effects,
leading to novel drug discovery.
8. Genetics and Genomics in Bioinformatics
Genetics is the scientific study of genes
and heredity—of how certain qualities or
traits are passed from parents to offspring
as a result of changes in DNA sequence. A
gene is a segment of DNA that contains
instructions for building one or more
molecules that help the body work.
Genomics is the study of all of
a person's genes (the
genome), including
interactions of those genes
with each other and with the
person's environment
Both genomics and genetics apply bioinformatics and computational
techniques to generate data from DNA and RNA sequences.
Figure. Schematic illustration of the cases giving rise to the need for an
immunoinformatics vaccine development approach
9. Immunoinformatics and vaccine discovery
Immuno-informatics is the
intersection between
experimental immunology and
computational biology.
Here we can study host-pathogen
interactions; it is also used to identify the
functions of unknown genes.
FIGURE 1. Flow diagram of design strategy, representing the steps of the construct of the
multi-epitope subunit vaccine
During COVID-19 this field grew rapidly.
10. Neuroinformatics
Neuroinformatics refers to a research field that focuses on organizing neuroscience data
through analytical tools and computational models. It combines data across all scales and
levels of neuroscience in order to understand the complex functions of the brain and work
toward treatments for brain-related illness. Neuroinformatics involves the techniques and
tools for acquiring, sharing, storing, publishing, analyzing, modeling, visualizing and
simulating data.
Neuroinformatics helps
researchers to work
together and share data
across different facilities
and different countries
through the exchange of
approaches and tools for
integrating and
analysing data. This field
makes it possible to
integrate any type of
data across various
biological organization
levels.
The benefits of neuroinformatics include:
•Advancement in neuroscience and
improvement in the treatment of several
neurological disorders.
•The enhancement of researchers' knowledge.
Neuroinformatics enables them to understand
the working pattern of particular
neurological functions by letting them
trace specific functions
inside computerized models.
•The accumulation of huge volumes of new
data for creating more sophisticated models
for testing.
Sequences and Nomenclature
The nomenclature system we adopt in Bioinformatics work is based on the International Union
of Pure and Applied Chemistry (IUPAC) recommendations. It is useful to follow this
nomenclature system so that data sets from different laboratories situated around the world can
be compared easily and uniformly.
DNA and Protein sequences
Figure. Summary of single-letter code IUPAC
recommendations
The first 4 bases G, A, T, C, their symbols
and the basis for nomenclature are clear.
While determining sequence data through
experiments, sometimes, the sequence
identity at a particular position may not be
clearly identifiable due to compression
artifacts or other secondary structure related
problems. In most cases the problem can be
solved by repeating the experiment and also
by sequencing the complementary strand. In
a few cases, ambiguities may persist. In
such cases, the most probable results are
inferred from the chromatograms.
For instance, at a position where the
ambiguity is not resolvable between a 'G' or
a 'C' but one can be sure that there is no
possibility of 'A' or 'T' in the same position,
then the symbol to be used is 'S'.
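The 'S' example above can be sketched as a small lookup. The table below follows the standard IUPAC single-letter nucleotide codes; the names IUPAC_BASES and ambiguity_symbol are made up for this illustration and do not come from any particular library.

```python
# IUPAC nucleotide codes: symbol -> the bases it may stand for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Reverse lookup: set of possible bases -> single-letter symbol.
SYMBOL_FOR = {frozenset(bases): symbol for symbol, bases in IUPAC_BASES.items()}


def ambiguity_symbol(possible_bases):
    """Return the IUPAC symbol for the bases that remain possible at a position."""
    key = frozenset(base.upper() for base in possible_bases)
    if key not in SYMBOL_FOR:
        raise ValueError(f"No IUPAC symbol for bases: {sorted(key)}")
    return SYMBOL_FOR[key]


# Position unresolved between G and C, with A and T ruled out.
print(ambiguity_symbol("GC"))
# Nothing can be ruled out at this position.
print(ambiguity_symbol("ACGT"))
```

The first call gives 'S', matching the example in the text, and the second gives 'N', the symbol for any base.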
In most organisms, DNA is double-stranded. The two strands are anti-parallel and
complementary to each other (following Watson-Crick base-pairing). However, a problem
arises when we start encountering the symbols that mean more than one base at a given
position. Again, the IUPAC system comes to our aid. The symbols to be used in the
complementary strand, corresponding to the symbol at the same position in a given strand,
are specified in the figure below. In certain cases, the complementary symbols are the same
as in the given strand because in both cases they mean the same set of bases.
Figure. Definition of complementary symbols
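The complementary symbols can be applied in code with a simple translation table. This is a minimal sketch using the standard IUPAC complement pairs (A/T, C/G, R/Y, K/M, B/V, D/H, with S, W and N unchanged); the function names are chosen for this example only.

```python
# Each symbol in the first string is complemented by the symbol at the
# same index in the second string.
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVN",
    "TGCAYRSWMKVHDBN",
)


def complement(seq):
    """Complement symbol by symbol, keeping the original order."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq):
    """Complementary strand read 5' to 3' (strands are anti-parallel)."""
    return complement(seq)[::-1]


print(complement("GATSRN"))          # S and N stay the same, R becomes Y
print(reverse_complement("GATSRN"))
```

'S' (G or C) and 'W' (A or T) map to themselves because complementing each member of the set gives the same set back, which is the case described in the text.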
The symbols and their meaning for the protein sequences are presented in the
figure below. It is evident that very few symbols mean more than one amino
acid.
Figure. Symbol definitions for the amino acids.
BIOTECHNOLOGY INFORMATION SYSTEM
NETWORK (BTISnet)
•It is a National Bioinformatics Network.
•In 1987 India became the first country in the world to establish a Biotechnology
Information System (BTIS) network to create an infrastructure for biotechnology through
the application of bioinformatics.
•The Department of Biotechnology (DBT), Ministry of Science and
Technology, Government of India has taken up this infrastructure development project and
created a distributed network at a very low cost.
•BTISnet is run by the Department of Biotechnology, Government of India.
•BTIS is today recognized as one of the major scientific networks in the world dedicated to
providing state-of-the-art infrastructure, education, manpower and tools in
bioinformatics.
Need for BTIS
•Research and development activities in modern biology and biotechnology are
highly information-dependent.
•The growth of biotechnology has accelerated, particularly during the last decade,
due to path-breaking advancements in biology and new technologies that
produce large volumes of high-quality data.
•The rate of growth of these data has been estimated to be more than 200
million bases per year.
•The content of the database itself is doubling in size approximately every year.
The large amounts of data generated through various forms are serving as a
source of knowledge to the scientists engaged in the field of biotechnology.
•The analysis of such large data and the extraction of knowledge from this data is
possible only with the help of new algorithms and compute intensive
The broad objectives of the Biotechnology Information System Network programme are:
•To provide a national bioinformation network designed to bridge the inter-disciplinary gaps in
biotechnology information and establish links among scientists in organizations involved in R&D
and manufacturing activities in the country.
•To build information resources, prepare databases on biotechnology and to develop relevant
information handling tools and techniques.
•To continuously assess information requirements, create and improve necessary infrastructure
and to provide informatics based support and services to the national community of users
working in biotechnology and allied areas.
•To coordinate efforts to access Biotechnology information worldwide including establishing
linkages with some of the international resources of Biotechnology information (e.g. Databanks
on genetic materials, published literature, patents, and other information of scientific and
commercial value).
•To undertake research into advanced methods of computer-based information processing for
analyzing the structure and function of biologically important molecules.
•To evolve and implement programmes on education of users and training of information
scientists responsible for handling of biotechnology information and its applications to
biotechnology research and development.
•To establish regional and international cooperation for exchange of scientific information and
expertise in biotechnology through the development of appropriate network arrangements.
Resources
•Databases of BTISnet: TRABAS by University of Calcutta, Kolkata;
AgAbDb by University of Pune, Pune; PDB Goodies by Indian
Institute of Science, Bangalore, etc.
•Open Source Databases: The Gene Ontology (genome), DIANA LAB (RNA),
Protein Data Bank/PDB (protein DBs), etc.
•Software: Gene Ontology Based Prediction Analysis of MicroArray
Suite/GOPAM, Spectral Repeat Finder/SRF, etc.
BTIS Centres in India
•Centres of Excellence:
Bioinformatics Centre, DBT, New Delhi
University of Pune, Pune
Jawaharlal Nehru University (JNU), New Delhi
Madurai Kamaraj University (MKU), Madurai
Indian Institute of Science (IISc), Bangalore
Bose Institute, Kolkata
•Super Computing Facility: (IIT) New Delhi
•Distributed Information Centres (DICs) - 11: Anna University, Centre
for Cellular & Molecular Biology, Indian Agricultural Research Institute, Institute of
Microbial Technology, Kerala Agriculture University, M. S. University of Baroda,
National Brain Research Centre, National Institute of Immunology, North Eastern Hill
University (Shillong), Pondicherry University, University of Calcutta
•Distributed Information Sub Centres (SubDICs) - 51: Institute of Life Sciences,
Bhubaneswar; Indian Institute of Chemical Biology, Kolkata; Indian Institute of
Technology, Kharagpur, etc.
•Bioinformatics Infrastructure Facility (BIF) for Biology Teaching Through
Bioinformatics (BTBI) - 70: Vidyasagar University, Midnapur, West Bengal; West Bengal
University of Technology, Kolkata, West Bengal, etc.
•North Eastern State Bioinformatics Infrastructure Facility (BIF) - 28: Institute of
Advanced Study in Science and Technology, Ryan Path, Guwahati; Manipur University,
Canchipur, Manipur, etc.
Achievements:
According to data for the last five years (2015-20), the Network has published more than 1200
research articles, created 200 databases and trained more than 8000
personnel, including students and scientists. Some of the most cited web servers developed
by the network are VirulentPred, PredictBias, Bhageerath, Sanjeevini, ChemGenome2.0
and CyclinPred, etc. Some of the key highlights (for the period 2015-20) are listed below:
Sr. No.   Software type                             Numbers
1         Databases                                 206
2         Web servers                               72
3         Standalones (Databases: 5; Others: 35)    40
4         Applications Developed                    7
          Total Software developed                  325