15-Jul-24
SYNOPSIS
INTRODUCTION
DEFINITION OF BIOINFORMATICS
HISTORY
OBJECTIVES OF BIOINFORMATICS
TOOLS OF BIOINFORMATICS
BIOLOGICAL DATABASES
HOMOLOGY AND SIMILARITY TOOLS (SEQUENCE ALIGNMENT)
PROTEIN FUNCTION ANALYSIS TOOLS
STRUCTURAL ANALYSIS TOOLS
SEQUENCE MANIPULATION TOOLS
SEQUENCE ANALYSIS TOOLS
APPLICATION
CONCLUSION
REFERENCES
INTRODUCTION
Bioinformatics is a relatively new scientific discipline concerned with the computational storage and analysis of biological data. The word is derived from two parts: bio, meaning biology, and informatics (from the French informatique), meaning 'data processing'. In bioinformatics, biology, computer science and information technology merge into a single discipline for managing and analyzing biological data with advanced computing techniques.
DEFINITION
Taking these points together, bioinformatics can be defined as the storage, analysis, and searching/retrieval of biological data (e.g. nucleic acid sequences of genes and RNAs, and amino acid sequences and structural information of proteins). Fredj Tekaia at the Institut Pasteur, Paris (France) defined bioinformatics more precisely as the mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information.
HISTORY
YEAR | SCIENTIST / ORGANIZATION | HISTORICAL EVENT
1958 | Jack Kilby | The first integrated circuit (IC) was constructed.
1971 | Ray Tomlinson | The e-mail program was invented.
1974 | Vint Cerf & Robert Kahn | The concept of connecting networks of computers into an "internet" was introduced, and the Transmission Control Protocol (TCP) was developed.
1981 | IBM | IBM introduced its Personal Computer (PC) to the market.
1984 | Apple Computer | The Macintosh was announced by Apple Computer.
1986 | SWISS-PROT | The SWISS-PROT database was created by the Department of Medical Biochemistry of the University of Geneva and the European Molecular Biology Laboratory (EMBL).
1987 | HGI (Human Genome Initiative) | NIH NIGMS began funding genome projects.
1990 | BLAST | The BLAST program was implemented.
1991 | The term "bioinformatics" | The term "bioinformatics" first appeared in the scientific literature.
OBJECTIVES OF BIOINFORMATICS
At its simplest level, bioinformatics organizes data in a way that allows researchers to access existing information and to submit new entries as they are produced, as the Protein Data Bank (PDB) does for 3D macromolecular structures. The second key objective is to develop tools and resources that aid in the analysis of data. For example, having sequenced a particular protein, a researcher will want to compare it with previously characterized sequences. The third objective is to use these tools to analyze the data and interpret the results in a biologically meaningful manner. Traditionally, biological studies examined individual systems in detail and frequently compared them with a few related systems.
TOOLS OF BIOINFORMATICS
Bioinformatics tools are software programs designed to extract meaningful information from the mass of data held in molecular biology/biological databases and to carry out sequence and structural analysis. Once the databases had been established, tools became available to search sequence databases. Bioinformatics tools can be grouped into the following categories:
Biological databases
Homology and similarity tools (sequence alignment tools)
Protein function analysis tools
Structural analysis tools
Sequence manipulation tools
Sequence analysis tools
BIOLOGICAL DATABASES
Biological databases usually contain genomic, proteomic and metabolic data, such as nucleotide sequences of genes or amino acid sequences of proteins. The major categories of biological database are:
Major nucleotide sequence databases
Major mutation databases
Major gene expression databases
Major microbial genome databases
Major organism-specific genome databases
Major protein databases
Examples include:
EMBL (European Molecular Biology Laboratory nucleotide sequence database at EBI, Hinxton, UK)
NDB (Nucleic Acid Database, a nucleic acid structure database at Rutgers University, USA)
Entrez/Genome (NCBI, USA)
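Records retrieved from sequence databases such as those above are often exchanged in FASTA format (named after the FASTA program described later): a header line starting with '>' followed by one or more sequence lines. As a minimal sketch, the hypothetical helper `parse_fasta` below reads such text into a dictionary; it assumes the identifier is the first word of each header, whereas real databases follow their own header conventions.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {identifier: sequence}.

    Assumption (sketch only): the identifier is the first whitespace-separated
    word after '>' on each header line.
    """
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Save the previous record before starting a new one
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        else:
            # Sequence lines may be wrapped; join them and normalize case
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


example = """>seq1 hypothetical nucleotide record
ATGGCC
ATTGTA
>seq2 hypothetical protein record
MAIVMGR
"""
db = parse_fasta(example)
print(db["seq1"])  # ATGGCCATTGTA
```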
HOMOLOGY AND SIMILARITY TOOLS
Homologous sequences are sequences that are related by divergence from a common ancestor. Homology itself cannot be measured directly, but the degree of similarity between two sequences can, and it is used as evidence of homology. These tools can be used to identify similarities between novel query sequences of unknown structure and function and database sequences whose structure and function have been elucidated.
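As a small illustration of measuring similarity, the sketch below computes percent identity for two sequences that have already been aligned ('-' marks a gap). The helper name and the convention of dividing by the full alignment length are assumptions; tools differ in how they define the denominator, and similarity alone does not prove homology.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two already-aligned sequences ('-' marks a gap).

    Convention assumed here: identical, non-gap positions divided by the
    alignment length (other tools divide by the shorter sequence, etc.).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


print(round(percent_identity("ACGT-A", "ACCTTA"), 1))  # 66.7
```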
1. BLAST (BASIC LOCAL ALIGNMENT SEARCH TOOL)
BLAST is a program for sequence similarity searching developed at the NCBI. It can be used to identify genes and genetic features, and it can execute sequence searches against an entire DNA database in less than 15 seconds. A BLAST search enables a researcher to compare a query sequence with a database of sequences and identify the database sequences that resemble the query sequence.
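The core seed-and-extend idea behind BLAST can be sketched in a few lines: index short words of the database sequence, find exact word hits shared with the query, and extend each hit without gaps until the running score drops too far below its best value (an X-drop rule). This is a toy illustration under simplified assumptions (identity scoring, a single subject sequence, hypothetical function names), not the NCBI implementation, which adds scoring matrices, gapped extension and statistical significance estimates (E-values).

```python
from collections import defaultdict


def word_hits(query, subject, k):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = defaultdict(list)  # word -> positions in the subject sequence
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    return [(i, j) for i in range(len(query) - k + 1)
            for j in index.get(query[i:i + k], [])]


def extend_hit(query, subject, i, j, k, match=1, mismatch=-1, x_drop=2):
    """Extend an exact word hit without gaps in both directions (X-drop rule).

    Returns (score, query_start, query_end, subject_start); query_end is exclusive.
    """
    def pair_score(a, b):
        return match if a == b else mismatch

    # Extend right until the running score falls more than x_drop below its best
    best_right = run = right_len = 0
    t = 0
    while i + k + t < len(query) and j + k + t < len(subject):
        run += pair_score(query[i + k + t], subject[j + k + t])
        t += 1
        if run > best_right:
            best_right, right_len = run, t
        elif run < best_right - x_drop:
            break
    # Extend left in the same way
    best_left = run = left_len = 0
    t = 1
    while i - t >= 0 and j - t >= 0:
        run += pair_score(query[i - t], subject[j - t])
        if run > best_left:
            best_left, left_len = run, t
        elif run < best_left - x_drop:
            break
        t += 1
    score = k * match + best_left + best_right  # the seed itself is an exact match
    return score, i - left_len, i + k + right_len, j - left_len


def blast_like_search(query, subject, k=3):
    """Return the best ungapped hit as (score, q_start, q_end, s_start), or None."""
    best = None
    for i, j in word_hits(query, subject, k):
        hit = extend_hit(query, subject, i, j, k)
        if best is None or hit[0] > best[0]:
            best = hit
    return best


print(blast_like_search("GATTACA", "CCGATTACACC"))  # (7, 0, 7, 2)
```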
2. FASTA (FAST-ALL)
FASTA is a DNA and protein sequence alignment software package. It is used for fast protein or fast nucleotide comparison, and it achieves a high level of sensitivity for similarity searching at high speed.
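FASTA's first stage looks up short exact matches of length ktup between the two sequences and finds the diagonals (fixed offsets between query and subject positions) that contain the most matches; later stages rescore, join and align those regions. The sketch below covers only that first diagonal-counting step; `ktup_diagonals` is a hypothetical name and this is not the FASTA package's own code.

```python
from collections import Counter


def ktup_diagonals(query: str, subject: str, ktup: int = 2) -> Counter:
    """Count k-tuple matches on each diagonal (subject_pos - query_pos)."""
    # Lookup table: k-tuple -> list of its positions in the query
    positions: dict[str, list[int]] = {}
    for i in range(len(query) - ktup + 1):
        positions.setdefault(query[i:i + ktup], []).append(i)
    diagonals: Counter = Counter()
    # Scan the subject once; each shared k-tuple votes for its diagonal
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], []):
            diagonals[j - i] += 1
    return diagonals


counts = ktup_diagonals("GATTACA", "CCGATTACACC", ktup=2)
best_diagonal, hits = counts.most_common(1)[0]
print(best_diagonal, hits)  # 2 6
```

A diagonal with many k-tuple hits marks a region where the two sequences probably align without gaps, which is why counting per diagonal is a fast first filter.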
PROTEIN FUNCTION ANALYSIS TOOLS
These programs compare a protein sequence against secondary protein databases that contain information on motifs, signatures and protein domains.
InterProScan: searches protein sequences against the InterPro signatures.
PPSearch: searches a protein sequence for PROSITE motifs.
RADAR: detects protein repeats.
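Motif databases such as PROSITE describe many motifs as patterns (for example `C-x(2,4)-C`), which can be searched with ordinary regular expressions. The sketch below converts a simplified subset of PROSITE pattern syntax into a Python regular expression; it is an illustration under stated assumptions, not how PPSearch or InterProScan work internally, and the example patterns are for demonstration only.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a Python regular expression.

    Handles x (any residue), [..] (allowed residues), {..} (forbidden residues),
    (n) and (n,m) repeats, and the < and > terminal anchors. Special cases such
    as '>' inside square brackets are NOT handled in this sketch.
    """
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]  # anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the C-terminus
        # Separate the residue specification from an optional (n) or (n,m) repeat
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # a single residue letter or a [..] class maps directly
        if low is not None:
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


regex = prosite_to_regex("R-G-D.")
for hit in re.finditer(regex, "MKRGDLLRGDA"):
    print(hit.start(), hit.group())  # prints "2 RGD", then "7 RGD"
```

Note that `re.finditer` reports non-overlapping matches only; a full motif scanner would also need to report overlapping hits.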
STRUCTURAL ANALYSIS TOOLS
These tools compare structures against databases of known structures. Determining a protein's 2D/3D structure is crucial to studying its function.
RasMol: a powerful research tool for displaying the structures of biological macromolecules such as DNA and proteins, as well as smaller molecules.
PROSPECT (PROtein Structure Prediction and Evaluation Computer Toolkit): a protein structure prediction system that uses a computational technique called protein threading to construct a 3D protein model.
COPIA (COnsensus Pattern Identification and Analysis): a protein structure analysis tool for discovering motifs in a family of protein sequences. Such motifs can then be used to determine whether new protein sequences belong to the family and to predict the secondary and tertiary structures and functions of proteins.
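Viewers such as RasMol read atomic coordinates from structure files, commonly in PDB format, where each ATOM/HETATM record stores its fields in fixed columns. The sketch below assumes those standard PDB column positions (atom name in columns 13-16, x/y/z coordinates in columns 31-54) and computes an interatomic distance, one of the simplest structural measurements. It is not RasMol's code, and the two demo records are made up for illustration.

```python
import math


def parse_atoms(pdb_lines):
    """Extract atoms from PDB-format ATOM/HETATM records using fixed columns."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),      # columns 13-16
                "res_name": line[17:20].strip(),  # columns 18-20
                "chain": line[21],                # column 22
                "res_seq": int(line[22:26]),      # columns 23-26
                "xyz": (float(line[30:38]),       # x: columns 31-38
                        float(line[38:46]),       # y: columns 39-46
                        float(line[46:54])),      # z: columns 47-54
            })
    return atoms


def distance(a, b):
    """Euclidean distance between two atoms' coordinates (in angstroms)."""
    return math.dist(a["xyz"], b["xyz"])


demo = [
    "ATOM      1  CA  GLY A   1       0.000   0.000   0.000",
    "ATOM      2  CA  ALA A   2       3.800   0.000   0.000",
]
atoms = parse_atoms(demo)
print(round(distance(atoms[0], atoms[1]), 2))  # 3.8
```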
SEQUENCE MANIPULATION TOOLS
These are software programs for analyzing and formatting DNA and protein sequences.
RepeatMasker: screens DNA sequences for interspersed repeats.
Webcut: an online tool for restriction analysis, silent mutation analysis and SNP analysis.
Translate: translates a nucleotide sequence into a protein sequence.
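The translation performed by a tool like Translate can be sketched with the standard genetic code (NCBI translation table 1). This minimal version assumes a single forward reading frame, marks stop codons with '*' and unknown codons with 'X'; it is not the Translate tool's own implementation, and some organisms and organelles use alternative codon tables.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA sequence in one forward reading frame ('*' = stop codon).

    Unknown codons (e.g. containing N) become 'X'; a trailing partial codon is ignored.
    """
    seq = dna.upper().replace("U", "T")  # accept RNA input as well
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR*KGAR*
```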
SEQUENCE ANALYSIS TOOLS
These tools carry out further, more detailed analysis of a query sequence, including evolutionary analysis and identification of mutations.
Align: compares two sequences.
DNA Scanner: scans DNA for a number of different properties, such as biophysical properties and potential for protein interaction.
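Pairwise comparison of the kind an Align tool performs is commonly based on dynamic programming. The sketch below implements Needleman-Wunsch global alignment with an assumed linear scoring scheme (match +1, mismatch -1, gap -1); it illustrates the technique and is not the algorithm or the defaults of any particular Align program.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])  # gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")  # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```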
APPLICATION
Some applications related to the analysis of biological information are:
Bioinformatics is used in primer design.
Bioinformatics is used to attempt to predict the function of actual gene products.
Molecular modeling/structural biology is a growing field that can be considered part of bioinformatics.
Other fields, for example medical imaging/image analysis, might also be considered part of bioinformatics.
There is also a whole separate discipline of biologically inspired computation: genetic algorithms, AI, neural networks, etc.
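For the primer-design application, two quick checks on a candidate primer are its GC content and a rough melting temperature (Tm). The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees C, which is only a rule of thumb for short oligonucleotides; the acceptance ranges in the hypothetical `check_primer` helper and the example primer are illustrative values, not recommendations.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (degrees C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    Only a rule of thumb for short oligonucleotides; primer-design software
    typically uses more accurate nearest-neighbor thermodynamic models.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def check_primer(primer, gc_range=(0.40, 0.60), tm_range=(50, 65)):
    """Hypothetical acceptance check; the default thresholds are illustrative assumptions."""
    gc = gc_content(primer)
    tm = wallace_tm(primer)
    return gc_range[0] <= gc <= gc_range[1] and tm_range[0] <= tm <= tm_range[1]


candidate = "AGCTTGACCTGAAGGTCA"  # made-up 18-nt example primer
print(gc_content(candidate), wallace_tm(candidate), check_primer(candidate))  # 0.5 54 True
```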