Bioinformatics

130 slides, Jun 12, 2020

About This Presentation

As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data.


Slide Content

B) BIOINFORMATICS

1. Introduction: Importance and Scope

IMPORTANCE
Bioinformatics is an interdisciplinary subject in which three fields, biology, computer science and information technology, combine to form a new discipline. Alternatively: bioinformatics is a branch of biology that deals with the fast, accurate and logical analysis of biological data and information, using computational techniques, for interpretation and prediction (Margaret Dayhoff).
DEFINITION
Bioinformatics, n. The science of information and information flow in biological systems, esp. of the use of computational methods in genetics and genomics. (Oxford English Dictionary)
"The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information." (Fredj Tekaia)

SCOPE
* Better documentation: large quantities of data can be stored, and data can be added, edited and deleted.
* Design and discovery of drugs, considering the genomic structure of pathogens and the chemical structure of drugs.
* Study of the important biomolecules, proteins and nucleic acids. Protein: the structural and functional unit. Nucleic acid: the hereditary material.
* Comparisons based on the already available details of proteins and nucleic acids.
* Information is easy to search and access.
* Fast, accurate and logical analysis.
* Interpretation and prediction.

Applications
1) Comparison
Comparing nucleic acid and protein sequences reveals the similarities and differences between them. Two types of analysis are possible: 1) structural analysis, which gives structural details, and 2) functional analysis, which gives functional details. Bioinformatic tools make molecular-level classification of organisms possible, by comparing the similarities and differences of protein and nucleic acid sequences and thereby the relationships between them. Classical taxonomy relies only on morphological and enzymatic analysis and comparison; accurate classification requires molecular-level analysis. Comparison of proteins and nucleic acids helps in:
* classification of proteins
* classification of nucleic acids
* classification of individuals
* evolutionary relationships between organisms
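The comparison step can be illustrated with a minimal Python sketch, assuming the two sequences have already been aligned to the same length with '-' marking gaps. The function name `compare_aligned` is hypothetical, and percent identity is computed here over non-gap columns only, which is one of several conventions in use.

```python
def compare_aligned(seq1, seq2):
    """Count identities, differences and gaps between two pre-aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    identical = different = gaps = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            gaps += 1          # a gap in either sequence
        elif x == y:
            identical += 1     # same base or residue
        else:
            different += 1     # substitution
    compared = identical + different
    percent = 100 * identical / compared if compared else 0.0
    return {"identical": identical, "different": different,
            "gaps": gaps, "percent_identity": percent}


print(compare_aligned("ATGCTA-G", "ATGGTACG"))
```

For this toy pair, six columns are identical, one differs and one contains a gap.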

2) Gene finding
Bioinformatics makes gene finding easy. Genes are segments of nucleic acid, so analysing nucleic acid sequences helps identify the gene responsible for a particular character, e.g. a gene responsible for yield improvement.
* In agriculture, gene finding is applied to crop improvement, such as resistance to insects, disease, drought and salinity, and higher yield.
* In agriculture and medicine, it is useful for comparing a normal organism with a diseased one.
* In medicine, comparing normal with diseased sequences helps find the gene responsible for a genetic disorder, which may then be corrected at the embryo or patient level:
Embryo therapy: correction at the embryo level, or in the sperm/egg.
Patient therapy: correction in particular cells or in the nucleic acid.
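As a rough illustration of the simplest form of gene finding, the sketch below scans the forward strand of a DNA sequence for open reading frames (ORFs) that run from an ATG start codon to the first in-frame stop codon. The name `find_orfs` and the `min_codons` cut-off are assumptions for illustration; practical gene finders also search the reverse strand and use statistical models of gene structure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand (0-based, end exclusive)."""
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # open a new reading frame
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None                     # close it at the stop codon
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17, 2)]
```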

3) Protein structure prediction
A protein structure is compared against a protein structure database. Knowing a protein's structure reveals its final activity, its role in the physiological and metabolic pathways of an organism, and its relation to the organism's growth. It also helps trace disease pathways by identifying defective proteins and defective genes; identifying the protein-coding gene helps in treating genetic disorders. NMR spectroscopy and X-ray diffraction are used to determine protein structure experimentally, but they are expensive and time-consuming. Instead of these two methods, bioinformatic prediction can be applied: it is easier, less expensive and much faster. Bioinformatics thus supports the discovery of novel proteins without NMR or X-ray diffraction, with uses in drug discovery, the pharmaceutical industry and other fields. Knowing protein structure also allows biologically valuable synthetic enzymes to be designed.

4) Evolutionary relationship study
Through structural genomics, functional genomics and comparative genomics.
5) Construction of biological databases
Constructing databases is part of better documentation. Different types of databases exist, depending on the type and kind of information they hold.
DATABASE: an area or space where information is stored in electronic format. Databases differ according to the information they contain (information about proteins or nucleic acids), e.g. EMBL, GenBank.
6) Total genomic structural study of an organism
Helps in species identification.

7) Used in environmental clean-up programmes
Gene finding opens up scope for bioremediation. E.g. in oil spills, Pseudomonas putida is used to reduce the effect of the hydrocarbons in oil: genes carried on its plasmids degrade hydrocarbons, so the oil as a whole can be degraded. Organisms useful for bioremediation can be improved and modified.
8) Creation of bioweapons
Through gene finding, bioweapons may be developed in the near future, e.g. by identifying disease-causing microorganisms and using them as weapons.

2. Biological Databases

a) Nucleic Acid Databases: EMBL, GenBank; structure of GenBank entries; specialized genomic resources; UniGene

EMBL Nucleotide Sequence Database
It is maintained by the EBI (European Bioinformatics Institute, in the UK), part of the European Molecular Biology Laboratory (EMBL). It collects information from different sources, such as:
* genome sequencing projects
* scientific literature
* direct author submissions
It exchanges information with GenBank and DDBJ, so together they form a comprehensive collection. Its growth rate is very fast: the amount of information doubles in 9-10 months. It is divided into many subdivisions. Information is accessed through SRS (Sequence Retrieval System).
The Laboratory operates from six sites: the main laboratory in Heidelberg, and outstations in Hinxton (the European Bioinformatics Institute (EBI), in England), Grenoble (France), Hamburg (Germany), Rome (Italy) and Barcelona (Spain). EMBL groups and laboratories perform basic research in molecular biology and molecular medicine, and train scientists, students and visitors. The first systematic genetic analysis of embryonic development in the fruit fly was conducted at EMBL by Christiane Nüsslein-Volhard and Eric Wieschaus,[13] for which they were awarded the Nobel Prize in Physiology or Medicine in 1995. In the early 1980s, Jacques Dubochet and his team at EMBL developed cryogenic electron microscopy for biological structures, work that was rewarded with the 2017 Nobel Prize in Chemistry.
URL (Uniform Resource Locator): http://www.ebi.ac.uk/embl/

GENBANK
GenBank is a primary nucleotide sequence database developed by NCBI (National Center for Biotechnology Information), with few restrictions on use.
AIM: to support the research activity of the scientific community by providing information without restriction, except for copyrighted and patented sequences.
Growth rate: the amount of information doubles within one month.
Information is divided into 17 divisions, which makes retrieving information from GenBank convenient and efficient.
Two retrieval systems:
1) Entrez integrated retrieval system: links the nucleotide sequence database with the protein sequence database.
2) MEDLINE facility: provides the abstracts of the original published papers related to the nucleotide sequences.
URL: http://www.ncbi.nlm.nih.gov/genbank

GenBank incorporates information from:
# publicly available sources
# primarily, direct author submissions
# large-scale sequencing projects
To help ensure comprehensive coverage, it exchanges data with both the EMBL data library and DDBJ.

Structural Entities: The Structure of GenBank Entries
A GenBank release includes the sequence files, indices created on various database fields, and information derived from the database. GenBank was once made available on CD-ROM, a convenient mechanism for widespread and relatively inexpensive distribution. As the database grew, a large number of CDs was required, which was difficult to handle for both producers and users. Today GenBank is available via FTP. The most commonly used file is the sequence entry file, which contains the sequence itself and descriptive information relating to it. Each entry consists of a number of keywords, relevant associated sub-keywords and an optional feature table.

The structure of GenBank entries, as described here, consists of 13 components: LOCUS, DEFINITION, ACCESSION NUMBER, VERSION, KEYWORDS, SOURCE, ORGANISM, REFERENCE, AUTHOR, TITLE, JOURNAL, PUBMED NO., REMARK/COMMENT.

1) LOCUS: an entry name that identifies the nucleotide sequence, followed by the type of sequence and the date, e.g. [NM_000555, mRNA, 21.7.2018] (entry no., type of sequence, date).
2) DEFINITION: a short description giving the scientific name of the source organism, the type of sequence entered and its expression product. E.g. for the Bt gene: Bacillus thuringiensis, mRNA, β-endotoxin.
3) ACCESSION NUMBER: normally similar to the entry number [NM_000555].
4) VERSION: when information is updated, the entry number and version number are given, together with the gene information ID (GI) number [NM_000555.5, GI: 12345].
5) KEYWORDS: the keywords of the work; if there is no keyword, a dot is entered. E.g. insect resistance.
6) SOURCE: the common name of the source organism, e.g. bacteria.
7) ORGANISM: the scientific name of the source organism, e.g. Bacillus thuringiensis.

8) REFERENCE: the reference of the published paper related to the entered nucleotide sequence.
9) AUTHOR: the names of the authors, in the same order as in the publication.
10) TITLE: the title of the paper.
11) JOURNAL: the name of the journal in which the paper was published.
12) PUBMED NO.: the number used to access the archived paper in PubMed (a database of biomedical literature).
13) REMARK/COMMENT: biological importance, expression, changes or the source organism can be entered as a comment.
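In the GenBank flat-file layout, each keyword occupies the first 12 columns (top-level keywords start in column 1, sub-keywords such as ORGANISM are indented) and its value begins in column 13; continuation lines leave the keyword columns blank. The minimal Python sketch below assumes that layout. It reads the header only up to the ORIGIN line, does not handle the FEATURES table, and `parse_genbank_header` and the toy record are hypothetical, not a real GenBank entry.

```python
def parse_genbank_header(text):
    """Return (keyword, value) pairs from the header of a GenBank-style record."""
    fields = []
    for line in text.splitlines():
        if line.startswith("ORIGIN") or line.startswith("//"):
            break                               # sequence data starts here
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword:
            fields.append([keyword, value])     # new keyword or sub-keyword
        elif fields:
            fields[-1][1] += " " + value        # continuation of the previous value
    return [tuple(field) for field in fields]


# Hypothetical toy record laid out in GenBank columns (not a real entry).
record = "\n".join([
    "LOCUS       HYP000001    12 bp    mRNA    linear   BCT 21-JUL-2018",
    "DEFINITION  Bacillus thuringiensis toy gene, mRNA.",
    "ACCESSION   HYP000001",
    "VERSION     HYP000001.1",
    "KEYWORDS    .",
    "SOURCE      Bacillus thuringiensis",
    "  ORGANISM  Bacillus thuringiensis",
    "            Bacteria; Firmicutes.",
    "ORIGIN",
])

for keyword, value in parse_genbank_header(record):
    print(f"{keyword:<11} {value}")
```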

Specialized genomic resources
The purpose of specialized resources is to focus on species-specific genomics and on particular sequencing techniques. Such databases aim to give an integrated view of a particular biological system.
a) UniGene
* The collection represents genes from many organisms; each cluster relates to a unique gene and includes related information corresponding to that gene.
* A valuable role of UniGene is in gene discovery.
* UniGene is also used in gene-mapping projects and large-scale gene expression analysis.

b) TDB: The TIGR Database
* These databases contain DNA and protein sequences, gene expression data, protein family information, etc.
* Data such as the taxonomic range of plants and humans, and the roles of cellular components, are also present.
c) SGD (Saccharomyces Genome Database)
* SGD is an online resource containing information on the molecular biology and genetics of S. cerevisiae (budding yeast).
* It provides internet access to the genome, its genes and their products.
* SGD supports research by integrating functions such as sequence similarity search tools.
* Genetic maps are illustrated with dynamically created graphical displays, which makes the database user-friendly.

UniGene
UniGene is a specialized genomic resource. Such databases tend to be linked, to some extent, with the primary DNA databases from which they may derive their data and into which their results are usually fed. The purposes of specialized genomic resources are:
1) species-specific genomics
2) particular sequencing techniques
The primary goal of the Human Genome Project was to determine the complete sequence of the human genome (3 billion base pairs). Only 3% of the genome encodes protein; the biological significance of the remainder is largely unknown.

A transcript map is a vital resource for flagging those parts of the genome that are actually expressed. UniGene attempts to provide a transcript map by using sets of non-redundant, gene-oriented clusters derived from GenBank sequences. The collection represents genes from many organisms; each cluster relates to a unique gene and includes related information, such as the tissue types in which the gene is expressed, map location, etc.

b) Protein Sequence Databases: PIR, SWISS-PROT, TrEMBL
Composite Protein Databases: NRDB, OWL
Secondary Databases: PROSITE, PRINTS, BLOCKS, IDENTIFY

SWISS-PROT Protein Sequence Database
SWISS-PROT is a Switzerland-based, annotated protein sequence database established in 1986 and maintained collaboratively, since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library. It is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, source and organism), a minimal level of redundancy, and a high level of integration with other databases. SWISS-PROT contains information about the name and origin of the protein, protein attributes, general information, ontologies, sequence annotation, amino acid sequence, bibliographic references, cross-references with sequence, structure and interaction databases, and entry information.

It is maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI). The SWISS-PROT group is headed by Rolf Apweiler. It contains non-redundant sequence entries, and the information is thoroughly reviewed and annotated. It provides protein sequences to students, researchers and related industries such as the pharmaceutical industry. SWISS-PROT aims to be minimally redundant and is interlinked with many other resources, including EMBL and TrEMBL.

TrEMBL
TrEMBL (Translated EMBL) is a primary protein sequence database of translated nucleotide sequences. It was created in 1996 as a computer-annotated supplement to SWISS-PROT; its entries are annotated automatically rather than manually. The database is constructed by computationally translating the coding sequences available in EMBL into protein sequences. The TrEMBL sequence database contains the translations of all coding sequences (CDS) present in the DDBJ/EMBL/GenBank Nucleotide Sequence Database, and also protein sequences extracted from the literature or submitted to SWISS-PROT that are not yet integrated into SWISS-PROT.
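The translation step that TrEMBL relies on can be sketched in a few lines of Python. This is a minimal sketch assuming the standard genetic code (NCBI translation table 1) and a sequence that already starts at the first codon; the names `CODON_TABLE` and `translate` are illustrative, and real pipelines take the reading frame and genetic code from each entry's CDS annotation.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds, to_stop=True):
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    seq = cds.upper().replace("U", "T")     # accept RNA as well as DNA
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X: ambiguous codon
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```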

TrEMBL was developed by the EBI and consists of two divisions:
* SP-TrEMBL: a temporary storage area for sequences that have not yet been manually annotated. It contains entries that will eventually be incorporated into SWISS-PROT, once they are fully described and annotated.
* REM-TrEMBL: contains sequences that are not destined to be included in SWISS-PROT, e.g.
# immunoglobulins and T-cell receptors
# fragments of fewer than eight amino acids
# synthetic sequences
# patented sequences

PIR
PIR (Protein Information Resource) is a primary protein sequence database. Its origins lie in the 1960s, when Margaret Dayhoff developed it at the National Biomedical Research Foundation (NBRF) as a collection of sequences for investigating evolutionary relationships among proteins. The database is split into four distinct sections, PIR-1, PIR-2, PIR-3 and PIR-4, according to the level of information. They differ in terms of:
# quality of data
# level of annotation provided

PIR-1: contains fully classified and annotated entries.
PIR-2: includes preliminary entries, which have not been thoroughly reviewed and may contain redundancy.
PIR-3: contains unverified entries, which have not been reviewed.
PIR-4: contains protein sequences that are not genetically encoded and not produced on ribosomes, e.g. synthetic protein sequences.

Composite Protein Databases
These are amalgamations or compilations of the products of different primary databases. They make searching easy and efficient for the user, because they obviate the need to interrogate multiple resources. Examples: NRDB, OWL.

NRDB: Non-Redundant DataBase
It is built locally at NCBI and combines 6 primary databases: SWISS-PROT, PDB, PIR, GenPept, GenPept update and SWISS-PROT update. It is described as non-redundant and error-free, but strictly speaking redundancy and errors can still occur: when redundant, erroneous or incorrect sequences are present in a component database, they are incorporated into NRDB as such, even from SWISS-PROT. NRDB makes searching more efficient by avoiding the need to search many databases for related information.
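The merging idea behind a composite, "non-redundant" database can be sketched as follows, assuming each source is a simple mapping from identifier to sequence. The function `build_nonredundant` and the database identifiers in the example are hypothetical. Only exact duplicates collapse, which illustrates the point above: a sequence that differs by a single (possibly erroneous) residue survives as a separate entry.

```python
def build_nonredundant(sources):
    """Merge {database: {identifier: sequence}} into {sequence: [database:identifier, ...]}."""
    merged = {}
    for db_name, entries in sources.items():
        for seq_id, seq in entries.items():
            # identical sequences (ignoring case) collapse into one entry
            merged.setdefault(seq.upper(), []).append(f"{db_name}:{seq_id}")
    return merged


# Hypothetical identifiers and sequences, for illustration only.
sources = {
    "SWISS-PROT": {"P1": "MKTAYIAK"},
    "PIR": {"A1": "MKTAYIAK", "A2": "MKTAYIAR"},
    "PDB": {"1ABC": "mktayiak"},
}
for seq, ids in build_nonredundant(sources).items():
    print(seq, ids)
```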

OWL
OWL is a composite of 4 primary databases: GenBank, SWISS-PROT, NRL-3D and PIR. It makes searching more efficient by obviating the need to search many databases for related information. It was developed at the University of Leeds, UK, in association with the Daresbury Laboratory in Warrington (1994). The sources are assigned priorities on the basis of their level of annotation and sequence validation; SWISS-PROT has the highest priority. Any redundancy present in GenBank is incorporated into OWL as such during amalgamation. OWL is released only every 6-8 weeks.

Secondary Databases: PROSITE, PRINTS, BLOCKS, IDENTIFY
Secondary databases contain the fruits of analyses of the sequences in the primary sources; simply put, secondary data are derived from primary data. Because there are several different primary databases and a variety of ways of analysing protein sequences, several different secondary databases exist.

PROSITE
PROSITE was the first secondary database to be developed. It derives its information from the primary database SWISS-PROT and is produced and maintained by the SIB. It was released in 1988 by Amos Bairoch. URL: http://prosite.expasy.org
It categorises protein sequences into families, based on the single most conserved motif.
Motif: a short, conserved stretch of amino acids (typically 10-20 residues) that is responsible for protein function and preserves its 3D structure. Such motifs usually encode key biological functions, e.g. an enzyme's active site, or ligand- or metal-binding sites. A motif represents the characteristic features or site of each family; such regions act as signatures of a particular protein family and help identify new members of the family. PROSITE was developed by a largely manual process of seeking the patterns that best fit particular families and functions.
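PROSITE patterns use a compact syntax: elements are separated by '-', 'x' means any residue, [..] lists allowed residues, {..} lists forbidden residues, (n) or (n,m) gives a repeat count, '<' and '>' anchor the pattern to the N- or C-terminus, and the pattern ends with '.'. A minimal Python sketch that translates such a pattern into a regular expression is shown below; `prosite_to_regex` and `scan` are hypothetical helpers, and the rarer case of '>' inside brackets is not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a regex string."""
    regex = pattern.rstrip(".")
    regex = regex.replace("{", "[^").replace("}", "]")   # {P} -> [^P]: excluded residues
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) -> x{2,4}: repeats
    regex = regex.replace("x", ".")                      # x -> any residue
    regex = regex.replace("<", "^").replace(">", "$")    # terminal anchors
    return regex.replace("-", "")                        # drop element separators


def scan(pattern, protein):
    """Return (0-based start, matched text) for non-overlapping matches."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in compiled.finditer(protein)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))   # N[^P][ST][^P]
print(scan("N-{P}-[ST]-{P}.", "MANSTKNPSA"))  # [(2, 'NSTK')]
```

The N-{P}-[ST]-{P} pattern used here is the familiar N-glycosylation site signature; the protein string is a toy example.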

In PROSITE, entries are deposited in two different files. The first (the data file) contains the pattern and lists all matches in the current version of SWISS-PROT. The documentation file provides:
# details of the characterized family
# a description of the biological role of the chosen motif
# a supporting bibliography
SIGNIFICANCE
* Finding families based on motifs: sequences that share the same motif are considered a single family.
* Fast functional characterization and annotation of protein sequences.
* Identifying possible functions of newly discovered proteins, and analysing proteins for previously undetermined activity.
* Offers tools for protein sequence analysis and motif detection. It is part of the ExPASy proteomics analysis server.
APPLICATION
Proteins can be classified on the basis of their most highly conserved motif. A particular motif identifies the characteristic features it represents, e.g. the structural and functional details of those proteins.

PRINTS
PRINTS collects its information from OWL; in future it will collect information from SWISS-PROT and SP-TrEMBL. The process of deriving information from OWL is called iterative database scanning. In 1999 it was maintained in the Department of Biochemistry and Molecular Biology at University College London (UCL). URL: http://www.bioinf.man.ac.uk/dbbrowser/bioactivity/protein2frm.html
Rather than a single common motif, PRINTS considers multiple motifs shared by a family (a fingerprint). This helps find more closely similar sequences and gives clearer information: more accurate analysis is possible when sequences share several similar motifs.

BLOCKS
A database based on multiple motifs, stored as ungapped multiple alignments. It contains information on blocks: highly conserved motifs aligned without any gaps. It was developed by Henikoff (1998). It is an automatically derived database, constructed using the automated PROTOMAT system. The blocks are ultimately encoded as ungapped local alignments, which are calibrated against SWISS-PROT to obtain a measure of the likelihood of a chance match. Two scores are noted for each block: the first is the score at which 99.5% of matches are true negatives; the second is the median value of the true-positive scores. The median standardized score for known true-positive matches is termed strength. Because the database is derived by fully automatic methods, the blocks are not annotated, but links are made to the corresponding PROSITE family documentation file.

This information is derived from the secondary databases PRINTS and PROSITE, so BLOCKS can also be called a tertiary database. It is based on the protein families contained in PROSITE and is maintained at the Fred Hutchinson Cancer Research Center (FHCRC). The motifs, or blocks, are created by automatically detecting the most highly conserved regions of each protein family, and are ultimately encoded as ungapped local multiple alignments.
Structure of BLOCKS entries: each block is identified by a general code (ID) line and an accession number (AC).
* The ID line indicates the type of discriminator to expect in the file.
* The AC line indicates the minimum and maximum distances of the block from its preceding neighbour.
* The DE line contains a description, or title, of the family.
* The BL line indicates the diagnostic power (the amino acid triplet and the number of sequences the block contains).
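One way an ungapped block can be used to score new sequences is to turn it into a position-specific scoring matrix (PSSM) and slide it along a query without gaps. The minimal Python sketch below assumes a uniform background frequency and a pseudocount of 1; these choices, the names `block_to_pssm` and `best_hit`, and the toy block are assumptions and do not reproduce the weighting or calibration that BLOCKS itself uses.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_to_pssm(block, pseudocount=1.0):
    """Build a log-odds score table (one dict per column) from an ungapped block."""
    width = len(block[0])
    if any(len(row) != width for row in block):
        raise ValueError("a block must be an ungapped alignment of equal-length rows")
    background = 1 / len(AMINO_ACIDS)              # assumed uniform background
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(row[col] for row in block)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_hit(pssm, seq):
    """Slide the block along seq without gaps; return (best score, 0-based start)."""
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        score = sum(pssm[i].get(seq[start + i], 0.0) for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


pssm = block_to_pssm(["CAHC", "CSHC", "CAHC"])   # toy block
print(best_hit(pssm, "MMCAHCMM")[1])            # 2
```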

IDENTIFY
Another automatically derived tertiary resource, derived from BLOCKS and PRINTS. It was developed in the Department of Biochemistry at Stanford University by Nevill-Manning et al. (1998). It is constructed on the basis of eMOTIF, i.e. generalised expressions of the similarities between highly conserved motif sequences. It is designed to be more flexible than exact regular-expression matching. It is accessible through the protein function web server of the Stanford biochemistry department. Sets of amino acids with shared properties are used in eMOTIF.
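The idea of generalising a conserved motif through sets of related amino acids can be sketched as follows: each column of an ungapped block is replaced by the smallest residue group that covers every residue seen there. The groups below are illustrative assumptions, not the published eMOTIF substitution sets, and `emotif_like` is a hypothetical name; the point is that the resulting expression also matches residues that were never observed, which is the extra flexibility over an exact pattern.

```python
import re

# Illustrative residue groups (an assumption, not the published eMOTIF sets).
GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG"]


def emotif_like(block):
    """Turn an ungapped aligned block into a regex, one position per column."""
    parts = []
    for column in zip(*block):
        residues = set(column)
        if len(residues) == 1:
            parts.append(column[0])              # fully conserved residue
            continue
        covering = [g for g in GROUPS if residues <= set(g)]
        # smallest group covering every residue seen, else a wildcard
        parts.append(f"[{min(covering, key=len)}]" if covering else ".")
    return "".join(parts)


motif = emotif_like(["CIKD", "CVRE", "CLKD"])
print(motif)                                 # C[ILMV][KRH][DE]
print(bool(re.fullmatch(motif, "CMHE")))     # True
```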

Structure Classification Databases
Many proteins share structural similarities, reflecting common evolutionary origins. Examples: SCOP, CATH.

SCOP
SCOP (Structural Classification Of Proteins) is maintained at the MRC Laboratory of Molecular Biology and Centre for Protein Engineering. It describes the structural and evolutionary relationships between proteins of known structure (1995). It is useful at both the multi-domain level and the individual-domain level. It is constructed using a combination of manual inspection and automated methods; because structures are checked both automatically and manually, the results are more accurate.

SCOP classification
Proteins are classified in a hierarchical fashion to reflect their structural and evolutionary relationships. Protein structures are assigned to a hierarchy of three levels: family, superfamily and fold.
Family: proteins are clustered into families with clear evolutionary relationships if they have more than 30% sequence identity.
Superfamily: proteins are placed in superfamilies when, in spite of low sequence identity, their structural and functional characteristics suggest a common evolutionary origin.
Fold: proteins are classified as having a common fold if they have the same major secondary structures in the same arrangement and with the same topology.
SCOP is accessible for keyword searching via the MRC Laboratory web server: http://www.bioinf.man.ac.uk/dbbrowser/bioactivity/structurefrm.html

CATH (Class, Architecture, Topology, Homology)
CATH is a hierarchical classification of protein structures maintained at University College London (UCL) (1997). The resource is largely derived using automatic methods, but manual inspection is necessary where automatic methods fail. It was developed by UCL's Biomolecular Structure and Modelling Unit and is used for the classification of protein structures. There are five levels within the hierarchy.
A) CLASS
Derived from the gross secondary-structure content and packing of the protein. Four classes of domain are recognised:
Class 1: mainly alpha (alpha helices)
Class 2: mainly beta (beta sheets)
Class 3: alpha-beta, which includes both alternating alpha/beta and alpha+beta structures
Class 4: few secondary structures, i.e. domains whose secondary-structure content is very low

B) ARCHITECTURE
Describes the gross arrangement of secondary structures, ignoring their connectivities.
C) TOPOLOGY
Considers both the overall shape and the connectivity of the secondary structures of the protein.
D) HOMOLOGY
Groups domains that share more than 35% sequence identity and are thought to share a common ancestor (homologous); similarities are first identified by sequence comparison and then by structure comparison algorithms.
E) SEQUENCE
# The final level in the hierarchy.
# Structures within homology groups are further clustered on the basis of sequence identity.
# Domains with sequence identities above 35% indicate highly similar structures and functions.
CATH is accessible for keyword searching via UCL's Biomolecular Structure and Modelling Unit web server.
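The class level can be illustrated by assigning a domain to one of the four classes from its secondary-structure composition. This is a minimal Python sketch assuming a per-residue string with 'H' for helix and 'E' for strand; the cut-offs (20% total secondary structure, 10% for the minority element) and the name `cath_like_class` are illustrative assumptions, not CATH's actual assignment criteria.

```python
def cath_like_class(ss):
    """Assign a CATH-style class from a per-residue secondary-structure string.

    'H' = helix, 'E' = strand, anything else = coil. Cut-offs are illustrative.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    helix = ss.count("H") / len(ss)
    strand = ss.count("E") / len(ss)
    if helix + strand < 0.2:
        return 4, "Few secondary structures"
    if strand < 0.1:
        return 1, "Mainly alpha"
    if helix < 0.1:
        return 2, "Mainly beta"
    return 3, "Alpha beta"


for ss in ["HHHHHHHHHH--", "EEEEE--EEEEE", "HHHHH--EEEEE", "------------"]:
    print(ss, cath_like_class(ss))
```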

3. Database Searching

A) Sequence Database Searching: EST searches; different approaches to EST analysis (Merck/IMAGE, Incyte, TIGR, EGAD); EST analytical tools (sequence similarity, sequence assembly and sequence clustering)

EST searches
EST stands for Expressed Sequence Tag. EST data are held in the EST database (dbEST), which maintains its own format and identification-number system. An EST is a short nucleotide sequence produced from cDNA (mRNA → reverse transcriptase → single-stranded cDNA), so ESTs are partial sequences of gene transcripts. A typical EST is between 200 and 500 bases long, although modern technical advances have increased the theoretical length from a single run to 1,000 bases or more. ESTs are noisy sequences: as a result of sequencing errors, they may not only contain ambiguous bases but also be missing bases.

In analysing ESTs, the following points should be borne in mind:
* The EST alphabet has five characters: A, C, G, T and N.
* An EST may be a subsequence of another sequence in the database.
* An EST may not represent part of the coding sequence (CDS) of any gene.
* EST production is highly automated, and the results are often contaminated with ambiguous or missing bases. This causes difficulties in sequence interpretation.
Uses:
* identification of particular genes
* mapping of genes within a genome using a small stretch of sequence
* identification of species
* academic analysis and commercial exploitation
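The alphabet and ambiguity points above translate directly into a simple quality check. The minimal Python sketch below rejects characters outside ACGTN and flags ESTs that are too short or contain too many ambiguous N bases; the name `est_quality` and the thresholds (at least 100 bases, at most 5% N) are assumptions for illustration, not established cut-offs.

```python
EST_ALPHABET = set("ACGTN")


def est_quality(est, max_n_fraction=0.05, min_length=100):
    """Return (fraction of ambiguous N bases, passes filter) for one EST."""
    seq = est.upper()
    invalid = set(seq) - EST_ALPHABET
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    n_fraction = seq.count("N") / len(seq) if seq else 1.0
    passes = len(seq) >= min_length and n_fraction <= max_n_fraction
    return n_fraction, passes


print(est_quality("ACGT" * 50))   # (0.0, True)
print(est_quality("ACGN" * 50))   # (0.25, False)
```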

Different approaches to EST analysis
These are the sources that provide EST information. Several approaches to establishing EST libraries for academic analysis or commercial exploitation have been developed. Much of the publicly available data is collected in the EST sections of the EMBL data library and GenBank (dbEST). Examples: Merck/IMAGE, Incyte, TIGR, EGAD.

Merck/IMAGE
A research project run by the University of Washington and funded by Merck & Co. In 1994, Merck & Co. funded a research project based at the University of Washington to sequence 300,000 ESTs from a variety of normalised libraries.
AIM: to produce 300,000 ESTs from cDNA libraries.
For many years Merck has sponsored the production of a drug index.
Approaches:
* to support academic analysis
* commercialization of EST information for drug production
The resulting collection is known as the Merck Gene Index; as of May 1997, 484,421 ESTs had been submitted by the project to dbEST.

Incyte
Incyte Pharmaceuticals Inc. is a pharmaceutical company. It produces a database, LifeSeq, that emphasises the quantitative information derived by sequencing standard cDNA libraries.
AIM: to provide information on the relative copy numbers of genes in healthy and diseased tissue, and to facilitate the elucidation of potential therapeutic targets.
APPROACH: commercialization of genomic information on ESTs from healthy and diseased cells, which is used to identify therapeutic targets and to produce drugs commercially.
In April 1998, LifeSeq held 2.5 million ESTs, representing 8,000 to 12,000 different genes.

TIGR
The Institute for Genomic Research is a not-for-profit research organisation that serves academic purposes. Its interests are the structural, functional and comparative analysis of genomes and gene products. The range of organisms covered includes viruses, eubacteria, pathogenic bacteria, archaebacteria and eukaryotes (plants and animals).
AIM: preparation of the Human Gene Index (HGI). This index integrates results from human genome research projects around the world, including data from dbEST and GenBank, to create a non-redundant view of all human genes and information on their expression patterns, cellular roles, functions and evolutionary relationships. Data in the HGI are freely available.
TIGR sequenced more than 100,000 ESTs from over 300 cDNA libraries; these, together with dbEST data and non-redundant human transcript information, are combined by sequence assembly to generate Tentative Human Consensus (THC) sequences.

EGAD
Expressed Gene Anatomy Database: a database providing information on ESTs.

EST Analytical Tools
There are many tools available for the analysis of ESTs:
Commercially available tools: Incyte LifeTools
Publicly available tools, of three types:
1) sequence similarity search tools
2) sequence assembly tools
3) sequence clustering tools

1) Sequence Similarity Search Tools: These tools are considered here as they relate to ESTs. Given an EST, the tool compares it against all the sequences in a database and identifies those that show similarity to it. Eg: the BLAST family: BLASTP, BLASTN, BLASTX, TBLASTN

2) Sequence Assembly Tools: A database search may reveal several ESTs that match a probe sequence. The ESTs must then normally be aligned with each other to reveal the consensus sequence. In this situation, the assembly tool aligns and merges the overlapping fragments to reconstruct the original mRNA. Examples: Phrap, Staden assembler, TIGR Assembler
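The merging step described above can be sketched as greedy overlap assembly. This is a minimal illustration that assumes error-free reads and exact suffix/prefix overlaps of at least `min_len` bases. Real assemblers such as Phrap also use base-quality scores and tolerate mismatches. The function names and the `min_len` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest exact overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments that are fully contained in another fragment.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while len(contigs) > 1:
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"]))  # ['ATGGCGTACCGGA']
```

The greedy choice is simple, but it can mis-join repeated regions. This is one reason production assemblers build a fuller overlap graph.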

3) Sequence Clustering Tools: These programs take a large set of sequences and divide them into subsets, or clusters, based on the extent of shared sequence, defined by a minimum overlap region. They can analyse a large set of sequences and group (cluster) sequences that share highly similar regions. A reliable and effective mechanism for clustering ESTs reduces redundancy in the database, and saves database search time and analysis effort. Examples: wcd (an EST clustering tool), USEARCH, CD-HIT
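The idea of grouping sequences by shared regions can be sketched with greedy incremental clustering, in the spirit of CD-HIT. Sequences are processed longest first. Each sequence joins the first cluster whose representative shares enough of its short words (k-mers); otherwise it starts a new cluster. The word length `k=4` and the 50% `threshold` are illustrative assumptions, not CD-HIT's actual defaults.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(seqs: dict[str, str], k: int = 4,
                   threshold: float = 0.5) -> list[list[str]]:
    """Greedy incremental clustering; the first member of each cluster is its representative."""
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: list[list[str]] = []
    rep_kmers: list[set[str]] = []
    for name in order:
        km = kmers(seqs[name], k)
        for cluster, rk in zip(clusters, rep_kmers):
            # Fraction of this sequence's k-mers also found in the representative.
            if km and len(km & rk) / len(km) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no match: this sequence becomes a new representative
            rep_kmers.append(km)
    return clusters


ests = {"est1": "ATGGCGTACCGGA", "est2": "GCGTACCGG", "est3": "TTTTCCCCAAAA"}
print(greedy_cluster(ests))  # [['est1', 'est2'], ['est3']]
```

Only one representative per cluster is compared, so each new sequence costs one comparison per existing cluster. This is what makes the approach fast enough for large EST collections.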

Sequence similarity searching tools: These are software tools used for searching, assessing, analysing, interpreting and predicting the information contained in databases. There are two types. Pairwise sequence alignment and similarity searching tools # a pair of sequences is involved # one is the query sequence and the other the template # query: the sequence to be studied # template: the sequence found in the database Eg: BLAST, FASTA. Multiple sequence alignment and similarity searching tools, or homology searching tools # more than two sequences are involved # a set of sequences can be compared and aligned together Eg: CLUSTAL, MODELLER. PSI-BLAST # Position-Specific Iterated BLAST # a hybrid of the pairwise sequence alignment and multiple sequence similarity search tools

Sequences are aligned to find regions of high similarity. Depending on how much of the sequence is considered, there are two types of sequence alignment. Local sequence alignment: an alignment that selects only the regions that exhibit strong similarity. Eg: BLAST, FASTA, PSI-BLAST. Global sequence alignment: an alignment that considers the entire length of both sequences.
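The difference between global and local alignment shows up clearly in the scores of the two classic dynamic-programming algorithms. Needleman-Wunsch (global) penalises every unaligned end. Smith-Waterman (local) resets negative scores to zero and reports the best-scoring region. This sketch returns scores only. The scoring values (match +2, mismatch -1, linear gap -2) are assumptions for illustration.

```python
def align_score(a: str, b: str, local: bool, match: int = 2,
                mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch (local=False) or Smith-Waterman (local=True) score, linear gaps."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised along the first row and column.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # local: discard a negative-scoring prefix
                best = max(best, cell)   # remember the best cell anywhere
            H[i][j] = cell
    return best if local else H[-1][-1]


print(align_score("TTTTACGTAAAA", "ACGT", local=True))   # 8: only ACGT is scored
print(align_score("TTTTACGTAAAA", "ACGT", local=False))  # -8: 8 forced gaps cost 16
```

The same short match scores well locally but poorly globally. This is why local tools such as BLAST are preferred when only part of two sequences is related.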

Functional Analysis Tools: These tools are used for the functional analysis of both protein and nucleotide sequences, to study sequence similarities based on function. GOFFA: # Gene Ontology For Functional Analysis # used to identify functional elements in a genome and for the related functional analysis of genes and genomes. ErmineJ: # used for genome analysis # and also for functional analysis related to gene expression. InterProScan: # used for the functional analysis of proteins

Structural Analysis Tools: Used for the structural analysis of nucleotides and proteins. Eg: Swiss-PdbViewer, RasMol

Statistical Analysis Tools: Used for the statistical analysis of the values of similarities and differences. Eg: Statistica, MATLAB, Perl

B) Pair-Wise Sequence Alignment Technique: Comparison of sequences and subsequences; Identity and similarity; Substitution matrices: PAM, BLOSUM; Dot plot; BLAST; FASTA

Substitution matrices (BLOSUM & PAM): Suppose two sequences are compared and both have leucine at the same position. This residue-to-residue identity (Leu-Leu) is scored as a match, for example as 1 in a simple identity scheme. However, amino acids can change through mutation during evolution, which causes mismatches. Some mismatches can still be accepted as matches, because the substituted amino acid has similar properties and does not change the basic structure or function of the protein. Such substitutions are identified by a deeper analysis of the nature of the amino acids involved. If the nature of the residue stays the same, the researcher treats the pair as a (partial) match. Scoring every possible pair of amino acids in this way produces a matrix called a substitution matrix. Substitution matrices are used in the study of evolutionary relationships.
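Scoring an alignment with a substitution matrix, rather than identity only, can be sketched as follows. The scores below are a five-residue excerpt of BLOSUM62 as it appears in the standard published table; check them against the full matrix before reuse. `pair_score` raises `KeyError` for residues outside the excerpt.

```python
# Excerpt of BLOSUM62 for L, I, V, K, R (symmetric, so each unordered pair is stored once).
BLOSUM62_EXCERPT = {
    ("L", "L"): 4, ("I", "I"): 4, ("V", "V"): 4, ("K", "K"): 5, ("R", "R"): 5,
    ("L", "I"): 2, ("L", "V"): 1, ("I", "V"): 3, ("K", "R"): 2,
    ("L", "K"): -2, ("L", "R"): -2, ("I", "K"): -3, ("I", "R"): -3,
    ("V", "K"): -2, ("V", "R"): -3,
}


def pair_score(x: str, y: str) -> int:
    """Substitution score for residues x and y (KeyError if not in the excerpt)."""
    if (x, y) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(x, y)]
    return BLOSUM62_EXCERPT[(y, x)]


def identity_score(x: str, y: str) -> int:
    """Simple identity scheme: 1 for the same residue, 0 otherwise."""
    return 1 if x == y else 0


def ungapped_score(a: str, b: str, scorer) -> int:
    """Sum of position-by-position scores for two equal-length aligned sequences."""
    return sum(scorer(x, y) for x, y in zip(a, b))


print(ungapped_score("LKIV", "IRLV", identity_score))  # 1: only V-V is identical
print(ungapped_score("LKIV", "IRLV", pair_score))      # 10: L/I and K/R are conservative swaps
```

Identity scoring sees almost no relationship here. The substitution matrix rewards the conservative substitutions (leucine/isoleucine, lysine/arginine), which is the point made in the text.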

BLOSUM Model: BLOSUM (BLOcks SUbstitution Matrix) is a family of amino acid substitution matrices. It was proposed to overcome the problem of aligning distantly related sequences. It was proposed by Steven Henikoff and Jorja G. Henikoff in 1992. The information is derived from the conserved regions (blocks) and amino acid patterns of distantly related protein sequences in the BLOCKS database, hence the name BLOCKS SUBSTITUTION MATRIX. BLOSUM matrices are based on a much larger data set and represent distant relationships more explicitly. Closely related sequences are clustered together and treated as a single sequence.

Each cluster contains sequences whose identity is higher than a cutoff called the clustering percentage. Changing the clustering percentage leads to a family of matrices. Three commonly compared versions are: BLOSUM30: for sequences with 30% or less similarity; BLOSUM62: for sequences with between 30% and 62% similarity; BLOSUM90: for sequences with between 62% and 90% similarity. Together they help detect diverse types of relationships, both close and distant.
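The block counts become matrix scores through a log-odds formula. Below is the standard construction as described by Henikoff and Henikoff. Here $f_{ij}$ is the number of $i$–$j$ residue pairs counted within block columns, each unordered pair being counted once. The factor 2 expresses the score in half-bit units, the scale used for BLOSUM62.

```latex
q_{ij} = \frac{f_{ij}}{\sum_{k=1}^{20}\sum_{l=1}^{k} f_{kl}}, \qquad
p_i = q_{ii} + \frac{1}{2}\sum_{j \ne i} q_{ij}

e_{ij} =
\begin{cases}
  p_i^{2} & i = j \\
  2\,p_i\,p_j & i \ne j
\end{cases}
\qquad
s_{ij} = \operatorname{round}\!\left( 2 \log_2 \frac{q_{ij}}{e_{ij}} \right)
```

As a worked example, suppose a pair such as L–I is observed twice as often as expected by chance ($q_{ij}/e_{ij} = 2$). It then scores $\operatorname{round}(2 \log_2 2) = 2$. A pair seen exactly as often as chance predicts scores 0, and a pair seen less often scores negatively.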

PAM (Point Accepted Mutation, or Dayhoff PAM model): Also known as the Dayhoff amino acid substitution matrix. It was derived by M. O. Dayhoff in 1978. The model considers substitutions of amino acids observed in homologous protein sequences during evolution. These substitutions do not significantly change the function of the protein, so they are accepted by natural selection. Hence these matrices are known as accepted point mutation, or point accepted mutation (PAM), matrices. To prepare PAM matrices, the substitutions observed in alignments between closely similar sequences are estimated. These estimates are then used to generate a 20×20 mutation probability matrix P representing all amino acid changes.

Each element P_ij of the matrix represents the probability that amino acid j is replaced by amino acid i over a fixed evolutionary period. PAM1 is the unit of evolutionary divergence in which 1% of the amino acids have been changed. The model has limited value. It is applied only to the alignment and comparison of highly similar, closely related sequences, and does not represent distant relationships well. To overcome this, BLOSUM was later proposed. PAM matrices are used in evolutionary studies.
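Matrices for longer evolutionary distances, such as PAM250, are extrapolated by multiplying the PAM1 probability matrix by itself: PAM-n = (PAM1)^n. This sketch shows the extrapolation on a hypothetical two-letter alphabet in which 1% of residues change per PAM unit. The real matrix is 20×20 and estimated from data. Each column of P sums to 1, matching the definition of P_ij as the probability that j is replaced by i.

```python
def matmul(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matrix_power(M: list[list[float]], n: int) -> list[list[float]]:
    """M multiplied by itself n times (n >= 0), starting from the identity."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, M)
    return result


# Hypothetical two-letter alphabet: 1% of residues change per PAM unit.
PAM1 = [[0.99, 0.01],
        [0.01, 0.99]]
PAM250 = matrix_power(PAM1, 250)
print(round(PAM250[0][0], 3))  # probability a residue is unchanged after 250 PAM units
```

With this toy alphabet, the chance that a residue is unchanged falls from 0.99 to about 0.50 after 250 units. After that many units, the two letters have become almost random with respect to each other.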

DOT PLOT Analysis: The dot plot is a pairwise sequence alignment technique, and the simplest and most basic one. It is a manual, graphical (visual) method of sequence analysis. When two identical sequences are plotted, they give a characteristic pattern: a solid main diagonal. It was first described by A. J. Gibbs and G. A. McIntyre in 1970. It is a graphical method for comparing two sequences to identify regions of similarity or dissimilarity. These regions are shown by the presence or absence of dots on the plot, hence the name dot plot. To construct a dot plot of sequence A and sequence B, the first sequence is placed along the top of the plot (x-axis) and the second along the left side (y-axis). A dot is placed at position (i, j) if character A_i of sequence A is identical to character B_j of sequence B.

A region of consecutive identical characters in both sequences forms a diagonal line on the plot. When long similar sequences are compared, the plot becomes crowded and noisy with chance matches. To overcome this, a sliding window is used: a dot is plotted only when a window of several consecutive characters matches at or above a threshold. An alignment score can then be calculated from the dot plot. Uses: quick visual analysis of sequences; comparison of protein sequences. The plot shows some apparently random dots (noise), while diagonal runs of dots indicate regions of greater similarity between the two sequences.
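The construction and the sliding-window filter described above can be sketched in a few lines. Sequence A runs along the columns (x-axis) and sequence B along the rows (y-axis). With `window=1` and `threshold=1` the result is the plain identity dot plot. The parameter names are illustrative choices.

```python
def dot_plot(a: str, b: str, window: int = 1, threshold: int = 1) -> list[list[bool]]:
    """Dot matrix with rows following b (y-axis) and columns following a (x-axis).

    A dot is set at row j, column i when the windows a[i:i+window] and
    b[j:j+window] have at least `threshold` identical positions.
    """
    rows = len(b) - window + 1
    cols = len(a) - window + 1
    grid = [[False] * cols for _ in range(rows)]
    for j in range(rows):
        for i in range(cols):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            grid[j][i] = matches >= threshold
    return grid


def render(grid: list[list[bool]]) -> str:
    """Text rendering: '*' for a dot, '.' for no dot."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


print(render(dot_plot("GATTACA", "GATTACA")))                        # diagonal plus noise dots
print(render(dot_plot("GATTACA", "GATTACA", window=3, threshold=3)))  # noise filtered out
```

The first plot shows the main diagonal together with scattered dots from repeated letters. The windowed plot keeps only the diagonal, which is the noise reduction the sliding window is meant to provide.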

BLAST (Basic Local Alignment Search Tool): A pairwise sequence alignment tool, developed and maintained by NCBI. It specialises in local sequence alignment instead of whole-sequence alignment. The tool is based on an explicit statistical theory (Altschul et al., 1990). It performs ungapped alignment of local regions. It can align both protein and nucleotide sequences, and is especially used for protein sequences. It is a very fast search tool. It can search a database of millions of sequences in seconds, comparing the query pairwise with each database sequence.

Uses: Constructs pairwise sequence alignments by comparing two sequences. The best tool for finding the single best-matching sequence in the corresponding database. Used to find sequences that are structurally similar to the query sequence, including those with known 3D structures. Used in the interpretation and prediction of structural and functional information. Steps: 1) Selection of the local regions (short words) that show the best similarity. 2) Extension of the search in both directions from the selected region to obtain maximum similarity. Demerits: At any one time, the query sequence is compared with only a single database sequence. Sensitivity: BLAST may sometimes lose sensitivity in selecting the best matches from databases. To maintain its speed, it may miss certain matches that are better than the ones it selects.
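The two steps above (find short word hits, then extend them in both directions) can be sketched as a seed-and-extend search. This simplified version has several assumptions. It uses exact word hits of length `w` with match +1 and mismatch -1. Extension has no gaps and stops by an "X-drop" rule once the running score falls more than `x_drop` below the best seen. Real protein BLAST also accepts neighbourhood words scored with a substitution matrix, and computes statistical significance. All names here are hypothetical.

```python
def find_seeds(query: str, subject: str, w: int = 3) -> list[tuple[int, int]]:
    """Step 1: exact word hits of length w, as (query_pos, subject_pos) pairs."""
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def best_gain(pairs, match: int, mismatch: int, x_drop: int) -> tuple[int, int]:
    """Walk away from the seed; return (best extra score, residues used).
    Stop once the running score falls more than x_drop below the best so far."""
    best = run = best_len = 0
    for n, (x, y) in enumerate(pairs, start=1):
        run += match if x == y else mismatch
        if run > best:
            best, best_len = run, n
        elif best - run > x_drop:
            break
    return best, best_len


def extend_hit(query, subject, qi, sj, w=3, match=1, mismatch=-1, x_drop=2):
    """Step 2: ungapped extension of one word hit to the right and to the left."""
    gain_r, len_r = best_gain(zip(query[qi + w:], subject[sj + w:]),
                              match, mismatch, x_drop)
    gain_l, len_l = best_gain(zip(reversed(query[:qi]), reversed(subject[:sj])),
                              match, mismatch, x_drop)
    score = w * match + gain_r + gain_l
    return score, query[qi - len_l: qi + w + len_r]


def blast_like(query: str, subject: str, w: int = 3) -> tuple[int, str]:
    """Best ungapped local hit (score, query segment) found by seed-and-extend."""
    hits = [extend_hit(query, subject, qi, sj, w)
            for qi, sj in find_seeds(query, subject, w)]
    return max(hits, default=(0, ""))


print(blast_like("GGACGTACGG", "TTACGTACTT"))  # (6, 'ACGTAC')
```

Only regions around word hits are ever examined, which is where BLAST's speed comes from. It is also why a real similarity with no exact word hit can be missed, as the demerits above note.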

1) BLASTP: Searches a protein sequence database for the protein sequences that best match a protein query. 2) BLASTN: Searches a nucleotide sequence database for the nucleotide sequences that best match a nucleotide query. 3) TBLASTN: The query is a protein sequence. The nucleotide sequence database is translated into protein sequences, and the query is compared with the translated nucleotide sequences. 4) BLASTX: The query is a nucleotide sequence and the search is made within a protein sequence database. The nucleotide query is translated into protein sequences, which are compared with the database proteins. 5) TBLASTX: Both the nucleotide query and the nucleotide sequence database are translated into protein sequences, and the search is then carried out.
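The translated searches (BLASTX, TBLASTN, TBLASTX) rely on translating DNA in all six reading frames: three on the given strand and three on its reverse complement. This sketch builds the standard genetic code (NCBI translation table 1) from the usual compact 64-letter string, in TCAG codon order. Stops are shown as `*`, and codons containing other characters become `X`.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> list[str]:
    """Frames +1, +2, +3 on the given strand, then -1, -2, -3 on the reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [translate(seq[f:]) for f in range(3)] + [translate(rc[f:]) for f in range(3)]


print(six_frames("ATGGCC"))  # ['MA', 'W', 'G', 'GH', 'A', 'P']
```

A translated search then compares the protein query (or translated query) against each of these six peptide strings.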

FASTA (FAST-All): A sequence alignment tool developed by Lipman and Pearson in 1985. The FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows sequence names and comments to precede the sequences. The format originates from the FASTA software package, but has now become a near-universal standard in the field of bioinformatics. The simplicity of the FASTA format makes it easy to manipulate and parse sequences using text-processing tools and scripting languages such as R, Python, Ruby and Perl. Comparison with BLAST: FASTA gives better results for nucleotides, but can be used for both protein and nucleotide sequences. It can provide better results than BLASTN, but not better than BLASTP. It is more sensitive than BLAST in selecting the best matches, so fewer sequences are missed during a search.
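The simplicity of the FASTA format can be seen in a short parser. Each record begins with a `>` line holding the name and optional comment, and the following lines up to the next `>` form the sequence. This is a minimal sketch under a few assumptions: blank lines and old-style `;` comment lines are skipped, and a repeated header overwrites the earlier record. In practice, a library parser such as Biopython's `Bio.SeqIO` would normally be used.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header line without '>': concatenated sequence}."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and old-style ';' comment lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = ">seq1 demo protein\nMKTAYIAK\nQRQISFVK\n>seq2\nACGT\n"
print(parse_fasta(example))  # {'seq1 demo protein': 'MKTAYIAKQRQISFVK', 'seq2': 'ACGT'}
```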

Different forms of FASTA: FASTA3: the normal form, used for searching with both nucleotide and protein queries. FASTS3: used to compare linked peptides against a protein sequence database. FASTF3: used to compare mixed peptides against a protein sequence database. FASTX3/FASTY3: used to search a protein sequence database with a translated nucleotide query. TFASTX3/TFASTY3: used to search a translated nucleotide sequence database with a protein query.

C) Multiple Alignment Technique: Objective; manual, simultaneous and progressive methods; databases of multiple alignments; PSI-BLAST; CLUSTALW

Multiple Sequence Alignment: More than two sequences are involved. A set of sequences can be compared at one time and aligned. There are two types: simultaneous multiple sequence alignment and progressive multiple sequence alignment. 1) Simultaneous Multiple Sequence Alignment: All sequences are aligned at once, that is, simultaneously. The sequences are similar, but there is no hierarchical or ordered arrangement of them. Advantage: very fast, quick alignment. Disadvantages: the sequences are not arranged in order of similarity, so the study of evolutionary relationships is not possible.

2) Progressive Multiple Sequence Alignment: The sequences are arranged hierarchically, in a clear-cut order. The alignment proceeds progressively, step by step, so it is a somewhat time-consuming process. The best and most similar sequence is placed next to the query sequence, and so on. Advantages: arrangement in a hierarchical fashion; the study of evolutionary relationships is possible. Disadvantage: comparatively slow and somewhat time-consuming.
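The order in which a progressive method adds sequences comes from a guide tree that joins the most similar sequences first. This sketch computes only that merge order, not the alignment itself. It makes two substitutions for brevity. Distances are alignment-free k-mer distances (k=2 assumed), and clusters are joined by average linkage (UPGMA-style). Clustal W itself builds its guide tree by neighbour-joining from pairwise alignment scores.

```python
from itertools import combinations


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Alignment-free distance: 1 - shared k-mers / k-mers of the smaller set."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not ka or not kb:
        return 1.0
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))


def guide_order(seqs: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Merge steps (closest clusters first) that a progressive aligner would follow."""
    dist = {frozenset((x, y)): kmer_distance(seqs[x], seqs[y], k)
            for x, y in combinations(seqs, 2)}
    clusters = {name: [name] for name in seqs}

    def cluster_distance(x: str, y: str) -> float:
        # Average linkage: mean distance over all member pairs.
        pairs = [(m, n) for m in clusters[x] for n in clusters[y]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    steps = []
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        steps.append((x, y))
        clusters[f"({x},{y})"] = clusters.pop(x) + clusters.pop(y)
    return steps


seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTGGGG"}
print(guide_order(seqs))  # [('s1', 's2'), ('s3', '(s1,s2)')]
```

A progressive aligner would first align s1 with s2, then align s3 to that pair. This ordered, step-by-step merging gives the hierarchical arrangement described above.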

PSI-BLAST PSI-BLAST (Position-Specific Iterative Basic Local Alignment Search Tool) derives a position-specific scoring matrix (PSSM) or profile from the multiple sequence alignment of sequences detected above a given score threshold using protein–protein BLAST. This PSSM is used to further search the database for new matches, and is updated for subsequent iterations with these newly detected sequences. Thus, PSI-BLAST provides a means of detecting distant relationships between proteins.  PSI-BLAST is most conveniently used on the internet with the help of the graphical user interface provided by the PSI-BLAST search page on National Center for Biotechnology Information (NCBI) website ( http://www.ncbi.nlm.nih.gov/BLAST/ ). The PSI-BLAST page may be customized by the user in terms of automated or semiautomated or “two-page formatting” and other parameters modified as desired. This page can then be saved as permanent internet bookmark for repeated use on future occasions.
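The position-specific scoring matrix (PSSM) at the heart of PSI-BLAST can be sketched from a small ungapped alignment of hits. Each column gets a log-odds score, log2(observed frequency / background frequency), for every amino acid. This is a simplified illustration. It assumes a uniform 1/20 background and a flat pseudocount. Real PSI-BLAST uses sequence weighting, data-dependent pseudocounts and measured background frequencies.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column log-odds scores (bits) against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        residues = [r for r in column if r in AMINO_ACIDS]  # ignore gaps '-'
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((residues.count(aa) + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_with_pssm(pssm: list[dict[str, float]], seq: str) -> float:
    """Score a sequence of the same length as the profile."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


hits = ["MKV", "MRV", "MKI"]  # hypothetical aligned hits from a first BLAST round
pssm = build_pssm(hits)
print(round(score_with_pssm(pssm, "MKV"), 2), round(score_with_pssm(pssm, "WWW"), 2))
```

In the iterative search, this profile replaces the single query for the next database scan. Newly found sequences are added to the alignment, and the PSSM is rebuilt.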

It is a hybrid tool and a relatively recent approach that combines elements of both pairwise and multiple sequence alignment methods. It was proposed by Altschul et al. in 1997. It is a hybrid of pairwise sequence alignment and multiple sequence alignment and similarity searching tools. It can align sequences via progressive sequence alignment. Residue-to-residue similarity is searched by comparing the sequences. Wherever similarity occurs, a dot is marked as a graphical representation, and the similarity is then calculated (for example, 5 out of 7 positions similar). Used mainly for nucleotide sequence comparison.

Here, sequences are aligned pairwise, but with repeated BLAST searches in order to find more and more related sequences. PSI-BLAST therefore acts as a pairwise tool while resembling a multiple sequence alignment. Its results contain sequences of maximum, intermediate and least similarity. Advantages: Increases the sensitivity of BLAST searches while remaining fast to run. Provides sequences with a diverse range of similarity, like a multiple sequence alignment. Searches are more sensitive and selective, and can detect weak but meaningful similarities. Running further iterations of the program increases search sensitivity. Disadvantages: Deriving diagnostic family motifs can be very time-consuming and demands a level of understanding beyond general use. Automated iterative searches may degenerate and lead to profile dilution (corruption of the profile by unrelated sequences).

CLUSTAL: 3 forms: ClustalX, ClustalW and Clustal Omega (Ω). ClustalX and ClustalW: alignment of both protein and nucleotide sequences is possible. Clustal Omega: used mainly for aligning protein sequences. ClustalX: the controlling interface is a graphical user interface, using menu-based operations and graphical representations. ClustalW and Clustal Omega: command-line interface, controlled by text commands.

Clustal W: Like the other Clustal tools, Clustal W is used for aligning multiple nucleotide or protein sequences efficiently. It uses progressive alignment methods, which align the most similar sequences first and work their way down to the least similar sequences until a global alignment is created. Clustal W is a matrix-based algorithm, whereas tools like T-Coffee and DIALIGN are consistency-based. Clustal W has a fairly efficient algorithm that competes well against other software. The program requires three or more sequences in order to calculate a global alignment. For pairwise sequence alignment (2 sequences), use tools such as EMBOSS or LALIGN.

A multiple sequence alignment tool that performs progressive multiple sequence alignment. It is written in the C++ programming language and runs on almost all platforms, such as Unix, Linux, Mac and Windows. It was developed by Julie Thompson and Toby Gibson, and is maintained by the EBI. The user interface is a command line, operated by writing text commands. Because the alignment is progressive, the sequences are arranged in order, which makes comparison very easy. Applications: Very easy comparison of sequences, owing to progressive sequence alignment. Very useful for the classification of both protein and nucleotide sequences. Used to predict the structural and functional features of both nucleotide and protein sequences. It is the best tool for studying evolutionary relationships.

4. Protein Structure Prediction: Secondary structure prediction; Chou-Fasman method; JPred prediction method

Secondary structure prediction: Two experimental methods are commonly used for protein structure determination: 1) X-ray diffraction technique 2) Nuclear magnetic resonance (NMR) technique. Both are very expensive, labour-intensive and time-consuming processes. Bioinformatics tools are used to overcome these issues. Computational prediction is less time-consuming and very fast, does not require skilled labour, and is cheaper than the above two methods.

Chou-Fasman Method: The Chou-Fasman method is an empirical technique for predicting secondary structures in proteins, developed by Peter Y. Chou and Gerald D. Fasman. The method analyses the relative frequencies of each amino acid in alpha helices, beta sheets and turns, based on known protein structures solved by X-ray crystallography. From these frequencies, a set of probability parameters was derived for the appearance of each amino acid in each type of secondary structure. These parameters are used to predict the probability that a given sequence of amino acids would form a helix, a beta strand or a turn in a protein. The method is significantly less accurate than modern machine-learning-based techniques, being 50 to 60 percent accurate in identifying the correct secondary structures.

Definition: It is a statistical procedure that fits a given protein to a particular secondary structure. Each amino acid of the given sequence, and its frequency, is compared with the probabilities of amino acids and their propensity values given by Chou and Fasman. Probability table: shows which amino acids, and how many of each, are present in each secondary structure of known proteins. Propensity value: expresses the tendency of a particular amino acid towards a particular secondary structure. The propensity value of an amino acid generally depends on its chemical properties and its R group: # Alpha helix: 4 helix formers in a stretch of 6 residues (the remaining 2 may be breakers) # Beta sheet: 3 sheet formers in a stretch of 5 residues (the remaining 2 may be breakers)

Steps: 1) Scan through the given polypeptide chain to find which amino acids are present in it, and how many of each. 2) Compare them with the probability and propensity values given by Chou and Fasman.
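The scanning step can be sketched for the helix nucleation rule: find 6-residue stretches that contain at least 4 helix formers. The helix propensities P(α) below are the values as commonly tabulated from Chou and Fasman (1978); verify them against the original table before reuse. Two further points are assumptions. Treating residues with P(α) > 1.03 as "formers" is a simplification, and the later extension and helix-versus-sheet comparison steps of the full method are not shown.

```python
# Helix propensities P(alpha) as commonly tabulated from Chou & Fasman (1978).
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
    "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
    "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
    "P": 0.57, "G": 0.57,
}
HELIX_FORMER = 1.03  # assumed cutoff: residues above it count as helix formers


def helix_nuclei(seq: str, window: int = 6, min_formers: int = 4) -> list[int]:
    """Start positions of windows containing at least min_formers helix formers."""
    starts = []
    for i in range(len(seq) - window + 1):
        formers = sum(P_ALPHA[aa] > HELIX_FORMER for aa in seq[i:i + window])
        if formers >= min_formers:
            starts.append(i)
    return starts


def mean_propensity(segment: str) -> float:
    """Average P(alpha) over a segment, used when deciding whether to extend a helix."""
    return sum(P_ALPHA[aa] for aa in segment) / len(segment)


print(helix_nuclei("AEELKAGPGPG"))  # [0, 1, 2]: nuclei in the E/L/K/A-rich start
print(helix_nuclei("GPGPGPGP"))     # []: glycine and proline are helix breakers
```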

JPred prediction method: A protein secondary structure prediction server and a fully automatic method. It has been in operation since approximately 19. JPred incorporates the JNet algorithm in order to make more accurate predictions. It combines 6 independent protein structure prediction methods: ZPRED, MULPRED, DSC, PHD, NNSSP and PREDATOR.

All 6 methods make their predictions independently. A set of 396 protein domains with known secondary structure supports the method. The results of the 6 methods were evaluated against these 396 domains to obtain the final structural information. Instead of using all 6 methods, using 4 of them gives more accurate results than when ZPRED and MULPRED are included; the combination of 4 methods gives an accuracy of 72.9 percent. JPred is thus a secondary structure prediction method that combines the predictions of several independent methods.
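Combining several independent predictions and measuring accuracy against known structures can be sketched with a per-residue majority vote and the Q3 score. Q3 is the percentage of residues assigned the correct state (H helix, E strand, C coil). JPred's actual combination rules are more involved than a plain vote. The tie-break to coil and the prediction strings below are hypothetical.

```python
from collections import Counter


def consensus(predictions: list[str]) -> str:
    """Per-residue majority vote over equal-length H/E/C strings.
    Ties fall back to coil 'C' (an assumed, conservative rule)."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("predictions must all have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        winners = [state for state, c in counts.items() if c == top]
        result.append(winners[0] if len(winners) == 1 else "C")
    return "".join(result)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one."""
    return 100.0 * sum(p == o for p, o in zip(predicted, observed)) / len(observed)


methods = ["HHHECC", "HHCECC", "HHHEEC"]  # hypothetical outputs of three methods
combined = consensus(methods)
print(combined, q3(combined, "HHHECC"))  # HHHECC 100.0
```

Dropping a method is simply a matter of removing its string before voting. Comparing Q3 with and without it mirrors how the 4-method combination was chosen over all 6.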

Tertiary Structure Prediction: Comparative modelling: MODELLER; RasMol

Comparative modelling: Comparative modelling (homology modelling) predicts the 3D structure of proteins. It uses experimentally determined protein structures as models (templates). From these, it predicts the structure of another protein that shows amino acid sequence similarity to the template protein. Evolutionarily related proteins have similar sequences and structures, and these similarities are very high in the core regions. The sequence identity between the target and the template should be greater than 35 percent.

Steps: 1) Selection of the template sequence: select a template from the protein sequence database. The template should show maximum sequence similarity (homology) to the query. 2) Preparation of the sequence alignment: align the two sequences to determine homology. 3) Construction of the 3D model: build the model from the coordinates of the template. Geometric measurements such as length, height and width are considered when comparing the template with the query sequence, based on the template coordinates. 4) Evaluation of the constructed model: evaluate the model against known 3D models. The method is accurate, and its accuracy depends on the quality of the sequence alignment.
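Step 1 (template selection) can be sketched as choosing the candidate with the highest percent identity to the query, provided it exceeds the roughly 35% cutoff mentioned above. The sketch makes three assumptions. The query has already been aligned to each candidate (step 2 done beforehand). Identity is computed over gap-free columns only. The template IDs are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Identity over aligned columns where neither sequence has a gap '-'."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)


def choose_template(alignments: dict[str, tuple[str, str]], cutoff: float = 35.0):
    """Return (template_id, identity) of the best template above cutoff, else None."""
    scored = [(percent_identity(q, t), tid) for tid, (q, t) in alignments.items()]
    best_identity, best_id = max(scored, default=(0.0, None))
    return (best_id, best_identity) if best_identity > cutoff else None


candidates = {  # hypothetical pre-computed query/template alignments
    "tmplA": ("MKV-LLA", "MKVALIA"),
    "tmplB": ("MKVLLA", "GGGLLG"),
}
print(choose_template(candidates))  # ('tmplA', 83.33...): 5 of 6 gap-free columns identical
```

If no candidate passes the cutoff, the function returns `None`. Homology modelling is then unreliable, and other approaches would be needed.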

Homologous structures are identified, together with the extent of their sequence similarity to one another and to the unknown. Sequence database search tools such as BLAST and FASTA are used to find related structures. The sequences are aligned together with the help of the MSA tool ClustalW. Structurally conserved and variable regions are identified. Coordinates for the core residues of the unknown structure are generated from those of the known structures. The side chains and their conformations are then built. The unknown structure is refined and evaluated using various software packages, such as WHAT IF, RasMol and MODELLER. The method exploits evolutionarily related proteins.

MODELLER: Used for 3D structure prediction. It is written in the Fortran 90 language. It is software used for homology (knowledge-based) modelling. It was developed by Andrej Šali at the University of California, San Francisco. The ModWeb comparative protein structure modelling web server is based on MODELLER. It has limited incorporation of ab initio methods. It is a computer program used to produce homology models of protein tertiary as well as quaternary structures. It is freely available for academic use; graphical user interface and commercial versions are distributed separately. It can also be used for sequence database searching, protein structure comparison and sequence clustering.

4 important steps: 1) Selection of the template sequence: select a template sequence from protein sequence databases. The template should show maximum homology with the sequence to be studied. 2) Preparation of the sequence alignment: align the sequence to be analysed with the template sequence. 3) Construction of the 3D model: build the 3D model from the coordinates of the template, using a technique called satisfaction of spatial restraints. Certain geometric criteria (length, breadth, height) are used to compare the template with the query sequence, on the basis of the template coordinates. Loops, folding, side chains etc. are also modelled. 4) Evaluation of the constructed model: up to 90% accuracy can be expected when the sequence alignment is highly accurate.

RasMol: Molecular visualisation software. It enables the structural analysis of proteins, nucleic acids and other similar molecules. It is used for visualising molecular structures, mainly for structural analysis, for example detailed studies of molecular structure. The molecular structure can be zoomed up to the full size of the monitor and rotated in any 3D direction (x, y, z), for example by 180 or 120 degrees. Peripheral analysis is possible. Different colouring schemes are available to highlight particular parts. The entire structure can be viewed, and detailed study is possible using RasMol.

Advantages: Detailed study of structures is possible using RasMol. It is molecular visualisation software that is very good for detailed molecular analysis of molecules such as nucleic acids and proteins. Colouring schemes include: 1) group colouring scheme 2) shapely colouring scheme 3) amino colouring scheme

5. Emerging Areas of Bioinformatics: DNA microarrays; Functional genomics; Comparative genomics; Pharmacogenomics; Chemoinformatics; Medical informatics

DNA Microarrays: A genetic analysis technique used for the analysis of nucleic acids. Hundreds to thousands of microscopic DNA spots are placed on a small glass slide in an orderly fashion. The location of each DNA spot, its structural details, functional details and expression product information are stored in a computer program. Genetic analysis is carried out using this information. The technique started around 1990. It is also called DNA chips, gene chips, DNA arrays, gene arrays and biochips. The principle is hybridisation between complementary nucleotide sequences.

Procedure: mRNA is taken from normal cells that express the genes and applied to the microarray to measure the rate of gene expression. Collect the mRNA and prepare the DNA microarray. Radiolabel the cDNA (100 nos.) made from the mRNA; this serves as the probe. Introduce the probe into the DNA microarray. The radiolabelled cDNA hybridises with the DNA microarray spots, and the signal at each spot indicates the amount of hybridisation.
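Comparing expression between normal and diseased cells usually comes down to a per-spot log2 ratio of signals. This is a minimal sketch under several assumptions. Both hybridisations measure the same spots, and the arrays are globally normalised to equal total signal. A pseudocount of 1 avoids division by zero, and a two-fold change (|log2 ratio| ≥ 1) is used as the cutoff. The gene names and intensities are hypothetical.

```python
import math


def log2_ratios(normal: dict[str, float], disease: dict[str, float],
                pseudo: float = 1.0) -> dict[str, float]:
    """log2(disease / normal) per spot after scaling both arrays to equal total signal."""
    scale = sum(normal.values()) / sum(disease.values())  # global normalisation
    return {gene: math.log2((disease[gene] * scale + pseudo) / (normal[gene] + pseudo))
            for gene in normal}


def call_changes(ratios: dict[str, float], cutoff: float = 1.0) -> dict[str, str]:
    """Label spots as up, down or unchanged using a two-fold (log2 = 1) cutoff."""
    return {gene: "up" if r >= cutoff else "down" if r <= -cutoff else "unchanged"
            for gene, r in ratios.items()}


normal = {"geneA": 100, "geneB": 400, "geneC": 500}   # hypothetical spot intensities
disease = {"geneA": 400, "geneB": 400, "geneC": 200}
print(call_changes(log2_ratios(normal, disease)))
# {'geneA': 'up', 'geneB': 'unchanged', 'geneC': 'down'}
```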

Applications: Gene expression studies: 1) comparison of gene expression in similar cell types (diseased vs normal cells) 2) comparison of gene expression in different cell types (different cells of different individuals). Identification of tissue-specific genes; drug discovery; diagnostics and genetic mapping; study of protein-protein interactions; functional genomics; DNA sequencing; agricultural biotechnology (studying gene expression in plants); DNA polymorphism; detection of pathogens; gene finding; analysis of 100 to 1,000 genes at a time; gene mapping.

Functional genomics: The study of the functions of genes, for example their roles in growth and in physiological and biochemical environments. It also studies why genes become inactive. Genes can be inactivated by the actions of other genes, and the expression of a gene may be lost because another gene suppresses it. It involves the development and application of genomic analysis techniques to identify the genes involved in disease: 1) Positional cloning technique 2) Genome sequencing techniques, for example the Mirring shotgun method, the enzymatic method and the chemical method. These were developed on the basis of functional genomics to obtain information about the structure and function of genes. 3) Gene expression profiling technique: compares similar cell types that differ in gene expression because of mutation, and is used to measure the expression. 4) Knockout technique

Comparative genomics: Compares the structural and functional details of genomes and, based on their similarities and differences, works out their relationships. Uses: gene finding; classification of nucleotide sequences; finding evolutionary relationships; comparison of gene expression; analysis of protein sets from completely sequenced genomes; better understanding of the genomes and biology of the respective organisms. Example: Methanococcus, Mycoplasma, E. coli and Bacillus subtilis are fully sequenced. Example: finding the genes involved in the ripening of green mangoes to yellow mangoes. Here, the mango genome is compared with the annotated genome of a similar species to identify these genes and their functions. Databases used for comparative genomics: PEDANT: gives information about proteins and enzymes. KEGG: a comprehensive set of metabolic pathways of genomes. MBGD: Microbial Genome Database, for searching microbial genomes. WIT: metabolic reconstruction of completely sequenced genomes.
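A common first step when comparing a new genome with an annotated one, as in the mango example, is to pair up likely orthologues as reciprocal best hits. Gene a in genome A and gene b in genome B are paired only if each is the other's top-scoring match. This sketch assumes the similarity scores (for example, BLAST bit scores in both directions) have already been computed. The gene IDs and scores are hypothetical.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """For each query gene, the subject gene with the highest similarity score."""
    best: dict[str, tuple[float, str]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    return {q: s for q, (_, s) in best.items()}


def reciprocal_best_hits(a_vs_b: dict[tuple[str, str], float],
                         b_vs_a: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Hypothetical scores: new-genome genes a1, a2 against annotated genes b1, b2.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 120, ("a2", "b1"): 130}
b_vs_a = {("b1", "a1"): 245, ("b1", "a2"): 128, ("b2", "a2"): 118, ("b2", "a1"): 85}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here a2's best hit is b1, but b1 prefers a1, so a2 gets no confident partner. The annotation (function) of b1 would then be transferred only to a1.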

Pharmacogenomics: The study of the role of the genome in drug response; its name reflects its combination of pharmacology and genomics. Pharmacogenomics analyses how the genetic makeup of an individual affects his or her response to drugs. It deals with the influence of acquired and inherited genetic variation on drug response in patients. It does so by correlating gene expression or single-nucleotide polymorphisms with pharmacokinetics and pharmacodynamics. Pharmacogenomics aims to develop rational means of optimising drug therapy with respect to the patient's genotype, to ensure maximum efficacy with minimal adverse effects. Genomic research will allow drugmakers to tailor therapy to the individual's specific needs.

It is described as a marriage between functional genomics and molecular pharmacology. A new journal on pharmacogenomics was started by the Nature group of journals. Pharmacogenomics covers the entire spectrum of genes that determine response and sensitivity to individual drugs (example: the Human Genome Project). Pharmacogenetics covers the narrower spectrum of inherited differences in drug metabolism and disposition. The two terms, pharmacogenomics and pharmacogenetics, are often used interchangeably. Pharmacogenomics provides tools to classify the heterogeneity of disease and individual responses to medicine, and it is a fascinating area of biotechnology research. Examples: diagnosis, mechanisms of disease and patients' responses to medicine.

2 approaches to pharmacogenomics: 1) candidate gene approach 2) linkage disequilibrium approach. At the industrial level, it is used to understand: variability in clinical trials; differential side effects; inconsistency in disease models.

Chemoinformatics: Also known as cheminformatics, chemo-informatics and chemical informatics. It is the use of computer and information techniques to solve a range of problems in the field of chemistry. Applications: in pharmaceutical companies and academic settings, in the process of drug discovery. These methods can also be used in various other forms in the chemical and allied industries.

Medical informatics: Also called health informatics or clinical informatics. It is information engineering applied to the field of healthcare, essentially the management and use of patient healthcare information. It is a multidisciplinary field that uses health information technology to improve health care through any combination of higher quality, higher efficiency and new opportunities. Applications: gene therapy; neurological and metabolic disorders; cystic fibrosis; infectious diseases; more efficient patient care; cardiovascular diseases; cancer gene therapy; human gene therapy.