Bioinformatics: An Overview

SoumitraNath9 35 views 54 slides Jul 15, 2024

About This Presentation

Bioinformatics


Slide Content

Bioinformatics: An overview
Soumitra Nath, mail: [email protected]
Department of Biotechnology, Gurucharan College, Silchar

Bioinformatics = Biological Data + Computer Calculations

What is Bioinformatics? “The field of science in which biology, computer science, and information technology merge to form a single discipline”

Central Dogma in Molecular Biology: Gene (DNA) -> mRNA -> Protein
21st century: Genome -> Transcriptome -> Proteome

The Human Genome Project
Initiated in 1986 (formally launched in 1990); completed in 2003.
Project goals were to:
identify all the genes in human DNA;
determine the sequences of the 3 billion chemical base pairs that make up human DNA;
store this information in databases;
improve tools for data analysis and develop new ones;
address the ethical, legal, and social issues that may arise from the project.

What makes us human? The chimp genome
Chimpanzees are similar to humans in so many ways: they are socially complex, sensitive and communicative, and yet indisputably on the animal side of the man/beast divide. Scientists have now sequenced the genetic code of our closest living relative, showing the striking concordances and divergences between the two species, and perhaps holding up a mirror to our own humanity.

How human are chimps?
Perhaps not surprisingly, a comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%.

Annotation: open reading frames; functional sites; structure and function

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT ...... .............. TGAAAAACGTA

CCTGACAAATTCGACGTGCGGCATTGCATGC AGACGTGCAT G CGTGCAAA TAATCA ATGTGGACTTTTCTGC GATTAT GGA AGA A CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGAC GGAG ATGTCTG ATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT ................................. .............. TGA AAAACGTA
Features marked on the sequence: transcription factor binding site; promoter; transcription start site; ribosome binding site; ORF (open reading frame); CDS (coding sequence).
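Locating an open reading frame like the one marked above can be automated. What follows is only a minimal Perl sketch, not the method used to annotate the slide: the subroutine name find_orfs, the minimum-length parameter, and the short test sequence are all assumptions made for illustration, and only the forward strand is scanned.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every forward-strand ORF (ATG ... stop codon,
# read three bases at a time) that has at least $min_codons codons,
# counting the start codon but not the stop codon.
sub find_orfs {
    my ($dna, $min_codons) = @_;
    $dna = uc $dna;
    $dna =~ s/\s+//g;                      # drop spaces and line breaks
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;                         # position of the current ATG, if any
        for (my $i = $frame; $i + 3 <= length($dna); $i += 3) {
            my $codon = substr($dna, $i, 3);
            if (!defined $start) {
                $start = $i if $codon eq 'ATG';
            }
            elsif ($codon =~ /^(?:TAA|TAG|TGA)$/) {
                my $orf = substr($dna, $start, $i + 3 - $start);
                push @orfs, $orf if ($i - $start) / 3 >= $min_codons;
                undef $start;              # look for the next ATG in this frame
            }
        }
    }
    return @orfs;
}

# Made-up example sequence (an assumption, not taken from the slide).
my $seq = 'CCGGAGATGCAATATGGATGAAAC';
print "$_\n" for find_orfs($seq, 3);       # prints ATGCAATATGGATGA
```

A real gene finder would also scan the reverse complement and look for signals such as the promoter and ribosome binding site; this sketch covers only the start-to-stop scan.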

Molecular biology data types (Lei Liu): organisms; genome maps; DNA sequences (e.g. ...AATGGTACCGATGACCTGGAGCTTGGTTCGA...); RNA sequences; RNA structures; RNA expression; protein sequences (e.g. ...TRLRPLLALLALWPPPPARAFVNQHLCGSHLVEA...); protein structures; DNA motifs; protein motifs.

Bioinformatics

Sequence Analysis

What do we want to know about a sequence? (Larry Hunter)
Is this sequence similar to any known genes? How close is the best match, and is it significant?
What do we know about that gene?
Genomic: chromosomal location, allelic information, regulatory regions, etc.
Structural: known structure? structural domains? etc.
Functional: molecular, cellular, and disease.
Evolutionary information: Is this gene found in other organisms? What is its taxonomic tree?

Biological databases: data is of different types.
Raw data: DNA, RNA, and protein sequences.
Curated data: annotated DNA, RNA, and protein sequences and structures; expression data.

EMBL / GenBank / DDBJ
Serve as archives containing all sequences (single genes, ESTs, complete genomes, etc.) derived from genome projects, sequencing centers, individual scientists, and patent offices (e.g. the European Patent Office, EPO).
Non-confidential data are exchanged daily, so the three databases contain essentially the same information within 2-3 days (with a few differences in format and syntax).
Currently: 18 x 10^6 sequences, over 20 x 10^9 bp; over the last 12 months the database size has tripled.
Sequences from more than 50,000 different species.

NCBI (www.ncbi.nlm.nih.gov)
Created in 1988 as part of the National Library of Medicine at NIH, to:
establish public databases;
conduct research in computational biology;
develop software tools for sequence analysis;
disseminate biomedical information.

NCBI and Entrez
NCBI provides interesting summaries, browsers for genome data, and search tools.
Entrez is its database search interface: http://www.ncbi.nlm.nih.gov/Entrez
You can search on gene names, sequences, chromosomal location, diseases, keywords, ...

Sequence Comparison
DNA is the blueprint for living organisms.
Evolution is related to changes in DNA.
By comparing DNA sequences we can infer evolutionary relationships between the sequences.

Sequence Alignment (Copyright 2004 Limsoon Wong)
A key aspect of sequence comparison is sequence alignment.
A sequence alignment maximizes the number of positions that are in agreement in two sequences.
[Figure: Sequence U aligned against Sequence V, with a match, a mismatch, and an indel marked]
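One common way to compute such an alignment is dynamic programming. The Perl sketch below implements a basic global alignment (the Needleman-Wunsch approach), which the slides do not name; the scoring values (+1 match, -1 mismatch, -1 indel), the subroutine name align, and the example sequences are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Global alignment of two sequences by dynamic programming.
# Assumed scores: match +1, mismatch -1, indel (gap) -1.
sub align {
    my ($u, $v) = @_;
    my ($match, $mismatch, $gap) = (1, -1, -1);
    my @u = split //, $u;
    my @v = split //, $v;
    my ($n, $m) = (scalar @u, scalar @v);

    # $score[$i][$j] = best score for aligning the first $i letters of $u
    # with the first $j letters of $v.
    my @score;
    $score[$_][0] = $_ * $gap for 0 .. $n;
    $score[0][$_] = $_ * $gap for 0 .. $m;
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $diag = $score[$i-1][$j-1]
                     + ($u[$i-1] eq $v[$j-1] ? $match : $mismatch);
            my $up   = $score[$i-1][$j] + $gap;
            my $left = $score[$i][$j-1] + $gap;
            my $best = $diag;
            $best = $up   if $up   > $best;
            $best = $left if $left > $best;
            $score[$i][$j] = $best;
        }
    }

    # Trace back from the bottom-right corner to recover one best alignment.
    my ($i, $j) = ($n, $m);
    my ($row_u, $row_v) = ('', '');
    while ($i > 0 || $j > 0) {
        if ($i > 0 && $j > 0
            && $score[$i][$j] == $score[$i-1][$j-1]
               + ($u[$i-1] eq $v[$j-1] ? $match : $mismatch)) {
            $row_u = $u[$i-1] . $row_u;    # match or mismatch column
            $row_v = $v[$j-1] . $row_v;
            $i--; $j--;
        }
        elsif ($i > 0 && $score[$i][$j] == $score[$i-1][$j] + $gap) {
            $row_u = $u[$i-1] . $row_u;    # indel: gap in $v
            $row_v = '-' . $row_v;
            $i--;
        }
        else {
            $row_u = '-' . $row_u;         # indel: gap in $u
            $row_v = $v[$j-1] . $row_v;
            $j--;
        }
    }
    return ($score[$n][$m], $row_u, $row_v);
}

my ($score, $row_u, $row_v) = align('GATTACA', 'GATCA');
print "score: $score\n$row_u\n$row_v\n";
```

Several alignments can share the best score, so the traceback returns just one of them. Production tools use substitution matrices and more refined gap penalties than the flat values assumed here.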

Multiple Alignment: An Example
A multiple sequence alignment maximizes the number of positions in agreement across several sequences.
Sequences belonging to the same “family” usually have more conserved positions in a multiple sequence alignment.
[Figure: a multiple alignment with the conserved sites marked]

Phylogeny: An Example
By looking at the extent of conserved positions in the multiple sequence alignment of different groups of sequences, we can infer when they last shared an ancestor, and so construct a “family tree”, or phylogeny.

Visualizing the 3D structure of proteins

Levels of protein structure: primary (1°), secondary (2°), tertiary (3°), quaternary (4°)
From: Branden & Tooze, “Introduction to Protein Structure”

X-ray source: small-scale in the lab, or at a national synchrotron facility.
Getting crystals of proteins or nucleic acids is no small feat!
Diffraction pattern. Problem: there is no way to “focus” the diffracted X-rays, so the phases need to be determined.
Computers aid in model building, phase determination, and visualization.

Software for 3D structure visualization: Cn3D
Cn3D is a visualization tool for macromolecules. It allows you to view 3-D structures from NCBI's Entrez retrieval service. Cn3D is able to correlate structure and sequence information; for example, you can find the residues in a crystal structure that correspond to known disease mutations.

Software for 3D structure visualization: RasMol
RasMol is a molecular graphics program intended for the visualization of proteins, nucleic acids, and small molecules. It is aimed at display, teaching, and generation of publication-quality images.

Software for 3D structure visualization: Swiss-PdbViewer
Swiss-PdbViewer is used to calculate the distances and angles between atoms. It allows browsing a rotamer library in order to change amino acid side chains. This can be very useful to quickly evaluate the assumed effect of a mutation before actually doing the lab work. It also allows altering the torsion angles of amino acids and hetero-atoms, as well as the backbone omega, phi and psi angles.
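The distance and angle measurements mentioned above reduce to simple vector arithmetic on atomic coordinates. The Perl sketch below shows that arithmetic only; it is not how Swiss-PdbViewer is implemented, and the subroutine names and the coordinates are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each atom is an array reference [x, y, z], in angstroms.

# Euclidean distance between two atoms.
sub distance {
    my ($p, $q) = @_;
    return sqrt( ($p->[0] - $q->[0])**2
               + ($p->[1] - $q->[1])**2
               + ($p->[2] - $q->[2])**2 );
}

# Angle in degrees at $atom2, between the directions to $atom1 and $atom3.
sub angle_deg {
    my ($atom1, $atom2, $atom3) = @_;
    my @u = map { $atom1->[$_] - $atom2->[$_] } 0 .. 2;
    my @v = map { $atom3->[$_] - $atom2->[$_] } 0 .. 2;
    my $dot   = $u[0]*$v[0] + $u[1]*$v[1] + $u[2]*$v[2];
    my @cross = ( $u[1]*$v[2] - $u[2]*$v[1],
                  $u[2]*$v[0] - $u[0]*$v[2],
                  $u[0]*$v[1] - $u[1]*$v[0] );
    my $cross_len = sqrt( $cross[0]**2 + $cross[1]**2 + $cross[2]**2 );
    my $pi = 4 * atan2(1, 1);
    # atan2 of |u x v| and u.v gives the angle without needing acos.
    return atan2($cross_len, $dot) * 180 / $pi;
}

# Made-up coordinates (not taken from a real structure).
my ($first, $centre, $third) = ([1.5, 0, 0], [0, 0, 0], [0, 1.5, 0]);
printf "distance: %.2f\n", distance($first, $centre);
printf "angle: %.1f\n", angle_deg($first, $centre, $third);
```

Using atan2 keeps the sketch within Perl's built-in functions and stays well behaved for angles near 0 and 180 degrees.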

CADD (Computer-Aided Drug Design)

What is a drug target?
A drug target may be a native protein (or sometimes DNA/RNA) in the body whose activity is modified by a drug, resulting in a desirable therapeutic effect.
Drug targets may be: enzymes; hormone receptors; ion channel proteins; sometimes DNA or RNA.

The Drug Designing Pathway:
Disease -> Drug target
Ligand database, natural product, or combinatorial library -> Ligand
Ligand -> Lipinski & ADMET filters -> Docking (in silico binding study)
-ve docking result -> Side chain modification
+ve docking result -> Synthesis -> In vitro screening (+ve result or -ve result)
+ve result -> In vivo screening -> Clinical trials

There are two major types of drug design.
Ligand (analog) based drug design: 1. Receptor structure is not known. 2. Mechanism is known/unknown. 3. Ligands and their biological activities are known.
Target (structure) based drug design: 1. Receptor structure is known. 2. Mechanism is known. 3. Ligands and their biological activities are known/unknown.
Computational tools are used to:
identify and study drug targets of various diseases;
study and identify a suitable ligand that binds to the drug target;
predict the toxicity and drug-likeness of small molecules (Lipinski filters & ADMET screening);
generate combinatorial libraries.

Prerequisites of a docking experiment:
3D structure of the protein (drug target): download it from the Protein Data Bank (www.rcsb.org/pdb), a macromolecular structure database. If it is not available in the PDB, predict the structure (homology modeling, ab initio prediction, threading, etc.).
3D structure of the small molecule (ligand): 2D structures of small molecules are available in databases such as PubChem, KEGG LIGAND, etc. The structure of an isolated natural product or synthetic compound may also be derived using NMR spectroscopy and/or X-ray crystallography (XRC). Convert the 2D small molecule to its 3D structure using software such as CORINA (the name comes from COoRdINAtes).

Lipinski's Rule of Five
Molecular weight ≤ 500 Da
C logP ≤ 5 (octanol/water partition coefficient)
H-bond donors ≤ 5
H-bond acceptors (sum of N and O atoms) ≤ 10
No. of rotatable bonds ≤ 10 (a later addition from Veber's criteria, not one of Lipinski's original four rules)
Lipinski's Rule of Five is applicable to orally active compounds.
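The thresholds listed above are easy to apply as a filter once the properties of a compound are known. The Perl sketch below assumes those properties have already been computed elsewhere; the subroutine name lipinski_violations, the hash keys, and the property values are hypothetical and chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: list which of the criteria above a compound breaks.
# The hash keys (mol_weight, logp, ...) are names invented for this sketch.
sub lipinski_violations {
    my (%mol) = @_;
    my @violations;
    push @violations, 'molecular weight > 500' if $mol{mol_weight}   > 500;
    push @violations, 'logP > 5'               if $mol{logp}         > 5;
    push @violations, 'H-bond donors > 5'      if $mol{hb_donors}    > 5;
    push @violations, 'H-bond acceptors > 10'  if $mol{hb_acceptors} > 10;
    push @violations, 'rotatable bonds > 10'   if $mol{rot_bonds}    > 10;
    return @violations;
}

# Approximate, illustrative values for a small aspirin-like molecule.
my %compound = (
    mol_weight   => 180.2,
    logp         => 1.2,
    hb_donors    => 1,
    hb_acceptors => 4,
    rot_bonds    => 3,
);

my @problems = lipinski_violations(%compound);
if (@problems) {
    print "Violations: ", join('; ', @problems), "\n";
}
else {
    print "Passes all five filters\n";
}
```

Returning the list of broken criteria, rather than a plain yes/no, makes it simple to apply a looser policy later, such as tolerating a single violation.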

ADME-Tox Screening
Absorption: the compound must be easily absorbed by the body.
Distribution: the compound needs to be easily transferred and distributed to its target site.
Metabolism: the compound should take part in various metabolic activities.
Excretion: byproducts need to be excreted from the body.
Toxicity: the toxic effect must be neutralized.

Example: Tubulin as a cancer drug target
The tubulin heterodimer (α + β) is the basic structural unit of the microtubule: Tubulin-α + Tubulin-β -> Heterodimer -> Microtubule.
The drug molecule (Taxol) binds to β-tubulin within the microtubule and stabilizes it, so the microtubule cannot disassemble. As a result, the mitotic spindle cannot function and cell division ceases.

Benefits of Bioinformatics
To the patient: better drug, better treatment.
To the pharma industry: save time, save cost, make more $.
To the scientist: better science.

Programme Designing

PERL: Practical Extraction and Report Language
Perl 1.0.0, Larry Wall, 1987. http://www.perl.org/
Perl is a programming language that is offered at no cost.

Why Perl?
Fairly easy to learn the basics.
Many powerful functions for working with text: search & extract, modify, combine.
Can control other programs.
Free and available for all operating systems.
Widely used in bioinformatics.
Many pre-built “modules” are available that do useful things.

Get Perl
You can install Perl on any type of computer.
Download and install Perl on your own computer: www.perl.org
Windows version: http://www.activestate.com/Products/ActivePerl/
On your desktop, set up a shortcut to the Command Prompt (Programs/Accessories/Command Prompt).
Edit the properties of the shortcut so that the “Start in” field is blank.

Extension and Path
On Windows systems, it's usual to associate the filename extension .pl with Perl programs. This is done as part of the Perl installation process, which modifies the registry settings to include this file association. You can then launch a program by its name, e.g. this_program.pl.
In MS-DOS, type the complete pathname to the program, for instance: perl c:\windows\desktop\my_program.pl
Notepad works satisfactorily as an editor for writing programs.

(Computers are VERY dumb: they do exactly what you tell them to do, so be careful what you ask for...)

Program details
Perl programs usually start with the line: #!/usr/bin/perl
This tells Linux that this is a Perl program and where to find the Perl interpreter. On Windows this is not needed, since the .pl extension is enough, but it is a good idea to include this line anyway.
All other lines that start with # are considered comments, and are ignored by Perl.
Lines that are Perl commands end with a ;

The simplest program
    #!/usr/bin/perl
    print "Hello\n";

Length
    #!/usr/bin/perl
    # Store a DNA sequence and print how many bases it contains.
    my $dna = "ATGCTGATGCGT";
    my $len = length($dna);
    print "$len\n";

Reverse
    #!/usr/bin/perl
    # Assigning to a scalar makes reverse() reverse the string itself.
    my $dna = "ATGCAGC";
    my $rev = reverse($dna);
    print "$rev\n";

Reverse Complement
    #!/usr/bin/perl
    # Reverse the sequence, then swap each base for its complement.
    my $DNA = "ATGCAGTCAGT";
    my $revcom = reverse $DNA;
    $revcom =~ tr/ATGC/TACG/;
    print "$revcom\n";

Transcription (DNA to RNA)
    #!/usr/bin/perl
    # Transcribe DNA to RNA by replacing every T with U.
    my $DNA = 'ATGTGCGTGACGTGCAGT';
    my $RNA = $DNA;
    $RNA =~ s/T/U/g;
    print "$RNA\n\n";
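The program above replaces T with U, which gives the RNA sequence. Going one step further, from nucleotides to protein, needs a codon table. The following Perl sketch is one way that might look; the subroutine name translate and the choice to stop at the first stop codon are assumptions of this sketch, while the table itself is the standard genetic code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code in its usual compact form: one amino acid letter per
# codon, with the bases ordered T, C, A, G at each of the three positions.
my $amino = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my @bases = qw(T C A G);
my %codon_table;
my $k = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($amino, $k++, 1);
        }
    }
}

# Translate a DNA (or RNA) coding sequence, reading from its first base,
# and stop at the first stop codon. Unknown codons become X.
sub translate {
    my ($seq) = @_;
    $seq = uc $seq;
    $seq =~ tr/U/T/;                      # accept RNA input as well
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        my $codon = substr($seq, $i, 3);
        my $aa = exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
        last if $aa eq '*';               # stop codon
        $protein .= $aa;
    }
    return $protein;
}

print translate('ATGTGCGTGACGTGCAGT'), "\n";   # prints MCVTCS
```

The example reuses the sequence from the slide. For real work, the BioPerl modules provide tested translation routines and alternative genetic codes.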

Using <STDIN>
    print "TYPE THE DNA FRAGMENT: ";
    my $DNA = <STDIN>;
    chomp($DNA);    # remove the trailing newline
    my $L = length($DNA);
    print "The length of the sequence is $L\n";

We are living just because of non-living chemical compounds.
THANK YOU