Lec 1.2.3. Structural Bioinformatics.pptx

MazharIqbal393276 59 views 92 slides Jul 15, 2024

Structural Bioinformatics

Motivation
• Structural biology gives an understanding of biological function at the molecular level.
• These functions are ultimately due to interactions between molecules.
• Ideally, we want experimentally determined 3D structures of molecules and complexes.
• Sometimes we have to rely on computer models of molecules and their interactions: structural bioinformatics!

What is Biology?
• Ecosystem: rain forest, desert, freshwater lake, digestive tract of an animal
• Community: all species in an ecosystem
• Population: all individuals of a single species
• Organism: one single individual
• Organ system: a specialized functional system of an organism, e.g. nervous system
• Organ: a specialized structural system of an organism, e.g. brain or kidney
• Tissue: a specialized substructure of an organ, e.g. nervous tissue, smooth muscle
• Cell: a single cell, e.g. neuron, skin cell, stem cell, bacterium
• Molecule: e.g. protein, DNA, RNA, sugar, fatty acid, metabolites, pharmaceutical drugs

Structural bioinformatics Structural bioinformatics represents a section of bioinformatics dealing with analysis and prediction of three-dimensional (3D) structures of biological macromolecules such as proteins, RNA, and DNA.

Structural bioinformatics
In this course we consider "structural bioinformatics" to be the development and application of computational methods
(i) to analyze and predict the conformations of biological macromolecules;
(ii) to study relationships between macromolecular structure and function;
(iii) to study proteins and other biological molecules.
Aims
• To present some of the computational challenges in structural biology;
• To describe computational methods for analyzing and predicting macromolecular conformations and interactions;
• To give practice in programming techniques for structural bioinformatics;
• To give practice in the use of molecular graphics and modelling software;
• To emphasise the relationship between macromolecular shape and function.

Some challenges in structural bioinformatics
• Given the sequence of a protein, what is its three-dimensional structure?
• Given the three-dimensional structures of two macromolecules, will they associate with one another, and what will be the docking orientation?
• Given the three-dimensional structure of a macromolecule, what can be inferred about its biological function?
• Given the three-dimensional structures of a set of proteins, what can be inferred about their evolutionary relationship?
• Given the three-dimensional structure of a macromolecule, can a molecule be designed that will affect its function?

Review of Protein Structure Basics
• Amino Acids
• Peptide Formation

Amino Acid Structure
• Although more than 300 different amino acids have been described in nature, only 20 are commonly found as constituents of mammalian proteins. [Note: These are the only amino acids that are coded for by DNA, the genetic material in the cell.]
• Each amino acid has a carboxyl group, a primary amino group (except for proline, which has a secondary amino group), and a side chain ("R group").
• In proteins, almost all of these carboxyl and amino groups are combined through peptide linkage. In general, they are not available for chemical reaction except for hydrogen bond formation.

Amino acid side chains
Nonpolar side chains
• Each of these amino acids has a nonpolar side chain that does not gain or lose protons or participate in hydrogen or ionic bonds.
• The side chains of these amino acids can be thought of as "oily" or lipid-like, a property that promotes hydrophobic interactions.
Location of nonpolar amino acids in proteins
• In proteins found in aqueous solutions (a polar environment), the side chains of the nonpolar amino acids tend to cluster together in the interior of the protein. This phenomenon is known as the hydrophobic effect.
• The nonpolar R groups thus fill up the interior of the folded protein and help give it its three-dimensional shape.
• In proteins located in a hydrophobic environment, such as a membrane, the nonpolar R groups are found on the outside surface of the protein, interacting with the lipid environment.

Polar side chains
• These amino acids have zero net charge at physiologic pH.
• The side chains of cysteine and tyrosine can lose a proton at an alkaline pH.
• Serine, threonine, and tyrosine each contain a polar hydroxyl group that can participate in hydrogen bond formation.
• The side chains of asparagine and glutamine each contain a carbonyl group and an amide group, both of which can also participate in hydrogen bonds.

Optical properties of amino acids
• The α-carbon of an amino acid is attached to four different chemical groups (asymmetric) and is, therefore, a chiral, or optically active, carbon atom.
• Glycine is the exception because its α-carbon has two hydrogen substituents.
• Amino acids with an asymmetric center at the α-carbon can exist in two forms, designated D and L, that are mirror images.
• The two forms in each pair are termed stereoisomers, optical isomers, or enantiomers.
• D-amino acids are found in some antibiotics and in bacterial cell walls.

Lecture 2 Protein Structures

Protein Structure and Folding
• Primary Structure
• Secondary Structure
• Tertiary Structure
• Quaternary Structure and Symmetry
• Protein Stability
• Protein Folding

History
• Proteins were long thought to be colloids of random structure
• 1934: a crystal of pepsin in an X-ray beam produces a discrete diffraction pattern -> atoms are ordered
• 1958: first X-ray structure solved, sperm whale myoglobin; no structural regularity observed
• Today: approx. 50,000 structures solved => remarkable degree of structural regularity observed

Hierarchy of Structural Layers
• Primary structure: amino acid sequence
• Secondary structure: local arrangement of the peptide backbone
• Tertiary structure: three-dimensional arrangement of all atoms, peptide backbone and amino acid side chains
• Quaternary structure: spatial arrangement of subunits

Protein Structure

Secondary Structure
A) The planar peptide group limits polypeptide conformations
• The peptide group has a rigid, planar structure as a consequence of resonance interactions that give the peptide bond ~40% double-bond character

The most common regular secondary structures, α helix and β sheet
• Two structures are widespread: the α helix and the β sheet = regular secondary structures

β sheets
• β sheets are formed from extended chains
• Like the α helix, β sheets are stabilized by hydrogen bonding
• Bonding occurs between neighboring chains rather than within a single chain, as is the case with the α helix
• Parallel and antiparallel

Pleated appearance of a β sheet
• From 2 to 22 strands
• Average 6 strands in a protein
• Exhibit right-handed twist
[Figure: 7-stranded β sheet]

Bovine carboxypeptidase A
[Figure labels: coils, helices, β sheet]

Turns connect some units of secondary structure
• Topology = connectivity of β sheets; it can be complex or made by a simple turn/loop
• Reverse turns, or β bends, occur at the protein surface and connect strands of β sheets

C) Fibrous proteins have repeating secondary structure

Tertiary Structure
• The tertiary structure of a protein describes the folding of its secondary structural elements and specifies the position of every atom in the protein
• This information is deposited in the Protein Data Bank (PDB)
• Experimentally determined by X-ray crystallography or NMR

Protein crystals

A) Most protein structures are determined by X-ray crystallography or nuclear magnetic resonance
• X-ray crystallography: technique that directly images molecules
• The X-ray wavelength is short, ~1.5 Å, comparable to the distance between atoms (visible light: 4000 Å and longer)
• Crystal: repetitive arrangement of the same structure => diffraction pattern (the darkness of each spot is a function of the crystal's electron density)
• X-rays interact with electrons (not with nuclei) -> an X-ray structure is thus an electron density map of a given protein -> it represents the contours of atoms

A thin section through a 1.5 Å resolution electron density map of a protein that is contoured in three dimensions

Most protein crystal structures exhibit less than atomic resolution
• A crystal is built up of repeating units containing the protein in its native conformation
• Protein crystals are highly hydrated (40-60% water), with a soft, jellylike consistency, unlike NaCl crystals
• Molecules are slightly disordered and display Brownian motion -> this determines the resolution limit of a given protein crystal (typically 1.5 – 3 Å)
• The inability to obtain protein crystals of sufficiently high resolution is a major limiting factor in structure determination

Electron density maps of diketopiperazine at different resolution levels
• An electron density map alone is not sufficient to determine the structure of the protein; the amino acid sequence is also required
• Computerized fitting of atoms into the experimentally determined electron density map yields the protein structure, with atomic positions determined to a precision of about 0.1 Å

Most crystallized proteins maintain their native conformations
Key question: does the structure of a protein in a crystal accurately reflect the structure of the protein in solution, where it normally functions?
• The protein in the crystal is hydrated like it is in solution
• The X-ray structure is similar to the NMR structure, which is determined from proteins that are in solution
• Many enzymes remain catalytically active in the crystal

Protein structures can be determined by NMR
• Nuclear magnetic resonance (NMR): an atomic nucleus resonates when a magnetic field is applied. This resonance is sensitive to the electronic environment of the nucleus and its interaction with nearby nuclei
• Developed since 1980 by Kurt Wüthrich (ETH-Z) to determine protein structures in solution
• Because the many nuclei in a protein would crowd a conventional one-dimensional NMR spectrum, two-dimensional (2D) NMR was developed to measure atomic distances of chemically linked atoms (COSY) or of spatially close atoms (NOESY)
• Size limit of about 40 kD, may reach 100 kD soon
• Dynamic: can follow protein motion or folding

B) Side chain location varies with polarity
• Since Kendrew solved the first protein structure, nearly 50,000 protein structures have been reported
• No two are the same, but they exhibit some remarkable consistency: globular structures lack the repeating sequences that support the conformation of fibrous proteins
• Amino acid side chains in globular proteins are distributed according to their polarity:
  - Nonpolar residues Val, Leu, Ile, Met and Phe occur mostly in the interior of a protein, excluded from contact with water: hydrophobic core, compact packing (no empty room)
  - Charged amino acids Arg, His, Lys, Asp, Glu are usually located on the surface and only rarely in the hydrophobic core
  - Uncharged polar groups Ser, Thr, Asn, Gln, Tyr are usually on the surface; when found inside, they are hydrogen bonded
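
The rule of thumb on this slide can be written as a small lookup. The sketch below is illustrative only: the function names are made up for this example, and residues the slide does not list (for instance Gly, Ala, Pro, Cys, Trp) are deliberately reported as unclassified rather than guessed.

```python
# Residue groups exactly as listed on the slide (three-letter codes).
NONPOLAR = {"VAL", "LEU", "ILE", "MET", "PHE"}
CHARGED = {"ARG", "HIS", "LYS", "ASP", "GLU"}
UNCHARGED_POLAR = {"SER", "THR", "ASN", "GLN", "TYR"}


def polarity_class(residue: str) -> str:
    """Return the slide's polarity class for a three-letter residue code."""
    code = residue.upper()
    if code in NONPOLAR:
        return "nonpolar"
    if code in CHARGED:
        return "charged"
    if code in UNCHARGED_POLAR:
        return "uncharged polar"
    return "unclassified"  # residue not mentioned on the slide


def expected_location(residue: str) -> str:
    """Map a residue to where the slide says it is usually found."""
    locations = {
        "nonpolar": "interior (hydrophobic core)",
        "charged": "surface",
        "uncharged polar": "surface, or interior if hydrogen bonded",
    }
    return locations.get(polarity_class(residue), "not covered by this rule of thumb")


def count_classes(residues):
    """Count how many residues of a sequence fall into each polarity class."""
    counts = {}
    for residue in residues:
        cls = polarity_class(residue)
        counts[cls] = counts.get(cls, 0) + 1
    return counts
```

A real analysis would compare such predictions with solvent accessibility computed from a structure; this lookup only restates the slide.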

Side chain locations in an α helix and a β sheet
[Figure labels: nonpolar, polar, surface of protein]

Side chain distribution in horse heart cytochrome c
[Figure labels: hydrophilic amino acids, hydrophobic amino acids]

C) Tertiary structures contain combinations of secondary structure
• Globular proteins are built from combinations of secondary structure elements
• These combinations of secondary structure elements form protein motifs = supersecondary structures

1. Most common is the βαβ motif: an α helix connects two parallel strands of a β sheet
2. Equally common is the β hairpin: antiparallel strands connected by a tight reverse turn
3. αα motif: two successive antiparallel α helices packed against each other
4. Greek key motif: a β hairpin is folded over to form a 4-stranded antiparallel β sheet

Most proteins can be classified as α, β, or α/β
• Secondary structural elements occur in globular proteins in varying proportions
• E. coli cytochrome b562, for example, consists only of α helices => α protein
• Immunoglobulins contain the immunoglobulin fold => β proteins, which contain a large proportion of β sheets
• Most proteins, including lactate dehydrogenase and carboxypeptidase A, are α/β proteins (average ~31% α helix, 28% β sheet)
• Further subdivision of proteins by their topology, that is, the connection of secondary structural elements

Selection of protein structures
• Cytochrome b562 with heme
• Immunoglobulin fragment
• Lactate dehydrogenase

Structures of β barrels
• Human retinol binding protein
• Peptide amidase F
• Triose phosphate isomerase

Large polypeptides form domains
• Polypeptides of more than 200 amino acids usually fold into more than one domain -> bi- or multilobal appearance; multidomain proteins are more common in eukaryotes than in prokaryotes
• Most domains consist of 40 to 200 amino acid residues, with an average diameter of ~25 Å
• Many domains are structurally independent units that have the characteristics of globular proteins
• Individual domains often have a specific function, e.g. binding of the dinucleotide NAD+ by a nucleotide binding site, the Rossmann fold: two βαβαβ units

Glyceraldehyde-3-phosphate dehydrogenase
• 2 globular domains
• Dinucleotide binding in the N-terminal domain
• Rossmann fold

D) Structure is conserved more than sequence
• Grouping structures into families of high similarity: 50,000 structures define 1,400 protein domain families
• 200 different folding patterns account for about half of all known structures
• The protein domain is the evolutionary unit, not its sequence
• Example: comparison of c-type cytochromes

Quaternary Structure
• Many proteins, particularly those of > 100 kD, consist of more than one polypeptide chain
• Multiple subunits associate into a defined structure = quaternary structure
• Advantages of multi-subunit assembly, for example in collagen:
  - Assembly of multiple subunits is easier than synthesizing one very large polypeptide chain
  - Site of synthesis can differ from site of assembly
  - Damaged components can be replaced
  - Less genetic information is required to encode self-assembling subunits
  - Multi-subunit enzymes have multiple catalytic sites that can be co-regulated

Subunits usually associate noncovalently
• A multi-subunit protein may consist of identical or nonidentical subunits (homo- or hetero-oligomeric)
• Terms: oligomers, protomers
• Example: hemoglobin is a dimer of αβ protomers
• The contact region between subunits resembles the interior of a single-subunit protein: closely packed nonpolar residues, hydrogen bonding, interchain disulfide bridges
• It is generally less hydrophobic than the hydrophobic core of a single-subunit protein (subunits are synthesized as monomers, and each one needs to be soluble before assembly)

Quaternary structure of hemoglobin
[Figure label: heme]

Subunits are symmetrically arranged
• In the majority of oligomeric proteins, the subunits are symmetrically arranged, that is, protomers occupy geometrically equivalent positions
• No inversion or mirror symmetry, because this would require D-amino acids
• Thus proteins can have only rotational symmetry
• Simplest case: cyclic symmetry (Cn), a single axis of rotation (2-, 3-, 4-, n-fold). C2 is most common
• Dihedral symmetry (Dn): an n-fold rotational axis intersects with a twofold rotational axis. D2 is most common
• Tetrahedral, cubic (octahedral), and icosahedral symmetry, for example in spherical viruses
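
Each rotational point group fixes how many geometrically equivalent protomers an oligomer contains, namely the order of the group. The sketch below encodes that as a lookup; the string notation ("C2", "D2", "T", "O", "I") and the function name are conventions chosen for this example, not something the slides define.

```python
def protomer_count(point_group: str) -> int:
    """Number of equivalent protomers implied by a rotational point group.

    Accepts "Cn" (cyclic), "Dn" (dihedral), or "T", "O", "I" for tetrahedral,
    octahedral (cubic) and icosahedral symmetry.
    """
    polyhedral = {"T": 12, "O": 24, "I": 60}  # orders of the rotation groups
    if point_group in polyhedral:
        return polyhedral[point_group]

    kind, fold = point_group[:1], point_group[1:]
    n = int(fold)  # raises ValueError for malformed input such as "C" or "Cx"
    if n < 1:
        raise ValueError(f"fold must be positive: {point_group!r}")
    if kind == "C":
        return n        # one n-fold axis relates n protomers
    if kind == "D":
        return 2 * n    # n-fold axis plus perpendicular twofold axes
    raise ValueError(f"unknown point group: {point_group!r}")
```

For the two most common cases named on the slide, `protomer_count("C2")` gives 2 and `protomer_count("D2")` gives 4; an icosahedral shell has 60 equivalent positions.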

Symmetries of oligomeric proteins Rotational symmetry

Symmetries of oligomeric proteins Dihedral symmetry

Symmetries of oligomeric proteins

4) Protein Stability
• Native proteins are only marginally stable under physiological conditions -> high turnover
• The free energy required for denaturation is ~0.4 kJ/mol per residue -> a fully folded 100-residue protein is only ~40 kJ/mol more stable than its unfolded form, about the energy of 2 hydrogen bonds
• But the energy of all noncovalent interactions within a protein is on the order of thousands of kJ/mol => the native structure results from a delicate balance of powerful counteracting forces
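
The arithmetic behind the stability estimate can be checked in a few lines. This is a sketch using the slide's approximate figures; the hydrogen bond energy of 20 kJ/mol is an assumption inferred from the slide's statement that 40 kJ/mol equals two hydrogen bonds, and the function names are illustrative.

```python
# Approximate free energy of denaturation per residue, as quoted on the slide.
DENATURATION_KJ_PER_MOL_PER_RESIDUE = 0.4
# Assumption: implied by "40 kJ/mol = energy of 2 hydrogen bonds".
HYDROGEN_BOND_KJ_PER_MOL = 20.0


def folding_stability(n_residues: int) -> float:
    """Estimated stability of the folded state in kJ/mol."""
    return n_residues * DENATURATION_KJ_PER_MOL_PER_RESIDUE


def hydrogen_bond_equivalents(n_residues: int) -> float:
    """Express the estimated stability as a number of hydrogen bonds."""
    return folding_stability(n_residues) / HYDROGEN_BOND_KJ_PER_MOL
```

For a 100-residue protein this gives about 40 kJ/mol, or roughly two hydrogen bonds, which is tiny next to the thousands of kJ/mol of noncovalent interactions inside the protein.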

A) Proteins are stabilized by several forces
• Protein structures are governed mainly by hydrophobic effects and, to a lesser extent, by interactions between polar residues
• The hydrophobic effect causes nonpolar substances to minimize their contact with water: once hydrophobic groups are buried, water no longer has to form ordered "cages" around them, so the degree of order of the water decreases (its entropy increases)
• Relative hydropathy of residues: energy required to solubilize a given amino acid in water

Lecture 3 Biological Databases

Protein Databases

Biological Databases
Biological databases are indispensable tools for an efficient and rational storage, accession, and dissemination of the huge amount of biological data.
Biological databases collect information and data coming from:
• Literature
• Experimental analysis (in vitro and in vivo)
• Bioinformatics analysis (in silico)

Biological Databases
• There is a need to collect and store biological data and its associated knowledge in databases
• Databases are fundamental to the survival of science
• Each year, the journal Nucleic Acids Research (NAR) dedicates an entire issue to the available databases

What we expect from a database
• Sequence, functional, and structural information, and related bibliography
• Well structured and indexed
• Well cross-referenced (with other databases)
• Periodically updated
• Tools for analysis and visualization

Types of Biological Databases
• Primary databases
• Secondary databases
The nucleotide and protein databases are primary databases. The information gathered from primary databases is summarized in secondary databases.

Sequence databases

Nucleotide databases
International Nucleotide Sequence Database Collaboration (INSDC):
• NCBI
• EMBL
• DDBJ

Standard contents of a sequence database
• Sequences
• Accession number
• References
• Taxonomic data
• Annotation/curation
• Keywords
• Cross-references
• Documentation

NCBI
• Very comprehensive biological database
• GenBank: the nucleotide sequence database
• Provides 42 different resources
• Provides a simple and easy-to-use web interface: http://www.ncbi.nlm.nih.gov/
• Sequence submission: done using BankIt or Sequin
• Search engine for data retrieval: Entrez, which retrieves information across all the resources under NCBI
• Examples: PubMed, Taxonomy, SNP, PubChem etc.

Tools for analysis
• BLAST
• Primer-BLAST
• B-Link
• ORF finder
• Genome Workbench

DDBJ
• The Bioinformation and DDBJ Center provides sharing and analysis services for data from life science research, in order to advance science
• Provides freely available nucleotide sequence data

EMBL
The European Bioinformatics Institute (EMBL-EBI) is part of EMBL.
• Data resources
• Research
• Training
• Industry

Protein Sequence databases

UniProtKB/Swiss-Prot
• UniProtKB/Swiss-Prot is the expertly curated component of UniProtKB (produced by the UniProt consortium).
• It contains hundreds of thousands of protein descriptions, including function, domain structure, subcellular location, post-translational modifications and functionally characterized variants.
• UniProt is one of the most widely used protein information resources in the world.

UniProt
• Universal Protein Resource, formed through the merger of:
  - Swiss-Prot (SIB and EBI)
  - TrEMBL (Translated EMBL)
  - PIR-PSD
• Features: BLAST, Align, Retrieve, ID mapping
• Entry names often consist of the gene name followed by the species
• Accession numbers have the following format, e.g. P26367 (entry name PAX6_HUMAN)
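
The difference between an accession number (P26367) and an entry name (PAX6_HUMAN) can be made concrete with a small validator. The sketch below is a hedged illustration: the regular expression is the accession pattern as documented in UniProt's help pages, reproduced here as an assumption rather than taken from the slides, and the function names are made up for this example.

```python
import re

# Assumed accession pattern, following UniProt's documented format:
# six characters (e.g. P26367) or ten characters in the newer scheme.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_accession(identifier: str) -> bool:
    """Return True if the whole string looks like a UniProt accession."""
    return ACCESSION_RE.fullmatch(identifier) is not None


def split_entry_name(entry_name: str):
    """Split an entry name such as PAX6_HUMAN into (protein/gene part, species part)."""
    mnemonic, sep, species = entry_name.partition("_")
    if not sep or not mnemonic or not species:
        raise ValueError(f"not an entry name of the form NAME_SPECIES: {entry_name!r}")
    return mnemonic, species
```

With the slide's example, `is_accession("P26367")` is true, while `split_entry_name("PAX6_HUMAN")` yields the gene part `PAX6` and the species part `HUMAN`.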

PIR
Protein Information Resource: provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information.

Pfam
• Proteins contain conserved regions
• Based on the conserved regions, proteins are classified into families
• Provides links to external databases like PDB, SCOP, CATH etc.
• Features: sequence search, keyword search; view a Pfam family, view a sequence, view a structure

PROSITE
• PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them
• PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids
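
To show what a PROSITE pattern looks like in practice, the sketch below translates the pattern syntax into a Python regular expression and scans a sequence. It is a minimal illustration, not PROSITE's own software: it handles only the basic elements (`x`, `[...]`, `{...}`, repeat counts, and the `<` / `>` anchors at the pattern ends), and the function names are hypothetical. The example pattern `N-{P}-[ST]-{P}` is the N-glycosylation site pattern; the test sequence is invented.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")

        # Separate the residue specification from an optional (n) or (n,m) repeat.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)

        if core.lower() == "x":
            regex = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            regex = core                       # single residue or [ABC] choice

        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchored_start else "") + regex + ("$" if anchored_end else ""))
    return "".join(parts)


def find_sites(pattern: str, sequence: str):
    """Return 0-based start positions of non-overlapping pattern matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence)]
```

For example, `find_sites("N-{P}-[ST]-{P}", "AANGSAANPTA")` reports the site starting at index 2 and skips the second `N`, which is followed by a proline. Profiles and ProRule rules carry more information than a regular expression can express, so this covers patterns only.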

Structural databases

• PDB – Protein Data Bank
• CATH
• SCOP – Structural Classification of Proteins

wwPDB
• Contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies
• RCSB PDB, PDBe, PDBj, BMRB: repositories of protein structure data
• Files in PDB, mmCIF, and PDBML/XML formats
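
Of the file formats listed above, the legacy PDB format is a fixed-column text format, which makes atom records easy to read by slicing. The sketch below parses only the fields needed for coordinates, using the column positions of the ATOM/HETATM record in the PDB format specification; the class and function names are made up for this example, and production code would normally rely on an established parser instead.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the fixed PDB column layout."""
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def parse_atoms(lines: Iterable[str]) -> List[Atom]:
    """Collect all coordinate records from the lines of a PDB file."""
    return [
        parse_atom_line(line)
        for line in lines
        if line.startswith(("ATOM  ", "HETATM"))
    ]
```

Given the lines of a downloaded entry, `parse_atoms(open(path))` (with `path` supplied by the reader) would return one `Atom` per coordinate record. The mmCIF and PDBML formats are not column-based and need a different approach.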

Advanced search
• Provides comprehensive information about a protein: sequence info, domain info, sequence similarity, and literature, in addition to the details of the structure
• Cross-referenced to SCOP and CATH

CATH
• Classification of proteins based on domain structures
• Each protein is chopped into individual domains, which are assigned to homologous superfamilies
• Hierarchical domain classification of PDB entries

CATH hierarchy
• Class – derived from secondary structure content; assigned automatically
• Architecture – describes the gross orientation of secondary structures, independent of connectivity
• Topology – clusters structures according to their topological connections and numbers of secondary structures
• Homologous superfamily – groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous
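
The four levels map naturally onto a dotted four-number code, one number per level. The sketch below assumes that notation (for example `3.40.50.720`, used here purely as an illustrative value) and shows how two domains can be compared level by level; the names `CathCode`, `level_prefixes` and `shared_level` are hypothetical helpers, not part of any CATH software.

```python
from typing import NamedTuple, Optional

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath_code(code: str) -> CathCode:
    """Parse a dotted C.A.T.H code into its four integer levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers: {code!r}")
    return CathCode(*(int(part) for part in parts))


def level_prefixes(code: str) -> dict:
    """Map each level name to the code prefix that identifies it."""
    numbers = [str(n) for n in parse_cath_code(code)]
    return {level: ".".join(numbers[: i + 1]) for i, level in enumerate(LEVELS)}


def shared_level(code_a: str, code_b: str) -> Optional[str]:
    """Deepest level at which two domains share a classification, or None."""
    deepest = None
    for level, a, b in zip(LEVELS, parse_cath_code(code_a), parse_cath_code(code_b)):
        if a != b:
            break
        deepest = level
    return deepest
```

Two domains whose codes agree in the first three numbers share a topology (fold) but belong to different homologous superfamilies, which is how the hierarchy separates structural similarity from inferred common ancestry.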

SCOP
• Description of structural and evolutionary relationships between all the proteins with known structures
• Uses the PDB entries
• Search using keywords or PDB identifiers

Hierarchy in SCOP
• Class
• Fold
• Superfamily
• Family
• Species