• Structural biology gives an understanding of biological function at the molecular level.
• These functions are ultimately due to interactions between molecules.
• Ideally, we want experimentally determined 3D structures of molecules and complexes.
• Sometimes we have to rely on computer models of molecules and their interactions.
Added: Jul 15, 2024
Slides: 92 pages
Slide Content
Structural Bioinformatics
Motivation • Structural biology gives an understanding of biological function at the molecular level. • These functions are ultimately due to interactions between molecules. • Ideally, we want experimentally determined 3D structures of molecules and complexes. • Sometimes we have to rely on computer models of molecules and their interactions. Structural bioinformatics!
What is Biology? • Ecosystem: rain forest, desert, freshwater lake, digestive tract of an animal • Community: all species in an ecosystem • Population: all individuals of a single species • Organism: one single individual • Organ system: a specialized functional system of an organism, e.g. nervous system • Organ: a specialized structural system of an organism, e.g. brain or kidney • Tissue: a specialized substructure of an organ, e.g. nervous tissue, smooth muscle • Cell: a single cell, e.g. neuron, skin cell, stem cell, bacterium • Molecule: e.g. protein, DNA, RNA, sugar, fatty acid, metabolites, pharmaceutical drugs
Structural bioinformatics Structural bioinformatics represents a section of bioinformatics dealing with analysis and prediction of three-dimensional (3D) structures of biological macromolecules such as proteins, RNA, and DNA.
Structural bioinformatics In this course we consider "structural bioinformatics" to be the development and application of computational methods (i) to analyze and predict the conformations of biological macromolecules and (ii) to study relationships between macromolecular structure and function. (iii) Protein molecules and other biological molecules will also be studied. Aims • To present some of the computational challenges in structural biology • To describe computational methods for analyzing and predicting macromolecular conformations and interactions • To give practice in programming techniques for structural bioinformatics • To give practice in the use of molecular graphics and modelling software • To emphasise the relationship between macromolecular shape and function
Some challenges in structural bioinformatics • Given the sequence of a protein, what is its three-dimensional structure? • Given the three-dimensional structures of two macromolecules, will they associate with one another, and what will be the docking orientation? • Given the three-dimensional structure of a macromolecule, what can be inferred about its biological function? • Given the three-dimensional structures of a set of proteins, what can be inferred about their evolutionary relationship? • Given the three-dimensional structure of a macromolecule, can a molecule be designed that will affect its function?
Review of Protein Structure Basics Amino Acids Peptide Formation
Amino Acid Structure Although more than 300 different amino acids have been described in nature, only 20 are commonly found as constituents of mammalian proteins. [Note: These are the only amino acids that are coded for by DNA, the genetic material in the cell.] Each amino acid has: • a carboxyl group • a primary amino group (except for proline, which has a secondary amino group) • a side chain (“R group”) In proteins, almost all of these carboxyl and amino groups are combined through peptide linkage and, in general, are not available for chemical reaction except for hydrogen bond formation.
Amino acid side chains Nonpolar side chains: Each of these amino acids has a nonpolar side chain that does not gain or lose protons or participate in hydrogen or ionic bonds. The side chains of these amino acids can be thought of as “oily” or lipid-like, a property that promotes hydrophobic interactions. Location of nonpolar amino acids in proteins: In proteins found in aqueous solutions (a polar environment), the side chains of the nonpolar amino acids tend to cluster together in the interior of the protein. This phenomenon is known as the hydrophobic effect. The nonpolar R groups thus fill up the interior of the folded protein and help give it its three-dimensional shape. In proteins located in a hydrophobic environment, such as a membrane, the nonpolar R groups are found on the outside surface of the protein, interacting with the lipid environment.
Uncharged polar side chains • These amino acids have zero net charge at physiologic pH • The side chains of cysteine and tyrosine can lose a proton at an alkaline pH • Serine, threonine, and tyrosine each contain a polar hydroxyl group that can participate in hydrogen bond formation • The side chains of asparagine and glutamine each contain a carbonyl group and an amide group, both of which can also participate in hydrogen bonds
Optical properties of amino acids The α-carbon of an amino acid is attached to four different chemical groups (asymmetric) and is, therefore, a chiral, or optically active, carbon atom. Glycine is the exception because its α-carbon has two hydrogen substituents. Amino acids with an asymmetric center at the α-carbon can exist in two forms, designated D and L, that are mirror images. The two forms in each pair are termed stereoisomers, optical isomers, or enantiomers. D-amino acids are found in some antibiotics and in bacterial cell walls.
Lecture 2 Protein Structures
Protein Structure and Folding • Primary Structure • Secondary Structure • Tertiary Structure • Quaternary Structure and Symmetry • Protein Stability • Protein Folding
History • Proteins were long thought to be colloids of random structure • 1934: a crystal of pepsin in an X-ray beam produces a discrete diffraction pattern -> atoms are ordered • 1958: first X-ray structure solved, sperm whale myoglobin; no structural regularity observed • Today: approx. 50’000 structures solved => remarkable degree of structural regularity observed
Hierarchy of Structural Layers • Primary structure: amino acid sequence • Secondary structure: local arrangement of the peptide backbone • Tertiary structure: three-dimensional arrangement of all atoms, peptide backbone and amino acid side chains • Quaternary structure: spatial arrangement of subunits
Protein Structure
Secondary Structure A) The planar peptide group limits polypeptide conformations The peptide group has a rigid, planar structure as a consequence of resonance interactions that give the peptide bond ~40% double-bond character
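Planarity of the peptide group can be checked numerically through the torsion angle ω (Cα–C–N–Cα), which is close to 180° for a trans peptide. The sketch below is a minimal illustration, not part of the original slides: the function name `dihedral` and the idealized test points are assumptions, and real coordinates would come from a structure file.

```python
import math

def dihedral(p1, p2, p3, p4):
    """Torsion angle in degrees defined by four 3D points (p1-p2-p3-p4).

    Uses atan2(|b2| * b1.(b2 x b3), (b1 x b2).(b2 x b3)), where b1, b2, b3
    are the three consecutive bond vectors.
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n2 = cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))

# Idealized planar "trans" arrangement: the outer points lie on opposite
# sides of the central bond, so the magnitude of the angle is 180 degrees.
omega_trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
```

The same function can be applied to the backbone atoms that define φ and ψ, the two torsion angles that remain free to vary once ω is fixed by the planar peptide group.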
The most common regular secondary structures are the α helix and the β sheet Two structures are widespread: α helix and β sheet = regular secondary structures
β sheets are formed from extended chains • Like the α helix, β sheets are stabilized by hydrogen bonding • Hydrogen bonding occurs between neighboring chains rather than within one chain, as is the case with the α helix • Two types: parallel and antiparallel
Pleated appearance of a β sheet • From 2 to 22 strands • Average of 6 strands in proteins • Exhibit a right-handed twist • Figure: 7-stranded sheet
Bovine carboxypeptidase A (figure labels: coils, helices, β sheet)
Turns connect some units of secondary structure • Topology = connectivity of the strands of a sheet; it can be complex or made by a simple turn/loop • Reverse turns or β bends occur at the protein surface and connect strands of β sheets
C) Fibrous proteins have repeating secondary structure
Tertiary Structure • The tertiary structure of a protein describes the folding of its secondary structural elements and specifies the position of every atom in the protein • This information is deposited in the Protein Data Bank (PDB) • Experimentally determined by X-ray crystallography or NMR
Protein crystals
A) Most protein structures are determined by X-ray crystallography or nuclear magnetic resonance • X-ray crystallography: technique that directly images molecules • The X-ray wavelength is short, ~1.5 Å, comparable to the distance between bonded atoms (visible light: ~4000 Å and longer) • Crystal: repetitive arrangement of the same structure => diffraction pattern (the darkness of each spot is a function of the crystal's electron density) • X-rays interact with electrons (not with nuclei) -> an X-ray structure is thus an electron density map of a given protein -> represents contours of atoms
A thin section through a 1.5 Å resolution electron density map of a protein that is contoured in three dimensions
Most protein crystal structures exhibit less than atomic resolution • A crystal is built up of repeating units containing the protein in its native conformation • Protein crystals are highly hydrated (40-60% water) and have a soft, jellylike consistency, unlike NaCl crystals • Molecules are slightly disordered and display Brownian motion -> this determines the resolution limit of a given protein crystal (typically 1.5 – 3 Å) • The inability to obtain crystals that diffract to sufficiently high resolution is a major limiting factor in structure determination
Electron density maps of diketopiperazine at different resolution levels • An electron density map alone is not sufficient to determine the structure of the protein; the amino acid sequence is also required • Computerized fitting of atoms into the experimentally determined electron density map yields a protein structure whose atomic positions can be refined to a precision of about 0.1 Å
Most crystallized proteins maintain their native conformations Key question: does the structure of a protein in a crystal accurately reflect the structure of the protein in solution, where it normally functions? • The protein in the crystal is hydrated like it is in solution • The X-ray structure is similar to the NMR structure, which is determined from proteins that are in solution • Many enzymes remain catalytically active in the crystal
Protein structures can be determined by NMR • Nuclear magnetic resonance (NMR): an atomic nucleus resonates when a magnetic field is applied. This resonance is sensitive to the electronic environment of the nucleus and its interaction with nearby nuclei • Developed since 1980 by Kurt Wüthrich (ETH-Z) to determine protein structures in solution • Because a protein contains many nuclei whose signals would crowd a conventional one-dimensional NMR spectrum, two-dimensional (2D) NMR was developed to measure atomic distances of chemically linked atoms (COSY) or of spatially close atoms (NOESY) • Size limit of about 40 kD, may reach 100 kD soon • Dynamic: can follow protein motion or folding
B) Side chain location varies with polarity Since Kendrew solved the first protein structure, nearly 50’000 protein structures have been reported. No two are the same, but they exhibit some remarkable consistency: globular structures lack the repeating sequences that support the conformation of fibrous proteins. Amino acid side chains in globular proteins are distributed according to their polarity: • Nonpolar residues Val, Leu, Ile, Met and Phe occur mostly in the interior of a protein, excluded from contact with water: hydrophobic core, compact packing (no empty room) • Charged amino acids Arg, His, Lys, Asp, Glu are usually located on the surface and rarely in the hydrophobic core • Uncharged polar groups Ser, Thr, Asn, Gln, Tyr are usually on the surface; when they are found inside, they are hydrogen bonded
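The polarity rule above can be written down as a small lookup, shown here as a minimal sketch. The three residue sets are taken directly from the slide; the function name `expected_location` and the returned labels are hypothetical, and the rule describes a tendency, not a guarantee for any individual residue.

```python
# Residue groups exactly as listed on the slide (three-letter codes).
NONPOLAR = {"Val", "Leu", "Ile", "Met", "Phe"}
CHARGED = {"Arg", "His", "Lys", "Asp", "Glu"}
POLAR_UNCHARGED = {"Ser", "Thr", "Asn", "Gln", "Tyr"}

def expected_location(residue):
    """Return the location a residue tends to occupy in a globular protein.

    The labels are illustrative; residues not listed on the slide
    (e.g. Gly, Ala, Pro, Cys, Trp) are reported as 'unclassified'.
    """
    name = residue.strip().capitalize()
    if name in NONPOLAR:
        return "interior (hydrophobic core)"
    if name in CHARGED:
        return "surface"
    if name in POLAR_UNCHARGED:
        return "surface, or interior if hydrogen bonded"
    return "unclassified"

# Example: tally the expected locations for a short, made-up peptide.
peptide = ["Val", "Lys", "Ser", "Leu", "Asp"]
locations = [expected_location(r) for r in peptide]
```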
Side chain locations in an α helix and a β sheet (figure labels: nonpolar, polar, surface of protein)
Side chain distribution in horse heart cytochrome c (figure labels: hydrophilic amino acids, hydrophobic amino acids)
C) Tertiary structures contain combinations of secondary structure • Globular proteins are built from combinations of secondary structure elements • These combinations of secondary structure elements form protein motifs = supersecondary structures
1. Most common is the βαβ motif: an α helix connects two parallel strands of a β sheet 2. Equally common is the β hairpin: antiparallel strands connected by a tight reverse turn 3. αα motif: two successive antiparallel α helices packed against each other 4. Greek key motif: a β hairpin is folded over to form a 4-stranded antiparallel β sheet
Most proteins can be classified as α, β, or α/β • Secondary structural elements occur in globular proteins in varying proportions • E. coli cytochrome b562, for example, consists only of α helices => α protein • Immunoglobulins contain the immunoglobulin fold => β proteins, which contain a large proportion of β sheets • Most proteins, including lactate dehydrogenase and carboxypeptidase A, are α/β proteins (average ~31% α helix, 28% β sheet) • Further subdivision of proteins by their topology, that is, the connection of secondary structural elements
Selection of protein structures: cytochrome b562 with heme, immunoglobulin fragment, lactate dehydrogenase
Structures of β barrels: human retinol binding protein, peptide amidase F, triose phosphate isomerase
Large polypeptides form domains • Polypeptides of more than ~200 amino acids usually fold into two or more domains -> bi- or multilobal appearance (multidomain proteins are more common in eukaryotes than in prokaryotes) • Most domains consist of 40 to 200 amino acids, with an average diameter of ~25 Å • Many domains are structurally independent units that have the characteristics of globular proteins • Individual domains often have a specific function, e.g. binding of the dinucleotide NAD+ by the nucleotide binding site (Rossmann fold: 2 × βαβαβ units)
D) Structure is conserved more than sequence • Grouping structures into families of high similarity: 50’000 structures define 1’400 protein domain families • 200 different folding patterns account for about half of all known structures • The protein domain is the evolutionary unit, not its sequence • Example: comparison of c-type cytochromes
Quaternary Structure • Many proteins, particularly those of > 100 kD, consist of more than one polypeptide chain • Multiple subunits associate into a defined structure = quaternary structure • Advantages of subunit assembly: for large proteins, for example collagen, assembling multiple subunits is easier than synthesizing one very large polypeptide chain • The site of synthesis can differ from the site of assembly • Damaged components can be replaced • Less genetic information is required to encode self-assembling subunits • Multi-subunit enzymes have multiple catalytic sites that can be co-regulated
Subunits usually associate noncovalently • A multi-subunit protein may consist of identical or nonidentical subunits (homo- or heterooligomeric) • Terminology: oligomers, protomers • Example: hemoglobin is a dimer of αβ protomers • The contact region between subunits resembles the interior of a single-subunit protein: closely packed nonpolar residues, hydrogen bonding, interchain disulfide bridges; it is, however, generally less hydrophobic than the hydrophobic core of a single-subunit protein (subunits are synthesized as monomers and each must be soluble before assembly)
Quaternary structure of hemoglobin (figure label: heme)
Subunits are symmetrically arranged • In the majority of oligomeric proteins, the subunits are symmetrically arranged, that is, protomers occupy geometrically equivalent positions • No inversion or mirror symmetry, because this would require D-amino acids; thus proteins can have only rotational symmetry • Simplest case: cyclic symmetry (Cn), a single axis of rotation (2-, 3-, 4-, or n-fold); C2 is most common • Dihedral symmetry (Dn): an n-fold rotational axis intersects a twofold rotational axis; D2 is most common • Other rotational symmetries: those of the tetrahedron, cube, and icosahedron, for example in spherical viruses
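Cyclic symmetry can be made concrete by generating the equivalent positions of one point under an n-fold rotation. The following is a minimal sketch, assuming the rotation axis is the z-axis; the function name `cyclic_copies` is hypothetical and a real application would rotate every atom of a protomer.

```python
import math

def cyclic_copies(point, n):
    """Return the n positions equivalent to `point` under C_n symmetry.

    The n-fold axis is assumed to be the z-axis, so each copy is the
    original point rotated by k * 360/n degrees about z (k = 0 .. n-1).
    """
    x, y, z = point
    copies = []
    for k in range(n):
        angle = 2.0 * math.pi * k / n
        copies.append((x * math.cos(angle) - y * math.sin(angle),
                       x * math.sin(angle) + y * math.cos(angle),
                       z))
    return copies

# C2, the most common case: the second copy is related to the first
# by a 180 degree rotation about the axis.
dimer_positions = cyclic_copies((1.0, 0.0, 2.0), 2)
```

Only rotations appear here, which matches the slide's point that mirror and inversion operations are excluded because they would turn L-amino acids into D-amino acids.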
Symmetries of oligomeric proteins Rotational symmetry
Symmetries of oligomeric proteins Dihedral symmetry
Symmetries of oligomeric proteins
4) Protein Stability • Native proteins are only marginally stable under physiological conditions -> high turnover • The free energy required for denaturation is ~0.4 kJ/mol per residue -> a fully folded 100-residue protein is ~40 kJ/mol more stable than its unfolded form, roughly the energy of 2 hydrogen bonds • But the energy of all noncovalent interactions within a protein is on the order of thousands of kJ/mol => the native structure results from a delicate balance of powerful counteracting forces
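The arithmetic behind the stability estimate can be written out as a worked equation. The value of about 20 kJ/mol per hydrogen bond is an assumption here; it is the figure implied by the slide's comparison of 40 kJ/mol with two hydrogen bonds.

```latex
\[
\Delta G_{\text{unfold}}
  \approx 0.4\ \frac{\text{kJ}}{\text{mol}\cdot\text{residue}}
  \times 100\ \text{residues}
  = 40\ \frac{\text{kJ}}{\text{mol}}
  \approx 2 \times 20\ \frac{\text{kJ}}{\text{mol}}
\]
```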
A) Proteins are stabilized by several forces • Protein structures are governed mainly by hydrophobic effects and, to a lesser extent, by interactions between polar residues • The hydrophobic effect causes nonpolar substances to minimize their contact with water (the ordering of water is reduced, i.e. its entropy increases, because water no longer has to form “cages” around the hydrophobic groups) • Relative hydropathy of residues: energy required to solubilize a given amino acid in water
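Relative hydropathy is usually tabulated as a per-residue scale. The slides do not say which scale they use, so the sketch below assumes the widely used Kyte-Doolittle scale (positive values are hydrophobic); the function name `mean_hydropathy` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (one-letter codes). Using this scale
# is an assumption; the slide does not name a specific scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(sequence):
    """Average hydropathy of a protein sequence given in one-letter codes."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    residues = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)

# A stretch of nonpolar residues scores positive, a charged one negative.
core_like = mean_hydropathy("IVL")
surface_like = mean_hydropathy("RKDE")
```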
Lecture 3 Biological Databases
Protein Databases
Biological Databases Biological databases are indispensable tools for an efficient and rational storage, accession, and dissemination of the huge amount of biological data. Biological databases collect information and data coming from: Literature Experimental analysis ( in vitro and in vivo ) Bioinformatics analysis ( in silico )
Biological Databases • Biological data and its associated knowledge need to be collected and stored in databases • Fundamental to the survival of science • Each year, the journal Nucleic Acids Research (NAR) dedicates an entire issue to the available databases
What we expect from a database..!! Sequence, functional, structural information, related bibliography Well Structured and Indexed Well cross-referenced (with other databases) Periodically updated Tools for analysis and visualization
Types of Biological Databases • Primary databases • Secondary databases The nucleotide and protein databases are primary databases. The information gathered from primary databases is summarized in secondary databases.
Standard contents of a sequence database Sequences Accession number References Taxonomic data Annotation/curation Keywords Cross-references Documentation
NCBI • Very comprehensive biological database resource • GenBank: the nucleotide sequence database • Provides 42 different resources • Provides a simple and easy-to-use web interface: http://www.ncbi.nlm.nih.gov/ • Sequence submission: done using BankIt or Sequin • Search engine for data retrieval: Entrez, which retrieves information across all the resources under NCBI • Examples: PubMed, Taxonomy, SNP, PubChem, etc.
Tools for analysis • BLAST • Primer-BLAST • BLink • ORF finder • Genome Workbench
DDBJ • The Bioinformation and DDBJ Center provides sharing and analysis services for data from life science research and advances science • Provides freely available nucleotide sequence data
EMBL European Bioinformatics Institute (EMBL-EBI) is part of EMBL. Data resources Research Training Industry
Protein Sequence databases
UniProtKB/Swiss-Prot UniProtKB/Swiss-Prot is the expertly curated component of UniProtKB (produced by the UniProt consortium). It contains hundreds of thousands of protein descriptions, including function, domain structure, subcellular location, post-translational modifications and functionally characterized variants. 💡 UniProt is one of the most widely used protein information resources in the world.
UniProt Universal Protein Resource, formed through the merger of: • Swiss-Prot (SIB and EBI) • TrEMBL (Translated EMBL) • PIR-PSD Features: • BLAST • Align • Retrieve • ID mapping Entry names often consist of the gene name followed by the species, e.g. PAX6_HUMAN. Accession numbers have the following format: e.g. P26367 (the accession of PAX6_HUMAN)
PIR Protein Information Resource Provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information.
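The two identifier types can be told apart with a short check, sketched below under stated assumptions. The accession pattern follows the regular expression published in the UniProt help pages; the entry-name pattern is a simplified assumption (a mnemonic, an underscore, and a species code), and the function names are hypothetical.

```python
import re

# Accession pattern as documented by UniProt (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Simplified entry-name pattern (an assumption): MNEMONIC_SPECIES.
ENTRY_NAME_RE = re.compile(r"([A-Z0-9]{1,10})_([A-Z0-9]{1,5})")

def is_accession(text):
    """True if `text` has the shape of a UniProt accession number."""
    return ACCESSION_RE.fullmatch(text) is not None

def split_entry_name(text):
    """Split an entry name such as 'PAX6_HUMAN' into (mnemonic, species)."""
    match = ENTRY_NAME_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an entry name: {text!r}")
    return match.group(1), match.group(2)

# The example from the slide.
slide_accession_ok = is_accession("P26367")
mnemonic, species = split_entry_name("PAX6_HUMAN")
```

A format check like this only validates the shape of an identifier; whether the record exists still has to be confirmed against the database itself.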
Pfam Proteins contain conserved regions Based on the conserved regions, proteins are classified into families Provides links to external databases like PDB, SCOP, CATH etc. Features: Sequence search, Keyword search View Pfam family, View a sequence, View a structure
PROSITE PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
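A PROSITE pattern is essentially a compact regular expression over amino acid letters, so it can be translated for use with a standard regex engine. The converter below is a minimal sketch that handles only the core syntax (`x`, `[...]`, `{...}`, and repeat counts) and deliberately ignores the N- and C-terminal anchors; the function name `prosite_to_regex` is hypothetical, and the example pattern, written in the style of the C2H2 zinc finger entry, should be treated as illustrative.

```python
import re

# One pattern element: x, a residue, [allowed] or {forbidden},
# optionally followed by (n) or (n,m).
ELEMENT_RE = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT_RE.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                 # any residue
            core = "."
        elif core.startswith("{"):      # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)

zinc_finger = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
```

The resulting string can be passed to `re.search` to scan a protein sequence for the motif.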
Structural databases
PDB – Protein Data Bank CATH SCOP – Structural Classification of Proteins
wwPDB Contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies RCSB-PDB, PDBe, PDBj, BMRB – repositories of protein structure data Files in PDB, mmCIF, PDBML/XML formats
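Coordinates in the legacy PDB format are stored in fixed columns, so an ATOM record can be read by slicing the line. The following is a minimal sketch for a subset of the fields; the function name `parse_atom_line` and the synthetic sample record are assumptions, and mmCIF or PDBML files need a different parser.

```python
def parse_atom_line(line):
    """Parse selected fixed-width fields of a PDB ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                 # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }

# Synthetic record assembled with explicit widths so that every field
# lands in its documented columns (values are made up).
SAMPLE_LINE = (
    "ATOM  " + f"{1:5d}" + " " + " CA " + " " + "ALA" + " " + "A"
    + f"{1:4d}" + " " + "   "
    + f"{11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}"
)

atom = parse_atom_line(SAMPLE_LINE)
```

Slicing by column is preferable to splitting on whitespace, because adjacent fields in real PDB files can run together without a separating space.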
Advanced search – provides comprehensive information about a protein. Sequence info, domain info, sequence similarity, literature, apart from the details of the structure. Cross referenced to SCOP and CATH
CATH • Classification of proteins based on domain structures • Each protein is chopped into individual domains, which are assigned to homologous superfamilies • Hierarchical domain classification of PDB entries
CATH hierarchy • Class: derived from secondary structure content; assigned automatically • Architecture: describes the gross orientation of secondary structures, independent of connectivity • Topology: clusters structures according to their topological connections and numbers of secondary structures • Homologous superfamily: groups together protein domains that are thought to share a common ancestor and can therefore be described as homologous
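The four levels map directly onto the dotted numeric codes used for CATH superfamilies, where each additional number descends one level. The sketch below splits such a code into its levels; the function name `split_cath_code` is hypothetical, and the example code 3.40.50.720 is used only to illustrate the format.

```python
CATH_LEVELS = ("class", "architecture", "topology", "homologous_superfamily")

def split_cath_code(code):
    """Map a four-part CATH code onto the C, A, T and H levels.

    Each level is returned as the cumulative prefix of the code, because
    a level is only meaningful together with its parent levels.
    """
    parts = code.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected a code like '3.40.50.720', got {code!r}")
    return {
        level: ".".join(parts[: index + 1])
        for index, level in enumerate(CATH_LEVELS)
    }

levels = split_cath_code("3.40.50.720")
```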
SCOP Description of structural and evolutionary relationships between all the proteins with known structures Uses the PDB entries Search using keywords or PDB identifiers
Hierarchy in SCOP • Class • Fold • Superfamily • Family • Species