Protein Sequence Analysis

RamikaSingla 14,363 views 30 slides Mar 15, 2019

About This Presentation

By Ramika


Slide Content

KVA DAV COLLEGE FOR WOMEN, KARNAL
DEPARTMENT OF BIOTECHNOLOGY
PROTEIN SEQUENCE ANALYSIS
Presented by: Ramika, M.Sc. Biotechnology, 1st year (2399620006)

CONTENTS
- Introduction
- History
- Preparing proteins for sequencing
- Sequencing methods
- N-terminal sequencing
- C-terminal sequencing
- DNA sequencing
- Protein mass spectrometry
- Bioinformatics tools

INTRODUCTION
Protein: a polymer of amino acids. Protein structure and function depend on the amino acid sequence.
Protein sequencing: a technique for determining the order of amino acids in a protein.
- Important for understanding cellular functions.
- Important for targeting drugs to specific metabolic pathways.

HISTORY
- 1951: Fred Sanger and colleagues reported the first amino acid sequence of a protein chain, the B chain of insulin; the complete insulin sequence followed by 1955. The "Sanger method" of DNA sequencing (chain termination), which Sanger developed later, in 1977, is a separate technique. It was a milestone in sequencing long molecules such as DNA and was eventually used in the Human Genome Project.
- 1969: Analysis of tRNA sequences was used to infer residue interactions from correlated changes in the nucleotide sequence, giving rise to a model of tRNA secondary structure.
- 1970: Saul B. Needleman and Christian D. Wunsch published the first computer algorithm for aligning two sequences.
- 1977: Publication of the first complete DNA genome sequence, that of bacteriophage φX174.

PREPARING THE PROTEIN FOR SEQUENCING
1. If the protein contains more than one polypeptide chain, the chains are separated and purified.
2. Intrachain S-S (disulfide) cross-bridges between cysteine residues in the polypeptide chain are cleaved. (If the disulfides are interchain linkages, step 2 precedes step 1.)
3. The amino acid composition of each polypeptide chain is determined.
4. The N-terminal and C-terminal residues are identified.
5. Each polypeptide chain is cleaved into smaller fragments, using at least two different cleavage procedures so that the two sets of fragments overlap.
6. The sequence of each peptide fragment is determined.
7. The overall amino acid sequence of the protein is reconstructed from the sequences of the overlapping fragments.
8. The positions of the S-S cross-bridges formed between cysteine residues are located.
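The reconstruction step (rebuilding the chain from overlapping fragments) can be sketched in code. This is a minimal greedy illustration, not a description of any laboratory software: the function names `overlap` and `assemble` and the example peptide are hypothetical, and it assumes error-free fragments whose true overlaps are longer than any chance overlaps.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments):
    """Greedily merge the pair of fragments with the longest overlap until one remains."""
    frags = sorted(set(fragments))
    # Drop fragments that are fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            raise ValueError("remaining fragments do not overlap")
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


# Hypothetical fragments from two different cleavage procedures.
digest_1 = ["MKTAY", "IAKQR", "QISFVK"]
digest_2 = ["TAYIAK", "KQRQIS"]
print(assemble(digest_1 + digest_2))
```

Fragments from a single digest only abut one another; it is the second digest that supplies the overlaps needed to order them.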

Separation of polypeptide chains: Subunit associations in multimeric proteins are typically maintained solely by noncovalent forces, so most multimeric proteins can be dissociated by exposure to pH extremes, 8 M urea, 6 M guanidinium hydrochloride, or high salt concentrations.
Cleavage of disulfide bridges: Oxidation of a disulfide by performic acid yields two equivalents of cysteic acid.

SEQUENCING METHODS
- N-terminal sequencing
- C-terminal sequencing
- Prediction from DNA sequence

N-TERMINAL SEQUENCING
N-terminal sequencing is done by:
- Sanger's method
- Dansyl chloride method
- Edman degradation

Sanger's method
1. Treat the polypeptide with DNFB (1-fluoro-2,4-dinitrobenzene) to form a dinitrophenyl (DNP) derivative of the amino-terminal amino acid.
2. Acid hydrolysis.
3. Extraction of the DNP derivative with an organic solvent.
4. Identification of the DNP derivative by chromatography and comparison with standards.

Dansyl chloride method
Reagent: 1-dimethylaminonaphthalene-5-sulfonyl chloride (dansyl chloride).
1. The dansyl polypeptide is prepared.
2. Acid hydrolysis liberates the free amino acids and the N-terminal dansyl amino acid.
3. The amino acids are separated.
4. The fluorescence of the dansyl amino acid is detected.
5. The N-terminal amino acid is identified by comparison with standard dansylated amino acids.

Edman degradation
Principle: It sequentially removes one residue at a time from the amino end of a peptide.
Mechanism: Phenyl isothiocyanate reacts with the uncharged N-terminal amino group to form a phenylthiocarbamoyl derivative. Under acidic conditions the terminal residue is then cleaved off as a thiazolinone derivative, leaving the rest of the peptide intact. The thiazolinone derivative is extracted into an organic solvent and treated with acid to form the more stable phenylthiohydantoin (PTH) amino acid, which can be identified by chromatography.
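The cycle-by-cycle logic of the method can be mimicked with a short sketch. It models only the bookkeeping (one residue released per cycle, the shortened peptide carried into the next cycle), not the chemistry; the function name `edman_cycles` and the example peptide are hypothetical.

```python
def edman_cycles(peptide, cycles=None):
    """Yield (cycle number, released N-terminal residue, remaining peptide)."""
    remaining = peptide
    n = len(peptide) if cycles is None else min(cycles, len(peptide))
    for cycle in range(1, n + 1):
        # The N-terminal residue is removed and identified; the rest stays intact.
        released, remaining = remaining[0], remaining[1:]
        yield cycle, released, remaining


for cycle, residue, rest in edman_cycles("GLYS"):
    print(cycle, residue, rest)
```

Reading the released residues in cycle order gives the sequence from the N-terminus inward.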

C-TERMINAL SEQUENCING
1. Add carboxypeptidases to a solution of the protein.
2. Take samples at regular intervals.
3. Determine the C-terminal amino acid by analyzing a plot of amino acid concentration against time.

DNA SEQUENCING
A protein sequence can also be determined indirectly, from the DNA or mRNA that encodes it.
1. Design primers from a known part of the amino acid sequence and amplify the gene.
2. Sequence the gene and deduce the amino acid sequence of the protein.
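Deducing the amino acid sequence from a gene sequence amounts to reading codons through the genetic code. A minimal sketch, assuming the standard genetic code, a coding strand that already starts in frame, and no introns; the function name `translate` is illustrative only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a coding DNA (or mRNA) sequence, stopping at the first stop codon."""
    dna = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTGA"))
```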

MASS SPECTROMETRY
Mass spectrometry is an important method for accurate mass determination and characterization of proteins.
Basic principle: The technique studies the effect of ionizing energy on molecules. It depends on chemical reactions in the gas phase in which sample molecules are consumed during the formation of ionic and neutral species.
Components: The instrument consists of three major components:
- Ion source: produces gaseous ions from the substance being studied.
- Analyzer: resolves the ions into their characteristic mass components according to their mass-to-charge ratio.
- Detector system: detects the ions and records the relative abundance of each resolved ionic species.
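Because the analyzer separates ions by mass-to-charge ratio, the expected m/z of a peptide can be worked out from its sequence. The sketch below is an illustration under stated assumptions: it uses rounded monoisotopic residue masses, an unmodified peptide, and protonated ions; the names `peptide_mass` and `mz` are hypothetical.

```python
# Approximate monoisotopic residue masses in daltons (rounded values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water is added for the free N- and C-termini
PROTON = 1.00728   # mass of the charge carrier


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence, charge):
    """Mass-to-charge ratio of the peptide carrying `charge` extra protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


print(round(peptide_mass("GG"), 4), round(mz("GG", 1), 4))
```

Leucine and isoleucine have the same mass, which is why the table lists identical values for L and I.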

BIOINFORMATICS TOOLS
Bioinformatics: the collection, classification, storage and analysis of biochemical and biological information using computers, especially as applied to molecular genetics and genomics.
It is an interdisciplinary field that develops methods and software tools for understanding biological data. It combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data.

MASTER LAYOUT FOR PROTEIN SEQUENCING

TYPES OF SEQUENCE ALIGNMENT
Based on the number of sequences being compared, alignment is of two types:
- Pairwise alignment
- Multiple alignment

PAIRWISE SEQUENCE ALIGNMENT
Pairwise sequence alignment compares only two sequences at a time.

    a b a c d
    a b _ c d

Optimality is based on a score. A pairwise alignment consists of a series of paired residues (bases for nucleic acids, amino acids for proteins), one from each sequence. There are three types of pairs:
- Matches: the same residue appears in both sequences.
- Mismatches: different residues are found in the two sequences.
- Gaps: a residue in one sequence is paired with a null position in the other.
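To make the idea of a score concrete, the sketch below scores an alignment that has already been laid out, column by column. The scoring values (+1 match, -1 mismatch, -2 gap) are arbitrary assumptions chosen for illustration; real tools use substitution matrices and tuned gap penalties. The function name `score_alignment` is hypothetical.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-2, gap_char="_"):
    """Sum the column scores of two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for x, y in zip(row_a, row_b):
        if x == gap_char or y == gap_char:
            score += gap        # a residue paired with a null position
        elif x == y:
            score += match      # identical residues
        else:
            score += mismatch   # different residues
    return score


# The example alignment from this slide: four matches and one gap.
print(score_alignment("abacd", "ab_cd"))
```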

ALGORITHMS
The algorithms used are the Needleman-Wunsch algorithm (global alignment) and the Smith-Waterman algorithm (local alignment).

BLAST (Basic Local Alignment Search Tool)
BLAST encompasses many different implementations and enhancements of a search algorithm that finds "high-scoring segment pairs" (HSPs) between a query and the sequences in a database.
- It is a fast way to find similar sequences.
- It is a heuristic, so it is not the most sensitive way to search.
- It is among the most widely used tools in bioinformatics.
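Returning to the global alignment algorithm named on this slide, a compact sketch of Needleman-Wunsch by dynamic programming is given below. It is an illustration, not a reference implementation: it assumes a simple scoring scheme (+1 match, -1 mismatch, -2 per gap position) instead of a substitution matrix, and it returns only one of possibly several optimal alignments.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover the alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("_"); i -= 1
        else:
            row_a.append("_"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


print(needleman_wunsch("abacd", "abcd"))
```

Smith-Waterman follows the same recurrence but clamps every cell at zero and traces back from the highest-scoring cell, which yields a local alignment.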

BLAST STEPS
1. Seeding: Prepare a list of short, fixed-length segments (words) from the query.
2. Searching: Find highly similar or exact matches for each word in the database.
3. Extension: Extend each match to a longer match.
4. Evaluation: Evaluate the results using E-values (the number of hits with an equal or better score expected by chance).
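The first three steps can be illustrated with a toy seed-and-extend search. This is a simplified sketch, not BLAST itself: it assumes exact word matches only, ungapped extension, a made-up scoring scheme (+1/-1) and a simple drop-off rule, and it omits the E-value step, which needs statistical parameters not covered here. All names (`find_hits`, `extend`, `x_drop`) are hypothetical.

```python
def find_hits(query, subject, w=3):
    """Seeding and searching: exact matches of every length-w query word in the subject."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Extension: grow a hit in both directions without gaps.

    Returns (score, query_start, subject_start, length) of the extended segment pair.
    """
    def walk(qi, sj, step):
        score = best = 0
        length = kept = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, kept = score, length
            elif best - score >= x_drop:
                break  # score has fallen too far below the best seen
            qi += step
            sj += step
        return best, kept

    right_gain, right_len = walk(i + w, j + w, +1)
    left_gain, left_len = walk(i - 1, j - 1, -1)
    total = w * match + left_gain + right_gain
    return total, i - left_len, j - left_len, left_len + w + right_len


query, subject = "MKTAYIAK", "GGMKTAYIAKGG"
hits = find_hits(query, subject)
print(max(extend(query, subject, i, j) for i, j in hits))
```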

MULTIPLE SEQUENCE ALIGNMENT
Multiple sequence alignment (MSA) can be seen as a generalization of pairwise sequence alignment. Instead of aligning just two sequences, three or more sequences are aligned simultaneously.

    a b a c d
    a b _ c d
    x b a c e

MSA is used for:
- Detection of conserved domains in a group of genes or proteins.
- Construction of a phylogenetic tree.
- Prediction of protein structure.
- Determination of consensus sequences.
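Two of the uses listed on this slide, consensus sequences and conserved positions, can be read straight off the columns of an alignment. A minimal sketch using the three example rows from the slide; it assumes a simple majority rule that ignores gaps, and the function names are illustrative.

```python
from collections import Counter


def consensus(rows, gap="_"):
    """Most common non-gap symbol in each column of an alignment."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        out.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(out)


def conserved_columns(rows, gap="_"):
    """Indices (0-based) of columns where every row has the same non-gap symbol."""
    return [i for i, column in enumerate(zip(*rows))
            if len(set(column)) == 1 and column[0] != gap]


msa = ["abacd", "ab_cd", "xbace"]
print(consensus(msa), conserved_columns(msa))
```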

CLUSTAL
A popular heuristic algorithm is CLUSTAL, by Des Higgins and Paul Sharp (1988). CLUSTAL makes a global multiple alignment using a "progressive alignment" approach.
1. It first computes all pairwise alignments and calculates the sequence similarity between each pair.
2. These similarities are used to build a rough guide tree.
3. The sequences are then aligned progressively in the order given by the guide tree, starting with the most similar pair.
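The guide-tree idea can be sketched as follows. This is a toy illustration and not the CLUSTAL implementation: it assumes equal-length sequences so that similarity can be taken as simple percent identity (CLUSTAL derives similarities from pairwise alignments), it uses average-linkage clustering as an assumption, and the sequence names are hypothetical.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy measure needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def guide_tree(seqs):
    """Join the most similar clusters first; return the tree as nested tuples."""
    clusters = [(name, [name]) for name in seqs]  # (subtree, member names)

    def similarity(members_1, members_2):
        pairs = [(x, y) for x in members_1 for y in members_2]
        return sum(identity(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: similarity(clusters[ij[0]][1], clusters[ij[1]][1]))
        (tree_1, members_1), (tree_2, members_2) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((tree_1, tree_2), members_1 + members_2))
    return clusters[0][0]


seqs = {"s1": "abacd", "s2": "abbcd", "s3": "xbace"}
print(guide_tree(seqs))
```

The innermost tuple holds the pair that would be aligned first; each outer level adds the next sequence or group to that growing alignment.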

BASIC INFORMATION COMES FROM SEQUENCE
- One sequence: gives some information, e.g. amino acid properties.
- More than one sequence: gives more information on conserved residues, fold and function.
- Multiple alignments of related sequences: can build up consensus sequences of known families, domains, motifs or sites.
- Sequence alignments can give information on loops, families and function from conserved regions.

APPLICATIONS OF PROTEIN SEQUENCING
- Recombinant protein synthesis.
- Drug production.
- Antibiotic production.
- Functional genomics.
- Determination of protein folding patterns.
- Bioinformatics.
- It plays a vital role in proteomics.
- Prediction of the final structure, function and location of a protein.
- Locating the gene that codes for the protein.
- Study of genetic diseases.
- Identification of sequence differences and variations such as point mutations.
- Revealing the evolution and genetic diversity of sequences and organisms.

THANK YOU