I-TASSER

1,706 views 31 slides Aug 11, 2017

About This Presentation

A simplified presentation on "I-TASSER"


Slide Content

I-TASSER (Iterative Threading ASSEmbly Refinement)
By: Animesh Kumar, M.Sc. (Bioinformatics), IASRI, New Delhi. Mob: 8512800911

Background: Structure modeling often involves human intervention, because expert knowledge combined with biochemical information (function, mutagenesis, catalytic residues, etc.) can help in both structural assembly and model selection. The development of fully automated algorithms allows non-experts to generate structural models for their own sequences through Internet services. I-TASSER (as 'Zhang-Server') was ranked as the No. 1 server in the CASP7 and CASP8 experiments.

The Zhang Lab On-line Service System contains:
On-line Servers - [folding, docking, design, domains, etc.; some are downloadable]
Bioinformatics Tools - [alignment, image, clustering, etc.; all are downloadable]
Databases - [ligand, GPCR, genome, decoy, potential, CASP, etc.; all are downloadable]

I. Protein Structure and Function Prediction Services (folding, threading, potential, contact, torsion, docking, etc.)
II. Bioinformatics Tools (structure alignment, sequence alignment, 3D visualization, surface, clustering, etc.)
III. Databases and Potentials

Introduction: The I-TASSER server is an Internet service for protein structure and function prediction. It allows academic users to automatically generate high-quality predictions of the 3D structure and biological function of protein molecules from their amino acid sequences. Models are built from multiple-threading alignments by LOMETS and iterative TASSER simulations. I-TASSER (as 'Zhang-Server') was ranked as the No. 1 server in the CASP7 and CASP8 experiments.

I-TASSER method: I-TASSER is a hierarchical protein structure modeling approach. It is based on Profile-Profile threading Alignment (PPA) and the iterative implementation of the Threading ASSEmbly Refinement (TASSER) program. The target sequence is first threaded through a representative PDB structure library (with a pair-wise sequence identity cut-off of 70%) to search for possible folds. This is done by four simple variants of the PPA method, which use different combinations of hidden Markov model and PSI-BLAST profiles with the Needleman-Wunsch and Smith-Waterman alignment algorithms.

The threading-aligned regions are used to reassemble full-length models, while the threading-unaligned regions (mainly loops) are built by ab initio modeling. The conformational space is searched by replica-exchange Monte Carlo simulations. The structure trajectories are clustered by SPICKER, and the cluster centroids are obtained by averaging the coordinates of all clustered structures. The fragment assembly simulation is then run again to rule out steric clashes in the centroid structures and to refine the models further.

Spatial restraints are extracted from the centroids and from the PDB structures found by the structure alignment program TM-align. Finally, the structure decoys are clustered and the lowest-energy structure in each cluster is selected; at this stage only the Cα atoms and the side-chain centers of mass are specified. Pulchra is used to add the backbone atoms (N, C, O) and Scwrl_3.0 to build the side-chain rotamers.
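The centroid step described in the method can be illustrated with a minimal Python sketch. It assumes the clustered structures are already superposed and given as equal-length lists of (x, y, z) Cα coordinates; the function name `cluster_centroid` is hypothetical and this is not SPICKER's own code.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def cluster_centroid(structures: List[List[Coord]]) -> List[Coord]:
    """Average per-residue coordinates over already-superposed structures."""
    if not structures:
        raise ValueError("need at least one structure")
    n_res = len(structures[0])
    if any(len(s) != n_res for s in structures):
        raise ValueError("all structures must have the same number of residues")
    n = len(structures)
    centroid = []
    for i in range(n_res):
        # Mean of each coordinate axis for residue i across the cluster.
        x = sum(s[i][0] for s in structures) / n
        y = sum(s[i][1] for s in structures) / n
        z = sum(s[i][2] for s in structures) / n
        centroid.append((x, y, z))
    return centroid
```

Because an averaged structure need not have physical geometry, it makes sense that the method follows this step with a second simulation that removes steric clashes.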

How does I-TASSER generate structure and function predictions?
First, after an amino acid sequence is submitted, the server tries to retrieve template proteins of similar folds (or super-secondary structures) from the PDB library using LOMETS, a locally installed meta-threading approach.
Second, if template fragments are found, they are reassembled into full-length models by replica-exchange Monte Carlo simulations, with the threading-unaligned regions (mainly loops) built by ab initio modeling. If no template is found, I-TASSER builds the whole structure by ab initio modeling. The low free-energy states are identified by SPICKER through clustering of the simulation decoys.

Third, the fragment assembly simulation is performed again, starting from the SPICKER cluster centroids. The spatial restraints collected from both the LOMETS templates and the PDB structures found by TM-align are used to guide this simulation. The purpose of the second iteration is to remove steric clashes and to refine the global topology of the cluster centroids. The decoys generated in the second simulation are then clustered and the lowest-energy structures are selected. The final full-atomic models are obtained by REMO, which builds the atomic details from the selected I-TASSER decoys by optimizing the hydrogen-bonding network.
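The replica-exchange Monte Carlo search used in these steps relies on occasionally swapping conformations between replicas that run at different temperatures. The sketch below shows only the textbook swap acceptance rule, in units where the Boltzmann constant is 1. The slides do not give I-TASSER's energy function or temperature ladder, so the function names and numbers are illustrative assumptions.

```python
import math
import random


def swap_probability(e_i: float, t_i: float, e_j: float, t_j: float) -> float:
    """Standard replica-exchange acceptance probability for swapping
    the conformations of replica i (energy e_i, temperature t_i)
    and replica j (energy e_j, temperature t_j)."""
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    # min(1, exp(delta)), written to avoid overflow for large delta.
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(e_i: float, t_i: float, e_j: float, t_j: float,
                 rng: random.Random) -> bool:
    """Return True if the swap is accepted for one random draw."""
    return rng.random() < swap_probability(e_i, t_i, e_j, t_j)
```

A swap that moves the lower-energy conformation to the colder replica is always accepted; the reverse swap is accepted only with a Boltzmann-weighted probability.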

[Figure labels: Binding sites, LOMETS]

If any region of more than 80 residues has no aligned residues in at least two strong PPA alignments (Z-score above the cutoff value Z), the target is judged to be a multiple-domain protein, and domain boundaries are assigned automatically at the borders of the large gaps. I-TASSER simulations are then run for the full chain as well as for the separate domains. The final full-length models are generated by docking the domain models together. The domain docking is performed by a quick Metropolis Monte Carlo simulation in which the energy is defined as the RMSD of the domain models to the full-chain model plus the reciprocal of the number of steric clashes between domains. The goal of the docking is to find the domain orientation that is closest to the I-TASSER full-chain model while having the minimum number of steric clashes. This procedure does not affect multiple-domain proteins in which all domains are completely aligned by the PPAs.
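The gap rule for detecting multiple-domain targets can be sketched as follows. The sketch assumes each strong alignment is given as a list of booleans, one per query residue (True means the residue is aligned), and it reads "at least two alignments" as "some pair of alignments that both leave the region unaligned". The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def unaligned_runs(mask: Sequence[bool], min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for every run of
    unaligned residues that is longer than min_len."""
    runs = []
    start = None
    # A trailing True acts as a sentinel that closes a run at the chain end.
    for i, aligned in enumerate(list(mask) + [True]):
        if not aligned and start is None:
            start = i
        elif aligned and start is not None:
            if i - start > min_len:
                runs.append((start, i))
            start = None
    return runs


def large_gaps(strong_alignments: List[Sequence[bool]],
               min_len: int = 80) -> List[Tuple[int, int]]:
    """Regions longer than min_len that are unaligned in at least two
    of the given strong alignments."""
    found = set()
    for a, b in combinations(strong_alignments, 2):
        # A residue counts as covered if either alignment of the pair aligns it.
        covered = [x or y for x, y in zip(a, b)]
        found.update(unaligned_runs(covered, min_len))
    return sorted(found)
```

A non-empty result would flag the target as a multiple-domain candidate, with the borders of each returned region serving as tentative domain boundaries.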

Server setting
Project name: I-TASSER server
Home page: http://zhanglab.ccmb.med.umich.edu/I-TASSER/
I-TASSER Standalone Package (Version 4.0)
Operating system(s): Windows, Linux, Mac
Programming language: Perl, Fortran77
License: GPL
Input: Amino acid sequence of the protein (10–1,500 residues, in FASTA format)
Output: An email sent to the user, which includes the PDB-format files of up to 5 predicted models, the C-score of the models, the predicted RMSD and TM-score of the first model, and a brief explanation of the RMSD, TM-score, and C-score.
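Before submitting, the input constraint listed above (a FASTA sequence of 10 to 1,500 residues) can be checked locally. This is a minimal sketch for a single-sequence FASTA text; the function names are hypothetical and the server may apply further checks that the slides do not describe.

```python
def parse_fasta_sequence(text: str) -> str:
    """Join the sequence lines of a single-record FASTA text,
    skipping the '>' header line and blank lines."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    seq_lines = [ln for ln in lines if ln and not ln.startswith(">")]
    return "".join(seq_lines).upper()


def length_ok(seq: str, min_len: int = 10, max_len: int = 1500) -> bool:
    """True if the sequence length is within the accepted range."""
    return min_len <= len(seq) <= max_len
```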

URL : http://zhanglab.ccmb.med.umich.edu/I-TASSER/

Running:
Log on to the I-TASSER web page.
Copy and paste, or directly upload, the amino acid sequence into the box provided. Also provide an e-mail address and a name for the job.
There are also options to specify inter-residue contact or distance restraints and to exclude some templates.
To submit, click on "Run I-TASSER".
Check the status of the submitted job by visiting the I-TASSER queue page. Click on search and provide the job ID number to find the submitted job.
After the structure modeling is finished, a notification e-mail containing an image of the predicted structure and a web link is sent. Click the link to view and download the results.

Structure analysis
1. Predicted secondary structure: Displayed as H for α-helix, S for β-strand and C for coil. Also consider the confidence score for each residue. Look for long stretches of regular secondary structure to estimate the core region of the protein.
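Long stretches of predicted secondary structure can be located with a few lines of Python. The sketch assumes the prediction is available as a string over the letters H, S and C as described above; the minimum stretch length of 5 is an arbitrary illustrative choice, not a value from I-TASSER.

```python
from itertools import groupby
from typing import List, Tuple


def long_segments(ss: str, min_len: int = 5) -> List[Tuple[str, int, int]]:
    """Return (state, start, end) for each helix (H) or strand (S) run
    of at least min_len residues; end is exclusive, indices are 0-based."""
    segments = []
    pos = 0
    for state, group in groupby(ss):
        n = len(list(group))
        if state in "HS" and n >= min_len:
            segments.append((state, pos, pos + n))
        pos += n
    return segments
```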

2. Predicted solvent accessibility: Predicts the buried and exposed regions of the query. Values range from 0 for a buried residue to 9 for a highly exposed residue. Regions with solvent-exposed, hydrophilic residues are potential hydration or functional sites.
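As a rough illustration of how the accessibility string can be combined with the sequence, the sketch below lists positions that are both exposed and hydrophilic. The exposure cutoff of 6 and the set of residues treated as hydrophilic are assumptions made for this example only; the slides give neither.

```python
from typing import List

# Assumed set of polar and charged residues (one-letter codes).
HYDROPHILIC = set("DEKRNQSTH")


def exposed_hydrophilic(seq: str, acc: str, min_exposure: int = 6) -> List[int]:
    """0-based positions whose accessibility digit (0 buried .. 9 exposed)
    is at least min_exposure and whose residue is in HYDROPHILIC."""
    if len(seq) != len(acc):
        raise ValueError("sequence and accessibility strings differ in length")
    return [
        i
        for i, (aa, a) in enumerate(zip(seq, acc))
        if int(a) >= min_exposure and aa in HYDROPHILIC
    ]
```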

3. Top 5 models predicted by I-TASSER: The predicted tertiary structure of the protein is shown in an interactive Jmol applet. Left-click to change the appearance of the displayed structure (style, zoom, select, etc.).
C-score: Used to assess the quality of the predicted model. Its range is -5 to 2, and a higher score reflects a model of better quality.

TM-score and RMSD: Standard measures of structural similarity between two structures, used to measure the accuracy of structure modeling when the native structure is known. A TM-score > 0.5 indicates a model of correct topology, and a TM-score < 0.17 means random similarity.
4. Top 10 templates used by I-TASSER: Analyze the sequence identity in the threading-aligned region and over the whole chain to assess the homology between query and template.
Z-score: Used to assess the quality of a threading alignment. A Z-score > 1 reflects a confident alignment, and the template is most likely to have the same fold as the query protein.

Sequence identity 1 is the identity in the threading-aligned region and sequence identity 2 is the identity over the whole chain; both are used to assess the homology between query and template. A high sequence identity 2 reflects an evolutionary relationship between the query and the template. Colored residues show the conserved residues or motifs in the query and template proteins. A higher sequence identity in the threading-aligned region than in the whole-chain alignment indicates the presence of a conserved motif or domain in the sequence. Assess the coverage of the alignment by inspecting it. If the coverage of the top alignment is low and confined to a small region of the query protein, a long segment of the query is left unaligned, which indicates that the query protein contains more than one domain. In this case, it is suggested to split the sequence and model the domains individually.
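The TM-score and RMSD measures used to judge the models can be made concrete with a small sketch. It uses the published TM-score form, the sum over aligned residues of 1 / (1 + (d / d0)^2) divided by the target length L, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The sketch takes the per-residue distances as given; the real TM-score also searches for the superposition that maximizes the score, which is omitted here. Function names are hypothetical.

```python
import math
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale of the TM-score, in angstroms."""
    if target_length <= 15:
        raise ValueError("target too short for this d0 formula")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for this d0 formula")
    return d0


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition, from the distances (angstroms)
    between aligned residue pairs. Unaligned residues contribute zero."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances: Sequence[float]) -> float:
    """Root-mean-square deviation over the aligned residue pairs."""
    if not distances:
        raise ValueError("no aligned residues")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def interpret_tm(score: float) -> str:
    """Apply the thresholds quoted in the slides."""
    if score > 0.5:
        return "correct topology"
    if score < 0.17:
        return "random similarity"
    return "intermediate"
```

Unlike RMSD, the TM-score is normalized by the target length and down-weights large deviations, so a few badly placed loops do not dominate the score.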

5. Structural analogs in PDB and Enzyme Commission number prediction: Lists the top 10 structural analogs of the first predicted model, as identified by the structure alignment program TM-align. A TM-score > 0.5 indicates that the detected analog and the model have a similar topology, which can be used to determine the structural class or protein family of the query sequence. A TM-score < 0.3 signifies random structural similarity. Analyze the sequence identity and RMSD to assess the conservation of spatial motifs between the model and its structural analogs (see colored portion).

Function prediction using COFACTOR
EC number: Gives the potential enzyme homologs of the query protein. The confidence level is shown as the EC score. Cscore EC is the confidence score for the Enzyme Classification (EC) number prediction. Cscore EC values range between 0 and 1, and a higher score indicates a more reliable EC number prediction.
RMSDa is the RMSD between residues that are structurally aligned by TM-align, while IDENa is the percentage sequence identity in the structurally aligned region. Cov. represents the coverage of the global structural alignment and is equal to the number of structurally aligned residues divided by the length of the query protein.
If the EC score is low, or there is a lack of consensus among the identified hits, the prediction becomes less reliable, and the gene ontology prediction should be consulted instead.
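The coverage and identity columns defined above follow directly from the structural alignment. The sketch below assumes the alignment is given as a list of (query residue, template residue) one-letter pairs for the structurally aligned positions; the function name is hypothetical, and values are returned as fractions rather than percentages.

```python
from typing import Dict, List, Tuple


def alignment_stats(aligned_pairs: List[Tuple[str, str]],
                    query_length: int) -> Dict[str, float]:
    """Coverage = aligned residues / query length.
    Identity = identical pairs / aligned residues."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    n_aligned = len(aligned_pairs)
    n_identical = sum(1 for q, t in aligned_pairs if q == t)
    identity = n_identical / n_aligned if n_aligned else 0.0
    return {"coverage": n_aligned / query_length, "identity": identity}
```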

Gene ontology (GO) term and protein-ligand binding site prediction: CscoreGO is a combined measure for evaluating the global and local similarity between the query and the template protein. CscoreGO values range between 0 and 1, and a higher value indicates better confidence in predicting the function using the template. Each protein is associated with multiple GO terms describing its molecular function, biological process and cellular location (each term is linked to the respective AmiGO website page and lineage). Analyze the Fh-score (functional homology score) column to assess the functional homology between query and template. A GO score > 0.5 indicates a reliable prediction.

7. Template proteins with similar binding site: Templates are ranked by the number of predicted ligand conformations that share a common binding pocket. The best identified ligands are already displayed in the Jmol applet. Click on the other radio buttons to visualize the predicted binding site and the ligand-interacting residues.
CscoreLB is the confidence score of the predicted binding site. Its values range between 0 and 1, and a higher score indicates a more reliable ligand-binding site prediction.
BS-score is a measure of local similarity (sequence and structure) between the template binding site and the predicted binding site in the query structure. A BS-score > 1 reflects a significant local match (structural similarity) between the predicted and template binding sites.

Advantage: The main advantage over existing structure modeling methods is the inherent structure fragment assembly approach, which consistently drives the threading alignments closer to the native state.

Thank you