prediction methods for ORF

karamveer37 8,099 views 21 slides Nov 09, 2015

About This Presentation

Prediction methods for ORFs: homology-based and ab initio methods. This presentation is devoted entirely to prediction methods for ORFs.


Slide Content

METHODS FOR ORF PREDICTION
BY: KARAMVEER, M.Sc. LIFE SCIENCES WITH SPECIALISATION IN BIOINFORMATICS (2015-17)
WELCOME

What is gene prediction?
• From a genomic DNA sequence, we want to predict the regions that encode a protein: the genes.
• Gene finding is about detecting these coding regions and inferring the gene structure, starting from genomic DNA sequences.
• An open reading frame (ORF) is the part of a reading frame that has the potential to code for a protein or peptide. An ORF is a continuous stretch of codons that does not contain a stop codon.
• We need to distinguish coding from non-coding regions using properties specific to each type of DNA region.
• Gene finding is not an easy task!
• DNA sequence signals have low information content.
• It is difficult to discriminate real signals from noise (degenerate and highly unspecific signals).
• Gene structure can be complex (sparse exons, alternative splicing, ...).
• DNA signals may vary between organisms.

Gene components

Identifying ORFs
• A simple first step in gene finding.
• Translate the genomic sequence in all six reading frames.
• Identify the stop codons in each frame.
• Regions without stop codons are called "open reading frames" or ORFs.
• Locate and tag all of the likely ORFs in a sequence.
• The longest ORF starting from a methionine codon is a good prediction of a protein-coding sequence.
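The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the algorithm of any particular tool: it assumes the standard stop codons (TAA, TAG, TGA), ATG as the only start codon, and an uppercase or lowercase A/C/G/T sequence. The function names and the min_codons threshold are hypothetical choices made for this example.

```python
# Minimal six-frame ORF scan. All names and the min_codons default are
# illustrative assumptions, not taken from the original slides.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset, min_codons):
    """Yield (start, end) of ATG...stop stretches in one reading frame."""
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START_CODON:
            start = i  # first methionine codon opens a candidate ORF
        elif start is not None and codon in STOP_CODONS:
            if (i - start) // 3 >= min_codons:
                yield start, i + 3  # end index includes the stop codon
            start = None


def find_orfs(seq, min_codons=2):
    """Scan all six frames; return (strand, frame, start, end, dna) tuples.

    Coordinates on the "-" strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(s, offset, min_codons):
                results.append((strand, offset + 1, start, end, s[start:end]))
    return results


def longest_orf(seq, min_codons=2):
    """Return the longest ORF found, or None if there is none."""
    orfs = find_orfs(seq, min_codons)
    return max(orfs, key=lambda o: o[3] - o[2]) if orfs else None
```

Calling longest_orf on a sequence returns the single best candidate, while find_orfs returns every candidate so that each one can be tagged and inspected.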

NCBI ORF Finder
The ORF Finder is a graphical analysis tool that finds all open reading frames of a selectable minimum size in a user's sequence or in a sequence already in the database. The tool identifies all open reading frames using the standard or alternative genetic codes. The deduced amino acid sequence can be saved in various formats and searched against the sequence database using the BLAST server. The ORF Finder should be helpful in preparing complete and accurate sequence submissions.

Current gene prediction methods

Homology-based method
• Based on the sequence similarity of the query sequence to annotated genes present in databases.
• Given a database of sequences from other organisms, search for the query sequence in this database.
• Identify the database sequences (known genes) that resemble the query sequence.
• If the identified sequences are genes, the query sequence is probably (putatively) a gene.
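The idea of the search can be illustrated with a toy example. The sketch below is not BLAST and performs no gapped alignment: it simply slides the shorter sequence along the longer one and reports the best ungapped percent identity. The function names, the dictionary-based "database", and the 70% cutoff are all assumptions made for illustration.

```python
# Toy homology search by best ungapped percent identity.
# Names and the min_identity default are illustrative assumptions.

def percent_identity(a, b):
    """Percent of matching positions between two equal-length strings."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_ungapped_identity(query, subject):
    """Slide the shorter sequence along the longer; return the best identity."""
    if not query or not subject:
        return 0.0
    if len(query) > len(subject):
        query, subject = subject, query
    n = len(query)
    return max(percent_identity(query, subject[i:i + n])
               for i in range(len(subject) - n + 1))


def homology_hits(query, database, min_identity=70.0):
    """Return (name, identity) for entries resembling the query, best first."""
    scored = [(name, best_ungapped_identity(query, seq))
              for name, seq in database.items()]
    hits = [(name, ident) for name, ident in scored if ident >= min_identity]
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

If a hit returned by homology_hits is an annotated gene, the query would be labelled a putative gene. Real tools replace this exhaustive window scan with heuristics and statistical scoring.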

BLAST
• Basic Local Alignment Search Tool.
• The best-known search tool in this category.
• Strengths: able to identify biologically relevant genes with good accuracy.
• Weakness: cannot identify genes that code for proteins not present in the database.
• Only about 50% of genes can be found by homology to other known genes or proteins.

Homology methods: Genewise
• Uses HMMs to compare DNA sequences to protein sequences at the level of the conceptual translation, regardless of sequencing errors and introns.
• Principle:
• The exon model used in Genewise is an HMM with 3 base states (match, insert, delete), with additional transitions between states to account for frame-shifts.
• Intron states have been added to the base model.
• Genewise directly compares HMM profiles of proteins or domains to the gene structure HMM.
• Genewise is a powerful tool, but time consuming.
• Requires strong similarity (>70% identity) to produce good predictions.
• Genewise is part of the Wise2 package: http://www.ebi.ac.uk/Wise2/.

Ab initio method
• Computational prediction that uses only the most elementary information.
• Can predict both eukaryotic and prokaryotic genes.
• Predicts genes based on the given sequence alone.
• It relies on two major features associated with genes:
• Gene signals
• Gene content
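Gene content refers to statistical properties of the sequence itself, such as base composition and the frequencies of short words. As a rough illustration, the sketch below computes two such statistics, GC content and hexamer counts, for a DNA string. The function names are hypothetical, and a real gene finder would compare these statistics against models trained on known coding and non-coding regions.

```python
from collections import Counter

# Two simple "gene content" statistics. Function names are illustrative
# assumptions; no trained coding/non-coding model is included here.

def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def hexamer_counts(seq):
    """Count every overlapping 6-base word in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 6] for i in range(len(seq) - 5))
```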

Methods for signal detection
• Hidden Markov Models (HMMs):
• HMMs use a probabilistic framework to infer the probability that a sequence corresponds to a real signal.
• Neural Networks (NNs):
• NNs are trained with positive and negative examples. NNs "discover" the features that distinguish the two sets.
• The gene structure information is separated into several classes of features, such as hexamer frequencies, splice sites, and GC composition.
• Example: an NN for acceptor sites, the perceptron (Horton and Kanehisa, 1992).
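To make the idea of training on positive and negative examples concrete, here is a minimal single-layer perceptron on one-hot encoded DNA windows. It is a toy sketch and not the Horton and Kanehisa model: the six-base windows, the assumption that true sites carry "AG" at positions 4 and 5, the training data, and all function names are invented for illustration.

```python
# Toy perceptron for a splice-acceptor-like signal. The training windows
# and every name below are illustrative assumptions.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a flat 0/1 list, four entries per base."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def predict(weights, bias, seq):
    """Return 1 (signal) if the weighted sum is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, one_hot(seq))) + bias
    return 1 if score > 0 else 0


def train_perceptron(examples, epochs=1000, lr=1.0):
    """Train on (sequence, label) pairs; label is 1 (site) or 0 (non-site)."""
    weights = [0.0] * len(one_hot(examples[0][0]))
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for seq, label in examples:
            delta = label - predict(weights, bias, seq)
            if delta != 0:  # misclassified: nudge weights toward the label
                errors += 1
                weights = [w + lr * delta * x
                           for w, x in zip(weights, one_hot(seq))]
                bias += lr * delta
        if errors == 0:  # every training example is classified correctly
            break
    return weights, bias


# Positives carry "AG" at positions 4-5 (1-based); negatives do not.
TOY_EXAMPLES = [
    ("TTCAGG", 1), ("CTTAGA", 1), ("TCCAGG", 1), ("CCTAGG", 1),
    ("TTCACG", 0), ("CTTGGA", 0), ("ACGTAC", 0), ("GGATCC", 0),
]
```

Because this toy set is linearly separable, train_perceptron(TOY_EXAMPLES) stops once all eight windows are classified correctly. Real splice-site data are much noisier, which is why larger networks and additional feature classes are used in practice.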

Ab initio methods: GRAIL
• Neural network recognizing coding potential
• Incorporates genomic context information (splice junctions, start and stop codons, poly-A signals)
• Not appropriate for sequences without genomic context
• http://compbio.ornl.gov
• Human, Mouse, Drosophila, Arabidopsis, and E. coli

Performance Evaluation
The accuracy of a prediction program can be evaluated using parameters such as sensitivity and specificity. To describe sensitivity and specificity precisely, four quantities are used:
• true positive (TP), a correctly predicted feature;
• false positive (FP), an incorrectly predicted feature;
• false negative (FN), a missed feature;
• true negative (TN), the correctly predicted absence of a feature.
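The slide does not give the formulas, so the sketch below states them under an assumption about convention. Sensitivity is TP / (TP + FN). Specificity is commonly defined in gene-prediction evaluations as TP / (TP + FP), which other fields call precision, whereas the general statistical definition is TN / (TN + FP). Both variants are shown, and the function names are hypothetical.

```python
# Sensitivity and two common definitions of specificity.
# Which specificity convention applies is an assumption; the slides
# do not state a formula.

def sensitivity(tp, fn):
    """Fraction of real features that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity_gene_finding(tp, fp):
    """Fraction of predictions that are real: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def specificity_statistical(tn, fp):
    """Fraction of true absences correctly called: TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else 0.0
```

For example, a program that finds 80 of 100 real exons and makes 20 false predictions has a sensitivity of 0.8 and, under the gene-finding convention, a specificity of 0.8.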

Conclusion

Reference
https://www.google.co.in/search?q=gene+components&biw=1366&bih=623&source=lnms&tbm=isch&sa=X&sqi=2&ved=0CAYQ_AUoAWoVChMIld-fy7_4yAIVwh-UCh1dfwEb#tbm=isch&q=rbs+in+prokaryotic+gene&imgrc=p4VQkhXIIG_DsM%3A
http://www.aun.edu.eg/molecular_biology/Procedure%20Bioinformatics22.23-4-2015/Xiong%20-%20Essential%20Bioinformatics%20send%20by%20Amira.pdf

Thank You…