BIOINFORMATICS: Gene Prediction Programs
GRAIL, AUGUSTUS, GENSCAN, HMMgene, and MZEF, with their features, algorithm, input, output, and species
Added: Sep 29, 2021
Slides: 10 pages
Gene Prediction Programs
Submitted by: Mugdha Sharma
Roll No.: 2185026
MSc. Biotechnology, 3rd Semester
Submitted To: Ms. Ruchi Sachdeva
INDEX
Gene Prediction Programs: Introduction
Five Gene Prediction Programs: GRAIL, AUGUSTUS, GENSCAN, HMMGENE, MZEF
Gene Prediction Programs
Gene prediction programs fall into three categories: ab initio-based, homology-based, and consensus-based programs.

Ab initio-based programs
GOAL: To discriminate exons from noncoding sequences and subsequently join the exons together in the correct order.
THE ALGORITHMS RELY ON:
Gene signals: gene start and stop sites, putative splice sites, and poly-A sites.
Gene content: coding statistics, including nonrandom nucleotide distribution, amino acid distribution, synonymous codon usage, and hexamer frequencies.
Ab initio programs make use of neural networks, HMMs, and discriminant analysis.

Homology-based programs
These are based on the fact that exon structures and exon sequences of related species are highly conserved. Coding frames in a query sequence are translated and aligned with the closest protein homologs found in databases; near-perfectly matched regions can then be used to reveal the exon boundaries in the query.
DRAWBACK: Reliance on the presence of a homolog in the database.
Example: GenomeScan = GENSCAN prediction + BLASTX

Consensus-based programs
These programs work by retaining the common predictions agreed upon by most programs and removing inconsistent predictions.
Examples:
GeneComber = HMMgene + GENSCAN predictions
DIGIT = FGENESH + GENSCAN + HMMgene
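The consensus idea can be illustrated with a minimal Python sketch. It assumes each program's output has already been reduced to a list of exons written as (start, end, strand) tuples, and it keeps an exon only if at least a chosen number of programs predicted exactly that exon. The program names and coordinates are hypothetical, and real consensus tools such as GeneComber and DIGIT combine predictions in more elaborate ways than this simple vote.

```python
from collections import Counter

# Assumed representation for illustration: an exon is (start, end, strand).
Exon = tuple[int, int, str]


def consensus_exons(predictions: dict[str, list[Exon]], min_votes: int) -> list[Exon]:
    """Keep exons predicted identically by at least `min_votes` programs."""
    votes: Counter = Counter()
    for exons in predictions.values():
        # set() so a program that lists the same exon twice still votes only once
        votes.update(set(exons))
    return sorted(exon for exon, n in votes.items() if n >= min_votes)


# Hypothetical outputs of three programs run on the same sequence.
predictions = {
    "program_a": [(100, 250, "+"), (400, 520, "+"), (900, 1010, "+")],
    "program_b": [(100, 250, "+"), (400, 520, "+")],
    "program_c": [(100, 250, "+"), (405, 520, "+"), (900, 1010, "+")],
}

# Majority rule for three programs: an exon needs at least 2 votes.
print(consensus_exons(predictions, min_votes=2))
# [(100, 250, '+'), (400, 520, '+'), (900, 1010, '+')]
```

Requiring exact coordinate matches is the strictest choice; a looser variant could treat overlapping exons as agreeing, at the cost of having to decide which boundaries to report.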
GRAIL (Gene Recognition and Assembly Internet Link)
It is a web-based gene prediction program. (A different tool that shares the acronym, GRAIL: Gene Relationships Across Implicated Loci, is the one that examines relationships between genes in different disease-associated loci and takes SNPs or genomic regions as input.)
FEATURES: The program is trained on several statistical features, such as splice junctions, start and stop codons, poly-A sites, promoters, and CpG islands.
ALGORITHM: Based on a neural network algorithm. The query sequence is scanned with windows of variable length and scored for coding potential.
INPUT: Genomic DNA sequence
OUTPUT: Exon candidates (predicted exons with their positions, reading frames, and scores)
SPECIES: Human, mouse, Arabidopsis, Drosophila, and Escherichia coli sequences
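GRAIL scans a query with sequence windows and scores them for coding potential. One gene-content statistic that such a score can use is hexamer frequency, and the Python sketch below shows only that piece. It is a simplification under stated assumptions, not GRAIL's method: it counts overlapping hexamers in small hypothetical coding and noncoding training sets, scores each fixed-length window by the summed log-likelihood ratio of its hexamers (with pseudocounts), and ignores reading frame. GRAIL itself combines several such features in a neural network.

```python
import math
from collections import Counter

K = 6             # hexamers
N_KMERS = 4 ** K  # number of possible hexamers, used for pseudocounts


def hexamer_counts(seqs: list[str]) -> Counter:
    """Count overlapping hexamers across a set of training sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - K + 1):
            counts[s[i:i + K]] += 1
    return counts


def hexamer_score(window: str, coding: Counter, noncoding: Counter, pseudo: float = 1.0) -> float:
    """Sum of log(P(hexamer | coding) / P(hexamer | noncoding)) over the window.

    Positive scores favour coding, negative scores favour noncoding.
    Reading frame is ignored in this simplified version.
    """
    total_c = sum(coding.values()) + pseudo * N_KMERS
    total_n = sum(noncoding.values()) + pseudo * N_KMERS
    score = 0.0
    for i in range(len(window) - K + 1):
        h = window[i:i + K]
        p_c = (coding[h] + pseudo) / total_c
        p_n = (noncoding[h] + pseudo) / total_n
        score += math.log(p_c / p_n)
    return score


def scan(seq: str, coding: Counter, noncoding: Counter, window: int = 30, step: int = 10):
    """Score fixed-length windows along seq; returns (0-based start, score) pairs."""
    return [
        (start, hexamer_score(seq[start:start + window], coding, noncoding))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical toy training data (not real gene sequences).
coding_train = ["ATG" + "GCT" * 9 + "TAA"]
noncoding_train = ["AT" * 10 + "TTTTAAAA" + "AT" * 5]
cod = hexamer_counts(coding_train)
non = hexamer_counts(noncoding_train)

for start, s in scan(coding_train[0] + noncoding_train[0], cod, non):
    print(start, round(s, 2))
```

A real implementation would use frame-specific (in-frame) hexamer statistics trained on large sets of known genes; the toy data here only makes the sign of the score easy to follow.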
AUGUSTUS
It is a program that predicts genes in eukaryotic genomic sequences.
FEATURES:
It can report a large number of alternative genes, including probabilities for each transcript and for each exon and intron.
It can predict alternative splicing and alternative transcripts.
It can predict the 5’UTR and 3’UTR, including introns within the UTRs.
It can be run ab initio, and it has a flexible mechanism for incorporating extrinsic information, e.g., from EST alignments and protein alignments.
ALGORITHM: Based on a Generalized Hidden Markov Model (GHMM)
INPUT: The user can upload sequences in FASTA format or paste them into a web form. The total length of the sequences submitted to the server can be at most 3 million base pairs.
OUTPUT: Exon, intron, transcript, and gene boundaries in the common General Feature Format (GFF), as well as predicted amino acid sequences and predicted coding sequences in FASTA format.
SPECIES: Homo sapiens, Drosophila melanogaster, Arabidopsis thaliana, Brugia malayi, Tribolium castaneum
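Because AUGUSTUS reports its predictions in GFF, downstream analysis usually starts by parsing those tab-separated lines. The Python sketch below reads the nine standard GFF columns and sums the CDS length of each predicted transcript. The example lines are hypothetical, and the attribute parsing assumes GTF-style attributes such as transcript_id "g1.t1"; other GFF variants (for example the ID=/Parent= syntax of GFF3) would need a different attribute parser.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Optional


def parse_gff(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per GFF feature line, skipping blank and comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a nine-column feature line
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        yield {
            "seqid": seqid, "source": source, "type": ftype,
            "start": int(start), "end": int(end), "score": score,
            "strand": strand, "phase": phase, "attributes": attrs,
        }


def transcript_id(attrs: str) -> Optional[str]:
    """Extract the transcript_id value from a GTF-style attribute column."""
    # ASSUMPTION: attributes look like: transcript_id "g1.t1"; gene_id "g1";
    for field in attrs.split(";"):
        key, _, value = field.strip().partition(" ")
        if key == "transcript_id":
            return value.strip('"')
    return None


def cds_length_by_transcript(features: Iterable[dict]) -> dict:
    """Total CDS length per transcript; GFF coordinates are 1-based and inclusive."""
    lengths: dict = defaultdict(int)
    for f in features:
        if f["type"] == "CDS":
            lengths[transcript_id(f["attributes"])] += f["end"] - f["start"] + 1
    return dict(lengths)


# Hypothetical AUGUSTUS-style lines, written for illustration only.
example = [
    "# predicted genes for sequence seq1",
    "seq1\tAUGUSTUS\tgene\t101\t901\t0.9\t+\t.\tg1",
    'seq1\tAUGUSTUS\tCDS\t101\t250\t0.95\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    'seq1\tAUGUSTUS\tCDS\t401\t901\t0.88\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
]

print(cds_length_by_transcript(parse_gff(example)))  # {'g1.t1': 651}
```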
GENSCAN
It is used to identify complete gene structures in genomic DNA.
FEATURES:
It can predict the location of genes and their exon-intron boundaries in genomic sequences from a variety of organisms.
It can predict multiple genes and can deal with partial as well as complete genes.
It can predict consistent sets of genes occurring on either or both strands of the DNA.
ALGORITHM: Based on a GHMM
INPUT: Genomic DNA sequence
OUTPUT: Predicted peptides; predicted genes and exons together with their corresponding predicted sequences; the location and DNA strand of each predicted exon.
SPECIES: Vertebrates, Arabidopsis, maize
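A predictor such as GENSCAN reports exon coordinates and the strand of each exon, and the predicted peptide follows from splicing those exons together and translating them. The Python sketch below shows only that last step, not GENSCAN's GHMM decoding. It assumes 1-based inclusive coordinates and the standard genetic code; the function names and the toy sequence are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def splice_and_translate(genome: str, exons: list[tuple[int, int]], strand: str) -> str:
    """Join exons (1-based, inclusive) in genomic order and translate to a peptide.

    For the minus strand, the joined sequence is reverse-complemented, which
    also puts the exons into transcript order.
    """
    cds = "".join(genome[start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        cds = reverse_complement(cds)
    peptide = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons containing N, etc.
        if aa == "*":
            break  # stop codon ends the peptide
        peptide.append(aa)
    return "".join(peptide)


# Toy example: exon 1 at 3-8, an intron (GT...AG) at 9-20, exon 2 at 21-26.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "AAATAA"
genome = "CC" + exon1 + intron + exon2 + "GG"
print(splice_and_translate(genome, [(3, 8), (21, 26)], "+"))  # MAK
```

The same gene placed on the opposite strand translates identically once its mirrored coordinates and strand "-" are given, which is why a predictor only needs to report coordinates plus strand for each exon.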
HMMGENE
It is an HMM-based web program for predicting genes in anonymous DNA. The program predicts whole genes, so the predicted exons always splice correctly.
FEATURES: The unique feature of the program is that it uses a criterion called conditional maximum likelihood to discriminate coding from noncoding features.
ALGORITHM: The program is a hybrid algorithm that uses both ab initio-based and homology-based criteria: subregions already identified as coding (e.g., by similarity to cDNAs or proteins in a database) can be locked as coding, and the HMM prediction is extended from these locked regions to the rest of the gene.
INPUT: DNA sequences
OUTPUT: Predictions of partial or complete genes in the sequences
SPECIES: Vertebrates, C. elegans
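HMM-based predictors such as HMMgene assign a hidden state to every position of a sequence and decode the most probable state path. The Python sketch below shows that decoding step, the Viterbi algorithm, on a hypothetical two-state model (N = noncoding, C = coding) with made-up, GC-biased emission probabilities. HMMgene's real gene model has many more states and is trained with conditional maximum likelihood, which this sketch does not attempt.

```python
import math

# Hypothetical toy model; all probabilities are made up for illustration.
STATES = ("N", "C")  # N = noncoding, C = coding
START = {"N": 0.9, "C": 0.1}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.35, "T": 0.35, "C": 0.15, "G": 0.15},
    "C": {"A": 0.15, "T": 0.15, "C": 0.35, "G": 0.35},
}


def viterbi(seq: str) -> str:
    """Return the most probable state path for seq under the toy HMM (log space)."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best previous state for state s at position i + 1
    for base in seq[1:]:
        ptr, new = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            ptr[s] = prev
            new[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


seq = "AT" * 5 + "GC" * 8 + "AT" * 5
print(seq)
print(viterbi(seq))  # NNNNNNNNNNCCCCCCCCCCCCCCCCNNNNNNNNNN
```

Working in log space avoids numerical underflow on long genomic sequences, where multiplying thousands of probabilities would round to zero.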
MZEF (Michael Zhang’s Exon Finder)
It was developed to help identify internal coding exons in human genomic DNA sequences.
ALGORITHM: Quadratic discriminant analysis (QDA) for exon prediction
INPUT: DNA sequence
OUTPUT: Predicted internal coding regions in the genomic DNA sequence
SPECIES: Human, mouse, Arabidopsis
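Quadratic discriminant analysis fits a Gaussian with its own mean and covariance to each class and assigns a candidate to the class with the larger discriminant score, log prior − ½ log|Σ| − ½ (x − μ)ᵀ Σ⁻¹ (x − μ). The Python sketch below implements that rule for two features. The features, class names, and training numbers are synthetic placeholders, not MZEF's actual exon features or trained parameters.

```python
import math

Point = tuple[float, float]


def mean_cov(points: list[Point]):
    """Sample mean and 2x2 sample covariance of a list of (x, y) feature vectors."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def qda_score(x: Point, mean: Point, cov, prior: float) -> float:
    """Quadratic discriminant: log prior - 0.5 log|S| - 0.5 (x-m)^T S^-1 (x-m)."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # For symmetric S = [[a, b], [b, d]], S^-1 = (1/det) [[d, -b], [-b, a]].
    maha = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha


def train(classes: dict[str, list[Point]]) -> dict:
    """Fit one Gaussian per class; priors are the class frequencies."""
    total = sum(len(pts) for pts in classes.values())
    return {label: (*mean_cov(pts), len(pts) / total) for label, pts in classes.items()}


def classify(x: Point, model: dict) -> str:
    """Pick the class whose quadratic discriminant score is largest."""
    return max(model, key=lambda label: qda_score(x, *model[label]))


# Synthetic training data: two hypothetical features per candidate region.
model = train({
    "exon": [(2.0, 1.5), (2.5, 2.0), (3.0, 1.8), (2.2, 2.4), (2.8, 1.2)],
    "non-exon": [(-1.0, -0.5), (-0.5, 0.2), (0.0, -1.0), (-1.5, 0.5), (-0.8, -0.2)],
})

print(classify((2.5, 1.8), model))    # exon
print(classify((-0.8, -0.3), model))  # non-exon
```

Because each class keeps its own covariance matrix, the decision boundary is quadratic rather than linear; with a shared covariance the same procedure would reduce to linear discriminant analysis.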