Comparative genomics

prateekkumar100 8,000 views 19 slides Apr 27, 2016

About This Presentation

An introduction to comparative genomics, covering the key terms and the main databases.


Slide Content

Topic: COGs and Comparative Genomics
Durdam, M.Sc. Bioinformatics, sem-2

Some important terminologies:
Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Orthologs normally retain the same function in the course of evolution, so identifying them is critical for reliable prediction of gene function in newly sequenced genomes.
Paralogs are genes related by duplication within a genome. Unlike orthologs, paralogs tend to evolve new functions, even if these are related to the original one.
Speciation is the origin of a new species capable of making a living in a new way from the species from which it arose. As part of this process, the new species also acquires some barrier to genetic exchange with the parent species.

Speciation

COGs
Clusters of Orthologous Groups (COGs) are groups of three or more orthologous genes, meaning that they are direct evolutionary counterparts, and are considered to be part of an 'ancient conserved domain'. A COG is defined as three or more proteins from the genomes of distant species that are more similar to each other than to any other protein within the individual genomes. COGs can be used to predict the function of homologous proteins in poorly studied species and to track evolutionary divergence from a common ancestor, which makes them a powerful tool for functional annotation of uncharacterized proteins and an important resource in comparative genomics studies.
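The definition above (three or more proteins from different genomes that are each other's closest matches) can be illustrated with a minimal sketch. It assumes pairwise similarity scores are already available, for example from an all-against-all sequence search. The `genome|protein` naming scheme, the function names and the toy scores are all hypothetical, and the sketch only finds the smallest possible cluster (a triangle of mutual best hits); it is not the procedure used to build the actual COG database.

```python
from itertools import combinations

def genome_of(protein):
    # Hypothetical naming scheme: "genome|protein".
    return protein.split("|")[0]

def best_hits(scores):
    """Return {(query, target_genome): best-scoring target} from symmetric pair scores."""
    best = {}
    for (a, b), score in scores.items():
        for query, target in ((a, b), (b, a)):
            if genome_of(query) == genome_of(target):
                continue  # hits within one genome are ignored in this sketch
            key = (query, genome_of(target))
            if key not in best or score > best[key][1]:
                best[key] = (target, score)
    return {key: target for key, (target, _) in best.items()}

def minimal_cogs(scores):
    """Find triples from three genomes in which every member is the best hit of the others."""
    best = best_hits(scores)
    proteins = sorted({p for pair in scores for p in pair})
    triangles = []
    for trio in combinations(proteins, 3):
        if len({genome_of(p) for p in trio}) != 3:
            continue  # members must come from three different genomes
        if all(best.get((q, genome_of(t))) == t
               for q in trio for t in trio if q != t):
            triangles.append(trio)
    return triangles

# Toy, made-up similarity scores between proteins of three genomes A, B and C.
scores = {
    ("A|p1", "B|p1"): 900,
    ("A|p1", "C|p1"): 300,
    ("B|p1", "C|p1"): 310,
    ("A|p1", "B|p2"): 50,   # weaker hit in genome B, loses to B|p1
    ("A|p3", "B|p3"): 400,  # only two genomes involved, so no triangle
}
print(minimal_cogs(scores))  # [('A|p1', 'B|p1', 'C|p1')]
```

The all-triples loop is cubic in the number of proteins, which is fine for a toy example but would need indexing by best hit for real genomes.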

Application of COG
The most straightforward application of the COGs is the prediction of functions of individual proteins or protein sets, including those from newly completed genomes. NCBI provides a COG database that consists of 4,873 COGs, which include over 136,000 proteins from the genomes of 50 bacteria, 13 archaea and 3 unicellular eukaryotes. This database uses completely sequenced genomes to classify proteins according to the orthology concept.
The COG database
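Function prediction through COG membership can be sketched as a simple lookup: a new protein inherits the annotation of the COG that its closest characterized match belongs to. The data structures, identifiers and annotations below are hypothetical placeholders, not records from the NCBI database, and the step that finds the closest match is assumed to have been done elsewhere.

```python
def predict_functions(cog_members, cog_annotation, closest_match):
    """Transfer each COG's annotation to new proteins via their closest known match."""
    # Invert the membership table: protein -> COG identifier.
    protein_to_cog = {
        protein: cog
        for cog, members in cog_members.items()
        for protein in members
    }
    predictions = {}
    for new_protein, match in closest_match.items():
        cog = protein_to_cog.get(match)  # None if the match is in no COG
        predictions[new_protein] = cog_annotation.get(cog, "unknown")
    return predictions

# Hypothetical example data.
cog_members = {"COG_X": {"A|p1", "B|p1", "C|p1"}, "COG_Y": {"A|p7", "B|p9", "C|p4"}}
cog_annotation = {"COG_X": "DNA repair", "COG_Y": "amino acid transport"}
closest_match = {"new|g1": "B|p1", "new|g2": "C|p4", "new|g3": "A|p99"}

print(predict_functions(cog_members, cog_annotation, closest_match))
```

A prediction of "unknown" here means only that the closest match falls outside every COG in the table, not that the protein has no function.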

What are some questions that comparative genomics can address?
How has the organism evolved?
What differentiates species?
Which non-coding regions are important?
Which genes are required for organisms to survive in a certain environment?

What is Comparative Genomics?
It is the comparison of one genome to another.
Genomics: DNA (gene)
Functional genomics:
Transcriptomics: RNA (produced from DNA by transcription)
Proteomics: protein (produced from RNA by translation)
Metabolomics: metabolite (produced by enzymatic reactions)

Difference is in Scale and Direction
Other "omics": One or several genes compared against all other known genes. Use the genome to inform us about the entire organism.
Comparative: Entire genome compared to other entire genomes. Use information from many genomes to learn more about the individual genes.

Comparative genomics
Discover what lies hidden in genomic sequence by comparing sequence information.
Main areas:
Whole genome alignment
Gene prediction
Regulatory element prediction
Phylogenomics
Pharmacogenetics

Comparative Genomics
Comparative genomics is a field of biological research in which the genomic features of different organisms are compared. The genomic features may include the DNA sequence, genes, gene order, regulatory sequences, and other genomic structural landmarks.
Comparative genomics is a powerful tool for identifying the features and dissecting the functions of genomes. The approach rests on the fact that selection acting on a gene or regulatory region constrains the evolution of its sequence, so functional elements stay conserved between genomes. Comparison with other genomes has become an integral part of the analysis of the human genome sequence and is one of the most effective methods for identifying genes (Batzoglou et al., 2000; Roest Crollius et al., 2000).

Figure: Species tree of different organisms

Figure: Distribution and clustering of orthologous genes of the Tulsi genome relative to other related plant genomes.
a. Distribution of gene families among five plant genomes: Ocimum tenuiflorum (Ote - green), Arabidopsis thaliana (Ath - black rectangle), Oryza sativa (Osa - red), Solanum lycopersicum (Sly - blue) and Mimulus guttatus (Mgu - black circle). The numbers in the Venn diagram represent shared and unique gene families across these 5 species obtained by OrthoMCL.
b. Horizontal stacked bar plot of all the genes in 23 different genomes. This figure shows the ortholog group distribution in all 23 plant species including Tulsi. Each row represents a plant species: Physcomitrella patens (Ppa), Selaginella moellendorffii (Smo), Oryza sativa (Osa), Setaria italica (Sit), Zea mays (Zma), Sorghum bicolor (Sbi), Aquilegia caerulea (Aca), Ocimum tenuiflorum (Ote), Mimulus guttatus (Mgu), Solanum lycopersicum (Sly), Solanum tuberosum (Stu), Vitis vinifera (Vvi), Eucalyptus grandis (Egr), Citrus sinensis (Csi), Theobroma cacao (Tca), Carica papaya (Cpa), Brassica rapa (Bra), Arabidopsis thaliana (Ath), Fragaria vesca (Fve), Prunus persica (Ppe), Glycine max (Gma), Medicago truncatula (Mtr), Populus trichocarpa (Ptr). The bar graph represents the ortholog protein groups for each species, subdivided into 22 categories depending on the degree of sharing with the other 22 plant species. For example, category 2 represents the number of orthologous groups that have representatives from the species of interest and from one more species out of the 23 species selected for the study.
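The "degree of sharing" categories in panel b can be reproduced with a short sketch, assuming each ortholog group has already been reduced to the set of species it contains. The group identifiers and memberships below are made up for illustration and are not taken from the Tulsi study; only the species codes come from the caption.

```python
from collections import Counter

def sharing_profile(groups, species):
    """Count the ortholog groups containing `species`, keyed by how many species each spans."""
    profile = Counter()
    for members in groups.values():
        if species in members:
            profile[len(members)] += 1  # category k = group spans k species in total
    return dict(sorted(profile.items()))

def unique_groups(groups, species):
    """Return the groups found in `species` and in no other species."""
    return sorted(g for g, members in groups.items() if members == {species})

# Hypothetical ortholog groups; each value is the set of species codes in that group.
groups = {
    "OG1": {"Ote", "Ath", "Osa", "Sly", "Mgu"},
    "OG2": {"Ote", "Mgu"},
    "OG3": {"Ote"},
    "OG4": {"Ath", "Osa"},
    "OG5": {"Ote", "Sly"},
}

print(sharing_profile(groups, "Ote"))  # {1: 1, 2: 2, 5: 1}
print(unique_groups(groups, "Ote"))    # ['OG3']
```

In this toy data, category 2 for Ote contains OG2 and OG5, the two groups shared with exactly one other species, which matches the meaning of category 2 in the caption.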

Background: Shortly after multiple genome sequences of bacteria, archaea and unicellular eukaryotes became available, an attempt at such a classification was implemented in the Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification.
Conclusion: The arCOGs provide a convenient, flexible framework for functional annotation of archaeal genomes, comparative genomics and evolutionary reconstructions. Genomic reconstructions suggest that the last common ancestor of archaea might have been (nearly) as advanced as the modern archaeal hyperthermophiles.
For more info: ftp://ftp.ncbi.nih.gov/pub/koonin/arCOGs/

MBGD Database
MBGD is a database for comparative analysis of completely sequenced microbial genomes, the number of which is now growing rapidly. The aim of MBGD is to facilitate comparative genomics from various points of view, such as ortholog identification, paralog clustering, motif analysis and gene order comparison.
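Gene order comparison, one of the analyses listed above, can be illustrated with a minimal sketch that looks for pairs of neighboring genes conserved between two genomes. It assumes each genome is given as an ordered list of ortholog group identifiers. The identifiers are hypothetical, and this is a generic illustration rather than the method MBGD itself uses.

```python
def adjacent_pairs(gene_order):
    """Return the set of unordered neighboring pairs along one genome."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}

def conserved_neighbors(order_a, order_b):
    """Return neighbor pairs present in both genomes, regardless of orientation."""
    shared = adjacent_pairs(order_a) & adjacent_pairs(order_b)
    return sorted(tuple(sorted(pair)) for pair in shared)

# Hypothetical gene orders, written as ortholog group identifiers.
genome_a = ["og1", "og2", "og3", "og4"]
genome_b = ["og3", "og2", "og1", "og5"]  # the og1-og2-og3 block is inverted

print(conserved_neighbors(genome_a, genome_b))  # [('og1', 'og2'), ('og2', 'og3')]
```

Using unordered pairs means an inverted block still counts as conserved, which is usually what is wanted when looking for candidate operons or syntenic regions.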

Conclusion
The study of Clusters of Orthologous Groups plays a vital role in comparative genomics studies.

References and links
NCBI COGs database.
Chapter 22 of the NCBI Handbook: The Clusters of Orthologous Groups (COGs) Database: Phylogenetic Classification of Proteins from Complete Genomes. NCBI Bookshelf ID: NBK21101.
NCBI News Letter: Protein Families and Genome Evolution. Published Feb 1998.
http://homepage.usask.ca/~ctl271/857/def_homolog.shtml
http://biologydirect.biomedcentral.com/articles/10.1186/1745-6150-2-33
Nucleic Acids Res. 2015 Jan;43(Database issue):D261-9. doi: 10.1093/nar/gku1223. Epub 2014 Nov 26. http://www.ncbi.nlm.nih.gov/pubmed/25428365

THANK YOU