The Gene Ontology & Gene Ontology Annotation resources
mcourtot
Mar 17, 2016
About This Presentation
An introduction to the GO and GOA, with practical examples of use.
Language: en
Added: Mar 17, 2016
Slides: 36 pages
Slide Content
The Gene Ontology and Gene Ontology Annotation resources. Mélanie Courtot, Ph.D., EMBL-EBI; GO/GOA Project leader, SPOT/UniProt content teams; [email protected]. Industry workshop, March 17 2016.
In 1999: a collaboration between 3 Model Organism Databases (Ashburner et al., Nat Genet. 2000 May;25(1):25-9).
The Gene Ontology: a way to capture biological knowledge for individual gene products in a written and computable form. A set of concepts and their relationships to each other, arranged as a hierarchy from less specific concepts to more specific concepts. http://www.ebi.ac.uk/QuickGO
1. Molecular Function: an elemental activity, task or job (protein kinase activity, insulin receptor activity). 2. Biological Process: a commonly recognized series of events (cell division). 3. Cellular Component: where a gene product is located (mitochondrion, mitochondrial matrix, mitochondrial inner membrane).
Aims of the GO project: develop the ontology; annotate gene products using ontology terms; provide a public resource of data and tools.
Develop the ontology: an OWL ontology of >41,000 classes (biological process, cellular component, molecular function); >14,000 imported classes (CL, Uberon, ChEBI, NCBI_tax); >136,000 logical axioms, including ~72,000 subClassOf axioms between named GO classes and ~41,000 simple existential restrictions (subClassOf R some C). EL expressivity => fast, scalable reasoning (with ELK). https://www.cs.ox.ac.uk/isg/tools/ELK/
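The two axiom shapes named on this slide can be illustrated with a toy example. This is a minimal sketch, not how GO or ELK is implemented: the class names echo earlier slides, but the specific axioms, the `part_of` relation and the helper names `named_superclasses` and `existential_restrictions` are assumptions made for illustration.

```python
from collections import namedtuple

# Two toy axiom shapes (illustrative only, not extracted from the real GO):
#   SubClassOf(sub, sup)          -> "sub subClassOf sup" between named classes
#   Existential(sub, rel, filler) -> "sub subClassOf (rel some filler)"
SubClassOf = namedtuple("SubClassOf", "sub sup")
Existential = namedtuple("Existential", "sub rel filler")

# Hypothetical axioms chosen for the example.
AXIOMS = [
    SubClassOf("insulin receptor activity", "protein kinase activity"),
    SubClassOf("protein kinase activity", "molecular function"),
    SubClassOf("mitochondrial matrix", "cellular component"),
    Existential("mitochondrial matrix", "part_of", "mitochondrion"),
]


def named_superclasses(cls, axioms):
    """Return every named class reachable from cls via SubClassOf axioms."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for ax in axioms:
            if isinstance(ax, SubClassOf) and ax.sub == current and ax.sup not in found:
                found.add(ax.sup)
                frontier.append(ax.sup)
    return found


def existential_restrictions(cls, axioms):
    """Return the (relation, filler) pairs asserted directly on cls."""
    return [(ax.rel, ax.filler) for ax in axioms
            if isinstance(ax, Existential) and ax.sub == cls]


print(named_superclasses("insulin receptor activity", AXIOMS))
print(existential_restrictions("mitochondrial matrix", AXIOMS))
```

A real EL reasoner such as ELK also derives consequences that combine both axiom shapes; the sketch only follows named subClassOf links.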
Building the GO: the GO editorial team. Submission via GitHub, https://github.com/geneontology/. Submissions via TermGenie, http://go.termgenie.org; ~80% of terms are now created this way.
Annotate gene products: gene -> GO term; associated genes; GO Database; genome and protein databases.
A GO annotation is a statement that a gene product: 1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component; 2. as described in a particular reference; 3. as determined by a particular method. Example: Accession: P00505; Name: GOT2; GO ID: GO:0004069; GO term name: aspartate transaminase activity; Reference: PMID:2731362; Evidence code: IDA.
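The three parts of an annotation map naturally onto a small record type. The following is a minimal sketch using the example from the slide; the class name `GOAnnotation`, its field names and the `describe` helper are assumptions for illustration and do not follow any official annotation file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    """One annotation: what is annotated, to which term, from where, and how."""
    accession: str       # gene product identifier
    name: str            # gene product name
    go_id: str           # GO term identifier
    go_term_name: str    # human-readable GO term name
    reference: str       # where the statement is described
    evidence_code: str   # how it was determined


# Values copied from the slide's example.
example = GOAnnotation(
    accession="P00505",
    name="GOT2",
    go_id="GO:0004069",
    go_term_name="aspartate transaminase activity",
    reference="PMID:2731362",
    evidence_code="IDA",
)


def describe(a: GOAnnotation) -> str:
    """Render the annotation as the three-part statement from the slide."""
    return (f"{a.name} ({a.accession}) is annotated to {a.go_term_name} [{a.go_id}], "
            f"as described in {a.reference}, as determined by {a.evidence_code}.")


print(describe(example))
```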
Tracking provenance: experimental data; computational analysis; author statements/curator inference (+ inferred from electronic annotations). http://www.evidenceontology.org/
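The provenance groups above can be used to bucket annotations by their evidence code. This is a hedged sketch: the group labels follow the slide, the codes listed are only a subset of the GO evidence codes, and the names `EVIDENCE_GROUPS` and `evidence_group` are hypothetical.

```python
# Subset of GO evidence codes, grouped as on the slide (not an exhaustive list).
EVIDENCE_GROUPS = {
    "experimental data": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational analysis": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author statement / curator inference": {"TAS", "NAS", "IC", "ND"},
    "electronic annotation": {"IEA"},
}


def evidence_group(code: str) -> str:
    """Return the provenance group for an evidence code, or 'unknown'."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code.upper() in codes:
            return group
    return "unknown"


for code in ("IDA", "IEA", "TAS"):
    print(code, "->", evidence_group(code))
```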
Manual annotations: a time-consuming process producing lower numbers of annotations (~2,800 taxa covered). More specific GO terms. Manual annotation is essential for creating predictions. Aleksandra Shypitsyna, Elena Speretta, Alex Holmes, Tony Sawford.
Electronic annotations: a quick way of producing large numbers of annotations. Annotations use less-specific GO terms. Only source of annotation for ~438,000 non-model organism species. Orthology; taxon constraints.
Provide a public resource of data and tools. Number of annotations in the UniProt-GOA database (March 2016): 2,752,604 manual annotations*; 269,207,317 electronic annotations. * Includes manual annotations integrated from external model organism and specialist groups. http://www.ebi.ac.uk/GOA https://www.ebi.ac.uk/QuickGO/
Enrichment analysis: Sample vs. Reference (chart values: 40%, 20%, 20%, 20%) => The sample is over-enriched for
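One common way to put a number on over-representation is a one-sided hypergeometric test. The slide does not say which test is used, so this is a sketch under that assumption. The gene counts are hypothetical, chosen only so that 40% of the sample and 20% of the reference carry the term, and the function name `enrichment_p_value` is illustrative.

```python
from math import comb


def enrichment_p_value(pop_size, pop_hits, sample_size, sample_hits):
    """P(X >= sample_hits) when drawing sample_size genes without replacement
    from pop_size genes, of which pop_hits carry the GO term."""
    upper = min(sample_size, pop_hits)
    favourable = sum(
        comb(pop_hits, i) * comb(pop_size - pop_hits, sample_size - i)
        for i in range(sample_hits, upper + 1)
    )
    return favourable / comb(pop_size, sample_size)


# Hypothetical counts: 200 of 1,000 reference genes (20%) carry the term,
# and 20 of the 50 sampled genes (40%) do.
p = enrichment_p_value(pop_size=1000, pop_hits=200, sample_size=50, sample_hits=20)
print(f"p = {p:.3g}")
```

In practice many GO terms are tested at once, so the resulting p-values would normally be corrected for multiple testing.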
Spinocerebellar ataxia type 28 Paola Roncaglia
Novel biomarkers of rectal radiotherapy
Biomarker for diagnosis and prognosis
Gene expression changes in diabetes
Improved network analysis
GO slims: many gene products are associated with a large number of descriptive, leaf GO nodes…
GO slims: …however, annotations can be mapped up to a smaller set of parent GO terms.
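Mapping up to a slim amounts to walking from each annotated term to its ancestors and keeping those that belong to the slim. The following is a minimal sketch on a toy graph: the parent links, the slim set, the gene names and the helper `map_to_slim` are all assumptions for illustration, not the real GO structure or an official tool.

```python
# Toy parent links (child -> parents); a simplification, not the real GO graph.
PARENTS = {
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrial inner membrane": ["mitochondrion"],
    "mitochondrion": ["cellular component"],
    "cell division": ["biological process"],
}

# Hypothetical slim: the smaller set of terms we want to report against.
SLIM = {"mitochondrion", "cell division"}


def map_to_slim(term, parents=PARENTS, slim=SLIM):
    """Return the slim terms found among term and all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen & slim


# Hypothetical annotations to specific (leaf) terms.
annotations = {
    "geneA": ["mitochondrial matrix", "mitochondrial inner membrane"],
    "geneB": ["cell division"],
}

# Two leaf annotations for geneA collapse onto one slim term.
slimmed = {
    gene: set().union(*(map_to_slim(t) for t in terms))
    for gene, terms in annotations.items()
}
print(slimmed)
```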
Slim generation for industry: collaboration funded by Roche. Need a custom GO slim for analysis of gene sets of interest; needs to be descriptive enough, without redundancy. Internal proprietary vocabulary: hard to maintain. Desire to automatically map to GO. http://www.swat4ls.org/wp-content/uploads/2015/10/SWAT4LS_2015_paper_44.pdf
ROCHE CV: GSEA with full GO; GSEA with Roche CV. Courtesy Laura Badi.
Mapping query: participant_OR_reg_participant some cannabinoid. Description: “A process in which a cannabinoid participates, or that regulates a process in which a cannabinoid participates.”
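The description of the query can be mimicked with plain set lookups. This is only a toy sketch of the query's meaning: a real OWL query would be answered by a reasoner over the ontology and would also cover subclasses of the chemical. All process names and links below are made up for illustration.

```python
# Toy knowledge base; every process name and link here is hypothetical.
PARTICIPANTS = {
    "cannabinoid process X": {"cannabinoid"},
    "lipid process Y": {"lipid"},
}

# regulator process -> the process it regulates
REGULATES = {
    "regulation of cannabinoid process X": "cannabinoid process X",
    "regulation of lipid process Y": "lipid process Y",
}


def participant_or_reg_participant(chemical):
    """Processes in which `chemical` participates, plus processes that
    regulate such a process."""
    direct = {proc for proc, chems in PARTICIPANTS.items() if chemical in chems}
    regulators = {reg for reg, target in REGULATES.items() if target in direct}
    return direct | regulators


print(sorted(participant_or_reg_participant("cannabinoid")))
```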
Results: we have successfully mapped 84% of terms from RCV (308/365) to OWL queries that can be used to replicate some proportion of the original manual mapping. In addition, these queries find thousands of terms that were missed in the original mapping. David Osumi-Sutherland
GO SLIM (generic)
ROCHE CV – MANUAL ONLY
ROCHE CV MANUAL + AUTO
Acknowledgements: GO editors and developers; GO annotators; The Gene Ontology (GO) Consortium; Samples, Phenotype and Ontology team (Helen Parkinson); Protein Function Content team (Claire O’Donovan). Funding: EMBL-EBI, National Human Genome Research Institute (NHGRI).