The Gene Ontology & Gene Ontology Annotation resources

mcourtot 6,974 views 36 slides Mar 17, 2016

About This Presentation

An introduction to the GO and GOA, with practical examples of use.


Slide Content

The Gene Ontology and Gene Ontology Annotation resources
Mélanie Courtot, Ph.D.
EMBL-EBI GO/GOA Project leader, SPOT/UniProt content teams
[email protected]
Industry workshop, March 17 2016

In 1999, collaboration between 3 Model Organism Databases.
Ashburner et al., Nat Genet. 2000 May;25(1):25-9.

The Gene Ontology
A way to capture biological knowledge for individual gene products in a written and computable form.
A set of concepts and their relationships to each other, arranged as a hierarchy from less specific concepts to more specific concepts.
http://www.ebi.ac.uk/QuickGO

1. Molecular Function: an elemental activity or task or job. Examples: protein kinase activity, insulin receptor activity.
2. Biological Process: a commonly recognized series of events. Example: cell division.
3. Cellular Component: where a gene product is located. Examples: mitochondrion, mitochondrial matrix, mitochondrial inner membrane.

Aims of the GO project
1. Develop the ontology
2. Annotate gene products using ontology terms
3. Provide a public resource of data and tools

Develop the ontology
An OWL ontology of >41,000 classes: biological process, cellular component, molecular function.
>14,000 imported classes (CL, Uberon, ChEBI, NCBI_tax).
>136,000 logical axioms, including:
~72,000 subClassOf axioms between named GO classes
~41,000 simple existential restrictions (subClassOf R some C)
EL expressivity => fast, scalable reasoning (with ELK). https://www.cs.ox.ac.uk/isg/tools/ELK/

Building the GO
The GO editorial team.
Submissions via GitHub, https://github.com/geneontology/
Submissions via TermGenie, http://go.termgenie.org (~80% of terms are now created this way).

Annotate gene products
[Diagram labels: gene -> GO term; associated genes; GO Database; genome and protein databases]

A GO annotation is a statement that a gene product:
1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component
2. as described in a particular reference
3. as determined by a particular method
Example:
Accession: P00505
Name: GOT2
GO ID: GO:0004069
GO term name: aspartate transaminase activity
Reference: PMID:2731362
Evidence code: IDA
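The six fields of the example annotation above can be held in a small record. The following is a minimal sketch only: the class name `GOAnnotation`, its field names, and the `statement` helper are hypothetical and are not part of any GO or GOA tool or file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    """Hypothetical container for the fields shown on the slide."""
    accession: str      # identifier of the gene product
    name: str           # gene product name
    go_id: str          # GO term identifier
    go_term_name: str   # human-readable GO term name
    reference: str      # where the finding is described
    evidence_code: str  # how it was determined (IDA = Inferred from Direct Assay)

    def statement(self) -> str:
        # Render the annotation as the three-part statement from the slide:
        # what is asserted, in which reference, by which method.
        return (
            f"{self.name} ({self.accession}) is annotated to "
            f"{self.go_term_name} ({self.go_id}), "
            f"as described in {self.reference}, "
            f"with evidence code {self.evidence_code}"
        )


# Values copied from the example on the slide.
example = GOAnnotation(
    accession="P00505",
    name="GOT2",
    go_id="GO:0004069",
    go_term_name="aspartate transaminase activity",
    reference="PMID:2731362",
    evidence_code="IDA",
)
print(example.statement())
```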

Tracking provenance
Evidence categories: experimental data; computational analysis; author statements / curator inference (+ inferred from electronic annotations).
http://www.evidenceontology.org/

Manual annotations
Time-consuming process producing lower numbers of annotations (~2,800 taxa covered).
More specific GO terms.
Manual annotation is essential for creating predictions.
Aleksandra Shypitsyna, Elena Speretta, Alex Holmes, Tony Sawford

Electronic annotations
Quick way of producing large numbers of annotations.
Annotations use less-specific GO terms.
Only source of annotation for ~438,000 non-model organism species.
Keywords shown: orthology, taxon constraints.

Provide a public resource of data and tools
Number of annotations in UniProt-GOA database (March 2016):
2,752,604 manual annotations*
269,207,317 electronic annotations
* Includes manual annotations integrated from external model organism and specialist groups.
http://www.ebi.ac.uk/GOA
https://www.ebi.ac.uk/QuickGO/

Enrichment analysis
[Figure: category proportions in Sample vs. Reference; values shown: 40%, 20%, 20%, 20%]
=> The sample is over-enriched for
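One common way to put a number on "over-enriched" is a one-sided hypergeometric test. The slide does not say which statistic is used, so the sketch below is an assumption rather than the presenter's method. The function name `hypergeom_upper_tail` and the toy counts (4 of 10 sample genes, i.e. 40%, against 20 of 100 reference genes, i.e. 20%) are illustrative only, and the sample is assumed to be drawn from the reference set.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from a
    reference of N genes, K of which carry the GO term of interest."""
    total = comb(N, n)
    upper = min(n, K)
    # Sum exact integer counts first, then divide once.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


# Toy counts (assumed for illustration, not taken from real data):
reference_size = 100   # genes in the reference set
reference_hits = 20    # reference genes annotated to the term (20%)
sample_size = 10       # genes in the sample
sample_hits = 4        # sample genes annotated to the term (40%)

p_value = hypergeom_upper_tail(
    sample_hits, sample_size, reference_hits, reference_size
)
print(f"one-sided p-value: {p_value:.4f}")
```

In practice many terms are tested at once, so a multiple-testing correction would normally follow; that step is left out of this sketch.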

Spinocerebellar ataxia type 28 Paola Roncaglia

Novel biomarkers of rectal radiotherapy

Biomarker for diagnosis and prognosis

Gene expression changes in diabetes

Improved network analysis

GO slims
Many gene products are associated with a large number of descriptive, leaf GO nodes...

GO slims
...however, annotations can be mapped up to a smaller set of parent GO terms.
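Mapping annotations up to a slim can be sketched as an ancestor walk over the hierarchy. Everything in the example below is a toy assumption: the parent links are a hand-written simplification built from term names used earlier in this deck (not the real GO graph or its relationship types), and `ancestors`, `map_to_slim`, `geneA`, and `geneB` are hypothetical names.

```python
# Toy hierarchy (illustrative only): child term -> set of parent terms.
PARENTS = {
    "mitochondrial matrix": {"mitochondrion"},
    "mitochondrial inner membrane": {"mitochondrion"},
    "mitochondrion": {"cellular component"},
    "insulin receptor activity": {"protein kinase activity"},
    "protein kinase activity": {"molecular function"},
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(term, slim, parents):
    """Return the slim terms that are the term itself or one of its ancestors."""
    return ({term} | ancestors(term, parents)) & slim


# A hypothetical slim and hypothetical gene annotations.
SLIM = {"mitochondrion", "protein kinase activity"}
annotations = {
    "geneA": {"mitochondrial matrix", "mitochondrial inner membrane"},
    "geneB": {"insulin receptor activity"},
}

# Collapse each gene's specific terms onto the slim.
slimmed = {
    gene: set().union(*(map_to_slim(t, SLIM, PARENTS) for t in terms))
    for gene, terms in annotations.items()
}
print(slimmed)
```

Here the two leaf annotations of `geneA` collapse onto a single slim term, which is the reduction in redundancy that a slim is meant to provide.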

Slim generation for industry
Collaboration funded by Roche.
Need a custom GO slim for analysis of genesets of interest: descriptive enough, without redundancy.
Internal proprietary vocabulary: hard to maintain.
Desire to automatically map to GO.
http://www.swat4ls.org/wp-content/uploads/2015/10/SWAT4LS_2015_paper_44.pdf

ROCHE CV
[Figure: GSEA with full GO vs. GSEA with Roche CV. Courtesy Laura Badi]

Mapping query: participant_OR_reg_participant some cannabinoid
Description: "A process in which a cannabinoid participates, or that regulates a process in which a cannabinoid participates."

Results
We have successfully mapped 84% of terms from the Roche CV (RCV) (308/365) to OWL queries that can be used to replicate some proportion of the original manual mapping. In addition, these queries find thousands of terms that were missed in the original mapping.
David Osumi-Sutherland

GO SLIM (generic)

ROCHE CV – MANUAL ONLY

ROCHE CV MANUAL + AUTO

Acknowledgements
GO editors and developers
GO annotators
The Gene Ontology (GO) Consortium
Samples, Phenotype and Ontology team (Helen Parkinson)
Protein Function Content team (Claire O’Donovan)
Funding: EMBL-EBI, National Human Genome Research Institute (NHGRI)

Useful links
Ontology browser: http://www.ebi.ac.uk/ols/beta/ontologies/go
Browsing GO & annotations, GO slims: https://www.ebi.ac.uk/QuickGO/
GO Annotation: http://www.ebi.ac.uk/GOA
EBI-Roche collaboration paper: http://www.swat4ls.org/wp-content/uploads/2015/10/SWAT4LS_2015_paper_44.pdf
Contact: [email protected]