Presentation in INFO-703 Biological Data Management
Added: Jun 09, 2020
Slides: 10 pages
Enrichr: a comprehensive gene set enrichment analysis web server
INFO-703 Biological Data Management
March 27th, 2019
Presented by Thi Nguyen
http://amp.pharm.mssm.edu/Enrichr/
Enrichr gene-set libraries
  35 gene-set libraries in six categories: transcription, pathways, ontologies, diseases/drugs, cell types and misc.
  Total = 31,026 gene sets that completely cover the human and mouse genome and proteome.
  On average, each gene set has ~350 genes, and there are > 6 million connections between genes and terms.
  Gene frequencies for most gene-set libraries follow a power law.
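The gene-frequency claim above can be checked with a simple count. The following is a minimal sketch, assuming a library is held as a mapping from term name to gene list; `toy_library` and both function names are hypothetical and are not part of Enrichr. It counts how many gene sets each gene appears in, then tabulates how many genes share each frequency. Under a power law, the logarithm of that second count falls roughly linearly with the logarithm of the frequency.

```python
from collections import Counter


def gene_frequencies(library):
    """Count how many gene sets of the library each gene appears in."""
    counts = Counter()
    for genes in library.values():
        counts.update(set(genes))  # set(): count a gene once per gene set
    return counts


def frequency_histogram(counts):
    """Map each frequency k to the number of genes found in exactly k gene sets."""
    return Counter(counts.values())


# Hypothetical toy library (term -> genes); real libraries hold hundreds of terms.
toy_library = {
    "term_A": ["TP53", "MYC", "EGFR"],
    "term_B": ["TP53", "MYC"],
    "term_C": ["TP53", "BRCA1"],
    "term_D": ["TP53"],
}

freqs = gene_frequencies(toy_library)
hist = frequency_histogram(freqs)

for k in sorted(hist):
    print(f"{hist[k]} gene(s) appear in {k} gene set(s)")
```

A toy library this small cannot show a power law; the point is only the shape of the computation, which would be run over a full library and plotted on log-log axes.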
Enrichr gene-set libraries
I. Transcription category: links differentially expressed genes (DEG) with transcription factors
  1. ChIP-X Enrichment Analysis (ChEA) database
  2. Position weight matrices (PWM) from the TRANSFAC database
  3. Transcription factor target genes inferred from PWM
  4. ENCODE transcription factor gene-set library
  5. Histone modifications extracted from the NIH Roadmap Epigenomics project
  6. microRNA gene-set library from TargetScan
Enrichr gene-set libraries
II. Pathway category
  Gene-set libraries from well-known databases:
    WikiPathways
    KEGG
    BioCarta
    Reactome
  Other libraries created from the authors' own resources:
    kinase enrichment analysis (KEA)
    PPI hubs
    CORUM
    complexes from an IP-MS study
    manually assembled lists of phosphoproteins from SILAC phosphoproteomics
III. Ontology category
  Gene-set libraries from the 3 Gene Ontology trees and from the knockout mouse phenotype ontology from the MGI-MP browser (Jackson Lab)
Enrichr gene-set libraries
IV. Disease/drug category
  CMAP database
  GeneSigDB
  MSigDB
  OMIM
  VirusMINT
V. Cell type category
  highly expressed genes from the Mouse and Human Gene Atlases
  highly expressed genes from cancer cells from the Cancer Cell Line Encyclopedia (CCLE)
  NCI-60 cell line data set
VI. Misc category
  chromosome location (MSigDB)
  metabolites (HMDB)
  structural domains (PFAM and InterPro)
3 methods to rank enrichment scores
  1. Fisher exact test
  2. z-score of the deviation from the expected rank by the Fisher exact test
  3. combined score that multiplies the log of the p-value (Fisher exact test) by the z-score
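The three rankings above can be sketched in a few lines. This is an illustration under stated assumptions, not Enrichr's own code: the Fisher test is taken as one-sided (over-representation), the universe size and the null ranks are made-up numbers, and all function names are hypothetical. The z-score compares a term's observed rank with the ranks it receives for random input lists, and the combined score is log(p) multiplied by z.

```python
import math
from statistics import mean, stdev


def fisher_right_tail(overlap, set_size, list_size, universe):
    """One-sided Fisher exact p-value: P(X >= overlap) under the hypergeometric null."""
    total = math.comb(universe, list_size)
    upper = min(set_size, list_size)
    tail = sum(
        math.comb(set_size, k) * math.comb(universe - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def rank_z_score(observed_rank, null_ranks):
    """Deviation of the observed rank from the expected rank, in standard deviations."""
    return (observed_rank - mean(null_ranks)) / stdev(null_ranks)


def combined_score(p_value, z):
    """Combined score: log of the Fisher p-value multiplied by the rank z-score."""
    return math.log(p_value) * z


# Assumed toy numbers: 20-gene universe, 5-gene term, 5-gene input list, 3 shared genes.
p = fisher_right_tail(overlap=3, set_size=5, list_size=5, universe=20)

# Assumed ranks this term received for random input lists (its null distribution).
z = rank_z_score(observed_rank=2, null_ranks=[40, 55, 60, 45, 50])

score = combined_score(p, z)
print(f"p = {p:.4f}, z = {z:.2f}, combined = {score:.2f}")
```

Because log(p) is negative for p < 1 and a term ranked better than expected has a negative z, the product is positive for strongly enriched terms, so larger combined scores indicate stronger enrichment.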
Using Enrichr
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-128
2016 NAR update
  180,184 annotated gene sets from 102 gene set libraries (GEO)
  New features:
    submit fuzzy sets
    upload BED files
    improved API
    visualization tool: clustergram
    different scoring scheme
    visualize overlap between Enrichr and other gene set libraries
2016 NAR update: comparison of resources
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987924/
User interface pros and cons
There are many other gene set enrichment analysis tools that could be compared with Enrichr; for example, some leading tools are Fidea (39), DAVID (13), WebGestalt (12), g:Profiler (12) and GSEA (40). The advantages of Enrichr over some of these tools are its comprehensiveness, ease of use and interactive visualization of the results. Enrichr is lacking some of the flexibility available with those other tools. For example, Enrichr merges human, mouse and rat genes, which has advantages and disadvantages. Enrichr does not have an ID conversion tool, which is highly desired by many users. Enrichr also does not have the ability to upload a background list, and it does not have implementation of parametric tests such as Gene Set Enrichment Analysis (GSEA) (40), Parametric Analysis of Gene set Enrichment (PAGE) (9), and our own Principal Angle Enrichment Analysis (PAEA) (41). These features are planned.