My work at ISI during the summer, its relation to Wf4Ever, and next steps in my research
Added: Oct 13, 2011
Date: 13/10/2011
Work at ISI, relation with Wf4Ever, future steps
Daniel Garijo Verdejo, Yolanda Gil
Ontology Engineering Group, Laboratorio de Inteligencia Artificial, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid
1 The TB Drugome
3 Project goals
  Typical Published Article
    Text: Narrative of method, software packages used
    Data: Key datasets and figures/plots
  NOT published, loosely recorded:
    Software: scripted codes + manual steps + notes/emails
  Reproducible Article (Weaver, GenePattern GRRD, etc.)
    Text: Narrative of method, software packages used
    Workflow: Workflow/scripts describing dataflow, codes, and parameters
    Data: Key datasets and figures/plots
4 Problems with existing approaches
  Only the executable workflow is published:
    Must have the same codes to re-execute the workflow, but:
      Codes become unavailable
        E.g.: eHits was proprietary and replaced by AutodockVina
      Different labs prefer different codes
        E.g.: R vs Matlab
        E.g.: visualization in Cytoscape vs yEd
    Must have the same workflow framework to re-execute the workflow
      Must have R for Weaver
    Must import files to local file system and workflow framework
      Must import bundle of workflow/data/code files to reproduce
  Reproducible Article (Weaver, GenePattern GRRD, etc.)
    Text: Narrative of method, software packages used
    Workflow: Workflow/scripts describing dataflow, codes, and parameters
    Data: Key datasets and figures/plots
5 Key features of our approach
  Publish an abstract workflow in addition to the executable workflow
    Description of the workflow that is independent of the codes executed
    Maps to the codes executed (the "executable workflow")
  Publish both abstract and executable workflow using the OPM standard
    OPM (Open Provenance Model) is independent of workflow framework and is widely implemented
    Other groups can import to their own workflow framework
  Publish data and workflows as Linked Data on the Web
    All workflows and related files are web-accessible
    Simple mechanism to share across local file systems
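To make the abstract/executable distinction concrete, here is a minimal Python sketch. It is not how WINGS represents workflows; the class, function, and artifact names are hypothetical. Only the step descriptions and code names (SMAP, FATCAT, eHits, AutodockVina) come from the slides. The idea is that the abstract workflow stays fixed while each lab binds its steps to whatever codes it has available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractStep:
    """A step described by what it does, not by which code runs it."""
    name: str
    inputs: tuple   # hypothetical artifact names
    outputs: tuple  # hypothetical artifact names


# Abstract workflow: steps follow the reproducibility-map slide;
# the artifact names are illustrative placeholders only.
ABSTRACT_WORKFLOW = (
    AbstractStep("CompareLigandBindingSites", ("protein_structures",), ("site_similarities",)),
    AbstractStep("CompareDissimilarStructures", ("protein_structures",), ("structure_alignments",)),
    AbstractStep("Docking", ("site_similarities", "drugs"), ("docking_scores",)),
)

# Two alternative bindings of abstract steps to concrete codes.
ORIGINAL_BINDINGS = {
    "CompareLigandBindingSites": "SMAP",
    "CompareDissimilarStructures": "FATCAT",
    "Docking": "eHits",
}
REPLACEMENT_BINDINGS = {**ORIGINAL_BINDINGS, "Docking": "AutodockVina"}


def to_executable(workflow, bindings):
    """Pair every abstract step with the concrete code bound to it."""
    missing = [step.name for step in workflow if step.name not in bindings]
    if missing:
        raise KeyError(f"no code bound to abstract steps: {missing}")
    return [(step.name, bindings[step.name]) for step in workflow]


if __name__ == "__main__":
    for step, code in to_executable(ABSTRACT_WORKFLOW, REPLACEMENT_BINDINGS):
        print(f"{step} -> {code}")
```

Swapping eHits for AutodockVina changes only the bindings dictionary; the abstract workflow, and therefore the published description of the method, is untouched.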
6 High level architecture
  [Architecture diagram; recoverable labels listed below]
  Stages: Wings workflow generation, OPM conversion, Publication, Share, Reuse
  WINGS deployments (each with Core Portal, Workflow Template, Workflow Instance, OPM export):
    WINGS on local laptop
    WINGS on shared host
    WINGS on web server
  Linked Data Publication
    Interactive browsing (Pubby frontend)
    Programmatic access (external apps)
  Users
  Other workflow environments
7 High level architecture (2)
  [Architecture diagram; recoverable labels listed below]
  Wings: OPM export of the Abstract Workflow (OPM) and the Executable Workflow (OPM)
  Other workflow frameworks: OPM import
  RDF Upload Interface
  RDF Triple store
  SPARQL Endpoint
  Linked Data publication
  Web browser
  Permanent web-accessible file store: web-accessible workflow data, components, etc. (needed if the workflow was developed on a local host instead of a public server)
  Hosting:
    ISI web servers (http://wings.isi.edu/…)
    Amazon EC2 cloud (http://ec2-184-72-160-64.…)
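As an illustration of the "programmatic access" path through the SPARQL endpoint, the sketch below builds a SPARQL-protocol GET request in Python using only the standard library. The endpoint URL is a placeholder, not the actual ISI or EC2 endpoint, and the query assumes the published RDF uses the OPMV vocabulary (`opmv:Process`, `opmv:used`, `opmv:wasGeneratedBy`). The request is constructed but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: the real endpoint URL is not given in the slides.
ENDPOINT = "http://example.org/sparql"

# List processes together with one input and one output artifact each.
QUERY = """
PREFIX opmv: <http://purl.org/net/opmv/ns#>
SELECT ?process ?input ?output WHERE {
  ?process a opmv:Process ;
           opmv:used ?input .
  ?output opmv:wasGeneratedBy ?process .
}
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL-protocol GET request for JSON results."""
    url = endpoint + "?" + urlencode({"query": query.strip()})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)

if __name__ == "__main__":
    print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would execute the query against a live endpoint; that step is left out here because it depends on a server that may no longer be reachable.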
8 Executable and abstract workflow
9 OPMV extended model
  [Model diagram; recoverable labels listed below]
  Legend: Red: OPM model. Black: OPM profile (extension).
  OPM graph: Process, Artifact, Agent, Account
  OPM edges: used, wasGeneratedBy, wasControlledBy
  Template side: Workflow template, Abstract template node, Abstract component, Input artifact 1, Input artifact 2, Output artifact 1
  Execution side: Execution account, Execution node, Specific component, Execution input 1, Execution input 2, Execution result, user
  Extension properties: hasArtifact, hasProcess, hasWorkflowTemplate, hasArtifactTemplate, hasProcessTemplate, hasAbstractComponent, hasSpecificComponent, subClassOf
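A small Python sketch may help show what the core OPM edges buy us. It stores provenance as plain (subject, predicate, object) triples and walks `wasGeneratedBy` and `used` backwards to find everything an execution result was derived from. All node names are invented for illustration, and the single `hasProcessTemplate` triple reflects one assumed reading of the diagram (an execution node pointing at its template node), not a confirmed part of the profile.

```python
# Provenance as (subject, predicate, object) triples.
# Node names are hypothetical; predicates are the edge labels from the slide.
TRIPLES = {
    ("docking_scores", "wasGeneratedBy", "docking_run"),
    ("docking_run", "used", "binding_sites"),
    ("docking_run", "used", "drug_library"),
    ("docking_run", "wasControlledBy", "user"),
    ("binding_sites", "wasGeneratedBy", "smap_run"),
    ("smap_run", "used", "protein_structures"),
    # Assumed link from an execution node to its abstract template node.
    ("docking_run", "hasProcessTemplate", "DockingNode"),
}


def objects(subject, predicate):
    """Return every object linked to subject by predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def upstream_artifacts(artifact):
    """Collect every artifact the given artifact was (transitively) derived from."""
    seen = set()
    stack = [artifact]
    while stack:
        current = stack.pop()
        for process in objects(current, "wasGeneratedBy"):
            for source in objects(process, "used"):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
    return seen


if __name__ == "__main__":
    print(sorted(upstream_artifacts("docking_scores")))
```

Because the traversal relies only on `used` and `wasGeneratedBy`, it would work on provenance exported by any OPM-capable workflow system, which is the interoperability argument the slides make.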
10 Reproducibility
  3 perspectives:
    Reproducibility by an expert
    Basic reproducibility by non-experts
    Reproducibility by students from text only
  Or, not reproducible at all
11 Reproducibility maps
  Comparison of ligand binding sites using SMAP
  Comparison of dissimilar protein structures using FATCAT
  Docking using eHits/AutodockVina
12 Reproducibility maps: accessing the scripts and intermediate data
13 How can we use this in Wf4Ever?
  The abstract workflow notion can be reused and imported into the workflows used in Research Objects (ROs).
    It complements the workflow and makes it easier to understand.
    It allows tackling incomplete provenance.
  Additional workflow repository for recommendation.
  OPM (Open Provenance Model) is independent of workflow framework and is widely implemented (Taverna has an OPM export too).
    Other groups can import to their own workflow framework.
  Workflow integration with WINGS.
    Semantic annotation of workflows.
    Distributed workflow execution engine.
14 Next steps
  Keep working on workflow abstraction.
    Research compatibility with problem solving methods (PSMs).
  Create an OPMV/W3C PROV-O profile for common workflow representation.
    Interoperability between workflow systems (Taverna).
  Work on workflows in different domains (biology, astronomy).
    Workflow reuse between different domains?
15 References
  The TB Drugome paper: http://funsite.sdsc.edu/drugome/TB/
  OPMO + OPMV mapped version: http://openprovenance.org/model/opmo
  WINGS workflow system: http://seagull.isi.edu/marbles/
  TB Drugome Wiki (evolution of the work): http://seagull.isi.edu/wings-drugome/index.php/Main_Page
  Thanks to Yolanda Gil for letting me borrow some of the slides, based on UCSD slides, for this presentation.