Difference between revisions of "Provenance challenge"
Revision as of 09:11, 13 April 2007
Second provenance challenge design overview
This page describes how the queries of the second provenance challenge are answered.
The goal of this project is to create an API capable of querying different kinds of databases containing provenance data. The main focus is on provenance generated by scientific workflows.
data model overview
This is a description of the data model that I am trying to implement.
Module definition is a description of a processor that takes inputs and generates outputs.
Workflow definition is a description of a workflow that contains modules and connections between them through ports. In the case of VisTrails, it also contains the evolution of the workflow through a parent relation.
Execution log is the information about a workflow execution. It contains information about the processors that were executed and the data items that were created.
primitives
The API deals with the basic primitives that describe workflow executions.
node types:
dataitem | a dataitem that is input/output to a module execution |
module | the module/service that is to be executed |
moduleInstance | the module as represented in a workflow |
moduleExecution | the execution of a module |
workflow | a description of a process containing modules and connections |
workflowExecution | the representation of a workflow execution |
inputPort | represents a specific port that can be assigned an input to a module execution |
outputPort | represents a specific port that can contain a product of a module execution |
connection | represents a connection between module instances |
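The node types above could be modelled as an enumeration plus a small immutable node record. This is only a sketch: the names `NodeType` and `Node`, the `node_id` field, and the example identifiers are assumptions made for illustration and are not part of the design on this page.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    """Node types from the table above (spelling follows the relations table)."""
    DATA_ITEM = "dataItem"
    MODULE = "module"
    MODULE_INSTANCE = "moduleInstance"
    MODULE_EXECUTION = "moduleExecution"
    WORKFLOW = "workflow"
    WORKFLOW_EXECUTION = "workflowExecution"
    INPUT_PORT = "inputPort"
    OUTPUT_PORT = "outputPort"
    CONNECTION = "connection"


@dataclass(frozen=True)
class Node:
    """A provenance node: its type plus a store-specific identifier (hypothetical)."""
    node_type: NodeType
    node_id: str


# Hypothetical example nodes; the identifiers are made up.
atlas_image = Node(NodeType.DATA_ITEM, "atlas-x.gif")
softmean_run = Node(NodeType.MODULE_EXECUTION, "softmean-run-1")
```

Because the record is frozen, nodes are hashable and can be used as keys when relations are stored in dictionaries or sets.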
Relations
Relation | Input | Output |
exists | all | boolean |
equals | all | boolean |
annotations | all | dict of key/value pairs |
getInputPortForData | dataItem | inputPort |
getOutputPortForData | dataItem | outputPort |
getDataFromInputPort | inputPort | dataItem |
getDataFromOutputPort | outputPort | dataItem |
hasInputPort | moduleInstance | inputPort |
inputPortOf | inputPort | moduleInstance |
hasOutputPort | moduleInstance | outputPort |
outputPortOf | outputPort | moduleInstance |
outputOf | dataItem | moduleExecution |
inputOf | dataItem | moduleExecution |
hasOutput | moduleExecution | dataItem |
hasInput | moduleExecution | dataItem |
startTime | moduleExecution | time |
endTime | moduleExecution | time |
startTime | workflowExecution | time |
endTime | workflowExecution | time |
executionOf | moduleExecution | moduleInstance |
executionOf | workflowExecution | workflowInstance |
hasExecution | moduleInstance | moduleExecution |
hasExecution | workflowInstance | workflowExecution |
executions | workflowExecution | moduleExecution |
executedIn | moduleExecution | workflowExecution |
inWorkflow | moduleInstance | workflow |
hasModule | workflow | moduleInstance |
connectedTo | inputPort | outputPort |
connectedTo | outputPort | inputPort |
runsModule | moduleInstance | module |
hasInstance | module | moduleInstance |
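Many of the relations above come in pairs that point in opposite directions (for example outputOf and hasOutput). A minimal in-memory sketch of how such pairs might be kept in sync is shown here. The class `ProvenanceStore`, its `add` and `query` methods, and the example identifiers are hypothetical; reading the listed pairs as exact inverses is also an assumption drawn from the input and output columns of the table.

```python
from collections import defaultdict

# Relation pairs read off the table; treating them as inverses is an assumption.
_PAIRS = {
    "outputOf": "hasOutput",
    "inputOf": "hasInput",
    "inputPortOf": "hasInputPort",
    "outputPortOf": "hasOutputPort",
    "getInputPortForData": "getDataFromInputPort",
    "getOutputPortForData": "getDataFromOutputPort",
    "executionOf": "hasExecution",
    "executedIn": "executions",
    "inWorkflow": "hasModule",
    "runsModule": "hasInstance",
    "connectedTo": "connectedTo",  # symmetric
}
# Make the lookup work from either side of a pair.
INVERSE = {**_PAIRS, **{v: k for k, v in _PAIRS.items()}}


class ProvenanceStore:
    """Hypothetical in-memory store mapping (relation, subject) to a set of objects."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, relation, subject, obj):
        """Record a fact and, if the relation has an inverse, the inverse fact too."""
        self._edges[(relation, subject)].add(obj)
        inverse = INVERSE.get(relation)
        if inverse is not None:
            self._edges[(inverse, obj)].add(subject)

    def query(self, relation, subject):
        """Return a copy of all objects related to subject; empty set if none."""
        return set(self._edges.get((relation, subject), set()))


store = ProvenanceStore()
store.add("outputOf", "atlas-x.gif", "convert-run-1")
store.add("connectedTo", "out-port-1", "in-port-1")
```

With this layout, asking for `hasOutput` of `convert-run-1` returns the data item that was registered through `outputOf`, so a backend only has to store one direction natively.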
derived relations: (might be native)
derivedFrom | dataItem | dataItem |
derivedData | dataItem | dataItem |
previousModuleExecution | moduleExecution | moduleExecution |
transitive relations:
datatype | relation |
upstreams:
dataitem | derivedFrom - .outputOf()[forall].hasInput() |
moduleInstance | prevModuleInstance - .hasInputPort()[forall].connectedTo().outputPortOf() |
moduleExecution | prevModuleExecution - .hasInput()[forall].outputOf() |
downstreams:
dataitem | derivedData - .inputOf()[forall].hasOutput() |
moduleInstance | nextModuleInstance - .hasOutputPort()[forall].connectedTo().inputPortOf() |
moduleExecution | nextModuleExecution - .hasOutput()[forall].inputOf() |
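The upstream expression for data items can be read as one step (follow outputOf, then hasInput for every execution found) that is repeated until nothing new turns up. The sketch below assumes the two base relations are available as plain dictionaries of sets. The function names `derived_from` and `upstream` and the example data are hypothetical.

```python
def derived_from(item, output_of, has_input):
    """One upstream step: .outputOf()[forall].hasInput()."""
    result = set()
    for execution in output_of.get(item, ()):
        result.update(has_input.get(execution, ()))
    return result


def upstream(item, output_of, has_input):
    """Transitive closure of derived_from, computed by iterative graph traversal."""
    seen = set()
    frontier = [item]
    while frontier:
        current = frontier.pop()
        for parent in derived_from(current, output_of, has_input):
            if parent not in seen:  # guards against revisiting shared ancestors
                seen.add(parent)
                frontier.append(parent)
    return seen


# Hypothetical two-step workflow run:
# anatomy.img, reference.img -> align_warp-1 -> warp.params -> reslice-1 -> resliced.img
output_of = {"warp.params": {"align_warp-1"}, "resliced.img": {"reslice-1"}}
has_input = {
    "align_warp-1": {"anatomy.img", "reference.img"},
    "reslice-1": {"warp.params"},
}

ancestors = upstream("resliced.img", output_of, has_input)
```

The downstream direction works the same way with inputOf and hasOutput swapped in for the two dictionaries.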
--Tommy 09:05, 12 April 2007 (MDT)