Log API
Introduction
The purpose of the log API is to make detailed queries on the VisTrails execution log possible. An index has been created that can answer general queries about workflow executions. However, to enable queries on individual module executions, the log inside each vistrail file needs to be accessed, which may be slow. Also, to answer queries related to workflow modules, such as their parameters, the original workflow pipeline needs to be correlated with the execution log.
Log queries can be divided into three parts according to the information they require. The first uses only the index. The second uses the index and the complete log files. The third correlates the index and log files with the corresponding pipelines. For each part I will state the information available and give a few example queries that might be possible.
Part 1: Queries only requiring information from the index
The index contains execution-level information such as:
- Start and end times of pipeline executions
- User
- Result (success, failed, cached, or not executed)
Example queries:
- Time range queries, e.g. executions on a specific day or month.
- Executions by a specific user or with a specific result
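As a rough illustration of index-only queries, the sketch below filters a list of execution records by user, result, and day. The IndexEntry record, its field names, the query_index function, and the sample data are all assumptions made for this example. They do not describe the actual index schema.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional


@dataclass
class IndexEntry:
    """One pipeline execution as the index might record it (hypothetical schema)."""
    vistrail: str      # vistrail file that holds the full log
    user: str
    start: datetime
    end: datetime
    result: str        # "success", "failed", "cached" or "not executed"


def query_index(entries: List[IndexEntry],
                user: Optional[str] = None,
                result: Optional[str] = None,
                day: Optional[date] = None) -> List[IndexEntry]:
    """Return the entries that match every criterion that is not None."""
    matches = []
    for entry in entries:
        if user is not None and entry.user != user:
            continue
        if result is not None and entry.result != result:
            continue
        if day is not None and entry.start.date() != day:
            continue
        matches.append(entry)
    return matches


# Made-up sample data, for illustration only.
INDEX = [
    IndexEntry("a.vt", "alice", datetime(2012, 3, 1, 9, 0), datetime(2012, 3, 1, 9, 5), "success"),
    IndexEntry("a.vt", "bob", datetime(2012, 3, 1, 10, 0), datetime(2012, 3, 1, 10, 1), "failed"),
    IndexEntry("b.vt", "alice", datetime(2012, 3, 2, 8, 0), datetime(2012, 3, 2, 8, 30), "failed"),
]

# Failed executions on a specific day.
failed_on_march_1 = query_index(INDEX, result="failed", day=date(2012, 3, 1))
print([entry.user for entry in failed_on_march_1])
```

None of these filters needs to open a vistrail file, which is why queries of this kind should stay fast.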
Part 2: Queries on individual module executions
The log contains module-level execution information such as:
- Start and end times of module executions
- Result (success, failed, cached, or not executed)
- Errors
- Execution annotations
Example queries:
- Module executions lasting more than 5 minutes
- Error annotation containing a specific word
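The two example queries above could look roughly like this once the module executions of a log have been loaded. The ModuleExec record and both helper functions are hypothetical names chosen for this sketch, and the text search assumes that errors and annotation values are plain strings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class ModuleExec:
    """One module execution from a vistrail's log (hypothetical schema)."""
    module_id: int
    start: datetime
    end: datetime
    result: str
    error: Optional[str] = None
    annotations: Dict[str, str] = field(default_factory=dict)


def lasting_longer_than(execs: List[ModuleExec], limit: timedelta) -> List[ModuleExec]:
    """Module executions whose duration exceeds the given limit."""
    return [m for m in execs if m.end - m.start > limit]


def mentions_word(execs: List[ModuleExec], word: str) -> List[ModuleExec]:
    """Module executions whose error or annotation text contains the word."""
    needle = word.lower()
    hits = []
    for m in execs:
        texts = [m.error or ""] + list(m.annotations.values())
        if any(needle in text.lower() for text in texts):
            hits.append(m)
    return hits


# Made-up sample log, for illustration only.
LOG = [
    ModuleExec(1, datetime(2012, 3, 1, 9, 0, 0), datetime(2012, 3, 1, 9, 0, 2), "success"),
    ModuleExec(2, datetime(2012, 3, 1, 9, 0, 2), datetime(2012, 3, 1, 9, 7, 0), "success",
               annotations={"note": "slow network"}),
    ModuleExec(3, datetime(2012, 3, 1, 9, 7, 0), datetime(2012, 3, 1, 9, 7, 1), "failed",
               error="File not found: data.csv"),
]

slow = lasting_longer_than(LOG, timedelta(minutes=5))
missing = mentions_word(LOG, "not found")
print([m.module_id for m in slow], [m.module_id for m in missing])
```

Both functions scan every module execution, so the cost grows with the number of log files that have to be opened. This is the slow path mentioned in the introduction.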
Part 3: Queries on module types and parameters
The vistrail contains the pipeline information, which includes:
- Module types
- Parameters
- Module annotations
- Connections
Example queries:
- Failed executions of a specific module type
- Visual query where a specific module type is connected to a module that has failed.
- All failed executions of a specific module type with a specific parameter.
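A minimal sketch of the correlation step, assuming that module executions in the log can be matched to pipeline modules by a shared module id. The Pipeline and PipelineModule records, the function names, and the module type names in the sample data are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PipelineModule:
    """A module in the workflow pipeline (hypothetical schema)."""
    module_id: int
    module_type: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Pipeline:
    modules: Dict[int, PipelineModule]
    connections: List[Tuple[int, int]]  # (source module id, target module id)


def failed_of_type(pipeline: Pipeline, results: Dict[int, str],
                   module_type: str, **params: str) -> List[int]:
    """Ids of failed modules of the given type that have the given parameter values.

    `results` maps module id -> result string taken from the execution log.
    """
    hits = []
    for module_id, result in results.items():
        module = pipeline.modules[module_id]
        if result != "failed" or module.module_type != module_type:
            continue
        if all(module.parameters.get(k) == v for k, v in params.items()):
            hits.append(module_id)
    return sorted(hits)


def connected_to_failed(pipeline: Pipeline, results: Dict[int, str],
                        module_type: str) -> List[int]:
    """Ids of modules of the given type that are connected to a failed module."""
    failed = {mid for mid, result in results.items() if result == "failed"}
    hits = set()
    for source, target in pipeline.connections:
        # Check the connection in both directions.
        for this, other in ((source, target), (target, source)):
            if other in failed and pipeline.modules[this].module_type == module_type:
                hits.add(this)
    return sorted(hits)


# Made-up pipeline and log results, for illustration only.
PIPELINE = Pipeline(
    modules={
        1: PipelineModule(1, "FileReader", {"path": "data.csv"}),
        2: PipelineModule(2, "Filter", {"threshold": "0.5"}),
        3: PipelineModule(3, "Plot"),
    },
    connections=[(1, 2), (2, 3)],
)
RESULTS = {1: "success", 2: "failed", 3: "not executed"}

print(failed_of_type(PIPELINE, RESULTS, "Filter", threshold="0.5"))
print(connected_to_failed(PIPELINE, RESULTS, "FileReader"))
```

The pipeline supplies the types, parameters, and connections, while the log supplies only the results. Neither source can answer these queries on its own.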
Summary
It would be good to ask users which types of queries are important, as not all types of queries may be required in day-to-day work.
It may be the case that most queries can be answered using the index alone, while more advanced queries can be answered using an execution viewer, which should show all the module executions for a specific pipeline execution as well as the relevant pipeline.