What Is VisTrails?
VisTrails is a system that provides data and process management
support for exploratory computational tasks. It combines features of
both workflow and visualization systems. Like workflow systems, it can
combine loosely coupled resources, specialized libraries, and grid and
Web services. Like some visualization systems, it provides a mechanism
for exploring parameters and comparing different results. Unlike these
other systems, however,
VisTrails was designed to manage exploratory processes in which
computational tasks evolve over time as a user iteratively
formulates and tests hypotheses. A key distinguishing
feature of VisTrails is its comprehensive provenance infrastructure that
maintains detailed history information about the steps followed in the
course of an exploratory task. VisTrails leverages this information to
provide novel operations and user interfaces that streamline this
process.
Important Features
One of our main uses for VisTrails has been exploratory visualization,
but the system is much more general and provides many other features,
such as:
- Flexible Provenance Architecture. VisTrails transparently
tracks changes made to workflows, including all the steps followed in the
exploration. The system can optionally track run-time information
about the execution of workflows (e.g., who executed a module, on
which machine, elapsed time, etc.). VisTrails also provides a
flexible annotation framework whereby you can specify
application-specific provenance information.
- Querying and Re-using History. The provenance
information is stored in a structured way. You have a choice of
using a relational database (such as MySQL or IBM DB2) or XML files in
the file system. The system provides flexible and intuitive query
interfaces through which you can explore and reuse provenance
information. You can formulate simple keyword-based and selection
queries (e.g., find a visualization created by a given user) as well
as structured queries (e.g., find visualizations that apply
simplification before an isosurface computation for irregular grid
data sets).
- Support for Collaborative Exploration. The system can be
configured with a database backend that serves as a shared
repository. It also provides a synchronization facility that allows
multiple users to collaborate asynchronously and in a disconnected
fashion: you can check changes in and out, much as you would with a
version control system (e.g., SVN: http://subversion.tigris.org).
- Extensibility. VisTrails provides a simple plugin mechanism
that can be used to dynamically add packages and libraries. No
changes to the user interface and no recompilation of the system
are necessary. Because VisTrails is written in
Python, the integration of Python-wrapped libraries is
straightforward. For example, a single line in the VisTrails
start-up file is needed to import all of VTK’s classes.
- Scalable Derivation of Data Products and Parameter Exploration.
VisTrails supports a series of operations
for the simultaneous generation of multiple data products, including
an interface that allows you to specify sets of values for
different parameters in a workflow. The results of a parameter
exploration can be displayed side by side in the VisTrails
Spreadsheet for easy comparison.
- Task Creation by Analogy. Analogies are supported as
first-class operations to guide semi-automated changes to multiple
workflows, without requiring you to directly manipulate or edit
the workflow specifications.
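The provenance tracking and history queries described above can be pictured with a small model. The sketch below is hypothetical and does not reproduce VisTrails's storage format or API. It assumes that each edit to a workflow is recorded as an action in a tree, and that any version can be rebuilt by replaying the actions on its path from the root. It also assumes that simple selection and keyword queries run over the metadata attached to each action. The names `Version`, `VersionTree`, `apply`, `materialize`, and `find`, as well as the action tuples, are illustrative. The VTK class names are used only as labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One node of a hypothetical version tree: a single recorded change."""
    id: int
    parent: Optional[int]   # None only for the root (the empty workflow)
    action: tuple           # e.g. ("add_module", name) or ("delete_module", name)
    user: str
    notes: str = ""         # free-form annotation attached by the user


class VersionTree:
    """Hypothetical change-based history: changes are appended, never overwritten."""

    def __init__(self):
        self.versions = {0: Version(0, None, ("root",), user="system")}
        self._next_id = 1

    def apply(self, parent, action, user, notes=""):
        """Record `action` as a new child of version `parent` and return its id."""
        version = Version(self._next_id, parent, action, user, notes)
        self.versions[version.id] = version
        self._next_id += 1
        return version.id

    def materialize(self, vid):
        """Rebuild the module list of version `vid` by replaying actions from the root."""
        path = []
        while vid is not None:
            path.append(self.versions[vid])
            vid = self.versions[vid].parent
        modules = []
        for version in reversed(path):
            kind = version.action[0]
            if kind == "add_module":
                modules.append(version.action[1])
            elif kind == "delete_module":
                modules.remove(version.action[1])
        return modules

    def find(self, user=None, keyword=None):
        """Selection query (by user) and keyword query (over notes) on the history."""
        hits = []
        for version in self.versions.values():
            if user is not None and version.user != user:
                continue
            if keyword is not None and keyword.lower() not in version.notes.lower():
                continue
            hits.append(version.id)
        return hits


# Two users branch from the same starting point; neither branch is lost.
tree = VersionTree()
a = tree.apply(0, ("add_module", "vtkDataSetReader"), user="alice")
b = tree.apply(a, ("add_module", "vtkContourFilter"), user="alice", notes="isosurface")
c = tree.apply(a, ("add_module", "vtkDecimatePro"), user="bob", notes="simplify first")
d = tree.apply(c, ("add_module", "vtkContourFilter"), user="bob",
               notes="isosurface after simplification")

print(tree.materialize(d))  # ['vtkDataSetReader', 'vtkDecimatePro', 'vtkContourFilter']
print(tree.find(user="bob"))  # [3, 4]
```

In this model, branching from version `a` adds a sibling rather than replacing `b`. Both lines of exploration therefore stay available for later comparison or reuse.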
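Parameter exploration, as described in the feature list, amounts to running a workflow once for each combination of the value sets you specify. The following sketch assumes that the combinations are formed as a Cartesian product. It also assumes that two parameters are laid out as the rows and columns of a grid, in the spirit of the Spreadsheet's side-by-side display. The functions `explore`, `run_workflow`, and `as_grid` and the `isovalue`/`opacity` parameters are hypothetical. `run_workflow` only formats its inputs and does not execute anything.

```python
from itertools import product


def explore(parameter_sets):
    """Expand {name: [values, ...]} into one binding per combination (Cartesian product)."""
    names = list(parameter_sets)
    combos = product(*(parameter_sets[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def run_workflow(params):
    """Hypothetical stand-in for executing one workflow instance with `params`."""
    return ", ".join(f"{key}={value}" for key, value in sorted(params.items()))


def as_grid(parameter_sets, row_param, col_param):
    """Arrange results side by side: rows vary `row_param`, columns vary `col_param`."""
    return [
        [run_workflow({row_param: r, col_param: c}) for c in parameter_sets[col_param]]
        for r in parameter_sets[row_param]
    ]


sets = {"isovalue": [50, 100, 150], "opacity": [0.5, 1.0]}
bindings = explore(sets)
print(len(bindings))   # 6
print(bindings[0])     # {'isovalue': 50, 'opacity': 0.5}
for row in as_grid(sets, "isovalue", "opacity"):
    print(row)
```

Three isovalues and two opacities yield six workflow runs. Each cell in the grid differs from its neighbors in exactly one parameter, which makes the results easy to compare.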
Obtaining the Software
Visit http://www.vistrails.org to access the VisTrails community
website. Here you will find information including instructions for
obtaining the software, online documentation, video tutorials, and
pointers to papers and presentations.
VisTrails is available as open
source; it is released under the GPL 2.0 license. The pre-compiled
versions for Windows and Mac OS X come with an installer and
include a number of packages, such as VTK, matplotlib, and
ImageMagick. Additional packages, including packages written by users, are
also available (e.g., ITK, Matlab, Metro). Developers can easily add new
packages using the VisTrails plugin infrastructure.