FAQ

Also check our Known Issues page for troubleshooting.

Running workflows

How can I run a workflow using the command line?

(Updated for version 1.2) Call vistrails using the following options:

python vistrails.py -b path_to_vistrails_file:pipeline

where pipeline can be a version tag name or a version id.

NOTE: If you downloaded the MacOS X bundle, you can run vistrails from the command line via the following commands in the Terminal. Change the current directory to wherever VisTrails was installed (often /Applications), and then type:

Vistrails.app/Contents/MacOS/vistrails [<cmd_line_options>]

Using the command line, we'd like to execute a workflow multiple times, with slightly different parameters, and create a series of output files. Is this possible?

(Updated for version 1.2) We can change parameters that have an alias through the command line.

For example, the offscreen pipeline in offscreen.vt always creates a file called image.png. If you want to generate it with a different filename:

python vistrails.py -b ../examples/offscreen.vt:offscreen -a"filename=other.png"

filename in the example above is the alias name assigned to the parameter in the value method inside the String module. When running a pipeline from the command line, VisTrails will try to start the spreadsheet automatically if the pipeline requires it. For example, this other execution will also start the spreadsheet:

python vistrails.py -b ../examples/head.vt:aliases \
    -a"isovalue=30&&Diffuse_Color_R=0.8&&Diffuse_Color_G=0.4&&Diffuse_Color_B=0.2"

You can also execute more than one pipeline on the command line:

python vistrails.py -b ../examples/head.vt:aliases ../examples/spx.vt:spx \
    -a"isovalue=30"

Use the -a parameter only once, regardless of the number of pipelines.
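To run the same pipeline many times with different alias values, a small driver script can call vistrails.py once per value. The sketch below assumes the -b and -a options behave as described above (aliases joined with "&&" and attached directly to -a, as in the shell examples). The launcher path and the helper names build_command and run_sweep are placeholders for illustration, not part of VisTrails; by default the sketch only builds the commands instead of executing them.

```python
import subprocess
import sys

# Hypothetical location of the VisTrails launcher; adjust for your install.
VISTRAILS_SCRIPT = "vistrails.py"


def build_command(vt_file, pipeline, aliases):
    """Build the argv list for one batch-mode run.

    aliases maps alias names to values; they are joined with '&&' and
    attached to -a, mirroring -a"isovalue=30&&Diffuse_Color_R=0.8".
    """
    cmd = [sys.executable, VISTRAILS_SCRIPT, "-b", f"{vt_file}:{pipeline}"]
    if aliases:
        alias_str = "&&".join(f"{name}={value}" for name, value in aliases.items())
        cmd.append("-a" + alias_str)
    return cmd


def run_sweep(vt_file, pipeline, alias_sets, dry_run=True):
    """Build one command per alias set; execute them unless dry_run is True."""
    commands = [build_command(vt_file, pipeline, aliases) for aliases in alias_sets]
    if not dry_run:
        for cmd in commands:
            # check=True stops the sweep as soon as one execution fails
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # One run per output file: out_0.png, out_1.png, out_2.png
    sweep = [{"filename": f"out_{i}.png"} for i in range(3)]
    for cmd in run_sweep("../examples/offscreen.vt", "offscreen", sweep):
        print(" ".join(cmd))
```

Passing a list to subprocess.run avoids shell quoting issues with the "&&" separator, which a shell would otherwise treat as a command chain if left unquoted.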

I can load a vistrail, and the version tree shows up fine. However, no pipelines appear when I click on a version. What gives?

The most likely reason is that the vistrail uses a package that is not registered with VisTrails. You need to identify the needed package and add it to your .vistrails/startup.py. A single line like the following should be enough:

addPackage('enter_package_name_here')

Some packages might need more information. For example:

addPackage('afront', executable_path='/path/to/afront')

Refer to the package documentation for details. The one inconvenient step is that there is currently no automated way to identify which package is missing. We're working on this feature for future releases.

I have a workflow that reads a file and then does some processing. The first time it runs, it executes correctly, but in subsequent runs nothing happens.

VisTrails caches by default, so after a workflow is executed, if none of its parameters change, it won't be executed again.

If a workflow reads a file using the basic module File, VisTrails does check whether the file was modified since the last run. It does so by keeping a signature based on the file's modification time. If the file was modified, the File module and all downstream modules (the ones that depend on File) will be executed.
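The idea behind this check can be illustrated with a small standalone sketch: a cached step re-runs only when a signature built from the file path and its modification time changes. This is not VisTrails's actual caching code; the class name FileSignatureCache is hypothetical.

```python
import os


class FileSignatureCache:
    """Cache results keyed by a (path, modification time) signature.

    Illustration only: mimics the idea that File re-executes when the
    file's modification time changes.
    """

    def __init__(self):
        self._cache = {}
        self.executions = 0  # how many times compute() actually ran

    @staticmethod
    def signature(path):
        # Absolute path plus the last-modification timestamp
        return (os.path.abspath(path), os.path.getmtime(path))

    def run(self, path, compute):
        sig = self.signature(path)
        if sig not in self._cache:
            # Signature is new (first run or file modified): execute
            self.executions += 1
            self._cache[sig] = compute(path)
        return self._cache[sig]
```

Calling run() twice on an unchanged file executes compute() once; touching or rewriting the file changes the signature and triggers a new execution, which is the behavior described above for File and its downstream modules.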

Note: If you would like your input and output data to be versioned, you can use the Persistence package.

If you do not want VisTrails to cache executions, you can turn off caching: go to Menu Edit -> Preferences and in the General Configuration tab, change Cache execution results to Never.

Can VisTrails execute workflows in parallel?

The VisTrails server can only execute pipelines in parallel if there's more than one instance of VisTrails running. The command

self.rpcserver = ThreadedXMLRPCServer((self.temp_xml_rpc_options.server, self.temp_xml_rpc_options.port))

starts a multithreaded version of the XML-RPC server, so it creates a thread for each request the server receives. The problem is that Qt/PyQt only allows GUI objects to be created in the main thread, not in these request threads. To overcome this limitation, the multithreaded version can start other single-threaded instances of VisTrails and put them in a queue. Workflow executions and other GUI-related requests, such as generating workflow graphs and history trees, are forwarded to this queue, and the instances take turns answering them. If the results are already in the cache, the multithreaded version answers the requests directly.
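The dispatch scheme described above (a multithreaded front end that answers cached requests directly and otherwise borrows a single-threaded instance from a queue) can be sketched with the standard library. The "instances" here are plain objects standing in for separate VisTrails processes; Dispatcher and FakeInstance are hypothetical names, not classes from vistrails_server.py.

```python
import queue
import threading


class FakeInstance:
    """Stand-in for a single-threaded VisTrails instance."""

    def __init__(self, name):
        self.name = name

    def execute(self, request):
        # A real instance would run the workflow or render a tree here.
        return f"{request} done by {self.name}"


class Dispatcher:
    """Multithreaded front end sharing a pool of single-threaded instances."""

    def __init__(self, n_instances):
        self._pool = queue.Queue()
        for i in range(n_instances):
            self._pool.put(FakeInstance(f"instance-{i}"))
        self._cache = {}
        self._lock = threading.Lock()

    def handle(self, request):
        # Answer directly from the cache when possible.
        with self._lock:
            if request in self._cache:
                return self._cache[request]
        instance = self._pool.get()  # blocks until an instance is free
        try:
            result = instance.execute(request)
        finally:
            self._pool.put(instance)  # return the instance to the pool
        with self._lock:
            self._cache[request] = result
        return result


if __name__ == "__main__":
    dispatcher = Dispatcher(n_instances=2)
    results = {}

    def worker(req):
        results[req] = dispatcher.handle(req)

    threads = [threading.Thread(target=worker, args=(f"wf{i}",)) for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for req in sorted(results):
        print(req, "->", results[req])
```

In this sketch, two simultaneous requests for the same uncached workflow may both execute; a fuller version would also track requests that are already in flight.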

Note that this infrastructure works on Linux only. To make it work on Windows, you have to create a script similar to start_vistrails_xvfb.sh (located in the scripts folder) that passes the number of additional instances to VisTrails via command-line options. The command-line options are:

python vistrails_server.py -T <ADDRESS> -R <PORT> -O<NUMBER_OF_OTHER_VISTRAILS_INSTANCES> [-M]&

If you want the main VisTrails instance to be multithreaded, add -M at the end.

After creating this script, update the function start_other_instances in vistrails/gui/application_server.py (lines 1007-1023) and set the script variable to point to your script. You may also have to change the arguments sent to your script (line 1016; for example, you don't need to set a virtual display). You will also need to change the path to the stop_vistrails_server.py script (line 1026) according to your installation path.

When a workflow is executed, what do the colors mean?

- lilac: module was not executed

- yellow: module is currently being executed

- green: module was successfully executed

- orange: module was cached

- red: the execution of the module failed

Building workflows

Is there a way to give each widget a "display name" in addition to the module name at the center of the widget?

Yes, but it is not easily accessible from the GUI, and it definitely needs to be more intuitive. For now, we use the value of the annotation with key "__desc__" as the module label. For example, to set a label on a PythonSource module, select the module, click on the Annotation tab, and add a key named "__desc__"; whatever value you give this key becomes the label. We are currently working on a new interface for this functionality.

Is there a way to re-center the picture-in-picture (PiP) view?

Yes. If you click on the PiP window to bring it to focus, you can press Ctrl-R (or Command-R on Mac) to re-center the PiP window.

How do I search for a literal "?" (question mark) in the search box in the Property panel?

Since we allow regular expressions in our search box, question marks are treated as meta-characters. Thus, searching for "?" returns everything, and "abc?" matches everything containing "ab" (the "?" makes the preceding "c" optional). To search for a literal "?", use "\?" instead, so a search for "??" becomes "\?\?".
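Python's re module follows the same rules for "?" and "\?", so it is a convenient place to try a pattern before typing it into the search box (assuming the box uses standard regular-expression syntax). re.escape adds the backslashes for you. The search helper below is only an illustration, not VisTrails's search code.

```python
import re

names = ["abc", "ab", "what?", "a?b"]


def search(pattern, items):
    """Return the items that contain a match for the regular expression."""
    return [s for s in items if re.search(pattern, s)]


print(search("abc?", names))          # ['abc', 'ab']: "c" is optional
print(search(r"\?", names))           # ['what?', 'a?b']: literal "?"
print(search(re.escape("?"), names))  # same as r"\?"
```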

Using VisTrails as a server

What is the VisTrails server-mode?

Using the VisTrails server mode, it is possible to execute workflows and control VisTrails through another application. For example, the CrowdLabs Web portal (http://www.crowdlabs.org) accesses a VisTrails server to execute workflows and to retrieve and display vistrail trees and workflows.

How do I execute workflows and control VisTrails through another application?

You access the server through XML-RPC calls. The current VisTrails release includes a set of PHP scripts that can talk to a VisTrails server instance; they are in the "extensions/http" folder and are reasonably well documented. It should also not be difficult to write Python scripts that access the server (just use the xmlrpclib module).
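A minimal Python 3 client sketch follows (xmlrpclib was renamed xmlrpc.client in Python 3). The real method names exposed by the VisTrails server are not listed in this FAQ, so the example starts a local stand-in server with a hypothetical method, execute_workflow, purely to show the call pattern. Check the PHP scripts in extensions/http for the actual method names and arguments.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def start_stand_in_server():
    """Start a local XML-RPC server exposing one hypothetical method."""
    # Port 0 lets the OS pick a free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)

    def execute_workflow(vt_id, version):
        # Hypothetical: a real server would run the workflow here.
        return f"executed vistrail {vt_id} at version {version}"

    server.register_function(execute_workflow)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_stand_in_server()
    port = server.server_address[1]

    # Client side: point this URL at your VisTrails server instead.
    proxy = ServerProxy(f"http://127.0.0.1:{port}/")
    print(proxy.execute_workflow(1, "aliases"))

    server.shutdown()
    server.server_close()
```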

Note that the VisTrails server requires the provenance and workflows to be stored in a database. More detailed instructions on how to set up the server and the database are available here:

http://www.crowdlabs.org/site_media/static/dev_docs/vistrails_server_setup.html

http://www.crowdlabs.org/site_media/static/dev_docs/vistrails_database_setup.html

If what you want is just to execute a series of workflows in batch mode, a simpler solution would be to use the VisTrails client in batch mode. Chapter 12 of the user's guide contains detailed information and examples on that.

Control Flow

Note: using map

When using 'map', the module (or subworkflow) used as function port in the map module MUST be a function, i.e., it can only define 1 output port.

Spreadsheet


How can I save an image from the spreadsheet?

With the focus on a spreadsheet cell, click the camera icon on the toolbar to take a snapshot. The system will prompt you for the location and file name where it should be saved. The other icons can be used to save multiple images, from which an animation can be generated on demand. A whole sheet can also be saved by selecting Export (either from the menu or from the toolbar).

Is it possible to save the complete state of the spreadsheet?

Can I view multiple sheets at the same time?

Yes. Each sheet on the spreadsheet can be displayed as a dock widget separated from the main spreadsheet window by dragging its tab name out of the tab bar at the bottom of the spreadsheet.

Then, how can I put back a separated sheet?

A sheet can be docked back to the main window by dragging it back to the tab bar or by double-clicking its title bar.

How can I order sheets on the spreadsheet?

Drag the sheet's name in the tab bar at the bottom of the spreadsheet and drop it in the desired position.

Can I control where a cell will be placed on the spreadsheet window?

By default, an unoccupied cell on the active sheet will be chosen to display the result. However, you can specify exactly in the pipeline where a spreadsheet cell will be placed by using CellLocation and SheetReference. CellLocation specifies the location (row and column) of a cell when connecting to a spreadsheet cell (VTKCell, ImageViewerCell, ...). Similarly, a SheetReference module (when connecting to a CellLocation) will specify which sheet the cell will be put on given its name, minimum row size and minimum column size. There is an example of this in examples/vtk.xml (select the version below Double Renderer).

How do I output results to the spreadsheet?

By inspecting the VisTrails Spreadsheet package (in the list of packages, to the left of the pipeline builder), you can see there are built-in cells for different kinds of data, e.g., RichTextCell to display HTML and plain text. You (the user) can also define new cell types to display application-specific data. For example, we have developed VtkCell, MplFigureCell, and OpenGLCell. It is possible to display pretty much anything on the Spreadsheet!

Examples of writing cell modules can be found in:

- RichTextCell: packages/spreadsheet/widgets/richtext/richtext.py

- VTK: packages/vtk/vtkcell.py

Here is a summary of some requirements for a cell widget:

(1) It must be a Qt widget. It should inherit from spreadsheet_cell.QCellWidget in the spreadsheet package. Although any Qt Widget would work, certain features such as animation will not be available (without rewriting it).

(2) It must re-implement the updateContents() function to take a set of inputs (usually coming from input ports of a wrapper Module) and display them in the cell. VisTrails uses this function to update/reuse cells on the spreadsheet when new data comes in.

(3) It needs a wrapper VisTrails Module (inherited from basic_widgets.SpreadsheetCell of the spreadsheet package). Inside the compute() method of this module, it may call self.display(CellWidgetType, (inputs)) to trigger the display event on the spreadsheet.

How do I control the default number of cells in the spreadsheet?

You can configure the rowCount and colCount using the preferences dialog. Just go to the Module Packages tab, select spreadsheet in the "Enabled packages" and press the Configure button. Then a list of all the configuration options for the spreadsheet will show up.

Is it possible to launch a web browser from the vistrails spreadsheet? We would like to output several urls from a parameter sweep and then have the option to click on each one to view the resulting page. I can view the page within the spreadsheet, but it is really too crowded.

Currently, there isn't a widget that provides exactly this functionality, but I can think of a few solutions that may work for you:

(1) You can use parameter exploration to generate multiple sheets so you might have an exploration that opens each page in a new sheet. Use the third column/dimension in the exploration interface to have a parameter span sheets.

(2) The spreadsheet is extensible so you can write a custom spreadsheet cell widget that has a button or label with the desired link (a QLabel with openExternalLinks set to True, for example).

(3) You can tweak the existing RichTextCell by adding the line "self.browser.setOpenExternalLinks(True)" at line 63 of the source file "vistrails/packages/spreadsheet/widgets/richtext/richtext.py". Then, if your workflow creates a file with HTML markup like "<a href="http://www.vistrails.org">VisTrails</a>" connected to a RichTextCell, clicking on the rendered link in the cell will open it in a web browser. You need to add the aforementioned line to the source to let Qt know that you want the link opened externally; by default, it just issues an event that isn't processed.
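For option (3), the workflow only has to produce an HTML file with one link per URL. A Python 3 sketch of that step follows; write_link_page and the output file name are illustrative assumptions, and how the file is fed into a RichTextCell depends on your workflow.

```python
import html


def write_link_page(urls, path):
    """Write an HTML file with one clickable link per URL."""
    lines = ["<html><body>"]
    for url in urls:
        # Escape &, <, > and quotes so URLs with query strings stay valid
        safe = html.escape(url, quote=True)
        lines.append(f'<p><a href="{safe}">{safe}</a></p>')
    lines.append("</body></html>")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
    return path


if __name__ == "__main__":
    write_link_page(["http://www.vistrails.org"], "links.html")
```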

Integrating your software into VisTrails

How can I integrate my own program into VisTrails?

The easiest way is to create a package. Writing a package is often very simple; instructions on how to do it are here: UsersGuideVisTrailsPackages

You can also dynamically generate modules. For an example see:

http://www.vistrails.org/index.php/UsersGuideVisTrailsPackages#Packages_that_generate_modules_dynamically

In particular, see the new_module call which uses python's type() function to generate new classes dynamically.
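The underlying Python mechanism is the three-argument form type(name, bases, namespace). The standalone sketch below uses a stand-in Module base class and a hypothetical make_module helper rather than the real VisTrails classes, so it only illustrates the pattern that new_module relies on.

```python
class Module:
    """Stand-in for the VisTrails Module base class (illustration only)."""

    def compute(self):
        raise NotImplementedError


def make_module(name, operation):
    """Create a Module subclass whose compute() applies `operation`."""

    def compute(self):
        self.result = operation(*self.inputs)
        return self.result

    # type(class_name, base_classes, attribute_dict) builds a new class
    return type(name, (Module,), {"compute": compute, "inputs": ()})


# Generate one class per operation, e.g. from a tool's list of commands
operations = {"Add": lambda a, b: a + b, "Multiply": lambda a, b: a * b}
generated = {name: make_module(name, op) for name, op in operations.items()}

m = generated["Multiply"]()
m.inputs = (3, 4)
print(type(m).__name__, m.compute())  # Multiply 12
```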

How do modules deal with multiple inputs on the same port?

(And should that even be allowed?)

For compatibility reasons, we do need to allow multiple connections to an input port. However, most package developers should never have to use this, so we do our best to hide it. The default behavior when getting inputs from a port, then, is to always return a single input.

If your module needs multiple inputs connected to a single port, use the 'forceGetInputListFromPort' method. It will return a list of all the data items coming through the port. The VTK package uses this feature, so look there for usage examples (packages/vtk/base_widget.py).

Are there mechanisms for attaching widgets to different modules/parameters?

Right now, we have a mechanism for putting a specific widget for an input port. For example, if a port is SetColor(red, green, blue), we can put a color wheel widget there. Or we can also replace the SetFileName port with a File Widget. However, this is not per parameter (only per port). We are currently working on this problem.

Can I organize my package so it appears hierarchical in the module palette?

Yes. Use the namespace keyword argument when adding the module to the registry. For example,

registry.add_module(MyModule, namespace='MyNamespace')

Can I nest namespaces?

Yes. Use the '|' character to separate the levels of the hierarchy. For example,

registry.add_module(MyModule, namespace='ParentNamespace|ChildNamespace')

Are there shortcuts for registry initialization?

Yes. If you define _modules as a list of classes in the __init__.py file of your package, VisTrails will attempt to load all classes specified as modules. You can provide add_module options as keyword arguments by specifying a tuple (class, kwargs) in the list. For example:

_modules = [MyModule1, (MyModule2, {'namespace': 'MyNamespace'})]

In addition, you need to identify the ports of your modules as a field in your class by defining _input_ports and _output_ports lists. Here, the items in each list must be tuples of the form (portName, portSignature, optional=False, sort_key=-1). For example:

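from core.modules.vistrails_module import Module
from core.modules.basic_modules import String, Integer

class MyModule(Module):
    def compute(self):
        pass

    _input_ports = [('firstInput', String), ('secondInput', Integer, True)]
    _output_ports = [('firstOutput', String), ('secondOutput', String)]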

Can I define ports to be of types that I do not import into my package?

Yes. You can pass an identifier string as the portSignature instead. The port_signature string is defined by:

<module_string> := <package_identifier>:[<namespace>|]<module_name>,
<port_signature> := (<module_string>*)

For example,

registry.add_input_port(MyModule, 'myInputPort', '(edu.utah.sci.vistrails.basic:String)')

or

 _input_ports = [('myInputPort', '(edu.utah.sci.vistrails.basic:String)')]
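To see how the grammar above breaks down, here is a small standalone parser. It is a sketch, not VisTrails's own parsing code, and parse_port_signature is a hypothetical name. It assumes package identifiers contain no ':' and that everything before the last '|' in a module string is the namespace, which matches nested namespaces such as 'ParentNamespace|ChildNamespace'.

```python
def parse_port_signature(signature):
    """Split '(pkg:[ns|]Module,...)' into (package, namespace, module) tuples."""
    sig = signature.strip()
    if not (sig.startswith("(") and sig.endswith(")")):
        raise ValueError(f"signature must be parenthesized: {signature!r}")
    body = sig[1:-1].strip()
    if not body:
        return []
    parsed = []
    for module_string in body.split(","):
        package, sep, rest = module_string.strip().partition(":")
        if not sep or not package or not rest:
            raise ValueError(f"bad module string: {module_string!r}")
        # Everything before the last '|' is the (possibly nested) namespace
        namespace, bar, module_name = rest.rpartition("|")
        parsed.append((package, namespace if bar else None, module_name))
    return parsed


print(parse_port_signature("(edu.utah.sci.vistrails.basic:String)"))
# [('edu.utah.sci.vistrails.basic', None, 'String')]
```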

What do I need to change in my package to make it reloadable (new in v1.4.2)?

See UsersGuideVisTrailsPackages for an explanation.

Can I add default values or labels for parameters?

Yes. Versions 1.4 and greater support these features. See UsersGuideVisTrailsPackages for more details.

I want to write a module to load HDF data whose output (e.g., data, string) varies according to the input I give it. Is it possible to do this in VisTrails, and if so, how can I do that? Ideally, I would like to avoid having to change the connection of my output every time I change the input.

There are a few ways to tackle this, each with its own benefits and pitfalls. First, module connections respect class hierarchies, just as in object-oriented languages. For instance, a module can output a Constant, of which String, Float, Integer, etc. are subclasses. In this way, you can have a subclass of something like HDFData be passed out of the module, and the connections will be established regardless of the sub-type. This is a bit dangerous, though: modules downstream of such a class may not know how to operate on certain types derived from the superclass. Extreme care must be taken, both when creating the modules and when connecting them, to prevent this from happening.

A second method, which I employ in several different packages, is the idea of a container class. For instance, the NumSciPy package uses a relatively generic container, "Numpy Array", to encapsulate the data. These encapsulating objects can also store dictionaries that other modules can easily access to understand how to operate on the data. Although this method is slightly more work, stricter typing of ports pays off, particularly when interfacing with other packages that may depend on strongly typed constants (for example).
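A minimal sketch of the container approach in plain Python: one generic container type travels on every connection, and a metadata dictionary tells downstream code what it holds. HDFContainer, load, mean, and the sample values are hypothetical illustrations, not part of NumSciPy or any VisTrails package.

```python
class HDFContainer:
    """Generic container passed between modules regardless of content."""

    def __init__(self, payload, kind, **metadata):
        self.payload = payload  # the actual data (list, array, string, ...)
        self.kind = kind        # e.g. "dataset" or "attribute"
        self.metadata = dict(metadata)

    def require(self, kind):
        """Fail early if a downstream step receives the wrong content."""
        if self.kind != kind:
            raise TypeError(f"expected {kind!r} data, got {self.kind!r}")
        return self.payload


def load(selector):
    # The output type stays HDFContainer whatever the selector picks,
    # so downstream connections never need to change.
    if selector == "title":
        return HDFContainer("Run 42", "attribute", path="/title")
    return HDFContainer([1.0, 2.0, 3.0], "dataset", path="/data", units="K")


def mean(container):
    values = container.require("dataset")
    return sum(values) / len(values)


print(mean(load("data")))  # 2.0
```

The explicit require() check is the trade-off mentioned above: a little more work per module, in exchange for errors that surface at the consuming module rather than deep inside it.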

The Console

Where should I go to find out what I can call from the console and how to import it?

We have tried to make some methods more accessible in the console via an api. You can import the api via import api in the console and see the available methods with dir(api). To open a vistrail:

import api
api.open_vistrail_from_file('/Applications/VisTrails/examples/terminator.vt')

To execute a version of a workflow, you currently have to go through the controller:

api.select_version('Histogram')
api.get_current_controller().execute_current_workflow()

Currently, only a subset of VisTrails functionality is directly available from the api. However, since VisTrails is written in python, you can dig down starting with the VistrailsApplication or controller object to expose most of our internal methods. If you have suggestions for calls to be added to the api, please let us know.

Another feature that we're working on, still in progress, is the ability to construct workflows via the console. For example:

vtk = load_package('edu.utah.sci.vistrails.vtk')
vtk.vtkDataSetReader() # adds a vtkDataSetReader module to the pipeline
# click on the new module
a = selected_modules()[0] # get the one currently selected module
a.SetFile('/vistrails/examples/data/head120.vtk') # sets the SetFile parameter for the data set reader
b = vtk.vtkContourFilter() # adds a vtkContourFilter module to the pipeline and saves to var b
b.SetInputConnection0(a.GetOutputPort0()) # connects a's GetOutputPort0 port to b's SetInputConnection0

Persistence Package

How do I use the output of one workflow as the input for another using the persistence package?

You need to configure the persistence modules using the module's configuration dialog. After adding a PersistentOutputFile to the workflow, click on the triangle in the upper-right corner of the PersistentOutputFile, and select "Edit Configuration" from the menu that appears. In this dialog, select "Create New Reference" and give the reference a name (and any space-delimited tags). Upon running that workflow, the data will be written to the persistent store. In the second workflow where you wish to use that file, add a PersistentInputFile and go to its configuration dialog in the same manner as with the output file. In that dialog, select "Use Existing Reference" and select the data that you just added in the first workflow in the list of files below. Now, when you run that workflow, it will grab the data from the persistent store.

Here is an example: offscreen_persistent.vt. Run the "persistent offscreen" workflow first, and then run the "display persistent output" to use the output of the first workflow as the input for the second.

VTK

Given a VTK visualization, how can I generate a webpage from it?

Check out the html pipeline in offscreen.xml.

I'm trying to use VTK, but there doesn't seem to be any output. What is wrong?

To use VTK in VisTrails, you need a slightly different way of connecting the renderer modules. Instead of using the standard RenderWindow/RenderWindowInteractor infrastructure, you simply connect the renderer to a VTKCell. The examples directory in the distribution has several VTK examples that illustrate this.

I am trying to add a module to the workflow via Python, but how can I access vtk modules?

Here's an example:

import api

vtvtk = 'edu.utah.sci.vistrails.vtk'

module = api.add_module(0, 0, vtvtk, 'vtkContourFilter', )

The third argument in add_module is the package identifier. You can find this in the "Module Packages" panel of the Preferences; just click on the package you're interested in and it will appear in the information on the right.

matplotlib

I'm experiencing a problem with LaTeX labels and the matplotlib that comes with VisTrails 1.5. Entering the script below into the interpreter that comes with VisTrails is enough to reproduce it.

  import matplotlib.pyplot as plt
  plt.plot([1,2,3],[1,2,3])
  plt.xlabel("$foo$")

Remove your ~/.matplotlib folder and restart VisTrails.

VisTrails Development

I would like to build VisTrails from source. Are there instructions on how to do this?

Yes! Take a look at http://www.vistrails.org/index.php/Mac_Intel_Instructions

Accessing Provenance Information

How do I access the information in the execution log?

The code responsible for storing execution information is located in the "core/log" directories, and the code that generates much of that information is in "core/interpreter/cached.py". Modules can add execution-specific annotations to provenance via annotate() calls during execution, but much of the data (like timing and errors) is captured by the LogController and CachedInterpreter (the execution engine) objects. To analyze the log from a vistrail (.vt) file, you might have something like the following:

from __future__ import print_function

import core.log.log
import db.services.io


def run(fname):
    # open the .vt bundle specified by the filename "fname"
    bundle = db.services.io.open_vistrail_bundle_from_zip_xml(fname)[0]
    # get the log filename
    log_fname = bundle.vistrail.db_log_filename
    if log_fname is not None:
        # open the log
        log = db.services.io.open_log_from_xml(log_fname, True)
        # convert the log from a db object
        core.log.log.Log.convert(log)
        for workflow_exec in log.workflow_execs:
            print('workflow version:', workflow_exec.parent_version)
            print('time started:', workflow_exec.ts_start)
            print('time ended:', workflow_exec.ts_end)
            print('modules executed:', [i.module_id
                                        for i in workflow_exec.item_execs])


if __name__ == '__main__':
    run("some_vistrail.vt")

You should be able to see what information is available by looking at the "core/log" classes.
