Writing VisTrails Packages
Introduction
VisTrails provides a plugin infrastructure to integrate user-defined functions and libraries. Specifically, users can incorporate their own visualization and simulation code into pipelines by defining custom modules (or wrappers). These modules are bundled in what we call packages. A VisTrails package is simply a collection of Python classes stored in one or more files, respecting some conventions that will be described shortly. Each of these classes represents a new module. In this chapter, we will build progressively more complicated modules. Note that even though each section introduces one major feature of the VisTrails package mechanism, smaller features are highlighted and explained as we go along. Because of this, we recommend skimming through the entire chapter at least once.
Who Should Read This Chapter?
This chapter is written for developers who wish to extend VisTrails with customized modules, tailored for their specific needs. It is assumed that you have experience writing code in the Python programming language. Teaching the syntax of Python is beyond the scope of this manual; for experienced programmers who would like a compact introduction to Python, we recommend the book Python in a Nutshell by Alex Martelli (published by O’Reilly).
However, if you do not yet know Python but are familiar with another object-oriented language such as Java or C#, you should be able to get the gist of these examples by looking at the code and reading our commentary.
An Example Module
Here is the definition of a very simple module:
from vistrails.core.modules.vistrails_module import Module, ModuleError
from vistrails.core.modules.config import IPort, OPort

class Divide(Module):
    _input_ports = [IPort(name='arg1',
                          signature='basic:Float',
                          label='dividend'),
                    IPort(name='arg2',
                          signature='basic:Float',
                          label='divisor')]
    _output_ports = [OPort(name='result',
                           signature='basic:Float',
                           label='quotient')]

    def compute(self):
        arg1 = self.get_input("arg1")
        arg2 = self.get_input("arg2")
        if arg2 == 0.0:
            raise ModuleError(self, "Division by zero")
        self.set_output("result", arg1 / arg2)
New VisTrails modules must subclass from Module, the base class that defines basic functionality. The only required override is the compute() method, which performs the actual module computation. Input and output are specified through ports, which must be explicitly registered with VisTrails using the _input_ports and _output_ports lists. Simple ports are specified using InputPort (IPort) and OutputPort (OPort) objects.
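To make this division of labor concrete, here is a minimal, self-contained sketch that runs the Divide logic outside VisTrails. ToyModule and ToyModuleError are hypothetical stand-ins written only for this illustration, not VisTrails classes; they assume nothing beyond what the text states: a module reads inputs with get_input(), publishes results with set_output(), and does its work in compute().

```python
class ToyModuleError(Exception):
    """Hypothetical stand-in for ModuleError: remembers the failing module."""
    def __init__(self, module, message):
        Exception.__init__(self, message)
        self.module = module


class ToyModule(object):
    """Hypothetical stand-in for Module that keeps port values in dicts."""
    def __init__(self, **inputs):
        self.inputs = dict(inputs)   # values arriving on input ports
        self.outputs = {}            # values published on output ports

    def get_input(self, name):
        # Raises KeyError if nothing was supplied on the port.
        return self.inputs[name]

    def set_output(self, name, value):
        self.outputs[name] = value

    def compute(self):
        raise NotImplementedError("subclasses must override compute()")


class ToyDivide(ToyModule):
    # Same body as Divide.compute() above.
    def compute(self):
        arg1 = self.get_input("arg1")
        arg2 = self.get_input("arg2")
        if arg2 == 0.0:
            raise ToyModuleError(self, "Division by zero")
        self.set_output("result", arg1 / arg2)


# Usage: a "downstream" consumer reads the output port after compute().
divide = ToyDivide(arg1=7.0, arg2=2.0)
divide.compute()
quotient = divide.outputs["result"]   # 3.5

# A zero divisor is reported through the error type, not a bare crash.
try:
    ToyDivide(arg1=1.0, arg2=0.0).compute()
    error_message = None
except ToyModuleError as err:
    error_message = str(err)
```

In real VisTrails, the interpreter, not your code, creates the module instance, fills the input ports from upstream connections, and calls compute().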
An Example Package
The previous section only shows the definition of a single module. To create a full package that loads and runs in VisTrails, a few more items are required. In this example, we define a basic calculator package named PythonCalc. Note that this package consists of two files, __init__.py and init.py, that live in a directory named pythonCalc; each file is an important piece of a VisTrails package.
__init__.py
"""This package implements a very simple VisTrails module called
PythonCalc. This is intended as a simple example that can be referred
to by users to create their own packages and modules later.

If you're interested in developing new modules for VisTrails, you
should also consult the documentation in the User's Guide and in
core/modules/vistrails_module.py.
"""

identifier = 'org.vistrails.vistrails.pythoncalc'
name = 'PythonCalc'
version = '0.9.2'
init.py
###############################################################################
# PythonCalc
#
# A VisTrails module is simply a Python class that subclasses from
# Module. For this class to be executable, it must define a method
# compute(self) that will perform the appropriate computations and set
# the results.
#
# Extra helper methods can be defined, as usual. In this case, we're
# using a helper method op(self, v1, v2) that performs the right
# operations.

from vistrails.core.modules.vistrails_module import Module, ModuleError
from vistrails.core.modules.config import IPort, OPort


class PythonCalc(Module):
    """PythonCalc is a module that performs simple arithmetic operations
    on its inputs."""

    # You need to report the ports the module wants to make
    # available. This is done by creating _input_ports and
    # _output_ports lists composed of InputPort (IPort) and OutputPort
    # (OPort) objects. These are simple ports that take only one
    # value. We'll see in later tutorials how to create compound ports
    # which can take a tuple of values. Each port must specify its
    # name and signature. The signature specifies the package
    # (e.g. "basic", which is shorthand for
    # "org.vistrails.vistrails.basic") and module (e.g. "Float").
    # Note that the third input port (op) has two other arguments.
    # The "enum" entry_type specifies that there is a set of options
    # the user should choose from, and values then specifies those
    # options.
    _input_ports = [IPort(name="value1", signature="basic:Float"),
                    IPort(name="value2", signature="basic:Float"),
                    IPort(name="op", signature="basic:String",
                          entry_type="enum", values=["+", "-", "*", "/"])]
    _output_ports = [OPort(name="value", signature="basic:Float")]

    # This constructor is strictly unnecessary. However, some modules
    # might want to initialize per-object data. When implementing your
    # own constructor, remember that it must not take any extra
    # parameters.
    def __init__(self):
        Module.__init__(self)

    # This is the method you should implement in every module that
    # will be executed directly. VisTrails does not use the return
    # value of this method.
    def compute(self):
        # get_input is a method defined in Module that returns
        # the value stored at an input port. If there's no value
        # stored on the port, the method will return None.
        v1 = self.get_input("value1")
        v2 = self.get_input("value2")

        # You should call set_output to store the appropriate results
        # on the ports. In this case, we are only storing a
        # floating-point result, so we can use the number types
        # directly. For more complicated data, you should
        # return an instance of a VisTrails Module. This will be made
        # clear in further examples that use these more complicated data.
        self.set_output("value", self.op(v1, v2))

    def op(self, v1, v2):
        op = self.get_input("op")
        if op == '+':
            return v1 + v2
        elif op == '-':
            return v1 - v2
        elif op == '*':
            return v1 * v2
        elif op == '/':
            return v1 / v2
        # If a module wants to report an error to VisTrails, it should raise
        # ModuleError with a descriptive error. This allows the interpreter
        # to capture the error and report it to the caller of the evaluation
        # function.
        raise ModuleError(self, "unrecognized operation: '%s'" % op)


# VisTrails will only load the modules specified in the _modules list.
# This list contains all of the modules a package defines.
_modules = [PythonCalc]
To create and install this package in VisTrails, first create a new directory named pythonCalc in the .vistrails/userpackages subdirectory of your home directory. Then, save the two code blocks above to the corresponding __init__.py and init.py files in the newly created pythonCalc directory. Now, click on the Edit menu (or the VisTrails menu on Mac OS X), select the Preferences option, and select the Module Packages tab. A dialog similar to what is shown in Figure All available packages... should appear. Select the pythonCalc package, then click on Enable. This should move the package to the Enabled packages list. Close the dialog. The package and module should now be visible in the VisTrails builder.
Now create a workflow similar to what is shown in Figure A simple workflow that uses PythonCalc.... When executed, this workflow will print the following on your terminal:
7.0
Let’s now examine how this works. The __init__.py file provides metadata about the package. The version variable is simply information about the package version; it may or may not be tied to the version of the underlying library. The only recommended guideline is that compatibility should not be broken across minor releases, but this is not enforced in any way. The name variable is a human-readable name for the package.
The most important piece of metadata, however, is the package identifier, stored in the variable called identifier. This is a string that must be globally unique across all packages, not only in your system, but in any possible system. We recommend using an identifier similar to Java’s package identifiers. These look essentially like regular DNS names, but the word order is reversed, which makes sorting on the strings a lot more meaningful. You should generally go for institution.project.packagename for a package related to a certain project from some institution, and institution.creatorname for a personally developed package. If you are wrapping third-party functionality, do not use their institution’s DNS; use your own. The rationale for this is that the third party itself might decide to create its own VisTrails package, and you do not want to introduce conflicts.
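The reversal rule is mechanical enough to sketch in a few lines. make_identifier below is a hypothetical helper (not part of VisTrails): it reverses the dotted components of a DNS name you control and appends project and package names. Sorting the results shows why reversed names group packages by top-level domain and then by institution.

```python
def make_identifier(domain, *parts):
    """Hypothetical helper: reverse a DNS name and append further parts.

    make_identifier('vistrails.org', 'vistrails', 'pythoncalc')
    gives 'org.vistrails.vistrails.pythoncalc'.
    """
    labels = domain.split('.')
    labels.reverse()              # 'vistrails.org' -> ['org', 'vistrails']
    return '.'.join(labels + list(parts))


ids = [make_identifier('example.org', 'calc'),
       make_identifier('example.com', 'viz', 'plots'),
       make_identifier('example.org', 'mesh')]
# Reversed names sort by top-level domain first, then by institution.
sorted_ids = sorted(ids)
```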
The init.py file contains the actual definitions of the modules. Every VisTrails module corresponds to a Python class that ultimately derives from the Module class, which is defined in vistrails.core.modules.vistrails_module. Each module must define input ports and output ports as well as implement a compute() method that takes no extra parameters.
We need to tell VisTrails about the input and output ports we want to expose in a module. Input ports are set in the _input_ports list and output ports in the _output_ports list. Each object in these lists is an instance of a port type from vistrails.core.modules.config. The most basic port types are InputPort (aka IPort) and OutputPort (aka OPort). Each requires two arguments: the name of the port and the signature of the port. A name may be any string, but it must be unique among the module’s inputs (or among its outputs); the same name may be used for both an input and an output. The signature defines the type of the port; VisTrails allows any module to also be a type. A signature is a string composed of the module’s package identifier, followed by a colon and the module’s name. Many basic module types, including String, Float, and Integer, are defined by VisTrails in the Basic Modules package. Thus, the Float module’s signature is org.vistrails.vistrails.basic:Float. Any core package that is distributed with VisTrails has an identifier that begins with org.vistrails.vistrails, so you may omit that prefix for brevity; basic:Float defines the same signature. There are a number of other options for ports, but we will cover these later.
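The shorthand rule can be illustrated with a small hypothetical helper, expand_signature, that turns a short signature into its full form. It assumes that a package part containing no dot is shorthand for a core package; this is only a sketch of the naming convention, and VisTrails' own resolution logic may differ.

```python
CORE_PREFIX = 'org.vistrails.vistrails'


def expand_signature(signature):
    """Hypothetical helper: expand 'package:Module' to its full form.

    Assumes a package part without any '.' is shorthand for the core
    package CORE_PREFIX + '.' + package; full identifiers pass through.
    """
    package, module_name = signature.split(':', 1)
    if '.' not in package:
        package = CORE_PREFIX + '.' + package
    return package + ':' + module_name


full = expand_signature('basic:Float')   # 'org.vistrails.vistrails.basic:Float'
```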
The compute() method defines the actual computation that happens in a module. This computation typically involves getting the necessary input and generating the output. The two get_input() calls at the start of compute() show how to extract input from a port; specifically, we are getting the values passed to the input ports value1 and value2. We then perform some operation with these values and need to report the output on an output port, so that it is available for downstream modules. This is done by the set_output() call at the end of compute(), where the result is set on the port value.
Let us now look more carefully at the remainder of the class definition. Notice that developers are allowed to define extra helper methods, such as the op() method. These helper methods can naturally use the ports API. The other important feature of the op() method is error checking. PythonCalc requires a string that represents the operation to be performed with the two numbers. If the string is invalid, the module signals an error by raising a Python exception, ModuleError, which is provided in vistrails.core.modules.vistrails_module. This exception expects two parameters: the module that generated the exception (typically self) and a string describing the error. In the Pipeline view, this error message is displayed in the tooltip that appears when the user moves the cursor over the PythonCalc module icon.
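The if/elif chain in op() is perfectly fine for four operations. A common alternative, shown here as a plain-Python sketch rather than as part of the PythonCalc package, is a dispatch table built from the standard operator module, so that every unknown key falls through to a single error path. ValueError stands in for ModuleError so that the snippet runs without VisTrails; OPERATIONS and apply_op are hypothetical names.

```python
import operator

# Each allowed value of the "op" port maps to a two-argument function.
OPERATIONS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}


def apply_op(op, v1, v2):
    """Apply the operation named by `op` to v1 and v2."""
    try:
        func = OPERATIONS[op]
    except KeyError:
        # Inside a module this would be: raise ModuleError(self, ...)
        raise ValueError("unrecognized operation: '%s'" % op)
    return func(v1, v2)


total = apply_op('+', 3.0, 4.0)   # 7.0
ratio = apply_op('/', 7.0, 2.0)   # 3.5
```

One possible benefit is that the table's keys could also feed the values list of the op port, so the menu and the implementation cannot drift apart. Sort the keys if the menu order matters, since dictionary order is only guaranteed from Python 3.7 on.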
The final step is to specify the list of modules your package defines. This is done via the _modules list, which is simply a list of all the modules the package wishes to define. Leaving a class out of that list means it will not appear as an available module in VisTrails. That's it: you have successfully created a new package and module. From now on, we will look at more complicated examples and more advanced features of the package mechanism.
Note
Older versions of VisTrails used explicit calls to the ModuleRegistry in an initialize() method. These calls, such as ModuleRegistry.add_module(), ModuleRegistry.add_input_port(), and ModuleRegistry.add_output_port(), are still supported, but their use is discouraged because the new syntax places all attributes and configuration options in the module definition, making code more readable and localized. The arguments available in the registry functions are mirrored in the new configuration objects used for _settings, _input_ports, and _output_ports.
Package Specification
Structure
A package should contain the following files inside a directory named for the package:

- __init__.py – identifiers and configuration
- init.py – modules, other imports

Optionally, it might also contain:

- identifiers.py – the identifiers might be specified here and imported in __init__.py
- widgets.py – any GUI widgets the package’s modules use
- any other files and/or Python submodules that the package depends on
VisTrails separates __init__.py from init.py because it inspects every package for its identifier, configuration, and other information used to populate the list of available packages. For large packages with dependent libraries, putting everything (including subpackage imports) in __init__.py would make this inspection take significant time. Thus, we encourage package developers to define modules and include sub-imports only from init.py to speed up loading times. The optional identifiers.py allows developers to import configuration information, like the identifier and version, into both __init__.py and init.py; __init__.py may then consist of the single line from identifiers import *. Keeping GUI configuration widgets in widgets.py, separate from the module definitions, is recommended because VisTrails can run in batch mode or as a Python package without Qt/PyQt; if the widgets are imported into or defined in init.py, VisTrails will unnecessarily try to import the Qt/PyQt libraries. Instead, modules can refer to their configuration widgets by path strings (see Configuration Widgets), and the widgets will only be imported when the GUI is running.
Most third-party packages should be installed into a user’s ~/.vistrails/userpackages directory. The package’s codepath is the name of its directory within that userpackages directory. A few third-party packages install into the packages directory of the VisTrails codebase, either because of specific dependencies or to make them available to all users of the application. If you are interested in such installation features, please contact us.
The identifier, name, version, configuration, and package_dependencies fields/methods should be specified in or imported into __init__.py. An example of __init__.py from VisTrails’ matplotlib package follows.
identifier = 'org.vistrails.vistrails.matplotlib'
name = 'matplotlib'
version = '1.0.1'
old_identifiers = ['edu.utah.sci.vistrails.matplotlib']

def package_dependencies():
    import vistrails.core.packagemanager
    manager = vistrails.core.packagemanager.get_package_manager()
    if manager.has_package('org.vistrails.vistrails.spreadsheet'):
        return ['org.vistrails.vistrails.spreadsheet']
    else:
        return []

def package_requirements():
    import vistrails.core.requirements
    if not vistrails.core.requirements.python_module_exists('matplotlib'):
        raise vistrails.core.requirements.MissingRequirement('matplotlib')
    if not vistrails.core.requirements.python_module_exists('pylab'):
        raise vistrails.core.requirements.MissingRequirement('pylab')
The old_identifiers field lists identifiers the package was previously known by. This allows VisTrails to migrate old vistrails that refer to those identifiers to the new package. Other imports (excluding vistrails.core.configuration), other class definitions, and the initialize method should be in the init.py file.
Configuration
In addition to “pure-python” packages, VisTrails packages can also be designed to wrap existing libraries and command-line tools (see Wrapping Command-line tools for more information). For command-line tools, there are often some configuration options that may change from machine to machine. In addition, there may also be flags (e.g. for debugging) that a user may wish to toggle on or off depending on the situation. VisTrails provides the configuration package attribute for such situations; the ConfigurationObject stored here is accessible both during module computations and from the GUI in the Preferences dialog.
In the following example, we have some code from a package designed to control runs of afront, a command-line program for generating 3D triangle meshes. [1] It uses a general run() method to run each command, and we use the configuration object to determine where the executable lives and whether we should print debugging information.
import os
from vistrails.core.configuration import ConfigurationObject
from vistrails.core.modules.vistrails_module import Module, ModuleError
from vistrails.core.system import list2cmdline

configuration = ConfigurationObject(path=(None, str),
                                    debug=False)

class AfrontRun(object):
    def run(self, args):
        if configuration.check('path'): # there's a set directory
            afront_cmd = os.path.join(configuration.path, 'afront')
        else: # Assume afront is on path
            afront_cmd = 'afront'
        cmd = [afront_cmd, '-nogui'] + args
        cmdline = list2cmdline(cmd)
        if configuration.debug:
            print(cmdline)
        ...
        ...
Let us first look at how to specify configuration options. Named arguments passed to the ConfigurationObject constructor (here, path and debug) become attributes of the object. If the attribute has a default value, simply pass it to the constructor. If the attribute should be unset by default, pass the constructor a pair whose first element is None and whose second element is the type of the expected parameter. Currently, the valid types are bool, int, float, and str.
To use the configuration object in your code, you can simply access the attributes (as run() does with configuration.debug). This is fine when there is a default value set for the attribute. In the case of path, however, the absence of a value is encoded by the tuple (None, str), so using it directly is inconvenient. That is where the check() method comes in (run() calls configuration.check('path')). It returns False if there is no set value, and returns the value otherwise.
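The behavior of check() described above can be mimicked without VisTrails. ToyConfiguration is a hypothetical stand-in, not the real ConfigurationObject. It assumes only what the text states: keyword arguments become attributes, an unset option is stored as a (None, type) pair, and check() returns False for such an option and the value otherwise. afront_command() mirrors the run() logic of the afront example.

```python
import os


class ToyConfiguration(object):
    """Hypothetical stand-in for ConfigurationObject."""

    def __init__(self, **options):
        # Each keyword argument becomes an attribute.
        for name, value in options.items():
            setattr(self, name, value)

    def check(self, name):
        value = getattr(self, name)
        # An unset option is still the (None, type) placeholder pair.
        if isinstance(value, tuple) and len(value) == 2 and value[0] is None:
            return False
        return value


def afront_command(config, args):
    """Build the afront command line the same way AfrontRun.run() does."""
    if config.check('path'):   # a directory was configured
        afront_cmd = os.path.join(config.path, 'afront')
    else:                      # assume afront is on the PATH
        afront_cmd = 'afront'
    return [afront_cmd, '-nogui'] + args


configuration = ToyConfiguration(path=(None, str), debug=False)
default_cmd = afront_command(configuration, ['mesh.m'])
configuration.path = '/opt/afront/bin'   # as if set in the Preferences dialog
configured_cmd = afront_command(configuration, ['mesh.m'])
```

Because check() returns the value itself, an option explicitly set to a falsy value (such as 0 or an empty string) looks unset in an if test. Compare against False explicitly if that distinction matters.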
Perhaps the biggest advantage of using a configuration object is that the values can be changed through a GUI, and they are persistent across VisTrails sessions. To configure a package, open the Preferences dialog (VisTrails → Preferences on Mac OS X, or Edit → Preferences on other platforms). Then, select the package you want to configure by clicking on it (a package must be enabled to be configurable). If the Configure button is disabled, the package does not have a configuration object. When you click Configure, a dialog like the one in Figure Configuration window for a package... will appear.
To edit the value of a particular field, double-click on it and change the value. The values set in this dialog are saved on a per-user basis.
Dependencies
When creating more sophisticated VisTrails packages, you might want to create a new module that requires a module from another package. For example, using modules from other packages as input port types, or even subclassing modules from other packages, requires management of inter-package dependencies. VisTrails needs to know about these dependencies so that packages can be initialized in the correct order. To specify them, add a function named package_dependencies to your package. This function should return a list containing the identifier strings of the required packages.
As an example of this function’s usage, let’s take a look at a (simplified) code segment from the VTK package, which is included in the standard VisTrails distribution:
def package_dependencies():
    return ['org.vistrails.vistrails.spreadsheet']
As you can see, the package_dependencies function is quite straightforward; it simply returns a list of the identifiers for the packages required by the VTK package. In this case, the list contains just a single string, as the VisTrails Spreadsheet is the only package dependency for the VTK package.
The simple approach taken by the above code works well for the majority of cases, but in practice you may want to add some error-checking to your package_dependencies function. This allows VisTrails to recover gracefully in the unlikely event that the Spreadsheet package is missing. Below is the complete package_dependencies function for the VTK package:
def package_dependencies():
    import vistrails.core.packagemanager
    manager = vistrails.core.packagemanager.get_package_manager()
    if manager.has_package('org.vistrails.vistrails.spreadsheet'):
        return ['org.vistrails.vistrails.spreadsheet']
    else:
        return []
The above code segment also demonstrates the package manager’s has_package method, which simply verifies that a given package exists in the system.
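Declared dependencies are what let packages be initialized in the correct order. The sketch below illustrates that ordering problem with a depth-first topological sort over a dictionary that maps each identifier to what its package_dependencies() would return. It is an illustration of the idea under that assumption, not the code VisTrails' package manager actually runs; initialization_order is a hypothetical name.

```python
def initialization_order(dependencies):
    """Order identifiers so that every package follows its dependencies.

    `dependencies` maps an identifier to the list its package_dependencies()
    would return. Raises ValueError if the dependencies form a cycle.
    """
    order = []
    state = {}  # identifier -> 'visiting' or 'done'

    def visit(ident):
        if state.get(ident) == 'done':
            return
        if state.get(ident) == 'visiting':
            raise ValueError('dependency cycle involving %s' % ident)
        state[ident] = 'visiting'
        for dep in dependencies.get(ident, []):
            visit(dep)               # dependencies are placed first
        state[ident] = 'done'
        order.append(ident)

    for ident in sorted(dependencies):   # sorted for a deterministic result
        visit(ident)
    return order


deps = {
    'org.vistrails.vistrails.vtk': ['org.vistrails.vistrails.spreadsheet'],
    'org.vistrails.vistrails.matplotlib': ['org.vistrails.vistrails.spreadsheet'],
    'org.vistrails.vistrails.spreadsheet': [],
}
order = initialization_order(deps)

try:
    initialization_order({'a': ['b'], 'b': ['a']})
    cycle_detected = False
except ValueError:
    cycle_detected = True
```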
Requirements
In Section Dependencies, we saw how packages can depend on other packages. However, some packages may also depend on the presence of external libraries (in the form of Python modules) or executable files in order to run correctly.
Python Modules
To check for the presence of a required Python module, you should add a function named package_requirements to your package. This function need not return any value; however, it may raise exceptions or output error messages as necessary.
Here is an example of the syntax of the package_requirements function, taken from the VisTrails VTK package:
def package_requirements():
    import vistrails.core.requirements
    if not vistrails.core.requirements.python_module_exists('vtk'):
        raise vistrails.core.requirements.MissingRequirement('vtk')
    if not vistrails.core.requirements.python_module_exists('PyQt4'):
        print('PyQt4 is not available. There will be no interaction '
              'between VTK and the spreadsheet.')
    import vtk
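A key element of package_requirements is the use of the function python_module_exists (called here for both vtk and PyQt4), which checks whether a given module has been installed in your local Python system. Note the difference in how the two results are handled: a missing vtk module raises MissingRequirement, while a missing PyQt4 module only prints a warning, because the package still works without the spreadsheet interaction.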
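A stand-in for python_module_exists can be written with importlib, which also makes the hard/soft distinction from the VTK example explicit: a missing required module raises, while a missing optional module is merely reported. module_exists, check_requirements, and MissingRequirementDemo are hypothetical names made up for this sketch; the real VisTrails functions may be implemented differently.

```python
import importlib


class MissingRequirementDemo(Exception):
    """Hypothetical stand-in for vistrails.core.requirements.MissingRequirement."""


def module_exists(name):
    """Return True if the Python module `name` can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def check_requirements(required, optional=()):
    """Raise for missing required modules; return missing optional ones."""
    for name in required:
        if not module_exists(name):
            raise MissingRequirementDemo(name)
    return [name for name in optional if not module_exists(name)]


missing_optional = check_requirements(['os'], ['surely_missing_module_xyz'])
try:
    check_requirements(['surely_missing_module_xyz'])
    hard_failure = None
except MissingRequirementDemo as err:
    hard_failure = str(err)
```

Note that this check actually imports the module, so any import-time side effects or costs are paid during the check.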
Automatic Installation
A more advanced method is to attempt to install a Python module automatically using a system package manager. This method currently works for apt- and rpm-based systems. By using vistrails.core.bundles.py_import, you can attempt to automatically install a system dependency; all you need to specify is the Python module name and, for each platform, the name of the package that contains it. The following example can be put in your init.py file, with the desired module and package names changed:
from vistrails.core.bundles import py_import
from vistrails.core import debug

try:
    # Maps a platform key to the system package that provides the module.
    pkg_dict = {'linux-ubuntu': 'your-apt-package',
                'linux-fedora': 'your-rpm-package'}
    your_py_module = py_import('your_py_module', pkg_dict)
except Exception as e:
    debug.critical("Exception: %s" % e)
Note that if you use this method, you should not also check for the module in package_requirements, because a failed check there would block the install attempt.
Executables
As explained in Section Wrapping Command-line tools, a common motivation for writing new VisTrails modules is to wrap existing command-line tools. To this end, the VisTrails API provides a function called executable_file_exists, which checks for the presence of specific executables on the path.
Here is an example of its usage, taken from the initialize function of the ImageMagick package. This package is included in the standard VisTrails distribution. The following code snippet checks to see if convert, a command-line program associated with the ImageMagick suite of graphics utilities, is on the path.
import vistrails.core.requirements
...
if not vistrails.core.requirements.executable_file_exists('convert'):
    raise vistrails.core.requirements.MissingRequirement("ImageMagick suite")
Note that this function is not strictly required in order to wrap third-party executables into a module. Using a Configuration object (see Section Configuration) that lets the user specify the path to an executable may be a cleaner solution.
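When wrapping an executable, the two approaches can be combined: honor a directory the user configured, and otherwise search the PATH. This sketch uses shutil.which from the Python 3.3+ standard library; locate_executable is a hypothetical helper, not the VisTrails executable_file_exists function.

```python
import os
import shutil
import sys


def locate_executable(name, configured_dir=None):
    """Hypothetical helper: return a path to `name`, or None if not found.

    A directory configured by the user (for example through a
    ConfigurationObject option such as `path`) takes precedence and is
    not second-guessed; otherwise the PATH is searched.
    """
    if configured_dir:
        candidate = os.path.join(configured_dir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
        return None
    return shutil.which(name)   # searches PATH, returns None if absent


# Usage: the running interpreter certainly exists, so use it as a probe.
interp_dir, interp_name = os.path.split(sys.executable)
interpreter = locate_executable(interp_name, interp_dir)
```

Returning None when a configured directory does not contain the tool, instead of silently falling back to the PATH, is a deliberate choice here: it surfaces a wrong setting rather than running a different binary.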
For additional information or examples of any of the functions described above, please refer to the VisTrails source code or contact the VisTrails development team.
Upgrades
When revising a package, it is important that workflows containing old modules can be translated to their corresponding new versions. If no upgrade is explicitly specified, VisTrails attempts to automatically upgrade the old module to the new version. However, if a module’s interface has changed (e.g. a port was added or removed or the name was changed), the automated upgrade will fail. For such cases, VisTrails provides hooks for developers to specify the upgrade paths. The recommended method is to use the _upgrades attribute in the package to specify a dictionary where each key is a module name and the corresponding value is a list of potential upgrade paths for that module. The upgrade path is specified by an UpgradeModuleRemap instance, which specifies the versions for which the upgrade is valid, the output version, the new module, and a set of remaps for module entities. For example,
_upgrades = {"TestUpgradeA":
                 [UpgradeModuleRemap('0.8', '0.9', '0.9', None,
                                     function_remap={'a': 'aa'},
                                     src_port_remap={'z': 'zz'}),
                  UpgradeModuleRemap('0.9', '1.0', '1.0', None,
                                     function_remap={'aa': 'aaa'},
                                     src_port_remap={'zz': 'zzz'})]}
Here, we have two upgrade paths for the module TestUpgradeA. The first applies to versions from 0.8 up to, but not including, 0.9, and the second to versions from 0.9 to 1.0. The output versions are 0.9 and 1.0, respectively, and both specify None as the new module type, which means that the new module has the same name as the old one. The new module type could also be a string representing a different module name. There are four remap types: function_remap, src_port_remap, dst_port_remap, and annotation_remap. Each one is a dictionary where the name of the affected function, port, or annotation is the key and the value specifies either the output name (if this is just a name change) or a function to be used to perform the remap. For example, one might write a method that transforms the value of a temperature parameter from Fahrenheit to Celsius. Such a method should return a list of actions that accomplish this change. Note that because dst_port_remap and function_remap both affect input ports, any remaps for dst_port_remap are also used for functions unless explicitly overridden.
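To see how the version ranges and name remaps fit together, here is a plain-Python sketch that applies the two TestUpgradeA paths one after another. It assumes that each range includes its start version and excludes its end version, and that remaps chain until no range matches. It models only simple renames in function_remap; REMAPS, find_remap, and upgrade are hypothetical, and the real UpgradeModuleRemap machinery produces pipeline actions rather than dictionaries.

```python
def parse_version(version):
    """Turn '0.9' into (0, 9) so that versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


# Simplified stand-ins for the two UpgradeModuleRemap entries above:
# (start version, end version, output version, function_remap renames).
REMAPS = [('0.8', '0.9', '0.9', {'a': 'aa'}),
          ('0.9', '1.0', '1.0', {'aa': 'aaa'})]


def find_remap(version, remaps):
    """Return (output_version, renames) for the range containing version."""
    v = parse_version(version)
    for start, end, output_version, renames in remaps:
        if parse_version(start) <= v < parse_version(end):
            return output_version, renames
    return None


def upgrade(version, functions, remaps=REMAPS):
    """Apply matching remaps repeatedly until no range matches."""
    while True:
        found = find_remap(version, remaps)
        if found is None:
            return version, functions
        version, renames = found
        # Rename remapped functions; keep all others unchanged.
        functions = dict((renames.get(k, k), v) for k, v in functions.items())


upgraded = upgrade('0.8', {'a': 1.5, 'b': 2})
```

Comparing parsed tuples instead of raw strings matters because, as strings, '0.10' sorts before '0.9'. The loop terminates only if no output version falls inside its own range, which holds for these two remaps.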
If you require more control over the upgrade process, you may also define a handle_module_upgrade_request function in the VisTrails package. It will be passed the controller, the id of the module needing an upgrade, and the current pipeline as inputs, and should return a set of actions that will upgrade that single module to the latest version.
Module Specification
In this section, we will explore different options for specifying modules and associated attributes, including those which affect their appearance and organization in the GUI. Details about all of the options available for modules can be found in the VisTrails API Documentation. VisTrails provides the ModuleSettings class to offer a number of configuration options for modules. A module should define the _settings attribute in the class to use these settings.
Caching
VisTrails provides a caching mechanism, in which portions of pipelines that are common across different executions are automatically shared. However, some modules should not be shared, so caching control is left to the package developer. By default, caching is enabled, so a developer who does not want a module to be cached must make a small change to it. For example, look at the StandardOutput module:
from vistrails.core.modules.vistrails_module import Module, newModule, NotCacheable, ModuleError
from vistrails.core.modules.config import IPort
...
class StandardOutput(NotCacheable, Module):
    """StandardOutput is a VisTrails Module that simply prints the
    value connected on its port to standard output. It is intended
    mostly as a debugging device."""
    _input_ports = [IPort(name="value", signature="basic:Module")]

    def compute(self):
        v = self.get_input("value")
        print(v)
By subclassing from NotCacheable and Module (or one of its subclasses), we are telling VisTrails not to cache this module, or anything downstream from it.
VisTrails also allows a more sophisticated decision on whether or not to use caching. To do that, a developer simply overrides the method is_cacheable to return the appropriate value (the default implementation returns True). For example, the teem (http://teem.sourceforge.net/) package has a module that generates a scalar field with random numbers. This is non-deterministic, so its result should not be cached. However, this module only generates non-deterministic values in some cases, depending on its input port values. To keep caching efficient when it is possible, while still maintaining correctness, that module implements the following override:
class Unu1op(Unu):
    (...)
    def is_cacheable(self):
        return self.get_input('op') not in ['rand', 'nrand']
    (...)
Notice that the module explicitly uses inputs to decide whether it should be cached. This allows reasonably fine-grained control over the process.
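The effect of such an override can be simulated with a toy executor. CacheDemoModule, RandomFieldDemo, and execute are hypothetical stand-ins, not VisTrails code. Results are keyed by module type and input values, which is a simplification of how VisTrails identifies shared pipeline portions. The point is only that a module whose is_cacheable() returns False is recomputed every time, while other inputs reuse the cached result.

```python
class CacheDemoModule(object):
    """Hypothetical stand-in: cacheable unless a subclass says otherwise."""

    def __init__(self, **inputs):
        self.inputs = inputs

    def is_cacheable(self):
        return True


class RandomFieldDemo(CacheDemoModule):
    """Mirrors the Unu1op override: 'rand' and 'nrand' disable caching."""

    def is_cacheable(self):
        return self.inputs.get('op') not in ['rand', 'nrand']


def execute(module, cache, compute):
    """Reuse a cached result when the module allows it."""
    key = (type(module).__name__, tuple(sorted(module.inputs.items())))
    if module.is_cacheable() and key in cache:
        return cache[key]
    result = compute(module)
    if module.is_cacheable():
        cache[key] = result
    return result


calls = []


def compute(module):
    calls.append(module.inputs['op'])   # record every real execution
    return len(calls)


cache = {}
first = execute(RandomFieldDemo(op='abs'), cache, compute)
second = execute(RandomFieldDemo(op='abs'), cache, compute)   # cache hit
third = execute(RandomFieldDemo(op='rand'), cache, compute)
fourth = execute(RandomFieldDemo(op='rand'), cache, compute)  # recomputed
```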
Namespaces
ModuleSettings.namespace can be used to define a hierarchy for the modules in a package, which is used to organize the module palette. Hierarchies can be nested by using the ‘|’ character. For example,
from vistrails.core.modules.vistrails_module import Module
from vistrails.core.modules.config import ModuleSettings

class MyModule1(Module):
    _settings = ModuleSettings(namespace="MyNamespace")
    ...

class MyModule2(Module):
    _settings = ModuleSettings(namespace="ParentNamespace|ChildNamespace")
    ...
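The nesting can be pictured as a tree built by splitting each namespace string on ‘|’. build_palette below is a hypothetical sketch of that grouping, not the code VisTrails uses to fill its palette. Module names at each level are collected under a '_modules' key, and None means the package's top level.

```python
def build_palette(namespaces):
    """Group modules into a nested dict, one level per '|' component.

    `namespaces` maps a module name to its namespace string (or None for
    the package's top level). Module names at each level go under
    the '_modules' key.
    """
    tree = {}
    for module_name, namespace in namespaces.items():
        node = tree
        for level in (namespace.split('|') if namespace else []):
            node = node.setdefault(level, {})
        node.setdefault('_modules', []).append(module_name)
    return tree


palette = build_palette({'MyModule1': 'MyNamespace',
                         'MyModule2': 'ParentNamespace|ChildNamespace',
                         'PythonCalc': None})
child = palette['ParentNamespace']['ChildNamespace']
```

Because this sketch splits the string literally, any stray spaces inside a namespace string would become part of the level names, which is one reason to keep the namespace string on a single line.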
Documentation
The docstring you set on your Module subclass will be displayed to the user when they click on the ‘Documentation’ button in the ‘Module Information’ panel. Be sure to put a readable description and your usage information there.
If you want to customize that documentation, you can provide a staticmethod or classmethod ‘get_documentation’ on your Module. The string it returns will be used as the documentation.
class TestMod(Module):
    """This very simple module doesn't do anything sensible.
    """
    @classmethod
    def get_documentation(cls, docstring, module=None):
        return docstring.upper()
The function receives two arguments: the string that was about to be used (the module’s docstring or an empty string), and the module object from the pipeline if the documentation was requested for a specific instance of that module (else, None is passed).
Visibility
ModuleSettings.abstract and ModuleSettings.hide_descriptor can be used to prevent modules from appearing in the module palette. abstract is for modules that should never be instantiated in a workflow; such modules are not added to the module palette at all. hide_descriptor, on the other hand, adds the item to the palette but hides it. This prevents users from adding the module to a pipeline while still allowing code to add it programmatically. To use either of these options, set it to True:
```python
class AbstractModule(Module):
    _settings = ModuleSettings(abstract=True)
    ...

class InvisibleModule(Module):
    _settings = ModuleSettings(hide_descriptor=True)
    ...
```
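As a rough model of the difference between the two settings (not the actual palette code), the hypothetical function below drops abstract modules entirely and keeps hidden ones registered but separate from the visible list.

```python
def palette_entries(settings_by_module):
    """Split module names into visible and hidden palette entries.

    settings_by_module maps a module name to a dict with optional
    'abstract' and 'hide_descriptor' booleans.
    """
    visible, hidden = [], []
    for name, settings in sorted(settings_by_module.items()):
        if settings.get("abstract", False):
            continue  # never added to the palette
        if settings.get("hide_descriptor", False):
            hidden.append(name)  # registered, but not shown to users
        else:
            visible.append(name)
    return visible, hidden


visible, hidden = palette_entries({
    "AbstractModule": {"abstract": True},
    "InvisibleModule": {"hide_descriptor": True},
    "Divide": {},
})
```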
Shape and Color¶
VisTrails allows package developers to assign custom colors and shapes to modules by using the ModuleSettings.color and ModuleSettings.fringe options. For example,
```python
class FancyModule(Module):
    _settings = ModuleSettings(color=(1.0, 0.0, 0.0),
                               fringe=[(0.0, 0.0),
                                       (0.2, 0.0),
                                       (0.2, 0.4),
                                       (0.0, 0.4),
                                       (0.0, 1.0)])
```
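produces the following module: [Figure: FancyModule drawn with a red background and the fringe defined above]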
and
```python
class FancyModule2(Module):
    _settings = ModuleSettings(color=(0.4, 0.6, 0.8),
                               fringe=[(0.0, 0.0),
                                       (0.2, 0.0),
                                       (0.0, 0.2),
                                       (0.2, 0.4),
                                       (0.0, 0.6),
                                       (0.2, 0.8),
                                       (0.0, 1.0)])
```
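produces the following module: [Figure: FancyModule2 drawn with a light blue background and a zigzag fringe]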
The ModuleSettings.color parameter must be a tuple of three floats between 0 and 1 that specify the RGB color of the module background, while ModuleSettings.fringe is a list of pairs of floats that specify points as they go around one side of the module. The same fringe is used to go from the top-right corner to the bottom-right corner, and from the bottom-left corner to the top-left one. (If this is not enough, let the developers know!)
Alternatively, you may use different fringes for the left and right borders:
```python
class FancyModule3(Module):
    _settings = ModuleSettings(color=(1.0, 0.8, 0.6),
                               left_fringe=[(0.0, 0.0),
                                            (-0.2, 0.0),
                                            (0.0, 1.0)],
                               right_fringe=[(0.0, 0.0),
                                             (0.2, 1.0),
                                             (0.0, 1.0)])
```
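The color and fringe constraints above can be checked before the settings reach VisTrails. The helper below is hypothetical and not part of VisTrails. The rule that a fringe starts at y = 0.0 and ends at y = 1.0 is an assumption inferred from the examples in this section, not a documented requirement.

```python
def check_module_style(color, fringe):
    """Raise ValueError if color or fringe does not match the described form."""
    if len(color) != 3 or not all(
            isinstance(c, float) and 0.0 <= c <= 1.0 for c in color):
        raise ValueError("color must be three floats between 0 and 1")
    for point in fringe:
        if len(point) != 2 or not all(isinstance(v, float) for v in point):
            raise ValueError("fringe points must be pairs of floats")
    # Assumption drawn from the examples: the fringe runs from y=0 to y=1.
    if fringe and (fringe[0][1] != 0.0 or fringe[-1][1] != 1.0):
        raise ValueError("fringe should start at y=0.0 and end at y=1.0")


# The FancyModule2 settings pass the check.
check_module_style((0.4, 0.6, 0.8),
                   [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.4),
                    (0.0, 0.6), (0.2, 0.8), (0.0, 1.0)])
```

The same function can be applied separately to left_fringe and right_fringe when those are used instead of fringe.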
Configuration Widgets¶
There are two types of widgets that are associated with modules. The first, the module configuration widget, is available to all modules regardless of inheritance. This type of widget allows users to configure modules in ways other than through the ports list in the Module Information panel. For example, the PythonSource module uses a special widget that allows users to add ports as well as write code in an editor with line numbers and highlighting features. Developers wishing to create similar widgets should subclass vistrails.gui.modules.module_configure.StandardModuleConfigurationWidget and implement the saveTriggered and resetTriggered methods. Note that both the module and the controller are passed into the constructor and are available as self.module and self.controller.
The second type of widget is the constant configuration widget, which can only be defined for constant modules, that is, those that subclass vistrails.core.modules.basic_modules.Constant. When such a module is used as the type of an input port, the user is allowed to edit the value in the ports list of the Module Information panel. The constant configuration widget is used to display the value of a parameter and to let the user edit it. The default widget is a simple line edit widget, but certain basic types in VisTrails, like Color and File, have specialized widgets that make specification easier.
Creation¶
Developers may build new constant configuration widgets using the vistrails.gui.modules.constant_configuration.ConstantWidgetBase or vistrails.gui.modules.constant_configuration.ConstantEnumWidgetBase base classes. Note that these base classes should be listed second; the first base class should be a QWidget subclass. ConstantWidgetBase is intended for use with “normal” ports, while ConstantEnumWidgetBase is intended for use with ports whose possible values are enumerated. For ConstantWidgetBase subclasses, developers should implement the setContents and contents methods and, optionally, the setDefault method. For ConstantEnumWidgetBase subclasses, developers should implement the setValues method and, optionally, the setFree and setNonEmpty methods.
As an example, consider the following widget:
```python
from PyQt4 import QtCore, QtGui
from vistrails.gui.modules.constant_configuration import ConstantEnumWidgetBase

class NumericSliderWidget(QtGui.QSlider, ConstantEnumWidgetBase):
    def __init__(self, param, parent=None):
        QtGui.QSlider.__init__(self, parent)
        self.setOrientation(QtCore.Qt.Horizontal)
        self.setTracking(False)
        self.setTickPosition(QtGui.QSlider.TicksBelow)
        ConstantEnumWidgetBase.__init__(self, param)
        self.connect(self, QtCore.SIGNAL("valueChanged(int)"),
                     self.update_parent)

    def setValues(self, values):
        # values holds the enumerated entries; here, minimum and maximum
        self.setMinimum(int(values[0]))
        self.setMaximum(int(values[1]))

    def contents(self):
        return unicode(self.value())

    def setContents(self, contents, silent=True):
        if contents:
            self.setValue(int(contents))
        if not silent:
            self.update_parent()
```
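Because the widget above needs PyQt4 and VisTrails, here is a Qt-free stand-in that models only its string contract: setValues fixes the range, contents() returns the current value as a string, and setContents() parses a string back. SliderModel is hypothetical; its 0 to 99 starting range and the clamping of out-of-range values imitate QSlider's default behaviour.

```python
class SliderModel(object):
    """Qt-free model of the NumericSliderWidget value contract."""

    def __init__(self):
        self.minimum, self.maximum, self.value = 0, 99, 0
        self.updates = 0  # counts calls that would notify the parent

    def setValues(self, values):
        # values arrives as strings, e.g. ["0", "10"]
        self.minimum = int(values[0])
        self.maximum = int(values[1])
        self.value = min(max(self.value, self.minimum), self.maximum)

    def contents(self):
        return str(self.value)

    def setContents(self, contents, silent=True):
        if contents:
            self.value = min(max(int(contents), self.minimum), self.maximum)
        if not silent:
            self.update_parent()

    def update_parent(self):
        self.updates += 1


slider = SliderModel()
slider.setValues(["0", "10"])
slider.setContents("7", silent=False)
first = slider.contents()
slider.setContents("42")  # out of range: clamped to the maximum
second = slider.contents()
```

Keeping contents() and setContents() as exact inverses on strings is what lets the value be stored in the pipeline and restored into the widget later.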
Registration¶
To make VisTrails aware of these new widgets, developers should specify them in the ModuleSettings options. For example,
```python
class TestWidgets(Constant):
    _settings = ModuleSettings(configure_widget="widgets:MyWidget",
                               constant_widget="widgets:ConstWgt")
```
Note that the PathString is best specified relative to the base path of the package. Important: if MyWidget is defined in the widgets module of the test_widgets package in userpackages, its full path might be userpackages.test_widgets.widgets:MyWidget, but we include only the inner path (widgets:MyWidget). (The full path is used for internal packages, but it should be avoided for third-party packages.)
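To illustrate how a relative path such as widgets:MyWidget relates to the full path, the hypothetical helpers below prefix the package's module path, then import the module part with importlib and look up the attribute after the colon. VisTrails's own loader may work differently. The demo resolves standard-library names (os.path:join), since the example widgets package does not exist here.

```python
import importlib


def qualify_path(package, path_string):
    """Turn 'widgets:MyWidget' into '<package>.widgets:MyWidget'."""
    module_part, _, attr = path_string.partition(":")
    return "%s.%s:%s" % (package, module_part, attr)


def resolve_path(full_path):
    """Import the module named before ':' and return the attribute after it."""
    module_name, _, attr = full_path.partition(":")
    return getattr(importlib.import_module(module_name), attr)


print(qualify_path("userpackages.test_widgets", "widgets:MyWidget"))
join = resolve_path(qualify_path("os", "path:join"))
```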
For constant widgets, VisTrails allows developers to associate different widgets with different uses. A widget used for queries may differ from the default display-and-edit widget, and developers may specify different widgets for these uses. Current uses include “query” and “paramexp” (parameter exploration). In addition, individual ports may request a different constant widget through the InputPort.entry_type setting; such requests are matched against the widget’s type. To specify these associations, developers should use the ConstantWidgetConfig settings. QueryWidgetConfig and ParamExpWidgetConfig provide shortcuts for query and parameter exploration configurations, respectively. Multiple widgets can be specified via the ModuleSettings.constant_widgets setting. For example,
```python
class TestWidgets(Constant):
    _settings = ModuleSettings(constant_widgets=[
        ConstantWidgetConfig(widget="widgets:MyEnumWidget",
                             widget_type="enum"),
        QueryWidgetConfig(widget="widgets:MyQueryWidget")])
```
Note that if a query or parameter exploration widget is not specified, VisTrails will generically adapt the default widget for those uses so you do not need to create a widget for each use.
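One way to picture this fallback is a lookup keyed on (use, widget_type) that falls back to the default widget. This is a simplified, hypothetical model rather than the VisTrails registry: it simply reuses the default widget, whereas VisTrails adapts it for the requested use.

```python
def pick_widget(configs, use="default", widget_type=None):
    """Return the widget path for a use, falling back to default widgets.

    configs maps (use, widget_type) tuples to widget path strings.
    """
    for key in ((use, widget_type), (use, None),
                ("default", widget_type), ("default", None)):
        if key in configs:
            return configs[key]
    return None


configs = {
    ("default", None): "widgets:ConstWgt",
    ("default", "enum"): "widgets:MyEnumWidget",
    ("query", None): "widgets:MyQueryWidget",
}
print(pick_widget(configs, "paramexp"))  # no paramexp widget: default used
```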
Port Specification¶
Defaults and Labels¶
In versions 2.0 and greater, package developers can add labels and default values for parameters. To add this functionality, you need to use the default(s) and label(s) keyword arguments. For example,
```python
class TestDefaults(Module):
    _input_ports = [IPort('word', 'basic:String',
                          default="Hello",
                          label="greeting"),
                    CIPort('center', 'basic:Float, basic:Float',
                           defaults=[10.0, 10.0], labels=["x", "y"])]
```
Note that simple ports use the singular InputPort.default and InputPort.label kwargs, while compound input ports use the plural forms, CompoundInputPort.defaults and CompoundInputPort.labels.
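Since a compound port's defaults and labels are matched with the types in its signature by position, a quick consistency check is easy to write. check_compound_port is a hypothetical helper; it only counts the comma-separated entries and does not validate the module strings themselves.

```python
def check_compound_port(signature, defaults=None, labels=None):
    """Return True if defaults/labels have one entry per signature type."""
    n_types = len([s for s in signature.split(",") if s.strip()])
    for values in (defaults, labels):
        if values is not None and len(values) != n_types:
            return False
    return True


# The 'center' port from the example above: two floats, two defaults, two labels.
ok = check_compound_port('basic:Float, basic:Float',
                         defaults=[10.0, 10.0], labels=["x", "y"])
```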
Optional Ports¶
An optional port is one that is not visible by default in the module shape. For modules with many ports, developers might make less-used ports optional to reduce clutter. To make a port optional, set the optional flag to True:
```python
class ModuleWithManyPorts(Module):
    _input_ports = [IPort('Port14', 'basic:String',
                          optional=True)]
```
Cardinality¶
By default, ports will accept any number of connections or parameters. However, the Module.get_input() method will only access one of the inputs, and which one is not well-defined. To access all of the inputs, developers should use the Module.get_input_list() method. The spreadsheet package uses this feature, so look there for usage examples (vistrails/packages/spreadsheet/basic_widgets.py).
In addition, VisTrails 2.1 introduced the port configuration arguments min_conns and max_conns, which allow developers to enforce specific cardinalities on their ports. For example, a port that requires at least two inputs could set min_conns=2, and a port that does not accept more than a single input could set max_conns=1. Currently, min_conns and max_conns default to 0 and -1, respectively, which means that no connections are required and any number of connections is allowed. These constraints will eventually be enforced by the GUI to help users build workflows.
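The meaning of min_conns and max_conns, with -1 standing for "no upper bound", fits in a few lines. This hypothetical check only mirrors the description above; it is not the code VisTrails uses.

```python
def cardinality_ok(n_conns, min_conns=0, max_conns=-1):
    """Check a connection count against min_conns/max_conns (-1 = unbounded)."""
    if n_conns < min_conns:
        return False
    return max_conns == -1 or n_conns <= max_conns


# A port declared with min_conns=2 that currently has one connection:
print(cardinality_ok(1, min_conns=2))  # not enough connections yet
```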
Shape¶
As with modules, port shapes can also be customized. Besides the default square, there are three basic shapes: “triangle”, “circle”, and “diamond”. These are specified as string values to the shape setting. In addition, the triangle may be rotated by appending the degree of rotation (90, 180, or 270 only!) to the string, as in “triangle90”. Finally, custom shapes are supported in a similar fashion to the module fringe. The shape should be defined in the [0,1] x [0,1] domain, with 0 representing the top/left and 1 the bottom/right.
```python
class FancyPorts(Module):
    _input_ports = [IPort("normal", "basic:Float"),
                    IPort("triangle", "basic:Float",
                          shape="triangle"),
                    IPort("triangle90", "basic:Float",
                          shape="triangle90"),
                    IPort("circle", "basic:Float",
                          shape="circle"),
                    IPort("diamond", "basic:Float",
                          shape="diamond"),
                    IPort("pentagon", "basic:Float",
                          shape=[(0.0, 0.0), (0.0, 0.66),
                                 (0.5, 1.0), (1.0, 0.66),
                                 (1.0, 0.0)])]
```
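This produces a module with ports that look like the following figure: [Figure: the FancyPorts module showing its normal, triangle, triangle90, circle, diamond, and pentagon ports]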
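The shape rules can be summarized as a small validator: a named shape, a triangle with an optional rotation of 90, 180, or 270 degrees, or a list of (x, y) points inside the unit square. check_port_shape is a hypothetical helper written from the description above, not a VisTrails function.

```python
NAMED_SHAPES = {"triangle", "circle", "diamond"}
TRIANGLE_ROTATIONS = {"90", "180", "270"}


def check_port_shape(shape):
    """Return True if shape follows the port-shape rules described above."""
    if isinstance(shape, str):
        if shape in NAMED_SHAPES:
            return True
        if shape.startswith("triangle"):
            return shape[len("triangle"):] in TRIANGLE_ROTATIONS
        return False
    # Custom shape: every (x, y) point must lie in [0, 1] x [0, 1].
    return all(len(p) == 2 and 0.0 <= p[0] <= 1.0 and 0.0 <= p[1] <= 1.0
               for p in shape)


pentagon = [(0.0, 0.0), (0.0, 0.66), (0.5, 1.0), (1.0, 0.66), (1.0, 0.0)]
print(check_port_shape("triangle90"), check_port_shape(pentagon))
```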
Signatures¶
We recommend using strings to define ports, but we still allow the actual classes to be used instead for backward compatibility. For example,
```python
from vistrails.core.modules.basic_modules import String

class MyModule(Module):
    _input_ports = [IPort("a", String)]
```
This is not recommended for non-basic types because it requires importing the dependent package's modules. If a package developer wants to use a module from another package, they must determine where in that package the module is defined, import that specific module, and then hope that future versions of that package do not change the location of that module. String-based signatures do not face the same issues, because code reorganization is independent of the package definition. The grammar for a simple port signature is
```text
<module_string>  := <package_identifier>:[<namespace>|]<module_name>
<port_signature> := "<module_string>"
```
and for a compound port:
```text
<compound_string> := ,<module_string>
<port_signature>  := "<module_string><compound_string>*"
```
For example,
```python
class MyModule(Module):
    # _input_ports is a list of port definitions, even for a single port.
    _input_ports = [("myInputPort", "org.suborg.pkg_name:Namespace|ModuleB")]
```
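Following the grammar above, a port signature can be split into (package identifier, namespace, module name) triples. This standalone parser is only an illustration, not VisTrails's own parser. It splits each module string on the first ':' and the last '|', which assumes that package identifiers contain no ':' and module names contain no '|' (a nested namespace such as A|B stays intact).

```python
def parse_module_string(module_string):
    """Split '<package>:[<namespace>|]<module>' into its three parts."""
    package, sep, rest = module_string.strip().partition(":")
    if not sep or not package or not rest:
        raise ValueError("expected <package>:<module>, got %r" % module_string)
    namespace, bar, name = rest.rpartition("|")
    return package, (namespace if bar else None), name


def parse_port_signature(signature):
    """Parse a simple or compound (comma-separated) port signature."""
    return [parse_module_string(part) for part in signature.split(",")]


print(parse_port_signature("org.suborg.pkg_name:Namespace|ModuleB"))
```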
Variable Output¶
There may be cases where a port outputs values of different types. There are a few ways to tackle this, each with its own benefits and pitfalls. Because VisTrails modules obey inheritance principles, a port of a given type may produce or accept subclasses of that type. For example, an output port of type Constant may output String, Float, or Integer values, since all are subclasses of Constant. For input ports, Module (the base class for all modules) is the most general input type and will accept any input. For example, the StandardOutput module’s input port value is of type Module, and it prints the string representation of the input value to stdout. For output ports, however, an output of type Module is less useful, because a user may wish to connect a general output to an input port that accepts a specific type. For example, consider a GetItem module that takes a List module and an Integer parameter and outputs the element at the specified index. Its output port must be of the most general type (e.g. Module), but that means that a user who knows the list only contains floats cannot pass the output to a calculator that only takes floats as inputs. To address this issue, VisTrails provides the Variant type, which allows connections to any input port. VisTrails attempts to do run-time type checking to ensure that the type passed in to the module is as advertised, while allowing general computations to remain general. For example, the GetItem module might be constructed as:
```python
class GetItem(Module):
    _input_ports = [IPort("list", "basic:List"),
                    IPort("index", "basic:Integer")]
    _output_ports = [OPort("value", "basic:Variant")]
```
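As a plain-Python illustration of the idea (not how VisTrails implements its checks), the sketch below shows a GetItem-style computation whose result type is only known at run time, followed by an isinstance check standing in for the run-time verification that a Variant connection relies on. Both functions are hypothetical.

```python
def get_item(values, index):
    """GetItem-style computation: the output type depends on the input list."""
    return values[index]


def check_input(value, expected_type, port_name):
    """Reject a value whose run-time type does not match the port's type."""
    if not isinstance(value, expected_type):
        raise TypeError("port %r expects %s, got %s" % (
            port_name, expected_type.__name__, type(value).__name__))
    return value


item = get_item([1.5, 2.5, 3.5], 1)
check_input(item, float, "operand")  # passes: the list held floats
```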
Connectivity¶
In some cases, it may be desirable to know which outputs are used before running a computation. The outputPorts dictionary of the module stores connection information, so you can check ("myPortName" in self.outputPorts) inside the module to see whether there are any downstream connections from the port “myPortName”. Note, however, that the caching algorithm assumes that all outputs are computed, so adding a new connection to a previously unconnected output port will not work as desired if that module is cached. For this reason, we currently recommend making such a module not cacheable. Another possibility is to override the update() method so that it compares the currently connected output ports with those that were computed last time, and resets the upToDate flag if a new port has been connected. Here is an example:
```python
class TestModule(Module):
    _output_ports = [('a1', 'basic:String'),
                     ('a2', 'basic:String')]

    def __init__(self):
        Module.__init__(self)
        self._cached_output_ports = set()

    def update(self):
        # Force a recompute if some output port is connected now that was
        # not connected when the cached result was produced.
        if len(set(self.outputPorts) - self._cached_output_ports) > 0:
            self.upToDate = False
        Module.update(self)

    def compute(self):
        if "a1" in self.outputPorts:
            self.set_output("a1", "test")
        if "a2" in self.outputPorts:
            self.set_output("a2", "test2")
        self._cached_output_ports = set(self.outputPorts)
```
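To see why the update() override helps, here is a simulation of the same bookkeeping that runs without VisTrails: the cached result is reused only while no newly connected output port appears. CachedOutputs is hypothetical and folds the interpreter's update/compute cycle into one method, so it is a model of the idea rather than of VisTrails internals.

```python
class CachedOutputs(object):
    """Simulates TestModule's cache check without the VisTrails runtime."""

    def __init__(self):
        self.up_to_date = False
        self.cached_ports = set()
        self.outputs = {}
        self.compute_calls = 0

    def update(self, connected_ports):
        # A newly connected port invalidates the cached result.
        if set(connected_ports) - self.cached_ports:
            self.up_to_date = False
        if not self.up_to_date:
            self.compute(connected_ports)
            self.up_to_date = True

    def compute(self, connected_ports):
        self.compute_calls += 1
        if "a1" in connected_ports:
            self.outputs["a1"] = "test"
        if "a2" in connected_ports:
            self.outputs["a2"] = "test2"
        self.cached_ports = set(connected_ports)


m = CachedOutputs()
m.update({"a1"})        # first run computes a1
m.update({"a1"})        # cached: nothing new is connected
m.update({"a1", "a2"})  # a2 newly connected: recompute
```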
Generating Modules Dynamically¶
When wrapping existing libraries or trying to generate modules in a more procedural manner, it is useful to dynamically generate modules. In our work, we have created some shortcuts to make this easier. In addition, the list of modules can also be based on the package configuration. Here is some example code:
__init__.py
```python
from vistrails.core.configuration import ConfigurationObject

identifier = "org.vistrails.examples.auto_example"
version = "0.0.1"
name = "AutoExample"
configuration = ConfigurationObject(use_b=True)
```
init.py
The expand_ports and build_modules functions are helpers that make constructing the modules easier. The key parts are the new_module call and setting the _modules variable.
```python
from vistrails.core.modules.vistrails_module import new_module, Module

identifier = "org.vistrails.examples.auto_example"

def expand_ports(port_list):
    new_port_list = []
    for port in port_list:
        port_spec = port[1]
        if type(port_spec) == str: # or unicode...
            if port_spec.startswith('('):
                port_spec = port_spec[1:]
            if port_spec.endswith(')'):
                port_spec = port_spec[:-1]
            new_spec_list = []
            for spec in port_spec.split(','):
                spec = spec.strip()
                parts = spec.split(':', 1)
                print 'parts:', parts
                namespace = None
                if len(parts) > 1:
                    mod_parts = parts[1].rsplit('|', 1)
                    if len(mod_parts) > 1:
                        namespace, module_name = mod_parts
                    else:
                        module_name = parts[1]
                    if len(parts[0].split('.')) == 1:
                        id_str = 'org.vistrails.vistrails.' + parts[0]
                    else:
                        id_str = parts[0]
                else:
                    mod_parts = spec.rsplit('|', 1)
                    if len(mod_parts) > 1:
                        namespace, module_name = mod_parts
                    else:
                        module_name = spec
                    id_str = identifier
                if namespace:
                    new_spec_list.append(id_str + ':' + module_name + \
                                             ':' + namespace)
                else:
                    new_spec_list.append(id_str + ':' + module_name)
            port_spec = '(' + ','.join(new_spec_list) + ')'
        new_port_list.append((port[0], port_spec) + port[2:])
    print new_port_list
    return new_port_list

def build_modules(module_descs):
    new_classes = {}
    for m_name, m_dict in module_descs:
        m_doc = m_dict.get("_doc", None)
        m_inputs = m_dict.get("_inputs", [])
        m_outputs = m_dict.get("_outputs", [])
        if "_inputs" in m_dict:
            del m_dict["_inputs"]
        if "_outputs" in m_dict:
            del m_dict["_outputs"]
        if "_doc" in m_dict:
            del m_dict["_doc"]
        klass_dict = {}
        if "_compute" in m_dict:
            klass_dict["compute"] = m_dict["_compute"]
            del m_dict["_compute"]
        m_class = new_module(Module, m_name, klass_dict, m_doc)
        m_class._input_ports = expand_ports(m_inputs)
        m_class._output_ports = expand_ports(m_outputs)
        new_classes[m_name] = (m_class, m_dict)
    return new_classes.values()

def initialize():
    global _modules
    def a_compute(self):
        a = self.get_input("a")
        i = 0
        if self.has_input("i"):
            i = self.get_input("i")
        if a == "abc":
            i += 100
        self.set_output("b", i)

    module_descs = [("ModuleA", {"_inputs": [("a", "basic:String")],
                                 "_outputs": [("b", "basic:Integer")],
                                 "_doc": "ModuleA documentation",
                                 "_compute": a_compute,
                                 "namespace": "Test"}),
                    ("ModuleB", {"_inputs": [("a", "Test|ModuleA")],
                                 "_outputs": [("b", "Test|ModuleA")],
                                 "_doc": "ModuleB documentation"})
                    ]

    if configuration.use_b:
        _modules = build_modules(module_descs)
    else:
        _modules = build_modules(module_descs[:1])

# Module-level placeholder; initialize() replaces it via "global _modules".
_modules = []
```
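Creating a class at run time is ordinary Python. The sketch below uses the built-in type() to do roughly what the example asks of new_module; whether new_module is implemented exactly this way is an assumption, and make_module_class and the dummy Module base are hypothetical stand-ins that run without VisTrails.

```python
class Module(object):
    """Dummy stand-in for the VisTrails Module base class."""


def make_module_class(base, name, attrs, doc=None):
    """Create a subclass of base called name with the given attributes."""
    attrs = dict(attrs)  # copy so the caller's dict is left untouched
    if doc is not None:
        attrs["__doc__"] = doc
    return type(name, (base,), attrs)


def a_compute(self):
    return "computed by %s" % type(self).__name__


ModuleA = make_module_class(Module, "ModuleA", {"compute": a_compute},
                            "ModuleA documentation")
ModuleA._input_ports = [("a", "basic:String")]  # set after creation, as above
print(ModuleA.__name__, ModuleA.__doc__, ModuleA().compute())
```

As in build_modules, the port lists are attached after the class is created, which keeps the class dictionary limited to methods and documentation.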
Wrapping Command-line tools¶
Many existing programs are readily available through a command-line interface. Also, many existing workflows are first implemented through scripts, which work primarily with command-line tools. This section describes how to wrap command-line applications so they can be used with VisTrails. We will use as a running example the afront package [1], which wraps afront, a command-line program for generating 3D triangle meshes. We will wrap the basic functionality in three different modules: Afront, AfrontIso, and MeshQualityHistogram.

Each of these modules will be implemented by a Python class, and they will all invoke the afront binary. Afront is the base execution module, and AfrontIso requires extra parameters on top of the original ones. Because of this, we will implement AfrontIso as a subclass of Afront. MeshQualityHistogram, however, requires entirely different parameters, and so will not be a subclass of Afront. A first attempt at writing this package might look something like this:
__init__.py
```python
name = "Afront"
version = "0.1.0"
identifier = "edu.utah.sci.vistrails.afront"
```
init.py
```python
from vistrails.core.modules.vistrails_module import Module
... # other import statements

class Afront(Module):
    def compute(self):
        ... # invokes afront

class AfrontIso(Afront):
    def compute(self):
        ... # invokes afront with additional parameters

class MeshQualityHistogram(Module):
    def compute(self):
        ... # invokes afront with completely different parameters

_modules = [Afront, AfrontIso, MeshQualityHistogram, ...]
```
Class Mixins¶
While this approach is a good start, it does require significant duplication of effort. Each module must contain code to invoke the afront binary and pass it some parameters. Since this functionality is required by all three modules, we would like to put this code in a separate class called, say, AfrontRun, and let each of our modules inherit from it. AfrontRun itself is not a module, and thus does not extend the Module class. So our three modules will inherit from both AfrontRun and Module. Helper classes such as this are often referred to as mixin classes. [2]
```python
from vistrails.core.modules.vistrails_module import Module, ModuleError
from vistrails.core.system import list2cmdline
import os

class AfrontRun(object):
    _debug = False
    def run(self, args):
        cmd = ['afront', '-nogui'] + args
        cmdline = list2cmdline(cmd)
        if self._debug:
            print cmdline
        result = os.system(cmdline)
        if result != 0:
            raise ModuleError(self, "Execution failed")

class Afront(Module, AfrontRun):
    ...

class MeshQualityHistogram(Module, AfrontRun):
    ...
```
Now every module in the afront package has access to run(). The other new feature in this snippet is list2cmdline, which turns a list of strings into a command line. It does this in a careful way (protecting arguments with spaces, for example). Notice that we use a call to a shell (os.system()) to invoke afront. This is frequently the easiest way to get third-party functionality into VisTrails.
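For comparison, here is a sketch of the same mixin built on the standard subprocess module instead of os.system. Passing a list of arguments avoids shell quoting entirely, and subprocess.list2cmdline (an undocumented standard-library helper that applies Windows quoting rules) is used only to print a readable command line. CommandRunMixin, the command attribute, and the use of RuntimeError in place of ModuleError are assumptions made so the example runs outside VisTrails; the demo runs the Python interpreter itself rather than afront. In a real package the module classes would inherit from both Module and the mixin, as above.

```python
import subprocess
import sys


class CommandRunMixin(object):
    """Mixin providing run(); it is not a module itself."""
    _debug = False
    # Hypothetical: the afront package would use ['afront', '-nogui'] here.
    command = [sys.executable]

    def run(self, args):
        cmd = list(self.command) + list(args)
        if self._debug:
            print(subprocess.list2cmdline(cmd))
        # No shell is involved: each list item reaches the program as one argument.
        result = subprocess.call(cmd)
        if result != 0:
            raise RuntimeError("Execution failed: exit status %d" % result)


class Tool(CommandRunMixin):
    pass


Tool().run(["-c", "import sys; sys.exit(0)"])
```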
Temporary File Management¶
Command-line programs typically generate files as outputs. On complicated pipelines, many files get created and passed to other modules. To facilitate the use of files as communication objects, VisTrails provides basic infrastructure for temporary file management. This way, package developers do not have to worry about file ownership and lifetimes.
To use this infrastructure, it must be possible to tell the program being called which filename to use as output. VisTrails can accommodate some filename requirements (in particular, specific filename extensions might be important in Windows environments, and these can be set), but it must be possible to direct the output to a certain filename.
We will use Afront's compute() method to illustrate the feature.
```python
...
class Afront(Module, AfrontRun):

    def compute(self):
        o = self.interpreter.filePool.create_file(suffix='.m')
        args = []
        if not self.has_input("file"):
            raise ModuleError(self, "Needs input file")
        args.append(self.get_input("file").name)
        if self.has_input("rho"):
            args.append("-rho")
            args.append(str(self.get_input("rho")))
        if self.has_input("eta"):
            args.append("-reduction")
            args.append(str(self.get_input("eta")))
        args.append("-outname")
        args.append(o.name)
        args.append("-tri")
        self.run(args)
        self.set_output("output", o)
...
```
Line 5, the first statement of compute(), shows how to create a temporary file during the execution of a pipeline. There are a few new things happening, so let us look at them one at a time. Every module holds a reference to the current interpreter, the object responsible for orchestrating the execution of a pipeline. This object has a filePool, which is what we use to create a temporary file, through the create_file method. This method optionally takes a keyword parameter, suffix, which forces the temporary file that will be created to have the right extension.

The file pool returns an instance of basic_modules.File, a module that is provided by the basic VisTrails packages. There are two important things you should know about File. First, it has a name attribute that stores the name of the file it represents. In this case, it is the name of the recently created temporary file. This allows you to safely use this file when calling a shell, as seen on Line 17, where o.name is passed after -outname.

The other important feature is that it can be passed directly to an output port, so that this file can be used by subsequent modules. This is shown on Line 20, the set_output call.
The above code also introduces the boolean function has_input (see Lines 7, 10, and 13, where the file, rho, and eta ports are checked). This is a simple error-checking function that verifies that the port has incoming data before the program attempts to read from it. It is considered good practice to call this function before invoking get_input for any input port.
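Outside VisTrails, the same pattern (a pool that hands out uniquely named temporary files with a requested suffix and removes them all later) can be sketched with the tempfile module. TempFilePool and its File record are hypothetical stand-ins for interpreter.filePool and basic_modules.File; they model only the name attribute and cleanup, not VisTrails's actual lifetime rules.

```python
import os
import shutil
import tempfile


class File(object):
    """Minimal stand-in for basic_modules.File: it just remembers a path."""

    def __init__(self, name):
        self.name = name


class TempFilePool(object):
    """Creates temporary files in one directory and removes them together."""

    def __init__(self):
        self.directory = tempfile.mkdtemp(prefix="vt_pool_")

    def create_file(self, suffix=""):
        fd, path = tempfile.mkstemp(suffix=suffix, dir=self.directory)
        os.close(fd)  # the external program will write to the path itself
        return File(path)

    def cleanup(self):
        shutil.rmtree(self.directory, ignore_errors=True)


pool = TempFilePool()
o = pool.create_file(suffix=".m")
args = ["-outname", o.name, "-tri"]  # o.name goes on the command line
```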
Accommodating badly-designed programs: Even though it is considered bad design for a command-line program not to allow the specification of an output filename, there do exist programs that lack this functionality. In this case, a possible workaround is to execute the command-line tool, and then move the generated file to the name given by VisTrails.
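A minimal sketch of that workaround, assuming the tool always writes one fixed, known filename (here the hypothetical output.m) into its working directory: run the program in a scratch directory, then move the result to the name VisTrails handed out. run_and_collect is a hypothetical helper, and the fake tool in the demo is a one-line Python script standing in for the real program.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_and_collect(cmd, produced_name, target_path):
    """Run cmd in a scratch directory and move its fixed-name output file."""
    workdir = tempfile.mkdtemp()
    try:
        if subprocess.call(cmd, cwd=workdir) != 0:
            raise RuntimeError("Execution failed")
        shutil.move(os.path.join(workdir, produced_name), target_path)
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
    return target_path


# Demo: a stand-in "tool" that can only write to ./output.m
fake_tool = [sys.executable, "-c",
             "with open('output.m', 'w') as f: f.write('mesh')"]
target = os.path.join(tempfile.mkdtemp(), "result.m")
run_and_collect(fake_tool, "output.m", target)
```

Running each invocation in its own scratch directory also keeps two executions of the same tool from overwriting each other's fixed-name output.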
For System Administrators¶
Most users will want to put their custom packages in their ~/.vistrails/userpackages directory, as described in Section An Example Package. This makes the package available to the current user only. However, if you are a power user or a system administrator, you may wish to make certain packages available to all users of a VisTrails installation. To do this, copy the appropriate package files and/or directories to the vistrails/packages directory of the VisTrails distribution. The packages will be made visible to users the next time they launch VisTrails.
Footnotes
[1] This package is not included in binary distributions of VisTrails, but is available in the source code distribution. The stand-alone Afront utility is available at http://afront.sourceforge.net.
[2] Programmers who are more comfortable with single-inheritance languages like Java and C# may be unfamiliar with mixins. A mixin class is similar to an interface in Java or C#, except that a mixin provides an implementation as well. Python’s support for multiple inheritance allows subclasses to “import” functionality as needed from a mixin class, without artificially cluttering the base class’s interface.