UsersGuideVisTrailsPackages
Latest revision as of 17:55, 28 April 2014
This page might be outdated
Please go to http://www.vistrails.org/usersguide/v2.1/html/packages.html for the most recent version of this page.
Introduction
VisTrails provides a plugin infrastructure to integrate user-defined functions and libraries. Specifically, users can incorporate their own visualization and simulation codes into pipelines by defining custom modules (or wrappers). These modules are bundled in what we call packages. A VisTrails package is simply a collection of Python classes -- each of these classes will represent a new module -- created by the user that respects a certain convention. Here's a simplified example of a very simple user-defined module:
    class Divide(Module):
        def compute(self):
            arg1 = self.getInputFromPort("arg1")
            arg2 = self.getInputFromPort("arg2")
            if arg2 == 0.0:
                raise ModuleError(self, "Division by zero")
            self.setResult("result", arg1 / arg2)

    registry.addModule(Divide)
    registry.addInputPort(Divide, "arg1", (basic.Float, 'dividend'))
    registry.addInputPort(Divide, "arg2", (basic.Float, 'divisor'))
    registry.addOutputPort(Divide, "result", (basic.Float, 'quotient'))
New VisTrails modules must subclass Module, the base class that defines basic functionality. The only required override is the compute() method, which performs the actual module computation. Inputs and outputs are specified through ports, which currently have to be explicitly registered with VisTrails. This is straightforward, and is done through method calls to the module registry. A complete documented example of a (slightly) more complicated module is available here.
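To experiment with the compute() contract outside VisTrails, the following is a minimal sketch that uses stand-in classes. The Module and ModuleError classes below are hypothetical simplifications written for this illustration (ports are modeled as plain dictionaries), and run_divide is an invented helper; none of them is the real VisTrails implementation.

```python
class ModuleError(Exception):
    """Stand-in for the VisTrails ModuleError (hypothetical, simplified)."""
    def __init__(self, module, message):
        Exception.__init__(self, message)
        self.module = module


class Module(object):
    """Stand-in base class: input and output ports are plain dictionaries."""
    def __init__(self, **inputs):
        self.inputs = dict(inputs)
        self.outputs = {}

    def hasInputFromPort(self, name):
        return name in self.inputs

    def getInputFromPort(self, name):
        if name not in self.inputs:
            raise ModuleError(self, "Missing value from port %s" % name)
        return self.inputs[name]

    def setResult(self, name, value):
        self.outputs[name] = value


class Divide(Module):
    """Same compute() body as the example above."""
    def compute(self):
        arg1 = self.getInputFromPort("arg1")
        arg2 = self.getInputFromPort("arg2")
        if arg2 == 0.0:
            raise ModuleError(self, "Division by zero")
        self.setResult("result", arg1 / arg2)


def run_divide(arg1, arg2):
    """Hypothetical helper: feed both input ports, run, read the output port."""
    module = Divide(arg1=arg1, arg2=arg2)
    module.compute()
    return module.outputs["result"]
```

Calling run_divide(6.0, 3.0) reads both input ports, runs compute(), and returns the value set on the "result" port. A zero divisor raises the stand-in ModuleError instead of producing a result.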
Dealing with command line tools and side effects
In an ideal world a module's outputs should be completely determined by its inputs. This is important for provenance purposes - if modules have implicit dependencies, it is not possible to be certain that the same results will be generated when the process is reexecuted.
However, it is clear that certain modules are inherently side-effectful (reading/writing files, network, etc). For the common case of temporary files, VisTrails provides a convenience layer that removes part of the burden of managing temporary files. As an illustrative example, consider one of the packages we make available for image conversion, using the ImageMagick suite:
    class Convert(ImageMagick):
        """Convert is the base Module for VisTrails Modules in the ImageMagick
        package that deal with operations on images. Convert is a bit of a
        misnomer since the 'convert' tool does more than simply file format
        conversion. Each subclass has a descriptive name of the operation it
        implements."""

        def create_output_file(self):
            """Creates a File with the output format given by the
            outputFormat port."""
            if self.hasInputFromPort('outputFormat'):
                s = '.' + self.getInputFromPort('outputFormat')
                return self.interpreter.filePool.create_file(suffix=s)

        def geometry_description(self):
            """returns a string with the description of the geometry as
            indicated by the appropriate ports (geometry or width and height)"""
            # if complete geometry is available, ignore rest
            if self.hasInputFromPort("geometry"):
                return self.getInputFromPort("geometry")
            elif self.hasInputFromPort("width"):
                w = self.getInputFromPort("width")
                h = self.getInputFromPort("height")
                return "'%sx%s'" % (w, h)
            else:
                raise ModuleError(self, "Needs geometry or width/height")

        def run(self, *args):
            """run(*args), runs ImageMagick's 'convert' on a shell, passing all
            arguments to the program."""
            cmdline = ("convert" + (" %s" * len(args))) % args
            if not self.__quiet:
                print cmdline
            r = os.system(cmdline)
            if r != 0:
                raise ModuleError(self, "system call failed: '%s'" % cmdline)

        def compute(self):
            o = self.create_output_file()
            i = self.input_file_description()
            self.run(i, o.name)
            self.setResult("output", o)

    (...)

    reg.addModule(Convert)
    reg.addInputPort(Convert, "geometry", (basic.String, 'ImageMagick geometry'))
    reg.addInputPort(Convert, "width", (basic.String, 'width of the geometry for operation'))
    reg.addInputPort(Convert, "height", (basic.String, 'height of the geometry for operation'))
    reg.addOutputPort(Convert, "output", (basic.File, 'the output file'))
This example introduces several new VisTrails features. The last line of the snippet registers an output port that provides a file. A file output immediately presents several problems when a pipeline is to be shared among users in heterogeneous environments. For example, where should the file be written? For temporary files, VisTrails provides a file pool class that manages temporary files and their lifetimes automatically, so that users don't have to worry about deleting them post-execution. To create a temporary file, a user calls, for example
    fileObj = self.interpreter.filePool.create_file(suffix=".png")
fileObj will then contain a module that represents a file. The file pool simply creates a temporary file with write permissions, whose local name is available, in this case, as fileObj.name. The package developer is then free to use this file for any purpose.
Another feature of this example is the use of command line tools. Notice that Python provides a very convenient way to execute commands through a shell. In this case, we use os.system on a command-line that executes the appropriate program.
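As a rough illustration of the same pattern outside VisTrails, here is a sketch that writes to a temporary file through an external command. It is written for Python 3 and uses only the standard library: tempfile stands in for the VisTrails file pool, and subprocess.run with an argument list is swapped in for os.system so that file names do not need shell quoting. The function names run_tool and write_with_tool are invented for this example, and the Python interpreter itself plays the role of the command line tool.

```python
import os
import subprocess
import sys
import tempfile


def run_tool(args):
    """Run a command given as a list of arguments, without a shell.

    Raises RuntimeError when the command exits with a non-zero status,
    mirroring the 'system call failed' check in the Convert example.
    """
    result = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        raise RuntimeError("command failed (%d): %r" % (result.returncode, args))
    return result.stdout


def write_with_tool(text, suffix=".txt"):
    """Create a temporary file and let an external command fill it in.

    Unlike the VisTrails file pool, nothing deletes this file automatically;
    the caller has to remove it.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    script = "import sys; open(sys.argv[1], 'w').write(sys.argv[2])"
    run_tool([sys.executable, "-c", script, path, text])
    return path
```

The caller owns the returned path and should remove it with os.remove when done, which is exactly the bookkeeping the file pool is meant to take off the package developer's hands.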
Interaction with Caching
VisTrails provides a caching mechanism, in which portions of pipelines that are common across different executions are automatically shared. However, some modules are intrinsically side-effectful (writing a report to stdout, or a file to disk, or creating a user interface widget), and should not be shared. Caching control is therefore up to the package developer. By default, caching is enabled. So a developer that doesn't want caching to apply must make small changes to the module. There's a convenient way to disable caching entirely, by using multiple inheritance, and deriving from a mixin class that's provided by VisTrails. For example, look at the StandardOutput module:
    from core.modules.vistrails_module import Module, newModule, \
        NotCacheable, ModuleError

    (...)

    class StandardOutput(NotCacheable, Module):
        """StandardOutput is a VisTrails Module that simply prints the
        value connected on its port to standard output. It is intended
        mostly as a debugging device."""

        def compute(self):
            v = self.getInputFromPort("value")
            print v
By subclassing from NotCacheable as well as from Module (or one of its subclasses), VisTrails automatically will not cache this module, or anything downstream from it.
VisTrails also allows a more sophisticated decision on whether to use caching or not. To do that, a user simply overrides the method is_cacheable to return the correct value. This allows context-dependent decisions. For example, in the teem package, there's a module that generates a scalar field with random numbers. This is non-deterministic, so it shouldn't be cached. However, this module only generates non-deterministic values on special occasions, depending on its input port values. To stay efficient when caching is possible, while still maintaining correctness, that module implements the following override:
    class Unu1op(Unu):
        (...)
        def is_cacheable(self):
            return not self.getInputFromPort('op') in ['rand', 'nrand']
        (...)
Notice that the module explicitly uses inputs to decide whether it should be cached. This allows reasonably fine-grained control over the process.
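The two styles of caching control can be compared side by side in a small sketch. The Module and NotCacheable classes below are hypothetical stand-ins written for this illustration (the real VisTrails classes do much more), and RandomOrNot is an invented module that mimics the Unu1op override.

```python
class Module(object):
    """Stand-in base class (hypothetical): cacheable by default."""
    def __init__(self, **inputs):
        self.inputs = dict(inputs)

    def getInputFromPort(self, name):
        return self.inputs[name]

    def is_cacheable(self):
        return True


class NotCacheable(object):
    """Stand-in mixin: listed before Module so its method wins the lookup."""
    def is_cacheable(self):
        return False


class StandardOutput(NotCacheable, Module):
    """Never cached, because the mixin comes first in the base class list."""
    pass


class RandomOrNot(Module):
    """Cached unless the requested operation is a random one."""
    def is_cacheable(self):
        return self.getInputFromPort("op") not in ("rand", "nrand")
```

The order of the base classes matters in this sketch: with (Module, NotCacheable) the default implementation would be found first and the mixin would have no effect.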
Interaction with Other Packages
When developing more complicated packages, it becomes natural to split code among different VisTrails packages, and have one logically depend on the other. For example, in one package (say, named 'package_base'), a user might define
    class PackageBaseModule(Module):
        ...

    def initialize():
        registry.addModule(PackageBaseModule)
        ...
And then, in another package (say, 'package_derived'),
    class DerivedModule(PackageBaseModule):
        ...

    def initialize():
        registry.addModule(DerivedModule)
        ...
Because of the way packages are loaded, package_derived cannot be initialized before package_base. VisTrails provides a mechanism for specifying interpackage dependencies. Every VisTrails package can provide a list of necessary installed packages. This is done by providing a callable in the package under the name package_dependencies. For example, here's how the VTK VisTrails package declares dependencies:
    def package_dependencies():
        import core.packagemanager
        manager = core.packagemanager.get_package_manager()
        if manager.has_package('spreadsheet'):
            return ['spreadsheet']
        else:
            return []
The callable must return a list of strings, representing the names of the packages it depends on. We also use this example to introduce the package manager API, which is useful here for inspecting the packages present in the system. Notice that the dependencies are not static: vtk depends on spreadsheet if and only if spreadsheet is present in the system. Otherwise, it has no dependencies.
Note: Circular dependencies are not allowed. They will be detected by VisTrails and an error will be signalled.
Note: Currently, package names are reasonably brittle, in the sense that conflicts in package naming might become an issue. We are in the process of designing an API that will allow more robust naming schemes.
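To make the ordering constraint and the circular-dependency check concrete, here is one way such a check could be written. This is a generic depth-first sketch, not the algorithm VisTrails actually uses; the function name load_order and the dictionary format are assumptions made for this example.

```python
def load_order(dependencies):
    """Return package names ordered so each package follows its dependencies.

    `dependencies` maps a package name to the list of names it depends on,
    i.e. what each package's package_dependencies() callable would return.
    Raises ValueError when a circular dependency is found.
    """
    order = []
    state = {}  # package name -> "visiting" or "done"

    def visit(name, path):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            # We came back to a package that is still being processed.
            cycle = path[path.index(name):] + [name]
            raise ValueError("circular dependency: " + " -> ".join(cycle))
        state[name] = "visiting"
        for dep in dependencies.get(name, []):
            visit(dep, path + [name])
        state[name] = "done"
        order.append(name)

    for name in sorted(dependencies):
        visit(name, [])
    return order
```

For the two packages above, load_order({"package_derived": ["package_base"], "package_base": []}) places package_base before package_derived, while two packages that list each other raise ValueError.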
User-defined module shapes and colors
VisTrails allows users to define custom colors and shapes for modules. This must be done at module registration time, by passing special parameters to addModule. For example:
    reg.addModule(Afront, moduleColor=(1.0,0.0,0.0),
                  moduleFringe=[(0.0, 0.0),
                                (0.2, 0.0),
                                (0.2, 0.4),
                                (0.0, 0.4),
                                (0.0, 1.0)])
gives this result:
This piece of code
    reg = core.modules.module_registry
    reg.addModule(Afront, moduleColor=(0.4,0.6,0.8),
                  moduleFringe=[(0.0, 0.0),
                                (0.2, 0.0),
                                (0.0, 0.2),
                                (0.2, 0.4),
                                (0.0, 0.6),
                                (0.2, 0.8),
                                (0.0, 1.0)])
gives this result:
The moduleColor parameter must be a tuple of three floats between 0 and 1 that specify the RGB color of the module background. The moduleFringe parameter is a list of pairs of floats that specify points as they go around a side of the module. The same list is used to go from the top-right corner to the bottom-right corner, and from the bottom-left corner to the top-left one. If this is not enough, let the developers know!
Alternatively, you can use different fringes for the left and right borders:
    reg.addModule(Afront, moduleColor=(1.0,0.8,0.6),
                  moduleLeftFringe=[(0.0, 0.0),
                                    (-0.2, 0.0),
                                    (0.0, 1.0)],
                  moduleRightFringe=[(0.0, 0.0),
                                     (0.2, 1.0),
                                     (0.0, 1.0)])
which gives this:
How to make your package reloadable
You need to move almost everything in the __init__.py file to a new file "init.py", but keep the identifier, name, version, configuration, and package_dependencies fields/methods in the __init__.py file. Specifically, make sure that imports (excluding things like core.configuration) and the initialize method are in the init.py file. For example, take a look at the __init__.py file of the pylab package included in VisTrails:
    identifier = 'edu.utah.sci.vistrails.matplotlib'
    name = 'matplotlib'
    version = '0.9.0'

    def package_dependencies():
        import core.packagemanager
        manager = core.packagemanager.get_package_manager()
        if manager.has_package('edu.utah.sci.vistrails.spreadsheet'):
            return ['edu.utah.sci.vistrails.spreadsheet']
        else:
            return []

    def package_requirements():
        import core.requirements
        if not core.requirements.python_module_exists('matplotlib'):
            raise core.requirements.MissingRequirement('matplotlib')
        if not core.requirements.python_module_exists('pylab'):
            raise core.requirements.MissingRequirement('pylab')
And the init.py contains the other imports, class definitions and the initialize method.
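The package_requirements function above checks for Python modules before the package is loaded. The sketch below shows how a comparable check could be written with the Python 3 standard library alone. The names python_module_exists and MissingRequirement are borrowed from the example, but these are stand-in implementations (based on importlib.util.find_spec), not the code in core.requirements, and the require helper is invented for this illustration.

```python
import importlib.util


class MissingRequirement(Exception):
    """Stand-in for core.requirements.MissingRequirement (hypothetical)."""
    def __init__(self, requirement):
        Exception.__init__(self, "Missing requirement: %s" % requirement)
        self.requirement = requirement


def python_module_exists(name):
    """Return True if `name` can be located, without importing the module itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises for some malformed or partially initialized modules.
        return False


def require(*names):
    """Raise MissingRequirement for the first module that cannot be found."""
    for name in names:
        if not python_module_exists(name):
            raise MissingRequirement(name)
```

With these stand-ins, the body of package_requirements would reduce to a single call such as require('matplotlib', 'pylab').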
Adding default values and/or labels for parameters
In versions 1.4 and greater, package developers can add labels and default values for parameters. To add this functionality, you need to use the defaults and labels keyword arguments and pass the values as strings. For example,
    class TestDefaults(Module):
        _input_ports = [("f1", "(edu.utah.sci.vistrails.basic:Float)",
                         {"defaults": str([1.23]), "labels": str(["temp"])})]

    _modules = [TestDefaults]
or in the older syntax,
    def initialize():
        reg = core.modules.module_registry.get_module_registry()
        reg.add_module(TestDefaults2)
        reg.add_input_port(TestDefaults2, "f2", [Float, String],
                           defaults=str([4.56, "abc"]),
                           labels=str(["temp", "name"]))
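Because the defaults and labels are passed as strings, it can help to see what str() produces for a list and how such a string can be turned back into values. The sketch below uses ast.literal_eval for the reverse step; this is an assumption chosen for illustration, not necessarily how VisTrails reads these strings, and both helper names are invented.

```python
import ast


def encode_port_values(values):
    """Turn a list of defaults or labels into the string form used above."""
    return str(list(values))


def decode_port_values(text):
    """Parse the string form back into a list (illustrative only).

    literal_eval accepts only Python literals, so it does not execute code.
    """
    return ast.literal_eval(text)


defaults = encode_port_values([4.56, "abc"])
labels = encode_port_values(["temp", "name"])
```

One entry is given per parameter of the port, so the f2 port above, which takes a Float and a String, gets two defaults and two labels.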
Packages that generate modules dynamically
When wrapping existing libraries or trying to generate modules in a more procedural manner, it is useful to generate modules dynamically. In our work, we have created some shortcuts to make this easier. In addition, the list of modules can be based on the package configuration. Here is some example code:
__init__.py
    from core.configuration import ConfigurationObject

    identifier = "edu.utah.sci.dakoop.auto_example"
    version = "0.0.1"
    name = "AutoExample"

    configuration = ConfigurationObject(use_b=True)
init.py
The expand_ports and build_modules functions are helpers that make constructing the modules easier. The key parts are the new_module call and setting the _modules variable.
    from core.modules.vistrails_module import new_module, Module

    identifier = "edu.utah.sci.dakoop.auto_example"

    def expand_ports(port_list):
        new_port_list = []
        for port in port_list:
            port_spec = port[1]
            if type(port_spec) == str: # or unicode...
                if port_spec.startswith('('):
                    port_spec = port_spec[1:]
                if port_spec.endswith(')'):
                    port_spec = port_spec[:-1]
                new_spec_list = []
                for spec in port_spec.split(','):
                    spec = spec.strip()
                    parts = spec.split(':', 1)
                    print 'parts:', parts
                    namespace = None
                    if len(parts) > 1:
                        mod_parts = parts[1].rsplit('|', 1)
                        if len(mod_parts) > 1:
                            namespace, module_name = mod_parts
                        else:
                            module_name = parts[1]
                        if len(parts[0].split('.')) == 1:
                            id_str = 'edu.utah.sci.vistrails.' + parts[0]
                        else:
                            id_str = parts[0]
                    else:
                        mod_parts = spec.rsplit('|', 1)
                        if len(mod_parts) > 1:
                            namespace, module_name = mod_parts
                        else:
                            module_name = spec
                        id_str = identifier
                    if namespace:
                        new_spec_list.append(id_str + ':' + module_name + ':' + \
                                                 namespace)
                    else:
                        new_spec_list.append(id_str + ':' + module_name)
                port_spec = '(' + ','.join(new_spec_list) + ')'
            new_port_list.append((port[0], port_spec) + port[2:])
        print new_port_list
        return new_port_list

    def build_modules(module_descs):
        new_classes = {}
        for m_name, m_dict in module_descs:
            m_doc = m_dict.get("_doc", None)
            m_inputs = m_dict.get("_inputs", [])
            m_outputs = m_dict.get("_outputs", [])
            if "_inputs" in m_dict:
                del m_dict["_inputs"]
            if "_outputs" in m_dict:
                del m_dict["_outputs"]
            if "_doc" in m_dict:
                del m_dict["_doc"]
            klass_dict = {}
            if "_compute" in m_dict:
                klass_dict["compute"] = m_dict["_compute"]
                del m_dict["_compute"]
            m_class = new_module(Module, m_name, klass_dict, m_doc)
            m_class._input_ports = expand_ports(m_inputs)
            m_class._output_ports = expand_ports(m_outputs)
            new_classes[m_name] = (m_class, m_dict)
        return new_classes.values()

    def initialize():
        global _modules

        def a_compute(self):
            a = self.getInputFromPort("a")
            i = 0
            if self.hasInputFromPort("i"):
                i = self.getInputFromPort("i")
            if a == "abc":
                i += 100
            self.setResult("b", i)

        module_descs = [("ModuleA", {"_inputs": [("a", "basic:String")],
                                     "_outputs": [("b", "basic:Integer")],
                                     "_doc": "ModuleA documentation",
                                     "_compute": a_compute,
                                     "namespace": "Test"}),
                        ("ModuleB", {"_inputs": [("a", "Test|ModuleA")],
                                     "_outputs": [("b", "Test|ModuleA")],
                                     "_doc": "ModuleB documentation"})
                        ]
        if configuration.use_b:
            _modules = build_modules(module_descs)
        else:
            _modules = build_modules(module_descs[:1])

    _modules = []
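The core idea behind new_module, creating a class at run time from a name, a base class and a dictionary of attributes, can be tried in isolation. The sketch below uses Python's built-in type() for that step. The Module class is a stand-in, and this new_module is an assumed approximation of what the VisTrails function does, not its actual source; build is a stripped-down, hypothetical version of build_modules without port expansion.

```python
class Module(object):
    """Stand-in for the VisTrails base class (hypothetical)."""
    _input_ports = []
    _output_ports = []


def new_module(base, name, attrs=None, docstring=None):
    """Create a subclass of `base` at run time (assumed behaviour)."""
    class_dict = dict(attrs or {})
    if docstring is not None:
        class_dict["__doc__"] = docstring
    return type(name, (base,), class_dict)


def build(descriptions):
    """Turn (name, description) pairs into classes, in the given order."""
    classes = []
    for name, desc in descriptions:
        attrs = {}
        if "_compute" in desc:
            attrs["compute"] = desc["_compute"]
        cls = new_module(Module, name, attrs, desc.get("_doc"))
        # Assign per-class lists so the base class defaults stay untouched.
        cls._input_ports = list(desc.get("_inputs", []))
        cls._output_ports = list(desc.get("_outputs", []))
        classes.append(cls)
    return classes


def a_compute(self):
    """Placeholder compute body that only reports which class it runs in."""
    return "computed by " + type(self).__name__


classes = build([
    ("ModuleA", {"_inputs": [("a", "basic:String")],
                 "_doc": "ModuleA documentation",
                 "_compute": a_compute}),
    ("ModuleB", {"_doc": "ModuleB documentation"}),
])
```

Each generated class behaves like a hand-written subclass: the function passed under "_compute" becomes an ordinary method, and the docstring is attached as __doc__.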
Help! This documentation wasn't good enough!
Sorry, it's our fault! If you need help, join the vistrails-users list and post your question there.