NASA Jet Propulsion Laboratory, California Institute of Technology

User Guide

This is the user guide for the OODT Catalog and Archive Service (CAS) Workflow Manager component, or Workflow Manager for short. This guide explains the Workflow Manager architecture including its extension points. The guide also discusses available services provided by the Workflow Manager, how to utilize them, and the different APIs that exist. The guide concludes with a description of Workflow Manager use cases.

Architecture

The Workflow Manager component is responsible for the description, execution, and monitoring of Workflows, using a client-server system. Workflows are typically considered to be sequences of tasks, joined together by control flow and data flow, that must execute in some ordered fashion. Workflows typically generate output data, perform routine management tasks (such as sending email), or describe a business's internal routine practices. The Workflow Manager is an extensible software component that provides an XML-RPC external interface and a fully tailorable Java-based API for workflow management. The critical objects managed by the Workflow Manager include:

  • Events - trigger the execution of Workflows. Events are named and contain dynamic Metadata passed in by the user.
  • Metadata - a dynamic set of properties and values, provided to a WorkflowInstance via a user-triggered Event.
  • Workflow - a description of both the control flow and the data flow of a sequence of tasks (or stages) that must be executed in some order.
  • Workflow Instance - an instance of a Workflow, typically containing additional runtime descriptive information, such as start time, end time, task wall clock time, etc. A WorkflowInstance also contains a shared Metadata context, passed in by the user who triggered the Workflow. This context can be read from and written to by the WorkflowTasks in the Workflow.
  • Workflow Tasks - descriptions of data flow, and of an underlying process, or stage, that is part of a Workflow.
  • Workflow Task Instances - the actual executing code, or process, that performs the work in the Workflow Task.
  • Workflow Task Configuration - static configuration properties that configure a WorkflowTask.
  • Workflow Conditions - any pre- (or post-) conditions on the execution of a WorkflowTask.
  • Workflow Condition Instances - the actual executing code, or process, that performs the work in the Workflow Condition.

Each Event kicks off one or more Workflow Instances, providing a Metadata context (submitted by an external user). Each Workflow Instance is a run-time execution model of a Workflow. Each Workflow contains one or more Workflow Tasks. Each Workflow Task contains a single Workflow Task Configuration and zero or more Workflow Conditions. Each Workflow Task has a corresponding Workflow Task Instance (that it models), just as each Workflow Condition has a corresponding Workflow Condition Instance. These relationships are shown in the figure below.

[Figure: Workflow Manager Object Model]
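To make these relationships concrete, here is a minimal Java sketch of the object model using records. The type names mirror the concepts above but are hypothetical; they are not the actual OODT classes, and the task-instance class names and the "user" metadata key are illustrative placeholders.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified rendering of the Workflow Manager object model.
// These are NOT the real OODT types; they only mirror the relationships above.
public class ObjectModelSketch {

    // Static configuration properties for a task.
    record TaskConfiguration(Map<String, String> properties) {}

    // A pre- (or post-) condition; the class name stands in for the
    // Workflow Condition Instance that actually runs.
    record WorkflowCondition(String name, String conditionInstanceClass) {}

    // A task: one configuration, its conditions (possibly none), and the
    // class name of the Workflow Task Instance that performs the work.
    record WorkflowTask(String name,
                        TaskConfiguration config,
                        List<WorkflowCondition> conditions,
                        String taskInstanceClass) {}

    // A workflow: an ordered list of one or more tasks.
    record Workflow(String name, List<WorkflowTask> tasks) {
        Workflow {
            if (tasks.isEmpty()) {
                throw new IllegalArgumentException("a workflow needs at least one task");
            }
            tasks = List.copyOf(tasks); // defensive, immutable copy
        }
    }

    // An event: a name plus the user-supplied metadata.
    record Event(String name, Map<String, String> metadata) {}

    // A run-time instance of a workflow, carrying the shared metadata context.
    record WorkflowInstance(String id, Workflow workflow, Map<String, String> sharedContext) {}

    public static void main(String[] args) {
        var hello = new WorkflowTask("Hello World",
                new TaskConfiguration(Map.of()), List.of(), "example.HelloTask");
        var goodbye = new WorkflowTask("Goodbye World",
                new TaskConfiguration(Map.of()), List.of(), "example.GoodbyeTask");
        var workflow = new Workflow("testWorkflow", List.of(hello, goodbye));
        var event = new Event("test", Map.of("user", "Chris"));

        // The event's metadata becomes the instance's shared context.
        var instance = new WorkflowInstance("1", workflow, event.metadata());
        System.out.println(instance.workflow().name() + " has "
                + instance.workflow().tasks().size() + " tasks"); // testWorkflow has 2 tasks
    }
}
```

Referring to task and condition code by class name keeps the abstract model (what a Workflow Repository would store) separate from the code that actually executes.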

Extension Points

There are several extension points for the Workflow Manager. An extension point is an interface within the Workflow Manager that can have many implementations. This is particularly useful for software component configuration because it allows different implementations of an existing interface to be selected at deployment time. For example, the Workflow Manager may use a Database-based Workflow Instance Repository together with an XML-based Workflow Repository (to store workflow descriptions), or it may use a Lucene-based Workflow Instance Repository together with a Database-based Workflow Repository. The selection of the actual component implementations is handled entirely by the extension point mechanism. Using extension points, it is fairly simple to support what are typically referred to as “plug-in architectures”. Each of the core extension points for the Workflow Manager is described below:

  • Workflow Instance Repository - The Workflow Instance Repository extension point is responsible for storing all the instance data for Workflow Instances, including shared context metadata and runtime properties such as start date/time, end date/time, and task start/end date/time.
  • Workflow Repository - The Workflow Repository extension point is responsible for managing Workflow models, which store control flow, and Workflow Tasks, which model data flow. The Workflow Repository also stores Workflow Condition information and Workflow Task Configuration. In essence, the Workflow Repository is a repository of abstract Workflow models that get turned into Workflow Instances by the Engine extension point.
  • Workflow Engine - The Workflow Engine's responsibility is to turn abstract Workflow models into executing Workflow Instances. The Workflow Engine tracks and monitors the execution of Workflow Instances, and provides the ability to start, stop, and pause executing Workflow Instances.
  • System - The extension point that provides the external interface to the Workflow Manager services. This includes the Workflow Manager server interface, as well as the associated Workflow Manager client interface that communicates with the server.
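A minimal sketch of how these extension points might fit together, assuming hypothetical interfaces named WorkflowRepository, WorkflowInstanceRepository, and WorkflowEngine, plus a static handleEvent method standing in for the System extension point. These are not the real OODT interfaces; the point is that the System code depends only on the interfaces, so any implementation can be swapped in.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical interfaces mirroring the extension points; the real OODT
// interfaces have different names and much richer signatures.
public class ExtensionPointSketch {

    record WorkflowInstance(String id, String workflowName, String status) {}

    // Workflow Repository: abstract workflow models, looked up by event name.
    interface WorkflowRepository {
        List<String> workflowsForEvent(String eventName);
    }

    // Workflow Instance Repository: persisted run-time instance data.
    interface WorkflowInstanceRepository {
        void save(WorkflowInstance instance);
        Optional<WorkflowInstance> find(String id);
    }

    // Workflow Engine: turns a model into a running instance.
    interface WorkflowEngine {
        WorkflowInstance start(String workflowName);
    }

    // In-memory implementations; any other implementation of the same
    // interfaces could be substituted without touching handleEvent.
    static final class MapWorkflowRepository implements WorkflowRepository {
        private final Map<String, List<String>> byEvent;
        MapWorkflowRepository(Map<String, List<String>> byEvent) { this.byEvent = byEvent; }
        public List<String> workflowsForEvent(String eventName) {
            return byEvent.getOrDefault(eventName, List.of());
        }
    }

    static final class MemoryInstanceRepository implements WorkflowInstanceRepository {
        private final Map<String, WorkflowInstance> store = new ConcurrentHashMap<>();
        public void save(WorkflowInstance i) { store.put(i.id(), i); }
        public Optional<WorkflowInstance> find(String id) { return Optional.ofNullable(store.get(id)); }
    }

    static final class SimpleEngine implements WorkflowEngine {
        private final WorkflowInstanceRepository instances;
        private final AtomicLong nextId = new AtomicLong(1);
        SimpleEngine(WorkflowInstanceRepository instances) { this.instances = instances; }
        public WorkflowInstance start(String workflowName) {
            var inst = new WorkflowInstance(
                    String.valueOf(nextId.getAndIncrement()), workflowName, "STARTED");
            instances.save(inst); // the engine records every instance it creates
            return inst;
        }
    }

    // Stand-in for the System extension point (XML-RPC in the real component).
    static List<WorkflowInstance> handleEvent(String eventName,
                                              WorkflowRepository repo, WorkflowEngine engine) {
        return repo.workflowsForEvent(eventName).stream().map(engine::start).toList();
    }

    public static void main(String[] args) {
        var instances = new MemoryInstanceRepository();
        var engine = new SimpleEngine(instances);
        var repo = new MapWorkflowRepository(Map.of("test", List.of("testWorkflow")));
        System.out.println(handleEvent("test", repo, engine));
        // [WorkflowInstance[id=1, workflowName=testWorkflow, status=STARTED]]
        System.out.println(instances.find("1").isPresent()); // true
    }
}
```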

The relationships between the extension points for the Workflow Manager are shown in the figure below.

[Figure: Workflow Manager Extension Points]

Key Capabilities

The Workflow Manager is responsible for providing the necessary key capabilities for managing processing pipelines, data flow, and control flow. Each high level capability provided by the Workflow Manager is detailed below:

  1. Support for representation of Workflow as a directed graph, allowing for true parallelism.
  2. Support for well-known workflow patterns, especially control-flow patterns.
  3. Support for capturing data flow.
  4. Support for persistence of Workflow Instances to several backend repositories, including relational databases and Apache Lucene flat-file indices.
  5. Representation of Workflow models as XML documents.
  6. Scalability – The Workflow Manager uses the popular client-server paradigm, allowing new Workflow Manager servers to be instantiated, as needed, without affecting the Workflow Manager clients, and vice-versa.
  7. Communication over lightweight, standard protocols – The Workflow Manager uses XML-RPC, as its main external interface, between Workflow Manager client and server. XML-RPC, the little brother of SOAP, is fast, extensible, and uses the underlying HTTP protocol for data transfer.
  8. Event-driven Workflow execution, including arbitrary Metadata parameters, provided as a shared context between stages of the executing Workflow.
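Capability 1 can be illustrated with a short sketch: if the tasks of a workflow form a directed acyclic graph, Kahn's topological sort can group them into “waves” in which every task's predecessors have already finished, so all tasks within a wave could run in parallel. This is a generic algorithm chosen for illustration; it is not necessarily how the Workflow Manager schedules tasks, and the task names are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Generic sketch: group DAG nodes into waves of mutually independent tasks
// using Kahn's topological sort. Not the Workflow Manager's own scheduler.
public class ParallelWavesSketch {

    // edges: task -> tasks that depend on it (must run after it)
    static List<List<String>> waves(Map<String, List<String>> edges) {
        // Count incoming edges for every task (TreeMap gives stable ordering).
        Map<String, Integer> inDegree = new TreeMap<>();
        for (var e : edges.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String next : e.getValue()) inDegree.merge(next, 1, Integer::sum);
        }
        List<String> ready = new ArrayList<>();
        for (var e : inDegree.entrySet()) if (e.getValue() == 0) ready.add(e.getKey());

        List<List<String>> result = new ArrayList<>();
        int scheduled = 0;
        while (!ready.isEmpty()) {
            result.add(ready); // every task in this wave could run concurrently
            scheduled += ready.size();
            List<String> nextWave = new ArrayList<>();
            for (String task : ready) {
                for (String next : edges.getOrDefault(task, List.of())) {
                    // A successor becomes ready once all its predecessors are done.
                    if (inDegree.merge(next, -1, Integer::sum) == 0) nextWave.add(next);
                }
            }
            Collections.sort(nextWave);
            ready = nextWave;
        }
        if (scheduled != inDegree.size()) throw new IllegalArgumentException("cycle detected");
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "fetch", List.of("calibrate", "index"),
                "calibrate", List.of("publish"),
                "index", List.of("publish"));
        System.out.println(waves(graph)); // [[fetch], [calibrate, index], [publish]]
    }
}
```

Each wave could, for example, be submitted to an executor as a batch; the cycle check matters because a workflow graph with a cycle has no valid execution order.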

This capability set is not exhaustive, and is meant to give the user a “feel” for what general features are provided by the Workflow Manager. Most likely the user will find that the Workflow Manager provides many other capabilities besides those described here.

Current Extension Point Implementations

There are at least two implementations of each of the aforementioned extension points, with the exception of the Workflow Engine, whose single implementation, the ThreadPoolWorkflowEngine, is itself meant to serve as an extension point. Each extension point implementation is detailed below:

  • Workflow Instance Repository

    1. Data Source based Workflow Instance Repository – an implementation of the Workflow Instance Repository extension point interface that uses a JDBC accessible database backend.
    2. Lucene based Workflow Instance Repository – an implementation of the Workflow Instance Repository extension point interface that uses the Lucene free text index system to store Workflow Instance information.
    3. Memory based Workflow Instance Repository - an implementation of the Workflow Instance Repository extension point interface that stores Workflow Instance information in runtime memory.
  • Workflow Repository

    1. Data Source based Workflow Repository – an implementation of the Workflow Repository extension point that stores Workflow model information in a JDBC accessible database.
    2. XML based Workflow Repository – an implementation of the Workflow Repository extension point that stores Workflow model information in XML files ending in *.workflow.xml, as well as files named tasks.xml, conditions.xml, and events.xml.
  • Workflow Engine

    1. ThreadPoolWorkflowEngine - an implementation of the Workflow Engine that is itself meant to be an extension point for WorkflowEngines that want to use thread pooling. This WorkflowEngine provides everything needed to manage a thread pool, using Doug Lea's java.util.concurrent package, which was incorporated into JDK 5.
  • System (Workflow Manager client and Workflow Manager server)

    1. XML-RPC based Workflow Manager server – an implementation of the external server interface for the Workflow Manager that uses XML-RPC as the transportation medium.
    2. XML-RPC based Workflow Manager client – an implementation of the client interface for the XML-RPC Workflow Manager server that uses XML-RPC as the transportation medium.
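The thread-pool idea behind the ThreadPoolWorkflowEngine can be sketched with java.util.concurrent: each Workflow Instance is submitted to a fixed-size ExecutorService, and the submitted job steps through that instance's tasks in order while recording start and end times. This is a simplified, hypothetical engine (the Task and Instance types and the "user" metadata key are invented here), not the OODT implementation, and it omits pausing and stopping.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical thread-pool engine sketch; not the OODT ThreadPoolWorkflowEngine.
public class ThreadPoolEngineSketch {

    // A task reads and writes the shared metadata context of its instance.
    interface Task extends Consumer<Map<String, String>> {}

    static final class Instance {
        final String id;
        final Map<String, String> context = new ConcurrentHashMap<>();
        volatile Instant start, end;
        volatile String status = "QUEUED";
        Instance(String id, Map<String, String> initial) { this.id = id; context.putAll(initial); }
    }

    private final ExecutorService pool;

    ThreadPoolEngineSketch(int threads) { pool = Executors.newFixedThreadPool(threads); }

    // Each instance gets its own job that steps through the tasks in order;
    // different instances run concurrently on the pool's threads.
    Future<Instance> submit(Instance inst, List<Task> tasks) {
        return pool.submit(() -> {
            inst.start = Instant.now();
            inst.status = "RUNNING";
            for (Task t : tasks) t.accept(inst.context);
            inst.end = Instant.now();
            inst.status = "FINISHED";
            return inst;
        });
    }

    void shutdown() { pool.shutdown(); }

    public static void main(String[] args) throws Exception {
        var engine = new ThreadPoolEngineSketch(2);
        List<Task> tasks = List.of(
                ctx -> ctx.put("greeting", "Hello World: " + ctx.get("user")),
                ctx -> ctx.put("farewell", "Goodbye World: " + ctx.get("user")));
        var f1 = engine.submit(new Instance("1", Map.of("user", "Chris")), tasks);
        var f2 = engine.submit(new Instance("2", Map.of("user", "Dana")), tasks);
        System.out.println(f1.get().context.get("farewell")); // Goodbye World: Chris
        System.out.println(f2.get().status);                  // FINISHED
        engine.shutdown();
    }
}
```

A fixed-size pool bounds how many Workflow Instances execute at once, while a concurrent map lets tasks share metadata safely.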

Configuration and Installation

To install the Workflow Manager, you need to download a release of the Workflow Manager, available from its home web site. For bleeding-edge features, you can also check out the cas-workflow trunk project from the OODT Subversion repository. You can browse the repository using ViewCVS at http://oodt.jpl.nasa.gov/vc/svn/, and the actual web URL for the repository is http://oodt.jpl.nasa.gov/repo/. To check out the Workflow Manager, use your favorite Subversion client. Several clients are listed at http://oodt.jpl.nasa.gov/wiki/display/oodt/Subversion.

Project Organization

The cas-workflow project follows the traditional Subversion-style trunk, tags, and branches layout. Trunk corresponds to the latest and greatest development of cas-workflow. Tags are official release versions of the project. Branches correspond to deviations from the trunk large enough to warrant a separate development tree.

For the purposes of this User Guide, we'll assume you have already downloaded a built release of the Workflow Manager from its web site. If you were building cas-workflow from the trunk, a tagged release, or a branch, the process would be quite similar. To build cas-workflow, you need the Apache Maven software. Maven is an XML-based project management system similar to Apache Ant, but with many extra bells and whistles. Maven makes cross-platform project development a snap. You can download Maven from http://maven.apache.org. The cas-workflow project is constructed to be compatible with the 1.x.x series of Maven, despite Maven 2.x having already been released. Once you have Maven installed, follow the procedures in the sections below to build a fresh copy of the Workflow Manager:

Building the Workflow Manager

  1. cd to cas-workflow, and then type:
    # maven dist
    This will perform several tasks, including compiling the source code, downloading required jar files, running unit tests, and so on. When the command completes, cd to the target/distributions directory within cas-workflow. This will contain the build of the Workflow Manager component, of the following form:
                cas-workflow-${version}.tar.gz
               
    This is a distribution tarball that you would copy to a deployment directory, such as /usr/local/, and then unpack (# tar xvzf cas-workflow-${version}.tar.gz). The resulting directory layout from the unpacked tarball is as follows:
                bin/ etc/ logs/ docs/ lib/ policy/ LICENSE.txt CHANGES.txt
               
    • bin - contains the "wmgr" server script, and the "wmgr-client" client script.
    • etc - contains the logging.properties file for the Workflow Manager, and the workflow.properties file used to configure the server options.
    • logs - the default directory for log files to be written to.
    • docs - contains Javadoc documentation, and user guides for using the particular CAS component.
    • lib - the required Java jar files to run the Workflow Manager.
    • policy - the default XML-based workflow policy files (workflow, task, condition, and event definitions), in case the user is using the XML Workflow Repository along with the ThreadPoolWorkflowEngine.
    • CHANGES.txt - lists the changes present in this released version of the CAS component.
    • LICENSE.txt - the LICENSE for the Workflow Manager project.

Deploying the Workflow Manager

To deploy the Workflow Manager, you'll need to create an installation directory. Typically this would be somewhere in /usr/local (on *nix-style systems) or C:\Program Files\ (on Windows-style systems). We'll assume that you're installing on a *nix-style system, though the Windows instructions are quite similar.

Follow the process below to deploy the Workflow Manager:

  1. Copy the binary distribution to the deployment directory
    # cp -R cas-workflow/trunk/target/distributions/cas-workflow-${version}.tar.gz /usr/local/
  2. Untar the distribution
    # cd /usr/local ; tar xvzf cas-workflow-${version}.tar.gz
  3. Set up a symlink
    # ln -s /usr/local/cas-workflow-${version} /usr/local/workflow
  4. edit /usr/local/workflow/bin/wmgr
    • Set the SERVER_PORT variable to the desired port you'd like to run the Workflow Manager server on.
    • Set the JAVA_HOME variable to point to the location of your installed JRE runtime.
    • Set the RUN_HOME variable to point to the location you'd like the Workflow Manager PID file written to. Typically this should default to /var/run, but not all system administrators allow users to write to /var/run.
  5. edit /usr/local/workflow/bin/wmgr-client
    • Set the JAVA_HOME variable to point to the location of your installed JRE runtime.
  6. (optional) edit /usr/local/workflow/etc/logging.properties
    • Set the logging levels for each subsystem to the desired level. The system defaults are conservative and suppress most console logging below the INFO level.
  7. edit /usr/local/workflow/etc/workflow.properties
    • This Java properties file contains all of the default properties used to configure the Workflow Manager. By default, the Workflow Manager is built to use the XML-based Workflow Repository, the ThreadPoolWorkflowEngine, and the Lucene Workflow Instance Repository extension points. These defaults can be changed quite easily by changing the factory class that each extension point's property points to. For example, to use the DataSource-based Workflow Instance Repository extension point, you would change the property workflow.engine.instRep.factory to gov.nasa.jpl.oodt.cas.workflow.instrepo.DataSourceWorkflowInstanceRepositoryFactory.
    • You need to configure the properties for each of the extension points that you are using. By default, you would at least need to configure:
      • The path to the Lucene catalog, created by the LuceneWorkflowInstanceRepository.
      • The paths to the directories where the XML policy files are stored for the XML Workflow Repository. A good default location is to place these files within /usr/local/workflow/policy.

Other configuration options are possible: check the API documentation, as well as the comments within the workflow.properties file, to find out the rest of the configurable properties for the extension points you choose. A full listing of all the extension point factory class names is provided in the Appendix. After step 7, you are officially done configuring the Workflow Manager for deployment.
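The factory-class mechanism used in workflow.properties can be sketched as follows: read a class name from a Properties object, instantiate it reflectively, and ask it for the extension point implementation. The property key example.instRep.factory and both factory classes are hypothetical placeholders; the real property names and factory classes are those listed in the Appendix.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch of configuration-driven extension point selection.
// The interfaces, factories, and property key are illustrative only.
public class FactoryLoadingSketch {

    // The extension point: any repository implementation.
    interface WorkflowInstanceRepository {
        String describe();
    }

    // Each implementation is reached through a factory with a no-arg constructor.
    interface WorkflowInstanceRepositoryFactory {
        WorkflowInstanceRepository create();
    }

    public static final class MemoryRepoFactory implements WorkflowInstanceRepositoryFactory {
        public WorkflowInstanceRepository create() { return () -> "memory"; }
    }

    public static final class StubRepoFactory implements WorkflowInstanceRepositoryFactory {
        public WorkflowInstanceRepository create() { return () -> "stub"; }
    }

    // Look up the factory class name stored under `key`, instantiate the
    // factory reflectively, and let it build the repository.
    static WorkflowInstanceRepository load(Properties props, String key)
            throws ReflectiveOperationException {
        String className = props.getProperty(key);
        if (className == null) {
            throw new IllegalStateException("missing property: " + key);
        }
        Class<? extends WorkflowInstanceRepositoryFactory> cls = Class.forName(className.trim())
                .asSubclass(WorkflowInstanceRepositoryFactory.class);
        return cls.getDeclaredConstructor().newInstance().create();
    }

    public static void main(String[] args) throws IOException, ReflectiveOperationException {
        Properties props = new Properties();
        props.load(new StringReader("example.instRep.factory=" + MemoryRepoFactory.class.getName()));
        System.out.println(load(props, "example.instRep.factory").describe()); // memory

        // Switching implementations is purely a configuration change.
        props.setProperty("example.instRep.factory", StubRepoFactory.class.getName());
        System.out.println(load(props, "example.instRep.factory").describe()); // stub
    }
}
```

Because only a class name changes, a different backend can be selected at deployment time without recompiling the code that uses the repository.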

Running the Workflow Manager

To run the workflow manager, cd to /usr/local/workflow/bin and type:

# ./wmgr start

This will start up the Workflow Manager XML-RPC server interface. Your Workflow Manager is now ready to run! You can test it by running a command that executes the hello world workflow.

Run the command below, assuming that you started the Workflow Manager on the default port of 9001:

# ./wmgr-client --url http://localhost:9001 --operation \
                              --sendEvent \
                              --eventName test
      

You should see a response message at the end similar to:

      Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
      INFO: WorkflowManager: Received event: test
      Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
      INFO: WorkflowManager: Workflow testWorkflow retrieved for event test
      Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread checkTaskRequiredMetadata
      INFO: Task: [Hello World] has no required metadata fields
      Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally 
      INFO: Executing task: [Hello World] locally
      Hello World: Chris
      Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread checkTaskRequiredMetadata
      INFO: Task: [Goodbye World] has no required metadata fields
      Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally
      INFO: Executing task: [Goodbye World] locally
      Goodbye World: Chris
      

which means that everything installed okay!
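Under the hood, the client sends the event to the server as an XML-RPC request, which is an XML methodCall document posted over HTTP. The sketch below builds such a document for an event name plus a metadata struct. The method name example.handleEvent, the parameter layout, and the "user" metadata key are assumptions for illustration; consult the Workflow Manager API documentation for the actual handler name and signature.

```java
import java.util.Map;
import java.util.TreeMap;

// Builds an XML-RPC <methodCall> body of the kind an XML-RPC client POSTs
// over HTTP. The method name and parameter layout are hypothetical.
public class XmlRpcEventSketch {

    // Escape the characters that are special in XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String methodCall(String methodName, String eventName, Map<String, String> metadata) {
        var sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n<methodCall>\n");
        sb.append("  <methodName>").append(escape(methodName)).append("</methodName>\n");
        sb.append("  <params>\n");
        // First parameter: the event name as an XML-RPC string.
        sb.append("    <param><value><string>").append(escape(eventName))
          .append("</string></value></param>\n");
        // Second parameter: the metadata as an XML-RPC struct (sorted for stable output).
        sb.append("    <param><value><struct>\n");
        for (var e : new TreeMap<>(metadata).entrySet()) {
            sb.append("      <member><name>").append(escape(e.getKey())).append("</name>")
              .append("<value><string>").append(escape(e.getValue()))
              .append("</string></value></member>\n");
        }
        sb.append("    </struct></value></param>\n");
        sb.append("  </params>\n</methodCall>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(methodCall("example.handleEvent", "test", Map.of("user", "Chris")));
    }
}
```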

Use Cases

The Workflow Manager was built to support several of the above capabilities. In particular there were several use cases that we wanted to support, some of which are described below.

Workflow Manager Event-based Execution Use Case

The black numbers in the figure above correspond to a sequence of steps, and a series of interactions between the different Workflow Manager extension points, that together perform the workflow execution activity. In step 1, an event is provided to the Workflow Manager event listener (the System extension point), along with the required Metadata. In step 2, the Workflow Manager checks whether any Workflow models in the Workflow Repository are associated with the provided Event. If so, in steps 3 and 4, the returned Workflow models are sent to the WorkflowEngine to be turned into executable Workflow Instances. Each WorkflowInstance is handed off to a WorkflowProcessorThread, taken from the ThreadPoolWorkflowEngine, in steps 5 and 6. In step 7, the WorkflowProcessorThread steps through each executable WorkflowTask, checking that all necessary Workflow Conditions (if any) are satisfied. If all Workflow Conditions are satisfied, the Workflow Task is executed either locally or, if a Resource Manager is defined, it is sent (in step 7) to the Resource Manager (labeled Process Manager in the figure). In steps 8-13, the WorkflowTask is executed on remote resources using the Resource Manager and eventually completed, with the final notification being sent back to the corresponding WorkflowProcessorThread, which is stepping through the Workflow Instance and controlling its execution.
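The loop performed by a WorkflowProcessorThread can be sketched as follows. It is a hypothetical simplification: required metadata and conditions are checked before each task, and the task runs locally unless a stand-in for the Resource Manager is supplied. A real processor may instead wait and re-check an unsatisfied condition; this sketch simply stops with a BLOCKED status. All type names and the "user" metadata key are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch of a processor stepping through a workflow instance.
public class EventExecutionSketch {

    record Task(String name,
                List<String> requiredMetadata,
                List<Predicate<Map<String, String>>> conditions,
                Consumer<Map<String, String>> work) {}

    // Stand-in for a Resource Manager; Optional.empty() means "run locally".
    interface RemoteRunner {
        void run(Task task, Map<String, String> ctx);
    }

    static String process(List<Task> tasks, Map<String, String> ctx,
                          Optional<RemoteRunner> resourceManager, List<String> log) {
        for (Task t : tasks) {
            // Required metadata must be present in the shared context.
            for (String key : t.requiredMetadata()) {
                if (!ctx.containsKey(key)) {
                    log.add(t.name() + ": missing metadata " + key);
                    return "FAILED";
                }
            }
            // Simplification: an unsatisfied condition stops the run here.
            if (!t.conditions().stream().allMatch(c -> c.test(ctx))) {
                log.add(t.name() + ": condition not satisfied");
                return "BLOCKED";
            }
            if (resourceManager.isPresent()) {
                log.add(t.name() + ": sent to resource manager");
                resourceManager.get().run(t, ctx);
            } else {
                log.add(t.name() + ": executing locally");
                t.work().accept(ctx);
            }
        }
        return "FINISHED";
    }

    public static void main(String[] args) {
        Map<String, String> ctx = new ConcurrentHashMap<>(Map.of("user", "Chris"));
        var hello = new Task("Hello World", List.of("user"), List.of(),
                c -> System.out.println("Hello World: " + c.get("user")));
        var goodbye = new Task("Goodbye World", List.of(), List.of(c -> c.containsKey("user")),
                c -> System.out.println("Goodbye World: " + c.get("user")));
        var log = new ArrayList<String>();
        System.out.println(process(List.of(hello, goodbye), ctx, Optional.empty(), log));
        log.forEach(System.out::println);
    }
}
```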

Appendix

Full list of Workflow Manager extension point classes and their associated property names from the workflow.properties file:

workflow.repo.factory
    gov.nasa.jpl.oodt.cas.workflow.repository.XMLWorkflowRepositoryFactory
    gov.nasa.jpl.oodt.cas.workflow.repository.DataSourceWorkflowRepositoryFactory

workflow.engine.factory
    gov.nasa.jpl.oodt.cas.workflow.engine.ThreadPoolWorkflowEngineFactory

workflow.instRep.factory
    gov.nasa.jpl.oodt.cas.workflow.instrepo.LuceneWorkflowInstanceRepositoryFactory
    gov.nasa.jpl.oodt.cas.workflow.instrepo.DataSourceWorkflowInstanceRepositoryFactory
    gov.nasa.jpl.oodt.cas.workflow.instrepo.MemoryWorkflowInstanceRepositoryFactory

Editor: Sean Kelly

NASA Official: Dan Crichton

Last Published: 14 April 2008
