This is the user guide for the OODT Catalog and Archive Service (CAS) Workflow Manager
component, or Workflow Manager for short. This guide explains the Workflow Manager architecture
including its extension points. The guide also discusses available services provided
by the Workflow Manager, how to utilize them, and the different APIs that exist. The guide
concludes with a description of Workflow Manager use cases.
The Workflow Manager component is responsible for the description, execution, and monitoring of
Workflows, using a client and a server system. Workflows are typically considered to be
sequences of tasks, joined together by control flow and data flow, that must execute in some
ordered fashion. Workflows typically generate output data, perform routine management tasks (such
as sending email), or describe a business's internal routine practices. The Workflow Manager is an
extensible software component that provides an XML-RPC external interface and a fully
tailorable Java-based API for workflow management. The critical objects managed by the Workflow
Manager include:
-
Events - the triggers that cause Workflows to be executed. Events are named, and carry dynamic
Metadata information passed in by the user.
-
Metadata - a dynamic set of properties, and values, provided to a WorkflowInstance via
a user-triggered Event.
-
Workflow - a description of both the control flow and data flow of a sequence of tasks
(or stages) that must be executed in some order.
-
Workflow Instance - an instance of a Workflow, typically containing additional runtime
descriptive information, such as start time, end time, task wall clock time, etc. A WorkflowInstance
also contains a shared Metadata context, passed in by the user who triggered the Workflow. This context
can be read and written by the underlying WorkflowTasks present in a Workflow.
-
Workflow Tasks - descriptions of data flow, and an underlying process, or stage, that is part of
a Workflow.
-
Workflow Task Instances - the actual executing code, or process, that performs the work in the
Workflow Task.
-
Workflow Task Configuration - static configuration properties that configure a WorkflowTask.
-
Workflow Conditions - any pre (or post) conditions on the execution of a WorkflowTask.
-
Workflow Condition Instances - the actual executing code, or process, that performs the work
in the Workflow Condition.
Each Event kicks off one or more Workflow Instances, providing a Metadata context (submitted by an
external user). Each Workflow Instance is a run-time execution model of a Workflow. Each Workflow contains
one or more Workflow Tasks. Each Workflow Task contains a single Workflow Task Configuration, and one or
more Workflow Conditions. Each Workflow Task has a corresponding Workflow Task Instance (that it models),
and each Workflow Condition has a corresponding Workflow Condition Instance. These relationships
are shown in the figure below.
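To make these relationships concrete, here is a minimal Java sketch of the object model. It is only an illustration: the class names, fields, and the "name" metadata key are hypothetical and deliberately simplified, and they are not the Workflow Manager's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model classes. Names and fields are illustrative
// only and do not mirror the Workflow Manager's real classes.
class ModelSketch {

    /** Static configuration properties for one task. */
    static final class TaskConfiguration {
        final Map<String, String> properties = new LinkedHashMap<>();
    }

    /** A pre- or post-condition attached to a task. */
    static final class Condition {
        final String name;
        Condition(String name) { this.name = name; }
    }

    /** One stage of a workflow: a single configuration plus its conditions. */
    static final class Task {
        final String name;
        final TaskConfiguration configuration = new TaskConfiguration();
        final List<Condition> conditions = new ArrayList<>();
        Task(String name) { this.name = name; }
    }

    /** An ordered sequence of tasks. */
    static final class Workflow {
        final String name;
        final List<Task> tasks;
        Workflow(String name, Task... tasks) {
            this.name = name;
            this.tasks = Collections.unmodifiableList(Arrays.asList(tasks));
        }
    }

    /** A run-time execution of a workflow, carrying the shared metadata context. */
    static final class WorkflowInstance {
        final Workflow workflow;
        final Map<String, String> sharedContext;
        final long startTimeMillis = System.currentTimeMillis();
        WorkflowInstance(Workflow workflow, Map<String, String> metadata) {
            this.workflow = workflow;
            this.sharedContext = new LinkedHashMap<>(metadata);
        }
    }

    /** An event kicks off one instance for each workflow associated with it. */
    static List<WorkflowInstance> fire(List<Workflow> associated, Map<String, String> metadata) {
        List<WorkflowInstance> instances = new ArrayList<>();
        for (Workflow workflow : associated) {
            instances.add(new WorkflowInstance(workflow, metadata));
        }
        return instances;
    }

    public static void main(String[] args) {
        Task hello = new Task("Hello World");
        hello.conditions.add(new Condition("AlwaysTrue")); // hypothetical condition name
        Workflow workflow = new Workflow("testWorkflow", hello, new Task("Goodbye World"));

        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("name", "Chris"); // hypothetical metadata key

        List<WorkflowInstance> started = fire(Collections.singletonList(workflow), metadata);
        System.out.println(started.size() + " instance(s), "
            + started.get(0).workflow.tasks.size() + " task(s)");
    }
}
```

Note that each instance copies the event's Metadata into its own context, so tasks in one Workflow Instance cannot disturb the context of another.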
There are several extension points for the Workflow Manager. An extension point is an interface
within the workflow manager that can have many implementations. This is particularly useful when
it comes to software component configuration because it allows different implementations of an
existing interface to be selected at deployment time. For example, the Workflow Manager component may
communicate with a Database-based Workflow Instance Repository and an XML-based Workflow Repository
(to store workflow descriptions), or it may use a Lucene-based Workflow Instance Repository and a
Database-based Workflow Repository. The selection of the actual component implementations is handled
entirely by the extension point mechanism. Using extension points, it is fairly simple to support many
different types of what are typically referred to as “plug-in architectures”. Each of the core extension
points for the Workflow Manager is described below:
| Workflow Instance Repository | The Workflow Instance Repository extension point is responsible for storing all the instance data for Workflow Instances, including shared context metadata and runtime properties such as start date time, end date time, and task start/end date time. |
| Workflow Repository | The Workflow Repository extension point is responsible for managing Workflow models, storing control flow, and Workflow Tasks, which model data flow. The Workflow Repository also stores Workflow Condition information and Workflow Task Configuration. In essence, the Workflow Repository is a repository of abstract Workflow models that get turned into Workflow Instances by the Workflow Engine extension point. |
| Workflow Engine | The Workflow Engine's responsibility is to turn abstract Workflow models into executing Workflow Instances. The Workflow Engine tracks and monitors execution of Workflow Instances, and provides the ability to start, stop, and pause executing Workflow Instances. |
| System | The extension point that provides the external interface to the Workflow Manager services. This includes the Workflow Manager server interface, as well as the associated Workflow Manager client interface that communicates with the server. |
The relationships between the extension points for the Workflow Manager are shown in the figure
below.
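The extension point mechanism described above can be sketched in a few lines of Java: an interface, a factory for each implementation, and a loader that instantiates whichever factory class is named at deployment time. This is only a sketch of the general pattern. The interface, class, and method names below are hypothetical and are not the Workflow Manager's real API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout: a minimal sketch of the factory-based
// plug-in pattern, not the Workflow Manager's real interfaces.
class ExtensionPointSketch {

    /** The extension point: one interface, many possible implementations. */
    interface InstanceRepository {
        void save(String instanceId, String status);
        String statusOf(String instanceId);
    }

    /** Each implementation ships with a factory that has a no-argument constructor. */
    interface InstanceRepositoryFactory {
        InstanceRepository create();
    }

    /** An in-memory implementation, standing in for a real backend. */
    static final class MemoryRepository implements InstanceRepository {
        private final Map<String, String> statuses = new HashMap<>();
        public void save(String instanceId, String status) { statuses.put(instanceId, status); }
        public String statusOf(String instanceId) { return statuses.get(instanceId); }
    }

    public static final class MemoryRepositoryFactory implements InstanceRepositoryFactory {
        public InstanceRepository create() { return new MemoryRepository(); }
    }

    /** Instantiates the named factory class, then asks it for a repository. */
    static InstanceRepository load(String factoryClassName) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(factoryClassName);
        InstanceRepositoryFactory factory =
            (InstanceRepositoryFactory) clazz.getDeclaredConstructor().newInstance();
        return factory.create();
    }

    public static void main(String[] args) throws Exception {
        // In a deployment this class name would come from a properties file.
        String configured = MemoryRepositoryFactory.class.getName();
        InstanceRepository repository = load(configured);
        repository.save("instance-1", "STARTED");
        System.out.println(repository.statusOf("instance-1")); // prints STARTED
    }
}
```

Because the calling code only ever sees the interface, swapping in a different backend means changing one configured class name rather than any code.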
The Workflow Manager is responsible for providing the necessary key capabilities for managing
processing pipelines, data flow, and control flow. Each high level capability provided by the
Workflow Manager is detailed below:
-
Support for representation of Workflow as a directed graph, allowing for true parallelism.
-
Support for identified workflow patterns, especially control-flow patterns.
-
Support for capturing data-flow.
-
Support for persistence of Workflow Instances to several backend repositories, including relational
databases and Apache Lucene flat file indices.
-
Representation of Workflow models as XML documents.
-
Scalability – The Workflow Manager uses the popular client-server paradigm, allowing new Workflow Manager
servers to be instantiated, as needed, without affecting the Workflow Manager
clients, and vice-versa.
-
Communication over lightweight, standard protocols – The Workflow Manager uses XML-RPC as its main
external interface between Workflow Manager client and server. XML-RPC, the little brother of SOAP, is
fast, extensible, and uses the underlying HTTP protocol for data transfer.
-
Event-driven Workflow execution, including arbitrary Metadata parameters, provided as a shared context
between stages of the executing Workflow.
This capability set is not exhaustive, and is meant to give the user a “feel” for what
general features are provided by the Workflow Manager. Most likely the user will find that the
Workflow Manager provides many other capabilities besides those described here.
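To illustrate why a directed graph representation allows for parallelism, the sketch below groups tasks into "waves": every task in a wave depends only on tasks in earlier waves, so the tasks within a wave could run side by side. This is a generic layering technique (a variant of Kahn's topological sort), shown under the assumption that a workflow is given as a map from each task to the tasks it waits for. It is not a description of how the Workflow Manager schedules tasks internally, and the task names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ParallelWavesSketch {

    /**
     * Groups tasks into waves. dependsOn maps each task name to the names of
     * the tasks it must wait for; every task must appear as a key.
     */
    static List<List<String>> waves(Map<String, List<String>> dependsOn) {
        // Number of not-yet-finished dependencies per task.
        Map<String, Integer> unmet = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            unmet.put(e.getKey(), e.getValue().size());
        }
        List<List<String>> result = new ArrayList<>();
        while (!unmet.isEmpty()) {
            // Every task with no unmet dependencies is ready to run now.
            List<String> ready = new ArrayList<>();
            for (Map.Entry<String, Integer> e : unmet.entrySet()) {
                if (e.getValue() == 0) {
                    ready.add(e.getKey());
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalArgumentException("cycle or unknown dependency");
            }
            unmet.keySet().removeAll(ready);
            // Finishing this wave satisfies dependencies of the remaining tasks.
            for (Map.Entry<String, Integer> e : unmet.entrySet()) {
                for (String dependency : dependsOn.get(e.getKey())) {
                    if (ready.contains(dependency)) {
                        e.setValue(e.getValue() - 1);
                    }
                }
            }
            result.add(ready);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("fetch", Collections.<String>emptyList());
        dependsOn.put("parseA", Arrays.asList("fetch"));
        dependsOn.put("parseB", Arrays.asList("fetch"));
        dependsOn.put("merge", Arrays.asList("parseA", "parseB"));
        System.out.println(waves(dependsOn)); // prints [[fetch], [parseA, parseB], [merge]]
    }
}
```

In this example parseA and parseB land in the same wave, which is exactly the kind of parallelism a purely sequential task list cannot express.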
There are at least two implementations of each of the aforementioned extension points for
the Workflow Manager, with the exception of the Workflow Engine, whose only implementation, the
ThreadPoolWorkflowEngine, is itself meant to be an extension point. Each extension point
implementation is detailed below:
-
Workflow Instance Repository
-
Data Source based Workflow Instance Repository – an implementation of the Workflow Instance
Repository extension point interface that uses a JDBC accessible database backend.
-
Lucene based Workflow Instance Repository – an implementation of the Workflow Instance
Repository extension point interface that uses the Lucene free text index system to store Workflow
Instance information.
-
Memory based Workflow Instance Repository - an implementation of the Workflow Instance
Repository extension point interface that stores Workflow Instance information in runtime memory.
-
Workflow Repository
-
Data Source based Workflow Repository – an implementation of the Workflow Repository
extension point that stores Workflow model information in a JDBC accessible database.
-
XML based Workflow Repository – an implementation of the Workflow Repository extension
point that stores Workflow model information in XML files ending in *.workflow.xml,
as well as files named tasks.xml, conditions.xml, and events.xml.
-
Workflow Engine
-
ThreadPoolWorkflowEngine - an implementation of the Workflow Engine that itself is meant to
be an extension point for WorkflowEngines that want to implement thread pooling. This WorkflowEngine
provides everything needed to manage a thread pool using Doug Lea's wonderful java.util.concurrent
package, which made it into JDK5.
-
System (Workflow Manager client and Workflow Manager server)
-
XML-RPC based Workflow Manager server – an implementation of the external server interface
for the Workflow Manager that uses XML-RPC as the transportation medium.
-
XML-RPC based Workflow Manager client – an implementation of the client interface for the
XML-RPC Workflow Manager server that uses XML-RPC as the transportation medium.
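As a rough illustration of the thread pooling idea behind the ThreadPoolWorkflowEngine, the sketch below uses java.util.concurrent to hand each workflow instance to a pool thread, which then steps through that instance's tasks in order against a shared context. The class, the Task interface, and the "name" metadata key are hypothetical; the real engine's classes and behavior will differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A hypothetical, stripped-down engine. It is not the ThreadPoolWorkflowEngine itself.
class ThreadPoolEngineSketch {

    /** Hypothetical task shape: reads and writes the instance's shared context. */
    interface Task {
        void run(Map<String, String> context);
    }

    private final ExecutorService pool;

    ThreadPoolEngineSketch(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Submits one workflow instance; a pool thread runs its tasks in order. */
    Future<Map<String, String>> start(final List<Task> tasks, Map<String, String> metadata) {
        final Map<String, String> context = new ConcurrentHashMap<>(metadata);
        return pool.submit(() -> {
            for (Task task : tasks) {
                task.run(context);
            }
            return context;
        });
    }

    /** Lets already-submitted instances finish, then releases the pool threads. */
    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolEngineSketch engine = new ThreadPoolEngineSketch(2);
        Task hello = ctx -> ctx.put("greeting", "Hello World: " + ctx.get("name"));
        Task goodbye = ctx -> ctx.put("farewell", "Goodbye World: " + ctx.get("name"));

        Map<String, String> metadata = new ConcurrentHashMap<>();
        metadata.put("name", "Chris"); // hypothetical metadata key

        Map<String, String> result = engine.start(Arrays.asList(hello, goodbye), metadata).get();
        System.out.println(result.get("greeting"));
        System.out.println(result.get("farewell"));
        engine.shutdown();
    }
}
```

A fixed-size pool caps how many workflow instances execute at once; further submissions simply queue until a thread becomes free.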
To install the Workflow Manager, you need to download a release
of the Workflow Manager, available from its home web site. For bleeding-edge features, you can
also check out the cas-workflow trunk project from the OODT Subversion repository. You can browse
the repository using ViewCVS, located at:
http://oodt.jpl.nasa.gov/vc/svn/
The actual web URL of the repository is:
http://oodt.jpl.nasa.gov/repo/
To check out the Workflow Manager, use your favorite Subversion client. Several clients are
listed at http://oodt.jpl.nasa.gov/wiki/display/oodt/Subversion.
The cas-workflow project follows the traditional Subversion-style trunk, tag and branches
format. Trunk corresponds to the latest and greatest development on
cas-workflow. Tags are official release versions of the project. Branches correspond to deviations
from the trunk large enough to warrant a separate development tree.
For the purposes of this User Guide, we'll assume you have already downloaded a built release
of the Workflow Manager from its web site. If you were building cas-workflow from the trunk, a tagged
release, or a branch, the process would be quite similar. To build cas-workflow, you would need the
Apache Maven software. Maven is an XML-based project management system similar to Apache Ant, but with
many extra bells and whistles. Maven makes cross-platform project development a snap. You can download
Maven from:
http://maven.apache.org
The cas-workflow project is constructed to be compatible with the 1.x.x series of Maven, despite
Maven 2.x having already been released. Once you have Maven installed, follow the procedure
below to build a fresh copy of the Workflow Manager:
-
cd to cas-workflow and run the Maven build.
This will perform several tasks, including compiling the source code, downloading
required jar files, running unit tests, and so on. When the build completes, cd
to the target/distributions directory within cas-workflow. This will
contain the build of the Workflow Manager component, of the following form:
cas-workflow-${version}.tar.gz
This is a distribution tarball that you would copy to a deployment directory, such as
/usr/local/, and then unpack using # tar xvzf. The resultant directory
layout from the unpacked tarball is as follows:
bin/ etc/ logs/ docs/ lib/ policy/ LICENSE.txt CHANGES.txt
-
bin - contains the "wmgr" server script, and the "wmgr-client" client script.
-
etc - contains the logging.properties file for the Workflow Manager, and the workflow.properties
file used to configure the server options.
-
logs - the default directory for log files to be written to.
-
docs - contains Javadoc documentation, and user guides for using the particular CAS component.
-
lib - the required Java jar files to run the Workflow Manager.
-
policy – the default XML-based element and product type policy in
case the user is using the Lucene Workflow Instance Repository and/or the XML
Workflow Repository, along with the ThreadPoolWorkflowEngine.
-
CHANGES.txt - contains the CHANGES present in this released version of the CAS component.
-
LICENSE.txt - the LICENSE for the Workflow Manager project.
To deploy the Workflow Manager, you'll need to create an installation directory. Typically this
would be somewhere in /usr/local (on *nix style systems), or C:\Program Files\ (on Windows
style systems). We'll assume that you're installing on a *nix style system, though the Windows
instructions are quite similar.
Follow the process below to deploy the Workflow Manager:
-
Copy the binary distribution to the deployment directory
# cp -R cas-workflow/trunk/target/distributions/cas-workflow-${version}.tar.gz /usr/local/
-
Untar the distribution
# cd /usr/local ; tar xvzf cas-workflow-${version}.tar.gz
-
Set up a symlink
# ln -s /usr/local/cas-workflow-${version} /usr/local/workflow
-
Edit /usr/local/workflow/bin/wmgr
-
Set the SERVER_PORT variable to the desired port you'd like to run the
Workflow Manager server on.
-
Set the JAVA_HOME variable to point to the location of your installed JRE.
-
Set the RUN_HOME variable to point to the location you'd like the Workflow
Manager PID file written to. Typically this should default to /var/run, but not all
system administrators allow users to write to /var/run.
-
Edit /usr/local/workflow/bin/wmgr-client
-
Set the JAVA_HOME variable to point to the location of your installed JRE.
-
(optional) Edit /usr/local/workflow/etc/logging.properties
-
Set the logging levels for each subsystem to the desired level. The system
defaults are fairly conservative and keep most logging below the INFO level
off the console.
-
Edit /usr/local/workflow/etc/workflow.properties
-
This Java properties file contains all of the default properties used to
configure the Workflow Manager. By default, the Workflow Manager is built to use the XML-based
Workflow Repository, the ThreadPoolWorkflowEngine, and the Lucene Workflow Instance Repository
extension points. These defaults can be changed quite easily by changing the factory classes
that are pointed to for each extension point. For example, to use the DataSource based
Workflow Instance Repository extension point, you would change the property
workflow.engine.instRep.factory to
gov.nasa.jpl.oodt.cas.workflow.instrepo.DataSourceWorkflowInstanceRepositoryFactory.
-
You need to configure the properties for each of the extension points that you are
using. By default, you would at least need to configure:
-
The path to the Lucene catalog, created by the LuceneWorkflowInstanceRepository.
-
The paths to the directories where the XML policy files are stored for the
XML Workflow Repository. A good default location is to place these files within
/usr/local/workflow/policy.
Other configuration options are possible: check the API documentation,
as well as the comments within the workflow.properties file, to find out the rest of the configurable
properties for the extension points you choose. A full listing of all the extension point factory
class names is provided in the Appendix. After step 7, you are officially done configuring the Workflow
Manager for deployment.
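If you want to see how the factory properties behave before editing the real file, the following sketch parses a small stand-in for workflow.properties with java.util.Properties and swaps the Workflow Repository factory. The two property names and the class names are taken from the Appendix; the class FactoryPropertiesSketch and the inline file content are made up for illustration, and a real workflow.properties file contains many more keys.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class FactoryPropertiesSketch {

    /** Parses properties-file text the same way a file on disk would be parsed. */
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a fragment of etc/workflow.properties (illustrative only).
        String text =
            "workflow.repo.factory=gov.nasa.jpl.oodt.cas.workflow.repository.XMLWorkflowRepositoryFactory\n"
          + "workflow.engine.factory=gov.nasa.jpl.oodt.cas.workflow.engine.ThreadPoolWorkflowEngineFactory\n";
        Properties props = parse(text);

        // Switching implementations is a one-line change of the factory class name.
        props.setProperty("workflow.repo.factory",
            "gov.nasa.jpl.oodt.cas.workflow.repository.DataSourceWorkflowRepositoryFactory");

        System.out.println(props.getProperty("workflow.repo.factory"));
        System.out.println(props.getProperty("workflow.engine.factory"));
    }
}
```

Keep in mind that switching to a different factory usually means the properties specific to that implementation must be configured as well.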
To run the Workflow Manager, cd to /usr/local/workflow/bin and start the server using the wmgr script.
This will start up the Workflow Manager XML-RPC server interface. Your Workflow Manager
is now ready to run! You can test out the Workflow Manager by running a command that will execute
the hello world workflow.
Run the command below, assuming that you started the Workflow Manager on the default port of 9001:
# ./wmgr-client --url http://localhost:9001 --operation \
--sendEvent \
--eventName test
You should see a response message at the end similar to:
Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
INFO: WorkflowManager: Received event: test
Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
INFO: WorkflowManager: Workflow testWorkflow retrieved for event test
Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread checkTaskRequiredMetadata
INFO: Task: [Hello World] has no required metadata fields
Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally
INFO: Executing task: [Hello World] locally
Hello World: Chris
Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread checkTaskRequiredMetadata
INFO: Task: [Goodbye World] has no required metadata fields
Mar 4, 2008 9:32:29 PM gov.nasa.jpl.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally
INFO: Executing task: [Goodbye World] locally
Goodbye World: Chris
which means that everything installed okay!
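For readers curious about what travels over the wire, the sketch below builds a generic XML-RPC request document of the kind a client posts to a server over HTTP. The envelope follows the XML-RPC specification, but the method name workflowmgr.handleEvent and the single string parameter are assumptions made for illustration. This guide does not document the actual remote method names or parameters used by wmgr-client.

```java
class XmlRpcRequestSketch {

    /** Escapes the characters that cannot appear verbatim in XML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds an XML-RPC methodCall document whose parameters are all strings. */
    static String methodCall(String methodName, String... stringParams) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>");
        xml.append("<methodCall><methodName>")
           .append(escape(methodName))
           .append("</methodName><params>");
        for (String param : stringParams) {
            xml.append("<param><value><string>")
               .append(escape(param))
               .append("</string></value></param>");
        }
        return xml.append("</params></methodCall>").toString();
    }

    public static void main(String[] args) {
        // "workflowmgr.handleEvent" is an assumed method name, used only for illustration.
        System.out.println(methodCall("workflowmgr.handleEvent", "test"));
    }
}
```

In practice you would let the wmgr-client script or an XML-RPC client library build and send such requests rather than assembling them by hand.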
The Workflow Manager was built to support several of the above capabilities. In particular, there
were several use cases that we wanted to support, some of which are described below.
The black numbers in the above Figure correspond to a sequence of steps and a
series of interactions between the different Workflow Manager extension points that together
perform the workflow execution activity. In step 1, an event is provided to the Workflow
Manager event listener (the System extension point), along with required Metadata. The
Workflow Manager, in step 2, looks up whether there are any Workflow Repository models
associated with the provided Event. If so, in steps 3 and 4, the returned Workflow models
are sent to the WorkflowEngine, to be turned into executable Workflow Instances. Each
WorkflowInstance is handed off to a WorkflowProcessorThread, taken from the ThreadPoolWorkflowEngine,
in steps 5 and 6. The WorkflowProcessorThread, in step 7, steps through each executable WorkflowTask,
checking to make sure that all necessary Workflow Conditions (if any) are satisfied. If all Workflow
Conditions are satisfied, then the Workflow Task is executed, either locally or, if a Resource Manager
is defined, by sending the task (in step 7) to the Resource Manager (labeled Process Manager
in the figure). In steps 8-13, the WorkflowTask is executed on remote resources using the Resource Manager,
and eventually completed, with the final notification being sent back to the corresponding Workflow Processor
Thread, which is stepping through the Workflow Instance, controlling its execution.
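The local part of this sequence (look up the workflows for an event, create a context for each instance, and check conditions before running each task) can be sketched as follows. All names here are hypothetical, the repository is reduced to an in-memory map, and remote execution through the Resource Manager is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of event handling. It is not the Workflow Manager's real classes.
class EventDispatchSketch {

    /** Hypothetical task: a name plus pre-conditions over the shared context. */
    static final class Task {
        final String name;
        final List<Predicate<Map<String, String>>> conditions = new ArrayList<>();
        Task(String name) { this.name = name; }
    }

    /** Stand-in for the Workflow Repository: event name -> task list of each workflow. */
    private final Map<String, List<List<Task>>> workflowsByEvent = new HashMap<>();

    void register(String eventName, List<Task> workflowTasks) {
        workflowsByEvent.computeIfAbsent(eventName, k -> new ArrayList<>()).add(workflowTasks);
    }

    /** Handles an event and returns the names of the tasks executed, in order. */
    List<String> handleEvent(String eventName, Map<String, String> metadata) {
        List<String> executed = new ArrayList<>();
        List<List<Task>> workflows =
            workflowsByEvent.getOrDefault(eventName, Collections.emptyList());
        for (List<Task> tasks : workflows) {
            // One shared context per workflow instance, seeded from the event metadata.
            Map<String, String> context = new HashMap<>(metadata);
            for (Task task : tasks) {
                boolean satisfied = true;
                for (Predicate<Map<String, String>> condition : task.conditions) {
                    satisfied &= condition.test(context);
                }
                if (!satisfied) {
                    break; // this sketch simply stops; a real engine might wait and re-check
                }
                executed.add(task.name); // stands in for actually running the task
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        EventDispatchSketch manager = new EventDispatchSketch();
        Task hello = new Task("Hello World");
        Task goodbye = new Task("Goodbye World");
        goodbye.conditions.add(ctx -> ctx.containsKey("name")); // hypothetical condition

        manager.register("test", Arrays.asList(hello, goodbye));

        Map<String, String> metadata = new HashMap<>();
        metadata.put("name", "Chris");
        System.out.println(manager.handleEvent("test", metadata)); // prints [Hello World, Goodbye World]
    }
}
```

An event with no registered workflows yields an empty result, mirroring the case in step 2 where no Workflow models are associated with the Event.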
Full list of Workflow Manager extension point classes and their associated property names from the
workflow.properties file:
| workflow.repo.factory | gov.nasa.jpl.oodt.cas.workflow.repository.XMLWorkflowRepositoryFactory, gov.nasa.jpl.oodt.cas.workflow.repository.DataSourceWorkflowRepositoryFactory |
| workflow.engine.factory | gov.nasa.jpl.oodt.cas.workflow.engine.ThreadPoolWorkflowEngineFactory |
| workflow.instRep.factory | gov.nasa.jpl.oodt.cas.workflow.instrepo.LuceneWorkflowInstanceRepositoryFactory, gov.nasa.jpl.oodt.cas.workflow.instrepo.DataSourceWorkflowInstanceRepositoryFactory, gov.nasa.jpl.oodt.cas.workflow.instrepo.MemoryWorkflowInstanceRepositoryFactory |