Using WSRF to build workflow scripts
Temp files and filters in a Grid environment
Michael Grobe
Indiana University

1 Introduction
Programmers frequently build scripts that store data in local temporary
files, and sometimes pass the file handles of those files to subroutines or
other programs. They also build filter pipelines that link simple
applications in sequences that perform more complex processes. This poster
demonstrates the use of the same techniques to orchestrate workflow within a
grid environment built using the Web Services Resource Framework (WSRF). It
includes the source code for a simple WSRF Resource that may be used for
temporary storage and for a WSRF service that uses the temporary storage
Resource, and it outlines a script that uses those Resources.

This approach to workflow control is demonstrated using the WSRF::Lite Perl
module developed at the University of Manchester [8], along with the WSRF
container included in that module's distribution as Container.pl. Other WSRF
containers, such as the one provided within the Globus Toolkit, could be
used as well.

The target Grid-based application for the major example in this paper is the
Centralized Life Sciences Data (CLSD) service at Indiana University [3].
CLSD presents a collection of life science data converted to relational form
and/or federated into a single relational database managed by an IBM DB2
database management system.
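The local idiom that the poster carries over to the Grid can be sketched in a few lines of Perl. This is only an illustration of the analogy, not code from the poster: the subroutine names `produce` and `filter_lines` and the sample data are hypothetical. One stage writes to a temporary file through a handle it is given, and a second stage reads from the same handle, much as a service later writes to a Filespace Resource whose address it is given.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stage 1 (hypothetical): write results to whatever handle the caller supplies.
sub produce {
    my ($out) = @_;
    print {$out} "$_\n" for qw(BIND_INTERACTION BIND_PATHWAY ADVISE_INDEX);
}

# Stage 2 (hypothetical): a filter that reads lines from a handle and keeps
# those matching a pattern.
sub filter_lines {
    my ($in, $pattern) = @_;
    seek $in, 0, 0;    # rewind so reading starts at the first line
    return grep { /$pattern/ } map { chomp; $_ } <$in>;
}

my ($fh, $path) = tempfile(UNLINK => 1);    # temporary storage, removed at exit
produce($fh);                               # first stage fills the storage
my @hits = filter_lines($fh, qr/^BIND_/);   # second stage reads it back
close $fh;

print join(",", @hits), "\n";
```

In the WSRF version of this pattern, the temporary file corresponds to the Filespace Resource and the file handle corresponds to its EPR.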
2 The Filespace Resource

Filespace.pm is a modified version of the variant of Mark McKeown's Counter
resource that inherits from the FileBasedResourceLifetimes class, which
stores resource properties in a file. The createFilespaceResource operation
returns the address, or Endpoint Reference (EPR), of a newly created
Resource. Invocations of createFilespaceResource in particular, and of WSRF
Resources in general, take the following form:

    # Define the location and URI of the service.
    my $WS_uri    = "http://host.domain/Filespace";
    my $WS_target = "$WS_host:$WS_port/Session/Filespace/Filespace";

    # Now create a Filespace WS-Resource.
    my $ans = WSRF::Lite
        -> uri( $WS_uri )
        -> wsaddress( WSRF::WS_Address->new()
                        ->Address( $WS_target ) )   # Specify address.
        -> createFilespaceResource();               # Invoke function.

This usage will return an EPR like

    http://host.domain:8422/Session/Filespace/Filespace/53101852107163019937
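For logging or sanity checks it can help to see how an address of this form breaks down. The following is a minimal sketch in plain Perl; the helper `parse_epr` is hypothetical, and it assumes that the final path segment is the resource identifier, as in the example address above. Clients would normally pass the EPR along unchanged rather than take it apart.

```perl
use strict;
use warnings;

# Split an EPR address of the form shown above into its parts.
# Assumption: the last path segment is the numeric resource identifier.
sub parse_epr {
    my ($epr) = @_;
    my ($host, $port, $service, $id) =
        $epr =~ m{^https?://([^/:]+)(?::(\d+))?/(.+)/(\d+)$}
        or return;    # no match: not an address of the expected shape
    # The identifier is kept as a string; it can exceed a 64-bit integer.
    return { host => $host, port => $port, service => $service, id => $id };
}

my $epr   = "http://host.domain:8422/Session/Filespace/Filespace/53101852107163019937";
my $parts = parse_epr($epr);
print "$parts->{service} on $parts->{host}:$parts->{port}, resource $parts->{id}\n";
```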
    bash$ perl use-CLSD.pl "select tabschema, tabname from syscat.tables" 1 4
    Sending request to create temporary WS-Resource.
    Successfully created Filespace Resource
    http://host.subdomain.indiana.edu:8422/Session/Filespace/Filespace/53101852107163019937
    Sending SQL command to the CLSDtoResource Resource
    select tabschema, tabname from syscat.tables
    Getting the value of the array property...
    Here is the result of the query
    "TABSCHEMA (VARCHAR)","TABNAME (VARCHAR)"
    "BIND ","BIND_INTERACTION"
    "BIND ","BIND_PATHWAY"
    "DB2INST2","ADVISE_INDEX"
    "DB2INST2","ADVISE_WORKLOAD"
    Destroying WS-Resource.
    Temporary Filespace Resource destroyed.
    bash$

Of course a simple flow like
this does not really require a temporary storage resource, but one can
easily imagine more complicated scenarios. For example, the data retrieved
from CLSD might be passed on to other resources for statistical processing
and/or constructing graphs or tables. In other cases, a resource invoked via
this technique might involve batch processing, so that the Filespace
resource would have to be polled until process completion. The second
element in the array property could be used for this purpose, or some
completion flag could be added to the current version of the Resource.

Note that WSRF::Lite provides some support for WSRF security [6,9], so that
messages may be transmitted securely and authentication may be required when
invoking remote services.

Note also that the current implementation of file-based Resources is not
efficient for large (over 100MB) file storage and/or manipulation, but that
Resources could be customized for better efficiency.

4 Discussion

This approach to controlling workflows can be used by scripts running on
desktops, as CGI scripts, Web Services, etc. It employs the Grid as a
network-based computing utility. However, error-checking and failure
recovery will add significant complexity to these workflow scripts, so that
workflow engines, such as Taverna [7], may prove to be more practicable
platforms.
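The polling scenario described earlier (a batch job that fills the Filespace Resource some time after the request returns) also shows where the extra error-handling complexity comes from. The sketch below is plain Perl under stated assumptions: `poll_until_done` and its arguments are hypothetical, the status value 'done' stands in for whatever completion flag the Resource would expose, and the status source is a caller-supplied code reference rather than a real WSRF call.

```perl
use strict;
use warnings;

# Poll a status source until it reports completion or the attempt budget is
# spent. $args{fetch_status} is a hypothetical code reference; in a real
# script it would read a completion flag from the Filespace Resource.
sub poll_until_done {
    my (%args) = @_;
    my $fetch_status = $args{fetch_status};
    my $max_attempts = $args{max_attempts}  // 10;
    my $delay        = $args{delay_seconds} // 5;

    for my $attempt (1 .. $max_attempts) {
        my $status = $fetch_status->();
        return $attempt if defined $status && $status eq 'done';
        sleep $delay if $delay > 0 && $attempt < $max_attempts;
    }
    return 0;    # gave up; the caller decides whether to retry or clean up
}

# Stand-in for a batch job that finishes on the third poll.
my @replies  = ('running', 'running', 'done');
my $attempts = poll_until_done(
    fetch_status  => sub { shift @replies },
    max_attempts  => 5,
    delay_seconds => 0,
);
print $attempts ? "completed after $attempts polls\n" : "timed out\n";
```

A bounded number of attempts matters here because a script that gives up should still destroy its temporary Resource, which is one of the recovery paths that a workflow engine would otherwise manage.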
Fig. 1. The CLSDtoResource package.

    package CLSDtoResource;
    use strict;
    use vars qw(@ISA);
    # use WSRF::Lite +trace => debug => sub {};
    use WSRF::Lite;
    @ISA = qw(WSRF::FileBasedResourceLifetimes);

    # This sub queries CLSD and stores results in the Filespace Resource
    # whose address is submitted as an input parameter.
    sub CLSDtoResource {

        # Process input parameters sent from the WSRF Container.
        my $envelope = pop @_;
        my ($class, @params) = @_;
        my $Filespace_epr  = $params[0];
        my $property_name  = $params[1];
        my $my_query       = $params[2];
        my $starting_row   = $params[3];
        my $number_of_rows = $params[4];

        # Set up and make the call to CLSD using SOAP::Lite.
        my $host = "host.subdomain.indiana.edu";
        my $CLSD_return_value = SOAP::Lite
            -> service("http://$host:8421/axis/CLSDservice.jws?WSDL")
            -> proxy("http://$host:8421/axis/CLSDservice.jws?wsdl",
                     timeout => 1200 )
            -> queryCLSD($my_query, $starting_row, $number_of_rows,
                         "DB2account", "account_password", "csv");

        # Embed the returned information within appropriate XML.
        my $insertTerm = "<wsrp:Update><$property_name>"
                       . $CLSD_return_value
                       . "</$property_name></wsrp:Update>";

        # Now store the results in the Filespace WS-Resource.
        my $ans = WSRF::Lite
            -> wsaddress( WSRF::WS_Address->new()
                            ->Address( $Filespace_epr ) )
            -> uri( $WSRF::Constants::WSRP )
            -> SetResourceProperties(     # Invoke built-in SetResourceProperties function.
                   SOAP::Data->value( $insertTerm )->type( 'xml' ) );

        if ( $ans->fault ) {
            die "ERROR: " . $ans->faultcode . " \n" . $ans->faultstring . "\n";
        }

        # return envelope.
        return WSRF::Header::header( $envelope ), "ok";

    }   # end sub CLSDtoResource

Fig. 2. Outline of command-line
client to access CLSD using WSRF.

    # Define the location and URI of the Filespace service.
    my $WS_Filespace_host = "http://host.subdomain.indiana.edu";
    my $WS_Filespace_port = "8422";
    my $WS_Filespace_uri  = "http://host.subdomain.indiana.edu/Filespace";

    # 1. Create a Filespace Resource.

    # 2. Send the EPR and SQL query to CLSDtoResource.
    #    CLSDtoResource will relay the SQL command to CLSD via JDBC, and
    #    place the result in the "array" property of the temporary
    #    Filespace Resource.

    # 3. Get the contents of the Filespace Resource.
    #    Get the data from the "array" property of the Resource created
    #    above, and print it.

    # 4. Destroy the Filespace Resource.

(See handout for details.)
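Fig. 1 places the CSV text returned by CLSD directly inside an XML element. Query results that contain `&`, `<` or `>` would make that fragment ill-formed, so a cautious variant escapes the payload first. The following is a minimal sketch in plain Perl, not part of the poster's code: `xml_escape` and `build_update_term` are hypothetical helpers, and the sample CSV is invented for illustration.

```perl
use strict;
use warnings;

# Escape the characters that are unsafe in XML element content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build an update term in the same shape as the one in Fig. 1.
sub build_update_term {
    my ($property_name, $value) = @_;
    return "<wsrp:Update><$property_name>"
         . xml_escape($value)
         . "</$property_name></wsrp:Update>";
}

# Invented sample data containing characters that need escaping.
my $csv = qq{"NAME","NOTE"\n"BIND","size < 10 & > 2"\n};
print build_update_term('array', $csv), "\n";
```

An XML parser on the receiving side restores the original characters, so a client reading the "array" property would see the CSV text unchanged.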
Acknowledgments

Thanks to Mark McKeown of the University of Manchester for several fine
tutorial slidesets and example applications. Thanks to Stephan Zasada for
carefully explained presentations on security within WSRF::Lite. Thanks to
Andy Arenson and Scott McCaulay for providing the opportunity to prepare
this paper.

References

[1] Foster, Ian, et al., The Open Grid Services Architecture, Version 1.5,
    2006, http://www.ggf.org/documents/GFD.80.pdf
[2] Globus Alliance, The Globus Grid Toolkit Homepage,
    http://www.globus.org/toolkit/
[3] Indiana University, The Centralized Life Sciences Data (CLSD) Service,
    http://rac.uits.iu.edu/clsd/
[4] McKeown, Mark, Web Services for the Grid: WSRF and WSRF::Lite, 2005,
    http://www.sve.man.ac.uk/Research/AtoZ/ILCT/cern.ppt
[5] McKeown, Mark, Web Services for Grid Computing, 2006,
    http://www.sve.man.ac.uk/Research/AtoZ/ILCT/ogsa-workshop
[6] McKeown, Mark and Stephan Zasada, Build Secure WS-Resource with WSLite
    and WS-Security, 2006,
    http://www-128.ibm.com/developerworks/edu/gr-dw-gr-buildsecure.html
[7] Open Middleware Infrastructure Institute, The Taverna project,
    http://www.omii.ac.uk/projects/display_project.jsp?projectid=76
[8] Open Middleware Infrastructure Institute, WSRF::Lite: An Implementation
    of the Web Services Resource Framework,
    http://www.sve.man.ac.uk/Research/AtoZ/ILCT
[9] Zasada, Stephan, Investigating Security in Perl-based Grid Middlewares,
    2004, http://www.sve.man.ac.uk/Research/AtoZ/ILCT/stefans_msc.pdf