Title: Programming Scientific and Distributed Workflow with Triana Services
1Programming Scientific and Distributed Workflow with Triana Services
- Matthew Shields, GGF10 Workflow Workshop, 9th March
2Presentation Outline
- Triana
- Overview
- Triana services and their distribution
- Distribution policies
- The GAP interface and its relation to the GridLab GAT
- Scientific Workflow
- Binary Inspiral Algorithm Example
- Dynamic Distributed Workflow
- Service Composition on the Grid
- Service Usage, dynamically distributing a Triana workflow
- Conclusion
3What is Triana?
4Triana Distributed Work-flow
[Figure labels: Triana Service Engine (x2), Action Commands, Workflow (e.g. BPEL4WS), Network]
- Distributed Triana Work-flow
- Flexible distribution based around Triana Groups
- HPC and Pipelined distribution
[Figure labels: Triana Controlling Service (TCS), Triana Engine, Other Engine, Triana Gateway]
5GAP Overview
- Based around a series of Java interface classes
- Concrete implementations that form the GAP bindings
- The core interface is the
- Service Creation and Discovery
- Pipe Creation and Discovery
- Message Communication
- Information
- Job Submission
- Data Management: transfers, logical lookup
- Will become an adapter for the GridLab Java GAT, providing
- Advertisement, discovery, deployment and communication of services
- GRMS job submission adapter
- Data Management Services
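The split between interface classes and concrete bindings described on this slide can be sketched as follows. This is a minimal Python illustration, not the GAP's Java API: the names `PeerBinding`, `InMemoryBinding`, `advertise`, `discover`, `send` and `broadcast` are all hypothetical, and the toy binding keeps everything in local dictionaries instead of talking to a network.

```python
from abc import ABC, abstractmethod


class PeerBinding(ABC):
    """Hypothetical stand-in for a GAP-style interface (not the real Java API)."""

    @abstractmethod
    def advertise(self, service_name, endpoint):
        """Publish a service under a name."""

    @abstractmethod
    def discover(self, service_name):
        """Return the endpoints advertised under a name."""

    @abstractmethod
    def send(self, endpoint, message):
        """Deliver a message to one endpoint."""


class InMemoryBinding(PeerBinding):
    """Toy binding: adverts and messages live in local dictionaries."""

    def __init__(self):
        self._adverts = {}
        self.inbox = {}

    def advertise(self, service_name, endpoint):
        self._adverts.setdefault(service_name, []).append(endpoint)
        self.inbox.setdefault(endpoint, [])

    def discover(self, service_name):
        return list(self._adverts.get(service_name, []))

    def send(self, endpoint, message):
        if endpoint not in self.inbox:
            raise KeyError(f"unknown endpoint: {endpoint}")
        self.inbox[endpoint].append(message)


def broadcast(binding, service_name, message):
    """Application code depends only on the interface, never on the binding."""
    endpoints = binding.discover(service_name)
    for endpoint in endpoints:
        binding.send(endpoint, message)
    return len(endpoints)
```

Because `broadcast` only sees `PeerBinding`, swapping the in-memory binding for another implementation would not change the calling code, which is the point of keeping the interface and its bindings separate.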
6Java GAT Prototype
GAP (Java Prototype)
- Advertising
- Discovery
- Communication
Web Services
Jxta
P2PS
OGSA (planned)
GSI Enabled
Jxtaserve
NS-2
And more..
Job Submission (GRMS)
- Generic Job Submission
- Virtual filename data access
- Set of generic Java interfaces
- High-level abstractions of Grid services
- Factory design for dynamic, pluggable services
Data Management
GridLab GAT (www.gridlab.org)
7Triana Prototype
- Distributed Triana Prototype
- Based around Triana Groups i.e. aggregate tools
- Each group can be distributed
- Distribution policies
- HTC: high throughput / task farming
- Pipeline: allows node-to-node communication
- Each service can be a gateway to finer granularities of distribution
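The difference between the two distribution policies can be sketched in a few lines. This is an illustrative Python model only: a group is represented as a list of plain functions, nodes are integers, and the round-robin assignment in `farm` is an assumed scheduling rule, not something the slides specify.

```python
def farm(group, chunks, n_nodes):
    """Task farming: every node runs the whole group on its share of the chunks."""
    results = []
    for i, chunk in enumerate(chunks):
        node = i % n_nodes  # round-robin assignment (assumed scheduling rule)
        value = chunk
        for tool in group:
            value = tool(value)
        results.append((node, value))
    return results


def pipeline(group, chunks):
    """Pipeline: tool k lives on node k, so each chunk hops from node to node."""
    results = []
    for chunk in chunks:
        value = chunk
        hops = []
        for node, tool in enumerate(group):
            value = tool(value)
            hops.append(node)
        results.append((hops, value))
    return results
```

In the farming case each result carries the single node that processed it; in the pipeline case it carries the sequence of nodes the data passed through.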
8Triana Workflow
- Triana is inherently flow based
- Data flow: data arriving at a component triggers execution
- Control flow: control commands trigger execution
- Decentralised execution
- Data or control messages sent along communication pipes from sender to receiver cause the receiver to execute
- Synchronous or asynchronous messaging (implementation dependent)
- Multiple inputs can block or trigger immediately (component designer defined)
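A small sketch of data-driven, decentralised execution as described above. The `Component` class and its `wait_for_all` flag are hypothetical names chosen for illustration, and the messaging here is synchronous (a plain method call), which is only one of the options the slide mentions.

```python
class Component:
    """Hypothetical data-flow component: it runs when data arrives on its inputs."""

    def __init__(self, name, n_inputs, func, wait_for_all=True):
        self.name = name
        self.func = func                  # called with the list of available inputs
        self.wait_for_all = wait_for_all  # True: block until all inputs hold data
        self.slots = [None] * n_inputs
        self.filled = [False] * n_inputs
        self.receivers = []               # (component, input_port) pairs
        self.outputs = []                 # record of every value produced

    def connect(self, receiver, port):
        self.receivers.append((receiver, port))

    def receive(self, port, data):
        self.slots[port] = data
        self.filled[port] = True
        if self.wait_for_all and not all(self.filled):
            return  # blocking mode: wait for the remaining inputs
        args = [s for s, f in zip(self.slots, self.filled) if f]
        result = self.func(args)
        self.filled = [False] * len(self.filled)
        self.outputs.append(result)
        # No central scheduler: the sender pushes the result straight to receivers.
        for receiver, receiver_port in self.receivers:
            receiver.receive(receiver_port, result)
```

With `wait_for_all=True` a two-input adder stays idle after its first input and fires on the second; with `wait_for_all=False` it fires on every arrival using whatever inputs are present.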
9Components and Definitions
- Component is unit of execution
- Components are defined in XML files
- Naming information
- Input and output ports
- Parameter information
- Why Components?
- To simplify the application design process and to speed up application development
- The component model provides an infrastructure for the interaction of components
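To make the idea of an XML component definition concrete, the sketch below parses one with Python's standard library. The element and attribute names (`tool`, `input`, `output`, `param`, `package`) and the `Gaussian` example are invented for illustration; the slides do not show Triana's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical definition: naming information, ports and parameters.
COMPONENT_XML = """
<tool name="Gaussian" package="signalproc">
  <input type="SampleSet"/>
  <output type="SampleSet"/>
  <param name="sigma" value="1.5"/>
</tool>
"""


def load_component(xml_text):
    """Read naming information, ports and parameters from a definition."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.get("name"),
        "package": root.get("package"),
        "inputs": [e.get("type") for e in root.findall("input")],
        "outputs": [e.get("type") for e in root.findall("output")],
        "params": {e.get("name"): e.get("value") for e in root.findall("param")},
    }
```

Keeping the definition in a data file rather than in code means a tool can be listed, wired and configured without loading its implementation.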
10Taskgraph
- Internal object based workflow graph representation
- Taskgraph: DAG
- Tasks
- Connections
- External XML representation
- Simple XML syntax
- List of participating Task definitions
- Parent/Child connection
- Hierarchical (Compound components)
- Alternative language syntaxes
- e.g. BPEL4WS
- Available through pluggable readers and writers
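A taskgraph of tasks and parent/child connections can be modelled as a directed acyclic graph. The sketch below is a hypothetical Python model, not Triana's internal classes; it uses Kahn's algorithm to derive one valid execution order and to reject graphs that contain a cycle.

```python
from collections import deque


class TaskGraph:
    """Hypothetical taskgraph: named tasks plus parent/child connections."""

    def __init__(self):
        self.tasks = []
        self.connections = []  # (parent, child) pairs

    def add_task(self, name):
        self.tasks.append(name)

    def connect(self, parent, child):
        self.connections.append((parent, child))

    def execution_order(self):
        """Kahn's algorithm; raises ValueError if the graph has a cycle."""
        indegree = {t: 0 for t in self.tasks}
        children = {t: [] for t in self.tasks}
        for parent, child in self.connections:
            children[parent].append(child)
            indegree[child] += 1
        ready = deque(t for t in self.tasks if indegree[t] == 0)
        order = []
        while ready:
            task = ready.popleft()
            order.append(task)
            for child in children[task]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        if len(order) != len(self.tasks):
            raise ValueError("taskgraph is not a DAG")
        return order
```

A hierarchical (compound) component could be represented by letting a task name stand for a nested `TaskGraph`, though that extension is not shown here.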
11Workflow
- No explicit language support for control constructs
- Loops and execution branching handled by components
- Loop component: controls loop over sub-workflow
- Logical component: controls workflow branching
- Unlike BPEL4WS or similar
- Flexibility of control: constraint based loops etc.
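The idea that loops and branches are ordinary components rather than language constructs can be sketched as below. Both functions are hypothetical illustrations: sub-workflows are modelled as plain Python callables, and the `max_iterations` guard is an added safety assumption.

```python
def loop_component(sub_workflow, condition, value, max_iterations=1000):
    """Re-run sub_workflow on value for as long as condition(value) holds."""
    iterations = 0
    while condition(value):
        if iterations >= max_iterations:
            raise RuntimeError("loop constraint never satisfied")
        value = sub_workflow(value)
        iterations += 1
    return value, iterations


def logical_component(predicate, if_true, if_false, value):
    """Route value down one of two branches depending on predicate(value)."""
    branch = if_true if predicate(value) else if_false
    return branch(value)
```

Since the exit test is just a function of the data, any constraint can drive the loop, for example halving a value until it is no longer greater than 1.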
12Distributing Triana Workflow
- Deploying Remote Services on Resources
- Service application installation
- Service execution
- Service discovery
- Mapping tasks or groups of tasks to Services
- Workflow rewiring: XML definition for connections modified for remote location
- Sub-workflows duplicated
- Data distribution: annotated sub-sections of taskgraph passed to resources
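Workflow rewiring with duplicated sub-workflows can be sketched on a plain list of connections. This is a simplified assumption about how rewiring might look: the `task@serviceN` naming and the fan-out/fan-in treatment of boundary connections are invented for illustration and do not reflect Triana's actual XML rewriting.

```python
def rewire(connections, group, n_services):
    """Duplicate the tasks in `group` once per service and rewire the edges.

    connections: list of (parent, child) task-name pairs
    group: set of task names that should run remotely
    """
    def copies(task):
        if task in group:
            return [f"{task}@service{i}" for i in range(n_services)]
        return [task]

    rewired = []
    for parent, child in connections:
        if parent in group and child in group:
            # Internal edge: stays inside each duplicated sub-workflow.
            for p, c in zip(copies(parent), copies(child)):
                rewired.append((p, c))
        else:
            # Boundary edge: fan out to, or fan in from, every copy.
            for p in copies(parent):
                for c in copies(child):
                    rewired.append((p, c))
    return rewired
```

For a chain Reader, Filter, FFT, Writer with Filter and FFT grouped and two services, the reader feeds both Filter copies, each copy keeps its own Filter-to-FFT link, and both FFT copies feed the writer.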
13GEO 600 Inspiral Search
- Background
- Compact binary stars orbiting each other in a close orbit
- Among the most powerful sources of gravitational waves
- As the orbital radius decreases a characteristic chirp waveform is produced
- Amplitude and frequency increase with time until eventually the two bodies merge together
- Computing
- Need 10 Gigaflops to keep up with real time data (modest search...)
- Data: 8 kHz at 24-bit resolution (stored in 4 bytes) -> signal contained within 1 kHz, 2000 samples/second
- Divided into chunks of 15 minutes in duration (i.e. 900 seconds), 8MB
- Algorithm
- Data is transmitted to a node
- Node initialises, i.e. generates its templates (around 10000)
- Fast correlates its templates with data
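The data-volume figures on this slide can be checked with a short calculation. The raw chunk comes out at about 7.2 MB rather than the quoted 8MB; the slide does not explain the difference. One possible explanation, which is purely an assumption here, is that each chunk is padded to the next power of two in length for FFT-based correlation, which gives exactly 8 MiB. The function name `chunk_budget` is hypothetical.

```python
def chunk_budget(sample_rate_hz=2000, chunk_seconds=900, bytes_per_sample=4):
    """Return sample and byte counts for one chunk of detector data."""
    samples = sample_rate_hz * chunk_seconds
    raw_bytes = samples * bytes_per_sample
    # Assumption (not stated on the slide): pad to the next power of two.
    padded_samples = 1 << (samples - 1).bit_length()
    padded_bytes = padded_samples * bytes_per_sample
    return {
        "samples": samples,
        "raw_bytes": raw_bytes,
        "padded_samples": padded_samples,
        "padded_bytes": padded_bytes,
    }


budget = chunk_budget()
raw_megabytes = budget["raw_bytes"] / 1e6          # decimal megabytes
padded_mebibytes = budget["padded_bytes"] / 2**20  # binary mebibytes
```

With around 10000 templates per node and 900 seconds of data per chunk, a node has to complete roughly 11 template correlations per second to keep pace with real time.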
14Coalescing Binary Search
GEO 600 Coalescing Binary Search Algorithm implemented as a Triana workflow
15Coalescing Binary Scenario
[Figure labels: Controller; Email, SMS notification; Logical File Name; GW Data Distributed Storage; GAT (GRMS, Adaptive): Submit Job, Optimised Mapping; GW Data; GAT (Data Management); CB Search; GridLab Test-bed]
16Triana Service Job Submission
[Figure labels: GAP; GRMS Web Service (rage1.man.poznan.pl); GridLab Testbed]
17Triana GRMS Component
- Front end to GridLab GRMS Web Service
- Job Submission Service - interfaces with GRAM
- GAP Web Service binding GSI Authentication
- Java CoG Kit
- X509 Certificate handling
- Axis authentication communication
- GRMS executes applications on GridLab Testbed
- Heterogeneous hardware platforms
- Default software: Globus 2.4, GSISSH, cc, cvs, c, F90, make, perl, mpicc
18Service Composition Workflow
- Multiple GRMS Components
- Install Applications (ftp, tar, ant)
- Start installed Triana Services
19Dynamic Distributed Workflow
- Distribution units are standard Triana tools, enabling users to create their own custom distributions
21Conclusion
- Shown three distinct workflows
- Service composition workflow to submit grid jobs that deploy multiple Triana Services on remote resources
- Local scientific workflow representing the algorithm
- Dynamic distributed workflow: rewire local workflow for data parallelism across multiple Triana Services
- GAP API
- Web Service binding with GSI: Grid job submission
- P2PS binding: service discovery, service communication
- Combined to perform parallel scientific computation
22Thanks !
- The Astronomers: Prof. B Sathyaprakash, David Churches, Roger Philp and Craig Robinson
- The Triana team: Ian Wang, Andrew Harrison, Omer Rana, Diem Lam and Shalil Majithia
- All the partners in the GridLab project
23Thanks !
Information Software
http://www.trianacode.org/
http://www.gridlab.org/