Title: A solution to Grid application workflow composition problem
1A solution to Grid application workflow composition problem
Tomasz Gubala and Kamil Górka
X TAT
Institute of Computer Science, ACC CYFRONET AGH, Kraków, Poland
www.eu-crossgrid.org
2Presentation outline
- Introduction to an environment and technologies
- Problem overview
- Solution analysis and system design
- Optimization techniques used
- Application example
- Performance test description
- Summary and future work
3The Grid
- A Grid is a system that (Ian Foster)
- coordinates resources that are not subject to centralized control
- uses standard, open, general-purpose protocols and interfaces
- delivers nontrivial qualities of service
- Grid application
- The Grid application is a combination of
cooperating Grid resources (application elements)
joined together to achieve particular
functionality
4Types of Information Flows
- Cooperation introduces communication
- Information transfer between elements through
- data links (dataflow model)
- method invocation (workflow model)
5Application Elements
- System element is a piece of software
- Various possible solutions
- Grid Services
- (Open Grid Service Architecture)
- Components
- (Common Component Architecture)
- Dataflow Modules
- (SCIRun 1)
6OGSA and CCA Flows
7CCA Architecture Concepts
- Component is a specific class
- Ports
- provides
- uses
- The framework acts as glue: it instantiates components and connects
them together
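The provides/uses port idea can be illustrated with a minimal sketch. This is not the CCA specification API: the class and method names (`Component`, `Framework`, `connect`) are hypothetical, and only the component and port names are borrowed from the Water Application example later in the talk.

```python
class Component:
    """A toy component: it provides some ports and uses others."""

    def __init__(self, name):
        self.name = name
        self.provides = {}  # port name -> callable implementing the port
        self.uses = {}      # port name -> connected provider (None until wired)

    def add_provides_port(self, port_name, implementation):
        self.provides[port_name] = implementation

    def register_uses_port(self, port_name):
        self.uses[port_name] = None


class Framework:
    """The glue: instantiates components and wires uses ports to provides ports."""

    def __init__(self):
        self.components = {}

    def instantiate(self, name):
        component = Component(name)
        self.components[name] = component
        return component

    def connect(self, user, uses_port, provider, provides_port):
        # Direct connection: the user holds a reference to the provider's port.
        target = self.components[provider].provides[provides_port]
        self.components[user].uses[uses_port] = target


framework = Framework()
extractor = framework.instantiate("ElevationExtractor")
extractor.add_provides_port("ExtractElevation", lambda: "elevation surface")
visualiser = framework.instantiate("LandscapeVisualiser")
visualiser.register_uses_port("ReceiveElevation")
framework.connect("LandscapeVisualiser", "ReceiveElevation",
                  "ElevationExtractor", "ExtractElevation")
result = visualiser.uses["ReceiveElevation"]()
```

Once connected, a call through the uses port goes straight to the provider, which is the "direct connection" property the next slide lists as an advantage.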
8Advantages of CCA Model
- High level conceptual architecture
- can be implemented in various ways
- useful connection abstraction (from SIMD to Internet)
- Direct connection data transfer
- Language independence (SIDL notation)
- Very simple, the whole specification is a single
html document about 500 lines long
9CCA Frameworks
- XCAT (formerly CCAT), Indiana University
- fully distributed (appropriate for Grid)
- possible merge with Grid Services
- CCAFFEINE, Sandia National Laboratories
- SCMD concept of parallelism
- extremely lightweight
- SCIRun 2 (Uintah), University of Utah
- multithreaded
- migrated from dataflow model
10Flow Composition Problem
- A workflow description of the application is needed to deploy it in a
framework
- It should contain
- list of application elements
- list of element connections
- optionally some dispatch data
- Problems
- lack of important knowledge
- frequent changes of environment
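The three parts of a workflow description listed above could be held in a structure like the following. This is only a sketch; the class and field names are hypothetical and are not taken from the AFC document schemas.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    from_id: str
    uses_port: str
    to_id: str
    provides_port: str


@dataclass
class WorkflowDescription:
    elements: dict = field(default_factory=dict)     # element id -> element name
    connections: list = field(default_factory=list)  # Connection records
    dispatch: dict = field(default_factory=dict)     # optional: element id -> host

    def unconnected_elements(self):
        """Element ids that no connection mentions yet."""
        wired = {c.from_id for c in self.connections}
        wired |= {c.to_id for c in self.connections}
        return sorted(set(self.elements) - wired)


# An incomplete description: two elements are known, no connections yet.
wd = WorkflowDescription(
    elements={"init1": "LandscapeVisualiser", "final1": "TopoMapScanner"})
missing = wd.unconnected_elements()
```

A description in this state illustrates the "lack of important knowledge" problem: the elements are named, but nothing says how they are wired together.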
11Solution Overview
- Application Flow Composer (AFC)
- component based (CCA)
- automatically constructs flows
- What does it need
- incomplete application description
- (from user or portal)
- information about components available in the environment
- (from component registry)
12AFC System Decomposition
- Flow Composer Component (FCC)
- main functionality
- designed as CCA component
- Flow Composer Registry (FCR)
- stores component information
- publishes it when asked to
- designed as a CCA component that uses the OGSA VORegistry Service
13Application Factory Concept
- An Application Factory is a service capable of deploying a whole Grid
application using its detailed description
- AFC can be easily applied to this solution
14Components in Business
- Differences
- providers share information, not software
- framework needed for both development and dispatching (runtime assembly)
15Components on the Grid
16Main Requirements
- The main requirement: to deliver a solution to the workflow composition
problem
- Other requirements
- Grid-deployable
- Registry persistence
- Lookup speed maximization
- Use of externally defined document formats
17AFC System Use Cases
- Workflow Composition
- Component Registration
- Component Unregistration
- Component Lookup
18Implementation Technology
- The XCAT framework
- Registry service implementation possibilities
- OGSA Registry Service
- Component Browser idea
- The Java programming language
19System Decomposition
- Flow Composer Registry Component
- Component Browser Provides Port
- Component Lookup Provides Port
- Uses OGSA for Component Registering
- Flow Composer Component
- Component Lookup Uses Port
- Flow Composition Provides Port
- External Optimizer
20Workflow Composition
- User provides Initial Workflow Document
- Flow Composer Component
- parses IWD
- builds subsequent queries
- performs the lookup in FCR
- parses CSIDs returned by FCR
- builds the Final Workflow Docs
- FCC returns FWDs to the User
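The composition steps above can be sketched as a recursive search that matches each uses port to every provides port of the same type. This is an illustrative sketch, not the FCC implementation: `REGISTRY`, `lookup_by_port_type` and `compose` are hypothetical names, the in-memory dictionary stands in for the FCR, the port types of `ReceiveElevation` and `ScanMap` are assumed, and the sketch ignores the final-component constraint of the Initial Workflow Document.

```python
from itertools import product

# Hypothetical registry content: name -> (uses ports, provides ports),
# each mapping port name -> port type.
REGISTRY = {
    "LandscapeVisualiser": ({"ReceiveElevation": "ElevPointSurfacePortType"}, {}),
    "ElevationExtractor": ({"ReceivesMap": "PicturePortType"},
                           {"ExtractElevation": "ElevPointSurfacePortType"}),
    "TopoMapScanner": ({}, {"ScanMap": "PicturePortType"}),
}


def lookup_by_port_type(port_type):
    """Stand-in for the FCR lookup: all (component, provides port) pairs of a type."""
    return [(name, port)
            for name, (_, provides) in REGISTRY.items()
            for port, ptype in provides.items() if ptype == port_type]


def compose(component, max_components, path=()):
    """Return every flow (list of connections) that satisfies all uses ports."""
    if component in path or len(path) >= max_components:
        return []  # refuse loops and over-long chains (an assumed policy)
    uses, _ = REGISTRY[component]
    per_port = []  # alternatives for each uses port
    for port, ptype in uses.items():
        options = []
        for provider, pport in lookup_by_port_type(ptype):
            for sub in compose(provider, max_components, path + (component,)):
                options.append([(component, port, provider, pport)] + sub)
        if not options:
            return []  # this uses port cannot be satisfied
        per_port.append(options)
    # Combine one alternative per uses port; no uses ports gives one empty flow.
    return [sum(choice, []) for choice in product(*per_port)]


flows = compose("LandscapeVisualiser", max_components=10)
```

Because ports are matched by type only, every additional provider of a type multiplies the number of returned flows, which is the behaviour the later optimization and performance slides deal with.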
21Registration of a Component
- User provides CSID to register
- FCR parses the CSID
- FCR gets the Name Doc from the OGSA registry, puts a new entry there and
registers it back
- The same with the Port Type Doc
- FCR registers the CSID itself with unique
identifier
22Component Lookup
- FCC provides Component Query Doc
- FCR parses CQD
- If it is a query by Port Type, then FCR gets the Port Type Doc from the
registry and finds the proper IDs there
- Otherwise FCR gets the IDs from the Name Doc
- FCR gets the CSIDs from the OGSA registry using the obtained identifiers
and returns them to the FCC
23Unregistration of a Component
- User inputs ComponentID to unregister
- FCR gets the Name Doc from the OGSA registry, deletes the correct entry
from there and registers it back
- The same with the Port Type Doc
- FCR unregisters the CSID itself
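The registration, lookup and unregistration use cases share the same two index documents. A minimal in-memory sketch, assuming plain dictionaries in place of the OGSA registry documents (the class name and method signatures are hypothetical):

```python
class FlowComposerRegistrySketch:
    """In-memory stand-in for the FCR; dicts replace the OGSA registry documents."""

    def __init__(self):
        self.name_doc = {}       # component name -> set of unique IDs
        self.port_type_doc = {}  # port type -> set of unique IDs
        self.csids = {}          # unique ID -> component description

    def register(self, unique_id, name, port_types):
        self.name_doc.setdefault(name, set()).add(unique_id)
        for port_type in port_types:
            self.port_type_doc.setdefault(port_type, set()).add(unique_id)
        self.csids[unique_id] = {"name": name, "port_types": list(port_types)}

    def lookup(self, name=None, port_type=None):
        # A query by port type uses the Port Type Doc; otherwise the Name Doc.
        if port_type is not None:
            ids = self.port_type_doc.get(port_type, ())
        else:
            ids = self.name_doc.get(name, ())
        return [self.csids[uid] for uid in sorted(ids)]

    def unregister(self, unique_id):
        csid = self.csids.pop(unique_id)
        self.name_doc[csid["name"]].discard(unique_id)
        for port_type in csid["port_types"]:
            self.port_type_doc[port_type].discard(unique_id)


registry = FlowComposerRegistrySketch()
registry.register("wa.ee.1", "ElevationExtractor",
                  ["PicturePortType", "2DMeshPortType", "ElevPointSurfacePortType"])
found = registry.lookup(port_type="PicturePortType")
registry.unregister("wa.ee.1")
after = registry.lookup(name="ElevationExtractor")
```

Keeping both indexes means either kind of query resolves to identifiers in one step, at the cost of updating two documents on every registration and unregistration.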
24External and Internal Documents
- Workflow Definition Doc
- Initial WD and Final WD
- Component Static Information Doc
- Component Query Doc
- Name Doc and Port Type Doc
25Need for Optimization
- An example FWD output document contains
- a list of n flows, each with
- a list of m components, each of which can be
- run on k different hosts
- Overall degree of complexity
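The complexity expression itself is not preserved in this text. As a rough illustration only, under the assumption that the document lists every candidate host for every component of every flow, its size and the number of concrete deployments it encodes can be counted as follows (both function names are hypothetical):

```python
def fwd_entry_count(n_flows, m_components, k_hosts):
    """Host entries listed in the document: one per flow, component and host."""
    return n_flows * m_components * k_hosts


def deployment_count(n_flows, m_components, k_hosts):
    """Concrete deployments: each of the m components picks one of k hosts."""
    return n_flows * k_hosts ** m_components


entries = fwd_entry_count(3, 5, 4)
deployments = deployment_count(3, 5, 4)
```

Even for these small values the number of possible deployments is far larger than the document that lists them, which motivates selecting preferred hosts rather than enumerating all combinations.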
26Entities in Optimization Process
- Flow Optimizer: reduces output size by selecting better flows; uses flow
evaluation values when making decisions
- Flow Evaluator: computes the quality value of a particular flow; uses
external environmental values
- Grid Monitor: measures and stores Grid environment monitoring data,
sharing it with others
27Optimization Techniques
28Optimization Algorithm
- Optimization proceeds in three steps
- for initial components, FO finds the host with the best performance
- for every component communicating with these, FO finds the host with
the best host speed to link throughput ratio
- the last step is repeated recursively until every component has a
preferred host assigned
- Quality: simple sum of the computed values
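The three steps can be sketched as a greedy walk outwards from the initial components. This is a sketch under stated assumptions, not the Flow Optimizer's code: the function name, its parameters and the sample hosts are hypothetical, and because the slide does not define how host speed and link throughput are combined, the `score` function (here `min`, treating the slower of the two as the bottleneck) is a placeholder.

```python
from collections import deque


def assign_hosts(initial, neighbours, candidates, speed, throughput, score=min):
    """Greedy host assignment; returns (component -> host, quality)."""
    assignment, quality = {}, 0.0
    queue = deque()
    # Step 1: initial components get the fastest of their candidate hosts.
    for comp in initial:
        host = max(candidates[comp], key=lambda h: speed[h])
        assignment[comp] = host
        quality += speed[host]
        queue.append(comp)
    # Steps 2 and 3: place each unplaced neighbour, then continue from it.
    while queue:
        placed = queue.popleft()
        src = assignment[placed]
        for comp in neighbours.get(placed, []):
            if comp in assignment:
                continue

            def value(h):
                # Placeholder combination of host speed and link throughput.
                return score(speed[h], throughput[(src, h)])

            host = max(candidates[comp], key=value)
            assignment[comp] = host
            quality += value(host)  # quality is the simple sum of the values
            queue.append(comp)
    return assignment, quality


speed = {"h1": 10, "h2": 6, "h3": 8}
throughput = {("h1", "h2"): 9, ("h1", "h3"): 2}
assignment, quality = assign_hosts(
    initial=["LandscapeVisualiser"],
    neighbours={"LandscapeVisualiser": ["ElevationExtractor"]},
    candidates={"LandscapeVisualiser": ["h1", "h2"],
                "ElevationExtractor": ["h2", "h3"]},
    speed=speed, throughput=throughput)
```

In the sample data the faster host `h3` loses to `h2` because its link to the already placed component is slow, which is the trade-off the second step is meant to capture.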
29Example Water Application
30Example Description
- Simulation of the river flow through a 3-D modeled landscape
- Problem with too general port descriptions
- Useless flows and a way to avoid them
- Document examples
31Example Possibility of a Useless Flow
32Example Initial Workflow Description
<?xml version="1.0"?>
<!--River-on-Map Visualising Application
    Authors: Tomasz Gubala, Kamil Gorka
    Date: 26 February 2003-->
<workflowDefinition appName="WaterApp" maxComponents="10"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="../schemas/wd.xsd">
  <initComponent id="init1" name="LandscapeVisualiser"/>
  <finalComponent id="final1" name="TopoMapScanner"/>
</workflowDefinition>
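Reading such an Initial Workflow Document takes only a few lines with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a copy of the example with the schema attributes omitted; the function name `parse_iwd` and the keys of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

IWD = """<?xml version="1.0"?>
<workflowDefinition appName="WaterApp" maxComponents="10">
  <initComponent id="init1" name="LandscapeVisualiser"/>
  <finalComponent id="final1" name="TopoMapScanner"/>
</workflowDefinition>"""


def parse_iwd(text):
    """Extract the application name, size limit and end-point components."""
    root = ET.fromstring(text)
    return {
        "app_name": root.get("appName"),
        "max_components": int(root.get("maxComponents")),
        "initial": [c.get("name") for c in root.findall("initComponent")],
        "final": [c.get("name") for c in root.findall("finalComponent")],
    }


iwd = parse_iwd(IWD)
```

The parsed values are what a composer needs to start its search: where the flow begins, where it must end, and how many components it may contain.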
33Example Component Static Information
<componentStaticInformation>
  <componentInformation>
    <uniqueID>wa.ee.1</uniqueID>
    <name>ElevationExtractor</name>
    <author>Tomasz Gubala</author>
    <componentAuthor>Kamil Gorka</componentAuthor>
    <portInfo isUsed="true">
      <portName>ReceivesMap</portName>
      <portType>PicturePortType</portType>
    </portInfo>
    <portInfo isUsed="true">
      <portName>GetElevMesh</portName>
      <portType>2DMeshPortType</portType>
    </portInfo>
    <portInfo>
      <portName>ExtractElevation</portName>
      <portType>ElevPointSurfacePortType</portType>
    </portInfo>
  </componentInformation>
  <executionEnv>
    <hostName>H1.c</hostName>
34Example Final Workflow Description
<workflowDefinition appName="WaterApp" maxComponents="10" completed="true">
  <initComponent id="FCCID2" name="LandscapeVisualiser" ready="true" connected="true">
    <executionEnv>
      <hostName>marlowe.inv.pl</hostName>
      <creationProto>exec</creationProto>
    </executionEnv>
  </initComponent>
  <interComponent id="FCCID3" name="ElevationExtractor" ready="true" connected="true">
    <executionEnv> ... </executionEnv>
  </interComponent>
  <interComponent id="FCCID4" name="WaterCorrector" ready="true" connected="true">...</interComponent>
  <interComponent id="FCCID5" name="MeshGenerator" ready="true" connected="true">...</interComponent>
  <!-- more inter components not shown -->
  <finalComponent id="FCCID1" name="TopoMapScanner" ready="true" connected="true">...</finalComponent>
  <connection from="FCCID2" usesPort="ReceiveElevation" to="FCCID3" providesPort="ExtractElevation"/>
  <connection from="FCCID2" usesPort="ReceiveWaterSurface" to="FCCID4" providesPort="WaterSurface"/>
  <connection from="FCCID3" usesPort="ReceivesMap" to="FCCID1" providesPort="ScanMap"/>
  <connection from="FCCID3" usesPort="GetElevMesh" to="FCCID5" providesPort="2DMeshGenerator"/>
  <connection from="FCCID4" usesPort="ReceiveArtefacts" to="FCCID6" providesPort="ExtractArtefacts"/>
35Example - Discussion
- Lack of knowledge about the semantics of the components
- Loops: wrong or not?
- Port identification only by its type (data structure)
- Possibility of building useless flows
- No way of reducing the problem space
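Whether loops should be rejected is left open above, but detecting them is straightforward. The sketch below runs a depth-first search over the (from, to) pairs of the connections in the Final Workflow Description example; the function name `has_loop` and the extra looping connection are hypothetical.

```python
def has_loop(connections):
    """Return True if the directed uses -> provides graph contains a cycle."""
    graph = {}
    for src, dst in connections:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {node: WHITE for node in graph}

    def visit(node):
        state[node] = GREY
        for nxt in graph[node]:
            if state[nxt] == GREY:
                return True  # reached a node already on the current path
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in graph)


water_app = [("FCCID2", "FCCID3"), ("FCCID2", "FCCID4"), ("FCCID3", "FCCID1"),
             ("FCCID3", "FCCID5"), ("FCCID4", "FCCID6")]
acyclic = has_loop(water_app)
looped = has_loop(water_app + [("FCCID5", "FCCID2")])
```

A check like this only answers the structural question; deciding whether a particular loop is meaningful still needs the semantic knowledge that the port types alone do not carry.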
36System Performance Analysis Test 1
- Growth of component number
- Adding a layer of components in every step
- Every uses port corresponds to one provides port
- one possible flow
- Measuring the time growth rate
37System Performance Analysis Test 2
- Growth of port number
- Adding a layer of connections in every step
- Every uses port corresponds to one provides port
- one possible flow
38Performance Analysis Results (1)
- Linear growth of computation time
- Registry searching overhead
39Performance Analysis Results (2)
- Results of the second test similar to the first one
- Test 3: growth of port replication
- Worst case: exponential growth of flow number
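The exponential worst case follows from simple counting. Assuming a chain of connection layers in which every uses port can be matched by the same number of provides ports (a simplification of Test 3; `flow_count` is a hypothetical helper):

```python
def flow_count(layers, replication):
    """Flows in a chain of `layers` layers, each uses port having
    `replication` matching provides ports."""
    return replication ** layers


single = [flow_count(layers, 1) for layers in range(1, 6)]
doubled = [flow_count(layers, 2) for layers in range(1, 6)]
```

With a replication of one, as in Tests 1 and 2, there is always exactly one flow; with two providers per port the flow count doubles with every added layer.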
40Performance Analysis - Discussion
- Expected results obtained
- Worst Case Exponential Complexity
- Component replication
- Port Replication
- Communication overhead
- FCC -> FCR
- FCR -> OGSA
- Reasonable flow building times for real
applications
41Summary
- Fulfillment of the functional requirements
- workflow composition
- registering and unregistering components
- component lookup
- Shortcomings of the AFC system
- limited registry persistence
- Registry Image Snapshot
- Development process benefits
- technologies evaluation
- design and implementation of the AFC system using CCA and OGSA
- performance evaluation
- system improvement study
42Future Work
- Flow optimization improvements
- named quantities
- metrics definition
- frequency of computing
- optimization aim specification
- Evolution of the FCR
- improving performance
- authorization facility
- Upgrading the underlying framework
43Links
- Main source document
- Interactive Applications and Component Architecture Approach
- Kamil Górka, Tomasz Gubala
- M.Sc. Thesis, ICS AGH
- Kraków, May 2003
- http://ernie.icslab.agh.edu.pl/kgorka/thesis.pdf
44Acknowledgements
- Special thanks to
- Katarzyna Zajac
- Maciej Malawski
- Michal Kapalka
- Very special thanks to
- dr inz. Marian Bubak