Title: Lightweight Service Oriented Parallelism
1Lightweight Service Oriented Parallelism
- Paul Roe
- Queensland University of Technology (QUT)
- p.roe_at_qut.edu.au
2QUT
- Queensland University of Technology (QUT)
- One of the largest universities in Australia: 40,000 students (undergraduate, postgraduate, 10% international)
- Applied emphasis, strong links with industry
- Motto: A university for the real world
- Faculty of IT: 4000 students, 20% international
3My Background
- Academic at QUT for 10 years
- I am a computer scientist with a background in:
- Programming languages
- Distributed computing
- Practical / applied emphasis
- I lead a small research group interested in grid
computing and eScience
4Two Parts
- Introduction to web services and service orientation
- Lightweight Service Oriented Parallelism
5Web services
6Web services (WS)
- Computer to computer messaging using XML
- Typically SOAP for messaging protocol with WSDL (Web Services Description Language)
- Standard and platform neutral
- Designed for eCommerce and enterprise application integration
- Similarities with MPI
- message passing
- Support for different message exchange patterns
- Web service principles and technologies are evolving
- Originally SOAP was for lightweight RPC between objects
- SOAP and WSDL support RPC and messaging encoding and styles
- Now strong move to XML centric messaging
7Why Not CORBA, DCOM, Java RMI etc.?
- Distributed object models try to scale local OO model
- Ok for a LAN
- Breaks for Internet
- Too complex
- Assume an object model, virtual machine etc.
- Large investment for little return
- Poor interoperability
- WS designed for interoperability: primary goal
- Designed for local area networks rather than Internet
- Not standards based (except CORBA)
- Problems bootstrapping, all or nothing approach
- Other attempts e.g. EDI
- Problem: fixed, not extensible
8XML Basics
- XML is the basis for web services
- XML is a platform neutral data language
- XML is three things:
- Family of specifications e.g. XSLT, XPath, etc.
- Serialisation format (XML 1.0 with tags etc.)
- Infoset Model for data
- XML can be described by XML schema
9Infoset
- Infoset is a model of XML
- Essence of XML
- XML is no longer just a syntax
- This is important: it opens the way to other representations of XML
"XML is very inefficient: it's verbose, there's lots of angle brackets, everything's a Unicode string, there's no binary format, you've always got to parse it first, and that's why web services are slow"
Wrong!
10SOAP
- Provides two key features for XML based messaging
- Separation of message header vs payload data (envelope with header and body)
- Standard way to report faults
- No further evolution of SOAP necessary!
- Extensible header mechanism supports modular and composable advanced services e.g. security, transactions and reliability
- Vital feature
11SOAP
<envelope>
<body> Message payload, data
<fault> SOAP error (optional)
12SOAP Extensible Headers
Extensible header info can be optional or
mandatory
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <t:Transaction xmlns:t="some-URI" soap:mustUnderstand="1">5</t:Transaction>
  </soap:Header>
  <soap:Body>
    <Add xmlns="http://www.qut.edu.au/">
      <a>1</a>
      <b>2</b>
    </Add>
  </soap:Body>
</soap:Envelope>
SOAP body, message payload
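The optional versus mandatory distinction for header blocks can be sketched in Python as follows. This is a minimal illustration, not how any particular SOAP toolkit works: the function names `mandatory_headers` and `check_understood` are hypothetical, and the envelope is the example from this slide with the unused `xsi` and `xsd` declarations omitted.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# The example envelope from the slide (unused xsi/xsd declarations omitted).
ENVELOPE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <t:Transaction xmlns:t="some-URI" soap:mustUnderstand="1">5</t:Transaction>
  </soap:Header>
  <soap:Body>
    <Add xmlns="http://www.qut.edu.au/"><a>1</a><b>2</b></Add>
  </soap:Body>
</soap:Envelope>"""


def mandatory_headers(envelope_xml):
    """Return the qualified names of header blocks marked mustUnderstand="1"."""
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP_NS}}}Header")
    if header is None:
        return []
    flag = f"{{{SOAP_NS}}}mustUnderstand"
    return [block.tag for block in header if block.get(flag) == "1"]


def check_understood(envelope_xml, understood):
    """Hypothetical receiver check: reject mandatory headers we cannot process."""
    unknown = [h for h in mandatory_headers(envelope_xml) if h not in understood]
    if unknown:
        raise ValueError(f"mustUnderstand header(s) not supported: {unknown}")
```

A receiver that supports the transaction header would call `check_understood(ENVELOPE, {"{some-URI}Transaction"})`; headers without the flag are simply ignored when not recognised, which is what makes the mechanism composable.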
13WSDL (1.1)
<definitions> root element
(Typically XML Schema)
Abstract, c.f. interface
<binding> How will messages be transmitted
SOAP specifics, encoding etc.
concrete
WSDL is an XML document. Elements can be split
across multiple files.
14Web service invocation: The big picture
[Diagram: a WSDL doc (contains/refs XML schema) describes the service; the web service proxy and web service stub are generated from it using developer tools e.g. Visual Studio or Eclipse. Sender: the client program calls the web service proxy, which serialises the message into an XML document. The XML message is sent on the wire in SOAP format. Receiver: the web service stub deserialises the message for the server program.]
15Web Services Landscape
[Diagram: web services protocol stack]
- Description: WSDL, XML Schema
- Discovery: UDDI, WS-Discovery, WS-MetadataExchange
- Composable service assurances: Security, Reliable Messaging, Transactions
- WS-Policy
- Messaging: WS-Addressing, MTOM, XML, SOAP
- Transport: HTTP, HTTPS, SMTP, TCP, ...
16Service Orientation
17Service Orientation (SO)
- Architectural view of software and systems inspired by web services
- Much hype!
- "Service-oriented development focuses on systems that are built from a set of autonomous services." (Don Box)
- No flat space containing a sea of objects
- There are four tenets:
- Boundaries are explicit
- Services are autonomous
- Services share schema and contract, not class
- Service compatibility is determined based on policy
- Key idea: services are loosely coupled and autonomous
- Web services are one possible implementation
18SO vs Distributed Objects
- CORBA, DCOM, Java RMI etc. try to present a uniform view of the world
- Common object model
- Set of objects all living in the same space
- Ok for a LAN: single admin domain, reliable, simple security, homogeneous
- Doesn't work on the internet
- Can't do business by dictation: "you must use CORBA / RMI / DCOM etc."
- Increasingly doesn't work in LAN
- Move to more structure, local firewalls and tiered admin within organisations
- Déjà vu?
- C.f. TCP sockets (no shared implementation)
- Policy > metadata
19Parallelism
20Motivation and Ideas
- Use SOAP instead of MPI
- Interoperability
- Leverage higher level WS specs e.g. security
- Service orientation decouples clients and servers, producers and consumers
- Simple producer consumer models of parallelism can benefit from SO
- E.g. when producers are legacy applications and consumers are modern e.g. WS enabled apps or modern scripts
21Two Simple Models of Parallelism
- (Both producer consumer)
- Futures (Task-result)
- Lisp futures or Cilk etc.
- Linda
- Tuple space, JavaSpaces etc.
22Futures
- Idea: spawn function calls asynchronously
- handle = Future(Add(1,2))
- Create a task to perform Add(1,2)
- Can interrogate the handle to enquire on result
- Web services can naturally express this form of
communication
Client
Cluster
handle Add(int,int)
int getAdd(handle)
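The two operations on this slide can be sketched in Python with an in-process stand-in for the cluster. Everything here is illustrative: `FutureService` and `wait_for` are hypothetical names, a thread plays the role of a cluster node, and handles are simple counters rather than the identifiers a real service would issue.

```python
import itertools
import threading
import time


class FutureService:
    """In-process stand-in for the cluster: add() spawns a task, get_add() polls."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()
        self._handles = itertools.count(1)

    def add(self, a, b):
        """Counterpart of 'handle Add(int,int)': start the task, return a handle."""
        handle = next(self._handles)

        def work():
            value = a + b
            with self._lock:
                self._results[handle] = value

        threading.Thread(target=work, daemon=True).start()
        return handle

    def get_add(self, handle):
        """Counterpart of 'int getAdd(handle)': the result, or None if not ready."""
        with self._lock:
            return self._results.get(handle)


def wait_for(service, handle, interval=0.01, attempts=500):
    """Client-side polling loop; gives up after a bounded number of attempts."""
    for _ in range(attempts):
        result = service.get_add(handle)
        if result is not None:
            return result
        time.sleep(interval)
    raise TimeoutError(f"no result for handle {handle}")
```

A client would call `handle = service.add(1, 2)`, carry on with other work, and later call `wait_for(service, handle)` to collect the result.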
23Add Request
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Add xmlns="http://www.qut.edu.au/">
      <a>1</a>
      <b>2</b>
    </Add>
  </soap:Body>
</soap:Envelope>
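A request like the one above could be generated with the Python standard library roughly as follows. This is a sketch under assumptions: `build_request` is a hypothetical helper, and the serialiser chooses its own prefix for the service namespace instead of the default-namespace form shown on the slide, which is a different serialisation of the same Infoset.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_NS = "http://www.qut.edu.au/"


def build_request(operation, **params):
    """Build a SOAP envelope whose body holds one call element with its arguments."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Arguments live in the service namespace, as <a> and <b> do on the slide.
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

For example, `build_request("Add", a=1, b=2)` yields an envelope that a namespace-aware parser reads identically to the hand-written request.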
24Add Response
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <AddResult xmlns="http://www.qut.edu.au/">437643786432</AddResult>
  </soap:Body>
</soap:Envelope>
25getResultAdd Request
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getAdd xmlns="http://www.qut.edu.au/">
      <handle>437643786432</handle>
    </getAdd>
  </soap:Body>
</soap:Envelope>
26getResultAdd Response
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getAddResult xmlns="http://www.qut.edu.au/">3</getAddResult>
  </soap:Body>
</soap:Envelope>
If result not ready return null (empty)
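On the client side, the "null (empty) if not ready" convention might be handled as in the sketch below. `parse_get_add_response` is a hypothetical helper, and treating a missing or empty `getAddResult` element as "not ready" is an assumption about how the empty result is encoded on the wire.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "q": "http://www.qut.edu.au/",
}


def parse_get_add_response(xml_text):
    """Return the integer result, or None when the result is not ready yet."""
    root = ET.fromstring(xml_text)
    result = root.find("soap:Body/q:getAddResult", NAMESPACES)
    if result is None or result.text is None or not result.text.strip():
        return None  # assumed encoding of "not ready": absent or empty element
    return int(result.text)
```

A polling client would keep calling the service until this function returns something other than `None`.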
27Caching
- Assume computation is functional
- Cache results on server
- Sessionless
- Poll server until get result
- Need to match args to see if already got result
- Can support both kinds of function in web service
interface
Client
Cluster
int Add(int,int)
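The sessionless, cached form of the call can be sketched as below. This is an illustrative model only: `CachingAddService` is a hypothetical name, an in-memory dictionary stands in for the server-side cache, and `run_pending` plays the part of the cluster computing outstanding jobs.

```python
class CachingAddService:
    """Sessionless 'int Add(int,int)': repeated calls with equal args share a result."""

    def __init__(self):
        self._cache = {}    # (a, b) -> result, or None while still pending
        self._pending = []  # argument tuples waiting for an executor

    def add(self, a, b):
        """Return the cached result, or None after queueing the job on first sight."""
        key = (a, b)
        if key not in self._cache:
            self._cache[key] = None
            self._pending.append(key)
        return self._cache[key]

    def run_pending(self):
        """Stand-in for the cluster: compute every queued job and cache the result."""
        while self._pending:
            a, b = self._pending.pop(0)
            self._cache[(a, b)] = a + b
```

Because the function is assumed to be purely functional, the client needs no handle: it simply repeats `add(1, 2)` until the answer is no longer `None`, and any other client asking for the same arguments gets the same cached answer.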
28Data Parallelism
- Problem: asynchronous programming model rather tricky
- Often want to invoke many functions en masse
- Can build data parallel abstractions in language to support data parallelism
- E.g. matrix add
- Also build into web service framework: automatically lift pointwise operations
Client
Cluster
int Add(int,int)
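Lifting a pointwise operation to whole collections can be illustrated with a small higher-order function. This is a sketch of the idea rather than the framework's mechanism: `lift` and `matrix_add` are hypothetical names, and the pointwise call is a local function where the real system would issue one web service job per element.

```python
def lift(pointwise):
    """Turn a two-argument pointwise call into one over equal-length sequences."""
    def lifted(xs, ys):
        if len(xs) != len(ys):
            raise ValueError("sequences must have the same length")
        return [pointwise(x, y) for x, y in zip(xs, ys)]
    return lifted


def matrix_add(add, m1, m2):
    """Matrix add as a doubly lifted pointwise add (rows, then elements)."""
    return lift(lift(add))(m1, m2)
```

With `add = lambda a, b: a + b`, the call `matrix_add(add, [[1, 2], [3, 4]], [[10, 20], [30, 40]])` applies the pointwise operation to every pair of corresponding elements, so each element could be computed as an independent job.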
29System Overview
[Diagram: the client and the grid/cluster are decoupled and autonomous. Each communicates through web services with the server, which consists of a web server and a job repository (function cache).]
30System Properties
- Job requestors poll for results and for creating tasks
- Job executors poll for jobs
- Decouple result requestors/consumers from result producers
- Result producers can be legacy code
- Result consumers can be different code
- Completely decoupled
- Can share results
- Also naturally fault tolerant if cache results in a stable store
- (Service orientation
- 1. Boundaries are explicit
- 2. Services are autonomous)
31Result cache
- Need a stable store
- Need to efficiently store results and compare arguments (XML)
- Use an XML database e.g. Xindice, SQL Server 2005 etc.
- One table per job type e.g. table for Add
- Use stored procedures to perform operations
- Need facility to create tables
- Also a web service
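The shape of such a job repository can be sketched with Python's built-in `sqlite3` module standing in for an XML database; this swaps in a different storage engine and plain functions in place of stored procedures. All function names, the column names, and the idea of keying rows by a precomputed `args_key` are assumptions made for illustration.

```python
import sqlite3


def _table(job_type):
    """Validate the job type before it is used as a (quoted) table name."""
    if not job_type.isidentifier():
        raise ValueError(f"invalid job type: {job_type!r}")
    return f'"{job_type}"'


def create_job_table(conn, job_type):
    """One table per job type, e.g. a table for Add."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {_table(job_type)} "
        "(args_key TEXT PRIMARY KEY, args_xml TEXT NOT NULL, result_xml TEXT)"
    )


def create_job(conn, job_type, args_key, args_xml):
    """Requestor side: register a job unless the same arguments are already known."""
    conn.execute(
        f"INSERT OR IGNORE INTO {_table(job_type)} (args_key, args_xml) VALUES (?, ?)",
        (args_key, args_xml),
    )


def next_job(conn, job_type):
    """Executor side: poll for a job without a result; returns (key, args) or None."""
    return conn.execute(
        f"SELECT args_key, args_xml FROM {_table(job_type)} "
        "WHERE result_xml IS NULL LIMIT 1"
    ).fetchone()


def put_result(conn, job_type, args_key, result_xml):
    """Executor side: store the result so every requestor can share it."""
    conn.execute(
        f"UPDATE {_table(job_type)} SET result_xml = ? WHERE args_key = ?",
        (result_xml, args_key),
    )


def get_result(conn, job_type, args_key):
    """Requestor side: poll for the result; None while the job is unfinished."""
    row = conn.execute(
        f"SELECT result_xml FROM {_table(job_type)} WHERE args_key = ?", (args_key,)
    ).fetchone()
    return row[0] if row else None
```

Requestors and executors only ever touch the table, never each other, which is where the decoupling and the fault tolerance of a stable store come from.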
32Jobs, Schema and Web Services
[Diagram: job creators / consumers and job executors call web services on the server, which operate on a job table. Operations shown: Create table, Create job, Get result, Put result, Data parallel. Schema and WSDL describe the services.]
33Database
34WSDL, Schema etc
- Typed jobs: when a job type is created the schema must be provided for the inputs and outputs to the function.
- The WSDL, table, and web services are created automatically
- (Service orientation
- 3. Services share schema and contract, not class
- 4. Service compatibility is determined based on policy)
35Details
- Using SQL 2005
- Supports XML indexing, but not testing XML for equality
- Therefore need an efficient mechanism to compare web service call inputs with what is already in the database
- Use canonicalisation provided by XML security and generate a hash from this
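The canonicalise-then-hash idea can be sketched with the Python standard library. Note the substitutions: `xml.etree.ElementTree.canonicalize` (Python 3.8+) implements Canonical XML 2.0, which is not necessarily the canonicalisation the original system used, and the choice of SHA-256 and the name `args_hash` are assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET


def args_hash(xml_text):
    """Hash the canonical form so equivalent serialisations compare equal."""
    # strip_text=True drops insignificant whitespace around text content.
    canonical = ET.canonicalize(xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests that differ only in indentation, attribute order, or empty-element syntax then produce the same key, which can be stored in an indexed column and compared with ordinary string equality.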
36User Interface
37Utilising Idle Machines
- (old project G2, g2.fit.qut.edu.au)
- System is amenable to cycle scavenging
- Extend the system to also support code caching and distribution for simple code
- Can be heterogeneous and support Java applets, .NET etc.
- Volunteer machines download jobs and code
- Extra table in database
38Results
- BLAST application running on ten node test cluster
- Speedup of 9.96 times for 40 jobs of approx 1m57s duration
- The bioinformatics SVM application in 50 PC lab (cycle scavenging)
- Speedup of 46 times with 200 jobs of approx 1m44s duration (input and output were negligible)
- Works well for coarse grained parallelism
- To generate tasks simply send an XML doc to the
server via a tool or DIY
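The reported speedups can be put in context with a little arithmetic, sketched below. The assumptions are mine: each node or PC counts as one worker, all jobs take the stated approximate duration, and speedup is measured against running every job sequentially on one machine.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / workers


def estimated_parallel_seconds(jobs, seconds_per_job, speedup):
    """Approximate wall-clock time implied by a speedup over sequential execution."""
    return jobs * seconds_per_job / speedup


# BLAST: 40 jobs of about 117 s on 10 nodes, speedup 9.96.
blast_efficiency = parallel_efficiency(9.96, 10)
blast_seconds = estimated_parallel_seconds(40, 117, 9.96)

# SVM: 200 jobs of about 104 s on 50 PCs, speedup 46.
svm_efficiency = parallel_efficiency(46, 50)
svm_seconds = estimated_parallel_seconds(200, 104, 46)
```

Under these assumptions the BLAST run is at roughly 99.6% efficiency and the cycle-scavenging SVM run at roughly 92%, which fits the observation that the approach suits coarse grained jobs where polling overhead is small relative to job duration.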
39REST
- Many end user applications support binding to XML
- E.g. in Excel can simply import XML data
- REST: different style of web services based on HTTP verbs
- Expose results as XML through a URL e.g.
- eresearch.fit.qut.edu.au/g2x/Add/1/2
- Results in an XML doc
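The URL scheme in the example, function name followed by one path segment per argument, can be sketched as a small helper. The `http://` scheme and the name `result_url` are assumptions; only the host and path layout come from the slide.

```python
from urllib.parse import quote

# Host and path prefix from the slide; the http:// scheme is assumed.
BASE = "http://eresearch.fit.qut.edu.au/g2x"


def result_url(function, *args, base=BASE):
    """Build <base>/<function>/<arg1>/<arg2>/... with each segment percent-encoded."""
    segments = [quote(str(part), safe="") for part in (function, *args)]
    return "/".join([base.rstrip("/"), *segments])
```

For instance, `result_url("Add", 1, 2)` gives the URL from the slide, which a spreadsheet or script could then fetch with an ordinary HTTP GET and bind to as XML.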
40Linda
- (Work in progress)
- Alternative simple model of parallelism
- Linda has a tuple space and 4 operations
- in, out, rd, eval
- Add and copy/remove tuples from tuplespace
- Remove and copy by associative matching on data
- Naturally asynchronous model
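A minimal in-memory tuple space illustrates three of the operations. This is a simplified sketch: `rd` and `in_` here return `None` instead of blocking as in classic Linda, `eval` is omitted, `None` in a pattern acts as the wildcard, and the class and method names are my own (`in` is a reserved word in Python, hence `in_`).

```python
class TupleSpace:
    """Toy Linda tuple space with associative matching; None is a wildcard."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Add a tuple to the space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(pattern, candidate):
        return len(pattern) == len(candidate) and all(
            p is None or p == c for p, c in zip(pattern, candidate)
        )

    def rd(self, *pattern):
        """Copy the first matching tuple, leaving it in the space."""
        for candidate in self._tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def in_(self, *pattern):
        """Remove and return the first matching tuple."""
        for index, candidate in enumerate(self._tuples):
            if self._matches(pattern, candidate):
                return self._tuples.pop(index)
        return None
```

Producers call `out("result", 42)`; a consumer can then call `in_("result", None)` to withdraw any tuple tagged `"result"` without knowing who produced it.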
41XML Databases and Linda
- Use XML instead of tuples
- XML databases store XML data and support querying data
- Build a Linda like system
- SQL Server supports XQuery (Xindice supports XPath)
- Use XQuery to query for data
- XQuery is a SQL like functional language for querying XML data
- Have a few simple web services to add and remove XML data
- (related work on XSpaces etc.)
42Operations
- As in the functional case, support creation of typed XML tables, but hold just a single XML value
- Operations (web services)
URL CreateLindaTable(XML Schema)
void Put(XMLDoc)
XMLDoc Take(XQuery-string)
XMLDoc Copy(XQuery-string)
43Linda
Cluster
Web services
Producers: Put(<foo> </foo>)
Table
XML documents
<foo> </foo>
<foo> </foo>
<foo> </foo>
<foo> </foo>
Consumers: Take(for $v in / where $v/@val < 2000 return $v)
<foo> </foo>
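The Put, Take and Copy operations over XML documents can be sketched in memory as follows. This replaces the XML database with a Python list and, because the standard library has no XQuery engine, replaces the XQuery string with a Python predicate over the parsed element; `XmlSpace` and its method names are hypothetical.

```python
import xml.etree.ElementTree as ET


class XmlSpace:
    """Toy Linda-like table of XML documents; predicates stand in for XQuery."""

    def __init__(self):
        self._docs = []

    def put(self, xml_text):
        """Add an XML document to the table."""
        self._docs.append(ET.fromstring(xml_text))

    def _first(self, predicate):
        for index, doc in enumerate(self._docs):
            if predicate(doc):
                return index
        return None

    def copy(self, predicate):
        """Return the first matching document, leaving it in the table."""
        index = self._first(predicate)
        if index is None:
            return None
        return ET.tostring(self._docs[index], encoding="unicode")

    def take(self, predicate):
        """Remove and return the first matching document."""
        index = self._first(predicate)
        if index is None:
            return None
        return ET.tostring(self._docs.pop(index), encoding="unicode")
```

The consumer query on the slide, documents whose `val` attribute is below 2000, corresponds here to `space.take(lambda v: int(v.get("val")) < 2000)`.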
44Preliminary Results
- Preliminary results encouraging
- Sending around XQueries: some security issues e.g. DoS attacks etc.
- Model well suited to certain algorithms e.g. genetic algorithms where got a set of improving values
- Producers and consumers tend to be the same program
- But just need to generate and send XML docs to server
- Can have multiple tables
- Locking?
45Future Work
- Search on functional parallelism cache
- Notification interface
- WS Resource Framework
- Untyped jobs
- Security
- Connect to a proper job scheduler
- Server is a bottleneck: can we use database replication etc. to alleviate this?
46Conclusions
- Web services and databases can support simple lightweight service oriented parallelism
- Service orientation very useful, particularly the decoupling
- Databases useful: highly tuned
- Need to support different paradigms