Web Services in Scientific Applications

1
Web Services in Scientific Applications
  • Jon Blower
  • Reading e-Science Centre

2
Intro
  • Web Services provide significant benefits of
    platform neutrality and wide adoption
  • Highly suitable for simple request-response
    interactions
  • e.g. get the current temperature at a given
    location
  • However, plain Web Services are, on their own,
    not always appropriate for:
  • cases where the request-response time has to be
    very short
  • long-running services (e.g. performing a
    calculation that takes minutes or more to run)
  • services that consume or produce large amounts of
    data
  • Unfortunately this describes many scientific
    applications!
  • So what can we do about this?

3
Problem 1: I need a fast request-response time
  • Web Services work by exchanging XML documents.
  • XML parsing is relatively slow.
  • Therefore there is not a lot we can do to
    increase the responsiveness (i.e. decrease the
    latency) of a Web Service!
  • WS only really suitable when the XML parsing time
    can be tolerated.
  • You would not normally use Web Services to
    exchange messages in a tightly-coupled compute
    cluster!
  • Use MPI (or similar) instead
  • (But you might provide a Web Service job
    submission interface to the whole cluster)

4
Problem 2: My service will take a long time to run
  • If the service is long-running, it can run into
    problems such as connection timeouts
  • Also probably not good practice to block the
    calling program until the WS completes
  • One solution is for the WS to return immediately
    with a ticket number (or job ID).
  • The client can call the WS again with this ticket
    to get progress
  • Or the user could supply an email address and the
    remote system sends an email when the service
    finishes.
  • The future (probably) lies in WS-Notification, an
    emerging standard for notifying WS clients of
    progress and other state
  • Part of WS-RF (WS Resource Framework)
  • Potential problem with firewalls!
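
The ticket approach above can be sketched in a few lines. This is a minimal, in-process illustration only: a background thread stands in for the remote service, and the names submit, poll, wait_for and slow_square are hypothetical, not part of any Web Services toolkit.

```python
import threading
import time
import uuid

# Hypothetical in-process stand-in for a long-running Web Service.
_jobs = {}
_lock = threading.Lock()


def _run(ticket, work, args):
    """Execute the job and record its result against the ticket."""
    result = work(*args)
    with _lock:
        _jobs[ticket] = {"status": "done", "result": result}


def submit(work, *args):
    """Start the job in the background and return a ticket immediately."""
    ticket = uuid.uuid4().hex
    with _lock:
        _jobs[ticket] = {"status": "running", "result": None}
    threading.Thread(target=_run, args=(ticket, work, args), daemon=True).start()
    return ticket


def poll(ticket):
    """Return a copy of the job record for this ticket."""
    with _lock:
        return dict(_jobs[ticket])


def wait_for(ticket, interval=0.05, timeout=5.0):
    """Client side: call back with the ticket until the job has finished."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        record = poll(ticket)
        if record["status"] == "done":
            return record["result"]
        time.sleep(interval)
    raise TimeoutError(ticket)


def slow_square(x):
    time.sleep(0.2)  # stands in for a calculation that takes minutes
    return x * x
```

A client would call submit(slow_square, 7), keep the returned ticket, and later call poll or wait_for with it. Because the client always initiates the connection, this pattern avoids the firewall problem that server-initiated notifications can hit.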

5
Problem 3: My service uses large amounts of data
  • Web Services communicate via XML (SOAP) messages
  • Input data and parameters are turned into
    plain-text strings and included in the message
  • Arrays become very large in the XML message
  • e.g. an array of 3 integers in binary: 12 bytes
  • in XML the array is represented as a long string
    (each value wrapped in element tags: 42 bytes)
  • XML takes time and RAM to create and parse
  • It is sometimes suggested that XML documents
    should not exceed 4 MB for these reasons
  • Conclusion: We shouldn't put large datasets in
    the SOAP message.
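
The size comparison above can be reproduced with a short sketch. It assumes 32-bit integers and a four-letter element name ("item" is a hypothetical choice for illustration; the slide does not give the tag name), and it ignores the SOAP envelope, which would add further overhead.

```python
import struct

values = [1, 2, 3]

# Binary: three 32-bit integers, little-endian, no padding.
binary = struct.pack("<3i", *values)

# XML: each value becomes text wrapped in an element.
# "item" is a hypothetical tag name chosen for illustration.
xml = "".join(f"<item>{v}</item>" for v in values).encode("ascii")

print(len(binary))  # 12
print(len(xml))     # 42
```

The ratio gets worse for larger values, since each extra decimal digit costs a byte while the binary form stays at 4 bytes per integer.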

6
Large datasets in workflows
  • Furthermore, in a workflow environment, we want
    large datasets to travel in the most direct way
    possible

7
Large datasets: solutions
  • SOAP with attachments
  • (rather like attachments to emails)
  • Data don't bloat by being translated to XML
    (but do increase by 33% in translation to MIME
    attachment)
  • But data are still transported with the SOAP
    message
  • Often used for passing around image files
  • Pass pointers to datasets
  • e.g. GADS (data extraction service) prepares the
    extracted data, puts it on a separate HTTP server
    and then returns a URL to the data.
  • Have to run another server
  • Also, have to manage the cache of extracted data
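
The 33% growth quoted for attachments can be checked with a small sketch, assuming base64 is the transfer encoding behind that figure (it maps every 3 bytes of input to 4 ASCII characters). The payload here is placeholder data.

```python
import base64

payload = bytes(3000)  # 3000 bytes of placeholder binary data
encoded = base64.b64encode(payload)

# Relative growth of the encoded form over the raw bytes.
overhead = (len(encoded) - len(payload)) / len(payload)

print(len(encoded))       # 4000
print(f"{overhead:.0%}")  # 33%
```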

8
Large datasets: solutions (2)
  • Perhaps a better solution is to stream data
    directly between services
  • Similar to Unix filter commands
  • extract | process | render
  • Would not require data to be cached on, say, an
    HTTP server
  • Services could run concurrently: for example, the
    first chunk of data extracted could be processed
    while the second chunk is being extracted.
  • No standards-based solution to this AFAIK
  • ReSC have developed a data-streaming framework
    based on the Styx protocol for distributed
    systems
  • Allows workflow engine to pass around pointers to
    streams
  • Built into current release of Taverna workflow
    system
  • Very easy to wrap existing binary code (direct
    access to streams)
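
The filter-style pipeline described above can be illustrated with generators. This is an in-process sketch only, not the Styx framework and not a network protocol: the three functions are hypothetical stand-ins for the extract, process and render services, and the log shows that each chunk flows through the whole chain before the next one is extracted, with nothing cached in between.

```python
log = []  # records the order in which the stages handle each chunk


def extract(n_chunks):
    """Yield chunks one at a time (stand-in for a data extraction service)."""
    for i in range(n_chunks):
        log.append(f"extract {i}")
        yield [i, i + 1, i + 2]


def process(chunks):
    """Reduce each chunk as soon as it arrives."""
    for chunk in chunks:
        log.append(f"process {chunk[0]}")
        yield sum(chunk)


def render(values):
    """Turn each processed value into its final form."""
    for v in values:
        log.append(f"render {v}")
        yield f"value={v}"


output = list(render(process(extract(2))))
print(output)  # ['value=3', 'value=6']
print(log)
```

Generators interleave the stages on one thread rather than running them truly in parallel. Remote services streaming to each other could overlap their work in time, which is the benefit the slide describes.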

9
Data streaming between remote services
  • Have developed Styx Grid Services that can be
    composed in such a way that data streams directly
    between them
  • Can wrap SGSs in Web Services wrapper
  • Can send streams to multiple locations
  • Also gives a method for progress monitoring in a
    way that won't be defeated by firewalls
  • Health warning - work in progress!

This is the Triana workflow system. This framework
will work with any WS-based WF system but currently
integrates best with Taverna (still a work in progress).

10
Summary
  • Web Services provide a platform-neutral layer for
    accessing remote resources
  • But plain Web Services don't address many of
    the problems faced in the scientific world
  • WS-RF is probably the horse to back for a
    widely-accepted, standards-based method of
    addressing some of these issues
  • No actual, widely-available implementation yet
    (except WSRFLite)
  • Globus Toolkit 4 will be based on WS-RF (in alpha
    release)
  • But remember what happened to GT3!
  • ReSC has a useful solution (still immature) that
    solves many such issues
  • plans to build into CDAT and provide better
    tooling
  • Python scripting interface