Crash-Only Web Services: Failure Semantics in an SOA Environment


Slides: 36

Provided by: paulk77

Learn more at: https://www.oasis-open.org

Transcript and Presenter's Notes

Title: Crash-Only Web Services: Failure Semantics in an SOA Environment

1
Crash-Only Web Services: Failure Semantics in an
SOA Environment
www.oasis-open.org
Chris Hobbs and Abbie Barbir
Presented by Paul Knight
Nortel
OASIS Symposium 2007, San Diego
2
The crash-only model

Software design approach
Easier to restart quickly in a known state than
to clean up and rebuild to recover from an error

George Candea and Armando Fox are key proponents
of crash-only software
3
Two themes of this talk

Discuss issues of the behaviors of individual and
composed services and their part in Web Services
Service Level Agreements (WSLA)
Based on the behaviors of the individual services
Need a taxonomy or ontology of service behaviors
Need an approach to calculating behaviors of
composed services
The crash-only model of operation as a simple
failure behavior for a Web Service
Failure is one of many identified behaviors

4
Background: Orchestration as a New Programming
Paradigm

SOA promotes the concept of combining services
through orchestration - invoking services in a
defined sequence to implement a business process
Orchestration compounds the difficulties of
testing and managing the quality of the deployed
services
Testing composite services in SOA environment is
a discipline which is still at an early stage of
study
Describing and usefully modeling the individual
and combined behaviors - needed to offer Service
Level Agreements (SLA) - is at an even earlier
stage
We hope to stimulate additional research on these
topics

5
Testing Composed Services

It's fairly straightforward to test the operation
of a device or system if we control all the
parts.
When we start offering orchestrated services as a
product, the services we are using may be outside
our control.
For example, consider well-known components:
Google mapping service
Amazon S3 storage service
Mobile operator's location service

6
Testing Composed Services (2)

With orchestrated services, there is never a
complete box we can test
With orchestration as the new programming
paradigm, testing becomes a much bigger problem
Failures of orchestrated services are often
Heisenbugs - impervious to conventional
debugging, generally non-reproducible
Offering a WSLA based on testing alone, without
reliable knowledge of component service
behaviors, may be risky

7
Web Services SLA (WSLA)
[Diagram labels: Client, Network, Packets, Message flows, Web Service, WSLA, Service Provider Z, Provider X / Service X, Provider Y / Service Y]

Concerned with behaviors of the message flows and
services spanning the end-to-end business
transaction
Clients can develop testing strategies that
stress the service to ensure that the service
provider has met the contracted WSLA commitment
Composed services make offering a WSLA more risky

8
How can WSLAs be derived from behaviors of
component services?

Need to develop a model of the behavioral
attributes of the individual component Web
Services which contribute to the overall behavior
of an orchestrated or composed Web Service.
Need to model the combination of individual
service behavioral models

9
Web Services behaviors

Behaviors may be described and quantified for
each Web Service
May be combined by a calculus of behaviors when
multiple services are composed
Behavior parameters may become a part of the
service description, perhaps in WSDL.

Availability and Reliability
Performance
Management
Failure
Security
Privacy, confidentiality and integrity
Scalability
Execution
Internationalization
Synchronization
Etc.

10
Web Services behaviors (2)

To develop a Service Level Agreement (SLA) for a
composed service (Z), we need to have relevant
behavior descriptions for the individual services
(X and Y)
We also need a deep understanding of how to
combine the descriptions of X and Y to calculate
results for Z

[Diagram: composed service Z built from component services X and Y]
11
Web Services behaviors (3)

For each behavior, the challenges include the
following:
1. How may service X's and service Y's behavior
be characterized?
2. How may those characterizations be formalized
and advertised by X and Y?
3. How may Z incorporate X's and Y's
characterizations and then advertise the result?
Z itself might become a component of an even
larger service and therefore needs to advertise
its own characteristics. It also needs this
characterization to offer an SLA to consumers.

12
Web Services behaviors (4)

Each behavior may have its own ontology,
measures, and calculus of combining those
measures when services are composed.

[Diagram: services X, Y and Z, with the labels Local Ontology (three times), Abstracted Ontology (twice), Z Specific Ontology, and a "?"]
Need this analysis for each behavior of services
X, Y and Z
13
Web Services behaviors (5)

Ten behavior examples
Availability and Reliability
Performance
Management
Failure (Crash-only is one mode)
Security
Privacy, confidentiality and integrity
Scalability
Execution
Internationalization
Synchronization
Let's focus on a few of these behaviors

Source: "Advertising Service Properties",
unpublished paper by C. Hobbs, J. Bell, P. Sanchez
14
Availability and Reliability

Availability is the percentage of client
requests to which the server responds within the
time it advertised.
Reliability is the percentage of such server
responses which return the correct answer.
In some applications availability is more
important than reliability
Many protocols used within the Internet, for
example, are self-correcting and an occasional
wrong answer is unimportant. The failure to give
any answer, however, can cause a major network
upheaval.
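
The two definitions above can be made concrete with a small calculation. This is only an illustrative sketch: the `Request` record, its field names, and the sample figures are assumptions made for the example, not part of the presentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """One observed client request (hypothetical record layout)."""
    response_ms: Optional[float]  # None means the server never answered
    correct: bool = False         # only meaningful when a response arrived


def _on_time(log: List[Request], advertised_ms: float) -> List[Request]:
    """Requests that were answered within the advertised time."""
    return [r for r in log
            if r.response_ms is not None and r.response_ms <= advertised_ms]


def availability(log: List[Request], advertised_ms: float) -> float:
    """Percentage of requests answered within the advertised time."""
    if not log:
        return 0.0
    return 100.0 * len(_on_time(log, advertised_ms)) / len(log)


def reliability(log: List[Request], advertised_ms: float) -> float:
    """Percentage of on-time responses that carried the correct answer."""
    answered = _on_time(log, advertised_ms)
    if not answered:
        return 0.0
    return 100.0 * sum(r.correct for r in answered) / len(answered)


# Invented sample: five requests against an advertised 200 ms response time.
log = [Request(120, True), Request(80, True), Request(90, False),
       Request(None), Request(500, True)]
avail = availability(log, advertised_ms=200)  # 3 of 5 requests were on time
relia = reliability(log, advertised_ms=200)   # 2 of those 3 were correct
```

Note that the late but correct response (500 ms) counts against availability and is not counted at all for reliability, which follows from the wording "such server responses" in the definition.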

15
Availability and Reliability (2)

In other applications reliability is more
important than availability
If the service which calculates a person's annual
tax return occasionally does not respond, it's not
a major problem - the user can try again
If that service does respond but with the wrong
answer which is submitted to the tax authorities,
then it could be disastrous

16
Availability and Reliability (3)

Services are built with either availability or
reliability in mind, with clients accepting that
no service can ever be 100% available or 100%
reliable.
In combining services X and Y into a composite
service Z, it is necessary to combine the
underlying availability and reliability models
and predict Z's model.
To do so without manual intervention, X's and Y's
models must be exposed.

17
Availability and Reliability (4)

Availability and reliability models are often
expressed as Markov Models or Petri Nets, which
are easy to combine in a hierarchical way.
Major issues:
Agreeing upon the semantics of the states in the
Markov model or places in the Petri nets
Finding a way for X and Y to publish the models
in a standard form.
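
As a hedged illustration of the simplest possible combination, the sketch below models each service as a two-state (up/down) Markov chain and assumes that Z needs both X and Y and that their failures are independent. Both assumptions are ours, and the rates are invented; real published models would be richer, and the independence assumption breaks down when X and Y share a network.

```python
def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Long-run fraction of time a two-state (up/down) Markov model is up.

    failure_rate: transitions per hour from up to down (lambda)
    repair_rate:  transitions per hour from down to up (mu)
    """
    return repair_rate / (failure_rate + repair_rate)


def series_availability(*component_availabilities: float) -> float:
    """Availability of a composite that needs every component to be up.

    Assumes the component failures are statistically independent.
    """
    result = 1.0
    for a in component_availabilities:
        result *= a
    return result


# Invented figures: X fails once per 1000 h and takes 1 h to repair;
# Y fails once per 500 h and takes 2 h to repair.
a_x = steady_state_availability(failure_rate=1 / 1000, repair_rate=1 / 1)
a_y = steady_state_availability(failure_rate=1 / 500, repair_rate=1 / 2)
a_z = series_availability(a_x, a_y)  # lower than either component alone
```

Even in this toy case, Z can only compute `a_z` if X and Y expose their rates with agreed meanings for "up" and "down", which is the semantic-agreement issue listed above.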

18
Availability and Reliability (5)

Currently, apart from raw percentage figures,
there is no method for describing these models
Percentage time when the server is unavailable?
Percentage of requests to which it does not
reply?
Different clients may experience these
differently
A server which is unavailable from 00:00 to 04:00
every day can be 100% available to a client that
only tries to access it in the afternoons.
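
The gap between a time-based figure and what a client actually experiences can be shown with a few lines of arithmetic. This sketch works at whole-hour granularity, and the two client access patterns are invented for illustration.

```python
def time_based_availability(down_hours):
    """Percentage of the 24-hour day during which the server is up."""
    return 100.0 * (24 - len(down_hours)) / 24


def client_perceived_availability(down_hours, request_hours):
    """Percentage of one client's requests that land in an 'up' hour."""
    answered = [h for h in request_hours if h not in down_hours]
    return 100.0 * len(answered) / len(request_hours)


DOWN = {0, 1, 2, 3}                # unavailable from 00:00 to 04:00
afternoon_client = [13, 14, 15, 16]
night_batch_client = [1, 2, 3, 4]  # hypothetical overnight job

server_view = time_based_availability(DOWN)
afternoon_view = client_perceived_availability(DOWN, afternoon_client)
night_view = client_perceived_availability(DOWN, night_batch_client)
```

The same server yields three different numbers: roughly 83% by the clock, 100% for the afternoon client, and 25% for the overnight client, so a single raw percentage says little without stating how it was measured.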

19
Availability and Reliability (6)

If X and Y are distributed, then it is possible,
following network failures, that for some
customers, Z can access X but not Y and for
others Y but not X.
The assessment of Z's availability may be hard to
quantify, so it may be difficult for Z to offer a
meaningful WSLA.

20
Failure

The failure models of X and Y may be very
different
X fails cleanly and may, because of its
idempotency, immediately be called again
Y has more complex failure modes
Z will add its own failure modes to those of X
and Y
Predicting the outcome could be very difficult
The complexity is increased because many
developers do not understand failure modeling
and, even were models to be published, their
combination would be difficult due to their
stochastic nature.

21
Failure (2)

One approach to describing a service's failure
model:
Service publishes the exceptions that it can
raise and associates the required consumer
behavior with each
Exception D may be thrown when the database is
locked by another process. Required action is to
try again after a random backoff period of not
less than 34ms.
Crash-only failure model is a simple starting
point for building a taxonomy of failure
behavior. This work is just beginning.
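
A published exception table of this kind might be consumed roughly as follows. This is a sketch under stated assumptions: the `FAILURE_MODEL` layout, the `DatabaseLocked` class (standing in for "Exception D"), the upper backoff bound, and `call_with_published_policy` are all hypothetical; only the 34 ms minimum random backoff comes from the slide.

```python
import random
import time


class DatabaseLocked(Exception):
    """Stands in for 'Exception D': the database is locked by another process."""


# Hypothetical published failure model: exception -> required consumer action.
# The slide gives only the lower bound; the upper bound is an assumption.
FAILURE_MODEL = {
    DatabaseLocked: {"action": "retry", "min_backoff_ms": 34, "max_backoff_ms": 100},
}


def call_with_published_policy(operation, max_attempts=5, sleep=time.sleep):
    """Invoke operation, reacting to failures as the published model requires."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            policy = FAILURE_MODEL.get(type(exc))
            if policy is None or policy["action"] != "retry" or attempt == max_attempts:
                raise  # no published guidance, or retries exhausted
            # Random backoff of not less than the published minimum.
            delay_ms = random.uniform(policy["min_backoff_ms"], policy["max_backoff_ms"])
            sleep(delay_ms / 1000.0)
```

The `sleep` parameter is injectable so that the backoff can be observed in tests without waiting. Exceptions the provider has not published are re-raised unchanged, since the consumer has no stated behavior to follow for them.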

22
Scalability

A behavioral description and WSLA for the
composite service Z must include its scalability
How many simultaneous service instances can it
support?
What service request rate does it handle? etc.
These parameters will almost certainly differ
between the component services X and Y, and will
need to be published by those services.
X and Y are presumably not dedicated solely to Z,
so the actual load being applied to X and Y at
any given time is unknown to the provider of Z,
making the scalability of Z even harder to
determine.

23
Web Services behaviors (again)

Ten behavior examples
Availability and Reliability
Performance
Management
Failure (Crash-only is one mode)
Security
Privacy, confidentiality and integrity
Scalability
Execution
Internationalization
Synchronization
We described a few of these behaviors
Can we use them to build WSLAs?

24
Web Service Level Agreement (WSLA)

Based on behaviors and descriptors for these
behaviors.
Example: Failure model
Is transaction half-performed?
Is it re-wound?
These behaviors and descriptors are not available
in the WS description, in WSDL
No performance info
Not even price!

25
Web Service Level Agreements (2)

Business acceptance of composed services for
business-critical operations depends on a service
provider's ability to offer a WSLA
Uptime, response time, etc.
Offering a WSLA depends on ability to compose the
WSLA-related behaviors of the individual services
This information needs to be available via WSDL
or similar source
Should include test vectors to test the SLA
claims
The ability to determine and offer a WSLA
commitment is a limiting factor for widespread
acceptance of services based on orchestration

26
Web Service Level Agreements: conclusions

Need a more precise way to express the parameters
of behaviors
Availability: What is 99.97% uptime?
Several milliseconds outage each minute?
Several minutes planned downtime each month?
Failure model: Crash-only as the simplest,
lowest layer or level of failure in a future full
failure model.
The eight other SLA-related behaviors listed here
each have a complex semantic for description and
composition
More questions than answers now - many PhDs still
to be earned in this area!
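
The ambiguity in a 99.97% uptime figure can be worked through numerically. The sketch below simply converts the percentage into a downtime budget over two different periods; the 30-day month is an assumption.

```python
def downtime_budget_seconds(uptime_percent: float, period_seconds: float) -> float:
    """Downtime allowed within a period for a given uptime percentage."""
    return period_seconds * (100.0 - uptime_percent) / 100.0


# The same 99.97% figure read over two different periods.
per_minute_ms = downtime_budget_seconds(99.97, 60) * 1000            # about 18 ms
per_month_min = downtime_budget_seconds(99.97, 30 * 24 * 3600) / 60  # about 13 min
```

Both readings satisfy "99.97% uptime": about 18 milliseconds of outage in every minute, or about 13 minutes of downtime in one block each month. A consumer with a 5-second timeout would never notice the first and would certainly notice the second.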

27
Back to the crash-only software model

Can it simplify service composition, testing,
development of WSLA, and end world hunger?

28
Crash-only software (1)

Historically, developers have spent a lot of
effort making software resilient
Put borders around it so it will not affect other
things if it fails
Try to close it down cleanly
Save state
Reload the software component
Restart and replay
Trying to keep the client from becoming aware
that a failure occurred

29
Crash-only software (2)

Much work over the last ten years on resilient
software - which stays up all the time, and
recovers from problems
For example, tutorials by Bev Littlewood
Crash-only software is the exact opposite
Client accepts that the server may crash
Power failure, network down, hardware, etc.
Client must be able to recover or restart the
process by itself

30
Crash-only software (3)

Crash-only principles
Forget recovery - more trouble than it's worth
When the server senses a problem, it will crash
as cleanly as possible and may perform a
micro-reboot to return to original state
Sometimes recover to a well-defined checkpoint
Client may initiate the crash
The server is back working sooner than if it
tried to recover via logs and journals, etc.
Principles fit the Web Services paradigm nicely!
Loose coupling of services
Little state shared among services
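
The principles above can be sketched in miniature. This toy example models a micro-reboot as throwing away an in-process object and rebuilding it from a factory; a real crash-only system would restart a process or component behind an externally enforced boundary. `CounterService` and `CrashOnlyHost` are hypothetical names invented for the illustration.

```python
class CounterService:
    """Toy component whose only volatile state is a running total."""

    def __init__(self):
        self.total = 0  # the known initial state

    def add(self, amount):
        if not isinstance(amount, int):
            raise ValueError("corrupt request")  # the component senses a problem
        self.total += amount
        return self.total


class CrashOnlyHost:
    """Runs a component and never repairs it: any fault means discard and rebuild."""

    def __init__(self, factory):
        self._factory = factory
        self._component = factory()
        self.reboots = 0

    def crash(self):
        """Micro-reboot to the initial state; a client may also call this."""
        self._component = self._factory()
        self.reboots += 1

    def call(self, method, *args):
        try:
            return getattr(self._component, method)(*args)
        except Exception:
            self.crash()  # no cleanup, no log replay
            raise         # the client sees the failure and decides what to do
```

There is no recovery path to test here: the only states are "initial" and "running", and every fault leads back to "initial". The cost is that volatile state (the running total) is lost, which is why important state belongs in a dedicated store.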

31
Crash-Only Software (4)

Crash-only semantic has several advantages
Simpler macroscopic behavior with fewer
externally visible states
Reduces outage time by removing all shutting-down
time
Simplifies failure model by reducing recovery
state table size
Crashing can be invoked from outside the software
of the provider
Recovery from a failed state is notoriously
difficult and the crash-only paradigm coerces the
system into a known state without attempting
recovery
Reduce the complexity of the provider code
Simplifies testing by reducing the failure
combinations that have to be verified. Consumer
is assumed to be able to initiate the crash.

32
Crash-Only Web Services

Candea's list of properties required for a
crash-only system can be abstracted to match
properties of Web Services
Components have externally enforced boundaries.
This is supported by the virtual machine concept
used on many Web Service systems
All interactions between components have a
timeout. This is implicit in any loosely-coupled
Web Services interaction.
All resources are leased to the service rather
than being permanently allocated. This is
particularly useful in Web Services.
Requests are entirely self-describing. For
crash-only services this requires that the
request carries information about time-to-live
and idempotency (will it return the same result
if invoked again?).
All important non-volatile state is managed by
dedicated state stores.
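
The self-describing request property might look like this in practice. The sketch is an assumption-laden illustration: the `SelfDescribingRequest` fields and the `may_resend_after_crash` rule are invented here, and a real service would carry this information in message headers rather than a Python object.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SelfDescribingRequest:
    """A request that carries its own time-to-live and idempotency flag."""
    operation: str
    payload: dict
    idempotent: bool    # will it return the same result if invoked again?
    ttl_seconds: float  # how long the request remains worth answering
    created_at: float = field(default_factory=time.monotonic)

    def expired(self, now=None):
        now = time.monotonic() if now is None else now
        return now - self.created_at > self.ttl_seconds


def may_resend_after_crash(request, now=None):
    """A consumer may blindly re-invoke only live, idempotent requests."""
    return request.idempotent and not request.expired(now)
```

With this information in the request itself, a consumer (or an intermediary holding the request while the provider reboots) can decide whether to resend without knowing anything about the provider's internal state.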

33
Crash-Only Reliable Web Service
[Diagram labels: Web Service Consumer, Internet, Reliable SOAP Protocol, WS-ReliableMessaging, Web Services Endpoint, Crash-Only Application Server, Crash-only WSM, Stall Proxy, Recovery Agent, Crash-Only Backend (twice)]

For systems with hardware redundancy, by using
crash-only techniques, SOAP WS-RM can be
extended in order to produce an always available
Web Service from the provider's and consumer's
point of view
WSLA response time may be at risk if a service is
forced to crash

34
Conclusions

Testing Web Services in an SOA environment is a
discipline that is still in its infancy
There are no standard models to describe or
combine Web Services behavior information across
various services and providers
Web Services SLAs (WSLAs) for composed services
are problematic
Testing is only a partial solution
Behavioral composition needs work, but is
promising
Crash-only Web Services can address some of these
difficulties
There are many related areas for further work

35
Q & A
