Title: Making Services Fault Tolerant
- Pat Chan
- Department of Computer Science and Engineering
- The Chinese University of Hong Kong
- 27 June 2006
Outline
- Introduction
- Problem Statement
- Methodologies for Web Service Reliability
- New Reliable Web Service Paradigm
- Road Map for Experiment
- Experimental Results and Discussion
- Conclusion
- Future Directions
Introduction
- Service-oriented computing is becoming a reality.
- Service-oriented Architectures (SOA) are based on a simple model of roles: service provider, service requester, and service registry.
- The problems of service dependability, security, and timeliness are becoming critical.
- We propose experimental settings and offer a roadmap to dependable Web services.
Problem Statement
- Two fault-tolerant techniques are considered: replication and diversity.
- Replication is an efficient way of providing reliable systems through time or space redundancy.
- It increases the availability of distributed systems.
- Key components are re-executed (time redundancy) or replicated (space redundancy).
- It protects against hardware malfunctions and transient system faults.
- Another efficient technique is design diversity.
- Software systems or services are designed independently by different programming teams.
- Design diversity is used to defend against permanent software design faults.
- We focus on analyzing replication techniques applied to Web services.
- A generic Web service system with both spatial and temporal replication is proposed and investigated.
Methodologies for reliable Web services -- Redundancy
- Spatial redundancy
- Static redundancy: all replicas are active at the same time, and voting takes place to obtain a correct result.
- Dynamic redundancy: only one replica is active at a time, while the others are kept in an active or standby state.
- Temporal redundancy
- Redundancy in time: a failed operation is re-executed (for example, retried) rather than replicated.
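Static redundancy can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each replica is a plain callable returning a comparable result and that a replica which raises an exception simply casts no vote; `static_redundancy` and `NoMajorityError` are illustrative names, not part of the original system.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical replica interface: a callable that maps a request to a result.
Replica = Callable[[str], Hashable]


class NoMajorityError(Exception):
    """Raised when no single result is returned by a strict majority of replicas."""


def static_redundancy(replicas: Sequence[Replica], request: str) -> Hashable:
    """Invoke all replicas and return the result that a strict majority agrees on."""
    votes = []
    for replica in replicas:
        try:
            votes.append(replica(request))
        except Exception:
            pass  # a crashed replica contributes no vote
    if votes:
        result, count = Counter(votes).most_common(1)[0]
        if count > len(replicas) // 2:  # majority of all replicas, not just responders
            return result
    raise NoMajorityError(f"no majority among {len(votes)} vote(s)")


# Example replicas (hypothetical): correct, wrong-answer, and crashed.
def correct(request: str) -> str:
    return request.upper()


def wrong(request: str) -> str:
    return "garbage"


def crashed(request: str) -> str:
    raise RuntimeError("replica down")


voted = static_redundancy([correct, correct, wrong], "ping")  # the wrong answer is outvoted

try:
    static_redundancy([correct, wrong, crashed], "ping")
    outcome = "majority"
except NoMajorityError:
    outcome = "no majority"  # no result is shared by 2 of the 3 replicas
```

Counting the majority against all N replicas, rather than only those that answered, is a conservative choice: a voter that counted only responders would accept a single surviving answer.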
Methodologies for reliable Web services -- Diversity
- Protects redundant systems against common-mode failures.
- With different designs and implementations, common failure modes will probably cause different error effects in each version.
- Examples: N-version programming and recovery blocks.
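The two diversity schemes differ in how versions are combined: N-version programming runs all versions and votes on their results, whereas a recovery block tries a primary version first and falls back to an alternate only when an acceptance test rejects the result. The Python sketch below shows a recovery block for a hypothetical sorting service; both versions and the acceptance test are illustrative assumptions.

```python
from collections import Counter
from typing import Callable, List, Sequence

Version = Callable[[List[int]], List[int]]
AcceptanceTest = Callable[[List[int], List[int]], bool]


def recovery_block(versions: Sequence[Version], accept: AcceptanceTest, data: List[int]) -> List[int]:
    """Try each version in order; return the first result that passes the acceptance test."""
    for version in versions:
        checkpoint = list(data)  # each version starts from the same saved input state
        try:
            result = version(checkpoint)
        except Exception:
            continue  # a crash is treated like a rejected result
        if accept(data, result):
            return result
    raise RuntimeError("all versions failed the acceptance test")


def primary_sort(xs: List[int]) -> List[int]:
    """Hypothetical faulty primary version: it drops the largest element."""
    return sorted(xs)[:-1]


def alternate_sort(xs: List[int]) -> List[int]:
    """Independently written alternate version: a simple insertion sort."""
    out: List[int] = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out


def is_sorted_permutation(original: List[int], result: List[int]) -> bool:
    """Acceptance test: same elements as the input, in non-decreasing order."""
    same_elements = Counter(original) == Counter(result)
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return same_elements and ordered


sorted_output = recovery_block([primary_sort, alternate_sort], is_sorted_permutation, [3, 1, 2])
```

Here the primary's output fails the acceptance test because an element is missing, so the alternate version's result is returned.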
Failure Response Stages of Web Services
- Fault confinement
- Fault detection
- Diagnosis
- Fail-over
- Reconfiguration
- Recovery
- Restart
- Repair
- Reintegration
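Fault detection and fail-over, two of the stages above, can be sketched with a polling monitor. This is a minimal Python illustration under stated assumptions: each replica exposes a hypothetical health check that returns True when it answers a poll, and a replica is declared failed after `threshold` consecutive missed polls (an illustrative parameter, not one from the slides). Diagnosis, reconfiguration, and the recovery stages are not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReplicaMonitor:
    """Hypothetical monitor covering the fault-detection and fail-over stages."""

    health_checks: List[Callable[[], bool]]  # health_checks[i]() is True if replica i answers
    threshold: int = 3                       # consecutive misses before declaring failure
    active: int = 0                          # index of the replica currently serving requests
    misses: int = 0
    log: List[str] = field(default_factory=list)

    def poll(self) -> int:
        """Poll the active replica once and return the index of the active replica afterwards."""
        if self.health_checks[self.active]():
            self.misses = 0
            return self.active
        # Fault detection: count consecutive missed polls.
        self.misses += 1
        self.log.append(f"detection: replica {self.active} missed poll {self.misses}")
        if self.misses >= self.threshold:
            # Fail-over: switch to the next replica in round-robin order.
            failed = self.active
            self.active = (self.active + 1) % len(self.health_checks)
            self.misses = 0
            self.log.append(f"fail-over: replica {failed} -> replica {self.active}")
        return self.active


# Replica 0 never answers; replica 1 is healthy.
monitor = ReplicaMonitor([lambda: False, lambda: True], threshold=3)
actives = [monitor.poll() for _ in range(4)]
```

Requiring several consecutive misses trades detection latency for fewer false alarms; the polling frequency and threshold together determine how quickly a failed replica is abandoned.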
Proposed Paradigm
Workflow of the Replication Manager
Road Map for Experimental Research
- Redundancy in time
- Redundancy in space, invoking replicas sequentially
- Redundancy in space, invoking replicas in parallel
- Majority voting using N-modular redundancy
- Diversified versions of different services
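The sequential and parallel modes of spatial redundancy differ in when backup replicas are called. The Python sketch below contrasts them, assuming hypothetical replicas that are plain callables: the sequential variant calls a backup only after the previous replica fails, saving load at the cost of latency, while the parallel variant calls all replicas at once and keeps the first successful answer (majority voting would instead wait for enough answers to form a majority). Function names are illustrative; `cancel_futures` requires Python 3.9 or later.

```python
import concurrent.futures as cf
import time
from typing import Callable, List, Sequence

Replica = Callable[[str], str]


def invoke_sequentially(replicas: Sequence[Replica], request: str) -> str:
    """Call replicas one after another; return the first successful answer."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:
            errors.append(exc)
    raise RuntimeError(f"all replicas failed: {errors}")


def invoke_in_parallel(replicas: Sequence[Replica], request: str) -> str:
    """Call all replicas at once; return the first answer that completes successfully."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(replica, request) for replica in replicas]
        errors: List[BaseException] = []
        for future in cf.as_completed(futures):
            exc = future.exception()
            if exc is None:
                return future.result()
            errors.append(exc)
        raise RuntimeError(f"all replicas failed: {errors}")
    finally:
        # Do not wait for slower replicas; their late answers are discarded.
        pool.shutdown(wait=False, cancel_futures=True)


# Hypothetical replicas with different behaviours.
def crashed(request: str) -> str:
    raise ConnectionError("replica down")


def slow(request: str) -> str:
    time.sleep(0.3)
    return "slow replica"


def fast(request: str) -> str:
    return "fast replica"


replicas = [crashed, slow, fast]
first_in_order = invoke_sequentially(replicas, "req")  # crashed fails, so slow answers
first_to_finish = invoke_in_parallel(replicas, "req")  # fast finishes well before slow
```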
Experiments
- A series of experiments was designed and performed to evaluate the reliability of the Web service in three configurations:
- single service without replication,
- single service with retry or reboot, and
- service with spatial replication.
- We also perform retry or failover when the Web service is down.
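Retry and failover can be combined in one client-side routine. The following Python sketch is an assumption about how such a policy might look, not the code used in the experiments: each replica is retried up to `retries` extra times (temporal redundancy) before the client fails over to the next replica (spatial redundancy). `FlakyReplica`, `retries`, and `delay` are hypothetical.

```python
import time
from typing import Callable, List, Sequence

Replica = Callable[[str], str]


def call_with_retry_and_failover(
    replicas: Sequence[Replica], request: str, retries: int = 2, delay: float = 0.0
) -> str:
    """Retry each replica up to `retries` extra times, then fail over to the next one."""
    failures: List[str] = []
    for index, replica in enumerate(replicas):
        for attempt in range(1 + retries):
            try:
                return replica(request)
            except Exception as exc:
                failures.append(f"replica {index}, attempt {attempt + 1}: {exc}")
                time.sleep(delay)  # back off before the next attempt
    raise RuntimeError("; ".join(failures))


class FlakyReplica:
    """Hypothetical replica that fails its first `failures` calls, then succeeds."""

    def __init__(self, name: str, failures: int) -> None:
        self.name = name
        self.remaining = failures
        self.calls = 0

    def __call__(self, request: str) -> str:
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            raise ConnectionError(f"{self.name} unavailable")
        return f"{self.name} handled {request}"


# A transient fault is masked by a retry on the same replica.
transient = FlakyReplica("primary", failures=1)
retried = call_with_retry_and_failover([transient, FlakyReplica("backup", 0)], "req-1")

# A replica that stays down is abandoned after 1 + retries attempts.
down = FlakyReplica("primary", failures=10)
backup = FlakyReplica("backup", failures=0)
failed_over = call_with_retry_and_failover([down, backup], "req-2", retries=2)
```

Retrying first keeps clients on the primary through short outages; failing over only afterwards avoids switching replicas on every transient fault.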
Summary of the Experiments
Parameters of the Experiments
Experimental Results
- Retry: 11.97 to 4.93
- Reboot: 11.97 to 6.44
- Failover: 11.97 to 3.56
- Retry and Failover: 11.97 to 2.59
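The relative improvement of each technique can be computed directly from these figures. The Python snippet below assumes that 11.97 is the measurement for the single service without any fault-tolerance technique and that all figures share one unit, which the slide does not state.

```python
BASELINE = 11.97  # assumed: single service without retry, reboot, or failover

after = {
    "Retry": 4.93,
    "Reboot": 6.44,
    "Failover": 3.56,
    "Retry and Failover": 2.59,
}

# Relative reduction in percent: 100 * (baseline - value) / baseline
reduction = {name: round(100 * (BASELINE - value) / BASELINE, 1) for name, value in after.items()}

for name, pct in reduction.items():
    print(f"{name}: {pct}% lower than the baseline")
```

On these numbers, retry lowers the figure by about 58.8%, reboot by 46.2%, failover by 70.3%, and retry combined with failover by 78.4%.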
Number of failures when the server is under normal conditions
Number of failures when the server is busy
Number of failures when the server reboots periodically
Reliability of the system over time
Reliability Model
Reliability Model Parameters
Outcome (SHARPE)
Reliability of the proposed system
Failure rates: 0.228, 0.114, 0.057
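To show how a failure rate translates into reliability, here is a Python sketch under simplifying assumptions that are not taken from the SHARPE model on these slides: each replica has a constant failure rate λ, so R(t) = exp(-λt); replicas fail independently; and the replicated system survives while at least one replica survives, giving 1 - (1 - R(t))^N. The mission time `T` and the replica count are hypothetical, and the time unit is whatever unit the failure rates are expressed in.

```python
import math

RATES = (0.228, 0.114, 0.057)  # failure rates listed on the slide
T = 10.0                       # hypothetical mission time, in the rates' time unit
REPLICAS = 3                   # hypothetical number of independent replicas


def reliability(rate: float, t: float) -> float:
    """Single component with constant failure rate: R(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


def parallel_reliability(rate: float, t: float, n: int) -> float:
    """n independent replicas; the system fails only if all n fail: 1 - (1 - R)^n."""
    return 1.0 - (1.0 - reliability(rate, t)) ** n


table = {rate: (reliability(rate, T), parallel_reliability(rate, T, REPLICAS)) for rate in RATES}

for rate, (single, replicated) in table.items():
    print(f"failure rate {rate}: single = {single:.3f}, {REPLICAS} replicas = {replicated:.3f}")
```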
Conclusion
- Surveyed replication and design diversity techniques for reliable services.
- Proposed a hybrid approach to improving the availability of Web services.
- Carried out a series of experiments to evaluate the availability and reliability of the proposed Web service system.
- N-version programming may finally become commercially viable in service environments.
Future Directions
- Extend the experiments to a comprehensive range of service arrival rates or distributions, failure rates, polling frequencies, etc.
- Improve the current fault-tolerant techniques.
- The current approach can deal with hardware and software failures.
- What about software fault detectors?
- N-version programming
- Different providers offer different solutions.
- Failing over, or switching, between Web services remains a problem.
- Modeling Web service behavior
- The behavior of Web services can be studied.
- Modeling failures
- Study the failure model of Web services.