Title: Fault Tolerance for Compound Web Services
1Fault Tolerance for Compound Web Services
- Geoffrey Gamble
- CSE 294
- 2/15/08
2References
- Thema: Byzantine-fault-tolerant middleware for Web-service applications. Merideth, M.G., Iyengar, A., Mikalsen, T., Tai, S., Rouvellou, I., Narasimhan, P. 24th IEEE Symposium on Reliable Distributed Systems (SRDS 2005), pp. 131-140, 26-28 Oct. 2005.
- BASE: Using abstraction to improve fault tolerance. Castro, M., Rodrigues, R., and Liskov, B. ACM Trans. Comput. Syst. 21, 3 (Aug. 2003), 236-269. DOI http://doi.acm.org/10.1145/859716.859718
- Practical Byzantine fault tolerance. Castro, M., Liskov, B. Proceedings of the Third Symposium on Operating Systems Design and Implementation, pp. 173-186, February 1999, New Orleans, Louisiana, United States.
3Questions to be Addressed
- What is fault tolerance?
- What kind of faults are we concerned about?
- Why is fault tolerance important in distributed systems?
- How do we increase the fault tolerance of a distributed system?
- Can we adapt traditional fault tolerance solutions to web services based systems?
4What Is Fault Tolerance?
- Fault Tolerance
- Property of a system which allows it to continue
operating properly when some of its components
fail
5Properties of a Fault Tolerant System
- No Single Point of Failure
- Uninterrupted Repair
- Fault Containment
6What Kinds of Faults are We Talking About?
- Crash
- - Server's O.S. crashes
- - Server's hardware fails
- - Program halts due to unexpected conditions
- Incorrect Execution
- - Program continues to execute but the results are undesirable
- - Malicious agent has hijacked a node
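The difference between the two fault classes can be sketched in a few lines. A crash shows up as a missing reply, while incorrect execution looks like a normal reply and only stands out when compared against other replicas. This is a minimal illustration, not taken from the slides: the function name `classify_replies`, the replica names, and the convention that `None` means "no answer" are all assumptions.

```python
from collections import Counter

def classify_replies(replies):
    """Split replica replies into crashed, suspect, and agreed groups.

    `replies` maps a replica name to its result, or to None if the replica
    did not answer (assumed convention for a crash or timeout).
    """
    crashed = sorted(name for name, value in replies.items() if value is None)
    answered = {name: value for name, value in replies.items() if value is not None}
    if not answered:
        return {"crashed": crashed, "suspect": [], "agreed": None}
    # The most common answer is used as the reference value.
    agreed, _count = Counter(answered.values()).most_common(1)[0]
    # A replica that answered but disagrees is a candidate for incorrect execution.
    suspect = sorted(name for name, value in answered.items() if value != agreed)
    return {"crashed": crashed, "suspect": suspect, "agreed": agreed}

# Hypothetical replies: one replica is silent, one returns a wrong value.
replies = {"replica_a": 42, "replica_b": 42, "replica_c": None, "replica_d": 41}
report = classify_replies(replies)
print(report)
```

Note that the silent replica is easy to spot on its own, whereas the wrong value from `replica_d` is only detectable because other replicas were asked the same question.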
7Importance of Fault Tolerance
THE MONOLITHIC APPLICATION
8Importance of Fault Tolerance
9Importance of Fault Tolerance
10Importance of Fault Tolerance
11Increasing the Fault Tolerance of a Distributed
System
- Byzantine Fault Tolerance (BFT)
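As a rough sketch of the arithmetic behind BFT, the Castro and Liskov paper listed in the references shows that tolerating f Byzantine replicas requires at least 3f + 1 replicas, and that a client can trust a result once f + 1 replicas report the same value. The helper names `min_replicas` and `accept_reply` below are illustrative assumptions, not part of any BFT library, and the sketch ignores message authentication and ordering.

```python
from collections import Counter

def min_replicas(f):
    """Minimum number of replicas needed to tolerate f Byzantine faults."""
    return 3 * f + 1

def accept_reply(replies, f):
    """Return the value reported by at least f + 1 replicas, else None.

    `replies` holds one value per distinct replica (assumed already
    authenticated). With at most f faulty replicas, f + 1 matching
    replies guarantee that at least one came from a correct replica.
    """
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None

f = 1
print(min_replicas(f))                        # replicas needed for f = 1
print(accept_reply(["ok", "ok", "bad"], f))   # two matching replies suffice
print(accept_reply(["ok", "bad"], f))         # not enough agreement yet
```

The same formula hints at the resource cost of BFT: tolerating even one faulty replica means running four copies of the service.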
12Standard Byzantine Fault Tolerance Example
13Standard Byzantine Fault Tolerance Example
14Problems With Standard BFT Implementations
15Applying BFT to Distributed Web Service Based
Applications
- A framework called THEMA brings BFT to web service/SOA based systems
- - Provides BFT to distributed applications composed of compound web services
- - Web services easily communicate across organizational bounds while maintaining BFT
- - Supports a 'mixed-fault' environment
16THEMA Enables BFT For Web Services
17How Does THEMA Work?
18Performance Ramifications
- Results for TPC-W benchmark paint Thema in a
favorable light, but are questionable upon
examination
19Questions That Have Been Addressed
- What is fault tolerance?
- What kind of faults are we concerned about?
- Why is fault tolerance important in distributed systems?
- How do we increase the fault tolerance of a distributed system?
- Can we adapt traditional fault tolerance solutions to web services based systems?
20Conclusions
- Trade-offs: Thema BFT at what price?
- Costs/Drawbacks
- - Computational overhead of extra network layer
- - Need for external clients and services to install Thema
- - Massive increase in necessary computational resources
- - System management becomes much more difficult
21Conclusions
- Trade-offs: Thema BFT at what price?
- Benefits
- - Theoretically, this can lead to a robust web service based system
- - A good fit for large, distributed systems where availability and reliability are primary concerns
- - Makes BFT more accessible by harmonizing with SOAP/WSDL
- - Makes BFT transparent to developers who understand web service protocols
- - Opens closed BFT systems to web services outside of organizational bounds