Title: Dependence Isolation for Thread-based Multi-tier Internet Services
1 Dependence Isolation for Thread-based Multi-tier
Internet Services
- Lingkun Chu, Kai Shen, Hong Tang, Tao Yang,
Jingyu Zhou
- Ask Jeeves Inc.
- University of Rochester
- University of California, Santa Barbara
2 Motivation
- Large scale cluster-based Internet services
- Google, Yahoo, Ask Jeeves Inc.
- Improve availability of thread-based network
services
- Challenges
- Component dependencies in multi-tier services
- Slow-response or unresponsive failures
- Propose a technique to isolate component
dependencies and provide per-dependency
management.
3 Component Dependencies in Multi-tier Internet
Services
[Figure: architecture of a multi-tier search service. Components shown:
firewall/traffic switch, Web server/query handler, local-area network,
query caches, index servers (partitions 1-3), and doc servers
(partitions 1-2), arranged into Tier 1 and Tier 2. Dependency types
labeled: bypass-able, aggregation, and replication.]
4 Problem Scenario
- Bounded pools
- Avoid context switch overhead
- Avoid poor caching performance
- Problem
- All threads can be blocked by a single
slow-responding service provider.
- Solution
- Capture dependency states and provide
per-dependency management.
[Figure: requests wait in a queue in front of a bounded thread pool
whose threads call Service A and Service B. Labels in the figure:
"Replica 2", "(Healthy)", "(From healthy to unresponsive)".]
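The scenario on this slide can be illustrated with a small, deliberately simplified model. It is not the authors' implementation: it assumes calls to the healthy service return instantly, calls to the unresponsive service never return, and the names `simulate` and `per_dependency_limit` are hypothetical.

```python
def simulate(requests, pool_size, per_dependency_limit=None):
    """Count served requests while service "A" is unresponsive.

    requests: target service names ("A" or "B") in arrival order.
    pool_size: size of the bounded thread pool.
    per_dependency_limit: hypothetical isolation knob; the maximum
        number of threads allowed to block on one service at a time.
        None models a plain shared pool.
    Returns (completed, rejected).
    """
    blocked_on_a = 0   # threads stuck waiting on unresponsive service A
    completed = 0      # requests answered by the healthy service
    rejected = 0       # requests turned away instead of blocking a thread
    for target in requests:
        if pool_size - blocked_on_a == 0:
            break      # every thread is blocked; the queue stops draining
        if target == "A":
            limited = per_dependency_limit is not None
            if limited and blocked_on_a >= per_dependency_limit:
                rejected += 1       # fail fast; keep the thread free
            else:
                blocked_on_a += 1   # the thread never comes back
        else:
            completed += 1          # healthy call returns immediately
    return completed, rejected


workload = ["A", "B"] * 10  # alternating calls to services A and B

shared = simulate(workload, pool_size=4)
isolated = simulate(workload, pool_size=4, per_dependency_limit=2)
print(shared)    # (3, 0): the pool fills with threads blocked on A
print(isolated)  # (10, 8): every call to healthy service B is served
```

With a shared pool, four calls to Service A are enough to stall all traffic, including requests that only need the healthy Service B. Capping the threads that one dependency may hold keeps the rest of the pool available.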
5 Problem Statement and Objectives
- How can we recognize service dependencies and
design a mechanism that isolates them and
provides per-dependency management, so that the
service tolerates component failure or
unresponsiveness with bounded resources?
6 Proposed Technique: Dependency Isolation
- A mechanism that monitors and manages the blocking
states of a thread at a fine-grained level based on
service dependency.
- A request passes through a number of states when
accessing network services or local I/O devices.
- Use dependency capsules to model those states.
- Caller-side admission control
- Feedback-based failure management
7 Dependency capsules
- A capsule is a schedulable entity that includes
request handlers, kernel threads, management
policy, and statistics.
[Figure: capsule topology for dependency isolation at each
cluster node]
8 State transition diagram
- A round-trip migration costs two kernel-level
context switches plus two user-level context
switches
- 40 µs on a P-3 450 MHz PC
- 16.5 µs on a P-4 2.4 GHz PC
[Figure: state transition and capsule migration of a
user-level thread]
9 Capsule Specification
- Each capsule is uniquely identified by its name
and category. Additionally, we can specify:
- The number of kernel threads that are bound to
the capsule.
- The maximum number of user-level threads that can
reside in the capsule.
- The scheduling policy: FIFO, priority-based, or a
user-provided policy.
- The timeout value.
- The above parameters can be specified via
configuration files and API functions.
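As a rough illustration of what such a specification might carry, the sketch below models the listed parameters as a validated record. The slides do not show the actual configuration syntax or API, so every field name, default value, and the `from_config` helper here is an assumption made for illustration only.

```python
from dataclasses import dataclass

# Policies named on the slide; the string spellings are hypothetical.
POLICIES = ("fifo", "priority", "user")


@dataclass(frozen=True)
class CapsuleSpec:
    """Hypothetical record of one capsule's parameters."""
    name: str                      # with category, identifies the capsule
    category: str
    kernel_threads: int = 1        # kernel threads bound to the capsule
    max_user_threads: int = 32     # cap on resident user-level threads
    scheduling_policy: str = "fifo"
    timeout_seconds: float = 1.0

    def __post_init__(self):
        # Reject settings that could never schedule any work.
        if self.kernel_threads < 1:
            raise ValueError("kernel_threads must be at least 1")
        if self.max_user_threads < 1:
            raise ValueError("max_user_threads must be at least 1")
        if self.scheduling_policy not in POLICIES:
            raise ValueError(f"unknown policy: {self.scheduling_policy}")
        if self.timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")

    @property
    def key(self):
        """The (name, category) pair that uniquely identifies a capsule."""
        return (self.name, self.category)


def from_config(entry):
    """Build a spec from one parsed configuration entry (a dict)."""
    return CapsuleSpec(**entry)


# Example entry, as it might look after parsing a configuration file.
spec = from_config({
    "name": "index-partition-1",   # hypothetical capsule name
    "category": "network",         # hypothetical category value
    "kernel_threads": 2,
    "max_user_threads": 16,
    "scheduling_policy": "priority",
    "timeout_seconds": 0.5,
})
```

Validating at construction time means a bad configuration entry fails when the capsule is declared rather than later, when a request first tries to enter it.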
10 Per-dependency Capsule Statistics
- Performance Data
- The number of outstanding requests
- Their elapsed waiting time
- Recent average response time
- Usage
- Caller-side Admission Control
- Apply admission control on the caller side.
- Feedback-based Failure Management
- Provide feedback information to upper layer
middleware or applications so that service
callers can bypass the problematic component.
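The two uses above can be sketched with a small per-dependency statistics tracker. This is a minimal illustration under stated assumptions, not the system's actual interface: the class `CapsuleStats`, its thresholds, and the rule in `looks_unhealthy` are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque


class CapsuleStats:
    """Hypothetical per-dependency statistics for one capsule."""

    def __init__(self, max_outstanding, slow_threshold, window=8):
        self.max_outstanding = max_outstanding  # admission limit
        self.slow_threshold = slow_threshold    # seconds
        self.outstanding = {}                   # request id -> start time
        self.recent = deque(maxlen=window)      # recent response times

    def admit(self, request_id, now):
        """Caller-side admission control: refuse when the limit is hit."""
        if len(self.outstanding) >= self.max_outstanding:
            return False
        self.outstanding[request_id] = now
        return True

    def complete(self, request_id, now):
        """Record a finished request and its response time."""
        start = self.outstanding.pop(request_id)
        self.recent.append(now - start)

    def average_response_time(self):
        """Mean of the recent response times, or 0.0 if none yet."""
        if not self.recent:
            return 0.0
        return sum(self.recent) / len(self.recent)

    def longest_wait(self, now):
        """Elapsed waiting time of the oldest outstanding request."""
        return max((now - s for s in self.outstanding.values()),
                   default=0.0)

    def looks_unhealthy(self, now):
        """Feedback signal an upper layer could use to bypass the callee."""
        return (self.longest_wait(now) > self.slow_threshold
                or self.average_response_time() > self.slow_threshold)


stats = CapsuleStats(max_outstanding=2, slow_threshold=1.0)
first = stats.admit("r1", now=0.0)    # admitted
second = stats.admit("r2", now=0.1)   # admitted
third = stats.admit("r3", now=0.2)    # refused: two already outstanding
stats.complete("r1", now=0.5)
early = stats.looks_unhealthy(now=0.6)  # r2 has waited only 0.5 s
late = stats.looks_unhealthy(now=2.0)   # r2 has now waited 1.9 s
```

Here admission control bounds how many callers can wait on one dependency, while `looks_unhealthy` stands in for the feedback that lets a caller skip a bypass-able component such as a cache.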
11 Software Layers of Dependency Isolation
12 Evaluation Objectives and Setup
- Study the overhead of introducing dependency
isolation.
- Demonstrate the improved availability by
comparing with traditional multithreading.
- Demonstrate the effectiveness of:
- Caller-side admission control
- Feedback-based failure management
- Traces
- One query trace from Ask Jeeves for RET
- One trace from the online discussion forum
www.melissavirus.com, dated 4/3/1999, for BBS
- A synthesized trace for RUBiS
13 Application benchmarks
- Retriever service (RET) with aggregation
dependency
- Bulletin board service (BBS) with bypass-able
dependency
- Auction service (RUBiS) with replication
dependency
- Neptune platform: RET, BBS
- J2EE platform: RUBiS
14 Overhead of Dependency Isolation (applications)
- Application performance with and without
dependency isolation.
15 Improving Availability
- Throughput of RET with traditional multithreading
or with dependency capsules, before, during, and
after a failure.
16 Caller-side Admission Control
- We require that all partition data are available
for this experiment.
17 Feedback Mechanism
- Throughput of BBS using traditional
multithreading, dependency capsules without
feedback, or dependency capsules with feedback,
before, during, and after a failure.
18 Summary
- The main contribution of this work
- Proposed a dependency isolation scheme for
improving the availability of multi-tier Internet
service clusters; our evaluation demonstrates
its effectiveness.
- Objectives
- dependency-aware concurrency management.
- dependency-specific management for better
availability and performance.