1
Adaptive Overload Control for Busy Internet
Servers
  • Matt Welsh and David Culler
  • USENIX Symposium on Internet Technologies and
    Systems (USITS)
  • 2003
  • Alex Cheung
  • Nov 13, 2006
  • ECE1747

2
Outline
  • Motivation
  • Goal
  • Methodology
  • Detection
  • Overload control
  • Experiments
  • Comments

3
Motivation
  • Internet services becoming important to our daily
    lives
  • Email
  • News
  • Trading
  • Services becoming more complex
  • Large dynamic content
  • Requires high computation and I/O
  • Hard to predict load requirements of requests
  • Withstand peak load that is 1000x the norm
    without over-provisioning
  • Solve CNN's problem on 9/11

4
Goal
  • Adaptive overload control scheme at the node
    level, maintaining
  • Response time
  • Throughput
  • QoS / availability

5
Methodology - Detection
  • Look at the 90th-percentile response time
  • Compare it with a threshold and decide what to do
    (sketch below)
  • Weaker alternatives
  • 100th percentile does not capture shape of
    response time curve
  • Throughput does not capture user perceived
    performance of the system
  • I ask
  • What makes 90th percentile so great?
  • Why not 95th? 80th? 70th?
  • No supporting micro-experiment

[Figure: a window of recently served requests (1-10); examine the 90th-percentile response time]
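My reading of the detection step, as a minimal Python sketch (not the paper's code): keep a window of recent response times and flag overload when the 90th-percentile sample exceeds the target. The window size, target value, and names are my assumptions.

class ResponseTimeMonitor:
    """Per-stage overload detection via the 90th-percentile response time (sketch)."""

    def __init__(self, window_size=100, target_sec=1.0, percentile=0.9):
        self.window_size = window_size   # how many recent requests to sample (assumed)
        self.target_sec = target_sec     # 90th-percentile response-time target
        self.percentile = percentile
        self.samples = []                # response times of the most recent requests

    def record(self, response_time_sec):
        """Call once per completed request."""
        self.samples.append(response_time_sec)
        if len(self.samples) > self.window_size:
            self.samples.pop(0)

    def overloaded(self):
        """True when the chosen percentile of the window exceeds the target."""
        if not self.samples:
            return False
        ordered = sorted(self.samples)
        idx = min(int(self.percentile * len(ordered)), len(ordered) - 1)
        return ordered[idx] > self.target_sec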
6
Methodology - Overload Control
  • If response time is higher than threshold
  • Limit service rate by rejecting selected requests
  • Extension: differentiate requests by class/priority
    level and reject lower class/priority requests
    first
  • Quality/service degradation
  • Back pressure
  • Queue explosion at 1st stage (they say)
  • Solved by rejecting requests at 1st stage
  • Breaks SEDA's loosely coupled modular design with
    an out-of-band notification scheme (I say)

7
Methodology - Overload Control
  • Forward rejected requests to another, more
    available server
  • "More available" = the server with the most of a
    particular resource still available
  • CPU, network, I/O, hard disk
  • Make decision using centralized or distributed
    algorithm
  • Reliable state migration, possibly transactional
  • My take
  • More complex, interesting, and actually solves
    CNN's problem with a cluster of servers! (rough
    sketch below)
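A rough sketch of what I am proposing here (my extension, not something in the paper): forward a rejected request to the replica whose scarcest resource has the most headroom. Server names, resource names, and utilization values are made up for illustration.

def most_available(servers):
    """Pick the replica whose most-loaded resource still has the most headroom."""
    def headroom(utilization):
        return 1.0 - max(utilization.values())
    return max(servers, key=lambda name: headroom(servers[name]))

# Illustrative usage: forward a rejected request to the least-loaded replica.
replicas = {
    "node-a": {"cpu": 0.90, "net": 0.40, "io": 0.30, "disk": 0.20},
    "node-b": {"cpu": 0.50, "net": 0.60, "io": 0.40, "disk": 0.30},
}
forward_to = most_available(replicas)    # "node-b": its worst resource is only 60% used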

8
Rate Limit
[Figure: adaptive rate limiter driven by the SMOOTHED 90th-percentile response time]
Multiplicative decrease, additive increase: just like
TCP! (AIMD sketch below)
  • 10 fine-tuned parameters per stage.
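A minimal sketch of the additive-increase / multiplicative-decrease adjustment suggested by the figure. The constants here are placeholders, not the paper's tuned values (which are among the roughly ten parameters per stage).

class AdmissionRateController:
    """AIMD adjustment of a stage's admission rate (sketch; constants are placeholders)."""

    def __init__(self, rate=100.0, additive_step=2.0,
                 decrease_factor=0.5, min_rate=0.05, max_rate=5000.0):
        self.rate = rate                 # admitted requests per second
        self.additive_step = additive_step
        self.decrease_factor = decrease_factor
        self.min_rate = min_rate
        self.max_rate = max_rate

    def adjust(self, overloaded):
        """Call after each measurement of the smoothed 90th-percentile response time."""
        if overloaded:
            # Above target: cut the rate multiplicatively, like TCP on packet loss.
            self.rate = max(self.min_rate, self.rate * self.decrease_factor)
        else:
            # Below target: probe upward additively.
            self.rate = min(self.max_rate, self.rate + self.additive_step)
        return self.rate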

9
Rate Limit With Class/Priority
  • Class/priority assignment based on
  • IP address, header information, HTTP cookies
  • I ask
  • Where is the priority assignment module
    implemented?
  • Should priority assignment be a stage of its own?
  • Is it not shown because it complicates the diagram
    and makes the stage design less clean?
  • How do you classify which requests are potential
    bottleneck requests? Application dependent? (A
    hypothetical classifier sketch follows.)
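A hypothetical sketch of the classification question above: one token bucket per class, with the low-priority bucket refilled at a lower rate so its requests hit the limit first. The classification rule (a cookie prefix) and the rates are invented; the paper only says assignment can use the IP address, header information, or HTTP cookies. Requests are assumed to be dicts of header fields here.

import time

class ClassAwareAdmission:
    """Per-class token buckets: low-priority requests run out of tokens first (sketch)."""

    def __init__(self, rates=None):
        # Admitted requests/second per class; placeholder values.
        self.rates = rates or {"high": 100.0, "low": 20.0}
        self.tokens = dict(self.rates)
        self.last_refill = time.monotonic()

    def classify(self, request):
        """Assign a class from cookie/header/IP information (illustrative rule only)."""
        return "high" if request.get("cookie", "").startswith("premium") else "low"

    def admit(self, request):
        # Refill each bucket in proportion to elapsed time, capped at its rate.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.last_refill = now
        for cls, rate in self.rates.items():
            self.tokens[cls] = min(rate, self.tokens[cls] + rate * elapsed)
        cls = self.classify(request)
        if self.tokens[cls] >= 1.0:
            self.tokens[cls] -= 1.0
            return True                  # admit into the stage's queue
        return False                     # reject (e.g. serve the static "busy" page)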

10
Quality/Service Degradation
  • Notify application via signal to DO service
    degradation.
  • Application does service degradation, not SEDA
  • Questions
  • How is the signaling implemented?
  • Out of band?
  • Is it possible to signal previous stages in the
    pipeline? Will this break SEDA's loose-coupling
    design? (callback sketch below)

[Figure: the stage sends a "signal" to the application to trigger degradation]
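Since the slides (and the paper) leave the signaling mechanism open, here is the simplest thing I can imagine, a plain callback pair, purely as a sketch of the division of labour: SEDA only raises the signal, and the application decides what "degraded" means.

class DegradationSignal:
    """The controller only signals; the application implements the degradation (sketch)."""

    def __init__(self, enable_degraded_mode, restore_full_service):
        self.enable_degraded_mode = enable_degraded_mode    # app callback, e.g. low-res images
        self.restore_full_service = restore_full_service    # app callback
        self.degraded = False

    def update(self, overloaded):
        """Call on every controller measurement; edge-triggers the callbacks."""
        if overloaded and not self.degraded:
            self.degraded = True
            self.enable_degraded_mode()
        elif not overloaded and self.degraded:
            self.degraded = False
            self.restore_full_service()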
11
  • Experiments

12
Experiments - Setup
  • Arashi email server (realistic experiment)
  • Real access workload
  • Real email content
  • Admission control
  • Web server benchmark
  • Service degradation + 1-class admission control

13
Experiments - Admission Rate
14
Experiments - Response Time
15
Experiments - Massive Load Spike
Not fair! SEDA's parameters were fine-tuned; Apache
can be tuned to stay flat too.
16
Experiments - Service Degradation
17
Experiments - Service Differentiation
  • Average reject rates without service
    differentiation
  • Low-priority: 55.5%
  • High-priority: 57.6%
  • With service differentiation
  • Low-priority: 87.9% (+32.4)
  • High-priority: 48.8% (-8.8)
  • Question
  • Why is the drop rate for high-priority requests
    reduced so little with service differentiation?
    Workload dependent?

18
  • Comments

19
Comments
  • No idea what the controller's overhead is
  • Overload control at node level is not good
  • Node level is inefficient
  • Late rejection
  • Node level is not user friendly
  • All session state is gone if you get a reject out
    of the blue (it comes without warning)
  • Need global level overload control scheme
  • Idea/concept is explained in 2.5 pages

20
Comments
  • Rejected requests
  • Instead of a TCP timeout, send a static page.
  • (Paper says) this is better
  • (I say) This is worse because it leads to an
    out-of-memory crash down the road
  • Saturated output bandwidth
  • Unbounded queue at the reject handler
  • Parameters
  • How to tune them? How difficult to tune?
  • May be tedious tuning each stage manually.
  • Given a 1M stage application, need to configure
    all 1M stage thresholds manually?
  • Automated tuning with control theory?
  • Methodology of adding extensions is not shown in
    any figures.

21
Comments
  • Experiment is not entirely realistic
  • Inter-request think time is 20 ms; is that
    realistic?
  • Rejected users have to re-login after 5 min
  • All state information is gone
  • Frustrated users
  • Two drawbacks of using response time for load
    detection

22
Comments
  • No idea which resource is the bottleneck: CPU?
    I/O? Network? Memory?
  • SEDA can only either
  • Do admission control
  • Reduces throughput
  • Tell application to degrade overall service

23
Comments
[Figure: default admission control. A pipeline of stages ("Attach image" -> "Send response") is OVERLOADED; once the threshold is crossed it simply rejects requests. Resource-utilization bars are shown for CPU, I/O, network, and memory.]
24
Comments
Service degradation WITH bottleneck intelligence
[Figure: the same resource-utilization bars (CPU, I/O, network, memory) against the threshold]
Network is the bottleneck, so spend some CPU and
memory to reduce the fidelity and size of images,
cutting bandwidth consumption WITHOUT reducing the
admission rate (see the sketch below).
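A sketch of the bottleneck-aware degradation I am arguing for (my idea, not the paper's): find the saturated resource and pick a degradation action that trades a plentiful resource for the scarce one. The action table and the threshold value are illustrative.

def pick_degradation(utilization, threshold=0.9):
    """Map the saturated resource to a degradation action (illustrative table)."""
    actions = {
        "network": "recompress images at lower fidelity (spend CPU, save bandwidth)",
        "cpu":     "serve cached or static variants of dynamic pages",
        "io":      "batch or defer non-critical disk work",
        "memory":  "shrink caches before rejecting any requests",
    }
    bottleneck = max(utilization, key=utilization.get)
    if utilization[bottleneck] < threshold:
        return None                      # no resource saturated: keep full service
    return actions.get(bottleneck)

# Example: network saturated while CPU and memory still have headroom.
print(pick_degradation({"cpu": 0.45, "io": 0.30, "network": 0.95, "memory": 0.50}))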
25
Comments
  • The response-time metric lags by at least the
    magnitude of the response time itself
  • Example: 50 requests come in all at once
  • nreq = 100
  • timeout = 10 s
  • target = 20 s
  • Processing time per request = 1 s
  • Each 10 s re-computation of the 90th percentile
    only covers the requests completed so far, so it
    first exceeds the 20 s target at the third check:
    overload is detected only after 30 s
  • Solution
  • Compare the enqueue vs. dequeue rate
  • Overload occurs when enqueue rate > dequeue rate
  • Detects overload after 10 s (sketch below)
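A minimal sketch of the fix proposed above: count enqueues and dequeues per measurement interval and declare overload as soon as arrivals outpace completions, so the 50-request burst is caught at the first 10 s check instead of after ~30 s. The interval handling and names are my assumptions.

class RateBasedDetector:
    """Overload = enqueue rate exceeds dequeue rate over the last interval (sketch)."""

    def __init__(self):
        self.enqueued = 0
        self.dequeued = 0

    def on_enqueue(self):
        self.enqueued += 1

    def on_dequeue(self):
        self.dequeued += 1

    def check_and_reset(self):
        """Call once per measurement interval (e.g. every 10 s)."""
        overloaded = self.enqueued > self.dequeued
        self.enqueued = 0
        self.dequeued = 0
        return overloaded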

26
Questions?