1
Adaptive Overload Control for Busy Internet
Servers
  • Matt Welsh and David Culler
  • USENIX Symposium on Internet Technologies and
    Systems (USITS)
  • 2003
  • Alex Cheung
  • Nov 13, 2006
  • ECE1747

2
Outline
  • Motivation
  • Goal
  • Methodology
  • Detection
  • Overload control
  • Experiments
  • Comments

3
Motivation
  • Internet services becoming important to our daily
    lives
  • Email
  • News
  • Trading
  • Services becoming more complex
  • Large dynamic content
  • Requires high computation and I/O
  • Hard to predict load requirements of requests
  • Withstand peak load that is 1000x the norm
    without over-provisioning
  • Solve CNN's problem on 9/11

4
Goal
  • Adaptive overload control scheme at the node
    level, maintaining
  • Response time
  • Throughput
  • QoS / availability

5
Methodology - Detection
  • Look at the 90th-percentile response time
  • Compare it with a threshold and decide what to do
    (sketch below)
  • Weaker alternatives
  • 100th percentile does not capture shape of
    response time curve
  • Throughput does not capture user perceived
    performance of the system
  • I ask
  • What makes 90th percentile so great?
  • Why not 95th? 80th? 70th?
  • No supporting micro-experiment

[Figure: a window of recently served requests (1-10); examine the 90th-percentile response time]
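My reading of the detection step, as a minimal Python sketch (not the paper's code): keep a window of recent response times and flag overload when the 90th-percentile sample exceeds the target. The window size, target value, and names are my assumptions.

class ResponseTimeMonitor:
    """Per-stage overload detection via the 90th-percentile response time (sketch)."""

    def __init__(self, window_size=100, target_sec=1.0, percentile=0.9):
        self.window_size = window_size   # how many recent requests to sample (assumed)
        self.target_sec = target_sec     # 90th-percentile response-time target
        self.percentile = percentile
        self.samples = []                # response times of the most recent requests

    def record(self, response_time_sec):
        """Call once per completed request."""
        self.samples.append(response_time_sec)
        if len(self.samples) > self.window_size:
            self.samples.pop(0)

    def overloaded(self):
        """True when the chosen percentile of the window exceeds the target."""
        if not self.samples:
            return False
        ordered = sorted(self.samples)
        idx = min(int(self.percentile * len(ordered)), len(ordered) - 1)
        return ordered[idx] > self.target_sec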
6
Methodology - Overload Control
  • If response time is higher than threshold
  • Limit service rate by rejecting selected requests
  • Extension: differentiate requests by class/priority
    level and reject lower class/priority requests
    first
  • Quality/service degradation
  • Back pressure
  • Queue explosion at 1st stage (they say)
  • Solved by rejecting requests at 1st stage
  • Breaks SEDA's loosely coupled modular design with
    an out-of-band notification scheme (I say)

7
Methodology - Overload Control
  • Forward rejected requests to another, more
    available server
  • "More available" = the server with the most of a
    particular resource still available
  • CPU, network, I/O, hard disk
  • Make decision using centralized or distributed
    algorithm
  • Reliable state migration, possibly transactional
  • My take
  • More complex, interesting, and actually solves
    CNN's problem with a cluster of servers! (rough
    sketch below)
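A rough sketch of what I am proposing here (my extension, not something in the paper): forward a rejected request to the replica whose scarcest resource has the most headroom. Server names, resource names, and utilization values are made up for illustration.

def most_available(servers):
    """Pick the replica whose most-loaded resource still has the most headroom."""
    def headroom(utilization):
        return 1.0 - max(utilization.values())
    return max(servers, key=lambda name: headroom(servers[name]))

# Illustrative usage: forward a rejected request to the least-loaded replica.
replicas = {
    "node-a": {"cpu": 0.90, "net": 0.40, "io": 0.30, "disk": 0.20},
    "node-b": {"cpu": 0.50, "net": 0.60, "io": 0.40, "disk": 0.30},
}
forward_to = most_available(replicas)    # "node-b": its worst resource is only 60% used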

8
Rate Limit
[Figure: adaptive rate limiter driven by the SMOOTHED 90th-percentile response time]
Multiplicative decrease, additive increase: just like
TCP! (AIMD sketch below)
  • 10 fine-tuned parameters per stage.
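A minimal sketch of the additive-increase / multiplicative-decrease adjustment suggested by the figure. The constants here are placeholders, not the paper's tuned values (which are among the roughly ten parameters per stage).

class AdmissionRateController:
    """AIMD adjustment of a stage's admission rate (sketch; constants are placeholders)."""

    def __init__(self, rate=100.0, additive_step=2.0,
                 decrease_factor=0.5, min_rate=0.05, max_rate=5000.0):
        self.rate = rate                 # admitted requests per second
        self.additive_step = additive_step
        self.decrease_factor = decrease_factor
        self.min_rate = min_rate
        self.max_rate = max_rate

    def adjust(self, overloaded):
        """Call after each measurement of the smoothed 90th-percentile response time."""
        if overloaded:
            # Above target: cut the rate multiplicatively, like TCP on packet loss.
            self.rate = max(self.min_rate, self.rate * self.decrease_factor)
        else:
            # Below target: probe upward additively.
            self.rate = min(self.max_rate, self.rate + self.additive_step)
        return self.rate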

9
Rate Limit With Class/Priority
  • Class/priority assignment based on
  • IP address, header information, HTTP cookies
  • I ask
  • Where is the priority assignment module
    implemented?
  • Should priority assignment be a stage of its own?
  • Is it not shown because it complicates the diagram
    and makes the stage design less clean?
  • How do you classify which requests are potential
    bottleneck requests? Application dependent? (A
    hypothetical classifier sketch follows.)
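A hypothetical sketch of the classification question above: one token bucket per class, with the low-priority bucket refilled at a lower rate so its requests hit the limit first. The classification rule (a cookie prefix) and the rates are invented; the paper only says assignment can use the IP address, header information, or HTTP cookies. Requests are assumed to be dicts of header fields here.

import time

class ClassAwareAdmission:
    """Per-class token buckets: low-priority requests run out of tokens first (sketch)."""

    def __init__(self, rates=None):
        # Admitted requests/second per class; placeholder values.
        self.rates = rates or {"high": 100.0, "low": 20.0}
        self.tokens = dict(self.rates)
        self.last_refill = time.monotonic()

    def classify(self, request):
        """Assign a class from cookie/header/IP information (illustrative rule only)."""
        return "high" if request.get("cookie", "").startswith("premium") else "low"

    def admit(self, request):
        # Refill each bucket in proportion to elapsed time, capped at its rate.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.last_refill = now
        for cls, rate in self.rates.items():
            self.tokens[cls] = min(rate, self.tokens[cls] + rate * elapsed)
        cls = self.classify(request)
        if self.tokens[cls] >= 1.0:
            self.tokens[cls] -= 1.0
            return True                  # admit into the stage's queue
        return False                     # reject (e.g. serve the static "busy" page)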

10
Quality/Service Degradation
  • Notify application via signal to DO service
    degradation.
  • Application does service degradation, not SEDA
  • Questions
  • How is the signaling implemented?
  • Out of band?
  • Is it possible to signal previous stages in the
    pipeline? Will this break SEDA's loose-coupling
    design? (callback sketch below)

[Figure: the stage sends a "signal" to the application to trigger degradation]
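Since the slides (and the paper) leave the signaling mechanism open, here is the simplest thing I can imagine, a plain callback pair, purely as a sketch of the division of labour: SEDA only raises the signal, and the application decides what "degraded" means.

class DegradationSignal:
    """The controller only signals; the application implements the degradation (sketch)."""

    def __init__(self, enable_degraded_mode, restore_full_service):
        self.enable_degraded_mode = enable_degraded_mode    # app callback, e.g. low-res images
        self.restore_full_service = restore_full_service    # app callback
        self.degraded = False

    def update(self, overloaded):
        """Call on every controller measurement; edge-triggers the callbacks."""
        if overloaded and not self.degraded:
            self.degraded = True
            self.enable_degraded_mode()
        elif not overloaded and self.degraded:
            self.degraded = False
            self.restore_full_service()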
11
  • Experiments

12
Experiments - Setup
  • Arashi email server (realistic experiment)
  • Real access workload
  • Real email content
  • Admission control
  • Web server benchmark
  • Service degradation + 1-class admission control

13
Experiments - Admission Rate
14
Experiments - Response Time
15
Experiments - Massive Load Spike
Not fair! SEDA's parameters were fine-tuned; Apache
can be tuned to stay flat too.
16
Experiments - Service Degradation
17
Experiments - Service Differentiation
  • Average reject rates without service
    differentiation
  • Low-priority: 55.5%
  • High-priority: 57.6%
  • With service differentiation
  • Low-priority: 87.9% (+32.4)
  • High-priority: 48.8% (-8.8)
  • Question
  • Why is the drop rate for high-priority requests
    reduced so little with service differentiation?
    Workload dependent?

18
  • Comments

19
Comments
  • No idea what the controller's overhead is
  • Overload control at node level is not good
  • Node level is inefficient
  • Late rejection
  • Node level is not user friendly
  • All session state is gone if you get a reject out
    of the blue (it comes without warning)
  • Need global level overload control scheme
  • Idea/concept is explained in 2.5 pages

20
Comments
  • Rejected requests
  • Instead of a TCP timeout, send a static page.
  • (Paper says) this is better
  • (I say) This is worse because it leads to an
    out-of-memory crash down the road
  • Saturated output bandwidth
  • Unbounded queue at the reject handler
  • Parameters
  • How to tune them? How difficult to tune?
  • May be tedious tuning each stage manually.
  • Given a 1M stage application, need to configure
    all 1M stage thresholds manually?
  • Automated tuning with control theory?
  • Methodology of adding extensions is not shown in
    any figures.

21
Comments
  • Experiment is not entirely realistic
  • Inter-request think time is 20 ms; is that
    realistic?
  • Rejected users have to re-login after 5 min
  • All state information is gone
  • Frustrated users
  • Two drawbacks of using response time for load
    detection

22
Comments
  • No idea which resource is the bottleneck: CPU?
    I/O? Network? Memory?
  • SEDA can only either
  • Do admission control
  • Reduces throughput
  • Tell application to degrade overall service

23
Comments
[Figure: default admission control. A pipeline of stages ("Attach image" -> "Send response") is OVERLOADED; once the threshold is crossed it simply rejects requests. Resource-utilization bars are shown for CPU, I/O, network, and memory.]
24
Comments
Service degradation WITH bottleneck intelligence
[Figure: the same resource-utilization bars (CPU, I/O, network, memory) against the threshold]
Network is the bottleneck, so spend some CPU and
memory to reduce the fidelity and size of images,
cutting bandwidth consumption WITHOUT reducing the
admission rate (see the sketch below).
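A sketch of the bottleneck-aware degradation I am arguing for (my idea, not the paper's): find the saturated resource and pick a degradation action that trades a plentiful resource for the scarce one. The action table and the threshold value are illustrative.

def pick_degradation(utilization, threshold=0.9):
    """Map the saturated resource to a degradation action (illustrative table)."""
    actions = {
        "network": "recompress images at lower fidelity (spend CPU, save bandwidth)",
        "cpu":     "serve cached or static variants of dynamic pages",
        "io":      "batch or defer non-critical disk work",
        "memory":  "shrink caches before rejecting any requests",
    }
    bottleneck = max(utilization, key=utilization.get)
    if utilization[bottleneck] < threshold:
        return None                      # no resource saturated: keep full service
    return actions.get(bottleneck)

# Example: network saturated while CPU and memory still have headroom.
print(pick_degradation({"cpu": 0.45, "io": 0.30, "network": 0.95, "memory": 0.50}))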
25
Comments
  • The response-time metric lags by at least the
    magnitude of the response time itself
  • Example: 50 requests come in all at once
  • nreq = 100
  • timeout = 10 s
  • target = 20 s
  • Processing time per request = 1 s
  • Each 10 s re-computation of the 90th percentile
    only covers the requests completed so far, so it
    first exceeds the 20 s target at the third check:
    overload is detected only after 30 s
  • Solution
  • Compare the enqueue vs. dequeue rate
  • Overload occurs when enqueue rate > dequeue rate
  • Detects overload after 10 s (sketch below)
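A minimal sketch of the fix proposed above: count enqueues and dequeues per measurement interval and declare overload as soon as arrivals outpace completions, so the 50-request burst is caught at the first 10 s check instead of after ~30 s. The interval handling and names are my assumptions.

class RateBasedDetector:
    """Overload = enqueue rate exceeds dequeue rate over the last interval (sketch)."""

    def __init__(self):
        self.enqueued = 0
        self.dequeued = 0

    def on_enqueue(self):
        self.enqueued += 1

    def on_dequeue(self):
        self.dequeued += 1

    def check_and_reset(self):
        """Call once per measurement interval (e.g. every 10 s)."""
        overloaded = self.enqueued > self.dequeued
        self.enqueued = 0
        self.dequeued = 0
        return overloaded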

26
Questions?