1. Coping with (exploiting) heavy tails
Balaji Prabhakar
Departments of EE and CS, Stanford University
balaji_at_stanford.edu
2. Overview
- SIFT: a simple algorithm for identifying large flows
  - reducing average flow delays
  - with smaller router buffers
  - with Konstantinos Psounis and Arpita Ghosh
- Bandwidth at wireless proxy servers
  - TDM vs FDM; or, how many servers suffice
  - with Pablo Molinero-Fernandez and Konstantinos Psounis
3. SIFT: Motivation
- Egress buffers on router line cards at present serve packets in a FIFO manner
- The bandwidth sharing that results from this and the actions of transport protocols like TCP translates to some service order for flows that isn't well understood; that is, at the flow level do we have
  - FIFO? PS? SRPT?
  - (none of the above)
[Figure: egress buffer]
4. SIFT: Motivation
- But serving packets according to the SRPT (Shortest Remaining Processing Time) policy at the flow level
  - would minimize average delay
  - given the heavy-tailed nature of the Internet flow-size distribution, the reduction in delay can be huge
5. SRPT at the flow level
[Figure: egress buffer; the next packet to depart under FIFO is shown in green, the next packet to depart under SRPT in orange]
6. But
- SRPT is unimplementable
  - the router needs to know residual flow sizes for all enqueued flows; virtually impossible to implement
- Other pre-emptive schemes like SFF (shortest flow first) or LAS (least attained service) are likewise too complicated to implement
- This has led researchers to consider tagging flows at the edge, where the number of distinct flows is much smaller
  - but this requires a different design of edge and core routers
  - more importantly, it needs extra space in IP packet headers to signal flow size
- Is something simpler possible?
7. SIFT: A randomized algorithm
- Flip a coin with bias p (= 0.01, say) for heads on each arriving packet, independently from packet to packet
- A flow is sampled if one of its packets has a head on it
[Figure: a packet stream marked H, T, T, T, T, T, H; the flow is sampled once one of its packets comes up heads]
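A minimal sketch of this per-packet sampling step (the function name and the example flow sizes are illustrative, not from the talk):

```python
import random

def sift_sample(flow_sizes, p=0.01, seed=0):
    """Flip a biased coin on every arriving packet; a flow is
    'sampled' as soon as one of its packets comes up heads."""
    rng = random.Random(seed)
    sampled = set()
    for flow_id, size in flow_sizes.items():
        for _ in range(size):          # one coin flip per packet
            if rng.random() < p:       # heads with probability p
                sampled.add(flow_id)
                break                  # no need to flip further for this flow
    return sampled

# Example: short flows (a few packets) vs long flows (thousands of packets)
flows = {"short-1": 5, "short-2": 12, "long-1": 2000, "long-2": 5000}
print(sift_sample(flows))              # long flows are almost surely caught
```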
8. SIFT: A randomized algorithm
- A flow of size X has roughly a 0.01·X chance of being sampled
  - flows with fewer than 15 packets are sampled with prob less than 0.15
  - flows with more than 100 packets are sampled with prob greater than 0.63
  - the precise probability is 1 - (1 - 0.01)^X
- Most short flows will not be sampled; most long flows will be
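As a quick numerical check of the formula above (p = 0.01; values rounded):

```python
p = 0.01
for x in (5, 15, 100, 500, 1000):
    # probability that a flow of x packets gets at least one head
    print(x, round(1 - (1 - p) ** x, 3))
# roughly 0.049, 0.14, 0.63, 0.99, 1.0 respectively
```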
9. The accuracy of classification
- Ideally, we would like to sample like the blue curve
- Sampling with prob p gives the red curve
  - there are false positives and false negatives
- Can we get the green curve?
[Figure: Prob(sampled) vs flow size]
10. SIFT
- Sample with a coin of bias q = 0.1
  - say that a flow is sampled if it gets two heads!
  - this reduces the chance of making errors
  - but you have to count the number of heads per flow (see the sketch after this list)
- So, how can we use SIFT at a router?
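A minimal sketch comparing the single-head scheme (p = 0.01) with this two-heads variant (q = 0.1); the probability expressions are plain binomial calculations, and the function names are illustrative:

```python
def p_one_head(x, p=0.01):
    # P(flow of x packets gets at least 1 head)
    return 1 - (1 - p) ** x

def p_two_heads(x, q=0.1):
    # P(flow of x packets gets at least 2 heads), binomial tail
    return 1 - (1 - q) ** x - x * q * (1 - q) ** (x - 1)

for x in (5, 15, 50, 100):
    print(x, round(p_one_head(x), 3), round(p_two_heads(x), 3))
# The two-heads curve rises from 0 to 1 more sharply, so it is closer
# to the ideal step-like classifier on the previous slide.
```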
11. SIFT at a router
- Sample incoming packets
- Place any packet with a head (or the second such packet) in the low-priority buffer
- Place all further packets from this flow in the low-priority buffer (to avoid mis-sequencing)
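A minimal sketch of this enqueue logic, assuming the single-head variant and hypothetical class and queue names; it illustrates the idea rather than the router implementation:

```python
import random
from collections import deque

class SiftQueue:
    """Two logical priority queues in one physical buffer.
    Packets from sampled (long) flows go to the low-priority queue."""
    def __init__(self, p=0.01, seed=0):
        self.p = p
        self.rng = random.Random(seed)
        self.sampled_flows = set()   # table of flow ids already sampled
        self.high = deque()          # short (unsampled) flows
        self.low = deque()           # long (sampled) flows

    def enqueue(self, flow_id, packet):
        if flow_id not in self.sampled_flows and self.rng.random() < self.p:
            self.sampled_flows.add(flow_id)   # this packet's coin came up heads
        if flow_id in self.sampled_flows:
            self.low.append(packet)           # keep the whole flow in low priority
        else:                                 # (avoids mis-sequencing)
            self.high.append(packet)

    def dequeue(self):
        # Serve the high-priority (short-flow) queue first
        if self.high:
            return self.high.popleft()
        return self.low.popleft() if self.low else None
```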
12. Simulation results
- Simulation results with ns-2
- Topology: [figure]
13. Overall average delays
14. Average delay for short flows
15. Delay for long flows
16. Implementation requirements
- SIFT needs
  - two logical queues in one physical buffer
  - to sample arriving packets
  - a table for maintaining the ids of sampled flows
  - to check whether an incoming packet belongs to a sampled flow or not
  - all quite simple to implement
17. A big bonus
- The buffer of the short flows has very low occupancy
  - so, can we simply reduce it drastically without sacrificing performance?
- More precisely, suppose
  - we reduce the buffer size for the small flows, increase it for the large flows, and keep the total the same as FIFO
18. SIFT incurs fewer drops
Buffer_Size(Short flows) = 10, Buffer_Size(Long flows) = 290, Buffer_Size(Single FIFO Queue) = 300
[Figure: packet drops; SIFT ------ vs FIFO ------]
19. Reducing total buffer size
- Suppose we reduce the buffer size of the long flows as well
- Questions
  - will packet drops still be fewer?
  - will the delays still be as good?
20. Drops under SIFT with less total buffer
Buffer_Size(PRQ0) = 10, Buffer_Size(PRQ1) = 190, Buffer_Size(One Queue) = 300
[Figure: packet drops; SIFT ------ vs FIFO ------ (one queue)]
21. Delay histogram for short flows
[Figure: SIFT ------ vs FIFO ------]
22. Delay histogram for long flows
[Figure: SIFT ------ vs FIFO ------]
23. Conclusions for SIFT
- A randomized scheme; preliminary results show that
  - it has low implementation complexity
  - it reduces delays drastically
    - (users are happy)
  - with 30-35% smaller buffers at egress line cards
    - (router manufacturers are happy)
- A lot more work is needed
  - at the moment we have a good understanding of how to sample, and extensive (and encouraging) simulation tests
  - need to understand the effect of reduced buffers on end-to-end congestion control algorithms
24. How many servers do we need?
- Motivation: wireless and satellite
- Problem: single transmitter, multiple receivers
  - bandwidth available for transmission: W bits/sec
  - should files be transferred to one receiver at a time? (TDM)
  - or, should we divide the bandwidth into K channels of W/K bits/sec and transmit to K receivers at a time? (FDM)
- For heavy-tailed jobs, K > 1 minimizes mean delay
- Questions
  - What is the right choice of K?
  - How does it depend on flow-size distributions?
25. A simulation: heavy-tailed file sizes
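The plot for this slide is not reproduced here. Below is a minimal, illustrative sketch of the kind of experiment described: a link of unit rate used either as one channel (TDM, K = 1) or split into K equal channels (FDM), FCFS service, Poisson arrivals, Pareto file sizes. All parameter values and function names are assumptions for illustration, not the talk's settings:

```python
import random

def pareto(rng, gamma):
    """Pareto file size with shape gamma, scaled so the mean is 1."""
    xm = (gamma - 1.0) / gamma
    return xm / (rng.random() ** (1.0 / gamma))

def mean_delay(K, load, gamma, n_jobs=200_000, seed=1):
    """FCFS M/Pareto/K: a unit-rate link split into K channels of rate 1/K.
    Waiting times via the Kiefer-Wolfowitz workload recursion."""
    rng = random.Random(seed)
    work = [0.0] * K                       # residual work on each channel
    total = 0.0
    for _ in range(n_jobs):
        gap = rng.expovariate(load)        # Poisson arrivals; E[X] = 1, so rate = load
        work = sorted(max(w - gap, 0.0) for w in work)
        service = K * pareto(rng, gamma)   # a 1/K-rate channel is K times slower
        total += work[0] + service         # delay = wait + service
        work[0] += service
    return total / n_jobs

# Illustrative comparison only; parameters are assumptions, not the talk's.
for K in (1, 2, 4, 8, 16):
    print(K, round(mean_delay(K, load=0.5, gamma=1.3), 2))
```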
26. The model
- Use an M/Heavy-Tailed/K queueing system
  - service times X: bimodal to begin with; this generalizes
  - P(X = A) = a = 1 - P(X = B), where A < E(X) < B and a ≈ 1
  - the arrival rate is λ
- Let S_K, W_K and D_K be the service time, waiting time and total delay in the K-server system
  - E(S_K) = K·E(X) and E(D_K) = K·E(X) + E(W_K)
- Main idea in determining K
  - have enough servers to take care of long jobs
    - so that short jobs aren't waiting for long amounts of time
  - but no more
    - because otherwise the service times become too big
27. Approximately determining W_K
- Consider two states: servers blocked or not
  - Blocked: all K servers are busy serving long jobs
- E(W_K) = E(W_K | blocked)·P_B + E(W_K | unblocked)·(1 - P_B)
- P_B ≈ P(there are at least K large arrivals in K·B time slots)
  - this is actually a lower bound, but accurate for large B
  - with Poisson arrivals, P_B is easy to find
- E(W_K | unblocked) ≈ 0
- E(W_K | blocked) ≈ E(W_1), which can be determined from the Pollaczek-Khinchine formula
28. Putting it all together
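The figure for this slide is not reproduced here. A minimal sketch of how the approximation on the previous slide could be combined numerically is given below; the Poisson form of P_B, the bimodal parameter values, and the scan over K are my reading of the slides, so treat it as an assumption-laden illustration rather than the talk's exact calculation:

```python
import math

def approx_mean_delay(K, lam, A, B, a):
    """Approximate E(D_K) for the M/Bimodal/K model of the previous slides:
    E(D_K) = K*E(X) + E(W_K), with E(W_K) ~= P_B * E(W_1)."""
    EX  = a * A + (1 - a) * B
    EX2 = a * A * A + (1 - a) * B * B
    rho = lam * EX                       # utilization of the full-rate link
    if rho >= 1:
        return float("inf")
    # E(W_1) from the Pollaczek-Khinchine formula for the single full-rate channel
    EW1 = lam * EX2 / (2 * (1 - rho))
    # P_B ~= P(at least K large-job arrivals in K*B time); large jobs arrive
    # at rate lam*(1-a), so the count is Poisson with mean lam*(1-a)*K*B
    mu = lam * (1 - a) * K * B
    p_lt_K = sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(K))
    PB = 1 - p_lt_K
    return K * EX + PB * EW1             # E(W_K | unblocked) ~= 0

# Illustrative scan for the K that minimizes the approximate mean delay
lam, A, B, a = 0.3, 1.0, 100.0, 0.99     # assumed values, not the talk's
best = min(range(1, 60), key=lambda K: approx_mean_delay(K, lam, A, B, a))
print(best, approx_mean_delay(best, lam, A, B, a))
```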
29. M/Bimodal/K
30. M/Pareto/K
31. M/Pareto/K: higher moments
32. The optimal number of servers

load ρ   Pareto γ   optimal K (simulation)   optimal K (formula)
0.1      1.1         3                        3
0.5      1.1         8                        9
0.8      1.1        50                       46
0.1      1.3         2                        2
0.5      1.3         6                        6
0.8      1.3        20                       22