Title: Traffic management: Concepts, Issues and Challenges
1Traffic management: Concepts, Issues and Challenges
- S. Keshav
- Cornell University
- ACM SIGCOMM 97, Cannes
- September 15th 1997
2An example
- Executive participating in a worldwide videoconference
- Proceedings are videotaped and stored in an archive
- Edited and placed on a Web site
- Accessed later by others
- During conference
- Sends email to an assistant
- Breaks off to answer a voice call
3What this requires
- For video
- sustained bandwidth of at least 64 kbps
- low loss rate
- For voice
- sustained bandwidth of at least 8 kbps
- low loss rate
- For interactive communication
- low delay (< 100 ms one-way)
- For playback
- low delay jitter
- For email and archiving
- reliable bulk transport
4What if
- A million executives were simultaneously accessing the network?
- What capacity should each trunk have?
- How should packets be routed? (Can we spread load over alternate paths?)
- How can different traffic types get different services from the network?
- How should each endpoint regulate its load?
- How should we price the network?
- These types of questions lie at the heart of
network design and operation, and form the basis
for traffic management.
5Traffic management
- Set of policies and mechanisms that allow a network to efficiently satisfy a diverse range of service requests
- Tension is between diversity and efficiency
- Traffic management is necessary for providing Quality of Service (QoS)
- Subsumes congestion control (congestion = loss of efficiency)
6Why is it important?
- One of the most challenging open problems in networking
- Commercially important
- AOL burnout
- Perceived reliability (necessary for infrastructure)
- Capacity sizing directly affects the bottom line
- At the heart of the next generation of data networks
- Traffic management = Connectivity + Quality of Service
7Outline
- Economic principles
- Traffic classes
- Time scales
- Mechanisms
- Some open problems
8Basics: utility function
- Users are assumed to have a utility function that maps from a given quality of service to a level of satisfaction, or utility
- Utility functions are private information
- Cannot compare utility functions between users
- Rational users take actions that maximize their utility
- Can determine utility function by observing preferences
9Example
- Let u = S - a t
- u = utility from file transfer
- S = satisfaction when transfer infinitely fast
- t = transfer time
- a = rate at which satisfaction decreases with time
- As transfer time increases, utility decreases
- If t > S/a, user is worse off! (reflects time wasted)
- Assumes linear decrease in utility
- S and a can be experimentally determined
10Social welfare
- Suppose network manager knew the utility function of every user
- Social welfare is maximized when some combination of the utility functions (such as the sum) is maximized
- An economy (network) is efficient when increasing the utility of one user must necessarily decrease the utility of another
- An economy (network) is envy-free if no user would trade places with another (better performance also costs more)
- Goal: maximize social welfare
- subject to efficiency, envy-freeness, and making a profit
11Example
- Assume
- Single switch, each user imposes load 0.4
- A's utility = 4 - d
- B's utility = 8 - 2d
- Same delay to both users
- Conservation law
- 0.4d + 0.4d = C => d = 1.25 C => sum of utilities = 12 - 3.75 C
- If B's delay is reduced to 0.5 C, then A's delay = 2 C
- Sum of utilities = 12 - 3 C
- Increase in social welfare need not benefit everyone
- A loses utility, but may pay less for service
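The arithmetic on this slide can be checked with a short sketch. The function names and the value chosen for the constant C are illustrative assumptions; the load of 0.4 and the two utility functions come from the slide.

```python
LOAD = 0.4  # load imposed by each user (from the slide)

def utility_a(delay):
    return 4 - delay

def utility_b(delay):
    return 8 - 2 * delay

def delay_of_a(delay_b, C):
    """Solve the conservation law LOAD*d_A + LOAD*d_B = C for d_A."""
    return (C - LOAD * delay_b) / LOAD

def social_welfare(delay_b, C):
    """Sum of utilities when B sees delay_b and A gets whatever is left."""
    return utility_a(delay_of_a(delay_b, C)) + utility_b(delay_b)

C = 1.0  # hypothetical value of the conservation constant
equal_delay = social_welfare(1.25 * C, C)  # both users see d = 1.25 C
favour_b = social_welfare(0.5 * C, C)      # B's delay reduced to 0.5 C
print(equal_delay, favour_b)
```

With any positive C, the second allocation gives the larger sum, even though A's own utility drops.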
12Some economic principles
- A single network that provides heterogeneous QoS is better than separate networks for each QoS
- unused capacity is available to others
- Lowering delay of delay-sensitive traffic increases welfare
- can increase welfare by matching service menu to user requirements
- BUT need to know what users want (signaling)
- For typical utility functions, welfare increases more than linearly with increase in capacity
- individual users see smaller overall fluctuations
- can increase welfare by increasing capacity
13Principles applied
- A single wire that carries both voice and data is more efficient than separate wires for voice and data
- ADSL
- IP Phone
- Moving from a 20% loaded 10 Mbps Ethernet to a 20% loaded 100 Mbps Ethernet will still improve social welfare
- increase capacity whenever possible
- Better to give 5% of the traffic lower delay than all traffic low delay
- should somehow mark and isolate low-delay traffic
14The two camps
- Can increase welfare either by
- matching services to user requirements or
- increasing capacity blindly
- Which is cheaper?
- no one is really sure!
- small and smart vs. big and dumb
- It seems that smarter ought to be better
- otherwise, to get low delays for some traffic, we need to give all traffic low delay, even if it doesn't need it
- But, perhaps, we can use the money spent on traffic management to increase capacity
- We will study traffic management, assuming that it matters!
15Outline
- Economic principles
- Traffic classes
- Time scales
- Mechanisms
- Some open problems
16Traffic classes
- Networks should match offered service to source requirements (corresponds to utility functions)
- Example: telnet requires low bandwidth and low delay
- utility increases with decrease in delay
- network should provide a low-delay service
- or, telnet belongs to the low-delay traffic class
- Traffic classes encompass both user requirements
and network service offerings
17Traffic classes - details
- A basic division: guaranteed service and best effort
- like flying with reservation or standby
- Guaranteed-service
- utility is zero unless app gets a minimum level of service quality
- bandwidth, delay, loss
- open-loop flow control with admission control
- e.g. telephony, remote sensing, interactive multiplayer games
- Best-effort
- send and pray
- closed-loop flow control
- e.g. email, net news
18GS vs. BE (cont.)
- Degree of synchrony
- time scale at which peer endpoints interact
- GS are typically synchronous or interactive
- interact on the timescale of a round trip time
- e.g. telephone conversation or telnet
- BE are typically asynchronous or non-interactive
- interact on longer time scales
- e.g. Email
- Sensitivity to time and delay
- GS apps are real-time
- performance depends on wall clock
- BE apps are typically indifferent to real time
- automatically scale back during overload
19Traffic subclasses (roadmap)
- ATM Forum
- based on sensitivity to bandwidth
- GS
- CBR, VBR
- BE
- ABR, UBR
- IETF
- based on sensitivity to delay
- GS
- intolerant
- tolerant
- BE
- interactive burst
- interactive bulk
- asynchronous bulk
20ATM Forum GS subclasses
- Constant Bit Rate (CBR)
- constant, cell-smooth traffic
- mean and peak rate are the same
- e.g. telephone call evenly sampled and uncompressed
- constant bandwidth, variable quality
- Variable Bit Rate (VBR)
- long term average with occasional bursts
- try to minimize delay
- can tolerate loss and higher delays than CBR
- e.g. compressed video or audio with constant
quality, variable bandwidth
21ATM Forum BE subclasses
- Available Bit Rate (ABR)
- users get whatever is available
- zero loss if network signals (in RM cells) are obeyed
- no guarantee on delay or bandwidth
- Unspecified Bit Rate (UBR)
- like ABR, but no feedback
- no guarantee on loss
- presumably cheaper
22IETF GS subclasses
- Tolerant GS
- nominal mean delay, but can tolerate occasional variation
- not specified what this means exactly
- uses controlled-load service
- book uses older terminology (predictive)
- even at high loads, admission control assures a source that its service does not suffer
- it really is this imprecise!
- Intolerant GS
- need a worst case delay bound
- equivalent to CBR+VBR in ATM Forum model
23IETF BE subclasses
- Interactive burst
- bounded asynchronous service, where bound is qualitative, but pretty tight
- e.g. paging, messaging, email
- Interactive bulk
- bulk, but a human is waiting for the result
- e.g. FTP
- Asynchronous bulk
- junk traffic
- e.g. netnews
24Some points to ponder
- The only thing out there is CBR and asynchronous bulk!
- These are application requirements. There are also organizational requirements (link sharing)
- Users need QoS for other things too!
- billing
- privacy
- reliability and availability
25Outline
- Economic principles
- Traffic classes
- Time scales
- Mechanisms
- Some open problems
26Time scales
- Some actions are taken once per call
- tell network about traffic characterization and request resources
- in ATM networks, finding a path from source to destination
- Other actions are taken during the call, every few round trip times
- feedback flow control
- Still others are taken very rapidly, during the data transfer
- scheduling
- policing and regulation
- Traffic management mechanisms must deal with a
range of traffic classes at a range of time scales
27Summary of mechanisms at each time scale
- Less than one round-trip-time (cell-level)
- Scheduling and buffer management
- Regulation and policing
- Policy routing (datagram networks)
- One or more round-trip-times (burst-level)
- Feedback flow control
- Retransmission
- Renegotiation
28Summary (cont.)
- Session (call-level)
- Signaling
- Admission control
- Service pricing
- Routing (connection-oriented networks)
- Day
- Peak load pricing
- Weeks or months
- Capacity planning
29Outline
- Economic principles
- Traffic classes
- Mechanisms at each time scale
- Faster than one RTT
- scheduling and buffer management
- regulation and policing
- policy routing
- One RTT
- Session
- Day
- Weeks to months
- Some open problems
30Scheduling and buffer management
31Outline
- What is scheduling?
- Why we need it
- Requirements of a scheduling discipline
- Fundamental choices
- Scheduling best effort connections
- Scheduling guaranteed-service connections
- Packet drop strategies
32Scheduling
- Sharing always results in contention
- A scheduling discipline resolves contention
- who's next?
- Key to fairly sharing resources and providing
performance guarantees
33Components
- A scheduling discipline does two things
- decides service order
- manages queue of service requests
- Example
- consider queries awaiting web server
- scheduling discipline decides service order
- and also if some query should be ignored
34Where?
- Anywhere where contention may occur
- At every layer of protocol stack
- Usually studied at network layer, at output
queues of switches
35Outline
- What is scheduling
- Why we need it
- Requirements of a scheduling discipline
- Fundamental choices
- Scheduling best effort connections
- Scheduling guaranteed-service connections
- Packet drop strategies
36Why do we need one?
- Because future applications need it
- Recall that we expect two types of future
applications - best-effort (adaptive, non-real time)
- e.g. email, some types of file transfer
- guaranteed service (non-adaptive, real time)
- e.g. packet voice, interactive video, stock quotes
37What can scheduling disciplines do?
- Give different users different qualities of service
- Example of passengers waiting to board a plane
- early boarders spend less time waiting
- bumped off passengers are lost!
- Scheduling disciplines can allocate
- bandwidth
- delay
- loss
- They also determine how fair the network is
38Outline
- What is scheduling
- Why we need it
- Requirements of a scheduling discipline
- Fundamental choices
- Scheduling best effort connections
- Scheduling guaranteed-service connections
- Packet drop strategies
39Requirements
- An ideal scheduling discipline
- is easy to implement
- is fair
- provides performance bounds
- allows easy admission control decisions
- to decide whether a new flow can be allowed
40Requirements 1. Ease of implementation
- Scheduling discipline has to make a decision once every few microseconds!
- Should be implementable in a few instructions or hardware
- for hardware, critical constraint is VLSI space
- Work per packet should scale less than linearly with number of active connections
41Requirements 2. Fairness
- Scheduling discipline allocates a resource
- An allocation is fair if it satisfies max-min fairness
- Intuitively
- each connection gets no more than what it wants
- the excess, if any, is equally shared
[Figure: allocations to connections A, B and C, before and after half of the excess is transferred to the unsatisfied demand]
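The intuition above (no connection gets more than it wants, and any excess is shared equally) can be sketched as a progressive-filling computation. This is a minimal sketch; the function name and the example demands are assumptions chosen for illustration, not taken from the slides.

```python
def max_min_fair(capacity, demands):
    """Return a max-min fair allocation of `capacity` among `demands`."""
    allocation = [0.0] * len(demands)
    remaining = float(capacity)
    # Visit connections from smallest to largest demand.
    unsatisfied = sorted(range(len(demands)), key=lambda i: demands[i])
    while unsatisfied:
        share = remaining / len(unsatisfied)  # equal split of what is left
        smallest = unsatisfied[0]
        if demands[smallest] <= share:
            # This connection wants no more than its share: satisfy it fully
            # and let the others split the excess.
            allocation[smallest] = float(demands[smallest])
            remaining -= demands[smallest]
            unsatisfied.pop(0)
        else:
            # Everyone left wants more than the equal share: all get the share.
            for i in unsatisfied:
                allocation[i] = share
            break
    return allocation

# Hypothetical example: capacity 10 shared by demands 2, 2.6, 4 and 5.
print(max_min_fair(10, [2, 2.6, 4, 5]))
```

In the example, the two small demands are met in full and the two large ones split the remainder equally.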
42Fairness (cont.)
- Fairness is intuitively a good idea
- But it also provides protection
- traffic hogs cannot overrun others
- automatically builds firewalls around heavy users
- Fairness is a global objective, but scheduling is local
- Each endpoint must restrict its flow to the smallest fair allocation
- Dynamics + delay => global fairness may never be achieved
43Requirements 3. Performance bounds
- What is it?
- A way to obtain a desired level of service
- Can be deterministic or statistical
- Common parameters are
- bandwidth
- delay
- delay-jitter
- loss
44Bandwidth
- Specified as minimum bandwidth measured over a prespecified interval
- E.g. > 5 Mbps over intervals of > 1 sec
- Meaningless without an interval!
- Can be a bound on average (sustained) rate or peak rate
- Peak is measured over a small interval
- Average is asymptote as intervals increase without bound
45Delay and delay-jitter
- Bound on some parameter of the delay distribution
curve
46Requirements 4. Ease of admission control
- Admission control needed to provide QoS
- Overloaded resource cannot guarantee performance
- Choice of scheduling discipline affects ease of
admission control algorithm
47Outline
- What is scheduling
- Why we need it
- Requirements of a scheduling discipline
- Fundamental choices
- Scheduling best effort connections
- Scheduling guaranteed-service connections
- Packet drop strategies
48Fundamental choices
- 1. Number of priority levels
- 2. Work-conserving vs. non-work-conserving
- 3. Degree of aggregation
- 4. Service order within a level
49Choices 1. Priority
- Packet is served from a given priority level only if no packets exist at higher levels (multilevel priority with exhaustive service)
- Highest level gets lowest delay
- Watch out for starvation!
- Usually map priority levels to delay classes
- Low bandwidth urgent messages
- Realtime
- Non-realtime
50Choices 2. Work-conserving vs. non-work-conserving
- Work-conserving discipline is never idle when packets await service
- Why bother with non-work-conserving?
51Non-work-conserving disciplines
- Key conceptual idea: delay packet till eligible
- Reduces delay-jitter => fewer buffers in network
- How to choose eligibility time?
- rate-jitter regulator
- bounds maximum outgoing rate
- delay-jitter regulator
- compensates for variable delay at previous hop
52Do we need non-work-conservation?
- Can remove delay-jitter at an endpoint instead
- but also reduces size of switch buffers
- Increases mean delay
- not a problem for playback applications
- Wastes bandwidth
- can serve best-effort packets instead
- Always punishes a misbehaving source
- can't have it both ways
- Bottom line not too bad, implementation cost may
be the biggest problem
53Choices 3. Degree of aggregation
- More aggregation
- less state
- cheaper
- smaller VLSI
- less to advertise
- BUT less individualization
- Solution
- aggregate to a class, members of class have same performance requirement
- no protection within class
54Choices 4. Service within a priority level
- In order of arrival (FCFS) or in order of a service tag
- Service tags => can arbitrarily reorder queue
- Need to sort queue, which can be expensive
- FCFS
- bandwidth hogs win (no protection)
- no guarantee on delays
- Service tags
- with appropriate choice, both protection and
delay bounds possible
55Outline
- What is scheduling
- Why we need it
- Requirements of a scheduling discipline
- Fundamental choices
- Scheduling best effort connections
- Scheduling guaranteed-service connections
- Packet drop strategies
56Scheduling best-effort connections
- Main requirement is fairness
- Achievable using Generalized Processor Sharing (GPS)
- Visit each non-empty queue in turn
- Serve infinitesimal from each
- Why is this fair?
- How can we give weights to connections?
57More on GPS
- GPS is unimplementable!
- we cannot serve infinitesimals, only packets
- No packet discipline can be as fair as GPS
- while a packet is being served, we are unfair to others
- Degree of unfairness can be bounded
- Define work(I,a,b) = # bits transmitted for connection I in time [a,b]
- Absolute fairness bound for discipline S
- Max (work_GPS(I,a,b) - work_S(I,a,b))
- Relative fairness bound for discipline S
- Max (work_S(I,a,b) - work_S(J,a,b))
58What next?
- We can't implement GPS
- So, let's see how to emulate it
- We want to be as fair as possible
- But also have an efficient implementation
59Weighted round robin
- Serve a packet from each non-empty queue in turn
- Unfair if packets are of different length or weights are not equal
- Different weights, fixed packet size
- serve more than one packet per visit, after normalizing to obtain integer weights
- Different weights, variable size packets
- normalize weights by mean packet size
- e.g. weights 0.5, 0.75, 1.0, mean packet sizes 50, 500, 1500
- normalize weights: 0.5/50, 0.75/500, 1.0/1500 = 0.01, 0.0015, 0.000666; normalize again: 60, 9, 4
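The two normalization steps in the example can be reproduced with exact rational arithmetic. This is a sketch; the function names are illustrative assumptions, and only the weights and mean packet sizes come from the slide.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def wrr_packets_per_round(weights, mean_packet_sizes):
    """Integer number of packets to serve per visit for each connection."""
    # Step 1: normalize each weight by the connection's mean packet size.
    per_packet = [Fraction(str(w)) / Fraction(str(s))
                  for w, s in zip(weights, mean_packet_sizes)]
    # Step 2: scale to the smallest set of integers with the same ratios.
    scale = reduce(lcm, (f.denominator for f in per_packet))
    integers = [int(f * scale) for f in per_packet]
    common = reduce(gcd, integers)
    return [n // common for n in integers]

print(wrr_packets_per_round([0.5, 0.75, 1.0], [50, 500, 1500]))
```

For the slide's inputs this yields the same 60, 9, 4 packets per round.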
60Problems with Weighted Round Robin
- With variable size packets and different weights, need to know mean packet size in advance
- Can be unfair for long periods of time
- E.g.
- T3 trunk with 500 connections, each connection has mean packet length 500 bytes, 250 with weight 1, 250 with weight 10
- Each packet takes 500 * 8 / 45 Mbps = 88.8 microseconds
- Round time = 2750 * 88.8 microseconds = 244.2 ms
61Weighted Fair Queueing (WFQ)
- Deals better with variable size packets and weights
- GPS is fairest discipline
- Find the finish time of a packet, had we been doing GPS
- Then serve packets in order of their finish times
62WFQ: first cut
- Suppose, in each round, the server served one bit from each active connection
- Round number is the number of rounds already completed
- can be fractional
- If a packet of length p arrives to an empty queue when the round number is R, it will complete service when the round number is R + p => finish number is R + p
- independent of the number of other connections!
- If a packet arrives to a non-empty queue, and the previous packet has a finish number of f, then the packet's finish number is f + p
- Serve packets in order of finish numbers
63A catch
- A queue may need to be considered non-empty even if it has no packets in it
- e.g. packets of length 1 from connections A and B, on a link of speed 1 bit/sec
- at time 1, packet from A served, round number = 0.5
- A has no packets in its queue, yet should be considered non-empty, because a packet arriving to it at time 1 should have finish number 1 + p
- A connection is active if the last packet served from it, or in its queue, has a finish number greater than the current round number
64WFQ continued
- To sum up, assuming we know the current round number R
- Finish number of packet of length p
- if arriving to active connection = previous finish number + p
- if arriving to an inactive connection = R + p
- (How should we deal with weights?)
- To implement, we need to know two things
- is connection active?
- if not, what is the current round number?
- Answer to both questions depends on computing the
current round number (why?)
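The finish-number rule summarized on this slide can be sketched as below, taking the current round number as a given input. The function name is an assumption, and dividing the packet length by the connection's weight is the usual way weights are handled in WFQ; the slide leaves that question open, so treat it as an assumption here.

```python
def finish_number(round_number, last_finish, packet_length, weight=1.0):
    """Finish number of a packet arriving at one connection.

    last_finish is the finish number of the last packet served from, or
    queued at, this connection (use 0.0 if there has been none).
    """
    # A connection is active if its last finish number exceeds the round number.
    active = last_finish > round_number
    start = last_finish if active else round_number
    # Assumption: a connection with weight w is charged p / w per packet.
    return start + packet_length / weight

print(finish_number(round_number=5.0, last_finish=3.0, packet_length=10))
print(finish_number(round_number=5.0, last_finish=8.0, packet_length=10))
```

The first call models an inactive connection (R + p), the second an active one (previous finish number + p).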
65WFQ: computing the round number
- Naively: round number = number of rounds of service completed so far
- what if a server has not served all connections in a round?
- what if new conversations join in halfway through a round?
- Redefine round number as a real-valued variable that increases at a rate inversely proportional to the number of currently active connections
- this takes care of both problems (why?)
- With this change, WFQ emulates GPS instead of bit-by-bit RR
66Problem: iterated deletion
- A server recomputes round number on each packet arrival
- At any recomputation, the number of conversations can go up at most by one, but can go down to zero
- => overestimation
- Trick
- use previous count to compute round number
- if this makes some conversation inactive, recompute
- repeat until no conversations become inactive
[Figure: number of active conversations plotted against round number]
67WFQ implementation
- On packet arrival
- use source + destination address (or VCI) to classify it and look up finish number of last packet served (or waiting to be served)
- recompute round number
- compute finish number
- insert in priority queue sorted by finish numbers
- if no space, drop the packet with largest finish number
- On service completion
- select the packet with the lowest finish number
68Analysis
- Unweighted case
- if GPS has served x bits from connection A by time t
- WFQ would have served at least x - P bits, where P is the largest possible packet in the network
- WFQ could send more than GPS would => absolute fairness bound > P
- To reduce bound, choose smallest finish number only among packets that have started service in the corresponding GPS system (WF2Q)
- requires a regulator to determine eligible packets
69Evaluation
- Pros
- like GPS, it provides protection
- can obtain worst-case end-to-end delay bound
- gives users incentive to use intelligent flow control (and also provides rate information implicitly)
- Cons
- needs per-connection state
- iterated deletion is complicated
- requires a priority queue
70Outline
- What is scheduling
- Why we need it
- Requirements of a scheduling discipline
- Fundamental choices
- Scheduling best effort connections
- Scheduling guaranteed-service connections
- Packet drop strategies
71Scheduling guaranteed-service connections
- With best-effort connections, goal is fairness
- With guaranteed-service connections
- what performance guarantees are achievable?
- how easy is admission control?
- We now study some scheduling disciplines that
provide performance guarantees
72WFQ
- Turns out that WFQ also provides performance guarantees
- Bandwidth bound
- ratio of weights * link capacity
- e.g. connections with weights 1, 2, 7; link capacity 10
- connections get at least 1, 2, 7 units of b/w each
- End-to-end delay bound
- assumes that the connection doesn't send too much (otherwise its packets will be stuck in queues)
- more precisely, connection should be leaky-bucket regulated
- # bits sent in time [t1, t2] <= ρ (t2 - t1) + σ (ρ = long term rate, σ = burst limit)
73Parekh-Gallager theorem
- Let a connection be allocated weights at each WFQ scheduler along its path, so that the least bandwidth it is allocated is g
- Let it be leaky-bucket regulated such that # bits sent in time [t1, t2] <= ρ (t2 - t1) + σ
- Let the connection pass through K schedulers, where the kth scheduler has a rate r(k)
- Let the largest packet allowed in the network be P
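The bound itself did not survive on this slide. As the theorem is commonly stated (an assumption here, not recovered from the slide), the end-to-end delay is at most σ/g + (K - 1) P/g + the sum over the K schedulers of P/r(k). A minimal sketch that evaluates this expression, with a hypothetical function name and made-up parameter values:

```python
def parekh_gallager_bound(sigma, g, P, rates):
    """Worst-case end-to-end delay under the commonly stated form of the bound.

    sigma: leaky-bucket burst limit (bits)
    g:     least bandwidth allocated to the connection along its path (bits/s)
    P:     largest packet allowed in the network (bits)
    rates: service rate r(k) of each of the K schedulers on the path (bits/s)
    """
    K = len(rates)
    burst_term = sigma / g                          # time to drain a full burst at rate g
    packetization_term = (K - 1) * P / g            # per-hop packetization delay
    transmission_term = sum(P / r for r in rates)   # one max-size packet at each hop
    return burst_term + packetization_term + transmission_term

# Hypothetical numbers: 8000-bit burst, g = 1000 bits/s, 1000-bit packets, two hops.
print(parekh_gallager_bound(sigma=8000, g=1000, P=1000, rates=[2000, 4000]))
```

Note that the expression does not depend on the behavior of cross traffic, only on the connection's own parameters and the scheduler rates.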
74Significance
- Theorem shows that WFQ can provide end-to-end delay bounds
- So WFQ provides both fairness and performance guarantees
- Bound holds regardless of cross traffic behavior
- Can be generalized for networks where schedulers
are variants of WFQ, and the link service rate
changes over time
75Problems
- To get a delay bound, need to pick g
- the lower the delay bounds, the larger g needs to be
- large g => exclusion of more competitors from link
- g can be very large, in some cases 80 times the peak rate!
- Sources must be leaky-bucket regulated
- but choosing leaky-bucket parameters is problematic
- WFQ couples delay and bandwidth allocations
- low delay requires allocating more bandwidth
- wastes bandwidth for low-bandwidth low-delay sources
76Delay-Earliest Due Date
- Earliest-due-date: packet with earliest deadline selected
- Delay-EDD prescribes how to assign deadlines to packets
- A source is required to send slower than its peak rate
- Bandwidth at scheduler reserved at peak rate
- Deadline = expected arrival time + delay bound
- If a source sends faster than contract, delay bound will not apply
- Each packet gets a hard delay bound
- Delay bound is independent of bandwidth requirement
- but reservation is at a connection's peak rate
- Implementation requires per-connection state and a priority queue
77Rate-controlled scheduling
- A class of disciplines
- two components: regulator and scheduler
- incoming packets are placed in regulator where they wait to become eligible
- then they are put in the scheduler
- Regulator shapes the traffic, scheduler provides
performance guarantees
78Examples
- Recall
- rate-jitter regulator
- bounds maximum outgoing rate
- delay-jitter regulator
- compensates for variable delay at previous hop
- Rate-jitter regulator + FIFO
- similar to Delay-EDD (what is the difference?)
- Rate-jitter regulator + multi-priority FIFO
- gives both bandwidth and delay guarantees (RCSP)
- Delay-jitter regulator + EDD
- gives bandwidth, delay, and delay-jitter bounds (Jitter-EDD)
79Analysis
- First regulator on path monitors and regulates traffic => bandwidth bound
- End-to-end delay bound
- delay-jitter regulator
- reconstructs traffic => end-to-end delay is fixed (= worst-case delay at each hop)
- rate-jitter regulator
- partially reconstructs traffic
- can show that end-to-end delay bound is smaller than (sum of delay bound at each hop + delay at first hop)
80Decoupling
- Can give a low-bandwidth connection a low delay without overbooking
- E.g. consider connection A with rate 64 Kbps sent to a router with rate-jitter regulation and multipriority FCFS scheduling
- After sending a packet of length l, next packet is eligible at time (now + l/64 Kbps)
- If placed at highest-priority queue, all packets from A get low delay
- Can decouple delay and bandwidth bounds, unlike WFQ
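The eligibility rule in this example (next packet eligible at now + l / rate) can be sketched as a small rate-jitter regulator. The class and method names are illustrative assumptions; the 64 Kbps rate is from the slide and the packet sizes are made up.

```python
class RateJitterRegulator:
    """Holds packets until they are eligible, bounding the outgoing rate."""

    def __init__(self, rate_bps):
        self.rate_bps = rate_bps
        self.next_eligible = 0.0  # earliest time the next packet may be released

    def eligibility_time(self, arrival_time, length_bits):
        # A packet is eligible when it has arrived and the previous packet
        # has been "paid for" at the connection's rate.
        eligible = max(arrival_time, self.next_eligible)
        self.next_eligible = eligible + length_bits / self.rate_bps
        return eligible

regulator = RateJitterRegulator(rate_bps=64_000)
for arrival in (0.0, 0.01, 1.0):  # hypothetical arrival times in seconds
    print(regulator.eligibility_time(arrival, length_bits=6_400))
```

Once a packet is eligible it would be handed to the scheduler, for instance the highest-priority FCFS queue, which is what determines its delay.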
81Evaluation
- Pros
- flexibility: ability to emulate other disciplines
- can decouple bandwidth and delay assignments
- end-to-end delay bounds are easily computed
- do not require complicated schedulers to guarantee protection
- can provide delay-jitter bounds
- Cons
- require an additional regulator at each output port
- delay-jitter bounds at the expense of increasing mean delay
- delay-jitter regulation is expensive (clock synch, timestamps)
82Summary
- Two sorts of applications: best effort and guaranteed service
- Best effort connections require fair service
- provided by GPS, which is unimplementable
- emulated by WFQ and its variants
- Guaranteed service connections require performance guarantees
- provided by WFQ, but this is expensive
- may be better to use rate-controlled schedulers
83Outline
- What is scheduling
- Why we need it
- Requirements of a scheduling discipline
- Fundamental choices
- Scheduling best effort connections
- Scheduling guaranteed-service connections
- Packet drop strategies
84Packet dropping
- Packets that cannot be served immediately are buffered
- Full buffers => packet drop strategy
- Packet losses happen almost always from best-effort connections (why?)
- Shouldn't drop packets unless imperative
- packet drop wastes resources (why?)
85Classification of drop strategies
- 1. Degree of aggregation
- 2. Drop priorities
- 3. Early or late
- 4. Drop position
86 1. Degree of aggregation
- Degree of discrimination in selecting a packet to drop
- E.g. in vanilla FIFO, all packets are in the same class
- Instead, can classify packets and drop packets selectively
- The finer the classification the better the protection
- Max-min fair allocation of buffers to classes
- drop packet from class with the longest queue (why?)
87 2. Drop priorities
- Drop lower-priority packets first
- How to choose?
- endpoint marks packets
- regulator marks packets
- cell loss priority (CLP) bit in packet header
88CLP bit: pros and cons
- Pros
- if network has spare capacity, all traffic is carried
- during congestion, load is automatically shed
- Cons
- separating priorities within a single connection is hard
- what prevents all packets being marked as high priority?
89 2. Drop priority (cont.)
- Special case of AAL5
- want to drop an entire frame, not individual cells
- cells belonging to the selected frame are preferentially dropped
- Drop packets from nearby hosts first
- because they have used the least network resources
- can't do it on Internet because hop count (TTL) decreases
90 3. Early vs. late drop
- Early drop => drop even if space is available
- signals endpoints to reduce rate
- cooperative sources get lower overall delays, uncooperative sources get severe packet loss
- Early random drop
- drop arriving packet with fixed drop probability if queue length exceeds threshold
- intuition: misbehaving sources more likely to send packets and see packet losses
- doesn't work!
91 3. Early vs. late drop: RED
- Random early detection (RED) makes three improvements
- Metric is moving average of queue lengths
- small bursts pass through unharmed
- only affects sustained overloads
- Packet drop probability is a function of mean queue length
- prevents severe reaction to mild overload
- Can mark packets instead of dropping them
- allows sources to detect network state without losses
- RED improves performance of a network of cooperating TCP sources
- No bias against bursty sources
- Controls queue length regardless of endpoint cooperation
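The first two improvements (a moving average as the metric, and a drop probability that depends on the mean queue length) can be sketched as below. This is a simplified illustration: the two thresholds, the linear ramp between them, and all parameter names follow the commonly described RED formulation and are assumptions, not details from the slide.

```python
import random

class RedQueue:
    """Simplified RED decision logic (no packet storage, no count-based spacing)."""

    def __init__(self, min_th, max_th, max_p, weight, rng=None):
        self.min_th = min_th    # below this average, never drop
        self.max_th = max_th    # at or above this average, always drop
        self.max_p = max_p      # drop probability just below max_th
        self.weight = weight    # gain of the exponential moving average
        self.avg = 0.0
        self.rng = rng or random.Random(0)

    def on_arrival(self, queue_length):
        """Return True if the arriving packet should be dropped (or marked)."""
        # Moving average: short bursts barely move it, sustained load does.
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_length
        if self.avg < self.min_th:
            return False
        if self.avg >= self.max_th:
            return True
        # Mild overload gives a small probability, heavier overload a larger one.
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return self.rng.random() < p

red = RedQueue(min_th=5, max_th=15, max_p=0.1, weight=0.002)
print(red.on_arrival(queue_length=100))  # a single burst barely moves the average
```

Returning a mark-or-drop decision rather than dropping directly leaves room for the third improvement, marking packets instead of dropping them.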
92 4. Drop position
- Can drop a packet from head, tail, or random position in the queue
- Tail
- easy
- default approach
- Head
- harder
- lets source detect loss earlier
93 4. Drop position (cont.)
- Random
- hardest
- if no aggregation, hurts hogs most
- unlikely to make it to real routers
- Drop entire longest queue
- easy
- almost as effective as drop tail from longest
queue
94Outline
- Economic principles
- Traffic classes
- Mechanisms at each time scale
- Faster than one RTT
- scheduling
- regulation and policing
- policy routing
- One RTT
- Session
- Day
- Weeks to months
- Some open problems
95Regulation and policing
96Open loop flow control
- Two phases to flow
- Call setup
- Data transmission
- Call setup
- Network prescribes parameters
- User chooses parameter values
- Network admits or denies call
- Data transmission
- User sends within parameter range
- Network polices users
- Scheduling policies give user QoS
97Hard problems
- Choosing a descriptor at a source
- Choosing a scheduling discipline at intermediate network elements
- Admitting calls so that their performance objectives are met (call admission control)
98Traffic descriptors
- Usually an envelope
- Constrains worst case behavior
- Three uses
- Basis for traffic contract
- Input to regulator
- Input to policer
99Descriptor requirements
- Representativity
- adequately describes flow, so that network does not reserve too little or too much resource
- Verifiability
- verify that descriptor holds
- Preservability
- Doesn't change inside the network
- Usability
- Easy to describe and use for admission control
100Examples
- Representative, verifiable, but not usable
- Time series of interarrival times
- Verifiable, preservable, and usable, but not representative
- peak rate
101Some common descriptors
- Peak rate
- Average rate
- Linear bounded arrival process
102Peak rate
- Highest rate at which a source can send data
- Two ways to compute it
- For networks with fixed-size packets
- min inter-packet spacing
- For networks with variable-size packets
- highest rate over all intervals of a particular duration
- Regulator for fixed-size packets
- timer set on packet transmission
- if timer expires, send packet, if any
- Problem
- sensitive to extremes
103Average rate
- Rate over some time period (window)
- Less susceptible to outliers
- Parameters: t and a
- Two types: jumping window and moving window
- Jumping window
- over consecutive intervals of length t, only a bits sent
- regulator reinitializes every interval
- Moving window
- over all intervals of length t, only a bits sent
- regulator forgets packets sent more than t seconds ago
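The difference between the two window types shows up when traffic straddles an interval boundary. The sketch below is one possible reading of the two regulators; the class names, the `allow` interface and the example numbers are assumptions.

```python
from collections import deque

class JumpingWindow:
    """At most `a` bits in each consecutive interval of length `t`."""

    def __init__(self, t, a):
        self.t, self.a = t, a
        self.window_start = 0.0
        self.sent = 0

    def allow(self, now, bits):
        if now - self.window_start >= self.t:
            # Reinitialize at the start of the interval that contains `now`.
            self.window_start += self.t * ((now - self.window_start) // self.t)
            self.sent = 0
        if self.sent + bits > self.a:
            return False
        self.sent += bits
        return True

class MovingWindow:
    """At most `a` bits in every interval of length `t`."""

    def __init__(self, t, a):
        self.t, self.a = t, a
        self.history = deque()  # (send time, bits) for recent packets
        self.sent = 0

    def allow(self, now, bits):
        # Forget packets sent t or more seconds ago.
        while self.history and now - self.history[0][0] >= self.t:
            self.sent -= self.history.popleft()[1]
        if self.sent + bits > self.a:
            return False
        self.history.append((now, bits))
        self.sent += bits
        return True

jumping, moving = JumpingWindow(t=10, a=100), MovingWindow(t=10, a=100)
for now in (9, 11, 20):  # hypothetical send attempts of 100 bits each
    print(now, jumping.allow(now, 100), moving.allow(now, 100))
```

The jumping window admits 200 bits within two seconds around the boundary at time 10, while the moving window holds the second packet back.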
104Linear Bounded Arrival Process
- Source bounds # bits sent in any time interval by a linear function of time
- the number of bits transmitted in any active interval of length t is less than rt + s
- r is the long term rate
- s is the burst limit
- insensitive to outliers
105Leaky bucket
- A regulator for an LBAP
- Token bucket fills up at rate r
- Largest number of tokens < s
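A token-bucket regulator of this kind might look like the sketch below. The class and method names are hypothetical, the bucket is assumed to start full, and a nonconforming packet is simply refused rather than queued in a data bucket.

```python
class TokenBucket:
    """Regulator for an LBAP with long-term rate r and burst limit s."""

    def __init__(self, r, s):
        self.r, self.s = r, s
        self.tokens = s   # assumption: the bucket starts full
        self.last = 0.0   # time of the previous update

    def allow(self, now, bits):
        # Tokens accumulate at rate r, but never beyond the bucket depth s.
        self.tokens = min(self.s, self.tokens + self.r * (now - self.last))
        self.last = now
        if bits > self.tokens:
            return False  # nonconforming: caller may drop, mark, or delay
        self.tokens -= bits
        return True
```

Because the bucket never holds more than s tokens and refills at rate r, the bits admitted over any interval of length t cannot exceed rt + s, which is the LBAP bound.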
106Variants
- Token and data buckets
- Sum is what matters
- Peak rate regulator
107Choosing LBAP parameters
- Tradeoff between r and s
- Minimal descriptor
- no other valid descriptor has both a smaller r and a smaller s
- presumably costs less
- How to choose minimal descriptor?
- Three way tradeoff
- choice of s (data bucket size)
- loss rate
- choice of r
108Choosing minimal parameters
- Keeping loss rate the same
- if s is more, r is less (smoothing)
- for each r we have least s
- Choose knee of curve
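The "least s for each r" curve can be computed from a traffic trace. The sketch below assumes the zero-loss case (the slides only say the loss rate is held fixed) and a trace given as (time, bits) pairs sorted by time; the function name is illustrative. It uses the fact that the smallest conforming burst limit equals the peak backlog of a virtual queue drained at rate r.

```python
def least_burst(trace, r):
    """Smallest s for which the trace conforms to an LBAP (r, s) with no loss."""
    backlog = 0.0
    last = trace[0][0]
    s = 0.0
    for time, bits in trace:
        # Drain the virtual queue at rate r since the last arrival, then add the packet.
        backlog = max(0.0, backlog - r * (time - last)) + bits
        last = time
        s = max(s, backlog)
    return s


trace = [(0, 100), (1, 100), (2, 100), (10, 100)]
curve = [(r, least_burst(trace, r)) for r in (0, 50, 100)]
```

For this trace the curve is (0, 400), (50, 200), (100, 100): a larger r buys a smaller s, and the knee is where further increases in r stop reducing s appreciably.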
109LBAP
- Popular in practice and in academia
- sort of representative
- verifiable
- sort of preservable
- sort of usable
- Problems with multiple time scale traffic
- large burst messes up things
110Outline
- Economic principles
- Traffic classes
- Mechanisms at each time scale
- Faster than one RTT
- scheduling
- regulation and policing
- policy routing
- One RTT
- Session
- Day
- Weeks to months
- Some open problems
111Policy routing
112Routing vs. policy routing
- In standard routing, a packet is forwarded on the best path to destination
- choice depends on load and link status
- With policy routing, routes are chosen depending on policy directives regarding things like
- source and destination address
- transit domains
- quality of service
- time of day
- charging and accounting
- The general problem is still open
- fine balance between correctness and information
hiding
113Multiple metrics
- Simplest approach to policy routing
- Advertise multiple costs per link
- Routers construct multiple shortest path trees
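As an illustration of the multiple-metrics approach, the sketch below runs Dijkstra's algorithm once per advertised metric. The topology, the metric names ("delay", "cost"), and the assumption of bidirectional links with non-negative costs are made up for the example.

```python
import heapq


def shortest_path_tree(links, source, metric):
    """Dijkstra over one advertised metric; returns {node: (cost, parent)}."""
    graph = {}
    for (u, v), costs in links.items():
        graph.setdefault(u, []).append((v, costs[metric]))
        graph.setdefault(v, []).append((u, costs[metric]))
    tree = {}
    done = set()
    heap = [(0, source, None)]
    while heap:
        cost, node, parent = heapq.heappop(heap)
        if node in done:
            continue  # already settled via a cheaper path
        done.add(node)
        tree[node] = (cost, parent)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in done:
                heapq.heappush(heap, (cost + weight, neighbor, node))
    return tree


# Hypothetical three-node network with two costs advertised per link.
links = {
    ("A", "B"): {"delay": 1, "cost": 10},
    ("B", "C"): {"delay": 1, "cost": 10},
    ("A", "C"): {"delay": 5, "cost": 1},
}
trees = {m: shortest_path_tree(links, "A", m) for m in ("delay", "cost")}
```

Here the delay tree reaches C through B, while the cost tree uses the direct A-C link, so the same destination gets a different route depending on which metric the policy selects.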
114Problems with multiple metrics
- All routers must use the same rule in computing paths
- Remote routers may misinterpret policy
- source routing may solve this
- but introduces other problems (what?)
115Provider selection
- Another simple approach
- Assume that a single service provider provides almost all the path from source to destination
- e.g. AT&T or MCI
- Then, choose policy simply by choosing provider
- this could be dynamic (agents!)
- In Internet, can use a loose source route through service provider's access point
- Or, multiple addresses/names per host
116Crankback
- Consider computing routes with QoS guarantees
- Router returns packet if no next hop with sufficient QoS can be found
- In ATM networks (PNNI) used for the call-setup packet
- In Internet, may need to be done for _every_ packet!
- Will it work?
117Outline
- Economic principles
- Traffic classes
- Mechanisms at each time scale
- Faster than one RTT
- One RTT
- Feedback flow control
- Retransmission
- Renegotiation
- Session
- Day
- Weeks to months
- Some open problems
118Feedback flow control
119Open loop vs. closed loop
- Open loop
- describe traffic
- network admits/reserves resources
- regulation/policing
- Closed loop
- can't describe traffic or network doesn't support reservation
- monitor available bandwidth
- perhaps allocated using GPS-emulation
- adapt to it
- if not done properly, either
- too much loss
- unnecessary delay
120Taxonomy
- First generation
- ignores network state
- only match receiver
- Second generation
- responsive to state
- three choices
- State measurement
- explicit or implicit
- Control
- flow control window size or rate
- Point of control
- endpoint or within network
121Explicit vs. Implicit
- Explicit
- Network tells source its current rate
- Better control
- More overhead
- Implicit
- Endpoint figures out rate by looking at network
- Less overhead
- Ideally, want overhead of implicit with
effectiveness of explicit
122Flow control window
- Recall error control window
- Largest number of packets outstanding (sent but not acked)
- If endpoint has sent all packets in window, it must wait => slows down its rate
- Thus, window provides both error control and flow control
- This is called transmission window
- Coupling can be a problem
- Few buffers at receiver => slow rate!
123Window vs. rate
- In adaptive rate, we directly control rate
- Needs a timer per connection
- Plusses for window
- no need for fine-grained timer
- self-limiting
- Plusses for rate
- better control (finer grain)
- no coupling of flow control and error control
- Rate control must be careful to avoid overhead
and sending too much
124Hop-by-hop vs. end-to-end
- Hop-by-hop
- first generation flow control at each link
- next server = sink
- easy to implement
- End-to-end
- sender matches all the servers on its path
- Plusses for hop-by-hop
- simpler
- distributes overflow
- better control
- Plusses for end-to-end
- cheaper
125 1. On-off
- Receiver gives ON and OFF signals
- If ON, send at full speed
- If OFF, stop
- OK when RTT is small
- What if OFF is lost?
- Bursty
- Used in serial lines or LANs
126 2. Stop and Wait
- Send a packet
- Wait for ack before sending next packet
127 3. Static window
- Stop and wait can send at most one pkt per RTT
- Here, we allow multiple packets per RTT (= transmission window)
128What should window size be?
- Let bottleneck service rate along path = b pkts/sec
- Let round trip time = R sec
- Let flow control window = w packets
- Sending rate is w packets in R seconds = w/R
- To use bottleneck, w/R >= b => w >= bR
- This is the bandwidth-delay product or optimal window size
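A small worked example of the window calculation, using made-up numbers (b = 1000 pkts/sec, R = 0.25 sec); the function names are illustrative.

```python
import math


def optimal_window(b, rtt):
    """Smallest whole-packet window w with w/R >= b (bandwidth-delay product)."""
    return math.ceil(b * rtt)


def utilization(w, b, rtt):
    """Fraction of the bottleneck rate that a window of w packets can use."""
    return min(1.0, (w / rtt) / b)


w_opt = optimal_window(1000, 0.25)       # 250 packets
stop_and_wait = utilization(1, 1000, 0.25)
```

With these numbers the optimal window is 250 packets, whereas stop-and-wait (w = 1) uses only 1/250 of the bottleneck, which is why a larger window is needed on long or fast paths.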
129Static window
- Works well if b and R are fixed
- But, bottleneck rate changes with time!
- Static choice of w can lead to problems
- too small
- too large
- So, need to adapt window
- Always try to get to the current optimal value
130 4. DECbit flow control
- Intuition
- every packet has a bit in header
- intermediate routers set bit if queue has built up => source window is too large
- sink copies bit to ack
- if bits set, source reduces window size
- in steady state, oscillate around optimal size
131DECbit
- When do bits get set?
- How does a source interpret them?
132DECbit details: router actions
- Measure demand and mean queue length of each source
- Computed over queue regeneration cycles
- Balance between sensitivity and stability
133Router actions
- If mean queue length > 1.0
- set bits on sources whose demand exceeds fair share
- If it exceeds 2.0
- set bits on everyone
- panic!
134Source actions
- Keep track of bits
- Can't take control actions too fast!
- Wait for past change to take effect
- Measure bits over past + present window size
- If more than 50% set, then decrease window, else increase
- Additive increase, multiplicative decrease
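The source's decision rule can be sketched as a single update function. The increase step of 1 packet and the decrease factor of 0.875 are the values usually quoted for DECbit and are not stated on the slide, so treat them as assumptions; the function name is illustrative.

```python
def decbit_update(window, bits, increase=1.0, decrease=0.875):
    """New window given the congestion bits (0/1) seen on recent acks.

    `bits` covers acks for the past and present windows.
    `increase` and `decrease` are assumed constants, not taken from the slides.
    """
    if not bits:
        return window  # nothing measured yet, so take no action
    if sum(bits) / len(bits) > 0.5:
        return max(1.0, window * decrease)  # multiplicative decrease
    return window + increase                # additive increase
```

For example, a window of 8 with three of four bits set shrinks to 7, while one of four bits set grows it to 9. The update would be applied only after the previous change has had time to take effect.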
135Evaluation
- Works with FIFO
- but requires per-connection state (demand)
- Software
- But
- assumes cooperation!
- conservative window increase policy
136Sample trace
137 5. TCP Flow Control
- Implicit
- Dynamic window
- End-to-end
- Very similar to DECbit, but
- no support from routers
- increase if no loss (usually detected using timeout)
- window decrease on a timeout
- additive increase multiplicative decrease
138TCP details
- Window starts at 1
- Increases exponentially for a while, then linearly
- Exponentially => doubles every RTT
- Linearly => increases by 1 every RTT
- During exponential phase, every ack results in window increase by 1
- During linear phase, window increases by 1 when # acks = window size
- Exponential phase is called slow start
- Linear phase is called congestion avoidance
139More TCP details
- On a loss, half the current window size is stored in a variable called slow start threshold or ssthresh
- Switch from exponential to linear (slow start to congestion avoidance) when window size reaches threshold
- Loss detected either with timeout or fast retransmit (duplicate cumulative acks)
- Two versions of TCP
- Tahoe: in both cases, drop window to 1
- Reno: on timeout, drop window to 1, and on fast retransmit drop window to half previous size (also, increase window on subsequent acks)
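The window dynamics of Tahoe and Reno can be sketched at a granularity of one RTT. This is a simplification under stated assumptions: windows are counted in packets, ssthresh is set to half the window at the time of loss (as in standard TCP implementations), doubling is capped at ssthresh, and Reno's temporary window inflation on subsequent duplicate acks is ignored. The function and event names are hypothetical.

```python
def next_window(cwnd, ssthresh, event, variant="reno"):
    """Return (cwnd, ssthresh) after one RTT.

    event is 'ack' (a full window was acked), 'timeout', or 'fast_retransmit'.
    """
    if event == "ack":
        if cwnd < ssthresh:
            return min(cwnd * 2, ssthresh), ssthresh  # slow start: double per RTT
        return cwnd + 1, ssthresh                     # congestion avoidance: +1 per RTT
    ssthresh = max(cwnd // 2, 2)                      # remember half the window at loss
    if event == "fast_retransmit" and variant == "reno":
        return ssthresh, ssthresh                     # Reno: halve the window
    return 1, ssthresh                                # Tahoe, or any timeout: restart at 1


cwnd, ssthresh = 1, 16
history = []
for _ in range(5):
    cwnd, ssthresh = next_window(cwnd, ssthresh, "ack")
    history.append(cwnd)
```

Starting from a window of 1 with ssthresh 16, five loss-free RTTs give windows 2, 4, 8, 16, 17: four RTTs of slow start followed by the first step of congestion avoidance.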
140TCP vs. DECbit
- Both use dynamic window flow control and additive-increase multiplicative decrease
- TCP uses implicit measurement of congestion
- probe a black box
- Operates at the cliff
- Source does not filter information
141Evaluation
- Effective over a wide range of bandwidths
- A lot of operational experience
- Weaknesses
- loss => overload? (wireless)
- overload => self-blame, problem with FCFS
- overload detected only on a loss
- in steady state, source induces loss
- needs at least bR/3 buffers per connection
142Sample trace
143 6. TCP Vegas
- Expected throughput = transmission_window_size/propagation_delay
- Numerator: known
- Denominator: measure smallest RTT
- Also know actual throughput
- Difference = how much to reduce/increase rate
- Algorithm
- send a special packet
- on ack, compute expected and actual throughput
- (expected - actual) * RTT = packets in bottleneck buffer
- adjust sending rate if this is too large
- Works better than TCP Reno
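The Vegas estimate can be written out as follows. This sketch assumes the RTT in the buffer estimate is the smallest measured RTT (the propagation-delay estimate), and the thresholds `low` and `high` and the one-packet adjustment step are illustrative choices rather than values from the slide.

```python
def vegas_backlog(window, base_rtt, rtt):
    """Estimated number of this source's packets queued at the bottleneck."""
    expected = window / base_rtt   # throughput if nothing were queued
    actual = window / rtt          # throughput actually observed
    return (expected - actual) * base_rtt


def vegas_adjust(window, base_rtt, rtt, low=1.0, high=3.0):
    """Nudge the window so the estimated backlog stays between low and high."""
    backlog = vegas_backlog(window, base_rtt, rtt)
    if backlog > high:
        return window - 1
    if backlog < low:
        return window + 1
    return window
```

For a window of 20 packets, a smallest RTT of 0.1 s, and a current RTT of 0.125 s, expected throughput is 200 pkts/s, actual is 160 pkts/s, and the estimated backlog is about 4 packets, so the window would be reduced. Unlike Reno, the adjustment happens before any loss occurs.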
144 7. NETBLT
- First rate-based flow control scheme
- Separates error control (window) and flow control (no coupling)
- So, losses and retransmissions do not affect the flow rate
- Application data sent as a series of buffers, each at a particular rate
- Rate = (burst size + burst rate) so granularity of control = burst
- Initially, no adjustment of rates
- Later, if received rate < sending rate, multiplicatively decrease rate
- Change rate only once per buffer => slow
145 8. Packet pair
- Improves basic ideas in NETBLT
- better measurement of bottleneck
- control based on prediction
- finer granularity
- Assume all bottlenecks serve packets in round robin order
- Then, spacing between packets at receiver (= ack spacing) = 1/(rate of slowest server)
- If all data sent as paired packets, no distinction between data and probes
- Implicitly determine service rates if servers are round-robin-like
146Packet pair
147Packet-pair details
- Acks give time series of service rates in the past
- We can use this to predict the next rate
- Exponential averager, with fuzzy rules to change the averaging factor
- Predicted rate feeds into flow control equation
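The measurement and prediction steps might be sketched as below. The fuzzy rules that adapt the averaging factor are not described on the slide, so this example substitutes a fixed factor `alpha`; that substitution, the function names, and the input format (arrival times of the two acks of each pair) are all assumptions.

```python
def service_rates(ack_pairs):
    """Bottleneck rate samples from ack spacing: rate = 1 / (spacing of a pair)."""
    return [1.0 / (second - first) for first, second in ack_pairs if second > first]


def predict_rate(rates, alpha=0.5):
    """Exponential average of past rate samples (fixed alpha, no fuzzy rules)."""
    estimate = rates[0]
    for rate in rates[1:]:
        estimate = alpha * estimate + (1 - alpha) * rate
    return estimate


# Acks for two packet pairs, spaced 0.25 s and 0.5 s apart.
samples = service_rates([(0.0, 0.25), (1.0, 1.5)])
predicted = predict_rate(samples)
```

Here the two samples are 4 and 2 pkts/sec and the prediction is 3 pkts/sec. A larger alpha trusts history more and smooths out noise; a smaller alpha tracks changes in the bottleneck rate faster.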
148Packet-pair flow control
- Let X = # packets in bottleneck buffer
- S = # outstanding packets
- R = RTT
- b = bottleneck rate
- Then, X = S - Rb (assuming no losses)
- Let l = source rate
- l(k+1) = b(k+1) + (setpoint - X)/R
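The control law can be evaluated numerically as in the sketch below, where the source rate (written l on the slide) is the predicted bottleneck rate plus a correction that steers the buffer occupancy X toward the setpoint over one RTT. Clamping X and the resulting rate at zero is an added safeguard, and the function name and sample numbers are illustrative.

```python
def packet_pair_rate(b_next, b_now, outstanding, rtt, setpoint):
    """Next sending rate: l(k+1) = b(k+1) + (setpoint - X) / R, with X = S - R*b."""
    x = max(0.0, outstanding - rtt * b_now)      # packets estimated to be in the buffer
    return max(0.0, b_next + (setpoint - x) / rtt)


rate = packet_pair_rate(b_next=100, b_now=100, outstanding=30, rtt=0.25, setpoint=10)
```

With b = 100 pkts/sec, R = 0.25 s, and S = 30 outstanding packets, X = 30 - 25 = 5. A setpoint of 10 then gives a rate of 100 + 5/0.25 = 120 pkts/sec, slightly above the bottleneck rate so that the buffer fills toward the setpoint.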
149Sample trace
150 9. ATM Forum EERC
- Similar to DECbit, but send a whole cell's worth of info instead of one bit
- Sources periodically send a Resource Management (RM) cell with a rate request
- typically once every 32 cells
- Each server fills in RM cell with current share, if less
- Source sends at this rate
151ATM Forum EERC details
- Source sends Explicit Rate (ER) in RM cell
- Switches compute source share in an unspecified manner (allows competition)
- Current rate = allowed cell rate = ACR
- If ER > ACR then ACR = ACR + RIF * PCR else ACR = ER
- If switch does not change ER, then use DECbit idea
- If CI bit set, ACR = ACR * (1 - RDF)
- If ER < ACR, ACR = ER
- Allows interoperability of a sort
- If idle 500 ms, reset rate to Initial cell rate
- If no RM cells return for a while, ACR = ACR * (1 - RDF)
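One way to read the source rules is as a single update applied when an RM cell returns. The ordering of the checks and the cap at the peak cell rate are interpretations made for this sketch, not something the slide specifies; the idle and missing-RM-cell timeouts are left out, and the function name and numbers are illustrative.

```python
def update_acr(acr, er, ci, rif, rdf, pcr):
    """New allowed cell rate (ACR) when an RM cell returns.

    er: explicit rate carried back in the RM cell
    ci: congestion indication bit (DECbit-style feedback)
    rif, rdf: rate increase / decrease factors; pcr: peak cell rate
    """
    if ci:
        acr = acr * (1 - rdf)      # multiplicative decrease on congestion
    elif er > acr:
        acr = acr + rif * pcr      # additive increase toward the explicit rate
    # Never exceed the explicit rate or (by assumption) the peak cell rate.
    return min(acr, er, pcr)
```

For example, with PCR = 1600 and RIF = RDF = 1/16, a source at ACR = 100 that receives ER = 200 moves to 200, a source at ACR = 100 that receives ER = 50 drops to 50, and a source at ACR = 160 with the CI bit set drops to 150.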
152Comparison with DECbit
- Sources know exact rate
- Non-zero Initial cell-rate => conservative increase can be avoided
- Interoperation between ER/CI switches
153Problems
- RM cells in data path a mess
- Updating sending rate based on RM cell can be hard
- Interoperability comes at the cost of reduced efficiency (as bad as DECbit)
- Computing ER is hard
154Comparison among closed-loop schemes
- On-off, stop-and-wait, static window, DECbit, TCP, NETBLT, Packet-pair, ATM Forum EERC
- Which is best? No simple answer
- Some rules of thumb
- flow control easier with RR scheduling
- otherwise, assume cooperation, or police rates
- explicit schemes are more robust
- hop-by-hop schemes are more responsive, but more complex
- try to separate error control and flow control
- rate based schemes are inherently unstable unless well-engineered
155Outline
- Economic principles
- Traffic classes
- Mechanisms at each time scale
- Faster than one RTT
- One RTT
- Feedback flow control
- Retransmission
- Renegotiation
- Session
- Day
- Weeks to months
- Some open problems
156Retransmission
157Retransmission and traffic management
- Loss detection time introduces pauses in traffic
- annoying to users
- can cause loss of soft state
- Retransmission strategy decides how many packets are retransmitted, and when
- if uncontrolled, can lead to congestive losses and congestion collapse
- Good loss detection and retransmission strategies are needed for providing good service to reliable, best-effort traffic (e.g. TCP)
158Loss detection
- At receiver, from a gap in sequence space
- send a nack to the sender
- At sender, by looking at cumulative acks, and timing out if no ack for a while
- need to choose timeout interval
159Nacks
- Sounds good, but does not work well
- extra load during loss, even though in reverse direction
- If nack is lost, receiver must retransmit it
- moves timeout problem to receiver
- So we need timeouts anyway
160Timeouts
- Set timer on sending a packet
- If timer goes off, and no ack, resend
- How to choose timeout value?
- Intuition is that we expect a reply in about one
round trip time (RTT)
161Timeout schemes
- Static scheme
- know RTT a priori
- timer set to this value
- works w