Title: EE384x: Packet Switch Architectures
1. EE384x Packet Switch Architectures
- Handout 2: Queues and arrival processes,
- Output queued switches, and
- Output link scheduling.
Nick McKeown, Professor of Electrical Engineering
and Computer Science, Stanford University
nickm_at_stanford.edu, http://www.stanford.edu/~nickm
2. Outline
- Output queued switches
- Terminology: queues and arrival processes
- Output link scheduling
3. Generic Router Architecture
[Figure: N ports, each with a packet queue held in buffer memory; the shared datapath runs at N times the line rate.]
4. Simple model of output queued switch
[Figure: four links, each at rate R, with a separate output queue per link.]
5. Characteristics of an output queued (OQ) switch
- Arriving packets are immediately written into the output queue, without intermediate buffering.
- The flow of packets to one output does not affect the flow to another output.
- An OQ switch is work conserving: an output line is always busy when there is a packet in the switch destined for it.
- OQ switches have the highest throughput and lowest average delay.
- We will also see that the rate of individual flows, and the delay of packets, can be controlled.
6. The shared memory switch
[Figure: a single, physical memory device shared by links 1..N; each link has an ingress and an egress, each at rate R.]
7. Characteristics of a shared memory switch
8. Memory bandwidth
- Basic OQ switch:
- Consider an OQ switch with N different physical memories, and all links operating at rate R bits/s.
- In the worst case, packets may arrive continuously from all inputs, destined to just one output.
- Maximum memory bandwidth requirement for each memory is (N+1)R bits/s (N simultaneous writes plus one read).
- Shared memory switch:
- Maximum memory bandwidth requirement for the memory is 2NR bits/s (N writes plus N reads).
9. How fast can we make a centralized shared memory switch?
- 5ns SRAM shared memory, accessed by all N ports over a 200-byte-wide bus.
- 5ns per memory operation.
- Two memory operations per packet (one write, one read).
- Therefore, up to 200 bytes every 10ns, i.e. 160Gb/s.
- In practice, closer to 80Gb/s.
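The throughput arithmetic above can be checked with a short sketch; the function name shared_mem_throughput is ours, not from the slides.

```python
def shared_mem_throughput(bus_bytes, access_ns, ops_per_packet=2):
    """Peak throughput of a centralized shared memory: one bus-wide
    transfer per memory operation, and two operations (one write,
    one read) per packet. Returns Gb/s, since bits/ns = Gb/s."""
    bits_per_transfer = bus_bytes * 8
    ns_per_packet = access_ns * ops_per_packet
    return bits_per_transfer / ns_per_packet

# 200-byte bus, 5ns SRAM: 1600 bits every 10ns
print(shared_mem_throughput(200, 5))  # → 160.0
```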
10. Outline
- Output queued switches
- Terminology: queues and arrival processes
- Output link scheduling
11. Queue Terminology
[Figure: arrivals A(t) at rate λ enter a queue with occupancy Q(t), served by server S at rate μ, producing departures D(t).]
- Arrival process, A(t):
- In continuous time, usually the cumulative number of arrivals in [0,t].
- In discrete time, usually an indicator function as to whether or not an arrival occurred at time t = nT.
- λ is the arrival rate: the expected number of arriving packets (or bits) per second.
- Queue occupancy, Q(t):
- Number of packets (or bits) in the queue at time t.
- Service discipline, S:
- Indicates the sequence of departures, e.g. FIFO/FCFS, LIFO, ...
- Service distribution:
- Indicates the time taken to process each packet, e.g. deterministic or exponentially distributed service time.
- μ is the service rate: the expected number of served packets (or bits) per second.
- Departure process, D(t):
- In continuous time, usually the cumulative number of departures in [0,t].
- In discrete time, usually an indicator function as to whether or not a departure occurred at time t = nT.
12. More terminology
- Customer: queueing theory usually refers to queued entities as customers. In class, customers will usually be packets or bits.
- Work: each customer is assumed to bring some work, which affects its service time. For example, packets may have different lengths, and their service time might be a function of their length.
- Waiting time: time that a customer waits in the queue before beginning service.
- Delay: time from when a customer arrives until it has departed.
13. Arrival Processes
- Examples of deterministic arrival processes:
- E.g. one arrival every second, or a burst of 4 packets every other second.
- A deterministic sequence may be designed to be adversarial, to expose some weakness of the system.
- Examples of random arrival processes:
- (Discrete time) Bernoulli i.i.d. arrival process:
- Let A(t) = 1 if an arrival occurs at time t, where t = nT, n = 0, 1, ...
- A(t) = 1 w.p. p, and 0 w.p. 1-p.
- A series of independent coin tosses with a p-coin.
- (Continuous time) Poisson arrival process:
- Exponentially distributed interarrival times.
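The Bernoulli i.i.d. process above is easy to simulate; a minimal sketch (the helper name bernoulli_arrivals is ours), whose empirical rate should be close to p:

```python
import random

def bernoulli_arrivals(p, n_slots, seed=1):
    """Simulate a Bernoulli i.i.d. arrival process: in each time slot an
    arrival occurs with probability p, independently of all other slots."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n_slots)]

a = bernoulli_arrivals(0.3, 100_000)
print(sum(a) / len(a))  # empirical arrival rate, close to p = 0.3
```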
14. Adversarial Arrival Process: Example for the Knockout Switch
- If our design goal was to not drop packets, then a simple discrete-time adversarial arrival process is one in which:
- A1(t) = A2(t) = ... = A_{k+1}(t) = 1, and
- All packets are destined to output t mod N.
15. Bernoulli arrival process
[Figure: N inputs with arrival processes A1(t)..AN(t), each link at rate R, writing into a shared memory with write bandwidth N·R.]
Assume Ai(t) = 1 w.p. p, else 0. Assume each arrival picks an output independently, uniformly and at random.
Some simple results follow:
1. The probability that at time t a packet arrives to input i destined to output j is p/N.
2. The probability that two consecutive packets arrive to input i is the same as the probability that packets arrive to inputs i and j simultaneously; both equal p^2.
Questions:
1. What is the probability that two arrivals occur at input i in any three time slots?
2. What is the probability that two arrivals occur for output j in any three time slots?
3. What is the probability that queue i holds k packets?
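Result 1 above can be sanity-checked by Monte Carlo; a minimal sketch (the function name is ours, and input/output "1" stands in for the generic i and j):

```python
import random

def est_prob_input_to_output(p, N, trials, seed=2):
    """Estimate P(a packet arrives at input 1 destined to output 1 in a
    given slot). Each input receives a packet w.p. p; each packet then
    picks an output uniformly at random. The slide's result says this
    probability should be p/N."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Arrival at input 1, and its uniform output choice lands on output 1.
        if rng.random() < p and rng.randrange(N) == 0:
            hits += 1
    return hits / trials

print(est_prob_input_to_output(0.5, 4, 200_000))  # expect about p/N = 0.125
```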
16. Simple deterministic model
[Figure: cumulative number of bits vs. time. A(t) is the cumulative number of bits that arrived up until time t; D(t) is the cumulative number of bits that departed up until time t; the queue drains according to the service process at rate R.]
- Properties of A(t), D(t):
- A(t) and D(t) are non-decreasing.
- A(t) >= D(t).
17. Simple Deterministic Model
[Figure: cumulative number of bits vs. time; the vertical gap between A(t) and D(t) is the occupancy Q(t), and the horizontal gap is the delay d(t).]
Queue occupancy: Q(t) = A(t) - D(t).
Queueing delay, d(t), is the time spent in the queue by a bit that arrived at time t (assuming that the queue is served FCFS/FIFO).
18. Outline
- Output queued switches
- Terminology: queues and arrival processes
- Output link scheduling
19. The problems caused by FIFO queues in routers
- In order to maximize its chances of success, a source has an incentive to maximize the rate at which it transmits.
- (Related to 1) When many flows pass through it, a FIFO queue is unfair: it favors the most greedy flow.
- It is hard to control the delay of packets through a network of FIFO queues.
These problems motivate the two topics that follow: fairness, and delay guarantees.
20. Fairness
[Figure: flow A (10 Mb/s) and flow B (100 Mb/s) share router R1's 1.1 Mb/s output link to C; e.g. an http flow identified by the tuple (IP SA, IP DA, TCP SP, TCP DP).]
What is the fair allocation: (0.55 Mb/s, 0.55 Mb/s) or (0.1 Mb/s, 1 Mb/s)?
21. Fairness
[Figure: flows A (10 Mb/s), B (100 Mb/s) and C (0.2 Mb/s) share router R1's 1.1 Mb/s output link to D.]
What is the fair allocation?
22. Max-Min Fairness: a common way to allocate flows
- N flows share a link of rate C. Flow f wishes to send at rate W(f), and is allocated rate R(f).
1. Pick the flow, f, with the smallest requested rate.
2. If W(f) <= C/N, then set R(f) = W(f).
3. If W(f) > C/N, then set R(f) = C/N.
4. Set N = N - 1 and C = C - R(f).
5. If N > 0, go to step 1.
23. Max-Min Fairness: an example
[Figure: four flows with requests W(f1) = 0.1, W(f2) = 0.5, W(f3) = 10, W(f4) = 5 share router R1's output link of rate C = 1.]
- Round 1: Set R(f1) = 0.1.
- Round 2: Set R(f2) = 0.9/3 = 0.3.
- Round 3: Set R(f4) = 0.6/2 = 0.3.
- Round 4: Set R(f3) = 0.3/1 = 0.3.
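The max-min algorithm can be implemented in one pass over the flows sorted by demand, which is equivalent to the round-by-round description (function name ours):

```python
def max_min_fair(C, demands):
    """Max-min fair allocation on a single link of capacity C.
    demands: dict flow -> requested rate W(f). Processing flows from
    smallest to largest demand, each flow gets its request if it is at
    most the equal share of the remaining capacity, else the equal share."""
    alloc = {}
    remaining = C
    for i, (flow, w) in enumerate(sorted(demands.items(), key=lambda kv: kv[1])):
        n_left = len(demands) - i          # flows not yet allocated
        share = remaining / n_left         # equal share of what is left
        alloc[flow] = min(w, share)
        remaining -= alloc[flow]
    return alloc

# The example from the slide: C = 1, demands 0.1, 0.5, 10, 5.
print(max_min_fair(1.0, {"f1": 0.1, "f2": 0.5, "f3": 10, "f4": 5}))
```

On this input, f1 gets its full 0.1 and the other three flows split the remaining 0.9 equally at 0.3 each, matching rounds 1-4 above.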
24. Max-Min Fairness
- How can an Internet router allocate different rates to different flows?
- First, let's see how a router can allocate the same rate to different flows.
25. Fair Queueing
- Packets belonging to a flow are placed in a FIFO. This is called per-flow queueing.
- FIFOs are scheduled one bit at a time, in a round-robin fashion.
- This is called Bit-by-Bit Fair Queueing.
[Figure: arriving packets are classified into flows 1..N, each with its own FIFO, served by a bit-by-bit round-robin scheduler.]
26. Weighted Bit-by-Bit Fair Queueing
- Likewise, flows can be allocated different rates by servicing a different number of bits for each flow during each round.
[Figure: four flows with allocated rates R(f1) = 0.1, R(f2) = 0.3, R(f3) = 0.3, R(f4) = 0.3 share router R1's output link of rate C = 1.]
Order of service for the four queues: f1, f2, f2, f2, f3, f3, f3, f4, f4, f4, f1, ...
Also called Generalized Processor Sharing (GPS).
27. Packetized Weighted Fair Queueing (WFQ)
- Problem: we need to serve a whole packet at a time.
- Solution:
- Determine what time a packet, p, would complete if we served flows bit-by-bit. Call this the packet's finishing time, F.
- Serve packets in order of increasing finishing time.
- Theorem: packet p will depart before F + TRANSP_max, the time to transmit a maximum-length packet.
Also called Packetized Generalized Processor Sharing (PGPS).
28. Intuition behind Packetized WFQ
- Consider packet p that arrives and immediately enters service under WFQ.
- Potentially, there are packets Q = {q, r, ...} that arrive after p that would have completed service before p under bit-by-bit WFQ. These packets are delayed by the duration of p's service.
- Because the amount of data in Q that could have departed before p must be less than or equal to the length of p, their ordering is simply changed:
- Packets in Q are delayed by at most the length of p.
- (Detailed proof in Parekh and Gallager.)
29. Calculating F
- Assume that at time t there are N(t) active (non-empty) queues.
- Let R(t) be the number of rounds of a round-robin service discipline over the active queues in [0,t].
- A P-bit-long packet entering service at t0 will complete service in round R(t) = R(t0) + P.
30. An example of calculating F
[Figure: flows 1..N; for each arriving packet, calculate Si and Fi and enqueue it; for each departure, pick the packet with the smallest Fi and send it. R(t) tracks the current round.]
- In both cases, Fi = Si + Pi.
- R(t) is monotonically increasing with t, therefore:
- the departure order in R(t) is the same as the departure order in t.
31. Understanding bit-by-bit WFQ: 4 queues sharing 4 bits/sec of bandwidth, equal weights
[Animation, slides 31-32: the four queues drain round by round (through round 3) under equal weights 1:1:1:1.]
33. Understanding bit-by-bit WFQ: 4 queues sharing 4 bits/sec of bandwidth, weights 3:2:2:1
[Animation, slides 33-34: the same queues drain under weights 3:2:2:1, shown over rounds 1 and 2.]
35. WFQ is complex
- There may be hundreds to millions of flows; the linecard needs to manage a FIFO per flow.
- The finishing time must be calculated for each arriving packet.
- Packets must be sorted by their departure time. Naively, with m queued packets, each insertion into the sorted order takes O(log m) time.
- In practice, this can be made O(log N), for N active flows.
[Figure: egress linecard with N per-flow queues; Fp is calculated for each packet arriving to the egress linecard, and the scheduler finds the smallest Fp to select the departing packet.]
36. Deficit Round Robin (DRR) [Shreedhar & Varghese '95]: an O(1) approximation to WFQ
[Figure: active packet queues with per-queue deficit counters and quantum size 200; in each round a queue's deficit counter grows by the quantum, and the queue sends packets as long as they fit within its deficit.]
- It appears that DRR emulates bit-by-bit FQ, with a larger "bit".
- So, if the quantum size is 1 bit, does it equal FQ? (No.)
- It is easy to implement Weighted DRR using a different quantum size for each queue.
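A minimal DRR sketch follows the scheme just described (function name and the fixed round count are ours; a real scheduler would loop while any queue is backlogged):

```python
from collections import deque

def drr(queues, quantum, rounds):
    """Deficit Round Robin: queues is a list of deques of packet lengths.
    Each round, every non-empty queue earns `quantum` bytes of credit and
    sends head-of-line packets while they fit within its deficit counter.
    Returns the list of (queue_index, packet_length) in service order."""
    deficit = [0] * len(queues)
    served = []
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                deficit[i] = 0             # an idle queue keeps no credit
                continue
            deficit[i] += quantum
            while q and q[0] <= deficit[i]:
                pkt = q.popleft()
                deficit[i] -= pkt
                served.append((i, pkt))
    return served

# Quantum 200: the 300-byte packet must wait two rounds of credit.
print(drr([deque([300]), deque([150, 100])], quantum=200, rounds=2))
```

Because each visit does O(1) work per packet sent (no sorting by finish time), DRR avoids the O(log N) cost of WFQ.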
37. The problems caused by FIFO queues in routers
- In order to maximize its chances of success, a source has an incentive to maximize the rate at which it transmits.
- (Related to 1) When many flows pass through it, a FIFO queue is unfair: it favors the most greedy flow.
- It is hard to control the delay of packets through a network of FIFO queues.
Having addressed fairness, we now turn to delay guarantees.
38. Deterministic analysis of a router queue
[Figure: model of a router queue with service rate μ; the cumulative-bytes plot shows A(t) and D(t), the occupancy Q(t) between them, and the FIFO delay d(t).]
39. So how can we control the delay of packets?
- Assume continuous time, bit-by-bit flows for a moment.
- Let's say we know the arrival process, Af(t), of flow f to a router.
- Let's say we know the rate, R(f), that is allocated to flow f.
- Then, in the usual way, we can determine the delay of packets in f, and the buffer occupancy.
40. [Figure: flows 1..N, with arrival processes A1(t)..AN(t), are classified into per-flow queues and served by a WFQ scheduler; flow fi's departures Di(t) are served at rate at least R(fi).]
Key idea: in general, we don't know the arrival process. So let's constrain it.
41. Let's say we can bound the arrival process
[Figure: cumulative bytes vs. time; A1(t) stays below a line of slope ρ and intercept σ.]
The number of bytes that can arrive in any period of length t is bounded by σ + ρt. This is called (σ,ρ) regulation.
42. (σ,ρ)-Constrained Arrivals and Minimum Service Rate
[Figure: cumulative bytes vs. time; A1(t) is bounded by σ + ρt, and D1(t) serves it at rate at least R(f1).]
Theorem [Parekh, Gallager '93]: if flows are leaky-bucket constrained, and routers use WFQ, then end-to-end delay guarantees are possible.
43. The leaky bucket (σ,ρ) regulator
[Figure: tokens arrive at rate ρ into a token bucket of size σ; packets wait in a packet buffer and depart by consuming one token per byte (or per packet).]
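The regulator can be sketched in discrete time (function name ours; tokens counted in bytes, the bucket starting full):

```python
def token_bucket(arrivals, sigma, rho):
    """Leaky-bucket (sigma, rho) regulator in discrete time slots.
    arrivals[t] = bytes arriving in slot t; tokens accrue at rho bytes
    per slot, capped at the bucket size sigma; a byte departs only by
    consuming a token. Returns bytes released per slot; excess bytes
    wait in the packet buffer. Output over any window of T slots is
    therefore at most sigma + rho*T."""
    tokens = sigma            # bucket starts full
    backlog = 0
    out = []
    for a in arrivals:
        tokens = min(sigma, tokens + rho)   # add new tokens, cap at sigma
        backlog += a
        sent = min(backlog, tokens)         # release what the tokens allow
        tokens -= sent
        backlog -= sent
        out.append(sent)
    return out

# A 10-byte burst against sigma=3, rho=1: 3 bytes at once, then a trickle.
print(token_bucket([10, 0, 0, 0], sigma=3, rho=1))  # → [3, 1, 1, 1]
```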
44. How the user/flow can conform to the (σ,ρ) regulation: leaky bucket as a shaper
[Figure: a variable bit-rate source (e.g. compressed video) at peak rate C feeds a token bucket with token rate ρ and bucket size σ; the cumulative byte streams before and after shaping are plotted against time, with the shaped output to the network conforming to the (σ,ρ) bound.]
45. Checking up on the user/flow: leaky bucket as a policer
[Figure: at the router, traffic arriving from the network at rate C is checked against a token bucket with token rate ρ and bucket size σ, to verify that the flow conforms to the (σ,ρ) bound.]
46. QoS Router
[Figure: a router with per-flow queues and a WFQ scheduler at each output.]
- Remember: these results assume that it is an OQ switch!
- Why?
- What happens if it is not?
47. References
- Abhay K. Parekh and Robert G. Gallager, "A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: The Single-Node Case," IEEE/ACM Transactions on Networking, June 1993.
- M. Shreedhar and G. Varghese, "Efficient Fair Queueing using Deficit Round Robin," ACM SIGCOMM, 1995.