EE384x: Packet Switch Architectures

1
EE384x: Packet Switch Architectures
  • Handout 2: Queues and Arrival Processes,
  • Output Queued Switches, and
  • Output Link Scheduling.

Nick McKeown, Professor of Electrical Engineering
and Computer Science, Stanford University
nickm@stanford.edu, http://www.stanford.edu/~nickm
2
Outline
  • Output Queued Switches
  • Terminology: Queues and arrival processes.
  • Output Link Scheduling

3
Generic Router Architecture
[Figure: a generic output-queued router. Ports 1, 2, ..., N each have a packet queue and buffer memory, and each buffer memory must run at N times the line rate.]
4
Simple model of output queued switch
[Figure: a 4-port output-queued switch, router R1. Every link runs at rate R; each arriving packet goes straight into the queue of its output link.]
5
Characteristics of an output queued (OQ) switch
  • Arriving packets are immediately written into the
    output queue, without intermediate buffering.
  • The flow of packets to one output does not affect
    the flow to another output.
  • An OQ switch is work conserving: an output line
    is always busy when there is a packet in the
    switch for it.
  • OQ switches have the highest throughput and lowest
    average delay.
  • We will also see that the rate of individual
    flows, and the delay of packets, can be controlled.

6
The shared memory switch
[Figure: the shared memory switch. Links 1 through N, each with an ingress and an egress at rate R, all write to and read from a single, physical memory device.]
7
Characteristics of a shared memory switch
8
Memory bandwidth
  • Basic OQ switch:
  • Consider an OQ switch with N different physical
    memories, and all links operating at rate R
    bits/s.
  • In the worst case, packets may arrive
    continuously from all inputs, destined to just
    one output.
  • Maximum memory bandwidth requirement for each
    memory is (N+1)R bits/s: N simultaneous writes
    plus one read.
  • Shared memory switch:
  • Maximum memory bandwidth requirement for the
    memory is 2NR bits/s: N writes plus N reads.
    (A quick check of both bounds follows below.)
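
A quick sanity check of the two bounds, as a minimal Python sketch (the port count and link rate are example values, not from the slides):

    # Worst-case memory bandwidth for output-queued switches.
    def oq_per_memory_bw(n_ports, rate_bps):
        # Each output memory may absorb writes from all N inputs at once
        # while serving one read at rate R: (N + 1) * R.
        return (n_ports + 1) * rate_bps

    def shared_memory_bw(n_ports, rate_bps):
        # A single shared memory absorbs N writes and N reads: 2 * N * R.
        return 2 * n_ports * rate_bps

    N, R = 16, 10e9   # 16 ports at 10 Gb/s each (example values)
    print(oq_per_memory_bw(N, R) / 1e9, "Gb/s per output memory")  # 170.0
    print(shared_memory_bw(N, R) / 1e9, "Gb/s shared memory")      # 320.0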

9
How fast can we make a centralized shared memory
switch?
  • 5ns per memory operation
  • Two memory operations per packet
  • Therefore, up to 160Gb/s
  • In practice, closer to 80Gb/s
[Figure: ports 1, 2, ..., N share a centralized 5ns SRAM over a 200-byte bus.]
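
The 160Gb/s figure follows directly from the bus width and the SRAM cycle time; a worked check in Python (the 200-byte bus and 5ns access time come from the slide):

    BUS_BYTES = 200          # bus width moved per memory operation
    T_OP = 5e-9              # 5 ns per memory operation
    OPS_PER_PACKET = 2       # one write (ingress) plus one read (egress)

    raw_bw = BUS_BYTES * 8 / T_OP        # bits/s through the memory
    switch_bw = raw_bw / OPS_PER_PACKET  # halved: each packet costs two ops
    print(switch_bw / 1e9, "Gb/s")       # 160.0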
10
Outline
  • Output Queued Switches
  • Terminology: Queues and arrival processes.
  • Output Link Scheduling

11
Queue Terminology
[Figure: a single queue with arrival process A(t) at rate λ, occupancy Q(t), service discipline S at rate μ, and departure process D(t).]
  • Arrival process, A(t)
  • In continuous time, usually the cumulative number
    of arrivals in [0, t].
  • In discrete time, usually an indicator function
    as to whether or not an arrival occurred at time
    t = nT.
  • λ is the arrival rate: the expected number of
    arriving packets (or bits) per second.
  • Queue occupancy, Q(t)
  • Number of packets (or bits) in the queue at time t.
  • Service discipline, S
  • Indicates the sequence of departures, e.g.
    FIFO/FCFS, LIFO, ...
  • Service distribution
  • Indicates the time taken to process each packet,
    e.g. deterministic or exponentially distributed
    service time.
  • μ is the service rate: the expected number of
    served packets (or bits) per second.
  • Departure process, D(t)
  • In continuous time, usually the cumulative number
    of departures in [0, t].
  • In discrete time, usually an indicator function
    as to whether or not a departure occurred at time
    t = nT.

12
More terminology
  • Customer: queueing theory usually refers to
    queued entities as customers. In class,
    customers will usually be packets or bits.
  • Work: each customer is assumed to bring some work,
    which affects its service time. For example,
    packets may have different lengths, and their
    service time might be a function of their length.
  • Waiting time: the time that a customer waits in
    the queue before beginning service.
  • Delay: the time from when a customer arrives until
    it has departed.

13
Arrival Processes
  • Examples of deterministic arrival processes:
  • E.g. 1 arrival every second, or a burst of 4
    packets every other second.
  • A deterministic sequence may be designed to be
    adversarial, to expose some weakness of the
    system.
  • Examples of random arrival processes:
  • (Discrete time) Bernoulli i.i.d. arrival process:
  • Let A(t) = 1 if an arrival occurs at time t,
    where t = nT, n = 0, 1, ...
  • A(t) = 1 w.p. p, and 0 w.p. 1 − p.
  • A series of independent tosses of a p-coin.
  • (Continuous time) Poisson arrival process:
  • Exponentially distributed interarrival times.

14
Adversarial Arrival Process: Example for the
Knockout Switch
  • If our design goal was to not drop packets, then
    a simple discrete-time adversarial arrival
    process is one in which:
  • A1(t) = A2(t) = ... = Ak+1(t) = 1, and
  • All packets are destined to output t mod N.
    (A sketch of this adversary follows below.)
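
A minimal sketch of this adversary in Python (the knockout parameter k, the port count N, and all names here are illustrative; the knockout switch itself is not modeled):

    def adversarial_arrivals(n_ports, k, num_slots):
        # Inputs 1..k+1 each receive a packet in every slot, and all of
        # those packets target the single output t mod N, overloading it.
        for t in range(num_slots):
            yield t, list(range(1, k + 2)), t % n_ports

    for t, inputs, dest in adversarial_arrivals(n_ports=8, k=3, num_slots=4):
        print(f"slot {t}: inputs {inputs} -> output {dest}")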

15
Bernoulli arrival process
[Figure: N inputs with Bernoulli arrival processes A1(t), ..., AN(t), each link at rate R, feeding output queues through a memory with write bandwidth N·R.]
Assume Ai(t) = 1 w.p. p, else 0. Assume each
arrival picks an output independently, uniformly
and at random. Some simple results follow:
  1. The probability that at time t a packet arrives
     at input i destined to output j is p/N.
  2. The probability that two consecutive packets
     arrive at input i is the same as the probability
     that packets arrive at inputs i and j
     simultaneously: both equal p².
Questions (a simulation sketch follows below):
  1. What is the probability that two arrivals occur
     at input i in any three time slots?
  2. What is the probability that two arrivals occur
     for output j in any three time slots?
  3. What is the probability that queue i holds k
     packets?
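
Questions like these are easy to check by Monte Carlo; a minimal Python sketch for question 1, reading "two arrivals" as exactly two (p and the trial count are example values):

    import random

    p, TRIALS = 0.3, 100_000
    hits = 0
    for _ in range(TRIALS):
        # Three Bernoulli(p) time slots at one input.
        arrivals = sum(random.random() < p for _ in range(3))
        if arrivals == 2:
            hits += 1

    # Analytic answer: C(3,2) * p^2 * (1 - p).
    print(hits / TRIALS, "vs analytic", 3 * p**2 * (1 - p))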
16
Simple deterministic model
[Figure: cumulative number of bits vs. time. A(t) is the cumulative number of bits that arrived up until time t; D(t) is the cumulative number of bits that departed up until time t. The vertical gap between the curves is Q(t), and the service process drains the queue at rate R.]
  • Properties of A(t), D(t):
  • A(t), D(t) are non-decreasing.
  • A(t) ≥ D(t).
17
Simple Deterministic Model
[Figure: cumulative number of bits vs. time, with A(t) above D(t); the vertical gap between the curves is Q(t) and the horizontal gap is d(t).]
Queue occupancy: Q(t) = A(t) − D(t). Queueing
delay, d(t), is the time spent in the queue by a
bit that arrived at time t (assuming that the
queue is served FCFS/FIFO). A small worked example
follows below.
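
A minimal Python sketch of how Q(t) and d(t) fall out of the cumulative curves (the arrival pattern and unit service rate are example values):

    # Discrete-time model: serve at most R = 1 bit per slot, FIFO.
    arrivals = [3, 0, 2, 0, 0, 1, 0, 0]   # bits arriving in each slot
    A, D, q, tin, tout = [], [], 0, 0, 0
    for a in arrivals:
        tin += a
        served = min(q + a, 1)            # work-conserving service
        q += a - served
        tout += served
        A.append(tin)
        D.append(tout)

    Q = [x - y for x, y in zip(A, D)]     # Q(t) = A(t) - D(t)

    def fifo_delay(t):
        # Slots until the departures catch up with the arrivals at t.
        return next(s for s in range(t, len(D)) if D[s] >= A[t]) - t

    print("Q(t) =", Q)                    # [2, 1, 2, 1, 0, 0, 0, 0]
    print("d(0) =", fifo_delay(0))        # 2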
18
Outline
  • Output Queued Switches
  • Terminology: Queues and arrival processes.
  • Output Link Scheduling

19
The problems caused by FIFO queues in routers
  • 1. In order to maximize its chances of success, a
    source has an incentive to maximize the rate at
    which it transmits.
  • 2. (Related to 1) When many flows pass through it,
    a FIFO queue is unfair: it favors the most
    greedy flow.
  • 3. It is hard to control the delay of packets
    through a network of FIFO queues.

(Problem 2 motivates fairness; problem 3 motivates
delay guarantees.)
20
Fairness
[Figure: flow A enters router R1 on a 10 Mb/s access link, and flow B on a 100 Mb/s access link; they share a 1.1 Mb/s bottleneck link to C, and each is shown receiving 0.55 Mb/s. A flow here is, e.g., an http flow with a given (IP SA, IP DA, TCP SP, TCP DP).]
What is the fair allocation: (0.55 Mb/s,
0.55 Mb/s) or (0.1 Mb/s, 1 Mb/s)?
21
Fairness
[Figure: flows A (10 Mb/s access) and B (100 Mb/s access) again share router R1's 1.1 Mb/s link toward C, but one flow's path now also crosses a 0.2 Mb/s link toward D.]
What is the fair allocation?
22
Max-Min Fairness: A common way to allocate flows
  • N flows share a link of rate C. Flow f wishes to
    send at rate W(f), and is allocated rate R(f).
  • 1. Pick the flow, f, with the smallest requested
    rate.
  • 2. If W(f) < C/N, then set R(f) = W(f).
  • 3. If W(f) > C/N, then set R(f) = C/N.
  • 4. Set N = N − 1, C = C − R(f).
  • 5. If N > 0, go to step 1.

23
Max-Min Fairness: An example
[Figure: four flows with requests W(f1) = 0.1,
W(f2) = 0.5, W(f3) = 10, and W(f4) = 5 share router
R1's output link of rate C = 1.]
  • Round 1: Set R(f1) = 0.1.
  • Round 2: Set R(f2) = 0.9/3 = 0.3.
  • Round 3: Set R(f4) = 0.6/2 = 0.3.
  • Round 4: Set R(f3) = 0.3/1 = 0.3.
    (An implementation sketch follows below.)
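
A minimal Python sketch of this water-filling procedure, reproducing the example above (the function and variable names are mine, not from the slides):

    def max_min(capacity, wants):
        # Allocate max-min fair rates to {flow: requested_rate},
        # visiting flows from smallest to largest request.
        alloc, c, n = {}, capacity, len(wants)
        for f, w in sorted(wants.items(), key=lambda kv: kv[1]):
            alloc[f] = min(w, c / n)   # grant request or fair share C/N
            c -= alloc[f]              # shrink remaining capacity
            n -= 1                     # one fewer contending flow
        return alloc

    print(max_min(1.0, {"f1": 0.1, "f2": 0.5, "f3": 10, "f4": 5}))
    # {'f1': 0.1, 'f2': 0.3, 'f4': 0.3, 'f3': 0.3}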

24
Max-Min Fairness
  • How can an Internet router allocate different
    rates to different flows?
  • First, let's see how a router can allocate the
    same rate to different flows.

25
Fair Queueing
  • Packets belonging to a flow are placed in a FIFO.
    This is called per-flow queueing.
  • FIFOs are scheduled one bit at a time, in a
    round-robin fashion.
  • This is called Bit-by-Bit Fair Queueing.

[Figure: arriving packets are classified into per-flow FIFOs, Flow 1 through Flow N, which the scheduler serves in bit-by-bit round robin.]
26
Weighted Bit-by-Bit Fair Queueing
  • Likewise, flows can be allocated different rates
    by servicing a different number of bits for each
    flow during each round.

[Figure: four flows with allocated rates R(f1) = 0.1, R(f2) = 0.3, R(f3) = 0.3, and R(f4) = 0.3 share router R1's output link of rate C = 1.]
Order of service for the four queues: f1, f2,
f2, f2, f3, f3, f3, f4, f4, f4, f1, ...
Also called Generalized Processor Sharing (GPS).
27
Packetized Weighted Fair Queueing (WFQ)
  • Problem: We need to serve a whole packet at a
    time.
  • Solution:
  • Determine what time a packet, p, would complete
    if we served flows bit-by-bit. Call this the
    packet's finishing time, F.
  • Serve packets in order of increasing
    finishing time.
  • Theorem: Packet p will depart before
    F + TRANSPmax, where TRANSPmax is the time to
    transmit a maximum-length packet.

Also called Packetized Generalized Processor
Sharing (PGPS).
28
Intuition behind Packetized WFQ
  • Consider a packet p that arrives and immediately
    enters service under WFQ.
  • Potentially, there are packets Q = {q, r, ...}
    that arrive after p but would have completed
    service before p under bit-by-bit WFQ. These
    packets are delayed by the duration of p's
    service.
  • Because the amount of data in Q that could have
    departed before p must be less than or equal to
    the length of p, their ordering is simply changed:
  • Packets in Q are delayed by at most the length
    of p.
  • (Detailed proof in Parekh and Gallager.)

29
Calculating F
  • Assume that at time t there are N(t) active
    (non-empty) queues.
  • Let R(t) be the number of rounds of a round-robin
    service discipline over the active queues in
    [0, t].
  • A P-bit packet entering service at time t0 will
    complete service in round R(t) = R(t0) + P.

30
An example of calculating F
[Figure: per-flow queues, Flow 1 through Flow N. On arrival, calculate Si and Fi and enqueue; to transmit, pick the queued packet with the smallest Fi and send it. R(t) advances as rounds complete.]
  • In both cases, Fi = Si + Pi (where Si is the
    round in which packet i starts service: R(t) if
    its queue is empty, else the previous packet's Fi).
  • R(t) is monotonically increasing with t;
    therefore the departure order in R(t) is the
    same as the departure order in t.
    (A bookkeeping sketch follows below.)
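
A minimal Python sketch of the finish-tag bookkeeping (equal weights; the virtual clock V here is a simplified stand-in for R(t), which a faithful implementation advances by served bits divided by the number of active queues):

    import heapq

    class WFQSketch:
        def __init__(self):
            self.V = 0.0           # simplified round counter R(t)
            self.last_F = {}       # flow -> finish tag of its newest packet
            self.heap = []         # (Fi, tie-breaker, flow, length)
            self.n = 0

        def enqueue(self, flow, length):
            Si = max(self.last_F.get(flow, 0.0), self.V)  # start round
            Fi = Si + length                              # Fi = Si + Pi
            self.last_F[flow] = Fi
            heapq.heappush(self.heap, (Fi, self.n, flow, length))
            self.n += 1

        def dequeue(self):
            Fi, _, flow, length = heapq.heappop(self.heap)  # smallest Fi
            self.V = max(self.V, Fi)
            return flow, length

    w = WFQSketch()
    for flow, length in [("A", 100), ("B", 50), ("A", 100), ("C", 75)]:
        w.enqueue(flow, length)
    while w.heap:
        print(w.dequeue())   # B, C, A, A: increasing finish tags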

31
Understanding bit-by-bit WFQ: 4 queues sharing
4 bits/sec of bandwidth, equal weights
[Figure: the four queues served one bit each per round.]
32
Understanding bit-by-bit WFQ: 4 queues sharing
4 bits/sec of bandwidth, equal weights
[Figure: the same service pattern, continued through Round 3.]
33
Understanding bit-by-bit WFQ: 4 queues sharing
4 bits/sec of bandwidth, weights 3:2:2:1
[Figure: the four queues served 3, 2, 2, and 1 bits per round, respectively.]
34
Understanding bit-by-bit WFQ: 4 queues sharing
4 bits/sec of bandwidth, weights 3:2:2:1
[Figure: Rounds 1 and 2 under weights 3:2:2:1, compared with weights 1:1:1:1.]
35
WFQ is complex
  • There may be hundreds to millions of flows; the
    linecard needs to maintain a FIFO per flow.
  • The finishing time must be calculated for each
    arriving packet.
  • Packets must be sorted by their departure time.
    Naively, with m queued packets, the sorting time
    is O(log m) per packet.
  • In practice, this can be made O(log N), for
    N active flows.

[Figure: an egress linecard with per-flow queues 1 through N. Each packet arriving at the linecard has its Fp calculated and is enqueued; the scheduler finds the smallest Fp to choose the departing packet.]
36
Deficit Round Robin (DRR) [Shreedhar & Varghese '95]:
An O(1) approximation to WFQ
[Figure: Step 1 of a DRR example over several active packet queues holding packets of various sizes (600, 400, 340, 200, 150, 100, 60, 50 bytes). Each round, a queue's deficit counter grows by the quantum, and the queue sends head-of-line packets while the counter covers them. Quantum size = 200.]
  • It appears that DRR emulates bit-by-bit FQ, with
    a larger "bit".
  • So, if the quantum size is 1 bit, does it equal
    FQ? (No.)
  • It is easy to implement Weighted DRR using a
    different quantum size for each queue.
    (A sketch of DRR follows below.)
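
A minimal DRR sketch in Python (the queue contents and quantum are example values; this follows the shape of the algorithm, not Shreedhar & Varghese's code):

    from collections import deque

    def drr(queues, quantum, rounds):
        # Serve a list of packet-size deques with Deficit Round Robin.
        deficit = [0] * len(queues)
        sent = []
        for _ in range(rounds):
            for i, q in enumerate(queues):
                if not q:
                    deficit[i] = 0       # idle queues keep no credit
                    continue
                deficit[i] += quantum    # one quantum of credit per round
                while q and q[0] <= deficit[i]:
                    deficit[i] -= q[0]   # spend credit on the head packet
                    sent.append((i, q.popleft()))
        return sent

    queues = [deque([200, 100]), deque([600, 400]), deque([150, 50])]
    print(drr(queues, quantum=200, rounds=3))
    # [(0, 200), (2, 150), (2, 50), (0, 100), (1, 600)]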

37
The problems caused by FIFO queues in routers
  • 1. In order to maximize its chances of success, a
    source has an incentive to maximize the rate at
    which it transmits.
  • 2. (Related to 1) When many flows pass through it,
    a FIFO queue is unfair: it favors the most
    greedy flow.
  • 3. It is hard to control the delay of packets
    through a network of FIFO queues.

(Problem 2 motivates fairness; problem 3 motivates
delay guarantees.)
38
Deterministic analysis of a router queue
[Figure: model of a router queue with arrivals A(t), occupancy Q(t), service rate μ, and departures D(t). The cumulative-bytes plot shows A(t) above D(t); the vertical gap is Q(t) and the horizontal gap is the FIFO delay, d(t).]
39
So how can we control the delay of packets?
  • Assume continuous-time, bit-by-bit flows for a
    moment.
  • Let's say we know the arrival process, Af(t), of
    flow f to a router.
  • Let's say we know the rate, R(f), that is
    allocated to flow f.
  • Then, in the usual way, we can determine the
    delay of packets in f, and the buffer occupancy.

40
[Figure: flows 1 through N arrive as A1(t), ..., AN(t), are classified into per-flow queues, and a WFQ scheduler serves flow f at rate R(f), producing departures D1(t), ..., DN(t). The cumulative-bytes plot shows D1(t) tracking A1(t) at slope R(f1).]
Key idea: In general, we don't know the arrival
process. So let's constrain it.
41
Let's say we can bound the arrival process
[Figure: cumulative bytes vs. time; A1(t) stays below a line of slope ρ and intercept σ.]
The number of bytes that can arrive in any period
of length t is bounded by σ + ρt. This is called
(σ,ρ) regulation.
42
(σ,ρ) Constrained Arrivals and Minimum Service
Rate
[Figure: cumulative bytes vs. time; A1(t) is bounded above by σ + ρt, and D1(t) is bounded below by a service line of slope R(f1). The gap between the two bounds caps the delay and backlog.]
Theorem [Parekh, Gallager '93]: If flows are
leaky-bucket constrained, and routers use WFQ,
then end-to-end delay guarantees are possible.
43
The leaky bucket (σ,ρ) regulator
[Figure: tokens arrive at rate ρ into a token bucket of size σ. Packets wait in a packet buffer and consume one token per byte (or per packet) in order to depart. A sketch follows below.]
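
A minimal token-bucket regulator sketch in Python (byte-granularity tokens; the class and its parameters are mine, not from the slides):

    class LeakyBucketRegulator:
        # (sigma, rho) regulator: tokens accrue at rate rho, capped at sigma.
        def __init__(self, sigma, rho):
            self.sigma, self.rho = sigma, rho
            self.tokens, self.last = sigma, 0.0

        def conforms(self, nbytes, now):
            # Accrue tokens for the elapsed time, capped at the bucket size.
            elapsed = now - self.last
            self.tokens = min(self.sigma, self.tokens + self.rho * elapsed)
            self.last = now
            if nbytes <= self.tokens:   # enough tokens: the packet conforms
                self.tokens -= nbytes
                return True
            return False  # shaper: hold the packet; policer: drop or mark it

    reg = LeakyBucketRegulator(sigma=1500, rho=125_000)  # 1 MTU, 1 Mb/s
    print(reg.conforms(1500, now=0.0))    # True: the burst fits the bucket
    print(reg.conforms(1500, now=0.001))  # False: only ~125 bytes accrued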
44
How the user/flow can conform to the (σ,ρ)
regulation: Leaky bucket as a shaper
[Figure: a variable bit-rate source (e.g. compressed video) feeds a token bucket of size σ filled at rate ρ before entering the network; the bytes-vs-time plots show the bursty input being smoothed by the shaper.]
45
Checking up on the user/flow: Leaky bucket as a
policer
[Figure: at the router, traffic arriving from the network is checked against a token bucket of size σ filled at rate ρ; the bytes-vs-time plots compare the arrivals with the (σ,ρ) envelope.]
46
QoS Router
[Figure: a QoS router with per-flow queues and a scheduler at each output.]
  • Remember: These results assume that it is an OQ
    switch!
  • Why?
  • What happens if it is not?

47
References
  • Abhay K. Parekh and R. Gallager, "A Generalized
    Processor Sharing Approach to Flow Control in
    Integrated Services Networks: The Single-Node
    Case," IEEE/ACM Transactions on Networking,
    June 1993.
  • M. Shreedhar and G. Varghese, "Efficient Fair
    Queueing using Deficit Round Robin," ACM
    SIGCOMM, 1995.