Title: Techniques
1Techniques for Fast Packet Buffers
Sundar Iyer, Ramana Rao, Nick McKeown ({sundaes, ramana, nickm}@stanford.edu)
Departments of Electrical Engineering and Computer Science, Stanford University
2Problem Statement
- Motivation
- To design an extremely high-speed packet buffer.
3Problem Statement .. 1
[Figure: buffer memory on an OC768 line, 40 Gb/s, 64-byte cells]
How do we design a buffer with an access time of 6.4 ns?
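The 6.4 ns figure can be reproduced from the line rate and cell size. The sketch below assumes that every cell is written once and read once, so the buffer gets half a cell time per access; the function and constant names are illustrative and not from the talk.

```python
# Values taken from the slide: OC768 line rate and 64-byte cells.
LINE_RATE_BPS = 40e9   # 40 Gb/s
CELL_BYTES = 64


def cell_time_ns(line_rate_bps: float, cell_bytes: int) -> float:
    """Time for one cell to arrive on the line, in nanoseconds."""
    return cell_bytes * 8 / line_rate_bps * 1e9


def access_time_ns(line_rate_bps: float, cell_bytes: int) -> float:
    """Assumption: one write and one read per cell time, so half a cell time each."""
    return cell_time_ns(line_rate_bps, cell_bytes) / 2


# Roughly 12.8 ns per cell and 6.4 ns per memory access.
print(cell_time_ns(LINE_RATE_BPS, CELL_BYTES))
print(access_time_ns(LINE_RATE_BPS, CELL_BYTES))
```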
4Problem Statement .. 2
R = 40 Gb/s, RTT = 0.25 s, RTT × R = 10 Gbit
[Figure: buffer memory with slots numbered 1 to 9]
How do we create a buffer of size 10 Gbit?
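As a worked check of the buffer size, assuming the 64-byte (512-bit) cells from the previous slide:

```latex
\begin{align*}
R \times \mathrm{RTT} &= 40 \times 10^{9}\ \text{bit/s} \times 0.25\ \text{s}
  = 10^{10}\ \text{bit} = 10\ \text{Gbit} \\
\frac{10^{10}\ \text{bit}}{512\ \text{bit/cell}} &\approx 1.95 \times 10^{7}\ \text{cells}
\end{align*}
```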
5Buffer Architecture - Demand and Supply
- Buffer Architecture requires
- Fast access time, large size
- SRAM
- Fast access time, small size (low density)
- DRAM
- Slow access time, large size (high density)
6Problem Statement Redefined
- Motivation: to design an extremely high-speed packet buffer architecture with fast access time and large size.
- This talk is about the analysis of one such well-known approach.
7Some Thoughts
- We believe that this architecture, and many equivalent designs, already exist in many router line cards
- These results may already be known and might exist in proprietary form
- One would like to be able to give deterministic guarantees in the architecture
8Characteristics of Packet Buffer Architectures
- The total throughput needed is at least 2 × (ingress rate)
- The size of the buffer is at least R × RTT
- The buffers have one or more FIFOs
- The sequence in which the FIFOs are accessed is determined by an arbiter and is unknown a priori
9Memory Hierarchy of Packet Buffer
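[Figure: memory hierarchy. Packets arrive at rate R into the ingress SRAM (cache of FIFO tails, queues 1 to Q). The memory management algorithm writes b cells at a time into a large DRAM memory with access time T and reads b cells at a time into the egress SRAM (cache of FIFO heads, queues 1 to Q). Write and read accesses each see an access time of 2T. Packets depart at rate R according to grants from the arbiter.]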
10System Design Parameters
- Main Parameters
- SRAM Size
- Latency faced by a cell
- System Parameters
- I/O Bandwidth
- Number of addresses
- Use single address on every DRAM
- Use different addresses on every DRAM
- Use or non-use of DRAM burst mode
- Existence or non-existence of bank conflicts
11Packet Buffer Design
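[Figure: packet buffer design. An ingress SRAM buffer and an egress SRAM buffer, each at line rate R, are connected through the memory management algorithm to a bank of DRAMs, with b cells moved per transfer. Labels: SRAM buffer area A, number of queues Q, cell size C.]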
12Today's Talk
- Optimize main parameters
- Minimize latency at the cost of SRAM size
- ... (later) minimize SRAM size at the cost of latency
- Assumptions on system parameters
- No speedup on I/O
- I/O = 2R
- Simple address architecture
- Use a single address on every DRAM
13More Assumptions ..
- We shall assume that only cells of size C arrive in the system
- No use of DRAM burst mode
- No bank conflicts
14Symmetry Argument
- The analysis and working of the ingress and egress buffer architectures are similar
- We shall analyze only the egress buffer architecture
15System Parameters in Packet Buffer Design
- Access time of a DRAM: T = 50 ns
- DRAM access time as seen by the egress (ingress): T' = 2T
- Cell time of the system: Ts = 6.4 ns
- Cell time of the egress (ingress): Tc = 12.8 ns
- Minimum width of the DRAMs: T'/Tc = b
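A small sketch of how b follows from these parameters. It assumes the DRAM access time seen by one side is twice the raw access time (the DRAM is shared between ingress and egress) and that the ratio is rounded up to a whole number of cells; the rounding and the function name are assumptions, not stated on the slide.

```python
import math


def dram_width_cells(dram_access_ns: float, cell_time_ns: float) -> int:
    """Cells that must move per DRAM access so one side keeps up with its cell rate."""
    # Access time as seen by the egress (or ingress): twice the raw DRAM access time.
    seen_access_ns = 2 * dram_access_ns
    # Assumption: round up, since a fraction of a cell cannot be transferred.
    return math.ceil(seen_access_ns / cell_time_ns)


# With T = 50 ns and Tc = 12.8 ns: 100 / 12.8 = 7.8125, rounded up to 8.
print(dram_width_cells(50.0, 12.8))
```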
16Packet Buffer Design Questions
- Can we give deterministic guarantees?
- Why not keep all cells in the DRAM?
- Would an SRAM of size a little more than qb not suffice?
17A Bad Case for the Queues 1
[Figure: snapshots of the queues at t = 0, 1, ..., 7]
18A Bad Case for the Queues 2
[Figure: snapshots of the queues at t = 8, 9, ..., 14 and t = 17]
19Observation
[Figure: Q queues in the SRAM; labels: threshold Ti, b - 1, width w]
- There exists some value of w for which the buffer does not overflow
- w = qb is one such sufficient value
- The threshold value Ti governs w
20Definitions
- Occupancy
- This is the number of cells in the SRAM for a particular queue
- Active queue
- An active queue is one which has cells present for it in the DRAM
21One More Definition ?
- Deficit
- This is defined as the difference between the threshold Ti and the occupancy of an active queue
- For a queue which is not active, the deficit is zero
[Figure: a queue with threshold Ti and b - 1 marked, showing its occupancy and deficit]
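The three definitions (occupancy, active queue, deficit) can be written down as a small data structure. This is only a sketch: the class, field, and function names are hypothetical, and the deficit is left unclamped, so it goes negative if the occupancy exceeds the threshold.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    sram_cells: int  # cells of this queue currently held in the SRAM
    dram_cells: int  # cells of this queue still waiting in the DRAM

    @property
    def occupancy(self) -> int:
        """Occupancy: number of cells in the SRAM for this queue."""
        return self.sram_cells

    @property
    def active(self) -> bool:
        """Active: the queue has cells present for it in the DRAM."""
        return self.dram_cells > 0


def deficit(queue: QueueState, threshold: int) -> int:
    """Threshold minus occupancy for an active queue, zero for an inactive one."""
    return threshold - queue.occupancy if queue.active else 0


# An active queue with 3 cells in SRAM and a threshold of 7 has a deficit of 4.
print(deficit(QueueState(sram_cells=3, dram_cells=5), threshold=7))
```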
22Can we Bound the Maximum Value of the Deficit?
- Define $f(i,q)$
- The maximum deficit that a set of $i$ queues can have in a system of $q$ queues
- We are interested in $f(1,q)$
- $f(q,q) < qb$, trivially
23Largest Deficit Queue First
- Recurrence Equations
- $f(2,q) > f(1,q) - b + \frac{f(1,q) - b}{1}$
- $f(3,q) > f(2,q) - b + \frac{f(2,q) - b}{2}$
- $f(4,q) > f(3,q) - b + \frac{f(3,q) - b}{3}$
- ...
- $f(q,q) > f(q-1,q) - b + \frac{f(q-1,q) - b}{q-1}$
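The policy behind these recurrences can be sketched as a single replenishment step: pick the active queue with the largest deficit and move up to b cells for it from DRAM into the egress SRAM. The list-based state, the function names, and the tie-break (lowest index wins) are assumptions made for illustration, not details from the talk.

```python
from typing import List, Optional


def largest_deficit_queue(occupancy: List[int], dram_cells: List[int],
                          threshold: int) -> Optional[int]:
    """Index of the active queue with the largest deficit, or None if none is active."""
    best = None
    best_deficit = None
    for i, (occ, backlog) in enumerate(zip(occupancy, dram_cells)):
        if backlog == 0:
            continue  # inactive queue: deficit is zero and there is nothing to fetch
        d = threshold - occ
        if best is None or d > best_deficit:
            best, best_deficit = i, d
    return best


def replenish(occupancy: List[int], dram_cells: List[int],
              threshold: int, b: int) -> Optional[int]:
    """Move up to b cells from DRAM to SRAM for the chosen queue; return its index."""
    i = largest_deficit_queue(occupancy, dram_cells, threshold)
    if i is None:
        return None
    moved = min(b, dram_cells[i])
    dram_cells[i] -= moved
    occupancy[i] += moved
    return i


occupancy = [5, 1, 3]
dram_cells = [10, 10, 0]
# Deficits with threshold 7 are 2 and 6; queue 2 is inactive, so queue 1 is served.
print(replenish(occupancy, dram_cells, threshold=7, b=4), occupancy, dram_cells)
```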
24Dirty Math..
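- $qb > f(q,q)$, trivially
- $> f(q-1,q) - b + \frac{f(q-1,q) - b}{q-1}$
- $> f(q-1,q)\,\frac{q}{q-1} - b\,\frac{q}{q-1}$
- $> \left[ f(q-2,q)\,\frac{q-1}{q-2} - b\,\frac{q-1}{q-2} \right] \frac{q}{q-1} - \frac{bq}{q-1}$
- $> f(q-2,q)\,\frac{q}{q-2} - \frac{bq}{q-2} - \frac{bq}{q-1}$
- $> f(q-3,q)\,\frac{q}{q-3} - \frac{bq}{q-3} - \frac{bq}{q-2} - \frac{bq}{q-1}$
- ...
- $> f(1,q)\,\frac{q}{1} - bq \sum_{i=1}^{q-1} \frac{1}{i}$
- This gives, taking $\sum_{i=1}^{q-1} 1/i \approx \ln q$,
- $f(1,q) < b\,(1 + \ln q)$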
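To see how rough the last step is, the sketch below compares the bound with the harmonic sum kept exact against the logarithmic form. It assumes the sum runs over i = 1 to q - 1, as the telescoping suggests; the function names are illustrative. The harmonic sum is slightly larger than ln q, so the logarithmic form is an approximation rather than a strict bound.

```python
import math


def deficit_bound_harmonic(q: int, b: int) -> float:
    """b * (1 + sum_{i=1}^{q-1} 1/i), obtained by dividing the last inequality by q."""
    harmonic = sum(1.0 / i for i in range(1, q))
    return b * (1 + harmonic)


def deficit_bound_log(q: int, b: int) -> float:
    """b * (1 + ln q), with the harmonic sum approximated by ln q."""
    return b * (1 + math.log(q))


# For q = 32 and b = 8 the two forms differ by a few cells.
print(deficit_bound_harmonic(32, 8), deficit_bound_log(32, 8))
```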
25Results
- If the MMA
- services the queue with the largest deficit,
- has a simple address architecture,
- and has no I/O speedup,
- then
- a latency of zero can be guaranteed when the width of the SRAM is $b(1 + \ln q) + b = b(2 + \ln q)$,
- and the size of the SRAM is $(2 + \ln Q)\,Qb$
- Necessity vs. sufficiency?
26A Dose of Reality
- Typical values
- b is typically < 10
- Q = Np, where
- N = number of ports (for VOQ)
- p = number of classes per port
- Implementations
- VOQ
- N = 32, p = 1, Q = 2^5, b = 2^3, SRAM = 700 kb
- Diffserv
- N = 32, p = 16, Q = 2^9, b = 2^3, SRAM = 17 Mb
- Intserv
- Let's not think about it!
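The VOQ and Diffserv figures can be roughly reproduced under a few assumptions: the SRAM size from the Results slide is read as (2 + ln Q) * Q * b cells, Q = N * p, b = 8, and each cell is 64 bytes. The function name and the default cell size are illustrative.

```python
import math

CELL_BITS = 64 * 8  # assumption: 64-byte cells, as on the OC768 slide


def sram_size_bits(num_ports: int, classes_per_port: int, b: int,
                   cell_bits: int = CELL_BITS) -> float:
    """SRAM size in bits, reading the result as (2 + ln Q) * Q * b cells."""
    num_queues = num_ports * classes_per_port  # Q = N * p
    cells = (2 + math.log(num_queues)) * num_queues * b
    return cells * cell_bits


# VOQ: N = 32, p = 1 gives about 0.72 Mb, in line with the 700 kb on the slide.
print(sram_size_bits(32, 1, 8) / 1e6)
# Diffserv: N = 32, p = 16 gives about 17.3 Mb, in line with the 17 Mb on the slide.
print(sram_size_bits(32, 16, 8) / 1e6)
```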
27Future Work
- Discussion on trading off latency for SRAM size
- Analysis of other parameters
- Relaxing I/O, address constraints
- Implementation Pain
- ... still a long way to go