Title: Growth in Router Capacity
Slide 1: Growth in Router Capacity
IPAM, Lake Arrowhead, October 2003
Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University
nickm_at_stanford.edu
www.stanford.edu/~nickm
Slide 2: Generic Router Architecture
(Figure: per-packet datapath. Header Processing: Lookup IP Address, Update Header; then Queue Packet.)
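A minimal software sketch of this per-packet datapath; the function and parameter names are illustrative, not from the talk, and real routers do this in hardware:

```python
# Hypothetical sketch of the lookup -> update header -> queue datapath.
def forward_packet(pkt, lpm_lookup, output_queues):
    egress_port, next_hop = lpm_lookup(pkt["dst_ip"])   # Lookup IP address
    pkt["ttl"] -= 1                                      # Update header
    if pkt["ttl"] <= 0:
        return None                                      # drop (ICMP handling omitted)
    pkt["next_hop"] = next_hop
    output_queues[egress_port].append(pkt)               # Queue packet
    return egress_port

# Tiny example with a stand-in lookup function and two output queues.
queues = {0: [], 1: []}
table = lambda dst: (dst & 1, "next-hop-a")
print(forward_packet({"dst_ip": 0x0A000001, "ttl": 64}, table, queues))  # -> 1
```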
Slide 3: Generic Router Architecture
(Figure: multiple linecards, each with a Buffer Manager and its Buffer Memory.)
Slide 4: What a High Performance Router Looks Like
- Cisco GSR 12416: capacity 160Gb/s, power 4.2kW, about 6ft tall.
- Juniper M160: capacity 80Gb/s, power 2.6kW, about 3ft tall.
- Both are 19in-wide, 2-2.5ft-deep racks.
Slide 5: Backbone router capacity
(Figure: log-scale plot from 1Gb/s to 1Tb/s. Router capacity per rack doubles every 18 months.)
Slide 6: Backbone router capacity
(Figure: same plot with traffic overlaid. Traffic doubles every year; router capacity per rack doubles every 18 months.)
Slide 7: Extrapolating
(Figure: extrapolation from 1Tb/s to 100Tb/s. Traffic doubles every year, router capacity doubles every 18 months; by 2015 the disparity is 16x.)
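As a check of the 16x figure, assuming the extrapolation runs from 2003 (the talk's date) to 2015, i.e. 12 years:

```latex
\frac{\text{traffic growth}}{\text{capacity growth}}
  = \frac{2^{12/1}}{2^{12/1.5}}
  = \frac{2^{12}}{2^{8}} = 2^{4} = 16
```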
Slide 8: Consequence
- Unless something changes, operators will need:
- 16 times as many routers, consuming
- 16 times as much space,
- 256 times the power,
- costing 100 times as much.
- Actually, they will need even more than that.
Slide 9: What limits router capacity?
(Figure: approximate power consumption per rack.)
Power density is the limiting factor today.
Slide 10: Trend: Multi-rack routers
Reduces power density.
Slide 11: Multi-rack routers
(Figure: Juniper TX8/T640, Alcatel 7670 RSP, Avici TSR, Chiaro.)
Slide 12: Trend: Single-POP routers
- Very high capacity (10Tb/s)
- Line-rates from T1 to OC768
- Reasons:
  - A big multi-rack router is more efficient than many single-rack routers.
  - It is easier to manage fewer routers.
Slide 13: Router linecard
(Figure: OC192c linecard block diagram: Optics, Physical Layer, Framing & Maintenance, Packet Processing, Lookup Tables, Buffer Mgmt & Scheduling, Scheduler.)
- 30M gates
- 2.5Gbits of memory
- 200-300W
- 1m²
- $25k cost, $100k price
- 40-55% of power goes into chip-to-chip serial links
Slide 14: What's hard, what's not
- Line-rate forwarding
  - Line-rate LPM (longest-prefix match) was an issue for a while (a small lookup sketch follows this list).
  - Commercial TCAMs and algorithms are available up to 100Gb/s.
  - 1M prefixes fit in a corner of a 90nm ASIC.
  - 2^32 addresses will fit in a $10 DRAM in 8 years.
- Packet buffering
  - Not a problem up to about 10Gb/s; a big problem above 10Gb/s.
  - More on this later.
- Header processing
  - Not a problem for basic IPv4 operations.
  - If we keep adding functions, it will be a problem.
  - More on this later.
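As a rough illustration of the LPM operation referenced above, here is a binary-trie lookup in software; commercial linecards use TCAMs or compressed-trie ASICs instead, and the names below are ours:

```python
# Illustrative longest-prefix-match (LPM) over IPv4 addresses as 32-bit ints.
class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]  # one child per address bit
        self.next_hop = None          # set if a prefix ends here

def add_prefix(root, prefix, length, next_hop):
    node = root
    for i in range(length):
        bit = (prefix >> (31 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]
    node.next_hop = next_hop

def lookup(root, addr):
    """Return the next hop of the longest matching prefix, or None."""
    node, best = root, None
    for i in range(32):
        if node.next_hop is not None:
            best = node.next_hop
        bit = (addr >> (31 - i)) & 1
        if node.children[bit] is None:
            return best
        node = node.children[bit]
    return node.next_hop if node.next_hop is not None else best

# Example: 10.0.0.0/8 -> port 1, 10.1.0.0/16 -> port 2
root = TrieNode()
add_prefix(root, 0x0A000000, 8, 1)
add_prefix(root, 0x0A010000, 16, 2)
assert lookup(root, 0x0A010203) == 2   # 10.1.2.3 matches the /16
assert lookup(root, 0x0A020304) == 1   # 10.2.3.4 falls back to the /8
```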
Slide 15: What's hard, what's not (2)
- Switching
  - If throughput doesn't matter:
    - Easy: lots of multistage, distributed, or load-balanced switch fabrics.
  - If throughput matters:
    - Use a crossbar, VOQs, and a centralized scheduler (a toy matching sketch follows this list), or
    - a multistage fabric and lots of speedup.
  - If a throughput guarantee is required:
    - Maximal matching, VOQs, and a speedup of two [Dai & Prabhakar '00], or
    - a load-balanced 2-stage switch [Chang '01, Sigcomm '03].
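To make the crossbar-with-VOQs bullet concrete, here is a toy greedy maximal-matching scheduler over VOQ occupancies; this is illustrative only, as production schedulers (e.g. iSLIP) are iterative, distributed, and implemented in hardware:

```python
# Greedy maximal matching: consider the longest VOQs first and add any
# (input, output) edge whose endpoints are still free.
def maximal_match(voq_len):
    """voq_len[i][j] = packets queued at input i for output j."""
    n = len(voq_len)
    free_in, free_out = set(range(n)), set(range(n))
    match = {}
    edges = sorted(((voq_len[i][j], i, j)
                    for i in range(n) for j in range(n)
                    if voq_len[i][j] > 0), reverse=True)
    for _, i, j in edges:
        if i in free_in and j in free_out:
            match[i] = j
            free_in.discard(i)
            free_out.discard(j)
    return match

# Example: 3x3 switch; each matched input sends one packet this time slot.
voqs = [[2, 0, 1],
        [0, 3, 0],
        [1, 0, 0]]
print(maximal_match(voqs))   # -> {1: 1, 0: 0}
```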
Slide 16: What's hard
- Packet buffers above 10Gb/s
- Extra processing on the datapath
- Switching with throughput guarantees
Slide 17: Packet Buffering Problem
Packet buffers for a 160Gb/s router linecard.
(Figure: Buffer Manager in front of 40Gbits of Buffer Memory.)
- The problem is solved if a single memory can be randomly accessed every 3.2ns and store 40Gb of data (see the arithmetic below).
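Both numbers follow from standard sizing rules, assuming a 0.25s round-trip time and 64-byte minimum-size packets (assumptions on our part; the slide states only the results):

```latex
% Buffer size: bandwidth-delay product with RTT \approx 0.25\,s (assumed)
B = \mathrm{RTT}\times R = 0.25\,\mathrm{s}\times 160\,\mathrm{Gb/s} = 40\,\mathrm{Gbits},
\qquad
% Access time: one minimum-size (64B, assumed) packet per memory access
T = \frac{64\times 8\ \mathrm{bits}}{160\,\mathrm{Gb/s}} = 3.2\,\mathrm{ns}.
```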
Slide 18: Memory Technology
- Use SRAM?
  - Fast enough random access time, but
  - too low density to store 40Gbits of data.
- Use DRAM?
  - High density means we can store the data, but
  - can't meet the random access time.
Slide 19: Can't we just use lots of DRAMs in parallel?
Read/write packets in larger blocks.
(Figure: Buffer Manager striping 1280B blocks across several DRAM Buffer Memories; write rate R, one 128B packet every 6.4ns; reads are driven by scheduler requests.)
Slide 20: Works fine if there is only one FIFO
(Figure: Buffer Manager with on-chip SRAM between the line, at rate R with one 128B packet every 6.4ns in and out, and the DRAMs, which are accessed in 1280B blocks; reads are driven by scheduler requests.)
Aggregate 1280B for the queue in fast SRAM, then read and write all DRAMs in parallel.
Slide 21: In practice, the buffer holds many FIFOs
(Figure: Q queues, numbered 1, 2, ..., Q, each striped across the DRAMs in 1280B blocks; reads are driven by scheduler requests.)
- e.g.
  - In an IP router, Q might be 200.
  - In an ATM switch, Q might be 10^6.
- How can we write multiple packets into different queues?
Slide 22: Parallel Packet Buffer: Hybrid Memory Hierarchy
(Figure: arriving packets at rate R enter the Buffer Manager, an ASIC with on-chip SRAM; a large DRAM memory holds the body of the FIFOs; b bytes are written to and read from DRAM at a time, where b is the degree of parallelism; departing packets leave at rate R in response to scheduler requests.)
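A minimal sketch of the write path of this hierarchy, assuming the tail of each FIFO is accumulated in SRAM and spilled to DRAM only in b-byte blocks; head caching, the replenishment algorithm, and all timing are omitted, and the class and variable names are ours:

```python
from collections import deque

class HybridQueue:
    """Illustrative tail cache for one FIFO in the SRAM/DRAM hierarchy."""
    def __init__(self, b):
        self.b = b                     # DRAM block size (degree of parallelism)
        self.sram_tail = bytearray()   # on-chip tail cache for this queue
        self.dram_blocks = deque()     # b-byte blocks stored in (slow) DRAM

    def enqueue(self, pkt: bytes):
        self.sram_tail += pkt
        # Spill a full block to DRAM whenever b bytes have accumulated, so the
        # DRAM sees one large transfer instead of many small random accesses.
        while len(self.sram_tail) >= self.b:
            self.dram_blocks.append(bytes(self.sram_tail[:self.b]))
            del self.sram_tail[:self.b]

# Example with b = 1280 bytes, as in the figure.
q = HybridQueue(b=1280)
for _ in range(12):
    q.enqueue(b"\x00" * 128)                      # twelve 128B packets
print(len(q.dram_blocks), len(q.sram_tail))       # -> 1 block in DRAM, 256B in SRAM
```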
Slide 23: Problem
- Problem:
  - What is the minimum size of SRAM needed so that every packet is available immediately, or within a fixed latency?
- Solutions (worked numbers below):
  - Qb(2 + ln Q) bytes, for zero latency.
  - Q(b - 1) bytes, for a pipeline Q(b - 1) + 1 deep.
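Plugging in the Q = 1000, b = 10 example used on the next slide (the arithmetic is ours, as a sanity check of the two expressions above):

```python
import math

Q, b = 1000, 10
zero_latency = Q * b * (2 + math.log(Q))   # Qb(2 + ln Q) bytes of SRAM
pipelined    = Q * (b - 1)                 # Q(b - 1) bytes, with a Q(b-1)+1 deep pipeline

print(round(zero_latency))   # ~89,078 bytes for zero latency
print(pipelined)             # 9,000 bytes if a deep pipeline is acceptable
```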
Slide 24: Discussion: Q = 1000, b = 10
(Figure: SRAM size vs. pipeline latency x, showing the queue length needed for zero latency and for maximum latency.)
Slide 25: Why it's interesting
- This is a problem faced by every linecard, network switch, and network processor starting at 10Gb/s.
- All commercial routers use an ad-hoc memory management algorithm with no guarantees.
- We have the only (and optimal) solution that is guaranteed to work for all traffic patterns.
Slide 26: What's hard
- Packet buffers above 10Gb/s
- Extra processing on the datapath
- Switching with throughput guarantees
Slide 27: Recent trends
- Line capacity: 2x / 7 months
- User traffic: 2x / 12 months
- Moore's Law: 2x / 18 months
- DRAM random access time: 1.1x / 18 months
Slide 28: Packet processing gets harder
(Figure: instructions per arriving byte vs. time.)
Slide 29: Packet processing gets harder
(Figure: clock cycles per minimum-length packet, since 1996.)
Slide 30: What's hard
- Packet buffers above 10Gb/s
- Extra processing on the datapath
- Switching with throughput guarantees
Slide 31: Potted history
- Karol et al., 1987: Throughput limited to 58% (2 − √2) by head-of-line blocking, for Bernoulli IID uniform traffic.
- Tamir, 1989: Observed that with Virtual Output Queues (VOQs), head-of-line blocking is reduced and throughput goes up.
Slide 32: Potted history
- Anderson et al., 1993: Observed the analogy to maximum-size matching in a bipartite graph.
- McKeown et al., 1995: (a) A maximum-size match cannot guarantee 100% throughput. (b) But a maximum-weight match can, at O(N^3).
- Mekkittikul and McKeown, 1998: A carefully picked maximum-size match can give 100% throughput. Matching is O(N^2.5).
Slide 33: Potted history: Speedup
- Chuang, Goel et al., 1997: Precise emulation of a central shared-memory switch is possible with a speedup of two and a stable-marriage scheduling algorithm.
- Prabhakar and Dai, 2000: 100% throughput is possible for maximal matching with a speedup of two.
Slide 34: Potted history: Newer approaches
- Tassiulas, 1998: 100% throughput is possible for a simple randomized algorithm with memory.
- Giaccone et al., 2001: Apsara algorithms.
- Iyer and McKeown, 2000: Parallel switches can achieve 100% throughput and emulate an output-queued switch.
- Chang et al., 2000; Keslassy et al., Sigcomm 2003: A 2-stage switch with no scheduler can give 100% throughput.
- Iyer, Zhang and McKeown, 2002: Distributed shared-memory switches can emulate an output-queued switch.
Slide 35: Basic Switch Model
(Figure: N x N input-queued switch with arrivals A_i(n), per-VOQ arrivals A_ij(n) and occupancies L_ij(n), switch configuration S(n), and departures D_j(n) at time slot n.)
Slide 36: Some definitions of throughput
Slide 37: Scheduling algorithms to achieve 100% throughput
- When traffic is uniform: many algorithms.
- When traffic is non-uniform but the traffic matrix is known:
  - Technique: Birkhoff-von Neumann decomposition (a small sketch follows this list).
- When the matrix is not known:
  - Technique: Lyapunov function.
- When the algorithm is pipelined, or information is incomplete:
  - Technique: Lyapunov function.
- When the algorithm does not complete:
  - Technique: Randomized algorithm.
- When there is speedup:
  - Technique: Fluid model.
- When there is no algorithm:
  - Technique: 2-stage load-balancing switch.
  - Technique: Parallel Packet Switch.
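As a naive illustration of the Birkhoff-von Neumann technique named above: a doubly stochastic rate matrix is decomposed into a weighted sum of permutation matrices, and the switch then time-shares those permutations in proportion to the weights. This version (augmenting-path matchings, greedy weights) is ours and is not an efficient hardware algorithm:

```python
def perm_in_support(M, eps=1e-12):
    """Find a permutation (input -> output) using only entries M[i][j] > eps."""
    n = len(M)
    match_out = [-1] * n          # match_out[j] = input currently matched to output j

    def augment(i, seen):
        for j in range(n):
            if M[i][j] > eps and j not in seen:
                seen.add(j)
                if match_out[j] == -1 or augment(match_out[j], seen):
                    match_out[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, set()):
            return None           # no full permutation in the support
    return {match_out[j]: j for j in range(n)}

def bvn_decompose(M):
    """Return a list of (weight, permutation) pairs that sum back to M."""
    M = [row[:] for row in M]
    terms = []
    while True:
        perm = perm_in_support(M)
        if perm is None:
            break
        w = min(M[i][j] for i, j in perm.items())   # largest weight this perm can carry
        for i, j in perm.items():
            M[i][j] -= w
        terms.append((w, perm))
    return terms

# Example: a 3x3 doubly stochastic rate matrix.
R = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
for w, p in bvn_decompose(R):
    print(w, p)   # two permutations, each carrying rate 0.5
```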
Slide 38: Outline
Slide 39: Throughput results
- Theory: Input Queueing (IQ): 58% [Karol, 1987].
- Practice: Input Queueing (IQ): various heuristics, distributed algorithms, and amounts of speedup.
Slide 40: Trends in Switching
- Fastest centralized scheduler with a throughput guarantee: 1Tb/s.
- Complexity scales as O(n^2).
- Capacity grows << 2x every 18 months.
- Hence load-balanced switches.
Slide 41: Stanford 100Tb/s Internet Router
- Goal: study scalability.
- Challenging, but not impossible.
- Two orders of magnitude faster than deployed routers.
- We will build components to show feasibility.
Slide 42: Question
- Can we use an optical fabric at 100Tb/s with 100% throughput?
- Conventional answer: No.
  - The switch would need to be reconfigured too often.
  - 100% throughput requires a complex electronic scheduler.
Slide 43: Two-stage load-balancing switch
(Figure: N inputs and outputs at line rate R. A load-balancing stage connects each input to every intermediate port at rate R/N, and a switching stage connects each intermediate port to every output at rate R/N.)
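A toy slot-by-slot model of the two stages, assuming each stage simply cycles through the N cyclic-shift permutations (one standard way to realize the R/N rates); VOQ handling, packet reordering, and the throughput argument are omitted, and the point is only that no scheduler is needed:

```python
N = 3

def stage_permutation(t):
    """At time slot t, port i is connected to port (i + t) mod N."""
    return {i: (i + t) % N for i in range(N)}

# Over any N consecutive slots, each input reaches every intermediate port
# (stage 1) and each intermediate port reaches every output (stage 2)
# exactly once, i.e. every input-output path gets rate R/N.
for t in range(N):
    s1 = stage_permutation(t)   # load-balancing stage
    s2 = stage_permutation(t)   # switching stage
    print(f"slot {t}: stage1 {s1}  stage2 {s2}")
```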
Slide 44: (Figure: packet-by-packet example of traffic being spread through the two-stage switch.)
Slide 45: (Figure: the packet-flow example through the two-stage switch, continued.)
Slide 46: Chang's load-balanced switch: Good properties
- 100% throughput for a broad class of traffic.
- No scheduler needed → scalable.
Slide 47: Chang's load-balanced switch: Bad properties
- Packet mis-sequencing.
- Pathological traffic patterns → throughput 1/N-th of capacity.
- Uses two switch fabrics → hard to package.
- Doesn't work with some linecards missing → impractical.
Slide 48: 100Tb/s Load-Balanced Router
(Figure: L = 16, 160Gb/s linecards.)
Slide 49: Summary of trends
- Multi-rack routers.
- Single-router POPs.
- No commercial router provides a 100% throughput guarantee.
- Address lookups
  - Not a problem up to 160Gb/s per linecard.
- Packet buffering
  - Imperfect: loss of throughput above 10Gb/s.
- Switching
  - Centralized schedulers up to about 1Tb/s.
  - Load-balanced 2-stage switches with 100% throughput.