Title: Growth in Router Capacity
Slide 1: Growth in Router Capacity
IPAM, Lake Arrowhead, October 2003
Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University
nickm_at_stanford.edu
www.stanford.edu/~nickm
Slide 2: Generic Router Architecture
(Figure: per-packet datapath. Header Processing: Lookup IP Address, Update Header; then Queue Packet.)
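A minimal software sketch of this per-packet datapath; the function and parameter names are illustrative, not from the talk, and real routers do this in hardware:

```python
# Hypothetical sketch of the lookup -> update header -> queue datapath.
def forward_packet(pkt, lpm_lookup, output_queues):
    egress_port, next_hop = lpm_lookup(pkt["dst_ip"])   # Lookup IP address
    pkt["ttl"] -= 1                                      # Update header
    if pkt["ttl"] <= 0:
        return None                                      # drop (ICMP handling omitted)
    pkt["next_hop"] = next_hop
    output_queues[egress_port].append(pkt)               # Queue packet
    return egress_port

# Tiny example with a stand-in lookup function and two output queues.
queues = {0: [], 1: []}
table = lambda dst: (dst & 1, "next-hop-a")
print(forward_packet({"dst_ip": 0x0A000001, "ttl": 64}, table, queues))  # -> 1
```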
Slide 3: Generic Router Architecture
(Figure: multiple linecards, each with a Buffer Manager and its Buffer Memory.)
Slide 4: What a High Performance Router Looks Like
- Cisco GSR 12416: capacity 160Gb/s, power 4.2kW, about 6ft tall.
- Juniper M160: capacity 80Gb/s, power 2.6kW, about 3ft tall.
- Both are 19in-wide, 2-2.5ft-deep racks.
Slide 5: Backbone router capacity
(Figure: log-scale plot from 1Gb/s to 1Tb/s. Router capacity per rack doubles every 18 months.)
Slide 6: Backbone router capacity
(Figure: same plot with traffic overlaid. Traffic doubles every year; router capacity per rack doubles every 18 months.)
Slide 7: Extrapolating
(Figure: extrapolation from 1Tb/s to 100Tb/s. Traffic doubles every year, router capacity doubles every 18 months; by 2015 the disparity is 16x.)
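As a check of the 16x figure, assuming the extrapolation runs from 2003 (the talk's date) to 2015, i.e. 12 years:

```latex
\frac{\text{traffic growth}}{\text{capacity growth}}
  = \frac{2^{12/1}}{2^{12/1.5}}
  = \frac{2^{12}}{2^{8}} = 2^{4} = 16
```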
Slide 8: Consequence
- Unless something changes, operators will need:
- 16 times as many routers, consuming
- 16 times as much space,
- 256 times the power,
- costing 100 times as much.
- Actually, they will need even more than that.
Slide 9: What limits router capacity?
(Figure: approximate power consumption per rack.)
Power density is the limiting factor today.
Slide 10: Trend: Multi-rack routers
Reduces power density.
Slide 11: Multi-rack routers
(Figure: Juniper TX8/T640, Alcatel 7670 RSP, Avici TSR, Chiaro.)
Slide 12: Trend: Single-POP routers
- Very high capacity (10Tb/s)
- Line-rates from T1 to OC768
- Reasons:
  - A big multi-rack router is more efficient than many single-rack routers.
  - It is easier to manage fewer routers.
Slide 13: Router linecard
(Figure: OC192c linecard block diagram: Optics, Physical Layer, Framing & Maintenance, Packet Processing, Lookup Tables, Buffer Mgmt & Scheduling, Scheduler.)
- 30M gates
- 2.5Gbits of memory
- 200-300W
- 1m²
- $25k cost, $100k price
- 40-55% of power goes into chip-to-chip serial links
Slide 14: What's hard, what's not
- Line-rate forwarding
  - Line-rate LPM (longest-prefix match) was an issue for a while (a small lookup sketch follows this list).
  - Commercial TCAMs and algorithms are available up to 100Gb/s.
  - 1M prefixes fit in a corner of a 90nm ASIC.
  - 2^32 addresses will fit in a $10 DRAM in 8 years.
- Packet buffering
  - Not a problem up to about 10Gb/s; a big problem above 10Gb/s.
  - More on this later.
- Header processing
  - Not a problem for basic IPv4 operations.
  - If we keep adding functions, it will be a problem.
  - More on this later.
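As a rough illustration of the LPM operation referenced above, here is a binary-trie lookup in software; commercial linecards use TCAMs or compressed-trie ASICs instead, and the names below are ours:

```python
# Illustrative longest-prefix-match (LPM) over IPv4 addresses as 32-bit ints.
class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]  # one child per address bit
        self.next_hop = None          # set if a prefix ends here

def add_prefix(root, prefix, length, next_hop):
    node = root
    for i in range(length):
        bit = (prefix >> (31 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]
    node.next_hop = next_hop

def lookup(root, addr):
    """Return the next hop of the longest matching prefix, or None."""
    node, best = root, None
    for i in range(32):
        if node.next_hop is not None:
            best = node.next_hop
        bit = (addr >> (31 - i)) & 1
        if node.children[bit] is None:
            return best
        node = node.children[bit]
    return node.next_hop if node.next_hop is not None else best

# Example: 10.0.0.0/8 -> port 1, 10.1.0.0/16 -> port 2
root = TrieNode()
add_prefix(root, 0x0A000000, 8, 1)
add_prefix(root, 0x0A010000, 16, 2)
assert lookup(root, 0x0A010203) == 2   # 10.1.2.3 matches the /16
assert lookup(root, 0x0A020304) == 1   # 10.2.3.4 falls back to the /8
```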
Slide 15: What's hard, what's not (2)
- Switching
  - If throughput doesn't matter:
    - Easy: lots of multistage, distributed, or load-balanced switch fabrics.
  - If throughput matters:
    - Use a crossbar, VOQs, and a centralized scheduler (a toy matching sketch follows this list), or
    - a multistage fabric and lots of speedup.
  - If a throughput guarantee is required:
    - Maximal matching, VOQs, and a speedup of two [Dai & Prabhakar '00], or
    - a load-balanced 2-stage switch [Chang '01, Sigcomm '03].
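To make the crossbar-with-VOQs bullet concrete, here is a toy greedy maximal-matching scheduler over VOQ occupancies; this is illustrative only, as production schedulers (e.g. iSLIP) are iterative, distributed, and implemented in hardware:

```python
# Greedy maximal matching: consider the longest VOQs first and add any
# (input, output) edge whose endpoints are still free.
def maximal_match(voq_len):
    """voq_len[i][j] = packets queued at input i for output j."""
    n = len(voq_len)
    free_in, free_out = set(range(n)), set(range(n))
    match = {}
    edges = sorted(((voq_len[i][j], i, j)
                    for i in range(n) for j in range(n)
                    if voq_len[i][j] > 0), reverse=True)
    for _, i, j in edges:
        if i in free_in and j in free_out:
            match[i] = j
            free_in.discard(i)
            free_out.discard(j)
    return match

# Example: 3x3 switch; each matched input sends one packet this time slot.
voqs = [[2, 0, 1],
        [0, 3, 0],
        [1, 0, 0]]
print(maximal_match(voqs))   # -> {1: 1, 0: 0}
```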
Slide 16: What's hard
- Packet buffers above 10Gb/s
- Extra processing on the datapath
- Switching with throughput guarantees
Slide 17: Packet Buffering Problem
Packet buffers for a 160Gb/s router linecard.
(Figure: Buffer Manager in front of 40Gbits of Buffer Memory.)
- The problem is solved if a single memory can be randomly accessed every 3.2ns and store 40Gb of data (see the arithmetic below).
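Both numbers follow from standard sizing rules, assuming a 0.25s round-trip time and 64-byte minimum-size packets (assumptions on our part; the slide states only the results):

```latex
% Buffer size: bandwidth-delay product with RTT \approx 0.25\,s (assumed)
B = \mathrm{RTT}\times R = 0.25\,\mathrm{s}\times 160\,\mathrm{Gb/s} = 40\,\mathrm{Gbits},
\qquad
% Access time: one minimum-size (64B, assumed) packet per memory access
T = \frac{64\times 8\ \mathrm{bits}}{160\,\mathrm{Gb/s}} = 3.2\,\mathrm{ns}.
```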
Slide 18: Memory Technology
- Use SRAM?
  - Fast enough random access time, but
  - too low density to store 40Gbits of data.
- Use DRAM?
  - High density means we can store the data, but
  - can't meet the random access time.
Slide 19: Can't we just use lots of DRAMs in parallel?
Read/write packets in larger blocks.
(Figure: Buffer Manager striping 1280B blocks across several DRAM Buffer Memories; write rate R, one 128B packet every 6.4ns; reads are driven by scheduler requests.)
Slide 20: Works fine if there is only one FIFO
(Figure: Buffer Manager with on-chip SRAM between the line, at rate R with one 128B packet every 6.4ns in and out, and the DRAMs, which are accessed in 1280B blocks; reads are driven by scheduler requests.)
Aggregate 1280B for the queue in fast SRAM, then read and write all DRAMs in parallel.
Slide 21: In practice, the buffer holds many FIFOs
(Figure: Q queues, numbered 1, 2, ..., Q, each striped across the DRAMs in 1280B blocks; reads are driven by scheduler requests.)
- e.g.
  - In an IP router, Q might be 200.
  - In an ATM switch, Q might be 10^6.
- How can we write multiple packets into different queues?
Slide 22: Parallel Packet Buffer: Hybrid Memory Hierarchy
(Figure: arriving packets at rate R enter the Buffer Manager, an ASIC with on-chip SRAM; a large DRAM memory holds the body of the FIFOs; b bytes are written to and read from DRAM at a time, where b is the degree of parallelism; departing packets leave at rate R in response to scheduler requests.)
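A minimal sketch of the write path of this hierarchy, assuming the tail of each FIFO is accumulated in SRAM and spilled to DRAM only in b-byte blocks; head caching, the replenishment algorithm, and all timing are omitted, and the class and variable names are ours:

```python
from collections import deque

class HybridQueue:
    """Illustrative tail cache for one FIFO in the SRAM/DRAM hierarchy."""
    def __init__(self, b):
        self.b = b                     # DRAM block size (degree of parallelism)
        self.sram_tail = bytearray()   # on-chip tail cache for this queue
        self.dram_blocks = deque()     # b-byte blocks stored in (slow) DRAM

    def enqueue(self, pkt: bytes):
        self.sram_tail += pkt
        # Spill a full block to DRAM whenever b bytes have accumulated, so the
        # DRAM sees one large transfer instead of many small random accesses.
        while len(self.sram_tail) >= self.b:
            self.dram_blocks.append(bytes(self.sram_tail[:self.b]))
            del self.sram_tail[:self.b]

# Example with b = 1280 bytes, as in the figure.
q = HybridQueue(b=1280)
for _ in range(12):
    q.enqueue(b"\x00" * 128)                      # twelve 128B packets
print(len(q.dram_blocks), len(q.sram_tail))       # -> 1 block in DRAM, 256B in SRAM
```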
Slide 23: Problem
- Problem:
  - What is the minimum size of SRAM needed so that every packet is available immediately, or within a fixed latency?
- Solutions (worked numbers below):
  - Qb(2 + ln Q) bytes, for zero latency.
  - Q(b - 1) bytes, for a pipeline Q(b - 1) + 1 deep.
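Plugging in the Q = 1000, b = 10 example used on the next slide (the arithmetic is ours, as a sanity check of the two expressions above):

```python
import math

Q, b = 1000, 10
zero_latency = Q * b * (2 + math.log(Q))   # Qb(2 + ln Q) bytes of SRAM
pipelined    = Q * (b - 1)                 # Q(b - 1) bytes, with a Q(b-1)+1 deep pipeline

print(round(zero_latency))   # ~89,078 bytes for zero latency
print(pipelined)             # 9,000 bytes if a deep pipeline is acceptable
```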
Slide 24: Discussion: Q = 1000, b = 10
(Figure: SRAM size vs. pipeline latency x, showing the queue length needed for zero latency and for maximum latency.)
Slide 25: Why it's interesting
- This is a problem faced by every linecard, network switch, and network processor starting at 10Gb/s.
- All commercial routers use an ad-hoc memory management algorithm with no guarantees.
- We have the only (and optimal) solution that is guaranteed to work for all traffic patterns.
Slide 26: What's hard
- Packet buffers above 10Gb/s
- Extra processing on the datapath
- Switching with throughput guarantees
Slide 27: Recent trends
- Line capacity: 2x / 7 months
- User traffic: 2x / 12 months
- Moore's Law: 2x / 18 months
- DRAM random access time: 1.1x / 18 months
Slide 28: Packet processing gets harder
(Figure: instructions per arriving byte vs. time.)
Slide 29: Packet processing gets harder
(Figure: clock cycles per minimum-length packet, since 1996.)
Slide 30: What's hard
- Packet buffers above 10Gb/s
- Extra processing on the datapath
- Switching with throughput guarantees
Slide 31: Potted history
- Karol et al., 1987: Throughput limited to 58% (2 − √2) by head-of-line blocking, for Bernoulli IID uniform traffic.
- Tamir, 1989: Observed that with Virtual Output Queues (VOQs), head-of-line blocking is reduced and throughput goes up.
Slide 32: Potted history
- Anderson et al., 1993: Observed the analogy to maximum-size matching in a bipartite graph.
- McKeown et al., 1995: (a) A maximum-size match cannot guarantee 100% throughput. (b) But a maximum-weight match can, at O(N^3).
- Mekkittikul and McKeown, 1998: A carefully picked maximum-size match can give 100% throughput. Matching is O(N^2.5).
Slide 33: Potted history: Speedup
- Chuang, Goel et al., 1997: Precise emulation of a central shared-memory switch is possible with a speedup of two and a stable-marriage scheduling algorithm.
- Prabhakar and Dai, 2000: 100% throughput is possible for maximal matching with a speedup of two.
Slide 34: Potted history: Newer approaches
- Tassiulas, 1998: 100% throughput is possible for a simple randomized algorithm with memory.
- Giaccone et al., 2001: Apsara algorithms.
- Iyer and McKeown, 2000: Parallel switches can achieve 100% throughput and emulate an output-queued switch.
- Chang et al., 2000; Keslassy et al., Sigcomm 2003: A 2-stage switch with no scheduler can give 100% throughput.
- Iyer, Zhang and McKeown, 2002: Distributed shared-memory switches can emulate an output-queued switch.
Slide 35: Basic Switch Model
(Figure: N x N input-queued switch with arrivals A_i(n), per-VOQ arrivals A_ij(n) and occupancies L_ij(n), switch configuration S(n), and departures D_j(n) at time slot n.)
Slide 36: Some definitions of throughput
Slide 37: Scheduling algorithms to achieve 100% throughput
- When traffic is uniform: many algorithms.
- When traffic is non-uniform but the traffic matrix is known:
  - Technique: Birkhoff-von Neumann decomposition (a small sketch follows this list).
- When the matrix is not known:
  - Technique: Lyapunov function.
- When the algorithm is pipelined, or information is incomplete:
  - Technique: Lyapunov function.
- When the algorithm does not complete:
  - Technique: Randomized algorithm.
- When there is speedup:
  - Technique: Fluid model.
- When there is no algorithm:
  - Technique: 2-stage load-balancing switch.
  - Technique: Parallel Packet Switch.
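As a naive illustration of the Birkhoff-von Neumann technique named above: a doubly stochastic rate matrix is decomposed into a weighted sum of permutation matrices, and the switch then time-shares those permutations in proportion to the weights. This version (augmenting-path matchings, greedy weights) is ours and is not an efficient hardware algorithm:

```python
def perm_in_support(M, eps=1e-12):
    """Find a permutation (input -> output) using only entries M[i][j] > eps."""
    n = len(M)
    match_out = [-1] * n          # match_out[j] = input currently matched to output j

    def augment(i, seen):
        for j in range(n):
            if M[i][j] > eps and j not in seen:
                seen.add(j)
                if match_out[j] == -1 or augment(match_out[j], seen):
                    match_out[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, set()):
            return None           # no full permutation in the support
    return {match_out[j]: j for j in range(n)}

def bvn_decompose(M):
    """Return a list of (weight, permutation) pairs that sum back to M."""
    M = [row[:] for row in M]
    terms = []
    while True:
        perm = perm_in_support(M)
        if perm is None:
            break
        w = min(M[i][j] for i, j in perm.items())   # largest weight this perm can carry
        for i, j in perm.items():
            M[i][j] -= w
        terms.append((w, perm))
    return terms

# Example: a 3x3 doubly stochastic rate matrix.
R = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
for w, p in bvn_decompose(R):
    print(w, p)   # two permutations, each carrying rate 0.5
```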
Slide 38: Outline
Slide 39: Throughput results
- Theory: Input Queueing (IQ): 58% [Karol, 1987].
- Practice: Input Queueing (IQ): various heuristics, distributed algorithms, and amounts of speedup.
Slide 40: Trends in Switching
- Fastest centralized scheduler with a throughput guarantee: 1Tb/s.
- Complexity scales as O(n^2).
- Capacity grows << 2x every 18 months.
- Hence load-balanced switches.
Slide 41: Stanford 100Tb/s Internet Router
- Goal: study scalability.
- Challenging, but not impossible.
- Two orders of magnitude faster than deployed routers.
- We will build components to show feasibility.
Slide 42: Question
- Can we use an optical fabric at 100Tb/s with 100% throughput?
- Conventional answer: No.
  - The switch would need to be reconfigured too often.
  - 100% throughput requires a complex electronic scheduler.
Slide 43: Two-stage load-balancing switch
(Figure: N inputs and outputs at line rate R. A load-balancing stage connects each input to every intermediate port at rate R/N, and a switching stage connects each intermediate port to every output at rate R/N.)
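A toy slot-by-slot model of the two stages, assuming each stage simply cycles through the N cyclic-shift permutations (one standard way to realize the R/N rates); VOQ handling, packet reordering, and the throughput argument are omitted, and the point is only that no scheduler is needed:

```python
N = 3

def stage_permutation(t):
    """At time slot t, port i is connected to port (i + t) mod N."""
    return {i: (i + t) % N for i in range(N)}

# Over any N consecutive slots, each input reaches every intermediate port
# (stage 1) and each intermediate port reaches every output (stage 2)
# exactly once, i.e. every input-output path gets rate R/N.
for t in range(N):
    s1 = stage_permutation(t)   # load-balancing stage
    s2 = stage_permutation(t)   # switching stage
    print(f"slot {t}: stage1 {s1}  stage2 {s2}")
```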
Slide 44: (Figure: packet-by-packet example of traffic being spread through the two-stage switch.)
Slide 45: (Figure: the packet-flow example through the two-stage switch, continued.)
Slide 46: Chang's load-balanced switch: Good properties
- 100% throughput for a broad class of traffic.
- No scheduler needed → scalable.
Slide 47: Chang's load-balanced switch: Bad properties
- Packet mis-sequencing.
- Pathological traffic patterns → throughput 1/N-th of capacity.
- Uses two switch fabrics → hard to package.
- Doesn't work with some linecards missing → impractical.
Slide 48: 100Tb/s Load-Balanced Router
(Figure: L = 16, 160Gb/s linecards.)
Slide 49: Summary of trends
- Multi-rack routers.
- Single-router POPs.
- No commercial router provides a 100% throughput guarantee.
- Address lookups
  - Not a problem up to 160Gb/s per linecard.
- Packet buffering
  - Imperfect: loss of throughput above 10Gb/s.
- Switching
  - Centralized schedulers up to about 1Tb/s.
  - Load-balanced 2-stage switches with 100% throughput.