Transcript and Presenter's Notes

Title: Growth in Router Capacity


1
Growth in Router Capacity
IPAM, Lake Arrowhead, October 2003
Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu, www.stanford.edu/~nickm
2
Generic Router Architecture
(Diagram: per-packet datapath: header processing (lookup IP address, update
header), then queue packet.)
3
Generic Router Architecture
(Diagram: multiple linecards, each with its own buffer manager and buffer
memory.)
4
What a High Performance Router Looks Like
(Photos: Cisco GSR 12416, capacity 160Gb/s, power 4.2kW, about 6ft tall and
19in wide; Juniper M160, capacity 80Gb/s, power 2.6kW, about 3ft tall and
19in wide; chassis depths of roughly 2ft and 2.5ft.)
5
Backbone router capacity
(Chart: backbone router capacity per rack on a 1Gb/s to 1Tb/s scale. Router
capacity per rack doubles every 18 months.)
6
Backbone router capacity
(Chart: same axes with traffic growth overlaid. Traffic doubles every year;
router capacity per rack doubles every 18 months.)
7
Extrapolating
(Chart: extrapolation from 1Tb/s to 100Tb/s. Traffic doubles every year,
router capacity every 18 months; by 2015 the disparity is 16x.)
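The 16x figure follows directly from the two growth rates on the chart; a quick sanity check in Python, using nothing beyond the slide's own numbers:

```python
# Rough arithmetic behind the "16x disparity by 2015" claim, using only the
# growth rates stated on the slide (traffic 2x/year, capacity 2x/18 months).
years = 2015 - 2003                       # 12 years from this 2003 talk
traffic_growth = 2 ** (years / 1.0)       # doubles every 12 months -> 2^12 = 4096x
capacity_growth = 2 ** (years / 1.5)      # doubles every 18 months -> 2^8  = 256x
print(traffic_growth / capacity_growth)   # -> 16.0
```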
8
Consequence
  • Unless something changes, operators will need
  • 16 times as many routers, consuming
  • 16 times as much space,
  • 256 times the power,
  • Costing 100 times as much.
  • In practice, operators will need even more than that.

9
What limits router capacity?
Approximate power consumption per rack
Power density is the limiting factor today
10
Trend: Multi-rack routers (reduces power density)
11
(Photos: Juniper TX8/T640, Alcatel 7670 RSP, Avici TSR, Chiaro.)
12
Trend: Single-router POPs
  • Very high capacity (10Tb/s)
  • Line-rates from T1 to OC768
  • Reasons:
  • A big multi-rack router is more efficient than many
    single-rack routers,
  • Easier to manage fewer routers.

13
Router linecard (OC192c)
(Block diagram: optics, physical layer, framing and maintenance, packet
processing with lookup tables, buffer management and scheduling with buffer
and state memory, and a scheduler interface.)
  • 30M gates
  • 2.5Gbits of memory
  • 200-300W
  • 1m²
  • $25k cost, $100k price
  • 40-55% of power in chip-to-chip serial links
14
What's hard, what's not
  • Linerate forwarding
  • Linerate LPM (longest-prefix match) was an issue for a
    while (see the sketch after this list).
  • Commercial TCAMs and algorithms are available up to
    100Gb/s.
  • 1M prefixes fit in a corner of a 90nm ASIC.
  • 2^32 addresses will fit in a $10 DRAM in 8 years.
  • Packet buffering
  • Not a problem up to about 10Gb/s; a big problem
    above 10Gb/s.
  • More on this later
  • Header processing
  • For basic IPv4 operations, not a problem.
  • If we keep adding functions, it will be a
    problem.
  • More on this later

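The commercial implementations the slide refers to use TCAMs or specialized ASIC algorithms; purely as an illustration of what longest-prefix match means, here is a minimal software sketch (the table contents and function name are made up):

```python
import ipaddress

# Toy forwarding table: prefix -> next hop. Entries are illustrative only.
FIB = {
    ipaddress.ip_network("0.0.0.0/0"):      "default-gw",
    ipaddress.ip_network("171.64.0.0/14"):  "port-1",
    ipaddress.ip_network("171.66.0.0/16"):  "port-2",
}

def longest_prefix_match(dst):
    """Return the next hop of the most specific prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in FIB.items():
        if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, next_hop)
    return best[1] if best else None

print(longest_prefix_match("171.66.3.9"))   # -> port-2 (the more specific /16 wins)
print(longest_prefix_match("10.0.0.1"))     # -> default-gw
```

A hardware TCAM does the same "most specific match wins" lookup in a single parallel access rather than by scanning the table.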
15
What's hard, what's not (2)
  • Switching
  • If throughput doesn't matter:
  • Easy: lots of multistage, distributed or
    load-balanced switch fabrics.
  • If throughput matters:
  • Use a crossbar, VOQs and a centralized scheduler,
  • Or a multistage fabric and lots of speedup.
  • If a throughput guarantee is required:
  • Maximal matching, VOQs and a speedup of two [Dai and
    Prabhakar '00], or
  • A load-balanced 2-stage switch [Chang '01; Sigcomm
    '03].

16
What's hard
  • Packet buffers above 10Gb/s
  • Extra processing on the datapath
  • Switching with throughput guarantees

17
Packet Buffering Problem: packet buffers for a
160Gb/s router linecard
(Diagram: a buffer manager in front of a 40Gbit buffer memory.)
  • The problem is solved if a memory can be randomly
    accessed every 3.2ns and can store 40Gb of data
    (see the arithmetic below).

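Where those numbers come from: the 40Gbits follows from the usual buffer = RTT x line-rate rule of thumb (the 0.25s round-trip time is an assumption of mine, chosen because it reproduces the slide's figure), and the access time follows from the 128B transfer size used on the next slides, with one write and one read per packet time:

```python
R    = 160e9         # line rate from the slide, bits per second
RTT  = 0.25          # assumed round-trip time for the RTT x R rule of thumb
CELL = 128 * 8       # 128B minimum transfer, in bits

buffer_bits  = R * RTT            # 40e9 bits = 40Gbits, as on the slide
cell_time_ns = CELL / R * 1e9     # 6.4ns between back-to-back 128B cells
access_ns    = cell_time_ns / 2   # one write + one read per cell -> 3.2ns

print(buffer_bits / 1e9, cell_time_ns, access_ns)   # 40.0 6.4 3.2
```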
18
Memory Technology
  • Use SRAM?
  • Fast enough random access time, but
  • Too low density to store 40Gbits of data.
  • Use DRAM?
  • High density means we can store data, but
  • Can't meet the random access time.

19
Can't we just use lots of DRAMs in parallel?
Read/write packets in larger blocks.
(Diagram: the buffer manager stripes 1280B blocks across several buffer
memories; one 128B packet arrives at write rate R every 6.4ns; the scheduler
issues read requests.)
20
Works fine if there is only one FIFO
(Diagram: the buffer manager, built from on-chip SRAM, sits between the line
and the DRAM buffer memory; one 128B packet arrives and one departs every
6.4ns at rate R; the scheduler issues read requests.)
Aggregate 1280B for the queue in fast SRAM and
read and write to all DRAMs in parallel.
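A minimal sketch of that aggregation idea for the single-FIFO case, assuming the slide's 128B packets and 1280B DRAM blocks (the class and method names are mine, and the read side is simplified; a real design also keeps a head cache):

```python
BLOCK = 1280   # bytes written to/read from the DRAMs in one wide access
PKT   = 128    # bytes per packet cell

class SingleFifoBuffer:
    """Tail of the FIFO is staged in fast SRAM; full 1280B blocks go to DRAM."""
    def __init__(self):
        self.sram_tail = bytearray()   # packets waiting to form a full block
        self.dram = []                 # list of 1280B blocks (stand-in for DRAM)

    def write_packet(self, pkt: bytes):
        self.sram_tail += pkt
        if len(self.sram_tail) >= BLOCK:
            # One wide write touches all the DRAMs in parallel.
            self.dram.append(bytes(self.sram_tail[:BLOCK]))
            del self.sram_tail[:BLOCK]

    def read_block(self) -> bytes:
        # One wide read returns ten packets' worth of data at once.
        return self.dram.pop(0) if self.dram else bytes(self.sram_tail)

buf = SingleFifoBuffer()
for i in range(10):
    buf.write_packet(bytes([i]) * PKT)
print(len(buf.read_block()))   # -> 1280
```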
21
In practice, the buffer holds many FIFOs
(Diagram: Q separate FIFOs, each made up of 1280B blocks; the scheduler
issues read requests.)
  • e.g.
  • In an IP router, Q might be 200.
  • In an ATM switch, Q might be 10^6.

How can we write multiple packets into different
queues?
22
Parallel Packet Buffer: Hybrid Memory Hierarchy
(Diagram: a large DRAM memory holds the body of the FIFOs; the buffer manager
is an ASIC whose on-chip SRAM holds the head and tail of each FIFO.
b = degree of parallelism: b bytes are written to and b bytes are read from
the DRAMs at a time. Packets arrive and depart at rate R; the scheduler
issues read requests.)
23
Problem
  • Problem:
  • What is the minimum size of the SRAM needed so
    that every packet is available immediately, or within
    a fixed latency?
  • Solutions (worked numbers follow this list):
  • Qb(2 + ln Q) bytes, for zero latency
  • Q(b - 1) bytes, for a pipeline Q(b - 1) + 1 deep.

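Plugging in the Q = 1000, b = 10 example discussed on the next slide (and assuming the operators reconstructed above, i.e. Qb(2 + ln Q) and Q(b - 1)):

```python
from math import log

Q, b = 1000, 10   # number of queues and block size in bytes, from the next slide

sram_zero_latency = Q * b * (2 + log(Q))   # ~89,000 bytes of SRAM for zero latency
sram_pipelined    = Q * (b - 1)            # 9,000 bytes with a pipeline...
pipeline_depth    = Q * (b - 1) + 1        # ...that is 9,001 requests deep

print(round(sram_zero_latency), sram_pipelined, pipeline_depth)
```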
24
Discussion: Q = 1000, b = 10
(Chart: SRAM size (queue length) versus pipeline latency x, marking the queue
length needed for zero latency and for the maximum latency.)
25
Why it's interesting
  • This is a problem faced by every linecard,
    network switch and network processor starting at
    10Gb/s.
  • All commercial routers use an ad-hoc memory
    management algorithm with no guarantees.
  • We have the only (and optimal) solution that
    guarantees to work for all traffic patterns.

26
What's hard
  • Packet buffers above 10Gb/s
  • Extra processing on the datapath
  • Switching with throughput guarantees

27
Recent trends
Line capacity: 2x every 7 months
User traffic: 2x every 12 months
Moore's Law: 2x every 18 months
DRAM random access time: 1.1x every 18 months
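The next two charts follow from these rates: the per-byte processing budget shrinks because line capacity grows much faster than logic speed. A rough back-of-the-envelope using only the rates above (the five-year horizon is my own choice, purely for illustration):

```python
# Relative change over t months, using the slide's doubling periods.
def growth(doubling_months, t_months):
    return 2 ** (t_months / doubling_months)

t = 60                                   # look five years ahead (illustrative)
line_capacity = growth(7, t)             # ~380x
moores_law    = growth(18, t)            # ~10x
# Instructions available per arriving byte scale with (logic speed / line rate):
print(moores_law / line_capacity)        # ~0.026, i.e. ~38x fewer instructions per byte
```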
28
Packet processing gets harder
(Chart: instructions per arriving byte versus time.)
29
Packet processing gets harder
(Chart: clock cycles per minimum-length packet, since 1996.)
30
What's hard
  • Packet buffers above 10Gb/s
  • Extra processing on the datapath
  • Switching with throughput guarantees

31
Potted history
  1. Karol et al. 1987: Throughput limited to
    2 - sqrt(2) ≈ 58% by head-of-line blocking for
    Bernoulli IID uniform traffic.
  2. Tamir 1989: Observed that with Virtual Output
    Queues (VOQs), head-of-line blocking is reduced
    and throughput goes up.

32
Potted history
  • Anderson et al. 1993: Observed the analogy to
    maximum size matching in a bipartite graph.
  • McKeown et al. 1995: (a) A maximum size match cannot
    guarantee 100% throughput. (b) But a maximum weight
    match can [O(N^3)].
  • Mekkittikul and McKeown 1998: A carefully picked
    maximum size match can give 100% throughput.

Matching: O(N^2.5)
33
Potted history: Speedup
  • 5. Chuang, Goel et al. 1997: Precise
    emulation of a central shared-memory switch is
    possible with a speedup of two and a stable
    marriage scheduling algorithm.
  • Prabhakar and Dai 2000: 100% throughput is possible
    for maximal matching with a speedup of two.

34
Potted history: Newer approaches
  • Tassiulas 1998: 100% throughput is possible for a
    simple randomized algorithm with memory.
  • Giaccone et al. 2001: Apsara algorithms.
  • Iyer and McKeown 2000: Parallel switches can achieve
    100% throughput and emulate an output-queued
    switch.
  • Chang et al. 2000; Keslassy et al., Sigcomm 2003:
    A 2-stage switch with no scheduler can give 100%
    throughput.
  • Iyer, Zhang and McKeown 2002: Distributed shared
    memory switches can emulate an output-queued
    switch.

35
Basic Switch Model
(Diagram: N inputs and N outputs. A_i(n) denotes arrivals at input i, A_ij(n)
the arrivals at input i destined to output j, L_ij(n) the occupancy of VOQ
(i,j), S(n) the switch configuration at time n, and D_j(n) the departures at
output j.)
36
Some definitions of throughput
37
Scheduling algorithms to achieve 100% throughput
  • When traffic is uniform: (many algorithms)
  • When traffic is non-uniform, but the traffic matrix
    is known:
  • Technique: Birkhoff-von Neumann decomposition
    (a sketch follows this list).
  • When the matrix is not known:
  • Technique: Lyapunov function.
  • When the algorithm is pipelined, or information is
    incomplete:
  • Technique: Lyapunov function.
  • When the algorithm does not complete:
  • Technique: Randomized algorithm.
  • When there is speedup:
  • Technique: Fluid model.
  • When there is no algorithm:
  • Technique: 2-stage load-balancing switch.
  • Technique: Parallel Packet Switch.

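Purely as an illustration of the Birkhoff-von Neumann technique named above: a known, doubly stochastic rate matrix is decomposed into weighted permutations, and the switch then cycles through those permutations in proportion to their weights. A minimal sketch under those assumptions, with helper names of my own; a real scheduler would precompute and quantize the resulting schedule:

```python
import numpy as np

def perfect_matching(support):
    """Kuhn's augmenting-path algorithm on the bipartite graph of positive
    entries. Returns match_col, where match_col[c] is the row matched to
    column c, or None if no perfect matching exists."""
    n = len(support)
    match_col = [-1] * n

    def try_row(r, seen):
        for c in range(n):
            if support[r][c] and not seen[c]:
                seen[c] = True
                if match_col[c] == -1 or try_row(match_col[c], seen):
                    match_col[c] = r
                    return True
        return False

    for r in range(n):
        if not try_row(r, [False] * n):
            return None
    return match_col

def birkhoff_von_neumann(rates, eps=1e-9):
    """Greedy decomposition of a doubly stochastic rate matrix into weighted
    permutations (weight, perm), where perm[i] is the output input i connects to."""
    M = np.array(rates, dtype=float)
    n = M.shape[0]
    schedule = []
    while M.sum() > eps:
        match_col = perfect_matching((M > eps).tolist())
        if match_col is None:      # cannot happen for a truly doubly stochastic matrix
            break
        perm = [0] * n
        for c, r in enumerate(match_col):
            perm[r] = c
        w = min(M[i, perm[i]] for i in range(n))
        schedule.append((w, perm))
        for i in range(n):
            M[i, perm[i]] -= w
    return schedule

# Example: this rate matrix is served by alternating two permutations, half the time each.
rates = [[0.5, 0.5, 0.0],
         [0.5, 0.0, 0.5],
         [0.0, 0.5, 0.5]]
print(birkhoff_von_neumann(rates))   # -> [(0.5, [0, 2, 1]), (0.5, [1, 0, 2])]
```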
38
Outline
39
Throughput results
Theory:   Input Queueing (IQ): 58% [Karol, 1987]
Practice: Input Queueing (IQ): various heuristics, distributed algorithms, and
          amounts of speedup
40
Trends in Switching
  • Fastest centralized scheduler with a throughput
    guarantee: 1Tb/s
  • Complexity scales as O(n^2)
  • Capacity grows << 2x every 18 months
  • Hence load-balanced switches

41
Stanford 100Tb/s Internet Router
  • Goal: Study scalability
  • Challenging, but not impossible
  • Two orders of magnitude faster than deployed
    routers
  • We will build components to show feasibility

42
Question
  • Can we use an optical fabric at 100Tb/s with 100%
    throughput?
  • Conventional answer: No.
  • Need to reconfigure the switch too often, and
  • 100% throughput requires a complex electronic
    scheduler.

43
Two-stage load-balancing switch
(Diagram: N linecards at external rate R. A load-balancing stage spreads each
input's arrivals over all N intermediate inputs at rate R/N per internal
link; a switching stage then connects the intermediate inputs to the outputs,
again at R/N per link.)
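A minimal simulation sketch of the idea under common assumptions: both stages step through a deterministic cyclic shift, packets wait in VOQs at the intermediate stage, and mis-sequencing (noted as a drawback two slides later) is ignored. The names and the uniform random traffic pattern are mine:

```python
from collections import deque
import random

N = 4  # number of linecards

# voq[k][j]: packets held at intermediate input k, destined to output j.
voq = [[deque() for _ in range(N)] for _ in range(N)]
delivered = [[] for _ in range(N)]

for t in range(1000):
    # Stage 1 (load balancing): input i connects to intermediate input
    # (i + t) % N, so each input's traffic is spread evenly at rate R/N.
    for i in range(N):
        dst = random.randrange(N)            # arriving packet's output
        k = (i + t) % N
        voq[k][dst].append((i, dst, t))

    # Stage 2 (switching): intermediate input k connects to output
    # (k + t) % N and sends the head of the matching VOQ, if any.
    for k in range(N):
        out = (k + t) % N
        if voq[k][out]:
            delivered[out].append(voq[k][out].popleft())

print(sum(len(d) for d in delivered), "packets delivered out of", 1000 * N)
```

Note that neither stage needs to look at the queue state to choose its configuration, which is why no central scheduler is required.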
44
(Animation: example packets flowing through the two stages; each input's
arrivals are spread across the intermediate linecards at rate R/N.)
45
(Animation continued: the switching stage then forwards the spread packets to
their outputs at rate R/N.)
46
Chang's load-balanced switch: Good properties
  • 1. 100% throughput for a broad class of traffic
  • 2. No scheduler needed -> scalable

47
Chang's load-balanced switch: Bad properties
  • Packet mis-sequencing
  • Pathological traffic patterns -> throughput of
    1/N-th of capacity
  • Uses two switch fabrics -> hard to package
  • Doesn't work with some linecards missing ->
    impractical

48
100Tb/s Load-Balanced Router
L 16 160Gb/s linecards
49
Summary of trends
  • Multi-rack routers
  • Single-router POPs
  • No commercial router provides a 100% throughput
    guarantee.
  • Address lookups
  • Not a problem up to 160Gb/s per linecard.
  • Packet buffering
  • Imperfect: loss of throughput above 10Gb/s.
  • Switching
  • Centralized schedulers up to about 1Tb/s
  • Load-balanced 2-stage switches with 100%
    throughput.