Title: CS244a: An Introduction to Computer Networks
1 Scaling routers: Where do we go from here?
HPSR, Kobe, Japan, May 28th, 2002
Nick McKeown, Professor of Electrical Engineering
and Computer Science, Stanford University
nickm_at_stanford.edu www.stanford.edu/nickm
2 Router capacity x2.2/18 months
Moore's law x2/18 months
3 Router capacity x2.2/18 months
Moore's law x2/18 months
DRAM access rate x1.1/18 months
4 Router vital statistics
Cisco GSR 12416: 19in wide, 6ft high, 2ft deep. Capacity 160Gb/s, Power 4.2kW
Juniper M160: 19in wide, 3ft high, 2.5ft deep. Capacity 80Gb/s, Power 2.6kW
5 Internet traffic x2/yr
Router capacity x2.2/18 months
Gap between the two: 5x
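As a rough check on how quickly such a gap opens, the two growth rates can be compounded and compared. The sketch below is a minimal illustration in Python. The function names and the horizons printed are assumptions made for this example, not figures from the talk.

```python
import math

TRAFFIC_GROWTH_PER_YEAR = 2.0       # Internet traffic: x2 per year (from the slide)
ROUTER_GROWTH_PER_18_MONTHS = 2.2   # router capacity: x2.2 per 18 months (from the slide)


def router_growth_per_year() -> float:
    """Convert the 18-month router growth factor to an annual factor."""
    return ROUTER_GROWTH_PER_18_MONTHS ** (12.0 / 18.0)


def gap_after(years: float) -> float:
    """Ratio of traffic growth to router capacity growth after `years`."""
    return (TRAFFIC_GROWTH_PER_YEAR / router_growth_per_year()) ** years


def years_until_gap(target: float) -> float:
    """Years until traffic has outgrown router capacity by a factor `target`."""
    annual_ratio = TRAFFIC_GROWTH_PER_YEAR / router_growth_per_year()
    return math.log(target) / math.log(annual_ratio)


if __name__ == "__main__":
    print(f"router growth per year: x{router_growth_per_year():.2f}")
    print(f"gap after 5 years:      x{gap_after(5):.2f}")
    print(f"years until a 5x gap:   {years_until_gap(5):.1f}")
```

Under these two rates alone, traffic pulls ahead of router capacity by a little under 20% per year, so a 5x gap takes on the order of a decade to open.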
6 Fast (large) routers
- Big POPs need big routers
[Figure: POP with smaller routers vs. POP with large routers]
- Interfaces: Price > $200k, Power > 400W
- About 50-60% of interfaces are used for
interconnection within the POP.
- Industry trend is towards large, single router
per POP.
7 Job of router architect
- For a given set of features
8 Mind the gap
- Operators are unlikely to deploy 5 times as many
POPs, or make them 5 times bigger, with 5 times
the power consumption.
- Our options:
- Make routers simple
- Use more parallelism
- Use more optics
10 Make routers simple
- We tell our students that Internet routers are
simple. All routers do is make a forwarding
decision, update a header, then forward packets
to the correct outgoing interface.
- But I don't understand them anymore.
- List of required features is huge and still
growing,
- Software is complex and unreliable,
- Hardware is complex and power-hungry.
11 Router linecard
OC192c linecard
[Figure: linecard block diagram. Labels: Optics; Physical Layer; Framing & Maintenance; Packet Processing; Lookup Tables; Buffer Mgmt & Scheduling (twice); Scheduler]
- 30M gates
- 2.5Gbits of memory
- 1m^2
- $25k cost, $200k price.
12 Things that slow routers down
- 250ms of buffering
- Requires off-chip memory, more board space, pins
and power.
- Multicast
- Affects everything!
- Complicates design, slows deployment.
- Latency bounds
- Limits pipelining.
- Packet sequence
- Limits parallelism.
- Small internal cell size
- Complicates arbitration.
- DiffServ, IntServ, priorities, WFQ etc.
- Others: IPv6, drop policies, VPNs, ACLs, DoS
traceback, measurement, statistics, ...
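The cost of the 250ms buffering requirement can be made concrete with one multiplication: buffer size = line rate x buffering time. The sketch below is an illustration in Python. The function name is hypothetical, and the two line rates are the 10 Gb/s and 160 Gb/s linecards discussed elsewhere in this deck.

```python
def buffer_bits(line_rate_bps: float, buffering_seconds: float = 0.250) -> float:
    """Memory needed to hold `buffering_seconds` of traffic at `line_rate_bps`."""
    return line_rate_bps * buffering_seconds


for rate_gbps in (10, 160):
    gbits = buffer_bits(rate_gbps * 1e9) / 1e9
    print(f"{rate_gbps} Gb/s x 250 ms = {gbits:g} Gbit of buffering")
```

The results, 2.5 Gbit and 40 Gbit, match the memory sizes quoted for the 10 Gb/s and 160 Gb/s linecards. That is far more than fits on-chip, which is why the slide ties this requirement to off-chip memory, board space, pins and power.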
13 An example: Packet processing
[Figure: CPU instructions per minimum-length packet since 1996]
14 Reducing complexity: Conclusion
- Need aggressive reduction in complexity of
routers.
- Get rid of irrelevant requirements and irrational
tests.
- It is not clear who has the right incentive to
make this happen.
- Else, be prepared for core routers to be replaced
by optical circuit switches.
15 Mind the gap
- Operators are unlikely to deploy 5 times as many
POPs, or make them 5 times bigger, with 5 times
the power consumption.
- Our options:
- Make routers simpler
- Use more parallelism
- Use more optics
16 Use more parallelism
- Parallel packet buffers
- Parallel lookups
- Parallel packet switches
- Things that make parallelism hard
- Maintaining packet order,
- Making throughput guarantees,
- Making delay guarantees,
- Latency requirements,
- Multicast.
17 Parallel Packet Switches
[Figure: N external inputs and N external outputs, each at rate R, connected through k parallel routers (Router 1, 2, ..., k). Label: Bufferless]
18 Characteristics
- Advantages
- As k increases, memory bandwidth decreases
- As k increases, lookup/classification rate decreases
- As k increases, routing/classification table size decreases
- With appropriate algorithms
- Packets remain in order,
- 100% throughput,
- Delay guarantees (at least in theory).
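The advantages of a parallel packet switch come from each of the k parallel routers handling only a share of the traffic on a line of rate R. The sketch below illustrates that share in Python. It assumes plain round-robin spreading, which is a simplification chosen for this example: the slide only says "appropriate algorithms", and round-robin on its own does not provide the ordering or delay guarantees listed above. The function names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def spread_round_robin(num_packets: int, k: int) -> Counter:
    """Assign packets to k layers in round-robin order; return packets per layer."""
    layers = cycle(range(k))
    return Counter(next(layers) for _ in range(num_packets))


def per_layer_rate(line_rate_gbps: float, k: int) -> float:
    """Average rate each of the k layers must sustain for one external line."""
    return line_rate_gbps / k


counts = spread_round_robin(num_packets=1000, k=4)
print(dict(counts))
print(per_layer_rate(160, 4), "Gb/s per layer")
```

With k = 4, each layer sees a quarter of the packets, so its memories and lookup engines can run at a quarter of the external line rate.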
19 Mind the gap
- Operators are unlikely to deploy 5 times as many
POPs, or make them 5 times bigger, with 5 times
the power consumption.
- Our options:
- Make routers simpler
- Use more parallelism
- Use more optics
20 All-optical routers don't make sense
- A router is a packet-switch, and so requires
- A switch fabric,
- Per-packet address lookup,
- Large buffers for times of congestion.
- Packet processing/buffering infeasible with
optics
- A typical 10 Gb/s router linecard has 30 Mgates
and 2.5 Gbits of memory.
- Research Problem
- How to optimize the architecture of a router that
uses an optical switch fabric?
21 100Tb/s optical router: Stanford University
Research Project
- Collaboration
- 4 Professors at Stanford (Mark Horowitz, Nick
McKeown, David Miller and Olav Solgaard), and our
groups.
- Objective
- To determine the best way to incorporate optics
into routers.
- Push technology hard to expose new issues.
- Photonics, Electronics, System design
- Motivating example: The design of a 100 Tb/s
Internet router
- Challenging but not impossible (100x current
commercial systems)
- It identifies some interesting research problems
22 100Tb/s optical router
[Figure: Electronic Linecard 1 through Electronic Linecard 625 (each: line termination, IP packet processing, packet buffering) connected to an Optical Switch. Link labels: 40Gb/s, 160Gb/s, 160-320Gb/s. Arbitration: Request, Grant.]
(100Tb/s = 625 x 160Gb/s)
23 Research Problems
- Linecard
- Memory bottleneck: Address lookup and packet
buffering.
- Architecture
- Arbitration: Computation complexity.
- Switch Fabric
- Optics: Fabric scalability and speed,
- Electronics: Switch control and link electronics,
- Packaging: Three surface problem.
24 160Gb/s Linecard: Packet Buffering
[Figure: Queue Manager with SRAM, backed by three DRAMs; 160 Gb/s in, 160 Gb/s out]
- Problem
- Packet buffer needs density of DRAM (40 Gbits)
and speed of SRAM (2ns per packet)
- Solution
- Hybrid solution uses on-chip SRAM and off-chip
DRAM.
- Identified optimal algorithms that minimize size
of SRAM (12 Mbits).
- Precisely emulates behavior of 40 Gbit, 2ns SRAM.
klamath.stanford.edu/nickm/papers/ieeehpsr2001.pdf
25 The Arbitration Problem
- A packet switch fabric is reconfigured for every
packet transfer.
- At 160Gb/s, a new IP packet can arrive every 2ns.
- The configuration is picked to maximize
throughput and not waste capacity.
- Known algorithms are too slow.
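The 2ns figure follows from the time a minimum-length packet occupies the line. The sketch below shows the arithmetic in Python. It assumes a 40-byte minimum-length IP packet, which is not stated on the slide, and the function name is hypothetical.

```python
MIN_PACKET_BYTES = 40  # assumption: minimum-length IP packet (e.g. a TCP ACK)


def packet_time_ns(packet_bytes: int, line_rate_gbps: float) -> float:
    """Time one packet occupies the line, in ns (bits divided by Gb/s gives ns)."""
    return packet_bytes * 8 / line_rate_gbps


for rate_gbps in (10, 160):
    t = packet_time_ns(MIN_PACKET_BYTES, rate_gbps)
    print(f"{rate_gbps} Gb/s: one {MIN_PACKET_BYTES}-byte packet every {t:g} ns")
```

At 160 Gb/s this gives 2 ns per packet, so a scheduler that picks a new fabric configuration per packet has to produce about 500 million decisions per second.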
26 Approach
- We know that a crossbar with VOQs, and uniform
Bernoulli i.i.d. arrivals, gives 100% throughput
for the following scheduling algorithms:
- Pick a permutation uniformly at random (u.a.r.)
from all permutations.
- Pick a permutation u.a.r. from the set of size N in
which each input-output pair (i,j) is connected
exactly once in the set.
- From the same set as above, repeatedly cycle
through a fixed sequence of N different
permutations.
- Can we make non-uniform, bursty traffic uniform
enough for the above to hold?
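One simple set of N permutations in which each input-output pair is connected exactly once is the set of cyclic shifts: in permutation t, input i is connected to output (i + t) mod N. The sketch below builds that set and checks the property. The cyclic-shift choice and the function names are assumptions made for illustration; the slide does not say which set is used.

```python
from collections import Counter
from typing import List


def cyclic_permutations(n: int) -> List[List[int]]:
    """Return n permutations; in permutation t, input i connects to output (i + t) % n."""
    return [[(i + t) % n for i in range(n)] for t in range(n)]


def covers_each_pair_once(perms: List[List[int]]) -> bool:
    """True if every entry is a permutation and each (input, output) pair occurs exactly once."""
    n = len(perms)
    seen = Counter()
    for perm in perms:
        if sorted(perm) != list(range(n)):
            return False  # not a valid permutation of the n outputs
        for i, j in enumerate(perm):
            seen[(i, j)] += 1
    return all(seen[(i, j)] == 1 for i in range(n) for j in range(n))


perms = cyclic_permutations(4)
for t, perm in enumerate(perms):
    print(f"slot {t}: input -> output {perm}")
print("each pair connected exactly once:", covers_each_pair_once(perms))
```

Cycling through this fixed sequence needs no per-packet computation at all, which is what makes it attractive when a scheduling decision is needed every few nanoseconds.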
27 2-Stage Switch
[Figure: External Inputs -> Spanning Set of Permutations -> Internal Inputs -> Spanning Set of Permutations -> External Outputs]
- Recently shown to have 100% throughput
- Mild conditions: weakly mixing arrival processes
C.S. Chang et al. http://www.ee.nthu.edu.tw/cschang/PartI.pdf
28 2-Stage Switch
[Figure: External Inputs -> Spanning Set of Permutations -> Internal Inputs -> Spanning Set of Permutations -> External Outputs; ports labeled 1 to N]
29 Problem: Unbounded Mis-sequencing
[Figure: External Inputs -> Spanning Set of Permutations -> Internal Inputs -> Spanning Set of Permutations -> External Outputs]
- Side-note: Mis-sequencing is maximized when
arrivals are uniform.
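Mis-sequencing arises because consecutive packets of one flow are spread over different intermediate ports, where they meet queues of different lengths. The sketch below is a simplified simulation in Python, not the model analyzed in the talk. It assumes Bernoulli arrivals with uniformly chosen outputs, cyclic-shift permutations in both stages, and no coordination between stages. The function and its parameters are hypothetical.

```python
import random
from collections import deque


def simulate_two_stage(n: int, slots: int, load: float, seed: int = 0):
    """Simulate a two-stage switch with fixed cyclic permutations in both stages.

    Returns (injected, delivered, out_of_order) packet counts.
    """
    rng = random.Random(seed)
    # voq[m][j]: packets waiting at intermediate port m for output j.
    voq = [[deque() for _ in range(n)] for _ in range(n)]
    next_seq = [[0] * n for _ in range(n)]       # next sequence number per (input, output) flow
    highest_seen = [[-1] * n for _ in range(n)]  # highest sequence number delivered per flow
    injected = delivered = out_of_order = 0

    t = 0
    # Run the arrival phase, then keep going until every queue has drained.
    while t < slots or any(q for row in voq for q in row):
        # Stage 1: input i is connected to intermediate port (i + t) % n.
        if t < slots:
            for i in range(n):
                if rng.random() < load:
                    j = rng.randrange(n)  # uniformly chosen output
                    voq[(i + t) % n][j].append((i, next_seq[i][j]))
                    next_seq[i][j] += 1
                    injected += 1
        # Stage 2: intermediate port m is connected to output (m + t) % n.
        for m in range(n):
            j = (m + t) % n
            if voq[m][j]:
                i, seq = voq[m][j].popleft()
                delivered += 1
                if seq < highest_seen[i][j]:
                    out_of_order += 1  # a later packet of this flow already left
                else:
                    highest_seen[i][j] = seq
        t += 1
    return injected, delivered, out_of_order


injected, delivered, out_of_order = simulate_two_stage(n=8, slots=2000, load=0.9)
print(f"injected={injected} delivered={delivered} out_of_order={out_of_order}")
```

Every packet is eventually delivered, but nothing in this scheme forces packets of a flow to leave in the order they arrived, which is the problem the next slide addresses.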
30 Preventing Mis-sequencing
[Figure: two Spanning Sets of Permutations, with Small Coordination Buffers (FFF Algorithm) and Large Congestion Buffers]
- The Full Frames First (FFF) algorithm
- Keeps packets ordered and
- Guarantees a delay bound within the optimum
Infocom02 klamath.stanford.edu/nickm/papers/infocom02_two_stage.pdf
31 Example: Optical 2-stage Switch
[Figure: Linecards 1, 2 and 3, each with Lookup and Buffer. Labels: Phase 1, Phase 2]
Idea: Use a single-stage twice
32 Example: Passive Optical 2-Stage Switch
[Figure: Ingress Linecards 1 to n connect to Midstage Linecards 1 to n, which connect to Egress Linecards 1 to n; links are labeled R/N]
It is helpful to think of it as spreading rather
than switching.
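The R/N labels mean that each linecard sends an equal share of its traffic to every midstage linecard. The sketch below works out the per-link rate in Python. It assumes a full mesh of N x N links per stage and uses the 160 Gb/s linecard rate and 625 linecards quoted elsewhere in this deck. The function name and dictionary keys are hypothetical.

```python
def spreading_plan(line_rate_gbps: float, n: int) -> dict:
    """Per-link rate and link count when each linecard spreads evenly over n midstage cards."""
    per_link = line_rate_gbps / n
    return {
        "per_link_gbps": per_link,              # the R/N label in the figure
        "links_per_stage": n * n,               # assumed full mesh between two sets of n linecards
        "per_linecard_total_gbps": per_link * n,  # n links at R/N add back up to R
    }


plan = spreading_plan(line_rate_gbps=160, n=625)
print(f"per-link rate:   {plan['per_link_gbps']:.3f} Gb/s")
print(f"links per stage: {plan['links_per_stage']}")
```

Each individual link is slow, but there are a great many of them. No link ever needs to be reconfigured, which is why the slide calls this spreading rather than switching.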
33 2-Stage spreading
[Figure: Buffer stage; ports labeled 1 to N]
34 Passive Optical Switching
Integrated AWGR or diffraction grating based
wavelength router
[Figure: Ingress Linecards 1 to n, Midstage Linecards 1 to n and Egress Linecards 1 to n; ports labeled 1, 2, ..., n]
35 100Tb/s Router
[Figure: Racks of 160Gb/s Linecards; Optical links; Optical Switch Fabric]
36 Racks with 160Gb/s linecards
37 Additional Technologies
- Demonstrated or in development
- Chip to chip optical interconnects with total
power dissipations of several mW.
- Demonstration of wavelength division multiplexed
chip interconnect.
- Integrated laser modulators.
- 8Gsample/s serial links.
- Low-power variable power supply serial links.
- Integrated array waveguide routers.
38 Mind the gap
- Operators are unlikely to deploy 5 times as many
POPs, or make them 5 times bigger, with 5 times
the power consumption.
- Our options:
- Make routers simpler
- Use more parallelism
- Use more optics
39 Some predictions about core Internet routers
- The need for more capacity for a given power and
volume budget will mean
- Fewer functions in routers
- Little or no optimization for multicast,
- Continued overprovisioning will lead to little or
no support for QoS, DiffServ, ...,
- Fewer unnecessary requirements
- Mis-sequencing will be tolerated,
- Latency requirements will be relaxed.
- Less programmability in routers, and hence no
network processors.
- Greater use of optics to reduce power in switch.
40 What I believe is most likely
- The need for capacity and reliability will mean
- Widespread replacement of core routers with
transport switching based on circuits
- Circuit switches have proved simpler, more
reliable, lower power, higher capacity and lower
cost per Gb/s. Eventually, this is going to
matter.
- Internet will evolve to become edge routers
interconnected by rich mesh of WDM circuit
switches.