Title: Load-Balanced Routers
1. A 100Tb/s Regional Node
Isaac Keslassy, Shang-Tse (Da) Chuang, Nick McKeown
Stanford University
2. Why 100Tb/s?
- 100 million homes at 100Mb/s → a total of 10^16 b/s = 10Pb/s
- 100 regional nodes → each roughly 10^14 b/s = 100Tb/s
- More because of route length
- Less because of multiplexing
- Challenging numbers
  - 100Tb/s router
  - 160Gb/s line rate
  - 640 linecards
  - Reliable performance guarantees
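The capacity targets above follow from simple arithmetic; a back-of-the-envelope sketch (illustrative only, using the slide's numbers):

```python
# Back-of-the-envelope check of the slide's capacity numbers.
homes = 100e6            # 100 million homes
per_home = 100e6         # 100 Mb/s each
total = homes * per_home             # 1e16 b/s, i.e. 10 Pb/s
per_node = total / 100               # shared by 100 regional nodes
router = 640 * 160e9                 # 640 linecards at 160 Gb/s each
print(per_node / 1e12, router / 1e12)   # 100.0 102.4 (Tb/s)
```

So 640 linecards at 160Gb/s slightly exceed the 100Tb/s target.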
3. Typical Router Architecture
[Figure: linecards at rate R interconnected through a switch fabric governed by a centralized scheduler]
4. Definitions: Traffic Matrix
- Traffic matrix Λ = (λij): λij is the long-term arrival rate from input i to output j
- Uniform traffic matrix: λij = R/N for all i, j
5. Definitions: 100% Throughput
- 100% throughput: for any traffic matrix with every row and column sum less than R, each arrival rate stays below its service rate: λij < µij
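The admissibility condition in this definition is easy to check mechanically; a minimal sketch (the function name is ours, not from the talk):

```python
# Illustrative admissibility check: the 100% throughput guarantee applies
# only to traffic matrices whose row sums (per-input load) and column
# sums (per-output load) all stay below the line rate R.
def is_admissible(traffic, R):
    rows = [sum(row) for row in traffic]
    cols = [sum(col) for col in zip(*traffic)]
    return max(rows) < R and max(cols) < R

R = 160e9                                           # 160 Gb/s line rate
uniform = [[0.99 * R / 4] * 4 for _ in range(4)]    # just under capacity
skewed = [[0.0] * 4 for _ in range(4)]
skewed[0][0] = 1.5 * R                              # overloads input 0
print(is_admissible(uniform, R), is_admissible(skewed, R))  # True False
```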
6. 100Tb/s Router Wish List
- Scale to High Linecard Speeds
  - No Centralized Scheduler
  - Optical Switch Fabric
  - Low Packet-Processing Complexity
- Scale to High Number of Linecards
  - High Number of Linecards
  - Arbitrary Arrangement of Linecards
- Provide Performance Guarantees
  - 100% Throughput Guarantee
  - Delay Guarantee
  - No Packet Reordering
7. 100% Throughput in a Mesh Fabric
[Figure: N linecards at rate R interconnected by a full mesh of internal links]
8. If Traffic Is Uniform
[Figure: each input at rate R spreads its traffic evenly over the mesh links]
9. Real Traffic is Not Uniform
10. Load-Balanced Switch
[Figure: two meshes of R/N links; each input spreads arrivals uniformly over the load-balancing mesh, and the intermediate linecards deliver packets to their outputs over the forwarding mesh]
100% throughput for weakly mixing traffic (Valiant, C.-S. Chang)
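The two-stage idea can be sketched in a few lines; a toy model (names are ours, not the talk's) showing how destination-oblivious spreading turns even fully skewed arrivals into a uniform load at the middle stage:

```python
from collections import deque

# Toy two-stage load-balanced switch. Stage 1 sprays each arriving
# packet round-robin over the N intermediate linecards, ignoring its
# destination; stage 2 forwards it from the intermediate linecard to
# its real output.
N = 4
middle = [deque() for _ in range(N)]   # queues at the intermediate stage
spray = 0                              # round-robin pointer at the input

def stage1(packet):
    global spray
    middle[spray].append(packet)       # destination-oblivious spreading
    spray = (spray + 1) % N

def stage2():
    delivered = []
    for q in middle:
        if q:
            delivered.append(q.popleft())   # forward toward packet's output
    return delivered

# All 8 packets head for the same output 0 -- worst case for a plain mesh.
for i in range(8):
    stage1({"id": i, "dest": 0})
print([len(q) for q in middle])  # [2, 2, 2, 2]: middle-stage load is uniform
```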
11. Load-Balanced Switch
[Figure: animation, steps 1-3 -- a packet is spread over the load-balancing mesh to an intermediate linecard]
12. Load-Balanced Switch
[Figure: animation continued -- intermediate linecards 1-3 forward their packets over the forwarding mesh to the outputs]
13. Combining the Two Meshes
[Figure: the load-balancing and forwarding meshes have identical R/N link patterns, so they can be overlaid on a single mesh of 2R/N channels]
14. A Single Combined Mesh
15. AWGR: A Mesh of WDM Channels
[Figure: each linecard's fixed laser/modulator sends wavelengths λ1..λN into an N×N AWGR (Arrayed Waveguide Grating Router), which cyclically permutes the wavelengths so that one wavelength from every input reaches every output's detector]
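The AWGR realizes the mesh through its cyclic wavelength-routing property; a small illustrative model (the exact offset convention varies by device -- this is one common form):

```python
# Model of cyclic AWGR wavelength routing: a signal entering input
# port i on wavelength k exits output port (i + k) mod N. By choosing
# among its N wavelengths, each input reaches every output -- a full
# mesh over a single passive component, with no switching or power.
N = 4

def awgr_output(i, k, n=N):
    return (i + k) % n

# Every input reaches every output using some wavelength:
for i in range(N):
    assert sorted(awgr_output(i, k) for k in range(N)) == list(range(N))
# And for a fixed wavelength, no two inputs collide on the same output:
for k in range(N):
    assert sorted(awgr_output(i, k) for i in range(N)) == list(range(N))
print("AWGR implements a full mesh")
```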
16. 100Tb/s Router Wish List
- Scale to High Linecard Speeds
  - No Centralized Scheduler
  - Optical Switch Fabric
  - Low Packet-Processing Complexity
- Scale to High Number of Linecards
  - High Number of Linecards
  - Arbitrary Arrangement of Linecards
- Provide Performance Guarantees
  - 100% Throughput Guarantee
  - Delay Guarantee
  - No Packet Reordering
17. Packet Reordering
[Figure: packets from one input take different paths through the intermediate linecards and can arrive at the output out of order]
18Full Ordered Frames First
- Distributed algorithm with O(1) complexity
- Property 1 No packet reordering.
- Property 2 Average packet delay within constant
from ideal output-queued router. - Corollary 100 throughput for any adversarial
traffic.
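The core intuition of sending full frames first can be sketched in a toy model (this is our simplification, not the talk's exact algorithm):

```python
from collections import deque

# Toy "full frames first" sketch: the input accumulates packets per
# output; once a full frame of N packets for one output exists, it is
# sprayed across the N intermediate linecards in a fixed round-robin
# order. Because the output drains the intermediates in that same
# order, packets of a full frame can never be reordered.
N = 4
pending = deque(range(2 * N))          # 8 in-order packets for one output
middle = [deque() for _ in range(N)]

while len(pending) >= N:               # dispatch only complete frames
    for k in range(N):
        middle[k].append(pending.popleft())

received = []
while any(middle):                     # output reads middles round-robin
    for q in middle:
        if q:
            received.append(q.popleft())

print(received)  # [0, 1, 2, 3, 4, 5, 6, 7] -- order preserved
```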
19. 100Tb/s Router Wish List
- Scale to High Linecard Speeds
  - No Centralized Scheduler
  - Optical Switch Fabric
  - Low Packet-Processing Complexity
- Scale to High Number of Linecards
  - High Number of Linecards
  - Arbitrary Arrangement of Linecards
- Provide Performance Guarantees
  - 100% Throughput Guarantee
  - Delay Guarantee
  - No Packet Reordering
20. When N is Too Large: Example with N=8
[Figure: uniform spreading from 8 linecards over the full mesh]
21. When N is Too Large: Decompose into Groups (or Racks)
[Figure: uniform multiplexing of linecards onto group links, uniform spreading between groups, and uniform demultiplexing back to linecards]
22. When N is Too Large: Decompose into Groups (or Racks)
[Figure: G groups (racks) of L linecards each; every linecard attaches to its group at 2R, each group aggregates 2RL, and each inter-group channel carries 2RL/G]
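The link rates in the group decomposition follow directly from the earlier parameters; a quick illustrative check using the G and L values this deck uses later:

```python
# Link-rate bookkeeping for the group decomposition. With G groups of
# L linecards each, a group sources 2*R*L (R in each mesh direction
# per linecard); spread uniformly over the G groups, each inter-group
# channel carries 2*R*L/G.
R = 160e9                                    # 160 Gb/s line rate
G, L = 40, 16                                # 40 racks of 16 linecards
aggregate_per_group = 2 * R * L              # 2RL
per_group_channel = aggregate_per_group / G  # 2RL/G
print(per_group_channel / 1e9, "Gb/s")       # 128.0 Gb/s
```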
23. Router Wish List
- Scale to High Linecard Speeds
  - No Centralized Scheduler
  - Optical Switch Fabric
  - Low Packet-Processing Complexity
- Scale to High Number of Linecards
  - High Number of Linecards
  - Arbitrary Arrangement of Linecards
- Provide Performance Guarantees
  - 100% Throughput Guarantee
  - Delay Guarantee
  - No Packet Reordering
24. When Linecards are Missing: Failures, Incremental Additions, and Removals
[Figure: the same group structure with some linecards absent; the static 2RL/G inter-group channels no longer match each group's offered load]
25. When Linecards are Missing: MEMS-Based Architecture
26. When Linecards are Missing
[Figure: groups/racks interconnected through a reconfigurable MEMS switch that reallocates inter-group capacity]
27. Implementation of a Regional Node: 100Tb/s Load-Balanced Router
- Switch Rack: 40 x 40 static MEMS, < 100W
- Linecard Racks: G = 40 groups, L = 16 160Gb/s linecards each
[Figure: racks labeled 1, 2, ..., 55, 56]
28. Ongoing Research
- Upon linecard failure, an algorithm computes a new TDM packet schedule (Infocom'04); Verilog implementation reconfigures in < 50 ms
- Study of challenging components
  - High-speed buffering
  - Clock acquisition
  - On-chip optical modulator array
  - Fast tunable lasers and filters
29. References
- Initial Work
  - C.-S. Chang, D.-S. Lee and Y.-S. Jou, "Load-Balanced Birkhoff-von Neumann Switches, Part I: One-Stage Buffering," Computer Communications, Vol. 25, pp. 611-622, 2002.
- Sigcomm'03
  - I. Keslassy, S.-T. Chuang, K. Yu, D. Miller, M. Horowitz, O. Solgaard and N. McKeown, "Scaling Internet Routers Using Optics," ACM SIGCOMM '03, Karlsruhe, Germany, August 2003.
30. Thank you.