Title: The Load-Balanced Router (Stanford Workshop on Load Balancing)
1. The Load-Balanced Router
Isaac Keslassy, Shang-Tse (Da) Chuang, Nick McKeown
Stanford University
2. Typical Router Architecture
[Figure: switch fabric with a scheduler; links labeled R.]
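3. Definitions: Traffic Matrix
- Traffic matrix: $\Lambda = [\lambda_{ij}]$, where $\lambda_{ij}$ is the arrival rate from input $i$ to output $j$
- Uniform traffic matrix: $\lambda_{ij} = \lambda$
4. Definitions: 100% Throughput
- 100% throughput: for any traffic matrix with row and column sums less than R, $\lambda_{ij} < \mu_{ij}$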
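The admissibility condition in the definition above (every row sum and every column sum below R) can be checked mechanically. The following is a minimal sketch; the function name `is_admissible` and the example matrices are illustrative assumptions, not part of the talk.

```python
from typing import List


def is_admissible(traffic: List[List[float]], line_rate: float) -> bool:
    """Return True if every row sum and every column sum is below line_rate.

    traffic[i][j] is the arrival rate from input i to output j.
    """
    n = len(traffic)
    rows_ok = all(sum(row) < line_rate for row in traffic)
    cols_ok = all(
        sum(traffic[i][j] for i in range(n)) < line_rate for j in range(n)
    )
    return rows_ok and cols_ok


R = 1.0  # line rate, normalized (assumed value for the example)

# Uniform matrix: every entry equal, each row and column sums to about 0.9.
uniform = [[0.3] * 3 for _ in range(3)]

# Hotspot matrix: rows are fine, but output 0 is offered 1.8 > R.
hotspot = [
    [0.9, 0.0, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

uniform_ok = is_admissible(uniform, R)
hotspot_ok = is_admissible(hotspot, R)
print(uniform_ok, hotspot_ok)
```

The hotspot case shows why both row and column sums matter: no input is overloaded, yet no switch could deliver that traffic because one output is oversubscribed.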
5. Router Wish List
- Scale to High Linecard Speeds
  - No Centralized Scheduler
  - Optical Switch Fabric
  - Low Packet-Processing Complexity
- Scale to High Number of Linecards
  - High Number of Linecards
  - Arbitrary Arrangement of Linecards
- Provide Performance Guarantees
  - 100% Throughput Guarantee
  - Delay Guarantee
  - No Packet Reordering
6. Stanford 100Tb/s Router
- "Optics in Routers" project
  - http://yuba.stanford.edu/or/
- Some challenging numbers
  - 100Tb/s
  - 160Gb/s linecards
  - 640 linecards
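As a quick sanity check, the three numbers above are consistent with each other: 640 linecards at 160Gb/s give slightly more than 100Tb/s. A small sketch of the arithmetic (the function name is an illustrative assumption):

```python
LINECARD_RATE_GBPS = 160  # from the slide
NUM_LINECARDS = 640       # from the slide


def aggregate_capacity_tbps(num_linecards: int, rate_gbps: float) -> float:
    """Total line capacity in Tb/s, using 1 Tb/s = 1000 Gb/s."""
    return num_linecards * rate_gbps / 1000


total_tbps = aggregate_capacity_tbps(NUM_LINECARDS, LINECARD_RATE_GBPS)
print(f"{total_tbps:.1f} Tb/s")  # 102.4 Tb/s
```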
7. 100% Throughput in a Mesh Fabric
[Figure: mesh fabric with inputs (In); links labeled R.]
8. If Traffic Is Uniform
[Figure: mesh fabric with inputs (In); links labeled R.]
9. Real Traffic is Not Uniform
10. Load-Balanced Switch
[Figure: two meshes in series between inputs (In) and outputs (Out): a load-balancing stage followed by a forwarding stage. External links are labeled R; mesh links are labeled R/N.]
100% throughput for weakly mixing traffic (Valiant, C.-S. Chang et al.)
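To illustrate what the load-balancing stage does, here is a minimal sketch that assumes the simplest spreading rule: each input sends successive packets to middle ports 0, 1, ..., N-1 in turn, regardless of destination. The function name, the port count, and the skewed destination weights are assumptions made for the example.

```python
import random
from collections import Counter
from itertools import cycle


def spread_round_robin(destinations, num_ports):
    """Load-balancing stage for one input.

    Assigns each packet to the next middle port in cyclic order and
    ignores the packet's destination when doing so.
    Returns a list of (middle_port, destination) pairs.
    """
    middle = cycle(range(num_ports))
    return [(next(middle), dst) for dst in destinations]


random.seed(0)
N = 4
# Heavily skewed traffic: most packets are destined to output 0.
dests = random.choices(range(N), weights=[0.9, 0.05, 0.03, 0.02], k=1000)

assignments = spread_round_robin(dests, N)
per_middle = Counter(mid for mid, _ in assignments)
print(per_middle)  # every middle port receives the same number of packets
```

However skewed the destinations are, each middle port receives an equal share of the input's packets, which is why the traffic seen by the forwarding stage looks uniform.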
11. Load-Balanced Switch
[Figure: two-stage load-balanced switch with packets labeled 1, 2, 3; external links labeled R, mesh links labeled R/N.]
12. Load-Balanced Switch
[Figure: two-stage load-balanced switch with packets labeled 1, 2, 3 spread over different links; external links labeled R, mesh links labeled R/N.]
13. Intuition: Proof of 100% Throughput
[Figure: two-stage load-balanced switch; external links labeled R, mesh links labeled R/N.]
- Arrivals to second mesh
- Capacity of second mesh
- Second mesh: arrival rate < service rate
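The three bullets above can be written out as a short calculation. This is a sketch under two assumptions: the first mesh spreads each input's traffic evenly over the N middle ports, and $\lambda_{ij}$ denotes the arrival rate from input $i$ to output $j$ with every column sum below R. Consider the second-mesh link from middle port $k$ to output $j$:

```latex
\begin{align*}
\text{arrival rate on link } (k \to j)
  &= \sum_{i=1}^{N} \frac{\lambda_{ij}}{N}
   = \frac{1}{N} \sum_{i=1}^{N} \lambda_{ij} \\
  &< \frac{R}{N}
   = \text{capacity of link } (k \to j)
\end{align*}
```

The inequality uses only the column-sum condition $\sum_i \lambda_{ij} < R$, so it holds for every admissible traffic matrix, uniform or not.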
14. Alternative: Crossbar Switch Fabric
[Figure: crossbar stages connecting external inputs, intermediate ports, and external outputs.]
- Proposed by C.-S. Chang et al.
- Essential result: same rate => same guarantees
15. Router Wish List
- Scale to High Linecard Speeds
  - No Centralized Scheduler
  - Optical Switch Fabric
  - Low Packet-Processing Complexity
- Scale to High Number of Linecards
  - High Number of Linecards
  - Arbitrary Arrangement of Linecards
- Provide Performance Guarantees
  - 100% Throughput Guarantee
  - Delay Guarantee
  - No Packet Reordering
[3 items carry a mark on the slide; the mark symbols were lost in extraction.]
16. Packet Reordering
[Figure: two-stage load-balanced switch; external links labeled R, mesh links labeled R/N.]
17. Bounding Delay Difference Between Middle Ports
[Figure: two-stage load-balanced switch; external links labeled R, mesh links labeled R/N.]
18. UFS (Uniform Frame Spreading)
[Figure: two-stage load-balanced switch; external links labeled R, mesh links labeled R/N.]
19. FOFF (Full Ordered Frames First)
[Figure: two-stage load-balanced switch; external links labeled R, mesh links labeled R/N.]
20. FOFF (Full Ordered Frames First)
[Figure: packets labeled 1, 2, 3, 4 and 1, 2.]
- Input Algorithm
  - N FIFO queues corresponding to the N output flows
  - Spread each flow uniformly: if the last packet was sent to middle port k, send the next to k+1.
  - Every N time-slots, pick a flow:
    - If a full frame exists, pick it and spread it like UFS
    - Else, if all frames are partial, pick one in round-robin order and send it
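The input algorithm above can be sketched as follows. This is an illustrative model, not the authors' implementation: the class and method names are hypothetical, a "full frame" is taken to mean at least N queued packets for one output, and choosing among several full frames in round-robin order is an assumption (the slide only specifies round-robin for partial frames).

```python
from collections import deque


class FoffInput:
    """Sketch of one FOFF input linecard with N per-output FIFO queues."""

    def __init__(self, n):
        self.n = n
        self.queues = [deque() for _ in range(n)]  # one FIFO per output flow
        self.next_middle = [0] * n  # per flow: middle port for its next packet
        self.rr_full = 0            # round-robin pointer over full frames (assumed)
        self.rr_partial = 0         # round-robin pointer over partial frames

    def enqueue(self, output, packet):
        self.queues[output].append(packet)

    def _pick(self, candidates, start):
        """First flow in `candidates` at or after `start`, in cyclic order."""
        for offset in range(self.n):
            flow = (start + offset) % self.n
            if flow in candidates:
                return flow
        return None

    def serve_frame(self):
        """Called once every N time slots.

        Returns the packets sent as (middle_port, output, packet) tuples.
        """
        full = {f for f, q in enumerate(self.queues) if len(q) >= self.n}
        partial = {f for f, q in enumerate(self.queues) if 0 < len(q) < self.n}
        if full:
            flow = self._pick(full, self.rr_full)
            self.rr_full = (flow + 1) % self.n
            count = self.n  # a full frame: N packets, one per middle port
        elif partial:
            flow = self._pick(partial, self.rr_partial)
            self.rr_partial = (flow + 1) % self.n
            count = len(self.queues[flow])  # send the whole partial frame
        else:
            return []
        sent = []
        for _ in range(count):
            packet = self.queues[flow].popleft()
            middle = self.next_middle[flow]
            sent.append((middle, flow, packet))
            # If the last packet went to middle port k, the next goes to k+1.
            self.next_middle[flow] = (middle + 1) % self.n
        return sent


# Usage: N = 3, four packets for output 0 and one packet for output 1.
inp = FoffInput(3)
for name in ["a0", "a1", "a2", "a3"]:
    inp.enqueue(0, name)
inp.enqueue(1, "b0")

first = inp.serve_frame()   # full frame of flow 0
second = inp.serve_frame()  # no full frame left: partial frame of flow 0
third = inp.serve_frame()   # partial frame of flow 1
print(first, second, third)
```

Keeping a per-flow pointer means a flow resumes spreading where it stopped, so each flow's packets stay evenly distributed over the middle ports even when partial frames are sent.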
21. Bounding Reordering
[Figure: two-stage load-balanced switch; external links labeled R, mesh links labeled R/N.]
22. FOFF
[Figure: output with queued packets labeled 1, 1, 1, 4, 2, 2, 3, 3, 3.]
- Output properties
  - N FIFO queues corresponding to the N middle ports
  - If there are $N^2$ packets, one of the head-of-line packets is in order and can depart
  - => Buffer size at most $N^2$ packets
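The output side can be pictured as a small resequencer: one FIFO per middle port, with a packet released only when it is the next one expected. The sketch below assumes packets are tagged with a source identifier and a per-source sequence number so that "in order" can be tested directly; that tagging, and all names, are assumptions made for illustration and are not described in the slides.

```python
from collections import deque


class Resequencer:
    """Sketch of a FOFF-style output with one FIFO per middle port."""

    def __init__(self, n):
        self.queues = [deque() for _ in range(n)]  # one FIFO per middle port
        self.expected = {}  # next sequence number per source (hypothetical tag)

    def arrive(self, middle, src, seq):
        """A packet from source `src` arrives via middle port `middle`."""
        self.queues[middle].append((src, seq))

    def depart(self):
        """Release one in-order head-of-line packet, or return None."""
        for queue in self.queues:
            if queue:
                src, seq = queue[0]
                if seq == self.expected.get(src, 0):
                    queue.popleft()
                    self.expected[src] = seq + 1
                    return (src, seq)
        return None

    def drain(self):
        """Release packets until no head-of-line packet is in order."""
        released = []
        while True:
            packet = self.depart()
            if packet is None:
                return released
            released.append(packet)


# Usage: packets 0, 1, 2 of source 7 travel via middle ports 0, 1, 2,
# but packets 1 and 2 reach the output before packet 0.
out = Resequencer(3)
out.arrive(1, 7, 1)
out.arrive(2, 7, 2)
blocked = out.drain()   # nothing can leave yet
out.arrive(0, 7, 0)
released = out.drain()  # all three leave, in order
print(blocked, released)
```

Only head-of-line packets are examined, so each departure decision inspects at most N queue heads.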
23. FOFF Properties
- Property 1: FOFF maintains packet order.
- Property 2: FOFF has O(1) complexity.
- Property 3: Congestion buffers operate independently.
- Property 4: FOFF maintains an average packet delay within a constant from an ideal output-queued router.
- Corollary: FOFF has 100% throughput for any adversarial traffic.
24. Output-Queued Router
[Figure: output-queued router with inputs (In); links labeled R.]
25. Router Wish List
- Scale to High Linecard Speeds
  - No Centralized Scheduler
  - Optical Switch Fabric
  - Low Packet-Processing Complexity
- Scale to High Number of Linecards
  - High Number of Linecards
  - Arbitrary Arrangement of Linecards
- Provide Performance Guarantees
  - 100% Throughput Guarantee
  - Delay Guarantee
  - No Packet Reordering
[1 item carries a mark on the slide; the mark symbol was lost in extraction.]
26. From Two Meshes to One Mesh
[Figure: two-stage load-balanced switch; external links labeled R, mesh links labeled R/N.]
27. From Two Meshes to One Mesh
[Figure: first mesh and second mesh; link labeled R.]
28. From Two Meshes to One Mesh
[Figure: combined mesh; link labeled R.]
29. Many Fabric Options
N channels, each at rate 2R/N
Any spreading device
Options:
- Space: full uniform mesh
- Time: round-robin crossbar
- Wavelength: static WDM
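The "Time" option can be illustrated with a fixed cyclic schedule. The sketch below assumes the common choice of connecting input i to output (i + t) mod N in time slot t; the slides do not give the exact permutation sequence, so this rule and the function name are assumptions.

```python
def round_robin_permutation(t, n):
    """Crossbar configuration in slot t: input i is connected to output (i + t) mod n."""
    return [(i + t) % n for i in range(n)]


N = 4
schedule = [round_robin_permutation(t, N) for t in range(N)]
for t, perm in enumerate(schedule):
    print(t, perm)

# Every (input, output) pair is connected exactly once per N slots, so each
# pair gets 1/N of the crossbar port rate, which emulates a uniform mesh.
pairs = {(i, perm[i]) for perm in schedule for i in range(N)}
```

No scheduler is needed because the sequence of configurations is fixed in advance and does not depend on the traffic.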
30. AWGR (Arrayed Waveguide Grating Router): A Passive Optical Component
[Figure: N x N AWGR between Linecard 1, Linecard 2, ..., Linecard N on each side, with wavelength labels on the links.]
- Wavelength i on input port j goes to output port (i+j-1) mod N
- Can shuffle information from different inputs
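The routing rule above can be checked with a few lines of code. The sketch assumes ports and wavelengths are numbered 1 to N and that a result of 0 from the modulo stands for port N; both conventions, and the function name, are assumptions for the example.

```python
def awgr_output(wavelength, input_port, n):
    """Output port reached by `wavelength` entering on `input_port`.

    Uses the rule (i + j - 1) mod N with 1-based numbering; a residue of 0
    is interpreted as port N (assumption).
    """
    out = (wavelength + input_port - 1) % n
    return out if out != 0 else n


N = 4
table = {
    j: [awgr_output(w, j, N) for w in range(1, N + 1)]
    for j in range(1, N + 1)
}
for j, outputs in table.items():
    print(f"input {j}: wavelengths 1..{N} -> outputs {outputs}")
```

Each input reaches every output by choosing a wavelength, and on any single wavelength the inputs map to distinct outputs, so a passive AWGR with fixed wavelengths behaves like a full mesh.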
31. Static WDM Switching: Packaging
AWGR: passive and almost zero power
[Figure: elements labeled A, B, C, D.]
32. Router Wish List
- Scale to High Linecard Speeds
  - No Centralized Scheduler
  - Optical Switch Fabric
  - Low Packet-Processing Complexity
- Scale to High Number of Linecards
  - High Number of Linecards
  - Arbitrary Arrangement of Linecards
- Provide Performance Guarantees
  - 100% Throughput Guarantee
  - Delay Guarantee
  - No Packet Reordering
[2 items carry a mark on the slide; the mark symbols were lost in extraction.]
33. Scaling Problem
- For N < 64, an AWGR is a good solution.
- We want N = 640.
- Need to decompose.
34. A Different Representation of the Mesh
[Figure: mesh.]
35. A Different Representation of the Mesh
[Figure: mesh with links labeled 2R/N.]
36. Example: N = 8
[Figure: mesh with links labeled 2R/8.]
37. When N is Too Large: Decompose into Groups (or Racks)
[Figure: example decomposition with links labeled 2R, 4R, and 4R/4.]
38. When N is Too Large: Decompose into Groups (or Racks)
[Figure: Group/Rack 1 through Group/Rack G on each side, with links labeled 2R, 2RL, and 2RL/G.]
39. Router Wish List
- Scale to High Linecard Speeds
  - No Centralized Scheduler
  - Optical Switch Fabric
  - Low Packet-Processing Complexity
- Scale to High Number of Linecards
  - High Number of Linecards
  - Arbitrary Arrangement of Linecards
- Provide Performance Guarantees
  - 100% Throughput Guarantee
  - Delay Guarantee
  - No Packet Reordering
[3 items carry a mark on the slide; the mark symbols were lost in extraction.]
40. When Linecards are Missing
[Figure: Group/Rack 1 through Group/Rack G on each side, with links labeled 2R, 2RL, and 2RL/G.]
- Solution: replace mesh with sum of permutations
41. MEMS-Based Architecture
42. When Linecards are Missing
[Figure: Group/Rack 1 through Group/Rack G on each side, connected through MEMS switches.]
43. Implementation of a 100Tb/s Load-Balanced Router
[Figure labels: switch rack, < 100W; linecard racks, G = 40; 40 x 40 static MEMS; L = 16 160Gb/s linecards; items numbered 1, 2, ..., 55, 56.]
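Plugging these values into the link labels used in the group decomposition (2R per linecard, 2RL per group, 2RL/G between groups) gives concrete rates. The sketch assumes that G = 40 is the number of linecard racks, L = 16 is the number of linecards per rack, and R = 160Gb/s; the function name is illustrative.

```python
R_GBPS = 160  # linecard rate (from the slides)
L = 16        # linecards per group/rack (assumed reading of "L = 16")
G = 40        # number of groups/racks (assumed reading of "G = 40")


def inter_group_rate_gbps(r, l, g):
    """Rate between one pair of groups when a group's 2RL is spread over G groups."""
    return 2 * r * l / g


total_linecards = G * L
group_rate = 2 * R_GBPS * L
pair_rate = inter_group_rate_gbps(R_GBPS, L, G)

print(total_linecards)  # 640
print(group_rate)       # 5120 Gb/s leaving each group
print(pair_rate)        # 128.0 Gb/s between each pair of groups
```

The product G x L = 640 matches the linecard count quoted earlier in the talk.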
44. Summary
- The load-balanced switch
  - Does not need any centralized scheduling
  - Can use a mesh
- Using FOFF
  - It keeps packets in order
  - It guarantees 100% throughput
- Using the MEMS-based architecture
  - It scales to high port numbers
  - It tolerates linecard failure
45. References
- Initial Work
  - C.-S. Chang, D.-S. Lee and Y.-S. Jou, "Load Balanced Birkhoff-von Neumann Switches, Part I: One-Stage Buffering," Computer Communications, Vol. 25, pp. 611-622, 2002.
- Extensions
  - I. Keslassy, S.-T. Chuang, K. Yu, D. Miller, M. Horowitz, O. Solgaard and N. McKeown, "Scaling Internet Routers Using Optics," ACM SIGCOMM '03, Karlsruhe, Germany, August 2003.
  - I. Keslassy, S.-T. Chuang and N. McKeown, "A Load-Balanced Switch with an Arbitrary Number of Linecards," IEEE Infocom '04, Hong Kong, March 2004.
46. Thank you.