Title: High Speed Router Design
1. High Speed Router Design
- Shivkumar Kalyanaraman
- Rensselaer Polytechnic Institute
- shivkuma_at_ecse.rpi.edu
- http://www.ecse.rpi.edu/Homepages/shivkuma
- Based in part on slides of Nick McKeown (Stanford) and S. Keshav (Cornell)
2. Overview
- Introduction
- Evolution of High-Speed Routers
- High Speed Router Components
- Lookup Algorithm
- Classification
- Switching
3. What do switches/routers look like?
- Access routers, e.g. ISDN, ADSL
- Core router, e.g. OC48c POS
- Core ATM switch
4. Basic Architectural Components
- Control: routing, reservation, admission control, congestion control
- Datapath (per-packet processing): policing, switching, output scheduling
5. Basic Architectural Components: Forwarding Decision
[Figure: router datapath in three stages: (1) a forwarding decision made with a forwarding table at each input, (2) the interconnect, and (3) output scheduling]
6. Per-packet processing in an IP Router
- 1. Accept the packet arriving on an incoming link.
- 2. Look up the packet's destination address in the forwarding table to identify the outgoing port(s).
- 3. Manipulate the packet header, e.g., decrement the TTL and update the header checksum.
- 4. Send (switch) the packet to the outgoing port(s).
- 5. Classify and buffer the packet in the queue.
- 6. Transmit the packet onto the outgoing link.
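Step 3 is cheap because the checksum can be patched rather than recomputed. The sketch below is a minimal Python illustration, assuming a 20-byte IPv4 header (no options) held in a `bytearray`; `decrement_ttl` and `ipv4_header_checksum` are our own illustrative names, and the update rule is the incremental formula from RFC 1624, HC' = ~(~HC + ~m + m') in one's-complement arithmetic.

```python
def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian words (len(data) must be even)."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total


def ipv4_header_checksum(header: bytes) -> int:
    """Full recomputation: complement of the sum with the checksum field (bytes 10-11) zeroed."""
    zeroed = bytes(header[:10]) + b"\x00\x00" + bytes(header[12:])
    return ~ones_complement_sum16(zeroed) & 0xFFFF


def decrement_ttl(header: bytearray) -> None:
    """Decrement TTL and patch the checksum incrementally (RFC 1624, eqn. 3)."""
    if header[8] <= 1:
        raise ValueError("TTL would reach 0: drop the packet instead of forwarding it")
    old_word = (header[8] << 8) | header[9]        # TTL shares a 16-bit word with Protocol
    header[8] -= 1
    new_word = (header[8] << 8) | header[9]
    old_csum = (header[10] << 8) | header[11]
    s = (~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    s = (s & 0xFFFF) + (s >> 16)                   # fold carries twice to stay within 16 bits
    s = (s & 0xFFFF) + (s >> 16)
    new_csum = ~s & 0xFFFF
    header[10], header[11] = new_csum >> 8, new_csum & 0xFF


hdr = bytearray.fromhex("45000073000040004011b861c0a80001c0a800c7")  # TTL 0x40, checksum 0xb861
decrement_ttl(hdr)
print(hex(hdr[8]), hex(ipv4_header_checksum(hdr)))  # 0x3f 0xb961
```

Only the one 16-bit word containing the TTL enters the update, so the cost is constant regardless of header length.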
7. Lookup and Forwarding Engine
[Figure: the router takes the destination address from the packet header (header + payload), looks it up in the routing lookup data structure, and obtains the outgoing port]
Forwarding Table:
Dest-network     Port
65.0.0.0/8       3
128.9.0.0/16     1
149.12.0.0/19    7
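8. Example Forwarding Table
- IP prefix: 0-32 bits (the prefix length)
[Figure: the prefixes 142.12.0.0/19, 128.9.0.0/16, and 65.0.0.0/8 drawn as ranges on the address line from 0 to 2^32-1; 65.0.0.0/8 spans 2^24 addresses, from 65.0.0.0 to 65.255.255.255]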
9. Prefixes can Overlap
[Figure: overlapping prefixes 65.0.0.0/8, 128.9.0.0/16, 128.9.16.0/21, 128.9.172.0/21, 128.9.176.0/24, and 142.12.0.0/19 drawn as ranges on the address line from 0 to 2^32-1, with the longest matching prefix marked]
Routing lookup: find the longest matching prefix (aka the most specific route) among all prefixes that match the destination address.
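To make the definition concrete, here is a deliberately naive Python reference implementation using the standard `ipaddress` module: it checks every prefix and keeps the longest one containing the address. The first three routes come from the forwarding-table example; the ports for the /21 and /24 are hypothetical. A linear scan costs O(N) per packet, which is what faster lookup structures try to avoid.

```python
import ipaddress
from typing import Dict, Optional

# Ports for the first three prefixes follow the forwarding-table example;
# the /21 and /24 ports are hypothetical.
FORWARDING_TABLE: Dict[str, int] = {
    "65.0.0.0/8": 3,
    "128.9.0.0/16": 1,
    "149.12.0.0/19": 7,
    "128.9.16.0/21": 5,
    "128.9.176.0/24": 2,
}


def longest_prefix_match(table: Dict[str, int], dest: str) -> Optional[int]:
    """Reference LPM: scan every prefix and keep the longest one containing dest."""
    addr = ipaddress.ip_address(dest)
    best_len, best_port = -1, None
    for prefix, port in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_port = net.prefixlen, port
    return best_port


print(longest_prefix_match(FORWARDING_TABLE, "128.9.176.9"))   # 2: the /24 beats the /16
print(longest_prefix_match(FORWARDING_TABLE, "128.9.200.1"))   # 1: only the /16 matches
```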
10. Difficulty of Longest Prefix Match
- 2-dimensional search
- Prefix Length
- Prefix Value
[Figure: the prefixes 65.0.0.0/8, 128.9.0.0/16, 142.12.0.0/19, 128.9.16.0/21, 128.9.172.0/21, and 128.9.176.0/24 plotted by prefix length (8 to 32) against prefix value]
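One way to organise the two-dimensional search is to treat each prefix length as a separate exact-match problem. A minimal Python sketch (the class and method names are hypothetical) that keeps one hash table per length and probes lengths from longest to shortest; the first hit is the longest match. Refinements such as binary search over the lengths exist, but are not shown here.

```python
import ipaddress
from typing import Dict, Optional


class PerLengthLPM:
    """One exact-match hash table per prefix length, probed from the longest
    length down (at most 33 probes for IPv4)."""

    def __init__(self) -> None:
        self.tables: Dict[int, Dict[int, int]] = {}   # prefix length -> {network as int: port}

    def insert(self, prefix: str, port: int) -> None:
        net = ipaddress.ip_network(prefix)
        self.tables.setdefault(net.prefixlen, {})[int(net.network_address)] = port

    def delete(self, prefix: str) -> None:
        net = ipaddress.ip_network(prefix)
        self.tables.get(net.prefixlen, {}).pop(int(net.network_address), None)

    def lookup(self, dest: str) -> Optional[int]:
        addr = int(ipaddress.ip_address(dest))
        for length in sorted(self.tables, reverse=True):
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF   # keep the top `length` bits
            port = self.tables[length].get(addr & mask)
            if port is not None:
                return port
        return None
```

Insert and delete touch a single hash table, so updates stay cheap while a lookup costs one probe per distinct prefix length present.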
11. Lookup Rates Required
Year      Line     Line-rate (Gbps)   40B packets (Mpps)
1998-99   OC12c    0.622              1.94
1999-00   OC48c    2.5                7.81
2000-01   OC192c   10.0               31.25
2002-03   OC768c   40.0               125
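The packet-rate column is the line rate divided by the size of a 40-byte packet in bits. A short Python check that reproduces the table and converts each rate into a per-lookup time budget, assuming back-to-back 40-byte packets and ignoring SONET and link-layer overhead:

```python
def lookup_rate_mpps(line_rate_gbps: float, packet_bytes: int = 40) -> float:
    """Minimum-size packets per second at full line rate, in millions (no framing overhead)."""
    return line_rate_gbps * 1e9 / (packet_bytes * 8) / 1e6


for line, gbps in [("OC12c", 0.622), ("OC48c", 2.5), ("OC192c", 10.0), ("OC768c", 40.0)]:
    mpps = lookup_rate_mpps(gbps)
    print(f"{line:7s} {gbps:6.3f} Gbps  {mpps:7.2f} Mpps  {1e3 / mpps:6.1f} ns per lookup")
```

At OC768c this leaves 8 ns per lookup, only a few memory accesses at the DRAM speeds of the period.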
12. Update Rates Required
- Recent BGP studies show that updates can be:
- Bursty: several hundred routes updated/withdrawn at once => insert/delete operations
- Frequent: 100 updates per second on average
- The data structure must therefore be efficient for lookup as well as for update (insert/delete) operations.
13. Size of the Forwarding Table
[Figure: number of prefixes by year, 1995-2000, titled "Renewed Exponential Growth" and annotated "10,000/year"]
Renewed growth due to multi-homing of enterprise networks!
- Source: http://www.telstra.net/ops/bgptable.html
14. Potential Hyper-Exponential Growth!
[Figure: global routing table vs. Moore's law since 1999: number of prefixes (50,000 to 160,000) from 01/99 to 04/01, with curves for global prefixes, Moore's law, and double growth]
15. Trees and Tries
[Figure: a binary search tree, whose nodes branch on < / > comparisons, beside a binary search trie, whose nodes branch on the next bit (0 / 1); example keys 010 and 111]
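A binary trie walks the destination address one bit per level and remembers the last prefix seen on the way down. A minimal Python sketch (class names are ours) with insert, delete, and lookup; note that an IPv4 lookup can visit up to 32 nodes, i.e. up to 32 dependent memory accesses.

```python
import ipaddress
from typing import List, Optional


class TrieNode:
    __slots__ = ("children", "port")

    def __init__(self) -> None:
        self.children: List[Optional["TrieNode"]] = [None, None]  # branch on bit 0 / bit 1
        self.port: Optional[int] = None                            # set if a prefix ends here


class BinaryTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, prefix: str, port: int) -> None:
        net = ipaddress.ip_network(prefix)
        bits, node = int(net.network_address), self.root
        for i in range(net.prefixlen):                 # one node per prefix bit, MSB first
            b = (bits >> (31 - i)) & 1
            if node.children[b] is None:
                node.children[b] = TrieNode()
            node = node.children[b]
        node.port = port

    def delete(self, prefix: str) -> None:
        net = ipaddress.ip_network(prefix)
        bits, node = int(net.network_address), self.root
        for i in range(net.prefixlen):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                return                                  # prefix not present
        node.port = None                                # empty nodes are not pruned here

    def lookup(self, dest: str) -> Optional[int]:
        addr = int(ipaddress.ip_address(dest))
        node, best = self.root, None
        for i in range(33):                             # depths 0..32
            if node.port is not None:
                best = node.port                        # longest match seen so far
            if i == 32:
                break
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break
        return best
```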
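16. Trees and Tries: Multiway tries
[Figure: a 16-ary search trie in which each node holds 16 (4-bit value, pointer) entries from (0000, ptr) to (1111, ptr), with null pointers shown as 0; the example keys are 000011110000 and 111111111111]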
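Multiway tries consume several bits per level. Prefixes whose length is not a multiple of the stride are expanded into every slot they cover (controlled prefix expansion), keeping the longer prefix when two expansions overlap. The Python sketch below assumes a fixed 4-bit stride, matching the 16-ary trie, so an IPv4 lookup visits at most 8 nodes; the classes are illustrative, and deletion is left out because undoing an expansion requires recomputing the affected slots.

```python
import ipaddress
from typing import List, Optional

STRIDE = 4                 # bits consumed per level: 4 -> 16-ary nodes
FANOUT = 1 << STRIDE


class MultibitNode:
    def __init__(self) -> None:
        self.port: List[Optional[int]] = [None] * FANOUT           # expanded next hop per slot
        self.plen: List[int] = [-1] * FANOUT                        # length of prefix stored in each slot
        self.child: List[Optional["MultibitNode"]] = [None] * FANOUT


class MultibitTrie:
    def __init__(self) -> None:
        self.root = MultibitNode()
        self.default: Optional[int] = None                          # /0 route, kept outside the nodes

    @staticmethod
    def _chunk(value: int, level: int) -> int:
        """The STRIDE bits of a 32-bit value used at the given trie level."""
        return (value >> (32 - STRIDE * (level + 1))) & (FANOUT - 1)

    def insert(self, prefix: str, port: int) -> None:
        net = ipaddress.ip_network(prefix)
        value, length = int(net.network_address), net.prefixlen
        if length == 0:
            self.default = port
            return
        full = (length - 1) // STRIDE          # whole strides above the node that stores the prefix
        node = self.root
        for level in range(full):
            idx = self._chunk(value, level)
            if node.child[idx] is None:
                node.child[idx] = MultibitNode()
            node = node.child[idx]
        rem = length - full * STRIDE           # 1..STRIDE significant bits left in the last chunk
        span = 1 << (STRIDE - rem)             # slots covered after prefix expansion
        base = self._chunk(value, full) & ~(span - 1)
        for idx in range(base, base + span):
            if length >= node.plen[idx]:       # never let a shorter prefix overwrite a longer one
                node.port[idx], node.plen[idx] = port, length

    def lookup(self, dest: str) -> Optional[int]:
        addr = int(ipaddress.ip_address(dest))
        best, node, level = self.default, self.root, 0
        while node is not None:                # at most 32 / STRIDE = 8 node visits
            idx = self._chunk(addr, level)
            if node.port[idx] is not None:
                best = node.port[idx]          # deeper levels hold longer prefixes
            node, level = node.child[idx], level + 1
        return best
```

A larger stride means fewer memory accesses per lookup but more expanded slots per prefix, which is the tradeoff the next slide tabulates.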
17. Lookup in Multiway Tries: Tradeoffs
Table produced from 2^15 randomly generated 48-bit addresses
18. Routing Lookups in Hardware
[Figure: number of prefixes by prefix length]
Most prefixes are 24 bits or shorter.
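19. Routing Lookups in Hardware
Prefixes up to 24 bits: a table of 2^24 = 16M entries, indexed by the first 24 bits of the destination address.
[Figure: for destination 142.19.6.14, the first 24 bits (142.19.6) index the table directly; the last 8 bits (14) are not needed]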
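20. Routing Lookups in Hardware
[Figure: prefixes up to 24 bits and longer ones. For destination 128.3.72.44, the entry for 128.3.72 has its flag bit set to 1, so instead of a next hop it points into a second table, where the last 8 bits (44) select the next hop]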
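The two slides describe a two-level table in the spirit of the DIR-24-8 scheme of Gupta, Lin, and McKeown: a 2^24-entry table answers lookups for prefixes up to /24 in one memory access, and a flag bit redirects longer prefixes to a second table indexed by the last 8 bits. This is a Python sketch under our own assumptions, not their implementation: 16-bit entries (1 flag bit plus 15 bits of next hop or block number), next hop 0 meaning "no route", and a table built once from a complete route list rather than updated incrementally.

```python
import ipaddress
from array import array
from typing import Iterable, Tuple

FLAG = 0x8000        # top bit of a TBL24 entry: 1 = entry points to a 256-entry TBLlong block
MASK = 0x7FFF        # low 15 bits: next hop (flag 0) or block number (flag 1); 0 = no route


class Dir24_8:
    """Two-level lookup table sketch: TBL24 (2**24 entries) plus 256-entry TBLlong blocks."""

    def __init__(self, routes: Iterable[Tuple[str, int]]) -> None:
        self.tbl24 = array("H", [0]) * (1 << 24)          # 16M entries x 2 bytes = 32 MB
        self.tbllong = array("H")
        nets = sorted(((ipaddress.ip_network(p), hop) for p, hop in routes),
                      key=lambda r: r[0].prefixlen)      # shorter first: longer prefixes overwrite
        for net, hop in nets:
            assert 0 < hop <= MASK, "next hop must fit in 15 bits"
            self._insert(net, hop)

    def _insert(self, net: ipaddress.IPv4Network, hop: int) -> None:
        start, plen = int(net.network_address), net.prefixlen
        if plen <= 24:                                    # expand into 2**(24-plen) TBL24 entries
            first, count = start >> 8, 1 << (24 - plen)
            self.tbl24[first:first + count] = array("H", [hop]) * count
            return
        i = start >> 8
        if not self.tbl24[i] & FLAG:                      # first long prefix under this /24
            block = len(self.tbllong) // 256
            assert block <= MASK, "out of TBLlong blocks"
            self.tbllong.extend([self.tbl24[i]] * 256)    # inherit the covering shorter route
            self.tbl24[i] = FLAG | block
        base = (self.tbl24[i] & MASK) * 256 + (start & 0xFF)
        count = 1 << (32 - plen)
        self.tbllong[base:base + count] = array("H", [hop]) * count

    def lookup(self, dest: str) -> int:
        addr = int(ipaddress.ip_address(dest))
        entry = self.tbl24[addr >> 8]                     # memory access 1: first 24 bits
        if entry & FLAG:                                  # memory access 2 only for long prefixes
            return self.tbllong[(entry & MASK) * 256 + (addr & 0xFF)]
        return entry                                      # next hop, or 0 if no route
```

With routes such as `("142.19.6.0/24", 4)` and a hypothetical `("128.3.72.32/27", 6)` under `("128.3.0.0/16", 2)`, the lookup for 142.19.6.14 takes one access, while 128.3.72.44 follows the flagged entry into the second table.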
21. Basic Architectural Components: Interconnect
22. First-Generation IP Routers
[Figure: CPU and buffer memory attached to a shared backplane]
- Most Ethernet switches and cheap packet routers
- Bottleneck can be CPU, host-adaptor or I/O bus
- What is costly? Bus? Memory? Interface? CPU?
23. Second-Generation IP Routers
- Port mapping intelligence in line cards
- Higher hit rate in local lookup cache
- What is costly? Bus? Memory? Interface? CPU?
24. Third-Generation Switches/Routers
[Figure: line cards, each with a MAC and local buffer memory, and a CPU card connected by a switched backplane]
- Third generation switch provides parallel paths (fabric)
- What's costly? Bus? Memory? CPU?
25. Fourth-Generation Switches/Routers: Clustering and Multistage
[Figure: clustered, multistage switch interconnecting ports numbered 1 to 32]
26. Circuit switch
- A switch that can handle N calls has N logical inputs and N logical outputs
- N up to 200,000
- Moves 8-bit samples from an input to an output port
- Recall that samples have no headers
- Destination of a sample depends on the time at which it arrives at the switch
- In practice, input trunks are multiplexed
- Multiplexed trunks carry frames, i.e. sets of samples
- Goal: extract samples from the frame and, depending on their position in the frame, switch them to the output
- Each incoming sample has to get to the right output line and the right slot in the output frame
27. Call blocking
- Can't find a path from input to output
- Internal blocking
- slot in output frame exists, but no path
- Output blocking
- no slot in output frame is available
- Output blocking is reduced in transit switches
- need to put a sample in one of several slots
going to the desired next hop
28. Multiplexors and demultiplexors
- Most trunks time division multiplex voice samples
- At a central office, the trunk is demultiplexed and distributed to active circuits
- Synchronous multiplexor
- N input lines
- Output runs N times as fast as input
[Figure: a MUX combines input lines 1..N onto one trunk; a De-MUX splits the trunk back into lines 1..N]
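A synchronous multiplexor needs no per-sample addressing: slot position alone identifies the line. A minimal Python sketch, assuming exactly one sample per input line per frame; `mux` and `demux` are our own names.

```python
from typing import List, Sequence


def mux(inputs: Sequence[Sequence[int]]) -> List[int]:
    """Interleave one sample from each of the N lines per frame; the trunk runs N times faster."""
    n = len(inputs)
    frames = len(inputs[0])
    assert all(len(line) == frames for line in inputs), "synchronous: equal-rate lines"
    return [inputs[slot][t] for t in range(frames) for slot in range(n)]


def demux(trunk: Sequence[int], n: int) -> List[List[int]]:
    """Slot position within each N-slot frame identifies the output line; samples carry no header."""
    return [list(trunk[slot::n]) for slot in range(n)]


trunk = mux([[10, 11], [20, 21], [30, 31]])
print(trunk)            # [10, 20, 30, 11, 21, 31]
print(demux(trunk, 3))  # [[10, 11], [20, 21], [30, 31]]
```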
29. Switching: what does a switch do?
- Transfers data from an input to an output
- many ports (density), high speeds
- E.g., crossbar
30. Time division switching
- Key idea: when de-multiplexing, position in frame determines output trunk
- Time division switching interchanges sample position within a frame: time slot interchange (TSI)
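A time slot interchanger can be modelled as writing each incoming sample into the output slot assigned to its call. A minimal Python sketch, assuming the slot map is fixed at call setup and that the function name is our own; in hardware the same effect comes from writing samples into a buffer in arrival order and reading them out in permuted order, which adds up to one frame of delay.

```python
from typing import List, Sequence


def time_slot_interchange(frame: Sequence[int], out_slot: Sequence[int]) -> List[int]:
    """out_slot[i] is the output-frame slot for the sample arriving in slot i.

    The map is set up at call establishment, so the samples themselves need no headers.
    """
    assert sorted(out_slot) == list(range(len(frame))), "map must be a permutation of the slots"
    out = [0] * len(frame)
    for in_slot, sample in enumerate(frame):
        out[out_slot[in_slot]] = sample
    return out


print(time_slot_interchange([1, 2, 3, 4], [2, 0, 3, 1]))  # [2, 4, 1, 3]
```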
31. Space division switching
- Each sample takes a different path through the
switch, depending on its destination
32. Crossbar
- Simplest possible space-division switch
- Crosspoints can be turned on or off, long enough to transfer a packet from an input to an output
- Internally nonblocking
- but need N^2 crosspoints
- time to set each crosspoint grows quadratically
33. Multistage crossbar
- In a crossbar, during each switching time only one crosspoint per row or column is active
- Can save crosspoints if a crosspoint can attach to more than one input line (why?)
- This is done in a multistage crossbar
- Need to rearrange connections every switching time
34. Multistage crossbar
- Can suffer internal blocking
- unless sufficient number of second-level stages
- Number of crosspoints < N^2
- Finding a path from input to output requires a depth-first-search
- Scales better than crossbar, but still not too well
- 120,000 call switch needs 250 million crosspoints
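The 250 million figure can be sanity-checked with the classic crosspoint count for a three-stage Clos network. The Python sketch below assumes that structure: r = ceil(N/n) edge switches of size n x k on each side and k = 2n - 1 middle switches of size r x r, which is Clos's condition for a strictly nonblocking network. Function names are ours.

```python
import math
from typing import Tuple


def clos_crosspoints(n_ports: int, n: int) -> int:
    """Crosspoints in a strictly nonblocking 3-stage Clos network with edge-switch size n."""
    r = math.ceil(n_ports / n)       # number of input (and output) edge switches
    k = 2 * n - 1                    # middle switches needed for strict nonblocking
    return 2 * r * n * k + k * r * r


def best_clos(n_ports: int) -> Tuple[int, int]:
    """(crosspoints, n) for the edge-switch size n that minimises the crosspoint count."""
    return min((clos_crosspoints(n_ports, n), n) for n in range(1, n_ports + 1))


N = 120_000
cost, n = best_clos(N)
print(f"crossbar: {N * N:,} crosspoints")
print(f"3-stage Clos, n={n}: {cost:,} crosspoints")
```

Under these assumptions the best choice of n gives roughly 235 million crosspoints, in line with the slide's estimate and about 60 times fewer than the 14.4 billion of a full crossbar.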
35. Packet switches
- In a circuit switch, the path of a sample is determined at the time of connection establishment
- No need for a sample header: position in frame is used
- In a packet switch, packets carry a destination field or label
- Need to look up destination port on-the-fly
- Datagram switches
- lookup based on entire destination address (longest-prefix match)
- Cell or Label-switches
- lookup based on VCI or Labels
36. Blocking in packet switches
- Can have both internal and output blocking
- Internal
- no path to output
- Output
- trunk unavailable
- Unlike a circuit switch, cannot predict if packets will block (why?)
- If packet is blocked => must either buffer or drop
37. Dealing with blocking in packet switches
- Over-provisioning
- internal links much faster than inputs
- Buffers
- at input or output
- Backpressure
- if switch fabric doesn't have buffers, prevent packet from entering until path is available
- Parallel switch fabrics
- increases effective switching capacity
38. Switch Fabrics: Buffered crossbar
- What happens if packets at two inputs both want to go to the same output?
- Can defer one at an input buffer
- Or, buffer at the crosspoints: complex arbiter
39. Switch fabric element
- Goal: towards building self-routing fabrics
- Can build complicated fabrics from a simple element
- Routing rule: if 0, send packet to upper output, else to lower output
- If both packets go to the same output, buffer or drop
40. Banyan
- Simplest self-routing recursive fabric
- What if two packets both want to go to the same output?
- output blocking
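Self-routing means each 2x2 element looks at just one bit of the destination. The Python sketch below simulates an N x N omega (shuffle-exchange) network, which we assume here as the banyan-family topology: every stage is a perfect shuffle followed by elements that apply the routing rule above, most significant bit first. The function name is ours; it returns `None` when two packets would need the same link.

```python
from typing import List, Optional, Sequence


def omega_route(dests: Sequence[Optional[int]]) -> Optional[List[Optional[int]]]:
    """dests[i] is the output wanted by the packet at input i (None = idle input).

    Returns the packet found at each output, or None if any two packets collide.
    """
    n = len(dests)
    stages = n.bit_length() - 1
    assert n == 1 << stages, "N must be a power of 2"
    pos = {i: d for i, d in enumerate(dests) if d is not None}    # link position -> destination
    for s in range(stages):
        nxt = {}
        for p, d in pos.items():
            shuffled = ((p << 1) | (p >> (stages - 1))) & (n - 1)  # perfect shuffle = rotate left
            bit = (d >> (stages - 1 - s)) & 1                      # destination bit for this stage
            q = (shuffled & ~1) | bit                              # 0 -> upper port, 1 -> lower port
            if q in nxt:
                return None                                        # two packets need the same link
            nxt[q] = d
        pos = nxt
    return [pos.get(i) for i in range(n)]
```

With `[0, None, None, None, 1, None, None, None]` the packets at inputs 0 and 4 collide in the first stage even though their outputs differ, which is internal blocking; a sorted, gap-free input such as `[1, 4, 6, None, None, None, None, None]` routes without conflict, consistent with the sorting argument that follows.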
41. Blocking in Banyan Switches: Sorting
- Can avoid blocking by choosing the order in which packets appear at input ports
- If we can
- present packets at inputs sorted by output
- remove duplicates
- remove gaps
- precede banyan with a perfect shuffle stage
- then no internal blocking
- For example: X, 010, 010, X, 011, X, X, X
- Sort => 010, 010, 011, X, X, X, X, X
- Remove dups => 010, 011, X, X, X, X, X, X
- Shuffle => 010, X, 011, X, X, X, X, X
- Need sort, shuffle, and trap networks
42. Sorting using Merging
- Build sorters from merge networks
- Assume we can merge two sorted lists
- Sort pairwise, merge, recurse
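Batcher's odd-even merge sort follows this recipe directly: sort each half, then merge the halves with a fixed network of compare-exchange elements. A Python sketch that generates the comparator list for a power-of-two number of inputs and applies it to a list; in a Batcher-banyan fabric each comparator would be a 2x2 sorting element, and idle inputs (the X entries) could be given a key larger than any output so that they sink to the bottom.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def oddeven_merge(lo: int, n: int, r: int) -> Iterator[Tuple[int, int]]:
    """Comparators merging the two sorted halves of wires lo..lo+n-1, taken with stride r."""
    step = r * 2
    if step < n:
        yield from oddeven_merge(lo, n, step)        # merge the even subsequence
        yield from oddeven_merge(lo + r, n, step)    # merge the odd subsequence
        for i in range(lo + r, lo + n - r, step):    # final compare-exchange of neighbours
            yield (i, i + r)
    else:
        yield (lo, lo + r)


def oddeven_merge_sort(lo: int, hi: int) -> Iterator[Tuple[int, int]]:
    """Sort wires lo..hi inclusive (hi - lo + 1 a power of 2): sort halves, then merge."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort(lo, mid)
        yield from oddeven_merge_sort(mid + 1, hi)
        yield from oddeven_merge(lo, hi - lo + 1, 1)


def apply_network(values: Sequence[int], comparators: Iterable[Tuple[int, int]]) -> List[int]:
    out = list(values)
    for i, j in comparators:
        if out[i] > out[j]:                          # each comparator is one 2x2 sorting element
            out[i], out[j] = out[j], out[i]
    return out


network = list(oddeven_merge_sort(0, 7))             # 8 inputs -> 19 comparators
print(apply_network([7, 2, 5, 0, 4, 6, 1, 3], network))
```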
43. Non-Blocking Batcher-Banyan
[Figure: an 8-input Batcher sorter followed by a self-routing (banyan) network; packets arriving in arbitrary order are sorted by destination and then delivered to outputs 000 through 111]
- Fabric can be used as scheduler.
- Batcher-Banyan network is blocking for multicast.
44. Basic Architectural Components: Queuing, Classification
45. Queuing: Two basic techniques
- Input Queueing: usually a non-blocking switch fabric (e.g. crossbar)
- Output Queueing: usually a fast bus
46. Queuing: Output Queueing
[Figure: two ways to build output queueing for ports 1..N: individual output queues, and a centralized shared memory]
47. Input Queueing: Head of Line Blocking
[Figure: delay vs. load (up to 100%) for input queueing with head-of-line blocking]
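Head-of-line blocking can be reproduced with a small saturation experiment. The Python sketch assumes every input always has a cell waiting, uniformly random destinations, and each output serving one randomly chosen contender per slot; the function name and parameters are illustrative. Under these assumptions throughput approaches the well-known limit of 2 - sqrt(2), about 58.6%, as N grows (Karol, Hluchyj, and Morgan), which is why delay grows without bound well before 100% load.

```python
import random


def hol_saturation_throughput(n_ports: int, slots: int, seed: int = 0) -> float:
    """Saturated FIFO input queues with uniformly random destinations.

    Each output picks one contending head-of-line (HOL) cell at random per slot;
    losers stay at the head of their queue and block the cells behind them.
    Returns cells delivered per output per slot.
    """
    rng = random.Random(seed)
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]   # HOL destination of each input
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            delivered += 1
            hol[winner] = rng.randrange(n_ports)   # the next queued cell moves to the head
    return delivered / (n_ports * slots)


print(f"N=32: saturation throughput ~ {hol_saturation_throughput(32, 2000):.3f}")
```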
48. Solution: Input Queueing with Virtual Output Queues (VOQ)
49. Input Queues: Virtual Output Queues
[Figure: delay vs. load (up to 100%) with virtual output queues]
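With virtual output queues each input keeps a separate queue per output, so a cell is never stuck behind a cell for a busy output. The Python sketch contrasts a single scheduling slot under FIFO queues and under VOQs; the greedy maximal matching used for VOQs is a simple stand-in of our own, not iSLIP or any other published scheduler.

```python
from typing import Dict, List, Sequence, Tuple


def fifo_schedule(queues: Sequence[List[int]]) -> List[Tuple[int, int]]:
    """One slot with FIFO input queues: only head-of-line cells may contend."""
    taken, matches = set(), []
    for inp, q in enumerate(queues):
        if q and q[0] not in taken:
            taken.add(q[0])
            matches.append((inp, q[0]))
    return matches


def voq_schedule(voqs: Sequence[Dict[int, int]]) -> List[Tuple[int, int]]:
    """One slot with VOQs: voqs[i][j] = number of cells at input i for output j.

    Greedy maximal matching: each input takes the first free output it has cells for.
    """
    taken, matches = set(), []
    for inp, voq in enumerate(voqs):
        for out in sorted(voq):
            if voq[out] > 0 and out not in taken:
                taken.add(out)
                matches.append((inp, out))
                break
    return matches


# Input 1's cell for output 2 is stuck behind its cell for output 0 in a FIFO.
print(fifo_schedule([[0, 1], [0, 2]]))               # [(0, 0)]
print(voq_schedule([{0: 1, 1: 1}, {0: 1, 2: 1}]))    # [(0, 0), (1, 2)]
```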
50. Packet Classification
[Figure: the header of an incoming packet is matched against the classifier to select an action]
51. Multi-field Packet Classification
Given a classifier with N rules, find the action
associated with the highest priority rule
matching an incoming packet.
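The baseline classifier is a linear search over all N rules. A minimal Python sketch with three hypothetical rules over source prefix, destination prefix, and destination-port range; we assume a smaller `priority` number means higher priority, and the field choice is ours, not the slides'.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    priority: int            # assumption: lower number = higher priority
    src: str                 # source prefix; "0.0.0.0/0" is a wildcard
    dst: str                 # destination prefix
    dport: Tuple[int, int]   # inclusive destination-port range
    action: str


RULES = [                    # hypothetical example rules
    Rule(1, "128.9.0.0/16", "0.0.0.0/0", (80, 80), "drop"),
    Rule(2, "0.0.0.0/0", "65.0.0.0/8", (0, 65535), "forward-port-3"),
    Rule(3, "0.0.0.0/0", "0.0.0.0/0", (0, 65535), "default"),
]


def classify(rules: List[Rule], src: str, dst: str, dport: int) -> Optional[str]:
    """Return the action of the highest-priority rule matching every field (O(N) per packet)."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    best = None
    for rule in rules:
        if (s in ipaddress.ip_network(rule.src)
                and d in ipaddress.ip_network(rule.dst)
                and rule.dport[0] <= dport <= rule.dport[1]
                and (best is None or rule.priority < best.priority)):
            best = rule
    return best.action if best else None


print(classify(RULES, "128.9.16.14", "65.1.2.3", 80))    # drop
print(classify(RULES, "128.9.16.14", "65.1.2.3", 443))   # forward-port-3
```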
52. Prefix matching: 1-d range problem
[Figure: the prefix 128.9/16 as a range on the address line from 0 to 2^32-1, containing the address 128.9.16.14]
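Viewing each prefix as a range suggests lookup by binary search: the range endpoints split the address line into at most 2N+1 elementary intervals, and within each interval the longest matching prefix never changes. A Python sketch of this range-matching idea, assuming the interval table is rebuilt from scratch on every update; the function names are ours.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple


def build_ranges(table: List[Tuple[str, int]]) -> Tuple[List[int], List[Optional[int]]]:
    """Split 0..2**32-1 at every prefix start and end+1; precompute the LPM answer per interval."""
    nets = [(ipaddress.ip_network(p), port) for p, port in table]
    bounds = {0}
    for net, _ in nets:
        bounds.add(int(net.network_address))
        end = int(net.broadcast_address) + 1
        if end < 2 ** 32:
            bounds.add(end)
    starts = sorted(bounds)
    answers: List[Optional[int]] = []
    for b in starts:                                   # same covering prefixes across the interval
        addr = ipaddress.ip_address(b)
        best = max(((net.prefixlen, port) for net, port in nets if addr in net), default=None)
        answers.append(best[1] if best else None)
    return starts, answers


def range_lookup(starts: List[int], answers: List[Optional[int]], dest: str) -> Optional[int]:
    """O(log N) lookup: find the elementary interval containing dest."""
    i = bisect.bisect_right(starts, int(ipaddress.ip_address(dest))) - 1
    return answers[i]


starts, answers = build_ranges([("65.0.0.0/8", 3), ("128.9.0.0/16", 1),
                                ("128.9.16.0/21", 5), ("128.9.176.0/24", 2)])
print(range_lookup(starts, answers, "128.9.16.14"))  # 5
```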
53. Classification: 2D Geometry problem
[Figure: rules R1-R7 drawn as regions in the plane spanned by Field 1 and Field 2, e.g. (144.24/16, 64/24) and (128.16.46.23, *)]
54. Summary
- High speed routers: lookup, switching, classification, buffer management
- Lookup: range-matching, tries, multi-way tries
- Switching: circuit s/w, crossbar, batcher-banyan
- Queuing: input/output queuing issues
- Classification: multi-dimensional geometry problem