Title: End of Design Principles!
1. End of Design Principles!
- Goals
  - framework for covering advanced topics
  - material that is timely and timeless: hot now, but also with a long shelf life
  - synthesis and deeper understanding: see the forest for the trees
2. Routers in a Network
3. Sample Routers and Switches
- Cisco 12816 Router: up to 1.28 Tb/s throughput, up to 40 Gb/s ports
- Juniper Networks T640 Router: up to 11.28 Tb/s throughput, up to 40 Gb/s ports
- 3Com 3870: 48-port gigabit Ethernet switch
4. High Capacity Router
- Cisco CRS-1
  - up to 92 Tb/s throughput
  - two rack types
    - line card rack
      - 640 Gb/s throughput
      - up to 16 line cards, up to 40 Gb/s each
      - up to 72 racks
    - switch rack
      - central switch stage
      - up to 8 racks
  - continuous service operation
5. Components of a Basic Router
- Input/Output Interfaces (II, OI)
  - convert between optical signals and electronic signals
  - extract timing from received signals
  - encode (decode) data for transmission
- Input Port Processor (IPP)
  - synchronizes signals
  - determines the required OI or OIs from the routing table
- Output Port Processor (OPP)
  - queues outgoing cells
- shared bus interconnects IPPs and OPPs
- Control Processor (CP)
  - configures routing tables
  - coordinates end-to-end channel setup together with neighboring routers
6. Router functionality
[Datapath figure: each packet's header (Hdr) goes through header processing - IP address lookup against the address table, packet classification, header update - and the packet (Data + Hdr) is then queued in buffer memory.]
7. Lookups Must be Fast
Year   Line rate   40B packets (Mpkt/s)
1997   622 Mb/s    1.94
1999   2.5 Gb/s    7.81
2001   10 Gb/s     31.25
2003   40 Gb/s     125
(the arithmetic behind the last row is worked out below)
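The packet rate follows directly from the line rate and the 40-byte packet size; a minimal worked check for the 40 Gb/s row:

\[ \frac{40\ \text{Gb/s}}{40\ \text{B/pkt} \times 8\ \text{b/B}} = 125\ \text{Mpkt/s} \quad\Longrightarrow\quad \text{one lookup roughly every } 8\ \text{ns} \]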
8. Memory Technology (2004-05)
Technology        Single-chip density   $/chip ($/MByte)    Access speed   Watts/chip
Networking DRAM   512 Mb                6-10 (0.08-0.4)     20-40 ns       1-2 W
SRAM              36 Mb                 80 (1.7)            4-8 ns         0.5-1 W
TCAM              1 Mb                  200-250 (200-250)   4-8 ns         15-30 W
Note: price, speed, and power are manufacturer- and market-dependent.
9. Lookup Mechanism is Protocol Dependent
Protocol              Mechanism                     Techniques
MPLS, ATM, Ethernet   Exact-match search            Direct lookup; associative lookup; hashing; binary/multi-way search; trie/tree
IPv4, IPv6            Longest-prefix-match search   Radix trie and variants; compressed trie; binary search on prefix intervals
10. Exact Matches in Ethernet Switches
- layer-2 addresses are usually 48 bits long
- the address is global, not just local to the link
- range/size of the address is not negotiable
- 2^48 > 10^12, therefore we cannot hold all addresses in a table and use direct lookup (see the arithmetic below)
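For scale, a quick check of the bullet above:

\[ 2^{48} \approx 2.8 \times 10^{14} \gg 10^{12} \]

so a direct-lookup table indexed by the full 48-bit address would need on the order of 10^14 entries, far more than any table memory can hold.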
11. Exact Matches in Ethernet Switches: Associative Lookup
- associative memory (aka Content Addressable Memory, CAM) compares all entries in parallel against the incoming data
[Figure: the 48-bit network address is presented to the CAM; the location of the matching entry addresses a conventional memory holding the associated data.]
12. Exact Matches in Ethernet Switches: Hashing
[Figure: the 48-bit network address is hashed down to a short (say 16-bit) pointer that selects a list/bucket in memory; the bucket holds the network addresses (and associated data) that hashed to it.]
- use a pseudo-random hash function (behavior is relatively insensitive to the actual function)
- the bucket is linearly searched (or could be binary search, etc.)
- unpredictable number of memory references
(a minimal sketch of this scheme follows below)
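A minimal Python sketch of this bucket-hashing scheme (the multiplicative hash, the 16-bit table width, and the example MAC address are illustrative assumptions, not from the slide):

# Hash-based exact match for 48-bit Ethernet addresses: a pseudo-random hash
# folds the address into a short index, and each bucket holds a linearly
# searched list of (address, data) entries.

NUM_BUCKETS = 1 << 16                  # 16-bit hash, as in the slide's figure

def mac_hash(addr48):
    # simple multiplicative hash; the exact function matters little
    return ((addr48 * 0x9E3779B97F4A7C15) >> 32) & (NUM_BUCKETS - 1)

table = [[] for _ in range(NUM_BUCKETS)]

def insert(addr48, port):
    table[mac_hash(addr48)].append((addr48, port))

def lookup(addr48):
    # linear search within the bucket: the number of memory references is not deterministic
    for a, p in table[mac_hash(addr48)]:
        if a == addr48:
            return p
    return None

insert(0x001B44113AB7, 3)              # illustrative MAC address -> output port 3
print(lookup(0x001B44113AB7))          # -> 3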
13. Exact Matches in Ethernet Switches: Perfect Hashing
[Figure: the 48-bit network address is hashed directly to a (say 16-bit) memory address whose entry holds the data, e.g. the output port.]
- there always exists a perfect hash function
- goal: with a perfect hash function, a memory lookup always takes O(1) memory references
- problems
  - finding a perfect hash function is very complex
  - updates?
(a brute-force illustration follows below)
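To illustrate why finding a perfect hash function is expensive, the toy sketch below just retries randomly seeded hash functions until one maps a fixed address set with no collisions (real constructions are far more sophisticated; the hash function, sizes, and address set are assumptions for illustration):

import random

TABLE_BITS = 16                                    # 2^16-entry table, as in the figure

def seeded_hash(addr48, seed):
    # seeded multiplicative hash into the table
    return (((addr48 ^ seed) * 0x9E3779B97F4A7C15) >> 32) & ((1 << TABLE_BITS) - 1)

def find_perfect_seed(addrs, max_tries=1_000_000):
    # brute force: keep drawing seeds until one is collision-free on this *fixed* set.
    # With such a seed every lookup is a single (O(1)) memory reference, but any
    # update to the address set may force the whole search to be redone.
    for _ in range(max_tries):
        seed = random.getrandbits(64)
        if len({seeded_hash(a, seed) for a in addrs}) == len(addrs):
            return seed
    raise RuntimeError("no perfect seed found")

addrs = [random.getrandbits(48) for _ in range(256)]   # illustrative address set
print(hex(find_perfect_seed(addrs)))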
14. Exact Matches in Ethernet Switches: Hashing
- advantages
  - simple
  - expected lookup time is small
- disadvantages
  - inefficient use of memory
  - non-deterministic lookup time
- => attractive for software-based switches, but decreasing use in hardware platforms
15. Longest Prefix Match is Harder than Exact Match
- the destination address of an arriving packet does not carry information to determine the length of the longest matching prefix
- hence we need to search the space of all prefix lengths, as well as the space of prefixes of a given length
16. LPM in IPv4 Using Exact Match
- use 32 exact match algorithms, one per prefix length
[Figure: the address is matched in parallel against prefixes of length 1, length 2, ..., length 32; a priority encoder picks the port from the longest length that matched.]
(a sketch follows below)
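A compact sketch of the idea in Python: one exact-match (hash) table per prefix length, probed from longest to shortest, which is equivalent to the parallel match plus priority encoder in the figure. The routes and interface names are made up for illustration:

# One exact-match table per prefix length; the first hit when probing
# from length 32 down to length 1 is the longest-prefix match.

import ipaddress

tables = {length: {} for length in range(1, 33)}    # length -> {prefix bits -> next hop}

def add_route(prefix, next_hop):
    net = ipaddress.ip_network(prefix)
    tables[net.prefixlen][int(net.network_address) >> (32 - net.prefixlen)] = next_hop

def lookup(addr):
    a = int(ipaddress.ip_address(addr))
    for length in range(32, 0, -1):                 # longest matching length wins
        hit = tables[length].get(a >> (32 - length))
        if hit is not None:
            return hit
    return None                                     # no match (no default route here)

add_route("10.0.0.0/8", "if1")
add_route("10.1.0.0/16", "if2")
print(lookup("10.1.2.3"))   # -> "if2" (the /16 match beats the /8 match)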
17. IP Address Lookup
- routing tables contain (prefix, next hop) pairs
- the address in the packet is compared to the stored prefixes, starting at the left
- the prefix that matches the largest number of address bits is the desired match
- the packet is forwarded to the specified next hop
Routing table:
prefix      next hop
10          7
01          5
110         3
1011        5
0001        0
0101 1      7
0001 0      1
0011 00     2
1011 001    3
1011 010    5
0100 110    6
0100 1100   4
1011 0011   8
1001 1000   10
0101 1001   9
Problem: a large router may have 100,000 prefixes in its list.
Example: for the address 1011 0010 1000, the matching prefixes are 10, 1011, and 1011 001; the longest is 1011 001, so the next hop is 3 (checked by the scan below).
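For the example address, the same answer falls out of a naive linear scan over the table above (fine for illustration, hopeless for 100,000 prefixes):

# (prefix, next hop) pairs from the table above, prefixes written as bit strings
routes = [
    ("10", 7), ("01", 5), ("110", 3), ("1011", 5), ("0001", 0),
    ("01011", 7), ("00010", 1), ("001100", 2), ("1011001", 3), ("1011010", 5),
    ("0100110", 6), ("01001100", 4), ("10110011", 8), ("10011000", 10), ("01011001", 9),
]

def lpm(address_bits):
    best = None
    for prefix, next_hop in routes:              # scan every entry, keep the longest match
        if address_bits.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, next_hop)
    return best

print(lpm("101100101000"))   # -> ('1011001', 3): longest matching prefix, next hop 3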
18. Address Lookup Using Tries
P1 111 H1
P2 10 H2
P3 1010 H3
P4 10101 H4
- prefixes are spelled out by following a path from the root
- to find the best prefix, spell out the address in the tree
- the last marked (green) node passed is the longest matching prefix
- example: lookup 10111
- adding a prefix is easy
(a runnable sketch follows below)
[Binary trie figure: nodes A-H with 0/1-labelled edges; the marked (green) nodes store the prefixes P1-P4.]
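A minimal 1-bit (binary) trie in Python over the slide's prefix set; lookup("10111") returns the last prefix node passed on the path, which is P2:

class Node:
    def __init__(self):
        self.child = {}      # '0'/'1' -> Node
        self.prefix = None   # label stored if this node is a prefix (a "green" node)

root = Node()

def insert(bits, label):
    n = root
    for b in bits:
        n = n.child.setdefault(b, Node())
    n.prefix = label

def lookup(bits):
    n, best = root, None
    for b in bits:                        # spell out the address bit by bit
        if n.prefix is not None:
            best = n.prefix               # remember the last prefix node passed
        if b not in n.child:
            break
        n = n.child[b]
    else:
        if n.prefix is not None:          # address fully consumed: check the final node
            best = n.prefix
    return best

for bits, label in [("111", "P1"), ("10", "P2"), ("1010", "P3"), ("10101", "P4")]:
    insert(bits, label)

print(lookup("10111"))   # -> "P2", the longest matching prefix for this address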
19. Binary Tries
- W-bit prefixes: O(W) lookup, O(NW) storage, O(W) update complexity
- advantages
  - simplicity
  - extensible to wider fields
- disadvantages
  - worst-case lookup is slow
  - storage is wasted in long single-child chains
20. Leaf-pushed Binary Trie
Trie node layout: [left-ptr or next-hop | right-ptr or next-hop]
P1 111 H1
P2 10 H2
P3 1010 H3
P4 10101 H4
[Leaf-pushed binary trie figure: nodes A-G; internal nodes hold only pointers, and the next hops (P1, P2 replicated, P3, P4) are pushed down to the leaves.]
21. PATRICIA
- PATRICIA (Practical Algorithm To Retrieve Information Coded In Alphanumeric)
- leaves store complete key values
Example: lookup 10111
P1 111 H1
P2 10 H2
P3 1010 H3
P4 10101 H4
[PATRICIA trie figure: internal nodes A, B, E store the bit position (out of positions 1-5) to test next, skipping bits that do not discriminate; the leaves hold the complete prefixes P1-P4.]
22. PATRICIA
- W-bit prefixes: O(W^2) lookup, O(N) storage, O(W) update complexity
- advantages
  - decreased storage
  - extensible to wider fields
- disadvantages
  - worst-case lookup is slow
  - backtracking makes the implementation complex
23. Path-compressed Tree
P1 111 H1
P2 10 H2
P3 1010 H3
P4 10101 H4
[Path-compressed trie figure: each node stores (bit string, prefix, next bit position to inspect): A = (1, -, 2), B = (10, P2, 3), C = (111, P1), D = (1010, P3, 5), E = (10101, P4).]
Example: lookup 10111
24. Path-compressed Tree
- W-bit prefixes: O(W) lookup, O(N) storage, O(W) update complexity
- advantages
  - decreased storage
- disadvantages
  - worst-case lookup is slow
25. Multi-bit Tries
[Figure: a binary trie over W-bit prefixes has depth W, degree 2, and a stride of 1 bit per level; a multi-bit trie inspects a stride of k bits per level instead.]
26. Prefix Expansion with Multi-bit Tries
If the stride is k bits, prefixes whose lengths are not a multiple of k must be expanded. E.g., for k = 2:

Prefix   Expanded prefixes
0        00, 01
11       11

Maximum number of expanded prefixes corresponding to one non-expanded prefix: 2^(k-1)
(a small sketch follows below)
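The expansion itself is a one-line combinatorial step; for k = 2 the sketch below reproduces the table above:

from itertools import product

def expand(prefix, k):
    # pad the prefix up to the next multiple of k bits by appending every
    # combination of the missing bits; worst case 2^(k-1) expanded prefixes
    # (when k-1 bits are missing)
    pad = (-len(prefix)) % k
    return [prefix + "".join(bits) for bits in product("01", repeat=pad)]

print(expand("0", 2))    # -> ['00', '01']
print(expand("11", 2))   # -> ['11']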
27. 4-ary Trie (k = 2)
A four-ary trie node: [next-hop-ptr (if prefix) | ptr00 | ptr01 | ptr10 | ptr11]
P1 111 H1
P2 10 H2
P3 1010 H3
P4 10101 H4
[4-ary trie figure: nodes A-H with edges labelled by 2-bit strides (10, 11, ...); the expanded prefixes P11/P12 (from P1) and P41/P42 (from P4) appear where the odd-length prefixes were expanded. Example: lookup 10111.]
28. Prefix Expansion Increases Storage Consumption
- replication of the next-hop pointer
- greater number of unused (null) pointers in a node
Time: W/k memory accesses; storage: N * (W/k) * 2^(k-1) trie entries.
29. Generalization: Different Strides at Each Trie Level
- 16-8-8 split
- 4-10-10-8 split
- 24-8 split
- 21-3-8 split
30. Choice of Strides: Controlled Prefix Expansion
- Given a forwarding table and the desired number of memory accesses in the worst case (i.e., the maximum tree depth D), a dynamic programming algorithm computes the optimal sequence of strides that minimizes the storage requirement; it runs in O(W^2 D) time.
31. Router functionality
[Datapath figure repeated from slide 6: header processing (IP address lookup against the address table, packet classification, header update), then the packet (Data + Hdr) is queued in buffer memory.]
32. Packet Classification
- general router mechanism
  - firewalls
  - network address translation
  - web server load balancing
  - special processing for selected flows
- common form based on 5 IP header fields
  - source/destination addresses - either or both specified by prefixes
  - protocol field - may be a wild-card
  - source/destination ports (TCP/UDP) - may be port ranges
- no ideal design
  - exhaustive search - slow links, few filters
  - ternary content-addressable memory - exhaustive search in hardware
  - efficient special cases - exact match, one or two address prefixes
33. Packet Classification
         Field 1 (L3-DA)   Field 2 (L3-SA)   ...   Field k (L4-PROT)   Action
Rule 1   5.3.40.0/21       2.13.8.11/32            UDP                 A1
Rule 2   5.168.3.0/24      152.133.0.0/16          TCP                 A2
...
Rule N   5.168.0.0/16      152.0.0.0/8             ANY                 AN

Packet classification: find the action associated with the highest-priority rule matching the incoming packet header.
34. Formal Problem Definition
- Given a classifier C with N rules Rj, 1 ≤ j ≤ N, where Rj consists of three entities:
  - a regular expression Rji, 1 ≤ i ≤ d, on each of the d header fields,
  - a number pri(Rj) indicating the priority of the rule in the classifier, and
  - an action, referred to as action(Rj).
- For an incoming packet P with header considered as a d-tuple of points (P1, P2, ..., Pd), the d-dimensional packet classification problem is to find the rule Rm with the highest priority among all rules Rj matching the d-tuple, i.e., pri(Rm) > pri(Rj), ∀ j ≠ m, 1 ≤ j ≤ N, such that Pi matches Rji, 1 ≤ i ≤ d. Rule Rm is the best matching rule for packet P.
35. Routing Lookup: an Instance of 1D Classification
- one dimension (the destination address)
  - forwarding table ↔ classifier
  - routing table entry ↔ rule
  - outgoing interface ↔ action
  - prefix length ↔ priority
36. Example 4D Classifier
Rule  L3-DA                             L3-SA                             L4-DP        L3-PROT   Action
R1    152.163.190.69/255.255.255.255    152.163.80.11/255.255.255.255                            Deny
R2    152.168.3/255.255.255             152.163.200.157/255.255.255.255   eq www       udp       Deny
R3    152.168.3/255.255.255             152.163.200.157/255.255.255.255   range 20-21  udp       Permit
R4    152.168.3/255.255.255             152.163.200.157/255.255.255.255   eq www       tcp       Deny
R5                                                                                               Deny
37. Example Classification Results
Pkt Hdr  L3-DA            L3-SA             L4-DP  L3-PROT  Rule, Action
P1       152.163.190.69   152.163.80.11     www    tcp      R1, Deny
P2       152.168.3.21     152.163.200.157   www    udp      R2, Deny
38. Geometric Interpretation
Packet classification problem: find the highest-priority rectangle containing an incoming point, e.g. (128.16.46.23, *).
[Figure: rules R1-R7 drawn as (possibly overlapping) rectangles in the 2-D space Dimension 1 x Dimension 2; a rule such as (144.24/24, 64/16) corresponds to one rectangle.]
39. Linear Search
- keep the rules in a linked list
- O(N) storage, O(N) lookup time, O(1) update complexity
(a sketch follows below)
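A linear-search classifier is just a scan over the rule list; the sketch below uses made-up two-field rules, with list position standing in for priority:

# Each rule: (DA prefix bits, SA prefix bits, action); list order = priority order.
rules = [
    ("0",  "10", "permit"),
    ("0",  "01", "deny"),
    ("",   "",   "deny"),       # default rule: empty prefixes match everything
]

def classify(da_bits, sa_bits):
    # O(N) probe: the first (highest-priority) matching rule wins
    for da, sa, action in rules:
        if da_bits.startswith(da) and sa_bits.startswith(sa):
            return action
    raise LookupError("no matching rule")

print(classify("000", "010"))   # -> "deny" (matches the second rule)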
40. Ternary Match Operation
- each TCAM entry stores a value, V, and a mask, M
- hence two bits (Vi and Mi) for each bit position i (i = 1..W)
- for an incoming packet header H = H1...HW, the TCAM entry outputs a match if Hi matches Vi in each bit position for which Mi equals 1

Vi  Mi  Match in bit position i?
X   0   Yes
0   1   Iff Hi = 0
1   1   Iff Hi = 1

(a sketch follows below)
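The match rule in the table reduces to one masked comparison; a sketch simulating a single TCAM entry on W-bit integers (the 8-bit width and the example entry are assumptions for illustration):

def ternary_match(header, value, mask):
    # match iff header agrees with value in every bit position where mask = 1;
    # positions with mask = 0 are "don't care" (the X row of the table)
    return (header ^ value) & mask == 0

# entry 1010 xxxx : value = 1010 0000, mask = 1111 0000 (W = 8 bits here)
print(ternary_match(0b10100110, 0b10100000, 0b11110000))   # -> True
print(ternary_match(0b10110110, 0b10100000, 0b11110000))   # -> False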
41. Lookups/Classification with Ternary CAM
[Figure: the packet header is compared in parallel against all TCAM entries in the memory array (e.g. <1.23.11.3, tcp> at location 0 through <1.23.x.x, x> at location M); the per-entry match lines feed a priority encoder whose output indexes the associated action in a RAM action memory.]
42. Lookups/Classification with Ternary CAM
[Figure: the same structure used for address lookup; the entries range from the fully specified 1.23.11.3 at location 0 to the prefix 1.23.x.x at location M, and the priority encoder selects the highest-priority matching entry.]
43. Range-to-prefix Blowup
- prefixes are easier to handle than ranges
- ranges can be transformed into prefixes
- this causes the range-to-prefix blowup problem
44. Range-to-prefix Blowup
Maximum memory blowup factor of (2W-2)^d

Rule  Range    Maximal prefixes
R1    [3,11]
R2    [2,7]
R3    [4,11]
R4    [4,7]
R5    [1,14]
45. Range-to-prefix Blowup
Maximum memory blowup factor of (2W-2)^d

Rule  Range    Maximal prefixes
R1    [3,11]   0011, 01, 10
R2    [2,7]    001, 01
R3    [4,11]   01, 10
R4    [4,7]    01
R5    [1,14]   0001, 001, 01, 10, 110, 1110

Luckily, real-life classifiers do not contain too many arbitrary ranges. (The transformation itself is sketched below.)
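The range-to-prefix transformation is mechanical: greedily peel off the largest aligned power-of-two block. For W = 4 this sketch reproduces the maximal-prefix column above:

def range_to_prefixes(lo, hi, width):
    # split the integer range [lo, hi] into maximal prefixes (aligned power-of-two blocks)
    prefixes = []
    while lo <= hi:
        size = lo & -lo if lo else 1 << width         # largest block aligned at lo...
        while size > hi - lo + 1:                     # ...that still fits in the range
            size //= 2
        bits = width - size.bit_length() + 1          # prefix length for this block
        prefixes.append(format(lo >> (width - bits), f"0{bits}b") if bits else "*")
        lo += size
    return prefixes

for lo, hi in [(3, 11), (2, 7), (4, 11), (4, 7), (1, 14)]:
    print((lo, hi), range_to_prefixes(lo, hi, 4))
# [3,11] -> ['0011', '01', '10'] ... [1,14] -> ['0001', '001', '01', '10', '110', '1110']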
46. TCAMs
- advantages
  - extensible to multiple fields
  - fast: 10-16 ns today (66-100 M searches per second), heading to 250 Msps
  - simple to understand and use
- disadvantages
  - inflexible: range-to-prefix blowup
  - high power and cost
  - low density: the largest available in 2003-4 is 2 MB, i.e., 128K x 128 (can be cascaded)
47. Example Classifier
Rule  Destination Address  Source Address
R1    0*                   10*
R2    0*                   01*
R3    0*                   1*
R4    00*                  1*
R5    00*                  11*
R6    10*                  1*
R7    *                    00*
48. Hierarchical Tries
Example search: (000, 010)

Rule  DA   SA
R1    0*   10*
R2    0*   01*
R3    0*   1*
R4    00*  1*
R5    00*  11*
R6    10*  1*
R7    *    00*

[Figure: a trie on the destination address (DA); each node corresponding to a DA prefix points to a source-address (SA) trie holding exactly the rules with that DA prefix. The search walks the DA path for 000 and probes the SA trie hanging off every DA prefix node it passes.]
O(NW) memory, O(W^2) lookup
(a runnable sketch follows below)
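A runnable sketch of a two-dimensional hierarchical trie over the rule set above: a DA trie whose prefix nodes each own an SA trie, and the search probes the SA trie at every DA prefix node it passes (hence the O(W^2) lookup). Treating a lower rule number as higher priority is an assumption made for this illustration:

class Trie:
    def __init__(self):
        self.child, self.data = {}, None

    def node_for(self, bits):
        # descend (creating nodes as needed) and return the node for this prefix
        n = self
        for b in bits:
            n = n.child.setdefault(b, Trie())
        return n

    def walk(self, bits):
        # yield the payload stored at every node on the path spelled by `bits`
        n = self
        if n.data is not None:
            yield n.data
        for b in bits:
            if b not in n.child:
                return
            n = n.child[b]
            if n.data is not None:
                yield n.data

# rule = (name, DA prefix, SA prefix); R7's DA is the empty prefix, i.e. a wildcard
rules = [("R1", "0", "10"), ("R2", "0", "01"), ("R3", "0", "1"), ("R4", "00", "1"),
         ("R5", "00", "11"), ("R6", "10", "1"), ("R7", "", "00")]

dst_trie = Trie()
for name, da, sa in rules:
    da_node = dst_trie.node_for(da)
    if da_node.data is None:
        da_node.data = Trie()                 # each DA prefix owns a source (SA) trie
    da_node.data.node_for(sa).data = name     # store the rule at its SA prefix node

def classify(da_bits, sa_bits):
    matches = []
    for sa_trie in dst_trie.walk(da_bits):    # every DA prefix node on the DA path...
        matches.extend(sa_trie.walk(sa_bits)) # ...has its SA trie searched in turn
    # lower rule number = higher priority (an assumption made for this sketch)
    return min(matches, default=None)

print(classify("000", "010"))                 # -> "R2", as in the slide's Search (000, 010)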
49. Set-pruning Tries
Example search: (000, 010)

Rule  DA   SA
R1    0*   10*
R2    0*   01*
R3    0*   1*
R4    00*  1*
R5    00*  11*
R6    10*  1*
R7    *    00*

[Figure: as in the hierarchical trie, a DA trie with an SA trie at each DA prefix node, but rules are replicated down into the SA tries of the more-specific DA nodes (e.g. R1, R2, R7 appear in several SA tries), so one DA walk plus one SA walk suffices.]
O(N^2) memory, O(2W) lookup
50. Grid-of-Tries
Example search: (000, 010)

Rule  DA   SA
R1    0*   10*
R2    0*   01*
R3    0*   1*
R4    00*  1*
R5    00*  11*
R6    10*  1*
R7    *    00*

[Figure: the hierarchical-trie structure augmented with precomputed switch pointers between SA tries, so the search never has to back up the DA trie.]
O(NW) memory, O(2W) lookup
51. Grid-of-Tries
20K two-dimensional rules: 2 MB, 9 memory accesses (with prefix expansion)
52. Classification Algorithms: Speed vs. Storage Tradeoff
Lower bounds for point location among N regions in d dimensions (from computational geometry):
- O(log N) time with O(N^d) storage, or
- O(log^(d-1) N) time with O(N) storage
Example: N = 100, d = 4 gives N^d ~ 100 MBytes of storage, or log^(d-1) N ~ 350 memory accesses.
53. Algorithms so far: Summary
- good for two fields, but don't scale to more than two fields, OR
- good only for very small classifiers (< 50 rules), OR
- have non-deterministic classification time, OR ...
54. Lookup: What's Used Out There?
- overwhelming majority of routers
  - modifications of multi-bit tries (hardware-optimized trie algorithms)
  - DRAM (sometimes SRAM) based; large number of routes (> 0.25M)
  - parallelism required for speed; storage becomes an issue
- others: mostly TCAM based
  - for a smaller number of routes (< 256K)
  - used more frequently in L2/L3 switches
  - power and cost are the main bottlenecks
55. Classification: What's Used Out There?
- majority of hardware platforms: TCAMs
  - high performance and deterministic worst case, but high cost and power
- some others: modifications of trie-based schemes
  - lower speed, low cost, DRAM-based, heuristic
  - works well in software platforms
- some others: nothing / linear search / simulated-parallel-search, etc.