Title: Advanced topics in Computer Networks
Lecture 9 Tree-based lookup
- University of Tehran
- Dept. of EE and Computer Engineering
- By Dr. Nasser Yazdani
Outline
- Issues
- Multiway and Multicolumn search
- DMP-Tree
- Some implementation issues
Issues
- How to sort prefixes
- Prefixes as ranges
- Comparing prefixes
- Based on length
- Add extra bits at the end
- New definition (DMP-tree)
- How to apply tree structures, such as binary trees or m-way trees, to prefixes
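Multiway tree lookup
- Proposed by G. Varghese and his students.
- Consider prefixes as ranges.
- First try: pad the prefixes with 0s so that binary search can be applied. Consider the prefixes 1*, 101*, and 10101*:
- 100000
- 101000
- 101010
- Search for the addresses 101011, 101110, and 111110. They should match 10101*, 101*, and 1*, respectively.
- Binary search ends at the same place (just after 101010) for all three addresses, although each should match a different prefix, so it fails for all of them!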
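A minimal Python sketch of this first try, assuming the 6-bit addresses of the example: it pads the three prefixes with 0s, binary-searches the padded keys with the standard `bisect` module, and compares where each search ends against a brute-force longest match. The names `pad_zeros`, `naive_search`, and `longest_match` are hypothetical.

```python
import bisect

PREFIXES = ["1", "101", "10101"]  # the prefixes 1*, 101*, 10101*
WIDTH = 6                         # address width used in the example

def pad_zeros(prefix: str, width: int = WIDTH) -> str:
    """Pad a prefix on the right with 0s to a fixed width."""
    return prefix.ljust(width, "0")

# Sorted padded keys: 100000, 101000, 101010.
PADDED = sorted(pad_zeros(p) for p in PREFIXES)

def naive_search(address: str) -> int:
    """Index of the last padded key <= address, i.e. where binary search ends."""
    return bisect.bisect_right(PADDED, address) - 1

def longest_match(address: str) -> str:
    """Brute-force longest matching prefix, used only for comparison."""
    matches = [p for p in PREFIXES if address.startswith(p)]
    return max(matches, key=len) if matches else ""

if __name__ == "__main__":
    for addr in ["101011", "101110", "111110"]:
        print(addr, "ends at", PADDED[naive_search(addr)],
              "but should match", longest_match(addr) + "*")
```

All three searches end at index 2 (101010), while their longest matches are three different prefixes; the search result alone cannot tell them apart.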
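Multiway tree lookup (cont)
- Two problems in the previous example:
- The search can end far away from the matching prefix.
- Multiple addresses matching different prefixes end up in the same region.
- Solution: treat prefixes as ranges and put both ends of each range in the table (L = low end, padded with 0s; H = high end, padded with 1s):
- 100000 (L of 1*)
- 101000 (L of 101*)
- 101010 (L of 10101*)
- 101011 (H of 10101*)
- 101111 (H of 101*)
- 111111 (H of 1*)
- We now have explicit ranges, and each search maps to exactly one range.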
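A small sketch of how the endpoint table can be generated, assuming 6-bit addresses as above: each prefix contributes an L entry (padded with 0s) and an H entry (padded with 1s). The function names are hypothetical, and the tie-breaking rule in the sort key is an added assumption that keeps nested ranges properly ordered when two endpoints coincide.

```python
from typing import List, Tuple

PREFIXES = ["1", "101", "10101"]
WIDTH = 6

def prefix_range(prefix: str, width: int = WIDTH) -> Tuple[str, str]:
    """Low end (pad with 0s) and high end (pad with 1s) of the range a prefix covers."""
    return prefix.ljust(width, "0"), prefix.ljust(width, "1")

def endpoint_table(prefixes: List[str], width: int = WIDTH) -> List[Tuple[str, str, str]]:
    """Sorted list of (endpoint, kind, prefix), where kind is 'L' (low) or 'H' (high).

    On equal endpoints, L entries come before H entries, an outer L before an
    inner L, and an inner H before an outer H, so the ranges stay nested.
    """
    entries = []
    for p in prefixes:
        low, high = prefix_range(p, width)
        entries.append((low, "L", p))
        entries.append((high, "H", p))
    return sorted(entries, key=lambda e: (e[0], e[1] == "H",
                                          len(e[2]) if e[1] == "L" else -len(e[2])))

if __name__ == "__main__":
    for endpoint, kind, prefix in endpoint_table(PREFIXES):
        print(endpoint, kind, prefix + "*")
```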
Multiway tree lookup (cont)
- 100000 (L)
- 101000 (L)
- 101010 (L)
- 101011 (H)
- 101111 (H)
- 111111 (H)
- For 101011, we look for the first L, scanning left, that is not followed by its matching H. For the other addresses, a stack operation can find that first L.
- Problem: finding the L is a linear search.
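A sketch of the scan described above, assuming the same 6-bit example: after a binary search finds the last endpoint at or below the address, it scans left and uses a counter in place of an explicit stack, so every H passed must be cancelled by its own L before an L can be the answer. `match_by_scan` is a hypothetical name; the loop makes the linear cost visible.

```python
import bisect

PREFIXES = ["1", "101", "10101"]
WIDTH = 6

def endpoint_table(prefixes, width=WIDTH):
    """Sorted (endpoint, kind, prefix) entries; kind is 'L' (pad 0s) or 'H' (pad 1s)."""
    entries = [(p.ljust(width, "0"), "L", p) for p in prefixes]
    entries += [(p.ljust(width, "1"), "H", p) for p in prefixes]
    return sorted(entries, key=lambda e: (e[0], e[1] == "H",
                                          len(e[2]) if e[1] == "L" else -len(e[2])))

TABLE = endpoint_table(PREFIXES)
KEYS = [e[0] for e in TABLE]

def match_by_scan(address: str):
    """Return the longest matching prefix by scanning left for the first unmatched L."""
    pos = bisect.bisect_right(KEYS, address) - 1   # last endpoint <= address
    open_h = 0                                     # H entries still waiting for their L
    for i in range(pos, -1, -1):                   # linear in the table size
        endpoint, kind, prefix = TABLE[i]
        if kind == "H":
            if endpoint < address:                 # this range closed before the address
                open_h += 1
            # an H equal to the address still contains it, so it is skipped
        elif open_h == 0:
            return prefix                          # first L not followed by its H
        else:
            open_h -= 1                            # this L belongs to an H passed earlier
    return None                                    # no prefix covers the address

if __name__ == "__main__":
    for addr in ["101011", "101110", "111110"]:
        print(addr, "->", match_by_scan(addr))
```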
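Multiway tree lookup (cont)
- Solution: precompute the prefix corresponding to each range.
- For each endpoint, store the matching prefix for an address equal to the endpoint (=) and for an address greater than it but below the next endpoint (>), where P1 = 1*, P2 = 101*, and P3 = 10101*:

| Endpoint | > | = |
|---|---|---|
| P1) 100000 | P1 | P1 |
| P2) 101000 | P2 | P2 |
| P3) 101010 | P3 | P3 |
| 101011 | P2 | P3 |
| 101111 | P1 | P2 |
| 111111 | - | P1 |

- Each search now yields exactly one matching prefix.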
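A sketch of the precomputed table, assuming the same 6-bit example: the = and > columns are filled offline by brute force, after which a lookup is one binary search plus one table read. `build_table`, `brute_lmp`, and `lookup` are hypothetical names; the > value is taken from the first address just above each endpoint, which reproduces the table above.

```python
import bisect

# P1 = 1*, P2 = 101*, P3 = 10101*, on 6-bit addresses as in the example.
PREFIXES = {"1": "P1", "101": "P2", "10101": "P3"}
WIDTH = 6

def brute_lmp(address: str):
    """Name of the longest prefix matching `address` (offline precomputation only)."""
    matches = [p for p in PREFIXES if address.startswith(p)]
    return PREFIXES[max(matches, key=len)] if matches else None

def build_table():
    """Return (endpoints, eq, gt): eq[i] answers address == endpoints[i], and
    gt[i] answers endpoints[i] < address < endpoints[i + 1]."""
    points = sorted({p.ljust(WIDTH, "0") for p in PREFIXES} |
                    {p.ljust(WIDTH, "1") for p in PREFIXES})
    eq = [brute_lmp(pt) for pt in points]
    gt = []
    for pt in points:
        above = int(pt, 2) + 1  # first address strictly above this endpoint
        gt.append(brute_lmp(format(above, f"0{WIDTH}b")) if above < 2 ** WIDTH else None)
    return points, eq, gt

POINTS, EQ, GT = build_table()

def lookup(address: str):
    """One binary search, then read the = or > column."""
    i = bisect.bisect_right(POINTS, address) - 1
    if i < 0:
        return None  # below the first endpoint: no match
    return EQ[i] if POINTS[i] == address else GT[i]

if __name__ == "__main__":
    for row in zip(POINTS, GT, EQ):
        print(row)
```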
DMP-Tree
- Comparing prefixes
- Sorting prefixes
- Binary prefix tree
- M-way prefix tree
Trie structure
Sorting prefixes
- Question: Why can't well-known tree structures be applied to the longest prefix matching problem?
- Answer: There is no well-known method for sorting prefixes.
- Definition: Let $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$ be prefixes (strings) over an alphabet $\Sigma$, and let $\bot$ be a designated character of $\Sigma$.
- 1. If $n = m$, the numerical values of A and B are compared.
- 2. If $n \neq m$ (assume $n < m$), the substrings $a_1 \ldots a_n$ and $b_1 \ldots b_n$ are compared. If they differ, their order decides. If they are equal, the $(n+1)$th character of B is checked: $B < A$ if $b_{n+1}$ comes before $\bot$, and $B > A$ otherwise.
Sorting prefixes (cont)
- Example: Assume $\bot$ is M. Then BOAT is smaller than GOAT, and SAD is bigger than BALLOON. CAT is considered bigger than CATEGORY, since the fourth character in CATEGORY, E, comes before M.
- Sorting is a function that determines the position of each prefix.
- The prefixes of the table are sorted as:
- 00010, 0001, 001100, 01001100, 0100110, 01011001, 01011, 01, 10, 10110001, 1011001, 10110011, 1011010, 1011, 110
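A minimal Python comparator for the definition above, used with the standard `functools.cmp_to_key` to sort. The parameter `bottom` stands for the designated character; for binary prefixes this sketch assumes it is "1", which is the choice that reproduces the sorted list above (a longer string whose next bit is 0 sorts before its shorter prefix, one whose next bit is 1 sorts after it). The function names are hypothetical.

```python
from functools import cmp_to_key

def compare_prefixes(a: str, b: str, bottom: str = "1") -> int:
    """Return -1, 0, or 1 as a <, ==, > b under the prefix order defined above."""
    n = min(len(a), len(b))
    if a[:n] != b[:n]:
        # Leading parts differ: ordinary character (numerical) order decides.
        return -1 if a[:n] < b[:n] else 1
    if len(a) == len(b):
        return 0
    if len(a) < len(b):
        # a is a prefix of b: b < a iff b's next character comes before `bottom`.
        return 1 if b[n] < bottom else -1
    # b is a prefix of a: a < b iff a's next character comes before `bottom`.
    return -1 if a[n] < bottom else 1

def sort_prefixes(prefixes, bottom: str = "1"):
    """Sort strings with the prefix order, for any designated character."""
    return sorted(prefixes, key=cmp_to_key(lambda x, y: compare_prefixes(x, y, bottom)))

if __name__ == "__main__":
    print(sort_prefixes(["110", "1011", "01", "00010", "0001"]))
    # ['00010', '0001', '01', '1011', '110']
    print(compare_prefixes("CAT", "CATEGORY", bottom="M"))  # 1, CAT is bigger
```

Passing `bottom="M"` reproduces the word example; the same comparator works for bit strings and for text.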
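Binary prefix tree
- Unfortunately, a binary search tree built on this sorted list fails for the address 101100001000. Why? (Its longest matching prefix is 1011, but the address sorts between 10 and 10110001, far away from 1011.)
- Prefixes are ranges, not just data points in the search space.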
Binary prefix tree (cont)
- Definition: prefixes A and B are disjoint if neither of them is a prefix of the other.
- Definition: prefix A is called an enclosure if there is at least one element in the set of which A is a prefix.
- We modify the sorting procedure:
- Each enclosure has a bag in which its contained data elements are put.
- Sort the remaining elements.
- Distribute the bag elements to the right and left according to the sort definition.
- Apply the algorithm recursively.
Binary prefix tree (cont)
- Example: the prefixes in Table 1.
[Figure: first and second steps of building the binary prefix tree]
- Note: enclosures are at a higher level than the elements they contain (important!).
Binary prefix tree (cont)
Sorting prefixes (cont)
- Sorting algorithms
  - Based on bubble sort
  - Based on radix sort
- Sort(list):
  - tmp ← MinLength(list)
  - for all i in list except tmp do
    - compare i with tmp
    - if i matches tmp then
      - put i in tmp's bag
    - if i < tmp then
      - put i in leftList
    - if i > tmp then
      - put i in rightList
  - endfor
  - list ← Sort(leftList) + tmp + Sort(rightList)
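A Python sketch of the recursive procedure, assuming the binary prefix order with designated character "1" and leaving the bags implicit: the shortest prefix of each sublist becomes the root of that subtree, so an enclosure always sits above the prefixes it contains, and every prefix matching an address lies on the single search path. `Node`, `build`, and `lookup` are hypothetical names.

```python
from typing import List, Optional

def compare_prefixes(a: str, b: str, bottom: str = "1") -> int:
    """Prefix order from the sorting definition (binary, designated character '1')."""
    n = min(len(a), len(b))
    if a[:n] != b[:n]:
        return -1 if a[:n] < b[:n] else 1
    if len(a) == len(b):
        return 0
    if len(a) < len(b):
        return 1 if b[n] < bottom else -1
    return -1 if a[n] < bottom else 1

class Node:
    def __init__(self, prefix: str, left: "Optional[Node]", right: "Optional[Node]"):
        self.prefix, self.left, self.right = prefix, left, right

def build(prefixes: List[str]) -> Optional[Node]:
    """Sort(list) as a tree: tmp = MinLength(list) is the subtree root."""
    if not prefixes:
        return None
    tmp = min(prefixes, key=len)
    rest = [p for p in prefixes if p != tmp]
    left = [p for p in rest if compare_prefixes(p, tmp) < 0]
    right = [p for p in rest if compare_prefixes(p, tmp) > 0]
    return Node(tmp, build(left), build(right))

def lookup(root: Optional[Node], address: str) -> Optional[str]:
    """Walk one root-to-leaf path, remembering the last (longest) matching prefix."""
    best, node = None, root
    while node is not None:
        if address.startswith(node.prefix):
            best = node.prefix  # matches found deeper on the path are longer
        c = compare_prefixes(address, node.prefix)
        if c == 0:
            break
        node = node.left if c < 0 else node.right
    return best

TABLE = ["00010", "0001", "001100", "01001100", "0100110", "01011001", "01011",
         "01", "10", "10110001", "1011001", "10110011", "1011010", "1011", "110"]
ROOT = build(TABLE)

if __name__ == "__main__":
    print(lookup(ROOT, "101100001000"))
```

Unlike a plain binary search over the sorted list, this tree finds 1011 for 101100001000, because 10 and 1011 (the enclosures) are visited before the more specific prefixes below them.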
M-way prefix tree
- Problems with the binary prefix tree:
- Only two-way branching.
- The structure is not dynamic, and insertion may cause problems!
- Option 1: divide the sorted strings into m groups, giving a static m-way tree.
- Option 2: build a dynamic data structure like a B-tree.
- How do we guarantee that an enclosure stays at a higher level than its contained elements?
- Define node splitting and insertion accordingly.
M-way prefix tree (cont)
- Node splitting: finding the split point.
- Take the median if the data elements are disjoint.
- If there is an enclosure containing other elements, take it as the split point.
- Otherwise, take the element that gives the best splitting result.
- Note: this does not guarantee that the final tree will be balanced.
M-way prefix tree (cont)
- Insertion
- If the new element is not an enclosure of others, find its place and insert it in the corresponding leaf, as in a B-tree.
- Otherwise, replace the closest element with the new element and reinsert the replaced elements.
- Re-sort the resulting subtree (space division) if necessary.
- Building the tree is similar to building a B-tree.
M-way prefix tree (cont)
M-way prefix tree (cont)
- We insert the prefixes in random order.
- The tree uses a branching factor of 5 (at most 4 prefixes in each node).
- Insert 01011, 1011010, 10110001, and 0100110. Then, adding 110 causes an overflow. Split the node:
- Root: [10110001]
- Children: (0100110, 01011) and (1011010, 110)
- (All elements are disjoint.)
M-way prefix tree (cont)
- Insert 10110011, 1101110010, and 00010. Adding 1011001 causes an overflow. Split the node:
- Root: [10110001, 1011010]
- Children: (00010, 0100110, 01011), (1011001, 10110011), and (110, 1101110010)
- (Case 3 of splitting)
- Later, adding 1011 causes a problem: it is an enclosure (of 10110001, 1011001, 10110011, and 1011010). This is the case of adding an enclosure, and we will have space division.
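The splitting rules leave "best splitting result" open, so the following is only one hypothetical reading, sketched in Python: try candidates from the median outwards, reject a candidate that another element of the node encloses (that enclosure would end up below it), and reject a split that separates an enclosure from an element it contains. It reproduces both overflow splits in the example above, and when one element encloses all the others it picks that enclosure. `choose_split` and `encloses` are hypothetical names.

```python
from functools import cmp_to_key
from typing import List, Tuple

def compare_prefixes(a: str, b: str, bottom: str = "1") -> int:
    """Prefix order from the sorting definition (binary, designated character '1')."""
    n = min(len(a), len(b))
    if a[:n] != b[:n]:
        return -1 if a[:n] < b[:n] else 1
    if len(a) == len(b):
        return 0
    if len(a) < len(b):
        return 1 if b[n] < bottom else -1
    return -1 if a[n] < bottom else 1

def encloses(a: str, b: str) -> bool:
    """True if a is a proper prefix of b (a is an enclosure of b)."""
    return len(a) < len(b) and b.startswith(a)

def choose_split(node: List[str]) -> Tuple[str, List[str], List[str]]:
    """Return (split element, left part, right part) for an overflowing node."""
    elems = sorted(node, key=cmp_to_key(compare_prefixes))
    mid = len(elems) // 2
    for i in sorted(range(len(elems)), key=lambda k: (abs(k - mid), k)):
        s, left, right = elems[i], elems[:i], elems[i + 1:]
        if any(encloses(e, s) for e in elems):
            continue  # s moves up, so nothing that encloses it may stay below
        others = left + right
        if all((e in left) == (f in left)
               for e in others for f in others if encloses(e, f)):
            return s, left, right
    # The shortest element always passes both checks, so this is unreachable.
    raise AssertionError("no valid split point")

if __name__ == "__main__":
    print(choose_split(["01011", "1011010", "10110001", "0100110", "110"]))
    print(choose_split(["1011001", "10110011", "1011010", "110", "1101110010"]))
```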
M-way prefix tree (cont)
- This tree generalizes the B-tree; that is, a B-tree is a special case of this tree. Hence, when the data elements are mostly disjoint, the height of the tree is $\log_M N$ (M: branching factor; N: number of data elements).
DMP-Tree
[Figure: maximum tree height vs. number of data prefixes]
- BF is the branching factor in the internal nodes.
- The number of data prefixes is in 1000s.
DMP-Tree
[Figure: results vs. number of data prefixes]
- The number of prefixes is shown on the right.
DMP-Tree
- Height of the tree for 100K data prefixes.
[Figure: tree height vs. branching factor]
DMP-Tree
- Analysis of the results
- As the branching factor (BF) increases, the height decreases.
- The results are for the worst case (max. height); the average case is much lower.
- Beyond BF = 9, increasing the branching factor does not decrease the max. height.
- The results are for sets of 50,000-100,000 prefixes with lengths from 8 to 31. The number of prefixes actually in use is around 50,000, with lengths of 8-31.
DMP-Tree
- Memory utilization
- Memory utilization is 0.64-0.67 without considering the tree branching overhead.
- Memory utilization is 0.53-0.62 with the tree branching overhead (pointers).
- Without considering branching pointers, memory utilization decreases as the branching factor increases.
- Total memory utilization increases as the branching factor increases.
DMP-Tree
- Therefore:
- The longest matching prefix of a network address can be determined in 5 steps with a branching factor of 9 or more.
- In the worst case, we need at most twice the total prefix data size in memory to implement the scheme. For instance, for 50,000 prefixes of 32 bits, we need at most 3.2 Mbit of memory.
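The arithmetic behind these two bullets, written out. The second line is only a consistency check, assuming mostly disjoint prefixes so that the height is about $\log_M N$ with $M = 9$ and $N = 50{,}000$:

$$
50{,}000 \times 32~\text{bit} = 1.6~\text{Mbit}, \qquad 2 \times 1.6~\text{Mbit} = 3.2~\text{Mbit}
$$

$$
\lceil \log_{9} 50{,}000 \rceil = \lceil 4.92 \rceil = 5
$$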
Overall Design
- All operations first need a search in the tree structure.
- There are two search procedures: one for the longest matching prefix and another for updates.
- The prefix tree data structure is on the chip.
- The policy table is in off-chip memory.
- There is a port-to-data-link-layer mapping module.
Tree Nodes
[Figure: internal node layout: Addr 1 | Prefix 1 (33 bits) | Port | Addr 2 | Prefix 2 (33 bits) | Port | ... | Addr N+1]
- Internal nodes
- Each prefix has a left and a right pointer, pointing to its left and right subtrees, respectively.
- We can have N prefixes in each internal node, so N+1 is the branching factor.
- The bigger N, the faster the search, but the more logic is needed.
- Port is the address of the port in the switch to which the packet will be sent.
Tree Nodes
- Leaf nodes
- There are no left and right subtree pointers.
- The number of prefixes in a leaf node is M.
- The leaf nodes are stored in off-chip memory to make the scheme scalable to a large number of prefixes.
[Figure: leaf node layout: Prefix 1 (33 bits) | Port | Prefix 2 (33 bits) | Port | ...]
Branching Factor
- What is the best number for N (the branching factor)?
- The bigger N, the faster the search process (Fact 1).
- The bigger N, the more memory pins and, usually, the more memory bandwidth are needed (Fact 2).
- The bigger N, the more logic we need to process a node (Fact 3).
- Simulation results show:
- The bigger N, the better the memory utilization.
- For N ≥ 8, the max. height of the tree does not decrease considerably.
Simulation result
- Total memory, assuming one memory block and OC-192.

| # of prefixes | Required mem. (Mbit) | Branching factor | Mem. pins | Mem. BW (Gb/s) | Max. mem. accesses | On-chip mem. size (mm2) | Max. height |
|---|---|---|---|---|---|---|---|
| 64K | 5.4 | 15 | 897 | 89.7 | 4 | 81 | 5 |
| 64K | 5.5 | 11 | 655 | 65.5 | 4 | 82 | 5 |
| 64K | 5.4 | 9 | 527 | 52.7 | 4 | 81 | 5 |
| 64K | 6 | 6 | 335 | 46.9 | 6 | 96 | 7 |
| 64K | 6.6 | 5 | 275 | 44 | 7 | 110 | 8 |
| 64K | 6.5 | 4 | 207 | 62.1 | 14 | 112 | 15 |
| 100K | 8.3 | 15 | 897 | 89.7 | 4 | 122 | 5 |
| 100K | 8.5 | 11 | 655 | 65.5 | 4 | 125 | 5 |
| 100K | 8.3 | 9 | 527 | 63.24 | 5 | 122 | 6 |
| 100K | 9 | 6 | 335 | 53.6 | 7 | 135 | 8 |
| 100K | 9.1 | 5 | 275 | 49.5 | 8 | 140 | 9 |
| 100K | 9.5 | 4 | 207 | 62.1 | 14 | 150 | 15 |
| 30K | 2.6 | 9 | 527 | 52.7 | 4 (expected) | 40 | 5 (expected) |
Branching Factor
- Any number between 8 and 16 seems reasonable, but N = 9 gives a better trade-off between search time and memory size.
- Assuming a branching factor of 9 in the internal nodes, 50% node utilization, and 128K prefixes, we need at most 128K/4.5 ≈ 28.5K node addresses. Then, 15-bit addresses for the left and right pointers are more than enough, but we need more bits for off-chip addressing.
- The number of switch ports is usually limited, around 64. We can assume 256, so 8 bits are enough to address them.
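The same sizing, checked in Python. This sketch assumes "K" means 1024 and that 50% utilization of a 9-way node means 4.5 prefixes per node, as in the bullet above; `bits_needed` is a hypothetical helper.

```python
import math

K = 1024  # assumption: "K" means 1024 here

def bits_needed(count: int) -> int:
    """Smallest number of bits that can address `count` distinct items."""
    return max(1, (count - 1).bit_length())

prefixes = 128 * K
prefixes_per_node = 9 * 0.5                      # branching factor 9 at 50% utilization
nodes = math.ceil(prefixes / prefixes_per_node)  # about 28.5K node addresses

pointer_bits = bits_needed(nodes)  # width of on-chip left/right pointers
port_bits = bits_needed(256)       # up to 256 switch ports

if __name__ == "__main__":
    print(nodes, pointer_bits, port_bits)
```

With these numbers, 15 bits (32K addresses) cover the roughly 29K nodes, and 8 bits cover 256 ports.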
Branching Factor
- To make the internal node branching and the leaf node branching even, M = 10.
- If we want to read a node at once, we need 41 × 10 = 410 pins (a 33-bit prefix plus an 8-bit port per entry), which is difficult to support in one chip.
- We can divide a node in two and read/write it in two clock cycles. This reduces the memory pins to 205, which is affordable.
Memory requirement
- Prefix tree, assuming 128K prefixes:
- N = 9 (BF) and M = 10 (BF in leaves). The majority of prefixes, 80%, will be in leaves. Assume 65% node utilization.
- Avg. # of prefixes in a leaf node = 10 × 0.65 = 6.5
- # of leaf nodes ≈ 128K × 80% × 2 / 6.5 ≈ 31.5K, plus 10% overhead ≈ 35K
- Total off-chip memory = 35K × 205 (mem. BW) ≈ 7.2 Mbits
- Then, we need 16 bits for addressing, plus 1 bit for internal/external.
- # of internal nodes = 128K × 20% / 5.8 ≈ 4.41K, plus 10% overhead ≈ 4.9K
- Total on-chip memory = 4.9K × 529 ≈ 2.6 Mbits
- Port-to-link-address mapping table:
- For each port, the corresponding link address (48-bit address)
- Max. 256 ports, on chip, plus some memory for indexing
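The memory estimate, reproduced step by step in Python. Assumptions, all labeled in the code: "K" is taken as 1000 (the arithmetic above works out that way), the factor 2 in the leaf count is read as two 205-bit words per leaf node, 5.8 prefixes per internal node and 529 bits per internal node are taken as given, and the rounding to 35K and 4.9K mirrors the bullets.

```python
K = 1000  # assumption: the arithmetic above works out with K = 1000

PREFIXES = 128 * K
LEAF_SHARE, INTERNAL_SHARE = 0.80, 0.20
LEAF_BF = 10                  # M: prefixes per leaf node
UTILIZATION = 0.65
AVG_INTERNAL = 5.8            # avg. prefixes per internal node, as used above (about 9 x 0.65)
OVERHEAD = 1.10               # 10% overhead
ENTRY_BITS = 33 + 8           # 33-bit encoded prefix + 8-bit port per entry (assumption)
WORD_BITS = LEAF_BF * ENTRY_BITS // 2   # a leaf node read in two halves: 205 bits
INTERNAL_NODE_BITS = 529      # internal node width, taken as given

avg_leaf = LEAF_BF * UTILIZATION                      # 6.5 prefixes per leaf
leaf_words = PREFIXES * LEAF_SHARE * 2 / avg_leaf     # "x2": assumed two 205-bit words per leaf
leaf_words_total = round(leaf_words * OVERHEAD, -3)   # about 35K after 10% overhead
off_chip_bits = leaf_words_total * WORD_BITS          # about 7.2 Mbit

internal_nodes = PREFIXES * INTERNAL_SHARE / AVG_INTERNAL  # about 4.41K
internal_total = round(internal_nodes * OVERHEAD, -2)      # about 4.9K
on_chip_bits = internal_total * INTERNAL_NODE_BITS         # about 2.6 Mbit

off_chip_addr_bits = (int(leaf_words_total) - 1).bit_length()  # 16 bits

if __name__ == "__main__":
    print(f"off-chip: {off_chip_bits / 1e6:.1f} Mbit, on-chip: {on_chip_bits / 1e6:.1f} Mbit")
```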
Memory requirement

| # of prefixes | On-chip mem. (Mbit) | Branching factor | On-chip mem. pins | Off-chip mem. (Mbit) | Mem. accesses (search) | On-chip mem. size (mm2) | Off-chip mem. pins |
|---|---|---|---|---|---|---|---|
| 128K | 2.6 | 10 | 529 | 7.2 | 5 | 40 | 250 |

- Notes:
- Branching factor is the number of branches in internal nodes.
- The size of the memory scales with the size of the data, i.e., the number of prefixes.
- Power dissipation depends on the read/write frequency, current, and core voltage.
- Considering Faraday memory modules, a 10K × 32-bit single-port memory is 36 × 1.45 mm2.
Overall Design
[Figure: block diagram. Search block (to/from NP, root address, root content); Update block (insertion, update, delete); CPU interface (to/from CPU); Mem. Ctrl and Memory; Output mem. Ctrl (to/from output memory)]
Search Path
[Figure: search path block diagram. Blocks: Input, Piping, Node (GetLen, Compare, Next), Mem Ctrl (root node; to/from off-chip memory), Dispatch, and Caching, with DataOut (32) going to the scheduler. Signals include Addr (32), SOA (1), Prty (1), InClk (1), RdAdd (19), Node Data (32), LenNx6, CResult (1), Match (1), First (1), Found (1), MemAddr (14), PackAdd (29), OutMemAddr, IpAddr (32), LinkAddr (48), and Port (8).]
There are data assertion signals between blocks that are not shown everywhere because of space limitations.
Search Path
- Input Module
- Gets the packet destination addresses from the parser.
- Does parity checking.
- It has the following input signals:
- Input data, 32 bits
- Start of Address (SOA), 1 bit
- Parity (Prty), 1 bit
- Input clock (InClk)
- It gets data in two clock cycles: first the IP address, and then the packet address in memory or the packet id (cid).
Search Path
- Input Module
- 29 bits are used for the packet address and the last 3 bits for the policy. Hence, 512 Mbytes can be supported for storing packets before sending them out.
- The 2nd clock cycle data format: [bits 31-3: Packet address | bits 2-0: Policy]
- The timing:
[Figure: timing diagram of InClk, SOA, and Data; Data carries IpAddr in the first cycle and PackAddr or cid in the second]
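A small sketch of the second-cycle word format, packing a 29-bit packet address into bits 31-3 and a 3-bit policy into bits 2-0. The function names are hypothetical, and the 512-Mbyte figure assumes the packet memory is byte-addressed.

```python
from typing import Tuple

PACKET_ADDR_BITS = 29   # bits 31..3
POLICY_BITS = 3         # bits 2..0

def pack_second_word(packet_addr: int, policy: int) -> int:
    """Build the 32-bit word sent in the second input clock cycle."""
    if not 0 <= packet_addr < (1 << PACKET_ADDR_BITS):
        raise ValueError("packet address must fit in 29 bits")
    if not 0 <= policy < (1 << POLICY_BITS):
        raise ValueError("policy must fit in 3 bits")
    return (packet_addr << POLICY_BITS) | policy

def unpack_second_word(word: int) -> Tuple[int, int]:
    """Split the 32-bit word back into (packet address, policy)."""
    return word >> POLICY_BITS, word & ((1 << POLICY_BITS) - 1)

if __name__ == "__main__":
    word = pack_second_word(123456, 5)
    print(hex(word), unpack_second_word(word))
    print((1 << PACKET_ADDR_BITS) // 2**20, "MiB addressable")  # 512
```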
Search Path
- Piping Module
- Pipelines the search process.
- For new elements from the input block:
- For each new IP address do
- If found in the hash table
- send the packet memory address to the dispatcher
- Else
- enter the IP address and the policy into the pipe FIFO
- End do
Search Path
- Piping Module
- For elements in the FIFO:
- For the first IP address in the FIFO do
- If the IP address is new, then
- assert the First signal and send the IP address and policy out.
- Else if the next address is on chip, send the next node address to Mem. Ctrl.
- Else send the next node address to OffMemCtrl.
- Send the IP address and policy to the pipe.
- For the recirculated address:
- If the node was a leaf, then
- Send the longest matching address to OutMemCtrl.
- Send the policy to Extract port and the packet address to Dispatch.
- Else
- Put the IP address into the FIFO.
- Replace the longest matching prefix address if a new one is found.
Search Path
- Piping Module
- FIFO: keeps the current information of each IP address being looked up.
- LMPA: Longest Matching Prefix Address
- New: 1 = new, 0 = old
- If the packet is new, the next address will be zero, and we can read the cached root content instead of reading from memory.
- The address is off chip if the first (most significant) bit is 1; otherwise, it is on chip.
[Figure: FIFO entry: IP Addr (32) | Port (8) | Next Node (19) | LMPA | New (1)]
Search Path
- GetLen Module: this module gets the length of prefixes. We append a 1 to the end of a prefix and then pad it with 0s to make it 33 bits.
- Ex. 11011010 → 110110101 followed by 24 0s (33 bits).
- Then, we scan from the right: the first 1 we meet is the appended marker, and the bits to its left are the prefix, so the marker's position gives the prefix length.
- GetLen can be implemented as a multiplexer with a case statement (32 cases), and it can be done in one clock cycle.
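A Python model of the 33-bit encoding and of what GetLen computes, for checking the idea in software rather than describing the hardware multiplexer. The lowest set bit is the appended marker, so the length is the number of bits to its left. `encode_prefix`, `get_len`, and `prefix_bits` are hypothetical names.

```python
WIDTH = 33  # 32 prefix bits plus the appended marker bit

def encode_prefix(prefix: str) -> int:
    """Append a 1 marker to a bit-string prefix, then pad with 0s to 33 bits."""
    if len(prefix) > 32:
        raise ValueError("prefix longer than 32 bits")
    return int((prefix + "1").ljust(WIDTH, "0"), 2)

def get_len(word: int) -> int:
    """Prefix length = number of bits to the left of the lowest set bit (the marker)."""
    if word == 0:
        raise ValueError("no marker bit")
    trailing_zeros = (word & -word).bit_length() - 1
    return WIDTH - 1 - trailing_zeros

def prefix_bits(word: int) -> str:
    """Recover the original prefix bits from the encoded word."""
    return format(word, f"0{WIDTH}b")[:get_len(word)]

if __name__ == "__main__":
    w = encode_prefix("11011010")
    print(format(w, "033b"), get_len(w), prefix_bits(w))
```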
Search Path
- Compare Module: compares two prefixes A and B with lengths $L_1$ and $L_2$.
- Assume $L_1 > L_2$, and let $A_{1..L_2}$ be the first $L_2$ bits of A. Then:
- If $A_{1..L_2} = B$, A and B match. In that case, if $A_{L_2+1} = 0$, then $A < B$; otherwise, $A > B$.
- If $A_{1..L_2} > B$, then $A > B$; otherwise, $A < B$.
- One of the prefixes here is the IP address, with length 32.
- We assume there are no two identical elements in the tree.
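A software model of the Compare module rule, with prefixes as bit strings for readability (the hardware works on the 33-bit encoded form). It swaps the operands so that A is always the longer one, exactly as the rule assumes $L_1 > L_2$. The function name `compare` and its return convention are assumptions.

```python
from typing import Tuple

def compare(a: str, b: str) -> Tuple[bool, int]:
    """Return (match, order) for bit-string prefixes a and b.

    match is True when the shorter string is a prefix of the longer one;
    order is -1 if a < b, +1 if a > b, and 0 only for identical strings
    (which the tree is assumed not to contain).
    """
    if len(a) < len(b):
        match, order = compare(b, a)  # make A the longer string
        return match, -order
    head = a[:len(b)]                 # A_{1..L2}
    if head == b:
        if len(a) == len(b):
            return True, 0
        return True, (-1 if a[len(b)] == "0" else 1)  # decided by A_{L2+1}
    return False, (1 if head > b else -1)

if __name__ == "__main__":
    ip = "101100001000" + "0" * 20    # a 32-bit IP address
    for p in ["10", "1011", "110"]:
        print(p, compare(ip, p))
```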
Search Path
- Next Module: gets the next node address to read, as well as the matching prefix and its corresponding port number.
- It gets two signals for each prefix, Match and ComResult (compare):
- Match = 1 → the prefix matches.
- ComResult = 1 → the prefix is bigger.
- It takes the left address of the first prefix, from the left, whose ComResult signal is 1.
- It compares the lengths of the matching prefixes and takes the one with the largest length.
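A sketch of one Next-module step on an internal node, assuming the node layout Addr 1 | Prefix 1 | Port | ... | Addr N+1, where Addr i is the left pointer of prefix i. The Match and ComResult signals are computed for every prefix, as in hardware, and when no prefix has ComResult = 1 the sketch follows the last pointer; that fallback is an assumption. `next_step` and `prefix_bigger` are hypothetical names.

```python
from typing import List, Optional, Tuple

def prefix_bigger(prefix: str, ip: str) -> bool:
    """ComResult: True when the prefix sorts after the IP address (Compare-module rule)."""
    n = len(prefix)
    if ip[:n] == prefix:
        return n < len(ip) and ip[n] == "0"  # matching prefix is bigger iff the next IP bit is 0
    return prefix > ip[:n]

def next_step(prefixes: List[str], ports: List[int], addrs: List[str],
              ip: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (next node address, longest matching prefix in this node, its port)."""
    match = [ip.startswith(p) for p in prefixes]       # Match signals
    bigger = [prefix_bigger(p, ip) for p in prefixes]  # ComResult signals
    # Left pointer of the first prefix with ComResult = 1, else the last pointer.
    first = next((i for i, b in enumerate(bigger) if b), len(prefixes))
    best = None
    for i, m in enumerate(match):
        if m and (best is None or len(prefixes[i]) > len(prefixes[best])):
            best = i
    if best is None:
        return addrs[first], None, None
    return addrs[first], prefixes[best], ports[best]

if __name__ == "__main__":
    node = ["01", "10", "1011", "110"]  # already in prefix order
    ip = "101100001000" + "0" * 20
    print(next_step(node, [1, 2, 3, 4], ["A1", "A2", "A3", "A4", "A5"], ip))
```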
Search Path
- Dispatch Module: forms the Routing Group Address (RGA) from the port number and sends it with the packet stored memory address (PSMA) or CID.
- The RGA is a 64-bit bitmap. The bit corresponding to the port number is set to 1.
- The PSMA is dispatched first, followed by the port and the DLL address.
- Caching Module: keeps a cache of IP addresses and their corresponding ports.
[Figure: cache entry: IP address (32) | Port (8)]
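A sketch of the Dispatch output: the RGA is a one-hot 64-bit bitmap, and the words leave in the order PSMA, then RGA, then the link-layer address. The dictionary `link_addr` is a hypothetical stand-in for the port-to-link-address mapping table, and the function names are assumptions.

```python
from typing import Dict, List

RGA_BITS = 64

def make_rga(port: int) -> int:
    """Routing Group Address: a 64-bit bitmap with only the port's bit set."""
    if not 0 <= port < RGA_BITS:
        raise ValueError("port must be in 0..63 for a 64-bit RGA")
    return 1 << port

def dispatch(psma: int, port: int, link_addr: Dict[int, int]) -> List[int]:
    """Words in sending order: PSMA first, then the RGA and the DLL (link) address."""
    return [psma, make_rga(port), link_addr[port]]

if __name__ == "__main__":
    print(format(make_rga(5), "064b"))
    print(dispatch(0x1234, 2, {2: 0x0A0B0C0D0E0F}))
```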
Search Path
- Caching Module
- The cache is kept as a FIFO, and its depth depends on the technology.
- Check the IP address in the FIFO.
- If the address is found, then
- assert the Found signal.
- write the IP address on top of the FIFO if it is not there already.
- Else
- write the IP address on top of the FIFO.
- The caching system always removes the least recently referenced IP address from the cache.
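A software model of this cache, assuming that a hit moves the entry to the top, that the bottom (least recently referenced) entry is evicted when the FIFO is full, and that the port is written once the tree lookup for a missed address completes. It uses the standard `collections.OrderedDict`; `IpCache`, `check`, and `fill` are hypothetical names.

```python
from collections import OrderedDict
from typing import Optional

class IpCache:
    """IP address -> port cache; the last entry of the OrderedDict is the FIFO top."""

    def __init__(self, depth: int):
        self.depth = depth  # depends on the technology
        self.entries: "OrderedDict[int, int]" = OrderedDict()

    def check(self, ip: int) -> Optional[int]:
        """Return the cached port (the Found case) and move the entry to the top."""
        if ip in self.entries:
            self.entries.move_to_end(ip)
            return self.entries[ip]
        return None

    def fill(self, ip: int, port: int) -> None:
        """Write an address on top; evict the least recently referenced one if full."""
        self.entries[ip] = port
        self.entries.move_to_end(ip)
        if len(self.entries) > self.depth:
            self.entries.popitem(last=False)

if __name__ == "__main__":
    cache = IpCache(depth=2)
    cache.fill(0x0A000001, 3)
    print(cache.check(0x0A000001))  # 3
```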
Search Process
- Operations (for large prefix tables, 50K prefixes and up)
[Figure: timeline of one IP address lookup: caching, piping stages for the root and the on-chip memory nodes, off-chip memory access for the leaf node, and dispatch]
- This operation is for one IP address lookup.
- Piping is the bottleneck in the system and takes 5 cycles on average.
- Assuming 100 MHz operation:
- # of packets = 10^9/50 = 20 million per second
- Line speed = 512 × 20M = 10.24 Gb/s for 64-byte packets.
- Line speed = 256 × 8 × 20M ≈ 41 Gb/s for 256-byte packets.
- Higher speeds can be supported by duplicating the pipe.
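The throughput arithmetic above, as a short Python check. It assumes one lookup per packet, with the piping stage at 5 cycles per lookup as the bottleneck; the `pipes` parameter is a hypothetical way to model duplicated pipes that scale linearly.

```python
CLOCK_HZ = 100e6          # 100 MHz
CYCLES_PER_LOOKUP = 5     # average cost of the piping stage, the bottleneck

lookups_per_second = CLOCK_HZ / CYCLES_PER_LOOKUP  # 20 million

def line_rate_bps(packet_bytes: int, pipes: int = 1) -> float:
    """Sustainable line rate if every packet needs one lookup."""
    return packet_bytes * 8 * lookups_per_second * pipes

if __name__ == "__main__":
    for size in (64, 256):
        print(size, "bytes:", line_rate_bps(size) / 1e9, "Gb/s")
```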
Output pins

| Pin name | Type | Number | Comment |
|---|---|---|---|
| DataIn (IP Addr) | Input | 32 | IP address, from parser |
| DataOut (Port, cid) | Output | 32 | Port and cid, to scheduler |
| DataBus (CPU) | In/out | 32 | CPU data bus |
| CtrlBus (CPU) | In/out | 12 | CPU control bus |
| MemData2 (Tree) | In/out | 205 | If off chip is used |
| MemAddr2 (Tree) | In/out | 18 | If off chip is used |
| MemCtrl1 (Tree) | In/out | 8? | If off chip is used |
| Total | | 340 | This value may vary by about 10% |