Title: Design and Implementation of a Multi-Dimensional Packet Classifier for Network Processors
VIII Workshop PisaTel - December 6th, 2005 - SSSUP
DESIGN AND IMPLEMENTATION OF A MULTI-DIMENSIONAL
PACKET CLASSIFIER FOR NETWORK PROCESSORS
Ing. Fabio Vitucci
Gruppo RETI di TELECOMUNICAZIONI, Dipartimento di
Ingegneria dell'Informazione - Università di Pisa
Outline
- Summary of previous activities
- Implementation of the classification module
- Programming problems
- Measurements
- Future work
- Conclusions
Summary of previous activities/1
- Detailed analysis of the Intel IXP2400 Network Processor and the available board (RadiSys ENP-2611)
- Choice of a suitable application to implement on NPs: packet classification
- Comparative analysis of several research algorithms
Source Address   Layer 4 Destination   Layer 4 Protocol   ...   Rule
11.14.2.21       www                   TCP                ...   R1
13.11.23.        gt 1023               TCP                ...   R2
112...           www                   UDP                ...   R3
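As a point of reference, the simplest way to classify a packet against a rule table of this kind is a linear scan that returns the first matching rule. The sketch below is illustrative only: the rule values, the `Rule` fields and the `classify` helper are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class Rule:
    name: str
    src: str        # source prefix in CIDR notation (hypothetical values)
    dport: tuple    # inclusive (low, high) destination-port range
    proto: str      # layer 4 protocol name


# Hypothetical rule table, ordered by priority (first match wins).
RULES = [
    Rule("R1", "10.1.2.21/32", (80, 80), "TCP"),
    Rule("R2", "10.1.23.0/24", (1024, 65535), "TCP"),
    Rule("R3", "10.0.0.0/8", (80, 80), "UDP"),
]


def classify(src, dport, proto, rules=RULES):
    """Return the name of the first matching rule, or None (O(N) scan)."""
    addr = ip_address(src)
    for rule in rules:
        low, high = rule.dport
        if (addr in ip_network(rule.src)
                and low <= dport <= high
                and proto == rule.proto):
            return rule.name
    return None
```

Every lookup touches up to N rules, which is why linear search serves as the baseline in algorithm comparisons.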
Summary of previous activities/2
- Comparative analysis of several research algorithms
Algorithm             Worst-case time   Worst-case storage
Linear Search         O(N)              O(N)
Hierarchical tries    O(W^D)            O(NDW)
Set-pruning tries     O(DW)             O(N^D)
Grid-of-tries         O(W^{D-1})        O(NDW)
Cross-producting      O(DW)             O(N^D)
Area-Based Quadtree   O(NW)             O(W)
FIS-tree              O((L+1)W)         O(L N^{1+1/L})
RFC                   O(D)              O(N^D)
Bitmap-intersection   O(DW + N/W)       O(DN^2)
HiCuts                O(D)              O(N^D)
Ternary CAMs          O(1)              O(N)
N: number of entries
W: maximum number of bits per level
D: number of fields to be processed
L: number of levels of the data structure
Summary of previous activities/3
- Multidimensional Multibit Trie
- Fields
  - IP Source Address and IP Destination Address
  - Layer 4 Source Port and Destination Port
  - Layer 4 Protocol Type
- Hierarchical trie: one tree per dimension
  - Several levels per dimension
  - A fixed number of bits (K) per level
- Performance parameters
  - Search time: 5 · O(W/K)
  - Memory accesses: 12
  - Storage complexity: 5 · O(2^{K-1} N W / K)
[Figure: SA Trie, DA Trie, SP Trie, DP Trie, PR Trie]
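To make the W/K search bound concrete, here is a minimal sketch of a fixed-stride multibit trie for a single dimension, with longest-prefix lookup. It is not the classifier from the slides: `STRIDE`, `WIDTH`, `Node`, `insert` and `lookup` are hypothetical names and values, and the multidimensional structure would chain one such trie per field.

```python
STRIDE = 4    # K: bits consumed per level (hypothetical value)
WIDTH = 16    # W: field width in bits (hypothetical value)
MASK = (1 << STRIDE) - 1


class Node:
    def __init__(self):
        self.children = [None] * (1 << STRIDE)  # next-level nodes
        self.labels = [None] * (1 << STRIDE)    # (prefix_len, label) per slot


def insert(root, value, length, label):
    """Insert the first `length` bits of `value`, using prefix expansion."""
    node, depth = root, 0
    # Walk down whole levels while more than one stride of prefix remains.
    while length - depth > STRIDE:
        slot = (value >> (WIDTH - depth - STRIDE)) & MASK
        if node.children[slot] is None:
            node.children[slot] = Node()
        node = node.children[slot]
        depth += STRIDE
    # Last level: `rem` known bits, so fill 2^(STRIDE - rem) slots.
    rem = length - depth
    base = (value >> (WIDTH - depth - STRIDE)) & MASK
    base &= ~((1 << (STRIDE - rem)) - 1)        # clear the don't-care bits
    for slot in range(base, base + (1 << (STRIDE - rem))):
        old = node.labels[slot]
        if old is None or old[0] <= length:     # longer prefixes win
            node.labels[slot] = (length, label)


def lookup(root, value):
    """Longest-prefix match; visits at most WIDTH / STRIDE nodes."""
    node, depth, best = root, 0, None
    while node is not None:
        slot = (value >> (WIDTH - depth - STRIDE)) & MASK
        if node.labels[slot] is not None:
            best = node.labels[slot][1]
        node = node.children[slot]
        depth += STRIDE
    return best
```

With these values a lookup reads at most 16 / 4 = 4 nodes, one per level, independently of how many prefixes are stored.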
Summary of previous activities/4
- Main limitation
  - Memory consumption
  - Rules with unspecified fields (e.g. 131.114..) must be expanded into all possible rules
- Modifications
  - A level transition in case of wildcards
  - Fewer nodes
  - Sometimes more memory accesses
  - More complexity
- Validation tests with a C simulator
  - Large saving in memory consumption (table in SRAM)
  - Small increase in instruction store size
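The memory problem can be quantified with a short calculation. The sketch below assumes that, without a wildcard transition, every unspecified bit of a field has to be enumerated; the helper names are hypothetical and the exact expansion used in the original classifier is not stated in the slides.

```python
def full_expansion_count(width, prefix_len):
    """Number of fully specified values covered by one prefix."""
    return 1 << (width - prefix_len)


def rule_expansion_count(fields):
    """Entries needed for one rule when every field is fully expanded.

    `fields` is a list of (width, prefix_len) pairs, one per dimension.
    """
    total = 1
    for width, prefix_len in fields:
        total *= full_expansion_count(width, prefix_len)
    return total
```

For example, a 32-bit address with only its first 16 bits specified covers 65536 values, and the count multiplies across dimensions, which is why a single wildcard entry per level saves so much memory.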
Implementation of module/1
[Figure: Intel IPv4 Forwarder]
Implementation of module/2
- Functions of the XScale (implemented in C)
  - Receiving classification rules
  - Building the multidimensional trie from the received rules to calculate the number of nodes per level and the SRAM addresses
  - Rebuilding the multidimensional trie to write the data to SRAM at the precalculated addresses
- Functions of the Microengines
  - Receiving packets
  - Retrieving the relevant fields from packet headers
  - Finding matching rules using the data structure in SRAM
  - Modifying the TOS field
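The two-pass build on the XScale side (count nodes per level first, then place them at precalculated addresses) can be sketched as follows. This is a simplified model, not the original C code: the trie is a nested dict, and `NODE_SIZE`, `SRAM_BASE` and all function names are hypothetical.

```python
NODE_SIZE = 4        # bytes per node entry (hypothetical)
SRAM_BASE = 0x1000   # start address of the table (hypothetical)


def count_nodes_per_level(trie, level=0, counts=None):
    """Pass 1: count the nodes found at each depth of a nested-dict trie."""
    if counts is None:
        counts = []
    if len(counts) <= level:
        counts.append(0)
    counts[level] += 1
    for child in trie.values():
        count_nodes_per_level(child, level + 1, counts)
    return counts


def level_base_addresses(counts):
    """Lay the levels out back to back; return each level's base address."""
    bases, addr = [], SRAM_BASE
    for n in counts:
        bases.append(addr)
        addr += n * NODE_SIZE
    return bases


def assign_addresses(trie, bases, level=0, used=None, out=None, path=()):
    """Pass 2: give every node the next free slot of its level."""
    if used is None:
        used = [0] * len(bases)
    if out is None:
        out = {}
    out[path] = bases[level] + used[level] * NODE_SIZE
    used[level] += 1
    for key, child in trie.items():
        assign_addresses(child, bases, level + 1, used, out, path + (key,))
    return out
```

Knowing the per-level counts before writing anything lets each level occupy one contiguous region, so a node can refer to its successor with a small index instead of a full address.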
Implementation of module/3
SRAM Data Table (each row: one long word)
index of node | index of node of 2nd level | index of node of 2nd level | index of node of 2nd level
index of node of 2nd level | index of node of 2nd level | index of node of 2nd level | index of node of 2nd level
index of node | value of field | index of next node
value of field | index of next node | value of field | index of next node
index of node | value of field | index of next node
value of field | index of next node | value of field | index of next node
index of node | index of next node
minimum value | maximum value
index of node | index of next node
minimum value | maximum value
index of node | value of field | number of rule
value of field | number of rule | value of field | number of rule
The lists for the ports have a different structure,
because classification rules almost always
contain port ranges. Moreover, there are 65536
possible port values, so 16 bits are needed to
express them. The second long word (LW) holds the
usual 8 bits that index the next node; the
remaining 24 bits are padding.
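A port-range entry of this kind can be sketched as two 32-bit long words. The field widths (16-bit minimum and maximum, 8-bit next-node index, 24 bits of padding) come from the note above; the order of the two long words, the bit positions and the big-endian byte order are assumptions made for illustration, and the function names are hypothetical.

```python
import struct


def pack_port_node(port_min, port_max, next_index):
    """Pack a port range and an 8-bit next-node index into two long words."""
    if not 0 <= port_min <= port_max <= 0xFFFF:
        raise ValueError("ports must satisfy 0 <= min <= max <= 65535")
    if not 0 <= next_index <= 0xFF:
        raise ValueError("next-node index must fit in 8 bits")
    lw_range = (port_min << 16) | port_max   # 16-bit min, 16-bit max
    lw_next = next_index << 24               # index in top byte, 24 bits padding
    return struct.pack(">II", lw_range, lw_next)


def unpack_port_node(data):
    """Return (port_min, port_max, next_index) from a packed entry."""
    lw_range, lw_next = struct.unpack(">II", data)
    return lw_range >> 16, lw_range & 0xFFFF, lw_next >> 24


def port_matches(data, port):
    """True if `port` falls inside the inclusive range stored in the entry."""
    port_min, port_max, _ = unpack_port_node(data)
    return port_min <= port <= port_max
```

Storing a range as two bounds keeps a rule such as "greater than 1023" in a single entry instead of one entry per port value.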
Implementation of module/4
- Functions of µ-engines (implemented in µ-code assembler)
  - Receiving packets
  - Retrieving the relevant fields from packet headers
  - Finding matching rules using the data structure in SRAM
  - Modifying the TOS field
- Number of added cycles: 1600
  - 50: initialization of memory registers
  - 180: reading the first node
  - 150 × 2: reading the port nodes
  - 145 × 7: reading the other nodes
  - 15: final matching
  - 40: writing the TOS field
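The cycle budget can be checked with a short calculation. It reads the port and other-node items as a per-node cost times a node count (150 cycles for each of 2 port nodes, 145 cycles for each of 7 other nodes); the names `CYCLE_BUDGET` and `total_cycles` are hypothetical.

```python
# (step, cycles per occurrence, number of occurrences)
CYCLE_BUDGET = [
    ("memory registers initialization", 50, 1),
    ("reading first node", 180, 1),
    ("reading nodes of ports", 150, 2),
    ("reading other nodes", 145, 7),
    ("final matching", 15, 1),
    ("writing TOS field", 40, 1),
]


def total_cycles(budget=CYCLE_BUDGET):
    """Sum cycles x occurrences over all steps."""
    return sum(cycles * count for _, cycles, count in budget)
```

Under this reading the items add up to 50 + 180 + 300 + 1015 + 15 + 40 = 1600, matching the stated total, and node reads account for most of the added cycles.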
Programming problems/1
- Main problems
- Number of SRAM accesses
- Rate of SRAM accesses
Programming problems/2
- Multithreaded programming
[Figure: timeline of threads 0-7. Legend: running thread, context swap, idle thread, idle µe, µe control, memory access latency. Axis: time]
- We want to reduce the idle time
Programming problems/3
[Figure: two timelines of threads 0-3. Legend: running thread, context swap, idle thread, idle µe, µe control, memory access latency. Axis: time]
- Decrease the number of active threads per µ-engine
Programming problems/4
- Consolidate adjacent memory accesses
[Figure: two timelines of threads 0-3. Legend: running thread, context swap, idle thread, idle µe, memory access latency, µe control. Axis: time]
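Consolidating adjacent memory accesses amounts to replacing several single-word reads of consecutive addresses with one multi-word request. The sketch below shows the idea on a list of long-word addresses; `MAX_BURST` is a hypothetical limit, not a figure from the slides, and `coalesce` is an illustrative helper rather than the original µ-code.

```python
MAX_BURST = 8   # hypothetical maximum number of long words per request


def coalesce(addresses, max_burst=MAX_BURST):
    """Merge consecutive long-word addresses into (start, count) bursts."""
    bursts = []
    for addr in sorted(set(addresses)):
        last = bursts[-1] if bursts else None
        if last and addr == last[0] + last[1] and last[1] < max_burst:
            last[1] += 1                 # extend the current burst
        else:
            bursts.append([addr, 1])     # start a new burst
    return [tuple(b) for b in bursts]
```

Six single reads at addresses 10, 11, 12, 20, 21 and 30 become three requests, so the thread pays the memory latency three times instead of six.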
Measurements/1
[Figure labels: Developer's Workbench (Microengines programming); Cross-Compiler (XScale programming); Serial Cable; AdTech AX4000]
Measurements/2
[Figure: AdTech AX4000]
Measurements/3
- Max packet rate: 2033000 pkt/s (0 lost packets)
- Number of supported rules: 10000
- Performance independent of the number of rules
- A fundamental feature: robustness
Measurements/4
[Figure: values 1130 µsec, 100 µsec, 35 µsec]
Future work
- Resources/Link Scheduler
Conclusions
- Analyse the Intel IXP2400 hardware architecture
- Select a suitable packet classification algorithm for the IXP2400
- Modify the algorithm to exploit the properties of our hardware
- Build a C simulator to test the new version
- Implement XScale functions in C (building the rule table)
- Implement µ-engine functions in µ-code (finding the matching rule)
- Analyse multithreaded programming
- Study stalling, filling, and other phenomena
- Test the operation and performance of the classifier
- Characteristics: 1600 added cycles, 2 Mpkt/s, 10000 rules supported, scalability, robustness in case of congestion