Title: Network Processors
1. Network Processors
2. Evolution: Cellular Phone Generations

[Chart: data rate by cellular generation.]
- 1G: 900 MHz, voice only
- 2G: 900/1800 MHz, voice, roughly 12 kb/s data
- 2.5G: 900-1800 MHz, voice plus limited Internet access, roughly 170 kb/s
- 3G: 900-1800-1900 MHz, smart phones with full web service, roughly 1000 kb/s
3. Evolution: 3G Cellular Phones

[Figure: access-network hierarchy before 3G data rates. Each mobile station (MS) generates about 12 kb/s; a base station (BS) handles about 100 MS at roughly 5 Mb/s; a base station controller (BSC) handles about 10 BS at roughly 100 Mb/s toward the network.]
4. Evolution: 3G Cellular Phones (contd.)

[Figure: the same hierarchy at 3G data rates. Each mobile station (MS) generates about 1 Mb/s; a base station (BS) handles about 500 MS at roughly 500 Mb/s; a base station controller (BSC) aggregates about 100 BS at roughly 50 Gb/s toward the network, with network processors (NPs) placed in the BSC to handle the aggregate load. The arithmetic behind these aggregate figures is sketched below.]
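The aggregate rates in the figure follow from multiplying the per-node rate by the fan-in shown on the slide:

    500 MS per BS  x 1 Mb/s per MS    ~ 500 Mb/s per BS
    100 BS per BSC x 500 Mb/s per BS  ~ 50 Gb/s per BSC

It is this jump from hundreds of Mb/s at a base station to tens of Gb/s at the aggregation point that motivates placing network processors in the BSC.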
5. Evolution: Networks

[Chart: link bandwidth (Mb/s, log scale) versus year, roughly 1980 through 2005. DS0: 64 kb/s; DS1: 1.5 Mb/s; DS3: 44 Mb/s; OC-12: 622 Mb/s; OC-192: 10 Gb/s; OC-768: 40 Gb/s. Each step is roughly a 4x to 28x jump in capacity; the NP marker on the chart sits around the OC-12 point. DS = digital signal, OC = optical carrier.]
6. Networking Trends
- Increasing network traffic.
- New, sophisticated protocols are being introduced at a rapid pace.
- Need to support new applications that provide new services.
- Convergence of voice and data networks is introducing many changes in the communication industry.
- Increasing time-to-market (TTM) pressure.
- Decreasing product life cycles.
7. General-Purpose-Processor-Based Software Routers
- The core processor performs all routing functionality.
- Benefits
  - Flexible when upgrading the system.
  - Easy to support additional interfaces.
  - Quick to develop new products with short TTM.
- Drawbacks
  - Cannot scale to higher bandwidths; tops out at about OC-12 speeds.
  - Can support complex network operations (e.g., traffic engineering, QoS) only with a major reduction in performance.
8. ASIC-Based Routers
- Benefits
  - Provide wire-speed, high-performance forwarding.
- Drawbacks
  - Lack flexibility, making it difficult to meet changing market needs and demands.
  - Long design cycles increase TTM and shorten the product life cycle (PLC).
  - A design change or a design failure carries greater risk.
  - The ASIC must be replaced to provide new functionality.
  - Complex network operations are still executed in software.
9. Network-Processor-Based Boxes
- Promise both performance and flexibility.
- Comprise many packet-processing elements, each supporting multiple threads.
- Achieve higher performance through pipelining and parallel processing, both across threads and across packet-processing elements.
- Provide flexibility through software programmability.
- Easy to add features.
10. Network Processor
11. Basic Architecture of Network Processors
12. Basic Architecture (contd.)

[Figure: multiple incoming streams feed a dispatcher, which distributes packets across parallel RISC processing engines; the engines call out to look-aside co-processors (CP1-CP4) and a communication engine for specialized functions, and a merger recombines the processed streams at the output. A minimal software sketch of this dispatcher/worker/merger pattern follows.]
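The dispatcher/engines/merger structure in the figure maps naturally onto a pool of worker threads. Below is a minimal, hypothetical sketch in C using POSIX threads (not any vendor's NPU API): a dispatcher spreads packets round-robin over worker "engines", each engine calls a stand-in for a look-aside co-processor, and results are emitted at the merge point. Real NPUs do this in hardware with per-engine queues and zero-copy packet handles; the sketch only shows the control flow.

    /* Minimal sketch of the dispatcher -> parallel engines -> merger pattern,
     * using POSIX threads. The "co-processor" call stands in for look-aside
     * hardware assist (hash, lookup, CRC, ...). */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdint.h>

    #define N_WORKERS 4
    #define N_PACKETS 16

    typedef struct { uint32_t id; uint32_t result; } packet_t;

    /* One single-slot mailbox per worker keeps the example short;
     * a real dispatcher would use deeper per-engine queues. */
    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  cv;
        packet_t pkt;
        int full, done;
    } mailbox_t;

    static mailbox_t in_box[N_WORKERS];

    /* Stand-in for a look-aside co-processor. */
    static uint32_t coproc_assist(uint32_t x) { return x * 2654435761u; }

    static void *worker(void *arg) {
        mailbox_t *box = (mailbox_t *)arg;
        for (;;) {
            pthread_mutex_lock(&box->lock);
            while (!box->full && !box->done)
                pthread_cond_wait(&box->cv, &box->lock);
            if (!box->full && box->done) { pthread_mutex_unlock(&box->lock); return NULL; }
            packet_t p = box->pkt;
            box->full = 0;
            pthread_cond_signal(&box->cv);
            pthread_mutex_unlock(&box->lock);

            p.result = coproc_assist(p.id);                       /* per-packet work */
            printf("merged packet %u -> %08x\n", p.id, p.result); /* merge point */
        }
    }

    int main(void) {
        pthread_t tid[N_WORKERS];
        for (int i = 0; i < N_WORKERS; i++) {
            pthread_mutex_init(&in_box[i].lock, NULL);
            pthread_cond_init(&in_box[i].cv, NULL);
            in_box[i].full = in_box[i].done = 0;
            pthread_create(&tid[i], NULL, worker, &in_box[i]);
        }
        /* Dispatcher: spread packets round-robin over the engines. */
        for (uint32_t n = 0; n < N_PACKETS; n++) {
            mailbox_t *box = &in_box[n % N_WORKERS];
            pthread_mutex_lock(&box->lock);
            while (box->full) pthread_cond_wait(&box->cv, &box->lock);
            box->pkt.id = n;
            box->full = 1;
            pthread_cond_signal(&box->cv);
            pthread_mutex_unlock(&box->lock);
        }
        for (int i = 0; i < N_WORKERS; i++) {     /* tell workers to finish */
            pthread_mutex_lock(&in_box[i].lock);
            in_box[i].done = 1;
            pthread_cond_signal(&in_box[i].cv);
            pthread_mutex_unlock(&in_box[i].lock);
        }
        for (int i = 0; i < N_WORKERS; i++) pthread_join(tid[i], NULL);
        return 0;
    }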
13. Intro: Systems and Protocols, and Their Relation to Standards

[Diagram: how standards bodies map onto a network-processing system.]
- IETF / ForCES WG: separation of the data/forwarding plane and the control plane.
- IETF protocols: IPv4, IPv6, MPLS, PPP/L2TP, MIBs.
- NPF layers: Service Layer (system-wide, no awareness of where things are), Functional Layer (awareness of where things are), Operational Layer (interface management).
- ITU-T / ANSI / ATM Forum: ATM. IEEE: Ethernet.
14. OSI Network Architecture
15. Typical Applications
- WAN/LAN switching and routing, multi-service switches, multi-layer switches, aggregators.
- Web caching, load balancing, web switching, content-based load balancers.
- QoS solutions.
- VoIP gateways.
- 2.5G and 3G wireless infrastructure equipment.
- Security: firewalls, VPNs, encryption, access control.
- Storage solutions.
- Residential gateways.
16. Software Framework
17. Scene Setting: Why Specs Are Not Enough
- Two NPU vendors want to promote their solutions with some numbers.
- Both chip architectures comprise:
  - RISC engines
  - Hardware support engines
  - Various types of interfaces
  - Support for internal and external memory
- They report the following data:
  - Aggregate MIPS
  - Maximum number of lookups per second
  - ...
- Commonalities in building blocks; commonalities in specifications.
18. Specifications
19. Test Scenario
- What is measured? Performance in packets per second versus a forwarding information base (FIB) that is increased in size (a skeleton of such a measurement loop is sketched after this list).
- The starting application is IPv4 forwarding.
- Next, counters are added for per-flow billing purposes.
- Next, load balancing is introduced as an additional feature.
- Finally, encryption becomes an additional requirement for part of the data that is being forwarded.
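A minimal sketch of what such a measurement could look like in C, with a toy FIB and synthetic packets (the table size, hash function, and traffic generation are illustrative assumptions, not part of any standard benchmark): lookups per second, a stand-in for packets per second, are measured while the FIB size is swept.

    /* Toy benchmark skeleton: measure lookups/second while sweeping the FIB size. */
    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    #define TABLE_SLOTS (1u << 20)          /* fixed hash-table size */

    typedef struct { uint32_t prefix; uint32_t next_hop; } fib_entry_t;

    static fib_entry_t fib[TABLE_SLOTS];

    static uint32_t hash32(uint32_t x) {    /* simple multiplicative hash */
        return (x * 2654435761u) >> 12;
    }

    static void fib_populate(uint32_t n_entries) {
        for (uint32_t i = 0; i < TABLE_SLOTS; i++) fib[i].next_hop = 0;
        for (uint32_t i = 1; i <= n_entries; i++) {
            uint32_t slot = hash32(i) % TABLE_SLOTS;
            fib[slot].prefix = i;
            fib[slot].next_hop = i & 0xff;
        }
    }

    static uint32_t fib_lookup(uint32_t dst) {
        return fib[hash32(dst) % TABLE_SLOTS].next_hop;
    }

    int main(void) {
        const uint32_t fib_sizes[] = { 50000, 100000, 150000 };  /* 50K..150K entries */
        const uint32_t n_packets = 10 * 1000 * 1000;
        volatile uint32_t sink = 0;          /* keep the compiler from dropping lookups */

        for (size_t s = 0; s < sizeof(fib_sizes) / sizeof(fib_sizes[0]); s++) {
            fib_populate(fib_sizes[s]);

            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (uint32_t p = 0; p < n_packets; p++)
                sink += fib_lookup(p * 2246822519u);   /* synthetic destinations */
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
            printf("FIB %u entries: %.1f Mpps (checksum %u)\n",
                   fib_sizes[s], n_packets / secs / 1e6, (unsigned)sink);
        }
        return 0;
    }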
20. Performance Curves: IPv4

[Figure: performance in Mpps (0-30) versus FIB size (50-150 K entries) for NPU A and NPU B running plain IPv4 forwarding.]
21. Performance Curves: IPv4 + Counters

[Figure: the same axes with per-flow counters added to IPv4 forwarding. The added counters require more memory references.]
22. Performance Curves: IPv4 + Counters + Load Balancing

[Figure: IPv4 forwarding plus counters plus load balancing. This combination requires even more memory references.]
23. Performance Curves: IPv4 + Counters + Load Balancing + Encryption

[Figure: IPv4 forwarding plus counters, load balancing, and encryption. NPU A has no extra memory references or resources available and does not have sufficient resources for this workload, while NPU B continues to degrade gradually.]
24. Architecture A

[Diagram: the IPv4 + counters + load-balancing + crypto application mapped onto architecture A: two OC-192 POS interfaces, three MIPS cores, several internal memory blocks, dedicated hardware assists (lookup, hash, key extraction, counting, scheduling), and external buffer memory.]
25. Architecture B

[Diagram: the same application mapped onto architecture B: two 10GE interfaces, ten MIPS cores, internal memory (IMEM), a load-balancing (LB) block, a memory interface, and external buffer memory.]
26. Specifications, Revisited
27. So...
- No clear value statement can be made in favor of either NPU solution.
  - NPU A achieves higher throughput but with limited flexibility.
  - NPU B achieves lower throughput but is more flexible.
- Were the provided specs accurate?
  - Yes. The devices performed up to spec.
  - Although NPU B looks better on paper at first sight, more resources have to be consumed for less performant results.
  - There is a cost associated with flexibility.
- Were the provided specs relevant?
  - No. They represent granular maximum performances.
  - For real-world applications, some resources could not be maximally consumed while others were over-consumed.
28. Benchmarking Considerations
- Processor-core metrics are not always relevant for networking applications.
  - They may be relevant for NPU B, since its functionality relies almost entirely on those cores.
  - They are definitely not relevant for NPU A, since it has extensive additional hardware support for specific functions.
- GRANULARITY: Highly granular specifications, data, or benchmarking information can give a misleading picture of the actual performance capabilities of the device under test (DUT). Since network processing devices are designed with specific applications in mind, benchmarks must exist for those specific applications (a worked illustration follows).
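To illustrate the granularity problem with made-up numbers (not taken from either NPU's specifications): suppose a device is quoted at a maximum of 40 M lookups/s and a maximum of 25 M counter updates/s, each measured in isolation. If a real packet needs one lookup and one counter update, and the two operations contend for the same resource (for example, memory bandwidth) rather than overlapping, the combined rate is roughly

    1 / (1/40M + 1/25M) ~ 15.4 Mpps,

well below either headline figure. The granular maxima are accurate, yet neither of them describes the application-level throughput.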
29. Benchmarking Considerations
- External factors affect NPD performance (where you don't always suspect it).
  - A forwarding application relies on FIB lookups to determine the destination of a packet.
  - The size of the FIB table can influence performance in several ways: use of multiple memory banks, and an increasing number of hash collisions (see the sketch after this list).
- EXTERNAL FACTORS: Benchmarks should include parameters that take into account external factors relevant to the particular applications being benchmarked.
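A minimal sketch, in the same toy style as the earlier benchmark skeleton, of how FIB size turns into extra memory references: entries are inserted into a fixed-size hash table with linear probing, and the average number of probes (memory references) per successful lookup is reported as the table fills. The table size and hash function are illustrative assumptions.

    /* Toy demonstration: average probes per FIB lookup grow as the table
     * load factor rises, i.e., bigger FIBs cost more memory references. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define SLOTS (1u << 18)                 /* 256K-slot open-addressed table */

    static uint32_t table[SLOTS];            /* 0 = empty, otherwise stored key */

    static uint32_t hash32(uint32_t x) { return x * 2654435761u; }

    static void insert(uint32_t key) {
        uint32_t i = hash32(key) % SLOTS;
        while (table[i] != 0) i = (i + 1) % SLOTS;   /* linear probing */
        table[i] = key;
    }

    static uint32_t probes_for(uint32_t key) {
        uint32_t i = hash32(key) % SLOTS, n = 1;
        while (table[i] != key) { i = (i + 1) % SLOTS; n++; }
        return n;
    }

    int main(void) {
        const uint32_t fib_sizes[] = { 50000, 100000, 150000, 200000 };
        for (size_t s = 0; s < sizeof(fib_sizes) / sizeof(fib_sizes[0]); s++) {
            uint32_t n = fib_sizes[s];
            memset(table, 0, sizeof(table));
            for (uint32_t k = 1; k <= n; k++) insert(k);

            uint64_t total = 0;
            for (uint32_t k = 1; k <= n; k++) total += probes_for(k);
            printf("FIB %6u entries (load %.2f): %.2f memory references/lookup\n",
                   n, (double)n / SLOTS, (double)total / n);
        }
        return 0;
    }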
30. Benchmarking Considerations
- Interfaces present performance boundary conditions.
  - Ethernet applications require inter-frame gaps, which result in more relaxed pps numbers (a worked example follows).
- INTERFACES: Benchmarks should also specify the types of interfaces being used, since those interfaces by themselves have an impact on maximum performance figures.
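As a worked example of the interface bound (standard Ethernet framing arithmetic, not a figure from the slides): on 10 Gigabit Ethernet, a minimum-size 64-byte frame occupies 64 + 8 (preamble) + 12 (inter-frame gap) = 84 byte times on the wire, so the line-rate ceiling is 10 x 10^9 / (84 x 8) ~ 14.88 Mpps for minimum-size frames, no matter how fast the NPU behind the interface is. Larger frames lower the pps ceiling further, which is why Ethernet pps targets are more relaxed than raw core speeds would suggest.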
31. Benchmarking Considerations
- Combinations of applications, or minor extensions, have a completely different impact on the two network processing devices.
  - NPU A has a lot of well-engineered hardware support that can offer additional services, but it fails almost completely when additional computing resources are required.
  - NPU B is very "soft": performance degrades slowly when additional services are requested, with no abrupt peaks in the performance curves.
- HEADROOM: Benchmarks should combine applications as they occur in the real world to give a sense of the headroom available to support real-world scenarios. It is, however, very hard to define a metric for headroom.
32. CommBench: A Telecommunications Benchmark for NPs
- CommBench divides its workloads into header-processing applications (HPAs) and payload-processing applications (PPAs).
33. Benchmark Characteristics: Code and Computational Kernel Sizes
34. Benchmark Characteristics: Computational Complexity
- N(a, l): number of instructions per byte required for application a operating on a packet of length l (a worked reading of this metric follows).
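One way to read this metric (a back-of-the-envelope framing, not a formula from the slides): if an application has a complexity of N(a, l) instructions per byte and the processing engines together sustain roughly I instructions per second, then throughput is bounded by about I / N(a, l) bytes per second. For example, at an aggregate of 1,000 MIPS and 10 instructions per byte, the bound is about 100 MB/s, i.e., roughly 800 Mb/s, before memory and I/O limits are considered.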
35. Benchmark Characteristics: Instruction Set Characteristics
36. Benchmark Characteristics: Memory Hierarchy
37. Example System: Cisco Toaster 10000
- Almost all data-plane operations execute on the programmable XMC.
- Pipeline stages are assigned tasks, e.g., classification, routing, firewall, MPLS.
- Assigning tasks to stages is a classic software load-balancing problem.
- External SDRAM is shared by common pipeline stages.
38. Example System: Intel IXP 2400
- XScale core replaces the StrongARM.
- Microengines:
  - Faster.
  - More of them: 2 clusters of 4 microengines each.
  - Local memory.
- Next-neighbor routes added between microengines.
- Hardware to accelerate CRC operations and random-number generation (a plain-C illustration of the CRC work being offloaded follows this list).
- 16-entry CAM.
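For context on why a dedicated CRC unit matters: below is a plain byte-at-a-time CRC-32 in C, using the reflected 0xEDB88320 polynomial used by Ethernet. A software loop like this costs several instructions per byte on a general-purpose core, which is exactly the per-byte work the hardware CRC unit offloads. This is a generic illustration, not IXP2400 microcode.

    /* Byte-at-a-time CRC-32 (reflected polynomial 0xEDB88320, as used by
     * Ethernet). Generic C illustration of the work a hardware CRC unit offloads. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    static uint32_t crc32(const uint8_t *data, size_t len) {
        uint32_t crc = 0xFFFFFFFFu;
        for (size_t i = 0; i < len; i++) {
            crc ^= data[i];
            for (int b = 0; b < 8; b++)                 /* one bit at a time */
                crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int32_t)(crc & 1));
        }
        return ~crc;
    }

    int main(void) {
        const uint8_t msg[] = "123456789";
        /* The standard check value for CRC-32 over "123456789" is 0xCBF43926. */
        printf("crc32 = %08X\n", crc32(msg, sizeof(msg) - 1));
        return 0;
    }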
39. References
- Network Processor Design, Patrick Crowley et al.
- CommBench: A Telecommunications Benchmark for Network Processors, Tilman Wolf and Mark Franklin. Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). http://www.ecs.umass.edu/ece/wolf/papers/commbench.pdf
- Network Processing Forum: Benchmarking. www.wipro.com/pdf_files/networkprocessors_wipro_solPPT.pdf
- http://intrage.insatlse.fr/etienne/netpro.ppt