1
Fundamental Architectural Considerations for
Network Processors
  • Review of a paper by
  • M. Peyravian and J. Calvignac
  • IBM Corporation, 2003

2
Introduction to Network Processors (NP)
  • Network Processors (NPs) are processors modified
    or optimized to support networking functions such
    as packet manipulation and deep packet inspection.
  • Tremendous advances in networking technology and
    the increased speed and bandwidth of networked
    devices have created an ever-increasing need for
    machines that can move packets around faster and
    more efficiently.
  • From a functionality point of view, network
    processors can be divided into two simple
    categories, namely
  • control-plane
  • data-plane

3
Control-Plane vs. Data-Plane
  • Control-plane processors have modest performance
    requirements, as they are helper processors used
    mostly to control the flow of traffic and enforce
    quality requirements without actually delving
    into the packet data itself.
  • Good examples are the Resource Reservation
    Protocol (RSVP) and the Open Shortest Path First
    (OSPF) protocol, which have been implemented on
    general-purpose processors such as the PowerPC.
  • Data-Plane Processors are primarily responsible
    for forwarding packets from a source to a
    destination.
  • Data-plane algorithms are best implemented on
    parallel processors, as network traffic exhibits
    a high degree of packet-level parallelism and
    data-plane tasks have short code paths.
  • Data-plane processors need to be performance
    optimized as they need to decode and move around
    large amounts of data to satisfy network Quality
    of Service requirements.
  • Most optimization research in network processors
    is aimed at data-plane processors.

4
Architecture Considerations
  • Most network processors use parallel processing
    through multiple Processing Engines (PEs), and
    their architectures can be divided into the
    following types
  • Parallel Architectures
  • Pipeline Stage Architectures

5
Parallel Model
  • Parallel-model NPs are typically scaled-down,
    modified RISC-based processors that contain
    bit-manipulation circuits to increase packet
    processing power.
  • They have small instruction and data caches and
    many of these PEs are fitted onto one NP chip to
    increase physical space efficiency.
  • In Parallel Mode, the Task Scheduler is
    responsible for assigning packets to PEs as they
    arrive at the network interfaces. It keeps track
    of which PEs are available and then assigns
    accordingly.
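A minimal sketch of this dispatch logic in C, assuming a scheduler that hands each arriving packet to the first idle PE; the names (NUM_PES, pe_busy, dispatch_packet) are illustrative and not from the paper:

#include <stdio.h>
#include <stdbool.h>

#define NUM_PES 8

static bool pe_busy[NUM_PES];  /* scheduler's view of which PEs are assigned */

/* Hand a packet to the first idle PE; return its index, or -1 if all
 * PEs are busy and the packet must wait. A real PE would clear its
 * busy flag when it finishes processing the packet. */
static int dispatch_packet(int packet_id)
{
    for (int pe = 0; pe < NUM_PES; pe++) {
        if (!pe_busy[pe]) {
            pe_busy[pe] = true;
            printf("packet %d -> PE %d\n", packet_id, pe);
            return pe;
        }
    }
    return -1;
}

int main(void)
{
    for (int id = 0; id < 10; id++)
        if (dispatch_packet(id) < 0)
            printf("packet %d queued\n", id);
    return 0;
}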

6
Pipeline Model
  • Pipeline-model NPs have multiple stages, each
    controlled by a PE, and each of these PEs is
    responsible for a certain kind of task.
  • Such a PE is referred to as a task-oriented
    processing engine.
  • Both types of NP architecture provide the same
    total processing time per packet, but they differ
    in the per-PE processing budget and the
    throughput requirement.
  • Parallel NPs have a much less stringent
    processing budget and can support
    higher-bandwidth networking appliances because of
    their design.
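A back-of-envelope illustration of that budget difference, under assumed figures (10 Gb/s line rate, 64-byte minimum packets, 8 PEs or stages) that do not come from the paper:

#include <stdio.h>

int main(void)
{
    const double line_rate_bps = 10e9;    /* 10 Gb/s link (assumed) */
    const double pkt_bits      = 64 * 8;  /* minimum-size packet */
    const int    n_units       = 8;       /* PEs (parallel) or stages (pipeline) */

    double pkt_time_ns = pkt_bits / line_rate_bps * 1e9;

    /* Parallel: a PE may spend n_units packet-arrival times on one packet.
     * Pipeline: every stage must finish within a single packet time. */
    printf("packet arrival time  : %.1f ns\n", pkt_time_ns);
    printf("parallel PE budget   : %.1f ns per packet\n", pkt_time_ns * n_units);
    printf("pipeline stage budget: %.1f ns per stage\n", pkt_time_ns);
    return 0;
}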

7
Memory Organization
  • The network processor memory holds
  • instruction code
  • control data and
  • the packets that need to be processed.
  • The instruction code represents the application
    programs that run on the processors and is stored
    in high-speed SRAMs with low-cycle access
    windows.
  • The amount of instruction memory depends on the
    kind of application that is slated to run on the
    NP. Lower-layer protocol processing requires just
    a few kilobytes of memory, while higher-layer
    deep-packet-processing techniques require several
    hundred kilobytes of instruction memory.
  • Control information, such as a routing table, is
    stored in the control memory, which is likewise
    implemented in high-speed SRAMs.
  • Packet data is stored in packet memory, which is
    implemented in large, low-cost DRAMs. This turns
    out to be the bottleneck in NP-based network
    appliances; the greater cost of SRAM versus DRAM
    is the limiting factor.
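Rough arithmetic behind the bottleneck claim, assuming (illustratively, not from the paper) a 10 Gb/s line rate and that every packet is written to packet memory once on arrival and read once on transmit:

#include <stdio.h>

int main(void)
{
    const double line_rate_gbps      = 10.0;  /* assumed line rate */
    const double accesses_per_packet = 2.0;   /* one write + one read */

    /* The DRAM must sustain a multiple of the line rate. */
    printf("required packet-memory bandwidth: %.0f Gb/s\n",
           line_rate_gbps * accesses_per_packet);
    return 0;
}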

8
3 Memory Models
  • NP memory can be structured in three different
    models, each with its own drawbacks and
    advantages. They are
  • Shared
  • Distributed
  • Hybrid
  • Shared memory is limited by scalability and is
    relatively slower in performance but offers a
    much simpler programming model.
  • The distributed model is exactly the opposite and
    offers much better scalability and performance,
    but is much more difficult to implement.
  • The Hybrid model is best suited for most
    applications and is much simpler to program.
  • In the hybrid model the PEs are partitioned into
    multiple clusters and within each cluster, the
    PEs have shared memory.
  • The instruction code is replicated throughout the
    model to avoid contention for instruction memory.
    PEs within a cluster also share control memory,
    since session information is cluster specific.
  • Other control information, such as route tables
    and global session control information, is shared
    across all clusters.
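One possible data layout for such a hybrid organization, as a C sketch; all sizes and names (PES_PER_CLUSTER, instr_mem, control_mem) are assumptions for illustration, not taken from the paper:

#include <stdio.h>
#include <stdint.h>

#define PES_PER_CLUSTER 4
#define NUM_CLUSTERS    4

struct pe_state {
    uint32_t regs[32];                 /* private per-PE state */
};

struct cluster {
    uint8_t instr_mem[16 * 1024];      /* instruction code, replicated
                                          per cluster to avoid contention */
    uint8_t control_mem[64 * 1024];    /* shared within the cluster,
                                          e.g. session tables */
    struct pe_state pes[PES_PER_CLUSTER];
};

/* Global control data (e.g. route tables) shared by all clusters. */
static uint8_t global_control_mem[1024 * 1024];
static struct cluster clusters[NUM_CLUSTERS];

int main(void)
{
    printf("per-cluster memory : %zu bytes\n", sizeof(struct cluster));
    printf("global control mem : %zu bytes\n", sizeof(global_control_mem));
    printf("clusters           : %d\n", NUM_CLUSTERS);
    return 0;
}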

9
Multithreading
  • The two main sources of latency in Network
    Processors are
  • Memory Accesses (which include both reads and
    writes)
  • Co-processor Accesses (which include the response
    time for requests issued by PEs)
  • Multithreading is a method of reducing latencies
    in NPs by allowing a PE to continue processing
    other packets while handling of the current
    packet is stalled for any reason, such as a
    memory access or another time-intensive task.
    There are two main approaches to implementing
    thread switching in NPs.
  • The set of registers which are active at a thread
    switch point are saved in memory and later
    restored when execution returns to the thread
    again. This can be counterproductive in some
    instances because the memory save and restore
    functions can take many processor cycles to
    complete.
  • A PE can contain one set of registers per thread
    and a single-stage PE can switch threads in one
    cycle since thread switching only requires
    pointing to one set of registers. In multi-stage
    pipeline based multithreading scenarios the
    pipeline might need to be cleared before thread
    switching can occur.
  • Multithreading increases the utilization of PEs
    but runs the risk of adding processor latency
    and, in turn, packet latency to the system. In
    real-time environments such as voice or video
    routing, latency can be a major problem and may
    be the more important factor to consider.
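A minimal sketch of the second approach above, one register set per thread, where a thread switch is just repointing to another register set rather than saving state to memory; names and sizes are illustrative:

#include <stdio.h>
#include <stdint.h>

#define NUM_THREADS 4
#define NUM_REGS    16

struct reg_set { uint32_t r[NUM_REGS]; };

static struct reg_set thread_regs[NUM_THREADS];  /* one register file per thread */
static int current = 0;                          /* index of the active thread */

/* On a stall (e.g. a memory access), switch threads by changing the
 * index; no save/restore of registers to memory is needed, so a
 * single-stage PE can switch in one cycle. */
static struct reg_set *switch_on_stall(void)
{
    current = (current + 1) % NUM_THREADS;       /* simple round robin */
    return &thread_regs[current];
}

int main(void)
{
    for (int i = 0; i < 6; i++)
        printf("stall -> now running thread %d\n",
               (int)(switch_on_stall() - thread_regs));
    return 0;
}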

10
Support for Traffic Management and Interface
Requirements
  • To help the NP interface with other parts of the
    network system, various external interfaces have
    been developed, and their structural and
    architectural implications have to be kept in
    mind. Common NP interfaces include serial
    interfaces, which can take a plug-in form. These
    include Fast Ethernet, Gigabit Ethernet, and the
    most popular of these, ATM serial-mode
    interfaces.

11
Support for Traffic Management and Interface
Requirements
  • The various traffic management techniques used in
    network appliances require different processor
    enhancement approaches. Network appliances can
    implement these techniques in software, use
    external devices, or depend on specialized
    hardware to perform these tasks. QoS support
    calls for a variety of flexible queuing schemes,
    such as
  • Priority Queuing
  • Round Robin Queuing
  • Weighted Fair Queuing.
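A minimal sketch of the first of these schemes, strict priority queuing, where the highest non-empty priority is always served first; round-robin and weighted fair queuing differ mainly in the selection rule. Queue sizes and names are illustrative:

#include <stdio.h>

#define NUM_PRIOS 4   /* priority 0 is highest */
#define QLEN      16

static int queues[NUM_PRIOS][QLEN];
static int count[NUM_PRIOS];

static void enqueue(int prio, int pkt)
{
    if (count[prio] < QLEN)
        queues[prio][count[prio]++] = pkt;  /* tail drop when full */
}

/* Always serve the highest non-empty priority first. */
static int dequeue(void)
{
    for (int p = 0; p < NUM_PRIOS; p++) {
        if (count[p] > 0) {
            int pkt = queues[p][0];
            for (int i = 1; i < count[p]; i++)    /* shift entries down; a */
                queues[p][i - 1] = queues[p][i];  /* ring buffer would avoid this */
            count[p]--;
            return pkt;
        }
    }
    return -1;  /* all queues empty */
}

int main(void)
{
    enqueue(2, 100); enqueue(0, 200); enqueue(1, 300);
    for (int i = 0; i < 3; i++)
        printf("dequeued packet %d\n", dequeue());  /* 200, 300, 100 */
    return 0;
}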

12
Network Processor Programming
  • The Parallel Architecture Programming Model
  • An important point in NP programming is that the
    parallel architecture, with its high data
    parallelism, can be exploited very effectively
    using the run-to-completion (RTC) programming
    model.
  • Under this model, the programmer writes software
    to run on only one thread, and this code is then
    simply replicated across the other threads.
  • This programming model is based on the Symmetric
    Multiprocessing (SMP) architecture, in which
    multiple PEs share the same memory. The PEs in
    this model form a pool of shared resources that
    are assigned work whenever they are found to be
    idle (see the sketch after this slide's bullets).
  • The Pipelined Architecture Programming Model
  • Distributed programming models have to be used to
    optimize categories of tasks and instructions in
    the pipeline model. This is a weaker programming
    model and leads to less efficient systems in most
    cases. The distributed architecture is also
    fragile, and its code is difficult to maintain
    and upgrade.
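A minimal sketch of the run-to-completion model described above for the parallel architecture: the whole fast path for a packet is one function, written once and run identically on every PE or thread over shared memory. The function and table names are illustrative:

#include <stdio.h>

struct packet { int dst; };

static int route_table[4] = { 10, 11, 12, 13 };  /* shared memory (SMP) */

/* The entire fast path for one packet runs here, start to finish,
 * with no hand-off to another stage; every PE executes this same
 * function on whichever packet it is assigned. */
static void process_packet(struct packet *p)
{
    int port = route_table[p->dst % 4];          /* shared-table lookup */
    printf("dst %d forwarded via port %d\n", p->dst, port);
}

int main(void)
{
    struct packet pkts[] = { {5}, {2}, {7} };
    for (int i = 0; i < 3; i++)
        process_packet(&pkts[i]);  /* each idle PE would pull the next
                                      packet and run this to completion */
    return 0;
}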

13
Conclusions
  • Network Processors are fast becoming a very
    important part of network appliances.
  • They need to keep up with the increasing demands
    on networks and network management appliances.
  • This paper looks at the various different NP
    architectures and gives a broad overview of
    existing technology, using examples from existing
    designs and future considerations.
  • This paper is meant more as a review of
    technology than a direction to be explored. It
    gives simple explanations of a very complex
    subject and is very easy to read.