INF5061: Multimedia data communication using network processors


INF5061: Multimedia data communication using network processors. Overview ... offloads host resources. 2005 Carsten Griwodz & Pål Halvorsen ...


Transcript and Presenter's Notes


1
Introduction
INF5061: Multimedia data communication using
network processors

2/9 - 2005

2
Overview

Course topic and scope
Background
software-based network systems
challenges and new requirements
evolution of network processors
(Very) short overview of some example network
processors

3
INF5061: The Course
4
Lecturers

Carsten Griwodz, email: griff _at_ ifi
Pål Halvorsen, email: paalh _at_ ifi

5
About INF5061: Topic & Scope

Content: The course gives
an overview of network processor cards
(architectures and use)
an introduction to programming Intel IXP
network processors
some ideas of how to use network processors

6
About INF5061: Topic & Scope

Lab assignments: An important part of the course is the lab assignments, where the students write programs for the Intel IXP2400 network processor
1. wwpingbump: download and run
2. protocol statistics: extend wwpingbump to give processor, interface and protocol statistics
3. packet bridge with ARP support: forward each packet to the correct interface (of 3 available)
4. transparent load balancer: balance load and forward packets to the right machine in a cluster of two with the same IP address
5. HTTP protocol translator: add support in the transparent load balancer for HTTP streaming when the server is an RTSP/RTP server

7
About INF5061: Exam (10 sp)

Prerequisite: mandatory assignments
lab assignment 2: protocol statistics
presentation of a relevant paper
Graded assignments
lab assignment 4: transparent load balancer
deliver code
short demo/explanation of code (to lecturers only)
lab assignment 5: HTTP protocol translator
deliver code and a short report
present and demonstrate to the class at the end of the course
Final exam: oral exam (???/12-2005)
selected chapters from the Comer book and IXP documentation
lecture slides (including slides from presented papers)
content of lab assignments

8
About INF5060: Exam (5 sp)

Mandatory assignment
lab assignment 5: HTTP protocol translator
deliver code and a short report
present and demonstrate to the class at the end of the course
an approved assignment gives a passed course (INF5060)

9
Available Resources

Book: Douglas E. Comer, "Network Systems Design
Using Network Processors: Intel IXP2xxx
Version", Pearson Prentice Hall, 2004
Other resources will be placed at
http://www.ifi.uio.no/paalh/INF5061
Login: inf5061
Password: ixp
Manuals for IXP2400: /paalh/INF5061/IXP2400
Code: /paalh/INF5061/code

10
Disclaimer

In the field of network processors, I am a tyro
Definition: Tyro \Tyro\, n. pl. Tyros. A
beginner in learning; one who is in the
rudiments of any branch of study; a person
imperfectly acquainted with a subject; a
novice
Then, by definition, in the field of network
processors, we are all tyros
In our defense, when it comes to network
processors, everyone is a tyro

11
Background and Motivation
12
Software-Based Network System

Uses conventional, shared hardware (e.g., a PC)
Software
runs the entire system
allocates memory
controls I/O devices
performs all protocol processing
First generation network systems

13
Review of General Data Path on Conventional Computer Hardware Architectures
[Figure: sending, receiving and forwarding data paths. Labels: application, communication system, transport (TCP/UDP), network (IP), link]
14
Review of Conventional Computer Hardware Architectures
[Figure: Intel D850MD motherboard, Intel Hub Architecture (850 chipset). Labels: CPU socket, system bus, Memory Controller Hub, RDRAM interface, RDRAM connectors, hub interface, I/O Controller Hub, PCI bus, PCI connectors]
15
Forwarding Example for an Intermediate Node
[Figure: forwarding path on the Intel Hub Architecture. Labels: network card, communication system (kernel space), application (user space), Pentium 4 processor, registers, cache(s)]
Note: a single average MPEG-II DVD stream requires 330-660 packets per second of 1500 bytes (4-8 Mbps); then use smaller packets, add concurrent clients, other applications, ...
16
Main Packet Processing Costs

Copying: used when moving a packet from one memory location to another
expensive (proportional to packet size)
should be avoided whenever possible (use pointers)
Checksumming: used to detect errors
expensive (proportional to packet size)
transport layer: payload and header
network layer: header
Fragmentation/reassembly: needed when a packet is larger than the smallest MTU
generate headers and header checksums
receiving many small data fragments
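
To make the "proportional to packet size" point concrete, here is a minimal sketch of a 16-bit one's-complement checksum in the style of RFC 1071, the scheme used by IP, UDP and TCP. The function name `internet_checksum` is only illustrative and not taken from the course material; a real network stack would use an optimized or hardware-assisted version.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    # Every 16-bit word is read exactly once, so the cost grows
    # linearly with the number of bytes being checksummed.
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Worked example bytes from RFC 1071.
print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))
```

The loop touches the whole buffer, which is why checksumming a transport-layer payload costs much more than checksumming only a network-layer header.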

17
Question

Which is growing faster?
network bandwidth
processing power
Note: if network bandwidth is growing faster
CPU may be the bottleneck
need special-purpose hardware
conventional hardware will become irrelevant
Note: if processing power is growing faster
no problems with processing
network/busses will be bottlenecks

18
Growth Of Technologies
[Figure: growth of technologies; axes: Mbps vs. year]
19
Packet Rates and Software Processing
Packet rates (packets per second)
                      64 B          1500 B
10BASE-T (10 Mbps)    19,531        833
1000BASE-T (1 Gbps)   1,953,125     83,333
OC-192 (9.95 Gbps)    19,439,453    829,416

Packet processing (MIPS, assuming 5K instructions per packet)
                      64 B          1500 B
10BASE-T (10 Mbps)    97.65         4.17
1000BASE-T (1 Gbps)   9,765.63      416.67
OC-192 (9.95 Gbps)    97,197.27     4,147.08

the Comer book uses 10K instructions as an upper bound per packet
it varies according to which protocols are used, implementation, data size, etc.
more if moved through a firewall
engineering rule: a 1 GHz general-purpose CPU roughly corresponds to a 1 Gbps network data rate
Note: this is only processing; time must be added to handle interrupts and move data into memory
Thus, software running on a general-purpose processor is insufficient to handle high-speed networks because the aggregate packet rate exceeds the capabilities of the CPU
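
The packet-rate and MIPS figures on this slide follow from simple arithmetic, sketched below. The function names are illustrative only, and the calculation ignores framing overhead such as preambles and inter-frame gaps, as the slide's numbers appear to do. The OC-192 row is consistent with a line rate of about 9.953 Gbps, which is the value assumed here.

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packets per second when the link is filled with packets of one size."""
    return link_bps / (packet_bytes * 8)


def required_mips(pps: float, instructions_per_packet: int = 5000) -> float:
    """Millions of instructions per second needed to process every packet."""
    return pps * instructions_per_packet / 1e6


links = {
    "10BASE-T": 10e6,
    "1000BASE-T": 1e9,
    "OC-192": 9.953e9,  # assumed rate; the slide rounds it to 9.95 Gbps
}
for name, bps in links.items():
    for size in (64, 1500):
        pps = packets_per_second(bps, size)
        print(f"{name:11s} {size:5d} B  {pps:14.0f} pps  {required_mips(pps):10.2f} MIPS")
```

Passing `instructions_per_packet=10000` gives the upper bound used in the Comer book, which doubles every MIPS figure.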
20
The Network System Challenges

Data rates in general keep increasing
Network rate > CPU rate > memory, busses and I/O interfaces
Protocols and applications keep evolving
System design, implementation and testing are time-consuming and expensive
Systems often contain errors
Special-purpose hardware (ASIC) designed for one type of system can usually not be reused
Host machine must inspect all incoming packets
Challenge: find ways to improve the design and manufacture of complex networking systems

21
Statement of Hope

If there is hope, it lies in
1990: faster CPUs
1995: the application specific integrated
circuit (ASIC) designers
2002: the programmers!
Programmability
we need a programmable device with more
capability than a conventional CPU
key to low-cost hardware for next generation
network systems
compared to ASIC designs, it is more flexible,
easier and faster to upgrade, and thus, less
expensive

22
First Generation

General idea: to optimize computation, move
operations that account for the most CPU time
from software into hardware
Onboard
address recognition and filtering
onboard buffering
DMA
buffer and operation chaining

Add hardware to NIC
off-the-shelf chips for layer 2
ASICs for layer 3
Allows each NIC to operate independently
effectively a multiprocessor
total processing power increased dramatically

23
Second Generation (early 1990s)

Designed for greater scale
Decentralized architecture
additional computational power on each NIC
NIC implements classification and forwarding
High-speed internal interconnection mechanism
interconnects NICs
provides fast data path

Multiple network interfaces
High-speed hardware interconnects NICs
General-purpose processor only handles exceptions
Sufficient for medium speed interfaces (100 Mbps)

24
Third Generation (late 1990s)

Almost all packet processing off-loaded from CPU
Special-purpose ASICs handle lower layer
functions
Embedded (RISC) processor handles layer 4
CPU only handles low-demand processing

Functionality partitioned further
Additional hardware on each NIC
Onboard
classification
forwarding
traffic policing
monitoring and statistics

25
Third Generation (late 1990s)

So, are third-generation systems sufficient?
Almost!
But not quite! :-(
What's the problem?
high cost
long time to market
difficult to test
expensive and time-consuming to change
even trivial changes require silicon respin
18-20 month development cycle
little reuse across products and versions
require in-house expertise (ASIC designers)

26
Network Processors: The Idea in a Nutshell

Devise new hardware building blocks, but make
them programmable
Include support for protocol processing and I/O
General-purpose processor(s) for control tasks
Special-purpose processor(s) for packet
processing and table lookup
Include functional units for tasks such as
checksum computation, hashing, ...
Integrate as much as possible onto one chip
Call the result a network processor

27
Review of Conventional Computer Hardware Architectures
[Figure: Intel D850MD motherboard, Intel Hub Architecture (850 chipset). Labels: CPU socket, system bus, Memory Controller Hub, RDRAM interface, RDRAM connectors, hub interface, I/O Controller Hub, PCI bus, PCI connectors]
28
Network Processors: Main Idea
Traditional system
slow
resource demanding
shared with other operations
Network processors
a computer within the computer
special, programmable hardware
offloads host resources
29
Designing a Network Processor

Depends on
operations network processor will perform
role of network processor in overall system
Goals
generality: sufficient for all protocols, all
protocol processing tasks and all possible
networks
high speed: scale to high bit rates and high
packet rates
Key point: A network processor is not designed
to process a specific protocol or part of a
protocol. Instead, designers seek a minimal set
of instructions that are sufficient to handle an
arbitrary protocol processing task at high speed

30
Where to Place Network Processors

Goal: increase performance and reduce costs

[Figure: performance plotted against cost for ASIC designs, network processors and software on a conventional processor]
Thus, network processors are somewhere in the middle

Increase performance
known issues
must partition packet processing into separate functions
to achieve highest speed, must handle each function with separate hardware
unknown issues
which functions to choose
what hardware building blocks to use
how to interconnect building blocks

Decrease costs
Economics driving a gold rush
NPs will dramatically lower production costs for network systems
good NP designs worth lots of

31
Explosion of Commercial Products

1990 → 2000: network processors transformed from
interesting curiosity to mainstream product
used to reduce both overall costs and time to
market
2002: over 30 vendors with a wide range of
architectures
e.g.,
Multi-Chip Pipeline (Agere)
Augmented RISC Processor (Alchemy)
Embedded Processor Plus Coprocessors (Applied
Micro Circuits Corporation)
Pipeline of Homogeneous Processors (Cisco)
Pipeline of Heterogeneous Processors (EZchip)
Configurable Instruction Set Processors
(Cognigine)
Extensive And Diverse Processors (IBM)
Flexible RISC Plus Coprocessors (Motorola)
Internet Exchange Processor (Intel)

32
Agere PayloadPlus: A Short Overview
33
Agere PayloadPlus (APP)

Agere PayloadPlus (APP)
consists of both programmable hardware and
software
consists of both data and control planes (i.e.,
fast and slow plane)
APP defines HW architectures, SW mechanisms,
interconnection mechanisms and interfaces, BUT
does not specify how to implement them.
Several versions of APP exist, differing in the
number and types of functional units, degree of
parallelism and internal bandwidth (2nd
generation: 5 models)

34
APP Conceptual Pipeline

State engine
initiate, configure and control classifier and
traffic manager
receives control from classifier
update statistics (e.g., packet count)
check packets against profiles (and inform
classifier)
Forwarder
get packet from classifier
perform traffic shaping and management
fragment packet (if necessary)
modify headers (if necessary)

Classifier
extract packets from ingress
classify packet
send statistics to state engine
reassemble blocks
pass packet to forwarder together with
classification decision
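
The division of labour between classifier, state engine and forwarder can be sketched in a few lines. This is a software illustration only, not APP code: the `Packet` type, the port-80 classification rule and the 1500-byte MTU are all hypothetical choices made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Packet:
    dst_port: int
    payload: bytes


class StateEngine:
    """Keeps per-class packet counts reported by the classifier."""

    def __init__(self) -> None:
        self.packet_count: Counter = Counter()

    def update(self, traffic_class: str) -> None:
        self.packet_count[traffic_class] += 1


def classify(packet: Packet, state: StateEngine) -> str:
    """Classifier stage: pick a class and send statistics to the state engine."""
    traffic_class = "http" if packet.dst_port == 80 else "other"  # hypothetical rule
    state.update(traffic_class)
    return traffic_class


def forward(packet: Packet, traffic_class: str, mtu: int = 1500) -> list:
    """Forwarder stage: fragment the payload if it exceeds the assumed MTU."""
    data = packet.payload
    fragments = [data[i:i + mtu] for i in range(0, len(data), mtu)] or [b""]
    return [(traffic_class, fragment) for fragment in fragments]


state = StateEngine()
packet = Packet(dst_port=80, payload=bytes(2000))
output = forward(packet, classify(packet, state))
print(len(output), state.packet_count["http"])
```

The classifier hands its decision to the forwarder together with the packet, while the state engine only sees the statistics, mirroring the three boxes on the slide.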

35
APP550 Chip
36
APP550 Chip

Memory interfaces
two types of physical memory
fast cycle RAM (FCRAM) for fast memory accesses
double data rate SRAM (DDR-SRAM) for high
throughput
the different memory types are usually used like
this

37
APP550 Chip

Media interfaces
several to form fast data paths
two external connections
cell-oriented (ATM)
packet-oriented (Ethernet)

38
APP550 Chip

Scheduling interfaces
an external scheduling interface
external logic can use information about queues

PCI bus interfaces
allows communication with host CPU
mainly to control the whole operation

39
APP550 Chip

Coprocessor interfaces
APP550 should be able to process a packet
BUT, to accommodate special cases (e.g., adding
additional headers), a co-processor interface is
provided

40
APP550 Chip
41
APP550 Chip

Stream Editor (SED)
two parallel engines
modify outgoing packets (e.g., checksum, TTL, ...)
configurable, but not programmable

Packet (protocol data unit) assembler
collect all blocks of a frame
not programmable

Pattern Processing Engine
patterns specified by programmer
programmable using a special high-level language
only pattern matching instructions
parallelism by hardware using multiple copies
and several sets of variables
access to different memories

Reorder Buffer Manager
transfers data between classifier and traffic
manager
ensures packet order, which is needed because of
parallelism and variable processing time in the
pattern processing

Traffic Manager
schedule packets and shape traffic flow
programmable via scripts
sends packets to output interface
according to implemented policy
discard packets
choose queue

State Engine
gather information (statistics) for scheduling
verify flow within bounds
provide an interface to the host
configure and control other functional units
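
The reorder buffer manager's task on this slide, restoring packet order after parallel pattern processing with variable processing time, can be illustrated with a small sketch. The class and method names are hypothetical, and sequence numbers are assumed to be assigned on arrival starting at 0; the APP550 does this in hardware.

```python
class ReorderBuffer:
    """Releases packets in arrival order even if processing finishes out of order."""

    def __init__(self) -> None:
        self.next_seq = 0   # sequence number of the next packet to release
        self.pending = {}   # finished packets that are still waiting their turn

    def complete(self, seq: int, packet: str) -> list:
        """Record that packet `seq` is done and return all packets now releasable."""
        self.pending[seq] = packet
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


buffer = ReorderBuffer()
print(buffer.complete(1, "b"))  # packet 0 is still being processed
print(buffer.complete(0, "a"))  # packet 0 finishes, so 0 and 1 are released
```

A packet that finishes early simply waits until every packet ahead of it has finished, so the traffic manager always sees the original order.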

42
APP550 Full Duplex

Clock rate for APP550 is 233 MHz
One chip cannot manage packets at wire speed in
both directions; often two are used in parallel
(one for each direction)
are all features needed in both directions?
classification in only one direction → checks
outgoing packets and enqueues using special queue

43
Intel IXP1200 / 2400: A Short Overview
44
IXA: Internet Exchange Architecture

IXA is a broad term to describe the Intel network
architecture (HW and SW, control and data plane)
IXP: Internet Exchange Processor
processor that implements IXA
IXP1200 is the first IXP chip (4 versions)
IXP2xxx has now replaced the first version

IXP1200 basic features
1 embedded 232 MHz StrongARM
6 packet 232 MHz µengines
onboard memory
4 x 100 Mbps Ethernet ports
multiple, independent busses
low-speed serial interface
interfaces for external memory and I/O busses
IXP2400 basic features
1 embedded 600 MHz XScale
8 packet 600 MHz µengines
3 x 1 Gbps Ethernet ports

45
IXP1200 Architecture
PCI bus
- allow IXP to connect to I/O devices
- enable use of host CPU
- rate: 2.2 Gbps
SRAM bus
- shared bus (several external units)
- usually control rather than data
- rate: 3.71 Gbps
Serial line
- connects to the RISC
- intended for control and management
- rate: 38 Kbps

SDRAM bus
- provide access to external SDRAM memory used to store packets
- can also pass addresses, control/store operations, etc.
- rate: 7.42 Gbps

IX (Intel eXchange) bus
- enable higher rates compared to PCI
- form fast path (IXP and high-speed interfaces)
- interface to other IXP cards
- rate: 4.4 Gbps
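
To get a feel for the bus rates on this slide, the sketch below computes how long one packet occupies each bus at its listed peak rate. It is back-of-the-envelope arithmetic under the assumption that the full rate is available to a single transfer; arbitration, access latency and sharing are ignored, and the names are illustrative.

```python
# Peak rates in Gbps as listed for the IXP1200 buses.
IXP1200_BUS_GBPS = {"PCI": 2.2, "SRAM": 3.71, "IX": 4.4, "SDRAM": 7.42}


def transfer_time_us(packet_bytes: int, bus_gbps: float) -> float:
    """Microseconds needed to move one packet at the bus's peak rate."""
    return packet_bytes * 8 / (bus_gbps * 1e9) * 1e6


for bus, rate in IXP1200_BUS_GBPS.items():
    print(f"{bus:6s} {transfer_time_us(1500, rate):6.2f} us per 1500-byte packet")
```

Even at peak rate, a 1500-byte packet takes several microseconds on the PCI bus, which is one reason the fast path is kept on the IX and SDRAM buses.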

46
IXP1200 Architecture
RISC processor
- StrongARM running Linux
- control, higher layer protocols and exceptions
- 232 MHz
Access units
- coordinate access to external units
Scratchpad
- on-chip memory
- used for IPC and synchronization
Microengines
- low-level devices with limited set of instructions
- transfers between memory devices
- packet processing
- 232 MHz
47
IXP1200 Processor Hierarchy
General-Purpose Processor
- used for control and management
- running general applications
RISC processor
- chip configuration interface (serial line)
- control, higher layer protocols and exceptions
I/O processors (microengines)
- transfers between memory devices
- packet processing

Coprocessors
- real-time clock and timers
- IX bus controller
- hashing unit
- ...

Physical interface processors
- implement layer 1 and 2 processing
48
IXP1200 Memory Hierarchy
49
IXP1200 Memory Hierarchy

Different memory types
are organized into different addressable data
units (words or longwords)
have different access times
connected to different busses
Therefore, to achieve optimal performance,
programmers must understand the organization and
allocate items from the appropriate type

50
IXP Performance Improvement: Forwarding

Linux 2.4 vs. IXP 1200
Intel P4 host machine

The forwarding latency improvement itself may
only be relevant to very time-sensitive
interactive applications
Offloading is at least equally important

51
IXP1200 → IXP2400
[Figure: IXP1200 block diagram. Labels: PCI bus, PCI access, SRAM bus, SRAM access, SRAM, FLASH, memory-mapped I/O, embedded RISC CPU (StrongARM), SCRATCH memory, SDRAM access, DRAM bus, DRAM, IX access, IX bus]
52
IXP2400 Architecture

RISC processor
- StrongARM → XScale
- 232 MHz → 600 MHz

Microengines
- 6 → 8
- 232 MHz → 600 MHz

Coprocessors
hash unit
4 timers
general purpose I/O pins
external JTAG connections
several bulk ciphers (IXP2850 only)
checksum (IXP2850 only)

Media Switch Fabric
forms fast path for transfers
interconnect for several IXP2xxx

Slowport
shared interface to external units
used for flash ROM during bootstrap

Receive/transmit buses
shared bus → separate busses

[Figure: IXP2400 block diagram. Labels: PCI bus, PCI access, SRAM bus, SRAM access, SRAM, coprocessor, embedded RISC CPU (XScale), SCRATCH memory, FLASH, slowport access, SDRAM access, DRAM bus, DRAM, MSF access, receive bus, transmit bus, microengine 8]

53
IXP2400 Architecture

Memory
generally more of everything
generally larger gap between CPUs and memory
access in terms of cycles
local memory on each microengine
saving temporary results
private per packet processor
small (2560 bytes)
low latency (one cycle)
accessed through special registers

54
IXP2400 Packet Processing
[Figure: IXP2400 packet processing path on the block diagram. Labels: PCI bus, PCI access, SRAM bus, SRAM access, SRAM, coprocessor, embedded RISC CPU (XScale), SCRATCH memory, FLASH, slowport access, SDRAM access, DRAM bus, DRAM, MSF access, receive bus, transmit bus]
55
IXP2400 Use

Easier to use and understand
Pure Linux environment (except when using the workbench)
More stable
Faster to reset

56
Summary

The network challenges are many
Challenge: find ways to improve the design and
manufacture of complex networking systems
Hope (2002 version) lies in the programmers and
network processors
We will use Intel IXP2400 as an example which
offers
embedded processor plus parallel packet
processors
connections to external memories and buses
Next time: how to start programming these monsters