INF5061: Multimedia data communication using network processors

2005 Carsten Griwodz & Pål Halvorsen



1
Introduction
INF5061: Multimedia data communication using network processors
  • 2/9 - 2005

2
Overview
  • Course topic and scope
  • Background
  • software-based network systems
  • challenges and new requirements
  • evolution of network processors
  • (Very) short overview of some example network
    processors

3
INF5061: The Course
4
Lecturers
  • Carsten Griwodz, email: griff _at_ ifi
  • Pål Halvorsen, email: paalh _at_ ifi

5
About INF5061: Topic and Scope
  • Content: the course gives
  • an overview of network processor cards
    (architectures and use)
  • an introduction to programming Intel IXP
    network processors
  • some ideas on how to use network processors

6
About INF5061: Topic and Scope
  • Lab assignments: an important part of the course
    is a series of lab assignments in which the
    students write programs for the Intel IXP2400
    network processor
  • wwpingbump: download and run
  • protocol statistics: extend wwpingbump to give
    processor, interface and protocol statistics
  • packet bridge with ARP support: forward packets
    to the correct interface (of the 3 available)
  • transparent load balancer: balance the load and
    forward packets to the right machine in a cluster
    of two machines sharing the same IP address
  • HTTP protocol translator: add support in the
    transparent load balancer for HTTP streaming
    from an RTSP/RTP server
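The slides do not say how the transparent load balancer should pick a machine. One common approach, assumed here, is to hash the client's flow identifiers so that every packet of a connection reaches the same machine, which matters because both machines answer to the same IP address. This Python sketch is only an illustration of that idea: the backend names are hypothetical, and BLAKE2 stands in for whatever hash an IXP implementation would compute (for example with the chip's hash unit).

```python
import hashlib
import ipaddress

# Hypothetical names for the two machines in the cluster.
BACKENDS = ["server-A", "server-B"]

def pick_backend(src_ip: str, src_port: int, dst_port: int, proto: int = 6) -> str:
    """Map a flow (client address/port, service port, IP protocol) to one backend.

    All packets of the same flow produce the same key and therefore the same
    backend, so a TCP connection is never split between the two machines.
    """
    key = (
        ipaddress.ip_address(src_ip).packed
        + src_port.to_bytes(2, "big")
        + dst_port.to_bytes(2, "big")
        + bytes([proto])
    )
    digest = hashlib.blake2b(key, digest_size=4).digest()
    return BACKENDS[int.from_bytes(digest, "big") % len(BACKENDS)]

# Usage: the same client connection always maps to the same machine.
print(pick_backend("192.168.0.10", 40000, 80))
```

Hashing keeps no per-flow state in the forwarding path, at the price of ignoring the actual load on each machine.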

7
About INF5061: Exam (10 sp)
  • Prerequisites: mandatory assignments
  • lab assignment 2: protocol statistics
  • presentation of a relevant paper
  • Graded assignments
  • lab assignment 4: transparent load balancer
  • deliver code
  • short demo/explanation of code (to lecturers
    only)
  • lab assignment 5: HTTP protocol translator
  • deliver code and a short report
  • present and demonstrate to the class at the end
    of the course
  • Final exam: oral exam (???/12-2005)
  • selected chapters from the Comer book and IXP
    documentation
  • lecture slides (including slides from presented
    papers)
  • content of lab assignments

8
About INF5060: Exam (5 sp)
  • Mandatory assignment
  • lab assignment 5: HTTP protocol translator
  • deliver code and a short report
  • present and demonstrate to the class at the end
    of the course
  • an approved assignment means the course
    (INF5060) is passed

9
Available Resources
  • Book: Douglas E. Comer, Network Systems Design
    using Network Processors: Intel IXP2xxx
    Version, Pearson Prentice Hall, 2004
  • Other resources will be placed at
  • http://www.ifi.uio.no/paalh/INF5061
  • Login: inf5061
  • Password: ixp
  • Manuals for IXP2400: /paalh/INF5061/IXP2400
  • Code: /paalh/INF5061/code

10
Disclaimer
  • In the field of network processors, I am a tyro
  • Definition: Tyro \Tyro\, n.; pl. Tyros. A
    beginner in learning; one who is in the
    rudiments of any branch of study; a person
    imperfectly acquainted with a subject; a
    novice.
  • Then, by definition, in the field of network
    processors, we are all tyros
  • In our defense, when it comes to network
    processors, everyone is a tyro

11
Background and Motivation
12
Software-Based Network System
  • Uses conventional, shared hardware (e.g., a PC)
  • Software
  • runs the entire system
  • allocates memory
  • controls I/O devices
  • performs all protocol processing
  • First generation network systems

13
Review of General Data Path on Conventional
Computer Hardware Architectures
[Figure: general data path for sending, receiving and forwarding, through the application and the communication system: transport (TCP/UDP), network (IP) and link layers]
14
Review of Conventional Computer Hardware
Architectures
Intel D850MD Motherboard - Intel Hub Architecture (850 Chipset)
[Figure: CPU socket, system bus, Memory Controller Hub with RDRAM interface and RDRAM connectors, hub interface, I/O Controller Hub, PCI bus and PCI connectors]
15
Forwarding Example for an Intermediate Node
[Figure: forwarding on an intermediate node with the Intel Hub Architecture: network card, communication system (kernel space), application (user space), Pentium 4 processor with registers and cache(s)]
Note: a single average MPEG-2 DVD stream requires 330-660 packets per second of 1500 bytes (4-8 Mbps); using smaller packets, adding concurrent clients and running other applications increase the load further
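The packet rates in the note follow from dividing the stream's bit rate by the number of bits per packet (header overhead ignored):

```latex
\text{packet rate} = \frac{\text{bit rate}}{8\ \text{bit/B} \cdot \text{packet size}}:\qquad
\frac{4 \cdot 10^{6}\ \text{bit/s}}{8\ \text{bit/B} \cdot 1500\ \text{B}} \approx 333\ \text{packets/s},\qquad
\frac{8 \cdot 10^{6}\ \text{bit/s}}{8\ \text{bit/B} \cdot 1500\ \text{B}} \approx 667\ \text{packets/s}
```

The slide rounds these to 330-660 packets per second.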
16
Main Packet Processing Costs
  • Copying: used when moving a packet from one
    memory location to another
  • expensive (proportional to packet size)
  • should be avoided whenever possible (use
    pointers)
  • Checksumming: used to detect errors
  • expensive (proportional to packet size)
  • transport layer: payload and header
  • network layer: header
  • Fragmentation/reassembly: needed when a packet
    is larger than the smallest MTU
  • generate headers and header checksums
  • receive many small data fragments
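To make the cost of checksumming concrete, here is a Python sketch of the 16-bit one's-complement Internet checksum (RFC 1071) used in the IPv4 header and in TCP/UDP. The loop touches every 16-bit word, so the work grows linearly with the number of bytes covered; this is why the transport-layer checksum over header and payload costs far more than the IP header checksum. Passing a `memoryview` slice instead of a copied `bytes` object follows the slide's advice to use pointers rather than copying. The sample header is only test data.

```python
def internet_checksum(data) -> int:
    """RFC 1071 16-bit one's-complement checksum of a bytes-like object."""
    view = memoryview(data)              # no copy of the underlying buffer
    total = 0
    for i in range(0, len(view) - 1, 2):
        total += (view[i] << 8) | view[i + 1]   # add 16-bit big-endian words
    if len(view) % 2:                    # odd length: pad last byte with zero
        total += view[-1] << 8
    while total >> 16:                   # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Usage: checksum a 20-byte IPv4 header inside a frame without copying it
# (it follows a 14-byte Ethernet header; its checksum field is zeroed).
ipv4_header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
frame = bytes(14) + ipv4_header
print(hex(internet_checksum(memoryview(frame)[14:34])))  # prints 0xb861
```

A receiver can verify a header by checksumming it with the checksum field filled in: the result is 0 for an undamaged header.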

17
Question
  • Which is growing faster?
  • network bandwidth
  • processing power
  • Note: if network bandwidth is growing faster
  • CPU may be the bottleneck
  • need special-purpose hardware
  • conventional hardware will become irrelevant
  • Note: if processing power is growing faster
  • no problems with processing
  • network/buses will be bottlenecks

18
Growth of Technologies
[Figure: growth of technologies over time; x-axis: year, y-axis: Mbps]
19
Packet Rates and Software Processing
Packet rates (packets per second), by packet size:
                          64 B          1500 B
  10BASE-T (10 Mbps)      19,531        833
  1000BASE-T (1 Gbps)     1,953,125     83,333
  OC-192 (9.95 Gbps)      19,439,453    829,416
Packet processing (MIPS, assuming 5K instructions per packet), by packet size:
                          64 B          1500 B
  10BASE-T (10 Mbps)      97.65         4.17
  1000BASE-T (1 Gbps)     9,765.63      416.67
  OC-192 (9.95 Gbps)      97,197.27     4,147.08
  • the Comer book uses 10K instructions per packet
    as an upper bound
  • the number varies with the protocols used, the
    implementation, the data size, etc.
  • more if the packets pass through a firewall
  • engineering rule of thumb: a 1 GHz
    general-purpose CPU handles about 1 Gbps of
    network data
  • Note: this is only the processing time; the time
    to handle interrupts and move data into memory
    must be added
  • Thus, software running on a general-purpose
    processor is insufficient to handle high-speed
    networks, because the aggregate packet rate
    exceeds the capabilities of the CPU
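Both tables can be reproduced with a few lines of Python. This sketch ignores framing overhead (preamble, inter-frame gap), as the tables appear to do, assumes the OC-192 line rate of 9.953 Gbit/s (shown rounded as 9.95 Gbps), and uses the slide's 5K instructions per packet; its results agree with the tables up to rounding.

```python
# Link rates in bit/s; 9.953e9 for OC-192 reproduces the table's figures.
LINKS_BPS = {
    "10BASE-T": 10e6,
    "1000BASE-T": 1e9,
    "OC-192": 9.953e9,
}

def packet_rate(link_bps: float, packet_bytes: int) -> float:
    """Packets per second on a fully loaded link (framing overhead ignored)."""
    return link_bps / (packet_bytes * 8)

def required_mips(pps: float, instructions_per_packet: int = 5_000) -> float:
    """Millions of instructions per second needed at the given packet rate."""
    return pps * instructions_per_packet / 1e6

for name, bps in LINKS_BPS.items():
    for size in (64, 1500):
        pps = packet_rate(bps, size)
        print(f"{name:<11} {size:>5} B {pps:>14,.0f} pps {required_mips(pps):>11,.2f} MIPS")
```

Passing `instructions_per_packet=10_000` gives the Comer book's upper bound, which doubles every MIPS figure.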
20
The Network System Challenges
  • Data rates in general keep increasing
  • Network rate > CPU rate > memory, bus and I/O
    interface rates
  • Protocols and applications keep evolving
  • System design, implementation and testing are
    time-consuming and expensive
  • Systems often contain errors
  • Special-purpose hardware (ASIC) designed for one
    type of system usually cannot be reused
  • Host machine must inspect all incoming packets
  • Challenge: find ways to improve the design and
    manufacture of complex networking systems

21
Statement of Hope
  • If there is hope, it lies in:
  • 1990: faster CPUs
  • 1995: the application specific integrated
    circuit (ASIC) designers
  • 2002: the programmers!
  • Programmability
  • we need a programmable device with more
    capability than a conventional CPU
  • key to low-cost hardware for next generation
    network systems
  • compared to ASIC designs, it is more flexible,
    easier and faster to upgrade, and thus, less
    expensive

22
First Generation
  • General idea: to optimize computation, move the
    operations that account for the most CPU time
    from software into hardware
  • Onboard
  • address recognition and filtering
  • onboard buffering
  • DMA
  • buffer and operation chaining
  • Add hardware to NIC
  • off-the-shelf chips for layer 2
  • ASICs for layer 3
  • Allows each NIC to operate independently
  • effectively a multiprocessor
  • total processing power increased dramatically

23
Second Generation (early 1990s)
  • Designed for greater scale
  • Decentralized architecture
  • additional computational power on each NIC
  • NIC implements classification and forwarding
  • High-speed internal interconnection mechanism
  • interconnects NICs
  • provides fast data path
  • Multiple network interfaces
  • High-speed hardware interconnects NICs
  • General-purpose processor only handles exceptions
  • Sufficient for medium speed interfaces (100 Mbps)

24
Third Generation (late 1990s)
  • Almost all packet processing off-loaded from CPU
  • Special-purpose ASICs handle lower layer
    functions
  • Embedded (RISC) processor handles layer 4
  • CPU only handles low-demand processing
  • Functionality partitioned further
  • Additional hardware on each NIC
  • Onboard
  • classification
  • forwarding
  • traffic policing
  • monitoring and statistics

25
Third Generation (late 1990s)
  • Enough? Is the third generation sufficient?
  • Almost!!
  • But not quite! :-(
  • What's the problem?
  • high cost
  • long time to market
  • difficult to test
  • expensive and time-consuming to change
  • even trivial changes require silicon respin
  • 18-20 month development cycle
  • little reuse across products and versions
  • require in-house expertise (ASIC designers)

26
Network Processors: The Idea in a Nutshell
  • Devise new hardware building blocks, but make
    them programmable
  • Include support for protocol processing and I/O
  • General-purpose processor(s) for control tasks
  • Special-purpose processor(s) for packet
    processing and table lookup
  • Include functional units for tasks such as
    checksum computation, hashing, ...
  • Integrate as much as possible onto one chip
  • Call the result a network processor

27
Review of Conventional Computer Hardware
Architectures
Intel D850MD Motherboard - Intel Hub Architecture (850 Chipset)
[Figure: the same motherboard diagram as on slide 14]
28
Network Processors: Main Idea
  • Traditional system: slow; resource demanding;
    shared with other operations
  • Network processors: a computer within the
    computer; special, programmable hardware;
    offloads host resources
29
Designing a Network Processor
  • Depends on
  • operations network processor will perform
  • role of network processor in overall system
  • Goals
  • generality: sufficient for all protocols, all
    protocol processing tasks and all possible
    networks
  • high speed: scale to high bit rates and high
    packet rates
  • Key point: a network processor is not designed
    to process a specific protocol or part of a
    protocol. Instead, designers seek a minimal set
    of instructions that is sufficient to handle an
    arbitrary protocol processing task at high speed

30
Where to Place Network Processors
  • Thus, network processors are somewhere in the
    middle
[Figure: performance vs. cost, placing software on a conventional processor, network processors and ASIC designs]
  • Goal: increase performance and reduce costs
  • Increase performance
  • known issues
  • must partition packet processing into
    separate functions
  • to achieve the highest speed, must handle
    each function with separate hardware
  • unknown issues
  • which functions to choose
  • what hardware building blocks to use
  • how to interconnect building blocks
  • Decrease costs
  • Economics driving a gold rush
  • NPs will dramatically lower production
    costs for network systems
  • good NP designs are worth a lot of money

31
Explosion of Commercial Products
  • 1990 → 2000: network processors transformed from
    an interesting curiosity into a mainstream product
  • used to reduce both overall costs and time to
    market
  • 2002: over 30 vendors with a wide range of
    architectures
  • e.g.,
  • Multi-Chip Pipeline (Agere)
  • Augmented RISC Processor (Alchemy)
  • Embedded Processor Plus Coprocessors (Applied
    Micro Circuit Corporation)
  • Pipeline of Homogeneous Processors (Cisco)
  • Pipeline of Heterogeneous Processors (EZchip)
  • Configurable Instruction Set Processors
    (Cognigine)
  • Extensive And Diverse Processors (IBM)
  • Flexible RISC Plus Coprocessors (Motorola)
  • Internet Exchange Processor (Intel)

32
Agere PayloadPlus: A Short Overview
33
Agere PayloadPlus (APP)
  • Agere PayloadPlus (APP)
  • consists of both programmable hardware and
    software
  • consists of both data and control planes (i.e.,
    fast and slow planes)
  • APP defines HW architectures, SW mechanisms,
    interconnection mechanisms and interfaces, BUT
    does not specify how to implement them.
  • Several versions of APP exist, differing in the
    number and types of functional units, degree of
    parallelism and internal bandwidth (2nd
    generation: 5 models)

34
APP Conceptual Pipeline
  • State engine
  • initiate, configure and control classifier and
    traffic manager
  • receives control from classifier
  • update statistics (e.g., packet count)
  • check packets against profiles (and inform
    classifier)
  • Forwarder
  • get packet from classifier
  • perform traffic shaping and management
  • fragment packet (if necessary)
  • modify headers (if necessary)
  • Classifier
  • extract packets from ingress
  • classify packet
  • send statistics to state engine
  • reassemble blocks
  • pass packet to forwarder together with
    classification decision
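The division of work in the conceptual pipeline can be illustrated with a small Python model. It is not Agere's programming interface (the APP units are programmed with their own special-purpose language and scripts); the packet fields, the two traffic classes and the per-class packet limit used as a "profile" are all simplifying assumptions. The classifier labels each packet and reports it to the state engine, which updates its statistics and checks the profile; the forwarder then drops the packet or decrements the TTL and fragments it to the MTU.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Packet:
    proto: str    # simplified header: "tcp" or "udp" (assumption)
    ttl: int
    length: int   # bytes

class StateEngine:
    """Keeps per-class packet counts and checks them against a simple profile."""

    def __init__(self, max_packets_per_class: int):
        self.counts = Counter()
        self.limit = max_packets_per_class

    def update(self, cls: str) -> bool:
        self.counts[cls] += 1                  # update statistics
        return self.counts[cls] <= self.limit  # True if the class is within its profile

def classify(pkt: Packet) -> str:
    """Classifier: a toy rule that puts UDP into a 'realtime' class."""
    return "realtime" if pkt.proto == "udp" else "bulk"

def forward(pkt: Packet, cls: str, in_profile: bool, mtu: int = 1500) -> list:
    """Forwarder: drop, or modify the header and fragment to the MTU."""
    if not in_profile or pkt.ttl <= 1:
        return []                              # discard
    ttl = pkt.ttl - 1                          # header modification
    # Fragment if necessary (per-fragment headers are ignored in this sketch).
    return [(cls, Packet(pkt.proto, ttl, min(mtu, pkt.length - off)))
            for off in range(0, pkt.length, mtu)]

def run_pipeline(packets, state: StateEngine) -> list:
    out = []
    for pkt in packets:
        cls = classify(pkt)                    # classifier
        ok = state.update(cls)                 # statistics to the state engine
        out.extend(forward(pkt, cls, ok))      # classification decision to forwarder
    return out

# Usage: the first UDP packet is fragmented, the TCP packet with TTL 1 is
# dropped, and the third UDP packet exceeds the 2-packet profile.
state = StateEngine(max_packets_per_class=2)
packets = [Packet("udp", 64, 3000), Packet("tcp", 1, 100),
           Packet("udp", 64, 100), Packet("udp", 64, 100)]
for cls, p in run_pipeline(packets, state):
    print(cls, p)
```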

35
APP550 Chip
36
APP550 Chip
  • Memory interfaces
  • two types of physical memory
  • fast cycle RAM (FCRAM) for fast memory accesses
  • double data rate SRAM (DDR-SRAM) for high
    throughput
  • the different memory types are usually used like
    this

37
APP550 Chip
  • Media interfaces
  • several to form fast data paths
  • two external connections
  • cell-oriented (ATM)
  • packet-oriented (Ethernet)

38
APP550 Chip
  • Scheduling interfaces
  • an external scheduling interface
  • external logic can use information about queues
  • PCI bus interface
  • allows communication with host CPU
  • mainly to control the whole operation

39
APP550 Chip
  • Coprocessor interfaces
  • the APP550 should be able to process a packet
  • BUT, to accommodate special cases, e.g., adding
    extra headers, a coprocessor interface is
    provided

40
APP550 Chip
41
APP550 Chip
  • Stream Editor (SED)
  • two parallel engines
  • modify outgoing packets (e.g., checksum, TTL, ...)
  • configurable, but not programmable
  • Packet (protocol data unit) assembler
  • collect all blocks of a frame
  • not programmable
  • Pattern Processing Engine
  • patterns specified by programmer
  • programmable using a special high-level language
  • only pattern matching instructions
  • parallelism by hardware using multiple copies
    and several sets of variables
  • access to different memories
  • Reorder Buffer Manager
  • transfers data between classifier and traffic
    manager
  • ensure packet order, which may be disturbed by
    parallelism and variable processing time in the
    pattern processing engine
  • Traffic Manager
  • schedule packets and shape traffic flow
  • programmable via scripts
  • sends packets to output interface
  • according to implemented policy
  • discard packets
  • choose queue
  • State Engine
  • gather information (statistics) for scheduling
  • verify flow within bounds
  • provide an interface to the host
  • configure and control other functional units
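The reorder buffer manager's job can be sketched as follows: packets get a sequence number on arrival, may finish pattern processing out of order, and are released only after all earlier packets have been released. This Python version uses a min-heap and assumes unique, consecutive sequence numbers starting at 0; it illustrates the idea, not the APP550's hardware design.

```python
import heapq

class ReorderBuffer:
    """Releases packets in sequence-number order, whatever order they complete in."""

    def __init__(self) -> None:
        self.next_seq = 0     # next sequence number allowed to leave
        self.pending = []     # min-heap of (seq, packet) waiting for earlier packets

    def complete(self, seq: int, packet) -> list:
        """Record that `packet` finished processing; return packets now releasable."""
        heapq.heappush(self.pending, (seq, packet))
        released = []
        while self.pending and self.pending[0][0] == self.next_seq:
            released.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return released

# Usage: packet 0 is slow, so packets 1 and 2 wait until it completes.
rob = ReorderBuffer()
print(rob.complete(1, "pkt1"))  # prints []
print(rob.complete(2, "pkt2"))  # prints []
print(rob.complete(0, "pkt0"))  # prints ['pkt0', 'pkt1', 'pkt2']
```

The heap keeps the lowest outstanding sequence number at the front, so each completion costs O(log n) in the number of waiting packets.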

42
APP550 Full Duplex
  • Clock rate for APP550 is 233 MHz
  • One chip cannot handle packets at wire speed in
    both directions; often two are used in parallel
    (one for each direction)
  • are all features needed in both directions?
  • classification only in one direction → checks
    outgoing packets and enqueues them using a
    special queue

43
Intel IXP1200 / 2400: A Short Overview
44
IXA: Internet Exchange Architecture
  • IXA is a broad term describing Intel's network
    architecture (HW and SW, control and data plane)
  • IXP: Internet Exchange Processor
  • processor that implements IXA
  • IXP1200 is the first IXP chip (4 versions)
  • IXP2xxx has now replaced the first version
  • IXP1200 basic features
  • 1 embedded 232 MHz StrongARM
  • 6 packet 232 MHz µengines
  • onboard memory
  • 4 x 100 Mbps Ethernet ports
  • multiple, independent busses
  • low-speed serial interface
  • interfaces for external memory and I/O busses
  • IXP2400 basic features
  • 1 embedded 600 MHz XScale
  • 8 packet 600 MHz µengines
  • 3 x 1 Gbps Ethernet ports

45
IXP1200 Architecture
  • PCI bus: allows the IXP to connect to I/O
    devices; enables use of the host CPU; rate 2.2 Gbps
  • SRAM bus: shared bus (several external units);
    usually carries control information rather than
    data; rate 3.71 Gbps
  • Serial line: connects to the RISC processor;
    intended for control and management; rate 38 Kbps
  • SDRAM bus: provides access to external SDRAM
    memory, used to store packets; can also pass
    addresses, control/store operations, etc.;
    rate 7.42 Gbps
  • IX (Intel eXchange) bus: enables higher rates
    than PCI; forms the fast path (IXP and
    high-speed interfaces); interface to other IXP
    cards; rate 4.4 Gbps

46
IXP1200 Architecture
  • RISC processor: StrongARM running Linux; control,
    higher-layer protocols and exceptions; 232 MHz
  • Access units: coordinate access to external
    units
  • Scratchpad: on-chip memory; used for IPC and
    synchronization
  • Microengines: low-level devices with a limited
    set of instructions; transfers between memory
    devices; packet processing; 232 MHz
47
IXP1200 Processor Hierarchy
  • General-purpose processor: used for control and
    management; runs general applications
  • RISC processor: chip configuration interface
    (serial line); control, higher-layer protocols
    and exceptions
  • I/O processors (microengines): transfers
    between memory devices; packet processing
  • Coprocessors
  • real-time clock and timers
  • IX bus controller
  • hashing unit
  • ...
  • Physical interface processors: implement layer
    1 and 2 processing
48
IXP1200 Memory Hierarchy
49
IXP1200 Memory Hierarchy
  • Different memory types
  • are organized into different addressable data
    units (words or longwords)
  • have different access times
  • connected to different busses
  • Therefore, to achieve optimal performance,
    programmers must understand the organization and
    allocate items from the appropriate type

50
IXP Performance Improvement: Forwarding
  • Linux 2.4 vs. IXP1200
  • Intel P4 host machine
  • The forwarding latency improvement itself may
    only be relevant to very time-sensitive
    interactive applications
  • Offloading is at least as important

51
IXP1200 → IXP2400
[Figure: IXP1200 block diagram with embedded RISC CPU (StrongARM), scratch memory, SRAM access and SRAM bus, SRAM, FLASH, memory-mapped I/O, SDRAM access, DRAM bus and DRAM, PCI access and PCI bus, IX access and IX bus]
52
IXP2400 Architecture
  • Coprocessors
  • hash unit
  • 4 timers
  • general purpose I/O pins
  • external JTAG connections
  • several bulk ciphers (IXP2850 only)
  • checksum (IXP2850 only)

[Figure: IXP2400 block diagram with embedded RISC CPU (XScale), coprocessor, scratch memory, SRAM access and SRAM bus, SRAM, slowport access, FLASH, PCI access and PCI bus, SDRAM access, DRAM bus and DRAM, microengines 1-8, MSF access with receive and transmit buses]
  • RISC processor: StrongARM → XScale, 233 MHz →
    600 MHz
  • Microengines: 6 → 8, 233 MHz → 600 MHz
  • Media Switch Fabric
  • forms fast path for transfers
  • interconnect for several IXP2xxx
  • Slowport
  • shared interface to external units
  • used for FlashROM during bootstrap
  • Receive/transmit buses
  • shared bus → separate buses

53
IXP2400 Architecture
  • Memory
  • generally more of everything
  • generally larger gap between CPUs and memory
    access in terms of cycles
  • local memory on each microengine
  • saving temporary results
  • private per packet processor
  • small (2560 bytes)
  • low latency (one cycle)
  • accessed through special registers

54
IXP2400 Packet Processing
[Figure: packet processing on the IXP2400 block diagram (XScale, coprocessor, scratch memory, SRAM, DRAM, FLASH; PCI, slowport, SDRAM and MSF access; receive and transmit buses)]
55
IXP2400 Use
  • Easier to use and understand
  • Pure Linux environment (except when using the
    Workbench)
  • More stable
  • Faster to reset

56
Summary
  • The network challenges are many
  • Challenge: find ways to improve the design and
    manufacture of complex networking systems
  • Hope (2002 version) lies in the programmers and
    network processors
  • We will use the Intel IXP2400 as an example; it
    offers
  • an embedded processor plus parallel packet
    processors
  • connections to external memories and buses
  • Next time: how to start programming these monsters