Packet Switching on Raw - PowerPoint PPT Presentation

1
Packet Switching on Raw
  • Research Qualifying Exam
  • Gleb A Chuvpilo
  • January 28, 2005

2
Project Publications
  • High-Bandwidth Packet Switching on the Raw
    General-Purpose Architecture, Gleb A. Chuvpilo
    and Saman Amarasinghe. In Proceedings of the
    International Conference on Parallel Processing
    (ICPP-03), Kaohsiung, Taiwan, Republic of China,
    October 6-9, 2003.
  • High-Bandwidth Packet Switching on the Raw
    General-Purpose Architecture, Gleb A.
    Chuvpilo, S.M. Thesis, Massachusetts Institute of
    Technology, Cambridge, Massachusetts, August
    2002.
  • RawNet: Network Processing on the Raw
    Processor, David Wentzlaff, Gleb A. Chuvpilo,
    Arvind Saraf, Saman Amarasinghe, and Anant
    Agarwal. In Research Abstracts of the MIT
    Laboratory for Computer Science, Cambridge,
    Massachusetts, March 2002.
  • Gigabit IP Routing on Raw, Gleb A. Chuvpilo,
    David Wentzlaff, and Saman Amarasinghe. In
    Proceedings of the 1st HPCA Workshop on Network
    Processors, Cambridge, Massachusetts, February 3,
    2002.
  • Also, unpublished work on Network Calculus at the
    Computer Engineering and Networks Laboratory of
    the ETH Swiss Federal Institute of Technology

3
Outline
  • Introduction
  • Raw Processor Overview
  • Internet Router Overview
  • Packet Switching on Raw
  • Raw Router Architecture
  • Rotating Crossbar Design for Switch Fabric
  • Distributed Scheduling Algorithm
  • Minimization and Scheduling
  • Results
  • Conclusion

4
Introduction
5
Goal
  • Build an IP router on a general-purpose processor
  • Why?
  • Flexibility → new protocols and services
  • Price → economies of scale

6
Raw
7
Raw Processor
  • A scalable computation fabric
  • 4 x 4 mesh of tiles, each tile is a RISC
    microprocessor
  • Ultra fast interconnect network
  • Exposes the wires to the compiler
  • Compiler orchestrates the communication

8
Raw Facts
  • Performance
  • 16 OPS/FLOPS per cycle
  • 230 Gb/s of on-chip bisection bandwidth
  • 201 Gb/s off-chip I/O bandwidth
  • 57 GB/s of on-chip memory bandwidth

9
Raw Facts
  • Layout
  • Longest wire is the length of a tile → fast
    clocking
  • Each tile
  • MIPS R4000 router interconnect
  • 32 KB IMEM
  • 32 KB data cache
  • 64 KB SMEM → 2 MB total per chip
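
The 2 MB figure can be checked against the per-tile numbers. The sketch below assumes the total counts all three memories listed on this slide and uses the 4 x 4 tile count from the Raw Processor slide; the variable names are illustrative only.

```python
# Per-tile memories as listed on the slide, in KB.
TILE_MEMORY_KB = {"IMEM": 32, "data cache": 32, "SMEM": 64}
TILES = 4 * 4  # 4 x 4 mesh of tiles

# Assumption: the chip total is the sum of all three memories over all tiles.
per_tile_kb = sum(TILE_MEMORY_KB.values())
total_mb = per_tile_kb * TILES / 1024

print(f"{per_tile_kb} KB per tile, {total_mb:g} MB per chip")
```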

10
Raw Facts
  • Instruction Set Architecture
  • Eight-stage pipeline: FETCH, DECODE, RF/STALL,
    EXE, MUL, MEM, FPU
  • MIPS instruction set
  • 28 general-purpose registers
  • 4 register-mapped network ports
  • 2-way set-associative cache, 3-cycle latency,
    32-byte lines

11
Raw Facts
  • Implementation
  • ASIC @ 250 MHz worst case
  • 122 million transistors (P4: 43 million)
  • 18.2 mm x 18.2 mm die (P4: 15 mm x 15 mm)
  • 1080 signal I/O pins
  • 25 Watts
  • IBM SA-27E 6-layer metal copper 0.15µ process
    (P4: 0.13µ)

12
Raw Layout
13
Communication Mechanisms
  • 2 static networks
  • 2 dynamic networks

14
Static Networks
  • Destinations known at compile time
  • Message size known at compile time
  • Cycle-by-cycle switch schedule
  • Three-cycle nearest neighbor send-to-use latency
  • No processing overhead

15
Static Network Send
16
Static Network Receive
17
Dynamic Networks
  • Unpredictable events
  • External asynchronous interrupts
  • Cache misses
  • 15- to 30-cycle nearest neighbor send-to-use
    latency (message header processing overhead)
  • Wormhole-routed, two-stage pipelined,
    dimension-ordered
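
Dimension-ordered routing can be illustrated with a short sketch. It assumes a 2-D mesh addressed by (x, y) coordinates and that the X dimension is resolved before Y; the slide does not say which dimension the Raw dynamic networks route first, and the function name is hypothetical.

```python
def dimension_ordered_route(src, dst):
    """Return the (x, y) tiles visited from src to dst, X first, then Y."""
    x, y = src
    path = [(x, y)]
    # Resolve the X dimension completely before touching Y (assumed order).
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then resolve the Y dimension.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


# Route across a 4 x 4 mesh from tile (0, 0) to tile (2, 1).
print(dimension_ordered_route((0, 0), (2, 1)))
```

Because every message fixes one dimension before the other, the number of hops always equals the Manhattan distance between the two tiles.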

18
Routing
19
What is Routing? RM OSI
20
IP Router
21
Switch Fabric
22
Click Modular Router
  • Modular software router
  • MIT Parallel and Distributed OS Group
  • 435,000 64-byte packets a second on a 700 MHz
    Pentium III (commodity hardware)
  • Flexible, configurable, and easy to understand
  • Interconnected collection of modules called
    elements
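
To compare the Click packet rate with bandwidth figures elsewhere in the talk, it helps to convert it to bits per second. The sketch below counts only the 64 packet bytes and ignores any framing overhead, which is an assumption; the helper name is illustrative.

```python
def throughput_gbps(packets_per_second, packet_bytes):
    """Convert a packet rate to Gb/s, counting packet bytes only."""
    return packets_per_second * packet_bytes * 8 / 1e9


# Click figure from the slide: 435,000 64-byte packets per second.
click_gbps = throughput_gbps(435_000, 64)
print(f"{click_gbps:.3f} Gb/s")
```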

23
Click Modular Router
24
Packet Switching on Raw
25
Problem: Four Networks
26
... and Sixteen Tiles
27
What is the Mapping?
[Figure: how to map dynamic communication onto the static interconnect?]
28
Solution: Rotating Crossbar
[Figure: rotating crossbar with input ports In 0 to In 3 and output ports Out 0 to Out 3]
29
Switch Fabric Design
  • The idea of a Token Ring network → absolute
    fairness
  • Algorithm uses two static networks, dynamic
    networks are idle
  • All deadlock-free configurations are scheduled
    at compile time
  • Four headers and token location define a global
    configuration
  • Global configuration is computed in a distributed
    manner at run time
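
The fairness idea can be sketched as a rotating-priority arbiter. This is a simplified model and not the scheduler from the talk: it assumes four ports, gives first choice to the token holder, and only checks for output-port conflicts, ignoring any conflicts on the ring links between crossbar tiles. All names are hypothetical.

```python
NUM_PORTS = 4


def arbitrate(headers, token):
    """Grant outputs to inputs, starting at the token holder.

    headers[i] is the output port requested by input i, or None if idle.
    Returns a dict mapping each granted input port to its output port.
    """
    granted = {}
    taken = set()
    for offset in range(NUM_PORTS):
        port = (token + offset) % NUM_PORTS
        out = headers[port]
        if out is not None and out not in taken:
            granted[port] = out
            taken.add(out)
    return granted


def next_token(token):
    """Pass the token to the next port so that priority rotates."""
    return (token + 1) % NUM_PORTS


# Inputs 0 and 1 both want output 2; whoever is closer to the token wins.
headers = [2, 2, None, 0]
token = 0
print(arbitrate(headers, token))
token = next_token(token)
print(arbitrate(headers, token))
```

Since the token visits every port in turn, no input can be starved by a competing input for more than one rotation in this model.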

30
Rotating Crossbar Illustrated
31
Rotating Crossbar Illustrated
32
Phases of the Algorithm
[Figure: phases of the algorithm on the tile processor and the switch processor; labels: headers_request, headers, send_prev_config, choose_new_config, route_body, confirm, update_token]
33
Distributed Scheduling Algorithm
  • Let's enumerate the configurations
  • |SPACE| = |Hdr0| x ... x |Hdr3| x |Token|,
  • where |Hdr0| = ... = |Hdr3| = 5,
  • and |Token| = 4
  • therefore
  • |SPACE| = 5^4 x 4 = 2,500 distinct configurations
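
The count can be confirmed by brute-force enumeration. The sketch below treats each header as one of five abstract values and the token as one of four positions; what the five header values stand for is not stated on the slide, so they are left as plain integers here.

```python
from itertools import product

HEADER_VALUES = range(5)    # five possible values per header
TOKEN_POSITIONS = range(4)  # the token can sit at any of four ports

# A global configuration is four headers plus the token location.
configurations = [
    (headers, token)
    for headers in product(HEADER_VALUES, repeat=4)
    for token in TOKEN_POSITIONS
]

print(len(configurations))
```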

34
So What?...
  • Each tile has 8,192 words of instruction memory,
    same for switch
  • → 8,192 / 2,500 ≈ 3.3 instructions per
    configuration → not enough! → need to use
    off-chip memory → slow!
  • → need to minimize SPACE
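
The instruction budget follows from simple arithmetic. The sketch assumes 32-bit (4-byte) tile instructions, so that the 32 KB IMEM from the Raw Facts slide holds 8,192 words, and assumes one instruction per word.

```python
IMEM_BYTES = 32 * 1024  # 32 KB IMEM per tile
WORD_BYTES = 4          # assumption: 32-bit instruction words
CONFIGURATIONS = 2_500  # size of the configuration space

imem_words = IMEM_BYTES // WORD_BYTES
words_per_configuration = imem_words / CONFIGURATIONS

print(imem_words, f"{words_per_configuration:.2f}")
```

A budget of about three instructions is too small for even a single jump-table entry plus its routing code, which is why the configuration space has to shrink.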

35
Minimization
[Figure: ports of a crossbar tile: in, out, cwprev, cwnext, ccwprev, ccwnext]
36
Clients and Servers of a Crossbar Processor
37
Minimization and Scheduling
  • We cut down the number of configurations by a
    factor of 78! Now there are only 32 entries!
  • → the program can fit in the local instruction
    memory!
  • Code generated by an automatic compile-time
    scheduler
  • In addition, software pipelining and loop
    unrolling are applied to the assembly code of
    the crossbar's switch processors to avoid
    deadlock
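
The effect of the minimization on the memory budget can be worked out directly. The sketch uses the 2,500 and 32 figures from the slides and the 8,192-word instruction memory; splitting the memory evenly across entries is an assumption made only for illustration.

```python
ORIGINAL_CONFIGURATIONS = 2_500
MINIMIZED_ENTRIES = 32
IMEM_WORDS = 8_192

reduction_factor = ORIGINAL_CONFIGURATIONS / MINIMIZED_ENTRIES
# Assumption: instruction memory is divided evenly among the entries.
words_per_entry = IMEM_WORDS // MINIMIZED_ENTRIES

print(f"{reduction_factor:.1f}x fewer entries, {words_per_entry} words each")
```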

38
Scheduler Output
  /* AUTOGENERATED SCHEDULE FOR PORT 0 */
  /* Tile Processor */

  conf_1_0303:
      mtsri SW_PC, lo(sw_conf_1000)
      j conf_done
  conf_1_0304:
      mtsri SW_PC, lo(sw_conf_1000)
      j conf_done
  conf_1_0310:
      mtsri SW_PC, lo(sw_conf_2001)
      j conf_done
  conf_1_0311:
      mtsri SW_PC, lo(sw_conf_1210)
      j conf_done

  /* HAND-CODED SCHEDULE FOR PORT 0 */
  /* Switch Processor */

  /* in->out, prev->next, dist=1 */
  sw_conf_1210:
      nop route IN->OUT
      nop route IN->OUT, PREV->NEXT
      nop route IN->OUT, PREV->NEXT
      nop route IN->OUT, PREV->NEXT
      nop route IN->OUT, PREV->NEXT
      nop route IN->OUT, PREV->NEXT
      nop route IN->OUT, PREV->NEXT
      nop route IN->OUT, PREV->NEXT
39
Results
40
Implementation
  • Raw Router was tested in a cycle-accurate
    simulator of the Raw processor
  • Raw prototype clock speed is assumed to be 250
    MHz
  • The focus of research is on switch fabric, NOT on
    route lookup, etc.
  • Over 75,000 lines of assembly code, many of them
    hand-coded

41
Raw Router Results
  • Features
  • 4-port edge router
  • 3.3 Mpps
  • 26.9 Gbps
  • Uses Raw static networks to stream data
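
The two headline numbers can be related to each other. The sketch below assumes both figures come from the same measurement and that the bandwidth is spread evenly over the four ports; neither assumption is stated on the slide, so the derived values are only indicative.

```python
PACKETS_PER_SECOND = 3.3e6  # 3.3 Mpps
BITS_PER_SECOND = 26.9e9    # 26.9 Gbps
PORTS = 4                   # 4-port edge router

# Assumption: both figures describe the same experiment.
implied_packet_bytes = BITS_PER_SECOND / PACKETS_PER_SECOND / 8
# Assumption: traffic is evenly distributed across ports.
per_port_gbps = BITS_PER_SECOND / PORTS / 1e9

print(f"about {implied_packet_bytes:.0f} bytes per packet")
print(f"about {per_port_gbps:.2f} Gb/s per port")
```

Under these assumptions the peak bandwidth corresponds to packets of roughly one kilobyte.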

42
Conclusion
43
Conclusion
  • Implemented a gigabit switch on Raw
  • Mapped dynamic communication to static
    interconnect
  • Can intermix switch fabric with computation
  • High-bandwidth I/O allows performance of custom
    ASIC processors

44
Future Work and Critique
  • Take advantage of dynamic networks
  • Implement IP route lookup
  • Add computation on data (encryption)
  • Add support of multicast traffic
  • Implement Quality of Service
  • Add virtual output queueing
  • Explore larger router configurations

45
End of the official part!
46
Current Research
  • Probabilistic Robotics with Prof. John Leonard
  • Robust Feature-Relative Navigation for Autonomous
    Underwater Vehicles

47
Robotic Kayaks
48
Questions?