Title: Reconfigurable Computing: HPC Network Aspects
1. Reconfigurable Computing: HPC Network Aspects
Craig Ulmer (8963) cdulmer_at_sandia.gov
Pete Dean R&D Seminar, December 11, 2003
- Mitch Sukalski (8961)
- David Thompson (8963)
2. FPGAs are promising
- But what's the catch?
- There are three main challenges that need to be
addressed before FPGAs can be applied to practical
scientific computing.
3. RC Challenge 1: Floating Point
- Most FPGAs are fine-grained
- Floating-point units are large
- A 32b FP unit occupies 1,000 CLBs
- Commercial capacity is improving
- 2000: 6,000 CLBs
- 2003: 40,000 CLBs (max 220,000)
- Keith Underwood at Sandia/NM
- LDRD: working on high-speed 64b floating-point
cores
[Figure: 32b FP in Xilinx V2P7]
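The capacity figures above translate directly into how many 32b FP units a device could hold. A minimal sketch, assuming roughly 1,000 CLBs per unit and ignoring routing, control logic, and I/O overhead (so the result is an idealized upper bound); `max_fp32_units` is a hypothetical helper name:

```python
# Idealized upper bound on 32-bit FP units per device.
# Assumption: ~1,000 CLBs per FP32 unit; routing, control, and I/O are ignored.
CLBS_PER_FP32_UNIT = 1_000


def max_fp32_units(device_clbs: int, clbs_per_unit: int = CLBS_PER_FP32_UNIT) -> int:
    """Return how many whole FP32 units fit in device_clbs CLBs."""
    return device_clbs // clbs_per_unit


# CLB counts quoted on the slide.
for label, clbs in [("2000", 6_000), ("2003", 40_000), ("2003 max", 220_000)]:
    print(f"{label}: up to {max_fp32_units(clbs)} FP32 units")
```

Under these assumptions the count grows from 6 units (2000) to 40 units (2003), or 220 on the largest part, which is why capacity growth matters for floating-point work.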
4. RC Challenge 2: Design Tools
- Hardware design is non-trivial
- Micromanage computations, clock-by-clock
- Not appropriate for most scientists
- Need easy-to-use languages and APIs
- Maya Gokhale at LANL
- Streams-C: C-like language for HW design
- Pipelines/unrolls loops
- Schedules access to external memory
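Why pipelining loops pays off can be seen with a simple cycle-count model. This is a generic, idealized sketch (one new iteration enters every `ii` cycles, no memory stalls), not a description of how Streams-C actually schedules hardware; the function names are hypothetical:

```python
def unpipelined_cycles(n_iters: int, depth: int) -> int:
    """Each iteration passes through all `depth` stages before the next starts."""
    return n_iters * depth


def pipelined_cycles(n_iters: int, depth: int, ii: int = 1) -> int:
    """Idealized pipeline: a new iteration enters every `ii` cycles, and the
    last iteration finishes `depth` cycles after it enters."""
    if n_iters == 0:
        return 0
    return depth + (n_iters - 1) * ii


# Example: a 10-stage loop body executed 1,000 times.
print(unpipelined_cycles(1000, 10))  # 10000
print(pipelined_cycles(1000, 10))    # 1009
```

With an initiation interval of 1, the cost approaches one cycle per iteration, roughly a `depth`-fold speedup; external-memory scheduling matters because a stall breaks that steady one-per-cycle flow.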
5. RC Challenge 3: High-speed I/O
- FPGAs have large internal computational power
- How do we get data into/out of FPGA?
- How do we connect to our existing HPC machines?
- Mitch Sukalski, David Thompson, Craig Ulmer
- LDRD: connect FPGAs to high-performance SANs
[Figure: two FPGAs, with the connection between them marked "?"]
6. Outline
- Where we have been
- Networking FPGAs using external NI cards
- Where we are going
- Networking FPGAs using internal transceivers
- Project status
- Early details
7. Previous Work
8. Networking Earlier FPGAs
- The previous generation of FPGAs were like blank ASICs
- Configurable logic and pins
- Attach a network card to an FPGA card
- Communication over PCI
- Examples
- Virginia Tech: Myrinet
- Washington U. in St. Louis: ATM (inline)
- Clemson University: Gigabit Ethernet
- Georgia Tech: Myrinet
9. GRIM Project at Georgia Tech
- Add multimedia devices to a cluster
- Message layer connects CPUs, memory, and peripherals
- Myrinet between hosts, PCI within hosts
- Celoxica RC-1000 FPGA
- Virtex FPGA (1M logic gates)
- Four SRAM banks
- PCI w/ PMC
10. FPGA Organization
[Figure: FPGA card memory, the Frame, and the Circuit Canvas inside the FPGA]
11. Lessons Learned
- Frame provides a simple OS
- Isolates users from board
- Portability
- Dynamically manage resources
- Card memory
- Computational circuits
- PCI bottleneck
- Distance between NI and FPGA
- PCI is difficult to work with
[Figure: host CPU, NIC, and FPGA card; card memory pages A, B, and C spread across SRAM 1 and SRAM 2]
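One way to picture the Frame "dynamically managing card memory" is as a page allocator over the card's SRAM banks. The sketch below is hypothetical: the `CardMemory` class, fixed-size pages, and first-fit policy are illustrative assumptions, not the GRIM Frame's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class CardMemory:
    """Hypothetical model: a frame hands out fixed-size SRAM pages to circuits."""
    banks: int
    pages_per_bank: int
    owner: dict = field(default_factory=dict)  # (bank, page) -> circuit name

    def allocate(self, circuit: str, n_pages: int) -> list:
        """First-fit: grant the lowest-numbered free pages, or raise if short."""
        free = [(b, p) for b in range(self.banks) for p in range(self.pages_per_bank)
                if (b, p) not in self.owner]
        if len(free) < n_pages:
            raise MemoryError(f"{circuit}: need {n_pages} pages, {len(free)} free")
        grant = free[:n_pages]
        for slot in grant:
            self.owner[slot] = circuit
        return grant

    def release(self, circuit: str) -> None:
        """Return every page owned by `circuit` to the free pool."""
        for slot in [s for s, c in self.owner.items() if c == circuit]:
            del self.owner[slot]


mem = CardMemory(banks=2, pages_per_bank=2)
print(mem.allocate("A", 1))  # [(0, 0)]
print(mem.allocate("B", 2))  # [(0, 1), (1, 0)]
mem.release("A")
print(mem.allocate("C", 2))  # [(0, 0), (1, 1)]
```

Because user circuits request pages through the frame rather than addressing SRAM directly, the same circuit can run on a board with a different memory layout, which is the portability point above.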
12. Network Features of Recent FPGAs
13. FPGA Network Improvements
- Recent FPGAs have special, built-in cores
- High-speed transceivers, dedicated processors
- Idea: build our NI inside the FPGA
- FPGA becomes a networked, compute resource
- Removes the PCI bottleneck
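The PCI bottleneck can be put in numbers with the bus's theoretical peak (bus width × clock). A rough sketch, assuming rounded nominal clocks of 33 and 66 MHz and comparing against one 3.125 Gbps serial lane carrying 8b/10b-coded data; real PCI throughput is lower than this peak:

```python
def pci_peak_gbps(width_bits: int, clock_mhz: float) -> float:
    """Theoretical PCI peak: one width_bits transfer per clock (Mbit/s -> Gbit/s)."""
    return width_bits * clock_mhz / 1_000


# Assumption: one serial lane at 3.125 Gbps line rate with 8b/10b coding.
MGT_PAYLOAD_GBPS = 3.125 * 8 / 10

for width, clock in [(32, 33), (64, 66)]:
    peak = pci_peak_gbps(width, clock)
    print(f"PCI {width}b/{clock} MHz: {peak:.3f} Gbps peak (shared bus, one direction at a time); "
          f"one serial lane: {MGT_PAYLOAD_GBPS:.1f} Gbps in each direction")
```

A 32b/33 MHz bus peaks near 1 Gbps and must be shared by every device and both directions, while each on-chip transceiver is a dedicated full-duplex link.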
14. Xilinx Virtex-II Pro FPGA
- Up to 4 PowerPC 405 cores
- Embedded version of PPC
- 300-400 MHz
- Multiple gigabit transceivers
- Run at 600 Mbps to 3.125 Gbps
- Up to twenty-four transceivers
- Additional cores
- Distributed internal memory
- Arrays of 18b multipliers
- Digital clock multipliers, PLLs
[Figure: Xilinx V2P20]
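The transceiver counts and rates above give an aggregate I/O estimate. A sketch assuming every lane uses 8b/10b coding (as InfiniBand, Gigabit Ethernet, and Fibre Channel do at these rates), so 80% of the line rate carries payload; protocol framing overhead is ignored:

```python
def mgt_payload_gbps(line_rate_gbps: float) -> float:
    """Payload rate of one lane with 8b/10b coding (8 data bits per 10 line bits)."""
    return line_rate_gbps * 8 / 10


def aggregate_payload_gbps(n_lanes: int, line_rate_gbps: float) -> float:
    """Per-direction payload summed over n_lanes independent full-duplex lanes."""
    return n_lanes * mgt_payload_gbps(line_rate_gbps)


print(mgt_payload_gbps(0.6))               # slowest rate: about 0.48 Gbps per lane
print(aggregate_payload_gbps(24, 3.125))   # 60.0 Gbps per direction
```

Under these assumptions, a 24-transceiver part at full rate offers on the order of 60 Gbps of payload in each direction.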
15. Multi-Gigabit Transceivers: Rocket I/O
- Flexible, high-speed transceivers
- Can be configured to connect with different
physical layers
- InfiniBand, GigE, FC, 10GigE, Aurora
- Note the low-level interface (commas, disparity,
clock mismatches)
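To illustrate what "commas" and "disparity" mean at this low level, the sketch below finds symbol alignment by searching a bit stream for the 8b/10b comma patterns `0011111` / `1100000` (the start of K28.5), then checks running disparity across the aligned 10-bit symbols. It is a simplification: real 8b/10b tracks disparity per 6b and 4b sub-block and has extra rules, and the helper names are hypothetical.

```python
COMMA_PATTERNS = ("0011111", "1100000")  # comma prefixes of K28.5 (RD-, RD+)


def find_comma(bits: str) -> int:
    """Index of the first comma pattern in a '0'/'1' string, or -1 if none."""
    hits = [i for i in (bits.find(p) for p in COMMA_PATTERNS) if i >= 0]
    return min(hits) if hits else -1


def split_symbols(bits: str, offset: int) -> list:
    """Cut the stream into whole 10-bit symbols starting at the comma offset."""
    return [bits[i:i + 10] for i in range(offset, len(bits) - 9, 10)]


def check_running_disparity(symbols, rd: int = -1) -> bool:
    """Simplified check: a symbol may be balanced (0), +2 only when RD is
    negative, or -2 only when RD is positive; nonzero symbols flip RD."""
    for sym in symbols:
        d = sym.count("1") - sym.count("0")
        if d == 0:
            continue
        if d == 2 and rd == -1:
            rd = 1
        elif d == -2 and rd == 1:
            rd = -1
        else:
            return False
    return True


stream = "101" + "0011111010" + "1100000101"  # 3 junk bits, then K28.5 RD-, K28.5 RD+
offset = find_comma(stream)
print(offset)                                                  # 3
print(check_running_disparity(split_symbols(stream, offset)))  # True
```

The transceiver hardware can perform this alignment and checking, but the FPGA-side logic still has to configure and react to it, which is why the interface is described as low-level.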
16. Why MGTs are Important
- Direct connection to networks
- Same chip, different network
- Remove PCI from the equation
- Fast connections between FPGAs
- Reduces analog design issues
- Chain FPGAs together
- Reduce pin count
- Update: Virtex-II Pro X
- Now 2.488 Gbps to 10.3125 Gbps
- Chips have either 8 or 20 transceivers
[Figure: 3.125 Gbps over 44" FR4]
(Source: Xilinx, http://www.xilinx.com/products/virtex2pro/mgtcharacter.htm)
17. Hard PowerPC Core
- PowerPC 405
- 16 KB instruction / 16 KB data caches
- Real and virtual memory modes
- GCC is available
- Multiple memory ports for the core
- On-chip memory (OCM)
- Processor Local Bus (PLB)
- User-defined memory map
- Connect memory blocks or cores
- External memory cores available
[Figure: PowerPC core with I-cache and D-cache, connected to the Processor Local Bus (PLB) and the On-Chip Memory (OCM) interface]
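A user-defined memory map boils down to assigning each memory block or core a non-overlapping address range and decoding every access to one of them. The sketch below shows that idea with hypothetical region names and addresses; they are illustrative only and do not come from any Xilinx reference design.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    base: int
    size: int


# Hypothetical memory map; names and addresses are illustrative only.
MEMORY_MAP = [
    Region("ocm_bram", 0x0000_0000, 0x4000),
    Region("plb_sram", 0x1000_0000, 0x10_0000),
    Region("plb_nic_regs", 0x8000_0000, 0x1000),
]


def check_no_overlap(regions) -> None:
    """Raise ValueError if any two regions share addresses."""
    ordered = sorted(regions, key=lambda r: r.base)
    for a, b in zip(ordered, ordered[1:]):
        if a.base + a.size > b.base:
            raise ValueError(f"{a.name} overlaps {b.name}")


def decode(addr: int, regions=MEMORY_MAP) -> Optional[tuple]:
    """Return (region name, offset within region), or None if unmapped."""
    for r in regions:
        if r.base <= addr < r.base + r.size:
            return r.name, addr - r.base
    return None


check_no_overlap(MEMORY_MAP)
print(decode(0x1000_0040))  # ('plb_sram', 64)
print(decode(0x2000_0000))  # None
```

In hardware, this decoding happens in the bus logic rather than in software, but the design rule is the same: every core gets its own window, and windows must not overlap.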
18. System on a Chip (SoC)
- Commercial SoC
- Designing with cores
- Customize system
- New tools
- Rapidly connect cores
- Library of cores and buses
- Saves on wiring legwork
[Figure: Xilinx Platform Studio]
19. Current Status
- Exploring V2P
- New architecture, new tools
- Two reference boards
- ML300 (V2P7-6)
- Avnet (V2P20-6)
- Transceiver work
- Raw transmission over fiber
- Working towards IB
http://cdulmer.ran.sandia.gov