Title: System Level Applications of Adaptive Computing
1. System Level Applications of Adaptive Computing
- Brian Schott, USC Information Sciences Institute
- DARPA TTO ACS PI, April 2000
2. SLAAC Technology
- Goal: ACS research insertion into deployed DoD systems.
- Distributed ACS architecture for research lab and embedded systems.
- Domain compiler tools and module generators.
- Application/algorithm mapping to reconfigurable logic.
3. SLAAC Team
- This talk covers a slice of SLAAC.
4. Application Challenges
- SLAAC applications have a variety of physical form factors and scalability requirements.
- However, most development will occur in university labs.
NVL IR/ATR
Sandia SAR/ATR
NUWC Sonar Beamforming
5. SLAAC Architectural Approach
- SLAAC defines a scalable, portable, distributed systems architecture based on a high-speed network of ACS-accelerated nodes.
- Use COTS.
- Use existing network and cluster computing technology.
- Use comparable embedded systems technology.
6. Reference Platforms
- SLAAC has defined two reference platforms.
- RRP is an NT/Linux cluster of ACS-accelerated workstations.
- DRP is an ACS-accelerated embedded PowerPC multicomputer.
- The goal is to provide scalable source-code compatibility between RRP and DRP.
7. How do you program?
- You can use existing parallel libraries to move data from host to host and control ACS boards individually.
- We have created an ACS system API that generalizes control functions and introduces a logical node number (rank).
- Intended to augment (not interfere with) existing parallel libraries (MPI).
- System creation and configuration functions.
- ACS_System_Create(), ACS_Configure()
- Memory access functions.
- ACS_Read(), ACS_Write()
- Streaming data functions.
- ACS_Enqueue(), ACS_Dequeue()
- Debugging functions.
- ACS_Readback()
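The function families listed on this slide can be pictured with a small in-memory mock. This is only a sketch: the class `ToyACSSystem`, its method names, and every signature are hypothetical stand-ins written in Python for brevity. They are not the real C-callable ACS API, whose argument lists are not given in these slides. The point it illustrates is that each call addresses a board by logical node number (rank).

```python
from collections import deque


class ToyACSSystem:
    """Hypothetical in-memory stand-in for an ACS system (not the real API)."""

    def __init__(self, num_nodes, mem_words=16):
        # One memory array, input FIFO, and output FIFO per logical node (rank).
        self.memory = [[0] * mem_words for _ in range(num_nodes)]
        self.fifo_in = [deque() for _ in range(num_nodes)]
        self.fifo_out = [deque() for _ in range(num_nodes)]
        self.bitfile = [None] * num_nodes

    def configure(self, rank, bitfile_name):
        # Plays the role of the configuration family: record the loaded design.
        self.bitfile[rank] = bitfile_name

    def write(self, rank, addr, words):
        # Plays the role of a memory write: copy words into a node's memory.
        self.memory[rank][addr:addr + len(words)] = words

    def read(self, rank, addr, count):
        # Plays the role of a memory read: copy words out of a node's memory.
        return self.memory[rank][addr:addr + count]

    def enqueue(self, rank, word):
        # Plays the role of a streaming enqueue: push into the node's input FIFO.
        self.fifo_in[rank].append(word)

    def step(self, rank):
        # Toy "design" (an assumption): the node doubles every word it receives.
        while self.fifo_in[rank]:
            self.fifo_out[rank].append(2 * self.fifo_in[rank].popleft())

    def dequeue(self, rank):
        # Plays the role of a streaming dequeue: pop from the node's output FIFO.
        return self.fifo_out[rank].popleft()


system = ToyACSSystem(num_nodes=2)
system.configure(1, "doubler.bit")      # hypothetical bitfile name
system.write(0, 4, [10, 20, 30])        # memory access on rank 0
for word in (1, 2, 3):
    system.enqueue(1, word)             # streaming access on rank 1
system.step(1)
results = [system.dequeue(1) for _ in range(3)]
```

Because the host code only ever names a rank, the same control flow would apply whether the node behind that rank is local or reached over the network.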
8. ACS System API
- C-callable API hides a set of C++ classes.
- Intended to make it easy to plug in different nodes and communicators.
Host C-callable API
ACS System Class
System creation, configuration, memory access, streaming data, and debugging methods.
Communicator Classes
MPI Communicator
Myrinet GM
Local Node
ACS Node Classes
CSRC/RCM
WildForce
SLAAC-1
SLAAC-1V
WildStar
SLAAC-2
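The layering in this diagram (a flat API over a system class, with pluggable communicator and node classes) can be sketched with abstract base classes. All class and method names below are hypothetical, and Python is used only to keep the sketch short. The slides do not show the actual class interfaces.

```python
from abc import ABC, abstractmethod


class Node(ABC):
    """Hypothetical base class for one board type."""

    @abstractmethod
    def write(self, addr, words): ...

    @abstractmethod
    def read(self, addr, count): ...


class DictMemoryNode(Node):
    """Toy board model backed by a dict; stands in for a real board driver."""

    def __init__(self):
        self._mem = {}

    def write(self, addr, words):
        for offset, word in enumerate(words):
            self._mem[addr + offset] = word

    def read(self, addr, count):
        # Unwritten locations read back as zero in this toy model.
        return [self._mem.get(addr + i, 0) for i in range(count)]


class Communicator(ABC):
    """Hypothetical base class for how the system reaches a node."""

    @abstractmethod
    def call(self, node, method, *args): ...


class LocalCommunicator(Communicator):
    """The node lives in the same process, so call it directly."""

    def call(self, node, method, *args):
        return getattr(node, method)(*args)


class System:
    """Maps ranks to (communicator, node) pairs; a flat API would wrap this."""

    def __init__(self):
        self._slots = []

    def add_node(self, communicator, node):
        self._slots.append((communicator, node))
        return len(self._slots) - 1     # rank assigned to the new node

    def write(self, rank, addr, words):
        comm, node = self._slots[rank]
        return comm.call(node, "write", addr, words)

    def read(self, rank, addr, count):
        comm, node = self._slots[rank]
        return comm.call(node, "read", addr, count)


system = System()
rank = system.add_node(LocalCommunicator(), DictMemoryNode())
system.write(rank, 0x100, [7, 8, 9])
```

Supporting a new board or a new transport in this arrangement means adding one subclass, while `System` and the code that calls it stay unchanged.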
9. Single Board API Benefits
- Provides portability for host control code. Write your application once and migrate to newer ACS boards.
- Open API w/ source code makes it easier to develop control software for new boards. Standardizes the minimum feature set for ACS hardware.
- Extensible API provides common target for tool development. Encourages wider audience for your tool efforts.
10. Multiple Board Programming Model
- Intended to reduce distributed ACS system programming complexity.
- We define a system of hosts, nodes, and channels.
- Hosts are user processes that run on GPPs.
- Nodes are ACS boards with control threads.
- Channels automatically stream data between FIFOs on hosts/nodes.
Hosts
Nodes
Network
11. Network Channels
- Use network channels in place of physical point-to-point connections.
- Boards operate on individual clocks, but are data-synchronous.
- Channels can apply back-pressure to stall producers.
Network-channel
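Back-pressure on a channel can be illustrated with a bounded FIFO between two threads. This is a software analogy only, assuming a FIFO depth of 4 and a deliberately slow consumer. It does not model the actual SLAAC channel implementation or its network transport.

```python
import queue
import threading
import time

channel = queue.Queue(maxsize=4)    # FIFO depth of 4 is an arbitrary choice
max_depth_seen = 0
received = []


def producer(n_items):
    global max_depth_seen
    for i in range(n_items):
        channel.put(i)              # blocks (stalls) while the FIFO is full
        max_depth_seen = max(max_depth_seen, channel.qsize())
    channel.put(None)               # sentinel marking end of stream


def consumer():
    while True:
        item = channel.get()
        if item is None:
            break
        time.sleep(0.001)           # consumer is deliberately slower
        received.append(item)


threads = [threading.Thread(target=producer, args=(32,)),
           threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The producer and consumer run at unrelated rates, yet no data is lost and the FIFO never grows past its depth, which is the sense in which independently clocked boards can remain data-synchronous.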
12. FPGA Design Paths
DEFACTO
High Level Compilers
HRT Express
Domain Specific Compilers
Streams C
JHDL
Low Level Design Tools
COTS VHDL
Domain Module Generators
Runtime System
SLAAC ACS API, Debugger, Runtime
ACS Hardware
WildForce
CSRC/RCM
SLAAC-1
SLAAC-1V
WildStar
SLAAC-2
13. SLAAC Hardware
14. SLAAC Hardware Approach
- We want ACS researchers to develop on PCI platforms and seamlessly migrate to VME when necessary.
- ACS API provides source-code compatibility on host. What is the moral equivalent for ACS boards?
- FPGA designs are sensitive to hardware architecture.
- A variety of tools are used to generate bitfiles.
- Bitfile compatibility is the least common denominator.
- SLAAC approach is to provide bitfile-compatible architectures in PCI and VME form factors.
15. SLAAC-1 PCI Architecture
- Full-sized PCI card.
- One Xilinx XC4085 and two XC40150s (750K user gates).
- Twelve 256Kx18 ZBT synchronous SRAMs.
- External I/O connectors.
- Xilinx 4062 PCI interface.
- Two 72-bit FIFO ports.
- External memory bus.
- 100 MHz programmable clock.
16. SLAAC-1 Front
Memory Module (3)
SLAAC-1 PCI
17. SLAAC-1 Back
QC64 I/O
18. SLAAC-1 Status
- We have 20 boards built.
- BYU, UCLA, LANL, VT, LMGES, ISI. DoD (external).
- We are in a maintenance cycle on SLAAC-1.
- Incrementally improving software support.
- Fixing bugs that application developers encounter.
- SLAAC team working on S-1 apps for 6-8 months.
- VHDL and JHDL development environments available.
- Applications developed
- NVL IR/ATR.
- HTC Interfacing Stressmark.
- ECMA modules.
- Sonar Beamformer.
19. CSPI M2621/S Carrier
- CSPI is a commercial VME multicomputer vendor used by Sandia and LMGES.
- CSPI modified their M2641 Quad PowerPC baseboard.
- Both PowerPC busses from baseboard brought to connectors.
- Two PowerPCs on mezzanine replaced with SLAAC technology.
CSPI 2641 Quad PowerPC Board
- 1.6 GFLOPS Quad PowerPC
- Four Myrinet SANs in 6U slot
- 1.28 GBytes/s I/O with 4 Myrinet SANs
- 64 to 256 MB on-board memory
- I/O: front panel, VME64, P0 backplane
- Active and passive multicomputing
- Software: MPI, ISSPL
20. CSPI Baseboard
PPC 603
Myricom LANai
Future PPC Connector
SAN connector
Myricom 8-port Switch
VME Myrinet P0 connector
21. CSPI M2621/S Carrier
- This carrier is commercially available from CSPI.
- Two 40 MHz 64-bit PowerPC busses.
- VxWorks OS.
- PowerPC bus VHDL core.
22. SLAAC-2 Architecture
- Two independent SLAAC-1 boards in a single 6U VME mezzanine.
- Four XC40150s, two XC4085s: 1.5M gates total.
- Twenty 256Kx18 SSRAMs.
- Sacrificed external memory bus.
(Block diagram: bus widths labeled 40 and 72.)
23. SLAAC-2 Front
24. SLAAC-2 Back
25. SLAAC-2 Status
- We have four XC40150-based boards built (three older XC4085 prototypes retired).
- Hardware operational and well tested.
- Migrated SLAAC-1 control software to VxWorks.
- Software and interface logic demonstration-ready, but not customer-ready.
- We are in clean-up mode on control logic and software.
- VxWorks is hard.
- Applications developed
- HTC interfacing Stressmark.
- ECMA modules.
- Expected customer deliveries this month.
26. 2nd Gen SLAAC Hardware Goals
- Create a SLAAC architecture migration path to faster/denser Virtex devices and larger memories.
- Increase the number of available external I/O ports to three.
- Tightly integrate user designs with 64/66 PCI interface.
- Open platform to meet the needs of two ACS efforts:
- LOKI (Xilinx, VT). Provide experimental platform for fast partial runtime reconfiguration on Virtex.
- Unified debug (BYU). Provide RTR and full board-state dump capability for checkpoint and debug.
27. SLAAC-1V Architecture
- Three Virtex 1000.
- 3M logic gates @ 200 MHz.
- Use Xilinx 64/66 core.
- Virtex 100 configuration controller, FLASH, SRAM.
- Ten 256Kx36 ZBT SRAMs.
- Bus switches allow single-cycle memory bank exchange between X0 and X1/X2.
- Support up to 1Mx36.
- Three I/O connectors.
- Crossbar ports give access to external I/O connectors.
(Block diagram labels: X0, X1, X2; buses marked 72 and 60; X (crossbar switches); IF; User Interface; CC; F; S; 64/66 PCI.)
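The SRAM counts quoted for the three boards translate into total on-board capacity with simple arithmetic. The sketch below assumes that "K" and "M" in the part descriptions mean 1024 and 1024 x 1024 words, that every bit of each 18- or 36-bit word is counted, and that the "up to 1Mx36" option applies to all ten SRAM positions. None of these assumptions is stated on the slides.

```python
def sram_capacity_bytes(num_chips, words_per_chip, bits_per_word):
    """Total capacity in bytes, counting every stored bit of every word."""
    return num_chips * words_per_chip * bits_per_word // 8


K = 1024
M = 1024 * 1024

capacities = {
    "SLAAC-1 (twelve 256Kx18)": sram_capacity_bytes(12, 256 * K, 18),
    "SLAAC-2 (twenty 256Kx18)": sram_capacity_bytes(20, 256 * K, 18),
    "SLAAC-1V (ten 256Kx36)": sram_capacity_bytes(10, 256 * K, 36),
    "SLAAC-1V (ten 1Mx36, assumed)": sram_capacity_bytes(10, 1 * M, 36),
}

for board, size in capacities.items():
    print(f"{board}: {size / M:.2f} MiB")
```

Under these assumptions SLAAC-1V with ten 256Kx36 parts holds the same number of bits as SLAAC-2 with twenty 256Kx18 parts, because the wider word offsets the smaller chip count.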
28. Detailed View of X0
- X0 is Virtex 1000 (1M gates).
- 72-bit left and right ring ports.
- 72-bit crossbar ext IO port.
- 60 pins to each memory card.
- Utility bus to Virtex 100 configuration controller.
- <25% used for 64/66 Xilinx PCI core and IF bridge.
- Columns parallel to PCI core.
- Configuration Controller (CC)
- FLASH (boot configurations)
- SRAM (readback and configuration cache).
- SelectMAP registers for X0, X1, and X2.
(Diagram labels: L, R, xbar, SelectMAP, X0, F, S, M0, M1, CC, UTL, PCI, IF.)
29. Detailed View of XP
- X1 and X2 are identical.
- Virtex 1000 (1M gates)
- 72-bit left and right ring ports.
- 72-bit crossbar ext IO port.
- 240 pins to memory card.
- Floor plan allows two virtual PEs per XP chip.
- VPEs each have 1 memory and 36 bits of XBAR.
- Horizontal long-lines needed for left-to-right connectivity.
- SelectMAP supports fast column configuration.
- 15 ms for whole chip.
- <1 ms single column configuration.
30. Configuration Controller
- Board designed to maximize runtime reconfiguration efficiency.
- Three parallel 8-bit @ 80 MHz SelectMAP ports.
- Manages local FLASH for boot and SRAM for configuration/readback.
- Four banks each of FLASH and SRAM.
- User clock accessed from CC for precise step, configure, step control.
- State of entire board can be pulled for context switching / checkpointing.
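The raw bandwidth implied by three 8-bit SelectMAP ports at 80 MHz follows from width times clock. The sketch below also estimates an ideal transfer time for a bitstream. `ASSUMED_BITSTREAM_BITS` is a placeholder round number chosen for illustration, not a data-sheet value, and the calculation ignores any protocol or setup overhead.

```python
def selectmap_bytes_per_sec(width_bits, clock_hz):
    """Peak throughput of one port: one width_bits-wide word per clock."""
    return width_bits * clock_hz // 8


def transfer_time_ms(num_bits, width_bits, clock_hz):
    """Ideal time to shift num_bits through one port, in milliseconds."""
    cycles = -(-num_bits // width_bits)     # ceiling division
    return 1000.0 * cycles / clock_hz


PORT_WIDTH_BITS = 8
PORT_CLOCK_HZ = 80_000_000
NUM_PORTS = 3

per_port = selectmap_bytes_per_sec(PORT_WIDTH_BITS, PORT_CLOCK_HZ)
all_ports = NUM_PORTS * per_port

# Placeholder size (an assumption): substitute the real bitstream length.
ASSUMED_BITSTREAM_BITS = 6_000_000
ideal_ms = transfer_time_ms(ASSUMED_BITSTREAM_BITS, PORT_WIDTH_BITS, PORT_CLOCK_HZ)

print(f"per port:  {per_port / 1e6:.0f} MB/s")
print(f"all ports: {all_ports / 1e6:.0f} MB/s")
print(f"ideal full-chip time for assumed size: {ideal_ms:.3f} ms")
```

An ideal figure from this formula is a lower bound. The 15 ms whole-chip time quoted on the XP slide is a measured or specified value and need not match it.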
31. Memory Bank Switching
- X0 can swap x0m0 with any memory on X1 and x0m1
with any memory on X2.
- Easy external memory bus implementation or
user-level double-buffering.
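User-level double-buffering with swappable banks can be sketched as a ping-pong buffer: one side fills a bank while the other side drains the bank filled previously, and a swap exchanges ownership without copying. The class below is a hypothetical software analogy. It does not model the bus switches themselves, and the loop runs producer and consumer in sequence rather than concurrently.

```python
class PingPongBuffers:
    """Two banks: the producer fills one while the consumer reads the other."""

    def __init__(self, size):
        self.banks = [[0] * size, [0] * size]
        self.fill_index = 0             # bank currently owned by the producer

    @property
    def fill_bank(self):
        return self.banks[self.fill_index]

    @property
    def drain_bank(self):
        return self.banks[1 - self.fill_index]

    def swap(self):
        # Exchange ownership without copying data, analogous to a bank switch.
        self.fill_index = 1 - self.fill_index


buffers = PingPongBuffers(size=4)
sums = []
for frame in range(3):
    # Producer side: write a new frame into the fill bank.
    for i in range(4):
        buffers.fill_bank[i] = frame * 10 + i
    buffers.swap()
    # Consumer side: process the frame that was just handed over.
    sums.append(sum(buffers.drain_bank))
```

The swap costs the same regardless of bank size, which is the property that makes a single-cycle bank exchange attractive compared with copying data between memories.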
32. External I/O Support
- Bus exchange switch allows each FPGA to access
either shared crossbar bus or local external I/O
connector.
33. SLAAC-1V Front
- Two memory daughter cards, one for X1 and one for X2; X0 connects to both.
- Add an I/O card on front to get access to
connector space.
Memory Card
Memory Card
I/O Card A
34. SLAAC-1V Back
- Support LANL QC64 format I/O connectors for
high-speed external I/O.
- Combined with front panel, gives three external
I/O ports!
I/O Card B (QC64 IN)
I/O Card C (QC64 OUT)
35. SLAAC-1V Status
- SLAAC-1V is up and running.
- Configuration, clock, LEDs, DMA demo (740 Mb/sec).
- Memory board is under test.
- Interface and control software development continues.
- Three boards presently exist.
- Delivered one to BYU for JHDL integration, another marked for Xilinx/VT LOKI team.
- We are building several more for SLAAC team, ACS community, and external customers over next few months.
- Contact Lauretta Carter (lcarter@east.isi.edu) for details on availability and pricing.