Title: The DARPA Dynamic Programming Benchmark on a Reconfigurable Computer
Luis E. Cordova, Duncan A. Buell, and Sreesa Akella
Department of Computer Science and Engineering, University of South Carolina
Contributions of research
We have:
- Designed and implemented benchmark 2 of the DARPA HPCS Discrete Mathematics problems
- Developed a methodology to map similar algorithms to reconfigurable hardware platforms
- Explored architectures and their trade-offs in area, bandwidth, power consumption, and parallelism
Architecture exploration
We are able to explore a large number of possible architectures providing different trade-offs between parallelism, economy of resources, and throughput.
Justification
- High performance computing benchmarking
- Compare and improve the performance of reconfigurable computers against other supercomputers, distributed and massively parallel processing machines
Objectives
- Design and benchmark a dynamic programming problem to:
  - Understand the advantages of reconfigurable supercomputing machines
  - Study and devise a methodology for optimal mapping of algorithms
  - Study the scalability of algorithms with the size of the input and with the size of the system
  - Explore limitations of reconfigurable computers
- Justify and propose architectural performance improvements to important problems
[Figure: Sequencing loop scheme, panels (a) and (b)]
Dynamic Programming Problem
Specification of top performance is an ANSI C
file.
Transformation 1: column-wise reading
Two sequencing architectures optimized for area
(a) and memory bandwidth (b).
Maximizing loop scheme
Limitation: We find a need to automate higher-level
compilation steps at the problem level. This step
requires specialized or expert knowledge of the
field of application being studied.
Fully registered matrix architecture
Transformation 2: row-wise reading
Maximizing loop
Two maximizing architectures. The architecture
reading in row-wise fashion (transformation 2)
offers higher performance than the one reading
column-wise (transformation 1).
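The two reading orders can be illustrated with a minimal C sketch. It is not the poster's code: the function names, the 3 x 4 matrix size, and the choice of a plain maximum as the "maximizing loop" body are assumptions made for illustration. Both functions return the same value; they differ only in the order in which matrix elements are read. In C's row-major layout the row-wise version reads adjacent addresses, while the column-wise version reads addresses COLS elements apart. Whether that is the cause of the performance difference reported here is not stated in the poster.

```c
#define ROWS 3   /* hypothetical matrix height */
#define COLS 4   /* hypothetical matrix width  */

/* Column-wise reading (transformation 1): the inner loop walks down one
   column, so consecutive reads are COLS elements apart in memory. */
int max_column_wise(int m[ROWS][COLS])
{
    int r, c;
    int best = m[0][0];

    for (c = 0; c < COLS; c++)
        for (r = 0; r < ROWS; r++)
            if (m[r][c] > best)
                best = m[r][c];
    return best;
}

/* Row-wise reading (transformation 2): the inner loop walks along one
   row, so consecutive reads touch adjacent memory locations. */
int max_row_wise(int m[ROWS][COLS])
{
    int r, c;
    int best = m[0][0];

    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++)
            if (m[r][c] > best)
                best = m[r][c];
    return best;
}
```

Calling either function on the same `int m[ROWS][COLS]` array yields the same maximum, which makes the pair a convenient starting point for comparing access orders without changing the result.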
Sequencing loop
High-level view of hardware platform
SRAM on-board-memory banks
Reconfigurable Computing Methodology: The entire
design is based on standard high-level
programming languages, ANSI C or Fortran. There
is a seamless path from the naïve version of
the algorithm coded in C to a version
mapped to the specific SRC platform. The
methodology is based on transformations of the
initial architecture into architectures that better
exploit the parallelism of the problem. The
effective utilization of the hardware resources
is assisted by the SRC high-level compiler. The
compiler aids in code debugging and in eliminating
slowdowns in suboptimal architectures.
Storing all the matrices on-chip yields the top
performance. The architecture is detailed for the
first matrix in the 3-D figure above.
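The idea of keeping a matrix on-chip can be sketched in plain C as "copy once, then compute only from the local copy". This is an illustrative assumption, not the poster's implementation: the function name, the size N, the flat `bank` buffer standing in for an on-board memory bank, and the row-maxima computation are all hypothetical, and how a particular compiler maps the local array to on-chip storage is not described in the poster.

```c
#define N 4   /* hypothetical matrix dimension */

/* Hypothetical helper: reads an N x N matrix once from a flat buffer
   (standing in for an off-chip memory bank) into a local array, then
   computes the sum of the row maxima using only the local copy. */
int sum_of_row_maxima(const int *bank)
{
    int local[N][N];
    int r, c;
    int total = 0;

    /* Single pass over the external buffer. */
    for (r = 0; r < N; r++)
        for (c = 0; c < N; c++)
            local[r][c] = bank[r * N + c];

    /* All further reads hit the local copy only. */
    for (r = 0; r < N; r++) {
        int best = local[r][0];
        for (c = 1; c < N; c++)
            if (local[r][c] > best)
                best = local[r][c];
        total += best;
    }
    return total;
}
```

On a conventional CPU this produces the same result as reading `bank` directly; the sketch only shows the code shape in which every external element is fetched exactly once.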
FPGA user logic chips