Title: Configurable Coprocessors
- William D. Bishop
- wdbishop_at_computer.org
- Wayne M. Loucks
- wmloucks_at_pads.uwaterloo.ca
Presentation Outline
- Introduction to configurable coprocessors
  - Motivation
  - Definitions and concepts
  - Niche applications
- Configurable coprocessor for CSIM
  - CSIM: a discrete-event simulation library
  - Configurable coprocessor platform
  - Pseudo-random numbers and event queues
- Looking towards the future
  - Configurable coprocessors and virtual processors
Motivation
"For a given class of problems, one set of basic instructions may be more efficient than another such set." (John von Neumann, 1958)
- The above statement can be extended to computer hardware as follows:
  - Application-specific computer hardware may be more efficient than general-purpose computer hardware for solving a given class of problems.
Introduction to Coprocessors
- Definition of a coprocessor
  A coprocessor is a computing device that may be added to a computer to provide application-specific computer hardware to assist with the efficient computation of a set of tasks.
- Coprocessors enhance performance using the following:
  - Hardware specialization
  - Parallel computation
- Examples of popular coprocessors
  - Video display adapters, math coprocessors, sound cards
Introduction to Configurable Coprocessors
- Definition of a configurable coprocessor
  A configurable coprocessor is a computing device that may be added to a computer to provide application-specific computer hardware that may be modified at runtime to assist with the efficient computation of a set of tasks.
- Configurable coprocessors offer the following advantages:
  - Increased control logic (a.k.a. processing unit) flexibility
  - Increased datapath (a.k.a. wiring) flexibility
  - Ability to dynamically reconfigure the hardware at runtime
Introduction to Configurable Coprocessors
- Basic concept
  - Build a single configurable coprocessor board that may be used to assist with the computation of a wide variety of tasks
  - Develop a set of application-specific hardware designs suitable for programming the configurable coprocessor board
  - Program and use the configurable coprocessor board when performance enhancements are possible
- Usefulness
  - Best suited for applications that are not used frequently but can benefit substantially from acceleration when needed
Introduction to Configurable Coprocessors
- The basic building block of a configurable coprocessor is the High-Density Programmable Logic Device (HDPLD).
- Suitable HDPLDs have the following features:
  - Large capacity for digital hardware designs
  - Electrically programmable in-system
  - Support for high-speed reconfiguration
Figure: A modern HDPLD, the Altera 10K100 CPLD
Niche Applications
- What are niche applications of configurable coprocessors?
  - Applications that use bit-wise computations or integer arithmetic
  - Performance improvements of 10× to 1000× are typical for niche applications
- Examples of niche applications
  - Image processing: Athanas, 1995 (138× to 236×)
  - Cryptography: Vuillemin, 1996 (10× to 1000×)
  - Hardware emulation: Dubois, 1995 (123× to 207×)
Configurable Coprocessor for CSIM
- CSIM is a process-oriented, discrete-event simulation library...
- Popular applications of CSIM include simulating queuing systems, assembly lines, and computer hardware
- Research goals
  - Identify CSIM functions that might benefit from a configurable coprocessor
  - Design and implement a library of application-specific hardware designs to replace CSIM functions
  - Evaluate the benefits of a configurable coprocessor for CSIM
CSIM Application Profiling
- Profiling with Intel's VTune Performance Analysis Tool revealed the following statistics on CSIM for a simple FIFO queue example
CSIM: Choosing Suitable Functions
- Suitable functions have the following characteristics:
  - Computationally intensive
  - Very little use of input, output, or internal registers
  - Potential to implement the function in dedicated hardware
- Functions chosen for acceleration
  - Pseudo-random number generation (streams and distributions)
  - Event queue insertion and deletion (event management)
Configurable Coprocessor Platform
Figure: ARC-PCI board
Configurable Coprocessor System Components
Pseudo-Random Number Generation
- Implemented the CSIM pseudo-random number generation algorithm as a configurable coprocessor...
- Specifications
  - 374 lines of VHDL code
  - Utilizes 30% of an Altera 10K50 CPLD
  - Achieves desired performance (greater than 33 MHz)
- The configurable coprocessor system provides identical results to the original software implementation. It's completely transparent!
NOTE: The pseudo-random number generation algorithm requires only 9 lines of C code!
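The slides do not reproduce CSIM's 9 lines of C. To illustrate how small such a generator can be, here is a minimal sketch of a different, well-known generator: the Park-Miller "minimal standard" multiplicative congruential generator, x ← 16807·x mod (2^31 − 1). It is an assumed stand-in and not necessarily the algorithm CSIM uses; the names pm_state, pm_next, and pm_uniform are hypothetical.

```c
#include <stdint.h>

/* Park-Miller "minimal standard" generator (assumed stand-in, not
   necessarily CSIM's algorithm): x' = 16807 * x mod (2^31 - 1). */
#define PM_MODULUS    2147483647u  /* 2^31 - 1, a prime */
#define PM_MULTIPLIER 16807u       /* 7^5 */

/* Generator state; the seed must lie in [1, 2^31 - 2]. */
static uint32_t pm_state = 1;

/* Advance the state and return the next integer in [1, 2^31 - 2].
   The product is below 2^46, so a 64-bit intermediate cannot overflow. */
uint32_t pm_next(void)
{
    uint64_t product = (uint64_t)pm_state * PM_MULTIPLIER;
    pm_state = (uint32_t)(product % PM_MODULUS);
    return pm_state;
}

/* Scale the next integer to a uniform double in the open interval (0, 1). */
double pm_uniform(void)
{
    return (double)pm_next() / (double)PM_MODULUS;
}
```

Seeded with 1, the first two outputs are 16807 and 282475249, and the 10,000th output is 1043618065, the published check value for this generator. A software step of this size is a single multiply and modulo, which puts in perspective why the hardware version must compete with a very cheap computation.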
Pseudo-Random Number Generation: Observations
- The enhanced version is slower. Why?
- ANSWER
  - The time required to compute a random number on a typical PC ranges from 80 ns to 120 ns for the CSIM pseudo-random number generation algorithm
  - The time required to transfer a 64-bit quantity using 32-bit PCI bus transfers is at least 330 ns
  - A more complex computation is necessary to justify the communication latency
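The arithmetic behind this answer can be made explicit. Assuming the standard 33 MHz PCI clock (the slide itself gives only the 330 ns figure), one bus clock lasts about 30.3 ns, and the transfer alone outweighs the entire software computation:

```latex
\begin{aligned}
T_{\mathrm{clk}} &= \frac{1}{33\,\mathrm{MHz}} \approx 30.3\,\mathrm{ns},
  \qquad \frac{330\,\mathrm{ns}}{T_{\mathrm{clk}}} \approx 11 \text{ bus clocks} \\
\frac{t_{\mathrm{transfer}}}{t_{\mathrm{software}}}
  &\ge \frac{330\,\mathrm{ns}}{120\,\mathrm{ns}} = 2.75,
  \qquad \frac{330\,\mathrm{ns}}{80\,\mathrm{ns}} \approx 4.1
\end{aligned}
```

So even if the coprocessor produced each number instantly, the transfer would make it at least 2.75 times slower than software, and about 4.1 times slower at the 80 ns end of the range.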
Event Queue Insertion and Deletion
- Implemented algorithms for event queue insertion and deletion in a configurable coprocessor...
- Specifications
  - Min heap with 4096 entries
  - Each entry has both a 32-bit key and a 32-bit data element
  - Current implementation utilizes 50% of an Altera 10K50 CPLD
  - Achieves desired performance (greater than 33 MHz)
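For readers unfamiliar with the data structure, the sketch below is a software analogue of the specification above: a binary min heap of 4096 entries, each holding a 32-bit key and a 32-bit data element. The names (heap_entry, min_heap, heap_insert, heap_delete_min) are hypothetical, and the textbook sift-up/sift-down algorithm shown here is an assumption about the general technique, not a transcription of the VHDL design.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEAP_CAPACITY 4096  /* matches the slide's 4096-entry heap */

typedef struct {
    uint32_t key;   /* e.g. scheduled event time (smallest key = next event) */
    uint32_t data;  /* e.g. identifier of the event or process */
} heap_entry;

typedef struct {
    heap_entry items[HEAP_CAPACITY];
    uint32_t size;
} min_heap;

static void swap_entries(heap_entry *a, heap_entry *b)
{
    heap_entry tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Insert an entry at the bottom and sift it up; false if the heap is full. */
bool heap_insert(min_heap *h, uint32_t key, uint32_t data)
{
    if (h->size == HEAP_CAPACITY)
        return false;
    uint32_t i = h->size++;
    h->items[i].key = key;
    h->items[i].data = data;
    while (i > 0) {
        uint32_t parent = (i - 1) / 2;
        if (h->items[parent].key <= h->items[i].key)
            break;  /* heap order restored */
        swap_entries(&h->items[parent], &h->items[i]);
        i = parent;
    }
    return true;
}

/* Remove the smallest-key entry into *out and sift the last entry down
   from the root; false if the heap is empty. */
bool heap_delete_min(min_heap *h, heap_entry *out)
{
    if (h->size == 0)
        return false;
    *out = h->items[0];
    h->items[0] = h->items[--h->size];
    uint32_t i = 0;
    for (;;) {
        uint32_t left = 2 * i + 1, right = left + 1, smallest = i;
        if (left < h->size && h->items[left].key < h->items[smallest].key)
            smallest = left;
        if (right < h->size && h->items[right].key < h->items[smallest].key)
            smallest = right;
        if (smallest == i)
            break;  /* both children are larger or absent */
        swap_entries(&h->items[i], &h->items[smallest]);
        i = smallest;
    }
    return true;
}
```

A full 4096-entry heap is 13 levels deep, so each insertion or deletion moves an entry across at most 12 parent-child links; the work per operation is bounded by the heap's depth rather than its size.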
Event Queue Insertion and Deletion: Observations
- Simulation results indicate the following:
  - Performance depends upon the contents of the event queue
  - Insertion and deletion can take as few as 10 clock cycles
  - For small heaps, speedup is unlikely due to the communication latency of the PCI bus
  - For large heaps, speedup is possible
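A back-of-the-envelope cost model shows why small heaps are unlikely to benefit. The sketch assumes, as a simplification not stated on the slides, that one offloaded heap operation costs the 330 ns minimum PCI transfer time quoted earlier for a 64-bit quantity (the size of one key/data entry) plus the coprocessor's clock cycles at 33 MHz. The names offload_time_ns and speedup are hypothetical.

```c
/* Simplified cost model (assumption): PCI transfer + coprocessor cycles. */
#define PCI_TRANSFER_NS 330.0  /* minimum time quoted for a 64-bit transfer */
#define CLOCK_MHZ 33.0         /* coprocessor clock rate */

/* Estimated time in ns for an offloaded operation taking `cycles` clocks. */
double offload_time_ns(unsigned cycles)
{
    double period_ns = 1000.0 / CLOCK_MHZ;  /* about 30.3 ns at 33 MHz */
    return PCI_TRANSFER_NS + cycles * period_ns;
}

/* Ratio of software time to offloaded time; above 1.0 means speedup. */
double speedup(double software_ns, unsigned cycles)
{
    return software_ns / offload_time_ns(cycles);
}
```

In the 10-cycle best case, offload_time_ns(10) is about 633 ns, so a software heap operation must take longer than that before speedup() exceeds 1. Under this model, that favors large heaps, whose software operations are more expensive, consistent with the observations above.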
Looking Towards the Future
Future Applications for Configurable Coprocessors
- As HDPLDs increase in capacity and complexity, the potential benefits of configurable coprocessors will increase.
- Future applications for configurable coprocessors may include the following:
  - Next-generation operating systems
  - Security software for e-commerce and telecommunications
  - Entertainment software
  - Networking software
Conclusions
- It is possible to completely hide the use of a configurable coprocessor from the end-user.
- Configurable coprocessors are not suitable for simple tasks due to reconfiguration delays and communication latency.
Configurable Computing References
- Peter M. Athanas and A. Lynn Abbott. Addressing the Computational Requirements of Image Processing with a Custom Computing Machine: An Overview. In Proceedings of the Ninth International Parallel Processing Symposium, Special Workshop on Reconfigurable Architectures and Algorithms, Santa Barbara, California, April 1995.
- Jean E. Vuillemin, Patrice Bertin, Didier Roncin, Mark Shand, Hervé H. Touati, and Philippe Boucard. Programmable Active Memories: Reconfigurable Systems Come of Age. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 4(1):56-69, March 1996.
- Michel Dubois, Alain Gefflaut, Jaeheon Jeong, Adrian Moga, and Koray Oner. Multiprocessor Emulation with RPM: Early Experience. Technical Report CENG95-23, University of Southern California, Los Angeles, California, December 1995.
- http://www.pads.uwaterloo.ca/wdbishop/arc-pci.html