Title: "Standard Binaries for FPGAs"
1"Standard Binaries for FPGAs" and "eBlocks"
- Frank Vahid
- Professor
- Department of Computer Science and Engineering
- University of California, Riverside
- Associate Director, Center for Embedded Computer Systems, UC Irvine
- Work supported by the National Science Foundation, the Semiconductor Research Corporation, Xilinx, Intel, and Freescale
- Contributing Students
- FPGAs: Roman Lysecky (PhD 2005, Asst. Prof. at U. Arizona), Greg Stitt (PhD 2007, Asst. Prof. at U. Florida), Ann Gordon-Ross (PhD 2007, Asst. Prof. at U. Florida), David Sheldon (4th yr PhD), Scott Sirowy (3rd yr PhD)
- eBlocks: Susan Lysecky (PhD 2006, Asst. Prof. at U. Arizona), Ryan Mannion (3rd yr PhD), Shawn Nemetebakshi (MS 2005), plus numerous undergraduate students
2Software...
Microprocessor instructions
FPGA circuits
- Software is no longer just "instructions"
- The elephant of software has a (new) tail: FPGA circuits
3FPGAs: The Quietly Arriving New Software
FPGAs implement circuits as software bits
downloaded into memory
[Figure: circuit with signals a, b, F, G, implemented in a LUT]
- FPGA (Field-programmable gate array): thousands of LUTs and switch matrices, plus flip-flops, multipliers, RAMs, etc.
- Tools automatically compile circuits into bits that program the FPGA
- Multi-billion dollar growing industry
- Increasingly found in embedded system products: medical devices, basestations, set-top boxes, etc.
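The point that a circuit is just downloaded bits can be illustrated with a small software model of a 2-input LUT. This is only a sketch: the function name `lut2_eval` and the 4-bit truth-table encoding are assumptions made for illustration, not the bitstream format of any real FPGA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 2-input lookup table (LUT).
 * The 4 configuration bits are the truth table: bit number (a*2 + b)
 * holds the output for inputs a and b. */
int lut2_eval(uint8_t config_bits, int a, int b)
{
    assert((a == 0 || a == 1) && (b == 0 || b == 1));
    int index = (a << 1) | b;           /* select one truth-table row */
    return (config_bits >> index) & 1;  /* read the stored output bit */
}

/* Example "downloads": the same LUT becomes a different gate
 * depending only on the bits written into it. */
enum {
    LUT2_AND = 0x8, /* only row a=1,b=1 outputs 1 */
    LUT2_OR  = 0xE, /* every row except a=0,b=0 outputs 1 */
    LUT2_XOR = 0x6  /* rows a=0,b=1 and a=1,b=0 output 1 */
};
```

Calling `lut2_eval(LUT2_AND, a, b)` and `lut2_eval(LUT2_XOR, a, b)` uses the same "hardware" function; only the configuration bits differ.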
4Why Circuits (Sometimes) Beat Instructions
C Code for FIR Filter
Circuit for FIR Filter
for (i = 0; i < 128; i++) y[i] += c[i] * x[i] .. .. ..
- 1000s of instructions
- Several thousand cycles
In general, FPGA better due to circuit's
concurrency, from bit-level to task level
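The concurrency argument can be made concrete by writing the multiply-accumulate loop two ways in C. This is a sketch under assumptions: it accumulates one output sample over 128 taps, and the names `fir_sequential` and `fir_tree` are hypothetical. The second function only imitates the structure of a circuit in software; it does not run faster on a microprocessor.

```c
#include <assert.h>

#define TAPS 128

/* Sequential version, as a microprocessor would run it:
 * one multiply-accumulate per loop iteration. */
long fir_sequential(const int c[TAPS], const int x[TAPS])
{
    long y = 0;
    for (int i = 0; i < TAPS; i++)
        y += (long)c[i] * x[i];
    return y;
}

/* Software imitation of a circuit's structure: the 128 products are
 * independent of each other, and the sum is a balanced adder tree.
 * In hardware, all additions of one tree level could happen at the
 * same time, so the tree is log2(128) = 7 levels deep.
 * levels_out reports that depth. */
long fir_tree(const int c[TAPS], const int x[TAPS], int *levels_out)
{
    long partial[TAPS];
    int levels = 0;

    assert(levels_out != 0);

    for (int i = 0; i < TAPS; i++)       /* 128 multipliers side by side */
        partial[i] = (long)c[i] * x[i];

    for (int width = TAPS; width > 1; width /= 2) {  /* one tree level */
        for (int i = 0; i < width / 2; i++)
            partial[i] = partial[2 * i] + partial[2 * i + 1];
        levels++;
    }

    *levels_out = levels;
    return partial[0];
}
```

Both functions return the same value; the difference is that the sequential loop needs 128 dependent steps, while the tree exposes work that a circuit could do in 7 addition levels after one level of multiplies.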
5Extensive Studies over Past Decade
- Large speedups on many important applications
- Numerous dedicated conferences (FPGA, FCCM, FPL,
...)
6New FPGA Compilers Start from Common Programming
Languages
- Commercial products appearing in recent years
- Good news
[Figure: FPGA Compiler with Profiling and Synthesis, targeting Microprocessor and FPGA]
7Problem: Best Temporal/Spatial Algorithms Differ
Platform / Algorithm
Microprocessor:
quicksort(array, left, right)
  if right > left
    pivot = array[left]
    newpivot = partition(array, left, right, pivot)
    quicksort(array, left, newpivot - 1)
    quicksort(array, newpivot + 1, right)
FPGA: Bitonic Sorting Network
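The contrast between the two algorithm styles can be sketched in C. The quicksort below follows the temporal, data-dependent recursion; the bitonic sort performs a fixed sequence of compare-exchange steps that does not depend on the data, which is what makes it map naturally onto a network of comparators. Function names are hypothetical, and the bitonic version assumes the array length is a power of two.

```c
#include <assert.h>
#include <stddef.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Temporal algorithm: quicksort with the leftmost element as pivot.
 * Which comparisons happen depends on the data seen so far. */
void quicksort(int *array, int left, int right)
{
    if (right <= left)
        return;
    int pivot = array[left];
    int store = left;
    for (int i = left + 1; i <= right; i++)
        if (array[i] < pivot)
            swap_int(&array[++store], &array[i]);
    swap_int(&array[left], &array[store]);   /* pivot to final position */
    quicksort(array, left, store - 1);
    quicksort(array, store + 1, right);
}

/* Spatial algorithm: bitonic sorting network, run here in software.
 * For fixed k and j, every compare-exchange in the inner loop touches
 * a different pair of elements, so a circuit could do them all at once. */
void bitonic_sort(int *array, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);     /* n must be a power of two */
    for (size_t k = 2; k <= n; k <<= 1) {
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner > i) {
                    int ascending = (i & k) == 0;
                    if (ascending ? array[i] > array[partner]
                                  : array[i] < array[partner])
                        swap_int(&array[i], &array[partner]);
                }
            }
        }
    }
}
```

Quicksort usually does fewer comparisons, which suits a microprocessor; the bitonic network does more comparisons overall but in a fixed pattern that can be laid out as parallel hardware.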
8Bigger Problem: Algorithms Matter Even More
- For portability, need algorithms that are efficient on both
- "Compromise programming"
9New FPGA Compilers Start from Common Programming Languages
- BUT: Standard tools/binaries important for "ecosystem"
- Countless ideas failed for not respecting the ecosystem
[Figure: FPGA Compiler with Profiling, targeting Microprocessor and FPGA; ecosystem of Languages, Standard binary, Architectures, Tools]
10One Solution: Hide the FPGA
- Today's microprocessors use dynamic translation (e.g., x86)
- Different architectures hidden
- For FPGAs: Transparently translate standard microprocessor binaries to FPGAs
- Warp Processing
- Developed at UCR 2002-present
[Figure: Translator in front of a RISC architecture; Translator in front of a VLIW architecture]
11Warp Processing Background: Basic Idea
Step 1: Initially, software binary loaded into instruction memory
[Figure: Profiler, I Mem, µP, D, FPGA, On-chip CAD]
12Warp Processing Background: Basic Idea
Step 2: Microprocessor executes instructions in software binary
13Warp Processing Background: Basic Idea
Step 3: Profiler monitors instructions and detects critical regions in binary
[Figure: Profiler observing a repeated stream of beq and add instructions from the µP]
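One simple way a profiler could detect critical regions is to count taken backward branches, since these usually close loops. The sketch below makes that assumption explicit; the slides do not say how the warp profiler is implemented, and every name here (`profiler`, `profiler_observe_branch`, `MAX_REGIONS`) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REGIONS 16   /* assumed small table of loop counters */

typedef struct {
    uint32_t loop_start;  /* target address of a backward branch */
    uint32_t count;       /* how many times that branch was taken */
} region_counter;

typedef struct {
    region_counter regions[MAX_REGIONS];
    int used;
} profiler;

void profiler_init(profiler *p)
{
    assert(p != 0);
    p->used = 0;
}

/* Record one taken branch. Forward branches are ignored; a branch
 * whose target is at or before its own address is treated as a loop. */
void profiler_observe_branch(profiler *p, uint32_t pc, uint32_t target)
{
    assert(p != 0);
    if (target > pc)
        return;
    for (int i = 0; i < p->used; i++) {
        if (p->regions[i].loop_start == target) {
            p->regions[i].count++;
            return;
        }
    }
    if (p->used < MAX_REGIONS) {          /* new loop: start counting */
        p->regions[p->used].loop_start = target;
        p->regions[p->used].count = 1;
        p->used++;
    }
}

/* Writes the start address of the most frequently executed loop and
 * returns 1, or returns 0 if no backward branch has been seen. */
int profiler_hottest(const profiler *p, uint32_t *loop_start_out)
{
    int best = -1;
    assert(p != 0 && loop_start_out != 0);
    for (int i = 0; i < p->used; i++)
        if (best < 0 || p->regions[i].count > p->regions[best].count)
            best = i;
    if (best < 0)
        return 0;
    *loop_start_out = p->regions[best].loop_start;
    return 1;
}
```

The address returned by `profiler_hottest` would be the candidate critical region handed to the on-chip CAD in the next step.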
14Warp Processing Background: Basic Idea
Step 4: On-chip CAD reads in critical region
15Warp Processing Background: Basic Idea
Step 5: On-chip CAD converts critical region into control data flow graph (CDFG)
[Figure: Profiler, I Mem, µP, D, FPGA, Dynamic Part. Module (DPM), On-chip CAD]
16Warp Processing Background: Basic Idea
Step 6: On-chip CAD synthesizes decompiled CDFG to a custom (parallel) circuit
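Why a graph form helps synthesis can be shown with a tiny data-flow graph in C. The sketch covers only the data-flow half of a CDFG (no control flow), and the `dfg_node` layout and `dfg_depth` function are assumptions for illustration, not the warp tools' actual representation. The depth of the graph is the number of steps a circuit would need if independent operations ran concurrently.

```c
#include <assert.h>

#define NO_INPUT (-1)

typedef struct {
    char op;     /* '+', '*', or 'i' for a primary input */
    int  left;   /* index of the node producing the left operand */
    int  right;  /* index of the node producing the right operand */
} dfg_node;

/* Nodes must be listed in topological order (operands before users).
 * level[] is scratch space with one entry per node.
 * Returns the longest chain of dependent operations. */
int dfg_depth(const dfg_node *nodes, int count, int *level)
{
    int depth = 0;
    for (int i = 0; i < count; i++) {
        int l = nodes[i].left;
        int r = nodes[i].right;
        assert(l < i && r < i);           /* operands come earlier */
        int level_l = (l == NO_INPUT) ? 0 : level[l];
        int level_r = (r == NO_INPUT) ? 0 : level[r];
        int ready = (level_l > level_r) ? level_l : level_r;
        level[i] = (nodes[i].op == 'i') ? 0 : ready + 1;
        if (level[i] > depth)
            depth = level[i];
    }
    return depth;
}
```

For the expression (a * b) + (c * d) there are three operations, but the two multiplies are independent, so the graph depth is two: a circuit needs two steps where sequential instructions need three.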
17Warp Processing Background: Basic Idea
Step 7: On-chip CAD maps circuit onto FPGA
18Warp Processing Background: Basic Idea
Step 8: On-chip CAD replaces instructions in binary to use hardware, causing performance and energy to "warp" by an order of magnitude or more
Mov reg3, 0
Mov reg4, 0
loop:
  // instructions that interact with FPGA
Ret reg4
Feasible for repeating or long-running applications
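The effect of the final step can be imitated in plain C with a function pointer that is redirected once a faster implementation is ready. This is only an analogy: a real warp processor patches instructions in the binary, whereas this sketch swaps a pointer, and `sum_hardware_stub` is ordinary C standing in for the synthesized circuit. All names are hypothetical.

```c
#include <assert.h>

typedef long (*kernel_fn)(const int *data, int n);

/* The original software loop (the critical region). */
long sum_software(const int *data, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Stand-in for the circuit: four accumulators working side by side. */
long sum_hardware_stub(const int *data, int n)
{
    long lanes[4] = {0, 0, 0, 0};
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        lanes[0] += data[i];
        lanes[1] += data[i + 1];
        lanes[2] += data[i + 2];
        lanes[3] += data[i + 3];
    }
    for (; i < n; i++)                    /* leftover elements */
        lanes[0] += data[i];
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

typedef struct {
    kernel_fn kernel;   /* what the "binary" currently executes */
    int patched;        /* 1 once the region has been redirected */
} binary_image;

void load_binary(binary_image *b)
{
    b->kernel = sum_software;
    b->patched = 0;
}

/* Redirect the critical region; callers are unchanged. */
void patch_binary(binary_image *b, kernel_fn hardware)
{
    assert(hardware != 0);
    b->kernel = hardware;
    b->patched = 1;
}

long run_kernel(const binary_image *b, const int *data, int n)
{
    return b->kernel(data, n);
}
```

The caller of `run_kernel` never changes, which mirrors the transparency goal: the application sees the same results before and after the region is moved to hardware.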
19Recent Warp Results on Multi-Threaded Benchmarks
- After translation (may take minutes), huge speedups
- Even compared to 64-microprocessor system
Stitt/Vahid, CODES/ISSS 2007
- Translation results remembered for later execution
20FPGAs as Software: Challenges and Opportunity
- Challenge: Broader definitions of...
- Compilation, OS, verification, certification, etc.
- Opportunity: Can create custom multi-processor architectures just by downloading bits
- After all, a processor is just another circuit
Xilinx Virtex II Pro can hold dozens of "soft core" 32-bit processors, in addition to the four "hard core" PowerPCs
21FPGAs and Cyber-Physical Systems
- Tough to predict future of computing
- "Victorian planners in 1830 predicted that by 1930 London street traffic would be bogged down under 25 feet of horse manure."
- Are FPGAs to microprocessors what cars were to horses?
- Probably not, but perhaps they are what tails are to horses?
- We might do well to keep FPGAs in mind as we consider software issues
22eBlocks: The Wood-and-Nails of the Electronic Sensor World
Frank Vahid
Department of Computer Science and Engineering
University of California, Riverside
vahid@cs.ucr.edu
http://www.cs.ucr.edu/~vahid
Also with the Center for Embedded Computer Systems at UC Irvine
This work is being supported by the National Science Foundation
Title suggested by a colleague: "eBlocks: Empowering the People"
23Seen this Problem?
Not!
Available
- Technology everywhere. Why no good solution?
24Shrinking Processor Size/Cost Enables New Solution
Courtesy of Joe Kahn
http://www.templehealth.org
- Make sensors smarter
- By adding processor + battery
- Becomes a "block" easily connected to other
blocks
25Shrinking Processor Size/Cost Enables New Solution: eBlocks
Existing component view
New "eBlock" view
26eBlocks
- Just connect blocks, and they work
- No programming knowledge, no electronics knowledge
LED
yes/no
27What's Hard (The Research Part)
- (1) Finding right set of building blocks
28What's Hard (The Research Part)
- (2) Making the blocks understandable
- People NOT likely to read directions
- Those that do are unlikely to understand
Performed extensive user testing (over 500
students, kids, and adults) over two years
Example: Combine block
Most success
29What's Hard (The Research Part)
- (3) Batteries must last years, yet performance should appear continuous
- Blocks are off 99.9% of the time
Developed theory to map eBlock events to
continuous time
Developed custom CAD tool to automatically find
the best block parameter settings out of the
billions of possibilities
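The link between being off most of the time and multi-year battery life is simple duty-cycle arithmetic, sketched below in C. The function names are hypothetical, and any current or capacity values passed in are the caller's assumptions; the slides give no electrical figures for eBlocks.

```c
#include <assert.h>

/* Average current of a block that is awake only part of the time.
 * awake_fraction is between 0 and 1 (0.001 means off 99.9% of the time). */
double average_current_ma(double active_ma, double sleep_ma,
                          double awake_fraction)
{
    assert(awake_fraction >= 0.0 && awake_fraction <= 1.0);
    return active_ma * awake_fraction
         + sleep_ma * (1.0 - awake_fraction);
}

/* Idealized battery life in years: capacity divided by average draw.
 * Ignores self-discharge and other real-world losses. */
double battery_life_years(double capacity_mah, double average_ma)
{
    assert(average_ma > 0.0);
    double hours = capacity_mah / average_ma;
    return hours / (24.0 * 365.0);
}
```

As an illustration with made-up numbers, a block drawing 10 mA when awake and 0.01 mA when asleep, awake 0.1% of the time, averages about 0.02 mA. The CAD tool's job of choosing block parameter settings can be read as a search over such trade-offs between responsiveness and average current.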
30Built >100 Prototypes
- Size of deck of cards (eventually smaller)
- 2-3 years on 2 AA batteries (eventually longer)
- Can communicate via wire >1.5 miles, 150 ft wireless
- Hundreds of trial users
- Integer blocks too
31eBlocks
- eBlock technology nearly mature
- Possible early-adoption applications
- Hearing impaired home monitoring
- Aging parent non-intrusive monitoring
- Middle-school early engineering experience kits
- Trends work in favor each year
- Smaller, cheaper (<$5), longer battery life (or no battery?)
- Eventual applications
- General blocks for home, stores, office
- Blocks as front-end devices to "smart home"
- New blocks and tools for intermediate/advanced users
- Collaborators
- Ph.D. students: Susan Lysecky, Ryan Mannion, David Sheldon
- Profs.: Harry Hsieh, Walid Najjar, Crista Lopes (UC Irvine)
- UG students: Andrea Coba, Margaret Ukwu, Caleb Leak, and several others
32Graphical Simulator
- User specifies and tests block design
- Java-based simulator
- User chooses between palettes
- Blocks added by dragging
- User is able to configure various blocks by clicking on switches
- Connections created by drawing lines between blocks
- User can create, experiment, test and configure design
33eBlocks as a Programming Paradigm
- Use virtual blocks in graphical simulator to describe desired sensor system behavior
- Intuitive due to spatial emphasis, not temporal emphasis
- Automatically compile to code on programmable eBlocks
34eBlock Tool Generates Code
- Tool generates C code automatically
C Code
- Download code to block with click of a button
- Ordinary users can write programs in minutes
- Spatial vs. temporally-oriented language
- 20 high school graduates: eBlocks (spatial) vs. LEGO Mindstorms (temporal), 6 example systems, 40 minutes to build
35eBlocks and Embedded Microprocessors
- Can greatly simplify coding
[Figure: yes/no and 1/0 signals connected to a Microprocessor]
36Introduction
- 1998: Simple Problem
- Garage door open at night
- No simple solution
- Off-the-shelf solutions costly, hard to find, and/or not customizable
- Alarm system: cost overkill
- Connecting existing sensors, logic, transmit/receive, LEDs is hard
- Electronics, programming
- 2-week project: 70 EE/CS unable
- Countless similar applications go unrealized
- Why can't I just connect those components like "Lego™" blocks?
37eBlocks Example
- "Garage Open at Night" detector
- <10 minutes to build
Need to indicate garage open at night: use LED block
Detect night-time: use Light Sensor block
Use Combine block to combine light sensor and contact switch into one
Detect garage door open: use Contact Switch block
Plug pieces together and the system is done!
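The behavior of this four-block system can be written out as a few lines of C, in the spirit of the code the eBlock tool generates. This is a hand-written sketch, not actual tool output: the function names, the threshold parameter, and the choice to configure the Combine block as a logical AND are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool yes_no;   /* blocks exchange yes/no values */

/* Light Sensor block: says "yes" when it is dark.
 * dark_threshold is a hypothetical calibration value. */
yes_no light_sensor_block(int light_level, int dark_threshold)
{
    assert(dark_threshold >= 0);
    return light_level < dark_threshold;
}

/* Contact Switch block: says "yes" when the door is open,
 * assuming an open door breaks the contact. */
yes_no contact_switch_block(bool contact_closed)
{
    return !contact_closed;
}

/* Combine block, assumed here to be configured as AND. */
yes_no combine_block_and(yes_no a, yes_no b)
{
    return a && b;
}

/* LED block: lit when its input is "yes". */
bool led_block(yes_no input)
{
    return input;
}

/* The whole "Garage Open at Night" system, wired block to block. */
bool garage_open_at_night(int light_level, int dark_threshold,
                          bool contact_closed)
{
    yes_no is_dark   = light_sensor_block(light_level, dark_threshold);
    yes_no door_open = contact_switch_block(contact_closed);
    return led_block(combine_block_and(is_dark, door_open));
}
```

Each function corresponds to one physical block, and the call structure in `garage_open_at_night` mirrors the wiring, which is the spatial style of description the slides contrast with temporal programming.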