1
Warp Processing: Making FPGAs Ubiquitous via
Invisible Synthesis
  • Greg Stitt
  • Department of Electrical and Computer Engineering
  • University of Florida

2
Introduction
  • Improved performance enables new applications
  • Past decade - MP3 players, portable game
    consoles, cell phones, etc.
  • Future architectures - Speech/image recognition,
    self-guiding cars, computational biology, etc.

3
Introduction
  • FPGAs (Field Programmable Gate Arrays):
    implement custom circuits
  • 10x, 100x, even 1000x speedups for scientific and
    embedded apps
  • [Najjar 04], [He, Lu, Sun 05], [Levine, Schmit 03],
    [Prasanna 06], [Stitt, Vahid 05]
  • But, FPGAs not mainstream
  • Warp Processing Goal: Bring FPGAs into mainstream
  • Make FPGAs Invisible

FPGAs capable of large performance improvements
[Figure: performance of uP vs. FPGA]
4
Introduction: Hardware/Software Partitioning
C Code for FIR Filter
for (i=0; i < 16; i++) y[i] += c[i] * x[i];
.. .. ..
for (i=0; i < 128; i++) y[i] += c[i] * x[i];
.. .. ..
  • 1000 cycles
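
The FIR loop on this slide can be written out as a small, compilable C function. This is only a sketch: the function name `fir_kernel`, the `int` element type, and the length parameter `n` are assumptions made for illustration and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Software version of the slide's loop: y[i] += c[i] * x[i].
 * Each iteration reads and writes only element i, so no iteration
 * depends on another. That independence is what would let a custom
 * circuit perform many of the multiply-adds at the same time.
 * (Name, element type, and signature are illustrative assumptions.) */
void fir_kernel(int *y, const int *c, const int *x, size_t n)
{
    assert(y != NULL && c != NULL && x != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] += c[i] * x[i];
    }
}
```

Calling `fir_kernel(y, c, x, 16)` or `fir_kernel(y, c, x, 128)` corresponds to the two loops shown on the slide. A loop like this is the kind of region that hardware/software partitioning would move to the FPGA.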

5
Introduction: High-level Synthesis
  • Problem: Describing circuit using HDL is time
    consuming/difficult
  • Solution: High-level synthesis
  • Create circuit from high-level code
  • [Gupta, DeMicheli 92], [Camposano, Wolf 91],
    [Rabaey 96], [Gajski, Dutt 92]
  • Allows developers to use higher-level
    specification
  • Potentially, enables synthesis for software
    developers

6
Introduction: High-level Synthesis
for (i=0; i < 16; i++) y[i] += c[i] * x[i];
7
Problems with High-Level Synthesis
  • Problem: High-level synthesis is unattractive to
    software developers
  • Requires specialized language
  • SystemC, NapaC, HandelC, etc.
  • Requires specialized compiler
  • Spark, ROCCC, CatapultC, etc.
  • Limited commercial success
  • Software developers reluctant to change tools

8
Warp Processing: Invisible Synthesis
  • Solution: Make synthesis invisible
  • 2 Requirements
  • Standard software tool flow
  • Perform compilation before synthesis
  • Hide synthesis tool
  • Move synthesis on chip
  • Similar to dynamic binary translation
  • Transmeta
  • But, translate to hw

9
Warp Processing: Invisible Synthesis
Warp processor looks like standard uP but
invisibly synthesizes hardware
10
Warp Processing: Invisible Synthesis
  • Advantages
  • Supports all languages, compilers, IDEs
  • Supports synthesis of assembly code
  • Supports synthesis of library code
  • Also, enables dynamic optimizations

11
Warp Processing Background: Basic Idea
Step 1: Initially, software binary loaded into
instruction memory
[Figure: warp processor block diagram with Profiler, I Mem, µP, D, FPGA, and On-chip CAD]
12
Warp Processing Background: Basic Idea
Step 2: Microprocessor executes instructions in software
binary
13
Warp Processing Background: Basic Idea
Step 3: Profiler monitors instructions and detects
critical regions in binary
[Figure: block diagram; profiler observes the beq and add instructions flowing from I Mem to µP]
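
One way a profiler could find critical regions is to count how often each backward branch is taken, since a frequently taken backward branch usually closes a hot loop. The sketch below follows that heuristic. The heuristic, the fixed table size, and every name in the code (`Profiler`, `profiler_observe`, `profiler_hottest`) are assumptions for illustration, not details taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROFILE_SLOTS 16  /* hypothetical table size */

typedef struct {
    uint32_t target;  /* loop head: address the backward branch jumps to */
    uint32_t count;   /* number of times that branch was taken */
    int used;         /* nonzero once the slot holds a loop head */
} ProfileEntry;

typedef struct {
    ProfileEntry slots[PROFILE_SLOTS];
} Profiler;

void profiler_init(Profiler *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < PROFILE_SLOTS; i++) {
        p->slots[i].target = 0;
        p->slots[i].count = 0;
        p->slots[i].used = 0;
    }
}

/* Record one taken branch from address pc to address target.
 * Forward branches are ignored. A new loop head is dropped if the
 * table is already full (a simplification of this sketch). */
void profiler_observe(Profiler *p, uint32_t pc, uint32_t target)
{
    assert(p != NULL);
    if (target > pc) {
        return;  /* forward branch: does not close a loop */
    }
    ProfileEntry *free_slot = NULL;
    for (size_t i = 0; i < PROFILE_SLOTS; i++) {
        ProfileEntry *e = &p->slots[i];
        if (e->used && e->target == target) {
            e->count++;
            return;
        }
        if (!e->used && free_slot == NULL) {
            free_slot = e;
        }
    }
    if (free_slot != NULL) {
        free_slot->used = 1;
        free_slot->target = target;
        free_slot->count = 1;
    }
}

/* Write the most frequently reached loop head to *target_out.
 * Returns 1 on success, 0 if no backward branch has been seen. */
int profiler_hottest(const Profiler *p, uint32_t *target_out)
{
    assert(p != NULL && target_out != NULL);
    const ProfileEntry *best = NULL;
    for (size_t i = 0; i < PROFILE_SLOTS; i++) {
        const ProfileEntry *e = &p->slots[i];
        if (e->used && (best == NULL || e->count > best->count)) {
            best = e;
        }
    }
    if (best == NULL) {
        return 0;
    }
    *target_out = best->target;
    return 1;
}
```

In this sketch, the address returned by `profiler_hottest` marks the start of the region that would be handed to the on-chip CAD tools.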
14
Warp Processing Background: Basic Idea
Step 4: On-chip CAD reads in critical region
15
Warp Processing Background: Basic Idea
Step 5: On-chip CAD converts critical region into control
data flow graph (CDFG)
[Figure: block diagram with Profiler, I Mem, µP, D, FPGA, Dynamic Part. Module (DPM), and On-chip CAD]
16
Warp Processing Background: Basic Idea
Step 6: On-chip CAD synthesizes decompiled CDFG to a
custom (parallel) circuit
17
Warp Processing Background: Basic Idea
Step 7: On-chip CAD maps circuit onto FPGA


18
Warp Processing Background: Basic Idea
Step 8: On-chip CAD replaces instructions in binary to
use hardware, causing performance and energy to
warp by an order of magnitude or more
Mov reg3, 0
Mov reg4, 0
loop:
// instructions that interact with FPGA
Ret reg4


19
Warp Processing Background: Basic Technology
  • Challenge: CAD tools normally require powerful
    workstations
  • Develop extremely efficient on-chip CAD tools
  • Requires efficient synthesis
  • Requires specialized FPGA, physical design tools
    (JIT FPGA compilation)
  • [Lysecky FCCM05/DAC04], University of
    Arizona

JIT FPGA compilation: 46x improvement, 30% perf. penalty
20
Warp Processing: Initial Results - Embedded Applications
  • Average speedup of 6.3x
  • Achieved completely transparently
  • Also, energy savings of 66%

21
Thread Warping - Overview
for (i = 0; i < 10; i++) thread_create( f, i );
  • Multi-core platforms → multi-threaded apps
  • OS schedules threads onto available µPs
  • Remaining threads added to queue
  • OS invokes on-chip CAD tools to create
    accelerators for f()
  • Thread warping: use one core to create
    accelerator for waiting threads
  • OS schedules threads onto accelerators (possibly
    dozens), in addition to µPs
  • Very large speedups possible: parallelism at
    bit, arithmetic, and now thread level too
[Figure: compiler produces binary containing f(); OS distributes threads of f() across µPs and FPGA]
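
The scheduling behaviour described on this slide can be sketched as a simple placement routine: threads go to free µPs, then to free accelerators, and the rest wait in a queue. The placement order and all names here (`Placement`, `ScheduleCounts`, `place_threads`) are illustrative assumptions, as are the core and accelerator counts used in the usage note. The slides do not specify the OS policy at this level of detail.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ON_CPU, ON_ACCEL, QUEUED } Placement;

typedef struct {
    int on_cpu;    /* threads placed on microprocessors */
    int on_accel;  /* threads placed on FPGA accelerators */
    int queued;    /* threads left waiting */
} ScheduleCounts;

/* Place num_threads threads: free µPs first, then free accelerators,
 * then the queue. out[i] receives the placement of thread i.
 * (Hypothetical policy for illustration only.) */
ScheduleCounts place_threads(int num_threads, int free_cpus,
                             int free_accels, Placement *out)
{
    assert(out != NULL);
    assert(num_threads >= 0 && free_cpus >= 0 && free_accels >= 0);
    ScheduleCounts counts = {0, 0, 0};
    for (int i = 0; i < num_threads; i++) {
        if (free_cpus > 0) {
            out[i] = ON_CPU;
            free_cpus--;
            counts.on_cpu++;
        } else if (free_accels > 0) {
            out[i] = ON_ACCEL;
            free_accels--;
            counts.on_accel++;
        } else {
            out[i] = QUEUED;
            counts.queued++;
        }
    }
    return counts;
}
```

With the slide's 10 threads and, say, 3 free µPs and no accelerators yet, the first call queues 7 threads. Once the on-chip CAD tools have created accelerators for f(), calling the routine again on the queued threads moves them onto the FPGA.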
22
Speedup from Thread Warping
  • Average 130x speedup

But, FPGA uses additional area
So we also compare to systems with 8 to 64 ARM11
uPs (FPGA size = 36 ARM11s)
  • 11x faster than 64-core system
  • Simulation pessimistic, actual results likely
    better

23
Dynamic enables Custom Communication
Problem: Best topology is application dependent
NoC (Network on a Chip) provides communication
between multiple cores
[Figure: App1 and App2 running on µPs, each shown with Bus and Mesh topologies]
24
Dynamic enables Custom Communication
[Figure: App1 and App2 on the FPGA, each shown with Bus and Mesh topologies]
Warp processing can dynamically choose topology
25
Summary
  • Warp processors
  • Achieve performance advantages of FPGAs without
    any extra effort
  • Invisible synthesis
  • Allows designers to use existing tools/languages
  • Enables dynamic hardware optimization
  • Thread warping
  • Dynamic synthesis of thread accelerators for
    multi-cores
  • Custom communication
  • Warp processing can adapt communication topology
    to needs of application or a particular workload

26
References
  • Patent
  • Warp Processor for Dynamic Hardware/Software
    Partitioning. F. Vahid, R. Lysecky, G. Stitt.
    Patent Pending, 2004.
  • Hardware/Software Partitioning of Software
    Binaries. G. Stitt and F. Vahid. IEEE/ACM
    International Conference on Computer Aided Design
    (ICCAD), 2002, pp. 164-170.
  • Warp Processors. R. Lysecky, G. Stitt, and F.
    Vahid. ACM Transactions on Design Automation of
    Electronic Systems (TODAES), 2006, Volume 11,
    Number 3, pp. 659-681.
  • Binary Synthesis. G. Stitt and F. Vahid. Accepted
    for publication in ACM Transactions on Design
    Automation of Electronic Systems (TODAES).
  • Expandable Logic. G. Stitt and F. Vahid. Submitted
    to IEEE/ACM Conference on Design Automation
    (DAC), 2007.
  • New Decompilation Techniques for Binary-level
    Co-processor Generation. G. Stitt and F. Vahid.
    IEEE/ACM International Conference on Computer
    Aided Design (ICCAD), 2005, pp. 547-554.
  • Hardware/Software Partitioning of Software
    Binaries: A Case Study of H.264 Decode. G. Stitt,
    F. Vahid, G. McGregor, B. Einloth. IEEE/ACM/IFIP
    International Conference on Hardware/Software
    Codesign and System Synthesis (CODES/ISSS), 2005,
    pp. 285-290.
  • A Decompilation Approach to Partitioning Software
    for Microprocessor/FPGA Platforms. G. Stitt and
    F. Vahid. IEEE/ACM Design Automation and Test in
    Europe (DATE), 2005, pp. 396-397.
  • Dynamic Hardware/Software Partitioning: A First
    Approach. G. Stitt, R. Lysecky, and F. Vahid.
    IEEE/ACM Conference on Design Automation (DAC),
    2003, pp. 250-255.