HSRA: High-Speed, Hierarchical Synchronous Reconfigurable Array

1
HSRA: High-Speed, Hierarchical Synchronous
Reconfigurable Array
  • William Tsu, Kip Macy, Atul Joshi, Randy Huang,
    Norman Walker, Tony Tung, Omid Rowhani, Varghese George,
    John Wawrzynek, and André DeHon

BRASS Project, University of California at Berkeley
2
Myth
  • FPGAs inherently run at an order of magnitude
    lower clock rates than microprocessors.

3
Don't Believe It!
  • Example: XC4000XL-09 (0.35 µm)
  • Minimum clock low/high: 2.3 ns → 4.6 ns cycle
  • Composing:
    • clock→Q: 1.5 ns
    • interconnect budget: 1.5 ns
    • logic→clock setup: 1.6 ns
    • total: 4.6 ns

Also: Von Herzen (FPGA '97), XC3100-09 → 4 ns
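
The cycle budget on this slide is a plain sum, and it converts directly into a clock rate. A minimal sketch in Python, using only the three delays quoted for the XC4000XL-09; the constant and function names are illustrative and do not come from the presentation:

```python
# Delays quoted on the slide for the XC4000XL-09, in nanoseconds.
CLOCK_TO_Q_NS = 1.5      # clock -> Q
INTERCONNECT_NS = 1.5    # interconnect budget
SETUP_NS = 1.6           # logic -> clock setup


def cycle_time_ns(clock_to_q, interconnect, setup):
    """Minimum cycle for one register-to-register hop through one logic block."""
    return clock_to_q + interconnect + setup


def frequency_mhz(cycle_ns):
    """Convert a cycle time in nanoseconds to a clock rate in MHz."""
    return 1000.0 / cycle_ns


cycle = cycle_time_ns(CLOCK_TO_Q_NS, INTERCONNECT_NS, SETUP_NS)
print(f"cycle = {cycle:.1f} ns, clock = {frequency_mhz(cycle):.0f} MHz")
```

The three delays add up to the 4.6 ns cycle, which corresponds to a clock of roughly 217 MHz.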
4
Cycle Comparison
FPGA cycles comparable to contemporary
microprocessors.
5
Outline
  • FPGA cycle times
  • Why low frequency?
  • Architecture and CAD for high frequency
  • HSRA
  • Experiments
  • Assessment

6
Why do FPGA designs run slowly?
  • Few designs run at 200 MHz...
  • 1. Limited application/user requirements
  • 2. Cyclic data dependencies
  • 3. Poor tool support
  • 4. Long interconnect delays
  • 5. Pipelining expensive?

7
HSRA
  • High-Speed, Hierarchical Synchronous
    Reconfigurable Array
  • Attacks architecture and CAD impediments
  • pipeline the interconnect (4)
  • balance retiming resources (5)
  • CAD for auto retiming (3)

8
HSRA Architecture
9
Pipelined Interconnect
10
Input Retiming
11
Flop Experiment 1
  • Pipeline and retime to single LUT delay per cycle
  • MCNC benchmarks to 256 4-LUTs
  • no interconnect accounting
  • average 1.7 registers/LUT (some circuits 2–7)
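
To make the registers/LUT metric concrete, here is a toy sketch of pipelining a netlist down to one LUT delay per cycle. It is not the authors' tool or their exact counting method. It assumes an acyclic netlist, places each LUT at its longest-path level, shares one register chain per LUT output across all consumers, and ignores balancing registers on primary inputs. The netlist and all names are hypothetical.

```python
from collections import defaultdict


def asap_levels(fanin):
    """Level of each LUT = longest path (in LUTs) from any primary input.

    `fanin` maps a LUT name to the names feeding it. Names that never
    appear as keys are treated as primary inputs at level 0.
    The netlist is assumed to be acyclic.
    """
    levels = {}

    def level(node):
        if node not in fanin:
            return 0
        if node not in levels:
            levels[node] = 1 + max((level(src) for src in fanin[node]), default=0)
        return levels[node]

    for node in fanin:
        level(node)
    return levels


def registers_per_lut(fanin):
    """Average length of the register chain needed on each LUT output."""
    levels = asap_levels(fanin)
    chain = defaultdict(int)
    for node, sources in fanin.items():
        for src in sources:
            # A consumer d levels later needs the value delayed by d cycles.
            delay = levels[node] - levels.get(src, 0)
            chain[src] = max(chain[src], delay)
    # LUTs with no consumers still get one output register.
    total = sum(max(chain[node], 1) for node in fanin)
    return total / len(fanin)


# Hypothetical four-LUT netlist: x and y are primary inputs.
example = {"a": ["x", "y"], "b": ["a", "x"], "c": ["a", "b"], "d": ["c", "a"]}
print(asap_levels(example))
print(registers_per_lut(example))
```

In this example LUT `a` feeds consumers one, two, and three levels later, so its output needs a three-register chain, while the other LUTs need one register each. Long skips like that are what push the average above one register per LUT.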

12
Add Interconnect Delays
13
Flop Experiment 2
  • Pipeline and retime to HSRA cycle
  • place on HSRA
  • single LUT or interconnect domain
  • same MCNC benchmarks
  • average 4.7 registers/LUT

14
Input Depth Optimization
  • Real design, fixed input retiming depth
  • truncate deeper and allocate additional logic
    blocks

15
Assessment
  • Cost
  • our designs: 1.5× area of no pipelining
  • plausible ballpark for other designs
  • w/ 8-deep retiming, 20% BLB overhead
  • total: 1.8× area
  • Running LUT→LUT delay on FPGA
  • 70% overhead for retiming
  • frequency still varies with interconnect
  • Benefits
  • 2–17× higher frequency operation than
    unpipelined

→ Net Area-Time win, automation/consistency
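
The area-time claim can be checked with a few lines of arithmetic. The sketch below reads the slide's figures as a 1.5× base area, a 20% BLB overhead for 8-deep retiming, and a 2 to 17× frequency gain. It also assumes that time per result scales inversely with clock frequency, so added pipeline latency is ignored. The names are illustrative.

```python
def area_time_ratio(area_factor, frequency_gain):
    """Area-time product relative to the unpipelined design (below 1.0 is a win)."""
    return area_factor / frequency_gain


BASE_AREA_FACTOR = 1.5   # pipelined area relative to no pipelining
BLB_OVERHEAD = 0.20      # assumed reading: 20% overhead for 8-deep retiming
total_area = BASE_AREA_FACTOR * (1 + BLB_OVERHEAD)

print(f"total area = {total_area:.1f}x")
for gain in (2, 17):
    ratio = area_time_ratio(total_area, gain)
    print(f"{gain}x faster: area-time = {ratio:.2f} of unpipelined")
```

Under these assumptions the total area comes to 1.8×, and the area-time product falls below that of the unpipelined design across the whole quoted range: about 0.9 at a 2× speedup and about 0.11 at 17×.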
16
Summary
  • No inherent reasons for FPGAs/RC arrays to run
    slower than microprocessors
  • Current FPGAs lack architectural and CAD support
    to reliably achieve high clock rates
  • HSRA demonstrates how to attack problems
  • retiming balance
  • interconnect pipelining
  • automated retiming