Integrating Post-programmability Into the High-level Synthesis Equation* - PowerPoint PPT Presentation

About This Presentation
Description:

This is work done by Kevin Fan and Manjunath Kudlur at UM. University of Michigan ... 6. Memory for decoded control. University of Michigan ... – PowerPoint PPT presentation

Slides: 16
Provided by: fank

Transcript and Presenter's Notes


1
Integrating Post-programmability Into the
High-level Synthesis Equation
  • Scott Mahlke
  • Advanced Computer Architecture Laboratory
  • University of Michigan
  • Ann Arbor, MI USA
  • This is work done by Kevin Fan and Manjunath
    Kudlur at UM

2
Application Engines Differentiate Consumer SoCs
Slide Courtesy of Synfora
3
The HLS Equation
  • What about programmability?
  • How to deal with application changes?
  • Time to market

4
Substrate Determines Programmability
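[Figure: flexibility versus area or power across implementation substrates. Substrate labels: Embedded Processor; DSP (e.g. TI 320CXX); Reconfigurable Processors (Maia); Embedded FPGA; Direct Mapped Hardware. Efficiency annotations: 0.5-5 MOPS/mW, 10-100 MOPS/mW, 100-1000 MOPS/mW. Overall span: factor of 100-1000.]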
5
How Much Programmability?
Just Enough!
6
StreamRoller Approach
[Figure: an application broken into Loop 1, a "Frame Type?" decision, Loop 2, Loop 3, Loop 4, and Block 5.]
7
LA Programmability Shortcomings
8
Programmable Loop Accelerator
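[Figure: programmable loop accelerator datapath. Labels: CRF, literals, point-to-point connections, bus, control memory, local memory, function units (two arithmetic units, MEM, BR), control signals, and four RR blocks.]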
9
Mapping New Loops onto a PLA
[Figure: mapping flow. Inputs: loop, machine description. Stages: move insertion, SMT scheduling, register allocation, control signals. Feedback step: increment II.]
  • Large search space, few solutions
  • Op-centric approaches unable to find solutions
  • Satisfiability Modulo Theories (SMT) formulation to
    solve linear and SAT constraints simultaneously
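
The "increment II" loop around the scheduler can be illustrated with a small sketch. This is not the SMT formulation from the talk: it swaps in a plain exhaustive search, and the operation names, latencies, function-unit counts, `max_ii` bound, and search horizon are all hypothetical values chosen for illustration. It only shows the two kinds of constraints being checked together (linear dependence constraints and modulo resource constraints) and the retry at a larger initiation interval (II) when no schedule exists.

```python
from itertools import product

# Hypothetical loop body: op name -> (function-unit type, latency in cycles).
OPS = {"load": ("MEM", 2), "add": ("ALU", 1), "mul": ("ALU", 2), "store": ("MEM", 1)}
# Dependences: (producer, consumer, iteration distance).
DEPS = [("load", "add", 0), ("add", "mul", 0), ("mul", "store", 0)]
# Hypothetical machine description: number of units of each type.
FU_COUNT = {"MEM": 1, "ALU": 1}


def find_schedule(ops, deps, fu_count, ii, horizon):
    """Exhaustively search for start times that are legal at a fixed II."""
    names = list(ops)
    for times in product(range(horizon), repeat=len(names)):
        t = dict(zip(names, times))
        # Dependence constraints: consumer starts after producer finishes,
        # allowing `dist` iterations of slack (each worth II cycles).
        if any(t[dst] + ii * dist < t[src] + ops[src][1] for src, dst, dist in deps):
            continue
        # Resource constraints: ops sharing a unit type must not exceed the
        # unit count in any slot of the modulo reservation table.
        usage = {}
        legal = True
        for name in names:
            key = (ops[name][0], t[name] % ii)
            usage[key] = usage.get(key, 0) + 1
            if usage[key] > fu_count[ops[name][0]]:
                legal = False
                break
        if legal:
            return t
    return None


def map_loop(ops, deps, fu_count, max_ii=8):
    """Try II = 1, 2, ... and return (II, schedule) for the first II that works."""
    for ii in range(1, max_ii + 1):
        horizon = ii + sum(latency for _, latency in ops.values())
        schedule = find_schedule(ops, deps, fu_count, ii, horizon)
        if schedule is not None:
            return ii, schedule
    return None


if __name__ == "__main__":
    print(map_loop(OPS, DEPS, FU_COUNT))
```

With one MEM unit and one ALU, two memory operations cannot share a single slot, so II = 1 fails and the search succeeds at II = 2. Exhaustive search is only workable for toy loops like this one; the slide's point is that a real search space is large with few solutions, which is the motivation it gives for the SMT formulation.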

10
Area Comparison (130nm Library)
LA: single-function accelerator; PLA: programmable
accelerator; OR1K: OR-1200 processor
11
Power Comparison
1.0 = power of single-function LA; OR1K-equiv =
performance-equivalent processor
12
Efficiency Comparison
[Figure: efficiency comparison chart, with levels marked at 200, 20, and 2 MIPS/mW.]
13
Programmability Assessment
Number of algorithm perturbations tolerated while
maintaining the same performance
14
Final Thoughts
  • Programmability not an all or nothing issue
  • Application accelerators need to be able to
    evolve
  • HLS-targeted design generalizations yield a
    highly customized but semi-programmable ASIC
  • Bottom line tradeoffs
      • PLA vs. OR-1200: 4-34x more power efficient, 30x
        smaller
      • PLA vs. ASIC: 2-9x worse power, 2x larger
  • Cost breakdown
      • Addressable register storage and generalized FUs
        most costly
      • Interconnect extensions less costly

15
For More Information
  • Modulo Scheduling for Highly Customized
    Datapaths to Increase Hardware Reusability, K.
    Fan, H. Park, M. Kudlur, and S. Mahlke, Proc.
    2008 International Symposium on Code Generation
    and Optimization, Apr. 2008, pp. 124-133.
  • Orchestrating the Execution of Stream Programs
    on Multicore Platforms, M. Kudlur and S. Mahlke,
    Proc. ACM SIGPLAN 2008 Conference on Programming
    Language Design and Implementation, Jun. 2008.

http://cccp.eecs.umich.edu