Interprocedural Optimization for Dynamic Hardware Configurations - PowerPoint PPT Presentation

1
Interprocedural Optimization for Dynamic Hardware Configurations
Elena Moscu Panainte
Koen Bertels
Stamatis Vassiliadis
  • Computer Engineering
  • TU DELFT
  • The Netherlands

2
OUTLINE
  • Background
  • Molen machine organization
  • Molen programming paradigm
  • Molen compiler
  • Challenges
  • Interprocedural Optimization Algorithm
  • Results
  • Conclusions

3
The Molen Machine Organization
  • Main components:
  • General-purpose processor (GPP)
  • Reconfigurable Processor
  • Arbiter
  • Exchange Registers

4
The Molen Programming Paradigm (I)
  • A one-time architectural extension of a few
    instructions
  • Two instructions for controlling the FPGA:
  • SET <address> for hardware configuration
  • EXECUTE <address> for controlling the execution
    on the FPGA
  • Two move instructions for passing values to and
    from the FPGA
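
The four instructions above can be illustrated with a toy software model. This is only a sketch of the paradigm, not the Molen implementation: the class `ToyMolen`, its method names, and the use of exchange register 0 for the result are assumptions made for illustration.

```python
class ToyMolen:
    """Toy model of the paradigm: one reconfigurable unit plus exchange registers."""

    def __init__(self, kernels):
        self.kernels = kernels    # hypothetical table: operation name -> Python function
        self.configured = None    # operation currently configured on the FPGA
        self.xregs = {}           # exchange registers: index -> value
        self.set_count = 0        # number of reconfigurations performed

    def set(self, op):
        """SET <address>: configure the hardware for `op`."""
        self.configured = op
        self.set_count += 1

    def move_to_xreg(self, index, value):
        """Move instruction: pass a value from the GPP to the FPGA."""
        self.xregs[index] = value

    def execute(self, op):
        """EXECUTE <address>: run `op`, which must already be configured."""
        if self.configured != op:
            raise RuntimeError(f"{op} is not configured; issue SET first")
        # Assumption of this sketch: the result is returned in exchange register 0.
        self.xregs[0] = self.kernels[op](self.xregs)

    def move_from_xreg(self, index):
        """Move instruction: pass a value from the FPGA back to the GPP."""
        return self.xregs[index]


def sad(xregs):
    """Sum of absolute differences over the two operands in registers 1 and 2."""
    # Real hardware would receive addresses; lists keep the sketch short.
    return sum(abs(a - b) for a, b in zip(xregs[1], xregs[2]))


machine = ToyMolen({"sad": sad})
machine.move_to_xreg(1, [1, 5, 9])
machine.move_to_xreg(2, [2, 3, 9])
machine.set("sad")
machine.execute("sad")
result = machine.move_from_xreg(0)  # |1-2| + |5-3| + |9-9| = 3
```

The model makes the ordering constraint explicit: EXECUTE fails unless a matching SET has been issued, so every SET that can be issued earlier or less often reduces the reconfiguration count.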

5
The Molen Compiler
[Figure: Molen compiler structure. Labels: Compiler, FCCM, C application (File_n.c, MAIN.c), SUIF frontend, Molen Extensions, Machine SUIF backend framework, ISA extension (SET/EXEC), Register extension, PowerPC backend, Molen Optimizations]
6
OUTLINE
  • Background
  • Molen machine organization
  • Molen programming paradigm
  • Molen compiler
  • Challenges
  • Interprocedural Optimization Algorithm
  • Results
  • Conclusions

7
Challenges
  • One main shortcoming of current FCCMs:
  • huge reconfiguration latency (for the SET
    instruction)
  • MPEG2 encoder performance estimation:
  • Kernel speedups (10-100x)
  • overall performance decrease ( 100x)
  • Challenge: hide the reconfiguration latency
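
Why a fast kernel can still slow down the whole application is easiest to see with a small cost model. The sketch below is an illustration only: the function `total_time` and every number in it are hypothetical, not measurements from the presentation.

```python
def total_time(other_time, calls, sw_time_per_call, speedup, reconfig_time, sets_executed):
    """Estimated run time (microseconds) with the kernel on the FPGA.

    other_time       -- time spent in code that stays on the GPP
    calls            -- number of kernel invocations
    sw_time_per_call -- software time of one kernel invocation
    speedup          -- kernel speedup on the FPGA
    reconfig_time    -- latency of one SET (reconfiguration)
    sets_executed    -- how many SET instructions are executed
    """
    hw_time_per_call = sw_time_per_call / speedup
    return other_time + calls * hw_time_per_call + sets_executed * reconfig_time


# Hypothetical workload: 1000 kernel calls of 1 ms each, 10x kernel speedup,
# 50 ms per reconfiguration, 500 ms of remaining GPP code.
pure_software = 500_000 + 1000 * 1000
set_before_every_call = total_time(500_000, 1000, 1000, 10, 50_000, sets_executed=1000)
set_once = total_time(500_000, 1000, 1000, 10, 50_000, sets_executed=1)
```

With these assumed numbers, issuing SET before every call is far slower than pure software, while a single SET makes the hardware version faster. Reducing the number of executed SET instructions is what the interprocedural optimization targets.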

8
Solutions
  • Hardware solutions:
  • Partial configurations
  • Configuration prefetching
  • Compiler solution:
  • Scheduling of SET instructions
  • Software solution:
  • Application rewriting (code transformation)

9
OUTLINE
  • Background
  • Molen machine organization
  • Molen programming paradigm
  • Molen compiler
  • Challenges
  • Interprocedural Optimization Algorithm
  • Results
  • Conclusions

10
Motivational Example
Goal: anticipation of SET instructions at the
interprocedural level
Call sub-graph for MPEG2 encoder
Initial: SAD 117084, DCT 1152, IDCT 1152
Final: SAD 1, DCT 1, IDCT 1
11
Motivational Example
Call sub-graph for MPEG2 encoder
12
Step 1: Construction of the Call Graph
  • We use the suifbrowser package
  • No indirect procedure calls
  • The call graph is a DAG
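
A call graph with the DAG property can be sketched as an adjacency map and checked with a topological sort. This is not the suifbrowser output: the graph below is a simplified, hypothetical excerpt of the MPEG2 encoder in which each leaf procedure stands for one hardware operation, and `bottom_up_order` is a helper name introduced here.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical, simplified call sub-graph: caller -> list of callees.
CALL_GRAPH = {
    "putseq": ["motion_estimation", "transform", "itransform"],
    "motion_estimation": ["sad"],
    "transform": ["dct"],
    "itransform": ["idct"],
    "sad": [],
    "dct": [],
    "idct": [],
}


def bottom_up_order(call_graph):
    """Return the procedures with callees before callers.

    Raises ValueError when the graph contains a cycle (recursion),
    because the analysis assumes the call graph is a DAG.
    """
    try:
        # TopologicalSorter expects node -> predecessors; passing the callees
        # as predecessors yields an order in which callees come first.
        return list(TopologicalSorter(call_graph).static_order())
    except CycleError as err:
        raise ValueError(f"call graph is not a DAG: {err.args[1]}") from err


order = bottom_up_order(CALL_GRAPH)
```

The callee-first order is convenient for the backward propagation of the next step, since every callee is processed before its callers.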

MPEG2 Encoder
13
Step 2: Propagation of Hardware Reconfigurations
  • Interprocedural data-flow analysis
  • Backward propagation
  • For each procedure, compute LRMOD and RMOD
  • $\mathrm{LRMOD}(p) = \begin{cases} R_{op}, & \text{if } p \text{ is executed on the FPGA} \\ \emptyset, & \text{otherwise} \end{cases}$
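
The backward propagation can be sketched as follows. The transcript does not give the RMOD equation, so the sketch assumes the usual MOD-style definition: RMOD(p) is LRMOD(p) joined with the RMOD sets of all procedures that p calls. The call graph, the set `HW_PROCS`, and the convention that an operation is named after its procedure are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified call sub-graph: caller -> list of callees.
CALL_GRAPH = {
    "putseq": ["motion_estimation", "transform", "itransform"],
    "motion_estimation": ["sad"],
    "transform": ["dct"],
    "itransform": ["idct"],
    "sad": [],
    "dct": [],
    "idct": [],
}
HW_PROCS = {"sad", "dct", "idct"}  # procedures assumed to execute on the FPGA


def compute_rmod(call_graph, hw_procs):
    """Return (lrmod, rmod) for every procedure in the call graph."""
    # LRMOD(p): the operation of p if p runs on the FPGA, else the empty set.
    lrmod = {p: ({p} if p in hw_procs else set()) for p in call_graph}
    rmod = {}
    # Callees are visited before callers, so rmod[callee] is always ready.
    for p in TopologicalSorter(call_graph).static_order():
        rmod[p] = set(lrmod[p])
        for callee in call_graph[p]:
            rmod[p] |= rmod[callee]  # assumed rule: union over all callees
    return lrmod, rmod


lrmod, rmod = compute_rmod(CALL_GRAPH, HW_PROCS)
```

Under these assumptions, RMOD of a procedure lists every hardware operation that may be needed somewhere below it in the call graph.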


14
(No Transcript)
15
Step 3: Conflict Propagation and Instruction
Scheduling
  • Compute CF for each procedure
  • for each edge <pi, pj> in the call graph
  •   for each op in CF(pi) ∩ (RMOD(pj) - CF(pj))
  •     insert SET op in pi where pj is called
  • for each op in RMOD(root) - CF(root)
  •   insert SET op at the application entry point
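
The insertion loop of this step can be sketched directly from the pseudocode. The transcript does not show how CF is computed, so the sketch takes CF as a given input; the function `schedule_sets`, the simplified call graph, and both CF assignments are hypothetical.

```python
# Hypothetical, simplified call sub-graph and RMOD sets for it.
CALL_GRAPH = {
    "putseq": ["motion_estimation", "transform", "itransform"],
    "motion_estimation": ["sad"],
    "transform": ["dct"],
    "itransform": ["idct"],
    "sad": [],
    "dct": [],
    "idct": [],
}
RMOD = {
    "putseq": {"sad", "dct", "idct"},
    "motion_estimation": {"sad"},
    "transform": {"dct"},
    "itransform": {"idct"},
    "sad": {"sad"},
    "dct": {"dct"},
    "idct": {"idct"},
}


def schedule_sets(call_graph, rmod, cf, root):
    """Return (call-site insertions, entry-point insertions).

    A call-site insertion (pi, pj, op) means: insert SET op in pi where pj is called.
    """
    at_call_sites = []
    for pi, callees in call_graph.items():
        for pj in callees:
            # op conflicts in pi but is conflict-free inside pj.
            for op in sorted(cf[pi] & (rmod[pj] - cf[pj])):
                at_call_sites.append((pi, pj, op))
    # Operations with no conflict at the root are configured once, at entry.
    at_entry = sorted(rmod[root] - cf[root])
    return at_call_sites, at_entry


# Case 1 (assumed): the three operations conflict with each other in putseq.
cf_conflict = {p: set() for p in CALL_GRAPH}
cf_conflict["putseq"] = {"sad", "dct", "idct"}
sites_conflict, entry_conflict = schedule_sets(CALL_GRAPH, RMOD, cf_conflict, "putseq")

# Case 2 (assumed): no conflicts anywhere.
cf_none = {p: set() for p in CALL_GRAPH}
sites_none, entry_none = schedule_sets(CALL_GRAPH, RMOD, cf_none, "putseq")
```

In the first case each SET lands in putseq just before the call that needs it; in the second case every operation is configured once at the application entry point.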

16
putseq:
  ...
  SET sad
  call motion_estimation
  SET dct
  call transform
  ...
  SET idct
  call itransform
17
OUTLINE
  • Background
  • Molen machine organization
  • Molen programming paradigm
  • Molen compiler
  • Challenges
  • Interprocedural Optimization Algorithm
  • Results
  • Conclusions

18
Experimental Setup
  • M-JPEG encoder
  • Input: 30 frames from the tennis sequence, 256x256
  • Hardware operations: DCT, Quantization, VLC
  • MPEG2 encoder
  • Input: 3 standard test frames
  • Hardware operations: SAD, DCT, IDCT

19
Results (I)
  • M-JPEG encoder

20
Results (II)
  • MPEG2 encoder

21
Conclusions
  • The proposed interprocedural optimization can
    significantly reduce the number of performed
    reconfigurations
  • The anticipation of the SET instructions will
    allow the hardware reconfigurations to be
    performed in parallel with the GPP execution
  • Ongoing work:
  • Develop algorithms for optimal FPGA area
    allocation for reconfigurable operations

22
Thank you!
?