Title: Interprocedural Optimization for Dynamic Hardware Configurations
1. Interprocedural Optimization for Dynamic Hardware Configurations
Elena Moscu Panainte
Koen Bertels
Stamatis Vassiliadis
- Computer Engineering
- TU DELFT
- The Netherlands
2. OUTLINE
- Background
- Molen machine organization
- Molen programming paradigm
- Molen compiler
- Challenges
- Interprocedural Optimization Algorithm
- Results
- Conclusions
3. The Molen Machine Organization
- Main components
- GPP
- Reconfigurable Processor
- Arbiter
- Exchange Registers
4. The Molen Programming Paradigm (I)
- A one-time architectural extension of a few instructions
- Two instructions for controlling the FPGA
  - SET <address> for hardware configuration
  - EXECUTE <address> for controlling the execution on the FPGA
- Two move instructions for passing values to and from the FPGA
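The four-instruction extension can be illustrated with a minimal sketch of how one call to a hardware-executed operation might be lowered. The slide does not name the move instructions or the exchange-register layout, so `MOVTX`, `MOVFX`, the `XRn` register names, and the function `lower_hw_call` are placeholder assumptions chosen only for illustration.

```python
def lower_hw_call(op_address, arg_regs, result_reg):
    """Return pseudo-assembly for one call executed on the FPGA.

    Hypothetical lowering: mnemonics MOVTX/MOVFX and exchange registers
    XR0, XR1, ... are illustrative placeholders, not taken from the slides.
    """
    code = [f"SET {op_address}"]                  # configure the hardware
    for i, reg in enumerate(arg_regs):
        code.append(f"MOVTX XR{i}, {reg}")        # pass a value to the FPGA
    code.append(f"EXECUTE {op_address}")          # run the configured operation
    code.append(f"MOVFX {result_reg}, XR0")       # read the result back
    return code


if __name__ == "__main__":
    for line in lower_hw_call("sad", ["r4", "r5"], "r3"):
        print(line)
```

In this sketch the SET is emitted right before the call, which is exactly the placement whose reconfiguration cost the rest of the talk tries to hide.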
5. The Molen Compiler
- Input: C application (File_n.c, MAIN.c)
- SUIF frontend
- Machine SUIF backend framework
- Molen extensions
  - ISA extension (SET/EXEC)
  - Register extension
  - PowerPC backend
  - Molen optimizations
- Target: FCCM
7. Challenges
- One main shortcoming of current FCCMs
  - huge reconfiguration latency (for the SET instruction)
- MPEG2 encoder performance estimation
  - kernel speedups (10-100x)
  - overall performance decrease (100x)
- Challenge: hide the reconfiguration latency
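A small back-of-the-envelope model shows how a large kernel speedup can still turn into an overall slowdown when every call pays for a reconfiguration. All numbers below (call count, per-call times, reconfiguration cost) are made-up assumptions for illustration, not measurements from the talk.

```python
def total_time(calls, exec_time, reconfig_time=0.0, reconfigs=0):
    """Total kernel time: execution of all calls plus all reconfigurations."""
    return calls * exec_time + reconfigs * reconfig_time


# Hypothetical figures, in arbitrary time units.
calls = 1000          # kernel invocations
sw_kernel = 100.0     # software execution time per call
hw_kernel = 1.0       # hardware execution time per call (100x kernel speedup)
reconfig = 10_000.0   # cost of one SET (reconfiguration)

sw_total = total_time(calls, sw_kernel)
hw_naive = total_time(calls, hw_kernel, reconfig, reconfigs=calls)  # SET before every call
hw_hoisted = total_time(calls, hw_kernel, reconfig, reconfigs=1)    # a single SET

if __name__ == "__main__":
    print("software only      :", sw_total)
    print("SET before each call:", hw_naive)
    print("single SET          :", hw_hoisted)
```

With these assumed values, reconfiguring before every call is far slower than pure software, while a single reconfiguration keeps most of the kernel speedup.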
8. Solutions
- Hardware solutions
  - Partial configurations
  - Configuration prefetching
- Compiler solution
  - Scheduling of SET instructions
- Software solution
  - Application rewriting (code transformation)
10. Motivational Example
Goal: anticipation of SET instructions at the interprocedural level
Initial: SAD 117084, DCT 1152, IDCT 1152
Call sub-graph for MPEG2 encoder
Final: SAD 1, DCT 1, IDCT 1
11. Motivational Example
Call sub-graph for MPEG2 encoder
12. Step 1: Construction of the Call Graph
- We use the suifbrowser package
- No indirect procedure calls
- The call graph is a DAG
MPEG2 Encoder
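Because there are no indirect calls and the call graph is a DAG, procedures can be visited callee-first, which is what a backward propagation needs. The sketch below uses Python's standard `graphlib` on a simplified, assumed fragment of the MPEG2 encoder call graph (only `putseq` and three callees); the helper name `bottom_up_order` is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Simplified, assumed fragment of the call graph: caller -> callees.
call_graph = {
    "putseq": ["motion_estimation", "transform", "itransform"],
    "motion_estimation": [],
    "transform": [],
    "itransform": [],
}


def bottom_up_order(graph):
    """Return procedures with every callee before its callers.

    TopologicalSorter treats the mapped values as predecessors, so a
    caller -> callees mapping yields a callee-first order.
    Raises ValueError if the graph has a cycle (recursion).
    """
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("call graph is not a DAG") from exc


if __name__ == "__main__":
    print(bottom_up_order(call_graph))
```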
13. Step 2: Propagation of Hardware Reconfigurations
- Interprocedural data-flow analysis
- Backward propagation
- For each procedure, compute LRMOD and RMOD
  - LRMOD(p) = Rop, if p is executed on the FPGA
  - LRMOD(p) = Ø, otherwise
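The backward propagation can be sketched as follows. The transcript does not preserve the RMOD equation, so this sketch assumes the usual summary form: RMOD(p) is LRMOD(p) united with the RMOD sets of all procedures called by p. The call graph is a simplified, assumed fragment, and the leaf procedures `sad`, `dct`, and `idct` stand in for the operations executed on the FPGA.

```python
def compute_rmod(call_graph, lrmod):
    """Propagate reconfiguration sets from callees back to callers.

    Assumed equation (not preserved in the slides):
        RMOD(p) = LRMOD(p) | union of RMOD(q) for every callee q of p
    The call graph must be a DAG (no recursion).
    """
    rmod = {}

    def visit(p):
        if p not in rmod:
            result = set(lrmod.get(p, ()))
            for callee in call_graph.get(p, ()):
                result |= visit(callee)
            rmod[p] = result
        return rmod[p]

    for p in call_graph:
        visit(p)
    return rmod


# Simplified, assumed call graph fragment: caller -> callees.
call_graph = {
    "putseq": ["motion_estimation", "transform", "itransform"],
    "motion_estimation": ["sad"],
    "transform": ["dct"],
    "itransform": ["idct"],
    "sad": [], "dct": [], "idct": [],
}
# LRMOD is non-empty only for procedures executed on the FPGA.
lrmod = {"sad": {"sad"}, "dct": {"dct"}, "idct": {"idct"}}

if __name__ == "__main__":
    for proc, ops in compute_rmod(call_graph, lrmod).items():
        print(proc, sorted(ops))
```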
15. Step 3: Conflict Propagation and Instruction Scheduling
- Compute CF for each procedure
- for each edge <pi,pj> in the call graph
  - for each op that is in CF(pi) and in RMOD(pj) - CF(pj)
    - insert SET op in pi where pj is called
- for each op in RMOD(root) - CF(root)
  - insert SET op at the application entry point
16. putseq
...
SET sad
call motion_estimation
SET dct
call transform
...
SET idct
call itransform
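The scheduling rule of Step 3 can be sketched on the `putseq` example. The slides do not preserve the definition of CF, so this sketch assumes that CF(p) contains the operations in RMOD(p) that conflict with another operation in RMOD(p), and that conflicts are given as unordered pairs of operations that cannot be configured on the FPGA at the same time. The function names and the input data are hypothetical.

```python
from itertools import combinations


def conflict_set(rmod_p, conflicts):
    """Assumed CF(p): ops in RMOD(p) that conflict with another op in RMOD(p)."""
    return {a for a in rmod_p for b in rmod_p
            if a != b and frozenset((a, b)) in conflicts}


def schedule_sets(call_graph, rmod, conflicts, root):
    """Return (SETs at application entry, SETs to insert before each call)."""
    cf = {p: conflict_set(rmod[p], conflicts) for p in call_graph}
    # Ops that never conflict below the root are configured once, at entry.
    at_entry = sorted(rmod[root] - cf[root])
    before_call = {}
    for pi, callees in call_graph.items():
        for pj in callees:
            # Ops in conflict at pi but conflict-free inside pj.
            ops = cf[pi] & (rmod[pj] - cf[pj])
            if ops:
                before_call[(pi, pj)] = sorted(ops)
    return at_entry, before_call


call_graph = {
    "putseq": ["motion_estimation", "transform", "itransform"],
    "motion_estimation": [], "transform": [], "itransform": [],
}
rmod = {
    "putseq": {"sad", "dct", "idct"},
    "motion_estimation": {"sad"},
    "transform": {"dct"},
    "itransform": {"idct"},
}
# Assumption: no two of the three operations fit on the FPGA together.
all_conflict = {frozenset(pair) for pair in combinations(["sad", "dct", "idct"], 2)}

if __name__ == "__main__":
    print(schedule_sets(call_graph, rmod, all_conflict, "putseq"))
    print(schedule_sets(call_graph, rmod, set(), "putseq"))
```

Under the all-conflict assumption the sketch places one SET before each of the three calls in `putseq`; with no conflicts at all, every SET moves to the application entry point.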
18. Experimental Setup
- M-JPEG encoder
  - input: 30 frames from tennis, 256x256
  - hardware operations: DCT, Quantization, VLC
- MPEG2 encoder
  - input: 3 standard test frames
  - hardware operations: SAD, DCT, IDCT
19. Results (I)
20. Results (II)
21. Conclusions
- The proposed interprocedural optimization can significantly reduce the number of performed reconfigurations
- The anticipation of the SET instructions will allow the hardware reconfigurations to be performed in parallel with the GPP execution
- Ongoing work
  - Develop algorithms for optimal FPGA area allocation for reconfigurable operations
22. Thank you!
?