Compiler-Based Register Name Adjustment for Low-Power Embedded Processors

Provided by: garobour
Learn more at: https://cseweb.ucsd.edu

1
Compiler-Based Register Name Adjustment for
Low-Power Embedded Processors
  • Discussion by
  • Garo Bournoutian

2
Introduction
  • Problem: High power consumption due to bit
    transitions on the instruction bus
  • Current compilers allocate registers with a minimal
    amount of spill/fill code
  • Importance: Growing number of mobile,
    battery-powered devices
  • A solution allows for longer battery life, larger
    die sizes
  • Approach: Rearrange and rename registers in code to
    minimize bit transitions

3
What is a bit transition?
  • Assembly Code        Instruction Word
  • add r3, r2, r4       0011 0010 0100
  • sub r6, r3, r5       0110 0011 0101
  • sub r3, r2, r6       0011 0010 0110
  • mul r4, r4, r5       0100 0100 0101
  • Transitions / field     7    4    5
  • Total transitions: 16
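
The per-field counts on this slide can be reproduced with a short Python sketch. It assumes, as the slide does, that only the three 4-bit register fields are compared (the opcode bits are ignored); the helper names `encode` and `transitions_per_field` are illustrative and not taken from the presentation.

```python
import re

def encode(instr):
    """Return the register numbers of one instruction, in operand order."""
    return [int(n) for n in re.findall(r"r(\d+)", instr)]

def transitions_per_field(block):
    """Count bit flips between consecutive instructions, field by field."""
    rows = [encode(i) for i in block]
    counts = [0] * len(rows[0])
    for prev, cur in zip(rows, rows[1:]):
        for field, (a, b) in enumerate(zip(prev, cur)):
            # XOR marks the bits that differ; count them (Hamming distance).
            counts[field] += bin(a ^ b).count("1")
    return counts

block = ["add r3, r2, r4", "sub r6, r3, r5", "sub r3, r2, r6", "mul r4, r4, r5"]
per_field = transitions_per_field(block)
print(per_field, sum(per_field))  # [7, 4, 5] 16
```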

4
An Example
  • The original code had a total of 16 transitions:
  • add r3, r2, r4
  • sub r6, r3, r5
  • sub r3, r2, r6
  • mul r4, r4, r5
  • The optimized code now has a total of 10
    transitions:
  • add r6, r2, r4
  • sub r7, r6, r5
  • sub r6, r2, r7
  • mul r4, r4, r5
  • Just by renaming r3 to r6 and r6 to r7, bit
    transitions drop from 16 to 10, a reduction of about 37%.
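
The renaming on this slide can be checked mechanically. The sketch below is a minimal illustration, not the presenters' tool: `rename` and `total_transitions` are hypothetical helpers, only the register fields are counted, and r7 is assumed to be unused in the original block so the renaming is safe.

```python
import re

def rename(block, mapping):
    """Apply a register renaming to every operand in a single pass."""
    def substitute(match):
        n = int(match.group(1))
        return f"r{mapping.get(n, n)}"
    return [re.sub(r"r(\d+)", substitute, instr) for instr in block]

def total_transitions(block):
    """Sum of bit flips in the register fields of consecutive instructions."""
    rows = [[int(n) for n in re.findall(r"r(\d+)", i)] for i in block]
    return sum(bin(a ^ b).count("1")
               for prev, cur in zip(rows, rows[1:])
               for a, b in zip(prev, cur))

original = ["add r3, r2, r4", "sub r6, r3, r5", "sub r3, r2, r6", "mul r4, r4, r5"]
optimized = rename(original, {3: 6, 6: 7})  # r3 -> r6, r6 -> r7

before = total_transitions(original)
after = total_transitions(optimized)
reduction = 100 * (before - after) / before
print(before, after, reduction)  # 16 10 37.5
```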

5
Formulation
  • Must map code in basic blocks to numerical
    structures
  • ld r5, (r1)0
  • add r3, r2, r5
  • add r4, r3, r2
  • mul r3, r4, r3
  • st r3, (r7)10

6
Heuristic Solution
  • Solving this problem for multiple basic blocks
    and literals is NP-complete (Traveling Salesman
    Problem)
  • An effective, efficient heuristic solution for RNA
    requires two steps
  • Register PerturBation (RPB): maximizes the
    distribution skew of register pairs
  • Register PermuTation (RPT): uses frequencies of
    register pairs to minimize Hamming distance

7
Register PerturBation
  • Commutativity Transformation
  • Applies to add, mul, and, or operations
  • No side effects on code performance
  • Linear time complexity
  • Dead Register Reassignment
  • Before             After
  • r1 ← r2, r3        r1 ← r2, r3
  • r4 ← r1, r2        r2 ← r1, r2
  • r2 ← r3, r4        r2 ← r3, r2
  • Linear time complexity
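
A minimal sketch of the commutativity transformation follows. The slide does not state the exact selection rule, so this example assumes a simple greedy criterion: swap the two source operands of a commutative operation whenever that lowers the Hamming distance to the previous instruction's source fields. The tuple layout `(op, dest, src1, src2)` and the function names are assumptions made for illustration. It makes a single pass over the block, consistent with the linear time complexity noted above.

```python
COMMUTATIVE = {"add", "mul", "and", "or"}

def hamming(a, b):
    """Number of differing bits between two register numbers."""
    return bin(a ^ b).count("1")

def swap_commutative(block):
    """block: list of (op, dest, src1, src2) tuples of register numbers."""
    out = [block[0]]
    for op, dest, s1, s2 in block[1:]:
        _, _, p1, p2 = out[-1]
        keep = hamming(p1, s1) + hamming(p2, s2)
        swap = hamming(p1, s2) + hamming(p2, s1)
        # Swapping is only legal for commutative operations (assumed greedy rule).
        if op in COMMUTATIVE and swap < keep:
            s1, s2 = s2, s1
        out.append((op, dest, s1, s2))
    return out

block = [("add", 3, 2, 4), ("add", 5, 4, 2), ("sub", 6, 4, 2)]
result = swap_commutative(block)
print(result)
```

In this input the second `add` has its sources swapped to match the first instruction, while the `sub` is left alone because subtraction is not commutative.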

8
Register PermuTation
  • Capture utilization frequency of register/literal
    pairs by means of a Register Histogram Graph (RHG)
  • Directed graph
  • Nodes: registers/literals
  • Edges: between two nodes whose registers are
    consecutive in the code
  • Iterative search finds optimal encoding between
    each pair
  • Complexity of O(E R^2), where R is the set of all
    registers and E is the number of edges
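
The RHG idea can be illustrated with a small sketch. It is not the iterative search from the presentation: it only builds the edge histogram and evaluates the weighted Hamming cost of a given encoding. The names `build_rhg` and `cost`, the omission of self-loops (which never cost a transition), and the restriction to register fields are assumptions of this example.

```python
from collections import Counter

def build_rhg(block):
    """Edge (a, b) -> how often register a is followed by b in the same field."""
    edges = Counter()
    for prev, cur in zip(block, block[1:]):
        for a, b in zip(prev, cur):
            if a != b:  # self-loops never cause a transition
                edges[(a, b)] += 1
    return edges

def cost(edges, encoding):
    """Total bit transitions implied by an encoding (register -> bit pattern)."""
    return sum(weight * bin(encoding[a] ^ encoding[b]).count("1")
               for (a, b), weight in edges.items())

# Register fields of the four-instruction example from slide 3.
block = [(3, 2, 4), (6, 3, 5), (3, 2, 6), (4, 4, 5)]
rhg = build_rhg(block)

identity = {r: r for r in range(8)}
permuted = {**identity, 3: 6, 6: 7, 7: 3}  # r3 -> r6, r6 -> r7, r7 takes 3

print(cost(rhg, identity), cost(rhg, permuted))  # 16 10
```

A search over permutations would call `cost` on candidate encodings and keep the cheapest; the edge weights let frequently occurring pairs dominate that choice.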

9
Application of Heuristic
  • Applied primarily on major application loops
  • Special care taken to preserve def-use chains
    between loops
  • Adds trivial number of instructions at hot
    spots
  • Profile information may be used to prioritize
    which order to visit basic blocks
  • Can be used within compilation system or as
    stand-alone tool operating on binary code

10
Experimental Results
(How they supported their findings)
  • Used modified version of SimpleScalar
  • Made Control Flow Graph for each Hot Spot
  • Generated Basic Block Frequencies
  • Encapsulated RPB and RPT into stand-alone module
  • Input the generated CFG into this module
  • Ran module on six different benchmarks
  • RPT improvement as high as 25%
  • RPB improvement as high as 44%

11
Conclusions
  • Presented compiler-driven, power-aware register
    name adjustment (RNA) algorithm
  • Formally defined as NP-Complete
  • Two efficient heuristics for attacking problem
  • RPB: commutativity and dead register
    reassignment
  • RPT: register pair frequencies and remappings
  • Significant power improvements resulting from
    compiler-based optimization (no additional
    hardware support needed)
  • Independent of ISA
  • Easily integrated within any compilation framework