Bitwidth Analysis with Application to Silicon Compilation

1
Bitwidth Analysis with Application to Silicon
Compilation
Amit Chaudhari
  • Paper by Mark Stephenson, Jonathan Babb, and
    Saman Amarasinghe
  • MIT Laboratory for Computer Science
  • Princeton
  • At the ACM SIGPLAN Conference on Programming
    Language Design and Implementation, Vancouver,
    British Columbia, June 2000

2
Goal
  • For a program written in a high-level language,
    automatically find the minimum number of bits
    needed to represent:
  • Each static variable in the program
  • Each operation in the program

3
Usefulness of Bitwidth Analysis
  • Higher Language Abstraction
  • Enables other compiler optimizations
  • Synthesizing application-specific processors
  • Optimizing for power-aware processors
  • Extracting more parallelism for SIMD processors

4
Bitwidth Opportunities
  • Runtime profiling reveals plenty of bitwidth
    opportunities.
  • For the SPECint95 benchmark suite,
  • Over 50% of operands use less than half the
    number of bits specified by the programmer.

5
Analysis Constraints
  • Bitwidth results must maintain program
    correctness for all input data sets
  • Results are not runtime/data dependent
  • A static analysis can do very well, even in light
    of this constraint

6
Bitwidth Extraction
  • Use abundant hints in the source language to
    discover bitwidths with near optimal precision.
  • Caveats
  • Analysis limited to fixed-point variables.
  • The hints assume source program correctness.

7
The Hints
  • Bitwidth refining constructs
  • Arithmetic operations
  • Boolean operations
  • Bitmask operations
  • Loop induction variable bounding
  • Clamping operations
  • Type castings
  • Static array index bounding

8
1. Arithmetic Operations
  • Example
  • int a;
  • unsigned b;
  • a = random();
  • b = random();
  • a = a / 2;
  • b = b >> 4;

a: 32 bits, b: 32 bits (as declared)
a: 31 bits, b: 32 bits (after a = a / 2)
a: 31 bits, b: 28 bits (after b = b >> 4)
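
The two refinements in this example can be reproduced with a small data-range calculation. The sketch below is only an illustration, not the Bitwise implementation: `bits_for_range`, `trunc_div`, `div_by_const`, and `shift_right` are hypothetical helper names, and it assumes a 32-bit two's-complement `int`, a 32-bit `unsigned`, and C's truncating division.

```python
def bits_for_range(lo, hi):
    """Smallest width that holds every integer in [lo, hi].

    Non-negative ranges are sized as unsigned; ranges that include
    negative values are sized as two's complement (assumption).
    """
    if lo >= 0:
        return max(1, hi.bit_length())
    return 1 + max(max(hi, 0).bit_length(), (-lo - 1).bit_length())


def trunc_div(x, d):
    """Integer division that truncates toward zero, as in C."""
    q = abs(x) // abs(d)
    return q if (x >= 0) == (d > 0) else -q


def div_by_const(rng, d):
    """Range of x / d for x in rng and a positive constant d."""
    lo, hi = rng
    return (trunc_div(lo, d), trunc_div(hi, d))


def shift_right(rng, k):
    """Range of x >> k for a non-negative range rng."""
    lo, hi = rng
    return (lo >> k, hi >> k)


a = (-(2**31), 2**31 - 1)   # int a = random(): assume any 32-bit signed value
b = (0, 2**32 - 1)          # unsigned b = random(): assume any 32-bit value

a = div_by_const(a, 2)      # a = a / 2
b = shift_right(b, 4)       # b = b >> 4

print(bits_for_range(*a))   # 31
print(bits_for_range(*b))   # 28
```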
9
2. Boolean Operations
  • Example
  • int a;
  • a = (b != 15);

a: 32 bits (as declared)
a: 1 bit (after a = (b != 15))
10
3. Bitmask Operations
  • Example

int a; a = random() & 0xff;
a: 32 bits (as declared)
a: 8 bits (after masking with 0xff)
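
A minimal sketch of how a mask bounds a value, assuming two's-complement operands and a non-negative constant mask. The helper names `and_with_mask` and `unsigned_bits` are hypothetical and are not taken from the paper.

```python
def unsigned_bits(hi):
    """Bits needed for a non-negative range whose maximum is hi."""
    return max(1, hi.bit_length())


def and_with_mask(rng, mask):
    """Range of (x & mask) for x in rng and a non-negative constant mask.

    The result can never exceed the mask, and for non-negative x it can
    never exceed x either.
    """
    lo, hi = rng
    if mask < 0:
        raise ValueError("this sketch only handles non-negative masks")
    upper = mask if lo < 0 else min(hi, mask)
    return (0, upper)


# int a = random() & 0xff, assuming random() may return any 32-bit value
a = and_with_mask((-(2**31), 2**31 - 1), 0xFF)
print(a, unsigned_bits(a[1]))   # (0, 255) 8
```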
11
4. Loop Induction Variable Bounding
  • Applicable to for loop induction variables.
  • Example
  • int i;
  • for (i = 0; i < 6; i++)

i: 32 bits (as declared)
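
The transcript keeps only the declared width for this slide. As a rough sketch of the idea, assuming a loop of the form `for (i = start; i < bound; i += step)` with a positive constant step and no writes to `i` in the body, the induction variable's range and width could be derived as follows. `induction_range` and `unsigned_bits` are hypothetical names.

```python
def unsigned_bits(hi):
    """Bits needed for a non-negative range whose maximum is hi."""
    return max(1, hi.bit_length())


def induction_range(start, bound, step=1):
    """Range of i over `for (i = start; i < bound; i += step)`.

    The upper end is the value i holds when the exit test fails,
    because i must be able to represent that value too.
    """
    if step <= 0:
        raise ValueError("this sketch only handles positive steps")
    if start >= bound:
        return (start, start)               # body never runs
    trips = -(-(bound - start) // step)     # ceiling division
    return (start, start + trips * step)


rng = induction_range(0, 6)
print(rng, unsigned_bits(rng[1]))   # (0, 6) 3
```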
12
5. Clamping Optimization
  • Multimedia codes often simulate saturating
    instructions.
  • Example
  • int valpred;
  • if (valpred > 32767)
  •   valpred = 32767;
  • else if (valpred < -32768)
  •   valpred = -32768;

valpred: 32 bits (as declared)
valpred: 16 bits (after clamping)
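
One way to picture the clamping rule: after the if/else chain, the value must lie between the two clamp constants, whatever it was before. This is a sketch under the assumption of a 32-bit two's-complement `int`; `clamp_range` and `signed_bits` are hypothetical names.

```python
def signed_bits(lo, hi):
    """Two's-complement width that holds every integer in [lo, hi]."""
    return 1 + max(max(hi, 0).bit_length(), (-min(lo, 0) - 1 if lo < 0 else 0).bit_length())


def clamp_range(rng, low, high):
    """Range of a value after it has been clamped to [low, high]."""
    lo, hi = rng
    return (min(max(lo, low), high), max(min(hi, high), low))


valpred = (-(2**31), 2**31 - 1)                 # any 32-bit signed value
valpred = clamp_range(valpred, -32768, 32767)   # effect of the if/else chain
print(valpred, signed_bits(*valpred))           # (-32768, 32767) 16
```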
13
6. Type Casting (Part I)
  • Example
  • int a;
  • char b;
  • a = b;

a: 32 bits, b: 8 bits (as declared)
a: 8 bits, b: 8 bits (after a = b)
14
6. Type Casting (Part II)
  • Example
  • int a;
  • char b;
  • b = a;

a: 32 bits, b: 8 bits (as declared)
a: 8 bits, b: 8 bits (after b = a)
15
7. Array Index Optimization
  • An index into an array can be set based on the
    bounds of the array.
  • Example
  • int a, b;
  • int X[1024];
  • X[a] = X[4*b];

a: 32 bits, b: 32 bits (as declared)
a: 10 bits, b: 8 bits (after bounding the indices)
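
The index bounds above follow from assuming the program never indexes outside the array, which is the "source program correctness" caveat from the Bitwidth Extraction slide. A small sketch with hypothetical helper names:

```python
def unsigned_bits(hi):
    """Bits needed for a non-negative range whose maximum is hi."""
    return max(1, hi.bit_length())


def index_range(length):
    """Valid indices of an array with `length` elements."""
    return (0, length - 1)


def scaled_operand_range(length, scale):
    """Range of b such that scale * b is a valid index (scale > 0)."""
    return (0, (length - 1) // scale)


a = index_range(1024)               # from X[a]
b = scaled_operand_range(1024, 4)   # from X[4*b]
print(a, unsigned_bits(a[1]))       # (0, 1023) 10
print(b, unsigned_bits(b[1]))       # (0, 255) 8
```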
16
Propagating Data-Ranges
  • Data-flow analysis
  • Three candidate lattices
  • Bitwidth
  • Vector of bits
  • Data-ranges

a: 4 bits
Propagating bitwidths
a = a + 1
a: 5 bits
17
Propagating Data-Ranges
  • Data-flow analysis
  • Three candidate lattices
  • Bitwidth
  • Vector of bits
  • Data-ranges

a: ??????1X
Propagating bit vectors
a = a + 1
a: ?????XXX
18
Propagating Data-Ranges
  • Data-flow analysis
  • Three candidate lattices
  • Bitwidth
  • Vector of bits
  • Data-ranges

a: <0,13>
Propagating data-ranges
a = a + 1
a: <1,14>
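
The difference between the bitwidth and data-range lattices can be seen by applying both to the same increment. The transfer functions below are simplified assumptions for illustration (`bitwidth_add` and `range_add` are hypothetical names), not the ones defined in the paper.

```python
def bitwidth_add(w1, w2):
    """Bitwidth lattice: a sum may need one carry bit beyond the wider operand."""
    return max(w1, w2) + 1


def range_add(r1, r2):
    """Data-range lattice: add the lower bounds and the upper bounds."""
    return (r1[0] + r2[0], r1[1] + r2[1])


def unsigned_bits(rng):
    """Bits needed for a non-negative range."""
    return max(1, rng[1].bit_length())


# a = a + 1, where a starts as a 4-bit value known to lie in <0,13>
print(bitwidth_add(4, 1))           # 5
r = range_add((0, 13), (1, 1))
print(r, unsigned_bits(r))          # (1, 14) 4
```

Tracking the range keeps the result at 4 bits, while tracking only the width has to grow it to 5.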
19
Propagating Data-Ranges
  • Propagate data-ranges forward and backward over
    the control-flow graph using transfer functions
    described in the paper
  • Use Static Single Assignment (SSA) form with
    extensions to
  • Gracefully handle pointers and arrays.
  • Extract data-range information from conditional
    statements.

20
Example of Data-Range Propagation
a0 = input(); a1 = a0 + 1
if (a1 < 0)
true: a2 = a1:(a1 < 0); a3 = a2 + 1
false: a4 = a1:(a1 ≥ 0); c0 = a4
a5 = φ(a3, a4); b0 = array[a5]
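
A rough sketch of forward and backward propagation over this example. Several things here are assumptions, not facts from the slides: the true branch is taken to restrict a1 to a1 < 0 and the other branch to a1 >= 0, `input()` is taken to return any 32-bit signed value, overflow is ignored, and `ARRAY_LEN` is a made-up array length. The helper names are hypothetical.

```python
INT_MIN, INT_MAX = -(2**31), 2**31 - 1
ARRAY_LEN = 10   # hypothetical; the transcript does not give the array size


def add_const(r, c):
    """Range of x + c (overflow ignored in this sketch)."""
    return (r[0] + c, r[1] + c)


def meet(r, s):
    """Intersection of two ranges."""
    return (max(r[0], s[0]), min(r[1], s[1]))


def join(r, s):
    """Smallest range covering both inputs (data-range union)."""
    return (min(r[0], s[0]), max(r[1], s[1]))


def propagate():
    full = (INT_MIN, INT_MAX)
    # Forward pass, following the SSA names in the example.
    a0 = full
    a1 = meet(add_const(a0, 1), full)
    a2 = meet(a1, (INT_MIN, -1))      # branch where a1 < 0
    a3 = add_const(a2, 1)
    a4 = meet(a1, (0, INT_MAX))       # branch where a1 >= 0
    a5 = join(a3, a4)                 # phi node
    # Backward pass: a5 indexes the array, so it must be a valid index.
    a5 = meet(a5, (0, ARRAY_LEN - 1))
    a3 = meet(a3, a5)
    a4 = meet(a4, a5)
    a2 = meet(a2, add_const(a3, -1))
    a1 = meet(a1, join(a2, a4))
    a0 = meet(a0, add_const(a1, -1))
    return {"a0": a0, "a1": a1, "a2": a2, "a3": a3, "a4": a4, "a5": a5}


result = propagate()
print(result)
```

With the assumed array length of 10, the backward pass narrows every range in the example to a handful of values.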
22
What to do with Loops?
  • Finding the fixed-point around back edges will
    often saturate data-ranges.
  • Instructions in loops make up the bulk of
    dynamically executed instructions!

23
Their Loop Solution
  • Find the closed-form solutions to commonly
    occurring sequences.
  • A sequence is a mutually dependent group of
    instructions.
  • Use the closed-form solutions to determine final
    ranges.

24
Finding the Closed-Form Solution
  • a = 0
  • for i = 1 to 10
  •   a = a + 1
  •   for j = 1 to 10
  •     a = a + 2
  •   for k = 1 to 10
  •     a = a + 3
  • ... = a + 4

26
Finding the Closed-Form Solution
  • a = 0                  <0,0>
  • for i = 1 to 10
  •   a = a + 1            <1,460>
  •   for j = 1 to 10
  •     a = a + 2          <3,480>
  •   for k = 1 to 10
  •     a = a + 3          <24,510>
  • ... = a + 4            <510,510>
  • Non-trivial to find the exact ranges

28
Finding the Closed-Form Solution
  • a = 0                  <0,0>
  • for i = 1 to 10
  •   a = a + 1            <1,460>
  •   for j = 1 to 10
  •     a = a + 2          <3,480>
  •   for k = 1 to 10
  •     a = a + 3          <24,510>
  • ... = a + 4            <510,510>
  • Can easily find conservative range of <0,510>
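
The exact ranges on this slide can be checked by simply running the loop nest and recording the value of `a` after each assignment. The sketch assumes the `j` and `k` loops are both nested directly inside the `i` loop, which is the only nesting consistent with the ranges shown. The function name is hypothetical.

```python
def simulate_loop_nest():
    """Run the example and record the range of a after each assignment."""
    seen = {"a + 1": [], "a + 2": [], "a + 3": []}
    a = 0
    for i in range(1, 11):
        a = a + 1
        seen["a + 1"].append(a)
        for j in range(1, 11):
            a = a + 2
            seen["a + 2"].append(a)
        for k in range(1, 11):
            a = a + 3
            seen["a + 3"].append(a)
    ranges = {stmt: (min(vals), max(vals)) for stmt, vals in seen.items()}
    return ranges, a


ranges, final_a = simulate_loop_nest()
print(ranges)    # {'a + 1': (1, 460), 'a + 2': (3, 480), 'a + 3': (24, 510)}
print(final_a)   # 510
```

A compiler cannot run the program like this in general, which is why the conservative range <0,510> is the practical target.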

29
Solving the Linear Sequence
  • a = 0
  • for i = 1 to 10
  •   a = a + 1
  •   for j = 1 to 10
  •     a = a + 2
  •   for k = 1 to 10
  •     a = a + 3
  • ... = a + 4
  • Figure out the iteration count of each loop.

30
Solving the Linear Sequence
  • a = 0
  • for i = 1 to 10
  •   a = a + 1            iteration count <1,10>
  •   for j = 1 to 10
  •     a = a + 2          iteration count <1,100>
  •   for k = 1 to 10
  •     a = a + 3          iteration count <1,100>
  • ... = a + 4
  • Find out how much each instruction contributes to
    the sequence using its iteration count.

31
Solving the Linear Sequence
  • a = 0
  • for i = 1 to 10
  •   a = a + 1            <1,10> * <1,1> = <1,10>
  •   for j = 1 to 10
  •     a = a + 2          <1,100> * <2,2> = <2,200>
  •   for k = 1 to 10
  •     a = a + 3          <1,100> * <3,3> = <3,300>
  • ... = a + 4

(<1,10> + <2,200> + <3,300>) ∪ <0,0> = <0,510>
  • Sum all the contributions together, and take the
    data-range union with the initial value.
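
The three steps (iteration counts, per-instruction contributions, sum and union) can be written out as a short calculation. This is a sketch of the arithmetic on the slide only. The data layout and helper names are hypothetical, and it assumes constant positive increments.

```python
def scale(count, step):
    """Contribution of one instruction: iteration-count range times its increment."""
    products = (count[0] * step, count[1] * step)
    return (min(products), max(products))


def add(r, s):
    """Sum of two data-ranges."""
    return (r[0] + s[0], r[1] + s[1])


def union(r, s):
    """Smallest range covering both inputs."""
    return (min(r[0], s[0]), max(r[1], s[1]))


# (iteration-count range, increment) for each instruction in the sequence
sequence = [
    ((1, 10), 1),     # a = a + 1 runs up to 10 times
    ((1, 100), 2),    # a = a + 2 runs up to 10 * 10 times
    ((1, 100), 3),    # a = a + 3 runs up to 10 * 10 times
]
initial = (0, 0)      # a = 0

contributions = [scale(count, step) for count, step in sequence]
total = (0, 0)
for c in contributions:
    total = add(total, c)
conservative = union(total, initial)

print(contributions)   # [(1, 10), (2, 200), (3, 300)]
print(conservative)    # (0, 510)
```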

32
Results
  • Standalone Bitwise compiler.
  • Bits cut from scalar variables
  • Bits cut from array variables
  • With the DeepC silicon compiler.

33
Percentage of Original Scalar Bits
34
Percentage of Original Array Bits
35
DeepC Compiler Targeted to FPGAs
C/Fortran program
Suif Frontend
Pointer alias and other high-level analyses
Bitwidth Analysis
MachSuif Codegen
Raw parallelization
DeepC specialization
Verilog
Traditional CAD optimizations
Physical Circuit
36
FPGA Area
[Bar chart: FPGA area in CLB count (axis from 0 to 2000) for each benchmark (main datapath width), without Bitwise and with Bitwise. Benchmarks: life (1), sor (32), intfir (32), jacobi (8), newlife (1), parity (32), adpcm (8), median (32), pmatch (32), convolve (16), intmatmul (16), histogram (16), mpegcorr (16), bubblesort (32).]
  • On average, the bitwidth-optimized circuits used
    57% less area.
37
FPGA Clock Speed (50 MHz Target)
[Bar chart: XC4000-09 clock speed in MHz (axis from 0 to 150) for each benchmark, without Bitwise and with Bitwise. Benchmarks: life, sor, intfir, parity, jacobi, adpcm, newlife, median, pmatch, convolve, intmatmul, mpegcorr, histogram, bubblesort.]
38
Power Savings
[Bar chart: average dynamic power in mW (axis from 0 to 5) without and with bitwidth analysis. Benchmarks: bubblesort, histogram, jacobi, pmatch.]
  • On average, the analysis reduced power by 50%.

39
Power Savings
  • C → ASIC
  • IBM SA27E process
  • 0.15 micron drawn
  • 200 MHz
  • Methodology
  • C → RTL
  • RTL simulation → register switching activity
  • Synthesis reports dynamic power

40
Summary
  • Bitwise: a scalable bitwidth analyzer
  • Standard data-flow analysis
  • Loop analysis
  • Incorporate pointer analysis
  • Demonstrated savings when targeting silicon from
    high-level languages
  • 57% less area
  • up to 86% improvement in clock speed
  • less than 50% of the power

41
  • Thank You