Optimizing high speed arithmetic circuits using three-term extraction (PowerPoint presentation)

Slides: 25
Provided by: jau74
Learn more at: https://cseweb.ucsd.edu


1
Optimizing high speed arithmetic circuits using three-term extraction
  • Anup Hosangadi, Ryan Kastner
    ECE Department, University of California, Santa Barbara
  • Farzan Fallah
    Fujitsu Laboratories of America

2
Outline
  • Carry Save Arithmetic
  • Related Work
  • Problem formulation
  • Algebraic methods
  • Delay aware optimization
  • Experimental results

3
Carry Save Arithmetic
  • Multi-Operand addition
  • e.g. $F = A + B + C + D + E + F$
  • Carry propagation is the major bottleneck
  • Fast adders such as the Carry Lookahead Adder (CLA) and the
    Carry Select Adder are not fast enough
  • Solution: defer carry propagation to the
    final step
  • Generate sums and carries separately
  • Treat them as separate numbers
  • Keep adding until only two numbers remain
  • Add the two numbers using a fast adder (CLA)
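The carry-save scheme described above can be illustrated with a short Python sketch. It is not taken from the slides: it assumes unbounded Python integers stand in for fixed-width bit vectors, `csa` and `multi_operand_add` are hypothetical names, and the final built-in `+` merely stands in for the fast carry-propagate adder (CLA). Each pass of the loop corresponds to one level of the CSA tree.

```python
def csa(a, b, c):
    """One carry-save adder (3:2 compressor) applied to whole words at once."""
    s = a ^ b ^ c                                 # per-bit sums, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1    # per-bit majority, weighted by 2
    return s, carry                               # invariant: s + carry == a + b + c


def multi_operand_add(operands):
    """Reduce operands level by level with CSAs until two remain, then add them."""
    nums = list(operands)
    while len(nums) > 2:
        full = len(nums) - len(nums) % 3          # operands that fill complete CSAs
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(nums[i], nums[i + 1], nums[i + 2]))
        nxt.extend(nums[full:])                   # leftovers pass to the next level
        nums = nxt
    return sum(nums)                              # final carry-propagate add (the CLA)


print(multi_operand_add([3, 5, 7, 11, 13, 17]))  # 56
```

No carry ripples through any CSA level; carries are only resolved once, in the final addition.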

4
Carry Save Arithmetic
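Delay = $3 + \log_2(M + 3)$
[Figure: CSA tree producing sum (S) and carry (C) vectors, followed by a CLA that outputs F]
Tree height ≈ $\log_{1.5}(N/2)$, N = number of operands
3 = height of the CSA tree, M = bitwidth of the operands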
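As a rough check of the tree-height formula, the sketch below counts CSA levels directly. It assumes each level applies one 3:2 CSA to every complete group of three operands and passes leftovers through unchanged; `csa_tree_height` is a hypothetical helper. For N = 6 operands it gives 3, the tree height used in the delay formula of this slide.

```python
import math


def csa_tree_height(n):
    """Levels of 3:2 CSAs needed to reduce n operands to two (carry-save form)."""
    levels = 0
    while n > 2:
        n = 2 * (n // 3) + n % 3   # each full group of 3 becomes 2; leftovers pass through
        levels += 1
    return levels


# Compare the level count with the closed form ceil(log_1.5(N / 2)).
for n in (3, 4, 6, 9, 12):
    print(n, csa_tree_height(n), math.ceil(math.log(n / 2, 1.5)))
```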
5
Carry Save arithmetic
Using ripple carry adders (RCAs)
[Figure: cascade of RCAs with bitwidths (M + 1), (M + 2), (M + 3), (M + 4), (M + 5)]
Delay = $(M + 5) + 4$
Delay through the CSA network = $3 + \log_{1.5}(M + 3)$
6
Related Work
  • Kim et al., Arithmetic optimization using Carry
    Save Adders, DAC98

7
Related Work
  • Kim et al., Optimal allocation of CSAs,
    ICCAD99
  • Delay aware CSA allocation
  • Kim et al., High performance, low power
    synthesis, DAC2000
  • Synopsys™ Behavioral Optimization for Arithmetic
    (BOA)
  • A. Verma and P. Ienne, Improved use of the carry
    save representation for the synthesis of complex
    arithmetic circuits, ICCAD2004

8
Problem formulation
  • No methodology for detecting redundancy in CSA
    computations
  • Can reduce the number of CSAs
  • Can reduce the number of wires
  • Common subexpression elimination
  • Standard compiler technique
  • Applied to 2-term arithmetic operations
  • Polynomial expressions (ICCAD04, VLSI05)
  • Constant multiplications (ASAP04, ASPDAC05)
  • CSA expressions (Common 3-term subexpressions)

9
Problem formulation
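$Y_1 = X_1 + (X_1 \ll 2) + X_2 + (X_2 \ll 1) + (X_2 \ll 2)$, $Y_2 = (X_1 \ll 2) + (X_2 \ll 2) + (X_2 \ll 3)$
$D_1 = X_1 + X_2 + (X_2 \ll 1)$ ⇒ $Y_1 = (D_{1S} + D_{1C}) + (X_1 \ll 2) + (X_2 \ll 2)$, $Y_2 = (D_{1S} + D_{1C}) \ll 2$
where $D_{1S}$ and $D_{1C}$ are the sum and carry outputs of the CSA that computes $D_1$.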
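The rewrite can be checked numerically. The following sketch (hypothetical helper names; unbounded Python integers in place of fixed-width vectors) computes D1 once with a single CSA and reuses its sum and carry vectors in both Y1 and Y2. The `+` operators in the rewritten forms only verify the arithmetic; in hardware the remaining terms would feed further CSAs.

```python
def csa(x, y, z):
    """Carry-save adder: returns (sum vector, carry vector) with s + c == x + y + z."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def check_rewrite(x1, x2):
    """Compare the original Y1, Y2 with their forms rewritten using D1."""
    y1 = x1 + (x1 << 2) + x2 + (x2 << 1) + (x2 << 2)
    y2 = (x1 << 2) + (x2 << 2) + (x2 << 3)
    d1s, d1c = csa(x1, x2, x2 << 1)          # shared divisor D1 = X1 + X2 + (X2 << 1)
    y1_new = d1s + d1c + (x1 << 2) + (x2 << 2)
    y2_new = (d1s + d1c) << 2                # Y2 is D1 shifted left by 2
    return y1 == y1_new and y2 == y2_new


print(all(check_rewrite(a, b) for a in range(256) for b in range(256)))  # True
```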
10
Algebraic methods
  • Polynomial transformation
  • $X \ll i = X L^i$
  • Detects shifted common subexpressions and also
    extends to multiple variables

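$C \cdot X = \sum_i c_i \, X L^i$, where $c_i$ are the digits of $C$
$(14)_{10} \cdot X = (1110)_2 \cdot X = (X \ll 3) + (X \ll 2) + (X \ll 1) = XL^3 + XL^2 + XL$
$(1 0 0 \bar{1} 0)_{\mathrm{CSD}} \cdot X = XL^4 - XL$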
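The slides do not say how the CSD form is obtained; one standard way, assumed here, is the non-adjacent-form recurrence below. `csd_digits` and `binary_digits` are hypothetical helpers that return the exponents i of the X·L^i terms, so the polynomial for a constant multiplication can be read off directly.

```python
def csd_digits(c):
    """(digit, i) pairs of the canonical signed digit (CSD) form of c >= 0.

    Each digit is +1 or -1 and stands for +X*L^i or -X*L^i; zero digits are omitted.
    """
    terms = []
    i = 0
    while c:
        if c & 1:
            d = 2 - (c & 3)        # +1 if c % 4 == 1, -1 if c % 4 == 3
            terms.append((d, i))
            c -= d                 # now c % 4 == 0, so the next digit is 0
        c >>= 1
        i += 1
    return terms


def binary_digits(c):
    """Exponents i with a 1 bit in plain binary: c*X = sum of X*L^i."""
    return [i for i in range(c.bit_length()) if (c >> i) & 1]


print(binary_digits(14))   # [1, 2, 3]  ->  XL^3 + XL^2 + XL
print(csd_digits(14))      # [(-1, 1), (1, 4)]  ->  XL^4 - XL
```

CSD never has two adjacent nonzero digits, which is why it needs fewer terms (here 2 instead of 3) than plain binary.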
11
Algebraic methods
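  • 3-term divisors: all potential common
    subexpressions
  • Divisor generation
  • One for every combination of 3 terms
  • e.g. $F_1 = X_1 + X_1L^2 + X_2 + X_2L + X_2L^2$
  • $d_1 = X_1L^2 + X_2L + X_2L^2$
  • $\mathrm{MinL} = L$
  • Divisor $D_1 = d_1/L = X_1L + X_2 + X_2L$
  • Number of divisors: $\binom{N}{3}$ for an expression with N terms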
  • Theorem
  • There exists a 3-term common subexpression iff
    there exists a non-overlapping intersection among
    the set of 3-term divisors
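Divisor generation can be sketched as follows, assuming each term is a (variable, shift) pair with positive sign and no repeated terms; `divisors` is a hypothetical helper. It only enumerates divisors and finds those shared between expressions; it does not check the non-overlap condition of the theorem.

```python
from itertools import combinations


def divisors(expr):
    """Map each 3-term divisor of expr to the term triples that produce it.

    expr is a list of distinct (variable, shift) terms, e.g. ("X2", 1) for X2*L.
    """
    result = {}
    for triple in combinations(expr, 3):
        min_l = min(shift for _, shift in triple)           # MinL: smallest power of L
        d = frozenset((v, s - min_l) for v, s in triple)    # divide the triple by L^min_l
        result.setdefault(d, []).append(triple)
    return result


# F1 = X1 + X1L^2 + X2 + X2L + X2L^2 (the same polynomial as Y1 in the earlier example)
F1 = [("X1", 0), ("X1", 2), ("X2", 0), ("X2", 1), ("X2", 2)]
# Y2 = X1L^2 + X2L^2 + X2L^3
Y2 = [("X1", 2), ("X2", 2), ("X2", 3)]

D1 = frozenset({("X1", 1), ("X2", 0), ("X2", 1)})   # X1L + X2 + X2L
print(D1 in divisors(F1))                           # True
print(divisors(F1).keys() & divisors(Y2).keys())    # the shared divisor X1 + X2 + X2L
```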

12
Algebraic methods
  • Greedy Iterative algorithm
  • Extracts the best 3-term divisor
  • Rewrites the expressions containing it
  • Terminates when there are no more common
    subexpressions

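$F_1 = a + b + c + d + e$, $F_2 = a + b + c + d + f$
⇒ $D_1 = a + b + c$: $F_1 = D_{1S} + D_{1C} + d + e$, $F_2 = D_{1S} + D_{1C} + d + f$
⇒ $D_2 = D_{1S} + D_{1C} + d$: $F_1 = D_{2S} + D_{2C} + e$, $F_2 = D_{2S} + D_{2C} + f$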
13
Algebraic methods
  • Algorithm details

Optimize({Pi})
    {Pi} = set of expressions in polynomial form
    {D} = set of divisors = ∅
    // Step 1. Creating divisors and their frequency statistics
    for each expression Pi in {Pi}
        {Dnew} = Divisors(Pi)
        Update frequency statistics of divisors in {D}
        {D} = {D} ∪ {Dnew}
    // Step 2. Iterative selection and elimination of the best divisor
    while (1)
        Find d = divisor in {D} with the most non-overlapping intersections
        if (d == NULL) break
        Rewrite affected expressions in {Pi} using d
        Remove divisors in {D} that have become invalid
        Update frequency statistics of affected divisors
        {Dnew} = set of new divisors from new terms added by division
        {D} = {D} ∪ {Dnew}
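A heavily simplified Python sketch of this greedy loop, run on the F1/F2 example from the previous slide. Assumptions: terms carry no shifts (no L^i), an expression is counted at most once per divisor (the slides count non-overlapping intersections instead), ties are broken by sorted order, and all divisors are recomputed each iteration rather than updating frequency statistics incrementally. `extract` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def extract(exprs):
    """Greedily extract 3-term common subexpressions (simplified, shift-free sketch).

    exprs maps a name to an iterable of term names. Each extracted divisor Dk is
    replaced by its carry-save outputs DkS and DkC in every expression that uses it.
    """
    exprs = {name: set(terms) for name, terms in exprs.items()}
    extracted = {}
    k = 0
    while True:
        # Step 1: how many expressions contain each 3-term divisor
        counts = Counter()
        for terms in exprs.values():
            counts.update(combinations(sorted(terms), 3))
        if not counts:
            break
        # Step 2: pick the most frequent divisor (ties broken by sorted order)
        best = max(sorted(counts), key=lambda d: counts[d])
        if counts[best] < 2:      # no common subexpression left
            break
        k += 1
        extracted[f"D{k}"] = best
        # Rewrite every expression that contains the divisor
        for terms in exprs.values():
            if terms.issuperset(best):
                terms.difference_update(best)
                terms.update({f"D{k}S", f"D{k}C"})
    return exprs, extracted


result, ds = extract({"F1": "abcde", "F2": "abcdf"})
print(ds)  # {'D1': ('a', 'b', 'c'), 'D2': ('D1C', 'D1S', 'd')}
print({name: sorted(t) for name, t in result.items()})
# {'F1': ['D2C', 'D2S', 'e'], 'F2': ['D2C', 'D2S', 'f']}
```

With these tie-breaking rules the sketch extracts D1 = a + b + c and then D2 = D1S + D1C + d, matching the example.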

14
Algebraic methods
  • Algorithm complexity
  • M expressions, each with N terms
  • Divisor generation: $M \times O(N^3) = O(MN^3)$
  • Iterative algorithm, worst case
  • N terms reduced to 2 terms: $(N - 2)$ steps, since each
    extraction replaces 3 terms by 2
  • M expressions: $O(MN)$ steps

15
Delay aware optimization
  • Sharing subexpressions can increase the total
    delay
  • Traditional high level synthesis approach: reduce
    delay by Tree Height Reduction (THR)
  • Our solution: control delay during the optimization
    itself
  • Optimal delay CSA allocation (T. Kim, J. Um,
    Timing driven synthesis, ASPDAC2000)
  • Use this to get the minimum possible delay

$F_1 = a(2) + b(0) + c(0) + d(0) + e(0)$, $F_2 = a(2) + b(0) + c(0) + d(0) + f(0)$
(numbers in parentheses are signal arrival times, in units of D(Add))
16
Delay aware optimization
  • Optimal allocation vs. delay-ignorant
    extraction

Optimal allocation: $\mathrm{Delay}(F_1) = \mathrm{Delay}(F_2) = 3\,D(\mathrm{Add})$
17
Delay aware extraction
  • Control delay during optimization
  • Evaluate each candidate divisor for delay
  • Only consider those divisors that do not increase
    the delay

$F_1 = a(2) + b(0) + c(0) + d(0) + e(0)$, $F_2 = a(2) + b(0) + c(0) + d(0) + f(0)$
⇒ $D_1(3) = a(2) + b(0) + c(0)$
$F_1 = D_{1S}(3) + D_{1C}(3) + d(0) + e(0)$: Delay = 5 D(Add)
$F_2 = D_{1S}(3) + D_{1C}(3) + d(0) + f(0)$: Delay = 5 D(Add)
18
Delay aware extraction
  • Control delay during optimization
  • Evaluate each candidate divisor for delay
  • Only consider those divisors that do not increase
    the delay

$F_1 = a(2) + b(0) + c(0) + d(0) + e(0)$, $F_2 = a(2) + b(0) + c(0) + d(0) + f(0)$
⇒ $D_2(1) = b(0) + c(0) + d(0)$
$F_1 = D_{2S}(1) + D_{2C}(1) + e(0) + a(2)$: Delay = 3 D(Add)
$F_2 = D_{2S}(1) + D_{2C}(1) + f(0) + a(2)$: Delay = 3 D(Add)
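The delay check can be sketched in Python under the slides' unit-delay assumption: each CSA adds one D(Add) to the latest of its three inputs. `csa_delay` uses an earliest-first heuristic (combine the three earliest signals) as a stand-in for optimal CSA allocation; this is an assumption, not necessarily the algorithm of Kim and Um. `divisor_keeps_delay` is a hypothetical helper that accepts a divisor only if no expression using it gets slower.

```python
import heapq


def csa_delay(arrivals):
    """Arrival time of the final sum/carry pair, combining the 3 earliest signals first.

    Each CSA adds one unit, D(Add), to the latest of its three input arrival times.
    """
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 2:
        t = max(heapq.heappop(heap) for _ in range(3)) + 1
        heapq.heappush(heap, t)    # sum output
        heapq.heappush(heap, t)    # carry output
    return max(heap, default=0)


def divisor_keeps_delay(exprs, arrival, divisor):
    """True if extracting `divisor` does not increase the delay of any expression using it."""
    t_div = max(arrival[x] for x in divisor) + 1   # arrival time of DkS and DkC
    for terms in exprs.values():
        if not terms.issuperset(divisor):
            continue
        before = csa_delay(arrival[x] for x in terms)
        after = csa_delay([t_div, t_div] + [arrival[x] for x in terms - set(divisor)])
        if after > before:
            return False
    return True


arrival = {"a": 2, "b": 0, "c": 0, "d": 0, "e": 0, "f": 0}
exprs = {"F1": set("abcde"), "F2": set("abcdf")}
print(csa_delay(arrival[x] for x in exprs["F1"]))            # 3
print(divisor_keeps_delay(exprs, arrival, ("a", "b", "c")))  # False: delay becomes 5
print(divisor_keeps_delay(exprs, arrival, ("b", "c", "d")))  # True: delay stays 3
```

On the example, the check rejects D1 = a + b + c and accepts D2 = b + c + d, as in the two slides above.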
19
Experimental results
  • Comparing the number of CSAs

Average 38.4% reduction
20
Experimental results
  • Synthesis for Standard Cell Designs
  • Synopsys™ Design Compiler
  • 0.25 micron library
  • Synthesized for minimum delay

Avg 32.7% area reduction, avg 3.7% increase in
delay
21
Experimental results
  • FPGA synthesis
  • Virtex II FPGAs
  • Synthesized designs and performed place and route

Avg 14.1% reduction in slices and avg 12.9%
reduction in LUTs; avg 5.7% increase in delay
22
Experimental results
  • Evaluate Delay aware extraction algorithm
  • Consider different arrival times of the signals
  • Assume delay dominated by gate delay (FA delay)
  • Only consider best case delay

Best delay achieved with a 15.5% increase in the number of CSAs
23
Conclusions
  • First methodology for common subexpression
    elimination for Carry Save Arithmetic
  • Significant area/power reduction
  • Delay aware optimization algorithm also developed
  • Can be combined with CSA tree extraction methods
    for actual application improvement

24
Thank you!!
  • Questions?