The Use of Carry-Save Representation in Joint Module Selection with Retiming

1
The Use of Carry-Save Representation in Joint
Module Selection with Retiming
  • Zhan Yu, Kei-Yong Khoo and Alan N. Willson, Jr.
  • Integrated Circuits and Systems Laboratory
  • University of California, Los Angeles
  • {zhanyu, khoo, willson}@icsl.ucla.edu

2
Overview
  • Carry-save arithmetic has been widely used in
    high-speed applications.
  • Design automation using carry-save arithmetic has
    been exploited only recently, and only for limited
    types of arithmetic functions.
  • Our contributions:
  • Allow the use of carry-save representation in the
    joint module selection with retiming step.
  • Formulate the problem as an MILP, with a
    solution-time reduction method that makes
    practical design problems solvable.
  • Obtain high-performance circuits: 28% speed
    improvement and 47% area saving compared with
    CATHEDRAL-III.

3
Outline
  • Introduction
  • Motivation
  • Background
  • Joint optimization with carry-save representation
  • Mixed-representation flow-graph (MFG)
  • Cost modeling
  • Improved MILP model
  • Results
  • Conclusions

5
Introduction
  • An n-bit carry-save adder produces an arithmetic
    value S + C, represented by a sum vector S and a
    carry vector C (carry-save (CS) representation).
  • A vector-merge adder performs carry-propagation,
    and generates the result in vector-merge (VM)
    representation.

[Figure: a carry-save adder and a vector-merge adder]
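
The two adder types can be illustrated on plain integers. The following is a minimal sketch, not taken from the slides: it assumes non-negative operands and uses the hypothetical helper names `carry_save_add` and `vector_merge`. A carry-save adder reduces three operands to a sum vector S and a carry vector C without propagating carries; a vector-merge adder then performs the carry propagation that turns the pair into a single value.

```python
def carry_save_add(a, b, c):
    """3:2 compression: reduce three operands to a (sum, carry) pair."""
    s = a ^ b ^ c                                # bitwise sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, moved to their weight
    return s, carry


def vector_merge(s, carry):
    """Carry-propagate addition: convert a CS pair into one VM value."""
    return s + carry


s, c = carry_save_add(13, 7, 9)
total = vector_merge(s, c)
print(s, c, total)
```

The pair `(s, c)` is the CS representation of the result; only `vector_merge` pays for a carry chain.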
6
Motivation
[Figure: example circuit with inputs in1, in2, in3 and multiply, add, register, and compare operations, shown with vector-merge representation, with carry-save representation, and with the representation left open (?).]
7
Previous Works
  • DAC '98: T. Kim et al., arithmetic optimization
    using carry-save adders, but limited to certain
    operators and structures.
  • ICCAD '98: K. Parhi et al., scheduling with
    module selection and data format conversion;
    discusses the use of mixed bit-serial,
    digit-serial, and bit-parallel design styles.
  • CATHEDRAL-III: S. Note et al., joint module
    selection and retiming optimization, without
    carry-save representation.

8
Background
  • Synchronous data-flow graph (DFG)

[Figure: DFG G0(V0, E0) of the example circuit, with inputs in1, in2, in3, vertices multiply, add, register, and compare, and edge weights w = 0 or w = 1.]
9
Retiming
  • Find an integer retiming variable $r(v)$ for each
    vertex $v$ that satisfies the MILP constraints:
  • $s(e) \le T$.
  • $r(u) - r(v) \le w(e)$, whenever $e$ is an edge
    from $u$ to $v$.
  • $-s(e) + d_v(e_a, e) \le 0$, $\forall e_a$ such
    that $d_v(e_a, e)$ is defined.
  • $s(e_a) - s(e) \le T\, w_r(e_a) - d_v(e_a, e)$,
    $\forall e_a$ such that $d_v(e_a, e)$ is defined.
  • $s(e)$ is a real-valued slack variable for each
    edge.
  • $d_v(e_a, e)$ is defined when there is a signal
    path from edge $e_a$ to $e$ through $v$.
  • $w_r(e)$ is the number of registers on edge $e$
    after retiming:
  • $w_r(e) = w(e) - r(u) + r(v)$, whenever $e$ is an
    edge from $u$ to $v$.

10
MILP Model
  • ILP-based module selection
  • Define a binary selection variable $x_{v,t}$,
    $v \in V_0$, $t \in M_v$.
  • $\sum_{t \in M_v} x_{v,t} = 1$ for each $v \in V_0$.
  • Delay: $d_v(e_a, e) = \sum_{t \in M_v} d_{v,t}(e_a, e)\, x_{v,t}$.
  • Joint module selection with retiming
  • Cost:
  • module (vertex) cost $C(v) = \sum_{t \in M_v} C_{v,t}\, x_{v,t}$
  • register (edge) cost $C(e) = C_e\, w_r(e)$
  • total cost $C = \sum_v C(v) + \sum_e C(e)$
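
The selection variables can be made concrete with a brute-force search in place of an MILP solver. This is a simplified sketch under stated assumptions: the library contents, module names, and numbers are invented; each module has a single delay rather than a per-path delay d_{v,t}(e_a, e); the two vertices form one combinational chain with no register between them, so their delays add and must fit in T; and the register cost is constant and therefore omitted.

```python
from itertools import product

# Hypothetical library: vertex -> {module name: (delay, cost)}
library = {
    "mult": {"array": (6.0, 40.0), "wallace": (4.0, 55.0)},
    "add": {"ripple": (4.0, 8.0), "cla": (2.0, 14.0)},
}
T = 8.0  # assumed timing bound for the chain


def best_selection(library, T):
    vertices = list(library)
    best = None
    # Picking exactly one module per vertex enforces sum_t x[v,t] = 1.
    for choice in product(*(library[v] for v in vertices)):
        x = {(v, t): int(t == choice[i])
             for i, v in enumerate(vertices) for t in library[v]}
        # Delay and cost are linear in x, as in the model above.
        d = {v: sum(library[v][t][0] * x[v, t] for t in library[v])
             for v in vertices}
        cost = sum(library[v][t][1] * x[v, t]
                   for v in vertices for t in library[v])
        if sum(d.values()) <= T and (best is None or cost < best[0]):
            best = (cost, dict(zip(vertices, choice)))
    return best


print(best_selection(library, T))
```

Enumeration only works for toy instances; the point is that delay and cost are linear in the one-hot variables, which is what lets the real problem be stated as an MILP.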

12
Mixed-Representation Flow-Graph
  • Problems with the standard DFG:
  • Does not support different number
    representations of a signal on its multiple fanouts
  • Needs a large module library for module selection
  • Does not allow insertion of registers inside a
    module
  • Solution: MFG
  • Inserts converter vertices to resolve signal
    representation mismatches
  • Allows the construction of a smaller module library

13
Obtain MFG from DFG
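  • MFG vs. DFG: $|E| = 2|E_0|$, $|V| = |V_0| + |E_0|$
  • c: converter vertex, where a vector-merge
    converter (VMC) may be placed

[Figure: DFG G0(V0, E0) of the example circuit and the corresponding MFG G(V, E), with a converter vertex c inserted on every edge; edge weights are w = 0 or w = 1.]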
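
The DFG-to-MFG construction can be sketched as an edge-splitting pass. This sketch assumes that every DFG edge receives exactly one converter vertex, which is what the size relation |E| = 2|E0|, |V| = |V0| + |E0| suggests. How the original register weight is divided between the two new edges is not recoverable from the transcript, so keeping it on the first half is an arbitrary choice here. The function name and the example graph are hypothetical.

```python
def build_mfg(vertices, edges):
    """vertices: list of names; edges: list of (u, v, w).
    Returns (V, E) with one converter vertex inserted on every DFG edge."""
    V = list(vertices)
    E = []
    for i, (u, v, w) in enumerate(edges):
        c = f"c{i}"            # converter vertex for DFG edge number i
        V.append(c)
        E.append((u, c, w))    # assumption: original weight stays on the first half
        E.append((c, v, 0))    # second half starts without registers
    return V, E


# Hypothetical DFG with three operations and three edges.
V0 = ["mult", "add", "cmp"]
E0 = [("mult", "add", 0), ("add", "cmp", 1), ("add", "mult", 1)]
V, E = build_mfg(V0, E0)
print(len(V), len(E))
```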
14
Module Selection on MFG
  • A vector-merge converter (VMC) module is needed
    at a converter vertex $c$ between $u$ and $v$
  • when the output of $u$ is in CS form and the
    input of $v$ is in VM form.
  • Denote by $M_v^{cs}$ ($M_v^{vm}$) the set of
    hardware modules that implement $v$ with CS (VM)
    output, and by $M_v^{in,vm}$ those that take VM
    input.
  • $\sum_{t \in M_c} x_{c,t} \ge \sum_{t \in M_u^{cs}} x_{u,t} + \sum_{t \in M_v^{in,vm}} x_{v,t} - 1$
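
The converter constraint is the usual linearization of a logical AND. The check below is a small sketch with a hypothetical function name. It reads the constraint as "sum of x[c,t] >= (u has CS output) + (v needs VM input) - 1" and confirms that, under cost minimization, a VMC is forced exactly when both conditions hold.

```python
from itertools import product


def vmc_required(u_out_cs, v_in_vm):
    """Smallest binary value of sum_t x[c,t] that satisfies
    sum_t x[c,t] >= u_out_cs + v_in_vm - 1."""
    return max(0, u_out_cs + v_in_vm - 1)


table = {(a, b): vmc_required(a, b) for a, b in product((0, 1), repeat=2)}
for (a, b), need in table.items():
    print(f"u CS output={a}, v VM input={b} -> VMC needed={need}")
```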
15
Register Cost
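  • The register cost model $C(e) = C_e\, w_r(e)$ is
    not valid, since the signal representation is
    unknown before module selection.
  • Denote by $w_r^{cs}(e)$ and $w_r^{vm}(e)$ the
    numbers of CS and VM registers on $e$ after
    retiming. With a large constant $K$, they are
    computed by
  • $w_r^{vm}(e) \ge w_r(e) - K \sum_{t \in M_u^{cs}} x_{u,t}$
  • $w_r^{cs}(e) \ge w_r(e) - K \sum_{t \in M_u^{vm}} x_{u,t}$
  • while minimizing
  • $C(e) = C_{e,cs}\, w_r^{cs}(e) + C_{e,vm}\, w_r^{vm}(e)$
  • Extend to $e'$.

[Figure: vertices u, c, v and the two edges e and e′ around the converter vertex c.]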
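
The effect of the two big-M constraints can be checked numerically. This is a minimal sketch under several assumptions: e is taken to be the edge driven by u; u has exactly one module selected, so its output is either CS or VM; register counts are non-negative; K = 1000 exceeds any register count; and the unit costs are invented. Because the objective minimizes cost, each count settles at its lower bound.

```python
K = 1000  # big-M constant, assumed larger than any register count


def register_cost(wr, u_is_cs, c_cs, c_vm):
    """Minimal C(e) = c_cs * wr_cs + c_vm * wr_vm under the big-M bounds."""
    x_cs = 1 if u_is_cs else 0       # sum of x[u,t] over CS-output modules of u
    x_vm = 1 - x_cs                  # sum of x[u,t] over VM-output modules of u
    wr_vm = max(0, wr - K * x_cs)    # bound is switched off when u outputs CS
    wr_cs = max(0, wr - K * x_vm)    # bound is switched off when u outputs VM
    return c_cs * wr_cs + c_vm * wr_vm


# Hypothetical unit costs: a CS register stores two vectors, so it costs twice as much.
print(register_cost(2, True, 2.0, 1.0))    # two CS registers
print(register_cost(2, False, 2.0, 1.0))   # two VM registers
```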
16
Multiple Fanouts
  • Multiple fanout subgraph f

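[Figure: multiple-fanout subgraph f in the DFG and in the MFG: vertex u drives v1, v2, ..., vN through edges e1, e2, ..., eN, with converter vertices c1, c2, ..., cN in the MFG; Ef marks the fanout edge set.]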
17
Resource Sharing
  • The VMC cost can be shared when
    $w_r(e_1) = w_r(e_2)$ and the same type of VMC
    module is selected at $c_1$ and $c_2$.
  • Once the VMCs are shared, the registers on
    $e_1$ and $e_2$ can be shared as well.

18
VMC Sharing
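[Figure: vertex u fanning out to v1 and v2 through edges e1, e2 and converter vertices c1, c2.]
  • Theorem 1: the solutions with
    $w_r(e_1) = w_r(e_2)$ contain a minimum-cost
    solution when VMC modules are selected at both
    $c_1$ and $c_2$.
  • Extra constraints are needed that allow only
    $w_r(e_1) = w_r(e_2)$ when a VMC is allocated at
    both $c_1$ and $c_2$:
  • $w_r(e_1) - w_r(e_2) \le K \left(2 - \sum_{t \in M_c} x_{c_1,t} - \sum_{t \in M_c} x_{c_2,t}\right)$
  • $w_r(e_2) - w_r(e_1) \le K \left(2 - \sum_{t \in M_c} x_{c_2,t} - \sum_{t \in M_c} x_{c_1,t}\right)$
  • The VMC cost at a multiple-fanout subgraph $f$
    is the shared cost:
  • Define a binary variable $x_{f,t}$, for all
    $t \in M_c$, that records whether module $t$
    appears in $f$.
  • $x_{f,t} \ge x_{c_k,t}$, for all $c_k \in f$, $t \in M_c$
  • Minimize the shared VMC cost $C(f) = \sum_{t \in M_c} C_t\, x_{f,t}$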
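
The sharing rule can be sketched in a few lines. Under minimization, the fanout indicator x[f,t] settles at the largest x[c_k,t] over the converters of f, so each VMC type is paid for once no matter how many branches use it. The function names, type names, and unit costs below are hypothetical.

```python
def shared_vmc_cost(selected, unit_cost):
    """selected: VMC type chosen at each converter c_k of fanout f (None = no VMC).
    x[f,t] becomes 1 as soon as any c_k uses type t, so each type is counted once."""
    used = {t for t in selected if t is not None}
    return sum(unit_cost[t] for t in used)


def registers_must_match(sel_1, sel_2):
    """True when both big-M right-hand sides are 0, forcing w_r(e1) = w_r(e2)."""
    return sel_1 is not None and sel_2 is not None


unit_cost = {"ripple": 8.0, "cla": 14.0}   # invented VMC module costs
print(shared_vmc_cost(["cla", "cla", None], unit_cost))   # one shared "cla" VMC
print(shared_vmc_cost(["cla", "ripple"], unit_cost))      # two different types
print(registers_must_match("cla", None))
```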

19
Register Sharing
  • Theorem 2: if VMCs are selected at both $c_1$ and
    $c_2$, and the numbers of registers on $e_1$ and
    $e_2$ are nonzero, there is a solution of no
    higher cost with the same type of VMC selected at
    $c_1$ and $c_2$.
  • This implies that all VM registers on $e_1$ and
    $e_2$ are shared.
  • The shared VM register cost in $E_f$ is
  • $C_{E_f,vm}\, \max\left(w_r^{vm}(e_1), \ldots, w_r^{vm}(e_N)\right)$
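
A short numeric sketch of the shared register cost, reading the expression as a unit cost multiplied by the largest VM register count in the fanout. The register counts and the unit cost are invented. In an MILP the max is typically modeled with one auxiliary variable bounded below by each count, but how the authors model it is not shown in the transcript.

```python
def shared_vm_register_cost(wr_vm, unit_cost):
    """wr_vm: VM register counts on the fanout edges e_1..e_N.
    Shared registers form one chain that is tapped at different depths."""
    return unit_cost * max(wr_vm)


def unshared_vm_register_cost(wr_vm, unit_cost):
    """Cost if every fanout edge kept its own private registers."""
    return unit_cost * sum(wr_vm)


counts = [1, 3, 2]   # hypothetical VM register counts on three fanout edges
print(shared_vm_register_cost(counts, 1.0), unshared_vm_register_cost(counts, 1.0))
```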

20
Improved MILP model
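  • Equivalent solutions exist.

[Figure: two drawings of the path from u through converter vertex c to v, with its two edges.]
  • Prune equivalent solutions by forcing
    $w_r(e) = 0$ when no VMC module is selected at $c$:
  • $w_r(e) \le K \sum_{t \in M_c} x_{c,t}$
  • The register cost on $e$ is then given directly by
  • $C(e) = C_{e,vm}\, w_r(e)$
  • Extend to multiple fanouts.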
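
The pruning effect can be seen by counting register placements. This sketch assumes a fixed total number of registers on the path from u to v, split between the edge e that the constraint restricts and the other edge around c; which of the two edges the slide calls e is not recoverable from the transcript. When no VMC is selected, all splits cost the same, and the constraint w_r(e) <= K * (VMC selected) keeps only one of them.

```python
K = 1000  # big-M constant (assumed)


def placements(total, vmc_selected, prune):
    """Enumerate (w_r(e), w_r(other edge)) splits of `total` registers around c."""
    result = []
    for wr_e in range(total + 1):
        if prune and wr_e > K * int(vmc_selected):
            continue                 # violates w_r(e) <= K * sum_t x[c,t]
        result.append((wr_e, total - wr_e))
    return result


print(len(placements(3, False, prune=False)))   # equivalent solutions before pruning
print(placements(3, False, prune=True))         # the single survivor
print(len(placements(3, True, prune=True)))     # unaffected when a VMC is present
```

Fewer equivalent solutions means a smaller search space for the MILP solver, which is the solution-time reduction the slides describe.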

22
Module Library
Extracted from Synopsys using LSI_10K library.
23
Benchmarks
24
Runtime Comparison
CPU time collected on an Ultra-10 workstation,
using the CPLEX MILP solver.
25
Circuit Speed Comparison
Result from the method used in CATHEDRAL-III
26
Circuit Cost Comparison
Result from the method used in CATHEDRAL-III
27
Conclusions
  • Combined the joint module selection and retiming
    technique with the use of carry-save
    representation.
  • Our contributions:
  • A mixed-representation data-flow graph (MFG) that
    allows signals to be in the carry-save
    representation.
  • Accurate models of the implementation costs
    associated with signal representation.
  • A solution-space pruning technique.
  • Designs that are 28% faster and 47% smaller are
    achieved in our examples.