Title: Global Delay Optimization using Structural Choices
1Global Delay Optimization using Structural Choices
- Alan Mishchenko, Robert Brayton
- UC Berkeley
- Stephen Jang
- Xilinx Inc.
2Overview
- Motivation
- Timing criticality
- Restructuring for delay
- Algorithm
- Experimental results
- Conclusions
- Future work
3Motivation
- An AIG is an And-Inverter Graph
- AIG-based combinational logic synthesis is fast and effective
- AIG-based synthesis is area-oriented (except balancing)
- Needed: delay optimization in AIG-based synthesis
- AIGs allow for accumulation of structural choices [Lehman et al., TCAD'97; Chatterjee et al., ICCAD'05]
- Can leverage an efficient technology mapper with choices
- Can lead to fast delay optimization (10% of mapping time)
4Distinctive Features
- Traditional approach
- For all timing-critical areas
- Perform timing analysis
- Generate alternative structures
- Evaluate the improvement and decide whether the transformation is accepted
- Proposed approach
- Perform timing analysis only once
- For all timing-critical areas
- Generate and store structural choices
- Use the technology mapper to pick and choose good structures
- Characteristics of the proposed approach
- Fast because there is no repeated timing analysis
- Simple because it leverages the AIG package and the LUT mapper
- Effective because it makes decisions in the global space
5Timing Criticality
- Critical nodes
- Used by many traditional algorithms
- Critical edges
- Used by our algorithm
- We pre-compute critical edges of critical nodes
- Reduces computation
- An edge between critical nodes may not be critical
- See illustration: edge 1→3
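[Figure: example network drawn between primary inputs and primary outputs, with nodes labeled 1 to 4, illustrating critical nodes and critical edges]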
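The distinction between critical nodes and critical edges can be sketched in a few lines of Python. This is a minimal illustration under a unit-delay model, not the ABC implementation: the function names, the `fanins` dictionary layout, and the small example network are assumptions made here and are not the network from the slide's figure.

```python
def analyze_timing(fanins, outputs):
    """Unit-delay arrival and required times for a DAG given as node -> list of fanins."""
    arrival = {}

    def arr(n):
        # Primary inputs (no fanins) arrive at time 0; every other node adds one unit.
        if n not in arrival:
            arrival[n] = 0 if not fanins[n] else 1 + max(arr(f) for f in fanins[n])
        return arrival[n]

    for n in fanins:
        arr(n)
    d_max = max(arrival[o] for o in outputs)
    required = {n: d_max for n in fanins}
    # A fanout always arrives later than its fanins, so decreasing arrival
    # time visits every node before any of its fanins.
    for n in sorted(fanins, key=lambda x: arrival[x], reverse=True):
        for f in fanins[n]:
            required[f] = min(required[f], required[n] - 1)
    return arrival, required


def critical_edges(fanins, outputs, window=0):
    """Return (critical nodes, critical edges) for slack within `window`."""
    arrival, required = analyze_timing(fanins, outputs)
    crit_nodes = {n for n in fanins if required[n] - arrival[n] <= window}
    crit_edges = {
        (f, n)
        for n in crit_nodes
        for f in fanins[n]
        # The edge slack is measured along this particular edge only.
        if f in crit_nodes and required[n] - 1 - arrival[f] <= window
    }
    return crit_nodes, crit_edges


# Hypothetical network: a, b, c, d are primary inputs; node "4" is the output.
network = {
    "a": [], "b": [], "c": [], "d": [],
    "1": ["a", "b"],
    "2": ["1", "c"],
    "3": ["1", "2"],
    "4": ["3", "d"],
}
nodes, edges = critical_edges(network, outputs=["4"])
```

In this example nodes 1 and 3 are both critical, yet the edge 1→3 is not, because the longest path into node 3 runs through node 2. Restricting attention to critical edges therefore gives a smaller set of variables to restructure around.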
6Delay-Oriented Restructuring
- Using traditional MUX-restructuring
- AKA generalized select transform
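MUX-restructuring rests on the Shannon expansion. As a brief reminder (the symbols $f$, $s$, and $x$ are chosen here for illustration), a function with a late-arriving input $s$ can be rewritten so that $s$ only controls a multiplexer at the output:

```latex
f(x, s) \;=\; \bar{s}\, f(x, 0) \;+\; s\, f(x, 1)
```

Both cofactors $f(x,0)$ and $f(x,1)$ are independent of $s$, so they can be computed while $s$ is still settling. The late signal then passes through a single multiplexer instead of the full depth of the original logic, at the cost of duplicating some logic in the two cofactors.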
7Overall Algorithm
mapped netlist performSpeedup (
    subject graph S,     // S is an And-Inverter Graph
    mapped netlist M,    // M was previously derived by tech-mapping of S
    timing window w,     // w is used to detect the critical paths
    logic depth l,       // l is used to detect a logic cone rooted at a node
    edge count p )       // p limits the number of critical edges of the cone

    perform timing analysis of M with unit-delay or LUT-library model
    pre-compute critical section of M as nodes n such that 0 ≤ slack(n) ≤ w
    pre-compute timing-critical edges connecting these nodes
    for each timing-critical node n
        find cone C of M that extends l levels down from n
        pick the set of timing-critical edges V feeding into C
        if the number of edges in V exceeds p, continue
        find logic cone C' in S corresponding to C in M
        find variables V' in S corresponding to V in M
        derive cofactors of the function of C' w.r.t. variables in V'
        build multiplexer tree C'' of the cofactors using variables in V'
        add structural choice C' = C'' to the subject graph S
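The cofactoring and multiplexer-tree steps of the loop body can be illustrated functionally. The sketch below is an assumption-laden toy: it models a cone as a Python callable over a variable assignment rather than as an AIG, and the names `cofactor`, `mux_tree`, and `equivalent` are hypothetical helpers, not ABC functions.

```python
from itertools import product


def cofactor(f, var, value):
    """Return f with variable `var` fixed to `value`."""
    return lambda env: f({**env, var: value})


def mux_tree(f, critical_vars):
    """Re-express f as a tree of 2:1 multiplexers controlled by critical_vars.

    The first variable in the list controls the multiplexer nearest the output,
    so the latest-arriving variable should be listed first.
    """
    if not critical_vars:
        return f
    v, rest = critical_vars[0], critical_vars[1:]
    f0 = mux_tree(cofactor(f, v, False), rest)
    f1 = mux_tree(cofactor(f, v, True), rest)
    return lambda env: f1(env) if env[v] else f0(env)


def equivalent(f, g, variables):
    """Exhaustively compare two functions over all assignments."""
    for bits in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, bits))
        if f(env) != g(env):
            return False
    return True


# Hypothetical cone in which d and c play the role of critical variables.
cone = lambda e: ((e["a"] and e["b"]) or e["c"]) != e["d"]
restructured = mux_tree(cone, ["d", "c"])
same = equivalent(cone, restructured, ["a", "b", "c", "d"])
```

With k critical variables the tree has 2^k cofactors at its leaves, which suggests why the edge count p is bounded: the duplicated logic grows exponentially in the number of variables selected. The equivalence check mirrors the requirement that a structural choice must be functionally identical to the cone it is paired with.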
8Experimental Setup
- Implemented in ABC as command speedup
- Used FPGA technology mapper if
- Verified the results using CEC engine cec
- Experiments targeting 6-LUTs were run on an Intel Xeon 2-CPU 4-core computer with 8 GB RAM
- Experimentally compared the following scripts
- Without delay-optimization
- (st; dchoice; if -C 16 -F 2)^8
- With delay-optimization
- (st; dchoice; if -C 16 -F 2)^4
- (speedup; if -C 16 -F 2)^3
- (st; dchoice; if -C 16 -F 2)^4
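For readers who want to reproduce the comparison, the scripts can be expanded into flat command strings. The sketch below assumes that the trailing digit after each parenthesized group is a repetition count and that the three groups listed under delay optimization run back to back as one script; both are readings of the slide, not statements from it. The helper `repeat` is hypothetical.

```python
def repeat(commands, times):
    """Join a group of commands with '; ' and repeat the group `times` times."""
    return "; ".join(["; ".join(commands)] * times)


# Command names and flags are copied from the slide.
MAP_ROUND = ["st", "dchoice", "if -C 16 -F 2"]
SPEEDUP_ROUND = ["speedup", "if -C 16 -F 2"]

baseline = repeat(MAP_ROUND, 8)
optimized = "; ".join([
    repeat(MAP_ROUND, 4),
    repeat(SPEEDUP_ROUND, 3),
    repeat(MAP_ROUND, 4),
])
```

Under this reading the delay-optimized flow runs the mapper 11 times against 8 for the baseline, which is worth keeping in mind when comparing total runtimes.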
9Examples of LUT Libraries
- Each row lists: LUT size, LUT area, LUT pin delays
- A variable-pin-delay LUT library
- 1 1.0 0.2
- 2 1.0 0.2 0.3
- 3 1.0 0.2 0.3 0.4
- 4 1.0 0.2 0.3 0.4 0.45
- 5 1.0 0.2 0.3 0.4 0.45 0.55
- 6 1.0 0.2 0.3 0.4 0.45 0.55 0.65
- The unit-delay LUT library
- 1 1.0 1.0
- 2 1.0 1.0 1.0
- 3 1.0 1.0 1.0 1.0
- 4 1.0 1.0 1.0 1.0 1.0
- 5 1.0 1.0 1.0 1.0 1.0 1.0
- 6 1.0 1.0 1.0 1.0 1.0 1.0 1.0
- A variable-pin-delay LUT library with wire-delays
- 1 1.0 0.4
- 2 1.0 0.4 0.5
- 3 1.0 0.4 0.5 0.6
- 4 1.0 0.4 0.5 0.6 0.65
- 5 1.0 0.4 0.5 0.6 0.65 0.75
- 6 1.0 0.4 0.5 0.6 0.65 0.75 0.85
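To show how such a library might enter delay computation, the sketch below parses rows of the form "size area delay_1 ... delay_size" and evaluates the arrival time at a LUT output. It assumes that the latest-arriving input is connected to the fastest pin; that pin-assignment policy, along with the function names, is an assumption of this example rather than something stated on the slide.

```python
def parse_lut_library(text):
    """Parse rows 'size area delay_1 ... delay_size' into {size: (area, delays)}."""
    library = {}
    for line in text.strip().splitlines():
        fields = line.split()
        size, area = int(fields[0]), float(fields[1])
        delays = [float(x) for x in fields[2:]]
        if len(delays) != size:
            raise ValueError(f"LUT of size {size} needs {size} pin delays")
        library[size] = (area, delays)
    return library


def lut_output_arrival(library, input_arrivals):
    """Arrival time at the output of a LUT whose inputs arrive at the given times."""
    _, delays = library[len(input_arrivals)]
    late_first = sorted(input_arrivals, reverse=True)
    fast_first = sorted(delays)
    # Assumed policy: pair the latest input with the fastest pin.
    return max(a + d for a, d in zip(late_first, fast_first))


# First three rows of the variable-pin-delay library from the slide.
variable_pin = parse_lut_library("""
1 1.0 0.2
2 1.0 0.2 0.3
3 1.0 0.2 0.3 0.4
""")
arrival = lut_output_arrival(variable_pin, [0.0, 1.0, 0.5])
```

Here the input arriving at 1.0 takes the 0.2 pin, so the output settles at about 1.2. With the unit-delay library the same LUT would settle at 2.0 regardless of pin assignment.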
10Experimental Results
- LUT: number of LUTs
- Lev: number of LUT levels
- Delay: delay using LUT library
- Total: total runtime of Baseline
- Time1: the runtime of AIG restructuring only
- Time2: the total runtime of Speedup
- Geomean: geometric averages of columns
- Ratios: ratios of geometric averages
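The Geomean and Ratios rows can be computed as in the sketch below. The function names are hypothetical, and the numbers in the usage lines are placeholders chosen to make the arithmetic easy to check; they are not results from the experiments.

```python
import math


def geomean(values):
    """Geometric mean of positive values, computed in the log domain."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_ratio(baseline, optimized):
    """Ratio of the geometric means of two columns (optimized over baseline)."""
    return geomean(optimized) / geomean(baseline)


# Placeholder columns, not measured data.
baseline_column = [2.0, 8.0]
optimized_column = [1.0, 4.0]
ratio = geomean_ratio(baseline_column, optimized_column)
```

When both columns cover the same benchmarks, the ratio of geometric means equals the geometric mean of the per-benchmark ratios, so no single large design dominates the summary.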
11Conclusions and Future Work
- Developed a method that is
- Fast because there is no repeated timing analysis
- Simple because it leverages the AIG package and the LUT mapper
- Effective because it makes decisions in the global space
- Future work may include
- measuring improvements after place-and-route
- extending the algorithm to work for sequential circuits
- applying similar optimization for cost functions other than delay (e.g. switching activity minimization)