Global Delay Optimization using Structural Choices - PowerPoint PPT Presentation

About This Presentation

Title:

Global Delay Optimization using Structural Choices

Description:

Title: Recording Synthesis History Author: Alan Last modified by: Alan Created Date: 3/17/2006 1:04:40 AM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 12

Provided by: Alan204

Learn more at: https://people.eecs.berkeley.edu

Transcript and Presenter's Notes

1
Global Delay Optimization using Structural Choices

Alan Mishchenko Robert Brayton
UC Berkeley
Stephen Jang
Xilinx Inc.

2
Overview

Motivation
Timing criticality
Restructuring for delay
Algorithm
Experimental results
Conclusions
Future work

3
Motivation

AIG is an And-Inverter Graph
AIG-based combinational logic synthesis is fast
and effective
AIG-based synthesis is area-oriented (except
balancing)
Needed: delay optimization in AIG-based synthesis
AIGs allow for accumulation of structural choices
[Lehman et al., TCAD'97; Chatterjee et al., ICCAD'05]
Can leverage an efficient technology mapper with
choices
Can lead to fast delay optimization (10% of
mapping time)

4
Distinctive Features

Traditional approach
For all timing-critical areas
Perform timing analysis
Generate alternative structures
Evaluate the improvement and decide if the
transformation is accepted
Proposed approach
Perform timing analysis only once
For all timing-critical areas
Generate and store structural choices
Use technology mapper to pick and choose good
structures
Characteristics of the proposed approach
Fast because there is no repeated timing
analysis
Simple because it leverages AIG package and LUT
mapper
Effective because it makes decisions in the
global space

5
Timing Criticality

Critical nodes
Used by many traditional algorithms
Critical edges
Used by our algorithm
We pre-compute critical edges of critical nodes
Reduces computation
An edge between critical nodes may not be
critical
See illustration: edge 1→3

[Figure: illustration of critical nodes and edges in a network with nodes labeled 1 to 4 between the primary inputs and the primary outputs]

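
The difference between critical nodes and critical edges can be sketched in a few lines. This is a minimal illustration under a unit-delay model, not the authors' implementation: the function name, the dictionary-based network, and the small four-node example are all hypothetical. It computes arrival and required times, marks nodes whose slack is within the window, and then marks an edge critical only if the path through that edge is itself within the window.

```python
def critical_nodes_and_edges(fanins, outputs, window=0.0, delay=1.0):
    """fanins maps node -> list of fanin nodes; keys must be in topological order."""
    # Forward pass: arrival time of each node (inputs arrive at time 0).
    arrival = {}
    for n, fi in fanins.items():
        arrival[n] = max((arrival[f] + delay for f in fi), default=0.0)
    t_max = max(arrival[o] for o in outputs)

    # Backward pass: required time of each node.
    required = {n: float("inf") for n in fanins}
    for o in outputs:
        required[o] = t_max
    for n in reversed(list(fanins)):
        for f in fanins[n]:
            required[f] = min(required[f], required[n] - delay)

    slack = {n: required[n] - arrival[n] for n in fanins}
    crit_nodes = {n for n in fanins if slack[n] <= window}
    # An edge f -> n is critical only if its own slack is within the window.
    crit_edges = {
        (f, n)
        for n in crit_nodes
        for f in fanins[n]
        if f in crit_nodes and required[n] - delay - arrival[f] <= window
    }
    return crit_nodes, crit_edges


# Hypothetical network: input "a" feeds node 1; node 1 feeds nodes 2 and 3; node 2 feeds node 3.
network = {"a": [], 1: ["a"], 2: [1], 3: [1, 2]}
nodes, edges = critical_nodes_and_edges(network, outputs=[3])
```

In this made-up network every node is critical, yet the edge from node 1 to node 3 is not, because the longest path reaches node 3 through node 2.
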
6
Delay-Oriented Restructuring

Using traditional MUX-restructuring
AKA generalized select transform
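
As a rough illustration of MUX-restructuring for a single late-arriving input, the sketch below applies the Shannon expansion f = v ? f(v=1) : f(v=0) to a small Boolean function and checks equivalence exhaustively. The helper names and the example cone are assumptions made for illustration; they are not taken from ABC or from the slides.

```python
from itertools import product


def cofactor(f, var, value):
    """Return f with input number `var` fixed to `value`."""
    def g(*bits):
        fixed = list(bits)
        fixed[var] = value
        return f(*fixed)
    return g


def mux_restructure(f, var):
    """Rebuild f as a 2:1 MUX of its two cofactors, selected by input `var`."""
    f0, f1 = cofactor(f, var, 0), cofactor(f, var, 1)
    def g(*bits):
        return f1(*bits) if bits[var] else f0(*bits)
    return g


def chain(a, b, c, late):
    # Hypothetical cone: the late-arriving input passes through three gates.
    return ((late & a) | b) & c


restructured = mux_restructure(chain, var=3)
equivalent = all(
    chain(*bits) == restructured(*bits) for bits in product((0, 1), repeat=4)
)
```

Neither cofactor depends on the late input, so in hardware both can be computed early and the late signal only drives the select pin of the final multiplexer.
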

7
Overall Algorithm

mapped netlist performSpeedup (
    subject graph S,    // S is an And-Inverter Graph
    mapped netlist M,   // M was previously derived by tech-mapping of S
    timing window w,    // w is used to detect the critical paths
    logic depth l,      // l is used to detect a logic cone rooted at a node
    edge count p )      // p limits the number of critical edges of the cone

    perform timing analysis of M with unit-delay or LUT-library model
    pre-compute critical section of M as nodes n such that 0 ≤ slack(n) ≤ w
    pre-compute timing-critical edges connecting these nodes
    for each timing-critical node n
        find cone C of M that extends l levels down from n
        pick the set of timing-critical edges V feeding into C
        if the number of edges in V exceeds p, continue
        find logic cone C' in S corresponding to C in M
        find variables V' in S corresponding to V in M
        derive cofactors of the function of C' w.r.t. variables in V'
        build multiplexer tree C'' of the cofactors using variables in V'
        add structural choice C' = C'' to the subject graph S
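
The selection steps of the loop (finding a cone, picking the critical edges that feed it, and applying the limit p) can be sketched as follows. This is a simplified illustration on a plain dictionary netlist, not ABC code: all function names are hypothetical, the cone is taken as l levels including the root, and skipping cones with no critical input edge is an added assumption.

```python
def cone(fanins, root, depth):
    """Nodes within `depth` levels below `root`, counting the root as level 1."""
    nodes, frontier = {root}, {root}
    for _ in range(depth - 1):
        frontier = {f for n in frontier for f in fanins.get(n, [])} - nodes
        nodes |= frontier
    return nodes


def critical_inputs(fanins, cone_nodes, crit_edges):
    """Critical edges that enter the cone from outside it."""
    return {
        (f, n)
        for n in cone_nodes
        for f in fanins.get(n, [])
        if f not in cone_nodes and (f, n) in crit_edges
    }


def select_cones(fanins, crit_nodes, crit_edges, l, p):
    """Map each usable critical node to the variables its MUX tree would select on."""
    selected = {}
    for n in crit_nodes:
        c = cone(fanins, n, l)
        v = critical_inputs(fanins, c, crit_edges)
        if not v or len(v) > p:  # empty-set skip is an assumption of this sketch
            continue
        selected[n] = sorted({f for f, _ in v}, key=str)
    return selected


# Hypothetical mapped netlist and criticality information.
netlist = {"a": [], "b": [], 1: ["a", "b"], 2: [1, "b"], 3: [2, "a"]}
crit_n = {"a", 1, 2, 3}
crit_e = {("a", 1), (1, 2), (2, 3)}
picked = select_cones(netlist, crit_n, crit_e, l=2, p=1)
```

A cone with k selected variables has 2^k cofactors, which is one reason to bound the number of critical edges with p.
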

8
Experimental Setup

Implemented in ABC as command "speedup"
Used FPGA technology mapper "if"
Verified the results using CEC engine "cec"
Experiments targeting 6-LUTs were run on an Intel
Xeon 2-CPU 4-core computer with 8 GB RAM
Experimentally compared the following scripts
(a trailing ^N means the group is repeated N times)
Without delay-optimization
(st; dchoice; if -C 16 -F 2)^8
With delay-optimization
(st; dchoice; if -C 16 -F 2)^4
(speedup; if -C 16 -F 2)^3
(st; dchoice; if -C 16 -F 2)^4
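
Each parenthesized group in the scripts above ends with a number. Assuming that number is a repetition count, and assuming commands are joined with semicolons as in an ABC command string, a small helper can expand the two scripts into flat command sequences. The helper name and the data layout are hypothetical; only the command names and options come from the slide.

```python
def expand(script):
    """Expand a list of (commands, repeat) pairs into one semicolon-separated string."""
    return "; ".join(
        cmd for cmds, n in script for _ in range(n) for cmd in cmds
    )


map_round = ["st", "dchoice", "if -C 16 -F 2"]
speedup_round = ["speedup", "if -C 16 -F 2"]

# Script without delay optimization: eight mapping rounds.
baseline = expand([(map_round, 8)])

# Script with delay optimization: four mapping rounds, three speedup rounds, four mapping rounds.
with_speedup = expand([(map_round, 4), (speedup_round, 3), (map_round, 4)])
```

Under this reading the delay-optimizing script runs the mapper 11 times against 8 for the baseline, which is worth keeping in mind when comparing runtimes.
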

9
Examples of LUT Libraries

A variable-pin-delay LUT library
1 1.0 0.2
2 1.0 0.2 0.3
3 1.0 0.2 0.3 0.4
4 1.0 0.2 0.3 0.4 0.45
5 1.0 0.2 0.3 0.4 0.45 0.55
6 1.0 0.2 0.3 0.4 0.45 0.55 0.65

The unit-delay LUT library
1 1.0 1.0
2 1.0 1.0 1.0
3 1.0 1.0 1.0 1.0
4 1.0 1.0 1.0 1.0 1.0
5 1.0 1.0 1.0 1.0 1.0 1.0
6 1.0 1.0 1.0 1.0 1.0 1.0 1.0

A variable-pin-delay LUT library with wire-delays
1 1.0 0.4
2 1.0 0.4 0.5
3 1.0 0.4 0.5 0.6
4 1.0 0.4 0.5 0.6 0.65
5 1.0 0.4 0.5 0.6 0.65 0.75
6 1.0 0.4 0.5 0.6 0.65 0.75 0.85

Columns in each library: LUT size, LUT area, LUT pin delays

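
To make the library layout concrete, here is a small parser sketch for the row format shown on this slide (LUT size, LUT area, then one delay per pin). The function name is hypothetical, and the check that each row carries exactly as many delays as the LUT has pins is an assumption based on these examples, not a statement about every library file ABC accepts.

```python
def parse_lut_library(text):
    """Parse rows of the form '<size> <area> <pin delay> ...' into a dictionary."""
    library = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        size = int(fields[0])
        area = float(fields[1])
        delays = [float(x) for x in fields[2:]]
        if len(delays) != size:
            raise ValueError(f"expected {size} pin delays, got {len(delays)}")
        library[size] = (area, delays)
    return library


# The variable-pin-delay library from the slide.
VARIABLE_PIN_DELAY = """
1 1.0 0.2
2 1.0 0.2 0.3
3 1.0 0.2 0.3 0.4
4 1.0 0.2 0.3 0.4 0.45
5 1.0 0.2 0.3 0.4 0.45 0.55
6 1.0 0.2 0.3 0.4 0.45 0.55 0.65
"""

lib = parse_lut_library(VARIABLE_PIN_DELAY)
area6, delays6 = lib[6]
slowest_pin_6 = max(delays6)
```

With this library a 6-LUT has area 1.0 and its fastest pin is more than three times faster than its slowest one, so the mapper can shorten a path by assigning late-arriving signals to fast pins.
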
10
Experimental Results
Time1 the runtime of AIG restructuring
only Time2 the total runtime of Speeup Geomean
geometric averages of columns Ratios ratios
of geometric averages
LUT number of LUTs Lev number of LUT
levels Delay delay using LUT library Total
total runtime of Baseline
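
The "Geomean" and "Ratios" rows of such a table are computed as sketched below. The function names are hypothetical, and the numbers in the usage lines are placeholders chosen only to exercise the functions; they are not results from the presentation.

```python
import math


def geomean(values):
    """Geometric mean of positive values, computed in the log domain."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_ratio(optimized, baseline):
    """Ratio of the geometric averages of two columns."""
    return geomean(optimized) / geomean(baseline)


# Placeholder columns, not measured data.
baseline_delay = [2.0, 8.0]
optimized_delay = [1.0, 4.0]
ratio = geomean_ratio(optimized_delay, baseline_delay)  # close to 0.5 for these placeholders
```

A ratio below 1.0 in a delay column means the optimized flow is faster on geometric average across the benchmarks.
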
11
Conclusions and Future Work

Developed a method that is
Fast because there is no repeated timing
analysis
Simple because it leverages AIG package and LUT
mapper
Effective because it makes decisions in the
global space
Future work may include
measuring improvements after place-and-route
extending the algorithm to work for sequential
circuits
applying similar optimization for cost functions
other than delay (e.g. switching activity
minimization)