Title: July 6, 2006
1ECO Timing Optimization Using Spare Cells and Technology Remapping
2Outline
- Introduction and problem formulation
- Previous work and preliminaries
- Algorithm
- Experimental results
- Conclusions
4Introduction
- ECO (Engineering Change Order) is usually performed during the chip implementation cycle.
  - Change the design incrementally.
- When performing ECO on a placed design, a small portion of the netlist is changed to
  - optimize the chip timing.
    - Functionality is unchanged.
  - change chip functions.
    - Logic bugs.
    - New versions.
5Netlist Change Using Spare Cells
- Spare cells are reserved for design changes after placement, and they are distributed evenly across the chip layout.
- Using spare cells is an efficient way to make netlist changes.
  - Saves the time and effort of re-placing the netlist.
  - Saves the production cost of masks.
- It is getting more and more difficult in nanometer technologies.
  - Circuit size is increasing substantially.
  - Timing issues are hard to take into account when changing the netlist locally.
6Problem Formulation
- Given a placed chip layout,
  - rewire the circuit using spare cells. Several techniques are available:
    - gate sizing
    - buffer insertion
    - technology mapping
  - shorten the delays and minimize the total negative slack of all ECO timing paths.
[Figure: path slacks before and after optimization; labels: slack -0.7, slack 0.0, slack -0.5, slack 0.0]
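The objective above, total negative slack, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and the sample values simply reuse the four slack labels from the slide without assuming which path each belongs to.

```python
def total_negative_slack(slacks):
    """Sum only the negative slacks; paths that meet timing contribute nothing."""
    return sum(s for s in slacks if s < 0)


# Hypothetical sample: the four slack values that appear on the slide.
sample_slacks = [-0.7, 0.0, -0.5, 0.0]
tns = total_negative_slack(sample_slacks)
```

A total of zero means every ECO timing path meets its constraint.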
7Outline
- Introduction and problem formulation
- Previous work and preliminaries
- Algorithm
- Experimental results
- Conclusions
8Dynamic Programming
- Buffer insertion on a single net.
  - van Ginneken et al. proposed a dynamic programming framework for slack-optimal buffer insertion on a net.
[Figure: routing tree driven by source gS with sinks gT1, gT2, gT3 (each annotated with Load and RAT) and candidate buffer positions b1 to b4]
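The dynamic programming idea can be sketched on the simplest case, a single unbranched wire. This is only an illustration under stated assumptions: an Elmore delay model, one buffer type, and candidate buffer positions between equal-length segments. All names and parameter values are hypothetical, and the branch-merging step needed for a full routing tree is not shown.

```python
def prune(cands):
    """Keep non-dominated (cap, rat, buffers) candidates: lower cap and higher RAT win."""
    cands = sorted(cands, key=lambda t: (t[0], -t[1]))
    kept, best_rat = [], float("-inf")
    for cap, rat, bufs in cands:
        if rat > best_rat:  # a larger load is only worth keeping if its RAT is better
            kept.append((cap, rat, bufs))
            best_rat = rat
    return kept


def buffer_line(n_seg, seg_len, r, c, sink_cap, sink_rat, r_drv, buf):
    """Return (best RAT at the driver, buffer positions) for a two-pin wire.

    buf = (input_cap, intrinsic_delay, output_resistance); r and c are the
    wire resistance and capacitance per unit length.
    """
    b_cap, b_delay, b_res = buf
    cands = [(sink_cap, sink_rat, ())]
    for seg in range(n_seg - 1, -1, -1):  # walk from the sink toward the source
        R, Cw = r * seg_len, c * seg_len
        # Propagate every candidate across one wire segment (Elmore delay).
        cands = [(cap + Cw, rat - R * (Cw / 2 + cap), bufs) for cap, rat, bufs in cands]
        if seg > 0:  # candidate buffer position at the upstream end of this segment
            buffered = [(b_cap, rat - b_delay - b_res * cap, (seg,) + bufs)
                        for cap, rat, bufs in cands]
            cands = prune(cands + buffered)
    # Account for the driver resistance and pick the best remaining candidate.
    return max(((rat - r_drv * cap, bufs) for cap, rat, bufs in cands), key=lambda t: t[0])
```

Each candidate carries a downstream load and a required arrival time (RAT), the two quantities annotated in the figure; pruning keeps the candidate list short without discarding any solution that could still be optimal.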
9Path Based Buffer Insertion
- Shi et al. proposed a dynamic programming method to perform buffer insertion and gate sizing on a path:
  - Cut the timing-violating paths into distinct paths.
  - View the gates on the path as special types of buffers and merge the whole path into one big routing tree.
  - Perform gate sizing and buffer insertion simultaneously on the routing tree.
[Figure: a path from start point to end point through NAND, OR, and AND gates, redrawn as NAND-type, OR-type, and AND-type buffers]
10Logic Physical Co-synthesis
- Layout-driven technology mapping
  - Proposed by Stok et al.
  - Place the base gates as an initial placement.
  - Map the base gates using the coordinates as cost.
- Local netlist transformation
  - Proposed by Lou et al.
  - Identify parts of the placed netlist that violate some target cost.
  - Extract those critical parts from the chip placement.
  - Re-synthesize and re-place the extracted netlist according to the target cost.
11Timing Model
- Synopsys Liberty library format
  - Uses lookup tables to calculate gate delays.
  - The gate delay and the output transition time are functions of the output loading and the input transition time.
[Figure: delay lookup table indexed by input transition time and output capacitive loading]
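A table of this kind is commonly evaluated by interpolating between the characterized points. The sketch below shows bilinear interpolation over a small two-by-two table; the function name, axis values, and delay numbers are all hypothetical and are not taken from any real library.

```python
from bisect import bisect_right


def lookup(table, slews, loads, slew, load):
    """Bilinear interpolation in a table indexed as table[slew_index][load_index]."""
    def bracket(axis, x):
        # Index of the lower grid point, clamped so that i + 1 stays in range.
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        t = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, t

    i, ts = bracket(slews, slew)
    j, tl = bracket(loads, load)
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


# Hypothetical characterization data: rows follow slews, columns follow loads.
slews = [0.1, 0.5]
loads = [0.01, 0.05]
delay_table = [[0.10, 0.30],
               [0.15, 0.35]]
gate_delay = lookup(delay_table, slews, loads, slew=0.3, load=0.03)
```

Points outside the characterized range are extrapolated linearly from the nearest pair of grid points, which is a choice made for this sketch.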
12Timing Model (cont'd)
- Output loading consists of
  - input pin capacitance
  - output pin capacitance
  - wire loading
- F is the amount of capacitance per unit wirelength.
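The three load components can be combined as in the sketch below. It assumes that wire loading equals F times the wirelength, which is the reading suggested by the definition of F; the function name and the numbers are hypothetical.

```python
def output_load(input_pin_caps, output_pin_cap, wirelength, F):
    """Total load seen by a driving gate.

    Assumption: wire loading = F * wirelength, with F the capacitance
    per unit wirelength.
    """
    return sum(input_pin_caps) + output_pin_cap + F * wirelength


# Hypothetical net: two fan-out pins, one driver pin, 100 units of wire.
load = output_load([0.002, 0.003], 0.001, wirelength=100.0, F=0.0002)
```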
13Properties of The Timing Model
- Loading dominance
  - Output loading has a larger effect on gate delay and output transition time than input transition time does (6.74x vs. 1.48x).
- Shielding
  - A change to the netlist affects the delay of neighboring gates only.
[Figure: gates gk, gj, and gi illustrating the shielding property]
14Properties of The Timing Model (cont'd)
- A buffer chain of identical BUFX1 cells
[Figure: delay and output slope along the buffer chain; labels: input slope, output slope]
15Outline
- Introduction and problem formulation
- Previous work and preliminaries
- Algorithm
  - Overview
  - Tracing ECO paths
  - Dynamic cost programming
  - Example
  - Time complexity analysis
  - Technology remapping
- Experimental results
- Conclusions
16Optimization Flow
- Iterate the optimization loop until the total
negative slack reaches zero or no path can be
improved.
[Figure: optimization flow chart; surviving label: Extension]
17Tracing ECO paths
- When doing STA (static timing analysis),
  - store a pointer at each gate to the fan-in with the largest arrival time.
- Obtain the ECO path
  - Trace these pointers from the end point of the path back to the corresponding start point.
[Figure: an ECO path traced between its start point and end point]
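The pointer bookkeeping and the back-trace can be sketched as below. This is a simplified illustration: it uses one fixed delay per gate and ignores wire delay, and the function names, gate names, and delay values are hypothetical.

```python
def sta_with_pointers(fanins, delay, order):
    """Compute arrival times and remember the latest-arriving fan-in of each gate.

    fanins: gate -> list of fan-in gates; order: gates in topological order.
    """
    arrival, worst_fanin = {}, {}
    for g in order:
        if fanins.get(g):
            worst = max(fanins[g], key=lambda f: arrival[f])
            worst_fanin[g] = worst
            arrival[g] = arrival[worst] + delay[g]
        else:  # start point: no fan-in to point at
            worst_fanin[g] = None
            arrival[g] = delay[g]
    return arrival, worst_fanin


def trace_path(end_point, worst_fanin):
    """Follow the stored pointers from the end point back to the start point."""
    path = []
    g = end_point
    while g is not None:
        path.append(g)
        g = worst_fanin[g]
    return path[::-1]  # report the path from start point to end point


# Hypothetical four-gate circuit.
fanins = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
delay = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 1.0}
arrival, worst_fanin = sta_with_pointers(fanins, delay, ["a", "b", "c", "d"])
eco_path = trace_path("d", worst_fanin)
```

Because the pointers are recorded during the timing pass itself, recovering a critical path costs only one walk along that path.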
18Dynamic Cost Programming (DCP)
- Dynamic programming framework with dynamic cost (3 steps)
  - View each gate as a special type of buffer and merge the whole ECO path into one big routing tree.
  - Perform gate sizing and buffer insertion simultaneously from the end point to the start point.
  - Perform one buffer insertion operation for each net and one gate sizing operation for each gate.
[Figure: an ECO path from start point to end point through NAND, OR, and AND gates, redrawn as NAND-type, OR-type, and AND-type buffers]
19Dynamic Cost
- Unlike the traditional buffer insertion problem, the buffering/sizing cost is dynamic because
  - all spare cells are candidates for buffering/sizing.
  - the number of available spare cells changes during the optimization process.
- Optimum solutions of sub-problems do not necessarily lead to the optimum solution of the overall problem.
  - Need to store a set of solutions for each gate/net.
[Figure: ECO path 1 and ECO path 2 with spare buffers b1 and b2, and solutions plotted as number of inserted buffers vs. path delay. S1: no buffer insertion; S2: insert buffer b1; S3: insert buffer b2]
20Solution Propagation during DCP
- Store each solution as a point on a plane if it shortens the ECO timing path delays.
- The two coordinates are
  - the number of inserted buffers
  - the approximated sub-path delay from the current gate to the end point of the path.
- Sized gates are not counted.
- Estimate the effect of operations without actually applying them.
- Generate solutions based on the solutions of the driven gate/net.
[Figure: two solution planes (number of inserted buffers vs. path delay) for gates g1 and g2 with buffers b1 and b2, showing solutions S1 to S6]
21Judgment of Operations
- The timing effect of a sizing/buffering operation can be estimated by its effect on its fan-ins.
- Buffer insertion operation on net ni
  - If delay'(source of ni) + delay(buffer) < delay(source of ni), store the solutions corresponding to the operation (delay' denotes the delay after the operation).
- Gate sizing operation on gate gi
  - If delay(spare cell) < delay(gi) and delay'(fanin of gi) < delay(fanin of gi), store the solutions corresponding to the operation.
- The timing of non-ECO paths is preserved after optimization.
[Figure: buffer insertion on net ni and gate sizing of gate gi]
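The two acceptance tests can be written as small predicates. This sketch rests on one interpretation: each comparison is taken to set a delay measured after the operation against the same delay measured before it. The function and parameter names are hypothetical.

```python
def accept_buffer_insertion(source_delay_before, source_delay_after, buffer_delay):
    """Accept when the relieved source gate plus the new buffer is faster than before."""
    return source_delay_after + buffer_delay < source_delay_before


def accept_gate_sizing(gate_delay, spare_delay, fanin_delay_before, fanin_delay_after):
    """Accept when the spare cell is faster and the fan-in gate does not get slower."""
    return spare_delay < gate_delay and fanin_delay_after < fanin_delay_before


# Hypothetical numbers: the buffer costs 0.2 but speeds the source up by 0.5.
keep_buffer = accept_buffer_insertion(1.0, 0.5, 0.2)
```

Both checks use only the delays of the gates next to the change, which is what the shielding property of the timing model permits.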
22Bounding Box Theorem
- We find a theorem that greatly reduces the number of buffering/sizing candidates.
- Assumptions
  - Gate delays are independent of the input transition time.
  - The driving capabilities of the sized gate and the sizing spare cell are the same.
23width = dis(gE1, gE2) + dis(gE1, gE3) + (CEi1 + CEi2)/F, center: gE1
[Figure: bounding box around net nE1 and gates gE1, gE2, gE3]
24Bounding Box Theorem
25Bounding polygon
- width = dis(gE1, gE2) + dis(gE1, gE3) + (CEo1)/F, center: gE2
- width = dis(gE1, gE4) + (CEi1)/F, center: gE4
- width = dis(gE1, gE2) + dis(gE1, gE3) + (CEo1)/F, center: gE3
[Figure: bounding polygon formed by the boxes around gates gE1, gE2, gE3, and gE4]
26Solution Pruning during DCP
- For each set of solutions, we keep at most k solutions (k is a user-defined parameter).
  - Discard dominated (non-dominant) solutions.
  - Classify the remaining solutions by the number of buffers used.
  - Keep the best solutions in each class.
[Figure: solutions plotted as number of inserted buffers (0 to 3) vs. path delay]
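The pruning steps can be sketched as follows, with each solution represented as a (number of buffers, path delay) pair. The rule used when more than k solutions survive (prefer the lowest delays) is an assumption made for this sketch, as are the function name and the sample data.

```python
def prune_solutions(solutions, k):
    """Reduce a list of (n_buffers, delay) solutions to at most k entries."""
    # Step 1: discard dominated solutions (another one is no worse in both coordinates).
    front = [s for s in solutions
             if not any(o != s and o[0] <= s[0] and o[1] <= s[1] for o in solutions)]
    # Step 2: classify by buffer count and keep the lowest delay in each class.
    best = {}
    for n, d in front:
        if n not in best or d < best[n]:
            best[n] = d
    kept = sorted(best.items())
    # Step 3 (assumption): if more than k remain, prefer the lowest delays.
    if len(kept) > k:
        kept = sorted(sorted(kept, key=lambda s: s[1])[:k])
    return kept


# Hypothetical solution set at one gate.
candidates = [(0, 10.0), (1, 8.0), (1, 9.0), (2, 8.0), (3, 5.0)]
pruned = prune_solutions(candidates, k=2)
```

In the sample, (1, 9.0) and (2, 8.0) are dropped because (1, 8.0) is at least as good in both coordinates.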
27End of DCP
- At the start point of the ECO path, choose the solution which
  - meets the timing constraint
  - uses the least number of buffers
- Change the netlist according to the solution.
- Run STA to update the timing information.
[Figure: solutions at the start point plotted as number of inserted buffers (0 to 3) vs. path delay, with the clock cycle marked on the delay axis]
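The final choice can be sketched as a filter followed by a minimum. Solutions are (number of buffers, path delay) pairs; returning None when nothing meets the constraint is an assumption of this sketch, and the names and numbers are hypothetical.

```python
def choose_final(solutions, clock_cycle):
    """Pick the feasible solution with the fewest buffers (ties: lower delay)."""
    feasible = [s for s in solutions if s[1] <= clock_cycle]
    if not feasible:
        return None  # assumption: signal that no stored solution meets timing
    return min(feasible, key=lambda s: (s[0], s[1]))


# Hypothetical solution set at the start point of an ECO path.
chosen = choose_final([(0, 10.0), (1, 8.0), (3, 5.0)], clock_cycle=9.0)
```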
28An Example for Complex ECO Paths
Path  Source-Target  Negative slack
P1    S1-T1          large
P2    S1-T2          medium
P3    S2-T3          small
[Figure: example circuit with start points S1 and S2, end points T1 to T3, ECO paths P1 to P3, and buffer-type and gate-type spare cells; remaining labels: Slack, LIST, FINISH]
29Time Complexity Analysis of Phase 1
- Parameters
  - Gate count: V
  - Number of spare cells: N
  - Number of DCP iterations: L
  - Maximum number of gates on an ECO path: M
  - Keep at most k solutions per operation
- Complexity of DCP: O(kMN)
- Complexity of STA: O(V)
- Complexity of phase 1: O((kMN + V)L)
30Extension: Technology Remapping
- After DCP, we can further improve the circuit timing by the following steps:
  - Identify timing-critical parts of the netlist.
  - Extract those parts from the netlist.
  - Re-synthesize and map the extracted netlist.
    - Decomposition by MVSIS
    - Ideal mapping locations
    - Technology mapping
  - Run STA to update the timing information.
31Optimal Buffering to a Line
- The optimal buffering of a line is to insert buffers at equal distances.
  - No gate drives too large a load.
[Figure: optimal buffering vs. non-optimal buffering of a line]
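A small numeric check makes the equal-distance rule concrete. The sketch assumes that the delay of each wire segment grows with the square of its length and that every buffer adds the same fixed delay; the function names and values are hypothetical.

```python
def line_delay(buffer_positions, length, wire_coeff=1.0, buffer_delay=0.0):
    """Delay of a buffered line, assuming segment delay = wire_coeff * length**2."""
    points = [0.0] + sorted(buffer_positions) + [length]
    segments = [b - a for a, b in zip(points, points[1:])]
    return sum(wire_coeff * s * s for s in segments) + buffer_delay * len(buffer_positions)


def equal_spacing(n_buffers, length):
    """Positions that split the line into n_buffers + 1 equal segments."""
    return [length * (i + 1) / (n_buffers + 1) for i in range(n_buffers)]


# Hypothetical line of length 12 with two buffers.
even = line_delay(equal_spacing(2, 12.0), 12.0)
uneven = line_delay([2.0, 10.0], 12.0)
```

With the same two buffers, the evenly spaced line has the smaller delay because a sum of squares with a fixed total is smallest when all terms are equal.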
32Ideal Mapping Locations
- Given the locations of the input and output pins, map the base gates evenly between the input and output pins.
  - No gate drives too large a load, and the path delay is smaller (delay is proportional to the square of the wirelength).
  - Makes buffer insertion easier.
[Figure: two placements of base gates between Input A, Input B, and Output, compared by inserted buffers and delay]
33Calculating Ideal Mapping Locations
- For each path from an input pin to an output pin, calculate the ideal location of every base gate on the path by spacing the gates at equal distances.
- If a base gate has more than one ideal location, average these values to get its final ideal location.
[Figure: ideal locations of base gates between Input A, Input B, and Output]
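The two rules above can be sketched as follows. The sketch assumes that "equal distance" means equal steps along the straight line between the two pins; the function name, pin names, and coordinates are hypothetical.

```python
def ideal_locations(paths, pin_xy):
    """Average the evenly spaced positions a gate receives from each path.

    paths: lists such as [input_pin, gate, ..., gate, output_pin];
    pin_xy: pin name -> (x, y).
    """
    samples = {}
    for path in paths:
        (x0, y0), (x1, y1) = pin_xy[path[0]], pin_xy[path[-1]]
        steps = len(path) - 1
        for i, gate in enumerate(path[1:-1], start=1):
            t = i / steps  # fraction of the way from the input pin to the output pin
            samples.setdefault(gate, []).append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return {g: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
            for g, pts in samples.items()}


# Hypothetical netlist: g2 lies on both paths, g1 only on the path from A.
pins = {"A": (0.0, 0.0), "B": (0.0, 6.0), "Out": (6.0, 0.0)}
locations = ideal_locations([["A", "g1", "g2", "Out"], ["B", "g2", "Out"]], pins)
```

Gate g2 receives one position from each path, (4, 0) and (3, 3), and ends up at their average.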
34Technology Mapping
- Consider the actual locations of spare cells as costs.
  - Cut the network into trees.
  - Apply a dynamic programming method to map each tree.
    - Locations of mapped base gates are the locations of the corresponding spare cells.
    - Locations of unmapped base gates are their ideal locations.
- Insert buffers into the mapped circuit to further improve timing.
[Figure: mapped circuit between Input A, Input B, and Output]
35Maximum Independent Set
- To choose a globally optimal solution for the technology remapping, we store a set of match solutions for each tree and use MIS (maximum independent set) to find the best assignment.
[Figure: trees T1, T2, and T3 over gates g1 to g6, with candidate matches M1_1, M1_2, M2_1, M2_2, M2_3, M3_1, M3_2]
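The selection step can be sketched as a search over a conflict graph. The sketch makes several assumptions that the slide does not state: two match solutions conflict when they belong to the same tree or use the same spare cell, each match carries a timing gain, and the goal is the conflict-free set with the largest total gain. It uses brute force, so it is only suitable for tiny inputs; the spare-cell names and gains are hypothetical.

```python
from itertools import combinations


def best_assignment(matches):
    """Brute-force maximum-weight independent set over match solutions.

    matches: name -> (tree, set_of_spare_cells, gain).
    """
    names = list(matches)

    def conflict(a, b):
        tree_a, cells_a, _ = matches[a]
        tree_b, cells_b, _ = matches[b]
        return tree_a == tree_b or bool(cells_a & cells_b)

    best, best_gain = (), 0.0
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            if any(conflict(a, b) for a, b in combinations(subset, 2)):
                continue  # not an independent set in the conflict graph
            gain = sum(matches[m][2] for m in subset)
            if gain > best_gain:
                best, best_gain = subset, gain
    return set(best), best_gain


# Hypothetical match solutions; the names follow the figure labels.
matches = {
    "M1_1": ("T1", {"s1"}, 3.0),
    "M1_2": ("T1", {"s2"}, 2.0),
    "M2_1": ("T2", {"s1"}, 5.0),
    "M2_2": ("T2", {"s3"}, 1.0),
    "M3_1": ("T3", {"s2"}, 2.5),
}
chosen_matches, total_gain = best_assignment(matches)
```

Taking the best match of each tree in isolation could assign one spare cell twice, which is why the choice is made jointly across trees.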
36Outline
- Introduction and problem formulation
- Previous work and preliminaries
- Algorithm
- Experimental results
- Conclusions
37Experimental Results
- The five benchmarks are industrial designs.
- Our tool was run on a Linux workstation with a 3.2 GHz CPU and 3 GB of memory.
38Experimental Results (cont'd)
- Our tool beat all competitors on the same subject in the CAD contest '05.
- We compare the results of our algorithm with
  - the case without the aid of the bounding box theorem.
  - a greedy wire-cost heuristic.
39Experimental Results (cont'd)
[Figure: results before optimization and after optimization]
40Outline
- Introduction and problem formulation
- Previous work and preliminaries
- Algorithm
- Experimental results
- Conclusions
41Conclusions
- We proposed a dynamic programming method that considers dynamic cost to solve the ECO timing optimization problem.
- Functional change under timing constraints is a tougher task, and we will extend our work in this direction.