Title: Timing Issues for DSM
1Timing Issues for DSM
2Caveats
- This talk is about a work in progress
- Much of the work is roughly described with the idea of just communicating the general thrust.
- Many details remain to be decided and currently several algorithms are being programmed for experimental purposes.
- We are just in the middle of many studies and depending on their results, the direction of the project may change.
3Outline
- Introduction: DSM project at Berkeley
- Our timing abstraction and motivation
- Timing driven placement (wireplanning)
- slicing approach
- programming approach
- matching approach
- Iterated logic decomposition
- Logic rip-up and re-route
- Technology aspects
4Overview
- Two levels of approach
- electrical and technology level
- logic level using timing abstraction
- Electrical level used to ensure reality
- predict technology dimensions
- place and wire transistors to create leaf cells using Cadence's LAS tool or CADABRA
- extract parasitics using SPACE or FASTCAP
- simulate using SPICE with advanced BSIM model
5Overview
- Logic level works with a timing abstraction (to be explained)
- we need to be sure that abstraction is correct (thus electrical experiments)
- Currently cross-talk noise effects on timing ignored
- Immediate goal is to build combinational logic macros that meet timing constraints
- sequential circuits can be handled similarly
6Macro Problem Statement
- Given
- rectangular area, inputs and outputs on perimeter.
- required times on outputs, arrival times on inputs.
- set of logic functions to be synthesized
(possibly pin locations can be somewhat flexible)
- Find: logic decomposition of the functions that can be
- placed and wired in the given area
- meeting the timing constraints.
7Some Facts
- As dimensions shrink, gate delays decrease and wire delays increase
- in the limit all delays are in the wires.
- On a net, by a combination of buffer insertion and wire sizing
- delay of net from root to any leaf can be made linear in the Manhattan distance from root to leaf.
8Linear Delay
- By buffer insertion
- spacing is determined by resistance and capacitance of the line and the buffers
- optimum spacing of optimum-sized buffers makes the delay linear
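The linearity claim can be illustrated with a first-order Elmore delay model. This is a minimal sketch, not the authors' analysis: the model, the function names, and every electrical value are assumptions chosen only to make the arithmetic easy to follow.

```python
import math

# All electrical values are illustrative assumptions, not data from the talk.
R_WIRE = 0.5      # wire resistance per unit length (ohm / um)
C_WIRE = 0.2e-15  # wire capacitance per unit length (F / um)
R_BUF = 1000.0    # buffer output resistance (ohm)
C_BUF = 2.0e-15   # buffer input capacitance (F)


def segment_delay(length):
    """Elmore delay of one buffer driving `length` um of wire into the next buffer."""
    return (R_BUF * (C_WIRE * length + C_BUF)
            + R_WIRE * length * (C_WIRE * length / 2 + C_BUF))


def unbuffered_delay(length):
    """One driver and one long wire: the r*c*L^2/2 term dominates for long wires."""
    return segment_delay(length)


def optimal_spacing():
    """Segment length that minimizes delay per unit length in this model."""
    return math.sqrt(2 * R_BUF * C_BUF / (R_WIRE * C_WIRE))


def buffered_delay(length):
    """Delay with buffers at the optimal spacing (fractional stage count allowed)."""
    spacing = optimal_spacing()
    return (length / spacing) * segment_delay(spacing)


for length in (1000, 2000, 4000):
    print(length, unbuffered_delay(length), buffered_delay(length))
```

In this model the buffered delay is a constant times the wire length, so doubling the length doubles the delay, while the unbuffered delay grows roughly quadratically once the wire term dominates.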
9Linear Wire Delay Model for a Net
[Figure: a net drawn against x and y axes]
Delay is made linear by buffer insertion and wire and buffer sizing
10Timing Abstraction: Linear Delay Model (LDM)
- Delay is a linear function of the Manhattan distance, independent of the logic it meets along a path.
[Figure: signals a, b, c and f]
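Under the LDM, the delay of a path depends only on the total Manhattan length it travels through the placed points. A small sketch, assuming a hypothetical delay constant `DELAY_PER_UNIT` and made-up coordinates:

```python
DELAY_PER_UNIT = 1.0  # assumed delay per unit of Manhattan distance


def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def path_delay(points, k=DELAY_PER_UNIT):
    """LDM delay along a path of placed points: k times total Manhattan length."""
    return k * sum(manhattan(p, q) for p, q in zip(points, points[1:]))


a, f = (0, 0), (6, 4)
inside = (2, 3)   # gate placed inside the bounding box of a and f
outside = (8, 1)  # gate placed outside that bounding box

direct = path_delay([a, f])
via_inside = path_delay([a, inside, f])
via_outside = path_delay([a, outside, f])
print(direct, via_inside, via_outside)
```

A gate placed inside the bounding box of the path's endpoints leaves the delay unchanged, while a gate outside it forces a detour and a longer delay.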
11Caveat
- So far we are not considering the effect of
cross-talk noise on delay
[Figure: victim and aggressor wires]
Victim can be slowed by aggressor if transitions
are opposing
12Common Divisors May Cause Paths to Stray
But in this example, the longest path is not
increased
13Example Where Longest Path Must be Increased
[Figure: inputs a and b, outputs f and g, candidate divisor h]
Any divisor h(a,b) common to both f and g cannot be placed without increasing longest path
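With Manhattan distances, a divisor adds no length to a path from an input to an output exactly when it lies inside the bounding box of those two pins. The sketch below intersects those boxes for every input-output pair of a common divisor; the function names and coordinates are hypothetical. An empty intersection means some path must lengthen, and whether the longest path is affected then depends on the available slack.

```python
def bbox(p, q):
    """Bounding box of two points as (xmin, ymin, xmax, ymax)."""
    return (min(p[0], q[0]), min(p[1], q[1]), max(p[0], q[0]), max(p[1], q[1]))


def intersect(boxes):
    """Intersection of axis-aligned boxes, or None if it is empty."""
    xmin = max(b[0] for b in boxes)
    ymin = max(b[1] for b in boxes)
    xmax = min(b[2] for b in boxes)
    ymax = min(b[3] for b in boxes)
    if xmin > xmax or ymin > ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def detour_free_region(inputs, outputs):
    """Region where a divisor fed by `inputs` and feeding `outputs` adds no path length."""
    return intersect([bbox(i, o) for i in inputs for o in outputs])


# Inputs on one side of both outputs: a detour-free region exists.
print(detour_free_region([(0, 0), (1, 1)], [(6, 4), (5, 6)]))
# Pins on crossing diagonals: no detour-free position for a common divisor.
print(detour_free_region([(0, 0), (0, 4)], [(6, 4), (6, 0)]))
```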
14Problem 1: Timing Driven Point Placement
- Given: area, arrival and required times, pin positions, and a decomposition (netlist)
- Find: point placement that satisfies all timing constraints.
- No consideration of areas required to implement logic gates
- Areas of gates can be approximated by count of literals in factored form
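Checking whether a given point placement satisfies the timing constraints amounts to a delay trace under the LDM. A minimal sketch, assuming the netlist is given as a hypothetical `fanin` dictionary and that all names, coordinates, and times are made up for illustration:

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def arrival_times(fanin, pos, input_arrival, k=1.0):
    """Latest arrival time at every node under the linear delay model."""
    arrival = dict(input_arrival)

    def visit(node):
        # Memoized recursion over the fan-in cone of `node`.
        if node not in arrival:
            arrival[node] = max(
                visit(src) + k * manhattan(pos[src], pos[node])
                for src in fanin[node]
            )
        return arrival[node]

    for node in fanin:
        visit(node)
    return arrival


def violations(fanin, pos, input_arrival, required, k=1.0):
    """Outputs that arrive later than required, with the amount of lateness."""
    arrival = arrival_times(fanin, pos, input_arrival, k)
    return {o: arrival[o] - t for o, t in required.items() if arrival[o] > t}


pos = {"a": (0, 0), "b": (0, 4), "h": (2, 2), "f": (6, 2)}
fanin = {"h": ["a", "b"], "f": ["h"]}
arrivals = {"a": 0, "b": 1}

print(arrival_times(fanin, pos, arrivals))
print(violations(fanin, pos, arrivals, {"f": 10}))
print(violations(fanin, pos, arrivals, {"f": 8}))
```

An empty result from `violations` means the placement meets every required time in this simplified model.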
15Pure Point Placement
[Figure: point placement with pins a, b, c, f, g and a congested area]
16Problem 2: Placement with Area Constraints
- Areas are flexible. Leaf cell gates remain to be built. Gate types remain to be determined (PLAs, domino, PTL, etc.)
- Three experimental wireplanning approaches
- slicing
- programming
- matching
17Slicing Approach
- Use simulated annealing (SA) to get point placement
- cost function for SA is derived by doing a delay trace through the placed points
- After SA, derive slicing structure from point placement
- Use flexibility of areas for final placement
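The first two steps can be sketched as a textbook simulated-annealing loop whose cost is the total timing violation found by a delay trace. This is an illustration under stated assumptions, not the project's implementation: the move set (re-drawing one gate's position at random), the cooling schedule, and all names and numbers are hypothetical.

```python
import math
import random


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def timing_cost(pos, fanin, order, input_arrival, required, k=1.0):
    """Delay trace in topological order; cost is the total timing violation."""
    arrival = dict(input_arrival)
    for node in order:
        arrival[node] = max(
            arrival[s] + k * manhattan(pos[s], pos[node]) for s in fanin[node]
        )
    return sum(max(0.0, arrival[o] - t) for o, t in required.items())


def anneal(pos, movable, fanin, order, input_arrival, required,
           size=10.0, steps=2000, t0=5.0, alpha=0.995, seed=0):
    """Anneal the positions of `movable` gates inside a size-by-size area."""
    rng = random.Random(seed)
    pos = dict(pos)
    cost = timing_cost(pos, fanin, order, input_arrival, required)
    best_pos, best_cost = dict(pos), cost
    temp = t0
    for _ in range(steps):
        gate = rng.choice(movable)
        old = pos[gate]
        pos[gate] = (rng.uniform(0, size), rng.uniform(0, size))
        new_cost = timing_cost(pos, fanin, order, input_arrival, required)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best_pos, best_cost = dict(pos), cost
        else:
            pos[gate] = old  # reject the move
        temp *= alpha  # geometric cooling
    return best_pos, best_cost


pins = {"a": (0.0, 0.0), "b": (0.0, 10.0), "f": (10.0, 5.0)}
start = dict(pins, h=(9.0, 9.0))  # h is the only movable gate
fanin = {"h": ["a", "b"], "f": ["h"]}
order = ["h", "f"]
arrivals = {"a": 0.0, "b": 0.0}
required = {"f": 16.0}

initial_cost = timing_cost(start, fanin, order, arrivals, required)
best_pos, best_cost = anneal(start, ["h"], fanin, order, arrivals, required)
print(initial_cost, best_cost, best_pos["h"])
```

Deriving the slicing structure from the resulting points and exploiting area flexibility are separate steps that this sketch does not cover.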
18Slicing Approach
Hypothesis: Can make slicing so that distances are not perturbed too much from point placement
Distances are estimated now as Manhattan distance center-to-center
Once we get slicing structure, we need to build logic in blocks allocated
LDM implies that we can build the logic so that delay < distance across logic sub-block
19Programming Approach
- Get initial point placement with force directed type method (or SA)
- force points apart to provide space for areas
- this gives relative point positions
- Distribute slacks using zero slack distribution
- Formulate and solve LP
20LP Formulation
- Distributed slacks give bound on wire lengths, d_ij
- Assume aspect ratio given for each gate
- Point placement gives relative positions
All areas scaled by to guarantee
feasibility
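One way such an LP could be written is sketched below. The slide does not give the formulation, so the variables, the scale variable \sigma, and the objective are all assumptions. Here (x_i, y_i) is the center of gate i, w_i and h_i are its nominal width and height (from its area and aspect ratio), W and H are the dimensions of the given area, s^x_{ij} and s^y_{ij} are signs in {+1, -1} fixed by the relative positions in the initial point placement, and P^x and P^y are the gate pairs to be kept apart horizontally and vertically.

```latex
\begin{aligned}
\max_{x,\,y,\,\sigma}\quad & \sigma \\
\text{s.t.}\quad
& s^x_{ij}(x_j - x_i) \ge 0,\quad s^y_{ij}(y_j - y_i) \ge 0
  && \text{for each connection } (i,j),\\
& s^x_{ij}(x_j - x_i) + s^y_{ij}(y_j - y_i) \le d_{ij}
  && \text{for each connection } (i,j),\\
& s^x_{ij}(x_j - x_i) \ge \tfrac{\sigma}{2}\,(w_i + w_j)
  && \text{for } (i,j) \in P^x,\\
& s^y_{ij}(y_j - y_i) \ge \tfrac{\sigma}{2}\,(h_i + h_j)
  && \text{for } (i,j) \in P^y,\\
& \tfrac{\sigma}{2} w_i \le x_i \le W - \tfrac{\sigma}{2} w_i,\quad
  \tfrac{\sigma}{2} h_i \le y_i \le H - \tfrac{\sigma}{2} h_i
  && \text{for each gate } i.
\end{aligned}
```

Because the relative positions are fixed, the Manhattan distance in the second constraint needs no absolute values and every constraint is linear in x, y, and \sigma. In this sketch \sigma scales gate dimensions, so areas scale by \sigma^2, and a solution with \sigma \ge 1 would mean the nominal areas fit.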
21Matching Approach
- Divide area into minimum size squares
- Label each square with functions that it can
contain without violating timing
[Figure: grid of squares with pins a, b, c, f, g, h; example square labels fg/abc, gh/bc, fh/ac]
22Matching Approach
- Each logic gate fans out to a set of primary outputs (fg) and fans in from a set of primary inputs (abc)
- Thus a gate is labeled, say, fg/abc
- Each gate is given an area (literals in factored form)
- Want to match gates to squares so that each square's capacity is not violated.
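The labeling and matching steps can be sketched as follows. A square is treated as feasible for a gate when every path from one of the gate's primary inputs, through the square, to one of its primary outputs meets the required time under the LDM. The greedy assignment is a stand-in heuristic, not the matching algorithm of the talk, and all names, coordinates, and capacities are hypothetical.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def feasible_squares(gate, squares, pins, arrival, required, k=1.0):
    """Squares where placing `gate` keeps every input-to-output path on time."""
    ok = []
    for name, center in squares.items():
        if all(
            arrival[i] + k * (manhattan(pins[i], center) + manhattan(center, pins[o]))
            <= required[o]
            for i in gate["fanin"] for o in gate["fanout"]
        ):
            ok.append(name)
    return ok


def greedy_match(gates, squares, capacity, pins, arrival, required):
    """Place the most constrained gates first; None if some gate does not fit."""
    options = {g: feasible_squares(spec, squares, pins, arrival, required)
               for g, spec in gates.items()}
    free = dict(capacity)
    assignment = {}
    for g in sorted(gates, key=lambda g: len(options[g])):
        fits = [s for s in options[g] if free[s] >= gates[g]["area"]]
        if not fits:
            return None
        best = max(fits, key=lambda s: free[s])  # square with most room left
        assignment[g] = best
        free[best] -= gates[g]["area"]
    return assignment


pins = {"a": (0, 1), "b": (0, 3), "f": (4, 1), "g": (4, 3)}
arrival = {"a": 0, "b": 0}
required = {"f": 4, "g": 4}
squares = {"s00": (1, 1), "s10": (3, 1), "s01": (1, 3), "s11": (3, 3)}
capacity = {name: 4 for name in squares}
gates = {
    "u": {"fanin": ["a"], "fanout": ["f"], "area": 3},
    "v": {"fanin": ["b"], "fanout": ["g"], "area": 3},
    "w": {"fanin": ["a"], "fanout": ["f"], "area": 2},
}

assignment = greedy_match(gates, squares, capacity, pins, arrival, required)
print(assignment)
```

A greedy pass can fail where a true matching would succeed, so a `None` result here does not prove that no valid assignment exists.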
23Iterated Decomposition
- Given netlist and current placement
- Select divisor that can be placed, still
satisfying timing constraints
[Figure: placement after inserting the divisor: smaller areas, some paths longer]
24Iterated Decomposition
- Choose divisor that maximally decreases
- Algorithm:
  Get initial decomposition (say minimum area)
  Selectively duplicate nodes and adjust outputs
  Collapse local trees
  Global timing driven placement
  Do
    select best divisor locally
    adjust placement (reset global placement after k divisors)
  Until area constraints are met
25Fast Local Adjustment
- With slicing method, can insert new divisor into slicing structure, get new placement and do delay trace efficiently.
- So we can accurately reflect area change as it affects delay
- With LP method, can also solve fast.
- Just need inequalities where areas may overlap
26Comments
- After k divisors selected and placed, re-do global placement to better reflect all divisors
- i.e. do total timing driven placement on new netlist
- Selective duplication and collapsing can be done to improve timing during the iteration.
- experimenting with how to choose this selective collapsing
27Rewiring
- To alleviate timing further, rewiring can be done
- Can use SPFDs (sets of pairs of functions to be distinguished) since exact logic in gate is somewhat irrelevant.
- SPFDs allow one wire to replace another
Gives more flexibility than redundancy addition and removal
Uses that logic in blue box can be changed
28Technology Studies
- Guess at process dimensions for DSM
- strawman 0.25 um process
- shrink to get 0.18 um, ..., 0.05 um processes
- Design and lay out different complex gates
- Use Cadence's LAS tool or Cadabra tool
- Extract parasitics using SPACE or FASTCAP
- Simulate with SPICE and Hu's advanced BSIM model
- Verify LDM
29Strawman 0.05 um Process Interconnect
- 9 metal layers
- Copper wires and vias
- Polyimide dielectric (k = 2)
- H/W = 2 for all layers except M9
- M9 kept same as 0.25 um process
- Insulator thickness 0.7 um
[Figure, not to scale: metal stack with H/W labels 2.5/2.0, 2.4/1.2, 1.6/0.8, 0.6/0.3, 0.14/0.07]
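The H/W labels give a feel for how wire resistance grows as the cross-section shrinks. The sketch below computes resistance per unit length as resistivity divided by cross-sectional area. It assumes that the labels are height/width in micrometres and uses a bulk copper resistivity of about 1.7e-8 ohm·m, which ignores thin-film and barrier-layer effects; neither assumption comes from the slides.

```python
RHO_CU = 1.7e-8  # assumed bulk copper resistivity in ohm*m (no thin-film effects)

# (height, width) pairs, assumed to be in micrometres, from the H/W labels.
LAYERS_UM = [(2.5, 2.0), (2.4, 1.2), (1.6, 0.8), (0.6, 0.3), (0.14, 0.07)]


def resistance_per_um(height_um, width_um, rho=RHO_CU):
    """Resistance in ohms per micrometre of wire length: rho / (H * W)."""
    area_m2 = (height_um * 1e-6) * (width_um * 1e-6)
    return rho / area_m2 * 1e-6  # ohm/m converted to ohm/um


for h, w in LAYERS_UM:
    print(f"H/W {h}/{w} um: {resistance_per_um(h, w):.4g} ohm/um")
```

Under these assumptions the smallest cross-section is roughly 500 times more resistive per unit length than the largest, which is consistent with the talk's premise that wire delay dominates as dimensions shrink.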
30First Six Layers of Metal
Approximately to scale
31Design and Extract Flow
[Flow diagram. Labels: manual wireplanning / netlist decomposition; technology file; constraint file; cell styles (hand design, standard cell, domino, pass transistor logic); test.blif, test.blifmv, test.verilog (format?); LAS or Cadabra; test.gds; SPACE (3D); SPICE; interconnect technology parameters and transistor models for the 0.25 um, 0.18 um, 0.10 um and 0.05 um processes]
32Acknowledgements
- Richard Newton
- Alberto Sangiovanni
- Ralph Otten
- Wilsin Gosti
- Amit Narayan
- Philip Chong
- Mukul Prasad
- Amit Mehrotra
- Sunil Khatri
- Ravi Gunturi
- Subarna Sinha
- Hiroshi Murata
- IBM, Motorola, Intel, Fujitsu, Cadence
- SRC