Title: An Efficient Technology Mapping Algorithm Targeting Routing Congestion Under Delay Constraints
1 An Efficient Technology Mapping Algorithm Targeting Routing Congestion Under Delay Constraints
- Rupesh S. Shelar, Intel Corporation, Hillsboro, OR 97124
- Prashant Saxena, Synopsys Inc., Hillsboro, OR 97124
- Xinning Wang, Intel Corporation, Hillsboro, OR 97124
- Sachin S. Sapatnekar, University of Minnesota, Minneapolis, MN 55455
International Symposium on Physical Design, San Francisco, April 5, 2005
2 Outline
- Introduction
- Algorithm Overview
- Congestion Map Generation
- Slack-constrained Covering
- Results and Conclusion
3 Motivation
- Technology Scaling
- Routing resources growing at same rate?
- Upper metal layers for global signals
- Resistive (i.e., wide) wires
- Result: Routing Congestion
4 Targeting Routing Congestion
[Figure: design flow with stages RTL, Technology Mapping, Placement, Routing]
- Can be alleviated during routing, placement, technology mapping, and logic synthesis
- Limited flexibility during P&R points to technology mapping
- Mapping decides wires
5 Previous Work
- Structural logic synthesis
  - Adhesion metric, Kudva et al., TCAD'03
  - Computationally expensive
- Congestion-aware technology mapping using
  - Wirelength, Stok et al., ICCAD'01; Pandini et al., TCAD'03
    - "a purely top-down single-pass congestion-aware technology mapping is merely wishful thinking."
  - Mutual contraction (MC), Liu et al., ISPD'05
  - Predictive probabilistic congestion, Shelar et al., TCAD'05
    - Congestion map based on subject graph
7 Problem Definition
- Minimize routing congestion under delay constraints during technology mapping
- Dynamic programming for delay constraints
- Routing congestion captured by track overflow and max. congestion
- Minimize total track overflow under delay constraints
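The two congestion measures named on this slide can be made concrete with a small sketch. It assumes a common definition that the slides do not spell out: the track overflow of a bin is the routing demand in excess of the bin's track capacity, total track overflow is the sum over all bins, and maximum congestion is the largest demand-to-capacity ratio. The function names, the 2x2 grid, and the uniform capacity are illustrative assumptions.

```python
def total_track_overflow(demand, capacity):
    """Sum of demand in excess of capacity over all bins (assumed definition)."""
    return sum(max(0.0, d - capacity) for row in demand for d in row)


def max_congestion(demand, capacity):
    """Largest demand-to-capacity ratio over all bins (assumed definition)."""
    return max(d / capacity for row in demand for d in row)


# Hypothetical 2x2 congestion map: tracks demanded in each bin.
demand = [[3.0, 6.0],
          [9.0, 4.0]]
capacity = 5.0  # hypothetical number of tracks available per bin

overflow = total_track_overflow(demand, capacity)
worst = max_congestion(demand, capacity)
```

Only the bins with demand 6 and 9 exceed the capacity here, so they alone contribute to the overflow.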
8 Employing Placement-level Metric
- Wirelength and mutual contraction cannot capture track overflow
- Predictive probabilistic congestion map can
  - Same congestion map for different mapping choices
- Can we instead employ a placement-/routing-level metric?
9 Probabilistic Congestion Map
- Probabilistic congestion map, a post-placement metric
  - Lou et al., TCAD'02; Westra et al., ISPD'04
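To illustrate what a probabilistic congestion map records, here is a minimal sketch for one two-pin net. It assumes a deliberately simplified model in which every monotone shortest route between the pins is equally likely. This is not claimed to be the exact model of the cited papers. The function name and the grid coordinates are hypothetical.

```python
from math import comb


def route_probability_map(width, height, src, dst):
    """Fraction of monotone shortest routes from src to dst that cross each bin.

    Bins are addressed as grid[y][x]; src and dst are (x, y) bin coordinates.
    """
    (x1, y1), (x2, y2) = src, dst
    dx, dy = abs(x2 - x1), abs(y2 - y1)
    sx = 1 if x2 >= x1 else -1
    sy = 1 if y2 >= y1 else -1
    total = comb(dx + dy, dx)  # number of monotone shortest routes
    grid = [[0.0] * width for _ in range(height)]
    for i in range(dx + 1):
        for j in range(dy + 1):
            to_bin = comb(i + j, i)                       # routes src -> bin
            from_bin = comb((dx - i) + (dy - j), dx - i)  # routes bin -> dst
            grid[y1 + sy * j][x1 + sx * i] = to_bin * from_bin / total
    return grid


# Hypothetical net spanning a 3x3 grid from one corner to the opposite corner.
prob = route_probability_map(3, 3, (0, 0), (2, 2))
```

Summing such per-net maps over all nets gives an expected routing demand per bin, which is the kind of 2-D map the rest of the talk propagates.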
10 Chicken-and-Egg Problem
- Overflow computation requires congestion map
  - Available after mapping
- Track overflow of a wire depends on other cones also
  - Overflow due to Wire1 depends on Wire2 and vice versa
  - Area or delay at Wire1 does not depend on Wire2
11 Solution Overview
- Track overflow cannot be computed incrementally, but congestion maps can.
  - Construct congestion maps using algebraic operations
  - Defer track overflow computation to covering
  - Requires congestion maps capturing all wires in mapping solutions
- Overcome the chicken-and-egg problem
  - Construct congestion maps bottom-up during matching
  - Compute track overflow during covering
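A small numeric sketch shows why overflow is deferred while maps are accumulated: congestion maps add bin by bin, but overflow is a thresholded quantity and so is not additive. The maps, the capacity, and the helper names below are hypothetical, and overflow is assumed to mean demand in excess of capacity summed over bins.

```python
def add_maps(a, b):
    """Bin-by-bin sum of two congestion maps of equal size."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def total_overflow(demand, capacity):
    """Demand in excess of capacity, summed over bins (assumed definition)."""
    return sum(max(0, d - capacity) for row in demand for d in row)


CAPACITY = 4  # hypothetical tracks per bin

wire1 = [[3, 0],
         [1, 2]]
wire2 = [[2, 0],
         [1, 3]]

combined = add_maps(wire1, wire2)

# Each wire alone fits within capacity, so summing per-wire overflows gives 0.
separate = total_overflow(wire1, CAPACITY) + total_overflow(wire2, CAPACITY)
# The combined map exceeds capacity in two bins.
together = total_overflow(combined, CAPACITY)
```

Here `separate` and `together` differ, which is the reason the overflow of one wire cannot be evaluated without the map of the others.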
13 The Matching Phase
[Figure: load-delay curve for matches M1, M2, M3; x-axis: Load (L1, L2), y-axis: Delay (D1, D2)]
- Store the load-delay curve containing non-inferior delay matches
- Performed for all nodes in topological order
- Compute congestion map for each non-inferior match
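The idea of keeping only non-inferior matches on a load-delay curve can be sketched as follows. The sketch assumes a linear delay model (delay = intrinsic + slope x load) and approximates the lower envelope by sampling loads. Both choices, the match parameters, and the extra match M4 are hypothetical and are not taken from the slides.

```python
def match_delay(match, load):
    """Delay of a match under the assumed linear model."""
    _, intrinsic, slope = match
    return intrinsic + slope * load


def non_inferior(matches, loads):
    """Names of matches that give the minimum delay for at least one sampled load."""
    keep = []
    for load in loads:
        best = min(matches, key=lambda m: match_delay(m, load))
        if best[0] not in keep:
            keep.append(best[0])
    return keep


# (name, intrinsic delay, delay per unit load); all values are hypothetical.
matches = [
    ("M1", 15.0, 3.0),
    ("M2", 35.0, 1.0),
    ("M3", 45.0, 0.5),
    ("M4", 50.0, 3.0),  # slower than M1 at every load, so it is inferior
]
sample_loads = [0, 5, 10, 15, 20, 25, 30]

curve = non_inferior(matches, sample_loads)
```

An exact implementation would compute the breakpoints of the lower envelope instead of sampling. Sampling is used here only to keep the example short.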
14 Algebraic Addition for Congestion Maps
[Figure: algebraic addition of congestion maps; labels N1, N2, N3, M1]
15 Handling Multiple Fanouts
- For forward propagation, divide congestion maps by the number of fanouts
- Allows correct computation of maps for solutions at POs
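The effect of dividing by the fanout count can be checked on a tiny example: a node S with two fanouts feeds two primary outputs A and B. If each fanout receives half of S's map, adding the maps at the POs counts the wires in S's cone exactly once. The helper names, the map values, and the circuit are hypothetical.

```python
def add(a, b):
    """Bin-by-bin sum of two maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, k):
    """Multiply every bin of a map by k."""
    return [[x * k for x in row] for row in a]


def propagate(own_wires, fanins):
    """Map for a match: its own wires plus each fan-in map divided by that fan-in's fanout count."""
    total = own_wires
    for fanin_map, fanout_count in fanins:
        total = add(total, scale(fanin_map, 1.0 / fanout_count))
    return total


map_s = [[2.0, 4.0], [0.0, 2.0]]    # wires in the fan-in cone of S
wires_a = [[1.0, 0.0], [0.0, 0.0]]  # wires introduced by the match at A
wires_b = [[0.0, 0.0], [1.0, 1.0]]  # wires introduced by the match at B

map_a = propagate(wires_a, [(map_s, 2)])
map_b = propagate(wires_b, [(map_s, 2)])

solution = add(map_a, map_b)                  # sum of the maps at the POs
expected = add(add(wires_a, wires_b), map_s)  # every wire counted once
```

Without the division, the cone of S would be counted twice in `solution`.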
16 Congestion Map Generation
- Congestion map for a match at a node represents wires from the fan-in cone only
- Add congestion maps for matches at POs to get congestion map for an entire solution
- Extensible to congestion based on fast global routing
- Applicable to generation and propagation of any 2-D maps, e.g., power-density map
18 Exploiting Slacks
[Figure: load-delay curves for matches M1, M2, M3; x-axis: Load, y-axis: Delay]
- Classical covering chooses the delay-optimal match
  - For Cload = 15, M2 is optimal with Delay = 50
- Assume slack of 10
  - M1 and M3 also satisfy delay constraints
- Allow non-delay-optimal matches on non-critical paths
  - M1 or M3 preferred if the corresponding overflows are smaller
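The choice described on this slide can be worked through numerically. From the slide, only Cload = 15, the delay of 50 for M2, and the slack of 10 are given. The linear delay parameters, the delays of M1 and M3, and all overflow values below are hypothetical, chosen so that M2 is delay-optimal at load 15 and M1 and M3 fall within the slack.

```python
# Hypothetical match data: delay = intrinsic + slope * load.
MATCHES = {
    "M1": {"intrinsic": 15.0, "slope": 3.0, "overflow": 4.0},
    "M2": {"intrinsic": 35.0, "slope": 1.0, "overflow": 9.0},
    "M3": {"intrinsic": 45.0, "slope": 0.5, "overflow": 6.0},
}


def delay(match, load):
    """Delay of a match under the assumed linear model."""
    return match["intrinsic"] + match["slope"] * load


def pick_match(matches, load, slack):
    """Among matches within `slack` of the optimum delay, pick the lowest overflow."""
    delays = {name: delay(m, load) for name, m in matches.items()}
    required = min(delays.values()) + slack
    feasible = [name for name, d in delays.items() if d <= required]
    best = min(feasible, key=lambda name: matches[name]["overflow"])
    return best, delays


choice, delays = pick_match(MATCHES, load=15.0, slack=10.0)
no_slack_choice, _ = pick_match(MATCHES, load=15.0, slack=0.0)
```

With the slack of 10, all three matches meet the required time of 60, so the one with the smallest overflow is selected. With no slack, only the delay-optimal M2 remains.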
19 Slack-constrained Covering
- Compute delays and slacks at the primary outputs (POs) due to delay-optimal solution
- Compute corresponding congestion map
- For all nodes in reverse topological order,
  - Compute delay and track overflow due to delay-optimal and congestion-optimal matches
  - If a congestion-optimal match that meets the delay constraint exists, store it
  - Else, store delay-optimal match
  - Propagate updated slacks to inputs of match
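The traversal on this slide can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: each match carries a precomputed arrival time (assuming delay-optimal inputs) and a single overflow number, whereas the talk evaluates overflow from congestion maps. The `Match` fields, the tiny circuit, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Match:
    name: str
    inputs: tuple       # names of the nodes feeding this match
    gate_delay: float   # delay through the matched gate
    arrival: float      # output arrival time with delay-optimal inputs
    overflow: float     # stand-in for the track overflow caused by this match


def slack_constrained_cover(matches, reverse_topo, required_at_po):
    """Pick, per node, the lowest-overflow match that meets the required time."""
    required = dict(required_at_po)
    chosen = {}
    for node in reverse_topo:
        candidates = matches.get(node)
        if node not in required or not candidates:
            continue  # node unused by the chosen cover, or a primary input
        feasible = [m for m in candidates if m.arrival <= required[node]]
        delay_optimal = min(candidates, key=lambda m: m.arrival)
        best = min(feasible, key=lambda m: m.overflow) if feasible else delay_optimal
        chosen[node] = best
        # Propagate the remaining timing budget to the inputs of the stored match.
        for inp in best.inputs:
            budget = required[node] - best.gate_delay
            required[inp] = min(required.get(inp, float("inf")), budget)
    return chosen


matches = {
    "n1": [Match("INV", ("a",), 10.0, 10.0, 5.0),
           Match("INV_slow", ("a",), 14.0, 14.0, 2.0)],
    "out": [Match("NAND2", ("n1", "b"), 20.0, 30.0, 8.0),
            Match("NAND2_low", ("n1", "b"), 25.0, 35.0, 3.0)],
}
order = ["out", "n1", "a", "b"]  # reverse topological order

cover = slack_constrained_cover(matches, order, {"out": 36.0})
```

In this example the slack at the output is spent on the lower-overflow NAND2_low, which leaves too little budget at n1 for INV_slow, so n1 keeps its delay-optimal match.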
20 Extensions and Complexity
- Slack-constrained covering applicable for
  - Different cost functions, e.g., maximum congestion
  - Traditional objectives, e.g., area, power
- Time complexity
  - Linear in number of nodes (for a fixed library and layout area)
  - Run-times practical
- Memory complexity
  - High memory requirement due to congestion map storage for all matches
  - Asymptotically same as conventional
  - Memory efficient variants possible
  - Current implementation applicable up to 5,000 cells
  - Ideal for ECO mode hot-spot (re-)synthesis
22 Experimental Setup
- Mapping algorithm incorporated in SIS
- Capo for placement
- Timing-driven routing
- ISCAS85 benchmarks
- 100 nm process parameters from Predictive Technology Model
- Library: lib2.genlib enhanced with up to 4 strengths for each gate
- Experiments on 400 MHz Sun Ultra Sparc 60
- Comparison with conventional mapping in SIS
23 Track Overflow Comparison
24 Maximum Congestion Comparison
25 Delay Comparison
26 Row-utilization Comparison
27 Run-time Comparison
28 Summary of Experimental Results
- Track overflows: 44% better
- Delays: no adverse impact
- Maximum congestion: 25% better
- Row-utilization: no significant correlation
- Run-times: 2x worse, but still practical
29 Conclusion
- Presented a delay-optimal mapping algorithm to minimize routing congestion
- Validated effectiveness on benchmark circuits
- Algorithmic framework applicable for optimization of other cost functions and properties
- Future directions
  - Implementation of memory efficient version
  - Placement-legalization based flow
  - Application to ECO-mode logic (re-)synthesis
30 Backup
31 Analogy with Classical Matching
- Mapping for area optimization under delay constraint, Chaudhary et al., TCAD'95
- Similarities
  - The gate-area for a match at a given node represents gates only due to the nodes in the fan-in cone
  - Similarly, congestion map for a match at a given node represents wires due to the nodes in fan-in cone
  - Gate-area divided at multiple fanout points
  - Congestion-maps divided at multiple fanout points
- Differences
  - Ensures delay optimality
  - Wire-delays accounted for in the delay computation
  - Routing congestion more complex than gate-area
32 Experimental Results
(Conv.: conventional mapping in SIS; Prop.: proposed mapping; RU: row utilization)
Ckt.    Area (µm²)  RU (%)         Overflow                 Delay (ps)
                    Conv.  Prop.   Conv.  Prop.  Gain (%)   Conv.  Prop.
C1355   3439        80     81      227    134    40         789    786
C1908   3616        80     80      323    225    30         1059   1042
C2670   11707       75     77      417    167    59         1258   1240
C3540   25994       75     80      1078   294    72         1655   1632
C432    1962        80     82      66     49     25         854    842
C499    3550        80     79      262    135    48         823    821
C5315   17265       75     77      1100   289    73         1120   1114
C6288   21379       80     80      515    452    12         4771   4731
C7552   28223       75     73      1343   547    59         1341   1309
C880    3944        80     76      378    260    31         890    884
Avg.                78     78      554    255    44         1455   1439
33 Experimental Results (Continued)
(Conv.: conventional mapping in SIS; Prop.: proposed mapping; MC: maximum congestion)
Ckt.    MC                # of Cells      Run-time (s)
        Conv.  Prop.      Conv.  Prop.    Conv.  Prop.
C1355   1.70   1.30       621    592      11     12
C1908   1.70   1.40       578    571      12     13
C2670   1.65   1.20       1482   1426     24     51
C3540   2.25   1.40       3254   3105     90     279
C432    1.40   1.20       264    311      7      9
C499    1.60   1.20       595    563      11     13
C5315   2.20   1.40       2122   2131     38     121
C6288   1.70   1.40       3737   3596     88     135
C7552   1.60   1.30       3198   3080     132    213
C880    1.70   1.20       584    575      12     13
Avg.    1.74   1.29       1640   1595     42     85