Title: Fast Timing Closure by Interconnect Criticality Driven Delay Relaxation
- Love Singhal and Elaheh Bozorgzadeh
- Center for Embedded Computer Systems,
- University of California, Irvine
Outline
- Critical Interconnect
- Delay Relaxation
- CER Algorithm
- Experiments
- Conclusion
Critical Interconnect
[Figure: data flow graph with node delays between 4 and 6 under timing constraint T = 20 cycles. Edges with zero slack are critical; the remaining edges are labeled non-critical.]
Critical Interconnect
[Figure: the same data flow graph with wire pipelining, under timing constraint T = 20 cycles.]
Placement and Routing
Registers relieve congestion.
- The second design has less routing overlap and less congestion, which makes placement and routing easier and timing closure faster.
Wire Delay
- Wire delay is significant in FPGAs because of routing switches and the heterogeneous routing architecture.
- On average, 50% of the clock period is due to routing delay.
- A register added on a wire (wire pipelining) reduces the average wire delay to 60%.
Wire Pipelining Related Work
- C. Lin and H. Zhou, ICCAD 2003: retiming for wire pipelining in ASICs
- L. P. Carloni and A. L. Sangiovanni-Vincentelli, CODES+ISSS 2003: retiming for long wires in GALS designs
- J. Cong, Y. Fang, and Z. Zhang, DAC 2004: wire pipelining during architecture synthesis
- Our work deals with early planning for wire pipelining to avoid long wires.
Delay Relaxation
- Delay relaxation, or budgeting, assigns extra delay to nodes or edges while still meeting the timing constraint.
- Maximal delay budgeting: no node can receive any more delay.
- Budgeting improves area and reduces design complexity.
[Figure: timing constraint T = 20 cycles]
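A minimal sketch of maximal budgeting, assuming integer node delays on an acyclic data flow graph and a single end-to-end timing constraint T. Node slack is taken as required time minus arrival time, and each node in turn receives all of its current slack, so no node can absorb more delay afterwards. The helper names `node_slacks` and `greedy_maximal_budget` and the four-node graph are hypothetical; this one-pass greedy is neither ZSA nor the optimal maximum-budgeting algorithm, it only produces some maximal budget.

```python
from graphlib import TopologicalSorter


def node_slacks(delay, edges, T):
    """Slack of every node: required time minus arrival time."""
    preds = {v: [] for v in delay}
    succs = {v: [] for v in delay}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    order = list(TopologicalSorter(preds).static_order())
    arrival = {}
    for v in order:  # longest path from any source through the end of v
        arrival[v] = delay[v] + max((arrival[u] for u in preds[v]), default=0)
    required = {}
    for v in reversed(order):  # latest finish time of v that still meets T
        required[v] = min((required[w] - delay[w] for w in succs[v]), default=T)
    return {v: required[v] - arrival[v] for v in delay}


def greedy_maximal_budget(delay, edges, T):
    """Give each node, in turn, all of its current slack (hypothetical greedy)."""
    d = dict(delay)
    budget = {}
    for v in d:  # only values change during iteration, never keys
        s = node_slacks(d, edges, T)[v]
        if s < 0:
            raise ValueError("timing constraint T is already violated")
        budget[v] = s
        d[v] += s
    return budget, d


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
budget, relaxed = greedy_maximal_budget({"a": 4, "b": 5, "c": 4, "d": 5}, edges, 20)
print(budget)                           # {'a': 6, 'b': 0, 'c': 1, 'd': 0}
print(node_slacks(relaxed, edges, 20))  # {'a': 0, 'b': 0, 'c': 0, 'd': 0}
```

Visiting the nodes in a different order generally yields a different maximal budget.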
Drawbacks of Delay Relaxation
- After maximal budgeting, every node (other than primary inputs and outputs) has at least one incoming and one outgoing critical edge.
- Maximum budgeting can increase the number of critical edges in the graph.
[Figure: timing constraint T = 20 cycles]
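The effect on edges can be checked with a small sketch, assuming integer node delays, one constraint T and no violated paths: an edge (u, v) is taken to be critical when the longest path through it equals T. The helper `critical_edges` and the delays below are hypothetical. Before budgeting no edge of the four-node diamond is critical; after the maximal budget that adds 6 to a and 1 to c, all four edges are.

```python
from functools import lru_cache


def critical_edges(delay, edges, T):
    """Edges with zero slack: the longest path through (u, v) equals T."""
    preds = {v: [] for v in delay}
    succs = {v: [] for v in delay}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)

    @lru_cache(maxsize=None)
    def arrival(v):  # longest path from a source through the end of v
        return delay[v] + max((arrival(u) for u in preds[v]), default=0)

    @lru_cache(maxsize=None)
    def tail(v):  # longest path from the start of v to a sink
        return delay[v] + max((tail(w) for w in succs[v]), default=0)

    if any(arrival(v) > T for v in delay):
        raise ValueError("timing constraint T is violated")
    return [(u, v) for u, v in edges if arrival(u) + tail(v) == T]


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
before = {"a": 4, "b": 5, "c": 4, "d": 5}
after = {"a": 10, "b": 5, "c": 5, "d": 5}  # delays after a maximal budget
print(len(critical_edges(before, edges, 20)))  # 0
print(len(critical_edges(after, edges, 20)))   # 4
```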
Delay Budgeting Related Work
- Previous work has concentrated on maximizing the total budget.
- Heuristics such as the zero-slack algorithm (ZSA): R. Nair et al., TCAD 1989
- Polynomial-time optimal algorithm for maximum budgeting: S. Ghiasi, E. Bozorgzadeh, S. Choudhary, and M. Sarrafzadeh, ICCAD 2004
- No previous work considers the effect of budgeting on critical edges.
Problem Formulation
- Problem: given an acyclic data flow graph G = (V, E), find a maximal budgeting B on the nodes such that the number of critical edges is minimized (the MB-MCE problem).
- Solutions:
- Integer linear programming formulation: optimal but inefficient
- Heuristic: the CER algorithm
Problem Input and Output
[Figure: an input data flow graph (node delays between 4 and 6) and the output of the CER algorithm, under timing constraint T = 20 cycles.]
An Example
[Figure: four nodes A, B, C, D, each with delay 4; label "Budget Reassignment".]
An Example
[Figure: nodes A, B, C, D, each with delay 4, with reassignments δ2 = -1 and δ1 = 0.]
A budget reassignment, δ, increases the delay of a parent node by δ and decreases the delay of its child node by δ.
An Example
[Figure: nodes A, B, C, D, each with delay 4.]
Lemma: a sufficient condition for critical edge reduction in this graph is δ1 > δ2.
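The reassignment and the lemma can be checked on a small hypothetical instance. The sketch assumes a two-level graph whose parents are sources and whose children are sinks, one delta per (parent, child) cluster, T = 8, and a cross edge B -> C from the parent of the (B, D) cluster to the child of the (A, C) cluster. Under this reading the in-cluster paths keep their length, while the cross edge changes by δ2 - δ1, so δ1 > δ2 gives it positive slack. The helpers `reassign` and `critical_edges_two_level` and the graph itself are illustrative, not the exact figure on the slides.

```python
def reassign(delay, clusters):
    """Budget reassignment: each (parent, child, delta) adds delta to the
    parent's delay and subtracts delta from the child's delay."""
    d = dict(delay)
    for parent, child, delta in clusters:
        d[parent] += delta
        d[child] -= delta
    return d


def critical_edges_two_level(delay, edges, T):
    """Two-level graph only: parents are sources and children are sinks, so
    the longest path through (u, v) is simply delay[u] + delay[v]."""
    if any(delay[u] + delay[v] > T for u, v in edges):
        raise ValueError("reassignment violates the timing constraint T")
    return [(u, v) for u, v in edges if delay[u] + delay[v] == T]


# Hypothetical critical bipartite graph: every node delay is 4, T = 8.
delay = {"A": 4, "B": 4, "C": 4, "D": 4}
edges = [("A", "C"), ("B", "D"), ("B", "C")]  # B -> C crosses the clusters
print(critical_edges_two_level(delay, edges, 8))  # all three edges

# delta1 = 0 on cluster (A, C), delta2 = -1 on cluster (B, D): delta1 > delta2
new = reassign(delay, [("A", "C", 0), ("B", "D", -1)])
print(new)                                        # {'A': 4, 'B': 3, 'C': 4, 'D': 5}
print(critical_edges_two_level(new, edges, 8))    # [('A', 'C'), ('B', 'D')]
```

With δ1 < δ2 instead (for example δ1 = -1, δ2 = 0), the cross edge grows to 9 > T and `critical_edges_two_level` raises `ValueError`.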
Larger Critical Bipartite Graph
[Figure: critical bipartite graph on eight nodes A to H, each with delay 4.]
Larger Critical Bipartite Graph
Goal: find the maximum number of candidate critical edges between clusters.
Cluster Nodes: Our Algorithm
Pick the node with minimum degree as the seed.
[Figure: nodes A to H, each with delay 4, grouped into clusters that become composite nodes joined by composite edges.]
Next task: satisfy δ1 > δ2 on the composite edges.
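One possible reading of the clustering step, sketched under explicit assumptions: each cluster pairs the current minimum-degree seed with one unclustered neighbour (the lowest-degree one, ties broken by name), and every critical edge that crosses two clusters becomes a composite edge directed from the parent's cluster to the child's cluster. The pairing rule, the edge orientation, the helper names `cluster_pairs` and `composite_edges`, and the example graph are all hypothetical; the slides only state that the minimum-degree node is the seed.

```python
def cluster_pairs(edges):
    """Hypothetical min-degree seeding: pair the unclustered node with the
    fewest unclustered neighbours with its lowest-degree free neighbour."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    free = set(nbrs)
    cluster, next_id = {}, 0

    def degree(x):  # number of neighbours that are still unclustered
        return len(nbrs[x] & free)

    while True:
        seeds = [x for x in free if degree(x) > 0]
        if not seeds:
            break
        seed = min(seeds, key=lambda x: (degree(x), x))
        mate = min(nbrs[seed] & free, key=lambda x: (degree(x), x))
        cluster[seed] = cluster[mate] = next_id
        next_id += 1
        free -= {seed, mate}
    for x in sorted(free):  # leftover nodes become singleton clusters
        cluster[x] = next_id
        next_id += 1
    return cluster


def composite_edges(edges, cluster):
    """One composite edge per pair of clusters joined by a critical edge,
    directed from the parent's cluster to the child's cluster (assumed)."""
    return sorted({(cluster[u], cluster[v]) for u, v in edges if cluster[u] != cluster[v]})


edges = [("A", "C"), ("B", "D"), ("B", "C")]  # critical edges, parent -> child
cluster = cluster_pairs(edges)
print(cluster)                          # {'A': 0, 'C': 0, 'B': 1, 'D': 1}
print(composite_edges(edges, cluster))  # [(1, 0)]
```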
Assign Deltas to Composite Graph
[Figure: composite graph on composite nodes P, Q, R, S.]
δQ < δP
δP > δR > δS
Delta Assignment Problem
- Given a directed acyclic graph, assign deltas to the nodes such that the number of edges with δ1 > δ2 is maximized, where δ1 and δ2 are the deltas of the two nodes joined by a composite edge.
- The problem is NP-hard.
- Solution: reverse breadth-first search
- Find an output node.
- Assign the maximum delta to it.
- Remove it from the graph.
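The reverse breadth-first search can be sketched as repeated peeling of output nodes: all current outputs form one layer, layer k receives max_delta - k, and removing a layer exposes the next outputs. Two assumptions are made: an edge (u, v) counts as satisfied when delta[v] > delta[u], and deltas are clamped at a hypothetical lower bound `min_delta` standing in for limited budgets. The composite edges Q -> P, R -> P and S -> R are likewise hypothetical; with them the sketch reproduces the deltas listed for P, Q, R and S on the slides.

```python
from collections import defaultdict


def reverse_bfs_deltas(nodes, edges, max_delta, min_delta):
    """Peel output nodes layer by layer: layer 0 (current outputs) gets
    max_delta, each later layer one less, never below min_delta."""
    preds = defaultdict(list)
    out_left = {n: 0 for n in nodes}
    for u, v in edges:
        preds[v].append(u)
        out_left[u] += 1
    layer = [n for n in nodes if out_left[n] == 0]
    delta, level = {}, 0
    while layer:
        next_layer = []
        for n in layer:
            delta[n] = max(max_delta - level, min_delta)
            for p in preds[n]:  # removing n may turn its parents into outputs
                out_left[p] -= 1
                if out_left[p] == 0:
                    next_layer.append(p)
        layer, level = next_layer, level + 1
    return delta


def satisfied(edges, delta):
    """Composite edges (u, v) with delta[v] > delta[u] (assumed orientation)."""
    return [(u, v) for u, v in edges if delta[v] > delta[u]]


edges = [("Q", "P"), ("R", "P"), ("S", "R")]  # hypothetical composite graph
delta = reverse_bfs_deltas(["P", "Q", "R", "S"], edges, max_delta=1, min_delta=-1)
print(delta)                         # {'P': 1, 'Q': 0, 'R': 0, 'S': -1}
print(len(satisfied(edges, delta)))  # 3
```

Once the clamp is reached, deeper layers share `min_delta`, and edges between those layers tie and are not counted.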
Assign Deltas to Composite Graph
[Figure: the eight-node graph (A to H, each with delay 4) and its composite graph on P, Q, R, S.]
δQ < δP
δP > δR > δS
Assign Deltas to Composite Graph
[Figure: the eight-node graph after delta assignment, with node delays now between 3 and 5.]
δQ = 0, δP = 1, δR = 0, δS = -1
Delta assignment reduces the number of critical edges by three.
Reducing Critical Edges
Directed Acyclic Graph
- Parents in a critical bipartite graph have the same arrival time.
- Use Dijkstra's algorithm to pick the nodes with the minimum arrival time.
- Make a critical bipartite graph of the chosen nodes and process it.
- Preserve existing non-critical edges.
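The selection step can be sketched as a priority-queue sweep in the spirit of the Dijkstra-style search: parents are popped in increasing arrival time, and all parents sharing one arrival time, together with their critical out-edges, form one critical bipartite graph. The sketch takes arrival times and critical edges as precomputed inputs and does not update them after each graph is processed, a simplification of the described flow; the helper name `critical_bipartite_graphs` and the example values are hypothetical. Non-critical edges are never passed in, so they are left untouched.

```python
import heapq
from collections import defaultdict


def critical_bipartite_graphs(arrival, critical_edges):
    """Group critical edges by their parent's arrival time, earliest first."""
    children = defaultdict(list)
    for u, v in critical_edges:
        children[u].append(v)
    heap = [(arrival[u], u) for u in children]  # only parents of critical edges
    heapq.heapify(heap)
    graphs = []
    while heap:
        t = heap[0][0]  # current minimum arrival time
        parents = []
        while heap and heap[0][0] == t:  # every parent with that arrival time
            parents.append(heapq.heappop(heap)[1])
        graphs.append([(u, v) for u in parents for v in children[u]])
    return graphs


arrival = {"a": 10, "b": 15, "c": 15, "d": 20}
critical = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
for g in critical_bipartite_graphs(arrival, critical):
    print(g)
# [('a', 'b'), ('a', 'c')]
# [('b', 'd'), ('c', 'd')]
```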
Critical Edge Reduction (CER) Algorithm
- Find critical bipartite graphs.
- For each critical bipartite graph:
- Create composite graphs.
- Assign deltas to reduce critical edges in the composite graphs.
- Update budgets on nodes.
Complexity of the CER algorithm: O(V E)
Experiments: Design Flow
Benchmarks
Results: Design Flow
Critical Edge Minimization Experiment Design Flow:
- Input: DFG
- Budgeting: No Budgeting, Maximum Budgeting, Our Algorithm (CER), or CPLEX ILP Solution
- Pipelining Module: resource and wire pipelining
- DFG to Verilog Generator
- Synplicity Synthesis (with retiming)
- Xilinx Place and Route (with highest PAR effort)
Critical Edges and Total Budget
[Table: average number of critical edges and total delay budget]
- The critical-edge reduction achieved by the CER algorithm is close to that of the ILP.
- The loss in total budget is also comparable.
Results: Design Flow
Place and Route Runtime
- Designs using the CER algorithm achieve timing closure faster than all other cases.
- No delay relaxation (no budgeting) leads to more congestion.
Clock Period
- The no-budgeting technique does not give good timing and requires higher PAR time.
- The clock periods of the ILP and the CER algorithm are comparable.
- Our algorithm achieves the lowest average clock period.
- This clock period is reached 2.8 times faster than with maximum budgeting.
Conclusion
- Interconnect criticality can play an important role in design complexity and in the final place-and-route stage.
- We give a fast, efficient algorithm to reduce critical edges in an acyclic data flow graph.
- Compared to existing techniques, our algorithm enables incremental delay reassignment.
- For future work, the algorithm can be adapted to be more circuit-aware.
Thank You