Title: Leakage Power Optimization With Dual-Vth Library In High-Level Synthesis
1Leakage Power Optimization With Dual-Vth Library
In High-Level Synthesis
Hai Zhou, haizhou_at_ece.northwestern.edu, Northwestern University, Evanston, IL, USA (Presenter)
Prith Banerjee, prith_at_uic.edu, University of Illinois at Chicago, Chicago, IL, USA
Xiaoyong Tang, xtang_at_magma-da.com, Magma Design Automation, Inc., Santa Clara, CA, USA
2Outline
- Introduction
- Related Work
- Problem Formulation
- MWIS-Based Algorithm for Leakage Power Optimization
- Experimental Results
- Conclusions and Future Work
3Introduction
- Low Power Design
- Portable systems
- Thermal considerations
- Reliability issues
- Environmental concerns
- Leakage Power Consumption
- Becomes dominant in digital designs below 90 nm
4Sources of Static Power Consumption
- Psub-threshold: due to sub-threshold leakage currents
- Pgate: due to gate oxide tunneling, hot carrier injection into gates, and gate-induced drain leakage currents
- Others: Ppn (PN junction reverse bias currents), Ppunch-thru (punch-through effects)
5Basic Facts of Leakage Currents
- Leakage current for an n-type MOSFET transistor with Vgs = 0
- Leakage current decreases exponentially as the threshold voltage increases
- For gates, leakage current also depends on the input state (stack effect)
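The exponential dependence on threshold voltage can be illustrated with a short sketch. It assumes the common textbook sub-threshold model, in which leakage at Vgs = 0 is proportional to exp(-Vth / (n * kT/q)); this model, the slope factor `n`, and the temperature are assumptions for illustration and are not taken from the slides.

```python
import math


def subthreshold_leakage_ratio(delta_vth, n=1.5, temperature_k=300.0):
    """Ratio I_leak(high-Vth) / I_leak(low-Vth) for a threshold increase delta_vth in volts.

    Assumes I_sub is proportional to exp(-Vth / (n * kT/q)) at Vgs = 0.
    The slope factor n and the temperature are illustrative values.
    """
    k_over_q = 8.617333e-5  # Boltzmann constant over electron charge, in V/K
    thermal_voltage = k_over_q * temperature_k  # roughly 26 mV at 300 K
    return math.exp(-delta_vth / (n * thermal_voltage))


ratio = subthreshold_leakage_ratio(0.1)
print(f"Raising Vth by 100 mV scales sub-threshold leakage by a factor of {ratio:.3f}")
```

Under these assumed parameters, a 100 mV increase in Vth cuts sub-threshold leakage by more than an order of magnitude, which is why high-Vth cells are attractive wherever timing allows.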
6Related Work
- Leakage Power Estimation
- K. Roy et al. (IEEE03), R. X. Gu et al. (J. of Solid-State Circuits96), Khouri et al. (TVLSI02), Bobba et al. (1999)
- Device or Gate Level Estimation
- Leakage Power Optimization
- Srivastava et al. (ASPDAC03), Khouri et al. (TVLSI02), Ye et al. (Symp. on VLSI Cir. 98), Abdollahi et al. (TVLSI04)
- Algorithms based on MWIS
- Chen et al. (DAC96)
7Dual Threshold Voltage Library in High-Level Synthesis
- High-level synthesis has the biggest impact on the whole system design
- Number and type of resources
- Operation execution sequence
- Early estimation of performance, area and power
- A high-Vth device has lower leakage power consumption but slower speed than a low-Vth device
- Replace module instances on non-critical paths with their high-Vth counterparts to minimize power
8Problem Definition
- Given a synthesized data flow graph, timing constraints, and a dual-Vth library, replace some module instances with their corresponding high-Vth implementations such that the data dependency and timing constraints are satisfied and the total leakage power consumption is minimized.
- Challenges
- Slack dependency
- Dual-Vth library (incomplete library)
- Resource sharing
9Graph Representations Example
[Figure: example graph over operation nodes A-I, their binding to module instances fa_1, sa_1, sa_2, fs_1, ss_1, ss_2, ss_3, and the Module Instance Usage Graph (MIUG)]
- For illustration, use the following assumptions
- For low-Vth implementations, the delay of fa_1 and fs_1 is 1 cycle; the others take 2 cycles
- Their high-Vth designs increase the delays by 1 cycle
- Timing constraint is 8 cycles
10Graph Representation
[Figure: Composite Constraint Graph (CCG) and Slack Graph (SG) over nodes A-I; each slack graph node is annotated with its delay and its ASAP/ALAP/Slack triplet]
- CCG contains data dependency and resource constraints in one graph
- SG represents the delay for each node, as well as the time triplet ASAP/ALAP/Slack
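The ASAP/ALAP/Slack triplet can be computed with one forward and one backward pass over the constraint graph. The sketch below is illustrative only: the function name `schedule_slack`, the five-node graph, and its edges are assumptions (the edges of the slide's graph are not recoverable from the text), and start cycles are counted from 1 with an 8-cycle deadline as in the example assumptions.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def schedule_slack(delay, edges, deadline):
    """Return {op: (asap, alap, slack)} using 1-based start cycles."""
    preds = {v: set() for v in delay}
    succs = {v: set() for v in delay}
    for u, v in edges:
        preds[v].add(u)
        succs[u].add(v)
    order = list(TopologicalSorter(preds).static_order())

    asap = {}
    for v in order:  # forward pass: start right after the latest predecessor finishes
        asap[v] = max((asap[u] + delay[u] for u in preds[v]), default=1)

    alap = {}
    for v in reversed(order):  # backward pass: finish before the earliest successor starts
        latest_finish = min((alap[s] - 1 for s in succs[v]), default=deadline)
        alap[v] = latest_finish - delay[v] + 1

    return {v: (asap[v], alap[v], alap[v] - asap[v]) for v in delay}


# Hypothetical graph: node names and delays echo the slides, the edges are assumed.
delay = {"A": 2, "B": 1, "D": 2, "E": 1, "G": 2}
edges = [("A", "E"), ("B", "D"), ("E", "G"), ("D", "G")]
triplets = schedule_slack(delay, edges, deadline=8)
for op in sorted(triplets):
    print(op, "/".join(str(x) for x in triplets[op]))
```

A slack of 3 on a node means its start can slip by up to three cycles, provided no other node on the same path consumes that slack first.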
11Estimation of Leakage Energy Reduction by Using
Dual-Vth Library
- MIUG(m): MIUG of module instance m
- LPWTv: leakage power consumption for the input states of node v
- tidle: idle time for the module instance
- a: empirical coefficient for the effective idle portion during computation
- D: delay for the computation
12Greedy Approach for Replacement
[Figure: DFG with Resources and Leakage Power Reduction Labeled; each operation node shows its module instance and the leakage power reduction of replacing it, with fa_1 carrying the largest value (34)]
- The greedy approach first replaces the instance with the biggest leakage power reduction: fa_1
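A minimal sketch of such a greedy pass follows. It assumes a hypothetical three-operation graph, a one-cycle delay penalty per high-Vth replacement, and made-up names (`greedy_replace`, `binding`, `inst_delay`, `gain`); the gain values only echo the magnitudes on the slide.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def latency(op_delay, edges):
    """Longest-path length in cycles through the operation DAG."""
    preds = {v: set() for v in op_delay}
    for u, v in edges:
        preds[v].add(u)
    finish = {}
    for v in TopologicalSorter(preds).static_order():
        finish[v] = max((finish[u] for u in preds[v]), default=0) + op_delay[v]
    return max(finish.values())


def greedy_replace(binding, inst_delay, gain, edges, deadline, penalty=1):
    """Repeatedly replace the feasible instance with the largest leakage reduction."""
    delay = dict(inst_delay)
    candidates = set(inst_delay)
    replaced = []
    while candidates:
        chosen = None
        for m in sorted(candidates, key=lambda m: (-gain[m], m)):
            trial = dict(delay)
            trial[m] += penalty  # tentative high-Vth replacement of instance m
            if latency({op: trial[binding[op]] for op in binding}, edges) <= deadline:
                chosen = m
                break
        if chosen is None:
            break  # no remaining replacement meets the timing constraint
        delay[chosen] += penalty
        candidates.remove(chosen)
        replaced.append(chosen)
    return replaced


# Hypothetical example: P feeds Q and R; every path has one cycle of slack.
binding = {"P": "fa", "Q": "s1", "R": "s2"}
inst_delay = {"fa": 2, "s1": 2, "s2": 2}
gain = {"fa": 34, "s1": 26, "s2": 26}
edges = [("P", "Q"), ("P", "R")]
print(greedy_replace(binding, inst_delay, gain, edges, deadline=5))
```

In this example the greedy choice of `fa` uses up the slack on both paths, so the total reduction is 34, whereas replacing `s1` and `s2` together would give 52. This is the kind of loss that simultaneous selection is meant to avoid.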
13Maximum Weight Independent Set Problem
- Our goal
- Simultaneous replacements with maximum leakage power reduction
- Ensure that the slacks of the instances in the replacement set are independent of each other
- Maximum Weight Independent Set (MWIS) Problem
- Definition: if $G = (V, E)$ is an undirected graph and $w: V \to \mathbb{R}$ is a weight function defined on the node set, find the independent set $S$ that maximizes $w(S) = \sum_{s \in S} w(s)$
- NP-complete problem for general graphs
- There are polynomial-time algorithms for comparability graphs (i.e. transitively orientable graphs or partially orderable graphs)
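To make the definition concrete, here is a brute-force sketch that enumerates all subsets of a small graph. It is exponential and only meant for illustration; it is not the polynomial-time comparability-graph algorithm mentioned above, and the function name, node names, and weights are assumptions.

```python
from itertools import combinations


def max_weight_independent_set(weights, edges):
    """Return (best_set, best_weight) by exhaustive search; assumes positive weights."""
    nodes = sorted(weights)
    adjacent = {frozenset(e) for e in edges}
    best, best_weight = (), 0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            # Skip subsets that contain both endpoints of some edge.
            if any(frozenset(pair) in adjacent for pair in combinations(subset, 2)):
                continue
            total = sum(weights[v] for v in subset)
            if total > best_weight:
                best, best_weight = subset, total
    return set(best), best_weight


# Hypothetical conflict graph: fa conflicts with both s1 and s2.
weights = {"fa": 34, "s1": 26, "s2": 26}
conflicts = [("fa", "s1"), ("fa", "s2")]
print(max_weight_independent_set(weights, conflicts))
```

Here the heaviest single node is `fa`, yet the maximum weight independent set is `{s1, s2}` because the two lighter nodes do not conflict with each other.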
14Heuristic Approach Based On Simultaneous
Replacements
- Basic Ideas
- Analyze the slack distributions
- Estimate the reductions of leakage power for individual replacements of the module instances
- Analyze the correlations between slack changes and module replacements
- Perform multiple replacements with maximum leakage reduction while maintaining the validity of the synthesis result
- Difficulties
- Slack dependence analysis
- Selection of module instance set to be replaced
15Slack Sensitive Graph and Its Transitive Closure
Graph
[Figure: Slack Sensitive Graph and Slack Sensitive Transitive Closure Graph over nodes A-I, annotated with delays and ASAP/ALAP/Slack triplets]
- Slack Sensitive Edge (u, v): ASAP critical or ALAP critical (Chen et al., DAC96)
- Slack Insensitive Set, e.g. E, F, I, A, B, D
16MWIS Based Heuristic Algorithm
- Step 1: Construct module instance usage graphs (MIUGs)
- Step 2: Construct composite constraint graph (CCG)
- Step 3: Construct a general transitive closure graph (TG) from CCG
- Step 4: Construct a module instance sensitive graph (MISG) from TG
- Step 5: Recognize and find a transitive orientation for the module instance sensitive graph (MISG)
- Step 6: For each module instance, perform a tentative replacement, build the slack graph (SG) from CCG, and update the safety of the replacement. If there is no safe replacement, return.
17MWIS Based Heuristic Algorithm
- Step 7: For each safe node U in the MISG, calculate and assign a leakage power reduction weight.
- Step 8: Find the maximum weight independent set of MISG if MISG is a transitive graph; otherwise, find a near-maximum independent set of MISG using a greedy approach.
- Step 9: Replace the module instances in the set from Step 8 with their high-Vth designs.
- Step 10: Update the delay of each operation node in DFG
- Step 11: Go to Step 6
Time Complexity: $O(MV^3)$
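The iterative loop of Steps 6 to 11 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: two instances are treated as slack-dependent whenever any of their operations share a path (a conservative stand-in for the ASAP/ALAP-critical slack sensitive edges), the independent set is found by brute force rather than by a comparability-graph algorithm, and all names and the example graph are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from itertools import combinations


def latency(op_delay, preds):
    """Longest-path length in cycles through the operation DAG."""
    finish = {}
    for v in TopologicalSorter(preds).static_order():
        finish[v] = max((finish[u] for u in preds[v]), default=0) + op_delay[v]
    return max(finish.values())


def reachable_pairs(preds):
    """Transitive closure: all (u, v) such that a path u -> v exists."""
    ancestors = {}
    for v in TopologicalSorter(preds).static_order():
        ancestors[v] = set(preds[v])
        for u in preds[v]:
            ancestors[v] |= ancestors[u]
    return {(u, v) for v, anc in ancestors.items() for u in anc}


def mwis_rebind(binding, inst_delay, gain, edges, deadline, penalty=1):
    """Return the replacement rounds, each a sorted list of module instances."""
    preds = {op: set() for op in binding}
    for u, v in edges:
        preds[v].add(u)
    closure = reachable_pairs(preds)
    ops_of = {m: [op for op in binding if binding[op] == m] for m in inst_delay}

    def conflict(m1, m2):
        # Assumed dependence test: some pair of their operations lies on a common path.
        return any((a, b) in closure or (b, a) in closure
                   for a in ops_of[m1] for b in ops_of[m2])

    def is_safe(m, delay):
        # Step 6: tentative replacement of m, then check the timing constraint.
        trial = {op: delay[binding[op]] + (penalty if binding[op] == m else 0)
                 for op in binding}
        return latency(trial, preds) <= deadline

    delay, remaining, rounds = dict(inst_delay), set(inst_delay), []
    while True:
        safe = sorted(m for m in remaining if is_safe(m, delay))
        best, best_gain = (), 0
        for size in range(1, len(safe) + 1):  # Steps 7-8: brute-force MWIS
            for subset in combinations(safe, size):
                if any(conflict(a, b) for a, b in combinations(subset, 2)):
                    continue
                total = sum(gain[m] for m in subset)
                if total > best_gain:
                    best, best_gain = subset, total
        if not best:
            return rounds  # no safe replacement left
        for m in best:  # Steps 9-10: apply replacements and update delays
            delay[m] += penalty
        remaining -= set(best)
        rounds.append(sorted(best))


# Hypothetical example: P feeds Q and R; all low-Vth delays are 2 cycles.
binding = {"P": "fa", "Q": "s1", "R": "s2"}
inst_delay = {"fa": 2, "s1": 2, "s2": 2}
gain = {"fa": 34, "s1": 26, "s2": 26}
edges = [("P", "Q"), ("P", "R")]
tight = mwis_rebind(binding, inst_delay, gain, edges, deadline=5)
loose = mwis_rebind(binding, inst_delay, gain, edges, deadline=6)
print(tight, loose)
```

Because every path contains operations of at most one selected instance, a set of individually safe, pairwise independent instances can be replaced together without violating the deadline. With the looser deadline the loop runs a second round and also replaces `fa`.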
18Graph Example
[Figure: example graph over operation nodes A-I, their binding to module instances fa_1, sa_1, sa_2, fs_1, ss_1, ss_2, ss_3, and the Module Instance Usage Graph (MIUG)]
- For illustration, use the following assumptions
- For low-Vth implementations, the delay of fa_1 and fs_1 is 1 cycle; the others take 2 cycles
- Their high-Vth designs increase the delays by 1 cycle
- Timing constraint is 8 cycles
19Heuristic Algorithm Example
[Figure: General Transitive Closure Graph (TG) over operation nodes A-I with their delays, and Module Instance Sensitive Graph (MISG) over sa_1, fa_1, sa_2, fs_1, ss_3, ss_1, ss_2]
20Heuristic Algorithm Example
[Figure: Induced Transitive Graph from MISG (with weights labeled) and New Slack Graph with updated delays and ASAP/ALAP/Slack triplets]
- MWIS heuristic approach
- The input to the MWIS solver is the weighted transitive graph
- fs_1, sa_2, and ss_3 are replaced by the solver in the first round
21Heuristic Algorithm Example
[Figure: Induced Transitive Graph from MISG (with weights labeled) and New Slack Graph for the second round]
- MWIS heuristic approach
- Second round: ss_1, ss_2
22Heuristic Algorithm Example
[Figure: Induced Transitive Graph from MISG (with weights labeled) and New Slack Graph for the third round]
- MWIS heuristic approach
- Third round: sa_1
23Greedy Algorithm Example
[Figure: Induced Transitive Graph from MISG (with weights labeled) and New Slack Graph after the greedy replacements]
- Greedy approach, replacing one instance per iteration: fa_1, fs_1, ss_3, sa_1, and sa_2
24Experiments
- Benchmarks
- Diffeq
- A differential equation solver
- Ellipf
- Elliptical wave filter
- FIR filter
- Band pass filter
- Laplace edge detection
- Matrix multiplication
- $O(n^3)$
- Sobel edge detection
- Basic Jacobi style algorithm (nearest neighbor)
- 0.18µm 1.8V Technology Library
25MWIS Heuristic Algorithm Optimization Results
[Figure: results chart comparing Initial Results, Greedy, LPILP, and MWIS_Heuristic]
26MWIS Heuristic Algorithm Optimization Results
[Figure: results chart comparing Initial Results, Greedy, LPILP, and MWIS_Heuristic]
27MWIS Heuristic Algorithm Optimization Results
[Figure: results chart comparing Initial Results, Greedy, LPILP, and MWIS_Heuristic]
28Conclusions
- Problem Formulation
- Leakage power minimization through dual-Vth re-binding
- Iteratively apply the Maximum Weight Independent Set algorithm to find the resources whose simultaneous replacement gives the maximum power savings
- Heuristic because of the power model, resource sharing, and the incomplete library
- Experimental Results
- Average 70.9% leakage power reduction
- Close to the ILP approach but much faster
29Future Work
- Current approach: starting with the minimal latency, iteratively reduce leakage
- Investigate an approach that starts with the minimal leakage and iteratively reduces latency
- Current approach assumes a fixed sharing and usage sequence
- Investigate how to do dual-Vth-aware sharing, or combined sharing and dual-Vth binding
30Thank you