Title: Lecture 2: Low Power RTL Synthesis
- M. Balakrishnan
- CSE Department, IIT Delhi
Low Power RTL Synthesis Techniques
- Module selection
- Retiming
- Pipelining
- Parallelism
- Bus data encoding
- FSM encoding
- Transformations for switching activity reduction
Module Selection
- Modules are used to implement functional units, small memories, etc.
- Different implementations of the same function can differ significantly in power consumption
- The word length and the number representation (coding) employed can also play a significant role
Ripple Carry Adder
Carry-signal switching propagates through all the stages and consumes power.
Source: Actel, MAPLD 2004
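As a minimal sketch of the point above, the following Python model of a ripple carry adder counts how many carry nodes change value between consecutive additions. It assumes zero gate delay, so it counts only functional carry toggles; glitches in real hardware would add to this. The names `ripple_carry_add` and `carry_toggles` are illustrative.

```python
import random


def ripple_carry_add(a: int, b: int, width: int) -> tuple[int, list[int]]:
    """Bit-level ripple carry addition: returns (sum mod 2**width, carry out of each stage)."""
    carry = 0
    total = 0
    carries = []
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        # Stage i's carry depends on stage i-1's carry: this is the ripple chain.
        carry = (ai & bi) | (carry & (ai ^ bi))
        carries.append(carry)
    return total, carries


def carry_toggles(pairs: list[tuple[int, int]], width: int) -> int:
    """Count carry-node value changes between consecutive input pairs (zero-delay model)."""
    prev = None
    toggles = 0
    for a, b in pairs:
        _, carries = ripple_carry_add(a, b, width)
        if prev is not None:
            toggles += sum(c0 != c1 for c0, c1 in zip(prev, carries))
        prev = carries
    return toggles


if __name__ == "__main__":
    random.seed(1)
    width = 16
    pairs = [(random.getrandbits(width), random.getrandbits(width)) for _ in range(1000)]
    print("carry-node toggles over 1000 random additions:", carry_toggles(pairs, width))
```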
Carry Look-Ahead Adder
Carry-signal switching propagates through far fewer stages, which not only reduces delay but can also reduce power.
Source: Actel, MAPLD 2004
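The look-ahead idea can be sketched as follows, assuming the usual generate (g = a AND b) and propagate (p = a XOR b) signals: every carry is written directly as a two-level AND-OR expression of g, p and the carry-in, so no carry waits for another. This is a behavioral illustration, not a gate netlist; note how the product terms grow with the width, which is why practical look-ahead adders are built from small groups.

```python
def lookahead_carries(g: list[int], p: list[int], c0: int) -> list[int]:
    """Carries computed directly in two-level AND-OR form:
    c[i+1] = g[i] | p[i]g[i-1] | p[i]p[i-1]g[i-2] | ... | p[i]...p[0]c0."""
    carries = [c0]
    for i in range(len(g)):
        c = 0
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):   # g[j] must propagate through bits j+1..i
                term &= p[k]
            c |= term
        term = c0
        for k in range(i + 1):              # carry-in propagating through bits 0..i
            term &= p[k]
        c |= term
        carries.append(c)
    return carries


def cla_add(a: int, b: int, width: int = 4, c0: int = 0) -> tuple[int, int]:
    """Return (sum mod 2**width, carry out) using generate/propagate signals."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # generate: both bits are 1
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate: exactly one bit is 1
    c = lookahead_carries(g, p, c0)
    s = 0
    for i in range(width):
        s |= (p[i] ^ c[i]) << i
    return s, c[width]
```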
Carry Select Adder
Carry-signal switching propagates through far fewer stages, which reduces delay, but at the cost of considerable circuit duplication (each block's sum is computed for both possible carry-in values).
Source: Actel, MAPLD 2004
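A behavioral sketch of the carry select idea, assuming fixed-size blocks modeled with plain integer addition: each block computes its sum for both carry-in values, and the real carry only drives the selection. For simplicity every block is duplicated here, including the first, which real designs usually build as a single block since its carry-in is known. The function names are illustrative.

```python
def block_add(a: int, b: int, cin: int, width: int) -> tuple[int, int]:
    """Behavioral model of one small adder block: (sum bits, carry out)."""
    s = a + b + cin
    return s & ((1 << width) - 1), s >> width


def carry_select_add(a: int, b: int, width: int = 16, block: int = 4, cin: int = 0) -> tuple[int, int]:
    """Carry select adder: each block computes both candidate sums; the real carry only selects."""
    assert width % block == 0, "width must be a multiple of the block size"
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, width, block):
        ab, bb = (a >> lo) & mask, (b >> lo) & mask
        # Duplicated hardware: both results exist before the incoming carry is known.
        s0, c0 = block_add(ab, bb, 0, block)
        s1, c1 = block_add(ab, bb, 1, block)
        # The incoming carry only drives the select line of the output multiplexers.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result, carry
```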
Brent-Kung Adder
- Brent and Kung, in their 1982 paper [6], proposed an area-efficient adder
- It is essentially a restructured carry-look-ahead adder
- For details, refer to their paper
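The restructuring is easiest to see in the standard parallel-prefix formulation of the Brent-Kung adder, sketched below in Python (this textbook formulation is an assumption here, not a reproduction of the paper's layout). The bit (g, p) pairs are merged with the operator (g, p) o (g', p') = (g | p&g', p&p') in an up-sweep tree followed by a sparse down-sweep tree, which for n bits uses 2n - 2 - log2(n) combine cells over 2 log2(n) - 1 levels; the function also returns the cell count.

```python
def brent_kung_add(a: int, b: int, width: int = 16, cin: int = 0) -> tuple[int, int, int]:
    """Returns (sum mod 2**width, carry out, number of prefix combine cells). width: power of 2."""
    assert width > 0 and width & (width - 1) == 0, "width must be a power of two"
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]      # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]    # bit propagate
    G, P = g[:], p[:]          # end up as group generate/propagate over bits i..0
    cells = 0

    def combine(i: int, j: int) -> None:
        # Merge the span ending at i with the adjacent lower span ending at j.
        nonlocal cells
        G[i] = G[i] | (P[i] & G[j])
        P[i] = P[i] & P[j]
        cells += 1

    d = 1
    while d < width:           # up-sweep: spans of length 2, 4, 8, ...
        for i in range(2 * d - 1, width, 2 * d):
            combine(i, i - d)
        d *= 2
    d = width // 4
    while d >= 1:              # down-sweep: fill in the remaining prefixes
        for i in range(3 * d - 1, width, 2 * d):
            combine(i, i - d)
        d //= 2

    carries = [cin] + [G[i] | (P[i] & cin) for i in range(width)]
    s = 0
    for i in range(width):
        s |= (p[i] ^ carries[i]) << i
    return s, carries[width], cells
```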
Adder Architectures: Area-Delay Trade-offs
- Forward Carry Look-Ahead (CLF): fastest, but also the largest
- Brent and Kung (BK): almost the same speed as CLF, but drastically smaller
- Carry Look-Ahead (CLA): relatively small and slow
- Ripple (RPL): smallest, but slowest
Brent and Kung: best area/speed trade-off
Source: Actel, MAPLD 2004
Adder Architectures: Power Comparison
Brent-Kung also has the lowest power dissipation.
Source: Actel, MAPLD 2004
Multiplier Architectures
- Considerable variation among implementations based on carry-save adders
- Wallace tree structures have good area-time performance
- Pipelining for throughput becomes important
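A word-level sketch of carry-save (Wallace-style) reduction, assuming an unsigned array multiplier: rows of partial products are compressed three at a time by 3:2 carry-save adders, which produce a sum word and a carry word without propagating carries, until two rows remain for one final carry-propagate addition. The grouping is greedy and the function names are illustrative; bit-accurate Wallace trees work column by column and also use half adders.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder on whole words: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_multiply(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Unsigned multiply: returns (product, number of carry-save levels used)."""
    # Partial-product rows of the AND array: a << i wherever bit i of b is 1.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3
        for k in range(0, full, 3):           # compress each group of three rows into two
            nxt.extend(csa(rows[k], rows[k + 1], rows[k + 2]))
        nxt.extend(rows[full:])               # leftover rows pass to the next level unchanged
        rows = nxt
        levels += 1
    return sum(rows), levels                  # final carry-propagate addition
```

For 8-bit operands the 8 rows shrink as 8, 6, 4, 3, 2, i.e. four carry-save levels before the single carry-propagate adder.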
Other Operations and Operators
- ALUs
  - Traditional method: perform all operations and use a select to pick the output; this is very inefficient in terms of switching activity
  - Better: permit switching activity only in the operator required in the current cycle
- Complex operators such as MAC
- CORDIC functions
- Look-up table vs. computation
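Alternative ALU Structures
[Figure: Left, the inputs drive all function units F1, F2, ..., Fn, and a mux controlled by Function Select picks one result. Right, a demux controlled by Function Select routes the inputs only to the selected unit, and a mux picks its result.]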
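The two ALU structures can be compared with a rough Python model that counts bit toggles at the operand inputs of the function units. It assumes that, in the demux structure, unselected units hold their previous operands (for example via gated input registers), and it ignores the power of the mux, the demux and the select lines; `operator_input_toggles` is a hypothetical helper.

```python
import random


def hamming(x: int, y: int) -> int:
    return bin(x ^ y).count("1")


def operator_input_toggles(stream: list[tuple[int, int, int]], n_ops: int, isolate: bool) -> int:
    """Total bit toggles at the operand inputs of n_ops function units.

    stream: one (selected_op, a, b) tuple per cycle.
    isolate=False: every unit sees every operand (compute all, then mux the result).
    isolate=True: only the selected unit's inputs change; the others hold their last operands.
    """
    held = [(0, 0)] * n_ops          # last operands applied to each unit
    toggles = 0
    for op, a, b in stream:
        targets = [op] if isolate else range(n_ops)
        for k in targets:
            pa, pb = held[k]
            toggles += hamming(pa, a) + hamming(pb, b)
            held[k] = (a, b)
    return toggles


if __name__ == "__main__":
    random.seed(2)
    stream = [(random.randrange(4), random.getrandbits(16), random.getrandbits(16)) for _ in range(1000)]
    print("compute-all:", operator_input_toggles(stream, 4, isolate=False))
    print("isolated:   ", operator_input_toggles(stream, 4, isolate=True))
```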
Operator Power Estimation
[Figure: flow from the application model, through parameter extraction from high-level simulation, to the extracted parameters; these are combined with operator models to produce power estimates.]
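A minimal sketch of this flow, assuming a linear macro-model whose only extracted parameter is the average number of operand bit toggles per cycle. The class `OperatorModel`, its coefficients and the sample trace are hypothetical; real operator models are characterised at lower levels and may use more parameters (for example signal probabilities or word-level statistics).

```python
from dataclasses import dataclass


@dataclass
class OperatorModel:
    """Hypothetical linear power macro-model for one operator:
    power = base + per_toggle * (operand bit toggles per cycle), in arbitrary units."""
    name: str
    base: float
    per_toggle: float

    def estimate(self, toggles_per_cycle: float) -> float:
        return self.base + self.per_toggle * toggles_per_cycle


def extract_activity(trace: list[tuple[int, int]]) -> float:
    """Parameter extraction: average operand bit toggles per cycle in a high-level
    simulation trace of (a, b) operand pairs applied to the operator."""
    if len(trace) < 2:
        return 0.0
    toggles = sum(bin(a0 ^ a1).count("1") + bin(b0 ^ b1).count("1")
                  for (a0, b0), (a1, b1) in zip(trace, trace[1:]))
    return toggles / (len(trace) - 1)


if __name__ == "__main__":
    # Illustrative trace and made-up coefficients, not characterised values.
    adder = OperatorModel("adder16", base=0.5, per_toggle=0.2)
    trace = [(3, 5), (3, 6), (7, 6), (7, 6)]
    activity = extract_activity(trace)
    print(adder.name, "activity:", activity, "estimated power:", adder.estimate(activity))
```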
Retiming
- Leiserson [1] first proposed retiming for optimizing synchronous circuits, and Monteiro [2] modified it for low power design
- The basic observation is that a well-positioned flip-flop can stop the propagation of glitches and thus eliminate unnecessary transitions
- Hence flip-flops can be positioned not only to minimize delay (classical retiming) but also to reduce transitions
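Positioning a Flip-flop and Power Consumption
[Figure: Case 1, a logic block whose output makes $E_g$ transitions (glitches included) drives the next logic block, of input capacitance $C_L$. Case 2, the same output drives a flip-flop of input capacitance $C_R$; the flip-flop output makes $E_R$ transitions and drives $C_L$.]
$P_1 = k\,E_g C_L$
$P_2 = k\,(E_g C_R + E_R C_L)$
$P_2$ can be less than $P_1$: the flip-flop blocks the glitches, so $E_R < E_g$, and $P_2 < P_1$ whenever $E_g C_R + E_R C_L < E_g C_L$.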
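The two expressions can be evaluated directly. The sketch below uses illustrative numbers only, and like the slide's expressions it ignores the flip-flop's own clock and internal power. It also shows the flip side: with no glitches to block ($E_R = E_g$) the flip-flop only adds load.

```python
def power_direct(k: float, e_g: float, c_l: float) -> float:
    """P1 = k * E_g * C_L: the glitchy output of the first block drives C_L directly."""
    return k * e_g * c_l


def power_with_ff(k: float, e_g: float, c_r: float, e_r: float, c_l: float) -> float:
    """P2 = k * (E_g * C_R + E_R * C_L): the E_g transitions only charge the flip-flop
    input C_R; C_L sees just the E_R transitions at the flip-flop output."""
    return k * (e_g * c_r + e_r * c_l)


if __name__ == "__main__":
    # Illustrative values: 3 transitions/cycle before the flip-flop (glitches included),
    # 1 transition/cycle after it, and a flip-flop input 5x smaller than the load.
    p1 = power_direct(1.0, 3.0, 10.0)
    p2 = power_with_ff(1.0, 3.0, 2.0, 1.0, 10.0)
    print(f"P1 = {p1}, P2 = {p2}, flip-flop helps: {p2 < p1}")
```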
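Retiming and Power Consumption
[Figure: a chain of three logic blocks with one flip-flop. Case 1: the flip-flop sits after the first block; the first block's output ($E_0$ transitions) drives the flip-flop ($C_R$), the flip-flop output ($E_1$) drives the second block ($C_1$), and the second block's output ($E_2$) drives the third block ($C_2$). Case 2: the flip-flop sits after the second block; $E_0$ drives the second block ($C_1$), $E_2$ drives the flip-flop ($C_R$), and the flip-flop output ($E_3$) drives the third block ($C_2$).]
$P_1 = k\,(E_0 C_R + E_1 C_1 + E_2 C_2)$
$P_2 = k\,(E_0 C_1 + E_2 C_R + E_3 C_2)$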
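Generalizing the two cases, one can score every possible flip-flop position in a chain of logic blocks by its total switched capacitance (the sum of E times C over all nets) and keep the cheapest. The sketch below assumes that every glitch arriving at a block propagates to its output and that the flip-flop output carries only functional transitions; the function names and per-net parameters are hypothetical.

```python
def switched_capacitance(func: list[float], gen: list[float], cin: list[float],
                         c_out: float, c_ff: float, ff_after: int) -> float:
    """Sum of E * C over the nets of a block chain with one flip-flop after block ff_after.

    func[i]: functional (zero-delay) transitions per cycle at the output of block i
    gen[i]:  extra glitch transitions that block i generates itself
    cin[i]:  input capacitance of block i (cin[0] is unused)
    c_out:   load on the last block's output; c_ff: flip-flop input capacitance
    Multiplying the result by k gives power, as in the slide's expressions.
    """
    total = 0.0
    glitch_in = 0.0                      # glitches arriving at the current block's input
    for i in range(len(func)):
        glitch_out = gen[i] + glitch_in  # simplifying assumption: every input glitch propagates
        e_out = func[i] + glitch_out
        if i == ff_after:
            total += e_out * c_ff                 # glitchy net drives only the flip-flop
            total += func[i] * cin[i + 1]         # flip-flop output: functional transitions only
            glitch_in = 0.0
        else:
            load = cin[i + 1] if i + 1 < len(func) else c_out
            total += e_out * load
            glitch_in = glitch_out
    return total


def best_ff_position(func, gen, cin, c_out, c_ff) -> int:
    """Try every position between two blocks and keep the one with the least switched capacitance."""
    positions = range(len(func) - 1)
    return min(positions, key=lambda j: switched_capacitance(func, gen, cin, c_out, c_ff, j))
```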
Retiming Methodology and Results
- Each candidate stage is evaluated for its potential power saving, i.e., the transitions generated there versus those that actually need to propagate
- This is done by taking the difference between the transition counts in zero-delay and timed (real-delay) simulation; the difference is the glitch activity
- Based on this computation, flip-flops are placed either to minimize power alone or to minimize a weighted combination of power and timing
- Results show a 10 to 25% reduction in transition count on a number of benchmark circuits
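The zero-delay versus timed comparison can be sketched with a tiny gate-level simulator, assuming a unit delay per gate for the timed run. Gates are hypothetical `(out, op, in1, in2)` tuples listed in topological order; the difference between the two counts is the glitch activity that a well-placed flip-flop could block.

```python
# Two-input gate functions; NOT ignores its second input.
OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
    "NOT": lambda x, y: 1 - x,
}


def settle(gates, inputs):
    """Zero-delay evaluation of the gate list for one input vector."""
    v = dict(inputs)
    for out, op, i1, i2 in gates:
        v[out] = OPS[op](v[i1], v.get(i2, 0))
    return v


def count_transitions(gates, vectors):
    """Return (zero_delay, unit_delay) toggle counts on gate outputs over a sequence of input vectors."""
    outs = [g[0] for g in gates]
    state = settle(gates, vectors[0])
    zero = unit = 0
    for vec in vectors[1:]:
        # Zero-delay: each net changes at most once per new vector.
        final = settle(gates, vec)
        zero += sum(state[o] != final[o] for o in outs)
        # Unit-delay: each gate computes from the values its inputs had one time step earlier.
        cur = {**state, **vec}
        for _ in range(len(gates) + 1):      # an acyclic circuit settles within its depth
            nxt = dict(cur)
            for out, op, i1, i2 in gates:
                nxt[out] = OPS[op](cur[i1], cur.get(i2, 0))
            unit += sum(cur[o] != nxt[o] for o in outs)
            if nxt == cur:
                break
            cur = nxt
        state = cur
    return zero, unit


if __name__ == "__main__":
    # y = a AND (NOT a) is always 0 functionally, but the inverter delay lets a glitch through.
    gates = [("n", "NOT", "a", None), ("y", "AND", "a", "n")]
    print(count_transitions(gates, [{"a": 0}, {"a": 1}, {"a": 0}]))
```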
Pipelining
- Pipelining affects power in two different ways
- As in retiming, the inserted flip-flops can cut down on glitches
- Pipelining shortens the critical path, which allows a higher frequency and throughput; alternatively, for a given throughput, the slack can be used to lower the supply voltage and thereby reduce power
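The voltage-scaling argument can be sketched numerically, assuming the simple long-channel delay model d ~ V/(V - Vt)^2, P = C V^2 f, two perfectly balanced pipeline stages and negligible register delay. The voltages and the 10% register capacitance overhead below are illustrative, not measured values.

```python
def gate_delay(v: float, vt: float) -> float:
    """Relative gate delay under the simple long-channel model d ~ V / (V - Vt)**2 (an assumption)."""
    return v / (v - vt) ** 2


def scaled_voltage(v0: float, vt: float, slowdown: float, tol: float = 1e-9) -> float:
    """Lowest supply V (vt < V <= v0) whose delay is at most slowdown * delay(v0), by bisection.
    Delay falls monotonically as V rises above Vt, so bisection is safe."""
    target = slowdown * gate_delay(v0, vt)
    lo, hi = vt + 1e-6, v0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gate_delay(mid, vt) > target:
            lo = mid        # still too slow: need a higher voltage
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    v0, vt = 1.8, 0.4                       # illustrative supply and threshold voltages
    # Two balanced stages halve the critical path, so each gate may become 2x slower
    # at the same clock frequency (register overhead ignored).
    v2 = scaled_voltage(v0, vt, slowdown=2.0)
    cap_factor = 1.1                        # hypothetical extra capacitance from pipeline registers
    print(f"v2 = {v2:.3f} V, P2/P0 = {cap_factor * (v2 / v0) ** 2:.2f}")   # P = C V^2 f, f unchanged
```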
Effect of Pipelining
- Case 1, no pipelining: frequency $f_0$, voltage $v_0$
- Case 2, pipelining for performance (logic split by flip-flops): $f_1 > f_0$, $v_1 = v_0$
- Case 3, pipelining for low power (same pipelined logic): $f_2 = f_0$, $v_2 < v_0$
Increasing Parallelism/Concurrency
- Chandrakasan [4] first showed that concurrency can be used to reduce power instead of increasing performance
- The primary idea is to reduce the operating frequency and/or the supply voltage while still meeting a given throughput
- The power consumed by the additional logic needed to distribute the computation and multiplex the results must be accounted for
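The trade-off can be expressed with P = C V^2 f for each block. The sketch below assumes n identical units, each clocked at f/n, plus distribution/multiplexing logic that switches at the original rate f; the reduced-voltage ratio is taken as an input (it depends on the device delay model), and the overhead value is hypothetical.

```python
def parallel_power_ratio(n_units: int, v_ratio: float, overhead_cap: float) -> float:
    """P_parallel / P_single, normalised to one unit of capacitance C at voltage V0 and clock f.

    v_ratio:      V_parallel / V0, the reduced supply that the slower clock permits
    overhead_cap: capacitance of the extra distribution/multiplexing logic as a fraction of C
                  (hypothetical); it is assumed to switch once per result, i.e. at f
    """
    units = n_units * 1.0 * v_ratio ** 2 * (1.0 / n_units)   # n copies of C, at V', each at f/n
    overhead = overhead_cap * v_ratio ** 2 * 1.0              # extra logic at V', at the full rate f
    return units + overhead


if __name__ == "__main__":
    for v_ratio in (1.0, 0.8, 0.6):
        print(v_ratio, round(parallel_power_ratio(2, v_ratio, overhead_cap=0.15), 3))
```

With `v_ratio = 1.0` the ratio is never below 1: duplicating hardware only saves power when the slower clock is used to lower the supply voltage.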
Effect of Parallelism
- Case 1, single FU feeding a register: frequency $f_0$, voltage $v_0$, throughput $T_0$
- Case 2, two FUs (each with a register, outputs multiplexed) for enhanced performance: $f_1 = f_0$, $v_1 = v_0$, $T_1 > T_0$
- Case 3, two FUs for reducing power: $f_2 < f_0$, $v_2 < v_0$, $T_2 = T_0$
Bus Data Encoding
- Buses are known to consume up to 30% of the power in many systems
- Bus transitions can be reduced by encoding the data sent on the bus
- The encoding is chosen so that value pairs corresponding to frequent transitions have a smaller Hamming distance
- The power consumed by the encoders and decoders must be accounted for
- Only a subset of the bits may be encoded
- Applicable to both data and address buses; for address buses, the address patterns can be encoded
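The slides do not name a specific code; one well-known example is bus-invert coding, sketched below as an illustration. If sending a word would toggle more than half of the N data lines, its complement is sent instead and one extra invert line is raised, so at most N/2 data lines toggle per transfer. The extra line and the encoder/decoder logic are exactly the overheads that must be accounted for.

```python
import random


def hamming(x: int, y: int) -> int:
    return bin(x ^ y).count("1")


def bus_invert_encode(words: list[int], width: int) -> list[tuple[int, int]]:
    """Bus-invert coding: if a word would flip more than half of the lines relative to the
    current bus value, send its complement and raise the invert line."""
    mask = (1 << width) - 1
    bus = 0                                  # bus lines assumed to start at all zeros
    out = []
    for w in words:
        if hamming(bus, w) > width // 2:
            bus, inv = ~w & mask, 1
        else:
            bus, inv = w, 0
        out.append((bus, inv))
    return out


def bus_invert_decode(encoded: list[tuple[int, int]], width: int) -> list[int]:
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]


def line_toggles(values: list[int]) -> int:
    """Total line transitions over a sequence of bus values, starting from all zeros."""
    prev, total = 0, 0
    for v in values:
        total += hamming(prev, v)
        prev = v
    return total


if __name__ == "__main__":
    random.seed(4)
    width = 8
    data = [random.getrandbits(width) for _ in range(1000)]
    enc = bus_invert_encode(data, width)
    assert bus_invert_decode(enc, width) == data
    plain = line_toggles(data)
    coded = line_toggles([b for b, _ in enc]) + line_toggles([i for _, i in enc])
    print("unencoded toggles:", plain, " bus-invert toggles (incl. invert line):", coded)
```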
Bus Data Encoding
[Figure: logic blocks communicating over a bus, with an encoder/decoder at each end of the bus.]
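For address buses, one common pattern-based choice (an illustrative assumption here, not necessarily the scheme intended in the lecture) is Gray coding of sequential addresses: consecutive Gray codes differ in exactly one bit, whereas binary increments toggle nearly two lines on average. The helper names below are illustrative.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code: consecutive integers map to codes differing in one bit."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Decoder at the receiving end: XOR of all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def sequence_toggles(values: list[int]) -> int:
    """Bus line transitions between consecutive values."""
    return sum(bin(x ^ y).count("1") for x, y in zip(values, values[1:]))


if __name__ == "__main__":
    addrs = list(range(1024))                     # a purely sequential address stream
    print("binary:", sequence_toggles(addrs))
    print("Gray:  ", sequence_toggles([to_gray(a) for a in addrs]))
```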
FSM State Encoding
- FSM state-encoding techniques were originally developed for logic synthesis
- They are primarily based on reducing the Hamming distance between the codes assigned to neighboring states
- The assumption is that a lower Hamming distance implies less logic for implementing the transitions
- This criterion is also combined with input and output considerations
- The approach to low power state assignment is similar, except that neighbors are defined by transition frequency rather than merely by connectivity in the state machine
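A minimal sketch of low power state assignment: given transition frequencies, the cost of an encoding is the expected number of state-register bit toggles, the sum of freq(s -> t) times the Hamming distance between the two codes. The FSM and its frequencies are made up for illustration; exhaustive search is only feasible for tiny FSMs, and this cost ignores the logic needed to implement the transitions.

```python
from itertools import permutations


def switching_cost(codes: dict[str, int], freq: dict[tuple[str, str], float]) -> float:
    """Expected state-register toggles: sum over edges of freq(s -> t) * Hamming(code[s], code[t])."""
    return sum(f * bin(codes[s] ^ codes[t]).count("1") for (s, t), f in freq.items())


def low_power_assignment(states: list[str], freq: dict[tuple[str, str], float], n_bits: int):
    """Exhaustive search over all distinct code assignments; returns (cost, codes)."""
    best_cost, best_codes = None, None
    for perm in permutations(range(2 ** n_bits), len(states)):
        codes = dict(zip(states, perm))
        cost = switching_cost(codes, freq)
        if best_cost is None or cost < best_cost:
            best_cost, best_codes = cost, codes
    return best_cost, best_codes


if __name__ == "__main__":
    # Hypothetical 4-state FSM: A <-> B is the hot loop, the other edges are rarely taken.
    freq = {("A", "B"): 50, ("B", "A"): 50, ("B", "C"): 1, ("C", "D"): 1, ("D", "A"): 1}
    print(low_power_assignment(["A", "B", "C", "D"], freq, n_bits=2))
```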
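FSM State Encoding
[Figure: the same five-state FSM (s0 to s4) shown with two encodings; its edges are annotated with transition frequencies 10, 80, 5 and 30.]
- Encoding for logic minimization: s0 = 000, s1 = 001, s2 = 010, s3 = 100, s4 = 101
- Encoding for power minimization: s0 = 000, s1 = 100, s2 = 101, s3 = 010, s4 = 001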
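Other Transformations
[Figure: two equivalent two-level multiplexer trees producing y. Case 1: a and b feed the first mux, whose output feeds the second mux together with c. Case 2: a and c feed the first mux, whose output feeds the second mux together with b.]
The power consumption of Case 2 is significantly lower than that of Case 1 if activity(b) >> activity(a) and activity(b) >> activity(c), because the highly active signal b then passes through only one multiplexer instead of two.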
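This can be checked with a small cycle-based model, assuming Case 1 is t = mux(a, b), y = mux(t, c) and Case 2 is t = mux(a, c), y = mux(t, b), and that a select line keeps its previous value in cycles where it is a don't-care. Both trees produce the same y; the model counts toggles on the internal node t and on y, ignoring select-line activity. The function names are illustrative.

```python
import random


def simulate(order: str, choice: list[str], a: list[int], b: list[int], c: list[int]):
    """Cycle-by-cycle values of the internal node t and the output y of a two-level mux tree.

    order="case1": t = mux(s1; a, b), y = mux(s2; t, c)
    order="case2": t = mux(s1; a, c), y = mux(s2; t, b)
    choice[k] names the input that must appear on y in cycle k; a select line whose value
    does not matter in that cycle keeps its previous value (an assumption)."""
    s1 = 0
    t_trace, y_trace = [], []
    for k, ch in enumerate(choice):
        if order == "case1":
            if ch == "c":
                s2 = 1
            else:
                s1, s2 = (1 if ch == "b" else 0), 0
            t = b[k] if s1 else a[k]
            y = c[k] if s2 else t
        else:
            if ch == "b":
                s2 = 1
            else:
                s1, s2 = (1 if ch == "c" else 0), 0
            t = c[k] if s1 else a[k]
            y = b[k] if s2 else t
        t_trace.append(t)
        y_trace.append(y)
    return t_trace, y_trace


def toggles(trace: list[int]) -> int:
    return sum(x != y for x, y in zip(trace, trace[1:]))


if __name__ == "__main__":
    random.seed(6)
    n = 1000
    a = [int(random.random() < 0.05) for _ in range(n)]   # low activity
    c = [int(random.random() < 0.05) for _ in range(n)]   # low activity
    b = [k % 2 for k in range(n)]                          # toggles every cycle
    choice = [random.choice("abc") for _ in range(n)]
    for order in ("case1", "case2"):
        t, y = simulate(order, choice, a, b, c)
        print(order, "t toggles:", toggles(t), "y toggles:", toggles(y))
```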
References
[1] Leiserson et al., "Optimizing Synchronous Circuits by Retiming," Proc. of 3rd Caltech Conference on VLSI, March 1983, pp. 23-36.
[2] J. Monteiro et al., "Retiming Sequential Circuits for Low Power," ICCAD, Nov. 1993, pp. 398-402.
[3] Devadas and Malik, "A Survey of Optimization Techniques Targeting Low Power VLSI Circuits," DAC 32, 1995, pp. 242-247.
[4] A. P. Chandrakasan, "Optimizing Power Using Transformations," IEEE TCAD, vol. 14, no. 1, Jan. 1995, pp. 12-31.
[5] Koegst et al., "State Assignment for FSM Low Power Design," EDAC 1995, pp. 28-33.
[6] Brent and Kung, "A Regular Layout for Parallel Adders," IEEE Trans. on Computers, vol. C-31, no. 3, 1982, pp. 260-264.