Lecture 2: Low Power RTL Synthesis

1
Lecture 2: Low Power RTL Synthesis
  • M. Balakrishnan
  • CSE Department, IIT Delhi

2
Low Power RTL Synthesis Techniques
  • Module selection
  • Retiming
  • Pipelining
  • Parallelism
  • Bus data encoding
  • FSM encoding
  • Transformations for Switching activity reduction

3
Module Selection
  • Modules are used for implementing functional
    units, small memory modules etc.
  • Significant difference in power consumption of
    different implementations
  • Word-length as well as number coding techniques
    employed can play a significant role

4
Ripple Carry Adder
Carry-signal switching propagates through all
the stages and consumes power
ACTEL MAPLD2004
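
The carry-chain behaviour described on this slide can be illustrated with a small bit-level model. This is a minimal sketch, not a power estimator: it only counts how many stage carry-outs change their settled value between two input vectors, and the function names `ripple_carries` and `carry_toggles` are made up for this example.

```python
def ripple_carries(a: int, b: int, width: int) -> list[int]:
    """Return the settled carry-out bit of each full-adder stage, LSB first."""
    carries = []
    carry = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        # Full-adder carry: generate (x AND y) or propagate (x XOR y) the incoming carry.
        carry = (x & y) | (carry & (x ^ y))
        carries.append(carry)
    return carries


def carry_toggles(prev: tuple[int, int], curr: tuple[int, int], width: int) -> int:
    """Count the stages whose carry-out differs between two operand pairs."""
    before = ripple_carries(prev[0], prev[1], width)
    after = ripple_carries(curr[0], curr[1], width)
    return sum(p != q for p, q in zip(before, after))


# Changing one operand bit of 0b1111 + 0 flips the carry in every stage.
print(carry_toggles((0b1111, 0), (0b1111, 1), 4))  # 4
```

A single-bit input change can therefore switch the carry node of every stage, which is the worst case the slide refers to.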
5
Carry Look Ahead Adder
Carry-signal switching propagates through far
fewer stages, which not only reduces
delay but can also reduce power consumption
6
Carry Select Adder
Carry-signal switching propagates through far
fewer stages and thus reduces delay, but at the cost of
considerable circuit duplication
7
Brent Kung Adder
  • Brent and Kung, in their 1982 paper [6],
    proposed an area-efficient adder
  • It is basically a restructured carry-look-ahead
    adder
  • For details, refer to their paper

8
Adder Architectures: Area-Delay Trade-offs
  • Forward Carry Look Ahead (CLF): fastest but also
    largest
  • Brent and Kung (BK): almost the same speed as
    CLF but drastically smaller
  • Carry Look Ahead (CLA): relatively small and slow
  • Ripple (RPL): smallest but slowest
  • Brent and Kung: best area/speed trade-off
9
Adder Architectures: Power Comparison
Brent-Kung also has the lowest power dissipation
10
Multiplier Architectures
  • Considerable variation with carry save adders
  • Wallace tree structures have good area-time
    performance
  • Pipelining for throughput becomes important

11
Other Operations and Operators
  • ALUs
  • Traditional method: perform all operations and
    use a select for the output; very inefficient in
    terms of switching activity
  • Alternative: permit switching activity only in the operator
    required in this cycle
  • Complex operators like MAC
  • CORDIC functions
  • Look-up table vs. computation
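
The difference between the two ALU styles can be sketched by counting input-bit toggles. This is a rough proxy under stated assumptions, not a power model: every functional unit is assumed to see a single operand word, switching inside a unit is taken to scale with the Hamming distance between consecutive operands at its inputs, and the name `input_toggles` and the operation trace are hypothetical.

```python
def hamming(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


def input_toggles(ops: list[tuple[int, int]], num_units: int, gated: bool) -> int:
    """Sum input-bit toggles over all functional units.

    ops is a list of (selected_unit, operand) pairs, one per cycle.
    With gated=False every unit sees every operand (compute all, then mux).
    With gated=True only the selected unit sees the new operand; the
    others keep their previous inputs (demux in front of the units).
    """
    seen = [0] * num_units  # last operand presented to each unit
    total = 0
    for unit, operand in ops:
        for u in range(num_units):
            if gated and u != unit:
                continue  # unselected unit inputs are held steady
            total += hamming(seen[u], operand)
            seen[u] = operand
    return total


# Hypothetical trace: three units, 4-bit operands.
ops = [(0, 0b1111), (1, 0b0000), (0, 0b1111), (2, 0b1010)]
print(input_toggles(ops, 3, gated=False))  # 42
print(input_toggles(ops, 3, gated=True))   # 6
```

The gated count ignores the cost of the demux itself, which a real comparison would have to include.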

12
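Alternative ALU Structures
[Figure: two ALU structures. In the first, the inputs feed all functions F1, F2, ..., Fn and a mux driven by Function Select picks the output. In the second, a demux driven by Function Select routes the inputs to one of F1, F2, ..., Fn and a mux picks that function's output.]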
13
Operator Power Estimation
[Figure: estimation flow with the stages Application model; Parameter extraction from high-level simulation; Extracted parameters; Operator models; Power estimates]
14
Retiming
  • Leiserson [1] first proposed retiming for
    optimizing synchronous circuits, and Monteiro [2]
    adapted it for low-power design
  • The basic observation is that a flip-flop
    stops the propagation of glitches, and thus of
    unnecessary transitions
  • This implies that flip-flops can be positioned not only to
    minimize delay (classical retiming) but also to
    reduce transitions

15
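Positioning a Flip-flop and Power Consumption
[Figure: a logic block with output activity Eg driving load CL directly, compared with the same block driving a flip-flop (input capacitance CR) whose output, with activity ER, drives CL]
Without FF: P1 = k · Eg · CL
With FF: P2 = k · (Eg · CR + ER · CL)
P2 can be less than P1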
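
A quick numeric check of the two expressions on this slide, read as P1 = k·Eg·CL and P2 = k·(Eg·CR + ER·CL). The activity and capacitance values are hypothetical, chosen only to show a case where the flip-flop pays off; the function names are made up for this sketch.

```python
def power_without_ff(k: float, e_g: float, c_l: float) -> float:
    """P1: the glitchy logic output (activity e_g) drives the load c_l directly."""
    return k * e_g * c_l


def power_with_ff(k: float, e_g: float, c_r: float, e_r: float, c_l: float) -> float:
    """P2: the logic output drives the flip-flop input c_r, and the
    flip-flop output (activity e_r) drives the load c_l."""
    return k * (e_g * c_r + e_r * c_l)


# Hypothetical values: 3 transitions per cycle before the flip-flop
# (glitches included), 1 after it; load is 5x the flip-flop input capacitance.
k = 1.0
e_g, e_r = 3.0, 1.0
c_l, c_r = 10.0, 2.0

print(power_without_ff(k, e_g, c_l))         # 30.0
print(power_with_ff(k, e_g, c_r, e_r, c_l))  # 16.0
```

Rearranging P2 < P1 gives CR < CL·(1 − ER/Eg), so the flip-flop helps when it filters a large share of the transitions and its own input capacitance is small relative to the load.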
16
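Retiming and Power Consumption
[Figure: a chain of three logic blocks with one flip-flop (FF) shown in two positions. Node activities are labeled E0 to E3; the capacitance labels are CR at the flip-flop input and C1, C2 at the logic inputs.]
FF after the first logic block: P1 = k · (E0 · CR + E1 · C1 + E2 · C2)
FF after the second logic block: P2 = k · (E0 · C1 + E2 · CR + E3 · C2)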
17
Retiming Methodology and Results
  • Evaluates each candidate flip-flop position for potential power
    saving, i.e. compares the transitions generated there with those
    that actually need to propagate
  • This is done by finding the difference between the
    transition counts in zero-delay and timed-delay
    simulation
  • Based on this computation, flip-flops are
    placed either to minimize power alone or to
    minimize a weighted combination of power and timing
  • Results show a reduction of 10 to 25% in
    transition count in a number of benchmark circuits
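
The glitch measure described above can be sketched as a simple ranking step. This is only an illustration of the "timed count minus zero-delay count" idea, not the algorithm of [2]; the node names and transition counts are hypothetical, and a real flow would also weigh the capacitance driven by each node and the timing constraints.

```python
def rank_ff_candidates(zero_delay: dict[str, int], timed: dict[str, int]) -> list[tuple[str, int]]:
    """Rank nodes by glitch transitions, i.e. transitions seen in timed
    simulation that do not appear in zero-delay simulation."""
    glitches = {node: timed[node] - zero_delay[node] for node in zero_delay}
    return sorted(glitches.items(), key=lambda item: item[1], reverse=True)


# Hypothetical transition counts per node over the same input sequence.
zero_delay = {"n1": 120, "n2": 95, "n3": 60}
timed = {"n1": 130, "n2": 210, "n3": 75}

for node, glitch_count in rank_ff_candidates(zero_delay, timed):
    print(node, glitch_count)
# n2 115
# n3 15
# n1 10
```

Here node n2 carries the most spurious transitions, so a flip-flop placed at its output would filter the most glitches.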

18
Pipelining
  • Pipelining affects power in two different ways
  • One effect is similar to retiming: the
    flip-flops cut down on glitches
  • Pipelining also shortens the critical path, which
    allows a higher frequency and performance
    (throughput); this slack can instead be used to reduce the
    voltage at the given throughput and so reduce power

19
Effect of Pipelining
  • Case 1, no pipelining (single logic block): freq f0, voltage v0
  • Case 2, pipelining for performance (logic split by flip-flops): f1 > f0, v1 = v0
  • Case 3, pipelining for low power (logic split by flip-flops): f2 = f0, v2 < v0
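
Case 3 can be made concrete with a small voltage-scaling calculation. This is a sketch under loudly stated assumptions that are not in the slides: gate delay is modelled as proportional to V / (V − Vt)², dynamic power as proportional to V² at a fixed frequency, the pipeline is assumed to halve the critical path, and the register overhead is ignored. The values V0 = 5.0 and VT = 0.7 and the function names are hypothetical.

```python
def gate_delay(v: float, vt: float) -> float:
    """Relative gate delay under the assumed model: delay ~ V / (V - Vt)^2."""
    return v / (v - vt) ** 2


def lowest_voltage(v0: float, slack: float, vt: float, step: float = 0.001) -> float:
    """Scan downward from v0 for the lowest supply voltage whose gate delay
    stays within slack times the gate delay at v0."""
    budget = slack * gate_delay(v0, vt)
    v = v0
    while v - step > vt and gate_delay(v - step, vt) <= budget:
        v -= step
    return v


V0, VT = 5.0, 0.7  # hypothetical supply and threshold voltages
# Halving the critical path lets each gate be twice as slow at the same f0.
v2 = lowest_voltage(V0, slack=2.0, vt=VT)
power_ratio = (v2 / V0) ** 2  # same frequency and capacitance assumed

print(f"v2 = {v2:.2f} V, power ratio = {power_ratio:.2f}")
```

With these assumed numbers the supply can drop to roughly 3.1 V, which cuts dynamic power to a bit under 40% of the original before the extra flip-flops are accounted for.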
20
Increasing Parallelism/ Concurrency
  • Chandrakasan [4] first showed that concurrency can
    be used to reduce power instead of increasing
    performance
  • The primary idea is to reduce the frequency of
    operation and/or the voltage while still meeting a given
    throughput
  • The power consumed by the additional logic required to
    distribute computation and multiplex results
    needs to be accounted for

21
Effect of Parallelism
  • Case 1, single FU (FU followed by a register): freq f0, voltage v0, throughput T0
  • Case 2, two FUs for enhanced performance (two FU/register pairs feeding a mux): f1 = f0, v1 = v0, T1 > T0
  • Case 3, two FUs for reducing power (two FU/register pairs feeding a mux): f2 < f0, v2 < v0, T2 = T0
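
Case 3 can be checked with the usual dynamic-power relation P = C·V²·f. The numbers below are illustrative assumptions, not values from the slides: the two-FU datapath is taken to have 2.15 times the switched capacitance of the single FU (duplicated unit plus routing and mux overhead), each FU runs at half the frequency, and the supply is lowered from 5.0 V to 2.9 V.

```python
def dynamic_power(c_eff: float, v: float, f: float) -> float:
    """Dynamic power P = C * V^2 * f in arbitrary, consistent units."""
    return c_eff * v * v * f


# Case 1: single FU at full frequency and voltage (reference).
p_single = dynamic_power(c_eff=1.0, v=5.0, f=1.0)

# Case 3: two FUs, each at half frequency so total throughput is unchanged.
# The capacitance factor and the reduced voltage are assumed values.
p_parallel = dynamic_power(c_eff=2.15, v=2.9, f=0.5)

print(f"power ratio = {p_parallel / p_single:.2f}")  # power ratio = 0.36
```

The saving disappears once the overhead capacitance grows enough to cancel the V² gain, which is why the distribution and multiplexing logic has to be counted.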
22
Bus Data Encoding
  • Buses are known to consume up to 30% of the power in many
    systems
  • Bus transitions can be reduced by encoding
    the data being sent on the bus
  • Encode such that value pairs corresponding to
    frequent transitions have a smaller Hamming
    distance
  • The power consumed by the encoders and decoders has to be
    accounted for
  • It is also possible to encode only part of the bits
  • Applicable to both data and address buses. For
    address buses, patterns can be encoded
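
As one concrete instance of transition-reducing encoding, the sketch below uses bus-invert coding, a well-known scheme that the slides do not name: when more than half of the bus lines would toggle, the inverted word is sent instead and an extra invert line is raised. The helper names and the sample data are made up for this example, and the power of the encoder and decoder logic is not modelled.

```python
def hamming(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


def bus_invert_encode(words: list[int], width: int) -> list[tuple[int, int]]:
    """Return (bus_value, invert_bit) for each word; the bus starts at 0."""
    mask = (1 << width) - 1
    bus = 0
    encoded = []
    for word in words:
        if hamming(bus, word) > width // 2:
            bus, invert = ~word & mask, 1  # fewer toggles if sent inverted
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
    return encoded


def bus_invert_decode(encoded: list[tuple[int, int]], width: int) -> list[int]:
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [bus ^ mask if invert else bus for bus, invert in encoded]


def transitions(values: list[int]) -> int:
    """Total bit toggles over a sequence of line values, starting from 0."""
    total, prev = 0, 0
    for value in values:
        total += hamming(prev, value)
        prev = value
    return total


words = [0x00, 0xFF, 0x00, 0xFF]  # worst case for an unencoded 8-bit bus
encoded = bus_invert_encode(words, 8)
raw = transitions(words)
coded = transitions([b for b, _ in encoded]) + transitions([i for _, i in encoded])
print(raw, coded)  # 24 3
```

The coded count includes the toggles on the extra invert line, so the comparison covers all nine wires.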

23
Bus Data Encoding
[Figure: logic blocks communicating over a bus, with an encoder/decoder between the logic and the bus at each end]
24
FSM State Encoding
  • FSM state encoding evolved for logic synthesis
  • Encoding techniques are primarily based on
    reducing the Hamming distance between the codes assigned
    to neighboring states
  • The assumption is that a lower Hamming distance
    implies less logic for implementing the
    transitions
  • This has to be combined with the input as well as
    the output states
  • The approach to low-power state assignment is
    similar, except that "neighbor" is defined
    by frequency of transition rather than just by
    connectivity in the state machine
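
The low-power criterion above can be written as a cost function: the sum over all transitions of frequency times the Hamming distance between the two state codes. The sketch below evaluates that cost for a hypothetical four-state machine (states A to D and the transition counts are invented for illustration) and brute-forces all 2-bit assignments; practical state-assignment tools use heuristics rather than exhaustive search.

```python
from itertools import permutations


def weighted_switching(encoding: dict[str, int], transitions: dict[tuple[str, str], int]) -> int:
    """Sum of transition frequency times Hamming distance of the state codes."""
    return sum(
        freq * bin(encoding[src] ^ encoding[dst]).count("1")
        for (src, dst), freq in transitions.items()
    )


# Hypothetical transition counts observed over some input trace.
transitions = {("A", "B"): 60, ("B", "A"): 25, ("B", "C"): 10, ("C", "D"): 5}

# Two candidate encodings of the same machine.
naive = {"A": 0b00, "B": 0b11, "C": 0b01, "D": 0b10}
tuned = {"A": 0b00, "B": 0b01, "C": 0b11, "D": 0b10}
print(weighted_switching(naive, transitions))  # 190
print(weighted_switching(tuned, transitions))  # 100

# Exhaustive search over all assignments of the four 2-bit codes.
states = ["A", "B", "C", "D"]
best = min(
    weighted_switching(dict(zip(states, codes)), transitions)
    for codes in permutations(range(4))
)
print(best)  # 100
```

The tuned encoding places the frequently exchanged states A and B one bit apart, which is what the frequency-based definition of "neighbor" asks for.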

25
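FSM State Encoding
[Figure: the same five-state machine (s0 to s4) with edge weights 10, 80, 5 and 30, shown with two state assignments. Encoding for logic minimization: s0 = 000, with codes 001, 010, 100, 101 on the remaining states. Encoding for power minimization: s0 = 000, with codes 100, 101, 010, 001 on the remaining states.]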
26
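Other Transformations
[Figure: two chains of two muxes producing output y. Case 1: inputs connected in the order a, b, c. Case 2: inputs connected in the order a, c, b.]
Power consumption of case 2 is significantly less
than that of case 1 if activity(b) >> activity(a) and
activity(b) >> activity(c)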
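
The effect of the input ordering can be sketched by counting toggles on the internal node and the output of a two-mux chain. The wiring assumed here is an interpretation of the figure, not something the transcript states: the first mux chooses between the first two inputs, the second mux chooses between that result and the last input, and a symbolic select names which input should reach y. The function names and signal traces are hypothetical.

```python
def toggles(signal: list[int]) -> int:
    """Number of value changes along a signal trace."""
    return sum(x != y for x, y in zip(signal, signal[1:]))


def mux_chain(inputs: dict[str, list[int]], sel: list[str], order: tuple[str, str, str]):
    """Simulate m = mux(order[0], order[1]) followed by y = mux(m, order[2]).

    sel[t] names the input that must appear on y at time t. When the late
    input is selected, the first mux defaults to order[0].
    """
    early0, early1, late = order
    m, y = [], []
    for t, s in enumerate(sel):
        m_t = inputs[early1][t] if s == early1 else inputs[early0][t]
        y_t = inputs[late][t] if s == late else m_t
        m.append(m_t)
        y.append(y_t)
    return m, y


# Hypothetical traces: b toggles every cycle, a and c are quiet, b is selected.
inputs = {"a": [0] * 8, "b": [0, 1] * 4, "c": [0] * 8}
sel = ["b"] * 8

m1, y1 = mux_chain(inputs, sel, ("a", "b", "c"))  # case 1
m2, y2 = mux_chain(inputs, sel, ("a", "c", "b"))  # case 2

print(toggles(m1) + toggles(y1))  # 14
print(toggles(m2) + toggles(y2))  # 7
```

Both orderings produce the same y; moving the high-activity input b next to the output keeps its transitions off the internal node.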
27
References
  [1] Leiserson et al., Optimizing Synchronous Circuits
      by Retiming, Proc. of 3rd Caltech Conference on
      VLSI, March 1983, pp. 23-36
  [2] J. Monteiro et al., Retiming Sequential
      Circuits for Low Power, ICCAD, Nov. 1993, pp.
      398-402
  [3] Devadas and Malik, A Survey of Optimization
      Techniques Targeting Low Power VLSI Circuits,
      DAC 32, 1995, pp. 242-247
  [4] A. P. Chandrakasan, Optimizing Power Using
      Transformations, IEEE TCAD, vol. 14, no. 1, Jan.
      1995, pp. 12-31
  [5] Koegst et al., State Assignment for FSM Low Power
      Design, EDAC 1995, pp. 28-33
  [6] Brent and Kung, A Regular Layout for Parallel
      Adders, IEEE Trans. on Computers, vol. C-31, no. 3, pp.
      260-264