Title: POWER OPTIMISATION IN VLSI CIRCUITS Charles Augustine
ORGANIZATION OF THE TALK
- Basics, definitions. Role of low power in present-day electronics.
- Survey of estimation techniques.
- Survey of power minimization techniques.
- Conclusions
Basics
- In the context of VLSI, power refers to the rate at which electrical energy is converted into heat (dissipation), or the rate at which energy is drained from a source (consumption).
- Power is an important dimension in present-day digital VLSI systems.
- In the past, low-power techniques were confined to analog designs.
- Some digital circuits, such as pacemakers and watches, also needed low-power techniques in the past.
  - These systems were operated at very low frequencies.
- In the past, only area and performance were the key concerns of digital VLSI designs.
Basics
- Why is power an important consideration in present-day digital designs?
  - Higher integration → higher performance → higher power dissipated per unit area of the die.
  - Higher integration → market demand for portable consumer electronic devices powered by batteries.
  - Batteries with higher energy densities → they can become explosive.
Basics
- NiCd batteries: energy density of 20 watt-hours/pound.
- NiMH batteries: energy density of 35 watt-hours/pound.
- In high-performance systems, packaging and cooling costs increase with increased power dissipation.
- Supplying large currents through device pins is problematic.
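As a rough worked example of what these energy densities mean, the sketch below computes the ideal runtime of a battery pack from its weight and the system's average power. The 0.5 lb pack and the 2 W load are hypothetical, and conversion losses and battery aging are ignored.

```python
# Energy densities quoted on the slide, in watt-hours per pound.
WH_PER_POUND = {"NiCd": 20.0, "NiMH": 35.0}


def runtime_hours(chemistry: str, pack_pounds: float, avg_power_w: float) -> float:
    """Ideal runtime = stored energy / average power (no conversion losses or aging)."""
    energy_wh = WH_PER_POUND[chemistry] * pack_pounds
    return energy_wh / avg_power_w


# Hypothetical 0.5 lb pack powering a 2 W portable device.
for chem in ("NiCd", "NiMH"):
    print(chem, runtime_hours(chem, pack_pounds=0.5, avg_power_w=2.0), "hours")
```

Halving the average power doubles the runtime, which is why power, not only battery chemistry, sets battery life.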
Basics
- Why is power an important consideration in digital designs?
  - Reliability issues: every 10 °C increase in operating temperature roughly doubles the failure rate.
- VLSI low-power problems
  - Analysis/estimation: finding the sources of power dissipation.
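The reliability rule of thumb above is an exponential in temperature; a one-line sketch (a rule of thumb only, not a device reliability model):

```python
def failure_rate_multiplier(delta_t_celsius: float) -> float:
    """Rule of thumb from the slide: every +10 degC roughly doubles the failure rate."""
    return 2.0 ** (delta_t_celsius / 10.0)


# A die running 30 degC hotter fails roughly 2**3 = 8 times as often.
print(failure_rate_multiplier(30.0))  # 8.0
```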
Basics
- Power estimation is needed at every stage in the design process, to ensure that the power specifications of the design are not violated.
  - Accuracy and efficiency of estimation matter.
  - Estimation is the first step toward implementing optimization for low power.
- Optimization
  - The process of generating the best design that meets the specifications.
  - Depends on the result of the analysis, which is needed to evaluate the merits of design choices.
  - Involves tradeoffs among conflicting objectives.
- $P_{avg} \propto C V^2 f$
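The proportionality between average power and $C V^2 f$ is commonly written with an explicit switching-activity factor as $P_{avg} = \alpha C V_{dd}^2 f$, where α is the probability that the switched capacitance makes an energy-drawing 0→1 transition in a clock cycle. A minimal sketch with hypothetical values; it also shows why voltage reduction is so effective.

```python
def dynamic_power(alpha: float, c_farads: float, vdd: float, f_hz: float) -> float:
    """Average capacitive switching power in watts: P = alpha * C * Vdd^2 * f.

    alpha: probability that the switched capacitance charges (0 -> 1) in a cycle.
    """
    return alpha * c_farads * vdd ** 2 * f_hz


# Hypothetical chip: 1 nF of switched capacitance, 10% activity, 500 MHz clock.
p_full = dynamic_power(0.1, 1e-9, 1.2, 500e6)   # about 0.072 W
p_half = dynamic_power(0.1, 1e-9, 0.6, 500e6)   # halving Vdd quarters the power
print(p_full, p_half / p_full)
```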
Technology trends - SIA roadmap
Moore's Law, 2x every 18 months
Technology trends - Power
A 10 mm² chip clocked at 500 MHz would consume 315 W!
Basics
- Sources of power dissipation depend on the logic family.
- CMOS is the logic family preferred in many designs.
  - Easy to get a functional circuit.
  - Almost zero static power dissipation: low power by default.
- In general, capacitive power ≫ short-circuit power ≫ leakage power.
- In CMOS, power dissipation has the following causes:
  - Static power dissipation, when no signals are changing.
    - Leakage: reverse-bias saturation current, sub-threshold leakage.
    - Deviations from a strict CMOS design.
  - Dynamic power dissipation, when signals are changing.
    - Short-circuit power dissipation, when both devices are on.
    - Capacitive power dissipation, due to output loading.
Low power metrics
- Absolute power number
  - Quoted in watts. Useful in applications such as package selection and power-supply distribution network design.
- uW/MHz
  - The average energy consumed by the system per clock cycle.
Low power metrics
- uW/MIPS or uW/SPEC
  - The average energy per instruction. Useful for comparing processors of the same family.
- uW/MIPS²
  - Normalizes the average energy per instruction by the performance of that particular architecture. Useful for comparing processors of different families.
- uW/(area · MHz)
  - Energy dissipated per unit area of the die per cycle. Useful as a reliability measure. Can also be used to compare two chips with the same function but different implementations.
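The metrics above can be computed from a single power measurement. A sketch with a hypothetical processor (250 mW at 100 MHz, 120 MIPS, 50 mm² die); measuring area in mm² is an assumption, since the slide just says "area". Note that 1 µW/MHz is exactly 1 pJ per cycle and 1 µW/MIPS is 1 pJ per instruction.

```python
def low_power_metrics(power_w: float, f_mhz: float, mips: float, area_mm2: float) -> dict:
    """Compute the slide's figures of merit from one (hypothetical) measurement."""
    p_uw = power_w * 1e6
    return {
        "uW/MHz": p_uw / f_mhz,                      # 1 uW/MHz = 1 pJ per clock cycle
        "uW/MIPS": p_uw / mips,                      # 1 uW/MIPS = 1 pJ per instruction
        "uW/MIPS^2": p_uw / mips ** 2,               # energy per instruction / performance
        "uW/(mm^2*MHz)": p_uw / (area_mm2 * f_mhz),  # per unit die area, per cycle
    }


# Hypothetical processor: 250 mW at 100 MHz, 120 MIPS, 50 mm^2 die.
for name, value in low_power_metrics(0.25, 100.0, 120.0, 50.0).items():
    print(f"{name:>14}: {value:.3f}")
```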
Types of estimation
- Depending on the application:
  - Instantaneous power estimation.
  - Peak power estimation: important for estimating IR drops.
  - Average power estimation: important for conserving battery life, and for packaging and cooling considerations.
- System engineer: What battery to use?
- Chip architect: Will my architecture be feasible?
- Layout engineer: How wide should the power supply lines be?
- Product engineer: Which package to use?
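All three kinds of estimate can be read off one supply-current trace. A minimal sketch, assuming current samples taken at equal time intervals and a constant supply voltage (the trace is hypothetical):

```python
def power_profile(vdd: float, current_samples_a: list[float]) -> tuple[list[float], float, float]:
    """Instantaneous, peak and average power from supply-current samples
    taken at equal time intervals (p(t) = Vdd * i(t))."""
    instantaneous = [vdd * i for i in current_samples_a]
    peak = max(instantaneous)
    average = sum(instantaneous) / len(instantaneous)
    return instantaneous, peak, average


# Hypothetical supply-current trace (amperes) at a 1.0 V supply.
inst, peak, avg = power_profile(1.0, [0.1, 0.5, 0.2, 0.2])
print(peak, avg)  # peak drives IR-drop checks; average drives battery and cooling budgets
```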
- Depending on methodology:
  - Simulation based
    - Vectors are given, the circuit is known, and simulation is performed. The instantaneous currents are averaged and the power dissipation is then calculated.
  - Probabilistic analysis based
    - Averaging of the inputs is performed first, and probabilistic measures are extracted. Power dissipation is then calculated.
Types of estimation
- Monte Carlo methods
  - These methods focus on performing the right amount of simulation: when a stopping criterion is met, simulation is stopped.
    - Too few measurements: inaccurate estimates.
    - Too many measurements: waste of computing resources.
  - The mean of the n samples obtained during characterization is compared against the statistically calculated mean. When they are sufficiently close to one another, simulation is stopped.
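One common way to make "sufficiently close" precise is a confidence-interval stopping rule: keep sampling until the half-width of the ~95% confidence interval on the mean (normal approximation) falls below a chosen relative error. The sketch below assumes that rule; the sample generator, its parameters and the 5% error target are hypothetical.

```python
import math
import random
import statistics


def monte_carlo_power(sample_power, rel_error=0.05, z=1.96, min_samples=30, max_samples=100_000):
    """Draw power samples until z * s / sqrt(n) <= rel_error * mean.

    Returns (estimated mean power, number of samples used)."""
    samples = []
    while len(samples) < max_samples:
        samples.append(sample_power())
        n = len(samples)
        if n >= min_samples:
            mean = statistics.fmean(samples)
            half_width = z * statistics.stdev(samples) / math.sqrt(n)
            if half_width <= rel_error * mean:
                return mean, n  # stopping criterion met
    return statistics.fmean(samples), len(samples)


# Hypothetical per-period power samples (mW): mean 10, standard deviation 2.
rng = random.Random(1)
estimate, n_used = monte_carlo_power(lambda: rng.gauss(10.0, 2.0))
print(f"{estimate:.2f} mW after {n_used} samples")
```

With a 20% relative spread and a 5% target, roughly 60 samples are needed; a noisier power distribution automatically gets more simulation.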
Types of estimation
- Why so many methods?
  - Accuracy
    - When implementation details are not available, relative accuracy should be stressed.
  - Efficiency
    - What size of circuit can a given method handle?
  - Available information
    - For high-level estimation, it is difficult to estimate the activity or the capacitance in the absence of any implementation data, so high errors must be tolerated.
    - In some cases, only some characteristics of the input are known.
Circuit level estimation
- A model of the circuit is developed.
- The model must be re-evaluated every time the value of the current changes, because the value of the partial derivative depends on it.
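The need to re-evaluate the model is easiest to see in a Newton-Raphson solve of a nonlinear device, which is how SPICE-class simulators typically work. In this sketch (a hypothetical diode driven through a resistor, with made-up device parameters) the linearized conductance dI/dV is recomputed at every iteration because it depends on the present current; once converged, the power in each element follows directly.

```python
import math

# Hypothetical circuit: source VS drives a diode through resistor R.
VS, R = 5.0, 1e3            # volts, ohms
IS, VT = 1e-14, 0.02585     # diode saturation current (A), thermal voltage (V)


def diode_current(v: float) -> float:
    return IS * (math.exp(v / VT) - 1.0)


def solve_operating_point(v: float = 0.5, tol: float = 1e-12, max_iter: int = 100,
                          max_step: float = 0.1) -> tuple[float, float]:
    """Newton-Raphson on the KCL equation (VS - v)/R = I_d(v)."""
    for _ in range(max_iter):
        # The linearized diode conductance dI/dV depends on the present current,
        # so it must be recomputed at every iteration.
        g_d = IS / VT * math.exp(v / VT)
        residual = (VS - v) / R - diode_current(v)
        jacobian = -1.0 / R - g_d
        step = -residual / jacobian
        step = max(-max_step, min(max_step, step))  # crude limiting for the exponential
        v += step
        if abs(step) < tol:
            break
    return v, diode_current(v)


v_d, i_d = solve_operating_point()
p_diode = v_d * i_d                # power dissipated in the diode
p_resistor = (VS - v_d) ** 2 / R   # power dissipated in the resistor
print(f"V_d = {v_d:.4f} V, I = {i_d * 1e3:.3f} mA, P_diode = {p_diode * 1e3:.3f} mW")
```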
Gate level estimation - Simulation
- Basic blocks: logic gates
  - AND, OR, NOT, NOR, XOR, NAND, etc.
- Signals have only a finite number of values
  - 0, 1, H, L, Z, U, X
Gate level estimation - Simulation
- An event-driven simulator is used for functional and timing simulation.
- The simulator counts the number of specified events required for power estimation.
- The average power is then estimated from these event counts.
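A minimal sketch of an event-driven gate-level simulator that counts toggles per net, assuming unit gate delays, a transport delay model, and the common estimate $P_{avg} = \frac{1}{T}\sum_i \frac{1}{2} C_i V_{dd}^2 n_i$, where $n_i$ is the toggle count of net i over time T. The two-gate netlist and node capacitances are hypothetical. Because gate delays are modeled, the output glitch appears in the count, which a zero-delay simulation would miss.

```python
import heapq
from collections import Counter

# Hypothetical netlist: output net -> (logic function, input nets, delay in ns)
GATES = {
    "n1": (lambda a, b: a & b, ("a", "b"), 1),
    "out": (lambda x, c: x ^ c, ("n1", "c"), 1),
}
FANOUT: dict[str, list[str]] = {}
for gate, (_, inputs, _) in GATES.items():
    for net in inputs:
        FANOUT.setdefault(net, []).append(gate)


def simulate(vectors, period_ns=10):
    """Apply one input vector per period; return the toggle count of every net."""
    values = {net: 0 for net in ("a", "b", "c", "n1", "out")}  # consistent all-zero start
    toggles = Counter()
    events, seq = [], 0  # heap of (time, seq, net, new value); seq breaks ties
    for k, vec in enumerate(vectors):
        for net, val in vec.items():
            heapq.heappush(events, (k * period_ns, seq, net, val))
            seq += 1
    while events:
        now = events[0][0]
        changed = []
        while events and events[0][0] == now:  # apply every event at this time
            _, _, net, val = heapq.heappop(events)
            if values[net] != val:
                values[net] = val
                toggles[net] += 1
                changed.append(net)
        # Evaluate each affected gate once, after all same-time updates are applied.
        for gate in sorted({g for net in changed for g in FANOUT.get(net, [])}):
            fn, inputs, delay = GATES[gate]
            heapq.heappush(events, (now + delay, seq, gate, fn(*(values[i] for i in inputs))))
            seq += 1
    return toggles


VDD = 1.0
CAP = {"a": 10e-15, "b": 10e-15, "c": 10e-15, "n1": 10e-15, "out": 10e-15}  # farads
vectors = [{"a": 0, "b": 1, "c": 0}, {"a": 1, "b": 1, "c": 1}]
toggles = simulate(vectors)
t_total = len(vectors) * 10e-9  # seconds simulated
p_avg = sum(0.5 * CAP[n] * VDD ** 2 * k for n, k in toggles.items()) / t_total
print(dict(toggles), p_avg)  # "out" toggles twice: a 0 -> 1 -> 0 glitch
```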
Gate level estimation - Probabilistic
- Probabilistic analysis
  - Estimates power based on certain probabilistic measures of the input data.
  - Only dynamic power dissipation can be estimated accurately.
  - Input signals are assumed to be random, taking on values 0 or 1.
  - The probabilistic measures used are defined below.
Gate level estimation - Probabilistic
- Transition probability: average fraction of the clock cycles in which the steady-state value of the signal is different from its initial value.
- Transition density: average number of transitions per unit time.
- Static probability: average fraction of clock cycles in which the steady-state value of the signal is logic high.
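The three measures can be computed from a recorded signal as in the sketch below (the cycle values and toggle times are hypothetical). The last function is the standard result that, if a signal's values in successive cycles are independent with static probability p, its transition probability is 2p(1 - p).

```python
def static_probability(values):
    """Fraction of cycles whose steady-state value is logic high."""
    return sum(values) / len(values)


def transition_probability(values):
    """Fraction of cycles whose steady-state value differs from the previous cycle's."""
    changes = sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)
    return changes / (len(values) - 1)


def transition_density(toggle_times, duration):
    """Average number of transitions per unit time (glitches included if recorded)."""
    return len(toggle_times) / duration


def transition_probability_if_independent(p_high):
    """P(0->1) + P(1->0) = 2 p (1 - p) for temporally independent cycle values."""
    return 2 * p_high * (1 - p_high)


cycle_values = [0, 1, 1, 0, 1, 0, 0, 1]
print(static_probability(cycle_values))      # 0.5
print(transition_probability(cycle_values))  # 5 changes in 7 cycle boundaries
print(transition_density([1.0, 2.5, 4.0], duration=10.0))
```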
Gate level estimation - Probabilistic
- Sources of error
  - All measures are unaffected by circuit delays, so glitches are ignored.
  - Spatial correlation: different inputs to a circuit may be related to one another. This can happen because of circuit topology or input characteristics.
  - Temporal correlation: input values to a circuit may be related to previous values. This can happen because of memory or input characteristics.
  - In the presence of correlation, all probabilistic measures become conditional probabilities, which are much harder to analyze.
  - Correlation causes the estimates to be wrong. These methods work well when the inputs are random.
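Spatial correlation from reconvergent fanout is enough to break the independence assumption. In this sketch (a hypothetical one-input circuit y = a AND (NOT a)), propagating static probabilities gate by gate as if the AND inputs were independent gives 0.25, while exhaustive enumeration over the primary input shows that y is never 1.

```python
from itertools import product


def p_not(p):
    return 1 - p


def p_and_assuming_independent(p1, p2):
    return p1 * p2


def exact_probability(fn, input_probs):
    """Exact P(fn = 1) by enumerating primary-input combinations.
    Assumes the primary inputs themselves are mutually independent."""
    names = list(input_probs)
    total = 0.0
    for bits in product((0, 1), repeat=len(names)):
        weight = 1.0
        for name, bit in zip(names, bits):
            weight *= input_probs[name] if bit else 1 - input_probs[name]
        if fn(**dict(zip(names, bits))):
            total += weight
    return total


p_a = 0.5
naive = p_and_assuming_independent(p_a, p_not(p_a))          # 0.25
exact = exact_probability(lambda a: a & (1 - a), {"a": p_a})  # 0.0
print(naive, exact)
```

Exhaustive enumeration is exact but exponential in the number of primary inputs, which is why practical tools approximate the conditional probabilities instead.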
Architecture level estimation
- Challenges
  - Very little information is available about the implementation at this stage.
  - Difficult to estimate input patterns.
  - Difficult to characterize:
    - An n-input block needs $O(4^n)$ input patterns for exhaustive characterization (every pair of consecutive input vectors). This is not practical even for small values of n.
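The 4^n count comes from pairing every possible previous input vector with every possible current one, since switching power depends on the transition rather than on a single vector. A small sketch:

```python
from itertools import product


def transition_pairs(n: int):
    """Every (previous vector, current vector) pair for an n-input block."""
    vectors = list(product((0, 1), repeat=n))
    return [(prev, cur) for prev in vectors for cur in vectors]


print(len(transition_pairs(3)))  # 64 == 4**3
print(f"{4 ** 32:.2e}")          # a 32-input block would need about 1.84e+19 pairs
```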
Architecture level estimation, an application
Architecture level estimation
- There is no general method in use today.
- How do we estimate capacitance and activity for a circuit that has not yet been implemented?
- Some methods use scalable capacitance models.
  - For example, the total capacitance of a ripple carry adder is proportional to the number of full adder cells in it.
- UWN (uniform white noise) characterization
  - Components are characterized with random data.
  - This works in situations where there is no particular relationship between data.
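A minimal sketch of a scalable capacitance model combined with UWN characterization, assuming the effective switched capacitance per full-adder cell has already been measured with random (UWN) inputs; the 50 fF value and the operating point are hypothetical.

```python
def ripple_carry_adder_power(n_bits: int, c_eff_per_cell: float, vdd: float, f_hz: float) -> float:
    """Scalable model: total effective capacitance = n_bits * (per-cell value from UWN
    characterization), so power grows linearly with the adder width."""
    c_total = n_bits * c_eff_per_cell
    return c_total * vdd ** 2 * f_hz


# Hypothetical: 50 fF effective per full-adder cell, 1.0 V, 100 MHz.
p16 = ripple_carry_adder_power(16, 50e-15, 1.0, 100e6)
p32 = ripple_carry_adder_power(32, 50e-15, 1.0, 100e6)
print(p16, p32)
```

Because the per-cell value is obtained with random data, this model inherits UWN's weakness on correlated inputs.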
Architecture level estimation
- UWN characterization fails badly for highly correlated data, such as in DSP systems, but is useful for comparing two different implementations.
- Entropy based modeling.
- Dual bit type method
  - Works well for DSP systems.
  - Based on the fact that there is a relationship between bit-level and word-level characteristics.
  - The data in such systems is obtained by sampling analog signals at a rate higher than the highest signal frequency.
Architecture level estimation
- The difference between successive samples is small.
- LSBs change faster than MSBs in most applications.
- UWN characterization gives very poor results for such inputs.
- In some applications, the MSBs change faster than the LSBs; these changes correspond to sign-bit changes.
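The sketch below measures per-bit toggle rates of a hypothetical oversampled signal (a sine at 1/200 of the sampling rate, amplitude 2000, 16-bit two's complement). The low-order bits toggle about half the time, like UWN data, while the high-order sign-extension bits toggle only when the sign changes. This split between the two kinds of bits is what the dual bit type method exploits.

```python
import math


def bit_activities(samples, width=16):
    """Per-bit toggle rate of two's-complement words (index 0 = LSB)."""
    mask = (1 << width) - 1
    words = [s & mask for s in samples]
    toggles = [0] * width
    for prev, cur in zip(words, words[1:]):
        diff = prev ^ cur
        for b in range(width):
            toggles[b] += (diff >> b) & 1
    return [t / (len(words) - 1) for t in toggles]


# Hypothetical DSP input: a sine sampled 200x faster than its frequency.
samples = [round(2000 * math.sin(2 * math.pi * n / 200)) for n in range(2000)]
act = bit_activities(samples)
print([round(a, 3) for a in act])  # LSBs near 0.5, bits 11..15 near 0.01
```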
Architecture level estimation
- Encountered in delta modulation schemes.
- Inputs are divided into three regions based on the nature of their activity: the sign-bit region, the UWN region and the transition region. The sign-bit region needs many more capacitance coefficients to characterize.
- Based on linear regression techniques
  - The power is expressed as a linear equation of the input and output signal transitions. The constant coefficients are then determined during characterization.
- Based on high-level operations
  - Some high-level operations are easily identifiable in every circuit.
  - Examples: memory read/write; ALU add/mul/sub.
  - Costs might be state dependent. For example, some writes can cost more than others.
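A minimal sketch of the linear-regression approach, assuming the model P = c0 + c_in·T_in + c_out·T_out, where T_in and T_out are input and output transition counts per cycle. The characterization data is synthetic (generated from known, hypothetical coefficients) so that the fit can be checked; a real flow would use simulated or measured power.

```python
def fit_linear_power_model(rows):
    """Least-squares fit of P = c0 + c_in*T_in + c_out*T_out.

    rows: (T_in, T_out, measured_power) tuples from characterization runs."""
    X = [(1.0, t_in, t_out) for t_in, t_out, _ in rows]
    y = [p for _, _, p in rows]
    n = 3
    # Normal equations (X^T X) c = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        coeffs[r] = (b[r] - sum(A[r][c] * coeffs[c] for c in range(r + 1, n))) / A[r][r]
    return coeffs


TRUE = (0.2, 0.05, 0.08)  # hypothetical c0, c_in, c_out (e.g. mW, mW per transition)
pairs = [(2, 1), (4, 3), (6, 2), (8, 5), (3, 4), (7, 7)]
rows = [(t_in, t_out, TRUE[0] + TRUE[1] * t_in + TRUE[2] * t_out) for t_in, t_out in pairs]
coeffs = fit_linear_power_model(rows)
print(coeffs)  # recovers TRUE up to rounding
```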
Dual bit type method
Optimization techniques
- Methods adopted
  - Reduce voltage: for example, a reduced-voltage-swing bus.
  - Reduce activity: for example, clock gating, balancing delays.
  - Reduce wastage: for example, using drivers of the proper strength.
  - Manage well: for example, dynamic frequency and voltage reduction.
- It is possible to optimize at several levels, with the high-level optimizations yielding the best results.
  - Process level: Vt reduction to enable supply voltage reduction; sub-threshold leakage reduction.
  - Circuit level: voltage reduction, pin reordering, network reorganization, transistor sizing, self-gating flip-flops, dual-edge triggering.
  - Gate level: pin reordering, clock gating, restructuring.
  - RTL level: clock gating, retiming.
Optimization techniques
- Architectural level
  - Asynchronous vs. synchronous architectures.
  - Bus encoding.
  - Resources vs. power tradeoffs.
  - Data formats.
- Algorithm level
  - Precision vs. power tradeoff. Reduced precision affects the behaviour of the algorithm.
  - Cache blocking.
- System level
  - Partitioning, power down, dynamic power management.
Optimizations - circuit level
- Optimization problems can be formulated more precisely at this level.
- Only incremental improvements are possible, but they can add up to major improvements in cell-based designs such as memories.
- An important technique is voltage reduction. It impacts the fabrication process and is sometimes driven by system considerations.
- Important techniques
  - Transistor sizing for dynamic power reduction
  - Transistor sizing for leakage power reduction
  - Network restructuring and reorganization
  - Transistor network repartitioning and reorganization
  - Special latches and flip-flops: differential input latch, dual-edge-triggered flip-flops, self-gating flip-flops, combinational flip-flops.
  - Low-power libraries: P equivalence, N equivalence, NP equivalence.
  - Reduced voltage swing, adiabatic computation, pass-transistor logic.
Optimization - circuit level, buffer sizing
- The objective is to use a buffer chain to drive a large load capacitance from a source with low drive strength, without incurring large delays or spending large amounts of power.
Optimization - circuit level, buffer sizing
- If the load is driven directly, the delay is large.
- A large buffer can drive the load, but it has a large input capacitance itself.
- The trick is therefore to use a chain of successively larger buffers, each driving an appropriate load.
  - Too many buffers: large propagation delay of the chain.
  - Too few buffers: the output slope will not be steep, resulting in large delay.
Optimization - circuit level, buffer sizing
- But the switching current of every stage is k times higher than that of the preceding stage, because of its higher drive strength.
- The power of each successive stage is therefore k times higher.
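A sketch of the tapering tradeoff under a deliberately simple model (these are assumptions, not taken from the slides): each stage's delay is proportional to its fanout f, intrinsic delay is ignored, and the switched capacitance is the sum of all stage input capacitances plus the load. For a target taper k, the number of stages is ln(C_L/C_in)/ln k, rounded.

```python
import math


def buffer_chain(c_in: float, c_load: float, k: float):
    """Size a tapered buffer chain for a target per-stage ratio k.

    Returns (stages, stage input capacitances, total switched capacitance,
    relative delay). Delay per stage ~ its fanout; intrinsic delay is ignored."""
    stages = max(1, round(math.log(c_load / c_in) / math.log(k)))
    f = (c_load / c_in) ** (1 / stages)            # actual taper after rounding
    caps = [c_in * f ** i for i in range(stages)]  # each stage is f times larger
    switched = sum(caps) + c_load                  # every node charges once per transition
    delay = stages * f
    return stages, caps, switched, delay


# Hypothetical: drive a load 1000x the source's input capacitance.
for k in (math.e, 4.0, 10.0):
    n, _, c_sw, d = buffer_chain(1.0, 1000.0, k)
    print(f"k={k:.2f}: {n} stages, switched C = {c_sw:.0f}, delay = {d:.1f}")
```

The stage capacitances form a geometric series, so most of the switched capacitance sits in the last stages; in this model a larger taper uses fewer stages and switches less capacitance at the cost of delay.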
Optimizations - gate level
- Some techniques
  - Logic structuring: function-preserving transforms on the logic structure that give different area, power and delay characteristics.
  - Signal gating: reducing unwanted transitions.
  - Logic encoding: binary vs. Gray coding.
  - Bus invert coding.
  - State machine encoding.
  - Output don't-care encoding.
  - Pre-computation logic.
  - Floating node elimination.
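A sketch of two of the encoding techniques listed above, counting bit transitions on a bus. Bus-invert coding adds one "invert" line and sends a word complemented when more than half the data lines would otherwise toggle; Gray coding makes consecutive counter values differ in exactly one bit. The 8-bit width and the word sequences are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def plain_transitions(words):
    total, bus = 0, 0
    for w in words:
        total += hamming(bus, w)
        bus = w
    return total


def bus_invert_transitions(words, width=8):
    """Transitions on `width` data lines plus one invert line, starting from all zeros."""
    mask = (1 << width) - 1
    bus, inv, total = 0, 0, 0
    for w in words:
        if hamming(bus, w) > width // 2:
            new_bus, new_inv = ~w & mask, 1  # send the complement and flag it
        else:
            new_bus, new_inv = w, 0
        total += hamming(bus, new_bus) + (inv ^ new_inv)
        bus, inv = new_bus, new_inv
    return total


def decode_bus_invert(bus: int, inv: int, width=8) -> int:
    """Receiver side: undo the inversion when the invert line is high."""
    return bus ^ ((1 << width) - 1) if inv else bus


def gray(n: int) -> int:
    return n ^ (n >> 1)


words = [0x00, 0xFF, 0x00, 0xFF]
print(plain_transitions(words), bus_invert_transitions(words))  # 24 vs 3
count = range(16)
print(plain_transitions(count), plain_transitions([gray(n) for n in count]))  # 26 vs 15
```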
Optimizations - Architectural level
- Have the highest impact on power.
- Main themes
  - Design decisions include the selection and organization of functional modules. Some of these decisions can impact power to a great extent.
  - Power management: a class of techniques that carefully manage the performance and throughput of a system based on its computation needs, to achieve low-power goals.
Optimizations - Architectural level
- Some popular techniques
  - Microprocessor sleep modes.
  - Performance management.
  - Adaptive filtering based on SNR requirements.
  - Switching activity reduction through guarded evaluation.
  - Resource sharing.
  - Pipelining and parallelism.
  - Flow graph transformations.
Optimization - Performance management
- The asynchronous data processor reads the input FIFO and loads the output FIFO immediately after the computation is over.
- If the throughput is low, more data will queue up at the input FIFO.
Optimization - Performance management
- The number of entries in the FIFO is used to determine the load on the system. If the system is heavily loaded, the voltage is increased to increase the performance of the system.
- Another idea: parts of the system that are not performance critical are operated at lower voltages. However, having multiple power supplies on the same chip can complicate things.
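A minimal sketch of the FIFO-occupancy policy described above, under hypothetical assumptions: two operating points (1.0 V processing one item per tick, 1.5 V processing two), energy per item proportional to V² with capacitance normalized to 1, and a switch to the faster point when more than four entries are queued. It compares the energy with always running at the faster point.

```python
# Hypothetical operating points: (supply voltage in V, items processed per tick)
LEVELS = [(1.0, 1), (1.5, 2)]


def simulate(arrivals, levels, threshold=None):
    """Process a stream of arrivals; return (energy, items left in the FIFO).

    threshold=None: always run at the highest level. Otherwise use the higher
    of the two levels only when the FIFO holds more than `threshold` entries."""
    queue, energy = 0, 0.0
    for a in arrivals:
        queue += a
        if threshold is None:
            level = len(levels) - 1
        else:
            level = 1 if queue > threshold else 0
        v, rate = levels[level]
        done = min(queue, rate)
        queue -= done
        energy += done * v ** 2  # energy per item ~ C * V^2 with C normalized to 1
    return energy, queue


arrivals = [1] * 10 + [3] * 5 + [0] * 10  # steady load, a burst, then idle
print(simulate(arrivals, LEVELS, threshold=4))  # adaptive voltage
print(simulate(arrivals, LEVELS))               # always at the high voltage
```

Both runs drain the same 25 items; the adaptive run uses 37.5 energy units versus 56.25, at the cost of items waiting longer in the FIFO during the burst.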
Optimization - Adaptive filtering
- Filtering operation: mathematical operations on sample values, mainly multiplication, addition and memory accesses.
- Two varieties: IIR and FIR.
Optimization - Adaptive filtering
- The filter order decides the sharpness of the filter cutoff. Higher-order filters generally have better quality, but need more mathematical operations to implement.
- The filter order is fixed based on the input SNR and the desired output SNR.
Optimization - Adaptive filtering
- When the input SNR is good, the filter order may be reduced, to reduce power without sacrificing quality.
- The output noise level is monitored; when it is above a given threshold, the order of the filter is increased.
- The input noise energy is estimated as the difference between the input energy and the output energy. The output noise estimate is this input noise energy multiplied by the stop-band energy response of the filter, which is stored in a lookup table.
- The power benefits of this method depend on the noise characteristics of the input. If the input is not noisy most of the time, significant power savings are possible.
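A sketch of the order-selection rule, following the noise estimate described above. The lookup table of stop-band energy responses, the candidate orders and the noise threshold are hypothetical. For an FIR filter the work per output sample grows roughly linearly with the order, so choosing the smallest adequate order saves multiplications.

```python
# Hypothetical lookup table: filter order -> stop-band energy response
# (fraction of out-of-band noise energy that leaks through the filter).
STOPBAND_LUT = {4: 0.10, 8: 0.03, 16: 0.01, 32: 0.003}


def estimate_output_noise(input_energy: float, output_energy: float, order: int) -> float:
    """Input noise energy ~ input energy - output energy; scale it by the
    stop-band response of the candidate order."""
    noise_in = input_energy - output_energy
    return noise_in * STOPBAND_LUT[order]


def choose_order(input_energy: float, output_energy: float, noise_threshold: float) -> int:
    """Smallest order whose estimated output noise does not exceed the threshold."""
    for order in sorted(STOPBAND_LUT):
        if estimate_output_noise(input_energy, output_energy, order) <= noise_threshold:
            return order
    return max(STOPBAND_LUT)  # noisiest case: fall back to the highest order


print(choose_order(10.0, 9.9, 0.02))  # clean input -> low order
print(choose_order(10.0, 5.0, 0.02))  # noisy input -> high order
```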
Optimization - Switching activity reduction
- Guarded evaluation
  - When the outputs of a functional unit are don't-care, the inputs to that block are not allowed to toggle. This can save significant amounts of power.
- Glitch elimination using pipelining
  - Glitches occur when signals incurring different delays arrive at the inputs of a gate.
  - Glitch probability is higher for circuits with large logic depths, because the arrival times at the gate inputs can spread more widely in such cases.
  - Pipelining limits the logic depth and reduces the probability of glitching.
  - Attractive for datapath circuits, because they have large logic depths and several inputs of a datapath element can switch simultaneously.
- Flow graph transforms
  - Strength reduction
  - Critical path reduction: voltage scaling can then be applied to reduce power.
  - Redundancy elimination
  - Loop unrolling, loop pipelining
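A sketch of guarded evaluation, assuming a multiplier whose result feeds a mux and is only sometimes selected; when it is not selected, its input latches hold their previous values instead of loading new operands. The operand values and select pattern are hypothetical, and the count is bit toggles at the multiplier inputs.

```python
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def multiplier_input_toggles(operands, used, guarded):
    """Count bit toggles on the two multiplier input registers.

    operands: (a, b) pairs presented each cycle; used: whether the multiplier's
    output is selected that cycle. With guarding, unused cycles hold the old inputs."""
    held = (0, 0)
    toggles = 0
    for (a, b), is_used in zip(operands, used):
        new = (a, b) if (is_used or not guarded) else held
        toggles += hamming(held[0], new[0]) + hamming(held[1], new[1])
        held = new
    return toggles


ops = [(1, 2), (3, 4), (5, 6), (7, 8)]
sel = [True, False, False, True]
print(multiplier_input_toggles(ops, sel, guarded=False),
      multiplier_input_toggles(ops, sel, guarded=True))  # 12 vs 6
```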
Optimization - Parallelism and pipelining
- Parallelism: if there is inherent parallelism in the application, the throughput of the system can be doubled (roughly, but beware of Amdahl's law) by doubling the number of functional units. Alternatively, the same throughput can be maintained with twice the number of functional units, each operating at half the frequency.
Optimization - Parallelism and pipelining
- Now each of these functional units can be operated at a lower voltage. This usually works because power has a square-law relationship with voltage, while throughput has a roughly linear relationship with frequency and multiplicity.
- Pipelining: halves the propagation time of a block, so the voltage can be reduced without changing the operating frequency. The throughput does not change.
- Problems: Amdahl's law; the point of diminishing returns. Needs high-level estimation capability.
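A sketch of the parallel-architecture voltage-scaling argument. The assumptions are all hypothetical: a simple long-channel delay model (delay ∝ V/(V - Vt)²) with Vt = 0.7 V, an original 5 V supply, and a 15% capacitance overhead for the extra muxes and routing. Bisection finds the supply at which each unit may run twice as slowly, and the power ratio follows from P ∝ C V² f.

```python
def gate_delay(v: float, vt: float = 0.7) -> float:
    """Simple long-channel delay model: delay ~ V / (V - Vt)^2 (arbitrary units)."""
    return v / (v - vt) ** 2


def voltage_for_slowdown(v_ref: float, slowdown: float, vt: float = 0.7, tol: float = 1e-9) -> float:
    """Bisection for the supply V at which the delay is `slowdown` times the delay at v_ref.
    The model's delay decreases monotonically with V for V > Vt."""
    target = slowdown * gate_delay(v_ref, vt)
    lo, hi = vt + 1e-6, v_ref
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gate_delay(mid, vt) > target:
            lo = mid  # still too slow: the answer is a higher voltage
        else:
            hi = mid
    return (lo + hi) / 2


V_REF = 5.0          # hypothetical original supply (V)
CAP_OVERHEAD = 0.15  # hypothetical extra capacitance for muxes and routing

v_par = voltage_for_slowdown(V_REF, 2.0)  # each unit may now take twice as long
# Two units (2C plus overhead) at f/2 and v_par, relative to one unit at f and V_REF:
power_ratio = 2 * (1 + CAP_OVERHEAD) * (v_par / V_REF) ** 2 * 0.5
print(f"V = {v_par:.2f} V, P_parallel / P_ref = {power_ratio:.2f}")
```

The same calculation applies to a two-stage pipeline, except that f stays the same and the capacitance overhead comes from the pipeline registers rather than a duplicated unit.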
Architecture of the DLX
- A load-store architecture. Most operations work off the register file (32 x 32-bit). Two memory addressing modes. Memory is byte addressable.
- Has instructions similar to most contemporary RISCs.
- 32-bit datapath. Only integer instructions are implemented.
Related Work
- Called the instruction-level power model.
  - A power cost is associated with every instruction.
  - The characterization is top-down. Every instruction is executed in a loop, with the loop overhead kept minimal, and the average current is measured with an ammeter.
- The method was applied to three processors representing three broad architectural styles:
  - Intel 80486: CISC.
  - Fujitsu SPARClite: RISC for embedded applications.
  - Motorola DSP.
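A minimal sketch of how such a model could be applied once characterized. The base energy of each instruction is computed from its measured average current, the supply voltage, its cycle count and the clock period. The per-instruction currents, the 3.3 V / 25 ns operating point, and the fixed extra energy when consecutive instructions differ (a simple stand-in for inter-instruction circuit-state effects) are all hypothetical.

```python
VDD, T_CLK = 3.3, 25e-9  # hypothetical supply (V) and clock period (s)
# Hypothetical characterization: (average current in A measured in a loop, cycles)
BASE = {"add": (0.30, 1), "ld": (0.40, 2), "st": (0.38, 2)}
SWITCH_OVERHEAD_J = 1e-9  # hypothetical extra energy when consecutive instructions differ


def base_energy(instr: str) -> float:
    current, cycles = BASE[instr]
    return VDD * current * cycles * T_CLK


def program_energy(trace: list[str]) -> float:
    energy = sum(base_energy(i) for i in trace)
    energy += SWITCH_OVERHEAD_J * sum(1 for a, b in zip(trace, trace[1:]) if a != b)
    return energy


trace = ["ld", "add", "add", "st"]
energy = program_energy(trace)
cycles = sum(BASE[i][1] for i in trace)
print(f"{energy:.4e} J over {cycles} cycles, average {energy / (cycles * T_CLK):.3f} W")
```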
Flow
Conclusions
- Power is becoming an important aspect of VLSI design, due to the high integration levels and the high performance of present-day systems.
- Power estimation is needed at every step in the design process to ensure that the design does not violate the power specification.
- Power minimization is still done in an ad hoc manner. Except for clock gating and gate reorganization, very few tools are available for real designs.
- Unless aggressive power minimization is done at all levels of the design process, power could limit the level of integration or performance in the future.