POWER OPTIMISATION IN VLSI CIRCUITS Charles Augustine

Slides: 67

Provided by: srikan9

1
POWER OPTIMISATION IN VLSI CIRCUITS
Charles Augustine
2
ORGANIZATION OF THE TALK

Basics, definitions. Role of low power in present
day electronics.
Survey of estimation techniques.
Survey of power minimization techniques.
Conclusions

3
Basics

In the context of VLSI, power refers to the rate
at which electrical energy is converted into heat
(dissipation), or the rate at which energy is
drained from a source (consumption).
Power is an important dimension in present day VLSI
digital systems.
Low power techniques were confined to analog
designs in the past.
Some digital circuits like pacemakers and watches
also needed low power techniques in the past.
The systems were operated at very low
frequencies.
In the past, only area and performance were key
concerns of digital VLSI designs.

4
Basics

Why is power an important consideration in present
day digital designs?
Higher integration → higher performance → higher
power dissipated per unit area of the die.
Higher integration → market demand for portable
consumer electronic devices powered by batteries.
Higher energy densities → can become explosive.

5
Basics

NiCd batteries: energy density 20 watt-hour/pound.
NiMH batteries: energy density 35 watt-hour/pound.
In high performance systems, packaging and cooling costs
increase with increased power dissipation.
Problems with supplying large currents through
device pins.

6
Basics

Why is power an important consideration in
digital designs?
Reliability issues: every 10°C increase in temperature
roughly doubles the failure rate.
VLSI low power problems
Analysis (estimation): finding sources of power
dissipation.

7
Basics

Power estimation at every stage in the design
process, to ensure that power specifications of
the design are not violated.
Accuracy and Efficiency of estimation.
Estimation is the first step toward implementing
optimization for low power.
Optimization
Process of generating the best design that meets
specifications.
Depends on the result of the analysis. Needed to
evaluate merits of design choices. Involves
tradeoff amongst conflicting objectives.
$P_{avg} = C V^2 f$
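
As a rough illustration of the relation above, the sketch below evaluates the average switching power for two supply voltages. The function name and the capacitance, voltage and frequency values are hypothetical and chosen only to show the quadratic dependence on voltage.

```python
def average_dynamic_power(c_farads, vdd_volts, freq_hz):
    """Average switching power P_avg = C * V^2 * f, in watts."""
    return c_farads * vdd_volts ** 2 * freq_hz


# Hypothetical numbers: 1 nF of switched capacitance, 500 MHz clock.
p_5v = average_dynamic_power(1e-9, 5.0, 500e6)
p_3v3 = average_dynamic_power(1e-9, 3.3, 500e6)
print(f"5.0 V: {p_5v:.2f} W, 3.3 V: {p_3v3:.2f} W")
```

Lowering the supply from 5.0 V to 3.3 V cuts the power by the square of the voltage ratio, which is why voltage reduction features so heavily in the later slides.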

8
Technology trends - SIA roadmap
Moore's Law: 2x every 18 months
9
Technology trends - Power
A 10 mm² chip clocked at 500 MHz would
consume 315 W!
10
Basics

Sources of power dissipation - depends on the
logic family.
CMOS is the logic family preferred in many
designs
Easy to get a functional circuit.
Almost zero static power dissipation. Low power
by default.
In general, capacitive power >> short circuit
power >> leakage power.

11
In CMOS, power dissipation has the following
causes:
Static power dissipation, when no signals are
changing:
Leakage: reverse bias saturation current,
sub-threshold leakage.
Deviations from a strict CMOS design.
Dynamic power dissipation, when signals are
changing:
Short circuit power dissipation, when both
devices are on.
Capacitive power dissipation, due to output
loading.

12
Low power metrics

Absolute power number
Quoted in Watts. Useful in applications such as
package selection, power supply distribution n/w
design.
uW/MHz
The average energy consumed by the system per clock cycle.

13
Low power metrics

uW/MIPS or uW/SPEC.
The average energy per instruction. Useful for
comparing processors of the same family.
uW/MIPS²
Normalizes the average energy per instruction,
with the performance of that particular
architecture. Useful for comparing processors of
different families.
uW/(area MHz)
Energy dissipated by unit area of the die. Useful
as a reliability measure. Also can be used to
compare two chips with the same function, but
different implementations.

14
Types of estimation

Depending on the application.
Instantaneous power estimation.
Peak power estimation.
Important for estimating IR drops.
Average power estimation.
Important for conserving battery life, packaging
and cooling considerations.
System engineer: What battery to use?
Chip architect: Will my architecture be feasible?
Layout engineer: How wide should the power
supply lines be?
Product engineer: Which package to use?

15
Depending on methodology
Simulation based
Vectors are given, circuit is known, simulation
is performed. The instantaneous currents are
averaged and power dissipation is then
calculated.
Probabilistic analysis based
Averaging of the inputs is performed first,
probabilistic measures are extracted. Power
dissipation is then calculated.

16
Types of estimation

Monte Carlo methods
These methods focus on performing the right
amount of simulation. When a stopping criterion
is met, simulation is stopped.
Too few measurements - Inaccurate estimates.
Too many measurements - Waste of computing
resources.
The mean of n samples obtained during
characterization is compared against the
statistically calculated mean. When they are
sufficiently close to one another, simulation is
stopped.
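
A minimal sketch of such a stopping rule is given below. It is an assumption, not the method from the talk: the simulator is replaced by a hypothetical `simulate_power_sample` function that draws from a Gaussian, and simulation stops once the 95% confidence half-width of the running mean falls below 5% of that mean (normal approximation).

```python
import random
import statistics


def simulate_power_sample(rng):
    """Stand-in for one simulation run over a random input window.

    Hypothetical: returns power in mW drawn from a Gaussian around 10 mW.
    """
    return rng.gauss(10.0, 2.0)


def monte_carlo_power(rng, rel_error=0.05, z=1.96,
                      min_samples=30, max_samples=100_000):
    """Simulate until the confidence interval is tight enough, then stop."""
    samples = []
    while len(samples) < max_samples:
        samples.append(simulate_power_sample(rng))
        n = len(samples)
        if n < min_samples:
            continue  # too few measurements for a meaningful estimate
        mean = statistics.fmean(samples)
        half_width = z * statistics.stdev(samples) / n ** 0.5
        if half_width <= rel_error * mean:
            break     # stopping criterion met
    return statistics.fmean(samples), len(samples)


estimate, runs = monte_carlo_power(random.Random(1))
print(f"estimated power {estimate:.2f} mW after {runs} runs")
```

Tightening `rel_error` raises the number of runs roughly with its inverse square, which is the accuracy versus computing cost tradeoff described above.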

17
Types of estimation

Why so many methods ?
Accuracy
When implementation details are not available,
relative accuracy should be stressed.
Efficiency
What is the size of the circuit that can be
handled by any method?
Available information
For high level estimation, it is difficult to
estimate the activity or the capacitance in the
absence of any implementation data - high errors
must be tolerated.
In some cases, only some characteristics of the
input are known.

18
Circuit level estimation

Model is developed as shown below.

Model must be re-evaluated every time the value
of the current changes as the value of the
partial derivative depends on it.

19
Gate level estimation - Simulation

Basic blocks: logic gates
AND, OR, NOT, NOR, XOR, NAND etc.
Signals have only a finite number of values:
0, 1, H, L, Z, U, X

20
Gate level estimation - Simulation

An event driven simulator is used for functional
and timing simulation.
The simulator counts the number of specified
events required for power estimation.
The average power is estimated from the equations
shown in the next slide.
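
The slide's own equations are not reproduced in this transcript. As a hedged sketch, the common textbook form charges each logic transition 0.5 * C * Vdd^2 of energy and divides the total by the simulated time. The node names, capacitances and toggle counts below are hypothetical.

```python
def average_power_from_toggles(toggle_counts, node_caps, vdd, sim_time):
    """Average power from event counts, assuming 0.5 * C * Vdd^2 per transition."""
    energy = sum(0.5 * node_caps[node] * vdd ** 2 * count
                 for node, count in toggle_counts.items())
    return energy / sim_time


# Hypothetical event counts from a 1 microsecond simulation run.
toggles = {"n1": 200, "n2": 50, "clk": 1000}
caps = {"n1": 20e-15, "n2": 50e-15, "clk": 100e-15}   # farads
power = average_power_from_toggles(toggles, caps, vdd=3.3, sim_time=1e-6)
print(f"average power: {power * 1e3:.3f} mW")
```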

21
Gate level estimation - Simulation
22
Gate level estimation - Probabilistic

Probabilistic analysis
Estimates power based on certain probabilistic
measures of the input data.
Only dynamic power dissipation can be estimated
accurately.
Input signals are assumed to be random, taking on
values 0 or 1.
Definitions of Probabilistic measures used are
given below

23
Gate level estimation - Probabilistic

Transition probability: average fraction of the
clock cycles in which the steady state value of
the signal is different from its initial value.
Transition density: average number of transitions
in unit time.
Static probability: average fraction of clock
cycles in which the steady state value of the
signal is logic high.
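
The three measures can be computed from a recorded trace of settled values, one per clock cycle. The sketch below assumes a zero-delay trace (so glitches are not counted) and uses a hypothetical 9-cycle trace.

```python
def static_probability(values):
    """Fraction of clock cycles in which the signal settles to logic high."""
    return sum(values) / len(values)


def transition_probability(values):
    """Fraction of cycles whose settled value differs from the previous cycle's."""
    changes = sum(a != b for a, b in zip(values, values[1:]))
    return changes / (len(values) - 1)


def transition_density(values, clock_period):
    """Average number of transitions per unit time."""
    changes = sum(a != b for a, b in zip(values, values[1:]))
    return changes / ((len(values) - 1) * clock_period)


trace = [0, 1, 1, 0, 1, 0, 0, 1, 1]   # hypothetical settled values, one per cycle
print(static_probability(trace))
print(transition_probability(trace))
print(transition_density(trace, clock_period=10e-9))
```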

24
Gate level estimation - Probabilistic
25
Gate level estimation - Probabilistic

Sources of error
All measures are unaffected by circuit delays.
Glitches are ignored.
Spatial correlation: different inputs to a
circuit may be related to one another. This can
happen because of circuit topology or input
characteristics.
Temporal correlation: input values to a circuit
may be related to previous values. This can
happen because of memory or input
characteristics.
In the presence of correlation, all probabilistic
measures will become conditional probabilities.
This is much harder to analyze.
Correlation causes estimates to be wrong. These
methods work well when the inputs are random.
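
A small worked case of the spatial correlation error, assuming equiprobable primary inputs: the gate y = a AND (NOT a) has both inputs derived from the same signal, so treating them as independent gives a wrong static probability. The helper names are illustrative only.

```python
from itertools import product


def and_probability_independent(p_a, p_b):
    """Static probability of an AND output if its inputs were independent."""
    return p_a * p_b


def exact_probability(func, n_inputs):
    """Exact static probability for equiprobable primary inputs, by enumeration."""
    patterns = list(product([0, 1], repeat=n_inputs))
    return sum(func(*bits) for bits in patterns) / len(patterns)


# y = a AND (NOT a): both gate inputs come from the same primary input a.
p_a = 0.5
estimated = and_probability_independent(p_a, 1 - p_a)
actual = exact_probability(lambda a: a & (1 - a), 1)
print(estimated, actual)
```

The independence assumption predicts that the output is high a quarter of the time, while the exact enumeration shows it is never high.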

26
Gate level estimation - Probabilistic
27
Gate level estimation - Probabilistic
28
Gate level estimation - Probabilistic
29
Architecture level estimation

Challenges
Very little information is available about the
implementation at this stage.
Difficult to estimate input patterns.
Difficult to characterize
An n-input block needs O(4^n) input patterns for
exhaustive characterization (every pair of
consecutive n-bit input vectors). Not practical even
for small values of n.

30
Architecture level estimation, an application
31
Architecture level estimation, an application
32
Architecture level estimation

There is no general method in use today.
How to estimate capacitance and how to estimate
activity for a circuit that has not yet been
implemented ?
Some methods use scalable capacitance models.
For example, the total capacitance of a ripple
carry adder is proportional to the number of full
adder cells in it.
UWN characterization
Components are characterized with random data.
This works in situations where there is no
particular relationship between data.

33
Architecture level estimation

UWN characterization fails miserably for highly
correlated data, such as in DSP systems.
Useful for comparing two different
implementations.
Entropy based modeling.
Dual bit type method
Works well for DSP systems.
Based on the fact that there is a relationship
between bit level and word level characteristics.
The data in such systems is obtained from
sampling analog signals at a rate higher than the
highest signal frequency.

34
Architecture level estimation

Difference between successive samples is small:
LSBs change faster than the MSBs in most applications.
UWN characterization gives very poor results for
such inputs.
In some applications, the MSBs change faster than
the LSBs; these changes correspond to sign bit
changes.
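
The difference in activity between low-order and sign bits can be checked on a synthetic signal. The sketch below assumes a heavily oversampled sine wave stored as 12-bit two's-complement words. The signal, word width and helper name are hypothetical.

```python
import math


def bit_toggle_rates(samples, width):
    """Per-bit toggle rate (index 0 = LSB) for two's-complement samples."""
    mask = (1 << width) - 1
    words = [s & mask for s in samples]
    rates = []
    for bit in range(width):
        toggles = sum(((a ^ b) >> bit) & 1 for a, b in zip(words, words[1:]))
        rates.append(toggles / (len(words) - 1))
    return rates


# Hypothetical input: a slow sine wave, heavily oversampled, 12-bit words.
samples = [round(500 * math.sin(2 * math.pi * n / 400)) for n in range(4000)]
rates = bit_toggle_rates(samples, width=12)
for bit, rate in enumerate(rates):
    print(f"bit {bit:2d}: {rate:.3f}")
```

For this input the low-order bits should toggle far more often than the top three bits, which all carry the sign and move together. Uniform random data would instead give every bit roughly the same activity.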

35
Architecture level estimation

Encountered in Delta modulation schemes.
Inputs are divided into three regions based on
the nature of activity: sign bit region, UWN
region and transition region. Sign bit regions
need many more capacitances.
Based on linear regression techniques.
The power is expressed as a linear equation of
the input and output signal transitions. The
constant coefficients are then determined during
characterization.
Based on high level operation.
Some high level operations are easily
identifiable in every circuit.
Example: Memory - Read/Write; ALU - ADD/MUL/SUB
Might be state dependent. For example, some
writes can cost more than others.
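
The linear regression approach mentioned above can be sketched as a least-squares fit of P = c0 + c1 * t_in + c2 * t_out, where t_in and t_out are input and output transition counts. This is a generic illustration with made-up characterization data, solved through the normal equations in plain Python.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_power_model(observations):
    """Least-squares fit of P = c0 + c1 * t_in + c2 * t_out.

    observations: (input transitions, output transitions, measured power).
    """
    rows = [(1.0, t_in, t_out) for t_in, t_out, _ in observations]
    powers = [p for _, _, p in observations]
    # Normal equations: (X^T X) c = X^T p
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtp = [sum(r[i] * p for r, p in zip(rows, powers)) for i in range(3)]
    return solve_linear_system(xtx, xtp)


# Hypothetical characterization data: (t_in, t_out, power in mW).
data = [(2, 1, 1.7), (4, 1, 2.7), (4, 3, 3.3), (8, 2, 5.0), (6, 5, 4.9)]
c0, c1, c2 = fit_power_model(data)
print(f"P = {c0:.2f} + {c1:.2f} * t_in + {c2:.2f} * t_out")
```

Once the coefficients are fixed during characterization, estimating power for a new input stream only requires counting transitions.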

39
Dual bit type method
40
Optimization techniques

Methods adopted
Reduce voltage - example reduced voltage swing
bus.
Reduce activity - example clock gating, balance
delays.
Reduce wastage - example use drivers of the
proper strength.
Manage well - example dynamic frequency and
voltage reduction.
Possible to optimize at several levels, with the
high level optimizations yielding the best
results.
Process level - Vt reduction to enable voltage
reduction and sub-threshold leakage voltage
reduction.
Circuit level - voltage reduction, pin reordering,
network reorganization, transistor sizing, self
gating flip flops, dual edge triggering.
Gate level - pin reordering, clock gating,
restructuring.
RTL level - clock gating, retiming.

41
Optimization techniques

Architectural level
Asynchronous vs. Synchronous architectures.
Bus encoding.
Resources vs. Power tradeoffs.
Data Formats.
Algorithm level
Precision vs. Power tradeoff. Reduced precision
affects the behaviour of the algorithm.
Cache blocking.
System level
Partitioning, power down, dynamic power
management.

42
Optimizations - circuit level

Optimization problems can be formulated more
precisely at this level.
Improvements are incremental, but can add up to
major gains in cell based designs
such as memories.
An important technique is voltage reduction.
Impacts fabrication process and is sometimes
driven by system considerations
Important techniques
Transistor sizing for dynamic power reduction
Transistor sizing for leakage power reduction
Network restructuring and reorganization
Transistor network repartitioning and
reorganization
Special latches and flip flops
Differential input latch, Dual edge triggered
flip flops, Self gating flip flops, combinational
flip flops.
Low power libraries. P equivalence, N
equivalence, NP equivalence
Reduced voltage swing, Adiabatic computation,
Pass transistor logic.

43
Optimization - circuit level, buffer sizing

The objective is to use a buffer chain to drive a
large load capacitance from a source which has
low drive strength, without incurring large
delays, and spending large amounts of power.

44
Optimization - circuit level, buffer sizing

If the load is driven directly, the delay would
be large.
A large buffer can drive the load, but has large
input capacitance itself.
The trick is therefore to use a chain of
successively larger buffers, as shown above. Each
buffer drives an appropriate load.
Too many buffers - Large propagation delay of the
chain.
Too few buffers - Output slope will not be
steep, which results in large delay.

45
Optimization - circuit level, buffer sizing

But switching current of every stage is k times
higher than the preceding stage, because of the
higher drive strength.
Total power of successive stages is k times
higher.
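
The delay versus power tradeoff can be illustrated with a first-order model, assuming every stage is k times larger than the previous one and that stage delay is proportional to its fan-out k. The capacitance values are hypothetical.

```python
def buffer_chain(c_in, c_load, n_stages):
    """Taper factor and capacitance driven by each stage of an n-stage chain.

    Assumes each stage is k times larger than the one before it.
    """
    k = (c_load / c_in) ** (1.0 / n_stages)
    # Stage i has input capacitance c_in * k**i and drives c_in * k**(i + 1).
    driven = [c_in * k ** (i + 1) for i in range(n_stages)]
    return k, driven


c_in, c_load = 10e-15, 640e-15   # hypothetical: 10 fF at the source, 640 fF load
for n in (1, 2, 3, 6):
    k, driven = buffer_chain(c_in, c_load, n)
    # First-order delay model: each stage delay is proportional to its fan-out k.
    print(f"N={n}: k={k:.2f}, relative delay={n * k:.1f}, "
          f"switched capacitance={sum(driven) * 1e15:.0f} fF")
```

Under this model, adding stages shortens the delay at first, but the total switched capacitance (and hence power) keeps growing, so a chain sized only for speed spends more power than necessary.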

46
Optimization - circuit level, buffer sizing
47
Optimizations - gate level

Some techniques
Logic structuring: function preserving transforms
on the logic structure that give different area,
power and delay characteristics.
Signal gating: reducing unwanted transitions.
Logic encoding: binary vs. Gray coding.
Bus invert coding.
State machine encoding.
Output don't care encoding.
Pre-computation logic.
Floating node elimination.
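
As an illustration of bus invert coding from the list above, the sketch below sends the complement of a word, and raises an extra invert line, whenever more than half of the bus lines would otherwise toggle. The data sequence and function names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Return (bus_value, invert_line) pairs for a sequence of data words."""
    mask = (1 << width) - 1
    prev_bus = 0
    encoded = []
    for word in words:
        # Invert the word when more than half of the bus lines would toggle.
        if hamming(prev_bus, word) > width // 2:
            bus, invert = ~word & mask, 1
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
        prev_bus = bus
    return encoded


def count_transitions(encoded):
    """Toggles on the data lines plus the invert line, starting from all zeros."""
    total, prev_bus, prev_inv = 0, 0, 0
    for bus, inv in encoded:
        total += hamming(prev_bus, bus) + (inv ^ prev_inv)
        prev_bus, prev_inv = bus, inv
    return total


data = [0x00, 0xFF, 0x01, 0xFE, 0x0F]   # hypothetical 8-bit bus traffic
plain = count_transitions([(w, 0) for w in data])
coded = count_transitions(bus_invert_encode(data, width=8))
print(plain, coded)
```

The receiver recovers each word by complementing the bus value whenever the invert line is set, so the saving costs one extra wire.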

48
Optimizations - Architectural level

Have the highest impact on power.
Main themes
Design decisions include selection and
organization of functional modules. Some of these
decisions can impact the power to a great extent.
Power management: a class of techniques that
carefully manage the performance and throughput
of a system based on its computation needs to
achieve low power goals.

49
Optimizations - Architectural level

Some popular techniques
Microprocessor sleep modes.
Performance management.
Adaptive filtering based on SNR requirements.
Switching activity reduction through guarded
evaluation.
Resource sharing.
Pipelining and parallelism.
Flow graph transformations.

51
Optimization - Performance management

The asynchronous data processor reads the input
FIFO and loads the output FIFO immediately after
the computation is over.
If the throughput is low, more data will queue up
at the input FIFO.

52
Optimization - Performance management

The number of entries in the FIFO is used to
determine the load on the system. If the system
is loaded heavily, the voltage is increased to
increase the performance of the system.
Another idea: parts of the system that are not
performance critical are operated at lower
voltages. However, having multiple power supplies
on the same chip can complicate things.
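
The FIFO-based control loop can be sketched as a simple mapping from queue occupancy to a supply voltage. The voltage levels, FIFO depth and function name below are assumptions made for illustration. A real controller would also need hysteresis and settling time for the regulator.

```python
def select_supply_voltage(fifo_entries, fifo_depth, levels=(1.2, 1.8, 2.5, 3.3)):
    """Pick a supply voltage from FIFO occupancy: a fuller FIFO means heavier load."""
    occupancy = fifo_entries / fifo_depth
    index = min(int(occupancy * len(levels)), len(levels) - 1)
    return levels[index]


for entries in (0, 5, 9, 16):
    print(entries, select_supply_voltage(entries, fifo_depth=16))
```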

53
Optimization - Adaptive filtering

Filtering operation - Mathematical operations on
sample values - mainly multiplication and
addition and memory.
Two varieties: IIR and FIR

54
Optimization - Adaptive filtering

Filter order decides the sharpness of the filter
cutoff. Higher order filters generally have
better quality. But need more mathematical
operations to implement.
Filter order is fixed based on the input SNR and
the desired output SNR

55
Optimization - Adaptive filtering

When the input SNR is good, the filter order may
be reduced, to reduce power without sacrificing
quality.
The output noise level is monitored; when it is
above a given threshold, the order of the filter
is increased.
The estimate of the output noise is the difference
between the input energy and the output energy
(which equals the input noise energy),
multiplied by the stop band energy response of
the filter (stored in a lookup table).

The power benefits of this method depend on the
noise characteristics of the input. If the input
is not noisy most of the time, significant power
savings are possible.
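
A simplified sketch of the order selection follows. The lookup table values and the threshold are hypothetical, and the input and output energies are passed in as fixed numbers, whereas in a running system they would be measured continuously.

```python
# Hypothetical lookup table: filter order -> stop band energy response.
STOPBAND_RESPONSE = {4: 0.20, 8: 0.05, 16: 0.01, 32: 0.002}


def estimate_output_noise(input_energy, output_energy, order):
    """(Input energy - output energy) approximates the input noise energy."""
    return (input_energy - output_energy) * STOPBAND_RESPONSE[order]


def choose_filter_order(input_energy, output_energy, noise_threshold):
    """Return the lowest order whose estimated output noise meets the threshold."""
    for order in sorted(STOPBAND_RESPONSE):
        if estimate_output_noise(input_energy, output_energy, order) <= noise_threshold:
            return order
    return max(STOPBAND_RESPONSE)


print(choose_filter_order(1.00, 0.98, noise_threshold=0.005))   # clean input
print(choose_filter_order(1.00, 0.60, noise_threshold=0.005))   # noisy input
```

A clean input lets the filter run at a low order with few multiply and add operations, while a noisy input pushes the order up.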

56
Optimization - Switching activity reduction

Guarded evaluation.
When the outputs of a functional unit are
don't-care, the inputs to that block are not
allowed to toggle; this can save significant
amounts of power.
Glitch elimination using pipelining
Glitches occur when signals incurring different
delays appear at the input of a gate.
Glitch probability is higher for circuits with
large logic depths. The delays at the gate inputs
can spread more widely in such cases.
Pipelining limits the logic depth, and reduces
the probability of glitching.
Attractive for datapath circuits because they
have large logic depths and several inputs of a
datapath element can switch simultaneously.
Flow graph transforms
Strength reduction
Critical path reduction - Voltage scaling can be
applied to reduce power
Redundancy elimination
Loop unrolling, loop pipelining

58
Optimization - Switching activity reduction
59
Optimization - Parallelism and pipelining

Parallelism: if there is inherent parallelism
in the application, the throughput of the system
can be doubled (roughly, but beware of Amdahl's
law) by doubling the number of functional units.
Alternately, the same throughput can be
maintained, with twice the number of functional
units, each operating at half the frequency.

60
Optimization - Parallelism and pipelining

Now each of these functional units can be
operated at a lower voltage. This usually works,
because power has a square law relationship with
voltage, and throughput has a roughly linear
relationship with frequency and multiplicity.
Pipelining: halves the propagation time of a
block, so the voltage can be reduced
without changing the operating frequency. The
throughput does not change.
Problems: Amdahl's law. Point of diminishing
returns. Needs high level estimation capability.
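
The arithmetic behind the parallelism argument can be sketched with the C V^2 f relation. The voltages and the capacitance overhead factor below are hypothetical. How far the voltage can actually drop at half the clock rate depends on the process and is simply assumed here.

```python
def dynamic_power(c, v, f):
    """Switching power C * V^2 * f."""
    return c * v ** 2 * f


def parallel_power_ratio(v_ref, v_low, units=2, cap_overhead=1.0):
    """Power of `units` blocks at f/units and v_low, relative to one block at v_ref.

    cap_overhead scales total capacitance for extra routing and multiplexing.
    """
    c, f = 1.0, 1.0   # normalized reference values
    reference = dynamic_power(c, v_ref, f)
    parallel = dynamic_power(units * c * cap_overhead, v_low, f / units)
    return parallel / reference


# Hypothetical voltages: halving the clock is assumed to allow 5.0 V -> 2.9 V.
print(f"{parallel_power_ratio(5.0, 2.9):.3f}")
print(f"{parallel_power_ratio(5.0, 2.9, cap_overhead=1.15):.3f}")
```

With no overhead, the doubled capacitance and the halved frequency cancel, so the saving comes entirely from the squared voltage ratio. Any routing or multiplexing overhead eats into it.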

61
Architecture of the DLX
62
Architecture of the DLX

A load-store architecture. Most operations work
off the register file (32x32bit). Two memory
addressing modes. Memory is byte addressable.
Has instructions similar to most contemporary
RISCs.
32 bit datapath. Only integer instructions
implemented.

63
Related Work

Called the instruction level power model.
A power cost is associated with every
instruction.
The characterization is top down. Every
instruction is executed in a loop. The loop
overhead is kept minimal. The average current is
measured with an ammeter.
The method was tried on three processors
representing three broad architectural styles:
Intel 80486 - CISC.
Fujitsu SPARClite - RISC for embedded
applications.
Motorola DSP
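
A minimal sketch of how such an instruction level model could be applied: each instruction's measured average current is turned into an energy cost, and the costs are summed over a program trace. The currents, cycle counts, supply voltage and clock frequency are made up for illustration and are not measurements from the talk. Effects between neighbouring instructions are ignored.

```python
def program_energy(trace, base_current_ma, cycles, vdd, freq_hz):
    """Sum per-instruction energies E = I * Vdd * cycles / f, in joules."""
    total = 0.0
    for opcode in trace:
        current_a = base_current_ma[opcode] * 1e-3
        total += current_a * vdd * cycles[opcode] / freq_hz
    return total


# Hypothetical characterization results, not measurements from the talk.
BASE_CURRENT_MA = {"add": 300.0, "load": 420.0, "mul": 350.0}
CYCLES = {"add": 1, "load": 1, "mul": 4}

trace = ["load", "load", "mul", "add"]
energy = program_energy(trace, BASE_CURRENT_MA, CYCLES, vdd=3.3, freq_hz=40e6)
print(f"{energy * 1e9:.2f} nJ")
```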

64
Flow
65
Conclusions

Power is becoming an important aspect of VLSI
design, due to the high integration levels and
the high performance of present day systems.
Power estimation is needed at every step in the
design process to ensure that the design does not
violate the power specification.
Power minimization is still done in an ad-hoc
manner. Except for clock gating and gate
reorganization, very few tools are available for
real designs.
Unless aggressive power minimization is done at
all levels in the design process, power could
limit the level of integration or performance in
the future.