POWER OPTIMISATION IN VLSI CIRCUITS - Charles Augustine

Transcript and Presenter's Notes

1
POWER OPTIMISATION IN VLSI CIRCUITS
Charles Augustine
2
ORGANIZATION OF THE TALK
  • Basics, definitions. Role of low power in present
    day electronics.
  • Survey of estimation techniques.
  • Survey of power minimization techniques.
  • Conclusions

3
Basics
  • In the context of VLSI, power refers to the rate
    at which electrical energy is converted into heat
    (dissipation), or the rate at which energy is
    drained from a source (consumption).
  • Power is an important dimension in present day
    VLSI digital systems.
  • In the past, low power techniques were confined
    to analog designs.
  • Some digital circuits, such as pacemakers and
    watches, also needed low power techniques; these
    systems were operated at very low frequencies.
  • In the past, only area and performance were the
    key concerns of digital VLSI design.

4
Basics
  • Why is power an important consideration in
    present day digital designs?
  • Higher integration and higher performance lead to
    higher power dissipated per unit area of the die.
  • Higher integration also drives market demand for
    portable consumer electronic devices powered by
    batteries, and batteries with higher energy
    densities can become explosive.

5
Basics
  • NiCd batteries: energy density around 20
    watt-hours per pound.
  • NiMH batteries: energy density around 35
    watt-hours per pound.
  • In high performance systems, packaging and
    cooling costs increase with increased power
    dissipation, and supplying large currents through
    device pins becomes a problem.

6
Basics
  • Why is power an important consideration in
    digital designs?
  • Reliability issues: every 10ºC rise in operating
    temperature roughly doubles the failure rate.
  • VLSI low power problems:
  • Analysis (estimation): finding the sources of
    power dissipation.

7
Basics
  • Power estimation is needed at every stage in the
    design process, to ensure that the power
    specifications of the design are not violated.
  • Accuracy and efficiency of estimation both
    matter.
  • Estimation is the first step toward implementing
    optimization for low power.
  • Optimization:
  • The process of generating the best design that
    meets specifications.
  • Depends on the result of the analysis. Needed to
    evaluate the merits of design choices. Involves
    tradeoffs amongst conflicting objectives.
  • Pavg = C·V²·f
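
A quick numeric illustration of the Pavg = C·V²·f relation, as a minimal
Python sketch; the activity factor and component values are assumptions for
illustration, not figures from the talk.

```python
# Dynamic (capacitive) power: P_avg = alpha * C * V^2 * f, where alpha is
# the fraction of the capacitance switched per clock cycle (assumed here).
def dynamic_power(alpha, c_farads, vdd_volts, f_hz):
    return alpha * c_farads * vdd_volts ** 2 * f_hz

# 100 pF of switched capacitance, 3.3 V supply, 100 MHz clock, 20% activity:
print(dynamic_power(0.2, 100e-12, 3.3, 100e6))  # ~0.022 W
```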

8
Technology trends - SIA roadmap
Moore's Law: 2x every 18 months
9
Technology trends - Power
A 10 mm² chip clocked at 500 MHz would
consume 315 W!
10
Basics
  • Sources of power dissipation depend on the logic
    family.
  • CMOS is the logic family preferred in many
    designs:
  • Easy to get a functional circuit.
  • Almost zero static power dissipation. Low power
    by default.
  • In general, capacitive power >> short circuit
    power >> leakage power.

11
  • In CMOS, power dissipation arises from the
    following causes:
  • Static power dissipation, when no signals are
    changing:
  • Leakage: reverse bias saturation current and
    sub-threshold leakage.
  • Deviations from a strict CMOS design style.
  • Dynamic power dissipation, when signals are
    changing:
  • Short circuit power dissipation, when both
    devices are momentarily on.
  • Capacitive power dissipation, due to charging and
    discharging of the output load.

12
Low power metrics
  • Absolute power number:
  • Quoted in watts. Useful in applications such as
    package selection and power supply distribution
    network design.
  • µW/MHz:
  • The average energy consumed per clock cycle.

13
Low power metrics
  • µW/MIPS or µW/SPEC:
  • The average energy per instruction. Useful for
    comparing processors of the same family.
  • µW/MIPS²:
  • Normalizes the average energy per instruction
    with the performance of that particular
    architecture. Useful for comparing processors of
    different families.
  • µW/(area·MHz):
  • Energy dissipated per clock cycle per unit area
    of the die. Useful as a reliability measure. Can
    also be used to compare two chips with the same
    function but different implementations.

14
Types of estimation
  • Depending on the application:
  • Instantaneous power estimation.
  • Peak power estimation:
  • Important for estimating IR drops.
  • Average power estimation:
  • Important for conserving battery life, and for
    packaging and cooling considerations.
  • System engineer: What battery to use?
  • Chip architect: Will my architecture be feasible?
  • Layout engineer: How wide should the power supply
    lines be?
  • Product engineer: Which package to use?

15
  • Depending on methodology:
  • Simulation based:
  • Vectors are given, the circuit is known, and
    simulation is performed. The instantaneous
    currents are averaged and the power dissipation
    is then calculated.
  • Probabilistic analysis based:
  • Averaging of the inputs is performed first and
    probabilistic measures are extracted. The power
    dissipation is then calculated.

16
Types of estimation
  • Monte Carlo methods:
  • These methods focus on performing just the right
    amount of simulation: when a stopping criterion
    is met, simulation is stopped.
  • Too few measurements - inaccurate estimates.
  • Too many measurements - a waste of computing
    resources.
  • The mean of n samples obtained during simulation
    is compared against the statistically calculated
    mean; when they are sufficiently close to one
    another, simulation is stopped (see the sketch
    below).
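
A minimal Python sketch of this stopping criterion; simulate_power() is a
hypothetical stand-in for a real per-vector power simulator, and the
tolerance and sample counts are assumptions.

```python
import random
import statistics

def simulate_power():
    # Placeholder: power (in watts) measured for one random input vector.
    return 1.0 + random.gauss(0, 0.1)

def monte_carlo_power(tol=0.01, min_samples=30, max_samples=100_000):
    samples = []
    while len(samples) < max_samples:
        samples.append(simulate_power())
        if len(samples) >= min_samples:
            mean = statistics.mean(samples)
            # 95% confidence half-interval of the running mean.
            half = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
            if half < tol * mean:  # stopping criterion met
                break
    return statistics.mean(samples), len(samples)

mean, n = monte_carlo_power()
print(f"estimated average power: {mean:.3f} W after {n} vectors")
```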

17
Types of estimation
  • Why so many methods?
  • Accuracy:
  • When implementation details are not available,
    relative accuracy should be stressed.
  • Efficiency:
  • What is the size of the circuit that can be
    handled by the method?
  • Available information:
  • For high level estimation, it is difficult to
    estimate the activity or the capacitance in the
    absence of any implementation data, so high
    errors must be tolerated.
  • In some cases, only some characteristics of the
    input are known.

18
Circuit level estimation
  • A model is developed (the slide's figure is not
    reproduced in this transcript).
  • The model must be re-evaluated every time the
    value of the current changes, as the value of the
    partial derivative depends on it.

19
Gate level estimation - Simulation
  • Basic blocks: logic gates
  • AND, OR, NOT, NOR, XOR, NAND, etc.
  • Signals have only a finite number of values:
  • 0, 1, H, L, Z, U, X

20
Gate level estimation - Simulation
  • An event driven simulator is used for functional
    and timing simulation.
  • The simulator counts the number of specified
    events required for power estimation.
  • The average power is estimated from the equations
    shown in the next slide.

21
Gate level estimation - Simulation
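
(The equations on this slide were presented as an image. A standard form of
the gate level simulation estimate, stated here as an assumption about what
the slide showed, is

Pavg = (Vdd² / (2·T)) · Σi Ci · Ni(T)

where Ni(T) is the number of transitions counted at node i over the
simulated interval T, and Ci is the capacitance at node i.)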
22
Gate level estimation - Probabilistic
  • Probabilistic analysis:
  • Estimates power based on certain probabilistic
    measures of the input data.
  • Only dynamic power dissipation can be estimated
    accurately.
  • Input signals are assumed to be random, taking on
    values 0 or 1.
  • Definitions of the probabilistic measures used
    are given below.

23
Gate level estimation - Probabilistic
  • Transition probability: the average fraction of
    clock cycles in which the steady state value of
    the signal is different from its initial value.
  • Transition density: the average number of
    transitions per unit time.
  • Static probability: the average fraction of clock
    cycles in which the steady state value of the
    signal is logic high.
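
A minimal Python sketch computing these three measures from a sampled 0/1
signal trace (one value per clock cycle); the example trace and clock
period are assumptions.

```python
def probabilistic_measures(trace, clock_period):
    n = len(trace)
    toggles = sum(a != b for a, b in zip(trace, trace[1:]))
    transition_prob = toggles / (n - 1)            # fraction of cycles that toggle
    transition_density = toggles / ((n - 1) * clock_period)  # toggles per unit time
    static_prob = sum(trace) / n                   # fraction of cycles at logic 1
    return transition_prob, transition_density, static_prob

print(probabilistic_measures([0, 1, 1, 0, 1, 0, 0, 1], clock_period=1e-8))
```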

24
Gate level estimation - Probabilistic
25
Gate level estimation - Probabilistic
  • Sources of error:
  • All measures are unaffected by circuit delays, so
    glitches are ignored.
  • Spatial correlation: different inputs to a
    circuit may be related to one another. This can
    happen because of circuit topology or input
    characteristics.
  • Temporal correlation: input values to a circuit
    may be related to previous values. This can
    happen because of memory or input
    characteristics.
  • In the presence of correlation, all probabilistic
    measures become conditional probabilities, which
    are much harder to analyze.
  • Correlation causes estimates to be wrong; these
    methods work well when the inputs are random.

26
Gate level estimation - Probabilistic
27
Gate level estimation - Probabilistic
28
Gate level estimation - Probabilistic
29
Architecture level estimation
  • Challenges:
  • Very little information is available about the
    implementation at this stage.
  • It is difficult to estimate input patterns.
  • Difficult to characterize:
  • An n-input block needs O(4ⁿ) input patterns for
    exhaustive characterization. Not practical even
    for small values of n.

30
Architecture level estimation, an application
31
Architecture level estimation, an application
32
Architecture level estimation
  • There is no general method in use today.
  • How does one estimate capacitance and activity
    for a circuit that has not yet been implemented?
  • Some methods use scalable capacitance models.
  • For example, the total capacitance of a ripple
    carry adder is proportional to the number of full
    adder cells in it.
  • UWN (uniform white noise) characterization:
  • Components are characterized with random data.
  • This works in situations where there is no
    particular relationship between data items.

33
Architecture level estimation
  • Fails miserably for highly correlated data, such
    as in DSP systems.
  • Useful for comparing two different
    implementations.
  • Entropy based modeling.
  • Dual bit type method:
  • Works well for DSP systems.
  • Based on the fact that there is a relationship
    between bit level and word level characteristics.
  • The data in such systems is obtained by sampling
    analog signals at a rate higher than the highest
    signal frequency.

34
Architecture level estimation
  • The difference between successive samples is
    small.
  • LSBs change faster than MSBs in most
    applications.
  • UWN characterization gives very poor results for
    such inputs.
  • In some applications, the MSBs change faster than
    the LSBs; these changes correspond to sign bit
    changes.

35
Architecture level estimation
  • Encountered in delta modulation schemes.
  • Inputs are divided into three regions based on
    the nature of activity: the sign bit region, the
    UWN region, and the transition region. Sign bit
    regions need many more capacitance coefficients.
  • Methods based on linear regression techniques
    (see the sketch below):
  • The power is expressed as a linear equation of
    the input and output signal transitions. The
    constant coefficients are then determined during
    characterization.
  • Methods based on high level operations:
  • Some high level operations are easily
    identifiable in every circuit.
  • Example: memory read/write, ALU ADD/MUL/SUB.
    Costs might be state dependent; for example,
    some writes can cost more than others.
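
A minimal sketch of the regression based macromodel: fit P = c0 + c1·(input
transitions) + c2·(output transitions) by least squares. The training data
below is synthetic; real coefficients would come from characterizing the
block with a lower level power simulator.

```python
import numpy as np

a_in = np.array([10, 20, 30, 40, 50], dtype=float)   # input transition counts
a_out = np.array([8, 15, 25, 33, 41], dtype=float)   # output transition counts
power = np.array([1.2, 2.1, 3.2, 4.0, 4.9])          # measured power (mW, made up)

X = np.column_stack([np.ones_like(a_in), a_in, a_out])
coeffs, *_ = np.linalg.lstsq(X, power, rcond=None)   # fit c0, c1, c2
print("c0, c1, c2 =", coeffs)
print("predicted P for (25, 20):", coeffs @ [1, 25, 20], "mW")
```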

39
Dual bit type method
40
Optimization techniques
  • Methods adopted:
  • Reduce voltage - example: reduced voltage swing
    buses.
  • Reduce activity - example: clock gating,
    balancing delays.
  • Reduce wastage - example: using drivers of the
    proper strength.
  • Manage well - example: dynamic frequency and
    voltage reduction.
  • It is possible to optimize at several levels,
    with the high level optimizations yielding the
    best results.
  • Process level - Vt reduction to enable supply
    voltage reduction, and sub-threshold leakage
    reduction.
  • Circuit level - voltage reduction, pin
    reordering, network reorganization, transistor
    sizing, self gating flip flops, dual edge
    triggering.
  • Gate level - pin reordering, clock gating,
    restructuring.
  • RTL level - clock gating, retiming.

41
Optimization techniques
  • Architectural level:
  • Asynchronous vs. synchronous architectures.
  • Bus encoding.
  • Resources vs. power tradeoffs.
  • Data formats.
  • Algorithm level:
  • Precision vs. power tradeoff. Reduced precision
    affects the behaviour of the algorithm.
  • Cache blocking.
  • System level:
  • Partitioning, power down, dynamic power
    management.

42
Optimizations - circuit level
  • Optimization problems can be formulated more
    precisely at this level.
  • Improvements are incremental, but they can become
    major in cell based designs such as memories.
  • An important technique is voltage reduction. It
    impacts the fabrication process and is sometimes
    driven by system considerations.
  • Important techniques:
  • Transistor sizing for dynamic power reduction.
  • Transistor sizing for leakage power reduction.
  • Network restructuring and reorganization.
  • Transistor network repartitioning and
    reorganization.
  • Special latches and flip flops:
  • Differential input latches, dual edge triggered
    flip flops, self gating flip flops, combinational
    flip flops.
  • Low power libraries: P equivalence, N
    equivalence, NP equivalence.
  • Reduced voltage swing, adiabatic computation,
    pass transistor logic.

43
Optimization - circuit level, buffer sizing
  • The objective is to use a buffer chain to drive a
    large load capacitance from a source that has low
    drive strength, without incurring large delays or
    spending large amounts of power.

44
Optimization - circuit level, buffer sizing
  • If the load is driven directly, the delay would
    be large.
  • A large buffer can drive the load, but has a
    large input capacitance itself.
  • The trick is therefore to use a chain of
    successively larger buffers, each driving an
    appropriate load.
  • Too many buffers - large propagation delay
    through the chain.
  • Too few buffers - the output slope will not be
    steep, resulting in a large delay.

45
Optimization - circuit level, buffer sizing
  • But the switching current of every stage is k
    times higher than that of the preceding stage,
    because of the higher drive strength.
  • The total power of successive stages is therefore
    k times higher at each step (see the sketch
    below).
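
A minimal sketch of exponential buffer chain sizing; the per stage fanout k
and the capacitance values are assumptions for illustration.

```python
import math

def chain(c_in, c_load, k=math.e):
    # Number of stages so that each stage drives about k times its own size.
    n = max(1, round(math.log(c_load / c_in) / math.log(k)))
    fanout = (c_load / c_in) ** (1 / n)      # actual per stage fanout after rounding
    sizes = [fanout ** i for i in range(n)]  # stage sizes relative to the first buffer
    return n, fanout, sizes

# Driving a 10 pF load from a 10 fF source: about ln(1000) = 7 stages at k = e.
print(chain(10e-15, 10e-12))
```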

46
Optimization - circuit level, buffer sizing
47
Optimizations -gate level
  • Some techniques:
  • Logic structuring: function preserving
    transforms on the logic structure that give
    different area, power and delay characteristics.
  • Signal gating: reducing unwanted transitions.
  • Logic encoding: binary vs. Gray coding.
  • Bus invert coding (sketched below).
  • State machine encoding.
  • Output don't-care encoding.
  • Pre-computation logic.
  • Floating node elimination.
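
A minimal sketch of bus invert coding: if more than half of the bus lines
would toggle, transmit the inverted word and assert an extra invert line, so
at most n/2 (plus the invert line) lines switch per transfer. The 8-bit bus
and data values are assumptions.

```python
def bus_invert_encode(prev_bus, data, width):
    toggles = bin(prev_bus ^ data).count("1")
    if toggles > width // 2:
        return (~data) & ((1 << width) - 1), 1  # send inverted word, invert = 1
    return data, 0                              # send as is, invert = 0

prev = 0b00000000
for word in (0b11111110, 0b11111111, 0b00000001):
    encoded, inv = bus_invert_encode(prev, word, 8)
    print(f"data={word:08b} sent={encoded:08b} invert={inv}")
    prev = encoded
```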

48
Optimizations - Architectural level
  • These have the highest impact on power.
  • Main themes:
  • Design decisions include the selection and
    organization of functional modules. Some of these
    decisions can impact power to a great extent.
  • Power management: a class of techniques that
    carefully manage the performance and throughput
    of a system based on its computation needs, to
    achieve low power goals.

49
Optimizations - Architectural level
  • Some popular techniques
  • Microprocessor sleep modes.
  • Performance management.
  • Adaptive filtering based on SNR requirements.
  • Switching activity reduction through guarded
    evaluation.
  • Resource sharing.
  • Pipelining and parallelism.
  • Flow graph transformations.

51
Optimization - Performance management
  • The asynchronous data processor reads the input
    FIFO and loads the output FIFO immediately after
    the computation is over.
  • If the throughput is low, more data will queue up
    at the input FIFO.

52
Optimization - Performance management
  • The number of entries in the FIFO is used to
    determine the load on the system. If the system
    is heavily loaded, the voltage is increased to
    increase the performance of the system (see the
    sketch below).
  • Another idea: parts of the system that are not
    performance critical are operated at lower
    voltages. However, having multiple power supplies
    on the same chip can complicate things.
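
A minimal sketch of the FIFO based scheme: move to a faster (higher voltage,
higher frequency) operating point when the input FIFO fills, and to a slower
one when it drains. The operating points and thresholds are assumptions.

```python
LEVELS = [(0.9, 100e6), (1.2, 200e6), (1.5, 400e6)]  # assumed (Vdd, f) points

def adjust_operating_point(level, occupancy, depth):
    if occupancy > 0.75 * depth and level < len(LEVELS) - 1:
        level += 1   # heavily loaded: speed up
    elif occupancy < 0.25 * depth and level > 0:
        level -= 1   # lightly loaded: slow down and save power
    return level

level = 0
for occupancy in (2, 10, 14, 6, 1):
    level = adjust_operating_point(level, occupancy, depth=16)
    vdd, f = LEVELS[level]
    print(f"occupancy={occupancy:2d} -> Vdd={vdd} V, f={f / 1e6:.0f} MHz")
```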

53
Optimization - Adaptive filtering
  • A filtering operation consists of mathematical
    operations on sample values - mainly
    multiplication and addition - plus memory.
  • Two varieties: IIR and FIR.

54
Optimization - Adaptive filtering
  • The filter order decides the sharpness of the
    filter cutoff. Higher order filters generally
    have better quality, but need more mathematical
    operations to implement.
  • The filter order is fixed based on the input SNR
    and the desired output SNR.

55
Optimization - Adaptive filtering
  • When the input SNR is good, the filter order may
    be reduced to save power without sacrificing
    quality.
  • The output noise level is monitored; when it is
    above a given threshold, the order of the filter
    is increased (see the sketch below).
  • The estimate of output noise is the difference
    between the input energy and the output energy
    (which equals the input noise energy), multiplied
    by the stop band energy response of the filter
    (stored in a lookup table).
  • The power benefits of this method depend on the
    noise characteristics of the input. If the input
    is not noisy most of the time, significant power
    savings are possible.
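
A minimal sketch of the order control loop; the stop band table, threshold,
and energy values are assumptions, not figures from the talk.

```python
STOPBAND_GAIN = {8: 0.10, 16: 0.03, 32: 0.01}  # assumed stop band energy response

def update_order(order, in_energy, out_energy, threshold):
    noise_est = (in_energy - out_energy) * STOPBAND_GAIN[order]
    orders = sorted(STOPBAND_GAIN)
    i = orders.index(order)
    if noise_est > threshold and i < len(orders) - 1:
        return orders[i + 1]   # too noisy: raise the filter order
    if noise_est < 0.5 * threshold and i > 0:
        return orders[i - 1]   # plenty of margin: lower the order, save power
    return order

order = 16
for e_in, e_out in ((10.0, 9.5), (10.0, 7.0), (10.0, 9.9)):
    order = update_order(order, e_in, e_out, threshold=0.05)
    print("filter order:", order)
```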

56
Optimization - Switching activity reduction
  • Guarded evaluation:
  • When the outputs of a functional unit are
    don't-cares, the inputs to that block are not
    allowed to toggle; this can save significant
    amounts of power.
  • Glitch elimination using pipelining:
  • Glitches occur when signals incurring different
    delays appear at the inputs of a gate.
  • Glitch probability is higher for circuits with
    large logic depths, since the delays at the gate
    inputs can spread more widely.
  • Pipelining limits the logic depth, and so reduces
    the probability of glitching.
  • Attractive for datapath circuits, because they
    have large logic depths and several inputs of a
    datapath element can switch simultaneously.
  • Flow graph transforms:
  • Strength reduction (small example below).
  • Critical path reduction - voltage scaling can
    then be applied to reduce power.
  • Redundancy elimination.
  • Loop unrolling, loop pipelining.
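
A minimal sketch of strength reduction, assuming a constant multiply that a
compiler or synthesis tool can rewrite as shifts and adds.

```python
def times_ten(x):
    return x * 10               # constant multiply: costly in area and power

def times_ten_reduced(x):
    return (x << 3) + (x << 1)  # 8x + 2x: two shifts and one add

assert all(times_ten(v) == times_ten_reduced(v) for v in range(1000))
```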

58
Optimization - Switching activity reduction
59
Optimization - Parallelism and pipelining
  • Parallelism: if there is inherent parallelism in
    the application, the throughput of the system can
    be doubled (roughly - beware of Amdahl's law) by
    doubling the number of functional units.
    Alternatively, the same throughput can be
    maintained with twice the number of functional
    units, each operating at half the frequency.

60
Optimization - Parallelism and pipelining
  • Each of these functional units can then be
    operated at a lower voltage. This usually works,
    because power has a square law relationship with
    voltage, while throughput has a roughly linear
    relationship with frequency and multiplicity (see
    the worked example below).
  • Pipelining: reduces the propagation time of a
    block by half; the voltage can then be reduced
    without changing the operating frequency. The
    throughput does not change.
  • Problems: Amdahl's law, the point of diminishing
    returns, and the need for high level estimation
    capability.
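
A worked numeric example of the parallelism argument, with an assumed lower
voltage that still meets timing at half the frequency.

```python
C, V, f = 1.0, 5.0, 100e6               # reference capacitance, volts, hertz
p_ref = C * V ** 2 * f

V_low = 2.9                             # assumed voltage sufficient at half speed
p_par = (2 * C) * V_low ** 2 * (f / 2)  # two units, each at f/2 and V_low
print(f"power ratio: {p_par / p_ref:.2f}")  # ~0.34 of the original power
```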

61
Architecture of the DLX
62
Architecture of the DLX
  • A load-store architecture. Most operations work
    off the register file (32 x 32-bit). Two memory
    addressing modes. Memory is byte addressable.
  • Has instructions similar to most contemporary
    RISCs.
  • 32-bit datapath. Only integer instructions are
    implemented.

63
Related Work
  • Called the instruction level power model.
  • A power cost is associated with every
    instruction.
  • The characterization is top down. Every
    instruction is executed in a loop, with the loop
    overhead kept minimal, and the average current is
    measured with an ammeter (see the sketch below).
  • The method was tried on three processors
    representing three broad architectural styles:
  • Intel 80486 - CISC.
  • Fujitsu SPARClite - RISC for embedded
    applications.
  • Motorola DSP.
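
A minimal sketch of the instruction level model: total energy is the sum of
per instruction base costs plus inter instruction (circuit state change)
overheads. All the cost numbers are made up for illustration.

```python
BASE_COST = {"add": 1.0, "mul": 3.2, "load": 2.5}       # nJ per instruction
OVERHEAD = {("add", "mul"): 0.3, ("mul", "load"): 0.5}  # state change costs (nJ)

def program_energy(trace):
    energy = sum(BASE_COST[op] for op in trace)
    energy += sum(OVERHEAD.get((a, b), OVERHEAD.get((b, a), 0.1))
                  for a, b in zip(trace, trace[1:]))
    return energy

print(program_energy(["load", "add", "mul", "add"]), "nJ (illustrative)")
```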

64
Flow
65
Conclusions
  • Power is becoming an important aspect of VLSI
    design, due to the high integration levels and
    the high performance of present day systems.
  • Power estimation is needed at every step in the
    design process to ensure that the design does not
    violate the power specification.
  • Power minimization is still done in an ad-hoc
    manner. Except for clock gating and gate
    reorganization, very few tools are available for
    real designs.
  • Unless aggressive power minimization is done at
    all levels in the design process, power could
    limit the level of integration or performance in
    the future.

66
  • THANK YOU