Title: Instruction Level Power Analysis
1Instruction Level Power Analysis
- Manoj Gupta, 2001119
- Mayank Gupta, 2001120
2Layout
- Introduction
- Components of Power Consumption
- Power Characterization
- Instruction Level Power Analysis for RISC processors
- Extensions for VLIW/EPIC processors
- Register Files
- Caches
3Introduction
- Why has the power consumption of nano-electronics become so important?
- Moore's law still holds true, enabling ever more complex applications
- Mobile systems: battery is the bottleneck
- High-performance computation: heat extraction
- Operating cost and reliability
- A data warehouse of an ISP with 8000 servers needs 2 MW
4Introduction
- Power or energy? Don't they go hand in hand?
- Power varies significantly with time!
- A given battery has a fixed amount of energy
- Average power consumption = Energy / Execution time
- Decides average chip and junction temperature
- Decides battery life (if peak current < rated current)
- Peak power and current
- Voltage drops, hot spots, rate of battery discharge
- Power-efficient, energy-efficient, and battery-efficient design paradigms do exist!
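As a minimal sketch of the distinction between energy, average power, and peak power, the snippet below summarizes a sampled power trace. The trace values, the sampling step, and the function names are hypothetical and chosen only for illustration.

```python
def summarize_power_trace(power_w, dt_s):
    """Return (energy in J, average power in W, peak power in W) for a sampled trace."""
    energy_j = sum(p * dt_s for p in power_w)   # E = sum of P * dt
    exec_time_s = len(power_w) * dt_s           # total execution time
    return energy_j, energy_j / exec_time_s, max(power_w)


def battery_life_s(battery_energy_j, avg_power_w):
    """Idealized battery life; ignores the peak-current limit of a real battery."""
    return battery_energy_j / avg_power_w


# Hypothetical trace: four 1-second samples of instantaneous power in watts.
energy, avg, peak = summarize_power_trace([0.5, 2.0, 0.5, 1.0], dt_s=1.0)
print(energy, avg, peak)
print(battery_life_s(3600.0, avg))
```

The same average power can hide very different peaks, which is why peak power and current are tracked separately from energy.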
5Components of Power Consumption
- System = hardware platform + software (system + application)
- Software impacts hardware power consumption
- Static power
- Sub-threshold leakage + reverse-biased junction leakage
- Quiescent biasing power (in case of non-CMOS circuits)
- Dynamic power
- Charging and discharging of capacitance (switching activity)
- Short-circuit power during transition (rate of change, delay)
- Alternative grouping (used at component/cell level)
- Switching power at the boundaries of cells
- Internal cell power
- Short-circuit power
- Switching power at internal nodes
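The slide names the components without giving formulas. As a rough sketch, the widely used first-order CMOS expressions are switching power = alpha * C * Vdd^2 * f and static power = Vdd * I_leak. These expressions and all numeric values below are assumptions added for illustration, not taken from the slides.

```python
def switching_power_w(alpha, c_farad, vdd, freq_hz):
    """First-order dynamic power from charging and discharging capacitance.

    alpha   : switching activity factor (fraction of capacitance toggled per cycle)
    c_farad : total switched capacitance in farads
    """
    return alpha * c_farad * vdd ** 2 * freq_hz


def static_power_w(vdd, i_leak_a):
    """Static power drawn by leakage current at supply voltage vdd."""
    return vdd * i_leak_a


# Hypothetical operating point: 10% activity, 1 nF, 1.2 V, 100 MHz, 1 mA leakage.
dynamic = switching_power_w(alpha=0.1, c_farad=1e-9, vdd=1.2, freq_hz=1e8)
static = static_power_w(vdd=1.2, i_leak_a=1e-3)
print(dynamic, static, dynamic + static)
```

Short-circuit power is left out of the sketch because it depends on transition times, which a first-order model like this does not capture.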
6System Abstractions - Power
- Abstraction levels, from specification down to silicon:
- Functional specifications and constraints
- System-level netlist
- Register transfer level (RTL) netlist
- Component/cell-level netlist
- Layout or configuration bits
- Chip
- Trends shown along these levels: time complexity, accuracy of power characterization, opportunities for optimization
7Power Characterization
- Measurement (Chip/Board Level)
- Most accurate
- Perhaps the fastest, if setup and tools exist
- Too late to change hardware details
- Software/Load control is still possible
- Typically used for software optimizations
8Power Characterization (cont)
- Transistor Level (estimation)
- Spice simulation of transistor level netlist
- Most accurate in the simulation world
- Requires complete implementation details
- Unmanageable time complexity even for simpler designs
- Typically used for cell/component characterization
- Synopsys PowerMill (said to provide SPICE-like accuracy)
9Power Characterization (cont)
- Cell Level (estimation)
- After logic synthesis
- Requires RTL implementation
- Simulation to capture switching activity
- Requires delay simulation if glitches need to be accounted for
- Characterized cells: empirical formulas or table look-up
- Interconnect power
- Either unaccounted or
- Using estimated wire load models (typically based on experience) or
- Extracted layout (if done after physical synthesis)
- Still unmanageable time complexity, especially for use in design space exploration
- Synopsys PrimePower
- Netlist, interconnect capacitance, VCD traces, cell power library
10Power Characterization (cont)
- Register Transfer Level (estimation)
- Requires conceptual RTL description (detailed micro-architecture)
- Data-path is modeled as a netlist of macro cells, which are characterized offline
- Control path and glue logic
- Either unaccounted or estimated based on I/O
- Simulation to capture switching activity
- Typically glitches are not considered, but methods do exist
- Interconnect power
- Typically unaccounted, but possible to estimate through floor-planning
- Typically used in design space exploration (DSE), mostly using in-house tools
11System Level Power Estimation
- For Design Space Exploration
- Least accurate, but uncertainty of exploration results can be reduced if models have good fidelity
- Purpose, target architecture and available system details govern the system-level estimation models
- Selecting an algorithm, or designing hardware for a given algorithm?
- ASIC based or processor based?
- Is the ISA fixed or extensible?
- Typically system-level power estimation models are macro-architecture template specific
- Major constituents of power consumption
- Computation, communication, storage units + peripherals
12Power Estimation Models
- Activity Based Models
- Instruction Level Energy Models
13Activity Based Models
- Fixed Activity Model
- N-Transition Model
- Dual Bit Model
14Fixed Activity Model
- $P = \sum_i k_i G_i f_i$
- Where
- $k_i$: PFA proportionality constant, extracted empirically from past designs
- $G_i$: measure of hardware complexity
- $f_i$: activation frequency
- Disadvantage: does not model the influence of data activity on power consumption
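The fixed activity formula can be sketched as a simple sum over modules. The module list, the units, and the constants below are hypothetical; in practice each k_i would come from characterization of past designs.

```python
def fixed_activity_power(modules):
    """P = sum_i k_i * G_i * f_i over (k_i, G_i, f_i) tuples."""
    return sum(k * g * f for k, g, f in modules)


# Hypothetical modules: (k_i in W per complexity unit per Hz, G_i, f_i in Hz).
modules = [
    (1e-12, 1000, 1e8),   # e.g. a datapath block
    (2e-12, 500, 5e7),    # e.g. a smaller block activated half as often
]
print(fixed_activity_power(modules))
```

Note that the input data never appears in the computation, which is exactly the disadvantage listed above.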
15N-Transition Model
- $P = P_{const} + n \cdot P_{change}$ ($n$: number of input transitions)
- Disadvantage
- It does not differentiate between transitions on different inputs.
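A small sketch of the N-transition model, assuming n is counted as the number of input bits that differ between consecutive input words. The constants are hypothetical. The example also shows the stated disadvantage: toggling two low-order bits and toggling two high-order bits yield the same estimate.

```python
def count_transitions(prev_word, curr_word):
    """Number of input bits that change between two consecutive input words."""
    return bin(prev_word ^ curr_word).count("1")


def n_transition_power(prev_word, curr_word, p_const, p_change):
    """P = P_const + n * P_change."""
    return p_const + count_transitions(prev_word, curr_word) * p_change


# Two different input changes with the same n give the same power estimate.
low_bits = n_transition_power(0b0000, 0b0011, p_const=1.0, p_change=0.5)
high_bits = n_transition_power(0b0000, 0b1100, p_const=1.0, p_change=0.5)
print(low_bits, high_bits)
```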
16Dual Bit Type Model
- Drawback in previous approaches
- Less accurate
- Characterize the module on the basis of Uniform White Noise (UWN) input
- Leads to high error if the input dynamic range does not fully occupy the word length
17Dual Bit Type Model: The Approach
- Combines reduced complexity of the architecture level with the accuracy of gate and circuit level
- Black box model of capacitance switched in each module for various types of inputs
- Easy to parameterize capacitance models to take into account size, etc.
18Dual Bit Type Model: Modeling Complexity
- Power consumed by a module is a function of its complexity, as large modules contain more circuitry
- Examples
- Capacitance of N-bit ripple carry subtracter
- $C_T = C_{eff} N$
- Not restricted to linear models, but can be used to specify even more complex models
19Dual Bit Type Model: Capacitive Data Coefficients
- Describe the average amount of capacitance switched within a module during an input transition
- LSB regions suffer random transitions and hence can be characterized by a single capacitive coefficient $C_{UU}$
- MSB region experiences sign transitions and so is characterized by capacitive sign coefficients $C_{+-}$, $C_{++}$, etc.
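The idea of separate coefficients for the LSB and MSB regions can be sketched as follows. This is a deliberately simplified illustration, not the full dual bit type model: it assumes a fixed, known split of the word into a UWN region and a sign region, one coefficient per bit per transition, and hypothetical coefficient values.

```python
def sign_transition(prev, curr):
    """Return the sign transition label, e.g. '+-' for positive to negative."""
    sign = lambda v: "+" if v >= 0 else "-"
    return sign(prev) + sign(curr)


def dbt_switched_capacitance(samples, n_uwn_bits, n_sign_bits, c_uu, c_sign):
    """Sum the capacitance switched over all consecutive input transitions.

    c_uu   : capacitance per bit in the random (UWN) LSB region
    c_sign : dict mapping '++', '+-', '-+', '--' to capacitance per sign bit
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        total += n_uwn_bits * c_uu
        total += n_sign_bits * c_sign[sign_transition(prev, curr)]
    return total


# Hypothetical coefficients: sign bits only switch when the sign flips.
c_sign = {"++": 0.0, "+-": 2.0, "-+": 2.0, "--": 0.0}
total = dbt_switched_capacitance([3, -2, -5, 4], n_uwn_bits=4, n_sign_bits=4,
                                 c_uu=1.0, c_sign=c_sign)
print(total)
```

A slowly varying, mostly positive signal would accumulate far less sign-region capacitance than white noise, which is the effect a pure UWN characterization misses.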
20Instruction Level Power Estimation
- First introduced to characterize processor power consumption to drive software optimizations
- Each instruction is associated with some current
- Inter-instruction effects for better accuracy
21Instruction Level Power Estimation
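- $E = \sum_i (B_i \times N_i) + \sum_{i,j} (O_{i,j} \times N_{i,j}) + \sum_k E_k$
- $B_i$: base energy cost of instruction $i$, executed $N_i$ times
- $O_{i,j}$: inter-instruction effect energy cost of the pair $(i, j)$, occurring $N_{i,j}$ times
- $E_k$: additional energy penalties due to resource constraints
- Requires a cost associated with every pair of instructions: $O(N^2)$, where $N$ = number of instructions in the ISA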
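The instruction-level energy formula can be evaluated directly from an instruction trace. In the sketch below the instruction names, cost tables, and penalty values are hypothetical; pairs missing from the overhead table are assumed to cost nothing.

```python
from collections import Counter


def program_energy(trace, base_cost, pair_overhead, penalties=()):
    """E = sum_i B_i*N_i + sum_{i,j} O_ij*N_ij + sum_k E_k for an instruction trace."""
    n_i = Counter(trace)                      # N_i: executions of each instruction
    n_ij = Counter(zip(trace, trace[1:]))     # N_ij: occurrences of each adjacent pair
    base = sum(base_cost[i] * n for i, n in n_i.items())
    inter = sum(pair_overhead.get(pair, 0.0) * n for pair, n in n_ij.items())
    return base + inter + sum(penalties)


# Hypothetical characterization data (arbitrary energy units).
base_cost = {"ld": 2.0, "add": 1.0, "mul": 3.0}
pair_overhead = {("ld", "add"): 0.5, ("add", "mul"): 0.25}
trace = ["ld", "add", "mul", "add"]
print(program_energy(trace, base_cost, pair_overhead, penalties=[1.0]))
```

The pair table is what grows as O(N^2) with the size of the ISA.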
22JouleTrack
- Experiments on StrongARM by Amit Sinha and A. P. Chandrakasan
- Current per instruction is about 0.2 A (averaged over all instructions)
- Min-max variation of 38% of average current
- Address mode and data dependent variation is smaller
- But max current variation across benchmarks is < 8%!
- Concluded that the first-order energy model of a given processor is $E = V \cdot I(V, f) \cdot T$
- Second-order effects can be significant for data-path dominated processors such as DSP, VLIW
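A minimal sketch of the first-order model E = V * I(V, f) * T, assuming the execution time T is obtained as cycle count divided by clock frequency and that I(V, f) is supplied as a lookup function. The constant 0.2 A mirrors the average quoted above; the voltage, frequency, and cycle count are hypothetical.

```python
def first_order_energy_j(vdd, freq_hz, cycles, current_a):
    """E = V * I(V, f) * T, with T = cycles / f."""
    exec_time_s = cycles / freq_hz
    return vdd * current_a(vdd, freq_hz) * exec_time_s


def constant_current(vdd, freq_hz):
    """Hypothetical I(V, f): a single measured average, independent of the program."""
    return 0.2


# Hypothetical run: 2e8 cycles at 200 MHz and 1.5 V.
print(first_order_energy_j(1.5, 2e8, 2e8, constant_current))
```

Because the current barely varies across benchmarks, energy in this model is driven almost entirely by execution time at a given operating point.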
23Instruction Level Power Estimation
- Impractical for CISC processors with very large instruction sets
- Higher average instruction energy
- Low energy-per-instruction variance
- Do not consider inter-instruction effects
- Cluster similar instructions as a single class
- Exponential storage problem for VLIW architectures
- N operations in a K-wide VLIW give $N^K$ long instructions, hence $N^{2K}$ instruction pairs to characterize
24Modified Energy Model for VLIW
- Assume independent energy dissipation for different execution slots
- Consider NOP as the base energy
- $E(W) = \sum_n U(w_n|w_{n-1}) + m \times p \times S + l \times q \times M$
- $U(w_n|w_{n-1}) = U(0|0) + \sum_k v(w_n^k|w_{n-1}^k)$
- $w_n^k$: operation issued on lane $k$ by instruction $w_n$
- Example
- $w_n$ = [ALU NOP NOP NOP], $w_{n-1}$ = [LS NOP ALU NOP]
- $U(w_n|w_{n-1}) = U(0|0) + v(ALU|LS) + v(NOP|ALU)$
- Memory requirement
- $O(KN^2)$
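The per-lane decomposition can be sketched as a lookup over lane pairs, with NOP after NOP contributing nothing beyond the base term. The energy values below are hypothetical; the example instructions are the ones from the slide.

```python
def vliw_pair_energy(curr, prev, u00, lane_cost):
    """U(w_n|w_{n-1}) = U(0|0) + sum over lanes k of v(w_n^k | w_{n-1}^k).

    curr, prev : lists of operations, one entry per lane
    lane_cost  : dict mapping (current op, previous op) to the extra energy v
    """
    total = u00
    for op_now, op_before in zip(curr, prev):
        if (op_now, op_before) == ("NOP", "NOP"):
            continue  # already covered by the base energy U(0|0)
        total += lane_cost[(op_now, op_before)]
    return total


# Hypothetical characterization (arbitrary energy units).
lane_cost = {("ALU", "LS"): 1.5, ("NOP", "ALU"): 0.5}
w_n = ["ALU", "NOP", "NOP", "NOP"]
w_prev = ["LS", "NOP", "ALU", "NOP"]
print(vliw_pair_energy(w_n, w_prev, u00=2.0, lane_cost=lane_cost))
```

Only pairs of single operations are stored, so the table grows with the square of the number of operations per lane rather than exponentially in the issue width.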
25Modified Energy Model for VLIW
- Cluster Similar Instructions based on cost
- $T = \{e_1, e_2, \ldots, e_t\}$
- $e_t$: energy consumption of instruction $t$
- Partition $T$ into $K$ clusters $(C_1, C_2, \ldots, C_K)$ s.t.
- $\sum_j \sum_i (x_{i,j} - c_j)^2$ is minimum
- Large number of clusters
- Good Accuracy
- Huge no. of experiments
- Small number of clusters
- Small number of experiments
- High Variance between clusters
- Reduced Accuracy
- Memory Requirement
- $O(CN^2)$
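Minimizing the within-cluster sum of squared differences is the k-means objective, so one way to sketch the clustering step is a one-dimensional Lloyd's iteration over the instruction energies. This is an illustrative choice of algorithm, not necessarily the one used in the original work; the energies and the deterministic initialization are assumptions.

```python
def kmeans_1d(energies, k, iters=50):
    """Cluster scalar energies into k groups; return (centroids, clusters, sse)."""
    values = sorted(energies)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [values[(2 * j + 1) * len(values) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for e in values:
            nearest = min(range(k), key=lambda j: (e - centroids[j]) ** 2)
            clusters[nearest].append(e)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    sse = sum((e - centroids[j]) ** 2 for j, c in enumerate(clusters) for e in c)
    return centroids, clusters, sse


# Hypothetical per-instruction energies (arbitrary units).
energies = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
for k in (1, 2):
    centroids, clusters, sse = kmeans_1d(energies, k)
    print(k, centroids, sse)
```

Running it for several values of k makes the trade-off on the slide visible: more clusters lower the squared error but require more characterization experiments.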
26Limitations of ILPA
- Does not provide any insight into the causes of power consumption within the processor core
- Does not account for the power consumed in the memory system, which is often dominant
- To address the second limitation, power estimation frameworks that integrate processor and memory models are built around instruction set simulators
27MicroArchitecture ILPA
- Pipeline Aware Instruction Level Energy Model
- Divide the design into smaller architectural blocks
- Usually the processor's pipeline stages
- Fetch, Decode, RF, Execute, WB
- $E(w_n|w_{n-1}) = \sum_s A_s(w_n|w_{n-1}) + I(w_n|w_{n-1})$
- $A_s$: energy consumed per stage $s$ when executing $w_n$ after $w_{n-1}$
- $I(w_n|w_{n-1})$: inter-stage connection energy (pipeline registers + buses)
- Provides better insight into power bottlenecks
- Smoother energy behaviour than the black-box model
- Requires a pipeline-structure-aware ISS
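A sketch of the pipeline-aware model, assuming per-stage energies are stored in lookup tables keyed by the (current, previous) instruction pair. The stage names follow the slide; the instruction names and energy values are hypothetical.

```python
STAGES = ("Fetch", "Decode", "RF", "Execute", "WB")


def pipeline_energy(curr, prev, stage_cost, interstage_cost):
    """E(w_n|w_{n-1}) = sum_s A_s(w_n|w_{n-1}) + I(w_n|w_{n-1}).

    Returns the total and the per-stage breakdown.
    """
    per_stage = {s: stage_cost[s][(curr, prev)] for s in STAGES}
    total = sum(per_stage.values()) + interstage_cost[(curr, prev)]
    return total, per_stage


# Hypothetical characterization for executing "mul" after "add".
pair = ("mul", "add")
stage_cost = {
    "Fetch": {pair: 1.0},
    "Decode": {pair: 0.5},
    "RF": {pair: 0.75},
    "Execute": {pair: 3.0},
    "WB": {pair: 0.5},
}
interstage_cost = {pair: 0.25}

total, per_stage = pipeline_energy("mul", "add", stage_cost, interstage_cost)
print(total, max(per_stage, key=per_stage.get))
```

Keeping the per-stage breakdown is what lets this model point at a bottleneck stage, which a single black-box cost per instruction cannot do.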
28Energy Models for Register File
- Assume linear power behaviour for access across different ports
- $P_{RF} = P_i + \frac{1}{T} \sum_n (E_{r,n} + E_{w,n})$
- $E_{r,n} = \sum_i H(RR_{i,n}, RR_{i,n-1}) \cdot E_{rb}$
- $E_{w,n} = \sum_i H(RW_{i,n}, old_{i,n}) \cdot E_{wb}$
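The register file formulas can be sketched in code under a few assumptions that the slide does not state: H is taken to be the Hamming distance, P_i an idle (data-independent) power term, E_rb and E_wb the energy per toggled read and write bit, and T the length of the observation window. All values are hypothetical, and the first cycle is skipped because it has no predecessor.

```python
def hamming(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


def register_file_power(reads, writes, e_rb, e_wb, p_idle, total_time):
    """P_RF = P_i + (1/T) * sum_n (E_r,n + E_w,n).

    reads[n]  : value seen on each read port in cycle n
    writes[n] : (new value, old value) for each write port in cycle n
    """
    energy = 0.0
    for n in range(1, len(reads)):
        e_r = sum(hamming(now, before) for now, before in zip(reads[n], reads[n - 1])) * e_rb
        e_w = sum(hamming(new, old) for new, old in writes[n]) * e_wb
        energy += e_r + e_w
    return p_idle + energy / total_time


# Hypothetical three-cycle trace with two read ports and one write port.
reads = [[0b0000, 0b1111], [0b0011, 0b1111], [0b0011, 0b0000]]
writes = [[], [(0b1010, 0b0000)], [(0b1111, 0b1010)]]
print(register_file_power(reads, writes, e_rb=1.0, e_wb=2.0, p_idle=0.5, total_time=2.0))
```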
29Energy Model for Caches
- Power consumption depends on mode of operation (read, write, idle)
- Energy consumed in a given clock cycle is a function of the mode transition between the previous and current cycle
- Characterize energy as a function of state transitions (read-read, read-write, etc.)
- For a given transition, energy also depends on the transitions on the address lines
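One way to sketch this cache model is a table of energies per state transition plus a term proportional to the number of address bits that toggle. The linear address term, the table values, and the trace are assumptions made for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two addresses."""
    return bin(a ^ b).count("1")


def cache_energy(trace, mode_cost, e_addr_bit):
    """Sum per-cycle energy over a trace of (mode, address) pairs.

    mode_cost  : dict mapping (previous mode, current mode) to a base energy
    e_addr_bit : energy per toggled address line (assumed linear dependence)
    """
    total = 0.0
    for (prev_mode, prev_addr), (mode, addr) in zip(trace, trace[1:]):
        total += mode_cost[(prev_mode, mode)]
        total += hamming(prev_addr, addr) * e_addr_bit
    return total


# Hypothetical characterization and access trace (arbitrary energy units).
mode_cost = {("idle", "read"): 2.0, ("read", "read"): 1.0, ("read", "write"): 3.0}
trace = [("idle", 0x00), ("read", 0x10), ("read", 0x11), ("write", 0x11)]
print(cache_energy(trace, mode_cost, e_addr_bit=0.5))
```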
30Thank You