Title: Energy Awareness and Uncertainty in Design at Microarchitectural Level
1. Energy Awareness and Uncertainty in Design at Microarchitectural Level
- Diana Marculescu
- Dept. of Electrical and Computer Engineering
- Carnegie Mellon University
- dianam_at_ece.cmu.edu
- http://www.ece.cmu.edu/enyac
2. Upcoming technology challenges
- Main design constraints: energy and power density
Source: Shekhar Borkar, DAC/MICRO 2004
3. Upcoming technology challenges
- System performance and leakage power are severely affected by variability
- Increased soft error rates make reliable computing a challenge
Source: Shekhar Borkar, DAC/MICRO 2004
4. Performance-power-variability interaction
- Deeper pipelining worsens random variation impact
- Total variation impact is insensitive to pipeline depth
- Variations worsen with an increasing number of critical paths
- Performance-enhancing techniques increase the number of critical paths; the simplest one (deep pipelining) increases overall random variations
- Energy awareness increases variability: selective voltage scaling, adaptive resource scaling, etc. exploit existing slack to hide voltage scaling latencies
Source: Shekhar Borkar, DAC 2004
5. Multiple Clock/Voltage Architectures
- Need for synergistic approaches that tie lower-level technology capabilities (supply and threshold voltage or speed scaling) to knobs available at higher levels of abstraction (microarchitecture, architecture, system level)
- Need for coping with design complexity issues through localized, fine-grain power management
  - Possible solution: voltage/frequency islands that may run at lower speed/power for a prescribed workload
- Need for dealing with variability, whether it comes from process parameters, system parameters, or the application/workload
Proposed solution: Multiple clock/voltage (MCV) domain microarchitectures
6. Using MCVs for energy awareness [ISCA 2002, ISCA 2005]
[Figure: Synchronous vs. MCV organization]
- Potential for fine-grain adaptability
  - Different speeds among synchronous blocks
  - Different voltages, and hence potential for lower power consumption
- Can be used with on-the-fly application-driven adaptation
- Fits nicely with the multi-* paradigm
  - Multi-core, multi-threading
7. How many frequency islands?
8. Performance increase coefficient
- Significant speed-up can be achieved by increasing the clock speed in the Fetch or Memory partitions, followed by the Integer and FP partitions
9. Delay - voltage dependency
- Fine-grain voltage reduction can be beneficial in GALS systems
- Each clock domain can be run at a different speed
- Vdd is the supply voltage
- Vt is the threshold voltage
- α is a technology-defined constant between 1 and 2
- α is 1.2-1.6 for the present generation [Chen et al., 1998]
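The symbols listed on this slide (supply voltage, threshold voltage, and a technology-dependent exponent between 1 and 2) match the widely used alpha-power delay model, in which gate delay is proportional to Vdd / (Vdd - Vt)^alpha. The slide's own equation is not reproduced in this transcript, so the exact form below is an assumption, and the function names and example voltages are illustrative only. A minimal sketch of how lowering the voltage of one clock domain slows it down:

```python
def relative_delay(vdd, vt, alpha):
    """Gate delay, up to a constant factor, under the alpha-power model.

    Assumed form: delay ~ Vdd / (Vdd - Vt) ** alpha.
    """
    if vdd <= vt:
        raise ValueError("supply voltage must exceed threshold voltage")
    return vdd / (vdd - vt) ** alpha


def slowdown(vdd_low, vdd_nominal, vt, alpha):
    """How many times slower a domain runs at vdd_low than at vdd_nominal."""
    return relative_delay(vdd_low, vt, alpha) / relative_delay(vdd_nominal, vt, alpha)


def dynamic_energy_ratio(vdd_low, vdd_nominal):
    """Switching energy ratio, assuming energy scales with Vdd squared."""
    return (vdd_low / vdd_nominal) ** 2
```

With hypothetical values Vt = 0.3 V and alpha = 1.4, dropping a domain from 1.2 V to 1.0 V gives `slowdown(1.0, 1.2, 0.3, 1.4)` of roughly 1.18, while `dynamic_energy_ratio(1.0, 1.2)` is about 0.69. This is the kind of trade-off a per-domain voltage setting exposes.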
10. A possible solution: Dynamic control strategy
- Threshold-based algorithm [Iyer et al., 2002]
  - Assumes two operating modes
  - Selects the appropriate mode based on the Issue Queue occupancy
- Attack-decay algorithm [Semeraro et al., 2002]
  - Assumes several (tens) operating modes
  - Tries to preserve the same Issue Queue occupancy, while more aggressively pursuing the best power/performance trade-off
- Additional improvements [Talpes et al., 2003, 2005]
  - Fetch clock speed scaled to match commit rates
  - Use cross-domain dependency information to eliminate false positives in low occupancy rates
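The two occupancy-driven controllers named on this slide can be sketched as follows. This is a simplified illustration, not the published algorithms: the watermarks, step sizes, frequency bounds, and the use of a normalized occupancy in [0, 1] are all assumptions made for the example, and the function names are hypothetical.

```python
def threshold_mode(occupancy, low_mark, high_mark, current_mode):
    """Two-mode controller: pick 'fast' or 'slow' from issue queue occupancy.

    Between the two watermarks the current mode is kept, which avoids
    toggling on small fluctuations (an assumed hysteresis band).
    """
    if occupancy >= high_mark:
        return "fast"
    if occupancy <= low_mark:
        return "slow"
    return current_mode


def attack_decay_step(freq, occupancy, prev_occupancy,
                      threshold=0.1, attack=0.05, decay=0.01,
                      f_min=0.25, f_max=1.0):
    """One control interval of a simplified attack-decay style controller.

    freq is a normalized domain frequency. A large occupancy change triggers
    a large step in the same direction (attack); otherwise the frequency
    drifts down slowly (decay) to probe for energy savings.
    """
    change = occupancy - prev_occupancy
    if change > threshold:       # queue filling up: domain is falling behind
        freq *= 1.0 + attack
    elif change < -threshold:    # queue draining: domain has slack
        freq *= 1.0 - attack
    else:                        # steady state: lower the speed a little
        freq *= 1.0 - decay
    return min(f_max, max(f_min, freq))
```

The threshold version needs only two operating points per domain, whereas the attack-decay version moves through many frequency settings over successive intervals, which is the distinction the slide draws between the two approaches.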
11. Impact on energy reduction [ISLPED 2003, IEEE TVLSI 2005]
- By using DVS, an average energy reduction of up to 25% can be achieved at the expense of a 10% penalty in performance
- Note that these are pessimistic results: variability is not taken into account
12. Energy-Variability interaction for MCV uarchs
- Potential for less process- or system-parameter-induced variability
  - Local clock domains exhibit static or dynamic variations with lower means
  - This enables faster local speeds AND better overall energy efficiency
- Intuitively
  - The number of critical paths per clock domain is smaller
  - Hence, less variability
- Beneficial impact on other parameters as well
  - Smaller variations in temperature per clock domain
[Figure: Synchronous vs. MCV organization]
13. Variability modeling [Bowman et al., 2002]
- Assume normal distributions for critical path delay (Tcp,nom: nominal critical path delay)
- Maximum critical path delay distribution (f: probability density, F: cumulative probability function, Ncp: number of critical paths)
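The slide's equations are not reproduced in this transcript. A common way to write the maximum-delay distribution, assuming the Ncp critical paths are independent and identically distributed (an assumption made here for illustration), is F_max(t) = F(t)^Ncp with density f_max(t) = Ncp * f(t) * F(t)^(Ncp - 1). The sketch below evaluates these expressions for normally distributed path delays; the standard deviation parameter and the function names are hypothetical.

```python
from statistics import NormalDist


def max_delay_cdf(t, n_cp, mean, sigma):
    """P(all n_cp independent paths finish by time t) = F(t) ** n_cp."""
    return NormalDist(mean, sigma).cdf(t) ** n_cp


def max_delay_pdf(t, n_cp, mean, sigma):
    """Density of the slowest path: n_cp * f(t) * F(t) ** (n_cp - 1)."""
    path = NormalDist(mean, sigma)
    return n_cp * path.pdf(t) * path.cdf(t) ** (n_cp - 1)


def mean_max_delay(n_cp, mean, sigma, steps=4000, span=8.0):
    """Mean of the slowest-path delay via trapezoidal integration of t * pdf."""
    lo, hi = mean - span * sigma, mean + span * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        total += weight * t * max_delay_pdf(t, n_cp, mean, sigma)
    return total * h
```

Under these assumptions, `mean_max_delay` grows as `n_cp` increases, so a clock domain that contains fewer critical paths has a lower expected worst-case delay than a single fully synchronous design containing all of them.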
14. Energy-performance-variability [DAC 2005, IEEE MICRO 2005]
- A possible probabilistic design metric that needs to be maximized (FMAX: clock speed distribution) [Borkar, 2004]
- However, in the case of high-end processors
  - Clock speed does not necessarily translate into performance
  - Moreover, IPC-increasing artifacts affect variability
- Proposed joint metric that must be minimized
  - The goal is to include variability in the maximum critical path or minimum clock speed, with and without temperature modeling (θ: temperature, ncp: logic depth) [Basu et al., 2004]
15. Critical path delay distribution without Temp
- Locally clocked domains have a mean value for the maximum critical path delay that is 2-12% smaller than for the fully synchronous baseline
16. Critical path delay distribution with Temp
- Locally clocked domains have a mean value for the maximum critical path delay that is 8-18% smaller than for the fully synchronous baseline
17. Q Metric Distribution w/ and w/o Temp
- Using local speed/voltage scaling per clock domain decreases Q by 26% when compared to the synchronous baseline
18. Instead of summary
- Microarchitectural modeling of process variability effects is possible
- In conjunction with fine-grain DVS, minimally-clocked machines provide a better joint energy/performance/variability metric than their fully synchronous counterparts
- Considered only WID-induced gate length effects and temperature-induced effects
- Ahead
  - both WID and D2D variability
  - leakage variations
  - true microarchitecture design exploration with variability in mind
19. Thank you!
- More information
  - http://www.ece.cmu.edu/enyac