Title: PowerAnalyzer for Pocket Computers
1PowerAnalyzer for Pocket Computers
- Dr. Robert Graybill's PAC/C Program
- Fourth Review, September 13, 2002
- Todd Austin and Trevor Mudge, U. Michigan
- Dirk Grunwald, U. Colorado
- http://www.eecs.umich.edu/jringenb/power/
2Status: PowerAnalyzer and Related Projects
- Budget summary
- Remaining schedule
- September release of PowerAnalyzer
- Simplescalar ARM-based platform
- MiBench
- Drowsy Cache
- Vertigo
3Budget Summary
- As of September 30, 2002, our total expenses are anticipated to be approximately $694,965 and our total award to date is $698,578, so we will carry over only about $3,600 into FY2003.
- Our total award was $797,414
- Remaining no-cost extension allocation: $98,836
4Schedule thru June 03
- September 1, 02: PowerAnalyzer version 1 release
- October 15, 02: Test release of iPAQ platform simulator
- December 1, 02: Complete test integration of PowerAnalyzer into platform simulator
- January 1, 03: Release complete power models for all the functional units supported by SimpleScalar. Power models are still missing for some functional blocks:
- RUU unit using fully associative memory.
- Load/Store Queue etc.
- Floating point units power model.
- Refine area / clocked node capacitance models to support GALS research.
- February 1, 03: Rev 2.0 release of Simulators + PA
- March 1, 03: Calibration for ARM7 TDMI. Co-work with Seoul National University (Prof. Naehyuck Chang's group).
- April 1, 03: Provide technology scalable power model.
5September Release
- Cache memory
- Datapath
- Random Logic
- Clock tree
- Chip I/O pads
- Buses
- Data sensitivity
- Leakage
6Leakage power model
- Using empirical equation for the leakage current
- Estimate effective channel width for a simple
inverter
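[Figure: inverter schematic. PMOS P1 has width Winvp1 and leakage current ILeakp; NMOS N1 has width Winvn1 and leakage current ILeakn.]
- Worst case estimation!
$W_{eff} = \max(\mu_p \times W_{invp1},\ \mu_n \times W_{invn1})$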
7Leakage power model
- Estimate effective channel width for 2-input nand
gate
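[Figure: 2-input NAND gate schematic. PMOS P1 and P2 have widths Wnandp1 and Wnandp2; NMOS N1 and N2 have widths Wnandn1 and Wnandn2.]
$W_{eff} = \max(\mu_p \times W_{eff\_nandp},\ \mu_n \times W_{eff\_nandn})$
$W_{eff\_nandp} = W_{nandp1} + W_{nandp2}$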
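The two effective-width estimates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads the slide's Weff_nandp as the sum of the two PMOS widths, takes the NMOS effective width as an input because the slides do not define it, and uses made-up widths and mobility weights. The function names are hypothetical and not part of PowerAnalyzer.

```python
def weff_inverter(w_p, w_n, mu_p, mu_n):
    """Worst-case effective width of an inverter: the larger of the
    mobility-weighted PMOS and NMOS widths."""
    return max(mu_p * w_p, mu_n * w_n)


def weff_nand2(w_p1, w_p2, weff_n, mu_p, mu_n):
    """Worst-case effective width of a 2-input NAND gate.

    Assumption: the PMOS effective width is the sum of both PMOS widths.
    The NMOS effective width (weff_n) is supplied by the caller because
    the slides do not give a formula for it.
    """
    weff_p = w_p1 + w_p2
    return max(mu_p * weff_p, mu_n * weff_n)


if __name__ == "__main__":
    # Hypothetical widths (um) and relative mobility weights.
    print(weff_inverter(w_p=2.0, w_n=1.0, mu_p=0.4, mu_n=1.0))
    print(weff_nand2(w_p1=2.0, w_p2=2.0, weff_n=1.0, mu_p=0.4, mu_n=1.0))
```

Taking the maximum of the two weighted widths is what makes this a worst-case estimate: whichever network leaks more sets the result.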
8Leakage power model
- Applying the leakage power model to
- memory decoder (nand/nor gates)
- memory cell (inverters)
- repeater/buffers (inverters)
- Calibrated against HSPICE for a 0.07 µm technology
- Gives the same order of magnitude as HSPICE for various circuits.
- Extending leakage power model to random logic circuits
- Random logic is usually modeled by a group of nand-equivalent circuits.
- Estimate total Weff from the given number of gates in the functional unit.
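The random-logic extension amounts to scaling a per-gate effective width by the gate count. A minimal sketch follows, assuming leakage current is simply proportional to total effective width. That linear per-micron coefficient stands in for the empirical equation, which the slides do not spell out, and every number below is hypothetical.

```python
def random_logic_weff(num_gates, weff_per_nand_um):
    """Total effective width of a block modeled as num_gates NAND-equivalent gates."""
    return num_gates * weff_per_nand_um


def leakage_power(weff_total_um, ileak_per_um, vdd):
    """Leakage power in watts.

    Assumption: leakage current scales linearly with effective width
    (ileak_per_um, in amperes per micron). This is a placeholder for the
    calibrated empirical equation, not the model itself.
    """
    return weff_total_um * ileak_per_um * vdd


if __name__ == "__main__":
    weff = random_logic_weff(num_gates=1000, weff_per_nand_um=1.5)
    print(weff, leakage_power(weff, ileak_per_um=2e-9, vdd=1.0))
```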
9Newly added power models
- Translation Lookaside Buffer / Branch Target Buffer (TLB/BTB)
- SimpleScalar interface for extracting x-switching / y-switching info.
[Figure: TLB/BTB as a cache-like structure with a tag array and a data array. The address is the x-switching component; the physical page number / branch address is the y-switching component; each array has an internal / leakage component.]
10Newly added power models
- Branch predictor components
- Bimodal/Return Address Stack
- 2-level branch predictor
- Consists of two memory arrays.
- The output of the first array becomes the input of the second array.
[Figure: branch predictor arrays. Bimodal/RAS: the address is the x-switching component, the 2-bit counter value / return address is the y-switching component, plus an internal / leakage component. 2-level: the address is the x-switching component of Lev 1; the y-switching component of Lev 1 is the x-switching component of Lev 2; the 2-bit counter value is the y-switching component of Lev 2.]
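The chaining of the two arrays can be illustrated with a generic textbook two-level predictor. This is a sketch only, not SimpleScalar's implementation: the table sizes, the indexing by `pc >> 2`, and the class name are all assumptions chosen for brevity.

```python
class TwoLevelPredictor:
    """Generic two-level predictor: level 1 holds branch histories,
    level 2 holds 2-bit saturating counters indexed by that history."""

    def __init__(self, l1_entries=4, history_bits=2):
        self.history_bits = history_bits
        self.l1 = [0] * l1_entries            # level 1: history registers
        self.l2 = [1] * (1 << history_bits)   # level 2: counters, weakly not-taken

    def _l1_index(self, pc):
        # Hypothetical indexing: drop the two low address bits, then wrap.
        return (pc >> 2) % len(self.l1)

    def predict(self, pc):
        history = self.l1[self._l1_index(pc)]  # output (y) of level 1 ...
        return self.l2[history] >= 2           # ... is the input (x) of level 2

    def update(self, pc, taken):
        i = self._l1_index(pc)
        history = self.l1[i]
        counter = self.l2[history]
        self.l2[history] = min(3, counter + 1) if taken else max(0, counter - 1)
        mask = (1 << self.history_bits) - 1
        self.l1[i] = ((history << 1) | int(taken)) & mask


if __name__ == "__main__":
    p = TwoLevelPredictor()
    for _ in range(8):
        p.update(0x100, True)
    print(p.predict(0x100))
```

Because the level-1 output feeds the level-2 index directly, any toggling on the level-1 output lines is counted once as y-switching of the first array and once as x-switching of the second.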
11Newly added power models
- Newly added power models are fully linked to SimpleScalar micro-architectural parameters.
- Changes to the µ-arch parameters are fed to the power model generator automatically.
- Providing optimized memory model for small size array.
- BTB/TLB/RAS/branch predictors are small compared to large cache memory.
- Extracting x-switching / y-switching activities from the simulator on the fly.
- Providing all the branch predictor types supported by SimpleScalar
- Bimodal
- 2-level
- Combination
- RAS
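One way to picture on-the-fly activity extraction is to count bit toggles between consecutive accesses to a structure: toggles on the input lines stand for x-switching and toggles on the output lines for y-switching. The sketch below assumes that toggle counts (Hamming distance) are an adequate activity measure; the class and its fields are hypothetical and not PowerAnalyzer's interface.

```python
class SwitchingMonitor:
    """Counts input (x) and output (y) bit toggles across successive accesses."""

    def __init__(self):
        self.prev_in = 0
        self.prev_out = 0
        self.x_toggles = 0
        self.y_toggles = 0
        self.accesses = 0

    def access(self, in_bits, out_bits):
        # Hamming distance between this access and the previous one.
        self.x_toggles += bin(self.prev_in ^ in_bits).count("1")
        self.y_toggles += bin(self.prev_out ^ out_bits).count("1")
        self.prev_in = in_bits
        self.prev_out = out_bits
        self.accesses += 1


if __name__ == "__main__":
    m = SwitchingMonitor()
    m.access(0b1010, 0b0001)
    m.access(0b0110, 0b0011)
    print(m.x_toggles, m.y_toggles, m.accesses)
```

A power model would then multiply each toggle count by a per-bit switching energy for the corresponding lines.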
12Improved power analysis out fmt.
x switching: power dissipation caused by input switching or transition
y switching: power dissipation depending on output switching or transition
internal: power dissipation independent of in/output switching
leakage: leakage power dissipation
pdissipation = x switching + y switching + internal + leakage
peak: peak pdissipation
aio.xswitching        720.2696  # aio total x switching power
aio.avgxswitching       0.0511  # aio average x switching power
aio.yswitching       4417.1984  # aio total y switching power
aio.avgyswitching       0.3110  # aio average y switching power
aio.internal            0.0000  # aio total internal power
aio.avginternal         0.0000  # aio average internal power
aio.leakage             0.1242  # aio total leakage power
aio.avgleakage          0.0001  # aio average leakage power
aio.pdissipation     5137.5922  # aio total pdissipation
aio.avgpdissipation     0.3621  # aio average pdissipation
aio.peak                2.2747  # aio peak power
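The sample figures can be checked against the definition of pdissipation as the sum of the four components. The parser below is a sketch that assumes each statistic appears as a name followed by a value, with any trailing description ignored; it is not part of the tool.

```python
SAMPLE = """\
aio.xswitching 720.2696
aio.yswitching 4417.1984
aio.internal 0.0000
aio.leakage 0.1242
aio.pdissipation 5137.5922
"""

COMPONENTS = ("xswitching", "yswitching", "internal", "leakage")


def parse_stats(text):
    """Parse 'name value' lines into a dict; text after '#' is ignored."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            stats[fields[0]] = float(fields[1])
    return stats


def components_sum(stats, prefix):
    """Sum the four power components reported for one unit."""
    return sum(stats[f"{prefix}.{name}"] for name in COMPONENTS)


if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    total = components_sum(stats, "aio")
    print(abs(total - stats["aio.pdissipation"]) < 1e-6)
```

For the aio unit the components add up to the reported total: 720.2696 + 4417.1984 + 0.0000 + 0.1242 = 5137.5922.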
13Interface with MILAN
- PowerAnalyzer configuration parameters
- fully compatible with SimpleScalar configuration parameters.
- integrated with SimpleScalar architectural parameters.
- command line options
sim-panalyzer -cache:il1 il1:512:32:1:l -panalyzer:il1 il1a2331.84110.25
- configuration file
sim-panalyzer -config sa-110.cfg ./microbench/dhrystone
sa-110.cfg, l1 cache configuration: -cache:il1 il1:512:32:1:l
sa-110.cfg, l1 cache power analyzer configuration: -panalyzer:il1 il12331.84110.25
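SimpleScalar cache options take the form `<name>:<nsets>:<bsize>:<assoc>:<repl>`. Reading the slide's L1 instruction cache setting as `il1:512:32:1:l` is an assumption, since the separators did not survive in the text. Under that reading, a short sketch shows how the fields map to a cache capacity. The helper names are hypothetical, and the power analyzer option is left out because its field layout is not recoverable here.

```python
from typing import NamedTuple


class CacheConfig(NamedTuple):
    name: str
    nsets: int   # number of sets
    bsize: int   # block size in bytes
    assoc: int   # associativity
    repl: str    # replacement policy letter, e.g. 'l' for LRU


def parse_cache_config(spec):
    """Split a '<name>:<nsets>:<bsize>:<assoc>:<repl>' string into fields."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), repl)


def capacity_bytes(cfg):
    """Total data capacity implied by the geometry."""
    return cfg.nsets * cfg.bsize * cfg.assoc


if __name__ == "__main__":
    cfg = parse_cache_config("il1:512:32:1:l")
    print(cfg, capacity_bytes(cfg))
```

Because the power models are tied to the same parameters, changing the geometry in one place changes both the timing model and the power model.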
14SimpleScalar/ARM Platform Simulation
- System-level simulation support for SimpleScalar/ARM
- Adds device modeling to ARM modeling infrastructure
- Permits execution of applications plus kernel and device drivers
- Increases fidelity of embedded workload modeling
- Key design goals
- Implementation of a complete embedded device set
- Flexible device configuration
- Extensible device interface
- Reproducible real-time experiments
[Figure: simulator stack. Workloads (SPEC, MiBench, etc.) on Linux, WinCE, etc.; ARM7 ISA and ARM FPA; device emulation; power/performance model; SA-1100/XScale core with fetch, pipeline, predictor, and caches; simulation kernel; host platform interface.]
15Device Modeling Infrastructure
- Space manager
- Orchestrates memory and I/O accesses
- Devices register for address ranges
- Simple to integrate new devices
- Configuration manager
- Allows multiple platform emulation without code changes
- Via platform configuration files
- Specifies device address and other functionality parameterization
- iPAQ -> LW through configuration file
- I/O manager
- Permits reproducible real-time experiments
- Records external I/O to trace file with timestamps
- Replay I/O to re-create experiment
- Initial device set
- SA-1110 pipeline and integrated devices
- PCMCIA
- Enables Bochs drivers
- NE-2000 network interface
[Figure: device block diagram. SA-1110 integer pipeline with I-cache, IMMU, D-cache, DMMU, and FPA; Space Manager connecting RAM, PIC, RTC, Flash, DMA, PCMCIA, SER0, and GPIO; I/O Mgr; Platform Config. Legend: implemented, in development, next generation.]
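The space manager idea, where devices register for address ranges and accesses are routed to the owner, can be sketched as follows. This is a minimal illustration with hypothetical class and method names, byte-wide accesses only, and a toy RAM device; it is not the simulator's actual interface.

```python
class SpaceManager:
    """Routes memory and I/O accesses to the device that owns the address."""

    def __init__(self):
        self._ranges = []  # list of (base, size, device)

    def register(self, base, size, device):
        # Reject a range that overlaps one already registered.
        for b, s, _ in self._ranges:
            if base < b + s and b < base + size:
                raise ValueError("address range overlaps an existing device")
        self._ranges.append((base, size, device))

    def _find(self, addr):
        for b, s, dev in self._ranges:
            if b <= addr < b + s:
                return dev, addr - b
        raise LookupError(f"no device mapped at {addr:#x}")

    def read(self, addr):
        dev, offset = self._find(addr)
        return dev.read(offset)

    def write(self, addr, value):
        dev, offset = self._find(addr)
        dev.write(offset, value)


class Ram:
    """Toy byte-addressable device used only for the example."""

    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, offset):
        return self.data[offset]

    def write(self, offset, value):
        self.data[offset] = value & 0xFF


if __name__ == "__main__":
    sm = SpaceManager()
    sm.register(0x1000, 0x100, Ram(0x100))
    sm.write(0x1004, 0xAB)
    print(hex(sm.read(0x1004)))
```

Adding a new device then only requires an object with `read` and `write` methods plus one `register` call, which is the sense in which integration stays simple.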
16I/O Tracing Technology
[Figure: I/O tracing flow. Today: a fast functional simulator with instrumented device models produces an external I/O trace (initial state plus I/O input events with time-stamps) for detailed simulation with deep analysis. Future applications: live hardware execution with instrumented device drivers; test synthesis framework.]
- Fully addresses the difficulties of creating reproducible real-time experiments with support for deep analysis
- 1000x compression over industry-standard branch tracing
- Re-creates all program data values (unlike branch tracing)
- External I/O events traced
- DMA, memory-mapped I/O, external interrupts
- All events time-stamped with processor cycle count
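Record and replay of cycle-stamped external events can be sketched as below. The JSON encoding, the event fields, and both class names are assumptions made for the example; the slides do not describe the actual trace format.

```python
import json


class IOTraceRecorder:
    """Collects external I/O events, each stamped with a processor cycle count."""

    def __init__(self):
        self.events = []

    def record(self, cycle, kind, payload):
        self.events.append({"cycle": cycle, "kind": kind, "payload": payload})

    def dumps(self):
        return json.dumps(self.events)


class IOTraceReplayer:
    """Feeds recorded events back to a simulation at their original cycles."""

    def __init__(self, text):
        self._events = sorted(json.loads(text), key=lambda e: e["cycle"])
        self._next = 0

    def due(self, cycle):
        """Return every not-yet-delivered event stamped at or before `cycle`."""
        out = []
        while self._next < len(self._events) and self._events[self._next]["cycle"] <= cycle:
            out.append(self._events[self._next])
            self._next += 1
        return out


if __name__ == "__main__":
    rec = IOTraceRecorder()
    rec.record(10, "interrupt", {"irq": 3})
    rec.record(25, "dma", {"addr": 4096, "data": [1, 2, 3]})
    rep = IOTraceReplayer(rec.dumps())
    for cycle in (9, 10, 30):
        print(cycle, [e["kind"] for e in rep.due(cycle)])
```

Only external inputs are stored. Everything else is recomputed by re-running the program from the recorded initial state, which is why such a trace can be far smaller than a branch trace while still reproducing all data values.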
17System Model Validation
- System level validation approach
- Driver scripts drive reference execution
- Boot Linux, execute test programs
- Run on iPAQ H/W and Sim-IPaq
- Differences in output drive debug
- Challenging debug task
- All bugs to date required debugging execution problems within Linux kernel and device drivers, on Sim-IPaq model
- Forced refinement of simulator debug infrastructure
- Event tracking, tracing added
- Support for memory write snooping
- Dataflow trace mechanism implemented
- Current status
- 42M instructions into Linux boot
- Bootloader fully functional
- Kernel modules mmap, vmap, console, serial, intr, timer, rts, deflate, fpu, flash, and environ are functional
- Currently attacking bugs in filesystem
- Identified two Linux kernel timing bugs
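[Figure: validation flow. Linux shell scripts drive the Sim-IPaq system simulation model; program output and kernel messages are compared against the reference run; on a mismatch (!), debug the kernel on Sim-IPaq using the simulator debug infrastructure.]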
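The comparison step, where differences in output drive debugging, can be sketched as finding the first line at which the hardware log and the simulator log diverge. The function below is a hypothetical helper that assumes both runs have been captured as lists of text lines.

```python
def first_divergence(reference_lines, simulated_lines):
    """Return (index, reference_line, simulated_line) at the first mismatch,
    or None when the two logs are identical. A missing line is reported as None."""
    for i, (ref, sim) in enumerate(zip(reference_lines, simulated_lines)):
        if ref != sim:
            return i, ref, sim
    if len(reference_lines) != len(simulated_lines):
        i = min(len(reference_lines), len(simulated_lines))
        ref = reference_lines[i] if i < len(reference_lines) else None
        sim = simulated_lines[i] if i < len(simulated_lines) else None
        return i, ref, sim
    return None


if __name__ == "__main__":
    hardware = ["booting", "console ready", "mounting root"]
    simulator = ["booting", "console ready"]
    print(first_divergence(hardware, simulator))
```

The reported index gives a starting point for the simulator's event tracing: the bug lies at or before the point where the simulated output first departs from the hardware run.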
18EASE: Embedded Architecture Simulation Engine
[Figure: pipeline diagram with stages IF, ID, EX/MEM, and WB, and three functional units (FU).]
- Fetch Insts
- Place Insts in IFQ
- Decode Insts
- Schedule FUs
- Send Ready Insts to EX in program order
- Execute Insts
- Schedule register writeback
- Write result to RF
- Free FUs
- Validate result
- Complete generality of SimpleScalar microarchitecture models is not needed for embedded processor modeling
- Register renaming, dynamic scheduling, value speculation not used
- Learning curve is too steep for embedded modeling
- EASE implements a fast flexible embedded architecture
- In-order pipeline, superscalar execution, co-processor interfaces, multi-level memory hierarchy, conventional DRAM architectures
- Small model, less than 2500 lines of C code
- Emulator computes latch data at all pipeline stages (micro-functional)
- Writeback checker validates the results of all instructions simulated
- Accurately models short SA-1110 pipeline, and longer XScale pipe
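The combination of per-stage latch data and a writeback checker can be illustrated with a toy in-order model. This sketch is not EASE: it has no timing, a three-operation instruction set invented for the example, and a checker that merely recomputes each result from architectural state before the register file is written.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def reference_execute(inst, regs):
    """Independent functional result, computed from architectural state."""
    op, _dst, src_a, src_b = inst
    return OPS[op](regs[src_a], regs[src_b])


def run_in_order(program, initial_regs):
    """Execute instructions in program order, checking each result at writeback."""
    regs = dict(initial_regs)
    for inst in program:
        op, dst, src_a, src_b = inst
        # ID: latch the operation and operand values.
        latch = {"op": op, "a": regs[src_a], "b": regs[src_b]}
        # EX: compute the result from latched data only.
        latch["result"] = OPS[latch["op"]](latch["a"], latch["b"])
        # WB: validate against the reference before writing the register file.
        expected = reference_execute(inst, regs)
        if latch["result"] != expected:
            raise AssertionError(f"writeback check failed for {inst}")
        regs[dst] = latch["result"]
    return regs


if __name__ == "__main__":
    program = [("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r2")]
    print(run_in_order(program, {"r0": 2, "r1": 3, "r2": 0, "r3": 0}))
```

In a timed pipeline the latched operands could be stale or forwarded incorrectly, and that is the class of error a writeback check of this kind is meant to expose.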
19MiBench 2.0
- Changes
- Bug fixes / portability fixes
- Modify input to be more realistic
- Standardized random number generation
- Remove dependencies on Linux
- Additions
- Move input files into code (reduce file I/O)
- Move command line input into code (reduce dependency on shell environment)
- Create global build/support environment
- Generate precompiled binaries for several target systems
- Add new benchmarks
- Handwriting recognition
- Network packet routing
- New FFT and GSM Code
20Recent Publications
- Nam Sung Kim, Krisztián Flautner, David Blaauw, Trevor Mudge. Drowsy Instruction Caches: Leakage Power Reduction using Dynamic Voltage Scaling and Cache Sub-bank Prediction. 35th Int. Symp. on Microarchitecture (MICRO 35), Istanbul, Nov. 2002.
- K. Flautner and T. Mudge. Vertigo: Automatic Performance-Setting for Linux. Fifth Symposium on Operating Systems Design and Implementation (OSDI), Boston, Dec. 2002.
- S. Martin, K. Flautner, D. Blaauw, and T. Mudge. Dynamic voltage scaling and adaptive body biasing for optimal power consumption in microprocessors under dynamic workloads. ICCAD.
- K. Flautner, S. Reinhardt, and T. Mudge. Automatic performance setting for dynamic voltage scaling. ACM Jour. Wireless Networks.
- K. Flautner, N. Kim, S. Martin, D. Blaauw, T. Mudge. Drowsy Caches: Simple techniques for reducing leakage power. Proc. of the 29th Ann. Int. Symp. on Computer Architecture, Anchorage, Alaska, May 2002, pp. 148-157.
21Adding a DSP to the Platform
- Texas Instruments TMS320C62xx and C64xx
- Shared memory model
- Heterogeneous multiprocessor
- C62xx functional and tested
- Early studies using Smart Camera workload and ETSI GSM voice codec
- Comparison of C6xx with 4-issue ARM out-of-order processor
22TMS320C6200
23DSP vs. GPP
24Vertigo: Interactive application
[Figure: plot of Vertigo performance level versus time (s).]
- Performance-setting decisions during run of Acrobat Reader
- Data collected on x86 based machine under Linux
- Invention disclosure
25Drowsy Caches
- Extended to Instruction Caches
- Invention disclosure
26Implementation of the Drowsy Cache Line
27ARM Partners Meeting
- Summer 2002
- Vertigo demo
- Drowsy presentation
- Negotiating IP sale
28Fin