PowerAnalyzer for Pocket Computers

1
PowerAnalyzer for Pocket Computers
  • Dr. Robert Graybill's PAC/C Program
  • Fourth Review September 13, 2002
  • Todd Austin and Trevor Mudge, U. Michigan
  • Dirk Grunwald, U. Colorado
  • http://www.eecs.umich.edu/jringenb/power/

2
Status: PowerAnalyzer and Related Projects
  • Budget summary
  • Remaining schedule
  • September release of PowerAnalyzer
  • SimpleScalar ARM-based platform
  • MiBench
  • Drowsy Cache
  • Vertigo

3
Budget Summary
  • As of September 30, 2002, our total expenses are
    anticipated to be approximately $694,965 and
    our total award to date is $698,578, so we will
    carry over only about $3,600 into FY2003.
  • Our total award was $797,414.
  • Remaining no-cost extension allocation: $98,836

4
Schedule thru June 02
  • September 1, 02: PowerAnalyzer version 1 release
  • October 15, 02: Test release of iPAQ platform
    simulator
  • December 1, 02: Complete test integration of
    PowerAnalyzer into platform simulator
  • January 1, 03: Release complete power models for
    all the functional units supported by
    SimpleScalar. Power models remain to be written
    for some functional blocks:
  • RUU unit using fully associative memory
  • Load/Store Queue, etc.
  • Floating-point unit power model
  • Refine area / clocked node capacitance models to
    support GALS research
  • February 1, 03: Rev 2.0 release of Simulators +
    PA
  • March 1, 03: Calibration for ARM7TDMI, in
    collaboration with Seoul National University
    (Prof. Naehyuck Chang's group)
  • April 1, 03: Provide technology-scalable power
    model

5
September Release
  • Cache memory
  • Datapath
  • Random Logic
  • Clock tree
  • Chip I/O pads
  • Buses
  • Data sensitivity
  • Leakage

6
Leakage power model
  • Using empirical equation for the leakage current
  • Estimate effective channel width for a simple
    inverter

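[Figure: inverter schematic with PMOS P1 (width Winvp1, leakage current ILeakp) and NMOS N1 (width Winvn1, leakage current ILeakn)]
Worst case estimation!
Weff = max(µp x Winvp1, µn x Winvn1)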
7
Leakage power model
  • Estimate effective channel width for 2-input nand
    gate

[Figure: 2-input NAND schematic with PMOS P1, P2 (widths Wnandp1, Wnandp2) and NMOS N1, N2 (widths Wnandn1, Wnandn2)]
Weff = max(µp x Weff_nandp, µn x Weff_nandn)
Weff_nandp = Wnandp1 + Wnandp2
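
The effective-width rule for the inverter and the 2-input NAND gate can be sketched in a few lines of Python. This is only an illustration: the mobility factors and transistor widths are placeholder values, the two PMOS widths of the NAND are assumed to add, and Weff_nandn is passed in by the caller because the slides do not define it.

```python
def weff_inverter(mu_p, w_invp1, mu_n, w_invn1):
    """Worst-case effective leakage width of a simple inverter."""
    return max(mu_p * w_invp1, mu_n * w_invn1)


def weff_nand2(mu_p, w_nandp1, w_nandp2, mu_n, weff_nandn):
    """Worst-case effective leakage width of a 2-input NAND gate.

    Assumes the two PMOS widths add: Weff_nandp = Wnandp1 + Wnandp2.
    weff_nandn is supplied by the caller (not defined on the slide).
    """
    weff_nandp = w_nandp1 + w_nandp2
    return max(mu_p * weff_nandp, mu_n * weff_nandn)


# Placeholder values in arbitrary units (assumptions, not PowerAnalyzer data).
MU_P, MU_N = 1.0, 2.0
inv = weff_inverter(MU_P, 4.0, MU_N, 1.5)     # max(4.0, 3.0)
nand = weff_nand2(MU_P, 2.0, 2.0, MU_N, 1.0)  # max(4.0, 2.0)
print(inv, nand)
```

Taking the maximum of the two networks is what makes this a worst-case estimate: whichever side leaks more sets the width used for the gate.
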
8
Leakage power model
  • The leakage power model is applied to:
  • memory decoder (nand/nor gates)
  • memory cell (inverters)
  • repeater/buffers (inverters)
  • Calibrated against HSPICE for a 0.07 µm technology
  • Results are of the same order of magnitude for
    various circuits.
  • Extending the leakage power model to random logic
    circuits
  • Random logic is usually modeled as a group of
    nand-equivalent circuits.
  • Total Weff is estimated from the given number of
    gates in the functional unit.
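
A minimal sketch of the random-logic estimate, assuming every NAND-equivalent gate contributes the same Weff and that leakage current scales linearly with total width. The slides do not give the empirical leakage equation, so `ileak_per_width` is a hypothetical stand-in for it, and all numbers are placeholders.

```python
def total_weff_random_logic(num_nand_equiv_gates, weff_per_nand):
    """Total effective width of a block modeled as NAND-equivalent gates."""
    return num_nand_equiv_gates * weff_per_nand


def leakage_power(total_weff, ileak_per_width, vdd):
    """Leakage power P = Vdd * I_leak.

    I_leak is taken as proportional to total Weff (an assumption that
    stands in for the empirical equation, which the slides do not give).
    """
    return vdd * ileak_per_width * total_weff


# Placeholder inputs (assumptions): 10,000 gates, arbitrary width units.
weff_total = total_weff_random_logic(10_000, 4.0)
p_leak = leakage_power(weff_total, ileak_per_width=1e-9, vdd=1.0)
print(weff_total, p_leak)
```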

9
Newly added power models
  • Translation Lookaside Buffer / Branch Target Buffer
    (TLB/BTB)
  • SimpleScalar Interface for extracting x-switching
    / y-switching info.

[Figure: TLB/BTB modeled as a cache-like structure with a tag array and a data array. The address input is the x-switching component, the physical page number / branch address output is the y-switching component, and each array has an internal / leakage component.]
10
Newly added power models
  • Branch predictor components: Bimodal / Return
    Address Stack
  • Branch predictor components: 2-level branch
    predictor
  • Consists of two memory arrays.
  • The output of the first array becomes the input of
    the second array.

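[Figure: Bimodal / RAS as a single array: address in (x-switching component), internal / leakage component, 2-bit counter value / return address out (y-switching component). 2-level predictor as two arrays, Lev 1 and Lev 2: address into Lev 1 (x-switching component), the y-switching component of Lev 1 is the x-switching component of Lev 2, and the 2-bit counter value out of Lev 2 is its y-switching component.]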
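
The two-array structure of the 2-level predictor can be sketched as follows. This is a generic illustration rather than SimpleScalar's implementation: the table sizes, the PC hashing, and the counter initial value are assumptions.

```python
class TwoLevelPredictor:
    """Two memory arrays: level 1 holds branch histories, level 2 holds
    2-bit saturating counters indexed by the level-1 output."""

    def __init__(self, l1_entries=4, hist_bits=4):
        self.hist_bits = hist_bits
        self.l1 = [0] * l1_entries        # level 1: history registers
        self.l2 = [1] * (1 << hist_bits)  # level 2: counters, weakly not-taken

    def _l1_index(self, pc):
        return (pc >> 2) % len(self.l1)   # assumed hash: word-aligned PC

    def predict(self, pc):
        history = self.l1[self._l1_index(pc)]  # output of the first array...
        return self.l2[history] >= 2           # ...is the input of the second

    def update(self, pc, taken):
        i = self._l1_index(pc)
        history = self.l1[i]
        counter = self.l2[history]
        self.l2[history] = min(3, counter + 1) if taken else max(0, counter - 1)
        mask = (1 << self.hist_bits) - 1
        self.l1[i] = ((history << 1) | int(taken)) & mask


bp = TwoLevelPredictor()
first_guess = bp.predict(0x1000)
for _ in range(8):                 # train on an always-taken branch
    bp.update(0x1000, True)
trained_guess = bp.predict(0x1000)
print(first_guess, trained_guess)
```

In a power model, the PC presented to level 1 and the history presented to level 2 are the inputs whose bit transitions would be counted as x-switching, and the values read out of each array as y-switching.
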
11
Newly added power models
  • Newly added power models are fully linked to
    SimpleScalar micro-architectural parameters.
  • Changes to the µ-arch parameters are fed to
    the power model generator automatically.
  • Provides an optimized memory model for small
    arrays.
  • BTB/TLB/RAS/branch predictors are small compared
    to large cache memories.
  • Extracts x-switching / y-switching activity
    from the simulator on the fly.
  • Provides all the branch predictor types
    supported by SimpleScalar
  • Bimodal
  • 2-level
  • Combination
  • RAS
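
One plausible way to extract switching activity on the fly is to count the bits that toggle between consecutive values on a structure's input and output. The sketch below assumes that definition (Hamming distance between successive values); the class name and interface are hypothetical and are not taken from PowerAnalyzer.

```python
class SwitchingMonitor:
    """Counts bit toggles on a structure's input (x) and output (y)."""

    def __init__(self):
        self.prev_in = 0
        self.prev_out = 0
        self.x_toggles = 0
        self.y_toggles = 0

    def access(self, in_value, out_value):
        # XOR exposes the bits that changed since the previous access.
        self.x_toggles += bin(in_value ^ self.prev_in).count("1")
        self.y_toggles += bin(out_value ^ self.prev_out).count("1")
        self.prev_in, self.prev_out = in_value, out_value


mon = SwitchingMonitor()
mon.access(0b1010, 0b0001)   # x: 2 bits toggle, y: 1 bit toggles
mon.access(0b1001, 0b0011)   # x: 2 bits toggle, y: 1 bit toggles
print(mon.x_toggles, mon.y_toggles)
```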

12
Improved power analysis output format
  • x switching: power dissipation caused by input
    switching or transition
  • y switching: power dissipation depending on
    output switching or transition
  • internal: power dissipation independent of
    in/output switching
  • leakage: leakage power dissipation
  • pdissipation = x switching + y switching +
    internal + leakage
  • peak: peak pdissipation

aio.xswitching       720.2696   aio total x switching power
aio.avgxswitching      0.0511   aio average x switching power
aio.yswitching      4417.1984   aio total y switching power
aio.avgyswitching      0.3110   aio average y switching power
aio.internal           0.0000   aio total internal power
aio.avginternal        0.0000   aio average internal power
aio.leakage            0.1242   aio total leakage power
aio.avgleakage         0.0001   aio average leakage power
aio.pdissipation    5137.5922   aio total pdissipation
aio.avgpdissipation    0.3621   aio average pdissipation
aio.peak               2.2747   aio peak power
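
The sample figures are consistent with pdissipation being the sum of the x switching, y switching, internal, and leakage totals. A small sketch that parses the totals and checks this is shown below; the `name value` line layout is an assumption about the output format, and only the numbers come from the slide.

```python
SAMPLE = """\
aio.xswitching 720.2696
aio.yswitching 4417.1984
aio.internal 0.0000
aio.leakage 0.1242
aio.pdissipation 5137.5922
"""


def parse_stats(text):
    """Parse 'name value [description]' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            stats[parts[0]] = float(parts[1])
    return stats


stats = parse_stats(SAMPLE)
components = ("xswitching", "yswitching", "internal", "leakage")
total = sum(stats["aio." + name] for name in components)
print(round(total, 4), stats["aio.pdissipation"])
```
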
13
Interface with MILAN
  • PowerAnalyzer configuration parameters
  • fully compatible with SimpleScalar configuration
    parameters.
  • integrated with SimpleScalar architectural
    parameters.
  • command line options
  • configuration file

sim-panalyzer -cache:il1 il1:512:32:1:l -panalyzeril1 il1a2331.84110.25
sim-panalyzer -config sa-110.cfg ./microbench/dhrystone

sa-110.cfg
l1 cache configuration: -cache:il1 il1:512:32:1:l
l1 cache power analyzer configuration: panalyzer il1 il12331.84110.25
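
Because the same configuration routines serve both tools, a cache option only needs to be parsed once. The sketch below assumes the colon-separated SimpleScalar cache form `<name>:<nsets>:<bsize>:<assoc>:<repl>`, so the value on this slide reads `il1:512:32:1:l`. The `CacheConfig` class and the parser are illustrative helpers, not MILAN or PowerAnalyzer code, and the power analyzer argument is left out because its field layout is not recoverable from the slide.

```python
from dataclasses import dataclass

# Replacement policy codes used by SimpleScalar cache options.
REPL = {"l": "LRU", "f": "FIFO", "r": "random"}


@dataclass
class CacheConfig:
    name: str
    nsets: int
    bsize: int   # block size in bytes
    assoc: int
    repl: str

    @property
    def size_bytes(self):
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec):
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>'."""
    name, nsets, bsize, assoc, repl = spec.split(":")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPL[repl])


cfg = parse_cache_config("il1:512:32:1:l")
print(cfg, cfg.size_bytes)
```
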
14
SimpleScalar/ARM Platform Simulation
  • System-level simulation support for
    SimpleScalar/ARM
  • Adds device modeling to ARM modeling
    infrastructure
  • Permits execution of applications plus kernel and
    device drivers
  • Increases fidelity of embedded workload modeling
  • Key design goals
  • Implementation of a complete embedded device set
  • Flexible device configuration
  • Extensible device interface
  • Reproducible real-time experiments

[Figure: SimpleScalar/ARM platform stack: workloads (SPEC, MiBench, etc.) on Linux, WinCE, etc.; ARM7 ISA and ARM FPA; device emulation; power/performance model; SA-1100/XScale core (fetch, pipeline, predictor, caches); simulation kernel; host platform interface.]
15
Device Modeling Infrastructure
  • Space manager
  • Orchestrates memory and I/O accesses
  • Devices register for address ranges
  • Simple to integrate new devices
  • Configuration manager
  • Allows multiple platform emulation without code
    changes
  • Via platform configuration files
  • Specifies device address and other functionality
    parameterization
  • iPAQ -> LW through configuration file
  • I/O manager
  • Permits reproducible real-time experiments
  • Records external I/O to trace file with
    timestamps
  • Replay I/O to re-create experiment
  • Initial device set
  • SA-1110 pipeline and integrated devices
  • PCMCIA
  • Enables Bochs drivers
  • NE-2000 network interface

[Figure: device model block diagram. SA-1110 integer pipeline with I-cache, IMMU, D-cache, DMMU, and FPA; Space Manager with RAM, PIC, RTC, Flash, DMA, PCMCIA, SER0, and GPIO; I/O Mgr; Platform Config. Legend: implemented, in development, next generation.]
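
The space manager idea, where devices register for address ranges and accesses are routed to whichever device owns the address, can be sketched as below. The class names, method names, and the base address are hypothetical; the simulator's actual interface is not shown on the slides.

```python
class SpaceManager:
    """Routes memory and I/O accesses to devices by address range."""

    def __init__(self):
        self._ranges = []  # list of (base, limit, device)

    def register(self, base, size, device):
        limit = base + size
        for other_base, other_limit, _ in self._ranges:
            if base < other_limit and other_base < limit:
                raise ValueError("address range overlaps an existing device")
        self._ranges.append((base, limit, device))

    def _find(self, addr):
        for base, limit, device in self._ranges:
            if base <= addr < limit:
                return device, addr - base
        raise LookupError(f"no device registered at {addr:#x}")

    def read(self, addr):
        device, offset = self._find(addr)
        return device.read(offset)

    def write(self, addr, value):
        device, offset = self._find(addr)
        device.write(offset, value)


class Ram:
    """Trivial byte-addressed device used for the example."""

    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, offset):
        return self.data[offset]

    def write(self, offset, value):
        self.data[offset] = value & 0xFF


sm = SpaceManager()
sm.register(0xC0000000, 1024, Ram(1024))  # base address is a placeholder
sm.write(0xC0000010, 0x5A)
print(hex(sm.read(0xC0000010)))
```

Adding a new device then amounts to writing a class with `read` and `write` and registering it, which is the sense in which integration stays simple.
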
16
I/O Tracing Technology
[Figure: I/O tracing flow. Labels: Live Hardware Execution, Instrumented Device Drivers, Initial State, I/O input events (w/ time-stamps), External I/O Trace, Fast Functional Simulator, Instrumented Device Models, Detailed Simulation w/ Deep Analysis, Test Synthesis Framework. Legend: Today, Future Applications.]
  • Fully addresses the difficulties of creating
    reproducible real-time experiments with support
    for deep analysis
  • 1000x compression over industry-standard branch
    tracing
  • Re-creates all program data values (unlike
    branch tracing)
  • External I/O events traced
  • DMA, memory-mapped I/O, external interrupts
  • All events time-stamped with processor cycle count
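
A minimal sketch of record and replay for external I/O events, assuming a one-JSON-object-per-line trace format. The format, the event kinds, and the `run_until` / `deliver` callbacks are all hypothetical; the slides do not describe the real trace layout.

```python
import json


class IOTraceRecorder:
    """Records external I/O events stamped with the processor cycle count."""

    def __init__(self):
        self.events = []

    def record(self, cycle, kind, payload):
        self.events.append({"cycle": cycle, "kind": kind, "payload": payload})

    def dumps(self):
        return "\n".join(json.dumps(event) for event in self.events)


def replay(trace_text, run_until, deliver):
    """Re-create an experiment: advance to each time-stamp, then inject."""
    for line in trace_text.splitlines():
        event = json.loads(line)
        run_until(event["cycle"])                  # simulate up to the event
        deliver(event["kind"], event["payload"])   # inject the recorded input


rec = IOTraceRecorder()
rec.record(100, "interrupt", {"irq": 11})
rec.record(250, "mmio_read", {"addr": 4096, "value": 7})

log = []
replay(rec.dumps(),
       run_until=lambda cycle: log.append(("run", cycle)),
       deliver=lambda kind, payload: log.append((kind, payload)))
print(log)
```

Only external inputs are stored; everything else is recomputed by the simulator during replay, which is how such a trace can stay much smaller than a branch trace while still reproducing all data values.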

17
System Model Validation
  • System level validation approach
  • Driver scripts drive reference execution
  • Boot Linux, execute test programs
  • Run on IPaq H/W and Sim-IPaq
  • Differences in output drive debug
  • Challenging debug task
  • All bugs to date required debugging execution
    problems within the Linux kernel and device
    drivers on the Sim-IPaq model
  • Forced refinement of simulator debug
    infrastructure
  • Event tracking, tracing added
  • Support for memory write snooping
  • Dataflow trace mechanism implemented
  • Current status
  • 42M instructions into Linux boot
  • Bootloader fully functional
  • Kernel modules: mmap, vmap, console, serial, intr,
    timer, rts, deflate, fpu, flash, and environ are
    functional
  • Currently attacking bugs in filesystem
  • Identified two Linux kernel timing bugs

[Figure: validation flow. Linux shell scripts drive the Sim-IPaq system simulation model; program output and kernel messages are checked; on a mismatch (!), debug the kernel on Sim-IPaq using the simulator debug infrastructure.]
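
The "differences in output drive debug" step can be illustrated with a helper that finds the first line where the reference output and the simulated output diverge. The function name and the sample lines are made up for the example.

```python
from itertools import zip_longest


def first_divergence(reference_lines, simulated_lines):
    """Return (line_number, reference, simulated) at the first mismatch,
    or None when the two outputs are identical."""
    pairs = zip_longest(reference_lines, simulated_lines)
    for number, (ref, sim) in enumerate(pairs, start=1):
        if ref != sim:
            return number, ref, sim
    return None


hw_output = ["boot ok", "mount root", "login:"]
sim_output = ["boot ok", "mount failed", "login:"]
mismatch = first_divergence(hw_output, sim_output)
print(mismatch)
```
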
18
EASE: Embedded Architecture Simulation Engine
[Figure: pipeline diagram with stages IF, ID, EX/MEM, and WB, and three functional units (FU)]
  • Fetch Insts
  • Place Insts in IFQ
  • Decode Insts
  • Schedule FUs
  • Send Ready Insts to EX in program order
  • Execute Insts
  • Schedule register writeback
  • Write result to RF
  • Free FUs
  • Validate result
  • Complete generality of SimpleScalar
    microarchitecture models is not needed for
    embedded processor modeling
  • Register renaming, dynamic scheduling, value
    speculation not used
  • Learning curve is too steep for embedded modeling
  • EASE implements a fast, flexible embedded
    architecture
  • In-order pipeline, superscalar execution,
    co-processor interfaces, multi-level memory
    hierarchy, conventional DRAM architectures
  • Small model, less than 2500 lines of C code
  • Emulator computes latch data at all pipeline
    stages (micro-functional)
  • Writeback checker validates the results of all
    instructions simulated
  • Accurately models short SA-1110 pipeline, and
    longer XScale pipe

19
MiBench 2.0
  • Changes
  • Bug fixes / portability fixes
  • Modify input to be more realistic
  • Standardized random number generation
  • Remove dependencies on Linux
  • Additions
  • Move input files into code (reduce file I/O)
  • Move command line input into code (reduce
    dependency on shell environment)
  • Create global build/support environment
  • Generate precompiled binaries for several target
    systems
  • Add new benchmarks
  • Handwriting recognition
  • Network packet routing
  • New FFT and GSM Code

20
Recent Publications
  • Nam Sung Kim, Krisztián Flautner, David Blaauw,
    Trevor Mudge. Drowsy Instruction Caches: Leakage
    Power Reduction using Dynamic Voltage Scaling and
    Cache Sub-bank Prediction. The 35th Int. Symp. on
    Microarchitecture (MICRO 35), Istanbul, Nov.
    2002.
  • K. Flautner and T. Mudge. Vertigo: Automatic
    Performance-Setting for Linux. Fifth Symposium on
    Operating Systems Design and Implementation
    (OSDI), Boston, Dec. 2002.
  • S. Martin, K. Flautner, D. Blaauw, and T. Mudge.
    Dynamic voltage scaling and adaptive body biasing
    for optimal power consumption in microprocessors
    under dynamic workloads. ICCAD.
  • K. Flautner, S. Reinhardt, and T. Mudge.
    Automatic performance setting for dynamic voltage
    scaling. ACM Jour. Wireless Networks.
  • K. Flautner, N. Kim, S. Martin, D. Blaauw, T. Mudge.
    Drowsy Caches: Simple techniques for reducing
    leakage power. Proc. of the 29th Ann. Int. Symp.
    on Computer Architecture, Anchorage, Alaska, May
    2002, pp. 148-157.

21
Adding a DSP to the Platform
  • Texas Instruments TMS320C62xx and C64xx
  • Shared memory model
  • Heterogeneous multiprocessor
  • C62xx functional and tested
  • Early studies using Smart Camera workload and
    ETSI GSM voice codec
  • Comparison of C6xx with 4-issue ARM out-of-order
    processor

22
TMS320C6200
23
DSP vs. GPP
24
Vertigo: Interactive application
[Figure: performance level set by Vertigo vs. time (s)]
  • Performance-setting decisions during run of
    Acrobat Reader
  • Data collected on x86 based machine under Linux
  • Invention disclosure

25
Drowsy Caches
  • Extended to instruction caches
  • Invention disclosure
26
Implementation of the Drowsy Cache Line
27
ARM Partners Meeting
  • Summer 2002
  • Vertigo demo
  • Drowsy presentation
  • Negotiating IP sale

28
Fin