ECE 510 Brendan Crowley - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: ECE 510 Brendan Crowley


1
ECE 510 Brendan Crowley
  • Paper Review
  • October 31, 2006

2
Processor Power Reduction Via Single-ISA
Heterogeneous Multi-Core Architectures
  • Rakesh Kumar, Keith Farkas, Norman P. Jouppi,
    Partha Ranganathan, Dean M. Tullsen

3
Presentation Overview
  • Introduction
  • The Architecture
  • Modeling the Architecture
  • Results
  • Critical Analysis / Conclusion

4
Introduction
  • Background
  • Processors continue to have increased speed and
    transistor count as transistor sizes decrease
  • This leads to increased power consumption which
    causes problems
  • Heat dissipation
  • Chip failure
  • Battery life
  • Designers are always searching for new ways to
    decrease power consumption

5
Introduction (2)
  • Most work on reducing power consumption falls
    under one of two categories
  • Voltage and frequency scaling
  • Gating: the ability to turn on/off portions of
    the core
  • Some designs have included the use of multiple
    identical (homogeneous) cores
  • Others have included processors with
    co-processors that run a different instruction
    set

6
Introduction (3)
  • The Main Idea
  • Different software applications have different
    resource requirements
  • This fact leads the authors to believe that core
    diversity is of greater value than uniformity
  • Therefore, the proposed design is a single-ISA
    heterogeneous multi-core architecture
  • Each core runs the same instruction set, but has
    different abilities and performance
    characteristics

7
The Architecture
  • One method is to take a family of previously
    designed cores, modify their interfaces, and
    combine them on one die
  • Each core executes the same instruction set, but
    contains different resources, and therefore
    achieves different performance and energy
    efficiency on the same application

8
The Architecture (2)
  • The operating system determines the application's
    requirements and decides which core is best to
    use (which core will be the most energy
    efficient)
  • To accommodate a wide variety of applications,
    the cores should have a wide range of
    performances

9
The Architecture (3)
  • Authors chose a 5-core design, using existing
    cores with a few changes
  • Hypothetical single-threaded version of the EV8
    (Alpha 21464), which they call the EV8-
  • MIPS R4700
  • EV4 (Alpha 21064)
  • EV5 (Alpha 21164)
  • EV6 (Alpha 21264)

10
The Architecture (4)
  • Assumptions
  • Each core has a private L1 data and instruction
    cache
  • All cores share an L2 cache, phase-locked-loop
    circuitry and pins
  • Implemented in 0.10 micron technology
  • One application running at a time (one thread
    running)

11
The Architecture (5)
  • Relative core sizes

12
The Architecture (6)
  • Different parts of a program may require
    different resources
  • To take full advantage of the core diversity it
    is necessary to switch between cores in the
    middle of program execution
  • This is done at operating system timeslice
    intervals, with user-state already saved to
    memory
  • If the OS decides to switch cores, the data is
    saved to the shared L2 cache, where the next core
    can retrieve it

13
The Architecture (7)
  • The authors assume the unused cores are powered
    down to avoid static leakage and dynamic
    switching power
  • This means time must be spent powering up the
    cores
  • Experimental results show that this doesn't
    affect performance when core-switching is done at
    OS timer intervals, even with pessimistic
    assumptions about power-up time and software
    overhead
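
A rough way to see why power-up cost can be tolerable at timeslice granularity is to compare it with the length of an interval. The sketch below is illustrative only: the function name and every number in the example (timeslice length, power-up time, software overhead) are assumptions, not values taken from the paper.

```python
def switch_overhead_fraction(interval_s, power_up_s, software_s,
                             switch_probability=1.0):
    """Estimate the fraction of run time lost to core switching.

    All arguments are hypothetical inputs:
      interval_s         - length of one OS timeslice, in seconds
      power_up_s         - time to power up the target core, in seconds
      software_s         - OS/software overhead per switch, in seconds
      switch_probability - fraction of timeslice boundaries that switch cores
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    cost_per_switch = power_up_s + software_s
    expected_cost = switch_probability * cost_per_switch
    # Lost time divided by total time (useful interval plus lost time).
    return expected_cost / (interval_s + expected_cost)


# Assumed example: 10 ms timeslice, 50 us power-up, 10 us software overhead,
# and a switch at every boundary (a deliberately pessimistic choice).
fraction = switch_overhead_fraction(10e-3, 50e-6, 10e-6)
print(f"{fraction:.2%} of time spent switching")
```

Under these made-up inputs the lost time stays well below one percent, which is the kind of margin the slide's claim relies on.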

14
Modeling the Architecture
  • Data on the EV8 was based on some predictions and
    reported data
  • Data on the other cores was from published
    literature
  • Assume all of the Alpha cores run at 2.1 GHz
    (since they assume 0.10 micron process), and the
    R4700 runs at 1 GHz

15
Modeling the Architecture (2)
  • All architectures were modeled as accurately as
    possible on a highly detailed instruction-level
    simulator, using the configurations in the table
    below

16
Modeling the Architecture (3)
  • The table below shows the area and peak power
    statistics of the cores
  • Areas were found from die photos
  • Total die area is approximately 400 mm^2

17
Modeling the Architecture (4)
  • Benchmark execution simulated using SMTSIM
  • Simulator was modified to simulate a multi-core
    processor with a shared L2 cache
  • Assume a single thread running on one core at a
    time
  • Switching cores requires flushing the active
    core's pipeline and writing back its L1 cache
    lines to the L2 cache

18
Results
  • The following figure shows results for the SPEC
    application applu
  • The Y-axis, IPS^2/W, is essentially the inverse of
    the energy-delay product (labeled PDP on these slides)
  • Constraint
  • Never choose a core that sacrifices more than 50%
    performance relative to EV8- over an interval
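
The selection rule on this slide can be sketched as follows, reading the constraint as a 50 percent limit on performance loss. This is a minimal illustration, not the authors' implementation: the function names, the `samples` layout, and all per-core numbers are made up for the example.

```python
def ips2_per_watt(ips, watts):
    """Efficiency metric from the slide: (instructions per second)^2 / watts."""
    return ips * ips / watts


def pick_core(samples, reference="EV8-", max_perf_loss=0.5):
    """Choose the core for one interval.

    samples maps a core name to a hypothetical (ips, watts) pair measured or
    predicted for that interval. Cores slower than (1 - max_perf_loss) times
    the reference core's IPS are excluded; among the rest, the core with the
    highest IPS^2/W wins.
    """
    ref_ips = samples[reference][0]
    floor = (1.0 - max_perf_loss) * ref_ips
    eligible = {name: v for name, v in samples.items() if v[0] >= floor}
    return max(eligible, key=lambda name: ips2_per_watt(*eligible[name]))


# Made-up numbers for a single interval (not from the paper).
interval = {
    "EV8-": (2.0e9, 80.0),
    "EV6": (1.5e9, 20.0),
    "EV5": (0.9e9, 8.0),   # below the 50% floor, so never considered
    "EV4": (0.5e9, 4.0),
}
print(pick_core(interval))
```

With these inputs the EV6 entry is selected: it clears the performance floor and has a higher IPS^2/W than the reference core.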

19
Results (2)

20
Results (3)
  • Compared to a single-core architecture, this
    design could ideally reduce the PDP by 74%
  • Combination of 25% performance loss and 81%
    energy savings
  • Could change the constraint to achieve greater
    PDP savings (sacrificing performance, of course)
  • Another design point gives 36% energy savings
    with 4% performance loss
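
The figures on this slide, read as percentages, can be combined with a little arithmetic. The sketch below assumes that the product being reduced is energy multiplied by execution time, and that a performance loss of p stretches execution time by a factor of 1 / (1 - p); the function name is illustrative.

```python
def energy_delay_reduction(perf_loss, energy_savings):
    """Fractional reduction in (energy x delay) versus the baseline.

    perf_loss and energy_savings are fractions, e.g. 0.25 for 25%.
    """
    relative_delay = 1.0 / (1.0 - perf_loss)   # slower core, longer run time
    relative_energy = 1.0 - energy_savings     # energy left after savings
    return 1.0 - relative_energy * relative_delay


print(round(energy_delay_reduction(0.25, 0.81), 3))  # 0.747
print(round(energy_delay_reduction(0.04, 0.36), 3))  # 0.333
```

The first result is consistent with the roughly 74 percent reduction quoted above; the second shows that the lower-loss design point gives up most of that reduction in exchange for near-baseline performance.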

21
Results (4)
  • Could optimize other metrics besides PDP,
    depending on the design goals
  • Different power and performance tradeoffs can be
    made simply by changing the core switching
    algorithm (no need to change the hardware)

22
Critical Analysis / Conclusion
  • The paper makes many assumptions, for example
    about frequency scaling and the power
    consumption of the cores
  • This paper only reports results for one benchmark
    application
  • Multiple cores/threads running at the same time
    would likely be used in practice
  • How would this affect the core switching
    complexity and latency?

23
Critical Analysis / Conclusion (2)
  • This technique seems like a very good one
  • Homogeneous multi-core chips are already on the
    market
  • Potential for significant energy savings