CMP Design Choices - PowerPoint PPT Presentation


1
CMP Design Choices
  • Finding Parameters that Impact CMP Performance
  • Sam Koblenski and Peter McClone

2
Outline
  • Introduction
  • Assumptions
  • Plackett Burman Analysis
  • Simulation methods
  • Statistical Design
  • Plackett Burman Results
  • Mean Value Analysis
  • MVA Implementation
  • MVA Results
  • AMVA Implementation
  • AMVA Results
  • Complementary Results
  • Conclusions

3
Introduction
  • Two-part study
  • The design space is huge; how can we reduce it?
  • Method 1
  • Plackett Burman (PB) analysis finds the critical
    parameters
  • The design uses extreme values of each parameter
  • Detailed architecture design can then focus on a
    few parameters

4
Introduction (cont.)
  • Method 2
  • Mean Value Analysis (MVA) model of a CMP
  • Designed simply to compute throughput
  • Design choices can be narrowed down quickly
  • Builds intuition and identifies patterns and
    parameter relationships

5
Assumptions - PB Design
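  • In-order cores approximated as out-of-order (OoO)
    cores with a small instruction window
  • Die size 300 mm² (the area of 16 MB of cache at
    65 nm)
  • L2 cache size expanded to fill the remaining die
    area
  • Discrete L2 sizes: 4, 8, or 12 MB
  • Associativity need not be a power of 2
  • Core size measured in cache byte equivalents (the
    area of the same number of bytes of cache)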
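A minimal sketch of the area accounting above, assuming one reading of "expanded to fill the die": choose the largest discrete L2 size that still fits in the 16 MB cache-byte budget once the cores are placed. The helper name `pick_l2_mb`, the per-core area `core_cbe_kb`, and the 1 MB = 1024 KB convention are illustrative assumptions, not values taken from the study.

```python
from typing import Optional

# Assumed budget: the 300 mm^2 die holds about 16 MB of cache at 65 nm, so every
# component is charged in cache byte equivalents (1 MB = 1024 KB here).
DIE_BUDGET_KB = 16 * 1024
# Discrete L2 sizes from the PB design, tried largest first.
L2_SIZES_MB = (12, 8, 4)


def pick_l2_mb(n_cores: int, core_cbe_kb: float) -> Optional[int]:
    """Return the largest discrete L2 size (MB) that fits beside n_cores cores.

    core_cbe_kb is a hypothetical per-core area, expressed in KB of cache.
    Returns None when the cores leave no room for even the smallest L2.
    """
    remaining_kb = DIE_BUDGET_KB - n_cores * core_cbe_kb
    for size_mb in L2_SIZES_MB:
        if size_mb * 1024 <= remaining_kb:
            return size_mb
    return None


if __name__ == "__main__":
    # Hypothetical core sizes, for illustration only.
    for cores, core_kb in [(8, 512), (16, 512), (16, 600), (32, 512)]:
        print(cores, core_kb, pick_l2_mb(cores, core_kb))
```

With a made-up core size of 512 KB, 8 cores leave room for the 12 MB L2 and 16 cores for the 8 MB L2; 32 such cores consume the whole budget.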

6
Simulation Methodology
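  • Simics with the GEMS Ruby (memory system) and Opal
    (out-of-order processor) timing modules
  • 16-processor (16P) simulations used cache warm-up
    files
  • 2-processor (2P) simulations ran for more
    transactions
  • Attempted the OLTP and JBB benchmarks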

7
Plackett Burman Design
  • Motivation
  • Narrow a huge design space
  • Minimize simulation runs (experiments)
  • Preliminaries
  • Performance Measure
  • Extreme Parameter Values
  • Number of parameters ($N < 4 \times n - 1$)
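The slides describe the PB method without code, so the sketch below shows the textbook construction, assuming the standard generator rows for 8 and 12 runs. Each of the X - 1 columns is one parameter set to its high (+1) or low (-1) extreme, each row is one simulation run, and a parameter's effect is the sign-weighted sum of the run results. The function names, parameter names, and the synthetic response are hypothetical; nothing here comes from the study's simulations.

```python
from typing import Dict, List, Sequence, Tuple

# Standard Plackett-Burman generator rows (the first row of the design) for X runs.
# An X-run design screens up to X - 1 parameters (+1 = high extreme, -1 = low extreme).
GENERATORS: Dict[int, List[int]] = {
    8: [+1, +1, +1, -1, +1, -1, -1],
    12: [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1],
}


def pb_design(x_runs: int) -> List[List[int]]:
    """X - 1 cyclic right-shifts of the generator row, followed by a row of all -1."""
    gen = GENERATORS[x_runs]
    m = len(gen)
    rows = [gen[m - s:] + gen[:m - s] for s in range(m)]
    rows.append([-1] * m)
    return rows


def pb_effects(design: Sequence[Sequence[int]], responses: Sequence[float]) -> List[float]:
    """Effect of parameter j = sum over runs of (sign of j in that run) * response."""
    return [sum(row[j] * y for row, y in zip(design, responses))
            for j in range(len(design[0]))]


def rank(names: Sequence[str], effects: Sequence[float]) -> List[Tuple[str, float]]:
    """Parameters ordered by |effect|, most influential first."""
    return sorted(zip(names, effects), key=lambda p: abs(p[1]), reverse=True)


if __name__ == "__main__":
    design = pb_design(8)
    # Synthetic response (not simulation data): only parameters 0 and 3 matter.
    responses = [10 + 3 * row[0] - 2 * row[3] for row in design]
    names = ["L2 size", "L2 latency", "mem latency", "cores",
             "ROB size", "L1 size", "dummy"]
    print(rank(names, pb_effects(design, responses)))
```

Ranking by |effect| is what lets the detailed design concentrate on a few parameters; columns left over when N is smaller than X - 1 are conventionally filled with dummy parameters.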

8
PB Design Example

9
PB Design Parameter Values

10
PB Results
  • Extreme parameter values stressed the simulator
  • We have not yet completed a full set of runs
  • It may be necessary to build a custom L2 network
    for each run

11
PB Results for JBB

12
Assumptions - MVA
  • The time between memory requests is exponentially
    distributed
  • All processor cores exhibit the same average
    behavior with respect to service time and miss
    rate
  • Doubling the size of the cache multiplies the miss
    rate by 1/√2 (about 0.71)
  • An in-order core takes approximately the same area
    as 50 KB of cache
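A small sketch of two of these assumptions, using hypothetical helper names: the √2 rule (miss rate proportional to one over the square root of capacity) and a shared L2 that shrinks by 50 KB for every in-order core added. Reusing the 16 MB-equivalent die budget from the PB assumptions and the 2% base miss rate are assumptions made only for illustration.

```python
import math

# Assumptions from the slide: each doubling of cache capacity divides the miss rate
# by sqrt(2), and an in-order core costs about as much area as 50 KB of cache.
CORE_AREA_KB = 50.0
# Hypothetical: reuse the 16 MB-equivalent die budget from the PB assumptions.
DIE_BUDGET_KB = 16 * 1024.0


def scaled_miss_rate(cache_kb: float, base_miss_rate: float, base_cache_kb: float) -> float:
    """Square-root rule: miss rate scales with (base size / new size) ** 0.5."""
    return base_miss_rate * math.sqrt(base_cache_kb / cache_kb)


def l2_after_cores(n_cores: int) -> float:
    """Cache capacity (KB) left on the die after placing n_cores in-order cores."""
    return DIE_BUDGET_KB - n_cores * CORE_AREA_KB


if __name__ == "__main__":
    base_rate, base_kb = 0.02, 1024.0  # hypothetical: 2% miss rate measured at 1 MB
    for n in (1, 4, 16, 64):
        l2 = l2_after_cores(n)
        print(n, l2, round(scaled_miss_rate(l2, base_rate, base_kb), 5))
```

This ignores sharing effects (more cores competing for the same L2); it only captures the capacity trade-off the two assumptions describe.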

13
MVA Design
  • A simple closed queueing network model

14
MVA Design
  • The model is designed in two phases
  • First: use the exact MVA equations
  • Use the average time between memory accesses as
    an application parameter
  • Solve for throughput
  • Second: use approximate MVA (AMVA)
  • Use an iterative method to converge on the
    processor service time
  • Solve for throughput

15
Exact MVA
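  • To solve the MVA equations, we determine the mean
    residence time at each service center
  • $R_p$: processor/L1 residence time
  • $R_{L2}$: L2 residence time
  • $R_M$: memory residence time
  • The one-core case is trivial; solve it first, then
    use it to solve recursively for additional cores
  • $R_{n,p} = D_p \, (1 + Q_{n-1,p})$, where $D_p$ is
    the processor service demand and $Q_{n-1,p}$ is the
    mean queue length at the processor with $n-1$ cores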
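The recursion above is the standard exact MVA algorithm for a closed, single-class network. A minimal sketch, treating each core as one customer and the processor/L1, L2, and memory as queueing centers; the service demands in the usage block are hypothetical, and the optional think time `think` (zero by default) is an addition that is not on the slide.

```python
from typing import List, Tuple


def exact_mva(demands: List[float], n_customers: int, think: float = 0.0
              ) -> Tuple[float, List[float], List[float]]:
    """Exact MVA for a closed, single-class network of queueing centers.

    demands[k] is the service demand D_k at center k (e.g. processor/L1, L2, memory).
    think is an optional delay Z. Returns (throughput X_N, R_{N,k}, Q_{N,k}).
    """
    q = [0.0] * len(demands)          # Q_{0,k} = 0: the network starts empty
    x, r = 0.0, [0.0] * len(demands)
    for n in range(1, n_customers + 1):
        # Arrival theorem: R_{n,k} = D_k (1 + Q_{n-1,k})
        r = [d * (1.0 + qk) for d, qk in zip(demands, q)]
        # Throughput: X_n = n / (Z + sum_k R_{n,k})
        x = n / (think + sum(r))
        # Little's law: Q_{n,k} = X_n R_{n,k}
        q = [x * rk for rk in r]
    return x, r, q


if __name__ == "__main__":
    # Hypothetical per-request demands (cycles) at processor/L1, L2, and memory.
    demands = [40.0, 12.0, 30.0]
    for cores in (1, 2, 4, 8, 16):
        x, r, _ = exact_mva(demands, cores)
        print(cores, round(x, 4), [round(v, 1) for v in r])
```

Each population step needs only the previous step's queue lengths, which is why the trivial one-core case is enough to start the recursion.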

16
Exact MVA results
  • Throughput was calculated from simulation-run
    data
  • Inputs: miss rates and number of memory requests
  • Results are erratic
  • Results are not consistent with the simulation
    results
  • The most likely source of the problem is the
    processor service time!

17
Approximate MVA Design
  • An iterative method can be used to converge on a
    service time
  • Uses the total residence time R as an input
    parameter
  • The iterative method works well with approximate
    MVA
  • Goal: match the measured total average residence
    time of a memory request
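One way to sketch this, assuming the Bard-Schweitzer approximation (the slides do not say which AMVA variant was used): $Q_{n-1,k}$ is estimated as $\frac{n-1}{n} Q_{n,k}$ and the equations are iterated to a fixed point. An outer bisection then adjusts one service demand, meant here as the unknown processor service time, until the model's total residence time equals a measured value. Both function names are hypothetical, and the bisection assumes total residence time grows with that demand.

```python
from typing import List, Tuple


def schweitzer_amva(demands: List[float], n: int, think: float = 0.0,
                    tol: float = 1e-10, max_iter: int = 100_000
                    ) -> Tuple[float, List[float], List[float]]:
    """Bard-Schweitzer AMVA: estimate Q_{n-1,k} as (n-1)/n * Q_{n,k} and iterate."""
    q = [n / len(demands)] * len(demands)  # initial guess: customers spread evenly
    for _ in range(max_iter):
        r = [d * (1.0 + (n - 1) / n * qk) for d, qk in zip(demands, q)]
        x = n / (think + sum(r))
        q_new = [x * rk for rk in r]
        if max(abs(a - b) for a, b in zip(q_new, q)) < tol:
            return x, r, q_new
        q = q_new
    raise RuntimeError("AMVA fixed-point iteration did not converge")


def calibrate_demand(demands: List[float], idx: int, n: int, target_r: float,
                     lo: float, hi: float, tol: float = 1e-9) -> float:
    """Bisect on demands[idx] until the model's total residence time matches target_r.

    Assumes total residence time grows with that demand; raises ValueError if the
    target lies outside what [lo, hi] can produce (the model cannot reach it).
    """
    def total_r(d: float) -> float:
        trial = list(demands)
        trial[idx] = d
        return sum(schweitzer_amva(trial, n)[1])

    if not total_r(lo) <= target_r <= total_r(hi):
        raise ValueError("target residence time is unreachable in [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total_r(mid) < target_r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    demands = [40.0, 12.0, 30.0]  # hypothetical processor/L1, L2, memory demands
    measured_r = 400.0            # hypothetical measured total residence time
    print(calibrate_demand(demands, 0, 4, measured_r, lo=1.0, hi=500.0))
```

If `calibrate_demand` raises `ValueError`, no demand in the bracket reproduces the target, which mirrors the case where the measured residence time cannot be reached with the model and parameter set.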

18
Approximate MVA Results
  • The AMVA equations do not always converge
  • The total measured residence time cannot be
    reached with this model and parameter set
  • Varying the input values without achieving
    convergence points to flaws in the model structure
  • The complex relationship between the memory
    system and the rate at which a core issues
    requests must be modeled

19
Complementary Results
  • Initial goal: use PB results to find the
    parameters the MVA model should focus on
  • Results from both approaches could then
    cross-verify each other's correctness

20
Conclusions
  • Simics has a STEEP learning curve
  • Less than 5 weeks is not enough time for valid
    (or any) results
  • Refining a PB design leads to long lead times
    before valid results are available
  • CMPs complicate the relationship between the
    cores and the memory subsystem
  • Design methodologies that focus simulation runs
    are necessary
  • More results and conclusions to follow

21
Questions
  • Questions?