To Include or Not to Include? - PowerPoint PPT Presentation

Slides: 15
Provided by: enri1
Transcript and Presenter's Notes


1
To Include or Not to Include?
  • Natalie Enright
  • Dana Vantrease

2
Motivation
  • CMP technology affects coherence protocols
    differently than previously studied MP systems
  • New shared on-chip resources (e.g. L2)
  • Low latency between on-chip caches
  • Need for scalability in design
  • Industry Examples
  • IBM Power4: inclusion
  • Piranha: exclusion
  • Our goal: determine the point at which each
    inclusion policy (strict inclusion,
    non-inclusion, and exclusion) is the best choice
    for CMP performance.

3
SMP vs CMP Opportunities
[Figure: SMP vs. CMP cache organization, showing the L1 and L2 caches in each configuration]

4
Multilevel Inclusion
  • Protocol given to us with the simulator
  • L1 has Modified, Shared and Invalid States
  • L2 has Modified, Owned, Shared, and Invalid
    States
  • When an L2 line is replaced, any copies present
    on the chip must be invalidated (the sharers are
    given in the directory entry)
  • In a single processor chip, there are only 2
    caches (Instruction and Data) connected to a
    single L2 cache
  • Chip multiprocessors add two more level 1
    caches for each additional processor, which
    could make this forced inclusion harmful.
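
The back-invalidation rule on this slide can be sketched in a few lines. This is a toy model, not the simulator's protocol: the class name `InclusiveL2`, the LRU replacement policy, and the unbounded L1 sets are all assumptions made for illustration.

```python
from collections import OrderedDict


class InclusiveL2:
    """Toy inclusive L2 that tracks, per line, which L1s hold a copy.

    Hypothetical model: LRU replacement, L1s modeled as unbounded sets.
    """

    def __init__(self, capacity, l1_caches):
        self.capacity = capacity      # number of lines the L2 can hold
        self.lines = OrderedDict()    # addr -> set of sharer L1 ids, in LRU order
        self.l1_caches = l1_caches    # one set of addresses per L1

    def access(self, l1_id, addr):
        """L1 `l1_id` reads `addr`; both levels are filled to keep inclusion."""
        if addr in self.lines:
            self.lines.move_to_end(addr)          # mark as most recently used
        else:
            if len(self.lines) >= self.capacity:
                self._evict()
            self.lines[addr] = set()
        self.lines[addr].add(l1_id)               # record the sharer in the directory entry
        self.l1_caches[l1_id].add(addr)

    def _evict(self):
        """Replace the LRU L2 line and invalidate every on-chip copy of it."""
        victim, sharers = self.lines.popitem(last=False)
        for l1_id in sharers:
            self.l1_caches[l1_id].discard(victim)  # forced back-invalidation
        return victim


l1s = [set(), set()]
l2 = InclusiveL2(capacity=2, l1_caches=l1s)
l2.access(0, 0xA)
l2.access(1, 0xA)
l2.access(0, 0xB)
l2.access(1, 0xC)   # L2 is full, so 0xA is replaced and both L1 copies are invalidated
```

In this sketch a single L2 replacement removes the line from every sharer, which is the cost that grows as more L1s share one L2.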

5
Non-Inclusion
  • Protocol courtesy of Mike
  • L1 now has Owned and Exclusive states
  • Complexity of the on chip directory has increased
    significantly
  • States added to indicate local level 1 sharers or
    a local level 1 owner.
  • L1 directory state also needs to be visible for
    external requests from other chips
  • Increase effective on-chip cache storage
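
The storage argument can be made concrete with simple bounds. The sketch below is an idealization, not a result from the study: it assumes equal-sized L1s and no replication between L1s, and the function name `effective_capacity_kb` is hypothetical.

```python
def effective_capacity_kb(l1_kb, num_l1s, l2_kb):
    """Upper bounds on unique on-chip data (KB) under each policy.

    Assumptions (illustrative only): all L1s are the same size and
    no line is replicated across L1s.
    """
    aggregate_l1 = l1_kb * num_l1s
    return {
        # Every L1 line is also in the L2, so the L2 size is the limit.
        "inclusion": l2_kb,
        # No line lives in both an L1 and the L2, so the capacities add.
        "exclusion": l2_kb + aggregate_l1,
        # Non-inclusion falls somewhere between the two bounds.
        "non_inclusion_range": (l2_kb, l2_kb + aggregate_l1),
    }


caps = effective_capacity_kb(l1_kb=64, num_l1s=8, l2_kb=2048)
```

With the illustrative numbers above (eight 64 KB L1s and a 2 MB L2), the bound rises from 2048 KB under inclusion to 2560 KB under exclusion.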

6
Directory Exclusion
  • No replication of Data between a single L1 and
    the L2
  • L2 Acts as Large Victim Cache
  • Uses on-chip cache space more fully, lowering
    required off-chip bandwidth
  • L2 is the centralized coherence point (tag lookup)
  • L1 states: M, E, I, SC, SM
  • L2 states: M, E, I
  • No ownership: a data request is simply sent to
    the first sharer listed in the tag lookup
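
A minimal sketch of the victim-cache behavior described above, assuming a single L1, LRU replacement at both levels, and no coherence states. The class `ExclusiveHierarchy` is hypothetical and is not the protocol from the slides.

```python
from collections import OrderedDict


class ExclusiveHierarchy:
    """Toy single-L1 hierarchy in which the L2 holds only L1 victims."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()   # addr -> True, in LRU order
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines

    def access(self, addr):
        if addr in self.l1:
            self.l1.move_to_end(addr)
            return "l1_hit"
        if addr in self.l2:
            del self.l2[addr]     # the line moves to the L1; it is not copied
            result = "l2_hit"
        else:
            result = "miss"       # would be fetched from off chip
        self._fill_l1(addr)
        return result

    def _fill_l1(self, addr):
        if len(self.l1) >= self.l1_lines:
            victim, _ = self.l1.popitem(last=False)
            if len(self.l2) >= self.l2_lines:
                self.l2.popitem(last=False)   # L2 victim leaves the chip
            self.l2[victim] = True            # L1 victim is kept in the L2
        self.l1[addr] = True


h = ExclusiveHierarchy(l1_lines=2, l2_lines=2)
results = [h.access(a) for a in (1, 2, 3, 1, 3)]
```

The invariant worth checking is that no address is ever present in both levels at once, which is what lets the two capacities add.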

7
Directory Exclusion
[Figure: directory exclusion layout; groups of L1 caches, each group paired with an L2 and a set of L1 tags]

8
Tag Lookup Cache
  • Aids in off-chip coherency and directing on-chip
    requests
  • Associativity = L1 associativity × number of L1s
  • Sets = sets in a single L1
  • Data entries = number of L1s
  • Data entry: whether the L1 corresponding to the
    data entry has the data or not (1/0)
  • Scalability?
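
The scalability question can be explored with a little arithmetic. The sketch below reads the sizing rules on this slide as associativity = L1 associativity × number of L1s, sets = sets in one L1, and one presence bit per L1 in each entry. That reading is an interpretation, since the operators are missing from the slide text, and the function name and cache parameters are illustrative assumptions.

```python
def tag_lookup_geometry(l1_size_bytes, l1_assoc, line_bytes, num_l1s):
    """Geometry of a tag lookup cache under the assumed sizing rules."""
    l1_sets = l1_size_bytes // (line_bytes * l1_assoc)  # sets in a single L1
    ways = l1_assoc * num_l1s                           # assumed associativity rule
    entries = l1_sets * ways
    presence_bits = entries * num_l1s                   # one 1/0 bit per L1 per entry
    return {
        "sets": l1_sets,
        "ways": ways,
        "entries": entries,
        "presence_bits": presence_bits,
    }


# Hypothetical parameters: 64 KB, 2-way L1s with 64-byte lines.
geom_8 = tag_lookup_geometry(64 * 1024, 2, 64, num_l1s=8)
geom_32 = tag_lookup_geometry(64 * 1024, 2, 64, num_l1s=32)
```

Under these assumptions the number of entries grows linearly with the number of L1s while the presence bits grow quadratically, which is one way to frame the scalability concern.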

9
Methodology
  • Vary the L1 cache size to find the design point
    at which an inclusive protocol hurts performance.
  • As the number of cores increases, so does the
    aggregate L1 cache size
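
The quantity being varied is the ratio of aggregate L1 capacity to L2 capacity. A small sketch, using the 4-processor, 2 MB L2 configuration from the deck and two L1s (instruction and data) per processor; the swept L1 sizes and the function name are hypothetical.

```python
def aggregate_l1_ratio(l1_kb, l1s_per_core, cores, l2_kb):
    """Aggregate L1 capacity as a fraction of L2 capacity."""
    return (l1_kb * l1s_per_core * cores) / l2_kb


# Sweep hypothetical per-cache L1 sizes for 4 cores and a 2 MB L2.
ratios = {kb: aggregate_l1_ratio(kb, 2, 4, 2048) for kb in (16, 32, 64, 128)}
```

The same function shows how the ratio scales with core count: holding the L1 size fixed, going from 4 to 16 cores multiplies the ratio by four.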

10
Simulation Configuration
  • Configuration
  • 4 processors per chip and 1 chip
  • 2 MB of L2 cache
  • Small, but we wanted to see the effect of
    changing the ratio of L1 size to L2 size.
  • 16 processors per chip as future work
  • Only simulated one chip to isolate the effects of
    intra-chip coherence from inter-chip coherence
  • Future work: see how extending the life of a
    block on chip through non-inclusion or exclusion
    affects other chips.

11
Results
  • Inclusion vs. Non-Inclusion

12
Results (cont.)
  • Inclusion vs. Pseudo-Exclusion

13
Conclusion/Future Work
  • An inclusive protocol is less complex
  • Especially when considering inter-chip communication
  • Non-Inclusion performs consistently better than
    inclusion
  • Additional complexity is only warranted once the
    total L1 cache size is greater than 25% of the L2
    cache size.
  • Longer runs and more benchmarks would provide
    more conclusive evidence

14
Future Work
  • Ongoing: get the exclusion protocol working in
    the Ruby tester and Simics.
  • Current status: runs 500 memory transactions in
    the Ruby tester.
  • Run comparable tests to those run for
    Non-inclusion
  • Analyze benefits of exclusion over inclusion.
  • Expand to 16 cores and study scalability issues.