Cache coherence for CMPs

Slides: 13
Provided by: mbolic

1
Cache coherence for CMPs
  • Miodrag Bolic

2
Private cache
  • Each cache bank is private to a particular core
  • Cache coherence is maintained at the L2 cache
    level
  • Examples: Intel Montecito [81], AMD Opteron [56],
    and IBM POWER6 [63]

3
Private cache
  • Advantages
    • Short L2 cache access latency
    • Small amount of network traffic generated:
      since the local L2 cache bank can filter most
      of the memory requests, the number of coherence
      messages injected into the interconnection
      network is limited.
  • Disadvantages
    • Data blocks can get duplicated
    • If the working set accessed by the different
      cores is not well-balanced, some caches can be
      over-utilized whilst others are under-utilized

4
Shared cache
  • Cache coherence is maintained at the L1 level
  • The bits chosen to map a block to a particular
    bank are usually the least significant ones
  • Examples: Piranha [16], Hydra [47], Sun
    UltraSPARC T2 [105] and Intel Merom [104]
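
The bank-mapping rule above can be sketched as follows. This is only an illustration: the block size, the bank count and the function name home_bank are assumptions, not values taken from the slides.

```python
BLOCK_SIZE = 64   # bytes per cache block (assumed)
NUM_BANKS = 16    # number of L2 banks, a power of two (assumed)

def home_bank(address: int) -> int:
    """Return the L2 bank that holds the block containing `address`."""
    block_number = address // BLOCK_SIZE      # drop the block-offset bits
    return block_number & (NUM_BANKS - 1)     # keep the least significant bits

# Consecutive blocks land on consecutive banks and then wrap around.
banks = [home_bank(block * BLOCK_SIZE) for block in range(18)]
print(banks)
```

Because the low-order bits change fastest between neighbouring blocks, a contiguous working set is spread over all banks rather than piling up in one.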

5
Shared caches
  • Advantages
    • Single copy of blocks
    • Workload balancing: the utilization of each
      cache bank does not depend on the working set
      accessed by each core. Blocks are uniformly
      distributed among cache banks in a round-robin
      fashion, so the aggregate cache capacity is
      augmented.
  • Disadvantage
    • Many requests will be serviced by remote
      banks (L2 NUCA architecture)

6
Hammer protocol
  • AMD - Opteron systems
  • It relies on broadcasting requests to all tiles
    to resolve cache misses
  • It targets systems that use unordered
    point-to-point interconnection networks
  • On every cache miss, Hammer sends a request to
    the home tile. If the memory block is present
    on-chip, the request is forwarded to the rest of
    the tiles to obtain the requested block
  • All tiles answer the forwarded request by
    sending either an acknowledgement or the data
    message to the requesting core.
  • The requesting core needs to wait until it
    receives a response from every other tile. When
    the requester has received all the responses, it
    sends an unblock message to the home tile.
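
The message flow described on this slide can be traced with a small sketch. It is a simplified model rather than AMD's implementation: it assumes the requester is not the home tile, that exactly one other tile holds the data, and that the home tile answers the requester like any other tile. The function name hammer_miss is hypothetical.

```python
def hammer_miss(requester, home, owner, num_tiles):
    """List the (kind, source, destination) messages for one on-chip miss."""
    if requester == home or owner == requester:
        raise ValueError("sketch assumes requester differs from home and owner")
    messages = [("request", requester, home)]
    # The home tile forwards the request to the rest of the tiles.
    others = [t for t in range(num_tiles) if t not in (requester, home)]
    messages += [("forward", home, t) for t in others]
    # Every tile except the requester answers with data (the owner) or an ack.
    for tile in [home] + others:
        kind = "data" if tile == owner else "ack"
        messages.append((kind, tile, requester))
    # Only after all responses arrive does the requester unblock the home tile.
    messages.append(("unblock", requester, home))
    return messages

msgs = hammer_miss(requester=0, home=5, owner=9, num_tiles=16)
print(len(msgs), "messages for a single miss")
```

Under these assumptions a miss costs 2N - 1 messages on N tiles, which is why the broadcast traffic grows with the tile count.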

7
Hammer protocol
  • Disadvantages
  • Requires three hops in the critical path before
    the requested data block is obtained.
  • Broadcasting invalidation messages considerably
    increases the traffic injected into the
    interconnection network and, therefore, its power
    consumption.

8
Directory protocol
  • In order to accelerate cache misses, the
    directory information is not stored in main
    memory. Instead, it is usually stored on-chip at
    the home tile of each block.
  • In tiled CMPs, the directory structure is split
    into banks which are distributed across the
    tiles.
  • Each directory bank tracks a particular range of
    memory blocks.
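
A distributed directory of this kind can be sketched as one bank per tile, each tracking the sharers of its own blocks. The tile count, the interleaved block-to-bank mapping and the names DirectoryBank and home_bank are assumptions made for illustration only.

```python
from collections import defaultdict

NUM_TILES = 16  # assumed tile count

class DirectoryBank:
    """One tile's slice of the directory: block number -> set of sharer tiles."""

    def __init__(self):
        self.sharers = defaultdict(set)

    def read_miss(self, block, tile):
        """Record `tile` as a sharer of `block`."""
        self.sharers[block].add(tile)

    def write_miss(self, block, tile):
        """Return the tiles to invalidate, then record `tile` as sole holder."""
        to_invalidate = sorted(self.sharers[block] - {tile})
        self.sharers[block] = {tile}
        return to_invalidate

banks = [DirectoryBank() for _ in range(NUM_TILES)]

def home_bank(block):
    # Assumed mapping: blocks are interleaved across tiles by block number.
    return banks[block % NUM_TILES]

home_bank(37).read_miss(37, tile=2)
home_bank(37).read_miss(37, tile=7)
print(home_bank(37).write_miss(37, tile=2))  # only tile 7 needs an invalidation
```

Unlike a broadcast, the write miss here contacts only the tiles the directory lists as sharers.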

9
Directory protocol
  • The indirection problem
    • Every cache miss must reach the home tile
      before any coherence action can be performed.
    • This adds unnecessary hops to the critical
      path of cache misses.
  • The directory memory overhead needed to keep
    track of the sharers of each memory block could
    be intolerable for large-scale configurations.
  • Example: block size 16 bytes, 64 tiles
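
The example can be worked through numerically. The slide does not say how sharers are encoded, so this sketch assumes a full-map bit vector (one presence bit per tile per block) and ignores state and tag bits; full_map_overhead is a hypothetical helper name.

```python
def full_map_overhead(block_bytes: int, num_tiles: int) -> float:
    """Sharer bits per data bit for an assumed full-map bit-vector directory."""
    block_bits = block_bytes * 8
    sharer_bits = num_tiles          # one presence bit per tile (assumption)
    return sharer_bits / block_bits

# 16-byte blocks (128 bits) tracked across 64 tiles: 64 / 128
print(full_map_overhead(16, 64))
```

Under that assumption the sharer vector alone is half the size of the data it tracks, and the ratio keeps growing with the number of tiles.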

10
Comparison of protocols

11
Interleaving

12
Mapping between cache entries and directory entries
  • One way to keep the size of the directory
    entries constant is to store duplicate tags.
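
The duplicate-tag idea can be sketched as a directory that mirrors the tag array of every private cache, so each entry is one tag wide regardless of the tile count. The sizes, the direct-mapped organization and the function names below are assumptions chosen to keep the example short.

```python
NUM_TILES = 4        # assumed
SETS_PER_CACHE = 8   # assumed; direct-mapped private caches for simplicity

# duplicate_tags[tile][set_index] mirrors the tag held in that tile's cache.
duplicate_tags = [[None] * SETS_PER_CACHE for _ in range(NUM_TILES)]

def record_fill(tile, block):
    """Mirror a cache fill: the new tag overwrites whatever was in that set."""
    duplicate_tags[tile][block % SETS_PER_CACHE] = block // SETS_PER_CACHE

def sharers(block):
    """Find the sharers by comparing the block's tag against every tile's copy."""
    index, tag = block % SETS_PER_CACHE, block // SETS_PER_CACHE
    return [t for t in range(NUM_TILES) if duplicate_tags[t][index] == tag]

record_fill(0, 42)
record_fill(3, 42)
record_fill(3, 50)   # block 50 maps to the same set as 42 and replaces it in tile 3
print(sharers(42), sharers(50))
```

The trade-off is that finding the sharers requires one tag comparison per tile instead of reading a single sharer vector.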