Shared Memory Multiprocessors: Cache Coherence

Provided by: Surf6
Transcript and Presenter's Notes

1
Shared Memory Multiprocessors: Cache Coherence
2
SMP hardware organization
3
  • SMP systems support a shared memory abstraction:
    all processors see the whole memory and can
    perform memory operations on all memory
    locations.
  • Two key issues in such an architecture:
  • Cache coherence
  • Memory consistency model: a formal specification
    of memory semantics
  • Why is this non-trivial?
  • The model affects many hardware and software
    optimization techniques.
  • Cache coherence is one part of what defines the
    consistency model.

4
Cache coherence problem
  • Because caches hold copies of memory, different
    processors may see different values for the
    same memory location.
  • Processors see different values for u after
    event 3.
  • With a write-back cache, memory may also hold
    stale data.
  • This happens frequently and is unacceptable to
    applications.
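
The problem can be reproduced with a toy model. This is a minimal Python sketch, assuming three processors with private write-back caches and no coherence protocol at all. The slide's figure is not in the transcript, so the event order (P1 reads u, P3 reads u, P3 writes u, then P1 and P2 read u) and the values 5 and 7 are illustrative assumptions.

```python
# Toy model: private write-back caches with NO coherence protocol.
# All names, values and the event order are illustrative assumptions.

memory = {"u": 5}

class Cache:
    def __init__(self):
        self.lines = {}  # address -> locally cached value

    def read(self, addr):
        # On a miss, fetch from memory; on a hit, use the local copy.
        if addr not in self.lines:
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write-back: update only the local copy; memory is not updated.
        self.lines[addr] = value

p1, p2, p3 = Cache(), Cache(), Cache()

p1.read("u")                # event 1: P1 caches u = 5
p3.read("u")                # event 2: P3 caches u = 5
p3.write("u", 7)            # event 3: P3 writes u = 7 in its own cache only
seen_by_p1 = p1.read("u")   # event 4: P1 hits its stale copy
seen_by_p2 = p2.read("u")   # event 5: P2 misses and reads stale memory
seen_by_p3 = p3.read("u")   # P3 sees its own write

print(seen_by_p1, seen_by_p2, seen_by_p3)  # 5 5 7
```

P1 is stale because of its cached copy, and P2 is stale because the write-back cache has not yet updated memory.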

5
Bus Snoopy Cache Coherence protocols
  • Memory is centralized, with uniform access time
    and a bus interconnect.
  • Example: all Intel MP machines, such as diablo

6
Bus Snooping idea
  • When necessary, send requests to all processors
    (and caches)
  • Processors' caches snoop to see if they have a
    copy and respond accordingly.
  • Each cache listens to both the CPU and the bus.
  • The state of a cache line may change because of
    (1) a CPU memory operation, or (2) a bus
    transaction (a remote CPU's memory operation).
  • Requires broadcast, since a copy may be cached
    at any processor.
  • Bus is a natural broadcast medium.
  • Bus (centralized medium) also serializes
    requests.
  • Bus snoopy cache coherence protocols dominate
    small scale machines.

7
Types of snoopy bus protocols
  • Write invalidate protocols
  • Write to shared data: an invalidate is sent to
    all caches, which snoop and invalidate their
    copies.
  • Read miss:
  • Write-through: memory is always up-to-date
  • Write-back: snoop in caches to find the most
    recent copy
  • Write broadcast protocols (typically write
    through)
  • Write to shared data: broadcast on the bus;
    processors snoop and update any copies.
  • Read miss: memory is always up to date.
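
The difference between the two families can be sketched in a few lines of Python. This is a simplified illustration, not a real protocol implementation: caches are plain dictionaries, write-through is assumed for both policies, and the function and policy names are made up for this example.

```python
# Sketch: how snooping caches react to a remote write under the two policies.
# Write-through is assumed for both, so memory handling is identical.

def bus_write(caches, memory, writer, addr, value, policy):
    """Apply a write by cache index `writer` and let the other caches snoop it."""
    if policy not in ("invalidate", "update"):
        raise ValueError(f"unknown policy: {policy}")
    memory[addr] = value                 # write-through: memory stays current
    caches[writer][addr] = value
    for i, cache in enumerate(caches):
        if i == writer or addr not in cache:
            continue                     # nothing to do without a copy
        if policy == "invalidate":
            del cache[addr]              # drop the stale copy
        else:
            cache[addr] = value          # refresh the copy in place

def demo(policy):
    memory = {"u": 5}
    caches = [{"u": 5}, {"u": 5}, {}]    # caches 0 and 1 hold u; cache 2 does not
    bus_write(caches, memory, writer=0, addr="u", value=7, policy=policy)
    return caches, memory

inv_caches, inv_memory = demo("invalidate")
upd_caches, upd_memory = demo("update")
print(inv_caches)  # [{'u': 7}, {}, {}]
print(upd_caches)  # [{'u': 7}, {'u': 7}, {}]
```

With invalidation, cache 1 takes a read miss the next time it touches u; with update, it keeps hitting but every write costs a broadcast of the data.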

8
An Example Snoopy Protocol (MSI)
  • Invalidation protocol, write-back cache
  • Each block of memory is in one state:
  • Clean in all caches and up-to-date in memory
    (shared)
  • Dirty in exactly one cache (exclusive)
  • Not in any cache
  • Each cache block is in one state:
  • Shared: block can be read
  • Exclusive: cache has the only copy; it is
    writable and dirty
  • Invalid: block contains no data
  • Read misses cause all caches to snoop the bus
  • A write to a shared block is treated as a miss
    (it needs a bus transaction).

9
MSI protocol state machine for CPU requests
10
MSI protocol state machine for Bus requests
11
MSI protocol state machine (combined)
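
The state diagrams themselves are not in the transcript. The following Python sketch encodes the usual textbook MSI transitions as two lookup tables, which is an assumption rather than a copy of the slides' diagrams. The bus transaction names (BusRd, BusRdX, Flush) are the conventional ones and do not appear in the deck; "Modified" here corresponds to the "exclusive" (writable, dirty) state on slide 8.

```python
# Textbook-style MSI transitions for a single block (assumed, not from the slides).

M, S, I = "Modified", "Shared", "Invalid"

# (state, CPU event) -> (next state, bus transaction issued or None)
CPU_TRANSITIONS = {
    (I, "read"):  (S, "BusRd"),    # read miss: fetch a shared copy
    (I, "write"): (M, "BusRdX"),   # write miss: fetch an exclusive copy
    (S, "read"):  (S, None),       # read hit
    (S, "write"): (M, "BusRdX"),   # write to a shared block is treated as a miss
    (M, "read"):  (M, None),       # read hit
    (M, "write"): (M, None),       # write hit
}

# (state, snooped bus transaction) -> (next state, action or None)
BUS_TRANSITIONS = {
    (I, "BusRd"):  (I, None),
    (I, "BusRdX"): (I, None),
    (S, "BusRd"):  (S, None),
    (S, "BusRdX"): (I, None),      # another cache wants to write: invalidate
    (M, "BusRd"):  (S, "Flush"),   # supply the dirty block, keep a shared copy
    (M, "BusRdX"): (I, "Flush"),   # supply the dirty block, then invalidate
}

def access(states, cpu, op):
    """Apply a CPU read/write at cache `cpu`; the other caches snoop the bus."""
    states[cpu], bus_op = CPU_TRANSITIONS[(states[cpu], op)]
    if bus_op is not None:
        for other in range(len(states)):
            if other != cpu:
                states[other], _ = BUS_TRANSITIONS[(states[other], bus_op)]
    return bus_op

states = [I, I, I]          # one block, three caches
access(states, 0, "read")   # [Shared, Invalid, Invalid]
access(states, 1, "read")   # [Shared, Shared, Invalid]
access(states, 0, "write")  # [Modified, Invalid, Invalid]
access(states, 2, "read")   # [Shared, Invalid, Shared]
print(states)
```

Keeping the CPU-side and bus-side tables separate mirrors the split between the "CPU requests" and "Bus requests" diagrams; the combined machine is simply both tables applied to the same state.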
18
Some snooping cache variations
  • Basic Protocol
  • Three states: MSI.
  • Can optimize by refining the states so as to
    reduce the transactions in some cases.
  • Berkeley protocol
  • Five states, M → owned, exclusive, owned shared.
  • Illinois protocols (five states)
  • MESI protocol (four states)
  • M → Modified and Exclusive.
  • Used by Intel MP systems.

19
Multiple levels of caches
  • Most processors today have on-chip L1 and L2
    caches.
  • Transactions on L1 cache are not visible to bus
    (needs separate snooper for coherence, which
    would be expensive).
  • Typical solution
  • Maintain the inclusion property on the L1 and L2
    caches so that all bus transactions that are
    relevant to L1 are also relevant to L2; it is
    then sufficient to use only the L2 controller to
    snoop the bus.
  • Propagate transactions for coherence through
    the hierarchy.

20
Large shared memory multiprocessors
  • The interconnection network is usually not a
    bus.
  • No broadcast medium → cannot snoop.
  • Needs a different kind of cache coherence
    protocol.

21
Cache coherence for large SMPs
  • Use a directory for each cache line to track the
    state of every block in the cache.
  • Can also track the state for all memory blocks →
    directory size O(memory size).
  • Need to use a distributed directory
  • A centralized directory becomes the bottleneck.
  • Typically called cc-NUMA multiprocessors
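
To make the O(memory size) claim concrete, here is a rough Python estimate. It assumes a full directory with one entry per memory block, each entry holding one presence bit per processor plus a couple of state bits. The machine parameters (1 GiB of memory, 64-byte blocks, 64 processors) are hypothetical.

```python
def directory_overhead(memory_bytes, block_bytes, num_procs, state_bits=2):
    """Return (directory_bits, fraction_of_memory) for a per-block directory.

    Assumes one presence bit per processor plus `state_bits` per entry.
    """
    num_blocks = memory_bytes // block_bytes
    bits_per_entry = num_procs + state_bits
    directory_bits = num_blocks * bits_per_entry
    return directory_bits, directory_bits / (memory_bytes * 8)

# Hypothetical machine: 1 GiB memory, 64-byte blocks, 64 processors.
bits, fraction = directory_overhead(
    memory_bytes=1 << 30, block_bytes=64, num_procs=64
)
print(bits, f"{fraction:.1%}")  # 1107296256 12.9%
```

The entry count grows with memory size and the entry width grows with the processor count, which is one reason the directory is distributed across nodes along with the memory it describes.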

22
ccNUMA multiprocessors
The directory in the home node stores the
cache information (who has the line) in the whole
system.
23
Directory based cache coherence protocols
  • States of cache lines are similar to the snoopy
    protocol; three states:
  • Shared: one or more processors have the data,
    memory up-to-date
  • Uncached: not valid in any cache
  • Exclusive: one processor has the data, memory
    out-of-date
  • Directory must track:
  • Cache state
  • Which processors have the data when it is in the
    shared state
  • Bit vector: 1 if a particular processor has a
    copy
  • Id and bit vector combination
  • Keep it simple:
  • Writes to non-exclusive data → write miss
  • Processor blocks until access completes
  • Assume messages are received and acted upon in
    the order they were sent
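
A directory entry as described above can be sketched as a state plus a bit vector of sharers. This is a minimal Python illustration; the class and method names are hypothetical, and transitions out of the exclusive state (which need a fetch from the owner) are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum

class DirState(Enum):
    UNCACHED = "uncached"
    SHARED = "shared"
    EXCLUSIVE = "exclusive"

@dataclass
class DirectoryEntry:
    state: DirState = DirState.UNCACHED
    sharers: int = 0  # bit vector: bit i is 1 if processor i has a copy

    def add_sharer(self, proc):
        # Record a read copy (valid only from UNCACHED or SHARED in this sketch).
        self.sharers |= 1 << proc
        self.state = DirState.SHARED

    def set_owner(self, proc):
        # Exactly one processor holds the block; all other bits are cleared.
        self.sharers = 1 << proc
        self.state = DirState.EXCLUSIVE

    def sharer_ids(self):
        return [i for i in range(self.sharers.bit_length())
                if (self.sharers >> i) & 1]

entry = DirectoryEntry()
for proc in (3, 4, 5):
    entry.add_sharer(proc)
print(bin(entry.sharers), entry.sharer_ids())  # 0b111000 [3, 4, 5]

entry.set_owner(1)
print(entry.state.value, entry.sharer_ids())   # exclusive [1]
```

Using the same bit vector to hold the single owner in the exclusive state is one simple way to cover the "Id and bit vector combination" bullet.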

24
Directory based cache coherence protocols
  • No bus and do not want to broadcast
  • Typically 3 processors are involved:
  • Local node: where a request originates
  • Home node: where the memory location of an address
    resides
  • Remote node: has a copy of a cache block (exclusive
    or shared)

25
Directory protocol messages example
26
An example
  • Let variable u be located in p2.
  • The caches of p3, p4, p5 have shared cache copies
    of u.
  • Which directory stores what cache information?
  • What should happen when p1 writes a new value to
    u?

27
An example
  • What happens when p1 writes a new value to u?
  • p1 finds out that u is located in p2 and sends
    WriteMiss(p2, u) to p2.
  • p2 checks its directory and sees that p3, p4, and
    p5 have shared copies.
  • p2 sends invalidate(p3, u), invalidate(p4, u),
    and invalidate(p5, u) to p3, p4, and p5.
  • p2 changes the directory entry for u to p1
    exclusive.
  • p2 returns the data to p1: datareply(p1, u).
  • p1 updates its cache and returns from the write
    operation.
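
The message sequence above can be replayed as a small Python sketch. The message names follow the slide; the dictionary-based directory and the function name are assumptions made for illustration. Only the shared and uncached cases are handled, and acknowledgements are not modeled.

```python
# Replays the write-miss steps listed above as a message log.

def handle_write_miss(directory, home, requester, addr):
    """Home node handles a write miss on `addr`; returns the messages exchanged."""
    entry = directory[addr]
    if entry["state"] == "exclusive":
        # Fetching the block back from the current owner is not sketched here.
        raise NotImplementedError("exclusive case is outside this sketch")
    log = [f"p{requester} -> p{home}: WriteMiss(p{home}, {addr})"]
    # Invalidate every shared copy except the requester's own.
    for sharer in sorted(entry["sharers"] - {requester}):
        log.append(f"p{home} -> p{sharer}: invalidate(p{sharer}, {addr})")
    # Record the requester as the exclusive owner.
    entry["state"] = "exclusive"
    entry["sharers"] = {requester}
    log.append(f"p{home} -> p{requester}: datareply(p{requester}, {addr})")
    return log

# u lives in p2's memory, so p2's directory tracks its sharers: p3, p4, p5.
directory_p2 = {"u": {"state": "shared", "sharers": {3, 4, 5}}}
messages = handle_write_miss(directory_p2, home=2, requester=1, addr="u")
for message in messages:
    print(message)
```

The home node p2 is the only place that knows who shares u, which is why every request for u is routed through it instead of being broadcast.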