Shared Memory Multiprocessors: Cache Coherence

Provided by: Surf6

1
Shared Memory Multiprocessors: Cache Coherence
2
SMP hardware organization
3

SMP systems support a shared memory abstraction:
all processors see the whole memory and can perform memory operations on all memory locations.
Two key issues in such an architecture:
Cache coherence
Memory consistency model: a formal specification of memory semantics
Why is this non-trivial?
The model affects many hardware and software optimization techniques.
Cache coherence is one part of what defines the consistency model.

4
Cache coherence problem

Because caches hold copies of memory, different processors may see different values for the same memory location.
Processors see different values for u after event 3.
With a write-back cache, memory may hold stale data.
This happens frequently and is unacceptable to applications.
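
The problem can be reproduced with a minimal sketch of private write-back caches that have no coherence protocol at all. The initial value 5, the written value 7, and the order of the five events are assumptions made for illustration; the slide's figure is not part of this transcript.

```python
# Sketch: three private write-back caches with NO coherence protocol.
# Values and event order are illustrative assumptions.
memory = {"u": 5}

class Cache:
    def __init__(self):
        self.lines = {}                  # address -> locally cached value

    def read(self, addr):
        if addr not in self.lines:       # miss: fetch the block from memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]          # hit: return the local copy

    def write(self, addr, value):
        self.lines[addr] = value         # write-back: memory is not updated yet

p1, p2, p3 = Cache(), Cache(), Cache()
r1 = p1.read("u")      # event 1: P1 caches u
r2 = p3.read("u")      # event 2: P3 caches u
p3.write("u", 7)       # event 3: P3 updates only its own copy
r4 = p1.read("u")      # event 4: P1 hits its stale copy
r5 = p2.read("u")      # event 5: P2 misses and reads stale memory
print(r4, r5, p3.read("u"))
```

After event 3, only P3 observes the new value; P1 and P2 still read the old one, and memory itself is stale.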

5
Bus Snoopy Cache Coherence protocols

Memory is centralized, with uniform access time and a bus interconnect.
Example: all Intel MP machines, like diablo

6
Bus Snooping idea

When necessary, send requests to all processors (and caches).
Processors (caches) snoop to see if they have a copy and respond accordingly.
The cache listens to both the CPU and the bus.
The state of a cache line may change by (1) a CPU memory operation, and (2) a bus transaction (a remote CPU's memory operation).
Requires broadcast, since caching information may be at all processors.
The bus is a natural broadcast medium.
The bus (a centralized medium) also serializes requests.
Bus snoopy cache coherence protocols dominate small scale machines.

7
Types of snoopy bus protocols

Write invalidate protocols
Write to shared data: an invalidate is sent to all caches, which snoop and invalidate copies.
Read miss:
Write-through: memory is always up-to-date
Write-back: snoop in caches to find most recent copy
Write broadcast protocols (typically write through)
Write to shared data: broadcast on bus; processors snoop and update any copies.
Read miss: memory is always up to date.
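
The difference between the two families can be illustrated with a small sketch. It assumes write-through caches for both policies (so memory is always current) and treats one address as one block; the `Bus`, `Cache`, and `run` names are hypothetical and exist only for this example.

```python
# Sketch: write-invalidate vs. write-broadcast (update) on a shared bus.
# Assumes write-through caches; all names here are illustrative.
class Bus:
    def __init__(self, policy):
        self.policy = policy             # "invalidate" or "update"
        self.memory = {}
        self.caches = []

    def write(self, writer, addr, value):
        self.memory[addr] = value        # write-through: memory stays up to date
        for cache in self.caches:
            if cache is writer or addr not in cache.lines:
                continue                 # nothing to do for caches without a copy
            if self.policy == "invalidate":
                del cache.lines[addr]    # snooping cache drops its copy
            else:
                cache.lines[addr] = value  # snooping cache updates its copy

class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        self.misses = 0
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:       # read miss: memory is always current
            self.misses += 1
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)

def run(policy):
    bus = Bus(policy)
    bus.memory["x"] = 0
    a, b = Cache(bus), Cache(bus)
    b.read("x")                          # b caches x (first miss)
    a.write("x", 1)                      # a writes shared data
    return b.read("x"), b.misses         # value b sees, and b's miss count

print(run("invalidate"), run("update"))
```

Both policies let the reader see the new value; under invalidation the reader takes a second miss, while under update it hits at the cost of a broadcast on every write.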

8
An Example Snoopy Protocol (MSI)

Invalidation protocol, write-back cache
Each block of memory is in one state:
Clean in all caches and up-to-date in memory (shared)
Dirty in exactly one cache (exclusive)
Not in any cache
Each cache block is in one state:
Shared: block can be read
Exclusive: cache has the only copy; it is writable and dirty (the M, or modified, state in MSI)
Invalid: block contains no data
Read misses cause all caches to snoop the bus.
A write to a shared block is treated as a miss (needs a bus transaction).
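
A minimal sketch of such a protocol follows. It uses the state letters M, S, and I (M corresponding to the slide's "Exclusive" state), treats one address as one block, and uses two bus transaction names, `BusRd` and `BusRdX`, that are assumptions for this example rather than names taken from the slides.

```python
# Sketch: MSI invalidation protocol with write-back caches on a snoopy bus.
# One address = one block; "BusRd"/"BusRdX" are assumed transaction names.
class Bus:
    def __init__(self):
        self.memory = {}
        self.caches = []
        self.log = []                    # bus transactions, in serialized order

    def broadcast(self, kind, addr, source):
        self.log.append((kind, addr))
        for cache in self.caches:
            if cache is not source:
                cache.snoop(kind, addr)

class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.state = {}                  # addr -> "M", "S" or "I" (absent = "I")
        self.data = {}
        bus.caches.append(self)

    def read(self, addr):
        if self.state.get(addr, "I") == "I":          # read miss
            self.bus.broadcast("BusRd", addr, self)   # a dirty owner writes back
            self.data[addr] = self.bus.memory.get(addr, 0)
            self.state[addr] = "S"
        return self.data[addr]

    def write(self, addr, value):
        if self.state.get(addr, "I") != "M":          # write to S or I is a miss
            self.bus.broadcast("BusRdX", addr, self)  # others invalidate copies
            self.state[addr] = "M"
        self.data[addr] = value                       # memory is now stale

    def snoop(self, kind, addr):
        st = self.state.get(addr, "I")
        if st == "M":
            self.bus.memory[addr] = self.data[addr]   # supply most recent copy
        if kind == "BusRd" and st == "M":
            self.state[addr] = "S"                    # downgrade to shared
        elif kind == "BusRdX" and st != "I":
            self.state[addr] = "I"                    # invalidate on remote write

bus = Bus()
bus.memory["u"] = 5
p1, p2, p3 = Cache(bus), Cache(bus), Cache(bus)
p1.read("u"); p3.read("u")
p3.write("u", 7)
print(p1.read("u"), p2.read("u"), bus.log)
```

In this run, P3's write invalidates P1's copy, and P1's next read forces P3 to write the block back, so every processor observes the new value.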

9
MSI protocol state machine for CPU requests
10
MSI protocol state machine for Bus requests
11
MSI protocol state machine (combined)
18
Some snooping cache variations

Basic protocol
Three states: MSI.
Can optimize by refining the states so as to reduce the transactions in some cases.
Berkeley protocol
Five states, M → owned, exclusive, owned shared.
Illinois protocols (five states)
MESI protocol (four states)
M → Modified and Exclusive.
Used by Intel MP systems.

19
Multiple levels of caches

Most processors today have on-chip L1 and L2 caches.
Transactions on the L1 cache are not visible to the bus (it would need a separate snooper for coherence, which would be expensive).
Typical solution:
Maintain the inclusion property on the L1 and L2 caches, so that all bus transactions that are relevant to L1 are also relevant to L2; it is then sufficient to use only the L2 controller to snoop the bus.
Propagate transactions for coherence through the hierarchy.
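
The role of inclusion can be sketched as follows. The model tracks only which addresses are present at each level; the class and method names are hypothetical, and a real controller would also track line state and handle write-backs.

```python
# Sketch: an inclusive L1/L2 pair where only the L2 controller snoops the bus.
# Presence-only model; names are illustrative assumptions.
class TwoLevelCache:
    def __init__(self):
        self.l1 = set()
        self.l2 = set()
        self.l1_probes = 0               # how often a snoop had to disturb L1

    def fill(self, addr):
        self.l2.add(addr)                # inclusion: anything in L1 is also in L2
        self.l1.add(addr)

    def evict_l2(self, addr):
        self.l2.discard(addr)
        self.l1.discard(addr)            # back-invalidate L1 to keep inclusion

    def snoop_invalidate(self, addr):
        if addr not in self.l2:
            return False                 # by inclusion, L1 cannot hold it either
        self.l2.discard(addr)
        self.l1_probes += 1              # propagate the transaction up to L1
        self.l1.discard(addr)
        return True

c = TwoLevelCache()
c.fill(0x40)
print(c.snoop_invalidate(0x80), c.snoop_invalidate(0x40), c.l1_probes)
```

Snoops for blocks that L2 does not hold are filtered without touching L1, which is the point of maintaining inclusion.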

20
Large shared memory multiprocessors

The interconnection network is usually not a bus.
No broadcast medium → cannot snoop.
Needs a different kind of cache coherence protocol.

21
Cache coherence for large SMPs

Use a directory for each cache line to track the state of every block in the cache.
Can also track the state for all memory blocks → directory size = O(memory size).
Need to use a distributed directory:
A centralized directory becomes the bottleneck.
Typically called cc-NUMA multiprocessors
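
The O(memory size) claim can be made concrete with a little arithmetic. The sketch below assumes a full bit-vector directory with one presence bit per processor plus a small number of state bits per memory block; the function names and the default of 2 state bits are assumptions for illustration.

```python
# Sketch: size of a full bit-vector directory that covers every memory block.
# One presence bit per processor plus state_bits per entry (assumed layout).
def directory_bits(memory_bytes, block_bytes, num_procs, state_bits=2):
    num_blocks = memory_bytes // block_bytes      # one entry per memory block
    return num_blocks * (num_procs + state_bits)

def overhead(block_bytes, num_procs, state_bits=2):
    # Directory bits per data bit; independent of total memory size.
    return (num_procs + state_bits) / (block_bytes * 8)

one_gib = directory_bits(2**30, 64, 64)
two_gib = directory_bits(2**31, 64, 64)
print(one_gib, two_gib, overhead(64, 64))
```

Doubling memory doubles the directory, and with 64 processors and 64-byte blocks the directory costs roughly 13% extra storage under these assumptions.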

22
ccNUMA multiprocessors
The directory in the home node stores the cache information (who has the line) for the whole system.
23
Directory based cache coherence protocols

States of cache lines are similar to the snoopy protocol; three states:
Shared: one or more processors have the data, memory up-to-date
Uncached: not valid in any cache
Exclusive: 1 processor has the data, memory out-of-date
Directory must track:
Cache state
Which processors have the data when it is in the shared state
Bit vector: 1 if a particular processor has a copy
Id and bit vector combination
Keep it simple:
Writes to non-exclusive data → write miss
Processor blocks until access completes
Assume messages are received and acted upon in the order they are sent
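
A directory entry of this kind can be sketched as a state plus an integer used as a bit vector, where bit i is set if processor i holds a copy. The class and method names are hypothetical and chosen only for this illustration.

```python
# Sketch: one directory entry = state + sharer bit vector (names are assumed).
class DirectoryEntry:
    def __init__(self):
        self.state = "Uncached"          # "Uncached", "Shared" or "Exclusive"
        self.sharers = 0                 # bit i set => processor i has a copy

    def add_sharer(self, proc):
        self.state = "Shared"
        self.sharers |= 1 << proc

    def set_exclusive(self, proc):
        self.state = "Exclusive"
        self.sharers = 1 << proc         # exactly one owner; memory is stale

    def clear(self):
        self.state = "Uncached"
        self.sharers = 0

    def sharer_ids(self):
        # Decode the bit vector back into a list of processor ids.
        return [i for i in range(self.sharers.bit_length())
                if (self.sharers >> i) & 1]

entry = DirectoryEntry()
for proc in (3, 4, 5):
    entry.add_sharer(proc)
print(entry.state, bin(entry.sharers), entry.sharer_ids())
```

The bit vector makes "who must be invalidated" a simple scan, at the cost of one bit per processor in every entry.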

24
Directory based cache coherence protocols

No bus, and we do not want to broadcast.
Typically 3 processors are involved:
Local node: where a request originates
Home node: where the memory location of an address resides
Remote node: has a copy of a cache block (exclusive or shared)

25
Directory protocol messages example
26
An example

Let variable u be located in p2.
The caches of p3, p4, and p5 have shared copies of u.
Which directory stores what cache information?
What should happen when p1 writes a new value to u?

27
An example

What happens when p1 writes a new value to u?
p1 finds out that u is located in p2 and sends WriteMiss(p2, u) to p2.
p2 checks its directory and sees that p3, p4, and p5 have shared copies.
p2 sends invalidate(p3, u), invalidate(p4, u), and invalidate(p5, u) to p3, p4, and p5.
p2 changes the directory entry for u to be p1 exclusive.
p2 returns the data to p1: datareply(p1, u).
p1 updates its cache and returns from the write operation.
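
The message sequence above can be traced with a small sketch. It handles only the case in the example (a write miss to a block that is Shared or Uncached at the home node); a block that is Exclusive elsewhere would need an extra fetch step that is omitted here. The `Node` class, the `write_miss` helper, and the data values are assumptions for illustration.

```python
# Sketch: directory-based write miss for the example with home node p2.
# Covers only Shared/Uncached blocks; names and values are illustrative.
class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}                  # addr -> cached value
        self.directory = {}              # home only: addr -> (state, sharers)

def write_miss(local, home, nodes, addr, value, log):
    log.append(f"{local.name}->{home.name}: WriteMiss({home.name}, {addr})")
    state, sharers = home.directory[addr]
    if state == "Shared":
        for name in sorted(sharers):     # home invalidates every other sharer
            if name != local.name:
                log.append(f"{home.name}->{name}: invalidate({name}, {addr})")
                nodes[name].cache.pop(addr, None)
    home.directory[addr] = ("Exclusive", {local.name})
    log.append(f"{home.name}->{local.name}: datareply({local.name}, {addr})")
    local.cache[addr] = value            # local node completes the write

nodes = {name: Node(name) for name in ("p1", "p2", "p3", "p4", "p5")}
for name in ("p3", "p4", "p5"):
    nodes[name].cache["u"] = 5           # shared copies of u
nodes["p2"].directory["u"] = ("Shared", {"p3", "p4", "p5"})

log = []
write_miss(nodes["p1"], nodes["p2"], nodes, "u", 7, log)
print("\n".join(log))
```

Only the home node's directory is consulted, and only the three actual sharers receive invalidations, so no broadcast is needed.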