Lecture 19: Coherence Protocols

Provided by: RajeevBala4

Learn more at: https://my.eng.utah.edu

1
Lecture 19: Coherence Protocols

Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections 6.3-6.5)

2
SMPs or Centralized Shared-Memory
[Figure: four processors, each with its own caches, connected to a single main memory and I/O system]
3
Distributed Memory Multiprocessors
[Figure: four nodes, each with a processor and caches plus its own memory and I/O, connected by an interconnection network]
4
Shared-Memory Vs. Message-Passing

Shared-memory:
  Well-understood programming model
  Communication is implicit and hardware handles protection
  Hardware-controlled caching
Message-passing:
  No cache coherence → simpler hardware
  Explicit communication → easier for the programmer to restructure code
  Sender can initiate data transfer

5
Ocean Kernel
Procedure Solve(A)
begin
  diff = done = 0;
  while (!done) do
    diff = 0;
    for i ← 1 to n do
      for j ← 1 to n do
        temp = A[i,j];
        A[i,j] ← 0.2 * (A[i,j] + neighbors);
        diff += abs(A[i,j] - temp);
      end for
    end for
    if (diff < TOL) then done = 1;
  end while
end procedure
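A minimal Python sketch of the sequential kernel above may make the loop structure easier to follow. It assumes that "neighbors" means the four nearest neighbours and that A is an (n+2) x (n+2) grid whose outer ring holds fixed boundary values; neither detail is stated on the slide, and the sweep counter is added only for illustration.

```python
def solve(A, n, tol):
    """Relax the n x n interior of A in place until one sweep changes it by less than tol."""
    sweeps = 0
    done = False
    while not done:
        diff = 0.0
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                temp = A[i][j]
                # Assumption: "neighbors" is the sum of the four nearest neighbours.
                A[i][j] = 0.2 * (A[i][j] + A[i - 1][j] + A[i + 1][j]
                                 + A[i][j - 1] + A[i][j + 1])
                diff += abs(A[i][j] - temp)
        sweeps += 1
        if diff < tol:
            done = True
    return sweeps


def make_grid(n):
    """Hypothetical test grid: top boundary row held at 1.0, everything else 0.0."""
    grid = [[0.0] * (n + 2) for _ in range(n + 2)]
    grid[0] = [1.0] * (n + 2)
    return grid
```

Calling solve(make_grid(4), 4, 1e-6) updates the grid in place and returns the number of sweeps it took to converge.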
6
Shared Address Space Model
int n, nprocs;
float **A, diff;
LOCKDEC(diff_lock);
BARDEC(bar1);

main()
begin
  read(n); read(nprocs);
  A ← G_MALLOC();
  initialize(A);
  CREATE(nprocs, Solve, A);
  WAIT_FOR_END(nprocs);
end main

procedure Solve(A)
  int i, j, pid, done = 0;
  float temp, mydiff = 0;
  int mymin = 1 + (pid * n/nprocs);
  int mymax = mymin + n/nprocs - 1;
  while (!done) do
    mydiff = diff = 0;
    BARRIER(bar1, nprocs);
    for i ← mymin to mymax
      for j ← 1 to n do
        ...
      endfor
    endfor
    LOCK(diff_lock);
    diff += mydiff;
    UNLOCK(diff_lock);
    BARRIER(bar1, nprocs);
    if (diff < TOL) then done = 1;
    BARRIER(bar1, nprocs);
  endwhile
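The shared address space version can be approximated with Python threads, using threading.Barrier for BARRIER and threading.Lock for LOCK/UNLOCK. This is only a sketch: it assumes n is divisible by nprocs, that the elided loop body is the same four-neighbour update as in the sequential kernel, and that pid runs from 0 to nprocs-1. The shared dictionary stands in for the global diff.

```python
import threading


def solve_shared(A, n, nprocs, tol):
    """Run the relaxation with nprocs threads that share A and a global diff."""
    shared = {"diff": 0.0}
    diff_lock = threading.Lock()
    bar1 = threading.Barrier(nprocs)

    def worker(pid):
        mymin = 1 + pid * (n // nprocs)
        mymax = mymin + n // nprocs - 1
        done = False
        while not done:
            mydiff = 0.0
            shared["diff"] = 0.0          # mydiff = diff = 0
            bar1.wait()                   # everyone has reset diff
            for i in range(mymin, mymax + 1):
                for j in range(1, n + 1):
                    temp = A[i][j]
                    # Assumed loop body: four-neighbour update from the sequential kernel.
                    A[i][j] = 0.2 * (A[i][j] + A[i - 1][j] + A[i + 1][j]
                                     + A[i][j - 1] + A[i][j + 1])
                    mydiff += abs(A[i][j] - temp)
            with diff_lock:               # LOCK ... UNLOCK around the accumulation
                shared["diff"] += mydiff
            bar1.wait()                   # all partial sums are in
            if shared["diff"] < tol:
                done = True
            bar1.wait()                   # all threads have tested diff before it is reset

    threads = [threading.Thread(target=worker, args=(pid,)) for pid in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["diff"]
```

The third barrier matters: without it, a fast thread could loop around and reset diff to 0 before a slower thread has compared it with the tolerance.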
7
Message Passing Model
main()
  read(n); read(nprocs);
  CREATE(nprocs-1, Solve);
  Solve();
  WAIT_FOR_END(nprocs-1);

procedure Solve()
  int i, j, pid, nn = n/nprocs, done = 0;
  float temp, tempdiff, mydiff = 0;
  myA ← malloc();
  initialize(myA);
  while (!done) do
    mydiff = 0;
    if (pid != 0)
      SEND(myA[1,0], n, pid-1, ROW);
    if (pid != nprocs-1)
      SEND(myA[nn,0], n, pid+1, ROW);
    if (pid != 0)
      RECEIVE(myA[0,0], n, pid-1, ROW);
    if (pid != nprocs-1)
      RECEIVE(myA[nn+1,0], n, pid+1, ROW);
    for i ← 1 to nn do
      for j ← 1 to n do
        ...
      endfor
    endfor
    if (pid != 0)
      SEND(mydiff, 1, 0, DIFF);
      RECEIVE(done, 1, 0, DONE);
    else
      for i ← 1 to nprocs-1 do
        RECEIVE(tempdiff, 1, *, DIFF);
        mydiff += tempdiff;
      endfor
      if (mydiff < TOL) done = 1;
      for i ← 1 to nprocs-1 do
        SEND(done, 1, i, DONE);
      endfor
    endif
  endwhile
8
Coherence Protocols

Two conditions for cache coherence:
  write propagation
  write serialization
Cache coherence protocols:
  snooping
  directory-based
  write-update
  write-invalidate

9
SMP Example
Request sequence: A: Rd X, B: Rd X, C: Rd X, A: Wr X, A: Wr X, C: Wr X, B: Rd X, A: Rd X, A: Rd Y, B: Wr X, B: Rd Y, B: Wr X, B: Wr Y
[Figure: processors A, B, C, and D, each with its own caches, sharing main memory and the I/O system]
10
SMP Example
Request    A        B        C
A: Rd X
B: Rd X
C: Rd X
A: Wr X
A: Wr X
C: Wr X
B: Rd X
A: Rd X
A: Rd Y
B: Wr X
B: Rd Y
B: Wr X
B: Wr Y
11
SMP Example
Request    A        B        C
A: Rd X    S
B: Rd X    S        S
C: Rd X    S        S        S
A: Wr X    E        I        I
A: Wr X    E        I        I
C: Wr X    I        I        E
B: Rd X    I        S        S
A: Rd X    S        S        S
A: Rd Y    S (Y)    S (X)    S (X)
B: Wr X    S (Y)    E (X)    I
B: Rd Y    S (Y)    S (Y)    I
B: Wr X    S (Y)    E (X)    I
B: Wr Y    I        E (Y)    I
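The table above can be reproduced with a small simulation of a three-state write-invalidate snooping protocol. The sketch below is an illustration, not the protocol from the lecture's diagram: it assumes each cache holds a single line, so X and Y evict one another (which is what the table suggests), and it prints I for a block that was never loaded, where the slide leaves the cell blank.

```python
PROCS = ["A", "B", "C"]

TRACE = [("A", "Rd", "X"), ("B", "Rd", "X"), ("C", "Rd", "X"), ("A", "Wr", "X"),
         ("A", "Wr", "X"), ("C", "Wr", "X"), ("B", "Rd", "X"), ("A", "Rd", "X"),
         ("A", "Rd", "Y"), ("B", "Wr", "X"), ("B", "Rd", "Y"), ("B", "Wr", "X"),
         ("B", "Wr", "Y")]


def simulate(trace):
    """Return the per-cache states after each request of the trace."""
    # Each cache is one line: [state, block], state in I (invalid), S (shared), E (exclusive).
    cache = {p: ["I", None] for p in PROCS}
    history = []
    for proc, op, block in trace:
        state, held = cache[proc]
        # Reads hit in S or E; writes hit only in E.
        hit = held == block and (state == "E" or (state == "S" and op == "Rd"))
        if not hit:
            for other in PROCS:
                if other == proc or cache[other][1] != block:
                    continue
                if op == "Wr" and cache[other][0] != "I":
                    cache[other][0] = "I"    # a write invalidates every other copy
                elif op == "Rd" and cache[other][0] == "E":
                    cache[other][0] = "S"    # a read downgrades the exclusive owner
            # The requester replaces whatever its single line held before.
            cache[proc] = ["E" if op == "Wr" else "S", block]
        history.append(tuple(
            "I" if cache[p][0] == "I" else f"{cache[p][0]}({cache[p][1]})"
            for p in PROCS))
    return history


if __name__ == "__main__":
    for (proc, op, block), row in zip(TRACE, simulate(TRACE)):
        print(f"{proc}: {op} {block}   " + "  ".join(f"{s:5}" for s in row))
```

Each row of the output gives the state of the line in caches A, B, and C after the request, with the resident block in parentheses.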
12
Example Protocol
13
Performance Improvements

What determines performance on a multiprocessor?
  What fraction of the program is parallelizable?
  How does memory hierarchy performance change?
New form of cache miss: coherence miss. Such a miss would not have happened if another processor had not written to the same cache line.
False coherence miss: the second processor writes to a different word in the same cache line. This miss would not have happened if the line size equaled one word.
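The distinction between the two kinds of coherence miss comes down to address arithmetic. The helper below is a hypothetical illustration: the function name, the word-granularity addresses, and the returned labels are all assumptions made for this sketch.

```python
def classify_miss(writer_word, reader_word, words_per_line):
    """Classify the miss a reader takes after another processor wrote writer_word."""
    if writer_word // words_per_line != reader_word // words_per_line:
        # Different cache lines: the write never invalidates the reader's copy.
        return "no coherence miss"
    if writer_word == reader_word:
        return "true coherence miss"     # the reader needs the value that was written
    return "false coherence miss"        # same line, different word
```

For example, with four-word lines, a write to word 4 followed by a read of word 5 is a false coherence miss; with one-word lines the same pair of accesses causes no coherence miss at all.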

14
How do Cache Misses Scale?
15
Simplifying Assumptions

All transactions on a read or write are atomic: on a write miss, the miss is sent on the bus, a block is fetched from memory/remote cache, and the block is marked exclusive.
Potential problem if the actions are non-atomic: P1 sends a write miss on the bus, then P2 sends a write miss on the bus. Since the block is still invalid in P1, P2 does not realize that it should write after receiving the block from P1; instead, it receives the block from memory.
Most problems are fixable by keeping track of more state: for example, don't acquire the bus unless all outstanding transactions for the block have completed.
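One way to picture the "more state" fix is an arbiter that remembers which blocks have a transaction in flight and refuses a second one for the same block. The class and method names below are hypothetical; real bus arbitration is considerably more involved.

```python
class BusArbiter:
    """Hypothetical arbiter that allows one outstanding transaction per block."""

    def __init__(self):
        self.outstanding = set()    # blocks whose transaction has not completed

    def try_acquire(self, block):
        """Grant the bus for block unless an earlier transaction is still pending."""
        if block in self.outstanding:
            return False            # the requester must retry later
        self.outstanding.add(block)
        return True

    def complete(self, block):
        """Mark the transaction for block as finished."""
        self.outstanding.discard(block)
```

In the P1/P2 scenario above, P2's write miss would be refused until P1's transaction completes, so P2 then obtains the block from P1 rather than a stale copy from memory.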

16
Coherence in Distributed Memory Multiprocs

Distributed memory systems are typically larger → bus-based snooping may not work well
Option 1: software-based mechanisms (message-passing systems or software-controlled cache coherence)
Option 2: hardware-based mechanisms (directory-based cache coherence)

17
Directory-Based Cache Coherence

The physical memory is distributed among all processors.
The directory is also distributed along with the corresponding memory.
The physical address is enough to determine the location of memory.
The (many) processing nodes are connected with a scalable interconnect (not a bus). Hence, messages are no longer broadcast, but routed from sender to receiver. Since the processing nodes can no longer snoop, the directory keeps track of sharing state.
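As an illustration of how a physical address alone can identify the node that holds a block and its directory entry, the sketch below assumes blocks are interleaved round-robin across nodes. The interleaving scheme, the 64-byte block size, and the function name are assumptions; the slides do not specify the mapping.

```python
def home_node(physical_address, num_nodes, block_bytes=64):
    """Return the node whose memory and directory hold this address."""
    block_number = physical_address // block_bytes
    # Assumed mapping: consecutive blocks go to consecutive nodes.
    return block_number % num_nodes
```

A requester can compute home_node locally and route its miss straight to that node, with no broadcast needed.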

18
Distributed Memory Multiprocessors
[Figure: four nodes, each with a processor and caches, memory, I/O, and a directory, connected by an interconnection network]
19
Cache Block States

What are the different states a block of memory can have within the directory?
Note that we need information for each cache so that invalidate messages can be sent.
The block state is also stored in the cache for efficiency.
The directory now serves as the arbitrator: if multiple write attempts happen simultaneously, the directory determines the ordering.
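A directory entry can be sketched as a state plus the set of caches that hold the block. The version below assumes the three states commonly used in textbook directory protocols (uncached, shared, exclusive) and assumes that only misses reach the directory; the message names are made up for illustration.

```python
class DirectoryEntry:
    """Hypothetical directory entry for one memory block."""

    def __init__(self):
        self.state = "Uncached"    # assumed states: Uncached, Shared, Exclusive
        self.sharers = set()       # caches that currently hold a copy

    def read(self, node):
        """Handle a read miss from node; return the messages the directory sends."""
        msgs = []
        if self.state == "Exclusive":
            # The owner must supply the latest data and drop to shared.
            msgs += [("fetch", owner) for owner in sorted(self.sharers) if owner != node]
        self.sharers.add(node)
        self.state = "Shared"
        msgs.append(("data", node))
        return msgs

    def write(self, node):
        """Handle a write miss from node; every other copy is invalidated."""
        msgs = [("invalidate", s) for s in sorted(self.sharers) if s != node]
        self.sharers = {node}
        self.state = "Exclusive"
        msgs.append(("data", node))
        return msgs
```

Because every request for a block passes through this one entry, the order in which the directory processes two simultaneous writes is the order every node observes.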

20
Directory-Based Example
Request sequence: A: Rd X, B: Rd X, C: Rd X, A: Wr X, A: Wr X, C: Wr X, B: Rd X, A: Rd X, A: Rd Y, B: Wr X, B: Rd Y, B: Wr X, B: Wr Y
[Figure: three nodes on an interconnection network, each with a processor and caches, memory, I/O, and a directory; two of the directories are labeled X and Y]
21
Directory Actions