Lecture 19: Coherence Protocols - PowerPoint PPT Presentation

Lecture 19: Coherence Protocols

Slides: 23
Provided by: RajeevBala4
Learn more at: https://my.eng.utah.edu
Transcript and Presenter's Notes

1
Lecture 19: Coherence Protocols
  • Topics: coherence protocols for symmetric and distributed
    shared-memory multiprocessors (Sections 6.3-6.5)

2
SMPs or Centralized Shared-Memory
[Figure: four processors, each with its own caches, connected to a single main memory and I/O system]
3
Distributed Memory Multiprocessors
[Figure: four nodes, each with a processor and caches, local memory, and I/O, connected by an interconnection network]
4
Shared-Memory Vs. Message-Passing
  • Shared-memory
      • Well-understood programming model
      • Communication is implicit and hardware handles protection
      • Hardware-controlled caching
  • Message-passing
      • No cache coherence → simpler hardware
      • Explicit communication → easier for the programmer to
        restructure code
      • Sender can initiate data transfer

5
Ocean Kernel
Procedure Solve(A)
begin
    diff = done = 0;
    while (!done) do
        diff = 0;
        for i ← 1 to n do
            for j ← 1 to n do
                temp = A[i,j];
                A[i,j] ← 0.2 * (A[i,j] + neighbors);
                diff += abs(A[i,j] - temp);
            end for
        end for
        if (diff < TOL) then done = 1;
    end while
end procedure
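The kernel above can be sketched as runnable Python. This is a minimal sketch, not the lecture's code: it assumes that "neighbors" means the four nearest grid points (suggested by the 0.2 weight), that the grid carries a fixed one-cell border, and the name `solve` and the `tol` default are illustrative choices.

```python
def solve(A, tol=1e-6):
    """Sweep the interior of grid A in place until one sweep changes it by less than tol.

    A is a list of lists of size (n+2) x (n+2); the outer border is never written.
    Returns the number of sweeps performed.
    """
    n = len(A) - 2              # interior points are A[1..n][1..n]
    sweeps = 0
    done = False
    while not done:
        diff = 0.0
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                temp = A[i][j]
                # Assumption: "neighbors" = up, down, left, right
                A[i][j] = 0.2 * (A[i][j] + A[i - 1][j] + A[i + 1][j]
                                 + A[i][j - 1] + A[i][j + 1])
                diff += abs(A[i][j] - temp)
        sweeps += 1
        if diff < tol:
            done = True
    return sweeps
```

With a border of 1.0 and an interior of 0.0, repeated sweeps pull every interior point toward 1.0, which makes a convenient sanity check.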
6
Shared Address Space Model
int n, nprocs;
float **A, diff;
LOCKDEC(diff_lock);
BARDEC(bar1);

main()
begin
    read(n); read(nprocs);
    A ← G_MALLOC();
    initialize(A);
    CREATE(nprocs, Solve, A);
    WAIT_FOR_END(nprocs);
end main

procedure Solve(A)
    int i, j, pid, done = 0;
    float temp, mydiff = 0;
    int mymin = 1 + (pid * n/nprocs);
    int mymax = mymin + n/nprocs - 1;
    while (!done) do
        mydiff = diff = 0;
        BARRIER(bar1, nprocs);
        for i ← mymin to mymax
            for j ← 1 to n do
                ...
            endfor
        endfor
        LOCK(diff_lock);
        diff += mydiff;
        UNLOCK(diff_lock);
        BARRIER(bar1, nprocs);
        if (diff < TOL) then done = 1;
        BARRIER(bar1, nprocs);
    endwhile
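The shared address space version maps fairly directly onto Python threads. The sketch below is an illustration under assumptions: `threading.Lock` and `threading.Barrier` stand in for LOCKDEC and BARDEC, the elided inner loop body is assumed to be the same four-neighbor update as the sequential kernel, and `nprocs` is assumed to divide `n`. The name `solve_shared` is hypothetical.

```python
import threading


def solve_shared(A, nprocs, tol=1e-6):
    """Relax grid A in place using nprocs threads that share A and diff."""
    n = len(A) - 2                      # interior is A[1..n][1..n]
    shared = {"diff": 0.0}              # plays the role of the global diff
    diff_lock = threading.Lock()        # LOCKDEC(diff_lock)
    bar1 = threading.Barrier(nprocs)    # BARDEC(bar1)

    def solve(pid):
        mymin = 1 + pid * (n // nprocs)
        mymax = mymin + n // nprocs - 1
        done = False
        while not done:
            mydiff = 0.0
            shared["diff"] = 0.0
            bar1.wait()                 # diff is zero everywhere before any accumulation
            for i in range(mymin, mymax + 1):
                for j in range(1, n + 1):
                    temp = A[i][j]
                    # Assumed loop body: same update as the sequential kernel
                    A[i][j] = 0.2 * (A[i][j] + A[i - 1][j] + A[i + 1][j]
                                     + A[i][j - 1] + A[i][j + 1])
                    mydiff += abs(A[i][j] - temp)
            with diff_lock:             # LOCK ... UNLOCK around the shared update
                shared["diff"] += mydiff
            bar1.wait()                 # all partial sums have been added
            if shared["diff"] < tol:
                done = True
            bar1.wait()                 # nobody resets diff until everyone has tested it

    threads = [threading.Thread(target=solve, args=(pid,)) for pid in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Note that communication is implicit here: a thread reads its neighbors' boundary rows simply by indexing into `A`, and only the reduction into `diff` needs explicit synchronization.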
7
Message Passing Model
main()
    read(n); read(nprocs);
    CREATE(nprocs-1, Solve);
    Solve();
    WAIT_FOR_END(nprocs-1);

procedure Solve()
    int i, j, pid, nn = n/nprocs, done = 0;
    float temp, tempdiff, mydiff = 0;
    myA ← malloc()
    initialize(myA);
    while (!done) do
        mydiff = 0;
        if (pid != 0)
            SEND(myA[1,0], n, pid-1, ROW);
        if (pid != nprocs-1)
            SEND(myA[nn,0], n, pid+1, ROW);
        if (pid != 0)
            RECEIVE(myA[0,0], n, pid-1, ROW);
        if (pid != nprocs-1)
            RECEIVE(myA[nn+1,0], n, pid+1, ROW);
        for i ← 1 to nn do
            for j ← 1 to n do
                ...
            endfor
        endfor
        if (pid != 0)
            SEND(mydiff, 1, 0, DIFF);
            RECEIVE(done, 1, 0, DONE);
        else
            for i ← 1 to nprocs-1 do
                RECEIVE(tempdiff, 1, *, DIFF);
                mydiff += tempdiff;
            endfor
            if (mydiff < TOL) done = 1;
            for i ← 1 to nprocs-1 do
                SEND(done, 1, i, DONE);
            endfor
        endif
    endwhile
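The message-passing structure can be imitated in Python with threads that share nothing except explicit mailboxes. This is a sketch under assumptions, not the lecture's code: `queue.Queue` objects keyed by (source, destination, tag) stand in for SEND and RECEIVE, process 0 collects DIFF messages from each peer in turn rather than from any source, the elided loop body is assumed to be the four-neighbor update, the border value is fixed at 1.0, and `nprocs` is assumed to divide `n`. All function names are hypothetical.

```python
import queue
import threading


def solve_message_passing(n, nprocs, tol=1e-6, border=1.0):
    """Each thread owns nn = n // nprocs private rows and exchanges boundary rows."""
    nn = n // nprocs
    tags = ("ROW", "DIFF", "DONE")
    # One FIFO mailbox per (source, destination, tag), created up front.
    mailbox = {(s, d, t): queue.Queue()
               for s in range(nprocs) for d in range(nprocs) for t in tags}

    def send(data, src, dst, tag):
        mailbox[(src, dst, tag)].put(data)

    def receive(src, dst, tag):
        return mailbox[(src, dst, tag)].get()    # blocks until a message arrives

    result = [None] * nprocs

    def solve(pid):
        # Rows 1..nn are owned; rows 0 and nn+1 hold copies of neighbor rows.
        myA = [[border] + [0.0] * n + [border] for _ in range(nn + 2)]
        if pid == 0:
            myA[0] = [border] * (n + 2)
        if pid == nprocs - 1:
            myA[nn + 1] = [border] * (n + 2)
        done = False
        while not done:
            mydiff = 0.0
            if pid != 0:
                send(list(myA[1]), pid, pid - 1, "ROW")
            if pid != nprocs - 1:
                send(list(myA[nn]), pid, pid + 1, "ROW")
            if pid != 0:
                myA[0] = receive(pid - 1, pid, "ROW")
            if pid != nprocs - 1:
                myA[nn + 1] = receive(pid + 1, pid, "ROW")
            for i in range(1, nn + 1):
                for j in range(1, n + 1):
                    temp = myA[i][j]
                    myA[i][j] = 0.2 * (myA[i][j] + myA[i - 1][j] + myA[i + 1][j]
                                       + myA[i][j - 1] + myA[i][j + 1])
                    mydiff += abs(myA[i][j] - temp)
            if pid != 0:
                send(mydiff, pid, 0, "DIFF")
                done = receive(0, pid, "DONE")
            else:
                for i in range(1, nprocs):
                    mydiff += receive(i, 0, "DIFF")
                done = mydiff < tol
                for i in range(1, nprocs):
                    send(done, 0, i, "DONE")
        result[pid] = [row[1:n + 1] for row in myA[1:nn + 1]]

    threads = [threading.Thread(target=solve, args=(pid,)) for pid in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [row for block in result for row in block]
```

Compared with the shared address space version, no locks or barriers appear: the blocking receives provide all the synchronization, at the cost of copying boundary rows explicitly.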
8
Coherence Protocols
  • Two conditions for cache coherence
      • write propagation
      • write serialization
  • Cache coherence protocols
      • snooping
      • directory-based
      • write-update
      • write-invalidate

9
SMP Example
Access sequence: A: Rd X, B: Rd X, C: Rd X, A: Wr X, A: Wr X, C: Wr X, B: Rd X, A: Rd X, A: Rd Y, B: Wr X, B: Rd Y, B: Wr X, B: Wr Y
[Figure: processors A, B, C, and D, each with its own caches, connected to a single main memory and I/O system]
10
SMP Example
Request     A        B        C
A: Rd X
B: Rd X
C: Rd X
A: Wr X
A: Wr X
C: Wr X
B: Rd X
A: Rd X
A: Rd Y
B: Wr X
B: Rd Y
B: Wr X
B: Wr Y
11
SMP Example
Request     A        B        C
A: Rd X     S
B: Rd X     S        S
C: Rd X     S        S        S
A: Wr X     E        I        I
A: Wr X     E        I        I
C: Wr X     I        I        E
B: Rd X     I        S        S
A: Rd X     S        S        S
A: Rd Y     S (Y)    S (X)    S (X)
B: Wr X     S (Y)    E (X)    I
B: Rd Y     S (Y)    S (Y)    I
B: Wr X     S (Y)    E (X)    I
B: Wr Y     I        E (Y)    I
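The table above can be reproduced with a small simulation of a write-invalidate snooping protocol. This is a sketch under assumptions inferred from the table rather than stated on the slide: each cache holds a single line, so X and Y evict each other, and the three states are I (invalid), S (shared), and E (exclusive). The function name `simulate` and the tuple encoding are illustrative.

```python
def simulate(accesses, procs=("A", "B", "C")):
    """Return, after every access, each cache's (state, block) in processor order."""
    caches = {p: ("I", None) for p in procs}     # one line per cache
    history = []
    for proc, op, block in accesses:
        state, held = caches[proc]
        hit = state != "I" and held == block
        if op == "Rd":
            if not hit:
                # Read miss on the bus: an exclusive owner downgrades to shared.
                for q in procs:
                    if q != proc and caches[q] == ("E", block):
                        caches[q] = ("S", block)
                caches[proc] = ("S", block)      # replaces whatever was held
        else:  # "Wr"
            if not (hit and state == "E"):
                # Write miss or upgrade: every other copy of the block is invalidated.
                for q in procs:
                    if q != proc and caches[q][0] != "I" and caches[q][1] == block:
                        caches[q] = ("I", None)
                caches[proc] = ("E", block)
        history.append(tuple(caches[p] for p in procs))
    return history


SEQUENCE = [("A", "Rd", "X"), ("B", "Rd", "X"), ("C", "Rd", "X"),
            ("A", "Wr", "X"), ("A", "Wr", "X"), ("C", "Wr", "X"),
            ("B", "Rd", "X"), ("A", "Rd", "X"), ("A", "Rd", "Y"),
            ("B", "Wr", "X"), ("B", "Rd", "Y"), ("B", "Wr", "X"),
            ("B", "Wr", "Y")]
```

Calling `simulate(SEQUENCE)` yields one row per request; blank cells in the table correspond to `("I", None)` here.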
12
Example Protocol
13
Performance Improvements
  • What determines performance on a multiprocessor?
      • What fraction of the program is parallelizable?
      • How does memory hierarchy performance change?
  • New form of cache miss: coherence miss. Such a miss would
    not have happened if another processor did not write to
    the same cache line
  • False coherence miss: the second processor writes to a
    different word in the same cache line. This miss would
    not have happened if the line size equaled one word
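The distinction between the two kinds of coherence miss comes down to simple address arithmetic. The sketch below is illustrative only: it assumes word-granularity addresses, that the reader had the line cached before the other processor wrote, and the function name and return strings are hypothetical.

```python
def classify_sharing(written_word, read_word, words_per_line):
    """Classify the miss a reader takes after another processor's write.

    written_word and read_word are word addresses; words_per_line is the line size.
    """
    same_line = written_word // words_per_line == read_word // words_per_line
    if not same_line:
        return "no coherence miss"       # the write touched a different line
    if written_word == read_word:
        return "true sharing"            # the reader needs the new value
    return "false sharing"               # same line, different word
```

For example, words 4 and 5 share a line when lines hold 8 words, so a write to word 4 causes a false coherence miss on word 5; with one-word lines the same pair causes no miss at all.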

14
How do Cache Misses Scale?
15
Simplifying Assumptions
  • All transactions on a read or write are atomic: on a write
    miss, the miss is sent on the bus, a block is fetched from
    memory/remote cache, and the block is marked exclusive
  • Potential problem if the actions are non-atomic: P1 sends
    a write miss on the bus, P2 sends a write miss on the bus.
    Since the block is still invalid in P1, P2 does not realize
    that it should write after receiving the block from P1;
    instead, it receives the block from memory
  • Most problems are fixable by keeping track of more state:
    for example, don't acquire the bus unless all outstanding
    transactions for the block have completed

16
Coherence in Distributed Memory Multiprocs
  • Distributed memory systems are typically larger →
    bus-based snooping may not work well
  • Option 1: software-based mechanisms, i.e. message-passing
    systems or software-controlled cache coherence
  • Option 2: hardware-based mechanisms, i.e. directory-based
    cache coherence

17
Directory-Based Cache Coherence
  • The physical memory is distributed among all processors
  • The directory is also distributed along with the
    corresponding memory
  • The physical address is enough to determine the location
    of memory
  • The (many) processing nodes are connected with a scalable
    interconnect (not a bus); hence, messages are no longer
    broadcast, but routed from sender to receiver. Since the
    processing nodes can no longer snoop, the directory keeps
    track of sharing state
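One way the physical address alone can identify the home node is sketched below. The mapping is an assumption for illustration, not something the slide specifies: memory is taken to be split into equal contiguous slices, one per node, with one directory entry per block. Real machines may interleave differently, and both function names are hypothetical.

```python
def home_node(phys_addr, bytes_per_node):
    """Node whose local memory (and directory) holds this address."""
    return phys_addr // bytes_per_node


def directory_index(phys_addr, bytes_per_node, block_size):
    """Index of the block's entry within its home node's directory."""
    return (phys_addr % bytes_per_node) // block_size
```

A requester can therefore route a miss straight to the home node without any broadcast or lookup.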

18
Distributed Memory Multiprocessors
[Figure: four nodes, each with a processor and caches, local memory, I/O, and a directory, connected by an interconnection network]
19
Cache Block States
  • What are the different states a block of memory can have
    within the directory?
  • Note that we need information for each cache so that
    invalidate messages can be sent
  • The block state is also stored in the cache for efficiency
  • The directory now serves as the arbitrator: if multiple
    write attempts happen simultaneously, the directory
    determines the ordering

20
Directory-Based Example
Access sequence: A: Rd X, B: Rd X, C: Rd X, A: Wr X, A: Wr X, C: Wr X, B: Rd X, A: Rd X, A: Rd Y, B: Wr X, B: Rd Y, B: Wr X, B: Wr Y
[Figure: three nodes, each with a processor and caches, local memory, I/O, and a directory, connected by an interconnection network; one directory holds the entry for X and another holds the entry for Y]
21
Directory Actions
  • If block is in uncached state
      • Read miss: send data, make block shared
      • Write miss: send data, make block exclusive
  • If block is in shared state
      • Read miss: send data, add node to sharers list
      • Write miss: send data, invalidate sharers, make exclusive
  • If block is in exclusive state
      • Read miss: ask owner for data, write to memory, send
        data, make shared, add node to sharers list
      • Data write back: write to memory, make uncached
      • Write miss: ask owner for data, write to memory, send
        data, update identity of new owner, remain exclusive
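The directory actions listed above amount to a small state machine per memory block. The sketch below is one possible encoding, not the lecture's implementation: the class name, method names, and action strings are hypothetical, and the owner of an exclusive block is represented as the single member of the sharers set.

```python
class DirectoryEntry:
    """Directory state for one memory block: uncached, shared, or exclusive."""

    def __init__(self):
        self.state = "uncached"
        self.sharers = set()        # in exclusive state this holds only the owner

    def read_miss(self, node):
        actions = []
        if self.state == "exclusive":
            owner = next(iter(self.sharers))
            actions += [f"fetch from {owner}", "write to memory"]
        actions.append(f"send data to {node}")
        self.sharers.add(node)      # a previous owner stays on as a sharer
        self.state = "shared"
        return actions

    def write_miss(self, node):
        actions = []
        if self.state == "exclusive":
            owner = next(iter(self.sharers))
            actions += [f"fetch from {owner}", "write to memory"]
        actions.append(f"send data to {node}")
        if self.state == "shared":
            actions += [f"invalidate {s}" for s in sorted(self.sharers - {node})]
        self.sharers = {node}       # the requester becomes the sole owner
        self.state = "exclusive"
        return actions

    def write_back(self, node):
        self.sharers = set()
        self.state = "uncached"
        return ["write to memory"]
```

Each method returns the messages the directory would issue, so replaying an access sequence against one entry shows both the state transitions and the traffic they generate.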
