Title: ECE 1747: Parallel Programming
1. ECE 1747 Parallel Programming
- Distributed Shared Memory (DSM)
2. Multiprocessor (SMP)
[Figure: shared-memory multiprocessor with processors proc1, proc2 and proc3; four copies of X0 are shown]
3. Consistency Models
- Sequential Consistency
  - All processors observe the same order of memory operations
  - That order must correspond to some serial order
  - The only ordering constraint is that the reads/writes of any one processor (P1, say) appear in the order that processor issued them; there are no restrictions on the relative ordering between processors.
4. Common consistency protocols
- Write update
  - Multicast the update to all replicas
- Write invalidate
  - Invalidate the cached copies in p2, p3
  - Cache miss if p2/p3 later access X
  - Valid data is then supplied from the other cache
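5. Distributed Shared Memory (DSM)
[Figure: processors proc0, proc1, proc2, ..., procN, each with its own memory (mem0, mem1, mem2, ..., memN), connected by a network and presented as one shared memory]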
6. DSM programming
- Standard pthread-like synchronizations
  - Barriers
  - Locks
  - Semaphores
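7. Sequential SOR
    /* Fragment: i, j, t, n, grid and temp are declared elsewhere.
       `iterations` is a placeholder for "some number of timesteps/iterations".
       Loops start at 1 so that grid[i-1] and grid[j-1] stay in bounds,
       assuming rows/columns 0 and n hold fixed boundary values. */
    for (t = 0; t < iterations; t++) {
        for (i = 1; i < n; i++)
            for (j = 1; j < n; j++)
                temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);
        for (i = 1; i < n; i++)
            for (j = 1; j < n; j++)
                grid[i][j] = temp[i][j];
    }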
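8. Parallel SOR with Barriers (1 of 2)
    /* n, p, grid, temp and barrier() are shared/global;
       `iterations` is a placeholder for "some number of iterations". */
    void *sor(void *arg)
    {
        int slice = (int)(long)arg;                 /* thread index, 0..p-1 */
        int from = (slice * (n - 1)) / p + 1;       /* first row of this slice */
        int to = ((slice + 1) * (n - 1)) / p + 1;   /* one past the last row */
        int i, j, t;
        for (t = 0; t < iterations; t++) {
9. Parallel SOR with Barriers (2 of 2)
            for (i = from; i < to; i++)
                for (j = 1; j < n; j++)
                    temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                         grid[i][j-1] + grid[i][j+1]);
            barrier();   /* every slice of temp is computed before grid changes */
            for (i = from; i < to; i++)
                for (j = 1; j < n; j++)
                    grid[i][j] = temp[i][j];
            barrier();   /* grid is fully updated before the next iteration */
        }
        return NULL;
    }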
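The from/to arithmetic in the parallel SOR code gives each thread a contiguous band of rows. The sketch below isolates that arithmetic so it can be checked on its own; the function names are hypothetical and not part of the slides.

```c
#include <assert.h>

/* Hypothetical helper: first row (inclusive) owned by `slice` out of `p` threads. */
int slice_from(int slice, int n, int p)
{
    assert(p > 0 && slice >= 0 && slice < p);
    return (slice * (n - 1)) / p + 1;
}

/* Hypothetical helper: one past the last row owned by `slice`. */
int slice_to(int slice, int n, int p)
{
    assert(p > 0 && slice >= 0 && slice < p);
    return ((slice + 1) * (n - 1)) / p + 1;
}

/* Returns 1 if the p slices cover rows 1..n-1 exactly once, with no gap or overlap. */
int slices_tile_interior(int n, int p)
{
    int next = 1;                       /* first row not yet assigned */
    for (int s = 0; s < p; s++) {
        if (slice_from(s, n, p) != next)
            return 0;
        next = slice_to(s, n, p);
    }
    return next == n;
}
```

For n = 10 and p = 4 the slices are rows 1-2, 3-4, 5-6 and 7-9, so the last thread absorbs the remainder when n - 1 is not a multiple of p.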
10. Sequential Consistency DSM
- As proposed by Li and Hudak, TOCS 86.
- Use virtual memory to implement sharing.
- Shared memory is divided up into virtual memory pages.
- Use an SMP-like coherence protocol.
- Keep each page in one of three states: invalid, read-only, read-write
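The three page states can be read as a small state machine driven by page faults. The following is a minimal sketch assuming a single-writer/multiple-reader invalidate protocol; the node count, type names and handler names are assumptions made for illustration, and data transfer and page ownership are left out.

```c
#include <assert.h>

#define NODES 3   /* assumed node count, for illustration only */

typedef enum { INVALID, READ_ONLY, READ_WRITE } page_state;

/* State of one shared page as seen on every node. */
typedef struct { page_state st[NODES]; } page_dir;

/* Read access on `node`: on a fault, any writer is downgraded to read-only
   and the faulting node receives a read-only copy. */
void on_read(page_dir *d, int node)
{
    assert(node >= 0 && node < NODES);
    if (d->st[node] != INVALID)
        return;                          /* page already readable: no fault */
    for (int i = 0; i < NODES; i++)
        if (d->st[i] == READ_WRITE)
            d->st[i] = READ_ONLY;
    d->st[node] = READ_ONLY;
}

/* Write access on `node`: on a fault, every other copy is invalidated
   and the faulting node becomes the only (read-write) holder. */
void on_write(page_dir *d, int node)
{
    assert(node >= 0 && node < NODES);
    if (d->st[node] == READ_WRITE)
        return;                          /* page already writable: no fault */
    for (int i = 0; i < NODES; i++)
        if (i != node)
            d->st[i] = INVALID;
    d->st[node] = READ_WRITE;
}
```

In a real system these handlers would be triggered by virtual-memory protection faults; here they are called directly.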
11. SC implementation
- Synchronous read/write
- Writes must be propagated before moving on to the next operation
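12. Read-Write False Sharing
[Figure: variables x and y located on the same page]
13. Read-Write False Sharing (Cont.)
[Figure: timeline with operations w(x), w(x), w(x), r(x) and r(y), r(y)]
14. Read-Write False Sharing (Cont.)
[Figure: timeline with operations w(x), w(x), w(x), r(x) and r(y), r(y), plus a synch point]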
15. Weak Consistency (WEAKC)
- Data modifications are only propagated at the time of synchronization.
- Works fine if the program is properly synchronized through system primitives.
- All programs should be.
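16. Read-Write False Sharing (Before)
[Figure: timeline with operations w(x), w(x), w(x), r(x) and r(y), r(y), plus a synch point]
17. Read-Write False Sharing (WEAKC)
[Figure: timeline with operations w(x), w(x), r(y), r(y), r(x), plus a synch point]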
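18. Write-Write False Sharing
[Figure: variables x and y located on the same page]
19. Write-Write False Sharing
[Figure: timeline with operations w(x), w(x), w(x), r(x) and w(y), w(y), plus a synch point]
20. Write-Write False Sharing (WEAKC)
[Figure: timeline with operations w(x), w(x), w(x), w(y), r(x), w(y), plus a synch point]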
21. Multiple Writer (MW) Protocols
- Allows multiple writers per page.
- Modifications are merged at synchronization (according to the WEAKC definition).
- Modifications are recorded through a mechanism called twinning and diffing.
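Twinning and diffing can be sketched in a few lines: a twin is a copy of the page taken before the first write, and a diff lists the bytes that differ from the twin. The code below is an illustrative sketch only; the page size, the byte-granularity diff format and all names are assumptions, not the encoding of any particular DSM system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 8   /* tiny page, for illustration only */

/* One modified byte: its offset in the page and its new value. */
typedef struct { size_t off; unsigned char val; } diff_entry;

/* Twinning: save a copy of the page before the first write to it. */
void make_twin(unsigned char *twin, const unsigned char *page)
{
    memcpy(twin, page, PAGE_SIZE);
}

/* Diffing: record every byte that differs from the twin.
   `out` must have room for PAGE_SIZE entries; returns the entry count. */
size_t make_diff(const unsigned char *page, const unsigned char *twin,
                 diff_entry *out)
{
    size_t n = 0;
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (page[i] != twin[i]) {
            out[n].off = i;
            out[n].val = page[i];
            n++;
        }
    }
    return n;
}

/* Merging: apply another writer's diff to the local copy of the page. */
void apply_diff(unsigned char *page, const diff_entry *d, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(d[i].off < PAGE_SIZE);
        page[d[i].off] = d[i].val;
    }
}
```

If one node writes x and another writes y on the same page, exchanging diffs at synchronization leaves both copies with both updates, because each diff touches only the bytes its writer changed.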
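22. Write-Write False Sharing and MW
[Figure: timeline with operations w(x), w(x), w(x), w(y), w(y), r(x), plus a synch point]
23. Creating a diff (delta)
[Figure: creating a diff (delta). Labels: twin, w(x) ... w(x), Diff (delta); page states shown: write-protected (twice) and writable]
24. Write-Write False Sharing and MW
[Figure: timeline with operations w(x), w(x), w(x), w(y), w(y), r(x) and a synch point; each writer keeps a twin, and the x and y modifications are exchanged]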
25. Release Consistency (RC)
- Distinguish acquires from releases
- Ordinary reads/writes wait until the previous acquire is performed
- A release waits until previous reads/writes are performed
- Acquires/releases are sequentially consistent w.r.t. one another
26. Eager and Lazy Release Consistency
- Eager release consistency: transfer consistency information at the release of a lock.
- Lazy release consistency: transfer consistency information at the acquire of a lock.
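The difference between the two policies can be illustrated with a toy message count. The sketch below rests on a simplified cost model that is an assumption, not something stated in the slides: an eager release sends one message to every other processor, while a lazy acquire costs one message to the acquirer only. All names are hypothetical.

```c
#include <assert.h>

#define NPROCS 4   /* p1..p4, stored as indices 0..3 */

typedef struct {
    int informed[NPROCS];  /* 1 once a processor has been told of the writes */
    int messages;          /* consistency messages sent so far */
} rc_sim;

/* Eager (assumed model): the releaser pushes its changes to every other processor. */
void eager_release(rc_sim *s, int releaser)
{
    assert(releaser >= 0 && releaser < NPROCS);
    for (int p = 0; p < NPROCS; p++) {
        if (p == releaser)
            continue;
        s->informed[p] = 1;
        s->messages++;
    }
}

/* Lazy (assumed model): only the next acquirer is told, together with the lock grant. */
void lazy_acquire(rc_sim *s, int acquirer)
{
    assert(acquirer >= 0 && acquirer < NPROCS);
    s->informed[acquirer] = 1;
    s->messages++;
}
```

For a lock passed from p1 to p2 to p3 to p4 (three releases, three acquires), this model counts 9 messages under the eager policy and 3 under the lazy one.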
27. Eager Release Consistency
[Figure: per-processor timelines]
p1: w(x) rel
p2: acq w(x) rel
p3: acq w(x) rel
p4: acq r(x)
28. Lazy Release Consistency
[Figure: per-processor timelines]
p1: w(x) rel
p2: acq w(x) rel
p3: acq w(x) rel
p4: acq r(x)
29. Lazy Release Consistency
- The acquiring processor determines which modifications it needs to see.
[Figure: per-processor timelines, with a synch label]
p1: w(x) rel
p2: acq w(y) rel
p3: acq r(x) r(y)
30. Vector Timestamps
[Figure: per-processor timelines annotated with vector timestamps]
p1: [0 0 0] w(x) rel [1 0 0]
p2: [0 0 0] acq w(y) rel [1 1 0]
p3: [0 0 0] acq r(x) r(y)
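The timestamps in this figure can be reproduced with two operations: a release advances the releaser's own entry, and an acquire merges the releaser's vector into the acquirer's by element-wise maximum. The sketch below assumes that a new interval starts only at a release, which is a simplification, and uses hypothetical names throughout.

```c
#include <assert.h>

#define NPROCS 3   /* p1..p3, stored as indices 0..2 */

typedef struct { int v[NPROCS]; } vtime;

/* Release on processor `self`: close its current interval by bumping its own entry. */
void vt_release(vtime *t, int self)
{
    assert(self >= 0 && self < NPROCS);
    t->v[self]++;
}

/* Acquire: count the intervals the acquirer has not seen yet, then merge by
   element-wise maximum. Returns the number of missing intervals, i.e. the
   modifications the acquirer must learn about. */
int vt_acquire(vtime *mine, const vtime *releaser)
{
    int missing = 0;
    for (int i = 0; i < NPROCS; i++) {
        if (releaser->v[i] > mine->v[i]) {
            missing += releaser->v[i] - mine->v[i];
            mine->v[i] = releaser->v[i];
        }
    }
    return missing;
}
```

Replaying the figure: p1 releases and holds [1 0 0]; p2 acquires from p1, releases, and holds [1 1 0]; p3 acquires from p2 and finds two unseen intervals, one from p1 (the write to x) and one from p2 (the write to y).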
31. DSM Summary
- Relaxed consistency
  - application's definition of correctness
- > 70% of the performance of the corresponding message-passing applications