Title: Chapter 6: Multiprocessors Part 2
Chapter 6: Multiprocessors Part 2
- Parallel programming
- Synchronization (Section 6.7)
- Memory consistency models (Section 6.8)
Parallel Programming Example
- Add two matrices: C = A + B
- Sequential Program
- main(argc, argv)
- int argc; char *argv[];
- {
-   Read(A);
-   Read(B);
-   for (i = 0; i != N; i++)
-     for (j = 0; j != N; j++)
-       C[i][j] = A[i][j] + B[i][j];
-   Print(C);
- }
Parallel Program Example (Cont.)
- main(argc, argv)
- int argc; char *argv[];
- {
-   Read(A);
-   Read(B);
-   for (p = 1; p < numberofprocessors; p++)
-     createprocess(p, startprocedure);
-   startprocedure();
-   waitforallprocessestobedone();
-   Print(C);
- }
- startprocedure()
- {
-   for (i = myrowsbegin; i != myrowsend; i++)
-     for (j = 0; j != N; j++)
-       C[i][j] = A[i][j] + B[i][j];
-   indicatedone();
- }
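The row-partitioned program can be sketched with C++ threads standing in for processes. This is only an illustration under assumptions: `parallel_add`, `add_rows`, and the even split of rows are hypothetical names and choices, not part of the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Each worker adds its own contiguous band of rows [begin, end).
static void add_rows(const Matrix& a, const Matrix& b, Matrix& c,
                     std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i != end; ++i)
        for (std::size_t j = 0; j != a[i].size(); ++j)
            c[i][j] = a[i][j] + b[i][j];
}

// Hypothetical helper: splits the rows of C = A + B across `workers` threads.
Matrix parallel_add(const Matrix& a, const Matrix& b, std::size_t workers) {
    assert(workers > 0 && a.size() == b.size());
    const std::size_t n = a.size();
    Matrix c(n);
    for (std::size_t i = 0; i != n; ++i) c[i].resize(a[i].size());

    std::vector<std::thread> pool;
    for (std::size_t p = 1; p < workers; ++p)            // create workers 1..workers-1
        pool.emplace_back(add_rows, std::cref(a), std::cref(b), std::ref(c),
                          p * n / workers, (p + 1) * n / workers);
    add_rows(a, b, c, 0, n / workers);                   // the main thread acts as worker 0
    for (auto& t : pool) t.join();                       // wait for all workers to be done
    return c;
}
```

No lock is needed here because every worker writes a disjoint set of rows of C; the only synchronization is the final join.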
The Parallel Programming Process
- Break up computation into tasks
- Break up data into chunks
- Necessary for message-passing machines
- Introduce synchronization for correctness
Synchronization
- Communication: Exchange data
- Synchronization: Exchange data to order events
- Mutual exclusion or atomicity
- Event ordering or producer/consumer
- Point to point: Flags
- Global: Barriers
Mutual Exclusion
- Example
- Each processor needs to occasionally update a counter
- Processor 1              Processor 2
- Load reg1, Counter       Load reg2, Counter
- reg1 = reg1 + tmp1       reg2 = reg2 + tmp2
- Store Counter, reg1      Store Counter, reg2
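To see why the unsynchronized update is a problem, the two instruction sequences can be replayed by hand in a fixed order. The sketch below is a single-threaded simulation, not real concurrency; `update_counter` and its `interleaved` switch are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

// Replays the load / add / store sequences of the two processors by hand.
// interleaved == true: both loads happen before either store.
// interleaved == false: Processor 1 finishes before Processor 2 starts.
int update_counter(int counter, int tmp1, int tmp2, bool interleaved) {
    const int expected = counter + tmp1 + tmp2;
    int reg1 = 0, reg2 = 0;
    if (interleaved) {
        reg1 = counter;        // P1: Load reg1, Counter
        reg2 = counter;        // P2: Load reg2, Counter (same old value)
        reg1 = reg1 + tmp1;    // P1: add
        reg2 = reg2 + tmp2;    // P2: add
        counter = reg1;        // P1: Store Counter, reg1
        counter = reg2;        // P2: Store Counter, reg2 (overwrites P1's update)
    } else {
        reg1 = counter; reg1 = reg1 + tmp1; counter = reg1;
        reg2 = counter; reg2 = reg2 + tmp2; counter = reg2;
        assert(counter == expected);   // serial order keeps both updates
    }
    return counter;
}
```

Starting from 10 with `tmp1 = 1` and `tmp2 = 2`, the serial order yields 13, while the interleaved order yields 12 because Processor 1's addition is lost.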
Mutual Exclusion Primitives
- Hardware instructions
- Test&Set
- Atomically tests for 0 and sets to 1
- Unset is simply a store of 0
- while (Test&Set(L) != 0);
- Critical Section
- Unset(L)
- Problem: Traffic (every Test&Set attempt is a write, so spinning processors keep generating memory traffic)
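A test-and-set spin lock can be sketched in C++ with `std::atomic_flag`, whose `test_and_set` atomically sets the flag and returns its previous value. The class name `TasLock` and the driver `count_with_lock` are assumptions made for this example; the slides only give the pseudocode loop.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class TasLock {
    std::atomic_flag held = ATOMIC_FLAG_INIT;   // clear means unlocked
public:
    // Spin until test_and_set reports that the flag was previously clear.
    void lock()   { while (held.test_and_set(std::memory_order_acquire)) { /* spin */ } }
    // Unset is a plain store of "clear".
    void unlock() { held.clear(std::memory_order_release); }
};

// Hypothetical driver: several threads update one shared counter under the lock.
long count_with_lock(int threads, int increments) {
    assert(threads > 0 && increments >= 0);
    TasLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < increments; ++i) {
                lock.lock();
                ++counter;          // critical section
                lock.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

With the lock in place no update is lost, so `count_with_lock(4, 10000)` returns 40000. Every failed `test_and_set` is still a write, which is the traffic problem the slide points out.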
Mutual Exclusion Primitives - Alternative?
- Test&Test&Set
- A: while (L != 0);
-    if (Test&Set(L) == 0) {
-      critical section
-    }
-    else go to loop A
- Problem
- Traffic on lock release
- What if processor swapped out while holding lock?
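The test-and-test-and-set idea can be sketched with `std::atomic<bool>`: waiters spin on an ordinary read and attempt the atomic exchange only when the lock looks free. `TtasLock` and `ttas_count` are hypothetical names for this illustration, and the memory orders chosen are one reasonable option rather than the only one.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class TtasLock {
    std::atomic<bool> held{false};
public:
    void lock() {
        for (;;) {
            // Test: read-only spin, which can be served from the local cache.
            while (held.load(std::memory_order_relaxed)) { /* spin */ }
            // Test&Set: try to grab the lock only once it looked free.
            if (!held.exchange(true, std::memory_order_acquire)) return;
            // Otherwise another processor won the race; go back to spinning.
        }
    }
    void unlock() { held.store(false, std::memory_order_release); }
};

// Hypothetical driver: the lock works with std::lock_guard because it
// provides lock() and unlock().
long ttas_count(int threads, int increments) {
    assert(threads > 0 && increments >= 0);
    TtasLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < increments; ++i) {
                std::lock_guard<TtasLock> guard(lock);
                ++counter;          // critical section
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

When the holder releases the lock, all waiters see it become free and attempt the exchange at roughly the same time, which is the burst of traffic on release noted above.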
Mutual Exclusion Primitives - Fetch&Add
- Fetch&Add(var, data)
- { /* atomic action */
-   temp = var;
-   var = temp + data;
- }
- return temp;
- E.g., let X = 57
- P1: a = Fetch&Add(X,3)
- P2: b = Fetch&Add(X,5)
- If P1 before P2, ?
- If P2 before P1, ?
- If P1, P2 concurrent ?
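The three questions can be checked with `std::atomic<int>::fetch_add`, which returns the old value and adds atomically. In this sketch `Outcome`, `run_fetch_add`, `is_legal`, and `all_runs_legal` are hypothetical helpers. Because each Fetch&Add is atomic, a concurrent execution must behave like one of the two serial orders: P1 first gives a = 57, b = 60; P2 first gives b = 57, a = 62; and X ends at 65 in both cases.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Outcome { int a; int b; int x; };

// Runs P1 and P2 concurrently on X = 57.
Outcome run_fetch_add() {
    std::atomic<int> X{57};
    int a = 0, b = 0;
    std::thread p1([&] { a = X.fetch_add(3); });   // P1: a = Fetch&Add(X, 3)
    std::thread p2([&] { b = X.fetch_add(5); });   // P2: b = Fetch&Add(X, 5)
    p1.join();
    p2.join();
    return Outcome{a, b, X.load()};
}

// An outcome is legal if it matches one of the two serial orders.
bool is_legal(const Outcome& o) {
    const bool p1_first = (o.a == 57 && o.b == 60);
    const bool p2_first = (o.b == 57 && o.a == 62);
    return o.x == 65 && (p1_first || p2_first);
}

bool all_runs_legal(int runs) {
    assert(runs > 0);
    for (int r = 0; r < runs; ++r)
        if (!is_legal(run_fetch_add())) return false;
    return true;
}
```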
Point to Point Event Ordering
- Example
- Producer wants to indicate to consumer that data is ready
- Processor 1        Processor 2
- A[1] = ...         ... = A[1]
- A[2] = ...         ... = A[2]
- .                  .
- .                  .
- A[n] = ...         ... = A[n]
Point to Point Event Ordering - Flags
- Example
- Producer wants to indicate to consumer that data is ready
- Processor 1        Processor 2
-                    while (Flag != 1);
- A[1] = ...         ... = A[1]
- A[2] = ...         ... = A[2]
- .                  .
- .                  .
- A[n] = ...         ... = A[n]
- Flag = 1
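The flag handshake can be sketched with a `std::atomic<int>` flag, using the default sequentially consistent operations so that the consumer is guaranteed to see the producer's writes once it sees the flag. The function name `produce_then_consume` and the choice of data (A[i] = i + 1) are assumptions for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Producer fills A[0..n-1] and raises Flag; consumer waits on Flag and sums A.
int produce_then_consume(int n) {
    assert(n > 0);
    std::vector<int> A(n, 0);
    std::atomic<int> Flag{0};
    int sum = 0;

    std::thread consumer([&] {
        while (Flag.load() != 1) { /* spin until data is ready */ }
        for (int i = 0; i < n; ++i) sum += A[i];     // ... = A[i]
    });
    std::thread producer([&] {
        for (int i = 0; i < n; ++i) A[i] = i + 1;    // A[i] = ...
        Flag.store(1);                               // Flag = 1
    });

    producer.join();
    consumer.join();
    return sum;
}
```

For n = 100 the consumer always reads the complete data and the function returns 5050. If `Flag` were an ordinary `int`, the program would contain a data race and this guarantee would be lost.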
Global Event Ordering - Barriers
- Example
- All processors produce some data
- Want to tell all processors that it is ready
- In next phase, all processors consume data produced previously
- Use barriers
Implementing Barriers
- Simple barrier
- temp = Fetch&Inc(count);
- while (count != N);
- Problem: Cannot use it again (count is never reset)
Implementing Barriers
- local_flag = !local_flag;
- if (Fetch&Inc(count) == N) {
-   count = 1;
-   flag = local_flag;
- }
- while (flag != local_flag);
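The reusable barrier can be sketched in C++ as follows. `count` starts at 1 so that the last of N arrivals sees the old value N, matching the pseudocode. `SenseBarrier`, `barrier_demo`, and the `yield` call inside the spin loop are additions made for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SenseBarrier {
    const int n;
    std::atomic<int> count{1};       // starts at 1, as in the pseudocode
    std::atomic<bool> flag{false};   // shared flag, flipped once per barrier episode
public:
    explicit SenseBarrier(int participants) : n(participants) {}

    // Each thread keeps its own local_flag, initially false.
    void wait(bool& local_flag) {
        local_flag = !local_flag;
        if (count.fetch_add(1) == n) {   // last arrival: Fetch&Inc returned N
            count.store(1);              // reset for the next use
            flag.store(local_flag);      // release everyone
        }
        while (flag.load() != local_flag) std::this_thread::yield();
    }
};

// Hypothetical check: in each round every thread writes its slot, waits, and
// then verifies that it sees all slots written in that round.
bool barrier_demo(int threads, int rounds) {
    assert(threads > 0 && rounds >= 0);
    SenseBarrier barrier(threads);
    std::vector<int> data(threads, -1);
    std::atomic<bool> ok{true};
    std::vector<std::thread> pool;
    for (int id = 0; id < threads; ++id)
        pool.emplace_back([&, id] {
            bool local_flag = false;
            for (int r = 0; r < rounds; ++r) {
                data[id] = r;                       // produce
                barrier.wait(local_flag);
                for (int v : data) if (v != r) ok = false;   // consume
                barrier.wait(local_flag);           // nobody overwrites too early
            }
        });
    for (auto& t : pool) t.join();
    return ok;
}
```

Because the flag alternates between episodes, a thread that races ahead into the next barrier cannot be released by the previous episode's flag value, which is what makes the barrier reusable.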
Memory Consistency Model - Motivation
- Example shared-memory program
- Initially all locations = 0
- Processor 1        Processor 2
- Data = 23          while (Flag != 1);
- Flag = 1           ... = Data
- Execution (only shared-memory operations)
- Processor 1        Processor 2
- Write, Data, 23
- Write, Flag, 1
-                    Read, Flag, 1
-                    Read, Data, ___
Memory Consistency Model - Definition
- Memory consistency model
- Order in which memory operations will appear to execute
- => What value can a read return?
- Affects ease-of-programming and performance
The Uniprocessor Model
- Program text defines total order = program order
- Uniprocessor model
- Memory operations appear to execute one-at-a-time in program order
- => Read returns value of last write
- BUT uniprocessor hardware
- Overlap, reorder operations
- Model maintained as long as
- maintain control and data dependences
- => Easy to use + high performance
Implicit Memory Model
- Sequential consistency (SC) [Lamport]
- Result of an execution appears as if
- All operations executed in some sequential order (i.e., atomically)
- Memory operations of each process in program order
Understanding Program Order - Example 1
- Initially Flag1 = Flag2 = 0
- P1                             P2
- Flag1 = 1                      Flag2 = 1
- if (Flag2 == 0)                if (Flag1 == 0)
-   critical section               critical section
- Execution
- P1                             P2
- (Operation, Location, Value)   (Operation, Location, Value)
- Write, Flag1, 1                Write, Flag2, 1
- Read, Flag2, 0                 Read, Flag1, ___
Understanding Program Order - Example 1
- P1                   P2
- Write, Flag1, 1      Write, Flag2, 1
- Read, Flag2, 0       Read, Flag1, 0
- Can happen if
- Write buffers with read bypassing
- Overlap, reorder write followed by read in h/w or compiler
- Allocate Flag1 or Flag2 in registers
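This example can be written with C++ atomics, whose default `memory_order_seq_cst` operations are required to appear in a single total order consistent with program order. Under that assumption the outcome in which both processors read 0 is forbidden. `both_read_zero` is a hypothetical test harness; it counts forbidden outcomes over many runs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns how many of `runs` executions ended with BOTH reads returning 0.
int both_read_zero(int runs) {
    assert(runs > 0);
    int violations = 0;
    for (int r = 0; r < runs; ++r) {
        std::atomic<int> Flag1{0}, Flag2{0};
        int r1 = -1, r2 = -1;
        std::thread p1([&] { Flag1.store(1); r1 = Flag2.load(); });  // write, then read
        std::thread p2([&] { Flag2.store(1); r2 = Flag1.load(); });
        p1.join();
        p2.join();
        if (r1 == 0 && r2 == 0) ++violations;   // both would enter the critical section
    }
    return violations;
}
```

With sequentially consistent atomics the function returns 0. Replacing the operations with `memory_order_relaxed`, or using plain variables, permits the write-to-read reordering listed above, so both reads may then return 0 on real hardware.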
Understanding Program Order - Example 2
- Initially A = Flag = 0
- P1                 P2
- A = 23             while (Flag != 1);
- Flag = 1           ... = A
- P1                 P2
- Write, A, 23       Read, Flag, 0
- Write, Flag, 1
-                    Read, Flag, 1
-                    Read, A, 0
- Can happen if
- Overlap or reorder writes or reads in hardware or compiler
Understanding Program Order - Summary
- SC limits program order relaxation
- Write -> Read
- Write -> Write
- Read -> Read, Write
Understanding Atomicity
[Figure: processors P1, P2, ..., Pn with caches, connected by a bus to memory; the cached copies of A and the memory copy of A are marked OLD]
- A mechanism needed to propagate a write to other copies
- => Cache coherence protocol
Cache Coherence Protocols
- How to propagate write?
- Invalidate -- Remove old copies from other caches
- Update -- Update old copies in other caches to new values
Understanding Atomicity - Example 1
- Initially A = B = C = 0
- P1        P2        P3                  P4
- A = 1     A = 2     while (B != 1);     while (B != 1);
- B = 1     C = 1     while (C != 1);     while (C != 1);
-                     tmp1 = A  (1)       tmp2 = A  (2)
- Can happen if updates of A reach P3 and P4 in different order
- Coherence protocol must serialize writes to same location
- (Writes to same location should be seen in same order by all)
Understanding Atomicity - Example 2
- Initially A = B = 0
- P1          P2                  P3
- A = 1       while (A != 1);     while (B != 1);
-             B = 1               tmp = A
- P1             P2              P3
- Write, A, 1
-                Read, A, 1
-                Write, B, 1
-                                Read, B, 1
-                                Read, A, 0
- Can happen if read returns new value before all copies see it
SC Summary
- SC limits
- Program order relaxation
- Write -> Read
- Write -> Write
- Read -> Read, Write
- When a processor can read the value of a write
- Unserialized writes to the same location
- Alternatives
- (1) Aggressive hardware techniques proposed to get SC w/o penalty using speculation and prefetching
- But compilers still limited by SC
- (2) Give up sequential consistency
- Use relaxed models
Relaxed Memory Models
- Motivation
- Ordering important only at synchronization
- Can reorder data operations between synchronization operations
- Distinguish synchronization from data
- Initially all locations = 0
- Processor 1        Processor 2
- Data1 = 23         while (Flag != 1);
- Data2 = 45         ... = Data1
- Flag = 1           ... = Data2
- => Weak ordering, release consistency
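C++ exposes the same distinction between data and synchronization: ordinary variables carry the data, and only the flag is atomic, written with `memory_order_release` and read with `memory_order_acquire`. The sketch below is an analogy to release consistency under the C++ memory model, not a description of any particular hardware; `Seen` and `release_acquire_demo` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Seen { int d1; int d2; };

Seen release_acquire_demo() {
    int Data1 = 0, Data2 = 0;        // ordinary data, free to be reordered
    std::atomic<int> Flag{0};        // the only synchronization variable
    Seen seen{-1, -1};

    std::thread p2([&] {
        // Acquire: the data reads below cannot move above this load.
        while (Flag.load(std::memory_order_acquire) != 1) { /* spin */ }
        seen.d1 = Data1;
        seen.d2 = Data2;
    });
    std::thread p1([&] {
        Data1 = 23;                  // these two writes may be reordered
        Data2 = 45;                  // with respect to each other
        // Release: both data writes are visible before Flag becomes 1.
        Flag.store(1, std::memory_order_release);
    });

    p1.join();
    p2.join();
    assert(Flag.load() == 1);
    return seen;
}
```

Processor 2 always observes 23 and 45, even though nothing orders the two data writes relative to each other; ordering is enforced only at the flag.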