Title: Outline for Today
1Outline for Today
- Objectives
- To introduce the critical section problem.
- To learn how to reason about the correctness of concurrent programs.
- To present Linux kernel synchronization.
- Administrative details
2Reasons for Explicitly Programming with Threads (User-level Perspective: Birrell)
- To capture naturally concurrent activities
- Waiting for slow devices
- Providing human users faster response
- Shared network servers multiplexing among client requests (each client served by its own server thread)
- To gain speedup by exploiting parallelism in hardware
- Maintenance tasks performed in the background
- Multiprocessors
- Overlap the asynchronous and independent functioning of devices and users
- Within a single user thread, signal handlers cause asynchronous control flow.
3Concurrency from the Kernel Perspective
- Kernel preemption: the scheduler can preempt a task executing in the kernel.
- Interrupts: they occur asynchronously, invoking a handler that disrupts the execution flow.
- Sleeping to wait for events.
- Support for SMP multiprocessors: true concurrency of code executing on shared memory locations.
4The Trouble with Concurrency in Threads...
What is the value of x when both threads leave this while loop?
5Range of Answers
- Process 0
- LD x // x currently 0
- Add 1
- ST x // x now 1, stored over 9
- Do 9 more full loops // leaving x at 10
- Process 1
- LD x // x currently 0
- Add 1
- ST x // x now 1
- Do 8 more full loops // x = 9
- LD x // x now 1
- Add 1
- ST x // x = 2, stored over 10
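The two columns above interleave in time, which is easy to lose track of on paper. The following is a minimal sketch that replays that one interleaving in a single thread, assuming each process runs the loop body 10 times. The function name and the register variables `r0` and `r1` are illustrative and not part of the slides.

```c
#include <assert.h>

/* Replays the interleaving from the slide in one thread.
 * x is the shared variable; r0 and r1 stand in for the private
 * registers of Process 0 and Process 1. */
int replay_lowest_outcome(void)
{
    int x = 0;
    int r0, r1, k;

    r0 = x;                  /* P0: LD x  (reads 0)             */
    r0 = r0 + 1;             /* P0: Add 1 (register holds 1)    */

    for (k = 0; k < 9; k++)  /* P1: 9 full loops, x goes 0 -> 9 */
        x = x + 1;

    x = r0;                  /* P0: ST x  (1 stored over 9)     */

    r1 = x;                  /* P1: LD x  (reads 1)             */
    r1 = r1 + 1;             /* P1: Add 1 (register holds 2)    */

    for (k = 0; k < 9; k++)  /* P0: 9 more full loops, x -> 10  */
        x = x + 1;
    assert(x == 10);

    x = r1;                  /* P1: ST x  (2 stored over 10)    */
    return x;
}
```

Under these assumptions the function returns 2, the low end of the range; the high end, 20, is what results when no load/store pair is split by the other process.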
6Reasoning about Concurrency
- What unit of work can be performed without interruption? Indivisible or atomic operations.
- Interleavings: possible execution sequences of operations drawn from all threads.
- Race condition: final results depend on ordering and may not be correct.
7The Trouble with Concurrency
- Two threads (T1, T2) in one address space, or two processes in the kernel
- One counter (shared)
Each thread executes the same sequence (each instruction assumed atomic):
    ld r2, count
    add r1, r2, r3
    st count, r1
Registers are private; count is shared data.
One interleaving over time:
    T1: ld (count)
    T1: add
        switch
    T2: ld (count)
    T2: add
    T2: st (count+1)     // shared count is now count+1
        switch
    T1: st (count+1)     // shared count is still count+1: one increment is lost
8Desired Atomic Sequence of Instructions
wait
- Atomic Sequence
- Appears to execute to completion without any
intervening operations
9Unprotected Shared Data
    void threadcode()
    {
        int i;
        long key;
        for (i = 0; i < 20; i++) {
            key = rand();
            SortedInsert(key);
        }
        for (i = 0; i < 20; i++) {
            key = SortedRemove();
            print(key);
        }
    }
i and key are private; the list is shared.
[Figure: linked list head -> 10 -> 20 -> 30 -> null]
What can happen here?
10Unprotected Shared Data
- 2 concurrent SortedInserts with keys 5 and 7.
[Figure: linked list head -> 10 -> 20 -> 30 -> null, with new nodes 5 and 7 being inserted]
What can happen here?
11Unprotected Shared Data
- 2 concurrent SortedInserts with keys 5 and 7.
- 2 concurrent SortedRemoves
[Figure: linked list head -> 10 -> 20 -> 30 -> null]
What can happen here?
12Critical Sections
- If a sequence of non-atomic operations must be executed as if it were atomic in order to be correct, then we need to provide a way to constrain the possible interleavings.
- Critical sections are defined as code sequences that contribute to "bad" race conditions.
- Synchronization is needed around such critical sections.
- Mutual Exclusion: the goal is to ensure that critical sections execute atomically w.r.t. related critical sections in other threads or processes.
13The Critical Section Problem
- Each process follows this template:
    while (1) {
        ...other stuff...     // processes in here shouldn't stop others
        enter_region();
        critical section
        exit_region();
    }
- The problem is to implement enter_region and exit_region to ensure mutual exclusion with some degree of fairness.
Problem with this definition: it focuses on code, not on the shared data that needs protecting!
14Temptation to Protect Critical Sections (Badly)
    void threadcode()
    {
        int i;
        long key;
        for (i = 0; i < 20; i++) {
            key = rand();
            Acquire(insertmutex);
            SortedInsert(key);
            Release(insertmutex);
        }
        for (i = 0; i < 20; i++) {
            Acquire(removemutex);
            key = SortedRemove();
            Release(removemutex);
            print(key);
        }
    }
[Figure: linked list head -> 10 -> 20 -> 30 -> null]
Focus on the data!
15Temptation to Protect Critical Sections (Badly)
    void threadcode()
    {
        int i;
        long key;
        for (i = 0; i < 20; i++) {
            key = rand();
            Acquire(listmutex);
            SortedInsert(key);
            Release(listmutex);
        }
        for (i = 0; i < 20; i++) {
            Acquire(listmutex);
            key = SortedRemove();
            Release(listmutex);
            print(key);
        }
    }
[Figure: linked list head -> 10 -> 20 -> 30 -> null]
Focus on the data!
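To make "focus on the data" concrete, here is a user-space sketch in which a single mutex guards the list head for both insertion and removal. It assumes POSIX threads rather than the slides' Acquire/Release, puts the locking inside the list operations, and uses fixed distinct keys instead of rand() so the outcome can be checked; the names `inserter` and `run_list_demo` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct node { long key; struct node *next; };

static struct node *head = NULL;                               /* shared data */
static pthread_mutex_t listmutex = PTHREAD_MUTEX_INITIALIZER;  /* guards head */

/* Insert key in ascending order; the whole traversal is one critical section. */
static void SortedInsert(long key)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->key = key;
    pthread_mutex_lock(&listmutex);
    struct node **pp = &head;
    while (*pp != NULL && (*pp)->key < key)
        pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
    pthread_mutex_unlock(&listmutex);
}

/* Remove and return the smallest key, or -1 if the list is empty. */
static long SortedRemove(void)
{
    long key = -1;
    pthread_mutex_lock(&listmutex);
    struct node *n = head;
    if (n != NULL) {
        head = n->next;
        key = n->key;
    }
    pthread_mutex_unlock(&listmutex);
    free(n);
    return key;
}

static long bases[2] = { 0, 1 };   /* thread 0 inserts even keys, thread 1 odd keys */

static void *inserter(void *arg)
{
    long base = *(long *)arg;
    for (int i = 0; i < 20; i++)
        SortedInsert(base + 2 * i);
    return NULL;
}

/* Runs two concurrent inserters, then drains the list.
 * Returns 1 if all 40 keys come out in ascending order. */
int run_list_demo(void)
{
    pthread_t t[2];
    for (int k = 0; k < 2; k++)
        pthread_create(&t[k], NULL, inserter, &bases[k]);
    for (int k = 0; k < 2; k++)
        pthread_join(t[k], NULL);

    long prev = -1;
    for (int i = 0; i < 40; i++) {
        long key = SortedRemove();
        if (key <= prev)
            return 0;
        prev = key;
    }
    return SortedRemove() == -1;
}
```

With separate insert and remove mutexes, an inserter and a remover could both manipulate `head` at once; using the same `listmutex` in both operations rules that out.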
16Yet Another Example
- Problem: Given arrays C[0:x, 0:y], A[0:x, 0:y], and B[0:x, 0:y], use n threads to update each element of C to the sum of A and B; the last thread then returns the average value of all C elements.
17Design Alternatives
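- Static partitioning of arrays, shared sum updated in the inner loop
    for (i = lowi; i <= highi; i++)
        for (j = lowj; j <= highj; j++) {
            C[i,j] = A[i,j] + B[i,j];
            sum = sum + C[i,j];
        }
Bounds for the four threads (one quadrant each):
    lowi = 0,   highi = n/2-1, lowj = 0,   highj = n/2-1
    lowi = n/2, highi = n-1,   lowj = 0,   highj = n/2-1
    lowi = n/2, highi = n-1,   lowj = n/2, highj = n-1
    lowi = 0,   highi = n/2-1, lowj = n/2, highj = n-1
- Static partitioning of arrays, private partial sums
    for (i = lowi; i <= highi; i++)
        for (j = lowj; j <= highj; j++) {
            C[i,j] = A[i,j] + B[i,j];
            privatesum = privatesum + C[i,j];
        }
    sum = sum + privatesum;
[Figure: array C and the shared variable sum]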
18Design Alternatives
- Dynamic partitioning of arrays
    while (elements_remain(i,j)) {
        C[i,j] = A[i,j] + B[i,j];
        sum = sum + C[i,j];
    }
[Figure: array C and the shared variable sum]
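As an illustration of the static partitioning alternative with private partial sums, the sketch below gives each of four threads one quadrant and takes a lock only once per thread, to fold its partial sum into the shared total. The array size `N`, the fill pattern, and the names `worker` and `run_sum_demo` are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>

#define N 8          /* arrays are N x N (illustrative size) */
#define NTHREADS 4   /* one quadrant per thread              */

static int A[N][N], B[N][N], C[N][N];
static long sum;                                               /* shared */
static pthread_mutex_t summutex = PTHREAD_MUTEX_INITIALIZER;

struct bounds { int lowi, highi, lowj, highj; };               /* inclusive */

static void *worker(void *arg)
{
    const struct bounds *b = arg;
    long privatesum = 0;                                       /* private */
    for (int i = b->lowi; i <= b->highi; i++)
        for (int j = b->lowj; j <= b->highj; j++) {
            C[i][j] = A[i][j] + B[i][j];
            privatesum += C[i][j];
        }
    pthread_mutex_lock(&summutex);     /* one short critical section per thread */
    sum += privatesum;
    pthread_mutex_unlock(&summutex);
    return NULL;
}

/* Fills A and B, runs the four workers, and returns the total of all C elements. */
long run_sum_demo(void)
{
    struct bounds q[NTHREADS] = {
        { 0,     N / 2 - 1, 0,     N / 2 - 1 },
        { N / 2, N - 1,     0,     N / 2 - 1 },
        { N / 2, N - 1,     N / 2, N - 1     },
        { 0,     N / 2 - 1, N / 2, N - 1     },
    };
    pthread_t t[NTHREADS];

    sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = i;
            B[i][j] = j;
        }
    for (int k = 0; k < NTHREADS; k++)
        pthread_create(&t[k], NULL, worker, &q[k]);
    for (int k = 0; k < NTHREADS; k++)
        pthread_join(t[k], NULL);
    return sum;
}
```

The elements of C need no lock because the quadrants are disjoint; only `sum` is shared. Dividing the returned total by N*N gives the average the problem asks for.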
19Implementation Options for Mutual Exclusion
- Disable Interrupts
- Use atomic operations (read-mod-write instr.)
- Busywaiting solutions - spinlocks
- execute a tight loop if critical section is busy
- benefits from specialized atomic instructions
- Blocking synchronization
- sleep (enqueued on wait queue) while C.S. is busy
- Synchronization primitives (abstractions, such as
locks) which are provided by a system may be
implemented with some combination of these
techniques.
20The Critical Section Problem
    while (1) {
        ...other stuff...
        enter_region();
        critical section: anything that touches a particular set of shared data
        exit_region();
    }
21Critical Data
- The goal in solving the critical section problem is to build synchronization so that the sequence of instructions that can cause a race condition is executed AS IF it were indivisible.
- "Other stuff" (code that does not touch the critical data associated with a critical section) can be interleaved with the critical section code.
- Code from a critical section involving data x can be interleaved with code from a critical section associated with data y.
22The Critical Section Problem
    while (1) {
        ...other stuff...
        local_irq_save(flags);
        critical section: anything that touches a particular set of shared data
        local_irq_restore(flags);
    }
Overkill on UP; insufficient for SMP
23Disabling Preemption
    while (1) {
        ...other stuff...
        preempt_disable();
        critical section: per-processor data
        preempt_enable();
    }
Milder impact on UP
24Atomic Operations (Integer)
- Special data type: atomic_t
- Prevents misuse and compiler optimizations
- Only 24-bit values (it's SPARC's fault)
- atomic_t u = ATOMIC_INIT(0);
- Selected operations (see p. 119)
- atomic_read
- atomic_set
- atomic_add
- atomic_inc
- atomic_sub_and_test
- atomic_add_negative
- atomic_inc_and_test
25Atomic Operations (Bitwise)
- No special data type: take a pointer and a bit number as arguments. Bit 0 is the least significant bit.
- Selected operations
- test_and_set_bit
- test_and_clear_bit
- test_and_change_bit
- set_bit
- clear_bit
- change_bit
- test_bit
26Uses of Atomic Operations
Unprotected version:
    static int x = 0;
    threadcode()
    {
        int j = 0;
        while (j < 10) {        // 10 times per thread
            x = x + 1;
            j++;
        }
    }
Atomic version:
    atomic_t x = ATOMIC_INIT(0);
    threadcode()
    {
        int j = 0;
        while (j < 10) {        // 10 times per thread
            atomic_inc(&x);
            j++;
        }
    }
27Uses of Atomic Operations
Unprotected version:
    static int x = 0;
    static int j = 11;
    threadcode()
    {
        while ((--j) != 0)      // 10 times in all
            x = x + 1;
    }
Atomic version:
    atomic_t x = ATOMIC_INIT(0);
    atomic_t j = ATOMIC_INIT(11);
    threadcode()
    {
        while (!atomic_dec_and_test(&j))    // 10 times in all
            atomic_inc(&x);
    }
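The kernel's atomic_t API cannot be called from an ordinary program, so as a rough user-space analogue the sketch below uses C11 `<stdatomic.h>` with POSIX threads. `atomic_fetch_add` stands in for atomic_inc here; that mapping, and the function name `run_atomic_demo`, are assumptions of this example rather than something the slides state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x;   /* shared counter, playing the role of atomic_t x */

static void *threadcode(void *arg)
{
    (void)arg;
    for (int j = 0; j < 10; j++)     /* 10 times per thread */
        atomic_fetch_add(&x, 1);     /* indivisible read-modify-write */
    return NULL;
}

/* Runs two threads and returns the final counter value. */
int run_atomic_demo(void)
{
    pthread_t t1, t2;
    atomic_store(&x, 0);
    pthread_create(&t1, NULL, threadcode, NULL);
    pthread_create(&t2, NULL, threadcode, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&x);
}
```

Because each increment is a single atomic operation, the result is 20 for every interleaving, in contrast with the unprotected `x = x + 1` version.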
28Uses of Atomic Operations
    while (1) {
        ...other stuff...
        // homegrown spinlock
        while (test_and_set_bit(0, &busy))
            ;
        critical section: anything that touches a particular set of shared data
        clear_bit(0, &busy);
    }
29Linux Kernel Spinlocks
    spinlock_t busy = SPIN_LOCK_UNLOCKED;
    while (1) {
        ...other stuff...
        // canned spinlock
        spin_lock(&busy);
        critical section: anything that touches a particular set of shared data
        spin_unlock(&busy);
    }
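The homegrown test-and-set spinlock can be approximated in user space with a C11 `atomic_flag`, as sketched below. This is not the kernel's spinlock_t; the helper names `spin_acquire`, `spin_release`, and `run_spin_demo` are hypothetical, and the `sched_yield()` in the wait loop is an addition so the example behaves reasonably on a single CPU.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_flag busy = ATOMIC_FLAG_INIT;   /* clear means unlocked */
static long counter;                          /* protected by busy    */

static void spin_acquire(void)
{
    /* test-and-set returns the old value: keep trying while it was already set */
    while (atomic_flag_test_and_set_explicit(&busy, memory_order_acquire))
        sched_yield();
}

static void spin_release(void)
{
    atomic_flag_clear_explicit(&busy, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        spin_acquire();
        counter++;               /* critical section */
        spin_release();
    }
    return NULL;
}

/* Runs two workers and returns the final counter value. */
long run_spin_demo(void)
{
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

The acquire and release orderings ensure that updates made inside the critical section are visible to the next thread that takes the lock.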
30Pros and Cons of Busywaiting
- Key characteristic: the "waiting" process is actively executing instructions in the CPU and using memory cycles.
- Appropriate when:
- High likelihood of finding the critical section unoccupied (don't take a context switch just to find that out) or estimated wait time is very short
- You have a processor all to yourself
- In interrupt context
- Disadvantages:
- Wastes resources (CPU, memory, bus bandwidth)
31Spinlock Subtleties
- Using a spinlock in interrupt handlers: disable local interrupts before obtaining the lock
- Saves (and restores) IRQ-enable state; disables interrupts while holding the lock
- spin_lock_irqsave(lockvar, flags)
- spin_unlock_irqrestore(lockvar, flags)
- spin_lock_irq(lockvar)
- spin_unlock_irq(lockvar)
- Disabling bottom halves
- spin_lock_bh() and spin_unlock_bh()
32Pros and Cons of Blocking
- Sleeping processes/threads don't consume CPU cycles
- Appropriate when the cost of a system call is justified by expected waiting time
- High likelihood of contention for lock
- Long critical sections
- Disadvantage: context switch overhead
33Semaphores
- Well-known synchronization abstraction
- Defined as a non-negative integer with two atomic operations:
- P(s): wait until s > 0, then s-- (or down(s))
- V(s): s++ (or up(s))
36Semaphore Usage
- Binary semaphores can provide mutual exclusion: mutex (solution to critical section problem)
- Counting semaphores can represent a resource with multiple instances (e.g. solving producer/consumer problem)
- Signaling events (persistent events that stay relevant even if nobody is listening right now)
37The Critical Section Problem
    static DECLARE_SEMAPHORE_GENERIC(mutex, 1);
    or: static DECLARE_MUTEX(mutex);
    while (1) {
        ...other stuff...
        down_interruptible(&mutex);
        critical section
        up(&mutex);
    }
38Lock Granularity: how much should one lock protect?
[Figure: linked list with head and tail pointers and nodes 2, 4, 6, 8, 10; the labels A, B, and 3 mark elements of the diagram]
Concurrency vs. overhead. Complexity threatens correctness.
40Optimistic Locking Seqlocks
- Sequence counter incremented on write
- Compare counter before and after a read
- Even counter value means data is stable
- Odd counter value means write in progress
Writes:
    write_seqlock(&lock);       // do write, lock is odd
    write_sequnlock(&lock);     // write complete, lock is even
Reads:
    do {
        old = read_seqbegin(&lock);
        // reading data
    } while (read_seqretry(&lock, old));
41Peterson's Algorithm for 2 Process Mutual Exclusion
- enter_region:
    needin[me] = true;
    turn = you;
    while (needin[you] && turn == you) no_op;
- exit_region:
    needin[me] = false;
Based on the assumption of atomic ld/st operations
42Interleaving of Execution of 2 Threads (blue and green)
Each of the two threads runs:
- enter_region:
    needin[me] = true;
    turn = you;
    while (needin[you] && turn == you) no_op;
- Critical Section
- exit_region:
    needin[me] = false;
43Interleaving of Execution of 2 Threads (trace)
    blue:  needin[blue] = true
    green: needin[green] = true
    blue:  turn = green
    green: turn = blue
    blue:  while (needin[green] && turn == green)          // false, turn is blue
    blue:  Critical Section
    green: while (needin[blue] && turn == blue) no_op
    green: while (needin[blue] && turn == blue) no_op
    blue:  needin[blue] = false
    green: while (needin[blue] && turn == blue)            // false, blue no longer needs in
    green: Critical Section
    green: needin[green] = false
44Peterson's Algorithm for 2 Process Mutual Exclusion
- enter_region:
    needin[me] = true;
    turn = you;
    mb();
    while (needin[you] && turn == you) no_op;
- exit_region:
    needin[me] = false;
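A user-space sketch of Peterson's algorithm follows, assuming C11 sequentially consistent atomics take the place of atomic ld/st plus the memory barrier. The thread ids 0 and 1, the `sched_yield()` standing in for no_op, and the names `worker` and `run_peterson_demo` are choices made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool needin[2];
static atomic_int turn;
static long shared;          /* protected by the algorithm */
static int ids[2] = { 0, 1 };

static void enter_region(int me)
{
    int you = 1 - me;
    atomic_store(&needin[me], true);
    atomic_store(&turn, you);
    /* The default seq_cst ordering keeps the two stores above ordered
     * before the loads below, which is the job mb() does on the slide. */
    while (atomic_load(&needin[you]) && atomic_load(&turn) == you)
        sched_yield();       /* no_op */
}

static void exit_region(int me)
{
    atomic_store(&needin[me], false);
}

static void *worker(void *arg)
{
    int me = *(int *)arg;
    for (int i = 0; i < 10000; i++) {
        enter_region(me);
        shared++;            /* critical section */
        exit_region(me);
    }
    return NULL;
}

/* Runs the two threads and returns the final value of shared. */
long run_peterson_demo(void)
{
    pthread_t t[2];
    shared = 0;
    atomic_store(&needin[0], false);
    atomic_store(&needin[1], false);
    atomic_store(&turn, 0);
    for (int k = 0; k < 2; k++)
        pthread_create(&t[k], NULL, worker, &ids[k]);
    for (int k = 0; k < 2; k++)
        pthread_join(t[k], NULL);
    return shared;
}
```

If `needin` and `turn` were plain variables, the compiler or the processor would be free to reorder the store to `turn` after the load of `needin[you]`, and both threads could enter together.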
45Barriers
- rmb: prevents loads being reordered across the barrier
- wmb: prevents reordering stores
- mb: both loads and stores
- read_barrier_depends: data-dependent loads
- SMP versions of the above compile to barrier on UP
- barrier: prevents compiler optimizations from causing the reordering
46Classic Synchronization Problems
- There are a number of "classic" problems that represent a class of synchronization situations:
- Critical Section problem
- Producer/Consumer problem
- Reader/Writer problem
- 5 Dining Philosophers
- Why? Once you know the "generic" solutions, you can recognize other special cases in which to apply them (e.g., this is just a version of the reader/writer problem)
47Producer / Consumer
- Producer
    while (whatever) {
        locally generate item
        fill empty buffer with item
    }
- Consumer
    while (whatever) {
        get item from full buffer
        use item
    }
48Producer / Consumer(with Counting Semaphores)
- Producer
    while (whatever) {
        locally generate item
        P(emptybuf);
        fill empty buffer with item
        V(fullbuf);
    }
- Consumer
    while (whatever) {
        P(fullbuf);
        get item from full buffer
        V(emptybuf);
        use item
    }
Semaphores: emptybuf initially N, fullbuf initially 0 (not Linux syntax)
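The counting-semaphore solution maps fairly directly onto POSIX unnamed semaphores, as in the sketch below. It assumes a platform where `sem_init` is supported (Linux, for instance), one producer and one consumer (so the `in` and `out` indices need no extra lock), and illustrative sizes `NBUF` and `NITEMS`.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define NBUF 4        /* N buffers (illustrative)      */
#define NITEMS 100    /* items to transfer in the demo */

static int buffer[NBUF];
static sem_t emptybuf, fullbuf;
static long consumed_sum;

static void *producer(void *arg)
{
    int in = 0;
    (void)arg;
    for (int item = 1; item <= NITEMS; item++) {   /* locally generate item */
        sem_wait(&emptybuf);                       /* P(emptybuf) */
        buffer[in] = item;                         /* fill empty buffer */
        in = (in + 1) % NBUF;
        sem_post(&fullbuf);                        /* V(fullbuf) */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    int out = 0;
    (void)arg;
    for (int n = 0; n < NITEMS; n++) {
        sem_wait(&fullbuf);                        /* P(fullbuf) */
        int item = buffer[out];                    /* get item from full buffer */
        out = (out + 1) % NBUF;
        sem_post(&emptybuf);                       /* V(emptybuf) */
        consumed_sum += item;                      /* use item */
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the sum of consumed items. */
long run_pc_demo(void)
{
    pthread_t p, c;
    consumed_sum = 0;
    sem_init(&emptybuf, 0, NBUF);   /* emptybuf initially N */
    sem_init(&fullbuf, 0, 0);       /* fullbuf initially 0  */
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&emptybuf);
    sem_destroy(&fullbuf);
    return consumed_sum;
}
```

With several producers or several consumers, the buffer indices would become shared data and would need a mutex of their own.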
49What does it mean that Semaphores have persistence? Tweedledum and Tweedledee Problem
- Separate threads executing their respective procedures. The code below is intended to cause them to forever take turns exchanging insults through the shared variable X in strict alternation.
- The Sleep() and Wakeup() routines operate as follows:
- Sleep blocks the calling thread,
- Wakeup unblocks a specific thread if that thread is blocked; otherwise its behavior is unpredictable
- Linux: wait_for_completion() and complete()
50The code shown below exhibits a well-known synchronization flaw. Outline a scenario in which this code would fail, and the outcome of that scenario.
    void Tweedledum()
    {
        while (1) {
            Sleep();
            x = Quarrel(x);
            Wakeup(Tweedledee);
        }
    }
    void Tweedledee()
    {
        while (1) {
            x = Quarrel(x);
            Wakeup(Tweedledum);
            Sleep();
        }
    }
Lost wakeup: if dee goes first, its wakeup is lost (since dum isn't sleeping yet), and dee then goes to sleep. Both sleep forever.
51Show how to fix the problem by replacing the Sleep and Wakeup calls with semaphore P (down) and V (up) operations.
    void Tweedledum()
    {
        while (1) {
            Sleep();
            x = Quarrel(x);
            Wakeup(Tweedledee);
        }
    }
    void Tweedledee()
    {
        while (1) {
            x = Quarrel(x);
            Wakeup(Tweedledum);
            Sleep();
        }
    }
    semaphore dee = 0;
    semaphore dum = 0;
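One possible answer to the exercise, sketched with POSIX semaphores: each Sleep() becomes a P on the caller's own semaphore and each Wakeup() a V on the other thread's semaphore. The bounded `ROUNDS`, the parity check inside a hypothetical `Quarrel`, and `run_tweedle_demo` are assumptions added so the alternation can be verified; `sem_init` is assumed to be available (as on Linux).

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define ROUNDS 1000

static sem_t dee, dum;     /* both initially 0 */
static int x;              /* the shared variable */
static int out_of_turn;    /* counts violations of strict alternation */

/* Hypothetical Quarrel: dee should always see an even x, dum an odd x. */
static int Quarrel(int v, int expected_parity)
{
    if (v % 2 != expected_parity)
        out_of_turn++;
    return v + 1;
}

static void *Tweedledum(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        sem_wait(&dum);            /* was Sleep()              */
        x = Quarrel(x, 1);
        sem_post(&dee);            /* was Wakeup(Tweedledee)   */
    }
    return NULL;
}

static void *Tweedledee(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        x = Quarrel(x, 0);
        sem_post(&dum);            /* was Wakeup(Tweedledum)   */
        sem_wait(&dee);            /* was Sleep()              */
    }
    return NULL;
}

/* Returns the final x, or -1 if the two ever quarrelled out of turn. */
int run_tweedle_demo(void)
{
    pthread_t a, b;
    x = 0;
    out_of_turn = 0;
    sem_init(&dee, 0, 0);
    sem_init(&dum, 0, 0);
    pthread_create(&a, NULL, Tweedledum, NULL);
    pthread_create(&b, NULL, Tweedledee, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    sem_destroy(&dee);
    sem_destroy(&dum);
    return out_of_turn == 0 ? x : -1;
}
```

The V is remembered in the semaphore's count even if the other thread has not yet reached its P, which is the persistence that Sleep/Wakeup lacks.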
525 Dining Philosophers
53Template for Philosopher
    while (food available) {
        /* pick up forks */
        eat;
        /* put down forks */
        think awhile;
    }
54Naive Solution
    while (food available) {
        P(fork[left(me)]); P(fork[right(me)]);      /* pick up forks */
        eat;
        V(fork[left(me)]); V(fork[right(me)]);      /* put down forks */
        think awhile;
    }
Does this work?
55Simplest Example of Deadlock
- Thread 0
- P(R1)
- P(R2)
- V(R1)
- V(R2)
- Thread 1
- P(R2)
- P(R1)
- V(R2)
- V(R1)
Interleaving: Thread 0 P(R1); Thread 1 P(R2); Thread 1 P(R1) waits; Thread 0 P(R2) waits
R1 and R2 initially 1 (binary semaphores)
56Conditions for Deadlock
- Mutually exclusive use of resources
- Binary semaphores R1 and R2
- Circular waiting
- Thread 0 waits for Thread 1 to V(R2) and Thread 1 waits for Thread 0 to V(R1)
- Hold and wait
- Holding either R1 or R2 while waiting on the other
- No pre-emption
- Neither R1 nor R2 is removed from its holding thread.
57Philosophy 101 (or why 5DP is interesting)
- How to eat with your Fellows without causing Deadlock.
- Circular arguments (the circular wait condition)
- Not giving up on firmly held things (no preemption)
- Infinite patience with Half-baked schemes (hold some, wait for more)
- Why Starvation exists and what we can do about it.
58Dealing with Deadlock
- It can be prevented by breaking one of the prerequisite conditions:
- Mutually exclusive use of resources
- Example: allowing shared access to read-only files (readers/writers problem)
- Circular waiting
- Example: define an ordering on resources and acquire them in order
- Hold and wait
- No pre-emption
59Circular Wait Condition
    while (food available) {
        if (me ??? 0) { P(fork[left(me)]); P(fork[right(me)]); }
        else          { P(fork[right(me)]); P(fork[left(me)]); }
        eat;
        V(fork[left(me)]); V(fork[right(me)]);
        think awhile;
    }
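One way to break the circular wait is to impose a global order on the forks and have every philosopher take the lower-numbered fork first. The sketch below does this with one pthread mutex per fork; the numbering (philosopher `me` sits between forks `me` and `(me + 1) % 5`), the bounded `MEALS`, and `run_philosophers_demo` are assumptions of this example.

```c
#include <assert.h>
#include <pthread.h>

#define NPHIL 5
#define MEALS 100

static pthread_mutex_t fork_lock[NPHIL];
static int meals[NPHIL];
static int ids[NPHIL];

static void *philosopher(void *arg)
{
    int me = *(int *)arg;
    int left = me, right = (me + 1) % NPHIL;
    /* Resource ordering: always lock the lower-numbered fork first. */
    int first = left < right ? left : right;
    int second = left < right ? right : left;

    for (int i = 0; i < MEALS; i++) {
        pthread_mutex_lock(&fork_lock[first]);
        pthread_mutex_lock(&fork_lock[second]);
        meals[me]++;                               /* eat */
        pthread_mutex_unlock(&fork_lock[second]);
        pthread_mutex_unlock(&fork_lock[first]);
        /* think awhile */
    }
    return NULL;
}

/* Runs all five philosophers to completion; returns the total number of meals. */
int run_philosophers_demo(void)
{
    pthread_t t[NPHIL];
    int total = 0;

    for (int k = 0; k < NPHIL; k++) {
        pthread_mutex_init(&fork_lock[k], NULL);
        meals[k] = 0;
        ids[k] = k;
    }
    for (int k = 0; k < NPHIL; k++)
        pthread_create(&t[k], NULL, philosopher, &ids[k]);
    for (int k = 0; k < NPHIL; k++)
        pthread_join(t[k], NULL);
    for (int k = 0; k < NPHIL; k++) {
        total += meals[k];
        pthread_mutex_destroy(&fork_lock[k]);
    }
    return total;
}
```

With this numbering only the last philosopher ends up reversing the pick-up order, which is enough to make a cycle of waiters impossible.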
60Hold and Wait Condition
    while (food available) {
        P(mutex);
        while (forks[me] != 2) {
            blocking[me] = true;
            V(mutex);
            P(sleepy[me]);
            P(mutex);
        }
        forks[leftneighbor(me)]--;
        forks[rightneighbor(me)]--;
        V(mutex);
        eat;
        P(mutex);
        forks[leftneighbor(me)]++;
        forks[rightneighbor(me)]++;
        if (blocking[leftneighbor(me)]) {
            blocking[leftneighbor(me)] = false;
            V(sleepy[leftneighbor(me)]);
        }
        if (blocking[rightneighbor(me)]) {
            blocking[rightneighbor(me)] = false;
            V(sleepy[rightneighbor(me)]);
        }
        V(mutex);
        think awhile;
    }
61Starvation
- The difference between deadlock and starvation is subtle:
- Once a set of processes are deadlocked, there is no future execution sequence that can get them out of it.
- In starvation, there does exist some execution sequence that is favorable to the starving process, although there is no guarantee it will ever occur.
- Rollback and Retry solutions are prone to starvation.
- Continuous arrival of higher priority processes is another common starvation situation.
62Readers/Writers Problem
- Synchronizing access to a file or data record in
a database such that any number of threads
requesting read-only access are allowed but only
one thread requesting write access is allowed,
excluding all readers.
63Template for Readers/Writers
- Reader()
    while (true) {
        /* request r access */
        read
        /* release r access */
    }
- Writer()
    while (true) {
        /* request w access */
        write
        /* release w access */
    }
64Reader/Writer Spinlocks
- Class of reader/writer problems
- Multiple readers OK
- Mutual exclusion for writers
- No upgrade from reader lock to writer lock
- Favors readers: starvation of writers possible
- rwlock_t
- read_lock,read_unlock
- read_lock_irq // also unlock
- read_lock_irqsave
- read_unlock_irqrestore
- write_lock,write_unlock
- //_irq,_irqsave,_irqrestore
- write_trylock
- rw_is_locked
65Reader/Writer Semaphores
- All reader/writer semaphores are mutexes (usage count 1)
- Multiple readers, solo writer
- Uninterruptible sleep
- Possible to downgrade writer to reader
- down_read
- down_write
- up_read
- up_write
- downgrade_write
- down_read_trylock
- down_write_trylock
66Semaphore Solution with Writer Priority
    int readCount = 0, writeCount = 0;
    semaphore mutex1 = 1, mutex2 = 1;
    semaphore readBlock = 1;
    semaphore writePending = 1;
    semaphore writeBlock = 1;
67- Reader()
    while (TRUE) {
        other stuff;
        P(writePending);
        P(readBlock);
        P(mutex1);
        readCount = readCount + 1;
        if (readCount == 1)
            P(writeBlock);
        V(mutex1);
        V(readBlock);
        V(writePending);
        access resource;
        P(mutex1);
        readCount = readCount - 1;
        if (readCount == 0)
            V(writeBlock);
        V(mutex1);
    }
- Writer()
    while (TRUE) {
        other stuff;
        P(mutex2);
        writeCount = writeCount + 1;
        if (writeCount == 1)
            P(readBlock);
        V(mutex2);
        P(writeBlock);
        access resource;
        V(writeBlock);
        P(mutex2);
        writeCount = writeCount - 1;
        if (writeCount == 0)
            V(readBlock);
        V(mutex2);
    }
71- Assume the writePending semaphore was omitted in the solution just given. What would happen?
This is supposed to give writers priority. However, consider the following sequence:
- Reader 1 arrives, executes through P(readBlock)
- Reader 1 executes P(mutex1)
- Writer 1 arrives, waits at P(readBlock)
- Reader 2 arrives, waits at P(readBlock)
- Reader 1 executes V(mutex1), then V(readBlock)
- Reader 2 may now proceed: wrong
72Birrell paper SRC Thread Primitives
- SRC thread primitives
- Thread = Fork(procedure, args)
- result = Join(thread)
- LOCK mutex DO critical section END
- Wait (mutex, condition)
- Signal (condition)
- Broadcast (condition)
- Acquire (mutex), Release (mutex) //more dangerous
73Monitor Abstraction
- Encapsulates shared data and operations with mutually exclusive use of the object (an associated lock).
- Associated condition variables with operations Wait and Signal.
[Figure: monitor with an entry queue and monitor_lock; operations init, enQ, deQ; shared data; condition notEmpty]
74Condition Variables
- We build the monitor abstraction out of a lock (for the mutual exclusion) and a set of associated condition variables.
- Wait on condition: releases the lock held by the caller; the caller goes to sleep on the condition's queue. When awakened, it must reacquire the lock.
- Signal condition: wakes up one waiting thread.
- Broadcast: wakes up all threads waiting on this condition.
75Monitor Abstraction
- EnQ:
    acquire(lock);
    if (head == null) {
        head = item;
        signal(lock, notEmpty);
    } else
        tail->next = item;
    tail = item;
    release(lock);
- deQ:
    acquire(lock);
    if (head == null)
        wait(lock, notEmpty);
    item = head;
    if (tail == head) tail = null;
    head = item->next;
    release(lock);
80Monitor Abstraction
- EnQ:
    acquire(lock);
    if (head == null) {
        head = item;
        signal(lock, notEmpty);
    } else
        tail->next = item;
    tail = item;
    release(lock);
- deQ:
    acquire(lock);
    while (head == null)
        wait(lock, notEmpty);
    item = head;
    if (tail == head) tail = null;
    head = item->next;
    release(lock);
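The monitor-style queue with the `while` re-check can be sketched with a pthread mutex and condition variable, as below. The `struct item` layout, the single consumer thread, and `run_queue_demo` are assumptions for the example; it keeps the slide's policy of signalling only when the queue goes from empty to non-empty, which is adequate for one consumer, whereas with several consumers signalling on every EnQ would be the safer choice.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NITEMS 100

struct item { int value; struct item *next; };

static struct item *head = NULL, *tail = NULL;               /* shared data  */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;     /* monitor lock */
static pthread_cond_t notEmpty = PTHREAD_COND_INITIALIZER;   /* condition    */

void EnQ(struct item *it)
{
    pthread_mutex_lock(&lock);
    it->next = NULL;
    if (head == NULL) {
        head = it;
        pthread_cond_signal(&notEmpty);
    } else {
        tail->next = it;
    }
    tail = it;
    pthread_mutex_unlock(&lock);
}

struct item *deQ(void)
{
    pthread_mutex_lock(&lock);
    while (head == NULL)                       /* re-check after every wakeup */
        pthread_cond_wait(&notEmpty, &lock);   /* releases lock while asleep  */
    struct item *it = head;
    if (tail == head)
        tail = NULL;
    head = it->next;
    pthread_mutex_unlock(&lock);
    return it;
}

static struct item items[NITEMS];
static long total;

static void *consumer(void *arg)
{
    (void)arg;
    for (int n = 0; n < NITEMS; n++)
        total += deQ()->value;
    return NULL;
}

/* Enqueues the values 1..NITEMS while a consumer thread dequeues and sums them. */
long run_queue_demo(void)
{
    pthread_t c;
    total = 0;
    pthread_create(&c, NULL, consumer, NULL);
    for (int n = 0; n < NITEMS; n++) {
        items[n].value = n + 1;
        EnQ(&items[n]);
    }
    pthread_join(c, NULL);
    return total;
}
```

`pthread_cond_wait` is allowed to return spuriously, which is another reason the `while` form is the one to use with POSIX condition variables.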
81The Critical Section Problem
    while (1) {
        ...other stuff...
        acquire(mutex);
        critical section        // conceptually "inside" monitor
        release(mutex);
    }
82P&V using Locks & CV (Monitor)
- P:
    acquire(lock);
    while (Sval == 0)
        wait(lock, nonZero);
    Sval = Sval - 1;
    release(lock);
- V:
    acquire(lock);
    Sval = Sval + 1;
    signal(lock, nonZero);
    release(lock);
[Figure: monitor with an entry queue and lock; operations init, P, V; shared data Sval; condition nonZero]
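The same construction written against POSIX threads might look like the sketch below: a counter `Sval` guarded by a mutex, with a condition variable `nonZero`. The struct name `semaphore_cv` and the small demo that uses the semaphore as a mutex are hypothetical additions.

```c
#include <assert.h>
#include <pthread.h>

struct semaphore_cv {
    int Sval;                   /* shared data    */
    pthread_mutex_t lock;       /* monitor lock   */
    pthread_cond_t nonZero;     /* condition      */
};

void P(struct semaphore_cv *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->Sval == 0)
        pthread_cond_wait(&s->nonZero, &s->lock);
    s->Sval = s->Sval - 1;
    pthread_mutex_unlock(&s->lock);
}

void V(struct semaphore_cv *s)
{
    pthread_mutex_lock(&s->lock);
    s->Sval = s->Sval + 1;
    pthread_cond_signal(&s->nonZero);
    pthread_mutex_unlock(&s->lock);
}

/* Demo: a semaphore initialized to 1 used for mutual exclusion. */
static struct semaphore_cv mutex_sem = {
    1, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER
};
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        P(&mutex_sem);
        counter++;              /* critical section */
        V(&mutex_sem);
    }
    return NULL;
}

/* Runs two workers and returns the final counter value. */
long run_sema_demo(void)
{
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

Initializing `Sval` to a larger value gives a counting semaphore without any change to P or V.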
83Design Decisions / Issues
- Locking overhead (granularity)
- Broadcast vs. Signal
- Nested lock/condition variable problem:
    LOCK a DO
        LOCK b DO
            while (not_ready) wait(b, c);    // releases b, not a
        END
    END
Unseen in call
- My advice: correctness first!
84Using Condition Variables
- while (!required_conditions) wait(m, c);
- Why we use "while" not "if": the invariant is not guaranteed
- Why use broadcast vs. signal: the need can arise if we are using one condition queue for many reasons. Waking threads have to sort it out (spurious wakeups). Possibly better to separate into multiple conditions (but more complexity to code).
855DP - Monitor Style
    Boolean eating[5];
    Lock forkMutex;
    Condition forksAvail;
    void PickupForks(int i)
    {
        forkMutex.Acquire();
        while (eating[(i-1)%5] || eating[(i+1)%5])
            forksAvail.Wait(forkMutex);
        eating[i] = true;
        forkMutex.Release();
    }
    void PutdownForks(int i)
    {
        forkMutex.Acquire();
        eating[i] = false;
        forksAvail.Broadcast(forkMutex);
        forkMutex.Release();
    }
86What about this?
    while (food available) {
        forkMutex.Acquire();
        while (forks[me] != 2) {
            blocking[me] = true;
            forkMutex.Release();
            sleep();
            forkMutex.Acquire();
        }
        forks[leftneighbor(me)]--;
        forks[rightneighbor(me)]--;
        forkMutex.Release();
        eat;
        forkMutex.Acquire();
        forks[leftneighbor(me)]++;
        forks[rightneighbor(me)]++;
        if (blocking[leftneighbor(me)] || blocking[rightneighbor(me)])
            wakeup();
        forkMutex.Release();
        think awhile;
    }
87Template for Readers/Writers
- Reader()
    while (true) {
        fd = open(foo, 0);
        read
        close(fd);
    }
- Writer()
    while (true) {
        fd = open(foo, 1);
        write
        close(fd);
    }
88Template for Readers/Writers
- Reader()
    while (true) {
        startRead();
        read
        endRead();
    }
- Writer()
    while (true) {
        startWrite();
        write
        endWrite();
    }
89R/W - Monitor Style
    Boolean busy = false;
    int numReaders = 0;
    Lock filesMutex;
    Condition OKtoWrite, OKtoRead;
    void startRead()
    {
        filesMutex.Acquire();
        while (busy)
            OKtoRead.Wait(filesMutex);
        numReaders++;
        filesMutex.Release();
    }
    void endRead()
    {
        filesMutex.Acquire();
        numReaders--;
        if (numReaders == 0)
            OKtoWrite.Signal(filesMutex);
        filesMutex.Release();
    }
    void startWrite()
    {
        filesMutex.Acquire();
        while (busy || numReaders != 0)
            OKtoWrite.Wait(filesMutex);
        busy = true;
        filesMutex.Release();
    }
    void endWrite()
    {
        filesMutex.Acquire();
        busy = false;
        OKtoRead.Broadcast(filesMutex);
        OKtoWrite.Signal(filesMutex);
        filesMutex.Release();
    }
90Issues
- Locking overhead (granularity)
- Broadcast vs. Signal and other causes of spurious wakeups
- Nested lock/condition variable problem:
    LOCK a DO
        LOCK b DO
            while (not_ready) wait(b, c);    // releases b, not a
        END
    END
Unseen in call
- Priority inversions
91Spurious Wakeups
- while (!required_conditions) wait(m, c);
- Why we use "while" not "if": the invariant is not guaranteed
- Why use broadcast: we may be using one condition queue for many reasons. Waking threads have to sort it out. Possibly better to separate into multiple conditions (more complexity to code).
92Tricks (mixed syntax)
    if (some_condition) {               // as a hint
        LOCK m DO
            if (some_condition) {       // the truth
                stuff
            }
        END
    }
Cheap to get the information, but you must check for correctness; there is always a slow way.
93More Tricks
- General pattern: while (!required_conditions) wait(m, c);
- Broadcast works because waking up too many is OK (correctness-wise), although it has a performance impact.
    LOCK m DO
        deferred_signal = true;
    END
    if (deferred_signal) signal(c);
Spurious lock conflicts are caused by signals inside the critical section and by threads waking up to test the mutex before it gets released.
94Alerts
- Thread state contains a flag, alert-pending
- Exception: Alerted
- Alert(thread)
- sets alert-pending to true, wakes up a waiting thread
- AlertWait(mutex, condition)
- if alert-pending, set it to false and raise the exception
- else wait as usual
- Boolean b = TestAlert()
- tests and clears alert-pending
    TRY
        while (empty) AlertWait(m, nonempty);
        return (nextchar());
    EXCEPT Thread.Alerted:
        return (eof);
95Using Alerts
    sibling = Fork(proc, arg);
    while (!done) {
        done = longComp();
        if (done) Alert(sibling);
        else done = TestAlert();
    }
96Wisdom
- Do's
- Reserve using alerts for when you don't know what is going on
- Only use if you forked the thread
- Impose an ordering on lock acquisition
- Write down invariants that should be true when locks aren't being held
- Don'ts
- Call into a different abstraction level while holding a lock
- Move the last signal beyond scope of Lock
- Acquire lock, fork, and let child release lock
- Expect priority inheritance, since few implementations provide it
- Pack data and expect fine grain locking to work
98Proposed Algorithm for 2 Process Mutual Exclusion
    Boolean flag[2];
    proc(int i)
    {
        while (TRUE) {
            compute;
            flag[i] = TRUE;
            while (flag[(i+1) mod 2])
                ;
            critical section;
            flag[i] = FALSE;
        }
    }
    flag[0] = flag[1] = FALSE;
    fork(proc, 1, 0);
    fork(proc, 1, 1);
- Is it correct?
Assume they go in lockstep. Both set their own flag to TRUE. Both busy-wait forever on the other's flag -> deadlock.
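99Greedy Version (turn = me)
    blue:  needin[blue] = true
    green: needin[green] = true
    blue:  turn = blue
    blue:  while (needin[green] && turn == green)      // false, falls through
    blue:  Critical Section
    green: turn = green
    green: while (needin[blue] && turn == blue)        // false, falls through
    green: Critical Section
Oooops!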
100Peterson's Algorithm for 2 Process Mutual Exclusion
- enter_region:
    needin[me] = true;
    turn = you;
    while (needin[you] && turn == you) no_op;
- exit_region:
    needin[me] = false;
- What about more than 2 processes?
102Can we extend 2-process algorithm to work with n
processes?
[Figure: tree of 2-process contests; each contest runs needin[me] = true; turn = you, and the overall winner enters the CS]
Idea: tournament
Details: bookkeeping (left to the reader)
103Lamport's Bakery Algorithm
- enter_region:
    choosing[me] = true;
    number[me] = max(number[0..n-1]) + 1;
    choosing[me] = false;
    for (j = 0; j <= n-1; j++) {
        while (choosing[j] != 0) skip;
        while ((number[j] != 0) and
               ((number[j] < number[me]) or
                ((number[j] == number[me]) and (j < me)))) skip;
    }
- exit_region:
    number[me] = 0;
104Interleaving / Execution Sequence with Bakery Algorithm
[Figure sequence, slides 104 to 111: the Choosing and Number values of Threads 0 to 3 at successive steps of the entry loop shown on slide 103; on slide 109, Thread 3 is stuck.]
112Hardware Assistance
- Most modern architectures provide some support for building synchronization: atomic read-modify-write instructions.
- Example: test-and-set (loc, reg)
- sets bit to 1 in the new value of loc
- returns old value of loc in reg
- Other examples:
- compare-and-swap, fetch-and-op
notation means atomic
113Busywaiting with Test-and-Set
- Declare a shared memory location to represent a busyflag on the critical section we are trying to protect.
- enter_region (or acquiring the "lock"):
    waitloop: tsl busyflag, R0      // R0 <- busyflag; busyflag <- 1
              bnz R0, waitloop      // was it already set?
- exit region (or releasing the "lock"):
    busyflag <- 0
114Better Implementations from Multiprocessor Domain
- Dealing with contention of Test&Set spinlocks
- Don't execute test&set so much
- Spin without generating bus traffic
- Test&Set with Backoff
- Insert delay between test&set operations (not too long)
- Exponential seems good (k*c^i)
- Not fair
- Test-and-Test&Set
- Spin (test) on local cached copy until it gets invalidated, then issue test&set
- Intuition: no point in trying to set the location until we know that it's not set, which we can detect when it gets invalidated...
- Still contention after invalidate
- Still not fair
- Analogies for Energy?
115Blocking Synchronization
- OS implementation involving changing the state of the waiting process from running to blocked.
- Need some synchronization abstraction known to the OS, provided by system calls.
- mutex locks with operations acquire and release
- semaphores with operations P and V (down, up)
- condition variables with wait and signal
116Template for Implementing Blocking Synchronization
- Associated with the lock is a memory location (busy) and a queue for waiting threads/processes.
- Acquire syscall:
    while (busy) { enqueue caller on lock's queue }
    /* upon waking to nonbusy lock */ busy = true;
- Release syscall:
    busy = false;
    /* wakeup */ move any waiting threads to Ready queue
117- The Alpha and MIPS 4000 processor architectures have no atomic read-modify-write instructions, i.e., no test-and-set-lock instruction (TS). Atomic update is supported by pairs of load_locked (LDL) and store-conditional (STC) instructions.
- The semantics of the Alpha architecture's LDL and STC instructions are as follows. Executing an LDL Rx, y instruction loads the memory at the specified address (y) into the specified general register (Rx), and holds y in a special per-processor lock register. STC Rx, y stores the contents of the specified general register (Rx) to memory at the specified address (y), but only if y matches the address in the CPU's lock register. If STC succeeds, it places a one in Rx; if it fails, it places a zero in Rx. Several kinds of events can cause the machine to clear the CPU lock register, including traps and interrupts. Moreover, if any CPU in a multiprocessor system successfully completes an STC to address y, then every other processor's lock register is atomically cleared if it contains the value y.
- Show how to use LDL and STC to implement safe busy-waiting.
118
    Acquire: LDL R1 flag        /* R1 <- flag */
             BNZ R1 Acquire     /* if (R1 != 0) already locked */
             LDI R2 1
             STC R2 flag        /* try to set flag */
             BEZ R2 Acquire     /* if STC failed, retry */
    Release: LDI R2 0
             STC R2 flag        /* reset lock, breaking "lock" on flag */
             BEZ R2 Error       /* should never happen */
119Synchronization Primitives (Abstractions)
- implementable by busywaiting or blocking
120Knowing a Critical Section when you see one.
- We've talked about solutions for protecting critical sections of code.
- Before considering other problems or usage patterns for synchronization primitives, let's get some practice with recognizing critical sections that need such protection.