Title: Synchronization
1. Synchronization

2. Disclaimer
- If you've taken ICS432 in Fall 2008, a large
subset of this set of lecture notes should be
very familiar
- But there are a few new things
- Priority Inversion
- Reader-Writer Locks
- What OSes provide
- Given how difficult and important synchronization
is, I can't imagine that going through it again
is a bad idea
3. Cooperating Processes/Threads
- Cooperation is great and useful
- processes and message-passing
- processes and shared memory segments
- threads in a single address space
- all of the above
- However, it must be done very carefully
- all of the above models have their dangers
- There are two main problems
- Race Conditions: a bug that leads the program to
give unpredictably incorrect results
- Typical with processes/threads sharing memory
- Deadlocks: the program blocks forever
- Typical in all of the above
- We talk about Race Conditions in these lecture notes
- Deadlocks: next set of notes
4. Race Condition
- Let's look at the producer-consumer example, using
a circular buffer
- Which you should have read in Section 3.4.1
- Producer

  while (true) {
    while (counter == BUFFER_SIZE) ;  // wait, buffer's full
    buffer[in] = produceNewItem();
    in = (in + 1) % BUFFER_SIZE;
    counter++;
  }

- Consumer

  while (true) {
    while (counter == 0) ;  // wait, buffer's empty
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    counter--;
  }

Initially: counter = 0, in = 0, out = 0
5. Race Condition Example
- Let's look at the code in race_condition_example.c
- Let's run it and see if we can observe the race
condition
- Terminology: there is a race condition
- The program is buggy
- The question is whether the bug will manifest itself
- The bug we see is called a lost update
6. Why Race Conditions?
- Race conditions occur because of concurrency of
threads/processes
- Two kinds of concurrency
- false concurrency: within a core, an illusion of
concurrency provided by the OS (e.g., the green and
blue tasks scheduled on the same core)
- true concurrency: across cores
(e.g., the green and yellow tasks on core 1 and core 2)
7. True/False Concurrency
- The programmer shouldn't have to care/know
whether concurrency will be true or false
- Typically, the programmer doesn't know on which
computer the program will run!
- A concurrent program with 10 tasks should work on
a single-core processor, a quad-core processor, a
32-core processor, etc.
- However, better performance with true concurrency
- false concurrency is still very useful, e.g., for
interactivity
- We've talked about true concurrency across cores,
but there could be true concurrency between any
two hardware resources
- e.g., between the network card and a core
- e.g., between the disk and the network card
8. Why Race Conditions?
- Race conditions can happen with false or true
concurrency
- Everything else being equal, one could argue that
they're statistically most likely to manifest
themselves with true concurrency
- Let's explain how they can occur with false
concurrency first
- Consider a single core running a 2-threaded
process, with one thread doing counter++ and the
other counter--
- These statements are in a high-level language
- But we know that the compiler translates them
into machine code, which we like to look at
written as assembly code
- On a Load/Store architecture (RISC), the code
would then be

  Thread 1:  load R1, @   inc R1   store @, R1
  Thread 2:  load R1, @   dec R1   store @, R1
9. Why Race Conditions?
- Illusion of concurrency: the OS context-switches
threads rapidly
- Interrupt, save state (stack, register values,
...), restart
- Three possible execution paths

  Path 1:  T1: load R1, @   T2: load R1, @   T2: dec R1
           T1: inc R1       T2: store @, R1  T1: store @, R1

  Path 2:  T1: load R1, @   T1: inc R1       T2: load R1, @
           T2: dec R1       T2: store @, R1  T1: store @, R1

  Path 3:  T1: load R1, @   T1: inc R1       T2: load R1, @
           T2: dec R1       T1: store @, R1  T2: store @, R1

Important: Thread 1's R1 is not the same as Thread 2's R1.
They are both register values in logical register sets
(i.e., inside a data structure in the OS)
10. Why Race Conditions?
Let's assume that initially @ = 5

  Case 1:
  T1: load R1, @   // R1 = 5
  T2: load R1, @   // R1 = 5
  T2: dec R1       // R1 = 4
  T1: inc R1       // R1 = 6
  T2: store @, R1  // @ = 4
  T1: store @, R1  // @ = 6

  Case 2:
  T1: load R1, @   // R1 = 5
  T1: inc R1       // R1 = 6
  T2: load R1, @   // R1 = 5
  T2: dec R1       // R1 = 4
  T2: store @, R1  // @ = 4
  T1: store @, R1  // @ = 6

  Case 3:
  T1: load R1, @   // R1 = 5
  T1: inc R1       // R1 = 6
  T2: load R1, @   // R1 = 5
  T2: dec R1       // R1 = 4
  T1: store @, R1  // @ = 6
  T2: store @, R1  // @ = 4

We would expect @ to be 5 at the end. But we can
get 4, or 6
11. Why Race Conditions?
- What happens in the case of true concurrency?
- Basically the same thing
- Each thread could be running on its own core
- But it still has its own register set
- In this case a different physical register set, as
opposed to a different logical register set
- It's just that some instructions can actually be
executed truly concurrently
- Note that each core may have its own cache
- But in this case there is cache-coherency
hardware
- No register-coherency hardware is possible, because
registers are used in completely arbitrary ways
and there is no notion of R1 having the same
value across cores
- Let's see an example
12. Why Race Conditions?
- Perfectly synchronized?
- When two processors issue a memory request
(load/store) at the same time, one of them gets to
it first (could be random, could be deterministic)

  Core 1:  load R1, @   inc R1   store @, R1
  Core 2:  load R1, @   dec R1   store @, R1

(@ = 4 if we started with @ = 5)
13. Why Do We Hate Race Conditions?
- Code may work fine a million times, and then fail
one day
- And then it takes you another million runs to
reproduce the bug
- If you modify the code (e.g., by adding a few printf
statements), or if you run in debugging mode,
this could completely change the race condition's
behavior
- hide it, or highlight it
- If you write code, run it, and it works, you
don't really know whether you've written a
bug-free program
- Typically true, but exacerbated with race
conditions
- We hate them because we hate nondeterministic
bugs
- and we hate bugs to begin with
- So what do we do?
14. Critical Section
- We want a critical section: a section of the code
in which only one thread can be at a time
- It doesn't have to be a contiguous section of
code
- In the example here, we have a 3-zone critical
section
- If thread A is already in one of the red zones,
then all other threads are blocked before being
allowed to enter any red zone
- And only one will be allowed to enter once thread
A leaves the red zone it was in

15. Critical Section
- We can have multiple critical sections
- One 3-zone red critical section
- One 2-zone green critical section

16. Critical Section
- More formally, we want three properties of
critical sections
- Mutual exclusion: if thread P is in the critical
section, then no other thread can be in it
- Progress: if thread P wants to enter a
critical section, it will enter it eventually
- Bounded waiting: once thread P has declared its
intent to enter the critical section, there is a
bound on the number of threads that can enter the
critical section before P
- Note that there is no assumption regarding the
relative speeds of the involved threads
- But no thread has speed zero
17. Critical Section Code
- Producer

  while (true) {
    while (counter == BUFFER_SIZE) ;  // wait, buffer's full
    buffer[in] = produceNewItem();
    in = (in + 1) % BUFFER_SIZE;
    enter_critical_section();
    counter++;
    leave_critical_section();
  }

- Consumer

  while (true) {
    while (counter == 0) ;  // wait, buffer's empty
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    enter_critical_section();
    counter--;
    leave_critical_section();
  }
18. Critical Section
- A Critical Section corresponds to sections of
code (i.e., the text segment)
- It doesn't correspond to data (i.e., variables)
- Even though the section of code is typically one
that modifies a particular variable
- When we say we need to "protect variable x
against race conditions", it means we need to
look at the entire code, see where x is modified,
and put all those places in the SAME critical
section
- If software engineering is well done,
modification of a single variable doesn't happen
all over the code
- It's a misconception that critical sections are
attached to variables. They are attached to code.
19. Critical Sections and the Kernel
- On modern OSes, multiple threads can be in the
kernel
- User threads that are doing a system call and are
in kernel mode
- Threads started by the kernel itself to do useful
kernel things
- Therefore, the kernel is subject to race
conditions
- We've seen that kernel debugging is hard and that
race condition debugging is hard, so we don't
want race conditions in the kernel
- Example: the kernel maintains many data
structures
- e.g., the list of open files
- The list must be updated each time a file is
opened or closed
- This is very much like the counter++ / counter--
example
- e.g., the list of memory allocations
- e.g., the list of processes
- e.g., the list of interrupt handlers
- The kernel developer must ensure that no race
conditions exist
20. Preemptive vs. Non-Preemptive
- A preemptive kernel allows a thread executing
kernel code (in kernel mode) to be preempted
- A non-preemptive kernel doesn't
- The thread runs until it willingly exits kernel
mode (or yields control of the CPU)
- Non-preemptive kernels are simple
- There are no race conditions
- Preemptive kernels are more complex
- There are race conditions
- Preemptive kernels are more powerful
- better for real-time programming, as a real-time
thread can preempt a thread running in kernel
mode
- should be more responsive for the same reason
- Most modern kernels are preemptive
21. Synchronization Implementation
- What we need is a way to implement
enter_critical_section() and leave_critical_section()
- There are some software solutions
- They can be very complicated
- They're not guaranteed to work on modern
architectures
- See Section 6.3 in the book if interested
- What we need is help from the hardware
- One option: disabling interrupts?
- Problems
- If you allow any user process to disable
interrupts, what tells you it will enable them
afterwards?
- What if interrupts are needed for other purposes,
such as a bunch of timers?
- Disabling interrupts across multiple processor
cores takes time, and entering a critical section
would be very costly
- Conclusion: although inside the kernel one could
disable interrupts for specific purposes, one
cannot use this mechanism in general
22. Atomic Instructions
- Modern processors offer atomic instructions
- the instruction is uninterruptible from the
moment it is issued to the moment it completes
- Test-and-Set instruction, which would correspond to
the following code

  boolean TestAndSet(boolean *target) {
    boolean rv = *target;
    *target = TRUE;
    return rv;
  }

- Let's see how we can implement critical sections
with TestAndSet
- Using it in our pseudo-code as if it were an
uninterruptible function, when it's really an
uninterruptible instruction
- Note that the book also talks about a Swap
instruction, which is equivalent
23. Locks with TestAndSet
- We declare a boolean variable lock
- Shared by all threads, initialized to FALSE
- Pseudo-code

  while (TestAndSet(&lock)) {
    // Do nothing
  }
  // Critical Section here
  lock = FALSE;

  boolean TestAndSet(boolean *target) {
    boolean rv = *target;
    *target = TRUE;
    return rv;
  }
24. Synchronization Abstractions
- A number of abstractions have been defined to
allow programs to use synchronization without
using things like TestAndSet directly
- we'll see that these abstractions must be
provided by the Kernel because they require some
kernel things to happen
- I'll use a slightly different order to present
things here when compared to the book, but the
material is the same
25. The Lock Abstraction
- Based on TestAndSet, it's easy to implement a
lock abstraction

  typedef char lock_t;

  void lock(lock_t *lock) {
    while (TestAndSet(lock)) ;
    return;
  }

  void unlock(lock_t *lock) {
    *lock = FALSE;
  }
26. The Lock Abstraction
- The abstraction is easily used

  lock_t mutex;  // A lock used for mutual
                 // exclusion is often
                 // called a mutex
  . . .
  lock(&mutex);
  insert(linked_list, element);  // CS
  unlock(&mutex);
27. Locks for Communication?
- So far we've seen the use of locks for mutual
exclusion
- But it's tempting to do more advanced
synchronization
- Thread A waits for an event by doing lock(x)
- Thread B signals the event by doing unlock(x)
- Example
- We need to display a short movie while loading a
file
- Thread A displays the movie
- Thread B loads the file
- They both start at the same time
- When the movie ends, if the file isn't already
there, thread A needs to wait for the file to be
loaded: lock(x)
- When thread B is done loading the file, it tells
thread A the file is loaded: unlock(x)
- Any problem with this???
28. Spin Locks
- The lock abstraction we've developed has the
thread do busy waiting

  void lock(lock_t *lock) {
    while (TestAndSet(lock)) ;  // busy
    return;
  }

- lock() burns CPU cycles
- slows down other threads/processes
- generates heat
- In general, busy waiting is frowned upon
- It is very tempting, unfortunately
- Exponential back-off solutions are not "clean",
and striking a good compromise isn't easy at
all
- Our lock abstraction is called a spin lock,
because it has the thread spin until it can get
through
29. Are Spin Locks Evil?
- Spin Locks can be very useful for (short)
critical sections
- Burn only a few cycles, but provide super fast
response time
- They do not involve the kernel
- In fact, spin locks are used inside the kernel,
and several OSes provide the abstraction to users
- Typically another type of lock is also provided,
and users can choose
- How can we have a lock that doesn't spin?
30. How Not To Spin?
- The alternative to spinning is to have the OS
block the thread
- The OS can always enforce that any thread be in
any state, in this case BLOCKED
- The OS simply needs to keep track of threads
blocked due to some synchronization operation
- When necessary, the OS can simply remove a
blocked thread from that list and put it in the
READY state
- This is much heavier than spin locks in
terms of OS involvement, but much lighter in
terms of CPU cycle consumption
- e.g., spinning may still be a good idea if one
knows that the spinning time will be very short
- Not spinning for a very short critical section is
probably a bad idea
- Nowadays, due to the heavy multi-threading caused
by many-core architectures, the overhead of lock()
and unlock() is a big concern
31. No-Spin Locks for Communication?
- One can provide the lock abstraction with locks
that do not spin, where lock() and unlock() are
system calls that do kernel things

  typedef char lock_t;

  void lock(lock_t *lock) {
    <get blocked and get put in the waiting list for lock>
    return;
  }

  void unlock(lock_t *lock) {
    <unblock the first thread in the waiting list for lock>
  }
32. Problem w/ Locks and Communication
- Subtle issues with using (non-spin) locks for
communication
- Let's look at the producer/consumer problem
- The code we showed earlier had the threads do
busy waiting
- The producer waits for the buffer to be non-full
- The consumer waits for the buffer to be non-empty
- We want to avoid busy waiting, and get notified
by a lock
- Since we now have non-spin locks at our disposal
- We want to implement strict producer/consumer
- A consumer never attempts to consume from an
empty buffer
- A producer never attempts to produce into a full
buffer
- This strictness is desirable in many contexts
- Let's look at that code again, but using a
"buffer" abstract data type
- buffer.consume() consumes an item
- buffer.produce() produces an item
- buffer.size() returns the number of items
- The buffer size is bounded above by SIZE
33. Prod/Cons Revisited
- Producer

  while (true) {
    while (buffer.size() == SIZE) ;  // busy wait
    buffer.produce(generateItem());
  }

- Consumer

  while (true) {
    while (buffer.size() == 0) ;  // busy wait
    Item item = buffer.consume();
  }

- First, we need to add a "mutex" lock to protect
the buffer
- Internally, buffer.produce() and buffer.consume()
have race conditions, e.g., on the pointer to the
last element in the buffer
- Equivalent to some counter++ / counter-- race
condition
34. Prod/Cons Revisited
- Producer

  while (true) {
    while (buffer.size() == SIZE) ;  // busy wait
    lock(mutex);  // could be a spin lock
    buffer.produce(generateItem());
    unlock(mutex);
  }

- Consumer

  while (true) {
    while (buffer.size() == 0) ;  // busy wait
    lock(mutex);  // could be a spin lock
    Item item = buffer.consume();
    unlock(mutex);
  }

Now let's add locks for communication
instead of the busy wait
35. Prod/Cons Revisited
- Producer

  while (true) {
    if (buffer.size() == SIZE)
      lock(not_full);
    lock(mutex);
    buffer.produce(generateItem());
    unlock(mutex);
    unlock(not_empty);
  }

- Consumer

  while (true) {
    if (buffer.size() == 0)
      lock(not_empty);
    lock(mutex);
    Item item = buffer.consume();
    unlock(mutex);
    unlock(not_full);
  }

- This assumes we can do an unlock() on a lock
that's not locked
- e.g., the producer starts way before the consumer
and puts three items in
- It calls unlock(not_empty) three times, but the
consumer has yet to acquire the not_empty lock
once
- This is really easy to implement in practice
- This assumes that initially
- not_full is not locked
- not_empty is locked
- Note that we must do the lock(not_full) or
lock(not_empty) only if necessary
- e.g., if there is already an item in the buffer,
then the consumer should not attempt
lock(not_empty), but instead directly go take the
item
36. Prod/Cons Revisited
- Producer

  while (true) {
    if (buffer.size() == SIZE)
      lock(not_full);
    lock(mutex);
    buffer.produce(generateItem());
    unlock(mutex);
    unlock(not_empty);
  }

- Consumer

  while (true) {
    if (buffer.size() == 0)
      lock(not_empty);
    lock(mutex);
    Item item = buffer.consume();
    unlock(mutex);
    unlock(not_full);
  }

- This code works fine for one producer and one
consumer
- Mutual exclusion via the mutex (spin) lock
- Communication via the not_full and not_empty
locks
- However, there is a problem if we have more than
two threads
- Let's assume we have two consumers
- What bad behavior could happen here?
37. Prod/Cons Revisited
- One bad sequence of events
- The Producer produces an element in the buffer and
then goes to sleep for 10 hours (because of some
action not shown in the pseudo-code)
- Consumer 1 tests if buffer.size() == 0, and sees
that it isn't
- Consumer 1 calls lock(mutex) and enters the
Critical Section
- Consumer 1 is interrupted by the OS and Consumer
2 starts
- Consumer 2 tests if buffer.size() == 0, and sees
that it isn't!
- Consumer 2 calls lock(mutex), but Consumer 1
has the lock, so Consumer 2 is blocked
- Consumer 1 proceeds to consume the item,
releases mutex, releases the not_full lock, and is
interrupted by the OS
- Consumer 2 acquires the mutex lock, and consumes
an item from an empty buffer!
- Producer

  while (true) {
    if (buffer.size() == SIZE)
      lock(not_full);
    lock(mutex);
    buffer.produce(generateItem());
    unlock(mutex);
    unlock(not_empty);
  }

- Consumer

  while (true) {
    if (buffer.size() == 0)
      lock(not_empty);
    lock(mutex);
    Item item = buffer.consume();
    unlock(mutex);
    unlock(not_full);
  }
38. Prod/Cons Revisited
- The problem: a race condition on the buffer.size()
test
- What do we do against race conditions?
- We use mutex locks to create critical sections!
- So let's do that
- Producer

  while (true) {
    if (buffer.size() == SIZE)
      lock(not_full);
    lock(mutex);
    buffer.produce(generateItem());
    unlock(mutex);
    unlock(not_empty);
  }

- Consumer

  while (true) {
    if (buffer.size() == 0)
      lock(not_empty);
    lock(mutex);
    Item item = buffer.consume();
    unlock(mutex);
    unlock(not_full);
  }
39. Prod/Cons Revisited
- Producer

  while (true) {
    lock(mutex2);
    if (buffer.size() == SIZE)
      lock(not_full);
    unlock(mutex2);
    lock(mutex);
    buffer.produce(generateItem());
    unlock(mutex);
    unlock(not_empty);
  }

- Consumer

  while (true) {
    lock(mutex2);
    if (buffer.size() == 0)
      lock(not_empty);
    unlock(mutex2);
    lock(mutex);
    Item item = buffer.consume();
    ...

- Great, but now we have a new problem!
- Does anybody see what it is?
40. Prod/Cons Revisited
- Producer

  while (true) {
    lock(mutex2);
    if (buffer.size() == SIZE)
      lock(not_full);
    unlock(mutex2);
    lock(mutex);
    buffer.produce(generateItem());
    unlock(mutex);
    unlock(not_empty);
  }

- Consumer

  while (true) {
    lock(mutex2);
    if (buffer.size() == 0)
      lock(not_empty);
    unlock(mutex2);
    lock(mutex);
    Item item = buffer.consume();
    ...

- The Deadlock
- The Consumer starts before the Producer
- The Consumer acquires the mutex2 lock
- The buffer's empty, so the Consumer blocks while
attempting to acquire the not_empty lock
- which is locked initially
- The Producer starts and attempts to acquire
mutex2
- But mutex2 is locked by the Consumer!
- Result: both the Producer and the Consumer are
blocked and the program simply sits there
- Classic problem
- Thread 1 acquires lock A, and then blocks while
attempting to acquire lock B
- Thread 2 acquires lock B, and then blocks while
attempting to acquire lock A
- We'll talk more about deadlocks
41. We're Stuck
- If we don't protect the buffer.size() test, then
we have a race condition
- If we do protect it, then we have a deadlock
- We can live with neither solution!
- This means that communication with locks is
perhaps not a good idea
- Even if we can do non-spin locks
- Using the same abstraction for critical sections
and for communication may be asking too much?
- How about a separate abstraction for
communication?
- This abstraction is called condition variables
42. The Cond. Variable Abstraction
- What we need is a way for a thread to block
waiting for an event WITHOUT holding a lock
- General rule: don't go to sleep while you're
holding a resource that could let a bunch of
people do useful work
- e.g., don't go to sleep locked up in your dorm
room while holding the only key to the laundry
room
- The solution is to create a new abstraction that
knows how to deal with a thread that holds a
lock
- This abstraction is called a condition variable
- It provides two mechanisms
- wait(): blocks, waiting for an event
- signal(): unblocks a thread waiting for the event
- i.e., tells the OS that that thread is runnable
again
- Does not mean that the thread calling signal()
relinquishes control (at least not right away)
- It is combined with a (mutex) lock
- wait(cond, mutex)
- signal(cond)
43. Cond. Variable and Mutex
- wait() is saying: "Ok, I'll block and release the
lock so that somebody can use it. But as soon as
I wake up I'm re-acquiring the lock to keep doing
the critical section stuff I wanted to do in the
first place"
- Safe because while a thread sleeps, it's not
doing anything at all
- Although before going to sleep, the thread should
make sure that it doesn't leave the program in an
inconsistent state
- e.g., if program state is safely updated only
using a sequence of two calls, calling wait() between
the two calls is not a good idea!
- Pseudo-code

  void wait(cond_t *cond, lock_t *mutex) {
    unlock(mutex);
    <ask the OS to block me and to unblock me when the event cond is signaled>
    lock(mutex);
  }
44. Prod/Cons With Cond. Variable
- Producer

  while (true) {
    lock(mutex);
    if (buffer.size() == SIZE)
      wait(not_full, mutex);
    buffer.produce(generateItem());
    unlock(mutex);
    signal(not_empty);
  }

- Consumer

  while (true) {
    lock(mutex);
    if (buffer.size() == 0)
      wait(not_empty, mutex);
    Item item = buffer.consume();
    unlock(mutex);
    signal(not_full);
  }

- Note that we now use only one mutex
- Basically the whole code is a critical section,
but threads go to sleep while releasing the lock
- We could have used two, but it's equivalent and
much more verbose
- Current view of the world
- (Spin) Locks for mutual exclusion
- Cond. Variables for communication
45. Prod/Cons With Cond. Variable
- Producer

  while (true) {
    lock(mutex);
    if (buffer.size() == SIZE)
      wait(not_full, mutex);
    buffer.produce(generateItem());
    unlock(mutex);
    signal(not_empty);
  }

- Consumer

  while (true) {
    lock(mutex);
    if (buffer.size() == 0)
      wait(not_empty, mutex);
    Item item = buffer.consume();
    unlock(mutex);
    signal(not_full);
  }

- There is STILL a subtle problem with the code
above
- Does anybody see what it is?
- We could still have a read on an empty buffer!
46. Prod/Cons With Cond. Variable
- Producer

  while (true) {
    lock(mutex);
    if (buffer.size() == SIZE)
      wait(not_full, mutex);
    buffer.produce(generateItem());
    unlock(mutex);
    signal(not_empty);
  }

- Bad sequence of operations
- Consumer 1 starts, sees the buffer as empty, and
waits on not_empty
- The Producer puts an item in the buffer (which is
not full), but gets interrupted right before it
calls signal(not_empty)
- Consumer 2 starts, sees that the buffer isn't
empty, and happily consumes the item
- The OS resumes the Producer. The Producer moves
on to execute signal(not_empty)
- Consumer 1 then wakes up and moves on to
consume from an empty buffer!
- This is called a spurious wake-up
- A thread is re-awakened, but the condition
corresponding to the awaited event is no longer
true!
- There is a simple solution...
- Consumer

  while (true) {
    lock(mutex);
    if (buffer.size() == 0)
      wait(not_empty, mutex);
    Item item = buffer.consume();
    unlock(mutex);
    signal(not_full);
  }
47. Prod/Cons With Cond. Variable
- Producer

  while (true) {
    lock(mutex);
    while (buffer.size() == SIZE)
      wait(not_full, mutex);
    buffer.produce(generateItem());
    unlock(mutex);
    signal(not_empty);
  }

- A while instead of an if solves the spurious
wake-up problem
- Basically, a thread waiting on a condition
shouldn't trust that once it's awakened the
condition will be true
- With a while, the thread just keeps rechecking
that the condition is really true after it wakes
up
- Using a while loop around tests for conditions
combined with wait() is something you should
always do
- Unless you're doing something super clever,
perhaps...
- Consumer

  while (true) {
    lock(mutex);
    while (buffer.size() == 0)
      wait(not_empty, mutex);
    Item item = buffer.consume();
    unlock(mutex);
    signal(not_full);
  }
48. The Semaphore Abstraction
- A Semaphore is an abstraction that provides a
unified signaling mechanism for both mutual
exclusion and communication
- It can lead to very clean solutions that remove
the need for counters and other boolean variable
tests throughout the code
- History
- Proposed in 1968 by Dijkstra
- Inspired by railroad semaphores
- Up/Down, Red/Green
49. Semaphores
- A semaphore S is an integer variable that's
- Initialized to some value
- May have a max value specified
- Can be accessed via a wait() operation
- Originally called P(): "Proberen" in Dutch (to
test)
- Can be accessed via a signal() operation
- Originally called V(): "Verhogen" in Dutch (to
increment)
- The book uses wait() and signal(), but I'll
use P() and V()
- wait() and signal() are confusing with condition
variables, although of course they're connected
- Cool systems people I know always use P() and
V() when writing pseudo-code, in speech or writing
- It's shorter to type
- P(): wait (block) until the semaphore is
non-zero, and then decrement it
- V(): increment the semaphore
- Both P() and V() are atomic
- Can be implemented directly with TestAndSet-like
instructions
50. Semaphores, Locks, Cond. Var.
- Semaphores and (Locks + Cond. Vars) are equivalent
in power
- Everything you can do with one, you can do with
the other
- It's a matter of preference
- Some solutions look great with semaphores
- Some solutions look great with locks + cond. vars
- Many developers are biased towards one or the
other, based on what they're more comfortable
with
- Classical problems
- Implement Semaphores with Locks + Cond. Variables
- Implement Locks + Cond. Variables with Semaphores
- Let's do the first one, which is easy
- The second one is actually tricky
51. Semaphores with Locks + Cond.

  typedef struct {
    int value;
    int max;
    lock_t mutex;
    cond_t cond;
  } sem_t;

  void sem_init(sem_t *sem, int value, int max) {
    sem->value = value;
    sem->max = max;
    mutex_init(&sem->mutex);
    cond_init(&sem->cond);
  }

  void P(sem_t *sem) {
    lock(&sem->mutex);
    while (!sem->value)
      wait(&sem->cond, &sem->mutex);
    sem->value--;
    unlock(&sem->mutex);
  }

  void V(sem_t *sem) {
    lock(&sem->mutex);
    if (sem->value < sem->max)
      sem->value++;
    unlock(&sem->mutex);
    signal(&sem->cond);
  }
52. Prod/Cons with Semaphores
- Producer

  while (true) {
    P(not_full);
    P(mutex);
    buffer.produce(generateItem());
    V(mutex);
    V(not_empty);
  }

- Consumer

  while (true) {
    P(not_empty);
    P(mutex);
    Item item = buffer.consume();
    V(mutex);
    V(not_full);
  }

- Initial values
- not_full = SIZE (max value SIZE)
- not_empty = 0 (max value SIZE)
- mutex = 1 (max value 1)
- Note that the semaphores do the counting of the
elements
- No longer any need for boolean tests, etc.
- This is an example that shows the expressive
power of semaphores
- Naming the semaphores appropriately makes the
code readable
53. Semaphores and Spinning
- The textbook starts with semaphores
- I started with locks and condition variables
- In its description of semaphores, the textbook
talks about spinning
- And yes, P() can spin
- Typically though, the common assumption is that
spinning is done only with spin locks and that
semaphores do blocking
- For maximum efficiency
- Spin locks used to protect, e.g., counter++
- Semaphores used to protect, e.g.,
processHTTPRequest()
- In the code presented in these slides, I don't mix
the two, but good code should use the right
abstraction for the right things
- e.g., when I write P(mutex), it would probably be
more efficient as spin_lock(mutex)
54. Simple Deadlocks
- Deadlocks with Locks

  Thread 1:  lock(B); lock(A); . . . unlock(B); unlock(A);
  Thread 2:  lock(A); lock(B); . . . unlock(A); unlock(B);

- Deadlocks with Semaphores

  Thread 1:  P(A); P(B); . . . V(A); V(B);
  Thread 2:  P(B); P(A); . . . V(B); V(A);
55. Classical Synchronization Problems
- Synchronization is such a difficult topic that a
number of standard problems have been defined
- Some are very relevant to real programs
- Producer/Consumer
- Readers/Writers
- . . .
- Some are more "out there" but use ideas relevant to
real programs
- Dining Philosophers
- Barber Shop
- . . .
- I am going to describe Dining Philosophers
without going into details
- It should be part of your culture
- Read the textbook's section
- Take ICS432 if you want to know it inside out
- We'll also describe Readers/Writers
56. Dining Philosophers
- 5 philosophers sit at a table with 5 plates and 5
forks
- Each philosopher does two things: think and eat

  void philosopher() {
    <think>
    pickupForks();
    <eat>
    putdownForks();
  }

- To eat, a philosopher needs two forks
- Questions
- how to implement pickupForks()?
- how to implement putdownForks()?
- Goals
- No race conditions (of course)
- No deadlocks
- No starvation
- Fair eating
- There are many solutions, and good ones are very
complicated
57. Readers/Writers
- We have a database (DB) accessed by two kinds of
threads
- Readers read records from the DB
- Writers write records into the DB
- Either one or more readers access the DB at the
same time, or one single writer does
- Let's look at a few solutions
- Using Semaphores
58. A Naïve Solution

  sem_t rw = 1;

  void writer() {
    while (true) {
      P(rw);
      <write to the DB>
      V(rw);
    }
  }

  void reader() {
    while (true) {
      P(rw);
      <read from the DB>
      V(rw);
    }
  }

- But no concurrent accesses to the DB
- Super safe, complete mutual exclusion
- Would be very inefficient with many readers and
few writers
59. Reader-Preferred Solution

  semaphore_t mutex = 1;
  semaphore_t rw = 1;
  int nr = 0;

  void reader() {
    while (true) {
      P(mutex);
      if (nr == 0) P(rw);  // I am first
      nr++;
      V(mutex);
      <read from the DB>
      P(mutex);
      nr--;
      if (nr == 0) V(rw);  // I am last
      V(mutex);
    }
  }

  void writer() {
    while (true) {
      P(rw);
      <write to the DB>
      V(rw);
    }
  }
60. Reader-Preferred Solution
- The problem with the reader-preferred solution is
that it is too reader-preferred
- An endless stream of readers will preclude all
writes to the DB
- It turns out it's very difficult to modify the code
to make it fair between readers and writers
- There is a classic solution that uses
synchronization and the "passing the baton"
technique
- Based on an invariant condition and subtle
signaling
- Many intricate solutions are presented on-line
- Let's look at a simple but pretty good solution
61Maximum number of readers
- Defining a maximum number of allowed concurrent
readers simplifies the problem! - And most likely makes sense for most applications
- Lets say we allow at most N concurrent active
readers - Then we can create a resource semaphore with
initial value N - Each reader needs to acquire one resource to be
able to read - Therefore, N concurrent readers are allowed
- Each writer needs to acquire N resources to be
able to write - Therefore, only one writer can be executing at a
time and no readers can be executing concurrently - Analogy: N tokens on a table
- A reader needs one
- A writer needs all N
- It keeps accumulating tokens until it has all N
- Threads wait in line to grab tokens, and when
they grab one, they go back to the end of the
line - This line is actually implemented in the OS for
semaphores, cond vars, etc. - Let's look at the code
62Max Readers Solution Attempt
semaphore_t sem = N

void writer() {
  while (true) {
    for (i = 0; i < N; i++) P(sem)
    <write to the DB>
    for (i = 0; i < N; i++) V(sem)
  }
}

void reader() {
  while (true) {
    P(sem)
    <read from the DB>
    V(sem)
  }
}
There is still a problem...
63Reader/Writer
void writer() {
  while (true) {
    for (i = 0; i < N; i++) P(sem)
    <write to the DB>
    for (i = 0; i < N; i++) V(sem)
  }
}
- Deadlock!
- One could have two writers each start acquiring
resources concurrently - For instance
- Writer 1 holds 1 resource
- Writer 2 holds N-1 resources
- They're both blocked forever
- Solution: do not allow two writers to execute the
for loop of P() calls concurrently - This can easily be done with mutual exclusion
- That is, we need another semaphore
64Decent Max Readers Solution
semaphore_t sem = N
semaphore_t wmutex = 1

void writer() {
  while (true) {
    P(wmutex)
    for (i = 0; i < N; i++) P(sem)
    V(wmutex)
    <write to the DB>
    for (i = 0; i < N; i++) V(sem)
  }
}

void reader() {
  while (true) {
    P(sem)
    <read from the DB>
    V(sem)
  }
}
65Reader-Writer Locks
- The reader-writer problem is so popular and common that
some OSes provide reader-writer locks - The lock can be accessed in reader mode by
readers - The lock can be accessed in writer mode by
writers - This basically solves the reader-writer problem
- Multiple readers are in mutual exclusion with
single writers - The programmer just has to make sure that readers
and writers acquire the lock with the correct
mode - The implementation of the reader/writer problem
is only as good as the implementation of the
reader-writer lock - Reader-Writer locks are less efficient than
regular locks, because they do more work for you - Never use them if you can get by with regular
locks - Typically, a lot of effort is spent making sure
that locks are fast
66Priority Inversion
- Going back toward the OS, we have seen that
processes/threads can have different priorities - Well talk about CPU scheduling in detail in a
future lecture, but lets just say for now that a
higher priority process, if ready, always runs
before a lower priority process - Important Processes, even if their code doesnt
lead to synchronization problems, use data
structures in the kernel that are themselves
protected by, e.g., locks - Whether you see it or not, your programs do use
locks, cond vars, semaphores, etc. when they run
in kernel mode - Lets say we have three processes H gt M gt L
- Resource R is currently in use by process L
- Process L holds binary semaphore S
- Process H requires resource R
- Process H is blocked on a P(S)
- But process M is running, preventing process L
from running for a long time - So process L can never get to do a V(S)
- Priority Inversion: process M runs, and runs,
while process H is stuck
67Priority Inversion Solution
- Most OSes implement a priority inheritance
mechanism - A process that accesses a resource needed by a
higher priority process inherits that process
priority temporarily - Complexifies the Kernel code quite a bit
- This solves the example seen in the previous
slides - Read the Priority Inversion and the Mars
Pathfinder blurb on p. 239 of the textbook - Priority inheritance hadn't been enabled
- The program was real-time, so higher-priority
processes had better run when they need to! - If priority inheritance hadn't been implemented
in the kernel of the OS, the Pathfinder would
have failed
68Monitors
- Writing concurrent programs with semaphores,
locks, condition variables is very error prone - Typically, either you're implementing a version
of one of the well-known problems, or you're
introducing concurrency bugs - At least as a beginner concurrent programmer
- In the early 70s, Brinch Hansen proposed the
concept of a Monitor - Popularized by Hoare (1974)
- A monitor is really an abstract data type
representing a shared resource - e.g., a class/object
- It is a construct of a programming language
- Java implements monitors
69Monitors
- There is nothing magical here, we still need the
two basic functionalities of mutual exclusion and
waiting/signaling - Monitors have the same power as other
synchronization abstractions such as locks and
condition variables - But monitors constrain several aspects
- Condition variables are not visible outside the
monitor - They are hidden/encapsulated
- One interacts with them via special monitor
operations - Mutual exclusion is implicit
- Monitor operations execute by definition in
mutual exclusion - These apparently innocuous properties make
writing concurrent code less error-prone - The programmer shouldnt have to deal with where
P() and V() should be placed - Ill let you read the textbooks section for more
information - We wont use monitors and Kernel code doesnt use
monitors
70Synchronization in Solaris
- Solaris provides
- adaptive mutexes
- condition variables
- semaphores
- reader-writer locks
- turnstiles
- Adaptive mutexes
- looks at the state of the system and decides
whether to spin or to block - e.g., if the lock is currently being held by a
thread that's blocked, forget spinning - No matter what, long critical sections should be
protected by semaphores or cond. variables so
that one is certain that there will be no spinning
71Synchronization in Solaris
- Turnstiles
- Queues containing threads waiting for locks
- One turnstile per synchronization object
- Turnstiles provide the abstraction through which
priority inheritance is implemented - Almost all these mechanisms are available inside
and outside the Kernel - The exception
- Priority-inheritance happens only in the Kernel
- User-level programs, if dealing with priorities,
have to deal with them creatively - e.g., implementing their own turnstiles
72Synchronization in Win XP
- The Kernel uses spin locks for protection within
the Kernel - Or interrupt-disabling on single-processor
systems - It ensures that a (kernel) thread holding a spin
lock is never preempted - For user-programs, XP provides dispatcher objects
- mutex locks
- semaphores
- events (a.k.a. condition variables)
- timers (send a signal after a lapse of time)
73Synchronization in Linux
- Locking in the Kernel spin locks and semaphores
- Spin locks protect only short code sections
- On single-proc machines, disables kernel
preemption - Which is allowed only if the current thread does
not hold any locks (the kernel counts locks held
per thread) - (Non-spin) Semaphores used for longer sections of
code - Pthreads
- (non spin) mutex locks
- spin locks
- condition variables
- read-write locks
- semaphores
74Mutual Exclusion and Pthreads
- Pthreads provide a simple mutual exclusion lock
- Lock creation
- int pthread_mutex_init(
-          pthread_mutex_t *mutex,
-          const pthread_mutexattr_t *attr)
- returns 0 on success, an error code otherwise
- mutex: output parameter, the lock
- attr: input parameter, lock attributes
- NULL: default attributes
- There are functions to set the attribute (look at
the man pages if you're interested)
75Pthread Locking
- Locking a lock
- If the lock is already locked, then the calling
thread is blocked - If the lock is not locked, then the calling
thread acquires it - int pthread_mutex_lock(
- pthread_mutex_t mutex)
- returns 0 on success, an error code otherwise
- mutex input parameter, lock
76Pthread Locking
- Just checking
- Returns immediately instead of blocking
- int pthread_mutex_trylock(
-          pthread_mutex_t *mutex)
- returns 0 on success, EBUSY if the lock is
locked, an error code otherwise - mutex: input parameter, the lock
77Synchronizing pthreads
- Releasing a lock
- int pthread_mutex_unlock(
-          pthread_mutex_t *mutex)
- returns 0 on success, an error code otherwise
- mutex: input parameter, the lock
- Pthreads implement exactly the concept of locks
as it was described in the previous lecture notes
78Cleaning up memory
- Releasing memory for a mutex
- int pthread_mutex_destroy(
-          pthread_mutex_t *mutex)
- Releasing memory for a mutex attribute
- int pthread_mutexattr_destroy(
-          pthread_mutexattr_t *attr)
79Pthread Spin Locks
- There is a pthread_spinlock_t type, which implements
spin locks - Used just like pthread_mutex_t
80Cond. Variables and Semaphores
- Condition variables are of the type
pthread_cond_t - They are used in conjunction with mutex locks
- Semaphores are provided as a separate POSIX
standard - sem_init
- sem_wait
- sem_post
- sem_getvalue
- ...
81pthread_cond_init()
- Creating a condition variable
- int pthread_cond_init(
-          pthread_cond_t *cond,
-          const pthread_condattr_t *attr)
- returns 0 on success, an error code otherwise
- cond: output parameter, the condition
- attr: input parameter, attributes (default
NULL)
82pthread_cond_wait()
- Waiting on a condition variable
- int pthread_cond_wait(
-          pthread_cond_t *cond,
-          pthread_mutex_t *mutex)
- returns 0 on success, an error code otherwise
- cond: input parameter, the condition
- mutex: input parameter, the associated mutex
83pthread_cond_signal()
- Signaling a condition variable
- int pthread_cond_signal(
-          pthread_cond_t *cond)
- returns 0 on success, an error code otherwise
- cond: input parameter, the condition
- Wakes up one thread out of the possibly many
threads waiting for the condition - The thread is chosen non-deterministically
84pthread_cond_broadcast()
- Broadcasting on a condition variable
- int pthread_cond_broadcast(
-          pthread_cond_t *cond)
- returns 0 on success, an error code otherwise
- cond: input parameter, the condition
- Wakes up ALL threads waiting for the condition
- May be useful in some applications
85Condition Variable example
- Say I want to have multiple threads wait until a
counter reaches a maximum value and be awakened
when it happens - pthread_mutex_lock(lock)
- while (count lt MAX_COUNT)
- pthread_cond_wait(cond,lock)
-
- pthread_mutex_unlock(lock)
- Locking the lock so that we can read the value of
count without the possibility of a race condition - Calling pthread_cond_wait() in a while loop to
avoid spurious wake-ups - When going to sleep, the pthread_cond_wait()
function implicitly releases the lock - When waking up, the pthread_cond_wait() function
implicitly re-acquires the lock - The lock is unlocked after exiting from the loop
86pthread_cond_timedwait()
- Waiting on a condition variable with a timeout
- int pthread_cond_timedwait(
-          pthread_cond_t *cond,
-          pthread_mutex_t *mutex,
-          const struct timespec *abstime)
- returns 0 on success, ETIMEDOUT on timeout,
an error code otherwise
- cond: input parameter, the condition
- mutex: input parameter, the associated mutex
- abstime: input parameter, the absolute time at
which to give up waiting (a struct timespec, with
seconds and nanoseconds fields)
87Putting a Pthread to Sleep
- To make a Pthread thread sleep you should use the
usleep() function - #include <unistd.h>
- int usleep(useconds_t microseconds)
- Do not use the sleep() function as it may not be
safe to use it in a multi-threaded program
88Conclusion
- Thread Synchronization is an important topic
- Theory is difficult
- Practice is difficult
- What we've presented here is the low-level view
of synchronization, the do-it-yourself version - But this is often what's used in practice
- There are higher-level abstractions
- e.g., Java monitors
- Provides some relief, but still fraught with
peril - e.g., Java ThreadPool abstractions
- Provides convenience
- As of today, to be a good concurrent programmer,
one needs to understand low-level concurrency
details - The future may change this unfortunate situation
- New concurrent languages
- New ways to think about concurrent programming
- Help from the hardware: transactional memory