Synchronization - PowerPoint PPT Presentation

1
Synchronization
2
Disclaimer
  • If you've taken ICS432 in Fall 2008, a large
    subset of this set of lecture notes should be
    very familiar
  • But there are a few new things
  • Priority Inversion
  • Reader-Writer Locks
  • What OSes provide
  • Given how difficult and important synchronization
    is, I can't imagine that going through it again
    is a bad idea

3
Cooperating Processes/Threads
  • Cooperation is great and useful
  • processes and message-passing
  • processes and shared memory segments
  • threads in a single address space
  • all of the above
  • However, it must be done very carefully
  • all of the above models have their dangers
  • There are two main problems
  • Race Conditions: a bug that causes the program to
    give unpredictably incorrect results
  • Typical with processes/threads sharing memory
  • Deadlocks: the program blocks forever
  • Typical in all of the above
  • We talk about Race Conditions in these lecture
    notes
  • Deadlocks: next set of notes

4
Race Condition
  • Let's look at the producer-consumer example, using
    a circular buffer
  • Which you should have read in Section 3.4.1
  • Producer
  •   while (true) {
  •     while (counter == BUFFER_SIZE) ; // wait, buffer's full
  •     buffer[in] = produceNewItem();
  •     in = (in + 1) % BUFFER_SIZE;
  •     counter++;
  •   }
  • Consumer
  •   while (true) {
  •     while (counter == 0) ; // wait, buffer's empty
  •     nextConsumed = buffer[out];
  •     out = (out + 1) % BUFFER_SIZE;
  •     counter--;
  •   }

Initially: counter = 0; in = 0; out = 0
5
Race Condition Example
  • Let's look at the code in race_condition_example.c
  • Let's run it and see if we can observe the race
    condition
  • Terminology: there is a race condition
  • The program is buggy
  • The question is whether the bug will manifest
    itself
  • The bug we see is called a lost update

6
Why Race Condition?
  • Race conditions occur because of concurrency of
    threads/processes
  • Two kinds of concurrency

(Figure: two timelines, core 1 and core 2 —
false concurrency: within a core, an illusion of
concurrency provided by the OS (e.g., the green and
blue tasks); true concurrency: across cores
(e.g., the green and yellow tasks))
7
True/False Concurrency
  • The programmer shouldn't have to care/know
    whether concurrency will be true or false
  • Typically, the programmer doesn't know on which
    computer the program will run!
  • A concurrent program with 10 tasks should work on
    a single-core processor, a quad-core processor, a
    32-core processor, etc.
  • However, better performance with true concurrency
  • false concurrency is still very useful, e.g., for
    interactivity
  • We've talked about true concurrency across cores,
    but there could be true concurrency between any
    two hardware resources
  • e.g., between the network card and a core
  • e.g., between the disk and the network card

8
Why Race Conditions?
  • Race conditions can happen with false or true
    concurrency
  • Everything else being equal, one could argue that
    they're statistically most likely to manifest
    themselves with true concurrency
  • Let's explain how they can occur with false
    concurrency first
  • Consider a single core running a 2-threaded
    process, with one thread doing count++ and the
    other count--
  • These statements are in a high-level language
  • But we know that the compiler translates them
    into machine code, which we like to look at
    written as assembly code
  • On a Load/Store architecture (RISC), the code
    would then be

Thread 1:  load R1, @ ; inc R1 ; store @, R1
Thread 2:  load R1, @ ; dec R1 ; store @, R1
9
Why Race Conditions?
  • Illusion of concurrency: the OS context-switches
    threads rapidly
  • Interrupt, save state (stack, register values,
    ...), restart
  • Three possible execution paths (Thread 1 at the
    left margin, Thread 2 indented)

load R1, @
            load R1, @
            dec R1
inc R1
            store @, R1
store @, R1

load R1, @
inc R1
            load R1, @
            dec R1
            store @, R1
store @, R1

load R1, @
inc R1
            load R1, @
            dec R1
store @, R1
            store @, R1

Important: Thread 1's R1 is not the same as Thread 2's
R1. They are both register values in logical register
sets (i.e., inside a data structure in the OS)
10
Why Race Conditions?
Let's assume that initially @ = 5

load R1, @    // R1 = 5
            load R1, @   // R1 = 5
            dec R1       // R1 = 4
inc R1        // R1 = 6
            store @, R1  // @ = 4
store @, R1   // @ = 6

load R1, @    // R1 = 5
inc R1        // R1 = 6
            load R1, @   // R1 = 5
            dec R1       // R1 = 4
            store @, R1  // @ = 4
store @, R1   // @ = 6

load R1, @    // R1 = 5
inc R1        // R1 = 6
            load R1, @   // R1 = 5
            dec R1       // R1 = 4
store @, R1   // @ = 6
            store @, R1  // @ = 4

We would expect @ to be 5 at the end, but we can
get 4, or 6
11
Why Race Conditions?
  • What happens in the case of true concurrency?
  • Basically the same thing
  • Each thread could be running on its own core
  • But it still has its own register set
  • In this case a different physical register set, as
    opposed to a different logical register set
  • It's just that some instructions can actually be
    executed truly concurrently
  • Note that each core may have its own cache
  • But in this case there is cache-coherency
    hardware
  • No register-coherency hardware is possible, because
    registers are used in completely arbitrary ways
    and there is no notion of R1 having the same
    value across cores
  • Let's see an example

12
Why Race Conditions?
  • Perfectly synchronized?
  • When two processors issue a memory request
    (load/store) at the same time, one of them gets to
    memory first (could be random or deterministic)

Thread 1 (core 1):  load R1, @ ; inc R1 ; store @, R1
Thread 2 (core 2):  load R1, @ ; dec R1 ; store @, R1

(@ = 4 if we started with @ = 5)
13
Why do we Hate Race Conditions?
  • Code may work fine a million times, and
    then fail one day
  • And then it may take you another million runs to
    reproduce the bug
  • If you modify the code (e.g., adding a few printf
    statements), or if you run in debugging mode,
    this could completely change the race condition's
    behavior
  • hide it, or highlight it
  • If you write code, run it, and it works, you
    don't really know whether you've written a
    bug-free program
  • Typically true, but exacerbated with race
    conditions
  • We hate them because we hate nondeterministic
    bugs
  • and we hate bugs to begin with
  • So what do we do?

14
Critical Section
  • We want a critical section: a section of the code
    in which only one thread can be at a time
  • It doesn't have to be a contiguous section of
    code
  • In the example here, we have a 3-zone critical
    section
  • If thread A is already in one of the red zones,
    then all other threads are blocked before being
    allowed to enter any red zone
  • And only one will be allowed to enter once thread
    A leaves the red zone it was in

15
Critical Section
  • We can have multiple critical sections
  • One 3-zone red critical section
  • One 2-zone green critical section

16
Critical Section
  • More formally, we want three properties of
    critical sections
  • Mutual exclusion: if thread P is in the critical
    section, then no other thread can be in it
  • Progress: if thread P wants to enter the
    critical section, it will enter it eventually
  • Bounded waiting: once thread P has declared
    intent to enter the critical section, there is a
    bound on the number of threads that can enter the
    critical section before P
  • Note that there is no assumption regarding the
    relative speeds of the involved threads
  • But no thread has speed zero

17
Critical Section Code
  • Producer
  •   while (true) {
  •     while (counter == BUFFER_SIZE) ; // wait, buffer's full
  •     buffer[in] = produceNewItem();
  •     in = (in + 1) % BUFFER_SIZE;
  •     enter_critical_section();
  •     counter++;
  •     leave_critical_section();
  •   }
  • Consumer
  •   while (true) {
  •     while (counter == 0) ; // wait, buffer's empty
  •     nextConsumed = buffer[out];
  •     out = (out + 1) % BUFFER_SIZE;
  •     enter_critical_section();
  •     counter--;
  •     leave_critical_section();
  •   }

18
Critical Section
  • A Critical Section corresponds to sections of
    code (i.e., the text segment)
  • It doesn't correspond to data (i.e., variables)
  • Even though the section of code is typically one
    that modifies a particular variable
  • When we say "we need to protect variable x
    against race conditions", it means we need to
    look at the entire code, see where x is modified,
    and put all those places in the SAME critical
    section
  • If software engineering is well done,
    modification of a single variable doesn't happen
    all over the code
  • It's a misconception that critical sections are
    attached to variables. They are attached to code.

19
Critical Sections and the Kernel
  • On modern OSes, multiple threads can be in the
    kernel
  • User threads that are doing a system call and are
    in kernel mode
  • Threads started by the kernel itself to do useful
    kernel things
  • Therefore, the kernel is subject to race
    conditions
  • We've seen that kernel debugging is hard, and that
    race condition debugging is hard, so we don't
    want race conditions in the kernel
  • Example: the kernel maintains many data
    structures
  • e.g., the list of open files
  • The list must be updated each time a file's
    opened or closed
  • This is very much like the counter++ / counter--
    example
  • e.g., the list of memory allocations
  • e.g., the list of processes
  • e.g., the list of interrupt handlers
  • The kernel developer must ensure that no race
    conditions exist

20
Preemptive vs. Non-Preemptive
  • A preemptive kernel allows a thread executing
    kernel code (in kernel mode) to be preempted
  • A non-preemptive kernel doesn't
  • The thread runs until it willingly exits kernel
    mode (or yields control of the CPU)
  • Non-preemptive kernels are simple
  • There are no race conditions
  • Preemptive kernels are more complex
  • There are race conditions
  • Preemptive kernels are more powerful
  • better for real-time programming, as a real-time
    thread can preempt a thread running in kernel
    mode
  • should be more responsive for the same reason
  • Most modern kernels are preemptive

21
Synchronization Implementation
  • What we need is a way to implement
    enter_critical_section() and
    leave_critical_section()
  • There are some software solutions
  • They can be very complicated
  • They're not guaranteed to work on modern
    architectures
  • See Section 6.3 in the book if interested
  • What we need is help from the hardware
  • One option: disabling interrupts?
  • Problems
  • If you allow any user process to disable
    interrupts, what tells you it will enable them
    afterwards?
  • What if interrupts are needed for other purposes,
    such as a bunch of timers?
  • Disabling interrupts across multiple processor
    cores takes time, and entering a critical section
    would be very costly
  • Conclusion: although inside the kernel one can
    disable interrupts for specific purposes, one
    cannot use this mechanism in general

22
Atomic Instructions
  • Modern processors offer atomic instructions
  • the instruction is uninterruptible from the
    moment it is issued to the moment it completes
  • Test-and-Set instruction, which corresponds to
    the following code
  •   boolean TestAndSet(boolean *target) {
  •     boolean rv = *target;
  •     *target = TRUE;
  •     return rv;
  •   }
  • Let's see how we can implement critical sections
    with TestAndSet
  • Using it in our pseudo-code as if it were an
    uninterruptible function, when it's really an
    uninterruptible instruction
  • Note that the book also talks about a Swap
    instruction, which is equivalent

23
Locks with TestAndSets
  • We declare a boolean variable lock
  • Shared by all threads, initialized to FALSE
  • Pseudo-code
  •   while (TestAndSet(&lock)) ;
  •     // Do nothing
  •   // Critical Section here
  •   lock = FALSE;

boolean TestAndSet(boolean *target) {
  boolean rv = *target;
  *target = TRUE;
  return rv;
}
24
Synchronization Abstractions
  • A number of abstractions have been defined to
    allow programs to use synchronization without
    using things like TestAndSet directly
  • we'll see that these abstractions must be
    provided by the kernel because they require some
    kernel things to happen
  • I'll use a slightly different order to present
    things here when compared to the book, but the
    material is the same

25
The Lock Abstraction
  • Based on TestAndSet, it's easy to implement a
    lock abstraction
  •   typedef char lock_t;
  •   void lock(lock_t *lock) {
  •     while (TestAndSet(lock)) ;
  •     return;
  •   }
  •   void unlock(lock_t *lock) {
  •     *lock = FALSE;
  •   }

26
The Lock Abstraction
  • The abstraction is easily used
  •   lock_t mutex; // A lock used for mutual
  •                 // exclusion is often
  •                 // called a mutex
  •   . . .
  •   lock(&mutex);
  •   insert(linked_list, element); // CS
  •   unlock(&mutex);

27
Locks for Communication?
  • So far we've seen the use of locks for mutual
    exclusion
  • But it's tempting to do more advanced
    synchronization
  • Thread A waits for an "event" by doing lock(x)
  • Thread B signals the event by doing unlock(x)
  • Example
  • We need to display a short movie while loading a
    file
  • Thread A displays the movie
  • Thread B loads the file
  • They both start at the same time
  • When the movie ends, if the file isn't already
    there, thread A needs to wait for the file to be
    loaded: lock(x)
  • When thread B's done loading the file, it tells
    thread A the file is loaded now: unlock(x)
  • Any problem with this???

28
Spin Locks
  • The lock abstraction we've developed has the
    thread do busy waiting
  •   void lock(lock_t *lock) {
  •     while (TestAndSet(lock)) ; // busy
  •     return;
  •   }
  • lock() burns CPU cycles
  • slows down other threads/processes
  • generates heat
  • In general, busy waiting is frowned upon
  • It is very tempting, unfortunately
  • Exponential back-off solutions are not clean,
    and striking a good compromise isn't easy at
    all
  • Our lock abstraction is called a spin lock,
    because it has the thread spin until it can get
    through

29
Are Spin Locks Evil?
  • Spin locks can be very useful for (short)
    critical sections
  • Burn only a few cycles, but provide super fast
    response time
  • They do not involve the kernel
  • In fact, spin locks are used inside the kernel,
    and several OSes provide the abstraction to users
  • Typically another type of lock is also provided,
    and users can choose
  • How can we have a lock that doesn't spin?

30
How Not To Spin?
  • The alternative to spinning is to have the OS
    block the thread
  • The OS can always enforce that any thread be in
    any state, in this case BLOCKED
  • The OS simply needs to keep track of threads
    blocked due to some synchronization operation
  • When necessary, the OS can simply remove a
    blocked thread from that list and put it in the
    READY state
  • This is much heavier than spin locks in
    terms of OS involvement, but much lighter in
    terms of CPU cycle consumption
  • e.g., spinning may still be a good idea if one
    knows that the spinning time will be very short
  • Not spinning for a x critical section is
    probably a bad idea
  • Nowadays, due to the heavy multi-threading brought
    by many-core architectures, the overhead of lock()
    and unlock() is a big concern

31
No-Spin Locks for Communication?
  • One can provide the lock abstraction with locks
    that do not spin, where lock() and unlock() are
    system calls that do kernel things
  •   typedef char lock_t;
  •   void lock(lock_t *lock) {
  •     <get blocked and get put in the waiting
  •      list for lock>
  •     return;
  •   }
  •   void unlock(lock_t *lock) {
  •     <unblock the first thread in the
  •      waiting list for lock>
  •   }

32
Problem w/ Locks and Communication
  • Subtle issues arise when using (non-spin) locks for
    communication
  • Let's look at the producer/consumer problem
  • The code we showed on Slide 3 had the threads do
    busy waiting
  • The producer waits for the buffer to be non-full
  • The consumer waits for the buffer to be non-empty
  • We want to avoid busy waiting, and get notified
    by a lock
  • Since we now have non-spin locks at our disposal
  • We want to implement strict producer/consumer
  • A consumer never attempts to consume from an
    empty buffer
  • A producer never attempts to produce into a full
    buffer
  • This strictness is desirable in many contexts
  • Let's look at that code again, but using a
    "buffer" abstract data type
  • buffer.consume(): consumes an item
  • buffer.produce(): produces an item
  • buffer.size(): returns the number of items
  • The buffer size is bounded above by SIZE

33
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     while (buffer.size() == SIZE) ; // busy wait
  •     buffer.produce(generateItem());
  •   }
  • Consumer
  •   while (true) {
  •     while (buffer.size() == 0) ; // busy wait
  •     Item item = buffer.consume();
  •   }
  • First, we need to add a "mutex" lock to protect
    the buffer
  • Internally, buffer.produce() and buffer.consume()
    have race conditions, e.g., on the pointer to the
    last element in the buffer
  • Equivalent to some counter++/counter-- race
    condition

34
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     while (buffer.size() == SIZE) ; // busy wait
  •     lock(mutex); // could be a spin-lock
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •   }
  • Consumer
  •   while (true) {
  •     while (buffer.size() == 0) ; // busy wait
  •     lock(mutex); // could be a spin-lock
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •   }

Now let's add locks for communication
instead of the busy wait
35
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }
  • This assumes we can do an unlock() on a lock
    that's not locked
  • e.g., the producer starts way before the consumer
    and puts three items in
  • It calls unlock(not_empty) three times, but the
    consumer has yet to acquire the not_empty lock
    once
  • This is really easy to implement in practice
  • This assumes that initially
  • not_full is not locked
  • not_empty is locked
  • Note that we must do the lock(not_full) or
    lock(not_empty) only if necessary
  • e.g., if there is already an item in the buffer,
    then the consumer should not attempt
    lock(not_empty), but instead directly go take the
    item

36
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }
  • This code works fine for one producer and one
    consumer
  • Mutual exclusion via the mutex (spin) lock
  • Communication via the not_full and not_empty
    locks
  • However, there is a problem if we have more than
    two threads
  • Let's assume we have two consumers
  • What bad behavior could happen here?

37
Prod/Cons Revisited
  • One bad sequence of events
  • Producer produces an element in the buffer and
    then goes to sleep for 10 hours (because of some
    action not shown in the pseudo-code)
  • Consumer 1 tests if buffer.size() == 0, and sees
    that it isn't
  • Consumer 1 calls lock(mutex) and enters the
    Critical Section
  • Consumer 1 is interrupted by the OS and Consumer
    2 starts
  • Consumer 2 tests if buffer.size() == 0, and sees
    that it isn't!
  • Consumer 2 calls lock(mutex), but Consumer 1
    has the lock, so Consumer 2 is blocked
  • Consumer 1 proceeds to consume the item,
    releases mutex, releases the not_full lock, and is
    interrupted by the OS
  • Consumer 2 acquires the mutex lock, and consumes
    an item from an empty buffer!
  • Producer
  •   while (true) {
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }

38
Prod/Cons Revisited
  • The problem: a race condition on the buffer.size()
    test
  • What do we do against race conditions?
  • We use mutex locks to create critical sections!
  • So let's do that
  • Producer
  •   while (true) {
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }

39
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     lock(mutex2);
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     unlock(mutex2);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex2);
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     unlock(mutex2);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }
  • Great, but now we have a new problem!
  • Anybody sees what it is?

40
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     lock(mutex2);
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     unlock(mutex2);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex2);
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     unlock(mutex2);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }
  • The Deadlock
  • The Consumer starts before the Producer
  • The Consumer acquires the mutex2 lock
  • The buffer's empty, so the Consumer blocks while
    attempting to acquire the not_empty lock
  • which is locked initially
  • The Producer starts and attempts to acquire
    mutex2
  • But mutex2 is locked by the Consumer!
  • Result: both Producer and Consumer are blocked,
    and the program simply sits there
  • Classic problem
  • Thread 1 acquires lock A, and then blocks while
    attempting to acquire lock B
  • Thread 2 acquires lock B, and then blocks while
    attempting to acquire lock A
  • We'll talk more about deadlocks

41
Were Stuck
  • If we don't protect the buffer.size() test, then
    we have a race condition
  • If we do protect it, then we have a deadlock
  • We can live with neither solution!
  • This means that communication with locks is
    perhaps not a good idea
  • Even if we can do non-spin locks
  • Using the same abstraction for critical sections
    and for communication may be asking too much
  • How about a separate abstraction for
    communication?
  • This abstraction is called condition variables

42
The Cond. Variable Abstraction
  • What we need is a way for a thread to block
    waiting for an event WITHOUT holding a lock
  • General rule: don't go to sleep while you're
    holding a resource that could let a bunch of
    people do useful work
  • e.g., don't go to sleep locked up in your dorm
    room while holding the only key to the laundry
    room
  • The solution is to create a new abstraction that
    knows how to deal with a thread that holds a
    lock
  • This abstraction is called a condition variable
  • It provides two mechanisms
  • wait(): blocks, waiting for an event
  • signal(): unblocks a thread waiting for the event
  • i.e., tells the OS that that thread is runnable
    again
  • Does not mean that the thread calling signal()
    relinquishes control (at least not right away)
  • It is combined with a (mutex) lock
  • wait(cond, mutex)
  • signal(cond)

43
Cond. Variable and Mutex
  • Wait is saying "Ok, I'll block and release the
    lock so that somebody can use it. But as soon as
    I wake up I'm re-acquiring the lock to keep doing
    the critical section stuff I wanted to do in the
    first place"
  • Safe because while a thread sleeps, it's not
    doing anything at all
  • Although before going to sleep, the thread should
    make sure that it doesn't leave the program in an
    inconsistent state
  • e.g., if program state is safely updated only
    using a sequence of two calls, a wait() between the
    two calls is not a good idea!
  • Pseudo-code
  •   void wait(cond_t cond, lock_t mutex) {
  •     unlock(mutex);
  •     <ask the OS to block me and to unblock me
  •      when the event cond is signaled>
  •     lock(mutex);
  •   }

44
Prod/Cons With Cond. Variable
  • Producer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == SIZE)
  •       wait(not_full, mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     signal(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == 0)
  •       wait(not_empty, mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     signal(not_full);
  •   }
  • Note that we now use only one mutex
  • Basically the whole code is a critical section,
    but threads go to sleep while releasing the lock
  • We could have used two, but it's equivalent and
    much more verbose
  • Current view of the world
  • (Spin) Locks for mutual exclusion
  • Cond. Variables for communication

45
Prod/Cons With Cond. Variable
  • Producer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == SIZE)
  •       wait(not_full, mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     signal(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == 0)
  •       wait(not_empty, mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     signal(not_full);
  •   }
  • There is STILL a subtle problem with the code
    above
  • Anybody sees what it is?
  • We could still have a read on an empty buffer!

46
Prod/Cons With Cond. Variable
  • Producer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == SIZE)
  •       wait(not_full, mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     signal(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == 0)
  •       wait(not_empty, mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     signal(not_full);
  •   }
  • Bad sequence of operations
  • Consumer 1 starts, sees the buffer as empty, and
    waits on not_empty
  • The Producer puts an item in the buffer (which is
    not full), but gets interrupted right before it
    calls signal(not_empty)
  • Consumer 2 starts, sees that the buffer isn't
    empty, happily consumes the item
  • The OS resumes the Producer. The Producer moves
    on to execute signal(not_empty)
  • Consumer 1 then wakes up and moves on to
    consume from an empty buffer!
  • This is called a spurious wake-up
  • A thread is re-awakened but the condition
    corresponding to the awaited event is no longer
    true!
  • There is a simple solution...

47
Prod/Cons With Cond. Variable
  • Producer
  •   while (true) {
  •     lock(mutex);
  •     while (buffer.size() == SIZE)
  •       wait(not_full, mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     signal(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex);
  •     while (buffer.size() == 0)
  •       wait(not_empty, mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     signal(not_full);
  •   }
  • A while instead of an if solves the spurious
    wake-up problem
  • Basically, a thread waiting on a condition
    shouldn't trust that once it's awakened the
    condition will be true
  • With a while, the thread just keeps rechecking
    that the condition is really true after it wakes
    up
  • Using a while loop around tests for conditions
    combined with wait() is something you should
    always do
  • Unless you're doing something super clever
    perhaps...

48
The Semaphore Abstraction
  • A Semaphore is an abstraction that provides a
    unified signaling mechanism for both mutual
    exclusion and communication
  • It can lead to very clean solutions that remove
    the need for counters and other boolean variable
    tests throughout the code
  • History
  • Proposed in 1968 by Dijkstra
  • Inspired by railroad semaphores
  • Up/Down, Red/Green

49
Semaphores
  • A semaphore S is an integer variable that's
  • Initialized to some value
  • May have a max value specified
  • Can be accessed by a wait() operation
  • Originally called P(): "Proberen" in Dutch (to
    test)
  • Can be accessed by a signal() operation
  • Originally called V(): "Verhogen" in Dutch (to
    increment)
  • The book uses wait() and signal(), but I'll
    use P() and V()
  • wait() and signal() are confusing with condition
    variables, although of course they're connected
  • Cool systems people I know always use P() and
    V() when writing pseudo-code in speech or writing
  • It's shorter to type
  • P(): wait (block) until the semaphore is
    non-zero, and then decrement it
  • V(): increment the semaphore
  • Both P() and V() are atomic
  • Can be implemented directly with TestAndSet-like
    instructions

50
Semaphores, Locks, Cond. Var
  • Semaphores and (Locks + Cond. Variables) are
    equivalent in power
  • Everything you can do with one, you can do with
    the other
  • It's a matter of preference
  • Some solutions look great with semaphores
  • Some solutions look great with locks + cond.
    variables
  • Many developers are biased towards one or the
    other, based on what they're more comfortable
    with
  • Classical problems
  • Implement Semaphores with Locks + Cond. Variables
  • Implement Locks + Cond. Variables with Semaphores
  • Let's do the first one, which is easy
  • The second one is actually tricky

51
Semaphores with Locks + Cond. Variables

typedef struct {
  int value;
  int max;
  lock_t mutex;
  cond_t cond;
} sem_t;

void sem_init(sem_t *sem, int value, int max) {
  sem->value = value;
  sem->max = max;
  mutex_init(&sem->mutex);
  cond_init(&sem->cond);
}

void P(sem_t *sem) {
  lock(&sem->mutex);
  while (!sem->value)
    wait(&sem->cond, &sem->mutex);
  sem->value--;
  unlock(&sem->mutex);
}

void V(sem_t *sem) {
  lock(&sem->mutex);
  if (sem->value < sem->max)
    sem->value++;
  unlock(&sem->mutex);
  signal(&sem->cond);
}
52
Prod/Cons with Semaphores
  • Producer
  •   while (true) {
  •     P(not_full);
  •     P(mutex);
  •     buffer.produce(generateItem());
  •     V(mutex);
  •     V(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     P(not_empty);
  •     P(mutex);
  •     Item item = buffer.consume();
  •     V(mutex);
  •     V(not_full);
  •   }
  • Initial values
  • not_full = SIZE (max value SIZE)
  • not_empty = 0 (max value SIZE)
  • mutex = 1 (max value 1)
  • Note that the semaphores do the counting of the
    elements
  • No longer any need for boolean tests, etc.
  • This is an example that shows the expressive
    power of semaphores
  • Naming the semaphores appropriately makes the
    code readable

53
Semaphores and Spinning
  • The textbook starts with semaphores
  • I started with locks and condition variables
  • In the description of semaphores, the textbook
    talks about spinning
  • And yes, P() can spin
  • Typically though, the common assumption is that
    spinning is done only with spin locks and that
    semaphores do blocking
  • For maximum efficiency
  • Spin locks used to protect, e.g., counter++
  • Semaphores used to protect, e.g.,
    processHTTPRequest()
  • In the code presented in these slides, I don't mix
    the two, but good code should use the right
    abstraction for the right things
  • e.g., when I write P(mutex), it would probably be
    more efficient as spin_lock(mutex)

54
Simple Deadlocks
  • Deadlocks with Locks
  • Deadlocks with Semaphores

Thread 1:  lock(B); lock(A); . . . unlock(B); unlock(A);
Thread 2:  lock(A); lock(B); . . . unlock(A); unlock(B);

Thread 1:  P(A); P(B); . . . V(A); V(B);
Thread 2:  P(B); P(A); . . . V(B); V(A);
55
Classical Synchronization Pbs
  • Synchronization is such a difficult topic that a
    number of standard problems have been defined
  • Some are very relevant to real programs
  • Producer/Consumer
  • Reader/Writer
  • . . .
  • Some are more out there but use ideas relevant in
    real programs
  • Dining Philosophers
  • Barber shop
  • . . .
  • I am going to describe Dining Philosophers
    without going into details
  • It should be part of your culture
  • Read the textbook's section
  • Take ICS432 if you want to know it inside out
  • Well also describe the Reader/Writer

56
Dining Philosophers
  • 5 philosophers sit at a table with 5 plates and 5
    forks
  • Each philosopher does two things
    void philosopher() {
      <think>
      pickupForks();
      <eat>
      putdownForks();
    }
  • To eat, a philosopher needs two forks
  • Questions
  • how to implement pickupForks()
  • how to implement putdownForks()
  • Goals
  • No race conditions (of course)
  • No deadlocks
  • No starvation
  • Fair eating
  • There are many solutions, and good ones are very
    complicated

57
Readers/Writers
  • We have a database (DB) accessed by two kinds of
    threads
  • Readers read records from the DB
  • Writers write records into the DB
  • Either one or more readers access the DB at the
    same time, or one single writer does
  • Let's look at a few solutions
  • Using Semaphores

58
A Naïve Solution
  sem_t rw = 1;   // a binary semaphore

  void writer() {
    while (true) {
      P(rw);
      <write to the DB>
      V(rw);
    }
  }

  void reader() {
    while (true) {
      P(rw);
      <read from the DB>
      V(rw);
    }
  }
  • But no concurrent accesses to the DB
  • Super safe, complete mutual exclusion
  • Would be very inefficient with many readers and
    few writers

59
Reader-Preferred Solution
  semaphore_t mutex = 1;
  semaphore_t rw = 1;
  int nr = 0;   // number of active readers

  void reader() {
    while (true) {
      P(mutex);
      if (nr == 0) P(rw);   // I am first
      nr++;
      V(mutex);
      <read from the DB>
      P(mutex);
      nr--;
      if (nr == 0) V(rw);   // I am last
      V(mutex);
    }
  }

  void writer() {
    while (true) {
      P(rw);
      <write to the DB>
      V(rw);
    }
  }
60
Reader-Preferred Solution
  • The problem with the reader-preferred solution is that it is too reader-preferred
  • An endless stream of readers will preclude all writes to the DB
  • It turns out it's very difficult to modify the code to make it fair between readers and writers
  • There is a classic solution that uses synchronization and the "passing the baton" technique
  • Based on an invariant condition and subtle signaling
  • Many intricate solutions presented on-line
  • Let's look at a simple but pretty good solution

61
Maximum number of readers
  • Defining a maximum number of allowed concurrent
    readers simplifies the problem!
  • And most likely makes sense for most applications
  • Let's say we allow at most N concurrent active readers
  • Then we can create a resource semaphore with
    initial value N
  • Each reader needs to acquire one resource to be
    able to read
  • Therefore, N concurrent readers are allowed
  • Each writer needs to acquire N resources to be
    able to write
  • Therefore, only one writer can be executing at a
    time and no readers can be executing concurrently
  • Analogy: N tokens on a table
  • A reader needs one token
  • A writer needs all N, and keeps accumulating tokens until it has all of them
  • Readers and writers all wait in line to grab tokens; after a writer grabs one, it goes back to the end of the line for the next
  • This line is actually implemented in the OS for semaphores, cond vars, etc.
  • Let's look at the code

62
Max Readers Solution Attempt
  semaphore_t sem = N;

  void reader() {
    while (true) {
      P(sem);
      <read from the DB>
      V(sem);
    }
  }

  void writer() {
    while (true) {
      for (i = 0; i < N; i++) P(sem);
      <write to the DB>
      for (i = 0; i < N; i++) V(sem);
    }
  }

There is still a problem...
63
Reader/Writer
  void writer() {
    while (true) {
      for (i = 0; i < N; i++) P(sem);
      <write to the DB>
      for (i = 0; i < N; i++) V(sem);
    }
  }
  • Deadlock!
  • One could have two writers each start acquiring
    resources concurrently
  • For instance
  • Writer 1 holds 1 resource
  • Writer 2 holds N-1 resources
  • They're both blocked forever
  • Solution: do not allow two writers to execute the for loop of P() calls concurrently
  • This can easily be done with mutual exclusion
  • That is, we need another semaphore

64
Decent Max Readers Solution
  semaphore_t sem = N;
  semaphore_t wmutex = 1;

  void writer() {
    while (true) {
      P(wmutex);
      for (i = 0; i < N; i++) P(sem);
      V(wmutex);
      <write to the DB>
      for (i = 0; i < N; i++) V(sem);
    }
  }

  void reader() {
    while (true) {
      P(sem);
      <read from the DB>
      V(sem);
    }
  }
65
Reader-Writer Locks
  • The reader-writer problem is so common that some OSes provide reader-writer locks
  • The lock can be accessed in reader mode by
    readers
  • The lock can be accessed in writer mode by
    writers
  • This basically solves the reader-writer problem
  • Multiple readers are in mutual exclusion with any single writer
  • The programmer just has to make sure that readers
    and writers acquire the lock with the correct
    mode
  • The implementation of the reader/writer problem
    is only as good as the implementation of the
    reader-writer lock
  • Reader-Writer locks are less efficient than regular locks, because they do more work for you
  • Never use them if you can get by with regular locks
  • Typically, a lot of effort is spent making sure
    that locks are fast

66
Priority Inversion
  • Going back toward the OS, we have seen that
    processes/threads can have different priorities
  • We'll talk about CPU scheduling in detail in a future lecture, but let's just say for now that a higher priority process, if ready, always runs before a lower priority process
  • Important: processes, even if their own code doesn't lead to synchronization problems, use data structures in the kernel that are themselves protected by, e.g., locks
  • Whether you see it or not, your programs do use locks, cond vars, semaphores, etc. when they run in kernel mode
  • Let's say we have three processes with priorities H > M > L
  • Resource R is currently in use by process L
  • Process L holds binary semaphore S
  • Process H requires resource R
  • Process H is blocked on a P(S)
  • But process M is running, preventing process L
    from running for a long time
  • So process L can never get to do a V(S)
  • Priority Inversion: process M runs, and runs, while the higher-priority process H is stuck

67
Priority Inversion Solution
  • Most OSes implement a priority inheritance
    mechanism
  • A process that holds a resource needed by a higher priority process temporarily inherits that process's priority
  • This complicates the kernel code quite a bit
  • This solves the example seen in the previous
    slides
  • Read the Priority Inversion and the Mars
    Pathfinder blurb on p. 239 of the textbook
  • Priority inheritance hadn't been enabled
  • The program was real-time, so higher-priority processes had better run when they need to!
  • If priority inheritance hadn't been implemented in the kernel of the OS, the Pathfinder mission would have failed

68
Monitors
  • Writing concurrent programs with semaphores, locks, and condition variables is very error prone
  • Typically, either you're implementing a version of one of the well-known problems, or you're introducing concurrency bugs
  • At least as a beginner concurrent programmer
  • In the early '70s, Brinch Hansen proposed the concept of a Monitor
  • Formalized and popularized by Hoare (1974)
  • A monitor is really an abstract data type
    representing a shared resource
  • e.g., a class/object
  • It is a construct of a programming language
  • Java implements monitors

69
Monitors
  • There is nothing magical here, we still need the
    two basic functionalities of mutual exclusion and
    waiting/signaling
  • Monitors have the same power as other
    synchronization abstractions such as locks and
    condition variables
  • But monitors constrain several aspects
  • Condition variables are not visible outside the
    monitor
  • They are hidden/encapsulated
  • One interacts with them via special monitor
    operations
  • Mutual exclusion is implicit
  • Monitor operations execute by definition in
    mutual exclusion
  • These apparently innocuous properties make
    writing concurrent code less error-prone
  • The programmer shouldn't have to deal with where P() and V() should be placed
  • I'll let you read the textbook's section for more information
  • We won't use monitors, and kernel code doesn't use monitors

70
Synchronization in Solaris
  • Solaris provides
  • adaptive mutexes
  • condition variables
  • semaphores
  • reader-writer locks
  • turnstiles
  • Adaptive mutexes
  • looks at the state of the system and decides
    whether to spin or to block
  • e.g., if the lock is currently held by a thread that's blocked, forget spinning
  • No matter what, long critical sections should be
    protected by semaphores or cond. variables so
    that one is certain that there will be no spinning

71
Synchronization in Solaris
  • Turnstiles
  • Queues containing threads waiting for locks
  • One turnstile per synchronization object
  • Turnstiles provide the abstraction through which
    priority inheritance is implemented
  • Almost all these mechanisms are available inside
    and outside the Kernel
  • The exception
  • Priority-inheritance happens only in the Kernel
  • User-level programs, if dealing with priorities,
    have to deal with them creatively
  • e.g., implementing their own turnstiles

72
Synchronization in Win XP
  • The Kernel uses spin locks for protection within
    the Kernel
  • Or interrupt-disabling on single-processor
    systems
  • It ensures that a (kernel) thread holding a spin
    lock is never preempted
  • For user-programs, XP provides dispatcher objects
  • mutex locks
  • semaphores
  • event (a.k.a. condition variables)
  • timers (sends a signal() after a lapse of time)

73
Synchronization in Linux
  • Locking in the Kernel spin locks and semaphores
  • Spin locks protect only short code sections
  • On single-proc machines, disables kernel
    preemption
  • Which is allowed only if the current thread does
    not hold any locks (the kernel counts locks held
    per thread)
  • (Non-spin) Semaphores used for longer sections of
    code
  • Pthreads
  • (non spin) mutex locks
  • spin locks
  • condition variables
  • read-write locks
  • semaphores

74
Mutual Exclusion and Pthreads
  • Pthreads provide a simple mutual exclusion lock
  • Lock creation
  • int pthread_mutex_init(
  •     pthread_mutex_t *mutex,
  •     const pthread_mutexattr_t *attr)
  • returns 0 on success, an error code otherwise
  • mutex: output parameter, the lock
  • attr: input parameter, lock attributes (NULL = default)
  • There are functions to set the attributes (look at the man pages if you're interested)

75
Pthread Locking
  • Locking a lock
  • If the lock is already locked, then the calling
    thread is blocked
  • If the lock is not locked, then the calling
    thread acquires it
  • int pthread_mutex_lock(
  •     pthread_mutex_t *mutex)
  • returns 0 on success, an error code otherwise
  • mutex: input parameter, the lock

76
Pthread Locking
  • Just checking
  • Returns instead of locking
  • int pthread_mutex_trylock(
  •     pthread_mutex_t *mutex)
  • returns 0 on success, EBUSY if the lock is already locked, an error code otherwise
  • mutex: input parameter, the lock

77
Synchronizing pthreads
  • Releasing a lock
  • int pthread_mutex_unlock(
  •     pthread_mutex_t *mutex)
  • returns 0 on success, an error code otherwise
  • mutex: input parameter, the lock
  • Pthreads implement exactly the concept of locks
    as it was described in the previous lecture notes

78
Cleaning up memory
  • Releasing memory for a mutex
  • int pthread_mutex_destroy(
  •     pthread_mutex_t *mutex)
  • Releasing memory for a mutex attribute object
  • int pthread_mutexattr_destroy(
  •     pthread_mutexattr_t *attr)

79
Pthread Spin Locks
  • There is a pthread_spinlock_t type, which implements spin locks
  • Used just like pthread_mutex_t (pthread_spin_lock(), pthread_spin_unlock(), ...)

80
Cond. Variables and Semaphores
  • Condition variables are of the type
    pthread_cond_t
  • They are used in conjunction with mutex locks
  • Semaphores are provided as a separate POSIX
    standard
  • sem_init
  • sem_wait
  • sem_post
  • sem_getvalue
  • ...

81
pthread_cond_init()
  • Creating a condition variable
  • int pthread_cond_init(
  •     pthread_cond_t *cond,
  •     const pthread_condattr_t *attr)
  • returns 0 on success, an error code otherwise
  • cond: output parameter, condition
  • attr: input parameter, attributes (NULL = default)

82
pthread_cond_wait()
  • Waiting on a condition variable
  • int pthread_cond_wait(
  •     pthread_cond_t *cond,
  •     pthread_mutex_t *mutex)
  • returns 0 on success, an error code otherwise
  • cond: input parameter, condition
  • mutex: input parameter, associated mutex

83
pthread_cond_signal()
  • Signaling a condition variable
  • int pthread_cond_signal(
  •     pthread_cond_t *cond)
  • returns 0 on success, an error code otherwise
  • cond: input parameter, condition
  • Wakes up one thread out of the possibly many
    threads waiting for the condition
  • The thread is chosen non-deterministically

84
pthread_cond_broadcast()
  • Signaling a condition variable
  • int pthread_cond_broadcast(
  •     pthread_cond_t *cond)
  • returns 0 on success, an error code otherwise
  • cond: input parameter, condition
  • Wakes up ALL threads waiting for the condition
  • May be useful in some applications

85
Condition Variable example
  • Say I want to have multiple threads wait until a
    counter reaches a maximum value and be awakened
    when it happens
    pthread_mutex_lock(&lock);
    while (count < MAX_COUNT)
      pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);

  • We lock the lock so that we can read the value of count without the possibility of a race condition
  • We call pthread_cond_wait() in a while loop to guard against spurious wake-ups
  • When going to sleep the pthread_cond_wait()
    function implicitly releases the lock
  • When waking up the pthread_cond_wait() function
    implicitly acquires the lock
  • The lock is unlocked after exiting from the loop

86
pthread_cond_timedwait()
  • Waiting on a condition variable with a timeout
  • int pthread_cond_timedwait(
  •     pthread_cond_t *cond,
  •     pthread_mutex_t *mutex,
  •     const struct timespec *abstime)
  • returns 0 on success, ETIMEDOUT on timeout, an error code otherwise
  • cond: input parameter, condition
  • mutex: input parameter, associated mutex
  • abstime: input parameter, the timeout as an absolute deadline (a struct timespec with tv_sec and tv_nsec fields; note this is an absolute time, not a relative delay, and not the struct timeval used by gettimeofday)

87
Putting a Pthread to Sleep
  • To make a Pthread thread sleep you should use the usleep() function
  • #include <unistd.h>
  • int usleep(useconds_t microseconds)
  • Do not use the sleep() function, as it may not be safe to use in a multi-threaded program

88
Conclusion
  • Thread Synchronization is an important topic
  • Theory is difficult
  • Practice is difficult
  • What we've presented here is the low-level view of synchronization, the "do it yourself" version
  • But this is often what's used in practice
  • There are higher-level abstractions
  • e.g., Java monitors
  • Provides some relief, but still fraught with
    peril
  • e.g., Java ThreadPool abstractions
  • Provides convenience
  • As of today, to be a good concurrent programmer,
    one needs to understand low-level concurrency
    details
  • The future may change this unfortunate situation
  • New concurrent languages
  • New ways to think about concurrent programming
  • Help from the hardware: transactional memory