Synchronization - PowerPoint PPT Presentation

1
Synchronization
2
Disclaimer
  • If you've taken ICS432 in Fall 2008, a large
    subset of this set of lecture notes should be
    very familiar
  • But there are a few new things
  • Priority Inversion
  • Reader-Writer Locks
  • What OSes provide
  • Given how difficult and important synchronization
    is, I can't imagine that going through it again
    is a bad idea

3
Cooperating Processes/Threads
  • Cooperation is great and useful
  • processes and message-passing
  • processes and shared memory segments
  • threads in a single address space
  • all of the above
  • However, it must be done very carefully
  • all of the above models have their dangers
  • There are two main problems
  • Race Conditions: a bug that causes the program to
    give unpredictably incorrect results
  • Typical with processes/threads sharing memory
  • Deadlocks: the program blocks forever
  • Typical in all of the above
  • We talk about Race Conditions in these lecture
    notes
  • Deadlocks: next set of notes

4
Race Condition
  • Let's look at the producer-consumer example, using
    a circular buffer
  • Which you should have read in Section 3.4.1
  • Producer
  •   while (true) {
  •     while (counter == BUFFER_SIZE) ; // wait, buffer's full
  •     buffer[in] = produceNewItem();
  •     in = (in + 1) % BUFFER_SIZE;
  •     counter++;
  •   }
  • Consumer
  •   while (true) {
  •     while (counter == 0) ; // wait, buffer's empty
  •     nextConsumed = buffer[out];
  •     out = (out + 1) % BUFFER_SIZE;
  •     counter--;
  •   }

Initially: counter = 0; in = 0; out = 0
5
Race Condition Example
  • Let's look at the code in race_condition_example.c
  • Let's run it and see if we can observe the race
    condition
  • Terminology: there is a race condition
  • The program is buggy
  • The question is whether the bug will manifest
    itself
  • The bug we see is called a lost update

6
Why Race Condition?
  • Race conditions occur because of concurrency of
    threads/processes
  • Two kinds of concurrency

(Figure: two timelines, core 1 and core 2 —
false concurrency: within a core, an illusion of
concurrency provided by the OS (e.g., the green and
blue tasks); true concurrency: across cores
(e.g., the green and yellow tasks))
7
True/False Concurrency
  • The programmer shouldn't have to care/know
    whether concurrency will be true or false
  • Typically, the programmer doesn't know on which
    computer the program will run!
  • A concurrent program with 10 tasks should work on
    a single-core processor, a quad-core processor, a
    32-core processor, etc.
  • However, better performance with true concurrency
  • false concurrency is still very useful, e.g., for
    interactivity
  • We've talked about true concurrency across cores,
    but there could be true concurrency between any
    two hardware resources
  • e.g., between the network card and a core
  • e.g., between the disk and the network card

8
Why Race Conditions?
  • Race conditions can happen with false or true
    concurrency
  • Everything else being equal, one could argue that
    they're statistically most likely to manifest
    themselves with true concurrency
  • Let's explain how they can occur with false
    concurrency first
  • Consider a single core running a 2-threaded
    process, with one thread doing count++ and the
    other count--
  • These statements are in a high-level language
  • But we know that the compiler translates them
    into machine code, which we like to look at
    written as assembly code
  • On a Load/Store architecture (RISC), the code
    would then be

Thread 1:  load R1, @ ; inc R1 ; store @, R1
Thread 2:  load R1, @ ; dec R1 ; store @, R1
9
Why Race Conditions?
  • Illusion of concurrency: the OS context-switches
    threads rapidly
  • Interrupt, save state (stack, register values,
    ...), restart
  • Three possible execution paths (Thread 1 at the
    left margin, Thread 2 indented)

load R1, @
            load R1, @
            dec R1
inc R1
            store @, R1
store @, R1

load R1, @
inc R1
            load R1, @
            dec R1
            store @, R1
store @, R1

load R1, @
inc R1
            load R1, @
            dec R1
store @, R1
            store @, R1

Important: Thread 1's R1 is not the same as Thread 2's
R1. They are both register values in logical register
sets (i.e., inside a data structure in the OS)
10
Why Race Conditions?
Let's assume that initially @ = 5

load R1, @    // R1 = 5
            load R1, @   // R1 = 5
            dec R1       // R1 = 4
inc R1        // R1 = 6
            store @, R1  // @ = 4
store @, R1   // @ = 6

load R1, @    // R1 = 5
inc R1        // R1 = 6
            load R1, @   // R1 = 5
            dec R1       // R1 = 4
            store @, R1  // @ = 4
store @, R1   // @ = 6

load R1, @    // R1 = 5
inc R1        // R1 = 6
            load R1, @   // R1 = 5
            dec R1       // R1 = 4
store @, R1   // @ = 6
            store @, R1  // @ = 4

We would expect @ to be 5 at the end, but we can
get 4, or 6
11
Why Race Conditions?
  • What happens in the case of true concurrency?
  • Basically the same thing
  • Each thread could be running on its own core
  • But it still has its own register set
  • In this case a different physical register set, as
    opposed to a different logical register set
  • It's just that some instructions can actually be
    executed truly concurrently
  • Note that each core may have its own cache
  • But in this case there is cache-coherency
    hardware
  • No register-coherency hardware is possible, because
    registers are used in completely arbitrary ways
    and there is no notion of R1 having the same
    value across cores
  • Let's see an example

12
Why Race Conditions?
  • Perfectly synchronized?
  • When two processors issue a memory request
    (load/store) at the same time, one of them gets to
    memory first (could be random or deterministic)

Thread 1 (core 1):  load R1, @ ; inc R1 ; store @, R1
Thread 2 (core 2):  load R1, @ ; dec R1 ; store @, R1

(@ = 4 if we started with @ = 5)
13
Why do we Hate Race Conditions?
  • Code may work fine a million times, and
    then fail one day
  • And then it may take you another million runs to
    reproduce the bug
  • If you modify the code (e.g., adding a few printf
    statements), or if you run in debugging mode,
    this could completely change the race condition's
    behavior
  • hide it, or highlight it
  • If you write code, run it, and it works, you
    don't really know whether you've written a
    bug-free program
  • Typically true, but exacerbated with race
    conditions
  • We hate them because we hate nondeterministic
    bugs
  • and we hate bugs to begin with
  • So what do we do?

14
Critical Section
  • We want a critical section: a section of the code
    in which only one thread can be at a time
  • It doesn't have to be a contiguous section of
    code
  • In the example here, we have a 3-zone critical
    section
  • If thread A is already in one of the red zones,
    then all other threads are blocked before being
    allowed to enter any red zone
  • And only one will be allowed to enter once thread
    A leaves the red zone it was in

15
Critical Section
  • We can have multiple critical sections
  • One 3-zone red critical section
  • One 2-zone green critical section

16
Critical Section
  • More formally, we want three properties of
    critical sections
  • Mutual exclusion: if thread P is in the critical
    section, then no other thread can be in it
  • Progress: if thread P wants to enter the
    critical section, it will enter it eventually
  • Bounded waiting: once thread P has declared
    intent to enter the critical section, there is a
    bound on the number of threads that can enter the
    critical section before P
  • Note that there is no assumption regarding the
    relative speeds of the involved threads
  • But no thread has speed zero

17
Critical Section Code
  • Producer
  •   while (true) {
  •     while (counter == BUFFER_SIZE) ; // wait, buffer's full
  •     buffer[in] = produceNewItem();
  •     in = (in + 1) % BUFFER_SIZE;
  •     enter_critical_section();
  •     counter++;
  •     leave_critical_section();
  •   }
  • Consumer
  •   while (true) {
  •     while (counter == 0) ; // wait, buffer's empty
  •     nextConsumed = buffer[out];
  •     out = (out + 1) % BUFFER_SIZE;
  •     enter_critical_section();
  •     counter--;
  •     leave_critical_section();
  •   }

18
Critical Section
  • A Critical Section corresponds to sections of
    code (i.e., the text segment)
  • It doesn't correspond to data (i.e., variables)
  • Even though the section of code is typically one
    that modifies a particular variable
  • When we say "we need to protect variable x
    against race conditions", it means we need to
    look at the entire code, see where x is modified,
    and put all those places in the SAME critical
    section
  • If software engineering is well done,
    modification of a single variable doesn't happen
    all over the code
  • It's a misconception that critical sections are
    attached to variables. They are attached to code.

19
Critical Sections and the Kernel
  • On modern OSes, multiple threads can be in the
    kernel
  • User threads that are doing a system call and are
    in kernel mode
  • Threads started by the kernel itself to do useful
    kernel things
  • Therefore, the kernel is subject to race
    conditions
  • We've seen that kernel debugging is hard, and that
    race condition debugging is hard, so we don't
    want race conditions in the kernel
  • Example: the kernel maintains many data
    structures
  • e.g., the list of open files
  • The list must be updated each time a file's
    opened or closed
  • This is very much like the counter++ / counter--
    example
  • e.g., the list of memory allocations
  • e.g., the list of processes
  • e.g., the list of interrupt handlers
  • The kernel developer must ensure that no race
    conditions exist

20
Preemptive vs. Non-Preemptive
  • A preemptive kernel allows a thread executing
    kernel code (in kernel mode) to be preempted
  • A non-preemptive kernel doesn't
  • The thread runs until it willingly exits kernel
    mode (or yields control of the CPU)
  • Non-preemptive kernels are simple
  • There are no race conditions
  • Preemptive kernels are more complex
  • There are race conditions
  • Preemptive kernels are more powerful
  • better for real-time programming, as a real-time
    thread can preempt a thread running in kernel
    mode
  • should be more responsive for the same reason
  • Most modern kernels are preemptive

21
Synchronization Implementation
  • What we need is a way to implement
    enter_critical_section() and
    leave_critical_section()
  • There are some software solutions
  • They can be very complicated
  • They're not guaranteed to work on modern
    architectures
  • See Section 6.3 in the book if interested
  • What we need is help from the hardware
  • One option: disabling interrupts?
  • Problems
  • If you allow any user process to disable
    interrupts, what tells you it will enable them
    afterwards?
  • What if interrupts are needed for other purposes,
    such as a bunch of timers?
  • Disabling interrupts across multiple processor
    cores takes time, and entering a critical section
    would be very costly
  • Conclusion: although inside the kernel one can
    disable interrupts for specific purposes, one
    cannot use this mechanism in general

22
Atomic Instructions
  • Modern processors offer atomic instructions
  • the instruction is uninterruptible from the
    moment it is issued to the moment it completes
  • Test-and-Set instruction, which corresponds to
    the following code
  •   boolean TestAndSet(boolean *target) {
  •     boolean rv = *target;
  •     *target = TRUE;
  •     return rv;
  •   }
  • Let's see how we can implement critical sections
    with TestAndSet
  • Using it in our pseudo-code as if it were an
    uninterruptible function, when it's really an
    uninterruptible instruction
  • Note that the book also talks about a Swap
    instruction, which is equivalent

23
Locks with TestAndSets
  • We declare a boolean variable lock
  • Shared by all threads, initialized to FALSE
  • Pseudo-code
  •   while (TestAndSet(&lock)) ;
  •     // Do nothing
  •   // Critical Section here
  •   lock = FALSE;

boolean TestAndSet(boolean *target) {
  boolean rv = *target;
  *target = TRUE;
  return rv;
}
24
Synchronization Abstractions
  • A number of abstractions have been defined to
    allow programs to use synchronization without
    using things like TestAndSet directly
  • we'll see that these abstractions must be
    provided by the kernel because they require some
    kernel things to happen
  • I'll use a slightly different order to present
    things here when compared to the book, but the
    material is the same

25
The Lock Abstraction
  • Based on TestAndSet, it's easy to implement a
    lock abstraction
  •   typedef char lock_t;
  •   void lock(lock_t *lock) {
  •     while (TestAndSet(lock)) ;
  •     return;
  •   }
  •   void unlock(lock_t *lock) {
  •     *lock = FALSE;
  •   }

26
The Lock Abstraction
  • The abstraction is easily used
  •   lock_t mutex; // A lock used for mutual
  •                 // exclusion is often
  •                 // called a mutex
  •   . . .
  •   lock(&mutex);
  •   insert(linked_list, element); // CS
  •   unlock(&mutex);

27
Locks for Communication?
  • So far we've seen the use of locks for mutual
    exclusion
  • But it's tempting to do more advanced
    synchronization
  • Thread A waits for an "event" by doing lock(x)
  • Thread B signals the event by doing unlock(x)
  • Example
  • We need to display a short movie while loading a
    file
  • Thread A displays the movie
  • Thread B loads the file
  • They both start at the same time
  • When the movie ends, if the file isn't already
    there, thread A needs to wait for the file to be
    loaded: lock(x)
  • When thread B's done loading the file, it tells
    thread A the file is loaded now: unlock(x)
  • Any problem with this???

28
Spin Locks
  • The lock abstraction we've developed has the
    thread do busy waiting
  •   void lock(lock_t *lock) {
  •     while (TestAndSet(lock)) ; // busy
  •     return;
  •   }
  • lock() burns CPU cycles
  • slows down other threads/processes
  • generates heat
  • In general, busy waiting is frowned upon
  • It is very tempting, unfortunately
  • Exponential back-off solutions are not clean,
    and striking a good compromise isn't easy at
    all
  • Our lock abstraction is called a spin lock,
    because it has the thread spin until it can get
    through

29
Are Spin Locks Evil?
  • Spin locks can be very useful for (short)
    critical sections
  • Burn only a few cycles, but provide super fast
    response time
  • They do not involve the kernel
  • In fact, spin locks are used inside the kernel,
    and several OSes provide the abstraction to users
  • Typically another type of lock is also provided,
    and users can choose
  • How can we have a lock that doesn't spin?

30
How Not To Spin?
  • The alternative to spinning is to have the OS
    block the thread
  • The OS can always enforce that any thread be in
    any state, in this case BLOCKED
  • The OS simply needs to keep track of threads
    blocked due to some synchronization operation
  • When necessary, the OS can simply remove a
    blocked thread from that list and put it in the
    READY state
  • This is much heavier than spin locks in
    terms of OS involvement, but much lighter in
    terms of CPU cycle consumption
  • e.g., spinning may still be a good idea if one
    knows that the spinning time will be very short
  • Not spinning for a x critical section is
    probably a bad idea
  • Nowadays, due to the heavy multi-threading brought
    by many-core architectures, the overhead of lock()
    and unlock() is a big concern

31
No-Spin Locks for Communication?
  • One can provide the lock abstraction with locks
    that do not spin, where lock() and unlock() are
    system calls that do kernel things
  •   typedef char lock_t;
  •   void lock(lock_t *lock) {
  •     <get blocked and get put in the waiting
  •      list for lock>
  •     return;
  •   }
  •   void unlock(lock_t *lock) {
  •     <unblock the first thread in the
  •      waiting list for lock>
  •   }

32
Problem w/ Locks and Communication
  • Subtle issues arise when using (non-spin) locks for
    communication
  • Let's look at the producer/consumer problem
  • The code we showed on Slide 3 had the threads do
    busy waiting
  • The producer waits for the buffer to be non-full
  • The consumer waits for the buffer to be non-empty
  • We want to avoid busy waiting, and get notified
    by a lock
  • Since we now have non-spin locks at our disposal
  • We want to implement strict producer/consumer
  • A consumer never attempts to consume from an
    empty buffer
  • A producer never attempts to produce into a full
    buffer
  • This strictness is desirable in many contexts
  • Let's look at that code again, but using a
    "buffer" abstract data type
  • buffer.consume(): consumes an item
  • buffer.produce(): produces an item
  • buffer.size(): returns the number of items
  • The buffer size is bounded above by SIZE

33
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     while (buffer.size() == SIZE) ; // busy wait
  •     buffer.produce(generateItem());
  •   }
  • Consumer
  •   while (true) {
  •     while (buffer.size() == 0) ; // busy wait
  •     Item item = buffer.consume();
  •   }
  • First, we need to add a "mutex" lock to protect
    the buffer
  • Internally, buffer.produce() and buffer.consume()
    have race conditions, e.g., on the pointer to the
    last element in the buffer
  • Equivalent to some counter++/counter-- race
    condition

34
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     while (buffer.size() == SIZE) ; // busy wait
  •     lock(mutex); // could be a spin-lock
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •   }
  • Consumer
  •   while (true) {
  •     while (buffer.size() == 0) ; // busy wait
  •     lock(mutex); // could be a spin-lock
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •   }

Now let's add locks for communication
instead of the busy wait
35
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }
  • This assumes we can do an unlock() on a lock
    that's not locked
  • e.g., the producer starts way before the consumer
    and puts three items in
  • It calls unlock(not_empty) three times, but the
    consumer has yet to acquire the not_empty lock
    once
  • This is really easy to implement in practice
  • This assumes that initially
  • not_full is not locked
  • not_empty is locked
  • Note that we must do the lock(not_full) or
    lock(not_empty) only if necessary
  • e.g., if there is already an item in the buffer,
    then the consumer should not attempt
    lock(not_empty), but instead directly go take the
    item

36
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }
  • This code works fine for one producer and one
    consumer
  • Mutual exclusion via the mutex (spin) lock
  • Communication via the not_full and not_empty
    locks
  • However, there is a problem if we have more than
    two threads
  • Let's assume we have two consumers
  • What bad behavior could happen here?

37
Prod/Cons Revisited
  • One bad sequence of events
  • Producer produces an element in the buffer and
    then goes to sleep for 10 hours (because of some
    action not shown in the pseudo-code)
  • Consumer 1 tests if buffer.size() == 0, and sees
    that it isn't
  • Consumer 1 calls lock(mutex) and enters the
    Critical Section
  • Consumer 1 is interrupted by the OS and Consumer
    2 starts
  • Consumer 2 tests if buffer.size() == 0, and sees
    that it isn't!
  • Consumer 2 calls lock(mutex), but Consumer 1
    has the lock, so Consumer 2 is blocked
  • Consumer 1 proceeds to consume the item,
    releases mutex, releases the not_full lock, and is
    interrupted by the OS
  • Consumer 2 acquires the mutex lock, and consumes
    an item from an empty buffer!
  • Producer
  •   while (true) {
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }

38
Prod/Cons Revisited
  • The problem: a race condition on the buffer.size()
    test
  • What do we do against race conditions?
  • We use mutex locks to create critical sections!
  • So let's do that
  • Producer
  •   while (true) {
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }

39
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     lock(mutex2);
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     unlock(mutex2);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex2);
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     unlock(mutex2);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }
  • Great, but now we have a new problem!
  • Anybody sees what it is?

40
Prod/Cons Revisited
  • Producer
  •   while (true) {
  •     lock(mutex2);
  •     if (buffer.size() == SIZE)
  •       lock(not_full);
  •     unlock(mutex2);
  •     lock(mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     unlock(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex2);
  •     if (buffer.size() == 0)
  •       lock(not_empty);
  •     unlock(mutex2);
  •     lock(mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     unlock(not_full);
  •   }
  • The Deadlock
  • The Consumer starts before the Producer
  • The Consumer acquires the mutex2 lock
  • The buffer's empty, so the Consumer blocks while
    attempting to acquire the not_empty lock
  • which is locked initially
  • The Producer starts and attempts to acquire
    mutex2
  • But mutex2 is locked by the Consumer!
  • Result: both Producer and Consumer are blocked,
    and the program simply sits there
  • Classic problem
  • Thread 1 acquires lock A, and then blocks while
    attempting to acquire lock B
  • Thread 2 acquires lock B, and then blocks while
    attempting to acquire lock A
  • We'll talk more about deadlocks

41
Were Stuck
  • If we don't protect the buffer.size() test, then
    we have a race condition
  • If we do protect it, then we have a deadlock
  • We can live with neither solution!
  • This means that communication with locks is
    perhaps not a good idea
  • Even if we can do non-spin locks
  • Using the same abstraction for critical sections
    and for communication may be asking too much
  • How about a separate abstraction for
    communication?
  • This abstraction is called condition variables

42
The Cond. Variable Abstraction
  • What we need is a way for a thread to block
    waiting for an event WITHOUT holding a lock
  • General rule: don't go to sleep while you're
    holding a resource that could let a bunch of
    people do useful work
  • e.g., don't go to sleep locked up in your dorm
    room while holding the only key to the laundry
    room
  • The solution is to create a new abstraction that
    knows how to deal with a thread that holds a
    lock
  • This abstraction is called a condition variable
  • It provides two mechanisms
  • wait(): blocks, waiting for an event
  • signal(): unblocks a thread waiting for the event
  • i.e., tells the OS that that thread is runnable
    again
  • Does not mean that the thread calling signal()
    relinquishes control (at least not right away)
  • It is combined with a (mutex) lock
  • wait(cond, mutex)
  • signal(cond)

43
Cond. Variable and Mutex
  • Wait is saying "Ok, I'll block and release the
    lock so that somebody can use it. But as soon as
    I wake up I'm re-acquiring the lock to keep doing
    the critical section stuff I wanted to do in the
    first place"
  • Safe because while a thread sleeps, it's not
    doing anything at all
  • Although before going to sleep, the thread should
    make sure that it doesn't leave the program in an
    inconsistent state
  • e.g., if program state is safely updated only
    using a sequence of two calls, a wait() between the
    two calls is not a good idea!
  • Pseudo-code
  •   void wait(cond_t cond, lock_t mutex) {
  •     unlock(mutex);
  •     <ask the OS to block me and to unblock me
  •      when the event cond is signaled>
  •     lock(mutex);
  •   }

44
Prod/Cons With Cond. Variable
  • Producer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == SIZE)
  •       wait(not_full, mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     signal(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == 0)
  •       wait(not_empty, mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     signal(not_full);
  •   }
  • Note that we now use only one mutex
  • Basically the whole code is a critical section,
    but threads go to sleep while releasing the lock
  • We could have used two, but it's equivalent and
    much more verbose
  • Current view of the world
  • (Spin) Locks for mutual exclusion
  • Cond. Variables for communication

45
Prod/Cons With Cond. Variable
  • Producer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == SIZE)
  •       wait(not_full, mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     signal(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == 0)
  •       wait(not_empty, mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     signal(not_full);
  •   }
  • There is STILL a subtle problem with the code
    above
  • Anybody sees what it is?
  • We could still have a read on an empty buffer!

46
Prod/Cons With Cond. Variable
  • Producer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == SIZE)
  •       wait(not_full, mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     signal(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex);
  •     if (buffer.size() == 0)
  •       wait(not_empty, mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     signal(not_full);
  •   }
  • Bad sequence of operations
  • Consumer 1 starts, sees the buffer as empty, and
    waits on not_empty
  • The Producer puts an item in the buffer (which is
    not full), but gets interrupted right before it
    calls signal(not_empty)
  • Consumer 2 starts, sees that the buffer isn't
    empty, happily consumes the item
  • The OS resumes the Producer. The Producer moves
    on to execute signal(not_empty)
  • Consumer 1 then wakes up and moves on to
    consume from an empty buffer!
  • This is called a spurious wake-up
  • A thread is re-awakened but the condition
    corresponding to the awaited event is no longer
    true!
  • There is a simple solution...

47
Prod/Cons With Cond. Variable
  • Producer
  •   while (true) {
  •     lock(mutex);
  •     while (buffer.size() == SIZE)
  •       wait(not_full, mutex);
  •     buffer.produce(generateItem());
  •     unlock(mutex);
  •     signal(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     lock(mutex);
  •     while (buffer.size() == 0)
  •       wait(not_empty, mutex);
  •     Item item = buffer.consume();
  •     unlock(mutex);
  •     signal(not_full);
  •   }
  • A while instead of an if solves the spurious
    wake-up problem
  • Basically, a thread waiting on a condition
    shouldn't trust that once it's awakened the
    condition will be true
  • With a while, the thread just keeps rechecking
    that the condition is really true after it wakes
    up
  • Using a while loop around tests for conditions
    combined with wait() is something you should
    always do
  • Unless you're doing something super clever
    perhaps...

48
The Semaphore Abstraction
  • A Semaphore is an abstraction that provides a
    unified signaling mechanism for both mutual
    exclusion and communication
  • It can lead to very clean solutions that remove
    the need for counters and other boolean variable
    tests throughout the code
  • History
  • Proposed in 1968 by Dijkstra
  • Inspired by railroad semaphores
  • Up/Down, Red/Green

49
Semaphores
  • A semaphore S is an integer variable that's
  • Initialized to some value
  • May have a max value specified
  • Can be accessed by a wait() operation
  • Originally called P(): "Proberen" in Dutch (to
    test)
  • Can be accessed by a signal() operation
  • Originally called V(): "Verhogen" in Dutch (to
    increment)
  • The book uses wait() and signal(), but I'll
    use P() and V()
  • wait() and signal() are confusing with condition
    variables, although of course they're connected
  • Cool systems people I know always use P() and
    V() when writing pseudo-code in speech or writing
  • It's shorter to type
  • P(): wait (block) until the semaphore is
    non-zero, and then decrement it
  • V(): increment the semaphore
  • Both P() and V() are atomic
  • Can be implemented directly with TestAndSet-like
    instructions

50
Semaphores, Locks, Cond. Var
  • Semaphores and (Locks + Cond. Variables) are
    equivalent in power
  • Everything you can do with one, you can do with
    the other
  • It's a matter of preference
  • Some solutions look great with semaphores
  • Some solutions look great with locks + cond.
    variables
  • Many developers are biased towards one or the
    other, based on what they're more comfortable
    with
  • Classical problems
  • Implement Semaphores with Locks + Cond. Variables
  • Implement Locks + Cond. Variables with Semaphores
  • Let's do the first one, which is easy
  • The second one is actually tricky

51
Semaphores with Locks + Cond. Variables

typedef struct {
  int value;
  int max;
  lock_t mutex;
  cond_t cond;
} sem_t;

void sem_init(sem_t *sem, int value, int max) {
  sem->value = value;
  sem->max = max;
  mutex_init(&sem->mutex);
  cond_init(&sem->cond);
}

void P(sem_t *sem) {
  lock(&sem->mutex);
  while (!sem->value)
    wait(&sem->cond, &sem->mutex);
  sem->value--;
  unlock(&sem->mutex);
}

void V(sem_t *sem) {
  lock(&sem->mutex);
  if (sem->value < sem->max)
    sem->value++;
  unlock(&sem->mutex);
  signal(&sem->cond);
}
52
Prod/Cons with Semaphores
  • Producer
  •   while (true) {
  •     P(not_full);
  •     P(mutex);
  •     buffer.produce(generateItem());
  •     V(mutex);
  •     V(not_empty);
  •   }
  • Consumer
  •   while (true) {
  •     P(not_empty);
  •     P(mutex);
  •     Item item = buffer.consume();
  •     V(mutex);
  •     V(not_full);
  •   }
  • Initial values
  • not_full = SIZE (max value SIZE)
  • not_empty = 0 (max value SIZE)
  • mutex = 1 (max value 1)
  • Note that the semaphores do the counting of the
    elements
  • No longer any need for boolean tests, etc.
  • This is an example that shows the expressive
    power of semaphores
  • Naming the semaphores appropriately makes the
    code readable

53
Semaphores and Spinning
  • The textbook starts with semaphores
  • I started with locks and condition variables
  • In the description of semaphores, the textbook
    talks about spinning
  • And yes, P() can spin
  • Typically though, the common assumption is that
    spinning is done only with spin locks and that
    semaphores do blocking
  • For maximum efficiency
  • Spin locks used to protect, e.g., counter++
  • Semaphores used to protect, e.g.,
    processHTTPRequest()
  • In the code presented in these slides, I don't mix
    the two, but good code should use the right
    abstraction for the right things
  • e.g., when I write P(mutex), it would probably be
    more efficient as spin_lock(mutex)

54
Simple Deadlocks
  • Deadlocks with Locks
  • Deadlocks with Semaphores

Thread 1:  lock(B); lock(A); . . . unlock(B); unlock(A);
Thread 2:  lock(A); lock(B); . . . unlock(A); unlock(B);

Thread 1:  P(A); P(B); . . . V(A); V(B);
Thread 2:  P(B); P(A); . . . V(B); V(A);
55
Classical Synchronization Pbs
  • Synchronization is such a difficult topic that a
    number of standard problems have been defined
  • Some are very relevant to real programs
  • Producer/Consumer
  • Reader/Writer
  • . . .
  • Some are more out there but use ideas relevant in
    real programs
  • Dining Philosophers
  • Barber shop
  • . . .
  • I am going to describe Dining Philosophers
    without going into details
  • It should be part of your culture
  • Read the textbook's section
  • Take ICS432 if you want to know it inside out
  • Well also describe the Reader/Writer

56
Dining Philosophers
  • 5 philosophers sit at a table with 5 plates and 5
    forks
  • Each philosopher does two things
    void philosopher() {
      <think>
      pickupForks();
      <eat>
      putdownForks();
    }
  • To eat, a philosopher needs two forks
  • Questions
  • how to implement pickupForks()
  • how to implement putdownForks()
  • Goals
  • No race conditions (of course)
  • No deadlocks
  • No starvation
  • Fair eating
  • There are many solutions, and good ones are very
    complicated

57
Readers/Writers
  • We have a database (DB) accessed by two kinds of
    threads
  • Readers read records from the DB
  • Writers write records into the DB
  • Either one or more readers access the DB at the
    same time, or one single writer does
  • Let's look at a few solutions
  • Using Semaphores

58
A Naïve Solution
  sem_t rw = 1;   // a binary semaphore

  void writer() {
    while (true) {
      P(rw);
      <write to the DB>
      V(rw);
    }
  }

  void reader() {
    while (true) {
      P(rw);
      <read from the DB>
      V(rw);
    }
  }
  • But no concurrent accesses to the DB
  • Super safe, complete mutual exclusion
  • Would be very inefficient with many readers and
    few writers

59
Reader-Preferred Solution
  semaphore_t mutex = 1;
  semaphore_t rw = 1;
  int nr = 0;   // number of active readers

  void reader() {
    while (true) {
      P(mutex);
      if (nr == 0) P(rw);   // I am first
      nr++;
      V(mutex);
      <read from the DB>
      P(mutex);
      nr--;
      if (nr == 0) V(rw);   // I am last
      V(mutex);
    }
  }

  void writer() {
    while (true) {
      P(rw);
      <write to the DB>
      V(rw);
    }
  }
60
Reader-Preferred Solution
  • The problem with the reader-preferred solution is that it is too reader-preferred
  • An endless stream of readers will preclude all writes to the DB
  • It turns out it's very difficult to modify the code to make it fair between readers and writers
  • There is a classic solution that uses synchronization and the "passing the baton" technique
  • Based on an invariant condition and subtle signaling
  • Many intricate solutions presented on-line
  • Let's look at a simple but pretty good solution

61
Maximum number of readers
  • Defining a maximum number of allowed concurrent
    readers simplifies the problem!
  • And most likely makes sense for most applications
  • Let's say we allow at most N concurrent active readers
  • Then we can create a resource semaphore with
    initial value N
  • Each reader needs to acquire one resource to be
    able to read
  • Therefore, N concurrent readers are allowed
  • Each writer needs to acquire N resources to be
    able to write
  • Therefore, only one writer can be executing at a
    time and no readers can be executing concurrently
  • Analogy: N tokens on a table
  • A reader needs one token
  • A writer needs all N, and keeps accumulating tokens until it has all of them
  • Readers and writers all wait in line to grab tokens; after a writer grabs one, it goes back to the end of the line for the next
  • This line is actually implemented in the OS for semaphores, cond vars, etc.
  • Let's look at the code

62
Max Readers Solution Attempt
  semaphore_t sem = N;

  void reader() {
    while (true) {
      P(sem);
      <read from the DB>
      V(sem);
    }
  }

  void writer() {
    while (true) {
      for (i = 0; i < N; i++) P(sem);
      <write to the DB>
      for (i = 0; i < N; i++) V(sem);
    }
  }

There is still a problem...
63
Reader/Writer
  void writer() {
    while (true) {
      for (i = 0; i < N; i++) P(sem);
      <write to the DB>
      for (i = 0; i < N; i++) V(sem);
    }
  }
  • Deadlock!
  • One could have two writers each start acquiring
    resources concurrently
  • For instance
  • Writer 1 holds 1 resource
  • Writer 2 holds N-1 resources
  • They're both blocked forever
  • Solution: do not allow two writers to execute the for loop of P() calls concurrently
  • This can easily be done with mutual exclusion
  • That is, we need another semaphore

64
Decent Max Readers Solution
  semaphore_t sem = N;
  semaphore_t wmutex = 1;

  void writer() {
    while (true) {
      P(wmutex);
      for (i = 0; i < N; i++) P(sem);
      V(wmutex);
      <write to the DB>
      for (i = 0; i < N; i++) V(sem);
    }
  }

  void reader() {
    while (true) {
      P(sem);
      <read from the DB>
      V(sem);
    }
  }
65
Reader-Writer Locks
  • The reader-writer problem is so common that some OSes provide reader-writer locks
  • The lock can be accessed in reader mode by
    readers
  • The lock can be accessed in writer mode by
    writers
  • This basically solves the reader-writer problem
  • Multiple readers are in mutual exclusion with any single writer
  • The programmer just has to make sure that readers
    and writers acquire the lock with the correct
    mode
  • The implementation of the reader/writer problem
    is only as good as the implementation of the
    reader-writer lock
  • Reader-Writer locks are less efficient than regular locks, because they do more work for you
  • Never use them if you can get by with regular locks
  • Typically, a lot of effort is spent making sure
    that locks are fast

66
Priority Inversion
  • Going back toward the OS, we have seen that
    processes/threads can have different priorities
  • We'll talk about CPU scheduling in detail in a future lecture, but let's just say for now that a higher priority process, if ready, always runs before a lower priority process
  • Important: processes, even if their own code doesn't lead to synchronization problems, use data structures in the kernel that are themselves protected by, e.g., locks
  • Whether you see it or not, your programs do use locks, cond vars, semaphores, etc. when they run in kernel mode
  • Let's say we have three processes with priorities H > M > L
  • Resource R is currently in use by process L
  • Process L holds binary semaphore S
  • Process H requires resource R
  • Process H is blocked on a P(S)
  • But process M is running, preventing process L
    from running for a long time
  • So process L can never get to do a V(S)
  • Priority Inversion: process M runs, and runs, while the higher-priority process H is stuck

67
Priority Inversion Solution
  • Most OSes implement a priority inheritance
    mechanism
  • A process that holds a resource needed by a higher priority process temporarily inherits that process's priority
  • This complicates the kernel code quite a bit
  • This solves the example seen in the previous
    slides
  • Read the Priority Inversion and the Mars
    Pathfinder blurb on p. 239 of the textbook
  • Priority inheritance hadn't been enabled
  • The program was real-time, so higher-priority processes had better run when they need to!
  • If priority inheritance hadn't been implemented in the kernel of the OS, the Pathfinder mission would have failed

68
Monitors
  • Writing concurrent programs with semaphores, locks, and condition variables is very error prone
  • Typically, either you're implementing a version of one of the well-known problems, or you're introducing concurrency bugs
  • At least as a beginner concurrent programmer
  • In the early '70s, Brinch Hansen proposed the concept of a Monitor
  • Formalized and popularized by Hoare (1974)
  • A monitor is really an abstract data type
    representing a shared resource
  • e.g., a class/object
  • It is a construct of a programming language
  • Java implements monitors

69
Monitors
  • There is nothing magical here, we still need the
    two basic functionalities of mutual exclusion and
    waiting/signaling
  • Monitors have the same power as other
    synchronization abstractions such as locks and
    condition variables
  • But monitors constrain several aspects
  • Condition variables are not visible outside the
    monitor
  • They are hidden/encapsulated
  • One interacts with them via special monitor
    operations
  • Mutual exclusion is implicit
  • Monitor operations execute by definition in
    mutual exclusion
  • These apparently innocuous properties make
    writing concurrent code less error-prone
  • The programmer shouldn't have to deal with where P() and V() should be placed
  • I'll let you read the textbook's section for more information
  • We won't use monitors, and kernel code doesn't use monitors

70
Synchronization in Solaris
  • Solaris provides
  • adaptive mutexes
  • condition variables
  • semaphores
  • reader-writer locks
  • turnstiles
  • Adaptive mutexes
  • looks at the state of the system and decides
    whether to spin or to block
  • e.g., if the lock is currently held by a thread that's blocked, forget spinning
  • No matter what, long critical sections should be
    protected by semaphores or cond. variables so
    that one is certain that there will be no spinning

71
Synchronization in Solaris
  • Turnstiles
  • Queues containing threads waiting for locks
  • One turnstile per synchronization object
  • Turnstiles provide the abstraction through which
    priority inheritance is implemented
  • Almost all these mechanisms are available inside
    and outside the Kernel
  • The exception
  • Priority-inheritance happens only in the Kernel
  • User-level programs, if dealing with priorities,
    have to deal with them creatively
  • e.g., implementing their own turnstiles

72
Synchronization in Win XP
  • The Kernel uses spin locks for protection within
    the Kernel
  • Or interrupt-disabling on single-processor
    systems
  • It ensures that a (kernel) thread holding a spin
    lock is never preempted
  • For user-programs, XP provides dispatcher objects
  • mutex locks
  • semaphores
  • event (a.k.a. condition variables)
  • timers (sends a signal() after a lapse of time)

73
Synchronization in Linux
  • Locking in the Kernel spin locks and semaphores
  • Spin locks protect only short code sections
  • On single-proc machines, disables kernel
    preemption
  • Which is allowed only if the current thread does
    not hold any locks (the kernel counts locks held
    per thread)
  • (Non-spin) Semaphores used for longer sections of
    code
  • Pthreads
  • (non spin) mutex locks
  • spin locks
  • condition variables
  • read-write locks
  • semaphores

74
Mutual Exclusion and Pthreads
  • Pthreads provide a simple mutual exclusion lock
  • Lock creation
  • int pthread_mutex_init(
  •     pthread_mutex_t *mutex,
  •     const pthread_mutexattr_t *attr)
  • returns 0 on success, an error code otherwise
  • mutex: output parameter, the lock
  • attr: input parameter, lock attributes (NULL = default)
  • There are functions to set the attributes (look at the man pages if you're interested)

75
Pthread Locking
  • Locking a lock
  • If the lock is already locked, then the calling
    thread is blocked
  • If the lock is not locked, then the calling
    thread acquires it
  • int pthread_mutex_lock(
  •     pthread_mutex_t *mutex)
  • returns 0 on success, an error code otherwise
  • mutex: input parameter, the lock

76
Pthread Locking
  • Just checking
  • Returns instead of locking
  • int pthread_mutex_trylock(
  •     pthread_mutex_t *mutex)
  • returns 0 on success, EBUSY if the lock is already locked, an error code otherwise
  • mutex: input parameter, the lock

77
Synchronizing pthreads
  • Releasing a lock
  • int pthread_mutex_unlock(
  •     pthread_mutex_t *mutex)
  • returns 0 on success, an error code otherwise
  • mutex: input parameter, the lock
  • Pthreads implement exactly the concept of locks
    as it was described in the previous lecture notes

78
Cleaning up memory
  • Releasing memory for a mutex
  • int pthread_mutex_destroy(
  •     pthread_mutex_t *mutex)
  • Releasing memory for a mutex attribute object
  • int pthread_mutexattr_destroy(
  •     pthread_mutexattr_t *attr)

79
Pthread Spin Locks
  • There is a pthread_spinlock_t type, which implements spin locks
  • Used just like pthread_mutex_t (pthread_spin_lock(), pthread_spin_unlock(), ...)

80
Cond. Variables and Semaphores
  • Condition variables are of the type
    pthread_cond_t
  • They are used in conjunction with mutex locks
  • Semaphores are provided as a separate POSIX
    standard
  • sem_init
  • sem_wait
  • sem_post
  • sem_getvalue
  • ...

81
pthread_cond_init()
  • Creating a condition variable
  • int pthread_cond_init(
  •     pthread_cond_t *cond,
  •     const pthread_condattr_t *attr)
  • returns 0 on success, an error code otherwise
  • cond: output parameter, condition
  • attr: input parameter, attributes (NULL = default)

82
pthread_cond_wait()
  • Waiting on a condition variable
  • int pthread_cond_wait(
  •     pthread_cond_t *cond,
  •     pthread_mutex_t *mutex)
  • returns 0 on success, an error code otherwise
  • cond: input parameter, condition
  • mutex: input parameter, associated mutex

83
pthread_cond_signal()
  • Signaling a condition variable
  • int pthread_cond_signal(
  •     pthread_cond_t *cond)
  • returns 0 on success, an error code otherwise
  • cond: input parameter, condition
  • Wakes up one thread out of the possibly many
    threads waiting for the condition
  • The thread is chosen non-deterministically

84
pthread_cond_broadcast()
  • Signaling a condition variable
  • int pthread_cond_broadcast(
  •     pthread_cond_t *cond)
  • returns 0 on success, an error code otherwise
  • cond: input parameter, condition
  • Wakes up ALL threads waiting for the condition
  • May be useful in some applications

85
Condition Variable example
  • Say I want to have multiple threads wait until a
    counter reaches a maximum value and be awakened
    when it happens
    pthread_mutex_lock(&lock);
    while (count < MAX_COUNT)
      pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);

  • We lock the lock so that we can read the value of count without the possibility of a race condition
  • We call pthread_cond_wait() in a while loop to guard against spurious wake-ups
  • When going to sleep the pthread_cond_wait()
    function implicitly releases the lock
  • When waking up the pthread_cond_wait() function
    implicitly acquires the lock
  • The lock is unlocked after exiting from the loop

86
pthread_cond_timedwait()
  • Waiting on a condition variable with a timeout
  • int pthread_cond_timedwait(
  •     pthread_cond_t *cond,
  •     pthread_mutex_t *mutex,
  •     const struct timespec *abstime)
  • returns 0 on success, ETIMEDOUT on timeout, an error code otherwise
  • cond: input parameter, condition
  • mutex: input parameter, associated mutex
  • abstime: input parameter, the timeout as an absolute deadline (a struct timespec with tv_sec and tv_nsec fields; note this is an absolute time, not a relative delay, and not the struct timeval used by gettimeofday)

87
Putting a Pthread to Sleep
  • To make a Pthread thread sleep you should use the usleep() function
  • #include <unistd.h>
  • int usleep(useconds_t microseconds)
  • Do not use the sleep() function, as it may not be safe to use in a multi-threaded program

88
Conclusion
  • Thread Synchronization is an important topic
  • Theory is difficult
  • Practice is difficult
  • What we've presented here is the low-level view of synchronization, the "do it yourself" version
  • But this is often what's used in practice
  • There are higher-level abstractions
  • e.g., Java monitors
  • Provides some relief, but still fraught with
    peril
  • e.g., Java ThreadPool abstractions
  • Provides convenience
  • As of today, to be a good concurrent programmer,
    one needs to understand low-level concurrency
    details
  • The future may change this unfortunate situation
  • New concurrent languages
  • New ways to think about concurrent programming
  • Help from the hardware: transactional memory