W4118 Operating Systems


  • Instructor: Junfeng Yang

  • Homework 2 update
  • Assembly code to call a hook function written in
  • syscall_fail(long syscall_nr)
  • Clarifications on sys_fail(int ith, int ncall,
    struct syscall_failures calls)
  • Only fail system calls from the current process
  • Each fail() injects only one failure
  • If a system call matches one of the system call
    numbers specified in the calls argument, count it
    as one matching call toward ith
  • Mac users: talk to the TA to get access to VMware

Last lecture
  • Processes in Linux
  • Context switch on x86
  • Kernel stack captures all state, so switching the
    stack switches the process
  • Threads: good at expressing concurrency
  • Multithreading models: different tradeoffs
  • Race conditions

Recall: banking example

    int balance = 1000;

    int main() {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, deposit, (void *)1);
        pthread_create(&t2, NULL, withdraw, (void *)2);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("all done: balance = %d\n", balance);
        return 0;
    }

    void *deposit(void *arg) {
        int i;
        for (i = 0; i < 1e7; i++)
            ++balance;          // unsynchronized update
        return NULL;
    }

    void *withdraw(void *arg) {
        int i;
        for (i = 0; i < 1e7; i++)
            --balance;          // unsynchronized update
        return NULL;
    }

Recall: a closer look at the banking example

    $ objdump -d bank

    08048464 <deposit>:
      // ++ balance
      8048473:  a1 80 97 04 08   mov  0x8049780,%eax
      8048478:  83 c0 01         add  $0x1,%eax
      804847b:  a3 80 97 04 08   mov  %eax,0x8049780

    0804849b <withdraw>:
      // -- balance
      80484aa:  a1 80 97 04 08   mov  0x8049780,%eax
      80484af:  83 e8 01         sub  $0x1,%eax
      80484b2:  a3 80 97 04 08   mov  %eax,0x8049780

Avoiding Race Conditions
  • Race condition: a timing-dependent error
    involving shared state
  • Critical section: a segment of code that accesses
    a shared variable (or resource) and must not be
    concurrently executed by more than one thread

    // ++ balance
    mov 0x8049780,%eax
    add $0x1,%eax
    mov %eax,0x8049780

    // -- balance
    mov 0x8049780,%eax
    sub $0x1,%eax
    mov %eax,0x8049780

How to implement critical sections?
  • Atomic operations: no other instructions can be
    interleaved; executed as a unit, all or none;
    guaranteed by hardware
  • A possible solution: create a super instruction
    that does what we want atomically
    • add $0x1, 0x8049780
  • Problem
    • Can't anticipate every possible way we want
    • Increases hardware complexity, slows down other
      instructions
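
For the single case of incrementing a counter, a ready-made atomic read-modify-write does exist. The sketch below uses C11 `atomic_fetch_add` and `atomic_fetch_sub` on the banking example; on x86 a compiler typically turns these into a `lock`-prefixed instruction. The driver name `run_atomic_bank` and the reduced iteration count are illustrative assumptions, not part of the slides.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ITERS 100000            /* assumption: smaller than the 1e7 on the slide */

static atomic_int balance;      /* shared state, only touched atomically */

static void *deposit(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        atomic_fetch_add(&balance, 1);   /* one indivisible ++balance */
    return NULL;
}

static void *withdraw(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        atomic_fetch_sub(&balance, 1);   /* one indivisible --balance */
    return NULL;
}

/* Hypothetical driver: one depositor, one withdrawer; returns final balance. */
int run_atomic_bank(void) {
    pthread_t t1, t2;
    int rc;

    atomic_store(&balance, 1000);
    rc = pthread_create(&t1, NULL, deposit, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, withdraw, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&balance);
}
```

This only covers single-word updates. A critical section that touches several variables still needs a lock, which is why the layered approach is used.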

Layered approach to synchronization
  • Hardware provides simple low-level atomic
    operations, upon which we can build high-level,
    atomic operations, upon which we can implement
    critical sections and build correct
    multi-threaded/multi-process programs

Properly synchronized application
High-level synchronization primitives
Hardware-provided low-level atomic operations
Example low-level atomic operations and high-level synchronization primitives
  • Low-level atomic operations
    • On uniprocessor, disable/enable interrupts
    • x86 load and store of words
    • Special instructions
      • Test-and-set
  • High-level synchronization primitives
    • Lock
    • Semaphore
    • Monitor

Will look at them all. Start with lock.
Locks (or Mutex: Mutual exclusion)
  • Two common operations
    • lock(): acquire lock exclusively; wait if not
      available
    • unlock(): release exclusive access to lock
  • pthread example

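    void *deposit(void *arg) {
        int i;
        for (i = 0; i < 1e7; i++) {
            pthread_mutex_lock(&l);     // enter critical section
            ++balance;
            pthread_mutex_unlock(&l);   // leave critical section
        }
        return NULL;
    }

    void *withdraw(void *arg) {
        int i;
        for (i = 0; i < 1e7; i++) {
            pthread_mutex_lock(&l);
            --balance;
            pthread_mutex_unlock(&l);
        }
        return NULL;
    }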
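
A complete version of the locked banking example might look like the sketch below. The mutex declaration, the driver `run_locked_bank`, and the smaller loop count are assumptions added to make the fragment self-contained.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERS 100000                 /* assumption: reduced from 1e7 */

static int balance = 1000;
static pthread_mutex_t l = PTHREAD_MUTEX_INITIALIZER;

static void *deposit(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&l);      /* only one thread past this point */
        ++balance;
        pthread_mutex_unlock(&l);
    }
    return NULL;
}

static void *withdraw(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&l);
        --balance;
        pthread_mutex_unlock(&l);
    }
    return NULL;
}

/* Hypothetical driver: returns the balance after both threads finish. */
int run_locked_bank(void) {
    pthread_t t1, t2;
    int rc;

    rc = pthread_create(&t1, NULL, deposit, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, withdraw, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return balance;                  /* equal deposits and withdrawals */
}
```

Because every update happens with the mutex held, the equal number of increments and decrements leaves the balance at its starting value.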

Critical Section Goals
  • Requirements
    • Safety (aka mutual exclusion): no more than one
      thread in critical section at a time
    • Liveness (aka progress)
      • If multiple threads simultaneously request to
        enter critical section, must allow one to proceed
      • Must not depend on threads outside critical
        section
    • Bounded waiting (aka starvation-free)
      • Must eventually allow waiting thread to proceed
    • Makes no assumptions about the speed and number
      of CPUs
      • However, assumes each thread makes progress
  • Desirable properties
    • Efficient: don't consume too much resource while
      waiting
      • Don't busy wait (spin wait). Better to
        relinquish CPU and let other thread run
    • Fair: don't make some thread wait longer than
      others. Hard to do efficiently
    • Simple: should be easy to use

Implementing Locks version 1
  • Can cheat on uniprocessor: implement locks by
    disabling and enabling interrupts
  • Linux kernel heavily used this trick in single
    core days
    • cli(): __asm__ __volatile__("cli" : : : "memory")
    • sti(): __asm__ __volatile__("sti" : : : "memory")
  • Good: simple!
  • Bad
    • Both operations are privileged; can't let user
      programs use them
    • Doesn't work on multiprocessors

    lock()   { disable_interrupt(); }
    unlock() { enable_interrupt(); }
Implementing Locks version 2
  • Peterson's algorithm: software-based lock
  • Good: doesn't require much from hardware
  • Only assumptions
    • Loads and stores are atomic
    • They execute in order
  • Does not require special hardware instructions

Software-based lock 1st attempt
    // 0: lock is available, 1: lock is held by a thread
    int flag = 0;

    void lock() {
        while (flag == 1)
            ;               // spin wait
        flag = 1;
    }

    void unlock() { flag = 0; }

  • Idea: use one flag, test then set; if
    unavailable, spin-wait (or busy-wait)
  • Why doesn't it work?
    • Not safe: both threads can be in critical section
    • Not efficient: busy wait, particularly bad on
      uniprocessor (will solve this later)

Software-based locks 2nd attempt
    // 1: a thread wants to enter critical section, 0: it doesn't
    int flag[2] = {0, 0};

    void lock() {
        flag[self] = 1;                 // I need lock
        while (flag[1 - self] == 1)
            ;                           // spin wait
    }

    void unlock() {
        flag[self] = 0;                 // not any more
    }

  • Idea: use per-thread flags, set then test, to
    achieve mutual exclusion
  • Why doesn't it work?
    • Not live: can deadlock

Software-based locks 3rd attempt
    // whose turn is it?
    int turn = 0;

    void lock() {
        while (turn == 1 - self)
            ;                   // spin wait for my turn
    }

    void unlock() { turn = 1 - self; }   // I'm done. your turn

  • Idea: strict alternation to achieve mutual
    exclusion
  • Why doesn't it work?
    • Not live: depends on threads outside critical
      section

Software-based locks final attempt (Peterson's algorithm)

    // whose turn is it?
    int turn = 0;
    // 1: a thread wants to enter critical section, 0: it doesn't
    int flag[2] = {0, 0};

    void lock() {
        flag[self] = 1;         // I need lock
        turn = 1 - self;
        // wait for my turn: spin wait while the other thread
        // has intent AND it is the other thread's turn
        while (flag[1 - self] == 1 && turn == 1 - self)
            ;
    }

    void unlock() {
        flag[self] = 0;         // not any more
    }

  • Why does it work?
    • Safe?
    • Live?
    • Bounded wait?
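
A runnable two-thread sketch of the algorithm above is given below. It assumes C11 sequentially consistent atomics for `flag` and `turn`, which supply the "atomic, in-order loads and stores" the algorithm relies on; plain `int` variables would not guarantee that on current compilers and CPUs. The `sched_yield()` in the spin loop, the `worker` and `run_peterson` names, and the iteration count are additions for the demo, not part of the slides.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define ITERS 50000

static atomic_int flag[2];   /* flag[i] == 1: thread i wants the lock */
static atomic_int turn;      /* which thread is currently favored */
static int counter;          /* plain int, protected by the lock */

static void peterson_lock(int self) {
    atomic_store(&flag[self], 1);       /* I need the lock */
    atomic_store(&turn, 1 - self);      /* but you go first */
    while (atomic_load(&flag[1 - self]) == 1 &&
           atomic_load(&turn) == 1 - self)
        sched_yield();                  /* demo courtesy on few-core machines */
}

static void peterson_unlock(int self) {
    atomic_store(&flag[self], 0);       /* not any more */
}

static void *worker(void *arg) {
    int self = (int)(intptr_t)arg;      /* thread id: 0 or 1 */
    for (int i = 0; i < ITERS; i++) {
        peterson_lock(self);
        counter++;                      /* critical section */
        peterson_unlock(self);
    }
    return NULL;
}

/* Hypothetical driver: returns the counter after both threads finish. */
int run_peterson(void) {
    pthread_t t[2];

    counter = 0;
    for (intptr_t id = 0; id < 2; id++) {
        int rc = pthread_create(&t[id], NULL, worker, (void *)id);
        assert(rc == 0);
        (void)rc;
    }
    for (int id = 0; id < 2; id++)
        pthread_join(t[id], NULL);
    return counter;
}
```

If mutual exclusion holds, no increment is lost and the counter ends at twice the per-thread iteration count.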

Notes on Peterson's algorithm
  • Great way to start thinking of multi-threaded
    programming
    • Scheduler is malicious
  • The algorithm is useful in other contexts as well
  • Problem
    • Doesn't work with N>2 threads
      • Obvious extension (N flags, turn = 0, 1, ..., N-1)
        doesn't work
      • Leslie Lamport's Bakery algorithm
    • More importantly, doesn't really work on modern
      out-of-order processors
  • Next: implement locks with hardware support
Implementing locks version 3
    // 0: lock is available, 1: lock is held by a thread
    int flag = 0;

    void lock() {
        while (test_and_set(&flag))
            ;               // spin wait
    }

    void unlock() { flag = 0; }

  • Problem with the test-then-set approach: test and
    set are not atomic
  • Special atomic operation
    • test_and_set address, register
    • Load the value at address into register, and set
      the value at address to 1
  • Why does it work?
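
The same lock can be tried out in portable C. The sketch below assumes C11's `atomic_flag_test_and_set` as the test-and-set primitive; the `spin_lock`, `spin_unlock`, `worker`, and `run_spinlock` names and the iteration count are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ITERS 100000

static atomic_flag flag = ATOMIC_FLAG_INIT;  /* clear: lock is available */
static int counter;                          /* protected by the spin lock */

static void spin_lock(void) {
    /* Atomically set the flag and get its old value.
       Old value set means someone else holds the lock: keep spinning. */
    while (atomic_flag_test_and_set(&flag))
        ;
}

static void spin_unlock(void) {
    atomic_flag_clear(&flag);
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        spin_lock();
        counter++;                           /* critical section */
        spin_unlock();
    }
    return NULL;
}

/* Hypothetical driver: two workers; returns the final counter. */
int run_spinlock(void) {
    pthread_t t1, t2;
    int rc;

    counter = 0;
    rc = pthread_create(&t1, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

Only the thread that observes the old value as clear enters, so the test and the set can no longer be separated by another thread.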

test_and_set on x86
  • xchg reg, addr: atomically swaps the contents of
    addr and reg
  • spin_lock in Linux can be implemented using this
    instruction (include/asm-i386/spin_lock.h)
  • Another way: add a lock prefix before an
    instruction
    • Examples in include/asm-i386/atomic.h

    // written for 32-bit x86, where long is 32 bits
    long test_and_set(volatile long *lock)
    {
        int old;
        asm volatile("xchgl %0, %1"
            : "=r"(old), "+m"(*lock)    // output
            : "0"(1)                    // input
            : "memory"                  // can clobber anything in
                                        // memory, so gcc won't
                                        // reorder this statement
                                        // with others
        );
        return old;
    }

Spin-wait or block
  • So far the lock implementations we've seen are
    busy-wait or spin-wait locks: endlessly checking
    the lock flag without yielding CPU
    • Problem: wastes CPU cycles
    • Worst case: prev thread holding a busy-wait lock
      gets preempted, other threads try to acquire the
      same lock
  • On uniprocessor: should not use spin-lock
    • Yield CPU when lock not available (need OS
      support)
  • On multi-processor
    • Thread holding lock gets preempted → ???
    • Correct action depends on how long before lock
      is released
      • Lock released quickly → spin-wait
      • Lock released slowly → block
      • Quick or slow is relative to context-switch
        cost
    • Good plan: spin a bit, then block
Problem with simple yield
    void lock() {
        while (test_and_set(&flag))
            yield();
    }

  • Problem
    • Still a lot of context switches
    • Starvation possible
  • Why? No control over who gets the lock next
  • Need explicit control over who gets the lock
Implementing locks version 4
    typedef struct __mutex_t {
        int flag;       // 0: mutex is available, 1: mutex is not available
        int guard;      // guard lock to internal mutex data structure
        queue_t *q;     // queue of waiting threads
    } mutex_t;

    void lock(mutex_t *m) {
        while (test_and_set(&m->guard))
            ;           // acquire guard lock by spinning
        if (m->flag == 0) {
            m->flag = 1;            // acquire mutex
            m->guard = 0;
        } else {
            enqueue(m->q, self);
            m->guard = 0;
            yield();
        }
    }

    void unlock(mutex_t *m) {
        while (test_and_set(&m->guard))
            ;
        if (queue_empty(m->q))
            // release mutex; no one wants mutex
            m->flag = 0;
        else
            // hold mutex (for next thread!)
            wakeup(dequeue(m->q));
        m->guard = 0;
    }

  • Add thread to queue when lock unavailable

Semaphores
  • Synchronization tool that does not require busy
    waiting
  • Semaphore S: integer variable
  • Two standard operations modify S: acquire() and
    release()
    • Originally called P() and V()
      • from Dutch Proberen and Verhogen (Dijkstra)
    • Also called down() and up()
    • And even wait() and signal()
  • Higher-level abstraction, less complicated
  • Can only be accessed via two indivisible (atomic)
    operations

    P(S) { while (S <= 0) ; S--; }
    V(S) { S++; }

Semaphore Types
  • Counting semaphore: integer value can range over
    an unrestricted domain
    • Used for synchronization
  • Binary semaphore: integer value can range only
    between 0 and 1; can be simpler to implement
    • Used for mutual exclusion; same as mutex

    Process i:
        P(S);
        Critical Section
        V(S);
        Remainder Section

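
The P(S) / critical section / V(S) pattern maps directly onto POSIX semaphores, with `sem_wait` as P and `sem_post` as V. The sketch below assumes a platform with unnamed POSIX semaphores (for example Linux); the `worker` and `run_binary_semaphore` names are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define ITERS 100000

static sem_t S;              /* binary semaphore used as a mutex */
static int counter;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        sem_wait(&S);        /* P(S) */
        counter++;           /* critical section */
        sem_post(&S);        /* V(S) */
    }
    return NULL;
}

/* Hypothetical driver: two workers; returns the final counter. */
int run_binary_semaphore(void) {
    pthread_t t1, t2;
    int rc;

    counter = 0;
    rc = sem_init(&S, 0, 1); /* initial value 1: one thread may enter */
    assert(rc == 0);
    rc = pthread_create(&t1, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&S);
    return counter;
}
```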
Semaphore Implementation
  • Must guarantee that no two processes can execute
    P() / acquire() and V() / release() on the
    same semaphore at the same time
  • Thus, implementation of these operations becomes
    the critical section problem again, where the
    acquire and release code are placed inside the
    critical section
    • Could now have busy waiting in critical section
    • But if we know we can't acquire semaphore, should
      we busy wait and burn up the CPU?
  • Note that applications may spend lots of time in
    critical sections and therefore this is not a
    good solution
  • We'd like a semaphore that sleeps (or at least
    lets someone else run)

Semaphore Implementation with no Busy waiting
  • With each semaphore there is an associated
    waiting queue. Each entry in a waiting queue has
    two data items
    • value (of type integer)
    • pointer to next record in the list
  • Two operations
    • block: place the process invoking the operation
      on the appropriate waiting queue
    • wakeup: remove one of the processes in the waiting
      queue and place it in the ready queue
  • Potential queuing policies: FIFO, LIFO, undefined

Semaphore Implementation with no Busy waiting
  • Implementation of acquire()
  • Implementation of release()
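
As one possible sketch of acquire() and release() without busy waiting, the version below builds a counting semaphore from a pthread mutex and condition variable. The condition variable plays the role of the waiting queue, with `pthread_cond_wait` as block and `pthread_cond_signal` as wakeup. This is a substitute for the explicit queue described above, not the slides' own code; the `semaphore_t` type and helper names are assumptions.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int value;                  /* number of available "permits" */
    pthread_mutex_t lock;       /* protects value */
    pthread_cond_t nonzero;     /* waiters sleep here while value == 0 */
} semaphore_t;

void semaphore_init(semaphore_t *s, int value) {
    assert(value >= 0);
    s->value = value;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
}

/* P(): sleep (no spinning) until a permit is available, then take it. */
void acquire(semaphore_t *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);   /* block */
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* V(): return a permit and wake one sleeping waiter, if any. */
void release(semaphore_t *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);               /* wakeup */
    pthread_mutex_unlock(&s->lock);
}

/* Helper for inspection only; real code should not depend on the value. */
int semaphore_value(semaphore_t *s) {
    pthread_mutex_lock(&s->lock);
    int v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

The wakeup order is whatever the pthread implementation chooses, which corresponds to the "undefined" queuing policy.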

Producer-Consumer Problem
  • Bounded buffer: size N
    • Access entry 0 ... N-1, then wrap around to 0
  • Producer process writes data to buffer
    • Must not get more than N items ahead of what
      the consumer has consumed
  • Consumer process reads data from buffer
    • Should not try to consume if there is no data

Solving Producer-Consumer Problem
  • Solving with semaphores
    • We'll use two kinds of semaphores
    • We'll use counters to track how much data is in
      the buffer
      • One counter counts as we add data and stops the
        producer if there are N objects in the buffer
      • A second counter counts as we remove data and
        stops a consumer if there are 0 in the buffer
    • Idea: since general semaphores can count for us,
      we don't need a separate counter variable
  • Why do we need a second kind of semaphore?
    • We'll also need a mutex semaphore

Producer-Consumer Problem
    Shared: Semaphores mutex, empty, full;

    Init:   mutex = 1;   /* for mutual exclusion */
            empty = N;   /* number of empty buf entries */
            full  = 0;   /* number of full buf entries */

    Producer:
        do {
            . . .        // produce an item in nextp
            P(empty);
            P(mutex);
            . . .        // add nextp to buffer
            V(mutex);
            V(full);
        } while (true);

    Consumer:
        do {
            P(full);
            P(mutex);
            . . .        // remove item to nextc
            V(mutex);
            V(empty);
            . . .        // consume item in nextc
        } while (true);

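
The producer and consumer loops above can be made concrete with POSIX semaphores. In this sketch the producer writes the integers 1 to ITEMS and the consumer adds them up, so a lost or duplicated item would change the sum. The buffer size, item count, and function names are assumptions for the demo.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define N 8                  /* buffer size (assumed) */
#define ITEMS 1000           /* items to transfer (assumed) */

static int buffer[N];
static int in, out;          /* next slot to write / read */
static sem_t mutex, empty, full;
static long consumed_sum;

static void *producer(void *arg) {
    (void)arg;
    for (int item = 1; item <= ITEMS; item++) {
        sem_wait(&empty);            /* P(empty): wait for a free slot */
        sem_wait(&mutex);            /* P(mutex) */
        buffer[in] = item;           /* add item to buffer */
        in = (in + 1) % N;           /* wrap around to 0 */
        sem_post(&mutex);            /* V(mutex) */
        sem_post(&full);             /* V(full): one more full slot */
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&full);             /* P(full): wait for data */
        sem_wait(&mutex);            /* P(mutex) */
        int item = buffer[out];      /* remove item from buffer */
        out = (out + 1) % N;
        sem_post(&mutex);            /* V(mutex) */
        sem_post(&empty);            /* V(empty): one more free slot */
        consumed_sum += item;        /* consume the item */
    }
    return NULL;
}

/* Hypothetical driver: returns the sum of everything consumed. */
long run_producer_consumer(void) {
    pthread_t p, c;
    int rc;

    in = out = 0;
    consumed_sum = 0;
    rc = sem_init(&mutex, 0, 1);     /* for mutual exclusion */
    assert(rc == 0);
    rc = sem_init(&empty, 0, N);     /* number of empty entries */
    assert(rc == 0);
    rc = sem_init(&full, 0, 0);      /* number of full entries */
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    rc = pthread_create(&c, NULL, consumer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&mutex);
    sem_destroy(&empty);
    sem_destroy(&full);
    return consumed_sum;
}
```

Note the order of the two P() calls: the counting semaphore is taken before the mutex. Taking the mutex first could leave a thread sleeping on `empty` or `full` while holding the mutex, which blocks the other side.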
Common Errors with Semaphores
  • Process i: P(S); CS; P(S)
    A typo. Process i will get stuck (forever) the
    second time it does the P() operation. Moreover,
    every other process will freeze up too when
    trying to enter the critical section!
  • Process j: V(S); CS; V(S)
    A typo. Process j won't respect mutual exclusion
    even if the other processes follow the rules
    correctly. Worse still, once we've done two
    extra V() operations this way, other processes
    might get into the CS inappropriately!
  • Process k: P(S); CS
    Whoever next calls P() will freeze up. The bug
    might be confusing because that other process
    could be perfectly correct code, yet that's the
    one you'll see hung when you use the debugger to
    look at its state!
  • Process l: P(S); if (something) return; CS; V(S)
    Someone forgot to release the semaphore before
    returning! The next caller will get stuck.
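
The early-return mistake can be observed without hanging a program by checking the semaphore's value afterwards. The sketch below assumes POSIX semaphores on a platform where `sem_getvalue` is supported (for example Linux); `buggy_update`, `fixed_update`, and `value_after_early_return` are hypothetical names.

```c
#include <assert.h>
#include <semaphore.h>

/* Buggy: the early return skips V(S), so the semaphore stays taken. */
static void buggy_update(sem_t *s, int something) {
    sem_wait(s);                /* P(S) */
    if (something)
        return;                 /* forgot V(S) on this path */
    /* critical section would go here */
    sem_post(s);                /* V(S) */
}

/* Fixed: a single exit path, so V(S) runs however the CS is skipped. */
static void fixed_update(sem_t *s, int something) {
    sem_wait(s);                /* P(S) */
    if (!something) {
        /* critical section would go here */
    }
    sem_post(s);                /* V(S) on every path */
}

/* Hypothetical probe: semaphore value after one early-return call. */
int value_after_early_return(int use_fixed) {
    sem_t s;
    int value = -1;
    int rc = sem_init(&s, 0, 1);
    assert(rc == 0);
    (void)rc;

    if (use_fixed)
        fixed_update(&s, 1);
    else
        buggy_update(&s, 1);

    sem_getvalue(&s, &value);   /* 0 means the next P(S) would block */
    sem_destroy(&s);
    return value;
}
```

With the buggy version the value stays at 0, so the next caller of P(S) would block forever; with the fixed version it returns to 1.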
