Title: Advanced Operating Systems
1Advanced Operating Systems
Lecture 7 Concurrency
- University of Tehran
- Dept. of EE and Computer Engineering
- By
- Dr. Nasser Yazdani
2How to use shared resource
- Some general problem and solutions.
- References
- Fast Mutual Exclusion for Uniprocessors
- On Optimistic Methods for Concurrency Control
3Outline
- Introduction
- Motivation
- Implementing mutual exclusion
- Implementing restartable atomic sequence
- Kernel design considerations
- The performance of three software techniques
- Conclusions
4Why Coordinate?
- Critical section
- Must execute atomically, without interruption.
- Atomicity usually only w.r.t. other operations on
the same data structures. - What are sources of interruption?
- Hardware interrupts, UNIX signals.
- Thread pre-emption.
- Interleaving of multiple CPUs.
5Spooling Example Correct
Shared memory
Process 1
Process 2
int next_free
int next_free
out
next_free in
1
abc
4
Stores F1 into next_free
Prog.c
5
2
Prog.n
6
innext_free1
in
3
F1
7
next_free in
4
F2
Stores F2 into next_free
5
innext_free1
6
6Spooling Example Races
Shared memory
Process 1
Process 2
int next_free
int next_free
out
next_free in
1
abc
4
Prog.c
next_free in / value 7 /
5
2
Stores F1 into next_free
Prog.n
6
3
in
F1
7
F2
innext_free1
4
Stores F2 into next_free
5
innext_free1
6
7Critical Section Problem
- N threads all competing to use the same shared
data - It might eventuate to Race condition
- Each thread has a code segment, called a critical
section, in which share data is accessed - We need to ensure that when one thread is
executing in its critical section, no other
thread is allowed to execute in its critical
section
8Critical Region (Critical Section)
- Process
- while (true)
- ENTER CRITICAL SECTION
- Access shared variables // Critical Section
LEAVE CRITICAL SECTION - Do other work
-
-
9Critical Region Requirement
- Mutual Exclusion
- one process must execute within the critical.
- Progress
- If no Waiting process in its critical section,
any process entry to its critical section cannot
be postponed indefinitely. - No process running outside its critical region
may block other processes - Bounded Wait
- A process requesting entry to a critical section
should only have to wait for a bounded number of
other processes to enter and leave the critical
section. - No process should have to wait forever to enter
its critical region - Speed and Number of CPUs
- No assumption may be made about speeds or number
of CPUs.
10Critical Regions (2)
- Mutual exclusion using critical regions
11Synchronization approaches
- Disabling Interrupts
- Lock Variables
- Strict Alternation
- Petersons solution
- TSL
- Sleep and Wakeup
- Message sending
12Disabling Interrupts
- How does it work?
- Disable all interrupts just after entering a
critical section and re-enable them just before
leaving it. - Why does it work?
- With interrupts disabled, no clock interrupts and
switch can occur. - Problems
- What if the process forgets to enable the
interrupts? - Multiprocessor? (disabling interrupts only
affects one CPU) - Only used inside OS
13Lock Variables
- Int lock
- lock0
- While (lock)
- lock 1
- EnterCriticalSection
- access shared variable
- LeaveCriticalSection
- lock 0
- Does the above code work?
14Strict Alternation
- Thread Me / For two threads /
-
- while (true)
- while ( turn ! my_thread_id)
- Access shared variables // Critical
Section - turn other_thread_id
- Do other work
-
-
- Satisfies mutual exclusion but not progress.
- Why?
- Notes
- While turn ! my_thread_id / busy
waiting/ - A lock (turn variable) that uses busy waiting is
called a spin lock
15Using Flags
- int flag2 false, false
- Thread Me
-
- while (true)
- flagmy_thread_id true
- while (flagother_thread_id )
- Access shared variables // Critical
Section - flagmy_thread_id false
- Do other work
-
-
- Can block indefinitely
- Why? (You go ahead!)
16Test Set (TSL)
- Requires hardware support
- Does test and set atomically
- char Test_and_Set ( char target)
- \\ All done atomically
- char temp target
- target true
- return(temp)
17Problems with TSL
- Operates at motherboard speeds, not CPU.
- Much slower than cached load or store.
- Prevents other use of the memory system.
- Interferes with other CPUs and DMA.
- Silly to spin in TSL on a uniprocessor.
- Add a thread_yield() after every TSL.
18Other Similar Hardware Instruction
- Swap TSL
- void Swap (char x, y)
- \\ All done atomically
- char temp x
- x y
- y temp
19Petersons Solution
- int flag2false, false
- int turn
- Thread Me
-
- while (true)
- flagmy_thread_id true
- turn other_thread_id
- while (flagother_thread_id
- and turn other_thread_id )
- Access shared variables // Critical
Section - flagmy_thread_id false
- Do other work
-
-
- It works!!!
- Why?
20Sleep and Wakeup
- Problem with previous solutions
- Busy waiting
- Wasting CPU
- Priority Inversion
- a high priority waits for a low priority to leave
the critical section - the low priority can never execute since the high
priority is not blocked. - Solution sleep and wakeup
- When blocked, go to sleep
- Wakeup when it is OK to retry entering the
critical section - Semaphore operation that executes sleep and wakeup
21Semaphores
- A semaphore count represents count number of
abstract resources. - New variable having 2 operations
- The Down (P) operation is used to acquire a
resource and decrements count. - The Up (V) operation is used to release a
resource and increments count. - Any semaphore operation is indivisible (atomic)
- Semaphores solve the problem of the wakeup-bit
22Whats Up? Whats Down?
- Definitions of P and V
- Down(S)
- while (S lt 0) // no-op
- S S-1
-
- Up(S)
- S
-
- Counting semaphores 0..N
- Binary semaphores 0,1
23Possible Deadlocks with Semaphores
- Example
- P0 P1
- share two semaphores S and Q
- S 1 Q1
- Down(S) // S0 ------------gt Down(Q) //Q0
- Down(Q) // Q -1 lt--------
- --------------------gt Down(S) //
S-1 - // P0 blocked // P1 blocked
- DEADLOCK
- Up(S) Up(Q)
- Up(Q) Up(S)
24Monitor
- A simpler way to synchronize
- A set of programmer defined operators
- monitor monitor-name
- // variable declaration
- public entry P1(..)
- ...
- ......
- public entry Pn(..)
- ...
- begin
- initialization code
- end
25Monitor Properties
- The internal implementation of a monitor type
cannot be accessed directly by the various
threads. - The encapsulation provided by the monitor type
limits access to the local variables only by the
local procedures. - Monitor construct does not allow concurrent
access to all procedures defined within the
monitor. - Only one thread/process can be active within the
monitor at a time. - Synchronization is built in.
26Cooperating Processors via Message Passing
- IPC is best provided by a messaging system
- Messaging system and shared memory system are not
mutually exclusive, they can be used
simultaneously within a single OS or single
process - Two basic operations
- Send (destination, message)
- Receive (source, message)
- Message size Fixed or Variable size.
- Real life analogy conversation
27Message Passing
28Direct Communication
- Binds the algorithm to Process name
- Sender explicitly names the received or receiver
explicitly names the sender - Send(P,message)
- Receive(Q,message)
- Link is established automatically between every
pair of processes that want to communicate - Processes must know about each other identity
- One link per pair of processes
29Indirect Communication
- send(A,message) / send a message to mailbox A
/ - receive(A,message) / receive a message from
mailbox A / - Mailbox is an abstract object into which a
message can be placed to or removed from. - Mailbox is owned either by a process or by the
system
30Fast Mutual Exclusion for Uniprocessors
- Describe restartable atomic sequences (an
optimistic mechanism for implementing atomic
operations on a uniprocessor) - Assumes that short, atomic sequences are rarely
interrupted. - Rely on a recovery mechanisms.
- Performance improvements.
31Motivation of efficient mutual-exclusion
- Modern applications use multiple threads
- As a program structuring device
- As a mechanism for portability to multiprocessors
- As a way to manage I/O and server concurrency
- Many OSs are build on top of a microkernel
- Many services are implemented as multithreaded
user-level applications - Even single threaded programs rely on basic OS
services that are implemented outside the kernel
32Implementing mutual exclusion on a uniprocessor
- Pessimistic methods
- Memory-interlocked instruction
- Software reservation
- Kernel emulation
- Restartable atomic sequences
33Memory-interlocked instruction
- Implicitly delays interrupts until the
instruction completes. - Require special hardware support from the
processor and bus. - The cycle time for an interlocked access is
several times greater than that for a
non-interlocked access.
34Software reservation
- Explicitly guards against arbitrary interleaving.
- A thread must register its intent to perform an
atomic operation, and then wait. - Examples
- Dekkers algorithm
- Lamports algorithm
- Petersons algorithm
35Kernel emulation
- A strictly uniprocessor solution
- Explicitly disables interrupts during operations
that must execute atomically. - Although requires no special hardware, its
runtime cost is high. - The kernel must be invoked on every
synchronization operation
36Restartable atomic sequence
- Instead of using a mechanism that guards against
interrupts, we can instead recognize when an
interrupt occurs and recover. - The recovery process restart the sequence.
- Are attractive because
- Do not require hardware support.
- Have a short code path with one load and store
per atomic read-modify-write. - Do not involve the kernel on every atomic
operation.
37Implementing restartable atomic sequences
- Require kernel support to ensure that a suspended
thread is resumed at the beginning of the
sequence. - Strategies for implementing kernel
- Explicit registration in Mach
- Designated sequences in Taos
38Explicit registration in Mach
- The kernel keeps track of each address spaces
restartable atomic sequence. - An application registers the starting address and
length of the sequence with kernel. - In response to the failure
- Replace restartable atomic sequence with
conventional mechanisms code.
39Costs of explicit registration
- Cost of subroutine linkage
- Because the kernel identifies restartable atomic
sequences by a single PC range per address space,
They cannot be inlined. - Cost of checking return PC
- Kernel must check the return PC, whenever a
thread is suspended. - Make additional scheduling overhead worthwhile.
40Designated sequences in Taos
- The kernel must recognize every interrupted
sequence. - Uses two-stage check to recognize atomic
sequences. - 1st rejects most interrupted code sequences that
are not restartable. - (the opcode of the suspended instruction is
used as an index into a hash table containing
instructions eligible to appear in a restartable
atomic sequence) - 2nd uses another table (indexed by opcode)
41Kernel design considerations
- Cost of the two-stage check on every thread
switch - Placement of the PC check
- Mutual exclusion in the kernel
42Placement of the PC check
- When should the kernel check/adjust the PC of a
suspended thread? - When it is first suspended.
- When it is about to be resumed.
- Detection at user level
- Whenever a suspended thread is resumed by the
kernel, it returns to a fixed user-level
sequence. - Determine if the thread was suspended within a
restartable atomic sequence. - (complexity and overhead -- save return address
to user-level stack at each suspension)
43Mutual exclusion in the kernel
- The kernel is itself a client of thread
management facilities. - Two events, can trigger a thread switching
- Page fault
- Thread preemption
- Careless ordering of the PC check could lead to
mutual recursion between the thread scheduler and
the virtual memory system.
44The performance
- R.A.S. via Kernel Emulation via Software
reservation - Discuss performance at three levels
- Basic overhead of various mechanisms.
- Effect on the performance of common thread
management operations. - Effect of mutual exclusion overhead on the
performance of several application.
45Microbenchmarks
- The performance is with test which enters
critical section (TSL) in a loop for 1M - Two version of Lamprot algorith (fast and meta)
46Thread management overhead
- Different thread management packages
Two thread using mutex and condition variable
alternatively
47Application performance
- afs-bench file sys intensive like cp
- Parthenon-n theorem prover with n threads
- Procon-64 producer-consumer
- Thread suspensions for R.A.S of time to check
48Conclusions
- R.A.S. represent a common case approach to
mutual exclusion on a uniprocessor. - R.A.S. are appropriate for uniprocessors that do
not support memory-interlocked atomic
instructions. - Also on processors that do have hardware support
for synchronization, better performance may be
possible.
49Next Lecture
- Distributed systems
- References
- Read the first chapter of the book
- Read The Anatomy of the Grid Enabling Scalable
virtual Organizations