Advanced Operating Systems

Advanced Operating Systems, Lecture 7: Concurrency. University of Tehran, Dept. of EE and Computer Engineering. By: Dr. Nasser Yazdani

1
Advanced Operating Systems
Lecture 7 Concurrency
  • University of Tehran
  • Dept. of EE and Computer Engineering
  • By
  • Dr. Nasser Yazdani

2
How to Use Shared Resources
  • Some general problems and solutions.
  • References
  • Fast Mutual Exclusion for Uniprocessors
  • On Optimistic Methods for Concurrency Control

3
Outline
  • Introduction
  • Motivation
  • Implementing mutual exclusion
  • Implementing restartable atomic sequences
  • Kernel design considerations
  • The performance of three software techniques
  • Conclusions

4
Why Coordinate?
  • Critical section
  • Must execute atomically, without interruption.
  • Atomicity usually only w.r.t. other operations on
    the same data structures.
  • What are sources of interruption?
  • Hardware interrupts, UNIX signals.
  • Thread pre-emption.
  • Interleaving of multiple CPUs.

5
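Spooling Example: Correct
  • [Figure: spooler directory in shared memory, with
    slot 4 = abc, slot 5 = Prog.c, slot 6 = Prog.n,
    slot 7 = F1, followed by F2; shared variables out
    and in; Process 1 and Process 2 each hold a local
    int next_free]
  • Process 1: (1) next_free = in; (2) stores F1 into
    next_free; (3) in = next_free + 1
  • Process 2: (4) next_free = in; (5) stores F2 into
    next_free; (6) in = next_free + 1
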
6
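Spooling Example: Races
  • [Figure: the same spooler directory (slot 4 = abc,
    slot 5 = Prog.c, slot 6 = Prog.n), but F1 and F2
    both target slot 7; shared variables out and in;
    Process 1 and Process 2 each hold a local
    int next_free]
  • (1) Process 1: next_free = in
  • (2) Process 2: next_free = in   /* value 7 */
  • (3) Process 1: stores F1 into next_free
  • (4) Process 1: in = next_free + 1
  • (5) Process 2: stores F2 into next_free
  • (6) Process 2: in = next_free + 1
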
7
Critical Section Problem
  • N threads all competing to use the same shared
    data
  • This can lead to a race condition
  • Each thread has a code segment, called a critical
    section, in which shared data is accessed
  • We need to ensure that when one thread is
    executing in its critical section, no other
    thread is allowed to execute in its critical
    section

8
Critical Region (Critical Section)
  • Process
      while (true) {
          ENTER CRITICAL SECTION
          Access shared variables   // critical section
          LEAVE CRITICAL SECTION
          Do other work
      }

9
Critical Region Requirement
  • Mutual Exclusion
  • At most one process may execute within its
    critical section at a time.
  • Progress
  • If no process is in its critical section, the
    entry of a waiting process to its critical section
    cannot be postponed indefinitely.
  • No process running outside its critical region
    may block other processes
  • Bounded Wait
  • A process requesting entry to a critical section
    should only have to wait for a bounded number of
    other processes to enter and leave the critical
    section.
  • No process should have to wait forever to enter
    its critical region
  • Speed and Number of CPUs
  • No assumption may be made about speeds or number
    of CPUs.

10
Critical Regions (2)
  • Mutual exclusion using critical regions

11
Synchronization approaches
  • Disabling Interrupts
  • Lock Variables
  • Strict Alternation
  • Peterson's solution
  • TSL
  • Sleep and Wakeup
  • Message sending

12
Disabling Interrupts
  • How does it work?
  • Disable all interrupts just after entering a
    critical section and re-enable them just before
    leaving it.
  • Why does it work?
  • With interrupts disabled, no clock interrupt, and
    therefore no context switch, can occur.
  • Problems
  • What if the process forgets to re-enable the
    interrupts?
  • Multiprocessor? (disabling interrupts only
    affects one CPU)
  • Only used inside OS

13
Lock Variables
  • int lock;
  • lock = 0;
  • Each process:
      while (lock)
          ;   /* busy wait */
      lock = 1;
      EnterCriticalSection
          access shared variable
      LeaveCriticalSection
      lock = 0;
  • Does the above code work?
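
The answer is no: testing lock and setting lock are two separate memory operations, so a thread switch between them lets both threads see lock == 0. The C sketch below replays that one unlucky schedule deterministically instead of relying on real threads. The names thread_state, step_test, step_set, and replay_bad_interleaving are illustrative assumptions, not part of the lecture.

```c
#include <assert.h>
#include <stdbool.h>

/* Shared lock variable from the slide: 0 = free, 1 = taken. */
static int lock = 0;

/* One thread's progress through the entry protocol (hypothetical helper). */
struct thread_state {
    int  observed;   /* value of lock seen by the test */
    bool in_cs;      /* true once the thread believes it holds the lock */
};

/* Step 1: the test in "while (lock)" reads the shared variable. */
static void step_test(struct thread_state *t) { t->observed = lock; }

/* Step 2: if the test saw 0, set the lock and enter the critical section. */
static void step_set(struct thread_state *t) {
    if (t->observed == 0) {
        lock = 1;
        t->in_cs = true;
    }
}

/* Replays the schedule in which B runs between A's test and A's set.
 * Returns how many threads are inside the critical section at the end. */
int replay_bad_interleaving(void) {
    struct thread_state a = {0, false}, b = {0, false};
    lock = 0;
    step_test(&a);   /* A sees lock == 0                       */
    step_test(&b);   /* B is scheduled and also sees lock == 0 */
    step_set(&a);    /* A sets lock = 1 and enters             */
    step_set(&b);    /* B sets lock = 1 and enters as well     */
    assert(lock == 1);
    return (int)a.in_cs + (int)b.in_cs;
}
```

Calling replay_bad_interleaving() reports two threads inside the critical section, which violates mutual exclusion.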

14
Strict Alternation
  • Thread Me   /* for two threads */
      while (true) {
          while (turn != my_thread_id)
              ;   /* busy wait */
          Access shared variables   // critical section
          turn = other_thread_id;
          Do other work
      }
  • Satisfies mutual exclusion but not progress.
  • Why?
  • Notes
  • while (turn != my_thread_id) ;   /* busy waiting */
  • A lock (such as the turn variable) that uses busy
    waiting is called a spin lock

15
Using Flags
  • int flag[2] = {false, false};
  • Thread Me
      while (true) {
          flag[my_thread_id] = true;
          while (flag[other_thread_id])
              ;   /* busy wait */
          Access shared variables   // critical section
          flag[my_thread_id] = false;
          Do other work
      }
  • Can block indefinitely
  • Why? (You go ahead!)

16
Test and Set (TSL)
  • Requires hardware support
  • Does test and set atomically
  • char Test_and_Set(char *target)
      {   // all done atomically
          char temp = *target;
          *target = true;
          return temp;
      }
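
A minimal sketch of a spin lock built on test-and-set, assuming a C11 compiler: atomic_flag_test_and_set_explicit from <stdatomic.h> plays the role of the TSL instruction. The type tsl_lock and the tsl_* function names are illustrative assumptions, not part of the lecture.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Spin lock whose only state is one test-and-set flag (hypothetical type). */
struct tsl_lock { atomic_flag held; };
#define TSL_LOCK_INIT { ATOMIC_FLAG_INIT }

/* One TSL attempt: returns true if the flag was clear and is now ours. */
bool tsl_try_lock(struct tsl_lock *l) {
    assert(l != NULL);
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Enter the critical section: repeat TSL until it finds the flag clear. */
void tsl_lock(struct tsl_lock *l) {
    while (!tsl_try_lock(l)) {
        /* busy wait */
    }
}

/* Leave the critical section: clear the flag. */
void tsl_unlock(struct tsl_lock *l) {
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A thread brackets its critical section with tsl_lock(&l) and tsl_unlock(&l), where l is a shared struct tsl_lock initialized with TSL_LOCK_INIT.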

17
Problems with TSL
  • Operates at motherboard (memory bus) speed, not
    CPU speed.
  • Much slower than cached load or store.
  • Prevents other use of the memory system.
  • Interferes with other CPUs and DMA.
  • Silly to spin in TSL on a uniprocessor.
  • Add a thread_yield() after every TSL.

18
Other Similar Hardware Instruction
  • Swap is similar to TSL
  • void Swap(char *x, char *y)
      {   // all done atomically
          char temp = *x;
          *x = *y;
          *y = temp;
      }

19
Peterson's Solution
  • int flag[2] = {false, false};
  • int turn;
  • Thread Me
      while (true) {
          flag[my_thread_id] = true;
          turn = other_thread_id;
          while (flag[other_thread_id]
                 && turn == other_thread_id)
              ;   /* busy wait */
          Access shared variables   // critical section
          flag[my_thread_id] = false;
          Do other work
      }
  • It works!
  • Why?
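
One way to see why it works: each thread writes turn last, so when both want to enter, the thread whose write to turn landed last is the one that waits. The sketch below, assuming C11, uses sequentially consistent atomics because plain int variables can be reordered by modern compilers and CPUs. Splitting the entry protocol into peterson_announce and peterson_must_wait is an illustrative choice, and all function names are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool flag[2];   /* flag[i]: thread i wants to enter (starts false) */
static atomic_int  turn;      /* the thread that gets priority on a tie */

/* Entry, step 1: announce interest, then give priority to the other thread. */
void peterson_announce(int me) {
    assert(me == 0 || me == 1);
    atomic_store(&flag[me], true);
    atomic_store(&turn, 1 - me);
}

/* Entry, step 2: does thread `me` still have to spin? */
bool peterson_must_wait(int me) {
    int other = 1 - me;
    return atomic_load(&flag[other]) && atomic_load(&turn) == other;
}

/* Full entry protocol. */
void peterson_lock(int me) {
    peterson_announce(me);
    while (peterson_must_wait(me)) {
        /* busy wait */
    }
}

/* Exit protocol: withdraw interest. */
void peterson_unlock(int me) {
    assert(me == 0 || me == 1);
    atomic_store(&flag[me], false);
}
```

Thread 0 calls peterson_lock(0) and peterson_unlock(0) around its critical section, and thread 1 does the same with argument 1.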

20
Sleep and Wakeup
  • Problems with the previous solutions
  • Busy waiting
  • Wasting CPU
  • Priority inversion
  • A high-priority process busy-waits for a
    low-priority process to leave the critical section
  • The low-priority process can never execute,
    because the high-priority process is not blocked.
  • Solution: sleep and wakeup
  • When blocked, go to sleep
  • Wake up when it is OK to retry entering the
    critical section
  • Semaphore: an operation that performs sleep and
    wakeup

21
Semaphores
  • A semaphore's count represents the number of
    available abstract resources.
  • A new variable type with two operations
  • The Down (P) operation is used to acquire a
    resource and decrements count.
  • The Up (V) operation is used to release a
    resource and increments count.
  • Any semaphore operation is indivisible (atomic)
  • Semaphores solve the lost-wakeup problem that the
    wakeup-waiting bit was meant to fix

22
What's Up? What's Down?
  • Definitions of P and V
  • Down(S):
      while (S <= 0)
          ;   // no-op
      S = S - 1;
  • Up(S):
      S++;
  • Counting semaphores: 0..N
  • Binary semaphores: 0, 1
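
The definitions above are only correct if the test and the decrement in Down happen as one indivisible step. A minimal busy-waiting sketch, assuming C11 atomics, gets that effect with a compare-and-exchange loop. The csem type and function names are illustrative assumptions, and a production semaphore would put the caller to sleep instead of spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Counting semaphore (hypothetical type): count = available resources. */
struct csem { atomic_int count; };

void csem_init(struct csem *s, int initial) {
    assert(initial >= 0);
    atomic_init(&s->count, initial);
}

/* One attempt at Down: succeeds only if a resource is available. */
bool csem_try_down(struct csem *s) {
    int c = atomic_load(&s->count);
    while (c > 0) {
        /* Decrement only if count is still c; on failure c is refreshed. */
        if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
            return true;
    }
    return false;
}

/* Down (P): busy-wait until a resource can be taken. */
void csem_down(struct csem *s) {
    while (!csem_try_down(s)) {
        /* no-op */
    }
}

/* Up (V): release one resource. */
void csem_up(struct csem *s) {
    atomic_fetch_add(&s->count, 1);
}
```

Initializing with csem_init(&s, 1) gives a binary semaphore, and a larger initial value gives a counting semaphore.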

23
Possible Deadlocks with Semaphores
  • Example: P0 and P1 share two semaphores S and Q,
    with S = 1 and Q = 1

      P0                        P1
      Down(S);  // S = 0
                                Down(Q);  // Q = 0
      Down(Q);  // Q = -1
                                Down(S);  // S = -1
      // P0 blocked             // P1 blocked
                   DEADLOCK
      Up(S);                    Up(Q);
      Up(Q);                    Up(S);

24
Monitor
  • A simpler way to synchronize
  • A set of programmer-defined operators
  • monitor monitor-name
      {
          // variable declarations
          public entry P1(..)
              { ... }
          ......
          public entry Pn(..)
              { ... }
          begin
              initialization code
          end
      }

25
Monitor Properties
  • The internal implementation of a monitor type
    cannot be accessed directly by the various
    threads.
  • The encapsulation provided by the monitor type
    limits access to the local variables to the
    local procedures.
  • The monitor construct does not allow concurrent
    access to any of the procedures defined within
    the monitor.
  • Only one thread/process can be active within the
    monitor at a time.
  • Synchronization is built in.
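
C has no monitor construct, so the sketch below only imitates one, assuming POSIX threads: the variable balance is file-private, and every entry procedure holds the same mutex for its whole body, so at most one thread is active inside the "monitor" at a time. The account example and all names are illustrative assumptions, not part of the lecture.

```c
#include <assert.h>
#include <pthread.h>

/* Monitor state: reachable only through the entry procedures below. */
static pthread_mutex_t monitor_lock = PTHREAD_MUTEX_INITIALIZER;
static long balance = 0;

/* Entry procedure: add amount and return the new balance. */
long account_deposit(long amount) {
    assert(amount >= 0);
    pthread_mutex_lock(&monitor_lock);
    balance += amount;
    long result = balance;
    pthread_mutex_unlock(&monitor_lock);
    return result;
}

/* Entry procedure: withdraw if funds suffice; returns 1 on success, else 0. */
int account_withdraw(long amount) {
    assert(amount >= 0);
    pthread_mutex_lock(&monitor_lock);
    int ok = balance >= amount;
    if (ok)
        balance -= amount;
    pthread_mutex_unlock(&monitor_lock);
    return ok;
}

/* Entry procedure: read the current balance. */
long account_balance(void) {
    pthread_mutex_lock(&monitor_lock);
    long result = balance;
    pthread_mutex_unlock(&monitor_lock);
    return result;
}
```

In a language with real monitors the compiler inserts the locking. Here it is a convention that every entry procedure must follow by hand.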

26
Cooperating Processes via Message Passing
  • IPC is best provided by a messaging system
  • Messaging system and shared memory system are not
    mutually exclusive, they can be used
    simultaneously within a single OS or single
    process
  • Two basic operations
  • Send (destination, message)
  • Receive (source, message)
  • Message size: fixed or variable.
  • Real-life analogy: conversation

27
Message Passing

28
Direct Communication
  • Binds the algorithm to process names
  • The sender explicitly names the receiver, or the
    receiver explicitly names the sender
  • Send(P, message)
  • Receive(Q, message)
  • A link is established automatically between every
    pair of processes that want to communicate
  • Processes must know each other's identity
  • One link per pair of processes

29
Indirect Communication
  • send(A, message)      /* send a message to
    mailbox A */
  • receive(A, message)   /* receive a message from
    mailbox A */
  • A mailbox is an abstract object into which
    messages can be placed and from which they can be
    removed.
  • Mailbox is owned either by a process or by the
    system
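
A minimal single-threaded sketch of a mailbox with fixed-size messages, written in C: the mailbox is a small ring buffer, send fails when it is full, and receive fails when it is empty. The names mailbox, mbox_send, mbox_receive, MSG_SIZE, and MAILBOX_SLOTS are illustrative assumptions. A real mailbox shared between processes would also need the mutual exclusion discussed earlier, and would typically block instead of failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MSG_SIZE      32   /* fixed message size in bytes */
#define MAILBOX_SLOTS 4    /* capacity of the mailbox */

struct mailbox {
    char   slots[MAILBOX_SLOTS][MSG_SIZE];
    size_t head;    /* index of the oldest queued message */
    size_t count;   /* number of queued messages */
};

/* send(A, message): place a message into mailbox a; false if it is full. */
bool mbox_send(struct mailbox *a, const char *message) {
    assert(a != NULL && message != NULL);
    if (a->count == MAILBOX_SLOTS)
        return false;
    size_t tail = (a->head + a->count) % MAILBOX_SLOTS;
    strncpy(a->slots[tail], message, MSG_SIZE - 1);
    a->slots[tail][MSG_SIZE - 1] = '\0';   /* truncate over-long messages */
    a->count++;
    return true;
}

/* receive(A, message): remove the oldest message; false if it is empty. */
bool mbox_receive(struct mailbox *a, char message[MSG_SIZE]) {
    assert(a != NULL && message != NULL);
    if (a->count == 0)
        return false;
    memcpy(message, a->slots[a->head], MSG_SIZE);
    a->head = (a->head + 1) % MAILBOX_SLOTS;
    a->count--;
    return true;
}
```

Neither sender nor receiver names the other process here; both name only the mailbox, which is what distinguishes indirect from direct communication.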

30
Fast Mutual Exclusion for Uniprocessors
  • Describes restartable atomic sequences (an
    optimistic mechanism for implementing atomic
    operations on a uniprocessor)
  • Assumes that short, atomic sequences are rarely
    interrupted.
  • Relies on a recovery mechanism.
  • Performance improvements.

31
Motivation for efficient mutual exclusion
  • Modern applications use multiple threads
  • As a program structuring device
  • As a mechanism for portability to multiprocessors
  • As a way to manage I/O and server concurrency
  • Many OSs are built on top of a microkernel
  • Many services are implemented as multithreaded
    user-level applications
  • Even single threaded programs rely on basic OS
    services that are implemented outside the kernel

32
Implementing mutual exclusion on a uniprocessor
  • Pessimistic methods
  • Memory-interlocked instruction
  • Software reservation
  • Kernel emulation
  • Restartable atomic sequences

33
Memory-interlocked instruction
  • Implicitly delays interrupts until the
    instruction completes.
  • Requires special hardware support from the
    processor and bus.
  • The cycle time for an interlocked access is
    several times greater than that for a
    non-interlocked access.

34
Software reservation
  • Explicitly guards against arbitrary interleaving.
  • A thread must register its intent to perform an
    atomic operation, and then wait.
  • Examples
  • Dekker's algorithm
  • Lamport's algorithm
  • Peterson's algorithm

35
Kernel emulation
  • A strictly uniprocessor solution
  • Explicitly disables interrupts during operations
    that must execute atomically.
  • Although it requires no special hardware, its
    runtime cost is high.
  • The kernel must be invoked on every
    synchronization operation

36
Restartable atomic sequence
  • Instead of using a mechanism that guards against
    interrupts, we can instead recognize when an
    interrupt occurs and recover.
  • The recovery process restarts the sequence.
  • Restartable atomic sequences are attractive
    because they:
  • Do not require hardware support.
  • Have a short code path with one load and store
    per atomic read-modify-write.
  • Do not involve the kernel on every atomic
    operation.

37
Implementing restartable atomic sequences
  • Requires kernel support to ensure that a
    suspended thread is resumed at the beginning of
    the sequence.
  • Strategies for implementing kernel support
  • Explicit registration in Mach
  • Designated sequences in Taos

38
Explicit registration in Mach
  • The kernel keeps track of each address space's
    restartable atomic sequence.
  • An application registers the starting address and
    length of the sequence with the kernel.
  • If registration fails
  • The application replaces the restartable atomic
    sequence with code that uses a conventional
    mechanism.
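
The kernel-side recovery step can be sketched in a few lines of C: given the registered start address and length, a suspended thread whose PC lies inside the sequence is rolled back to the start, and any other PC is left alone. This is a user-space illustration under stated assumptions, not the actual Mach interface; the names ras_region and ras_resume_pc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The registered sequence for one address space (hypothetical type). */
struct ras_region {
    uintptr_t start;    /* address of the first instruction */
    size_t    length;   /* length in bytes; 0 means nothing is registered */
};

/* Returns the PC at which a suspended thread should be resumed. */
uintptr_t ras_resume_pc(const struct ras_region *r, uintptr_t pc) {
    assert(r != NULL);
    /* Inside [start, start + length): restart the whole sequence. */
    if (pc >= r->start && pc - r->start < r->length)
        return r->start;
    /* Outside the sequence: resume exactly where the thread stopped. */
    return pc;
}
```

Because the check is a single range comparison, it is cheap enough to run on every suspension, but it also means only one PC range per address space can be recognized.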

39
Costs of explicit registration
  • Cost of subroutine linkage
  • Because the kernel identifies restartable atomic
    sequences by a single PC range per address space,
    they cannot be inlined.
  • Cost of checking return PC
  • The kernel must check the return PC whenever a
    thread is suspended.
  • The savings must make the additional scheduling
    overhead worthwhile.

40
Designated sequences in Taos
  • The kernel must recognize every interrupted
    sequence.
  • Uses a two-stage check to recognize atomic
    sequences.
  • The first stage rejects most interrupted code
    sequences that are not restartable.
  • (the opcode of the suspended instruction is
    used as an index into a hash table containing
    instructions eligible to appear in a restartable
    atomic sequence)
  • The second stage uses another table (indexed by
    opcode)

41
Kernel design considerations
  • Cost of the two-stage check on every thread
    switch
  • Placement of the PC check
  • Mutual exclusion in the kernel

42
Placement of the PC check
  • When should the kernel check/adjust the PC of a
    suspended thread?
  • When it is first suspended.
  • When it is about to be resumed.
  • Detection at user level
  • Whenever a suspended thread is resumed by the
    kernel, it returns to a fixed user-level
    sequence.
  • Determine if the thread was suspended within a
    restartable atomic sequence.
  • (complexity and overhead -- save return address
    to user-level stack at each suspension)

43
Mutual exclusion in the kernel
  • The kernel is itself a client of thread
    management facilities.
  • Two events can trigger a thread switch
  • Page fault
  • Thread preemption
  • Careless ordering of the PC check could lead to
    mutual recursion between the thread scheduler and
    the virtual memory system.

44
The performance
  • R.A.S. vs. kernel emulation vs. software
    reservation
  • Discuss performance at three levels
  • Basic overhead of the various mechanisms.
  • Effect on the performance of common thread
    management operations.
  • Effect of mutual exclusion overhead on the
    performance of several applications.

45
Microbenchmarks
  • Performance is measured with a test that enters
    a critical section (Test-and-Set) in a loop 1M
    times
  • Two versions of Lamport's algorithm (fast and
    meta)

46
Thread management overhead
  • Different thread management packages
  • Two threads alternately using a mutex and a
    condition variable

47
Application performance
  • afs-bench: file-system intensive, like cp
  • Parthenon-n: theorem prover with n threads
  • Procon-64: producer-consumer
  • Thread suspensions for R.A.S of time to check

48
Conclusions
  • R.A.S. represent a common-case approach to
    mutual exclusion on a uniprocessor.
  • R.A.S. are appropriate for uniprocessors that do
    not support memory-interlocked atomic
    instructions.
  • Even on processors that do have hardware support
    for synchronization, R.A.S. may give better
    performance.

49
Next Lecture
  • Distributed systems
  • References
  • Read the first chapter of the book
  • Read "The Anatomy of the Grid: Enabling Scalable
    Virtual Organizations"