Title: Lecture 2
1 Lecture 2
- Topic of the second lecture: basic concepts, continued
- Recap: the topics of the first lecture were
- Introduction to the course
- Brief overview of the problem area
- Start of the basic concepts: fine- and coarse-grained
parallelism, etc.
2 Topics of the second lecture
- Simpler topics
- Architectures (essentially a repetition of parts of the
first lecture)
- Client-server interaction
- Transmission modes
- Metrics
- More complex topics
- Processes, programs and communication types
- Execution order
- Program properties: safety, liveness, fairness
- Mutual exclusion (transactions)
- Virtual time
- Data races
- Memory models
3 Client/Server (1-1)
4 Client/Server (1-N)
5 Example: Web proxy server
6 Client-Server interaction (IV)
7 Peer-to-Peer Coordination
8 Mobile Code Example: Applet
9 Client-Server interaction (I)
10 Client-Server interaction (II)
11 Client-Server interaction (III)
- Asynchronous remote procedure call
12 Transmission modes
- Simplex: traffic on the channel flows in one direction
only (computer -> monitor)
- Half-duplex: traffic on the channel can flow in both
directions, but not simultaneously; sometimes in one
direction, sometimes in the other (police radio)
- Duplex: traffic flows in both directions simultaneously
(telephone)
- Frequency-division
- Time-division
- Synchronous: the channel is divided into time frames.
Each frame has at least as many time slots as there are
logical I/O lines.
- Asynchronous: n lines, m slots per frame; m is chosen
based on statistical analysis.
13 Metrics
- Bandwidth (Mbps or MHz, depending on the context)
- Latency (the time to take a message from A to B;
sometimes round-trip, A-B-A)
- Components: propagation, transmit, queueing
14 Basic Paradigms
- Process: a unit of sequential instruction execution
- Program: a collection of processes
- Process communication: two different ways to go (both
sketched below)
- Shared memory; at the language level we find
- Shared variables
- Semaphores for synchronization
- Mutual exclusion, critical code, monitors/locks
- Message passing
- Local variables for each process
- Send/receive parameters and data
- Remote procedure call
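A minimal Python sketch (my illustration, not from the slides) contrasting the two paradigms: the first pair of threads communicates through a shared variable guarded by a lock, while the second pair keeps only local state and communicates by send/receive over a queue.

    import threading, queue

    # Shared-memory style: both threads access the same variable;
    # the lock provides the mutual exclusion mentioned above.
    counter = 0
    lock = threading.Lock()

    def add_one():
        global counter
        with lock:                     # critical section
            counter += 1

    # Message-passing style: each process keeps local variables and
    # communicates only via send/receive (here: a Queue as the channel).
    channel = queue.Queue()

    def producer():
        channel.put(41)                # "send"

    def consumer(out):
        out.append(channel.get() + 1)  # "receive", then purely local work

    out = []
    threads = [threading.Thread(target=add_one),
               threading.Thread(target=add_one),
               threading.Thread(target=producer),
               threading.Thread(target=consumer, args=(out,))]
    for t in threads: t.start()
    for t in threads: t.join()
    print(counter, out)                # 2 [42]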
15 Reality is Different from the Paradigm
- In shared memory, reading and writing are not atomic,
because of queues and caching effects.
- Message passing works by point-to-point hops and
packetization; there is no direct connection.
- The OS should present to the user one of the simpler
models, so that the user may assume everything works as
in the spec.
- More often than not the implementation is buggy, or
exposes details of a native view that differs from the
spec.
- Sometimes the model is made more complicated in order to
enhance performance and reduce communication (relaxed
consistency).
16 Common Types of Parallel Systems
(Arranged by communication efficiency (bandwidth, latency)
versus scalability / level of parallelism.)
- Multi-threading on a uni-processor (your home PC)
- Multi-threading on a multi-processor (SMP)
- Tightly-coupled parallel computer
(Compaq's ProLiant, SGI's Origin 2000,
IBM's SP/2, Cray's T3D)
- Distributed system (cluster)
- Internet computing (peer-to-peer)
- Traditionally, 1-2 are programmable using shared memory,
3-4 are programmable using message passing, and in 5 the
peer processes communicate with central control only.
- However, things change! Most importantly, recent systems
in 3 move towards presenting a shared memory interface to
a physically distributed system. Is this an indication of
the future?
17 Execution Order
- Process execution is asynchronous: there is no global
beat, no global clock. Each process has a different
execution speed, which may change over time. For an
observer, on the time axis, instruction execution is
ordered in an execution order. Any order is legal.
(Sometimes different processes may observe different
global orders; TBD.)
- The execution order of a single process is called its
program order.
[Figure: instructions of P1 and P2 laid out along the time
axis.]
18 Atomicity of Instruction Execution
Consider: P1 executes INC(i) and P2 executes INC(i).
Does i always end up incremented by 2?
- The atomicity model is important for answering the
question: is my parallel program correct? (See the sketch
below.)
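The danger can be made concrete with a small Python sketch (an illustration with names of my choosing; whether the lost update actually shows up depends on the interpreter's thread switching). Each INC(i) is really a read, an add and a write, and the two threads' triples may interleave.

    import threading

    i = 0

    def inc(times):
        global i
        for _ in range(times):
            tmp = i          # read
            tmp = tmp + 1    # add
            i = tmp          # write -- the other thread may have written in between

    t1 = threading.Thread(target=inc, args=(100_000,))
    t2 = threading.Thread(target=inc, args=(100_000,))
    t1.start(); t2.start(); t1.join(); t2.join()
    # If every INC(i) were atomic the result would always be 200000;
    # with the interleaving above, updates can be lost and i may be smaller.
    print(i)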
19 Program properties, or invariants
- Typically we are interested in
- Safety: bad things cannot happen
- Liveness: the program keeps working, and the necessary
things will eventually happen
- Fairness: if several processes run in parallel, everybody
gets some resources (time and memory)
20 Program Properties: Safety Properties
- Something bad cannot happen
- Are kept throughout the computation, always true
- If one does not hold, we will know within a finite number
of steps
- Example: deadlock freedom
- There is always a process that can execute another
instruction (however, it does not necessarily execute
it).
- Example: mutual exclusion
- It is not allowed for two given code regions (in two
different processes) to execute concurrently.
- Example: if x > y holds, then x > y holds for the rest of
the execution.
- However, mutual exclusion as stated above holds even if
the program does not allow any of the processes to
execute any of the code regions!
21 Liveness Properties
- Something good must happen (in a finite number of steps)
- Guarantee progress in the computation
- Example: no starvation
- Any process that wishes to execute an instruction will
eventually be able to execute it.
- Example: the program/process eventually terminates.
- Example: one of the processes will enter its critical
section.
- (Note the difference from deadlock freedom.)
22 Fairness Properties
- Liveness properties give a relatively weak guarantee of
access to a shared resource.
- Weak fairness: if a process waits on a certain request,
then eventually it will be granted.
- "Eventually" is not good enough for OS and real-time
systems, where response time counts.
- Strong fairness: if the process performs the request
sufficiently frequently, then eventually it will be
granted.
- Linear waiting: if a process performs the request, it
will be granted before any other process is granted
twice.
- FIFO: ... before granting any other process that asked
later.
- Easy to implement in a centralized system. However, in a
distributed system it is not clear what "before" or
"later" mean.
23 Mutual Exclusion
- N processes perform an infinite loop of an instruction
sequence which is composed of a critical section and a
non-critical section.
- Mutual exclusion property: instructions from the critical
sections of two or more processes must not be interleaved
in the (global observer's) execution order.
[Figure: instructions of P1 (marked x) and P2 (marked o)
along the time axis; the parenthesized runs are the critical
sections, which must not overlap.]
24 Mutual exclusion: the solution
- The solution is by way of additional instructions,
executed by every process that is about to enter or leave
its critical section:
- The pre_protocol
- The post_protocol
- Loop
- Non_critical_section
- Pre_protocol
- Critical_section
- Post_protocol
- End_loop
25 The solution must guarantee
- A process cannot stop for an indefinite time in the
critical_section or in the protocols. The solution must
ensure that such a stop in the non_critical_section by
one of the processes does not prevent the other processes
from entering the critical section.
- No deadlock. Several processes may be executing inside
their pre_protocols; eventually, one of them will succeed
in entering the critical_section.
- No starvation. If a process enters its pre_protocol with
the intention to enter the critical section, it will
eventually succeed.
- No self exclusion. In the absence of other processes
trying to enter the critical_section, a single process
will always succeed in doing so in a very short time.
26 Solution, try 1: give them a token to decide whose turn
it is
- Integer Turn := 1
- P1:
- begin
- loop
- non_crit_1
- loop
- exit when Turn = 1
- end loop
- crit_sec_1
- Turn := 2
- end loop
- end P1
P2:
begin
loop
non_crit_2
loop
exit when Turn = 2
end loop
crit_sec_2
Turn := 1
end loop
end P2
(Note: atomic Read/Write is assumed.)
27 Solution, try 2: let's give each process a variable it
can use to announce that it is in its crit_sec
- Integer C1 := 1, C2 := 1
- P1:
- Loop
- non_crit_sec_1
- loop
- exit when C2 = 1
- end loop
- C1 := 0
- crit_sec_1
- C1 := 1
- End Loop
P2:
Loop
non_crit_sec_2
loop
exit when C1 = 1
end loop
C2 := 0
crit_sec_2
C2 := 1
End Loop
Problem: no mutual exclusion. Execution example: P1 sees
C2 = 1; P2 sees C1 = 1; P1 sets C1 := 0; P2 sets C2 := 0; P1
enters its critical section; P2 enters its critical section.
28 Solution, try 3: let's set the announcing variable before
the loop
- Integer C1 := 1, C2 := 1
- P1:
- Loop
- non_crit_sec_1
- C1 := 0
- loop
- exit when C2 = 1
- end loop
- crit_sec_1
- C1 := 1
- End Loop
P2:
Loop
non_crit_sec_2
C2 := 0
loop
exit when C1 = 1
end loop
crit_sec_2
C2 := 1
End Loop
Problem: deadlock. Execution example: P1 sets C1 := 0; P2
sets C2 := 0; P1 checks C2 forever; P2 checks C1 forever.
29 Solution, try 4: let's allow the other process to enter
its crit_sec if we fail to do so
- Integer C1 := 1, C2 := 1
- P1:
- Loop
- non_crit_sec_1
- C1 := 0
- loop
- exit when C2 = 1
- C1 := 1
- C1 := 0
- end loop
- crit_sec_1
- C1 := 1
- End Loop
P2:
Loop
non_crit_sec_2
C2 := 0
loop
exit when C1 = 1
C2 := 1
C2 := 0
end loop
crit_sec_2
C2 := 1
End Loop
Can the other process enter between Ci := 1 and Ci := 0?
Problem: starvation. Between C1 := 1 and C1 := 0, P2 may
complete a full round. Problem: livelock.
30 Dekker's algorithm: let's give processes a priority token
that gives the holder the right of way when competing
- Integer C1 := 1, C2 := 1, Turn := 1
- P1:
- Loop
- non_crit_sec_1
- C1 := 0
- loop
- exit when C2 = 1
- if Turn = 2 then
- C1 := 1
- loop
- exit when Turn = 1
- end loop
- C1 := 0
- end if
- end loop
- crit_sec_1
- C1 := 1
- Turn := 2
- End Loop
P2:
Loop
non_crit_sec_2
C2 := 0
loop
exit when C1 = 1
if Turn = 1 then
C2 := 1
loop
exit when Turn = 2
end loop
C2 := 0
end if
end loop
crit_sec_2
C2 := 1
Turn := 1
End Loop
- The algorithm is correct!
- Suppose P1 is performing inside the insisting loop:
- If C2 = 0, then P1 knows P2 wants to enter its crit_sec.
- If, in addition, Turn = 2, then P1 gives the turn to P2
and waits for P2 to finish.
- Clearly, while P1 does all this, P2 itself will not give
up, because it is its Turn.
- All characteristics of a valid solution hold (see the
sketch below).
31 Bakery Algorithm: mutual exclusion for N processes
- Loop
- non_crit_sec_i
- choosing(i) := 1
- number(i) := 1 + max(number)
- choosing(i) := 0
- for j in 1..N loop
- if j /= i then
- loop
- exit when choosing(j) = 0
- end loop
- loop
- exit when
- number(j) = 0 or
- number(i) < number(j) or
- (number(i) = number(j) and i < j)
- end loop
- end if
- end loop
- crit_sec_i
- number(i) := 0
- End loop
Shared arrays: array(1..N) of integer: Choosing, Number.
Process Pi performs the code above; integer i is the process
id.
The idea is to have processes take tickets with numbers on
them (just like in the city hall or in health care). The
other processes give the turn to the process holding the
ticket with the minimal number (it got there first). If two
tickets happen to be the same, the process with the minimal
id enters. (A Python sketch follows below.)
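A Python sketch of the same algorithm (my transcription; thread ids start at 0 and the shared arrays are plain lists). It relies on the interpreter executing each individual load and store atomically, which is exactly the assumption the bakery algorithm is designed for.

    import sys, threading

    sys.setswitchinterval(0.0005)   # make the busy-wait loops yield sooner

    N = 3
    choosing = [0] * N
    number = [0] * N
    in_crit = 0                     # shared counter touched only inside the critical section

    def bakery(i, rounds):
        global in_crit
        for _ in range(rounds):
            choosing[i] = 1
            number[i] = 1 + max(number)        # take a ticket
            choosing[i] = 0
            for j in range(N):
                if j == i:
                    continue
                while choosing[j] == 1:        # wait until j has finished picking
                    pass
                # wait while j holds a ticket with higher priority
                # (smaller number, or equal number and smaller id)
                while number[j] != 0 and (number[j], j) < (number[i], i):
                    pass
            in_crit += 1                        # critical section
            number[i] = 0                       # return the ticket

    threads = [threading.Thread(target=bakery, args=(i, 300)) for i in range(N)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(in_crit)   # N * 300 = 900 expected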
32 Changing the rules of the game: increasing the atomicity
of load/store
- C: shared variable
- Bi: Pi's private variable
- TS (Test and Set): Bi := C
- C := 1
- CS (Compare and Swap):
- if Bi /= C then
- tmp := C
- C := Bi
- Bi := tmp
- end if
Loop
non_crit_sec_i
loop
TS(Bi)
exit when Bi = 0
end loop
crit_sec_i
C := 0
End loop
Such strong operations are usually supported by the
underlying hardware/OS. (A sketch with a simulated TS
follows below.)
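Python exposes no hardware test-and-set, so the sketch below models TS with a tiny helper whose body is made indivisible by an internal lock; everything around it follows the slide's spin-lock loop (names are mine).

    import sys, threading

    sys.setswitchinterval(0.0005)

    _ts_guard = threading.Lock()   # stands in for the hardware's atomicity of TS
    C = 0                          # shared variable: 0 = free, 1 = taken

    def test_and_set():
        """Atomically return the old value of C and set C to 1."""
        global C
        with _ts_guard:
            old, C = C, 1
            return old

    counter = 0

    def worker(rounds):
        global C, counter
        for _ in range(rounds):
            while test_and_set() != 0:   # spin until we observed C == 0
                pass
            counter += 1                 # critical section
            C = 0                        # release

    threads = [threading.Thread(target=worker, args=(1_000,)) for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(counter)   # 2000 expected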
33 The Price of Atomic load/store, or: why not simply always
use strong operations?
- The set of C must be seen immediately by all other
processors, in case they execute competing code. Since
communication between processors goes via the main
memory, it has to cut through the cache levels. Price:
dozens to hundreds of clock cycles, and growing.
[Figure: processors 1-3 with local caches/registers holding
B0 and B2, L2/L3 caches on the load/store path, and the
shared variable C in main memory; a TS must go all the way
down to C.]
34 Semaphores
- A semaphore is a special variable.
- After initialization, only two atomic operations are
applicable.
- Busy-Wait Semaphore:
- P(S) = WAIT(S): when S > 0 then S := S - 1
- V(S) = SIGNAL(S): S := S + 1
- Another definition, Blocked-Set Semaphore:
- WAIT(S): if S > 0 then S := S - 1
- else wait on S
- SIGNAL(S): if there are processes waiting on S,
- then let one of them proceed,
- else S := S + 1
NOTE: the load/store of S is embedded atomically in both
WAIT and SIGNAL. Thus, mutual exclusion using semaphores is
easy (see the sketch below).
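Python's threading.Semaphore behaves like the blocked-set semaphore above: acquire() is WAIT(S)/P(S) and release() is SIGNAL(S)/V(S). Initialized to 1 it is a binary semaphore, which gives mutual exclusion directly (a sketch, with names of my choosing):

    import threading

    S = threading.Semaphore(1)    # binary semaphore, initialized to 1

    shared = 0

    def worker(rounds):
        global shared
        for _ in range(rounds):
            S.acquire()           # WAIT(S): if S > 0 then S := S - 1, else block on S
            shared += 1           # critical section
            S.release()           # SIGNAL(S): wake a waiter, or S := S + 1

    threads = [threading.Thread(target=worker, args=(50_000,)) for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(shared)   # 100000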
35 Virtual Time
- "Virtual Time and Global States of Distributed Systems",
Friedemann Mattern, 1989
- The model: an asynchronous distributed system is a set of
processes having no shared memory, communicating by
message transfer.
- Message delay > 0, but it is not known in advance.
- A global observer sees the global state at certain points
in time. It can be said to take a snapshot of the global
state.
- A local observer (one of the processes in the system)
sees the local state. Because of the asynchrony, a local
observer can only gather local views into an approximate
global view.
- This is a hard hazard for many management and control
problems:
- Mutual exclusion, deadlock detection, distributed
contracts, leader election, load sharing, checkpointing,
etc.
36 Solution Approaches
- Simulating a synchronous system by an asynchronous one.
This requires a high overhead on global synchronization
of each and every step.
- Simulation of a global state: a snapshot, taken
asynchronously, which is not necessarily correct for any
specific point in time, but is in a way consistent with
the local states of all processes.
- A logical clock, which is not global, but can be used to
derive useful global information. The system works
asynchronously, but the processes make sure to maintain
their part of the clock.
37 Events
- An event is a change in the process state.
- An event happens instantly; it does not take time.
- A process is a sequence of events.
- There are 3 types of events:
- send event: causes a message to be sent
- receive event: causes a message to be received
- local event: only causes an internal change of state
- Events correspond to each other as follows:
- All events in the same process happen sequentially, one
after the other.
- Each send event has a corresponding receive event.
- This allows us to define the "happened before" relation
among events.
38 The Happened Before Relation
We say that event e happened before event e' (and denote it
by e -> e' or e < e') if one of the following properties
holds:
- Process order: e precedes e' in the same process.
- Send-receive: e is a send and e' is the corresponding
receive.
- Transitivity: there exists e'' such that e < e'' and
e'' < e'.
Example
39 Independent/Concurrent Events
Two such diagrams are called equivalent when the "happened
before" relation is the same in both. (When the global time
differs for certain events, think of the process's execution
line as a rubber band.)
Two events e, e' are said to be independent or concurrent
(denoted by e || e') if neither e < e' nor e' < e.
40 Virtual Time (Lamport, 1978)
- A logical clock is a function C: E -> T
- E: a set of events; C(e): the timestamp of e
- T: a partially ordered set such that e < e' implies
C(e) < C(e')
- (The opposite is not necessarily true, e.g. for
concurrent events.)
- Commonly T = N, and there is a local clock Ci for each
process Pi.
- To meet the requirement, the clocks perform the following
protocol:
- (1) Just before executing a local event in Pi:
Ci := Ci + d (d > 0)
- (2) Each message m, sent by an event e = send(m), is
time-stamped t(m) = C(e).
- (3) Just before Pi receives a message with timestamp
t(m): Ci := max(Ci, t(m)) + d (d > 0)
Usually d = 1. However, d may change arbitrarily and
dynamically, say, to reflect actual time. The timestamp of
e, C(e), is given after advancing the clock, i.e. after rule
(1) above has already been performed for e. (A small sketch
follows below.)
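A minimal Python sketch of the protocol with d = 1 (the class and method names are mine):

    class LamportProcess:
        """One process Pi with its local logical clock Ci."""

        def __init__(self, d=1):
            self.clock = 0
            self.d = d

        def local_event(self):
            self.clock += self.d                 # rule (1): advance just before the event
            return self.clock                    # C(e)

        def send(self):
            return self.local_event()            # rule (2): t(m) = C(e) travels with m

        def receive(self, t_m):
            self.clock = max(self.clock, t_m) + self.d   # rule (3)
            return self.clock

    # P1 performs a local event and then sends a message to P2.
    p1, p2 = LamportProcess(), LamportProcess()
    p1.local_event()           # C1 = 1
    t = p1.send()              # C1 = 2, message stamped t(m) = 2
    print(p2.receive(t))       # C2 = max(0, 2) + 1 = 3, so send < receive is preserved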
41 Logical Clocks, Cont'd
Example
[Figure: process P1 with events e11, e12, e13 (C1 = 1, 2, 3),
process P2 with events e21, e22 (C2 = 1, 2), and process P3
with event e31 (C3 = 3).]
A problem: when e and e' are concurrent, any of
C(e) < C(e'), C(e') < C(e), C(e) = C(e') may hold. Thus,
when only the timestamps of the events are known, there is a
loss of information. We do know that C(e) < C(e') implies
not(e' < e). But we do not know whether e < e' or e || e'.
In particular, the information on whether the events are
independent is the most important, and unfortunately it is
lost.
42 What is a Data-Race?
- A data-race is an anomaly of concurrent accesses by two
or more threads to a shared variable, where at least one
of the accesses is a write.
- Example (variable X is global and shared, written out in
Python below):
- Thread 1: X = 1    Thread 2: T = Y
-           Z = 2              T = X
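The same race written out in Python (an illustration; which value Thread 2 reads from X depends on how the threads happen to interleave, which is exactly the non-determinism discussed on the next slide):

    import threading

    X, Y, Z = 0, 0, 0

    def thread1():
        global X, Z
        X = 1                    # write to the shared X ...
        Z = 2

    def thread2(out):
        out["from_Y"] = Y
        out["from_X"] = X        # ... races with this unsynchronized read

    out = {}
    t1 = threading.Thread(target=thread1)
    t2 = threading.Thread(target=thread2, args=(out,))
    t1.start(); t2.start(); t1.join(); t2.join()
    print(out)                   # from_X may be 0 or 1, run to run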
43 Why are Data-Races Undesired?
- Programs which contain data-races usually demonstrate
unexpected and even non-deterministic behavior.
- The outcome might depend on the specific execution order
(a.k.a. thread interleaving).
- Re-running the program may not always produce the same
results.
- Thus, it is hard to debug and hard to write correct
programs.
44 Why are Data-Races Undesired? An Example
- First interleaving: 1. X = 0 (Thread 1); 2. T = X
(Thread 2); 3. X++ (Thread 1)
- Second interleaving: 1. X = 0 (Thread 1); 2. X++
(Thread 1); 3. T = X (Thread 2)
- T = 0 or T = 1?
45 Execution Order
- Each thread has a different execution speed, which may
change over time.
- For an external observer of the time axis, instruction
execution is ordered in an execution order.
- Any order is legal.
- The execution order of a single thread is called its
program order.
46 How Can Data-Races be Prevented? Explicit Synchronization
- Idea: in order to prevent undesired concurrent accesses
to shared locations, we must explicitly synchronize
between threads.
- The means for explicit synchronization are:
- Locks, mutexes and critical sections
- Barriers
- Binary semaphores and counting semaphores
- Monitors
- Single-Writer/Multiple-Readers (SWMR) locks
- Others
47 Synchronization: Bad Bank Account Example
- Thread 1                    Thread 2
- Deposit( amount )           Withdraw( amount )
-   balance += amount           if (balance < amount)
-                                 print( "Error" )
-                               else
-                                 balance -= amount
- Deposit and Withdraw are not atomic!
- What is the final balance after a series of concurrent
deposits and withdraws?
48 Synchronization: Good Bank Account Example
- Thread 1                    Thread 2
- Deposit( amount )           Withdraw( amount )
-   Lock( m )                   Lock( m )
-   balance += amount           if (balance < amount)
-   Unlock( m )                   print( "Error" )
-                               else
-                                 balance -= amount
-                               Unlock( m )
- Since the critical sections can never execute
concurrently, this version exhibits no data-races. (A
Python version using threading.Lock follows below.)
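The same account written in Python, with threading.Lock playing the role of the mutex m (a sketch; the amounts and thread counts are arbitrary):

    import threading

    m = threading.Lock()
    balance = 0

    def deposit(amount):
        global balance
        with m:                      # Lock(m) ... Unlock(m)
            balance += amount

    def withdraw(amount):
        global balance
        with m:
            if balance < amount:
                print("Error")
            else:
                balance -= amount

    threads = ([threading.Thread(target=deposit, args=(10,)) for _ in range(100)] +
               [threading.Thread(target=withdraw, args=(10,)) for _ in range(100)])
    for t in threads: t.start()
    for t in threads: t.join()
    # No update is ever lost and the balance never goes negative;
    # withdrawals that found too little money only printed "Error".
    print(balance)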
49 Is This Enough?
- Theoretically: YES.
- Practically: NO.
- What if the programmer accidentally forgets to place the
correct synchronization?
- How can all such data-race bugs be detected in a large
program?
50 Can Data-Races be Easily Detected? No!
- Unfortunately, the problem of deciding whether a given
program contains potential data-races is computationally
hard!
- There are a lot of execution orders: for t threads of n
instructions each, the number of possible orders is about
t^(n*t). For example, 2 threads of 10 instructions each
already give about 2^20, roughly a million, orders.
- In addition to all the different schedulings, all
possible inputs should be tested as well.
- To compound the problem, inserting detection code into a
program can perturb its execution schedule enough to make
all the errors disappear.
51 Feasible Data-Races
- Feasible data-races: races that are based on the possible
behavior of the program (i.e. the semantics of the
program's computation).
- These are the actual (!) data-races that can possibly
happen in any specific execution.
- Locating feasible data-races requires fully analyzing the
program's semantics to determine whether the execution
could have allowed a and b (accesses to the same shared
variable) to execute concurrently.
52 Apparent Data-Races
- Apparent data-races: approximations (!) of feasible
data-races that are based only on the behavior of the
explicit synchronization performed by some feasible
execution (and not on the semantics of the program's
computation, i.e. ignoring all conditional statements).
- Important, since data-races are usually a result of
improper synchronization. Thus they are easier to detect,
but less accurate.
53 Why a Memory Model?
It answers the question: which writes by a process are seen
by which reads of the other processes?
54 Memory Consistency Models
Example program:
Pi: R(V)  W(V,7)   R(V)  R(V)
Pj: R(V)  W(V,13)  R(V)  R(V)
A consistency/memory model is an agreement between the
execution environment (H/W, OS, middleware) and the
processes. The runtime guarantees to the application certain
properties on the way values written to shared variables
become visible to reads. This determines the memory model:
what is valid, and what is not.
55 Memory Model: Coherence
- Coherence is the memory model in which (the runtime
guarantees to the program that) the writes performed by
the processes to every specific variable are viewed by
all processes in the same full order.
[Figure: an example program and all of its valid executions
under Coherence.]
Note: the view of a process consists of the values it sees
in its reads and the writes it performs. Thus, if an R(V) in
P that comes later than a W(V,x) in P sees a value different
from x, then a later R(V) cannot see x.
56 Formal definition of Coherence
- Program order: the order in which instructions appear in
each process. This is a partial order on all the
instructions in the program.
- A serialization: a full order on all the instructions
(reads/writes) of all the processes which is consistent
with the program order.
- A legal serialization: a serialization in which each read
of X returns the value written by the latest write to X
in the full order.
- Let P be a program, and let P|X be the sub-program of P
which contains all the read X / write X operations on X
only.
- Coherence: P is said to be coherent if for every variable
X there exists a legal serialization of P|X. (Note: a
process cannot distinguish one such serialization from
another for a given execution.)
57 Examples
Process 2: read y,1; write x,1
Coherent. Serializations: for x: write x,1 -> read x,1; for
y: write y,1 -> read y,1.
Process 1: read x,1; write x,2
Process 2: read x,2; write x,1
Not coherent: a cycle of dependencies; cannot be serialized.
Not coherent: cannot be serialized.
58 Sequential Consistency (Lamport, 1979)
- Sequential Consistency is the memory model in which all
reads/writes performed by the processes are viewed by all
processes in the same full order.
[The two example executions on the slide are each coherent
but not sequentially consistent.]
59 Strict (Strong) Memory Models
- Sequential Consistency: given an execution, there exists
an order of all reads/writes which is consistent with all
program orders.
- Coherence: for any variable x, there exists an order of
read x / write x consistent with all program orders.
60 Formal definition of Sequential Consistency
- Let P be a program.
- Sequential Consistency: P is said to be sequentially
consistent if there exists a legal serialization of all
the reads/writes in P.
Observation: every program which is sequentially consistent
is also coherent.
Conclusion: Sequential Consistency has stronger
requirements, and we thus say that it is stronger than
Coherence.
In general: a consistency model A is said to be (strictly)
stronger than B if all executions which are valid under A
are also valid under B.
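The strength of Sequential Consistency can be made concrete with the classic store-buffering test: P1 does x := 1; r1 := y while P2 does y := 1; r2 := x, with x = y = 0 initially. The sketch below (my illustration, not from the slides) enumerates every interleaving that respects both program orders and shows that the outcome r1 = r2 = 0 never appears; a model weaker than Sequential Consistency may permit it.

    from itertools import permutations

    def run(order):
        """Execute one sequentially consistent interleaving and return (r1, r2)."""
        mem = {"x": 0, "y": 0}                        # shared variables
        regs = {"r1": None, "r2": None}               # per-process results
        prog = {
            "P1": [lambda: mem.update(x=1),           # x := 1
                   lambda: regs.update(r1=mem["y"])], # r1 := y
            "P2": [lambda: mem.update(y=1),           # y := 1
                   lambda: regs.update(r2=mem["x"])], # r2 := x
        }
        pc = {"P1": 0, "P2": 0}
        for pid in order:                             # one global full order
            prog[pid][pc[pid]]()
            pc[pid] += 1
        return regs["r1"], regs["r2"]

    orders = set(permutations(["P1", "P1", "P2", "P2"]))   # all orders consistent with program order
    print(sorted({run(o) for o in orders}))
    # [(0, 1), (1, 0), (1, 1)] -- (0, 0) is impossible under Sequential Consistency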