Title: Distributed Shared Memory (CIS825 Project Presentation)
1 Distributed Shared Memory: CIS825 Project Presentation
- Sathish R. Yenna
- Avinash Ponugoti
- Rajaravi Kollarapu
- Yogesh Bharadwaj
- Sethuraman Subramanian
- Nagarjuna Nagulapati
- Manmohan Uttarwar
2 Distributed Shared Memory
- Introduction
- Consistency models
  - Sequential consistency
  - PRAM consistency
  - Release consistency
- Final System
- Performance Evaluation
3 Introduction
- What is shared memory?
  - A memory location or object accessed by two or more processes running on the same machine.
  - A mechanism must be defined for access to the shared location; otherwise unpredictable states will result.
  - Many operating systems provide mechanisms to avoid simultaneous access to shared memory, for example semaphores and monitors.
4 Example: the Reader/Writer problem
- We have a shared buffer into which a writer writes values and from which a reader reads them.
- To avoid overwriting an existing value or reading the same value twice, we need a coordination mechanism.
- We have semaphores/monitors provided by the OS to avoid simultaneous access.
- But what if the writer is writing from one machine and the reader is reading from another machine?

  [Fig: a writer process and a reader process accessing a shared memory buffer]
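On a single machine, the OS mechanisms mentioned above suffice. A minimal sketch (names illustrative) of the one-slot reader/writer buffer using counting semaphores:

```python
# One-slot buffer guarded by two semaphores: the writer never
# overwrites an unread value, the reader never reads a value twice.
import threading

empty = threading.Semaphore(1)   # slot is free for the writer
full = threading.Semaphore(0)    # slot holds an unread value
buffer = []

def writer(values):
    for v in values:
        empty.acquire()          # wait until the reader has consumed
        buffer.append(v)
        full.release()           # signal that a value is available

def reader(n, out):
    for _ in range(n):
        full.acquire()           # wait until the writer has produced
        out.append(buffer.pop())
        empty.release()          # free the slot for the next write

out = []
w = threading.Thread(target=writer, args=([1, 2, 3],))
r = threading.Thread(target=reader, args=(3, out))
w.start(); r.start(); w.join(); r.join()
print(out)   # values arrive exactly once, in order: [1, 2, 3]
```

This coordination relies on both processes sharing one address space; once the reader and writer sit on different machines, only message passing remains, which is the motivation for DSM.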
5 What is distributed shared memory?
- Memory accessed by two or more processes running on different machines connected via a communication network.
- Formal definition:
  - A distributed shared memory system is a pair (P, M), where P is a set of N processors P1, P2, P3, ..., Pn and M is a shared memory.
  - Each processor Pi sequentially executes read and write operations on data items in M in the order defined by the program running on it.
6
- DSM improves the performance of the whole system.
- An abstraction like DSM simplifies application programming.
- BUT:
  - The main problem is how to keep the memory consistent.
  - We don't have traditional semaphores or monitors to control accesses in DSM.
  - One implementation keeps the memory at a central location and allows processes on different machines to access it.
  - We can only use message transmission as an aid to control the accesses.
7
- But networks are slow, so for better performance we have to keep copies of the same variable on various machines.
- Maintaining perfect consistency of all the copies (i.e., any read of a variable x returns the value stored by the most recent write to x) is hard and performs poorly, since the processes are on different machines communicating over a slow network.
- The solution is to accept less-than-perfect consistency as the price of better performance.
- Moreover, many application programs don't require strict consistency.
- For all these reasons, many consistency models have been defined.
8 Consistency Models
- A consistency model is essentially a contract between the software and the memory: if the software agrees to obey certain rules, the memory promises to work correctly.
- In our project we are implementing three of them:
  - Sequential consistency
  - PRAM consistency
  - Release consistency
9 Sequential Consistency
- A system is sequentially consistent if the result of any execution is the same as if:
  - the operations of all the processors were executed in some sequential order, and
  - the operations of each individual processor appear in this sequence in the order specified by its program.
10
- When processes run in parallel on different machines, any valid interleaving is acceptable behavior, but all processes must see the same sequence of memory references.
- Note that nothing is said about time: there is no reference to the "most recent" store.
- It merely guarantees that all processes see all memory references in the same order.
- Two possible results of the same program:

    P1: W(x)1                     P1: W(x)1
    ----------------------        ----------------------
    P2:      R(x)0  R(x)1         P2:      R(x)1  R(x)1
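A quick illustrative check that both outcomes above are sequentially consistent: enumerate all interleavings of P1's write and P2's two reads that preserve each program's order, and collect the read results that can occur.

```python
# Enumerate sequentially consistent outcomes of:
#   P1: W(x)1        P2: R(x); R(x)      (x initially 0)
from itertools import permutations

def outcomes():
    # op 0 is P1's W(x)1; ops 1 and 2 are P2's first and second read
    seen = set()
    for order in permutations(range(3)):
        if order.index(1) > order.index(2):   # keep P2's program order
            continue
        x, reads = 0, []
        for op in order:
            if op == 0:
                x = 1                         # W(x)1
            else:
                reads.append(x)               # R(x)
        seen.add(tuple(reads))
    return seen

print(sorted(outcomes()))   # [(0, 0), (0, 1), (1, 1)]; (1, 0) never occurs
```

Both slide results, (0, 1) and (1, 1), appear in the set; a result like R(x)1 followed by R(x)0 would violate sequential consistency because no single interleaving produces it.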
11 Implementation
- Brown's algorithm
- Each process Pi has a queue INi of invalidation requests.

  W(x)v: Perform all invalidations in the IN queue.
         Update the main memory and the cache.
         Place invalidation requests in the IN queue of every other process.

  R(x):  If x is in the cache, read it from the cache.
         Else:
           Perform all invalidations in INi.
           Read from the main memory.
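The steps above can be sketched in a single-address-space simulation (names and structure illustrative; real message passing is replaced by direct queue access):

```python
# Brown's algorithm: each process keeps a cache and an IN queue of
# invalidation requests; a write drains its own IN queue, updates
# main memory and cache, and enqueues invalidations everywhere else.
main_memory = {}

class Process:
    def __init__(self, procs):
        self.cache = {}
        self.in_queue = []       # pending invalidation requests
        self.procs = procs       # all processes in the system
        procs.append(self)

    def _drain_in_queue(self):
        for x in self.in_queue:
            self.cache.pop(x, None)
        self.in_queue.clear()

    def write(self, x, v):
        self._drain_in_queue()          # perform all invalidations
        main_memory[x] = v              # update main memory ...
        self.cache[x] = v               # ... and the local cache
        for p in self.procs:            # enqueue invalidation at others
            if p is not self:
                p.in_queue.append(x)

    def read(self, x):
        if x in self.cache:             # cache hit: no invalidations
            return self.cache[x]
        self._drain_in_queue()
        self.cache[x] = main_memory[x]  # read from main memory
        return self.cache[x]

procs = []
p1, p2 = Process(procs), Process(procs)
p1.write("x", 1)
assert p2.read("x") == 1
p2.write("x", 2)
p1.write("y", 7)           # this write drains p1's IN queue, evicting x
print(p1.read("x"))        # cache miss at p1, so it fetches 2
```

Note that a cache hit may still return a stale value; sequential consistency permits this because it says nothing about real time, only about a common order.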
12 Problems with Brown's Implementation
- All three operations in W(x)v (updating the cache, updating main memory, and broadcasting the invalidation messages) must be done atomically.
- Ensuring this atomicity requires a robust mechanism involving an agreement by all the processes; a lot of communication overhead is involved.
- For a single write, N invalidation messages are transmitted, where N is the number of processes.
13 Sequentially Consistent DSM Protocol (J. Zhou, M. Mizuno, and G. Singh)
- The DSM system consists of a shared memory module (SMem manager) and a local manager (processor manager) at each machine.
- Each processor manager:
  - handles requests from the user processes to read or write objects
  - communicates with the SMem manager.
- The SMem manager:
  - processes request messages from processor managers to read or write objects.
14 Protocol Description
- SMem manages the following data structures:
  - Object memory M[Object Range]
  - A two-dimensional binary array Hold_Last_Write[Processor Range, Object Range]
- At any time T:
  - Hold_Last_Write[i][x] = 1 means object x in the cache at processor i holds a value written by the last write with respect to T.
  - Hold_Last_Write[i][x] = 0 means object x in the cache at processor i does not hold a value written by the last write with respect to T.
- Each element of Hold_Last_Write is initialized to 0.
- Assume n processors and m objects.
15
- Each processor i maintains the following data structures:
  - A one-dimensional binary array Valid_i[Object Range]:
    - Valid_i[x] = 1: object x in the cache is valid
    - Valid_i[x] = 0: object x in the cache is not valid
  - Each element of Valid_i is initialized to 0.
  - For each object x such that Valid_i[x] = 1, C_i[x] holds the value of x (C_i is the cache memory).
16 Operations at processor i

  Write(x, v):
    send (write, x, v) to SMem
    receive Invalid_array[1..m] from SMem
    Valid_i[1..m] := Invalid_array[1..m]   // element-wise assignment
    C_i[x] := v

  Read(x):
    if Valid_i[x] = 0 then
      send (read, x) to SMem
      receive (v, Invalid_array[1..m]) from SMem
      Valid_i[1..m] := Invalid_array[1..m]
      C_i[x] := v
    endif
    return C_i[x]
17 Operations at SMem

  Process (write, x, v) message from processor i:
    M[x] := v
    Hold_Last_Write[1..n][x] := 0
    Hold_Last_Write[i][x] := 1
    send Hold_Last_Write[i][1..m] to processor i
    /* send processor i's row of Hold_Last_Write to i;
       processor i receives the row in Invalid_array */

  Process (read, x) message from processor i:
    Hold_Last_Write[i][x] := 1
    send (M[x], Hold_Last_Write[i][1..m]) to processor i

- Each procedure is executed atomically.
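A compact single-address-space sketch of this protocol (message exchange replaced by direct calls; names illustrative). SMem tracks, per processor, which cached objects still hold the last-written value, and piggybacks that row on every reply:

```python
# Zhou-Mizuno-Singh SC-DSM protocol, simulated in one process.
N, M_OBJ = 3, 4   # number of processors and objects

class SMem:
    def __init__(self):
        self.M = [0] * M_OBJ
        # hlw[i][x] = 1 iff processor i's cached copy of x is current
        self.hlw = [[0] * M_OBJ for _ in range(N)]

    def write(self, i, x, v):             # executed atomically
        self.M[x] = v
        for j in range(N):
            self.hlw[j][x] = 0            # all other copies of x go stale
        self.hlw[i][x] = 1
        return list(self.hlw[i])          # processor i's validity row

    def read(self, i, x):                 # executed atomically
        self.hlw[i][x] = 1
        return self.M[x], list(self.hlw[i])

class Processor:
    def __init__(self, i, smem):
        self.i, self.smem = i, smem
        self.valid = [0] * M_OBJ
        self.C = [0] * M_OBJ              # local cache

    def write(self, x, v):
        self.valid = self.smem.write(self.i, x, v)
        self.C[x] = v

    def read(self, x):
        if self.valid[x] == 0:            # miss or stale: one round trip
            v, self.valid = self.smem.read(self.i, x)
            self.C[x] = v
        return self.C[x]

smem = SMem()
p = [Processor(i, smem) for i in range(N)]
p[0].write(2, 42)
print(p[1].read(2))   # miss at p1, fetched from SMem: 42
print(p[0].read(2))   # hit at p0: no message needed
```

Each operation is at most one round of exchange with SMem, and no broadcast is needed, which is exactly the advantage claimed on the next slide.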
18 Advantages of the SC-DSM Protocol by J. Zhou, M. Mizuno, and G. Singh
- Read and write operations require the same number of messages, and considerably fewer messages overall:
  - A write operation requires one round of message exchange between the processor and the shared memory.
  - A read operation also requires one round of message exchange with the shared memory, and only if the object is not found in its local cache.
- The protocol does not require an atomic broadcast.
- The protocol does not require any broadcast of messages.
19 Release Consistency
- Sequential and PRAM consistency are restrictive, e.g. for the case when a process is reading or writing some variables inside a critical section (CS).
- Drawback: the memory has no way to differentiate between entering and leaving a CS.
- So release consistency was introduced.
20 Release Consistency
- Three classes of variables:
  - Ordinary variables
  - Shared data variables
  - Synchronization variables: Acquire and Release (for critical sections)
- The DSM has to guarantee the consistency of the shared data variables only. If a shared variable is read without an acquire, the memory has no obligation to return the current value.
21 Protected Variables
- Acquire and release do not have to apply to all of the memory.
- Only specific shared variables may be guarded, in which case these variables are kept consistent and are called protected variables.
- On acquire, the memory makes sure that all the local copies of the protected variables are made consistent; changes are propagated to the other machines on release.
22
    P1: Acq(L) W(x)1 W(x)2 Rel(L)
    P2:                           Acq(L) R(x)2 Rel(L)
    P3:                                  R(x)1

    Fig: Valid event sequence for release consistency.
23 Rules for release consistency
- Before an ordinary access to a shared variable is performed, all previous acquires done by the process must have completed successfully.
- Before a release is allowed to be performed, all previous reads and writes done by the process must have completed.
- The acquire and release accesses must be processor consistent (sequential consistency is not required).
24 Implementation of Release Consistency
- Two types of implementation:
  - Eager release consistency: modified data is broadcast to all other processors at the time of release.
  - Lazy release consistency: a process gets the most recent values of the variables when it tries to acquire them.
25 Our Implementation
- Eager release consistency: all the operations are done locally by the process and then sent to the DSM, which then broadcasts the updated values to all the other processes.
26 Data Structures

Each process Pi maintains the following data structures:

  cache[1..n]       // cache memory
  valid[1..n]       // whether the value in the cache is valid (0/1)
  locked[1..n]      // whether the variable is locked (0/1)
  request[1..m]     // which variables it wants to lock

The Distributed Shared Memory (DSM) maintains the following data structures:

  M[1..n]           // central memory
  lock[1..n]        // to keep track of which variables are locked (0/1)
  whom[1..n]        // locked by which processor
  pending[1..m]     // processes who are yet to be replied to
  invalidate[1..m]  // values processes need to invalidate
27 Operations at Processor Pi

  lock(list of variables):
    send (Pid, ACQUIRE, no_of_variables, request[1..m])
    receive (ACK, received_values)
    for i = 1 to m:
      locked[i] = 1

  read(i):
    if locked[i]:
      return cache[i]
    else if valid[i]:
      return cache[i]
    else:
      send (Pid, READ, i)
      receive (x)
      cache[i] = x
      valid[i] = 1
      return cache[i]
28 Operations at Processor Pi

  write(i, x):
    if locked[i]:
      cache[i] = x
      valid[i] = 1
    else:
      send (Pid, WRITE, i, x)
      cache[i] = x
      valid[i] = 1

  unlock(list of variables):
    send (Pid, RELEASE, locked[1..m], cache[1..m])
    receive (ACK)
    for i = 1 to n:
      locked[i] = 0
29 Operations at DSM

  receive()
  switch (message):
    case READ:
      send(M[i])
      break
    case WRITE:
      M[i] = x
      break
30
    case ACQUIRE:
      /* for all the variable indices in request[1..m],
         check in lock whether they are free */
      for i = 0 to no_of_variables:
        if lock[request[i]] == 0:
          lock[request[i]] = 1
          whom[request[i]] = Pid
          requested_variable_values[i] = M[request[i]]
          continue
        else:
          for j = 0 to i-1:          /* roll back the locks acquired so far */
            lock[request[j]] = 0
            whom[request[j]] = 0
          /* add request[i] to pending */
          pending[Pid][i] = request[i]
          break
      send(requested_variable_values)
      break
31
    case RELEASE:
      /* the locked and cache arrays have been received */
      for i = 0 to no_of_variables:
        M[locked[i]] = cache[i]
        invalidate[i] = locked[i]
      broadcast(invalidate)
      receive(ACK)
      for i = 0 to no_of_variables:
        lock[locked[i]] = 0
        whom[locked[i]] = 0
      send(Pid, ACK)
      check(pending)

  check():
    for i = 0 to n:
      /* if all pending[i] == 1, send(ACK, Pid) */
    break
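The eager scheme above can be sketched in one address space (names illustrative; the real system exchanges middleware messages, and the pending queue for contended locks is omitted here). Writes inside the critical section stay local; on release the DSM applies them centrally and invalidates the other caches:

```python
# Eager release consistency: local writes under a lock, broadcast
# of updates and invalidations at release time.
class DSM:
    def __init__(self):
        self.M = {}
        self.lock_owner = {}
        self.procs = []

    def acquire(self, p, names):
        # simplified: assume no contention (no pending queue)
        assert all(self.lock_owner.get(x) is None for x in names)
        for x in names:
            self.lock_owner[x] = p
        return {x: self.M.get(x, 0) for x in names}   # current values

    def release(self, p, updates):
        for x, v in updates.items():
            self.M[x] = v                 # apply buffered writes centrally
            self.lock_owner[x] = None
        for q in self.procs:              # eager broadcast of invalidations
            if q is not p:
                for x in updates:
                    q.cache.pop(x, None)  # drop stale copies

class Proc:
    def __init__(self, dsm):
        self.dsm, self.cache, self.locked = dsm, {}, set()
        dsm.procs.append(self)

    def lock(self, *names):
        self.cache.update(self.dsm.acquire(self, names))
        self.locked = set(names)

    def write(self, x, v):
        self.cache[x] = v                 # stays local while locked

    def unlock(self):
        self.dsm.release(self, {x: self.cache[x] for x in self.locked})
        self.locked = set()

dsm = DSM()
p1, p2 = Proc(dsm), Proc(dsm)
p1.lock("a"); p1.write("a", 5); p1.unlock()
p2.lock("a")
print(p2.cache["a"])   # p2 acquires after the release and sees 5
```

Because all of a process's updates travel in one release message, the protocol pays the broadcast cost once per critical section rather than once per write.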
32 Sample Execution

Code for P1:
  Lock(a, b, c)
  Write(a)
  Read(b)
  Write(c)
  Write(c)
  Unlock(a, b, c)

Message flow between P1, the DSM, and P2:
  P1 -> DSM : (ACQUIRE, request)
  DSM -> P1 : (ACK, values)              P1 enters the CS
  P1 executes Write(a), Read(b), Write(c), Write(b) locally
  P1 -> DSM : (RELEASE, locked, cache)   P1 exits the CS
  DSM -> P2 : BROADCAST (invalidate)
  P2 -> DSM : ACK
  DSM -> P1 : RELEASE_ACK                P1 leaves the CS
33 Performance Issues
- Knowing the Execution History
- Broadcast overhead can be reduced
- No potential deadlocks
- Operations inside the critical section are atomic
34 PRAM Consistency
- The total ordering of requests leads to inefficiency, due to more data movement and synchronization than a program may really call for.
- PRAM is a more relaxed model than sequential consistency.
35 PRAM (contd.)
- PRAM stands for Pipelined RAM, i.e., pipelined random access.
- Writes done by a single process are received by all the processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
36 Example

    P1: W(x)1
    P2:       R(x)1  W(x)2
    P3:       R(x)1         R(x)2
    P4:       R(x)2         R(x)1

    Fig: Valid sequence of events for PRAM consistency.
37 Weak Restrictions
- Only write operations performed by a single process are required to be viewed by other processes in the order in which they were performed.
- In other terms, all writes generated by different processes are concurrent.
- Only the write order from the same process needs to be consistent, hence the name "pipelined".
- This is a weaker model than the causal model.
38 System Architecture

  [Fig: several processes, each with a local cache, connected through the JavaGroups middleware to the DSM system and its central memory.]
39 Implementation
- The operations performed by the processes are carried out as shown below:

  Write(x):
    Update the local cache value.
    Send the updated value to all the processes.

  Read(x):
    If present in the cache, read it from the cache.
    Else go to main memory for the variable.
40 (continued)
- Whenever a write is carried out, the value is pushed to all the processes; thus the writes done by a process are always seen in the order in which they appear in its program, since each write is broadcast as it occurs.
41 Data Structures
- Central Memory (CM):
  - An array CM of shared variables.
  - We can do read operations and write operations on this array.
  - Array implemented using a Vector.
- Local cache:
  - An array C of type int, of size equal to that of the central memory's.
  - A boolean one-dimensional array V, where V[i] records the validity of the i-th variable.
  - We can do read operations and write operations on the cache.
  - Arrays implemented using Vectors.
42 Pseudo Code
- At Process n:

  Read(in):
    if valid(in):
      fetch element in from the cache vector Vc
    else:
      send read(in, n) to CM
      receive value(in, n) from CM
      update element in in the cache
      set valid(in) = true
    return value(in)
43 Continued

  Write(in, valn):
    write value valn into element in of the cache vector
    send write(in, valn) to CM

  Receive(in, valn):
    write value valn into element in of the cache vector
44 At Central Memory

  Write(index in, value vn):
    write value vn into element in of the vector
    send (in, vn) to all the n processes

  Read(process n, index in):
    fetch element in from the vector
    send value(in) to process n
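The PRAM scheme above can be sketched in one address space (names illustrative). Each write is pushed to every other process over a per-sender FIFO queue, so writes from one process are applied everywhere in issue order, while writes from different senders may interleave differently at different receivers:

```python
# PRAM consistency: per-sender FIFO delivery of write updates.
from collections import deque

class Proc:
    def __init__(self, all_procs):
        self.cache = {}
        self.inbox = {}            # sender id -> FIFO of (var, value)
        self.procs = all_procs
        all_procs.append(self)

    def write(self, x, v):
        self.cache[x] = v
        for p in self.procs:       # push the update to every other process
            if p is not self:
                p.inbox.setdefault(id(self), deque()).append((x, v))

    def apply_from(self, sender, n=1):
        # a receiver drains each sender's queue strictly in FIFO order
        q = self.inbox[id(sender)]
        for _ in range(n):
            x, v = q.popleft()
            self.cache[x] = v

procs = []
p1, p2, p3 = Proc(procs), Proc(procs), Proc(procs)
p1.write("x", 1)
p1.write("x", 2)
p3.apply_from(p1, 2)       # p3 must see W(x)1 before W(x)2
print(p3.cache["x"])       # p1's writes were applied in issue order: 2
```

Because each receiver is free to choose when it drains which sender's queue, two receivers can observe writes from different senders in different orders, which is exactly what PRAM permits and sequential consistency forbids.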
45 Issues
- Easy to implement.
- No guarantee about the order in which different processes see writes, except that writes issued by a particular process must arrive in pipeline order.
- A processor does not have to stall waiting for each write to complete before starting the next one.
46 Final System
- We are using JavaGroups as middleware.
- We have a single group containing all the processes and the central DSM.
- We are using the reliable, FIFO JChannel for the communication between the processes and the DSM.
- We need only two types of communication, unicast and broadcast, both of which are efficiently provided by JChannel.
47
- DSM initialization:
  - The DSM is given an argument saying which consistency level it should provide to the processes.
- Process initialization: when a process starts execution, it
  - sends a message to the DSM inquiring about the consistency level provided by the DSM,
  - waits for the response,
  - initializes the variables related to that consistency level so as to use the corresponding library for communicating with the DSM.
48
- In order to connect to the system, each process should know:
  - the group address/group name
  - the central DSM address
- Scalable.
- Easy to connect, with just one round of messages.
- Less load on the network.
49 Performance Evaluation
- We plan to test the performance of each consistency level with a large number of processes accessing the shared memory.
- We will calculate the write-cycle and read-cycle times for each consistency level at the application level.
- We will compare our implementations of the consistency levels using the above criteria.
50 References
- Brown, G. Asynchronous multicaches. Distributed Computing, 4:31-36, 1990.
- Mizuno, M., Raynal, M., and Zhou, J.Z. Sequential consistency in distributed systems.
- Zhou, J., Mizuno, M., and Singh, G. A Sequentially Consistent Distributed Shared Memory.
- Tanenbaum, Andrew S. Distributed Operating Systems.
- www.javagroups.com
- www.cis.ksu.edu/singh