Title: Distributed Shared Memory (CIS825 Project Presentation)
1 Distributed Shared Memory: CIS825 Project Presentation
- Sathish R. Yenna
- Avinash Ponugoti
- Rajaravi Kollarapu
- Yogesh Bharadwaj
- Sethuraman Subramanian
- Nagarjuna Nagulapati
- Manmohan Uttarwar
2 Distributed Shared Memory
- Introduction
- Consistency models
  - Sequential consistency
  - PRAM consistency
  - Release consistency
- Final System
- Performance Evaluation
3 Introduction
- What is shared memory?
  - A memory location or object accessed by two or more processes running on the same machine.
  - A mechanism must be defined for access to the shared location; otherwise unpredictable states will result.
  - Many operating systems provide mechanisms to avoid simultaneous access to shared memory, for example semaphores and monitors.
4 Example: the Reader/Writer problem
- We have a shared buffer into which a writer writes values and from which a reader reads them.
- To avoid overwriting an existing value or reading the same value twice, we need a coordination mechanism.
- We have semaphores/monitors provided by the OS to avoid simultaneous access.
- But what if the writer is writing from one machine and the reader is reading from another machine?

  [Fig: a writer process and a reader process accessing a shared memory buffer]
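On a single machine, the OS mechanisms mentioned above suffice. A minimal sketch (names illustrative) of the one-slot reader/writer buffer using counting semaphores:

```python
# One-slot buffer guarded by two semaphores: the writer never
# overwrites an unread value, the reader never reads a value twice.
import threading

empty = threading.Semaphore(1)   # slot is free for the writer
full = threading.Semaphore(0)    # slot holds an unread value
buffer = []

def writer(values):
    for v in values:
        empty.acquire()          # wait until the reader has consumed
        buffer.append(v)
        full.release()           # signal that a value is available

def reader(n, out):
    for _ in range(n):
        full.acquire()           # wait until the writer has produced
        out.append(buffer.pop())
        empty.release()          # free the slot for the next write

out = []
w = threading.Thread(target=writer, args=([1, 2, 3],))
r = threading.Thread(target=reader, args=(3, out))
w.start(); r.start(); w.join(); r.join()
print(out)   # values arrive exactly once, in order: [1, 2, 3]
```

This coordination relies on both processes sharing one address space; once the reader and writer sit on different machines, only message passing remains, which is the motivation for DSM.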
5 What is distributed shared memory?
- Memory accessed by two or more processes running on different machines connected via a communication network.
- Formal definition:
  - A distributed shared memory system is a pair (P, M), where P is a set of N processors P1, P2, P3, ..., Pn and M is a shared memory.
  - Each processor Pi sequentially executes read and write operations on data items in M in the order defined by the program running on it.
6
- DSM improves the performance of the whole system.
- An abstraction like DSM simplifies application programming.
- BUT:
  - The main problem is how to keep the memory consistent.
  - We don't have traditional semaphores or monitors to control accesses in DSM.
  - One implementation keeps the memory at a central location and allows processes on different machines to access it.
  - We can only use message transmission as an aid to control the accesses.
7
- But networks are slow, so for better performance we have to keep copies of the same variable on various machines.
- Maintaining perfect consistency of all the copies (i.e., any read of a variable x returns the value stored by the most recent write to x) is hard and performs poorly, since the processes are on different machines communicating over a slow network.
- The solution is to accept less-than-perfect consistency as the price of better performance.
- Moreover, many application programs don't require strict consistency.
- For all these reasons, many consistency models have been defined.
8 Consistency Models
- A consistency model is essentially a contract between the software and the memory: if the software agrees to obey certain rules, the memory promises to work correctly.
- In our project we are implementing three of them:
  - Sequential consistency
  - PRAM consistency
  - Release consistency
9 Sequential Consistency
- A system is sequentially consistent if the result of any execution is the same as if:
  - the operations of all the processors were executed in some sequential order, and
  - the operations of each individual processor appear in this sequence in the order specified by its program.
10
- When processes run in parallel on different machines, any valid interleaving is acceptable behavior, but all processes must see the same sequence of memory references.
- Note that nothing is said about time: there is no reference to the "most recent" store.
- It merely guarantees that all processes see all memory references in the same order.
- Two possible results of the same program:

    P1: W(x)1                     P1: W(x)1
    ----------------------        ----------------------
    P2:      R(x)0  R(x)1         P2:      R(x)1  R(x)1
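A quick illustrative check that both outcomes above are sequentially consistent: enumerate all interleavings of P1's write and P2's two reads that preserve each program's order, and collect the read results that can occur.

```python
# Enumerate sequentially consistent outcomes of:
#   P1: W(x)1        P2: R(x); R(x)      (x initially 0)
from itertools import permutations

def outcomes():
    # op 0 is P1's W(x)1; ops 1 and 2 are P2's first and second read
    seen = set()
    for order in permutations(range(3)):
        if order.index(1) > order.index(2):   # keep P2's program order
            continue
        x, reads = 0, []
        for op in order:
            if op == 0:
                x = 1                         # W(x)1
            else:
                reads.append(x)               # R(x)
        seen.add(tuple(reads))
    return seen

print(sorted(outcomes()))   # [(0, 0), (0, 1), (1, 1)]; (1, 0) never occurs
```

Both slide results, (0, 1) and (1, 1), appear in the set; a result like R(x)1 followed by R(x)0 would violate sequential consistency because no single interleaving produces it.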
11 Implementation
- Brown's algorithm
- Each process Pi has a queue INi of invalidation requests.

  W(x)v: Perform all invalidations in the IN queue.
         Update the main memory and the cache.
         Place invalidation requests in the IN queue of every other process.

  R(x):  If x is in the cache, read it from the cache.
         Else:
           Perform all invalidations in INi.
           Read from the main memory.
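The steps above can be sketched in a single-address-space simulation (names and structure illustrative; real message passing is replaced by direct queue access):

```python
# Brown's algorithm: each process keeps a cache and an IN queue of
# invalidation requests; a write drains its own IN queue, updates
# main memory and cache, and enqueues invalidations everywhere else.
main_memory = {}

class Process:
    def __init__(self, procs):
        self.cache = {}
        self.in_queue = []       # pending invalidation requests
        self.procs = procs       # all processes in the system
        procs.append(self)

    def _drain_in_queue(self):
        for x in self.in_queue:
            self.cache.pop(x, None)
        self.in_queue.clear()

    def write(self, x, v):
        self._drain_in_queue()          # perform all invalidations
        main_memory[x] = v              # update main memory ...
        self.cache[x] = v               # ... and the local cache
        for p in self.procs:            # enqueue invalidation at others
            if p is not self:
                p.in_queue.append(x)

    def read(self, x):
        if x in self.cache:             # cache hit: no invalidations
            return self.cache[x]
        self._drain_in_queue()
        self.cache[x] = main_memory[x]  # read from main memory
        return self.cache[x]

procs = []
p1, p2 = Process(procs), Process(procs)
p1.write("x", 1)
assert p2.read("x") == 1
p2.write("x", 2)
p1.write("y", 7)           # this write drains p1's IN queue, evicting x
print(p1.read("x"))        # cache miss at p1, so it fetches 2
```

Note that a cache hit may still return a stale value; sequential consistency permits this because it says nothing about real time, only about a common order.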
12 Problems with Brown's Implementation
- All three operations in W(x)v (updating the cache, updating main memory, and broadcasting the invalidation messages) must be done atomically.
- Ensuring this atomicity requires a robust mechanism involving an agreement by all the processes; a lot of communication overhead is involved.
- For a single write, N invalidation messages are transmitted, where N is the number of processes.
13 Sequentially Consistent DSM Protocol (J. Zhou, M. Mizuno, and G. Singh)
- The DSM system consists of a shared memory module (SMem manager) and a local manager (processor manager) at each machine.
- Each processor manager:
  - handles requests from the user processes to read or write objects
  - communicates with the SMem manager.
- The SMem manager:
  - processes request messages from processor managers to read or write objects.
14 Protocol Description
- SMem manages the following data structures:
  - Object memory M[Object Range]
  - A two-dimensional binary array Hold_Last_Write[Processor Range, Object Range]
- At any time T:
  - Hold_Last_Write[i][x] = 1 means object x in the cache at processor i holds a value written by the last write with respect to T.
  - Hold_Last_Write[i][x] = 0 means object x in the cache at processor i does not hold a value written by the last write with respect to T.
- Each element of Hold_Last_Write is initialized to 0.
- Assume n processors and m objects.
15
- Each processor i maintains the following data structures:
  - A one-dimensional binary array Valid_i[Object Range]:
    - Valid_i[x] = 1: object x in the cache is valid
    - Valid_i[x] = 0: object x in the cache is not valid
  - Each element of Valid_i is initialized to 0.
  - For each object x such that Valid_i[x] = 1, C_i[x] holds the value of x (C_i is the cache memory).
16 Operations at processor i

  Write(x, v):
    send (write, x, v) to SMem
    receive Invalid_array[1..m] from SMem
    Valid_i[1..m] := Invalid_array[1..m]   // element-wise assignment
    C_i[x] := v

  Read(x):
    if Valid_i[x] = 0 then
      send (read, x) to SMem
      receive (v, Invalid_array[1..m]) from SMem
      Valid_i[1..m] := Invalid_array[1..m]
      C_i[x] := v
    endif
    return C_i[x]
17 Operations at SMem

  Process (write, x, v) message from processor i:
    M[x] := v
    Hold_Last_Write[1..n][x] := 0
    Hold_Last_Write[i][x] := 1
    send Hold_Last_Write[i][1..m] to processor i
    /* send processor i's row of Hold_Last_Write to i;
       processor i receives the row in Invalid_array */

  Process (read, x) message from processor i:
    Hold_Last_Write[i][x] := 1
    send (M[x], Hold_Last_Write[i][1..m]) to processor i

- Each procedure is executed atomically.
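A compact single-address-space sketch of this protocol (message exchange replaced by direct calls; names illustrative). SMem tracks, per processor, which cached objects still hold the last-written value, and piggybacks that row on every reply:

```python
# Zhou-Mizuno-Singh SC-DSM protocol, simulated in one process.
N, M_OBJ = 3, 4   # number of processors and objects

class SMem:
    def __init__(self):
        self.M = [0] * M_OBJ
        # hlw[i][x] = 1 iff processor i's cached copy of x is current
        self.hlw = [[0] * M_OBJ for _ in range(N)]

    def write(self, i, x, v):             # executed atomically
        self.M[x] = v
        for j in range(N):
            self.hlw[j][x] = 0            # all other copies of x go stale
        self.hlw[i][x] = 1
        return list(self.hlw[i])          # processor i's validity row

    def read(self, i, x):                 # executed atomically
        self.hlw[i][x] = 1
        return self.M[x], list(self.hlw[i])

class Processor:
    def __init__(self, i, smem):
        self.i, self.smem = i, smem
        self.valid = [0] * M_OBJ
        self.C = [0] * M_OBJ              # local cache

    def write(self, x, v):
        self.valid = self.smem.write(self.i, x, v)
        self.C[x] = v

    def read(self, x):
        if self.valid[x] == 0:            # miss or stale: one round trip
            v, self.valid = self.smem.read(self.i, x)
            self.C[x] = v
        return self.C[x]

smem = SMem()
p = [Processor(i, smem) for i in range(N)]
p[0].write(2, 42)
print(p[1].read(2))   # miss at p1, fetched from SMem: 42
print(p[0].read(2))   # hit at p0: no message needed
```

Each operation is at most one round of exchange with SMem, and no broadcast is needed, which is exactly the advantage claimed on the next slide.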
18 Advantages of the SC-DSM Protocol by J. Zhou, M. Mizuno, and G. Singh
- Read and write operations require the same number of messages, and considerably fewer messages overall:
  - A write operation requires one round of message exchange between the processor and the shared memory.
  - A read operation also requires one round of message exchange with the shared memory, and only if the object is not found in its local cache.
- The protocol does not require an atomic broadcast.
- The protocol does not require any broadcast of messages.
19 Release Consistency
- Sequential and PRAM consistency are restrictive, e.g. for the case when a process is reading or writing some variables inside a critical section (CS).
- Drawback: the memory has no way to differentiate between entering and leaving a CS.
- So release consistency was introduced.
20 Release Consistency
- Three classes of variables:
  - Ordinary variables
  - Shared data variables
  - Synchronization variables: Acquire and Release (for critical sections)
- The DSM has to guarantee the consistency of the shared data variables only. If a shared variable is read without an acquire, the memory has no obligation to return the current value.
21 Protected Variables
- Acquire and release do not have to apply to all of the memory.
- Only specific shared variables may be guarded, in which case these variables are kept consistent and are called protected variables.
- On acquire, the memory makes sure that all the local copies of the protected variables are made consistent; changes are propagated to the other machines on release.
22
    P1: Acq(L) W(x)1 W(x)2 Rel(L)
    P2:                           Acq(L) R(x)2 Rel(L)
    P3:                                  R(x)1

    Fig: Valid event sequence for release consistency.
23 Rules for release consistency
- Before an ordinary access to a shared variable is performed, all previous acquires done by the process must have completed successfully.
- Before a release is allowed to be performed, all previous reads and writes done by the process must have completed.
- The acquire and release accesses must be processor consistent (sequential consistency is not required).
24 Implementation of Release Consistency
- Two types of implementation:
  - Eager release consistency: modified data is broadcast to all other processors at the time of release.
  - Lazy release consistency: a process gets the most recent values of the variables when it tries to acquire them.
25 Our Implementation
- Eager release consistency: all the operations are done locally by the process and then sent to the DSM, which then broadcasts the updated values to all the other processes.
26 Data Structures

Each process Pi maintains the following data structures:

  cache[1..n]       // cache memory
  valid[1..n]       // whether the value in the cache is valid (0/1)
  locked[1..n]      // whether the variable is locked (0/1)
  request[1..m]     // which variables it wants to lock

The Distributed Shared Memory (DSM) maintains the following data structures:

  M[1..n]           // central memory
  lock[1..n]        // to keep track of which variables are locked (0/1)
  whom[1..n]        // locked by which processor
  pending[1..m]     // processes who are yet to be replied to
  invalidate[1..m]  // values processes need to invalidate
27 Operations at Processor Pi

  lock(list of variables):
    send (Pid, ACQUIRE, no_of_variables, request[1..m])
    receive (ACK, received_values)
    for i = 1 to m:
      locked[i] = 1

  read(i):
    if locked[i]:
      return cache[i]
    else if valid[i]:
      return cache[i]
    else:
      send (Pid, READ, i)
      receive (x)
      cache[i] = x
      valid[i] = 1
      return cache[i]
28 Operations at Processor Pi

  write(i, x):
    if locked[i]:
      cache[i] = x
      valid[i] = 1
    else:
      send (Pid, WRITE, i, x)
      cache[i] = x
      valid[i] = 1

  unlock(list of variables):
    send (Pid, RELEASE, locked[1..m], cache[1..m])
    receive (ACK)
    for i = 1 to n:
      locked[i] = 0
29 Operations at DSM

  receive()
  switch (message):
    case READ:
      send(M[i])
      break
    case WRITE:
      M[i] = x
      break
30
    case ACQUIRE:
      /* for all the variable indices in request[1..m],
         check in lock whether they are free */
      for i = 0 to no_of_variables:
        if lock[request[i]] == 0:
          lock[request[i]] = 1
          whom[request[i]] = Pid
          requested_variable_values[i] = M[request[i]]
          continue
        else:
          for j = 0 to i-1:          /* roll back the locks acquired so far */
            lock[request[j]] = 0
            whom[request[j]] = 0
          /* add request[i] to pending */
          pending[Pid][i] = request[i]
          break
      send(requested_variable_values)
      break
31
    case RELEASE:
      /* the locked and cache arrays have been received */
      for i = 0 to no_of_variables:
        M[locked[i]] = cache[i]
        invalidate[i] = locked[i]
      broadcast(invalidate)
      receive(ACK)
      for i = 0 to no_of_variables:
        lock[locked[i]] = 0
        whom[locked[i]] = 0
      send(Pid, ACK)
      check(pending)

  check():
    for i = 0 to n:
      /* if all pending[i] == 1, send(ACK, Pid) */
    break
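The eager scheme above can be sketched in one address space (names illustrative; the real system exchanges middleware messages, and the pending queue for contended locks is omitted here). Writes inside the critical section stay local; on release the DSM applies them centrally and invalidates the other caches:

```python
# Eager release consistency: local writes under a lock, broadcast
# of updates and invalidations at release time.
class DSM:
    def __init__(self):
        self.M = {}
        self.lock_owner = {}
        self.procs = []

    def acquire(self, p, names):
        # simplified: assume no contention (no pending queue)
        assert all(self.lock_owner.get(x) is None for x in names)
        for x in names:
            self.lock_owner[x] = p
        return {x: self.M.get(x, 0) for x in names}   # current values

    def release(self, p, updates):
        for x, v in updates.items():
            self.M[x] = v                 # apply buffered writes centrally
            self.lock_owner[x] = None
        for q in self.procs:              # eager broadcast of invalidations
            if q is not p:
                for x in updates:
                    q.cache.pop(x, None)  # drop stale copies

class Proc:
    def __init__(self, dsm):
        self.dsm, self.cache, self.locked = dsm, {}, set()
        dsm.procs.append(self)

    def lock(self, *names):
        self.cache.update(self.dsm.acquire(self, names))
        self.locked = set(names)

    def write(self, x, v):
        self.cache[x] = v                 # stays local while locked

    def unlock(self):
        self.dsm.release(self, {x: self.cache[x] for x in self.locked})
        self.locked = set()

dsm = DSM()
p1, p2 = Proc(dsm), Proc(dsm)
p1.lock("a"); p1.write("a", 5); p1.unlock()
p2.lock("a")
print(p2.cache["a"])   # p2 acquires after the release and sees 5
```

Because all of a process's updates travel in one release message, the protocol pays the broadcast cost once per critical section rather than once per write.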
32 Sample Execution

Code for P1:
  Lock(a, b, c)
  Write(a)
  Read(b)
  Write(c)
  Write(c)
  Unlock(a, b, c)

Message flow between P1, the DSM, and P2:
  P1 -> DSM : (ACQUIRE, request)
  DSM -> P1 : (ACK, values)              P1 enters the CS
  P1 executes Write(a), Read(b), Write(c), Write(b) locally
  P1 -> DSM : (RELEASE, locked, cache)   P1 exits the CS
  DSM -> P2 : BROADCAST (invalidate)
  P2 -> DSM : ACK
  DSM -> P1 : RELEASE_ACK                P1 leaves the CS
33 Performance Issues
- Knowing the Execution History
- Broadcast overhead can be reduced
- No potential deadlocks
- Operations inside the critical section are atomic
34 PRAM Consistency
- The total ordering of requests leads to inefficiency, due to more data movement and synchronization than a program may really call for.
- PRAM is a more relaxed model than sequential consistency.
35 PRAM (contd.)
- PRAM stands for Pipelined RAM, i.e., pipelined random access.
- Writes done by a single process are received by all the processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
36 Example

    P1: W(x)1
    P2:       R(x)1  W(x)2
    P3:       R(x)1         R(x)2
    P4:       R(x)2         R(x)1

    Fig: Valid sequence of events for PRAM consistency.
37 Weak Restrictions
- Only write operations performed by a single process are required to be viewed by other processes in the order in which they were performed.
- In other terms, all writes generated by different processes are concurrent.
- Only the write order from the same process needs to be consistent, hence the name "pipelined".
- This is a weaker model than the causal model.
38 System Architecture

  [Fig: several processes, each with a local cache, connected through the JavaGroups middleware to the DSM system and its central memory.]
39 Implementation
- The operations performed by the processes are carried out as shown below:

  Write(x):
    Update the local cache value.
    Send the updated value to all the processes.

  Read(x):
    If present in the cache, read it from the cache.
    Else go to main memory for the variable.
40 (continued)
- Whenever a write is carried out, the value is pushed to all the processes; thus the writes done by a process are always seen in the order in which they appear in its program, since each write is broadcast as it occurs.
41 Data Structures
- Central Memory (CM):
  - An array CM of shared variables.
  - We can do read operations and write operations on this array.
  - Array implemented using a Vector.
- Local cache:
  - An array C of type int, of size equal to that of the central memory's.
  - A boolean one-dimensional array V, where V[i] records the validity of the i-th variable.
  - We can do read operations and write operations on the cache.
  - Arrays implemented using Vectors.
42 Pseudo Code
- At Process n:

  Read(in):
    if valid(in):
      fetch element in from the cache vector Vc
    else:
      send read(in, n) to CM
      receive value(in, n) from CM
      update element in in the cache
      set valid(in) = true
    return value(in)
43 Continued

  Write(in, valn):
    write value valn into element in of the cache vector
    send write(in, valn) to CM

  Receive(in, valn):
    write value valn into element in of the cache vector
44 At Central Memory

  Write(index in, value vn):
    write value vn into element in of the vector
    send (in, vn) to all the n processes

  Read(process n, index in):
    fetch element in from the vector
    send value(in) to process n
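The PRAM scheme above can be sketched in one address space (names illustrative). Each write is pushed to every other process over a per-sender FIFO queue, so writes from one process are applied everywhere in issue order, while writes from different senders may interleave differently at different receivers:

```python
# PRAM consistency: per-sender FIFO delivery of write updates.
from collections import deque

class Proc:
    def __init__(self, all_procs):
        self.cache = {}
        self.inbox = {}            # sender id -> FIFO of (var, value)
        self.procs = all_procs
        all_procs.append(self)

    def write(self, x, v):
        self.cache[x] = v
        for p in self.procs:       # push the update to every other process
            if p is not self:
                p.inbox.setdefault(id(self), deque()).append((x, v))

    def apply_from(self, sender, n=1):
        # a receiver drains each sender's queue strictly in FIFO order
        q = self.inbox[id(sender)]
        for _ in range(n):
            x, v = q.popleft()
            self.cache[x] = v

procs = []
p1, p2, p3 = Proc(procs), Proc(procs), Proc(procs)
p1.write("x", 1)
p1.write("x", 2)
p3.apply_from(p1, 2)       # p3 must see W(x)1 before W(x)2
print(p3.cache["x"])       # p1's writes were applied in issue order: 2
```

Because each receiver is free to choose when it drains which sender's queue, two receivers can observe writes from different senders in different orders, which is exactly what PRAM permits and sequential consistency forbids.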
45 Issues
- Easy to implement.
- No guarantee about the order in which different processes see writes, except that writes issued by a particular process must arrive in pipeline order.
- A processor does not have to stall waiting for each write to complete before starting the next one.
46 Final System
- We are using JavaGroups as middleware.
- We have a single group containing all the processes and the central DSM.
- We are using the reliable, FIFO JChannel for the communication between the processes and the DSM.
- We need only two types of communication, unicast and broadcast, both of which are efficiently provided by JChannel.
47
- DSM initialization:
  - The DSM is given an argument saying which consistency level it should provide to the processes.
- Process initialization: when a process starts execution, it
  - sends a message to the DSM inquiring about the consistency level provided by the DSM,
  - waits for the response,
  - initializes the variables related to that consistency level so as to use the corresponding library for communicating with the DSM.
48
- In order to connect to the system, each process should know:
  - the group address/group name
  - the central DSM address
- Scalable.
- Easy to connect, with just one round of messages.
- Less load on the network.
49 Performance Evaluation
- We plan to test the performance of each consistency level with a large number of processes accessing the shared memory.
- We will calculate the write-cycle and read-cycle times for each consistency level at the application level.
- We will compare our implementations of the consistency levels using the above criteria.
50 References
- Brown, G. Asynchronous multicaches. Distributed Computing, 4:31-36, 1990.
- Mizuno, M., Raynal, M., and Zhou, J.Z. Sequential consistency in distributed systems.
- Zhou, J., Mizuno, M., and Singh, G. A Sequentially Consistent Distributed Shared Memory.
- Tanenbaum, Andrew S. Distributed Operating Systems.
- www.javagroups.com
- www.cis.ksu.edu/singh