Title: TECHNIQUES FOR REDUCING CONSISTENCY-RELATED COMMUNICATION IN DISTRIBUTED SHARED-MEMORY SYSTEMS
1TECHNIQUES FOR REDUCING CONSISTENCY-RELATED
COMMUNICATION IN DISTRIBUTED SHARED-MEMORY SYSTEMS
- J. B. Carter, University of Utah
- J. K. Bennett and W. Zwaenepoel, Rice University
2INTRODUCTION
- Distributed shared memory (DSM) is a software
abstraction allowing a set of workstations
connected by a LAN to share a single paged
virtual address space
- The key issue in building a software DSM is
minimizing the amount of data communication among
the workstation memories
3Why bother with DSM?
- Key idea is to build fast parallel computers that
- are cheaper than conventional architectures
- are convenient to use
- Conventional parallel computer architecture was
the shared memory multiprocessor
4Conventional parallel architecture
[Figure: four CPUs connected to one shared memory]
5Today's architecture
- Clusters of workstations are much more
cost-effective
- No need to develop complex bus and cache
structures
- Can use off-the-shelf networking hardware
- Gigabit Ethernet
- Myrinet (1.5 Gb/s)
- Can quickly integrate newest microprocessors
6Limitations of cluster approach
- Communication within a cluster of workstations is
through message passing
- Much harder to program than concurrent access to
a shared memory
- Many big programs were written for shared memory
architectures
- Converting them to a message passing architecture
is a nightmare
7Distributed shared memory
[Figure: the main memories of the workstations,
presented by the DSM as one shared global address
space]
8Distributed shared memory
- DSM makes a cluster of workstations look like a
shared memory parallel computer
- Easier to write new programs
- Easier to port existing programs
- Key problem is that DSM only provides the
illusion of having a shared memory architecture
- Data must still move back and forth among the
workstations
9Characterizing a DSM (I)
- Four important issues
- 1. Size of transfer units (level of granularity)
- Big units are more efficient
- Virtual memory pages
- Can have false sharing whenever a page contains
different variables that are accessed at the same
time by different processors
10False Sharing
[Figure: one processor accesses x while another
accesses y; x and y are on the same page]
- The page containing x and y will move back and
forth between the main memories of the workstations
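The ping-pong effect can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not from the slides: each page has a single owner, every write by a non-owner moves the page, and the function name `count_page_transfers` is made up for the illustration.

```python
def count_page_transfers(accesses, page_of):
    """Count page moves when each page has a single owner at a time.

    accesses: list of (processor, variable) writes, in order.
    page_of:  dict mapping each variable to the page that holds it.
    """
    owner = {}       # page -> processor currently holding the page
    transfers = 0
    for processor, variable in accesses:
        page = page_of[variable]
        if page in owner and owner[page] != processor:
            transfers += 1   # the page must move to the writer
        owner[page] = processor
    return transfers


# Processor A writes x and processor B writes y, alternating five times.
accesses = [("A", "x"), ("B", "y")] * 5

same_page = count_page_transfers(accesses, {"x": 0, "y": 0})
separate_pages = count_page_transfers(accesses, {"x": 0, "y": 1})
print(same_page, separate_pages)  # prints "9 0"
```

Although A and B never touch the same variable, placing x and y on one page forces a transfer on almost every access; placing them on separate pages removes all of them.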
11Characterizing a DSM (II)
- 2. Consistency model
- Strict consistency is not possible
- Various authors have proposed weak consistency
models
- Cheaper to implement
- Harder to use in a correct fashion
12Characterizing a DSM (III)
- 3. Portability of programs
- Some DSMs allow programs written for a
multiprocessor architecture to run on a cluster
of workstations without any modifications (dusty
decks)
- More efficient DSMs require more changes
- 4. Portability of DSM
- Some DSMs require specific OS features
13MUNIN
- Developed at Rice University
- Based on software objects (variables)
- Uses the processor's virtual memory to detect
accesses to the shared objects
- Includes several techniques for reducing
consistency-related communication
- Only runs on top of the V kernel
14Key features
- Software release consistency only requires the
memory to be consistent at specific
synchronization points
- Multiple consistency protocols allow the user to
select the best consistency protocol for each
data item
- Write-shared protocols reduce false sharing
- An update-with-timeout mechanism
15SW RELEASE CONSISTENCY (I)
- Well-written parallel programs use locks to
achieve mutual exclusion when they access shared
variables
- P(mutex) and V(mutex)
- lock(csect) and unlock(csect)
- request ( ) and release( )
- Unprotected accesses can produce unpredictable
results
16SW RELEASE CONSISTENCY (II)
- SW release consistency will only guarantee
correctness of operations within a
request/release pair
- No need to propagate new values of shared
variables until the release
- Must guarantee that a workstation has received the
most recent values of all shared variables when
it completes a request
17SW RELEASE CONSISTENCY (III)
- First process:
- shared int x;
- request( ); x = 1; release( );
- // propagate x = 1
- Second process:
- shared int x;
- request( ); // wait for new value of x
- x++; release( );
- // propagate x = 2
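The exchange on this slide can be mimicked with a minimal simulation of eager release. It is a sketch, not Munin's implementation: the `Node` class, the initial value `x = 0`, and the omission of the actual lock (the two critical regions simply run one after the other) are all assumptions made for the illustration.

```python
class Node:
    """One workstation holding its own copy of the shared variables."""

    def __init__(self, name):
        self.name = name
        self.memory = {"x": 0}   # local replica of the shared data
        self.pending = {}        # writes buffered until the release
        self.peers = []

    def write(self, var, value):
        # Inside the critical region: update locally, buffer the change.
        self.memory[var] = value
        self.pending[var] = value

    def release(self):
        # Eager release: push every buffered write to all peers now.
        for peer in self.peers:
            peer.memory.update(self.pending)
        self.pending.clear()


p1, p2 = Node("P1"), Node("P2")
p1.peers, p2.peers = [p2], [p1]

# P1: request(); x = 1; release();
p1.write("x", 1)
seen_before_release = p2.memory["x"]   # P2 still holds the old value
p1.release()
seen_after_release = p2.memory["x"]    # P2 now holds x = 1

# P2: request(); x++; release();
p2.write("x", p2.memory["x"] + 1)
p2.release()

print(seen_before_release, seen_after_release, p1.memory["x"])  # prints "0 1 2"
```

Nothing is sent while P1 is inside its critical region; the new value only reaches P2 at the release, which is all that release consistency promises.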
18SW RELEASE CONSISTENCY (IV)
- Munin uses eager release: new values of shared
variables are propagated at release time
- Lazy release delays propagation until a request
is issued (TreadMarks)
- A workstation issuing a request gets the current
values of all shared variables
- Shared variables are not associated with a
particular critical section (as they are in Midway)
19Munin Implementation (I)
- Three kinds of variables
- Ordinary variables can only be accessed by the
process that created them
- Shared data variables should always be
accessed from within critical regions
- Synchronization variables
- locks, barriers or condition variables
- must be accessed through special library
procedures
20Munin Implementation (II)
- When a processor modifies shared data inside a
critical region, all update messages are buffered
and delayed until the processor leaves the
critical region
- Processes accessing shared data variables outside
critical regions do so at their own risk
- Same as with the shared memory model
- Risk is higher
21FOUR CONSISTENCY PROTOCOLS
- 1. Conventional shared variables
- Replicated on demand
- Single writer/multiple readers policy uses an
invalidation-based protocol
- 2. Read-only variables
- Replicated on demand
- Any attempt to modify them will result in a
runtime error
22FOUR CONSISTENCY PROTOCOLS
- 3. Migratory variables
- Migrated among the processes accessing them
- Every process accessing them will always get full
read and write access
- 4. Write-shared variables
- Can be updated concurrently because different
portions of the page are accessed
23Implementation
- Programmer uses annotations to specify any of the
last three consistency protocols
- Read-only variables
- Migratory variables
- Write-shared variables
- Incorrect annotations may result in poor
performance or in runtime errors but not in
incorrect results
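The way annotations select a protocol can be sketched as a simple lookup. Everything named here is hypothetical (the `Protocol` enum, `action_on_write`, `protocol_for`, and the three variable names); the sketch only mirrors the behavior listed on the slides, with unannotated variables falling back to the conventional protocol.

```python
from enum import Enum


class Protocol(Enum):
    CONVENTIONAL = "conventional"
    READ_ONLY = "read-only"
    MIGRATORY = "migratory"
    WRITE_SHARED = "write-shared"


def action_on_write(protocol):
    """Describe what a write to a variable triggers under each protocol."""
    if protocol is Protocol.READ_ONLY:
        raise RuntimeError("attempt to modify a read-only variable")
    if protocol is Protocol.CONVENTIONAL:
        return "invalidate other replicas, then write (single writer)"
    if protocol is Protocol.MIGRATORY:
        return "migrate the variable to the writer with full access"
    return "create a twin of the page, then write concurrently"


# Hypothetical annotations for three shared variables.
annotations = {
    "lookup_table": Protocol.READ_ONLY,
    "work_token": Protocol.MIGRATORY,
    "result_grid": Protocol.WRITE_SHARED,
}


def protocol_for(name):
    # Variables without an annotation use the conventional protocol.
    return annotations.get(name, Protocol.CONVENTIONAL)


print(protocol_for("counter").value)
print(action_on_write(protocol_for("result_grid")))
```

A wrong annotation shows up as a runtime error (writing to `lookup_table`) or as extra communication, which matches the claim that it does not silently produce incorrect results.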
24WRITE-SHARED PROTOCOL (I)
- Designed to fight false sharing
- Uses a copy-on-write mechanism
- Whenever a process is granted access to
write-shared data, the page containing these data
is marked copy-on-write
- First attempt to modify the contents of the page
will result in the creation of a copy of the
page about to be modified (the twin)
25Example
- Before: the page holds x = 1, y = 2
- First write access: a twin is created, holding
x = 1, y = 2
- After: the page holds x = 3, y = 2
- Compare with twin: new value of x is 3
26WRITE-SHARED PROTOCOL (II)
- At release time, the DSM will perform a
word-by-word comparison of the page and its twin,
store the diff in the space used by the twin page
and notify all processors having a copy of the
shared data of the update
- A runtime switch can be set to check for
conflicting updates to write-shared data
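The twin-and-diff steps can be sketched as follows. Pages are modeled as Python lists of words, and the helper names `make_twin`, `compute_diff`, and `apply_diff` are assumptions for the illustration rather than Munin's actual routines.

```python
def make_twin(page):
    """Copy the page before its first modification."""
    return list(page)


def compute_diff(page, twin):
    """Word-by-word comparison: keep only the words that changed."""
    return {i: word for i, (word, old) in enumerate(zip(page, twin))
            if word != old}


def apply_diff(page, diff):
    """Merge a diff received from another processor into a replica."""
    for i, word in diff.items():
        page[i] = word


# Two processors hold replicas of the same page: word 0 is x, word 1 is y.
page_a, page_b = [1, 2], [1, 2]
twin_a, twin_b = make_twin(page_a), make_twin(page_b)

page_a[0] = 3   # processor A writes x
page_b[1] = 5   # processor B writes y, concurrently

diff_a = compute_diff(page_a, twin_a)   # {0: 3}
diff_b = compute_diff(page_b, twin_b)   # {1: 5}

# Optional check for conflicting updates: same word changed by both.
conflicts = diff_a.keys() & diff_b.keys()

# At release time each processor sends its diff to the other.
apply_diff(page_b, diff_a)
apply_diff(page_a, diff_b)
print(page_a, page_b, conflicts)  # prints "[3, 5] [3, 5] set()"
```

Because only the changed words travel, both writers can proceed without the page bouncing between them, and the two replicas converge once the diffs are exchanged.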
27UPDATE TIME-OUT MECHANISM
- Munin avoids sending updates to processors holding
stale replicas (replicas they no longer access)
- Anytime a processor receives an update for a page
for which it does not have a twin, the page is
marked supervisor-only and the time of receipt of
the update is recorded
- First local access to the page will cause a trap
that will remove the restriction
28UPDATE TIME-OUT MECHANISM
- When a process receives an update for a page that
is still marked supervisor-only, it checks the
timestamp of the last update
- If more than 50 ms have elapsed, the process
notifies the originator of the update not to send
more updates and invalidates the page
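The time-out logic can be sketched as a small state machine. The `Replica` class and its field names are hypothetical, the page is assumed to have no twin, and refreshing the timestamp on every applied update is one reading of the slides rather than a confirmed detail of Munin.

```python
TIMEOUT_MS = 50  # threshold quoted on the slide


class Replica:
    """Receiver-side state for one replicated page (no twin present)."""

    def __init__(self):
        self.valid = True
        self.supervisor_only = False   # set when an update arrives
        self.last_update_ms = None
        self.wants_updates = True

    def local_access(self):
        # The first local access traps and removes the restriction.
        self.supervisor_only = False

    def receive_update(self, now_ms):
        untouched = self.supervisor_only
        if untouched and now_ms - self.last_update_ms > TIMEOUT_MS:
            # Page not used since the last update: stop the updates.
            self.valid = False
            self.wants_updates = False
            return "invalidated"
        # Apply the update, protect the page, and record the time.
        self.supervisor_only = True
        self.last_update_ms = now_ms
        return "applied"


idle = Replica()
idle_results = [idle.receive_update(t) for t in (0, 30, 100)]

active = Replica()
active.receive_update(0)
active.local_access()                  # page was used in between
active_result = active.receive_update(100)

print(idle_results, active_result)
```

The idle replica accepts the updates at 0 ms and 30 ms, then drops out at 100 ms because 70 ms passed without a local access; the active replica keeps receiving updates.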
29CONCLUSIONS (I)
- The strongest point of Munin is its excellent
performance
- typically within 5 to 33% of the performance of
hand-coded message passing versions of the same
programs
- Its major limitation is its dependence on some
features of the V kernel
30CONCLUSIONS (II)
- Munin requires programs to access shared data
from within critical regions or after barriers
- Appears to be a reasonable requirement
- Munin allows users to tune the performance of
their programs by selecting the best consistency
protocol for each shared variable
- Can quickly become a tedious process
31FURTHER DEVELOPMENTS
- Same team has come up with a successor to Munin
named TreadMarks
- Key differences are
- TreadMarks uses a more complex lazy release
protocol
- TreadMarks is UNIX-based
- More portable