TECHNIQUES FOR REDUCING CONSISTENCY-RELATED COMMUNICATION IN DISTRIBUTED SHARED-MEMORY SYSTEMS - PowerPoint PPT Presentation

Slides: 32
Provided by: Jehan79
Learn more at: https://www2.cs.uh.edu


1
TECHNIQUES FOR REDUCING CONSISTENCY-RELATED
COMMUNICATION IN DISTRIBUTED SHARED-MEMORY SYSTEMS
  • J. B. Carter, University of Utah
  • J. K. Bennett and W. Zwaenepoel, Rice University

2
INTRODUCTION
  • Distributed shared memory is a software
    abstraction allowing a set of workstations
    connected by a LAN to share a single paged
    virtual address space
  • Key issue in building a software DSM is
    minimizing the amount of data communication among
    the workstation memories

3
Why bother with DSM?
  • Key idea is to build fast parallel computers that
  • are cheaper than conventional architectures
  • are convenient to use
  • Conventional parallel computer architecture was
    the shared memory multiprocessor

4
Conventional parallel architecture
[Figure: four CPUs connected to a single shared memory]
5
Today's architecture
  • Clusters of workstations are much more
    cost-effective
  • No need to develop complex bus and cache
    structures
  • Can use off-the-shelf networking hardware
  • Gigabit Ethernet
  • Myrinet (1.5 Gb/s)
  • Can quickly integrate newest microprocessors

6
Limitations of cluster approach
  • Communication within a cluster of workstations is
    through message passing
  • Much harder to program than concurrent access to
    a shared memory
  • Many big programs were written for shared memory
    architectures
  • Converting them to a message passing architecture
    is a nightmare

7
Distributed shared memory
[Figure: the separate main memories of the workstations, which DSM presents as one shared global address space]
8
Distributed shared memory
  • DSM makes a cluster of workstations look like a
    shared memory parallel computer
  • Easier to write new programs
  • Easier to port existing programs
  • Key problem is that DSM only provides the
    illusion of having a shared memory architecture
  • Data must still move back and forth among the
    workstations

9
Characterizing a DSM (I)
  • Four important issues
  • 1. Size of transfer units (level of granularity)
  • Big units are more efficient
  • Virtual memory pages
  • Can have false sharing whenever a page contains
    different variables that are accessed at the same
    time by different processors

10
False Sharing
[Figure: one workstation accesses x while another accesses y; x and y lie on the same page]
The page containing x and y will move back and forth
between the main memories of the workstations
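The cost of false sharing can be made concrete with a small count of page moves. This is a minimal sketch, assuming a 4 KB page and a simple single-copy protocol in which a page lives on exactly one workstation at a time; the names `page_of` and `count_page_transfers` are hypothetical and do not come from Munin.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def page_of(addr: int) -> int:
    """Return the virtual page number containing byte address addr."""
    return addr // PAGE_SIZE


def count_page_transfers(accesses, addr_of):
    """Count page moves under a toy single-copy page protocol.

    accesses: list of (workstation, variable) accesses in time order.
    addr_of: dict mapping each variable name to its byte address.
    An access from a workstation that does not hold the page moves the
    whole page there, which counts as one transfer.
    """
    owner = {}  # page number -> workstation currently holding it
    transfers = 0
    for ws, var in accesses:
        page = page_of(addr_of[var])
        if page in owner and owner[page] != ws:
            transfers += 1
        owner[page] = ws
    return transfers


# Workstation A keeps accessing x while workstation B keeps accessing y.
trace = [("A", "x"), ("B", "y")] * 4

same_page = {"x": 0, "y": 8}               # x and y share page 0
separate_pages = {"x": 0, "y": PAGE_SIZE}  # y moved onto page 1

print(count_page_transfers(trace, same_page))       # 7
print(count_page_transfers(trace, separate_pages))  # 0
```

Although A and B never touch the same variable, sharing a page makes every alternate access move the whole page; placing the variables on different pages removes all transfers.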
11
Characterizing a DSM (II)
  • 2. Consistency model
  • Strict consistency is not possible
  • Various authors have proposed weak consistency
    models
  • Cheaper to implement
  • Harder to use in a correct fashion

12
Characterizing a DSM (III)
  • 3. Portability of programs
  • Some DSMs allow programs written for a
    multiprocessor architecture to run on a cluster
    of workstations without any modifications (dusty
    decks)
  • More efficient DSMs require more changes
  • 4. Portability of DSM
  • Some DSMs require specific OS features

13
MUNIN
  • Developed at Rice University
  • Based on software objects (variables)
  • Uses the processor's virtual memory hardware to
    detect accesses to the shared objects
  • Includes several techniques for reducing
    consistency-related communication
  • Only runs on top of the V kernel

14
Key features
  • Software release consistency only requires the
    memory to be consistent at specific
    synchronization points
  • Multiple consistency protocols allow the user to
    select the best consistency protocol for each
    data item
  • Write-shared protocols reduce false sharing
  • An update-with-timeout mechanism limits updates
    sent to replicas that are no longer in use

15
SW RELEASE CONSISTENCY (I)
  • Well-written parallel programs use locks to
    achieve mutual exclusion when they access shared
    variables
  • P(mutex) and V(mutex)
  • lock(csect) and unlock(csect)
  • request() and release()
  • Unprotected accesses can produce unpredictable
    results

16
SW RELEASE CONSISTENCY (II)
  • SW release consistency will only guarantee
    correctness of operations within a
    request/release pair
  • No need to propagate new values of shared
    variables until the release
  • Must guarantee that a workstation has received the
    most recent values of all shared variables when
    it completes a request

17
SW RELEASE CONSISTENCY (III)
  • shared int x;
    request();    // wait for new value of x
    x++;
    release();    // propagate x = 2
  • shared int x;
    request();
    x = 1;
    release();    // propagate x = 1
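The two fragments on this slide can be traced with a toy, single-threaded simulation: one workstation sets x to 1 and releases, then the other acquires, increments x, and releases. This sketch assumes eager propagation at release time and a lock handed over in that order; the `Cluster` class and its `request`, `read`, `write`, and `release` methods are hypothetical, not Munin's interface.

```python
class Cluster:
    """Toy eager-release DSM: each workstation keeps its own copy of the shared data."""

    def __init__(self, names):
        self.copies = {n: {"x": 0} for n in names}  # per-workstation replicas
        self.dirty = {n: set() for n in names}      # variables written since last release
        self.lock_holder = None

    def request(self, node):
        # Sequential model: requests are only made when the lock is free.
        assert self.lock_holder is None, "lock already held"
        self.lock_holder = node

    def read(self, node, var):
        return self.copies[node][var]

    def write(self, node, var, value):
        self.copies[node][var] = value  # visible locally at once
        self.dirty[node].add(var)       # propagated only at release

    def release(self, node):
        assert self.lock_holder == node, "release without request"
        for var in self.dirty[node]:    # eager: push new values to every other copy
            for other, copy in self.copies.items():
                if other != node:
                    copy[var] = self.copies[node][var]
        self.dirty[node].clear()
        self.lock_holder = None


c = Cluster(["W1", "W2"])
c.request("W2")
c.write("W2", "x", 1)
print(c.read("W1", "x"))  # 0: the new value has not been propagated yet
c.release("W2")           # propagate x = 1
c.request("W1")           # W1 already holds x = 1 when the lock is granted
c.write("W1", "x", c.read("W1", "x") + 1)
c.release("W1")           # propagate x = 2
print(c.copies)           # {'W1': {'x': 2}, 'W2': {'x': 2}}
```

Reading x on W1 while W2 is still inside its critical section returns the stale value 0, which is exactly the kind of unprotected access the model does not guarantee.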

18
SW RELEASE CONSISTENCY (IV)
  • Munin uses eager release: new values of shared
    variables are propagated at release time
  • Lazy release delays propagation until a request
    is issued (TreadMarks)
  • A workstation issuing a request gets the current
    values of all shared variables
  • Shared variables are not associated with a
    particular critical section (as they are in Midway)
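A rough way to compare eager and lazy release is to count update messages as a lock travels between workstations. This sketch makes simplifying assumptions: every release modifies data cached by all other nodes, eager release pushes updates to all of them, lazy release sends updates only to the next requester, and lock and acknowledgment messages are ignored. The function names are hypothetical.

```python
def eager_messages(acquire_order, n_nodes):
    """Eager release: every release pushes updates to all n_nodes - 1 other nodes."""
    return len(acquire_order) * (n_nodes - 1)


def lazy_messages(acquire_order):
    """Lazy release: updates travel only to the next node that requests the lock.

    A node that re-acquires a lock it released last needs no update message.
    """
    messages = 0
    last = None
    for node in acquire_order:
        if last is not None and node != last:
            messages += 1
        last = node
    return messages


# The lock circulates among A, B and C; a fourth workstation D never uses it.
order = ["A", "B", "C", "A", "B"]
print(eager_messages(order, 4))  # 15
print(lazy_messages(order))      # 4
```

Under these assumptions lazy release avoids pushing values to workstations (such as D) that never ask for them, at the cost of a more complex protocol.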

19
Munin Implementation (I)
  • Three kinds of variables
  • Ordinary variables can only be accessed by the
    process that created them
  • Shared data variables should always be
    accessed from within critical regions
  • Synchronization variables
  • locks, barriers or condition variables
  • must be accessed through special library
    procedures

20
Munin Implementation (II)
  • When a processor modifies shared data inside a
    critical region, all update messages are buffered
    and delayed until the processor leaves the
    critical region
  • Processes accessing shared data variables outside
    critical regions do so at their own risk
  • Same as with the shared memory model
  • But the risk is higher
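Buffering updates until the end of a critical region lets repeated writes to the same variable collapse into a single value. A minimal sketch, assuming one destination and a dictionary as the buffer (Munin's real buffer structure and message format are not shown here); `send_updates` is a hypothetical helper.

```python
def send_updates(writes, buffered):
    """Return the update messages that one critical region generates.

    writes: sequence of (variable, value) writes made inside the region.
    buffered=False: one message per write, sent immediately.
    buffered=True: writes are queued and coalesced, and a single message
    carrying the final value of each modified variable is sent on exit.
    """
    if not buffered:
        return [{var: value} for var, value in writes]
    pending = {}
    for var, value in writes:
        pending[var] = value  # a later write replaces the queued value
    return [pending] if pending else []


# A loop updates `total` ten times, then `y` is written once.
writes = [("total", i) for i in range(10)] + [("y", 7)]
print(len(send_updates(writes, buffered=False)))  # 11
print(send_updates(writes, buffered=True))        # [{'total': 9, 'y': 7}]
```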

21
FOUR CONSISTENCY PROTOCOLS
  • 1. Conventional shared variables
  • Replicated on demand
  • Single writer/multiple readers policy uses an
    invalidation-based protocol
  • 2. Read-only variables
  • Replicated on demand
  • Any attempt to modify them will result in a
    runtime error
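The single-writer/multiple-readers policy for conventional shared variables can be sketched as a page that is replicated on demand for reads and whose other replicas are invalidated on a write. This is a toy model under that assumption; `ConventionalPage` is a hypothetical name, and fault handling and messaging are omitted.

```python
class ConventionalPage:
    """Single-writer/multiple-readers page with write-invalidate (toy model)."""

    def __init__(self, home):
        self.copies = {home}  # workstations holding a valid replica
        self.writer = home    # workstation with write permission, or None

    def read(self, ws):
        if ws not in self.copies:
            self.copies.add(ws)  # replicate on demand after a read fault
            self.writer = None   # several readers now: nobody may write silently

    def write(self, ws):
        """Give ws exclusive write access; return the invalidated workstations."""
        invalidated = self.copies - {ws}
        self.copies = {ws}
        self.writer = ws
        return invalidated


p = ConventionalPage("A")
p.read("B")
p.read("C")
print(sorted(p.write("B")))  # ['A', 'C']
print(p.copies)              # {'B'}
```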

22
FOUR CONSISTENCY PROTOCOLS
  • 3. Migratory variables
  • Migrated among the processes accessing them
  • Every process accessing them will always get full
    read and write access
  • 4. Write-shared variables
  • Can be updated concurrently by different processes
    as long as they access different portions of the
    page
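Migratory variables keep a single copy that moves, with full read and write access, to whichever process touches it next. A minimal sketch under that assumption; `MigratoryObject` and `access` are hypothetical names. One plausible benefit, compared with a conventional protocol, is that a read followed by a write inside a critical section costs one migration rather than a read fault plus an upgrade.

```python
class MigratoryObject:
    """Exactly one copy exists; it moves, with read and write rights, to its next user."""

    def __init__(self, holder, value):
        self.holder = holder
        self.value = value
        self.migrations = 0

    def access(self, ws):
        if ws != self.holder:
            self.holder = ws  # the whole object moves; no read-only copy stays behind
            self.migrations += 1
        return self           # the caller now has full read and write access


obj = MigratoryObject("A", 0)
for ws in ["A", "B", "B", "C"]:
    obj.access(ws)
    obj.value += 1  # read-modify-write by the current holder
print(obj.holder, obj.value, obj.migrations)  # C 4 2
```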

23
Implementation
  • Programmer uses annotations to specify any of the
    last three consistency protocols
  • Read-only variables
  • Migratory variables
  • Write-shared variables
  • Incorrect annotations may result in poor
    performance or in runtime errors but not in
    incorrect results
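The effect of an annotation can be sketched as a per-variable protocol tag that the runtime checks. This is an illustration only: the `Protocol` enum, the `SharedVar` class, and the choice of the conventional protocol as the default are assumptions, not Munin's annotation syntax. It shows why a wrong read-only annotation produces a runtime error rather than a silently incorrect result.

```python
from enum import Enum


class Protocol(Enum):
    CONVENTIONAL = "conventional"  # assumed default when no annotation is given
    READ_ONLY = "read_only"
    MIGRATORY = "migratory"
    WRITE_SHARED = "write_shared"


class SharedVar:
    def __init__(self, name, value, protocol=Protocol.CONVENTIONAL):
        self.name = name
        self.value = value
        self.protocol = protocol

    def write(self, value):
        if self.protocol is Protocol.READ_ONLY:
            # A wrong read-only annotation fails loudly instead of corrupting data.
            raise RuntimeError(f"write to read-only variable {self.name!r}")
        self.value = value


table = SharedVar("table", [1, 2, 3], Protocol.READ_ONLY)  # hypothetical annotation
try:
    table.write([])
except RuntimeError as err:
    print(err)  # write to read-only variable 'table'
```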

24
WRITE-SHARED PROTOCOL (I)
  • Designed to fight false sharing
  • Uses a copy-on-write mechanism
  • Whenever a process is granted access to
    write-shared data, the page containing these data
    is marked copy-on-write
  • The first attempt to modify the contents of the
    page creates a copy of the unmodified page (the
    twin)

25
Example
  • Before: the page holds x = 1, y = 2
  • First write access: a twin is created, holding
    x = 1, y = 2
  • After: the page holds x = 3, y = 2
  • Compare with twin: the new value of x is 3
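The twin mechanism in this example can be sketched by modeling a page as a list of words. This is a simplified sketch: the twin is created on the first write, and the diff is a word-by-word comparison done at release; `WriteSharedPage` and `make_diff` are hypothetical names, and storing the diff in the twin's memory is not modeled.

```python
class WriteSharedPage:
    def __init__(self, words):
        self.words = list(words)
        self.twin = None  # created lazily on the first write

    def write(self, index, value):
        if self.twin is None:
            # First write to a copy-on-write page: keep the unmodified page as the twin.
            self.twin = list(self.words)
        self.words[index] = value

    def make_diff(self):
        """Word-by-word comparison with the twin; returns {word_index: new_value}."""
        if self.twin is None:
            return {}
        diff = {i: new for i, (old, new) in enumerate(zip(self.twin, self.words))
                if old != new}
        self.twin = None  # the page is clean again after the release
        return diff


page = WriteSharedPage([1, 2])  # word 0 holds x = 1, word 1 holds y = 2
page.write(0, 3)                # x = 3 creates the twin [1, 2]
print(page.make_diff())         # {0: 3}
```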
26
WRITE-SHARED PROTOCOL (II)
  • At release time, the DSM performs a word-by-word
    comparison of the page and its twin, stores the
    differences (the diff) in the space used by the
    twin page, and sends the update to all processors
    holding a copy of the shared data
  • A runtime switch can be set to check for
    conflicting updates to write-shared data
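Because diffs carry only modified words, diffs from processors that falsely share a page can be merged without losing either update. A minimal sketch, assuming diffs are `{word_index: value}` dictionaries applied in arrival order, with a flag standing in for the optional conflict check; `apply_diffs` is a hypothetical helper.

```python
def apply_diffs(page, diffs, check_conflicts=False):
    """Apply word-level diffs from several writers to a copy of a page.

    page: list of words; diffs: list of {word_index: new_value} dicts.
    With check_conflicts=True, two diffs that modify the same word raise
    an error; otherwise the last diff applied wins.
    """
    merged = list(page)
    touched = set()
    for diff in diffs:
        for index, value in diff.items():
            if check_conflicts and index in touched:
                raise ValueError(f"conflicting updates to word {index}")
            touched.add(index)
            merged[index] = value
    return merged


# Two processors share one page: one wrote x (word 0), the other y (word 1).
print(apply_diffs([1, 2], [{0: 3}, {1: 5}], check_conflicts=True))  # [3, 5]
```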

27
UPDATE TIME-OUT MECHANISM
  • Munin tries to avoid sending updates to processors
    holding stale replicas (copies they no longer use)
  • Anytime a processor receives an update for a page
    for which it does not have a twin, the page is
    marked supervisor-only and the time of receipt of
    the update is recorded.
  • First local access to the page will cause a trap
    that will remove the restriction

28
UPDATE TIME-OUT MECHANISM
  • When a process receives an update for a page that
    is still marked supervisor-only, it checks the
    timestamp of the last update
  • If more than 50 ms have elapsed, the process tells
    the originator of the update to stop sending
    updates and invalidates the page
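The receiver-side logic of the update timeout can be sketched as a small state machine for one page without a twin. This sketch assumes times are given in seconds, that an update arriving within the 50 ms window is applied and its time recorded, and that the page-protection trap is reduced to a method call; `ReplicaState` and its methods are hypothetical.

```python
TIMEOUT = 0.050  # 50 ms, the threshold given on the slide


class ReplicaState:
    """Receiver-side state for one page that has no twin (toy model)."""

    def __init__(self):
        self.supervisor_only = False
        self.last_update = None
        self.valid = True

    def on_update(self, now):
        """Handle an incoming update; return True if the sender should stop updating."""
        if self.supervisor_only and now - self.last_update > TIMEOUT:
            self.valid = False  # no local access since the last update: drop the copy
            return True         # tell the originator to stop sending updates
        self.supervisor_only = True  # the next local access will trap
        self.last_update = now
        return False

    def on_local_access(self):
        self.supervisor_only = False  # the trap lifts the restriction: page is in use


r = ReplicaState()
print(r.on_update(0.000))  # False: first update, page marked supervisor-only
print(r.on_update(0.020))  # False: only 20 ms since the previous update
print(r.on_update(0.100))  # True: 80 ms without a local access
print(r.valid)             # False
```

A page that is read locally between updates never reaches the timeout branch, so updates keep flowing to processors that actually use their copies.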

29
CONCLUSIONS (I)
  • The strongest point of Munin is its excellent
    performance
  • Typically within 5% to 33% of the performance of
    hand-coded message-passing versions of the same
    programs
  • Its major limitation is its dependence on some
    features of the V kernel

30
CONCLUSIONS (II)
  • Munin requires programs to access shared data
    from within critical regions or after barriers
  • Appears to be a reasonable requirement
  • Munin allows users to tune the performance of
    their programs by selecting the best consistency
    protocol for each shared variable
  • Can quickly become a tedious process

31
FURTHER DEVELOPMENTS
  • The same team came up with a successor to Munin
    named TreadMarks
  • Key differences are
  • TreadMarks uses a more complex lazy release
    protocol
  • TreadMarks is UNIX-based
  • More portable