Computer Science 328 Distributed Systems - PowerPoint PPT Presentation

Provided by: mehdith
1
Computer Science 328 Distributed Systems
  • Lecture 16
  • Distributed Shared Memory

2
Multiprocessors and Multicomputers
  • In a multiprocessor, two or more CPUs share a
    common main memory. Any process on a processor
    can read/write any word in the shared memory.
  • In a multicomputer, each CPU has its own private
    memory.
  • Easier to build: one can take a large number of
    single-board computers, each containing a CPU,
    memory, and a network interface, and connect them
    together.
  • Harder to program: communication has to use message
    passing. In contrast, in multiprocessor systems,
    one process just writes data to memory to be read
    by all the others.

3
Bus-Based Multiprocessors with Shared Memory
  • When any of the CPUs wants to read a word from
    the memory, it puts the address of the requested
    word on the bus and asserts the bus control
    (read) line.
  • To prevent two CPUs from accessing the memory at
    the same time, a bus arbitration mechanism is
    used, e.g., a CPU may assert a request line
    first.
  • To improve performance, each CPU can be equipped
    with a snooping cache.

[Figure: several CPUs and a memory module sharing a bus; in the
second configuration each CPU has a snooping cache between it and
the bus]
4
Cache Consistency: Write Through
All the other caches see the write (because they
are snooping on the bus) and check whether they
are also holding the word being modified. If so,
they invalidate their cache entries.
5
Cache Consistency: Write Once
[Figure: four snapshots of caches A, B, C and memory for word W]
  • Initially both the memory and B have an updated
    entry of word W (value W1).
  • A reads word W and gets W1. B does not respond,
    but the memory does.
  • A writes a value W2. B snoops on the bus and
    invalidates its entry. A's copy is marked as
    dirty.
  • A writes W again. This and subsequent writes by A
    are done locally, without any bus traffic.
6
Cache Consistency: Write Once
[Figure: continued; caches A, B, C and memory for word W]
  • A writes a value W3. No bus traffic is incurred.
  • C writes W. A sees the request by snooping on
    the bus, asserts a signal that inhibits memory
    from responding, provides the value, and
    invalidates its own entry. C now has the only
    valid copy.
The cache consistency protocol is built upon the
notion of snooping and built into the memory
management unit. Mechanisms are implemented in
hardware.
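The write-once state transitions above can be sketched in software. This is a toy model, not the hardware: the `Cache` and `Bus` classes and the three-state (invalid/valid/dirty) simplification are assumptions for illustration, and the bus here delivers snoops by direct method calls.

```python
from enum import Enum

class State(Enum):
    INVALID = "invalid"
    VALID = "valid"    # clean copy; memory is up to date
    DIRTY = "dirty"    # only valid copy; memory is stale

class Cache:
    """One snooping cache. The bus shows every transaction to all caches."""
    def __init__(self, name, bus):
        self.name, self.bus, self.lines = name, bus, {}   # addr -> (state, value)
        bus.caches.append(self)

    def read(self, addr):
        state, _ = self.lines.get(addr, (State.INVALID, None))
        if state is State.INVALID:                  # miss: a dirty holder
            value = self.bus.read(self, addr)       # (or else memory) responds
            self.lines[addr] = (State.VALID, value)
        return self.lines[addr][1]

    def write(self, addr, value):
        state, _ = self.lines.get(addr, (State.INVALID, None))
        if state is not State.DIRTY:                # first write uses the bus
            self.bus.invalidate(self, addr)         # so others invalidate
        self.lines[addr] = (State.DIRTY, value)     # later writes stay local

    # --- snooping side ---------------------------------------------------
    def snoop_read(self, addr):
        state, value = self.lines.get(addr, (State.INVALID, None))
        if state is State.DIRTY:                    # supply the value and
            self.lines[addr] = (State.VALID, value) # degrade to clean
            return value
        return None                                 # let memory respond

    def snoop_invalidate(self, addr):
        if addr in self.lines:
            self.lines[addr] = (State.INVALID, None)

class Bus:
    def __init__(self, memory):
        self.memory, self.caches = memory, []

    def read(self, requester, addr):
        for c in self.caches:
            if c is not requester:
                v = c.snoop_read(addr)
                if v is not None:                   # dirty holder inhibits
                    self.memory[addr] = v           # memory; write back
                    return v
        return self.memory.get(addr, 0)

    def invalidate(self, requester, addr):
        for c in self.caches:
            if c is not requester:
                c.snoop_invalidate(addr)
```

After `A.write(addr, v)` succeeds once, A's line is dirty and subsequent writes by A generate no bus traffic, matching the slides.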
7
Ring-Based Multiprocessors with Shared Memory
  • On each machine, a single address space is
    divided into a private part and a shared part.
    Shared memory is divided into 32-byte blocks
    (units for transfer).
  • Each 32-byte block in the shared memory space has
    a home machine on which physical memory (home
    memory) is always reserved for it.
  • All the machines are connected in a token passing
    ring. The ring wire consists of 16 data bits and
    4 control bits.
  • The block table (indexed by block number) keeps
    track of where each block is located. Each entry
    contains:
  • Valid bit: set if the block is present in the
    cache and up to date.
  • Exclusive bit: set if the local copy (if any) is
    the only one.
  • Home bit: set if this is the block's home machine.
  • Location field: where the block is located in the
    cache if it is present and valid.

8
Ring-Based Multiprocessors with Shared Memory
[Figure: one ring node, showing the CPU, its private memory, the
home memory, the cache, the MMU, and the block table with valid,
exclusive, home, interrupt, and location fields]
9
Protocol for Ring-Based Multiprocessors
  • To read a word from shared memory
  • The memory address is passed to the device, which
    checks the block table to see if the block is
    present.
  • If yes, the request is satisfied.
  • If not,
  • the device waits until it captures the
    circulating token and puts a request packet onto
    the ring.
  • As the packet passes around the ring, each device
    checks if it has the requested block. If it has
    the block, it provides it and clears the
    exclusive bit (if set).
  • When the token returns, it always has the
    requested block.
  • To write a word to shared memory
  • If the block is present and is the only copy
    (exclusive bit is set), the word is written
    locally.
  • If the block is present but not the only copy, an
    invalidation packet is first sent around the ring
    to invalidate all the other copies. When the
    invalidation packet returns, the exclusive bit is
    set and the write proceeds.
  • If the block is not present,
  • A packet is sent out that combines a read request
    and an invalidation request.
  • The first machine that has the block copies it
    onto the packet and discards (invalidates) its
    own copy. All subsequent machines just discard
    the block from their caches.
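The read and write rules above can be sketched as follows. This is a toy model rather than the ring hardware: the `Node` class, the linked `next` pointers standing in for the token-passing ring, and writing a whole block at once are all simplifying assumptions.

```python
class Node:
    """One machine on the ring; `blocks` maps block number -> (value, exclusive)."""
    def __init__(self, name):
        self.name, self.blocks, self.next = name, {}, None

    def others(self):
        """Visit every other node in ring order, like a circulating packet."""
        n = self.next
        while n is not self:
            yield n
            n = n.next

    def read(self, b):
        if b in self.blocks:                        # block table hit
            return self.blocks[b][0]
        for n in self.others():                     # request packet circulates
            if b in n.blocks:
                value, _ = n.blocks[b]
                n.blocks[b] = (value, False)        # provider clears exclusive
                self.blocks[b] = (value, False)
                return value
        raise KeyError(b)                           # no machine holds the block

    def write(self, b, value):
        if not (b in self.blocks and self.blocks[b][1]):
            # Not the exclusive holder: send an invalidation (or combined
            # read + invalidation) packet around the ring first.
            for n in self.others():
                n.blocks.pop(b, None)
        self.blocks[b] = (value, True)              # now the only copy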

10
The Basic Model of DSM

[Figure: pages 0-9 of a single shared address space spread over the
local memories of processors P1, P2, and P3; a page moves between
machines by page transfer, and a read-only page may be replicated
on more than one machine]
11
Distributed Shared Memory
  • In a DSM system, the address space is divided up
    into chunks (e.g., pages), with the chunks being
    spread over all the processors in the system.
  • When a processor references an address that is
    not local, a trap occurs, and the DSM software
    fetches the chunk containing the address and
    restarts the faulting instruction.
  • Major difference between a multiprocessor system
    with global shared memory (e.g., DASH) and a DSM
    system is that processors in the latter can only
    reference their own local memory.

12
Granularity of Chunks
  • When a process references a word that is absent,
    it causes a page fault.
  • On a page fault,
  • the missing page is just brought in from another
    machine instead of from the disk, or
  • a region of 2, 4, or 8 pages including the
    missing page is brought in.
  • Locality of reference: if a program has
    referenced one word on a page, it is likely to
    reference other neighboring words in the near
    future (an argument for larger chunks).
  • False sharing is likely to occur when the chunk
    size is large.

13
False Sharing
[Figure: a page containing two unrelated shared variables A and B;
Processor 1 runs code using A while Processor 2 runs code using B,
so the page is shared even though the variables are not]
14
Achieving Sequential Consistency
  • Achieving consistency is not an issue if
  • pages are not replicated, or
  • only read-only pages are replicated.
  • Two approaches are taken in DSM:
  • Update: the write is allowed to take place
    locally, but the address of the modified word and
    its new value are broadcast
    simultaneously to all the other caches. Each
    cache holding the word copies the new value from
    the bus to its cache.
  • Invalidation: the address of the modified word is
    broadcast on the bus, but the new value is not.
  • Page-based DSM systems typically use an
    invalidation protocol instead of an update
    protocol.

15
Invalidation Protocol to Achieve Consistency
  • Each page is either in R or W state.
  • When a page is in W state, only one copy exists,
    mapped into the owners address space in
    read-write mode.
  • When a page is in R state, the owner has a copy
    (mapped read only), but other processes may have
    copies too.

[Figure: cases (a) and (b), processor 1 attempting a read]
16
Invalidation Protocol (Read)
[Figure: read cases (c)-(f) for Processor 1 and Processor 2]
In the first 4 cases, the page is mapped into the
reader's address space, and no trap occurs.
Otherwise, if another processor holds the page in R
state:
  • Ask for a copy
  • Mark page as R
  • Do read
If another processor holds the page in W state:
  • Ask for degradation (W to R)
  • Ask for a copy
  • Mark page as R
  • Do read

17
Invalidation Protocol (Write)
[Figure: write cases for Processor 1 and Processor 2, showing each
page's state (R or W), the faulting process P, and the owner]
If the page is held locally in R state, owned here,
and is the only copy:
  • Mark page as W
  • Do write
If the page is held locally in R state but owned by
another processor:
  • Ask for invalidation
  • Ask for ownership
  • Mark page as W
  • Do write
If the page is held locally in R state, owned here,
but other copies exist:
  • Invalidate copies
  • Mark page as W
  • Do write

18
Invalidation Protocol (Write)
If the page is not present locally, the steps are
the same whether the owner holds it in R or W
state:
  • Ask for invalidation
  • Ask for ownership
  • Ask for a page
  • Mark page as W
  • Do write
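The read/write fault handlers of slides 16-18 can be condensed into one sketch. This is a minimal model, not a real DSM runtime: the `DsmProcessor` class, the peer list, and the `_owner` scan (standing in for a page manager or broadcast) are all assumptions for illustration.

```python
from enum import Enum

class Mode(Enum):
    NONE = 0   # page not present
    R = 1      # read-only copy
    W = 2      # read-write: this processor owns the only copy

class DsmProcessor:
    def __init__(self, name, peers):
        self.name, self.peers, self.pages = name, peers, {}  # page -> Mode
        peers.append(self)

    def read(self, page):
        if self.pages.get(page, Mode.NONE) is Mode.NONE:     # trap
            owner = self._owner(page)
            if owner.pages[page] is Mode.W:
                owner.pages[page] = Mode.R     # ask for degradation (W -> R)
            self.pages[page] = Mode.R          # ask for a copy, mark as R
        # page is now mapped: do read

    def write(self, page):
        if self.pages.get(page, Mode.NONE) is not Mode.W:    # trap
            for p in self.peers:               # ask for invalidation of all
                if p is not self and page in p.pages:        # other copies
                    p.pages[page] = Mode.NONE
            self.pages[page] = Mode.W          # take ownership, mark as W
        # page is now mapped read-write: do write

    def _owner(self, page):
        # Hypothetical helper: a real system uses a page manager or a
        # broadcast to find the owner; here we just scan the peer list.
        for p in self.peers:
            if p.pages.get(page, Mode.NONE) is not Mode.NONE:
                return p
        raise KeyError(page)
```

Note how the write path covers every case on the slides: if the page is already W, nothing happens before the write; otherwise all other copies are invalidated and ownership moves to the writer.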

19
Finding the Owner
  • Option 1: do a broadcast, asking for the owner to
    respond.
  • An optimization is to include in the message
    whether the sender wants to read or write and
    whether it needs a copy.
  • A broadcast interrupts each processor, forcing it
    to inspect the request packet.
  • Option 2: designate a page manager to keep track
    of who owns which page.
  • The page manager uses incoming requests not only
    to provide replies but also to keep track of
    changes in ownership.
  • Potential performance bottleneck → use multiple
    page managers.
  • The lower-order bits of a page number are used as
    an index into a table of page managers.

[Figure: two page-manager configurations. Left: 1. request to the
page manager, 2. reply naming the owner, 3. request to the owner,
4. reply. Right: 1. request to the page manager, 2. request
forwarded to the owner, 3. reply sent directly to the requester.]
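Spreading load over multiple page managers by indexing with the low-order bits amounts to a one-line hash. The manager count below is an assumed example value; any power of two lets the modulo become a bit mask.

```python
NUM_MANAGERS = 4   # assumed example; a power of two allows bit masking

def manager_for(page_number, num_managers=NUM_MANAGERS):
    """Pick the page manager responsible for a page, using the
    low-order bits of the page number as the table index."""
    return page_number % num_managers   # == page_number & (num_managers - 1)
```

Consecutive pages then land on different managers, e.g. pages 0-7 map to managers 0, 1, 2, 3, 0, 1, 2, 3.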
20
How Does the Owner Find the Copies to Invalidate?
  • Option 1: broadcast a message giving the page
    number and asking processors holding the page to
    invalidate it.
  • Works only if broadcast messages are reliable
    and can never be lost.
  • Option 2: each owner or page manager maintains a
    copyset telling which processors hold which
    pages.
  • When a page must be invalidated, the owner or
    page manager sends a message to each processor
    holding the page and waits for an acknowledgement.

[Figure: processors on a network; each cached page carries a
copyset (e.g., {1, 3, 4} or {2, 4}) recording which processors
hold a copy of that page]
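The copyset-based invalidation with acknowledgements can be sketched as below. The `send` transport callback and the dictionary-shaped copyset are assumptions; the synchronous `send` stands in for sending a message and waiting for the ack.

```python
def invalidate_page(page, copyset, send, owner):
    """Send an invalidation to every processor holding `page` and wait
    for each acknowledgement before declaring the page invalidated.

    `copyset` maps page number -> set of processors holding a copy;
    `send(proc, msg)` is a hypothetical transport that returns the
    processor's reply.
    """
    acks = 0
    for proc in copyset.get(page, set()) - {owner}:
        reply = send(proc, ("invalidate", page))
        assert reply == ("ack", page)     # wait for the acknowledgement
        acks += 1
    copyset[page] = {owner}               # only the owner's copy remains
    return acks
```

Unlike the broadcast option, this sends exactly one message per copy holder and does not depend on reliable broadcast.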
21
Strict and Sequential Consistency
  • A tradeoff between accuracy and performance.
  • Strict Consistency (one-copy semantics)
  • Any read to a memory location x returns the value
    stored by the most recent write operation to x.
  • When memory is strictly consistent, all writes
    are instantaneously visible to all processes and
    an absolute global time order is maintained.
  • Sequential Consistency
  • The result of any execution is the same as if the
    operations of all processors were executed in
    some sequential order, and the operations of each
    individual processor appear in this sequence in
    the order specified by its program.
  • All processes must see the same sequence of
    memory references.
  • Sequential consistency can be realized in a
    system with a totally ordered reliable broadcast
    mechanism as follows: all operations are
    broadcast. The exact order does not matter as
    long as all processes agree on the order of all
    operations on the shared memory.

22
Sequential Consistency in Textbook
  • Sequential consistency: for any execution,
  • the interleaved sequence of operations is such
    that if R(x)a occurs in the sequence, then either
    the last write operation that occurs before it in
    the interleaved sequence is W(x)a, or no write
    operation occurs before it and a is the initial
    value of x; and
  • the order of operations for any program is
    consistent with program order.
  • In this model, writes must occur in the same
    order on all copies; reads, however, can be
    interleaved on each system, as convenient. Stale
    reads can occur.

23
How to Determine the Sequential Order?
  • Example: given H1 = W(x)1 and H2 = R(x)0 R(x)1,
    how do we come up with a single string S that
    gives the order in which the operations would
    have been carried out, subject to
  • program order must be maintained, and
  • memory coherence must be respected: a read to
    some location x must always return the value
    most recently written to x?
  • Answer: S = R(x)0 W(x)1 R(x)1
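The two conditions can be checked mechanically. The encoding below (operations as `("R", value)` / `("W", value)` tuples) is an assumption chosen for this example, not part of the lecture's notation.

```python
def respects_program_order(s, histories):
    """Each process's operations must appear in s in program order."""
    for h in histories:
        positions = [s.index(op) for op in h]
        if positions != sorted(positions):
            return False
    return True

def coherent(s, initial=0):
    """Every read must return the most recently written value."""
    value = initial
    for kind, v in s:
        if kind == "W":
            value = v
        elif v != value:        # a read that saw a stale value
            return False
    return True

H1 = [("W", 1)]                      # H1: W(x)1
H2 = [("R", 0), ("R", 1)]            # H2: R(x)0 R(x)1
S  = [("R", 0), ("W", 1), ("R", 1)]  # the proposed interleaving
```

Both checks succeed for S, while an interleaving such as W(x)1 R(x)0 R(x)1 fails the coherence check because R(x)0 follows the write.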

24
Causal Consistency
  • Writes that are potentially causally related must
    be seen by all processes in the same order.
    Concurrent writes may be seen in a different
    order on different machines.
  • Example 1:

P1: W(x)1                          W(x)3
P2:        R(x)1  W(x)2
P3:        R(x)1                          R(x)3  R(x)2
P4:        R(x)1                          R(x)2  R(x)3

W(x)2 and W(x)3 are concurrent writes. This
sequence is allowed with causally consistent
memory.
25
Causal Consistency
Causally related:

P1: W(x)1
P2:        R(x)1  W(x)2
P3:                      R(x)2  R(x)1
P4:                      R(x)1  R(x)2

This sequence is not allowed with causally
consistent memory: P2 read W(x)1 before issuing
W(x)2, so the two writes are causally related and
must be seen in the same order everywhere.

P1: W(x)1
P2:        W(x)2
P3:               R(x)2  R(x)1
P4:               R(x)1  R(x)2

This sequence is allowed with causally consistent
memory: without the intervening read, the two
writes are concurrent.
26
Pipelined RAM and Processor Consistency
  • Writes done by a single process are received by
    all other processes in the order in which they
    are issued, but writes from different processes
    may be seen in a different order by different
    processes.

P1: W(x)1
P2:        R(x)1  W(x)2
P3:                      R(x)2  R(x)1
P4:                      R(x)1  R(x)2

This sequence is allowed with PRAM consistent
memory: W(x)1 and W(x)2 come from different
processes, so their order may differ per observer.
27
Processor Consistency in Textbook
  • Processor consistency:
  • writes from a single processor must be seen by
    all processors in the same order;
  • writes from different processors can be
    interleaved differently.
  • This is less strict than sequential consistency
    (where all writes must be ordered).
  • This is useful in applications in which each
    processor mainly depends on its own actions.

28
Weak Consistency
  • Not all applications require seeing all the
    writes, let alone seeing them in order.
  • E.g., a process inside a critical section is
    reading/writing some variables in a tight loop.
    Other processes are not supposed to touch the
    variables until the first process has left the
    critical section.
  • A synchronization variable is introduced. When a
    synchronization completes, all writes done on
    that machine are propagated outward and all
    writes done on other machines are brought in.
  • Accesses to synchronization variables are
    sequentially consistent.
  • No access to a synchronization variable is
    allowed to be performed until all previous writes
    have completed elsewhere.
  • Accessing a synchronization variable flushes the
    pipeline.
  • No data access (read/write) is allowed until all
    previous accesses to synchronization variables
    have been performed.

29
Weak Consistency
P1: W(x)1  W(x)2  S
P2:
P3:               R(x)2  R(x)1  S
P4:               R(x)1  R(x)2  S

This sequence is allowed with weakly consistent
memory.

P1: W(x)1  W(x)2  S
P2:
P3:                  S  R(x)2
P4:                  S  R(x)2

After synchronizing, the memory in P3 and P4 has
been brought up to date, so both read x = 2.
30
Release Consistency
  • Two synchronization operations are defined:
  • Acquire: gather in all writes from other
    machines.
  • Release: ensure all locally initiated writes have
    been completed (propagated to all other
    machines).
  • Acquire and release do not have to apply to all
    of memory; instead they can guard specific shared
    variables.

P1: Acq(L)  W(x)1  W(x)2  Rel(L)
P2:
P3:                              Acq(L)  R(x)2  Rel(L)
P4:                              R(x)1

This sequence is allowed with release consistent
memory: P4 did not acquire L, so it may still see
the stale value.
31
Mechanism for Realizing Release Consistency
  • To do an acquire, a process sends a message to a
    synchronization manager requesting an acquire on
    a lock.
  • After the lock is acquired, an arbitrary sequence
    of reads and writes to the shared data can be
    performed without being propagated to other
    machines.
  • When the release is done, the modified data are
    sent to the other machines holding copies.
  • After each machine acknowledges receipt of the
    data, the synchronization manager is informed of
    the release.