Distributed Shared Memory for Large-Scale Dynamic Systems

Vincent Gramoli, supervised by Michel Raynal

1
Distributed Shared Memory for Large-Scale
Dynamic Systems
  • Vincent Gramoli
  • supervised by Michel Raynal

5
My Thesis
  • Implementing a distributed shared memory for
  • large-scale dynamic systems
  • is
  • NECESSARY,
  • DIFFICULT,
  • DOABLE!

6
RoadMap
  • Necessary? Communicating in Large-Scale Systems
  • An Example of Distributed Shared Memory
  • Difficult? Facing Dynamism is not trivial
  • Difficult? Facing Scalability is tricky too
  • Doable? Yes, here is a solution!
  • Conclusion

8
Distributed Systems Enlarge
  • Internet explosion: IPv4 -> IPv6
  • Multiplication of personal devices
  • 17 billion network devices by 2012 (IDC
    prediction)

9
Distributed Systems are Dynamic
  • Independent computational entities act
    asynchronously and are affected by unpredictable
    events (joins and departures).
  • These sporadic activities make the system dynamic.

10
Massively Accessed Applications
  • Web services use large amounts of information
  • eBay: auctioning service
  • Wikipedia: collaborative encyclopedia
  • LastMinute: booking application
  • but they require too much power supply and cost
    too much

Operations: increase (auction), modify (article), reserve (tickets)
11
Massively Distributed Applications
  • Peer-to-peer applications share resources
  • BitTorrent: file sharing
  • Skype: voice over IP
  • Joost: video streaming
  • but they prevent large-scale collaboration.

Operations: copy, exchange, create
12
Filling the Gap is Necessary
  • Providing distributed applications where entities
    (nodes) can fully collaborate
  • P2Pedia: using P2P to build a collaborative
    encyclopedia
  • P2P eBay: using P2P as an auctioning service

13
There are 2 Ways of Collaborating
  • Using a Shared Memory
  • A node writes information in the memory
  • Another node reads information from the memory
  • Using Message Passing
  • A node sends a message to another node
  • The second node receives the message from the
    other

[Figure: Nodes 1, 2 and 3 sharing a memory (Write v, Read v) versus Nodes 1, 2 and 3 exchanging messages (Send v, Recv v)]
14
Shared Memory is Easier to Use
  • Shared Memory is easy to use
  • If information is written, collaboration
    progresses!
  • Message Passing is difficult to use
  • To which node should the information be sent?

15
Message Passing Tolerates Failures
  • Shared Memory is failure-prone
  • Communication relies on memory availability
  • Message-Passing is fault-tolerant
  • As long as there is a way to route a message

[Figure: Nodes 1, 2 and 3 with a shared memory (Read v, Write v) and with message passing (Send v, Recv v)]
16
The Best of the 2 Ways
  • Distributed Shared Memory (DSM)
  • emulates a Shared Memory to provide simplicity,
  • in the Message Passing model to tolerate
    failures.

[Figure: DSM interface, with read / write(v) operations and read-ack(v) / write-ack responses]
17
RoadMap
  • Necessary? Communicating in Large-Scale Systems
  • An Example of Distributed Shared Memory
  • Difficult? Facing Dynamism is not trivial
  • Difficult? Facing Scalability is tricky too
  • Doable? Yes, here is a solution!
  • Conclusion

18
Our DSM Consistency: Atomicity
  • Atomicity (linearizability) defines an operation
    ordering
  • If an operation ends before another starts, then
    it cannot be ordered after it
  • Write operations are totally ordered and read
    operations are ordered with respect to write
    operations
  • A read returns the last value written (or the
    default one if none exists)

19
Quorum-based DSM
Sharing Memory Robustly in Message-Passing
Systems. H. Attiya, A. Bar-Noy, D. Dolev. JACM
1995
  • Quorums: mutually intersecting sets of nodes
  • Ex.: 3 quorums of size q = 2, with memory size m = 3

Q1 ∩ Q2 ≠ Ø, Q1 ∩ Q3 ≠ Ø, Q2 ∩ Q3 ≠ Ø
  • Each node of the quorums maintains
  • A local value v of the object
  • A unique tag t, the version number of this value
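
The intersection property of this example can be checked mechanically. The sketch below is only an illustration: the node names "a", "b", "c" and the helper `is_quorum_system` are assumptions, not part of the presentation.

```python
from itertools import combinations

def is_quorum_system(quorums):
    """Return True if every pair of quorums shares at least one node."""
    return all(set(a) & set(b) for a, b in combinations(quorums, 2))

# Hypothetical node names: a memory of m = 3 nodes, quorums of size q = 2.
quorums = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
print(is_quorum_system(quorums))  # True
```

Two disjoint sets such as {"a"} and {"b"} would fail the check, which is the situation the later slides call a problem.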

20
Quorum-based DSM
  • Read and write operations
  • A node i reads the object value vk by
  • Asking each node j of a quorum for vj and tj
  • Choosing the value vk with the largest tag tk
  • Replicating vk and tk to all nodes of a quorum
  • A node i writes a new object value vn by
  • Asking each node j of a quorum for tj
  • Choosing a tag tn larger than any tj returned
  • Replicating vn and tn to all nodes of a quorum

Read: Get <vk,tk>, then Set <vk,tk>
Write: Get <vk,tk>, choose tn > tk, then Set <vn,tn>
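
The two-phase read and write described on this slide can be sketched in a few lines. This is a simplified, single-process illustration and not the authors' implementation: the `Replica` class and the `read`/`write` helpers are hypothetical names, quorums are passed in explicitly, and tags are plain integers (a real multi-writer protocol would also need a tie-breaker such as the writer identifier).

```python
class Replica:
    """One quorum member: holds a value and its tag (version number)."""
    def __init__(self):
        self.value, self.tag = None, 0

    def get(self):
        return self.value, self.tag

    def set(self, value, tag):
        # Keep only the most recent version.
        if tag > self.tag:
            self.value, self.tag = value, tag

def read(get_quorum, set_quorum):
    # Phase 1: ask every node of a quorum for its value and tag.
    value, tag = max((r.get() for r in get_quorum), key=lambda vt: vt[1])
    # Phase 2: replicate the freshest pair to a quorum before returning.
    for r in set_quorum:
        r.set(value, tag)
    return value

def write(new_value, get_quorum, set_quorum):
    # Phase 1: learn the largest tag known to a quorum.
    largest = max(r.get()[1] for r in get_quorum)
    # Phase 2: replicate the new value under a strictly larger tag.
    for r in set_quorum:
        r.set(new_value, largest + 1)

a, b, c = Replica(), Replica(), Replica()
write("v1", [a, b], [b, c])
print(read([a, c], [a, b]))  # v1
```

The read finds "v1" even though it consults a different quorum than the one written to, because any two quorums of size 2 among 3 nodes intersect.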
21
Quorum-based DSM
  • Reading a value

Q1
Q2
Q3
value? tag?
v1,t1
22
Quorum-based DSM
  • Reading a value

Q1
Q2
Q3
v1,t1
23
Quorum-based DSM
  • Reading a value

Q1
Q2
Q3
Output v1
24
Quorum-based DSM
  • Writing a value v2

Input v2
Q1
Q2
Q3
25
Quorum-based DSM
  • Writing a value v2

max tag?
t1
Q1
Q2
Q3
26
Quorum-based DSM
  • Writing a value v2

Q1
Q2
v2,t2 (with t2 > t1)
Q3
27
Quorum-based DSM
  • Works well in static systems
  • Number of failures f must satisfy f ≤ m - q

Q1 ∩ Q2 ≠ Ø, Q2 ∩ Q3 ≠ Ø
Q1
Q2
Q3
  • All operations can access a quorum

28
Quorum-based DSM
  • Does not work in dynamic systems
  • All quorums may fail if failures are unbounded

Problem: Q1 ∩ Q2 = Ø and Q1 ∩ Q3 = Ø and
Q2 ∩ Q3 = Ø
Q1
Q2
Q3
29
RoadMap
  • Necessary? Communicating in Large-Scale Systems
  • An Example of Distributed Shared Memory
  • Difficult? Facing Dynamism is not trivial
  • Difficult? Facing Scalability is tricky too
  • Doable? Yes, here is a solution!
  • Conclusion

30
Reconfiguring
  • Dynamism produces an unbounded number of failures
  • Solution: reconfiguration
  • Replacing the quorum configuration periodically

Problem: Q1 ∩ Q2 = Ø and Q1 ∩ Q3 = Ø and
Q2 ∩ Q3 = Ø
Q1
Q2
Q3
31
Agreeing on the Configuration
  • All must agree on the next configuration
  • Quorum-based consensus algorithm: Paxos
  • Previously, a consensus block complemented the DSM
    service
  • Paxos: a 3-phase leader-based algorithm
  • Prepare a ballot (2 message delays)
  • Propose a configuration to install (2 message
    delays)
  • Propagate the decided configuration (1 message
    delay)

RAMBO: Reconfigurable Atomic Memory Service for
Dynamic Networks. N. Lynch, A. Shvartsman. DISC
2002
32
RDS: Reconfigurable Distributed Storage
  • RDS integrates consensus service into the
    reconfigurable DSM
  • Fast version of Paxos
  • Remove the first phase (in some cases)
  • Quorums also propagate configuration
  • Ensuring Read/Write Atomicity
  • Piggyback object information into Paxos messages
  • Parallelizing Obsolete Configuration Removal
  • Add a message to the propagate phase
    of Paxos

33
Contributions
  • Operations are fast (sometimes optimal)
  • 1 to 2 message delays
  • Reconfiguration is fast (fault-tolerance)
  • 3 to 5 message delays
  • While
  • Operation atomicity and
  • Operation independence are preserved
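
One way to read the "3 to 5 message delays" figure is as the sum of the per-phase delays listed on the Paxos slide, with the prepare phase skipped in the fast case. The short sketch below only does that arithmetic; the dictionary and function names are illustrative assumptions.

```python
# Message delays per Paxos phase, as listed on the Paxos slide.
PHASE_DELAYS = {"prepare": 2, "propose": 2, "propagate": 1}

def reconfiguration_delays(skip_prepare=False):
    """Sum the delays of the phases that are actually executed."""
    phases = [p for p in PHASE_DELAYS if not (skip_prepare and p == "prepare")]
    return sum(PHASE_DELAYS[p] for p in phases)

print(reconfiguration_delays())                   # 5
print(reconfiguration_delays(skip_prepare=True))  # 3
```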

34
Facing Dynamism
  • Reconfigurable Distributed Storage
  • G. Chockler, S. Gilbert, V. Gramoli, P. Musial,
    A. Shvartsman
  • Proceedings of OPODIS 2005

35
RoadMap
  • Necessary? Communicating in Large-Scale Systems
  • An Example of Distributed Shared Memory
  • Difficult? Facing Dynamism is not trivial
  • Difficult? Facing Scalability is tricky too
  • Doable? Yes, here is a solution!
  • Conclusion

36
Facing Scalability is Difficult
  • Problems
  • Large-scale participation induces load
  • When load is too high, requests can be lost
  • Bandwidth resources are limited
  • Goal: tolerate load by preventing communication
    overhead
  • Solution: a DSM that adapts to load variations
    and restricts communication

37
Using Logical Overlay
  • Object replicas r1, ..., rk share a 2-dim
    coordinate space

[Figure: replicas r1, ..., rk laid out over the 2-dimensional coordinate space]
38
Benefiting from Locality
  • Each replica ri can communicate only with its
    nearest neighbors

[Figure: replica ri and its nearest neighbors in the coordinate space]

39
Repairing the Overlay
  • Topology takeover mechanism

If a node ri fails, a takeover node rj replaces it

A Scalable Content-Addressable Network. S.
Ratnasamy, P. Francis, M. Handley, R. Karp, S.
Shenker. SIGCOMM 2001
40
Dynamic Bi-Quorums
  • Bi-Quorums
  • Quorums of two types where not all quorums
    intersect
  • Quorums of different types intersect
  • Vertical quorum: all replicas responsible for an
    abscissa x
  • Horizontal quorum: all replicas responsible for an
    ordinate y

For any horizontal quorum H and any vertical
quorum V: H ∩ V ≠ Ø
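
The bi-quorum property is easy to see on a regular grid, where a horizontal quorum is a row and a vertical quorum is a column. The sketch below assumes such a regular 3x3 grid with hypothetical replica names r0..r8; the overlay in the slides is a coordinate space whose zones need not be this regular.

```python
def horizontal_quorum(grid, y):
    """All replicas in row y (responsible for ordinate y)."""
    return set(grid[y])

def vertical_quorum(grid, x):
    """All replicas in column x (responsible for abscissa x)."""
    return {row[x] for row in grid}

# Hypothetical 3x3 grid of replica names r0..r8.
grid = [[f"r{3 * y + x}" for x in range(3)] for y in range(3)]

# Every row meets every column in exactly one replica.
for y in range(3):
    for x in range(3):
        assert horizontal_quorum(grid, y) & vertical_quorum(grid, x) == {grid[y][x]}

# Two distinct rows never intersect: quorums of the same type need not meet.
print(horizontal_quorum(grid, 0) & horizontal_quorum(grid, 1))  # set()
```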
41
Operation Execution




  • Read Operation
  • 1) Get the up-to-date value and largest tag on a
    horizontal quorum,
  • 2) Propagate this value and tag on a vertical
    quorum.
  • Write Operation
  • 1) Get the up-to-date value and largest tag on a
    horizontal quorum,
  • 2) Propagate the value to write (and a higher
    tag) twice on the same vertical quorum.






42
Load Adaptation





Thwart: requests follow the diagonal until a
non-overloaded node is found.

Expansion: a node is added to the memory if no
non-overloaded node is found.

Shrink: if underloaded, a node leaves the memory
after having notified its neighbors.
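
The thwart rule can be illustrated on a small load matrix. This is a sketch under stated assumptions, not the SQUARE algorithm itself: the grid is regular, the diagonal wraps around at the border, and a node counts as overloaded once its load reaches a hypothetical `capacity` threshold.

```python
def thwart(grid_load, start, capacity):
    """Follow the diagonal from `start` until a non-overloaded node is found.

    Returns the (x, y) position of the node that accepts the request, or
    None if every node on the diagonal is overloaded (the case in which
    the slides expand the memory with a new node).
    """
    size = len(grid_load)
    x, y = start
    for _ in range(size):
        if grid_load[y][x] < capacity:
            return (x, y)
        # Next node on the diagonal (wrap-around is an assumption).
        x, y = (x + 1) % size, (y + 1) % size
    return None

load = [[5, 1, 1],
        [1, 5, 1],
        [1, 1, 2]]
print(thwart(load, (0, 0), capacity=5))  # (2, 2)
```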
43
Contributions
  • SQUARE is a DSM that
  • Scales well by tolerating load variations
  • Defines load-optimal quorums (under reasonable
    assumptions)
  • Uses communication-efficient reconfiguration

44
Operation Latency
Request rate   Memory size   Read latency   Write latency
100            10            479            733
125            14            622            812
250            24            1132           1396
500            46            1501           2173
1000           98            2408           3501

Bad news: the operation latency increases with
the load (request rate)
45
Facing Scalability is Difficult
  • P2P Architecture for Self-* Atomic Memory
  • E. Anceaume, M. Gradinariu, V. Gramoli, A.
    Virgillito
  • Proceedings of ISPAN 2005
  • SQUARE: Scalable Quorum-Based Atomic Memory
    with Local Reconfiguration
  • V. Gramoli, E. Anceaume, A. Virgillito
  • Proceedings of ACM SAC 2007

46
RoadMap
  • Necessary? Communicating in Large-Scale Systems
  • An Example of Distributed Shared Memory
  • Difficult? Facing Dynamism is not trivial
  • Difficult? Facing Scalability is tricky too
  • Doable? Yes, here is a solution!
  • Conclusion

47
Probability for Modeling Reality
  • Motivations for probabilistic solutions
  • A tradeoff limits the efficiency of deterministic
    solutions
  • Allowing more realistic models
  • Any node can fail independently
  • Even if it is unlikely that many nodes fail at
    the same time

48
What is Churn?
  • Churn is the intensity of dynamism!
  • Dynamic system
  • n interconnected nodes
  • Nodes join/leave the system
  • A joining node is new
  • Here, we model the churn simply as a parameter c
  • At each time unit, c·n nodes leave the network
  • At each time unit, c·n nodes enter the network
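
A small numeric sketch may help to see what this churn model implies. It assumes, beyond what the slide states, that the leaving nodes are chosen uniformly at random at each time unit, so that a given node survives one time unit with probability 1 - c; the function names are illustrative.

```python
def nodes_replaced_per_unit(c, n):
    """Number of nodes that leave (and as many that join) in one time unit."""
    return c * n

def surviving_fraction(c, t):
    """Expected fraction of the initial nodes still present after t time
    units, assuming leavers are picked uniformly at random each unit."""
    return (1 - c) ** t

print(nodes_replaced_per_unit(0.01, 10_000))   # 100.0
print(round(surviving_fraction(0.01, 10), 4))  # 0.9044
```

With 1% churn per time unit, roughly a tenth of the initial nodes have been replaced after ten time units, while the system size stays at n.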

49
Relaxing Consistency
  • Every operation verifies all atomicity rules with
    high probability!
  • Unsuccessful operation: an operation that violates
    at least one of those rules
  • Probabilistic Atomicity
  • If an operation Op1 ends before another Op2
    starts, then it is ordered after Op2 with probability
    ε = e^(-β²) (with β a constant). (If this happens,
    operation Op2 is considered unsuccessful.)
  • Write operations are totally ordered and read
    operations are ordered w.r.t. write operations
  • A read returns the last successfully written
    value (or the default one if none exists) with
    probability 1 - e^(-β²) (with β a constant). (If this
    does not hold, then the read is unsuccessful.)
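
To get a feel for how quickly the probability term on this slide shrinks, the sketch below evaluates it, read here as e^(-β²), for a few values of β. The chosen values of β are arbitrary examples.

```python
import math

def failure_term(beta):
    """The term e^(-beta^2) that appears in the probabilistic rules."""
    return math.exp(-beta ** 2)

for beta in (1, 2, 3):
    print(beta, failure_term(beta), 1 - failure_term(beta))
```

For β = 3 the term is already close to 10^-4, so a read would return the last successfully written value in nearly all cases.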

50
TQS: Timed Quorum System
  • Intersection is provided during a bounded period
    of time with high probability
  • A gossip-based algorithm runs in parallel
  • Shuffle the set of neighbors using a gossip-based
    algorithm
  • Traditional read/write operations using two
    message round-trips between the client and a
    quorum
  • Consult value and tag from a quorum
  • Create a new larger tag (if write)
  • Propagate value and tag to a quorum

51
TQS: Timed Quorum System
  • Contacting a quorum
  • Disseminate the message with TTL l to k neighbors,
  • Decrement the TTL of a message received for the
    first time.
  • Forward received messages to k neighbors if their
    TTL is not null.
  • So that at the end, we have contacted nodes

  • with Δ, the max period of time
    between 2 successful operations
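
The TTL-based dissemination can be simulated in a few lines. The sketch below is an illustration under assumptions: the neighbor map is a hypothetical fully connected overlay of 50 nodes, neighbors are picked uniformly at random, and the function name `disseminate` is not from the presentation.

```python
import random

def disseminate(neighbors, source, ttl, k, rng=random):
    """Spread a message from `source`: a node that receives it for the
    first time decrements the TTL and, if the TTL is not zero, forwards
    the message to k of its neighbors chosen at random."""
    contacted = {source}
    frontier = [(source, ttl)]
    while frontier:
        node, remaining = frontier.pop()
        if remaining == 0:
            continue  # TTL exhausted: do not forward
        fanout = min(k, len(neighbors[node]))
        for peer in rng.sample(neighbors[node], fanout):
            if peer not in contacted:  # first reception only
                contacted.add(peer)
                frontier.append((peer, remaining - 1))
    return contacted

# Hypothetical overlay: every node knows every other node.
n = 50
neighbors = {i: [j for j in range(n) if j != i] for i in range(n)}
quorum = disseminate(neighbors, source=0, ttl=3, k=2, rng=random.Random(1))
print(len(quorum))
```

With TTL 3 and fanout 2, at most 1 + 2 + 4 + 8 = 15 nodes are contacted; fewer when random picks hit nodes that already hold the message.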

52
Complexity of our Implementation
  • Assumptions
  • At least one operation succeeds every Δ time
    units
  • Gossip-based protocol provides uniformity
  • Operation Time Complexity (in expectation)
  • where D = (1-c)^(-Δ) is the dynamic parameter

54
Complexity of our Implementation
  • Operation Communication Complexity (in
    expectation)
  • where D = (1-c)^(-Δ) is the dynamic parameter
  • If D is a constant, then it reaches the communication
    complexity of static systems presented in

Probabilistic Quorum Systems. D. Malkhi, M.
Reiter, A. Wool, R. Wright. Information and
Computation, 2001
55
Probability of Success
[Figure: probability of non-intersection plotted against quorum size, for n = 10,000, with one curve each for 10%, 30%, 50%, 70% and 90% of failures]
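
For intuition about the shape of such curves, the failure-free case has a closed form: if two quorums of q nodes are each drawn uniformly at random among n nodes, they miss each other with probability C(n-q, q) / C(n, q). The sketch below computes only this static baseline; it does not reproduce the failure percentages of the figure, and the function name is an assumption.

```python
from math import comb

def non_intersection_probability(n, q):
    """Probability that two quorums of q nodes, each drawn uniformly at
    random among n nodes, have no node in common (no failures modeled)."""
    if 2 * q > n:
        return 0.0  # two such quorums must overlap
    return comb(n - q, q) / comb(n, q)

for q in (100, 200, 300):
    print(q, non_intersection_probability(10_000, q))
```

The probability drops quickly as q grows past the square root of n, which is the regime that probabilistic quorum systems exploit.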
56
Contributions
  • TQS relies on timely and probabilistic
    intersections
  • Operation latency is low
  • Operation communication complexity is low
  • No reconfigurations are needed
  • Replication is inherently done by the operations
  • Atomicity is ensured with high probability

57
A DSM to face Scalability and Dynamism
  • Core Persistence in Peer-to-Peer Systems:
    Relating Size to Lifetime
  • V. Gramoli, A.-M. Kermarrec, A. Mostéfaoui, M.
    Raynal, B. Sericola
  • Proceedings of RDDS 2006 (in conjunction with OTM
    2006)
  • Timed Quorum Systems for Large-Scale and Dynamic
    Environments
  • V. Gramoli, M. Raynal
  • Proceedings of OPODIS 2007

58
RoadMap
  • Necessary? Communicating in Large-Scale Systems
  • An Example of Distributed Shared Memory
  • Difficult? Facing Dynamism is not trivial
  • Difficult? Facing Scalability is tricky too
  • Doable? Yes, here is a solution!
  • Conclusion

59
Conclusion
  • We have presented three DSMs
  • Dynamism: RDS
  • Scalability: SQUARE
  • Dynamism and Scalability: TQS

60
Conclusion
Solution   Latency   Communication   Guarantee
RDS        Low       High            Safe
SQUARE     High      Low             Safe
TQS        Low       Low             High probability
61
Open Questions
  • Could we still speed up operations?
  • Disseminating continuously up-to-date values
  • Consulting values that have already been
    aggregated
  • How to model dynamism?
  • Differing results for P2P file sharing
  • What would it be for different applications?

62
END
63
Load Balancing
Good news: the load is well-balanced over the
replicas
64
Load Adaptation
Good news: the memory self-adapts well in the face of
dynamism
65
Reconfigurable Distributed Storage
  • Prepare phase
  • The leader creates a new ballot and sends it to
    quorums
  • A quorum of nodes sends back their candidate
    configuration
  • The leader chooses the configuration for the
    ballot
  • Propose phase
  • The leader sends the ballot and its configuration
    to quorums. The leader sends its tag and value and
    adds the current configuration
  • A quorum of nodes can send their ballot vote,
    their tag and value to quorums
  • These quorum nodes decide the next configuration
  • Propagate phase
  • These quorum nodes propagate the decided
    configuration to quorums
  • These quorum nodes remove the old configuration
    if not done already