Reliable Distributed Systems - PowerPoint PPT Presentation

1
Reliable Distributed Systems
  • Fault Tolerance
  • (Recoverability vs. High Availability)

2
Reliability and transactions
  • Transactions are well matched to database model
    and recoverability goals
  • Transactions don't work well for non-database
    applications (general-purpose O/S applications)
    or availability goals (systems that must keep
    running even if applications fail)
  • When building high-availability systems, we
    encounter the replication issue

3
Types of reliability
  • Recoverability
  • Server can restart without intervention in a
    sensible state
  • Transactions do give us this
  • High availability
  • System remains operational during failure
  • Challenge is to replicate critical data needed
    for continued operation

4
Replicating a transactional server
  • Two broad approaches
  • Just use distributed transactions to update
    multiple copies of each replicated data item
  • We already know how to do this, with 2PC
  • Each server has equal status
  • Somehow treat replication as a special situation
  • Leads to a primary server approach with a warm
    standby

5
Replication with 2PC
  • Our goal will be 1-copy serializability
  • Defined to mean that the multi-copy system
    behaves indistinguishably from a single-copy
    system
  • Considerable formal and theoretical work has
    been done on this
  • As a practical matter
  • Replicate each data item
  • Transaction manager
  • Reads any single copy
  • Updates all copies

6
Observation
  • Notice that transaction manager must know where
    the copies reside
  • In fact there are two models
  • Static replication set: the set is fixed,
    although some members may be down
  • Dynamic: the set changes while the system runs,
    but only operational members are listed in it
  • Today we stick to the static case

7
Replication and Availability
  • A series of potential issues
  • How can we update an object during periods when
    one of its replicas may be inaccessible?
  • How can 2PC protocol be made fault-tolerant?
  • A topic we'll study in more depth
  • But the bottom line is we can't!

8
Usual responses?
  • Quorum methods
  • Each replicated object has an update and a read
    quorum
  • Designed so that Qu + Qr > # replicas and
    Qu + Qu > # replicas
  • Idea is that any read or update will overlap with
    the last update

9
Quorum example
  • X is replicated at a, b, c, d, e
  • Possible values?
  • Qu = 1, Qr = 5 (violates Qu + Qu > 5)
  • Qu = 2, Qr = 4 (same issue)
  • Qu = 3, Qr = 3
  • Qu = 4, Qr = 2
  • Qu = 5, Qr = 1 (violates availability)
  • Probably prefer Qu = 4, Qr = 2
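The two intersection rules above can be checked mechanically; a minimal sketch in Python (the function name is hypothetical):

```python
def valid_quorums(n, qu, qr):
    """Quorum intersection rules for n replicas: any read must overlap
    the last update (Qu + Qr > n) and any two updates must overlap
    (Qu + Qu > n)."""
    return qu + qr > n and qu + qu > n

# X replicated at {a, b, c, d, e}, so n = 5:
assert not valid_quorums(5, 1, 5)   # two updates need not overlap
assert not valid_quorums(5, 2, 4)   # same issue
assert valid_quorums(5, 3, 3)
assert valid_quorums(5, 4, 2)
assert valid_quorums(5, 5, 1)       # legal, but one crash blocks all updates
```

Note that the Qu = 5 case passes the formula yet is useless in practice: the availability concern on the slide is a separate constraint the intersection rules do not capture.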

10
Things to notice
  • Even reading a data item requires that multiple
    copies be accessed!
  • This could be much slower than normal local
    access performance
  • Also, notice that we won't know if we succeeded
    in reaching the update quorum until we get
    responses
  • Implies that any quorum replication scheme needs
    a 2PC protocol to commit

11
Next issue?
  • Now we know that we can solve the availability
    problem for reads and updates if we have enough
    copies
  • What about for 2PC?
  • Need to tolerate crashes before or during runs of
    the protocol
  • A well-known problem

12
Availability of 2PC
  • It is easy to see that 2PC is not able to
    guarantee availability
  • Suppose that manager talks to 3 processes
  • And suppose 1 process and manager fail
  • The other 2 are stuck and can't terminate the
    protocol

13
What can be done?
  • We'll revisit this issue soon
  • Basically,
  • Can extend to a 3PC protocol that will tolerate
    failures if we have a reliable way to detect them
  • But network problems can be indistinguishable
    from failures
  • Hence there is no commit protocol that can
    tolerate failures
  • Anyhow, cost of 3PC is very high

14
A quandary?
  • We set out to replicate data for increased
    availability
  • And concluded that
  • Quorum scheme works for updates
  • But commit is required
  • And represents a vulnerability
  • Other options?

15
Other options
  • We mentioned primary-backup schemes
  • These are a second way to solve the problem
  • Based on the log at the data manager

16
Server replication
  • Suppose the primary sends the log to the backup
    server
  • The backup replays the log and applies
    committed transactions to its replicated state
  • If primary crashes, the backup soon catches up
    and can take over
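The log-replay step can be sketched as follows (a toy model; the record format and names are hypothetical). The key property matches the slide: the backup applies only transactions whose commit record has arrived, so updates in flight when the primary crashes are simply lost at the backup:

```python
def replay(log, state):
    """Apply committed transactions from a primary's log to replica state.
    Records are ('update', txn, key, value), ('commit', txn), ('abort', txn)."""
    pending = {}  # txn id -> buffered updates, not yet visible
    for record in log:
        kind, txn = record[0], record[1]
        if kind == "update":
            key, value = record[2], record[3]
            pending.setdefault(txn, []).append((key, value))
        elif kind == "commit":
            for key, value in pending.pop(txn, []):
                state[key] = value  # commit record seen: make updates visible
        elif kind == "abort":
            pending.pop(txn, None)  # discard the transaction's updates
    return state

# Txn 2's commit record never reached the backup, so its update is dropped:
state = replay([("update", 1, "x", 10), ("commit", 1),
                ("update", 2, "x", 20)], {})
assert state == {"x": 10}  # backup state may lag the primary's
```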

17
Primary/backup
primary backup
log
Clients initially connected to primary, which
keeps backup up to date. Backup tracks log
18
Primary/backup
primary backup
Primary crashes. Backup sees the channel break,
applies committed updates. But it may have missed
the last few updates!
19
Primary/backup
primary backup
Clients detect the failure and reconnect to
backup. But some clients may have gone away.
Backup state could be slightly stale. New
transactions might suffer from this
20
Issues?
  • Under what conditions should the backup take
    over?
  • Revisits the consistency problem seen earlier
    with clients and servers
  • Could end up with a split brain
  • Also notice that we still need 2PC to ensure
    that primary and backup stay in the same states!

21
Split brain reminder
primary backup
log
Clients initially connected to primary, which
keeps backup up to date. Backup follows log
22
Split brain reminder
primary backup
Transient problem causes some links to break but
not all. Backup thinks it is now primary, primary
thinks backup is down
23
Split brain reminder
primary backup
Some clients still connected to primary, but one
has switched to backup and one is completely
disconnected from both
24
Implication?
  • A strict interpretation of ACID leads to the
    conclusion that
  • There are no ACID replication schemes that
    provide high availability
  • Most real systems solve this by weakening ACID

25
Real systems
  • They use primary-backup with logging
  • But they simply omit the 2PC
  • Backup might take over in the wrong state (it
    may lag the state of the primary)
  • Can use hardware to reduce or eliminate split
    brain problem

26
How does hardware help?
  • Idea is that primary and backup share a disk
  • Hardware is configured so only one can write the
    disk
  • If the backup takes over, it grabs the token
  • Token loss causes the primary to shut down (if
    it hasn't actually crashed)
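The shared-disk token has a common software analogue: an exclusive lock on a shared file, where only the lock holder may act as primary. A sketch (this is not the hardware scheme itself; the path and names are hypothetical):

```python
# Whichever server holds an exclusive lock on a shared file is primary;
# a second claimant is refused, which prevents split brain.
import fcntl
import os
import tempfile

def try_grab_token(path):
    """Return an open handle if we won the token, else None."""
    f = open(path, "w")
    try:
        # Non-blocking exclusive lock: fails at once if the token is held.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except OSError:
        f.close()
        return None  # someone else is primary; do not take over

token_path = os.path.join(tempfile.mkdtemp(), "ha-token")
primary = try_grab_token(token_path)   # first grab succeeds
backup = try_grab_token(token_path)    # second grab is refused
assert primary is not None and backup is None
```

Unlike the hardware token, a file lock disappears with the machine holding it, so this sketch only approximates the "token loss shuts down the primary" behavior described above.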

27
Reconciliation
  • This is the problem of fixing the transactions
    impacted by the lack of 2PC
  • Usually just a handful of transactions
  • They committed, but the backup doesn't know,
    because it never saw the commit record
  • Later, the server recovers and we discover the
    problem
  • Need to apply the missing ones
  • Also causes cascaded rollback
  • Worst case may require human intervention

28
Summary
  • Reliability can be understood in terms of
  • Availability: system keeps running during a crash
  • Recoverability: system can recover automatically
  • Transactions are best for the latter
  • Some systems need both sorts of mechanisms, but
    there are deep tradeoffs involved

29
Replication and High Availability
  • All is not lost!
  • Suppose we move away from the transactional model
  • Can we replicate data at lower cost and with high
    availability?
  • Leads to virtual synchrony model
  • Treats data as the state of a group of
    participating processes
  • Replicated update done with multicast

30
Steps to a solution
  • First look more closely at 2PC, 3PC, failure
    detection
  • 2PC and 3PC both block in real settings
  • But we can replace failure detection by consensus
    on membership
  • Then these protocols become non-blocking
    (although solving a slightly different problem)
  • Generalized approach leads to ordered atomic
    multicast in dynamic process groups

31
Non-blocking Commit
  • Goal: a protocol that allows all operational
    processes to terminate the protocol even if some
    subset crash
  • Needed if we are to build high availability
    transactional systems (or systems that use quorum
    replication)

32
Definition of problem
  • Given a set of processes, one of which wants to
    initiate an action
  • Participants may vote for or against the action
  • Originator will perform the action only if all
    vote in favor; if any votes against (or doesn't
    vote), we will abort the protocol and not take
    the action
  • Goal is all-or-nothing outcome

33
Non-triviality
  • Want to avoid solutions that do nothing (trivial
    case of all or none)
  • Would like to say that if all vote for commit,
    the protocol will commit
  • ... but in distributed systems we can't be sure
    votes will reach the coordinator!
  • Any live protocol risks making a mistake and
    counting a live process that voted to commit as a
    failed process, leading to an abort
  • Hence, non-triviality condition is hard to capture

34
Typical protocol
  • Coordinator asks all processes if they can take
    the action
  • Processes decide if they can and send back "ok"
    or "abort"
  • Coordinator collects all the answers (or times
    out)
  • Coordinator computes outcome and sends it back
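The four steps above can be sketched as a toy, in-process model (all names are hypothetical; a participant that times out is modeled as voting `None`):

```python
def two_phase_commit(participants):
    """participants: callables that return 'ok', 'abort', or None
    (None models a participant that timed out without voting)."""
    votes = [p() for p in participants]      # phase 1: ask everyone, collect votes
    if all(v == "ok" for v in votes):        # unanimous consent required
        decision = "commit"
    else:
        decision = "abort"                   # any 'abort' or timeout aborts
    return decision                          # phase 2: send outcome to all

assert two_phase_commit([lambda: "ok", lambda: "ok"]) == "commit"
assert two_phase_commit([lambda: "ok", lambda: None]) == "abort"  # timeout
```

Note how a timed-out vote is indistinguishable from a vote to abort: this is exactly the non-triviality problem from the previous slide.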

35
Commit protocol illustrated
ok to commit?
36
Commit protocol illustrated
ok to commit?
ok with us
37
Commit protocol illustrated
ok to commit?
ok with us
commit
Note garbage collection protocol not shown here
38
Failure issues
  • So far, have implicitly assumed that processes
    fail by halting (and hence not voting)
  • In real systems a process could fail in arbitrary
    ways, even maliciously
  • This has led to work on the Byzantine generals
    problem, a variation on commit set in a
    synchronous model with malicious failures

39
Failure model impacts costs!
  • Byzantine model is very costly: 3t + 1 processes
    needed to overcome t failures; protocol runs in
    t + 1 rounds
  • This cost is unacceptable for most real systems,
    hence protocols are rarely used
  • Main areas of application: hardware
    fault-tolerance, security systems
  • For these reasons, we won't study such protocols

40
Commit with simpler failure model
  • Assume processes fail by halting
  • Coordinator detects failures (unreliably) using
    timeouts. It can make mistakes!
  • Now the challenge is to terminate the protocol if
    the coordinator fails instead of, or in addition
    to, a participant!

41
Commit protocol illustrated
ok to commit?
ok with us
times out: abort!
crashed!
Note garbage collection protocol not shown here
42
Example of a hard scenario
  • Coordinator starts the protocol
  • One participant votes to abort, all others to
    commit
  • Coordinator and one participant now fail
  • ... we now lack the information to correctly
    terminate the protocol!

43
Commit protocol illustrated
ok to commit?
vote unknown!
ok
decision unknown!
ok
44
Example of a hard scenario
  • Problem is that if coordinator told the failed
    participant to abort, all must abort
  • If it voted for commit and was told to commit,
    all must commit
  • Surviving participants can't deduce the outcome
    without knowing how the failed participant voted
  • Thus protocol blocks until recovery occurs

45
Skeen: Three-phase commit
  • Seeks to increase availability
  • Makes an unrealistic assumption that failures are
    accurately detectable
  • With this, can terminate the protocol even if a
    failure does occur

46
Skeen: Three-phase commit
  • Coordinator starts protocol by sending request
  • Participants vote to commit or to abort
  • Coordinator collects votes, decides on outcome
  • Coordinator can abort immediately
  • To commit, coordinator first sends a prepare to
    commit message
  • Participants acknowledge, commit occurs during a
    final round of commit messages
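The three rounds can be sketched as a toy model (names hypothetical; the coordinator and network are collapsed into one function). The extra "prepare to commit" round is what separates the decision from the commit point:

```python
def three_phase_commit(votes):
    """votes: participant name -> 'ok' or 'abort'. Returns each
    participant's final state after the three rounds."""
    if any(v != "ok" for v in votes.values()):    # round 1: collect votes
        return {p: "aborted" for p in votes}       # coordinator aborts at once
    states = {p: "prepared" for p in votes}        # round 2: prepare to commit
    # Round 3 runs only after every participant acknowledged "prepared",
    # so reaching "prepared" anywhere implies a unanimous commit vote.
    return {p: "committed" for p in states}

assert three_phase_commit({"a": "ok", "b": "ok"}) == {"a": "committed",
                                                      "b": "committed"}
assert three_phase_commit({"a": "ok", "b": "abort"})["a"] == "aborted"
```

The invariant in the round-2 comment is what the next slide's observations rely on: survivors can run the protocol forward or backward because "prepared" is unambiguous evidence of the vote.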

47
Three phase commit protocol illustrated
ok ....
prepared...
Note garbage collection protocol not shown here
48
Observations about 3PC
  • If any process is in "prepare to commit", all
    voted for commit
  • Protocol commits only when all surviving
    processes have acknowledged "prepare to commit"
  • After coordinator fails, it is easy to run the
    protocol forward to commit state (or back to
    abort state)

49
Assumptions about failures
  • If the coordinator suspects a failure, the
    failure is real and the faulty process, if it
    later recovers, will know it was faulty
  • Failures are detectable with bounded delay
  • On recovery, process must go through a
    reconnection protocol to rejoin the system!
    (Find out status of pending protocols that
    terminated while it was not operational)

50
Problems with 3PC
  • With realistic failure detectors (that can make
    mistakes), protocol still blocks!
  • Bad case arises during network partitioning
    when the network splits the participating
    processes into two or more sets of operational
    processes
  • Can prove that this problem is not avoidable:
    there are no non-blocking commit protocols for
    asynchronous networks

51
Situation in practical systems?
  • Most use protocols based on 2PC; 3PC is more
    costly and, ultimately, still subject to blocking!
  • Need to extend with a form of garbage-collection
    mechanism to avoid accumulation of protocol state
    information (can be done in the background)
  • Some systems simply accept the risk of blocking
    when a failure occurs
  • Others weaken the consistency property to make
    progress, at the risk of inconsistency with
    failed processes

52
Process groups
  • To overcome the cost of replication we will
    introduce the dynamic process group model
    (processes that join and leave while the system
    is running)
  • Will also relax our consistency goal: seek only
    consistency within a set of processes that all
    remain operational and members of the system
  • In this model, 3PC is non-blocking!
  • Yields an extremely cheap replication scheme!

53
Failure detection
  • Basic question: how to detect a failure?
  • Wait until the process recovers. If it was dead,
    it tells you
  • "I died, but I feel much better now"
  • Could be a long wait
  • Use some form of probe
  • But might make mistakes
  • Substitute agreement on membership
  • Now, failure is a "soft" concept
  • Rather than "up" or "down" we think about
    whether a process is behaving acceptably in the
    eyes of its peer processes
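The "probe, but might make mistakes" point can be illustrated with a heartbeat-style detector (a sketch; names hypothetical, with an injected clock for determinism). A process is only ever *suspected*: a slow-but-alive process looks exactly like a crashed one, which is why the slide substitutes agreement on membership:

```python
import time

class Detector:
    """Suspects a process when no heartbeat arrives within `timeout`."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = {}  # process -> time of last heartbeat

    def heartbeat(self, p, now=None):
        self.last_heard[p] = time.monotonic() if now is None else now

    def status(self, p, now=None):
        now = time.monotonic() if now is None else now
        if p not in self.last_heard:
            return "unknown"
        if now - self.last_heard[p] <= self.timeout:
            return "alive"
        return "suspected"  # maybe crashed, maybe just slow: we cannot tell

d = Detector(timeout=2.0)
d.heartbeat("P", now=0.0)
assert d.status("P", now=1.0) == "alive"
assert d.status("P", now=5.0) == "suspected"
```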

54
Architecture
[Layer diagram, top to bottom:]
Applications use replicated data for high
availability
3PC-like protocols use membership changes instead
of failure notifications
Membership Agreement handles join/leave requests
and reports such as "P seems to be unresponsive"
55
Issues?
  • How to detect failures
  • Can use timeout
  • Or could use other system monitoring tools and
    interfaces
  • Sometimes can exploit hardware
  • Tracking membership
  • Basically, need a new replicated service
  • System membership lists are the data it manages
  • Well say it takes join/leave requests as input
    and produces views as output
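That input/output contract can be sketched directly (a toy, single-process model; names hypothetical). Join and leave requests go in; a sequence of views comes out, with a suspected failure handled the same way as a voluntary leave:

```python
class GMS:
    """Toy membership service: join/leave in, views out."""
    def __init__(self):
        self.members = []
        self.views = []   # history of emitted views, in order

    def _emit(self):
        self.views.append(tuple(self.members))

    def join(self, p):
        if p not in self.members:
            self.members.append(p)
            self._emit()

    def leave(self, p):   # also used when peers agree p has failed
        if p in self.members:
            self.members.remove(p)
            self._emit()

g = GMS()
g.join("A"); g.join("B"); g.join("D")
g.leave("B")              # B leaves voluntarily
g.leave("A")              # peers deem A faulty: treated identically
assert g.views[-1] == ("D",)
```

The real service is itself replicated across the GMS members, so each view change must be agreed on by the core set, which is the 3PC-like protocol discussed later.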

56
Architecture
[Diagram: application processes B, C, D issue join
and leave requests to the GMS processes X, Y, Z.
As members come and go, the GMS emits the
membership views A; A,B,D; A,D; A,D,C; D,C.
"A seems to have failed" triggers A's removal.]
57
Issues
  • Group membership service (GMS) has just a small
    number of members
  • This core set will track membership for a large
    number of system processes
  • Internally it runs a group membership protocol
    (GMP)
  • Full system membership list is just replicated
    data managed by GMS members, updated using
    multicast

58
GMP design
  • What protocol should we use to track the
    membership of the GMS itself?
  • Must avoid split-brain problem
  • Desire continuous availability
  • We'll see that a version of 3PC can be used
  • But can't always guarantee liveness

59
Reading ahead?
  • Read chapters 12, 13
  • Thought problem: how important is external
    consistency (called "dynamic uniformity" in the
    text)?
  • Homework Read about FLP. Identify other
    impossibility results for distributed systems.
    What is the simplest case of an impossibility
    result that you can identify?