Title: Reliable Distributed Systems
1. Reliable Distributed Systems
- Fault Tolerance
- (Recoverability vs. High Availability)
2. Reliability and transactions
- Transactions are well matched to the database model and recoverability goals
- Transactions don't work well for non-database applications (general purpose O/S applications) or availability goals (systems that must keep running if applications fail)
- When building high availability systems, encounter the replication issue
3. Types of reliability
- Recoverability
  - Server can restart without intervention in a sensible state
  - Transactions do give us this
- High availability
  - System remains operational during failure
  - Challenge is to replicate critical data needed for continued operation
4. Replicating a transactional server
- Two broad approaches
  - Just use distributed transactions to update multiple copies of each replicated data item
    - We already know how to do this, with 2PC
    - Each server has equal status
  - Somehow treat replication as a special situation
    - Leads to a primary server approach with a warm standby
5. Replication with 2PC
- Our goal will be 1-copy serializability
  - Defined to mean that the multi-copy system behaves indistinguishably from a single-copy system
  - Considerable formal and theoretical work has been done on this
- As a practical matter
  - Replicate each data item
  - Transaction manager
    - Reads any single copy
    - Updates all copies
6. Observation
- Notice that the transaction manager must know where the copies reside
- In fact there are two models
  - Static replication set: basically, the set is fixed, although some members may be down
  - Dynamic: the set changes while the system runs, but only has operational members listed within it
- Today: stick to the static case
7. Replication and Availability
- A series of potential issues
  - How can we update an object during periods when one of its replicas may be inaccessible?
  - How can the 2PC protocol be made fault-tolerant?
    - A topic we'll study in more depth
    - But the bottom line is: we can't!
8. Usual responses?
- Quorum methods
  - Each replicated object has an update quorum (Qu) and a read quorum (Qr)
  - Designed so that Qu + Qr > # replicas and Qu + Qu > # replicas
  - Idea is that any read or update will overlap with the last update
9. Quorum example
- X is replicated at {a, b, c, d, e}
- Possible values?
  - Qu = 1, Qr = 5 (violates Qu + Qu > 5)
  - Qu = 2, Qr = 4 (same issue)
  - Qu = 3, Qr = 3
  - Qu = 4, Qr = 2
  - Qu = 5, Qr = 1 (violates availability)
- Probably prefer Qu = 4, Qr = 2 (checked in the sketch below)
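To make the arithmetic concrete, here is a minimal Python sketch (function and variable names are illustrative, not from the lecture) that checks the overlap conditions against the five-replica example:

```python
def quorum_ok(n: int, qu: int, qr: int) -> bool:
    """Overlap conditions: any read sees the last update (Qu + Qr > n),
    and any two updates overlap (Qu + Qu > n)."""
    return qu + qr > n and 2 * qu > n

if __name__ == "__main__":
    N = 5  # X replicated at a, b, c, d, e
    for qu, qr in [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]:
        # Qu = N satisfies the overlap rule but not availability:
        # a single crashed replica then blocks every update.
        print(f"Qu={qu}, Qr={qr}: overlap {'ok' if quorum_ok(N, qu, qr) else 'violated'},"
              f" updates survive a crash: {qu < N}")
```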
10. Things to notice
- Even reading a data item requires that multiple copies be accessed!
  - This could be much slower than normal local access performance
- Also, notice that we won't know if we succeeded in reaching the update quorum until we get responses
  - Implies that any quorum replication scheme needs a 2PC protocol to commit
11. Next issue?
- Now we know that we can solve the availability problem for reads and updates if we have enough copies
- What about for 2PC?
  - Need to tolerate crashes before or during runs of the protocol
  - A well-known problem
12. Availability of 2PC
- It is easy to see that 2PC is not able to guarantee availability
- Suppose that the manager talks to 3 processes
- And suppose 1 process and the manager fail
- The other 2 are stuck and can't terminate the protocol
13. What can be done?
- We'll revisit this issue soon
- Basically,
  - Can extend to a 3PC protocol that will tolerate failures if we have a reliable way to detect them
  - But network problems can be indistinguishable from failures
  - Hence there is no commit protocol that can tolerate failures
  - Anyhow, the cost of 3PC is very high
14. A quandary?
- We set out to replicate data for increased availability
- And concluded that
  - The quorum scheme works for updates
  - But commit is required
  - And represents a vulnerability
- Other options?
15. Other options
- We mentioned primary-backup schemes
- These are a second way to solve the problem
- Based on the log at the data manager
16. Server replication
- Suppose the primary sends the log to the backup server
- It replays the log and applies committed transactions to its replicated state (see the replay sketch below)
- If the primary crashes, the backup soon catches up and can take over
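A minimal sketch of what the backup's replay loop might look like, assuming a simple (txn, kind, payload) log record format invented here for illustration:

```python
# Backup replays the primary's log, applying only transactions whose
# COMMIT record arrived; on takeover it is at worst missing the final
# few updates that were in flight at the primary.
from typing import Dict, List, Tuple

def replay(log: List[Tuple[str, str, object]]) -> Dict[str, object]:
    """Log entries: (txn_id, kind, payload), where kind is 'update'
    (payload = (key, value)) or 'commit' (payload unused)."""
    state: Dict[str, object] = {}
    pending: Dict[str, List[Tuple[str, object]]] = {}
    for txn, kind, payload in log:
        if kind == "update":
            pending.setdefault(txn, []).append(payload)
        elif kind == "commit":
            for key, value in pending.pop(txn, []):
                state[key] = value  # apply only committed work
    return state  # uncommitted transactions left in `pending` are discarded

log = [
    ("t1", "update", ("x", 1)), ("t1", "commit", None),
    ("t2", "update", ("y", 2)),  # commit record never reached the backup
]
print(replay(log))  # {'x': 1} -- t2 is lost; this is the reconciliation problem
```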
17. Primary/backup
[Diagram: primary and backup servers connected by the log; clients attached to the primary]
Clients initially connected to primary, which keeps backup up to date. Backup tracks log.
18. Primary/backup
[Diagram: primary crashed; backup still running]
Primary crashes. Backup sees the channel break, applies committed updates. But it may have missed the last few updates!
19. Primary/backup
[Diagram: clients reconnected to the backup]
Clients detect the failure and reconnect to backup. But some clients may have gone away. Backup state could be slightly stale. New transactions might suffer from this.
20. Issues?
- Under what conditions should the backup take over?
  - Revisits the consistency problem seen earlier with clients and servers
  - Could end up with a split brain
- Also notice that this still needs 2PC to ensure that primary and backup stay in the same state!
21. Split brain reminder
[Diagram: primary and backup connected by the log; clients attached to the primary]
Clients initially connected to primary, which keeps backup up to date. Backup follows log.
22. Split brain reminder
[Diagram: link between primary and backup broken; both still running]
Transient problem causes some links to break, but not all. Backup thinks it is now primary; primary thinks backup is down.
23. Split brain reminder
[Diagram: clients split between primary and backup]
Some clients still connected to primary, but one has switched to backup and one is completely disconnected from both.
24. Implication?
- A strict interpretation of ACID leads to the conclusion that
  - There are no ACID replication schemes that provide high availability
- Most real systems solve this by weakening ACID
25. Real systems
- They use primary-backup with logging
  - But they simply omit the 2PC
- Server might take over in the wrong state (may lag the state of the primary)
- Can use hardware to reduce or eliminate the split brain problem
26. How does hardware help?
- Idea is that primary and backup share a disk
- Hardware is configured so only one can write the disk
- If the backup takes over, it grabs the token (see the fencing sketch below)
- Token loss causes the primary to shut down (if it hasn't actually crashed)
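A software sketch of the token idea, assuming an advisory lock on a shared file stands in for the hardware write token (real deployments typically rely on disk controller support such as SCSI reservations; the path is a placeholder):

```python
# Token-based fencing sketch: whoever holds the exclusive lock is the
# unique writer; losing or failing to get it means "shut down, don't
# risk split brain". Unix-only (uses fcntl file locks).
import fcntl
import sys

def acquire_token(path: str):
    """Try to grab the exclusive write token; return the handle or None."""
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f          # we are the unique writer
    except OSError:
        f.close()
        return None       # someone else holds the token

token = acquire_token("/tmp/shared-disk.token")
if token is None:
    # Another server holds the token: refuse to act as primary.
    sys.exit("token held elsewhere; refusing to act as primary")
print("acting as primary")
```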
27. Reconciliation
- This is the problem of fixing the transactions impacted by the lack of 2PC
- Usually just a handful of transactions
  - They committed, but the backup doesn't know because it never saw the commit record
  - Later, the server recovers and we discover the problem
- Need to apply the missing ones (see the sketch below)
  - Also causes cascaded rollback
- Worst case may require human intervention
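A minimal sketch of the detection step, reusing the illustrative log format from the replay sketch on slide 16 (names are assumptions, not from the lecture):

```python
# Reconciliation sketch: find transactions the primary committed that the
# backup never applied. These must be re-applied, and anything that read
# their data may need cascaded rollback.
from typing import List, Set, Tuple

def missing_commits(primary_log: List[Tuple[str, str, object]],
                    applied_at_backup: Set[str]) -> Set[str]:
    committed = {txn for txn, kind, _ in primary_log if kind == "commit"}
    return committed - applied_at_backup

log = [("t1", "update", ("x", 1)), ("t1", "commit", None),
       ("t2", "update", ("y", 2)), ("t2", "commit", None)]
print(missing_commits(log, {"t1"}))  # {'t2'}: committed but unseen by backup
```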
28. Summary
- Reliability can be understood in terms of
  - Availability: system keeps running during a crash
  - Recoverability: system can recover automatically
- Transactions are best for the latter
- Some systems need both sorts of mechanisms, but there are deep tradeoffs involved
29. Replication and High Availability
- All is not lost!
- Suppose we move away from the transactional model
  - Can we replicate data at lower cost and with high availability?
- Leads to the virtual synchrony model
  - Treats data as the state of a group of participating processes
  - Replicated update done with multicast
30. Steps to a solution
- First look more closely at 2PC, 3PC, failure detection
- 2PC and 3PC both block in real settings
  - But we can replace failure detection by consensus on membership
  - Then these protocols become non-blocking (although solving a slightly different problem)
- Generalized approach leads to ordered atomic multicast in dynamic process groups
31. Non-blocking Commit
- Goal: a protocol that allows all operational processes to terminate the protocol even if some subset crash
- Needed if we are to build high availability transactional systems (or systems that use quorum replication)
32. Definition of problem
- Given a set of processes, one of which wants to initiate an action
- Participants may vote for or against the action
- Originator will perform the action only if all vote in favor; if any votes against (or doesn't vote), we will abort the protocol and not take the action
- Goal is an all-or-nothing outcome
33. Non-triviality
- Want to avoid solutions that do nothing (trivial case of all or none)
- Would like to say that if all vote for commit, the protocol will commit
  - ... but in distributed systems we can't be sure votes will reach the coordinator!
  - Any live protocol risks making a mistake and counting a live process that voted to commit as a failed process, leading to an abort
- Hence, the non-triviality condition is hard to capture
34. Typical protocol
- Coordinator asks all processes if they can take the action
- Processes decide if they can, and send back "ok" or "abort"
- Coordinator collects all the answers (or times out)
- Coordinator computes the outcome and sends it back (sketched below)
35. Commit protocol illustrated
[Diagram: coordinator sends "ok to commit?" to the participants]
36. Commit protocol illustrated
[Diagram: participants reply "ok with us"]
37. Commit protocol illustrated
[Diagram: coordinator sends "commit"]
Note: garbage collection protocol not shown here.
38. Failure issues
- So far, we have implicitly assumed that processes fail by halting (and hence not voting)
- In real systems a process could fail in arbitrary ways, even maliciously
- This has led to work on the Byzantine generals problem, which is a variation on commit set in a synchronous model with malicious failures
39. Failure model impacts costs!
- Byzantine model is very costly: 3t+1 processes needed to overcome t failures; protocol runs in t+1 rounds
- This cost is unacceptable for most real systems, hence these protocols are rarely used
- Main areas of application: hardware fault-tolerance, security systems
- For these reasons, we won't study such protocols
40. Commit with simpler failure model
- Assume processes fail by halting
- Coordinator detects failures (unreliably) using timeouts. It can make mistakes!
- Now the challenge is to terminate the protocol if the coordinator fails instead of, or in addition to, a participant!
41. Commit protocol illustrated
[Diagram: coordinator sends "ok to commit?"; the live participants answer "ok with us", but one participant has crashed; the coordinator times out and sends "abort!"]
Note: garbage collection protocol not shown here.
42. Example of a hard scenario
- Coordinator starts the protocol
- One participant votes to abort, all others to commit
- Coordinator and one participant now fail
- ... we now lack the information to correctly terminate the protocol!
43. Commit protocol illustrated
[Diagram: coordinator sends "ok to commit?" and then fails; two participants answered "ok", but the failed participant's vote is unknown, so the decision is unknown to the survivors]
44. Example of a hard scenario
- Problem is that if the coordinator told the failed participant to abort, all must abort
- If it voted for commit and was told to commit, all must commit
- Surviving participants can't deduce the outcome without knowing how the failed participant voted
- Thus the protocol blocks until recovery occurs
45. Skeen: Three-phase commit
- Seeks to increase availability
- Makes an unrealistic assumption that failures are accurately detectable
- With this, can terminate the protocol even if a failure does occur
46. Skeen: Three-phase commit
- Coordinator starts the protocol by sending a request
- Participants vote to commit or to abort
- Coordinator collects votes, decides on outcome
  - Coordinator can abort immediately
  - To commit, coordinator first sends a "prepare to commit" message
  - Participants acknowledge; commit occurs during a final round of commit messages (sketched below)
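A minimal sketch of the coordinator's three phases, using the same faked transport as the 2PC sketch earlier; it assumes the accurate failure detection noted on the previous slide:

```python
# 3PC coordinator sketch (Skeen): vote, prepare-to-commit, commit.
from typing import Callable, List, Optional

def run_3pc(participants: List[Callable[[str], Optional[str]]]) -> str:
    # Phase 1: collect votes; any missing or negative vote aborts immediately.
    if not all(p("ok to commit?") == "ok" for p in participants):
        for p in participants:
            p("abort")
        return "abort"
    # Phase 2: everyone voted yes; announce "prepare to commit" and collect
    # acknowledgements, so survivors can always deduce the outcome later.
    acks = [p("prepare to commit") for p in participants]
    if not all(a == "prepared" for a in acks):
        for p in participants:
            p("abort")  # still safe: nobody has committed yet
        return "abort"
    # Phase 3: all survivors are prepared; commit is now inevitable.
    for p in participants:
        p("commit")
    return "commit"

good = lambda msg: {"ok to commit?": "ok", "prepare to commit": "prepared"}.get(msg)
print(run_3pc([good, good]))  # commit
```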
47. Three-phase commit protocol illustrated
[Diagram: vote collection ("ok ...."), then the prepare-to-commit round ("prepared..."), then the final commit round]
Note: garbage collection protocol not shown here.
48. Observations about 3PC
- If any process is in "prepare to commit", all voted for commit
- Protocol commits only when all surviving processes have acknowledged "prepare to commit"
- After the coordinator fails, it is easy to run the protocol forward to the commit state (or back to the abort state)
49. Assumptions about failures
- If the coordinator suspects a failure, the failure is real, and the faulty process, if it later recovers, will know it was faulty
- Failures are detectable with bounded delay
- On recovery, a process must go through a reconnection protocol to rejoin the system! (Find out the status of pending protocols that terminated while it was not operational)
50. Problems with 3PC
- With realistic failure detectors (that can make mistakes), the protocol still blocks!
- Bad case arises during "network partitioning", when the network splits the participating processes into two or more sets of operational processes
- Can prove that this problem is not avoidable: there are no non-blocking commit protocols for asynchronous networks
51. Situation in practical systems?
- Most use protocols based on 2PC: 3PC is more costly and, ultimately, still subject to blocking!
- Need to extend with a form of garbage collection mechanism to avoid accumulation of protocol state information (can solve in the background)
- Some systems simply accept the risk of blocking when a failure occurs
- Others reduce the consistency property to make progress, at the risk of inconsistency with failed processes
52. Process groups
- To overcome the cost of replication, we will introduce a dynamic process group model (processes that join and leave while the system is running)
- Will also relax our consistency goal: seek only consistency within a set of processes that all remain operational and members of the system
- In this model, 3PC is non-blocking!
- Yields an extremely cheap replication scheme!
53. Failure detection
- Basic question: how to detect a failure
  - Wait until the process recovers. If it was dead, it tells you
    - "I died, but I feel much better now"
    - Could be a long wait
  - Use some form of probe (sketched below)
    - But might make mistakes
  - Substitute agreement on membership
    - Now, failure is a "soft" concept
    - Rather than "up" or "down", we think about whether a process is behaving acceptably in the eyes of peer processes
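A minimal sketch of a timeout-based probe (host, port, and timeout are placeholders). Note that it can only *suspect* a failure: a partition or a slow peer looks exactly like a crash, which is why we substitute agreement on membership:

```python
# Timeout-based failure probe sketch: tries to open a TCP connection.
import socket

def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if the peer accepted a TCP connection within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # suspected faulty -- but possibly just unreachable

if __name__ == "__main__":
    print("peer responsive:", probe("127.0.0.1", 8080))
```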
54. Architecture
[Layered diagram, top to bottom:
- Applications use replicated data for high availability
- 3PC-like protocols use membership changes instead of failure notification
- Membership: agreement on join/leave and "P seems to be unresponsive" events]
55. Issues?
- How to detect failures
  - Can use timeout
  - Or could use other system monitoring tools and interfaces
  - Sometimes can exploit hardware
- Tracking membership
  - Basically, need a new replicated service
  - System membership lists are the data it manages
  - We'll say it takes join/leave requests as input and produces "views" as output (sketched below)
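A minimal sketch of the view-producing logic, with an event sequence chosen to mirror the figure on the next slide (the event list and names are illustrative):

```python
# Membership service sketch: each join/leave/fail event yields a new view.
from typing import List, Tuple

def views(events: List[Tuple[str, str]]) -> List[List[str]]:
    members: List[str] = []
    out: List[List[str]] = []
    for kind, proc in events:
        if kind == "join":
            members.append(proc)
        else:  # "leave" and "fail" both remove the member
            members.remove(proc)
        out.append(list(members))  # each membership change is a new view
    return out

for v in views([("join", "A"), ("join", "B"), ("join", "D"),
                ("leave", "B"), ("join", "C"), ("fail", "A")]):
    print(v)
# A / A,B / A,B,D / A,D / A,D,C / D,C
```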
56. Architecture
[Diagram: application processes A, B, C, D alongside GMS processes X, Y, Z. B joins and later leaves, C joins, and "A seems to have failed"; the membership views evolve accordingly: A; A,B,D; A,D; A,D,C; D,C]
57. Issues
- Group membership service (GMS) has just a small number of members
  - This core set will track membership for a large number of system processes
- Internally it runs a group membership protocol (GMP)
- Full system membership list is just replicated data managed by the GMS members, updated using multicast
58. GMP design
- What protocol should we use to track the membership of the GMS?
- Must avoid the split-brain problem
- Desire continuous availability
  - We'll see that a version of 3PC can be used
  - But can't always guarantee liveness
59. Reading ahead?
- Read chapters 12, 13
- Thought problem: how important is external consistency (called "dynamic uniformity" in the text)?
- Homework: Read about FLP. Identify other impossibility results for distributed systems. What is the simplest case of an impossibility result that you can identify?