Title: Last week review
1 Last week review
- Reliability of networks
- Watchdog Techniques
- Redundancy in time (Re-execution)
- RESO
- Processes, threads
- Superscalar, CMP, SMT
- Research on FT microarchitectures
- AR-SMT, DIVA
- Other error detection mechanisms in HW
- BIST
2 Today
- Replication in Distributed Systems
- Case studies
- Conclusions of the course
3 Summary of FTC techniques
We've covered most techniques.
4 Data Replication
- Data replication is a form of information redundancy.
- Why is it a good idea to replicate data in a DS?
- Improving performance
- Replicas allow data to reside close to where it is used
- Fault tolerance to enhance reliability
- Replicas allow remote sites to continue working in the event of local failures. It is also possible to protect against data corruption.
- Directly supports the DS goal of enhanced scalability
5 More on Replication
- If there are many replicas of the same thing, how do we keep all of them up-to-date? How do we keep the replicas consistent?
- It is not easy to keep all those replicas consistent.
- Consistency can be achieved in a number of ways.
- Using consistency models
- Protocols for implementing the models
6 Data-Centric Consistency Models
- A data-store can be read from or written to by any process in a distributed system.
- A local copy of the data-store (replica) can support fast reads.
- A write to a local replica needs to be propagated to all remote replicas.
- Consistency models
- Strict
- Sequential
- Causal
- FIFO
- Weak
- Release
7 Data-Centric Consistency Models
- For performance reasons it is better to relax consistency.
- Updating takes broadcast time. If a value that has been changed is not being read by other processes very frequently, we can delay its update at the cost of not having consistent data for some time.
8 What is a Consistency Model?
- A consistency model is a contract between a DS data-store and its processes.
- If the processes agree to the rules, the data-store will perform properly and as advertised.
- Strict Consistency is defined as:
- Any read on a data item x returns a value corresponding to the result of the most recent write on x (regardless of where the write occurred).
9 Consistency Model Diagram Notation
- Wi(x)a: a write by process i to item x with a value of a. That is, x <- a.
- (Note: the process is often shown as Pi.)
- Ri(x)b: a read by process i from item x producing the value b. That is, reading x returns b.
- Time moves from left to right in all diagrams.
10 Strict Consistency Diagrams
- Behavior of two processes, operating on the same data item.
- A strictly consistent data-store.
- With Strict Consistency, all writes are instantaneously visible to all processes and absolute global time order is maintained throughout the DS. This is impossible within a DS!
- A data-store that is not strictly consistent.
11 Sequential Consistency
- A weaker consistency model, which represents a relaxation of the rules.
- It is also easier (possible) to implement.
- Definition of Sequential Consistency
- The result of any execution is the same as if the
(read and write) operations by all processes on
the data-store were executed in the same
sequential order and the operations of each
individual process appear in this sequence in the
order specified by its program.
12 Sequential Consistency Diagrams
All processes see the same interleaving of operations, regardless of what that interleaving is.
- A sequentially consistent data-store: the first write occurred after the second on all replicas.
- A data-store that is not sequentially consistent: it appears the writes have occurred in a non-sequential order, and this is NOT allowed.
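The two diagram cases above can be checked mechanically. The following is a brute-force sketch (function and variable names are mine, and real systems do not verify SC this way since it is expensive in general): a history is sequentially consistent if some interleaving that preserves each process's program order makes every read return the most recent write.

```python
from itertools import permutations

# Each process's program: a list of ("W", x, v) or ("R", x, v) in program order.

def legal(history):
    """A single sequential order is legal if every read returns the
    value of the most recent preceding write to the same item."""
    mem = {}
    for op, x, v in history:
        if op == "W":
            mem[x] = v
        elif mem.get(x) != v:
            return False
    return True

def respects_program_order(interleaving):
    """Operations of each process must appear in their original order."""
    next_index = {}
    for proc, idx, _ in interleaving:
        if idx != next_index.get(proc, 0):
            return False
        next_index[proc] = idx + 1
    return True

def sequentially_consistent(programs):
    tagged = [(p, i, op) for p, prog in enumerate(programs)
                         for i, op in enumerate(prog)]
    return any(respects_program_order(perm) and legal([op for _, _, op in perm])
               for perm in permutations(tagged))

W = lambda x, v: ("W", x, v)
R = lambda x, v: ("R", x, v)
# P1 writes a, P2 writes b; P3 and P4 both read b then a: consistent,
# because the single order W(x)b, W(x)a explains all reads.
ok  = [[W("x", "a")], [W("x", "b")],
       [R("x", "b"), R("x", "a")], [R("x", "b"), R("x", "a")]]
# P3 reads b then a while P4 reads a then b: no single write order works.
bad = [[W("x", "a")], [W("x", "b")],
       [R("x", "b"), R("x", "a")], [R("x", "a"), R("x", "b")]]
```

The `ok` history matches the slide's consistent diagram and `bad` the inconsistent one.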
13 Problem with Sequential Consistency
- Sequential consistency is programmer friendly.
- Lipton and Sandberg (1989) proved that changing a sequential protocol to improve read performance makes write performance worse, and vice versa.
- For this reason, other weaker consistency models have been proposed and developed.
- A relaxation of the rules allows these weaker models to make sense.
- Causal
- FIFO
- Weak
- Release
14 Distribution Models
- Regardless of which consistency model we choose, we need to decide:
- Where is a replica placed?
- When is a replica created?
- Who creates the replica?
15 Replica Placement Types
- There are three types of replicas:
- Permanent replicas: tend to be small in number, organized as COWs (Clusters of Workstations) or mirrored systems.
- Server-initiated replicas: used to enhance performance, at the initiative of the owner of the data-store. Typically used by web hosting companies to geographically locate replicas close to where they are needed most. (Often referred to as push caches.)
- Client-initiated replicas: created as a result of client requests (e.g. browser caches). Works well assuming that cached data won't be replaced too soon.
16 Break
17 Permanent Replicas
- Initial set of replicas.
- Other replicas can be created from them.
- Small and static set.
- Example: Web site horizontal distribution.
- Replicate a Web site on a limited number of machines on a LAN.
- Distribute requests in round-robin.
- Replicate a Web site on a limited number of machines on a WAN (mirroring).
- Clients choose which sites to talk to.
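The LAN case's round-robin dispatch can be sketched in a few lines (server names are made up for the example):

```python
from itertools import cycle

# Replicated Web servers on a LAN; a front end rotates through them.
replicas = ["web-1", "web-2", "web-3"]
rotation = cycle(replicas)

def pick_replica():
    # Hand each incoming request to the next replica in circular order.
    return next(rotation)
```

Each replica receives every third request, spreading load evenly without any coordination between servers.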
(Figure: clients accessing replicated servers on a LAN, and mirrored servers across a WAN)
18 Server-Initiated Replicas
- Dynamically created at the request of the owner of the DS.
- Example: push caches. Web servers dynamically create replicas near demanding clients.
- Need a dynamic policy to create and delete replicas.
- Algorithm for dynamic replication:
- Replication takes place to reduce server load.
- Keep a counter and access-origin list for each page.
19 Server-Initiated Replicas
- Each server determines which server is closest to a client.
- Hits from clients with the same closest server go to the same counter.
- cntQ(P,F): count of hits to replica F hosted at server Q, initiated by P's clients.
- If cntQ(P,F) drops below threshold D, delete replica F from Q.
- If cntQ(P,F) exceeds threshold R, replicate or migrate F to P.
- Choose D and R with care.
- Do not allow deletion of permanent replicas.
- Server-initiated replication is increasingly popular.
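The counting policy above can be sketched as follows. The class and the threshold values D = 5 and R = 20 are my assumptions for illustration; the slides only require that D be below R.

```python
from collections import defaultdict

D, R = 5, 20  # deletion / replication thresholds (assumed values, D < R)

class ReplicaHost:
    """Server Q hosting replica files, counting hits per closest-server P."""
    def __init__(self, permanent=()):
        self.permanent = set(permanent)  # permanent replicas: never deleted
        self.cnt = defaultdict(int)      # cnt[(P, F)]: hits at Q for F from P's clients

    def record_hit(self, closest_server, file):
        self.cnt[(closest_server, file)] += 1

    def decide(self, closest_server, file):
        hits = self.cnt[(closest_server, file)]
        if hits > R:
            return "replicate-or-migrate"  # heavy demand near P: push F toward P
        if hits < D and file not in self.permanent:
            return "delete"                # too few hits: drop replica F from Q
        return "keep"
```

A file with many hits from P's region gets pushed toward P, a cold non-permanent file gets dropped, and everything in between stays put.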
20 Client-Initiated Replicas
- These are caches managed by clients.
- Temporary storage (entries expire fast).
- Cache access:
- Hit: return data from cache (fast).
- Miss: load copy from the original server (slow).
- Kept on the client machine, or on the same LAN
- Multi-level caches possible
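The hit/miss behavior with fast expiry can be sketched as a minimal client-side cache (the class and its parameters are illustrative; `fetch_from_origin` stands in for the slow request to the original server):

```python
import time

class ClientCache:
    """Temporary client-side storage: entries expire after a TTL."""
    def __init__(self, ttl_seconds, fetch_from_origin):
        self.ttl = ttl_seconds
        self.fetch = fetch_from_origin
        self.store = {}                        # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                    # hit: fast local copy
        value = self.fetch(key)                # miss or expired: slow path to origin
        self.store[key] = (value, time.time() + self.ttl)
        return value
```

Chaining one of these per machine and another per LAN gives the multi-level caching the slide mentions.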
21 FTC Case Studies
- Recovery-Oriented Computing (research)
- HP NonStop Server (commercial implementation)
22 Recovery-Oriented Computing (ROC)
- D. Patterson et al., U.C. Berkeley and Stanford.
- Human error is the largest single failure source.
- HP HA labs: human error is the #1 cause of failures (2001).
- Gray/Tandem: 42% of failures from human administrator errors (1986).
- Much of FT helps MTTF; ROC helps MTTR.
- Improving MTTF and MTTR is synergistic (we don't want bad MTTF!).
Rcomputer_sys = Rhw × Rsw × Rop
23 Recovery-Oriented Computing Philosophy
- People/HW/SW failures are facts, not problems.
- Recovery/repair is how we cope with them.
- Improving recovery/repair improves availability.
- Availability = MTTF / (MTTF + MTTR)
- lim(availability) as MTTR -> 0 is 100%
- 1/10th the MTTR is just as valuable as 10X the MTBF.
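Plugging numbers into the availability formula shows the MTTR/MTBF trade-off directly (the figures below are illustrative, not from the slides):

```python
# Availability = MTTF / (MTTF + MTTR); the last two lines show that
# cutting MTTR by 10x buys exactly as much as growing MTTF by 10x.
def availability(mttf, mttr):
    return mttf / (mttf + mttr)

baseline   = availability(1000.0, 10.0)   # roughly 99.0% available
better_ttf = availability(10000.0, 10.0)  # 10x MTTF
better_ttr = availability(1000.0, 1.0)    # 1/10th MTTR, same availability
```

As MTTR approaches zero, availability approaches 100% regardless of MTTF, which is the ROC argument for fast recovery.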
A microreboot is the selective crash-restart of
only those parts of a system that trigger the
observed failure.
24 Recovery-Oriented Computing
- Experiments using FIG, a fault injection tool for libc (Unix), with some mature applications gave the results shown in the figure.
- MySQL and Apache were the more robust apps.
- Netscape mostly exited without warning.
25 Fault Detection and Localization in ROC
- Microreboots (software rejuvenation).
- A stall proxy is used to hold requests while a microreboot takes place.
- An application of microreboots to a web server.
26 Fault Detection and Localization in ROC
- Microreboots (software rejuvenation).
- Effectiveness of microreboots.
27 Fault Detection and Localization in ROC
- Statistical learning is used to detect and localize likely application-level failures (Internet apps; initial research phase).
- When the system is working:
- A statistical learning tool (called Pinpoint) learns a baseline model of system behavior.
- During system operation:
- Pinpoint looks for anomalies (relative to this model) in low-level behaviors that are likely to reflect high-level application faults, and it correlates these anomalies to their potential causes (software components) within the system.
- While Pinpoint does exhibit occasional false positives, the integration of Pinpoint and microreboots offers higher availability than recovery based on full process restart.
28 Case Study: HP NonStop Server
- Tandem/HP
- Fault-tolerant multiprocessor (SMP) server system.
- Applications in banking systems, online transactions, stock markets, etc.
29 Case Study: HP NonStop Server
- Several possibilities:
- Processors execute the same instructions in lockstep, checking for errors.
- In the event of failure, a backup processor takes over the workload.
- Software fault tolerance
- Hardware fault tolerance (DMR, TMR)
- Scalability from 2 to 4,080 processors
- Scalability to 65 TB of main memory
- Online database and application manageability
- Data integrity
30 Case Study: HP NonStop Server
Each logical processor (what the operating system sees as a single processor) is composed of one microprocessor from each of two (or three) processor slices.
One logical processor runs the same instruction stream on two (or three) processors. Results are checked by the Logical Synchronization Unit (LSU).
31 Case Study: HP NonStop Server
TMR
Duplex fault check
LSUs can be replicated for fault tolerance
32 Case Study: HP NonStop Server
- Voting
- LSUs compare the (I/O) output streams.
- Supports simplex, duplex, and triplex models.
- Reintegration (memory copy)
- When a slice has diverged or has been replaced (repaired), copy memory from one slice to another and resume.
- This is an online operation.
- Rendezvous (synchronization)
- A distributed consensus algorithm with hardware assist.
- Periodic checking and resynchronization (can optionally remove time skew).
33 Case Study: HP NonStop Servers
34 Conclusions of the course
- FTC techniques are fundamental to understanding fault tolerance mechanisms in other areas.
- We have reviewed the main methods and techniques used in fault-tolerant computing:
- Hardware
- Redundant computer systems
- Networks
- Software
- Reliability models
- Redundancy in software
- Information redundancy
- Re-execution
- Distributed Systems
35 Conclusions
- We have covered the basics of FTC.
- Specialized techniques can be found in journals and conferences.
- FTC techniques are being implemented in more and more systems.
- Reliability as a measure of FTC:
- Reliability modeling for HW systems has a long tradition.
- Reliability for SW systems is relatively new but is becoming more important, not only for safety-critical systems.
- Reliability can be measured and is considered a factor of quality in software.
- There is a lot of research that still needs to be done in this area.
36 Conclusions
Questions or comments? Thanks!