Fail-stutter Behavior Characterization of NFS

Learn more at: https://pages.cs.wisc.edu

1
Fail-stutter Behavior Characterization of NFS

Jichuan Chang
CS736 Final Project, UW-Madison
December 13, 2002

2
Motivation

We want systems to be very fast and available!
Hard to achieve for modern computer systems:
complex interactions among components
can't assume everything is always working perfectly!
We need a better fault model
Simpler than the Byzantine model
Richer than the fail-stop model
Fail-stutter fault tolerance [Remzi 01]

[Figure: component states: Stable Performance, Low Performance, Down]

3
Fail-stutter Issues

Separate performance faults from correctness faults
What are performance faults?
Need a performance specification, but how do we obtain the spec.?
How do we distinguish interference from a performance fault?
What are correctness faults?
Correctness should be defined in an end-to-end manner.
How to diagnose both types of faults?
Must observe how systems behave!
Exploit fail-stutter behavior
Who should be notified about failures, when, and how?
System support: programming tools / runtime support
Integration with existing systems: less intrusion
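
One way to picture the split between the two fault types is a check of each end-to-end observation against a performance specification. The sketch below is illustrative only: the `Observation` fields, the `classify` function, and the single response-time threshold `spec_ms` are assumptions made for this example, not something defined in the slides.

```python
from dataclasses import dataclass
from enum import Enum


class FaultClass(Enum):
    NONE = "stable performance"
    PERFORMANCE = "performance fault"
    CORRECTNESS = "correctness fault"


@dataclass
class Observation:
    response_ms: float  # measured end-to-end response time
    completed: bool     # did the operation return a correct result?


def classify(obs: Observation, spec_ms: float) -> FaultClass:
    """Classify one observation against a hypothetical response-time spec."""
    if not obs.completed:
        # The operation failed outright: an end-to-end correctness fault.
        return FaultClass.CORRECTNESS
    if obs.response_ms > spec_ms:
        # Correct result, but slower than the spec allows.
        return FaultClass.PERFORMANCE
    return FaultClass.NONE
```

A single threshold cannot tell interference from a real performance fault; that question stays open, as the slide notes.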

4
Our Approach

Case study: NFS fail-stutter characterization
Fault injection (vs. system monitoring)
Performance measurement
Simple, software-based test-bed
Interesting observations
Different failed parts have different performance impact
Different types of clients behave differently:
patient (keep retrying) vs. impatient (try other servers)
The transition between performance and correctness faults
can be determined proactively by fault injection
The performance spec. could be application-specific.
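
The patient/impatient distinction can be sketched as two retry policies. This is a minimal illustration, not the NFS client code used in the project: `send`, `patient_read`, `impatient_read`, and the `max_tries` limit are hypothetical names chosen for the example, and a timeout is modeled as `send` returning `None`.

```python
from typing import Callable, Optional, Sequence

# Hypothetical transport: returns the reply, or None if the request timed out.
Request = Callable[[str], Optional[bytes]]


def patient_read(send: Request, server: str, max_tries: int = 5) -> Optional[bytes]:
    """Keep retrying the same server until it answers or tries run out."""
    for _ in range(max_tries):
        reply = send(server)
        if reply is not None:
            return reply
    return None


def impatient_read(send: Request, servers: Sequence[str]) -> Optional[bytes]:
    """Try each server once, moving on as soon as one times out."""
    for server in servers:
        reply = send(server)
        if reply is not None:
            return reply
    return None
```

Under this model a patient client turns packet loss into longer response times, while an impatient client turns it into a switch to another server or an early error.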

5
Experimental Settings

[Figure: test-bed diagram with NFS client app, Click S/W router, NFS server, and storage system; three points are marked X]

Workloads - SpecSFS97, file (micro-benchmark).
Data to collect - throughput, response time, errors.
Faulty components - network, server, disk, bus, etc.
Fault injection - network packet dropping:
drop k Ethernet packets,
drop k IP packets coming from the server.
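
Probabilistic packet dropping can be illustrated with a small simulation. This is not the Click router configuration used in the experiments; `drop_packets`, `drop_prob`, and the seeded `rng` are assumptions made for the sketch, and packets are stand-in Python objects rather than real Ethernet or IP frames.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def drop_packets(packets: Iterable[T], drop_prob: float,
                 rng: random.Random) -> Iterator[T]:
    """Yield each packet unless it is randomly selected to be dropped."""
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    for pkt in packets:
        # rng.random() is uniform on [0, 1), so a packet survives
        # with probability 1 - drop_prob.
        if rng.random() >= drop_prob:
            yield pkt
```

Passing an explicit, seeded `random.Random` keeps an injection run repeatable, which helps when comparing workloads under the same fault pattern.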

6
Results (1) - Patient Client
1. Performance degradation scales with drop probability.
2. Dropping Ethernet packets is less harmful than dropping IP packets.
3. Performance data is less meaningful once errors occur.
4. Different operations switch to correctness faults at different points (e.g. 5, 15, 20).
Total execution time can hide such differences.
[Figure: result plots; X marks points where an error occurred]

7
Results (2) - Impatient Client
1. Throughput decreases linearly as the dropping probability increases.
2. Throughput drops manifest under heavy loads.
SpecSFS97: retry once!
3. Response time doesn't change as much!
4. Dropping Ethernet packets is less harmful.

8
Summary

Modern computer system design needs a better fault-tolerance model.
We use fault injection to characterize NFS fail-stutter behavior.
Preliminary observations address some of the fail-stutter issues:
How to separate different types of faults?
They suggest that we can extract a performance specification by fault injection and probing.
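
Locating the point where a performance fault becomes a correctness fault could be done by sweeping the injected fault level. The sketch below assumes a hypothetical `run_trial` callback that runs one workload at a given level and returns `True` if it completed without errors; neither that callback nor `first_failing_level` comes from the slides.

```python
from typing import Callable, Optional, Sequence


def first_failing_level(levels: Sequence[float],
                        run_trial: Callable[[float], bool]) -> Optional[float]:
    """Return the lowest injected fault level at which a trial reports an error."""
    for level in sorted(levels):
        if not run_trial(level):
            return level
    # Every tested level completed without errors.
    return None
```

Running the sweep once per operation type would expose the per-operation transition points that a single total execution time can hide.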

9
Future Work