CSE 598B: Self-* Systems

Provided by: psu135

Learn more at: https://www.cse.psu.edu

1
CSE 598B: Self-* Systems

Path Based Failure and Evolution Management
Mike Y. Chen, Anthony Accardi, Emre Kiciman, Jim
Lloyd, Dave Patterson, Armando Fox, Eric Brewer
(UC Berkeley, Stanford U, Tellme Networks, eBay
Inc.)
Presented by Arjun R. Nath

2
The Problem..

Computing systems are increasing in complexity
The trend is towards large, complex, distributed systems
Sometimes thousands of machines are involved
Basic system management is becoming increasingly difficult.
Everything from detecting and diagnosing failures to understanding application behaviour is becoming very difficult.

3
..the Problem

Existing techniques such as code-level debuggers, program slicing, process profiling and application logs fail to characterize overall system behaviour.
Distributed debuggers are available but focus on a homogeneous subset of the system.

4
Goal of the paper

Techniques to help us understand large distributed systems.
Improve availability, reliability and manageability.
Why are we looking at this paper? (Self-* context)
This paper is about techniques for monitoring large, complex, distributed systems.

5
Two main principles

Path-Based Measurement
Model the system as a collection of paths through heterogeneous components.
Make local observations along the paths and store them. The stored observations can be accessed via queries and visualization techniques.
(Focus is on correctness rather than performance)
Statistical Behaviour Analysis
Large volumes of system requests are stored for statistical analysis, using classical techniques to identify deviations from normal behaviour.
This can be applied to live systems or used for offline analysis.
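
The statistical-analysis principle can be illustrated with a small sketch. It is not the paper's method: it assumes each request is summarized by a single latency value and swaps in a plain z-score test as one example of a "classical technique". The function name, the threshold and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev

def find_deviations(baseline, live, threshold=3.0):
    """Return live values more than `threshold` standard deviations
    away from the baseline mean.

    `baseline` holds measurements from known-good operation and `live`
    holds new measurements to check (both are illustrative inputs).
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation, needs >= 2 points
    if sigma == 0:
        # Degenerate baseline: anything different from the mean deviates.
        return [x for x in live if x != mu]
    return [x for x in live if abs(x - mu) / sigma > threshold]

# Made-up latencies in milliseconds, for illustration only.
baseline_latencies_ms = [100, 102, 98, 101, 99, 103, 97, 100]
live_latencies_ms = [101, 99, 250, 100]
anomalies = find_deviations(baseline_latencies_ms, live_latencies_ms)
print(anomalies)
```

The same check works offline on stored paths or online on a sliding window, which matches the "live systems or offline analysis" point above.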

6
What is a "Path"?

Associated with a request
Control flow
Resources
Paths may have inter-path dependencies: shared state, shared database tables, shared filesystems, shared memory.
Multiple paths may be grouped together in sessions.
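
One way to picture a path as a data structure is sketched below. This is an assumption-laden illustration, not the representation used by the tools in the paper: the class names, field names and the choice of latency as the "resource" are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Observation:
    """One local measurement taken at a component along a path."""
    component: str      # illustrative name, e.g. "web" or "db"
    timestamp: float    # seconds on some common clock (assumed)
    latency_ms: float   # stand-in for resource usage
    failed: bool = False

@dataclass
class Path:
    """All observations recorded for a single request."""
    request_id: str
    session_id: str
    observations: List[Observation] = field(default_factory=list)

    def control_flow(self) -> List[str]:
        """Components in the order the request visited them."""
        ordered = sorted(self.observations, key=lambda o: o.timestamp)
        return [o.component for o in ordered]

    def resources(self) -> Dict[str, float]:
        """Total latency charged to each component."""
        usage: Dict[str, float] = defaultdict(float)
        for o in self.observations:
            usage[o.component] += o.latency_ms
        return dict(usage)

def group_into_sessions(paths: List[Path]) -> Dict[str, List[Path]]:
    """Group paths that share a session identifier."""
    sessions: Dict[str, List[Path]] = defaultdict(list)
    for p in paths:
        sessions[p.session_id].append(p)
    return dict(sessions)

p1 = Path("req-1", "sess-A",
          [Observation("web", 0.0, 5.0), Observation("db", 0.1, 20.0)])
p2 = Path("req-2", "sess-A", [Observation("web", 1.0, 4.0)])
sessions = group_into_sessions([p1, p2])
print(p1.control_flow(), p1.resources(), list(sessions))
```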

7
Coarse grained paths
8
Fine grained paths
9
How do paths help?

Failure Management
Evolution (of the system)

10
Failure Management...

Detection
Reduce downtime associated with detection delays
Using paths can help in noticing developing problems before they become severe
The key is to define "normal" behaviour statistically and then check for deviations
Diagnosis
Isolate problems using solely the recorded path observations, then drive the diagnosis process with the path information.
Paths help identify which components are involved in a given failure and aid in identifying causes.
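
The diagnosis idea (which components are involved in failed paths) can be sketched with a simple frequency score. This is a stand-in technique chosen for brevity, not the analysis used by the tools in the paper; the function name and the sample paths are hypothetical.

```python
from collections import Counter

def rank_suspects(paths):
    """Rank components by the fraction of their paths that failed.

    `paths` is an iterable of (components, failed) pairs, where
    `components` lists the components a request touched.
    """
    seen = Counter()
    failed_seen = Counter()
    for components, failed in paths:
        for c in set(components):  # count each component once per path
            seen[c] += 1
            if failed:
                failed_seen[c] += 1
    scores = {c: failed_seen[c] / seen[c] for c in seen}
    # Highest failure fraction first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Made-up recorded paths, for illustration only.
recorded = [
    (["web", "auth", "db"], False),
    (["web", "cart", "db"], True),
    (["web", "cart"], True),
    (["web", "auth"], False),
]
suspects = rank_suspects(recorded)
print(suspects)
```

A component that appears in every failed path and in no successful one floats to the top, which gives an operator a starting point rather than a root cause.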

11
...Failure Management

Impact Analysis
Helps in knowing the scale of the problem -> estimate time-to-repair
Identifies which other paths are at risk.
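
A minimal sketch of impact analysis, assuming stored paths are available as a mapping from request ID to the list of components touched. The function names and sample data are hypothetical.

```python
def paths_at_risk(paths, faulty_component):
    """Return the IDs of paths that pass through the faulty component."""
    return sorted(rid for rid, comps in paths.items()
                  if faulty_component in comps)

def impact_fraction(paths, faulty_component):
    """Fraction of all recorded paths that touch the faulty component."""
    if not paths:
        return 0.0
    return len(paths_at_risk(paths, faulty_component)) / len(paths)

# Made-up stored paths, for illustration only.
stored_paths = {
    "r1": ["web", "auth", "db"],
    "r2": ["web", "cart", "db"],
    "r3": ["web", "search"],
    "r4": ["web", "cart"],
}
at_risk = paths_at_risk(stored_paths, "db")
fraction = impact_fraction(stored_paths, "db")
print(at_risk, fraction)
```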

12
Evolution (of the system)

It's very difficult to get an overall picture of how a complex distributed system changes with time
- Software/hardware upgrades, patches, code changes etc.
- Systems evolve through changes to their components and also through changes in how they interact
Paths help in revealing system structure and dependencies and in tracking changes.
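
Tracking structural change can be sketched by comparing the component-to-component edges seen in paths before and after a change. This assumes each path is an ordered list of component names; the function names and sample paths are hypothetical.

```python
def dependency_edges(paths):
    """Set of (caller, callee) pairs seen in consecutive hops of any path."""
    edges = set()
    for components in paths:
        edges.update(zip(components, components[1:]))
    return edges

def structure_diff(old_paths, new_paths):
    """Report which dependency edges appeared or disappeared."""
    old, new = dependency_edges(old_paths), dependency_edges(new_paths)
    return {"added": sorted(new - old), "removed": sorted(old - new)}

# Made-up paths recorded before and after an upgrade.
before_upgrade = [["web", "auth", "db"], ["web", "cart", "db"]]
after_upgrade = [["web", "auth", "cache"], ["web", "cart", "db"]]
diff = structure_diff(before_upgrade, after_upgrade)
print(diff)
```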

13
Implementation
14
Implementation Architecture
15
Implementation...

Tracers - track a request through the target system.
Each request has an associated identifier that is maintained throughout the path
IDs may be stored in extensible headers (HTTP, SOAP)
Tracers are platform-specific but can be generic to applications using the same platform (J2EE, .NET)
Pinpoint, ObsLogs and SuperCal all have tracers.
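
The ID-propagation idea can be sketched as below. This is not how Pinpoint, ObsLogs or SuperCal implement tracers: the header name `X-Request-Id`, the function names and the in-memory log are assumptions made for illustration.

```python
import time
import uuid

REQUEST_ID_HEADER = "X-Request-Id"  # hypothetical header name

def ensure_request_id(headers):
    """Return the request ID, creating one at the entry point if missing."""
    if REQUEST_ID_HEADER not in headers:
        headers[REQUEST_ID_HEADER] = uuid.uuid4().hex
    return headers[REQUEST_ID_HEADER]

def trace(headers, component, log):
    """Record one observation tagged with the request's ID."""
    request_id = ensure_request_id(headers)
    log.append({
        "request_id": request_id,
        "component": component,
        "timestamp": time.time(),
    })
    return request_id

observations = []
incoming = {}  # headers of a request arriving at the front end
first = trace(incoming, "web", observations)

# The front end forwards the same ID when it calls a downstream component.
outgoing = {REQUEST_ID_HEADER: incoming[REQUEST_ID_HEADER]}
second = trace(outgoing, "db", observations)
print(first == second, len(observations))
```

Because the ID travels in a header, the application code between the two `trace` calls does not need to know about it, which is why a tracer can be generic to a platform.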

16
Implementation tools..
Three systems that support path-based analysis
17
...Implementation

Aggregator and Repository
Aggregator receives observations from tracers and reconstructs paths using IDs
Stores the reconstructed paths in the Repository
There may also be a Central Repository that collects from distributed repositories.
Analysis Engines and Visualization
Single and multi-path analysis
Dedicated engines for various statistical tests
Support for some data mining tools
Visualization: Tukey's boxplots generated using Octave
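
Path reconstruction in the aggregator can be sketched as a group-by on the request ID followed by a sort on time. This assumes observations arrive as dictionaries with `request_id`, `component` and `timestamp` keys and that timestamps are comparable across machines; both are simplifying assumptions, and the names are hypothetical.

```python
from collections import defaultdict

def reconstruct_paths(observations):
    """Group observations by request ID and order each group by timestamp."""
    by_request = defaultdict(list)
    for obs in observations:
        by_request[obs["request_id"]].append(obs)
    return {
        rid: [o["component"]
              for o in sorted(group, key=lambda o: o["timestamp"])]
        for rid, group in by_request.items()
    }

# Made-up observations arriving out of order from different tracers.
arrived = [
    {"request_id": "r1", "component": "db", "timestamp": 2.0},
    {"request_id": "r2", "component": "web", "timestamp": 1.5},
    {"request_id": "r1", "component": "web", "timestamp": 1.0},
]
repository = reconstruct_paths(arrived)
print(repository)
```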

18
Implementation
A trend specific to recognition time in Tellme application A suggests a regression in a speech grammar in that application. The Tukey boxplots shown illustrate a distribution's center, spread, and asymmetries by using rectangles to show the upper and lower quartiles and the median, and by explicitly plotting each outlier.
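
The numbers behind a Tukey boxplot can be computed as sketched below. The slides mention Octave; this sketch uses Python instead and assumes the common 1.5 x IQR rule for outliers. The function name and the recognition times are made up for illustration and are not data from the paper.

```python
from statistics import quantiles

def tukey_summary(values, whisker=1.5):
    """Quartiles, median and outliers as drawn in a Tukey boxplot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: height of the box
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical recognition times in milliseconds.
recognition_times_ms = [210, 220, 225, 230, 235, 240, 250, 900]
summary = tukey_summary(recognition_times_ms)
print(summary)
```

Comparing such summaries across releases is one way to spot the kind of regression described in the caption above.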
19
Limitations and constraints

Cannot resolve fault causes at a very detailed
level
Overheads can be high for fine grained paths
Need to decide which observations to include in
paths. This is an iterative process.
Can be difficult to implement especially for
existing systems

It's important to understand that path-based analysis is an aid to fault detection and recovery and not a solution in itself. It is meant to be used in combination with traditional fault handling techniques.

21
Conclusion

As systems get more complex, path-based analysis tools will have increasing importance.
Path-based fault analysis complements traditional techniques
Hardly any fully functional, path-based fault management tools are available.
This paper
Has breadth but lacks depth in some places.
Needs more data from production environment experiments
Should have concentrated on 1 or 2 implementations and included more details.
Not much info on SuperCal and ObsLogs

22
Other related stuff

Pinpoint project at Stanford: http://swig.stanford.edu/pinpoint.shtml (some interesting papers here)
Magpie project (Microsoft)
Quest Software JProbe: Java performance profiler
Borland's OptimizeIt Enterprise Suite