CSE 598B: Self-* Systems - PowerPoint PPT Presentation


Provided by: psu135
Learn more at: https://www.cse.psu.edu


1
CSE 598B: Self-* Systems
  • Path Based Failure and Evolution Management
  • Mike Y. Chen, Anthony Accardi, Emre Kiciman, Jim
    Lloyd, Dave Patterson, Armando Fox, Eric Brewer
    (UC Berkeley, Stanford U, Tellme Networks, eBay
    Inc.)
  • Presented by Arjun R. Nath

2
The Problem..
  • Computing systems are increasing in complexity
  • They are tending towards large, complex,
    distributed systems
  • Sometimes thousands of machines are
    involved
  • Basic system management is becoming increasingly
    difficult.
  • Everything from detecting and diagnosing failures
    to understanding application behaviour is
    becoming very difficult.

3
..the Problem
  • Existing techniques such as code-level debuggers,
    program slicing, process profiling and
    application logs fail to characterize overall
    system behaviour.
  • Distributed debuggers are available but focus on
    a homogeneous subset of the system.

4
Goal of the paper
  • Techniques to help us understand large
    distributed systems.
  • Improve
  • availability
  • reliability
  • manageability
  • Why are we looking at this paper? (Self-*
    context)
  • This paper is about techniques for monitoring
    large, complex, distributed systems.

5
Two main principles
  • Path-Based Measurement
  • Model the system as a collection of paths through
    heterogeneous components.
  • Make local observations along the paths and store
    these. These can be accessed via queries and
    visualization techniques.
  • (Focus is on correctness rather than performance)
  • Statistical Behaviour Analysis
  • Large volumes of system requests are stored for
    statistical analysis using classical techniques
    to identify deviations from normal behaviour.
    This can be applied to live systems or used for
    offline analysis.

6
What is a "Path"?
  • Associated with a request
  • Control Flow
  • Resources
  • Paths may have inter-path dependencies: shared
    state, shared database tables, shared
    filesystems, shared memory.
  • Multiple paths may be grouped together in
    sessions.
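To make the definition concrete, here is a minimal Python sketch of a path record. The field names (request_id, component, timestamp_ms, resources, session_id) are illustrative assumptions, not the data model used by Pinpoint, ObsLogs or SuperCal. It shows control flow as timestamp-ordered components, inter-path dependencies as shared resources, and sessions as a grouping key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Observation:
    """One local observation recorded at a single component (illustrative fields)."""
    request_id: str        # identifier shared by every observation on the same path
    component: str         # hypothetical component name, e.g. "web", "app", "db"
    timestamp_ms: float    # when this component handled the request
    resources: List[str] = field(default_factory=list)  # e.g. database tables touched


@dataclass
class Path:
    """All observations for one request, optionally grouped into a session."""
    request_id: str
    observations: List[Observation] = field(default_factory=list)
    session_id: Optional[str] = None

    def control_flow(self) -> List[str]:
        """Components in the order the request visited them."""
        ordered = sorted(self.observations, key=lambda o: o.timestamp_ms)
        return [o.component for o in ordered]

    def resources(self) -> Set[str]:
        """Every resource used anywhere along the path."""
        return {r for o in self.observations for r in o.resources}


def shared_resources(a: Path, b: Path) -> Set[str]:
    """Inter-path dependency: resources (such as tables) that both paths touched."""
    return a.resources() & b.resources()


def group_by_session(paths: List[Path]) -> Dict[str, List[str]]:
    """Map each session ID to the request IDs of the paths it contains."""
    sessions: Dict[str, List[str]] = {}
    for p in paths:
        if p.session_id is not None:
            sessions.setdefault(p.session_id, []).append(p.request_id)
    return sessions
```

For example, two requests in session "s1" that both read a hypothetical "users" table share that table as an inter-path dependency.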

7
Coarse-grained paths
8
Fine-grained paths
9
How do paths help?
  • Failure Management
  • Evolution (of the system)

10
Failure Management...
  • Detection
  • Reduce downtime associated with detection delays
  • Paths can help reveal developing problems
    before they become severe
  • The key is to define "normal" behaviour
    statistically and then check for deviations
  • Diagnosis
  • Isolate problems using solely the recorded path
    observations and then drive the diagnosis process
    with the path information.
  • Paths help identify which components are involved
    in a given failure and aid in identifying causes.
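The detection and diagnosis steps above can be sketched as follows. This is an assumption-laden illustration: it treats "normal" as the mean and standard deviation of path latency over a known-good period and flags paths more than k standard deviations away (a plain z-score rule, not necessarily one of the tests the authors used), then ranks components by the fraction of paths through them that failed. The threshold k = 3 and all function names are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_baseline(latencies_ms: List[float]) -> Tuple[float, float]:
    """Describe 'normal' behaviour as the mean and sample standard deviation."""
    return statistics.mean(latencies_ms), statistics.stdev(latencies_ms)


def flag_deviations(latency_by_path: Dict[str, float],
                    baseline: Tuple[float, float], k: float = 3.0) -> List[str]:
    """IDs of paths whose latency lies more than k standard deviations from the mean."""
    mean, std = baseline
    return sorted(rid for rid, lat in latency_by_path.items() if abs(lat - mean) > k * std)


def rank_suspects(components_by_path: Dict[str, List[str]],
                  failed: Set[str]) -> List[Tuple[str, float]]:
    """For each component, the fraction of paths through it that failed, highest first."""
    seen: Dict[str, int] = defaultdict(int)
    bad: Dict[str, int] = defaultdict(int)
    for rid, comps in components_by_path.items():
        for comp in set(comps):          # count each component once per path
            seen[comp] += 1
            if rid in failed:
                bad[comp] += 1
    scores = [(comp, bad[comp] / seen[comp]) for comp in seen]
    # Sort by failure fraction (descending), then by name for a stable order.
    return sorted(scores, key=lambda t: (-t[1], t[0]))
```

A component that appears on every failed path but rarely on successful ones rises to the top of the ranking; the ranking narrows the search rather than proving a cause.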

11
...Failure Management
  • Impact Analysis
  • Helps gauge the scale of the problem, which in
    turn helps estimate time-to-repair
  • Identifies which other paths are at risk
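Impact analysis can reuse the same recorded paths: once a component is suspected, every path that traverses it is potentially at risk. A minimal sketch, assuming each path is stored as a list of component names (a hypothetical representation, with hypothetical function names):

```python
from typing import Dict, List


def paths_at_risk(components_by_path: Dict[str, List[str]], faulty: str) -> List[str]:
    """IDs of every recorded path that traverses the faulty component."""
    return sorted(rid for rid, comps in components_by_path.items() if faulty in comps)


def impact_fraction(components_by_path: Dict[str, List[str]], faulty: str) -> float:
    """Share of recorded traffic exposed to the fault (0.0 when nothing is recorded)."""
    if not components_by_path:
        return 0.0
    return len(paths_at_risk(components_by_path, faulty)) / len(components_by_path)
```

The fraction of affected traffic is one crude input to estimating scale; a fuller analysis might also weigh request types and sessions.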

12
Evolution (of the system)
  • It's very difficult to get an overall picture of
    how a complex distributed system changes over
    time
  • - Software/hardware upgrades, patches, code
    changes etc.
  • - Systems evolve through changes to their
    components and also through changes in how they
    interact
  • Paths help in revealing system structure and
    dependencies and tracking changes.
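One way paths can reveal structure is to derive a dependency graph from them and compare snapshots taken before and after a change. The sketch below assumes each path is a simple call chain, so consecutive components imply a caller-to-callee edge; fine-grained paths with fan-out would need a tree or graph per request instead. Function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def dependency_edges(components_by_path: Dict[str, List[str]]) -> Set[Edge]:
    """Collect caller->callee edges from consecutive components on each path."""
    edges: Set[Edge] = set()
    for comps in components_by_path.values():
        edges.update(zip(comps, comps[1:]))
    return edges


def diff_structure(before: Dict[str, List[str]],
                   after: Dict[str, List[str]]) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (edges that appeared, edges that disappeared) between two snapshots."""
    old, new = dependency_edges(before), dependency_edges(after)
    return new - old, old - new
```

Running it on paths recorded before and after an upgrade lists dependencies that appeared or disappeared, e.g. a new cache inserted between an application tier and a database.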

13
Implementation
14
Implementation Architecture
15
Implementation...
  • Tracers - tracking a request through the target
    system.
  • Each request has an associated identifier that is
    maintained throughout the path
  • Ids may be stored in extensible headers (HTTP,
    SOAP)
  • Tracers are platform specific but can be generic
    to applications using the same platform (J2EE,
    .NET)
  • Pinpoint, ObsLogs, SuperCal all have tracers.
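A tracer's core job, attaching one identifier to a request and carrying it across components, can be sketched as below. The header name X-Path-Request-ID, the Tracer class and the in-memory sink list are hypothetical stand-ins; real tracers hook into the platform (e.g. a J2EE or .NET request pipeline) and ship observations to an aggregator.

```python
import time
import uuid
from typing import Dict, List, Tuple

# Hypothetical header name, chosen for illustration only.
TRACE_HEADER = "X-Path-Request-ID"


class Tracer:
    """Records (request_id, component, timestamp) observations for one component."""

    def __init__(self, component: str, sink: List[Tuple[str, str, float]]):
        self.component = component
        self.sink = sink  # stand-in for sending observations to an aggregator

    def on_request(self, headers: Dict[str, str]) -> Dict[str, str]:
        """Reuse the incoming ID (or mint one at the entry point), record it,
        and return the headers to forward to the next component."""
        request_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
        self.sink.append((request_id, self.component, time.time()))
        outgoing = dict(headers)
        outgoing[TRACE_HEADER] = request_id
        return outgoing
```

The entry-point component mints the ID; every downstream component reuses whatever arrives in the header, which is what lets the aggregator join observations later.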

16
Implementation tools..
Three systems that support path-based analysis
17
...Implementation
  • Aggregator and Repository
  • Aggregator receives observations from tracers
  • reconstructs paths using IDs
  • Stores this in the Repository
  • There may be also a Central Repository that
    collects from distributed repositories.
  • Analysis Engines and Visualization.
  • Single and multi-path analysis
  • Dedicated engines for various statistical tests
  • Support for some data mining tools
  • Visualization: Tukey's boxplots generated using
    Octave
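Path reconstruction in the aggregator amounts to grouping observations by request ID and ordering each group. A minimal sketch, assuming observations arrive as (request_id, component, timestamp) tuples (an illustrative format) and that timestamps from different machines are comparable, which in practice requires reasonably synchronized clocks. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

Obs = Tuple[str, str, float]  # (request_id, component, timestamp); illustrative format


def reconstruct_paths(observations: Iterable[Obs]) -> Dict[str, List[str]]:
    """Group observations by request ID and order each group by timestamp."""
    grouped: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for rid, comp, ts in observations:
        grouped[rid].append((ts, comp))
    return {rid: [comp for _, comp in sorted(items)] for rid, items in grouped.items()}


def central_repository(*local_batches: Iterable[Obs]) -> Dict[str, List[str]]:
    """Merge raw observations from several local repositories, then reconstruct."""
    return reconstruct_paths(chain(*local_batches))
```

Merging raw observations before reconstruction, rather than merging finished paths, lets the central repository stitch together requests whose observations were collected at different sites.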

18
Implementation
A trend specific to recognition time in Tellme
application A suggests a regression in a speech
grammar in that application. The Tukey boxplots
shown illustrate a distribution's center, spread,
and asymmetries by using rectangles to show the
upper and lower quartiles and the median, and
explicitly plotting each outlier.
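The boxplot statistics described in the caption can be computed as below. This sketch uses Python's statistics.quantiles with the "inclusive" method and Tukey's conventional fences at 1.5 times the interquartile range; Octave or other tools may use a different quartile convention, so exact box edges can differ slightly. The function name is hypothetical.

```python
import statistics
from typing import Dict, List


def tukey_summary(samples: List[float]) -> Dict[str, object]:
    """Quartiles, median, whisker ends and outliers for a Tukey boxplot."""
    q1, median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in samples if lo_fence <= x <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # most extreme value still inside the fences
        "whisker_high": max(inside),
        "outliers": sorted(x for x in samples if x < lo_fence or x > hi_fence),
    }
```

Values beyond the fences are the outliers that the figure plots individually; the whiskers stop at the most extreme values still inside the fences.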
19
Limitations and constraints
  • Cannot resolve fault causes at a very detailed
    level
  • Overheads can be high for fine grained paths
  • Need to decide which observations to include in
    paths. This is an iterative process.
  • Can be difficult to implement especially for
    existing systems

20
  • It's important to understand that path-based
    analysis is an aid to fault detection and
    recovery, not a solution in itself. It is
    meant to be used in combination with traditional
    fault handling techniques.

21
Conclusion
  • As systems get more complex, Path-based analysis
    tools will have increasing importance.
  • Path-based fault analysis complements traditional
    techniques
  • There are hardly any fully functional, path-based
    fault management tools available.
  • This paper
  • Has breadth but lacks depth in some places.
  • Needs some more data around production
    environment experiments
  • Should have concentrated on 1 or 2
    implementations and included more details.
  • Not much info on SuperCal and ObsLogs

22
Other related stuff
  • Pinpoint project at Stanford:
    http://swig.stanford.edu/pinpoint.shtml (some
    interesting papers here)
  • Magpie project (Microsoft)
  • Quest Software JProbe Java performance
    profiler
  • Borland's OptimizeIt Enterprise Suite

23
  • That's all, folks!
  • Thank You