CompSci 296.2 Self-Managing Systems - PowerPoint PPT Presentation

Slides: 26
Provided by: Jennifer914

1
CompSci 296.2 Self-Managing Systems
  • Shivnath Babu

2
Today
  • Some current work in self-managing systems
  • Ideas and resources for projects
  • IBM
  • ROC (Discussion deferred to next class)
  • Our projects at Duke
  • HP

3
Project
  • Group size <= 2
  • Identify general topic by end of January, meet
    with Shivnath
  • Feb 7: Scope problem and give 15-minute talk
  • Feb 21: 3-minute talk
  • March 7: 15-minute talk
  • March 28: 3-minute talk
  • April 4/6: 15-minute talk
  • April 20/24: 15-minute final in-class
    presentation (+ demo)

4
Work on Self-Managing Systems
  • IBM
  • IBM Journal, Volume 42, Number 1, 2003
  • Autonomic computing home page
  • IBM autonomic home library, demos
  • Autonomic computing toolkit
  • IBM Tivoli

5
Work on Self-Managing Systems
  • Berkeley-Stanford ROC project
  • Reading for this class
  • Interesting source of project ideas and source
    code
  • Sample project reports/presentations (follow the
    CS444A/294-4 link)

6
The past: research goals and assumptions of last
15 years
  • Goal 1: Improve performance
  • Goal 2: Improve performance
  • Goal 3: Improve cost-performance

7
New research goals for a New Century: ACME
  • Availability
  • Changeability
      • support rapid deployment of new software, apps,
        UI
  • Maintainability
      • reduce burden on system administrators
      • provide helpful, forgiving SysAdmin environments
  • Evolutionary Growth
      • allow easy system expansion over time
  • Also: Security/Privacy

8
Recovery-Oriented Computing (ROC) Philosophy
  • "If a problem has no solution, it may not be a
    problem, but a fact, not to be solved, but to be
    coped with over time"
  • Shimon Peres ("Peres's Law")
  • People/HW/SW failures are facts, not problems
  • Recovery/repair is how we cope with above facts
  • Since major Sys Admin job is recovery after
    failure, ROC also helps with maintenance/TCO

ROC focus is on fast repair vs. the old focus on
longer time between failures
9
An Example Project in ROC
  • Undo functionality for system administrators
    (useful for self-managing components as well)
  • To recover from human errors
  • To recover from failed operations like software
    upgrades, installs, and configuration updates
  • An interesting mechanism project for self-healing
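
The undo idea above can be sketched in a few lines. This is only an illustration, assuming the configuration is a simple in-memory key-value store; the class and method names are hypothetical and are not taken from the ROC project.

```python
class UndoableConfig:
    """Key-value configuration with an undo log (illustrative only)."""

    _MISSING = object()  # marks "key did not exist before the change"

    def __init__(self, initial=None):
        self.settings = dict(initial or {})
        self._log = []  # stack of (key, previous_value) pairs

    def set(self, key, value):
        # Record the old value before overwriting so the change can be reverted.
        self._log.append((key, self.settings.get(key, self._MISSING)))
        self.settings[key] = value

    def undo(self):
        """Revert the most recent change; return False if there is nothing to undo."""
        if not self._log:
            return False
        key, previous = self._log.pop()
        if previous is self._MISSING:
            self.settings.pop(key, None)  # the key was newly added, so remove it
        else:
            self.settings[key] = previous
        return True
```

A real undo facility for administrators would also have to cover side effects outside the configuration itself (files, running processes, user-visible state), which this sketch ignores.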

10
Mechanism Projects
  • Required/useful mechanisms for self-managing
    systems
  • Take a goal related to self-managing (e.g.,
    self-optimization, predicting problems), take a
    system (e.g., a database): what mechanisms are
    needed? Will current mechanisms suffice?
  • Ex: Data collection
  • nonintrusive, distributed, active probing

11
Our Projects at Duke
  • QueS: Querying Systems (as data)
      • Better tools for system administrators and
        self-managing system components
  • CoD: Cluster on Demand
      • Allocate virtual clusters to applications on
        demand

12
Querying Systems as Data
13
Querying Systems as Data
[Diagram; only the label "WAN" survives]
14
Querying Systems as Data
  • What are probable causes of the
    Service-Level-Agreement (SLA) violations rising
    to 12%?

Root-cause query
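
One crude way to approximate a root-cause query like the one above is to rank candidate metrics by how strongly they correlate with the SLA violation rate. The sketch below assumes equal-length time series per metric; the function names and metric names are hypothetical, and correlation is only a stand-in for real causal analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_causes(metrics, violation_rate):
    """Return (metric_name, score) pairs, strongest absolute correlation first."""
    scores = {
        name: abs(pearson(values, violation_rate))
        for name, values in metrics.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, `rank_causes({"cpu": [1, 2, 3, 4], "disk": [5, 5, 5, 5]}, [2, 4, 6, 8])` places `cpu` ahead of `disk`, because the constant `disk` series carries no information about the violations.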
15
Queries: What if
  • Given today's workload, how will average response
    time change if my database fails?
  • If I double the memory on my application servers,
    how will SLA violation rate change?
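
A what-if query needs some model of the system to evaluate the hypothetical. As a minimal sketch, assume the server behaves like a single M/M/1 queue; this model, the function name, and the example rates are assumptions for illustration and do not come from the slides. Doubling the service rate stands in here for a resource upgrade.

```python
def mean_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: 1 / (service_rate - arrival_rate)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def what_if_service_rate_scaled(arrival_rate, service_rate, factor):
    """Return (current, hypothetical) mean response times if service rate is scaled."""
    current = mean_response_time(arrival_rate, service_rate)
    hypothetical = mean_response_time(arrival_rate, service_rate * factor)
    return current, hypothetical
```

With 8 requests/s arriving at a server that handles 10 requests/s, the model gives 0.5 s now and 1/12 s after doubling the service rate. Real systems rarely match M/M/1 assumptions, so the answer is only as good as the model.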

16
Queries: Let me know
  • Let me know if, with 75% probability, average
    response time will exceed 5 seconds in next 30
    minutes
  • Prediction
  • Continuous query
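
A "let me know" query combines a prediction with a continuously re-evaluated condition. The sketch below estimates the probability by resampling recent response times, under the strong assumption that the next interval resembles the recent past; the function names, defaults, and resampling approach are illustrative choices, not the QueS method.

```python
import random


def exceed_probability(recent, threshold, horizon, trials=2000, seed=0):
    """Estimate P(mean of the next `horizon` observations > threshold).

    Resamples with replacement from `recent` (must be non-empty).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.choice(recent) for _ in range(horizon)]
        if sum(sample) / horizon > threshold:
            hits += 1
    return hits / trials


def should_alert(recent, threshold=5.0, horizon=30, min_probability=0.75):
    """True if the estimated probability of exceeding the threshold is high enough."""
    return exceed_probability(recent, threshold, horizon) >= min_probability
```

A continuous query would call `should_alert` each time new measurements arrive, with `recent` holding a sliding window of the latest response times in seconds.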

17
Queries: What should I do?
  • What should I do to reduce SLA violations of
    requests A to <1%, without increasing violations
    of other requests?
  • Root-cause + What-if

18
Querying Systems as Data
  • Instrumented traces, logs
  • System activity data
  • Data from active probing
  • Workload
  • System configuration data (e.g., buffer size,
    indexes)
  • Source code
  • Models
  • Analytic performance models
  • Machine learning models
  • Rules from system experts
  • Simulators

19
Querying Systems with QueS (30,000 ft)
20
Challenges: Query Complexity
  • Support for complex queries
  • Rank probable causes of SLA violation rising to
    12%
  • "What should I do?" queries
  • Queries are ad-hoc
  • Queries may be acquisitional

21
Challenges: Query Specification
  • Declarative query language
  • Expressibility of language
  • Composition
  • Snapshot queries and continuous queries

22
Challenges: Query Processing
  • Model-based query processing
  • Many types of data sources
  • Structured, semi-structured, and unstructured
  • Uncertainty in input data
  • E.g., legacy systems may have partial/no
    instrumentation
  • Imprecise answers
  • Answers may include quantification of accuracy
  • Ranking

23
Challenges: Run-time Overhead
  • Real-time service for 24x7 systems
  • Tunable data acquisition
  • Active probing

24
Work in Progress
  • With Piyush Shivam
      • Models for answering queries about expected
        performance given a resource assignment, feasible
        resource assignments to meet SLA, what-if queries
        for scientific applications
  • With Songyun Duan
      • Use of Bayesian Networks for performance
        prediction and root-cause queries
  • With Wanhong Xu
      • What-if queries on configuration-parameter
        settings

25
Projects at HP Research
  • Project 1: Predicting performance problems,
    finding root causes of problems
  • Project 2: Debugging complex systems
  • Project 3: Designing adaptive systems