Improving Database Availability by Recollecting Statistics at Optimal Intervals - PowerPoint PPT Presentation

1 / 12
About This Presentation
Title:

Improving Database Availability by Recollecting Statistics at Optimal Intervals

Description:

Once we have mean sojourn time and transition-matrix, we can calculate the mean ... Sojourn time distribution in each state. Probability transition matrix (P) ... – PowerPoint PPT presentation

Number of Views:73
Avg rating:3.0/5.0
Slides: 13
Provided by: people3
Category:

less

Transcript and Presenter's Notes

Title: Improving Database Availability by Recollecting Statistics at Optimal Intervals


1
Improving Database Availability by
Recollecting Statistics at Optimal Intervals
Ranjith Vasireddy Dr. Shivnath Babu Prof. Kishor
S. Trivedi rv5, shivnath, kst _at_duke.edu Duke
University
  • April 26, 2006
  • IBM University Day

2
Motivation
  • Databases
  • Database statistics play a vital-role in
    determining the optimal execution plan by the
    query optimizer
  • Insertions, deletions and updates on a table
    would
  • degrade the statistics of a table demanding for
    re-collection of statistics.
  • Availability
  • Database Server should be UP to serve
  • Five-Nines Available (downtime of 5.26 minutes
    per year)
  • In this work, database server is considered
    available, if it is able to respond within user
    tolerable response-time

3
Problem statement possible solution
  • Recollecting statistics very often might make
    database server unavailable to serve clients
  • Delaying statistics collection beyond certain
    time might lead to selection of bad query
    execution plan by the optimizer thereby leading
    to higher response-times
  • Therefore, the problem of how often to
    'rejuvenate' the statistics is vital, as
    statistics collection is a computationally-expensi
    ve process
  • Possible Solution Techniques from Software
    Rejuvenation
  • How often to rejuvenate software how often to
    recollect statistics

4
Method from Software Rejuvenation Approach
  • Method To find out the optimal interval using
    Semi-Markov process (SMP)
  • Collect the response-times by executing sql
    queries on a database server
  • Cluster the response times in to few states
  • Clusters form the states of Semi-Markov chain
  • Classify states of SMP into available and
    unavailable states, based on threshold
    response-time
  • Now, the problem is to find out the time that the
    system takes to move from an available state to
    one of the unavailable states in an SMP

5
Computing Optimal Interval
  • To find out such optimal interval, we need
  • Mean sojourn time in each state based on observed
    response times (ideally sojourn time
    distribution)
  • Empirical transition-matrix based on clusters
  • Once we have mean sojourn time and
    transition-matrix, we can calculate the mean time
    to enter into unavailable state from an available
    state in an SMP

6
Semi-Markov Process (SMP)
  • Semi-Markov Process is characterized by
  • Sojourn time distribution in each state
  • Probability transition matrix (P) among the
    states
  • Vector v indicates steady state probability of
    embedded DTMC
  • we subject to normalization condition ?vi 1, to
    solve for vi
  • ?i vihi/(?(vihi)) , hi is mean sojourn time
    in state i
  • We get steady-state probability of state i in a
    SMP
  • ?i can be used to compute steady-state
    availability

7
Four states in an SMP
  • Four states of SMP
  • Highly Available (1)
  • Available (2)
  • Unavailable (3)
  • Highly Unavailable (4)
  • Reasons for more than two states
  • To know whether failure is gradual or sudden
  • If gradual, depending on the system state at
    given time, we can predict the failure
  • Otherwise, looking at system state at any given
    time may not be of much help
  • TKF Time the system takes to move from state K
    to state 3 or state 4
  • T1F h1 p12T2F (h1 p12 h2)/(1- p12
    p21)

8
Example SMP
  • Embedded DTMC
  • For eg. P41 (41)/(4j) 2/6 1/3

9
Experiments on MySQL server
  • Experimentation on tables with 1 million, 3
    million and 10 million records
  • Each record has 3 int elements
  • Each column can have values between 0,100)
  • Each client executes insert, delete, range-query
    selected at random
  • Each delete is
  • Delete from table Y where a? and b?
  • P(Aa Bb) (1/100)(1/100) 1/(104)
  • Since table size is x106 records x 102 104
    (x1, 3, 10)
  • Each such delete deletes x100 records
    probabilistically
  • Each insert should insert x100 records to keep
    the table size roughly the same

10
Results
Greater than Optimal Interval
Optimal Interval
Less than Optimal Interval
Without statistics collection
Greater than Optimal Interval
Without statistics collection
Optimal Interval
Less than Optimal Interval
Optimal Interval
Without statistics collection
Greater than Optimal Interval
11
Future work
  • Good results, but there are some inconsistencies
  • To answer such inconsistencies
  • Apply load-based software rejuvenation techniques

12
References
  • Software Rejuvenation http//srejuv.ee.duke.edu
  • A. Aboulnaga, P. Haas, S. Lightstone, G. Lohman,
    V. Markl, I. Popivanov, V. Raman , Automated
    Statistics Collection in DB2 UDB, Proceedings of
    the Intl. Conf. on Very Large Databases (VLDB),
    2004
  • Kishor S. Trivedi. Probability and Statistics
    with Reliability, Queuing, and Computer Science
    Applications, second-edition, John Wiley, 2001.
  • Kalyanraman Vaidyanathan, Kishor S. Trivedi A
    Comprehensive Model for Software Rejuvenation.
    IEEE Trans. Dependable Sec. Comput. 2(2)
    124-137 (2005)
Write a Comment
User Comments (0)
About PowerShow.com