Ripple Joins for Online Aggregation - PowerPoint PPT Presentation

About This Presentation
Title:

Ripple Joins for Online Aggregation

Description:

Ripple Joins for Online Aggregation by Peter J. Haas and Joseph M. Hellerstein published in June 1999 presented by Nag Prajval B.C – PowerPoint PPT presentation

Number of Views:110
Avg rating:3.0/5.0
Slides: 25
Provided by: Rayt191
Learn more at: https://crystal.uta.edu
Category:

less

Transcript and Presenter's Notes

Title: Ripple Joins for Online Aggregation


1
Ripple Joins for Online Aggregation
  • by
  • Peter J. Haas and Joseph M. Hellerstein
  • published in June 1999
  • presented by
  • Nag Prajval B.C

2
In Simple words
  • This paper tells how to join a bunch of
    tables and get the SUM, COUNT, or AVG in GROUP BY
    clauses showing approximate results immediately
    and the confidence interval of the results from
    the first few tuples retrieved updating a GUI
    display with closer approximation information as
    the join adds more tuples.

3
Motivation
  • Traditional Join Algorithms
  • Nested- Loop Join
  • Merge Join
  • Hash Join
  • Main Goal These join algorithms are designed
    minimize the time to completion of query.

4
Online System
  • Huge database with large tables.
  • Select student.honors_code,avg(enroll grade)
  • From enroll, student
  • Where enroll.sidstudent.sid
  • Group by student.honors_code

5
Traditional Algorithms
  • Traditional algorithms take a lot of time since
    they have to process the entire tables or
    relations
  • The users have to wait for a long time before the
    results are returned.
  • Many times aggregation queries are used to get a
    big picture of a dataset.
  • More Appropriate method would be Online
    Aggregation.

6
Online Aggregation
  • A running estimate of the final aggregates are
    continuously displayed to the user.
  • The proximity of the running estimate to the
    final result is also displayed to the
    user.(confidence interval)
  • Quick results rather than minimize time for
    completion.
  • This required some changes to be made to the
    traditional algorithms for query processing.

7
GUI, 1999
8
Join Algorithms for Online Processing
  • Traditional Nested loop algorithm could be used
    in an online fashion.
  • Drawbacks
  • 1. If R was of non trivial size then time
    between successive updates could be excessive.
  • 2. consider a query like this
  • Select AVG(S.a R.b/1000000)
  • From S and R
  • Where S.xR.x
  • As scan progresses no new information is
    retrieved ie the Confidence Intervals do not
    shrink well.

9
Ripple Join Operation
  • Assume ripple join of relations R and S
  • Select a random tuple r from R.
  • Join with previously selected S tuples.
  • Select a random tuple s from S.
  • Join with previously selected R tuples.
  • Join r and s.

10
Example
11
Aspect Ratios
  • Aspect ratio How many tuples are retrieved from
    each base relation per sampling step.
  • e.g. ß1 1, ß2 3,
  • Square Ripple JoinSamples are drawn from R and S
    at the same rate.
  • Rectangular Ripple JoinOne relation is sampled
    at a higher rate than another to provide shortest
    possible Confidence Intervals.

12
Ripple Join Algorithm
  • For(max1 to infinity)
  • for(i1 to max-1)
  • if(predicate(Ri,smax))
  • output(Ri,Smax)
  • for(i1 to max)
  • if(predicate(Rmax,si))
  • output(Rmax,Si)

13
Ripple Join Algorithm
  • Generalization of Nested loop join
  • Roles of Inner and Outer Relations continuosly
    interchange during processing
  • Cursor at S fixed at maxn whil cursor into R
    loops from 1 to n-1
  • When cursor into R reaches value n,the cursor
    into S loops from 1 to n.

14
Why call this "Ripple Join"?
  1. The algorithm seems to ripple out from a corner
    of the join.
  2. Acronym "Rectangles of Increasing Perimeter
    Length"

15
Iterators
  • Ripple joins can be developed using the iterator
    model.
  • 2 variants
  • 1.Square Binary Ripple Join Iterator
  • 2.Enhanced Ripple Join Iterator

16
Square Binary Ripple Join Iterator
17
Enhanced Ripple Join Iterator
  • Drawbacks of Square Ripple join iterator are
    handled
  • 1)Non Unitary aspect ratios
  • 2)Pipelining multiple ripple joins

18
Ripple Join Variants
  • Block Ripple Join
  • Index Ripple Join
  • Hash Ripple Join

19
Performance
  • Goal Provide efficient and accurate estimation.
  • Join results should be returned in a such a way
    that aggregates are updated regularly and
    confidence intervals shrink rapidly.
  • Depends on 2 important things
  • 1.Estimators for SUM,COUNT and AVG
  • 2.Confidence intervals

20
Estimators for SUM,COUNT and AVG
  • Consider a query of the form
  • SELECT op(expression) from R,S
  • WHERE predicate
  • At end of nth sampling step estimator of
    SUM(expression) is

21
Estimators for SUM,COUNT and AVG
  • Rn and Sn Set of Tuples that have been
    read at nth sampling step
  • R and S Set of Relations
  • expressionP(r,s) equals expression(r,s) if (r,s)
    satisfies where clause and 0 otherwise
  • Count() expressionP(r,s) replaced by
    oneP(r,s) where oneP(r,s) 1 if (r,s)
  • satisfies where clause and 0 otherwise.
  • AVG(expression)SUM estimator/COUNT estimator.

22
Confidence interval
  • A running confidence interval displays how close
    this answer is to the final result.
  • This could be calculated in many ways.
  • The authors present an example calculation built
    on extending the Central Limit Theorem.

23
Ripple Joins
  • Ripple joins are designed to minimize the time
    until an acceptably precise estimate of the query
    result is available, as measured by the length of
    a confidence interval.
  • Ripple joins are adaptive, adjusting their
    behavior during processing in accordance with the
    statistical properties of the data.

24
Ripple Joins
  • Ripple joins also permit the user to dynamically
    trade off the two key performance factors of
    online aggregation the time between successive
    updates of the running aggregate, and the amount
    by which the confidence-interval length decreases
    at each update.
  • Ripple joins appear to be among the first
    database algorithms to use statistical
    information about the data not just to estimate
    selectivities and processing costs, but to
    estimate the quality of the result currently
    being displayed to the user and to dynamically
    adjust algorithm behavior accordingly.
Write a Comment
User Comments (0)
About PowerShow.com