Chris Olston - PowerPoint PPT Presentation

1
Offering a Precision-Performance Tradeoff for
Aggregation Queries over Replicated Data
  • Chris Olston
  • Jennifer Widom

Stanford University
2
Replication Alternatives
performance
precision
3
Replication Alternative 1
Exact Cache
5
3
5
3
Source (fresh)
Source (fresh)
4
Replication Alternative 1
Exact Cache
5 → 8
3 → 4
Propagate all updates
5 → 8
3 → 4
Source (fresh)
Source (fresh)
5
Replication Alternative 1
Exact Cache
performance
AVG = 6
5 → 8
3 → 4
exact cache
precision
Propagate all updates
5 → 8
3 → 4
Source (fresh)
Source (fresh)
6
Replication Alternative 2
Stale Cache
5
3
Periodic refresh
5 → 8
3 → 4
Source (fresh)
Source (fresh)
7
Replication Alternative 2
stale cache
Stale Cache
performance
5
3
AVG = 4
precision
Periodic refresh
5 → 8
3 → 4
Source (fresh)
Source (fresh)
8
TRAPP Replication
Bounded Cache
[4, 7]
[2, 4]
5
3
Source (fresh)
Source (fresh)
9
TRAPP Replication
Bounded Cache
[4, 7] → [6, 10]
[2, 4]
Refresh when value exceeds bounds
5 → 8
3 → 4
Source (fresh)
Source (fresh)
10
TRAPP Replication
you decide
Bounded Cache
performance
AVG ∈ [4, 7]
[6, 10]
[2, 4]
precision
8
4
Source (fresh)
Source (fresh)
11
Outline
  • TRAPP Architecture
  • Query Execution for Bounded Answers
  • Adjusting Bound Width
  • Related Work
  • Status and Future Work

12
Overview of TRAPP
  • Caches store bounds that include exact source
    values
  • Sources refresh when value exceeds bound
  • Queries over cached data include a precision
    constraint
  • Our algorithms answer queries by refreshing as
    few values as possible to meet precision
    constraint

13
Example TRAPP Query
Bounded Cache
AVG ∈ [4, 7], want within 1
[6, 10]
[2, 4]
8
4
Source (fresh)
Source (fresh)
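
The bounded answer in this example can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `bounded_avg` is hypothetical, and it assumes that "within 1" means the width of the answer bound must be at most 1.

```python
def bounded_avg(bounds):
    """Return (low, high) bounding the average of values known only by bounds.

    `bounds` is a list of (low, high) pairs; an exact value v is written (v, v).
    """
    n = len(bounds)
    low = sum(lo for lo, _ in bounds) / n
    high = sum(hi for _, hi in bounds) / n
    return low, high


# Cached bounds from the slide: [6, 10] and [2, 4].
cache = [(6, 10), (2, 4)]
low, high = bounded_avg(cache)
print(low, high, high - low)  # 4.0 7.0 3.0 (too wide for "within 1")

# Refreshing the first value to its exact source value 8 narrows the answer.
refreshed = [(8, 8), (2, 4)]
low, high = bounded_avg(refreshed)
print(low, high, high - low)  # 5.0 6.0 1.0 (meets "within 1")
```

In this instance one refresh is enough: the exact source average is 6, which lies inside both the original and the narrowed bound.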
14
TRAPP Architecture
query + precision constraint
bounded answer
Source
value-initiated refresh
Cache
Refresh Monitor
query-initiated refresh
Query Processor
query-initiated refresh request
15
Precision-Performance Tradeoff
stale cache
TRAPP
performance
exact cache
precision
  • Higher precision requires more refreshing
  • Higher performance forces low precision

TRAPP offers a continuous tradeoff
16
Application: Network Monitoring
[Figure: three network elements, each labeled with latency, bandwidth, and traffic]
17
Query Execution for Bounded Answers
  • Input
      • Query: aggregation with selection predicate
      • Precision constraint
      • Set of bounded values with cost to refresh each
  • Step 1: Compute initial bounded answer
  • Step 2: Determine minimum-cost set of values to
    refresh that guarantees satisfaction of the
    precision constraint
  • Step 3: Use exact values from refreshes combined
    with bounds to compute final bounded answer
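
The three steps can be sketched end to end for a SUM query. This is an illustrative sketch only: Step 2 is done by brute force over all subsets, which is practical only for a handful of values and is not the algorithm used in TRAPP. The names `sum_bound`, `answer_sum`, and `fetch_exact` are hypothetical, and the precision constraint is assumed to be a maximum width for the answer bound.

```python
from itertools import combinations


def sum_bound(bounds):
    """Bound on the sum of values known only by (low, high) bounds."""
    return sum(lo for lo, _ in bounds), sum(hi for _, hi in bounds)


def answer_sum(bounds, costs, fetch_exact, within):
    """Answer SUM to the requested precision, refreshing as cheaply as possible.

    bounds: list of (low, high); costs: refresh cost per value;
    fetch_exact(i): returns the exact source value i (a refresh);
    within: maximum allowed width of the answer bound (assumed >= 0).
    """
    # Step 1: initial bounded answer from cached bounds alone.
    low, high = sum_bound(bounds)
    if high - low <= within:
        return low, high

    # Step 2: cheapest subset whose refresh guarantees the constraint.
    # Refreshing everything always qualifies, so `best` is always set.
    n = len(bounds)
    best = None
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            remaining = sum(hi - lo for i, (lo, hi) in enumerate(bounds)
                            if i not in subset)
            cost = sum(costs[i] for i in subset)
            if remaining <= within and (best is None or cost < best[0]):
                best = (cost, subset)

    # Step 3: combine exact refreshed values with the remaining bounds.
    refreshed = list(bounds)
    for i in best[1]:
        value = fetch_exact(i)
        refreshed[i] = (value, value)
    return sum_bound(refreshed)


exact_values = [2.5, 6]  # hypothetical source values
print(answer_sum([(2, 3), (4, 8)], [1, 5], lambda i: exact_values[i], within=2))
# (8, 9)
```

Here only the second value is refreshed: refreshing the cheaper first value would leave a width of 4, which violates the constraint.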

18
Example Query SUM
  • SELECT SUM(A) WITHIN 2
  • Steps 1 & 3 (computing bounded answer)

A
[L1, H1]
[L2, H2]

$\sum_i [L_i, H_i] = \left[\sum_i L_i,\ \sum_i H_i\right]$

Example: [2, 3] + [4, 8] = [6, 11]
19
SUM, Choosing Tuples to Refresh
  • Isomorphic to 0/1 Knapsack Problem
  • Objective: fill knapsack with bounds that would
    be expensive to refresh while not exceeding
    capacity
  • Knapsack contents: set of bounds not to refresh
  • Benefit: cost saved by not refreshing
  • Knapsack capacity: precision constraint
  • Weight: bound width
  • Knapsack is NP-hard; we use the ε-approximation
    algorithm by Ibarra and Kim
  • Observation: width of answer bound = sum of
    non-refreshed bound widths
  • We need this quantity to be at most the
    precision constraint
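
The knapsack mapping can be made concrete with the textbook dynamic program for 0/1 knapsack. This sketch is not the Ibarra and Kim approximation scheme named above: it is the exact pseudo-polynomial algorithm, and it assumes bound widths and the precision constraint are non-negative integers. The function name `choose_refresh_set` is hypothetical.

```python
def choose_refresh_set(widths, costs, capacity):
    """Pick which bounds to refresh for a SUM query.

    widths[i]: width of bound i (knapsack weight, non-negative int)
    costs[i]:  cost to refresh bound i (knapsack benefit if NOT refreshed)
    capacity:  precision constraint (maximum total width left unrefreshed)
    Returns the sorted indices of the bounds to refresh.
    """
    n = len(widths)
    # best[i][c]: max refresh cost saved using the first i bounds
    # with total unrefreshed width at most c.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        width, benefit = widths[i - 1], costs[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if width <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - width] + benefit)

    # Trace back to find which bounds stay in the knapsack (not refreshed).
    keep = set()
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            keep.add(i - 1)
            c -= widths[i - 1]
    return [i for i in range(n) if i not in keep]


# Three bounds with widths 1, 4, 2 and refresh costs 3, 5, 4; WITHIN 2.
print(choose_refresh_set([1, 4, 2], [3, 5, 4], capacity=2))  # [0, 1]
```

In the usage line, keeping only the third bound (width 2, cost 4) saves the most cost while staying within the capacity, so the other two are refreshed.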

20
SUM with a Selection Predicate
  • SELECT SUM(A) WITHIN 2 WHERE B > 10
  • Three possibilities for each B value
      • LBi > 10 (e.g., [15, 20]): yes
      • HBi ≤ 10 (e.g., [5, 8]): no
      • else (e.g., [9, 12]): maybe
  • Ignore the "no" tuples and process the "yes" tuples as before
  • For "maybe" tuples, pretend that the bound on A includes 0
      • e.g., A ∈ [3, 5] becomes A ∈ [0, 5]

A            B
[LA1, HA1]   [LB1, HB1]
[LA2, HA2]   [LB2, HB2]
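
The yes/no/maybe classification can be sketched as follows. This is a minimal illustration with hypothetical names (`classify`, `sum_with_predicate`). It covers only the initial bounded answer, and it assumes the predicate has the form B > threshold as in the example query.

```python
def classify(b_bound, threshold):
    """Classify a tuple by its bound on B for the predicate B > threshold."""
    low, high = b_bound
    if low > threshold:
        return "yes"    # every possible B value satisfies the predicate
    if high <= threshold:
        return "no"     # no possible B value satisfies the predicate
    return "maybe"      # the bound straddles the threshold


def sum_with_predicate(rows, threshold):
    """Bound on SUM(A) WHERE B > threshold.

    rows: list of ((LA, HA), (LB, HB)) bound pairs, one per tuple.
    """
    low = high = 0
    for (la, ha), b_bound in rows:
        kind = classify(b_bound, threshold)
        if kind == "no":
            continue
        if kind == "maybe":
            # The tuple may contribute nothing, so widen A's bound to include 0.
            la, ha = min(la, 0), max(ha, 0)
        low += la
        high += ha
    return low, high


rows = [
    ((3, 5), (15, 20)),  # yes
    ((1, 2), (5, 8)),    # no
    ((3, 5), (9, 12)),   # maybe: A treated as [0, 5]
]
print(sum_with_predicate(rows, threshold=10))  # (3, 10)
```

Using `min` and `max` rather than setting the low end to 0 keeps the widening correct for negative A values as well, which is an extension beyond the slide's example.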
21
Other Aggregation Functions
  • COUNT
  • MIN/MAX
  • AVG
  • MEDIAN
  • see STOC00

22
Realizing the Precision-Performance Tradeoff
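[Figure: plot of the precision-performance tradeoff; only axis tick values survive in this transcript (0 to 4000 on one axis, 150 down to 0 on the other)]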
23
Adjusting Bound Width
  • Dynamically adjust bound width to minimize the
    probability of a refresh
  • Preliminary results indicate that this adaptive
    algorithm is promising

  • Value-initiated refresh: value exceeds bounds
    (bound was too narrow), so grow the bound
  • Query-initiated refresh: more precision required
    (bound was too wide), so shrink the bound
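
The grow/shrink rule can be sketched as a small class. The slides do not give the adjustment policy, so everything specific here is an assumption: the class name `AdaptiveBound`, the multiplicative factors `grow=2.0` and `shrink=0.5`, and re-centering the bound on the current value after each refresh.

```python
class AdaptiveBound:
    """Cached bound whose width adapts to the refreshes it experiences."""

    def __init__(self, value, width, grow=2.0, shrink=0.5):
        self.value = value      # exact value held at the source
        self.width = width
        self.grow = grow        # hypothetical factor applied on value-initiated refresh
        self.shrink = shrink    # hypothetical factor applied on query-initiated refresh
        self._recenter()

    def _recenter(self):
        # Assumption: a refreshed bound is centered on the current value.
        self.low = self.value - self.width / 2
        self.high = self.value + self.width / 2

    def on_source_update(self, new_value):
        """Source value changed; refresh only if it left the bound."""
        self.value = new_value
        if self.low <= new_value <= self.high:
            return False            # still inside the bound, no refresh
        self.width *= self.grow     # bound was too narrow
        self._recenter()
        return True                 # value-initiated refresh happened

    def on_query_refresh(self):
        """A query needed more precision; return the exact value."""
        self.width *= self.shrink   # bound was too wide
        self._recenter()
        return self.value


bound = AdaptiveBound(value=5, width=4)    # bound is [3, 7]
print(bound.on_source_update(6))           # False: 6 is still inside [3, 7]
print(bound.on_source_update(8))           # True: bound grows to [4, 12]
print(bound.on_query_refresh(), bound.low, bound.high)  # 8 6.0 10.0
```

The two refresh types push the width in opposite directions, so over time the width settles where neither kind of refresh dominates, which is the intuition behind minimizing the probability of a refresh.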
24
Not Covered in Talk (See Paper)
  • Details on other aggregates
  • Many more examples
  • Joins
  • Time-varying bounds [L(t), H(t)]

25
Related Work
  • Approximate answers
  • Mostly precomputation or sampling to provide
    statistical results
  • Reduce representation size
  • e.g., Multi-resolution data model [Read et al.]
  • Still fetch all objects

26
Related Work (cont.)
  • Bounds on numerical values
  • e.g., Quasi-copies [Alonso et al.], Moving
    Objects Databases [Wolfson et al.], Demarcation
    Protocol [Barbara/Garcia-Molina]
  • No user control of precision-performance tradeoff
  • e.g., APPROXIMATE [Jukic/Vrbsky], Constraint
    Databases, Incomplete Information Databases
    [Abiteboul et al.]
  • Bounded values are not approximations of exact
    values available at cost
  • Bounds on the number of updates
  • e.g., Divergence Caching Huang et al.
  • No bounds on the values themselves

27
Status and Future Work
  • Underway
  • Performance study on bound functions and width
    adjustment algorithms
  • Non-numeric data (e.g., WWW)
  • Multi-level replication systems
  • Other types of queries
  • Iterative refresh algorithms
  • Delaying the propagation of insertions and
    deletions
  • Planned
  • Investigation of real-time and consistency issues
  • Applying TRAPP to data visualization

28
That's all folks!
  • To contact me:
  • olston_at_db.stanford.edu
  • http://www.db.stanford.edu/olston