Title: Online Query Processing
1Online Query Processing
- Joseph M. Hellerstein
- UC Berkeley
2Road Map
- Requisite tech trends and prognostications
- CONTROL of the situation online query processing
- HCI motivations
- Example applications
- Example Algorithms
- Reordering and Ripple Joins
- Implications for hot topics
- Big data, small pipes P2P, Sensor nets
- Continuous Queries over data streams
3Computers Keep Getting Faster
- Moores Law
- of transistors per chip doubles every 18
months (1965) - ? 2x price/performance every 18 months
- About right, 24 periods running
4But Data Grows Faster Yet!
Source J. Porter, Disk/Trend, Inc. http//www.dis
ktrend.com/pdf/portrpkg.pdf
5Disk Appetite, cont.
- Greg Papadopoulos, CTO Sun
- Disk sales doubling every 9 months
- Translate
- Time to process all your data doubles every 18
months - MOORES LAW INVERTED!
- Big challenge (opportunity?) for SW systems
research - Feel free to discount the analysis above
significantly - Even so, traditional scalability research wont
help - Ideal linear scaleup is NOT NEARLY ENOUGH!
- HW will not take us there.
6Data Volume Prognostications
- Today
- SwipeStream
- E.g. Wal-Mart 100 TB Data Warehouse
- ClickStream
- One service generates 100GB of log per day
- Web
- Internet Archive WayBack Machine 100 TB
- Replicated OS/Apps, MP3s, etc.
- Tomorrow
- Sensor feeds galore
- Big data, small pipes Sensor nets, P2P
- Continuous queries over data streams
7Road Map
- Requisite tech trends and prognostications
- CONTROL of the situation online query processing
- HCI motivations
- Example applications
- Example Algorithms
- Reordering and Ripple Joins
- Implications for hot topics
- P2P query systems
- Sensor nets
- Stream Query Processing
8The Latest Commercial Technology
9Drawbacks of Current Systems
- Only exact answers are available
- A losing proposition as data volume grows
- HCI solution interactive tools dont do big jobs
- E.g., spreadsheet programs (64Krow limit)
- Systems solution big jobs arent interactive
- No user feedback or control in big DBMS queries
- back to the 60s
- Long processing times
- Fundamental mismatch with preferred modes of HCI
- Best solutions to date precompute
- E.g. OLAP
- Dont handle ad hoc queries or data sets well
10CONTROLContinuous Output, Navigation and
Transformation with Refinement On-Line
- Of all men's miseries, the bitterest is thisto
know so much and have control over nothing - -- Herodotus
- Requirements for CONTROL systems
- Early answers
- Refinement over time
- Interaction and ad-hoc control
- Crystal Ball vs. Black Box
- vs. Lucite Watch
11CONTROL Project, 96-01
- Main SW Artifacts
- Online Aggregation in SQL
- Postgres, Informix, IBM DB2
- Potters Wheel
- Scalable Spreadsheet for Online Data Cleaning
- Interactive anomaly detection transformation
- CARMA online association rule mining
- 2 Online Data Viz prototypes
- CLOUDS Phoebus
- Seeded current Telegraph project
- Adaptive dataflow for querying networked sources
12Goals for Online Processing
- New greedy performance regime
- Maximize 1st derivative of the mirth index
- Mirth defined on-the-fly
- Therefore need FEEDBACK and CONTROL
100
Online
?
Traditional
Time
13Road Map
- Requisite tech trends and prognostications
- CONTROL of the situation online query processing
- HCI motivations
- Example applications
- Example Algorithms
- Reordering and Ripple Joins
- Implications for hot topics
- P2P query systems
- Sensor nets
- Stream Query Processing
14Online Aggregation
- SELECT AVG(temp) FROM t GROUP BY site
- 330K rows in table (synthetic data)
- the exact answer
15Online Aggregation, contd
- A simple online aggregation interface (after 74
rows)
Courtesy Peter Haas, IBM
16Online Aggregation, contd
Courtesy Peter Haas, IBM
17Example Online Aggregation
Additional Features Speed up Slow
down Terminate
18Online Data Visualization
19Online Browsing Editing
- Potters Wheel VLDB 2001
- Scalable spreadsheet
- A fraction of data is materialized in GUI widget
at any time - Scrolling preference for delivering quantiles
to widget - Interactive data cleaning
- Direct-manipulation interface via visual algebra
- Transformation by example
- Online structure and discrepancy detection
- MDL meets ADTs
- Minimum Description Length Rissinen for
modeling data - Based on a grammar of user-defined Abstract Data
Types - Simple API for programmers to register new ADTs
20Scalable Spreadsheets
21Visual Transformation Shot
22(No Transcript)
23Road Map
- Requisite tech trends and prognostications
- CONTROL of the situation online query processing
- HCI motivations
- Example applications
- Example Algorithms
- Reordering and Ripple Joins
- Implications for hot topics
- P2P query systems
- Sensor nets
- Stream Query Processing
24Sampling w/o Replacement
- We want i.i.d. samples w/o replacement
- At any time, the input to the query is a sample
- Input grows over time
- Can pre-sort tables randomly
- And start scans from random positions
- Best I/O behavior
- Can re-randomize incrementally in the background
- Note can do this as a secondary index
- Techniques for random sampling in DBs well
studied - Both from files and from indexes
- Some tricks here (e.g. acceptance-rejection
sampling)
25An AsideSampling in Commercial DBMSs
- Simulated Bernoulli sampling
- SQL SELECT WHERE RAND()
- Similar capability in SAS
- Bernoulli Sampling with pre-specified rate
- Informix, Oracle 8i, (DB2)
- Ex SELECT FROM T1 SAMPLE ROW(10), T2
- Ex SELECT FROM T1 SAMPLE BLOCK(10), T2
- Not for novices
- Need to pre-specify precision
- no feedback/control
- recall the multiresolution patterns from
example - No estimators provided in current systems
26Preferential Data Delivery
- Why needed? Examples.
- Speedup/slowdown arrows
- Spreadsheet scrollbars
- Pipeline quasi-sort
- Mechanisms
- Juggle Raman2/Hellerstein, VLDB 99
- General purpose works with streaming data, etc.
- Index stride Hellerstein/Haas/Wang, SIGMOD 97
- Needs index, high I/O cost, but good for outliers
- Mix adaptively! Raman/Hellerstein, ICDE 03
27Online Reordering
join
transmit
process
consume
- Deliver interesting items first
- Interesting determined on the fly
- Exploit rate gap between produce and
process/consume
28Online Reordering
join
transmit
process
consume
reorder
- Deliver interesting items first
- Interesting determined on the fly based on user
gestures at the interface - Exploit rate gap between produce and
process/consume
29Mechanism
process/consume
getNext
buffer
spool
prefetch
enrich
side disk
produce
- Two threads
- prefetch from input
- spool/enrich from auxiliary side disk, deliver
output - Juggle data between buffer and side disk
- keep buffer full of interesting items
- getNext chooses best item currently on buffer
- getNext, enrich/spool decisions -- based on
reordering policy - Side disk management
- hash index, populated in a way that postpones
random I/O - play both sides of sort/hash duality
30Policies
?
QoF
time
- Want a good permutation of items t1tn to
t?1t?n - Quality of Feedback for a prefix t?1t?2t?k
- QoF(UP(t?1), UP(t?2), UP(t?k )), UP user
preference - determined by application
- Goodness of reordering dQoF/dt
- Delivery Priority (DP)
- At any given time, try to deliver tuple that
improves QoF most
31Example Online Aggregation
- Metric avg weighted confidence interval
- Preference acts as weight on confidence interval
- QOF ?Si UPi / (ni)1/2
- ni number of tuples processed from group i
- DPi UPi / (ni)3/2
- Metric explicit proportional rates (NW QoS)
- QOF ?Si (ni nUPi)2
- where n Si ni
- DPi nUPj nj
- Explicit ranking (scrollbar)
- QOF ?Si UPi
- DPi UPi
32Ripple Joins
Haas/Hellerstein, SIGMOD 99
- Need online join processing
- Pipelined produce output while consuming input
- Statistically robust estimates/CIs as you go
- SELECT AVG(R.a S.b)
- FROM R, S
- WHERE R.c S.c
33Ripple Joins
Haas/Hellerstein, SIGMOD 99
- Need online join processing
- Pipelined produce output while consuming input
- Statistically robust estimates/CIs as you go
- SELECT AVG(R.a S.b)
- FROM R, S
- WHERE R.c S.c
- Solution space
- Simple idea sample a join
34Ripple Joins
Haas/Hellerstein, SIGMOD 99
- Need online join processing
- Pipelined produce output while consuming input
- Statistically robust estimates/CIs as you go
- SELECT AVG(R.a S.b)
- FROM R, S
- WHERE R.c S.c
- Solution space
- Simple idea sample a join
- Better join samples
35Ripple Joins
Haas/Hellerstein, SIGMOD 99
- Need online join processing
- Pipelined produce output while consuming input
- Statistically robust estimates/CIs as you go
- SELECT AVG(R.a S.b)
- FROM R, S
- WHERE R.c S.c
- Solution space
- Simple idea sample a join
- Better join samples
- Ripple Joins
- Family of algorithms for joining increasing
samples - Optimized to shrink confidence intervals ASAP
- Q Which input to sample faster? Aspect ratio
of ripples? - A Adapt to estimators variance contribution
from different inputs
36Traditional Nested Loops
SELECT AVG(R.a S.b) FROM R, S WHERE R.c S.c
R
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
S
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
37Basic Ripple Join
- Simplest version
- read new tuples s from S and r from R
- join r and s
- join r with old S tuples
- join s with old R tuples
38Basic Ripple Join
R
x
S
39Better I/O Block Ripple Join
Batch IOs a 2
R
40Adapt Rectangular Ripple Join
bR 2, bS 1
R
41Ripple Joins, contd
- Really a family of join algorithms
- Block minimizes I/O in alternating nested loops
- Hash symmetric hash tables to avoid scans
- Index basic index join
- Adaptive aspect ratio
- User sets frame rate of animation at GUI
- System goal minibatch optimizations
- minimize CI width subject to frame-rate time
constraint - CIs computable at corners, so a new frame each
corner - Challenge choose the next minibatch corner
- Sample from higher variance-contributing relation
faster - System solves linear programming problem to do
this (approximately)
42Nice and Tidy! But
- Significant details missing
- Systems issues
- Make it work in a re-entrant iterator model
- RippleJoin.get_next() ?
- Multi-join queries
- Corner cases -- e.g. EOT on one input of k
- Statistical matters
- Estimators and CIs for joins of samples
- Not textbook statistics!
- The adaptivity issue choosing the next corner
- Based on cost (I/O model for algorithm) and
benefit (CI formula) - Approximate a non-linear integer programming
problem
43Implementation Issues
- Basically a symmetric nested loops
- inner and outer roles switch back and forth
- Note asymmetry in inner loops mitres the box
- for (ever)
- for (i 1 to max -1) // S fixed, loop on R
- if (predicate(Ri, Smax))
- output(Ri, Smax)
- for (i 1 to max) // R fixed, loop on S
- if (predicate(Rmax, Si))
- output(Rmax, Si)
-
44Now as an iterator
- Most DBMSs implement query operators as iterators
- get_next() must re-enter and generate next result
tuple - state for square, 2-table ripple join
- cursors on both inputs (as in nested loops join)
- target -- wheres the next corner
- currel -- which relation is currently looping
45Fussy complications in general
- Non-square aspect ratios
- Not just wrapping 1 layer on each side
- must pad side i with bi layers correctly
- Multiple joins pipeline of binary joins
- But cant be done naively
- Wrapping hypercubes not rectangles
- Given subtree of size i, factor in bs and a
- Subtrees hyperplane is b1b2biai tuples
- Boundary conditions
- Overload control messages
- EOT may mean end of sampling step, not end of
table - Need to signal aggregator of arrival at corner
(EOT on all tables) - Also, EOTable on a relation may change
mitre-ing rules!
46Empirical Results for Square Ripples (208 sec.
Query)
Select Online AVG(E.grade) From enroll E,
student S Where E.sid S.sid And
S.honors_code IS NULL
47Square Results (cont.)
48Statistical Matters
- Need robust formulas for
- running estimates
- running CIs
- Adaptive update of aspect ratio
- Efficient maintenance of running statistics
49Running Estimators
- ExampleSELECT SUM(expression) FROM R, SWHERE
predicate - Natural estimator after n sampling steps
- Unbiased, consistent
- A scaled average -- hence amenable to CLT
50Large-Sample Confidence Intervals
- Review single table R
- sampling steps 1
- Based on Central Limit Theorem for averages of
i.i.d. observations - Complications for multiple tables
- Estimator for AVG is a ratio of averages
- expressionp(r, s) and expressionp(r, s) are
correlated! - Solution
- CLT for (ratios of) cross-product averages
- Based on theory of convergence in distribution
- Courtesy Peter Haas, IBM
- Also conservative CIs that can be used for very
small n - Based on Hoeffding inequality
- And deterministic CIs for the endgame
51Large-Sample Confidence Intervals
- General form For K tables and a 100p
CIWhere - zp is the area under the standard normal curve 0
p - n is the number of tuples so far
-
- d(k) is agg-fn dependent, computed on tuples of
input - dn(k) a consistent estimator by applying d(k) to
samples so far - a the block size, a constant
- bi the number of blocks to fetch from relation i
- Tradeoff between update rate/CI shrinkage
- Bigger bs smaller CIs, slower animation
CI width
52Aspect Ratio Selection
- Cost of block ripple join after n steps
- c b1b2bkaK-1nK o(nK)
- Cost of hash ripple after n steps
- c (b1 b2 bk)n
- Non-linear integer programming problemminimize
r2(b1, b2, , bk, a) such that - c (b1, b2, , bk, a) ? constant animation
speed - 1 ? bk ? Rk/ a for 1 ? k ? K
- b1, b2, , bk integer
53Aspect Ratio Selection
- Adaptive procedure
- Start with b1 b2 bK 1
- Execute m sampling steps (m 1)
- Update each d(k) estimate
- Solve optimization problem to get new bs
- Sweep out new rectangle
- Go to 2
- Approximate optimization algorithm for bs
- First ignore all constraints, minimize r2
- Then scale up all so smallest bi 1 (constraint
2) - Scale down by ever-larger bis until animation
goal met (constraint 1) - Round down to integers (constraint 3)
- c (b1, b2, , bk, a) ? c
- 1 ? bk ? Rk/ a
- b1, b2, , bk integer
54Empirical Results aspect ratios
Select Online AVG(D.grade/A.grade) From enroll
D, enroll A Where D.college Education And
A.college Agriculture And A.year d.year
-- 1x6
55Ripple Joins, contd
- Prototypes in Postgres, Informix, IBM DB2
- Follow-on work
- Subqueries in Singapore Tan, et al, VLDB 99
- Scaling at Wisconsin/IBM Luo, et al, SIGMOD 02
- Parallel and out-of-core variants
- Query optimization issues
- Motivated eddies SIGMOD 99 and the rest of the
Telegraph project CIDR 03 - A number of API and other systems issues
- Informix implementation DMKD 2000
56Road Map
- Requisite tech trends and prognostications
- CONTROL of the situation online query processing
- HCI motivations
- Example applications
- Example Algorithms
- Reordering and Ripple Joins
- Implications for hot topics
- P2P query systems
- Sensor nets
- Stream Query Processing
57Two Hot DB Topics
- Queries over networks
- Querying sensor nets (e.g. TinyDB)
- Querying in P2P networks (e.g. PIER)
- Continuous queries over data streams
- E.g. TelegraphCQ, Aurora, STREAM, NiagaraCQ
- Natural settings for these ideas!
58Queries over Networks Big Data, Small Pipes
- Given Large Data Sets
- E.g. disks, traces in P2P setting
- E.g. the physical world in sensornets
- Constraint Small pipes
- P2P ! cluster IPTPS03
- Sensor nets have extremely limited comm per
battery - Will have long-running dataflow programs! So
- Need to provide streaming, approximate answers
- Need to prioritize delivery a la juggle
- Need to decide adaptively on how to sample
physical world - Acquisitional Query Processing in TinyDB
related to both Ripple Join and Juggle - Need to consider human factors while the query
runs
59Continuous Queries over Streams
- Hot topic of the moment
- Berkeleys TelegraphCQ, OGI/Wisconsins
NiagaraCQ, MIT/Brown/Brandeis Aurora, Stanfords
STREAM - Again, natural setting for these ideas
- Need pipelining operators for STREAMS
- Interest in approximate answers during bursts
- And prioritized delivery
- Need to consider human factors!
- DB community focused on the next query language
- Also need to focus on interactivity!
- Initiation of a Continuous Query is only half the
battle
60CONTROL Lessons Learned
- Dream about UIs, work on systems
- User needs drive systems design
- For long-running, data-intensive apps especially
- Systems and statistics intertwine
- Adaptive systems
- All 3 go together naturally
- User desires and behavior 2 more things to
model, predict - Performance metrics need to reflect key user
needs - What unlike things must meet and mate
- -- Art, Herman Melville
61Related Work on Online QP
- Morgensteins PhD, Berkeley 80
- Online Association Rules
- Ng, et als CAP, SIGMOD 98
- Hidbers CARMA, SIGMOD 99
- Implications for deductive DB semantics
- Monotone aggregation in LDL, Zaniolo and Wang
- Online agg with subqueries
- Tan, et al. VLDB 99
- Dynamic Pipeline Scheduling
- Urhan/Franklin VLDB 01
- Pipelining Hash Joins
- Su/Raschid, Wilschut/Apers, Tukwila, Xjoin
- Relation to semi-naive evaluation
- Anytime Algorithms
- Zilberstein, Russell, et al.
62More information?
- http//control.cs.berkeley.edu
- Papers
- Potters Wheel open source
- jmh_at_cs.berkeley.edu
63Backup Slides
64Sampling Design Issues
- Granularity of sample
- Instance-level (row-level) high I/O cost
- Block-level (page-level) high variability from
clustering - Type of sample
- Often simple random sample (SRS)
- Especially for on-the-fly
- With/without replacement usually not critical
- Data structure from which to sample
- Files or relational tables
- Indexes (B trees, etc)
65Row-level Sampling Techniques
- Maintain file in random order
- Sampling scan
- Is file initially in random order?
- Statistical tests needed e.g., Runs test,
Smirnov test - In DB systems cluster via RAND function
- Must freshen ordering (online reorg)
- On-the-fly sampling
- Via index on random column
- Else get random page, then row within page
- Ex extent-map sampling
- Problem variable number of records on page
66Acceptance/Rejection Sampling
- Accept row on page i with probability ni/nMAX
- Commonly used in other settings
- E.g. sampling from joins
- E.g. sampling from indexes
Original pages
Modified pages
67Cost of Row-Level Sampling
- 100,000 pages
- 200 rows/page
68Estimation for Aggregates
- Point estimates
- Easy SUM, COUNT, AVERAGE
- Hard MAX, MIN, quantiles, distinct values
- Confidence intervals a measure of precision
- Two cases single-table and joins
69Confidence Intervals
70The Good and Bad News
- Good news 1/n1/2 magic (n chosen on-the-fly)
- Bad news needle-in-a-haystack problem
71Running confidence intervals (1)
- Confidence parameter p?(0,1) is prespecified
- Display a precision parameter ?n such that
running aggregate Yn is within ? ?n of the final
answer µ with probability approximately equal to
p. Yn- ?n,Yn ?n contains µ with probability
approximately equal to p
72Running confidence intervals (2)
- Three types to construct from n retrieved
records - Conservative confidence intervals based on
Hoeffdings inequality or recent extention of
this inequality, for all n1 - Large-sample confidence intervals based on
central limit theorems (CLTs), for n both small
and large enough - Deterministic confidence intervals contain µ with
probability 1, only for very large n
73Running confidence intervals (3)
- SELECT AVG(exp) FROM R
- v(i) (1 ? i ? m) the value of exp when applied
to tuple i - Li the random index of the ith tuple retrieved
from R - a and b are a priori bounds a ? v(i) ? b for 1 ?
i ? m - Conservative confidence interval equations
74Running confidence intervals (4)
- Large-sample confidence interval equations
- By central limit theorems (CLTs) , Ynapproaches
a normal distribution with a mean (m) and a
variance s2/n as n, the sample size, increases.
s2 can be replaced by the estimator Tn,2(v)
75Estimators for SUM, COUNT and AVG
76Confidence intervals for ripple joins
- Use central limit theorems (CLTs) to compute
large-sample confidence intervals - Fix the problems in classic CLTs with newly
defined ?2 for different aggregate queries
77Ripple optimizationchoosing aspect ratios (1)
- Blocking factor ? is prespecified, we want to
optimize ?ks the aspect-ratio parameters - minimize
- such that
- ?1?2 ?3 ...?K ?K-1?c (decided by animation speed)
- 1 ? ?k ? mk/ ? for 1 ? k ? K
- ?1,?2 ,?3 ,...?K interger
78Precomputation Explicit
- OLAP Data Cubes (drill-down hierarchies)
- MOLAP, ROLAP, HOLAP
- Semantic hierarchies
- APPROXIMATE (Vrbsky, et al.)
- Query Relaxation, e.g. CoBase
- Multiresolution Data Models (Silberschatz/Reed/Fus
sell) - More general materialized views
- See Gupta/Mumicks text
79Precomputation Stat. Summaries
- Histograms
- Originally for aggregation queries, many flavors
- Extended to enumeration queries recently
- Multi-dimensional histograms
- Parametric estimation
- Wavelets and Fractals
- Discrete cosine transform
- Regression
- Curve fitting and splines
- Singular-Value Decomposition (aka LSI, PCA)
- Indexes hierarchical histograms
- Ranking and pseudo-ranking
- Aokis use of GiSTs as estimators for ADTs
- Data Mining
- Clustering, classification, other
multidimensional models
80Precomputed Samples
- Materialized sample views
- Olkens original work
- Chaudhuri et al. join samples
- Statistical inferences complicated over
recycled samples? - Barbarás quasi-cubes
- AQUA join synopses on universal relation
- Maintenance issues
- AQUAs backing samples
- Can use fancier/more efficient sampling
techniques - Stratified sampling or AQUAs congressional
samples - Haas and Swami AFV statistics
- Combine precomputed outliers with on-the-fly
samples