Title: CONTROL
CONTROL
- Continuous Output and Navigation Technology with Refinement On-Line
CONTROL group: Joe Hellerstein, Ron Avnur, Christian Hidber, Bruce Lo, Chris Olston, Vijayshankar Raman, Tali Roth, Kirk Wylie, UC Berkeley
Batch vs. On-Line Processing
- Batch Processing
  - Gives 100% accurate answers, but users must wait for the entire query to finish
- On-Line Processing
  - Gives progressively refining answers as the query runs!
  - Allows users to control processing
- Applications of On-Line Processing
  - Large, ad-hoc queries in domains where approximate answers are acceptable (big picture)
Demo Outline
- On-Line Aggregation
  - Refining estimates
  - Statistics give confidence intervals for the estimates
- User Control
  - The user can speed up the processing of certain groups
  - The user can stop the processing at any time
- On-Line Visualization
  - Displays an approximation of an image based on the data while the data is being fetched
  - Shows the estimated density and distribution of the data
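The "refining estimates" and "statistics give confidence" bullets can be sketched as a running AVG with a large-sample (CLT-based) confidence interval. This is a minimal Python sketch, assuming tuples arrive in random order and using Welford's update for the running variance; the class name `OnlineAverage` and the 1.96 multiplier (roughly 95% confidence) are illustrative choices, not taken from the CONTROL system.

```python
import math
import random
from typing import Tuple


class OnlineAverage:
    """Running AVG estimate with a large-sample confidence interval."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations (Welford's method)

    def add(self, x: float) -> None:
        """Fold one newly delivered value into the estimate."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def interval(self, z: float = 1.96) -> Tuple[float, float]:
        """CLT-based interval mean +/- z * s / sqrt(n); z=1.96 is roughly 95%."""
        if self.n < 2:
            return (-math.inf, math.inf)  # not enough data for a variance yet
        sample_var = self._m2 / (self.n - 1)
        half_width = z * math.sqrt(sample_var / self.n)
        return (self.mean - half_width, self.mean + half_width)


if __name__ == "__main__":
    rng = random.Random(42)
    table = [rng.uniform(0.0, 100.0) for _ in range(100_000)]
    rng.shuffle(table)  # on-line aggregation relies on random delivery order
    est = OnlineAverage()
    for i, value in enumerate(table, start=1):
        est.add(value)
        if i in (10, 100, 1_000, 10_000, 100_000):
            lo, hi = est.interval()
            print(f"{i:>7} rows: AVG ~ {est.mean:6.2f}  95% CI [{lo:6.2f}, {hi:6.2f}]")
```

Breaking out of the loop early corresponds to the user stopping the query: the current `mean` and `interval()` remain a usable, if wider, answer.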
On-Line Aggregation Query Processing
- New Access Methods: deliver data in random order
  - Index Striding: we can take advantage of B-trees to access the groups
  - Heap Striding: more generally, on-line permutation
- Non-blocking Join Algorithms
  - Ripple Join family
  - RIPL: Rectangles of Increasing Perimeter Length
  - Join progressively larger samples of two tables
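The "rectangles of increasing perimeter length" picture can be sketched as the naive, square variant of ripple join: at step k one new tuple is read from each input, and only the new L-shaped strip of the k × k rectangle in R × S is examined. This is a hypothetical minimal Python sketch (the function name `square_ripple_join` and its `predicate` parameter are illustrative); it assumes both inputs are already in random order and, for simplicity, stops when the shorter input runs out.

```python
from typing import Callable, Iterator, Sequence, Tuple, TypeVar

L = TypeVar("L")
R = TypeVar("R")


def square_ripple_join(
    left: Sequence[L],
    right: Sequence[R],
    predicate: Callable[[L, R], bool],
) -> Iterator[Tuple[int, L, R]]:
    """Naive square ripple join; yields (step, l, r) for each matching pair.

    After step k, exactly the pairs in the k x k rectangle left[:k] x right[:k]
    have been examined. For simplicity this stops when the shorter input runs out.
    """
    for k in range(min(len(left), len(right))):
        step = k + 1
        new_l, new_r = left[k], right[k]
        # New left tuple against every right tuple read so far, including new_r.
        for seen_r in right[: k + 1]:
            if predicate(new_l, seen_r):
                yield step, new_l, seen_r
        # New right tuple against the earlier left tuples (new_l was handled above).
        for old_l in left[:k]:
            if predicate(old_l, new_r):
                yield step, old_l, new_r


if __name__ == "__main__":
    orders = [("o1", 1), ("o2", 2), ("o3", 1), ("o4", 3)]
    customers = [(1, "ann"), (3, "bob"), (2, "cy")]
    for step, o, c in square_ripple_join(orders, customers, lambda o, c: o[1] == c[0]):
        print(step, o, c)
```

Each step widens the examined rectangle by one row and one column, so a join result is available after every step rather than only at the end.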
Access Methods for On-Line Aggregation
- Index Stride
  - Round-robin through the groups to get a fair sample
  - Works with an index on the grouping column
- Heap Stride (On-Line Permutation)
  - Reorder tuples on the fly to get a fair sample
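Index striding can be sketched as a round-robin over one cursor per grouping value. In this minimal Python sketch, a dict of per-group lists stands in for an index on the grouping column, and the hypothetical `weights` parameter shows one way the "speed up certain groups" control could be expressed: a group with weight 2 gets two tuples per round. The function name `index_stride` is likewise illustrative.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Iterator, Optional, Tuple


def index_stride(
    groups: Dict[Hashable, Iterable[object]],
    weights: Optional[Dict[Hashable, int]] = None,
) -> Iterator[Tuple[Hashable, object]]:
    """Round-robin across groups, yielding (group, tuple) pairs.

    `groups` stands in for an index on the grouping column: one cursor per
    group value. Each round takes weights[g] tuples from group g (default 1);
    a group drops out of the rotation once its cursor is exhausted.
    """
    weights = weights or {}
    rotation = deque((g, iter(tuples)) for g, tuples in groups.items())
    while rotation:
        group, cursor = rotation.popleft()
        exhausted = False
        for _ in range(max(1, weights.get(group, 1))):
            try:
                item = next(cursor)
            except StopIteration:
                exhausted = True
                break
            yield group, item
        if not exhausted:
            rotation.append((group, cursor))


if __name__ == "__main__":
    by_state = {"CA": ["c1", "c2", "c3", "c4"], "NV": ["n1", "n2"], "OR": ["o1"]}
    # Default: every group advances at the same rate.
    print(list(index_stride(by_state)))
    # The user "speeds up" CA: two CA tuples per round.
    print(list(index_stride(by_state, weights={"CA": 2})))
```

Because small groups get a tuple in every round, their estimates start refining immediately instead of waiting behind large groups, which is the fairness the slide refers to.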
Multi-Table On-Line Aggregation
- Progressively refining join: Ripple Join
  - Ever-larger rectangles in R × S
  - Comes in naive, block, and hash flavors
- Benefits
  - Samples from both relations simultaneously
  - Gives tighter statistical confidence intervals much faster
  - Intimate relationship between delivery and estimation
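The hash flavor of ripple join, and the link between delivery and estimation, can be sketched together: each new tuple probes the other input's hash table and is then added to its own, so every matching pair is found exactly once, and the match count over the sampled n_L × n_R rectangle is scaled by |L|·|R| / (n_L·n_R) to estimate COUNT(*) of the full join. This is a minimal Python sketch, assuming an equijoin and randomly ordered inputs; `hash_ripple_count` and its key-function parameters are hypothetical names.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Iterator, Sequence, Tuple


def hash_ripple_count(
    left: Sequence[object],
    right: Sequence[object],
    left_key: Callable[[object], Hashable],
    right_key: Callable[[object], Hashable],
) -> Iterator[Tuple[int, int, float]]:
    """Yield (n_left, n_right, estimated join COUNT(*)) after each tuple read.

    Inputs are read alternately, one tuple at a time. Each new tuple probes the
    other side's hash table, then is inserted into its own side's table, so
    every matching pair is counted exactly once, when its later tuple arrives.
    """
    inputs = (left, right)
    keys = (left_key, right_key)
    tables: Tuple[dict, dict] = (defaultdict(list), defaultdict(list))
    read = [0, 0]
    full_size = len(left) * len(right)
    matches = 0
    side = 0
    while read[0] < len(left) or read[1] < len(right):
        if read[side] >= len(inputs[side]):
            side = 1 - side  # this input is exhausted; keep reading the other
        tup = inputs[side][read[side]]
        read[side] += 1
        k = keys[side](tup)
        matches += len(tables[1 - side].get(k, ()))  # probe the other side
        tables[side][k].append(tup)  # then build into this side
        if read[0] and read[1]:
            # Scale matches in the n_L x n_R sample rectangle up to |L| x |R|.
            yield read[0], read[1], matches * full_size / (read[0] * read[1])
        side = 1 - side


if __name__ == "__main__":
    rng = random.Random(7)
    orders = [rng.randrange(50) for _ in range(2_000)]  # join-key values only
    lineitems = [rng.randrange(50) for _ in range(3_000)]
    exact = sum(orders.count(v) * lineitems.count(v) for v in set(orders))
    for n_l, n_r, est in hash_ripple_count(orders, lineitems, lambda t: t, lambda t: t):
        if (n_l + n_r) % 1_000 == 0:
            print(f"read {n_l}+{n_r} tuples: COUNT(*) ~ {est:,.0f} (exact {exact:,})")
```

Once both inputs are fully read the scale factor is 1 and the estimate equals the exact join size; before that, it improves as the rectangle grows on both sides at once.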
On-Line Aggregation User Interface
[Screenshot: estimates for each group, user controls, and a graph of estimates with confidence intervals]
On-Line Visualization: CLOUDS
- CLOUDS displays an approximation of an image based on the data while the data is being fetched
[Figure panels: Conventional Algorithm; CLOUDS Algorithm; CLOUDS (with Index)]
Note that CLOUDS predicts the high density of cities in the Midwest.
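The general idea of showing an estimated density while data is still arriving can be sketched with a grid of per-cell counts scaled up by the fraction of data seen so far. This is not the CLOUDS algorithm, whose details are not given here; it is a hypothetical minimal Python sketch (the name `progressive_density` and its parameters are illustrative), assuming randomly ordered points, known bounds, and a known total number of points.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Grid = List[List[float]]


def progressive_density(
    points: Iterable[Tuple[float, float]],
    total: int,
    bounds: Tuple[float, float, float, float],
    grid_w: int,
    grid_h: int,
    report_every: int,
) -> Iterator[Tuple[int, Grid]]:
    """Yield (n_seen, estimated_counts) snapshots as points stream in.

    Each cell's observed count is scaled by total / n_seen, which estimates the
    cell's final count if points arrive in random order.
    """
    x0, y0, x1, y1 = bounds
    counts = [[0] * grid_w for _ in range(grid_h)]
    seen = 0
    for x, y in points:
        # Map the point to a cell, clamping to the grid edges.
        col = min(max(int((x - x0) / (x1 - x0) * grid_w), 0), grid_w - 1)
        row = min(max(int((y - y0) / (y1 - y0) * grid_h), 0), grid_h - 1)
        counts[row][col] += 1
        seen += 1
        if seen % report_every == 0 or seen == total:
            scale = total / seen
            yield seen, [[c * scale for c in r] for r in counts]


if __name__ == "__main__":
    rng = random.Random(3)
    # Hypothetical data: a dense cluster near (0.3, 0.6) plus uniform background.
    pts = [(rng.gauss(0.3, 0.05), rng.gauss(0.6, 0.05)) for _ in range(8_000)]
    pts += [(rng.random(), rng.random()) for _ in range(2_000)]
    rng.shuffle(pts)
    for n, grid in progressive_density(pts, len(pts), (0.0, 0.0, 1.0, 1.0), 4, 4, 2_500):
        print(n, [[round(c) for c in row] for row in grid])
```

The dense cell stands out in the first snapshot, long before all points are read, which is the kind of early prediction the figure note describes.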
Quantifying the Benefit of CLOUDS
CLOUDS gives a better approximate image faster than the conventional algorithm.
[Plot: error vs. time (seconds) for the Conventional and CLOUDS algorithms]