Title: Compressing Query Results for Mobile Clients
1Compressing Query Results for Mobile Clients
- Zhiyuan Chen and Praveen Seshadri
- Cornell University
2Motivation(1)
Results
Database Server
Slow network!!!
9.6 - 170 kbps, 1 - 10 minutes for 1 MB
Compress results on the server to save network
bandwidth.
Concern compress as much as possible.
3Motivation(2)
Database Server
Results
- I am a PDA, I cant store that much for later
use! - Often work offline. Need to store results for
later use. - Severe storage constraints.
- Usually only a small portion of results will be
accessed, random access possible.
Store the result compressed, decompress on demand.
Both space decompression cost matters!
4Why not just GZip or WinZip?
- Small decompression unit for PDA
- Utilize information of the query result
- Choose a combination of compression methods based
on semantic and statistical information of the
result. - Because different attributes have different
characteristics, there is no unique winner. - The choice is made by cost-based optimization.
reduce decompression cost
More compression
5An Example
- Query
- Select Year, Month, Day, Ticker, Low, High
- From Quotes where Year is between 1998 and 1999
- Ordered by Year, Month, Day
Semantic compression
Universal compression
Ziv-Lempel on each field
6Outline
- Related work
- An algebraic framework
- To represent the valid combinations.
- Compression optimization
- To choose the best combination.
- Experiments
- Chosen combinations beat universal compression
tools like WinZip.
7Related Work
- Compression community
- Different compression methods. Run-length
encoding, Ziv-Lempel, differential, etc. - Database community compress tables or indices
- Specific compression methods
- Iyer Wilhite - tuple level Ziv-Lempel.
- Goldstein etc.- per-page offset-encoding.
- Ng Ravishankar - tuple differential coding.
- Theodore Johnson - compressing bitmap indices.
- Impact on query processing
- (RothHorn, Graefe Shapiro, Theodore)
8Compression Framework
- Compressed results as a compressed table.
- A compression method as a compression operator
Input table - output table. - A combination of compressing methods as a
compression plan - Decompression operators and plans.
9Compressed Table
- Field and tuple boundaries may blur. - Compressed
data blocks - Extra information enables decompression. -
Compression schema
A compressed block a value compressed from some
cells in the uncompressed table. Also the unit of
decompression.
A compression schema includes Things
compressed to a compressed block, compression
method, relational schema.
10Compression Operator
- Defines how a compression method is applied on
what part of the input table. - The fields the operator is applied on.
- What input blocks will be compressed together.
- The compression method and information used in
compression.
- Applied on Customer ID
- Same Customer IDs compressed together.
- Method grouping.
11Compression Plan
A sequence of compression operators applied on
the original result table, each takes the output
of the previous operator as the input.
12Optimization - Cost Model
- Formula
- Cost w1 compression cost
- w2 decompression cost
- - w3 saving on network transfer
- - w4 saving on client side storage.
- Adjust weight based on goal of compression.
- Compression cost
- CPU speed, compression plans, results size.
- Decompression cost
- Client processor speed, access pattern, etc.
Provided by clients! - Network transfer saving - compressed results
size. - Client side storage saving - If decompress on
demand.
13Searching
- Naïve algorithm has exponential search space.
- Heuristics
- Consider semantic compression first.
- For each field, use the naïve algorithm to find
the best plan compressing this field only. - Combine these plans.
- Consider universal compression methods.
- Add Ziv-Lempel applied on each field only if this
will reduce the overall cost. - Not complete but polynomial search space
- O(of fields of valid plans on each field)
14Experiments
- Data
- TPCD
- Queries
- Adapted from TPCD queries by deleting aggregates.
- Experiment 1 To save network bandwidth.
- Measure the overall end-to-end time.
- Compression time transfer time decompression
time. - Experiment 2 To save PDA clients storage.
- Measure the space and random access time.
- Compare Common tools v.s. chosen combinations.
15- Semantic compression - allow decompress
individual attribute values. (S) - WinZip on the whole table. (W)
- WinZip applied on each field. (PW)
- Semantic compression WinZip applied on each
field. (SPW)
Un/compressed S 3.1 W 3.4 PW 4.4 SPW6.5
Modem
Wireless
Internet
16- Compression plans
- Semantic compression (S) - allow to decompress
an individual tuple. - Windows CEs default compressor. (D)
- Ziv-Lempel applied on each page. (Z)
- Result size 3.7 MB. 2 MB Data storage, 2MB
program storage. 50(1 for semantic compression)
KB program size.
Storage usage and time to randomly access 1000
tuples
17Summary
- A combination of compression methods based on
semantic and statistical information of the
result. - Choice made by cost-based optimization.
- A framework to model combinations of compression
methods. - Future work
- Apply the methodology to compress data tables.
- Joint optimization of result compression query
processing.