Title: Concise Description of Structured Sets
1Concise Description of Structured Sets
- Alberto O. Mendelzon
- Ken Q. Pu
- University of Toronto
2Motivation A true story
- An OLAP report writing software
GUI Range Selection
SQL Translation
RDBMS
Report Writer / Plotting
3The GUI
User selects the region of interest via
GUI CTRL-A selects ALL, Clickn-drag selects
LOTS.
4An Engineering Problem and Solution
- The Problem
- SQL statement has a length limit.
- User selection is LARGE, especially when the
visualization is plotting. - The Solution
- Break a large selection to a collection of SQL
queries. - The Complaint
- Kind of slow!
5A formalization
- A tree-structured categorical set (dimensions)
- A selected subset (user selection)
- Find the shortest possible description of the
selected subset (translated SQL query)
6Outline
- Formal definition of structured sets, expressions
and the Minimal Descriptive Length (MDL) problem - Partition structures
- Hierarchical structures
- Multidimensional product structures
7The structured set
- A finite universe U
- A finite alphabet ?
- An interpretation function ? ? ? P(U)
- E.g. a 1, 2, 3 1 1
- U/? is the set cover naturally induced by ? on U.
8Expressions
- A language L
- An evaluation function ? L ? P(U)
- The Propositional Language L (U, ?) L ? L
L L L L L - The L (U, ?) L ? L L
- is union, is difference, is
intersection.
9The L -MDL Problem
- Let V ? U, its descriptive lengthVL min
s s V and s ? L - The L -MDL decision problem VL ? k ?
- The L -MDL problemFind a (non-unique) compact
(shortest) expression for V in L.
10Complexity Issues
- The L (U, ?)-MDL problem is NP-complete.
- The L-MDL problem is NP-complete. reduction
from set-cover
Back to Engineering
- Is there anything good we can do?
11Tractable MDL-Problems
- Strategy reduce the complexity of the MDL
problem by restricting to a class of structures. - Two tractable cases
- Partition ? ?1 ? U, U/?1 is a partition
- Hierarchy ? ?N ? ?N-1 ? ?1 ? U, and U/?i ?
U/?i1 for all i gt 0.
12Partition
Ottawa
San Jose
Toronto
Windsor
San Francisco
San Diego
Kingston
U Ottawa, Toronto, Windsor, Kingston
San Jose, San Francisco, San Diego
?1 Ontario, California
13Efficient Symbols
- Efficient symbols for V ?(V) ? ??1 ?
? V ? ? V 1
- V Ottawa, Toronto, Kingston, San Francisco
?(V) Ontario - California is not efficient.
14Normal Form for Partitions
- N is a sub-language of L t (?1 ?2 ?3
) (a1 a2 a3 ) (b1 b2 b3 ) - t is N -compact for V if
- S ?(V)
- A V S
- B S V
S
A
B
15L-MDL for Partitions
- Every expression in L can be reduced to an
equivalent expression in N ! - To compute a compact expression for V
- ? (V) ?(V)
- ?(V) (V ? (V) , ? (V) V )
16An Example
Ottawa
San Jose
Toronto
Windsor
San Francisco
San Diego
Kingston
- V Ottawa, Toronto, Kingston, San Francisco
- ? (V) Ontario , ?(V) (San Francisco,
Windsor) - t Ontario San Francisco Windsor
17A Normal Form for Hierarchy
- Levels (?0U) ? ?1 ? ?2 ? ?3 ? ?N
- A recursive definition for N The expression t
is normal w.r.t level i ift t A B, with
A, B ? ?i and t is normal w.r.t. level (i1). - Examplet (( Canada California Ontario )
Toronto San Jose)
t
A
B
18L-MDL for Hierarchies
- Every expression in L can be reduced to an
equivalent expression in N ! - t t A B is a N -compact expression for V
if - t 1 ?1(V) and is N -compact
- (A, B) ?(V)
19An iterative algorithm
V
( VN AN - BN )
( s3 A3 - B3 )
( s1 A1 - B1 ) is compact
20Back to Engineering
21Multiple dimensions
- What about multi-dimensional sets
- V ? U1 ? U2
- U1 and U2 are hierarchically structured sets
- U1 has ? ?0 ? ?1 ? ?2 ?
- U2 has ? ?0 ? ?1 ? ?2 ?
- Form a new (product) structured set
- U U1 ? U2
- ? ? ? ?
- Interpretation
.
22Example
U1 -- Time Jan, Feb, Mar U2 --
Geography Ontario, California
Jan ? Ontario (Jan, Ontario)
L-MDL for product structure is NP-complete.
23Reduction by example
- Reduction from 3-set cover
24Related Work
- The generalized MDL approach for summarization,
Lakshmanan, Ng, Wang, Zhou, Johnson, VLDB, 1999 - limited expressiveness for multidimensional data
sets optimal expression is computed in
polynomial time - Multidimensional MDL problem resembles
rectilinear polygon packing and covering problem - Covering rectilinear polygons with axis-parallel
rectangles, Kumar, Ramesh, ACM STOC, 1999
25Whats next
?
- More complex structures -- ordering
- More complex operators -- range operator
- Conjectured and known NP-hardness of the MDL
problems - L (?, ) is NP-complete , polynomial if ? is
total.
26The End.