Title: Rada Chirkova
1Materializing ViewsWith Minimal SizeTo Answer
Queries
- Rada Chirkova
- (North Carolina State University)
- and Chen Li
- (University of California, Irvine)
2Materializing Minimal-Size Views
- Context relational databases
- The problem minimize the amount of data required
to answer queries, by - automatically designing new relations (views),
and - precomputing and storing (materializing) the new
relations - Central issue inventing new views to materialize
- Applications include
- Mediators in data-integration systems
- Database as a service in enterprise computing
2
3Example Modified TPC-H Query
- Q(name,o_date,priority,comment,o_key,quantity,
shipmode) - - customer(c_key,name,building),
- order(o_key,c_key,o_date,priority,comment),
- lineitem(lineno,o_key,quantity,shipmode).
- V1(name,o_date,priority,comment,o_key) -
- customer(c_key,name,building),
- order(o_key,c_key,o_date,priority,comment),
- lineitem(lineno,o_key,quantity,shipmode).
- V2(o_key,quantity,shipmode) -
- customer(c_key,name,building),
- order(o_key,c_key,o_date,priority,comment),
- lineitem(lineno,o_key,quantity,shipmode).
3
4Partial Answer to the Query Q
Name O_Date Priority Comment O_Key Quantity
Shipmode
Tom 3/14/95 0 close 134721
26 REG AIR
Tom 3/14/95 0 close 134721
75 REG AIR
Tom 3/14/95 0 close 134721
43 AIR
Jack 12/21/94 0 final
571683 43 MAIL
Jack 12/21/94 0 final
571683 33 AIR
4
5Minimal-Size Views for the Query Q
- Q(name,o_date,priority,comment,o_key,quantity,
shipmode) - - customer(c_key,name,building),
- order(o_key,c_key,o_date,priority,comment),
- lineitem(lineno,o_key,quantity,shipmode).
- V1(name,o_date,priority,comment,o_key) -
- customer(c_key,name,building),
- order(o_key,c_key,o_date,priority,comment),
- lineitem(lineno,o_key,quantity,shipmode).
- V2(o_key,quantity,shipmode) -
- customer(c_key,name,building),
- order(o_key,c_key,o_date,priority,comment),
- lineitem(lineno,o_key,quantity,shipmode).
5
6Questions
- How do we know that views V1 and V2 are
minimal-size views for the query Q? On what
databases? - How to find a set of minimal-size views, given a
set of queries and a database - Is the problem decidable? For what inputs?
- What is the complexity of the problem?
- Are there good efficient algorithms for finding
minimal-size views?
6
7Preliminaries
- Two queries are equivalent if they return the
same answers on any database. - An equivalent rewriting of a query Q in terms of
views V is a query that - is defined using the relations in V only, and
- is equivalent to Q
- A conjunctive query (view) can be defined using
only equality selections, projections, and joins - A disjunctive query (view) can be defined as a
union of a finite number of conjunctive queries
(views)
7
8Problem Specification
- Input
- Database instance D with schema R
- Workload Q of queries on D
- Output (optimal solution) a set V of views, such
that - each query in Q has an equivalent rewriting in
terms of V, and - the total size of the views, SVi ÃŽ V size(Vi),
is minimal on D
8
9Assumptions
- Single database instance
- Set semantics
- Finite query workloads
- Conjunctive queries
- Disjunctive views and rewritings
9
10Main Results
- Decidability and upper bounds on the complexity
of the problem - Relationship between
- a restriction on the language of the queries,
and - the language of optimal views
- Dynamic-programming algorithm for finding an
optimal solution for conjunctive queries
(restricted case)
10
11Conjunctive Views and Rewritings
- Theorem. Given a query workload Q and a database
D. - It is possible to construct a finite search
space of views that includes all views in all
optimal solutions for Q on D. - The number of views in the search space is at
most doubly-exponential in the size of the input
query workload Q. - Corollary. The problem of finding a minimal-size
conjunctive viewset is decidable for finite
workloads of conjunctive queries, assuming all
rewritings are conjunctive.
11
12Self-Joins in Queries
- Q1(X,Y) - p(X,Z), p(Z,T), s(Z,Y). // self-join
- Q2(X,Y) - p(X,Z), r(Z,T), s(Z,Y). // no
self-joins - Result 1. For some databases and queries, there
is a set of disjunctive views that is better
than any conjunctive solution. - Example for a single query with self-joins
- Result 2. The problem of finding an optimal
solution in the space of disjunctive views
is decidable, assuming conjunctive rewritings. - Result 3. It is not necessary to consider
disjunctive rewritings. - Result 4. The size of the search space of views
is at most triply-exponential in the size of the
input query workload.
12
13Queries Without Self-Joins The
Problem Is in NP
13
14Queries Without Self-Joins The
Problem Is in NP
disjunctive views
13
15Queries Without Self-Joins The
Problem Is in NP
disjunctive views
conjunctive views
13
16Queries Without Self-Joins The
Problem Is in NP
disjunctive views
conjunctive views
subexpression views
13
17Queries Without Self-Joins The
Problem Is in NP
disjunctive views
conjunctive views
subexpression views
full-reducer views
13
181. Conjunctive Views Are Enough
- Theorem. Given a database D and a set of queries
Q without self-joins. - Suppose a set V of disjunctive views is a
solution for (D,Q). - Then there exists another solution V for (D,Q),
such that - all views in V are conjunctive, and
- size (V) size (V).
- Corollary. For any database and any set of
queries without self-joins, - some optimal disjunctive solution is a set of
conjunctive views.
14
19What We Have Shown
disjunctive views
conjunctive views
15
20Idea of the Proof
- Given Q() - S1(), S2(), , Sn()
- rewriting P of Q that uses V
- V V1 È V2 È È Vt
- Then there exists
- V V1 È V2 È È Vt
- such that
- for some mapping m, each Vi is an image of Vi,
and - each Vi alone can replace any Vj in the
rewriting of Q
16
21Details of the Proof (1)
- P º Q, P P1 È P2 È ... È Ps
- There exists a conjunctive query Pi Pi º Q
- Pi () - Vi1(), , Vij(), , Vim(), G().
- Fix any Vij in Pi consider, in P,
- Pr () - Vij(), , Vij(), , Vij(),
G(). - Because Pr is contained in Q,
- there exists a mapping b from Q to
the expansion of Pr - We can always change b, to redirect all
subgoals of Q that map into subgoals of Vij
in Pr
17
22Details of the Proof (1)
- P º Q, P P1 È P2 È ... È Ps
- There exists a conjunctive query Pi Pi º Q
- Pi () - Vi1(), , Vij(), , Vim(), G().
- Fix any Vij in Pi consider, in P,
- Pr () - Vij(), , Vij(), , Vij(),
G(). - Because Pr is contained in Q,
- there exists a mapping b from Q to
the expansion of Pr - We can always change b, to redirect all
subgoals of Q that map into subgoals of Vij
in Pr
17
23Details of the Proof (1)
- P º Q, P P1 È P2 È ... È Ps
- There exists a conjunctive query Pi Pi º Q
- Pi () - Vi1(), , Vij(), , Vim(), G().
- Fix any Vij in Pi consider, in P,
- Pr () - Vij(), , Vij(), , Vij(),
G(). - Because Pr is contained in Q,
- there exists a mapping b from Q to
the expansion of Pr - We can always change b, to redirect all
subgoals of Q that map into subgoals of Vij
in Pr
17
24Details of the Proof (1)
- P º Q, P P1 È P2 È ... È Ps
- There exists a conjunctive query Pi Pi º Q
- Pi () - Vi1(), , Vij(), , Vim(), G().
- Fix any Vij in Pi consider, in P,
- Pr () - Vij(), , Vij(), , Vij(),
G(). - Because Pr is contained in Q,
- there exists a mapping b from Q to
the expansion of Pr - We can always change b, to redirect all
subgoals of Q that map into subgoals of Vij
in Pr
17
25Details of the Proof (2)
- We can always change b, to redirect all
subgoals of Q that map into subgoals of more
than one Vij in Pr - Then, we can replace Pr with Pr
- Pr() - Vij(), , Vij(), , Vij(), G().
- Pr()- Vij(), G().
- And Pr º Q
18
26Details of the Proof (3)
- Changing b, to redirect all subgoals of Q
that map into subgoals of Vij in Pr - Q() - , Sk(,W,),
- Prexp() - , Sk(,Y,), , Sk(,Y,),
- Pr() - Vij(), Vij(), , Vij(), G()
19
27Details of the Proof (3)
- Changing b, to redirect all subgoals of Q
that map into subgoals of Vij in Pr - Q() - , Sk(,W,),
- Prexp() - , Sk(,Y,), , Sk(,Y,),
- Pr() - Vij(), Vij(), , Vij(), G()
19
28Details of the Proof (3)
- Changing b, to redirect all subgoals of Q
that map into subgoals of Vij in Pr - Q() - , Sk(,W,),
- Prexp() - , Sk(,Y,), , Sk(,Y,),
- Pr() - Vij(), Vij(), , Vij(), G()
b
19
29Details of the Proof (3)
- Changing b, to redirect all subgoals of Q
that map into subgoals of Vij in Pr - Q() - , Sk(,W,),
- Prexp() - , Sk(,Y,), , Sk(,Y,),
- Pr() - Vij(), Vij(), , Vij(), G()
b
19
30Details of the Proof (3)
- Changing b, to redirect all subgoals of Q
that map into subgoals of Vij in Pr - Q() - , Sk(,W,),
- Prexp() - , Sk(,Y,), , Sk(,Y,),
- Pr() - Vij(), Vij(), , Vij(), G()
b
19
31Details of the Proof (3)
- Changing b, to redirect all subgoals of Q
that map into subgoals of Vij in Pr - Q() - , Sk(,W,),
- Prexp() - , Sk(,Y,), , Sk(,Y,),
- Pr() - Vij(), Vij(), , Vij(), G()
b
19
32Details of the Proof (3)
- Changing b, to redirect all subgoals of Q
that map into subgoals of Vij in Pr - Q() - , Sk(,W,),
- Prexp() - , Sk(,Y,), , Sk(,Y,),
- Pr() - Vij(), Vij(), , Vij(), G()
b
b
19
33Details of the Proof (4)
- Thus, we can replace Pr with Pr
- Pr() - Vij(), , Vij(), , Vij(), G().
- Pr()- Vij(), G().
- And Pr º Q
20
342. Subexpression Views Are Enough
- Theorem. Given a database D and a set of queries
Q without self-joins. - Suppose a set V of disjunctive views is a
solution for (D,Q). - Then there exists another solution V for (D,Q),
such that - all views in V are conjunctive
subexpression-type, and - size (V) size (V).
- Corollary. For any database and set of queries
without self-joins, - some optimal disjunctive solution is a set of
conjunctive subexpression-type views. - The size of the search space of views is at most
singly-exponential in the size of the input
query workload
21
353. Full-Reducer Views Are Enough
- A view V is a full-reducer view for a query Q if
V and Q have the same body. - Theorem. Given a database D and a single query Q
without self-joins. - Suppose a set V of disjunctive views is a
solution for (D,Q). - Then there exists another solution V for (D,Q),
such that - all views in V are conjunctive full-reducer
views for Q, and - size (V) size (V).
- Corollary. For any database and any query without
self-joins, some optimal disjunctive solution
is a set of conjunctive full-reducer views.
22
36Using Full-Reducer Views To Rewrite
Sets of Queries
- For query workloads with more than one query, we
can merge optimal full-reducer views for
individual queries in the workload - - and the number of subgoals in the merged views
never exceeds the number of subgoals in
full-reducer views.
23
37What We Have Shown
disjunctive views
conjunctive views
subexpression views
full-reducer views
24
38The Problem Is in NP
- Theorem. Given a database instance, for any
finite workload of conjunctive queries without
self-joins, - the problem of finding a minimal-size disjunctive
viewset is in NP.
25
39Generating Minimal-Size Views
- Input a conjunctive query without self-joins and
a database - Output a minimal-size disjunctive viewset for
the query on the database - Method produce a minimal-size set of conjunctive
full-reducer views, - by doing exhaustive search in the space of the
views - using a dynamic-programming algorithm (cf. query
optimization in System R) - The algorithm returns an optimal solution
- Can be modified to work for non-singleton query
workloads
26
40Heuristics for Generating Views
- Consider only those views that cover up to a
fixed number of subgoals of the query - Consider only those views that have up to a fixed
number of head attributes - Apply the algorithm separately to several subsets
of subgoals of the query, then combine the
solutions
27
41Main Results
- Decidability and upper bounds on the complexity
of the problem - Relationship between
- a restriction on the language of the queries,
and - the language of optimal views
- Dynamic-programming algorithm for finding an
optimal solution for conjunctive queries
(restricted case)
28
42Some Directions of Future Work
- Rewriting queries in more expressive languages
- built-in predicates
- disjunctive queries
-
- Using more expressive languages of views and
rewritings - Maximally-contained rewritings of queries in
terms of views
29
43Reference
- Jia Li, Rada Chirkova, and Chen Li.
- Minimizing Data-Communication Costs by
Decomposing Query Results in Client-Server
Environments. - UCI ICS Technical Report, 2003.
- http//www-db.ics.uci.edu/pages/raccoon/
30