Rada Chirkova - PowerPoint PPT Presentation

About This Presentation
Title:

Rada Chirkova

Description:

(North Carolina State University) and Chen Li (University of California, Irvine) ... change b, to redirect all subgoals of Q that map into subgoals of Vij in Pr ... – PowerPoint PPT presentation

Number of Views:26
Avg rating:3.0/5.0
Slides: 44
Provided by: Pooh6
Learn more at: https://ics.uci.edu
Category:

less

Transcript and Presenter's Notes

Title: Rada Chirkova


1
Materializing ViewsWith Minimal SizeTo Answer
Queries
  • Rada Chirkova
  • (North Carolina State University)
  • and Chen Li
  • (University of California, Irvine)

2
Materializing Minimal-Size Views
  • Context relational databases
  • The problem minimize the amount of data required
    to answer queries, by
  • automatically designing new relations (views),
    and
  • precomputing and storing (materializing) the new
    relations
  • Central issue inventing new views to materialize
  • Applications include
  • Mediators in data-integration systems
  • Database as a service in enterprise computing

2
3
Example Modified TPC-H Query
  • Q(name,o_date,priority,comment,o_key,quantity,
    shipmode) -
  • customer(c_key,name,building),
  • order(o_key,c_key,o_date,priority,comment),
  • lineitem(lineno,o_key,quantity,shipmode).
  • V1(name,o_date,priority,comment,o_key) -
  • customer(c_key,name,building),
  • order(o_key,c_key,o_date,priority,comment),
  • lineitem(lineno,o_key,quantity,shipmode).
  • V2(o_key,quantity,shipmode) -
  • customer(c_key,name,building),
  • order(o_key,c_key,o_date,priority,comment),
  • lineitem(lineno,o_key,quantity,shipmode).

3
4
Partial Answer to the Query Q
Name O_Date Priority Comment O_Key Quantity
Shipmode
Tom 3/14/95 0 close 134721
26 REG AIR
Tom 3/14/95 0 close 134721
75 REG AIR
Tom 3/14/95 0 close 134721
43 AIR
Jack 12/21/94 0 final
571683 43 MAIL
Jack 12/21/94 0 final
571683 33 AIR
4
5
Minimal-Size Views for the Query Q
  • Q(name,o_date,priority,comment,o_key,quantity,
    shipmode) -
  • customer(c_key,name,building),
  • order(o_key,c_key,o_date,priority,comment),
  • lineitem(lineno,o_key,quantity,shipmode).
  • V1(name,o_date,priority,comment,o_key) -
  • customer(c_key,name,building),
  • order(o_key,c_key,o_date,priority,comment),
  • lineitem(lineno,o_key,quantity,shipmode).
  • V2(o_key,quantity,shipmode) -
  • customer(c_key,name,building),
  • order(o_key,c_key,o_date,priority,comment),
  • lineitem(lineno,o_key,quantity,shipmode).

5
6
Questions
  • How do we know that views V1 and V2 are
    minimal-size views for the query Q? On what
    databases?
  • How to find a set of minimal-size views, given a
    set of queries and a database
  • Is the problem decidable? For what inputs?
  • What is the complexity of the problem?
  • Are there good efficient algorithms for finding
    minimal-size views?

6
7
Preliminaries
  • Two queries are equivalent if they return the
    same answers on any database.
  • An equivalent rewriting of a query Q in terms of
    views V is a query that
  • is defined using the relations in V only, and
  • is equivalent to Q
  • A conjunctive query (view) can be defined using
    only equality selections, projections, and joins
  • A disjunctive query (view) can be defined as a
    union of a finite number of conjunctive queries
    (views)

7
8
Problem Specification
  • Input
  • Database instance D with schema R
  • Workload Q of queries on D
  • Output (optimal solution) a set V of views, such
    that
  • each query in Q has an equivalent rewriting in
    terms of V, and
  • the total size of the views, SVi ÃŽ V size(Vi),
    is minimal on D

8
9
Assumptions
  • Single database instance
  • Set semantics
  • Finite query workloads
  • Conjunctive queries
  • Disjunctive views and rewritings

9
10
Main Results
  • Decidability and upper bounds on the complexity
    of the problem
  • Relationship between
  • a restriction on the language of the queries,
    and
  • the language of optimal views
  • Dynamic-programming algorithm for finding an
    optimal solution for conjunctive queries
    (restricted case)

10
11
Conjunctive Views and Rewritings
  • Theorem. Given a query workload Q and a database
    D.
  • It is possible to construct a finite search
    space of views that includes all views in all
    optimal solutions for Q on D.
  • The number of views in the search space is at
    most doubly-exponential in the size of the input
    query workload Q.
  • Corollary. The problem of finding a minimal-size
    conjunctive viewset is decidable for finite
    workloads of conjunctive queries, assuming all
    rewritings are conjunctive.

11
12
Self-Joins in Queries
  • Q1(X,Y) - p(X,Z), p(Z,T), s(Z,Y). // self-join
  • Q2(X,Y) - p(X,Z), r(Z,T), s(Z,Y). // no
    self-joins
  • Result 1. For some databases and queries, there
    is a set of disjunctive views that is better
    than any conjunctive solution.
  • Example for a single query with self-joins
  • Result 2. The problem of finding an optimal
    solution in the space of disjunctive views
    is decidable, assuming conjunctive rewritings.
  • Result 3. It is not necessary to consider
    disjunctive rewritings.
  • Result 4. The size of the search space of views
    is at most triply-exponential in the size of the
    input query workload.

12
13
Queries Without Self-Joins The
Problem Is in NP
13
14
Queries Without Self-Joins The
Problem Is in NP
disjunctive views
13
15
Queries Without Self-Joins The
Problem Is in NP
disjunctive views
conjunctive views
13
16
Queries Without Self-Joins The
Problem Is in NP
disjunctive views
conjunctive views
subexpression views
13
17
Queries Without Self-Joins The
Problem Is in NP
disjunctive views
conjunctive views
subexpression views
full-reducer views
13
18
1. Conjunctive Views Are Enough
  • Theorem. Given a database D and a set of queries
    Q without self-joins.
  • Suppose a set V of disjunctive views is a
    solution for (D,Q).
  • Then there exists another solution V for (D,Q),
    such that
  • all views in V are conjunctive, and
  • size (V) size (V).
  • Corollary. For any database and any set of
    queries without self-joins,
  • some optimal disjunctive solution is a set of
    conjunctive views.

14
19
What We Have Shown
disjunctive views
conjunctive views
15
20
Idea of the Proof
  • Given Q() - S1(), S2(), , Sn()
  • rewriting P of Q that uses V
  • V V1 È V2 È È Vt
  • Then there exists
  • V V1 È V2 È È Vt
  • such that
  • for some mapping m, each Vi is an image of Vi,
    and
  • each Vi alone can replace any Vj in the
    rewriting of Q

16
21
Details of the Proof (1)
  • P º Q, P P1 È P2 È ... È Ps
  • There exists a conjunctive query Pi Pi º Q
  • Pi () - Vi1(), , Vij(), , Vim(), G().
  • Fix any Vij in Pi consider, in P,
  • Pr () - Vij(), , Vij(), , Vij(),
    G().
  • Because Pr is contained in Q,
  • there exists a mapping b from Q to
    the expansion of Pr
  • We can always change b, to redirect all
    subgoals of Q that map into subgoals of Vij
    in Pr

17
22
Details of the Proof (1)
  • P º Q, P P1 È P2 È ... È Ps
  • There exists a conjunctive query Pi Pi º Q
  • Pi () - Vi1(), , Vij(), , Vim(), G().
  • Fix any Vij in Pi consider, in P,
  • Pr () - Vij(), , Vij(), , Vij(),
    G().
  • Because Pr is contained in Q,
  • there exists a mapping b from Q to
    the expansion of Pr
  • We can always change b, to redirect all
    subgoals of Q that map into subgoals of Vij
    in Pr

17
23
Details of the Proof (1)
  • P º Q, P P1 È P2 È ... È Ps
  • There exists a conjunctive query Pi Pi º Q
  • Pi () - Vi1(), , Vij(), , Vim(), G().
  • Fix any Vij in Pi consider, in P,
  • Pr () - Vij(), , Vij(), , Vij(),
    G().
  • Because Pr is contained in Q,
  • there exists a mapping b from Q to
    the expansion of Pr
  • We can always change b, to redirect all
    subgoals of Q that map into subgoals of Vij
    in Pr

17
24
Details of the Proof (1)
  • P º Q, P P1 È P2 È ... È Ps
  • There exists a conjunctive query Pi Pi º Q
  • Pi () - Vi1(), , Vij(), , Vim(), G().
  • Fix any Vij in Pi consider, in P,
  • Pr () - Vij(), , Vij(), , Vij(),
    G().
  • Because Pr is contained in Q,
  • there exists a mapping b from Q to
    the expansion of Pr
  • We can always change b, to redirect all
    subgoals of Q that map into subgoals of Vij
    in Pr

17
25
Details of the Proof (2)
  • We can always change b, to redirect all
    subgoals of Q that map into subgoals of more
    than one Vij in Pr
  • Then, we can replace Pr with Pr
  • Pr() - Vij(), , Vij(), , Vij(), G().
  • Pr()- Vij(), G().
  • And Pr º Q

18
26
Details of the Proof (3)
  • Changing b, to redirect all subgoals of Q
    that map into subgoals of Vij in Pr
  • Q() - , Sk(,W,),
  • Prexp() - , Sk(,Y,), , Sk(,Y,),
  • Pr() - Vij(), Vij(), , Vij(), G()

19
27
Details of the Proof (3)
  • Changing b, to redirect all subgoals of Q
    that map into subgoals of Vij in Pr
  • Q() - , Sk(,W,),
  • Prexp() - , Sk(,Y,), , Sk(,Y,),
  • Pr() - Vij(), Vij(), , Vij(), G()

19
28
Details of the Proof (3)
  • Changing b, to redirect all subgoals of Q
    that map into subgoals of Vij in Pr
  • Q() - , Sk(,W,),
  • Prexp() - , Sk(,Y,), , Sk(,Y,),
  • Pr() - Vij(), Vij(), , Vij(), G()

b
19
29
Details of the Proof (3)
  • Changing b, to redirect all subgoals of Q
    that map into subgoals of Vij in Pr
  • Q() - , Sk(,W,),
  • Prexp() - , Sk(,Y,), , Sk(,Y,),
  • Pr() - Vij(), Vij(), , Vij(), G()

b
19
30
Details of the Proof (3)
  • Changing b, to redirect all subgoals of Q
    that map into subgoals of Vij in Pr
  • Q() - , Sk(,W,),
  • Prexp() - , Sk(,Y,), , Sk(,Y,),
  • Pr() - Vij(), Vij(), , Vij(), G()

b
19
31
Details of the Proof (3)
  • Changing b, to redirect all subgoals of Q
    that map into subgoals of Vij in Pr
  • Q() - , Sk(,W,),
  • Prexp() - , Sk(,Y,), , Sk(,Y,),
  • Pr() - Vij(), Vij(), , Vij(), G()

b
19
32
Details of the Proof (3)
  • Changing b, to redirect all subgoals of Q
    that map into subgoals of Vij in Pr
  • Q() - , Sk(,W,),
  • Prexp() - , Sk(,Y,), , Sk(,Y,),
  • Pr() - Vij(), Vij(), , Vij(), G()

b
b
19
33
Details of the Proof (4)
  • Thus, we can replace Pr with Pr
  • Pr() - Vij(), , Vij(), , Vij(), G().
  • Pr()- Vij(), G().
  • And Pr º Q

20
34
2. Subexpression Views Are Enough
  • Theorem. Given a database D and a set of queries
    Q without self-joins.
  • Suppose a set V of disjunctive views is a
    solution for (D,Q).
  • Then there exists another solution V for (D,Q),
    such that
  • all views in V are conjunctive
    subexpression-type, and
  • size (V) size (V).
  • Corollary. For any database and set of queries
    without self-joins,
  • some optimal disjunctive solution is a set of
    conjunctive subexpression-type views.
  • The size of the search space of views is at most
    singly-exponential in the size of the input
    query workload

21
35
3. Full-Reducer Views Are Enough
  • A view V is a full-reducer view for a query Q if
    V and Q have the same body.
  • Theorem. Given a database D and a single query Q
    without self-joins.
  • Suppose a set V of disjunctive views is a
    solution for (D,Q).
  • Then there exists another solution V for (D,Q),
    such that
  • all views in V are conjunctive full-reducer
    views for Q, and
  • size (V) size (V).
  • Corollary. For any database and any query without
    self-joins, some optimal disjunctive solution
    is a set of conjunctive full-reducer views.

22
36
Using Full-Reducer Views To Rewrite
Sets of Queries
  • For query workloads with more than one query, we
    can merge optimal full-reducer views for
    individual queries in the workload
  • - and the number of subgoals in the merged views
    never exceeds the number of subgoals in
    full-reducer views.

23
37
What We Have Shown
disjunctive views
conjunctive views
subexpression views
full-reducer views
24
38
The Problem Is in NP
  • Theorem. Given a database instance, for any
    finite workload of conjunctive queries without
    self-joins,
  • the problem of finding a minimal-size disjunctive
    viewset is in NP.

25
39
Generating Minimal-Size Views
  • Input a conjunctive query without self-joins and
    a database
  • Output a minimal-size disjunctive viewset for
    the query on the database
  • Method produce a minimal-size set of conjunctive
    full-reducer views,
  • by doing exhaustive search in the space of the
    views
  • using a dynamic-programming algorithm (cf. query
    optimization in System R)
  • The algorithm returns an optimal solution
  • Can be modified to work for non-singleton query
    workloads

26
40
Heuristics for Generating Views
  • Consider only those views that cover up to a
    fixed number of subgoals of the query
  • Consider only those views that have up to a fixed
    number of head attributes
  • Apply the algorithm separately to several subsets
    of subgoals of the query, then combine the
    solutions

27
41
Main Results
  • Decidability and upper bounds on the complexity
    of the problem
  • Relationship between
  • a restriction on the language of the queries,
    and
  • the language of optimal views
  • Dynamic-programming algorithm for finding an
    optimal solution for conjunctive queries
    (restricted case)

28
42
Some Directions of Future Work
  • Rewriting queries in more expressive languages
  • built-in predicates
  • disjunctive queries
  • Using more expressive languages of views and
    rewritings
  • Maximally-contained rewritings of queries in
    terms of views

29
43
Reference
  • Jia Li, Rada Chirkova, and Chen Li.
  • Minimizing Data-Communication Costs by
    Decomposing Query Results in Client-Server
    Environments.
  • UCI ICS Technical Report, 2003.
  • http//www-db.ics.uci.edu/pages/raccoon/

30
Write a Comment
User Comments (0)
About PowerShow.com