Query Planning with Limited Source Capabilities

By Chen Li (Stanford University) and Edward Y. Chang (University of California, Santa Barbara).

Learn more at: https://www.ics.uci.edu
1
Query Planning with Limited Source Capabilities
  • Chen Li
  • Stanford University
  • Edward Y. Chang
  • University of California, Santa Barbara

2
Motivation
  • Heterogeneous information sources on the WWW
  • Information-integration systems
  • Limited query capabilities
  • Music stores: amazon.com, cdnow.com.
  • A query must specify a value of Artist or Title.
  • The sources do not answer queries such as "Give
    me all your information about CDs."

3
Example
Query: Find the prices of CDs containing a song
titled "Friends".
4
Source tuples
Not all the tuples could be retrieved from
the sources due to the restrictions.
5
Traditional approach: consider one join at a
time.
6
Our approach: retrieve as many tuples as possible.
This approach could save the user 15 - 10 = 5!
7
Observations
  • Access views not in a join to retrieve bindings
  • Recursive process
  • Some tuples in the answer cannot be retrieved.

8
Questions
  • How to compute the maximal answer?
  • When should we access sources not in a query?
  • What sources should be accessed?

9
Source views
  • A set of source views V with binding patterns
  • b: a value must be specified for the attribute
  • f: free (no value needs to be specified)
  • Each view schema uses a set of global attributes
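
A binding pattern can be modeled as one letter per attribute. The sketch below is a minimal illustration, not the authors' implementation: the `View` class, its method names, and the pattern chosen for `v1` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A source view with a binding pattern (hypothetical representation)."""
    name: str
    attrs: tuple    # global attribute names, in schema order
    pattern: str    # one letter per attribute: 'b' (bound) or 'f' (free)

    def bound_attrs(self):
        # Attributes for which a value must be supplied.
        return {a for a, p in zip(self.attrs, self.pattern) if p == "b"}

    def is_queryable(self, known):
        # The source can be asked only if every 'b' attribute has a binding.
        return self.bound_attrs() <= set(known)


# Assumed pattern for illustration: Song must be bound, CD is free.
v1 = View("v1", ("Song", "CD"), "bf")
```

With this pattern, `v1` can be queried once a song title is known, but not from a CD alone.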

10
Queries
  • A query Q includes:
  • Input attributes I
  • Output attributes O.

11
Connections
  • Connection: a set of views that connect I and O
    in Q.
  • Meaning: the natural join of the views.
  • Universal-relation-like assumptions, but
    connections can be generated in various ways.

12
Question 1: Computing the maximal answer
  • Translate a query and source views into a Datalog
    program.
  • Borrowed the idea from Duschka and Levy
    (IJCAI-97).
  • We eliminate useless source accesses.
  • Why Datalog programs? Recursion.

13
Constructing program Π(Q,V)
Connection rules:
  ans(P) :- V1(s1, C), V2(C, A, P)
  ans(P) :- V1(s1, C), V3(C, A, P)
Fact rule:
  song(s1) :-
14
Binding assumptions
  • A binding for an attribute is from the
    attribute's domain.
  • We do not allow the strategy of trying all
    possible strings to test the source (it may not
    terminate).
  • Any binding is obtained either from the query or
    from a tuple returned by a source query.
  • The program Π(Q,V) computes the maximal answer.

15
Question 2: When to access off-query sources?
Query: Input A = a1, Output D = ?
Connections: T1 = {v1, v3}, T2 = {v2, v3}
Not all the views need to be accessed.
17
Independent connections
  • A connection T is independent if all the views in
    T can be queried starting from the input
    attributes as the initial bindings and using only
    the views in T.
  • Theorem: off-connection source accesses are only
    necessary for nonindependent connections.
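
The independence test can be sketched as a fixpoint over the views of one connection. The schemas and binding patterns below are hypothetical (the slide's figure is not in the transcript); they were chosen so that T1 = {v1, v3} and T2 = {v2, v3} behave as the slides describe. The function names are also assumptions.

```python
# Hypothetical schemas: name -> (attributes, binding pattern).
# 'b' = a value must be supplied, 'f' = free.
VIEWS = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "C", "E"), "fbf"),
    "v3": (("C", "D"), "bf"),
}


def bound_attrs(view):
    attrs, pattern = VIEWS[view]
    return {a for a, p in zip(attrs, pattern) if p == "b"}


def queryable_views(connection, inputs):
    """Views of `connection` reachable from `inputs` using only that connection."""
    known = set(inputs)
    reached = set()
    changed = True
    while changed:
        changed = False
        for v in connection:
            if v not in reached and bound_attrs(v) <= known:
                reached.add(v)
                known |= set(VIEWS[v][0])  # answers yield bindings for all attributes
                changed = True
    return reached


def is_independent(connection, inputs):
    return queryable_views(connection, inputs) == set(connection)


T1 = {"v1", "v3"}
T2 = {"v2", "v3"}
```

Under these assumed patterns, T1 is independent for input A (v1 supplies C, which unlocks v3), while T2 is not, because neither v2 nor v3 can be asked without a binding for C.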

18
Question 3: What sources should be accessed?
  • A view v is relevant to connection T if we may
    miss some answers to T when v is not used.
  • How to find all the relevant views of a
    nonindependent connection?

19
Kernel
  • A kernel of a connection is a minimal set of
    attributes that need to be initially bound in
    addition to the input attributes to query the
    full connection.
  • A connection may have multiple kernels.
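
One way to make the kernel definition concrete is a brute-force search over attribute subsets, smallest first. This is only a sketch under assumptions: the view schemas are hypothetical, `w1` and `w2` are invented solely to show a connection with two kernels, and the slides do not say how kernels are actually computed.

```python
from itertools import combinations

# Hypothetical schemas: name -> (attributes, binding pattern).
VIEWS = {
    "v2": (("A", "C", "E"), "fbf"),
    "v3": (("C", "D"), "bf"),
    "w1": (("X", "Y"), "bf"),   # invented pair with two kernels
    "w2": (("X", "Y"), "fb"),
}


def bound_attrs(view):
    attrs, pattern = VIEWS[view]
    return {a for a, p in zip(attrs, pattern) if p == "b"}


def can_query_all(connection, bound):
    """True if every view in `connection` becomes queryable starting from `bound`."""
    known, pending = set(bound), set(connection)
    progress = True
    while pending and progress:
        progress = False
        for v in list(pending):
            if bound_attrs(v) <= known:
                pending.remove(v)
                known |= set(VIEWS[v][0])
                progress = True
    return not pending


def kernels(connection, inputs):
    """All minimal attribute sets that, added to `inputs`, unlock the whole connection."""
    attrs = sorted(set().union(*(VIEWS[v][0] for v in connection)) - set(inputs))
    found = []
    for size in range(len(attrs) + 1):
        for cand in combinations(attrs, size):
            cand = set(cand)
            if any(k <= cand for k in found):
                continue  # a smaller kernel is contained in cand, so cand is not minimal
            if can_query_all(connection, set(inputs) | cand):
                found.append(cand)
    return found
```

For the assumed schemas, `kernels({"v2", "v3"}, {"A"})` yields the single kernel {C}, and the invented pair {w1, w2} has the two kernels {X} and {Y}. The search is exponential in the number of attributes, so it is meant for illustration only.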

20
Algorithm FIND_REL: Finding relevant views of a
connection
Find all the relevant views of connection T2 =
{v2, v3}:
(1) Compute the queryable views: {v1, v2, v3, v4, v5}
(2) Find a kernel K of T2: K = {C}
(3) Compute all the views that can help produce
bindings for the attributes in K: R = {v1, v2, v4}
(4) Return R ∪ T2 = {v1, v2, v3, v4}.
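
The four steps above can be sketched as follows. The five view schemas are hypothetical, picked so that the intermediate sets match the ones on the slide; the backward step follows one plausible reading of "views that can help produce bindings" (a view helps if it has a needed attribute free, and its own bound attributes then become needed). Function names are assumptions.

```python
from itertools import combinations

# Hypothetical schemas: name -> (attributes, binding pattern).
VIEWS = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "C", "E"), "fbf"),
    "v3": (("C", "D"), "bf"),
    "v4": (("E", "C"), "bf"),
    "v5": (("C", "F"), "bf"),
}


def attrs(v):
    return set(VIEWS[v][0])


def bound(v):
    return {a for a, p in zip(*VIEWS[v]) if p == "b"}


def free(v):
    return attrs(v) - bound(v)


def f_closure(views, start):
    """Views that can eventually be queried from the bindings in `start`."""
    known, reached, changed = set(start), set(), True
    while changed:
        changed = False
        for v in views:
            if v not in reached and bound(v) <= known:
                reached.add(v)
                known |= attrs(v)
                changed = True
    return reached


def find_kernel(connection, inputs):
    """One minimal attribute set that unlocks the whole connection (brute force)."""
    cands = sorted(set().union(*(attrs(v) for v in connection)) - set(inputs))
    for size in range(len(cands) + 1):
        for cand in combinations(cands, size):
            if f_closure(connection, set(inputs) | set(cand)) == set(connection):
                return set(cand)


def b_closure(targets, views, inputs):
    """Views that can help produce bindings for `targets` (assumed reading)."""
    seen = set(targets) - set(inputs)
    frontier, result = set(seen), set()
    while frontier:
        attr = frontier.pop()
        for v in views:
            if attr in free(v) and v not in result:
                result.add(v)
                new = bound(v) - seen - set(inputs)
                seen |= new
                frontier |= new
    return result


def find_rel(connection, inputs):
    queryable = f_closure(VIEWS, inputs)            # step (1)
    kernel = find_kernel(connection, inputs)        # step (2)
    helpers = b_closure(kernel, queryable, inputs)  # step (3)
    return helpers | set(connection)                # step (4)
```

With input attribute A, `find_rel({"v2", "v3"}, {"A"})` returns {v1, v2, v3, v4}. For an independent connection such as {v1, v3}, the kernel is empty and the result is the connection itself, in line with the theorem on independent connections.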
21
Constructing an efficient program
  • Compute the relevant views for each connection
  • Take the union of all these relevant source
    views
  • Use these views to construct a new program
  • Remove useless rules.

22
Conclusions
  • A query-planning framework to compute the maximal
    answer to a query (following Duschka and Levy,
    IJCAI-97).
  • Techniques for telling when to access off-query
    views.
  • Algorithms for:
  • finding all the relevant sources for a query
  • constructing an efficient program.

23
Other related work
  • Rajaraman, Sagiv, and Ullman (PODS-95)
  • Shows how to find an equivalent query rewriting
    using views with binding restrictions.
  • We give the maximal rewriting of a query.
  • Optimizing conjunctive queries with binding
    restrictions
  • Yerneni, Li, Garcia-Molina, and Ullman (ICDT-99)
  • Florescu et al. (SIGMOD-99).
  • Testing connection containment
  • Li (Stanford-CS-TR 2000), using results on
    monadic programs to prove the problem is
    decidable.

24
Predicates
EDB predicates: v1(S, C), v2(C, A, P), v3(C, A, P)
IDB predicates: V1(S, C), V2(C, A, P), V3(C, A, P),
cd(C), song(S), artist(A), price(P), ans(P)
25
Evaluating program Π(Q,V)
  • Assume the right side of an α-rule or a domain
    rule is
  • domA1(A1), ..., domAp(Ap), vi(A1, ..., Am)
  • Once we have bindings for domA1(A1), ...,
    domAp(Ap), evaluate the rule and populate the
    domain predicates and the α-predicate.
  • Repeat until no more facts can be derived.
  • Compute the maximal answer to the query.
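
The evaluation loop above can be sketched as a fixpoint that keeps asking sources with every known combination of bindings. Everything concrete here is assumed for illustration: the source contents, the binding patterns, the prices, and the function names are hypothetical, and the sources are simulated in memory rather than queried over the web.

```python
from itertools import product

# Hypothetical sources: attributes, binding pattern, and stored tuples.
SOURCES = {
    "v1": {"attrs": ("S", "C"), "pattern": "bf",
           "rows": {("friends", "c1"), ("friends", "c2"), ("other", "c3")}},
    "v2": {"attrs": ("C", "A", "P"), "pattern": "bff",
           "rows": {("c1", "a1", 15)}},
    "v3": {"attrs": ("C", "A", "P"), "pattern": "fbf",
           "rows": {("c1", "a1", 10), ("c2", "a2", 12)}},
}


def query_source(name, bindings):
    """Simulated source call: rows that match the supplied bound values."""
    src = SOURCES[name]
    return {row for row in src["rows"]
            if all(row[src["attrs"].index(a)] == val for a, val in bindings.items())}


def evaluate(facts):
    """Ask sources repeatedly until no new source query can be formed."""
    dom = {a: set(vals) for a, vals in facts.items()}  # domain predicates
    retrieved = {name: set() for name in SOURCES}      # tuples obtained per view
    asked = set()
    changed = True
    while changed:
        changed = False
        for name, src in SOURCES.items():
            needed = [a for a, p in zip(src["attrs"], src["pattern"]) if p == "b"]
            for values in product(*(tuple(dom.get(a, ())) for a in needed)):
                if (name, values) in asked:
                    continue
                asked.add((name, values))
                changed = True
                for row in query_source(name, dict(zip(needed, values))):
                    retrieved[name].add(row)
                    for a, val in zip(src["attrs"], row):
                        dom.setdefault(a, set()).add(val)  # new bindings
    return retrieved


def answer(song):
    """ans(P): prices of CDs containing `song`, via V1 joined with V2 or V3."""
    r = evaluate({"S": {song}})
    cds = {c for (s, c) in r["v1"] if s == song}
    return {p for name in ("v2", "v3") for (c, a, p) in r[name] if c in cds}
```

In this made-up data, the artist binding a1 returned by v2 is what allows v3 to be asked, so the lower price for c1 is found only because a view outside the v1-v3 join was accessed. The v3 tuple for c2 is never retrieved, since no source ever returns the artist a2.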

26
Forward-closure
Given views W ⊆ V and attributes X, the
forward-closure of X given W, denoted
f-closure(X, W), is the set of views in W that
can eventually be queried by using the views in
W, starting from the initial bindings X.
27
Backward-closure
  • The backward-closure of a set of attributes X,
    denoted b-closure(X), is the set of views that
    can help retrieve bindings for X.

b-closure(C) = {v1, v2, v4}
  • Lemma: All backward-closures of a connection
    are the same.

28
BF-chain, backward-closure
  • BF-chain
  • Backward-closure

b-closure(C) = {v1, v2, v4}
29
Other possibilities of obtaining bindings
  • Cached data: For a cached tuple ti(a1, a2) of
    view vi(A1, A2), add the following rules to the
    program Π(Q, V):
  • vi(a1, a2) :-
  • domA1(a1) :-
  • domA2(a2) :-
  • Domain knowledge
  • student(name, dept, GPA).
  • dept: CS, Physics, Chemistry, etc.

30
Computing a partial answer
  • Independent connections: complete answers are
    computable.
  • Nonindependent connections: access some relevant
    views. We may stop evaluating the program after
    some results are computed.