Querying Heterogeneous Information Sources Using Source Descriptions

Provided by: steliospa

1
Querying Heterogeneous Information Sources Using
Source Descriptions
  • Presentation by Pantouvakis Stelios

2
Basic Problem
  • Keyword-search-based tools available for the WWW
    are useful for unstructured data.
  • Not effective for structured sources.
  • Need for more complex queries.
  • Use a uniform interface for various databases.

3
The System must
  • Find relevant sources
  • Query each source appropriately
  • Obtain answers
  • Combine answers from multiple sources
  • Answer the user's query
  • Our system is the Information Manifold (IM)

4
Key difficulties
  • Very large number of sources, so we must have
    enough information to prune the sources accessed
    in answering a specific query (good pruning
    techniques required).
  • Sources contain incomplete information.

5
IM description
  • Provides uniform access to a heterogeneous
    collection of more than 100 information sources
    on the WWW, by means of
  • a mechanism to describe the contents and the
    query capabilities of the information sources
  • an algorithm that uses the source descriptions to
    create query plans that can access several
    sources to answer a query

6
Data Model
  • Relational model augmented with certain
    object-oriented features
  • Classes and class hierarchy (partial order ≺
    such that C ≺ D whenever C is a subclass of D)
  • Set of attributes associated with each class
  • Class inherits attributes from its superclasses
  • Attributes may be single- or multi-valued
  • Relations contain tuples while classes contain
    objects.

7
Data Model
  • Each object has a unique identifier.
  • Object may belong to more than one class
  • It is possible to declare a pair of classes to be
    disjoint (no object can belong to both)
  • We associate a unary relation with each class and
    a binary relation with each attribute
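
The data model on the two slides above can be sketched in a few lines of Python. This is only an illustration, not the IM implementation: the `WorldView` class, its field names, and the example classes `Car` and `Product` are all hypothetical. It shows a class hierarchy, the unary relation kept per class, and the binary relation kept per attribute.

```python
from dataclasses import dataclass, field


@dataclass
class WorldView:
    # Hypothetical representation: class name -> set of direct superclasses.
    superclasses: dict = field(default_factory=dict)
    # Unary relation per class: class name -> set of object identifiers.
    members: dict = field(default_factory=dict)
    # Binary relation per attribute: attribute -> set of (object id, value).
    attributes: dict = field(default_factory=dict)

    def ancestors(self, cls):
        """Return cls together with every class above it in the hierarchy."""
        seen, stack = set(), [cls]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.superclasses.get(c, ()))
        return seen

    def add_object(self, oid, cls):
        # An object of a class also belongs to all of its superclasses.
        for c in self.ancestors(cls):
            self.members.setdefault(c, set()).add(oid)

    def set_attr(self, attr, oid, value):
        # A multi-valued attribute is just several pairs for the same object.
        self.attributes.setdefault(attr, set()).add((oid, value))


wv = WorldView(superclasses={"Car": {"Product"}})
wv.add_object("o1", "Car")
wv.set_attr("color", "o1", "red")
wv.set_attr("color", "o1", "blue")
```

Disjointness declarations are left out of this sketch; they could be added as a set of class pairs checked in `add_object`.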

8
Describing Sources
  • Use of a World View (like a schema with no stored
    data in relations and classes)
  • We describe the contents of relations in the
    sources as queries over the world-view relations
    and the comparison predicates
  • We describe the capabilities of sources using
    capability records of the form
    (Sin, Sout, Ssel, min, max)

9
Describing Sources
  • Meaning of the capability record
    (Sin, Sout, Ssel, min, max) is:
  • To get a tuple from a relation R we must give the
    source values for at least min elements of Sin.
  • The source returns the parameters in Sout.
  • The elements in Ssel are the parameters on which
    the source can apply numerical selections
    (comparison operators such as ≤, <, ≠, =)
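
A capability record can be modelled as a small value object. The sketch below is an assumption-laden illustration: the class name `Capability`, its method names, and the example parameters (`model`, `year`, `price`, `seller`) are hypothetical, and `max_inputs` is stored but not interpreted because the slide does not spell out its meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    s_in: frozenset    # parameters the source accepts as inputs (Sin)
    s_out: frozenset   # parameters the source returns (Sout)
    s_sel: frozenset   # parameters the source can apply selections on (Ssel)
    min_inputs: int    # "min" in the capability record
    max_inputs: int    # "max"; carried along, not interpreted in this sketch

    def can_call(self, bound):
        """True if at least min_inputs elements of Sin have known values."""
        return len(self.s_in & set(bound)) >= self.min_inputs

    def can_select_on(self, param):
        """True if the source itself can evaluate a selection on param."""
        return param in self.s_sel


cap = Capability(
    s_in=frozenset({"model", "year"}),
    s_out=frozenset({"price", "seller"}),
    s_sel=frozenset({"price"}),
    min_inputs=1,
    max_inputs=2,
)
```

With this record, `cap.can_call({"model"})` holds because one element of Sin is bound, while `cap.can_call({"color"})` does not.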

10
An example: The Sources

11
An example: The Classes

12
An example: Source Descriptions

13
Query Plans
  • Query Plan is a sequence of accesses to
    information sources interspersed with local
    processing operations.
  • A plan to answer a given query consists of a
    set of conjunctive plans
  • Conjunctive plan is a conjunctive query that
    specifies the inputs and outputs of every
    subgoal.

14
Query Plans
  • From the conjunctive plan P we build the
    conjunctive query P' by replacing each subgoal
    with the content description query of the
    corresponding source.
  • P is semantically correct if P' is contained in
    the initial query, i.e. every answer to P' is
    also an answer to the initial query (we don't
    require them to be equivalent).
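
The containment test behind semantic correctness can be illustrated with the classical containment-mapping check for conjunctive queries. This is a standard textbook technique swapped in for illustration, not necessarily the procedure IM uses, and it ignores comparison predicates. The function name `contained_in`, the tuple encoding of queries, and the relations `Car`, `Price` and `Used` are hypothetical.

```python
def contained_in(q1, q2):
    """True if conjunctive query q1 is contained in q2.

    A query is (head_args, body), where body is a list of (predicate, args).
    All terms of q2 are treated as variables; terms of q1 are kept fixed.
    q1 is contained in q2 iff some mapping of q2's variables sends q2's head
    to q1's head and every atom of q2 to an atom of q1.
    """
    head1, body1 = q1
    head2, body2 = q2

    def extend(mapping, src, dst):
        # Map the terms in src onto dst, staying consistent with mapping.
        if len(src) != len(dst):
            return None
        new = dict(mapping)
        for s, d in zip(src, dst):
            if new.setdefault(s, d) != d:
                return None
        return new

    def search(i, mapping):
        # Backtracking search over the atoms of q2.
        if i == len(body2):
            return True
        pred, args = body2[i]
        for p1, a1 in body1:
            if p1 == pred:
                m = extend(mapping, args, a1)
                if m is not None and search(i + 1, m):
                    return True
        return False

    start = extend({}, head2, head1)
    return start is not None and search(0, start)


# Initial query: Q(x) :- Car(x), Price(x, p)
q = (("x",), [("Car", ("x",)), ("Price", ("x", "p"))])
# Expanded plan: P'(c) :- Car(c), Price(c, v), Used(c)
p_prime = (("c",), [("Car", ("c",)), ("Price", ("c", "v")), ("Used", ("c",))])
```

Here `contained_in(p_prime, q)` holds, so a plan expanding to `p_prime` would count as semantically correct for `q`, while the reverse containment fails because `q` has no `Used` atom.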

15
Algorithms for Answering Queries
  • The algorithm for generating executable query
    plans has two steps:
  • Generate semantically correct conjunctive query
    plans (details in a different paper).
  • Order the conjuncts to ensure that they are
    executable.

16
First Step: Creating semantically correct plans
  • Generating semantically correct conjunctive query
    plans amounts to finding a conjunctive query Q'
    that uses only the source relations and is
    contained in the given query Q.
  • Analogous to problem of answering queries using
    views (NP-complete)
  • This algorithm reduces the number of considered
    rewritings

17
First Step: Creating semantically correct plans
  • Compute a bucket for each subgoal in the query,
    containing the information sources from which
    tuples of the subgoal can be obtained.
  • Consider and check correctness of all possible
    combinations of information sources, one from
    each bucket.
  • Minimize each plan by removing redundant subgoals
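
The bucket idea can be sketched as follows. This is a deliberately simplified illustration: subgoals are reduced to relation names, a source is placed in a bucket when its description mentions that relation, and the correctness check is passed in as a hypothetical predicate `is_correct` instead of being implemented. The source names `S1` to `S3` and the relations `Car` and `Review` are made up, and the minimization step is omitted.

```python
from itertools import product


def build_buckets(query_subgoals, source_covers):
    """Map each query subgoal to the sources whose description mentions it."""
    return {
        g: [s for s, covered in source_covers.items() if g in covered]
        for g in query_subgoals
    }


def candidate_plans(buckets, is_correct):
    """Yield every one-source-per-bucket combination that passes the check."""
    goals = list(buckets)
    for combo in product(*(buckets[g] for g in goals)):
        plan = dict(zip(goals, combo))
        if is_correct(plan):
            yield plan


source_covers = {
    "S1": {"Car"},
    "S2": {"Car", "Review"},
    "S3": {"Review"},
}
buckets = build_buckets(["Car", "Review"], source_covers)
# Accept every combination here; a real check would test query containment.
plans = list(candidate_plans(buckets, lambda plan: True))
```

With two sources in each bucket, four combinations are enumerated, which is far fewer than all possible conjunctions of source relations.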

18
Second Step: Finding an Executable Ordering
  • BindAvail = set of variables in Q bound by
    values in the query
  • Qout = head variables of Q
  • For i = 1, ..., n
  • find any subgoal of Q, not chosen already, that
    has at least the minimum required parameters
    (recall min and Sin) in BindAvail.
  • If such a subgoal is found, it is the i-th in the
    order, and BindAvail = BindAvail ∪ Sout of the
    subgoal
  • Else return "plan not executable".
  • End For
  • If Qout is not a subset of BindAvail, return
    "plan not executable".
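
The ordering loop above translates almost line by line into Python. In this minimal sketch each subgoal is a tuple `(name, s_in, s_out, min_inputs)`, and variables are assumed to share names with source parameters; both the encoding and the example subgoals `cars` and `reviews` are hypothetical.

```python
def executable_order(subgoals, bound_in_query, head_vars):
    """Return subgoal names in an executable order, or None if none exists.

    subgoals: list of (name, s_in, s_out, min_inputs) tuples.
    """
    bind_avail = set(bound_in_query)   # BindAvail
    remaining = list(subgoals)
    order = []
    for _ in range(len(subgoals)):
        # Any not-yet-chosen subgoal with enough of its Sin already bound.
        pick = next(
            (g for g in remaining if len(set(g[1]) & bind_avail) >= g[3]),
            None,
        )
        if pick is None:
            return None                # plan not executable
        order.append(pick[0])
        bind_avail |= set(pick[2])     # BindAvail = BindAvail ∪ Sout
        remaining.remove(pick)
    if not set(head_vars) <= bind_avail:
        return None                    # some head variable is never produced
    return order


subgoals = [
    ("reviews", {"model"}, {"review"}, 1),
    ("cars", {"year"}, {"model", "price"}, 1),
]
order = executable_order(subgoals, {"year"}, {"review", "price"})
```

In this example `cars` must run first because only `year` is bound by the query; its output `model` then makes `reviews` callable. Choosing any eligible subgoal suffices because BindAvail only grows, so a subgoal that is callable now stays callable later.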

19
IM Architecture

20
Conclusions
  • Guaranteed solutions (algorithm finds all and
    only relevant sources)
  • Polynomial time execution.
  • All queries asked to the system are conjunctive
    (fields forms required?)