Computer science Department at University of Toronto - PowerPoint PPT Presentation

Slides: 15
Provided by: dev117

1
Computer Science Department at the University of Toronto
  • Devi Alagappan
  • 3/24/06

2
Research activities: Database group
  • Clio: A tool for generating mappings between
    SQL and XML schemas. It can also generate queries
    that convert data conforming to one schema into
    data conforming to another.
  • ConQuer: Rewrites a given SQL query into one
    that retrieves only the results that do not
    violate any integrity constraints.
  • Data Exchange: The algorithmic and foundational
    issues involved in taking data structured under
    a source schema and producing data structured
    under a target schema.

3
Research activities: Database group (contd.)
  • ToXgene: Generates consistent collections of
    XML documents that are complex enough for
    benchmarking and conform to the benchmark's
    integrity constraints.
  • LIMBO: A hierarchical categorical clustering
    algorithm that uses the Information Bottleneck to
    provide a distance measure and preserve the
    relevant information when clustering.

4
Adaptive Processing of Top-k Queries in XML
  • The goal is to compute the top-k matches for
    XML queries.
  • The main reason for interest in top-k matches
    is the increasing volume of XML repositories.
  • Users are usually not interested in viewing the
    less relevant matches.
  • This is similar to what search engines do.

5
Query evaluation
  • Compute exact and approximate matches to an
    XPath query
  • Approximate matches are obtained by relaxing the
    XPath query
  • Use answer scores to prune irrelevant data
    early, while intermediate results are still
    being computed
  • Choose a different execution plan for different
    partial matches

6
Query relaxation
7
Query Relaxation
8
Query Relaxation
  • 3 types of relaxation
  • - edge generalization: replacing a pc
    (parent-child) edge with an ad
    (ancestor-descendant) edge (2b)
  • - leaf deletion: making a leaf node
    optional (2d)
  • - subtree promotion: moving a subtree from
    its parent node to its grandparent node (2c)
  • The notion is that exact matches to the relaxed
    queries are the approximate matches to the
    original query
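
The three relaxations can be sketched on a small tree pattern. This is only an illustrative sketch, not the presenters' implementation: the Node class, the function names, and the "?" suffix used to print an optional node are all hypothetical, and the choice to attach a promoted subtree with an ad edge is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Node:
    tag: str
    edge: str = "pc"               # edge from the parent: "pc" (/) or "ad" (//)
    optional: bool = False         # True once leaf deletion has been applied
    children: Tuple["Node", ...] = ()


def generalize_edge(node):
    """Edge generalization: turn the node's pc edge into an ad edge."""
    return replace(node, edge="ad")


def delete_leaf(node):
    """Leaf deletion: make a leaf node optional."""
    if node.children:
        raise ValueError("only leaf nodes can be made optional")
    return replace(node, optional=True)


def promote_subtree(grandparent, parent_tag, child_tag):
    """Subtree promotion: move child_tag from parent_tag up to grandparent."""
    new_children, promoted = [], []
    for child in grandparent.children:
        if child.tag == parent_tag:
            kept = tuple(c for c in child.children if c.tag != child_tag)
            # Assumption: the promoted subtree hangs off an ad edge.
            promoted += [replace(c, edge="ad")
                         for c in child.children if c.tag == child_tag]
            child = replace(child, children=kept)
        new_children.append(child)
    return replace(grandparent, children=tuple(new_children + promoted))


def to_xpath(node, is_root=True):
    """Render the pattern as an XPath-like string ("?" marks optional nodes)."""
    axis = "/" if node.edge == "pc" else "//"
    step = (axis if is_root else "." + axis) + node.tag
    if node.children:
        step += "[" + " and ".join(to_xpath(c, False) for c in node.children) + "]"
    return step + ("?" if node.optional else "")


# Toy query: a with a child b and a child c that has a descendant d.
query = Node("a", children=(Node("b"), Node("c", children=(Node("d", edge="ad"),))))
promoted = promote_subtree(query, "c", "d")
print(to_xpath(query))
print(to_xpath(promoted))
```

Every exact match of the relaxed pattern printed second is an approximate match of the original pattern printed first.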

9
Scoring Functions
  • An extension of the traditional tf*idf function
    is used
  • The query is decomposed into component
    predicates. Ex: /a[./b and ./c[.//d]] is
    decomposed into a[parent::doc-root], a[./b],
    a[./c], a[.//d].
  • XML idf of a component predicate p(q0,qi) over an
    XML database D: based on the extent to which q0
    nodes in D additionally satisfy p(q0,qi); the
    fewer nodes that do, the higher the idf
  • XML tf of a component predicate p(q0,qi) and a
    node n that belongs to D: the number of distinct
    ways in which n satisfies predicate p.
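
A minimal sketch of the two measures on a toy database may help. The data layout (DOC), the function names, and in particular the log(1 + ratio) form of idf are assumptions made for illustration; the slides only describe the measures informally.

```python
import math

# Hypothetical toy database: node id -> (tag, parent id); None marks the root.
DOC = {
    0: ("doc-root", None),
    1: ("a", 0),
    2: ("b", 1),
    3: ("b", 1),
    4: ("a", 0),
    5: ("c", 4),
    6: ("b", 5),
}


def is_related(doc, n, m, axis):
    """True if m is a child ("pc") or a descendant ("ad") of n."""
    parent = doc[m][1]
    if axis == "pc":
        return parent == n
    while parent is not None:
        if parent == n:
            return True
        parent = doc[parent][1]
    return False


def tf(doc, n, qi, axis):
    """Number of distinct qi nodes through which n satisfies the predicate."""
    return sum(1 for m, (tag, _) in doc.items()
               if tag == qi and is_related(doc, n, m, axis))


def idf(doc, q0, qi, axis):
    """Assumed form: log(1 + #q0 nodes / #q0 nodes satisfying the predicate)."""
    q0_nodes = [n for n, (tag, _) in doc.items() if tag == q0]
    satisfying = [n for n in q0_nodes if tf(doc, n, qi, axis) > 0]
    if not satisfying:
        return 0.0  # convention chosen here: no node can match this predicate
    return math.log(1 + len(q0_nodes) / len(satisfying))


print(tf(DOC, 1, "b", "pc"), tf(DOC, 4, "b", "ad"))
print(idf(DOC, "a", "b", "pc"), idf(DOC, "a", "b", "ad"))
```

In this toy data only one of the two a nodes has a b child, so a[./b] is more selective than a[.//b] and receives the higher idf.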

10
Whirlpool architecture
11
Whirlpool Architecture
  • One server, with its own queue, for each node in
    the XPath query tree.
  • The root node does not have a queue, as it
    initializes the set of partial matches.
  • Every other server maintains a priority queue of
    partial matches. For each partial match, it
    computes the set of matches that extend the
    partial match with that server's node.
  • It then computes a score for each of these
    matches.
  • Finally, it checks and updates the top-k set.

12
Top-k set
  • Determines which of the following applies to a
    new match:
  • It updates the score of an existing match in the
    set
  • It replaces an existing match in the set
  • It is pruned
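
The decision above can be sketched as a small class. This is a sketch under stated assumptions: the TopKSet name, the offer method, and its return labels are hypothetical, and the pruning rule (drop a match whose maximum possible final score cannot exceed the current k-th score) is an assumed reading of how pruning is decided.

```python
class TopKSet:
    def __init__(self, k):
        self.k = k
        self.scores = {}  # match id -> best current score

    def threshold(self):
        """Current k-th best score, or -inf while the set is not yet full."""
        if len(self.scores) < self.k:
            return float("-inf")
        return min(self.scores.values())

    def offer(self, match_id, score, max_final_score):
        """Return 'updated', 'added', 'replaced', 'kept', or 'pruned'."""
        if match_id in self.scores:
            # The match is already in the set: keep its best score.
            self.scores[match_id] = max(self.scores[match_id], score)
            return "updated"
        if len(self.scores) < self.k:
            self.scores[match_id] = score
            return "added"
        if max_final_score <= self.threshold():
            return "pruned"  # it can never beat the current k-th score
        weakest = min(self.scores, key=self.scores.get)
        if score > self.scores[weakest]:
            del self.scores[weakest]
            self.scores[match_id] = score
            return "replaced"
        return "kept"  # not in the set yet, but it may still get there


top = TopKSet(k=2)
for args in [("m1", 0.5, 1.0), ("m2", 0.3, 0.9), ("m1", 0.7, 1.0),
             ("m3", 0.1, 0.2), ("m4", 0.4, 0.8)]:
    print(args[0], top.offer(*args))
```

The extra 'kept' outcome covers a partial match that is neither good enough to enter the set yet nor hopeless enough to prune.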

13
Router and Router Queue
  • The router queue keeps partial matches
    prioritized by their maximum possible final
    scores
  • The router determines the next server to process
    each partial match and sends the match to that
    server's queue.
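
A rough sketch of the router, assuming a heap-based queue. The Router class, its method names, and the match dictionaries are hypothetical, and the routing policy shown (pick the first server the match has not visited) is only a placeholder; the slides do not say how the next server is chosen.

```python
import heapq
from itertools import count


class Router:
    def __init__(self, servers):
        self.servers = list(servers)       # e.g. one per non-root query node
        self.queue = []                    # max-heap via negated scores
        self.server_queues = {s: [] for s in self.servers}
        self._tie = count()                # tie-breaker so dicts are never compared

    def submit(self, match, max_final_score):
        """Queue a partial match under its maximum possible final score."""
        heapq.heappush(self.queue, (-max_final_score, next(self._tie), match))

    def route_next(self):
        """Send the most promising partial match to an unvisited server.

        Returns the chosen server, or None if the router queue is empty
        or the match has already visited every server.
        """
        if not self.queue:
            return None
        neg_score, _, match = heapq.heappop(self.queue)
        pending = [s for s in self.servers if s not in match["visited"]]
        if not pending:
            return None
        target = pending[0]                # placeholder policy (assumption)
        heapq.heappush(self.server_queues[target],
                       (neg_score, next(self._tie), match))
        return target


router = Router(["b", "c", "d"])
router.submit({"id": "m1", "visited": {"b"}}, 0.6)
router.submit({"id": "m2", "visited": set()}, 0.9)
print(router.route_next(), router.route_next())
```

Because each partial match is routed individually, two matches of the same query can visit the servers in different orders.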

14
Conclusion
  • Adaptive query processing allows different
    partial matches to go through different
    sequences of server operations.
  • The Whirlpool architecture incrementally computes
    the scores of partial matches during query
    evaluation.
  • It also prunes intermediate matches, similar
    to query optimization in an RDBMS.