Computer Science Department at University of Toronto

Provided by: dev117

1
Computer Science Department at University of Toronto

2
Research Activities: Database Group

Clio: a tool for generating mappings between SQL and XML schemas. It can also generate queries that transform data conforming to one schema into data conforming to the other.
ConQuer: rewrites a given SQL query into one that returns only consistent answers, that is, results that do not depend on how the database's integrity-constraint violations are resolved.
Data Exchange: the foundational and algorithmic issues involved in taking data structured under a source schema and producing data that conforms to a target schema.
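To make the ConQuer entry concrete, here is a minimal sketch of the answer semantics it targets, not of its query rewriting. It assumes a single relation whose primary key (`id`) is violated, treats a repair as any choice of exactly one tuple per key value, and calls an answer consistent if the query returns it in every repair. The relation, the query, and the names `repairs` and `consistent_answers` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical customer(id, name, country) relation; id is meant to be a primary key.
ROWS = [
    ("c1", "Ann", "CA"),
    ("c2", "Bob", "CA"),
    ("c2", "Bob", "US"),  # conflicts with the previous tuple on key c2
    ("c3", "Cy", "US"),
]


def repairs(rows, key=lambda r: r[0]):
    """Yield every repair: a database that keeps exactly one tuple per key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    for choice in product(*groups.values()):
        yield list(choice)


def query(db):
    """SELECT id FROM customer WHERE country = 'CA'"""
    return {row[0] for row in db if row[2] == "CA"}


def consistent_answers(rows, q):
    """Answers returned by q in every repair (brute force: exponential in conflicts)."""
    answers = None
    for repair in repairs(rows):
        result = q(repair)
        answers = result if answers is None else answers & result
    return answers or set()


print(query(ROWS))                      # plain evaluation, ignoring the violation
print(consistent_answers(ROWS, query))  # only answers that hold in every repair
```

Plain evaluation returns both c1 and c2, but only c1 is a consistent answer, because c2 is in Canada in one repair and in the US in the other. Enumerating repairs grows exponentially with the number of conflicting keys, which is why answering the question with a single rewritten SQL query is attractive.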

3
Research Activities: Database Group (contd.)

ToXgene: generates consistent collections of XML documents that are complex enough for benchmarking and conform to the integrity constraints specified for the benchmark.
LIMBO: a hierarchical clustering algorithm for categorical data that uses the Information Bottleneck framework to define a distance measure, so that clustering preserves as much of the relevant information in the data as possible.
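LIMBO's distance can be sketched with the agglomerative Information Bottleneck merge cost: merging clusters c1 and c2 loses (p(c1) + p(c2)) times the weighted Jensen-Shannon divergence between their distributions over attribute values. The sketch below assumes each tuple starts as its own cluster with p(t) = 1/n and a uniform distribution over its attribute values, and it omits the compact summaries LIMBO builds to scale to large data sets. `merge_cost`, `tuple_cluster`, and `cheapest_merge` are hypothetical names.

```python
import math


def kl(p, q):
    """KL divergence between value->probability dicts (assumes q[v] > 0 where p[v] > 0)."""
    return sum(pv * math.log(pv / q[v]) for v, pv in p.items() if pv > 0)


def merge_cost(c1, c2):
    """Information lost by merging clusters, each given as (p(c), {value: p(value|c)})."""
    (w1, d1), (w2, d2) = c1, c2
    w = w1 + w2
    pi1, pi2 = w1 / w, w2 / w
    mean = {v: pi1 * d1.get(v, 0.0) + pi2 * d2.get(v, 0.0) for v in set(d1) | set(d2)}
    js = pi1 * kl(d1, mean) + pi2 * kl(d2, mean)  # weighted Jensen-Shannon divergence
    return w * js


def tuple_cluster(row, n):
    """Singleton cluster for one tuple: weight 1/n, uniform over its attribute values."""
    m = len(row)
    # Prefix with the attribute position so equal strings in different columns differ.
    return (1 / n, {f"{i}={v}": 1 / m for i, v in enumerate(row)})


def cheapest_merge(clusters):
    """Indices of the pair whose merge loses the least information."""
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    return min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))


rows = [("red", "small"), ("red", "small"), ("red", "large"), ("blue", "large")]
clusters = [tuple_cluster(r, len(rows)) for r in rows]
print(cheapest_merge(clusters))  # (0, 1): identical tuples merge at zero cost
```

Tuples that share more attribute values have closer distributions and therefore a smaller merge cost; a full agglomerative pass would repeatedly merge the cheapest pair and recompute costs against the merged cluster.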

4
Adaptive Processing of Top-k Queries in XML

Goal: compute the top-k matches for queries over XML.
The main motivation for top-k matching is the growing volume of XML repositories:
users are often not interested in viewing the less relevant matches.
This is similar to what search engines do.

5
Query evaluation

Compute exact and approximate matches to an XPath query.
Approximate matches are obtained by relaxing the XPath query.
Use answer scores to prune irrelevant partial matches early, while intermediate results are being computed.
Choose different execution plans for different partial matches.

6
Query Relaxation
7
Query Relaxation
8
Query Relaxation

Three types of relaxation:
- Edge generalization: replacing a parent-child (pc) edge with an ancestor-descendant (ad) edge (2b)
- Leaf deletion: making a leaf node optional (2d)
- Subtree promotion: moving a subtree from its parent node to its grandparent node (2c)
The idea is that exact matches to the relaxed queries are approximate matches to the original query.
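The three relaxations can be sketched on a small tree-pattern structure. This is a minimal illustration, not the system's code: `PatternNode`, the label-based lookup (labels assumed unique), and the compact rendering with `/` for pc edges, `//` for ad edges, and `?` for optional leaves are all hypothetical. Subtree promotion attaches the moved subtree to its grandparent with an ad edge, so every exact match of the original query is still a match of the relaxed one.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class PatternNode:
    label: str
    edge: str = "pc"        # edge to the parent: "pc" (/) or "ad" (//)
    optional: bool = False  # set by leaf deletion
    children: list[PatternNode] = field(default_factory=list)


def find(node, label, parent=None):
    """Return (node, parent) for the node with this label (labels assumed unique)."""
    if node.label == label:
        return node, parent
    for child in node.children:
        hit = find(child, label, node)
        if hit is not None:
            return hit
    return None


def render(node):
    """Compact notation: '/' = pc edge, '//' = ad edge, '?' = optional leaf."""
    text = node.label + ("?" if node.optional else "")
    for child in node.children:
        axis = "//" if child.edge == "ad" else "/"
        text += f"[{axis}{render(child)}]"
    return text


def edge_generalization(root, label):
    """Replace the pc edge above `label` with an ad edge."""
    new_root = copy.deepcopy(root)
    node, _ = find(new_root, label)
    node.edge = "ad"
    return new_root


def leaf_deletion(root, label):
    """Make the leaf `label` optional."""
    new_root = copy.deepcopy(root)
    node, _ = find(new_root, label)
    if node.children:
        raise ValueError(f"{label} is not a leaf")
    node.optional = True
    return new_root


def subtree_promotion(root, label):
    """Move the subtree rooted at `label` from its parent to its grandparent."""
    new_root = copy.deepcopy(root)
    node, parent = find(new_root, label)
    grandparent = find(new_root, parent.label)[1] if parent else None
    if grandparent is None:
        raise ValueError(f"{label} has no grandparent")
    parent.children = [c for c in parent.children if c is not node]
    node.edge = "ad"  # still a descendant of the grandparent in every original match
    grandparent.children.append(node)
    return new_root


query = PatternNode("book", children=[
    PatternNode("title"),
    PatternNode("info", children=[PatternNode("author")]),
])
print(render(query))                               # book[/title][/info[/author]]
print(render(edge_generalization(query, "info")))  # book[/title][//info[/author]]
print(render(leaf_deletion(query, "title")))       # book[/title?][/info[/author]]
print(render(subtree_promotion(query, "author")))  # book[/title][/info][//author]
```

Each function copies the pattern before relaxing it, so relaxations can be combined freely to build the space of relaxed queries.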

9
Scoring Functions

An extension of the traditional tf*idf function is used.
The query is decomposed into component predicates. For example, /a[./b and ./c[.//d]] is decomposed into a[parent::doc-root], a[./b], a[./c], and a[.//d].
XML idf of a component predicate p(q0,qi) with respect to an XML database D: reflects how many of the q0 nodes in D additionally satisfy p(q0,qi); the fewer that do, the more selective p is and the higher its idf.
XML tf of a component predicate p(q0,qi) and a node n in D: the number of distinct ways in which n satisfies p.
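A minimal sketch of these definitions with Python's `xml.etree.ElementTree`, assuming q0 = a and the component predicates a[./b], a[./c], and a[.//d] from the example. The idf formula log(1 + N_q0 / N_p), where N_q0 counts q0 nodes and N_p counts those that also satisfy p, and the choice to score a node as the sum of tf × idf over its predicates are assumptions in the usual tf*idf shape, not formulas taken from the slides. The a[parent::doc-root] predicate is skipped because ElementTree has no parent axis.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical database: three <a> elements under one document root.
DOC = ET.fromstring(
    "<db>"
    "<a id='1'><b/><b/><c><d/></c></a>"
    "<a id='2'><b/></a>"
    "<a id='3'><c/></a>"
    "</db>"
)

# Component predicates p(q0, qi) with q0 = a, written as ElementTree paths relative to a.
PREDICATES = {"a[./b]": "b", "a[./c]": "c", "a[.//d]": ".//d"}


def xml_tf(node, path):
    """Number of distinct ways `node` satisfies the predicate (matching qi nodes)."""
    return len(node.findall(path))


def xml_idf(q0_nodes, path):
    """Assumed idf: log(1 + |q0 nodes| / |q0 nodes that also satisfy p|)."""
    satisfying = sum(1 for n in q0_nodes if n.findall(path))
    return math.log(1 + len(q0_nodes) / satisfying) if satisfying else 0.0


def score(node, q0_nodes):
    """Assumed combination: sum of tf * idf over the component predicates."""
    return sum(xml_tf(node, p) * xml_idf(q0_nodes, p) for p in PREDICATES.values())


a_nodes = list(DOC.iter("a"))
for a in a_nodes:
    print(a.get("id"), round(score(a, a_nodes), 3))
```

The first `<a>` scores highest: it satisfies every predicate, satisfies a[./b] in two distinct ways, and matches the rarest predicate, a[.//d], which carries the largest idf.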

10
Whirlpool Architecture
11
Whirlpool Architecture

One server for each node in the XPath query tree.
The root node does not have a queue, because its server only initializes the set of partial matches.
Every other server maintains a priority queue of partial matches. For each partial match, it computes the set of matches that extend the partial match with the server's query node.
It then computes a score for each of these matches, and checks and updates the top-k set.
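The server loop can be sketched in a single process. This is a simplified illustration under several assumptions: the query is a hypothetical /a[./b and .//d] whose root server binds each <a> element; each other server handles one component predicate with a made-up weight; partial matches visit servers in a fixed order (routing is left out); each server queue pops the highest current score first; and a server that finds no match extends the partial match with an unbound node, standing in for a leaf-deletion relaxation. `SERVERS`, `extend`, `offer`, and `whirlpool` are hypothetical names.

```python
import heapq
import itertools
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<lib>"
    "<a id='1'><b/><b/><c><d/></c></a>"
    "<a id='2'><b/></a>"
    "<a id='3'><c><d/><d/></c></a>"
    "</lib>"
)

# Hypothetical query /a[./b and .//d]: server name -> (ElementTree path from a, weight).
SERVERS = {"b": ("b", 1.0), "d": (".//d", 2.0)}
ORDER = ["b", "d"]        # fixed visiting order; a real router would choose adaptively
_tie = itertools.count()  # tie-breaker so heap entries never compare dicts


def root_server():
    """The root has no queue: it only creates one partial match per <a> element."""
    return [{"a": a, "score": 0.0, "done": ()} for a in DOC.iter("a")]


def extend(pm, server):
    """All matches extending partial match `pm` with this server's query node."""
    path, weight = SERVERS[server]
    done = pm["done"] + (server,)
    hits = pm["a"].findall(path)
    if not hits:  # approximate match: node left unbound, no score added
        return [{**pm, server: None, "done": done}]
    return [{**pm, server: h, "score": pm["score"] + weight, "done": done} for h in hits]


def max_possible(pm):
    """Upper bound on the final score: current score plus every unvisited weight."""
    return pm["score"] + sum(SERVERS[s][1] for s in ORDER if s not in pm["done"])


def offer(topk, k, pm):
    """Check a complete match against the top-k set (a min-heap of size k)."""
    entry = (pm["score"], next(_tie), pm)
    if len(topk) < k:
        heapq.heappush(topk, entry)
    elif entry[0] > topk[0][0]:
        heapq.heapreplace(topk, entry)


def whirlpool(k):
    queues = {s: [] for s in ORDER}  # one priority queue per non-root server
    for pm in root_server():
        heapq.heappush(queues[ORDER[0]], (-pm["score"], next(_tie), pm))
    topk = []
    while any(queues.values()):
        for server in ORDER:
            if not queues[server]:
                continue
            _, _, pm = heapq.heappop(queues[server])  # highest current score first
            if len(topk) == k and max_possible(pm) <= topk[0][0]:
                continue  # cannot enter the top-k set: prune
            for match in extend(pm, server):
                if len(match["done"]) == len(ORDER):
                    offer(topk, k, match)
                else:
                    nxt = ORDER[len(match["done"])]
                    heapq.heappush(queues[nxt], (-match["score"], next(_tie), match))
    return [(s, pm["a"].get("id")) for s, _, pm in sorted(topk, key=lambda e: -e[0])]


print(whirlpool(k=3))
```

With k=1 the pruning check fires as soon as the first complete match scores 3.0: every remaining partial match has a maximum possible score of at most 3.0, so none is processed further.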

12
Top-k set

13
Router and Router Queue

Maintains partial matches ordered by their maximum possible final scores.
Determines the next server to process each partial match and sends the match to that server's queue.
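A minimal router sketch, assuming each server has a known maximum score it can still add (the hypothetical `MAX_CONTRIB` table), so that a partial match's maximum possible final score is its current score plus the contributions of the servers it has not visited. The queue is ordered by that bound, matches that cannot beat the current k-th best score are pruned, and the next server is chosen by a simple heuristic (the unvisited server with the largest possible contribution) that stands in for whatever routing strategy Whirlpool actually uses. `PartialMatch` and `Router` are hypothetical names.

```python
import heapq
import itertools
from dataclasses import dataclass

# Hypothetical upper bound on the score each server can add to a partial match.
MAX_CONTRIB = {"b": 1.0, "c": 0.5, "d": 2.0}


@dataclass(frozen=True)
class PartialMatch:
    ident: str
    score: float         # score accumulated so far
    visited: frozenset   # servers that have already processed this match

    def max_final_score(self):
        return self.score + sum(w for s, w in MAX_CONTRIB.items() if s not in self.visited)


class Router:
    def __init__(self):
        self._queue = []
        self._tie = itertools.count()

    def push(self, pm):
        # heapq is a min-heap, so negate to pop the largest bound first.
        heapq.heappush(self._queue, (-pm.max_final_score(), next(self._tie), pm))

    def route(self, threshold):
        """Return (next server, match) for the most promising match, or None.

        `threshold` is the current k-th best final score in the top-k set."""
        if not self._queue:
            return None
        _, _, pm = heapq.heappop(self._queue)
        if pm.max_final_score() <= threshold:
            # Every queued match has an equal or lower bound, so prune them all.
            self._queue.clear()
            return None
        remaining = [s for s in MAX_CONTRIB if s not in pm.visited]
        if not remaining:
            return ("top-k set", pm)  # complete match: hand it to the top-k set
        return (max(remaining, key=MAX_CONTRIB.get), pm)


router = Router()
router.push(PartialMatch("m1", 1.0, frozenset({"b"})))       # bound 3.5
router.push(PartialMatch("m2", 0.0, frozenset({"d"})))       # bound 1.5
router.push(PartialMatch("m3", 1.2, frozenset({"b", "c"})))  # bound 3.2
for _ in range(3):
    step = router.route(threshold=2.0)
    print(step and (step[0], step[1].ident))
```

Because the queue is sorted by the same bound used for pruning, the first match that falls below the threshold lets the router discard the rest of its queue at once; this relies on the top-k threshold never decreasing.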

14
Conclusion

Adaptive query processing allows different partial matches to go through different sequences of server operations, that is, different execution plans.
The Whirlpool architecture computes the scores of partial matches incrementally during query evaluation.
It also prunes intermediate matches, similar to query optimization in an RDBMS.
