QPIAD: Query Processing over Incomplete Autonomous Databases presentation


1
QPIAD: Query Processing over Incomplete Autonomous Databases

Hemal Khatri (Arizona State University)
Jianchun Fan (Arizona State University)
Garrett Wolf (Arizona State University)
Yi Chen (Arizona State University)
Subbarao Kambhampati (Arizona State University)

2
Incompleteness in Web databases

Population by Lay Users

Automated Extraction
3
Problem
How to retrieve ranked relevant uncertain answers
for user queries?

Challenges
How to retrieve relevant uncertain answers
through form-based interfaces of autonomous
databases?
How to keep query processing cost manageable?
How to rank the retrieved uncertain answers?

Possible Approaches
Q: Body style = Convt
1. CERTAIN ANSWERS ONLY: Return only certain answers, as in traditional databases (low recall)
2. ALL RETURNED: Return certain answers plus all answers whose body style value is missing (low precision, infeasible)
3. ALL RANKED: Rank all answers by predicting the values of the missing attribute (costly, infeasible)
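
As a minimal sketch of the first two approaches, the snippet below filters a toy in-memory table in which `None` marks a missing value. The table contents and the function names `certain_answers` and `all_returned` are illustrative assumptions, not part of QPIAD.

```python
# Toy car listings (hypothetical data); None marks a missing attribute value.
CARS = [
    {"make": "Audi", "model": "A4", "body_style": "Convt"},
    {"make": "BMW", "model": "Z4", "body_style": None},
    {"make": "Porsche", "model": "Boxster", "body_style": "Convt"},
    {"make": "Honda", "model": "Civic", "body_style": None},
    {"make": "Honda", "model": "Accord", "body_style": "Sedan"},
]


def certain_answers(tuples, attr, value):
    """Approach 1: keep only tuples whose attribute is known to equal value."""
    return [t for t in tuples if t[attr] == value]


def all_returned(tuples, attr, value):
    """Approach 2: certain answers plus every tuple missing the attribute."""
    return [t for t in tuples if t[attr] == value or t[attr] is None]


if __name__ == "__main__":
    print([t["model"] for t in certain_answers(CARS, "body_style", "Convt")])
    print([t["model"] for t in all_returned(CARS, "body_style", "Convt")])
```

Approach 1 misses the Z4 even though it may well be a convertible, while approach 2 also returns the Civic, which illustrates the recall and precision problems named on the slide.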

4
QPIAD System Architecture
5
Retrieving Relevant Answers via Query Rewriting
Given a query Q: Body style = Convt, retrieve all relevant tuples
Q → Base Result Set
AFD: Model → Body style
Use F-measure to select the top K rewritten queries: Q1: Model = A4, Q2: Model = Z4, Q3: Model = Boxster
Re-order the top K queries based on Estimated Precision
Ranked Relevant Uncertain Answers
F-Measure = (1 + α)·P·R / (α·P + R)
P: Estimated Precision
R: Estimated Recall, based on P and Estimated Selectivity
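
The selection and re-ordering steps can be sketched as follows. This is only an illustration: the precision and recall numbers are made up, the estimates are taken as given inputs rather than derived from selectivity, and the names `f_measure` and `select_rewritten_queries` are hypothetical.

```python
def f_measure(precision, recall, alpha):
    """Weighted F-measure: (1 + alpha) * P * R / (alpha * P + R)."""
    denom = alpha * precision + recall
    if denom == 0:
        return 0.0
    return (1 + alpha) * precision * recall / denom


def select_rewritten_queries(candidates, k, alpha):
    """Pick the top-k candidates by F-measure, then order them by precision.

    candidates: list of (query, estimated_precision, estimated_recall).
    """
    top_k = sorted(
        candidates,
        key=lambda c: f_measure(c[1], c[2], alpha),
        reverse=True,
    )[:k]
    # Queries with higher estimated precision are issued first, so their
    # answers appear earlier in the ranked result.
    return sorted(top_k, key=lambda c: c[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical estimates for rewritten queries on the Model attribute.
    candidates = [
        ("Model = A4", 0.9, 0.1),
        ("Model = Z4", 0.6, 0.5),
        ("Model = Boxster", 0.8, 0.3),
        ("Model = Civic", 0.1, 0.05),
    ]
    for query, p, r in select_rewritten_queries(candidates, k=3, alpha=1.0):
        print(query, p, r)
```

With α = 0 the measure reduces to precision alone, and larger α gives recall more weight, so α acts as the knob trading answer quality against coverage.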
6
Learning Statistics to Support Ranking and Rewriting

Learning attribute correlations by Approximate
Functional Dependency(AFD) and Approximate
Key(AKey)

Determining Set(Y): dtrSet(Y)
Sample Database
Prune based on AKey
TANE
AFDs (X → Y) with confidence
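
To make the notion of AFD confidence concrete, here is a small sketch that scores a candidate dependency X → Y on a sample. It assumes confidence is the largest fraction of tuples that can be kept so that the dependency holds exactly (one minus the g3 error used by TANE); it does not reproduce TANE's search over attribute sets, and the function name `afd_confidence` and the sample rows are illustrative.

```python
from collections import Counter, defaultdict


def afd_confidence(rows, lhs, rhs):
    """Fraction of rows that can be kept so that lhs -> rhs holds exactly.

    rows: list of dicts with no missing values on lhs or rhs (assumption).
    lhs:  list of attribute names forming the determining set.
    rhs:  name of the dependent attribute.
    """
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1
    # Within each lhs group, keep only the most frequent rhs value.
    kept = sum(max(counter.values()) for counter in groups.values())
    return kept / len(rows)


if __name__ == "__main__":
    sample = [
        {"model": "A4", "body_style": "Convt"},
        {"model": "A4", "body_style": "Convt"},
        {"model": "A4", "body_style": "Sedan"},
        {"model": "Z4", "body_style": "Convt"},
        {"model": "Boxster", "body_style": "Convt"},
    ]
    print(afd_confidence(sample, ["model"], "body_style"))
```

A key-like determining set trivially reaches confidence 1.0 because every group has one tuple, which is why such candidates are pruned using AKeys.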

Learning value distributions using Naïve Bayes
Classifiers(NBC)

Learn NBC classifiers with m-estimates
Determining Set(Am)
Feature Selection
Estimated Precision = P(Am = vm | dtrSet(Am))
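
A minimal sketch of a Naïve Bayes classifier with m-estimate smoothing is shown below. It assumes a uniform prior of 1/|domain| per feature value and m > 0; the class name `MEstimateNaiveBayes` and the training rows are hypothetical, and the feature-selection step is omitted.

```python
from collections import Counter, defaultdict


class MEstimateNaiveBayes:
    """Naive Bayes over categorical features, smoothed with m-estimates."""

    def __init__(self, features, target, m=1.0):
        self.features = features  # e.g. the determining set dtrSet(Am)
        self.target = target      # the attribute Am whose value is missing
        self.m = m                # equivalent sample size, assumed > 0

    def fit(self, rows):
        self.total = len(rows)
        self.class_counts = Counter(r[self.target] for r in rows)
        self.value_counts = defaultdict(Counter)  # (feature, class) -> values
        self.domains = defaultdict(set)           # feature -> observed values
        for r in rows:
            for f in self.features:
                self.value_counts[(f, r[self.target])][r[f]] += 1
                self.domains[f].add(r[f])
        return self

    def predict_proba(self, row):
        """Return P(target = c | feature values of row) for every class c."""
        scores = {}
        for c, n_c in self.class_counts.items():
            score = n_c / self.total
            for f in self.features:
                prior = 1.0 / len(self.domains[f])
                count = self.value_counts[(f, c)][row[f]]
                # m-estimate: (n_cv + m * p) / (n_c + m)
                score *= (count + self.m * prior) / (n_c + self.m)
            scores[c] = score
        norm = sum(scores.values())
        return {c: s / norm for c, s in scores.items()}


if __name__ == "__main__":
    train = [
        {"model": "A4", "body_style": "Convt"},
        {"model": "A4", "body_style": "Convt"},
        {"model": "A4", "body_style": "Sedan"},
        {"model": "Z4", "body_style": "Convt"},
        {"model": "Civic", "body_style": "Sedan"},
        {"model": "Civic", "body_style": "Sedan"},
    ]
    nbc = MEstimateNaiveBayes(["model"], "body_style", m=1.0).fit(train)
    print(nbc.predict_proba({"model": "Z4"}))
```

The probability returned for the queried value, here P(body_style = Convt | model = Z4), plays the role of the estimated precision of the corresponding rewritten query.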

Learning Selectivity Estimates of Rewritten Queries (QSel), based on:
Selectivity of the rewritten query issued on the sample
Ratio of original database size to sample size
Percentage of incomplete tuples encountered while creating the sample
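
One way to combine the three quantities above is to multiply them, as in the sketch below. The slide only says the estimate is "based on" them, so the product form, the function name `estimated_selectivity`, and the example numbers are assumptions.

```python
def estimated_selectivity(sample_matches, db_size, sample_size, incomplete_fraction):
    """Estimate how many incomplete tuples a rewritten query will retrieve.

    sample_matches:      tuples the rewritten query returns on the sample
    db_size:             (estimated) number of tuples in the full database
    sample_size:         number of tuples in the sample
    incomplete_fraction: share of incomplete tuples seen while sampling (0..1)
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    scale = db_size / sample_size  # ratio of database size to sample size
    return sample_matches * scale * incomplete_fraction


if __name__ == "__main__":
    # Hypothetical numbers: 12 matches in a 5,000-tuple sample of a
    # 50,000-tuple database where 10% of sampled tuples were incomplete.
    print(estimated_selectivity(12, 50_000, 5_000, 0.10))
```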

7
Empirical Evaluation
Two experimental databases: Cars (Cars.com) and Census (UCI ML)

Experimental Setup
An oracular study was used to measure precision and recall by artificially introducing missing values into the databases.
AFDs and NBC classifiers were learned for various sample sizes ranging from 3% to 15%.
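
The oracular setup can be sketched as follows: hide known values of one attribute, then score returned tuples against the hidden ground truth. The helper names `mask_attribute` and `precision_recall`, the masking fraction, and the toy rows are assumptions made for illustration and are not taken from the QPIAD experiments.

```python
import random


def mask_attribute(rows, attr, fraction, seed=0):
    """Return a copy of rows with attr set to None on a random subset.

    Also returns the set of masked row indices so the originals can act
    as the oracle.
    """
    rng = random.Random(seed)
    n_mask = int(len(rows) * fraction)
    masked_idx = set(rng.sample(range(len(rows)), n_mask))
    masked = []
    for i, row in enumerate(rows):
        copy = dict(row)
        if i in masked_idx:
            copy[attr] = None
        masked.append(copy)
    return masked, masked_idx


def precision_recall(returned_idx, rows, masked_idx, attr, value):
    """Score returned uncertain answers against the hidden true values."""
    relevant = {i for i in masked_idx if rows[i][attr] == value}
    returned = set(returned_idx)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    rows = [
        {"model": "A4", "body_style": "Convt"},
        {"model": "Civic", "body_style": "Sedan"},
        {"model": "Z4", "body_style": "Convt"},
        {"model": "Accord", "body_style": "Sedan"},
    ]
    masked, idx = mask_attribute(rows, "body_style", fraction=0.5, seed=1)
    print(sorted(idx), masked)
```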
Purpose of Experiments
Measuring the quality of uncertain results returned by QPIAD (Figure 1).
Measuring the efficiency of QPIAD in retrieving relevant results (Figure 2).
Measuring the robustness of the learning algorithms used in QPIAD with respect to various sample sizes (Figure 3).

Figure 1, Figure 2, Figure 3
8
QPIAD Web Interface
