QPIAD: Query Processing over Incomplete Autonomous Databases

1
QPIAD: Query Processing over Incomplete Autonomous Databases
  • Hemal Khatri (Arizona State University)
  • Jianchun Fan (Arizona State University)
  • Garrett Wolf (Arizona State University)
  • Yi Chen (Arizona State University)
  • Subbarao Kambhampati (Arizona State University)

2
Incompleteness in Web Databases
  • Population by Lay Users
  • Automated Extraction

3
Problem
How to retrieve ranked, relevant uncertain answers
for user queries?
  • Challenges
    • How to retrieve relevant uncertain answers
      through the form-based interfaces of autonomous
      databases?
    • How to keep query processing cost manageable?
    • How to rank the retrieved uncertain answers?
  • Possible Approaches (example query Q: Body style = Convt)
    • 1. CERTAIN ANSWERS ONLY: Return certain answers
      only, as in traditional databases (low recall)
    • 2. ALL RETURNED: Return certain answers plus all
      answers whose body style value is missing (low
      precision, infeasible)
    • 3. ALL RANKED: Rank all answers by predicting the
      values of the missing attribute (costly, infeasible)
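
To make the trade-off between the first two approaches concrete, here is a minimal sketch on a toy table. The rows, attribute names, and helper functions are hypothetical and are not taken from the presentation; `None` stands for a missing body style.

```python
# Toy car listings; None marks a missing (null) body style.
# All rows are invented for illustration.
cars = [
    {"make": "Audi", "model": "A4", "body_style": "Convt"},
    {"make": "BMW", "model": "Z4", "body_style": None},
    {"make": "Honda", "model": "Civic", "body_style": "Sedan"},
    {"make": "Honda", "model": "Accord", "body_style": None},
]


def certain_answers(rows, attr, value):
    """Approach 1: only tuples whose attribute is present and equals the value."""
    return [r for r in rows if r[attr] == value]


def all_returned(rows, attr, value):
    """Approach 2: certain answers plus every tuple with the attribute missing."""
    return [r for r in rows if r[attr] == value or r[attr] is None]


certain = certain_answers(cars, "body_style", "Convt")
everything = all_returned(cars, "body_style", "Convt")
print([r["model"] for r in certain])      # ['A4']
print([r["model"] for r in everything])   # ['A4', 'Z4', 'Accord']
```

Approach 1 never surfaces the Z4 row, even though its missing value might well be "Convt" (low recall). Approach 2 surfaces it, but also returns the Accord row with no way to tell the two apart (low precision).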

4
QPIAD System Architecture
5
Retrieving Relevant Answers via Query Rewriting
Given a query Q: (Body style = Convt), retrieve all
relevant tuples
  • Issue Q to obtain the Base Result Set
  • AFD: Model → Body style
  • Use F-measure to select the top K Rewritten
    Queries, e.g. Q1: Model = A4, Q2: Model = Z4,
    Q3: Model = Boxster
  • Re-order the top K queries based on Estimated
    Precision
  • Return the Ranked Relevant Uncertain Answers
F-Measure = $\frac{(1+\alpha) \cdot P \cdot R}{\alpha \cdot P + R}$,
where P = Estimated Precision and R = Estimated
Recall (based on P and Estimated Selectivity)
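
The select-then-reorder step can be sketched as follows. This is an illustrative sketch, not the QPIAD implementation: the `RewrittenQuery` class, the candidate numbers, and the way recall is normalised over the candidate set are all assumptions made for this example. Only the F-measure formula and the two-stage ordering come from the slide.

```python
from dataclasses import dataclass


@dataclass
class RewrittenQuery:
    name: str
    precision: float    # estimated precision P
    selectivity: float  # estimated number of tuples the query returns


def f_measure(p, r, alpha):
    """F = (1 + alpha) * P * R / (alpha * P + R); 0 if the denominator is 0."""
    denom = alpha * p + r
    return (1 + alpha) * p * r / denom if denom > 0 else 0.0


def select_and_order(queries, k, alpha=1.0):
    # Estimated recall: expected relevant tuples of each query, normalised
    # over all candidates (an assumption made for this sketch).
    expected = {q.name: q.precision * q.selectivity for q in queries}
    total = sum(expected.values())

    def score(q):
        recall = expected[q.name] / total if total else 0.0
        return f_measure(q.precision, recall, alpha)

    # Stage 1: keep the K queries with the best F-measure.
    top_k = sorted(queries, key=score, reverse=True)[:k]
    # Stage 2: issue the selected queries in order of estimated precision.
    return sorted(top_k, key=lambda q: q.precision, reverse=True)


# Hypothetical estimates for four candidate rewritten queries.
candidates = [
    RewrittenQuery("Model = A4", precision=0.60, selectivity=40),
    RewrittenQuery("Model = Z4", precision=0.95, selectivity=10),
    RewrittenQuery("Model = Boxster", precision=0.90, selectivity=5),
    RewrittenQuery("Model = Civic", precision=0.05, selectivity=100),
]
ordered = select_and_order(candidates, k=3)
print([q.name for q in ordered])
```

With these made-up numbers the low-precision "Model = Civic" query is dropped in stage 1, and the remaining three are issued most-precise first. Smaller values of alpha weight precision more heavily in this form of the F-measure.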
6
Learning Statistics to Support Ranking and Rewriting
  • Learning attribute correlations via Approximate
    Functional Dependencies (AFDs) and Approximate
    Keys (AKeys)
    • Sample Database
    • TANE
    • Prune based on AKeys
    • AFDs (X → Y) with confidence
    • Determining set of Y: dtrSet(Y)
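
As a rough illustration of what the "confidence" of an AFD or AKey can mean, the sketch below computes it on a tiny sample. It assumes the g3-style definition commonly associated with TANE (confidence = 1 minus the minimum fraction of tuples that must be removed for the dependency to hold exactly); the slide does not state which definition is used, and the function names and sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def afd_confidence(rows, lhs, rhs):
    """Fraction of rows that can be kept so that lhs -> rhs holds exactly."""
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1
    # Within each lhs group, keep only the rows with the most common rhs value.
    kept = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return kept / len(rows)


def akey_confidence(rows, attrs):
    """Fraction of rows that can be kept so that attrs is a key (no duplicates)."""
    if not rows:
        return 0.0
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return len(distinct) / len(rows)


# Invented sample rows.
sample = [
    {"model": "A4", "body_style": "Convt"},
    {"model": "A4", "body_style": "Convt"},
    {"model": "A4", "body_style": "Sedan"},
    {"model": "Z4", "body_style": "Convt"},
    {"model": "Civic", "body_style": "Sedan"},
]
print(afd_confidence(sample, ["model"], "body_style"))  # 0.8
print(akey_confidence(sample, ["model"]))               # 0.6
```

An attribute set that is close to a key determines every other attribute trivially, so dependencies with such a left-hand side say little about real correlations. That is presumably the motivation for the AKey-based pruning step.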
  • Learning value distributions using Naïve Bayes
    Classifiers (NBCs)
    • Determining set dtrSet(Am) used for feature
      selection
    • Learn NBC classifiers with m-estimates
    • Estimated Precision = $P(A_m = v_m \mid \mathrm{dtrSet}(A_m))$
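
A Naïve Bayes classifier with m-estimate smoothing can be sketched in a few lines. This is a generic textbook version, not the classifier used in QPIAD: the uniform prior `1 / |domain|`, the default `m = 1`, and the training rows are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def train_nbc(rows, target, features, m=1.0):
    """Collect the counts needed for an m-estimate Naive Bayes classifier."""
    rows = [r for r in rows if r[target] is not None]  # train on complete rows
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (feature, class) -> value -> count
    domains = defaultdict(set)           # feature -> observed values
    for r in rows:
        for f in features:
            value_counts[(f, r[target])][r[f]] += 1
            domains[f].add(r[f])
    return class_counts, value_counts, domains, m


def predict_distribution(model, evidence):
    """Return P(target = c | evidence) for every class c."""
    class_counts, value_counts, domains, m = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total
        for f, v in evidence.items():
            prior = 1.0 / len(domains[f])  # uniform prior (assumption)
            # m-estimate: (count + m * prior) / (class count + m)
            score *= (value_counts[(f, c)][v] + m * prior) / (n_c + m)
        scores[c] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}


# Invented training sample; "model" plays the role of dtrSet(body_style).
training = [
    {"model": "Z4", "body_style": "Convt"},
    {"model": "Z4", "body_style": "Convt"},
    {"model": "A4", "body_style": "Convt"},
    {"model": "A4", "body_style": "Sedan"},
    {"model": "Civic", "body_style": "Sedan"},
    {"model": "Civic", "body_style": "Sedan"},
]
nbc = train_nbc(training, target="body_style", features=["model"])
dist = predict_distribution(nbc, {"model": "Z4"})
print(dist)
```

Here `dist["Convt"]` plays the role of the estimated precision of the rewritten query Model = Z4. The m-estimate keeps the "Sedan" probability above zero even though no Z4 sedan appears in the sample.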
  • Learning selectivity estimates of rewritten
    queries (QSel), based on:
    • Selectivity of the rewritten query issued on
      the sample
    • Ratio of the original database size to the
      sample size
    • Percentage of incomplete tuples encountered
      while creating the sample
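
One plausible way to combine the three quantities is to multiply them, scaling the sample count up to the full database and then down to the incomplete portion. The slide only lists the inputs, so the product form below is an assumption, as are the function name and the numbers.

```python
def estimate_selectivity(sample_matches, db_size, sample_size, incomplete_fraction):
    """Estimate how many incomplete tuples a rewritten query retrieves.

    sample_matches      -- tuples the rewritten query returns on the sample
    db_size/sample_size -- ratio used to scale the sample count up
    incomplete_fraction -- share of incomplete tuples seen while sampling (0..1)
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= incomplete_fraction <= 1.0:
        raise ValueError("incomplete_fraction must lie in [0, 1]")
    scale = db_size / sample_size
    return sample_matches * scale * incomplete_fraction


# Hypothetical numbers: 12 matches in a 5,000-tuple sample of a
# 50,000-tuple database in which 10% of tuples were incomplete.
est = estimate_selectivity(12, db_size=50_000, sample_size=5_000,
                           incomplete_fraction=0.10)
print(round(est, 2))
```

With these inputs the estimate comes out at roughly 12 incomplete tuples for the rewritten query.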

7
Empirical Evaluation
Two experimental databases: Cars (Cars.com) and
Census (UCI ML)
  • Experimental Setup
    • Oracular study used to measure precision and
      recall by artificially introducing missing values
      into the databases.
    • AFDs and NBC classifiers learned for various
      sample sizes ranging from 3% to 15%.
  • Purpose of Experiments
    • Measuring the quality of the uncertain results
      returned by QPIAD (Figure 1).
    • Efficiency of QPIAD in retrieving relevant
      results (Figure 2).
    • Robustness of the learning algorithms used in
      QPIAD with respect to various sample sizes
      (Figure 3).
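
The oracular setup can be sketched as follows: hide a known attribute value in a fraction of the tuples, keep the hidden values as ground truth, and score whatever the system retrieves against them. The helper names, the masking fraction, and the toy rows are hypothetical; the presentation does not describe its evaluation harness at this level of detail.

```python
import random


def mask_attribute(rows, attr, fraction, seed=0):
    """Null out `attr` in a random fraction of rows; return (masked, truth)."""
    rng = random.Random(seed)
    n_hidden = int(len(rows) * fraction)
    hidden_ids = set(rng.sample(range(len(rows)), n_hidden))
    masked = [
        {**row, attr: None} if i in hidden_ids else dict(row)
        for i, row in enumerate(rows)
    ]
    truth = {i: rows[i][attr] for i in hidden_ids}  # the "oracle"
    return masked, truth


def precision_recall(retrieved_ids, truth, value):
    """Score retrieved uncertain answers against the hidden true values."""
    relevant = {i for i, v in truth.items() if v == value}
    retrieved = set(retrieved_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented data: ten rows, 30% of body styles hidden.
rows = [{"id": i, "body_style": "Convt" if i % 2 == 0 else "Sedan"}
        for i in range(10)]
masked, truth = mask_attribute(rows, "body_style", fraction=0.3)

# Scoring example with a hand-written oracle: rows 0 and 3 are truly "Convt".
p, r = precision_recall([0, 1, 2], {0: "Convt", 1: "Sedan", 3: "Convt"}, "Convt")
print(p, r)
```

In the hand-written example one of the three retrieved rows is relevant and one of the two relevant rows is retrieved, so precision is 1/3 and recall is 1/2.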

[Figure 1] [Figure 2] [Figure 3]
8
QPIAD Web Interface