Approximate Selection Queries over Imprecise Data

Slides: 26
Provided by: Informatio367
Learn more at: https://ics.uci.edu


1
Approximate Selection Queries over Imprecise Data
  • Iosif Lazaridis and Sharad Mehrotra
  • University of California, Irvine
  • ICDE Conference, March 2004
  • Boston, MA, USA

2
Talk Outline
  • Regular vs. Approximate Selection Queries
  • Quality-Aware Queries (QaQs)
  • Optimization for QaQs
  • Performance Study
  • Conclusions

3
Regular Selection Queries
Set of precise objects T
4
Imprecise Objects
  • An imprecise object o corresponds to a precise
    object $\omega_o$, which can be retrieved (at a cost) via a
    probe operation

5
Approximate Selection Queries
Set of imprecise objects T
6
Formal Problem Setting
  • Let T be a set of imprecise objects
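  • Let $\lambda$ be a selection predicate that maps each
    imprecise object to one of YES, NO, or MAYBE
  • The exact set, where $\omega_o$ is the precise object behind o, is
  • $E = \{\omega_o \mid o \in T \wedge \lambda(\omega_o) = \text{YES}\}$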
  • The goal is to produce an approximate answer A
    with associated quality guarantees
  • A will potentially contain both precise and
    imprecise objects
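
A three-valued predicate of this kind can be sketched in a few lines. The example below assumes, purely for illustration, that an imprecise object is a numeric interval known to contain the precise value and that the query is "value > threshold"; the names `Interval`, `Verdict`, and `greater_than` are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    YES = "YES"
    NO = "NO"
    MAYBE = "MAYBE"


@dataclass(frozen=True)
class Interval:
    """Hypothetical imprecise object: the precise value lies in [lo, hi]."""
    lo: float
    hi: float


def greater_than(threshold: float):
    """Build a three-valued predicate 'value > threshold' over intervals."""
    def predicate(obj: Interval) -> Verdict:
        if obj.lo > threshold:
            return Verdict.YES    # every possible precise value qualifies
        if obj.hi <= threshold:
            return Verdict.NO     # no possible precise value qualifies
        return Verdict.MAYBE      # the interval straddles the threshold
    return predicate


pred = greater_than(10.0)
verdicts = [pred(Interval(12, 15)), pred(Interval(2, 9)), pred(Interval(8, 11))]
```

Only the third interval straddles the threshold, so it is the only object for which a probe could change the outcome.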

7
Prescriptive vs. Diagnostic Quality Measures
Space of all possible approximate answers to a
query q
8
Quality Metrics
  • Set-based quality
  • Precision: fraction of objects in A that are also
    in E
  • $p = |A \cap E| / |A|$
  • Recall: fraction of objects in E that are also in
    A
  • $r = |A \cap E| / |E|$
  • Value-based quality
  • Each imprecise object o has laxity $l(o)$
  • Each precise object $\omega_o$ has laxity 0
  • Answer laxity
  • $l^{max} = \max_{x \in A} l(x)$
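
The three metrics are straightforward to compute when both the answer A and the exact set E are known, which is the diagnostic setting. A minimal sketch, assuming objects are represented by hashable identifiers and laxities are stored in a dictionary; the conventions for empty sets are assumptions made here, not part of the talk.

```python
def precision(answer: set, exact: set) -> float:
    """|A ∩ E| / |A|; taken as 1.0 for an empty answer (assumed convention)."""
    return len(answer & exact) / len(answer) if answer else 1.0


def recall(answer: set, exact: set) -> float:
    """|A ∩ E| / |E|; taken as 1.0 for an empty exact set (assumed convention)."""
    return len(answer & exact) / len(exact) if exact else 1.0


def answer_laxity(answer: set, laxity: dict) -> float:
    """Largest laxity of any object in the answer; 0.0 for an empty answer."""
    return max((laxity[x] for x in answer), default=0.0)


A = {"a", "b", "c"}
E = {"a", "b", "d", "e"}
lax = {"a": 0.0, "b": 4.0, "c": 7.5}   # "a" is precise, so its laxity is 0

p = precision(A, E)           # 2 of the 3 answer objects are in E
r = recall(A, E)              # 2 of the 4 exact objects are in A
l_max = answer_laxity(A, lax)
```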

9
Quality Guarantees
Laxity guarantee: $l^{max} = \max_{x \in A} l(x)$
10
Quality-Aware Query (QaQ)
  • Input consists of
  • Set T
  • Predicate $\lambda$
  • Quality requirements $p_q$, $r_q$, $l_q^{max}$
  • Answer A should be such that
  • $p_G \ge p_q$, $r_G \ge r_q$, and $l_q^{max} \ge l^{max}$
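
The acceptance condition for an answer can be written as a single check. This is a minimal sketch; the class `QualityRequirements` and the function `satisfies` are hypothetical names introduced for illustration, and the numeric values are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityRequirements:
    p_q: float       # minimum acceptable precision
    r_q: float       # minimum acceptable recall
    l_q_max: float   # maximum acceptable answer laxity


def satisfies(req: QualityRequirements, p_g: float, r_g: float, l_max: float) -> bool:
    """True when the guarantees (p_g, r_g, l_max) meet all three requirements."""
    return p_g >= req.p_q and r_g >= req.r_q and l_max <= req.l_q_max


req = QualityRequirements(p_q=0.9, r_q=0.5, l_q_max=50.0)
ok = satisfies(req, p_g=0.93, r_g=0.5, l_max=42.0)
too_lax = satisfies(req, p_g=0.93, r_g=0.5, l_max=60.0)   # fails on laxity only
```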

11
QaQ Selection Operator
  • Requires O(1) memory/processing per input object
  • Each object o is read, and $\lambda(o)$ is evaluated
  • Three choices for each object o
  • Forward it to A
  • Ignore it
  • Probe it to get $\omega_o$, then Forward or Ignore $\omega_o$

12
Handling Objects
  • Ignore NO objects
  • YES objects
  • If $l(o) > l_q^{max}$: Probe or Ignore
  • Else: Forward
  • MAYBE objects
  • If $l(o) > l_q^{max}$: Probe or Ignore
  • Else: all three choices are feasible
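
The case analysis on this slide amounts to a small lookup of feasible actions per object. A minimal sketch, assuming verdicts and actions are plain strings; the function name `feasible_actions` is hypothetical.

```python
def feasible_actions(verdict: str, laxity: float, l_q_max: float) -> set:
    """Actions the slide allows for one object, before any cost considerations."""
    if verdict == "NO":
        return {"ignore"}
    if laxity > l_q_max:
        # Too imprecise to appear in the answer as-is, whether YES or MAYBE.
        return {"probe", "ignore"}
    if verdict == "YES":
        return {"forward"}
    # MAYBE object within the laxity bound: all three choices are feasible.
    return {"forward", "probe", "ignore"}


yes_lax = feasible_actions("YES", laxity=70.0, l_q_max=50.0)
yes_tight = feasible_actions("YES", laxity=10.0, l_q_max=50.0)
maybe_tight = feasible_actions("MAYBE", laxity=10.0, l_q_max=50.0)
```

Which feasible action is actually taken is what the optimization step decides.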

13
Ensuring Correctness
  • No object with laxity $l(o) > l_q^{max}$ may be forwarded
  • The precision guarantee $p_G$ may not be lower than
    $p_q$
  • If no other YES objects remain to be seen, then
    $p_q$ will be violated
  • If $|A \cap Y| / (|Y| + |M_s - A|) < r_q$, then an object o
    cannot be ignored
  • If no other YES objects remain to be seen, then
    $r_q$ will be violated
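
One reading of the recall condition on this slide is a worst-case bound: the YES objects already in the answer, divided by all YES objects seen plus the MAYBE objects seen that are not in the answer. The sketch below encodes that reading only; the interpretation of the denominator and the names `recall_lower_bound` and `may_ignore` are assumptions, not something the transcript states.

```python
def recall_lower_bound(yes_in_answer: int, yes_seen: int,
                       maybe_outside_answer: int) -> float:
    """Worst-case recall under the assumed reading of the slide's expression.

    Every MAYBE object left out of the answer is pessimistically counted
    as belonging to the exact set.
    """
    denominator = yes_seen + maybe_outside_answer
    return yes_in_answer / denominator if denominator else 1.0


def may_ignore(yes_in_answer: int, yes_seen: int,
               maybe_outside_answer: int, r_q: float) -> bool:
    """Ignoring is allowed only while the bound is not below r_q."""
    return recall_lower_bound(yes_in_answer, yes_seen, maybe_outside_answer) >= r_q


bound = recall_lower_bound(yes_in_answer=40, yes_seen=50, maybe_outside_answer=30)
allowed = may_ignore(40, 50, 30, r_q=0.5)
blocked = may_ignore(40, 50, 30, r_q=0.6)
```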

14
QaQ Evaluation Cost
  • R: number of objects read ($R \le |T|$)
  • Y, M: number of objects that were YES/MAYBE at
    the input
  • $Y_f$, $Y_p$: number of YES objects that are
    forwarded/probed ($Y_f + Y_p \le Y$)
  • $M_f$, $M_p$: number of MAYBE objects that are
    forwarded/probed ($M_f + M_p \le M$)
  • $M_{py}$: number of probed MAYBE objects that become
    YES
  • Cost $W = R c_r + (Y_p + M_p) c_p + (Y_f + M_f) c_{wi} + (Y_p + M_{py}) c_{wp}$
  • The terms are, in order, the read cost, the probe cost,
    and the write cost (last two terms)
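
The cost W on this slide is a weighted sum of four counts and can be evaluated directly. A minimal sketch: the function name `evaluation_cost` and the sample counts are made up for illustration, and the unit costs (probe = 100, read and write = 1) follow the ratio used in the performance study.

```python
def evaluation_cost(R: int, Yp: int, Mp: int, Yf: int, Mf: int, Mpy: int,
                    c_r: float, c_p: float, c_wi: float, c_wp: float) -> float:
    """W = R*c_r + (Yp + Mp)*c_p + (Yf + Mf)*c_wi + (Yp + Mpy)*c_wp."""
    read_cost = R * c_r                  # every object read
    probe_cost = (Yp + Mp) * c_p         # every probed YES or MAYBE object
    write_forwarded = (Yf + Mf) * c_wi   # objects forwarded without a probe
    write_probed = (Yp + Mpy) * c_wp     # probed objects that end up in the answer
    return read_cost + probe_cost + write_forwarded + write_probed


W = evaluation_cost(R=1000, Yp=10, Mp=20, Yf=150, Mf=30, Mpy=12,
                    c_r=1.0, c_p=100.0, c_wi=1.0, c_wp=1.0)
# 1000 (read) + 3000 (probe) + 180 + 22 (write) = 4202
```

With probes a hundred times more expensive than reads or writes, the 30 probes account for most of the total.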

15
The Map
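[Figure: the Map. Horizontal axis: s(o), the probability that a MAYBE object turns out to be YES, with s(o) = 0 for NO objects, 0 < s(o) < 1 for MAYBE objects, and s(o) = 1 for YES objects. Vertical axis: laxity l(o). The plane is divided into six numbered regions, separated by the thresholds s3 and s5, each labeled with an action: Probe, Ignore, "Probe with probability ppy or Ignore", or "Forward with probability pfm or Ignore".]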
16
Query Optimization
  • Free parameters: $p_{py}$, $s_3$, $s_5$, $p_{fm}$
  • Estimate of the number of YES, NO, MAYBE objects
  • Estimate of the number of YES, MAYBE objects above the $l_q^{max}$
    laxity requirement
  • Requires some knowledge of the distribution of $l(o)$
  • Distribution of $s(o)$
  • Minimize cost W subject to $p_q$, $r_q$, $l_q^{max}$
  • 4-parameter optimization problem

17
Query Evaluation
  • Get selectivity estimates
  • Solve the optimization problem for $p_{py}$, $s_3$, $s_5$, $p_{fm}$,
    thus instantiating the Map
  • Read one object at a time, handle it according to
    the Map
  • Make sure correctness criteria are enforced!
  • Finish when $r_G \ge r_q$

18
Performance Study
  • Size of input: |T| = 10,000
  • Laxity ranges in [0, 100]
  • Probe cost = 100 × read/write unit cost
  • We vary
  • Precision, Recall, Laxity Requirement
  • Query selectivity
  • Input Uncertainty (ratio of YES/MAYBE objects)
  • Costs are normalized by dividing by |T|

19
Competing Algorithms
  • We devised two simple heuristics
  • STINGY avoids probes: it ignores MAYBE objects
    and objects exceeding the $l_q^{max}$ threshold.
  • STINGY is conservative, but sometimes it is
    forced to probe to meet the quality guarantees.
  • GREEDY forwards all MAYBE objects and probes all
    objects that exceed the $l_q^{max}$ threshold.
  • GREEDY tries to produce the result quickly by not
    ignoring objects, but sometimes it uses too many
    probes and forwards too many objects.
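
The default behaviour of the two heuristics can be sketched as per-object decision rules. This covers only the base rule stated on the slide: the cases where STINGY is forced to probe to meet the quality guarantees are left out, and checking the laxity threshold before the MAYBE rule in GREEDY is an assumption. The function names are hypothetical.

```python
def stingy(verdict: str, laxity: float, l_q_max: float) -> str:
    """STINGY base rule: never probe; forward only YES objects within the bound."""
    if verdict == "YES" and laxity <= l_q_max:
        return "forward"
    return "ignore"   # NO objects, MAYBE objects, and anything too lax


def greedy(verdict: str, laxity: float, l_q_max: float) -> str:
    """GREEDY base rule: never ignore a YES or MAYBE object."""
    if verdict == "NO":
        return "ignore"
    if laxity > l_q_max:
        return "probe"    # assumed: the laxity check takes priority
    return "forward"      # YES and MAYBE objects within the bound


objects = [("YES", 10.0), ("YES", 80.0), ("MAYBE", 10.0), ("MAYBE", 80.0), ("NO", 5.0)]
stingy_plan = [stingy(v, l, 50.0) for v, l in objects]
greedy_plan = [greedy(v, l, 50.0) for v, l in objects]
```

On the same five objects STINGY forwards one and probes none, while GREEDY forwards two and probes two.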

20
Varying Laxity
  • Input has 20% YES, 20% MAYBE objects
  • 90% Precision and 50% Recall are requested
  • As the laxity requirement becomes looser, the
    cost is reduced since imprecise objects can be
    forwarded without a probe

21
Varying Precision
  • Input has 20% YES, 20% MAYBE objects
  • 50% Recall and a laxity requirement of 50 are requested
  • Cost increases as the Precision requirement
    increases, as objects can't be forwarded unprobed

22
Varying Recall
  • Input has 20% YES, 20% MAYBE objects
  • 90% Precision and a laxity requirement of 50 are requested
  • Cost increases as the Recall requirement increases
  • When the Recall requirement is low, only part of the
    input needs to be read
  • As the Recall requirement tends to 100%, all the
    input must be read and no objects can be ignored

23
Varying Selectivity
  • Input has 20% YES, 20% MAYBE objects
  • 90% Precision, 50% Recall, and a laxity requirement of 50 are
    requested
  • Cost increases as selectivity increases, since
    more objects need to be output

24
Varying Input Uncertainty
  • Input has 20% YES, 20% MAYBE objects
  • 90% Precision, 50% Recall, and a laxity requirement of 50 are
    requested
  • When MAYBE objects are few, no probe cost needs
    to be paid: the few MAYBE objects can be ignored
  • When MAYBE objects are many, they cannot be
    ignored (Recall might be violated) or forwarded
    (Precision might be violated). Hence, they are probed,
    increasing the cost

25
Conclusions
  • Quality-Aware Queries (QaQs)
  • Query = predicate + quality requirement
  • Response = answer + quality guarantee
  • Quality Metrics for Set-Based Answers
  • On-line algorithm for evaluating QaQs
  • Works better than simple heuristics
  • Takes into account input characteristics/user
    requirements
  • Combines data read/write and probing costs
  • Future Work
  • Indexes, Joins

26
Thank You!