NRA - PowerPoint PPT Presentation

About This Presentation
Title:

NRA

Description:

Joint work with Holger Bast, Ralf Schenkel, Martin Theobald, Gerhard Weikum ... with Probabilistic Guarantees Martin Theobald, Gerhard Weikum, Ralf Schenkel ... – PowerPoint PPT presentation

Slides: 9
Provided by: DBX
Learn more at: http://crystal.uta.edu

Transcript and Presenter's Notes

Title: NRA


1
NRA
  • Top-k query processing using No Random Access (NRA)
  • Only sequential (sorted) access
  • Algorithm:
  • scan index lists in parallel
  • consider $d_j$ at position $pos_i$ in $L_i$
  • $E(d_j) := E(d_j) \cup \{i\}$; $high_i := s_i(q, d_j)$
  • $\mathrm{bestscore}(d_j) := \mathrm{aggr}\{x_1, \dots, x_m\}$
  • with $x_i := s_i(q, d_j)$ for $i \in E(d_j)$, $high_i$ for $i \notin E(d_j)$
  • $\mathrm{worstscore}(d_j) := \mathrm{aggr}\{x_1, \dots, x_m\}$
  • with $x_i := s_i(q, d_j)$ for $i \in E(d_j)$, $0$ for $i \notin E(d_j)$
  • top-k := the $k$ docs with the largest worstscore
  • threshold := $\max\{\mathrm{bestscore}(d) \mid d \text{ not in top-k}\}$
  • if $\min$ worstscore among top-k $\geq$ threshold then exit
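
The steps above can be sketched in Python. This is a minimal illustration, not the presenters' implementation: it assumes summation as the aggregation function, round-robin scanning of one row per list, and exact decimal scores. The function name `nra` and the helper `index_list` are hypothetical. The data are the three example lists from the walk-through slides; placing item 44 in List 1 is an assumption, and the run stops before that row is read.

```python
from decimal import Decimal

def nra(lists, k):
    """Sketch of NRA with sum as the aggregation function.

    lists: index lists, each [(item, score), ...] sorted by descending score.
    Returns (top_k, depth): top_k as [(item, worstscore)], depth = rows read per list.
    """
    m = len(lists)
    seen = {}                    # item -> {list index: score}; the keys play the role of E(d)
    high = [Decimal(0)] * m      # high_i: score at the current scan position of list i
    worst, top_k, depth = {}, [], 0
    for depth in range(1, max(len(lst) for lst in lists) + 1):
        for i, lst in enumerate(lists):          # one sorted access per list
            if depth <= len(lst):
                item, score = lst[depth - 1]
                seen.setdefault(item, {})[i] = score
                high[i] = score
            else:
                high[i] = Decimal(0)             # exhausted list: nothing unseen remains
        worst = {d: sum(s.values(), Decimal(0)) for d, s in seen.items()}
        best = {d: worst[d] + sum((high[i] for i in range(m) if i not in seen[d]), Decimal(0))
                for d in seen}
        ranked = sorted(seen, key=worst.get, reverse=True)
        top_k, rest = ranked[:k], ranked[k:]
        if len(top_k) < k:
            continue
        min_k = worst[top_k[-1]]
        # threshold: best possible score of any doc outside the top-k,
        # including docs not seen in any list yet (bounded by the sum of high_i)
        threshold = max([sum(high, Decimal(0))] + [best[d] for d in rest])
        if min_k >= threshold:
            break
    return [(d, worst[d]) for d in top_k], depth

def index_list(*rows):
    """Hypothetical helper: build an index list from (item number, score string) pairs."""
    return [(f"item {n}", Decimal(s)) for n, s in rows]

list1 = index_list((25, "0.6"), (78, "0.5"), (83, "0.4"), (17, "0.3"),
                   (21, "0.2"), (91, "0.1"), (44, "0.1"))  # item 44 in List 1: assumption
list2 = index_list((17, "0.6"), (38, "0.6"), (14, "0.6"), (5, "0.6"), (83, "0.5"), (21, "0.3"))
list3 = index_list((83, "0.9"), (17, "0.7"), (61, "0.3"), (81, "0.2"), (65, "0.1"), (10, "0.1"))

top2, depth = nra([list1, list2, list3], k=2)
print([(d, str(s)) for d, s in top2], depth)
# [('item 83', '1.8'), ('item 17', '1.6')] 5
```

Decimal scores are used so that ties such as 1.3 versus 1.3 compare exactly; with binary floats the exit test could flip on rounding noise.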

2
NRA
List 1          List 2          List 3
item 25  0.6    item 17  0.6    item 83  0.9
item 78  0.5    item 38  0.6    item 17  0.7
item 83  0.4    item 14  0.6    item 61  0.3
item 17  0.3    item 5   0.6    item 81  0.2
item 21  0.2    item 83  0.5    item 65  0.1
item 91  0.1    item 21  0.3    item 10  0.1
item 44  0.1

Scanned to depth 1: 0.6 + 0.6 + 0.9 = 2.1

Candidates [worst score, best score]
item 83  [0.9, 2.1]
item 17  [0.6, 2.1]
item 25  [0.6, 2.1]

Min top-2 score: 0.6
Threshold (max of unseen tuples): 2.1
Pruning candidates: keep a candidate only while min top-2 < its best score
Stopping condition: threshold < min top-2?
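
As a worked check of the candidate bounds on this slide, the following sketch computes the worst and best score of each item after one row of every list has been read. It assumes summation as the aggregation function; the function name `bounds` and the zero-based list indices are illustrative choices, not part of the slides.

```python
from decimal import Decimal as D

def bounds(seen_scores, high):
    """Return (worstscore, bestscore) for one item.

    seen_scores: {list index: score} for the lists where the item was already seen.
    high: last score read in each list (upper bound for any unseen score there).
    """
    worst = sum(seen_scores.values(), D(0))
    best = worst + sum((h for i, h in enumerate(high) if i not in seen_scores), D(0))
    return worst, best

high = [D("0.6"), D("0.6"), D("0.9")]           # first row of List 1, List 2, List 3
candidates = {
    "item 25": {0: D("0.6")},                   # seen only in List 1
    "item 17": {1: D("0.6")},                   # seen only in List 2
    "item 83": {2: D("0.9")},                   # seen only in List 3
}
for item, seen in candidates.items():
    worst, best = bounds(seen, high)
    print(item, worst, best)
# item 25 0.6 2.1
# item 17 0.6 2.1
# item 83 0.9 2.1
```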
3
NRA
Lists 1 to 3 as on slide 2, each scanned to depth 2.

Candidates [worst score, best score]
item 17  [1.3, 1.8]
item 83  [0.9, 2.0]
item 25  [0.6, 1.9]
item 38  [0.6, 1.8]
item 78  [0.5, 1.8]

Min top-2 score: 0.9
Threshold (max of unseen tuples): 1.8
Pruning candidates: keep a candidate only while min top-2 < its best score
Stopping condition: threshold < min top-2?
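
The threshold on these slides is the sum of the scores at the current scan position of each list, since an item not seen anywhere cannot score more than that. A small sketch, assuming summation and using only the score columns of the first six rows of the three example lists (the lone seventh row, item 44, is left out here); the function name `thresholds` is illustrative.

```python
from decimal import Decimal

def thresholds(score_lists):
    """Sum of the scores at each scan depth; exhausted lists contribute 0."""
    rows = max(len(scores) for scores in score_lists)
    return [sum((Decimal(s[r]) for s in score_lists if r < len(s)), Decimal(0))
            for r in range(rows)]

scores = [
    ["0.6", "0.5", "0.4", "0.3", "0.2", "0.1"],  # List 1
    ["0.6", "0.6", "0.6", "0.6", "0.5", "0.3"],  # List 2
    ["0.9", "0.7", "0.3", "0.2", "0.1", "0.1"],  # List 3
]
print([str(t) for t in thresholds(scores)])
# ['2.1', '1.8', '1.3', '1.1', '0.8', '0.5']
```

The first five values match the thresholds shown on the walk-through slides (2.1, 1.8, 1.3, 1.1, 0.8).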
4
NRA
Lists 1 to 3 as on slide 2, each scanned to depth 3.

Candidates [worst score, best score]
item 83  [1.3, 1.9]
item 17  [1.3, 1.7]
item 25  [0.6, 1.5]
item 78  [0.5, 1.4]

Min top-2 score: 1.3
Threshold (max of unseen tuples): 1.3
Pruning candidates: keep a candidate only while min top-2 < its best score
Stopping condition: threshold < min top-2?
No new item can get into the top-2, but extra candidates are left in the queue.
5
NRA
Lists 1 to 3 as on slide 2, each scanned to depth 4.

Candidates [worst score, best score]
item 17  1.6 (final score, seen in all three lists)
item 83  [1.3, 1.9]
item 25  [0.6, 1.4]

Min top-2 score: 1.3
Threshold (max of unseen tuples): 1.1
Pruning candidates: keep a candidate only while min top-2 < its best score
Stopping condition: threshold < min top-2?
No new item can get into the top-2, but extra candidates are left in the queue.
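
The pruning rule on this slide can be sketched as follows. It is an illustration under assumptions: the function name `prune` is hypothetical, and the bounds for item 78 at this depth (worst 0.5, best 0.5 + 0.6 + 0.2 = 1.3) are recomputed from the lists rather than shown on the slide.

```python
def prune(cands, k):
    """Split candidates into the current top-k and the surviving queue.

    cands: {item: (worstscore, bestscore)}.
    A candidate outside the top-k survives only if its best score can still
    exceed the smallest worst score in the top-k.
    """
    ranked = sorted(cands, key=lambda d: cands[d][0], reverse=True)
    top_k = ranked[:k]
    min_k = cands[top_k[-1]][0]
    queue = [d for d in ranked[k:] if cands[d][1] > min_k]
    return top_k, queue

state = {
    "item 17": (1.6, 1.6),   # seen in all three lists, so both bounds coincide
    "item 83": (1.3, 1.9),
    "item 25": (0.6, 1.4),
    "item 78": (0.5, 1.3),   # recomputed from the lists (assumption, not on the slide)
}
print(prune(state, k=2))
# (['item 17', 'item 83'], ['item 25'])
```

Item 78 drops out because its best score of 1.3 cannot exceed the min top-2 score of 1.3, while item 25 stays in the queue and keeps the scan going.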
6
NRA
Lists 1 to 3 as on slide 2, each scanned to depth 5.

Candidates (final scores)
item 83  1.8
item 17  1.6

Min top-2 score: 1.6
Threshold (max of unseen tuples): 0.8
Pruning candidates: no candidate is left with a best score above min top-2
7
NRA
  • NRA performs only sorted accesses (SA), i.e. No Random Access
  • Random access (RA)
  • looks up the actual (final) score of an item
  • costlier than SA (100 to 100,000 times); cR/cS = (cost of RA)/(cost of SA)
  • often very useful
  • CA (Combined Algorithm) (Fagin et al., 2001)
  • one RA after every cR/cS SAs
  • total cost of SA ≈ total cost of RA
  • Measure of effectiveness (access cost): #SA + cR/cS × #RA
  • Full-merge: compute scores for all items, followed by a partial sort
  • simple and efficient
  • important baseline for any top-k algorithm
  • Problems with NRA, CA
  • high bookkeeping overhead
  • for high values of k, even the gain in access cost is not significant
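
The access-cost measure and the full-merge baseline can be sketched together. This is a hedged illustration: the names `access_cost`, `full_merge`, and `index_list` are hypothetical, summation is assumed as the aggregation function, and item 44 is placed in List 1 as an assumption. The figure of 15 sorted accesses for NRA corresponds to reading 5 rows from each of the 3 example lists.

```python
import heapq
from collections import defaultdict
from decimal import Decimal

def access_cost(num_sa, num_ra, cost_ratio):
    """Access cost in units of one sorted access: #SA + (cR/cS) * #RA."""
    return num_sa + cost_ratio * num_ra

def full_merge(lists, k):
    """Baseline: aggregate every entry of every list, then partially sort for the top k."""
    totals = defaultdict(Decimal)               # Decimal() is 0
    for lst in lists:
        for item, score in lst:
            totals[item] += score
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])

def index_list(*rows):
    """Hypothetical helper: build an index list from (item number, score string) pairs."""
    return [(f"item {n}", Decimal(s)) for n, s in rows]

lists = [
    index_list((25, "0.6"), (78, "0.5"), (83, "0.4"), (17, "0.3"),
               (21, "0.2"), (91, "0.1"), (44, "0.1")),  # item 44 in List 1: assumption
    index_list((17, "0.6"), (38, "0.6"), (14, "0.6"), (5, "0.6"), (83, "0.5"), (21, "0.3")),
    index_list((83, "0.9"), (17, "0.7"), (61, "0.3"), (81, "0.2"), (65, "0.1"), (10, "0.1")),
]

top2 = full_merge(lists, k=2)
print([(d, str(s)) for d, s in top2])
# [('item 83', '1.8'), ('item 17', '1.6')]

full_merge_cost = access_cost(sum(len(lst) for lst in lists), 0, cost_ratio=100)
nra_cost = access_cost(15, 0, cost_ratio=100)   # 5 rows from each of 3 lists, no RA
print(full_merge_cost, nra_cost)
# 19 15
```

On lists this short the saving is small, which is consistent with the point that full-merge is an important baseline.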

8
References
  • Debapriyo Majumdar (Max-Planck-Institut für Informatik, Saarbrücken, Germany). IO-Top-k: Index-access Optimized Top-k Query Processing. Joint work with Holger Bast, Ralf Schenkel, Martin Theobald, Gerhard Weikum.
  • Martin Theobald, Gerhard Weikum, Ralf Schenkel. Top-k Query Evaluation with Probabilistic Guarantees.