Ranking in DB - PowerPoint PPT Presentation


1
Ranking in DB
  • Laks V.S. Lakshmanan
  • Dept. of CS
  • UBC

2
Why ranking in query answering? 1/3
  • Multimedia data: fuzzy querying, e.g., find the top
    2 red objects with a soft texture.

Obj  Score    Obj  Score
D    0.85     A    0.9
B    0.80     D    0.8
A    0.75     C    0.4
E    0.65     B    0.3
C    0.60     E    0.1
Combine scores → overall score
3
Why ranking? 2/3
  • IR: find top 5 documents relevant to
    computational, neuroscience and brain
    theory.
  • IR systems maintain full-text indexes: inverted
    lists of docs w.r.t. each keyword.
  • Same Q/A paradigm as before.
  • Buying a home: several criteria, e.g., price,
    location, area, BRs, school district. ORDER BY
    query in SQL.
  • Finding hotels while traveling.

4
Why ranking? 3/3
  • Data stream, e.g., of network flow data: find 10
    users with the max. BW consumption and max.
    packets communicated. Score may be a complex
    aggregation of these two measures.
  • In a social net, find 5 items tagged as most
    relevant to lawn mowing and belonging to users
    socially close to the seeker.
  • And now, find top-k recs (recommender systems).
  • etc.
  • Fagin et al.'s pioneering papers: PODS '96, '01,
    JCSS 2003. Burgeoned into a field now.
  • Focus on the middleware algorithm which, given a
    score combo. function, computes top-k answers by
    probing diff. subsystems (or ranked lists).

5
Computational model
  • Naïve method.
  • How to compute top-K efficiently?
  • Access methods:
  • Sorted access (sequential access): SA.
  • Random access: RA.
  • Diff. optimization metrics:
  • Overall running time of algorithm.
  • SA < RA (in cost): minimize RAs.
  • RA not possible? Avoid RAs.
  • Combined optimization.
  • Has led to a variety of algorithms.
  • Memory vs. disk model.
  • For the most part, assume score agg. is a
    monotone function; use SUM in examples.

typical in IR systems.
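As a baseline for the naïve method, here is a minimal sketch that scores every object and keeps the K best. It uses the two ranked lists from the "Why ranking" slide with SUM as the aggregation; the names `LISTS` and `naive_top_k` and the scaling of scores by 100 (to keep the arithmetic exact) are assumptions made for illustration.

```python
import heapq

# The two ranked lists from the "Why ranking" slide, scores scaled by 100
# (an illustrative choice so that sums are exact integers).
LISTS = [
    {"D": 85, "B": 80, "A": 75, "E": 65, "C": 60},
    {"A": 90, "D": 80, "C": 40, "B": 30, "E": 10},
]

def naive_top_k(lists, k, t=sum):
    """Score every object with the aggregation t, then keep the k best."""
    objects = set().union(*lists)          # every object in any list
    scored = [(t(lst[obj] for lst in lists), obj) for obj in objects]
    # nlargest orders by (score, name), so ties on score fall back to the name.
    return heapq.nlargest(k, scored)

top2 = naive_top_k(LISTS, 2)
print(top2)
```

The naïve method reads every entry of every list, which is the cost the algorithms on the following slides try to avoid.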
6
Fagin's Algorithm (FA)
  • m lists sorted by descending scores.
  • Access (SA) all lists in parallel.
  • For each new object x seen, fetch its scores from the
    other lists by RA. Overall score t(x) = t(x1, ..., xm).
    Store (obj, score) in set Y.
  • Remember each object seen (under SA) in all lists
    in set H.
  • Repeat until |H| ≥ K.
  • Sort Y in descending order of scores, breaking
    ties arbitrarily, and output top K.
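The steps of FA can be sketched as follows. This is a minimal in-memory illustration, not a reference implementation: random access is simulated with dictionaries, all lists are assumed to rank the same objects, and the data (the two lists from the "Why ranking" slide, scaled by 100) and the names `fagin_top_k`, `seen_in` and `lookup` are assumptions.

```python
LISTS = [
    [("D", 85), ("B", 80), ("A", 75), ("E", 65), ("C", 60)],
    [("A", 90), ("D", 80), ("C", 40), ("B", 30), ("E", 10)],
]

def fagin_top_k(lists, k, t=sum):
    """FA sketch: `lists` holds (object, score) pairs, best first."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]   # stands in for random access (RA)
    seen_in = {}                            # object -> lists where SA met it
    Y = {}                                  # object -> overall score
    H = set()                               # objects seen under SA in all lists
    depth = 0
    while len(H) < k and depth < len(lists[0]):
        for i, lst in enumerate(lists):     # one round of parallel SA
            obj, _ = lst[depth]
            if obj not in Y:                # new object: fetch other scores by RA
                Y[obj] = t(lookup[j][obj] for j in range(m))
            seen_in.setdefault(obj, set()).add(i)
            if len(seen_in[obj]) == m:
                H.add(obj)
        depth += 1
    ranked = sorted(Y.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], depth

top2, depth = fagin_top_k(LISTS, 2)
print(top2, depth)
```

On this data the loop stops after three rounds of sorted access, once both D and A have been seen in both lists; E is never touched.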

7
Example of FA
[Figure: four ranked lists L1 to L4 over objects A to J, each sorted by descending score. The column layout did not survive extraction; the entries are grouped by object below.]
A: 0.65, 0.60, 0.50, 0.30
B: 0.90, 0.85, 0.75, 0.55
C: 0.95, 0.95, 0.80, 0.70
D: 0.80, 0.70, 0.65, 0.40
E: 1.00, 0.85, 0.75, 0.45
F: 0.60, 0.50, 0.45, 0.40
G: 0.95, 0.85, 0.75, 0.60
H: 0.95, 0.90, 0.80, 0.65
I: 0.70, 0.55, 0.50, 0.30
J: 1.00, 0.80, 0.55, 0.30
Y: answers seen in ≥ 1 list (unsorted).
H: answers seen (under SA) in all 4 lists.
8
Example of FA
[Figure: the four lists L1 to L4 as on slide 7; no overall scores computed yet.]
Y: answers seen in ≥ 1 list (unsorted).
H: answers seen (under SA) in all 4 lists.
9
Example of FA
[Figure: the four lists L1 to L4 as on slide 7.]
Y (seen in ≥ 1 list, unsorted): H = 3.30
H (seen under SA in all 4 lists): empty
10
Example of FA
[Figure: the four lists L1 to L4 as on slide 7.]
Y (seen in ≥ 1 list, unsorted): H = 3.30, J = 2.65
H (seen under SA in all 4 lists): empty
11
Example of FA
[Figure: the four lists L1 to L4 as on slide 7.]
Y (seen in ≥ 1 list, unsorted): C = 3.40, E = 3.05, H = 3.30, J = 2.65
H (seen under SA in all 4 lists): empty
12
Example of FA
[Figure: the four lists L1 to L4 as on slide 7.]
Y (seen in ≥ 1 list, unsorted): B = 3.05, C = 3.40, E = 3.05, G = 3.15, H = 3.30, J = 2.65
H (seen under SA in all 4 lists): empty
13
Example of FA
[Figure: the four lists L1 to L4 as on slide 7.]
Y (seen in ≥ 1 list, unsorted): B = 3.05, C = 3.40, D = 2.55, E = 3.05, G = 3.15, H = 3.30, J = 2.65
H (seen under SA in all 4 lists): empty
14
Example of FA
[Figure: the four lists L1 to L4 as on slide 7.]
Y (seen in ≥ 1 list, unsorted): B = 3.05, C = 3.40, D = 2.55, E = 3.05, G = 3.15, H = 3.30, J = 2.65
H (seen under SA in all 4 lists): {H}
15
Example of FA
[Figure: the four lists L1 to L4 as on slide 7.]
Y (seen in ≥ 1 list, unsorted): B = 3.05, C = 3.40, D = 2.55, E = 3.05, G = 3.15, H = 3.30, J = 2.65
H (seen under SA in all 4 lists): {H, G}
16
Example of FA
[Figure: the four lists L1 to L4 as on slide 7.]
Y (seen in ≥ 1 list, unsorted): B = 3.05, C = 3.40, D = 2.55, E = 3.05, G = 3.15, H = 3.30, I = 2.05, J = 2.65
H (seen under SA in all 4 lists): {H, G, B, C}
|H| = 4.
17
FA Example concluded
  • A, F not seen in any list. Yet, we are sure
    they can't make it to top-4. Why?
  • Based on where the cursors are now, what's the
    max. possible score for A, F?
  • What assumptions are being made about t()?
  • FA is shown to be optimal with very high
    probability (Fagin, PODS 1996).
  • But can be beaten by other algorithms on specific
    inputs.
  • What about buffer size?
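One way to see the bound, assuming t is SUM (hence monotone): an object not yet seen under SA sits below the cursor in every list, so each of its scores is at most the score at the cursor. Taking the scores at depth 5 of the same four lists (0.75, 0.75, 0.60, 0.80, the values shown on the threshold bar of the last TA example slide) and the 4th-best overall score found so far (3.05), a worked check is:

```latex
\[
  t(A),\; t(F) \;\le\; t(0.75,\, 0.75,\, 0.60,\, 0.80)
  \;=\; 0.75 + 0.75 + 0.60 + 0.80 \;=\; 2.90 \;<\; 3.05 .
\]
```

The cursors at FA's stopping point are at least this deep, so the true bound can only be lower.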

18
Threshold Algorithm
  • Do parallel SA on all m lists.
  • For each object x seen under SA in a list, fetch
    its scores from other lists by RA and compute
    overall score.
  • If |Buffer| < K, add x to Buffer.
  • Else if score(x) < k-th score in buffer, toss.
  • Else replace bottom of buffer with (x, score(x))
    and re-sort.
  • Stop when threshold ≤ k-th score in buffer.
  • Threshold = t(worst score seen on L1, ..., worst
    score seen on Lm).
  • Output the top-K objects and their scores (in buffer).
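The TA loop can be sketched in a few lines. This is an illustrative in-memory version under stated assumptions: random access is simulated with dictionaries, lists are non-empty and of equal length, ties with the k-th score are tossed, and the data (the two lists from the "Why ranking" slide, scaled by 100) and the name `threshold_top_k` are made up for the example.

```python
LISTS = [
    [("D", 85), ("B", 80), ("A", 75), ("E", 65), ("C", 60)],
    [("A", 90), ("D", 80), ("C", 40), ("B", 30), ("E", 10)],
]

def threshold_top_k(lists, k, t=sum):
    """TA sketch: stop once the threshold drops to the k-th buffered score."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]   # stands in for random access (RA)
    buffer = {}                             # object -> overall score, <= k entries
    for depth in range(len(lists[0])):
        for lst in lists:                   # one round of parallel SA
            obj, _ = lst[depth]
            if obj in buffer:
                continue
            score = t(lookup[j][obj] for j in range(m))
            if len(buffer) < k:
                buffer[obj] = score
            else:
                bottom = min(buffer, key=buffer.get)
                if score > buffer[bottom]:  # otherwise toss x
                    del buffer[bottom]
                    buffer[obj] = score
        # Threshold: aggregate of the worst score seen so far on each list.
        threshold = t(lst[depth][1] for lst in lists)
        if len(buffer) == k and threshold <= min(buffer.values()):
            break
    ranked = sorted(buffer.items(), key=lambda kv: kv[1], reverse=True)
    return ranked, depth + 1

top2, depth = threshold_top_k(LISTS, 2)
print(top2, depth)
```

On this data TA stops at depth 2, where the threshold is 80 + 80 = 160 and both buffered scores are 165, while the buffer never holds more than K entries.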

19
TA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example; no scores computed yet.]
20
TA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example; no scores computed yet.]
21
TA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example.]
Overall scores so far: H = 3.30
Threshold bar: x1 = 0.95, x2 = 1.00, x3 = 0.95, x4 = 1.00
22
TA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example.]
Overall scores so far: C = 3.40, E = 3.05, H = 3.30, J = 2.65
Threshold bar: x1 = 0.95, x2 = 1.00, x3 = 0.95, x4 = 1.00; T = 3.90.
23
TA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example. Scores marked X are crossed out on the slide.]
Overall scores so far: B = 3.05 (X), C = 3.40, E = 3.05, G = 3.15, H = 3.30, J = 2.65 (X)
Threshold bar: x1 = 0.90, x2 = 0.95, x3 = 0.80, x4 = 0.95; T = 3.60.
24
TA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example. Scores marked X are crossed out on the slide.]
Overall scores so far: B = 3.05 (X), C = 3.40, D = 2.55 (X), E = 3.05, G = 3.15, H = 3.30, J = 2.65 (X)
Threshold bar: x1 = 0.85, x2 = 0.85, x3 = 0.70, x4 = 0.90; T = 3.30.
25
TA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example. Scores marked X are crossed out on the slide.]
Overall scores so far: B = 3.05 (X), C = 3.40, D = 2.55 (X), E = 3.05, G = 3.15, H = 3.30, J = 2.65 (X)
Threshold bar: x1 = 0.80, x2 = 0.80, x3 = 0.65, x4 = 0.85; T = 3.10.
26
TA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example. Scores marked X are crossed out on the slide.]
Overall scores so far: B = 3.05 (X), C = 3.40, D = 2.55 (X), E = 3.05, G = 3.15, H = 3.30, J = 2.65 (X)
Threshold bar: x1 = 0.75, x2 = 0.75, x3 = 0.60, x4 = 0.80; T = 2.90 ⇒ can stop!
27
TA Remarks
  •  

28
TA is Instance Optimal
  •  

 
 
 
 
 
29
TA IO Proof (contd.)
  •  

30
Proof (contd.)
  •  

31
Proof (contd.)
  •  

 
 
 
 
 
 
 
 
 
 
32
Proof (contd.)
  •  

33
Proof (concluded)
  •  

34
No Random Access Algorithm
  • What if RA > SA (in cost), or RA isn't allowed?
  • Do SA on all lists in parallel. At depth d:
  • Maintain worst scores x1, ..., xm.
  • x: any object, seen in lists 1, ..., i.
  • Best(x) = t(x1, ..., xi, x_{i+1}, ..., xm): unseen
    positions filled with the worst scores at depth d.
  • Worst(x) = t(x1, ..., xi, 0, ..., 0).
  • TopK contains K objects with max worst scores at
    depth d. Break ties using Best. M = k-th Worst
    score in TopK.
  • Object y is viable if Best(y) > M.
  • Stop when TopK contains ≥ K distinct objects and
    no object outside TopK is viable. Return TopK.
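The bookkeeping above can be sketched as follows, with SUM as t. This is a hedged illustration rather than the algorithm as published: it treats not-yet-seen objects as one extra candidate whose Best is the sum of the current worst scores, it assumes non-empty lists of equal length, and the data (the two lists from the "Why ranking" slide, scaled by 100) and the name `nra_top_k` are assumptions.

```python
LISTS = [
    [("D", 85), ("B", 80), ("A", 75), ("E", 65), ("C", 60)],
    [("A", 90), ("D", 80), ("C", 40), ("B", 30), ("E", 10)],
]

def nra_top_k(lists, k):
    """NRA sketch with SUM: sorted access only, [Worst, Best] per object."""
    m = len(lists)
    known = {}                               # object -> {list index: score}
    for depth in range(len(lists[0])):
        for i, lst in enumerate(lists):      # sorted access only, no RA
            obj, score = lst[depth]
            known.setdefault(obj, {})[i] = score
        x = [lst[depth][1] for lst in lists] # worst score seen on each list
        worst = {o: sum(s.values()) for o, s in known.items()}
        best = {o: worst[o] + sum(x[i] for i in range(m) if i not in s)
                for o, s in known.items()}
        # TopK: largest Worst first, ties broken by Best.
        ranked = sorted(known, key=lambda o: (worst[o], best[o]), reverse=True)
        top_k, rest = ranked[:k], ranked[k:]
        if len(top_k) == k:
            M = worst[top_k[-1]]
            unseen_best = sum(x)             # bound for objects not met yet
            if unseen_best <= M and all(best[o] <= M for o in rest):
                break                        # nothing outside TopK is viable
    return [(o, worst[o], best[o]) for o in top_k], depth + 1

top2, depth = nra_top_k(LISTS, 2)
print(top2, depth)
```

Note that NRA may stop with Worst < Best for some returned objects, so it identifies the top-K set without necessarily knowing every exact score; on this small input both happen to be exact.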

35
NRA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example, annotated with (Worst, Best) per object seen.]
Depth 1: C (0.95, 3.90), E (1.00, 3.90), H (0.95, 3.90), J (1.00, 3.90)
36
NRA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example, annotated with (Worst, Best) per object seen.]
Depth 2: B (0.90, 3.60), C (1.90, 3.75), E (1.00, 3.65), G (0.95, 3.60), H (0.95, 3.65), J (1.80, 3.65)
37
NRA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example, annotated with (Worst, Best) per object seen.]
Depth 3: B (0.90, 3.35), C (1.90, 3.65), D (0.70, 3.30), E (1.85, 3.40), G (1.80, 3.35), H (1.85, 3.40), J (1.80, 3.55)
38
NRA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example, annotated with (Worst, Best) per object seen.]
Depth 4: B (1.75, 3.20), C (2.70, 3.55), D (0.70, 3.15), E (1.85, 3.30), G (1.80, 3.25), H (3.30, 3.30), J (1.80, 3.45)
39
NRA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example, annotated with (Worst, Best) per object seen.]
Depth 5: B (1.75, 3.10), C (2.70, 3.50), D (1.50, 3.00), E (2.60, 3.20), G (3.15, 3.15), H (3.30, 3.30), J (1.80, 3.35)
40
NRA Example
[Figure: the four lists L1 to L4 over objects A to J, as in the FA example, annotated with (Worst, Best) per object seen.]
Depth 6: B (3.05, 3.05), C (3.40, 3.40), D (1.50, 2.95), E (2.60, 3.15), G (3.15, 3.15), H (3.30, 3.30), I (0.70, 2.70), J (1.80, 3.20)
41
NRA Features
  • What sort of t() do we need to assume, for NRA to
    work correctly?
  • How large can the buffers get?
  • How does the amount of bookkeeping compare with
    TA?
  • NRA is instance optimal over algos not making RA
    (and of course, not making wild guesses).

42
Combined optimization
  • What if we are told cost(RA) ??.cost(SA)?
  • Can we find algos better than NRA and TA in this
    case?
  • Combined algorithm: CA. (See Fagin et al.'s
    paper for details.)

43
Worrying about I/O cost
  • Based on Bast et al., VLDB 2006.
  • Inverted lists of (itemID, score) entries in
    desc. score order, as usual, but on disk.
  • Within a block, entries sorted by itemID; across
    blocks, still in desc. score order.
  • ⇒ Inverted Block Index (IBI) Algorithm.
  • What is an IBI?

44
A Motivating Example
List 1         List 2         List 3
Doc17 0.8      Doc25 0.7      Doc83 0.9
Doc78 0.2      Doc38 0.5      Doc17 0.7
...            Doc14 0.5      Doc61 0.3
               Doc83 0.5
               ...
               Doc17 0.2

Round 1 (SA on 1,2,3):
  Doc17 [0.8, 2.4]
  Doc25 [0.7, 2.4]
  Doc83 [0.9, 2.4]
  unseen: 2.4

45
A Motivating Example
[Lists 1 to 3 as on slide 44.]

Round 1 (SA on 1,2,3): Doc17 [0.8, 2.4], Doc25 [0.7, 2.4], Doc83 [0.9, 2.4], unseen: 2.4
Round 2 (SA on 1,2,3): Doc17 [1.5, 2.0], Doc25 [0.7, 1.6], Doc83 [0.9, 1.6], unseen: 1.4
46
A Motivating Example
[Lists 1 to 3 as on slide 44.]

Round 1 (SA on 1,2,3): Doc17 [0.8, 2.4], Doc25 [0.7, 2.4], Doc83 [0.9, 2.4], unseen: 2.4
Round 2 (SA on 1,2,3): Doc17 [1.5, 2.0], Doc25 [0.7, 1.6], Doc83 [0.9, 1.6], unseen: 1.4
Round 3 (SA on 2,2,3!): Doc17 [1.5, 2.0], Doc83 [1.4, 1.6], unseen: 1.0
47
A Motivating Example
[Lists 1 to 3 as on slide 44.]

Round 1 (SA on 1,2,3): Doc17 [0.8, 2.4], Doc25 [0.7, 2.4], Doc83 [0.9, 2.4], unseen: 2.4
Round 2 (SA on 1,2,3): Doc17 [1.5, 2.0], Doc25 [0.7, 1.6], Doc83 [0.9, 1.6], unseen: 1.4
Round 3 (SA on 2,2,3!): Doc17 [1.5, 2.0], Doc83 [1.4, 1.6], unseen: 1.0
Note the deviation from round-robin.
Round 4 (RA for Doc17): Doc17 = 1.7; all others < 1.7 ⇒ done!
48
IBI Algorithm
  • Same setting as NRA/CA, except use IBI.
  • Maintain two lists: Top-K items (T = d1, ..., dk)
    and StillHaveAShot (SHASH) items
    (S = d_{k+1}, ..., d_{k+q}).
  • Pos_i: curr. cursor position on list Li.
  • high_i: score in Li at curr. cursor position
    (upper-bounds the score of unseen items).
  • For items d in S:
  • Which attr. scores are known: E(d).
  • Which attr. scores are unknown: Ē(d).
  • Worst(d) = total score from E(d).
  • Best(d) = Worst(d) + Σ { high_i : i ∈ Ē(d) }.
  • (Exactly as Fagin.)
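A small sketch of the Worst/Best bookkeeping, checked against round 2 of the motivating example. The function name `bounds`, the 0-based list indices and the choice to store scores in tenths (so the sums are exact integers) are assumptions made for illustration.

```python
def bounds(known, high):
    """Return (Worst, Best) for one item.

    known: {list index: score} for the lists where the item was seen, E(d).
    high:  score at the current cursor position of each list (high_i).
    """
    worst = sum(known.values())
    # Unknown attributes can contribute at most high_i each.
    best = worst + sum(h for i, h in enumerate(high) if i not in known)
    return worst, best

# After round 2 of the motivating example, scores in tenths.
high = [2, 5, 7]                       # cursors at Doc78, Doc38, Doc17
doc17 = bounds({0: 8, 2: 7}, high)     # seen in lists 1 and 3
doc25 = bounds({1: 7}, high)           # seen in list 2 only
doc83 = bounds({2: 9}, high)           # seen in list 3 only
unseen = bounds({}, high)              # nothing known yet

print(doc17, doc25, doc83, unseen)
```

The printed pairs correspond to the intervals [1.5, 2.0], [0.7, 1.6], [0.9, 1.6] and the unseen bound 1.4 shown for round 2.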

49
IBI Algorithm (contd.)
  • In each round, compute:
  • min-k = min{Worst(d) : d ∈ T}.
  • bestscore that any unseen doc can have = sum of
    all high_i's.
  • For dj ∈ S: def_j = min-k − Worst(dj); denotes the
    deficit below the qualification level for top-k.
  • T sorted in desc. Worst(); S sorted in desc.
    Best(). Sorting on (score, ItemID) for fast
    processing.
  • Invariant: min-k ≥ max{Worst(d) : d ∈ S}.
  • Termination: when min-k ≥ max{Best(d) : d ∈ S}.
  • Can remove an obj from S whenever its Best <
    min-k. ⇒ Stop when S = ∅.
  • Early termination AND minimal bookkeeping are
    BOTH important for performance.
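The per-round computation can be sketched as below and checked against rounds 3 and 4 of the motivating example with k = 1. This is an assumption-laden illustration: the function name `ibi_round`, the input shape (precomputed (Worst, Best) pairs, scores in tenths) and the extra check on the unseen bestscore are choices made here, not details taken from the slides.

```python
def ibi_round(candidates, high, k):
    """candidates: {doc: (Worst, Best)}, at least k entries; high: cursor scores."""
    ranked = sorted(candidates, key=lambda d: candidates[d][0], reverse=True)
    T = ranked[:k]                          # current top-k by Worst
    min_k = candidates[T[-1]][0]
    # Prune every other doc whose Best is below min-k; the rest form S,
    # kept in descending order of Best.
    S = sorted((d for d in ranked[k:] if candidates[d][1] >= min_k),
               key=lambda d: candidates[d][1], reverse=True)
    deficits = {d: min_k - candidates[d][0] for d in S}
    # Done when neither an unseen doc nor anything left in S can beat min-k.
    done = sum(high) <= min_k and all(candidates[d][1] <= min_k for d in S)
    return T, S, min_k, deficits, done

round3 = {"Doc17": (15, 20), "Doc83": (14, 16), "Doc25": (7, 12),
          "Doc38": (5, 10), "Doc14": (5, 10), "Doc61": (3, 10), "Doc78": (2, 10)}
result3 = ibi_round(round3, [2, 5, 3], 1)

# Round 4: a random access resolves Doc17 to its exact score 1.7.
round4 = {"Doc17": (17, 17), "Doc83": (14, 16)}
result4 = ibi_round(round4, [2, 5, 3], 1)

print(result3)
print(result4)
```

In round 3 only Doc83 survives in S, with a deficit of 0.1; once Doc17's score is known exactly in round 4, S empties and the round reports termination.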

50
More on IBI Framework
  • Instead of scheduling SAs using round-robin (RR), use a
    differential approach for diff. lists based on
    expected score reductions at future cursor
    positions (Knapsack).
  • Do SA + RA.
  • Order RAs based on estimated Prob[dj can get into
    top-k answers].