Boolean + Ranking: Querying a Database by K-Constrained Optimization

About This Presentation
Title:

Boolean + Ranking: Querying a Database by K-Constrained Optimization

Description:

Zhen Zhang, Seung-won Hwang, Kevin C. Chang, Min Wang, Christian A. Lang, Yuan-chi Chang. Presented at the ACM SIGMOD Conference (SIGMOD 2006), Chicago, June 2006.


Transcript and Presenter's Notes

Title: Boolean + Ranking: Querying a Database by K-Constrained Optimization


1
Boolean + Ranking: Querying a Database by
K-Constrained Optimization
  • Zhen Zhang
  • Seung-won Hwang
  • Kevin C. Chang
  • Min Wang
  • Christian A. Lang
  • Yuan-chi Chang
  • Presented at the ACM SIGMOD Conference (SIGMOD 2006),
    Chicago, June 2006

Presented By Pavan Kumar M.K. (1000618890)
Aditya Mangipudi (1000649172)
2
Outline
  • Introduction
  • Motivation
  • A* Search Algorithm
  • A*-Driven State Space Construction
  • Optimization Driven Configuration
  • OPT Search Algorithm
  • Experiments
  • Conclusion

3
Motivation
  • The wide spread of databases for managing
    structured data, compounded with the expanded
    reach of the Internet, has brought forward
    interesting data retrieval and analysis scenarios
    to RDBMS
  • Only the Top-K results are of interest to the
    user.

4
K-Constrained Optimization Query
Query: Select the top-5 2nd-year students in CSE
with the highest GPA
Boolean query: dept = CSE and year = 2
(qualifying constraint)
Ranking query: top 5 ranked by GPA
(quantifying function)
Find top answers:
B: dept = CSE and year = 2
O: GPA
5
K-Constrained Optimization Query
  • Query Q = (G, k)
  • G: goal function
  • G = B · O
  • k: retrieval size

6
What is the query evaluation mechanism?
Ranking query
Boolean query

How to answer?
7
Current techniques lack a global search mechanism
  • If evaluated as separate operators
  • If searched by an overall goal function G as a
    ranking function

[Figure: a Boolean query B and a ranking query R, drawn once for each of the two cases]
  • Current techniques optimize only
    condition-by-condition

8
Threshold Algorithm
[Figure: Att 1 and Att 2]
9
Assumptions
  • The Threshold Algorithm relies on the rigid
    assumption that the goal function G is monotonic.
  • Monotonicity requires that G does not increase
    when all of its parameters decrease.
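A minimal sketch of how a Threshold-Algorithm-style search leans on monotonicity. This is a simplified illustration, not the presenters' code: the function name `threshold_algorithm`, the dictionary layout, and the sample data are all assumptions, and every object is assumed to appear in every attribute list.

```python
import heapq

def threshold_algorithm(lists, score, k):
    """lists: one dict per attribute, mapping object id -> value.
    score: a monotonic aggregate over a list of attribute values."""
    # Sorted access: walk every attribute list in descending value order.
    sorted_lists = [sorted(l.items(), key=lambda kv: kv[1], reverse=True)
                    for l in lists]
    seen = {}
    for depth in range(len(sorted_lists[0])):
        last_seen = []
        for s in sorted_lists:
            obj, value = s[depth]
            last_seen.append(value)
            if obj not in seen:
                # Random access: fetch the object's other attribute values.
                seen[obj] = score([l[obj] for l in lists])
        # Because score is monotonic, no unseen object can beat this value.
        threshold = score(last_seen)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])

# Hypothetical data: two attribute lists over four objects.
att1 = {"a": 9, "b": 8, "c": 3, "d": 1}
att2 = {"a": 2, "b": 7, "c": 9, "d": 4}
print(threshold_algorithm([att1, att2], sum, k=1))  # [('b', 15)]
```

The early stop at the threshold is only safe when the aggregate is monotonic; with a non-monotonic G, an unseen object could still score above the threshold.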

10
Non-Monotonic Functions
  • Consider the example query below, which finds
    houses in a certain price range with a good
    price/sqft ratio.
  • The function G here is non-monotonic.

Select h.address from House h
Where h.price ≤ 200k ∨ h.price ≥ 400k
Order by h.size / |h.price - 300k|
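A small check of why this ranking expression is non-monotonic. The sketch assumes the ordering expression reads size / |price - 300k| (the reading that reproduces the scores on the Query Model slide), with price given in thousands; `ratio_score` and the sample prices are hypothetical.

```python
def ratio_score(size, price_k):
    """Ranking expression size / |price - 300k|, with price in thousands."""
    return size / abs(price_k - 300)

def is_nondecreasing(values):
    return all(a <= b for a, b in zip(values, values[1:]))

def is_nonincreasing(values):
    return all(a >= b for a, b in zip(values, values[1:]))

# Hold size fixed at 2000 and raise the price step by step.
prices = [100, 200, 400, 500]
scores = [ratio_score(2000, p) for p in prices]
print(scores)  # [10.0, 20.0, 20.0, 10.0]

# The score rises while the price approaches 300k and falls afterwards,
# so it is neither nondecreasing nor nonincreasing in price.
print(is_nondecreasing(scores), is_nonincreasing(scores))  # False False
```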
11
New Algorithm
[Figure: Att 1 and Att 2]
12
Need for encoding as a search problem
  • Existing algorithms build upon their
    problem-specific assumptions on the goal
    functions or index traversals.
  • For example, the Threshold Algorithm assumes the
    monotonicity of G and the use of sorted accesses
    (interleaf navigation), on which the search is
    implicitly hardwired.
  • In a Boolean query such as B: price > 100K, the
    search is straightforward because the constraint
    expression B explicitly suggests how to carry
    out a focused search, e.g., visiting only the
    nodes whose locality potentially satisfies B.

13
Need for encoding as a search problem
  • In contrast, for a general k-constrained
    optimization query, potentially involving
    arbitrary ranking combined with Boolean
    conditions and joins over multiple relations
    (e.g., a query Q maximizing the size/price
    ratio), it is no longer clear how to focus the
    search.
  • By encoding into a generic search with no
    assumptions on G, the search is generalized to
    support arbitrary G over potentially multiple
    indices and a combination of both hierarchical
    and interleaf traversals.

14
A* Algorithm
  • A* is a well-known search algorithm that finds
    the shortest path, given an initial state and a
    designated goal state.
  • Widely used in the field of Artificial
    Intelligence.
  • Uses best-first search traversal.
  • Uses heuristic information to carry out the
    search in a guided manner.
  • A* is guaranteed to find the correct answer
    (correctness) by visiting the least number of
    states (optimality).
  • Examples: GPS, Google Maps, many puzzles, games, etc.
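The bullets above can be sketched as a generic best-first search driven by a priority queue. This is a textbook-style A* on a tiny hypothetical graph, not code from the presentation; the names `a_star`, `GRAPH`, and `H` are assumptions, and the heuristic values were chosen so that they never overestimate the remaining distance.

```python
import heapq

def a_star(graph, h, start, goal):
    """graph: node -> list of (neighbor, edge_cost).
    h: node -> optimistic estimate of the remaining cost to goal."""
    # Queue entries are (f = g + h, g, node, path so far).
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to node was found later
        for nxt, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + h[nxt], new_g, nxt, path + [nxt]))
    return None

# Hypothetical graph and admissible heuristic.
GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 1)]}
H = {"S": 3, "A": 2, "B": 1, "G": 0}
print(a_star(GRAPH, H, "S", "G"))  # (4, ['S', 'A', 'B', 'G'])
```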

15
Goal Function
  • For a tuple t with m attribute values, Goal
    Function G(t) maps the tuple to a positive
    numeric score.

G(t) = B(t) · R(t)
     = R(t) if B(t) is true
     = 0 (i.e., the lowest score) if B(t) is false
16
Query Model
#   Addr                 Price   Size   Score
1   Oak park, Chicago    600K    4500   15
2   Mattis, Champaign    350K    2000   0
3                        150K    1000   6.67
4                        250K    2000   0
5                        300K    3500   0
6                        80K     500    2.27
Select h.address from House h
Where h.price ≤ 200k ∨ h.price ≥ 400k
Order by h.size / |h.price - 300k|
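The Score column can be reproduced with a short sketch of G(t) = B(t) · R(t). It assumes the query reads "price ≤ 200k or price ≥ 400k, ordered by size / |price - 300k|", which is the reading consistent with the six scores listed on this slide; the names `HOUSES`, `B`, `R`, `G`, and `top_k` are illustrative only.

```python
import heapq

# (tuple id, price in thousands, size), copied from the slide's table.
HOUSES = [(1, 600, 4500), (2, 350, 2000), (3, 150, 1000),
          (4, 250, 2000), (5, 300, 3500), (6, 80, 500)]

def B(price_k):
    """Boolean condition (assumed reading of the Where clause)."""
    return price_k <= 200 or price_k >= 400

def R(size, price_k):
    """Ranking expression (assumed reading of the Order by clause)."""
    return size / abs(price_k - 300)

def G(size, price_k):
    # B is tested first, so R is never evaluated at price = 300k.
    return R(size, price_k) if B(price_k) else 0.0

def top_k(k):
    """Brute-force baseline: score every tuple, keep the k best ids."""
    best = heapq.nlargest(k, HOUSES, key=lambda h: G(h[2], h[1]))
    return [h[0] for h in best]

print([round(G(size, price), 2) for _, price, size in HOUSES])
# [15.0, 0.0, 6.67, 0.0, 0.0, 2.27]
print(top_k(2))  # [1, 3]
```

This scan touches every tuple; the rest of the presentation is about reaching the same answers through the indices while visiting fewer states.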
17
Landscape of Score Function - G
[Figure: landscape of G for the same six tuples and scores as on the previous slide]
18
OPT Framework
  • To realize k-constrained optimization over
    databases, this paper develops the OPT
    framework.
  • Objective: optimize G with the help of indices
    as access methods over the tuples in D.
  • Discrete state search: from the view of using
    indices, we search for the maximizing tuples
    over the index nodes as discrete states.
  • Continuous function optimization: from the view
    of maximizing goal functions, we optimize G.

19
Evaluate query as its nature suggests!
Function optimization of G
Optimize G over D
Discrete state search over D
20
B Tree Structure
Indices
Value Space
21
Some definitions first..
  • States: states in a search graph represent
    localities of values at different granularities,
    from coarse to fine, eventually reaching tuples
    in the database.
  • Region state
  • Tuple state
  • Transitions: while states give locations on the
    map, transitions capture the possible paths
    followed to reach our destination, the query
    answers.
  • Example: for two states u and v, there is a
    transition (u, v) if v ∈ Next(u)

22
We view compound index as discrete space
[Figure: tuples 1-6 plotted by price (k) against size; price axis ticks at 100, 250, 350, 600; size axis ticks at 1500, 3000, 4000, 4500]
23
We view compound index as discrete space
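Mij = (ai, bj)
[Figure: the price (k) vs. size plot of tuples 1-6, overlaid with index nodes on price (b1, b2, b3, b6, b7) and on size (a1, a2, a3, a6, a7). Each pair (ai, bj) forms a region state Mij; the figure shows M11, M22, M23, M32, M33, M55, M56, M66, M67, M75, M76, M77. Price ranges shown: 0-250, 250-600, 0-100, 100-250, 250-350, 350-600. Size ranges shown: 0-3000, 3000-4500, 0-1500, 1500-3000, 3000-4000, 4000-6000.]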
24
We view compound index as discrete space
conceptually, combined space
Mij = (ai, bj)
[Figure: the same index nodes (a1-a7 on size, b1-b7 on price) and region states Mij as on the previous slide, drawn as one combined space]
25
  • Challenge 1: What is the search mechanism?

26
Encoding the problem into shortest path is
challenging
  • K-constrained optimization: find a tuple with
    maximal score
  • A* shortest path: find a path with minimal
    distance
  • A* gives the shortest path to a testable goal.
  • The goal is to find optimal tuple states with
    maximal G-score.
27
Transformation needed.
  • How to encode a tuple as a path?
  • Add a virtual target t* reachable only through
    tuples
  • How to encode the maximal tuple as the minimal path?
  • The quality of a path depends solely on the tuple
    it passes through
  • For a tuple state t:
  • D(t, t*) = -G(t)
  • For two states r, u:
  • D(r, u) = 0

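[Figure: search graph from M11 through region states (M22, M23, M32, M33, then M55, M56, M66, M67, M75, M76, M77) to tuple states 1, 5, 4, 2. Transitions between states are labeled 0; tuples link to the virtual target t* with labels such as -G(1) and -G(4).]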
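A sketch of this encoding, under stated assumptions: transitions between states cost 0, and each tuple state links to a virtual target (written `"t*"` here) with distance -G(t). The two-level hierarchy in `NEXT` is hypothetical and far smaller than the index in the slides; the scores are the ones from the Query Model slide. Since the distances are negative, the sketch uses a plain recursive minimum over an acyclic graph rather than a standard shortest-path routine.

```python
TARGET = "t*"  # virtual target, reachable only through tuple states

# G-scores of the six tuples, as listed on the Query Model slide.
SCORE = {1: 15.0, 2: 0.0, 3: 6.67, 4: 0.0, 5: 0.0, 6: 2.27}

# Hypothetical region hierarchy: region state -> Next(region).
NEXT = {
    "root": ["low_price", "high_price"],
    "low_price": [3, 6, 4],
    "high_price": [1, 2, 5],
}

def successors(u):
    # A tuple state only leads to the virtual target.
    return [TARGET] if u in SCORE else NEXT.get(u, [])

def distance(u, v):
    # D(t, t*) = -G(t) for a tuple state t; D(r, u) = 0 otherwise.
    return -SCORE[u] if v == TARGET else 0.0

def shortest_path(u):
    """Minimal-distance path from u to the target over the acyclic graph."""
    if u == TARGET:
        return 0.0, [TARGET]
    best = None
    for v in successors(u):
        d, path = shortest_path(v)
        cand = (distance(u, v) + d, [u] + path)
        if best is None or cand[0] < best[0]:
            best = cand
    return best

print(shortest_path("root"))  # (-15.0, ['root', 'high_price', 1, 't*'])
```

The minimal distance is the negated maximal score, so the tuple on the shortest path (tuple 1 here) is the top answer.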
28
  • Challenge 2: How to guide the search?

29
Functional Optimization perspective
  • Function optimization measures quality of states
  • Function optimization aspects
  • Defines Proper Heuristics
  • Identifies a set of initial states to start
    search.

30
Structure of Procedure OPT
  • Input: G(x1, ..., xm) and domain of values dom:
    xi ∈ [xi1, xi2]
  • Output: <O, U> = OPT(G, dom)
  • where O gives the local optima
  • U = upper bound score
  • OPTPOINT gives the O component of OPT
  • OPTMAX gives the U component of OPT
  • Approaches
  • Analytical method
  • Search based (e.g., hill climbing)
  • Template based

31
States and Transitions
[Figure legend: High, Medium, Low]
The figure illustrates that different states hold
different promise. The search should favor M77
over M67 because it is more promising.
32
1. Define admissible heuristics: measure the tightest
upper bound
  • To guarantee completeness
  • A* requires admissible heuristics, i.e., estimate
    optimistically
  • To ensure admissible heuristics
  • Function optimization gives the tightest upper bound
  • Analytical approaches
  • Numeric analysis package

H(region) = OPTMAX(G, region), i.e., the maximal value
of G in the region
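For the house query, such an upper bound can be worked out analytically. The sketch below is an assumption-laden illustration, not the paper's OPTMAX: it assumes G = size / |price - 300k| when price ≤ 200k or price ≥ 400k and 0 otherwise, prices in thousands, non-negative sizes, and rectangular regions given as (low, high) ranges. The name `opt_max` and the sample regions are hypothetical.

```python
def opt_max(size_range, price_range):
    """Maximal value of G over the box size_range x price_range."""
    (s_lo, s_hi), (p_lo, p_hi) = size_range, price_range
    # Feasible prices closest to 300k, one per side of the excluded band.
    closest = []
    if p_lo <= 200:
        closest.append(min(p_hi, 200))
    if p_hi >= 400:
        closest.append(max(p_lo, 400))
    if not closest:
        return 0.0  # every price in the box fails the Boolean condition
    # G grows with size and shrinks with the gap |price - 300|,
    # so the maximum pairs the largest size with the smallest gap.
    min_gap = min(abs(p - 300) for p in closest)
    return s_hi / min_gap

print(opt_max((3000, 4500), (350, 600)))  # 45.0
print(opt_max((0, 3000), (250, 350)))     # 0.0
```

The bound is optimistic by construction: the first region contains a tuple with size 4500 and price 600k, whose actual score of 15 lies below the bound of 45.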
33
Consider Example
[Figure: the price vs. size plot of tuples 1-6, with regions M67 and M77 marked]
  • h(M67) gives U = 0
  • However, if we follow the link from M67 to M77, we
    can reach tuple 1 with score 15.

34
2. Configure descending space: disconnect uphills
  • To guarantee optimality
  • A* requires descending heuristics
  • To ensure descending heuristics
  • Remove uphill links

[Figure: region states M11, M55, M56, M66, M67, M75, M76, M77 and tuple states 4, 1, 5, 2]
35
Find the right start point: start from local optima
  • To guarantee correctness
  • Every tuple state must be reachable from the start
    states
  • Taking only downhill links requires starting at
    high points
  • To ensure reachability
  • Initial states should contain all local optima

[Figure: region states M11, M55, M56, M66, M67, M75, M76, M77 and tuple states 4, 1, 2, 5]
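The two configuration steps (remove uphill links, then start from local optima) can be sketched on a handful of neighboring region states. Only h(M67) = 0 comes from the slides; the other heuristic values, the neighbor pairs, and the function name `configure_descending` are hypothetical.

```python
def configure_descending(h, neighbor_pairs):
    """Keep only links that do not go uphill, and pick the start states.

    h: state -> upper-bound score. neighbor_pairs: undirected links.
    A state is a start state if no neighbor has a strictly higher bound.
    """
    downhill = []
    has_higher_neighbor = set()
    for u, v in neighbor_pairs:
        if h[u] >= h[v]:
            downhill.append((u, v))
        if h[v] >= h[u]:
            downhill.append((v, u))
        if h[v] > h[u]:
            has_higher_neighbor.add(u)
        if h[u] > h[v]:
            has_higher_neighbor.add(v)
    starts = sorted(s for s in h if s not in has_higher_neighbor)
    return downhill, starts

# Hypothetical bounds; h(M67) = 0 is the only value taken from the slides.
h = {"M66": 20, "M67": 0, "M76": 10, "M77": 45}
pairs = [("M66", "M67"), ("M67", "M77"), ("M66", "M76"), ("M76", "M77")]
links, starts = configure_descending(h, pairs)
print(starts)                   # ['M66', 'M77']
print(("M67", "M77") in links)  # False: the uphill link is removed
```

With the uphill link M67 to M77 gone, M77 is only reachable if it is itself a start state, which is why the initial states need to cover every local optimum.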
36
Putting it together: executing A* on the
configured space
[Figure: top-down search from M11 through M22, M23, M32, M33 to M55, M56, M57, M66, M67, M75, M76, M77 and tuple states 4, 1, 5, 2]
  • Search is implemented as priority queue driven
    traversal
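A minimal sketch of such a priority-queue-driven traversal for the house query. It is not the OPT algorithm itself: the region hierarchy is a hypothetical stand-in for the index (the tuples are sorted by price and split in half repeatedly), the query is assumed to read "price ≤ 200k or price ≥ 400k, ranked by size / |price - 300k|", and the names `goal`, `upper_bound`, and `top_k` are illustrative.

```python
import heapq
from itertools import count

# Tuple id -> (price in thousands, size), from the Query Model slide.
HOUSES = {1: (600, 4500), 2: (350, 2000), 3: (150, 1000),
          4: (250, 2000), 5: (300, 3500), 6: (80, 500)}

def goal(price, size):
    feasible = price <= 200 or price >= 400
    return size / abs(price - 300) if feasible else 0.0

def upper_bound(ids):
    """Optimistic bound for a region, taken over its bounding box."""
    prices = [HOUSES[i][0] for i in ids]
    sizes = [HOUSES[i][1] for i in ids]
    p_lo, p_hi = min(prices), max(prices)
    gaps = []
    if p_lo <= 200:
        gaps.append(300 - min(p_hi, 200))
    if p_hi >= 400:
        gaps.append(max(p_lo, 400) - 300)
    return max(sizes) / min(gaps) if gaps else 0.0

def top_k(k):
    tie = count()  # tie-breaker so the heap never compares payloads
    root = sorted(HOUSES, key=lambda i: HOUSES[i][0])
    # heapq is a min-heap, so priorities are stored negated.
    queue = [(-upper_bound(root), next(tie), "region", root)]
    results = []
    while queue and len(results) < k:
        neg, _, kind, payload = heapq.heappop(queue)
        if kind == "tuple":
            # Its exact score is at least every bound still queued.
            results.append((payload, -neg))
        elif len(payload) == 1:
            t = payload[0]
            heapq.heappush(queue, (-goal(*HOUSES[t]), next(tie), "tuple", t))
        else:
            mid = len(payload) // 2
            for child in (payload[:mid], payload[mid:]):
                heapq.heappush(queue,
                               (-upper_bound(child), next(tie), "region", child))
    return results

print([t for t, _ in top_k(2)])  # [1, 3]
```

Because every region's priority is an optimistic bound, the search can stop as soon as k tuples have been popped, without expanding the remaining regions.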

37
Need of States and Transitions
  • Example: given a set of states constructed from
    the set of index graphs I, the search should, in
    principle, follow those transitions to
    look for the tuple states maximizing the goal
    function. The search may follow the paths:
  • M11 → M33 → M77 → 1 (top-down search)
  • M57 → M77 → 1 (bottom-up search)

38
OPT Search Algorithm
[Figure: region states M11, M55, M56, M66, M67, M75, M76, M77 and tuple states 4, 1, 2, 5]
39
Optimality of OPT
  • OPT may incur different costs if started at
    different initial states.
  • Top-down -> more hops; bottom-up -> fewer hops
  • Preference goes to bottom-up, but what if:
  • Goal function G = 1/((X-Y)^2 + 1): any value
    satisfying X = Y maximizes the function.
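A quick numeric check of this last case, assuming the garbled formula reads G = 1/((X-Y)^2 + 1). The grid of integer values is hypothetical and only serves to count the maximizers.

```python
def g(x, y):
    """Assumed reading of the goal function: 1 / ((x - y)^2 + 1)."""
    return 1.0 / ((x - y) ** 2 + 1.0)

grid = range(5)  # hypothetical integer domain for X and Y
best = max(g(x, y) for x in grid for y in grid)
maximizers = [(x, y) for x in grid for y in grid if g(x, y) == best]

print(best)        # 1.0
print(maximizers)  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```

Every point on the diagonal is an optimum, so a bottom-up search seeded with all optima may face a number of start states that grows with the domain.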

40
Experiments
  • Comparison vs.
  • Boolean then ranking
  • Ranking then Boolean
  • Metrics: nodes accessed, Nl + Nt
  • Settings
  • Benchmark queries over real dataset
  • Controlled queries over synthetic dataset

41
Benchmark queries
  • Datasets
  • 19,706 real estate listings crawled online
  • Queries
  • Q1: size * bedrms / |price - 450k|,
    40k < price < 50k
  • Q2: size * e^bedrms / |price - 350k|,
    price < 400k, size > 4000
  • Q3: size/price, bedrms = 3 ∨ bedrms = 4

[Figures: results for Q1, Q2, and Q3]
42
Controlled queries
  • Datasets
  • Three randomly generated datasets of 100k points
  • Uniform, gaussian, logvariatenormal
  • Queries
  • Linear average queries (e.g., 0.4a + 0.6b)
  • Nearest neighbor queries (e.g., (x-3)^2 + (y-4)^2)
  • Join queries (0.4R.a + 0.6S.b, R.c = R.d)
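A sketch of how datasets and queries of this kind could be set up with Python's standard library. The distribution parameters, the dataset size, and the function names are assumptions; the slides do not give them. The nearest-neighbor score is negated here so that a smaller distance ranks higher, which is also an assumption about how that query is meant to be ranked.

```python
import heapq
import random

def make_dataset(kind, n, seed=0):
    """Generate n two-attribute points; all parameters are hypothetical."""
    rng = random.Random(seed)
    draw = {
        "uniform": lambda: rng.uniform(0.0, 1.0),
        "gaussian": lambda: rng.gauss(0.5, 0.15),
        "lognormal": lambda: rng.lognormvariate(0.0, 0.5),
    }[kind]
    return [(draw(), draw()) for _ in range(n)]

def linear_average(point):
    a, b = point
    return 0.4 * a + 0.6 * b

def nearest_neighbor(point):
    x, y = point
    # Negated squared distance to (3, 4): closer points score higher.
    return -((x - 3) ** 2 + (y - 4) ** 2)

def top_k(points, score, k):
    """Brute-force baseline: scan all points and keep the k best."""
    return heapq.nlargest(k, points, key=score)

points = make_dataset("uniform", 1000)
for p in top_k(points, linear_average, 3):
    print(p, linear_average(p))
```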

43
Conclusion
  • Problem
  • Study K-constrained optimization queries as
    Boolean + ranking
  • Abstraction
  • Encode K-constrained optimization into shortest
    path problem
  • Framework
  • Develop OPT to process K-constrained optimization

44
  • References
  • Boolean + Ranking: Querying a Database by
    K-Constrained Optimization. Z. Zhang, S. Hwang,
    K. C.-C. Chang, M. Wang, C. Lang, and Y. Chang.
    In Proceedings of the 2006 ACM SIGMOD Conference
    (SIGMOD 2006), pages 359-370, Chicago, June 2006.
  • www.wikipedia.org

45
Thank you!
  • Questions?