Title: Query Paradigms: Model and Semantics
1. Query Paradigms: Model and Semantics
2. What are the scenarios?
- Information retrieval
- Multimedia
- Relational database
3. TR vs. Database Retrieval
- Information
  - Unstructured/free text vs. structured data
  - Ambiguous vs. well-defined semantics
- Query
  - Ambiguous vs. well-defined semantics
  - Incomplete vs. complete specification
- Answers
  - Relevant documents vs. matched records
- TR is an empirically defined problem!
4. Document Selection vs. Ranking
[Figure: document selection uses a binary f(d,q) in {0,1} to carve out an answer set R(q), compared against the true R(q); document ranking uses a real-valued f(d,q) to order documents by score, e.g. 0.98 d1, 0.95 d2, 0.83 d3, 0.80 d4, 0.76 d5, 0.56 d6, 0.34 d7, 0.21 d8, 0.21 d9, ...]
5. TR System Architecture
[Figure: docs go through INDEXING to a Doc Rep; the User's query is turned into a Query Rep; a Ranking component matches the two and returns results to the User]
6. Relevance Feedback
7. Pseudo/Blind/Automatic Feedback
[Figure: a Query is run by the Retrieval Engine over the Document collection; the top-scored results (d1 3.5, d2 2.4, ..., dk 0.5) are treated as if judged relevant and fed back to produce an Updated query]
8. VS Model Illustration
9. Relevance Feedback in VS
- Basic setting: learn from examples
  - Positive examples: docs known to be relevant
  - Negative examples: docs known to be non-relevant
  - How do you learn from this to improve performance?
- General method: query modification
  - Adding new (weighted) terms
  - Adjusting weights of old terms
  - Doing both
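Query modification of this kind is commonly done with a Rocchio-style update: scale the original query vector, add the centroid of the positive examples, and subtract the centroid of the negative ones. A minimal sketch, assuming term vectors are plain dicts; the alpha/beta/gamma defaults are illustrative, not values from the slides:

```python
def rocchio(query, pos_docs, neg_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query vector (dict: term -> weight).

    Adds new weighted terms from feedback docs and adjusts weights
    of old terms, covering both options on the slide.
    """
    new_q = {t: alpha * w for t, w in query.items()}
    for docs, sign, coef in ((pos_docs, +1, beta), (neg_docs, -1, gamma)):
        if not docs:
            continue
        for doc in docs:
            for t, w in doc.items():
                new_q[t] = new_q.get(t, 0.0) + sign * coef * w / len(docs)
    # Drop terms whose weight fell to zero or below (common practice).
    return {t: w for t, w in new_q.items() if w > 0}
```

For example, positive feedback containing "sunset" adds that term to a query that originally only had "ocean", while boosting the weight of "ocean" itself.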
10. Probability Model
- What is the probability that THIS document is relevant to THIS query?
11. Feedback as Model Interpolation
[Figure: a generative model scores Document D against Query Q to produce Results; the feedback docs F = {d1, d2, ..., dn} are interpolated into the query model]
12. Examples of Information Filtering
- News filtering
- Email filtering
- Movie/book recommenders
- Literature recommenders
- And many others
13. Content-based Filtering vs. Collaborative Filtering
- Basic filtering question: will user U like item X?
- Two different ways of answering it
  - Look at what U likes => characterize X => content-based filtering
  - Look at who likes X => characterize U => collaborative filtering
- Can be combined
14. Adaptive Information Filtering
- Stable, long-term interest; dynamic info source
- System must make a delivery decision immediately as a document arrives
[Figure: arriving documents flow through the Filtering System]
15. Score Distribution Approaches (Arampatzis & van Hameren 01; Zhang & Callan 01)
- Assume a generative model of scores p(s|R), p(s|N)
- Estimate the model with training data
- Find the threshold by optimizing the expected utility under the estimated model
- Specific methods differ in the way of defining and estimating the score distributions
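The threshold search can be sketched as follows. This is a minimal illustration that assumes a normal model for both p(s|R) and p(s|N); the cited papers also use other families (e.g. an exponential model for non-relevant scores), and the credit/penalty utility weights here are made-up defaults:

```python
from statistics import NormalDist, mean, stdev

def choose_threshold(rel_scores, nonrel_scores, credit=3.0, penalty=2.0):
    """Fit score models from training data, then pick the delivery
    threshold that maximizes expected utility under those models."""
    model_r = NormalDist(mean(rel_scores), stdev(rel_scores))      # p(s|R)
    model_n = NormalDist(mean(nonrel_scores), stdev(nonrel_scores))  # p(s|N)
    n_r, n_n = len(rel_scores), len(nonrel_scores)

    def expected_utility(theta):
        # Deliver everything scoring above theta: expected credit for
        # relevant deliveries minus expected penalty for non-relevant ones.
        return (credit * n_r * (1.0 - model_r.cdf(theta))
                - penalty * n_n * (1.0 - model_n.cdf(theta)))

    # Scan the training scores as candidate thresholds.
    return max(sorted(rel_scores + nonrel_scores), key=expected_utility)
```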
16. What is Collaborative Filtering (CF)?
- Making filtering decisions for an individual user based on the judgments of other users
- Inferring an individual's interests/preferences from those of other, similar users
- General idea
  - Given a user u, find similar users u1, ..., um
  - Predict u's preferences based on the preferences of u1, ..., um
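The general idea above (find similar users, then predict from their preferences) can be sketched as user-based CF with cosine similarity over co-rated items. The similarity measure and neighborhood size are illustrative choices, not mandated by the slides:

```python
import math

def predict_rating(ratings, user, item, k=2):
    """ratings: dict user -> dict item -> float.
    Predict `user`'s rating for `item` from the k most similar users."""
    def cosine(a, b):
        common = set(a) & set(b)
        if not common:
            return 0.0
        num = sum(a[i] * b[i] for i in common)
        den = (math.sqrt(sum(a[i] ** 2 for i in common))
               * math.sqrt(sum(b[i] ** 2 for i in common)))
        return num / den if den else 0.0

    # Step 1: find users similar to `user` who have rated `item`.
    neighbors = [(cosine(ratings[user], r), r)
                 for u, r in ratings.items() if u != user and item in r]
    neighbors.sort(key=lambda p: p[0], reverse=True)
    top = [p for p in neighbors[:k] if p[0] > 0]
    if not top:
        return None
    # Step 2: predict as the similarity-weighted average of their ratings.
    return sum(sim * r[item] for sim, r in top) / sum(sim for sim, _ in top)
```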
17. CF Assumptions
- Users with a common interest will have similar preferences
- Users with similar preferences probably share the same interest
- Examples
  - interest is IR => favor SIGIR papers
  - favor SIGIR papers => interest is IR
- A sufficiently large number of user preferences are available
18. Rating-based vs. Preference-based
- Rating-based: users' preferences are encoded using numerical ratings on items
  - Complete ordering
  - Absolute values can be meaningful
  - But values must be normalized to combine
- Preference-based: users' preferences are represented by a partial ordering of items
  - Partial ordering
  - Easier to exploit implicit preferences
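The normalization point for rating-based preferences is usually handled by removing each user's rating bias before combining. A minimal sketch using per-user mean-centering (z-scoring is another common option):

```python
def normalize_user_ratings(ratings):
    """ratings: dict user -> dict item -> float.
    Mean-center each user's ratings so that a habitual 5-star rater and
    a habitual 3-star rater become comparable before combining."""
    out = {}
    for user, items in ratings.items():
        user_mean = sum(items.values()) / len(items)
        out[user] = {item: r - user_mean for item, r in items.items()}
    return out
```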
19. A Formal Framework for Rating
[Figure: a ratings matrix with users U = {u1, u2, ..., ui, ..., um} as rows and objects O = {o1, o2, ..., oj, ..., on} as columns; some cells hold known ratings (e.g. 3, 1.5, 2, 1), others are unknown: X_ij = f(u_i, o_j)?]
The task:
- Assume known f values for some (u,o) pairs
- Predict f values for other (u,o) pairs
- Essentially approximation of the unknown function f: U x O -> R, like other learning problems
20. Optimizing Queries Over Multimedia Repositories
- Surajit Chaudhuri, Hewlett-Packard Laboratories
- Luis Gravano, Stanford University
21. Our objects have multiple complex attributes
- Object: e.g. an image captioned "Sunset on the Atlantic Ocean"
- Attributes: caption, color histogram of image, texture of image [Fagin96]
22. Attribute handling differs from traditional systems
- Attribute values v1, v2 have a grade of match Grade(v1,v2) (between 0 and 1)
- Attributes can only be accessed through indexes
23. Queries have two main components
- SELECT oid
- FROM Repository
- WHERE Filter_Condition
- ORDER k BY Ranking_Expression
24. The filter condition selects the objects that are good enough
- Example: Grade(color_histogram, reddish_ch) > 0.7
- Only objects with images that are red enough are in the answer.
25. The ranking expression orders the objects that are good enough
- Example: Grade(caption, "ocean sunset")
- Objects are sorted based on how well their caption matches the keywords "ocean" and "sunset".
26. Users request the top k objects that are good enough
- SELECT oid
- FROM Repository
- WHERE Grade(color_histogram, reddish_ch) > 0.7
- ORDER 10 BY Grade(caption, "ocean sunset")
27. Filter conditions and ranking expressions can be complex
- A filter condition is built from atomic conditions using Ands and Ors
- A ranking expression is built from atomic expressions using Mins and Maxes
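Under the fuzzy semantics standard in this line of work, And over grades in [0,1] is Min and Or is Max, so a complex expression composes directly. A minimal sketch; the attribute names and grades below are illustrative stand-ins for index lookups:

```python
# Capitalized names avoid clashing with Python's `and`/`or` keywords.
def And(*grades):
    """Fuzzy conjunction over grades in [0, 1]: the minimum."""
    return min(grades)

def Or(*grades):
    """Fuzzy disjunction over grades in [0, 1]: the maximum."""
    return max(grades)

def top_k(objects, grade_fn, k):
    """Rank objects by a composed grade expression; return the top k."""
    return sorted(objects, key=grade_fn, reverse=True)[:k]
```

For example, `lambda o: And(o["color"], Or(o["caption"], o["texture"]))` ranks objects by a conjunction of a color grade with the better of two other grades.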
28. We gain expressivity by having both query components
- Filter conditions plus ranking expressions are more expressive than ranking expressions alone
29. Attributes can only be accessed through indexes
- Indexes support:
  - Search: gets all objects with a minimum grade of match for an attribute value
  - Probe: gets the grade of an object for an attribute value
30. We optimize the processing of the filter condition
- Example: a And b And c
- An execution:
  - Get all objects that satisfy a (through a search)
  - Check if objects satisfy b And c (through probes)
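The search/probe interface and the search-then-probe plan above can be sketched as follows. The in-memory dict backing is a hypothetical stand-in for a real attribute index:

```python
class AttributeIndex:
    """Hypothetical index for one attribute value, exposing the two
    access methods from slide 29: search and probe."""
    def __init__(self, grades):
        self._grades = grades  # oid -> grade of match in [0, 1]

    def search(self, min_grade):
        """Search: all objects with at least the given grade of match."""
        return {oid for oid, g in self._grades.items() if g >= min_grade}

    def probe(self, oid):
        """Probe: the grade of one object for this attribute value."""
        return self._grades.get(oid, 0.0)

def evaluate_and(search_index, probe_indexes, min_grade):
    """Slide 30's plan for `a And b And c`: search on one atomic
    condition, then probe the rest, combining grades with Min."""
    result = {}
    for oid in search_index.search(min_grade):
        grade = min([search_index.probe(oid)]
                    + [ix.probe(oid) for ix in probe_indexes])
        if grade >= min_grade:
            result[oid] = grade
    return result
```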
31. We process the ranking expression as a modified filter condition
- We use selectivity estimates to map a ranking expression into a filter condition
- We process the new filter condition
- We output the top objects for the original ranking expression
32. We propagate k down to the atomic expressions
- Top object for the ranking expression (k = 1): Min(Grade(a1,v1), Grade(a2,v2))
- Objects in repository: 100
- Then, objects needed from each subexpression [Fagin96]: 10
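The slide's numbers match Fagin's expected-cost bound for a Min of m independent atomic expressions: roughly N^((m-1)/m) * k^(1/m) objects are needed from each sorted subexpression stream to find the top k overall. A one-line sketch of that arithmetic:

```python
def objects_needed(n_objects, k, m):
    """Fagin's [Fagin96] expected number of objects needed from each of
    m independent sorted streams to answer a Min-combined top-k query
    over n_objects: about N^((m-1)/m) * k^(1/m)."""
    return n_objects ** ((m - 1) / m) * k ** (1 / m)
```

With k = 1, 100 objects, and m = 2 subexpressions this gives sqrt(100) = 10, as on the slide.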
33. Evaluating Top-K Selection Queries
- Surajit Chaudhuri, Microsoft Research
- Luis Gravano, Columbia University
34. Top-K Queries over Precise Relational Data
- Support approximate matches with minimal changes to the relational engine
- Initial focus: selection queries with equality conditions
35. Specifying Top-K Queries using SQL
- SELECT *
- FROM R
- ORDER k BY Scoring_Function
36. On Saying "Enough Already!" in SQL (Michael J. Carey and Donald Kossmann, SIGMOD '97)
- January 16, 1997
- Jang Ho Park
37. Extending SQL
- SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY <sort specification list> STOP AFTER <value expression>
- <value expression> evaluates to an integer that specifies the maximum number of result tuples desired.
38. Semantics of STOP AFTER
- With an ORDER BY clause: only the first N tuples in this ordering are returned.
- Without an ORDER BY clause: any N tuples that satisfy the rest of the query are returned.
- If fewer than N tuples are in the result: the STOP AFTER clause has no effect.
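The three cases above can be captured in a few lines. A minimal sketch that emulates STOP AFTER over an in-memory result set (not how a DBMS would implement it, which is the point of the next slide):

```python
def stop_after(rows, n, order_key=None, descending=False):
    """Emulate STOP AFTER n.

    - With an ordering: the first n rows in that order.
    - Without an ordering: any n rows (here, the first n seen).
    - Fewer than n rows: no effect.
    """
    if order_key is not None:
        rows = sorted(rows, key=order_key, reverse=descending)
    return rows[:n]
```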
39. Implications
- It is not hard to extend SQL to allow users to specify a limit on the result size of a query.
- The advantage of extending SQL is that it provides information that the DBMS can exploit during query optimization and execution.
40. Example 1: Query
- SELECT e.name, e.salary, d.name
  FROM Emp e, Dept d
  WHERE e.works_in = d.dno
  ORDER BY e.salary DESC
  STOP AFTER 2
41. Supporting Top-k Join Queries in Relational Databases
- Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid
- Department of Computer Sciences, Purdue University
42. Query Example
- SELECT *
- FROM R1, R2, ..., Rm
- WHERE join_condition(R1, R2, ..., Rm)
- ORDER BY f(R1.score, R2.score, ..., Rm.score)
- STOP AFTER k
43. Query 1
- SELECT A.1, B.2
- FROM A, B, C
- WHERE A.1 = B.1 and B.2 = C.2
- ORDER BY (0.3*A.1 + 0.7*B.2)
- STOP AFTER 5