Basic IR: Queries - PowerPoint PPT Presentation

About This Presentation

Slides: 17
Provided by: CSU67

Transcript and Presenter's Notes

Title: Basic IR: Queries


1
Basic IR: Queries
  • A query is a statement of the user's information need.
  • The index is designed to map queries to likely
    relevant documents. Query type, content, and
    representation dictate what the index must do.
  • Queries vary from single keywords, through specialized
    query languages, to exemplar documents.

2
Documents as Queries
  • Find other documents like this one.
  • The query is itself a document, so it can go through
    the same sort of pre-processing (e.g., stop word
    removal, stemming).
  • Characteristics of queries mimic those of
    documents.
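"Find other documents like this one" can be sketched by running the example document through the same pre-processing as the collection and ranking by cosine similarity of term-frequency vectors. This is a minimal sketch under several assumptions: the stop list is a tiny hypothetical one, `crude_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's, and cosine similarity is one common matching function, not necessarily the one these slides have in mind.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop list for illustration; real systems use longer lists.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def crude_stem(token):
    """Toy suffix stripper (NOT the Porter stemmer): drops one common ending."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stop words, stem; return term frequencies."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(example, collection):
    """Use an example document as the query; rank collection doc ids by similarity."""
    query = preprocess(example)
    scores = [(doc_id, cosine(query, preprocess(text))) for doc_id, text in collection.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "Stemming and stop word removal for indexing documents",
    "d2": "Recipes for baking bread",
    "d3": "Indexing documents with stemming",
}
# d1 ranks first, then d3; d2 shares no terms with the example (score 0.0).
print(more_like_this("Removing stop words and stemming documents", docs))
```

Because the example and the collection pass through the same `preprocess`, "words" and "word" or "stemming" and "Stemming" meet on equal terms, which is the point of treating the query as a document.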

3
Query Term Distribution in SavvySearch

4
Keyword Queries
  • A query is composed of a set of keywords.
  • Retrieve the documents that best match the keywords.
  • Advantage: easy to use; supports fast indexing.
  • Disadvantage: coarse; easily led astray (e.g., by
    words with multiple meanings); difficult to
    express a complex information need.
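Keyword matching is usually served by an inverted index that maps each keyword to the documents containing it. A minimal sketch, assuming that "best match" simply means "contains the most distinct query keywords"; real engines use weighted scores such as tf-idf, and the documents here are made up.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase keyword to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def keyword_search(index, query):
    """Rank documents by how many distinct query keywords they contain."""
    matched = defaultdict(int)
    for word in set(query.lower().split()):
        for doc_id in index.get(word, ()):
            matched[doc_id] += 1
    # Most keywords matched first; ties broken by doc id so output is deterministic.
    return sorted(matched.items(), key=lambda item: (-item[1], item[0]))


docs = {
    "d1": "jaguar top speed on the highway",
    "d2": "jaguar habitat in the rainforest",
    "d3": "birds of the rainforest",
}
index = build_inverted_index(docs)
print(keyword_search(index, "jaguar rainforest"))
# [('d2', 2), ('d1', 1), ('d3', 1)]
```

Document d1 (about the car) still scores on "jaguar", a small instance of an ambiguous keyword leading the search astray.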

5
Boolean Queries
  • Combine queries (keywords) with Boolean operators:
    • OR: children OR kids
    • AND: windows AND software
    • BUT: unix BUT solaris (unix but not solaris)
    • No standalone NOT!
  • Advantage: more precise queries.
  • Disadvantage: does not support ranking; less
    intuitive.
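If each term's postings are treated as a set of document ids, the three operators become set union, intersection, and difference. The postings below are made-up data. The result is an unordered set, which is the "no ranking" disadvantage in concrete form. Also, BUT only subtracts from a result already restricted by a positive term, so the engine never has to take a term's complement against the whole collection; that is one plausible reason a standalone NOT is left out.

```python
# Hypothetical postings lists: term -> set of ids of documents containing it.
POSTINGS = {
    "children": {1, 2, 5},
    "kids": {2, 3},
    "windows": {4, 6, 7},
    "software": {4, 7, 8},
    "unix": {9, 10, 11},
    "solaris": {10},
}


def evaluate(left, op, right):
    """Evaluate a binary Boolean query over postings sets (no ranking)."""
    a = POSTINGS.get(left, set())
    b = POSTINGS.get(right, set())
    if op == "OR":
        return a | b  # union: documents containing either term
    if op == "AND":
        return a & b  # intersection: documents containing both terms
    if op == "BUT":
        return a - b  # difference: documents with left but not right
    raise ValueError(f"unsupported operator: {op}")


for query in [("children", "OR", "kids"), ("windows", "AND", "software"), ("unix", "BUT", "solaris")]:
    print(*query, "->", sorted(evaluate(*query)))
# children OR kids -> [1, 2, 3, 5]
# windows AND software -> [4, 7]
# unix BUT solaris -> [9, 11]
```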

6
Phrase Search
  • Supplement single terms with phrases: exact
    sequences of terms.
  • Requires an index that tracks the positions of terms
    or stores both single terms and phrases.
  • An extension is context (proximity) search, in which
    the allowed distance between terms is stated (e.g.,
    adele w/2 howe, i.e., adele within two words of howe,
    submitted to CiteSeer).
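A positional index (term, then document, then word offsets) supports both exact phrases and "within k words" queries. A minimal sketch with made-up documents; `w/2` is interpreted here as "at most two words apart, in either order", which is an assumption, since CiteSeer's exact semantics (ordered or not, how distance is counted) are not given on the slide.

```python
from collections import defaultdict


def build_positional_index(docs):
    """term -> {doc_id -> list of word offsets where the term occurs}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Documents containing the words of `phrase` as an exact consecutive sequence."""
    words = phrase.lower().split()
    if not words:
        return set()
    postings = [index.get(w, {}) for w in words]
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)  # the document must contain every word
    hits = set()
    for doc_id in candidates:
        offsets = [set(p[doc_id]) for p in postings]
        # Need some start offset where word i appears at start + i for every i.
        if any(all(start + i in offsets[i] for i in range(len(words))) for start in offsets[0]):
            hits.add(doc_id)
    return hits


def within(index, t1, t2, k):
    """Documents where t1 and t2 occur at most k words apart (either order)."""
    p1, p2 = index.get(t1, {}), index.get(t2, {})
    return {
        doc_id
        for doc_id in p1.keys() & p2.keys()
        if any(0 < abs(j - i) <= k for i in p1[doc_id] for j in p2[doc_id])
    }


docs = {
    "d1": "papers by adele e howe",
    "d2": "howe and adele",
    "d3": "adele wrote to a colleague named howe",
}
index = build_positional_index(docs)
print(sorted(phrase_search(index, "adele e howe")))  # ['d1']
print(sorted(within(index, "adele", "howe", 2)))     # ['d1', 'd2']
```

The alternative the slide mentions, storing phrases as extra index entries, answers known phrases faster but cannot answer arbitrary proximity queries like the `within` case.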

7
Query Language: Web Search Engines I
  • AltaVista Advanced Search
    • Form
      • all of these words
      • this exact phrase
      • any of these words
      • none of these words
    • Boolean Expression
      • AND
      • OR
      • AND NOT
      • NEAR
    • Date, File Type, Location
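The form fields and the Boolean operators express the same constraints, so a filled-in form can be compiled into one expression. A sketch with a hypothetical function name, `form_to_boolean`, and a quoting/parenthesization syntax chosen for illustration; the exact expression syntax AltaVista accepted is not given on the slide, and NEAR plus the date/file-type/location restrictions are left out.

```python
def form_to_boolean(all_words="", exact_phrase="", any_words="", none_words=""):
    """Combine advanced-search form fields into a single Boolean expression string."""
    clauses = all_words.split()                  # "all of these words" -> AND
    phrase = " ".join(exact_phrase.split())
    if phrase:
        clauses.append(f'"{phrase}"')            # "this exact phrase" -> quoted phrase
    alternatives = any_words.split()             # "any of these words" -> OR group
    if len(alternatives) == 1:
        clauses.append(alternatives[0])
    elif alternatives:
        clauses.append("(" + " OR ".join(alternatives) + ")")
    excluded = none_words.split()                # "none of these words" -> AND NOT
    if excluded and not clauses:
        # Refuse a query made only of exclusions (a bare NOT).
        raise ValueError('"none of these words" needs at least one positive term')
    return " AND ".join(clauses + [f"NOT {word}" for word in excluded])


print(form_to_boolean(all_words="jaguar habitat", exact_phrase="south america",
                      any_words="rainforest jungle", none_words="car"))
# jaguar AND habitat AND "south america" AND (rainforest OR jungle) AND NOT car
```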

8
Query Language: Web Search Engines II
  • Google Advanced Search
    • Find Results
      • with all of the words
      • with the exact phrase
      • with at least one of the words
      • without the words
    • Language, File Format, Date, Occurrences, Domain,
      SafeSearch
    • Page-Specific Search
      • Find pages similar to the page
      • Find pages that link to the page
    • Topic-Specific Searches
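A crawler sees each page's outgoing links, so answering "find pages that link to the page" requires turning the link graph around, much as an inverted index turns documents-to-words into words-to-documents. A minimal sketch with made-up page names; how Google actually stored its link data is not described here.

```python
from collections import defaultdict


def invert_links(outlinks):
    """Convert page -> pages it links to into page -> pages that link to it."""
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            inlinks[target].add(source)
    return inlinks


# Hypothetical crawl: each page with the links found on it.
outlinks = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
}
inlinks = invert_links(outlinks)
print(sorted(inlinks["c.html"]))  # ['a.html', 'b.html']
```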

9
Typical Query Behavior on WWW
  • Query term distribution obeys Zipf's Law (quite
    skewed, although the skew does drift).
  • Length is ? terms.
  • Few users exploit the full power of query languages:
    most enter terms without operators and do not
    use advanced search interfaces.
  • Change in behavior?
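Zipf's Law, in its usual form, says an item's frequency falls off roughly as a power of its rank. Writing f(r) for the number of submissions of the r-th most frequent query term, C for a log-dependent constant, and s for the skew exponent (classically close to 1):

```latex
f(r) \approx \frac{C}{r^{s}}
\qquad\Longleftrightarrow\qquad
\log f(r) \approx \log C - s \log r
```

On a log-log plot of frequency against rank this is a straight line of slope -s; a skew that "drifts" would show up as s (and C) changing between logs collected at different times. Values of C and s for any particular search engine are not given on the slide.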

10
Natural Language
  • Augmented Boolean approach
    • Treat the query as a document.
    • Rank documents by how well they match the
      constraints of the query and return those above a
      certain threshold.
  • NLP approach
    • Interpret semantics in a limited way to constrain
      the query (e.g., "who" indicates a person).
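Both approaches can be sketched in a few lines. Here the "how well they match" score is approximated by the fraction of the query's content terms a document contains, and the NLP side is reduced to reading the leading question word. The soft score, the 0.5 threshold, the stop list, and the question-word table are all illustrative assumptions, not the method behind any particular system.

```python
import re

# Hypothetical question-word -> expected-answer-type table (illustrative only).
ANSWER_TYPES = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}
# Hypothetical stop list; question words are treated as stop words for matching.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "is", "was", "did", "who", "where", "when"}


def terms(text):
    return re.findall(r"[a-z]+", text.lower())


def answer_type(question):
    """Limited NLP: the leading question word constrains what kind of answer is wanted."""
    words = terms(question)
    return ANSWER_TYPES.get(words[0]) if words else None


def rank_above_threshold(query, docs, threshold=0.5):
    """Treat the query as a document; score each doc by the fraction of the query's
    content terms it contains, and return (doc_id, score) pairs at or above threshold."""
    query_terms = {t for t in terms(query) if t not in STOP_WORDS}
    if not query_terms:
        return []
    results = []
    for doc_id, text in docs.items():
        score = len(query_terms & set(terms(text))) / len(query_terms)
        if score >= threshold:
            results.append((doc_id, score))
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


docs = {
    "d1": "Darwin wrote On the Origin of Species in 1859",
    "d2": "The origin of the universe",
    "d3": "Species lists and who wrote them",
}
question = "Who wrote The Origin of Species?"
print(answer_type(question))                 # PERSON
print(rank_above_threshold(question, docs))  # d1 (all 3 terms), then d3 (2 of 3)
```

Document d2 matches only "origin" (1 of 3 terms) and falls below the threshold; a fuller system could also use the PERSON hint to prefer passages that name a person.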

11
NL Example: AskJeeves

12
Advanced Querying
  • Pattern Matching
    • combinations of syntactic features, e.g., regular
      expressions, wild-card queries
  • Structural Queries
    • forms, hypertext and hierarchies
    • typically supports iterative querying, as in
      guided browsing (e.g., WebGlimpse or Letizia)
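Pattern-matching queries are typically answered by first expanding the pattern against the index vocabulary and then searching for the matching terms. A sketch using Python's standard `fnmatch` and `re` modules; the vocabulary is made up, and scanning the whole vocabulary linearly is a simplification (a real index would more likely use a sorted term list or a trie for prefix patterns).

```python
import fnmatch
import re

# Hypothetical index vocabulary.
VOCABULARY = ["color", "colour", "commute", "compact", "compute", "computer", "computing"]


def expand_wildcard(pattern, vocabulary):
    """Wildcard term: * matches any run of characters, ? matches exactly one."""
    return [w for w in vocabulary if fnmatch.fnmatchcase(w, pattern)]


def expand_regex(pattern, vocabulary):
    """Regular-expression term: the whole vocabulary entry must match."""
    compiled = re.compile(pattern)
    return [w for w in vocabulary if compiled.fullmatch(w)]


def to_or_query(matches):
    """The matching index terms can then be ORed together as an ordinary query."""
    return " OR ".join(matches)


print(to_or_query(expand_wildcard("comput*", VOCABULARY)))  # compute OR computer OR computing
print(to_or_query(expand_regex(r"colou?r", VOCABULARY)))    # color OR colour
```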

13
Advanced Querying: Letizia
  • Recommends new pages based on the user's browsing
    preferences.
  • Infers interests by observing user behavior: saving a
    bookmark, following a link, spending time on a page.
  • Models documents as lists of keywords.

Figure from http://lieber.www.media.mit.edu/people/lieber/Liebrary/Letizia/Letizia.html
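The observe-then-recommend loop described for Letizia can be imitated with a weighted keyword profile over keyword-list documents. The action names and weights below are hypothetical, chosen only so that a stronger signal such as saving a bookmark counts for more; they are not Letizia's actual scoring.

```python
from collections import Counter

# Hypothetical interest weights per observed action (not Letizia's real values).
ACTION_WEIGHTS = {"save_bookmark": 3.0, "follow_link": 1.0, "dwell_minute": 0.5}


def observe(profile, page_keywords, action, amount=1.0):
    """Add weight to each keyword of a page the user acted on."""
    weight = ACTION_WEIGHTS[action] * amount
    for keyword in page_keywords:
        profile[keyword] += weight


def score(profile, page_keywords):
    """Interest score of a page: summed profile weight of its distinct keywords."""
    return sum(profile[keyword] for keyword in set(page_keywords))


def recommend(profile, candidates, k=2):
    """Return the k candidate URLs with the highest interest score."""
    ranked = sorted(candidates, key=lambda url: (-score(profile, candidates[url]), url))
    return ranked[:k]


profile = Counter()
observe(profile, ["jazz", "piano"], "save_bookmark")
observe(profile, ["jazz", "festival"], "follow_link")
observe(profile, ["football", "scores"], "dwell_minute", amount=0.5)  # half a minute

candidates = {
    "p1.html": ["jazz", "trio"],
    "p2.html": ["football", "news"],
    "p3.html": ["piano", "lessons"],
    "p4.html": ["festival", "tickets"],
}
print(recommend(profile, candidates))  # ['p1.html', 'p3.html']
```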
14
Caching
  • An astonishing number of people submit the same
    queries (e.g., Harry Potter).
  • Just as single word usage is skewed (Zipf's Law),
    so is query submission on the WWW.
  • This can be exploited by caching the results of
    frequently repeated queries.
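Because a few queries account for a large share of traffic, even a small cache answers many requests. A minimal sketch assuming least-recently-used (LRU) eviction, which is one common policy; the slide does not say which policy, cache size, or query normalization to use, and `slow_search` is a hypothetical stand-in for the real retrieval step.

```python
from collections import OrderedDict


class QueryResultCache:
    """LRU cache from normalized query strings to result lists."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(query):
        # Treat "Harry  Potter" and "harry potter" as the same query.
        return " ".join(query.lower().split())

    def get(self, query, search):
        """Return cached results for `query`, calling `search` only on a miss."""
        key = self.normalize(query)
        if key in self._entries:
            self._entries.move_to_end(key)  # now the most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        results = search(key)
        self._entries[key] = results
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return results


calls = []


def slow_search(query):
    """Stand-in for the real (expensive) retrieval step."""
    calls.append(query)
    return [f"result for {query}"]


cache = QueryResultCache(capacity=2)
for q in ["Harry Potter", "harry  potter", "zipf", "HARRY POTTER", "jaguar", "zipf"]:
    cache.get(q, slow_search)
print(cache.hits, cache.misses, calls)
# 2 4 ['harry potter', 'zipf', 'jaguar', 'zipf']
```

The final "zipf" is a miss because, with capacity 2, it was evicted when "jaguar" arrived, while the repeatedly requested "harry potter" stayed resident.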

15
Single vs. On-Going Queries: Filtering
  • Find new documents in an information stream that
    satisfy a static information need.
  • A user profile represents interests; a threshold
    represents how closely documents must match.
  • The user may provide the query, or it may be learned
    through relevance feedback.
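A profile plus threshold can be sketched as a term vector compared against each incoming document, with relevance feedback nudging the vector. The update used here is a simplified Rocchio-style rule, swapped in as one well-known way to learn from feedback; it is not necessarily what the slide's source describes, and the threshold and the beta/gamma step sizes are arbitrary illustrative values.

```python
import math
from collections import Counter


def vectorize(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ProfileFilter:
    """Standing query: a weighted term profile plus a match threshold."""

    def __init__(self, profile_text, threshold=0.3, beta=0.5, gamma=0.25):
        self.profile = Counter({t: float(c) for t, c in vectorize(profile_text).items()})
        self.threshold = threshold
        self.beta = beta    # how far to move toward relevant documents
        self.gamma = gamma  # how far to move away from non-relevant documents

    def accepts(self, doc_text):
        """Pass a document from the stream if it is close enough to the profile."""
        return cosine(self.profile, vectorize(doc_text)) >= self.threshold

    def feedback(self, doc_text, relevant):
        """Rocchio-style update from one judged document; keeps positive weights only."""
        step = self.beta if relevant else -self.gamma
        for term, count in vectorize(doc_text).items():
            self.profile[term] += step * count
        self.profile = Counter({t: w for t, w in self.profile.items() if w > 0})


stream = [
    "jaguar habitat loss in the rainforest",
    "new jaguar car model released",
    "rainforest canopy birds",
]
f = ProfileFilter("jaguar rainforest habitat")
print([doc for doc in stream if f.accepts(doc)])  # the 1st and 3rd documents
f.feedback(stream[2], relevant=False)             # user: the birds page is off-topic
print(f.accepts(stream[2]))                       # False
```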

16
Filtering Process
Figure from the MIR text