Query-Sensitive Similarity Measures for Information Retrieval - PowerPoint PPT Presentation

Provided by: bradwa
Transcript and Presenter's Notes


1
Query-Sensitive Similarity Measures for
Information Retrieval
  • By Anastasios Tombros and C.J. van Rijsbergen
  • Presented by Brad Wardman

2
Introduction
  • Cluster analysis is a technique that allows the
    identification of groups, or clusters, of similar
    objects in a space that is typically assumed to
    be multidimensional
  • The central idea (the cluster hypothesis) is that
    closely associated documents tend to be relevant
    to the same requests

3
Alternative View of Cluster Hypothesis
  • The hypothesis should not be seen as a test of
    an individual collection's clustering tendency;
    rather, it should be valid for every collection

4
Hypothesis
  • For any given query, pairs of relevant documents
    will exhibit an inherent similarity that is
    dictated by the query itself

5
Query-Sensitive Similarity Measures
  • Biases interdocument relationships toward pairs
    of documents that jointly possess attributes that
    are expressed in a query
  • The query terms are the salient features that
    define the context under which similarity of two
    documents is judged

6
Static clustering
  • The entire document collection is clustered
    once, statically, prior to querying
  • As a result, the similarity value between two
    documents stays the same under all queries that
    a user may pose to an IR system

7
Cosine Coefficient
  • Commonly used as a measure of interdocument
    relationships
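The cosine coefficient between two term-weight vectors is their dot product divided by the product of their lengths: cos(Di, Dj) = Σk dik·djk / (|Di|·|Dj|). A minimal Python sketch, assuming documents are stored as sparse dictionaries that map terms to weights (this representation and the function name `cosine` are illustrative choices, not taken from the slides):

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine coefficient between two sparse term-weight vectors."""
    # Only terms present in both vectors contribute to the dot product
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an empty vector as similar to nothing
    return dot / (norm_a * norm_b)


d1 = {"cluster": 2.0, "retrieval": 1.0}
d2 = {"cluster": 1.0, "query": 1.0}
print(round(cosine(d1, d2), 4))  # → 0.6325
```

Nothing in the function depends on a query, so a given pair of documents gets the same value under every query; this is the static behaviour that the query-sensitive measures set out to change.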

8
Hearst and Pedersen
  • Postulated that relevant documents tend to appear
    in the same clusters when the clusters are
    created as a function of the documents retrieved
    in response to a query; such clusters therefore
    have the potential to be more closely tailored
    to the characteristics of the query than a
    static clustering
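Query-specific (post-retrieval) clustering can be sketched as: rank the collection against the query, keep the top k documents, and cluster only those. This is an illustration under stated assumptions: the single-pass threshold clustering, the parameters `k` and `threshold`, and the helper names `retrieve` and `single_pass_cluster` are hypothetical, and this is not the clustering algorithm Hearst and Pedersen used.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine coefficient between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, docs, k):
    """Return the ids of the k documents most similar to the query."""
    # sorted() is stable even with reverse=True, so ties keep collection order
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:k]


def single_pass_cluster(ids, docs, threshold):
    """Hypothetical single-pass clustering of the retrieved documents.

    Each document joins the first cluster whose seed (first member) is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for d in ids:
        for cluster in clusters:
            if cosine(docs[cluster[0]], docs[d]) >= threshold:
                cluster.append(d)
                break
        else:
            clusters.append([d])
    return clusters


docs = {
    "d1": {"cluster": 1.0, "retrieval": 1.0},
    "d2": {"cluster": 1.0, "retrieval": 0.5},
    "d3": {"query": 1.0, "expansion": 1.0},
    "d4": {"cooking": 1.0},
}
query = {"cluster": 1.0, "query": 1.0}
top = retrieve(query, docs, k=3)
print(top)  # → ['d2', 'd1', 'd3']
print(single_pass_cluster(top, docs, threshold=0.5))  # → [['d2', 'd1'], ['d3']]
```

Here d4 never reaches the clustering step because it is not retrieved for this query; a different query would retrieve, and therefore cluster, a different set of documents.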

9
Information Retrieval Goal
  • The goal is, for any query, to place relevant
    documents closer to each other than nonrelevant
    ones
  • Interdocument similarity is dynamic and changes
    explicitly depending on the query

10
QSSM Model - M2
  • Q = query vector
  • Di, Dj = document vectors
  • C = Di ∩ Dj = vector containing the terms common
    to both document vectors
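The M2 formula itself did not survive in this transcript. Given the definitions of Q, Di, Dj and C, and the later remark that M2 uses less information than the other two measures, one plausible reading is that M2 scores a pair by how similar its common-term vector C is to the query, i.e. cos(C, Q). The sketch below assumes that reading. It also assumes, hypothetically, that each term in C takes the mean of its weights in Di and Dj, since the slide does not say how C is weighted. The names `common_terms` and `m2` are illustrative.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine coefficient between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def common_terms(di: dict[str, float], dj: dict[str, float]) -> dict[str, float]:
    """C = Di ∩ Dj: the terms present in both documents.

    ASSUMPTION: each shared term is weighted by the mean of its two weights.
    """
    return {t: (w + dj[t]) / 2 for t, w in di.items() if t in dj}


def m2(di, dj, q):
    """Hypothetical M2: similarity of the common-term vector C to the query Q."""
    return cosine(common_terms(di, dj), q)


di = {"cluster": 1.0, "retrieval": 1.0, "web": 1.0}
dj = {"cluster": 1.0, "retrieval": 1.0, "music": 1.0}
print(common_terms(di, dj))                    # → {'cluster': 1.0, 'retrieval': 1.0}
print(round(cosine(di, dj), 4))                # → 0.6667 (same for every query)
print(round(m2(di, dj, {"cluster": 1.0}), 4))  # → 0.7071
print(round(m2(di, dj, {"music": 1.0}), 4))    # → 0.0
```

Under this reading the same pair gets a different score for each query while its plain cosine stays fixed, and a query term that the two documents do not share contributes nothing at all.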

11
M3
  • Linear combination of previous two models
  • Pairs of documents with more terms in common with
    the query than other pairs will be assigned
    higher similarity values
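A minimal sketch of M3, assuming the two combined components are the cosine coefficient cos(Di, Dj) and a query component cos(C, Q), where C holds the terms the two documents share (weighted here, hypothetically, by the mean of their two weights). The coefficients `alpha` and `beta` (both 0.5 here) and all function names are hypothetical; the slide does not give their values.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine coefficient between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def common_terms(di, dj):
    """Terms present in both documents (ASSUMPTION: mean of the two weights)."""
    return {t: (w + dj[t]) / 2 for t, w in di.items() if t in dj}


def m3(di, dj, q, alpha=0.5, beta=0.5):
    """Hypothetical M3: alpha * cos(Di, Dj) + beta * cos(C, Q)."""
    return alpha * cosine(di, dj) + beta * cosine(common_terms(di, dj), q)


q = {"cluster": 1.0, "retrieval": 1.0}
d1 = {"cluster": 1.0, "retrieval": 1.0, "web": 1.0}
d2 = {"cluster": 1.0, "retrieval": 1.0, "music": 1.0}  # shares both query terms with d1
d3 = {"cluster": 1.0, "web": 1.0, "music": 1.0}        # shares one query term with d1

print(round(cosine(d1, d2), 4), round(cosine(d1, d3), 4))  # → 0.6667 0.6667
print(round(m3(d1, d2, q), 4))  # → 0.8333
print(round(m3(d1, d3, q), 4))  # → 0.5833
```

Both pairs have the same plain cosine, but the pair that shares more query terms receives the higher score, which is the behaviour the slide describes.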

12
M1
  • Product of the two sources of information
  • Assumes that the presence of query terms is
    required for a document to be relevant
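A minimal sketch of M1 as the product cos(Di, Dj) · cos(C, Q), under the same hypothetical assumption about C (the shared terms, weighted by the mean of their two document weights); the names are illustrative, not from the slides. Because the two factors are multiplied, a pair whose shared terms include no query term scores zero however similar the documents are otherwise.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine coefficient between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def common_terms(di, dj):
    """Terms present in both documents (ASSUMPTION: mean of the two weights)."""
    return {t: (w + dj[t]) / 2 for t, w in di.items() if t in dj}


def m1(di, dj, q):
    """Hypothetical M1: cos(Di, Dj) * cos(C, Q)."""
    return cosine(di, dj) * cosine(common_terms(di, dj), q)


di = {"cluster": 1.0, "retrieval": 1.0, "web": 1.0}
dj = {"cluster": 1.0, "retrieval": 1.0, "music": 1.0}
print(round(m1(di, dj, {"cluster": 1.0}), 4))  # → 0.4714
print(round(m1(di, dj, {"music": 1.0}), 4))    # → 0.0 ("music" is in dj only)
```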

13
Limitations
  • Works on the assumption that query terms are
    sufficient indicators of document relevance
  • Only takes one instance of the user's information
    need into account
  • With short queries (2-3 terms), it is doubtful
    whether the similarity measures will have enough
    information

14
Results
  • M1 and M3 are significantly more effective than
    the cosine coefficient at placing co-relevant
    documents closer to each other
  • This provides a greater chance of a more
    effective clustering of the document space
  • M1 and M3 achieved higher scores than M2 for
    almost every condition

15
Results Cont.
  • The lower effectiveness of M2 is not surprising,
    given that this measure uses less information
    than the other two measures
  • M2 is highly affected by query length
  • M1 and M3 do not seem to be affected by query
    length
  • The differences between M1 and M3 are not
    statistically significant

16
Conclusions
  • Similarity is a dynamic and purpose-sensitive
    notion
  • QSSMs have the potential to capture the dynamics
    of similarity in the calculation of
    interdocument relationships