Query-Sensitive Similarity Measures for Information Retrieval

Slides: 17

Provided by: bradwa

Transcript and Presenter's Notes

1
Query-Sensitive Similarity Measures for
Information Retrieval

2
Introduction

Cluster analysis is a technique that allows the
identification of groups, or clusters, of similar
objects in a space that is typically assumed to
be multidimensional
The central idea, known as the cluster hypothesis,
is that closely associated documents tend to be
relevant to the same requests

3
Alternative View of Cluster Hypothesis

The hypothesis should not be seen as a test of
an individual collection's clustering tendency;
rather, it should be valid for every
collection

4
Hypothesis

For any given query, pairs of relevant documents
will exhibit an inherent similarity that is
dictated by the query itself

5
Query-Sensitive Similarity Measures

Biases interdocument relationships toward pairs
of documents that jointly possess attributes that
are expressed in a query
The query terms are the salient features that
define the context under which similarity of two
documents is judged
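
The slide does not give a formula, so the following is only a minimal sketch of one way to score a document pair against the query. It assumes documents and queries are sparse term-weight dictionaries, and that the pair is represented by the terms both documents share, weighted by the mean of the two weights. The names cosine, common_terms and query_score are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine between two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def common_terms(d1, d2):
    """Terms present in both documents (assumption: mean of the two weights)."""
    return {t: (w + d2[t]) / 2.0 for t, w in d1.items() if t in d2}

def query_score(d1, d2, query):
    """Similarity of the pair's shared terms to the query."""
    return cosine(common_terms(d1, d2), query)

query = {"cluster": 1.0}
d1 = {"cluster": 1.0, "retrieval": 1.0}
d2 = {"cluster": 1.0, "query": 1.0}
d3 = {"retrieval": 1.0, "index": 1.0}

print(query_score(d1, d2, query))  # pair shares the query term "cluster"
print(query_score(d1, d3, query))  # pair shares only a non-query term
```

Under these assumptions, a pair that jointly contains query terms scores above a pair whose overlap lies outside the query.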

6
Static clustering

Clustering has already been applied statically
over entire document collections prior to
querying
The value of the similarity will stay the same
under all queries that a user may pose on an IR
system

7
Cosine Coefficient
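
The formula for this slide did not survive in the transcript. As a reminder, the cosine coefficient is the dot product of two term-weight vectors divided by the product of their lengths. Below is a minimal sketch, assuming documents are stored as sparse dictionaries of term weights. The function name and the toy documents are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine coefficient between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

d1 = {"cluster": 1.0, "retrieval": 1.0}
d2 = {"cluster": 1.0, "query": 1.0}

# The value depends only on the two documents, never on a query.
print(cosine(d1, d2))
```

Because no query appears in the computation, the result is the same for every query posed to the system.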

8
Hearst and Pedersen

9
Information Retrieval Goal

The goal is, for any query, to place relevant
documents closer to each other than nonrelevant
ones
Interdocument similarity is dynamic and changes
explicitly depending on the query
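
One way to check this goal is to look at the nearest neighbours of a relevant document and count how many of them are also relevant. The sketch below assumes the similarities to the other documents are already computed. The function name, the document ids and the similarity values are made-up toy numbers for illustration, not results from the presentation.

```python
def relevant_neighbours(sims, relevant, k=5):
    """Count relevant documents among the k most similar ones.

    sims maps each other document id to its similarity with the
    document under inspection; relevant is a set of relevant ids.
    """
    ranked = sorted(sims, key=sims.get, reverse=True)
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant)

relevant = {"a", "b"}  # hypothetical relevance judgements for one query

# Hypothetical similarities of document "a" to the other documents.
static_sims = {"b": 0.5, "c": 0.6, "d": 0.1}
query_sims = {"b": 0.9, "c": 0.2, "d": 0.1}

print(relevant_neighbours(static_sims, relevant, k=1))
print(relevant_neighbours(query_sims, relevant, k=1))
```

A measure that meets the goal yields higher counts for the same k.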

10
QSSM Model - M2

11
M3

Linear combination of previous two models
Pairs of documents with more terms in common with
the query than other pairs will be assigned
higher similarity values
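
A linear combination of this kind might be sketched as follows. The transcript gives no formula, so the sketch assumes the two combined parts are the static cosine between the documents and the cosine between their shared terms and the query. The weights theta1 and theta2, and all function names, are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine between two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def common_terms(d1, d2):
    """Terms present in both documents (assumption: mean of the two weights)."""
    return {t: (w + d2[t]) / 2.0 for t, w in d1.items() if t in d2}

def linear_combination(d1, d2, query, theta1=0.5, theta2=0.5):
    """Weighted sum of the static and the query-based similarity."""
    static = cosine(d1, d2)
    query_based = cosine(common_terms(d1, d2), query)
    return theta1 * static + theta2 * query_based

query = {"cluster": 1.0}
d1 = {"cluster": 1.0, "retrieval": 1.0}
d2 = {"cluster": 1.0, "query": 1.0}
d3 = {"retrieval": 1.0, "index": 1.0}

# Both pairs have the same static cosine, but only (d1, d2) shares a query term.
print(linear_combination(d1, d2, query))
print(linear_combination(d1, d3, query))
```

With this sketch, a pair that shares no query terms still keeps a nonzero score from its static part.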

12
M1

Product of the two sources of information
Assumes that the presence of query terms is
required for a document to be relevant
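
The product form can be sketched in the same way. Again the transcript gives no formula, so the sketch assumes the two sources of information are the static cosine between the documents and the cosine between their shared terms and the query. All function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine between two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def common_terms(d1, d2):
    """Terms present in both documents (assumption: mean of the two weights)."""
    return {t: (w + d2[t]) / 2.0 for t, w in d1.items() if t in d2}

def product_similarity(d1, d2, query):
    """Static similarity multiplied by the query-based similarity."""
    return cosine(d1, d2) * cosine(common_terms(d1, d2), query)

query = {"cluster": 1.0}
d1 = {"cluster": 1.0, "retrieval": 1.0}
d2 = {"cluster": 1.0, "query": 1.0}
d3 = {"retrieval": 1.0, "index": 1.0}

print(product_similarity(d1, d2, query))  # pair shares a query term
print(product_similarity(d1, d3, query))  # pair shares no query term
```

The multiplication is what encodes the assumption on this slide: a pair with no query terms in common gets a similarity of zero, however similar the two documents are otherwise.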

13
Limitations

Works on the assumption that query terms are
sufficient indicators of document relevance
Only takes one instance of the user's information
need into account
With short queries (2-3 terms), it is doubtful
whether the similarity measures will have enough
information

14
Results

M1 and M3 are significantly more effective than
the cosine coefficient at placing co-relevant
documents closer to each other
This provides a greater chance of a more
effective clustering of the document space
M1 and M3 achieved higher scores than M2 for
almost every condition

15
Results Cont.

The lower effectiveness of M2 is not surprising,
given that this measure uses less
information than the other two measures
M2 is highly affected by query length
M1 and M3 do not seem to be affected by query
length
The differences between M1 and M3 are not
statistically significant

16
Conclusions

Similarity is a dynamic and purpose-sensitive
notion
QSSMs have the potential to capture the dynamics
of similarity for the calculation of
interdocument relationships
