Clustering Top-Ranking Sentences for Information Access

Learn more at: http://www.ecdl2003.org

1
Clustering Top-Ranking Sentences for Information
Access
  • Anastasios Tombros, Joemon Jose, Ian Ruthven
  • University of Glasgow & University of Strathclyde
  • Glasgow, Scotland

2
Some Background & Motivation
  • Challenge: How to provide effective access to
    information
  • Approach: Combine clustering & top-ranking
    sentences (TRS)
  • clustering has been used extensively on the
    document level
  • TRS are based on single-document summaries
  • Overall aim of the work:
  • to create a personalised information space
  • to use information from users' interaction

3
Top-Ranking Sentences
  • Assume a user with a query
  • the query is sent to an IR system
  • consider only the top retrieved documents, e.g.
    30
  • apply a query-biased sentence extraction model to
    each of these documents
  • construct a sentence extract of max. 4 sentences
    per document
  • the set of these sentences for the 30 documents
    is the set of TRS
  • TRS can be ranked by their query-biased scores
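
The extraction steps on this slide can be sketched roughly as follows. This is only an illustration: the slides do not describe the query-biased sentence extraction model, so the scoring function here (a count of distinct query terms per sentence), the naive sentence splitter, and all function names are assumptions made for the example.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def query_biased_score(sentence, query_terms):
    # Hypothetical stand-in for the query-biased model:
    # the number of distinct query terms that occur in the sentence.
    words = set(re.findall(r"\w+", sentence.lower()))
    return len(words & query_terms)

def top_ranking_sentences(documents, query, max_docs=30, max_per_doc=4):
    """Return (score, doc_index, sentence) tuples ranked by score.

    `documents` is assumed to be a list of texts already ordered by the
    IR system's ranking, so the first `max_docs` are the top retrieved ones.
    """
    query_terms = set(re.findall(r"\w+", query.lower()))
    trs = []
    for doc_index, text in enumerate(documents[:max_docs]):
        scored = [(query_biased_score(s, query_terms), s)
                  for s in split_sentences(text)]
        # Keep only sentences that mention at least one query term.
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Extract of at most `max_per_doc` sentences per document.
        for score, sentence in scored[:max_per_doc]:
            trs.append((score, doc_index, sentence))
    # The pooled set of extracts is the TRS set, ranked by score.
    trs.sort(key=lambda item: item[0], reverse=True)
    return trs
```

The defaults of 30 documents and 4 sentences per document mirror the figures on the slide; everything else is a placeholder for whatever model the authors used.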

4
Top-Ranking Sentences (contd.)
  • TRS have been shown to be effective in interactive IR
    on the Web
  • they provide effective access to the retrieved
    information
  • They can be seen as a level of abstraction of the
    set of retrieved documents
  • We introduce an extra layer of abstraction by
    clustering the set of TRS

5
Clustering Top-Ranking Sentences
  • An attempt to create a personalised information
    space
  • sentences give local contexts in which query
    terms occur
  • sentences discussing query terms in similar
    contexts should cluster together
  • this structure should facilitate a more intuitive
    and effective access to information
  • Similarities and differences to document
    clustering
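
One way to picture "sentences discussing query terms in similar contexts should cluster together" is the sketch below. The slides do not name the clustering method or the similarity measure, so this example assumes term-frequency vectors, cosine similarity, and a simple single-link grouping with a hypothetical threshold of 0.3. It is not the authors' method.

```python
import math
import re
from collections import Counter

def term_vector(sentence):
    # Bag-of-words term frequencies (assumed representation).
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster_sentences(sentences, threshold=0.3):
    """Single-link grouping: two sentences end up in the same cluster
    if a chain of pairs with cosine similarity >= threshold connects them."""
    vectors = [term_vector(s) for s in sentences]
    parent = list(range(len(sentences)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i, sentence in enumerate(sentences):
        clusters.setdefault(find(i), []).append(sentence)
    return list(clusters.values())
```

The same function could be applied to whole documents, which is where the comparison with document clustering comes from: only the unit being clustered changes.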

6
Comparing TRS and Document Clustering
  • We used 4 searchers with a total of 16 queries
  • each searcher assessed the utility of the top 30
    documents on a scale of 1-10
  • For each query
  • we downloaded the top-30 retrieved documents
  • we extracted the set of TRS
  • we clustered the 30 documents and the set of TRS
  • we assigned scores to document & TRS clusters
  • cluster score: sum of the document (sentence) scores
    divided by the number of documents (sentences) in the
    cluster
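
The cluster score described above is a plain average of member scores. A minimal sketch, with hypothetical function names and made-up example scores on the 1-10 scale; how sentences obtain their scores is not stated on the slide and is left open here:

```python
def cluster_score(member_scores):
    # Sum of the member scores divided by the number of members.
    if not member_scores:
        raise ValueError("a cluster needs at least one member")
    return sum(member_scores) / len(member_scores)

def best_and_mean(clusters):
    """Return (best cluster score, mean cluster score) for one query.

    `clusters` is a list of clusters, each a list of member scores.
    """
    scores = [cluster_score(c) for c in clusters]
    return max(scores), sum(scores) / len(scores)

# Made-up utility scores for two clusters of one query.
example = [[8, 6, 1], [2, 4]]
best, mean = best_and_mean(example)
```

Here the first cluster averages 15 / 3 = 5.0 and the second 6 / 2 = 3.0, so the best cluster score is 5.0 and the mean over clusters is 4.0.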

7
Some Results
  • Scores of TRS clusters were significantly higher
    than those of document clusters
  • best cluster averages 4.78 (documents) vs. 5.82 (TRS)
  • overall averages 3.2 (documents) vs. 3.73 (TRS)
  • Average precision and recall were higher for TRS
    clusters
  • define P & R using a score threshold of 7 for
    documents
  • average P 0.38 (documents) vs. 0.49 (TRS)
  • average R 0.73 (documents) vs. 0.77 (TRS)
  • Cluster sizes were comparable
  • 5 docs per cluster vs. 5.3 sentences per cluster
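
Precision and recall for a single cluster could be computed along these lines. The exact relevance criterion is not recoverable from the slide, so treating a score of 7 or more as relevant is an assumption, as are the function and parameter names. The sketch works on document scores only; how TRS clusters were mapped back to documents is not described.

```python
def precision_recall(cluster_scores, all_scores, threshold=7):
    """Precision and recall of one cluster.

    A document counts as relevant when its score is >= threshold
    (assumed criterion). `all_scores` holds the scores of every
    retrieved document for the query, `cluster_scores` those of the
    documents placed in the cluster.
    """
    relevant_total = sum(1 for s in all_scores if s >= threshold)
    relevant_in_cluster = sum(1 for s in cluster_scores if s >= threshold)
    precision = relevant_in_cluster / len(cluster_scores) if cluster_scores else 0.0
    recall = relevant_in_cluster / relevant_total if relevant_total else 0.0
    return precision, recall

# Made-up scores: four of the eight documents reach the threshold,
# and two of those four fall inside the cluster.
p, r = precision_recall([9, 8, 3, 2], [9, 8, 7, 3, 2, 1, 5, 10])
```

With these example scores both precision and recall come out at 0.5.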

8
Conclusions & Future Plans
  • TRS clusters have the potential to offer more
    effective information access
  • only one aspect of their expected utility
  • Integrate TRS clustering in interactive web
    searching
  • investigate its utility in user-based studies on
    the live Internet
  • We have extended the reported work
  • more searchers & queries, different clustering
    methods
  • inter-sentence similarities, structure of
    information space