CSM06 Information Retrieval - PowerPoint PPT Presentation

1
CSM06 Information Retrieval
  • LECTURE 6: Tuesday 28th October
  • Dr Andrew Salway
  • a.salway_at_surrey.ac.uk

2
Lecture 6: Some Current Issues in IR
  • Part 1: HCI Issues for IR Interfaces
  • Part 2: Information Visualisation
  • Part 3: Inference Beyond the Index

3
Part 1: HCI Issues for IR
  • Three principles for interface design (adapted
    from Shneiderman):
  • Offer informative feedback
  • Reduce working memory load
  • Alternative interfaces for expert / novice

4
Offer Informative Feedback
  • Interface should help user to understand what the
    system has done and why
  • For an information retrieval system this means
    helping the user to understand the relationship
    between their query and the returned documents

5
Reduce Working Memory Load
  • Information retrieval is interactive: the
    interface must help the user keep track of
    different strategies / return to old queries
  • Interface may also suggest search terms and let
    user navigate a list of sources or a hierarchy of
    topics / keywords

6
Alternative Interfaces for Expert / Novice
  • Trade-off between simplicity (ease of use) and
    power
  • A good interface will allow a user to gradually
    progress from simple queries to more powerful
    ones, i.e. incrementally adding more
    sophisticated features to their searches and
    returning more information about the search
    results

7
Good HCI is important for
  • Selecting a document collection
  • Query Specification
  • Making sense of results set (Information
    Visualisation)

8
Selecting a Document Collection
  • An unordered list of collections does not help a
    user to choose - overviews can help
  • Manually organised topic categories, e.g. Yahoo,
    or in specialist domains, e.g. the 1200 ACM
    categories
  • Automatic document clustering, e.g. with a text
    description of each cluster (Scatter/Gather) or a
    2D map of the document collection where proximity
    and size of regions indicate the content of the
    document collection (e.g. Kohonen Maps)
  • Co-citation analysis identifies which documents
    are central to a collection; it is also applied
    to links between Web pages

9
WEBSOM http://websom.hut.fi/websom/ (based on Kohonen Maps)

10
Query Specification
  • Consider expressing a (faceted) Boolean query
    through...
  • Command Language: expressive, accurate and quick
    for experts; non-intuitive for novices (meaning
    of AND / OR and use of nested structures)
  • Form fill-in: prompts the user with attributes so
    users don't need to learn that part of the
    syntax
  • Menu selection: constrains the user's choices
    where there is a finite set of choices
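
A faceted Boolean query can be read as an AND of facets, where each facet is an OR of alternative terms. The following is a minimal sketch of that reading only; the toy collection `DOCS` and the function names are hypothetical and do not come from the lecture.

```python
# Hypothetical toy collection: document id -> set of index terms.
DOCS = {
    "d1": {"retrieval", "interface", "visualisation"},
    "d2": {"search", "citation"},
    "d3": {"retrieval", "citation", "hubs"},
    "d4": {"interface", "design"},
}


def matches_facet(terms, facet):
    """A facet is satisfied if the document has ANY of its terms (OR)."""
    return any(term in terms for term in facet)


def faceted_query(docs, facets):
    """Return the sorted ids of documents that satisfy ALL facets (AND)."""
    return sorted(
        doc_id
        for doc_id, terms in docs.items()
        if all(matches_facet(terms, facet) for facet in facets)
    )


# (retrieval OR search) AND (citation OR hubs)
result = faceted_query(DOCS, [{"retrieval", "search"}, {"citation", "hubs"}])
print(result)  # ['d2', 'd3']
```

A form fill-in or menu interface could build the `facets` list for the user, so that only the command-language interface exposes the AND / OR syntax directly.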

11
Query Specification
  • Consider expressing a Boolean query through...
  • Direct Manipulation: gives users a better feel,
    e.g. Venn diagrams for Boolean queries;
    filter-flow visualisation shows results at each
    stage
  • Natural Language: requires extracting concepts
    and logical connectives in the disambiguation of
    queries; may use question templates so users can
    ask naturally phrased questions, cf. Ask Jeeves

12
Part 2: Information Visualisation for Making Sense
of the Results Set
  • Three systems considered here:
  • TileBars
  • InfoCrystal
  • Mapuccino

13
Making Sense of the Results Set
  • TileBars shows which query terms are where in
    which documents
  • One row per query term
  • One column per passage of text
  • Shading indicates frequency of term in passage
  • Users can look for terms co-occurring in passages
    and occurring throughout long documents
  • http://www.sims.berkeley.edu/hearst/tb-overview.html
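
The grid behind a TileBars-style display (one row per query term, one column per passage, cell value = term frequency) can be sketched as below. This is an illustration under stated assumptions: passages are simply fixed-length runs of words (`passage_len` is a hypothetical parameter), which is not necessarily how TileBars itself segments documents, and the `render` shading characters are made up.

```python
def tilebar(text, query_terms, passage_len=4):
    """Return one row per query term and one column per passage.

    Each cell holds how often the term occurs in that passage.
    Assumption: a passage is a fixed-length run of words.
    """
    words = text.lower().split()
    passages = [
        words[i:i + passage_len] for i in range(0, len(words), passage_len)
    ]
    return [
        [passage.count(term.lower()) for passage in passages]
        for term in query_terms
    ]


def render(grid, shades=" .:#"):
    """Map counts to characters of increasing darkness, one line per term."""
    darkest = len(shades) - 1
    return ["".join(shades[min(count, darkest)] for count in row) for row in grid]


text = ("maps help users read maps users cite documents "
        "documents cite documents maps")
grid = tilebar(text, ["maps", "users", "documents"])
print(grid)  # [[1, 1, 1], [1, 1, 0], [0, 1, 2]]
for term, line in zip(["maps", "users", "documents"], render(grid)):
    print(f"{term:>10} |{line}|")
```

Reading down a column shows which terms co-occur in a passage (all three in the second passage here); reading along a row shows whether a term occurs throughout the document.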

15
Making Sense of the Results Set
  • InfoCrystal displays the results of a faceted
    query simultaneously
  • Limited to four query terms
  • Shows number of documents in the intersections /
    unions of the query terms
  • Users can judge the relative influence of facets
    on the results set
  • http://www.scils.rutgers.edu/aspoerri/InfoCrystal/Ch_7.html
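
The numbers that such a display needs can be sketched by counting documents according to exactly which query terms they contain (one count per region of the corresponding Venn diagram). This is only a sketch of the counting step, not of the InfoCrystal layout; the toy documents and function name are hypothetical.

```python
from collections import Counter


def infocrystal_counts(docs, query_terms):
    """Count documents by the exact subset of query terms each contains."""
    if len(query_terms) > 4:
        raise ValueError("this sketch follows the limit of four query terms")
    counts = Counter()
    for terms in docs.values():
        present = frozenset(t for t in query_terms if t in terms)
        if present:  # ignore documents that match no query term
            counts[present] += 1
    return counts


# Hypothetical toy collection: document id -> set of index terms.
docs = {
    "d1": {"hubs", "authorities"},
    "d2": {"hubs"},
    "d3": {"hubs", "authorities", "citation"},
    "d4": {"citation"},
    "d5": {"hubs", "authorities"},
}
counts = infocrystal_counts(docs, ["hubs", "authorities", "citation"])
for subset, n in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(subset), n)
```

Counts for unions or looser intersections can be obtained by summing the relevant regions, e.g. all documents containing "hubs" is the sum over every subset that includes it.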

18
Making Sense of the Results Set
  • Mapuccino shows a sub-set of a WWW-site in a
    nodes-and-links view
  • Positive weighting given to the outlinks of
    relevant pages
  • The graphical depiction is of link structure;
    however, this may reflect semantic content
  • http://www.alphaworks.ibm.com/tech/mapuccino

19
Further Reading
  • Baeza-Yates and Ribeiro-Neto (Chapter 10)

20
Part 3: Inference Beyond the Index
  • Scientific Citation Analysis: measuring impact
  • Discovering latent knowledge
  • Web-adjacency Analysis: identifying authorities
    and hubs

21
Inference Beyond the Index
  • Main focus in IR research (and in this module)
    has been on the Index, i.e. the mapping between
    documents and descriptive keywords
  • Other kinds of relationships can help a user
    searching for information, e.g. relationships
    between documents / authors / keywords (cf.
    Belew's Adaptive IR system)
  • Here we consider in particular relationships
    between documents (citation analysis)

22
Relationships between Documents: Citations
  • When a new text is produced it may be woven into
    the larger fabric of other texts, e.g.
    conversational threads, legal arguments,
    scientific papers and web-pages
  • Interesting information can be gained from
    analysing which documents (or parts of documents)
    cite / are cited by which other documents (or
    parts of documents)

23
Scientific Citation Analysis
  • Belew pp. 185-190
  • Bibliometrics studies the graphs created by
    bibliographic citation links
  • Can analyse the impact of a scientific paper,
    e.g. how many times it is cited (by other
    authors), i.e. measuring its in-degree
  • Can measure the similarity of two scientific
    papers by comparing their bibliographies
  • Web of Science makes this kind of analysis
    relatively easy now: http://wos.mimas.ac.uk/
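
Both measures can be sketched on a small citation graph. The graph `CITES` is hypothetical, and using the Jaccard overlap of two bibliographies as the similarity score is an assumption made here for illustration; the slide only says that bibliographies are compared.

```python
# Hypothetical citation graph: paper -> set of papers it cites.
CITES = {
    "p1": set(),
    "p2": {"p1"},
    "p3": {"p1", "p2"},
    "p4": {"p1", "p2", "p3"},
}


def in_degree(cites):
    """Impact as in-degree: how many papers cite each paper."""
    counts = {paper: 0 for paper in cites}
    for references in cites.values():
        for cited in references:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


def bibliography_similarity(cites, a, b):
    """Jaccard overlap of two bibliographies (an assumed similarity measure)."""
    union = cites[a] | cites[b]
    if not union:
        return 0.0
    return len(cites[a] & cites[b]) / len(union)


impact = in_degree(CITES)
similarity = bibliography_similarity(CITES, "p3", "p4")
print(impact)      # {'p1': 3, 'p2': 2, 'p3': 1, 'p4': 0}
print(similarity)  # 2 shared references out of 3 distinct ones
```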

24
Scientific Citation Analysis
  • Can analyse citation structure over time, e.g.
    Figure 6.3
  • This shows 200 papers, numbered in the
    order that they were published, and indicates
    which other papers were cited in each
  • The graph highlights 'classic' papers (large
    impact), 'review' papers and the rate at which
    ideas are developing (the width of the research
    front)

26
Discovering Latent Knowledge
  • Belew pp. 234-238
  • Scientists tend to specialise, sometimes so much
    so that they are not aware of works in other
    areas that could help solve problems in their own
    area
  • Swanson considered the scientific literature as a
    web of logical connections; some connections
    are made explicit, e.g. through citations, but
    others may be implicit
  • Swanson's idea was to discover latent knowledge
    by automatically identifying implicit logical
    connections in the literature

27
Discovering Latent Knowledge
  • Belew pp. 234-238
  • Query-cluster to identify factors related to
    medical condition / disease as reported in one
    literature
  • Then, query-cluster to identify something related
    to those factors in another literature
  • See Figure 6.22
  • Still a challenge to automatically label the
    logical relationships between cause and
    effect
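
The two-step idea can be sketched as follows. This is a heavy simplification: the query-cluster step is replaced by plain term co-occurrence within a document, and the two literatures and all term names (`condition_x`, `factor_a`, `substance_y`, ...) are hypothetical placeholders rather than real findings.

```python
def related_terms(literature, seed):
    """Terms that co-occur with `seed` in at least one document."""
    related = set()
    for terms in literature:
        if seed in terms:
            related |= terms - {seed}
    return related


def latent_links(lit_one, lit_two, condition):
    """Map each candidate found in lit_two to the factors linking it to
    `condition`, skipping anything already linked directly in lit_one."""
    factors = related_terms(lit_one, condition)
    links = {}
    for factor in sorted(factors):
        for candidate in related_terms(lit_two, factor):
            if candidate != condition and candidate not in factors:
                links.setdefault(candidate, set()).add(factor)
    return links


# Hypothetical literatures: each document is reduced to a set of terms.
lit_one = [{"condition_x", "factor_a"}, {"condition_x", "factor_b"}]
lit_two = [
    {"factor_a", "substance_y"},
    {"factor_b", "substance_y"},
    {"factor_c", "substance_z"},
]
links = latent_links(lit_one, lit_two, "condition_x")
print(links)
```

Here `substance_y` is never mentioned together with `condition_x`, yet it is connected to it through two intermediate factors. The sketch says nothing about what kind of relationship each link is, which is the labelling challenge noted above.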

29
Web-adjacency Analysis
  • Belew pp. 195-199
  • For some web-queries there may be many thousands
    of relevant pages; perhaps we need to consider
    another measure in order to provide a user with
    the best results
  • From among all the relevant pages, select those
    that are most authoritative

30
Web-adjacency Analysis
  • Kleinberg and colleagues proposed a method for
    identifying authoritative web-pages
  • Identify set of relevant pages (as normal)
  • Identify those with a large in-degree, i.e. lots
    of pages point to them (cf. impact)
  • Ensure that the authorities selected are referred
    to by a number of the same hubs, i.e. those with
    a large out-degree

31
Web-adjacency Analysis
  • "Hubs and authorities exhibit what could be
    called a mutually reinforcing relationship"
    (Kleinberg 1998, in Belew)
  • Computing authority and hub values for web-pages
    is an iterative process over a graph, where each
    node is a web-page
  • Two weights are given to each node, relating to
    in-degree (authority) and out-degree (hub); total
    in-degree weights and total out-degree weights
    are kept constant
  • Weights are modified each iteration depending on
    weights of connected nodes
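
The iteration can be sketched as below, in the spirit of Kleinberg's hubs-and-authorities computation. The link graph `LINKS`, the fixed iteration count and the function name are assumptions for illustration; keeping the totals constant is done here by rescaling each set of weights so that its squares sum to 1.

```python
import math


def hits(links, iterations=20):
    """Iteratively compute (hub, authority) weights for a link graph.

    `links` maps each page to the list of pages it points to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority weight: sum of hub weights of pages pointing to the node.
        auth = {n: 0.0 for n in nodes}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub weight: sum of authority weights of pages the node points to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        # Rescale so that the total (sum of squares) stays constant.
        for weights in (auth, hub):
            norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
            for n in weights:
                weights[n] /= norm
    return hub, auth


# Hypothetical set of relevant pages: three hubs pointing at two candidates.
LINKS = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hits(LINKS)
print(max(auth, key=auth.get))  # a1
```

A page pointed to by many good hubs gains authority, and a page pointing to many good authorities gains hub weight, which is the mutually reinforcing relationship described above.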

32
Further Reading
  • Swanson and Smalheiser (1997), An Interactive
    System for Finding Complementary Literatures,
    Artificial Intelligence 91, pp. 183-203.
  • Lawrence, Giles and Bollacker (1999), Digital
    Libraries and Autonomous Citation Indexing, IEEE
    Computer 32(6).
  • (Available through the library's eJournals, and
    on the shelf).

33
Reading
  • Belew pages given in previous slides.
  • Kleinberg (1998), Authoritative Sources in a
    Hyperlinked Environment, Journal of the ACM.
    http://citeseer.nj.nec.com/87928.html