CSM06 Information Retrieval

Slides: 34

Provided by: csp9

1
CSM06 Information Retrieval

LECTURE 6: Tuesday 28th October
Dr Andrew Salway
a.salway_at_surrey.ac.uk

2
Lecture 6: Some Current Issues in IR

Part 1: HCI Issues for IR Interfaces
Part 2: Information Visualisation
Part 3: Inference Beyond the Index

3
Part 1: HCI Issues for IR

Three principles for interface design (adapted
from Shneiderman):
Offer informative feedback
Reduce working memory load
Alternative interfaces for expert / novice

4
Offer Informative Feedback

Interface should help user to understand what the
system has done and why
For an information retrieval system this means
helping the user to understand the relationship
between their query and the returned documents

5
Reduce Working Memory Load

Information retrieval is interactive: the interface
must help the user keep track of different strategies
and return to old queries
Interface may also suggest search terms and let
user navigate a list of sources or a hierarchy of
topics / keywords

6
Alternative Interfaces for Expert / Novice

Trade-off between simplicity (ease of use) and
power
A good interface will allow a user to gradually
progress from simple queries to more powerful
ones, i.e. incrementally adding more
sophisticated features to their searches and
returning more information about the search
results

7
Good HCI is important for:

Selecting a document collection
Query Specification
Making sense of results set (Information
Visualisation)

8
Selecting a Document Collection

An unordered list of collections does not help a
user to choose; overviews can help
Manually organised topic categories, e.g. Yahoo,
or in specialist domains, e.g. the 1200 ACM categories
Automatic document clustering, e.g. with a text
description of each cluster (Scatter/Gather) or a
2D map of the document collection where proximity
and size of regions indicate the content of the
document collection (e.g. Kohonen Maps)
Co-citation analysis identifies which documents
are central to a collection; also applied to
links between Web pages
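
A minimal sketch of the co-citation idea in the last point, in Python. The citation data is invented for illustration, and scoring a document by its total co-citation strength is only one assumed way of reading "central to a collection"; the slide does not specify a formula.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: citing document -> documents it cites.
citations = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["b", "c"],
    "p4": ["a", "d"],
}

def cocitation_counts(citations):
    """Count how often each pair of documents is cited by the same source."""
    pairs = Counter()
    for cited in citations.values():
        # sorted() gives each unordered pair one canonical (x, y) key.
        for x, y in combinations(sorted(set(cited)), 2):
            pairs[(x, y)] += 1
    return pairs

def centrality(pairs):
    """Assumed score: sum of a document's co-citation counts with all others."""
    score = Counter()
    for (x, y), n in pairs.items():
        score[x] += n
        score[y] += n
    return score

pairs = cocitation_counts(citations)
scores = centrality(pairs)
for doc, score in scores.most_common():
    print(doc, score)
```

The same counting applies to Web pages if "cites" is read as "links to".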

9
WEBSOM http://websom.hut.fi/websom/ (based
on Kohonen Maps)

10
Query Specification

Consider expressing a (faceted) Boolean query
through:
Command Language: expressive, accurate and quick
for experts; non-intuitive for novices (meaning
of AND / OR and use of nested structures)
Form fill-in: prompts user with attributes so
users don't need to learn that part of the
syntax
Menu selection: constrains users' choices where
there is a finite set of choices
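
To make the "meaning of AND / OR and use of nested structures" concrete, here is a small Python sketch of how a faceted Boolean query might be evaluated. The inverted index, the document ids and the tuple-based query notation are all assumptions made for this example, not part of any particular system.

```python
# Hypothetical inverted index: term -> set of document ids containing it.
index = {
    "retrieval": {1, 2, 4},
    "search": {2, 3},
    "interface": {1, 3, 4},
    "hci": {4, 5},
}

def evaluate(query, index):
    """Evaluate a nested query such as ("AND", ("OR", "a", "b"), "c")."""
    if isinstance(query, str):
        # A bare term matches the documents listed for it in the index.
        return set(index.get(query, set()))
    op, *operands = query
    results = [evaluate(q, index) for q in operands]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    raise ValueError(f"unknown operator: {op}")

# A faceted query: each facet is an OR of synonyms, facets are ANDed together.
query = ("AND", ("OR", "retrieval", "search"), ("OR", "interface", "hci"))
matches = evaluate(query, index)
print(sorted(matches))
```

A form fill-in or menu interface would build the same nested structure for the user, one field or menu per facet.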

11
Query Specification

Consider expressing a Boolean query through:
Direct Manipulation: gives users a better feel,
e.g. Venn diagrams for Boolean queries;
filter-flow visualisation shows results at each
stage
Natural Language: requires extracting concepts
and logical connectives in the disambiguation of
queries; may use question templates so users can
ask naturally phrased questions, cf. Ask Jeeves

12
Part 2: Information Visualisation for Making Sense
of the Results Set

Three systems considered here:
TileBars
InfoCrystal
Mapuccino

13
Making Sense of the Results Set

TileBars: shows which query terms occur where in
which documents
One row per query term
One column per passage of text
Shading indicates frequency of term in passage
Users can look for terms co-occurring in passages
and occurring throughout long documents
http://www.sims.berkeley.edu/hearst/tb-overview.html
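
The row/column/shading layout described above can be sketched in Python as follows. This is only an illustration: passages here are fixed-length word windows, which is an assumption made for simplicity, and the text and shading characters are invented.

```python
def tilebar(text, terms, passage_len=5):
    """Return {term: [frequency in passage 0, passage 1, ...]}."""
    words = text.lower().split()
    # Assumption: a passage is simply the next `passage_len` words.
    passages = [words[i:i + passage_len]
                for i in range(0, len(words), passage_len)]
    return {term: [p.count(term) for p in passages] for term in terms}

SHADES = " .:#"  # darker character = higher term frequency in the passage

def render(grid):
    """Draw one row per query term and one cell per passage."""
    lines = []
    for term, counts in grid.items():
        cells = "".join(SHADES[min(c, len(SHADES) - 1)] for c in counts)
        lines.append(f"{term:<10}|{cells}|")
    return "\n".join(lines)

text = "index query index term term term query rank rank index"
grid = tilebar(text, ["index", "query"])
print(render(grid))
```

Columns in which every row is shaded correspond to passages where the query terms co-occur.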

15
Making Sense of the Results Set

InfoCrystal: displays the results of a faceted
query simultaneously
Limited to four query terms
Shows number of documents in the intersections /
unions of the query terms
Users can judge the relative influence of facets
on the results set
http://www.scils.rutgers.edu/aspoerri/InfoCrystal/Ch_7.html
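
The counts that such a display needs can be sketched in Python as below. The documents and query terms are invented, and the function names are hypothetical; the sketch counts each document in exactly one region (the exact combination of query terms it contains) and derives union counts from those regions.

```python
from collections import Counter

# Hypothetical results set: document id -> terms found in the document.
docs = {
    "d1": {"visualisation", "retrieval"},
    "d2": {"retrieval"},
    "d3": {"visualisation", "retrieval", "interface"},
    "d4": {"interface", "layout"},
    "d5": {"retrieval", "interface"},
}

def region_counts(doc_terms, query_terms):
    """Count documents by the exact combination of query terms they contain."""
    query = set(query_terms)
    counts = Counter()
    for terms in doc_terms.values():
        matched = frozenset(terms & query)
        if matched:
            counts[matched] += 1
    return counts

def docs_matching_any(counts, chosen):
    """Union count: documents containing at least one of the chosen terms."""
    chosen = set(chosen)
    return sum(n for region, n in counts.items() if region & chosen)

counts = region_counts(docs, ["visualisation", "retrieval", "interface"])
for region, n in counts.items():
    print(sorted(region), n)
```

With n query terms there are at most 2^n - 1 non-empty regions, so four terms give up to 15 regions to draw.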

18
Making Sense of the Results Set

Mapuccino: shows a subset of a WWW site in a
nodes-and-links view
Positive weighting given to the outlinks of
relevant pages
The graphical depiction is of link structure;
however, this may reflect semantic content
http://www.alphaworks.ibm.com/tech/mapuccino

19
Further Reading

Baeza-Yates and Ribeiro-Neto (Chapter 10)

20
Part 3: Inference Beyond the Index

Scientific Citation Analysis: measuring impact
Discovering latent knowledge
Web-adjacency Analysis: identifying authorities
and hubs

21
Inference Beyond the Index

Main focus in IR research (and in this module)
has been on the Index, i.e. the mapping between
documents and descriptive keywords
Other kinds of relationships can help a user
searching for information, e.g. relationships
between documents / authors / keywords (cf.
Belew's Adaptive IR system)
Here considering in particular relationships
between documents (citation analysis)

22
Relationships between Documents: citations

When a new text is produced it may be woven into
the larger fabric of other texts, e.g.
conversational threads, legal arguments,
scientific papers and web-pages
Interesting information can be gained from
analysing which documents (or parts of documents)
cite / are cited by which other documents (or
parts of documents)

23
Scientific Citation Analysis

Belew pp. 185-190
Bibliometrics studies the graphs created by
bibliographic citation links
Can analyse the impact of a scientific paper,
e.g. how many times it is cited (by other
authors), i.e. measuring its in-degree
Can measure the similarity of two scientific
papers by comparing their bibliographies
Web of Science makes this kind of analysis
relatively easy now: http://wos.mimas.ac.uk/
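
Both measures above can be sketched in a few lines of Python. The bibliographies are invented, and the Jaccard overlap is an assumed choice of similarity measure; the slide only says that bibliographies are compared.

```python
from collections import Counter

# Hypothetical data: paper -> set of papers in its bibliography.
bibliographies = {
    "A": {"X", "Y", "Z"},
    "B": {"Y", "Z", "W"},
    "C": {"X"},
}

def in_degree(bibs):
    """Impact as in-degree: how many bibliographies cite each paper."""
    counts = Counter()
    for refs in bibs.values():
        counts.update(refs)
    return counts

def bibliography_similarity(bibs, p, q):
    """Assumed measure: Jaccard overlap of the two bibliographies."""
    a, b = bibs[p], bibs[q]
    union = a | b
    return len(a & b) / len(union) if union else 0.0

impact = in_degree(bibliographies)
sim_ab = bibliography_similarity(bibliographies, "A", "B")
print(impact.most_common())
print(sim_ab)
```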

24
Scientific Citation Analysis

Can analyse citation structure over time, e.g.
Figure 6.3
This shows 200 papers, numbered in the
order that they were published, and indicates
which other papers were cited in each
The graph highlights 'classic' papers (large
impact), 'review' papers and the rate at which
ideas are developing (the width of the research
front)

26
Discovering Latent Knowledge

Belew pp. 234-238
Scientists tend to specialise, sometimes so much
so that they are not aware of works in other
areas that could help solve problems in their own
area
Swanson considered the scientific literature as a
web of logical connections: some connections
are made explicit, e.g. through citations, but
others may be implicit
Swanson's idea was to discover latent knowledge
by automatically identifying implicit logical
connections in the literature

27
Discovering Latent Knowledge

Belew pp. 234-238
Query-cluster to identify factors related to a
medical condition / disease as reported in one
literature
Then, query-cluster to identify something related
to those factors in another literature
See Figure 6.22
Still a challenge to automatically label the
logical relationships between cause and
effect
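
The two-step procedure above can be sketched in Python. This is a rough illustration only: plain co-occurrence counting stands in for the query-cluster step, the function name is hypothetical, and the records are invented toy data loosely modelled on Swanson's well-known fish oil and Raynaud's disease example.

```python
from collections import Counter

def related_terms(literature, seed_terms):
    """Count terms that co-occur with any seed term in a literature.

    Assumption: co-occurrence in a record is a crude stand-in for the
    query-cluster step named on the slide.
    """
    seeds = set(seed_terms)
    counts = Counter()
    for terms in literature:
        if terms & seeds:
            counts.update(terms - seeds)
    return counts

# Invented records: each set holds the index terms of one article.
literature_1 = [
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation"},
    {"raynaud", "blood viscosity", "vasoconstriction"},
]
literature_2 = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"aspirin", "platelet aggregation"},
    {"fish oil", "cholesterol"},
]

# Step 1: factors reported alongside the condition in one literature.
factors = related_terms(literature_1, {"raynaud"})
# Step 2: whatever is reported alongside those factors in another literature.
candidates = related_terms(literature_2, set(factors))
print(candidates.most_common())
```

The sketch finds candidate links but says nothing about their direction or nature, which is the labelling challenge noted in the last point.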

29
Web-adjacency Analysis

Belew pp. 195-199
For some web-queries there may be many thousands
of relevant pages; perhaps we need to consider
another measure in order to provide a user with
the best results
From among all the relevant pages, select those
that are most authoritative

30
Web-adjacency Analysis

Kleinberg and colleagues proposed a method for
identifying authoritative web-pages
Identify set of relevant pages (as normal)
Identify those with a large in-degree, i.e. lots
of pages point to them (cf. impact)
Ensure that the authorities selected are referred
to by a number of the same hubs, i.e. those with
a large out-degree

31
Web-adjacency Analysis

Hubs and authorities exhibit what could be
called a mutually reinforcing relationship
(Kleinberg 1998, in Belew)
Computing authority and hub values for web-pages
is an iterative process over a graph, where each
node is a web-page
Two weights are given to each node relating to
in-degree and out-degree; total in-degree weights
and total out-degree weights are kept constant
Weights are modified each iteration depending on
weights of connected nodes
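
The iterative process above can be sketched in Python as follows. The link graph is invented, the iteration count is arbitrary, and rescaling each set of weights to sum to 1 is an assumed way of keeping the totals constant (Kleinberg's paper normalises the sum of squared weights instead).

```python
def normalise(weights):
    """Rescale weights so they sum to 1, keeping the total constant."""
    total = sum(weights.values())
    return {n: (w / total if total else 0.0) for n, w in weights.items()}

def hits(links, iterations=50):
    """Return (hub, authority) weights for a graph of page -> linked pages."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority weight: sum of hub weights of the pages pointing to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                new_auth[t] += hub[src]
        # Hub weight: sum of authority weights of the pages it points to.
        new_hub = {n: sum(new_auth[t] for t in links.get(n, ()))
                   for n in nodes}
        auth = normalise(new_auth)
        hub = normalise(new_hub)
    return hub, auth

# Hypothetical set of relevant pages and the links between them.
links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
    "a1": [],
    "a2": [],
}
hub, auth = hits(links)
print(sorted(auth.items(), key=lambda kv: -kv[1]))
```

In this toy graph the page pointed to by all three hubs ends up with the highest authority weight, and the hubs that point to both authorities outrank the one that points to only one.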

32
Further Reading

Swanson and Smalheiser (1997), An Interactive
System for Finding Complementary Literatures,
Artificial Intelligence 91, pp. 183-203.
Lawrence, Giles and Bollacker (1999), Digital
Libraries and Autonomous Citation Indexing, IEEE
Computer 32(6).
(Available through the library's eJournals, and on
the shelf).

33
Reading

Belew pages given in previous slides.
Kleinberg (1998), Authoritative Sources in a
Hyperlinked Environment, Journal of the ACM.
http://citeseer.nj.nec.com/87928.html
