Title: CSM06 Information Retrieval
1. CSM06 Information Retrieval
- LECTURE 6, Tuesday 28th October
- Dr Andrew Salway
- a.salway_at_surrey.ac.uk
2. Lecture 6: Some Current Issues in IR
- Part 1: HCI Issues for IR Interfaces
- Part 2: Information Visualisation
- Part 3: Inference Beyond the Index
3. Part 1: HCI Issues for IR
- Three principles for interface design (adapted from Shneiderman):
- Offer informative feedback
- Reduce working memory load
- Alternative interfaces for expert / novice
4. Offer Informative Feedback
- The interface should help the user to understand what the system has done and why
- For an information retrieval system this means helping the user to understand the relationship between their query and the returned documents
5. Reduce Working Memory Load
- Information retrieval is interactive: the interface must help the user keep track of different strategies / return to old queries
- The interface may also suggest search terms and let the user navigate a list of sources or a hierarchy of topics / keywords
6. Alternative Interfaces for Expert / Novice
- Trade-off between simplicity (ease of use) and power
- A good interface will allow a user to progress gradually from simple queries to more powerful ones, i.e. incrementally adding more sophisticated features to their searches and returning more information about the search results
7. Good HCI is important for:
- Selecting a document collection
- Query specification
- Making sense of the results set (Information Visualisation)
8. Selecting a Document Collection
- An unordered list of collections does not help a user to choose; overviews can help
- Manually organised topic categories, e.g. Yahoo, or in specialist domains like the 1200 ACM categories
- Automatic document clustering, e.g. with a text description of each cluster (Scatter/Gather) or a 2D map of the document collection where proximity and size of regions indicate the content of the document collection (e.g. Kohonen Maps)
- Co-citation analysis identifies which documents are central to a collection; it is also applied to links between Web pages
9. WEBSOM: http://websom.hut.fi/websom/ (based on Kohonen Maps)
10. Query Specification
- Consider expressing a (faceted) Boolean query through:
- Command language: expressive, accurate and quick for experts; non-intuitive for novices (meaning of AND / OR and use of nested structures)
- Form fill-in: prompts the user with attributes so users don't need to learn that part of the syntax
- Menu selection: constrains users' choices where there is a finite set of choices
11. Query Specification
- Consider expressing a Boolean query through:
- Direct manipulation: gives users a better feel, e.g. Venn diagrams for Boolean queries; filter-flow visualisation shows results at each stage
- Natural language: requires extracting concepts and logical connectives when disambiguating queries; may use question templates so users can ask naturally phrased questions, cf. Ask Jeeves
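Whichever interface is used, the query that reaches the system is the same kind of object. As a minimal sketch, assuming a faceted Boolean query means OR within each facet and AND across facets, it can be evaluated with set operations. The index contents and the function name evaluate_faceted are hypothetical, chosen only for illustration.

```python
# Hypothetical inverted index: term -> set of document ids (illustrative data).
INDEX = {
    "retrieval": {1, 2, 4},
    "search": {2, 3},
    "interface": {1, 3, 4},
    "visualisation": {4, 5},
}


def evaluate_faceted(index, facets):
    """Evaluate a faceted Boolean query: OR within a facet, AND across facets."""
    result = None
    for facet in facets:
        # Union (OR) of the document sets for the alternative terms in this facet.
        matches = set()
        for term in facet:
            matches |= index.get(term, set())
        # Intersection (AND) with the documents that satisfied earlier facets.
        result = matches if result is None else result & matches
    return result if result is not None else set()


# (retrieval OR search) AND (interface OR visualisation)
query = [["retrieval", "search"], ["interface", "visualisation"]]
hits = evaluate_faceted(INDEX, query)
print(sorted(hits))
```

A form fill-in or menu interface could build the same list-of-facets structure without the user ever typing AND / OR.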
12. Part 2: Information Visualisation for Making Sense of the Results Set
- Three systems considered here:
- TileBars
- InfoCrystal
- Mapuccino
13. Making Sense of the Results Set
- TileBars shows which query terms are where in which documents
- One row per query term
- One column per passage of text
- Shading indicates frequency of term in passage
- Users can look for terms co-occurring in passages and occurring throughout long documents
- http://www.sims.berkeley.edu/hearst/tb-overview.html
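The grid behind a TileBars display can be sketched as a small term-by-passage frequency matrix. This is only an illustration: passages here are fixed-length word windows, which is an assumption made for brevity and not how TileBars itself segments text, and the function names and sample text are hypothetical.

```python
def tilebar_grid(text, query_terms, passage_len=4):
    """Return one row per query term and one column per passage.

    Each cell holds the frequency of the term in that passage; a display
    would map larger values to darker shading.
    """
    words = text.lower().split()
    # Assumption: passages are consecutive windows of passage_len words.
    passages = [words[i:i + passage_len] for i in range(0, len(words), passage_len)]
    return [[passage.count(term.lower()) for passage in passages] for term in query_terms]


def cooccurring_passages(grid):
    """Indices of passages (columns) in which every query term occurs."""
    n_passages = len(grid[0]) if grid else 0
    return [j for j in range(n_passages) if all(row[j] > 0 for row in grid)]


text = "cats chase mice daily dogs chase cats often mice fear dogs greatly"
grid = tilebar_grid(text, ["cats", "dogs"])
for term, row in zip(["cats", "dogs"], grid):
    print(term, row)
print("co-occur in passages:", cooccurring_passages(grid))
```

Reading down a column shows which terms co-occur in a passage; reading along a row shows whether a term occurs throughout the document.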
15. Making Sense of the Results Set
- InfoCrystal displays the results of a faceted query simultaneously
- Limited to four query terms
- Shows number of documents in the intersections / unions of the query terms
- Users can judge the relative influence of facets on the results set
- http://www.scils.rutgers.edu/aspoerri/InfoCrystal/Ch_7.html
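The numbers that such a display needs can be sketched by counting documents for every combination of query terms. This is a rough illustration of the counting only, not of InfoCrystal's layout; the postings data and the function name intersection_counts are hypothetical.

```python
from itertools import combinations


def intersection_counts(postings):
    """Count documents matching all terms in each non-empty combination of terms."""
    terms = sorted(postings)
    counts = {}
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            # Documents that contain every term in this combination.
            docs = set.intersection(*(postings[t] for t in combo))
            counts[combo] = len(docs)
    return counts


# Hypothetical postings: query term -> set of matching document ids.
postings = {
    "hci": {1, 2, 3},
    "query": {2, 3, 4},
    "visual": {3, 5},
}
counts = intersection_counts(postings)
union_size = len(set().union(*postings.values()))

for combo, n in counts.items():
    print(" AND ".join(combo), n)
print("any term:", union_size)
```

With n query terms there are 2^n - 1 combinations, so four terms already give 15 counts to display.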
18. Making Sense of the Results Set
- Mapuccino shows a subset of a WWW site in a nodes-and-links view
- Positive weighting is given to the outlinks of relevant pages
- The graphical depiction is of link structure; however, this may reflect semantic content
- http://www.alphaworks.ibm.com/tech/mapuccino
19. Further Reading
- Baeza-Yates and Ribeiro-Neto (Chapter 10)
20. Part 3: Inference Beyond the Index
- Scientific Citation Analysis: measuring impact
- Discovering latent knowledge
- Web-adjacency Analysis: identifying authorities and hubs
21. Inference Beyond the Index
- The main focus in IR research (and in this module) has been on the Index, i.e. the mapping between documents and descriptive keywords
- Other kinds of relationships can help a user searching for information, e.g. relationships between documents / authors / keywords (cf. Belew's Adaptive IR system)
- Here we consider in particular relationships between documents (citation analysis)
22. Relationships between Documents: Citations
- When a new text is produced it may be woven into the larger fabric of other texts, e.g. conversational threads, legal arguments, scientific papers and web pages
- Interesting information can be gained from analysing which documents (or parts of documents) cite / are cited by which other documents (or parts of documents)
23. Scientific Citation Analysis
- Belew pp. 185-190
- Bibliometrics studies the graphs created by bibliographic citation links
- Can analyse the impact of a scientific paper, e.g. how many times it is cited (by other authors), measuring its in-degree
- Can measure the similarity of two scientific papers by comparing their bibliographies
- Web of Science makes this kind of analysis relatively easy now: http://wos.mimas.ac.uk/
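Both measures can be sketched on a toy citation graph. The graph below is hypothetical, and the use of Jaccard overlap to compare bibliographies is an assumption made for illustration; the slides do not name a particular similarity measure.

```python
from collections import Counter

# Hypothetical citation graph: paper -> set of papers in its bibliography.
CITES = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"E"},
    "D": set(),
    "E": set(),
}


def in_degree(cites):
    """Number of times each paper is cited by the other papers in the graph."""
    counts = Counter({paper: 0 for paper in cites})
    for bibliography in cites.values():
        counts.update(bibliography)
    return counts


def bibliography_similarity(cites, p, q):
    """Jaccard overlap of two bibliographies (assumed measure): shared / combined."""
    shared = cites[p] & cites[q]
    combined = cites[p] | cites[q]
    return len(shared) / len(combined) if combined else 0.0


impact = in_degree(CITES)
print(dict(impact))
print(bibliography_similarity(CITES, "A", "B"))
```

Here papers A and B cite two of the same papers out of three cited between them, so they come out as fairly similar even though neither cites the other.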
24. Scientific Citation Analysis
- Can analyse citation structure over time, e.g. Figure 6.3
- This shows 200 papers, numbered in the order that they were published, and indicates which other papers were cited in each
- The graph highlights classic papers (large impact), review papers and the rate at which ideas are developing (the width of the research front)
26. Discovering Latent Knowledge
- Belew pp. 234-238
- Scientists tend to specialise, sometimes so much so that they are not aware of work in other areas that could help solve problems in their own area
- Swanson considered the scientific literature as a web of logical connections: some connections are made explicit, e.g. through citations, but others may be implicit
- Swanson's idea was to discover latent knowledge by automatically identifying implicit logical connections in the literature
27. Discovering Latent Knowledge
- Belew pp. 234-238
- Query-cluster to identify factors related to a medical condition / disease as reported in one literature
- Then, query-cluster to identify something related to those factors in another literature
- See Figure 6.22
- It is still a challenge to automatically label the logical relationships between cause and effect
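The two-step procedure can be sketched very roughly as follows. This is a simplification under stated assumptions: each literature is a list of records, each record is a set of keywords, and "related" just means co-occurring in a record, which stands in for the query-cluster step. The records are toy data loosely modelled on Swanson's well-known Raynaud's / fish oil example, and the function names are hypothetical.

```python
def related_terms(literature, term):
    """Terms that co-occur with `term` in at least one record of the literature."""
    related = set()
    for record in literature:
        if term in record:
            related |= record - {term}
    return related


def latent_links(disease, disease_lit, other_lit):
    """Map each candidate term in other_lit to the factors linking it to the disease."""
    # Step 1: factors related to the disease in its own literature.
    factors = related_terms(disease_lit, disease)
    links = {}
    # Step 2: things related to those factors in the other literature.
    for factor in factors:
        for candidate in related_terms(other_lit, factor):
            if candidate not in factors and candidate != disease:
                links.setdefault(candidate, set()).add(factor)
    return links


# Toy records: each record is the keyword set of one (hypothetical) paper.
disease_lit = [
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation"},
]
other_lit = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"aspirin", "headache"},
]
links = latent_links("raynaud", disease_lit, other_lit)
print(links)
```

The sketch finds that a connection exists but says nothing about its direction or nature, which is the labelling problem noted in the last bullet.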
29. Web-adjacency Analysis
- Belew pp. 195-199
- For some web queries there may be many thousands of relevant pages; perhaps we need to consider another measure in order to provide a user with the best results
- From among all the relevant pages, select those that are most authoritative
30. Web-adjacency Analysis
- Kleinberg and colleagues proposed a method for identifying authoritative web pages:
- Identify the set of relevant pages (as normal)
- Identify those with a large in-degree, i.e. lots of pages point to them (cf. impact)
- Ensure that the authorities selected are referred to by a number of the same hubs, i.e. those with a large out-degree
31. Web-adjacency Analysis
- Hubs and authorities exhibit what could be called a mutually reinforcing relationship (Kleinberg 1998, in Belew)
- Computing authority and hub values for web pages is an iterative process over a graph, where each node is a web page
- Two weights are given to each node, relating to in-degree and out-degree; total in-degree weights and total out-degree weights are kept constant
- Weights are modified each iteration depending on weights of connected nodes
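The iteration described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the link graph is hypothetical, the number of iterations is fixed arbitrarily, and the totals are kept constant by rescaling each set of weights so that its squares sum to 1.

```python
import math


def normalise(weights):
    """Rescale weights so that the sum of their squares is 1."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {n: (w / norm if norm else 0.0) for n, w in weights.items()}


def hits(links, iterations=50):
    """Return (authority, hub) weights for a graph given as page -> pages it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority weight: sum of the hub weights of the pages pointing to the node.
        new_auth = {n: 0.0 for n in nodes}
        for source, targets in links.items():
            for target in targets:
                new_auth[target] += hub[source]
        # Hub weight: sum of the authority weights of the pages the node points to.
        new_hub = {n: sum(new_auth[t] for t in links.get(n, ())) for n in nodes}
        auth = normalise(new_auth)
        hub = normalise(new_hub)
    return auth, hub


# Hypothetical graph: three hub-like pages pointing at two candidate authorities.
LINKS = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
    "a1": [],
    "a2": [],
}
auth, hub = hits(LINKS)
print(sorted(auth, key=auth.get, reverse=True)[:2])
print(sorted(hub, key=hub.get, reverse=True)[:2])
```

In this toy graph a1 ends up with the largest authority weight because all three hubs point to it, and h1 and h2 score higher as hubs than h3 because they point to both authorities.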
32. Further Reading
- Swanson and Smalheiser (1997), An Interactive System for Finding Complementary Literatures, Artificial Intelligence 91, pp. 183-203.
- Lawrence, Giles and Bollacker (1999), Digital Libraries and Autonomous Citation Indexing, IEEE Computer 32(6).
- (Available through the library's eJournals, and on the shelf).
33. Reading
- Belew: pages given in previous slides.
- Kleinberg (1998), Authoritative Sources in a Hyperlinked Environment, Journal of the ACM. http://citeseer.nj.nec.com/87928.html