Title: Tutorial: User Interfaces
1. Tutorial: User Interfaces and Visualization for Information Access
- Prof. Marti Hearst
- University of California, Berkeley
- http://www.sims.berkeley.edu/~hearst
- SIGIR 2000
2. Outline
- Search Interfaces Today
- HCI Foundations
- The Information Seeking Process
- Visualizing Text Collections
- Incorporating Context and Tasks
- Promising Future Directions
3. Introductory Remarks
- Much of HCI is art, still not science
- In this tutorial, I discuss user studies whenever available
- I do not have time to do justice to most of the topics
4. What do current search interfaces do well?
5. Web search interfaces: solutions
- Single-word queries
- Standard IR assumed long queries
- Web searches average 1.5-2.5 words
- Problems
- One word can have many meanings
- What context is the word used in?
- Which of many articles to retrieve?
12. Web search interfaces: solutions
- Single-word queries
- Solutions
- Incorporation of manually-created categories
- Provides useful starting points
- Disambiguates the term(s)
- Development of sideways hierarchy representation
- Ranking that emphasizes starting points
- Link analysis finds server home pages, etc.
- Use of behavior of other users
- Suggests related pages
- Popularity of pages
13. Web search interfaces are interesting in terms of what they do NOT do
- Current Web interfaces
- Are the results of each site experimenting
- Only those ideas that work for most people survive
- Only very simple ideas remain
- Abandoned strategies
- Scores and graphical bars that show degree of match
- Associated term graph (AltaVista)
- Suggested terms for expansion (Excite)
- Why did these die?
14. What is lacking in Web search?
15. What is lacking?
- Support for complex information needs
- Info on the construction on Highway 80
- Research chemo vs. surgery
- How does the 6th Circuit tend to rule on intellectual property cases?
- What is the prior art for this invention?
16. What is lacking?
- Integration of search and analysis
- Support for a series of searches
- Backing up and moving forward
- Suggested next steps
- Comparisons and contrasts
- Personal prior history and interests
- More generally
- CONTEXT
- INTERACTIVITY
17. What is lacking?
- Question Answering
- Answers, not documents!
- Active area of research and industry
- Not always appropriate
- Should I have chemo or surgery?
- Who will win the election?
20. Question Answering: State of the Art
- Kupiec SIGIR 93, Srihari & Li NAACL 00, Cardie et al. NAACL 00
- Goal: find a paragraph, phrase, or sentence that (hopefully) answers the question
- Approach
- Identify certain types of noun phrases
- People
- Dates
- Hook these up with question types
- Who
- When
- Match keywords in the question to keywords in a candidate answer that contains the right kind of NP (see the sketch below)
- Use syntactic or simple frame semantics to help with matching (optional)
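To make the pipeline concrete, here is a minimal sketch of this matching approach. The regular expressions stand in for a real noun-phrase/named-entity tagger, and the table, patterns, and function names are all illustrative assumptions, not the cited systems' actual code:

```python
import re

# Map a question word to the entity type a candidate answer should contain.
QUESTION_TYPE_TO_ENTITY = {"who": "PERSON", "when": "DATE"}

DATE_PAT = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")     # crude year matcher
PERSON_PAT = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")  # crude proper-name matcher

def entity_types(sentence):
    """Stand-in for a real noun-phrase / named-entity tagger."""
    types = set()
    if DATE_PAT.search(sentence):
        types.add("DATE")
    if PERSON_PAT.search(sentence):
        types.add("PERSON")
    return types

def answer(question, sentences):
    """Pick the candidate sentence with the most keyword overlap that
    also contains the right kind of noun phrase for the question type."""
    wanted = QUESTION_TYPE_TO_ENTITY.get(question.lower().split()[0])
    q_terms = set(question.lower().rstrip("?").split()) - {"who", "when", "did", "the", "was"}
    best, best_score = None, 0
    for s in sentences:
        if wanted and wanted not in entity_types(s):
            continue  # candidate must contain the right NP type
        score = len(q_terms & set(s.lower().split()))
        if score > best_score:
            best, best_score = s, score
    return best
```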
21. The future of search tools: A Prediction of a Dichotomy
- Information Intensive
- Business analysis
- Scientific research
- Planning & design
- Quick lookup
- Question answering
- Context-dependent info (location, time)
22. Human-Computer Interaction
23. What is HCI?
- HCI = Human-Computer Interaction
- A discipline concerned with
- design
- evaluation
- implementation
- of interactive computing systems for human use
- The study of major phenomena surrounding the interaction of humans with computers
24. Shneiderman on HCI
- Well-designed interactive computer systems promote
- Positive feelings of success, competence, and mastery
- Allow users to concentrate on their work, rather than on the system
25. What is HCI?
[Diagram: Humans and Technology interacting on a Task, embedded in Organizational and Social Issues]
26. User-centered Design
- Focus first on what people need to do, not what the system needs to do
- Formulate typical scenarios of use
- Take into account
- Cognitive constraints
- Organizational/Social constraints
- Keep users involved throughout the project
27. Waterfall Design Model (from Software Engineering)
[Diagram: Initiation (Application Description) -> Analysis (Requirements Specification) -> Design (System Design) -> Implementation (Product) -> ?]
28. UI Design Iteration
[Diagram: Design -> Prototype -> Evaluate, repeated in a cycle]
29. Comparing Design Processes
- Waterfall model
- The customer is not the user
- User-centered design
- Assess what the user needs
- Design for this
- Redesign if user needs are not met
30. Steps in Standard UI Design
- Needs Assessment / Task Analysis
- Low-fidelity Prototype Evaluation
- Redesign
- Interactive Prototype
- Heuristic Evaluation
- Redesign
- Revised Interactive Prototype
- Pilot User Study
- Redesign
- Revised Interactive Prototype
- Larger User Study
31. Task Analysis
- Observe existing work practices
- Create examples and scenarios of actual use
- Try out new ideas before building software
32. Rapid Prototyping
- Build a mock-up of design
- Low fidelity techniques
- paper sketches
- cut, copy, paste
- video segments
- Interactive prototyping tools
- Visual Basic, HyperCard, Director, etc.
- UI builders
- NeXT, etc.
33. Usability Evaluation: Standard Techniques
- User studies
- Have people use the interface to complete some tasks
- Requires an implemented interface
- "Discount" vs. scientific results
- Heuristic Evaluation
- Usability expert assesses the interface against guidelines
34. Cognitive Considerations: Norman's Action Cycle
- Human action has two aspects
- execution and evaluation
- Execution: doing something
- Evaluation: comparison of what happened to what was desired
35. Action Cycle
[Diagram: Goals, connected to The World through Execution and Evaluation]
36. Action Cycle
[Diagram, elaborated. Execution side: intention to act; sequence of actions; execution of the sequence of actions. Evaluation side: perceiving the state of the world; interpreting the perception; evaluation of interpretations.]
37. Norman's Action Cycle
- Execution has three stages
- Start with a goal
- Translate into an intention
- Translate into a sequence of actions
- Now execute the actions
- Evaluation has three stages
- Perceive world
- Interpret what was perceived
- Compare with respect to original intentions
38. Gulf of Evaluation
- The amount of effort a person must exert to interpret
- the physical state of the system
- how well the expectations and intentions have been met
- We want a small gulf!
39. Mental Models
- People have mental models of how things work
- how does your car start?
- how does an ATM machine work?
- how does your computer boot?
- Allows people to make predictions about how things will work
40. Strategy for Design
- Provide a good conceptual model
- allows users to predict consequences of actions
- communicated through the image of the system
- relations between the user's intentions, required actions, and results should be
- sensible
- consistent
- meaningful (non-arbitrary)
41. Design Guidelines
Shneiderman (8 design rules)
- Consistency
- Shortcuts (for experts)
- Feedback
- Closure
- Error prevention
- Easy reversal of actions
- User control
- Low memory burden
There are hundreds of design guidelines listings!
42. Design Guidelines for Search UIs
- I think the most important are
- Reduce memory burden / Provide feedback
- Previews
- History
- Context
- User control
- Query modification
- Flexible manipulation of results
- Easy reversal of actions
43. Designing for Error
- Norman on designing for error
- Understand the causes of error and design to minimize those causes
- Make it possible to reverse actions
- Make it hard to do non-reversible actions
- Make it easy to discover the errors that do occur
- Change attitude towards errors
- A user is attempting to do a task, getting there by imperfect approximations: actions are approximations to what is actually desired
44. HCI Intro Summary
- UI design involves users
- UI design is iterative
- An art, not a science
- Evaluation is key
- Design guidelines
- are useful
- but application to information-centric systems can be difficult
45. Recommended HCI Books
- Alan Dix et al., Human-Computer Interaction, 2nd edition, Prentice Hall, Feb 1998
- Ben Shneiderman, Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd ed., Addison-Wesley, 1998
- Jakob Nielsen, Usability Engineering, Morgan Kaufmann, 1994
- Holtzblatt and Beyer, Making Customer-Centered Design Work for Teams, CACM 36(10), October 1993
- www.useit.com
- world.std.com/uieweb
- usableweb.com
46. Supporting the Information Seeking Process
- Two parts to the process
- search and retrieval
- analysis and synthesis of search results
47. Standard IR Model
- Assumptions
- Maximizing precision and recall simultaneously
- The information need remains static
- The value is in the resulting document set
48. Problem with Standard Model
- Users learn during the search process
- Scanning titles of retrieved documents
- Reading retrieved documents
- Viewing lists of related topics/thesaurus terms
- Navigating hyperlinks
- Some users don't like long disorganized lists of documents
49. A sketch of a searcher moving through many actions towards a general goal of satisfactory completion of research related to an information need (after Bates 89)
[Figure: a meandering search path through query points Q0 through Q5]
50. Berry-picking model (Bates 90)
- The query is continually shifting
- Users may move through a variety of sources
- New information may yield new ideas and new directions
- The query is not satisfied by a single, final retrieved set, but rather by a series of selections and bits of information found along the way
51. Implications
- Interfaces should make it easy to store intermediate results
- Interfaces should make it easy to follow trails with unanticipated results
- Makes evaluation more difficult
52. Orienteering (O'Day & Jeffries 93)
- Interconnected but diverse searches on a single, problem-based theme
- Focus on information delivery rather than search performance
- Classifications resulting from an extended observational study
- 15 clients of professional intermediaries
- financial analyst, venture capitalist, product marketing engineer, statistician, etc.
53. Orienteering (O'Day & Jeffries 93)
- Identified three main search types
- Monitoring
- Following a plan
- Exploratory
- A series of interconnected but diverse searches on one problem-based theme
- Changes in direction caused by triggers
- Each stage followed by reading, assimilation, and analysis of resulting material
54. Orienteering (O'Day & Jeffries 93)
- Defined three main search types
- monitoring
- a well-known topic over time
- e.g., research four competitors every quarter
- following a plan
- a typical approach to the task at hand
- e.g., improve business process X
- exploratory
- explore a topic in an undirected fashion
- e.g., get to know an unfamiliar industry
55. Orienteering (O'Day & Jeffries 93)
- Trends
- A series of interconnected but diverse searches on one problem-based theme
- This happened in all three search modes
- Each analyst did at least two search types
- Each stage followed by reading, assimilation, and analysis of resulting material
56. Orienteering (O'Day & Jeffries 93)
- Searches tended to trigger new directions
- Overview, then detail, repeat
- Information need shifted between search requests
- Context of the problem and of previous searches was carried to the next stage of search
- The value was contained in the accumulation of search results, not the final result set
- These observations verified Bates' predictions
57. Orienteering (O'Day & Jeffries 93)
- Triggers: motivation to switch from one strategy to another
- next logical step in a plan
- encountering something interesting
- explaining change
- finding missing pieces
58. Stop Conditions (O'Day & Jeffries 93)
- Stopping conditions were not as clear as triggers
- People stopped searching when
- no more compelling triggers
- an appropriate amount of searching for the task was finished
- a specific inhibiting factor arose
- e.g., learning the market was too small
- lack of increasing returns
- 80/20 rule
- Missing information/inferences OK
- the business world is different from scholarship
59. After the Search: Analyzing and Synthesizing Search Results
- Orienteering post-search behaviors
- Read and annotate
- Analyze: 80% fell into six main types
60. Post-Search Analysis Types (O'Day & Jeffries 93)
- Trends
- Comparisons
- Aggregation and Scaling
- Identifying a Critical Subset
- Assessing
- Interpreting
- The rest
- cross-reference
- summarize
- find evocative visualizations
- miscellaneous
61. SenseMaking (Russell et al. 93)
- The process of encoding retrieved information to answer task-specific questions
- Combine
- internal cognitive resources
- external retrieved resources
- Create a good representation
- an iterative process
- contend with a cost/benefit tradeoff
62. UIs for Supporting the Search Process
63. InfoGrid (design mockup) (Rao et al. 92)
64. InfoGrid/Protofoil (Rao et al. 92)
- A general search interface architecture
- Item stash -- stores retrieved docs
- Search Event -- current query
- History -- history of queries
- Result Item -- view selected doc's metadata
65. InfoGrid Design Mockups (Rao et al. 92)
66. DLITE (Cousins 97)
- Drag-and-drop interface
- Reifies queries, sources, retrieval results
- Animation to keep track of activity
67. DLITE
- UI to a digital library
- Direct manipulation interface to distributed IR
- Workcenter approach
- lots of handy tools
- experts create workcenters
- contents persistent
- concurrently shareable across sites
- Web browser used to display document or collection metadata
69. Interaction
- Pointing at an object brings up a tooltip -- metadata
- Activating an object -- component-specific action
- 5 types for the result set component
- Drag-and-drop data onto a program
- Animation used to show what happens with drag-and-drop (e.g., waggling)
70. Users Reacting to DLITE
- Two participant pools
- 7 Stanford CS
- 11 NASA researchers & librarians
- Requires learning, initially unfamiliar
- Many requested help pages
- After the model was understood, few errors
- Overall positive attitude, even stronger after a two-week delay
- Successfully remembered most features after the two-week lag
71. Keeping Track of History
- Techniques
- List of prior queries and results (standard)
- Slide-sorter view: snapshots of earlier interactions
- Graphical hierarchy for web browsing
72. Keeping Track of History
- PadPrints (Hightower et al. 98)
- Tree-based history of recently visited web pages; the history map is placed to the left of the browser window
- Zoomable; can shrink sub-hierarchies
- Node = title + thumbnail
73. PadPrints (Hightower et al. 98)
74. PadPrints (Hightower et al. 98)
75. Initial User Study of PadPrints
- 13.4% unable to find recently visited pages
- only 0.1% use the History button; 42% use Back
- problems with the history list (according to the authors)
- incomplete: lose out on every branch
- textual (not necessarily a problem!)
- pull-down menu cumbersome -- cannot see history along with the current document
76. Second User Study of PadPrints
- Changed the task to involve revisiting web pages
- CHI database, National Park Service website
- Only correctly answered questions considered
- 20-30% fewer pages accessed
- faster response time for tasks that involve revisiting pages
- slightly better user satisfaction ratings
77. Info Seeking Summary
- The standard model (issue query, get results, repeat) is not fully adequate
- Berry-picking/orienteering offer an alternative to the standard IR model
- Interfaces can be devised to support the interactive process over time
- More work needs to be done to support the process of completing information-seeking tasks
78. Interactive Query Modification
79. Query Modification
- Problem: how to reformulate the query?
- Thesaurus expansion
- Suggest terms similar to query terms
- Relevance feedback
- Suggest terms (and documents) similar to retrieved documents that have been judged relevant
80. Relevance Feedback
- Usually do both
- expand the query with new terms
- re-weight terms in the query
- There are many variations
- usually positive weights for terms from relevant docs
- sometimes negative weights for terms from non-relevant docs (a Rocchio-style sketch follows below)
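As a concrete illustration of expand-and-reweight, here is a minimal Rocchio-style sketch over sparse term-weight dictionaries. The function name, representation, and textbook default mixing weights are assumptions for illustration, not the method of any one system discussed here:

```python
from collections import defaultdict

def rocchio(query, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """One round of Rocchio-style feedback. Each vector is a sparse
    dict mapping term -> weight."""
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for doc in rel_docs:                    # positive weights from relevant docs
        for term, w in doc.items():
            new_q[term] += beta * w / len(rel_docs)
    for doc in nonrel_docs:                 # optional negative evidence
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrel_docs)
    # Terms whose final weight goes negative are usually dropped.
    return {t: w for t, w in new_q.items() if w > 0}
```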
81. Using Relevance Feedback
- Known to improve results
- in TREC-like conditions (no user involved)
- What about with a user in the loop?
- How might you measure this?
- Let's examine a user study of relevance feedback by Koenemann & Belkin 1996
82. Questions Being Investigated (Koenemann & Belkin 96)
- How well do users work with statistical ranking on full text?
- Does relevance feedback improve results?
- Is user control over the operation of relevance feedback helpful?
- How do different levels of user control affect results?
83. How much of the guts should the user see?
- Opaque (black box)
- (like web search engines)
- Transparent
- (see the available terms after the r.f.)
- Penetrable
- (see the suggested terms before the r.f.)
- Which do you think worked best?
85. Terms available for relevance feedback made visible (from Koenemann & Belkin)
86. Details on User Study (Koenemann & Belkin 96)
- Subjects have a tutorial session to learn the system
- Their goal is to keep modifying the query until they've developed one that gets high precision
- This is an example of a routing query (as opposed to ad hoc)
- Reweighting
- They did not reweight query terms
- Instead, only term expansion
- pool all terms in the relevant docs
- take the top n terms, where n = 3 + (number-marked-relevant-docs x 2)
- (the more marked docs, the more terms added to the query; see the sketch below)
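A small sketch of that expansion rule; using raw frequency in the pooled relevant documents is an assumption standing in for INQUERY's own term weighting:

```python
from collections import Counter

def expansion_terms(marked_relevant_docs, stopwords=frozenset()):
    """Pool all terms from the marked-relevant docs and keep the
    top n = 3 + 2 * (number of marked docs), per the rule above."""
    pool = Counter()
    for doc_tokens in marked_relevant_docs:   # each doc: a list of tokens
        pool.update(t for t in doc_tokens if t not in stopwords)
    n = 3 + 2 * len(marked_relevant_docs)
    return [term for term, _ in pool.most_common(n)]
```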
87. Details on User Study (Koenemann & Belkin 96)
- 64 novice searchers
- 43 female, 21 male, native English speakers
- TREC test bed
- Wall Street Journal subset
- Two search topics
- Automobile Recalls
- Tobacco Advertising and the Young
- Relevance judgements from TREC and the experimenter
- System was INQUERY (vector space with some bells and whistles)
88. Sample TREC query
89. Evaluation
- Precision at 30 documents (see the snippet below)
- Baseline (Trial 1)
- How well does the initial search go?
- One topic has more relevant docs than the other
- Experimental condition (Trial 2)
- Subjects get a tutorial on relevance feedback
- Modify the query in one of four modes
- no r.f., opaque, transparent, penetrable
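For reference, the evaluation measure is easy to state in code (names are illustrative):

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k=30):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_doc_ids[:k] if d in relevant_ids) / k

# e.g., precision_at_k(["d3", "d7", "d1"], {"d1", "d3"}, k=3) -> 0.667
```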
90. Precision vs. RF condition (from Koenemann & Belkin 96)
91. Effectiveness Results
- Subjects with R.F. performed 17-34% better than those with no R.F.
- Subjects in the penetrable case did 15% better as a group than those in the opaque and transparent cases
92. Number of iterations in formulating queries (from Koenemann & Belkin 96)
93. Behavior Results
- Search times approximately equal
- Precision increased in the first few iterations
- The penetrable case required fewer iterations to make a good query than the transparent and opaque cases
- R.F. queries much longer
- but fewer terms in the penetrable case -- users were more selective about which terms were added
94. Relevance Feedback Summary
- Iterative query modification can improve precision and recall for a standing query
- In at least one study, users were able to make good choices by seeing which terms were suggested for R.F. and selecting among them
- So "more like this" can be useful!
- But it usually requires more than one document, unlike how web versions work
96. Alternative Notions of Relevance Feedback
97. Social and Implicit Relevance Feedback
- Find people whose taste is similar to yours. Will you like what they like?
- Follow a user's actions in the background. Can this be used to predict what the user will want to see next?
- Track what lots of people are doing. Does this implicitly indicate what they think is good and not good?
98. Collaborative Filtering (social filtering)
- If Pam liked the paper, I'll like the paper
- If you liked Star Wars, you'll like Independence Day
- Rating based on ratings of similar people (see the sketch below)
- Ignores the text, so works on text, sound, pictures, etc.
- But: initial users can bias the ratings seen by future users
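A minimal sketch of the idea: predict one user's rating of an item from similarity-weighted ratings of other users. Cosine similarity and the dict-of-dicts layout are illustrative choices, not Ringo's or GroupLens's actual algorithms:

```python
import math

def cosine(a, b):
    """Similarity of two users' rating dicts (item -> rating)."""
    shared = set(a) & set(b)
    num = sum(a[i] * b[i] for i in shared)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def predict_rating(user, item, ratings):
    """Predict user's rating of item as a similarity-weighted average
    of other users' ratings -- content is never consulted, only taste."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        s = cosine(ratings[user], theirs)
        num += s * theirs[item]
        den += abs(s)
    return num / den if den else None
```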
99. Social Filtering
- Ignores the content; only looks at who judges things similarly
- Works well on data relating to taste
- something that people are good at predicting about each other, too
- Does it work for topic?
- GroupLens results suggest otherwise (preliminary)
- Perhaps for quality assessments
- What about for assessing whether a document is about a topic?
100. Learning Interface Agents
- Use machine learning to improve performance
- learn user behavior, preferences
- Useful when
- 1) past behavior is a useful predictor of the future
- 2) there is a wide variety of behaviors amongst users
- Examples
- mail clerk: sort incoming messages into the right mailboxes
- calendar manager: automatically schedule meeting times?
101. Example Systems
- WebWatcher
- Letizia
- Vary according to
- whether the user states a topic or not
- whether the user rates pages or not
102. WebWatcher (Freitag et al.)
- A "tour guide" agent for the WWW
- User tells it what kind of information is wanted
- System tracks web actions
- Highlights hyperlinks that it computes will be of interest
- Strategy for giving advice is learned from feedback from earlier tours
- Uses WINNOW as the learning algorithm
104. Letizia (Lieberman 95)
[Diagram: the user browses; Letizia applies heuristics to a user profile to produce recommendations]
- Recommends web pages during browsing, based on a user profile
- Learns the user profile using simple heuristics
- Passive observation; recommends on request
- Provides a relative ordering of link interestingness
- Assumes recommendations near the current page are more valuable than others
105. Letizia (Lieberman 95)
- Infers user preferences from behavior (a toy scoring sketch follows below)
- Interesting pages
- record in hot list
- save as a file
- follow several links from the page
- return several times to the document
- Not interesting
- spend a short time on the document
- return to the previous document without following links
- pass over a link to the document (selecting links above and below it)
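A toy scoring sketch of such behavioral heuristics; every event name and weight here is an assumption for illustration, not Lieberman's actual model:

```python
def interest_score(page_events):
    """Sum heuristic evidence of interest from observed browsing
    events for one page (hypothetical event names and weights)."""
    weights = {
        "bookmarked": 3.0,      # recorded in hot list
        "saved": 3.0,           # saved as a file
        "followed_link": 1.0,   # followed links off the page
        "revisit": 1.0,         # returned to the page
        "short_dwell": -1.0,    # left quickly
        "skipped_link": -0.5,   # passed over a link to the page
    }
    return sum(weights.get(e, 0.0) for e in page_events)
```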
106. Consequences of Passive Observation
- No ability to fine-tune the profile or express interest without visiting appropriate pages
- Weak heuristics
- Must click through multiple uninteresting pages en route to interesting ones
- Hierarchies tend to get more hits near the root
- But page read time does seem to robustly indicate interest (across many pages and many users)
107. MARS (Rui et al. 97)
Relevance feedback based on image similarity
108. Time Series R.F. (Keogh & Pazzani 98)
109. Social and Implicit Relevance Feedback
- Several different criteria to consider
- Implicit vs. explicit judgements
- Individual vs. group judgements
- Standing vs. dynamic topics
- Similarity of the items being judged vs. similarity of the judges themselves
110. Classifying R.F. Systems: Amazon.com
- Books on related topics
- Books bought by others who bought this one
- Community, implicit, standing, judges items, similar items
111. Classifying R.F. Systems
- Standard Relevance Feedback
- Individual, explicit, dynamic, item comparison
- Standard Filtering (NewsWeeder)
- Individual, explicit, standing profile, item comparison
- Standard Routing
- Community ("gold standard"), explicit, standing profile, item comparison
112. Classifying R.F. Systems
- Letizia and WebWatcher
- Individual, implicit, dynamic, item comparison
- Ringo and GroupLens
- Group, explicit, standing query, judge-based comparison
113. Query Modification Summary
- Relevance feedback is an effective means for user-directed query modification
- Modification can be done with either direct or indirect user input
- Modification can be done based on an individual's or a group's past input
114. Information Visualization
115. Visualization Success Stories
116. Visualization Success Stories
Illustration of John Snow's deduction that a cholera epidemic was caused by a bad water pump, circa 1854. Horizontal lines indicate locations of deaths.
From Visual Explanations by Edward Tufte, Graphics Press, 1997
117. Visualizing Text Collections
- Some Visualization Principles
- Why Text is Tough
- Visualizing Collection Overviews
- Evaluations involving Users
118. Preattentive Processing
- A limited set of visual properties is processed preattentively
- (without the need for focusing attention)
- This is important for the design of visualizations
- what can be perceived immediately
- which properties are good discriminators
- what can mislead viewers
All preattentive-processing figures are from Healey 97 (on the web)
119. Example: Color Selection
The viewer can rapidly and accurately determine whether the target (red circle) is present or absent. The difference is detected in color.
120. Example: Shape Selection
The viewer can rapidly and accurately determine whether the target (red circle) is present or absent. The difference is detected in form (curvature).
121. Preattentive Processing
- < 200-250 ms qualifies as preattentive
- eye movements take at least 200 ms
- yet certain processing can be done very quickly, implying low-level processing in parallel
122. Example: Conjunction of Features
The viewer cannot rapidly and accurately determine whether the target (red circle) is present or absent when the target has two or more features, each of which is present in the distractors. The viewer must search sequentially.
123. [Demo: a block of words interleaved with their mirror-reversed forms (e.g., SUBJECT / TCEJBUS, QUICKLY / YLKCIUQ), showing that finding the reversed words requires sequential attention -- reading text is not preattentive]
124. Accuracy Ranking of Quantitative Perceptual Tasks (Mackinlay 88, from Cleveland & McGill)
From most accurate to least accurate: Position, Length, Angle, Slope, Area, Volume, Color, Density
125. Why Text is Tough to Visualize
- Text is not preattentive
- Text consists of abstract concepts
- Text represents similar concepts in many different ways
- space ship, flying saucer, UFO, figment of imagination
- Text has very high dimensionality
- Tens or hundreds of thousands of features
- Many subsets can be combined together
126. Why Text is Tough
The Dog.
127. Why Text is Tough
The Dog.
The dog cavorts.
The dog cavorted.
128. Why Text is Tough
The man.
The man walks.
129. Why Text is Tough
The man walks the cavorting dog.
So far, we can sort of show this in pictures.
130. Why Text is Tough
As the man walks the cavorting dog, thoughts arrive unbidden of the previous spring, so unlike this one, in which walking was marching and dogs were baleful sentinels outside unjust halls.
How do we visualize this?
131. Why Text is Tough
- Abstract concepts are difficult to visualize
- Combinations of abstract concepts are even more difficult to visualize
- time
- shades of meaning
- social and psychological concepts
- causal relationships
132. Why Text is Tough
- Language only hints at meaning
- Most meaning of text lies within our minds and common understanding
- "How much is that doggy in the window?"
- "how much": a social system of barter and trade (not the size of the dog)
- "doggy": implies childlike, plaintive; probably cannot do the purchasing on their own
- "in the window": implies behind a store window, not really inside a window; requires the notion of window shopping
133. Why Text is Tough
- General categories have no standard ordering (nominal data)
- Categorization of documents by single topics misses important distinctions
- Consider an article about
- NAFTA
- The effects of NAFTA on truck manufacture
- The effects of NAFTA on productivity of truck manufacture in the neighboring cities of El Paso and Juarez
134. Why Text is Tough
- I saw Pathfinder on Mars with a telescope.
- Pathfinder photographed Mars.
- The Pathfinder photograph mars our perception of
a lifeless planet. - The Pathfinder photograph from Ford has arrived.
- The Pathfinder forded the river without marring
its paint job.
135. Why Text is Easy
- Text is highly redundant
- When you have lots of it
- Pretty much any simple technique can pull out phrases that seem to characterize a document
- Instant summary (see the snippet below)
- Extract the most frequent words from a text
- Remove the most common English words
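A sketch of that recipe (the tiny stopword list is illustrative); running something like this over two mystery texts would produce lists like the ones on the next slide:

```python
from collections import Counter
import re

STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it",
             "he", "for", "his", "on", "with", "as", "was", "i", "unto"}

def instant_summary(text, n=15):
    """Return the n most frequent non-stopword terms in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)
```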
136. Guess the Texts
- Text 1
- 64 president, 38 jones, 38 information, 32 evidence, 31 lewinsky, 28 oic, 28 investigation, 26 court, 26 clinton, 22 office, 21 discovery, 20 sexual, 20 case, 17 testimony, 16 judge
- Text 2
- 478 said, 233 god, 201 father, 187 land, 181 jacob, 160 son, 157 joseph, 134 abraham, 121 earth, 119 man, 118 behold, 113 years, 104 wife, 101 name, 94 pharaoh
137. Text Collection Overviews
- How can we show an overview of the contents of a text collection?
- Show info external to the docs
- e.g., date, author, source, number of inlinks
- does not show what they are about
- Show the meanings or topics in the docs
- a list of titles
- results of clustering words or documents
- organize according to categories (next time)
138. Visualizing Collection Clusters
- Scatter/Gather
- shows main themes as groups of text summaries
- Scatter Plots
- show docs as points; closeness indicates nearness in cluster space
- show main themes of docs as visual clumps or mountains
- Kohonen Feature Maps
- show main themes as adjacent polygons
- BEAD
- shows main themes as links within a force-directed placement network
139. Text Clustering
- Finds overall similarities among groups of
documents - Finds overall similarities among groups of tokens
- Picks out some themes, ignores others
140. Clustering for Collection Overviews
- Two main steps (see the sketch below)
- cluster the documents according to the words they have in common
- map the cluster representation onto an (interactive) 2D or 3D representation
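A minimal sketch of that two-step pipeline, using scikit-learn as a modern stand-in (an assumption; the systems surveyed here predate it and used their own clustering and projection code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

def collection_overview(docs, n_clusters=5):
    """Cluster docs by shared vocabulary, then project to 2D."""
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)  # step 1: cluster
    coords = TruncatedSVD(n_components=2).fit_transform(X)            # step 2: map to 2D
    return labels, coords  # plot coords, colored by cluster label
```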
141. Scatter/Gather (Cutting, Pedersen, Tukey & Karger 92, 93; Hearst & Pedersen 95)
- First use of text clustering in the interface
- Showing clusters to users had not been done before
- Focus on interaction
- Show topical terms and typical titles
- Allow users to change the views
- Did not emphasize visualization
142. Scatter/Gather
143. S/G example: query on "star"
- Encyclopedia text
- 14 sports
- 8 symbols
- 47 film, tv
- 68 film, tv (p)
- 7 music
- 97 astrophysics
- 67 astronomy (p)
- 12 stellar phenomena
- 10 flora/fauna
- 49 galaxies, stars
- 29 constellations
- 7 miscellaneous
- Clustering and re-clustering is entirely automated
144. Northern Light used to cluster exclusively; now combines categorization with clustering
145. Northern Light second-level clusters: are these really about NLP? Note that the next level corresponds to URLs
146. Scatter Plot of Clusters (Chen et al. 97)
147. BEAD (Chalmers 97)
148. BEAD (Chalmers 96)
An example layout produced by BEAD, seen in overview, of 831 bibliography entries. The dimensionality (the number of unique words in the set) is 6925. A search for "cscw" or "collaborative" shows the pattern of occurrences coloured dark blue, mostly to the right. The central rectangle is the visualizer's motion control.
149. Example: Themescapes (Wise et al. 95)
150. Clustering for Collection Overviews
- Since text has tens of thousands of features
- the mapping to 2D loses a tremendous amount of information
- only very coarse themes are detected
151. Galaxy of News (Rennison 95)
152. Galaxy of News (Rennison 95)
153. Kohonen Feature Maps (Lin 92, Chen et al. 97)
(594 docs)
154. How Useful is Collection Cluster Visualization for Search?
- Three studies find negative results
155. Study 1
- Kleiboemer, Lazear, and Pedersen, "Tailoring a retrieval system for naive users", Proc. of the 5th Annual Symposium on Document Analysis and Information Retrieval, 1996
- This study compared
- a system with 2D graphical clusters
- a system with 3D graphical clusters
- a system that shows textual clusters
- Novice users
- Only textual clusters were helpful (and they were difficult to use well)
156. Study 2: Kohonen Feature Maps
- H. Chen, A. Houston, R. Sewell, and B. Schatz, JASIS 49(7)
- Comparison: Kohonen Map vs. Yahoo
- Task
- "Window shop" for an interesting home page
- Repeat with the other interface
- Results
- Starting with the map, subjects could repeat the task in Yahoo (8/11)
- Starting with Yahoo, subjects were unable to repeat it in the map (2/14)
157. Study 2 (cont.)
- Participants liked
- Correspondence of region size to number of documents
- Overview (but also wanted zoom)
- Ease of jumping from one topic to another
- Multiple routes to topics
- Use of category and subcategory labels
158. Study 2 (cont.)
- Participants wanted
- hierarchical organization
- other orderings of concepts (alphabetical)
- integration of browsing and search
- correspondence of color to meaning
- more meaningful labels
- labels at the same level of abstraction
- more labels fit into the given space
- combined keyword and category search
- multiple category assignment (sports + entertainment)
159. Study 3: NIRVE
- NIRVE interface by Cugini et al. 96. Each rectangle is a cluster. Larger clusters are closer to the pole. Similar clusters are near one another. Opening a cluster causes a projection that shows the titles.
160. Study 3
- "Visualization of search results: a comparative evaluation of text, 2D, and 3D interfaces", Sebrechts, Cugini, Laskowski, Vasilakis and Miller, Proceedings of SIGIR 99, Berkeley, CA, 1999
- This study compared
- 3D graphical clusters
- 2D graphical clusters
- textual clusters
- 15 participants, between-subjects design
- Tasks
- Locate a particular document
- Locate and mark a particular document
- Locate a previously marked document
- Locate all clusters that discuss some topic
- List the most frequently represented topics
161. Study 3
- Results (time to locate targets)
- Text clusters fastest
- 2D next
- 3D last
- With practice (6 sessions), 2D neared text results; 3D was still slower
- Computer experts were just as fast with 3D
- Certain tasks were equally fast with 2D and text
- Find a particular cluster
- Find an already-marked document
- But anything involving text (e.g., find a title) was much faster with text
- Spatial location was rotated, so users lost context
- Helpful viz features
- Color coding (helped text too)
- Relative vertical locations
162. Visualizing Clusters
- Huge 2D maps may be an inappropriate focus for information retrieval
- cannot see what the documents are about
- the space is difficult to browse for IR purposes
- (tough to visualize abstract concepts)
- Perhaps more suited for pattern discovery and gist-like overviews
163. Co-Citation Analysis
- Has been around since the 50s (Small, Garfield, White & McCain)
- Used to identify core sets of
- authors, journals, articles for particular fields
- Not for general search
- Main idea (see the sketch below)
- Find pairs of papers that are cited together by third papers
- Look for commonalities
- A nice demonstration by Eugene Garfield at
- http://165.123.33.33/eugene_garfield/papers/mapsciworld.html
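The counting step is simple to sketch; the data layout here is an assumption for illustration:

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(bibliographies):
    """Count how often each pair of papers is cited together.
    `bibliographies` maps a citing paper to the set of papers it cites."""
    pairs = Counter()
    for cited in bibliographies.values():
        for a, b in combinations(sorted(cited), 2):
            pairs[(a, b)] += 1
    return pairs  # high counts suggest the pair belongs to a field's core set
```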
164. Co-citation analysis (from Garfield 98)
165. Co-citation analysis (from Garfield 98)
166. Co-citation analysis (from Garfield 98)
167. Context
168. Types of Context
- Personal situation
- Where you are
- What time it is
- Your general preferences
- Context of other documents
- Context of what you have done so far in the search process
169. Putting Results in Context
- Visualizations of Query Term Distribution
- KWIC, TileBars, SeeSoft
- Table of Contents as Context
- Superbook, Cha-Cha, DynaCat
- Visualizing Shared Subsets of Query Terms
- InfoCrystal, VIBE, Lattice Views
- Dynamic Metadata as Query Previews
170. KWIC (Keyword in Context)
- An old standard, ignored by internet search engines
- used in some intranet engines, e.g., Cha-Cha (see the snippet below)
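A sketch of the KWIC display itself (the window size and formatting are arbitrary choices):

```python
def kwic(text, keyword, window=30):
    """Keyword-in-context: show each hit with `window` characters of
    surrounding text, with the keyword aligned down the middle."""
    lower, hits = text.lower(), []
    start = lower.find(keyword.lower())
    while start != -1:
        left = text[max(0, start - window):start]
        right = text[start + len(keyword):start + len(keyword) + window]
        hits.append(f"{left:>{window}} [{text[start:start + len(keyword)]}] {right}")
        start = lower.find(keyword.lower(), start + 1)
    return hits
```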
171. Table-of-Contents Views
- Superbook (Remde et al., 87)
- Functions
- Word Lookup
- Shows a list of query words, stems, and word combinations
- Table of Contents: dynamic fisheye view of the hierarchical topics list
- Search words can be highlighted here too
- Page of Text: shows the selected page with highlighted search terms
- See the UI/IR textbook chapter for information on an interesting user study
172. Superbook (http://superbook.bellcore.com/SB)
173. Egan et al. Study
- Goal: compare Superbook with the paper book
- Tasks
- structured search: find the answer to a specific question using an unfamiliar reference text
- open-book essay: synthesize material from different places in the document
- incidental learning: how much useful information about the document is acquired while doing other tasks
- subjective ratings: user reactions to the form and content
174. Egan et al. Study
- Factors for structured search
- Does the user's question correspond to the author's organization of the material?
- Half the study's search questions contained cues as to which topic heading to use; half did not
- Does the user's query as stated contain some of the same words as those used by the author?
- Half the questions contained words taken from the text surrounding the target text; half did not
175. Egan et al. Study
- Example search questions
- Find the section discussing the basic concept that the value of any expression, however complicated, is a data structure.
- The dataset "murder" contains murder rates per 100,000 population. Find the section that says which states are included in this dataset.
- Find the section that describes pie charts and states whether or not they are a good means for analyzing data.
- Find the section that describes the first thing you have to do to get S to print pictorial output.
- blue boldface = terms taken from the text
- pink italics = terms taken from the topic heading
176. Egan et al. Study
- Hypotheses
- A conventional document would require good cues from the topic headings, but Superbook would not
- The word lookup function was hypothesized to allow circumvention of the author's organization scheme
- Superbook's search facility would result in open-book essays that include more information
177. Egan et al. Study
- Source text: statistics package manual (562 pp.)
- Compare
- Superbook vs. paper versions
- Four sets of search questions of mixed type
- 20 university students with stats background
- Superbook training tutorial
- 15 minutes per structured query
- One open-book essay retained
178. Egan et al. Study
- Results: Superbook had an advantage in
- overall average accuracy (75% vs. 62%)
- Superbook did better on questions with words from the text but not in topic headings
- The print version did better on questions with no search hits
- speed (5.4 vs. 5.6 min/query on average)
- Superbook faster for text-only cues
- Paper faster for questions with no hits
- essay creation
- average score of 5.8 vs. 3.6 points out of 7
- average of 8.8 facts included vs. 6.0, out of 15
179. Egan et al. Study
- Results
- Subjective ratings
- Superbook users rated it easier than paper (5.8 vs. 3.1 out of 7)
- Superbook users gave higher ratings on the stat system
- Incidental learning
- Superbook users recalled more chapter headings
- maybe because these were continually displayed
- No other differences were significant
- Problems with the study
- Did not compare against a non-hypertext computerized version
- Did not show if/how hyperlinks affected results
180. Cha-Cha (Chen & Hearst 98)
- Shows a table-of-contents-like view, like Superbook
- Takes advantage of human-created structure within hyperlinks to create the TOC
182. DynaCat (Pratt, Hearst, Fagan 99)
- Decide on important question types in advance
- What are the adverse effects of drug D?
- What is the prognosis for treatment T?
- Make use of MeSH categories
- Retain only those types of categories known to be useful for this type of query (see the sketch below)
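A hypothetical sketch of that idea: map the recognized query type to the category types worth showing, then group results under them. The labels here are illustrative stand-ins, not actual MeSH terms:

```python
# Which category types are worth showing for each query type (assumed).
USEFUL_CATEGORY_TYPES = {
    "adverse-effects": {"Side Effect", "Symptom"},
    "prognosis": {"Outcome", "Survival Rate"},
}

def organize(results, query_type):
    """Group retrieved documents under only the useful categories.
    Each result carries (category_type, category_label) pairs."""
    keep = USEFUL_CATEGORY_TYPES.get(query_type, set())
    groups = {}
    for doc in results:
        for ctype, label in doc["categories"]:
            if ctype in keep:
                groups.setdefault(label, []).append(doc["title"])
    return groups
```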
183. DynaCat (Pratt, Hearst, Fagan 99)
184. DynaCat Study
- Design
- Three queries
- 24 cancer patients
- Compared three interfaces
- ranked list, clusters, categories
- Results
- Participants strongly preferred categories
- Participants found more answers using categories
- Participants took the same amount of time with all three interfaces
- Similar results have been verified by another study, by Chen and Dumais (CHI 2000)
185. Cat-a-Cone: Multiple Simultaneous Categories
- Key ideas
- Separate documents from category labels
- Show both simultaneously
- Link the two for iterative feedback
- Distinguish between
- searching for documents vs.
- searching for categories
186. Cat-a-Cone Interface
187. [Diagram: query terms drive search and browse across the Category Hierarchy and the Collection, producing Retrieved Documents]
188. Proposed Advantages
- Integrates category selection with viewing of categories
- Shows all categories in context
- Shows the relationship of retrieved documents to the category structure
- But was not evaluated with a user study
189. Our new project: FLAMENCO
- FLexible Access using MEtadata in Novel COmbinations
- Main idea
- Preview and postview information
- Determined dynamically and (semi-)automatically, based on the current task
190. The future of search tools: A Prediction of a Dichotomy
- Information Intensive
- Business analysis
- Scientific research
- Planning & design
- Quick lookup
- Question answering
- Context-dependent info (location, time)
191. My Predictions of Future Trends in Search Interfaces
- Specialization
- Single-topic search (vortals)
- Task-oriented search
- Personalization
- Question-Answering
- Visualization???
192. References
- See the bibliography of Chapter 10 of Modern Information Retrieval, Ricardo Baeza-Yates & Berthier Ribeiro-Neto (Eds.). The chapter is called "User Interfaces and Visualization", by Marti Hearst. Available at www.sims.berkeley.edu/~hearst/irbook/chapters/chap10.html