Title: Tutorial: User Interfaces
1. Tutorial: User Interfaces and Visualization for Information Access
- Prof. Marti Hearst
- University of California, Berkeley
- http://www.sims.berkeley.edu/~hearst
- SIGIR 2000
2. Outline
- Search Interfaces Today
- HCI Foundations
- The Information Seeking Process
- Visualizing Text Collections
- Incorporating Context and Tasks
- Promising Future Directions
3. Introductory Remarks
- Much of HCI is art, still not science
- In this tutorial, I discuss user studies whenever available
- I do not have time to do justice to most of the topics
4. What do current search interfaces do well?
5. Web search interfaces: solutions
- Single-word queries
- Standard IR assumed long queries
- Web searches average 1.5-2.5 words
- Problems
- One word can have many meanings
- What context is the word used in?
- Which of many articles to retrieve?
12. Web search interfaces: solutions
- Single-word queries
- Solutions
- Incorporation of manually-created categories
- Provides useful starting points
- Disambiguates the term(s)
- Development of sideways hierarchy representation
- Ranking that emphasizes starting points
- Link analysis finds server home pages, etc.
- Use of behavior of other users
- Suggests related pages
- Popularity of pages
13. Web search interfaces are interesting in terms of what they do NOT do
- Current Web interfaces
- Are the results of each site experimenting
- Only those ideas that work for most people survive
- Only very simple ideas remain
- Abandoned strategies
- Scores and graphical bars that show degree of match
- Associated term graph (AltaVista)
- Suggested terms for expansion (Excite)
- Why did these die?
14. What is lacking in Web search?
15. What is lacking?
- Support for complex information needs
- Info on the construction on Highway 80
- Research chemo vs. surgery
- How does the 6th Circuit tend to rule on intellectual property cases?
- What is the prior art for this invention?
16. What is lacking?
- Integration of search and analysis
- Support for a series of searches
- Backing up and moving forward
- Suggested next steps
- Comparisons and contrasts
- Personal prior history and interests
- More generally
- CONTEXT
- INTERACTIVITY
17. What is lacking?
- Question Answering
- Answers, not documents!
- Active area of research and industry
- Not always appropriate
- Should I have chemo or surgery?
- Who will win the election?
20. Question Answering: State of the Art
- Kupiec SIGIR 93, Srihari & Li NAACL 00, Cardie et al. NAACL 00
- Goal: find a paragraph, phrase, or sentence that (hopefully) answers the question
- Approach
- Identify certain types of noun phrases
- People
- Dates
- Hook these up with question types
- Who
- When
- Match keywords in the question to keywords in a candidate answer that contains the right kind of NP (see the sketch below)
- Use syntactic or simple frame semantics to help with matching (optional)
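To make the pipeline concrete, here is a minimal sketch of this matching approach. The regular expressions stand in for a real noun-phrase/named-entity tagger, and the table, patterns, and function names are all illustrative assumptions, not the cited systems' actual code:

```python
import re

# Map a question word to the entity type a candidate answer should contain.
QUESTION_TYPE_TO_ENTITY = {"who": "PERSON", "when": "DATE"}

DATE_PAT = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")     # crude year matcher
PERSON_PAT = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")  # crude proper-name matcher

def entity_types(sentence):
    """Stand-in for a real noun-phrase / named-entity tagger."""
    types = set()
    if DATE_PAT.search(sentence):
        types.add("DATE")
    if PERSON_PAT.search(sentence):
        types.add("PERSON")
    return types

def answer(question, sentences):
    """Pick the candidate sentence with the most keyword overlap that
    also contains the right kind of noun phrase for the question type."""
    wanted = QUESTION_TYPE_TO_ENTITY.get(question.lower().split()[0])
    q_terms = set(question.lower().rstrip("?").split()) - {"who", "when", "did", "the", "was"}
    best, best_score = None, 0
    for s in sentences:
        if wanted and wanted not in entity_types(s):
            continue  # candidate must contain the right NP type
        score = len(q_terms & set(s.lower().split()))
        if score > best_score:
            best, best_score = s, score
    return best
```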
21. The future of search tools: A Prediction of a Dichotomy
- Information Intensive
- Business analysis
- Scientific research
- Planning & design
- Quick lookup
- Question answering
- Context-dependent info (location, time)
22. Human-Computer Interaction
23. What is HCI?
- HCI = Human-Computer Interaction
- A discipline concerned with
- design
- evaluation
- implementation
- of interactive computing systems for human use
- The study of major phenomena surrounding the interaction of humans with computers
24. Shneiderman on HCI
- Well-designed interactive computer systems promote
- Positive feelings of success, competence, and mastery
- Allow users to concentrate on their work, rather than on the system
25. What is HCI?
[Diagram: Humans and Technology interacting on a Task, embedded in Organizational and Social Issues]
26. User-centered Design
- Focus first on what people need to do, not what the system needs to do
- Formulate typical scenarios of use
- Take into account
- Cognitive constraints
- Organizational/Social constraints
- Keep users involved throughout the project
27. Waterfall Design Model (from Software Engineering)
[Diagram: Initiation (Application Description) -> Analysis (Requirements Specification) -> Design (System Design) -> Implementation (Product) -> ?]
28. UI Design Iteration
[Diagram: Design -> Prototype -> Evaluate, repeated in a cycle]
29. Comparing Design Processes
- Waterfall model
- The customer is not the user
- User-centered design
- Assess what the user needs
- Design for this
- Redesign if user needs are not met
30. Steps in Standard UI Design
- Needs Assessment / Task Analysis
- Low-fidelity Prototype Evaluation
- Redesign
- Interactive Prototype
- Heuristic Evaluation
- Redesign
- Revised Interactive Prototype
- Pilot User Study
- Redesign
- Revised Interactive Prototype
- Larger User Study
31. Task Analysis
- Observe existing work practices
- Create examples and scenarios of actual use
- Try out new ideas before building software
32. Rapid Prototyping
- Build a mock-up of design
- Low fidelity techniques
- paper sketches
- cut, copy, paste
- video segments
- Interactive prototyping tools
- Visual Basic, HyperCard, Director, etc.
- UI builders
- NeXT, etc.
33. Usability Evaluation: Standard Techniques
- User studies
- Have people use the interface to complete some tasks
- Requires an implemented interface
- "Discount" vs. scientific results
- Heuristic Evaluation
- Usability expert assesses the interface against guidelines
34. Cognitive Considerations: Norman's Action Cycle
- Human action has two aspects
- execution and evaluation
- Execution: doing something
- Evaluation: comparison of what happened to what was desired
35. Action Cycle
[Diagram: Goals, connected to The World through Execution and Evaluation]
36. Action Cycle
[Diagram, elaborated. Execution side: intention to act; sequence of actions; execution of the sequence of actions. Evaluation side: perceiving the state of the world; interpreting the perception; evaluation of interpretations.]
37. Norman's Action Cycle
- Execution has three stages
- Start with a goal
- Translate into an intention
- Translate into a sequence of actions
- Now execute the actions
- Evaluation has three stages
- Perceive world
- Interpret what was perceived
- Compare with respect to original intentions
38. Gulf of Evaluation
- The amount of effort a person must exert to interpret
- the physical state of the system
- how well the expectations and intentions have been met
- We want a small gulf!
39. Mental Models
- People have mental models of how things work
- how does your car start?
- how does an ATM machine work?
- how does your computer boot?
- Allows people to make predictions about how things will work
40. Strategy for Design
- Provide a good conceptual model
- allows users to predict consequences of actions
- communicated through the image of the system
- relations between the user's intentions, required actions, and results should be
- sensible
- consistent
- meaningful (non-arbitrary)
41. Design Guidelines
Shneiderman (8 design rules)
- Consistency
- Shortcuts (for experts)
- Feedback
- Closure
- Error prevention
- Easy reversal of actions
- User control
- Low memory burden
There are hundreds of design guidelines listings!
42. Design Guidelines for Search UIs
- I think the most important are
- Reduce memory burden / Provide feedback
- Previews
- History
- Context
- User control
- Query modification
- Flexible manipulation of results
- Easy reversal of actions
43. Designing for Error
- Norman on designing for error
- Understand the causes of error and design to minimize those causes
- Make it possible to reverse actions
- Make it hard to do non-reversible actions
- Make it easy to discover the errors that do occur
- Change attitude towards errors
- A user is attempting to do a task, getting there by imperfect approximations: actions are approximations to what is actually desired
44. HCI Intro Summary
- UI design involves users
- UI design is iterative
- An art, not a science
- Evaluation is key
- Design guidelines
- are useful
- but application to information-centric systems can be difficult
45. Recommended HCI Books
- Alan Dix et al., Human-Computer Interaction, 2nd edition, Prentice Hall, Feb 1998
- Ben Shneiderman, Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd ed., Addison-Wesley, 1998
- Jakob Nielsen, Usability Engineering, Morgan Kaufmann, 1994
- Holtzblatt and Beyer, Making Customer-Centered Design Work for Teams, CACM 36(10), October 1993
- www.useit.com
- world.std.com/uieweb
- usableweb.com
46. Supporting the Information Seeking Process
- Two parts to the process
- search and retrieval
- analysis and synthesis of search results
47. Standard IR Model
- Assumptions
- Maximizing precision and recall simultaneously
- The information need remains static
- The value is in the resulting document set
48. Problem with Standard Model
- Users learn during the search process
- Scanning titles of retrieved documents
- Reading retrieved documents
- Viewing lists of related topics/thesaurus terms
- Navigating hyperlinks
- Some users don't like long disorganized lists of documents
49. A sketch of a searcher moving through many actions towards a general goal of satisfactory completion of research related to an information need (after Bates 89)
[Figure: a meandering search path through query points Q0 through Q5]
50. Berry-picking model (Bates 90)
- The query is continually shifting
- Users may move through a variety of sources
- New information may yield new ideas and new directions
- The query is not satisfied by a single, final retrieved set, but rather by a series of selections and bits of information found along the way
51. Implications
- Interfaces should make it easy to store intermediate results
- Interfaces should make it easy to follow trails with unanticipated results
- Makes evaluation more difficult
52. Orienteering (O'Day & Jeffries 93)
- Interconnected but diverse searches on a single, problem-based theme
- Focus on information delivery rather than search performance
- Classifications resulting from an extended observational study
- 15 clients of professional intermediaries
- financial analyst, venture capitalist, product marketing engineer, statistician, etc.
53. Orienteering (O'Day & Jeffries 93)
- Identified three main search types
- Monitoring
- Following a plan
- Exploratory
- A series of interconnected but diverse searches on one problem-based theme
- Changes in direction caused by triggers
- Each stage followed by reading, assimilation, and analysis of resulting material
54. Orienteering (O'Day & Jeffries 93)
- Defined three main search types
- monitoring
- a well-known topic over time
- e.g., research four competitors every quarter
- following a plan
- a typical approach to the task at hand
- e.g., improve business process X
- exploratory
- explore a topic in an undirected fashion
- e.g., get to know an unfamiliar industry
55. Orienteering (O'Day & Jeffries 93)
- Trends
- A series of interconnected but diverse searches on one problem-based theme
- This happened in all three search modes
- Each analyst did at least two search types
- Each stage followed by reading, assimilation, and analysis of resulting material
56. Orienteering (O'Day & Jeffries 93)
- Searches tended to trigger new directions
- Overview, then detail, repeat
- Information need shifted between search requests
- Context of the problem and of previous searches was carried to the next stage of search
- The value was contained in the accumulation of search results, not the final result set
- These observations verified Bates' predictions
57. Orienteering (O'Day & Jeffries 93)
- Triggers: motivation to switch from one strategy to another
- next logical step in a plan
- encountering something interesting
- explaining change
- finding missing pieces
58. Stop Conditions (O'Day & Jeffries 93)
- Stopping conditions were not as clear as triggers
- People stopped searching when
- no more compelling triggers
- an appropriate amount of searching for the task was finished
- a specific inhibiting factor arose
- e.g., learning the market was too small
- lack of increasing returns
- 80/20 rule
- Missing information/inferences OK
- the business world is different from scholarship
59. After the Search: Analyzing and Synthesizing Search Results
- Orienteering post-search behaviors
- Read and annotate
- Analyze: 80% fell into six main types
60. Post-Search Analysis Types (O'Day & Jeffries 93)
- Trends
- Comparisons
- Aggregation and Scaling
- Identifying a Critical Subset
- Assessing
- Interpreting
- The rest
- cross-reference
- summarize
- find evocative visualizations
- miscellaneous
61. SenseMaking (Russell et al. 93)
- The process of encoding retrieved information to answer task-specific questions
- Combine
- internal cognitive resources
- external retrieved resources
- Create a good representation
- an iterative process
- contend with a cost/benefit tradeoff
62. UIs for Supporting the Search Process
63. InfoGrid (design mockup) (Rao et al. 92)
64. InfoGrid/Protofoil (Rao et al. 92)
- A general search interface architecture
- Item stash -- stores retrieved docs
- Search Event -- current query
- History -- history of queries
- Result Item -- view selected doc's metadata
65. InfoGrid Design Mockups (Rao et al. 92)
66. DLITE (Cousins 97)
- Drag-and-drop interface
- Reifies queries, sources, retrieval results
- Animation to keep track of activity
67. DLITE
- UI to a digital library
- Direct manipulation interface to distributed IR
- Workcenter approach
- lots of handy tools
- experts create workcenters
- contents persistent
- concurrently shareable across sites
- Web browser used to display document or collection metadata
69. Interaction
- Pointing at an object brings up a tooltip -- metadata
- Activating an object -- component-specific action
- 5 types for the result set component
- Drag-and-drop data onto a program
- Animation used to show what happens with drag-and-drop (e.g., waggling)
70. Users Reacting to DLITE
- Two participant pools
- 7 Stanford CS
- 11 NASA researchers & librarians
- Requires learning, initially unfamiliar
- Many requested help pages
- After the model was understood, few errors
- Overall positive attitude, even stronger after a two-week delay
- Successfully remembered most features after the two-week lag
71. Keeping Track of History
- Techniques
- List of prior queries and results (standard)
- Slide-sorter view: snapshots of earlier interactions
- Graphical hierarchy for web browsing
72. Keeping Track of History
- PadPrints (Hightower et al. 98)
- Tree-based history of recently visited web pages; the history map is placed to the left of the browser window
- Zoomable; can shrink sub-hierarchies
- Node = title + thumbnail
73. PadPrints (Hightower et al. 98)
74. PadPrints (Hightower et al. 98)
75. Initial User Study of PadPrints
- 13.4% unable to find recently visited pages
- only 0.1% use the History button; 42% use Back
- problems with the history list (according to the authors)
- incomplete: lose out on every branch
- textual (not necessarily a problem!)
- pull-down menu cumbersome -- cannot see history along with the current document
76. Second User Study of PadPrints
- Changed the task to involve revisiting web pages
- CHI database, National Park Service website
- Only correctly answered questions considered
- 20-30% fewer pages accessed
- faster response time for tasks that involve revisiting pages
- slightly better user satisfaction ratings
77. Info Seeking Summary
- The standard model (issue query, get results, repeat) is not fully adequate
- Berry-picking/orienteering offer an alternative to the standard IR model
- Interfaces can be devised to support the interactive process over time
- More work needs to be done to support the process of completing information-seeking tasks
78. Interactive Query Modification
79. Query Modification
- Problem: how to reformulate the query?
- Thesaurus expansion
- Suggest terms similar to query terms
- Relevance feedback
- Suggest terms (and documents) similar to retrieved documents that have been judged relevant
80. Relevance Feedback
- Usually do both
- expand the query with new terms
- re-weight terms in the query
- There are many variations
- usually positive weights for terms from relevant docs
- sometimes negative weights for terms from non-relevant docs (a Rocchio-style sketch follows below)
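As a concrete illustration of expand-and-reweight, here is a minimal Rocchio-style sketch over sparse term-weight dictionaries. The function name, representation, and textbook default mixing weights are assumptions for illustration, not the method of any one system discussed here:

```python
from collections import defaultdict

def rocchio(query, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """One round of Rocchio-style feedback. Each vector is a sparse
    dict mapping term -> weight."""
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for doc in rel_docs:                    # positive weights from relevant docs
        for term, w in doc.items():
            new_q[term] += beta * w / len(rel_docs)
    for doc in nonrel_docs:                 # optional negative evidence
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrel_docs)
    # Terms whose final weight goes negative are usually dropped.
    return {t: w for t, w in new_q.items() if w > 0}
```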
81. Using Relevance Feedback
- Known to improve results
- in TREC-like conditions (no user involved)
- What about with a user in the loop?
- How might you measure this?
- Let's examine a user study of relevance feedback by Koenemann & Belkin 1996
82. Questions Being Investigated (Koenemann & Belkin 96)
- How well do users work with statistical ranking on full text?
- Does relevance feedback improve results?
- Is user control over the operation of relevance feedback helpful?
- How do different levels of user control affect results?
83. How much of the guts should the user see?
- Opaque (black box)
- (like web search engines)
- Transparent
- (see the available terms after the r.f.)
- Penetrable
- (see the suggested terms before the r.f.)
- Which do you think worked best?
85. Terms available for relevance feedback made visible (from Koenemann & Belkin)
86. Details on User Study (Koenemann & Belkin 96)
- Subjects have a tutorial session to learn the system
- Their goal is to keep modifying the query until they've developed one that gets high precision
- This is an example of a routing query (as opposed to ad hoc)
- Reweighting
- They did not reweight query terms
- Instead, only term expansion
- pool all terms in the relevant docs
- take the top n terms, where n = 3 + (number-marked-relevant-docs x 2)
- (the more marked docs, the more terms added to the query; see the sketch below)
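A small sketch of that expansion rule; using raw frequency in the pooled relevant documents is an assumption standing in for INQUERY's own term weighting:

```python
from collections import Counter

def expansion_terms(marked_relevant_docs, stopwords=frozenset()):
    """Pool all terms from the marked-relevant docs and keep the
    top n = 3 + 2 * (number of marked docs), per the rule above."""
    pool = Counter()
    for doc_tokens in marked_relevant_docs:   # each doc: a list of tokens
        pool.update(t for t in doc_tokens if t not in stopwords)
    n = 3 + 2 * len(marked_relevant_docs)
    return [term for term, _ in pool.most_common(n)]
```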
87. Details on User Study (Koenemann & Belkin 96)
- 64 novice searchers
- 43 female, 21 male, native English speakers
- TREC test bed
- Wall Street Journal subset
- Two search topics
- Automobile Recalls
- Tobacco Advertising and the Young
- Relevance judgements from TREC and the experimenter
- System was INQUERY (vector space with some bells and whistles)
88. Sample TREC query
89. Evaluation
- Precision at 30 documents (see the snippet below)
- Baseline (Trial 1)
- How well does the initial search go?
- One topic has more relevant docs than the other
- Experimental condition (Trial 2)
- Subjects get a tutorial on relevance feedback
- Modify the query in one of four modes
- no r.f., opaque, transparent, penetrable
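For reference, the evaluation measure is easy to state in code (names are illustrative):

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k=30):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_doc_ids[:k] if d in relevant_ids) / k

# e.g., precision_at_k(["d3", "d7", "d1"], {"d1", "d3"}, k=3) -> 0.667
```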
90. Precision vs. RF condition (from Koenemann & Belkin 96)
91. Effectiveness Results
- Subjects with R.F. performed 17-34% better than those with no R.F.
- Subjects in the penetrable case did 15% better as a group than those in the opaque and transparent cases
92. Number of iterations in formulating queries (from Koenemann & Belkin 96)
93. Behavior Results
- Search times approximately equal
- Precision increased in the first few iterations
- The penetrable case required fewer iterations to make a good query than the transparent and opaque cases
- R.F. queries much longer
- but fewer terms in the penetrable case -- users were more selective about which terms were added
94. Relevance Feedback Summary
- Iterative query modification can improve precision and recall for a standing query
- In at least one study, users were able to make good choices by seeing which terms were suggested for R.F. and selecting among them
- So "more like this" can be useful!
- But it usually requires more than one document, unlike how web versions work
96. Alternative Notions of Relevance Feedback
97. Social and Implicit Relevance Feedback
- Find people whose taste is similar to yours. Will you like what they like?
- Follow a user's actions in the background. Can this be used to predict what the user will want to see next?
- Track what lots of people are doing. Does this implicitly indicate what they think is good and not good?
98. Collaborative Filtering (social filtering)
- If Pam liked the paper, I'll like the paper
- If you liked Star Wars, you'll like Independence Day
- Rating based on ratings of similar people (see the sketch below)
- Ignores the text, so works on text, sound, pictures, etc.
- But: initial users can bias the ratings seen by future users
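A minimal sketch of the idea: predict one user's rating of an item from similarity-weighted ratings of other users. Cosine similarity and the dict-of-dicts layout are illustrative choices, not Ringo's or GroupLens's actual algorithms:

```python
import math

def cosine(a, b):
    """Similarity of two users' rating dicts (item -> rating)."""
    shared = set(a) & set(b)
    num = sum(a[i] * b[i] for i in shared)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def predict_rating(user, item, ratings):
    """Predict user's rating of item as a similarity-weighted average
    of other users' ratings -- content is never consulted, only taste."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        s = cosine(ratings[user], theirs)
        num += s * theirs[item]
        den += abs(s)
    return num / den if den else None
```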
99. Social Filtering
- Ignores the content; only looks at who judges things similarly
- Works well on data relating to taste
- something that people are good at predicting about each other, too
- Does it work for topic?
- GroupLens results suggest otherwise (preliminary)
- Perhaps for quality assessments
- What about for assessing whether a document is about a topic?
100. Learning Interface Agents
- Use machine learning to improve performance
- learn user behavior, preferences
- Useful when
- 1) past behavior is a useful predictor of the future
- 2) there is a wide variety of behaviors amongst users
- Examples
- mail clerk: sort incoming messages into the right mailboxes
- calendar manager: automatically schedule meeting times?
101. Example Systems
- WebWatcher
- Letizia
- Vary according to
- whether the user states a topic or not
- whether the user rates pages or not
102. WebWatcher (Freitag et al.)
- A "tour guide" agent for the WWW
- User tells it what kind of information is wanted
- System tracks web actions
- Highlights hyperlinks that it computes will be of interest
- Strategy for giving advice is learned from feedback from earlier tours
- Uses WINNOW as the learning algorithm
104. Letizia (Lieberman 95)
[Diagram: the user browses; Letizia applies heuristics to a user profile to produce recommendations]
- Recommends web pages during browsing, based on a user profile
- Learns the user profile using simple heuristics
- Passive observation; recommends on request
- Provides a relative ordering of link interestingness
- Assumes recommendations near the current page are more valuable than others
105. Letizia (Lieberman 95)
- Infers user preferences from behavior (a toy scoring sketch follows below)
- Interesting pages
- record in hot list
- save as a file
- follow several links from the page
- return several times to the document
- Not interesting
- spend a short time on the document
- return to the previous document without following links
- pass over a link to the document (selecting links above and below it)
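A toy scoring sketch of such behavioral heuristics; every event name and weight here is an assumption for illustration, not Lieberman's actual model:

```python
def interest_score(page_events):
    """Sum heuristic evidence of interest from observed browsing
    events for one page (hypothetical event names and weights)."""
    weights = {
        "bookmarked": 3.0,      # recorded in hot list
        "saved": 3.0,           # saved as a file
        "followed_link": 1.0,   # followed links off the page
        "revisit": 1.0,         # returned to the page
        "short_dwell": -1.0,    # left quickly
        "skipped_link": -0.5,   # passed over a link to the page
    }
    return sum(weights.get(e, 0.0) for e in page_events)
```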
106. Consequences of Passive Observation
- No ability to fine-tune the profile or express interest without visiting appropriate pages
- Weak heuristics
- Must click through multiple uninteresting pages en route to interesting ones
- Hierarchies tend to get more hits near the root
- But page read time does seem to robustly indicate interest (across many pages and many users)
107. MARS (Rui et al. 97)
Relevance feedback based on image similarity
108. Time Series R.F. (Keogh & Pazzani 98)
109. Social and Implicit Relevance Feedback
- Several different criteria to consider
- Implicit vs. explicit judgements
- Individual vs. group judgements
- Standing vs. dynamic topics
- Similarity of the items being judged vs. similarity of the judges themselves
110. Classifying R.F. Systems: Amazon.com
- Books on related topics
- Books bought by others who bought this one
- Community, implicit, standing, judges items, similar items
111. Classifying R.F. Systems
- Standard Relevance Feedback
- Individual, explicit, dynamic, item comparison
- Standard Filtering (NewsWeeder)
- Individual, explicit, standing profile, item comparison
- Standard Routing
- Community ("gold standard"), explicit, standing profile, item comparison
112. Classifying R.F. Systems
- Letizia and WebWatcher
- Individual, implicit, dynamic, item comparison
- Ringo and GroupLens
- Group, explicit, standing query, judge-based comparison
113. Query Modification Summary
- Relevance feedback is an effective means for user-directed query modification
- Modification can be done with either direct or indirect user input
- Modification can be done based on an individual's or a group's past input
114. Information Visualization
115. Visualization Success Stories
116. Visualization Success Stories
Illustration of John Snow's deduction that a cholera epidemic was caused by a bad water pump, circa 1854. Horizontal lines indicate locations of deaths.
From Visual Explanations by Edward Tufte, Graphics Press, 1997
117. Visualizing Text Collections
- Some Visualization Principles
- Why Text is Tough
- Visualizing Collection Overviews
- Evaluations involving Users
118. Preattentive Processing
- A limited set of visual properties is processed preattentively
- (without the need for focusing attention)
- This is important for the design of visualizations
- what can be perceived immediately
- which properties are good discriminators
- what can mislead viewers
All preattentive-processing figures are from Healey 97 (on the web)
119. Example: Color Selection
The viewer can rapidly and accurately determine whether the target (red circle) is present or absent. The difference is detected in color.
120. Example: Shape Selection
The viewer can rapidly and accurately determine whether the target (red circle) is present or absent. The difference is detected in form (curvature).
121. Preattentive Processing
- < 200-250 ms qualifies as preattentive
- eye movements take at least 200 ms
- yet certain processing can be done very quickly, implying low-level processing in parallel
122. Example: Conjunction of Features
The viewer cannot rapidly and accurately determine whether the target (red circle) is present or absent when the target has two or more features, each of which is present in the distractors. The viewer must search sequentially.
123. [Demo: a block of words interleaved with their mirror-reversed forms (e.g., SUBJECT / TCEJBUS, QUICKLY / YLKCIUQ), showing that finding the reversed words requires sequential attention -- reading text is not preattentive]
124. Accuracy Ranking of Quantitative Perceptual Tasks (Mackinlay 88, from Cleveland & McGill)
From most accurate to least accurate: Position, Length, Angle, Slope, Area, Volume, Color, Density
125. Why Text is Tough to Visualize
- Text is not preattentive
- Text consists of abstract concepts
- Text represents similar concepts in many different ways
- space ship, flying saucer, UFO, figment of imagination
- Text has very high dimensionality
- Tens or hundreds of thousands of features
- Many subsets can be combined together
126. Why Text is Tough
The Dog.
127. Why Text is Tough
The Dog.
The dog cavorts.
The dog cavorted.
128. Why Text is Tough
The man.
The man walks.
129. Why Text is Tough
The man walks the cavorting dog.
So far, we can sort of show this in pictures.
130. Why Text is Tough
As the man walks the cavorting dog, thoughts arrive unbidden of the previous spring, so unlike this one, in which walking was marching and dogs were baleful sentinels outside unjust halls.
How do we visualize this?
131. Why Text is Tough
- Abstract concepts are difficult to visualize
- Combinations of abstract concepts are even more difficult to visualize
- time
- shades of meaning
- social and psychological concepts
- causal relationships
132. Why Text is Tough
- Language only hints at meaning
- Most meaning of text lies within our minds and common understanding
- "How much is that doggy in the window?"
- "how much": a social system of barter and trade (not the size of the dog)
- "doggy": implies childlike, plaintive; probably cannot do the purchasing on their own
- "in the window": implies behind a store window, not really inside a window; requires the notion of window shopping
133. Why Text is Tough
- General categories have no standard ordering (nominal data)
- Categorization of documents by single topics misses important distinctions
- Consider an article about
- NAFTA
- The effects of NAFTA on truck manufacture
- The effects of NAFTA on productivity of truck manufacture in the neighboring cities of El Paso and Juarez
134. Why Text is Tough
- I saw Pathfinder on Mars with a telescope.
- Pathfinder photographed Mars.
- The Pathfinder photograph mars our perception of
a lifeless planet. - The Pathfinder photograph from Ford has arrived.
- The Pathfinder forded the river without marring
its paint job.
135. Why Text is Easy
- Text is highly redundant
- When you have lots of it
- Pretty much any simple technique can pull out phrases that seem to characterize a document
- Instant summary (see the snippet below)
- Extract the most frequent words from a text
- Remove the most common English words
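A sketch of that recipe (the tiny stopword list is illustrative); running something like this over two mystery texts would produce lists like the ones on the next slide:

```python
from collections import Counter
import re

STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it",
             "he", "for", "his", "on", "with", "as", "was", "i", "unto"}

def instant_summary(text, n=15):
    """Return the n most frequent non-stopword terms in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)
```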
136. Guess the Texts
- Text 1
- 64 president, 38 jones, 38 information, 32 evidence, 31 lewinsky, 28 oic, 28 investigation, 26 court, 26 clinton, 22 office, 21 discovery, 20 sexual, 20 case, 17 testimony, 16 judge
- Text 2
- 478 said, 233 god, 201 father, 187 land, 181 jacob, 160 son, 157 joseph, 134 abraham, 121 earth, 119 man, 118 behold, 113 years, 104 wife, 101 name, 94 pharaoh
137. Text Collection Overviews
- How can we show an overview of the contents of a text collection?
- Show info external to the docs
- e.g., date, author, source, number of inlinks
- does not show what they are about
- Show the meanings or topics in the docs
- a list of titles
- results of clustering words or documents
- organize according to categories (next time)
138. Visualizing Collection Clusters
- Scatter/Gather
- shows main themes as groups of text summaries
- Scatter Plots
- show docs as points; closeness indicates nearness in cluster space
- show main themes of docs as visual clumps or mountains
- Kohonen Feature Maps
- show main themes as adjacent polygons
- BEAD
- shows main themes as links within a force-directed placement network
139. Text Clustering
- Finds overall similarities among groups of
documents - Finds overall similarities among groups of tokens
- Picks out some themes, ignores others
140. Clustering for Collection Overviews
- Two main steps (see the sketch below)
- cluster the documents according to the words they have in common
- map the cluster representation onto an (interactive) 2D or 3D representation
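A minimal sketch of that two-step pipeline, using scikit-learn as a modern stand-in (an assumption; the systems surveyed here predate it and used their own clustering and projection code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

def collection_overview(docs, n_clusters=5):
    """Cluster docs by shared vocabulary, then project to 2D."""
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)  # step 1: cluster
    coords = TruncatedSVD(n_components=2).fit_transform(X)            # step 2: map to 2D
    return labels, coords  # plot coords, colored by cluster label
```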
141. Scatter/Gather (Cutting, Pedersen, Tukey & Karger 92, 93; Hearst & Pedersen 95)
- First use of text clustering in the interface
- Showing clusters to users had not been done before
- Focus on interaction
- Show topical terms and typical titles
- Allow users to change the views
- Did not emphasize visualization
142. Scatter/Gather
143. S/G example: query on "star"
- Encyclopedia text
- 14 sports
- 8 symbols
- 47 film, tv
- 68 film, tv (p)
- 7 music
- 97 astrophysics
- 67 astronomy (p)
- 12 stellar phenomena
- 10 flora/fauna
- 49 galaxies, stars
- 29 constellations
- 7 miscellaneous
- Clustering and re-clustering is entirely automated
144. Northern Light used to cluster exclusively; now combines categorization with clustering
145. Northern Light second-level clusters: are these really about NLP? Note that the next level corresponds to URLs
146. Scatter Plot of Clusters (Chen et al. 97)
147. BEAD (Chalmers 97)
148. BEAD (Chalmers 96)
An example layout produced by BEAD, seen in overview, of 831 bibliography entries. The dimensionality (the number of unique words in the set) is 6925. A search for "cscw" or "collaborative" shows the pattern of occurrences coloured dark blue, mostly to the right. The central rectangle is the visualizer's motion control.
149. Example: Themescapes (Wise et al. 95)
150. Clustering for Collection Overviews
- Since text has tens of thousands of features
- the mapping to 2D loses a tremendous amount of information
- only very coarse themes are detected
151. Galaxy of News (Rennison 95)
152. Galaxy of News (Rennison 95)
153. Kohonen Feature Maps (Lin 92, Chen et al. 97)
(594 docs)
154. How Useful is Collection Cluster Visualization for Search?
- Three studies find negative results
155. Study 1
- Kleiboemer, Lazear, and Pedersen, "Tailoring a retrieval system for naive users", Proc. of the 5th Annual Symposium on Document Analysis and Information Retrieval, 1996
- This study compared
- a system with 2D graphical clusters
- a system with 3D graphical clusters
- a system that shows textual clusters
- Novice users
- Only textual clusters were helpful (and they were difficult to use well)
156. Study 2: Kohonen Feature Maps
- H. Chen, A. Houston, R. Sewell, and B. Schatz, JASIS 49(7)
- Comparison: Kohonen Map vs. Yahoo
- Task
- "Window shop" for an interesting home page
- Repeat with the other interface
- Results
- Starting with the map, subjects could repeat the task in Yahoo (8/11)
- Starting with Yahoo, subjects were unable to repeat it in the map (2/14)
157. Study 2 (cont.)
- Participants liked
- Correspondence of region size to number of documents
- Overview (but also wanted zoom)
- Ease of jumping from one topic to another
- Multiple routes to topics
- Use of category and subcategory labels
158. Study 2 (cont.)
- Participants wanted
- hierarchical organization
- other orderings of concepts (alphabetical)
- integration of browsing and search
- correspondence of color to meaning
- more meaningful labels
- labels at the same level of abstraction
- more labels fit into the given space
- combined keyword and category search
- multiple category assignment (sports + entertainment)
159. Study 3: NIRVE
- NIRVE interface by Cugini et al. 96. Each rectangle is a cluster. Larger clusters are closer to the pole. Similar clusters are near one another. Opening a cluster causes a projection that shows the titles.
160. Study 3
- "Visualization of search results: a comparative evaluation of text, 2D, and 3D interfaces", Sebrechts, Cugini, Laskowski, Vasilakis and Miller, Proceedings of SIGIR 99, Berkeley, CA, 1999
- This study compared
- 3D graphical clusters
- 2D graphical clusters
- textual clusters
- 15 participants, between-subjects design
- Tasks
- Locate a particular document
- Locate and mark a particular document
- Locate a previously marked document
- Locate all clusters that discuss some topic
- List the most frequently represented topics
161. Study 3
- Results (time to locate targets)
- Text clusters fastest
- 2D next
- 3D last
- With practice (6 sessions), 2D neared text results; 3D was still slower
- Computer experts were just as fast with 3D
- Certain tasks were equally fast with 2D and text
- Find a particular cluster
- Find an already-marked document
- But anything involving text (e.g., find a title) was much faster with text
- Spatial location was rotated, so users lost context
- Helpful viz features
- Color coding (helped text too)
- Relative vertical locations
162. Visualizing Clusters
- Huge 2D maps may be an inappropriate focus for information retrieval
- cannot see what the documents are about
- the space is difficult to browse for IR purposes
- (tough to visualize abstract concepts)
- Perhaps more suited for pattern discovery and gist-like overviews
163. Co-Citation Analysis
- Has been around since the 50s (Small, Garfield, White & McCain)
- Used to identify core sets of
- authors, journals, articles for particular fields
- Not for general search
- Main idea (see the sketch below)
- Find pairs of papers that are cited together by third papers
- Look for commonalities
- A nice demonstration by Eugene Garfield at
- http://165.123.33.33/eugene_garfield/papers/mapsciworld.html
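The counting step is simple to sketch; the data layout here is an assumption for illustration:

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(bibliographies):
    """Count how often each pair of papers is cited together.
    `bibliographies` maps a citing paper to the set of papers it cites."""
    pairs = Counter()
    for cited in bibliographies.values():
        for a, b in combinations(sorted(cited), 2):
            pairs[(a, b)] += 1
    return pairs  # high counts suggest the pair belongs to a field's core set
```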
164. Co-citation analysis (from Garfield 98)
165. Co-citation analysis (from Garfield 98)
166. Co-citation analysis (from Garfield 98)
167. Context
168. Types of Context
- Personal situation
- Where you are
- What time it is
- Your general preferences
- Context of other documents
- Context of what you have done so far in the search process
169. Putting Results in Context
- Visualizations of Query Term Distribution
- KWIC, TileBars, SeeSoft
- Table of Contents as Context
- Superbook, Cha-Cha, DynaCat
- Visualizing Shared Subsets of Query Terms
- InfoCrystal, VIBE, Lattice Views
- Dynamic Metadata as Query Previews
170. KWIC (Keyword in Context)
- An old standard, ignored by internet search engines
- used in some intranet engines, e.g., Cha-Cha (see the snippet below)
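A sketch of the KWIC display itself (the window size and formatting are arbitrary choices):

```python
def kwic(text, keyword, window=30):
    """Keyword-in-context: show each hit with `window` characters of
    surrounding text, with the keyword aligned down the middle."""
    lower, hits = text.lower(), []
    start = lower.find(keyword.lower())
    while start != -1:
        left = text[max(0, start - window):start]
        right = text[start + len(keyword):start + len(keyword) + window]
        hits.append(f"{left:>{window}} [{text[start:start + len(keyword)]}] {right}")
        start = lower.find(keyword.lower(), start + 1)
    return hits
```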
171. Table-of-Contents Views
- Superbook (Remde et al., 87)
- Functions
- Word Lookup
- Shows a list of query words, stems, and word combinations
- Table of Contents: dynamic fisheye view of the hierarchical topics list
- Search words can be highlighted here too
- Page of Text: shows the selected page with highlighted search terms
- See the UI/IR textbook chapter for information on an interesting user study
172. Superbook (http://superbook.bellcore.com/SB)
173. Egan et al. Study
- Goal: compare Superbook with the paper book
- Tasks
- structured search: find the answer to a specific question using an unfamiliar reference text
- open-book essay: synthesize material from different places in the document
- incidental learning: how much useful information about the document is acquired while doing other tasks
- subjective ratings: user reactions to the form and content
174. Egan et al. Study
- Factors for structured search
- Does the user's question correspond to the author's organization of the material?
- Half the study's search questions contained cues as to which topic heading to use; half did not
- Does the user's query as stated contain some of the same words as those used by the author?
- Half the questions contained words taken from the text surrounding the target text; half did not
175. Egan et al. Study
- Example search questions
- Find the section discussing the basic concept that the value of any expression, however complicated, is a data structure.
- The dataset "murder" contains murder rates per 100,000 population. Find the section that says which states are included in this dataset.
- Find the section that describes pie charts and states whether or not they are a good means for analyzing data.
- Find the section that describes the first thing you have to do to get S to print pictorial output.
- blue boldface = terms taken from the text
- pink italics = terms taken from the topic heading
176. Egan et al. Study
- Hypotheses
- A conventional document would require good cues from the topic headings, but Superbook would not
- The word lookup function was hypothesized to allow circumvention of the author's organization scheme
- Superbook's search facility would result in open-book essays that include more information
177. Egan et al. Study
- Source text: statistics package manual (562 pp.)
- Compare
- Superbook vs. paper versions
- Four sets of search questions of mixed type
- 20 university students with stats background
- Superbook training tutorial
- 15 minutes per structured query
- One open-book essay retained
178. Egan et al. Study
- Results: Superbook had an advantage in
- overall average accuracy (75% vs. 62%)
- Superbook did better on questions with words from the text but not in topic headings
- The print version did better on questions with no search hits
- speed (5.4 vs. 5.6 min/query on average)
- Superbook faster for text-only cues
- Paper faster for questions with no hits
- essay creation
- average score of 5.8 vs. 3.6 points out of 7
- average of 8.8 facts included vs. 6.0, out of 15
179. Egan et al. Study
- Results
- Subjective ratings
- Superbook users rated it easier than paper (5.8 vs. 3.1 out of 7)
- Superbook users gave higher ratings on the stat system
- Incidental learning
- Superbook users recalled more chapter headings
- maybe because these were continually displayed
- No other differences were significant
- Problems with the study
- Did not compare against a non-hypertext computerized version
- Did not show if/how hyperlinks affected results
180. Cha-Cha (Chen & Hearst 98)
- Shows a table-of-contents-like view, like Superbook
- Takes advantage of human-created structure within hyperlinks to create the TOC
182. DynaCat (Pratt, Hearst, Fagan 99)
- Decide on important question types in advance
- What are the adverse effects of drug D?
- What is the prognosis for treatment T?
- Make use of MeSH categories
- Retain only those types of categories known to be useful for this type of query (see the sketch below)
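A hypothetical sketch of that idea: map the recognized query type to the category types worth showing, then group results under them. The labels here are illustrative stand-ins, not actual MeSH terms:

```python
# Which category types are worth showing for each query type (assumed).
USEFUL_CATEGORY_TYPES = {
    "adverse-effects": {"Side Effect", "Symptom"},
    "prognosis": {"Outcome", "Survival Rate"},
}

def organize(results, query_type):
    """Group retrieved documents under only the useful categories.
    Each result carries (category_type, category_label) pairs."""
    keep = USEFUL_CATEGORY_TYPES.get(query_type, set())
    groups = {}
    for doc in results:
        for ctype, label in doc["categories"]:
            if ctype in keep:
                groups.setdefault(label, []).append(doc["title"])
    return groups
```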
183. DynaCat (Pratt, Hearst, Fagan 99)
184. DynaCat Study
- Design
- Three queries
- 24 cancer patients
- Compared three interfaces
- ranked list, clusters, categories
- Results
- Participants strongly preferred categories
- Participants found more answers using categories
- Participants took the same amount of time with all three interfaces
- Similar results have been verified by another study, by Chen and Dumais (CHI 2000)
185. Cat-a-Cone: Multiple Simultaneous Categories
- Key ideas
- Separate documents from category labels
- Show both simultaneously
- Link the two for iterative feedback
- Distinguish between
- searching for documents vs.
- searching for categories
186. Cat-a-Cone Interface
187. [Diagram: query terms drive search and browse across the Category Hierarchy and the Collection, producing Retrieved Documents]
188. Proposed Advantages
- Integrates category selection with viewing of categories
- Shows all categories in context
- Shows the relationship of retrieved documents to the category structure
- But was not evaluated with a user study
189. Our new project: FLAMENCO
- FLexible Access using MEtadata in Novel COmbinations
- Main idea
- Preview and postview information
- Determined dynamically and (semi-)automatically, based on the current task
190. The future of search tools: A Prediction of a Dichotomy
- Information Intensive
- Business analysis
- Scientific research
- Planning & design
- Quick lookup
- Question answering
- Context-dependent info (location, time)
191. My Predictions of Future Trends in Search Interfaces
- Specialization
- Single-topic search (vortals)
- Task-oriented search
- Personalization
- Question-Answering
- Visualization???
192. References
- See the bibliography of Chapter 10 of Modern Information Retrieval, Ricardo Baeza-Yates & Berthier Ribeiro-Neto (Eds.). The chapter is called "User Interfaces and Visualization", by Marti Hearst. Available at www.sims.berkeley.edu/~hearst/irbook/chapters/chap10.html