SIMS 247: Information Visualization and Presentation, Marti Hearst


Slides: 100

Provided by: coursesIs8

Learn more at: https://courses.ischool.berkeley.edu


1
SIMS 247: Information Visualization and Presentation
Marti Hearst
Nov 2 and Nov 7, 2005
2
Outline

Why Text is Tough
Single-document Visualization
Visualizing Concept Spaces
Clusters
Category Hierarchies
Visualizing Query Specifications
Visualizing Retrieval Results
Usability Study Meta-Analysis

3
Why Visualize Text?

To help with Information Retrieval
give an overview of a collection
show user what aspects of their interests are
present in a collection
help user understand why documents were retrieved
as a result of a query
Text Data Mining
Mainly clustering and nodes-and-links
Software Engineering
not really text, but has some similar properties

4
Why Text is Tough

Text is not pre-attentive
Text consists of abstract concepts
which are difficult to visualize
Text represents similar concepts in many
different ways
space ship, flying saucer, UFO, figment of
imagination
Text has very high dimensionality
Tens or hundreds of thousands of features
Many subsets can be combined together

5
Why Text is Tough
The Dog.
6
Why Text is Tough
The Dog.
The dog cavorts.
The dog cavorted.
7
Why Text is Tough
The man.
The man walks.
8
Why Text is Tough
The man walks the cavorting dog.
So far, we can sort of show this in pictures.
9
Why Text is Tough
As the man walks the cavorting dog,
thoughts arrive unbidden of the previous spring,
so unlike this one, in which walking was marching
and dogs were baleful sentinels outside unjust
halls.
How do we visualize this?
10
Why Text is Tough

Abstract concepts are difficult to visualize
Combinations of abstract concepts are even more
difficult to visualize
time
shades of meaning
social and psychological concepts
causal relationships

11
Why Text is Tough

Language only hints at meaning
Most meaning of text lies within our minds and
common understanding
"How much is that doggy in the window?"
"how much": social system of barter and trade (not
the size of the dog)
"doggy" implies childlike, plaintive, probably
cannot do the purchasing on their own
"in the window" implies behind a store window,
not really inside a window, requires notion of
window shopping

12
Why Text is Tough

General categories have no standard ordering
(nominal data)
Categorization of documents by single topics
misses important distinctions
Consider an article about
NAFTA
The effects of NAFTA on truck manufacture
The effects of NAFTA on productivity of truck
manufacture in the neighboring cities of El Paso
and Juarez

13
Why Text is Tough

Other issues about language
ambiguous (many different meanings for the same
words and phrases)
different combinations imply different meanings

14
Why Text is Tough

I saw Pathfinder on Mars with a telescope.
Pathfinder photographed Mars.
The Pathfinder photograph mars our perception of
a lifeless planet.
The Pathfinder photograph from Ford has arrived.
The Pathfinder forded the river without marring
its paint job.

15
Why Text is Easy

Text is highly redundant
When you have lots of it
Pretty much any simple technique can pull out
phrases that seem to characterize a document
Instant summary
Extract the most frequent words from a text
Remove the most common English words
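
The "instant summary" recipe above can be sketched in a few lines of Python. This is only an illustration: the stopword list is a tiny made-up sample (a real list would be much longer), and the function name instant_summary is hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not a standard one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "that", "was", "he", "his", "with", "for", "on", "as"}

def instant_summary(text, n=5):
    """Return the n most frequent non-stopword words with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

sample = ("The dog cavorts. The man walks the dog. "
          "The dog and the man walk to the park.")
top_words = instant_summary(sample, n=2)
print(top_words)
```

No stemming is applied here, so "walk" and "walks" are counted as separate words.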

16
Guess the Text

478 said
233 god
201 father
187 land
181 jacob
160 son
157 joseph
134 abraham
121 earth
119 man
118 behold
113 years
104 wife
101 name
94 pharaoh

17
Visualizing Individual Documents

Early approach: SuperBook
Showing term occurrences: TextArc

18
Superbook (http://superbook.bellcore.com/SB)
19
TextArc (www.textarc.org)
20
SeeSoft: Showing Text Content using a linear
representation and brushing and linking (Eick &
Wills 95)
21
Virtual Shakespeare (Small 96)
22
Text Collection Overviews

How can we show an overview of the contents of a
text collection?
Show info external to the docs
e.g., date, author, source, number of inlinks
does not show what they are about
Show the meanings or topics in the docs
a list of titles
results of clustering words or documents
organize according to categories (next time)

23
The Need to Group

Interviews with lay users often reveal a desire
for better organization of retrieval results
Useful for suggesting where to look next
People prefer links over generating search terms
But only when the links are for what they want
Three main approaches for text and images
Group items according to pre-defined categories
Group items into automatically-created clusters
Group items according to common keywords

Ojakaar and Spool, Users Continue After Category
Links, UIETips Newsletter,
http://world.std.com/uieweb/Articles/, 2001
24
Categories

Human-created
But often automatically assigned to items
Arranged in hierarchy, network, or facets
Can assign multiple categories to items
Or place items within categories
Usually restricted to a fixed set
So help reduce the space of concepts
Intended to be readily understandable
To those who know the underlying domain
Provide a novice with a conceptual structure
There are many already made up!
However, until recently, their use in interfaces
has been
Under-investigated
Not met their promise

25
Clustering

"The art of finding groups in data"
(Kaufman and Rousseeuw)
Groups are formed according to associations and
commonalities among the data's features.
There are dozens of algorithms, more all the time
Most need a way of determining similarity or
difference between a pair of items
In text clustering, documents usually represented
as a vector of weighted features which are some
transformation on the words
Similarity between documents is a weighted
measure of feature overlap
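
A minimal sketch of the document representation described above, assuming tf-idf as the word weighting and cosine as the similarity measure. Both are common choices but are assumptions here; the slide does not name a specific weighting or measure.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a dict of term -> tf * idf weight."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    """Weighted feature overlap, normalized by the vector lengths."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "dog walks man park".split(),
    "man walks dog street".split(),
    "trade treaty truck manufacture".split(),
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # share three words
sim_unrelated = cosine(vecs[0], vecs[2])  # share no words
print(sim_related, sim_unrelated)
```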

26
Clustering

Potential benefits
Find the main themes in a set of documents
Potentially useful if the user wants a summary of
the main themes in the subcollection
Potentially harmful if the user is interested in
less dominant themes
More flexible than pre-defined categories
There may be important themes that have not been
anticipated
Disambiguate ambiguous terms
ACL
Clustering retrieved documents tends to group
those relevant to a complex query together

Hearst, Pedersen, Revisiting the Cluster
Hypothesis, SIGIR 96
27
Scatter/Gather Clustering

Developed at PARC in the late 80s/early 90s
Top-down approach
Start with k seeds (documents) to represent k
clusters
Each document assigned to the cluster with the
most similar seed
To choose the seeds
Cluster in a bottom-up manner
Hierarchical agglomerative clustering
Start with n documents, compare all by pairwise
similarity, combine the two most similar
documents to make a cluster
Now compare both clusters and individual
documents to find the most similar pair to
combine
Continue until k clusters remain
Use the centroid of each of these as seeds
Centroid: average of the weighted vectors
Can recluster a cluster to produce a hierarchy of
clusters
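
The seed-selection and assignment steps above can be sketched as follows. This is an illustrative simplification, not the published Scatter/Gather code: it compares clusters by the cosine of their centroids (an assumption; the slide does not say how clusters are compared), and the function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Average of the weighted vectors, dimension by dimension."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def choose_seeds(vectors, k):
    """Bottom-up: merge the most similar pair until k clusters remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [centroid(c) for c in clusters]

def assign(vectors, seeds):
    """Top-down: give each document the index of its most similar seed."""
    return [max(range(len(seeds)), key=lambda c: cosine(v, seeds[c]))
            for v in vectors]

docs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
seeds = choose_seeds(docs, k=2)
labels = assign(docs, seeds)
print(labels)
```

The pairwise comparison is expensive on large collections, so in practice the bottom-up step would likely be run on a sample of the documents rather than on all of them.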

Pedersen, Cutting, Karger, Tukey, Scatter/Gather:
A Cluster-based Approach to Browsing Large
Document Collections, SIGIR 1992
28
Scatter/Gather
29
Northern Light Web Search: Started out with
clustering. Then integrated with categories.
Then did not do web search and used only
categories.
32
Visualizing Clustering Results

Use clustering to map the entire huge
multidimensional document space into a huge
number of small clusters.
Use dimension reduction and then project these
onto a 2D/3D graphical representation
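
One way to do the reduce-then-project step is principal component analysis. The sketch below assumes PCA computed by power iteration in plain Python; the systems shown on the following slides may use other methods (for example, self-organizing maps), so this is an illustration of the idea only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(rows):
    """Subtract the per-dimension mean from every row."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def leading_direction(rows, iterations=200):
    """Power iteration for the direction of greatest variance."""
    dim = len(rows[0])
    v = [1.0 / (d + 1) for d in range(dim)]  # arbitrary non-uniform start
    for _ in range(iterations):
        scores = [dot(row, v) for row in rows]
        w = [sum(s * row[d] for s, row in zip(scores, rows))
             for d in range(dim)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def project_2d(rows):
    """Project rows onto their first two principal directions."""
    x = center(rows)
    first = leading_direction(x)
    # Remove the first direction so the second pass finds the next one.
    residual = [[a - dot(row, first) * f for a, f in zip(row, first)]
                for row in x]
    second = leading_direction(residual)
    return [(dot(row, first), dot(row, second)) for row in x]

points = [[1.0, 1.0, 0.0], [1.1, 0.9, 0.1],
          [5.0, 5.0, 0.2], [5.2, 4.9, 0.0]]
coords = project_2d(points)
print(coords)
```

In the example, the two nearby pairs of points land on opposite sides of the first axis, which is the kind of grouping a 2D map is meant to reveal.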

33
Clustering Multi-Dimensional Document
Space (image from Wise et al. 95)
34
Clustering Multi-Dimensional Document
Space (image from Wise et al. 95)
35
Kohonen Feature Maps on Text (from Chen et al.,
JASIS 49(7))
36
Is it useful?

4 Clustering Visualization Usability Studies

37
Clustering for Search: Study 1

This study compared
a system with 2D graphical clusters
a system with 3D graphical clusters
a system that shows textual clusters
Novice users
Only textual clusters were helpful (and they were
difficult to use well)

Kleiboemer, Lazear, and Pedersen. Tailoring a
retrieval system for naive users. SDAIR 96
38
Clustering Study 2: Kohonen Feature Maps

Comparison: Kohonen Map and Yahoo
Task
Window shop for interesting home page
Repeat with other interface
Results
Starting with map: could repeat in Yahoo (8/11)
Starting with Yahoo: unable to repeat in map (2/14)

Chen, Houston, Sewell, Schatz, Internet Browsing
and Searching: User Evaluations of Category Map
and Concept Space Techniques. JASIS 49(7):
582-603 (1998)
39
Kohonen Feature Maps (Lin 92, Chen et al. 97)
40
Study 2 (cont.)

Participants liked
Correspondence of region size to documents
Overview (but also wanted zoom)
Ease of jumping from one topic to another
Multiple routes to topics
Use of category and subcategory labels

41
Study 2 (cont.)

Participants wanted
hierarchical organization
other ordering of concepts (alphabetical)
integration of browsing and search
correspondence of color to meaning
more meaningful labels
labels at same level of abstraction
fit more labels in the given space
combined keyword and category search
multiple category assignment (sports+entertain)
(These can all be addressed with faceted
hierarchical categories)

42
Clustering Study 3: NIRVE

Each rectangle is a cluster. Larger clusters
closer to the pole. Similar clusters near one
another. Opening a cluster causes a projection
that shows the titles.

43
Study 3

This study compared
3D graphical clusters
2D graphical clusters
textual clusters
15 participants, between-subject design
Tasks
Locate a particular document
Locate and mark a particular document
Locate a previously marked document
Locate all clusters that discuss some topic
List more frequently represented topics

Visualization of search results: a comparative
evaluation of text, 2D, and 3D interfaces,
Sebrechts, Cugini, Laskowski, Vasilakis and
Miller, SIGIR 99.
44
Study 3

Results (time to locate targets)
Text clusters fastest
2D next
3D last
With practice (6 sessions), 2D neared text
results; 3D still slower
Computer experts were just as fast with 3D
Certain tasks equally fast with 2D and text
Find particular cluster
Find an already-marked document
But anything involving text (e.g., find title)
much faster with text.
Spatial location rotated, so users lost context
Helpful viz features
Color coding (helped text too)
Relative vertical locations

45
Clustering Study 4

Compared several factors
Findings
Topic effects dominate (this is a common finding)
Strong difference in results based on spatial
ability
No difference between librarians and other people
No evidence of usefulness for the cluster
visualization

Aspect windows, 3-D visualizations, and indirect
comparisons of information retrieval systems,
Swan, Allan, SIGIR 1998.
46
Summary: Visualizing for Search Using Clusters

Huge 2D maps may be an inappropriate focus for
information retrieval
cannot see what the documents are about
space is difficult to browse for IR purposes
(tough to visualize abstract concepts)
Perhaps more suited for pattern discovery and
gist-like overviews

47
Category Combinations

Let's show categories instead of clusters

48
DynaCat (Pratt, Hearst, Fagan 99)
49
DynaCat (Pratt 97)

Decide on important question types in advance
What are the adverse effects of drug D?
What is the prognosis for treatment T?
Make use of MeSH categories
Retain only those types of categories known to be
useful for this type of query.

50
DynaCat Study

Design
Three queries
24 cancer patients
Compared three interfaces
ranked list, clusters, categories
Results
Participants strongly preferred categories
Participants found more answers using categories
Participants took same amount of time with all
three interfaces

51
MultiTrees (Furnas & Zacks 94)
52
Cat-a-Cone: Multiple Simultaneous Categories

Key Ideas
Separate documents from category labels
Show both simultaneously
Link the two for iterative feedback
Distinguish between
Searching for Documents vs.
Searching for Categories

53
Cat-a-Cone Interface
54
Cat-a-Cone

Catacomb
(definition 2b, online Webster's)
A complex set of interrelated things
Makes use of earlier PARC work on 3D animation

Rooms (Henderson and Card 86)
IV Cone Tree (Robertson, Card, Mackinlay 93)
Web Book (Card, Robertson, York 96)
55
(Diagram labels: search, browse, query terms, Category Hierarchy, Collection, Retrieved Documents)
56
ConeTree for Category Labels

Browse/explore category hierarchy
by search on label names
by growing/shrinking subtrees
by spinning subtrees
Affordances
learn meaning via ancestors, siblings
disambiguate meanings
all cats simultaneously viewable

57
Virtual Book for Result Sets

Categories on Page (Retrieved Document) linked to
Categories in Tree
Flipping through Book Pages causes some Subtrees
to Expand and Contract
Most Subtrees remain unchanged
Book can be Stored for later Re-Use

58
Improvements over Standard Category Interfaces

Integrate category selection with viewing of
categories
Show all categories plus context
Show relationship of retrieved documents to the
category structure
But do users understand and like the 3D?

59
The FLAMENCO Project

Basic idea similar to Cat-a-Cone
But use familiar HTML interaction to achieve
similar goals
Usability results are very strong for users who
care about the collection.

60
Co-Citation Analysis

Has been around since the 50s. (Small, Garfield,
White & McCain)
Used to identify core sets of
authors, journals, articles for particular fields
Not for general search
Main Idea
Find pairs of papers that are cited together by third papers
Look for commonalities
A nice demonstration by Eugene Garfield at
http://165.123.33.33/eugene_garfield/papers/mapsciworld.html
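
As an illustration of the main idea, co-citation strength can be computed by counting how many citing papers list both members of a pair in their references. The data and the function name cocitation_counts are made up for the example.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citations):
    """citations maps each citing paper to the papers it cites.

    Returns a Counter mapping each (paper_a, paper_b) pair to the
    number of citing papers that cite both.
    """
    counts = Counter()
    for cited in citations.values():
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

# Hypothetical citation data: P1..P3 cite papers A, B, C.
citations = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}
counts = cocitation_counts(citations)
print(counts.most_common())
```

Pairs with high counts are candidates for a field's core set; a real analysis would likely also apply thresholds and some normalization.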

61
Co-citation analysis (From Garfield 98)
62
Co-citation analysis (From Garfield 98)
63
Co-citation analysis (From Garfield 98)
64
Query Specification
65
Command-Based Query Specification

command attribute value connector
find pa shneiderman and tw user
What are the attribute names?
What are the command names?
What are allowable values?
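
A minimal sketch of how a query in the "command attribute value connector" pattern might be parsed. The connector set and the single-word-value restriction are assumptions made for the example; the slide does not define the grammar, and the parser does nothing to tell the user which attribute or command names exist, which is the usability problem raised above.

```python
# Assumption: these three connectors; the real system's set is not given.
CONNECTORS = {"and", "or", "not"}

def parse_command(query):
    """Split 'command attr value [connector attr value ...]' into parts.

    Assumes every value is a single word.
    """
    tokens = query.split()
    command, rest = tokens[0], tokens[1:]
    clauses = []
    connector = None
    i = 0
    while i < len(rest):
        if rest[i] in CONNECTORS:
            connector = rest[i]
            i += 1
            continue
        attribute, value = rest[i], rest[i + 1]
        clauses.append((connector, attribute, value))
        connector = None
        i += 2
    return command, clauses

parsed = parse_command("find pa shneiderman and tw user")
print(parsed)
```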

66
Form-Based Query Specification (Altavista)
67
Form-Based Query Specification (Melvyl)
68
Form-Based Query Specification (Infoseek)
69
Direct Manipulation Spec.: VQUERY (Jones 98)
70
Menu-Based Query Specification (Young &
Shneiderman 93)
71
Context
72
Putting Results in Context

Visualizations of Query Term Distribution
KWIC, TileBars, SeeSoft
Visualizing Shared Subsets of Query Terms
InfoCrystal, VIBE, Lattice Views
Table of Contents as Context
Superbook, Cha-Cha, DynaCat
Organizing Results with Tables
Envision, SenseMaker
Using Hyperlinks
WebCutter

73
Putting Results in Context

Interfaces should
give hints about the roles terms play in the
collection
give hints about what will happen if various
terms are combined
show explicitly why documents are retrieved in
response to the query
summarize compactly the subset of interest

74
KWIC (Keyword in Context)

An old standard, ignored until recently by
internet search engines
used in some intranet engines, e.g., Cha-Cha
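
A small keyword-in-context sketch: for each occurrence of the keyword, show a few words on either side. The window width and the punctuation handling are arbitrary choices made for illustration.

```python
def kwic(text, keyword, width=3):
    """Return (left context, matched word, right context) per hit."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        if word.strip(".,;:!?\"'").lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits

text = "The man walks the cavorting dog. The dog cavorts in the park."
hits = kwic(text, "dog", width=2)
for left, word, right in hits:
    print(f"{left:>15} [{word}] {right}")
```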

75
Highlighting Keywords in Context
77
Superbook (Remde et al. 89)

Hyper-media software manual
Functions
Word Lookup
Table of Contents: Dynamic fisheye view of the
hierarchical topics list
Page of Text: show selected page and highlighted
search terms
Hypertext features: linking through search words
rather than page links

78
Display of Retrieval Results

Goal: minimize time/effort for deciding which
documents to examine in detail
Idea: show the roles of the query terms in the
retrieved documents, making use of document
structure

79
TileBars

Graphical Representation of Term Distribution and
Overlap
Simultaneously Indicate
relative document length
query term frequencies
query term distributions
query term overlap
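
The data behind such a display can be sketched as a small grid: one row per query concept, one column per document segment, each cell holding the number of hits. The sketch assumes fixed-size word windows as segments and single-word terms, which are simplifications; the names tilebar_grid and render are hypothetical.

```python
def tilebar_grid(doc_words, facets, tile_size):
    """Count hits for each facet (row) in each document segment (column)."""
    tiles = [doc_words[i:i + tile_size]
             for i in range(0, len(doc_words), tile_size)]
    return [[sum(1 for w in tile if w in facet) for tile in tiles]
            for facet in facets]

def render(grid, shades=" .:#"):
    """Darker characters stand in for darker gray (more hits)."""
    return ["".join(shades[min(count, len(shades) - 1)] for count in row)
            for row in grid]

doc = ("osteoporosis research shows bone loss prevention is possible "
       "and research on prevention continues").split()
facets = [{"osteoporosis", "bone"}, {"prevention", "cure"},
          {"research", "study"}]
grid = tilebar_grid(doc, facets, tile_size=5)
rows = render(grid)
for row in rows:
    print("|" + row + "|")
```

The number of columns reflects relative document length, the shading reflects term frequency, and columns where several rows are shaded show where the query concepts overlap.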

82
Exploiting Visual Properties

Variation in gray scale saturation imposes a
universal, perceptual order (Bertin et al. 83)
Varying shades of gray show varying quantities
better than color (Tufte 83)
Differences in shading should align with the
values being presented (Kosslyn et al. 83)

83
Key Aspect: Faceted Queries

Conjunct of disjuncts
Each disjunct is a concept
osteoporosis, bone loss
prevention, cure
research, Mayo clinic, study
User does not have to specify which are main
topics, which are subtopics
Ranking algorithm gives higher weight to overlap
of topics
This kind of query works better at high-precision
queries than similarity search (Hearst 95)
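
A rough sketch of a faceted query as a conjunct of disjuncts: each facet is a set of alternative terms, and documents are ranked by how many facets they touch. Scoring by the plain count of matched facets is an assumption made for illustration, not the ranking algorithm referred to above, and terms are limited to single words.

```python
def facets_matched(doc_words, facets):
    """Number of facets with at least one term present in the document."""
    words = set(doc_words)
    return sum(1 for facet in facets if words & facet)

def rank_by_facet_overlap(docs, facets):
    """Return (score, name) pairs, best first, dropping zero scores."""
    scored = [(facets_matched(words, facets), name)
              for name, words in docs.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored

facets = [{"osteoporosis"}, {"prevention", "cure"}, {"research", "study"}]
docs = {
    "d1": "new study on osteoporosis prevention".split(),
    "d2": "osteoporosis cure rumors".split(),
    "d3": "truck manufacture in el paso".split(),
}
ranking = rank_by_facet_overlap(docs, facets)
print(ranking)
```

A strict conjunction would keep only documents whose score equals the number of facets (d1 in this example).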

84
TileBars Summary

Preliminary User Studies
users understand them
find them helpful in some situations, but
probably slower than just reading titles
sometimes terms need to be disambiguated

85
More Recent Attempts

Analyzing retrieval results
KartOO http://www.kartoo.com/
Grokker http://www.groxis.com/service/grok

90
Query Term Subsets

Show which subsets of query terms occur in
which subsets of retrieved documents
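
The underlying bookkeeping can be sketched by grouping retrieved documents by the exact subset of query terms each one contains. The document set and the function name group_by_term_subset are made up for the example.

```python
from collections import defaultdict

def group_by_term_subset(docs, query_terms):
    """Map each subset of query terms to the documents containing exactly it."""
    groups = defaultdict(list)
    for name, words in docs.items():
        present = frozenset(t for t in query_terms if t in words)
        groups[present].append(name)
    return dict(groups)

query = ["dog", "man", "park"]
docs = {
    "d1": "the man walks the dog".split(),
    "d2": "the dog cavorts in the park".split(),
    "d3": "a man and a dog in the park".split(),
    "d4": "the dog cavorted".split(),
}
groups = group_by_term_subset(docs, query)
for subset, names in groups.items():
    print(sorted(subset), names)
```

With n query terms there are up to 2^n subsets, so the number of groups to display grows quickly as terms are added.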

91
Term Occurrences in Results Sets

Show how often each query term occurs in
retrieved documents
VIBE (Korfhage 91)
InfoCrystal (Spoerri 94)
Problems
can't see overlap of terms within docs
quantities not represented graphically
more than 4 terms hard to handle
no help in selecting terms to begin with

92
InfoCrystal (Spoerri 94)
93
VIBE (Olson et al. 93, Korfhage 93)
94
Term Occurrences in Results Sets

Problems
can't see overlap of terms within docs
quantities not represented graphically
more than 4 terms hard to handle
no help in selecting terms to begin with

95
DLITE (Cousins 97)

Supporting the Information Seeking Process
UI to a digital library
Direct manipulation interface
Workcenter approach
experts create workcenters
lots of tools for one task
contents persistent

96
DLITE (Cousins 97)

Drag and Drop interface
Reify queries, sources, retrieval results
Animation to keep track of activity

97
IR Infovis Meta-Analysis (Chen & Yu 00)

Goal
Find invariant underlying relations suggested
collectively by empirical findings from many
different studies
Procedure
Examine the literature of empirical infoviz
studies
35 studies between 1991 and 2000
27 focused on information retrieval tasks
But due to wide differences in the conduct of the
studies and the reporting of statistics, could
use only 6 studies

98
IR Infovis Meta-Analysis (Chen & Yu 00)

Conclusions
IR Infoviz studies not reported in a standard
format
Individual cognitive differences had the largest
effect
Especially on accuracy
Somewhat on efficiency
Holding cognitive abilities constant, users did
better with simpler visual-spatial interfaces
The combined effect of visualization is not
statistically significant
Misc
Tilebars and Scatter/Gather are well-known enough
to not require citations!!