Title: Faceted Metadata in Search Interfaces
1Faceted Metadata in Search Interfaces
Marti Hearst, UC Berkeley School of Information
This Research Supported by NSF IIS-9984741.
2Focus: Search and Navigation of Large Collections
Shopping Sites
Digital Libraries
E-Government Sites
Image Collections
Example: the University of California Library Catalog
6What do we want done differently?
- Organization of results
- Hints of where to go next
- Flexible ways to move around
- How to structure the information?
7The Problem with Hierarchy
8The Problem With Hierarchy
9The Problem with Hierarchy
10The Problem With Hierarchy
- Where is Berkeley?
- College and University > Colleges and Universities > United States > U > University of California > Campuses > Berkeley
- U.S. States > California > Cities > Berkeley > Education > College and University > Public > UC Berkeley
11Outline
- Motivation: support for browsing big collections
- Focus on usability for a wide range of lay users
- Approach: flexible application of hierarchical faceted metadata
- Advantages of the approach
- Results of usability studies
- Opportunities for AI
- Creating faceted category hierarchies
- Assigning items to categories
- Combine categories to identify tasks
- A way to focus for personalization research
12Why Care? These folks do
- NYTimes archive
- eBay
- California Digital Library
- US Census
13How to Structure Information for Search and
Browsing?
- Hierarchy is too rigid
- KL-One is too complex
- Hierarchical faceted metadata
- A useful middle ground
14What are facets?
- Sets of categories, each of which describes a different aspect of the objects in the collection.
- Each of these can be hierarchical.
- (Not necessarily mutually exclusive nor exhaustive, but often that is a goal.)
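The definition above can be sketched as a small data structure. This is only an illustration: the facet names, the categories, and the `Item` class are invented here and are not taken from any particular system.

```python
from dataclasses import dataclass, field

# Each facet is a small hierarchy: category -> parent category
# (None marks a top-level category). All names are illustrative.
FACETS = {
    "Ingredient": {"Fruit": None, "Avocado": "Fruit",
                   "Vegetable": None, "Potato": "Vegetable"},
    "Dish": {"Salad": None, "Soup": None},
    "Cuisine": {"Mexican": None, "French": None},
}

def ancestors(facet, category):
    """Return the category followed by all of its ancestors within one facet."""
    chain = []
    while category is not None:
        chain.append(category)
        category = FACETS[facet][category]
    return chain

@dataclass
class Item:
    title: str
    labels: dict = field(default_factory=dict)  # facet name -> set of categories

    def matches(self, facet, category):
        """True if a label in this facet equals the category or lies beneath it."""
        return any(category in ancestors(facet, label)
                   for label in self.labels.get(facet, ()))

# One item carries labels from several facets at once.
avocado_salad = Item(
    "Avocado salad",
    {"Ingredient": {"Avocado"}, "Dish": {"Salad"}, "Cuisine": {"Mexican"}},
)
```

Because each facet is independent, the same item can be reached through Ingredient, Dish, or Cuisine, which is what a single hierarchy cannot offer.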
15Facet example: Recipes
16Example of Faceted Metadata: Categories for Biomedical Journal Articles
- 1. Anatomy [A]
- 2. Organisms [B]
- 3. Diseases [C]
- 4. Chemicals and Drugs [D]
- Example values: 1. Lung, 2. Mouse, 3. Cancer, 4. Tamoxifen
17Goal: assign labels from facets
18Motivation
- Description: 19th c. paint horse saddle and
hackamore spurs bandana on rider old time
cowboy hat underchin thong flying off.
19Motivation
- Description: 19th c. paint horse saddle and hackamore spurs bandana on rider old time cowboy hat underchin thong flying off.
By using facets, what are we not capturing?
- The hat flew off.
- The bandana stayed on.
- The thong is part of the hat.
- The bandana is on the cowboy (not the horse).
- The saddle is on the horse (not the cowboy).
20Hierarchical Faceted Metadata
- A simplification of knowledge representation
- Does not represent relationships directly
- BUT can be understood well by many people when
browsing rich collections of information.
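A minimal sketch of what "does not represent relationships directly" means in practice, using the cowboy image as the example. The facet names and the triple format are assumptions made for illustration only.

```python
# Hypothetical facets for the image collection; names are illustrative.
FACET_TERMS = {
    "Animals": {"horse"},
    "People": {"cowboy"},
    "Clothing": {"hat", "bandana", "spurs"},
    "Gear": {"saddle", "hackamore"},
}

def facet_labels(relations):
    """Project (subject, relation, object) triples onto facet labels.

    Only the terms survive; the relation between them is discarded.
    """
    mentioned = {term for subj, _, obj in relations for term in (subj, obj)}
    return {facet: terms & mentioned for facet, terms in FACET_TERMS.items()}

# The scene as described, and a scene with the relations swapped.
actual = [("bandana", "on", "cowboy"), ("saddle", "on", "horse"),
          ("hat", "flying off", "cowboy")]
swapped = [("bandana", "on", "horse"), ("saddle", "on", "cowboy"),
           ("hat", "flying off", "cowboy")]

# Both scenes receive identical facet labels.
same_labels = facet_labels(actual) == facet_labels(swapped)
```

The two scenes are indistinguishable at the facet level, which is the price paid for a representation that many people can browse easily.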
21How to Put In an Interface? Some Challenges
- Users don't like new search interfaces.
- How to show lots of information without
overwhelming or confusing?
22A Solution (The Flamenco Project)
- Use proper HCI methods.
- Organize search results according to the faceted metadata so navigation looks similar throughout
- Easy to see where to go next and where you've been
- Avoids empty result sets
- Integrates seamlessly with keyword search
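One way to read "avoids empty result sets" is that the interface only offers category links that have at least one matching item in the current result set. The sketch below assumes that reading; the item data and function names are invented for illustration and are not Flamenco's code.

```python
from collections import Counter

# Toy collection; each item carries a set of categories per facet.
ITEMS = [
    {"title": "Avocado salad", "Ingredient": {"Avocado"}, "Dish": {"Salad"}},
    {"title": "Potato salad", "Ingredient": {"Potato"}, "Dish": {"Salad"}},
    {"title": "Potato soup", "Ingredient": {"Potato"}, "Dish": {"Soup"}},
]
FACET_NAMES = ("Ingredient", "Dish")

def search(items, selections=(), keyword=""):
    """Keep items that carry every selected (facet, category) pair
    and whose title contains the keyword (an empty keyword matches all)."""
    return [
        item for item in items
        if all(category in item[facet] for facet, category in selections)
        and keyword.lower() in item["title"].lower()
    ]

def previews(results):
    """Count how many of the current results fall under each category.

    Categories with no matching item never appear, so every link that
    is shown leads to a non-empty result set.
    """
    return {facet: Counter(c for item in results for c in item[facet])
            for facet in FACET_NAMES}

results = search(ITEMS, selections=[("Dish", "Salad")])
links = previews(results)
```

After selecting Dish > Salad, the preview offers Avocado and Potato with their counts but not Soup, and the same `search` function accepts a keyword alongside the facet selections.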
23The Flamenco Project
- Incorporating Faceted Hierarchical Metadata into Interfaces for Large Collections
- Key Goals
- Support integrated browsing and keyword search
- Provide an experience of browsing the shelves
- Add power and flexibility without introducing confusion or a feeling of clutter
- Allow users to take the path most natural to them
- Method
- User-centered design, including needs assessment
and many iterations of design and testing
24Art History Images Collection
25Questions we are trying to answer
- How many facets are allowable?
- Should facets be mixed and matched?
- How much is too much?
- Should hierarchies be progressively revealed, tabbed, some combination?
- How should free-text search be integrated?
41Information previews
- Use the metadata to show where to go next
- More flexible than canned hyperlinks
- Less complex than full search
- Help users see and return to previous steps
- Reduces mental work
- Recognition over recall
- Suggests alternatives
- More clicks are OK iff (J. Spool):
- The scent of the target does not weaken
- Users feel they are going towards, rather than away from, their target.
42What is Tricky About This?
- It is easy to do it poorly
- It is hard not to be overwhelming
- Most users prefer simplicity unless complexity really makes a difference
- Small details matter
- It is hard to make it flow
43eBay Products
46Search Usability Design Goals
- Strive for Consistency
- Provide Shortcuts
- Offer Informative Feedback
- Design for Closure
- Provide Simple Error Handling
- Permit Easy Reversal of Actions
- Support User Control
- Reduce Short-term Memory Load
From Shneiderman, Byrd, and Croft, "Clarifying Search," D-Lib Magazine, Jan 1997. www.dlib.org
47Usability Studies
- Usability studies done on 3 collections
- Recipes: 13,000 items
- Architecture Images: 40,000 items
- Fine Arts Images: 35,000 items
- Conclusions
- Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
- Very positive results, in contrast with studies on earlier iterations.
48Post-Test Comparison
Which interface is preferable for each task (Faceted vs. Baseline):
- Find images of roses
- Find all works from a given period
- Find pictures by 2 artists in same media
Overall assessment (Faceted vs. Baseline):
- More useful for your tasks
- Easiest to use
- Most flexible
- More likely to result in dead ends
- Helped you learn more
- Overall preference
49Advantages of the Approach
- Honors many of the most important usability design goals
- User control
- Provides context for results
- Reduces short term memory load
- Allows easy reversal of actions
- Provides consistent view
- Allows different people to add content without breaking things
- Can make use of standard technology
50Advantages of the Approach
- Systematically integrates search results
- reflect the structure of the info architecture
- retain the context of previous interactions
- Gives users control and flexibility
- Over order of metadata use
- Over when to navigate vs. when to search
- Allows integration with advanced methods
- Collaborative filtering, predicting users' preferences
51Disadvantages
- Does not model relations explicitly
- Does it scale to millions of items?
- Adaptively determine which facets to show for different combinations of items
- Requires faceted metadata!
52Opportunities for AI
- Creating hierarchical faceted categories
- Assigning items to those categories
- Adaptively adding new facets as data changes
- A new approach to personalization
- User-tailored facet combinations
- Create task-based search interfaces
- Equate a task with a sequence of facet types
53Creating Classifications from Data
- Most approaches are associational
- AKA clustering, LSA, LDA, etc.
- This leads to poor results when applied to text
- To derive facets, need a different angle
- We have a simple approach based on WordNet
54Clustering (The Hope)
55Clustering (The Hope)
56Clustering (The Reality)
57Clustering (The Reality)
58Example: Recipes (3500 docs)
59Blei, Ng, and Jordan 03 (Latent Dirichlet Allocation)
60Blei, Ng, and Jordan 03 (Latent Dirichlet Allocation)
61Sanderson and Croft 99: Term Subsumption
62Sanderson and Croft 99: Term Subsumption
63Stoica and Hearst 04: WordNet-based
64Stoica and Hearst 04: WordNet-based
65Stoica and Hearst 04: WordNet-based
66Stoica and Hearst 04: WordNet-based
67Example: AP Newswire
P-2 ABSTRACT The Bechtel Group Inc. offered in 1985 to sell oil to Israel at a discount of at least $650 million for 10 years if it promised not to bomb a proposed Iraqi pipeline, a Foreign Ministry official said Wednesday.
But then-Prime Minister Shimon Peres said the offer from Bruce Rappaport, a partner in the San Francisco-based construction and engineering company, was "unimportant," the senior official told The Associated Press.
Peres, now foreign minister, never discussed the offer with other government ministers, said the official, who spoke on condition of anonymity.
The comments marked the first time Israel has acknowledged any offer was made for assurances not to bomb the planned $1 billion pipeline, which was to have run near Israel's border
68Blei, Ng, and Jordan 03 (Latent Dirichlet Allocation)
69Stoica and Hearst 04: WordNet-based
70Stoica and Hearst 04: WordNet-based
71Stoica and Hearst 04: WordNet-based
73Stoica and Hearst 04: WordNet-based
74Stoica and Hearst 04: WordNet-based
75Associational techniques
- Pros
- Sometimes terms are grouped to yield a general concept
- Airline, airplane, pilots, flight
- Cons
- Highly unpredictable
- Not comprehensive
- Dollar and yen but no deutschmarks
- Eastern but no other directions
- Not uniform in subject matter
- Mixing currencies with countries with timing
- Mixing compass directions with airlines
76Lexical Hierarchy-based
- Pros
- Faceted and hierarchical
- Consistent is-a hierarchies
- Comprehensiveness more likely
- Cons
- Doesn't provide overall themes
- Airlines, pilots, airplanes
- Sometimes uses wrong word sense
- Sometimes the right term/hierarchy is not present
- Doesn't have dish type nor cuisine for recipes
- Specialized domains won't work
77Our Approach
- Leverage the structure of WordNet
78Our Approach
- Leverage the structure of WordNet
79Step 1: Select Terms
- Select well-distributed terms from the collection
[Pipeline diagram: Documents -> Select terms -> Get hypernym paths (from WordNet) -> Build tree -> Compress tree]
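The slide does not define "well distributed". One plausible reading, assumed here purely for illustration, is to keep terms whose document frequency falls within a band: frequent enough to organize the collection, but not so frequent that they fail to discriminate. The thresholds and the function name are hypothetical.

```python
from collections import Counter

def select_terms(documents, min_fraction=0.5, max_fraction=0.9):
    """Return terms whose document frequency, as a fraction of the
    collection, lies within [min_fraction, max_fraction].

    The band limits are illustrative assumptions, not published values.
    """
    # Count each term once per document it occurs in.
    doc_freq = Counter(term
                       for doc in documents
                       for term in set(doc.lower().split()))
    n = len(documents)
    return sorted(term for term, df in doc_freq.items()
                  if min_fraction <= df / n <= max_fraction)

docs = [
    "red apple salad",
    "red pepper soup",
    "blue cheese salad",
    "the red and the blue",
]
terms = select_terms(docs)
```

A real pipeline would also tokenize properly and drop stop words; whitespace splitting keeps the sketch short.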
80Step 2: Get Hypernym Path
[Diagram: hypernym paths retrieved for the example terms red and blue]
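81Step 3: Build Tree
[Pipeline diagram: Documents -> Select terms -> Get hypernym paths (from WordNet) -> Build tree -> Compress tree; the hypernym paths for red and blue are merged into one tree]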
82Step 4: Compress Tree
[Diagram: example tree before compression: color > chromatic color > {"red, redness" > red; "blue, blueness" > blue; "green, greenness" > green}]
83Step 4: Compress Tree (cont.)
[Diagram: the example tree color > chromatic color > {red, blue, green} is compressed to color > {red, blue, green}]
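The build and compress steps can be sketched on the color example. The hypernym paths are typed in by hand rather than read from WordNet, and the two compression rules are assumptions chosen because they reproduce the trees on the slides; the slides themselves do not state the rules.

```python
# Hand-written root-to-term hypernym paths for the color example.
PATHS = [
    ["color", "chromatic color", "red, redness", "red"],
    ["color", "chromatic color", "blue, blueness", "blue"],
    ["color", "chromatic color", "green, greenness", "green"],
]

def build_tree(paths):
    """Merge hypernym paths into one nested dict (label -> children)."""
    tree = {}
    for path in paths:
        node = tree
        for label in path:
            node = node.setdefault(label, {})
    return tree

def compress(tree, parent=None):
    """Splice out interior nodes that add little structure.

    Assumed rule 1: drop a non-root node that has exactly one child.
    Assumed rule 2: drop a non-root node with children whose label
    repeats its parent's label as a word ("chromatic color" under "color").
    """
    out = {}
    for label, children in tree.items():
        children = compress(children, parent=label)
        single_child = parent is not None and len(children) == 1
        repeats_parent = (parent is not None and bool(children)
                          and parent in label.split())
        if single_child or repeats_parent:
            out.update(children)   # promote the children in place of this node
        else:
            out[label] = children
    return out

full_tree = build_tree(PATHS)
compact_tree = compress(full_tree)
```

On this input the synset nodes such as "red, redness" disappear under rule 1 and "chromatic color" disappears under rule 2, leaving color with the three terms directly beneath it.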
84Disambiguation
- Ambiguity in
- Word senses
- Paths up the hypernym tree
85How to Select the Right Senses and Paths?
- First build core tree
- (1) Create paths for words with only one sense
- (2) Use Domains
- WordNet has 212 Domains
- medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
- Automatically scan the collection to see which domains apply
- The user selects which of the suggested domains to use or may add own
- Paths for terms that match the selected domains are added to the core tree
- Then add remaining terms to the core tree.
86Using Domains
dip glosses:
- Sense 1: A depression in an otherwise level surface
- Sense 2: The angle that a magnet needle makes with horizon
- Sense 3: Tasty mixture into which bite-size foods are dipped
dip hypernyms:
- Sense 1: solid > concave shape > depression
- Sense 2: shape, form > space > angle
- Sense 3: food > ingredient, fixings > flavorer
Given domain food, choose sense 3
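The dip example can be written as a small lookup. As a simplification, this sketch treats a sense as matching a domain when the domain name appears in that sense's hypernym path; the real domain labels are attached to senses by a separate resource, so the matching rule here is an assumption.

```python
# Hypernym paths for the three senses of "dip", copied from the slide.
DIP_SENSES = {
    1: ["solid", "concave shape", "depression"],
    2: ["shape, form", "space", "angle"],
    3: ["food", "ingredient, fixings", "flavorer"],
}

def choose_senses(sense_paths, domains):
    """Return the sense numbers whose hypernym path mentions a selected domain."""
    return [sense for sense, path in sense_paths.items()
            if any(domain in path for domain in domains)]

food_senses = choose_senses(DIP_SENSES, {"food"})
no_match = choose_senses(DIP_SENSES, {"soccer"})
```

When no sense matches a selected domain, the list is empty and the term would fall through to the later step that adds the remaining terms to the core tree.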
87Opportunities for AI
- New opportunity: tagging, folksonomies
- (flickr, del.icio.us)
- People are creating facets in a decentralized manner
- They are assigning multiple facets to items
- This is done on a massive scale
- This leads naturally to meaningful associations
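A minimal sketch of how associations can fall out of tagging data: count which tags appear on the same items as a given tag. The photo tags below are invented, and real related-tag browsers may weight or normalize the counts differently.

```python
from collections import Counter

def related_tags(tagged_items, tag):
    """Count how often each other tag appears on an item that carries `tag`."""
    counts = Counter()
    for tags in tagged_items:
        unique = set(tags)
        if tag in unique:
            counts.update(unique - {tag})
    return counts

# Invented example: each inner list is the tag set of one photo.
photos = [
    ["beach", "sunset", "california"],
    ["sunset", "sky"],
    ["beach", "california"],
    ["sunset", "beach"],
]
beach_related = related_tags(photos, "beach")
```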
89http://www.airtightinteractive.com/projects/related_tag_browser/app/
94This Doesn't Solve Everything
- Harder to determine what's related to more complex terms
- Still not good for finding a recipe using potatoes
98Linking Metadata Into Tasks
- Old Yahoo restaurant guide combined
- Region
- Topic (restaurants)
- Related Information
- Other attributes (cuisines)
- Other topics related in place and time (movies)
99Yellow: geographic region
Green: restaurant attributes
Red: related in place and time
100Other Possible Combinations
- Region + A&E
- City + Restaurant + Movies
- City + Weather
- City + Education + Schools
- Restaurants + Schools
101Creating Tasks from HFM
- Recipes Example
- Click Ingredient > Avocado
- Click Dish > Salad
- Implies task of "I want to make a Dish type d with an Ingredient i that I have lying around"
- Maybe users will prefer to select tasks like these over navigating through the metadata.
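The idea of equating a task with a sequence of facet types can be sketched as a lookup from the facet sequence of a click path to a task template. The `TASKS` table and the function name are hypothetical; only the recipe example comes from the slide.

```python
# Hypothetical registry: sequence of facet types -> task template.
TASKS = {
    ("Ingredient", "Dish"):
        "I want to make a {Dish} with {Ingredient} that I have lying around",
}

def infer_task(clicks):
    """Map a click path of (facet, category) pairs to a task description.

    Returns None when no task is registered for that facet sequence.
    """
    facet_sequence = tuple(facet for facet, _ in clicks)
    template = TASKS.get(facet_sequence)
    return template.format(**dict(clicks)) if template else None

task = infer_task([("Ingredient", "Avocado"), ("Dish", "Salad")])
unknown = infer_task([("Cuisine", "Mexican")])
```

An interface could offer the registered tasks as starting points, filling in d and i from the user's choices instead of requiring navigation through each facet.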
102Summary
- Flexible application of hierarchical faceted metadata is a proven approach for navigating large information collections.
- Midway in complexity between simple hierarchies and deep knowledge representation.
- Perhaps HFM is a good stepping stone to deeper semantic relations
- Currently in use on e-commerce sites; spreading to other domains
103AI Opportunities
- Creating hierarchical faceted categories
- Assigning items to those categories
- Adaptively adding new facets as data changes
- A new approach to personalization
- User-tailored facet combinations
- Create task-based search interfaces
- Equate a task with a sequence of facet types
- Make use of folksonomies data!
104Acknowledgements
- Flamenco team
- Brycen Chun
- Ame Elliott
- Jennifer English
- Kevin Li
- Rashmi Sinha
- Emilia Stoica
- Kirsten Swearingen
- Ping Yee
- Thanks also to NSF (IIS-9984741)
105Thank you!
Marti Hearst, UC Berkeley School of Information
This Research Supported by NSF IIS-9984741.