Semi-Automated Creation of Facet Hierarchies
Marti Hearst, Taxonomy Bootcamp 06

1
Semi-Automated Creation of Facet Hierarchies
  • Marti Hearst
  • School of Information, UC Berkeley
  • Joint work with Dr. Emilia Stoica

2
Outline
  • Faceted Metadata
  • Definition
  • Advantages
  • Flamenco
  • Search Interface Design using Faceted Metadata
  • Castanet
  • (Semi) Automated Tool for Creation of Category
    Systems
  • Comparison to State-of-the-Art Alternatives
  • Conclusions

3
Focus: Search and Navigation of Large Collections
Shopping Sites
Digital Libraries
E-Government Sites
Image Collections
4
Problems with Site Search
  • Study by Vividence in 2001 on 69 Sites
  • 70 eCommerce
  • 31 Service
  • 21 Content
  • 2 Community
  • Poorly organized search results
  • Frustration and wasted time
  • Poor information architecture
  • Confusion
  • Dead ends
  • "back and forthing"
  • Forced to search

5
What We Want to Achieve
  • Integrate browsing and searching seamlessly
  • Support exploration and learning
  • Avoid dead-ends, pogoing, and lostness

6
Main Idea
  • Use hierarchical faceted metadata
  • Design the interface to
  • Allow flexible navigation
  • Provide previews of next steps
  • Organize results in a meaningful way
  • Support both expanding and refining the search

7
The Problem With Hierarchy
  • Most things can be classified in more than one
    way.
  • Most organizational systems do not handle this
    well.
  • Example: Animal Classification

(Figure: the animals otter, penguin, robin, salmon, wolf, cobra, and bat, each of which can be classified by Skin Covering, Locomotion, or Diet.)
8
The Problem with Hierarchy
  • Inflexible
  • Force the user to start with a particular
    category
  • What if I don't know the animal's diet, but the
    interface makes me start with that category?
  • Wasteful
  • Have to repeat combinations of categories
  • Makes for extra clicking and extra coding
  • Difficult to modify
  • To add a new category type, must duplicate it
    everywhere or change things everywhere
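The cost of a single fixed hierarchy can be checked with simple arithmetic. The sketch below assumes the labels from the animal example (four kinds of locomotion, three skin coverings, three diets) and counts how many category nodes a nested hierarchy needs compared with keeping the facets independent; the function names are hypothetical.

```python
# Labels from the animal example; the lists are illustrative.
facets = {
    "locomotion": ["swim", "fly", "run", "slither"],
    "skin covering": ["fur", "scales", "feathers"],
    "diet": ["fish", "rodents", "insects"],
}

def single_hierarchy_nodes(facets, order):
    """Count category nodes when the facets are nested in one fixed order."""
    total, branches = 0, 1
    for name in order:
        # Each label of this facet is repeated under every branch built so far.
        branches *= len(facets[name])
        total += branches
    return total

def independent_facet_labels(facets):
    """Count labels when each facet is kept as its own small list."""
    return sum(len(labels) for labels in facets.values())

order = ["locomotion", "skin covering", "diet"]
print(single_hierarchy_nodes(facets, order))  # 4 + 12 + 36 = 52
print(independent_facet_labels(facets))       # 4 + 3 + 3 = 10
```

Nesting repeats every skin-covering label under every locomotion label and every diet label under every skin covering, so the node count grows with the product of the facet sizes, while independent facets grow only with their sum.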

9
The Problem With Hierarchy
(Figure: one hierarchy that starts at locomotion (swim, fly, run, slither), splits each branch by skin covering (fur, scales, feathers), and splits each of those by diet (fish, rodents, insects), so the same labels are repeated many times; salmon, bat, robin, and wolf appear as leaves.)
10
The Idea of Facets
  • Facets are a way of labeling data
  • A kind of Metadata (data about data)
  • Can be thought of as properties of items
  • Facets vs. Categories
  • Items are placed INTO a category system
  • Multiple facet labels are ASSIGNED TO items

11
The Idea of Facets
  • Create INDEPENDENT categories (facets)
  • Each facet has labels (sometimes arranged in a
    hierarchy)
  • Assign labels from the facets to every item
  • Example recipe collection

(Figure: a recipe labeled with the facets Ingredient, Cooking Method, Course, and Cuisine, using the labels Chicken, Bell Pepper, Stir-fry, Curry, Main Course, and Thai.)
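A minimal Python sketch of the difference between placing an item INTO a category and ASSIGNING labels TO it. The `Item` class, the recipe name, and the mapping of each label to a facet are illustrative assumptions, not Flamenco's data model.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """An item carries labels from any number of independent facets."""
    name: str
    labels: dict[str, set[str]] = field(default_factory=dict)

    def assign(self, facet: str, label: str) -> None:
        # Labels are ASSIGNED TO the item; it is never placed INTO one category.
        self.labels.setdefault(facet, set()).add(label)

# Hypothetical recipe; labels taken from the recipe example.
recipe = Item("chicken stir-fry")
for facet, label in [
    ("Ingredient", "Chicken"),
    ("Ingredient", "Bell Pepper"),
    ("Cooking Method", "Stir-fry"),
    ("Course", "Main Course"),
    ("Cuisine", "Thai"),
]:
    recipe.assign(facet, label)

print(sorted(recipe.labels))
```

Because each facet is a separate key, adding a new facet later means assigning one more label, with no change to the existing ones.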
12
The Idea of Facets
  • Break out all the important concepts into their
    own facets
  • Sometimes the facets are hierarchical
  • Assign labels to items from any level of the
    hierarchy

Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy (Ice Cream, Sorbet, Flan)
Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple
13
Using Facets
  • Now there are multiple ways to get to each item

Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy (Ice Cream, Sherbet, Flan)
Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple
One item: Fruit > Pineapple; Dessert > Cake; Preparation > Bake
Another item: Dessert > Dairy > Sherbet; Fruit > Berries > Strawberries; Preparation > Freeze
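The label paths above can be modeled as tuples that start at the facet name, which lets a label sit at any depth and lets one selection match everything beneath it. This is a sketch under assumed data: the two item names are hypothetical, while the label paths follow the slide.

```python
# Each label is a path from the facet root, so an item can be tagged at any
# level of a hierarchy. Item names are hypothetical.
items = {
    "pineapple upside-down cake": [
        ("Fruit", "Pineapple"),
        ("Dessert", "Cake"),
        ("Preparation", "Bake"),
    ],
    "strawberry sherbet": [
        ("Dessert", "Dairy", "Sherbet"),
        ("Fruit", "Berries", "Strawberries"),
        ("Preparation", "Freeze"),
    ],
}

def under(path, selection):
    """True if the label `path` lies at or below the selected category."""
    return path[: len(selection)] == selection

def select(items, selection):
    """Items with at least one label inside the selected subhierarchy."""
    return sorted(name for name, paths in items.items()
                  if any(under(p, selection) for p in paths))

print(select(items, ("Fruit",)))               # both items
print(select(items, ("Fruit", "Berries")))     # only the sherbet
print(select(items, ("Preparation", "Bake")))  # only the cake
```

Each item is reachable from three different facets, which is the "multiple ways to get to each item" point.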
14
Example: Nobel Prize Winners Collection (Before
and After Facets)
15
Only One Way to View Laureates
16
First, Choose Prize Type
17
Next, view the list!
The user must first choose an Award type
(literature), then browse through the laureates
in chronological order. No choice is given to,
say, organize by year and then award, or
by country, then decade, then award, etc.
18
Flamenco Interface Using Hierarchical Faceted
Metadata
19
Opening View: Select literature from PRIZE facet
20
Group results by YEAR facet
21
Select 1920s from YEAR facet
22
Current query is PRIZE > literature AND YEAR >
1920s. Now remove PRIZE > literature
23
Now Group By YEAR > 1920s
24
Hierarchy Traversal: Group By YEAR > 1920s, and
drill down to 1921
25
Select an individual item
26
Use Endgame to expand out
27
Use Endgame to expand out
28
Or use "More like this" to find similar items
29
Start a new search using the keyword "California"
30
Note that category structure remains after the
keyword search
31
The query is now a keyword ANDed with a facet
subhierarchy
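The walkthrough amounts to a conjunctive query: each facet selection restricts the result set to a subhierarchy, and a keyword is one more ANDed condition. The records below are placeholders (not real laureate data), and `matches` and `run` are hypothetical helpers.

```python
# Placeholder records; only the facet names PRIZE and YEAR come from the slides.
records = [
    {"id": "A", "text": "born in California",
     "labels": [("PRIZE", "literature"), ("YEAR", "1920s", "1921")]},
    {"id": "B", "text": "born in Paris",
     "labels": [("PRIZE", "physics"), ("YEAR", "1920s", "1925")]},
    {"id": "C", "text": "worked in California",
     "labels": [("PRIZE", "chemistry"), ("YEAR", "1950s", "1954")]},
]

def matches(record, keyword=None, selections=()):
    """AND together an optional keyword and any number of facet selections.

    A selection such as ("YEAR", "1920s") matches every label beneath it,
    so each condition covers a whole subhierarchy.
    """
    if keyword and keyword.lower() not in record["text"].lower():
        return False
    return all(
        any(label[: len(sel)] == sel for label in record["labels"])
        for sel in selections
    )

def run(keyword=None, selections=()):
    return [r["id"] for r in records if matches(r, keyword, selections)]

print(run(selections=[("PRIZE", "literature"), ("YEAR", "1920s")]))  # ['A']
print(run(selections=[("YEAR", "1920s")]))      # PRIZE removed: ['A', 'B']
print(run("California", [("YEAR", "1920s")]))   # ['A']
```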
32
Using Facets
  • The system only shows the labels that correspond
    to the current set of items
  • Start with all items and all facets
  • The user then selects a label within a facet
  • This reduces the set of items (only those that
    have been assigned to the subcategory label are
    displayed)
  • This also eliminates some subcategories from the
    view.
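A sketch of how these previews might be computed: count the current items under each label and show only labels with a nonzero count. The recipes and helper names are hypothetical; only the behavior (labels disappear as the result set shrinks) comes from the slide.

```python
from collections import Counter

# Hypothetical recipes; each label is a path whose first element is the facet.
items = {
    "r1": [("Cuisine", "Thai"), ("Course", "Main Course")],
    "r2": [("Cuisine", "Thai"), ("Course", "Dessert")],
    "r3": [("Cuisine", "French"), ("Course", "Dessert")],
}

def refine(items, selection):
    """Keep only items with a label at or below `selection`."""
    return {name: paths for name, paths in items.items()
            if any(p[: len(selection)] == selection for p in paths)}

def preview(items, facet):
    """Count current items per first-level label of `facet`.

    Labels with no matching items never get an entry, so a displayed
    label cannot lead to an empty result set.
    """
    counts = Counter()
    for paths in items.values():
        counts.update({p[1] for p in paths if p[0] == facet and len(p) > 1})
    return dict(counts)

print(preview(items, "Cuisine"))                         # {'Thai': 2, 'French': 1}
print(preview(refine(items, ("Cuisine", "Thai")), "Course"))
print(preview(refine(items, ("Course", "Main Course")), "Cuisine"))  # {'Thai': 1}
```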

33
Advantages of Facets
  • Can't end up with empty result sets
  • (except with keyword search)
  • Helps avoid feelings of being lost.
  • Easier to explore the collection.
  • Helps users infer what kinds of things are in the
    collection.
  • Evokes a feeling of browsing the shelves
  • Is preferred over standard search for collection
    browsing in usability studies.
  • (Interface must be designed properly)

34
Advantages of Facets
  • Seamless to add new facets and subcategories
  • Seamless to add new items.
  • Helps with categorization wars
  • Don't have to agree exactly where to place
    something
  • Interaction can be implemented using a standard
    relational database.
  • May be easier for automatic categorization
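The relational-database point can be illustrated with Python's built-in `sqlite3` module. The two-table schema below is an assumption made for the example, not Flamenco's actual schema; because each label assignment is a row, adding a facet or a label needs no schema change.

```python
import sqlite3

# Hypothetical schema: one row per (item, facet, label) assignment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE item_label (
        item_id INTEGER REFERENCES item(id),
        facet TEXT,
        label TEXT
    );
""")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(1, "recipe 1"), (2, "recipe 2"), (3, "recipe 3")])
conn.executemany("INSERT INTO item_label VALUES (?, ?, ?)", [
    (1, "Cuisine", "Thai"), (1, "Course", "Main Course"),
    (2, "Cuisine", "Thai"), (2, "Course", "Dessert"),
    (3, "Cuisine", "French"), (3, "Course", "Dessert"),
])

def preview(facet, sel_facet, sel_label):
    """Label counts in `facet` among items labeled sel_facet = sel_label."""
    return conn.execute("""
        SELECT label, COUNT(DISTINCT item_id)
        FROM item_label
        WHERE facet = ?
          AND item_id IN (SELECT item_id FROM item_label
                          WHERE facet = ? AND label = ?)
        GROUP BY label
        ORDER BY label
    """, (facet, sel_facet, sel_label)).fetchall()

print(preview("Course", "Cuisine", "Thai"))  # [('Dessert', 1), ('Main Course', 1)]
```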

35
Information previews
  • Use the metadata to show where to go next
  • More flexible than canned hyperlinks
  • Less complex than full search
  • Help users see and return to previous steps
  • Reduces mental work
  • Recognition over recall
  • Suggests alternatives
  • More clicks are OK only if (J. Spool):
  • The scent of the target does not weaken
  • Users feel they are going towards, rather than
    away from, their target.

36
Facets vs. Hierarchy
  • Early Flamenco studies compared allowing multiple
    hierarchical facets vs. just one facet.
  • Multiple facets was preferred and more successful.

37
Limitation of Facets
  • Do not naturally capture MAIN THEMES
  • Facets do not show RELATIONS explicitly

(Example photo labeled with the colors Aquamarine, Red, Orange and the objects Door, Doorway, Wall.)
  • Which color associated with which object?

Photo by J. Hearst, jhearst.typepad.com
38
Terminology Clarification
  • Facets vs. Attributes
  • Facets are shown independently in the interface
  • Attributes are just associated with individual items
  • E.g., ID number, Source, Affiliation
  • However, an attribute can always be converted
    into a facet
  • Facets vs. Labels
  • Labels are the names used within facets
  • These are organized into subhierarchies
  • Synonyms
  • There should be alternate names for the category
    labels
  • Currently (in Flamenco) this is done with
    subcategories
  • E.g., Deer has subcategories stag, fawn,
    doe
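Converting an attribute to a facet can be as simple as mapping its raw value onto a label path. The sketch assumes a year attribute and a decade > year hierarchy, mirroring the YEAR > 1920s > 1921 drill-down in the Flamenco walkthrough; the function name is hypothetical.

```python
def year_to_facet_path(year: int) -> tuple[str, str, str]:
    """Turn a plain year attribute into a hierarchical YEAR facet label.

    The decade level is an assumed grouping, not a fixed rule.
    """
    decade = year - year % 10
    return ("YEAR", f"{decade}s", str(year))

print(year_to_facet_path(1921))  # ('YEAR', '1920s', '1921')
```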

39
Usability Study Results
40
Flamenco Usability Studies
  • Usability studies done on 3 collections
  • Recipes (Epicurious) 13,000 items
  • Architecture Images 40,000 items
  • Fine Arts Images 35,000 items
  • Conclusions
  • Users like and are successful with the dynamic
    faceted hierarchical metadata, especially for
    browsing tasks
  • Very positive results, in contrast with studies
    on earlier iterations.

41
Most Recent Usability Study
  • Participants and Collection
  • 32 Art History Students
  • 35,000 images from SF Fine Arts Museum
  • Study Design
  • Within-subjects
  • Each participant sees both interfaces
  • Balanced in terms of order and tasks
  • Participants assess each interface after use
  • Afterwards they compare them directly
  • Data recorded in behavior logs, server logs, and
    paper surveys; one or two experienced testers at
    each trial.
  • Used 9-point Likert scales.
  • Sessions took about 1.5 hours; pay was $15/hour

42
Post-Interface Assessments
All significant at p < .05 except "simple" and
"overwhelming"
43
Post-Test Comparison
Which Interface Preferable For (Faceted vs. Baseline):
  • Find images of roses
  • Find all works from a given period
  • Find pictures by 2 artists in same media
Overall Assessment:
  • More useful for your tasks
  • Easiest to use
  • Most flexible
  • More likely to result in dead ends
  • Helped you learn more
  • Overall preference
44
How to Create Facet Hierarchies?
  • Our Approach: Castanet

45
Example: Recipes (3,500 docs)
46
Castanet Output (shown in Flamenco)
47
Castanet Output (shown in Flamenco)
48
Castanet Output (shown in Flamenco)
49
Castanet Output (shown in Flamenco)
50
Castanet Output (shown in Flamenco)
51
Our Approach: Leverage the structure of WordNet
52
Our Approach
  • Leverage the structure of WordNet

53
1. Select Terms
  • Select well-distributed terms from the collection
(Pipeline: Documents > Select terms > Get hypernym paths (WordNet) > Build tree > Compress tree)
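The slide does not define "well distributed." One plausible reading, assumed here, is to keep terms that appear in several documents but not in most of them. The thresholds `min_df` and `max_fraction` are illustrative and are not Castanet's published settings.

```python
from collections import Counter

def select_terms(docs, min_df=2, max_fraction=0.5):
    """Keep terms found in at least `min_df` documents but in no more than
    `max_fraction` of all documents (both thresholds are assumptions)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # count each term once per document
    limit = max_fraction * len(docs)
    return sorted(t for t, n in df.items() if min_df <= n <= limit)

docs = [
    "the red pepper soup",
    "blue cheese dip",
    "red curry paste",
    "green curry with the blue crab",
    "the best soup",
    "the green salad",
]
print(select_terms(docs))  # ['blue', 'curry', 'green', 'red', 'soup']
```

Here "the" is dropped for appearing in too many documents, and one-off words such as "crab" are dropped for appearing in too few.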
54
2. Get Hypernym Path
(Figure: WordNet hypernym paths for the terms red and blue.)
55
3. Build Tree
(Figure: the hypernym paths for red and blue merged into one tree.)
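Building the tree means merging the hypernym paths so that shared ancestors become shared nodes. In this sketch the paths are hand-written and abridged (real WordNet paths are longer), and the tree is a nested dictionary.

```python
def build_tree(paths):
    """Merge root-to-term hypernym paths into one tree (a nested dict).

    Shared prefixes become shared nodes, so the three color terms end up
    under a single 'chromatic color' node.
    """
    tree = {}
    for path in paths.values():
        node = tree
        for concept in path:
            node = node.setdefault(concept, {})
    return tree

# Hand-written, abridged paths in the spirit of WordNet; not real output.
paths = {
    "red": ["color", "chromatic color", "red, redness", "red"],
    "blue": ["color", "chromatic color", "blue, blueness", "blue"],
    "green": ["color", "chromatic color", "green, greenness", "green"],
}
tree = build_tree(paths)
print(list(tree["color"]["chromatic color"]))
```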
56
4. Compress Tree
(Figure: the tree color > chromatic color > red, redness / blue, blueness / green, greenness, whose leaves are the collection terms red, blue, green.)
57
4. Compress Tree (cont.)
(Figure: color > chromatic color > red, blue, green is compressed to color > red, blue, green.)
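The compression shown on these two slides can be reproduced with one bottom-up rule, assumed here rather than taken from the Castanet paper: a node with exactly one child is merged with it, keeping the child's name when the child is a collection term (a leaf) and the parent's name otherwise.

```python
def compress(name, children):
    """Collapse single-child chains bottom-up; returns (name, children)."""
    # Compress the subtrees first so chains collapse from the leaves up.
    children = dict(compress(c, g) for c, g in children.items())
    if len(children) == 1:
        child, grandchildren = next(iter(children.items()))
        if not grandchildren:
            # The only child is a collection term: it replaces the parent.
            return child, {}
        # Otherwise the parent keeps its name and adopts the grandchildren.
        return name, grandchildren
    return name, children

tree = {"color": {"chromatic color": {
    "red, redness": {"red": {}},
    "blue, blueness": {"blue": {}},
    "green, greenness": {"green": {}},
}}}
root, kids = next(iter(tree.items()))
print(compress(root, kids))  # ('color', {'red': {}, 'blue': {}, 'green': {}})
```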
58
5. Divide into Facets
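The slides do not spell out how the compressed tree is divided. A minimal sketch, assuming each subtree directly below the top becomes one candidate facet, lists the collection terms each facet would cover; the tree is hypothetical.

```python
def divide_into_facets(tree):
    """Treat each top-level subtree as a candidate facet (assumed rule).

    Returns {facet name: sorted leaf terms in that subtree}.
    """
    def leaves(node):
        out = []
        for name, kids in node.items():
            out.extend(leaves(kids) if kids else [name])
        return out
    return {facet: sorted(leaves(kids)) for facet, kids in tree.items()}

# Hypothetical compressed tree with two top-level branches.
tree = {
    "color": {"red": {}, "blue": {}, "green": {}},
    "food": {"fruit": {"berry": {}, "pineapple": {}}, "dairy": {"cheese": {}}},
}
print(divide_into_facets(tree))
```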
59
Disambiguation
  • Ambiguity in
  • Word senses
  • Paths up the hypernym tree

60
How to Select the Right Senses and Paths?
  • First build core tree
  • (1) Create paths for words with only one sense
  • (2) Use Domains
  • WordNet has 212 Domains
  • medicine, mathematics, biology, chemistry,
    linguistics, soccer, etc.
  • Automatically scan the collection to see which
    domains apply
  • The user selects which of the suggested domains
    to use, or may add their own
  • Paths for terms that match the selected domains
    are added to the core tree
  • Then add remaining terms to the core tree.

61
Using Domains
dip: glosses
  • Sense 1: A depression in an otherwise level surface
  • Sense 2: The angle that a magnetic needle makes
    with the horizon
  • Sense 3: Tasty mixture into which bite-size foods
    are dipped
dip: hypernyms
  • Sense 1: shape, form > solid > concave shape >
    depression
  • Sense 2: space > angle
  • Sense 3: food > ingredient, fixings > flavorer
Given domain food, choose sense 3.
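The selection order from the previous slide, applied to the dip example, can be sketched as three passes. Assumptions: each sense is given as an abridged hand-written hypernym path; a domain is treated as matching a sense when the domain name appears on its path (real WordNet Domains are separate synset labels); and the third pass, which picks the sense sharing the most nodes with the core tree, is a guess at "add remaining terms," not the published rule.

```python
def choose_senses(senses, selected_domains):
    """Pick one hypernym path per term in three passes.

    senses maps each term to a list of root-to-term paths, one per sense.
    """
    # Pass 1: terms with only one sense go straight into the core tree.
    chosen = {t: paths[0] for t, paths in senses.items() if len(paths) == 1}
    # Pass 2: keep a sense if it is the only one whose path hits a selected domain.
    for term, paths in senses.items():
        if term in chosen:
            continue
        hits = [p for p in paths if set(p) & selected_domains]
        if len(hits) == 1:
            chosen[term] = hits[0]
    # Pass 3 (assumed heuristic): attach remaining terms by overlap with the core.
    core = {node for path in chosen.values() for node in path}
    for term, paths in senses.items():
        if term not in chosen:
            chosen[term] = max(paths, key=lambda p: len(set(p) & core))
    return chosen

# Abridged, hand-written paths (not real WordNet output).
senses = {
    "blue": [["color", "chromatic color", "blue"]],
    "dip": [
        ["shape, form", "solid", "concave shape", "depression", "dip"],
        ["space", "angle", "dip"],
        ["food", "ingredient, fixings", "flavorer", "dip"],
    ],
    "red": [["color", "chromatic color", "red"], ["person", "radical", "red"]],
}
chosen = choose_senses(senses, {"food"})
print(chosen["dip"])  # ['food', 'ingredient, fixings', 'flavorer', 'dip']
print(chosen["red"])  # ['color', 'chromatic color', 'red']
```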
62
Castanet Evaluation
63
Castanet Evaluation
  • This is a tool for information architects, so
    information architects did the evaluation
  • We compared output on
  • Recipes
  • Biomedical journal titles
  • We compared to two state-of-the-art algorithms
  • LDA (Blei et al. 04)
  • Subsumption (Sanderson & Croft 99)
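For comparison, subsumption as it is usually summarized from Sanderson and Croft says term x subsumes term y when x appears in at least 80% of the documents containing y, but y does not appear in every document containing x. The sketch below estimates these probabilities from co-occurrence in a toy collection and omits the term selection and other details of the original method.

```python
from itertools import permutations

def subsumptions(docs, threshold=0.8):
    """Return sorted (parent, child) pairs where parent subsumes child.

    x subsumes y when P(x | y) >= threshold and P(y | x) < 1, with the
    probabilities estimated from document co-occurrence.
    """
    doc_sets = {}
    for i, doc in enumerate(docs):
        for term in set(doc.split()):
            doc_sets.setdefault(term, set()).add(i)
    pairs = []
    for x, y in permutations(doc_sets, 2):
        both = len(doc_sets[x] & doc_sets[y])
        p_x_given_y = both / len(doc_sets[y])
        p_y_given_x = both / len(doc_sets[x])
        if p_x_given_y >= threshold and p_y_given_x < 1:
            pairs.append((x, y))
    return sorted(pairs)

docs = ["dessert cake", "dessert cookie", "dessert cake frosting", "soup"]
print(subsumptions(docs))
```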

64
Subsumption Output (shown in Flamenco)
65
Subsumption Output (shown in Flamenco)
66
Subsumption Output (shown in Flamenco)
67
Subsumption Output (shown in Flamenco)
68
LDA Output (shown in Flamenco)
69
LDA Output (shown in Flamenco)
70
LDA Output (shown in Flamenco)
71
Evaluation Method
  • Information architects assessed the category
    systems
  • For each of the 2 systems' output
  • Examined and commented on top-level
  • Examined and commented on two sub-levels
  • Then commented on overall properties
  • Meaningful?
  • Systematic?
  • Likely to use in your work?

72
Evaluation Results
  • Results on the recipes collection for "Would you
    use this system in your work?"
  • "Yes, in some cases" or "yes, definitely":
  • Pine (Castanet) 29/34
  • Oak (LDA) 0/18
  • Birch (Subsumption) 6/16
  • Results on quality of categories
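Because the three "would you use this system" counts have different denominators (34, 18, and 16), converting them to percentages makes them easier to compare. This is plain arithmetic on the slide's numbers; no other data is assumed.

```python
# Counts copied from the slide: "yes, in some cases" or "yes, definitely".
answers = {
    "Pine (Castanet)": (29, 34),
    "Oak (LDA)": (0, 18),
    "Birch (Subsumption)": (6, 16),
}
rates = {name: round(100 * yes / total, 1) for name, (yes, total) in answers.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```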

73
Opportunities for Tagging
  • New opportunity: tagging, folksonomies
  • (Flickr, del.icio.us)
  • People are creating facets in a decentralized
    manner
  • They are assigning multiple facets to items
  • This is done on a massive scale
  • This leads naturally to meaningful associations

74
Conclusions
  • Flexible application of hierarchical faceted
    metadata is a proven approach for navigating
    large information collections.
  • Midway in complexity between simple hierarchies
    and deep knowledge representation.
  • Currently in use on e-commerce sites; spreading
    to other domains
  • Systems are needed to help create faceted
    metadata structures
  • Our WordNet-based algorithm, while not perfect,
    seems like it will be a useful tool for
    Information Architects.

75
Acknowledgements
  • Flamenco Team
  • Brycen Chun, Ame Elliott, Jennifer English, Kevin
    Li, Rashmi Sinha, Emilia Stoica, Kirsten
    Swearingen, Ka-Ping Yee
  • Castanet
  • Emilia Stoica
  • Funding
  • This work supported in part by NSF (IIS-9984741)

76
For more information: flamenco.berkeley.edu
  • Thank you!
  • Marti Hearst Emilia Stoica