Title: Nearly-Automated Metadata Hierarchy Creation
1Nearly-Automated Metadata Hierarchy Creation
- Emilia Stoica and Marti Hearst, SIMS, University of California, Berkeley
2Motivation
- Want to assign items labels from multiple
hierarchies
3Motivation
- Description: 19th c. paint horse saddle and hackamore spurs bandana on rider old time cowboy hat underchin thong flying off.
4Use in Browsing Interfaces like Flamenco
5Use in Browsing Interfaces like Flamenco
6How to Obtain the Hierarchies?
- Goal
- Help an information architect get started
- Currently they do it all by hand!
- Assume they will do some editing
- Nearly automated
- Multiple hierarchies (facets)
- Automatically assign items to multiple hierarchies
7Related Work
- Automated text categorization
- LOTS of work on this
- Assumes that a set of categories is already created
- To be intuitive, a categorization should contain sets of IS-A relations (hierarchical)
- Rosenfeld and Morville (2002)
- Pratt, Hearst, and Fagan (1999)
- Current automated approaches contain only associative relations
8Examples of Associative Relations
- Hofmann 1999
- Collection: Machine learning abstracts
- Top-level categories
- learn, paper, base, model, new train
- Problem
- These are not intuitive categories for machine learning
- Sanderson and Croft 1999
- Collection: Medical texts
- Top-level categories
- disease, post polio, serious disease, dengue, infection control, immunology
- Problem
- These are at different levels of generality
9Examples of Associative Relations
- Schuetze 1993
- Collection: Arts descriptions
- Sample groupings
- carriage cart horse ride walk passing horseback wagon men chicken rider
- bald balding head facing hand faced arm hat haired glove long
- Problem
- Terms are associated with one another, but are not organized into hierarchies that can be navigated.
10Our Approach
- Leverage the structure of WordNet
111. Select Terms
- Select well-distributed terms from the collection
[Pipeline diagram: Documents -> Select terms -> Get hypernym paths (using WordNet) -> Build tree -> Compress tree]
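One way to read "well distributed" is by document frequency: keep a term only if it occurs in enough documents. The sketch below assumes that reading. The function name `select_terms`, the whitespace tokenizer, and the `min_docs` threshold are illustrative assumptions, not details given on the slides.

```python
from collections import Counter

def select_terms(documents, min_docs=2):
    """Keep terms that occur in at least `min_docs` documents (assumed criterion)."""
    doc_freq = Counter()
    for text in documents:
        # Count each term once per document, not once per occurrence.
        doc_freq.update(set(text.lower().split()))
    return {term for term, df in doc_freq.items() if df >= min_docs}

docs = ["red horse", "blue horse saddle", "red saddle"]
print(sorted(select_terms(docs)))  # ['horse', 'red', 'saddle']
```

A real collection would likely need stop-word removal and noun filtering before this step; those are left out to keep the sketch short.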
122. Get Hypernym Path
- Get hypernym path for each term
[Figure: hypernym paths for the example terms red and blue]
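A hypernym path can be pictured as the chain of parents from a term up to the root. The sketch below uses a small hand-written parent map in place of WordNet, borrowing the color labels that appear in the Compress Tree example. The dictionary `TOY_HYPERNYM` and the function `hypernym_path` are hypothetical names for illustration, and the sketch assumes one parent per node, whereas a real WordNet term can have several senses and several paths.

```python
# Toy stand-in for WordNet: each label maps to its direct hypernym (parent).
# These entries are illustrative assumptions, not an extract of WordNet.
TOY_HYPERNYM = {
    "red": "red, redness",
    "red, redness": "chromatic color",
    "blue": "blue, blueness",
    "blue, blueness": "chromatic color",
    "chromatic color": "color",
    "color": None,  # treated as the root in this toy example
}

def hypernym_path(term, hypernym=TOY_HYPERNYM):
    """Return the path from the root down to `term` by following parent links."""
    path = []
    node = term
    while node is not None:
        path.append(node)
        node = hypernym.get(node)
    path.reverse()  # collected bottom-up, reported top-down
    return path

print(hypernym_path("red"))
# ['color', 'chromatic color', 'red, redness', 'red']
```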
133. Build Tree
- Merge hypernym paths to build a tree
[Figure: merged tree for the example terms red and blue]
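Merging paths amounts to inserting each root-to-term path into a shared prefix tree, so that common ancestors appear only once. This is a minimal sketch using nested dictionaries. The representation and the name `build_tree` are assumptions made for illustration.

```python
def build_tree(paths):
    """Merge root-to-term paths into one nested-dict tree; shared prefixes merge."""
    tree = {}
    for path in paths:
        node = tree
        for label in path:
            # Reuse the existing child if present, otherwise create it.
            node = node.setdefault(label, {})
    return tree

paths = [
    ["color", "chromatic color", "red, redness", "red"],
    ["color", "chromatic color", "blue, blueness", "blue"],
]
tree = build_tree(paths)
print(tree)
```

Both paths share "color" and "chromatic color", so the resulting tree has a single branch that splits only below "chromatic color".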
144. Compress Tree
- Eliminate a parent with fewer than n children, unless it is the root or its distribution is larger than 0.1 × maxdist
[Figure: example tree. color > chromatic color > (red, redness > red; blue, blueness > blue; green, greenness > green)]
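The parent-elimination rule can be sketched on a nested-dict tree as follows. Several points are assumptions rather than details from the slides: "distribution" is taken to be an item count per label, leaves are treated as terms rather than parents (so they are never dropped), the rule is applied bottom-up, and labels are assumed to stay unique among siblings after promotion. The names `compress`, `dist`, `n`, and `frac` are illustrative.

```python
def compress(tree, dist, n=2, frac=0.1):
    """Apply the parent-elimination rule to a nested-dict tree {label: subtree}.

    dist maps a label to its distribution (assumed here to be an item count).
    """
    maxdist = max(dist.values(), default=0)

    def visit(label, subtree, is_root):
        # Compress descendants first so the rule is applied bottom-up.
        children = {}
        for child_label, child_subtree in subtree.items():
            children.update(visit(child_label, child_subtree, False))
        is_parent = len(children) > 0  # leaves are terms, not parents
        well_distributed = dist.get(label, 0) > frac * maxdist
        if is_parent and len(children) < n and not is_root and not well_distributed:
            return children  # drop this node and promote its children
        return {label: children}

    compressed = {}
    for root_label, root_subtree in tree.items():
        compressed.update(visit(root_label, root_subtree, True))
    return compressed

tree = {"color": {"chromatic color": {
    "red, redness": {"red": {}},
    "blue, blueness": {"blue": {}},
    "green, greenness": {"green": {}},
}}}
dist = {"red": 5, "blue": 4, "green": 3}  # made-up counts for illustration
print(compress(tree, dist))
```

With n = 2, each single-child node such as "red, redness" is removed and its child is promoted, while the root "color" is kept even though it has only one child.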
154. Compress Tree (cont.)
- Eliminate a child whose name appears within its parent's name
[Figure: color > chromatic color > (red, blue, green) shown next to color > (red, blue, green)]
165. Remove top Levels
- Top levels of WordNet are too general, e.g.
- Entity
- Substance, matter
- Abstraction
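A simple way to picture this step is to strip overly general labels from the front of each hypernym path. The sketch below works on paths rather than on the tree itself, which is a simplification. The set `TOO_GENERAL` contains only the three examples listed above, and the function name `remove_top_levels` is hypothetical.

```python
# Only the three examples named on the slide; a fuller list is not given there.
TOO_GENERAL = {"entity", "substance, matter", "abstraction"}

def remove_top_levels(path, too_general=TOO_GENERAL):
    """Drop leading labels of a root-to-term path that are considered too general."""
    i = 0
    while i < len(path) and path[i].lower() in too_general:
        i += 1
    return path[i:]

path = ["entity", "substance, matter", "food, nutrient", "nutriment", "dish"]
print(remove_top_levels(path))  # ['food, nutrient', 'nutriment', 'dish']
```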
17Disambiguation
- Ambiguity in
- Word senses
- Paths up the hypernym tree
18How to Select the Right Senses and Paths?
- (This part is not in the paper.)
- Solution: Modify the algorithm
- First build core tree
- (1) Create paths for words with only one sense
- (2) Use Domains
- WordNet has 212 Domains
- medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
- Automatically scan the collection to see which domains apply
- The user selects which of the suggested domains to use, or may add their own
- Paths for terms that match the selected domains are added to the core tree
- Then add remaining terms to the core tree.
19Using Domains
dip glosses
- Sense 1: A depression in an otherwise level surface
- Sense 2: The angle that a magnet needle makes with horizon
- Sense 3: Tasty mixture into which bite-size foods are dipped
dip hypernyms
- Sense 1: solid > concave shape > depression
- Sense 2: shape, form > space > angle
- Sense 3: food > ingredient, fixings > flavorer
Given domain food, choose sense 3
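The dip example can be sketched as a filter over sense paths. The slides do not say exactly how a sense is matched to a domain, so this sketch assumes a sense matches when a selected domain name occurs as one of the names on its hypernym path. The paths are my reading of the dip example, and `DIP_SENSE_PATHS` and `senses_in_domains` are hypothetical names.

```python
# Hypernym paths for the three senses of "dip", ordered general -> specific.
DIP_SENSE_PATHS = {
    1: ["solid", "concave shape", "depression"],
    2: ["shape, form", "space", "angle"],
    3: ["food", "ingredient, fixings", "flavorer"],
}

def senses_in_domains(sense_paths, domains):
    """Return the sense numbers whose hypernym path names one of the domains."""
    matches = []
    for sense, path in sense_paths.items():
        # A label such as "shape, form" carries several names; split them out.
        names = {name.strip() for label in path for name in label.split(",")}
        if names & set(domains):
            matches.append(sense)
    return matches

print(senses_in_domains(DIP_SENSE_PATHS, {"food"}))  # [3]
```

If no sense matches any selected domain, the function returns an empty list, and the term would be left for the later enrichment step.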
20Enrich Core Tree
- For each new term t
- Q(t) ← ∅ // set of candidate paths
- for each path p of t
- compute the fraction fp(t) of nodes in p that are shared with a path in the core tree
- if (fp(t) > thresh)
- Q(t) ← Q(t) ∪ {p}
- if (Q(t) = ∅)
- choose first sense of t
- else
- among all paths p in Q(t), choose the path in the core tree with the most items assigned
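The selection procedure above can be sketched in Python. Several details are assumptions, since the slides leave them open: "shared nodes" is measured as the longest common prefix between a candidate path and any core path, the item count used for ranking is the one at the deepest shared node, and `thresh` defaults to 0.5. The names `choose_path`, `shared_prefix_len`, and `items` are illustrative, and the example data is taken from the chip illustration on the Enrich Core Tree slides.

```python
def shared_prefix_len(path, core_path):
    """Number of leading nodes that two root-to-leaf paths have in common."""
    k = 0
    for a, b in zip(path, core_path):
        if a != b:
            break
        k += 1
    return k

def choose_path(candidate_paths, core_paths, items, thresh=0.5):
    """Pick one hypernym path for a new term.

    candidate_paths: the term's paths, ordered by sense (first sense first).
    core_paths: root-to-leaf paths already in the core tree.
    items: label -> number of items assigned to that core-tree node.
    """
    q = []  # Q(t): candidates that overlap the core tree enough
    for p in candidate_paths:
        shared = max((shared_prefix_len(p, c) for c in core_paths), default=0)
        if shared / len(p) > thresh:
            q.append((p, shared))
    if not q:
        return candidate_paths[0]  # fall back to the first sense
    # Prefer the candidate whose deepest shared core node holds the most items.
    best, _ = max(q, key=lambda ps: items.get(ps[0][ps[1] - 1], 0))
    return best

core_paths = [
    ["entity", "substance, matter", "food, nutrient", "nutriment", "dish",
     "fondue, fondu"],
    ["entity", "object", "artifact", "instrumentality", "device", "conductor",
     "semiconductor", "diode", "light-emitting diode (led)"],
]
items = {"dish": 1699, "fondue, fondu": 40, "semiconductor": 45}
chip_p1 = ["entity", "substance, matter", "food, nutrient", "nutriment", "dish",
           "snack food", "chip"]
chip_p2 = ["entity", "object", "artifact", "instrumentality", "device",
           "conductor", "semiconductor", "chip"]

print(choose_path([chip_p1, chip_p2], core_paths, items)[-2:])  # ['snack food', 'chip']
```

Both chip paths pass the threshold here (5 of 7 and 7 of 8 nodes shared), so the choice falls to the item counts: 1699 at "dish" against 45 at "semiconductor".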
21Enrich Core Tree
Core tree
- entity > substance, matter > food, nutrient > nutriment > dish > fondue, fondu
- entity > object > artifact > instrumentality > device > conductor > semiconductor > diode > light-emitting diode (led)
Toaster with led indicators
22Enrich Core Tree
Core tree
- entity > substance, matter > food, nutrient > nutriment > dish > fondue, fondu
- entity > object > artifact > instrumentality > device > conductor > semiconductor > diode > light-emitting diode (led)
Chip (p1)
- entity > substance, matter > food, nutrient > nutriment > dish > snack food > chip
Chip (p2)
- entity > object > artifact > instrumentality > device > conductor > semiconductor > chip
23Enrich Core Tree
24Enrich Core Tree (contd)
Core tree
- entity > substance, matter > food, nutrient > nutriment > dish (1699) > fondue, fondu (40)
- entity > object > artifact > instrumentality > device > conductor > semiconductor (45) > diode > light-emitting diode (led)
chip
- snack food > chip
25Results on a Recipes/Kitchen Appliances Data Set
26Results on a Recipes/Kitchen Appliances Data Set
27Discussion
- This is very simple, but works very well
- Why hasn't this been done before?
- Because WordNet did not have enough coverage?
28Conclusions
- Can nearly-automatically build a set of hierarchies by finding IS-A relations between terms using WordNet
- The method has been tested on various domains
- medicine, mathematics, recipes, news, arts
- User study in progress
- Limitations
- The ontology has to be appropriate for the target domain
- No disambiguation between nouns, verbs, and adjectives