Title: Word Sense Disambiguation
1 Unsupervised Word Sense Disambiguation
REU, Summer, 2009
2 Word Sense Disambiguation
- E.g., The soldiers drove the tank.
- Sense 1: large vessel for holding gases or liquids
- Sense 2: armored combat vehicle
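3 Context Knowledge Base
- Many companies hire computer programmers.
- Computer programmers write software.
[Figure: dependency trees of the two sentences. Tree 1: hire -> company, programmer; company -> many; programmer -> computer. Tree 2: write -> programmer, software; programmer -> computer.]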
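4 Context Knowledge Base
Result of merging dependency trees:
[Figure: merged dependency graph with weighted edges. hire -> company (1), hire -> programmer (1), write -> programmer (1), write -> software (1), company -> many (1), programmer -> computer (2).]
Weights are the number of dependency relation
instances found.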
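The merge can be sketched as a counter over (head, dependent) pairs. This is a minimal Python illustration, not the authors' implementation: the edge lists are written by hand from the two example sentences rather than produced by a parser, and the function name `merge_dependency_trees` is hypothetical.

```python
from collections import Counter

def merge_dependency_trees(trees):
    """Count how often each (head, dependent) relation occurs across all trees."""
    knowledge_base = Counter()
    for edges in trees:
        knowledge_base.update(edges)
    return knowledge_base

# Hand-written edges for the two example sentences (assumed parser output).
tree_1 = [("hire", "company"), ("hire", "programmer"),
          ("company", "many"), ("programmer", "computer")]
tree_2 = [("write", "programmer"), ("write", "software"),
          ("programmer", "computer")]

kb = merge_dependency_trees([tree_1, tree_2])
print(kb[("programmer", "computer")])  # 2, the relation occurs in both sentences
print(kb[("hire", "company")])         # 1
```

Keying the counter on the (head, dependent) pair keeps the direction of each relation, so looking up the dependents of a given head word later is a simple filter over the keys.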
5 WSD Algorithm
Parse the original sentence using Minipar to get a
weighted dependency tree.
A large software company hires computer
programmers.
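[Figure: weighted dependency tree. hire -> company, programmer; company -> large, software; programmer -> computer. "company" is marked as the to-be-disambiguated word. Weights shown: 1, 0.5, 0.33, 1, 1.]
Weights are based on distance from the to-be-disambiguated
word; the values shown are consistent with 1/distance.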
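The weighting step can be sketched as follows. The slide only says that weights depend on distance from the to-be-disambiguated word; the 1/distance rule used here is an assumption inferred from the values 1, 0.5, and 0.33 in the figure. The edge list is a hand-written stand-in for parser output, and `distance_weights` is a hypothetical name.

```python
from collections import deque

def distance_weights(edges, target):
    """Return {word: 1/distance} for every word reachable from `target`.

    Distance is the number of dependency edges between a word and the
    target, ignoring edge direction (assumption: weight = 1/distance).
    """
    neighbors = {}
    for head, dependent in edges:
        neighbors.setdefault(head, set()).add(dependent)
        neighbors.setdefault(dependent, set()).add(head)

    # Breadth-first search outward from the target word.
    distance = {target: 0}
    queue = deque([target])
    while queue:
        word = queue.popleft()
        for other in neighbors.get(word, ()):
            if other not in distance:
                distance[other] = distance[word] + 1
                queue.append(other)

    return {word: 1.0 / d for word, d in distance.items() if d > 0}

# "A large software company hires computer programmers." (assumed parse)
edges = [("hire", "company"), ("hire", "programmer"),
         ("company", "large"), ("company", "software"),
         ("programmer", "computer")]

weights = distance_weights(edges, "company")
print(weights["hire"], weights["programmer"], round(weights["computer"], 2))
# 1.0 0.5 0.33
```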
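6 WSD Algorithm
Parse each gloss of the to-be-disambiguated word to get
weighted dependency trees.
- Gloss 1: an institution created to conduct business
- Gloss 2: a small military unit
[Figure: dependency trees of the two glosses. The gloss 1 tree contains create, institution, conduct, business; the gloss 2 tree contains unit, small, military.]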
7 WSD Algorithm
For each word in a gloss tree, find that word's
dependent words in the context knowledge base.
We are looking for dependent words in the knowledge
base that match words in the original sentence; in
other words, context clues that disambiguate the
word. A score is generated from the weights of the
matching dependency relations in the knowledge base
and the weights of the dependent words of the
to-be-disambiguated word in the original sentence.
The more matches we find, the higher the score.
The gloss with the highest score is selected as the
correct sense of the word.
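A minimal sketch of this scoring step follows. The slides do not give the exact formula, so multiplying the knowledge-base weight by the sentence weight and summing over matches is an assumption. The knowledge-base counts below are made up purely for illustration, and `score_gloss` and `pick_sense` are hypothetical names.

```python
def score_gloss(gloss_words, context_weights, knowledge_base):
    """Sum (knowledge-base weight * sentence weight) over all matches.

    A match is a knowledge-base relation whose head is a gloss word and
    whose dependent also appears in the original sentence.
    """
    score = 0.0
    for gloss_word in gloss_words:
        for (head, dependent), count in knowledge_base.items():
            if head == gloss_word and dependent in context_weights:
                score += count * context_weights[dependent]
    return score

def pick_sense(glosses, context_weights, knowledge_base):
    """Return the name of the gloss with the highest score."""
    return max(glosses,
               key=lambda name: score_gloss(glosses[name], context_weights,
                                            knowledge_base))

# Sentence weights for "company" in the example sentence.
context_weights = {"hire": 1.0, "large": 1.0, "software": 1.0,
                   "programmer": 0.5, "computer": 0.33}

# Illustrative knowledge-base counts (invented for this sketch).
knowledge_base = {("institution", "large"): 3, ("business", "software"): 2,
                  ("unit", "military"): 5, ("unit", "large"): 1}

glosses = {"gloss 1": ["institution", "create", "conduct", "business"],
           "gloss 2": ["unit", "small", "military"]}

print(pick_sense(glosses, context_weights, knowledge_base))  # gloss 1
```

With these toy counts, gloss 1 scores 5.0 and gloss 2 scores 1.0, because "military" never appears in the original sentence and so contributes nothing.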
8 Synonym Matching
If no direct matches are found between a gloss
word and the dependency relations in the context
knowledge base, we can replace the gloss word
with one of its synonyms, since synonyms are
semantically equivalent words.
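The fallback can be sketched as below. This is an illustration under assumptions: the synonym table is a hand-written toy dictionary standing in for a real synonym source, the knowledge-base count is invented, and `dependents_with_synonym_fallback` is a hypothetical name.

```python
def dependents_with_synonym_fallback(word, knowledge_base, synonyms):
    """Return (word_used, dependents) for `word`, trying synonyms if needed.

    The word itself is tried first; a synonym is used only when the word
    has no dependency relations in the knowledge base.
    """
    for candidate in [word] + synonyms.get(word, []):
        dependents = {dep: count
                      for (head, dep), count in knowledge_base.items()
                      if head == candidate}
        if dependents:
            return candidate, dependents
    return word, {}

# Toy data (assumptions for illustration only).
knowledge_base = {("company", "large"): 2}
synonyms = {"firm": ["company"]}

print(dependents_with_synonym_fallback("firm", knowledge_base, synonyms))
# ('company', {'large': 2})
```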
9 Hypernym/Hyponym Matching
- Extract hypernyms and hyponyms of words from the
WordNet database.
- Store these in a data structure.
- Strategies:
  - use all levels
  - use only levels close to the original word
- Apply the above strategies to synonym matching
as well.
E.g., animal > mammal > dog > poodle (most general to most specific)
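The two level strategies can be sketched with a toy taxonomy built from the example chain. This is an assumption-laden illustration: the dictionary is hand-written rather than extracted from WordNet, each word is given a single hypernym for simplicity, and `hypernyms_up_to` is a hypothetical name. Hyponyms could be handled the same way with a word-to-children mapping.

```python
def hypernyms_up_to(word, hypernym_of, max_levels=None):
    """Walk up the hypernym chain from `word`.

    max_levels=None uses all levels; a small integer keeps only the
    levels closest to the original word.
    """
    chain = []
    current = word
    while current in hypernym_of and (max_levels is None
                                      or len(chain) < max_levels):
        current = hypernym_of[current]
        chain.append(current)
    return chain

# Toy taxonomy from the example (not real WordNet data).
hypernym_of = {"poodle": "dog", "dog": "mammal", "mammal": "animal"}

print(hypernyms_up_to("poodle", hypernym_of))                # ['dog', 'mammal', 'animal']
print(hypernyms_up_to("poodle", hypernym_of, max_levels=1))  # ['dog']
```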
10 Word Similarity
- Use the WordNet::Similarity Perl module to calculate
a similarity score between a gloss word and the
dependent words in the knowledge base.
- The most similar word found is considered
the closest to an actual match.
WordNet::Similarity similarity scores:
Word 1   Word 2   Score
dog      animal   0.780
dog      desk     0.162
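Selecting the closest match can be sketched as a simple maximum over similarity scores. This Python sketch does not call the Perl module; a lookup table holding the two scores shown above stands in for it, and `closest_match` and `lookup_similarity` are hypothetical names.

```python
def closest_match(gloss_word, candidates, similarity):
    """Return the candidate with the highest similarity to `gloss_word`."""
    return max(candidates, key=lambda word: similarity(gloss_word, word))

# Scores copied from the example above; unknown pairs default to 0.0
# (the default is an assumption of this sketch).
SCORES = {("dog", "animal"): 0.780, ("dog", "desk"): 0.162}

def lookup_similarity(word_a, word_b):
    """Stand-in for a real similarity measure; symmetric table lookup."""
    return SCORES.get((word_a, word_b), SCORES.get((word_b, word_a), 0.0))

print(closest_match("dog", ["animal", "desk"], lookup_similarity))  # animal
```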