1
Collective Word Sense Disambiguation
  • David Vickrey
  • Ben Taskar
  • Daphne Koller

2
Word Sense Disambiguation
Clues:
The electricity plant supplies 500 homes with power.
vs.
A plant requires water and sunlight to survive.
Tricky:
That plant produces bottled water.
3
WSD as Classification
  • Senses s1, s2, …, sk correspond to classes c1, c2, …, ck
  • Features: properties of the context of the word occurrence
  • Subject or verb of the sentence
  • Any word occurring within 4 words of the occurrence (see the feature sketch below)
  • Document: the set of features corresponding to an occurrence

The electricity plant supplies 500 homes with
power.
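To make the feature definitions above concrete, here is a minimal Python sketch of context-word extraction; the whitespace tokenization, the "ctx=" feature naming, and the helper name are illustrative assumptions, not the authors' code.

def context_features(tokens, i, window=4):
    """Bag of words within `window` positions of the occurrence at index i."""
    feats = set()
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        if j != i:
            feats.add("ctx=" + tokens[j].lower())
    return feats

sent = "The electricity plant supplies 500 homes with power .".split()
print(context_features(sent, sent.index("plant")))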
4
Simple Approaches
  • Only features are which words appear in the context
  • Naïve Bayes (a minimal sketch follows this slide)
  • Discriminative, e.g. SVM
  • Problems:
  • Feature set not rich enough
  • Data extremely sparse
  • "space" occurs 38 times in a corpus of 200,000 words
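A minimal Naïve Bayes sketch over the bag-of-context-words features, with add-one smoothing (essential given the sparsity noted above); the (features, sense) training format is an assumption.

from collections import Counter, defaultdict
import math

def train_nb(examples):
    """examples: iterable of (feature_set, sense) pairs."""
    sense_counts, vocab = Counter(), set()
    feat_counts = defaultdict(Counter)
    for feats, sense in examples:
        sense_counts[sense] += 1
        for f in feats:
            feat_counts[sense][f] += 1
            vocab.add(f)
    return sense_counts, feat_counts, vocab

def predict_nb(model, feats):
    sense_counts, feat_counts, vocab = model
    n = sum(sense_counts.values())
    best, best_lp = None, float("-inf")
    for sense, c in sense_counts.items():
        lp = math.log(c / n)
        total = sum(feat_counts[sense].values())
        for f in feats:
            # Add-one smoothing keeps unseen features from zeroing the score.
            lp += math.log((feat_counts[sense][f] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

examples = [({"ctx=electricity", "ctx=power"}, "factory"),
            ({"ctx=water", "ctx=sunlight"}, "flora")]
model = train_nb(examples)
print(predict_nb(model, {"ctx=power"}))  # -> "factory"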

5
Available Data
  • WordNet: an electronic thesaurus
  • Words grouped by meaning into synsets
  • Slightly over 100,000 synsets
  • For nouns and verbs, a hierarchy over the synsets

[Hierarchy diagram: Animal → Mammal, Bird; Mammal → {Dog, Hound, Canine}; {Dog, Hound, Canine} → Retriever, Terrier]
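The synsets and hierarchy described above can be browsed with NLTK's WordNet interface; the tool choice is an assumption of convenience, since the slides do not name one.

from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

# Synsets group words by meaning; each carries a gloss.
for syn in wn.synsets("dog", pos=wn.NOUN):
    print(syn.name(), "-", syn.definition())

# Walk the noun hierarchy upward from the primary sense of "dog".
syn = wn.synsets("dog", pos=wn.NOUN)[0]
while syn.hypernyms():
    syn = syn.hypernyms()[0]
    print(syn.name())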
6
Available Data
  • Around 400,000-word corpus labeled with synsets
    from WordNet
  • Sample sentences from WordNet
  • Very sparse for most words

7
What Hasn't Worked
  • Intuition: the context of "dog" is similar to the context of "retriever"
  • Use the hierarchy to determine possibly useful data
  • Using cross-validation, learn which data is actually useful
  • This hasn't worked out very well

8
Why?
  • Lots of parameters (not even counting parameters estimated using MLE)
  • > 100K for one model, 20K for another
  • Not much data (400K words)
  • "a", "the", "and", "of", "to" occur 65K times (together)
  • Hierarchy may not be very useful
  • Hand-built; not designed for this task
  • Features not very expressive
  • Luke is looking at this more closely using an SVM

9
Collective WSD
  • Ideas:
  • Determine senses of all words in a document
    simultaneously
  • Allows for richer features
  • Train on unlabeled data as well as labeled
  • Lots and lots of unlabeled text available

10
Model
  • Variables:
  • S1, S2, …, Sn: synsets
  • W1, W2, …, Wn: words, always observed

[Graphical model diagram: hidden synset nodes S1-S5, each with an edge to its observed word node W1-W5]
11
Model
  • Each synset is generated from the preceding context; the context size is a parameter (4 here)

P(S, W) = ∏_{i=1}^{n} P(Wi | Si) · P(Si | Si-3, Si-2, Si-1)

P(Si = s | si-3, si-2, si-1) = exp(θs(si-3) + θs(si-2) + θs(si-1) + θs) / Z(si-3, si-2, si-1)

P(W) = Σ_S P(S, W)
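A minimal numpy sketch of the log-linear transition above; the array layout theta[s, p] = θs(p) and theta0[s] = θs is an assumed parameterization, not something the slides specify.

import numpy as np

def transition_probs(theta, theta0, prev3):
    """Distribution P(S_i = . | s_{i-3}, s_{i-2}, s_{i-1}) under the log-linear form above."""
    scores = theta0 + sum(theta[:, p] for p in prev3)  # theta_s(s_{i-k}) summed over the context
    scores = scores - scores.max()                     # shift for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()                         # normalization plays the role of Z(...)

K = 5                                                  # hypothetical number of synsets
rng = np.random.default_rng(0)
theta, theta0 = rng.normal(size=(K, K)), rng.normal(size=K)
print(transition_probs(theta, theta0, (0, 2, 1)))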
12
Learning
  • Two sets of parameters:
  • P(Wi | Si): given current estimates of the marginals P(Si), use expected counts
  • θs'(s): for s ∈ Domain(Si-1), s' ∈ Domain(Si), gradient ascent on the log likelihood gives

∂ log P(W) / ∂θs'(s) = Σ_{si-3, si-2} [ P(w, si-3, si-2, s, s') − P(w, si-3, si-2, s) · P(s' | si-3, si-2, s) ]
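A sketch of the expected-count update for P(Wi | Si); it assumes inference has already produced posterior marginals gamma[i] = P(Si = · | W) as dicts, which is an assumed data layout.

from collections import defaultdict

def update_emission(words, gammas):
    """M-step-style update: soft counts of word w under synset s, normalized."""
    counts = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for w, gamma in zip(words, gammas):   # gamma: dict synset -> P(S_i = s | W)
        for s, p in gamma.items():
            counts[s][w] += p             # expected count, weighted by the marginal
            totals[s] += p
    return {s: {w: c / totals[s] for w, c in ws.items()}
            for s, ws in counts.items()}

# Toy usage with two occurrences of "plant":
print(update_emission(["plant", "plant"],
                      [{"factory": 0.9, "flora": 0.1},
                       {"factory": 0.2, "flora": 0.8}]))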
13
Efficiency
  • Only need to calculate marginals over contexts
  • Forward-backward (see the sketch below)
  • Issue: some words have many possible synsets (40-50), and we want very fast inference
  • Possibly prune values?
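For reference, a minimal forward-backward sketch for a first-order chain; the slides' three-synset context could in principle be folded into a composite state. This is an illustrative HMM routine, not the authors' implementation, and it omits the rescaling a real system needs on long documents.

import numpy as np

def forward_backward(trans, emit, obs, prior):
    """trans[s, s'] = P(s' | s); emit[s, w] = P(w | s); obs = word indices."""
    n, K = len(obs), len(prior)
    alpha, beta = np.zeros((n, K)), np.zeros((n, K))
    alpha[0] = prior * emit[:, obs[0]]
    for t in range(1, n):                       # forward pass
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):              # backward pass
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)  # posterior marginals P(S_t | W)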

14
WordNet and Synsets
  • The model uses WordNet to determine the domain of Si
  • Synset information should be more reliable
  • This allows us to learn without any labeled data
  • Consider the synsets {eagle, hawk}, {eagle (golf shot)}, and {hawk (to sell)}
  • Since parameters depend only on the synset, even without labeled data, we can find the correct clustering
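A sketch of reading the domain of Si off WordNet with NLTK (the tool choice is again an assumption):

from nltk.corpus import wordnet as wn

def synset_domain(word):
    """Candidate synsets for `word` across parts of speech."""
    return [s.name() for s in wn.synsets(word)]

# "hawk" yields both the bird synset (shared with "eagle") and the "to sell" verb sense.
print(synset_domain("hawk"))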

15
Richer Features
  • Heuristic: one sense per discourse. Usually, within a document, any given word takes only one of its possible senses (see the sketch below)
  • Can capture this using long-range links
  • Could assume each word is independent of all occurrences besides the ones immediately before and after
  • Or, could use approximate inference (Kikuchi)
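A minimal sketch of one-sense-per-discourse applied as a post-processing majority vote over a document's occurrences; the function and data layout are hypothetical (the slides propose long-range links inside the model instead):

from collections import Counter, defaultdict

def one_sense_per_discourse(occurrences):
    """occurrences: list of (word, predicted_sense); returns smoothed labels."""
    votes = defaultdict(Counter)
    for word, sense in occurrences:
        votes[word][sense] += 1
    majority = {w: c.most_common(1)[0][0] for w, c in votes.items()}
    return [(w, majority[w]) for w, _ in occurrences]

print(one_sense_per_discourse([("plant", "factory"), ("plant", "flora"),
                               ("plant", "factory")]))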

16
Richer Features
  • Can reduce feature sparsity using the hierarchy (e.g., replace all occurrences of "dog" and "cat" with "animal"; see the sketch below)
  • Need collective classification to do this
  • Could add global hidden variables to try to capture the document subject
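A sketch of the hypernym back-off idea, mapping a word to an ancestor at a fixed depth in the WordNet noun hierarchy; the depth cutoff and the use of the primary sense are assumptions:

from nltk.corpus import wordnet as wn

def hypernym_feature(word, depth=3):
    """Replace `word` with the name of its ancestor `depth` steps below the root."""
    syns = wn.synsets(word, pos=wn.NOUN)
    if not syns:
        return word
    path = syns[0].hypernym_paths()[0]              # a root-to-synset path
    return path[min(depth, len(path) - 1)].name()   # ancestor at the chosen depth

print(hypernym_feature("dog"), hypernym_feature("cat"))  # may map to a shared ancestor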

17
Advanced Parameters
  • Lots of parameters
  • Regularization likely helpful
  • Could tie parameters together based on similarity
    in the WordNet hierarchy
  • Ties in with what I was working on before
  • More data in this situation (unlabeled)
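For instance, L2 regularization can be folded into the θ gradient from slide 12 with one extra term; the strength lam is a hypothetical choice, and theta/grad are assumed to be numpy arrays:

import numpy as np

def regularized_grad(grad, theta, lam=0.1):
    # Ascend the log likelihood minus (lam / 2) * ||theta||^2.
    return grad - lam * theta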

18
Experiments
  • Soon