Learning a joint model of word sense and syntactic preference

1
Learning a joint model of word sense and
syntactic preference
  • Galen Andrew and Teg Grenager
  • NLP Lunch
  • December 3, 2003

2
Motivation
  • Divide and conquer: NLP has divided the problem
    of language processing into smaller problems
    (tagging, parsing, WSD, anaphora resolution,
    information extraction)
  • Traditionally, we build separate models for each
    problem
  • But linguistic phenomena are correlated!
  • In some cases, a joint model can better represent
    the phenomena
  • We may be better able to perform one task if we
    have a joint model and can use many kinds of
    evidence

3
Motivation
  • In particular, syntactic and semantic properties
    of language are (of course!) very correlated
  • For example, semantic information (word sense,
    coreference resolution, etc.) is useful when
    doing syntactic processing (tagging, parsing,
    movement, etc.), and vice versa
  • Evidence that humans use this information (e.g.,
    Clifton et al. 1984, Ferreira & McClure 1997,
    Garnsey et al. 1997)
  • Evidence that it is useful in NLP (e.g., Yarowsky
    2000, Lin 1997, Bikel 2000)

4
Verb Sense and Subcat
  • We've chosen to focus on modelling two specific
    phenomena: verb sense and verb subcategorization
    preference.
  • Roland and Jurafsky (1998) demonstrate that
    models which condition on verb sense are better
    able to predict verb subcategorization
  • Others (e.g., Yarowsky, Lin) have shown that
    models that condition on syntactic information
    are better able to predict word sense
  • We believe that a joint model of verb sense and
    subcategorization may be more accurate than
    separate models on either task

5
Example
  • The word admit has 8 senses in WordNet, with
    different distributions over subcategories

  Sense  Definition      Subcategorization                    Example
  1      Acknowledge     Somebody admits that something       The defendant admitted that he
                         (Sfin); Somebody admits something    had lied. He admitted his guilt.
                         (NP)
  2      Allow in        Somebody admits somebody (NP);       I will admit only those with
                         Somebody admits someone (in)to       passes. The student was admitted.
                         somewhere (NP PP)
  6      Give access to  Something admits to somewhere (PP)   The main entrance admits to the
                                                              foyer.
6
Lack of Training Data
  • To learn a joint model over verb sense and
    subcategorization preference, we'd ideally have a
    (large) dataset marked for both
  • No such dataset exists (although parts of the
    Brown corpus appear in both Semcor and the PTB,
    the overlap is small and the annotations are not
    aligned)
  • However, we have some datasets marked for sense
    (Senseval, Semcor), and others that can easily be
    marked for subcategory (PTB)
  • We can think of this as one big corpus with
    missing data

7
Lack of Training Data
(Figure: the combined corpus viewed as one data matrix with columns
sense, subcat, seq, and bow. Semcor data, marked for sense, is missing
the subcat column; Penn Treebank data, marked for subcat, is missing
the sense column.)
8
EM to the Rescue
  • How do people usually deal with model parameter
    estimation when there is missing data? The
    expectation-maximization algorithm.
  • Big idea: it's easy to
  • (E) fill in missing data if you have a good
    model, and
  • (M) compute maximum-likelihood model parameters
    if you have complete data
  • So you initialize somehow, and then loop over the
    above two steps until convergence
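
The loop described above can be sketched for exactly this setting: a joint table P(sense, subcat) estimated from instances in which only one of the two variables is observed. Everything here (sense/subcat inventories, data, update schedule) is an illustrative toy, not the actual system.

```python
from collections import defaultdict

# Hypothetical sketch: EM for a joint table P(sense, subcat) when each
# training instance observes only one of the two variables.
SENSES = ["acknowledge", "allow_in"]
SUBCATS = ["Sfin", "NP", "NP_PP"]

# Toy partially observed data: each dict fixes sense or subcat, not both.
data = [{"sense": "acknowledge"}, {"sense": "acknowledge"},
        {"sense": "allow_in"}, {"subcat": "Sfin"}, {"subcat": "NP_PP"}]

# Initialize the joint distribution uniformly.
theta = {(s, c): 1.0 / (len(SENSES) * len(SUBCATS))
         for s in SENSES for c in SUBCATS}

for _ in range(50):
    # E-step: distribute each instance's unit mass over the joint cells
    # consistent with its observed variable, in proportion to theta.
    counts = defaultdict(float)
    for inst in data:
        cells = [(s, c) for s in SENSES for c in SUBCATS
                 if inst.get("sense", s) == s and inst.get("subcat", c) == c]
        z = sum(theta[cell] for cell in cells)
        for cell in cells:
            counts[cell] += theta[cell] / z
    # M-step: maximum-likelihood re-estimate from the expected counts.
    total = sum(counts.values())
    theta = {cell: counts[cell] / total for cell in theta}

# theta now sums to one over all (sense, subcat) cells.
```

The instances marked only for subcat pull probability mass toward whichever senses the sense-marked instances already favor, which is the mechanism the talk relies on.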

9
EM to the Rescue
  • More formally, for data x, missing data z, and
    parameters θ:
  • E-step: for each instance i, set
    q_i(z) = P(z | x_i; θ)
  • M-step: set
    θ := argmax_θ Σ_i Σ_z q_i(z) log P(x_i, z; θ)

10
The Model
(Figure: graphical model over the variables Subcat, Sense, Seq, and BOW.)
11
The Model E-step
(Figure: the graphical model at the E-step, with each node marked as
observed, unobserved, or queried.)
12
The Model E-step
(Figure: the graphical model at the E-step, with each node marked as
observed, unobserved, or queried.)
13
The Model M-step
(Figure: the graphical model at the M-step, with one component marked
as deterministic.)
14
The Model M-step
(Figure: the graphical model at the M-step; the prior is estimated
from counts.)
15
The Model M-step
(Figure: the graphical model at the M-step; another component is
estimated from counts.)
16
The Model M-step
(Figure: the graphical model at the M-step; the bag-of-words component
is a multinomial naive Bayes model.)
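
The bag-of-words component's M-step can be illustrated with a short hypothetical sketch: P(w | sense) as an add-one-smoothed multinomial estimated from E-step-weighted (possibly fractional) counts. The vocabulary, senses, and weights below are made up.

```python
from collections import defaultdict

# Hypothetical M-step sketch for the bag-of-words component:
# P(w | sense) as an add-one-smoothed multinomial estimated from
# E-step-weighted (possibly fractional) counts. Data is illustrative.
VOCAB = ["guilt", "lied", "passes", "entrance"]
weighted = [  # (posterior weight of this sense, sense, bag of words)
    (1.0, "acknowledge", ["guilt", "lied"]),
    (0.8, "acknowledge", ["lied"]),
    (0.2, "allow_in", ["lied"]),
    (1.0, "allow_in", ["passes"]),
]

counts = defaultdict(float)   # fractional count of (word, sense)
totals = defaultdict(float)   # fractional token total per sense
for weight, sense, bow in weighted:
    for word in bow:
        counts[(word, sense)] += weight
        totals[sense] += weight

def p_word(word, sense):
    """Add-one-smoothed multinomial estimate of P(word | sense)."""
    return (counts[(word, sense)] + 1.0) / (totals[sense] + len(VOCAB))
```

Smoothing matters here because many vocabulary words will never co-occur with a given sense in the small amount of sense-marked data.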
17
The Model M-step
(Figure: the graphical model at the M-step; the sequence component is
encoded as a PCFG grammar and only computed once.)
18
Subcategory Grammars
  • In order to represent P(seq | subcat) we needed
    to learn separate grammars/lexicons for each
    subcategory of the target verb
  • When reading in PTB trees, we first make a
    separate copy of the tree for each verb.
  • Then for each tree, we mark the selected verb for
    subcategory (using Tgrep expressions) and
    propagate the marking to the top of the tree.
  • Then trees are annotated (tag-split, for
    accuracy) and binarized, and we read off the
    grammars and lexicons
  • Thus at parse time, each root symbol must parse
    some verb to its specified subcategory.
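
The mark-and-propagate step above can be sketched as follows. The Tree class, the ^subcat label-suffix convention, and the example tree are hypothetical simplifications of the actual Tgrep-based pipeline.

```python
# Hypothetical sketch of marking a parse tree for subcategorization:
# annotate the selected verb's parent chain up to the root, so that the
# grammar rules read off the tree become subcat-specific.

class Tree:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def mark_subcat(tree, verb, subcat):
    """Return True if the subtree contains the verb; if so, suffix every
    nonterminal label on the path from the verb to this node with the
    subcat tag."""
    if not tree.children:                       # leaf: match the word itself
        return tree.label == verb
    found = any(mark_subcat(ch, verb, subcat) for ch in tree.children)
    if found:
        tree.label += "^" + subcat              # propagate the mark upward
    return found

# (S (NP The defendant) (VP (V admitted) (SBAR that he had lied)))
t = Tree("S", [Tree("NP", [Tree("The"), Tree("defendant")]),
               Tree("VP", [Tree("V", [Tree("admitted")]),
                           Tree("SBAR", [Tree("that"), Tree("he"),
                                         Tree("had"), Tree("lied")])])])
mark_subcat(t, "admitted", "Sfin")
# Root is now "S^Sfin", so a parse from this root must realize the verb
# with the Sfin subcategorization.
```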

19
Model Testing
  • Once we've trained a model with EM, we can use
    it to predict sense and/or subcat in a completely
    unmarked instance
  • For example, we can infer sense given only the
    sequence (and bow)
  • Inferring subcat given only the sequence is
    similar
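
Skipping the parsing machinery, sense prediction from a trained model can be sketched as a Bayes-rule argmax over the factored score. The probability tables below are invented for illustration; the real model would score the full sequence through the PCFG rather than take subcat as given.

```python
import math

# Hypothetical sketch: choose the sense maximizing
#   P(sense) * P(subcat | sense) * prod_w P(w | sense).
# All probability tables are invented for illustration.
p_sense = {"acknowledge": 0.7, "allow_in": 0.3}
p_subcat = {("Sfin", "acknowledge"): 0.6, ("NP", "acknowledge"): 0.4,
            ("Sfin", "allow_in"): 0.1, ("NP", "allow_in"): 0.9}
p_word = {("guilt", "acknowledge"): 0.05, ("guilt", "allow_in"): 0.001,
          ("passes", "acknowledge"): 0.001, ("passes", "allow_in"): 0.04}

def infer_sense(subcat, bow):
    """Return the argmax sense under the factored model (in log space)."""
    def logscore(sense):
        return (math.log(p_sense[sense])
                + math.log(p_subcat[(subcat, sense)])
                + sum(math.log(p_word[(w, sense)]) for w in bow))
    return max(p_sense, key=logscore)

infer_sense("NP", ["guilt"])   # -> "acknowledge"
infer_sense("NP", ["passes"])  # -> "allow_in"
```

With these toy numbers, the contextual word overrides the prior and the subcat evidence, which is exactly the kind of evidence combination a joint model is meant to exploit.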

20
Results
  • None yet, but we should have them soon

21
Future Work
  • More features, and a more complex model
  • Learn separate distributions over words in the VP
    and outside of the VP, conditioned on sense
  • Learn a distribution over the words contained in
    particular arguments and adjuncts, conditioned
    on sense