Learning a joint model of word sense and syntactic preference

1
Learning a joint model of word sense and
syntactic preference
  • Galen Andrew and Teg Grenager
  • NLP Lunch
  • December 3, 2003

2
Motivation
  • Divide and conquer: NLP has divided the problem
    of language processing into smaller problems
    (tagging, parsing, WSD, anaphora resolution,
    information extraction)
  • Traditionally, we build separate models for each
    problem
  • But linguistic phenomena are correlated!
  • In some cases, a joint model can better represent
    the phenomena
  • We may be better able to perform one task if we
    have a joint model and can use many kinds of
    evidence

3
Motivation
  • In particular, syntactic and semantic properties
    of language are (of course!) very correlated
  • For example, semantic information (word sense,
    coreference resolution, etc.) is useful when
    doing syntactic processing (tagging, parsing,
    movement, etc.), and vice versa
  • Evidence that humans use this information (e.g.,
    Clifton et al. 1984, Ferreira & McClure 1997,
    Garnsey et al. 1997)
  • Evidence that it is useful in NLP (e.g., Yarowsky
    2000, Lin 1997, Bikel 2000)

4
Verb Sense and Subcat
  • We've chosen to focus on modelling two specific
    phenomena: verb sense and verb subcategorization
    preference.
  • Roland and Jurafsky (1998) demonstrate that
    models which condition on verb sense are better
    able to predict verb subcategorization
  • Others (e.g., Yarowsky, Lin) have shown that
    models that condition on syntactic information
    are better able to predict word sense
  • We believe that a joint model of verb sense and
    subcategorization may be more accurate than
    separate models on either task

5
Example
  • The word admit has 8 senses in WordNet, with
    different distributions over subcategories

  Sense  Definition      Subcategorization                    Example
  1      Acknowledge     Somebody admits that something       The defendant admitted that he
                         (Sfin); Somebody admits something    had lied. He admitted his guilt.
                         (NP)
  2      Allow in        Somebody admits somebody (NP);       I will admit only those with
                         Somebody admits someone (in)to       passes. The student was admitted.
                         somewhere (NP PP)
  6      Give access to  Something admits to somewhere (PP)   The main entrance admits to the
                                                              foyer.
6
Lack of Training Data
  • To learn a joint model over verb sense and
    subcategorization preference, we'd ideally have a
    (large) dataset marked for both
  • No such dataset exists (although parts of the
    Brown corpus appear in both Semcor and the PTB,
    the overlap is small and the annotations are not
    aligned)
  • However, we have some datasets marked for sense
    (Senseval, Semcor), and others that can easily be
    marked for subcategory (PTB)
  • We can think of this as one big corpus with
    missing data

7
Lack of Training Data
(Figure: the combined corpus viewed as one data matrix with columns
sense, subcat, seq, and bow. Semcor data, marked for sense, is missing
the subcat column; Penn Treebank data, marked for subcat, is missing
the sense column.)
8
EM to the Rescue
  • How do people usually deal with model parameter
    estimation when there is missing data? The
    expectation-maximization algorithm.
  • Big idea: it's easy to
  • (E) fill in missing data if you have a good
    model, and
  • (M) compute maximum-likelihood model parameters
    if you have complete data
  • So you initialize somehow, and then loop over the
    above two steps until convergence
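
The loop described above can be sketched for exactly this setting: a joint table P(sense, subcat) estimated from instances in which only one of the two variables is observed. Everything here (sense/subcat inventories, data, update schedule) is an illustrative toy, not the actual system.

```python
from collections import defaultdict

# Hypothetical sketch: EM for a joint table P(sense, subcat) when each
# training instance observes only one of the two variables.
SENSES = ["acknowledge", "allow_in"]
SUBCATS = ["Sfin", "NP", "NP_PP"]

# Toy partially observed data: each dict fixes sense or subcat, not both.
data = [{"sense": "acknowledge"}, {"sense": "acknowledge"},
        {"sense": "allow_in"}, {"subcat": "Sfin"}, {"subcat": "NP_PP"}]

# Initialize the joint distribution uniformly.
theta = {(s, c): 1.0 / (len(SENSES) * len(SUBCATS))
         for s in SENSES for c in SUBCATS}

for _ in range(50):
    # E-step: distribute each instance's unit mass over the joint cells
    # consistent with its observed variable, in proportion to theta.
    counts = defaultdict(float)
    for inst in data:
        cells = [(s, c) for s in SENSES for c in SUBCATS
                 if inst.get("sense", s) == s and inst.get("subcat", c) == c]
        z = sum(theta[cell] for cell in cells)
        for cell in cells:
            counts[cell] += theta[cell] / z
    # M-step: maximum-likelihood re-estimate from the expected counts.
    total = sum(counts.values())
    theta = {cell: counts[cell] / total for cell in theta}

# theta now sums to one over all (sense, subcat) cells.
```

The instances marked only for subcat pull probability mass toward whichever senses the sense-marked instances already favor, which is the mechanism the talk relies on.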

9
EM to the Rescue
  • More formally, for data x, missing data z, and
    parameters θ:
  • E-step: for each instance i, set
    q_i(z) = P(z | x_i; θ)
  • M-step: set
    θ := argmax_θ Σ_i Σ_z q_i(z) log P(x_i, z; θ)

10
The Model
(Figure: graphical model over the variables Subcat, Sense, Seq, and BOW.)
11
The Model E-step
(Figure: the graphical model at the E-step, with each node marked as
observed, unobserved, or queried.)
12
The Model E-step
(Figure: the graphical model at the E-step, with each node marked as
observed, unobserved, or queried.)
13
The Model M-step
(Figure: the graphical model at the M-step, with one component marked
as deterministic.)
14
The Model M-step
(Figure: the graphical model at the M-step; the prior is estimated
from counts.)
15
The Model M-step
(Figure: the graphical model at the M-step; another component is
estimated from counts.)
16
The Model M-step
(Figure: the graphical model at the M-step; the bag-of-words component
is a multinomial naive Bayes model.)
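
The bag-of-words component's M-step can be illustrated with a short hypothetical sketch: P(w | sense) as an add-one-smoothed multinomial estimated from E-step-weighted (possibly fractional) counts. The vocabulary, senses, and weights below are made up.

```python
from collections import defaultdict

# Hypothetical M-step sketch for the bag-of-words component:
# P(w | sense) as an add-one-smoothed multinomial estimated from
# E-step-weighted (possibly fractional) counts. Data is illustrative.
VOCAB = ["guilt", "lied", "passes", "entrance"]
weighted = [  # (posterior weight of this sense, sense, bag of words)
    (1.0, "acknowledge", ["guilt", "lied"]),
    (0.8, "acknowledge", ["lied"]),
    (0.2, "allow_in", ["lied"]),
    (1.0, "allow_in", ["passes"]),
]

counts = defaultdict(float)   # fractional count of (word, sense)
totals = defaultdict(float)   # fractional token total per sense
for weight, sense, bow in weighted:
    for word in bow:
        counts[(word, sense)] += weight
        totals[sense] += weight

def p_word(word, sense):
    """Add-one-smoothed multinomial estimate of P(word | sense)."""
    return (counts[(word, sense)] + 1.0) / (totals[sense] + len(VOCAB))
```

Smoothing matters here because many vocabulary words will never co-occur with a given sense in the small amount of sense-marked data.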
17
The Model M-step
(Figure: the graphical model at the M-step; the sequence component is
encoded as a PCFG grammar and only computed once.)
18
Subcategory Grammars
  • In order to represent P(seq | subcat) we needed
    to learn separate grammars/lexicons for each
    subcategory of the target verb
  • When reading in PTB trees, we first make a
    separate copy of the tree for each verb.
  • Then for each tree, we mark the selected verb for
    subcategory (using Tgrep expressions) and
    propagate the marking to the top of the tree.
  • Then trees are annotated (tag-split, for
    accuracy) and binarized, and we read off the
    grammars and lexicons
  • Thus at parse time, each root symbol must parse
    some verb to its specified subcategory.
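
The mark-and-propagate step above can be sketched as follows. The Tree class, the ^subcat label-suffix convention, and the example tree are hypothetical simplifications of the actual Tgrep-based pipeline.

```python
# Hypothetical sketch of marking a parse tree for subcategorization:
# annotate the selected verb's parent chain up to the root, so that the
# grammar rules read off the tree become subcat-specific.

class Tree:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def mark_subcat(tree, verb, subcat):
    """Return True if the subtree contains the verb; if so, suffix every
    nonterminal label on the path from the verb to this node with the
    subcat tag."""
    if not tree.children:                       # leaf: match the word itself
        return tree.label == verb
    found = any(mark_subcat(ch, verb, subcat) for ch in tree.children)
    if found:
        tree.label += "^" + subcat              # propagate the mark upward
    return found

# (S (NP The defendant) (VP (V admitted) (SBAR that he had lied)))
t = Tree("S", [Tree("NP", [Tree("The"), Tree("defendant")]),
               Tree("VP", [Tree("V", [Tree("admitted")]),
                           Tree("SBAR", [Tree("that"), Tree("he"),
                                         Tree("had"), Tree("lied")])])])
mark_subcat(t, "admitted", "Sfin")
# Root is now "S^Sfin", so a parse from this root must realize the verb
# with the Sfin subcategorization.
```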

19
Model Testing
  • Once we've trained a model with EM, we can use
    it to predict sense and/or subcat in a completely
    unmarked instance
  • For example, we can infer sense given only the
    sequence (and bow)
  • Inferring subcat given only the sequence is
    similar
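
Skipping the parsing machinery, sense prediction from a trained model can be sketched as a Bayes-rule argmax over the factored score. The probability tables below are invented for illustration; the real model would score the full sequence through the PCFG rather than take subcat as given.

```python
import math

# Hypothetical sketch: choose the sense maximizing
#   P(sense) * P(subcat | sense) * prod_w P(w | sense).
# All probability tables are invented for illustration.
p_sense = {"acknowledge": 0.7, "allow_in": 0.3}
p_subcat = {("Sfin", "acknowledge"): 0.6, ("NP", "acknowledge"): 0.4,
            ("Sfin", "allow_in"): 0.1, ("NP", "allow_in"): 0.9}
p_word = {("guilt", "acknowledge"): 0.05, ("guilt", "allow_in"): 0.001,
          ("passes", "acknowledge"): 0.001, ("passes", "allow_in"): 0.04}

def infer_sense(subcat, bow):
    """Return the argmax sense under the factored model (in log space)."""
    def logscore(sense):
        return (math.log(p_sense[sense])
                + math.log(p_subcat[(subcat, sense)])
                + sum(math.log(p_word[(w, sense)]) for w in bow))
    return max(p_sense, key=logscore)

infer_sense("NP", ["guilt"])   # -> "acknowledge"
infer_sense("NP", ["passes"])  # -> "allow_in"
```

With these toy numbers, the contextual word overrides the prior and the subcat evidence, which is exactly the kind of evidence combination a joint model is meant to exploit.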

20
Results
  • None yet, but we should have them soon

21
Future Work
  • More features, and a more complex model
  • Learn separate distributions over words in the VP
    and outside of the VP, conditioned on sense
  • Learn a distribution over the words contained in
    particular arguments and adjuncts, conditioned
    on sense