Title: Improved Semantic Role Parsing
1Improved Semantic Role Parsing
- Kadri Hacioglu, Sameer Pradhan, Valerie Krugler,
- Steven Bethard, Ashley Thornton,
- Wayne Ward, Dan Jurafsky, James Martin
- Center for Spoken Language Research
- University of Colorado
- Boulder, CO
2What is Thematic Role Tagging?
- Assigning semantic labels to sentence elements.
- Elements are arguments of some predicate or participants in some event.
- Who did What to Whom, How, When, Where, Why
- [DATE In 1901] [PATIENT President William McKinley] was shot [AGENT by anarchist Leon Czolgosz] [LOCATION at the Pan-American Exposition]
3Baseline Parsing Algorithm
- From Gildea and Jurafsky (2002)
- Generate syntactic parse of sentence (Charniak)
- Specify predicate (verb)
- For each constituent node in parse tree
- Extract features relative to predicate
- Path, Voice, Headword, Position, Phrase Type, Predicate
- Estimate P(Role | features) for each role and normalize
- Assign role with highest probability
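The estimate-normalize-assign steps above can be sketched as follows. This is a minimal illustration that assumes plain relative-frequency counts over whole feature tuples; it is not the backoff scheme of Gildea and Jurafsky (2002). The toy data, the role subset, and the function names are invented for the example.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count how often each role occurs with each feature tuple."""
    counts = defaultdict(Counter)
    for features, role in examples:
        counts[features][role] += 1
    return counts

def role_distribution(counts, features, roles):
    """Return P(role | features), normalized over the given roles."""
    seen = counts.get(features, Counter())
    total = sum(seen[r] for r in roles)
    if total == 0:
        # Unseen feature tuple: fall back to uniform (an assumption of this sketch).
        return {r: 1.0 / len(roles) for r in roles}
    return {r: seen[r] / total for r in roles}

def assign_role(counts, features, roles):
    """Assign the role with the highest estimated probability."""
    dist = role_distribution(counts, features, roles)
    return max(dist, key=dist.get)

# Hypothetical toy data: (path, voice, position) -> role.
ROLES = ["Agent", "Patient", "Null"]
SUBJECT = ("NP<-S->VP->VBD", "active", "before")
OBJECT = ("NP<-VP->VBD", "active", "after")
TOY = [(SUBJECT, "Agent"), (SUBJECT, "Agent"), (SUBJECT, "Agent"),
       (SUBJECT, "Null"), (OBJECT, "Patient")]

counts = train(TOY)
print(role_distribution(counts, SUBJECT, ROLES))
print(assign_role(counts, OBJECT, ROLES))
```

In practice the full feature tuple is often unseen in training, which is why a real system needs some form of backoff rather than the uniform fallback used here.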
422 Thematic Roles
- Agent
- Actor
- Cause
- Degree
- Experiencer
- Goal
- Instrument
- Location
- Manner
- Means
- Path
- Proposition
- Result
- State
- Stimulus
- Source
- Type
- Temporal
- Topic
- Theme
- Other
- Null
5Baseline Thematic Parse Accuracy
- Train on PropBank Training Set (72,000 annotated roles)
- Test on PropBank section-23 (3,800 annotated roles)
6Improvements to Baseline
- Improve Null vs !Null classification
- Prune constituents with P(Null) > threshold (.99)
- P(Role | Fea) = P(!Null | Path, Head) × P(Role | !Null, Fea) > thresh
- For unseen targets, back off to target cluster
- Disallow overlapping role labels
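A minimal sketch of how the pruning, two-stage scoring, and no-overlap steps could fit together. The input probabilities are assumed to come from already-trained estimators. The score threshold of 0.5, the greedy highest-score-first strategy for resolving overlaps, and all names are assumptions of this sketch, not details taken from the slides.

```python
def label_constituents(candidates, null_threshold=0.99, score_threshold=0.5):
    """Pick non-overlapping role labels for candidate constituents.

    Each candidate is a dict with:
      'span'       : (start, end) token indices, inclusive
      'p_null'     : estimated P(Null | Path, Head)
      'role_probs' : role -> estimated P(Role | !Null, features)
    """
    scored = []
    for c in candidates:
        if c["p_null"] > null_threshold:
            continue  # pruned: almost certainly not an argument
        p_not_null = 1.0 - c["p_null"]
        role, p_role = max(c["role_probs"].items(), key=lambda kv: kv[1])
        score = p_not_null * p_role  # P(!Null | ...) * P(Role | !Null, ...)
        if score > score_threshold:
            scored.append((score, c["span"], role))

    # Greedy overlap resolution (assumed): keep the best-scoring span first,
    # then skip any later span that shares a token with a kept one.
    chosen = []
    for score, span, role in sorted(scored, reverse=True):
        if all(span[1] < kept[0] or span[0] > kept[1] for _, kept, _ in chosen):
            chosen.append((score, span, role))
    return sorted((span, role) for _, span, role in chosen)

# Hypothetical candidates for one sentence.
CANDIDATES = [
    {"span": (0, 1), "p_null": 0.10, "role_probs": {"Agent": 0.9, "Theme": 0.1}},
    {"span": (1, 1), "p_null": 0.20, "role_probs": {"Agent": 0.8, "Theme": 0.2}},
    {"span": (3, 5), "p_null": 0.995, "role_probs": {"Theme": 0.6, "Goal": 0.4}},
    {"span": (6, 8), "p_null": 0.05, "role_probs": {"Location": 0.8, "Temporal": 0.2}},
]
print(label_constituents(CANDIDATES))
```

Here span (3, 5) is pruned by the Null threshold and span (1, 1) is dropped because it overlaps the higher-scoring span (0, 1).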
7SVM Classifier
- Same basic algorithm as Baseline
- Change classification step to use SVM
- Used TinySVM software (Kudo and Matsumoto, 2000)
- Same features as baseline
- For each role train one-vs-all classifier
- includes Null role
- Run classifiers on each constituent
- Convert SVM output to probabilities by fitting a sigmoid
- Generate N-Best classifications for constituents
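The sigmoid-fitting and N-best steps can be illustrated as below. This is a sketch assuming a Platt-style sigmoid P(y=1 | f) = 1 / (1 + exp(A·f + B)) fitted by plain gradient descent on log loss; the slides do not say which fitting procedure was used. Renormalizing across roles, the toy scores, and the function names are likewise assumptions.

```python
import math

def fit_sigmoid(scores, labels, lr=0.1, epochs=2000):
    """Fit A, B in P(y=1 | f) = 1 / (1 + exp(A*f + B)) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * f + b))
            # For this parameterization, d(log loss)/d(A*f + B) = y - p.
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def to_probability(f, a, b):
    return 1.0 / (1.0 + math.exp(a * f + b))

def n_best(svm_outputs, sigmoids, n=2):
    """Rank roles for one constituent by calibrated, renormalized probability."""
    probs = {role: to_probability(f, *sigmoids[role])
             for role, f in svm_outputs.items()}
    total = sum(probs.values())
    ranked = sorted(((p / total, role) for role, p in probs.items()), reverse=True)
    return [(role, p) for p, role in ranked[:n]]

# Hypothetical held-out SVM outputs with binary labels (1 = role applies).
A, B = fit_sigmoid([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0], [0, 0, 0, 1, 1, 1])
SIGMOIDS = {role: (A, B) for role in ("Agent", "Theme", "Null")}
print(n_best({"Agent": 1.5, "Theme": -0.2, "Null": -1.0}, SIGMOIDS))
```

In a one-vs-all setup each role's classifier would get its own fitted (A, B) pair; a single shared pair is used here only to keep the example short.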
8SVM N-Best Performance
- Same training and test sets as baseline (74 / 69, 79 / 74)
9Robustness
Significant performance degradation on a new type of data (30 unseen targets; word senses could be different)
10Other Experiments
- Combine PropBank and FrameNet training
- Attempt to increase training data
- Did not significantly improve performance
- Different word senses were tagged in two corpora
- Different strategy for choosing which node to label
- Add Named Entities as features
- Only 1% increase in P/R overall
- Significant improvement to Temporal and Location
roles
11Segment and Classify with SVM
- Use SVM to segment and classify chunks
- Features
- window of 5 words
- POS tags for words
- Syntactic phrase position tags (B,I,O)
- Path from word to target
- Class assignments for previous words
- Assign Semantic phrase position tag to each word
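The feature list above can be sketched as a per-word feature extractor plus a left-to-right tagging loop. This assumes the 5-word window is centered on the current word (two words on each side) and that the two previous class assignments are used; the slides do not state either detail. The `classify` callable stands in for a trained SVM and is hypothetical, as are the feature names.

```python
def token_features(i, words, pos_tags, chunk_tags, paths, prev_labels, half_window=2):
    """Build the feature dict for word i from a window around it."""
    feats = {}
    for offset in range(-half_window, half_window + 1):
        j = i + offset
        inside = 0 <= j < len(words)
        feats[f"word[{offset}]"] = words[j] if inside else "<PAD>"
        feats[f"pos[{offset}]"] = pos_tags[j] if inside else "<PAD>"
        feats[f"chunk[{offset}]"] = chunk_tags[j] if inside else "<PAD>"
    feats["path"] = paths[i]  # path from this word to the target
    # Class assignments already made for the preceding words.
    for k, label in enumerate(reversed(prev_labels[-half_window:]), start=1):
        feats[f"label[-{k}]"] = label
    return feats

def tag_sentence(words, pos_tags, chunk_tags, paths, classify):
    """Tag words left to right, feeding earlier decisions back in as features."""
    labels = []
    for i in range(len(words)):
        feats = token_features(i, words, pos_tags, chunk_tags, paths, labels)
        labels.append(classify(feats))
    return labels

WORDS = ["But", "analysts", "say", "Sansui", "is", "a", "special", "case"]
POS = ["CC", "NNS", "VBP", "NNP", "AUX", "DT", "JJ", "NN"]
CHUNKS = ["O", "B-NP", "B-VP", "B-NP", "O", "B-NP", "I-NP", "I-NP"]
PATHS = ["CC<-S->VP->VBP", "NNS<-NP<-S->VP->VBP", "O",
         "VBP<-VP->SBAR->S->NP->NNP", "VBP<-VP->SBAR->S->VP->AUX",
         "VBP<-VP->SBAR->S->VP->NP->DT", "VBP<-VP->SBAR->S->VP->NP->JJ",
         "VBP<-VP->SBAR->S->VP->NP->NN"]

print(token_features(1, WORDS, POS, CHUNKS, PATHS, ["O"]))
```

Because each decision depends on earlier ones, tagging has to proceed in a fixed direction; left to right is assumed here.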
12SVM Chunker I
(Block diagram.) The input sentence passes through a Target word detector, a Syntactic Parser, a Path Finder, and an Active/Passive Detector. The resulting features (words, POS tags, word positions, target word, path for each word, voice) are fed to the Chunker.
13Example I
But [AGENT analysts] [TARGET say] [TOPIC Sansui is a special case]

Columns: word, POS tag, syntactic chunk tag, path to target, target word, position, voice, semantic tag

But CC O CC<-S->VP->VBP say BEF ACT O
analysts NNS B-NP NNS<-NP<-S->VP->VBP say BEF ACT B-agent
say VBP B-VP O say I-TARGET ACT O
Sansui NNP B-NP VBP<-VP->SBAR->S->NP->NNP say AFT ACT B-topic
is AUX O VBP<-VP->SBAR->S->VP->AUX say AFT ACT I-topic
a DT B-NP VBP<-VP->SBAR->S->VP->NP->DT say AFT ACT I-topic
special JJ I-NP VBP<-VP->SBAR->S->VP->NP->JJ say AFT ACT I-topic
case NN I-NP VBP<-VP->SBAR->S->VP->NP->NN say AFT ACT I-topic
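To show how the per-word semantic tags in the last column map back to labeled arguments, here is a small decoder for B/I/O tag sequences. It is a generic sketch, not the authors' code. Treating an I- tag that does not continue the current chunk as the start of a new chunk is an assumption.

```python
def decode_chunks(words, tags):
    """Group words into (label, text) chunks from B-/I-/O tags."""
    chunks = []
    label, buffer = None, []

    def flush():
        if label is not None:
            chunks.append((label, " ".join(buffer)))

    for word, tag in zip(words, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            flush()                      # close any open chunk, start a new one
            label, buffer = tag[2:], [word]
        elif tag.startswith("I-"):
            buffer.append(word)          # continue the current chunk
        else:                            # "O": outside any chunk
            flush()
            label, buffer = None, []
    flush()
    return chunks

WORDS = ["But", "analysts", "say", "Sansui", "is", "a", "special", "case"]
TAGS = ["O", "B-agent", "O", "B-topic", "I-topic", "I-topic", "I-topic", "I-topic"]
print(decode_chunks(WORDS, TAGS))
# [('agent', 'analysts'), ('topic', 'Sansui is a special case')]
```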
14SVM Chunker II
(Block diagram.) The input sentence passes through a Target word detector, a POS tagger, a Path Finder, and an Active/Passive Detector. The resulting features (words, POS tags, word positions, target word, path for each word, voice) are fed to the Chunker.
15Example II
But [AGENT analysts] [TARGET say] [TOPIC Sansui is a special case]

POS tagged and chunked (only NP and VP):
But_CC (NP analysts_NNS ) (VP say_VBP ) (NP Sansui_NNP ) (VP is_VBZ ) (NP a_DT special_JJ case_NN )

Columns: word, POS tag, syntactic chunk tag, path to target, target word, position, voice, semantic tag

But CC O CC->NP->VP->VBP say BEF ACT O
analysts NNS B-NP NNS->NP->VP->VBP say BEF ACT B-agent
say VBP B-VP O say I-TARGET ACT O
Sansui NNP B-NP NNP->NP->VP->VBP say AFT ACT B-topic
is VBZ B-VP VBZ->VP->NP->VP->VBP say AFT ACT I-topic
a DT B-NP DT->NP->VP->NP->VP->VBP say AFT ACT I-topic
special JJ I-NP JJ->NP->VP->NP->VP->VBP say AFT ACT I-topic
case NN I-NP NN->NP->VP->NP->VP->VBP say AFT ACT I-topic
16Performance
Train on only the first 3,000 sentences of PropBank data
Chunker is SVM-based. Provides NP, VP, PP, ADVP, ADJP, SBAR, etc. chunks
Chunker-I trained on 21,000 sentences: P 79, R 71
SVM-I: P 80, R 74
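For reference, precision (P) and recall (R) over labeled arguments can be computed as in the sketch below. It assumes exact-match scoring, where a predicted argument counts as correct only if both its span and its role match a gold argument. The slides do not state the scoring convention, and the example spans are invented.

```python
def precision_recall(gold, predicted):
    """Exact-match precision and recall over (span, role) pairs."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical gold and predicted arguments: ((start, end), role).
GOLD = [((1, 1), "agent"), ((3, 7), "topic"), ((8, 9), "temporal")]
PRED = [((1, 1), "agent"), ((3, 6), "topic")]

p, r = precision_recall(GOLD, PRED)
print(f"P = {p:.2f}, R = {r:.2f}")
```

In this toy case the topic argument is counted as wrong because its right boundary is off by one word, which is the usual behavior under exact-match scoring.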
17Summary and Future Work
- Project has shown continued improvement in semantic parsing
- Goals
- Improve accuracy through new features
- Improve robustness to data sets by improving word sense robustness
- Continue experiments without full syntactic parse
- Apply to Question Answering