Shallow semantic parsing: Making most of limited training data


1
Shallow semantic parsing: Making most of limited training data

Katrin Erk
Sebastian Pado
Saarland University

2
Introduction

Frame semantics
"Who does what to whom" analysis: senses and roles
Cross-lingual appeal (Boas 2005)
Prerequisite for use in NLP: automatic, robust, accurate methods for analysis of free text
Predominant machine learning paradigm: supervised classification
Learn relation between features and classes from training corpus; guess classes in test corpus
Gildea and Jurafsky (2002) and many since

3
Frame-semantic analysis

Step 1: Frame disambiguation
WSD-style classification of predicate in terms of frames
Step 2: Role assignment
Classification of nodes in terms of role labels

4
Frame-semantic analysis
Creeping in its shadow I reached a point whence
I could look straight through the uncurtained
window. (A. Conan Doyle, The Hound of the
Baskervilles)
5
Problems of supervised learning setting

Coverage
lemmas may be missing
frames may be missing
Languages other than English
Training data may not be available
Can we take advantage of existing resources for
English?

6
Today's talk

Shalmaneser: a system for automatic frame-semantic analysis
Unknown sense detection: dealing with missing frames
Annotation projection for cross-lingual data creation
Summary

7
Shalmaneser: Automatic frame-semantic analysis

Assignment of
senses (frames) to predicates
semantic roles
Aim: easy use, for exploring applications of frame-semantic analysis
Input: plain text
Syntactic preprocessing integrated
Visualization with SALTO tool

8
Shalmaneser: Automatic frame-semantic analysis

Semantic analysis as supervised learning tasks
Pre-trained classifiers available for English (FrameNet) and German (SALSA)
Performance of English models
Frame assignment: accuracy 0.93, baseline 0.89
High baseline because some senses are missing
Role assignment
Role recognition: F-score 0.75
Role labeling: accuracy 0.78
Not top-scoring, but okay. Focus on ease of use and on flexibility.

9
Shalmaneser: Flexibility

Processing steps linked only by interface format: Salsa/Tiger XML (Erk & Pado 04)
Adding a module: it just needs to speak Salsa/Tiger XML
Model features specified in experiment file, can be changed easily
Adding a new parser: by instantiating an interface class
New language: only syntactic preprocessing changes

10
Today's talk

Shalmaneser: a system for automatic frame-semantic analysis
Unknown sense detection: dealing with missing frames
Annotation projection for cross-lingual data creation
Summary

11
Detecting unknown word senses (frames)

Unseen senses → normal WSD approach will assign wrong sense
Automatically detect senses we haven't seen before?

12
Unknown sense detection as outlier detection

Outlier detection: detect occurrences of previously unseen events (overview articles: Markou & Singh 2003a,b)
Training data: positive cases only. Derive model of normal cases
Test data: positive and negative cases

13
A Nearest Neighbor-based outlier detection method

Tax and Duin (2000): simple method, easy to implement
Given test point x and its nearest training neighbor t: is x closer to t than t is to its own nearest neighbor t'?
Notation: test point x; nearest training neighbor t; nearest neighbor t' of t; (Euclidean) distances d
pNN(x) = d(x, t) / d(t, t')
Accept x if pNN(x) is below a given threshold
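The decision rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk: the function names (euclidean, nearest, p_nn, is_known), the default threshold of 1.0, and the handling of duplicate training points are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def nearest(point, candidates):
    """Return (index, distance) of the candidate closest to point."""
    best = min(range(len(candidates)),
               key=lambda i: euclidean(point, candidates[i]))
    return best, euclidean(point, candidates[best])


def p_nn(x, training):
    """Ratio d(x, t) / d(t, t'): t is x's nearest training neighbor,
    t' is t's own nearest neighbor among the other training points."""
    if len(training) < 2:
        raise ValueError("need at least two training points")
    t_idx, d_xt = nearest(x, training)
    t = training[t_idx]
    rest = training[:t_idx] + training[t_idx + 1:]
    _, d_tt = nearest(t, rest)
    if d_tt == 0.0:
        # Assumption: duplicate training points make the ratio undefined;
        # accept x only if it coincides with them.
        return 0.0 if d_xt == 0.0 else math.inf
    return d_xt / d_tt


def is_known(x, training, threshold=1.0):
    """Accept x as a known (seen) case if p_nn is below the threshold."""
    return p_nn(x, training) < threshold


training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(is_known((0.1, 0.0), training))  # True: close to a training point
print(is_known((5.0, 5.0), training))  # False: far from all training points
```

In the unknown sense setting, the training points would be feature vectors of occurrences with known senses, and a rejected test point would receive the label "unknown".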
14
Unknown sense detection: Results

Evaluation (Erk NAACL 2006)
Use FrameNet data
Treat one sense of a lemma as pseudo-unknown (iterate over all senses)
Results (assignment of label "unknown")
Tax & Duin's method, one lemma at a time: Prec 0.70, Rec 0.35
More data (all data for a frame, not just that of one lemma): Prec 0.77, Rec 0.82
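The evaluation protocol above (hold out one sense at a time, then score the label "unknown") might look roughly like the following sketch. The data layout, a list of (sense, features) pairs, and both function names are assumptions for illustration; the slides do not specify how training and test items were split.

```python
def pseudo_unknown_rounds(instances):
    """For each sense, yield (held_out_sense, known, unknown).

    instances: list of (sense, features) pairs for one lemma.
    The held-out sense plays the role of the pseudo-unknown sense."""
    senses = sorted({sense for sense, _ in instances})
    for held_out in senses:
        known = [(s, f) for s, f in instances if s != held_out]
        unknown = [(s, f) for s, f in instances if s == held_out]
        yield held_out, known, unknown


def precision_recall(gold, predicted):
    """Precision and recall for the label "unknown".

    gold, predicted: parallel lists of booleans (True = unknown)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: two senses of one lemma, features omitted.
toy = [("A", None), ("A", None), ("B", None)]
rounds = list(pseudo_unknown_rounds(toy))

gold = [True, True, False, False, True]
pred = [True, False, True, False, True]
precision, recall = precision_recall(gold, pred)
```

Low recall, as in the one-lemma-at-a-time setting, means that many occurrences of the held-out sense were wrongly accepted as known.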

15
Results

What features are important?
1. Best: just context words
2. Almost as good: features of 1, 3, 4 together
3. Just the subcategorization frame: high precision, low recall
4. Subcat frame plus headwords of arguments: in between 3 and 2, but obviously too sparse

16
Unknown sense detection as outlier detection: The bigger picture

Why assume missing word senses in the sense inventory and in the training data?
Growing, unfinished resources, like FrameNet
Domain-specific senses may be missing from general-purpose sense inventories
Outlier detection method presented here: applicable to any resource that groups words into senses, e.g. WordNet
Using outlier detection to detect occurrences of nonliteral use?

17
Today's talk

Shalmaneser: a system for automatic frame-semantic analysis
Unknown sense detection: dealing with missing frames
Annotation projection for cross-lingual data creation
Summary

18
Motivation
Definitions, role set: language-independent
Predicate classes: language-specific
Annotated sentences: language-specific, too
19
Agenda

For new language, induce
Frame-semantic predicate classification
Corpus with frame-semantic annotation
Method: Annotation projection in parallel corpus
Word alignments approximate semantic equivalence
Corresponding word pairs (predicates)
Corresponding constituents
Evaluation: Study on EUROPARL corpus (De/En/Fr)

20
An idealised example
English: Peter comes home (frame: Arriving)
French: Pierre revient à la maison (frame: Arriving)
21
Frame-semantic classes

Idea: For each frame, construct list of predicates in new language occurring aligned to predicates of this frame => FEEs (frame-evoking elements) for new languages
Main obstacle: Translational divergence
Corresponding predicates don't evoke same frame
Address by shallow, language-independent filtering (Pado and Lapata AAAI 2005)
Important: Distributional patterns
Evaluation: Can obtain predicate classes for German and French with precision of 65-70%
Main remaining problem: English polysemy not covered by FrameNet
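The core idea of this slide, collecting target-language predicates that occur aligned to predicates of a frame, can be sketched as follows. The function name induce_fees, the input format, and the example lemmas are hypothetical. The frequency cutoff min_count is only a stand-in: it is not the shallow filtering of Pado and Lapata (AAAI 2005), whose details are not given here.

```python
from collections import Counter, defaultdict


def induce_fees(aligned, min_count=2):
    """Collect candidate frame-evoking elements for a new language.

    aligned: iterable of (frame, target_lemma) pairs, one per English
             predicate occurrence annotated with `frame` and word-aligned
             to `target_lemma` in the parallel sentence.
    Returns a dict mapping each frame to a sorted list of target lemmas
    seen at least min_count times (a crude noise filter, assumed here)."""
    counts = defaultdict(Counter)
    for frame, target_lemma in aligned:
        counts[frame][target_lemma] += 1
    return {
        frame: sorted(lemma for lemma, c in counter.items() if c >= min_count)
        for frame, counter in counts.items()
    }


# Made-up alignment observations for the frame Arriving.
observations = [
    ("Arriving", "revenir"),
    ("Arriving", "revenir"),
    ("Arriving", "arriver"),
    ("Arriving", "arriver"),
    ("Arriving", "maison"),  # alignment noise, seen only once
]
fees = induce_fees(observations)
print(fees)  # {'Arriving': ['arriver', 'revenir']}
```

A pure frequency cutoff cannot catch systematic translational divergence, where a frequently aligned predicate evokes a different frame, which is why the slide points to distributional patterns as the important signal.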

22
Role annotations (I)

Idea: For each sentence, transfer semantic role annotation onto translated sentence
Obstacle 1: Frame divergence
Role projection only sensible if frames match
Good news: In En-De test corpus (Pado and Lapata HLT/EMNLP 2005), 70% of frames match
Obstacle 2: Role divergence
Even if frames are parallel, do roles match?
Good news: In En-De test corpus, matching frames show 90% role matches
Remaining cases mostly elisions (e.g. passive)

23
Role annotations (II)

Obstacle 3: Errors/omissions in automatically induced word alignments
Can be overcome by using bracketing information (chunks / constituents)
Induction of cross-lingual correspondences as graph optimisation problem (Pado and Lapata ACL 2006)
Evaluation (all exact match F-score)
Word-based projection: 0.50
Constituent-based: 0.75
Upper limit: 0.85
Remaining errors mostly parsing-related
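As a rough illustration of the word-based projection baseline, the sketch below copies each role from source tokens to whatever target tokens they are aligned to. It does not attempt the constituent-based graph optimisation of Pado and Lapata (ACL 2006). The function name project_roles, the hand-written alignment, and the role labels Theme and Goal are assumptions for the example.

```python
from collections import defaultdict


def project_roles(source_roles, alignment):
    """Word-based role projection.

    source_roles: dict mapping role name to a set of source token indices.
    alignment:    iterable of (source_index, target_index) word alignments.
    Returns a dict mapping role name to a sorted list of target indices."""
    targets_for = defaultdict(set)
    for src, tgt in alignment:
        targets_for[src].add(tgt)
    projected = {}
    for role, src_tokens in source_roles.items():
        tgt_tokens = set()
        for src in src_tokens:
            tgt_tokens |= targets_for.get(src, set())
        projected[role] = sorted(tgt_tokens)
    return projected


english = ["Peter", "comes", "home"]
french = ["Pierre", "revient", "à", "la", "maison"]
# Assumed alignment: Peter-Pierre, comes-revient, home-maison.
alignment = [(0, 0), (1, 1), (2, 4)]
# Assumed role annotation for the English sentence.
roles = {"Theme": {0}, "Goal": {2}}

projected = project_roles(roles, alignment)
print(projected)  # {'Theme': [0], 'Goal': [4]}
```

In this toy case the projected Goal covers only "maison" and misses "à la", because those tokens have no alignment. Gaps of this kind are what bracketing information is meant to repair: a constituent-based method could extend the role to the whole phrase "à la maison".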

24
Summary