Shallow semantic parsing: Making most of limited training data

1
Shallow semantic parsing: Making most of limited
training data
  • Katrin Erk
  • Sebastian Pado
  • Saarland University

2
Introduction
  • Frame semantics
  • "Who does what to whom" analysis: senses and
    roles
  • Cross-lingual appeal (Boas 2005)
  • Prerequisite for use in NLP: automatic, robust,
    accurate methods for analysis of free text
  • Predominant machine learning paradigm: supervised
    classification
  • Learn relation between features and classes from
    training corpus, guess classes in test corpus
  • Gildea and Jurafsky (2002) and many since

3
Frame-semantic analysis
  • Step 1: Frame disambiguation
  • WSD-style classification of predicate in terms of
    frames
  • Step 2: Role assignment
  • Classification of nodes in terms of role labels
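
The two steps can be sketched as a pair of small classifiers. This is a toy illustration only, not Shalmaneser's implementation: the function names, the word-overlap scoring, the grammatical-function features, and the frame and role labels in the example are all assumptions made for the sketch.

```python
from collections import Counter, defaultdict


def train_frame_model(examples):
    """examples: (lemma, context_words, frame) triples.
    Collects context-word counts per lemma and frame."""
    model = defaultdict(lambda: defaultdict(Counter))
    for lemma, context, frame in examples:
        model[lemma][frame].update(context)
    return model


def disambiguate_frame(model, lemma, context):
    """Step 1: pick the frame whose training contexts share the most
    words with the given context. Returns None for an unseen lemma."""
    scores = {frame: sum(counts[w] for w in context)
              for frame, counts in model[lemma].items()}
    return max(scores, key=scores.get) if scores else None


def train_role_model(examples):
    """examples: (frame, grammatical_function, role) triples.
    Keeps the most frequent role for each (frame, function) pair."""
    counts = defaultdict(Counter)
    for frame, function, role in examples:
        counts[(frame, function)][role] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def assign_roles(role_model, frame, constituents):
    """Step 2: label each (text, grammatical_function) constituent;
    the label is None when no role was seen for that function."""
    return [(text, role_model.get((frame, function)))
            for text, function in constituents]


# Toy training data with illustrative labels (hypothetical).
frame_model = train_frame_model([
    ("look", ["through", "window"], "Perception_active"),
    ("look", ["tired", "pale"], "Appearance"),
])
role_model = train_role_model([
    ("Perception_active", "subject", "Perceiver_agentive"),
    ("Perception_active", "pp_through", "Direction"),
])

frame = disambiguate_frame(frame_model, "look", ["straight", "through", "window"])
roles = assign_roles(role_model, frame,
                     [("I", "subject"), ("through the window", "pp_through")])
```

Role assignment depends on the frame chosen in step 1, which is why an error in frame disambiguation propagates to the role labels.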

4
Frame-semantic analysis
Creeping in its shadow I reached a point whence
I could look straight through the uncurtained
window. (A. Conan Doyle, The Hound of the
Baskervilles)
5
Problems of supervised learning setting
  • Coverage
  • lemmas may be missing
  • frames may be missing
  • Languages other than English
  • Training data may not be available
  • Can we take advantage of existing resources for
    English?

6
Today's talk
  • Shalmaneser: a system for automatic
    frame-semantic analysis
  • Unknown sense detection: dealing with missing
    frames
  • Annotation projection for cross-lingual data
    creation
  • Summary

7
Shalmaneser: Automatic frame-semantic analysis
  • Assignment of
  • senses (frames) to predicates
  • semantic roles
  • Aim: easy use, for exploring applications of
    frame-semantic analysis
  • Input: plain text
  • Syntactic preprocessing integrated
  • Visualization with SALTO tool

8
Shalmaneser: Automatic frame-semantic analysis
  • Semantic analysis as supervised learning tasks
  • Pre-trained classifiers available for English
    (FrameNet) and German (SALSA)
  • Performance of English models
  • Frame assignment: accuracy 0.93, baseline 0.89
  • High baseline because some senses are missing
  • Role assignment
  • Role recognition: F-score 0.75
  • Role labeling: accuracy 0.78
  • Not top-scoring, but okay. Focus on ease of use
    and on flexibility.

9
Shalmaneser: Flexibility
  • Processing steps linked only by interface format:
    Salsa/Tiger XML (Erk & Pado 04)
  • Adding a module: just needs to speak Salsa/Tiger
    XML
  • Model features: specified in experiment file, can
    be changed easily
  • Adding new parser: by instantiating an interface
    class
  • New language: only syntactic preprocessing changes

10
Today's talk
  • Shalmaneser: a system for automatic
    frame-semantic analysis
  • Unknown sense detection: dealing with missing
    frames
  • Annotation projection for cross-lingual data
    creation
  • Summary

11
Detecting unknown word senses (frames)
  • Unseen senses → normal WSD approach will
    assign wrong sense
  • Automatically detect senses we haven't seen
    before?

12
Unknown sense detection as outlier detection
  • Outlier detection: detect occurrences of
    previously unseen events (overview articles:
    Markou & Singh 2003a,b)
  • Training data: positive cases only. Derive model
    of normal cases
  • Test data: positive and negative cases

13
A Nearest Neighbor-based outlier detection method
  • Tax and Duin (2000): simple method, easy to
    implement
  • Given test point x and its nearest training
    neighbor t: is x closer to t than t is to its own
    nearest neighbor?
  • Test point x, nearest training neighbor t,
    nearest neighbor t' of t, (Euclidean) distances
    d: pNN(x) = d(x, t) / d(t, t'). Accept x if
    pNN(x) is below a given threshold
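
A minimal sketch of this nearest-neighbor test, assuming that pNN(x) is the ratio of the distance from x to its nearest training neighbor t over the distance from t to its own nearest training neighbor t', and that items are numeric feature vectors. The function names and the default threshold of 1.0 are illustrative choices, not values from the talk.

```python
import math


def nearest(point, candidates):
    """Return the candidate with the smallest Euclidean distance to point."""
    return min(candidates, key=lambda c: math.dist(point, c))


def p_nn(x, training):
    """Ratio d(x, t) / d(t, t'), where t is the nearest training neighbor
    of x and t' is the nearest training neighbor of t (other than t).
    Needs at least two training points."""
    t = nearest(x, training)
    others = [p for p in training if p is not t]
    t_prime = nearest(t, others)
    d_xt = math.dist(x, t)
    d_tt = math.dist(t, t_prime)
    if d_tt == 0.0:
        # Duplicate training points: only an exact match counts as known.
        return 0.0 if d_xt == 0.0 else math.inf
    return d_xt / d_tt


def is_known(x, training, threshold=1.0):
    """Accept x as a known case if p_nn(x) is below the threshold;
    otherwise it is treated as an outlier (unknown sense)."""
    return p_nn(x, training) < threshold


# Toy 2-D data (hypothetical): three training points near the origin.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
close_point = (0.2, 0.1)   # lies inside the training region
far_point = (5.0, 5.0)     # lies far from all training points
print(is_known(close_point, training), is_known(far_point, training))
```

Because the distance to the nearest neighbor is normalised by the local spacing of the training data, the same threshold can be used in dense and sparse regions of the feature space.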

14
Unknown sense detection: Results
  • Evaluation (Erk NAACL 2006)
  • Use FrameNet data
  • Treat one sense of a lemma as pseudo-unknown
    (iterate over all senses)
  • Results (assignment of label "unknown")
  • Tax & Duin's method, one lemma at a time: Prec
    0.70, Rec 0.35
  • More data (all data for a frame, not just that
    of one lemma): Prec 0.77, Rec 0.82
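
The pseudo-unknown protocol can be sketched as follows. This is a simplified illustration, not the evaluation code behind the reported numbers: the data layout, the function names, and the toy distance-based detector are assumptions, and any detector with the same signature could be passed in.

```python
def precision_recall(predicted, gold):
    """Precision and recall for the label "unknown" (True = unknown)."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pseudo_unknown_eval(data, is_unknown):
    """data: {sense: {"train": [...], "test": [...]}} for one lemma.
    Each sense in turn is held out of training and treated as unknown;
    predictions are pooled over all iterations."""
    predicted, gold = [], []
    for held_out in data:
        training = [x for sense, split in data.items()
                    if sense != held_out for x in split["train"]]
        for sense, split in data.items():
            for x in split["test"]:
                predicted.append(is_unknown(x, training))
                gold.append(sense == held_out)
    return precision_recall(predicted, gold)


# Toy 1-D data and detector (hypothetical): a point is "unknown"
# if it lies more than 2.0 away from every training point.
data = {
    "sense_a": {"train": [0.0, 1.0], "test": [0.5]},
    "sense_b": {"train": [10.0, 11.0], "test": [10.5]},
    "sense_c": {"train": [20.0, 21.0], "test": [20.5]},
}


def toy_detector(x, training):
    return min(abs(x - t) for t in training) > 2.0


print(pseudo_unknown_eval(data, toy_detector))
```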

15
Results
  • What features are important?
  • Best: just context words
  • Almost as good: features of 1, 3, 4 together
  • Just the subcategorization frame: high precision,
    low recall
  • Subcat frame plus headwords of arguments:
    in between 3 and 2, but obviously too sparse

16
Unknown sense detection as outlier detection: The
bigger picture
  • Why assume missing word senses in the sense
    inventory and in the training data?
  • Growing, unfinished resources, like FrameNet
  • Domain-specific senses may be missing from
    general-purpose sense inventories
  • Outlier detection method presented here:
    applicable to any resource that groups words into
    senses, e.g. WordNet
  • Using outlier detection to detect occurrences of
    nonliteral use?

17
Today's talk
  • Shalmaneser: a system for automatic
    frame-semantic analysis
  • Unknown sense detection: dealing with missing
    frames
  • Annotation projection for cross-lingual data
    creation
  • Summary

18
Motivation
Definitions, role set: language-independent
Annotated sentences: specific, too
Predicate classes: language-specific
19
Agenda
  • For new language, induce:
  • Frame-semantic predicate classification
  • Corpus with frame-semantic annotation
  • Method: Annotation projection in parallel corpus
  • Word alignments approximate semantic equivalence
  • Corresponding word pairs (predicates)
  • Corresponding constituents
  • Evaluation: Study on EUROPARL corpus (De/En/Fr)

20
An idealised example
Peter comes home (frame: Arriving)
Pierre revient à la maison (frame: Arriving)
21
Frame-semantic classes
  • Idea: For each frame, construct list of
    predicates in new language occurring aligned to
    predicates of this frame => FEEs (frame-evoking
    elements) for new languages
  • Main obstacle: Translational divergence
  • Corresponding predicates don't evoke same frame
  • Address by shallow, language-independent
    filtering (Pado and Lapata AAAI 2005)
  • Important: Distributional patterns
  • Evaluation: Can obtain predicate classes for
    German and French with precision of 65-70%
  • Main remaining problem: English polysemy not
    covered by FrameNet
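
The basic idea of collecting aligned predicates per frame can be sketched as below. The frequency cut-off used here is only a stand-in for filtering and is not the shallow, language-independent filtering of Pado and Lapata; the function name, the input format, and the toy alignments are assumptions made for the illustration.

```python
from collections import Counter, defaultdict


def induce_frame_lexicon(aligned_predicates, min_count=2):
    """aligned_predicates: (frame, source_lemma, target_lemma) triples,
    one per aligned predicate pair in the parallel corpus.
    Returns, per frame, the target-language lemmas seen at least
    min_count times aligned to a predicate of that frame."""
    counts = defaultdict(Counter)
    for frame, _source_lemma, target_lemma in aligned_predicates:
        counts[frame][target_lemma] += 1
    return {frame: sorted(lemma for lemma, n in c.items() if n >= min_count)
            for frame, c in counts.items()}


# Toy English-French alignments (hypothetical). The single
# "come" / "devenir" pair stands for a translational divergence.
pairs = [
    ("Arriving", "come", "revenir"),
    ("Arriving", "come", "revenir"),
    ("Arriving", "arrive", "arriver"),
    ("Arriving", "arrive", "arriver"),
    ("Arriving", "come", "devenir"),
]
print(induce_frame_lexicon(pairs))
```

A raw frequency threshold removes rare divergent pairs, but it cannot catch divergences that occur systematically, which is one reason a more informed filter is needed.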

22
Role annotations (I)
  • Idea: For each sentence, transfer semantic role
    annotation onto translated sentence
  • Obstacle 1: Frame divergence
  • Role projection only sensible if frames match
  • Good news: In En-De test corpus (Pado and Lapata
    HLT/EMNLP 2005), 70% of frames match
  • Obstacle 2: Role divergence
  • Even if frames are parallel, do roles match?
  • Good news: In En-De test corpus, matching frames
    show 90% role matches
  • Remaining cases: mostly elisions (e.g. passive)
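
Word-based role projection can be sketched as follows, assuming that roles are given as sets of source token positions and that the word alignment is a set of (source index, target index) pairs. The function name, the role labels, and the alignment in the example are illustrative assumptions.

```python
def project_roles(source_roles, alignment):
    """source_roles: {role: set of source token indices}.
    alignment: set of (source_index, target_index) pairs.
    Each role is projected onto the target tokens aligned to any token
    of its source span; roles with no aligned token are dropped."""
    projected = {}
    for role, span in source_roles.items():
        target = {t for s, t in alignment if s in span}
        if target:
            projected[role] = target
    return projected


# "Peter comes home" -> "Pierre revient à la maison"
# Token indices:  Peter=0 comes=1 home=2 / Pierre=0 revient=1 à=2 la=3 maison=4
source_roles = {"Theme": {0}, "Goal": {2}}
alignment = {(0, 0), (1, 1), (2, 4)}   # hypothetical word alignment
print(project_roles(source_roles, alignment))
```

In this toy case the Goal role lands on "maison" only, although the full target phrase is "à la maison": projection over single words is sensitive to gaps in the alignment.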

23
Role annotations (II)
  • Obstacle 3: Errors/omissions in automatically
    induced word alignments
  • Can be overcome by using bracketing information
    (chunks / constituents)
  • Induction of cross-lingual correspondences as
    graph optimisation problem (Pado and Lapata ACL
    2006)
  • Evaluation (all exact match F-score)
  • Word-based projection: 0.50
  • Constituent-based: 0.75
  • Upper limit: 0.85
  • Remaining errors: mostly parsing-related
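
How bracketing information can repair a gappy word alignment is sketched below. This uses a simple greedy overlap heuristic for one role at a time; it is not the graph optimisation of Pado and Lapata (ACL 2006). The function names, the Jaccard overlap measure, and the toy alignment are assumptions made for the illustration.

```python
def project_tokens(source_span, alignment):
    """Target token indices aligned to any token of the source span."""
    return {t for s, t in alignment if s in source_span}


def best_constituent(projected, constituents):
    """Pick the target constituent (a set of token indices) with the
    highest Jaccard overlap with the projected tokens.
    Expects a non-empty list of constituents."""
    def jaccard(constituent):
        union = projected | constituent
        return len(projected & constituent) / len(union) if union else 0.0
    return max(constituents, key=jaccard)


# "Peter comes home" -> "Pierre revient à la maison"
# Target indices: Pierre=0 revient=1 à=2 la=3 maison=4
alignment = {(0, 0), (1, 1), (2, 2), (2, 4)}   # hypothetical: "la" is unaligned
goal_tokens = project_tokens({2}, alignment)    # tokens for the source Goal span
constituents = [{0}, {1}, {3, 4}, {2, 3, 4}]    # hypothetical target constituents
print(best_constituent(goal_tokens, constituents))
```

Snapping the projected tokens to a constituent fills the gap left by the unaligned word, which matches the intuition behind the gain of constituent-based over word-based projection.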

24
Summary
  • Frame-semantic analysis: potentially interesting
    for many NLP applications
  • Goal of Shalmaneser: flexible and easy-to-use
    system
  • Address incompleteness in resources
  • Unknown sense detection as outlier detection
  • Porting Frame Semantics to new languages
  • Parallel corpora for automatic annotation
    projection