Title: FATE: a FrameNet Annotated corpus for Textual Entailment
1FATE: a FrameNet Annotated corpus for Textual Entailment
LREC 2008, Marrakech, 28 May 2008
- Marco Pennacchiotti, Aljoscha Burchardt
- Computerlinguistik
- Saarland University, Germany
SALSA II - The Saarbrücken Lexical Semantics
Acquisition Project
2Summary
- FrameNet and Textual Entailment
- FATE annotation schema
- Annotation examples and statistics
- Conclusions
3Frame Semantics
[Fillmore 1976, 2003]
- Frame: conceptual structure modeling a prototypical situation
- Frame Elements (FE): participants of the situation
- Frame Evoking Elements (FEE): predicates evoking the situation
Predicate-argument level normalizations:
Evelyn spoke about her past
Evelyn's statement about her past
STATEMENT(Speaker: Evelyn, Topic: her past)
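The normalization above can be sketched in a few lines of Python. This is a minimal illustration, not part of FrameNet or FATE: the class `FrameInstance` and the helper `make_instance` are hypothetical names, and the FEEs "spoke" and "statement" are read off the two example phrases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameInstance:
    """One evoked frame: frame name, evoking predicate (FEE), role fillers."""
    frame: str
    fee: str
    roles: tuple  # sorted (role name, filler) pairs

    def normalized(self):
        # Drop the surface predicate: only the frame and its fillers remain.
        return (self.frame, self.roles)


def make_instance(frame, fee, **roles):
    # Hypothetical helper: stores roles in a canonical (sorted) order.
    return FrameInstance(frame, fee, tuple(sorted(roles.items())))


# "Evelyn spoke about her past" (verbal predicate)
verbal = make_instance("STATEMENT", "spoke", Speaker="Evelyn", Topic="her past")
# "Evelyn's statement about her past" (nominal predicate)
nominal = make_instance("STATEMENT", "statement", Speaker="Evelyn", Topic="her past")

same_structure = verbal.normalized() == nominal.normalized()
print(same_structure)
```

The two instances differ only in their FEE, so they compare equal once the surface predicate is dropped.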
- FrameNet Berkeley Project (1)
- Database of frames for the core lexicon of English
- 800 frames, 10,000 lemmas, 135,000 annotated sentences
(1) http://framenet.icsi.berkeley.edu
4Textual Entailment (TE)
Given two text fragments, the Text T and the Hypothesis H, T entails H if the meaning of H can be inferred from the meaning of T, as would typically be interpreted by people [Dagan 2005]
- T: Yahoo has recently acquired Overture
- H: Yahoo owns Overture
- T → H
- Recognizing Textual Entailment (RTE)
- recognize if entailment holds for a given (T, H) pair
- Models core inferences of many NLP applications (QA, IE, MT, ...)
- RTE Challenges [Dagan et al., 2005; Giampiccolo et al., 2007]
- Compare systems for RTE
- Corpus: 800 training pairs, 800 test pairs, evenly split into positive (+) and negative (-) pairs
5Predicate-argument and RTE
- Predicate-level inference plays a relevant role in TE (20% of positive examples in RTE-2 [Garoufi, 2007])
T: An avalanche has struck a popular skiing resort in Austria, killing at least 11 people.
H: Humans died in an avalanche.
DEATH(Protagonist: 11 people / humans, Cause: avalanche / avalanche)
- Implementation gap
- [Burchardt et al., 2007]: FrameNet system comparable to lexical overlap
- [Hickl et al., 2006]: PropBank-based features are not effective
- [Rana et al., 2005]: DIRT paraphrase repository does not help
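For reference, a lexical overlap baseline of the kind mentioned above can be sketched as follows. This is a generic illustration, not the baseline used in any of the cited systems: the tokenizer, the function names and the 0.5 threshold are all assumptions.

```python
import re


def tokens(text):
    """Lowercased word tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(t, h):
    """Fraction of hypothesis tokens that also occur in the text."""
    h_tokens = tokens(h)
    if not h_tokens:
        return 0.0
    return len(h_tokens & tokens(t)) / len(h_tokens)


def predict_entailment(t, h, threshold=0.5):
    # The threshold is a hypothetical value; in practice it would be tuned.
    return overlap_score(t, h) >= threshold


t = ("An avalanche has struck a popular skiing resort in Austria, "
     "killing at least 11 people.")
h = "Humans died in an avalanche."

score = overlap_score(t, h)
print(score, predict_entailment(t, h))
```

On this pair only "in", "an" and "avalanche" overlap. The link between "killing" and "died", and between "people" and "humans", is invisible to the baseline, which is the kind of inference a predicate-argument analysis is meant to capture.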
6FATE corpus
FATE: a manually frame-annotated Textual Entailment corpus, to study the role of frame semantics in RTE
- Reference corpus: RTE-2 test set, 800 pairs, 29,000 tokens
- Frame resource: FrameNet version 1.3
- Corpus format: SALSA/TIGER XML [Burchardt et al., 2006]
- Pre-processing: annotation on top of Collins parser syntactic analysis
- T and H are randomly reordered to avoid biases
- Annotation performed by one highly experienced annotator
- inter-annotator agreement over 5% of the corpus
- FEE-agreement: 82%
- Frame-agreement: 88%
- Role-agreement: 91%
- annotation carried out using the SALTO tool (1)
(1) http://www.coli.uni-saarland.de/projects/salsa/salto/doc
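A minimal sketch of reading frame annotation from a SALSA/TIGER XML sentence, using only the Python standard library. The XML fragment is a simplified, hand-written assumption about the format (terminals, nonterminals, and a `sem` layer whose frames point to syntax nodes); it is not copied from the FATE corpus, and the real files carry more attributes and layers.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical sentence in the spirit of SALSA/TIGER XML.
SAMPLE = """
<s id="s1">
  <graph>
    <terminals>
      <t id="s1_1" word="Humans"/>
      <t id="s1_2" word="died"/>
      <t id="s1_3" word="in"/>
      <t id="s1_4" word="an"/>
      <t id="s1_5" word="avalanche"/>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="NP">
        <edge idref="s1_4"/>
        <edge idref="s1_5"/>
      </nt>
    </nonterminals>
  </graph>
  <sem>
    <frames>
      <frame name="Death" id="s1_f1">
        <target><fenode idref="s1_2"/></target>
        <fe name="Protagonist" id="s1_f1_e1"><fenode idref="s1_1"/></fe>
        <fe name="Cause" id="s1_f1_e2"><fenode idref="s1_500"/></fe>
      </frame>
    </frames>
  </sem>
</s>
"""


def covered_words(node_id, terminals, nonterminals):
    """Words covered by a terminal or (recursively) a nonterminal node."""
    if node_id in terminals:
        return [terminals[node_id]]
    words = []
    for child_id in nonterminals.get(node_id, []):
        words.extend(covered_words(child_id, terminals, nonterminals))
    return words


def read_frames(sentence_xml):
    """Return (frame name, FEE text, {role: filler text}) for each frame."""
    s = ET.fromstring(sentence_xml)
    terminals = {t.get("id"): t.get("word") for t in s.iter("t")}
    nonterminals = {
        nt.get("id"): [e.get("idref") for e in nt.findall("edge")]
        for nt in s.iter("nt")
    }

    def text_of(element):
        words = []
        for fenode in element.findall("fenode"):
            words.extend(covered_words(fenode.get("idref"), terminals, nonterminals))
        return " ".join(words)

    result = []
    for frame in s.iter("frame"):
        roles = {fe.get("name"): text_of(fe) for fe in frame.findall("fe")}
        result.append((frame.get("name"), text_of(frame.find("target")), roles))
    return result


frames = read_frames(SAMPLE)
print(frames)
```

Because role fillers point to syntax nodes rather than to strings, a filler attached to a constituent (here the NP) is recovered by walking down to its terminals.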
7FATE annotation process: an example
[Figure: example sentence with Collins syntactic analysis]
Full-text annotation (all words considered) [Ruppenhofer, 2007]
8FATE annotation process: an example
[Figure: Collins syntactic analysis with the frame and its FEE marked]
9FATE annotation process: an example
[Figure: Collins syntactic analysis with the frame, FEE, FE and FE filler marked]
Maximization principle: choose the largest constituent possible when annotating
10Annotation Schema
Relevance Principle
- Intuition: annotate as FEE only those words evoking a relevant situation (frame) in the sentence at hand
- Very intuitive flavor, but high agreement (83%) on a pilot set of 15 sentences
Example: Authorities in Brazil hold 200 people as hostage
[Figure: KIDNAPPING frame with FEs Victim, Place, Perpetrator]
11Annotation Schema
Span Annotation
- On T of positive pairs, annotate only the fragments (spans) contributing to the inferential process
- Spans are obtained from the ARTE annotation [Garoufi, 2007]
- For negative pairs it is not straightforward to derive spans, hence we do full annotation
T: Soon after the EZLN had returned to Chiapas, Congress approved a different version of the COCOPA Law, which did not include the autonomy clauses, claiming they were in contradiction with some constitutional rights (private property and secret voting); this was seen as a betrayal by the EZLN and other political groups.
H: EZLN is a political group.
12Annotation Schema
Other guidelines
- Unknown frames: use an Unknown frame for words evoking situations not present in the FrameNet database
- Anaphora
- Copula and support verbs
- Modal expressions
- Metaphors
- Existential constructions
13Corpus statistics
- Annotated pairs: 800 (400 positive, 400 negative)
- Annotated frames: 4,500
- avg. 5.6 frames per pair
- 1,600 frames in positive pairs
- 2,800 in negative pairs
- Annotated roles: 9,500
- avg. 2.1 roles per frame
- Annotation time: 230 hours
- 90 h for positive pairs (13 min/pair)
- 140 h for negative pairs (21 min/pair)
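The averages on this slide follow directly from the totals. A quick check, using only the figures reported above (the variable names are ours, and the totals themselves are rounded on the slide):

```python
# Totals as reported on the slide
pairs_total = 800
pairs_positive = 400
pairs_negative = 400
frames_total = 4500
roles_total = 9500
hours_positive = 90
hours_negative = 140

frames_per_pair = frames_total / pairs_total                  # 5.625
roles_per_frame = roles_total / frames_total                  # about 2.11
minutes_per_positive = hours_positive * 60 / pairs_positive   # 13.5
minutes_per_negative = hours_negative * 60 / pairs_negative   # 21.0
hours_total = hours_positive + hours_negative                 # 230

print(frames_per_pair, roles_per_frame)
print(minutes_per_positive, minutes_per_negative, hours_total)
```

The 13 min/pair reported for positive pairs corresponds to 13.5 minutes rounded down; the other figures match the slide after rounding to one decimal.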
14FrameNet and RTE (simple case)
[Figure: frame annotation of T and H]
- Syntactic normalization
- Active / Passive
EDUCATIONAL_TEACHING(Student: ground soldiers / soldiers, Material: virtual reality / virtual reality)
15Implementation gap: insights
- Resource coverage is too low
- Models for predicate-argument inference are weak
- Automatic annotation models (SRL) are not good
enough to be safely used in RTE
- FrameNet coverage is good
- 373 Unknown frames (8% of total frames)
- Unknown roles: 1% of total roles
- Coverage is unlikely to be a limiting factor for using FrameNet in applications
16Why should you use FATE?
- Resource coverage is too low
- Models for predicate-argument inference are weak
- Automatic annotation models (SRL) are not good
enough to be safely used in RTE
- To better study predicate-argument inference in RTE
- To experiment with frame-based RTE models on a gold-standard corpus
- To learn better SRL models, by training on FATE
- Corpus is freely available on-line
17FATE download: http://www.coli.uni-saarland.de/projects/salsa/fate
- pennacchiotti_at_coli.uni-sb.de
- www.coli.uni-saarland.de/pennacchiotti
19FrameNet and RTE
[Figure: frame annotation of T and H]
- Syntactic normalization
- Apposition to copula
PEOPLE_BY_VOCATION(Person: Andreotti / Andreotti, Place: Italy / Italy, Age: elder / elder)
20FrameNet and RTE
[Figure: frame annotation of T and H]
- Frame-to-frame inference
- Sentencing --- HR ---> Imprisonment
- Convict maps to Prisoner
- Place maps to Place
21Annotation Schema
Anaphora
- Locality principle
- Annotate the local referent of a role filler
- Link the local referent to the external referent
via the Anaphora frame
22Annotation Schema
Support and Copula Verbs
- Verbs carrying minimal semantic content (e.g. be, seem)
- Annotate the noun as FEE, instead of the verb [Ruppenhofer, 2007]
23Annotation Schema
Modal Expressions
- Modal expressions (e.g. modal verbs, particles, modal triggers) are annotated only when the modal meaning is prevalent in the sentence
24Annotation Schema
Other guidelines
- Metaphors are annotated with their figurative meaning
- Existential constructions (e.g. there is) are annotated with the frame Existence, only when it is the only meaning conveyed in the sentence (e.g. There are 11 official languages)
- Unknown frames: use an Unknown frame for words evoking situations not present in the FrameNet database
- Maximization principle: choose the largest constituent possible when annotating
25Motivations
- Semantic knowledge at the predicate-argument level is critical in NLP tasks
- From whom did BMW buy Rover?
- Rover was bought by BMW from British Aerospace
- BMW acquired Rover from British Aerospace
- BMW's purchase of Rover from British Aerospace
- British Aerospace sold Rover to BMW
- Predicate-argument resources (e.g. PropBank and FrameNet) make it possible to map meaning-preserving alternations to the same predicative structure
- BUY_EVENT(Buyer: BMW, Seller: British Aerospace, Good: Rover)
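Once the four surface variants are mapped to the same BUY_EVENT structure, answering the question reduces to a role lookup. A minimal sketch, in which the function `answer` and the dictionary layout are hypothetical and no actual parsing or semantic role labeling is performed:

```python
# The shared structure all four sentences are assumed to normalize to
BUY_EVENT = {"Buyer": "BMW", "Seller": "British Aerospace", "Good": "Rover"}


def answer(question_roles, asked_role, fact):
    """Return the filler of asked_role if every known question role matches the fact."""
    for role, filler in question_roles.items():
        if fact.get(role) != filler:
            return None
    return fact.get(asked_role)


# "From whom did BMW buy Rover?": Buyer and Good are given, Seller is asked for
result = answer({"Buyer": "BMW", "Good": "Rover"}, "Seller", BUY_EVENT)
print(result)

# A question about a different buyer finds no matching fact
no_match = answer({"Buyer": "Ford", "Good": "Rover"}, "Seller", BUY_EVENT)
print(no_match)
```

The hard part in practice is producing the structure from each sentence; the lookup itself is trivial.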
26Motivations
- Implementation gap: very limited impact of predicate-argument resources in NLP applications [Fliedner, 2007; Frank et al., 2006]
- Possible reasons
- (1) Resource coverage is too low
- (2) Modeling predicate knowledge is too hard
- (3) Automatic annotation (SRL) is not good enough
- Our goal: create a gold-standard corpus, manually annotated with predicate-argument structure, to investigate (1)-(3)
- Corpus: Second Recognizing Textual Entailment (RTE) Challenge
- Annotation: FrameNet
27FATE Corpus annotation: an example
[Figure: T with Collins syntactic analysis]
Full-text annotation (all words considered) [Ruppenhofer, 2007]
28Frame Semantics
[Fillmore 1976, 2003]
- Frames are organized in a hierarchy with various frame-to-frame relations
[Figure: frame hierarchy, with legend]
- FrameNet Berkeley Project (1)
- Database of frames for the core lexicon of English
- 800 frames, 10,000 lemmas, 135,000 annotated sentences
- Hierarchy: 7 frame relations, 1136 edges, 86 roots
(1) http://framenet.icsi.berkeley.edu
29FATE Corpus annotation: an example
[Figure: T with Collins syntactic analysis; the frame and its FEE marked]
30FATE Corpus annotation: an example
[Figure: T with Collins syntactic analysis; frame, FEE, FE and FE filler marked]
Maximization principle: choose the largest constituent possible when annotating
31FATE Corpus annotation: an example
[Figure: T and H with Collins syntactic analysis; frame, FEE, FE and FE filler marked]
DEATH(Protagonist: Hiddleston / person, Cause: avalanche)
32FrameNet and Salsa Project
- FrameNet Berkeley Project (1)
- Database of frames for the core lexicon of English
- 800 frames, 10,000 lemmas, 135,000 annotated sentences from BNC
- SALSA Project (2)
- A German corpus with frame annotation (20,000 verbal instances)
- Semantic frame-based lexicon for German
- Methods for automation and application of frame-semantic information (SRL, RTE, discourse interpretation, etc.)
(1) http://framenet.icsi.berkeley.edu/
(2) http://www.coli.uni-saarland.de/projects/salsa/
34FrameNet and RTE
[Figure: frame annotation of T and H]
- Frame-to-frame inference
- KILLING --- cause ---> DEATH
- Cause maps to Cause
- Victim maps to Protagonist
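The frame-to-frame inference on this slide can be sketched as a table lookup followed by a role renaming. The relation and role mapping are the ones listed above; the table layout, the function `project` and the example fillers (borrowed from the avalanche pair shown earlier in the talk) are illustrative assumptions, not FrameNet's own data model.

```python
# Frame-to-frame relation and role mapping as listed on the slide
RELATIONS = {
    ("KILLING", "DEATH"): {
        "relation": "cause",
        "role_map": {"Cause": "Cause", "Victim": "Protagonist"},
    },
}


def project(frame, roles, target_frame):
    """Project a frame instance onto target_frame; None if no relation is known."""
    entry = RELATIONS.get((frame, target_frame))
    if entry is None:
        return None
    mapped = {}
    for role, filler in roles.items():
        # Roles without a mapping are dropped rather than guessed.
        if role in entry["role_map"]:
            mapped[entry["role_map"][role]] = filler
    return target_frame, mapped


# Illustrative T-side annotation: "... an avalanche ... killing at least 11 people"
t_frame = ("KILLING", {"Cause": "an avalanche", "Victim": "at least 11 people"})

projected = project(*t_frame, "DEATH")
print(projected)
```

The projected DEATH instance can then be compared with the DEATH frame annotated on H. Deciding that "at least 11 people" matches "humans" is a separate lexical step that this sketch does not attempt.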