Title: Approximating Textual Entailment with LFG and FrameNet Frames
1. Approximating Textual Entailment with LFG and FrameNet Frames
- Aljoscha Burchardt, Computational Linguistics Department, Saarland University, Saarbrücken
- Anette Frank, Language Technology Lab, DFKI GmbH, Saarbrücken
- SALSA Workshop, Saarbrücken, June 27-28, 2006: Multilingual semantic annotation, theory and applications
2. Overview
- The PASCAL Recognizing Textual Entailment task (RTE): What is it, and how to approach it?
- The SALSA RTE System: A baseline system for approximating Textual Entailment
  - Building on LFG-based syntactic analysis and frame semantics
  - Computing structural and semantic overlap as an approximation of textual entailment in a learning architecture
  - Open architecture for future extensions towards deeper modelling
- Linguistic analysis: LFG and FrameNet frames
- Approximating Textual Entailment
  - Computing a match graph for structural and semantic overlap
  - Feature extraction and machine learning
- Results of this year's RTE task
- Discussion, error analysis and perspectives
- Conclusion
3. The PASCAL RTE Task: What is it?
- A recently established Challenge for the NLP/AI community
- Testing a system's capacity to recognize Textual Entailment
- Realistic, open-domain data set
  - drawn from system outputs in NLP applications: IR, IE, QA, SUM
- Controlled set-up: balanced training and test sets
  - 800/800 text-hypothesis pairs
4. Taking a look at the data
- Fine-grained linguistic analysis
  - T: Oscar-winning actor Nicolas Cage's new son and Superman have sth. in common ...
  - H: Nicolas Cage's new son was awarded an Oscar. No (IE)
- Lexical semantics and paraphrases (nominalisation, synonymy)
  - T: On December 10th 1936 King Edward VIII gave up his right to the British throne.
  - H: King Edward VIII abdicated on the 10th of December, 1936. Yes (QA)
- Inference and world knowledge
  - T: Olson, 62, previously worked as a partner at Ernst & Young LLP, before joining the Fed board in 2001, to serve a term ending in 2010.
  - H: Olson is a member of the Fed board. Yes (IE)
- Modality
  - T: U.S. Secretary of State Condoleezza Rice said Thursday that North Korea should return to nuclear disarmament talks and ...
  - H: North Korea says it will rejoin nuclear talks. No (SUM)
- Temporal and local restrictions (monotonicity)
  - T: In most Pacific countries there are very few women in parliament.
  - H: Women are poorly represented in parliament. Yes (!) (IR)
5. Textual Entailment
"We say that T entails H if the meaning of H can be inferred from the meaning of T, as would typically be interpreted by people. This somewhat informal definition is based on (and assumes) common human understanding of language as well as common background knowledge. Cases in which inference is very probable (but not completely certain) are still judged True." (Dagan, Glickman, Magnini, RTE 2005 Workshop Proceedings)
Circumscribing Textual Entailment? See discussions in Zaenen, Karttunen and Crouch (2005), Manning (2006), Crouch, Karttunen and Zaenen (2006).
6. A Challenge, ... in fact
- T: Hundreds of divers and treasure hunters, including the Duke of Argyll, have risked their lives in the dangerous waters of the Isle of Mull trying to discover the reputed 30,000,000 pounds in Gold carried by this vessel--the target of the most enduring treasure hunt in British history.
- H: Shipwreck salvaging was attempted. (Yes, IR)
- T: The 26-member International Energy Agency said, Friday, that member countries would release oil to help relieve the U.S. fuel crisis caused by Hurricane Katrina.
- H: Responding to a plea from the International Energy Agency for member countries to release reserves, Canada is prepared to help. (No, SUM)
7. Approximating Textual Entailment
- How to reconcile obvious complexity and required depth?
  - Parsing complexity
  - Semantic analysis
    - Argument structure, anaphora, lexical meaning, semantic and discourse relations, presupposition, ...
  - Inferences based on linguistic meaning and world knowledge
- Statistical/ML approximation of Textual Entailment
  - Based on state-of-the-art syntactic and shallow semantic analysis
  - Measuring structural and semantic overlap
- With possibilities for extensions towards deeper modelling
  - Inference on partial structures (lexical entailment)
  - Targeted modelling of specific aspects, e.g. modality contexts
8. A baseline system for approximating Textual Entailment
- Fine-grained LFG-based syntactic analysis
  - English LFG grammar (Riezler et al. 2002): broad-coverage with high-quality probabilistic disambiguation
- Frame Semantics
  - Coarse-grained lexical-semantic classification of predicates with role-based argument structure encoding
- Extended semantic representations: WordNet senses, SUMO concepts
- Computing structural and semantic overlap
  - Hypothesis: high/low ratio of H/T overlap => entailment yes/no
9. A baseline system for approximating Textual Entailment
- Fine-grained LFG-based syntactic analysis
  - English LFG grammar (Riezler et al. 2002): broad-coverage with high-quality probabilistic disambiguation
- Frame Semantics
  - Coarse-grained lexical-semantic classification of predicates with role-based argument structure encoding
- Extended semantic representations: WordNet senses, SUMO concepts
- Computing structural and semantic overlap
  - A learning problem: measures of overlap, weighted entailment decision
10. The SALSA RTE System: Linguistic analysis components and integration
- XLE parsing: LFG f-structure
- Fred/Detour, Rosy: frames, roles
- WordNet-based WSD: WordNet, SUMO
- Integration: f-structure w/ (extended) frame-semantic projection, using XLE term rewriting system (Crouch 2005)
11. Linguistic Components: LFG analysis combined with FrameNet frames
- Deep syntactic LFG analysis
  - Broad-coverage grammar with probabilistic disambiguation
  - Fine-grained grammatical function analysis with integrated NER
- Performance on RTE-II development and test set
  - Coverage: ≈ 99% (≈ 86% full parses, ≈ 13% partial parses)
  - On RTE H/T pairs: ≈ 76% fully analysed pairs, ≈ 2% single analysis only
- Frame semantic analysis
  - Focusing on lexical semantic classes and role-based argument structure
  - Disregarding aspects of deep semantics: modality, quantification, ...
  - Normalisation over syntactic and lexical alternations (diatheses, lexicalisation, PoS)
12. Linguistic Components: Frame and role assignment
- Shalmaneser (Erk & Pado, 2006)
  - Shallow semantic parser for FrameNet frame and role assignment
  - Fred: statistical frame assignment
    - WSD system for predicates, in terms of frames
  - Rosy: semantic role assignment
    - Argument recognition and argument labelling
    - Using state-of-the-art features from robust syntactic parsing
- Detour (to FrameNet via WordNet) (Burchardt et al., 2005)
  - Aim: overcome lexical gaps in FrameNet
  - A rule-based frame assignment system that takes a detour to FrameNet via WordNet
  - Determine similarity of unknown LUs to existing frames (their LUs) based on WordNet-similarity measures
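The Detour idea (assign a frame to a lexical unit that FrameNet does not list, by finding the closest known LU in WordNet) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the tiny hypernym table stands in for WordNet, the frame inventory stands in for FrameNet, and shortest path length stands in for whatever similarity measures Detour actually uses.

```python
# Toy stand-in for WordNet: word -> hypernym (invented entries).
HYPERNYM = {
    "assassinate": "kill", "murder": "kill", "kill": "act",
    "purchase": "buy", "buy": "get", "get": "act",
}
# Toy stand-in for FrameNet: frame -> known lexical units (invented subset).
FRAME_LUS = {"Killing": {"kill", "murder"}, "Commerce_buy": {"buy"}}


def ancestors(word):
    """Return [word, hypernym, hypernym of hypernym, ...]."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def path_length(a, b):
    """Edges on the shortest path from a to b via a common ancestor, or None."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    lengths = [i + chain_b.index(n) for i, n in enumerate(chain_a) if n in chain_b]
    return min(lengths) if lengths else None


def detour_frame(unknown_lu):
    """Return (frame, distance) of the closest known LU, or None."""
    best = None
    for frame, lus in FRAME_LUS.items():
        for lu in lus:
            d = path_length(unknown_lu, lu)
            if d is not None and (best is None or d < best[1]):
                best = (frame, d)
    return best
```

With these toy tables, `detour_frame("assassinate")` picks `Killing` because `kill` is one step away, while a word missing from the table yields `None`.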
13. Linguistic Components: Frame and role assignment
[Figure: example annotations labelled Fred, Rosy, Fred/Detour, Rosy]
14. Linguistic Components: Frame and role assignment
- Fred/Detour: different sense assignments (FN coverage)
15. Linguistic Components: Integration and extended semantics projection
- Porting frame and role assignments to LFG f-structure
  - Defining a frame semantics projection using head lemmata as interface layer (accounts for parser discrepancies)
  - Using XLE rewrite system (Crouch 2005)
- Head-indexed frame and role assignments
16. Linguistic Components: Integration and extended semantics projection
- Rule-based extensions of LFG-frame structures
  - Frames corresponding to LFG NE classes
    - Locations, companies, dates, ...
  - Extra-thematic roles, based on LFG adjunct classes, etc.
    - Time, Reason, Location, Concessive, ...
    - adjunct(Z,Y), ntype_sem(Y,time) => s(Z,SemZ), s(Y,SemY), time(SemZ,SemY).
- Extended semantics projection: WordNet and SUMO classes
  - WSD: Banerjee & Pedersen, 2003
  - WordNet SUMO/MILO mapping: Niles and Pease (2001)
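The adjunct rule on this slide can be read as: whenever Z has an adjunct Y of semantic type time, project both onto semantic nodes and connect them with a time role. A minimal Python sketch of that reading follows; the tuple encoding of facts and the generated names `sem1`, `sem2` are assumptions, and unlike the XLE rewrite system this sketch only returns the derived facts and does not consume its input.

```python
from itertools import count


def add_time_roles(facts):
    """Derive s(Z,SemZ), s(Y,SemY), time(SemZ,SemY) for every time adjunct.

    facts: list of tuples such as ("adjunct", Z, Y) and ("ntype_sem", Y, "time").
    """
    fresh = count(1)
    sem = {}  # f-structure node -> semantic node (hypothetical naming scheme)

    def s(node):
        if node not in sem:
            sem[node] = f"sem{next(fresh)}"
        return sem[node]

    time_nodes = {f[1] for f in facts if f[0] == "ntype_sem" and f[2] == "time"}
    derived = []
    for f in facts:
        if f[0] == "adjunct" and f[2] in time_nodes:
            z, y = f[1], f[2]
            derived += [("s", z, s(z)), ("s", y, s(y)), ("time", s(z), s(y))]
    return derived


# Toy facts for "In 1983, Aki Kaurismäki directed ..." (node names invented).
toy_facts = [
    ("adjunct", "direct", "in_1983"),
    ("ntype_sem", "in_1983", "time"),
]
time_facts = add_time_roles(toy_facts)
```

Here `time_facts` holds the two projection facts and one `time` edge between the semantic nodes of `direct` and `in_1983`.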
17. Linguistic Components: Integration and extended semantics projection
- Normalisations of syntactic structure
  - Passive: Mapping SUBJ and OBJ to dsubj and dobj argument slots
  - Coindexing relative pronouns and relativised head, appositives, etc.
  - Heuristic rules collect antecedent candidate sets for pronominals
- FEF: Frame-Exchange-Format
  - (Partial) Visualisation of extended syntactic-semantic graph structures in FEFViewer (Alexander Koller, Coli Saarbrücken)
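The passive normalisation can be sketched as a mapping from surface grammatical functions to deep argument slots. This is only an illustration: the flat dictionary encoding and the feature names `PASSIVE` and `OBL-AG` are assumptions about how the f-structure marks passives and by-phrases; only `dsubj` and `dobj` come from the slide.

```python
def normalise_diathesis(fs):
    """Map surface functions of a flat f-structure dict to dsubj/dobj slots."""
    if fs.get("PASSIVE") == "+":
        # Passive: the surface SUBJ is the deep object; the by-phrase
        # (assumed to be stored under OBL-AG) is the deep subject, if present.
        return {"pred": fs["PRED"], "dsubj": fs.get("OBL-AG"), "dobj": fs.get("SUBJ")}
    return {"pred": fs["PRED"], "dsubj": fs.get("SUBJ"), "dobj": fs.get("OBJ")}


# "a US soldier was killed by a sniper" vs. "a sniper killed a US soldier"
passive = {"PRED": "kill", "PASSIVE": "+", "SUBJ": "soldier", "OBL-AG": "sniper"}
active = {"PRED": "kill", "SUBJ": "sniper", "OBJ": "soldier"}
same_arguments = normalise_diathesis(passive) == normalise_diathesis(active)
```

After normalisation both variants expose the same argument slots, which is what lets an active hypothesis match a passive text.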
18. A walk-through example from RTE 2006
- Pair 716
  - Text: In 1983, Aki Kaurismäki directed his first full-time feature.
  - Hypothesis: Aki Kaurismäki directed a film.
19. LFG F-Structures in XLE graphical display
20. Automatic Frame Annotation for Text in SALTO Viewer
(Collins Parse)
21. Automatic Frame Annotation for Hypothesis
- 716_h: Aki Kaurismäki directed a film.
22. LFG and Frames for Hypothesis in FEFViewer
Aki Kaurismäki directed a film.
23. The SALSA RTE System
- Linguistic analysis components and integration (applied to text and hypothesis)
  - XLE parsing: LFG f-structure
  - Fred/Detour, Rosy: frames, roles
  - WordNet-based WSD: WordNet, SUMO
  - Result: f-structure w/ (extended) frame-semantic projection (f-structure w/ frames and concepts)
- Recognizing Textual Entailment: Graph matching and statistical approximation
  - text-hypothesis-match graph
    - matching nodes and edges
    - different match types (similarity types)
    - extensions for deeper modelling (modality, lexical entailment)
  - Feature extraction
  - Model training and classification
24. Hypothesis-Text-Match Graphs: Computing structural and semantic overlap
- Computing structural and semantic overlap
  - Computing a match graph from text and hypothesis graphs
  - Matches are established by different aspects and degrees of similarity
- Approximating textual entailment
  - High/low overlap ratio of hypothesis and match graph => entailment yes/no
25. Hypothesis-Text-Match Graphs: Different matching strategies
- (I) Match graph/Text overlap: Ratio of matched material and non-matched material in Text
- (II) Match graph/Hypothesis overlap: Ratio of the matched material and non-matched material in Hypothesis
- T: Leo Fender invented the first electric guitar and the electric bass guitar.
- H: Leo Fender invented the first electric guitar.
- I: 7/12 = 58%; II: 7/7 = 100%
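The two figures in the Leo Fender example are the share of matched material in the text graph (I) and in the hypothesis graph (II). A short worked computation, taking the counts 7, 12 and 7 from the slide (the function name is illustrative):

```python
def overlap_percent(matched, total):
    """Share of matched material in a graph, in percent."""
    if total == 0:
        raise ValueError("empty graph")
    return 100.0 * matched / total


match_size, text_size, hypothesis_size = 7, 12, 7  # counts from the slide
text_overlap = overlap_percent(match_size, text_size)              # strategy I
hypothesis_overlap = overlap_percent(match_size, hypothesis_size)  # strategy II
print(f"I: {text_overlap:.0f}%  II: {hypothesis_overlap:.0f}%")
```

Measuring against the hypothesis (II) gives full overlap here, which fits the intuition that everything stated in H is covered by T even though T says more.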
26. Hypothesis-Text-Match Graphs: Computing structural and semantic overlap
- Graph matching using XLE rewrite system
  - Defining different types of match conditions on t- and h-graph, triggering new nodes and edges in m-graph, with match-type info
- Matching algorithm tied to rewrite-logic
  - Locally defined matches (no graph traversal)
  - Starting with (multiple) node matches
  - Edge matches restricted to connect matched nodes
- Example (text-hypothesis => text-hypothesis-match):
  frame(hx1,killing), frame(ty1,killing) => frame(m(z1,x1,y1),killing), match_type(m(z1,x1,y1),killing,frame)
- Rewrite rule:
  frame(hX1,Frame), frame(tY1,Frame) => frame(m(Z1,X1,Y1),Frame), match_type(m(Z1,X1,Y1),Frame,frame).
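The frame rewrite rule above amounts to a purely local join: every hypothesis node and text node carrying the same frame yield a new match node with a match-type fact. A minimal Python sketch of that behaviour follows; it is not the XLE rewrite system, and the dictionary and tuple encodings are assumptions.

```python
from itertools import count


def match_frames(h_frames, t_frames):
    """Create match-graph facts for identical frames in hypothesis and text.

    h_frames, t_frames: dict mapping a node id to its frame name.
    """
    fresh = count(1)
    facts = []
    for x, h_frame in h_frames.items():
        for y, t_frame in t_frames.items():
            if h_frame == t_frame:
                m = ("m", f"z{next(fresh)}", x, y)  # new match node m(Z,X,Y)
                facts.append(("frame", m, h_frame))
                facts.append(("match_type", m, h_frame, "frame"))
    return facts


m_facts = match_frames({"x1": "killing"}, {"y1": "killing", "y2": "attack"})
```

No graph traversal is involved: each pair of nodes is tested in isolation, which mirrors the "locally defined matches" bullet.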
27. Hypothesis-Text-Match Graphs: Computing structural and semantic overlap
- Aspects of similarity
  - Syntax-based (i.e. lexical and structural) similarity
    - Identical PREDs and attribute values trigger node matches
    - Identical ATTRIBUTES (GF, morph. features) trigger edge matches
  - Semantics-based similarity
    - Identical FRAMES and CONCEPTS trigger node matches
    - Identical ROLES trigger edge matches
  - Match graph consists of identical partial syntactic-semantic graphs
- Degrees of similarity (strict vs. weak matching)
  - Non-identical, but structurally related PREDs
    - coreferentially related (relative clauses, appositives, pronominals)
  - Non-identical, but semantically related PREDs (WN-related, path < 3)
  - Non-identical, but semantically related FRAMES (FN-/Detour-related)
  - Match graph establishes overlapping partial graphs (marked by match types)
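The distinction between strict and weak predicate matches can be sketched as a function that returns a match type. The threshold `path < 3` is from the slide; the path table below is invented and stands in for a real WordNet lookup, and the match-type labels are illustrative names.

```python
MAX_WN_PATH = 3  # slide: "WN-related, path < 3"

# Toy WordNet path lengths between predicates (invented values).
TOY_WN_PATH = {
    frozenset(("die", "pass_away")): 0,
    frozenset(("die", "hit")): 6,
}


def pred_match_type(h_pred, t_pred):
    """Return the match type for two predicates, or None if they do not match."""
    if h_pred == t_pred:
        return "pred_identical"    # strict match
    d = TOY_WN_PATH.get(frozenset((h_pred, t_pred)))
    if d is not None and d < MAX_WN_PATH:
        return "pred_wn_related"   # weak match
    return None
```

Keeping the match type on each node lets later stages count strict and weak matches as separate features.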
28. t: In 1983, Aki Kaurismäki directed his first full-time feature.
29. Approximating Textual Entailment: Extensions for deeper modelling: Modality
- Detecting indicators of inconsistent modality types
  - T: A pet must have rabies protection confirmed by a blood test.
  - H: A case of rabies was confirmed.
- Marking modal contexts in text and hypothesis
  - 5 modality types: conditional, future, diamond, box, negation
- Handling inconsistent modality types in matching process
  - Introducing negatively marked match nodes
  - Blocking embedded structures for similarity-based matches
  - Thus, reducing the size of the match graph
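A minimal sketch of the modality check, assuming each node carries at most one of the five modality types and that unmarked (factual) contexts are represented as `None`. The return labels are illustrative; treating "must" in the rabies example as a box context is likewise an assumption.

```python
MODALITY_TYPES = {"conditional", "future", "diamond", "box", "negation"}


def modal_match(h_ctx, t_ctx):
    """Compare the modal contexts of a hypothesis node and a text node.

    Each context is one of MODALITY_TYPES, or None for an unmarked context.
    """
    for ctx in (h_ctx, t_ctx):
        if ctx is not None and ctx not in MODALITY_TYPES:
            raise ValueError(f"unknown modality type: {ctx}")
    return "match" if h_ctx == t_ctx else "modal_ctxt_mismatch"


# T: "A pet must have rabies protection confirmed ..." (assumed box context)
# H: "A case of rabies was confirmed." (unmarked)
rabies_result = modal_match(None, "box")
```

A mismatch would produce a negatively marked match node, and structures embedded under it would not be offered for similarity-based matching.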
30. Approximating Textual Entailment: Extensions for deeper modelling: Lexical Entailments
- Bridging partial non-matching text and hypothesis pairs
  - T: Olson, 62, previously worked as a partner at Ernst & Young LLP, as a Minnesota bank president and as a congressional aide, before joining the Fed board in 2001, to serve a term ending in 2010.
  - H: Olson is a member of the Fed board.
- Lexically induced inferences, defined as rewrite rules on h/t/m graphs
  - t: (X1) joins X2, h: (Y1) member-of Y2, m(Z2,Y2,X2) => match_type(heuristic_entailment_match).
- Similar non-lexical heuristic inferences
  - Appositions: prime minister X => X is prime minister
  - Possessive constructions: X's Y => the Y of X
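The joins/member-of rule can be paraphrased as: if the text says X1 joins X2, the hypothesis says Y1 is a member of Y2, and Y2 is already matched with X2, add a heuristic entailment match. A small Python sketch of that paraphrase follows; the edge and node encodings are assumptions, and the rule table contains only the pair shown on the slide.

```python
# (text predicate, hypothesis predicate) pairs licensed as lexical entailments.
LEXICAL_ENTAILMENTS = {("join", "member-of")}


def heuristic_entailment_matches(t_edges, h_edges, node_matches):
    """Return match-type facts for lexically entailed predicate pairs.

    t_edges, h_edges: lists of (pred, arg1, arg2).
    node_matches: set of (h_node, t_node) pairs already in the match graph.
    """
    out = []
    for t_pred, _x1, x2 in t_edges:
        for h_pred, _y1, y2 in h_edges:
            if (t_pred, h_pred) in LEXICAL_ENTAILMENTS and (y2, x2) in node_matches:
                out.append(("match_type", (h_pred, t_pred), "heuristic_entailment_match"))
    return out


t_edges = [("join", "olson_t", "fed_board_t")]
h_edges = [("member-of", "olson_h", "fed_board_h")]
bridged = heuristic_entailment_matches(t_edges, h_edges, {("fed_board_h", "fed_board_t")})
```

The rule only fires when the second arguments are already matched, so it bridges a gap inside an existing match cluster without creating a match from nothing.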
31. Approximating Textual Entailment: Machine learning
- Feature selection with WEKA classifiers
  - Many learners select intuitively important features, but also idiosyncratic ones
- Selected learners and models
  - Model 1: Simple Conjunctive Rule classifier generated a single rule
    - Medium/high threshold on pred/frame matches as criterion for rejection
    - High degree of frame similarity w/ medium predicate similarity models entailment
  - Model 2: Meta-classifier LogitBoost (additive logistic regression)
    - Features (1.-4.) used in iteration; final feature set: 1., 2., 4.
1. No. of predicate matches relative to hypothesis
2. No. of frame (Fred,Detour) matches relative to hypothesis
3. No. of roles (Rosy) matches relative to hypothesis
4. Match graph size rel. to hypothesis, incl. syn, sem, ontological info
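To make features 1, 2 and 4 and the single-rule model concrete, here is a minimal sketch. The feature definitions follow the list above (counts relative to the hypothesis); the threshold values `pred_min` and `frame_min` are hypothetical placeholders, since the slide only speaks of "medium" and "high" thresholds, and the function is not the rule WEKA actually learned.

```python
def relative_features(pred_matches, frame_matches, match_size,
                      h_preds, h_frames, h_size):
    """Features 1, 2 and 4: match counts relative to the hypothesis graph."""
    def rel(n, total):
        return n / total if total else 0.0

    return {
        "pred": rel(pred_matches, h_preds),     # feature 1
        "frame": rel(frame_matches, h_frames),  # feature 2
        "graph": rel(match_size, h_size),       # feature 4
    }


def entails(features, pred_min=0.5, frame_min=0.8):
    """Single conjunctive rule: medium predicate and high frame similarity."""
    return features["pred"] >= pred_min and features["frame"] >= frame_min


full_overlap = relative_features(2, 2, 7, 2, 2, 7)
weak_overlap = relative_features(1, 0, 3, 2, 2, 7)
decisions = (entails(full_overlap), entails(weak_overlap))
```

Normalising by the hypothesis rather than the text keeps long texts from diluting the scores.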
32. Results in RTE-II
- SALSA RTE system results
  - Both models score SUM > IR > QA > IE
  - Refined model better on QA; simple model better on SUM
- Overall RTE-II results
  - Average accuracy 60% (Median 59%)
  - Shallow overlap measures vary considerably between data sets, whereas deeper approaches remain more stable
  - Tendency towards deeper, knowledge-rich methods

Dev set    all tasks
Model 1    61.1
Model 2    59.8

RTE-II     all tasks   IE     IR     QA     SUM
Model 1    59.0        49.5   59.5   54.5   72.5
Model 2    57.8        48.5   58.5   57.0   67.0

Accuracy range (in %)   53 - 56   58 - 61   62 - 64   74 - 75
No. of groups           7         11        3         2
33. Discussion of Results: True positives
- High ratio of matching predicates, frames, and f-structure
- Typical phenomena
  - Non-identical predicates compensated by matching frames (626)
  - Missing frame assignments compensated by WN relatedness
    - die / pass away (wn-related, 103)
  - Active-passive diathesis resolved by f-structure normalisation (129)
- Relative overlap measures also work for longer hypotheses
- T: Everest summiter David Hiddleston has passed away in an avalanche of Mt. Tasman. H: A person died in an avalanche. (103)
- T: An earthquake has hit the east coast of Hokkaido, Japan, with a magnitude of 7.0 Mw. H: An earthquake occurred on the east coast of Hokkaido, Japan. (626)
- T: In one of the latest attacks, a US soldier on patrol was killed by a single shot from a sniper in northern Baghdad, the military said yesterday. H: A sniper killed a U.S. soldier on patrol in Baghdad with a single shot. (129)
34. Discussion of Results: True negatives
- Modal context marking seems to be effective
  - 27% of all true negatives involved modality mismatches, while only 11.9% of all sentences involve marked modal contexts
- Future plans
  - Extend to lexically induced modality/facticity indicators
  - Testing for non-monotonicity contexts
- T: The goal of preserving indigenous culture can hardly be achieved by a handful of researchers and curators at museums of ethnology and folk culture. H: Indigenous folk art is preserved. (233)
- T: Even today, within the deepest recesses of our mind, lies a primordial fear that will not allow us to enter the sea without thinking about the possibility of being attacked by a shark. H: A shark attacked a human being. (322)
35. Error analysis: False positives
- Typical cases
  - Semantic dissimilarity
    - Non-matching predicates within larger match graphs, which are in fact semantically dissimilar
  - Structural distance
    - Matching nodes within a match graph correspond to far distant nodes in the text graph compared to neighbouring nodes in the match graph
36. Error analysis: False positives
Unconnected nodes matched with distant nodes in text graph
- T: Some 420 people have been hanged in Singapore since 1991, mostly for drug trafficking, an Amnesty International 2004 report said. That gives the country of 4.4 million people the highest execution rate in the world relative to population.
- H: 4.4 million people were executed in Singapore. (198) False positive
37. Error analysis: False positives
- Graph matching process
  - Not a top-down process
  - Starts by relating any nodes, and builds growing clusters by finding matching edges
  - This allows criss-cross matching of nodes in the match graph
- Introduce weighted edges that reflect the relative distance of pairs of match nodes in text and hypothesis (path distance)
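One way to picture the proposed distance weighting is to compare, for a pair of matched nodes, their path distance in the hypothesis graph with their path distance in the text graph. The sketch below is an assumption about how such a weight could be defined (ratio of the two distances, capped at 1); the slides only state the idea, and the toy graphs for the Singapore example are invented.

```python
from collections import deque


def path_distance(graph, start, goal):
    """Shortest number of edges between two nodes of an undirected graph."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # not connected


def edge_weight(h_graph, t_graph, h_pair, t_pair):
    """Weight in [0, 1]: 1 if the pair is as close in the text as in the hypothesis."""
    d_h = path_distance(h_graph, *h_pair)
    d_t = path_distance(t_graph, *t_pair)
    if d_h is None or d_t is None or d_t == 0:
        return 0.0
    return min(1.0, d_h / d_t)


# Toy graphs (adjacency lists, invented) for pair 198.
h_graph = {"execute": ["people"], "people": ["execute"]}
t_graph = {
    "hang": ["people_420", "say"], "people_420": ["hang"],
    "say": ["hang", "give"], "give": ["say", "country"],
    "country": ["give", "people_4.4m"], "people_4.4m": ["country"],
}
weight = edge_weight(h_graph, t_graph, ("execute", "people"), ("hang", "people_4.4m"))
```

In this toy setting the predicate and "4.4 million people" are adjacent in the hypothesis but four edges apart in the text, so the match would be down-weighted to 0.25 rather than counted fully.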
39. Conclusions
- A medium-depth approach: Approximating Textual Entailment
  - Lexical and syntactic overlap, semantic similarity (WordNet)
  - Frame semantics: lexical semantic classes, argument structure
  - Flexible graph matching method with extensions to deeper processing
    - Modality contexts, lexical inferences
- Perspectives for future extensions
  - Engineering and fine-tuning
    - Combination with shallow (and deeper) methods in voting architecture
  - Frame and role assignment
    - Sense discrimination: outlier detection (Erk, 2006)
    - Coverage: integration with other resources (VerbNet, NomBank)
  - Modelling dissimilarity
    - Semantic distance measures and distance-weighted graph edges
    - Acquisition of lexical modality indicators and (lexical) entailment rules
40. References
- RTE Proceedings
  - RTE Challenge Homepage: http://www.pascal-network.org/Challenges/RTE2
  - I. Dagan, O. Glickman, and B. Magnini (2005). The PASCAL recognising textual entailment challenge. In Proceedings of the RTE-1 Workshop, Southampton, UK.
  - B. Magnini and I. Dagan, editors (2006). Proceedings of the Second PASCAL Recognising Textual Entailment Challenge, Venice, Italy.
  - Electronic proceedings and slides: http://ir-srv.cs.biu.ac.il:64080/RTE2/proceedings/
- Discussion about RTE Task
  - Zaenen, Karttunen and Crouch (2005). Local Textual Inference: can it be defined or circumscribed? In ACL 2005 Workshop on Empirical Modelling of Semantic Equivalence and Entailment, Ann Arbor, Michigan.
  - Manning (2006). Local Textual Inference: It's hard to circumscribe, but you know it when you see it - and NLP needs it. MS, Stanford University.
  - Crouch, Karttunen and Zaenen (2006). Circumscribing is not excluding: A reply to Manning. MS, Palo Alto Research Center.
  - All papers: http://www2.parc.com/istl/members/zaenen/
41. References
- A. Burchardt and A. Frank (2006). Approximating Textual Entailment with LFG and FrameNet Frames. In Proceedings of the Second Recognising Textual Entailment Workshop, Venice, Italy. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
- K. Erk and S. Pado (2006). Shalmaneser - a flexible toolbox for semantic role assignment. In Proceedings of LREC-06, Genoa. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
- A. Burchardt, K. Erk, and A. Frank (2005). A WordNet Detour to FrameNet. In Proceedings of the GLDV 2005 Workshop GermaNet II, Bonn. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
- R. Crouch (2005). Packed Rewriting for Mapping Semantics to KR. In Proceedings of the Sixth International Workshop on Computational Semantics, Tilburg. http://www2.parc.com/istl/groups/nltt/papers/iwcs05_crouch.pdf
43. Approximating Textual Entailment: Similarity/Entailment measures and feature extraction
Feature groups, measured on the text graph, the hypothesis graph and the match graph, and as proportional h/t and m/h ratios:
- lexical: lex_id (text, hypothesis and match graph); ratio_lexid
- syntactic: node_m (pred, coref, pro), edge_syn_m (all, gf, subc); ratio_nodes, ratio_edges
- semantic, strict: (lfg_)frames_t, (lfg_)roles_t; (lfg_)frames_h, (lfg_)roles_h; (lfg_)frames_m, (lfg_)roles_m; ratio_(lfg_)frames, ratio_(lfg_)roles
- semantic, weak: node_frameFN/derived_m, mode_framerel/detour/wnrel_m, node_heuristic_entailment_m, node_modal_ctxt_mismatch_m
- connectedness: clusters_no, clusters_avg_size; clusters_avgsize_rel_h, clusters_abssize_rel_h
- other: fragmentary (text and hypothesis), rte_task
44. Error analysis: Sparse features
- Feature set
  - High-frequency features that measure similarity
  - Few, and low-frequency features that model dissimilarity
- Bias towards similarity
  - 29.5% false positives
  - 12.75% false negatives
- Plans for further development
  - Introducing distance measures (semantic and structural)
  - Getting a grip on remaining differences, i.e. non-matched edges between matching clusters
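The size of the bias can be made explicit with a short calculation. It assumes that both rates are percentages of all 800 test pairs; the slide does not state this, but the rates then correspond to whole numbers of pairs and to an accuracy of 57.75%, which is consistent with the 57.8 reported for Model 2.

```python
TEST_PAIRS = 800                      # size of the RTE-II test set
false_pos, false_neg = 29.5, 12.75    # percentages from the slide

error = false_pos + false_neg         # total error rate in percent
accuracy = 100 - error                # implied accuracy in percent
fp_pairs = round(TEST_PAIRS * false_pos / 100)
fn_pairs = round(TEST_PAIRS * false_neg / 100)
fp_share = false_pos / error          # share of errors that are false positives

print(f"accuracy {accuracy}%, {fp_pairs} FP vs {fn_pairs} FN, "
      f"{fp_share:.0%} of errors are false positives")
```

Under this reading roughly seven in ten errors are false positives, which is the imbalance that the planned dissimilarity features are meant to address.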