Approximating Textual Entailment with LFG and FrameNet Frames

1
Approximating Textual Entailment with LFG and FrameNet Frames

Aljoscha Burchardt, Computational Linguistics Department, Saarland University, Saarbrücken
Anette Frank, Language Technology Lab, DFKI GmbH, Saarbrücken

SALSA Workshop, Saarbrücken, June 27-28, 2006
Multilingual semantic annotation: theory and applications
2
Overview

The PASCAL Recognizing Textual Entailment task (RTE): What is it, and how to approach it?
The SALSA RTE System: A baseline system for approximating Textual Entailment
Building on LFG-based syntactic analysis and frame semantics
Computing structural and semantic overlap as an approximation of textual entailment in a learning architecture
Open architecture for future extensions towards deeper modelling
Linguistic analysis: LFG and FrameNet frames
Approximating Textual Entailment
Computing a match graph for structural and semantic overlap
Feature extraction and machine learning
Results of this year's RTE task
Discussion, error analysis and perspectives
Conclusion

3
The PASCAL RTE Task: What is it?

A recently established challenge for the NLP/AI community
Testing a system's capacity to recognize Textual Entailment
Realistic, open-domain data set
Drawn from system outputs in NLP applications: IR, IE, QA, SUM
Controlled set-up: balanced training and test sets
800/800 text-hypothesis pairs

4
Taking a look at the data

Fine-grained linguistic analysis
T: Oscar-winning actor Nicolas Cage's new son and Superman have something in common ...
H: Nicolas Cage's new son was awarded an Oscar. No (IE)
Lexical semantics and paraphrases (nominalisation, synonymy)
T: On December 10th 1936 King Edward VIII gave up his right to the British throne.
H: King Edward VIII abdicated on the 10th of December, 1936. Yes (QA)
Inference and world knowledge
T: Olson, 62, previously worked as a partner at Ernst & Young LLP, before joining the Fed board in 2001, to serve a term ending in 2010.
H: Olson is a member of the Fed board. Yes (IE)
Modality
T: U.S. Secretary of State Condoleezza Rice said Thursday that North Korea should return to nuclear disarmament talks and ...
H: North Korea says it will rejoin nuclear talks. No (SUM)
Temporal and local restrictions (monotonicity)
T: In most Pacific countries there are very few women in parliament.
H: Women are poorly represented in parliament. Yes (!) (IR)

5
Textual Entailment
"We say that T entails H if the meaning of H can be inferred from the meaning of T, as would typically be interpreted by people. This somewhat informal definition is based on (and assumes) common human understanding of language as well as common background knowledge. Cases in which inference is very probable (but not completely certain) are still judged True." (Dagan, Glickman, Magnini, RTE 2005 Workshop Proceedings)

Circumscribing Textual Entailment? See discussions in Zaenen, Karttunen and Crouch (2005), Manning (2006), Crouch, Karttunen and Zaenen (2006).
6
A Challenge, ... in fact

T: Hundreds of divers and treasure hunters, including the Duke of Argyll, have risked their lives in the dangerous waters of the Isle of Mull trying to discover the reputed 30,000,000 pounds in gold carried by this vessel, the target of the most enduring treasure hunt in British history.
H: Shipwreck salvaging was attempted. (Yes, IR)
T: The 26-member International Energy Agency said, Friday, that member countries would release oil to help relieve the U.S. fuel crisis caused by Hurricane Katrina.
H: Responding to a plea from the International Energy Agency for member countries to release reserves, Canada is prepared to help. (No, SUM)

7
Approximating Textual Entailment

How to reconcile obvious complexity and required depth?
Parsing complexity
Semantic analysis: argument structure, anaphora, lexical meaning, semantic and discourse relations, presupposition, ...
Inferences based on linguistic meaning and world knowledge
Statistical/ML approximation of Textual Entailment
Based on state-of-the-art syntactic and shallow semantic analysis
Measuring structural and semantic overlap
With possibilities for extensions towards deeper modelling
Inference on partial structures (lexical entailment)
Targeted modelling of specific aspects, e.g. modality contexts

8
A baseline system for approximating Textual Entailment

Fine-grained LFG-based syntactic analysis
English LFG grammar (Riezler et al. 2002): broad-coverage with high-quality probabilistic disambiguation
Frame Semantics
Coarse-grained lexical-semantic classification of predicates with role-based argument structure encoding
Extended semantic representations: WordNet senses, SUMO concepts
Computing structural and semantic overlap
Hypothesis: high/low ratio of H/T overlap => entailment yes/no

9
A baseline system for approximating Textual Entailment

Fine-grained LFG-based syntactic analysis
English LFG grammar (Riezler et al. 2002): broad-coverage with high-quality probabilistic disambiguation
Frame Semantics
Coarse-grained lexical-semantic classification of predicates with role-based argument structure encoding
Extended semantic representations: WordNet senses, SUMO concepts
Computing structural and semantic overlap
A learning problem: measures of overlap, weighted entailment decision

10
The SALSA RTE System
Linguistic analysis components and integration
XLE parsing: LFG f-structure
f-structure w/ (extended) frame-semantic projection
Fred/Detour & Rosy: frames & roles
WordNet-based WSD: WordNet & SUMO
Using XLE term rewriting system (Crouch 2005)
11
Linguistic Components: LFG analysis combined with FrameNet frames

Deep syntactic LFG analysis
Broad-coverage grammar with probabilistic disambiguation
Fine-grained grammatical function analysis with integrated NER
Performance on RTE-II development and test set
Coverage: approx. 99% (approx. 86% full parses, approx. 13% partial parses)
On RTE H/T pairs: approx. 76% fully analysed pairs, approx. 2% single analysis only
Frame semantic analysis
Focusing on lexical semantic classes and role-based argument structure
Disregarding aspects of deep semantics: modality, quantification, ...
Normalisation over syntactic and lexical alternations (diatheses, lexicalisation, PoS)

12
Linguistic Components: Frame and role assignment

Shalmaneser (Erk & Pado, 2006)
Shallow semantic parser for FrameNet frame and role assignment
Fred: statistical frame assignment
WSD system for predicates, in terms of frames
Rosy: semantic role assignment
Argument recognition and argument labelling
Using state-of-the-art features from robust syntactic parsing
Detour (to FrameNet via WordNet) (Burchardt et al., 2005)
Aim: overcome lexical gaps in FrameNet
A rule-based frame assignment system that takes a detour to FrameNet via WordNet
Determine similarity of unknown LUs to existing frames (their LUs) based on WordNet-similarity measures

13
Linguistic Components: Frame and role assignment

(Figure: frame and role annotation examples; component labels: Fred, Rosy / Fred, Detour, Rosy)

14
Linguistic Components: Frame and role assignment

Fred & Detour: different sense assignments (FN coverage)

15
Linguistic Components: Integration and extended semantics projection

Porting frame and role assignments to LFG f-structure
Defining a frame semantics projection using head lemmata as interface layer (accounts for parser discrepancies)
Using XLE rewrite system (Crouch 2005)

Head-indexed frame & role assignments
16
Linguistic Components: Integration and extended semantics projection

Rule-based extensions of LFG-frame structures
Frames corresponding to LFG NE classes: locations, companies, dates, ...
Extra-thematic roles, based on LFG adjunct classes, etc.: Time, Reason, Location, Concessive, ...
adjunct(Z,Y), ntype_sem(Y,time) => s(Z,SemZ), s(Y,SemY), time(SemZ,SemY).
Extended semantics projection: WordNet and SUMO classes
WSD: Banerjee & Pedersen, 2003
WordNet to SUMO/MILO mapping: Niles and Pease (2001)

17
Linguistic Components: Integration and extended semantics projection

Normalisations of syntactic structure
Passive: mapping SUBJ and OBJ to dsubj and dobj argument slots
Coindexing relative pronouns and relativised head, appositives, etc.
Heuristic rules collect antecedent candidate sets for pronominals
FEF: Frame-Exchange-Format
(Partial) visualisation of extended syntactic-semantic graph structures in FEFViewer (Alexander Koller, Coli Saarbrücken)
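
The passive normalisation listed on this slide can be illustrated with a minimal sketch. It is not the XLE rewrite rule set used in the system: the dictionary encoding of an f-structure, the PASSIVE flag and the OBL-AG key for the agent phrase are assumptions made for illustration. The example sentence is pair 129 from the true-positives discussion.

```python
def normalise_arguments(fstructure: dict) -> dict:
    """Map surface grammatical functions to deep argument slots.

    Assumption: passives carry PASSIVE=True and the agent by-phrase is
    stored under OBL-AG; both are illustrative encoding choices.
    """
    deep = {"pred": fstructure["PRED"]}
    if fstructure.get("PASSIVE", False):
        # Passive: the surface subject fills the deep object slot,
        # the optional agent phrase fills the deep subject slot.
        if "SUBJ" in fstructure:
            deep["dobj"] = fstructure["SUBJ"]
        if "OBL-AG" in fstructure:
            deep["dsubj"] = fstructure["OBL-AG"]
    else:
        # Active: surface functions map directly to deep slots.
        if "SUBJ" in fstructure:
            deep["dsubj"] = fstructure["SUBJ"]
        if "OBJ" in fstructure:
            deep["dobj"] = fstructure["OBJ"]
    return deep


# "a US soldier was killed by a sniper" vs. "A sniper killed a U.S. soldier"
passive = {"PRED": "kill", "PASSIVE": True, "SUBJ": "soldier", "OBL-AG": "sniper"}
active = {"PRED": "kill", "SUBJ": "sniper", "OBJ": "soldier"}
same_structure = normalise_arguments(passive) == normalise_arguments(active)
print(same_structure)
```

After normalisation both variants expose the same dsubj and dobj slots, so identical edges can match regardless of diathesis.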

18
A walk-through-example from RTE 2006

Pair 716
Text: In 1983, Aki Kaurismäki directed his first full-time feature.
Hypothesis: Aki Kaurismäki directed a film.

19
LFG F-Structures in XLE graphical display
20
Automatic Frame Annotation for Text in SALTO Viewer
Collins Parse
21
Automatic Frame Annotation for Hypothesis

716_h: Aki Kaurismäki directed a film.

22
LFG and Frames for Hypothesis in FEFViewer
Aki Kaurismäki directed a film.
23
The SALSA RTE System
Recognizing Textual Entailment: Graph matching & statistical approximation
Linguistic analysis components and integration (applied to text and hypothesis)
XLE parsing: LFG f-structure
f-structure w/ (extended) frame-semantic projection
Fred/Detour & Rosy: frames & roles
WordNet-based WSD: WordNet & SUMO
Output for text and for hypothesis: f-structure w/ frames & concepts
Text-hypothesis-match graph

matching nodes and edges
different match types (similarity types)
extensions for deeper modelling (modality, lexical entailment)

Feature extraction
Model training & classification
24
Hypothesis-Text-Match Graphs: Computing structural and semantic overlap

Computing structural and semantic overlap
Computing a match graph from text and hypothesis graphs
Matches are established by different aspects and degrees of similarity
Approximating textual entailment
High/low overlap ratio of hypothesis and match graph => entailment yes/no

25
Hypothesis-Text-Match Graphs: Different matching strategies

I. Match graph/Text overlap: ratio of matched material to all (matched and non-matched) material in Text
II. Match graph/Hypothesis overlap: ratio of matched material to all (matched and non-matched) material in Hypothesis
T: Leo Fender invented the first electric guitar and the electric bass guitar.
H: Leo Fender invented the first electric guitar.
I: 7/12 = 58%; II: 7/7 = 100%
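
The two ratios for the Leo Fender pair can be reproduced with a small sketch. It counts word tokens as a stand-in for graph material, which is an assumption: the system matches nodes and edges of syntactic-semantic graphs, not words. For this pair the token counts happen to give the slide's 7/12 and 7/7.

```python
from collections import Counter


def tokens(sentence: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for graph nodes."""
    return Counter(sentence.lower().rstrip(".").split())


def overlap_ratios(text: str, hypothesis: str) -> tuple:
    """Return (match/text ratio, match/hypothesis ratio)."""
    t, h = tokens(text), tokens(hypothesis)
    matched = sum((t & h).values())  # multiset intersection
    return matched / sum(t.values()), matched / sum(h.values())


T = "Leo Fender invented the first electric guitar and the electric bass guitar."
H = "Leo Fender invented the first electric guitar."
text_ratio, hypothesis_ratio = overlap_ratios(T, H)
print(f"I: {text_ratio:.0%}  II: {hypothesis_ratio:.0%}")
```

Strategy II is the one that tracks entailment here: everything in the hypothesis is covered, while the low text ratio only reflects that the text says more.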

26
Hypothesis-Text-Match Graphs: Computing structural and semantic overlap

Graph matching using XLE rewrite system
Defining different types of match conditions on t- and h-graph, triggering new nodes and edges in m-graph, with match-type info
Matching algorithm tied to rewrite-logic
Locally defined matches (no graph traversal)
Starting with (multiple) node matches
Edge matches restricted to connect matched nodes

text-hypothesis => text-hypothesis-match
Example: frame(hx1,killing), frame(ty1,killing) => frame(m(z1,x1,y1),killing), match_type(m(z1,x1,y1),killing,frame)
Rewrite rule: frame(hX1,Frame), frame(tY1,Frame) => frame(m(Z1,X1,Y1),Frame), match_type(m(Z1,X1,Y1),Frame,frame).
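
The effect of the frame rewrite rule can be imitated outside XLE with a short sketch. This is a plain Python loop, not the XLE term rewriting system; the tuple encoding of facts and the function name match_frames are hypothetical. Like the rule, it is purely local: it compares pairs of facts and never traverses the graphs.

```python
from itertools import count


def match_frames(h_facts, t_facts):
    """Create match-graph facts for every hypothesis/text node pair
    that carries the same frame.

    h_facts, t_facts: lists of (node, frame) pairs.
    """
    fresh = count(1)  # supplies new match-node identifiers z1, z2, ...
    m_facts = []
    for h_node, h_frame in h_facts:
        for t_node, t_frame in t_facts:
            if h_frame == t_frame:
                m_node = ("m", f"z{next(fresh)}", h_node, t_node)
                m_facts.append(("frame", m_node, h_frame))
                m_facts.append(("match_type", m_node, h_frame, "frame"))
    return m_facts


facts = match_frames([("x1", "killing")], [("y1", "killing"), ("y2", "motion")])
for fact in facts:
    print(fact)
```

One hypothesis node may match several text nodes, which corresponds to the "(multiple) node matches" the slide starts from.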
27
Hypothesis-Text-Match Graphs: Computing structural and semantic overlap

Aspects of similarity
Syntax-based (i.e. lexical and structural) similarity
Identical PREDs and attribute values trigger node matches
Identical ATTRIBUTES (GF, morph. features) trigger edge matches
Semantics-based similarity
Identical FRAMES and CONCEPTS trigger node matches
Identical ROLES trigger edge matches
Match graph consists of identical partial syntactic & semantic graphs
Degrees of similarity (strict vs. weak matching)
Non-identical, but structurally related PREDs: coreferentially related (relative clauses, appositives, pronominals)
Non-identical, but semantically related PREDs (WN-related, path < 3)
Non-identical, but semantically related FRAMES (FN-/Detour-related)
Match graph establishes overlapping partial graphs (marked by match types)
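
The distinction between strict and weak predicate matches can be sketched with a path-length test. The relatedness graph below is a hand-made toy that stands in for WordNet, and counting the path in edges is an assumption about how the "path < 3" condition is measured.

```python
from collections import deque

# Toy relatedness graph (hypothetical; NOT actual WordNet relations).
TOY_RELATIONS = {
    "die": {"pass_away"},
    "pass_away": {"die"},
    "direct": {"make"},
    "make": {"direct", "create"},
    "create": {"make", "produce"},
    "produce": {"create"},
}


def path_length(a, b, graph):
    """Shortest number of edges between a and b, or None if unconnected."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def pred_match_type(h_pred, t_pred, graph=TOY_RELATIONS, max_path=3):
    """'strict' for identical PREDs, 'weak' for related ones, else None."""
    if h_pred == t_pred:
        return "strict"
    dist = path_length(h_pred, t_pred, graph)
    if dist is not None and dist < max_path:
        return "weak"
    return None


print(pred_match_type("die", "pass_away"))
```

Keeping the match type on each match node lets later feature extraction count strict and weak matches separately.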

28
t: In 1983, Aki Kaurismäki directed his first full-time feature.
29
Approximating Textual Entailment: Extensions for deeper modelling: Modality

Detecting indicators of inconsistent modality types
T: A pet must have rabies protection confirmed by a blood test.
H: A case of rabies was confirmed.
Marking modal contexts in text and hypothesis
5 modality types: conditional, future, diamond, box, negation
Handling inconsistent modality types in matching process
Introducing negatively marked match nodes
Blocking embedded structures for similarity-based matches
Thus, reducing the size of the match graph
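
How a modality mismatch turns a would-be match into a negatively marked node can be sketched as follows. The node dictionaries, the choice of which predicates sit in the "box" context of "must", and the label modal_ctxt_mismatch are assumptions for illustration; blocking of embedded structures is not modelled.

```python
MODALITY_TYPES = {"conditional", "future", "diamond", "box", "negation"}


def modality_of(node):
    """Return the node's modal context (None = unmarked), validating it."""
    modality = node.get("modality")
    if modality is not None and modality not in MODALITY_TYPES:
        raise ValueError(f"unknown modality type: {modality}")
    return modality


def match_nodes(h_nodes, t_nodes):
    """Pair nodes with identical predicates; mark modality conflicts."""
    matches = []
    for h in h_nodes:
        for t in t_nodes:
            if h["pred"] != t["pred"]:
                continue
            if modality_of(h) == modality_of(t):
                matches.append((h["pred"], "pred"))
            else:
                # Negatively marked match node: counts against entailment.
                matches.append((h["pred"], "modal_ctxt_mismatch"))
    return matches


# T: A pet must have rabies protection confirmed by a blood test.
text_nodes = [{"pred": "rabies", "modality": "box"},
              {"pred": "confirm", "modality": "box"}]
# H: A case of rabies was confirmed.
hyp_nodes = [{"pred": "rabies", "modality": None},
             {"pred": "confirm", "modality": None}]

matches = match_nodes(hyp_nodes, text_nodes)
positive = [m for m in matches if m[1] == "pred"]
print(matches, len(positive))
```

Both lexical overlaps survive only as mismatch nodes, so the positive part of the match graph is empty for this pair.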

30
Approximating Textual Entailment: Extensions for deeper modelling: Lexical Entailments

Bridging partial non-matching text and hypothesis pairs
T: Olson, 62, previously worked as a partner at Ernst & Young LLP, as a Minnesota bank president and as a congressional aide, before joining the Fed board in 2001, to serve a term ending in 2010.
H: Olson is a member of the Fed board.
Lexically induced inferences, defined as rewrite rules on h/t/m graphs
Similar non-lexical heuristic inferences
Appositions: prime minister X => X is prime minister
Possessive constructions: X's Y => the Y of X

t: (X1) joins X2; h: (Y1) member-of Y2; m(Z2,Y2,X2) => match_type(heuristic_entailment_match).
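
The join/member-of rule can be approximated by the sketch below. The triple encoding of edges, the LEXICAL_ENTAILMENTS table and all node names are hypothetical; the system defines such inferences as XLE rewrite rules over the h/t/m graphs.

```python
# Text relation -> hypothesis relation pairs licensed as lexical entailments
# (a single illustrative entry, taken from the slide's example).
LEXICAL_ENTAILMENTS = {("join", "member_of")}


def heuristic_entailment_matches(t_edges, h_edges, node_matches):
    """Mark matched object nodes whose governing relations are linked
    by a lexical entailment.

    t_edges, h_edges: (relation, subject, object) triples.
    node_matches: set of (h_node, t_node) pairs already in the match graph.
    """
    found = []
    for t_rel, _t_subj, t_obj in t_edges:
        for h_rel, _h_subj, h_obj in h_edges:
            licensed = (t_rel, h_rel) in LEXICAL_ENTAILMENTS
            if licensed and (h_obj, t_obj) in node_matches:
                found.append({"h_node": h_obj, "t_node": t_obj,
                              "match_type": "heuristic_entailment_match"})
    return found


t_edges = [("join", "olson_t", "fed_board_t")]
h_edges = [("member_of", "olson_h", "fed_board_h")]
node_matches = {("fed_board_h", "fed_board_t"), ("olson_h", "olson_t")}
bridged = heuristic_entailment_matches(t_edges, h_edges, node_matches)
print(bridged)
```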
31
Approximating Textual Entailment: Machine learning

Feature selection with WEKA classifiers
Many learners select intuitively important features, but also idiosyncratic ones
Selected learners and models
Model 1: Simple Conjunctive Rule classifier, generated a single rule
Medium/high threshold on pred/frame matches as criterion for rejection
High degree of frame similarity w/ medium predicate similarity models entailment
Model 2: Meta-classifier LogitBoost (additive logistic regression)
Features (1.-4.) used in iteration; final feature set: 1., 2., 4.

1. No. of predicate matches relative to hypothesis
2. No. of frame (Fred, Detour) matches relative to hypothesis
3. No. of roles (Rosy) matches relative to hypothesis
4. Match graph size rel. to hypothesis, incl. syn, sem, ontological info
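
A sketch of features 1, 2 and 4 and of a single conjunctive rule in the spirit of Model 1 follows. The threshold values 0.5 and 0.8 are invented placeholders, not the values learned by WEKA, and PairCounts is a hypothetical container for counts read off the hypothesis and match graphs.

```python
from dataclasses import dataclass


@dataclass
class PairCounts:
    h_preds: int   # predicates in the hypothesis graph
    h_frames: int  # frames in the hypothesis graph
    h_size: int    # size of the hypothesis graph
    m_preds: int   # predicate matches in the match graph
    m_frames: int  # frame matches in the match graph
    m_size: int    # size of the match graph


def relative(matched: int, in_hypothesis: int) -> float:
    """Count in the match graph relative to the hypothesis graph."""
    return matched / in_hypothesis if in_hypothesis else 0.0


def extract_features(c: PairCounts) -> dict:
    return {
        "pred_ratio": relative(c.m_preds, c.h_preds),     # feature 1
        "frame_ratio": relative(c.m_frames, c.h_frames),  # feature 2
        "graph_ratio": relative(c.m_size, c.h_size),      # feature 4
    }


def conjunctive_rule(features: dict, min_pred=0.5, min_frame=0.8) -> str:
    """One conjunction: medium predicate AND high frame similarity => YES.

    The thresholds are illustrative assumptions.
    """
    entailed = (features["pred_ratio"] >= min_pred
                and features["frame_ratio"] >= min_frame)
    return "YES" if entailed else "NO"


high = extract_features(PairCounts(4, 2, 20, 3, 2, 18))
low = extract_features(PairCounts(4, 2, 20, 1, 0, 5))
print(conjunctive_rule(high), conjunctive_rule(low))
```

Normalising every count by the hypothesis keeps the features comparable across short and long hypotheses.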
32
Results in RTE-II

SALSA RTE system results
Both models score SUM > IR > QA > IE
Refined model better on QA; simple model better on SUM
Overall RTE-II results
Average accuracy 60% (Median 59%)
Shallow overlap measures vary considerably between data sets, whereas deeper approaches remain more stable
Tendency towards deeper, knowledge-rich methods

Dev set     all tasks
Model 1     61.1
Model 2     59.8

RTE-II      all tasks   IE      IR      QA      SUM
Model 1     59.0        49.5    59.5    54.5    72.5
Model 2     57.8        48.5    58.5    57.0    67.0

Accuracy range (in %)   53 - 56   58 - 61   62 - 64   74 - 75
No. of groups           7         11        3         2
33
Discussion of Results: True positives

High ratio of matching predicates, frames, and f-structure
Typical phenomena
Non-identical predicates compensated by matching frames (626)
Missing frame assignments compensated by WN relatedness
die / pass away (WN-related, 103)
Active-passive diathesis resolved by f-structure normalisation (129)
Relative overlap measures also work for longer hypotheses

T: Everest summiter David Hiddleston has passed away in an avalanche of Mt. Tasman.
H: A person died in an avalanche. (103)
T: An earthquake has hit the east coast of Hokkaido, Japan, with a magnitude of 7.0 Mw.
H: An earthquake occurred on the east coast of Hokkaido, Japan. (626)
T: In one of the latest attacks, a US soldier on patrol was killed by a single shot from a sniper in northern Baghdad, the military said yesterday.
H: A sniper killed a U.S. soldier on patrol in Baghdad with a single shot. (129)
34
Discussion of Results: True negatives

Modal context marking seems to be effective
27% of all true negatives involved modality mismatches, while only 11.9% of all sentences involve marked modal contexts
Future plans
Extend to lexically induced modality/facticity indicators
Testing for non-monotonicity contexts

T: The goal of preserving indigenous culture can hardly be achieved by a handful of researchers and curators at museums of ethnology and folk culture.
H: Indigenous folk art is preserved. (233)
T: Even today, within the deepest recesses of our mind, lies a primordial fear that will not allow us to enter the sea without thinking about the possibility of being attacked by a shark.
H: A shark attacked a human being. (322)
35
Error analysis: False positives

Typical cases
Semantic dissimilarity
Non-matching predicates within larger match graphs, which are in fact semantically dissimilar
Structural distance
Matching nodes within a match graph correspond to far distant nodes in the text graph compared to neighbouring nodes in the match graph

36
Error analysis: False positives
Unconnected nodes matched with distant nodes in text graph
T: Some 420 people have been hanged in Singapore since 1991, mostly for drug trafficking, an Amnesty International 2004 report said. That gives the country of 4.4 million people the highest execution rate in the world relative to population.
H: 4.4 million people were executed in Singapore. (198) False positive
37
Error analysis: False positives

Graph matching process
Not a top-down process
Starts by relating any nodes, and builds growing clusters by finding matching edges
This allows criss-cross matching of nodes in the match graph

Introduce weighted edges that reflect the relative distance of pairs of match nodes in text and hypothesis (path distance)
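
The proposed distance weighting can be sketched as below. The slide names only the idea, so the weighting formula (shorter path divided by longer path) is an assumption, and the two adjacency lists are simplified toy graphs loosely modelled on the Singapore pair (198), not actual parser output.

```python
from collections import deque


def path_distance(graph, start, goal):
    """Number of edges on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def edge_weight(h_graph, t_graph, h_pair, t_pair):
    """Weight in (0, 1]: 1.0 when two matched nodes are equally far apart
    in hypothesis and text, smaller as the distances diverge."""
    h_dist = path_distance(h_graph, *h_pair)
    t_dist = path_distance(t_graph, *t_pair)
    if not h_dist or not t_dist:
        return 0.0  # unconnected (or identical) nodes get no support
    return min(h_dist, t_dist) / max(h_dist, t_dist)


# Hypothesis: "people" is a direct argument of "execute".
h_graph = {"execute": ["people"], "people": ["execute"]}
# Text (toy): "people" is four edges away from "hang".
t_graph = {
    "hang": ["say"],
    "say": ["hang", "give"],
    "give": ["say", "country"],
    "country": ["give", "people"],
    "people": ["country"],
}
weight = edge_weight(h_graph, t_graph, ("execute", "people"), ("hang", "people"))
print(weight)
```

A low weight for such criss-cross matches would give the learner a dissimilarity signal that plain overlap counts lack.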

39
Conclusions

A medium-depth approach: Approximating Textual Entailment
Lexical and syntactic overlap, semantic similarity (WordNet)
Frame semantics: lexical semantic classes & argument structure
Flexible graph matching method with extensions to deeper processing
Modality contexts, lexical inferences
Perspectives for future extensions
Engineering and fine-tuning
Combination with shallow (and deeper) methods in voting architecture
Frame and role assignment
Sense discrimination: outlier detection (Erk, 2006)
Coverage: integration with other resources (VerbNet, NomBank)
Modelling dissimilarity
Semantic distance measures and distance-weighted graph edges
Acquisition of lexical modality indicators and (lexical) entailment rules

40
References

RTE Proceedings
RTE Challenge Homepage: http://www.pascal-network.org/Challenges/RTE2
I. Dagan, O. Glickman, and B. Magnini (2005). The PASCAL recognising textual entailment challenge. In Proceedings of the RTE-1 Workshop, Southampton, UK.
B. Magnini and I. Dagan, editors (2006). Proceedings of the Second PASCAL Recognising Textual Entailment Challenge, Venice, Italy.
Electronic proceedings and slides: http://ir-srv.cs.biu.ac.il:64080/RTE2/proceedings/
Discussion about RTE Task
Zaenen, Karttunen and Crouch (2005). Local Textual Inference: can it be defined or circumscribed? In ACL 2005 Workshop on Empirical Modelling of Semantic Equivalence and Entailment, Ann Arbor, Michigan.
Manning (2006). Local Textual Inference: It's hard to circumscribe, but you know it when you see it - and NLP needs it. MS, Stanford University.
Crouch, Karttunen and Zaenen (2006). Circumscribing is not excluding: A reply to Manning. MS, Palo Alto Research Center.
All papers: http://www2.parc.com/istl/members/zaenen/

41
References

A. Burchardt and A. Frank (2006). Approximating Textual Entailment with LFG and FrameNet Frames. In Proceedings of the Second Recognising Textual Entailment Workshop, Venice, Italy. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
K. Erk and S. Pado (2006). Shalmaneser - a flexible toolbox for semantic role assignment. In Proceedings of LREC-06, Genoa. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
A. Burchardt, K. Erk, and A. Frank (2005). A WordNet Detour to FrameNet. In Proceedings of the GLDV 2005 Workshop GermaNet II, Bonn. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
R. Crouch (2005). Packed Rewriting for Mapping Semantics to KR. In Proceedings of the Sixth International Workshop on Computational Semantics, Tilburg. http://www2.parc.com/istl/groups/nltt/papers/iwcs05_crouch.pdf

43
Approximating Textual Entailment: Similarity/Entailment measures and feature extraction
Columns: text graph; hypothesis graph; match graph; proportional h/t and m/h ratio
lexical: text graph: lex_id; hypothesis graph: lex_id; match graph: lex_id; ratio: ratio_lexid
syntactic: match graph: node_m (pred, coref, pro), edge_syn_m (all, gf, subc); ratio: ratio_nodes, ratio_edges
semantic, strict: text graph: (lfg_)frames_t, (lfg_)roles_t; hypothesis graph: (lfg_)frames_h, (lfg_)roles_h; match graph: (lfg_)frames_m, (lfg_)roles_m; ratio: ratio_(lfg_)frames, ratio_(lfg_)roles
semantic, weak: match graph: node_frameFN/derived_m, mode_framerel/detour/wnrel_m, node_heuristic_entailment_m, node_modal_ctxt_mismatch_m
connectedness: clusters_no, clusters_avg_size; clusters_avgsize_rel_h, clusters_abssize_rel_h
other: fragmentary; fragmentary; rte_task
44
Error analysis: Sparse features

Feature set
High-frequency features that measure similarity
Few, and low-frequency, features that model dissimilarity
Bias towards similarity
29.5% false positives
12.75% false negatives
Plans for further development
Introducing distance measures (semantic and structural)
Getting a grip on remaining differences, i.e. non-matched edges between matching clusters