Textual Entailment

1
Textual Entailment

Dan Roth, University of Illinois, Urbana-Champaign, USA

Ido Dagan, Bar Ilan University, Israel

Fabio Massimo Zanzotto, University of Rome, Italy

ACL-2007
2
Outline

Motivation and Task Definition
A Skeletal review of Textual Entailment Systems
Knowledge Acquisition Methods
Applications of Textual Entailment
A Textual Entailment view of Applied Semantics

3
I. Motivation and Task Definition
4
Motivation

Text applications require semantic inference
A common framework for applied semantics is
needed, but still missing
Textual entailment may provide such a framework

5
Desiderata for Modeling Framework

A framework for a target level of language processing should provide:
Generic (feasible) module for applications
Unified (agreeable) paradigm for investigating language phenomena
Most semantics research is scattered
WSD, NER, SRL, lexical semantic relations (e.g. vs. syntax)
Dominating approach: interpretation

6
Natural Language and Meaning
Meaning
Language
7
Variability of Semantic Expression
The Dow Jones Industrial Average closed up 255
Dow ends up
Dow gains 255 points
Stock market hits a record high
Dow climbs 255

Model variability as relations between text expressions
Equivalence: text1 ⇔ text2 (paraphrasing)
Entailment: text1 ⇒ text2 (the general case)

8
Typical Application Inference: Entailment
Question: Who bought Overture?
Expected answer form: X bought Overture
Text: Overture's acquisition by Yahoo
entails
Hypothesized answer: Yahoo bought Overture

Similar for IE: X acquire Y
Similar for semantic IR: t: Overture was bought for ...
Summarization (multi-document): identify redundant info
MT evaluation (and recent ideas for MT)
Educational applications

9
KRAQ'05 Workshop - KNOWLEDGE and REASONING for
ANSWERING QUESTIONS (IJCAI-05)

CFP:
Reasoning aspects: information fusion, search criteria expansion models, summarization and intensional answers, reasoning under uncertainty or with incomplete knowledge
Knowledge representation and integration: levels of knowledge involved (e.g. ontologies, domain knowledge), knowledge extraction models and techniques to optimize response accuracy
... but similar needs for other applications: can entailment provide a common empirical framework?

10
Classical Entailment Definition

Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true
Strict entailment: doesn't account for some uncertainty allowed in applications

11
Almost certain Entailments

t: The technological triumph known as GPS was incubated in the mind of Ivan Getting.
h: Ivan Getting invented the GPS.

12
Applied Textual Entailment

A directional relation between two text fragments: Text (t) and Hypothesis (h)

t entails h (t ⇒ h) if humans reading t will infer that h is most likely true

Operational (applied) definition:
Human gold standard, as in NLP applications
Assuming common background knowledge, which is indeed expected from applications

13
Probabilistic Interpretation

Definition:
t probabilistically entails h if
P(h is true | t) > P(h is true)
t increases the likelihood of h being true
Positive PMI: t provides information on h's truth
P(h is true | t): entailment confidence
The relevant entailment score for applications
In practice: most likely entailment expected
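
As a worked illustration of the definition above, the following is a minimal sketch with made-up probability values. The function names are illustrative assumptions, not part of the tutorial.

```python
import math

def probabilistically_entails(p_h_given_t: float, p_h: float) -> bool:
    """t probabilistically entails h when P(h is true | t) > P(h is true)."""
    return p_h_given_t > p_h

def pmi(p_h_given_t: float, p_h: float) -> float:
    """Pointwise mutual information in bits: log2(P(h | t) / P(h)).

    The value is positive exactly when t raises the likelihood of h.
    """
    return math.log2(p_h_given_t / p_h)

# Hypothetical numbers: the prior of h is 0.25, and reading t raises it to 0.5.
prior, posterior = 0.25, 0.5
print(probabilistically_entails(posterior, prior))
print(pmi(posterior, prior))
```

In this reading, P(h is true | t) itself serves as the entailment confidence, while the sign of the PMI only says whether t helps at all.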

14
The Role of Knowledge

For textual entailment to hold we require:
text AND knowledge ⇒ h
but
knowledge should not entail h alone
Systems are not supposed to validate h's truth regardless of t (e.g. by searching h on the web)

15
PASCAL Recognizing Textual Entailment (RTE) Challenges
EU FP-6 Funded PASCAL Network of Excellence, 2004-7
Bar-Ilan University; ITC-irst and CELCT, Trento; MITRE; Microsoft Research
16
Generic Dataset by Application Use

7 application settings in RTE-1, 4 in RTE-2/3
QA
IE
Semantic IR
Comparable documents / multi-doc summarization
MT evaluation
Reading comprehension
Paraphrase acquisition
Most data created from actual applications' output
RTE-2/3: 800 examples in development and test sets
50-50 YES/NO split

17
RTE Examples
TEXT / HYPOTHESIS / TASK / ENTAILMENT
1. Text: Regan attended a ceremony in Washington to commemorate the landings in Normandy.
   Hypothesis: Washington is located in Normandy.
   Task: IE. Entailment: False
2. Text: Google files for its long awaited IPO.
   Hypothesis: Google goes public.
   Task: IR. Entailment: True
3. Text: a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others.
   Hypothesis: Cardinal Juan Jesus Posadas Ocampo died in 1993.
   Task: QA. Entailment: True
4. Text: The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%.
   Hypothesis: The SPD is defeated by the opposition parties.
   Task: IE. Entailment: True
18
Participation and Impact

Very successful challenges, world wide:
RTE-1: 17 groups
RTE-2: 23 groups
150 downloads
RTE-3: 25 groups
Joint workshop at ACL-07
High interest in the research community
Papers, conference sessions and areas, PhDs, influence on funded projects
Textual Entailment special issue at JNLE
ACL-07 tutorial

19
Methods and Approaches (RTE-2)

Measure similarity match between t and h
(coverage of h by t)
Lexical overlap (unigram, N-gram, subsequence)
Lexical substitution (WordNet, statistical)
Syntactic matching/transformations
Lexical-syntactic variations (paraphrases)
Semantic role labeling and matching
Global similarity parameters (e.g. negation,
modality)
Cross-pair similarity
Detect mismatch (for non-entailment)
Interpretation to logic representation + logic inference

20
Dominant approach: Supervised Learning
(t, h) -> Similarity features (lexical, n-gram, syntactic, semantic, global) -> Feature vector -> Classifier -> YES / NO

Features model similarity and mismatch
Classifier determines relative weights of
information sources
Train on development set and auxiliary t-h corpora
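
The pipeline on this slide can be sketched as follows. This is a toy illustration only: the two features (coverage of h by t, and a negation mismatch flag), the perceptron learner, and the training pairs are assumptions chosen for brevity, not the features or learner of any RTE system.

```python
NEGATIONS = {"not", "no", "never"}

def features(text, hypothesis):
    """Return [bias, coverage of h by t, negation mismatch flag]."""
    t_words = set(text.lower().split())
    h_words = set(hypothesis.lower().split())
    coverage = len(h_words & t_words) / len(h_words)
    mismatch = float(bool(NEGATIONS & t_words) != bool(NEGATIONS & h_words))
    return [1.0, coverage, mismatch]

def score(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

def train_perceptron(pairs, labels, epochs=20):
    """Learn the relative weights of the information sources from labeled pairs."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (text, hypothesis), label in zip(pairs, labels):
            x = features(text, hypothesis)
            predicted = 1 if score(weights, x) > 0 else 0
            if predicted != label:
                weights = [w + (label - predicted) * xi for w, xi in zip(weights, x)]
    return weights

def predict(weights, text, hypothesis):
    """YES (True) / NO (False) decision for a (t, h) pair."""
    return score(weights, features(text, hypothesis)) > 0

# Hypothetical development pairs (1 = entailment, 0 = no entailment).
TRAIN_PAIRS = [
    ("yahoo bought overture", "yahoo bought overture"),
    ("yahoo did not buy overture", "yahoo bought overture"),
    ("dow gains 255 points", "dow gains points"),
    ("google files for ipo", "yahoo bought overture"),
]
TRAIN_LABELS = [1, 0, 1, 0]
WEIGHTS = train_perceptron(TRAIN_PAIRS, TRAIN_LABELS)
print(predict(WEIGHTS, "dow climbs 255", "dow climbs"))
```

The similarity feature ends up with a positive weight and the mismatch feature with a negative one, which is the division of labor the slide describes.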

21
RTE-2 Results
Average Precision (%)   Accuracy (%)   First Author (Group)
80.8                    75.4           Hickl (LCC)
71.3                    73.8           Tatu (LCC)
64.4                    63.9           Zanzotto (Milan & Rome)
62.8                    62.6           Adams (Dallas)
66.9                    61.6           Bos (Rome & Leeds)
                        58.1-60.5      11 groups
                        52.9-55.6      7 groups
Average: 60%   Median: 59%
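
For readers unfamiliar with the two columns, the sketch below computes accuracy and a ranking-based average precision. The average precision formula is the standard one (mean of the precision values at the ranks of the positive pairs) and is assumed here to match the challenge's definition; the input lists are invented.

```python
def accuracy(predictions, gold):
    """Fraction of pairs whose YES/NO prediction matches the gold label."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

def average_precision(gold_in_ranked_order):
    """Average precision over pairs sorted by decreasing entailment confidence.

    gold_in_ranked_order[i] is True when the pair at rank i+1 is a true entailment.
    """
    positives = sum(gold_in_ranked_order)
    if positives == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, is_positive in enumerate(gold_in_ranked_order, start=1):
        if is_positive:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / positives

# Hypothetical ranked output: positives at ranks 1 and 3.
print(average_precision([True, False, True]))
print(accuracy([True, True, False], [True, False, False]))
```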
22
Analysis

For the first time: methods that carry some deeper analysis seemed (?) to outperform shallow lexical methods

Cf. Kevin Knight's invited talk at EACL-06, titled "Isn't Linguistic Structure Important, Asked the Engineer"

Still, most systems that do utilize deep analysis did not score significantly better than the lexical baseline

23
Why?

System reports point at
Lack of knowledge (syntactic transformation
rules, paraphrases, lexical relations, etc.)
Lack of training data
It seems that systems that coped better with
these issues performed best
Hickl et al.: acquisition of large entailment corpora for training
Tatu et al.: large knowledge bases (linguistic and world knowledge)

24
Some suggested research directions

Knowledge acquisition
Unsupervised acquisition of linguistic and world
knowledge from general corpora and web
Acquiring larger entailment corpora
Manual resources and knowledge engineering
Inference
Principled framework for inference and fusion of
information levels
Are we happy with bags of features?

25
Complementary Evaluation Modes

Seek mode:
Input: h and corpus
Output: all entailing t's in corpus
Captures information seeking needs, but requires post-run annotation (TREC-style)
Entailment subtasks evaluations
Lexical, lexical-syntactic, logical, alignment
Contribution to various applications
QA: Harabagiu & Hickl, ACL-06; RE: Romano et al., EACL-06

26
II. A Skeletal review of Textual Entailment
Systems
27
Textual Entailment
Entails / Subsumed by
Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc. last year
⇒
Yahoo acquired Overture
Overture is a search company
Google is a search company
Google owns Overture
...
How?
Phrasal verb paraphrasing
Entity matching
Alignment
Semantic Role Labeling
Integration
28
A general Strategy for Textual Entailment
Given a sentence T
Given a sentence H
?e
Re-represent T
Re-represent H
Lexical Syntactic Semantic
Lexical Syntactic Semantic
Knowledge Base semantic structural pragmatic
Transformations/rules
Representation
Decision: Find the set of transformations/features of the new representation (or use these to create a cost function) that allows embedding of H in T.
29
Details of The Entailment Strategy

Preprocessing
Multiple levels of lexical pre-processing
Syntactic Parsing
Shallow semantic parsing
Annotating semantic phenomena
Representation
Bag of words, n-grams through tree/graphs based
representation
Logical representations

Knowledge Sources
Syntactic mapping rules
Lexical resources
Semantic Phenomena specific modules
RTE specific knowledge sources
Additional Corpora/Web resources
Control Strategy Decision Making
Single pass/iterative processing
Strict vs. Parameter based
Justification
What can be said about the decision?

30
The Case of Shallow Lexical Approaches

Preprocessing
Identify Stop Words
Representation
Bag of words

Knowledge Sources
Shallow lexical resources, typically WordNet
Control Strategy and Decision Making
Single pass
Compute similarity; use a threshold tuned on a development set (could be per task)
Justification
It works
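
Tuning the decision threshold on a development set can be sketched as below. The exhaustive scan over observed scores and the sample scores are illustrative assumptions; the tutorial does not say how individual systems chose their thresholds.

```python
def tune_threshold(scores, labels):
    """Return (threshold, accuracy) maximizing accuracy on development pairs.

    A pair is predicted as entailment when its similarity score is >= threshold.
    Ties keep the lowest threshold that reaches the best accuracy.
    """
    best_threshold, best_accuracy = 0.0, -1.0
    for candidate in sorted(set(scores)):
        predictions = [s >= candidate for s in scores]
        acc = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
        if acc > best_accuracy:
            best_threshold, best_accuracy = candidate, acc
    return best_threshold, best_accuracy

# Hypothetical similarity scores and gold labels for five development pairs.
dev_scores = [0.9, 0.8, 0.6, 0.4, 0.3]
dev_labels = [True, True, False, True, False]
print(tune_threshold(dev_scores, dev_labels))
```

A per-task threshold, as the slide suggests, would simply call the function once per application setting (QA, IE, IR, and so on).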

31
Shallow Lexical Approaches (Example)

Lexical/word-based semantic overlap: score based on matching each word in H with some word in T
Word similarity measure: may use WordNet
May take account of subsequences, word order
Learn threshold on maximum word-based match score

Clearly, this may not appeal to what we think of as understanding, and it is easy to generate cases for which this does not work well. However, it works (surprisingly) well with respect to current evaluation metrics (data sets?)
Text: The Cassini spacecraft arrived at Titan in July, 2006.
Text: NASA's Cassini-Huygens spacecraft traveled to Saturn in 2006.
Text: The Cassini spacecraft has taken images that show rivers on Saturn's moon Titan.
Hyp: The Cassini spacecraft has reached Titan.
32
An Algorithm: LocalLexicalMatching

For each word in Hypothesis, Text
    if word matches stopword: remove word
if no words left in Hypothesis or Text: return 0
numberMatched = 0
for each word W_H in Hypothesis
    for each word W_T in Text
        HYP_LEMMAS = Lemmatize(W_H)
        TEXT_LEMMAS = Lemmatize(W_T)    (uses WordNet)
        if any term in HYP_LEMMAS matches any term in TEXT_LEMMAS using LexicalCompare()
            numberMatched++
Return numberMatched / |HYP_Lemmas|

33
An Algorithm: LocalLexicalMatching (Cont.)
LLM Performance: RTE-2: Dev 63.00, Test 60.50; RTE-3: Dev 67.50, Test 65.63

LexicalCompare()
    if (LEMMA_H == LEMMA_T)
        return TRUE
    if (HypernymDistanceFromTo(textWord, hypothesisWord) < 3)
        return TRUE
    if (MeronymyDistanceFromTo(textWord, hypothesisWord) < 3)
        return TRUE
    if (MemberOfDistanceFromTo(textWord, hypothesisWord) < 3)
        return TRUE
    if (SynonymOf(textWord, hypothesisWord))
        return TRUE
Notes:
LexicalCompare is asymmetric and makes use of a single relation type
Additional differences could be attributed to the stop word list (e.g., including auxiliary verbs)
Straightforward improvements such as bi-grams do not help.
More sophisticated lexical knowledge (entities, time) should help.
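
The LLM pseudocode can be turned into a small runnable sketch. Everything lexical here is a hypothetical stand-in: the stop word list, the lemma table, and the `ENTAILED_LEMMAS` dictionary replace the WordNet lookups (hypernym, meronym, member-of, synonym) of the original, and each hypothesis word is counted at most once.

```python
STOPWORDS = {"the", "a", "an", "has", "have", "at", "in", "to", "of"}

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"arrived": "arrive", "reached": "reach", "bought": "buy"}

# Hypothetical directed lexical relations standing in for WordNet:
# a text lemma maps to the hypothesis lemmas it is allowed to match.
ENTAILED_LEMMAS = {"arrive": {"reach"}, "buy": {"acquire", "own"}}

def content_lemmas(sentence):
    """Lowercase, strip simple punctuation, drop stop words, lemmatize."""
    words = sentence.lower().replace(".", "").replace(",", "").split()
    return [LEMMAS.get(w, w) for w in words if w not in STOPWORDS]

def lexical_compare(text_lemma, hyp_lemma):
    """Asymmetric match: identical lemmas, or text lemma related to hypothesis lemma."""
    return text_lemma == hyp_lemma or hyp_lemma in ENTAILED_LEMMAS.get(text_lemma, set())

def local_lexical_matching(text, hypothesis):
    """Fraction of hypothesis lemmas matched by some text lemma."""
    text_lemmas = content_lemmas(text)
    hyp_lemmas = content_lemmas(hypothesis)
    if not text_lemmas or not hyp_lemmas:
        return 0.0
    matched = sum(
        any(lexical_compare(t, h) for t in text_lemmas) for h in hyp_lemmas
    )
    return matched / len(hyp_lemmas)

hyp = "The Cassini spacecraft has reached Titan."
print(local_lexical_matching("The Cassini spacecraft arrived at Titan in July, 2006.", hyp))
print(local_lexical_matching("NASA's Cassini-Huygens spacecraft traveled to Saturn in 2006.", hyp))
```

The score would then be compared against a threshold tuned on development data to produce the YES/NO decision.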

34
Details of The Entailment Strategy (Again)

Preprocessing
Multiple levels of lexical pre-processing
Syntactic Parsing
Shallow semantic parsing
Annotating semantic phenomena
Representation
Bag of words, n-grams through tree/graphs based
representation
Logical representations

Knowledge Sources
Syntactic mapping rules
Lexical resources
Semantic Phenomena specific modules
RTE specific knowledge sources
Additional Corpora/Web resources
Control Strategy Decision Making
Single pass/iterative processing
Strict vs. Parameter based
Justification
What can be said about the decision?

35
Preprocessing

Syntactic Processing
Syntactic Parsing (Collins; Charniak; CCG)
Dependency Parsing (+ types)
Lexical Processing
Tokenization; lemmatization
For each word in Hypothesis, Text
Phrasal verbs
Idiom processing
Named Entities + Normalization
Date/Time arguments + Normalization
Semantic Processing
Semantic Role Labeling
Nominalization
Modality/Polarity/Factive
Co-reference

Only a few systems

often used only during decision making

36
Details of The Entailment Strategy (Again)

Preprocessing
Multiple levels of lexical pre-processing
Syntactic Parsing
Shallow semantic parsing
Annotating semantic phenomena
Representation
Bag of words, n-grams through tree/graphs based
representation
Logical representations

Knowledge Sources
Syntactic mapping rules
Lexical resources
Semantic Phenomena specific modules
RTE specific knowledge sources
Additional Corpora/Web resources
Control Strategy Decision Making
Single pass/iterative processing
Strict vs. Parameter based
Justification
What can be said about the decision?

37
Basic Representations
Meaning Representation
Inference
Logical Forms
Semantic Representation
Representation
Syntactic Parse
Local Lexical
Raw Text
Textual Entailment

Most approaches augment the basic structure defined by the processing level with additional annotation and make use of a tree/graph/frame-based system.

38
Basic Representations (Syntax)
Syntactic Parse
Local Lexical
Hyp: The Cassini spacecraft has reached Titan.
39
Basic Representations (Shallow Semantics
Pred-Arg )

T: The government purchase of the Roanoke building, a former prison, took place in 1902.
H: The Roanoke building, which was a former prison, was bought by the government in 1902.

take
The govt. purchase prison
place
in 1902
purchase
The Roanoke building
buy
The Roanoke prison
In 1902
The government
be
a former prison
The Roanoke building
[Roth & Sammons 07]
40
Basic Representations (Logical Representation)
Bos & Markert: The semantic representation language is a first-order fragment of the language used in Discourse Representation Theory (DRT), conveying argument structure with a neo-Davidsonian analysis and including the recursive DRS structure to cover negation, disjunction, and implication.
41
Representing Knowledge Sources

Rather straightforward in the logical framework.

Tree/graph-based representations may also use rule-based transformations to encode different kinds of knowledge, sometimes represented as generic or knowledge-based tree transformations.

42
Representing Knowledge Sources (cont.)

In general, there is a mix of procedural and rule-based encodings of knowledge sources
Done by hanging more information on the parse tree or predicate-argument representation (example from LCC's system)
Or by different frame-based annotation systems for encoding information, which are processed procedurally.

43
Details of The Entailment Strategy (Again)

Preprocessing
Multiple levels of lexical pre-processing
Syntactic Parsing
Shallow semantic parsing
Annotating semantic phenomena
Representation
Bag of words, n-grams through tree/graphs based
representation
Logical representations

Knowledge Sources
Syntactic mapping rules
Lexical resources
Semantic Phenomena specific modules
RTE specific knowledge sources
Additional Corpora/Web resources
Control Strategy Decision Making
Single pass/iterative processing
Strict vs. Parameter based
Justification
What can be said about the decision?

44
Knowledge Sources

The knowledge sources available to the system are
the most significant component of supporting TE.
Different systems draw the line between preprocessing capabilities and knowledge resources differently.
The way resources are handled also differs across approaches.

45
Enriching Preprocessing

In addition to syntactic parsing, several approaches enrich the representation with various linguistic resources:
POS tagging
Stemming
Predicate-argument representation: verb predicates and nominalization
Entity annotation: stand-alone NERs with a variable number of classes
Acronym handling and entity normalization: mapping mentions of the same entity, expressed in different ways, to a single ID
Co-reference resolution
Dates, times and numeric values: identification and normalization
Identification of semantic relations: complex nominals, genitives, adjectival phrases, and adjectival clauses
Event identification and frame construction

46
Lexical Resources

Recognizing that a word or a phrase in S entails a word or a phrase in H is essential in determining Textual Entailment.
WordNet is the most commonly used resource
In most cases, a WordNet-based similarity measure between words is used. This is typically a symmetric relation.
Lexical chains over WordNet are used; in some cases, care is taken to disallow some chains of specific relations.
Extended WordNet is being used to make use of:
Entities
Derivation relation, which links verbs with their corresponding nominalized nouns.

47
Lexical Resources (Cont.)

Lexical Paraphrasing Rules
A number of efforts to acquire relational paraphrase rules are under way, and several systems are making use of resources such as DIRT and TEASE.
Some systems seem to have acquired paraphrase rules that are in the RTE corpus:
person killed --> claimed one life
hand reins over to --> give starting job to
same-sex marriage --> gay nuptials
cast ballots in the election --> vote
dominant firm --> monopoly power
death toll --> kill
try to kill --> attack
lost their lives --> were killed
left people dead --> people were killed

48
Semantic Phenomena

A large number of semantic phenomena have been identified as significant to Textual Entailment.
Many of them are being handled (in a restricted way) by some of the systems. Very little per-phenomenon quantification has been done, if at all.
Semantic implications of interpreting syntactic structures [Braz et al. 05; Bar-Haim et al. 07]
Conjunctions
Jake and Jill ran up the hill ⇒ Jake ran up the hill
Jake and Jill met on the hill ⇏ Jake met on the hill
Clausal modifiers
But celebrations were muted as many Iranians observed a Shi'ite mourning month.
⇒ Many Iranians observed a Shi'ite mourning month.
Semantic Role Labeling handles this phenomenon automatically

49
Semantic Phenomena (Cont.)

Relative clauses
The assailants fired six bullets at the car, which carried Vladimir Skobtsov.
⇒ The car carried Vladimir Skobtsov.
Semantic Role Labeling handles this phenomenon automatically
Appositives
Frank Robinson, a one-time manager of the Indians, has the distinction for the NL.
⇒ Frank Robinson is a one-time manager of the Indians.
Passive
We have been approached by the investment banker.
⇒ The investment banker approached us.
Semantic Role Labeling handles this phenomenon automatically
Genitive modifier
Malaysia's crude palm oil output is estimated to have risen...
⇒ The crude palm oil output of Malaysia is estimated to have risen...

50
Logical Structure

Factivity: uncovering the context in which a verb phrase is embedded
The terrorists tried to enter the building.
⇏ The terrorists entered the building.
Polarity: negative markers or a negation-denoting verb (e.g. deny, refuse, fail)
The terrorists failed to enter the building.
⇏ The terrorists entered the building.
Modality/Negation: dealing with modal auxiliary verbs (can, must, should) that modify verbs' meanings, and with the identification of the scope of negation.
Superlatives/Comparatives/Monotonicity: inflecting adjectives or adverbs.
Quantifiers, determiners and articles

51
Some Examples [Braz et al., IJCAI workshop 05; PARC Corpus]

T: Legally, John could drive.
H: John drove.
S: Bush said that Khan sold centrifuges to North Korea.
H: Centrifuges were sold to North Korea.
S: No US congressman visited Iraq until the war.
H: Some US congressmen visited Iraq before the war.
S: The room was full of women.
H: The room was full of intelligent women.
S: The New York Times reported that Hanssen sold FBI secrets to the Russians and could face the death penalty.
H: Hanssen sold FBI secrets to the Russians.
S: All soldiers were killed in the ambush.
H: Many soldiers were killed in the ambush.

52
Details of The Entailment Strategy (Again)

Preprocessing
Multiple levels of lexical pre-processing
Syntactic Parsing
Shallow semantic parsing
Annotating semantic phenomena
Representation
Bag of words, n-grams through tree/graphs based
representation
Logical representations

Knowledge Sources
Syntactic mapping rules
Lexical resources
Semantic Phenomena specific modules
RTE specific knowledge sources
Additional Corpora/Web resources
Control Strategy Decision Making
Single pass/iterative processing
Strict vs. Parameter based
Justification
What can be said about the decision?

53
Control Strategy and Decision Making

Single iteration
Strict logical approaches are, in principle, a single-stage computation.
The pair is processed and transformed into logical form.
Existing theorem provers act on the pair along with the KB.
Multiple iterations
Graph-based algorithms are typically iterative.
Following [Punyakanok et al. 04], transformations are applied and an entailment test is done after each transformation is applied.
Transformations can be chained, but sometimes the order makes a difference. The algorithm can be greedy, or it can be more exhaustive and search for the best path found [Braz et al. 05; Bar-Haim et al. 07]

54
Transformation Walkthrough [Braz et al. 05]

T: The government purchase of the Roanoke building, a former prison, took place in 1902.
H: The Roanoke building, which was a former prison, was bought by the government in 1902.

Does H follow from T?
55
Transformation Walkthrough (1)

T: The government purchase of the Roanoke building, a former prison, took place in 1902.
H: The Roanoke building, which was a former prison, was bought by the government in 1902.

take
The govt. purchase prison
place
in 1902
purchase
The Roanoke building
buy
The Roanoke prison
In 1902
The government
be
a former prison
The Roanoke building
56
Transformation Walkthrough (2)

T: The government purchase of the Roanoke building, a former prison, took place in 1902.
   rewritten as: The government purchase of the Roanoke building, a former prison, occurred in 1902.
H: The Roanoke building, which was a former prison, was bought by the government.

Phrasal Verb Rewriter
occur
The govt. purchase prison
in 1902
57
Transformation Walkthrough (3)

T: The government purchase of the Roanoke building, a former prison, occurred in 1902.
   rewritten as: The government purchase the Roanoke building in 1902.
H: The Roanoke building, which was a former prison, was bought by the government in 1902.

Nominalization Promoter
NOTE: depends on the earlier transformation; order is important!
purchase
The government
the Roanoke building, a former prison
In 1902
58
Transformation Walkthrough (4)

T: The government purchase of the Roanoke building, a former prison, occurred in 1902.
   rewritten as: The Roanoke building be a former prison.
H: The Roanoke building, which was a former prison, was bought by the government in 1902.

Apposition Rewriter
be
The Roanoke building
a former prison
59
Transformation Walkthrough (5)

T: The government purchase of the Roanoke building, a former prison, took place in 1902.
H: The Roanoke building, which was a former prison, was bought by the government in 1902.

purchase
The Roanoke prison
In 1902
The government
be
a former prison
The Roanoke building
WordNet
buy
The Roanoke prison
In 1902
The government
be
a former prison
The Roanoke building
60
Characteristics

Multiple paths => optimization problem
Shortest or highest-confidence path through transformations
Order is important; may need to explore different orderings
Module dependencies are local: module B does not need access to module A's KB/inference, only its output
If the outcome is true, the (optimal) set of transformations and local comparisons form a proof
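
Treating the choice of transformations as an optimization problem can be sketched with a uniform-cost search. This is a toy illustration: sentences are plain strings, the rules are hypothetical string rewrites with made-up costs, and the example sentence drops the apposition for brevity. Real systems operate on trees or graphs.

```python
import heapq

# Hypothetical rewrite rules: (name, left-hand side, right-hand side, cost).
RULES = [
    ("phrasal verb rewriter", "took place", "occurred", 1.0),
    ("nominalization promoter", "purchase of the building occurred", "purchased the building", 1.0),
    ("lexical substitution", "purchased", "bought", 0.5),
]

def cheapest_proof(text, hypothesis, rules=RULES, max_cost=10.0):
    """Uniform-cost search for the cheapest rule sequence turning text into hypothesis.

    Returns (total_cost, [rule names]) or None when no sequence is found.
    """
    frontier = [(0.0, text, [])]
    seen = set()
    while frontier:
        cost, current, path = heapq.heappop(frontier)
        if current == hypothesis:
            return cost, path
        if current in seen or cost > max_cost:
            continue
        seen.add(current)
        for name, lhs, rhs, rule_cost in rules:
            if lhs in current:
                rewritten = current.replace(lhs, rhs)
                heapq.heappush(frontier, (cost + rule_cost, rewritten, path + [name]))
    return None

t = "the government purchase of the building took place in 1902"
h = "the government bought the building in 1902"
print(cheapest_proof(t, h))
```

Note that the nominalization rule only fires after the phrasal verb rule has produced "occurred", which mirrors the order dependence noted on the slide. The returned rule sequence plays the role of the proof.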

61
Summary Control Strategy and Decision Making

Despite the appeal of the strict logical approaches, as of today they do not work well enough.
Bos & Markert:
The strict logical approach is falling significantly behind good LLMs (local lexical matching) and multiple levels of lexical pre-processing
Only incorporating rather shallow features and using them in the evaluation saves this approach.
Braz et al.:
Strict graph-based representation is not doing as well as LLM.
Tatu et al.:
Results show that the strict logical approach is inferior to LLMs, but when put together, it produces some gain.
Using machine learning methods as a way to combine systems and multiple features has been found very useful.

62
Hybrid/Ensemble Approaches

Bos et al.: use theorem prover and model builder
Expand models of T, H using model builder, check sizes of models
Test consistency with background knowledge with T, H
Try to prove entailment with and without background knowledge
Tatu et al. (2006): use ensemble approach
Create two logical systems, one lexical alignment system
Combine system scores using coefficients found via search (train on annotated data)
Modify coefficients for different tasks
Zanzotto et al. (2006): try to learn from comparison of structures of T, H for true vs. false entailment pairs
Use lexical, syntactic annotation to characterize match between T, H for successful, unsuccessful entailment pairs
Train Kernel/SVM to distinguish between match graphs

63
Justification

For most approaches justification is given only
by the data Preprocessed
Empirical Evaluation
Logical approaches
There is a proof-theoretic justification
Modulo the power of the resources and the ability to map a sentence to a logical form.
Graph/tree-based approaches
There is a model-theoretic justification
The approach is sound, but not complete, modulo the availability of resources.

64
Justifying Graph Based Approaches [Braz et al. 05]

R: a knowledge representation language, with a well-defined syntax and semantics over a domain D.
For text snippets s, t:
$r_s$, $r_t$: their representations in R.
$M(r_s)$, $M(r_t)$: their model-theoretic representations.
There is a well-defined notion of subsumption in R, defined model theoretically:
for $u, v \in R$, u is subsumed by v when $M(u) \subseteq M(v)$.
Not an algorithm; need a proof theory.

65
Defining Semantic Entailment (2)

The proof theory is weak; it will show $r_s \subseteq r_t$ only when they are relatively similar syntactically.
$r \in R$ is faithful to s if $M(r_s) = M(r)$.
Definition: Let s, t be text snippets with representations $r_s, r_t \in R$.
We say that s semantically entails t if there is a representation $r \in R$ that is faithful to s, for which we can prove that $r \subseteq r_t$.
Given $r_s$, need to generate many equivalent representations $r'_s$ and test $r'_s \subseteq r_t$.

Cannot be done exhaustively. How to generate alternative representations?
66
Defining Semantic Entailment (3)

A rewrite rule (l, r) is a pair of expressions in R such that $l \subseteq r$.
Given a representation $r_s$ of s and a rule (l, r) for which $r_s \subseteq l$, the augmentation of $r_s$ via (l, r) is $r'_s = r_s \wedge r$.
Claim: $r'_s$ is faithful to s.
Proof: In general, since $r'_s = r_s \wedge r$, then $M(r'_s) = M(r_s) \cap M(r)$. However, since $r_s \subseteq l \subseteq r$, then $M(r_s) \subseteq M(r)$.
Consequently $M(r'_s) = M(r_s)$, and the augmented representation is faithful to s.

Diagram: $l \subseteq r$, $r_s \subseteq l$, $r'_s = r_s \wedge r$
67
Comments

The claim suggests an algorithm for generating
alternative (equivalent) representations and for
semantic entailment.
The resulting algorithm is a sound algorithm, but
is not complete.
Completeness depends on the quality of the KB of
rules.
The power of this algorithm is in the rules KB.
l and r might be very different
syntactically, but by satisfying model theoretic
subsumption they provide expressivity to the
re-representation in a way that facilitates the
overall subsumption.
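
The augmentation procedure can be sketched under a strong simplifying assumption: a representation is a conjunction of ground atomic facts, stored as a set of strings, so that u is subsumed by v exactly when every fact of v appears in u. The fact strings and the single rule are hypothetical, and variables and unification are ignored.

```python
def subsumed_by(u, v):
    """Weak, purely syntactic subsumption test.

    With conjunctions of atomic facts, u is subsumed by v (M(u) is a subset
    of M(v)) when every fact required by v is already present in u.
    """
    return set(v) <= set(u)

def augment(r_s, rules):
    """Apply rewrite rules (l, r) until no rule adds anything new.

    A rule fires when r_s is subsumed by l; its right-hand side r is then
    conjoined to the representation, which keeps it faithful to s.
    """
    current = frozenset(r_s)
    changed = True
    while changed:
        changed = False
        for l, r in rules:
            if subsumed_by(current, l) and not set(r) <= current:
                current = current | frozenset(r)
                changed = True
    return current

def semantically_entails(r_s, r_t, rules):
    """Sound but incomplete test: augment r_s, then test syntactic subsumption."""
    return subsumed_by(augment(r_s, rules), r_t)

# Hypothetical rule KB: buying implies owning.
RULE_KB = [({"buy(yahoo, overture)"}, {"own(yahoo, overture)"})]
r_s = {"buy(yahoo, overture)", "company(overture)"}
r_t = {"own(yahoo, overture)"}
print(subsumed_by(r_s, r_t))                  # without the rule KB
print(semantically_entails(r_s, r_t, RULE_KB))  # with the rule KB
```

The sketch makes the dependence on the rule KB visible: with an empty KB the test succeeds only when the hypothesis facts literally occur in the text representation.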

68
Non-Entailment

The problem of determining non-entailment is harder, mostly due to its structure.
Most approaches determine non-entailment heuristically.
Set a threshold for a cost function. If it is not met by the pair, say no.
Several approaches have identified specific features that hint at non-entailment.
A model-theoretic approach for non-entailment has also been developed, although its effectiveness isn't clear yet.

69
What are we missing?

It is completely clear that the key resource
missing is knowledge.
Better resources translate immediately to better
results.
At this point existing resources seem to be
lacking in coverage and accuracy.
Not enough high-quality public resources; no quantification.
Some examples:
Lexical knowledge: some cases are difficult to acquire systematically.
A bought Y ⇒ A has/owns Y
Many of the current lexical resources are very noisy.
Numbers, quantitative reasoning
Time and date: temporal reasoning
Robust event-based reasoning and information integration

70
Textual Entailment as a Classification Task
71
RTE as classification task

RTE is a classification task:
Given a pair, we need to decide if T implies H or T does not imply H
We can learn a classifier from annotated examples
What do we need?
A learning algorithm
A suitable feature space

72
Defining the feature space

How do we define the feature space?
Possible features:
Distance features: features of some distance between T and H
Entailment trigger features
Pair features: the content of the T-H pair is represented
Possible representations of the sentences:
Bag-of-words (possibly with n-grams)
Syntactic representation
Semantic representation

73
Distance Features

Possible features
Number of words in common
Longest common subsequence
Longest common syntactic subtree
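
The first two distance features can be computed as in the sketch below, which works on whitespace-tokenized sentences. The longest common syntactic subtree is omitted because it needs a parser; the function names are illustrative.

```python
def common_words(t_tokens, h_tokens):
    """Number of distinct words shared by T and H."""
    return len(set(t_tokens) & set(h_tokens))

def longest_common_subsequence(t_tokens, h_tokens):
    """Length of the longest (not necessarily contiguous) common word subsequence."""
    previous = [0] * (len(h_tokens) + 1)
    for t_word in t_tokens:
        current = [0]
        for j, h_word in enumerate(h_tokens, start=1):
            if t_word == h_word:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]

t = "dow gains 255 points".split()
h = "dow gains points".split()
print(common_words(t, h), longest_common_subsequence(t, h))
```

Unlike the word count, the subsequence feature is sensitive to word order, so reordering H lowers it while leaving the overlap unchanged.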

74
Entailment Triggers

Possible features
from (de Marneffe et al., 2006)
Polarity features
presence/absence of negative polarity contexts (not, no or few, without)
Oil price surged ⇏ Oil prices didn't grow
Antonymy features
presence/absence of antonymous words in T and H
Oil price is surging ⇏ Oil prices is falling down
Adjunct features
dropping/adding of syntactic adjunct when moving from T to H
all solid companies pay dividends ⇏ all solid companies pay cash dividends
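
A word-level approximation of these trigger features might look as follows. The marker list and antonym pairs are hypothetical stand-ins for real lexical resources, and the "added words" feature only approximates adjunct detection, which properly requires a syntactic parse.

```python
NEGATIVE_MARKERS = {"not", "no", "few", "without", "never", "didn't"}

# Hypothetical antonym pairs standing in for a lexical resource such as WordNet.
ANTONYMS = [("surging", "falling"), ("rise", "fall")]

def trigger_features(text, hypothesis):
    """Return mismatch indicators that hint at non-entailment."""
    t_words = text.lower().split()
    h_words = hypothesis.lower().split()
    t_negated = any(w in NEGATIVE_MARKERS for w in t_words)
    h_negated = any(w in NEGATIVE_MARKERS for w in h_words)
    antonymy = any(
        (a in t_words and b in h_words) or (b in t_words and a in h_words)
        for a, b in ANTONYMS
    )
    added = [w for w in h_words if w not in t_words]
    return {
        "polarity_mismatch": t_negated != h_negated,
        "antonymy": antonymy,
        "added_in_h": added,
    }

print(trigger_features("oil price surged", "oil prices didn't grow"))
print(trigger_features("oil price is surging", "oil prices is falling down"))
print(trigger_features("all solid companies pay dividends",
                       "all solid companies pay cash dividends"))
```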

75
Pair Features

Possible features
Bag-of-word spaces of T and H
Syntactic spaces of T and H

T
H
companies_H
companies_T
insurance_H
dividends_T
dividends_H
year_T
solid_T
year_H
solid_H
end_H
end_T
pay_T
pay_H

76
Pair Features what can we learn?

Bag-of-word spaces of T and H
We can learn
T implies H when T contains "end"
T does not imply H when H contains "end"

T
H
companies_H
companies_T
insurance_H
dividends_T
dividends_H
year_T
solid_T
year_H
solid_H
end_H
end_T
pay_T
pay_H

It seems to be totally irrelevant!!!
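
The bag-of-words pair space shown on these slides can be sketched as below: every word becomes a feature tagged with the side of the pair it came from. The function name and the example pair are illustrative assumptions.

```python
def pair_features(text, hypothesis):
    """Bag-of-words pair space: one binary feature per word, suffixed _T or _H."""
    features = {}
    for word in text.lower().split():
        features[word + "_T"] = 1
    for word in hypothesis.lower().split():
        features[word + "_H"] = 1
    return features

feats = pair_features("all solid companies pay dividends",
                      "all solid companies pay cash dividends")
print(sorted(feats))
```

A classifier trained in this space can only learn statements about individual words on each side, such as "H contains cash", which is why the slide calls it irrelevant: nothing in the features relates the T side to the H side.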
77
ML Methods in the possible feature spaces
[Figure: RTE-2 systems arranged by possible features (Distance, Entailment Trigger, Pair) against sentence representation (Bag-of-words, Syntactic, Semantic). Systems shown: (Zanzotto & Moschitti, 2006), (Bos & Markert, 2006), (de Marneffe et al., 2006), (Hickl et al., 2006), (Inkpen et al., 2006), (Kozareva & Montoyo, 2006), (Herrera et al., 2006), (Rodney et al., 2006)]
78
Effectively using the Pair Feature Space
(Zanzotto, Moschitti, 2006)

Roadmap
Motivation: the reason why it is important even if it seems not.
Understanding the model with an example
Challenges
A simple example
Defining the cross-pair similarity

79
Observing the Distance Feature Space
(Zanzotto, Moschitti, 2006)
In a distance feature space (axes: common words, common syntactic dependencies) the two pairs are very likely the same point
80
What can happen in the pair feature space?
(Zanzotto, Moschitti, 2006)
81
Observations

Some examples are difficult to exploit in the distance feature space
We need a space that considers the content and the structure of textual entailment examples
Let us explore the pair space!
Using the kernel trick: define the space by defining the distance K(P1, P2) instead of defining the features

K(T1 ⇒ H1, T2 ⇒ H2)
82
Target
(Zanzotto, Moschitti, 2006)

How do we build it?
Using a syntactic interpretation of sentences
Using a similarity among trees KT(T′, T″): this
similarity counts the number of subtrees in
common between T′ and T″
This is a syntactic pair feature space
Question: do we need something more?

Cross-pair similarity
KS((T′, H′), (T″, H″)) ≈ KT(T′, T″) + KT(H′, H″)
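
A minimal sketch of this first cross-pair similarity, under stated assumptions: parse trees are nested tuples `(label, child, ...)` with words as plain strings, and KT is taken to be a Collins-Duffy style subset-tree kernel (the slide only says that KT counts common subtrees). The names `tree_kernel` and `cross_pair_similarity` are illustrative and do not come from the original system.

```python
def label(node):
    """Label of a node: the word itself for a leaf, the first item otherwise."""
    return node if isinstance(node, str) else node[0]

def production(node):
    """Grammar production at an internal node: (label, labels of children)."""
    return (node[0], tuple(label(child) for child in node[1:]))

def internal_nodes(tree):
    """Yield every internal (non-word) node of the tree."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)

def common_fragments(n1, n2):
    """Number of common tree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0
    count = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple) and isinstance(c2, tuple):
            count *= 1 + common_fragments(c1, c2)
    return count

def tree_kernel(t1, t2):
    """KT(t1, t2): common fragments summed over all pairs of nodes."""
    return sum(common_fragments(a, b)
               for a in internal_nodes(t1) for b in internal_nodes(t2))

def cross_pair_similarity(pair1, pair2):
    """KS: text-to-text similarity plus hypothesis-to-hypothesis similarity."""
    (text1, hyp1), (text2, hyp2) = pair1, pair2
    return tree_kernel(text1, text2) + tree_kernel(hyp1, hyp2)

climbs = ("S", ("NP", "Dow"), ("VP", "climbs"))
gains = ("S", ("NP", "Dow"), ("VP", "gains"))
print(tree_kernel(climbs, climbs))  # 6
print(tree_kernel(climbs, gains))   # 3
```

This version ignores the relation between T and H inside each pair, which is the limitation the following slides address.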

83
Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)

Can we use syntactic tree similarity?

84
Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)

Can we use syntactic tree similarity?

85
Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)

Can we use syntactic tree similarity? Not only!

86
Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)

Can we use syntactic tree similarity? Not only!
We also want to exploit the implied rewrite
rule

[Figure: tree diagrams with nodes labelled a, b, c, d]
87
Exploiting Rewrite Rules
(Zanzotto, Moschitti, 2006)

To capture the textual entailment recognition
rule (rewrite rule or inference rule), the
cross-pair similarity measure should consider
the structural/syntactic similarity between the
two texts and between the two hypotheses, respectively
the similarity among the intra-pair relations
between constituents

How can the problem be reduced to a tree similarity
computation?
88
Exploiting Rewrite Rules
(Zanzotto, Moschitti, 2006)
89
Exploiting Rewrite Rules
Intra-pair operations
(Zanzotto, Moschitti, 2006)
90
Exploiting Rewrite Rules
Intra-pair operations → Finding anchors
(Zanzotto, Moschitti, 2006)
91
Exploiting Rewrite Rules

Intra-pair operations
Finding anchors
Naming anchors with placeholders

(Zanzotto, Moschitti, 2006)
92
Exploiting Rewrite Rules

Intra-pair operations
Finding anchors
Naming anchors with placeholders
Propagating placeholders

(Zanzotto, Moschitti, 2006)
93
Exploiting Rewrite Rules

Intra-pair operations
Finding anchors
Naming anchors with placeholders
Propagating placeholders

Cross-pair operations
(Zanzotto, Moschitti, 2006)
94
Exploiting Rewrite Rules

Cross-pair operations
Matching placeholders across pairs

Intra-pair operations
Finding anchors
Naming anchors with placeholders
Propagating placeholders

(Zanzotto, Moschitti, 2006)
95
Exploiting Rewrite Rules

Cross-pair operations
Matching placeholders across pairs
Renaming placeholders

Intra-pair operations
Finding anchors
Naming anchors with placeholders
Propagating placeholders

96
Exploiting Rewrite Rules

Intra-pair operations
Finding anchors
Naming anchors with placeholders
Propagating placeholders

Cross-pair operations
Matching placeholders across pairs
Renaming placeholders
Calculating the similarity between syntactic
trees with co-indexed leaves

97
Exploiting Rewrite Rules

Intra-pair operations
Finding anchors
Naming anchors with placeholders
Propagating placeholders

Cross-pair operations
Matching placeholders across pairs
Renaming placeholders
Calculating the similarity between syntactic
trees with co-indexed leaves

(Zanzotto, Moschitti, 2006)
98
Exploiting Rewrite Rules
(Zanzotto, Moschitti, 2006)

The initial example: sim(H1, H3) > sim(H2, H3)?

99
Defining the Cross-pair similarity
(Zanzotto, Moschitti, 2006)

The cross-pair similarity is based on the
distance between syntactic trees with co-indexed
leaves
where
C is the set of all the correspondences between
anchors of (T′, H′) and (T″, H″)
t(S, c) returns the parse tree of the hypothesis
(or text) S in which the placeholders are
replaced by means of the substitution c
i is the identity substitution
KT(t1, t2) is a function that measures the
similarity between the two trees t1 and t2.
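
The formula on this slide did not survive extraction. One reading consistent with the definitions above (to be checked against the paper) is KS((T′, H′), (T″, H″)) = max over c in C of [ KT(t(H′, c), t(H″, i)) + KT(t(T′, c), t(T″, i)) ]. The sketch below implements that reading under assumptions: trees are nested tuples, placeholders are leaf strings such as "X1", KT is a subset-tree kernel, and the helper names (`substitute`, `correspondences`) are hypothetical.

```python
from itertools import combinations, permutations

def label(node):
    return node if isinstance(node, str) else node[0]

def internal_nodes(tree):
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)

def common_fragments(n1, n2):
    """Common fragments rooted at n1 and n2 (0 if the productions differ)."""
    if (n1[0], tuple(map(label, n1[1:]))) != (n2[0], tuple(map(label, n2[1:]))):
        return 0
    count = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple) and isinstance(c2, tuple):
            count *= 1 + common_fragments(c1, c2)
    return count

def tree_kernel(t1, t2):
    return sum(common_fragments(a, b)
               for a in internal_nodes(t1) for b in internal_nodes(t2))

def substitute(tree, c):
    """t(S, c): rename the placeholder leaves of a tree with the mapping c."""
    if isinstance(tree, str):
        return c.get(tree, tree)
    return (tree[0],) + tuple(substitute(child, c) for child in tree[1:])

def correspondences(ph1, ph2):
    """C: one-to-one mappings from placeholders of pair 1 to those of pair 2."""
    k = min(len(ph1), len(ph2))
    for sources in combinations(ph1, k):
        for targets in permutations(ph2, k):
            yield dict(zip(sources, targets))

def cross_pair_similarity(pair1, ph1, pair2, ph2):
    """Best score over all correspondences; pair 2 keeps its own names (i)."""
    (text1, hyp1), (text2, hyp2) = pair1, pair2
    return max(tree_kernel(substitute(hyp1, c), hyp2)
               + tree_kernel(substitute(text1, c), text2)
               for c in correspondences(ph1, ph2))

def pair(a, b):
    """Toy pair: 'a bought b' as text, 'a owns b' as hypothesis."""
    text = ("S", ("NP", a), ("VP", ("V", "bought"), ("NP", b)))
    hyp = ("S", ("NP", a), ("VP", ("V", "owns"), ("NP", b)))
    return text, hyp

score = cross_pair_similarity(pair("X1", "X2"), ["X1", "X2"],
                              pair("Y1", "Y2"), ["Y1", "Y2"])
print(score)  # 34, reached with X1 -> Y1, X2 -> Y2
```

Enumerating every correspondence is exponential in the number of placeholders, which is why the refinements on the next slide reduce the anchor set.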

100
Defining the Cross-pair similarity
101
Refining Cross-pair Similarity
(Zanzotto, Moschitti, 2006)

Controlling complexity
We reduced the size of the set of anchors using
the notion of chunk
Reducing the computational cost
Many subtree computations are repeated during the
computation of KT(t1, t2). This can be exploited
in a more efficient dynamic programming algorithm
(Moschitti & Zanzotto, 2007)
Focusing on the information within a pair that is
relevant for the entailment
Text trees are pruned according to where anchors
attach

102
BREAK (30 min)
103
III. Knowledge Acquisition Methods
104
Knowledge Acquisition for TE

What kind of knowledge do we need?
Explicit Knowledge (Structured Knowledge Bases)
Relations among words (or concepts)
Symmetric: synonymy, co-hyponymy
Directional: hyponymy, part of, ...
Relations among sentence prototypes
Symmetric: paraphrasing
Directional: inference rules/rewrite rules
Implicit Knowledge
Relations among sentences
Symmetric: paraphrasing examples
Directional: entailment examples

105
Acquisition of Explicit Knowledge
106
Acquisition of Explicit Knowledge

The questions we need to answer
What?
What do we want to learn? Which resources do we
need?
Using what?
Which principles can we rely on?
How?
How do we organize the knowledge acquisition
algorithm?

107
Acquisition of Explicit Knowledge: what?

Types of knowledge
Symmetric
Co-hyponymy
Between words: cat ≈ dog
Synonymy
Between words: buy ≈ acquire
Sentence prototypes (paraphrasing): X bought Y ≈
X acquired Z of the Y's shares
Directional semantic relations
Words: cat → animal, buy → own, wheel part-of
car
Sentence prototypes: X acquired Z of the Y's
shares → X owns Y

108
Acquisition of Explicit Knowledge: using what?

Underlying hypotheses
Harris' Distributional Hypothesis (DH) (Harris,
1964)
Words that tend to occur in the same contexts
tend to have similar meanings.
sim(w1, w2) ≈ sim(C(w1), C(w2))
Robison's Point-wise Assertion Patterns (PAP)
(Robison, 1970)
It is possible to extract relevant semantic
relations with some patterns.
w1 is in a relation r with w2 if the contexts match
pattern(w1, w2)
109
Distributional Hypothesis (DH)
sim_w(W1, W2) ≈ sim_ctx(C(W1), C(W2))
Words or Forms: w1 = constitute, w2 = compose
Corpus (source of contexts):
sun is constituted of hydrogen
The Sun is composed of hydrogen
Context (Feature) Space: C(w1), C(w2)
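
A minimal sketch of the distributional hypothesis on the two corpus sentences of this slide. Assumptions: the context C(w) is a bag of the words within a two-token window, sim_ctx is cosine similarity, and surface forms ("constituted", "composed") stand in for the lemmas. The function names are illustrative.

```python
import math
from collections import Counter

def context_vector(word, corpus, window=2):
    """C(w): counts of the words found within `window` tokens of `word`."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == word:
                vec.update(tokens[max(0, i - window):i]
                           + tokens[i + 1:i + 1 + window])
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

corpus = ["sun is constituted of hydrogen",
          "The Sun is composed of hydrogen"]
sim = cosine(context_vector("constituted", corpus),
             context_vector("composed", corpus))
print(sim)  # 1.0: both words share the contexts sun, is, of, hydrogen
```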
110
Point-wise Assertion Patterns (PAP)
w1 is in a relation r with w2 if the contexts match
patterns_r(w1, w2)
Relation: w1 part_of w2
Patterns: "w1 is constituted of w2", "w1 is composed of w2"
Corpus (source of contexts):
sun is constituted of hydrogen
The Sun is composed of hydrogen
Statistical indicator S_corpus(w1, w2): selects correct vs. incorrect relations among
words
Result: part_of(sun, hydrogen)
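
A minimal sketch of point-wise assertion patterns on the same two sentences. Assumptions: the patterns are written as regular expressions over lowercased text, and the statistical indicator is simply the number of matches per word pair; the slide does not say which indicator is used.

```python
import re
from collections import Counter

# The two patterns of the slide, with w1 and w2 as capture groups.
PATTERNS = [r"(\w+) is constituted of (\w+)",
            r"(\w+) is composed of (\w+)"]

def extract_pairs(corpus, patterns=PATTERNS):
    """Count how often each (w1, w2) pair matches one of the patterns."""
    counts = Counter()
    for sentence in corpus:
        for pattern in patterns:
            for w1, w2 in re.findall(pattern, sentence.lower()):
                counts[(w1, w2)] += 1
    return counts

corpus = ["sun is constituted of hydrogen",
          "The Sun is composed of hydrogen"]
pairs = extract_pairs(corpus)
print(pairs[("sun", "hydrogen")])  # 2
```

A threshold on the count (or on a better indicator) would then decide which candidate pairs are kept as instances of the relation.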
111
DH and PAP cooperate
Distributional Hypothesis
Point-wise assertion Patterns
Words or Forms: w1 = constitute, w2 = compose
Corpus (source of contexts):
sun is constituted of hydrogen
The Sun is composed of hydrogen
Context (Feature) Space: C(w1), C(w2)
112
Knowledge Acquisition: where do methods differ?

On the word side
Target equivalence classes: concepts or relations
Target forms: words or expressions
On the context side
Feature space
Similarity function

113
KA4TE: a first classification of some methods
Verb Entailment (Zanzotto et al., 2006)
Directional
Noun Entailment (Geffet & Dagan, 2005)
Relation Pattern Learning (ESPRESSO) (Pantel & Pennacchiotti, 2006)
ISA patterns (Hearst, 1992)
Types of knowledge
ESPRESSO (Pantel & Pennacchiotti, 2006)
Hearst
Concept Learning (Lin & Pantel, 2001a)
Symmetric
TEASE (Szpektor et al., 2004)
Inference Rules (DIRT) (Lin & Pantel, 2001b)
Point-wise assertion Patterns
Distributional Hypothesis
Underlying hypothesis
114
Noun Entailment Relation
(Geffet & Dagan, 2006)

Type of knowledge: directional relations
Underlying hypothesis: distributional hypothesis
Main idea: distributional inclusion hypothesis

w1 → w2
if
all the prominent features
of w1 occur with w2 in a
sufficiently large corpus
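
A minimal sketch of the inclusion test. Assumptions: "prominent" means the k most frequent features of a word, and the feature counts below are made-up toy values, not corpus statistics.

```python
from collections import Counter

def prominent_features(vector, k=3):
    """The k most frequent features of a word (an assumed notion of prominence)."""
    return {feature for feature, _ in vector.most_common(k)}

def entails(v1, v2, k=3):
    """w1 -> w2 if every prominent feature of w1 also occurs with w2."""
    return prominent_features(v1, k) <= set(v2)

# Toy feature counts (hypothetical).
cat = Counter({"purr": 5, "pet": 4, "eat": 3})
animal = Counter({"eat": 10, "run": 7, "pet": 6, "wild": 3, "purr": 1})

print(entails(cat, animal))  # True
print(entails(animal, cat))  # False: "run" never occurs with cat
```

The test is not symmetric, which is what makes it usable for directional relations.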

Context (Feature) Space
Words or Forms
115
Verb Entailment Relations
(Zanzotto, Pennacchiotti, Pazienza, 2006)

Type of knowledge: oriented relations
Underlying hypothesis: point-wise assertion
patterns
Main idea

Relation: v1 → v2
Patterns: agentive_nominalization(v2) v1
Statistical indicator S(v1, v2): point-wise mutual information
116
Verb Entailment Relations
(Zanzotto, Pennacchiotti, Pazienza, 2006)

Understanding the idea
Selectional restriction
fly(x) → has_wings(x)
in general
v(x) → c(x) (if x is the subject of v then x has
the property c)
Agentive nominalization
an agentive noun is the doer or the performer of an
action v
"X is a player" may be read as play(x)
c(x) is clearly v(x) if the property c is
derived from v with an agentive nominalization

Skipped
117
Verb Entailment Relations

Understanding the idea
Given the expression
player wins
Seen as a selectional restriction
win(x) → play(x)
Seen as a selectional preference
P(play(x) | win(x)) > P(play(x))
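
A worked sketch of the selectional preference test. P(play(x) | win(x)) > P(play(x)) holds exactly when the point-wise mutual information of the two events is positive. The counts below are hypothetical; in practice they would come from corpus occurrences of "player" as the subject of "win".

```python
import math

def pmi(joint, count_a, count_b, total):
    """Point-wise mutual information, log2 of P(a, b) / (P(a) * P(b))."""
    return math.log2(joint * total / (count_a * count_b))

# Hypothetical counts over 1000 observed verb subjects.
total = 1000       # all (subject, verb) observations
n_win = 50         # subjects of "win"
n_player = 100     # subjects that are "player", read as play(x)
n_joint = 20       # "player wins"

p_play_given_win = n_joint / n_win   # 0.4
p_play = n_player / total            # 0.1
score = pmi(n_joint, n_win, n_player, total)
print(p_play_given_win > p_play, score)  # True 2.0
```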

Skipped
118
Knowledge Acquisition for TE: How?

The algorithmic nature of a DH+PAP method
Direct
Starting point target words
Indirect
Starting point context feature space
Iterative
Interplay between the context feature space and
the target words

119
Direct Algorithm
sim(w1, w2) ≈ sim(C(w1), C(w2))

Select target words wi from the corpus or from a
dictionary
Retrieve contexts of each wi and represent them
in the feature space C(wi)
For each pair (wi, wj)
Compute the similarity sim(C(wi), C(wj)) in the
context space
If sim(wi, wj) ≈ sim(C(wi), C(wj)) > t,
wi and wj belong to the same equivalence class W
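
The steps above can be sketched as follows. Assumptions: the context vectors C(wi) have already been retrieved (the counts below are made up), the similarity is cosine, and equivalence classes are built by merging every pair that passes the threshold t.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def direct_algorithm(contexts, t):
    """Group target words whose context vectors are more similar than t."""
    classes = {w: {w} for w in contexts}
    for wi, wj in combinations(contexts, 2):
        if cosine(contexts[wi], contexts[wj]) > t:
            merged = classes[wi] | classes[wj]
            for w in merged:
                classes[w] = merged
    return {frozenset(c) for c in classes.values()}

# Toy context vectors C(wi) (hypothetical counts).
contexts = {
    "cat": Counter({"purr": 3, "pet": 4, "eat": 2}),
    "dog": Counter({"bark": 3, "pet": 4, "eat": 2}),
    "car": Counter({"drive": 5, "park": 2}),
}
print(direct_algorithm(contexts, t=0.5))  # cat and dog together, car alone
```

The pairwise loop is quadratic in the number of target words, so real systems restrict the candidate pairs.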

sim(w1, w2) ≈ sim(I(C(w1)), I(C(w2)))
Words or Forms: w1 = cat, w2 = dog
Context (Feature) Space: C(w1), C(w2)
120
Indirect Algorithm

Given an equivalence class W, select relevant
contexts and represent them in the feature space
Retrieve target words (w1, ..., wn) that appear in
these contexts. These are likely to be words in
the equivalence class W
Eventually, for each wi, retrieve C(wi) from the
corpus
Compute the centroid I(C(W))
For each wi,
if sim(I(C(W)), wi) < t, eliminate wi from W.
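
The steps above can be sketched as follows. Assumptions: the relevant contexts are given as regular expressions with one capture group, C(wi) is the bag of words sharing a sentence with wi, the centroid is the sum of the candidate vectors, and the corpus is a toy example.

```python
import math
import re
from collections import Counter

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def context_vector(word, corpus):
    """C(w): words that share a sentence with `word`."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        if word in tokens:
            vec.update(tok for tok in tokens if tok != word)
    return vec

def indirect_algorithm(class_patterns, corpus, t):
    """Collect candidates through the class contexts, then filter by centroid."""
    candidates = set()
    for sentence in corpus:
        for pattern in class_patterns:
            candidates.update(re.findall(pattern, sentence.lower()))
    vectors = {w: context_vector(w, corpus) for w in candidates}
    centroid = Counter()
    for vec in vectors.values():
        centroid.update(vec)
    return {w for w in candidates if cosine(centroid, vectors[w]) >= t}

corpus = ["pets such as cats", "pets such as dogs", "pets such as rocks",
          "cats sleep indoors", "dogs sleep indoors", "rocks erode slowly"]
kept = indirect_algorithm([r"pets such as (\w+)"], corpus, t=0.9)
print(sorted(kept))  # ['cats', 'dogs']
```

The pattern words themselves dominate these tiny vectors, hence the high threshold; a real feature space would down-weight them.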

sim(w1, w2) ≈ sim(C(w1), C(w2))
sim(w1, w2) ≈ sim(I(C(w1)), I(C(w2)))
Words or Forms: w1 = cat, w2 = dog
Context (Feature) Space: C(w1), C(w2)
121
Iterative Algorithm

For each word wi in the equivalence class W,
retrieve the contexts C(wi) and represent them in
the feature space
Extract words wj that have contexts similar to
C(wi)
Extract contexts C(wj) of these new words
For each new word wj, if sim(C(W),
wj) > t, put wj in W.
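
The steps above can be sketched as a bootstrapping loop. Assumptions: C(W) is the sum of the context vectors of the current class members, candidates come from a fixed vocabulary, the loop stops when no new word is added, and the corpus is a toy example.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def context_vector(word, corpus):
    """C(w): words that share a sentence with `word`."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        if word in tokens:
            vec.update(tok for tok in tokens if tok != word)
    return vec

def iterative_algorithm(seeds, vocabulary, corpus, t, max_rounds=10):
    """Grow the class W from the seeds until no candidate passes the threshold."""
    W = set(seeds)
    for _ in range(max_rounds):
        class_contexts = Counter()
        for w in W:
            class_contexts.update(context_vector(w, corpus))
        new_words = {w for w in vocabulary - W
                     if cosine(class_contexts, context_vector(w, corpus)) > t}
        if not new_words:
            break
        W |= new_words
    return W

corpus = ["cats purr softly", "cats sleep indoors", "dogs sleep indoors",
          "dogs bark loudly", "wolves bark loudly", "rocks erode slowly"]
result = iterative_algorithm({"cats"}, {"cats", "dogs", "wolves", "rocks"},
                             corpus, t=0.3)
print(sorted(result))  # ['cats', 'dogs', 'wolves']
```

Here "wolves" is only reachable in the second round, through the contexts that "dogs" brought into the class, which is the interplay between contexts and target words that the slide describes.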

sim(w1, w2) ≈ sim(C(w1), C(w2))
sim(w1, w2) ≈ sim(I(C(w1)), I(C(w2)))
Words or Forms: w1 = cat, w2 = dog
Context (Feature) Space: C(w1)
122
Knowledge Acquisition using DH and PAP

Direct Algorithms
Concepts from text via clustering (Lin & Pantel,
2001)
Inference rules, aka DIRT (Lin & Pantel, 2001)
Indirect Algorithms
Hearst's ISA patterns (Hearst, 1992)
Question Answering patterns (Ravichandran & Hovy,
2002)
Iterative Algorithms
Entailment rules from the Web, aka TEASE (Szpektor
et al., 2004)
Espresso (Pantel & Pennacchiotti, 2006)

123
TEASE
(Szpektor et al., 2004)

Type: iterative algorithm
On the word side
Target equivalence classes: fine-grained
relations
Target forms: verb with arguments
On the context side
Feature Space
Innovations with respect to research before 2004
First direct algorithm for extracting rules

prevent(X,Y)

X_fillermi?,Y_fillermi?
124
TEASE
(Szpektor et al., 2004)
Lexicon
Input template: X <-subj- accuse -obj-> Y
WEB
TEASE
Sample corpus for input template: "Paula Jones accused Clinton", "BBC accused Blair", "Sanhedrin accused St.Paul"
Anchor Set Extraction (ASE)
Skipped
Anchor sets: {Paula Jones: subj, Clinton: obj}, {Sanhedrin: subj, St.Paul: obj}
Template Extraction (TE)
Sample corpus for anchor sets: "Paula Jones called Clinton indictable", "St.Paul defended before the Sanhedrin"
Templates: "X call Y indictable", "Y defend before X"
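
A toy sketch of the two phases shown on this slide. It is not the TEASE implementation: the slide shows templates over syntactic relations and a Web corpus, whereas this sketch assumes surface strings, single-word anchors, a regular expression for the input template, and a small in-memory corpus. The function names are hypothetical.

```python
import re

def anchor_sets(template_regex, corpus):
    """ASE: (X, Y) fillers of the input template found in the corpus."""
    return {m.groups() for sentence in corpus
            for m in re.finditer(template_regex, sentence)}

def extract_templates(anchors, corpus, template_regex):
    """TE: sentences containing both anchors, with the anchors replaced by X, Y."""
    templates = set()
    for sentence in corpus:
        if re.search(template_regex, sentence):
            continue  # skip occurrences of the input template itself
        words = sentence.split()
        for x, y in anchors:
            if x in words and y in words:
                templates.add(" ".join(
                    "X" if w == x else "Y" if w == y else w for w in words))
    return templates

corpus = ["Jones accused Clinton", "BBC accused Blair",
          "Jones called Clinton indictable", "Blair defended before BBC"]
seed = r"(\w+) accused (\w+)"
anchors = anchor_sets(seed, corpus)
templates = extract_templates(anchors, corpus, seed)
print(sorted(templates))  # ['X called Y indictable', 'Y defended before X']
```

Each extracted template can then serve as a new input template, which gives the iteration shown on the slide.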
iterate
125
TEASE
(Szpektor et al., 2004)

Innovations with respect to research before 2004
First direct algorithm for extracting rules
Feature selection is performed to identify the most
informative features
Extracted forms are clustered to obtain the most
general sentence prototype of a given set of
equivalent forms

Skipped
126
Espresso
(Pantel & Pennacchiotti, 2006)

Type: iterative algorithm
On the word side
Target equivalence classes: relations
Target forms: expressions, sequences of tokens
Innovations with respect to research before 2006
A measure to distinguish specific vs. general
patterns (ranking in the equivalent forms)

compose(X,Y)
Y is composed by X, Y is made of X
127
Espresso
(Pantel & Pennacchiotti, 2006)
Seed instances: (leader, panel), (city, region), (oxygen, water)
Ranked instances: 1.0 (tree, land); 0.9 (atom, molecule); 0.7 (leader, panel); 0.6 (range of information, FBI report); 0.6 (artifact, exhibit); 0.2 (oxygen, hydrogen)
Skipped
Extracted instances: (tree, land), (oxygen, hydrogen), (atom, molecule), (leader, panel), (range of information, FBI report), (artifact, exhibit)
Ranked patterns: 1.0 "Y is composed by X"; 0.8 "Y is part of X"; 0.2 "X,Y"
Extracted patterns: "Y is composed by X"; "X,Y"; "Y is part of X"
128
Espresso
(Pantel & Pennacchiotti, 2006)

Innovations with respect to research before 2006
A measure to distinguish specific vs. general
patterns (ranking in the equivalent forms)
Both pattern and instance selection are
performed
Different use of general and specific patterns
