Title: Textual Entailment
1Textual Entailment
- Dan Roth, University of Illinois, Urbana-Champaign, USA
- Ido Dagan, Bar Ilan University, Israel
- Fabio Massimo Zanzotto, University of Rome, Italy
ACL 2007
2Outline
- Motivation and Task Definition
- A Skeletal review of Textual Entailment Systems
- Knowledge Acquisition Methods
- Applications of Textual Entailment
- A Textual Entailment view of Applied Semantics
3I. Motivation and Task Definition
4Motivation
- Text applications require semantic inference
- A common framework for applied semantics is needed, but still missing
- Textual entailment may provide such a framework
5Desiderata for Modeling Framework
- A framework for a target level of language processing should provide:
- Generic (feasible) module for applications
- Unified (agreeable) paradigm for investigating language phenomena
- Most semantics research is scattered (e.g. vs. syntax)
- WSD, NER, SRL, lexical semantics relations
- Dominating approach: interpretation
6Natural Language and Meaning
Meaning
Language
7Variability of Semantic Expression
The Dow Jones Industrial Average closed up 255
Dow ends up
Dow gains 255 points
Stock market hits a record high
Dow climbs 255
- Model variability as relations between text expressions
- Equivalence: $text_1 \Leftrightarrow text_2$ (paraphrasing)
- Entailment: $text_1 \Rightarrow text_2$ (the general case)
8Typical Application Inference: Entailment
Question: Who bought Overture? >> Expected answer form: X bought Overture
Text: Overture's acquisition by Yahoo
Hypothesized answer: Yahoo bought Overture
(the text entails the hypothesized answer)
- Similar for IE: X acquire Y
- Similar for semantic IR: t: Overture was bought for ...
- Summarization (multi-document): identify redundant info
- MT evaluation (and recent ideas for MT)
- Educational applications
9KRAQ'05 Workshop - KNOWLEDGE and REASONING for ANSWERING QUESTIONS (IJCAI-05)
- CFP:
- Reasoning aspects: information fusion, search criteria expansion models, summarization and intensional answers, reasoning under uncertainty or with incomplete knowledge
- Knowledge representation and integration: levels of knowledge involved (e.g. ontologies, domain knowledge), knowledge extraction models and techniques to optimize response accuracy
- ... but similar needs for other applications: can entailment provide a common empirical framework?
10Classical Entailment Definition
- Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true
- Strict entailment: doesn't account for some uncertainty allowed in applications
11Almost certain Entailments
- t: The technological triumph known as GPS was incubated in the mind of Ivan Getting.
- h: Ivan Getting invented the GPS.
12Applied Textual Entailment
- A directional relation between two text fragments: Text (t) and Hypothesis (h)
- t entails h ($t \Rightarrow h$) if humans reading t will infer that h is most likely true
- Operational (applied) definition
- Human gold standard, as in NLP applications
- Assuming common background knowledge, which is indeed expected from applications
13Probabilistic Interpretation
- Definition
- t probabilistically entails h if
- $P(h \text{ is true} \mid t) > P(h \text{ is true})$
- t increases the likelihood of h being true
- Positive PMI: t provides information on h's truth
- $P(h \text{ is true} \mid t)$: entailment confidence
- The relevant entailment score for applications
- In practice: "most likely" entailment expected
14The Role of Knowledge
- For textual entailment to hold we require:
- text AND knowledge $\Rightarrow$ h
- but
- knowledge should not entail h alone
- Systems are not supposed to validate h's truth regardless of t (e.g. by searching h on the web)
15PASCAL Recognizing Textual Entailment (RTE) Challenges
EU FP-6 Funded PASCAL Network of Excellence 2004-7
Bar-Ilan University; ITC-irst and CELCT, Trento; MITRE; Microsoft Research
16Generic Dataset by Application Use
- 7 application settings in RTE-1, 4 in RTE-2/3
- QA
- IE
- Semantic IR
- Comparable documents / multi-doc summarization
- MT evaluation
- Reading comprehension
- Paraphrase acquisition
- Most data created from actual applications' output
- RTE-2/3: 800 examples in development and test sets
- 50-50 YES/NO split
17RTE Examples
| # | TEXT | HYPOTHESIS | TASK | ENTAILMENT |
|---|------|------------|------|------------|
| 1 | Regan attended a ceremony in Washington to commemorate the landings in Normandy. | Washington is located in Normandy. | IE | False |
| 2 | Google files for its long awaited IPO. | Google goes public. | IR | True |
| 3 | a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others. | Cardinal Juan Jesus Posadas Ocampo died in 1993. | QA | True |
| 4 | The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%. | The SPD is defeated by the opposition parties. | IE | True |
18Participation and Impact
- Very successful challenges, world wide
- RTE-1 17 groups
- RTE-2 23 groups
- 150 downloads
- RTE-3 25 groups
- Joint workshop at ACL-07
- High interest in the research community
- Papers, conference sessions and areas, PhDs, influence on funded projects
- Textual Entailment special issue at JNLE
- ACL-07 tutorial
19Methods and Approaches (RTE-2)
- Measure similarity match between t and h (coverage of h by t):
- Lexical overlap (unigram, N-gram, subsequence)
- Lexical substitution (WordNet, statistical)
- Syntactic matching/transformations
- Lexical-syntactic variations (paraphrases)
- Semantic role labeling and matching
- Global similarity parameters (e.g. negation, modality)
- Cross-pair similarity
- Detect mismatch (for non-entailment)
- Interpretation to logic representation + logic inference
20Dominant approach: Supervised Learning
Pipeline: (t, h) -> similarity features (lexical, n-gram, syntactic, semantic, global) -> feature vector -> classifier -> YES / NO
- Features model similarity and mismatch
- Classifier determines relative weights of information sources
- Train on development set and auxiliary t-h corpora
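The pipeline above can be sketched in a few lines. This is a minimal illustration, not any participant's system: the two features (hypothesis coverage and a negation mismatch flag), the tiny `NEGATIONS` list, the toy training pairs, and the choice of a perceptron as the classifier are all assumptions made for the example.

```python
import re

NEGATIONS = {"not", "no", "never"}  # hypothetical, deliberately tiny list

def tokens(sentence):
    return re.findall(r"[a-z0-9']+", sentence.lower())

def features(text, hyp):
    """Feature vector: [coverage of h by t, negation mismatch, bias]."""
    t, h = tokens(text), tokens(hyp)
    coverage = sum(w in t for w in h) / len(h) if h else 0.0
    neg_mismatch = float(bool(NEGATIONS & set(t)) != bool(NEGATIONS & set(h)))
    return [coverage, neg_mismatch, 1.0]

def train_perceptron(pairs, labels, epochs=100):
    """Learn the relative weights of the information sources."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        mistakes = 0
        for (text, hyp), label in zip(pairs, labels):
            x = features(text, hyp)
            y = 1 if label else -1
            if y * sum(w * v for w, v in zip(weights, x)) <= 0:
                weights = [w + y * v for w, v in zip(weights, x)]
                mistakes += 1
        if mistakes == 0:  # every training pair is classified correctly
            break
    return weights

def entails(weights, text, hyp):
    return sum(w * v for w, v in zip(weights, features(text, hyp))) > 0

# Toy "development set" (made up for illustration).
PAIRS = [
    ("Yahoo bought Overture last year", "Yahoo bought Overture"),
    ("Dow gains 255 points", "Dow gains points"),
    ("Yahoo bought Overture last year", "Yahoo did not buy Overture"),
    ("Google files for its IPO", "Washington is located in Normandy"),
]
LABELS = [True, True, False, False]

WEIGHTS = train_perceptron(PAIRS, LABELS)
```

A real system would replace `features` with the lexical, syntactic, semantic and global measures listed on the slide and would train on the RTE development set rather than four hand-written pairs.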
21RTE-2 Results
| Average Precision | Accuracy | First Author (Group) |
|---|---|---|
| 80.8 | 75.4 | Hickl (LCC) |
| 71.3 | 73.8 | Tatu (LCC) |
| 64.4 | 63.9 | Zanzotto (Milan & Rome) |
| 62.8 | 62.6 | Adams (Dallas) |
| 66.9 | 61.6 | Bos (Rome & Leeds) |
| | 58.1-60.5 | 11 groups |
| | 52.9-55.6 | 7 groups |
Average: 60; Median: 59
22Analysis
- For the first time, methods that carry some deeper analysis seemed (?) to outperform shallow lexical methods
- Cf. Kevin Knight's invited talk at EACL-06, titled "Isn't Linguistic Structure Important, Asked the Engineer"
- Still, most systems that do utilize deep analysis did not score significantly better than the lexical baseline
23Why?
- System reports point at:
- Lack of knowledge (syntactic transformation rules, paraphrases, lexical relations, etc.)
- Lack of training data
- It seems that systems that coped better with these issues performed best:
- Hickl et al.: acquisition of large entailment corpora for training
- Tatu et al.: large knowledge bases (linguistic and world knowledge)
24Some suggested research directions
- Knowledge acquisition
- Unsupervised acquisition of linguistic and world knowledge from general corpora and web
- Acquiring larger entailment corpora
- Manual resources and knowledge engineering
- Inference
- Principled framework for inference and fusion of information levels
- Are we happy with bags of features?
25Complementary Evaluation Modes
- "Seek" mode:
- Input: h and corpus
- Output: all entailing t's in corpus
- Captures information seeking needs, but requires post-run annotation (TREC-style)
- Entailment subtasks evaluations
- Lexical, lexical-syntactic, logical, alignment
- Contribution to various applications
- QA: Harabagiu & Hickl, ACL-06; RE: Romano et al., EACL-06
26II. A Skeletal review of Textual Entailment Systems
27Textual Entailment
Text: Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc. last year.
Entails / subsumed by?
- Yahoo acquired Overture
- Overture is a search company
- Google is a search company
- Google owns Overture
- ...
How? Phrasal verb paraphrasing, entity matching, alignment, semantic role labeling, integration
28A general Strategy for Textual Entailment
- Given a sentence T and a sentence H, decide whether T entails H
- Re-represent T and H: lexical, syntactic, semantic
- Knowledge base (semantic, structural, pragmatic): transformations/rules applied to the representation
- Decision: find the set of transformations/features of the new representation (or use these to create a cost function) that allows embedding of H in T.
29Details of The Entailment Strategy
- Preprocessing
- Multiple levels of lexical pre-processing
- Syntactic Parsing
- Shallow semantic parsing
- Annotating semantic phenomena
- Representation
- Bag of words, n-grams through tree/graphs based representation
- Logical representations
- Knowledge Sources
- Syntactic mapping rules
- Lexical resources
- Semantic Phenomena specific modules
- RTE specific knowledge sources
- Additional Corpora/Web resources
- Control Strategy and Decision Making
- Single pass/iterative processing
- Strict vs. Parameter based
- Justification
- What can be said about the decision?
30The Case of Shallow Lexical Approaches
- Preprocessing
- Identify Stop Words
- Representation
- Bag of words
- Knowledge Sources
- Shallow lexical resources, typically WordNet
- Control Strategy and Decision Making
- Single pass
- Compute similarity; use threshold tuned on a development set (could be per task)
- Justification
- It works
31Shallow Lexical Approaches (Example)
- Lexical/word-based semantic overlap: score based on matching each word in H with some word in T
- Word similarity measure may use WordNet
- May take account of subsequences, word order
- "Learn" threshold on maximum word-based match score
Clearly, this may not appeal to what we think of as understanding, and it is easy to generate cases for which this does not work well. However, it works (surprisingly) well with respect to current evaluation metrics (data sets?).
Text: The Cassini spacecraft arrived at Titan in July, 2006.
Text: NASA's Cassini-Huygens spacecraft traveled to Saturn in 2006.
Text: The Cassini spacecraft has taken images that show rivers on Saturn's moon Titan.
Hyp: The Cassini spacecraft has reached Titan.
32An Algorithm: LocalLexicalMatching
- For each word in Hypothesis, Text:
  - if word matches stopword: remove word
- if no words left in Hypothesis or Text: return 0
- numberMatched = 0
- for each word W_H in Hypothesis:
  - for each word W_T in Text:
    - HYP_LEMMAS = Lemmatize(W_H)
    - TEXT_LEMMAS = Lemmatize(W_T)  (use WordNet's)
    - if any term in HYP_LEMMAS matches any term in TEXT_LEMMAS using LexicalCompare():
      - numberMatched++
- Return numberMatched / |HYP_Lemmas|
33An Algorithm: LocalLexicalMatching (Cont.)
LLM Performance: RTE-2 Dev 63.00, Test 60.50; RTE-3 Dev 67.50, Test 65.63
- LexicalCompare():
  - if (LEMMA_H == LEMMA_T): return TRUE
  - if (HypernymDistanceFromTo(textWord, hypothesisWord) < 3): return TRUE
  - if (MeronymyDistanceFromTo(textWord, hypothesisWord) < 3): return TRUE
  - if (MemberOfDistanceFromTo(textWord, hypothesisWord) < 3): return TRUE
  - if (SynonymOf(textWord, hypothesisWord)): return TRUE
- Notes
- LexicalCompare is asymmetric and makes use of a single relation type
- Additional differences could be attributed to the stop word list (e.g., including aux verbs)
- Straightforward improvements such as bi-grams do not help.
- More sophisticated lexical knowledge (entities, time) should help.
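The two LocalLexicalMatching slides can be turned into a small runnable sketch. Everything lexical here is a stand-in: `STOPWORDS`, `LEMMAS`, `HYPERNYMS` and `SYNONYMS` are hypothetical toy tables replacing WordNet, the meronymy and member-of checks are omitted, the distance bound `max_distance=3` follows one reading of the slide, and each hypothesis word is counted at most once (an assumption, since the pseudocode does not say).

```python
import re

STOPWORDS = {"the", "a", "an", "has", "have", "in", "at", "to", "of"}  # toy list
LEMMAS = {"arrived": "arrive", "reached": "reach", "traveled": "travel"}  # toy lemmatizer
HYPERNYMS = {"spacecraft": "craft", "craft": "vehicle"}  # child -> parent, toy links
SYNONYMS = {frozenset({"arrive", "reach"})}  # toy synonym pairs

def lemmas(sentence):
    """Tokenize, drop stop words, and lemmatize with the toy table."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [LEMMAS.get(w, w) for w in words if w not in STOPWORDS]

def hypernym_distance(text_word, hyp_word):
    """Steps needed to climb from text_word up to hyp_word, or None."""
    steps, current = 0, text_word
    while current in HYPERNYMS:
        current, steps = HYPERNYMS[current], steps + 1
        if current == hyp_word:
            return steps
    return None

def lexical_compare(text_word, hyp_word, max_distance=3):
    """Asymmetric match: the text word may be more specific than the hypothesis word."""
    if text_word == hyp_word:
        return True
    distance = hypernym_distance(text_word, hyp_word)
    if distance is not None and distance < max_distance:
        return True
    return frozenset({text_word, hyp_word}) in SYNONYMS

def llm_score(text, hyp):
    """Fraction of hypothesis lemmas matched by some text lemma."""
    text_lemmas, hyp_lemmas = lemmas(text), lemmas(hyp)
    if not text_lemmas or not hyp_lemmas:
        return 0.0
    matched = sum(any(lexical_compare(t, h) for t in text_lemmas) for h in hyp_lemmas)
    return matched / len(hyp_lemmas)

HYP = "The Cassini spacecraft has reached Titan."
SCORE_1 = llm_score("The Cassini spacecraft arrived at Titan in July, 2006.", HYP)
SCORE_2 = llm_score("NASA's Cassini-Huygens spacecraft traveled to Saturn in 2006.", HYP)
```

Entailment would then be predicted when the score exceeds a threshold tuned on a development set, as the shallow lexical slides describe.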
35Preprocessing
- Syntactic Processing
- Syntactic Parsing (Collins, Charniak, CCG)
- Dependency Parsing (types)
- Lexical Processing
- Tokenization, lemmatization
- For each word in Hypothesis, Text
- Phrasal verbs
- Idiom processing
- Named Entities Normalization
- Date/Time arguments Normalization
- Semantic Processing
- Semantic Role Labeling
- Nominalization
- Modality/Polarity/Factive
- Co-reference
Slide annotations: only a few systems; often used only during decision making
37Basic Representations
Figure: levels of representation, from Raw Text through Local Lexical, Syntactic Parse, Semantic Representation and Logical Forms up to Meaning Representation (labels: Representation, Inference, Textual Entailment)
- Most approaches augment the basic structure defined by the processing level with additional annotation and make use of a tree/graph/frame-based system.
38Basic Representations (Syntax)
Syntactic Parse
Local Lexical
Hyp: The Cassini spacecraft has reached Titan.
39Basic Representations (Shallow Semantics: Pred-Arg)
- T: The government purchase of the Roanoke building, a former prison, took place in 1902.
- H: The Roanoke building, which was a former prison, was bought by the government in 1902.
Figure (predicate-argument graphs):
- T: take [The govt. purchase prison] [place] [in 1902]; purchase [The Roanoke building]
- H: buy [The Roanoke prison] [In 1902] [The government]; be [The Roanoke building] [a former prison]
[Roth & Sammons '07]
40Basic Representations (Logical Representation)
[Bos & Markert] The semantic representation language is a first-order fragment of the language used in Discourse Representation Theory (DRT), conveying argument structure with a neo-Davidsonian analysis and including the recursive DRS structure to cover negation, disjunction, and implication.
41Representing Knowledge Sources
- Rather straightforward in the logical framework
- Tree/graph based representations may also use rule based transformations to encode different kinds of knowledge, sometimes represented as generic or knowledge based tree transformations.
42Representing Knowledge Sources (cont.)
- In general, there is a mix of procedural and rule based encodings of knowledge sources
- Done by hanging more information on the parse tree or predicate argument representation [example from LCC's system]
- Or by different frame-based annotation systems for encoding information, which are processed procedurally.
44Knowledge Sources
- The knowledge sources available to the system are the most significant component of supporting TE.
- Different systems draw the line between preprocessing capabilities and knowledge resources differently.
- The way resources are handled also differs across approaches.
45Enriching Preprocessing
- In addition to syntactic parsing, several approaches enrich the representation with various linguistic resources:
- POS tagging
- Stemming
- Predicate argument representation: verb predicates and nominalization
- Entity annotation: stand-alone NERs with a variable number of classes
- Acronym handling and entity normalization: mapping mentions of the same entity mentioned in different ways to a single ID
- Co-reference resolution
- Dates, times and numeric values: identification and normalization
- Identification of semantic relations: complex nominals, genitives, adjectival phrases, and adjectival clauses
- Event identification and frame construction
46Lexical Resources
- Recognizing that a word or a phrase in S entails a word or a phrase in H is essential in determining Textual Entailment.
- WordNet is the most commonly used resource
- In most cases, a WordNet based similarity measure between words is used. This is typically a symmetric relation.
- Lexical chains over WordNet are used; in some cases, care is taken to disallow some chains of specific relations.
- Extended WordNet is being used to make use of entities
- Derivation relation, which links verbs with their corresponding nominalized nouns.
47Lexical Resources (Cont.)
- Lexical Paraphrasing Rules
- A number of efforts to acquire relational paraphrase rules are under way, and several systems are making use of resources such as DIRT and TEASE.
- Some systems seem to have acquired paraphrase rules that are in the RTE corpus:
- person killed --> claimed one life
- hand reins over to --> give starting job to
- same-sex marriage --> gay nuptials
- cast ballots in the election --> vote
- dominant firm --> monopoly power
- death toll --> kill
- try to kill --> attack
- lost their lives --> were killed
- left people dead --> people were killed
48Semantic Phenomena
- A large number of semantic phenomena have been identified as significant to Textual Entailment.
- A large number of them are being handled (in a restricted way) by some of the systems. Very little per-phenomenon quantification has been done, if at all.
- Semantic implications of interpreting syntactic structures [Braz et al. '05; Bar-Haim et al. '07]
- Conjunctions
- Jake and Jill ran up the hill $\Rightarrow$ Jake ran up the hill
- Jake and Jill met on the hill $\nRightarrow$ Jake met on the hill
- Clausal modifiers
- But celebrations were muted as many Iranians observed a Shi'ite mourning month. $\Rightarrow$ Many Iranians observed a Shi'ite mourning month.
- Semantic Role Labeling handles this phenomenon automatically
49Semantic Phenomena (Cont.)
- Relative clauses
- The assailants fired six bullets at the car, which carried Vladimir Skobtsov. $\Rightarrow$ The car carried Vladimir Skobtsov.
- Semantic Role Labeling handles this phenomenon automatically
- Appositives
- Frank Robinson, a one-time manager of the Indians, has the distinction for the NL. $\Rightarrow$ Frank Robinson is a one-time manager of the Indians.
- Passive
- We have been approached by the investment banker. $\Rightarrow$ The investment banker approached us.
- Semantic Role Labeling handles this phenomenon automatically
- Genitive modifier
- Malaysia's crude palm oil output is estimated to have risen. $\Rightarrow$ The crude palm oil output of Malaysia is estimated to have risen.
50Logical Structure
- Factivity: uncovering the context in which a verb phrase is embedded
- The terrorists tried to enter the building. $\nRightarrow$ The terrorists entered the building.
- Polarity: negative markers or a negation-denoting verb (e.g. deny, refuse, fail)
- The terrorists failed to enter the building. $\nRightarrow$ The terrorists entered the building.
- Modality/Negation: dealing with modal auxiliary verbs (can, must, should) that modify verbs' meanings, and with the identification of the scope of negation
- Superlatives/Comparatives/Monotonicity: inflecting adjectives or adverbs
- Quantifiers, determiners and articles
51Some Examples [Braz et al., IJCAI workshop '05; PARC Corpus]
- T: Legally, John could drive.
- H: John drove.
- S: Bush said that Khan sold centrifuges to North Korea.
- H: Centrifuges were sold to North Korea.
- S: No US congressman visited Iraq until the war.
- H: Some US congressmen visited Iraq before the war.
- S: The room was full of women.
- H: The room was full of intelligent women.
- S: The New York Times reported that Hanssen sold FBI secrets to the Russians and could face the death penalty.
- H: Hanssen sold FBI secrets to the Russians.
- S: All soldiers were killed in the ambush.
- H: Many soldiers were killed in the ambush.
53Control Strategy and Decision Making
- Single iteration
- Strict logical approaches are, in principle, a single stage computation.
- The pair is processed and transformed into logical form.
- Existing theorem provers act on the pair along with the KB.
- Multiple iterations
- Graph based algorithms are typically iterative.
- Following [Punyakanok et al. '04], transformations are applied and the entailment test is done after each transformation is applied.
- Transformations can be chained, but sometimes the order makes a difference. The algorithm can be greedy, or it can be more exhaustive and search for the best path found [Braz et al. '05; Bar-Haim et al. '07]
54Transformation Walkthrough [Braz et al. '05]
- T: The government purchase of the Roanoke building, a former prison, took place in 1902.
- H: The Roanoke building, which was a former prison, was bought by the government in 1902.
Does H follow from T?
55Transformation Walkthrough (1)
- T: The government purchase of the Roanoke building, a former prison, took place in 1902.
- H: The Roanoke building, which was a former prison, was bought by the government in 1902.
Figure (predicate-argument graphs):
- T: take [The govt. purchase prison] [place] [in 1902]; purchase [The Roanoke building]
- H: buy [The Roanoke prison] [In 1902] [The government]; be [The Roanoke building] [a former prison]
56Transformation Walkthrough (2)
- T: The government purchase of the Roanoke building, a former prison, took place in 1902.
- Phrasal Verb Rewriter: The government purchase of the Roanoke building, a former prison, occurred in 1902.
- H: The Roanoke building, which was a former prison, was bought by the government.
Figure: occur [The govt. purchase prison] [in 1902]
57Transformation Walkthrough (3)
- T: The government purchase of the Roanoke building, a former prison, occurred in 1902.
- Nominalization Promoter: The government purchase the Roanoke building in 1902.
- H: The Roanoke building, which was a former prison, was bought by the government in 1902.
- NOTE: depends on the earlier transformation; order is important!
Figure: purchase [The government] [the Roanoke building, a former prison] [In 1902]
58Transformation Walkthrough (4)
- T: The government purchase of the Roanoke building, a former prison, occurred in 1902.
- Apposition Rewriter: The Roanoke building be a former prison.
- H: The Roanoke building, which was a former prison, was bought by the government in 1902.
Figure: be [The Roanoke building] [a former prison]
59Transformation Walkthrough (5)
- T: The government purchase of the Roanoke building, a former prison, took place in 1902.
- H: The Roanoke building, which was a former prison, was bought by the government in 1902.
- WordNet: purchase in T is matched with buy in H
Figure:
- T (after transformations): purchase [The Roanoke prison] [In 1902] [The government]; be [The Roanoke building] [a former prison]
- H: buy [The Roanoke prison] [In 1902] [The government]; be [The Roanoke building] [a former prison]
60Characteristics
- Multiple paths => optimization problem
- Shortest or highest-confidence path through transformations
- Order is important; may need to explore different orderings
- Module dependencies are local: module B does not need access to module A's KB/inference, only its output
- If the outcome is true, the (optimal) set of transformations and local comparisons forms a proof
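The search over transformation orderings can be illustrated with a breadth-first sketch that returns a shortest chain of rewrites. The tuple-based fact encoding, the four toy rule functions, and the `find_proof` name are assumptions made for this example; they only mimic the Roanoke walkthrough and are not the cited systems' representation.

```python
from collections import deque

# Facts are tuples: (predicate, arg1, arg2, ...). Toy encoding, for illustration only.
TEXT_FACTS = {
    ("take place", "purchase", "government", "Roanoke building", "1902"),
    ("appos", "Roanoke building", "former prison"),
}
HYP_FACTS = {
    ("buy", "government", "Roanoke building", "1902"),
    ("be", "Roanoke building", "former prison"),
}

def phrasal_verb(facts):      # "take place" -> "occur"
    return frozenset(("occur",) + f[1:] if f[0] == "take place" else f for f in facts)

def nominalization(facts):    # "purchase ... occurred" -> purchase(...)
    return frozenset(f[1:] if f[:2] == ("occur", "purchase") else f for f in facts)

def apposition(facts):        # "X, a Y" -> be(X, Y)
    return frozenset(("be",) + f[1:] if f[0] == "appos" else f for f in facts)

def synonym(facts):           # lexical match: purchase -> buy
    return frozenset(("buy",) + f[1:] if f[0] == "purchase" else f for f in facts)

RULES = [phrasal_verb, nominalization, apposition, synonym]

def find_proof(text_facts, hyp_facts, rules, max_depth=6):
    """Breadth-first search for a shortest chain of rules embedding H in T."""
    start, goal = frozenset(text_facts), frozenset(hyp_facts)
    queue, seen = deque([(start, [])]), {start}
    while queue:
        facts, path = queue.popleft()
        if goal <= facts:            # entailment test after each transformation
            return path
        if len(path) == max_depth:
            continue
        for rule in rules:
            new_facts = rule(facts)
            if new_facts not in seen:
                seen.add(new_facts)
                queue.append((new_facts, path + [rule.__name__]))
    return None                      # no proof found within the depth bound

PROOF = find_proof(TEXT_FACTS, HYP_FACTS, RULES)
```

The returned list of rule names plays the role of the proof: `nominalization` can only fire after `phrasal_verb`, which is the ordering dependency noted in the walkthrough. A highest-confidence variant would attach a cost to each rule and replace the queue with a priority queue.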
61Summary Control Strategy and Decision Making
- Despite the appeal of the strict logical approaches, as of today they do not work well enough.
- Bos & Markert:
- The strict logical approach is falling significantly behind good LLMs (local lexical matching) and multiple levels of lexical pre-processing
- Only incorporating rather shallow features and using them in the evaluation saves this approach.
- Braz et al.:
- Strict graph based representation is not doing as well as LLM.
- Tatu et al.:
- Results show that the strict logical approach is inferior to LLMs, but when put together, it produces some gain.
- Using machine learning methods as a way to combine systems and multiple features has been found very useful.
62Hybrid/Ensemble Approaches
- Bos et al. use a theorem prover and a model builder
- Expand models of T, H using the model builder, check sizes of models
- Test consistency of T, H with background knowledge
- Try to prove entailment with and without background knowledge
- Tatu et al. (2006) use an ensemble approach
- Create two logical systems, one lexical alignment system
- Combine system scores using coefficients found via search (train on annotated data)
- Modify coefficients for different tasks
- Zanzotto et al. (2006) try to learn from comparison of structures of T, H for true vs. false entailment pairs
- Use lexical, syntactic annotation to characterize the match between T, H for successful, unsuccessful entailment pairs
- Train kernel/SVM to distinguish between match graphs
63Justification
- For most approaches justification is given only by the (preprocessed) data
- Empirical evaluation
- Logical approaches
- There is a proof theoretic justification
- Modulo the power of the resources and the ability to map a sentence to a logical form.
- Graph/tree based approaches
- There is a model theoretic justification
- The approach is sound, but not complete, modulo the availability of resources.
64Justifying Graph Based Approaches [Braz et al. '05]
- R: a knowledge representation language, with a well defined syntax and semantics over a domain D.
- For text snippets s, t:
- $r_s, r_t$: their representations in R.
- $M(r_s), M(r_t)$: their model theoretic representations
- There is a well defined notion of subsumption in R, defined model theoretically
- For $u, v \in R$: u is subsumed by v when $M(u) \subseteq M(v)$
- Not an algorithm: need a proof theory.
65Defining Semantic Entailment (2)
- The proof theory is weak: it will show $r_s \subseteq r_t$ only when they are relatively similar syntactically.
- $r \in R$ is faithful to s if $M(r_s) = M(r)$
- Definition: Let s, t be text snippets with representations $r_s, r_t \in R$.
- We say that s semantically entails t if there is a representation $r \in R$ that is faithful to s, for which we can prove that $r \subseteq r_t$
- Given $r_s$, we need to generate many equivalent representations $r'_s$ and test $r'_s \subseteq r_t$
- Cannot be done exhaustively. How to generate alternative representations?
66Defining Semantic Entailment (3)
- A rewrite rule (l, r) is a pair of expressions in R such that $l \subseteq r$
- Given a representation $r_s$ of s and a rule (l, r) for which $r_s \subseteq l$, the augmentation of $r_s$ via (l, r) is $r'_s = r_s \wedge r$.
- Claim: $r'_s$ is faithful to s.
- Proof: In general, since $r'_s = r_s \wedge r$, then $M(r'_s) = M(r_s) \cap M(r)$. However, since $r_s \subseteq l \subseteq r$, then $M(r_s) \subseteq M(r)$.
- Consequently $M(r'_s) = M(r_s)$
- And the augmented representation is faithful to s.
Summary: $l \subseteq r,\ r_s \subseteq l \ \Longrightarrow\ r'_s = r_s \wedge r$
67Comments
- The claim suggests an algorithm for generating alternative (equivalent) representations and for semantic entailment.
- The resulting algorithm is sound, but not complete.
- Completeness depends on the quality of the KB of rules.
- The power of this algorithm is in the rules KB.
- l and r might be very different syntactically, but by satisfying model theoretic subsumption they provide expressivity to the re-representation in a way that facilitates the overall subsumption.
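A toy propositional instance may help make the augmentation argument concrete. In this sketch a representation is a set of atoms read as a conjunction, the "weak" proof theory is plain set inclusion, and models are enumerated by brute force only in order to check faithfulness. The atom names, the two rules, and all function names are hypothetical; the representation language of the cited work is much richer.

```python
from itertools import product

ATOMS = ["purchase", "buy", "own"]
# Rewrite rules (l, r) with l subsumed by r; here they also act as the background KB.
RULES = [
    (frozenset({"purchase"}), frozenset({"buy"})),
    (frozenset({"buy"}), frozenset({"own"})),
]

def worlds():
    """All truth assignments that respect every rule (l true implies r true)."""
    for values in product([False, True], repeat=len(ATOMS)):
        world = frozenset(a for a, v in zip(ATOMS, values) if v)
        if all(not l <= world or r <= world for l, r in RULES):
            yield world

def models(rep):
    """M(rep): the worlds in which every atom of rep holds."""
    return {w for w in worlds() if rep <= w}

def provable(u, v):
    """Weak syntactic proof theory: u is shown subsumed by v only if v's atoms occur in u."""
    return v <= u

def augment(rep, rules):
    """Apply r' = r AND rhs for every rule whose lhs provably subsumes rep, to a fixpoint."""
    rep, changed = frozenset(rep), True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if provable(rep, lhs) and not rhs <= rep:
                rep, changed = rep | rhs, True
    return rep

def semantically_entails(r_s, r_t, rules):
    return provable(augment(r_s, rules), frozenset(r_t))

R_S, R_T = frozenset({"purchase"}), frozenset({"own"})
```

Here `provable(R_S, R_T)` fails because the two representations share no atoms, while `semantically_entails(R_S, R_T, RULES)` succeeds after two augmentations, and the augmented representation has exactly the same models as `R_S`, which is the faithfulness claim of the previous slide.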
68Non-Entailment
- The problem of determining non-entailment is harder, mostly due to its structure.
- Most approaches determine non-entailment heuristically.
- Set a threshold for a cost function. If it is not met by the pair, say no.
- Several approaches have identified specific features that hint at non-entailment.
- A model theoretic approach for non-entailment has also been developed, although its effectiveness isn't clear yet.
69What are we missing?
- It is completely clear that the key resource missing is knowledge.
- Better resources translate immediately to better results.
- At this point existing resources seem to be lacking in coverage and accuracy.
- Not enough high quality public resources; no quantification.
- Some examples:
- Lexical knowledge: some cases are difficult to acquire systematically.
- A bought Y $\Rightarrow$ A has/owns Y
- Many of the current lexical resources are very noisy.
- Numbers, quantitative reasoning
- Time and date; temporal reasoning.
- Robust event based reasoning and information integration
70Textual Entailment as a Classification Task
71RTE as classification task
- RTE is a classification task:
- Given a pair, we need to decide whether T implies H or T does not imply H
- We can learn a classifier from annotated examples
- What do we need:
- A learning algorithm
- A suitable feature space
72Defining the feature space
- How do we define the feature space?
- Possible features
- "Distance features": features of some distance between T and H
- "Entailment trigger features"
- "Pair features": the content of the T-H pair is represented
- Possible representations of the sentences
- Bag-of-words (possibly with n-grams)
- Syntactic representation
- Semantic representation
73Distance Features
- Possible features
- Number of words in common
- Longest common subsequence
- Longest common syntactic subtree
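Two of the distance features listed above are easy to sketch without a parser. The tokenizer and the function names below are assumptions made for illustration, and the longest common syntactic subtree is left out because it needs parse trees.

```python
import re

def tokens(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]

def distance_features(text, hyp):
    t, h = tokens(text), tokens(hyp)
    return {
        "common_words": len(set(t) & set(h)),   # order-insensitive overlap
        "lcs": lcs_length(t, h),                # order-sensitive overlap
    }

SAME_ORDER = distance_features("all solid companies pay dividends",
                               "all solid companies pay cash dividends")
REORDERED = distance_features("Yahoo bought Overture",
                              "Overture was bought by Yahoo")
```

The second pair shares three words but has a common subsequence of length one, which shows why word-order-sensitive and order-insensitive distances can disagree on a passive rewrite.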
74Entailment Triggers
- Possible features
- from (de Marneffe et al., 2006)
- Polarity features
- presence/absence of negative polarity contexts (not, no or few, without)
- "Oil price surged" $\nRightarrow$ "Oil prices didn't grow"
- Antonymy features
- presence/absence of antonymous words in T and H
- "Oil price is surging" $\nRightarrow$ "Oil prices is falling down"
- Adjunct features
- dropping/adding of syntactic adjunct when moving from T to H
- "all solid companies pay dividends" $\nRightarrow$ "all solid companies pay cash dividends"
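A rough sketch of how such trigger features might be computed from surface tokens follows. It is not the cited system: the negative marker list extends the slide's examples with a check for the "n't" suffix, `ANTONYMS` is a hypothetical toy table, and the adjunct feature is approximated by word-level additions and deletions because no parser is used.

```python
import re

NEGATIVE_MARKERS = {"not", "no", "few", "without"}
ANTONYMS = {frozenset({"surging", "falling"}), frozenset({"rise", "fall"})}  # toy table

def tokens(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def is_negative(toks):
    return any(t in NEGATIVE_MARKERS or t.endswith("n't") for t in toks)

def trigger_features(text, hyp):
    t, h = tokens(text), tokens(hyp)
    return {
        # polarity: exactly one side is in a negative context
        "polarity_mismatch": is_negative(t) != is_negative(h),
        # antonymy: some word in T has a listed antonym in H
        "antonymy": any(frozenset({a, b}) in ANTONYMS for a in t for b in h),
        # crude stand-in for adjunct adding/dropping
        "added_in_h": sorted(set(h) - set(t)),
        "dropped_from_t": sorted(set(t) - set(h)),
    }

POLARITY = trigger_features("Oil price surged", "Oil prices didn't grow")
ANTONYMY = trigger_features("Oil price is surging", "Oil prices is falling down")
ADJUNCT = trigger_features("all solid companies pay dividends",
                           "all solid companies pay cash dividends")
```

Each feature fires on the corresponding slide example: the negation in the first hypothesis, the surging/falling pair in the second, and the added word "cash" in the third.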
75Pair Features
- Possible features
- Bag-of-word spaces of T and H
- Syntactic spaces of T and H
Figure (bag-of-words pair space): T dimensions: companies_T, dividends_T, year_T, solid_T, end_T, pay_T; H dimensions: companies_H, insurance_H, dividends_H, year_H, solid_H, end_H, pay_H
76Pair Features what can we learn?
- Bag-of-word spaces of T and H
- We can learn
- T implies H when T contains "end"
- T does not imply H when H contains "end"
(Figure: the same bag-of-words pair space as on the previous slide)
It seems to be totally irrelevant!!!
77ML Methods in the possible feature spaces
Figure: systems arranged in a grid of possible features (Pair, Entailment Trigger, Distance) by sentence representation (Bag-of-words, Syntactic, Semantic); the cell placement is not recoverable from the extracted text.
- Systems shown: (Zanzotto & Moschitti, 2006), (Bos & Markert, 2006), (de Marneffe et al., 2006), (Hickl et al., 2006), (Inkpen et al., 2006), (Kozareva & Montoyo, 2006), (Herrera et al., 2006), (Rodney et al., 2006)
78Effectively using the Pair Feature Space
(Zanzotto, Moschitti, 2006)
- Roadmap
- Motivation: reasons why it is important even if it seems not
- Understanding the model with an example
- Challenges
- A simple example
- Defining the cross-pair similarity
79Observing the Distance Feature Space
(Zanzotto, Moschitti, 2006)
In a distance feature space (axes: common words, common syntactic dependencies) the two pairs are very likely the same point.
80What can happen in the pair feature space?
(Zanzotto, Moschitti, 2006)
81Observations
- Some examples are difficult to exploit in the distance feature space
- We need a space that considers the content and the structure of textual entailment examples
- Let us explore:
- the pair space!
- using the kernel trick: define the space by defining the distance $K(P_1, P_2)$ instead of defining the features
$K(T_1 \Rightarrow H_1, T_2 \Rightarrow H_2)$
Page 81
82Target
(Zanzotto, Moschitti, 2006)
- How do we build it
- Using a syntactic interpretation of sentences
- Using a similarity among trees KT(T', T''): this similarity counts the number of subtrees in common between T' and T''
- This is a syntactic pair feature space
- Question: do we need something more?
- Cross-pair similarity
- KS((T', H'), (T'', H'')) = KT(T', T'') + KT(H', H'')
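A minimal sketch of the idea on this slide, under stated assumptions: trees are encoded as nested tuples `(label, child, ...)` with words as plain strings, and KT is taken to be a Collins-Duffy style count of common tree fragments. The encoding and the function names are illustrative and are not taken from the slides.

```python
def is_leaf(node):
    return isinstance(node, str)

def internal_nodes(tree):
    """All non-leaf nodes of a tree given as nested tuples (label, child, ...)."""
    if is_leaf(tree):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(internal_nodes(child))
    return found

def production(node):
    """Label of a node plus the labels (or words) of its direct children."""
    kids = tuple(("word", c) if is_leaf(c) else ("node", c[0]) for c in node[1:])
    return (node[0], kids)

def common_fragments(n1, n2):
    """Number of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0
    count = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not is_leaf(c1):  # equal productions: c2 is a non-leaf too
            count *= 1 + common_fragments(c1, c2)
    return count

def tree_kernel(t1, t2):
    """KT: number of fragments shared by the two trees."""
    return sum(common_fragments(a, b)
               for a in internal_nodes(t1) for b in internal_nodes(t2))

def pair_kernel(pair1, pair2):
    """KS: text-to-text similarity plus hypothesis-to-hypothesis similarity."""
    (text1, hyp1), (text2, hyp2) = pair1, pair2
    return tree_kernel(text1, text2) + tree_kernel(hyp1, hyp2)

dogs_bark = ("S", ("NP", "dogs"), ("VP", "bark"))
cats_bark = ("S", ("NP", "cats"), ("VP", "bark"))
print(tree_kernel(dogs_bark, cats_bark))
```

Because the kernel is a sum of two tree kernels, it can be plugged into any kernel machine without ever listing the fragment features explicitly.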
83Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)
- Can we use syntactic tree similarity?
84Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)
- Can we use syntactic tree similarity?
85Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)
- Can we use syntactic tree similarity? Not only!
86Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)
- Can we use syntactic tree similarity? Not only!
- We want to use/exploit also the implied rewrite
rule
[Figure: trees over nodes a, b, c, d illustrating the implied rewrite rule]
87Exploiting Rewrite Rules
(Zanzotto, Moschitti, 2006)
- To capture the textual entailment recognition rule (rewrite rule or inference rule), the cross-pair similarity measure should consider
- the structural/syntactical similarity between, respectively, texts and hypotheses
- the similarity among the intra-pair relations between constituents
How to reduce the problem to a tree similarity computation?
88Exploiting Rewrite Rules
(Zanzotto, Moschitti, 2006)
89Exploiting Rewrite Rules
Intra-pair operations
(Zanzotto, Moschitti, 2006)
90Exploiting Rewrite Rules
Intra-pair operations → Finding anchors
(Zanzotto, Moschitti, 2006)
91Exploiting Rewrite Rules
- Intra-pair operations
- Finding anchors
- Naming anchors with placeholders
(Zanzotto, Moschitti, 2006)
92Exploiting Rewrite Rules
- Intra-pair operations
- Finding anchors
- Naming anchors with placeholders
- Propagating placeholders
(Zanzotto, Moschitti, 2006)
93Exploiting Rewrite Rules
- Intra-pair operations
- Finding anchors
- Naming anchors with placeholders
- Propagating placeholders
- Cross-pair operations
(Zanzotto, Moschitti, 2006)
94Exploiting Rewrite Rules
- Cross-pair operations
- Matching placeholders across pairs
- Intra-pair operations
- Finding anchors
- Naming anchors with placeholders
- Propagating placeholders
(Zanzotto, Moschitti, 2006)
95Exploiting Rewrite Rules
- Cross-pair operations
- Matching placeholders across pairs
- Renaming placeholders
- Intra-pair operations
- Finding anchors
- Naming anchors with placeholders
- Propagating placeholders
96Exploiting Rewrite Rules
- Intra-pair operations
- Finding anchors
- Naming anchors with placeholders
- Propagating placeholders
- Cross-pair operations
- Matching placeholders across pairs
- Renaming placeholders
- Calculating the similarity between syntactic
trees with co-indexed leaves
97Exploiting Rewrite Rules
- Intra-pair operations
- Finding anchors
- Naming anchors with placeholders
- Propagating placeholders
- Cross-pair operations
- Matching placeholders across pairs
- Renaming placeholders
- Calculating the similarity between syntactic
trees with co-indexed leaves
(Zanzotto, Moschitti, 2006)
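The first two intra-pair operations can be illustrated with a toy sketch. It assumes that anchors are simply the non-stopword tokens shared by T and H and that placeholders are consecutive numbers; the stopword list and function names are hypothetical. Propagating placeholders up the parse tree is not shown.

```python
STOPWORDS = frozenset({"the", "a", "of", "in"})  # hypothetical toy list

def find_anchors(text_tokens, hyp_tokens):
    """Anchors: words of H that also occur in T (toy criterion)."""
    anchors = []
    for tok in hyp_tokens:
        if tok in text_tokens and tok not in STOPWORDS and tok not in anchors:
            anchors.append(tok)
    return anchors

def name_anchors(anchors):
    """Assign a numbered placeholder to each anchor."""
    return {word: str(i) for i, word in enumerate(anchors, start=1)}

def mark(tokens, placeholders):
    """Attach the placeholder to every anchored token, leave the rest as is."""
    return [f"{tok}:{placeholders[tok]}" if tok in placeholders else tok
            for tok in tokens]

T = "all solid companies pay dividends".split()
H = "all solid companies pay cash dividends".split()
placeholders = name_anchors(find_anchors(T, H))
print(mark(T, placeholders))
print(mark(H, placeholders))
```

In the marked hypothesis only "cash" stays without a placeholder, which is exactly the material that the pair adds with respect to T.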
98Exploiting Rewrite Rules
(Zanzotto, Moschitti, 2006)
- The initial example: sim(H1,H3) > sim(H2,H3)?
99Defining the Cross-pair similarity
(Zanzotto, Moschitti, 2006)
- The cross-pair similarity is based on the distance between syntactic trees with co-indexed leaves:
KS((T', H'), (T'', H'')) = max over c ∈ C of ( KT(t(H', c), t(H'', i)) + KT(t(T', c), t(T'', i)) )
- where
- C is the set of all the correspondences between anchors of (T', H') and (T'', H'')
- t(S, c) returns the parse tree of the hypothesis (text) S where placeholders of these latter are replaced by means of the substitution c
- i is the identity substitution
- KT(t1, t2) is a function that measures the similarity between the two trees t1 and t2.
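Reading the definitions above as a maximization over the correspondences c in C, the cross-pair similarity can be sketched as follows. The tuple encoding of trees, the `#`-prefixed placeholder leaves and the fragment-counting KT are assumptions made for illustration. The enumeration of C is simplified to injective mappings of the first pair's placeholders into the second pair's.

```python
from itertools import permutations

def leaf(n):
    return isinstance(n, str)

def nodes(t):
    return [] if leaf(t) else [t] + [n for c in t[1:] for n in nodes(c)]

def production(n):
    return (n[0], tuple(("w", c) if leaf(c) else ("n", c[0]) for c in n[1:]))

def delta(a, b):
    """Fragments rooted at both a and b."""
    if production(a) != production(b):
        return 0
    score = 1
    for ca, cb in zip(a[1:], b[1:]):
        if not leaf(ca):
            score *= 1 + delta(ca, cb)
    return score

def tree_kernel(t1, t2):
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

def rename(tree, c):
    """t(S, c): apply substitution c to the placeholder leaves of a tree."""
    if leaf(tree):
        return c.get(tree, tree)
    return (tree[0],) + tuple(rename(child, c) for child in tree[1:])

def correspondences(anchors1, anchors2):
    """Simplified C: injective maps from placeholders of pair 1 to pair 2."""
    k = min(len(anchors1), len(anchors2))
    for chosen in permutations(anchors2, k):
        yield dict(zip(anchors1[:k], chosen))

def cross_pair_similarity(pair1, anchors1, pair2, anchors2):
    (t1, h1), (t2, h2) = pair1, pair2
    # The second pair is left untouched (identity substitution i).
    return max(tree_kernel(rename(h1, c), h2) + tree_kernel(rename(t1, c), t2)
               for c in correspondences(anchors1, anchors2))

def sentence(subj, verb, obj):
    return ("S", ("NP", subj), ("VP", ("V", verb), ("NP", obj)))

pair1 = (sentence("#a", "bought", "#b"), sentence("#a", "owns", "#b"))
pair2 = (sentence("#2", "bought", "#1"), sentence("#2", "owns", "#1"))
print(cross_pair_similarity(pair1, ["#a", "#b"], pair2, ["#1", "#2"]))
```

In the example the two pairs express the same rewrite rule with differently named placeholders. Only the correspondence that maps `#a` to `#2` and `#b` to `#1` makes the trees coincide, and the maximum picks it.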
100Defining the Cross-pair similarity
101Refining Cross-pair Similarity
(Zanzotto, Moschitti, 2006)
- Controlling complexity
- We reduced the size of the set of anchors using the notion of chunk
- Reducing the computational cost
- Many subtree computations are repeated during the computation of KT(t1, t2). This can be exploited for a better dynamic programming algorithm (Moschitti & Zanzotto, 2007)
- Focusing on information within a pair relevant for the entailment
- Text trees are pruned according to where anchors attach
102BREAK (30 min)
103III. Knowledge Acquisition Methods
104Knowledge Acquisition for TE
- What kind of knowledge do we need?
- Explicit Knowledge (Structured Knowledge Bases)
- Relations among words (or concepts)
- Symmetric: synonymy, co-hyponymy
- Directional: hyponymy, part of, ...
- Relations among sentence prototypes
- Symmetric: paraphrasing
- Directional: inference rules/rewrite rules
- Implicit Knowledge
- Relations among sentences
- Symmetric: paraphrasing examples
- Directional: entailment examples
105Acquisition of Explicit Knowledge
106Acquisition of Explicit Knowledge
- The questions we need to answer
- What?
- What do we want to learn? Which resources do we need?
- Using what?
- Which principles do we have?
- How?
- How do we organize the knowledge acquisition algorithm?
107Acquisition of Explicit Knowledge: what?
- Types of knowledge
- Symmetric
- Co-hyponymy
- Between words: cat ↔ dog
- Synonymy
- Between words: buy ↔ acquire
- Sentence prototypes (paraphrasing): X bought Y ↔ X acquired Z% of the Y's shares
- Directional semantic relations
- Words: cat → animal, buy → own, wheel partof car
- Sentence prototypes: X acquired Z% of the Y's shares → X owns Y
108Acquisition of Explicit Knowledge: Using what?
- Underlying hypothesis
- Harris' Distributional Hypothesis (DH) (Harris, 1964)
- Words that tend to occur in the same contexts tend to have similar meanings.
- sim(w1, w2) ≈ sim(C(w1), C(w2))
- Robison's Point-wise Assertion Patterns (PAP) (Robison, 1970)
- It is possible to extract relevant semantic relations with some pattern.
- w1 is in a relation r with w2 if the context matches pattern(w1, w2)
109Distributional Hypothesis (DH)
sim_w(W1, W2) ≈ sim_ctx(C(W1), C(W2))
[Figure: words or forms mapped into a context (feature) space, with the corpus as the source of contexts]
- w1 = constitute, C(w1): "sun is constituted of hydrogen"
- w2 = compose, C(w2): "The Sun is composed of hydrogen"
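A small sketch of the Distributional Hypothesis in action, assuming that C(w) is the bag of words co-occurring with w in the same sentence and that the context similarity is the cosine. Both choices, and the toy corpus, are assumptions for illustration.

```python
import math
from collections import Counter

corpus = [
    "the sun is constituted of hydrogen",
    "the sun is composed of hydrogen",
    "water is composed of oxygen",
    "water is constituted of oxygen",
    "the sun rises in the east",
]

def contexts(word, sentences):
    """C(w): bag of the words that co-occur with w in a sentence."""
    bag = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            bag.update(t for t in tokens if t != word)
    return bag

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def sim(w1, w2, sentences=corpus):
    """sim(w1, w2) approximated by sim(C(w1), C(w2))."""
    return cosine(contexts(w1, sentences), contexts(w2, sentences))

print(sim("constituted", "composed"), sim("composed", "rises"))
```

The two verbs that share all their contexts come out as near-identical, while "rises" shares only a few function words with them.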
110Point-wise Assertion Patterns (PAP)
w1 is in a relation r with w2 if the contexts match patterns_r(w1, w2)
- relation: w1 part_of w2
- patterns: "w1 is constituted of w2", "w1 is composed of w2"
- Corpus (source of contexts): "sun is constituted of hydrogen", "The Sun is composed of hydrogen"
- Statistical indicator S_corpus(w1, w2): selects correct vs. incorrect relations among words
- Result: part_of(sun, hydrogen)
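The pattern-based route can be sketched with regular expressions. The two patterns come from the slide; the use of a raw match count as the statistical indicator and the `min_count` threshold are assumptions, chosen only to keep the example small.

```python
import re
from collections import Counter

# Surface patterns for the relation on the slide (w1, w2 are single words here).
PATTERNS = [
    re.compile(r"(\w+) is constituted of (\w+)"),
    re.compile(r"(\w+) is composed of (\w+)"),
]

def extract_relations(sentences, patterns=PATTERNS):
    """Count how often each (w1, w2) pair matches one of the patterns."""
    counts = Counter()
    for sentence in sentences:
        for pattern in patterns:
            for w1, w2 in pattern.findall(sentence.lower()):
                counts[(w1, w2)] += 1
    return counts

def select(counts, min_count=2):
    """Toy statistical indicator: keep pairs seen at least min_count times."""
    return {pair for pair, n in counts.items() if n >= min_count}

corpus = [
    "sun is constituted of hydrogen",
    "The Sun is composed of hydrogen",
    "the committee is composed of volunteers",
]
counts = extract_relations(corpus)
print(select(counts))
```

A pair supported by several patterns or several sentences survives the filter, whereas a pair matched once is treated as too weakly attested.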
111DH and PAP cooperate
[Figure: the Distributional Hypothesis and Point-wise Assertion Patterns applied to the same picture: words or forms (w1 = constitute, w2 = compose) and their contexts C(w1), C(w2) in the context (feature) space, drawn from the corpus, e.g. "sun is constituted of hydrogen", "The Sun is composed of hydrogen"]
112Knowledge Acquisition: where do methods differ?
- On the word side
- Target equivalence classes: concepts or relations
- Target forms: words or expressions
- On the context side
- Feature Space
- Similarity function
113KA4TE: a first classification of some methods
[Figure: methods arranged by type of knowledge (directional vs. symmetric) and by underlying hypothesis (Distributional Hypothesis vs. Point-wise Assertion Patterns)]
- Verb Entailment (Zanzotto et al., 2006)
- Noun Entailment (Geffet & Dagan, 2005)
- Relation Pattern Learning (ESPRESSO) (Pantel & Pennacchiotti, 2006)
- ISA patterns (Hearst, 1992)
- Concept Learning (Lin & Pantel, 2001a)
- TEASE (Szpektor et al., 2004)
- Inference Rules (DIRT) (Lin & Pantel, 2001b)
114Noun Entailment Relation
(Geffet & Dagan, 2006)
- Type of knowledge: directional relations
- Underlying hypothesis: distributional hypothesis
- Main idea: distributional inclusion hypothesis
- w1 → w2 if all the prominent features of w1 occur with w2 in a sufficiently large corpus
[Figure: words or forms and their features in the context (feature) space]
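The inclusion test can be sketched as a subset check. Taking the "prominent" features to be the top-k features by co-occurrence count is an assumption of this sketch, as are the toy counts.

```python
from collections import Counter

# Hypothetical co-occurrence counts: word -> {feature: count}.
cooccurrences = {
    "cat": {"purr": 5, "eat": 4, "sleep": 3, "meow": 1},
    "animal": {"eat": 9, "sleep": 7, "run": 6, "fly": 3, "purr": 1},
}

def prominent(word, top_k=3):
    """Top-k features of a word by count (a stand-in for feature weighting)."""
    ranked = Counter(cooccurrences.get(word, {})).most_common(top_k)
    return {feature for feature, _ in ranked}

def entails(w1, w2, top_k=3):
    """w1 -> w2 if every prominent feature of w1 also occurs with w2."""
    features_w1 = prominent(w1, top_k)
    if not features_w1:
        return False  # nothing is known about w1
    return features_w1 <= set(cooccurrences.get(w2, {}))

print(entails("cat", "animal"), entails("animal", "cat"))
```

The check is deliberately asymmetric: the prominent features of the narrower word are covered by the broader word, but not the other way round.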
115Verb Entailment Relations
(Zanzotto, Pennacchiotti, Pazienza, 2006)
- Type of knowledge: oriented relations
- Underlying hypothesis: point-wise assertion patterns
- Main idea
- relation: v1 → v2
- patterns: agentive_nominalization(v2) v1
- Statistical indicator S(v1, v2): point-wise mutual information
116Verb Entailment Relations
(Zanzotto, Pennacchiotti, Pazienza, 2006)
- Understanding the idea
- Selectional restriction
- fly(x) → has_wings(x)
- in general
- v(x) → c(x) (if x is the subject of v then x has the property c)
- Agentive nominalization
- an agentive noun is the doer or the performer of an action v'
- "X is player" may be read as play(x)
- c(x) is clearly v'(x) if the property c is derived from v' with an agentive nominalization
Skipped
117Verb Entailment Relations
- Understanding the idea
- Given the expression
- player wins
- Seen as a selectional restriction
- win(x) → play(x)
- Seen as a selectional preference
- P(play(x) | win(x)) > P(play(x))
Skipped
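The selectional-preference reading can be sketched with point-wise mutual information over (subject noun, verb) counts. The counts, the `AGENTIVE` lookup table and the function names are hypothetical. A positive PMI between "player" and "win" is equivalent to P(play(x) | win(x)) > P(play(x)) in this toy setting.

```python
import math
from collections import Counter

# Hypothetical corpus counts of (subject noun, verb) pairs.
pair_counts = Counter({
    ("player", "win"): 8,
    ("player", "sleep"): 1,
    ("teacher", "win"): 1,
    ("teacher", "sleep"): 2,
    ("dog", "sleep"): 8,
})

# Hypothetical lookup: verb -> its agentive nominalization.
AGENTIVE = {"play": "player", "teach": "teacher"}

def pmi(counts, noun, verb):
    """Point-wise mutual information between a subject noun and a verb."""
    total = sum(counts.values())
    joint = counts.get((noun, verb), 0)
    noun_total = sum(n for (s, _), n in counts.items() if s == noun)
    verb_total = sum(n for (_, v), n in counts.items() if v == verb)
    if joint == 0:
        return float("-inf")
    return math.log((joint * total) / (noun_total * verb_total))

def entailment_score(v1, v2, counts=pair_counts):
    """S(v1, v2): strength of the pattern 'agentive_nominalization(v2) v1'."""
    return pmi(counts, AGENTIVE[v2], v1)

print(entailment_score("win", "play"), entailment_score("win", "teach"))
```

With these counts "player wins" is strongly associated, supporting win → play, while "teacher wins" is not, so win → teach is rejected.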
118Knowledge Acquisition for TE: How?
- The algorithmic nature of a DH+PAP method
- Direct
- Starting point: target words
- Indirect
- Starting point: context feature space
- Iterative
- Interplay between the context feature space and the target words
119Direct Algorithm
sim(w1,w2) ≈ sim(C(w1), C(w2))
- Select target words wi from the corpus or from a dictionary
- Retrieve contexts of each wi and represent them in the feature space C(wi)
- For each pair (wi, wj)
- Compute the similarity sim(C(wi), C(wj)) in the context space
- If sim(wi, wj) = sim(C(wi), C(wj)) > t,
- wi and wj belong to the same equivalence class W
sim(w1,w2) ≈ sim(I(C(w1)), I(C(w2)))
[Figure: words w1 = cat and w2 = dog mapped to their contexts C(w1) and C(w2) in the context (feature) space]
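The direct algorithm above can be sketched as follows, assuming sentence-level bag-of-words contexts, cosine similarity and a threshold t = 0.8. These choices, the toy corpus and the greedy merging of classes are assumptions of the sketch.

```python
import math
from collections import Counter
from itertools import combinations

def context_vector(word, sentences):
    """C(w): words co-occurring with w in the same sentence."""
    bag = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            bag.update(t for t in tokens if t != word)
    return bag

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def direct(targets, sentences, t=0.8):
    """Group target words whose context vectors are more similar than t."""
    vectors = {w: context_vector(w, sentences) for w in targets}
    classes = {w: {w} for w in targets}
    for wi, wj in combinations(targets, 2):
        if cosine(vectors[wi], vectors[wj]) > t:
            merged = classes[wi] | classes[wj]
            for w in merged:
                classes[w] = merged
    return {frozenset(c) for c in classes.values()}

corpus = [
    "the cat chased a mouse",
    "the dog chased a mouse",
    "the cat sleeps on the sofa",
    "the dog sleeps on the sofa",
    "the car needs new tires",
]
print(direct(["cat", "dog", "car"], corpus))
```

The cost is quadratic in the number of target words, since every pair is compared.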
120Indirect Algorithm
- Given an equivalence class W, select relevant contexts and represent them in the feature space
- Retrieve target words (w1, ..., wn) that appear in these contexts. These are likely to be words in the equivalence class W
- Eventually, for each wi, retrieve C(wi) from the corpus
- Compute the centroid I(C(W))
- For each wi,
- if sim(I(C(W)), wi) < t, eliminate wi from W.
sim(w1,w2) ≈ sim(C(w1), C(w2))
sim(w1,w2) ≈ sim(I(C(w1)), I(C(w2)))
[Figure: words w1 = cat and w2 = dog mapped to their contexts C(w1) and C(w2) in the context (feature) space]
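A sketch of the indirect algorithm, starting from a context rather than from words. The seed context `"chased a"`, the summed-vector centroid, the cosine similarity and the threshold t = 0.6 are all assumptions for illustration.

```python
import math
import re
from collections import Counter

def context_vector(word, sentences):
    bag = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            bag.update(t for t in tokens if t != word)
    return bag

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def indirect(seed_contexts, sentences, t=0.6):
    # Retrieve the words that appear right before one of the seed contexts.
    candidates = []
    for sentence in sentences:
        for ctx in seed_contexts:
            match = re.search(r"(\w+) " + re.escape(ctx), sentence)
            if match and match.group(1) not in candidates:
                candidates.append(match.group(1))
    # Retrieve C(wi) for each candidate and compute the centroid of the class.
    vectors = {w: context_vector(w, sentences) for w in candidates}
    centroid = Counter()
    for vector in vectors.values():
        centroid.update(vector)
    # Eliminate the words that are too far from the centroid.
    return [w for w in candidates if cosine(centroid, vectors[w]) >= t]

corpus = [
    "the cat chased a mouse",
    "the dog chased a ball",
    "the cat sleeps on the sofa",
    "the dog sleeps on the sofa",
    "inflation chased a record",
    "inflation worries investors and banks",
]
print(indirect(["chased a"], corpus))
```

The seed context retrieves "inflation" as well, and the centroid filter is what removes it again.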
121Iterative Algorithm
- For each word wi in the equivalence class W, retrieve the C(wi) contexts and represent them in the feature space
- Extract words wj that have contexts similar to C(wi)
- Extract contexts C(wj) of these new words
- For each new word wj, if sim(C(W), wj) > t, put wj in W.
sim(w1,w2) ≈ sim(C(w1), C(w2))
sim(w1,w2) ≈ sim(I(C(w1)), I(C(w2)))
[Figure: words w1 = cat and w2 = dog and the contexts C(w1) in the context (feature) space]
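The iterative scheme alternates between the class and its contexts until nothing new is added. This sketch assumes a fixed candidate vocabulary, summed context vectors for C(W), cosine similarity, a threshold t = 0.6 and a `max_rounds` cap; none of these are prescribed by the slide.

```python
import math
from collections import Counter

def context_vector(word, sentences):
    bag = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            bag.update(t for t in tokens if t != word)
    return bag

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def iterative(seeds, vocabulary, sentences, t=0.6, max_rounds=5):
    """Grow the class W from seed words by repeated context comparison."""
    members = list(seeds)
    for _ in range(max_rounds):
        # Contexts of the current class C(W).
        class_contexts = Counter()
        for w in members:
            class_contexts.update(context_vector(w, sentences))
        # New words whose contexts are close enough to C(W).
        added = [w for w in vocabulary
                 if w not in members
                 and cosine(class_contexts, context_vector(w, sentences)) > t]
        if not added:
            break
        members.extend(added)
    return members

corpus = [
    "the cat chased a mouse",
    "the dog chased a mouse",
    "the cat sleeps on the sofa",
    "the dog sleeps on the sofa",
    "the car needs new tires",
]
print(iterative(["cat"], ["dog", "car"], corpus))
```

Each round recomputes C(W), so words admitted early influence which words are admitted later. This is why such methods usually need a stopping criterion.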
122Knowledge Acquisition using DH and PAP
- Direct Algorithms
- Concepts from text via clustering (Lin & Pantel, 2001)
- Inference rules aka DIRT (Lin & Pantel, 2001)
- ...
- Indirect Algorithms
- Hearst's ISA patterns (Hearst, 1992)
- Question Answering patterns (Ravichandran & Hovy, 2002)
- ...
- Iterative Algorithms
- Entailment rules from Web aka TEASE (Szpektor et al., 2004)
- Espresso (Pantel & Pennacchiotti, 2006)
123TEASE
(Szpektor et al., 2004)
- Type: Iterative algorithm
- On the word side
- Target equivalence classes: fine-grained relations
- Target forms: verb with arguments
- On the context side
- Feature Space
- Innovations with respect to research before 2004
- First direct algorithm for extracting rules
prevent(X,Y)
X_fillermi?,Y_fillermi?
124TEASE
(Szpektor et al., 2004)
- Input template (from the Lexicon): X ←subj– accuse –obj→ Y
- Sample corpus for input template (from the Web): "Paula Jones accused Clinton", "BBC accused Blair", "Sanhedrin accused St.Paul"
- Anchor Set Extraction (ASE)
- Anchor sets: {Paula Jones (subj), Clinton (obj)}, {Sanhedrin (subj), St.Paul (obj)}
- Sample corpus for anchor sets: "Paula Jones called Clinton indictable", "St.Paul defended before the Sanhedrin"
- Template Extraction (TE)
- Templates: "X call Y indictable", "Y defend before X"
- Iterate
Skipped
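One round of the ASE/TE loop can be imitated on a toy corpus. This is only a surface-string sketch: the templates on the slide are lemmatized syntactic templates and the sentences come from Web queries. Here the template is a hypothetical regular expression, and sentences are matched as plain strings.

```python
import re
from collections import Counter

corpus = [
    "Paula Jones accused Clinton",
    "BBC accused Blair",
    "Sanhedrin accused St.Paul",
    "Paula Jones called Clinton indictable",
    "St.Paul defended before the Sanhedrin",
]

def anchor_set_extraction(template_regex, sentences):
    """ASE: collect the (X, Y) fillers of sentences matching the template."""
    pattern = re.compile(template_regex)
    anchor_sets = []
    for sentence in sentences:
        match = pattern.match(sentence)
        if match:
            anchor_sets.append((match.group("X"), match.group("Y")))
    return anchor_sets

def template_extraction(anchor_sets, sentences):
    """TE: abstract sentences containing both anchors into templates."""
    templates = Counter()
    for x, y in anchor_sets:
        for sentence in sentences:
            if x in sentence and y in sentence:
                templates[sentence.replace(x, "X").replace(y, "Y")] += 1
    return templates

def tease_round(template_regex, input_template, sentences):
    """One ASE + TE round; returns the templates other than the input one."""
    anchors = anchor_set_extraction(template_regex, sentences)
    found = template_extraction(anchors, sentences)
    return [t for t in found if t != input_template]

new_templates = tease_round(r"^(?P<X>.+) accused (?P<Y>.+)$", "X accused Y", corpus)
print(new_templates)
```

Feeding the returned templates back in as new input templates gives the iteration shown on the slide.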
125TEASE
(Szpektor et al., 2004)
- Innovations with respect to research before 2004
- First direct algorithm for extracting rules
- A feature selection is done to assess the most informative features
- Extracted forms are clustered to obtain the most general sentence prototype of a given set of equivalent forms
Skipped
126Espresso
(Pantel & Pennacchiotti, 2006)
- Type: Iterative algorithm
- On the word side
- Target equivalence classes: relations
- Target forms: expressions, sequences of tokens
- Innovations with respect to research before 2006
- A measure to determine specific vs. general patterns (ranking in the equivalent forms)
compose(X,Y)
Y is composed by X, Y is made of X
127Espresso
(Pantel & Pennacchiotti, 2006)
- Instances: (leader, panel), (city, region), (oxygen, water)
- Patterns (unranked): "Y is composed by X"; "X,Y"; "Y is part of X"
- Patterns (ranked): 1.0 "Y is composed by X"; 0.8 "Y is part of X"; 0.2 "X,Y"
- Instances (unranked): (tree, land), (oxygen, hydrogen), (atom, molecule), (leader, panel), (range of information, FBI report), (artifact, exhibit)
- Instances (ranked): 1.0 (tree, land); 0.9 (atom, molecule); 0.7 (leader, panel); 0.6 (range of information, FBI report); 0.6 (artifact, exhibit); 0.2 (oxygen, hydrogen)
Skipped
128Espresso
(Pantel & Pennacchiotti, 2006)
- Innovations with respect to research before 2006
- A measure to determine specific vs. general patterns (ranking in the equivalent forms)
- Both pattern and instance selections are
performed - Different Use of General and specific patter