Textual Entailment - PowerPoint PPT Presentation

1
Textual Entailment
  • Dan Roth
  • University of Illinois, Urbana-Champaign
  • USA

  • Ido Dagan
  • Bar Ilan University
  • Israel

  • Fabio Massimo Zanzotto
  • University of Rome
  • Italy

ACL-2007
2
Outline
  1. Motivation and Task Definition
  2. A Skeletal review of Textual Entailment Systems
  3. Knowledge Acquisition Methods
  4. Applications of Textual Entailment
  5. A Textual Entailment view of Applied Semantics

3
I. Motivation and Task Definition
4
Motivation
  • Text applications require semantic inference
  • A common framework for applied semantics is
    needed, but still missing
  • Textual entailment may provide such framework

5
Desiderata for Modeling Framework
  • A framework for a target level of language
    processing should provide
  • Generic (feasible) module for applications
  • Unified (agreeable) paradigm for investigating
    language phenomena
  • Most semantics research is scattered
  • WSD, NER, SRL, lexical semantics relations
    (scattered, e.g. in comparison with syntax)
  • Dominating approach - interpretation

6
Natural Language and Meaning
Meaning
Language
7
Variability of Semantic Expression
The Dow Jones Industrial Average closed up 255
Dow ends up
Dow gains 255 points
Stock market hits a record high
Dow climbs 255
  • Model variability as relations between text
    expressions
  • Equivalence: text1 ⇔ text2 (paraphrasing)
  • Entailment: text1 ⇒ text2 (the general case)

8
Typical Application Inference: Entailment
Question: Who bought Overture? >> Expected answer form: X bought Overture
Text: Overture's acquisition by Yahoo
  entails
Hypothesized answer: Yahoo bought Overture
  • Similar for IE: X acquire Y
  • Similar for semantic IR: t: Overture was
    bought for
  • Summarization (multi-document) identify
    redundant info
  • MT evaluation (and recent ideas for MT)
  • Educational applications

9
KRAQ'05 Workshop - KNOWLEDGE and REASONING for
ANSWERING QUESTIONS (IJCAI-05)
  • CFP
  • Reasoning aspects: information fusion, search
    criteria expansion models, summarization and
    intensional answers, reasoning under uncertainty
    or with incomplete knowledge
  • Knowledge representation and integration: levels
    of knowledge involved (e.g. ontologies, domain
    knowledge), knowledge extraction models and
    techniques to optimize response accuracy
  • ... but there are similar needs for other
    applications: can entailment provide a common
    empirical framework?

10
Classical Entailment Definition
  • Chierchia & McConnell-Ginet (2001): A text t
    entails a hypothesis h if h is true in every
    circumstance (possible world) in which t is true
  • Strict entailment - doesn't account for some
    uncertainty allowed in applications

11
Almost certain Entailments
  • t: The technological triumph known as GPS was
    incubated in the mind of Ivan Getting.
  • h: Ivan Getting invented the GPS.

12
Applied Textual Entailment
  • A directional relation between two text
    fragments: Text (t) and Hypothesis (h)

t entails h (t ⇒ h) if humans reading t will infer that h is most likely true
  • Operational (applied) definition
  • Human gold standard - as in NLP applications
  • Assuming common background knowledge which is
    indeed expected from applications

13
Probabilistic Interpretation
  • Definition
  • t probabilistically entails h if
  • P(h is true | t) > P(h is true)
  • t increases the likelihood of h being true
  • Positive PMI: t provides information on h's
    truth
  • P(h is true | t): entailment confidence
  • The relevant entailment score for applications
  • In practice: most likely entailment expected
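
The probabilistic definition can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names and the probability values are hypothetical and chosen only for illustration. It uses the fact that PMI(t, h) = log2(P(h | t) / P(h)), which is positive exactly when P(h | t) > P(h).

```python
import math


def probabilistically_entails(p_h_given_t: float, p_h: float) -> bool:
    """t probabilistically entails h when P(h | t) > P(h)."""
    return p_h_given_t > p_h


def pointwise_mutual_information(p_h_given_t: float, p_h: float) -> float:
    """PMI(t, h) = log2(P(h | t) / P(h)); positive exactly when t raises P(h)."""
    return math.log2(p_h_given_t / p_h)


# Hypothetical numbers, for illustration only.
p_h = 0.25          # prior probability that h is true
p_h_given_t = 0.5   # probability that h is true once t is known

print(probabilistically_entails(p_h_given_t, p_h))     # True
print(pointwise_mutual_information(p_h_given_t, p_h))  # 1.0
```

In an application, P(h is true | t) itself would serve as the entailment confidence score.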

14
The Role of Knowledge
  • For textual entailment to hold we require
  • text AND knowledge ⇒ h
  • but
  • knowledge should not entail h alone
  • Systems are not supposed to validate h's truth
    regardless of t (e.g. by searching h on the web)

15
PASCAL Recognizing Textual Entailment (RTE)
Challenges
EU FP-6 Funded PASCAL Network of Excellence 2004-7
Bar-Ilan University; ITC-irst and CELCT, Trento;
MITRE; Microsoft Research
16
Generic Dataset by Application Use
  • 7 application settings in RTE-1, 4 in RTE-2/3
  • QA
  • IE
  • Semantic IR
  • Comparable documents / multi-doc summarization
  • MT evaluation
  • Reading comprehension
  • Paraphrase acquisition
  • Most data created from actual applications'
    output
  • RTE-2/3: 800 examples in development and test
    sets
  • 50-50 YES/NO split

17
RTE Examples
# | TEXT | HYPOTHESIS | TASK | ENTAILMENT
1 | Reagan attended a ceremony in Washington to commemorate the landings in Normandy. | Washington is located in Normandy. | IE | False
2 | Google files for its long awaited IPO. | Google goes public. | IR | True
3 | a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others. | Cardinal Juan Jesus Posadas Ocampo died in 1993. | QA | True
4 | The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%. | The SPD is defeated by the opposition parties. | IE | True
18
Participation and Impact
  • Very successful challenges, world wide
  • RTE-1: 17 groups
  • RTE-2: 23 groups
  • 150 downloads
  • RTE-3: 25 groups
  • Joint workshop at ACL-07
  • High interest in the research community
  • Papers, conference sessions and areas, PhDs,
    influence on funded projects
  • Textual Entailment special issue at JNLE
  • ACL-07 tutorial

19
Methods and Approaches (RTE-2)
  • Measure similarity match between t and h
    (coverage of h by t)
  • Lexical overlap (unigram, N-gram, subsequence)
  • Lexical substitution (WordNet, statistical)
  • Syntactic matching/transformations
  • Lexical-syntactic variations (paraphrases)
  • Semantic role labeling and matching
  • Global similarity parameters (e.g. negation,
    modality)
  • Cross-pair similarity
  • Detect mismatch (for non-entailment)
  • Interpretation to logic representation, followed
    by logic inference

20
Dominant approach: Supervised Learning
(t, h) → Similarity features (lexical, n-gram, syntactic, semantic, global) → Feature vector → Classifier → YES / NO
  • Features model similarity and mismatch
  • Classifier determines relative weights of
    information sources
  • Train on development set and auxiliary t-h corpora
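
The pipeline above (pair, feature vector, classifier, YES/NO) can be sketched in a few lines. This is an illustrative toy, not any participant's system: the two features, the negation list, the training pairs and the choice of a perceptron as the classifier are all assumptions made for the example.

```python
NEGATIONS = {"not", "no", "never", "without"}


def tokens(sentence):
    """Lowercase whitespace tokenization with trailing punctuation stripped."""
    return [w.strip(".,").lower() for w in sentence.split()]


def features(t, h):
    """Feature vector: [coverage of h by t, negation mismatch, bias]."""
    t_words, h_words = set(tokens(t)), set(tokens(h))
    overlap = len(h_words & t_words) / len(h_words)
    neg_mismatch = float(bool(t_words & NEGATIONS) != bool(h_words & NEGATIONS))
    return [overlap, neg_mismatch, 1.0]


def predict(weights, x):
    """1 (YES) if the weighted feature sum is positive, else 0 (NO)."""
    return 1 if sum(w * v for w, v in zip(weights, x)) > 0 else 0


def train_perceptron(pairs, epochs=20):
    """Plain perceptron updates; the classifier learns the feature weights."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for t, h, label in pairs:
            x = features(t, h)
            error = label - predict(weights, x)
            weights = [w + error * v for w, v in zip(weights, x)]
    return weights


# Hypothetical training pairs (t, h, label), for illustration only.
train_pairs = [
    ("Dow gains 255 points", "Dow gains points", 1),
    ("Oil prices surged", "Oil prices never surged", 0),
    ("Google files for its IPO", "Yahoo bought Overture", 0),
    ("The room was full of women", "The room was full", 1),
]
weights = train_perceptron(train_pairs)
print(weights)
print(predict(weights, features("Dow climbs 255", "Dow never climbs")))
```

The learned weights play the role described in the bullets above: they set the relative importance of the similarity and mismatch features.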

21
RTE-2 Results
Average Precision (%) | Accuracy (%) | First Author (Group)
80.8 | 75.4 | Hickl (LCC)
71.3 | 73.8 | Tatu (LCC)
64.4 | 63.9 | Zanzotto (Milan & Rome)
62.8 | 62.6 | Adams (Dallas)
66.9 | 61.6 | Bos (Rome & Leeds)
-    | 58.1-60.5 | 11 groups
-    | 52.9-55.6 | 7 groups
Average: 60; Median: 59
22
Analysis
  • For the first time, methods that carry some
    deeper analysis seemed (?) to outperform shallow
    lexical methods

Cf. Kevin Knight's invited talk at EACL-06,
titled "Isn't Linguistic Structure Important,
Asked the Engineer"
  • Still, most systems that do utilize deep
    analysis did not score significantly better than
    the lexical baseline

23
Why?
  • System reports point at
  • Lack of knowledge (syntactic transformation
    rules, paraphrases, lexical relations, etc.)
  • Lack of training data
  • It seems that systems that coped better with
    these issues performed best
  • Hickl et al.: acquisition of large entailment
    corpora for training
  • Tatu et al.: large knowledge bases (linguistic
    and world knowledge)

24
Some suggested research directions
  • Knowledge acquisition
  • Unsupervised acquisition of linguistic and world
    knowledge from general corpora and web
  • Acquiring larger entailment corpora
  • Manual resources and knowledge engineering
  • Inference
  • Principled framework for inference and fusion of
    information levels
  • Are we happy with bags of features?

25
Complementary Evaluation Modes
  • Seek mode
  • Input: h and corpus
  • Output: all entailing t's in corpus
  • Captures information seeking needs, but requires
    post-run annotation (TREC-style)
  • Entailment subtasks evaluations
  • Lexical, lexical-syntactic, logical, alignment
  • Contribution to various applications
  • QA: Harabagiu & Hickl, ACL-06; RE: Romano et
    al., EACL-06

26
II. A Skeletal review of Textual Entailment
Systems
27
Textual Entailment
Text: Eyeing the huge market potential, currently led
by Google, Yahoo took over search company
Overture Services Inc. last year
Entails / Subsumed by? Candidate hypotheses:
  • Yahoo acquired Overture
  • Overture is a search company
  • Google is a search company
  • Google owns Overture
  • ...
How? Phrasal verb paraphrasing, entity matching,
alignment, semantic role labeling, integration
28
A general Strategy for Textual Entailment
Given a sentence T and a sentence H: does T entail H?
Re-represent T (lexical, syntactic, semantic)
Re-represent H (lexical, syntactic, semantic)
Knowledge Base (semantic, structural, pragmatic):
transformations/rules
Representation
Decision: find the set of transformations/features
of the new representation (or use these to
create a cost function) that allows embedding
of H in T.
29
Details of The Entailment Strategy
  • Preprocessing
  • Multiple levels of lexical pre-processing
  • Syntactic Parsing
  • Shallow semantic parsing
  • Annotating semantic phenomena
  • Representation
  • Bag of words, n-grams through tree/graphs based
    representation
  • Logical representations
  • Knowledge Sources
  • Syntactic mapping rules
  • Lexical resources
  • Semantic Phenomena specific modules
  • RTE specific knowledge sources
  • Additional Corpora/Web resources
  • Control Strategy and Decision Making
  • Single pass/iterative processing
  • Strict vs. Parameter based
  • Justification
  • What can be said about the decision?

30
The Case of Shallow Lexical Approaches
  • Preprocessing
  • Identify Stop Words
  • Representation
  • Bag of words
  • Knowledge Sources
  • Shallow lexical resources, typically WordNet
  • Control Strategy and Decision Making
  • Single pass
  • Compute similarity; use a threshold tuned on a
    development set (could be per task)
  • Justification
  • It works

31
Shallow Lexical Approaches (Example)
  • Lexical/word-based semantic overlap score based
    on matching each word in H with some word in T
  • Word similarity measure may use WordNet
  • May take account of subsequences, word order
  • Learn threshold on maximum word-based match
    score

Clearly, this may not appeal to what we think of as
understanding, and it is easy to generate cases
for which this does not work well. However, it
works (surprisingly) well with respect to current
evaluation metrics (data sets?)
Text: The Cassini spacecraft arrived at Titan
in July, 2006.
Text: NASA's Cassini-Huygens spacecraft
traveled to Saturn in 2006.
Text: The Cassini spacecraft has taken images
that show rivers on Saturn's moon Titan.
Hyp: The Cassini spacecraft has reached Titan.
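
Learning the decision threshold from a development set can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `tune_threshold` and the scored development pairs are hypothetical, and accuracy is assumed as the tuning criterion.

```python
def tune_threshold(scored_pairs):
    """Pick the threshold that maximises accuracy of 'score >= threshold'.

    scored_pairs: list of (match_score, gold_label) from a development set.
    Returns (best_threshold, accuracy_on_dev); ties keep the lowest threshold.
    """
    candidates = sorted({score for score, _ in scored_pairs})
    best_threshold, best_correct = None, -1
    for threshold in candidates:
        correct = sum((score >= threshold) == label for score, label in scored_pairs)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold, best_correct / len(scored_pairs)


# Hypothetical development scores and gold labels, for illustration only.
dev = [(0.9, True), (0.8, True), (0.75, False), (0.6, True), (0.5, False), (0.3, False)]
threshold, accuracy = tune_threshold(dev)
print(threshold, accuracy)
```

A per-task threshold, as the slide suggests, would simply call the same function once per task-specific subset of the development data.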
32
An Algorithm: LocalLexicalMatching
  • For each word in Hypothesis, Text
  •   if word matches stopword: remove word
  • if no words left in Hypothesis or Text: return
    0
  • numberMatched = 0
  • for each word W_H in Hypothesis
  •   for each word W_T in Text
  •     HYP_LEMMAS = Lemmatize(W_H)
  •     TEXT_LEMMAS = Lemmatize(W_T)
  •     (lemmatization uses WordNet)
  •     if any term in HYP_LEMMAS matches any term in
        TEXT_LEMMAS
  •     using LexicalCompare()
  •       numberMatched++
  • Return numberMatched / |HYP_Lemmas|
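
A runnable sketch of the matching loop above, under several assumptions: the stop word list is hypothetical, the lemmatizer is a placeholder that only lowercases (the slides use WordNet), each hypothesis word is counted at most once, and the default comparison is exact lemma equality standing in for LexicalCompare().

```python
# Hypothetical stop word list; the slides do not specify one.
STOPWORDS = {"the", "a", "an", "in", "at", "to", "has", "of", "on", "that"}


def tokenize(sentence):
    words = (w.strip(".,").lower() for w in sentence.split())
    return [w for w in words if w]


def lemmatize(word):
    # Placeholder (assumption): a real system would return WordNet lemmas.
    return {word.lower()}


def local_lexical_matching(text, hypothesis, lexical_compare=None):
    """Fraction of non-stopword hypothesis words matched by some text word."""
    compare = lexical_compare or (lambda h_lemma, t_lemma: h_lemma == t_lemma)
    text_words = [w for w in tokenize(text) if w not in STOPWORDS]
    hyp_words = [w for w in tokenize(hypothesis) if w not in STOPWORDS]
    if not text_words or not hyp_words:
        return 0.0
    number_matched = 0
    for w_h in hyp_words:
        hyp_lemmas = lemmatize(w_h)
        # Count each hypothesis word once if any text lemma matches it.
        if any(compare(h, t)
               for w_t in text_words
               for t in lemmatize(w_t)
               for h in hyp_lemmas):
            number_matched += 1
    return number_matched / len(hyp_words)


hyp = "The Cassini spacecraft has reached Titan."
for text in [
    "The Cassini spacecraft arrived at Titan in July, 2006.",
    "NASA's Cassini-Huygens spacecraft traveled to Saturn in 2006.",
    "The Cassini spacecraft has taken images that show rivers on Saturn's moon Titan.",
]:
    print(local_lexical_matching(text, hyp), text)
```

With exact matching, the first and third texts receive the same score, which illustrates why purely lexical overlap is easy to fool.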

33
An Algorithm: LocalLexicalMatching (Cont.)
LLM Performance: RTE-2: Dev 63.00, Test 60.50;
RTE-3: Dev 67.50, Test 65.63
  • LexicalCompare()
  • if (LEMMA_H == LEMMA_T)
  •   return TRUE
  • if (HypernymDistanceFromTo(textWord,
    hypothesisWord) < 3)
  •   return TRUE
  • if (MeronymyDistanceFromTo(textWord,
    hypothesisWord) < 3)
  •   return TRUE
  • if (MemberOfDistanceFromTo(textWord,
    hypothesisWord) < 3)
  •   return TRUE
  • if (SynonymOf(textWord, hypothesisWord))
  •   return TRUE
  • Notes
  • LexicalCompare is asymmetric and makes use of a
    single relation type
  • Additional differences could be attributed to
    stop word list (e.g., including aux verbs)
  • Straightforward improvements such as bi-grams do
    not help.
  • More sophisticated lexical knowledge (entities,
    time) should help.
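
The comparison function can be sketched with a toy lexical graph in place of WordNet. Everything in the tables below is made up for illustration; only the hypernym and synonym tests are shown, and the meronymy and member-of tests would follow the same pattern with their own tables.

```python
from collections import deque

# Toy stand-ins for WordNet relations (assumptions, not real WordNet data).
HYPERNYMS = {
    "spaniel": {"dog"},
    "dog": {"canine"},
    "canine": {"carnivore"},
    "carnivore": {"mammal"},
}
SYNONYMS = {frozenset({"buy", "purchase"})}


def hypernym_distance(from_word, to_word, max_depth=10):
    """Number of hypernym links from from_word up to to_word, or None."""
    queue = deque([(from_word, 0)])
    seen = {from_word}
    while queue:
        word, depth = queue.popleft()
        if word == to_word:
            return depth
        if depth < max_depth:
            for parent in HYPERNYMS.get(word, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, depth + 1))
    return None


def lexical_compare(hyp_lemma, text_lemma):
    """Asymmetric test: does the text lemma support the hypothesis lemma?"""
    if hyp_lemma == text_lemma:
        return True
    distance = hypernym_distance(text_lemma, hyp_lemma)
    if distance is not None and distance < 3:
        return True
    return frozenset({hyp_lemma, text_lemma}) in SYNONYMS


print(lexical_compare("dog", "spaniel"))        # True: spaniel -> dog
print(lexical_compare("spaniel", "dog"))        # False: direction matters
print(lexical_compare("carnivore", "spaniel"))  # False: distance is 3
```

The direction of the hypernym walk (from the text word up to the hypothesis word) is what makes the comparison asymmetric, as noted on the slide.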

35
Preprocessing
  • Syntactic Processing
  • Syntactic Parsing (Collins, Charniak, CCG)
  • Dependency Parsing (types)
  • Lexical Processing
  • Tokenization, lemmatization
  • For each word in Hypothesis, Text
  • Phrasal verbs
  • Idiom processing
  • Named Entities Normalization
  • Date/Time arguments Normalization
  • Semantic Processing
  • Semantic Role Labeling
  • Nominalization
  • Modality/Polarity/Factive
  • Co-reference

Only a few systems

often used only during decision making

often used only during decision making
37
Basic Representations
[Figure: levels of representation between Raw Text and Meaning Representation: Local Lexical, Syntactic Parse, Semantic Representation, Logical Forms. Inference operates on the meaning representation; Textual Entailment operates on the text representations.]
  • Most approaches augment the basic structure
    defined by the processing level with additional
    annotation and make use of a
    tree/graph/frame-based system.

38
Basic Representations (Syntax)
[Figure: Syntactic Parse and Local Lexical representations of the hypothesis]
Hyp: The Cassini spacecraft has reached Titan.
39
Basic Representations (Shallow Semantics:
Pred-Arg)
  • T: The government purchase of the Roanoke
    building, a former prison, took place in 1902.
  • H: The Roanoke building, which was a former
    prison, was bought by the government in 1902.

[Figure: predicate-argument graphs. T: take (place) with arguments "The govt. purchase ... prison" and "in 1902"; purchase with argument "The Roanoke building". H: buy with arguments "The Roanoke prison", "In 1902", "The government"; be with arguments "The Roanoke building", "a former prison". Source: Roth & Sammons 07]
40
Basic Representations (Logical Representation)
Bos & Markert: The semantic representation
language is a first-order fragment of the language
used in Discourse Representation Theory (DRS),
conveying argument structure with a
neo-Davidsonian analysis and including the
recursive DRS structure to cover negation,
disjunction, and implication.
41
Representing Knowledge Sources
  • Rather straightforward in the logical framework
  • Tree/graph based representations may also use
    rule based transformations to encode different
    kinds of knowledge, sometimes represented as
    generic or knowledge based tree transformations.

42
Representing Knowledge Sources (cont.)
  • In general, there is a mix of procedural and rule
    based encodings of knowledge sources
  • Done by hanging more information on the parse
    tree or predicate argument representation
    (example from LCC's system)
  • Or different frame-based annotation systems for
    encoding information, that are processed
    procedurally.

44
Knowledge Sources
  • The knowledge sources available to the system are
    the most significant component of supporting TE.
  • Different systems draw the line between
    preprocessing capabilities and knowledge
    resources differently.
  • The way resources are handled also differs
    across approaches.

45
Enriching Preprocessing
  • In addition to syntactic parsing, several
    approaches enrich the representation with various
    linguistic resources
  • POS tagging
  • Stemming
  • Predicate argument representation: verb
    predicates and nominalization
  • Entity annotation: stand-alone NERs with a
    variable number of classes
  • Acronym handling and entity normalization:
    mapping mentions of the same entity expressed in
    different ways to a single ID.
  • Co-reference resolution
  • Dates, times and numeric values: identification
    and normalization.
  • Identification of semantic relations: complex
    nominals, genitives, adjectival phrases, and
    adjectival clauses.
  • Event identification and frame construction.

46
Lexical Resources
  • Recognizing that a word or a phrase in S entails
    a word or a phrase in H is essential in
    determining Textual Entailment.
  • WordNet is the most commonly used resource
  • In most cases, a WordNet based similarity measure
    between words is used. This is typically a
    symmetric relation.
  • Lexical chains over WordNet are used in some
    cases; care is taken to disallow some chains of
    specific relations.
  • Extended WordNet is being used to make use of
    entities
  • Derivation relation, which links verbs with their
    corresponding nominalized nouns.

47
Lexical Resources (Cont.)
  • Lexical Paraphrasing Rules
  • A number of efforts to acquire relational
    paraphrase rules are under way, and several
    systems are making use of resources such as DIRT
    and TEASE.
  • Some systems seem to have acquired paraphrase
    rules that are in the RTE corpus
  • person killed → claimed one life
  • hand reins over to → give starting job to
  • same-sex marriage → gay nuptials
  • cast ballots in the election → vote
  • dominant firm → monopoly power
  • death toll → kill
  • try to kill → attack
  • lost their lives → were killed
  • left people dead → people were killed

48
Semantic Phenomena
  • A large number of semantic phenomena have been
    identified as significant to Textual Entailment.
  • A large number of them are being handled (in a
    restricted way) by some of the systems. Very
    little quantification per phenomenon has been
    done, if at all.
  • Semantic implications of interpreting syntactic
    structures (Braz et al. 05; Bar-Haim et al. 07)
  • Conjunctions
  • Jake and Jill ran up the hill ⇒ Jake ran up the
    hill
  • Jake and Jill met on the hill ⇏ Jake met on the
    hill
  • Clausal modifiers
  • But celebrations were muted as many Iranians
    observed a Shi'ite mourning month.
  • Many Iranians observed a Shi'ite mourning month.
  • Semantic Role Labeling handles this phenomenon
    automatically

49
Semantic Phenomena (Cont.)
  • Relative clauses
  • The assailants fired six bullets at the car,
    which carried Vladimir Skobtsov.
  • The car carried Vladimir Skobtsov.
  • Semantic Role Labeling handles this phenomenon
    automatically
  • Appositives
  • Frank Robinson, a one-time manager of the
    Indians, has the distinction for the NL.
  • Frank Robinson is a one-time manager of the
    Indians.
  • Passive
  • We have been approached by the investment banker.
  • The investment banker approached us.
  • Semantic Role Labeling handles this phenomenon
    automatically
  • Genitive modifier
  • Malaysia's crude palm oil output is
    estimated to have risen..
  • The crude palm oil output of Malaysia is
    estimated to have risen.

50
Logical Structure
  • Factivity: uncovering the context in which a
    verb phrase is embedded
  • The terrorists tried to enter the building.
  • The terrorists entered the building.
  • Polarity: negative markers or a negation-denoting
    verb (e.g. deny, refuse, fail)
  • The terrorists failed to enter the building.
  • The terrorists entered the building.
  • Modality/Negation: dealing with modal auxiliary
    verbs (can, must, should) that modify verbs'
    meanings, and with the identification of the
    scope of negation.
  • Superlatives/Comparatives/Monotonicity:
    inflecting adjectives or adverbs.
  • Quantifiers, determiners and articles

51
Some Examples (Braz et al., IJCAI workshop 05;
PARC Corpus)
  • T: Legally, John could drive.
  • H: John drove.
  • S: Bush said that Khan sold centrifuges to North
    Korea.
  • H: Centrifuges were sold to North Korea.
  • S: No US congressman visited Iraq until the war.
  • H: Some US congressmen visited Iraq before the
    war.
  • S: The room was full of women.
  • H: The room was full of intelligent women.
  • S: The New York Times reported that Hanssen sold
    FBI secrets to the Russians and could face the
    death penalty.
  • H: Hanssen sold FBI secrets to the Russians.
  • S: All soldiers were killed in the ambush.
  • H: Many soldiers were killed in the ambush.

53
Control Strategy and Decision Making
  • Single iteration
  • Strict logical approaches are, in principle, a
    single stage computation.
  • The pair is processed and transformed into the
    logic form.
  • Existing theorem provers act on the pair along
    with the KB.
  • Multiple iterations
  • Graph based algorithms are typically iterative.
  • Following Punyakanok et al. 04, transformations
    are applied and the entailment test is done after
    each transformation is applied.
  • Transformations can be chained, but sometimes the
    order makes a difference. The algorithm can be
    greedy, or it can be more exhaustive and search
    for the best path found (Braz et al. 05;
    Bar-Haim et al. 07)

54
Transformation Walkthrough (Braz et al. 05)
  • T: The government purchase of the Roanoke
    building, a former prison, took place in 1902.
  • H: The Roanoke building, which was a former
    prison, was bought by the government in 1902.

Does H follow from T?
55
Transformation Walkthrough (1)
  • T: The government purchase of the Roanoke
    building, a former prison, took place in 1902.
  • H: The Roanoke building, which was a former
    prison, was bought by the government in 1902.

[Figure: predicate-argument graphs. T: take (place) with arguments "The govt. purchase ... prison" and "in 1902"; purchase with argument "The Roanoke building". H: buy with arguments "The Roanoke prison", "In 1902", "The government"; be with arguments "The Roanoke building", "a former prison".]
56
Transformation Walkthrough (2)
  • T: The government purchase of the Roanoke
    building, a former prison, took place in 1902.
  • Phrasal Verb Rewriter ⇒ The government purchase
    of the Roanoke building, a former prison,
    occurred in 1902.
  • H: The Roanoke building, which was a former
    prison, was bought by the government.

[Figure: rewritten T graph: occur with arguments "The govt. purchase ... prison" and "in 1902"]
57
Transformation Walkthrough (3)
  • T: The government purchase of the Roanoke
    building, a former prison, occurred in 1902.
  • Nominalization Promoter ⇒ The government
    purchase the Roanoke building in 1902.
  • H: The Roanoke building, which was a former
    prison, was bought by the government in 1902.

NOTE: depends on the earlier transformation; order
is important!
[Figure: rewritten T graph: purchase with arguments "The government", "the Roanoke building, a former prison", "In 1902"]
58
Transformation Walkthrough (4)
  • T: The government purchase of the Roanoke
    building, a former prison, occurred in 1902.
  • Apposition Rewriter ⇒ The Roanoke building be a
    former prison.
  • H: The Roanoke building, which was a former
    prison, was bought by the government in 1902.

[Figure: added T graph: be with arguments "The Roanoke building", "a former prison"]
59
Transformation Walkthrough (5)
  • T: The government purchase of the Roanoke
    building, a former prison, took place in 1902.
  • H: The Roanoke building, which was a former
    prison, was bought by the government in 1902.

[Figure: the transformed T graph (purchase: "The government", "The Roanoke prison", "In 1902"; be: "The Roanoke building", "a former prison") is matched against the H graph (buy: "The government", "The Roanoke prison", "In 1902"; be: "The Roanoke building", "a former prison"), with WordNet relating purchase and buy.]
60
Characteristics
  • Multiple paths ⇒ optimization problem
  • Shortest or highest-confidence path through
    transformations
  • Order is important; may need to explore different
    orderings
  • Module dependencies are local: module B does
    not need access to module A's KB/inference, only
    its output
  • If outcome is true, the (optimal) set of
    transformations and local comparisons form a proof
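
The search over transformation orderings can be sketched as a breadth-first search for the shortest path of rewrites. This is a toy illustration, not the Braz et al. implementation: the tuple encoding of facts and the four rewriter functions are hypothetical simplifications of the walkthrough above.

```python
from collections import deque


def phrasal_verb_rewriter(facts):
    # take_place(event, time) => occur(event, time)
    return facts | {("occur",) + f[1:] for f in facts if f[0] == "take_place"}


def nominalization_promoter(facts):
    # occur(event, time) + nominal(event, verb, agent, obj) => verb(agent, obj, time)
    new = set()
    for f in facts:
        if f[0] == "occur":
            _, event, time = f
            for g in facts:
                if g[0] == "nominal" and g[1] == event:
                    _, _, verb, agent, obj = g
                    new.add((verb, agent, obj, time))
    return facts | new


def apposition_rewriter(facts):
    return facts | {("be", f[1], f[2]) for f in facts if f[0] == "appos"}


def wordnet_rewriter(facts):
    # Toy lexical rule: purchase => buy
    return facts | {("buy",) + f[1:] for f in facts if f[0] == "purchase"}


def find_proof(text_facts, hyp_facts, transformations, max_steps=6):
    """Shortest sequence of transformation names after which H embeds in T."""
    start, goal = frozenset(text_facts), frozenset(hyp_facts)
    if goal <= start:
        return []
    queue, seen = deque([(start, [])]), {start}
    while queue:
        facts, path = queue.popleft()
        if len(path) >= max_steps:
            continue
        for transform in transformations:
            new_facts = transform(facts)
            if new_facts in seen:
                continue
            new_path = path + [transform.__name__]
            if goal <= new_facts:
                return new_path
            seen.add(new_facts)
            queue.append((new_facts, new_path))
    return None


T = {("take_place", "e1", "1902"),
     ("nominal", "e1", "purchase", "government", "Roanoke building"),
     ("appos", "Roanoke building", "former prison")}
H = {("buy", "government", "Roanoke building", "1902"),
     ("be", "Roanoke building", "former prison")}
ALL = [phrasal_verb_rewriter, nominalization_promoter, apposition_rewriter, wordnet_rewriter]
print(find_proof(T, H, ALL))
```

The returned path doubles as the justification: it lists the transformations that, applied in order, make H embed in T. Dropping the phrasal verb rewriter leaves the nominalization promoter with nothing to fire on, so no proof is found.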

61
Summary: Control Strategy and Decision Making
  • Despite the appeal of the strict logical
    approaches, as of today they do not work well
    enough.
  • Bos & Markert
  • Strict logical approach is falling significantly
    behind good LLMs (local lexical matching) and
    multiple levels of lexical pre-processing
  • Only incorporating rather shallow features and
    using them in the evaluation saves this approach.
  • Braz et al.
  • Strict graph based representation is not doing as
    well as LLM.
  • Tatu et al.
  • Results show that the strict logical approach is
    inferior to LLMs, but when put together, it
    produces some gain.
  • Using Machine Learning methods as a way to
    combine systems and multiple features has been
    found very useful.

62
Hybrid/Ensemble Approaches
  • Bos et al. use theorem prover and model builder
  • Expand models of T, H using model builder, check
    sizes of models
  • Test consistency with background knowledge with
    T, H
  • Try to prove entailment with and without
    background knowledge
  • Tatu et al. (2006) use ensemble approach
  • Create two logical systems, one lexical alignment
    system
  • Combine system scores using coefficients found
    via search (train on annotated data)
  • Modify coefficients for different tasks
  • Zanzotto et al. (2006) try to learn from
    comparison of structures of T, H for true vs.
    false entailment pairs
  • Use lexical, syntactic annotation to characterize
    match between T, H for successful, unsuccessful
    entailment pairs
  • Train Kernel/SVM to distinguish between match
    graphs

63
Justification
  • For most approaches justification is given only
    by the data Preprocessed
  • Empirical Evaluation
  • Logical Approaches
  • There is a proof theoretic justification
  • Modulo the power of the resources and the ability
    to map a sentence to a logical form.
  • Graph/tree based approaches
  • There is a model theoretic justification
  • The approach is sound, but not complete, modulo
    the availability of resources.

64
Justifying Graph Based Approaches (Braz et al. 05)
  • R: a knowledge representation language, with a
    well defined syntax and semantics over a
    domain D.
  • For text snippets s, t:
  • r_s, r_t: their representations in R.
  • M(r_s), M(r_t): their model theoretic
    representations
  • There is a well defined notion of subsumption in
    R, defined model theoretically
  • For u, v ∈ R: u is subsumed by v when M(u) ⊆
    M(v)
  • Not an algorithm; we need a proof theory.

65
Defining Semantic Entailment (2)
  • The proof theory is weak: it will show r_s ⊆ r_t
    only when they are relatively similar
    syntactically.
  • r ∈ R is faithful to s if M(r_s) = M(r)
  • Definition: Let s, t be text snippets with
    representations r_s, r_t ∈ R.
  • We say that s semantically entails t if
    there is a representation r ∈ R that is faithful
    to s, for which we can prove that r ⊆ r_t
  • Given r_s, we need to generate many equivalent
    representations r_s' and test r_s' ⊆ r_t

This cannot be done exhaustively. How to generate
alternative representations?
66
Defining Semantic Entailment (3)
  • A rewrite rule (l, r) is a pair of expressions in
    R such that l ⊆ r
  • Given a representation r_s of s and a rule (l, r)
    for which r_s ⊆ l, the augmentation of r_s via
    (l, r) is r_s' = r_s ∧ r.
  • Claim: r_s' is faithful to s.
  • Proof: In general, since r_s' = r_s ∧ r, then
    M(r_s') = M(r_s) ∩ M(r). However, since
    r_s ⊆ l ⊆ r, then M(r_s) ⊆ M(r).
  • Consequently M(r_s') = M(r_s)
  • And the augmented representation is
    faithful to s.

[Figure: l ⊆ r and r_s ⊆ l, hence r_s' = r_s ∧ r]
67
Comments
  • The claim suggests an algorithm for generating
    alternative (equivalent) representations and for
    semantic entailment.
  • The resulting algorithm is a sound algorithm, but
    is not complete.
  • Completeness depends on the quality of the KB of
    rules.
  • The power of this algorithm is in the rules KB.
  • l and r might be very different
    syntactically, but by satisfying model theoretic
    subsumption they provide expressivity to the
    re-representation in a way that facilitates the
    overall subsumption.
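
The algorithm suggested by the claim can be sketched under a strong simplifying assumption: a representation is a conjunction of atomic strings, so u is subsumed by v whenever every conjunct of v appears in u. The function names, atoms and rules below are hypothetical; this is not the representation language used by Braz et al.

```python
def subsumed_by(u, v):
    """Weak syntactic proof of u ⊆ v: every conjunct of v appears in u."""
    return v <= u


def semantically_entails(r_s, r_t, rules):
    """Augment r_s with rule right-hand sides until nothing changes, then test.

    rules: iterable of (l, r) pairs of atom sets with l subsumed by r.
    Each augmentation r_s' = r_s AND r keeps the representation faithful to s.
    """
    current, target = frozenset(r_s), frozenset(r_t)
    rules = [(frozenset(l), frozenset(r)) for l, r in rules]
    changed = True
    while changed:
        changed = False
        for l, r in rules:
            if subsumed_by(current, l) and not r <= current:
                current = current | r
                changed = True
    return subsumed_by(current, target)


# Hypothetical ground rules, for illustration only.
rules = [
    ({"take_over(Yahoo, Overture)"}, {"acquire(Yahoo, Overture)"}),
    ({"acquire(Yahoo, Overture)"}, {"own(Yahoo, Overture)"}),
]
r_s = {"take_over(Yahoo, Overture)", "search_company(Overture)"}
r_t = {"own(Yahoo, Overture)"}
print(semantically_entails(r_s, r_t, rules))  # True
print(semantically_entails(r_s, r_t, []))     # False
```

The sketch is sound with respect to the rules it is given but not complete: with an empty rule base the entailment is missed, which mirrors the point that the power of the algorithm lies in the rules KB.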

68
Non-Entailment
  • The problem of determining non-entailment is
    harder, mostly due to its structure.
  • Most approaches determine non-entailment
    heuristically.
  • Set a threshold for a cost function. If it is
    not met by the pair, say no.
  • Several approaches have identified specific
    features that hint at non-entailment.
  • A model theoretic approach for non-entailment has
    also been developed, although its effectiveness
    isn't clear yet.

69
What are we missing?
  • It is completely clear that the key resource
    missing is knowledge.
  • Better resources translate immediately to better
    results.
  • At this point existing resources seem to be
    lacking in coverage and accuracy.
  • Not enough high quality public resources; no
    quantification.
  • Some examples
  • Lexical knowledge: some cases are difficult to
    acquire systematically.
  • A bought Y ⇒ A has/owns Y
  • Many of the current lexical resources are very
    noisy.
  • Numbers, quantitative reasoning
  • Time and date: temporal reasoning.
  • Robust event based reasoning and information
    integration

70
Textual Entailment as a Classification Task
71
RTE as classification task
  • RTE is a classification task
  • Given a pair, we need to decide whether T
    implies H or T does not imply H
  • We can learn a classifier from annotated examples
  • What do we need
  • A learning algorithm
  • A suitable feature space

72
Defining the feature space
  • How do we define the feature space?
  • Possible features
  • Distance features: features of some distance
    between T and H
  • Entailment trigger features
  • Pair features: the content of the T-H pair is
    represented
  • Possible representations of the sentences
  • Bag-of-words (possibly with n-grams)
  • Syntactic representation
  • Semantic representation

73
Distance Features
  • Possible features
  • Number of words in common
  • Longest common subsequence
  • Longest common syntactic subtree
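
The first two distance features can be sketched as follows. This is a minimal illustration, assuming lowercased whitespace tokens stand in for real tokenization; the function names are hypothetical and not taken from any of the cited systems. The syntactic subtree feature is left out because it needs a parser.

```python
def common_words(t_tokens, h_tokens):
    """Number of distinct word types shared by T and H."""
    return len(set(t_tokens) & set(h_tokens))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def distance_features(text, hypothesis):
    """Assumption: whitespace tokenization is good enough for the sketch."""
    t, h = text.lower().split(), hypothesis.lower().split()
    return {"common_words": common_words(t, h), "lcs": lcs_length(t, h)}


feats = distance_features("all solid companies pay dividends",
                          "all solid companies pay cash dividends")
print(feats)
```

Both features are high for this pair, which is why a distance-based classifier alone would tend to label it as entailment.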

74
Entailment Triggers
  • Possible features
  • from (de Marneffe et al., 2006)
  • Polarity features
  • presence/absence of negative polarity contexts
    (not, no or few, without)
  • T: Oil price surged / H: Oil prices didn't grow
  • Antonymy features
  • presence/absence of antonymous words in T and H
  • T: Oil price is surging / H: Oil prices is falling
    down
  • Adjunct features
  • dropping/adding of syntactic adjunct when moving
    from T to H
  • T: all solid companies pay dividends / H: all solid
    companies pay cash dividends
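
A rough sketch of the polarity and antonymy triggers, run on the slide's own examples. The negation word list follows the slide, the check for the "n't" suffix and the tiny antonym table are assumptions made for illustration, and a real system would take antonyms from a lexical resource. Adjunct features are omitted because they require a syntactic parse.

```python
NEGATION_WORDS = {"not", "no", "few", "without"}

# Hypothetical toy antonym table; a real system would query a lexicon.
TOY_ANTONYMS = {frozenset(("surging", "falling")), frozenset(("rise", "fall"))}


def has_negation(tokens):
    """True if any token is a negation word or carries the n't suffix."""
    return any(t in NEGATION_WORDS or t.endswith("n't") for t in tokens)


def trigger_features(text, hypothesis):
    t, h = text.lower().split(), hypothesis.lower().split()
    return {
        # Fires when exactly one of T and H contains a negative context.
        "polarity_mismatch": has_negation(t) != has_negation(h),
        # Fires when some word of T is listed as an antonym of a word of H.
        "antonym_pair": any(frozenset((a, b)) in TOY_ANTONYMS
                            for a in t for b in h),
    }


polarity_example = trigger_features("Oil price surged", "Oil prices didn't grow")
antonym_example = trigger_features("Oil price is surging",
                                   "Oil prices is falling down")
print(polarity_example, antonym_example)
```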

75
Pair Features
  • Possible features
  • Bag-of-word spaces of T and H
  • Syntactic spaces of T and H
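
As a small illustration of the bag-of-words pair space, each word can become one feature marked with the side it comes from. The `_T` and `_H` suffixes follow the feature names shown on the slide; the function name and the whitespace tokenization are assumptions.

```python
def pair_features(text, hypothesis):
    """Binary bag-of-words features, kept separate for T and H."""
    feats = {}
    for word in text.lower().split():
        feats[f"{word}_T"] = 1
    for word in hypothesis.lower().split():
        feats[f"{word}_H"] = 1
    return feats


example = pair_features("solid companies pay dividends",
                        "companies pay insurance")
print(sorted(example))
```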

[Figure: bag-of-words pair feature space. T features: companies_T, dividends_T, year_T, solid_T, end_T, pay_T. H features: companies_H, insurance_H, dividends_H, year_H, solid_H, end_H, pay_H.]
76
Pair Features: what can we learn?
  • Bag-of-word spaces of T and H
  • We can learn
  • T implies H when T contains "end"
  • T does not imply H when H contains "end"

[Figure: the bag-of-words pair feature space of T and H, with features such as end_T and end_H.]
It seems to be totally irrelevant!!!
77
ML Methods in the possible feature spaces
[Figure: systems placed on a grid. Vertical axis (possible features): Pair, Entailment Trigger, Distance. Horizontal axis (sentence representation): Bag-of-words, Syntactic, Semantic.]
Systems shown: (Zanzotto & Moschitti, 2006), (Bos & Markert, 2006), (de Marneffe et al., 2006), (Hickl et al., 2006), (Inkpen et al., 2006), (Kozareva & Montoyo, 2006), (Herrera et al., 2006), (Rodney et al., 2006)
78
Effectively using the Pair Feature Space
(Zanzotto, Moschitti, 2006)
  • Roadmap
  • Motivation: why it is important even if it
    seems not to be.
  • Understanding the model with an example
  • Challenges
  • A simple example
  • Defining the cross-pair similarity

79
Observing the Distance Feature Space
(Zanzotto, Moschitti, 2006)
In a distance feature space the two pairs are very likely the same point.
[Figure: two T-H pairs plotted by common words and common syntactic dependencies.]
80
What can happen in the pair feature space?
(Zanzotto, Moschitti, 2006)
81
Observations
  • Some examples are difficult to exploit in
    the distance feature space
  • We need a space that considers the content and
    the structure of textual entailment examples
  • Let us explore
  • the pair space!
  • using the kernel trick: define the space by
    defining the distance K(P1, P2) instead of
    defining the features

K(T1 → H1, T2 → H2)
82
Target
(Zanzotto, Moschitti, 2006)
  • How do we build it?
  • Using a syntactic interpretation of sentences
  • Using a similarity among trees KT(T',T''): this
    similarity counts the number of subtrees in
    common between T' and T''
  • This is a syntactic pair feature space
  • Question: do we need something more?
  • Cross-pair similarity
  • KS((T',H'),(T'',H'')) = KT(T',T'') + KT(H',H'')
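
A compact sketch of a subtree-counting kernel and the resulting pair kernel. It assumes the common-fragment recursion of Collins and Duffy with no decay factor, which may differ in detail from the kernel used in the cited work; trees are nested tuples `(label, child, ...)` with words as plain strings, and all names are illustrative.

```python
def production(node):
    """Label of a node plus the labels (or words) of its children."""
    kids = tuple(c if isinstance(c, str) else ("NT", c[0]) for c in node[1:])
    return (node[0], kids)


def internal_nodes(tree):
    """All non-leaf nodes of a tree given as nested tuples."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(internal_nodes(child))
    return found


def delta(n1, n2):
    """Number of common tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0
    count = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # equal productions: c2 is a subtree too
            count *= 1 + delta(c1, c2)
    return count


def tree_kernel(t1, t2):
    """KT: total number of fragments the two trees have in common."""
    return sum(delta(a, b)
               for a in internal_nodes(t1) for b in internal_nodes(t2))


def pair_kernel(pair1, pair2):
    """KS as the sum of the text kernel and the hypothesis kernel."""
    (text1, hyp1), (text2, hyp2) = pair1, pair2
    return tree_kernel(text1, text2) + tree_kernel(hyp1, hyp2)


cats = ("S", ("NP", "cats"), ("VP", "sleep"))
dogs = ("S", ("NP", "dogs"), ("VP", "sleep"))
print(tree_kernel(cats, cats), tree_kernel(cats, dogs))
```

The double loop over node pairs is quadratic in tree size, which is acceptable for a sketch but not for large training sets.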

83
Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)
  • Can we use syntactic tree similarity?

84
Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)
  • Can we use syntactic tree similarity?

85
Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)
  • Can we use syntactic tree similarity? Not only!

86
Observing the syntactic pair feature space
(Zanzotto, Moschitti, 2006)
  • Can we use syntactic tree similarity? Not only!
  • We also want to exploit the implied rewrite
    rule

[Figure: trees over nodes a, b, c, d illustrating the implied rewrite rule.]
87
Exploiting Rewrite Rules
(Zanzotto, Moschitti, 2006)
  • To capture the textual entailment recognition
    rule (rewrite rule or inference rule), the
    cross-pair similarity measure should consider
  • the structural/syntactical similarity between,
    respectively, texts and hypotheses
  • the similarity among the intra-pair relations
    between constituents

How to reduce the problem to a tree similarity
computation?
88
Exploiting Rewrite Rules
(Zanzotto, Moschitti, 2006)
89
Exploiting Rewrite Rules
Intra-pair operations
(Zanzotto, Moschitti, 2006)
90
Exploiting Rewrite Rules
Intra-pair operations: Finding anchors
(Zanzotto, Moschitti, 2006)
91
Exploiting Rewrite Rules
  • Intra-pair operations
  • Finding anchors
  • Naming anchors with placeholders

(Zanzotto, Moschitti, 2006)
92
Exploiting Rewrite Rules
  • Intra-pair operations
  • Finding anchors
  • Naming anchors with placeholders
  • Propagating placeholders

(Zanzotto, Moschitti, 2006)
93
Exploiting Rewrite Rules
  • Intra-pair operations
  • Finding anchors
  • Naming anchors with placeholders
  • Propagating placeholders

Cross-pair operations
(Zanzotto, Moschitti, 2006)
94
Exploiting Rewrite Rules
  • Cross-pair operations
  • Matching placeholders across pairs
  • Intra-pair operations
  • Finding anchors
  • Naming anchors with placeholders
  • Propagating placeholders

(Zanzotto, Moschitti, 2006)
95
Exploiting Rewrite Rules
  • Cross-pair operations
  • Matching placeholders across pairs
  • Renaming placeholders
  • Intra-pair operations
  • Finding anchors
  • Naming anchors with placeholders
  • Propagating placeholders

96
Exploiting Rewrite Rules
  • Intra-pair operations
  • Finding anchors
  • Naming anchors with placeholders
  • Propagating placeholders
  • Cross-pair operations
  • Matching placeholders across pairs
  • Renaming placeholders
  • Calculating the similarity between syntactic
    trees with co-indexed leaves

97
Exploiting Rewrite Rules
  • Intra-pair operations
  • Finding anchors
  • Naming anchors with placeholders
  • Propagating placeholders
  • Cross-pair operations
  • Matching placeholders across pairs
  • Renaming placeholders
  • Calculating the similarity between syntactic
    trees with co-indexed leaves

(Zanzotto, Moschitti, 2006)
98
Exploiting Rewrite Rules
(Zanzotto, Moschitti, 2006)
  • The initial example: sim(H1,H3) > sim(H2,H3)?

99
Defining the Cross-pair similarity
(Zanzotto, Moschitti, 2006)
  • The cross pair similarity is based on the
    distance between syntactic trees with co-indexed
    leaves
  • where
  • C is the set of all the correspondences between
    anchors of (T',H') and (T'',H'')
  • t(S, c) returns the parse tree of the hypothesis
    (text) S where placeholders of these latter are
    replaced by means of the substitution c
  • i is the identity substitution
  • KT(t1, t2) is a function that measures the
    similarity between the two trees t1 and t2.
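
The formula that the definitions above belong to did not survive in this transcript. Assuming it has the form given in (Zanzotto, Moschitti, 2006), the similarity is the maximum over correspondences c in C of KT(t(H',c), t(H'',i)) + KT(t(T',c), t(T'',i)). The sketch below illustrates only that maximization: sentences are token lists in which anchors are already replaced by placeholders such as `#1`, and a shared-token count stands in for the tree kernel KT. Both simplifications and all names are assumptions.

```python
from itertools import combinations, permutations


def placeholders(tokens):
    """Sorted placeholder names (tokens starting with '#')."""
    return sorted({t for t in tokens if t.startswith("#")})


def substitute(tokens, c):
    """Rename placeholders by substitution c; unmapped ones become '#?'."""
    return [c.get(t, "#?") if t.startswith("#") else t for t in tokens]


def shared_tokens(a, b):
    """Stand-in for KT (assumption): token types shared by a and b."""
    return len(set(a) & set(b))


def cross_pair_similarity(pair1, pair2, kt=shared_tokens):
    """Best score over all injective placeholder correspondences."""
    (t1, h1), (t2, h2) = pair1, pair2
    p1, p2 = placeholders(t1 + h1), placeholders(t2 + h2)
    k = min(len(p1), len(p2))
    best = 0
    for source in combinations(p1, k):
        for image in permutations(p2, k):
            c = dict(zip(source, image))
            score = kt(substitute(t1, c), t2) + kt(substitute(h1, c), h2)
            best = max(best, score)
    return best


pair1 = ("#1 sold #2 to #3".split(), "#3 owns #2".split())
pair2 = ("#3 sold #1 to #2".split(), "#2 owns #1".split())
identity = {p: p for p in placeholders(pair1[0] + pair1[1])}
identity_score = (shared_tokens(substitute(pair1[0], identity), pair2[0])
                  + shared_tokens(substitute(pair1[1], identity), pair2[1]))
print(cross_pair_similarity(pair1, pair2), identity_score)
```

Enumerating every correspondence is factorial in the number of placeholders, which is one reason to keep the anchor set small.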

100
Defining the Cross-pair similarity
101
Refining Cross-pair Similarity
(Zanzotto, Moschitti, 2006)
  • Controlling complexity
  • We reduced the size of the set of anchors using
    the notion of chunk
  • Reducing the computational cost
  • Many subtree computations are repeated during the
    computation of KT(t1, t2). This can be exploited
    for a better dynamic programming algorithm
    (Moschitti & Zanzotto, 2007)
  • Focussing on information within a pair relevant
    for the entailment
  • Text trees are pruned according to where anchors
    attach

102
BREAK (30 min)
103
III. Knowledge Acquisition Methods
104
Knowledge Acquisition for TE
  • What kind of knowledge do we need?
  • Explicit Knowledge (Structured Knowledge Bases)
  • Relations among words (or concepts)
  • Symmetric: synonymy, co-hyponymy
  • Directional: hyponymy, part of, ...
  • Relations among sentence prototypes
  • Symmetric: paraphrasing
  • Directional: inference rules/rewrite rules
  • Implicit Knowledge
  • Relations among sentences
  • Symmetric: paraphrasing examples
  • Directional: entailment examples

105
Acquisition of Explicit Knowledge
106
Acquisition of Explicit Knowledge
  • The questions we need to answer
  • What?
  • What do we want to learn? Which resources do we
    need?
  • Using what?
  • Which are the principles we have?
  • How?
  • How do we organize the knowledge acquisition
    algorithm?

107
Acquisition of Explicit Knowledge: what?
  • Types of knowledge
  • Symmetric
  • Co-hyponymy
  • Between words: cat ↔ dog
  • Synonymy
  • Between words: buy ↔ acquire
  • Sentence prototypes (paraphrasing): X bought Y ↔
    X acquired Z% of Y's shares
  • Directional semantic relations
  • Words: cat → animal, buy → own, wheel partof
    car
  • Sentence prototypes: X acquired Z% of Y's
    shares → X owns Y

108
Acquisition of Explicit Knowledge: using what?
  • Underlying hypothesis
  • Harris' Distributional Hypothesis (DH) (Harris,
    1964)
  • Words that tend to occur in the same contexts
    tend to have similar meanings.
  • Robison's Point-wise Assertion Patterns (PAP)
    (Robison, 1970)
  • It is possible to extract relevant semantic
    relations with some pattern.

sim(w1,w2) ≈ sim(C(w1), C(w2))
w1 is in a relation r with w2 if the context
matches pattern(w1, w2)
109
Distributional Hypothesis (DH)
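sim_w(W1,W2) ≈ sim_ctx(C(W1), C(W2))
[Figure: words or forms are mapped into a context (feature) space, with the corpus as the source of contexts. From "sun is constituted of hydrogen", w1 = constitute gets contexts C(w1); from "The Sun is composed of hydrogen", w2 = compose gets contexts C(w2).]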
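
A minimal sketch of the distributional hypothesis at work: build a context vector for each target word from a window of neighbouring tokens, then compare the vectors with cosine similarity. The window size, the toy corpus and the function names are assumptions, and inflected forms are used directly instead of lemmas.

```python
from collections import Counter
from math import sqrt


def context_vector(target, sentences, window=2):
    """Counts of the tokens seen within `window` positions of the target."""
    vec = Counter()
    for sentence in sentences:
        toks = sentence.lower().split()
        for i, tok in enumerate(toks):
            if tok == target:
                vec.update(toks[max(0, i - window):i])
                vec.update(toks[i + 1:i + 1 + window])
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


corpus = [
    "the sun is constituted of hydrogen",
    "the sun is composed of hydrogen",
    "the team played a match",
]
c1 = context_vector("constituted", corpus)
c2 = context_vector("composed", corpus)
c3 = context_vector("played", corpus)
print(cosine(c1, c2), cosine(c1, c3))
```

Here "constituted" and "composed" share all their contexts, so they come out as similar, while "played" shares none.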
110
Point-wise Assertion Patterns (PAP)
w1 is in a relation r with w2 if the contexts match
patterns_r(w1, w2)
  • Relation: w1 part_of w2
  • Patterns: "w1 is constituted of w2", "w1 is
    composed of w2"
  • Corpus (source of contexts): "sun is constituted
    of hydrogen", "The Sun is composed of hydrogen"
  • Statistical indicator S_corpus(w1,w2): selects
    correct vs incorrect relations among words
  • Result: part_of(sun,hydrogen)
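
The pattern-based route can be sketched with two regular expressions and a frequency cut-off. The raw count used here is only a stand-in for the statistical indicator S_corpus, and the threshold, the third corpus sentence and the function names are assumptions. The argument order of the extracted pairs follows the slide's part_of(sun,hydrogen).

```python
import re
from collections import Counter

# The two surface patterns listed on the slide for the relation.
PART_OF_PATTERNS = [
    re.compile(r"(\w+) is constituted of (\w+)"),
    re.compile(r"(\w+) is composed of (\w+)"),
]


def extract_pairs(sentences, patterns=PART_OF_PATTERNS):
    """Count how often each (w1, w2) pair fills one of the patterns."""
    counts = Counter()
    for sentence in sentences:
        for pattern in patterns:
            for w1, w2 in pattern.findall(sentence.lower()):
                counts[(w1, w2)] += 1
    return counts


def accepted(counts, min_count=2):
    """Keep pairs whose count reaches the (assumed) threshold."""
    return {pair for pair, n in counts.items() if n >= min_count}


corpus = [
    "sun is constituted of hydrogen",
    "The Sun is composed of hydrogen",
    "The committee is composed of volunteers",
]
counts = extract_pairs(corpus)
print(accepted(counts))
```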
111
DH and PAP cooperate
[Figure: the Distributional Hypothesis and Point-wise Assertion Patterns combined on one example. Words or forms w1 = constitute ("sun is constituted of hydrogen") and w2 = compose ("The Sun is composed of hydrogen") are mapped to contexts C(w1) and C(w2) in the context (feature) space, with the corpus as the source of contexts.]
112
Knowledge Acquisition: where do methods differ?
  • On the word side
  • Target equivalence classes: concepts or relations
  • Target forms: words or expressions
  • On the context side
  • Feature Space
  • Similarity function

113
KA4TE: a first classification of some methods
[Figure: methods arranged by type of knowledge (directional, symmetric) and by underlying hypothesis (Distributional Hypothesis, Point-wise Assertion Patterns).]
  • Verb Entailment (Zanzotto et al., 2006)
  • Noun Entailment (Geffet & Dagan, 2005)
  • Relation Pattern Learning (ESPRESSO) (Pantel &
    Pennacchiotti, 2006)
  • ISA patterns (Hearst, 1992)
  • Concept Learning (Lin & Pantel, 2001a)
  • TEASE (Szpektor et al., 2004)
  • Inference Rules (DIRT) (Lin & Pantel, 2001b)
114
Noun Entailment Relation
(Geffet & Dagan, 2006)
  • Type of knowledge: directional relations
  • Underlying hypothesis: distributional hypothesis
  • Main Idea: distributional inclusion hypothesis
  • w1 → w2 if all the prominent features of w1
    occur with w2 in a sufficiently large corpus
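
The inclusion test can be sketched as a subset check. Taking the top-k weighted features as "prominent" is an assumption, as are the toy feature weights and the function names.

```python
def prominent_features(feature_weights, top_k=3):
    """The top_k highest-weighted features (ties broken alphabetically)."""
    ranked = sorted(feature_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return {feature for feature, _ in ranked[:top_k]}


def entails_by_inclusion(features_w1, features_w2, top_k=3):
    """w1 -> w2 if every prominent feature of w1 also occurs with w2."""
    return prominent_features(features_w1, top_k) <= set(features_w2)


# Hypothetical feature weights, for illustration only.
cat = {"purr": 5, "pet": 4, "eat": 3, "whisker": 1}
animal = {"eat": 9, "live": 8, "pet": 6, "hunt": 5, "purr": 2}

print(entails_by_inclusion(cat, animal), entails_by_inclusion(animal, cat))
```

The check is directional by construction: the prominent features of "cat" are covered by "animal", but not the other way round.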

[Figure: words or forms mapped into the context (feature) space.]
115
Verb Entailment Relations
(Zanzotto, Pennacchiotti, Pazienza, 2006)
  • Type of knowledge: oriented relations
  • Underlying hypothesis: point-wise assertion
    patterns
  • Main Idea
  • Relation: v1 → v2
  • Patterns: agentive_nominalization(v2) v1
  • Statistical indicator S(v1,v2): point-wise
    mutual information
116
Verb Entailment Relations
(Zanzotto, Pennacchiotti, Pazienza, 2006)
  • Understanding the idea
  • Selectional restriction
  • fly(x) → has_wings(x)
  • in general
  • v(x) → c(x) (if x is the subject of v then x has
    the property c)
  • Agentive nominalization
  • agentive noun is the doer or the performer of an
    action v
  • "X is a player" may be read as play(x)
  • c(x) is clearly v(x) if the property c is
    derived from v with an agentive nominalization

Skipped
117
Verb Entailment Relations
  • Understanding the idea
  • Given the expression
  • player wins
  • Seen as a selectional restriction
  • win(x) → play(x)
  • Seen as a selectional preference
  • P(play(x) | win(x)) > P(play(x))
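
The selectional preference reading can be checked on counts of (subject noun, verb) observations: "player" as subject is read as play(x), so the test is whether "player" is more likely as the subject of "win" than it is overall. The observations below are invented for illustration and the function name is an assumption.

```python
def preference(observations, noun, verb):
    """Return (P(noun | verb), P(noun)) from (subject noun, verb) pairs."""
    subjects_of_verb = [n for n, v in observations if v == verb]
    noun_total = sum(1 for n, _ in observations if n == noun)
    p_cond = (subjects_of_verb.count(noun) / len(subjects_of_verb)
              if subjects_of_verb else 0.0)
    p_prior = noun_total / len(observations) if observations else 0.0
    return p_cond, p_prior


# Hypothetical observations, not corpus data.
observations = [
    ("player", "win"), ("player", "win"), ("team", "win"),
    ("player", "train"), ("team", "train"),
    ("dog", "bark"), ("dog", "eat"), ("cat", "eat"),
]
p_cond, p_prior = preference(observations, "player", "win")
print(p_cond > p_prior)
```

With these counts the conditional probability exceeds the prior, which is the evidence the slide reads as support for win(x) → play(x).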

Skipped
118
Knowledge Acquisition for TE: How?
  • The algorithmic nature of a DH+PAP method
  • Direct
  • Starting point: target words
  • Indirect
  • Starting point: context feature space
  • Iterative
  • Interplay between the context feature space and
    the target words

119
Direct Algorithm
sim(w1,w2) ≈ sim(C(w1), C(w2))
  • Select target words wi from the corpus or from a
    dictionary
  • Retrieve contexts of each wi and represent them
    in the feature space C(wi)
  • For each pair (wi, wj)
  • Compute the similarity sim(C(wi), C(wj)) in the
    context space
  • If sim(wi, wj) ≈ sim(C(wi), C(wj)) > t,
  • wi and wj belong to the same equivalence class W
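
The steps above can be sketched as a loop over word pairs. Context vectors are assumed to be already retrieved and are given as toy dictionaries; cosine similarity and the threshold value are illustrative choices, not those of any cited method.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def direct_algorithm(contexts, t):
    """Pairs (wi, wj) whose context similarity exceeds the threshold t."""
    similar = []
    for wi, wj in combinations(sorted(contexts), 2):
        if cosine(contexts[wi], contexts[wj]) > t:
            similar.append((wi, wj))
    return similar


# Hypothetical context vectors C(wi).
contexts = {
    "cat": {"purr": 1, "pet": 2, "eat": 2},
    "dog": {"bark": 1, "pet": 2, "eat": 2},
    "car": {"drive": 3, "park": 1},
}
print(direct_algorithm(contexts, t=0.5))
```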

sim(w1,w2) ≈ sim(I(C(w1)), I(C(w2)))
[Figure: words or forms w1 = cat and w2 = dog mapped to contexts C(w1) and C(w2) in the context (feature) space.]
120
Indirect Algorithm
  • Given an equivalence class W, select relevant
    contexts and represent them in the feature space
  • Retrieve target words (w1, ..., wn) that appear in
    these contexts. These are likely to be words in
    the equivalence class W
  • Eventually, for each wi, retrieve C(wi) from the
    corpus
  • Compute the centroid I(C(W))
  • For each wi,
  • if sim(I(C(W)), wi) < t, eliminate wi from W.
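
A sketch of the indirect route: start from a few selected contexts, collect the words that occur with them, then drop the words whose own context vector is far from the centroid. The selected contexts, the toy vectors, the threshold and the function names are all assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = {}
    for vec in vectors:
        for k, x in vec.items():
            total[k] = total.get(k, 0) + x
    return {k: x / len(vectors) for k, x in total.items()}


def indirect_algorithm(selected_contexts, contexts, t):
    """Retrieve words seen in the selected contexts, then filter them."""
    candidates = sorted(w for w, vec in contexts.items()
                        if selected_contexts & set(vec))
    center = centroid([contexts[w] for w in candidates])
    return [w for w in candidates if cosine(center, contexts[w]) >= t]


# Hypothetical context vectors C(wi).
contexts = {
    "cat": {"purr": 1, "pet": 2, "eat": 2},
    "dog": {"bark": 1, "pet": 2, "eat": 2},
    "car": {"drive": 3, "park": 1, "wash": 1},
}
print(indirect_algorithm({"pet", "wash"}, contexts, t=0.6))
```

In this toy run "car" is retrieved because it occurs with "wash", and is then eliminated because it sits far from the centroid.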

sim(w1,w2) ≈ sim(C(w1), C(w2))
sim(w1,w2) ≈ sim(I(C(w1)), I(C(w2)))
[Figure: words or forms w1 = cat and w2 = dog mapped to contexts C(w1) and C(w2) in the context (feature) space.]
121
Iterative Algorithm
  1. For each word wi in the equivalence class W,
    retrieve the C(wi) contexts and represent them in
    the feature space
  2. Extract words wj that have contexts similar to
    C(wi)
  3. Extract contexts C(wj) of these new words
  4. For each new word wj, if sim(C(W),
    wj) > t, put wj in W.
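
A sketch of the iterative loop: the class W grows round by round, and each round compares the remaining words against the centroid of the current class. Using a centroid for C(W), the threshold, the round limit and the toy vectors are assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = {}
    for vec in vectors:
        for k, x in vec.items():
            total[k] = total.get(k, 0) + x
    return {k: x / len(vectors) for k, x in total.items()}


def iterative_algorithm(seeds, contexts, t, max_rounds=10):
    """Grow the class W until no new word passes the threshold."""
    members = set(seeds)
    for _ in range(max_rounds):
        center = centroid([contexts[w] for w in sorted(members)])
        new = {w for w in contexts
               if w not in members and cosine(center, contexts[w]) > t}
        if not new:
            break
        members |= new
    return members


# Hypothetical context vectors C(wi).
contexts = {
    "cat": {"purr": 1, "pet": 2, "eat": 2},
    "dog": {"bark": 1, "pet": 2, "eat": 2},
    "wolf": {"bark": 2, "howl": 2, "eat": 1},
    "car": {"drive": 3, "park": 1},
}
print(sorted(iterative_algorithm({"cat"}, contexts, t=0.3)))
```

In this toy run "wolf" is too far from "cat" alone and only enters once "dog" has joined the class, which shows both the reach of the iteration and its risk of drifting away from the seed.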

sim(w1,w2) ≈ sim(C(w1), C(w2))
sim(w1,w2) ≈ sim(I(C(w1)), I(C(w2)))
[Figure: words or forms w1 = cat and w2 = dog mapped to contexts C(w1) in the context (feature) space.]
122
Knowledge Acquisition using DH and PAP
  • Direct Algorithms
  • Concepts from text via clustering (Lin & Pantel,
    2001)
  • Inference rules aka DIRT (Lin & Pantel, 2001)
  • Indirect Algorithms
  • Hearst's ISA patterns (Hearst, 1992)
  • Question Answering patterns (Ravichandran & Hovy,
    2002)
  • Iterative Algorithms
  • Entailment rules from Web aka TEASE (Szpektor
    et al., 2004)
  • Espresso (Pantel & Pennacchiotti, 2006)

123
TEASE
(Szpektor et al., 2004)
  • Type: Iterative algorithm
  • On the word side
  • Target equivalence classes: fine-grained
    relations
  • Target forms: verb with arguments
  • On the context side
  • Feature Space
  • Innovations with respect to research before 2004
  • First direct algorithm for extracting rules

prevent(X,Y)

X_fillermi?,Y_fillermi?
124
TEASE
(Szpektor et al., 2004)
  • Inputs to TEASE: Lexicon, WEB
  • Input template: X ←subj- accuse -obj→ Y
  • Sample corpus for input template: "Paula Jones
    accused Clinton", "BBC accused Blair", "Sanhedrin
    accused St.Paul"
  • Anchor Set Extraction (ASE)
  • Anchor sets: {Paula Jones (subj), Clinton (obj)},
    {Sanhedrin (subj), St.Paul (obj)}
  • Template Extraction (TE)
  • Sample corpus for anchor sets: "Paula Jones called
    Clinton indictable", "St.Paul defended before the
    Sanhedrin"
  • Templates: "X call Y indictable", "Y defend
    before X"
  • Iterate
Skipped
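
One round of the two extraction steps can be imitated on the slide's sample sentences. This is a toy sketch, not the TEASE implementation: the regular expression stands in for the dependency template, plain string replacement stands in for template extraction over parses, no lemmatization is done, and the function names are hypothetical.

```python
import re


def extract_anchor_sets(template_regex, sentences):
    """Anchor Set Extraction: fillers of X and Y in matching sentences."""
    anchors = []
    for sentence in sentences:
        match = re.fullmatch(template_regex, sentence)
        if match:
            anchors.append((match.group("X"), match.group("Y")))
    return anchors


def extract_templates(anchor_sets, sentences):
    """Template Extraction: sentences containing both anchors, with the
    anchors replaced by the variables X and Y."""
    templates = set()
    for x, y in anchor_sets:
        for sentence in sentences:
            if x in sentence and y in sentence:
                templates.add(sentence.replace(x, "X").replace(y, "Y"))
    return templates


input_template = r"(?P<X>.+) accused (?P<Y>.+)"
template_corpus = ["Paula Jones accused Clinton", "BBC accused Blair",
                   "Sanhedrin accused St.Paul"]
anchor_corpus = ["Paula Jones called Clinton indictable",
                 "St.Paul defended before the Sanhedrin"]

anchor_sets = extract_anchor_sets(input_template, template_corpus)
templates = extract_templates(anchor_sets, anchor_corpus)
print(anchor_sets)
print(sorted(templates))
```

Feeding the extracted templates back in as new input templates would give the iteration shown on the slide.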
125
TEASE
(Szpektor et al., 2004)
  • Innovations with respect to research before 2004
  • First direct algorithm for extracting rules
  • A feature selection is done to assess the most
    informative features
  • Extracted forms are clustered to obtain the most
    general sentence prototype of a given set of
    equivalent forms

Skipped
126
Espresso
(Pantel & Pennacchiotti, 2006)
  • Type: Iterative algorithm
  • On the word side
  • Target equivalence classes: relations
  • Target forms: expressions, sequences of tokens
  • Innovations with respect to research before 2006
  • A measure to determine specific vs. general
    patterns (ranking in the equivalent forms)

compose(X,Y)
Y is composed by X, Y is made of X
127
Espresso
(Pantel & Pennacchiotti, 2006)
  • Instances: (leader, panel), (city, region),
    (oxygen, water)
  • Scored instances: 1.0 (tree, land), 0.9 (atom,
    molecule), 0.7 (leader, panel), 0.6 (range of
    information, FBI report), 0.6 (artifact, exhibit),
    0.2 (oxygen, hydrogen)
  • Instances: (tree, land), (oxygen, hydrogen),
    (atom, molecule), (leader, panel), (range of
    information, FBI report), (artifact, exhibit)
  • Scored patterns: 1.0 "Y is composed by X", 0.8 "Y
    is part of X", 0.2 "X,Y"
  • Patterns: "Y is composed by X", "X,Y", "Y is part
    of X"
Skipped
128
Espresso
(Pantel & Pennacchiotti, 2006)
  • Innovations with respect to research before 2006
  • A measure to determine specific vs. general
    patterns (ranking in the equivalent forms)
  • Both pattern and instance selections are
    performed
  • Different use of general and specific patterns