Title: Textual Entailment: A Perspective on Applied Text Understanding
1Textual Entailment: A Perspective on Applied Text Understanding
Ido Dagan, Bar-Ilan University, Israel
Joint work with: Oren Glickman, Idan Szpektor, Roy Bar-Haim (Bar-Ilan University, Israel); Maayan Geffet (Hebrew University, Israel); Hristo Tanev, Bernardo Magnini, Alberto Lavelli, Lorenza Romano (ITC-irst, Italy); Bonaventura Coppola and Milen Kouylekov (University of Trento and ITC-irst, Italy)
2Talk Focus: A Framework for Applied Semantics
- The textual entailment task: what and why?
- Empirical evaluation: the PASCAL RTE Challenge
- Problem scope, decomposition and analysis
- A different perspective on semantic inference
- Probabilistic framework
- Cf. syntax, MT: a clear task, methodology and community
3Natural Language and Meaning
(diagram: the mapping between Language and Meaning)
4Variability of Semantic Expression
All major stock markets surged
Dow gains 255 points
Dow ends up
Stock market hits a record high
Dow climbs 255
The Dow Jones Industrial Average closed up 255
5Variability Recognition: a Major Inference in Applications
Question Answering (QA)
Information Extraction (IE)
Information Retrieval (IR)
Multi Document Summarization (MDS)
6Typical Application Inference
Question: Who bought Overture?  Expected answer form: X bought Overture
text: Overture's acquisition by Yahoo
hypothesized answer: Yahoo bought Overture
- Similar for IE: X buy Y
- Similar for semantic IR: t = Overture was bought
- Summarization (multi-document): identify redundant info
- MT evaluation (and recent proposals for MT?)
7KRAQ'05 Workshop - KNOWLEDGE and REASONING for ANSWERING QUESTIONS (IJCAI-05)
- CFP:
- Reasoning aspects: information fusion, search criteria expansion models, summarization and intensional answers, reasoning under uncertainty or with incomplete knowledge, ...
- Knowledge representation and integration: levels of knowledge involved (e.g. ontologies, domain knowledge), knowledge extraction models and techniques to optimize response accuracy, coherence and integration.
8Inference for Textual Question Answering Workshop (AAAI-05)
- CFP:
- abductions, default reasoning, inference with epistemic logic or description logic
- inference methods for QA need to be robust, cover all ambiguities of language
- available knowledge sources that can be used for inference
- But there are similar needs for other applications: can we address a uniform empirical task?
9Applied Textual Entailment: Abstract Semantic Variability Inference
Hypothesis (h): John Wayne was born in Iowa
- QA: Where was John Wayne born?
- Answer: Iowa
inference
Text (t): The birthplace of John Wayne is in Iowa
10The Generic Entailment Task
Hypothesis (h): John Wayne was born in Iowa
- Given the text t, can we infer that h is (most likely) true?
inference
Text (t): The birthplace of John Wayne is in Iowa
11Classical Entailment Definition
- Chierchia and McConnell-Ginet (2001): a text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true
- Strict entailment - doesn't account for some uncertainty allowed in applications
12Almost certain Entailments
- t: The technological triumph known as GPS was incubated in the mind of Ivan Getting.
- h: Ivan Getting invented the GPS.
- t: According to the Encyclopedia Britannica, Indonesia is the largest archipelagic nation in the world, consisting of 13,670 islands.
- h: 13,670 islands make up Indonesia.
13Textual Entailment as Human Reading Comprehension
- From a children's English learning book (Sela and Greenberg)
- Reference Text: The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida.
- Hypothesis (True/False?): The Bermuda Triangle is near the United States
???
14Reading Comprehension QA
- By the Canadian Broadcasting Corporation
- T: The school has turned its one-time metal shop - lost to budget cuts almost two years ago - into a money-making professional fitness club.
- Q: When did the metal shop close?
- A: Almost two years ago
15Recognizing Textual Entailment (RTE) Challenge - PASCAL NoE Challenge, 2004-5
Ido Dagan, Oren Glickman (Bar-Ilan University, Israel); Bernardo Magnini (ITC-irst, Trento, Italy)
16Generic Dataset by Application Use
- QA
- IE
- Similar for semantic IR: Overture was acquired by Yahoo
- Comparable documents (summarization)
- MT evaluation
- Reading comprehension
- Paraphrase acquisition
17Some Examples

#  | TEXT | HYPOTHESIS | TASK | ENTAILMENT
1  | iTunes software has seen lower sales in Europe. | Strong sales for iTunes in Europe. | IR | False
2  | Cavern Club sessions paid the Beatles £15 evenings and £5 lunchtime. | The Beatles perform at Cavern Club at lunchtime. | IR | True
3  | ... a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others. | Cardinal Juan Jesus Posadas Ocampo died in 1993. | QA | True

- 567 development examples, 800 test examples
18Dataset Characteristics
- Examples selected and annotated manually
- Using automatic systems where available
- Balanced True/False split
- True = certain or highly probable entailment
- Filtering controversial examples
- Example distribution?
- Mode: explorative rather than competitive
19Arthur Bernstein Competition
- "Competition, even a piano competition, is legitimate as long as it is just an anecdotal side effect of the musical culture scene, and doesn't threaten to overtake the center stage"
- Haaretz newspaper, Culture Section, April 1st, 2005
20Submissions
- 17 participating groups
- 26 system submissions
- Microsoft Research: manual analysis of the dataset at the lexical-syntactic matching level
21Broad Range of System Types
- Knowledge sources and inferences
- Direct t-h matching
- Word overlap / Syntactic tree matching
- Lexical relations:
- WordNet, statistical (corpus-based)
- Theorem Provers / Logical inference
- Adding a fuzzy scoring mechanism
- Supervised / unsupervised learning methods
22(No Transcript)
23Accuracy
24Where are we?
25What's next: RTE-2
- Organizers:
- Bar-Ilan, CELCT (Trento), MITRE, MS-Research
- Main dataset: utilizing real systems' outputs
- QA, IE, IR, summarization
- Human performance dataset:
- Reading comprehension, human QA (planned)
- Schedule (RTE website):
- October: development set
- February: results submission (test set: January)
- April 10: PASCAL workshop in Venice!
- right after EACL
26Other Evaluation Modes
- Entailment subtask evaluations:
- Lexical, lexical-syntactic, alignment
- Seek mode:
- Input: h and a corpus
- Output: all entailing t's in the corpus
- Captures information-seeking needs nicely, but requires post-run annotation (like TREC)
- Contribution to specific applications
27Decomposition of Entailment Levels
Empirical Modeling of Meaning Equivalence and Entailment, ACL-05 Workshop
Roy Bar-Haim, Idan Szpektor, Oren Glickman - Bar-Ilan University
28Why?
- Entailment modeling is complex!!
- This was apparent at RTE-1
- How can we decompose it, for:
- Better analysis and sub-task modeling
- Piecewise evaluation
- Avoiding the "this is the performance of my complex system" methodology
29Combination of Inference Types
T: The oddest thing about the UAE is that only 500,000 of the 2 million people living in the country are UAE citizens.
H: The population of the United Arab Emirates is 2 million.
T → H?
30Combination of Inference Types
T: The oddest thing about the UAE is that only 500,000 of the 2 million people living in the country are UAE citizens.
  (co-reference)
The oddest thing about the UAE is that only 500,000 of the 2 million people living in the UAE are UAE citizens.
  (syntactic transformation)
2 million people live in UAE.
  (entailment paraphrasing)
The population of the UAE is 2 million.
  (lexical world knowledge)
H: The population of the United Arab Emirates is 2 million.
Diverse inference types, at different levels of representation
31Defining Intermediate Models
- Lexical
- Lexical-syntactic
32Lexical Model
- T and H are represented as bags of terms
- T →L H if:
- for each term u ∈ H there exists a term v ∈ T such that v →L u
- v →L u if:
- they share the same lemma and POS
- OR
- they are connected by a chain of lexical transformations
33Lexical Transformations
- Morphological derivations: acquisition → acquire, terrorist → terror
- Ontological relations: synonyms (buy → acquire), hypernyms (produce → make), meronyms (executive → company)
- Lexical world knowledge: Bill Gates → Microsoft's founder, kill → die
- We assume perfect word sense disambiguation
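Below is a minimal Python sketch of the lexical model just described. The transformation table is a hypothetical toy stand-in for the morphological, ontological and world-knowledge resources listed on this slide; a real implementation would also compare POS tags and handle sense disambiguation rather than assume it.

```python
# Sketch of the lexical entailment check (slides 32-33), not the actual RTE system.
# `rules` maps a term to the set of terms it lexically entails (one transformation step).

def term_entails(v: str, u: str, rules: dict) -> bool:
    """v entails u if they share the same lemma (POS check omitted here),
    or are connected by a chain of lexical transformations."""
    if v == u:
        return True
    seen, frontier = {v}, [v]
    while frontier:                          # follow transformation chains
        w = frontier.pop()
        for x in rules.get(w, ()):
            if x == u:
                return True
            if x not in seen:
                seen.add(x)
                frontier.append(x)
    return False

def lexical_entailment(t_terms: set, h_terms: set, rules: dict) -> bool:
    """T entails H at the lexical level if every term of H is entailed by some term of T."""
    return all(any(term_entails(v, u, rules) for v in t_terms) for u in h_terms)

# Toy illustration (hypothetical rules, lemmatized terms):
rules = {"soar": {"rise"}, "buy": {"acquire"}, "acquisition": {"acquire"}}
T = {"crude", "oil", "price", "soar", "record", "level"}
H = {"crude", "oil", "price", "rise"}
print(lexical_entailment(T, H, rules))       # True: soar ->L rise (synonym)
```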
34Lexical Entailment - Examples
Crude oil prices soared to record levels T
Crude oil prices rise H
T →L H ?
35Lexical Entailment - Examples
Crude oil prices soared to record levels T
Crude oil prices rise. H
36Lexical Entailment - Examples
Crude oil prices soared to record levels T
Crude oil prices rise H
Synonym
37Lexical Entailment - Examples
Crude oil prices soared to record levels T
Crude oil prices rise H
Synonym
T →L H ✓
38Lexical Entailment - Examples
A coyote was shot after biting girl in park T
A girl was shot in a park H
T →L H ?
39Lexical Entailment - Examples
A coyote was shot after biting girl in Vanier Park T
A girl was shot in a park H
T →L H ✓ (yet T does not entail H)
40Lexical-Syntactic Model
- T and H are represented by syntactic dependency relations
- T →LS H if the relations within H can be matched by the relations in T
- The coverage can be obtained through a sequence of lexical-syntactic transformations
41Lexical-Syntactic Transformations
- Lexical: synonyms, hypernyms, etc. (as before)
- Syntactic: active/passive, apposition - do not change lexical elements
- Entailment paraphrases (lexical-syntactic): X take in Y → Y join X; X is Y man by birth → X was born in Y - change both lexical elements and structure
- Co-reference: the country → UAE
- We assume perfect disambiguation and reference resolution
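A corresponding sketch for the lexical-syntactic level, again only illustrative: dependency parses are reduced to (head, relation, dependent) triples, and the transformation rules below are a hypothetical toy list rather than an acquired rule base.

```python
# Sketch of lexical-syntactic matching (slides 40-41): H is entailed if all of its
# dependency relations are covered by T's relations, possibly after transformations.

def expand(t_rels: set, rules: list) -> set:
    """Apply lexical-syntactic transformation rules (synonym substitution,
    active/passive, entailment paraphrases, ...) to T's relations until fixpoint."""
    covered = set(t_rels)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:               # each rule rewrites one triple into another
            if lhs in covered and rhs not in covered:
                covered.add(rhs)
                changed = True
    return covered

def lex_syn_entailment(t_rels: set, h_rels: set, rules: list) -> bool:
    return h_rels <= expand(t_rels, rules)

# Toy illustration: "Crude oil prices soared ..." vs. "Crude oil prices rise"
t_rels = {("soar", "subj", "price"), ("price", "nn", "oil"), ("oil", "nn", "crude")}
h_rels = {("rise", "subj", "price"), ("price", "nn", "oil"), ("oil", "nn", "crude")}
rules = [(("soar", "subj", "price"), ("rise", "subj", "price"))]   # soar ->L rise
print(lex_syn_entailment(t_rels, h_rels, rules))                   # True
```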
42Lexical-Syntactic Entailment - Examples
subj
Crude oil prices soared to record levels T
Crude oil prices rise H
subj
T →LS H ✓
43Lexical-Syntactic Entailment - Examples
subj
A Coyote was shot after biting girl in Vanier Park T
A girl was shot in a park H
subj
T →LS H ✗ (the subject relation in H is not matched in T)
44Beyond Lexical-Syntactic Models
The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5% T
The SPD was defeated by the opposition parties. H
45Empirical Analysis
46Annotation
- 240 T-H pairs from the RTE-1 dataset
- Each pair annotated for T →L H and T →LS H
- High annotator agreement (authors):

Entailment Model     Agreement   Kappa
Lexical              89.6%       0.78
Lexical-Syntactic    88.8%       0.73

- Kappa: substantial agreement
47Model evaluation results

Model               Recall   Precision   F1
Lexical             44%      59%         0.50
Lexical-Syntactic   50%      86%         0.63

- Low precision for the Lexical model
- Lexical match fails to predict entailment
- High precision for the Lexical-Syntactic model
- Checking syntactic relations is crucial
- Medium recall for both levels
- Higher levels of inference are missing
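As a quick check on the table, the F1 values are simply the harmonic mean of the reported precision and recall:

```latex
F_1 = \frac{2PR}{P+R}, \qquad
F_1^{\mathrm{Lex}} = \frac{2 \cdot 0.59 \cdot 0.44}{0.59 + 0.44} \approx 0.50, \qquad
F_1^{\mathrm{LexSyn}} = \frac{2 \cdot 0.86 \cdot 0.50}{0.86 + 0.50} \approx 0.63
```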
48Contribution of individual components - RTE-1 positive examples

Inference type                 f     %    ΔR
Lexical:
  Synonym                     19    16    14
  Morphological               16    14    10
  Lexical world knowledge     12    10     8
  Hypernym                     7     6     4
  Meronym                      1     1     1
Lex-Syn:
  Entailment paraphrases      37    31    26
  Syntactic transformations   22    19    17
  Co-reference                10     8     5
49Summary (1)
- Annotating and analysing entailment components:
- Guides research on entailment
- Opens new research problems and redirects old ones
50Summary (2)
- Allows better evaluation of systems
- Performance of individual components
- Future work: expand the analysis to additional levels of representation and inference
- Identify the exciting semantic phenomena
51A Different Perspective on Semantic Inference
52Text Mapping vs. Interpretation
- Focus on the entailment relation as a (directed) mapping between language expressions
- Identify the contextual constraints for mappings
- Vs. interpreting language into meaning representations (explicitly stipulated senses, logical form, etc.)
- Interpretation can still be a means, rather than the goal
- How far (and how fast) can we get?
- Cf. MT: direct, transfer, interlingua
53Making sense of (implicit) senses
- What is the RIGHT set of senses?
- Any concrete set is problematic/subjective
- but WSD forces you to choose one
- A lexical entailment perspective:
- Instead of identifying an explicitly stipulated sense of a word occurrence,
- identify whether a word occurrence (i.e. its implicit sense) entails another word occurrence, in context
54That's what applications need
- Lexical matching: recognize sense equivalence
  Q: announcement of new models of chairs
  T1: IKEA announced a new comfort chair
  T2: MIT announced a new CS chair position
- Lexical expansion: recognize sense entailment
  Q: announcement of new models of furniture
  T1: IKEA announced a new comfort chair
  T2: MIT announced a new CS chair position
55Bottom Line
- Address semantic inference as text mapping, rather than interpretation
- From an application's perspective, interpretation may be a means, not the goal
- We shouldn't create artificial problems, which might be harder than those we need to solve
56Probabilistic Framework for Textual Entailment
Oren Glickman, Ido Dagan, Moshe Koppel and Jacob Goldberger, Bar-Ilan University
ACL-05 Workshop, AAAI-05
57Motivation
- Approach entailment uncertainty with principled probabilistic models
- Following the success of statistical MT, parsing, language modeling, etc.
- Integrating inferences and knowledge sources
- Vs. ad-hoc scoring
- Need to define a concrete probability space
- Generative model
58Notation
- t -- a text (t ∈ T)
- h -- a hypothesis (h ∈ H)
- propositional statements which can be assigned a truth value
- w: H → {true, false} -- a possible world
- a truth assignment for every hypothesis
59A Generative Model
- We assume a probabilistic generative model:
- a generation event of ⟨t, w⟩ - a text along with a (hidden) possible world
- based on a joint probability distribution

t: John was born in France
Hidden possible world (w):
  John speaks French → 1
  John was born in Paris → 1
  John likes foie gras → 0
  John is married to Alice → 1
60Probabilities
- For a given text t and hypothesis h, we consider the following probabilities:
- P(Trh = 1)
- the probability that h is assigned a truth value of 1 in a generated ⟨t, w⟩ pair
- P(Trh = 1 | t)
- the probability that h is assigned a truth value of 1 given that the corresponding text is t
61Probabilistic Textual Entailment
- Definition:
- t probabilistically entails h if
- P(Trh = 1 | t) > P(Trh = 1)
- t increases the likelihood of h being true
- Positive PMI: t provides information on h's truth
- P(Trh = 1 | t) is the entailment confidence
- The relevant entailment score for applications
- In practice, high confidence is required
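The definition can be restated in one line; its reading as positive pointwise mutual information between the generation of t and the event Trh = 1 follows directly from the definition of PMI:

```latex
t \;\text{probabilistically entails}\; h
\;\;\Longleftrightarrow\;\;
P(\mathit{Tr}_h = 1 \mid t) > P(\mathit{Tr}_h = 1)
\;\;\Longleftrightarrow\;\;
\log \frac{P(\mathit{Tr}_h = 1 \mid t)}{P(\mathit{Tr}_h = 1)} > 0
```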
62Setting Properties (1)
- Logical vs. Textual Entailment:
- Logical entailment: proposition → proposition
- Textual entailment: text → text
- Conditioning on the generation of texts rather than on propositional values
- David's father was born in Italy →? David was born in Italy
- Possible ambiguities of the texts are taken into account
- Play baseball with a bat →? play baseball with an animal
63Setting Properties (2)
- We do not distinguish between inferences that are based on:
- language semantics, e.g. murdering → killing
- vs. domain or world knowledge,
- e.g. live in Paris → live in France
- The setting accounts for all causes of uncertainty
64Setting Properties (3)
- For a given text t and hypothesis h:
- Σh P(Trh = 1 | t) ≠ 1
- but rather:
- P(Trh = 1 | t) + P(Trh = 0 | t) = 1
- Vs. generative language models (cf. speech, MT, LM for IR)
65Having a probability space
- we can now define concrete probabilistic models
for various entailment phenomena
66Initial Lexical Models
- Alignment-based (ACL-05 Workshop):
- the probability that a term in h is entailed by a particular term in t
- Bayesian classification (AAAI-05):
- the probability that a term in h is entailed by (fits in) the entire text of t
- an unsupervised text categorization setting (with EM): each term is a category
- Demonstrate directions for probabilistic modeling and unsupervised estimation
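A hedged sketch in the spirit of the alignment-based lexical model mentioned above: the confidence P(Trh = 1 | t) is approximated by aligning each term of h to the t term that most strongly entails it. The per-pair probabilities here are a hypothetical lookup table; in the actual work they would be estimated from corpus statistics in an unsupervised fashion.

```python
# Illustrative alignment-based confidence estimate (slide 66), not the published
# model's exact estimation procedure.
from math import prod

def entailment_confidence(t_terms, h_terms, p_entail):
    """P(Tr_h = 1 | t) ~= product over u in h of max over v in t of p_entail[(v, u)]."""
    return prod(
        max(p_entail.get((v, u), 0.0) for v in t_terms)
        for u in h_terms
    )

# Toy illustration with made-up probabilities:
p = {("soar", "rise"): 0.8, ("crude", "crude"): 1.0,
     ("oil", "oil"): 1.0, ("price", "price"): 1.0}
t = ["crude", "oil", "price", "soar", "record", "level"]
h = ["crude", "oil", "price", "rise"]
print(entailment_confidence(t, h, p))   # 0.8
```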
67Additional Work: Acquiring Entailment Relations
- Lexical (Geffet and Dagan, 2004/2005):
- A clear goal for distributional similarity
- Obtain characteristic features via bootstrapping
- Test characteristic feature inclusion (vs. overlap)
- Lexical-Syntactic: TEASE (Szpektor et al. 2004)
- Deduce entailment rules from joint anchor sets
- Initial prospects for unsupervised IE
- Next: obtain probabilities for these entailment rules
68Conclusions: Textual Entailment
- Provides a framework for semantic inference
- Application-independent abstraction
- Text mapping rather than interpretation
- Raises interesting problems to work on
- Amenable to empirical evaluation and decomposition
- May be modeled in principled probabilistic terms
Thank you!
69Textual Entailment References
- Workshops:
- PASCAL Challenges Workshop for Recognizing Textual Entailment, 2005. http://www.cs.biu.ac.il/~glikmao/rte05/index.html (see the 2nd RTE Challenge at http://www.cs.biu.ac.il/~barhair/RTE2/)
- ACL 2005 Workshop on Empirical Modeling of Semantic Equivalence and Entailment, 2005. http://acl.ldc.upenn.edu/W/W05/W05-1200
- Papers from recent conferences and workshops:
- J. Bos and K. Markert. 2005. Recognising Textual Entailment with Logical Inference. Proceedings of EMNLP 2005.
- R. Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. 2005. An Inference Model for Semantic Entailment in Natural Language. Twentieth National Conference on Artificial Intelligence (AAAI-05).
- R. Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. 2005. Knowledge Representation for Semantic Entailment and Question-Answering. IJCAI-05 Workshop on Knowledge and Reasoning for Answering Questions.
- C. Corley, A. Csomai and R. Mihalcea. Text Semantic Similarity, with Applications. RANLP 2005.
- I. Dagan and O. Glickman. 2004. Probabilistic Textual Entailment: Generic Applied Modeling of Language Variability. PASCAL Workshop on Learning Methods for Text Understanding and Mining, Grenoble.
70Textual Entailment References (2)
- M. Geffet and I. Dagan. 2004. Feature Vector Quality and Distributional Similarity. Proceedings of the 20th International Conference on Computational Linguistics (COLING).
- M. Geffet and I. Dagan. 2005. The Distributional Inclusion Hypotheses and Lexical Entailment. ACL 2005, Michigan, USA.
- O. Glickman, I. Dagan and M. Koppel. 2005. A Probabilistic Classification Approach for Lexical Textual Entailment. Twentieth National Conference on Artificial Intelligence (AAAI-05).
- A. Haghighi, A. Y. Ng, and C. D. Manning. 2005. Robust Textual Inference via Graph Matching. HLT-EMNLP 2005.
- M. Kouylekov and B. Magnini. 2005. Tree Edit Distance for Textual Entailment. RANLP 2005.
- R. Raina, A. Y. Ng, and C. Manning. 2005. Robust Textual Inference via Learning and Abductive Reasoning. Twentieth National Conference on Artificial Intelligence (AAAI-05).
- V. Rus, A. Graesser and K. Desai. 2005. Lexico-Syntactic Subsumption for Textual Entailment. RANLP 2005.
- M. Tatu and D. Moldovan. 2005. A Semantic Approach to Recognizing Textual Entailment. HLT-EMNLP 2005.
- We would be glad to receive more references on textual entailment. Please send them to barhair@cs.biu.ac.il