Title: Investigating a Generic Paraphrase-based Approach for Relation Extraction
- Lorenza Romano(1), Milen Kouylekov(1), Idan Szpektor(2), Ido Dagan(2) and Alberto Lavelli(1)
- (1) ITC-irst, Italy
- (2) Bar Ilan University, Israel
- Published in EACL 2006
Variability of Semantic Expression
sunburns are prevented when using sunscreen
sunscreen guards against sunburns
sunscreen prevents sunburns
sunscreen cuts risk of sunburns
sunscreen for prevention of sunburns
Variability Recognition: A Major Inference in Applications
- Question Answering (QA)
- Information Extraction (IE)
- Information Retrieval (IR)
- Multi-Document Summarization (MDS)
Textual Entailment Model for Semantic Variability
- Directional relation between two text fragments: Text (t) and Hypothesis (h)
- Operational (applied) definition
  - As in NLP applications
  - Assuming common background knowledge
Entailment/Paraphrase Rules
- Major obstacle for entailment engines: lack of background and linguistic knowledge resources
- Entailment rule: a directional relation between two parse sub-trees with variables, where the first one entails the second
  - X prevent Y → X lower the risk of Y
  - X prevent Y ↔ X guard against Y (paraphrases)
- Need a large knowledge base of entailment rules!
- Several approaches exist for the automatic acquisition of paraphrases/entailment rules
  - Difficult to evaluate
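The following toy Python sketch makes the directional nature of entailment rules concrete. It simplifies templates to plain strings with slot variables X and Y, whereas the rules above relate parse sub-trees. The rule set is hypothetical and built only from the examples on this slide. Entailment is modeled as reachability through chained rules, and a paraphrase is modeled as a pair of templates that entail each other.

```python
from collections import deque

# Hypothetical rule base of (entailing template, entailed template) pairs.
# Templates are simplified to strings; real rules relate parse sub-trees.
RULES = {
    ("X prevent Y", "X lower the risk of Y"),
    ("X prevent Y", "X guard against Y"),
    ("X guard against Y", "X prevent Y"),
}

def entails(rules, text_template, hyp_template):
    """True if hyp_template is reachable from text_template by chaining rules."""
    seen = {text_template}
    queue = deque([text_template])
    while queue:
        current = queue.popleft()
        if current == hyp_template:
            return True
        for lhs, rhs in rules:
            if lhs == current and rhs not in seen:
                seen.add(rhs)
                queue.append(rhs)
    return False

def is_paraphrase(rules, a, b):
    """Paraphrases entail each other in both directions."""
    return entails(rules, a, b) and entails(rules, b, a)

if __name__ == "__main__":
    print(entails(RULES, "X prevent Y", "X lower the risk of Y"))  # True
    print(entails(RULES, "X lower the risk of Y", "X prevent Y"))  # False
    print(is_paraphrase(RULES, "X prevent Y", "X guard against Y"))  # True
```

Because rules chain, "X guard against Y" also entails "X lower the risk of Y" here, even though no single rule states it.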
Relation Extraction
- Subfield of Information Extraction
- Identify different ways of expressing a target relation
  - Examples: Management Succession, Birth-Death, Mergers and Acquisitions, Protein Interaction
- Traditionally performed in a supervised manner
  - Requires dozens to hundreds of examples per relation
  - Examples should cover broad semantic variability
  - Costly: is it even feasible?
- Little work on unsupervised approaches
- May be viewed as an entailment problem
  - Sunscreen guards against sunburns → Sunscreen prevents sunburns
Our Goals
- Entailment approach for relation extraction
- Unsupervised relation extraction system
- Evaluation framework for entailment rule acquisition
Proposed Approach
- Input template: X prevent Y
- Entailment rule acquisition (TEASE) → templates: X prevention for Y, X treat Y, X reduce Y
- Syntactic matcher (using transformation rules) → relation instances: <sunscreen, sunburns>
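The pipeline can be sketched end to end in a few lines of Python. This is a deliberately naive stand-in, and several parts are assumptions:

- The acquired template list is hard-coded in place of TEASE output.
- A regular-expression matcher over surface strings replaces the syntactic matcher.
- X and Y each match a single word.
- A template word also matches inflected forms such as "prevents".

The function names are illustrative only, not the system's.

```python
import re

def template_to_regex(template):
    """Compile a template such as 'X reduce Y' into a crude surface pattern."""
    parts = []
    for tok in template.split():
        if tok == "X":
            parts.append(r"(?P<X>\w+)")
        elif tok == "Y":
            parts.append(r"(?P<Y>\w+)")
        else:
            # Allow a suffix so that 'reduce' also matches 'reduces'.
            parts.append(re.escape(tok) + r"\w*")
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def extract_relations(input_template, acquired_templates, sentences):
    """Return the set of (X, Y) instances matched by any template."""
    patterns = [template_to_regex(t) for t in [input_template, *acquired_templates]]
    instances = set()
    for sentence in sentences:
        for pattern in patterns:
            for m in pattern.finditer(sentence):
                instances.add((m.group("X").lower(), m.group("Y").lower()))
    return instances

if __name__ == "__main__":
    acquired = ["X prevention for Y", "X treat Y", "X reduce Y"]  # stand-in for TEASE output
    sentences = ["Sunscreen reduces sunburns.", "Aspirin prevents headaches."]
    print(sorted(extract_relations("X prevent Y", acquired, sentences)))
    # [('aspirin', 'headaches'), ('sunscreen', 'sunburns')]
```

A surface matcher like this breaks on syntactic variability. On "Sunscreen, which prevents moles and sunburns, should be used regularly." it returns <which, moles> instead of <sunscreen, sunburns>.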
TEASE Algorithm
- Unsupervised acquisition of entailment rules (Szpektor et al., 2004)
- Input template: X ←subj– prevent –obj→ Y
- TEASE retrieves a sample corpus for the input template from the Web:
  - Aspirin prevents Heart Attack
  - Universal precaution prevents HIV
  - Safety devices prevent fatal injuries
- Anchor Set Extraction (ASE) → anchor sets: {Aspirin, Heart Attack}, {Safety devices, fatal injuries}
- A sample corpus is then retrieved for the anchor sets:
  - Aspirin protects against Heart Attack
  - Safety devices reduce fatal injuries
- Template Extraction (TE) → templates: X protect against Y, X reduce Y
- Iterate
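The following toy Python sketch illustrates the two TEASE phases on the example sentences. It is only a caricature and rests on strong assumptions:

- Sentences are assumed to have the shape "<X> <verb> <Y>".
- Anchors are read off by string matching rather than from dependency parses.
- Templates are not lemmatized, so "protects against" stays inflected.
- Web retrieval is replaced by in-memory lists.
- TEASE's actual anchor-set and template ranking are omitted, as is the iterate step.

```python
import re
from collections import Counter

def anchor_set_extraction(verb, corpus):
    """Toy ASE: sentences of the form '<X> <verb>[s] <Y>' yield (X, Y) anchor sets."""
    pattern = re.compile(rf"^(?P<X>.+?)\s+{re.escape(verb)}s?\s+(?P<Y>.+?)\.?$", re.IGNORECASE)
    anchors = set()
    for sentence in corpus:
        m = pattern.match(sentence.strip())
        if m:
            anchors.add((m.group("X").lower(), m.group("Y").lower()))
    return anchors

def template_extraction(anchors, corpus, min_support=1):
    """Toy TE: words linking X and Y in a sentence containing an anchor set form a template."""
    counts = Counter()
    for sentence in corpus:
        s = sentence.strip().rstrip(".").lower()
        for x, y in anchors:
            if s.startswith(x + " ") and s.endswith(" " + y):
                middle = s[len(x):len(s) - len(y)].strip()
                if middle:
                    counts[f"X {middle} Y"] += 1
    return {t for t, c in counts.items() if c >= min_support}

if __name__ == "__main__":
    input_corpus = ["Aspirin prevents Heart Attack.",
                    "Universal precaution prevents HIV.",
                    "Safety devices prevent fatal injuries."]
    anchor_corpus = ["Aspirin protects against Heart Attack.",
                     "Safety devices reduce fatal injuries."]
    anchors = anchor_set_extraction("prevent", input_corpus)
    print(sorted(anchors))
    # [('aspirin', 'heart attack'), ('safety devices', 'fatal injuries'), ('universal precaution', 'hiv')]
    print(sorted(template_extraction(anchors, anchor_corpus)))
    # ['X protects against Y', 'X reduce Y']
```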
Example Output
Correct templates learned by TEASE for X prevent Y
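Syntactic Matching
- Template: X prevent Y
- Sentence: "Sunscreen, which prevents moles and sunburns, should be used regularly."
- A syntactic matcher is needed: here X is reached through a relative pronoun and Y through a coordination
  - It should handle syntactic variability
  - Our approach: transformation rules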
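Syntactic Matcher: Example for X prevent Y
- Sentence: "Sunscreen, which prevents moles and sunburns, ..."
- Figure: the template tree (prevent, with subj → X and obj → Y) aligned with the sentence's dependency tree. In that tree, "prevents" takes "which" as subj and "moles" as obj, the relative clause modifies "sunscreen" (mod), and "moles" is conjoined with "sunburns" via "and" (conj).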
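A transformation-rule matcher of this kind can be sketched in Python over a hand-built list of dependency edges for "Sunscreen, which prevents moles and sunburns, ...". Everything here is an assumption made for illustration:

- The edge labels (subj, obj, mod, conj) follow the figure.
- Words double as node identifiers.
- The two transformation rules are hypothetical simplifications. One resolves a relative pronoun to the noun its clause modifies; the other distributes an argument over a conjunction.
- Lemmatization is a crude suffix strip.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    head: str
    label: str
    dep: str

def resolve_relative_pronouns(edges):
    """If a clause attaches to noun N via 'mod', replace its 'which' subject with N."""
    antecedent = {e.dep: e.head for e in edges if e.label == "mod"}
    resolved = set()
    for e in edges:
        if e.label == "subj" and e.dep == "which" and e.head in antecedent:
            resolved.add(Edge(e.head, "subj", antecedent[e.head]))
        else:
            resolved.add(e)
    return resolved

def distribute_conjunctions(edges):
    """If V -subj/obj-> A and A -conj-> B, also add V -subj/obj-> B."""
    result = set(edges)
    conjuncts = [(e.head, e.dep) for e in edges if e.label == "conj"]
    for e in edges:
        if e.label in ("subj", "obj"):
            for first, second in conjuncts:
                if e.dep == first:
                    result.add(Edge(e.head, e.label, second))
    return result

def lemma(word):
    """Crude lemmatizer stand-in: strip a final 's'."""
    return word[:-1] if word.endswith("s") else word

def match_template(verb, edges):
    """Match X <-subj- verb -obj-> Y and return the (X, Y) pairs."""
    subjects = [e for e in edges if e.label == "subj" and lemma(e.head) == verb]
    objects = [e for e in edges if e.label == "obj" and lemma(e.head) == verb]
    return {(s.dep, o.dep) for s in subjects for o in objects if s.head == o.head}

if __name__ == "__main__":
    sentence = {
        Edge("sunscreen", "mod", "prevents"),
        Edge("prevents", "subj", "which"),
        Edge("prevents", "obj", "moles"),
        Edge("moles", "conj", "sunburns"),
    }
    print(match_template("prevent", sentence))  # {('which', 'moles')}
    transformed = distribute_conjunctions(resolve_relative_pronouns(sentence))
    print(sorted(match_template("prevent", transformed)))
    # [('sunscreen', 'moles'), ('sunscreen', 'sunburns')]
```

Applying the transformations before matching, rather than enlarging the template set, keeps each template simple. Each phenomenon is handled once for all templates.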
Syntactic Variability Phenomena
Template: X activate Y
Experiment Dataset
- Dataset from (Bunescu 2005)
- Task: recognizing interactions between annotated protein pairs
  - 200 Medline abstracts
  - Gold-standard dataset of protein pairs
- Input template: X interact with Y
- Abstracts randomly split into:
  - 60% development set
  - 40% test set
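The gold standard is a set of interacting protein pairs, so system output can be scored by set overlap. The following is a minimal Python sketch with these assumptions:

- Predictions and gold pairs arrive as flat collections of protein-name pairs.
- Interaction is symmetric, so <A, B> and <B, A> count as the same pair.
- The function names and the toy data are hypothetical.

```python
def normalize(pairs):
    """Treat interactions as symmetric: <A, B> and <B, A> are the same pair."""
    return {frozenset(p) for p in pairs}

def score(predicted, gold):
    """Precision, recall and F1 of predicted pairs against gold-standard pairs."""
    pred, ref = normalize(predicted), normalize(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    gold = [("A", "B"), ("E", "F")]       # toy pairs, not from the dataset
    predicted = [("B", "A"), ("C", "D")]
    print(score(predicted, gold))  # (0.5, 0.5, 0.5)
```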
Manual Dataset Analysis for Development Set
Motivations:
- Assess the recall potential of our approach, based on lexical-syntactic templates
- Assess the recall potential of the TEASE acquisition algorithm
- Measure the distribution of syntactic phenomena
Manual Dataset Analysis for Development Set
Example:
- Sentence: "Cdi1, a human G1 and S phase protein phosphatase that associates with Cdk2"
- Manual annotation:
  - Template: X associate with Y
  - Syntactic phenomena: apposition, relative clause
Manual Analysis - Results
93% of the interacting protein pairs can be identified using lexical-syntactic templates
Manual Analysis - Results (2)
Number of templates vs. recall (within the 93% of identifiable pairs)
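A curve like this can be computed by ranking templates by how many pairs they identify and accumulating the coverage. The following is a minimal Python sketch. It assumes, hypothetically, that each pair is attributed to exactly one template, so counts can simply be summed. It also takes `total_pairs` to be the number of identifiable pairs. The counts are invented for illustration.

```python
def recall_curve(pairs_per_template, total_pairs):
    """Cumulative recall when using the k most productive templates, k = 1..n."""
    counts = sorted(pairs_per_template.values(), reverse=True)
    curve, covered = [], 0
    for count in counts:
        covered += count
        curve.append(covered / total_pairs)
    return curve

if __name__ == "__main__":
    toy_counts = {"X interact with Y": 5, "X bind Y": 3, "interaction of X with Y": 2}
    print(recall_curve(toy_counts, 10))  # [0.5, 0.8, 1.0]
```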
Manual Analysis - Results (3)
Occurrence percentage of each syntactic phenomenon
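Per-phenomenon occurrence percentages can be derived from the manual annotations with a counter. The following Python sketch assumes that each annotated pair carries a list of phenomena, as in the Cdi1 example with apposition and relative clause. A pair may exhibit several phenomena, so the percentages need not sum to 100. The annotations in the usage lines are invented.

```python
from collections import Counter

def phenomenon_percentages(annotations):
    """Percentage of annotated pairs in which each syntactic phenomenon occurs."""
    if not annotations:
        return {}
    counts = Counter()
    for phenomena in annotations:
        counts.update(set(phenomena))  # count each phenomenon once per pair
    n = len(annotations)
    return {p: 100.0 * c / n for p, c in counts.items()}

if __name__ == "__main__":
    toy = [["apposition", "relative clause"], ["passive"], ["relative clause"], []]
    print(sorted(phenomenon_percentages(toy).items()))
    # [('apposition', 25.0), ('passive', 25.0), ('relative clause', 50.0)]
```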
TEASE Output for X interact with Y
A sample of correct templates learned
TEASE Algorithm Potential Recall on Development Set
- Iterative: taking the top 5 ranked templates as input
- Morph: assuming a morphological derivation engine
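The "iterative" setting can be sketched as one extra round of acquisition seeded by the top-ranked output. Two things in this Python sketch are assumptions:

- `acquire` is a hypothetical stand-in for a TEASE run that returns templates in rank order.
- The table it looks up, and therefore the templates it returns, are invented.

The union keeps the first-round ranking and appends newly found templates after it.

```python
def iterative_acquisition(input_template, acquire, top_k=5):
    """Run acquisition once, then again on each of the top_k templates; return the union in order."""
    first_round = acquire(input_template)
    learned = list(first_round)
    for template in first_round[:top_k]:
        for new in acquire(template):
            if new != input_template and new not in learned:
                learned.append(new)
    return learned

if __name__ == "__main__":
    toy_output = {  # invented stand-in for TEASE results
        "X interact with Y": ["X bind Y", "X associate with Y"],
        "X bind Y": ["binding of X to Y", "X interact with Y"],
        "X associate with Y": ["association of X with Y"],
    }
    print(iterative_acquisition("X interact with Y", lambda t: toy_output.get(t, [])))
    # ['X bind Y', 'X associate with Y', 'binding of X to Y', 'association of X with Y']
```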
TEASE Potential Recall (2)
Coverage of the most frequent templates in the training set, with the input template only, with iteration, and with morphological derivation:
- binding of X to Y
- X bind Y
- association of X with Y
- interaction of X with Y
- X's association with Y
Results of Full System
- Problems:
  - Dependency parser errors
  - TEASE precision (incorrect templates)
  - Some syntactic variations we didn't cover
  - No morphological derivation engine
Vs. Supervised Approaches
Conclusions
- Modeling semantic variability: Textual Entailment framework
- Unsupervised, domain-independent RE approach
  - Potential coverage: 93% of relation instances
  - No additional training required for new relations
- Evaluation framework for entailment acquisition algorithms: RE-based TEASE evaluation
  - Potential coverage: 63% of relation instances
- Encouraging unsupervised performance
Future Work
- Improving the syntactic matcher
  - Acquiring and learning morphological and syntactic variations
- Improving TEASE
  - Recall: using the syntactic matcher during acquisition
  - Precision: directional probabilistic estimation
- Experimenting with many relations
- Fully automatic acquisition for new relations
  - No annotation!
Thank You! Questions?