Investigating a Generic Paraphrase-based Approach for Relation Extraction

1
Investigating a Generic Paraphrase-based Approach
for Relation Extraction
  • Lorenza Romano1, Milen Kouylekov1, Idan
    Szpektor2, Ido Dagan2 and Alberto Lavelli1
  • 1 ITC-irst, Italy
  • 2 Bar Ilan University, Israel
  • Published in EACL 2006

2
Variability of Semantic Expression
sunburns are prevented when using sunscreen
sunscreen guards against sunburns
sunscreen prevents sunburns
sunscreen cuts risk of sunburns
sunscreen for prevention of sunburns
3
Variability Recognition: Major Inference in
Applications
Question Answering (QA)
Information Extraction (IE)
Information Retrieval (IR)
Multi Document Summarization (MDS)
4
Textual Entailment Model for Semantic
Variability
  • Directional relation between two text fragments:
    Text (t) and Hypothesis (h)
  • Operational (applied) definition
  • As in NLP applications
  • Assuming common background knowledge

5
Entailment Paraphrase Rules
  • Major obstacle for entailment engines: lack of
    background and linguistic knowledge resources
  • Entailment Rule - directional relation between
    two parse sub-trees with variables, where the
    first one entails the second
  • X prevent Y → X lower the risk of Y
  • X prevent Y ↔ X guard against Y (paraphrases)
  • Need large knowledgebase of entailment rules!
  • Several approaches for automatic acquisition of
    paraphrases / entailment rules
  • Difficult to evaluate
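
To make the notion of a directional rule concrete, here is a minimal sketch in Python. It treats templates as plain lemmatized strings rather than the parse sub-trees the slide describes, and the names `EntailmentRule`, `instantiate` and `entailed` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntailmentRule:
    """Directional rule: an instance of `lhs` entails the same instance of `rhs`."""
    lhs: str  # e.g. "X prevent Y"
    rhs: str  # e.g. "X lower the risk of Y"


def instantiate(template: str, x: str, y: str) -> str:
    """Fill the X and Y variable slots of a (lemmatized) template string."""
    slots = {"X": x, "Y": y}
    return " ".join(slots.get(tok, tok) for tok in template.split())


def entailed(rules, template, x, y):
    """Apply every rule whose left-hand side equals `template` once (no chaining)."""
    return [instantiate(r.rhs, x, y) for r in rules if r.lhs == template]


RULES = [
    EntailmentRule("X prevent Y", "X lower the risk of Y"),  # one direction only
    EntailmentRule("X prevent Y", "X guard against Y"),      # paraphrase: stored in
    EntailmentRule("X guard against Y", "X prevent Y"),      # both directions
]

print(entailed(RULES, "X prevent Y", "sunscreen", "sunburns"))
```

Storing a paraphrase as two directional rules keeps a single rule type; a rule such as "X prevent Y → X lower the risk of Y" is simply stored once.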

6
Relation Extraction
  • Subfield of Information Extraction
  • Identify different ways of expressing a target
    relation
  • Examples: Management Succession, Birth - Death,
    Mergers and Acquisitions, Protein Interaction
  • Traditionally performed in a supervised manner
  • Requires dozens to hundreds of examples per relation
  • Examples should cover broad semantic variability
  • Costly - Feasible???
  • Little work on unsupervised approaches
  • May be viewed as an entailment problem
  • Sunscreen guards against sunburns → Sunscreen
    prevents sunburns

7
Our Goals
Entailment Approach for Relation Extraction
Unsupervised Relation Extraction System
Evaluation Framework for Entailment Rule
Acquisition
8
Proposed Approach
Input Template: X prevent Y
Entailment Rule Acquisition
TEASE
Templates: X prevention for Y, X treat Y, X reduce Y
Transformation Rules
Syntactic Matcher
Relation Instances: <sunscreen, sunburns>
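
The flow on this slide (templates in, relation instances out) can be sketched roughly as follows. This is an assumption-laden toy: a surface regular expression stands in for the syntactic matcher, the sentences are invented, and `template_to_regex` and `extract` are hypothetical helper names.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Turn 'X prevent Y' into a crude surface pattern.

    Toy stand-in for a syntactic matcher; assumes the template starts
    with the X slot and ends with the Y slot.
    """
    words = [re.escape(w) for w in template.split()[1:-1]]  # drop X and Y
    words[0] += r"(?:s|ed)?"  # tolerate a simple -s / -ed suffix on the first word
    body = r"\s+".join(words)
    return re.compile(rf"(?P<x>\w+)\s+{body}\s+(?P<y>\w+)", re.IGNORECASE)


def extract(templates, sentences):
    """Return the set of (X, Y) pairs matched by any template in any sentence."""
    pairs = set()
    for template in templates:
        pattern = template_to_regex(template)
        for sentence in sentences:
            for m in pattern.finditer(sentence):
                pairs.add((m.group("x").lower(), m.group("y").lower()))
    return pairs


# Input template plus two of the acquired templates listed on the slide.
templates = ["X prevent Y", "X treat Y", "X reduce Y"]
# Hypothetical sentences, not taken from the presentation.
sentences = ["Sunscreen prevents sunburns.", "Aspirin reduces fever."]

print(extract(templates, sentences))
```

A surface pattern like this misses relative clauses, passives and conjunctions, which is the gap a syntactic matcher is meant to close.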
9
TEASE Algorithm
(Szpektor et al., 2004): Unsupervised acquisition
of entailment rules
Input template: X ←subj- prevent -obj→ Y
WEB
TEASE
Sample corpus for input template:
  • Aspirin prevents Heart Attack
  • Universal precaution prevents HIV
  • Safety devices prevent fatal injuries
Anchor Set Extraction (ASE)
Anchor sets:
  • Aspirin, Heart Attack
  • Safety devices, fatal injuries
Template Extraction (TE)
Sample corpus for anchor sets:
  • Aspirin protects against Heart Attack
  • Safety devices reduce fatal injuries
Templates:
  • X protect against Y
  • X reduce Y
iterate
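
The two TEASE phases on this slide can be imitated on the slide's own example sentences. The sketch below is a deliberate simplification and not the TEASE implementation: it works on raw strings instead of parsed Web text, does no ranking, filtering or lemmatization, and the function names are hypothetical.

```python
import re


def extract_anchor_sets(verb, corpus):
    """ASE (toy): collect (X, Y) fillers of 'X <verb>(s) Y' from each sentence."""
    pattern = re.compile(
        rf"^(?P<x>.+?)\s+{re.escape(verb)}s?\s+(?P<y>.+)$", re.IGNORECASE
    )
    anchors = []
    for sentence in corpus:
        m = pattern.match(sentence)
        if m:
            anchors.append((m.group("x"), m.group("y")))
    return anchors


def extract_templates(anchor_sets, corpus):
    """TE (toy): the text between two co-occurring anchors becomes a template."""
    templates = set()
    for x, y in anchor_sets:
        pattern = re.compile(
            rf"{re.escape(x)}\s+(?P<mid>.+?)\s+{re.escape(y)}", re.IGNORECASE
        )
        for sentence in corpus:
            m = pattern.search(sentence)
            if m:
                templates.add(f"X {m.group('mid')} Y")  # surface form, not lemmatized
    return templates


input_corpus = [
    "Aspirin prevents Heart Attack",
    "Universal precaution prevents HIV",
    "Safety devices prevent fatal injuries",
]
anchor_corpus = [
    "Aspirin protects against Heart Attack",
    "Safety devices reduce fatal injuries",
]

anchors = extract_anchor_sets("prevent", input_corpus)
new_templates = extract_templates(anchors, anchor_corpus)
print(anchors)
print(new_templates)
```

Unlike the slide, this toy keeps all three anchor pairs and leaves "protects" uninflected; the "iterate" step would feed the new templates back in as inputs.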
10
Example Output
Correct templates learned by TEASE for X prevent
Y
11
Syntactic Matching
X prevent Y
Sunscreen, which prevents moles and sunburns,
should be used regularly.
  • Syntactic Matcher is needed
  • Should handle syntactic variability
  • Our approach: transformation rules

12
Syntactic Matcher - Example for X prevent Y
  • Sunscreen, which prevents moles and sunburns, ...

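[Figure: the template X ←subj- prevent -obj→ Y matched against the dependency tree of the sentence. Nodes: sunscreen, which, prevents, moles, and, sunburns. Edge labels: subj, obj, mod, conj.]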
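
One possible reading of the dependency tree on this slide can be encoded as labeled edges, with two transformation rules (relative clause and conjunction) applied during matching. The edge directions, the rule set and all function names below are assumptions made for illustration; the slides do not spell out the actual rules.

```python
RELATIVE_PRONOUNS = {"which", "that", "who"}

# (head, label, dependent) edges; an assumed reading of the slide's tree.
EDGES = [
    ("sunscreen", "mod", "prevent"),   # relative clause modifies "sunscreen"
    ("prevent", "subj", "which"),
    ("prevent", "obj", "moles"),
    ("moles", "conj", "sunburns"),
]


def children(edges, head, label):
    """Dependents attached to `head` by an edge with the given label."""
    return [dep for h, lab, dep in edges if h == head and lab == label]


def parents(edges, dep, label):
    """Heads attached to `dep` by an edge with the given label."""
    return [h for h, lab, d in edges if d == dep and lab == label]


def expand_conj(edges, node):
    """Conjunction rule: a node stands for itself plus its conjuncts."""
    return [node] + children(edges, node, "conj")


def match(edges, verb):
    """Match 'X <verb> Y' and return (X, Y) pairs after applying both rules."""
    pairs = []
    for subj in children(edges, verb, "subj"):
        # Relative-clause rule: a relative pronoun is replaced by the
        # noun that the clause modifies.
        xs = parents(edges, verb, "mod") if subj in RELATIVE_PRONOUNS else [subj]
        for obj in children(edges, verb, "obj"):
            for x in xs:
                for x_item in expand_conj(edges, x):
                    for y_item in expand_conj(edges, obj):
                        pairs.append((x_item, y_item))
    return pairs


print(match(EDGES, "prevent"))
```

Under these assumptions the single template X prevent Y yields both the sunscreen/moles and the sunscreen/sunburns pairs.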
13
Syntactic Variability Phenomena
Template: X activate Y
14
Experiment Dataset
  • (Bunescu 2005)
  • Task: recognizing interactions between annotated
    protein pairs
  • 200 Medline abstracts
  • Gold standard dataset of protein pairs
  • Input template: X interact with Y
  • Randomly split abstracts
  • 60% development set
  • 40% test set
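
A random 60/40 split of the 200 abstracts, reading the two figures on this slide as percentages, could be reproduced along these lines. The seed, the integer abstract IDs and the function name `split_abstracts` are assumptions; the slides do not say how the split was drawn.

```python
import random


def split_abstracts(ids, dev_fraction=0.6, seed=0):
    """Shuffle abstract IDs reproducibly and cut them into development and test sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # local RNG, leaves global state untouched
    cut = round(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


dev_ids, test_ids = split_abstracts(range(200))
print(len(dev_ids), len(test_ids))  # 120 80
```

Splitting by abstract rather than by sentence keeps all sentences of one abstract in the same set.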

15
Manual Dataset Analysis for Development Set
Motivations
  • Assess the recall potential of our approach,
    based on lexical syntactic templates
  • Assess the recall potential of the TEASE
    acquisition algorithm
  • Distribution of syntactic phenomena

16
Manual Dataset Analysis for Development Set
Example
  • Sentence: Cdi1, a human G1 and S phase
    protein phosphatase that associates with Cdk2
  • Manual annotation
  • Template: X associate with Y
  • Syntactic phenomenon: apposition, relative clause

17
Manual Analysis - Results
93% of the interacting protein pairs can be
identified using lexical-syntactic templates
18
Manual Analysis - Results (2)
Number of templates vs. recall (within the 93%)
19
Manual Analysis - Results (3)
Occurrence percentage of each syntactic
phenomenon
20
TEASE Output for X interact with Y
A sample of correct templates learned
21
TEASE Algorithm Potential Recall on Development
Set
  • Iterative - taking the top 5 ranked templates as
    input
  • Morph - assuming a morphological derivation engine
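
Potential recall here can be read as the share of gold-standard pairs whose manually annotated template is among the acquired templates. The sketch below uses that reading with invented toy data; the function name and the example lists are hypothetical and do not reproduce any figure from the presentation.

```python
def potential_recall(gold_pair_templates, acquired_templates):
    """Fraction of gold pairs whose annotated template was acquired."""
    if not gold_pair_templates:
        raise ValueError("need at least one gold pair")
    covered = sum(1 for t in gold_pair_templates if t in acquired_templates)
    return covered / len(gold_pair_templates)


# Invented toy data: one annotated template per gold protein pair.
gold = ["X interact with Y", "X bind Y", "X bind Y", "X associate with Y"]
acquired = {"X interact with Y", "X bind Y"}

print(potential_recall(gold, acquired))  # 0.75
```

Adding iterative runs or a morphological derivation engine would enlarge the acquired set, which is why the slide reports separate settings.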

22
TEASE Potential Recall (2)
Coverage for most frequent templates in the
training set - with input - with iterative
- with morphological
binding of X to Y
X bind Y
association of X with Y
interaction of X with Y
X's association with Y
23
Results of Full System
  • Problems
  • Dependency parser problems
  • TEASE precision (Incorrect Templates)
  • Some syntactic variations we didn't cover
  • No morphological derivation engine

24
Vs. Supervised Approaches
25
Conclusions
  • Modeling semantic variability - Textual
    Entailment Framework
  • Unsupervised, domain-independent RE approach
  • Potential Coverage - 93% of relation instances
  • No additional training required for new relations
  • Evaluation framework for entailment acquisition
    algorithms
  • RE-based TEASE evaluation
  • Potential Coverage - 63% of relation instances
  • Encouraging unsupervised performance

26
Future Work
  • Improving syntactic matcher
  • Acquiring and learning morphological and
    syntactic variations
  • Improving TEASE
  • Recall: using syntactic matcher during
    acquisition
  • Precision: directional probabilistic estimation
  • Experiment with many relations
  • Fully automatic acquisition for new relations
  • No annotation!

27
Thank You! Questions?