Title: RTE at Stanford
1 RTE at Stanford
- Rajat Raina, Aria Haghighi, Christopher Cox, Jenny Finkel, Jeff Michels, Kristina Toutanova, Bill MacCartney, Marie-Catherine de Marneffe, Christopher D. Manning and Andrew Y. Ng
- PASCAL Challenges Workshop
- April 12, 2005
2 Our approach
- Represent using syntactic dependencies
  - But also use semantic annotations.
  - Try to handle language variability.
- Perform semantic inference over this representation
  - Use linguistic knowledge sources.
- Compute a cost for inferring the hypothesis from the text.
  - Low cost → hypothesis is entailed.
3 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
4 Sentence processing
- Parse with a standard PCFG parser. Klein and Manning, 2003
- Al Qaeda / Aal-Qa'ieda
- Train on some extra sentences from recent news.
- Used a high-performing named entity recognizer (next slide).
- Force the parse tree to be consistent with certain NE tags (a sketch of one way to do this follows this slide).
- Example: American Ministry of Foreign Affairs announced that Russia called the United States...
  (S
    (NP (NNP American_Ministry_of_Foreign_Affairs))
    (VP (VBD announced)
      (...)))
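As one concrete illustration of keeping the parse consistent with NE tags, here is a minimal sketch (an assumption for illustration, not necessarily the system's actual mechanism) that merges each multi-word named entity into a single underscore-joined token before parsing, which is what produces nodes like American_Ministry_of_Foreign_Affairs above.

```python
def merge_entities(tokens, entity_spans):
    """tokens: list of words; entity_spans: list of (start, end) index pairs
    (end exclusive) covering recognized named entities."""
    spans = {start: end for start, end in entity_spans}
    merged, i = [], 0
    while i < len(tokens):
        if i in spans:
            # join the whole entity into one token so the parser cannot split it
            merged.append("_".join(tokens[i:spans[i]]))
            i = spans[i]
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(merge_entities(
    ["American", "Ministry", "of", "Foreign", "Affairs", "announced", "that", "Russia", "..."],
    [(0, 5)]))
# ['American_Ministry_of_Foreign_Affairs', 'announced', 'that', 'Russia', '...']
```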
5 Named Entity Recognizer
- Trained a robust conditional random field model. Finkel et al., 2003
- Interpretation of numeric quantity statements
- Example
  - T: Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries.
  - H: Kessler's team interviewed more than 60,000 adults in 14 countries. TRUE
- Annotate numerical values implied by (a normalization sketch follows this slide)
  - 6.2 bn, more than 60,000, around 10, ...
  - MONEY/DATE named entities
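A hedged sketch of the kind of numeric normalization described above; the comparator labels and the multiplier table are illustrative assumptions, not the system's actual rules.

```python
import re

MULTIPLIERS = {"bn": 1e9, "billion": 1e9, "million": 1e6, "thousand": 1e3}

def normalize_quantity(text):
    """Map a surface expression to a (comparator, value) pair."""
    comparator = "="
    if re.search(r"\bmore than\b|\bover\b", text):
        comparator = ">"
    elif re.search(r"\baround\b|\babout\b|\bapproximately\b", text):
        comparator = "~"
    match = re.search(r"([\d][\d,]*\.?\d*)\s*([a-zA-Z]+)?", text)
    if not match:
        return None
    value = float(match.group(1).replace(",", ""))
    unit = (match.group(2) or "").lower()
    return comparator, value * MULTIPLIERS.get(unit, 1)

print(normalize_quantity("6.2 bn"))            # ('=', 6.2e9), up to float rounding
print(normalize_quantity("more than 60,000"))  # ('>', 60000.0)
print(normalize_quantity("around 10"))         # ('~', 10.0)
```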
6 Parse tree post-processing
- Recognize collocations using WordNet (a merging sketch follows this slide)
- Example: Shrek 2 rang up $92 million.
  (S
    (NP (NNP Shrek) (CD 2))
    (VP (VBD rang_up)
      (NP
        (QP ($ $) (CD 92) (CD million))))
    (. .))
  → MONEY, 92000000
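A minimal sketch of collocation merging with WordNet (requires the NLTK WordNet data); the greedy pairwise strategy is an assumption for illustration, not the system's actual post-processor.

```python
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def merge_collocations(tokens):
    """Join adjacent tokens when WordNet knows the underscore-joined form,
    e.g. 'rang up' -> 'rang_up' because WordNet has the lemma ring_up."""
    merged, i = [], 0
    while i < len(tokens) - 1:
        candidate = "_".join([lemmatizer.lemmatize(tokens[i], pos="v"), tokens[i + 1]])
        if wn.synsets(candidate):
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    merged.extend(tokens[i:])
    return merged

print(merge_collocations(["Shrek", "2", "rang", "up", "$", "92", "million"]))
# ['Shrek', '2', 'rang_up', '$', '92', 'million']
```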
7 Parse tree → Dependencies
- Find syntactic dependencies
- Transform parse tree representations into typed syntactic dependencies, including a certain amount of collapsing and normalization
- Example: Bill's mother walked to the grocery store.
  - subj(walked, mother)
  - poss(mother, Bill)
  - to(walked, store)
  - nn(store, grocery)
- Dependencies can also be written as a logical formula (a conversion sketch follows this slide)
  - mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
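A hypothetical sketch of how typed dependencies can be rewritten as the logical formula shown above; the noun/verb split and the handling of poss, nn, and to here are simplifications assumed for illustration.

```python
def deps_to_formula(nouns, verb, deps):
    """nouns: noun tokens; verb: the main verb;
    deps: list of (relation, head, dependent) triples as on this slide."""
    const = {w: c for w, c in zip(nouns, "ABCDEFG")}      # one constant per noun
    atoms = [f"{w}({const[w]})" for w in nouns]           # mother(A), Bill(B), store(C)
    subj = obj = None
    for rel, head, dep in deps:
        if rel == "subj" and head == verb:
            subj = const[dep]
        elif rel == "to" and head == verb:                # collapsed prepositional argument
            obj = const[dep]
        elif rel == "poss":
            atoms.append(f"poss({const[dep]}, {const[head]})")
        elif rel == "nn":                                 # noun modifier as a unary predicate
            atoms.append(f"{dep}({const[head]})")
    atoms.append(f"{verb}(E, {subj}, {obj})")             # event predicate for the verb
    return " ∧ ".join(atoms)

print(deps_to_formula(
    ["mother", "Bill", "store"], "walked",
    [("subj", "walked", "mother"), ("poss", "mother", "Bill"),
     ("to", "walked", "store"), ("nn", "store", "grocery")]))
# mother(A) ∧ Bill(B) ∧ store(C) ∧ poss(B, A) ∧ grocery(C) ∧ walked(E, A, C)
```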
8 Representations
- Logical formula: mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
- [Figure: the same sentence as an annotated dependency graph, with node labels such as VBD, PERSON, and ARGM-LOC]
- Can make representation richer
- walked is a verb
- Bill is a PERSON (named entity).
- store is the location/destination of walked.
9 Annotations
- Parts-of-speech, named entities
  - Already computed.
- Semantic roles
- Example
  - T: C and D Technologies announced that it has closed the acquisition of Datel, Inc.
  - H1: C and D Technologies acquired Datel Inc. TRUE
  - H2: Datel acquired C and D Technologies. FALSE
- Use a state-of-the-art semantic role classifier to label verb arguments. Toutanova et al., 2005
10 More annotations
- Coreference
- Example
  - T: Since its formation in 1948, Israel ...
  - H: Israel was established in 1948. TRUE
- Use a conditional random field model for coreference detection.
- Note: appositive references were previously detected.
  - T: Bush, the President of the USA, went to Florida.
  - H: Bush is the President of the USA. TRUE
- Other annotations
  - Word stems (very useful)
  - Word senses (no performance gain in our system)
11 Event nouns
- Use a heuristic to find event nouns
- Augment the text representation using WordNet derivational links (noun → verb; a sketch follows this slide).
- Example
  - T: ... witnessed the murder of police commander ...
  - H: Police officer killed. TRUE
- Text logical formula:
  - murder(M) ∧ police_commander(P) ∧ of(M, P)
- Augment with:
  - murder(E, M, P)
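A minimal sketch of following WordNet derivational links from an event noun to its verb form (uses the NLTK WordNet interface; the heuristic for deciding which nouns are event nouns is not shown here).

```python
from nltk.corpus import wordnet as wn

def derived_verbs(noun):
    """Return verb lemmas reachable from the noun via derivational links,
    e.g. murder (noun) -> murder (verb)."""
    verbs = set()
    for lemma in wn.lemmas(noun, pos=wn.NOUN):
        for related in lemma.derivationally_related_forms():
            if related.synset().pos() == "v":
                verbs.add(related.name())
    return verbs

print(derived_verbs("murder"))  # expected to include 'murder' (the verb)
```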
12 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
13 Graph Matching Approach
- Why graph matching?
  - The dependency tree has a natural graphical interpretation.
  - Successful in other domains, e.g., lossy image matching.
- Input: hypothesis (H) and text (T) graphs
  - Toy example
  - Vertices are words and phrases
  - Edges are labeled dependencies
- Output: cost of matching H to T (next slide)
14 Graph Matching Idea
- Idea: align H to T so that vertices are similar and relations are preserved (as in machine translation).
- A matching M is a mapping from vertices of H to vertices of T.
- Thus, for each vertex v in H, M(v) is a vertex in T.
15 Graph Matching Costs
- The cost of a matching, MatchCost(M), measures the quality of the matching M.
- VertexCost(M): compare vertices in H with matched vertices in T.
- RelationCost(M): compare edges (relations) in H with corresponding edges (relations) in T.
- MatchCost(M) = (1 − β) · VertexCost(M) + β · RelationCost(M)
16 Graph Matching Costs
- VertexCost(M)
  - For each vertex v in H and vertex M(v) in T:
  - Do the vertex heads share the same stem and/or POS?
  - Is the T vertex head a hypernym of the H vertex head?
  - Are the vertex heads similar phrases? (next slide)
- RelationCost(M)
  - For each edge (v, v′) in H and edge (M(v), M(v′)) in T:
  - Are parent/child pairs in H parent/child in T?
  - Are parent/child pairs in H ancestor/descendant in T?
  - Do parent/child pairs in H share a common ancestor in T?
- (A cost-combination sketch follows this slide.)
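A minimal sketch of the cost combination from these slides, assuming per-vertex and per-edge costs in [0, 1] are simply averaged; the averaging is an assumption, only the (1 − β)/β mixture is stated on the slides.

```python
def vertex_cost(vertex_costs):
    """vertex_costs: one cost in [0, 1] per hypothesis vertex."""
    return sum(vertex_costs) / len(vertex_costs)

def relation_cost(edge_costs):
    """edge_costs: one cost in [0, 1] per hypothesis edge."""
    return sum(edge_costs) / len(edge_costs) if edge_costs else 0.0

def match_cost(vertex_costs, edge_costs, beta=0.45):
    # MatchCost(M) = (1 - beta) * VertexCost(M) + beta * RelationCost(M)
    return (1 - beta) * vertex_cost(vertex_costs) + beta * relation_cost(edge_costs)
```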
17 Digression: Phrase similarity
- Measures based on WordNet (Resnik/Lesk).
- Distributional similarity
  - Example: run and marathon are related.
  - Latent Semantic Analysis to discover words that are distributionally similar (i.e., have common neighbors).
- Used a web-search based measure (a sketch follows this slide)
  - Query google.com for all pages with
    - run
    - marathon
    - both run and marathon
- Learning paraphrases, similar to DIRT. Lin and Pantel, 2001
- World knowledge (labor intensive)
  - CEO ↔ Chief_Executive_Officer
  - Philippines → Filipino
  - Can add common facts: Paris is the capital of France, ...
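A hedged sketch of one plausible web-count similarity; the slides only say that google.com is queried for pages containing each word and for both words together, so the PMI-style formula and the `page_count` stand-in below are assumptions rather than the system's actual measure or a real API.

```python
import math

TOTAL_PAGES = 1e10  # assumed size of the indexed web

def web_similarity(word1, word2, page_count):
    """page_count(query) -> estimated number of pages matching the query."""
    n1 = page_count(word1)
    n2 = page_count(word2)
    n12 = page_count(f"{word1} {word2}")   # pages containing both words
    if min(n1, n2, n12) == 0:
        return 0.0
    # pointwise-mutual-information-style association between the two words
    return math.log((n12 * TOTAL_PAGES) / (n1 * n2))

# Example with made-up counts:
fake_counts = {"run": 5e8, "marathon": 2e7, "run marathon": 4e6}
print(web_similarity("run", "marathon", lambda q: fake_counts[q]))  # log(4) ≈ 1.39
```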
18 Graph Matching Costs
- VertexCost(M)
  - For each vertex v in H and vertex M(v) in T:
  - Do the vertex heads share the same stem and/or POS?
  - Is the T vertex head a hypernym of the H vertex head?
  - Are the vertex heads similar phrases?
- RelationCost(M)
  - For each edge (v, v′) in H and edge (M(v), M(v′)) in T:
  - Are parent/child pairs in H parent/child in T?
  - Are parent/child pairs in H ancestor/descendant in T?
  - Do parent/child pairs in H share a common ancestor in T?
19 Graph Matching Example
- VertexCost = (0.0 + 0.2 + 0.4) / 3 = 0.2
- RelationCost = 0 (graphs isomorphic)
- β = 0.45 (say)
- MatchCost = 0.55 × 0.2 + 0.45 × 0.0 = 0.11 (worked through in the snippet below)
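Plugging the slide's numbers back into the formula reproduces the arithmetic (up to floating-point rounding):

```python
beta = 0.45
vertex_cost = (0.0 + 0.2 + 0.4) / 3            # 0.2
relation_cost = 0.0                            # graphs isomorphic
match_cost = (1 - beta) * vertex_cost + beta * relation_cost
print(match_cost)                              # ≈ 0.11
```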
20 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
21 Abductive inference
- Idea
  - Represent the text and the hypothesis as logical formulae.
  - The hypothesis can be inferred from the text if and only if the hypothesis logical formula can be proved from the text logical formula.
- Toy example: prove the hypothesis formula from the text formula.
- Allow assumptions at various costs:
  - BMW(t) ⇒ car(t) at cost 2
  - bought(p, q, r) ⇒ purchased(p, q, r) at cost 1
22 Abductive assumptions
- Assign costs to all assumptions of the form p(...) ⇒ q(...) (as on the previous slide).
- Build an assumption cost model.
23 Abductive theorem proving
- Each assumption provides a potential proof step.
- Find the proof with the minimum total cost:
  - Uniform cost search (a search sketch follows this slide)
- If there is a low-cost proof, the hypothesis is entailed.
- Example
  - T: John(A) ∧ BMW(B) ∧ bought(E, A, B)
  - H: John(x) ∧ car(y) ∧ purchased(z, x, y)
- Here is a possible proof by resolution refutation (for the earlier costs); each line shows the accumulated cost:
  - 0: ¬John(x) ∨ ¬car(y) ∨ ¬purchased(z, x, y)  [given: negation of hypothesis]
  - 0: ¬car(y) ∨ ¬purchased(z, A, y)  [unify with John(A)]
  - 2: ¬purchased(z, A, B)  [unify with BMW(B), using the cost-2 assumption BMW ⇒ car]
  - 3: NULL  [unify with purchased(E, A, B), obtained from bought(E, A, B) via the cost-1 assumption]
- Proof cost: 3
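A much-simplified, hypothetical sketch of minimum-cost proof search: it matches hypothesis predicates against text predicates by uniform-cost search over which assumption (if any) to apply, and it ignores real unification of arguments, so it only illustrates the search idea rather than the full theorem prover.

```python
import heapq

def min_proof_cost(hyp_preds, text_preds, assumption_costs):
    """assumption_costs: {(text_pred, hyp_pred): cost} for assumptions of the
    form text_pred(...) => hyp_pred(...)."""
    frontier = [(0.0, 0)]            # (cost so far, next hypothesis predicate index)
    best = {}
    while frontier:
        cost, i = heapq.heappop(frontier)
        if i == len(hyp_preds):
            return cost              # every hypothesis predicate is accounted for
        if best.get(i, float("inf")) <= cost:
            continue
        best[i] = cost
        for t in text_preds:
            if t == hyp_preds[i]:                        # exact match: free step
                heapq.heappush(frontier, (cost, i + 1))
            elif (t, hyp_preds[i]) in assumption_costs:  # assumed step: pay its cost
                heapq.heappush(frontier,
                               (cost + assumption_costs[(t, hyp_preds[i])], i + 1))
    return float("inf")

print(min_proof_cost(
    ["John", "car", "purchased"], ["John", "BMW", "bought"],
    {("BMW", "car"): 2, ("bought", "purchased"): 1}))    # 3, matching the proof above
```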
24 Abductive theorem proving
- Can automatically learn good assumption costs.
- Start from a labeled dataset (e.g., the PASCAL development set).
- Intuition: find assumptions that are used in the proofs for TRUE examples, and lower their costs (by framing a log-linear model). Iterate.
- Details: Raina et al., in submission
25 Some interesting features
- Examples of handling complex constructions in graph matching/abductive inference.
- Antonyms/negation: high cost for matching verbs if they are antonyms, or if one is negated and the other is not (an antonym-check sketch follows this slide).
  - T: Stocks fell.  H: Stocks rose. FALSE
  - T: Clinton's book was not a hit.  H: Clinton's book was a hit. FALSE
- Non-factive verbs
  - T: John was charged for doing X.  H: John did X. FALSE
  - Can detect this because "doing" in the text has the non-factive "charged" as a parent, but "did" in the hypothesis does not have such a parent.
26 Some interesting features
- Superlative check
  - T: This is the tallest tower in western Japan.
  - H: This is the tallest tower in Japan. FALSE
27 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
28 Results
- Combine inference methods (a combination sketch follows this slide)
  - Each system produces a score.
  - Separately normalize each system's score variance.
  - Suppose the normalized scores are s1 and s2.
  - Final score: S = w1·s1 + w2·s2
  - Learn classifier weights w1 and w2 on the development set using logistic regression.
- Two submissions
  - Train one set of classifier weights for all RTE tasks. (General)
  - Train different classifier weights for each RTE task. (ByTask)
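A hedged sketch of the score combination; the exact normalization and the use of scikit-learn here are assumptions, and only S = w1·s1 + w2·s2 with logistic regression on the development set is stated on the slide.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def normalize(scores):
    """Variance-normalize one system's scores."""
    scores = np.asarray(scores, dtype=float)
    return scores / scores.std()

def learn_weights(dev_s1, dev_s2, dev_labels):
    """Fit logistic regression on the normalized dev-set scores; returns (w1, w2)."""
    X = np.column_stack([normalize(dev_s1), normalize(dev_s2)])
    model = LogisticRegression().fit(X, dev_labels)
    return model.coef_[0]

def final_score(w, s1, s2):
    # S = w1*s1 + w2*s2, applied to (normalized) test scores
    return w[0] * s1 + w[1] * s2
```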
29 Results
- Best other results: accuracy = 58.6%, CWS = 0.617
- Balanced predictions: 55.4% / 51.2% predicted TRUE on the test set.
30 Results by task
31 Partial coverage results
- Task-specific optimization seems better!
- [Figure: partial-coverage results comparing the ByTask and General submissions]
- Can also draw coverage-CWS curves (a CWS sketch follows this slide). For example:
  - at 50% coverage, CWS = 0.781
  - at 25% coverage, CWS = 0.873
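A hedged sketch of the confidence-weighted score (CWS): predictions are ranked by confidence and the precision of the top-i predictions is averaged over all ranks i, which is how the RTE-1 measure is commonly described; treat the exact formula as an assumption here.

```python
def cws(predictions):
    """predictions: list of (confidence, is_correct) pairs."""
    ranked = sorted(predictions, key=lambda p: -p[0])   # most confident first
    correct_so_far, total = 0, 0.0
    for i, (_, is_correct) in enumerate(ranked, start=1):
        correct_so_far += int(is_correct)
        total += correct_so_far / i                     # precision of the top i
    return total / len(ranked)

print(cws([(0.9, True), (0.7, True), (0.4, False), (0.2, True)]))  # ≈ 0.854
```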
32 Some interesting issues
- Phrase similarity
  - away from the coast → farther inland
  - won victory in presidential election → became President
  - stocks get a lift → stocks rise
  - life threatening → fatal
- Dictionary definitions
  - believe there is only one God → are monotheistic
- World knowledge
  - K Club, venue of the Ryder Cup, → K Club will host the Ryder Cup
33 Future directions
- Need more NLP components in there
  - Better treatment of frequent nominalizations, parenthesized material, etc.
- Need much more ability to do inference
  - Fine distinctions between meanings, and fine similarities.
  - e.g., reach a higher level and rise
  - We need a high-recall, reasonable-precision similarity measure!
  - Other resources (e.g., antonyms) are also very sparse.
- More task-specific optimization.