Title: RTE at Stanford
1 RTE at Stanford
- Rajat Raina, Aria Haghighi, Christopher Cox, Jenny Finkel, Jeff Michels, Kristina Toutanova, Bill MacCartney, Marie-Catherine de Marneffe, Christopher D. Manning and Andrew Y. Ng
- PASCAL Challenges Workshop
- April 12, 2005
2 Our approach
- Represent using syntactic dependencies
  - But also use semantic annotations.
  - Try to handle language variability.
- Perform semantic inference over this representation
  - Use linguistic knowledge sources.
- Compute a cost for inferring the hypothesis from the text.
  - Low cost → hypothesis is entailed.
3 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
4 Sentence processing
- Parse with a standard PCFG parser. Klein and Manning, 2003
- Al Qaeda / Aal-Qa'ieda
- Train on some extra sentences from recent news.
- Used a high-performing named entity recognizer (next slide).
- Force the parse tree to be consistent with certain NE tags (a sketch of one way to do this follows this slide).
- Example: American Ministry of Foreign Affairs announced that Russia called the United States...
  (S
    (NP (NNP American_Ministry_of_Foreign_Affairs))
    (VP (VBD announced)
      (...)))
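As one concrete illustration of keeping the parse consistent with NE tags, here is a minimal sketch (an assumption for illustration, not necessarily the system's actual mechanism) that merges each multi-word named entity into a single underscore-joined token before parsing, which is what produces nodes like American_Ministry_of_Foreign_Affairs above.

```python
def merge_entities(tokens, entity_spans):
    """tokens: list of words; entity_spans: list of (start, end) index pairs
    (end exclusive) covering recognized named entities."""
    spans = {start: end for start, end in entity_spans}
    merged, i = [], 0
    while i < len(tokens):
        if i in spans:
            # join the whole entity into one token so the parser cannot split it
            merged.append("_".join(tokens[i:spans[i]]))
            i = spans[i]
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(merge_entities(
    ["American", "Ministry", "of", "Foreign", "Affairs", "announced", "that", "Russia", "..."],
    [(0, 5)]))
# ['American_Ministry_of_Foreign_Affairs', 'announced', 'that', 'Russia', '...']
```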
5 Named Entity Recognizer
- Trained a robust conditional random field model. Finkel et al., 2003
- Interpretation of numeric quantity statements
- Example
  - T: Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries.
  - H: Kessler's team interviewed more than 60,000 adults in 14 countries. TRUE
- Annotate numerical values implied by (a normalization sketch follows this slide)
  - 6.2 bn, more than 60,000, around 10, ...
  - MONEY/DATE named entities
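A hedged sketch of the kind of numeric normalization described above; the comparator labels and the multiplier table are illustrative assumptions, not the system's actual rules.

```python
import re

MULTIPLIERS = {"bn": 1e9, "billion": 1e9, "million": 1e6, "thousand": 1e3}

def normalize_quantity(text):
    """Map a surface expression to a (comparator, value) pair."""
    comparator = "="
    if re.search(r"\bmore than\b|\bover\b", text):
        comparator = ">"
    elif re.search(r"\baround\b|\babout\b|\bapproximately\b", text):
        comparator = "~"
    match = re.search(r"([\d][\d,]*\.?\d*)\s*([a-zA-Z]+)?", text)
    if not match:
        return None
    value = float(match.group(1).replace(",", ""))
    unit = (match.group(2) or "").lower()
    return comparator, value * MULTIPLIERS.get(unit, 1)

print(normalize_quantity("6.2 bn"))            # ('=', 6.2e9), up to float rounding
print(normalize_quantity("more than 60,000"))  # ('>', 60000.0)
print(normalize_quantity("around 10"))         # ('~', 10.0)
```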
6 Parse tree post-processing
- Recognize collocations using WordNet (a merging sketch follows this slide)
- Example: Shrek 2 rang up $92 million.
  (S
    (NP (NNP Shrek) (CD 2))
    (VP (VBD rang_up)
      (NP
        (QP ($ $) (CD 92) (CD million))))
    (. .))
  → MONEY, 92000000
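A minimal sketch of collocation merging with WordNet (requires the NLTK WordNet data); the greedy pairwise strategy is an assumption for illustration, not the system's actual post-processor.

```python
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def merge_collocations(tokens):
    """Join adjacent tokens when WordNet knows the underscore-joined form,
    e.g. 'rang up' -> 'rang_up' because WordNet has the lemma ring_up."""
    merged, i = [], 0
    while i < len(tokens) - 1:
        candidate = "_".join([lemmatizer.lemmatize(tokens[i], pos="v"), tokens[i + 1]])
        if wn.synsets(candidate):
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    merged.extend(tokens[i:])
    return merged

print(merge_collocations(["Shrek", "2", "rang", "up", "$", "92", "million"]))
# ['Shrek', '2', 'rang_up', '$', '92', 'million']
```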
7 Parse tree → Dependencies
- Find syntactic dependencies
- Transform parse tree representations into typed syntactic dependencies, including a certain amount of collapsing and normalization
- Example: Bill's mother walked to the grocery store.
  - subj(walked, mother)
  - poss(mother, Bill)
  - to(walked, store)
  - nn(store, grocery)
- Dependencies can also be written as a logical formula (a conversion sketch follows this slide)
  - mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
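A hypothetical sketch of how typed dependencies can be rewritten as the logical formula shown above; the noun/verb split and the handling of poss, nn, and to here are simplifications assumed for illustration.

```python
def deps_to_formula(nouns, verb, deps):
    """nouns: noun tokens; verb: the main verb;
    deps: list of (relation, head, dependent) triples as on this slide."""
    const = {w: c for w, c in zip(nouns, "ABCDEFG")}      # one constant per noun
    atoms = [f"{w}({const[w]})" for w in nouns]           # mother(A), Bill(B), store(C)
    subj = obj = None
    for rel, head, dep in deps:
        if rel == "subj" and head == verb:
            subj = const[dep]
        elif rel == "to" and head == verb:                # collapsed prepositional argument
            obj = const[dep]
        elif rel == "poss":
            atoms.append(f"poss({const[dep]}, {const[head]})")
        elif rel == "nn":                                 # noun modifier as a unary predicate
            atoms.append(f"{dep}({const[head]})")
    atoms.append(f"{verb}(E, {subj}, {obj})")             # event predicate for the verb
    return " ∧ ".join(atoms)

print(deps_to_formula(
    ["mother", "Bill", "store"], "walked",
    [("subj", "walked", "mother"), ("poss", "mother", "Bill"),
     ("to", "walked", "store"), ("nn", "store", "grocery")]))
# mother(A) ∧ Bill(B) ∧ store(C) ∧ poss(B, A) ∧ grocery(C) ∧ walked(E, A, C)
```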
8 Representations
- Logical formula: mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
- [Figure: the same sentence as an annotated dependency graph, with node labels such as VBD, PERSON, and ARGM-LOC]
- Can make representation richer
- walked is a verb
- Bill is a PERSON (named entity).
- store is the location/destination of walked.
9 Annotations
- Parts-of-speech, named entities
  - Already computed.
- Semantic roles
- Example
  - T: C and D Technologies announced that it has closed the acquisition of Datel, Inc.
  - H1: C and D Technologies acquired Datel Inc. TRUE
  - H2: Datel acquired C and D Technologies. FALSE
- Use a state-of-the-art semantic role classifier to label verb arguments. Toutanova et al., 2005
10 More annotations
- Coreference
- Example
  - T: Since its formation in 1948, Israel ...
  - H: Israel was established in 1948. TRUE
- Use a conditional random field model for coreference detection.
- Note: appositive references were previously detected.
  - T: Bush, the President of the USA, went to Florida.
  - H: Bush is the President of the USA. TRUE
- Other annotations
  - Word stems (very useful)
  - Word senses (no performance gain in our system)
11 Event nouns
- Use a heuristic to find event nouns
- Augment the text representation using WordNet derivational links (noun → verb; a sketch follows this slide).
- Example
  - T: ... witnessed the murder of police commander ...
  - H: Police officer killed. TRUE
- Text logical formula:
  - murder(M) ∧ police_commander(P) ∧ of(M, P)
- Augment with:
  - murder(E, M, P)
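A minimal sketch of following WordNet derivational links from an event noun to its verb form (uses the NLTK WordNet interface; the heuristic for deciding which nouns are event nouns is not shown here).

```python
from nltk.corpus import wordnet as wn

def derived_verbs(noun):
    """Return verb lemmas reachable from the noun via derivational links,
    e.g. murder (noun) -> murder (verb)."""
    verbs = set()
    for lemma in wn.lemmas(noun, pos=wn.NOUN):
        for related in lemma.derivationally_related_forms():
            if related.synset().pos() == "v":
                verbs.add(related.name())
    return verbs

print(derived_verbs("murder"))  # expected to include 'murder' (the verb)
```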
12 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
13 Graph Matching Approach
- Why graph matching?
  - The dependency tree has a natural graphical interpretation.
  - Successful in other domains, e.g., lossy image matching.
- Input: hypothesis (H) and text (T) graphs
  - Toy example
  - Vertices are words and phrases
  - Edges are labeled dependencies
- Output: cost of matching H to T (next slide)
14 Graph Matching Idea
- Idea: align H to T so that vertices are similar and relations are preserved (as in machine translation).
- A matching M is a mapping from vertices of H to vertices of T.
- Thus, for each vertex v in H, M(v) is a vertex in T.
15 Graph Matching Costs
- The cost of a matching, MatchCost(M), measures the quality of the matching M.
- VertexCost(M): compare vertices in H with matched vertices in T.
- RelationCost(M): compare edges (relations) in H with corresponding edges (relations) in T.
- MatchCost(M) = (1 − β) · VertexCost(M) + β · RelationCost(M)
16 Graph Matching Costs
- VertexCost(M)
  - For each vertex v in H and vertex M(v) in T:
  - Do the vertex heads share the same stem and/or POS?
  - Is the T vertex head a hypernym of the H vertex head?
  - Are the vertex heads similar phrases? (next slide)
- RelationCost(M)
  - For each edge (v, v′) in H and edge (M(v), M(v′)) in T:
  - Are parent/child pairs in H parent/child in T?
  - Are parent/child pairs in H ancestor/descendant in T?
  - Do parent/child pairs in H share a common ancestor in T?
- (A cost-combination sketch follows this slide.)
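A minimal sketch of the cost combination from these slides, assuming per-vertex and per-edge costs in [0, 1] are simply averaged; the averaging is an assumption, only the (1 − β)/β mixture is stated on the slides.

```python
def vertex_cost(vertex_costs):
    """vertex_costs: one cost in [0, 1] per hypothesis vertex."""
    return sum(vertex_costs) / len(vertex_costs)

def relation_cost(edge_costs):
    """edge_costs: one cost in [0, 1] per hypothesis edge."""
    return sum(edge_costs) / len(edge_costs) if edge_costs else 0.0

def match_cost(vertex_costs, edge_costs, beta=0.45):
    # MatchCost(M) = (1 - beta) * VertexCost(M) + beta * RelationCost(M)
    return (1 - beta) * vertex_cost(vertex_costs) + beta * relation_cost(edge_costs)
```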
17 Digression: Phrase similarity
- Measures based on WordNet (Resnik/Lesk).
- Distributional similarity
  - Example: run and marathon are related.
  - Latent Semantic Analysis to discover words that are distributionally similar (i.e., have common neighbors).
- Used a web-search based measure (a sketch follows this slide)
  - Query google.com for all pages with
    - run
    - marathon
    - both run and marathon
- Learning paraphrases, similar to DIRT. Lin and Pantel, 2001
- World knowledge (labor intensive)
  - CEO ↔ Chief_Executive_Officer
  - Philippines → Filipino
  - Can add common facts: Paris is the capital of France, ...
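A hedged sketch of one plausible web-count similarity; the slides only say that google.com is queried for pages containing each word and for both words together, so the PMI-style formula and the `page_count` stand-in below are assumptions rather than the system's actual measure or a real API.

```python
import math

TOTAL_PAGES = 1e10  # assumed size of the indexed web

def web_similarity(word1, word2, page_count):
    """page_count(query) -> estimated number of pages matching the query."""
    n1 = page_count(word1)
    n2 = page_count(word2)
    n12 = page_count(f"{word1} {word2}")   # pages containing both words
    if min(n1, n2, n12) == 0:
        return 0.0
    # pointwise-mutual-information-style association between the two words
    return math.log((n12 * TOTAL_PAGES) / (n1 * n2))

# Example with made-up counts:
fake_counts = {"run": 5e8, "marathon": 2e7, "run marathon": 4e6}
print(web_similarity("run", "marathon", lambda q: fake_counts[q]))  # log(4) ≈ 1.39
```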
18 Graph Matching Costs
- VertexCost(M)
  - For each vertex v in H and vertex M(v) in T:
  - Do the vertex heads share the same stem and/or POS?
  - Is the T vertex head a hypernym of the H vertex head?
  - Are the vertex heads similar phrases?
- RelationCost(M)
  - For each edge (v, v′) in H and edge (M(v), M(v′)) in T:
  - Are parent/child pairs in H parent/child in T?
  - Are parent/child pairs in H ancestor/descendant in T?
  - Do parent/child pairs in H share a common ancestor in T?
19 Graph Matching Example
- VertexCost = (0.0 + 0.2 + 0.4) / 3 = 0.2
- RelationCost = 0 (graphs isomorphic)
- β = 0.45 (say)
- MatchCost = 0.55 × 0.2 + 0.45 × 0.0 = 0.11 (worked through in the snippet below)
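Plugging the slide's numbers back into the formula reproduces the arithmetic (up to floating-point rounding):

```python
beta = 0.45
vertex_cost = (0.0 + 0.2 + 0.4) / 3            # 0.2
relation_cost = 0.0                            # graphs isomorphic
match_cost = (1 - beta) * vertex_cost + beta * relation_cost
print(match_cost)                              # ≈ 0.11
```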
20 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
21 Abductive inference
- Idea
  - Represent the text and the hypothesis as logical formulae.
  - The hypothesis can be inferred from the text if and only if the hypothesis logical formula can be proved from the text logical formula.
- Toy example: prove the hypothesis formula from the text formula.
- Allow assumptions at various costs:
  - BMW(t) ⇒ car(t) at cost 2
  - bought(p, q, r) ⇒ purchased(p, q, r) at cost 1
22 Abductive assumptions
- Assign costs to all assumptions of the form p(...) ⇒ q(...) (as on the previous slide).
- Build an assumption cost model.
23 Abductive theorem proving
- Each assumption provides a potential proof step.
- Find the proof with the minimum total cost:
  - Uniform cost search (a search sketch follows this slide)
- If there is a low-cost proof, the hypothesis is entailed.
- Example
  - T: John(A) ∧ BMW(B) ∧ bought(E, A, B)
  - H: John(x) ∧ car(y) ∧ purchased(z, x, y)
- Here is a possible proof by resolution refutation (for the earlier costs); each line shows the accumulated cost:
  - 0: ¬John(x) ∨ ¬car(y) ∨ ¬purchased(z, x, y)  [given: negation of hypothesis]
  - 0: ¬car(y) ∨ ¬purchased(z, A, y)  [unify with John(A)]
  - 2: ¬purchased(z, A, B)  [unify with BMW(B), using the cost-2 assumption BMW ⇒ car]
  - 3: NULL  [unify with purchased(E, A, B), obtained from bought(E, A, B) via the cost-1 assumption]
- Proof cost: 3
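A much-simplified, hypothetical sketch of minimum-cost proof search: it matches hypothesis predicates against text predicates by uniform-cost search over which assumption (if any) to apply, and it ignores real unification of arguments, so it only illustrates the search idea rather than the full theorem prover.

```python
import heapq

def min_proof_cost(hyp_preds, text_preds, assumption_costs):
    """assumption_costs: {(text_pred, hyp_pred): cost} for assumptions of the
    form text_pred(...) => hyp_pred(...)."""
    frontier = [(0.0, 0)]            # (cost so far, next hypothesis predicate index)
    best = {}
    while frontier:
        cost, i = heapq.heappop(frontier)
        if i == len(hyp_preds):
            return cost              # every hypothesis predicate is accounted for
        if best.get(i, float("inf")) <= cost:
            continue
        best[i] = cost
        for t in text_preds:
            if t == hyp_preds[i]:                        # exact match: free step
                heapq.heappush(frontier, (cost, i + 1))
            elif (t, hyp_preds[i]) in assumption_costs:  # assumed step: pay its cost
                heapq.heappush(frontier,
                               (cost + assumption_costs[(t, hyp_preds[i])], i + 1))
    return float("inf")

print(min_proof_cost(
    ["John", "car", "purchased"], ["John", "BMW", "bought"],
    {("BMW", "car"): 2, ("bought", "purchased"): 1}))    # 3, matching the proof above
```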
24 Abductive theorem proving
- Can automatically learn good assumption costs.
- Start from a labeled dataset (e.g., the PASCAL development set).
- Intuition: find assumptions that are used in the proofs for TRUE examples, and lower their costs (by framing a log-linear model). Iterate.
- Details: Raina et al., in submission
25 Some interesting features
- Examples of handling complex constructions in graph matching/abductive inference.
- Antonyms/negation: high cost for matching verbs if they are antonyms, or if one is negated and the other is not (an antonym-check sketch follows this slide).
  - T: Stocks fell.  H: Stocks rose. FALSE
  - T: Clinton's book was not a hit.  H: Clinton's book was a hit. FALSE
- Non-factive verbs
  - T: John was charged for doing X.  H: John did X. FALSE
  - Can detect this because "doing" in the text has the non-factive "charged" as a parent, but "did" in the hypothesis does not have such a parent.
26 Some interesting features
- Superlative check
  - T: This is the tallest tower in western Japan.
  - H: This is the tallest tower in Japan. FALSE
27 Outline of this talk
- Representation of sentences
  - Syntax: parsing and post-processing
  - Adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis
28 Results
- Combine inference methods (a combination sketch follows this slide)
  - Each system produces a score.
  - Separately normalize each system's score variance.
  - Suppose the normalized scores are s1 and s2.
  - Final score: S = w1·s1 + w2·s2
  - Learn classifier weights w1 and w2 on the development set using logistic regression.
- Two submissions
  - Train one set of classifier weights for all RTE tasks. (General)
  - Train different classifier weights for each RTE task. (ByTask)
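A hedged sketch of the score combination; the exact normalization and the use of scikit-learn here are assumptions, and only S = w1·s1 + w2·s2 with logistic regression on the development set is stated on the slide.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def normalize(scores):
    """Variance-normalize one system's scores."""
    scores = np.asarray(scores, dtype=float)
    return scores / scores.std()

def learn_weights(dev_s1, dev_s2, dev_labels):
    """Fit logistic regression on the normalized dev-set scores; returns (w1, w2)."""
    X = np.column_stack([normalize(dev_s1), normalize(dev_s2)])
    model = LogisticRegression().fit(X, dev_labels)
    return model.coef_[0]

def final_score(w, s1, s2):
    # S = w1*s1 + w2*s2, applied to (normalized) test scores
    return w[0] * s1 + w[1] * s2
```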
29 Results
- Best other results: accuracy = 58.6%, CWS = 0.617
- Balanced predictions: 55.4% / 51.2% predicted TRUE on the test set.
30 Results by task
31 Partial coverage results
- Task-specific optimization seems better!
- [Figure: partial-coverage results comparing the ByTask and General submissions]
- Can also draw coverage-CWS curves (a CWS sketch follows this slide). For example:
  - at 50% coverage, CWS = 0.781
  - at 25% coverage, CWS = 0.873
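A hedged sketch of the confidence-weighted score (CWS): predictions are ranked by confidence and the precision of the top-i predictions is averaged over all ranks i, which is how the RTE-1 measure is commonly described; treat the exact formula as an assumption here.

```python
def cws(predictions):
    """predictions: list of (confidence, is_correct) pairs."""
    ranked = sorted(predictions, key=lambda p: -p[0])   # most confident first
    correct_so_far, total = 0, 0.0
    for i, (_, is_correct) in enumerate(ranked, start=1):
        correct_so_far += int(is_correct)
        total += correct_so_far / i                     # precision of the top i
    return total / len(ranked)

print(cws([(0.9, True), (0.7, True), (0.4, False), (0.2, True)]))  # ≈ 0.854
```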
32 Some interesting issues
- Phrase similarity
  - away from the coast → farther inland
  - won victory in presidential election → became President
  - stocks get a lift → stocks rise
  - life threatening → fatal
- Dictionary definitions
  - believe there is only one God → are monotheistic
- World knowledge
  - K Club, venue of the Ryder Cup, → K Club will host the Ryder Cup
33 Future directions
- Need more NLP components in there
  - Better treatment of frequent nominalizations, parenthesized material, etc.
- Need much more ability to do inference
  - Fine distinctions between meanings, and fine similarities.
  - e.g., reach a higher level and rise
  - We need a high-recall, reasonable-precision similarity measure!
  - Other resources (e.g., antonyms) are also very sparse.
- More task-specific optimization.