Title: DRT in Practice ("DRT in der Praxis")

Slide 1: DRT in Practice
Johan Bos
Dipartimento di Informatica, University of Rome "La Sapienza"
Slide 2: The question
- Given what we know about DRT, from both a theoretical and a practical perspective, can we use it for practical applications?
Slide 3: Outline
- Wide-coverage parsing with DRT
- Inference and DRT
- Recognising Textual Entailment
Slide 4: Wide-coverage parsing
- What is meant by wide-coverage parsing?
- Rapid developments in statistical parsing over the last decades
- Parsers trained on large annotated corpora, e.g. the Penn Treebank
- Examples are parsers like those of Collins and Charniak
Slide 5: Wide-coverage parsing and DRT
- Say we wished to produce DRSs on the output of these parsers
- We would need quite detailed syntax derivations
- Closer inspection reveals that many of the parsers use several thousands of phrase structure rules
- Long-distance dependencies are not recovered
- Conclusion: most of these parsers produce syntactic analyses not suitable for systematic semantic work
Slide 6: The CCG parser
- This changed with
  - the development of CCGbank, and
  - the implementation of a fast CCG parser
- CCG: Combinatory Categorial Grammar
Slide 7: Combinatory Categorial Grammar
- CCG is a lexicalised theory of grammar (Steedman 2001)
- Deals with complex cases of coordination and long-distance dependencies
- Lexicalised, hence easy to implement
- English wide-coverage grammar
- Fast, robust parser available
Slide 8: Categorial Grammar
- Lexicalised theory of syntax
  - Many different lexical categories
  - Few grammar rules
- Finite set of categories defined over a base of core categories
  - Core categories: s, np, n, pp
  - Combined categories: np/n, s\np, (s\np)/np
Slide 9: CCG: a type-driven, lexicalised grammar
Slide 10: CCG combinatory rules
- Forward Application (FA)
- Backward Application (BA)
- Generalised Forward Composition (FC)
- Backward Crossed Composition (BC)
- Type Raising (TR)
- Coordination
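As an illustrative sketch (not the actual parser implementation), the two application rules can be modelled on category terms directly; the tuple encoding and helper names below are my own.

```python
# Toy encoding of CCG categories: an atomic category is a string
# ("S", "NP", "N"), a complex category is a tuple (slash, result, argument),
# e.g. ("/", "NP", "N") for NP/N and ("\\", "S", "NP") for S\NP.
# This encoding is illustrative, not from the talk.

def forward_application(left, right):
    """FA: X/Y combined with Y yields X."""
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]
    return None  # rule does not apply

def backward_application(left, right):
    """BA: Y combined with X\\Y yields X."""
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        return right[1]
    return None

# "a spokesman lied": a = NP/N, spokesman = N, lied = S\NP
a, spokesman, lied = ("/", "NP", "N"), "N", ("\\", "S", "NP")
np = forward_application(a, spokesman)   # "NP", covering "a spokesman"
s = backward_application(np, lied)       # "S", covering "a spokesman lied"
print(np, s)  # NP S
```

The same two rules drive the derivation on the following slides.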
Slides 11-17: CCG derivation (built up step by step)

  a      spokesman   lied
  NP/N   N           S\NP
  ---------------- (FA)
  NP: a spokesman
  ------------------------- (BA)
  S: a spokesman lied
Slide 18: Coordination in CCG

  Artie: np   likes: (s\np)/np   and: (x\x)/x   Tony: np   hates: (s\np)/np   beans: np

  1. TR:  Artie => s/(s\np);  Tony => s/(s\np)
  2. FC:  Artie likes => s/np;  Tony hates => s/np
  3. FA:  and + [Tony hates] => (s/np)\(s/np): and Tony hates
  4. BA:  [Artie likes] + [and Tony hates] => s/np: Artie likes and Tony hates
  5. FA:  [Artie likes and Tony hates] + beans => s: Artie likes and Tony hates beans
Slide 19: Combining CCG with DRT
- Use the lambda calculus to combine CCG with DRT
- Each lexical entry gets a DRS with lambda-bound variables, representing the missing information
- Each combinatory rule in CCG gets a semantic interpretation, again using the tools of the lambda calculus
Slide 20: Interpreting the combinatory rules
- Each combinatory rule in CCG is expressed in terms of the lambda calculus
- Forward Application:  FA(α,β) = α@β
- Backward Application: BA(α,β) = β@α
- Type Raising:         TR(α) = λx.x@α
- Function Composition: FC(α,β) = λx.α@(β@x)
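Since the rule interpretations are just higher-order functions, they map directly onto Python lambdas; the arithmetic demo below is purely illustrative, standing in for DRS-building functions.

```python
# The four rule interpretations, written as Python higher-order
# functions (alpha@beta is function application):
FA = lambda a, b: a(b)                 # FA(alpha, beta) = alpha@beta
BA = lambda a, b: b(a)                 # BA(alpha, beta) = beta@alpha
TR = lambda a: lambda x: x(a)          # TR(alpha) = lam x. x@alpha
FC = lambda a, b: lambda x: a(b(x))    # FC(alpha, beta) = lam x. alpha@(beta@x)

# Illustrative check with arithmetic functions in place of DRS-builders:
inc = lambda n: n + 1
dbl = lambda n: 2 * n
print(FA(inc, 1))       # 2
print(BA(1, inc))       # 2
print(TR(3)(inc))       # 4
print(FC(inc, dbl)(5))  # 11  (inc applied to dbl(5))
```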
Slide 21: CCG lexical semantics
Slides 22-29: CCG derivation with lexical semantics (built up step by step; linear DRS notation "[x1,... | cond1,...]", with ";" for DRS merge)

  a: NP/N                    spokesman: N           lied: S\NP
  λp.λq.([x] ; p@x ; q@x)    λz.[ | spokesman(z)]   λx.(x @ λy.[e | lie(e), agent(e,y)])
  ------------------------------------------------ (FA)
  NP: a spokesman
  λp.λq.([x] ; p@x ; q@x) @ λz.[ | spokesman(z)]
  = λq.([x | spokesman(x)] ; q@x)
  ------------------------------------------------------------------ (BA)
  S: a spokesman lied
  λx.(x @ λy.[e | lie(e), agent(e,y)]) @ λq.([x | spokesman(x)] ; q@x)
  = λq.([x | spokesman(x)] ; q@x) @ λy.[e | lie(e), agent(e,y)]
  = [x, e | spokesman(x), lie(e), agent(e,x)]
Slide 30: The Clark & Curran parser
- Uses standard statistical techniques
- Robust wide-coverage parser
  - Clark & Curran (ACL 2004)
- Grammar derived from CCGbank
  - 409 different categories
  - Hockenmaier & Steedman (ACL 2002)
- Results: 96% coverage on the WSJ
  - Bos et al. (COLING 2004)
Slide 31: Example output
- Example: "Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group."
- Semantic representation: DRT
- Complete Wall Street Journal
Slide 32: Inference
- The problem: given a semantic representation (DRS) for a set of sentences, how can we perform logical inferences with them?
- Approach: translate the DRS into first-order logic and use off-the-shelf inference engines.
Slide 33: Why first-order logic?
- Why not use higher-order logic?
  - Better match with formal semantics
  - But: undecidable, and no fast provers available
- Why not use weaker logics?
  - Modal/description logics (decidable fragments)
  - But: can't cope with all of natural language
- Why use first-order logic?
  - Undecidable, but good inference tools available
  - DRSs translate to first-order logic
Slides 34-41: From DRS to FOL (built up step by step)

The example DRS translates, piece by piece, to the first-order formula:

  ∃y(woman(y) ∧ ∀x(man(x) → ∃e(adore(e) ∧ agent(e,x) ∧ theme(e,y))))
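The standard translation clause behind these slides ("a DRS with referents x1..xn and conditions C1..Cm becomes ∃x1..xn(C1 ∧ .. ∧ Cm)", with implication conditions becoming universally quantified implications) can be sketched as a small recursive function. The data encoding below is my own, not Bos's actual implementation.

```python
# Toy DRS-to-FOL translation. A DRS is (referents, conditions);
# a condition is a string like "man(x)", ("not", drs), or
# ("imp", drs1, drs2) for the implication between two sub-DRSs.

def drs_to_fol(drs):
    refs, conds = drs
    body = " & ".join(cond_to_fol(c) for c in conds)
    for r in reversed(refs):          # existentially close the referents
        body = f"exists {r} ({body})"
    return body

def cond_to_fol(cond):
    if isinstance(cond, str):         # basic condition, kept as-is
        return cond
    if cond[0] == "not":
        return f"-({drs_to_fol(cond[1])})"
    if cond[0] == "imp":              # [x.. | C..] => B  ~>  all x..(C.. -> B)
        refs, conds = cond[1]
        body = " & ".join(cond_to_fol(c) for c in conds)
        result = f"({body} -> {drs_to_fol(cond[2])})"
        for r in reversed(refs):
            result = f"all {r} {result}"
        return result

# The DRS from slides 34-41:
drs = (["y"], ["woman(y)",
       ("imp", (["x"], ["man(x)"]),
               (["e"], ["adore(e)", "agent(e,x)", "theme(e,y)"]))])
print(drs_to_fol(drs))
# exists y (woman(y) & all x (man(x) -> exists e (adore(e) & agent(e,x) & theme(e,y))))
```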
Slide 42: Inference
- Inference tasks
  - Consistency checking
  - Informativeness checking
- Inference tools (FOL)
  - Theorem proving
  - Model building
Slide 43: Theorem proving
- Checks whether a set of first-order formulas is valid or not
Slide 44: Model building
- Tries to construct a model for a set of first-order formulas
- Finite models
- Builds models by iteration
Slide 45: Consistency checking
- Assume B is a DRS for a text T
- Translate B to a first-order formula φ
- Then:
  - If a theorem prover succeeds in finding a proof for ¬φ, then T is inconsistent
  - If a model builder succeeds in constructing a model for φ, then T is consistent
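This recipe is decidable for propositional logic, so the division of labour can be illustrated with a brute-force "model builder" and "theorem prover". Formulas here are Python boolean expressions over named atoms; this toy setup is mine, not the talk's actual tool chain.

```python
from itertools import product

def find_model(formula, atoms):
    """Model-builder role: return a satisfying assignment, or None.
    `formula` is a Python-syntax boolean expression over `atoms`."""
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if eval(formula, {}, env):
            return env
    return None

def provable(formula, atoms):
    """Theorem-prover role: in propositional logic, a formula is
    provable iff it holds under every assignment (validity)."""
    return all(eval(formula, {}, dict(zip(atoms, v)))
               for v in product([False, True], repeat=len(atoms)))

phi = "p and not p"                       # an inconsistent "text"
print(provable(f"not ({phi})", ["p"]))    # True: a proof of -phi, so phi is inconsistent
print(find_model("p or q", ["p", "q"]))   # a model exists, so "p or q" is consistent
```

For full first-order logic the prover may run forever and the model builder only finds finite models, which is exactly why the next slide calls them opposite forces.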
Slide 46: Yin and yang of inference
- Theorem proving and model building function as opposite forces
Slide 47: Applications
- Has been used for different kinds of applications
  - Question Answering
  - Recognising Textual Entailment
Slide 48: Recognising Textual Entailment
- A task for NLP systems: recognise entailment between two (short) texts
- Introduced in 2004/2005 as part of the PASCAL Network of Excellence
- Proved to be a difficult, but popular, task
- PASCAL provided a development set and a test set of several hundred examples
Slide 49: RTE example (entailment)

  RTE 1977 (TRUE)
  His family has steadfastly denied the charges.
  -----------------------------------------------
  The charges were denied by his family.   ✓
Slide 50: RTE example (no entailment)

  RTE 2030 (FALSE)
  Lyon is actually the gastronomical capital of France.
  -----------------------------------------------------
  Lyon is the capital of France.   ✗
Slide 51: RTE is hard, example 1

  Example (TRUE)
  The leaning tower is a building in Pisa. Pisa is a town in Italy.
  -----------------------------------------------------------------
  The leaning tower is a building in Italy.   ✓
Slide 52: RTE is hard, example 1 (continued)

  Example (FALSE)
  The leaning tower is the highest building in Pisa. Pisa is a town in Italy.
  ---------------------------------------------------------------------------
  The leaning tower is the highest building in Italy.   ✗
Slide 53: RTE is hard, example 2

  Example (TRUE)
  John is walking around.
  -----------------------
  John is walking.   ✓
Slide 54: RTE is hard, example 2 (continued)

  Example (FALSE)
  John is farting around.
  -----------------------
  John is farting.   ✗
Slide 55: Aristotle's syllogisms

  ARISTOTLE 1 (TRUE)
  All men are mortal. Socrates is a man.
  --------------------------------------
  Socrates is mortal.   ✓
Slide 56: Aristotle's syllogisms (continued)

  ARISTOTLE 2 (FALSE)
  All men are mortal. Socrates is not a man.
  ------------------------------------------
  Socrates is mortal.   ✗
Slide 57: How to deal with RTE
- There are several methods
- We will look at five of them to see how difficult RTE actually is
Slide 58: Recognising Textual Entailment
- Method 1: flipping a coin

Slide 59: Flipping a coin
- Advantages
  - Easy to implement
- Disadvantages
  - Just 50% accuracy
Slide 60: Recognising Textual Entailment
- Method 2: calling a friend
Slide 61: Calling a friend
- Advantages
  - High accuracy (95%)
- Disadvantages
  - Lose friends
  - High phone bill
Slide 62: Recognising Textual Entailment
- Method 3: ask the audience
Slide 63: Ask the audience

  RTE 893 (????)
  The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
  -----------------------------------------------------------------
  The first settlements on the site of Jakarta were established as early as the 5th century AD.
Slide 64: Human upper bound

  RTE 893 (TRUE)
  The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
  -----------------------------------------------------------------
  The first settlements on the site of Jakarta were established as early as the 5th century AD.   ✓
Slide 65: Recognising Textual Entailment
- Method 4: word overlap

Slide 66: Word overlap approaches
- Popular approach
- Ranging in sophistication from a simple bag of words to the use of WordNet
- Accuracy rates: ca. 55%
Slide 67: Word overlap
- Advantages
  - Relatively straightforward algorithm
- Disadvantages
  - Hardly better than flipping a coin
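A minimal word-overlap baseline in the spirit of slides 66-67 fits in a few lines; the threshold value is an illustrative choice, not a number from the talk.

```python
def word_overlap(text, hypothesis):
    """Fraction of hypothesis tokens that also occur in the text."""
    t = set(text.lower().split())
    h = set(hypothesis.lower().split())
    return len(h & t) / len(h)

def predict_entailment(text, hypothesis, threshold=0.8):
    """Predict entailment when enough hypothesis words are covered."""
    return word_overlap(text, hypothesis) >= threshold

# The Lyon pair from slide 50 shows why this is weak: every hypothesis
# word occurs in the text, yet there is no entailment.
t = "Lyon is actually the gastronomical capital of France."
h = "Lyon is the capital of France."
print(word_overlap(t, h))        # 1.0
print(predict_entailment(t, h))  # True -- a false positive
```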
Slide 68: RTE state of the art
- PASCAL RTE challenge
- Hard problem
- Requires semantics
Slide 69: Recognising Textual Entailment
- Method 5: DRT and theorem proving
Slide 70: Using theorem proving
- Given a textual entailment pair T/H with text T and hypothesis H:
  - Produce DRSs for T and H
  - Translate these DRSs into FOL
  - Give this to the theorem prover:
      T → H
- If the theorem prover finds a proof, then we predict that T entails H
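The control flow of this recipe can be sketched as follows. The `prove` stub stands in for a call to a first-order prover such as Vampire; it only handles the trivial case where every conjunct of H already occurs in T, so it is an assumption-laden placeholder, not a real prover, and the literal lists merely stand in for the DRS translations.

```python
def prove(premises, conclusion):
    """Hypothetical stand-in for a FOL prover: succeed iff each
    conjunct of the conclusion appears among the premises."""
    return set(conclusion) <= set(premises)

def entails(T, H):
    """If the 'prover' finds a proof of T -> H, predict entailment."""
    return prove(T, H)

# Flat conjunctions standing in for the translated DRSs of a pair
# like RTE-2 112 ("a car bomb exploded outside a Shiite mosque ..."):
T = ["bomb(x)", "explode(e)", "theme(e,x)", "mosque(y)", "outside(e,y)"]
H = ["bomb(x)", "explode(e)", "theme(e,x)"]
print(entails(T, H))  # True: predict that T entails H
```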
Slide 71: Vampire (Riazanov & Voronkov 2002)
- Let's try this. We will use the theorem prover Vampire (currently the best-known theorem prover for FOL)
- This gives us good results for:
  - apposition
  - relative clauses
  - coordination
  - intersective adjectives/complements
  - passive/active alternations
Slide 72: Example (Vampire proof)

  RTE-2 112 (TRUE)
  On Friday evening, a car bomb exploded outside a Shiite mosque in Iskandariyah, 30 miles south of the capital.
  -----------------------------------------------------------------
  A bomb exploded outside a mosque.   ✓
Slide 73: Example (Vampire proof)

  RTE-2 489 (TRUE)
  Initially, the Bundesbank opposed the introduction of the euro but was compelled to accept it in light of the political pressure of the capitalist politicians who supported its introduction.
  -----------------------------------------------------------------
  The introduction of the euro has been opposed.   ✓
Slide 74: Background knowledge
- However, it doesn't give us good results for cases requiring additional knowledge:
  - Lexical knowledge
  - World knowledge
- We will use WordNet as a start to get additional knowledge
- All of WordNet is too much, so we create MiniWordNets
Slide 75: MiniWordNets
- Use hyponym relations from WordNet to build an ontology
- Do this only for the relevant symbols
- Convert the ontology into first-order axioms
Slides 76-77: MiniWordNet: an example
- Example text: "There is no asbestos in our products now. Neither Lorillard nor the researchers who studied the workers were aware of any research on smokers of the Kent cigarettes."
Slide 78: (image only: the MiniWordNet ontology for the example text)
Slide 79: Resulting hyponymy axioms

  ∀x(user(x) → person(x))
  ∀x(worker(x) → person(x))
  ∀x(researcher(x) → person(x))

Slide 80: Resulting disjointness axioms

  ∀x(person(x) → ¬risk(x))
  ∀x(person(x) → ¬cigarette(x))
  ...
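Converting MiniWordNet relations into axioms of this shape is mechanical. The toy tables below are hand-written for this example; in the real system the hyponym links come from WordNet.

```python
# Illustrative hyponym links (sub-concept -> super-concept) and
# disjointness pairs for the slide-76 example text:
hypernym = {"user": "person", "worker": "person", "researcher": "person"}
disjoint = [("person", "risk"), ("person", "cigarette")]

# Each hyponym link "sub is-a sup" becomes  all x (sub(x) -> sup(x));
# each disjointness pair becomes            all x (a(x) -> -b(x)).
axioms  = [f"all x ({sub}(x) -> {sup}(x))" for sub, sup in hypernym.items()]
axioms += [f"all x ({a}(x) -> -{b}(x))" for a, b in disjoint]
for ax in axioms:
    print(ax)
```

These axioms are then handed to the prover together with the translated DRSs, as slide 81 describes.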
Slide 81: Using background knowledge
- Given a textual entailment pair T/H with text T and hypothesis H:
  - Produce DRSs for T and H
  - Translate drs(T) and drs(H) into FOL
  - Create background knowledge (BK) for T and H
  - Give this to the theorem prover:
      (BK ∧ T) → H
Slide 82: MiniWordNets at work

  RTE 1952 (TRUE)
  Crude oil prices soared to record levels.
  -----------------------------------------
  Crude oil prices rise.   ✓

- Background knowledge: ∀x(soar(x) → rise(x))
Slide 83: Troubles with theorem proving
- Theorem provers are extremely precise
- They won't tell you when there is "almost" a proof
- Even if there is just a little background knowledge missing, Vampire will say: NO
Slide 84: Vampire: no proof

  RTE 1049 (TRUE)
  Four Venezuelan firefighters who were traveling to a training course in Texas were killed when their sport utility vehicle drifted onto the shoulder of a highway and struck a parked truck.
  -----------------------------------------------------------------
  Four firefighters were killed in a car accident.   ✓
Slide 85: Using model building
- Need a robust way of inference
- Use the model builder Paradox
  - Claessen & Sorensson (2003)
- Use the size of the (minimal) model
- Compare the model sizes of T and T∧H
- If the difference is small, then it is likely that T entails H
Slide 86: Minimal models
- Model builders normally generate models by iterating over the domain size
- As a side effect, the output is a model with a minimal domain size
- From a linguistic point of view this is interesting, as there is no redundant information
- Minimal in extensions
Slide 87: Using model building
- Given a textual entailment pair T/H with text T and hypothesis H:
  - Produce DRSs for T and H
  - Translate these DRSs into FOL
  - Generate background knowledge (BK)
  - Give this to the model builder:
      i) BK ∧ T
      ii) BK ∧ T ∧ H
- If the models for i) and ii) are similar in size, then we predict that T entails H
Slide 88: Example 1
- T: John met Mary in Rome
  H: John met Mary
- Model of T: 3 entities; model of T∧H: 3 entities
- Model size difference: 0
- Prediction: entailment
Slide 89: Example 2
- T: John met Mary
  H: John met Mary in Rome
- Model of T: 2 entities; model of T∧H: 3 entities
- Model size difference: 1
- Prediction: no entailment
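Reduced to its core, the decision criterion of slides 85-89 is a comparison of two domain sizes. In this sketch the sizes are given directly; in the real system they would come from running a model builder such as Paradox on BK∧T and BK∧T∧H.

```python
def predict_by_model_size(size_T, size_TH, max_diff=0):
    """Predict entailment when conjoining H to T adds (almost) no
    new entities to the minimal model. `max_diff` is the tolerated
    growth; 0 matches the two examples on slides 88-89."""
    return (size_TH - size_T) <= max_diff

# Example 1: T = "John met Mary in Rome", H = "John met Mary"
print(predict_by_model_size(3, 3))  # True: entailment predicted
# Example 2: T = "John met Mary", H = "John met Mary in Rome"
print(predict_by_model_size(2, 3))  # False: no entailment predicted
```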
Slide 90: Model size differences
- Of course this is a very rough approximation
- But it turns out to be a useful one
- Gives us a notion of robustness
- Negation
  - Give ¬T and ¬(T∧H) to the model builder
- Disjunction
  - Not necessarily one unique minimal model
Slide 91: How well does this work?
- We tried this at RTE 2004/05
- Combined it with a shallow approach (word overlap)
- Used standard machine learning methods to build a decision tree
- Features used:
  - Proof (yes/no)
  - Model size
  - Model size difference
  - Word overlap
  - Task (source of the RTE pair)
Slide 92: RTE results 2004/05
- Bos & Markert (2005)
Slide 93: Lack of background knowledge

  RTE-2 235 (TRUE)
  Indonesia says the oil blocks are within its borders, as does Malaysia, which has also sent warships to the area, claiming that its waters and airspace have been violated.
  -----------------------------------------------------------------
  There is a territorial waters dispute.   ✓
Slide 94: Conclusions
- Nowadays computational semantics is able to handle some difficult problems
- DRT is not just a theory: it is a complete architecture allowing us to experiment with computational semantics
- State-of-the-art inference engines can help to study or apply semantics
- Appropriate background knowledge is often the deciding factor for success