Title: Working with Discourse Representation Theory
1Working with Discourse Representation Theory
Patrick Blackburn & Johan Bos
Lecture 5: Applying DRT
2Today
- Given what we know about DRT, from both a theoretical and a practical perspective, can we use it for practical applications?
3Outline
- Spoken dialogue system with DRT
- Using DRT and inference to control a mobile robot
- Wide-coverage parsing with DRT
- Recognising Textual Entailment
4Human-Computer Dialogue
- Focus on small domains
- Grammatical coverage ensured
- Background knowledge encoding
- Spoken Dialogue system
- Godot the robot
- Speech recognition and synthesis
- People could give Godot directions, ask it questions, and tell it new information
- Godot was a REAL robot
5Godot the Robot
Godot with Tetsushi Oka
6Typical conversation with Godot
- Simple dialogues in beginning
- Human: Robot?
- Robot: Yes?
- Human: Where are you?
- Robot: I am in the hallway.
- Human: OK. Go to the rest room!
7Adding DRT to the robot
- The language model that the robot used for speech recognition returned DRSs
- We used the model builder MACE and the theorem prover SPASS for inference
- The model produced by MACE was used to find out what the robot should do
- This was possible because MACE produces minimal models
- Of course we also checked for consistency and informativity (see the sketch below)
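- A minimal Prolog sketch of the two checks (assumption: prover_proves/1 is a hypothetical wrapper around a FOL prover such as SPASS, succeeding iff its argument is provable; formulas use the and/2, not/1, imp/2 term syntax of the course software):

```prolog
% Sketch of the consistency and informativity checks.
% prover_proves/1 is an assumed interface to an external prover.
:- dynamic prover_proves/1.

% New is consistent with context Old if their conjunction is not refutable.
consistent(Old, New) :-
    \+ prover_proves(not(and(Old, New))).

% New is informative if it is not already entailed by the context.
informative(Old, New) :-
    \+ prover_proves(imp(Old, New)).

% Accept an utterance only if it passes both checks; otherwise the
% robot objects ("No, that is not true!") or notes redundancy
% ("Yes, I know!").
update(Old, New, and(Old, New)) :-
    consistent(Old, New),
    informative(Old, New).
```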
8Advanced conversation with Godot
- Dealing with inconsistency and informativity
- Human: Robot, where are you?
- Robot: I am in the hallway.
- Human: You are in my office.
- Robot: No, that is not true.
- Human: You are in the hallway.
- Robot: Yes, I know!
9Videos of Godot
Video 1: Godot in the basement of Buccleuch Place
Video 2: Screenshot of the dialogue manager with DRSs, and camera view of Godot
10Minimal Models
- Model builders normally generate models by iterating over the domain size
- As a side effect, the output is a model with a minimal domain size
- From a linguistic point of view this is interesting, as there is no redundant information (see the sketch below)
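- A sketch of this iteration strategy (try_model/3 is a hypothetical interface to the actual search for a model of a given domain size, as performed e.g. by MACE):

```prolog
% Try domain size 1, then 2, ...; the first model found therefore
% has a minimal domain. 32 is an arbitrary upper bound.
:- dynamic try_model/3.

minimal_model(Formula, Model) :-
    between(1, 32, Size),
    try_model(Formula, Size, Model),
    !.   % first success = smallest domain
```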
11Using models
- Examples:
- Turn on a light.
- Turn on every light.
- Turn on everything except the radio.
- Turn off the red light or the blue light.
- Turn on another light.
12Adding presupposition
- Godot was connected to an automated home environment
- One day, I asked Godot to switch on all the lights
- However, Godot refused to do this, responding that it was unable to do so
- Why was that?
- At first I thought that the theorem prover had made a mistake
- But it turned out that one of the lights was already on
13Intermediate Accommodation
- Because I had coded "switch on X" with the precondition that X is not on, the theorem prover found a proof
- Coding this as a presupposition would not give an inconsistency, but a beautiful case of intermediate accommodation
- In other words: "Switch on all the lights!"
- Global accommodation: all the lights are off; switch them on
- Intermediate accommodation: switch on all the lights that are currently off
14Sketch of resolution
15Global Accommodation
16Intermediate Accommodation
17Local Accommodation
18Outline
- Spoken dialogue system with DRT
- Using DRT and inference to control a mobile robot
- Wide-coverage parsing with DRT
- Recognising Textual Entailment
19Wide-coverage DRT
- Nowadays we have robust wide-coverage parsers that use stochastic methods to produce a parse tree
- Trained on the Penn Treebank
- Examples are parsers like those of Collins and Charniak
20Wide-coverage parsers
- Say we wished to produce DRSs from the output of these parsers
- We would need quite detailed syntax derivations
- Closer inspection reveals that many of these parsers use several thousand phrase-structure rules
- Often, long-distance dependencies are not recovered
21Combinatory Categorial Grammar
- CCG is a lexicalised theory of grammar (Steedman 2001)
- Deals with complex cases of coordination and long-distance dependencies
- Lexicalised, hence easy to implement
- English wide-coverage grammar
- Fast robust parser available
22Categorial Grammar
- Lexicalised theory of syntax
- Many different lexical categories
- Few grammar rules
- Finite set of categories defined over a base of core categories
- Core categories: s, np, n, pp
- Combined categories: np/n, s\np, (s\np)/np
23CCG type-driven lexicalised grammar
24CCG combinatorial rules
- Forward Application (FA)
- Backward Application (BA)
- Generalised Forward Composition (FC)
- Backward Crossed Composition (BC)
- Type Raising (TR)
- Coordination
25-31CCG derivation (built up step by step over slides 25-31)
- NP/N: a     N: spokesman     S\NP: lied
- ------------------------------- (FA)
- NP: a spokesman
- ---------------------------------------- (BA)
- S: a spokesman lied
32Coordination in CCG
- np: Artie   (s\np)/np: likes   (x\x)/x: and   np: Tony   (s\np)/np: hates   np: beans
- ---------------- (TR)                         ---------------- (TR)
- s/(s\np): Artie                               s/(s\np): Tony
- ------------------------------------ (FC)     --------------------------------------- (FC)
- s/np: Artie likes                             s/np: Tony hates
-                              ------------------------------------------------------- (FA)
-                              (s/np)\(s/np): and Tony hates
- --------------------------------------------------------------------------------- (BA)
- s/np: Artie likes and Tony hates
- -------------------------------------------------------------------------------------- (FA)
- s: Artie likes and Tony hates beans
33The Glue
- Use the lambda calculus to combine CCG with DRT
- Each lexical entry gets a DRS with lambda-bound variables, representing the missing information
- Each combinatory rule in CCG gets a semantic interpretation, again using the tools of the lambda calculus
34Interpreting Combinatory Rules
- Each combinatory rule in CCG is expressed in terms of the lambda calculus (see the sketch below)
- Forward Application: FA(α, β) = α@β
- Backward Application: BA(α, β) = β@α
- Type Raising: TR(α) = λx.x@α
- Function Composition: FC(α, β) = λx.α@(β@x)
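- A sketch of these rules in Prolog, using the lam/2 and app/2 encoding of lambda terms from the course software (the predicate names are ours):

```prolog
% The combinatory rules as operations on lambda terms, encoded as
% lam(Var, Body) and app(Function, Argument).
fa(F, A, app(F, A)).                  % Forward Application: F@A
ba(A, F, app(F, A)).                  % Backward Application: F@A
tr(A, lam(X, app(X, A))).             % Type Raising: \x.x@A
fc(F, G, lam(X, app(F, app(G, X)))).  % Composition: \x.F@(G@x)
```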
35CCG lexical semantics
36-43CCG derivation with semantics (built up step by step over slides 36-43; DRSs are written here in linear notation [referents | conditions], with ⊕ for merge)
- NP/N: a                        N: spokesman            S\NP: lied
- λp.λq.([x | ] ⊕ p@x ⊕ q@x)     λz.[ | spokesman(z)]    λx.(x@λy.[ | lied(y)])
- -------------------------------------------------- (FA)
- NP: a spokesman
- λq.([x | spokesman(x)] ⊕ q@x)
- ------------------------------------------------------------------ (BA)
- S: a spokesman lied
- λx.(x@λy.[ | lied(y)]) @ λq.([x | spokesman(x)] ⊕ q@x)
- = λq.([x | spokesman(x)] ⊕ q@x) @ λy.[ | lied(y)]
- = [x | spokesman(x), lied(x)]
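- The reductions above can be carried out mechanically. Below is a simplified beta-converter for the lam/app encoding (the course software ships a more careful betaConvert/2); substitution here is plain Prolog unification, so each abstraction can be applied only once:

```prolog
% A minimal beta-converter for lam/app terms.
beta_convert(X, X) :-
    var(X), !.
beta_convert(app(F0, A), Result) :- !,
    beta_convert(F0, F),
    (   nonvar(F), F = lam(X, Body)
    ->  X = A,                        % substitute argument for variable
        beta_convert(Body, Result)
    ;   beta_convert(A, A1),          % no redex: just reduce the argument
        Result = app(F, A1)
    ).
beta_convert(Term0, Term) :-          % descend into any other term
    Term0 =.. [Functor|Args0],
    maplist(beta_convert, Args0, Args),
    Term =.. [Functor|Args].

% Example:
% ?- beta_convert(app(lam(X, drs([], [lied(X)])), john), D).
% D = drs([], [lied(john)]).
```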
44The Clark & Curran Parser
- Uses standard statistical techniques
- Robust wide-coverage parser
- Clark & Curran (ACL 2004)
- Grammar derived from CCGbank
- 409 different categories
- Hockenmaier & Steedman (ACL 2002)
- Results: 96% coverage on the WSJ
- Bos et al. (COLING 2004)
- Example output
45Applications
- Has been used for different kinds of applications
- Question Answering
- Recognising Textual Entailment
46Recognising Textual Entailment
- A task for NLP systems: recognising entailment between two (short) texts
- Introduced in 2004/2005 as part of the PASCAL Network of Excellence
- Proved to be a difficult, but popular, task
- PASCAL provided a development and test set of several hundred examples
47RTE Example (entailment)
RTE 1977 (TRUE)
His family has steadfastly denied the charges.
-------------------------------------------------
The charges were denied by his family.
48RTE Example (no entailment)
RTE 2030 (FALSE)
Lyon is actually the gastronomical capital of France.
-------------------------------------------------
Lyon is the capital of France.
49Aristotle's Syllogisms
ARISTOTLE 1 (TRUE)
All men are mortal. Socrates is a man.
-------------------------------------------------
Socrates is mortal.
50How to deal with RTE
- There are several methods
- We will look at five of them to see how difficult
RTE actually is
51Recognising Textual Entailment
- Method 1
- Flipping a coin
52Flipping a coin
- Advantages
- Easy to implement
- Disadvantages
- Just 50% accuracy
53Recognising Textual Entailment
- Method 2
- Calling a friend
54Calling a friend
- Advantages
- High accuracy (95%)
- Disadvantages
- Lose friends
- High phone bill
55Recognising Textual Entailment
- Method 3
- Ask the audience
56Ask the audience
RTE 893 (????)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
-------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.
57Human Upper Bound
RTE 893 (TRUE)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
-------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.
58Recognising Textual Entailment
- Method 4
- Word overlap
59Word Overlap Approaches
- Popular approach
- Ranging in sophistication from a simple bag of words to the use of WordNet
- Accuracy rates: ca. 55%
60Word Overlap
- Advantages
- Relatively straightforward algorithm
- Disadvantages
- Hardly better than flipping a coin
61RTE State-of-the-Art
- PASCAL RTE challenge
- Hard problem
- Requires semantics
62Recognising Textual Entailment
- Method 5
- Semantic interpretation and inference
63Inference
- How do we perform inference with DRSs?
- Translate DRSs into first-order logic and use off-the-shelf inference engines (see the sketch after this list)
- What kind of inference engines?
- Theorem Provers
- Model Builders
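- A compact sketch of the standard DRS-to-FOL translation in Prolog term notation (the course software provides a fuller version covering more condition types, e.g. disjunction):

```prolog
% Discourse referents become existential quantifiers; referents of a
% DRS on the left of an implication are quantified universally.
drs2fol(drs([], [Cond]), Formula) :- !,
    cond2fol(Cond, Formula).
drs2fol(drs([], [Cond|Conds]), and(F1, F2)) :- !,
    cond2fol(Cond, F1),
    drs2fol(drs([], Conds), F2).
drs2fol(drs([X|Refs], Conds), some(X, Formula)) :-
    drs2fol(drs(Refs, Conds), Formula).

cond2fol(not(Drs), not(Formula)) :- !,
    drs2fol(Drs, Formula).
cond2fol(imp(drs([], Conds), Drs2), imp(F1, F2)) :- !,
    drs2fol(drs([], Conds), F1),
    drs2fol(Drs2, F2).
cond2fol(imp(drs([X|Refs], Conds), Drs2), all(X, Formula)) :- !,
    cond2fol(imp(drs(Refs, Conds), Drs2), Formula).
cond2fol(Basic, Basic).               % atomic conditions pass through

% Example: [x | spokesman(x), lied(x)]
% ?- drs2fol(drs([X], [spokesman(X), lied(X)]), F).
% F = some(X, and(spokesman(X), lied(X))).
```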
64Using Theorem Proving
- Given a textual entailment pair T/H, with text T and hypothesis H:
- Produce DRSs for T and H
- Translate these DRSs into FOL
- Give this to the theorem prover:
- T → H
- If the theorem prover finds a proof, then T entails H (see the sketch below)
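- The same procedure as a Prolog sketch (parse_to_drs/2 and prover_proves/1 are hypothetical interfaces to the parser and to a prover such as Vampire; drs2fol/2 is the translation sketched earlier):

```prolog
% Proof-based entailment check for a T/H pair.
:- dynamic parse_to_drs/2, prover_proves/1.

entails(TextT, TextH) :-
    parse_to_drs(TextT, DrsT),        % DRSs for text and hypothesis
    parse_to_drs(TextH, DrsH),
    drs2fol(DrsT, FolT),              % translate both into FOL
    drs2fol(DrsH, FolH),
    prover_proves(imp(FolT, FolH)).   % try to prove T -> H
```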
65Vampire (Riazanov & Voronkov 2002)
- Let's try this. We will use the theorem prover Vampire (currently the best-known theorem prover for FOL)
- This gives us good results for:
- apposition
- relative clauses
- coordination
- intersective adjectives/complements
- passive/active alternations
66Example (Vampire proof)
RTE-2 112 (TRUE)
On Friday evening, a car bomb exploded outside a Shiite mosque in Iskandariyah, 30 miles south of the capital.
-------------------------------------------------
A bomb exploded outside a mosque.
67Example (Vampire proof)
RTE-2 489 (TRUE)
Initially, the Bundesbank opposed the introduction of the euro but was compelled to accept it in light of the political pressure of the capitalist politicians who supported its introduction.
-------------------------------------------------
The introduction of the euro has been opposed.
68Background Knowledge
- However, it doesn't give us good results for cases requiring additional knowledge:
- Lexical knowledge
- World knowledge
- We will use WordNet as a start to get additional knowledge
- All of WordNet is too much, so we create MiniWordNets
69MiniWordNets
- MiniWordNets:
- Use hyponym relations from WordNet to build an ontology
- Do this only for the relevant symbols
- Convert the ontology into first-order axioms (see the sketch below)
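- A sketch of the conversion step (the hyponym_of/2 facts are a hypothetical extract from WordNet, restricted to the relevant symbols):

```prolog
% Turning MiniWordNet hyponym edges into first-order axioms.
hyponym_of(user, person).
hyponym_of(worker, person).
hyponym_of(researcher, person).

% Each edge Hypo -> Hyper becomes  all x (Hypo(x) -> Hyper(x)).
isa_axiom(all(X, imp(HypoAtom, HyperAtom))) :-
    hyponym_of(Hypo, Hyper),
    HypoAtom =.. [Hypo, X],
    HyperAtom =.. [Hyper, X].

% ?- isa_axiom(A).
% A = all(X, imp(user(X), person(X))) ;
% ...
```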
70MiniWordNet an example
- Example text
- There is no asbestos in our products now.
Neither Lorillard nor the researchers who studied
the workers were aware of any research on smokers
of the Kent cigarettes.
72(Figure: the MiniWordNet ontology built for the example text)
73∀x(user(x) → person(x))  ∀x(worker(x) → person(x))  ∀x(researcher(x) → person(x))
74∀x(person(x) → ¬risk(x))  ∀x(person(x) → ¬cigarette(x))  …
75Using Background Knowledge
- Given a textual entailment pair T/H, with text T and hypothesis H:
- Produce DRSs for T and H
- Translate drs(T) and drs(H) into FOL
- Create background knowledge for T and H
- Give this to the theorem prover:
- (BK ∧ T) → H (see the sketch below)
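- A sketch of the extended check: conjoin the generated axioms with T and hand (BK ∧ T) → H to the prover (reusing isa_axiom/1 and prover_proves/1 from the earlier sketches):

```prolog
% Entailment check with background knowledge.
conjoin([], Formula, Formula).
conjoin([Axiom|Axioms], Formula0, Formula) :-
    conjoin(Axioms, and(Axiom, Formula0), Formula).

entails_with_bk(FolT, FolH) :-
    findall(Axiom, isa_axiom(Axiom), Axioms),
    conjoin(Axioms, FolT, Premise),    % Premise = BK & T
    prover_proves(imp(Premise, FolH)).
```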
76MiniWordNets at work
RTE 1952 (TRUE)
Crude oil prices soared to record levels.
-------------------------------------------------
Crude oil prices rise.
- Background knowledge: ∀x(soar(x) → rise(x))
77Troubles with theorem proving
- Theorem provers are extremely precise
- They won't tell you when there is "almost" a proof
- Even if there is a little background knowledge missing, Vampire will say: NO
78Vampire no proof
RTE 1049 (TRUE)
Four Venezuelan firefighters who were traveling to a training course in Texas were killed when their sport utility vehicle drifted onto the shoulder of a highway and struck a parked truck.
-------------------------------------------------
Four firefighters were killed in a car accident.
79Using Model Building
- We need a robust way of doing inference
- Use the model builder Paradox
- Claessen & Sörensson (2003)
- Use the size of the (minimal) model
- Compare the sizes of the models of T and T∧H
- If the difference is small, then it is likely that T entails H
80Using Model Building
- Given a textual entailment pair T/H, with text T and hypothesis H:
- Produce DRSs for T and H
- Translate these DRSs into FOL
- Generate background knowledge
- Give this to the model builder:
- i) BK ∧ T
- ii) BK ∧ T ∧ H
- If the models for i) and ii) are similar in size, then we predict that T entails H (see the sketch below)
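- A sketch of this decision procedure (build_model/2 is a hypothetical wrapper around a model builder such as Paradox, assumed to return a term model(Domain, Interpretation); the zero threshold is a toy choice, since the actual decision is learned, see the results below):

```prolog
% The model-size criterion.
:- dynamic build_model/2.

predict(BK, FolT, FolH, Prediction) :-
    build_model(and(BK, FolT), ModelT),
    build_model(and(BK, and(FolT, FolH)), ModelTH),
    domain_size(ModelT, SizeT),
    domain_size(ModelTH, SizeTH),
    Diff is SizeTH - SizeT,
    (   Diff =< 0                     % H introduced no new entities
    ->  Prediction = entailment
    ;   Prediction = no_entailment
    ).

domain_size(model(Domain, _Interpretation), Size) :-
    length(Domain, Size).
```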
81Example 1
- T: John met Mary in Rome. H: John met Mary.
- Model of T: 3 entities. Model of T∧H: 3 entities.
- Model size difference: 0
- Prediction: entailment
82Example 2
- T: John met Mary. H: John met Mary in Rome.
- Model of T: 2 entities. Model of T∧H: 3 entities.
- Model size difference: 1
- Prediction: no entailment
83Model size differences
- Of course this is a very rough approximation
- But it turns out to be a useful one
- It gives us a notion of robustness
- Of course we need to deal with negation as well
- Give ¬T and ¬(T∧H) to the model builder
- There is not necessarily one unique minimal model
84Lack of Background Knowledge
RTE-2 235 (TRUE)
Indonesia says the oil blocks are within its
borders, as does Malaysia, which has also sent
warships to the area, claiming that its waters
and airspace have been violated. ----------------
----------------------------------------------- Th
ere is a territorial waters dispute.
85How well does this work?
- We tried this at RTE 2004/05
- We combined this with a shallow approach (word overlap)
- Using standard machine learning methods to build a decision tree (see the sketch below)
- Features used:
- Proof (yes/no)
- Model size
- Model size difference
- Word overlap
- Task (source of the RTE pair)
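- A sketch of how the feature vector might be assembled (every interface predicate here is a hypothetical wrapper around one of the components described in the preceding slides):

```prolog
% Assembling the feature vector for the decision-tree learner.
:- dynamic proof_found/3, model_size/2, model_size_difference/3,
           overlap_score/3, task_of/2.

rte_features(T, H, [Proof, Size, Diff, Overlap, Task]) :-
    proof_found(T, H, Proof),            % yes/no from the theorem prover
    model_size(T, Size),                 % size of the model of T
    model_size_difference(T, H, Diff),   % from the model builder
    overlap_score(T, H, Overlap),        % shallow word overlap score
    task_of(T, Task).                    % source of the RTE pair
```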
86RTE Results 2004/5
Bos & Markert (2005)
87Conclusions
- We now have the tools for doing computational semantics in a principled way using DRT
- For many applications, success depends on the ability to systematically generate background knowledge
- Small restricted domains: dialogue
- Open domain
88What we did in this course
- We introduced DRT, a notational variant of first-order logic
- Semantically, we can handle in DRT anything we can in FOL, including events
- Moreover, because it is so close to FOL, we can use first-order methods to implement inference for DRT
- The DRT box syntax is essentially about nesting contexts, which allows a uniform treatment of anaphoric phenomena
- Moreover, this works not only at the theoretical level, but is also implementable, and even applicable
89What we hope you got out of it
- First, we hope we have made you aware that computational semantics is nowadays able to handle some difficult problems
- Second, we hope we have made you aware that DRT is not just a theory: it is a complete architecture that allows us to experiment with computational semantics
- Third, we hope you are aware that state-of-the-art inference engines can help to study or apply semantics
90Where you can find more
- For more on DRT read the standard textbook
devoted to DRT by Kamp and Reyle. This book
discusses not only the basic theory, but also
plurals, tense, and aspect.
91Where you can find more
- For more on the basic architecture underlying this work on computational semantics, in particular on implementations of the lambda calculus and the parallel use of theorem provers and model builders, see
- www.blackburnbos.org
92Where you can find more
- All of the theory we discussed in this course is
implemented in Prolog. This software can be
downloaded from www.blackburnbos.org. For an
introduction to Prolog written very much with
this software in mind, try Learn Prolog Now!
www.learnprolognow.org