Title: Systematic Mismatches Across Annotations
1Systematic Mismatches Across Annotations
- Alan Lee and Aravind Joshi
- Institute for Research in Cognitive Science, Department of Computer and Information Science, University of Pennsylvania
- ULA Workshop, U of Colorado, Boulder
- March 2008
2Preliminaries
- We observe that certain annotated features of the Penn Discourse Treebank 2.0 (PDTB) do not match up neatly with annotations at the syntactic level.
- What do certain mismatches suggest for linguistic theory? How do we get from syntax to discourse?
- How does this affect NLP applications?
3Outline
- Attribution spans
- Parallel Connectives
- AltLex
- Polarity and Determinacy
5Attribution Spans
- Relation between agents and abstract objects (discourse relations or their arguments)
- Annotation: text spans and four features (source, type, polarity, determinacy). More on the features later.
6- There have been no orders for the Cray-3 so far, though the company says it is talking with several prospects.
- Discourse semantics: contrary-to-expectation relation between there being no orders for the Cray-3 and there being a possibility of some prospects.
- Sentence semantics: contrary-to-expectation relation between there being no orders for the Cray-3 and the company saying something.
7- Although takeover experts said they doubted Mr. Steinberg will make a bid by himself, the application by his Reliance Group Holdings Inc. could signal his interest in helping revive a failed labor-management bid.
- Discourse semantics: contrary-to-expectation relation between Mr. Steinberg not making a bid by himself and the RGH application signaling his bidding interest.
- Sentence semantics: contrary-to-expectation relation between experts saying something and the RGH application signaling Mr. Steinberg's bidding interest.
8- Mismatches occur with other relations as well, such as causal relations.
- Investors are nervous about the issue because they say the company's ability to meet debt payments is dependent on too many variables, including the sale of assets and the need to mortgage property to retire some existing debt.
- Discourse semantics: causal relation between investors being nervous and problems with the company's ability to meet debt payments.
- Sentence semantics: causal relation between investors being nervous and investors saying something!
9How to address mismatch?
- One possibility: treat attribution as a different layer of structure in discourse (and also in syntax?).
- This has the effect of reducing the complexity of the discourse structure.
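One way to picture the separate-layer idea, as a minimal Python sketch: the attribution span is stored as its own stand-off annotation and cut out of the argument before the discourse relation is interpreted. The names `Span`, `find_span` and `strip_attribution` are hypothetical and are not part of any PDTB tool. The offsets are computed on the Cray-3 example rather than taken from the corpus files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def find_span(text: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase in text (illustrative helper)."""
    start = text.index(phrase)
    return Span(start, start + len(phrase))


def strip_attribution(text: str, arg: Span, attribution: Span) -> str:
    """Return the argument text with the attribution span removed."""
    # The attribution span must lie inside the clause it is removed from.
    assert arg.start <= attribution.start and attribution.end <= arg.end
    kept = text[arg.start:attribution.start] + text[attribution.end:arg.end]
    return " ".join(kept.split())  # normalize leftover whitespace


SENTENCE = (
    "There have been no orders for the Cray-3 so far, though the company "
    "says it is talking with several prospects."
)

clause = find_span(SENTENCE, "the company says it is talking with several prospects")
attribution = find_span(SENTENCE, "the company says")
arg_without_attribution = strip_attribution(SENTENCE, clause, attribution)
print(arg_without_attribution)
```

With the attribution held in its own layer, the contrary-to-expectation relation links "no orders" to "talking with several prospects" rather than to the saying event.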
10Discourse Graphbank (Wolf & Gibson 2005)
- Farm prices in October edged up 0.7% from September
- as raw milk prices continued their rise,
- the Agriculture Department said.
- Milk sold to the nation's dairy plants and dealers averaged $14.50 for each hundred pounds,
- up 50 cents from September and up $1.50 from October 1988,
- the department said.
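11[Figure: Discourse Graphbank graph over segments 1 to 6 of the example, with grouped nodes 1-2 and 4-5. Edge labels shown: sim, elab, attr, attr, ce, elab.]
ce - cause/effect; elab - elaboration; sim - similarity; attr - attribution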
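12[Figure: the same graph with attribution recorded on the nodes (3,attr and 6,attr) instead of as separate edges. Grouped nodes 1-2 and 4-5 remain. Edge labels shown: elab, ce, elab.]
ce - cause/effect; elab - elaboration; sim - similarity; attr - attribution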
13Residual issues
Does attribution scope over the entire relation, or just Arg1?
Even if B.A.T receives approval for the restructuring, the company will remain in play, say shareholders and analysts, though the situation may unfold over the next 12 months, rather than six.
Arg1: attributed to shareholders and analysts
Rel and Arg2: attributed to Writer
Guideline: in case of doubt, attribute to the Writer
14Residual issues
- Attribution cannot always be excluded by default
- Advocates said the 90-cent-an-hour rise, to $4.25 an hour by April 1991, is too small for the working poor, while opponents argued that the increase will still hurt small business and cost many thousands of jobs.
What implications does this have for the approach
of treating attribution as an independent layer
of discourse?
15Outline
- Attribution spans
- Parallel Connectives
- AltLex
- Polarity and Determinacy
16Parallel Connectives
- Either he wasn't being real in the past or he isn't being real right now. (1549)
- You've either got a chair or you don't. (2428)
- If the answers to these questions are affirmative, then these institutional investors are likely to be favorably disposed toward a specific poison pill. (0275)
- Parallel connectives are annotated discontinuously
- In the PDTB, both parts of a parallel connective are treated as equally prominent (no hierarchical relationship)
17In the Penn Treebank, the treatment of a parallel connective depends on its position within the sentence. When "either" is sentence-initial, both "either" and "or" are annotated as CC.
- Either he wasn't being real in the past or he isn't being real right now. (wsj_1549)
(S (CC Either) (S he wasn't being real in the past) (CC or) (S he isn't being real right now))
18This is not possible when "either" is sentence-medial. Here, "either" is treated as an RB and "or" as a CC.
- You've either got a chair or you don't. (wsj_2428)
(S (S (NP-SBJ You) (VP 've (ADVP (RB either)) (VP got a chair))) (CC or) (S you don't))
19- How to represent a parallel connective?
- D-LTAG approach: elementary discourse tree with two lexical anchors (DC = discourse clause)
(DC Either DC↓ or DC↓)
(DC DC↓ because DC↓)
But the question remains: how to transition from syntactic structure to discourse structure?
20Outline
- Attribution spans
- Parallel Connectives
- AltLex
- Polarity and Determinacy
21Alternative Lexicalization (AltLex)
- A discourse relation is inferred between two sentences that do not contain an Explicit connective, but inserting an Implicit connective leads to redundancy. This is because the relation is alternatively lexicalized by some non-connective expression.
- Under a post-1987 crash reform, the Chicago Mercantile Exchange wouldn't permit the December S&P futures to fall further than 12 points for a half hour. AltLex (consequence) That caused a brief period of panic selling of stocks on the Big Board.
22Discourse Connectives and Syntactic Constituency
- Most explicit connectives correspond to syntactic constituents, e.g., because (IN), but (CC), as a result (PP).
- Some small exceptions with parallel connectives, as we have seen.
23- AltLex expressions often do not correspond to syntactic constituents.
- Under a post-1987 crash reform, the Chicago Mercantile Exchange wouldn't permit the December S&P futures to fall further than 12 points for a half hour. AltLex (consequence) That caused a brief period of panic selling of stocks on the Big Board.
(S (NP-SBJ (DT That)) (VP (VBD caused) a brief period of panic selling ...))
The AltLex string "That caused" spans the subject NP and the verb, which do not form a single constituent.
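The constituency mismatch can be checked mechanically. The sketch below parses a bracketed tree and tests whether a word sequence is the yield of some node. The bracketing in `TREE` is a simplified, hand-written assumption for illustration, not the Penn Treebank parse of this sentence, and the function names are hypothetical. The check compares word sequences rather than positions, which is enough for a single short sentence.

```python
def parse(bracketed):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return node()


def leaves(tree):
    """Return the words under a node, left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words


def constituents(tree):
    """Return the yields (as word tuples) of every node in the tree."""
    _, children = tree
    found = {tuple(leaves(tree))}
    for child in children:
        if not isinstance(child, str):
            found |= constituents(child)
    return found


def is_constituent(tree, phrase):
    return tuple(phrase.split()) in constituents(tree)


# Simplified bracketing, assumed for illustration only.
TREE = parse(
    "(S (NP-SBJ (DT That)) (VP (VBD caused) (NP (NP (DT a) (JJ brief) "
    "(NN period)) (PP (IN of) (NP (NN panic) (NN selling))))))"
)

print(is_constituent(TREE, "That caused"))
print(is_constituent(TREE, "caused a brief period of panic selling"))
```

Under this bracketing the first check fails and the second succeeds: the verb groups with its object inside the VP, so the AltLex string has no node of its own.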
24- For a list of AltLex expressions annotated in the PDTB: http://www.seas.upenn.edu/pdtb/altlex-strings.txt
- Or search using the PDTB Browser (shameless plug): http://www.seas.upenn.edu/pdtb/PDTBAPI/pdtbbrowser.jnlp
25Outline
- Attribution spans
- Parallel Connectives
- AltLex
- Polarity and Determinacy
26Attribution Features
- Attribution is annotated on relations and arguments, with FOUR features.
- Source: encodes the different agents to whom a proposition is attributed
  - Wr: Writer agent
  - Ot: Other non-writer agent
  - Arb: Generic/Arbitrary non-writer agent
  - Inh: Used only for arguments; attribution inherited from relation
- Type: encodes different types of Abstract Objects
  - Comm: Verbs of communication
  - PAtt: Verbs of propositional attitude
  - Ftv: Factive verbs
  - Ctrl: Control verbs
  - Null: Used only for arguments with no explicit attribution
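The four features can be pictured as a small record type. The following is a minimal sketch, not the PDTB file format: the class names, the field names and the `check` helper are hypothetical, and only the feature values (Wr/Ot/Arb/Inh, Comm/PAtt/Ftv/Ctrl/Null, Neg/Null, Indet/Null) come from the slides. The example labels for the Cray-3 clause are an illustrative reading, not copied from the corpus.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    WR = "Wr"    # writer
    OT = "Ot"    # other non-writer agent
    ARB = "Arb"  # generic/arbitrary non-writer agent
    INH = "Inh"  # arguments only: inherited from the relation


class AttrType(Enum):
    COMM = "Comm"  # verbs of communication
    PATT = "PAtt"  # verbs of propositional attitude
    FTV = "Ftv"    # factive verbs
    CTRL = "Ctrl"  # control verbs
    NULL = "Null"  # arguments only: no explicit attribution


class Polarity(Enum):
    NEG = "Neg"
    NULL = "Null"


class Determinacy(Enum):
    INDET = "Indet"
    NULL = "Null"


@dataclass(frozen=True)
class Attribution:
    source: Source
    attr_type: AttrType
    polarity: Polarity = Polarity.NULL
    determinacy: Determinacy = Determinacy.NULL


def check(attribution: Attribution, on_argument: bool) -> None:
    """Reject values the slides reserve for arguments when used on a relation."""
    if not on_argument and attribution.source is Source.INH:
        raise ValueError("Inh is used only for arguments")
    if not on_argument and attribution.attr_type is AttrType.NULL:
        raise ValueError("Null type is used only for arguments")


# Illustrative labels for "the company says it is talking with several prospects".
arg2_attribution = Attribution(Source.OT, AttrType.COMM)
check(arg2_attribution, on_argument=True)
print(arg2_attribution.source.value, arg2_attribution.attr_type.value)
```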
27Polarity vs Determinacy
- Polarity: indicates narrow scope of surface-negated attributions (neg-raising, Klima 1964). Marked as Neg when neg-raising occurs, Null otherwise.
  - John doesn't think the book fell (= John thinks the book didn't fall)
- Determinacy: attributions rendered indeterminate in certain contexts. Marked as Indet, or Null otherwise.
  - John didn't say the book fell (no lowering of negation)
- Only a certain class of verbs can have negative polarity, i.e., induce neg-raising. Verbs of propositional attitude (PAtt) have this behavior, but not others.
28Polarity vs Determinacy
- I don't believe they have the culture to adequately service high-net-worth individuals. (0927)
- Discourse semantics: I believe they DO NOT have the culture to adequately service high-net-worth individuals. (0927)
- Negation of "believe" is lowered onto the argument. The attribution is marked as negative polarity.
- Note that the attribution event of believing did occur (is determinate).
29Polarity vs Determinacy
- It didn't say if its earlier results were influenced significantly by nonrecurring elements. (1711)
- Negation of "say" is NOT lowered onto the argument. The attribution is marked as indeterminate.
- The attribution event (of saying) did not actually occur.
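The contrast between the two examples can be summarized as a decision rule. This is a deliberately simplified sketch. In the PDTB these features were assigned by annotators in context, not by rule, and the function name and dictionary layout are hypothetical. The PAtt branch follows the statement that only propositional-attitude verbs induce neg-raising. The Comm branch generalizes from the single "say" example, which is an assumption. Other verb types are left at Null because the slides do not cover them.

```python
def negation_features(verb_type: str, negated: bool) -> dict:
    """Map a surface-negated attribution verb to polarity and determinacy.

    verb_type is one of "Comm", "PAtt", "Ftv", "Ctrl".
    """
    features = {"polarity": "Null", "determinacy": "Null"}
    if not negated:
        return features
    if verb_type == "PAtt":
        # Neg-raising: the negation is interpreted on the argument.
        features["polarity"] = "Neg"
    elif verb_type == "Comm":
        # No lowering: the attribution itself becomes indeterminate.
        features["determinacy"] = "Indet"
    # Ftv and Ctrl are not discussed in the slides; left unchanged here.
    return features


# "I don't believe they have the culture ..." (believe: PAtt)
believe = negation_features("PAtt", negated=True)
# "It didn't say if its earlier results were influenced ..." (say: Comm)
say = negation_features("Comm", negated=True)
print(believe)
print(say)
```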
30At Syntactic Level
- At which level should the discrepancy between the polarity and determinacy types of negation be captured?
- In PropBank, negations of attribution verbs are uniformly marked as a negative feature on the adjunct ARGM.
- In TimeML, they carry a polarity feature of Neg.
- I don't BELIEVE they have the culture to adequately service high-net-worth individuals.
  - ARG1: I
  - ARG2: they have the culture
  - ARGM: Neg (PropBank); no Neg for the lower predicate "have"
  - POLARITY: Neg (TimeML)
- Should the negation be marked as ARGM for the lower predicate ("have") instead?
31At Syntactic Level
- It didn't SAY if its earlier results were influenced significantly by nonrecurring elements.
  - ARG1: It
  - ARG2: if its earlier results were influenced significantly by nonrecurring elements
  - ARGM: Neg (PropBank)
  - POLARITY: Neg (TimeML)
- The saying event is indeterminate. Does this still count as an event?
- How to order this temporally?
32Some questions
- How much of discourse is projected from syntax?
- Is there a need for a different architecture, different building blocks?
- How are these issues manifested cross-linguistically? Currently, discourse annotation work is being done for Hindi, Turkish, Czech and Finnish (possibly).