Title: Extreme underspecification
1 Extreme underspecification
- Using semantics to integrate deep and shallow processing
2 Acknowledgements
- Alex Lascarides, Ted Briscoe, Simone Teufel, Dan Flickinger, Stephan Oepen, John Carroll, Anna Ritchie, Ben Waldron
- Deep Thought project members
- Cambridge Masters students
- Other colleagues at Cambridge, Saarbrücken, Edinburgh, Brighton, Sussex and Oxford
3 Talk overview
- Why integrate deep and shallow processing?
- ... and why use compositional semantics?
- Semantics from shallow processing
- Flattening deep semantics
- Underspecification
- Minimal semantic units
- Composition without lambdas
- Integration experiments with broad-coverage systems/grammars (LinGO ERG and RASP)
- How does this fit with deeper semantics?
4 Deep processing
- Detailed, linguistically motivated: e.g., HPSG, LFG, TAG, varieties of CG
- Precise, detailed compositional semantics: generation possible as well as parsing
- Some are broad-coverage and fast enough for real-time applications
- BUT not robust (coverage gaps, ill-formed input), too slow for IE etc., massive ambiguity
5 Shallow (and intermediate) processing
- Shallow: e.g., POS tagging, NP chunking
- Intermediate: e.g., grammars with only a POS tag lexicon (RASP)
- Fast and robust, with integrated stochastic techniques for disambiguation
- BUT no long-distance dependencies; allow ungrammatical input (so limitations for generation); no conventional semantics without subcategorization
6 Why integrate deep and shallow processing?
- Complementary strengths and weaknesses
- The weaknesses of each are inherent: more complexity means a larger search space and greater information requirements
- Hand-coding vs machine learning is not the main issue: treebanking costs, sparse-data problems
- The lexicon is the crucial resource difference between deep and shallow approaches
7 Applications that may benefit from integrated approaches
- Summarization
  - shallow parsing to identify possible key passages, deep processing to check and combine
- Email response
  - deep parser uses shallow parsing for disambiguation, backs off on parse failure
- Information extraction
  - shallow first (as in summarization), named entities
- Question answering
  - deep parse questions, shallow parse answers
8 Compositional semantics as the common representation
- Need a common representation language: pairwise compatibility between systems is too limiting
- Syntax is theory-specific
- The eventual goal should be semantics
- Crucial idea: shallow processing gives an underspecified semantic representation
9 Shallow processing and underspecified semantics
- Integrated parsing: shallow-parsed phrases incorporated into deep-parsed structures
- Deep parsing invoked incrementally in response to information needs
- Reuse of knowledge sources: domain knowledge, recognition of named entities, transfer rules in MT
- Integrated generation
- Formal properties clearer, representations more generally usable
10 Semantics from POS tagging
- every_AT1 cat_NN1 chase_VVD some_AT1 dog_NN1
- _every_q(x1), _cat_n(x2:sg), _chase_v(e:past), _some_q(x3), _dog_n(x4:sg)
- Tag lexicon:
  - AT1: _lemma_q(x)
  - NN1: _lemma_n(x:sg)
  - VVD: _lemma_v(e:past)
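The tag lexicon above can be sketched in a few lines of Python (an illustrative reconstruction, not the actual system; the function names and variable-numbering scheme are invented):

```python
# Minimal sketch of the tag lexicon on this slide: each POS tag maps a
# lemma to an underspecified elementary predication.

def predication_from_tag(lemma, tag, var_id):
    """Map a (lemma, POS tag) pair to a predication, per the tag lexicon."""
    if tag == "AT1":                       # singular article -> quantifier
        return f"_{lemma}_q(x{var_id})"
    if tag == "NN1":                       # singular noun
        return f"_{lemma}_n(x{var_id}:sg)"
    if tag == "VVD":                       # past-tense verb
        return f"_{lemma}_v(e{var_id}:past)"
    raise ValueError(f"no lexical entry for tag {tag}")

def semantics_from_tagged(tokens):
    """Build the flat, underspecified semantics for a tagged sentence."""
    return [predication_from_tag(lemma, tag, i + 1)
            for i, (lemma, tag) in enumerate(tokens)]

sent = [("every", "AT1"), ("cat", "NN1"), ("chase", "VVD"),
        ("some", "AT1"), ("dog", "NN1")]
print(semantics_from_tagged(sent))
# ['_every_q(x1)', '_cat_n(x2:sg)', '_chase_v(e3:past)',
#  '_some_q(x4)', '_dog_n(x5:sg)']
```

Note that no lexicon beyond the tagset is needed: the lemma is taken straight from the tagged token.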
11 Deep parser output
- Conventional semantic representation
- Every cat chased some dog
- every(x:sg, cat(x:sg), some(y:sg, dog1(y:sg), chase(e:sp, x:sg, y:sg)))
- some(y:sg, dog1(y:sg), every(x:sg, cat(x:sg), chase(e:sp, x:sg, y:sg)))
- Compositional: reflects morphology and syntax
- Scope ambiguity
12 Modifying the syntax of deep grammar semantics: overview
- Underspecification of quantifier scope; in this talk, using Minimal Recursion Semantics (MRS)
- Robust MRS:
  - Separating relations
  - Explicit equalities
  - Conventions for predicate names and sense distinctions
  - Hierarchy of sorts on variables
13 Scope underspecification
- Standard logical forms can be represented as trees
- Underspecified logical forms are partial trees (or descriptions of sets of trees)
- Constraints on scope control how trees may be reconstructed
14 Logical forms
- Generalized quantifier notation:
  - every(x:sg, cat(x:sg), some(y:sg, dog1(y:sg), chase(e:sp, x:sg, y:sg)))
    = forall x. cat(x) implies exists y. dog1(y) and chase(e,x,y)
  - some(y:sg, dog1(y:sg), every(x:sg, cat(x:sg), chase(e:sp, x:sg, y:sg)))
    = exists y. dog1(y) and forall x. cat(x) implies chase(e,x,y)
- Event variables: e.g., chase(e,x,y)
15 PC trees
[Figure: the two scoped logical forms for "Every cat chased some dog" drawn as trees — one with every outscoping some, one with some outscoping every.]
Every cat chased some dog
16 PC trees share structure
[Figure: the same two trees, showing that the subtrees for cat, dog1 and chase and their variables are common to both scopings.]
17 Bits of trees
Reconstruction conditions: tree-ness, variable binding
[Figure: an isolated tree fragment for chase(e,x,y).]
18 Label nodes and holes
Valid solutions equate holes and labels; h0 is the hole corresponding to the top of the tree.
[Figure: the tree with labelled nodes lb1:every, lb2:cat, lb3:chase, lb4:some, lb5:dog1 and holes h0, h6, h7.]
19 Maximize splitting
Constraints: h8=lb5, h9=lb2
[Figure: the fully split structure — lb1:every with holes h9, h6; lb4:some with holes h8, h7; floating labels lb2:cat, lb3:chase, lb5:dog1; top hole h0.]
20 Notation for underspecified scope
- lb1:every(x,h9,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y,h8,h7), lb3:chase(e,x,y)
- top = h0, h9=lb2, h8=lb5
- MRS actually uses h9 qeq lb2, h8 qeq lb5
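To see how these constraints cut down the space of readings, here is a toy brute-force resolver for just this example (the data layout and all names are invented for illustration; real MRS solvers are far more sophisticated and use qeq constraints rather than plain equalities):

```python
from itertools import permutations

# Toy scope resolver: plug each free hole with a label so that the
# result is a well-formed tree covering every predication.

EPS = {                   # label -> (predicate, that predication's holes)
    "lb1": ("every(x)", ["h9", "h6"]),   # h9 = restriction, h6 = body
    "lb2": ("cat(x)", []),
    "lb3": ("chase(e,x,y)", []),
    "lb4": ("some(y)", ["h8", "h7"]),    # h8 = restriction, h7 = body
    "lb5": ("dog1(y)", []),
}
FIXED = {"h9": "lb2", "h8": "lb5"}   # the constraints from the slide
FREE_HOLES = ["h0", "h6", "h7"]      # h0 is the top of the tree
FREE_LABELS = ["lb1", "lb3", "lb4"]

def subtree(label, plug):
    """Set of labels reachable from `label` once holes are plugged."""
    seen = {label}
    for hole in EPS[label][1]:
        seen |= subtree(plug[hole], plug)
    return seen

def solutions():
    found = []
    for perm in permutations(FREE_LABELS):
        plug = dict(FIXED, **dict(zip(FREE_HOLES, perm)))
        if subtree(plug["h0"], plug) != set(EPS):
            continue             # not a tree using every label once
        # variable binding: chase(e,x,y) must sit inside both bodies
        if all("lb3" in subtree(plug[body], plug) for body in ("h6", "h7")):
            found.append(plug)
    return found

for s in solutions():
    print(s)   # exactly the two scopings survive
```

Of the six candidate pluggings, only two satisfy tree-ness and variable binding: h0=lb1 (every outscopes some) and h0=lb4 (some outscopes every).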
21 Extreme underspecification
- Splitting up predicate-argument structure
- Explicit equalities
- Hierarchies for predicates and sorts
- Goal is to split the semantic representation into minimal components
22 Separating arguments
- lb1:every(x,h9,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y,h8,h7), lb3:chase(e,x,y), h9=lb2, h8=lb5
- goes to
- lb1:every(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase(e), ARG1(lb3,x), ARG2(lb3,y), h9=lb2, h8=lb5
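The translation is mechanical, as a small sketch shows (a hypothetical helper, not the actual MRS–RMRS converter; the argument-name lists are assumptions for this example):

```python
# Sketch: split each predication into a one-place relation plus
# separate binary argument facts, as on the slide above.

QUANT_ARGS = ["RSTR", "BODY"]            # argument names for quantifiers
OTHER_ARGS = ["ARG1", "ARG2", "ARG3"]    # argument names otherwise

def separate(label, pred, args, is_quant):
    """lb1:every(x,h9,h6) -> ['lb1:every(x)', 'RSTR(lb1,h9)', 'BODY(lb1,h6)']"""
    names = QUANT_ARGS if is_quant else OTHER_ARGS
    units = [f"{label}:{pred}({args[0]})"]   # keep only the first argument
    units += [f"{name}({label},{arg})" for name, arg in zip(names, args[1:])]
    return units

print(separate("lb1", "every", ["x", "h9", "h6"], True))
# ['lb1:every(x)', 'RSTR(lb1,h9)', 'BODY(lb1,h6)']
print(separate("lb3", "chase", ["e", "x", "y"], False))
# ['lb3:chase(e)', 'ARG1(lb3,x)', 'ARG2(lb3,y)']
```

Each unit can now be asserted (or omitted) independently, which is what lets a shallow processor emit only the pieces it knows.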
23 Explicit equalities
- lb1:every(x1), RSTR(lb1,h9), BODY(lb1,h6)
- lb2:cat(x2)
- lb5:dog1(x4)
- lb4:some(x3), RSTR(lb4,h8), BODY(lb4,h7)
- lb3:chase(e), ARG1(lb3,x2), ARG2(lb3,x4)
- h9=lb2, h8=lb5, x1=x2, x3=x4
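One way to maintain such equalities is a union-find structure, so that new equations can be added monotonically without rewriting any predication (an implementation sketch, not part of the system described in the talk):

```python
# Sketch: variable equalities (x1=x2, x3=x4) kept as a union-find, so
# merging deep and shallow output only ever adds equations.

class Equalities:
    def __init__(self):
        self.parent = {}

    def find(self, v):
        """Representative of v's equivalence class (path halving)."""
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def equate(self, a, b):
        """Assert a = b."""
        self.parent[self.find(a)] = self.find(b)

    def equal(self, a, b):
        return self.find(a) == self.find(b)

eqs = Equalities()
eqs.equate("x1", "x2")   # every's variable = cat's variable
eqs.equate("x3", "x4")   # some's variable = dog1's variable
print(eqs.equal("x1", "x2"), eqs.equal("x1", "x3"))   # True False
```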
24 Naming conventions
- lb1:_every_q(x1:sg), RSTR(lb1,h9), BODY(lb1,h6)
- lb2:_cat_n(x2:sg)
- lb5:_dog_n_1(x4:sg)
- lb4:_some_q(x3:sg), RSTR(lb4,h8), BODY(lb4,h7)
- lb3:_chase_v(e:sp), ARG1(lb3,x2:sg), ARG2(lb3,x4:sg)
- h9=lb2, h8=lb5, x1:sg=x2:sg, x3:sg=x4:sg
25 POS output as underspecification
- DEEP:
  - lb1:_every_q(x1:sg), RSTR(lb1,h9), BODY(lb1,h6), lb2:_cat_n(x2:sg), lb5:_dog_n_1(x4:sg), lb4:_some_q(x3:sg), RSTR(lb4,h8), BODY(lb4,h7), lb3:_chase_v(e:sp), ARG1(lb3,x2:sg), ARG2(lb3,x4:sg), h9=lb2, h8=lb5, x1:sg=x2:sg, x3:sg=x4:sg
- POS:
  - lb1:_every_q(x1), lb2:_cat_n(x2:sg), lb3:_chase_v(e:past), lb4:_some_q(x3), lb5:_dog_n(x4:sg) (as on the earlier POS-tagging slide, but with labels added)
27 Hierarchies
- e:sp (simple past) is defined as a subtype of e:past
- in general, a hierarchy of sorts is defined as part of the semantic interface (SEM-I)
- dog_n_1 is a subtype of dog_n
- by convention, lemma_POS_sense is a subtype of lemma_POS
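Subtype checking over such a hierarchy is just reachability in a parent table. A sketch, with an assumed encoding of a few SEM-I sorts and predicates (the table contents are taken from the examples above, the function is invented):

```python
# Sketch of the sort/predicate hierarchies: subsumption is a walk
# up a parent table.

PARENT = {
    "e:sp": "e:past",       # simple past is a subtype of past
    "e:past": "e",
    "_dog_n_1": "_dog_n",   # lemma_POS_sense < lemma_POS by convention
}

def compatible(specific, general):
    """True if `specific` equals `general` or is a subtype of it."""
    while specific != general:
        if specific not in PARENT:
            return False
        specific = PARENT[specific]
    return True

print(compatible("e:sp", "e:past"))      # True: deep tense fits POS tense
print(compatible("_dog_n_1", "_dog_n"))  # True
print(compatible("_dog_n", "_dog_n_1"))  # False: cannot specialise for free
```

This is what makes the deep output on the previous slide a consistent specialisation of the POS output.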
28 Extreme underspecification
- Factorize the deep representation into minimal units
- Only represent what you know, for each type of processor
- Compatibility: sorts and (some) closed-class word information in the SEM-I, for consistency
- No lexicon for shallow processing (apart from POS tags)
29 Semantics from RASP
- RASP: robust, domain-independent, statistical parsing (Briscoe and Carroll)
- can't produce conventional semantics, because there is no subcategorization
- can sometimes identify arguments:
  - S -> NP VP: the NP supplies ARG1 for the verb
- partial identification:
  - VP -> V NP
  - S -> NP S: the NP might be ARG2 or ARG3
30 Underspecification of arguments
[Figure: hierarchy of argument names — ARGN above ARG1or2 and ARG2or3; ARG1 and ARG2 below ARG1or2; ARG2 and ARG3 below ARG2or3.]
RASP arguments can be specified as ARGN, ARG2or3 etc. Also useful for Japanese deep parsing?
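Because ARG2 sits below both ARG1or2 and ARG2or3, this hierarchy is a DAG rather than a tree, so subsumption has to follow multiple parent links. A sketch (names from the slide; the function is an assumption for illustration):

```python
# Sketch of the argument-name hierarchy: ARG2 has two parents, so
# subsumption is reachability in a small DAG.

PARENTS = {
    "ARG1or2": ["ARGN"], "ARG2or3": ["ARGN"],
    "ARG1": ["ARG1or2"], "ARG3": ["ARG2or3"],
    "ARG2": ["ARG1or2", "ARG2or3"],
}

def subsumes(general, specific):
    """True if `general` is `specific` or an ancestor of it."""
    if general == specific:
        return True
    return any(subsumes(general, p) for p in PARENTS.get(specific, []))

print(subsumes("ARGN", "ARG2"))      # True: ARGN is compatible with anything
print(subsumes("ARG2or3", "ARG2"))   # True: RASP's guess covers deep ARG2
print(subsumes("ARG1or2", "ARG3"))   # False
```

A RASP-derived ARG2or3 fact is then consistent with a deep parser's ARG2 or ARG3 fact, without either side overcommitting.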
31 Software etc.
- Open Source LinGO English Resource Grammar (ERG)
- LKB system: parsing and generation, now includes MRS-RMRS interconversion
- RMRS output as XML
- RMRS comparison
- Preliminary RASP-RMRS
- First version of the SEM-I
32 Composition without lambdas
- Formalized, consistent composition:
  - integration at the subsentential level
  - standardization
- Traditional lambda calculus is unsuitable:
  - doesn't allow underspecification
  - syntactic requirements mixed up with the semantics
- The algebra is a rational reconstruction of a feature-structure approach to composition
33 Lexicalized composition
- hook [h, e1]; subj slot [h3, x]
- h:_probably(h2), h3:_sleep(e), ARG1(h3,x)
- e1=e, h2 qeq h3
- hook: externally accessible information
- slots: when a functor applies, a slot is equated with the argument's hook
- relations: accumulated monotonically
- equalities: record hook-slot equations (not shown from now on)
- scope constraints (ignored from now on)
34 probably sleeps
- sleeps: hook [h3, e]; subj slot [h3, x]; h3:_sleep(e), ARG1(h3,x)
- probably: hook [h, e1]; mod slot [h2, e1]; h:_probably(h2)
- Syntax defines probably as the semantic head; composition uses the mod slot
- probably sleeps: hook [h, e1]; subj slot [h3, x]; h:_probably(h3), h3:_sleep(e1), ARG1(h3,x)
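The composition step above can be sketched as follows. The data layout (tuples for hooks, slots and relations) and the renaming strategy are assumptions made for illustration; the algebra itself is defined in Copestake et al. (2001):

```python
# Sketch of lambda-free composition: a sign is a hook [label, index],
# a dict of named slots, and a flat bag of relations. Filling a slot
# equates it with the argument's hook, here implemented by renaming.

def compose(head, arg, slot_name):
    s_lbl, s_idx = head["slots"][slot_name]   # the slot being filled
    a_lbl, a_idx = arg["hook"]                # equated with arg's hook
    ren_head = {s_lbl: a_lbl}                 # slot label := arg's label
    ren_arg = {a_idx: s_idx}                  # arg's index := slot index
    r = lambda t, ren: tuple(ren.get(v, v) for v in t)
    slots = {n: r(p, ren_head)
             for n, p in head["slots"].items() if n != slot_name}
    slots.update({n: r(p, ren_arg) for n, p in arg["slots"].items()})
    return {"hook": r(head["hook"], ren_head),
            "slots": slots,
            "rels": [r(x, ren_head) for x in head["rels"]]
                  + [r(x, ren_arg) for x in arg["rels"]]}

sleeps = {"hook": ("h3", "e"), "slots": {"subj": ("h3", "x")},
          "rels": [("h3", "_sleep", "e"), ("ARG1", "h3", "x")]}
probably = {"hook": ("h", "e1"), "slots": {"mod": ("h2", "e1")},
            "rels": [("h", "_probably", "h2")]}

result = compose(probably, sleeps, "mod")
print(result["rels"])
# [('h', '_probably', 'h3'), ('h3', '_sleep', 'e1'), ('ARG1', 'h3', 'x')]
```

Note that the open subj slot of *sleeps* survives in the result, ready to be filled later, and the relation bag only ever grows.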
35 Non-lexicalized grammars
- The lexicalized approach is a rational reconstruction of semantic composition in the ERG (Copestake et al., 2001)
- Without lexical subcategorization, rely on grammar rules to provide the ARGs
- anchors rather than slots, to ground the ARGs (a single anchor for RASP)
36 Some cat sleeps (in RASP)
- sleeps: hook [h3, e]; anchor <h3>; h3:_sleep(e)
- some cat: hook [h, x]; anchor <h1>; h1:_some(x), RSTR(h1,h2), h2:_cat(x)
- Rule S -> NP VP: Head = VP, ARG1(<VP anchor>, <NP hook.index>)
- some cat sleeps: hook [h3, e]; anchor <h3>; h3:_sleep(e), ARG1(h3,x), h1:_some(x), RSTR(h1,h2), h2:_cat(x)
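The rule-based variant can be sketched in the same style as the lexicalized one (again an assumed data layout, for illustration only): the rule itself contributes the ARG relation, grounded at the head daughter's anchor.

```python
# Sketch: without lexical slots, the grammar rule adds the ARG fact,
# using the head daughter's anchor and the argument's hook index.

def apply_rule(head, arg, argname):
    """e.g. S -> NP VP: add ARG1(<VP anchor>, <NP hook.index>)."""
    return {"hook": head["hook"], "anchor": head["anchor"],
            "rels": head["rels"] + arg["rels"]
                  + [(argname, head["anchor"], arg["hook"][1])]}

sleeps = {"hook": ("h3", "e"), "anchor": "h3",
          "rels": [("h3", "_sleep", "e")]}
some_cat = {"hook": ("h1", "x"), "anchor": "h1",
            "rels": [("h1", "_some", "x"), ("RSTR", "h1", "h2"),
                     ("h2", "_cat", "x")]}

s = apply_rule(sleeps, some_cat, "ARG1")
print(s["rels"][-1])   # ('ARG1', 'h3', 'x')
```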
37 The current project
38 Deep Thought
- Saarbrücken, Sussex, Cambridge, NTNU, Xtramind, CELI
- Objectives: demonstrate the utility of deep processing in IE and email response
- German, Norwegian, Italian and English
- October 2002 to October 2004
39 Integrated IE: a scenario
- Example: "I don't like the PBX 30"
- Shallow processing finds interesting sentences
- Named entity system isolates entities: h1:name(x, PBX-30)
- Deep processor identifies relationships, modals, negation etc.: h2:neg(h3), h3:_like(y,x), h3:name(x, PBX-30)
40 Some issues
- shallow processors can sometimes be deeper: e.g., h1:model-name(x, PBX-30)
- Compatibility and standardization: defining the SEM-I (semantic interface)
- Limits on compatibility: e.g., causative-inchoative
- Efficiency of comparison: indexing representations by character position
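The last point can be illustrated with a simple index keyed on character spans (a sketch with an invented layout and invented offsets; actual RMRS comparison is more involved):

```python
from collections import defaultdict

# Sketch: index predications by character span, so that deep and shallow
# output for the same sentence can be compared span by span.

def index_by_span(preds):
    """preds: list of (predicate, cfrom, cto) -> {(cfrom, cto): {preds}}."""
    idx = defaultdict(set)
    for pred, cfrom, cto in preds:
        idx[(cfrom, cto)].add(pred)
    return idx

deep = [("_every_q", 0, 5), ("_cat_n", 6, 9), ("_chase_v", 10, 16)]
pos  = [("_every_q", 0, 5), ("_cat_n", 6, 9), ("_chase_v", 10, 16)]

d, p = index_by_span(deep), index_by_span(pos)
# compatible if, for each span, the two predicate sets overlap
print(all(d[span] & p[span] for span in d))   # True
```

With the predicate hierarchy from slide 27, the overlap test would be replaced by a subsumption test, so that e.g. _dog_n_1 at a span matches _dog_n at the same span.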
41 The bigger picture ...
- deep processing reflects syntax and morphology, but has limited lexical semantics
- conventional vs predictable:
  - count/mass: lentils/rice, furniture, lettuce
  - adjectives: heavy defeat, ?large problem
  - prepositions and particles: up
42 Incremental development of wide-coverage semantics
- corpus-based acquisition techniques: shallow processing
- eventual integration with deep processing
- statistical model of predicates: e.g., large_j_rel as a pointer to a vector space
- logic isn't enough, but it is needed
43 Conclusion
[Figure: the two scoped PC trees for "Every cat chased some dog", shown alongside the fully factorized representation below.]
lb1:every(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat(x), lb5:dog1(y), lb4:some(y), RSTR(lb4,h8), BODY(lb4,h7), lb3:chase(e), ARG1(lb3,x), ARG2(lb3,y), h9=lb2, h8=lb5
44 Conclusion: extreme underspecification
- Split up the information content as much as possible
- Accumulate information by simple operations
- Don't represent what you don't know, but preserve everything you do know
- Use a flat representation, so that pieces can be accessed individually