Title: RMRS
1 RMRS
- some background and current work
2 Talk overview
- RMRS: integrating processors via semantics
- Underspecified semantics from shallow processing
- Integration experiments with broad-coverage systems/grammars (LinGO ERG and RASP)
- Planned work
3 Integrating processing
- No single system can do everything: deep and shallow processing have inherent strengths and weaknesses
- Domain-dependent and domain-independent processing must be linked
- Parsers and generators
- Common representation for processing above sentence level (e.g., anaphora)
4 Compositional semantics as a common representation
- Need a common representation language for systems: pairwise compatibility between systems is too limiting
- Syntax is theory-specific and unnecessarily language-specific
- Eventual goal should be semantics
- Core idea: shallow processing gives underspecified semantic representation, so deep and shallow systems can be integrated
- Full interlingua / common lexical semantics is too difficult (certainly currently), but can link predicates to ontologies, etc.
5 Shallow processing and underspecified semantics
- Integrated parsing: shallow parsed phrases incorporated into deep parsed structures
- Deep parsing invoked incrementally in response to information needs
- Reuse of knowledge sources
  - domain knowledge, recognition of named entities, transfer rules in MT
- Integrated generation
- Formal properties clearer, representations more generally usable
- Deep semantics taken as normative
6 RMRS approach: current and planned applications
- Question answering
  - Cambridge CSTIT: deep parse questions, shallow parse answers
  - QA from structured knowledge: Frank et al
- Information extraction
  - Deep Thought
  - Chemistry texts (SciBorg (?))
- Dictionary definition parsing for Japanese and English
  - Bond and Flickinger
- Rhetorical structure, multi-document summarization, email response ...
- also LOGON semantic transfer. MRSs from LFG used in HPSG generator.
7 RMRS: Extreme underspecification
- Goal is to split up semantic representation into minimal components (cf. Verbmobil VITs)
- Scope underspecification (MRS)
- Splitting up predicate argument structure
- Explicit equalities
- Hierarchies for predicates and sorts
- Compatibility with deep grammars
- Sorts and (some) closed class word information in SEM-I (API for grammar, more later)
- No lexicon for shallow processing (apart from POS tags and possibly closed class words)
8 RMRS principles
- Split up information content as much as possible
- Accumulate information monotonically by simple operations
- Don't represent what you don't know but preserve everything you do know
- Use a flat representation to allow pieces to be accessed individually
9 Separating arguments
- lb1:every(x,h9,h6), lb2:cat(x), lb5:dog1(y),
    lb4:some(y,h8,h7), lb3:chase(e,x,y),
    h9=lb2, h8=lb5
- goes to
- lb1:every(x), RSTR(lb1,h9), BODY(lb1,h6),
    lb2:cat(x), lb5:dog1(y), lb4:some(y),
    RSTR(lb4,h8), BODY(lb4,h7), lb3:chase(e),
    ARG1(lb3,x), ARG2(lb3,y), h9=lb2, h8=lb5
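The splitting step on this slide can be sketched in a few lines of Python. This is only an illustration, not the converter used with the ERG: the `EP` class, the `QUANTIFIERS` set and the function name are assumptions made for the example, and handle constraints are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EP:
    """MRS-style elementary predication: label:pred(arg0, arg1, ...)."""
    label: str
    pred: str
    args: tuple


# Assumption for this sketch: quantifiers take RSTR and BODY,
# every other predicate takes ARG1, ARG2, ...
QUANTIFIERS = {"every", "some"}


def split_arguments(eps):
    """Return (unary_eps, relations): one relation per non-first argument."""
    unary, relations = [], []
    for ep in eps:
        # The first argument stays with the predicate.
        unary.append((ep.label, ep.pred, ep.args[0]))
        rest = ep.args[1:]
        if ep.pred in QUANTIFIERS:
            roles = ["RSTR", "BODY"]
        else:
            roles = [f"ARG{i}" for i in range(1, len(rest) + 1)]
        # Every other argument becomes a separate relation on the label.
        for role, value in zip(roles, rest):
            relations.append((role, ep.label, value))
    return unary, relations


mrs = [
    EP("lb1", "every", ("x", "h9", "h6")),
    EP("lb2", "cat", ("x",)),
    EP("lb5", "dog1", ("y",)),
    EP("lb4", "some", ("y", "h8", "h7")),
    EP("lb3", "chase", ("e", "x", "y")),
]
unary, relations = split_arguments(mrs)
```

Because each relation refers to its predicate only through the label, a processor that cannot identify an argument simply omits that relation.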
10 Naming conventions: predicate names without a lexicon
- lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6),
- lb2:_cat_n(x2sg),
- lb5:_dog_n_1(x4sg),
- lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7),
- lb3:_chase_v(esp), ARG1(lb3,x2sg), ARG2(lb3,x4sg)
- h9=lb2, h8=lb5, x1sg=x2sg, x3sg=x4sg
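A minimal sketch of how names of this shape could be assembled from a lemma and a coarse part of speech, so that a shallow processor needs no lexicon to produce them. The function name and its parameters are hypothetical; only the `_lemma_pos` and `_lemma_pos_sense` shapes come from the slide.

```python
def predicate_name(lemma, pos, sense=None):
    """Build a name such as _cat_n, or _dog_n_1 when a sense is known."""
    parts = ["", lemma.lower(), pos]
    if sense is not None:
        parts.append(str(sense))
    # The leading empty string yields the initial underscore.
    return "_".join(parts)


shallow_name = predicate_name("dog", "n")        # no sense available
deep_name = predicate_name("dog", "n", sense=1)  # sense from a deep lexicon
```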
11 POS output as underspecification
- DEEP
  - lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6),
    lb2:_cat_n(x2sg), lb5:_dog_n_1(x4sg),
    lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7),
    lb3:_chase_v(esp), ARG1(lb3,x2sg), ARG2(lb3,x4sg),
    h9=lb2, h8=lb5, x1sg=x2sg, x3sg=x4sg
- POS
  - lb1:_every_q(x1), lb2:_cat_n(x2sg),
    lb3:_chase_v(epast), lb4:_some_q(x3),
    lb5:_dog_n(x4sg)
12 POS output as underspecification
- DEEP
  - lb1:_every_q(x1sg), RSTR(lb1,h9), BODY(lb1,h6),
    lb2:_cat_n(x2sg), lb5:_dog_n_1(x4sg),
    lb4:_some_q(x3sg), RSTR(lb4,h8), BODY(lb4,h7),
    lb3:_chase_v(esp), ARG1(lb3,x2sg), ARG2(lb3,x3sg),
    h9=lb2, h8=lb5, x1sg=x2sg, x3sg=x4sg
- POS
  - lb1:_every_q(x1), lb2:_cat_n(x2sg),
    lb3:_chase_v(epast), lb4:_some_q(x3),
    lb5:_dog_n(x4sg)
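One way to read "POS output as underspecification" is that every predicate in the POS output should be equal to, or more general than, some predicate in the deep output. The sketch below tests this on predicate names only, and it assumes that a name is more general when it is a prefix ending at an underscore boundary. That prefix test is a simplification made for the example, not the predicate hierarchy itself.

```python
def pred_subsumes(general, specific):
    """True if `specific` equals `general` or extends it with extra _fields."""
    return specific == general or specific.startswith(general + "_")


pos_preds = ["_every_q", "_cat_n", "_chase_v", "_some_q", "_dog_n"]
deep_preds = ["_every_q", "_cat_n", "_dog_n_1", "_some_q", "_chase_v"]

# POS predicates that no deep predicate specialises; empty when compatible.
unmatched = [
    p for p in pos_preds
    if not any(pred_subsumes(p, d) for d in deep_preds)
]
```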
13 Semantics from RASP
- RASP: robust, domain-independent, statistical parsing (Briscoe and Carroll)
- can't produce conventional semantics because no subcategorization
- can often identify arguments
  - S -> NP VP: NP supplies ARG1 for V
- potential for partial identification
  - VP -> V NP
  - S -> NP S: NP might be ARG2 or ARG3
14 Underspecification of arguments
[Figure: argument hierarchy over ARGN, ARG1or2, ARG2or3, ARG1, ARG2, ARG3]
RASP arguments can be specified as ARGN, ARG2or3, etc.
Also useful for Japanese deep parsing?
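A small sketch of how such underspecified argument names might be compared. The hierarchy in `COVERS` is an assumption read off the role names (ARGN covers everything, ARG1or2 covers ARG1 and ARG2, and so on), and the function names are hypothetical.

```python
# Each role maps to the set of fully specified roles it may stand for.
COVERS = {
    "ARGN": {"ARG1", "ARG2", "ARG3"},
    "ARG1or2": {"ARG1", "ARG2"},
    "ARG2or3": {"ARG2", "ARG3"},
    "ARG1": {"ARG1"},
    "ARG2": {"ARG2"},
    "ARG3": {"ARG3"},
}


def subsumes(general, specific):
    """True if everything `specific` allows is also allowed by `general`."""
    return COVERS[specific] <= COVERS[general]


def compatible(a, b):
    """True if some fully specified role satisfies both a and b."""
    return bool(COVERS[a] & COVERS[b])
```

With this reading, a RASP-derived ARG2or3 is compatible with an ERG-derived ARG2 but not with ARG1.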
15 RMRS construction
- ERG etc.: uses MRS -> RMRS converter
  - argument splitting etc.
  - also RMRS -> MRS conversion
- POS-RMRS: tag lexicon
- RASP-RMRS: tag lexicon plus semantic rules associated with RASP rules to match ERG
  - defaults when no rule RMRS specified
16 RMRS composition with non-lexicalized grammars
- MRS composition assumes a lexicalized approach: algebra defined in Copestake, Lascarides and Flickinger (2001)
- RMRS with non-lexicalised grammars has similar basic algebra
  - without lexical subcategorization, rely on grammar rules to provide the ARGs
  - anchors rather than slots, to ground the ARGs (single anchor for RASP)
- developed on basis of semantic test suite
- most rules written by Anna Ritchie
17 Some cat sleeps (in RASP)
- h3,e, <h3>, h3:_sleep(e)
- sleeps
- h,x, <h1>, h1:_some(x), RSTR(h1,h2), h2:_cat(x)
- some cat
- S -> NP VP
- Head: VP, ARG1(<VP anchor>,<NP hook.index>)
- h3,e, <h3>, h3:_sleep(e), ARG1(h3,x),
    h1:_some(x), RSTR(h1,h2), h2:_cat(x)
- some cat sleeps
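The composition step for this example can be sketched as follows. The `Fragment` class and `s_np_vp` function are hypothetical names for illustration; the sketch only covers what the slide shows (hook, a single anchor, predicates, argument relations) and ignores handle constraints.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    hook_label: str
    hook_index: str
    anchor: str
    eps: list = field(default_factory=list)   # (label, pred, arg0)
    args: list = field(default_factory=list)  # (role, anchor, value)


def s_np_vp(np, vp):
    """S -> NP VP, head VP: add ARG1(<VP anchor>, <NP hook.index>)."""
    return Fragment(
        hook_label=vp.hook_label,  # hook and anchor come from the head
        hook_index=vp.hook_index,
        anchor=vp.anchor,
        eps=vp.eps + np.eps,       # nothing is removed or rewritten
        args=vp.args + [("ARG1", vp.anchor, np.hook_index)] + np.args,
    )


sleeps = Fragment("h3", "e", "h3", eps=[("h3", "_sleep", "e")])
some_cat = Fragment(
    "h", "x", "h1",
    eps=[("h1", "_some", "x"), ("h2", "_cat", "x")],
    args=[("RSTR", "h1", "h2")],
)
some_cat_sleeps = s_np_vp(some_cat, sleeps)
```

The ARG1 is supplied by the grammar rule rather than by a lexical entry for "sleeps", which is why the anchor of the head daughter has to be available to the rule.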
18 Real rule ...
- <!--rule>
- <name>S/np_vp</name>
- <dtrs><dtr>NP</dtr><dtr>VP</dtr></dtrs>
- <head>RULE</head>
- <semstruct>
- <hook><index>E</index><label>H1</label></hook>
- <slots><noanchor/></slots>
- <ep><gpred>PRPSTN_M_REL</gpred><label>H1</label><var>H2</var></ep>
- <rarg><rargname>ARG1</rargname><label>H3</label><var>X</var></rarg>
- <hcons hreln='qeq'><hi><var>H2</var></hi><lo><var>H</var></lo></hcons>
- </semstruct>
- <equalities><rv>X</rv><dh><dtr>NP</dtr><he>INDEX</he></dh></equalities>
- <equalities><rv>H</rv><dh><dtr>VP</dtr><he>LABEL</he></dh></equalities>
- <equalities><rv>H3</rv><dh><dtr>VP</dtr><he>ANCHOR</he></dh></equalities>
- <equalities><rv>E</rv><dh><dtr>VP</dtr><he>INDEX</he></dh></equalities>
- </rule-->
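To show how the pieces of this rule fit together, here is a sketch that reads it with Python's standard `xml.etree.ElementTree`. It assumes the comment wrapper is removed so that `<rule>` is an ordinary element, and it uses only the element names visible on the slide. It is not the RASP-RMRS rule loader.

```python
import xml.etree.ElementTree as ET

RULE_XML = """
<rule>
<name>S/np_vp</name>
<dtrs><dtr>NP</dtr><dtr>VP</dtr></dtrs>
<head>RULE</head>
<semstruct>
<hook><index>E</index><label>H1</label></hook>
<slots><noanchor/></slots>
<ep><gpred>PRPSTN_M_REL</gpred><label>H1</label><var>H2</var></ep>
<rarg><rargname>ARG1</rargname><label>H3</label><var>X</var></rarg>
<hcons hreln='qeq'><hi><var>H2</var></hi><lo><var>H</var></lo></hcons>
</semstruct>
<equalities><rv>X</rv><dh><dtr>NP</dtr><he>INDEX</he></dh></equalities>
<equalities><rv>H</rv><dh><dtr>VP</dtr><he>LABEL</he></dh></equalities>
<equalities><rv>H3</rv><dh><dtr>VP</dtr><he>ANCHOR</he></dh></equalities>
<equalities><rv>E</rv><dh><dtr>VP</dtr><he>INDEX</he></dh></equalities>
</rule>
"""

rule = ET.fromstring(RULE_XML)
daughters = [d.text for d in rule.find("dtrs")]

# Map each rule variable to the (daughter, hook element) it is equated with.
bindings = {
    eq.findtext("rv"): (eq.findtext("dh/dtr"), eq.findtext("dh/he"))
    for eq in rule.findall("equalities")
}

# Resolve the rule's argument relation through those equalities.
rarg = rule.find("semstruct/rarg")
resolved = (
    rarg.findtext("rargname"),
    bindings[rarg.findtext("label")],
    bindings[rarg.findtext("var")],
)
```

Resolved this way, the rule's `rarg` is the ARG1(<VP anchor>, <NP hook.index>) relation from the "some cat sleeps" example.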
19 ERG-RMRS / RASP-RMRS
20 Inchoative
21 Infinitival subject (unbound in RASP-RMRS)
22 Ditransitive: missing ARG3
23 Mismatch: Expletive it
24 Mismatch: larger numbers
25 Comments on RASP-RMRS
- Fast enough (not significant compared to RASP processing time because no ambiguity)
- Too many RASP rules! Need to generalise over classes.
- Requires SEM-I: API for MRS/RMRS from deep grammar
- RASP and ERG may change
  - compatible test suites: semi-automatic rule update?
  - alternative technique for composition?
- Parse selection: need to generalise over RMRSs
  - weighted intersections of RMRSs (cf. RASP grammatical relations)
26 SEM-I: semantic interface
- Meta-level: manually specified grammar relations (constructions and closed-class)
- Object-level: linked to lexical database for deep grammars
- Object-level SEM-I auto-generated from expanded lexical entries in deep grammars (because type can contribute relations)
- Validation of other lexicons
- Need closed class items for RMRS construction from shallow processing
27 Alignment and XML
- Comparing RMRSs for same text efficiently uses characterization
  - labels RMRSs according to their source in the text
  - currently characters, but byte offset? Japanese etc?
- RMRS-XML
- RMRS seen as levels of mark-up: standoff annotation
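A rough sketch of alignment by characterization: if each predicate carries the character span of the text it came from, predicates from two processors can be paired by looking up identical spans instead of comparing every pair. The tuple layout, the function name and the exact-match policy are assumptions for this example.

```python
from collections import defaultdict


def align_by_span(deep_eps, shallow_eps):
    """Pair predicates that cover the same (start, end) character span."""
    by_span = defaultdict(list)
    for pred, start, end in shallow_eps:
        by_span[(start, end)].append(pred)
    return [
        (pred, other, (start, end))
        for pred, start, end in deep_eps
        for other in by_span.get((start, end), [])
    ]


text = "every cat chased some dog"
deep = [("_every_q", 0, 5), ("_cat_n", 6, 9), ("_chase_v", 10, 16),
        ("_some_q", 17, 21), ("_dog_n_1", 22, 25)]
shallow = [("_every_q", 0, 5), ("_cat_n", 6, 9), ("_chase_v", 10, 16),
           ("_some_q", 17, 21), ("_dog_n", 22, 25)]
pairs = align_by_span(deep, shallow)
```

Character offsets are used here because the slide says they are the current choice. Byte offsets would give different spans for non-ASCII text such as Japanese.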
28 SciBorg: Chemistry texts
- eScience project starting in October at Cambridge
  - Computer Laboratory (Copestake, Teufel), Chemistry (Murray-Rust), CeSC (Parker)
- Aims
  - Develop an NL markup language which will act as a platform for extraction of information. Link to semantic web languages.
  - Develop IE technology and core ontologies for use by publishers, researchers, readers, vendors and regulatory organisations.
  - Model scientific argumentation and citation purpose in order to support novel modes of information access.
  - Demonstrate the applicability of this infrastructure in a real-world eScience environment.
29 Research markup
- Chemistry: The primary aims of the present study are (i) the synthesis of an amino acid derivative that can be incorporated into proteins /via/ standard solid-phase synthesis methods, and (ii) a test of the ability of the derivative to function as a photoswitch in a biological environment.
- Computational Linguistics: The goal of the work reported here is to develop a method that can automatically refine the Hidden Markov Models to produce a more accurate language model.
30 RMRS and research markup
- Specify cues in RMRS
- Deep process cues: feasible because domain-independent
  - more general and reliable than shallow techniques
  - allows for complex interrelationships
- Use zones for advanced citation maps and other enhancements to repositories
31 Conclusions
- RMRS: semantic representation language allowing linking of deep and shallower processors
- RMRS construction: phrase-level compatibility between processors
- Many potential applications