Title: Treebank-Based Wide Coverage Probabilistic LFG Resources
1. Treebank-Based Wide-Coverage Probabilistic LFG Resources
Josef van Genabith, Aoife Cahill, Grzegorz Chrupala, Jennifer Foster, Deirdre Hogan, Conor Cafferkey, Mick Burke, Ruth O'Donovan, Yvette Graham, Karolina Owczarzak, Yuqing Guo, Ines Rehbein, Natalie Schluter and Djamé Seddah
National Centre for Language Technology (NCLT), School of Computing, Dublin City University
2. Overview
- Context/Motivation
- Treebank-Based Acquisition of Wide-Coverage LFG Resources (Penn-II)
  - LFG
  - Automatic F-Structure Annotation Algorithm
  - Acquisition of Lexical Resources
- Parsing
  - Parsing Architectures
  - LDD Resolution
  - Comparison with Hand-Crafted (XLE, RASP) and Treebank-Based (CCG, HPSG) Resources
- Generation
  - Basic Generator
  - Generation Grammar Transforms
  - History-Based Generation
- MT Evaluation
3. Motivation
- What do grammars do?
  - Grammars define languages as sets of strings
  - Grammars define what strings are grammatical and what strings are not
  - Grammars tell us about the syntactic structure of (associated with) strings
- Shallow vs. deep grammars
  - Shallow grammars do all of the above
  - Deep grammars (in addition) relate text to information/meaning representations
  - Information: predicate-argument-adjunct structure, deep dependency relations, logical forms, ...
- In natural languages, linguistic material is not always interpreted locally where you encounter it: long-distance dependencies (LDDs)
- Resolution of LDDs is crucial to construct accurate and complete information/meaning representations
- Deep grammars (text <-> meaning) require LDD resolution
4. Motivation
- Constraint-based grammar formalisms (FUG, GPSG, PATR-II, ...)
  - Lexical-Functional Grammar (LFG)
  - Head-Driven Phrase Structure Grammar (HPSG)
  - Combinatory Categorial Grammar (CCG)
  - Tree-Adjoining Grammar (TAG)
- Traditionally, deep constraint-based grammars are hand-crafted
  - LFG ParGram, HPSG LinGO/ERG, Core Language Engine (CLE), Alvey Tools, RASP, ALPINO, ...
- Wide-coverage, deep constraint-based grammar development is very time-consuming, knowledge-intensive and expensive!
- Very hard to scale hand-crafted grammars to unrestricted text!
- English XLE (Riezler et al. 2002); German XLE (Forst and Rohrer 2006); Japanese XLE (Masuichi and Okuma 2003); RASP (Carroll and Briscoe 2002); ALPINO (Bouma, van Noord and Malouf, 2000)
5. Motivation
- An instance of the knowledge acquisition bottleneck familiar from classical rationalist rule/knowledge-based AI/NLP
- Alternative to classical rationalist rule/knowledge-based AI/NLP:
  - Empiricist data-driven research paradigm (AI/NLP)
  - Corpora, machine-learning-based and statistical approaches, ...
  - Treebank-based grammar acquisition, probabilistic parsing
- Advantage: grammars can be induced (learned) automatically
- Very low development cost, wide-coverage, robust, but ...
- Most treebank-based grammar induction/parsing technology produces shallow grammars
- Shallow grammars don't resolve LDDs (but see (Johnson 2002)) and do not map strings to information/meaning representations
6. Motivation
- This poses a number of research questions
- Can we address the knowledge acquisition bottleneck for deep grammar development by combining insights from the rationalist and empiricist research paradigms?
- Specifically:
  - Can we automatically acquire wide-coverage, deep, probabilistic, constraint-based grammars from treebanks?
  - How do we use them in parsing?
  - Can we use them for generation?
  - Can we acquire resources for different languages and treebank encodings?
  - How do these resources compare with hand-crafted resources?
  - How do they fare in applications?
7. Context
- TAG (Xia, 2001)
- LFG (Cahill, McCarthy, van Genabith and Way, 2002)
- CCG (Hockenmaier and Steedman, 2002)
- HPSG (Miyao and Tsujii, 2003)
- LFG
  - (van Genabith, Sadler and Way, 1999)
  - (Frank, 2000)
  - (Sadler, van Genabith and Way, 2000)
  - (Frank, Sadler, van Genabith and Way, 2003)
8. Lexical-Functional Grammar (LFG)
9. LFG Acquisition for English - Overview
- Treebank-Based Acquisition of LFG Resources (Penn-II)
- Lexical-Functional Grammar (LFG)
- Penn-II Treebank Preprocessing/Clean-Up
- F-Structure Annotation Algorithm
- Grammar and Lexicon Extraction
- Parsing Architectures (LDD Resolution)
- Comparison with the best hand-crafted resources: XLE and RASP
- Comparison with treebank-based CCG and HPSG resources
10. Lexical-Functional Grammar (LFG)
- Lexical-Functional Grammar (LFG) (Bresnan and Kaplan 1981, Bresnan 2001, Dalrymple 2001) is a constraint-based theory of grammar
- Two (basic) levels of representation:
  - C-structure represents surface grammatical configurations such as word order: annotated CFG rules/trees
  - F-structure represents abstract syntactic functions such as SUBJ(ect), OBJ(ect), OBL(ique), PRED(icate), COMP(lement), ADJ(unct), ...: attribute-value matrices (AVMs)/feature structures
- F-structure approximates basic predicate-argument structure, dependency representation, logical form (van Genabith and Crouch, 1996, 1997)
11. Lexical-Functional Grammar (LFG)
12. Lexical-Functional Grammar (LFG)
- Subcategorisation (see the sketch below)
  - Semantic forms (subcat frames): see<SUBJ,OBJ>
  - Completeness: all GFs in the semantic form are present at the local f-structure
  - Coherence: only the GFs in the semantic form are present at the local f-structure
- Long-distance dependencies (LDDs) are resolved at f-structure with:
  - Functional uncertainty equations (regular expressions specifying paths in f-structure), e.g. ↑TOPICREL = ↑COMP* OBJ
  - subcat frames
  - Completeness/Coherence
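A minimal sketch of the completeness and coherence checks, assuming a toy dict-based f-structure encoding (the actual resources are implemented in Java/Prolog; everything below is illustrative):

```python
# Toy check of LFG completeness and coherence against a semantic form.
# GOVERNABLE is an assumed inventory of subcategorisable GFs.

GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP"}

def check_subcat(f_structure, semantic_form):
    """semantic_form: set of GFs, e.g. {'SUBJ', 'OBJ'} for see<SUBJ,OBJ>."""
    present = {gf for gf in f_structure if gf in GOVERNABLE}
    complete = semantic_form <= present   # every subcategorised GF is present
    coherent = present <= semantic_form   # no unlicensed governable GF
    return complete, coherent

# see<SUBJ,OBJ> with both arguments present: complete and coherent
fs = {"PRED": "see", "SUBJ": {"PRED": "john"}, "OBJ": {"PRED": "mary"}}
print(check_subcat(fs, {"SUBJ", "OBJ"}))   # (True, True)
```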
13. Lexical-Functional Grammar (LFG)
14. Introduction: Penn-II LFG
- If we had an f-structure-annotated version of Penn-II, we could use (standard) machine learning methods to extract probabilistic, wide-coverage LFG resources
- How do we get an f-structure-annotated Penn-II?
  - Manually? No: 50,000 trees!
  - Automatically! Yes: an f-structure annotation algorithm!
- Penn-II is a 2nd-generation treebank: it contains lots of annotations to support the derivation of deep meaning representations
  - trees, Penn-II functional tags (-SBJ, -TMP, -LOC), traces and coindexation
- The f-structure annotation algorithm exploits those
15. Treebank Annotation: Penn-II LFG
16. Treebank Annotation: Penn-II LFG
17. Treebank Preprocessing/Clean-Up: Penn-II LFG
- The Penn-II treebank often has flat analyses (coordination, NPs, ...) and a certain amount of noise: inconsistent annotations, errors
- No treebank preprocessing or clean-up in the LFG approach (unlike the CCG- and HPSG-based approaches)
- Take the Penn-II treebank as is, but:
  - Remove all trees with FRAG- or X-labelled constituents
  - FRAG: fragments; X: not known how to annotate
- Total of 48,424 trees, used as they are
18. Treebank Annotation: Penn-II LFG
- Annotation-based (rather than conversion-based)
- Automatic annotation of nodes in Penn-II treebank trees with f-structure equations
- The annotation algorithm exploits:
  - Head information
  - Categorial information
  - Configurational information
  - Penn-II functional tags
  - Trace information
19. Treebank Annotation: Penn-II LFG
- Architecture of a modular algorithm to assign LFG f-structure equations to trees in the Penn-II treebank:
  - Head Lexicalisation (Magerman, 1994)
  - -> Left-Right Context Annotation Principles (Proto F-Structures)
  - -> Coordination Annotation Principles (Proper F-Structures)
  - -> Catch-All and Clean-Up
  - -> Traces
20. Treebank Annotation: Penn-II LFG
- Head Lexicalisation: modified rules based on (Magerman, 1994)
21. Treebank Annotation: Penn-II LFG
- Left-Right Context Annotation Principles
  - The head of an NP is likely to be the rightmost noun
  - Mother -> Left Context, Head, Right Context
22. Treebank Annotation: Penn-II LFG
- Left-Right Annotation Matrix for NP (see the sketch below):
  - Left context: DT ↑SPEC:DET=↓; QP ↑SPEC:QUANT=↓; JJ, ADJP ↓∈↑ADJUNCT
  - Head: NN, NNS ↑=↓
  - Right context: NP ↓∈↑APP; PP ↓∈↑ADJUNCT; S, SBAR ↓∈↑RELMOD
- Example (tree figure): the NP "a very politicized deal" is annotated with DT "a" ↑SPEC:DET=↓, ADJP "very politicized" ↓∈↑ADJUNCT, and head NN "deal" ↑=↓
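To make the principle concrete, here is a hedged sketch (not the DCU Java implementation) of how the NP matrix above could drive annotation, with the equations written in ASCII (up = ↑, down = ↓, IN = set membership):

```python
# Illustrative left-right context annotation for NP rules: find the head
# (rightmost noun, per the heuristic above), then annotate each daughter
# according to whether it is in the left context, the head, or the right
# context. The matrix mirrors the NP annotation matrix on this slide.

NP_MATRIX = {
    "left":  {"DT": "up:SPEC:DET = down", "QP": "up:SPEC:QUANT = down",
              "JJ": "down IN up:ADJUNCT", "ADJP": "down IN up:ADJUNCT"},
    "head":  {"NN": "up = down", "NNS": "up = down"},
    "right": {"NP": "down IN up:APP", "PP": "down IN up:ADJUNCT",
              "S": "down IN up:RELMOD", "SBAR": "down IN up:RELMOD"},
}

def annotate_np(daughters):
    """daughters: list of category labels of an NP rule's right-hand side."""
    head_idx = max(i for i, c in enumerate(daughters) if c in NP_MATRIX["head"])
    equations = []
    for i, cat in enumerate(daughters):
        zone = "head" if i == head_idx else ("left" if i < head_idx else "right")
        equations.append((cat, NP_MATRIX[zone].get(cat, "unannotated")))
    return equations

print(annotate_np(["DT", "ADJP", "NN"]))   # 'a very politicized deal'
# [('DT', 'up:SPEC:DET = down'), ('ADJP', 'down IN up:ADJUNCT'), ('NN', 'up = down')]
```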
23. Treebank Annotation: Penn-II LFG
24. Treebank Annotation: Penn-II LFG
- Build an annotation matrix for each of the monadic categories (without functional tags) in Penn-II
- Based on analysing the most frequent rule types for each category, such that the sum of the token frequencies of these rule types is greater than 85% of the total number of rule tokens for that category (see the sketch below):

  Category   rule types (100%)   rule types analysed (85%)
  NP         6,595               102
  VP         10,239              307
  S          2,602               20
  ADVP       234                 6

- Apply the annotation matrices to all (i.e. also unseen) rules/sub-trees, i.e. also NP-LOC, NP-TMP etc.
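A small sketch of the 85% cumulative-frequency cutoff just described; the rule counts here are invented for illustration:

```python
from collections import Counter

def most_frequent_covering(rule_counts, threshold=0.85):
    """Most frequent rule types whose token counts cover >= threshold."""
    total = sum(rule_counts.values())
    covered, selected = 0, []
    for rule, count in rule_counts.most_common():
        if covered / total >= threshold:
            break
        selected.append(rule)
        covered += count
    return selected

np_rules = Counter({"NP -> DT NN": 5000, "NP -> NNP": 3000,
                    "NP -> DT JJ NN": 1500, "NP -> NP PP": 400,
                    "NP -> DT NN NN": 100})
print(most_frequent_covering(np_rules))   # the rule types analysed by hand
```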
25Treebank Annotation Penn-II LFG
- Traces Module
- Long Distance Dependencies (LDDs)
- Topicalisation
- Questions
- Wh- and wh-less relative clauses
- Passivisation
- Control constructions
- ICH (interpret constituent here)
- RNR (right node raising)
-
- Translate Penn-II traces and coindexation into
corresponding reentrancy in f-structure
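A toy sketch of the traces module's core idea, under an assumed dict-based f-structure encoding: a coindexed filler and its trace end up pointing at one shared structure (reentrancy):

```python
# Translate a Penn-II style coindexation (filler path, trace path) into a
# reentrancy: both paths end at the same Python object.

def link_traces(f_structure, fillers, traces):
    """fillers/traces: maps from a coindexation index to a path of GFs."""
    for idx, filler_path in fillers.items():
        target = f_structure
        for gf in filler_path:
            target = target[gf]
        node = f_structure
        for gf in traces[idx][:-1]:
            node = node[gf]
        node[traces[idx][-1]] = target   # shared structure = reentrancy

# 'the man who(1) he claims they saw *T*-1': TOPICREL filler, trace at COMP OBJ
fs = {"TOPICREL": {"PRED": "who"}, "COMP": {"PRED": "see"}}
link_traces(fs, fillers={1: ["TOPICREL"]}, traces={1: ["COMP", "OBJ"]})
print(fs["COMP"]["OBJ"] is fs["TOPICREL"])   # True: one shared f-structure
```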
26. Treebank Annotation: Control, Wh-Relative LDDs
27. Treebank Annotation: Penn-II LFG
- Head Lexicalisation (Magerman, 1994)
  - -> Left-Right Context Annotation Principles (Proto F-Structures)
  - -> Coordination Annotation Principles (Proper F-Structures)
  - -> Catch-All and Clean-Up
  - -> Traces
  - -> Constraint Solver
28. Treebank Annotation: Penn-II LFG
- Collect the f-structure equations
- Send them to a constraint solver (illustrated below)
- This generates the f-structures
- The f-structure annotation algorithm is implemented in Java, the constraint solver in Prolog
- 3 minutes to annotate the 50,000 Penn-II trees
- 5 minutes to produce the 50,000 f-structures
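The constraint solving amounts to unifying the attribute-value constraints collected from the tree. A minimal toy unifier over plain dicts (no reentrancy or disjunction, unlike the real Prolog solver) illustrates the idea:

```python
def unify(f, g):
    """Unify two feature structures encoded as nested dicts / atoms."""
    if not isinstance(f, dict) or not isinstance(g, dict):
        if f == g:
            return f
        raise ValueError(f"clash: {f} vs {g}")   # inconsistent equations
    result = dict(f)
    for attr, value in g.items():
        result[attr] = unify(result[attr], value) if attr in result else value
    return result

# Constraints contributed by different nodes combine into one f-structure:
print(unify({"PRED": "see", "TENSE": "past"},
            {"TENSE": "past", "SUBJ": {"PRED": "john"}}))
# {'PRED': 'see', 'TENSE': 'past', 'SUBJ': {'PRED': 'john'}}
```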
29. Treebank Annotation: Penn-II LFG
- Evaluation (quantitative): coverage
- Over 99.8% of Penn-II sentences (without X and FRAG constituents) receive a single covering and connected f-structure:

  0 f-structures   45       0.093%
  1 f-structure    48,329   99.804%
  2 f-structures   50       0.103%
30. Treebank Annotation: Penn-II LFG
- F-structure quality evaluated against the DCU 105 Dependency Bank, a manually annotated dependency gold standard of 105 sentences randomly extracted from WSJ Section 23
- Triples of the form relation(predicate, argument) are extracted from the gold standard
- Evaluation software from (Crouch et al. 2002) and (Riezler et al. 2002)

  DCU 105     All Annotations   Preds-Only
  Precision   97.06             94.28
  Recall      96.80             94.28
31. Treebank Annotation: Penn-II LFG
- Following (Kaplan et al. 2004), evaluation against the PARC 700 Dependency Bank, calculated for the PARC feature set (all annotations ⊃ PARC features ⊃ preds-only)
- A mapping is required (Burke 2004, 2006)

  PARC 700    PARC features
  Precision   88.31
  Recall      86.38
32. Grammar and Lexicon Extraction: Penn-II LFG
- Lexical Resources
- Lexical information is extremely important in modern lexicalised grammar formalisms: LFG, HPSG, CCG, TAG, ...
- Lexicon development is time-consuming and extremely expensive
- Lexicons are rarely, if ever, complete
- The familiar knowledge acquisition bottleneck
- Treebank-based subcategorisation frame induction (LFG semantic forms) from Penn-II and Penn-III
- Parser-based induction from the British National Corpus (BNC)
- Evaluation against COMLEX, OALD and Korhonen's data set
33. Grammar and Lexicon Extraction: Penn-II LFG
- Lexicon Construction: Manual vs. Automated
- Our approach (see the sketch below):
  - Subcat frames are not predefined
  - Functional and/or categorial information
  - Parameterised for prepositions and particles
  - Active and passive
  - Long-distance dependencies
  - Conditional probabilities
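A hedged sketch of the extraction idea, in the spirit of O'Donovan et al. (2004, 2005): walk the f-structure and, for each local PRED, collect the subcategorisable GFs it governs, parameterising obliques for their preposition. The dict encoding and the PFORM feature are assumptions of this toy version:

```python
SUBCAT_GFS = {"SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP", "PART"}

def extract_frames(fs, frames=None):
    """Collect one semantic form per local PRED in a nested-dict f-structure."""
    frames = frames if frames is not None else []
    if isinstance(fs, dict):
        if "PRED" in fs:
            gfs = []
            for gf in sorted(fs.keys() & SUBCAT_GFS):
                # parameterise obliques for their preposition, e.g. OBL:for
                pform = fs[gf].get("PFORM") if isinstance(fs[gf], dict) else None
                gfs.append(f"{gf}:{pform}" if pform else gf)
            frames.append(f"{fs['PRED']}<{','.join(gfs)}>")
        for value in fs.values():
            extract_frames(value, frames)   # recurse into sub-f-structures
    return frames

fs = {"PRED": "apply", "SUBJ": {"PRED": "he"},
      "OBL": {"PRED": "for", "PFORM": "for", "OBJ": {"PRED": "job"}}}
print(extract_frames(fs))
# ['apply<OBL:for,SUBJ>', 'he<>', 'for<OBJ>', 'job<>']  (nominals: empty frames)
```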
34. Grammar and Lexicon Extraction: Penn-II LFG
35. Grammar and Lexicon Extraction: Penn-II LFG
- apply<SUBJ,OBL:for>, win<SUBJ,OBJ>
36. Grammar and Lexicon Extraction: Penn-II LFG
- Lexicon extracted from Penn-II (O'Donovan et al. 2005)
37. Grammar and Lexicon Extraction: Penn-II LFG
38. Grammar and Lexicon Extraction: Penn-II LFG
- Parsing-Based Subcat Frame Extraction (O'Donovan 2006)
- Treebank-based vs. parsing-based subcat frame extraction
- Parsed the British National Corpus (BNC, 100 million words) with our automatically induced LFGs
  - 19 days on a single machine, 5 million words per day
- Subcat frame extraction for 10,000 verb lemmas
- Evaluation against COMLEX and OALD
- Evaluation against the Korhonen (2002) gold standard
- Our method is statistically significantly better than Korhonen (2002)
39. Parsing: Penn-II and LFG
- Overview: Parsing Architectures
  - Pipeline vs. Integrated
  - Long-Distance Dependency (LDD) Resolution at F-Structure
- Evaluation: Comparison with Hand-Crafted Resources (XLE and RASP)
- Comparison against Treebank-Based CCG and HPSG Resources
40. Parsing: Penn-II and LFG
41. Lexical-Functional Grammar (LFG)
42. Parsing: Penn-II and LFG
- We require:
  - subcategorisation frames (O'Donovan et al., 2004, 2005; O'Donovan 2006)
  - functional uncertainty equations
- Previous example:
  - claim(subj,comp), deny(subj,obj)
  - ↑TOPICREL = ↑COMP* OBJ (search along a path of 0 or more COMPs)
43. Parsing: Penn-II and LFG
- Subcat frames as above (O'Donovan et al. 2004, 2005)
- Functional uncertainty equations:
  - Automatically acquire finite approximations of FU-equations
  - Extract paths between co-indexed material in the automatically generated f-structures from sections 02-21 of Penn-II
  - 26 TOPIC, 60 TOPICREL and 13 FOCUS path types
  - 99.69% coverage of the paths in WSJ Section 23
  - Each path type is associated with a probability
- LDD resolution ranked by path x subcat probabilities (Cahill et al., 2004); see the sketch below
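A toy illustration of that ranking (all probabilities invented): candidate resolution paths for a TOPICREL filler are scored by P(path) x P(subcat frame), keeping only paths whose terminal GF is licensed by the frame:

```python
PATH_PROBS = {("OBJ",): 0.40, ("COMP", "OBJ"): 0.25, ("COMP", "SUBJ"): 0.10}
FRAME_PROBS = {("claim", ("SUBJ", "COMP")): 0.6,
               ("see", ("SUBJ", "OBJ")): 0.7,
               ("see", ("SUBJ",)): 0.1}

def rank_resolutions(verb, candidate_paths):
    """Score each (path, frame) pair by P(path) * P(frame); return the best."""
    scored = []
    for path in candidate_paths:
        for (pred, frame), p_frame in FRAME_PROBS.items():
            if pred == verb and path[-1] in frame:   # terminal GF licensed
                scored.append((PATH_PROBS.get(path, 0.0) * p_frame, path, frame))
    return max(scored)

# 'the deal that he claims they saw': resolve TOPICREL inside COMP of 'claim'
print(rank_resolutions("see", [("COMP", "OBJ"), ("COMP", "SUBJ")]))
# (0.175, ('COMP', 'OBJ'), ('SUBJ', 'OBJ'))
```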
44. Parsing: Penn-II and LFG
- How do treebank-based constraint grammars compare to deep hand-crafted grammars like XLE and RASP?
- XLE (Riezler et al. 2002, Kaplan et al. 2004)
  - a hand-crafted, wide-coverage, deep, state-of-the-art English LFG and XLE parsing system with log-linear probability models for disambiguation
  - PARC 700 Dependency Bank gold standard (King et al. 2003), based on Penn-II Section 23
- RASP (Carroll and Briscoe 2002)
  - a hand-crafted, wide-coverage, deep, state-of-the-art English probabilistic unification grammar and parsing system (RASP: Robust Accurate Statistical Parsing)
  - CBS 500 Dependency Bank gold standard (Carroll, Briscoe and Sanfilippo 1999), based on Susanne
45. Parsing: Penn-II and LFG
- (Bikel 2002) retrained to retain the Penn-II functional tags (-SBJ, -LOC, -TMP, -CLR, -LGS, etc.)
- Pipeline architecture:
  - tag text -> retrained Bikel parser -> f-structure annotation algorithm -> LDD resolution -> f-structures -> automatic conversion -> evaluation against the XLE/RASP gold standards (PARC 700/CBS 500 Dependency Banks)
46. Parsing: Penn-II and LFG
- Systematic differences between our f-structures and the PARC 700 and CBS 500 dependency representations
- Automatic conversion of f-structures to PARC 700- / CBS 500-like structures (Burke et al. 2004, Burke 2006, Cahill et al. 2008)
- Evaluation software from (Crouch et al. 2002) and (Carroll and Briscoe 2002)
- Approximate Randomisation Test (Noreen 1989) for statistical significance
47. Parsing: Penn-II and LFG
- Results: dependency f-scores (CL 2008 paper)
- PARC 700, XLE vs. DCU-LFG:
  - 80.55 XLE
  - 82.73 DCU-LFG (+2.18)
- CBS 500, RASP vs. DCU-LFG:
  - 76.57 RASP
  - 80.23 DCU-LFG (+3.66)
- Results statistically significant at the 95% level (Noreen 1989)
- Best result to date against PARC 700: 84.00 (+3.45), using the Charniak reranking parser and Grzegorz Chrupala's Penn-II function-tag labeler
48. Parsing: Penn-II and LFG
- PARC 700 Evaluation
49. Parsing: Penn-II and LFG
50. Parsing: Penn-II and LFG
51. Parsing: Penn-II and LFG
52. Parsing: Penn-II and LFG
53. Parsing: Penn-II and LFG
54. Evaluation against Treebank-Based CCG and HPSG
- CCG: Combinatory Categorial Grammar (Steedman 2000)
- HPSG: Head-Driven Phrase Structure Grammar (Pollard and Sag 1994)
- Both are constraint-based grammar formalisms
- Treebank-based CCG resources (Hockenmaier and Steedman 2002, Hockenmaier 2003, Clark and Curran 2004, ...)
- Treebank-based HPSG resources (Miyao, Ninomiya and Tsujii 2003, Miyao and Tsujii 2004, ...)
- DepBank: a reannotated version of the PARC 700 (Briscoe and Carroll 2006) with CBS 500-style GRs
- RASP (version 2) (Briscoe and Carroll 2006)
55. Evaluation against Treebank-Based CCG and HPSG
- CCG
  - Small set of basic categories: NP, N, PP, S
  - Complex categories: VP = S\NP, intransitive verb = S\NP, transitive verb = (S\NP)/NP
  - Small set of combination rules (see the sketch below):
    - X/Y Y => X (forward application)
    - Y X\Y => X (backward application)
    - X/Y Y/Z => X/Z (forward composition)
    - ...
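The three rules listed above are enough for a tiny executable illustration (string-encoded categories, simplified splitting; not the CCG parsers cited here):

```python
def split(cat):
    """Split 'X/Y' or 'X\\Y' at the top-level connective; None for atoms."""
    depth = 0
    for i in range(len(cat) - 1, -1, -1):
        depth += {")": 1, "(": -1}.get(cat[i], 0)
        if depth == 0 and cat[i] in "/\\":
            return cat[:i].strip("()"), cat[i], cat[i + 1:].strip("()")
    return None

def combine(left, right):
    l, r = split(left), split(right)
    if l and l[1] == "/" and l[2] == right:              # X/Y  Y    => X
        return l[0]
    if r and r[1] == "\\" and r[2] == left:              # Y    X\Y  => X
        return r[0]
    if l and r and l[1] == r[1] == "/" and l[2] == r[0]: # X/Y  Y/Z  => X/Z
        return f"{l[0]}/{r[2]}"
    return None

print(combine("NP/N", "N"))      # NP   (forward application)
print(combine("NP", "S\\NP"))    # S    (backward application)
print(combine("S/NP", "NP/N"))   # S/N  (forward composition)
```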
56. Evaluation against Treebank-Based CCG and HPSG
- HPSG
  - Uniform representation: typed feature structures and inheritance
  - Sign: PHON, SYNSEM, DTRS
  - Inheritance hierarchy
  - Principles (HEAD-FEATURE, VALENCE, ...)
  - ID-Schemata (HEAD-COMP, HEAD-MOD, ...)
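A tiny illustration (assumed encoding, nothing like a full HPSG engine) of the two ingredients named above: a typed sign and an inheritance hierarchy:

```python
# Toy type hierarchy: 'word' and 'phrase' inherit from 'sign'.
TYPE_PARENT = {"word": "sign", "phrase": "sign", "sign": None}

def subtype(t, super_t):
    """Walk parent links to test inheritance."""
    while t is not None:
        if t == super_t:
            return True
        t = TYPE_PARENT[t]
    return False

# A typed feature structure for a sign with PHON and SYNSEM
# (phrases would additionally carry DTRS, their daughters)
sleeps = {"TYPE": "word", "PHON": ["sleeps"],
          "SYNSEM": {"HEAD": "verb", "SUBJ": ["NP"]}}
print(subtype(sleeps["TYPE"], "sign"))   # True: every word is a sign
```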
57. Evaluation against Treebank-Based CCG and HPSG
58. Evaluation against Treebank-Based CCG and HPSG
59. Evaluation against Treebank-Based CCG and HPSG
60. Probability Models: Penn-II LFG
61. Probability Models: Penn-II LFG
62. Probability Models: Penn-II LFG
- The results are interesting because:
  - Extensive treebank preprocessing (clean-up, correction and restructuring) in CCG and (to some extent) in HPSG; none in LFG
  - Custom-designed parsers and sophisticated (log-linear, maximum-entropy) parse-selection probability models in HPSG and CCG
  - In LFG, a mix of off-the-shelf and custom-designed components, each with its own probability model, in an early-disambiguation processing pipeline; no proper overall probability model, an approximation at best
  - Still, competitive results
63. Probability Models: Penn-II LFG
- Probability Models
- Our approach does not constitute a proper probability model (Abney, 1996)
- Why? The probability model leaks (see the formulation below):
  - The highest-ranking parse tree may feature f-structure equations that cannot be resolved into an f-structure
  - The probability associated with that parse tree is lost
- This doesn't happen often in practice (coverage >99.5% on unseen data)
- Research on appropriate discriminative, log-linear or maximum-entropy models is important (Miyao and Tsujii, 2002; Riezler et al. 2002)
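One way to state the leak precisely (our notation, not from the slides): let $T(s)$ be the set of parse trees the PCFG assigns to sentence $s$, and let $R(t)$ hold iff tree $t$'s f-structure equations are resolvable. Then

$$\sum_{t \in T(s)} P(t \mid s) = 1 \qquad \text{but} \qquad \sum_{\substack{t \in T(s) \\ R(t)}} P(t \mid s) < 1$$

whenever some tree's equation set is unsatisfiable: the probability mass of such trees is lost, so the induced distribution over f-structures is deficient in Abney's (1996) sense.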
64. Demo System
- http://lfg-demo.computing.dcu.ie/lfgparser.html
65. Applications: Generation
66. Applications: Generation
- Research question:
  - Can we make the automatically induced LFG resources reversible/bi-directional?
  - Can they be used for both (probabilistic) parsing and generation?
67. Generation: Penn-II LFG
68. Generation: Penn-II LFG
69. Generation: Penn-II LFG
70. Generation: Penn-II LFG
71. Generation: Penn-II LFG
72. Generation: Penn-II LFG
73. Generation: Penn-II LFG
74. Generation: Penn-II LFG
- Problem: conditioning of generation rules on purely local f-structure features
- Solution I: generation grammar transformation (Cahill et al. 2006)
- Solution II: history-based probabilistic generation (Hogan et al. 2007, Cafferkey et al. 2007): condition generation rules on the parent GF (see the sketch below)
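A hedged sketch of Solution II's core move: estimate generation rule probabilities conditioned on the parent grammatical function as well as the local features. The rule inventory and counts below are invented:

```python
from collections import Counter, defaultdict

# (LHS category, parent GF) -> counts of RHS expansions, as if read off
# the f-structure-annotated treebank
counts = defaultdict(Counter)
counts[("NP", "SUBJ")].update({"DT NN": 60, "NNP": 40})
counts[("NP", "OBJ")].update({"DT NN": 30, "DT JJ NN": 70})

def p_rule(lhs, parent_gf, rhs):
    """Relative-frequency estimate of P(rhs | lhs, parent GF)."""
    history = counts[(lhs, parent_gf)]
    return history[rhs] / sum(history.values())

# The same NP expansion gets different probabilities under SUBJ vs OBJ:
print(p_rule("NP", "SUBJ", "DT NN"))   # 0.6
print(p_rule("NP", "OBJ", "DT NN"))    # 0.3
```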
75. Generation: Penn-II LFG
76. Generation: Penn-II LFG
77. Generation: Penn-II LFG
78. Generation: the Good, the Bad and the Ugly
- Orig: Supporters of the legislation view the bill as an effort to add stability and certainty to the airline-acquisition process , and to preserve the safety and fitness of the industry .
- Gen: Supporters of the legislation view the bill as an effort to add stability and certainty to the airline-acquisition process , and to preserve the safety and fitness of the industry .
- Orig: The upshot of the downshoot is that the A 's go into San Francisco 's Candlestick Park tonight up two games to none in the best-of-seven fest .
- Gen: The upshot of the downshoot is that the A 's tonight go into San Francisco 's Candlestick Park up two games to none in the best-of-seven fest .
- Orig: By this time , it was 4:30 a.m. in New York , and Mr. Smith fielded a call from a New York customer wanting an opinion on the British stock market , which had been having troubles of its own even before Friday 's New York market break .
- Gen: Mr. Smith fielded a call from New a customer York wanting an opinion on the market British stock which had been having troubles of its own even before Friday 's New York market break by this time and in New York , it was 4:30 a.m. .
- Orig: Only half the usual lunchtime crowd gathered at the tony Corney & Barrow wine bar on Old Broad Street nearby .
- Gen: At wine tony Corney & Barrow the bar on Old Broad Street nearby gathered usual , lunchtime only half the crowd , .
79. Generation: Penn-II LFG
80. Generation: Penn-II LFG
- Problem: conditioning of generation rules on purely local f-structure features
- Solution I: generation grammar transformation (Cahill et al. 2006)
- Solution II: history-based probabilistic generation (Hogan et al. 2007, Cafferkey et al. 2007): condition generation rules on the parent GF
81. Generation: the Good, the Bad and the Ugly
- Orig: By this time , it was 4:30 a.m. in New York , and Mr. Smith fielded a call from a New York customer wanting an opinion on the British stock market , which had been having troubles of its own even before Friday 's New York market break .
- Gen (Cahill et al. 2006, GGT): Mr. Smith fielded a call from New a customer York wanting an opinion on the market British stock which had been having troubles of its own even before Friday 's New York market break by this time and in New York , it was 4:30 a.m. .
- Gen (Hogan et al. 2007, HB): By this time , in New York , it was 4:30 a.m. , and Mr. Smith fielded a call from New a customer York , wanting an opinion on the market British stock which had been having troubles of its own even before Friday 's New York market break .
- Gen (Hogan et al. 2007, HB MWU): By this time , in New York , it was 4:30 a.m. , and Mr. Smith fielded a call from a New York customer , wanting an opinion on the market British stock which had been having troubles of its own even before Friday 's New York market break .
82. Generation: Chinese CTB2
- CTB2 (Yuqing Guo - Toshiba China Beijing R&D Lab)
- (Cahill et al. 2006) out of the box
- Training: articles 1-270 (3,480 sentences)
- Testing: articles 301-325 (351 sentences)
83. Applications: Machine Translation
- Applications: Machine Translation
- Labelled Dependency-Based MT Evaluation (LaDEva)
- Automatic Acquisition of Transfer Rules
84. Applications: Machine Translation
- Labelled-Dependency-Based MT Evaluation
- Most automatic MT evaluation metrics (BLEU, NIST) are string (n-gram) based
- They unfairly punish perfectly legitimate syntactic and lexical variation:
  - Yesterday John resigned.
  - John resigned yesterday.
  - Yesterday John quit.
- Legitimate lexical variation: throw WordNet synonyms into the string match
- What about syntactic variation?
85. Applications: Machine Translation
- Idea: use labelled dependencies for MT evaluation (see the sketch below)
- Why dependencies? They abstract away from some particulars of surface realisation:
  - Adjunct placement, order of conjuncts in a coordination, topicalisation, ...
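A minimal sketch of the metric's core (the real system adds n-best parses, WordNet synonyms and partial matching; the triples below are hand-made): precision/recall/F over labelled dependency triples, which is insensitive to adjunct placement:

```python
def dep_fscore(hyp_triples, ref_triples):
    """F-score over labelled dependency triples (relation, head, dependent)."""
    hyp, ref = set(hyp_triples), set(ref_triples)
    if not hyp or not ref:
        return 0.0
    matched = len(hyp & ref)
    precision, recall = matched / len(hyp), matched / len(ref)
    return 2 * precision * recall / (precision + recall) if matched else 0.0

# 'Yesterday John resigned.' vs 'John resigned yesterday.': same triples
ref = {("SUBJ", "resign", "john"), ("ADJUNCT", "resign", "yesterday")}
hyp = {("SUBJ", "resign", "john"), ("ADJUNCT", "resign", "yesterday")}
print(dep_fscore(hyp, ref))   # 1.0: word-order variation is not punished
```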
86. Applications: Machine Translation
- The idea is intuitive
- To make it happen, you need a robust parser that can parse MT output
- Treebank-induced parsers parse anything!
- How do we judge whether the labelled dependency-based method is better than string-based methods?
- We compare (correlation) with human judgement/evaluation performance
- Why? Humans are not fooled by legitimate syntactic variation
87. Applications: Machine Translation
- Experiment: use the LDC Multiple Translation Chinese (MTC) Parts 2 and 4 data
- 16,807 translation-reference-human-score segments
- 5,007 for testing, the rest for training (weights etc.)
- To make this work, we throw in:
  - n-best parsing
  - WordNet synonyms
  - partial matching
  - training weights
  - etc.
88. Applications: Machine Translation
89. Applications: Machine Translation
90. References (MT Eval)
- Karolina Owczarzak, Yvette Graham and Josef van Genabith. Using F-structures in Machine Translation Evaluation. In Proceedings of the 12th International Conference on Lexical Functional Grammar, July 28-30, 2007, Stanford, CA.
- Karolina Owczarzak, Josef van Genabith and Andy Way. Labelled Dependencies in Machine Translation Evaluation. In Proceedings of the ACL 2007 Workshop on Statistical Machine Translation, pages 104-111, Prague, Czech Republic.
- Karolina Owczarzak, Josef van Genabith and Andy Way. Dependency-Based Automatic Evaluation for Machine Translation. In Proceedings of the HLT-NAACL 2007 Workshop on Syntax and Structure in Statistical Translation, Rochester, NY.
91. References (Parsing)
- Aoife Cahill, Michael Burke, Ruth O'Donovan, Stefan Riezler, Josef van Genabith and Andy Way. 2008. Wide-Coverage Statistical Parsing Using Automatic Dependency Structure Annotation. Computational Linguistics, 34(1), MIT Press, March 2008 (accepted for publication).
- Joachim Wagner, Djamé Seddah, Jennifer Foster and Josef van Genabith. C-Structures and F-Structures for the British National Corpus. In Proceedings of the 12th International Conference on Lexical Functional Grammar, July 28-30, 2007, Stanford, CA.
- A. Cahill, M. Burke, R. O'Donovan, J. van Genabith and A. Way. Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), July 21-26, 2004, pages 320-327, Barcelona, Spain.
- A. Cahill, M. McCarthy, J. van Genabith and A. Way. Parsing with PCFGs and Automatic F-Structure Annotation. In M. Butt and T. Holloway King (eds.), Proceedings of the Seventh International Conference on LFG, CSLI Publications, Stanford, CA, pages 76-95, 2002.
92. References (Generation, Lex. Acq.)
- Deirdre Hogan, Conor Cafferkey, Aoife Cahill and Josef van Genabith. Exploiting Multi-Word Units in History-Based Probabilistic Generation. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007), Prague, Czech Republic, pages 267-276.
- A. Cahill and J. van Genabith. Robust PCFG-Based Generation using Automatically Acquired LFG Approximations. COLING/ACL 2006, Sydney, Australia.
- R. O'Donovan, M. Burke, A. Cahill, J. van Genabith and A. Way. Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks. Computational Linguistics, 2005.
- R. O'Donovan, M. Burke, A. Cahill, J. van Genabith and A. Way. Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II Treebank. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), July 21-26, 2004, pages 368-375, Barcelona, Spain.