1
Treebank-Based Wide Coverage Probabilistic LFG
Resources
Josef van Genabith, Aoife Cahill, Grzegorz
Chrupala, Jennifer Foster, Deirdre Hogan, Conor
Cafferkey, Mick Burke, Ruth O'Donovan, Yvette
Graham, Karolina Owczarzak, Yuqing Guo, Ines
Rehbein, Natalie Schluter and Djamé Seddah
National Centre for Language Technology (NCLT),
School of Computing, Dublin City University
2
Overview
  • Context/Motivation
  • Treebank-Based Acquisition of Wide-Coverage LFG
    Resources (Penn-II)
  • LFG
  • Automatic F-Structure Annotation Algorithm
  • Acquisition of Lexical Resources
  • Parsing
  • Parsing Architectures
  • LDD-Resolution
  • Comparison with Hand-Crafted (XLE, RASP) and
    Treebank-Based (CCG, HPSG) Resources
  • Generation
  • Basic Generator
  • Generation Grammar Transforms
  • History-Based Generation
  • MT Evaluation

3
Motivation
  • What do grammars do?
  • Grammars define languages as sets of strings
  • Grammars define what strings are grammatical and
    what strings are not
  • Grammars tell us about the syntactic structure of
    (associated with) strings
  • Shallow vs. Deep grammars
  • Shallow grammars do all of the above
  • Deep grammars (in addition) relate text to
    information/meaning representation
  • Information: predicate-argument-adjunct
    structure, deep dependency relations, logical
    forms, ...
  • In natural languages, linguistic material is not
    always interpreted locally where you encounter
    it: long-distance dependencies (LDDs)
  • Resolution of LDDs is crucial to construct accurate
    and complete information/meaning representations.
  • Deep grammars = (text <-> meaning) + (LDD
    resolution)

4
Motivation
  • Constraint-Based Grammar Formalisms (FUG, GPSG,
    PATR-II, ...)
  • Lexical-Functional Grammar (LFG)
  • Head-Driven Phrase Structure Grammar (HPSG)
  • Combinatory Categorial Grammar (CCG)
  • Tree-Adjoining Grammar (TAG)
  • Traditionally, deep constraint-based grammars are
    hand-crafted
  • LFG: ParGram; HPSG: LinGO/ERG; Core Language Engine
    (CLE); Alvey Tools; RASP; ALPINO; ...
  • Wide-coverage, deep constraint-based grammar
    development is very time consuming, knowledge
    extensive and expensive!
  • Very hard to scale hand-crafted grammars to
    unrestricted text!
  • English XLE (Riezler et al. 2002); German XLE
    (Forst and Rohrer 2006); Japanese XLE (Masuichi
    and Okuma 2003); RASP (Carroll and Briscoe 2002);
    ALPINO (Bouma, van Noord and Malouf, 2000)

5
Motivation
  • Instance of knowledge acquisition bottleneck
    familiar from classical rationalist
    rule/knowledge-based AI/NLP
  • Alternative to classical rationalist
    rule/knowledge-based AI/NLP
  • Empiricist data-driven research paradigm
    (AI/NLP)
  • Corpora, machine-learning-based and
    statistical approaches, ...
  • Treebank-based grammar acquisition, probabilistic
    parsing
  • Advantage: grammars can be induced (learned)
    automatically
  • Very low development cost, wide coverage, robust,
    but ...
  • Most treebank-based grammar induction/parsing
    technology produces shallow grammars
  • Shallow grammars don't resolve LDDs (but see
    (Johnson 2002)) and do not map strings to
    information/meaning representations

6
Motivation
  • Poses a number of research questions
  • Can we address the knowledge acquisition
    bottleneck for deep grammar development by
    combining insights from rationalist and
    empiricist research paradigms?
  • Specifically
  • Can we automatically acquire wide-coverage
    deep, probabilistic, constraint-based grammars
    from treebanks?
  • How do we use them in parsing?
  • Can we use them for generation?
  • Can we acquire resources for different languages
    and treebank encodings?
  • How do these resources compare with hand-crafted
    resources?
  • How do they fare in applications?

7
Context
  • TAG (Xia, 2001)
  • LFG (Cahill, McCarthy, van Genabith and Way,
    2002)
  • CCG (Hockenmaier & Steedman, 2002)
  • HPSG (Miyao and Tsujii, 2003)
  • LFG
  • (van Genabith, Sadler and Way, 1999)
  • (Frank, 2000)
  • (Sadler, van Genabith and Way, 2000)
  • (Frank, Sadler, van Genabith and Way, 2003)

8
Lexical-Functional Grammar (LFG)
  • Parsing

9
LFG Acquisition for English - Overview
  • Treebank-Based Acquisition of LFG Resources
    (Penn-II)
  • Lexical Functional Grammar LFG
  • Penn-II Treebank Preprocessing/Clean-Up
  • F-Str Annotation Algorithm
  • Grammar and Lexicon Extraction
  • Parsing Architectures (LDD Resolution)
  • Comparison with best hand-crafted resources XLE
    and RASP
  • Comparison with treebank-based CCG and HPSG
    resources

10
Lexical-Functional Grammar (LFG)
  • Lexical-Functional Grammar (LFG) (Bresnan &
    Kaplan 1981, Bresnan 2001, Dalrymple 2001) is a
    constraint-based theory of grammar.
  • Two (basic) levels of representation
  • C-structure represents surface grammatical
    configurations such as word order, annotated CFG
    rules/trees
  • F-structure represents abstract syntactic
    functions such as SUBJ(ject), OBJ(ect),
    OBL(ique), PRED(icate), COMP(lement), ADJ(unct);
    AVMs: attribute-value matrices/feature
    structures
  • F-structure approximates basic
    predicate-argument structure, dependency
    representation, logical form (van Genabith and
    Crouch, 1996, 1997)

11
Lexical-Functional Grammar (LFG)
12
Lexical-Functional Grammar (LFG)
  • Subcategorisation
  • Semantic forms (subcat frames): see<SUBJ,OBJ>
  • Completeness: all GFs in the semantic form are
    present at the local f-structure
  • Coherence: only the GFs in the semantic form are
    present at the local f-structure
  • Long Distance Dependencies (LDDs) resolved at
    f-structure with
  • Functional Uncertainty Equations (regular
    expressions specifying paths in f-structure),
    e.g. ↑TOPICREL = ↑COMP* OBJ
  • subcat frames
  • Completeness/Coherence (a minimal check is sketched below).
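
A minimal Python sketch of the Completeness/Coherence check (illustration only, not the DCU implementation; the dict-based f-structure representation and the set of governable GFs are assumptions made here):

    # Toy check of LFG Completeness and Coherence against a semantic form.
    GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP"}

    def check_subcat(fstr, semantic_form):
        """fstr: dict of GFs/features; semantic_form: set of subcategorised GFs,
        e.g. {"SUBJ", "OBJ"} for see<SUBJ,OBJ>."""
        local_gfs = {gf for gf in fstr if gf in GOVERNABLE}
        complete = semantic_form <= local_gfs   # every subcategorised GF is present
        coherent = local_gfs <= semantic_form   # no extra governable GF is present
        return complete, coherent

    # "John saw Mary" with see<SUBJ,OBJ>
    f = {"PRED": "see", "SUBJ": {"PRED": "John"}, "OBJ": {"PRED": "Mary"}, "TENSE": "past"}
    print(check_subcat(f, {"SUBJ", "OBJ"}))     # (True, True)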

13
Lexical-Functional Grammar (LFG)
14
Introduction Penn-II LFG
  • If we had an f-structure-annotated version of
    Penn-II, we could use (standard) machine learning
    methods to extract probabilistic, wide-coverage
    LFG resources
  • How do we get an f-structure-annotated Penn-II?
  • Manually? No: ~50,000 trees!
  • Automatically! Yes: f-structure annotation
    algorithm!
  • Penn-II is a 2nd-generation treebank: it contains
    lots of annotations to support derivation of deep
    meaning representations
  • trees, Penn-II functional tags (-SBJ, -TMP,
    -LOC), traces and coindexation
  • f-structure annotation algorithm exploits those.

15
Treebank Annotation Penn-II LFG
16
Treebank Annotation Penn-II LFG
17
Treebank Preprocessing/Clean-Up Penn-II LFG
  • The Penn-II treebank often has flat analyses
    (coordination, NPs, ...) and a certain amount of noise:
    inconsistent annotations, errors
  • No treebank preprocessing or clean-up in the LFG
    approach (unlike CCG- and HPSG-based approaches)
  • Take the Penn-II treebank as is, but
  • Remove all trees with FRAG or X labelled
    constituents
  • FRAG: fragments; X: constituents whose annotation is unknown
  • Total of 48,424 trees, used as they are.

18
Treebank Annotation Penn-II LFG
  • Annotation-based (rather than conversion-based)
  • Automatic annotation of nodes in Penn-II treebank
    trees with f-structure equations
  • Annotation Algorithm exploits
  • Head information
  • Categorial information
  • Configurational information
  • Penn-II functional tags
  • Trace information

19
Treebank Annotation Penn-II LFG
  • Architecture of a modular algorithm to assign LFG
    f-structure equations to trees in the Penn-II
    treebank

Head-Lexicalisation (Magerman, 1994)
Left-Right Context Annotation Principles
Proto F-Structures
Coordination Annotation Principles
Proper F-Structures
Catch-All and Clean-Up
Traces

20
Treebank Annotation Penn-II LFG
  • Head Lexicalisation: modified rules based on
    (Magerman, 1994); a toy head-finding sketch follows below
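
A minimal sketch of Magerman-style head finding in Python; the head-rule fragment below is invented for illustration and differs from the actual modified rules used here:

    # Toy head finder: for each category, a search direction and a priority
    # list of candidate head daughters.
    HEAD_RULES = {
        "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
        "VP": ("left",  ["VBD", "VBZ", "VBP", "VB", "VP"]),
        "S":  ("left",  ["VP", "S"]),
    }

    def find_head(category, children):
        """children: daughter categories; returns the index of the head daughter."""
        direction, priorities = HEAD_RULES.get(category, ("left", []))
        order = list(range(len(children)))
        if direction == "right":
            order.reverse()
        for cand in priorities:                 # try candidates in priority order
            for i in order:
                if children[i] == cand:
                    return i
        return order[0]                         # fall back to first daughter searched

    print(find_head("NP", ["DT", "JJ", "NN"]))  # 2: the rightmost noun heads the NP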

21
Treebank Annotation Penn-II LFG
  • Left-Right Context Annotation Principles
  • Head of NP likely to be rightmost noun
  • Mother → Left Context  Head  Right Context

22
Treebank Annotation Penn-II LFG
Left-Right Annotation Matrix for NP
Left Context: DT: ↑SPEC:DET = ↓; QP: ↑SPEC:QUANT = ↓; JJ, ADJP: ↓ ∈ ↑ADJUNCT
Head: NN, NNS: ↑ = ↓
Right Context: NP: ↓ ∈ ↑APP; PP: ↓ ∈ ↑ADJUNCT; S, SBAR: ↓ ∈ ↑RELMOD
[Example tree on the slide: the NP "a very politicized deal", with DT annotated
↑SPEC:DET = ↓, ADJP annotated ↓ ∈ ↑ADJUNCT and the head NN annotated ↑ = ↓;
a toy application of the matrix is sketched below]
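
A minimal sketch of applying such a left-right annotation matrix to one NP rule (Python; '^' stands for the up-arrow, the mother's f-structure, and '!' for the down-arrow, the node's own f-structure; the dictionary mirrors the slide but is only an illustration):

    NP_MATRIX = {
        "left":  {"DT": "^SPEC:DET=!", "QP": "^SPEC:QUANT=!",
                  "JJ": "! elem ^ADJUNCT", "ADJP": "! elem ^ADJUNCT"},
        "head":  {"NN": "^=!", "NNS": "^=!"},
        "right": {"NP": "! elem ^APP", "PP": "! elem ^ADJUNCT",
                  "S": "! elem ^RELMOD", "SBAR": "! elem ^RELMOD"},
    }

    def annotate_np(daughters, head_index):
        """daughters: daughter categories; returns (category, equation) pairs."""
        annotated = []
        for i, cat in enumerate(daughters):
            if i == head_index:
                eq = NP_MATRIX["head"].get(cat, "^=!")
            elif i < head_index:
                eq = NP_MATRIX["left"].get(cat)
            else:
                eq = NP_MATRIX["right"].get(cat)
            annotated.append((cat, eq))
        return annotated

    # NP -> DT ADJP NN ("a very politicized deal"), head daughter = NN
    print(annotate_np(["DT", "ADJP", "NN"], head_index=2))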
23
Treebank Annotation Penn-II LFG
24
Treebank Annotation Penn-II LFG
  • Build an annotation matrix for each of the monadic
    categories (without functional tags) in Penn-II
  • Based on analysing the most frequent rule types
    for each category, such that
  • the sum total of token frequencies of these rule
    types is greater than 85% of the total number of rule
    tokens for that category (a selection sketch follows below)
  • Rule types (100%) vs. rule types needed for 85% coverage:
    NP: 6595 / 102    VP: 10239 / 307
    S: 2602 / 20      ADVP: 234 / 6
  • Apply the annotation matrix to all (i.e. also unseen)
    rules/sub-trees, i.e. also NP-LOC, NP-TMP
    etc.
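
A minimal sketch of the 85% rule-type selection in Python (the counts in the example are invented; the real figures are in the table above):

    from collections import Counter

    def rules_for_coverage(rule_token_counts, threshold=0.85):
        """Smallest set of most frequent rule types whose summed token
        frequency reaches the coverage threshold."""
        total = sum(rule_token_counts.values())
        selected, covered = [], 0
        for rule, freq in rule_token_counts.most_common():
            selected.append(rule)
            covered += freq
            if covered / total >= threshold:
                break
        return selected

    np_rules = Counter({("DT", "NN"): 500, ("NN",): 300,
                        ("DT", "JJ", "NN"): 150, ("NP", "PP"): 50})
    print(rules_for_coverage(np_rules))   # the rule types analysed by hand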

25
Treebank Annotation Penn-II LFG
  • Traces Module
  • Long Distance Dependencies (LDDs)
  • Topicalisation
  • Questions
  • Wh- and wh-less relative clauses
  • Passivisation
  • Control constructions
  • ICH (interpret constituent here)
  • RNR (right node raising)
  • Translate Penn-II traces and coindexation into
    corresponding reentrancy in f-structure

26
Treebank Annotation Control Wh-Rel. LDD
27
Treebank Annotation Penn-II LFG
Head-Lexicalisation (Magerman, 1994)
Left-Right Context Annotation Principles
Proto F-Structures
Coordination Annotation Principles
Proper F-Structures
Catch-All and Clean-Up
Traces

Constraint Solver
28
Treebank Annotation Penn-II LFG
  • Collect f-structure equations
  • Send them to the constraint solver
  • This generates the f-structures (a toy solver sketch
    follows below)
  • F-structure annotation algorithm in Java,
    constraint solver in Prolog
  • ~3 min to annotate the 50,000 Penn-II trees
  • ~5 min to produce the 50,000 f-structures
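
A toy illustration (Python) of what the constraint-solving step does: instantiated f-structure equations over f-structure variables are resolved into nested structures. The real solver is in Prolog and performs full unification; the equation format below is an assumption made for this sketch:

    def solve(equations):
        fs = {}                                    # variable name -> attribute dict
        def node(v):
            return fs.setdefault(v, {})
        for var, attr, value in equations:
            # values naming another variable ("f2", ...) are shared structure,
            # which is how reentrancies (e.g. from traces) arise
            target = node(value) if isinstance(value, str) and value.startswith("f") else value
            node(var)[attr] = target
        return fs

    eqs = [("f1", "PRED", "see<SUBJ,OBJ>"), ("f1", "TENSE", "past"),
           ("f1", "SUBJ", "f2"), ("f2", "PRED", "John"),
           ("f1", "OBJ",  "f3"), ("f3", "PRED", "Mary")]
    print(solve(eqs)["f1"])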

29
Treebank Annotation Penn-II LFG
  • Evaluation (Quantitative)
  • Coverage
  • Over 99.8% of Penn-II sentences (without X and
    FRAG constituents) receive a single covering and
    connected f-structure

0 f-structures: 45 (0.093%)
1 f-structure: 48,329 (99.804%)
2 f-structures: 50 (0.103%)
30
Treebank Annotation Penn-II LFG
  • F-structure quality evaluation against DCU 105
    Dependency Bank, a manually annotated dependency
    gold standard of 105 sentences randomly extracted
    from WSJ section 23.
  • Triples are extracted from the gold standard
  • Evaluation software from (Crouch et al. 2002) and
    (Riezler et al. 2002); a toy triple-scoring sketch
    follows below
  • relation(predicate0, argument1)

DCU 105 All Annotations Preds-Only
Precision 97.06 94.28
Recall 96.80 94.28
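
For concreteness, a minimal Python sketch of the triple-based scoring (the actual evaluation software is the one cited above; the triples here are invented):

    def triple_prf(test_triples, gold_triples):
        test, gold = set(test_triples), set(gold_triples)
        correct = len(test & gold)
        p = correct / len(test) if test else 0.0
        r = correct / len(gold) if gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    gold = {("subj", "resign", "john"), ("adjunct", "resign", "yesterday")}
    test = {("subj", "resign", "john"), ("obj", "resign", "yesterday")}
    print(triple_prf(test, gold))   # (0.5, 0.5, 0.5)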
31
Treebank Annotation Penn-II LFG
  • Following (Kaplan et al. 2004) evaluation against
    PARC 700 Dependency Bank calculated for
  • all annotations ? PARC features ?
    preds-only
  • Mapping required (Burke 2004, 2006)

PARC 700 PARC features
Precision 88.31
Recall 86.38
32
Grammar and Lexicon Extraction Penn-II LFG
  • Lexical Resources
  • Lexical information extremely important in modern
    lexicalised grammar formalisms
  • LFG, HPSG, CCG, TAG,
  • Lexicon development is time consuming and
    extremely expensive
  • Rarely if ever complete
  • Familiar knowledge acquisition bottleneck
  • Treebank-based subcategorisation frame induction
    (LFG semantic forms) from Penn-II and III
  • Parser-based induction from British National
    Corpus (BNC)
  • Evaluation against COMLEX, OALD, Korhonen's data
    set

33
Grammar and Lexicon Extraction Penn-II LFG
  • Lexicon Construction
  • Manual vs. Automated
  • Our Approach
  • Subcat Frames not Predefined
  • Functional and/or Categorial Information
  • Parameterised for Prepositions and Particles
  • Active and Passive
  • Long Distance Dependencies
  • Conditional Probabilities

34
Grammar and Lexicon Extraction Penn-II LFG
35
Grammar and Lexicon Extraction Penn-II LFG
apply<SUBJ,OBL:for>   win<SUBJ,OBJ>
(a toy semantic-form extraction sketch follows below)
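
A minimal sketch of the basic extraction idea in Python: for each local PRED in an f-structure, collect the subcategorisable GFs present at that level. Parameterisation for prepositions/particles, passive, LDDs and conditional probabilities (as in the actual system) are omitted; the dict representation is an assumption:

    GOVERNABLE = ["SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP"]

    def extract_frames(fstr, frames=None):
        if frames is None:
            frames = []
        if isinstance(fstr, dict):
            if "PRED" in fstr:
                gfs = [gf for gf in GOVERNABLE if gf in fstr]
                frames.append(f"{fstr['PRED']}<{','.join(gfs)}>")
            for value in fstr.values():
                extract_frames(value, frames)     # recurse into sub-f-structures
        return frames

    f = {"PRED": "win", "SUBJ": {"PRED": "team"}, "OBJ": {"PRED": "match"}}
    print(extract_frames(f))   # ['win<SUBJ,OBJ>', 'team<>', 'match<>']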
36
Grammar and Lexicon Extraction Penn-II LFG
Lexicon extracted from Penn-II (O'Donovan et al.
2005)
37
Grammar and Lexicon Extraction Penn-II LFG
38
Grammar and Lexicon Extraction Penn-II LFG
  • Parsing-Based Subcat Frame Extraction (O'Donovan
    2006)
  • Treebank-based vs. parsing-based subcat frame
    extraction
  • Parsed the British National Corpus BNC (100 million
    words) with our automatically induced LFGs
  • 19 days on a single machine, ~5 million words per
    day
  • Subcat frame extraction for 10,000 verb lemmas
  • Evaluation against COMLEX and OALD
  • Evaluation against the Korhonen (2002) gold standard
  • Our method is statistically significantly better
    than that of Korhonen (2002)

39
Parsing Penn-II and LFG
  • Overview Parsing Architectures
  • Pipeline Integrated
  • Long-Distance Dependency (LDD) Resolution at
    F-Structure
  • Evaluation Comparison with Hand-Crafted
    Resources (XLE and RASP)
  • Comparison against Treebank-Based CCG and HPSG
    Resources

40
Parsing Penn-II and LFG
41
Lexical-Functional Grammar (LFG)
42
Parsing Penn-II and LFG
  • Require
  • subcategorisation frames (O'Donovan et al., 2004,
    2005; O'Donovan 2006)
  • functional uncertainty equations
  • Previous Example
  • claim(subj,comp), deny(subj,obj)
  • ↑TOPICREL = ↑COMP* OBJ (search along a path of
    0 or more COMPs)

43
Parsing Penn-II and LFG
  • Subcat frames as above (O'Donovan et al. 2004,
    2005)
  • Functional Uncertainty equations
  • Automatically acquire finite approximations of
    FU-equations
  • Extract paths between co-indexed material in
    automatically generated f-structures from
    sections 02-21 of Penn-II
  • 26 TOPIC, 60 TOPICREL, 13 FOCUS path types
  • 99.69% coverage of paths in WSJ Section 23
  • Each path type is associated with a probability
  • LDD resolution ranked by path x subcat
    probabilities (Cahill et al., 2004); a toy ranking
    sketch follows below
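
A toy Python sketch of the ranking idea (all probabilities below are invented for illustration; the real model uses path and subcat-frame probabilities estimated from sections 02-21):

    PATH_PROBS = {("COMP", "OBJ"): 0.20, ("OBJ",): 0.45}
    SUBCAT_PROBS = {("claim", ("SUBJ", "COMP")): 0.6, ("deny", ("SUBJ", "OBJ")): 0.7}

    def rank_resolutions(candidates):
        """candidates: (path, predicate, frame) triples for one unresolved
        TOPIC/TOPICREL/FOCUS value, ranked by path x subcat probability."""
        def score(cand):
            path, pred, frame = cand
            return PATH_PROBS.get(path, 0.0) * SUBCAT_PROBS.get((pred, frame), 0.0)
        return sorted(candidates, key=score, reverse=True)

    cands = [(("COMP", "OBJ"), "deny",  ("SUBJ", "OBJ")),    # 0.20 * 0.7 = 0.14
             (("OBJ",),        "claim", ("SUBJ", "COMP"))]   # 0.45 * 0.6 = 0.27
    print(rank_resolutions(cands)[0])    # the OBJ-path resolution is ranked first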

44
Parsing Penn-II and LFG
  • How do treebank-based constraint grammars compare
    to deep hand-crafted grammars like XLE and RASP?
  • XLE (Riezler et al. 2002, Kaplan et al. 2004)
  • hand-crafted, wide-coverage, deep,
    state-of-the-art English LFG and XLE parsing
    system with log-linear-based probability models
    for disambiguation
  • PARC 700 Dependency Bank gold standard (King et
    al. 2003), Penn-II Section 23-based
  • RASP (Carroll and Briscoe 2002)
  • hand-crafted, wide-coverage, deep,
    state-of-the-art English probabilistic
    unification grammar and parsing system (RASP:
    Robust Accurate Statistical Parsing)
  • CBS 500 Dependency Bank gold standard (Carroll,
    Briscoe and Sanfilippo 1999), Susanne-based

45
Parsing Penn-II and LFG
  • (Bikel 2002) retrained to retain Penn-II
    functional tags (-SBJ, -LOC, -TMP, -CLR,
    -LGS, etc.)
  • Pipeline architecture
  • tagged text → retrained Bikel parser → f-structure
    annotation algorithm → LDD resolution →
    f-structures → automatic conversion → evaluation
    against the XLE/RASP gold standards (PARC-700/CBS-500
    Dependency Banks)

46
Parsing Penn-II and LFG
  • Systematic differences between f-structures and
    PARC 700 and CBS 500 dependency representations
  • Automatic conversion of f-structures to PARC 700
    / CBS 500-like structures (Burke et al. 2004,
    Burke 2006, Cahill et al. 2008)
  • Evaluation software (Crouch et al. 2002) and
    (Carroll and Briscoe 2002)
  • Approximate Randomisation Test (Noreen 1989) for
    statistical significance; a toy version is sketched below
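
A minimal Python sketch of the Approximate Randomisation Test over paired per-sentence scores (a simplified stand-in for the published procedure; the scores below are invented):

    import random

    def approx_randomisation(scores_a, scores_b, trials=10000, seed=0):
        rng = random.Random(seed)
        n = len(scores_a)
        observed = abs(sum(scores_a) - sum(scores_b)) / n
        extreme = 0
        for _ in range(trials):
            a_sum = b_sum = 0.0
            for x, y in zip(scores_a, scores_b):
                if rng.random() < 0.5:          # randomly swap the paired outputs
                    x, y = y, x
                a_sum += x
                b_sum += y
            if abs(a_sum - b_sum) / n >= observed:
                extreme += 1
        return (extreme + 1) / (trials + 1)     # estimated p-value

    p = approx_randomisation([0.82, 0.79, 0.90, 0.85], [0.80, 0.78, 0.88, 0.83])
    print(p)   # roughly 0.12 here: four sentences are far too few for significance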

47
Parsing Penn-II and LFG
  • Results: dependency f-scores (CL 2008 paper)
  • PARC 700: XLE vs. DCU-LFG
  • 80.55 XLE
  • 82.73 DCU-LFG (+2.18)
  • CBS 500: RASP vs. DCU-LFG
  • 76.57 RASP
  • 80.23 DCU-LFG (+3.66)
  • Results statistically significant at the 95% level
    (Noreen 1989)
  • Best result now against PARC 700: 84.00 (+3.45),
    using the Charniak reranking parser plus Grzegorz
    Chrupala's Penn-II function-tag labeler

48
Parsing Penn-II and LFG
PARC 700 Evaluation
49
Parsing Penn-II and LFG
50
Parsing Penn-II and LFG
51
Parsing Penn-II and LFG
52
Parsing Penn-II and LFG
53
Parsing Penn-II and LFG
54
Evaluation against Treebank-Based CCG and HPSG
  • CCG: Combinatory Categorial Grammar (Steedman
    2000)
  • HPSG: Head-Driven Phrase Structure Grammar
    (Pollard & Sag 1994)
  • Both constraint-based grammar formalisms
  • Treebank-based CCG resources (Hockenmaier &
    Steedman 2002, Hockenmaier 2003, Clark & Curran
    2004, ...)
  • Treebank-based HPSG resources (Miyao, Ninomiya &
    Tsujii 2003, Miyao & Tsujii 2004, ...)
  • DepBank: reannotated version of PARC 700
    (Briscoe & Carroll 2006) with CBS 500-style GRs
  • RASP (version 2) (Briscoe & Carroll 2006)

55
Evaluation against Treebank-Based CCG and HPSG
  • CCG
  • Small set of basic categories: NP, N, PP, S
  • Complex categories, e.g. VP = S\NP, intransitive
    verb = S\NP, transitive verb = (S\NP)/NP
  • Small set of combination rules (sketched below)
  • X/Y  Y  ⇒  X
  • Y  X\Y  ⇒  X
  • X/Y  Y/Z  ⇒  X/Z
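
A toy Python encoding of these three combination rules, with categories as nested tuples (e.g. (S\NP)/NP as ("/", ("\\", "S", "NP"), "NP")); purely illustrative:

    def fwd_apply(x, y):          # X/Y  Y  =>  X
        if isinstance(x, tuple) and x[0] == "/" and x[2] == y:
            return x[1]

    def bwd_apply(y, x):          # Y  X\Y  =>  X
        if isinstance(x, tuple) and x[0] == "\\" and x[2] == y:
            return x[1]

    def fwd_compose(x, y):        # X/Y  Y/Z  =>  X/Z
        if (isinstance(x, tuple) and x[0] == "/" and
                isinstance(y, tuple) and y[0] == "/" and x[2] == y[1]):
            return ("/", x[1], y[2])

    VT = ("/", ("\\", "S", "NP"), "NP")        # transitive verb (S\NP)/NP
    print(fwd_apply(VT, "NP"))                 # ('\\', 'S', 'NP'), i.e. S\NP
    print(bwd_apply("NP", ("\\", "S", "NP")))  # 'S'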

56
Evaluation against Treebank-Based CCG and HPSG
  • HPSG
  • Uniform representation: typed feature structures
    and inheritance
  • Sign: PHON, SYNSEM, DTRS
  • Inheritance hierarchy
  • Principles (HEAD-FEATURE, VALENCE, ...)
  • ID-Schemata (HEAD-COMP, HEAD-MOD, ...)

57
Evaluation against Treebank-Based CCG and HPSG
58
Evaluation against Treebank-Based CCG and HPSG
59
Evaluation against Treebank-Based CCG and HPSG
60
Probability Models Penn-II LFG
61
Probability Models Penn-II LFG
  • Evaluation Results

62
Probability Models Penn-II LFG
  • These results are interesting because
  • there is extensive treebank preprocessing (clean-up,
    correction and restructuring) in CCG and (some
    in) HPSG
  • none in LFG
  • custom-designed parsers and sophisticated
    (log-linear, max-ent) parse selection probability
    models in HPSG and CCG
  • a mix of off-the-shelf and custom-designed
    components, each with its own probability model,
    in an early-disambiguation processing pipeline in
    LFG; no proper overall probability model, at best
    an approximation
  • Still competitive results

63
Probability Models Penn-II LFG
  • Probability Models
  • Our approach does not constitute a proper
    probability model (Abney, 1996)
  • Why? The probability model leaks
  • The highest-ranking parse tree may feature
    f-structure equations that cannot be resolved
    into an f-structure
  • The probability associated with that parse tree is
    lost
  • Doesn't happen often in practice (coverage >99.5%
    on unseen data)
  • Research on appropriate discriminative,
    log-linear or maximum entropy models is important
    (Miyao and Tsujii, 2002; Riezler et al. 2002)

64
Demo System
  • http://lfg-demo.computing.dcu.ie/lfgparser.html

65
Applications Generation
  • Applications Generation

66
Applications Generation
  • Research Question
  • Can we make the automatically induced LFG
    resources reversible/bi-directional?
  • Can they be used for both (probabilistic) parsing
    and generation?

67
Generation Penn-II LFG
68
Generation Penn-II LFG
69
Generation Penn-II LFG
70
Generation Penn-II LFG
71
Generation Penn-II LFG
72
Generation Penn-II LFG
73
Generation Penn-II LFG
74
Generation Penn-II LFG
Problem: conditioning of generation rules on
purely local f-structure features. Solution I:
generation grammar transformation (Cahill et al.
2006). Solution II: history-based probabilistic
generation (Hogan et al. 2007, Cafferkey et al.
2007): condition generation rules on the parent GF
(a toy probability-estimation sketch follows below)
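
A minimal Python sketch of the conditioning idea: estimate generation-rule probabilities given the local features plus the parent GF, rather than the local features alone (counts and feature names are invented; this is not the actual DCU generator):

    from collections import Counter

    rule_counts = Counter()       # (parent_gf, local_features, rule) -> count
    context_counts = Counter()    # (parent_gf, local_features) -> count

    def observe(parent_gf, local_features, rule):
        rule_counts[(parent_gf, local_features, rule)] += 1
        context_counts[(parent_gf, local_features)] += 1

    def prob(parent_gf, local_features, rule):
        c = context_counts[(parent_gf, local_features)]
        return rule_counts[(parent_gf, local_features, rule)] / c if c else 0.0

    observe("SUBJ", ("PRED", "SPEC"), "NP -> DT NN")
    observe("OBJ",  ("PRED", "SPEC"), "NP -> DT JJ NN")
    print(prob("SUBJ", ("PRED", "SPEC"), "NP -> DT NN"))   # 1.0 on these toy counts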
75
Generation Penn-II LFG
76
Generation Penn-II LFG
77
Generation Penn-II LFG
78
Generation the Good, the Bad and the Ugly
  • Orig Supporters of the legislation view the bill
    as an effort to add stability and certainty to
    the airline-acquisition process , and to preserve
    the safety and fitness of the industry .
  • Gen Supporters of the legislation view the bill
    as an effort to add stability and certainty to
    the airline-acquisition process , and to preserve
    the safety and fitness of the industry.
  • Orig The upshot of the downshoot is that the A
    's go into San Francisco 's Candlestick Park
    tonight up two games to none in the best-of-seven
    fest .
  • Gen The upshot of the downshoot is that the A 's
    tonight go into San Francisco 's Candlestick Park
    up two games to none in the best-of-seven fest .
  • Orig By this time , it was 4:30 a.m. in New York
    , and Mr. Smith fielded a call from a New York
    customer wanting an opinion on the British stock
    market , which had been having troubles of its
    own even before Friday 's New York market break .
  • Gen Mr. Smith fielded a call from New a customer
    York wanting an opinion on the market British
    stock which had been having troubles of its own
    even before Friday 's New York market break by
    this time and in New York , it was 4:30 a.m. .
  • Orig Only half the usual lunchtime crowd
    gathered at the tony Corney Barrow wine bar on
    Old Broad Street nearby .
  • Gen At wine tony Corney Barrow the bar on Old
    Broad Street nearby gathered usual , lunchtime
    only half the crowd , .

79
Generation Penn-II LFG
80
Generation Penn-II LFG
Problem: conditioning of generation rules on
purely local f-structure features. Solution I:
generation grammar transformation (Cahill et al.
2006). Solution II: history-based probabilistic
generation (Hogan et al. 2007, Cafferkey et al.
2007): condition generation rules on the parent GF
81
Generation the Good, the Bad and the Ugly
  • Orig By this time , it was 4:30 a.m. in New York
    , and Mr. Smith fielded a call from a New York
    customer wanting an opinion on the British stock
    market , which had been having troubles of its
    own even before Friday 's New York market break .
  • Gen Mr. Smith fielded a call from New a customer
    York wanting an opinion on the market British
    stock which had been having troubles of its own
    even before Friday 's New York market break by
    this time and in New York , it was 4:30 a.m. .
    (Cahill et al. 2006) GGT
  • Gen By this time , in New York , it was 4:30
    a.m. , and Mr. Smith fielded a call from New a
    customer York , wanting an opinion on the market
    British stock which had been having troubles of
    its own even before Friday 's New York market
    break . (Hogan et al. 2007) HB
  • Gen By this time , in New York , it was 4:30
    a.m. , and Mr. Smith fielded a call from a New
    York customer , wanting an opinion on the market
    British stock which had been having troubles of
    its own even before Friday 's New York market
    break . (Hogan et al. 2007) HB MWU

82
Generation Chinese CTB2
  • CTB2 (Yuqing Guo - Toshiba China Beijing R&D Lab)
  • (Cahill et al. 2006) out of the box
  • Training: articles 1-270 (3,480 sentences)
  • Testing: articles 301-325 (351 sentences)

83
Applications Machine Translation
  • Applications Machine Translation
  • Labelled Dependency-Based MT Evaluation (LaDEva)
  • Automatic Acquisition of Transfer Rules

84
Applications Machine Translation
  • Labelled-Dependency-Based MT Evaluation
  • Most automatic MT evaluation metrics (BLEU, NIST)
    are string (n-gram) based.
  • They unfairly punish perfectly legitimate
    syntactic and lexical variation
  • Yesterday John resigned.
  • John resigned yesterday.
  • Yesterday John quit.
  • Legitimate lexical variation: throw WordNet
    synonyms into the string match
  • What about syntactic variation?

85
Applications Machine Translation
  • Idea: use labelled dependencies for MT evaluation
  • Why? Dependencies abstract away from some
    particulars of surface realisation
  • Adjunct placement, order of conjuncts in a
    coordination, topicalisation, ...

86
Applications Machine Translation
  • The idea is intuitive
  • To make it happen you need a robust parser that
    can parse MT output
  • Treebank-induced parsers parse anything!
  • How do we judge whether the labelled dependency-based
    method is better than string-based methods?
  • We compare (correlation) with human
    judgement/evaluation performance
  • Why? Humans are not fooled by legitimate syntactic
    variation

87
Applications Machine Translation
  • Experiment use LDC Multiple Translation Chinese
    (MTC) Parts 2 and 4 data
  • 16,807 translation-reference human score segments
  • 5,007 test, rest for training (weights etc.)
  • To make this work, we throw in
  • n-best parsing
  • WordNet synonyms
  • partial matching
  • training weights
  • etc
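
A minimal Python sketch of the core scoring step, labelled-dependency overlap between candidate and reference (n-best parsing, WordNet synonyms, partial matching and trained weights are all left out; the triples are hand-written rather than parser output):

    def dep_fscore(candidate_deps, reference_deps):
        cand, ref = set(candidate_deps), set(reference_deps)
        match = len(cand & ref)
        p = match / len(cand) if cand else 0.0
        r = match / len(ref) if ref else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    # "Yesterday John resigned." vs. "John resigned yesterday." share all
    # dependency triples, so legitimate word-order variation is not punished.
    ref  = {("subj", "resign", "John"), ("adjunct", "resign", "yesterday")}
    cand = {("subj", "resign", "John"), ("adjunct", "resign", "yesterday")}
    print(dep_fscore(cand, ref))   # 1.0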

88
Applications Machine Translation
89
Applications Machine Translation
90
References (MT Eval)
  • Karolina Owczarzak, Yvette Graham and Josef van
    Genabith Using F-structures in Machine
    Translation Evaluation. In Proceedings of the
    12th International Conference on Lexical
    Functional Grammar, July 28-30, 2007, Stanford,
    CA
  • Karolina Owczarzak, Josef van Genabith, and Andy
    Way. Labelled Dependencies in Machine Translation
    Evaluation. In Proceedings of ACL 2007 Workshop
    on Statistical Machine Translation, pages
    104-111, Prague, Czech Republic
  • Karolina Owczarzak, Josef van Genabith, and Andy
    Way. Dependency-Based Automatic Evaluation for
    Machine Translation. In Proceedings of HLT-NAACL
    2007 Workshop on Syntax and Structure in
    Statistical Translation. Rochester, NY.

91
References (Parsing)
  • Aoife Cahill, Michael Burke, Ruth O'Donovan,
    Stefan Riezler, Josef van Genabith and Andy Way.
    2008. Wide-Coverage Statistical Parsing Using
    Automatic Dependency Structure Annotation.
    Computational Linguistics, Volume 34, 1, MIT
    Press, March 2008. (accepted for publication)
  • Joachim Wagner, Djamé Seddah, Jennifer Foster and
    Josef van Genabith C-Structures and F-Structures
    for the British National Corpus. In Proceedings
    of the 12th International Conference on Lexical
    Functional Grammar, July 28-30, 2007, Stanford,
    CA
  • A. Cahill, M. Burke, R. O'Donovan, J. van
    Genabith, and A. Way. Long-Distance Dependency
    Resolution in Automatically Acquired
    Wide-Coverage PCFG-Based LFG Approximations, In
    Proceedings of the 42nd Annual Meeting of the
    Association for Computational Linguistics
    (ACL-04), July 21-26 2004, pages 320-327,
    Barcelona, Spain, 2004
  • Cahill A, M. McCarthy, J. van Genabith and A.
    Way. Parsing with PCFGs and Automatic F-Structure
    Annotation, In M. Butt and T. Holloway-King
    (eds.) Proceedings of the Seventh International
    Conference on LFG CSLI Publications, Stanford,
    CA., pp.76--95. 2002

92
References (Generation, Lex. Acq.)
  • Deirdre Hogan, Conor Cafferkey, Aoife Cahill and
    Josef van Genabith, Exploiting Multi-Word Units
    in History-Based Probabilistic Generation, in
    Proceedings of the Joint Conference on Empirical
    Methods in Natural Language Processing and
    Natural Language Learning (EMNLP-CoNLL 2007),
    Prague, Czech Republic. pp.267-276
  • A. Cahill and J. Van Genabith, Robust PCFG-Based
    Generation using Automatically Acquired
    LFG-Approximations, COLING/ACL 2006, Sydney,
    Australia
  • R. O'Donovan, M. Burke, A. Cahill, J. van
    Genabith and A. Way. Large-Scale Induction and
    Evaluation of Lexical Resources from the Penn-II
    and Penn-III Treebanks, Computational
    Linguistics, 2005
  • R. O'Donovan, M. Burke, A. Cahill, J. van
    Genabith, and A. Way. Large-Scale Induction and
    Evaluation of Lexical Resources from the Penn-II
    Treebank, In Proceedings of the 42nd Annual
    Meeting of the Association for Computational
    Linguistics (ACL-04), July 21-26 2004, pages
    368-375, Barcelona, Spain, 2004