Title: Linguistics 239E Week 9
1 Linguistics 239E Week 9
Generation, Evaluation, and Testing
- Ron Kaplan and Tracy King
2 Issues from HW8
- How to keep punctuation from being TOKENS
  FRAGMENT --> { PUNCT
               | NP @FIRST-EQ
               | S @FIRST-EQ
               | TOKEN @FIRST-EQ }
               PUNCT
               (REST).
3 Sample c-structure
4 Sample f-structure
5 Generation
- Parsing: string to analysis
- Generation: analysis to string
- What type of input?
- How to generate?
6 Why generate?
- Machine translation
- Lang1 string -> Lang1 fstr -> Lang2 fstr -> Lang2 string
- Sentence condensation
- Long string -> fstr -> smaller fstr -> new string
- Question answering
- Production of NL reports
- State of machine or process
- Explanation of logical deduction
- Grammar debugging
7 F-structures as input
- Use f-structures as input to the generator
- May parse sentences that shouldn't be generated
- May want to constrain the number of generated options
- Input f-structure may be underspecified
8 XLE generator
- Use the same grammar for parsing and generation
- Advantages
- maintainability
- write rules and lexicons once
- But
- special generation tokenizer
- different OT ranking
9 Generation tokenizer
- White space
- Parsing: multiple white space becomes a single TB
- John appears. -> John TB appears TB . TB
- Generation: a single TB becomes a single space (or nothing)
- John TB appears TB . TB -> John appears. / John appears .
10 Generation tokenizer
- Capitalization
- Parsing: optionally decapitalize initially
- They came -> they came
- Mary came -> Mary came
- Generation: always capitalize initially
- they came -> They came (not they came)
- May regularize other options
- quotes, dashes, etc.
11 Generation morphology
- Suppress variant forms
- Parse both favor and favour
- Generate only one
12 Morphconfig for parsing and generation
STANDARD ENGLISH MORPHOLOGY (1.0)
TOKENIZE:
P!eng.tok.parse.fst G!eng.tok.gen.fst
ANALYZE:
eng.infl-morph.fst G!amerbritfilter.fst G!amergen.fst
----
13 Reversing the parsing grammar
- The parsing grammar can be used directly as a generator
- Adapt the grammar with a special OT ranking: GENOPTIMALITYORDER
- Why do this? The parsing grammar may:
- parse ungrammatical input
- have too many options
14 Ungrammatical input
- Linguistically ungrammatical
- They walks.
- They ate banana.
- Stylistically ungrammatical
- No ending punctuation: They appear
- Superfluous commas: John, and Mary appear.
- Shallow markup: [NP John and Mary] appear.
15 Too many options
- All the generated options can be linguistically valid, but too many for applications
- Occurs when more than one string has the same, legitimate f-structure
- PP placement
- In the morning I left. I left in the morning.
16 Using the Gen OT ranking
- Generally much simpler than in the parsing direction
- Usually only use standard marks and NOGOOD
- no + marks, no STOPPOINT
- Can have a few marks that are shared by several constructions
- one or two for dispreferred
- one or two for preferred
17 Example: Comma in coordination
  COORD(_CAT) --> _CAT @CONJUNCT
                  (COMMA @(OT-MARK GenBadPunct))
                  CONJ
                  _CAT @CONJUNCT.
  GENOPTIMALITYORDER GenBadPunct NOGOOD.
- parse: They appear, and disappear.
- generate without OT: They appear(,) and disappear.
- generate with OT: They appear and disappear.
18 Example: Prefer initial PP
  S --> (PP @ADJUNCT @(OT-MARK GenGood))
        NP @SUBJ
        VP.
  VP --> V
         (NP @OBJ)
         (PP @ADJUNCT).
  GENOPTIMALITYORDER NOGOOD GenGood.
- parse: they appear in the morning.
- generate without OT: In the morning they appear. / They appear in the morning.
- generate with OT: In the morning they appear.
19 Generation commands
- XLE command line
- regenerate "They appear."
- generate-from-file my-file.pl
- (regenerate-from-directory, regenerate-testfile)
- F-structure window
- commands: generate from this fs
- Debugging commands
- regenerate-morphemes
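As a rough sketch, a session using these commands might look like this (the grammar and file names are placeholders; create-generator is the command introduced on slide 27):
  create-generator english.lfg
  regenerate "They appear."
  generate-from-file my-file.pl
  regenerate-testfile testfile.lfg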
20 Debugging the generator
- When generating from an f-structure produced by the same grammar, XLE should always generate
- Unless
- OT marks block the only possible string
- something is wrong with the tokenizer/morphology
- regenerate-morphemes: if this gets a string, the tokenizer/morphology is not the problem
- Very hard to debug; the newest XLE has robustness features to help
21 Underspecified Input
- F-structures provided by applications are not perfect
- may be missing features
- may have extra features
- may simply not match the grammar coverage
- Missing and extra features are often systematic
- specify in XLE which features can be added and deleted
- Not matching the grammar is a more serious problem
22 Adding features
- English to French translation
- English nouns have no gender
- French nouns need gender
- Solution: have XLE add gender
- the French morphology will control the value
- Specify additions in xlerc
- set-gen-adds add "GEND"
- can add multiple features
- set-gen-adds add "GEND CASE PCASE"
- XLE will optionally insert the feature
- Note: unconstrained additions make generation undecidable
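As a sketch of the xlerc setup for this English-to-French scenario (the French grammar file french.lfg and the f-structure file english-to-french.pl are hypothetical names; the commands are the ones shown on this slide and on slide 19):
  create-generator french.lfg
  set-gen-adds add "GEND"
  generate-from-file english-to-french.pl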
23 Example
- The cat sleeps. -> Le chat dort.
- Input f-structure:
    PRED 'dormir<SUBJ>'
    SUBJ [ PRED 'chat'
           NUM sg
           SPEC def ]
    TENSE present
- With GEND added:
    PRED 'dormir<SUBJ>'
    SUBJ [ PRED 'chat'
           NUM sg
           GEND masc
           SPEC def ]
    TENSE present
24 Deleting features
- French to English translation
- delete the GEND feature
- Specify deletions in xlerc
- set-gen-adds remove "GEND"
- can remove multiple features
- set-gen-adds remove "GEND CASE PCASE"
- XLE obligatorily removes the features
- no GEND feature will remain in the f-structure
- if a feature takes an f-structure value, that
f-structure is also removed
25 Changing values
- If values of a feature do not match between the input f-structure and the grammar
- delete the feature and then add it
- Example: case assignment in translation
- Example case assignment in translation
- set-gen-adds remove "CASE"
- set-gen-adds add "CASE"
- allows dative case in input to become accusative
- e.g., exceptional case marking verb in input
language but regular case in output language
26 Creating Paradigms
- Deleting and adding features within one grammar can produce paradigms
- Specifiers
- set-gen-adds remove "SPEC"
- set-gen-adds add "SPEC DET DEMON"
- regenerate "NP boys"
- the/those/these boys
27 Generation for Debugging
- Checking for grammar and lexicon errors
- create-generator english.lfg
- reports ill-formed rules, templates, feature declarations, lexical entries
- Checking for ill-formed sentences that can be parsed
- parse a sentence
- see if all the results are legitimate strings
- regenerate "they appear."
28 Regeneration example
- regenerate "In the park they often see the boy with the telescope."
- parsing In the park they often see the boy with the telescope.
- 4 solutions, 0.39 CPU seconds, 178 subtrees unified
- They see the boy in the park
  In the park they see the boy often with the telescope.
- regeneration took 0.87 CPU seconds.
29 Regenerate testfile
- regenerate-testfile
- produces new file testfile.regen
- sentences with parses and generated strings
- lists sentences with no strings
- if you have no Gen OT marks, everything should generate back to itself
30 Testing and Evaluation
- Need to know
- Does the grammar do what you think it should?
- cover the constructions
- still cover them after changes
- not get spurious parses
- not cover ungrammatical input
- How good is it?
- relative to a ground truth/gold standard
- for a given application
31 Testsuites
- XLE can parse and generate from testsuites
- parse-testfile
- regenerate-testfile
- Issues
- where to get the testsuites
- how to know if the parse the grammar got is the
one that was intended
32 Basic testsuites
- Set of sentences separated by blank lines
- can specify category
- NP the children who I see
- can specify expected number of results
- They saw her duck. (2! 0 0 0)
- parse-testfile produces
- xxx.new: sentences plus new parse statistics
- number of parses, time, complexity
- xxx.stats: new parse statistics without the sentences
- xxx.errors: changes in the statistics from the previous run
33 Testsuite examples
- LEXICON _'s
- ROOT He's leaving. (1+1 0.10 55)
- ROOT It's broken. (2+1 0.11 59)
- ROOT He's left. (3+1 0.12 92)
- ROOT He's a teacher. (1+1 0.13 57)
- RULE CPwh
- ROOT Which book have you read? (1+4 0.15 123)
- ROOT How does he be? (0! 0 0.08 0)
- RULE NOMINALARGS
- NP the money that they gave him (1 0.10 82)
34 .errors file
ROOT They left, then they arrived. (2+2 0.17 110)
MISMATCH ON 339 (2+2 -> 1+2)
ROOT Is important that he comes. (0! 0 0.15 316)
ERROR AND MISMATCH ON 784 (0! 0 -> 1+119)
35 .stats file
((1901) (1+1 0.21 72) -> (1+1 0.21 72) (5 words))
((1902) (1+1 0.10 82) -> (1+1 0.12 82) (6 words))
((1903) (1 0.04 15) -> (1 0.04 15) (1 word))
XLE release of Feb 26, 2004 11:29.
Grammar ~thking/pargram/english/standard/english.lfg.
Grammar last modified on Feb 27, 2004 13:58.
1903 sentences, 38 errors, 108 mismatches
0 sentences had 0 parses (added 0, removed 56)
38 sentences with 0!
38 sentences with 0! have solutions (added 29, removed 0)
57 starred sentences (added 57, removed 0)
timeout 100
max_new_events_per_graph_when_skimming 500
maximum scratch storage per sentence 26.28 MB (642)
maximum event count per sentence 1276360
average event count per graph 217.37
36 .stats file cont.
293.75 CPU secs total, 1.79 CPU secs max
new time/old time 1.23
elapsed time 337 seconds
biggest increase 1.16 sec (677 1.63 sec)
biggest decrease 0.64 sec (1386 0.54 sec)
range   parsed  failed  words  seconds  subtrees  optimal  suboptimal
1-10    1844    0       4.25   0.14     80.73     1.44      2.49E+01
11-20     59    0      11.98   0.54    497.12    10.41      2.05E+04
all     1903    0       4.49   0.15     93.64     1.72      6.60E+02
0.71 of the variance in seconds is explained by the number of subtrees
37 Is it the right parse?
- Use shallow markup to constrain possibilities
- bracketing of desired constituents
- POS tags
- Compare resulting structure to a previously banked one (perhaps a skeletal one)
- significant amount of work if done by hand
- bank f-structures from the grammar if good enough
- reduce work by using partial structures
- (e.g., just predicate argument structure)
38 Where to get the testsuite?
- Basic coverage
- create testsuite when writing the grammar
- publicly available testsuites
- extract examples from the grammar comments
- "COMEX NP-RULE NP the flimsy boxes"
- examples specific enough to test one construction at a time
- Interactions
- real world text necessary
- may need to clean up the text somewhat
39 Evaluation
- How good is the grammar?
- Absolute scale
- need a gold standard to compare against
- Relative scale
- comparing against other systems
- For an application
- some applications are more error tolerant than
others
40 Gold standards
- Representation of the perfect parse for the sentence
- can bootstrap with a grammar for efficiency and consistency
- hand checking and correction
- Determine how close the grammar's output is to the gold standard
- may have to do systematic mappings
- may only care about certain relations
41 PARC700
- 700 sentences randomly chosen from section 23 of the UPenn WSJ corpus
- How created
- parsed with the grammar
- saved the best parse
- converted format to "triples"
- hand corrected the output
- Issues
- very time consuming process
- difficult to maintain consistency even with
bootstrapping and error checking tools
42 Sample triple from PARC700
sentence(
  id(wsj_2356.19, parc_23.34)
  date(2002.6.12)
  validators(T.H. King, J.-P. Marcotte)
  sentence_form(The device was replaced.)
  structure(
    mood(replace~0, indicative)
    passive(replace~0, +)
    stmt_type(replace~0, declarative)
    subj(replace~0, device~1)
    tense(replace~0, past)
    vtype(replace~0, main)
    det_form(device~1, the)
    det_type(device~1, def)
    num(device~1, sg)
    pers(device~1, 3)))
43 Evaluation against PARC700
- Parse the 700 sentences with the grammar
- Compare the f-structure with the triple
- Determine
- number of attribute-value pairs that are missing from the f-structure
- number of attribute-value pairs that are in the f-structure but should not be
- combine the results into an f-score
- 100 is a perfect match; 0 is no match
- current grammar is in the low 80s
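Combining the missing and extra counts into an f-score is usually done with the standard precision/recall definition over attribute-value pairs:
  precision = matching pairs / pairs in the grammar's f-structure
  recall = matching pairs / pairs in the gold-standard triples
  f-score = 2 * precision * recall / (precision + recall), scaled to 0-100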
44 Using other gold standards
- Need to match corpus to grammar type
- written text vs. transcribed speech
- technical manuals, novels, newspapers
- May need to have mappings between systematic differences in analyses
- minimally want a match in grammatical functions
- but even this can be difficult (e.g. XCOMP
subjects)
45 Testing and evaluation
- Necessary to determine grammar coverage and usability
- Frequent testing allows problems to be corrected early on
- Changes in efficiency are also detectable in this way