Ambiguity Management in Deep Grammar Engineering - PowerPoint PPT Presentation

About This Presentation

Title:

Ambiguity Management in Deep Grammar Engineering

Description:

Ambiguity Management in Deep Grammar Engineering Tracy Holloway King – PowerPoint PPT presentation

Slides: 45

Provided by: Franci451

Learn more at: http://depts.washington.edu

Transcript and Presenter's Notes

1
Ambiguity Management in Deep Grammar Engineering

Tracy Holloway King

2
Ambiguity: bug or feature?

Bug in computer programming languages
Feature in natural language
People are good at resolving ambiguity in context
Ambiguity is consequently often unperceived ("Readjust paper holding clip")
even though thousand-fold ambiguities are common
Ambiguity promotes conciseness
Computers can't resolve ambiguity like humans
If we are going to build large-scale, linguistically sophisticated grammars, we need ways to handle ambiguity

3
Talk Outline

Sources of ambiguity
Grammar engineering approaches
Shallow markup
(Dis)preference marks
Stochastic disambiguation
Efficiency in ambiguity management

4
Sources of Ambiguity

Phonetic
I scream or ice cream
Tokenization
I like Jan. --- "Jan" + "." (the name) or "Jan." + "." (abbrev. January)
Morphological
walks --- plural noun or 3sg verb
untieable knot --- un(tieable) or (untie)able
Lexical
bank --- river bank or financial institution
Syntactic
The turkeys are ready to eat. --- fattened or hungry
Semantic
Two boys ate fifteen pizzas. --- 15 each or 15 total
Pragmatic
Sue won. Ed gave her a good luck charm. --- cause or result

5
PP Attachment: A classic example of syntactic ambiguity

PP adjuncts can attach to VPs and NPs
Strings of PPs in the VP are ambiguous
I see the girl with the telescope.
(with the telescope → girl) NP attachment
(with the telescope → see) VP attachment
Ambiguities proliferate exponentially
I see the girl with the telescope in the park.
(with the telescope → see, in the park → see)
(with the telescope → see, in the park → telescope)
(with the telescope → girl, in the park → see)
(with the telescope → girl, in the park → girl)
(with the telescope → girl, in the park → telescope)
The syntax has no way to determine the attachment, even if humans can.
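The exponential growth can be checked with a small brute-force count. This sketch assumes a simplified model: each PP may attach to the verb or to any earlier noun, and attachments may not cross. Under that assumption the number of analyses of "V NP PP ... PP" follows the Catalan numbers, a classic result for PP-attachment ambiguity. The function names are illustrative only.

```python
from itertools import product
from math import comb


def catalan(n: int) -> int:
    """The n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


def count_pp_attachments(num_pps: int) -> int:
    """Count non-crossing attachments for 'V NP PP_1 ... PP_k'.

    Positions: 0 = verb, 1 = object noun, i + 2 = head noun of PP_i.
    PP_i may attach to any earlier position; the verb-object arc (0, 1) is fixed.
    """
    total = 0
    choices = [range(i + 2) for i in range(num_pps)]
    for heads in product(*choices):
        arcs = [(0, 1)] + [(h, i + 2) for i, h in enumerate(heads)]
        # Two arcs (a, b) and (c, d) cross when a < c < b < d.
        crossing = any(a < c < b < d for a, b in arcs for c, d in arcs)
        if not crossing:
            total += 1
    return total


for k in range(1, 6):
    print(k, count_pp_attachments(k), catalan(k + 1))
```

With one PP there are 2 analyses and with two PPs there are 5, matching the readings listed on the slide; each further PP roughly triples or quadruples the count.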

6
Coverage entails ambiguity

I fell in the park.
I know the girl in the park.
I see the girl in the park.

7
Ambiguity can be explosive

If alternatives multiply within or across components

[Figure: alternatives multiplying through the pipeline Tokenize → Morphology → Syntax → Semantics → Discourse]
8
Ambiguity figures

Deep grammars are massively ambiguous
Example: 700 sentences from section 23 of the WSJ
average # of words: 19.6
average # of optimal parses: 684
for 1-10 word sentences: 3.8
for 11-20 word sentences: 25.2
for 50-60 word sentences: 12,888

9
Managing Ambiguity

Grammar engineering approaches
Trim early with shallow markup
(Dis)preference marks on rules
Choose the most probable parse for applications that need a single input
Use packing to parse and manipulate the ambiguities efficiently

10
Talk Outline

Sources of ambiguity
Grammar engineering approaches
Shallow markup
(Dis)preference marks
Stochastic disambiguation
Efficiency in ambiguity management

11
Shallow markup

Part of speech marking as a filter
I saw her duck/VB.
accuracy of tagger (very good for English)
can use partial tagging (verbs and nouns)
Named entities
<company>Goldman, Sachs & Co.</company> bought IBM.
good for proper names and times
hard to parse internal structure
Fall-back technique if parsing with markup fails
slows parsing
accuracy vs. speed
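The filtering effect of part-of-speech markup can be sketched with a toy lexicon. Everything here is a hypothetical illustration (the lexicon, tag names and helper functions are not from the talk): a token written as word/TAG keeps only that tag, while untagged tokens keep every lexicon tag, which is how partial tagging behaves. If the tagger supplies a wrong tag, the candidate set becomes empty and the correct analysis is lost, which is why tagger accuracy matters.

```python
# Hypothetical toy lexicon: word -> set of possible part-of-speech tags.
LEXICON = {
    "i": {"PRP"},
    "saw": {"VBD", "NN"},     # past tense of "see", or the tool
    "her": {"PRP$", "PRP"},   # possessive or object pronoun
    "duck": {"NN", "VB"},     # the bird, or the verb
}


def tag_options(tokens):
    """Candidate tags per token, honouring optional 'word/TAG' markup."""
    options = []
    for token in tokens:
        word, _, tag = token.partition("/")
        candidates = LEXICON[word.lower()]
        if tag:
            # A marked-up token keeps only its given tag (if the lexicon allows it).
            candidates = candidates & {tag}
        options.append(sorted(candidates))
    return options


def count_tag_sequences(options):
    """Number of tag sequences the parser would have to consider."""
    total = 1
    for candidates in options:
        total *= len(candidates)
    return total


print(count_tag_sequences(tag_options("I saw her duck".split())))     # 8
print(count_tag_sequences(tag_options("I saw her duck/VB".split())))  # 4
```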

12
Example shallow markup: Named entities

Allow tokenizer to accept marked-up input
parse <person>Mr. Thejskt Thejs</person> arrived.
tokenized string (TB = token boundary):
Mr. Thejskt Thejs TB NEperson
or: Mr (TB) . TB Thejskt TB Thejs

Add lexical entries and rules for NE tags
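A tokenizer that accepts such markup can be sketched as follows. This is a simplified illustration, not XLE's tokenizer: the function names are hypothetical, and it always splits "Mr" from the period instead of modelling the optional token boundary (TB) shown above. Each marked-up entity yields two alternatives, one multiword token tagged with its entity type and the ordinary tokenization.

```python
import re

# Matches <tag>...</tag> spans, e.g. <person>Mr. Thejskt Thejs</person>.
NE_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>")


def plain_tokens(text):
    """Very rough ordinary tokenizer: words and punctuation become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def tokenize_with_markup(text):
    """Return segments; each segment lists alternative token sequences."""
    segments = []
    position = 0
    for match in NE_PATTERN.finditer(text):
        before = text[position:match.start()]
        if before.strip():
            segments.append([plain_tokens(before)])
        ne_type, content = match.group(1), match.group(2)
        # Alternative 1: the whole entity as one token plus its NE tag.
        # Alternative 2: the ordinary tokens of the entity.
        segments.append([[content, "NE" + ne_type], plain_tokens(content)])
        position = match.end()
    rest = text[position:]
    if rest.strip():
        segments.append([plain_tokens(rest)])
    return segments


for segment in tokenize_with_markup("<person>Mr. Thejskt Thejs</person> arrived."):
    print(segment)
```

Keeping the ordinary tokenization as an alternative lets the grammar fall back on it if the single named-entity token leads to no parse.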

13
Resulting C-structure
14
Resulting F-structure
15
Results for shallow markup
Full/All = value for fully parsed sentences / value for all sentences

                Full parses   Optimal solns   Best F-score   Time
Unmarked        76            482/1753        82/79          65/100
Named entities  78            263/1477        86/84          60/91
POS tags        62            248/1916        76/72          40/48

Kaplan and King 2003
16
(Dis)preference marks (OT marks)

Want to (dis)prefer certain constructions
prefer: use when possible
disprefer: do not use unless there is no other analysis
Implementation
Put marks in rules and lexical entries
Rank those marks
ranking can be different for different grammars/corpora
Use the most preferred parse(s)
can be used as a two-pass system for robust parsing
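Selecting the most preferred parse under ranked marks can be sketched as a lexicographic comparison. This is a simplified model, not XLE's implementation (XLE's OT-mark machinery has more options than this). Here a higher-ranked mark always outweighs any number of lower-ranked ones, preference marks count in a parse's favour and dispreference marks against it. The ranking and parse names below are illustrative assumptions; the mark names come from the following slides.

```python
from collections import Counter


def optimal_parses(parses, ranking):
    """Keep the parses that are best under a ranked list of OT marks.

    parses:  dict mapping a parse name to the list of marks it carries.
    ranking: list of (mark, polarity) pairs, highest-ranked first;
             polarity +1 = preference mark (more is better),
             polarity -1 = dispreference mark (fewer is better).
    """
    def profile(marks):
        counts = Counter(marks)
        # Tuples compare lexicographically: highest-ranked mark decides first.
        return tuple(polarity * counts[mark] for mark, polarity in ranking)

    best = max(profile(marks) for marks in parses.values())
    return sorted(name for name, marks in parses.items() if profile(marks) == best)


# Illustrative ranking (an assumption): ungrammaticality is worst.
ranking = [("BadVAgr", -1), ("InfSubcat", +1), ("InfAdjunct", -1)]
parses = {
    "want-XCOMP": ["InfSubcat"],     # "I want it to leave." as a subcategorized infinitive
    "want-ADJUNCT": ["InfAdjunct"],  # same string, infinitive as an adverbial adjunct
}
print(optimal_parses(parses, ranking))  # ['want-XCOMP']
```

Changing the order of `ranking` is the counterpart of re-ranking marks for a different grammar or corpus.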

17
Ungrammatical input

Real-world text contains ungrammatical input
Deep grammars tend to only cover grammatical input
Common errors can be coded in the rules
may want to know that an error occurred (e.g., provide feedback in CALL grammars)
Disprefer parses of ungrammatical structures
tools for grammar writer to rank rules
two-pass system
standard rules
rules for known ungrammatical constructions
default fall-back rules
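The staged use of rule groups listed above can be sketched as a loop that enables more rules only when the previous stage finds no analysis. The parser here is a stand-in: `toy_parse`, its canned analyses and the group names are hypothetical, chosen only to show the control flow.

```python
# Hypothetical analyses, each tagged with the rule group it needs.
TOY_ANALYSES = {
    "He walks.": [("standard", "S(he walks)")],
    "He walk.": [("ungrammatical", "S(he walk) +BadVAgr")],
}


def toy_parse(sentence, enabled_groups):
    """Stand-in parser: return the analyses licensed by the enabled rule groups."""
    analyses = [a for group, a in TOY_ANALYSES.get(sentence, []) if group in enabled_groups]
    if not analyses and "fallback" in enabled_groups:
        # Default fall-back: a flat fragment analysis that always succeeds.
        analyses = ["FRAGMENTS(" + sentence + ")"]
    return analyses


def multipass_parse(sentence, parse, passes):
    """Enable rule groups pass by pass; stop at the first pass with analyses."""
    enabled = set()
    for index, groups in enumerate(passes):
        enabled.update(groups)
        analyses = parse(sentence, enabled)
        if analyses:
            return index, analyses
    return None, []


PASSES = [["standard"], ["ungrammatical"], ["fallback"]]
print(multipass_parse("He walks.", toy_parse, PASSES))  # (0, ['S(he walks)'])
print(multipass_parse("He walk.", toy_parse, PASSES))   # (1, ['S(he walk) +BadVAgr'])
print(multipass_parse("Walks he.", toy_parse, PASSES))  # (2, ['FRAGMENTS(Walks he.)'])
```

The cost noted later in the talk is visible here: an ungrammatical sentence is parsed once per pass before it gets an analysis.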

18
Sample ungrammatical structures

Mismatched subject-verb agreement
Verb3Sg = { (^ SUBJ PERS) = 3
            (^ SUBJ NUM) = sg
          | BadVAgr $ o::* }
Missing copula
VPcop --> { Vcop: ^=!
          | e: (^ PRED) = 'NullBe<(^ SUBJ)(^ XCOMP)>'
               MissingCopularVerb $ o::* }
          { NP: (^ XCOMP)=!
          | AP: (^ XCOMP)=! }.

19
Dispreferred grammatical structures

Prefer subcategorized infinitives to adverbials
I want it. I finished up (in order) to leave.
I want it to leave.
VP --> V
       (NP: (^ OBJ)=!)
       (VPinf: { (^ XCOMP)=! InfSubcat $ o::*
               | ! $ (^ ADJUNCT) InfAdjunct $ o::* }).
Post-copular gerunds
He is a boy. (His) going is difficult.
He is going.

20
OT Mark summary

Use (dis)preference marks to (dis)prefer constructions or words
Allows inclusion of marginal/ungrammatical constructions
Issues
Only works with ambiguities with known preferences (not PP attachment)
Hard to determine ranking for many marks
Two-pass parsing can be slow

21
Talk Outline

Sources of ambiguity
Grammar engineering approaches
Shallow markup
(Dis)preference marks
Stochastic disambiguation
Efficiency in ambiguity management

22
Packing and Pruning in XLE

XLE produces (too) many candidates
All valid (with respect to grammar and OT marks)
Not all equally likely
Some applications require a single best parse, or at most just a handful (n-best)
Grammar writer can't specify correct choices
Many implicit properties of words and structures with unclear significance

23
Pruning in XLE

Appeal to a probability model to choose the best parse
Assume previous experience is a good guide for future decisions
Collect a corpus of training sentences, build a probability model that optimizes for previous good results
partially labelled training data is OK
[NP-SBJ They] see [NP-OBJ the girl with the telescope]
Apply the model to choose the best analysis of new sentences
efficient (XLE English grammar: 5% of parse time)

24
Exponential models are appropriate (aka Maximum Entropy or Log-linear models)

Assign probabilities to representations, not to choices in a derivation
No independence assumption
Arithmetic combined with human insight
Human:
Define properties of representations that may be relevant
Based on any computable configuration of features, trees
Arithmetic:
Train to figure out the weight of each property
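Training on partially labelled data can be sketched with one standard objective for log-linear models: maximize, for each sentence, the log of the total probability of the parses consistent with its partial label. The sketch below uses plain batch gradient ascent; the data, feature names and learning settings are hypothetical, and it is not presented as XLE's actual training code. The gradient is the feature expectation under the label-restricted distribution minus the expectation under the full model.

```python
import math


def parse_probs(weights, parses):
    """p(x | s) = exp(w . f(x)) / Z over the parses of one sentence."""
    scores = [sum(weights.get(name, 0.0) * value for name, value in feats.items())
              for feats in parses]
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]


def train(corpus, epochs=200, rate=0.5):
    """Batch gradient ascent on sum_s log sum_{x consistent with s} p(x | s).

    corpus: list of (parses, consistent) pairs; parses is a list of feature
    dicts, consistent holds the indices of the parses that agree with the
    partial label (several parses may agree with one label).
    """
    weights = {}
    for _ in range(epochs):
        gradient = {}
        for parses, consistent in corpus:
            probs = parse_probs(weights, parses)
            mass = sum(probs[i] for i in consistent)
            for i, feats in enumerate(parses):
                # Label-restricted expectation minus full-model expectation.
                restricted = probs[i] / mass if i in consistent else 0.0
                for name, value in feats.items():
                    gradient[name] = gradient.get(name, 0.0) + (restricted - probs[i]) * value
        for name, g in gradient.items():
            weights[name] = weights.get(name, 0.0) + rate * g
    return weights


# Hypothetical data: [NP-OBJ the girl with the telescope] is consistent only
# with noun attachment (parse 0); the second label leaves two parses open.
corpus = [
    ([{"pp_to_noun": 1}, {"pp_to_verb": 1}], {0}),
    ([{"pp_to_noun": 1, "long_object": 1}, {"pp_to_verb": 1}, {"pp_to_noun": 1}], {0, 2}),
]
weights = train(corpus)
print(weights)
print(parse_probs(weights, corpus[0][0]))
```

Because the second sentence's label does not pick a single parse, it still contributes evidence without anyone having to resolve its remaining ambiguity by hand.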

25
Properties employed in WSJ Experiment

800 property-functions
c-structure nodes and subtrees
recursively embedded phrases
f-structure attributes (grammatical functions)
atomic attribute-value pairs
left/right branching
(non)parallelism in coordination
lexical elements (subcategorization frames)
Some end up with no discrimination power after training

26
Stochastic Disambiguation Summary

Training
Define a set of features by hand
Train on partially labelled data
Can train on low-ambiguity data
Use
Choose just one structure for applications that want just one
XLE displays the most probable first
5% of parse time to disambiguate
30% gain in F-score

27
Talk Outline

Sources of ambiguity
Grammar engineering approaches
Shallow markup
(Dis)preference marks
Stochastic disambiguation
Efficiency in ambiguity management

28
Computational consequences of ambiguity

Serious problem for computational systems
Broad-coverage, hand-written grammars frequently produce thousands of analyses, sometimes millions
Machine-learned grammars easily produce hundreds of thousands of analyses if allowed to parse to completion
Three approaches to ambiguity management
Pruning: block unlikely analysis paths early
Procrastination: do not expand analysis paths that will lead to ambiguity explosion until something else requires them
Also known as underspecification
Packing: compact representation and computation of all possible analyses

29
The Problem with Pruning: premature disambiguation

The conventional approach: use heuristics to prune as soon as possible

[Figure: pipeline Tokenize → Morphology → Syntax → Semantics → Discourse, with alternatives pruned (X) at each stage]
Fast computation, wrong result
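"Fast computation, wrong result" can be illustrated with two pipeline stages. All scores and alternative names below are hypothetical, and combining stages by multiplying scores is an assumption made for the sketch. Pruning keeps only the locally best first-stage alternative, so it never sees the analysis that is best overall.

```python
# Hypothetical scores: stage-1 alternatives get a local score; each stage-2
# alternative gets a score that depends on the stage-1 choice it extends.
STAGE1 = {"tok-A": 0.6, "tok-B": 0.4}
STAGE2 = {
    "tok-A": {"syn-A1": 0.2, "syn-A2": 0.1},
    "tok-B": {"syn-B1": 0.9, "syn-B2": 0.3},
}


def total_score(candidate):
    first, second = candidate
    return STAGE1[first] * STAGE2[first][second]


def prune_early(beam=1):
    """Keep only the `beam` best stage-1 alternatives before running stage 2."""
    kept = sorted(STAGE1, key=STAGE1.get, reverse=True)[:beam]
    candidates = [(t, s) for t in kept for s in STAGE2[t]]
    return max(candidates, key=total_score)


def search_all():
    """Score every complete analysis and keep the best one."""
    candidates = [(t, s) for t in STAGE1 for s in STAGE2[t]]
    return max(candidates, key=total_score)


print(prune_early())  # ('tok-A', 'syn-A1'), total 0.6 * 0.2 = 0.12
print(search_all())   # ('tok-B', 'syn-B1'), total 0.4 * 0.9 = 0.36
```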
30
The problem with procrastination: passing the buck

Chunk parsing as an example
Collect noun groups, verb groups, PP groups
Leave it to later processing to figure out the correct way of putting these together
Not all combinations are grammatically acceptable
Later processing must either
call the parser to check grammatical constraints
have its own model of grammatical constraints
In the best case, solve a set of constraints the partial parser includes with its output

31
The Problem with Packing

There may be too many analyses to pack efficiently
A major problem for relatively unconstrained machine-induced grammars
Grammars overgenerate massively
Statistics used to prune out unlikely sub-analyses
Less of a problem for carefully hand-coded broad-coverage grammars

32
Packing

Explosion of ambiguity results from a small number of sub-analyses combining in different ways to produce a large number of total analyses (e.g. PP attachment)
Compute and represent each sub-analysis just once
Compute a factored representation of how these sub-analyses combine
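A CKY chart that stores counts is one simple instance of "compute each sub-analysis once": every (span, category) entry is built a single time and records how many subtrees it packs, so the PP-attachment analyses are counted without being enumerated. The toy grammar and lexicon are assumptions for illustration; XLE's own packing is more general than this.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an illustrative assumption).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attached to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attached to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": {"NP"}, "see": {"V"}, "the": {"Det"},
    "girl": {"N"}, "telescope": {"N"}, "park": {"N"}, "hill": {"N"},
    "with": {"P"}, "in": {"P"}, "on": {"P"},
}


def packed_chart(words):
    """CKY chart keyed by (start, end, category); values count packed subtrees."""
    n = len(words)
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for category in LEXICON[word]:
            chart[(i, i + 1, category)] = 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    left_count = chart.get((start, split, left), 0)
                    right_count = chart.get((split, end, right), 0)
                    if left_count and right_count:
                        # Each combination of sub-analyses adds to one shared entry.
                        chart[(start, end, parent)] += left_count * right_count
    return chart


def count_parses(sentence):
    words = sentence.split()
    return packed_chart(words).get((0, len(words), "S"), 0)


print(count_parses("I see the girl with the telescope"))                          # 2
print(count_parses("I see the girl with the telescope in the park"))              # 5
print(count_parses("I see the girl with the telescope in the park on the hill"))  # 14
```

The chart grows polynomially with sentence length while the parse count it represents grows exponentially.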

33
Generalizing Free Choice Packing
34
Dependent choices
35
Solution: Label dependent choices

Label each choice with distinct Boolean variables p, q, etc.
Record acceptable combinations as a Boolean expression φ
Each analysis corresponds to a satisfying truth-value assignment (a line from φ's truth table that assigns it true)
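The correspondence between analyses and satisfying assignments can be sketched with a truth table. The dependency below is a hypothetical example: option 1 of the first choice (p) is assumed to be incompatible with option 1 of the second choice (q), so φ = not (p and q), and one of the four combinations is excluded.

```python
from itertools import product


def phi(p, q):
    """Hypothetical constraint on acceptable combinations: not (p and q)."""
    return not (p and q)


# Each line of phi's truth table that assigns it True is one analysis.
readings = [(p, q) for p, q in product([True, False], repeat=2) if phi(p, q)]
print(len(readings), readings)  # 3 [(True, False), (False, True), (False, False)]
```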

36
The Free Choice Gamble

Worst case, where everything interacts
As many choice variables as there are readings
Packing blows up and becomes exponential
Best case, no interactions
N completely independent choices represent 2^N readings
Language interactions are mostly limited and local
Tends towards the best case
Free choice packing pays off for linguistic analysis
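The best case on this slide is simple arithmetic: with independent choices, the packed form stores each alternative once (the sum of the choice sizes) yet represents every combination (the product). The helper below is illustrative only. In the worst case, where everything interacts, roughly one choice variable per reading is needed, so the packed size approaches the number of readings.

```python
def packed_size_and_readings(choice_sizes):
    """Independent choices: stored alternatives (sum) vs. represented readings (product)."""
    size = sum(choice_sizes)
    readings = 1
    for k in choice_sizes:
        readings *= k
    return size, readings


for n in (1, 5, 10, 20):
    print(n, packed_size_and_readings([2] * n))
# 10 binary choices: 20 stored alternatives, 1024 readings
```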

37
Conclusions

Ambiguity has to be dealt with
Deep grammars use a variety of approaches
preprocessing
grammar engineering
stochastic disambiguation
Why use deep grammars if they are so ambiguous?

38
Deep analysis matters if you care about the answer

Example
A delegation led by Vice President Philips, head of the chemical division, flew to Chicago a week after the incident.
Question: Who flew to Chicago?
Candidate answers
division: closest noun
head: next closest
V.P. Philips: next

39
Applications of Language Engineering
[Figure: applications placed by Domain Coverage (Narrow to Broad) against Functionality (Low to High); labels Shallow, Deep, Synthesis]
40
(No Transcript)
41
What to do with them?

Define yes-no / 1-0 features, f, that seem important
Training determines weights on these features, λ, to reflect their actual importance
Select parse x: count occurrences of features (0,1) and multiply by corresponding weights, λ·f(x)
Convert weighted feature counts to probabilities
42
Issues in Stochastic Disambiguation

What kind of probability model?
What kind of training data?
Efficiency of training, efficiency of
disambiguation?
Benefit vs. random choice of parse

43
Advantages of Free Choice Packing

Avoids procrastination
Nogoods are constraints that the parser sends to other components
By eliminating nogoods, other components don't have to do the parser's work
Independence between choices: allows processing that relies on independence assumptions
Counting the number of readings
Apparently trivial but of crucial importance, since statistical modelling requires the ability to count
Hence, statistical processing
A general mechanism extending beyond parsing