Title: Data-Oriented Parsing
1Data-Oriented Parsing
- Remko Scha
- Institute for Logic, Language and Computation
- University of Amsterdam
2- Overview
- The Big Picture (cognitive motivation)
- A simple Data-Oriented Parsing model
- Extended DOP models
- Psycholinguistics revisited
- Statistical considerations
3- Data-Oriented Parsing
- The Big Picture
4- Data-Oriented Parsing
- The Big Picture
- (1) The key to understanding cognition is
- understanding perception.
5- Data-Oriented Parsing
- The Big Picture
- (1) The key to understanding cognition is
- understanding visual Gestalt perception.
6- Data-Oriented Parsing
- The Big Picture
- (1) The key to understanding cognition is
- understanding visual Gestalt perception.
- Conjecture: Language processing and "thinking" involve a metaphorical use of our Gestalt perception capability.
- R. Scha, "Wat is het medium van het denken?" ["What is the medium of thinking?"] In: M.B. In 't Veld & R. de Groot (eds.), Beelddenken en begripsdenken: een paradox? Utrecht: Agiel, 2005.
7- Data-Oriented Parsing
- The Big Picture
- (1) The key to understanding cognition is
- understanding visual Gestalt perception.
- (2) All perceptual processes are based on
detecting similarities and analogies with
concrete past experiences.
8The Data-Oriented World View
- All interpretive processes are based on detecting
similarities and analogies with concrete past
experiences. - E.g.
- Visual Perception
- Music Perception
- Lexical Semantics
- Concept Formation.
9 E.g.: The Data-Oriented Perspective on Lexical Semantics and Concept Formation.
- A concept = the extensional set of its previously experienced instances.
- Classifying new input under an existing concept = judging the input's similarity to these instances.
- Against:
- Explicit definitions
- Prototypes
10 The Data-Oriented Perspective on Lexical Semantics and Concept Formation.
- A concept = the extensional set of its previously experienced instances.
- Classifying new input under an existing concept = judging the input's similarity to these instances.
- Against:
- Explicit definitions
- Prototypes
- Learning
11- Part II
- Data-Oriented Parsing
12- Data-Oriented Parsing
- Processing new input utterances in terms of their
similarities and analogies with previously
experienced utterances.
13 Language processing by analogy: was proposed already by "Bloomfield, Hockett, Paul, Saussure, Jespersen, and many others". But: "To attribute the creative aspect of language use to 'analogy' or 'grammatical patterns' is to use these terms in a completely metaphorical way, with no clear sense and with no relation to the technical usage of linguistic theory." (Chomsky, 1966)
14 Challenge: To work out a formally precise notion of "language processing by analogy".
15 Challenge: To work out a formally precise notion of "language processing by analogy". A first step, Data-Oriented Parsing: remember all utterances with their syntactic tree-structures; analyse new input by recombining fragments of these tree-structures.
16 Data-Oriented Parsing
- Memory-based approach to syntactic parsing and disambiguation.
- Basic idea: use the subtrees from a syntactically annotated corpus directly as a stochastic grammar (see the sketch below).
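As a concrete illustration of this basic idea, here is a minimal Python sketch (mine, not from the talk) of how an annotated corpus can be turned into a DOP fragment collection: for every internal node, every way of cutting off non-terminal descendants yields a fragment with open substitution sites. The Node class and the one-tree toy corpus are illustrative assumptions.

```python
from collections import Counter
from itertools import product

class Node:
    """A tree node: a label plus a (possibly empty) tuple of children."""
    def __init__(self, label, children=()):
        self.label, self.children = label, tuple(children)
    def __repr__(self):
        if not self.children:
            return self.label
        return "(%s %s)" % (self.label, " ".join(map(repr, self.children)))

def fragments(node):
    """All DOP fragments rooted at an internal node: each non-terminal child is
    either cut off (leaving an open substitution site) or expanded further."""
    options_per_child = []
    for child in node.children:
        if child.children:                                  # non-terminal child
            options_per_child.append([Node(child.label)] + fragments(child))
        else:                                               # terminal (a word): always kept
            options_per_child.append([child])
    return [Node(node.label, combo) for combo in product(*options_per_child)]

def all_fragments(tree):
    """Fragments rooted at every internal node of a corpus tree."""
    out, stack = [], [tree]
    while stack:
        n = stack.pop()
        if n.children:
            out.extend(fragments(n))
            stack.extend(n.children)
    return out

# A one-tree toy "annotated corpus" for "Peter killed the bear".
corpus = [Node("S", [Node("NP", [Node("Peter")]),
                     Node("VP", [Node("V", [Node("killed")]),
                                 Node("NP", [Node("det", [Node("the")]),
                                             Node("N", [Node("bear")])])])])]

# The fragment collection: counts per fragment, usable as a stochastic TSG.
fragment_counts = Counter(repr(f) for t in corpus for f in all_fragments(t))
print(len(fragment_counts), "distinct fragments")
```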
17Data-Oriented Parsing (DOP)
- Simplest version: DOP1 (Bod, 1992).
- Annotated corpus defines Stochastic Tree
Substitution Grammar
18Data-Oriented Parsing (DOP)
- Simplest version: DOP1 (Bod, 1992).
- Annotated corpus defines Stochastic Tree
Substitution Grammar
- (Slides adapted from Guy De Pauw, University of Antwerp)
21Fragment Collection
22 Generating "Peter killed the bear."
Note: one parse has many derivations!
23An annotated corpus defines a Stochastic Tree
Substitution Grammar
- Probability of a Derivation
- Product of the Probabilities of the Subtrees
24An annotated corpus defines a Stochastic Tree
Substitution Grammar
- Probability of a Derivation
- Product of the Probabilities of the Subtrees
- Probability of a Parse
- Sum of the Probabilities of its Derivations
25 Example derivation for "Van Utrecht naar
Leiden."
26 Probability of substituting a subtree t_i on a node = the number of occurrences of t_i, divided by the total number of occurrences of subtrees t with the same root node label as t_i:

$P(t_i) = \frac{|t_i|}{\sum_{t:\, root(t)=root(t_i)} |t|}$

Probability of a derivation t_1 ∘ ... ∘ t_n = the product of the probabilities of the substitutions that it involves:

$P(t_1 \circ \ldots \circ t_n) = \prod_i \frac{|t_i|}{\sum_{t:\, root(t)=root(t_i)} |t|}$

Probability of a parse-tree = the sum of the probabilities of all derivations of that parse-tree:

$P(T) = \sum_j \prod_i \frac{|t_{ij}|}{\sum_{t:\, root(t)=root(t_{ij})} |t|}$
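The sketch below works these three definitions through on a toy table of fragment counts (the fragments, counts and derivations are made up for illustration; they are not from the talk or from any corpus).

```python
from math import prod
from collections import Counter

# Toy fragment counts, keyed by (root label, a readable name for the fragment).
counts = Counter({
    ("S",  "S -> NP VP"):                   2,
    ("S",  "S -> NP (VP (V killed) NP)"):   1,
    ("NP", "NP -> Peter"):                  1,
    ("NP", "NP -> (det the) (N bear)"):     2,
    ("VP", "VP -> (V killed) NP"):          1,
})

root_totals = Counter()
for (root, _), c in counts.items():
    root_totals[root] += c

def p_subtree(frag):
    """P(t_i): count of t_i divided by total count of fragments with the same root label."""
    return counts[frag] / root_totals[frag[0]]

def p_derivation(frags):
    """Probability of a derivation t_1 ∘ ... ∘ t_n: product of its subtree probabilities."""
    return prod(p_subtree(f) for f in frags)

def p_parse(derivations):
    """Probability of a parse tree: sum over all of its derivations."""
    return sum(p_derivation(d) for d in derivations)

# Two different derivations yielding the same (toy) parse tree.
d1 = [("S", "S -> NP (VP (V killed) NP)"), ("NP", "NP -> Peter"),
      ("NP", "NP -> (det the) (N bear)")]
d2 = [("S", "S -> NP VP"), ("NP", "NP -> Peter"),
      ("VP", "VP -> (V killed) NP"), ("NP", "NP -> (det the) (N bear)")]
print(p_parse([d1, d2]))   # 2/27 + 4/27 ≈ 0.222
```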
27An annotated corpus defines a Stochastic Tree
Substitution Grammar
- Probability of a Derivation
- Product of the Probabilities of the Subtrees
- Probability of a Parse
- Sum of the Probabilities of its Derivations
- Disambiguation: choose the Most Probable Parse-tree
28An annotated corpus defines a Stochastic Tree
Substitution Grammar
29An annotated corpus defines a Stochastic Tree
Substitution Grammar
- Q. Does this work?
- A. Yes. Experiments on a small fragment of the
ATIS corpus gave very good results. (Bod's
dissertation, 1995.)
30An annotated corpus defines a Stochastic Tree
Substitution Grammar
- Q. Do we really need all fragments?
31An annotated corpus defines a Stochastic Tree
Substitution Grammar
- Q. Do we really need all fragments?
- A. Experiments on the ATIS corpus
32 Experiments on a small subset of the ATIS corpus

                max words →   1    2    3    4    6    8   unlimited
max tree-depth ↓
      1                      47   47
      2                      65   68   68   68
      3                      74   76   79   79   79   79   79
      4                      75   79   81   83   83   83   83
      5                      77   80   83   83   83   85   84
      6                      75   80   83   83   83   87   84

Parse accuracy (in %) as a function of the maximum number of lexical items and the maximum tree-depth of the fragments.
33Beyond DOP1
- Computational issues
- Linguistic issues
- Psycholinguistic issues
- Statistical issues
34 Computational issues, part 1: the good news
- TSG parsing can be based on the techniques of CFG-parsing, and inherits some of their properties.
- Semi-ring algorithms are applicable for many useful purposes.
35 Computational issues, part 1: the good news
- Semi-ring algorithms are applicable for many useful purposes. In time O(n³) in the sentence length, we can:
- Build a parse-forest.
- Compute the Most Probable Derivation.
- Select a random parse.
- Compute a Monte-Carlo estimation of the Most Probable Parse (see the sketch below).
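The Monte-Carlo estimation mentioned in the last bullet can be sketched in a few lines: draw random derivations from the parse forest in proportion to their probability, and let the parse tree that comes out most often stand in for the Most Probable Parse. In this sketch, sample_derivation and tree_of are hypothetical helpers standing in for the forest sampler and for reading the parse tree off a derivation.

```python
from collections import Counter

def monte_carlo_mpp(sample_derivation, tree_of, n_samples=1000):
    """Estimate the Most Probable Parse by sampling derivations from the forest."""
    votes = Counter()
    for _ in range(n_samples):
        derivation = sample_derivation()   # drawn with probability proportional to P(derivation)
        votes[tree_of(derivation)] += 1    # many different derivations yield the same parse
    tree, hits = votes.most_common(1)[0]
    return tree, hits / n_samples          # the winning parse and its estimated probability
```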
36 Computational issues, part 2: the bad news
- Computing the Most Probable Parse is NP-complete (Sima'an). (Not a semi-ring algorithm.)
- The grammar gets very large.
37 Computational issues, part 3: solutions
- Non-probabilistic DOP: choose the shortest derivation. (De Pauw, 1997; more recently, good results by Bod on the WSJ corpus.)
- Compress the fragment-set. (Use Minimum Description Length; Van der Werff, 2004.)
- Rig the probability assignments so that the Most Probable Derivation becomes applicable.
39 More powerful models
- Kaplan & Bod: LFG-DOP (based on Lexical-Functional Grammar)
- Hoogweg: TIG-DOP (based on Tree-Insertion Grammar; cf. Tree-Adjoining Grammar)
- Sima'an: the Tree-Gram model (Markov processes on sister-nodes, conditioned on lexical heads)
40 - Linguistic issues: future work
41 - Linguistic issues: future work
- Scha (1990), about an imagined future DOP
algorithm - It will be especially interesting to find out how
such an algorithm can deal with complex syntactic
phenomena such as "long distance movement". It is
quite possible that an optimal matching algorithm
does not operate exclusively on constructions
which occur explicitly in the surface-structure;
perhaps "transformations" (in the classical
Chomskyan sense) play a role in the parsing
process.
42- Transformations
- "John likes Mary."
- "Mary is liked by John."
- "Does John like Mary?"
- "Who does John like?"
- "Who do you think John likes?"
- "Mary is the girl I think John likes."
43- Transformations
- Wh-movement, Passivization, Topicalization,
Fronting, Scrambling, ...? - Move-α?
44 Psycholinguistics Revisited
45Psycholinguistic Considerations
- DOP is a performance model
- DOP defines syntactic probabilities of sentences
and their analyses - (against the background of a weak, overgenerating
competence grammar the definition of all
formally possible sentence annotations).
46Psycholinguistic Considerations
- Does DOP account for performance phenomena?
47Psycholinguistic Considerations
- Probabilistic Disambiguation
- Psychological experiments consistently show that
disambiguation preferences correlate with
occurrence frequencies.
48Psycholinguistic Considerations
- The "Garden Path" Phenomenon
- "The horse raced past the barn "
49Psycholinguistic Considerations
- The "Garden Path" Phenomenon
- "The horse raced past the barn fell."
50Psycholinguistic Considerations
- The "Garden Path" Phenomenon
- "The horse raced past the barn fell."
- Plausible model: Incremental version of DOP
- Analysis with very high probability kills
analyses with low probability.
51Psycholinguistic Considerations
- Utterance Generation
- Cf. Kempen et al. (Leyden University)
- (Non-probabilistic) generation mechanism which
combines tree fragments at random.
52Psycholinguistic Considerations
- Grammaticality Judgements
- Cf. Stich: priming of grammaticality judgements.
- Plausible model: DOP with a "recency effect".
53Psycholinguistic Considerations
- Integration with semantics
- Cf. "Compositional Semantics" (Montague).
- Assume a semantically annotated corpus; cf. Van den Berg et al.
- Factoring in the probabilities of semantic subcategories: cf. Bonnema.
54Psycholinguistic Considerations
- Language dynamics
- Grammar as an "emergent phenomenon": its development to be explained in terms of underlying, more detailed, possibly incommensurable phenomena.
55Psycholinguistic Considerations
- Dynamics
- E.g. Physics
- Thermodynamics: describes the relations between
temperature, pressure, volume and entropy (in
equilibrium situations). - Statistical thermodynamics explains this in terms
of movements of molecules. (And movements of
molecules also account for non-equilibrium
situations.) - E.g. Biology
- Theory of Evolution
56- "Doesn't every science live on this paradoxical
slope to which it is doomed by the evanescence of
its object in the very process of its
apprehension, and by the pitiless reversal this
dead object exerts on it?" - Baudrillard, 1983
57Psycholinguistic Considerations
- Language Acquisition
- Q. How does a child get its first corpus?
58Psycholinguistic Considerations
- Language Acquisition
- Q. How does a child get its first corpus?
- A. By bootstrapping pragmatic/semantic
structures.
59Psycholinguistic Considerations
- Language Acquisition
- Rule-based models which bootstrap the syntactic
structures from perceived semantic relations - Suggested by Schlesinger (1971, 1975)
- Implemented by Chang & Maia (2001)
- Data-oriented version of this
- Described by De Kreek (2003)
60Psycholinguistic Considerations
- Language Change
- The data-oriented approach allows for gradual
changes in parsing and generation preferences. - It allows language change within a lifetime.
(Language change does not depend on
misunderstandings between successive generations.)
61Psychological Considerations
- Perception Revisited
- How to generalize DOP to visual and musical
perception?
62Psychological Considerations
- Perception Revisited
- How to generalize DOP to visual and musical
perception? - How to represent visual and musical Gestalts in a
formal way? - How to generalize DOP to arbitrary algebras?
63 Data-Oriented Parsing: Statistical Issues
64 - Statistical problems
- DOP1: Relative Frequency Estimation on the fragment set.
- Bonnema et al. (1999):
- The DOP1 estimator has strange properties: the largest trees in the corpus completely dominate the statistics.
- Maximum Likelihood Estimation is not a viable alternative: MLE completely overfits the corpus.
65In DOP1, the largest trees in the corpus
completely dominate the statistics.
The above treebank contains 7 fragments with root
label S, each with probability 1/7. For the
input string 'ab', parse (a) will thus receive probability 3/7; parse (d) will receive probability 4/7.
66In DOP1, the largest trees in the corpus
completely dominate the statistics.
Assume the above treebank, with equiprobable initial rules S → X and S → A. The input string 'ab' will be analysed as a constituent of category X, because of the relative improbability of the fragments from (b).
67In DOP1, the largest trees in the corpus
completely dominate the statistics.
Assume a treebank with 999 binary trees of depth
five and 1 tree of depth six. Now 99.8% of the
probability mass will go to fragments from the
only tree of depth six.
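A rough sanity check of the 99.8% figure, under simplifying assumptions of my own: every tree is a complete binary tree, "depth" counts levels of non-terminal nodes, and probability mass is taken to be proportional to fragment counts (which is roughly what relative-frequency estimation does, per root label).

```python
from functools import lru_cache

@lru_cache(None)
def fragments_rooted(depth):
    """Number of fragments rooted at the top of a complete binary tree of this depth."""
    if depth == 1:
        return 1                                   # a pre-terminal node with its word
    # each of the two children is either cut off (a substitution site)
    # or replaced by any fragment rooted there
    return (1 + fragments_rooted(depth - 1)) ** 2

def total_fragments(depth):
    """Fragments rooted anywhere in a complete binary tree of this depth."""
    if depth == 1:
        return 1
    return fragments_rooted(depth) + 2 * total_fragments(depth - 1)

big = total_fragments(6)                 # the single depth-six tree
small = 999 * total_fragments(5)         # the 999 depth-five trees
print(round(big / (big + small), 4))     # ~0.998: the depth-six tree dominates
```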
68In DOP1, the largest trees in the corpus
completely dominate the statistics.
- "Solution"
- Heuristic constraints on tree-depth and number of terminals and non-terminals. E.g., Sima'an (1999), as sketched below:
- Maximum number of substitution sites (leaf non-terminals): 2.
- Maximum number of lexical items: 9.
- Maximum number of consecutive lexical items: 3.
- Maximum tree-depth: 4.
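A sketch of such a filter, under notational assumptions of my own: a fragment is a nested tuple (label, child, ...), a bare uppercase label is an open substitution site, and a lowercase string is a word.

```python
def is_word(leaf):
    return isinstance(leaf, str) and leaf.islower()

def frontier(frag):
    """The fragment's leaves, left to right (words and open substitution sites)."""
    if isinstance(frag, str):
        return [frag]
    label, *children = frag
    return [label] if not children else [x for c in children for x in frontier(c)]

def depth(frag):
    """Tree-depth of a fragment; a bare word or substitution site counts as depth 0."""
    if isinstance(frag, str):
        return 0
    label, *children = frag
    return 0 if not children else 1 + max(depth(c) for c in children)

def longest_word_run(leaves):
    """Longest run of consecutive lexical items on the frontier."""
    best = run = 0
    for leaf in leaves:
        run = run + 1 if is_word(leaf) else 0
        best = max(best, run)
    return best

def keep(frag, max_sites=2, max_words=9, max_run=3, max_depth=4):
    """Sima'an (1999)-style heuristic filter on a single fragment."""
    leaves = frontier(frag)
    n_words = sum(is_word(l) for l in leaves)
    n_sites = len(leaves) - n_words
    return (n_sites <= max_sites and n_words <= max_words
            and longest_word_run(leaves) <= max_run and depth(frag) <= max_depth)

# e.g. keep(("S", ("NP",), ("VP", ("V", "killed"), ("NP",))))  ->  True
```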
69 In DOP1, the largest trees in the corpus completely dominate the statistics. Needed: a different estimator.
- Not a solution: Maximum Likelihood Estimation.
- MLE completely overfits the corpus: the DOP grammar which maximizes the chance of generating the treebank assigns the following probabilities:
- to every full corpus tree: its relative frequency in the corpus
- to every other fragment: zero
(Bonnema & Scha, 2003)
70 In DOP1, the largest trees in the corpus completely dominate the statistics. Needed: a different estimator.
- Bonnema et al. (1999): treat every full corpus-tree as the representation of a set of derivations.
- If we assume a uniform probability distribution over this set of derivations, we arrive at the following "weighted relative frequency estimate": a fragment τ with N(τ) non-root non-terminal nodes receives probability
- $P(\tau) = 2^{-N(\tau)} \cdot F(\tau)$
71 In DOP1, the largest trees in the corpus completely dominate the statistics. Needed: a different estimator.
- Bonnema et al. (1999): treat every full corpus-tree as the representation of a set of derivations.
- If we assume a uniform probability distribution over this set of derivations, we arrive at the following "weighted relative frequency estimate": a fragment τ with N(τ) non-root non-terminal nodes receives probability
- $P(\tau) = 2^{-N(\tau)} \cdot F(\tau)$
- Sub-optimal assumption!
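A sketch of this weighted relative frequency estimate, using the same nested-tuple fragment notation as in the earlier sketches; reading F(τ) as the fragment's relative frequency is my interpretation, and in a full DOP model the resulting weights would still be normalised per root label.

```python
from collections import Counter

def nonroot_nonterminals(frag, is_root=True):
    """N(τ): the fragment's non-terminal nodes, not counting its root."""
    if isinstance(frag, str):
        return 0                                   # a word
    label, *children = frag
    own = 0 if is_root else 1
    return own + sum(nonroot_nonterminals(c, is_root=False) for c in children)

def weighted_relative_frequency(fragment_counts):
    """P(τ) = 2^(-N(τ)) · F(τ) for every fragment in a Counter of corpus frequencies."""
    total = sum(fragment_counts.values())
    return {frag: (2 ** -nonroot_nonterminals(frag)) * count / total
            for frag, count in fragment_counts.items()}
```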
72 In DOP1, the largest trees in the corpus completely dominate the statistics. Needed: a different estimator.
- Solutions:
- Smoothing an overfitting estimation (Sima'an, Buratto).
- Held-out estimation (Zollmann).
73 - Smoothing
- Good-Turing estimation: estimating the probability of unseen events on the basis of the number of observed once-occurring events, twice-occurring events, etc. (see the sketch below).
- Back-off: cf. the sparse-data problem with trigram models; estimate the probabilities of unseen trigrams on the basis of the probabilities of their constituent bigrams and unigrams.
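For reference, a textbook-style sketch of the Good-Turing idea in the first bullet (not a DOP-specific recipe): the probability mass reserved for unseen events is estimated from the once-seen events, and a count of r is re-estimated from how many event types were seen r and r+1 times.

```python
from collections import Counter

def good_turing(counts):
    """counts: Counter of observed events -> (adjusted counts, probability mass for unseen events)."""
    N = sum(counts.values())
    n_r = Counter(counts.values())          # N_r: number of event types observed exactly r times
    p_unseen = n_r[1] / N                   # total mass reserved for unseen events = N_1 / N
    adjusted = {}
    for event, r in counts.items():
        if n_r[r + 1]:
            adjusted[event] = (r + 1) * n_r[r + 1] / n_r[r]   # r* = (r+1) * N_{r+1} / N_r
        else:
            adjusted[event] = r             # no N_{r+1} data: leave the raw count
    return adjusted, p_unseen
```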
74- Held-out estimation
- Get the fragment set from one part of the corpus
and the probabilities from another part. Use ten
different splits and take the average.
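A sketch of this held-out scheme; extract_fragments and estimate_probs are hypothetical stand-ins for the DOP fragment extractor and probability estimator, and the 50/50 split is an assumption.

```python
import random
from collections import defaultdict

def held_out_estimate(treebank, extract_fragments, estimate_probs, n_splits=10, seed=0):
    """Fragments from one half of the treebank, probabilities from the other; average over splits."""
    rng = random.Random(seed)
    sums = defaultdict(float)
    for _ in range(n_splits):
        trees = list(treebank)
        rng.shuffle(trees)
        half = len(trees) // 2
        inventory = extract_fragments(trees[:half])       # fragment set from one part
        probs = estimate_probs(inventory, trees[half:])   # probabilities from the other part
        for frag, p in probs.items():
            sums[frag] += p
    return {frag: total / n_splits for frag, total in sums.items()}
```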
76The Data-Oriented Perspective on Perlocutionary
Effect
- "The effect of a lecture depends on the habits of
the listener, because we expect the language to
which we are accustomed." - Aristotle, Metaphysics II 12,13
77 Data-Oriented Parsing as a cognitive model
[Figure: tree fragments for "Every woman loves a man" — S, VP and NP subtrees built from det ("every", "a"), N ("woman", "man") and V ("loves").]
78 Data-Oriented Parsing as a cognitive model
[Figure repeated from the previous slide.]