1
Ch 12. Probabilistic Parsing
  • 02.05.11
  • Song young in

2
Contents
  • Introduction
  • Some Concepts
  • Parsing for disambiguation
  • Treebank
  • Weakening the independence assumptions of PCFGs
  • Tree probabilities and derivation probabilities
  • Phrase structure grammars and dependency grammars
  • Evaluation
  • Equivalent models

3
Introduction
  • The practice of parsing
  • Can be considered as an implementation of the idea
    of chunking
  • Chunking: recognizing higher-level units of
    structure that allow us to compress the
    description of a sentence
  • Grammar induction
  • One way to capture the regularity of chunks across
    different sentences is to learn the structure of
    the chunks.
  • But the structure found depends on the implicit
    inductive bias of the learning program.
  • We need to decide what structure the model should
    find before starting to build it
  • Decide what we want to do with parsed sentences.

4
Introduction
  • Goals of parsing
  • As a first step toward semantic representation
  • Detecting phrasal chunks for indexing in an IR
    system
  • As a language model
  • Given these goals..
  • The overall goal is to produce a system that can
    place a provably useful structure over arbitrary
    sentences, i.e. build a parser.
  • No need to insist that one begins with a tabula
    rasa.

5
Some concepts: Parsing for disambiguation
  • Three ways to use probabilities in a parser
  • Probabilities for determining the sentence
  • When the actual input is uncertain, determine the
    most probable sentence
  • (Refer to Figure 12.1)
  • Probabilities for speedier parsing
  • To find the best parse more quickly
  • Probabilities for choosing between parses
  • Choose the most likely parse among the many parses
    of the input string

6
Treebank
  • Pure grammar induction approach
  • Tends not to produce the parse trees that people
    want
  • An approach to this problem
  • Give a learning tool some examples of the kinds
    of parse trees that are wanted
  • A collection of such example parses is a treebank
  • Penn Treebank
  • Features (refer to Figure 12.1)
  • Straightforward Lisp-like notation via bracketing
    (see the example below)
  • Phrase structure is fairly flat
  • Makes some attempt to indicate grammatical and
    semantic functions
  • Uses empty nodes
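
An illustrative fragment of this bracketed notation (the
sentence and details are invented for illustration; real
Penn Treebank trees carry further annotation), showing a
grammatical-function tag (-SBJ) and an empty node (-NONE-):

  (S (NP-SBJ (NNP Sue))
     (VP (VBD promised)
         (S (NP-SBJ (-NONE- *))
            (VP (TO to)
                (VP (VB leave))))))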

7
Treebank
  • Usage of a treebank
  • Induction: the problem of extracting grammatical
    knowledge
  • Determine a PCFG from a treebank
  • Count the frequencies of local trees and then
    normalize these to give probabilities (a sketch
    follows below).
  • An alternative to having a linguist write the
    grammar by hand.
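
A minimal sketch (assuming the treebank has already been
read into (parent, children) local trees) of turning the
counts into PCFG rule probabilities by relative frequency:

  from collections import Counter, defaultdict

  def estimate_pcfg(local_trees):
      # local_trees: (parent, children) pairs read off every internal
      # node of the treebank, e.g. ("S", ("NP", "VP"))
      rule_counts = Counter(local_trees)                  # C(N -> alpha)
      parent_counts = Counter(p for p, _ in local_trees)  # C(N)
      probs = defaultdict(dict)
      for (parent, children), count in rule_counts.items():
          probs[parent][children] = count / parent_counts[parent]
      return probs                                        # P(N -> alpha | N)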

8
Parsing Model vs. Language Model
  • The idea of parsing
  • Take a sentence s and work out the parse trees for
    it according to some grammar G
  • In probabilistic parsing
  • Place a ranking on the possible parses showing how
    likely each one is
  • Or perhaps return the most likely parse of a
    sentence
  • Definition of a probabilistic parsing model
  • P(t | s, G), where Σ_t P(t | s, G) = 1
  • t̂ = argmax_t P(t | s, G)

9
Parsing Model vs. Language Model
  • Parsing model
  • A slightly odd thing
  • Uses probabilities conditioned on a particular
    sentence
  • Generally, one needs to base probability estimates
    on some more general class of data
  • More usual approach
  • Define a language model that assigns a probability
    to all trees generated by the grammar

10
Parsing Model vs. Language Model
  • Language model
  • The joint probability p(t, s | G)
  • p(t, s | G) = p(t | G) if yield(t) = s, otherwise 0
  • Under such a model,
  • p(t | G), abbreviated p(t), is the probability of a
    particular parse of a sentence according to the
    grammar G
  • The probabilities of the sentences of the language
    sum to one
  • Σ_{t: yield(t) ∈ L} p(t) = 1
  • p(s) = Σ_t p(t, s) = Σ_{t: yield(t) = s} p(t)
  • So,
  • t̂ = argmax_t p(t | s) = argmax_t p(t, s) / p(s)
    = argmax_t p(t, s)
  • So a language model can be used as a parsing model
    for the purpose of choosing between parses (see the
    toy check below)
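
A toy check (with made-up probabilities) that dividing by
p(s) cannot change which parse wins the argmax:

  # hypothetical joint probabilities p(t, s) for three parses of one sentence s
  joint = {"t1": 0.006, "t2": 0.003, "t3": 0.001}

  p_s = sum(joint.values())                            # p(s) = sum_t p(t, s)
  posterior = {t: p / p_s for t, p in joint.items()}   # p(t | s)

  assert max(joint, key=joint.get) == max(posterior, key=posterior.get)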

11
Parsing Model vs. Language Model
  • Collins' work
  • A language model provides a better foundation for
    modeling

12
Weakening the independence assumptions of PCFGs
  • Discourse context
  • The prior discourse context influences our
    interpretation of later sentences.
  • Many sources of information are incorporated in
    real time while people parse sentences.
  • PCFG independence assumptions
  • Under them, none of these factors are relevant to
    the probability of a parse tree.
  • In fact, all of these sources of evidence are
    relevant to, and might be usable for,
    disambiguating probabilistic parses.
  • Collocations: more local semantic disambiguation
  • Prior text: an indication of discourse context

13
Lexicalization
  • Two weaknesses of the independence assumptions
  • Lack of lexicalization
  • Probabilities must be dependent on structural
    context
  • Lack of lexicalization
  • Refer to Table 12.2
  • Subcategorization frames are needed.
  • Suggests the need to include more information about
    what the actual words are when making decisions
    about the structure of the parse tree

14
Lexicalization
  • Lack of lexicalization
  • The issue of choosing phrasal attachment positions
  • The lexical content of phrases almost always
    provides enough information to decide the correct
    attachment site
  • But the syntactic category of the phrase normally
    provides very little information
  • Lexicalized CFG
  • Refer to (12.8)
  • Each phrasal node is marked by its head word (see
    the sketch below)
  • An effective strategy, but..
  • Does not consider some dependencies between pairs
    of non-heads.
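
A minimal sketch (invented head-finding rules and a toy
tree encoding, not the book's formulation) of marking each
phrasal node with its head word by percolating heads up
from a designated head child:

  # which child category supplies the head (toy rules)
  HEAD_CHILD = {"S": "VP", "VP": "VBD", "NP": "NN", "PP": "IN"}

  def lexicalize(node):
      # node: (label, word) for a preterminal, (label, [children]) otherwise;
      # returns (annotated_node, head_word)
      label, rest = node
      if isinstance(rest, str):
          return (f"{label}({rest})", rest), rest
      new_children, heads = [], {}
      for child in rest:
          new_child, head = lexicalize(child)
          new_children.append(new_child)
          heads[child[0]] = head
      head = heads.get(HEAD_CHILD.get(label), next(iter(heads.values())))
      return (f"{label}({head})", new_children), head

  tree = ("S", [("NP", [("DT", "the"), ("NN", "dog")]),
                ("VP", [("VBD", "barked")])])
  print(lexicalize(tree)[0])   # every phrasal node now carries its head word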

15
Probabilities dependent on structural context
  • Another weakness..
  • PCFGs are also deficient on purely structural
    grounds
  • The assumption of structural context-freeness
    remains (a grammatical assumption)
  • Refer to Table 12.3
  • Refer to Table 12.4
  • It takes a more thorough corpus study to understand
    some of the other effects
  • e.g. pronouns are very infrequent in the second
    object position

16
Tree probabilities and derivational probabilities
  • Parse tree..
  • Can be thought of as a compact record of a
    branching process, each branch conditioned solely
    on the label of the node
  • Derivation, derivational model
  • A sequence of top-down rewrites until one has a
    phrase marker all of whose leaf nodes are
    terminals
  • Refer to Figure 12.3 and (12.11)
  • The probability of a given parse tree is defined in
    terms of the probabilities of its derivations
  • To estimate the probability of a tree, refer to
    (12.12) and the sketch below
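
A minimal sketch (reusing the toy (label, children) tree
encoding from above and assuming a dict of rule
probabilities) of a tree's probability as the product of
the rewrite probabilities in its derivation:

  def tree_prob(node, rule_probs):
      # node: (label, word) for a preterminal, (label, [children]) otherwise
      label, rest = node
      if isinstance(rest, str):                 # P(label -> word)
          return rule_probs[(label, (rest,))]
      children = tuple(child[0] for child in rest)
      p = rule_probs[(label, children)]         # P(label -> children)
      for child in rest:                        # multiply in each subtree
          p *= tree_prob(child, rule_probs)
      return p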

17
Tree probabilities and derivational probabilities
  • Canonical derivation
  • In many cases, the extra complication is
    unnecessary.
  • The choice of derivational order in the PCFG case
    makes no difference to the final probabilities.
  • The final probability for a tree is the same either
    way
  • Simplify things by finding a way of choosing a
    unique derivation for each tree
  • A canonical derivation (e.g. the leftmost
    derivation; see the sketch below)
  • p(t) = p(d) where d is the canonical derivation
    of t
  • Whether this simplification is possible depends
    on the nature of the probabilistic conditioning in
    the model.
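
A small sketch (same toy tree encoding as above) of reading
the canonical leftmost derivation off a tree:

  def leftmost_derivation(node):
      # the rules of the tree read off in pre-order, left to right
      label, rest = node
      if isinstance(rest, str):
          return [(label, (rest,))]
      rules = [(label, tuple(child[0] for child in rest))]
      for child in rest:            # expand leftmost nonterminals first
          rules += leftmost_derivation(child)
      return rules

  # For a PCFG, multiplying the probabilities of these rules gives p(t) = p(d).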

18
Tree probabilities and derivational probabilities
  • Derivation model
  • Form equivalence classes of derivational histories
    via an equivalencing function and estimate from
    these (history-based grammars, IBM)
  • This framework includes PCFGs as a special case.

19
Phrase structure grammars and dependency grammars
  • Dependency grammar
  • Describes linguistic structure in terms of
    dependencies between words
  • Such a framework is referred to as a dependency
    grammar.
  • In a dependency grammar,
  • One word is the head of the sentence.
  • All other words are either a dependent of that
    word, or else dependent on some other word which
    connects to the head word through a sequence of
    dependencies (see the small example below).
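
A tiny illustration (invented sentence) of a dependency
analysis, where the verb heads the sentence and every other
word depends on it directly or through other words:

  words = ["The", "old", "man", "ate", "fish"]
  heads = [2, 2, 3, -1, 3]   # index of each word's head; -1 marks the sentence head
  for w, h in zip(words, heads):
      print(w, "->", "ROOT" if h == -1 else words[h])
  # The -> man, old -> man, man -> ate, ate -> ROOT, fish -> ate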

20
Phrase structure grammars and dependency grammars
  • Lauer's work
  • The relation between phrase structure grammars and
    dependency grammars
  • Disambiguating compound nouns such as "phrase
    structure model"
  • Refer to (12.23): the adjacency model
  • Uses corpus evidence (the collocational bond)
    between
  • "phrase structure" vs. "structure model"
  • Refer to (12.24): the dependency model
  • "phrase model" vs. "phrase structure"
  • The dependency model outperforms the adjacency
    model (see the sketch below).
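
A minimal sketch (with a hypothetical bond() score standing
in for the corpus-derived association measure) of the two
decision rules for a three-noun compound n1 n2 n3 such as
"phrase structure model":

  def bond(w1, w2, counts):
      # hypothetical collocational association score between two nouns
      return counts.get((w1, w2), 0)

  def adjacency_choice(n1, n2, n3, counts):
      # left-branching [[n1 n2] n3] iff the (n1, n2) bond beats the (n2, n3) bond
      return "left" if bond(n1, n2, counts) >= bond(n2, n3, counts) else "right"

  def dependency_choice(n1, n2, n3, counts):
      # left-branching iff n1 attaches better to n2 than to n3
      return "left" if bond(n1, n2, counts) >= bond(n1, n3, counts) else "right"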

21
Phrase structure grammars and dependency grammars
  • Lauer's work
  • Compare (12.24) with the lexicalized PCFG model
    (12.25)
  • Under a lexicalized PCFG, the two bracketings
    differ only in whether "phrase" modifies
    "structure" or "model"
  • So deciding between the possibilities comes down to
    comparing the probabilities of those two
    attachments
  • This is exactly equivalent to comparing the bond
    between
  • "phrase model" vs. "phrase structure"
  • Isomorphisms
  • between various kinds of dependency grammars and
    corresponding types of phrase structure grammars

22
Phrase structure grammars and dependency grammars
  • Two key advantages of dependency grammars
  • Disambiguation decisions are made directly in terms
    of word dependencies,
  • because dependency grammars work directly in
    terms of these word dependencies.
  • They give one a way of decomposing phrase structure
    rules and of estimating their probabilities.
  • A problem with inducing a parser from the Penn
    Treebank is that, because its trees are very flat,
    there are many rare kinds of flat trees with many
    children.
  • In unseen data, one encounters yet other such trees
    that one has never seen before.
  • A PCFG tries to estimate the probability of a whole
    local subtree at once.

23
Phrase structure grammars and dependency grammars
  • How a dependency grammar decomposes a phrase
  • By estimating the probability of each
    head-dependent relationship separately
  • Steps
  • Suppose we have never seen the local tree in Figure
    12.5(a) before.
  • Instead of a PCFG back-off, decompose the tree into
    dependencies as in (b).
  • This provides trees like (c) and (d).
  • A reasonable estimate then becomes possible (see
    the sketch after this list).
  • But this makes a further important independence
    assumption
  • We need some system to account for the relative
    ordering of the dependencies.
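
A rough sketch (hypothetical probability table; the
independence assumption is made explicit, and ordering is
ignored) of estimating an unseen flat local tree from
separately estimated head-dependent probabilities:

  def local_tree_prob(parent, head, dependents, dep_probs):
      # dep_probs[(parent, head, dep)] ~ probability that dep attaches
      # under a parent category headed by head
      p = 1.0
      for dep in dependents:
          # each dependency is estimated separately and, by a further
          # independence assumption, simply multiplied in
          p *= dep_probs.get((parent, head, dep), 1e-6)   # crude back-off
      return p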

24
Evaluation
  • How to evaluate the success of a statistical
    parser?
  • Cross entropy of the model
  • Developed for language models (see the formula
    below)
  • But..
  • Cross entropy or perplexity measures only the
    probabilistic weak equivalence of models, not the
    tree structure.
  • Probabilistically weakly equivalent grammars have
    the same cross entropy even if they are not
    strongly equivalent, and so are not equally useful
    for the task.
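
For reference, the per-word cross entropy of a model m on a
test text of n words w_1 ... w_n (lower is better;
perplexity is 2^H):

  H = -(1/n) log_2 m(w_1 ... w_n)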

25
Evaluation
  • Parse evaluation
  • The ultimate goal is to build a system aimed at
    some task.
  • Task-based evaluation
  • A better way to evaluate parsers is to embed them
    in such a larger system and to investigate the
    differences that the different parsers make.
  • Tree accuracy (exact match)
  • The strictest criterion
  • 1 point if the parser gets the parse completely
    right, otherwise 0
  • It is sensible for the evaluation criterion to
    match what one's parser is maximizing.

26
Evaluation
  • Parse evaluation
  • PARSEVAL measures
  • Originated in an attempt to compare the performance
    of non-statistical parsers
  • But have usually been applied in statistical NLP
    work
  • Basic measures
  • Precision, recall, labeled precision, labeled
    recall, crossing brackets, crossing accuracy
  • Refer to Figures 12.6 and 12.7 and the sketch
    below.
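
A minimal sketch (brackets encoded as (label, start, end)
spans; the usual exclusions, e.g. punctuation, are ignored)
of the basic PARSEVAL computations:

  def parseval(gold, guess):
      # gold, guess: sets of labeled constituent spans (label, start, end)
      g_spans = {(s, e) for _, s, e in gold}
      p_spans = {(s, e) for _, s, e in guess}
      precision = len(p_spans & g_spans) / len(p_spans)
      recall = len(p_spans & g_spans) / len(g_spans)
      labeled_precision = len(guess & gold) / len(guess)
      labeled_recall = len(guess & gold) / len(gold)
      # a proposed bracket "crosses" a gold bracket if they overlap
      # without one containing the other
      crossing = sum(any(s < gs < e < ge or gs < s < ge < e
                         for gs, ge in g_spans)
                     for s, e in p_spans)
      return precision, recall, labeled_precision, labeled_recall, crossing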

27
Evaluation
  • Problems with PARSEVAL
  • The PARSEVAL measures are not very discriminating.
  • Charniak's (1996) vanilla PCFG, which ignores all
    lexical content, already scores well on them.
  • The PARSEVAL measures are quite easy to do well on,
    given the structure of Penn Treebank trees.
  • The PARSEVAL measures reward success at the level
    of individual decisions.
  • But in NLP, getting consecutive decisions right is
    more important and harder.

28
Evaluation
  • The Penn Treebank's problems
  • Trees are too flat.
  • A non-standard adjunction structure is given to
    post-noun-head modifiers.
  • For such structures, the PARSEVAL measures seem too
    harsh.