1
Ch 12. Probabilistic Parsing
  • 02.05.11
  • Song young in

2
Contents
  • Introduction
  • Some Concepts
  • Parsing for disambiguation
  • Treebank
  • Weakening the independence assumptions of PCFGs
  • Tree probabilities and derivation probabilities
  • Phrase structure grammars and dependency grammars
  • Evaluation
  • Equivalent models

3
Introduction
  • The practice of parsing
  • Can be considered as an implementation of the idea
    of chunking
  • Chunking: recognizing higher-level units of
    structure that allow us to compress the
    description of a sentence
  • Grammar induction
  • One way to capture the regularity of chunks across
    different sentences is to learn the structure of
    the chunks.
  • But the structure found depends on the implicit
    inductive bias of the learning program.
  • We need to decide what structure the model should
    find before starting to build it
  • Decide what we want to do with parsed sentences.

4
Introduction
  • Goals of parsing
  • As a first step toward semantic representation
  • Detecting phrasal chunks for indexing in an IR
    system
  • As a language model
  • Given these goals..
  • The overall goal is to produce a system that can
    place a provably useful structure over arbitrary
    sentences, i.e. build a parser.
  • No need to insist that one begins with a tabula
    rasa.

5
Some concepts: Parsing for disambiguation
  • Three ways to use probabilities in a parser
  • Probabilities for determining the sentence
  • When the actual input is uncertain, determine the
    most probable sentence
  • (Refer to Figure 12.1)
  • Probabilities for speedier parsing
  • To find the best parse more quickly
  • Probabilities for choosing between parses
  • Choose the most likely parse among the many parses
    of the input string

6
Treebank
  • Pure grammar induction approach
  • Tends not to produce the parse trees that people
    want
  • An approach to this problem
  • Give a learning tool some examples of the kinds
    of parse trees that are wanted
  • A collection of such example parses is a treebank
  • Penn Treebank
  • Features (refer to Figure 12.1)
  • Straightforward Lisp-like notation via bracketing
    (see the example below)
  • Phrase structure is fairly flat
  • Makes some attempt to indicate grammatical and
    semantic functions
  • Uses empty nodes
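
An illustrative fragment of this bracketed notation (the
sentence and details are invented for illustration; real
Penn Treebank trees carry further annotation), showing a
grammatical-function tag (-SBJ) and an empty node (-NONE-):

  (S (NP-SBJ (NNP Sue))
     (VP (VBD promised)
         (S (NP-SBJ (-NONE- *))
            (VP (TO to)
                (VP (VB leave))))))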

7
Treebank
  • Usage of a treebank
  • Induction: the problem of extracting grammatical
    knowledge
  • Determine a PCFG from a treebank
  • Count the frequencies of local trees and then
    normalize these to give probabilities (a sketch
    follows below).
  • An alternative to having a linguist write the
    grammar by hand.
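
A minimal sketch (assuming the treebank has already been
read into (parent, children) local trees) of turning the
counts into PCFG rule probabilities by relative frequency:

  from collections import Counter, defaultdict

  def estimate_pcfg(local_trees):
      # local_trees: (parent, children) pairs read off every internal
      # node of the treebank, e.g. ("S", ("NP", "VP"))
      rule_counts = Counter(local_trees)                  # C(N -> alpha)
      parent_counts = Counter(p for p, _ in local_trees)  # C(N)
      probs = defaultdict(dict)
      for (parent, children), count in rule_counts.items():
          probs[parent][children] = count / parent_counts[parent]
      return probs                                        # P(N -> alpha | N)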

8
Parsing Model vs. Language Model
  • The idea of parsing
  • Take a sentence s and work out the parse trees for
    it according to some grammar G
  • In probabilistic parsing
  • Place a ranking on the possible parses showing how
    likely each one is
  • Or perhaps return the most likely parse of a
    sentence
  • Definition of a probabilistic parsing model
  • P(t | s, G), where Σ_t P(t | s, G) = 1
  • t̂ = argmax_t P(t | s, G)

9
Parsing Model vs. Language Model
  • Parsing model
  • A slightly odd thing
  • Uses probabilities conditioned on a particular
    sentence
  • Generally, one needs to base probability estimates
    on some more general class of data
  • More usual approach
  • Define a language model that assigns a probability
    to all trees generated by the grammar

10
Parsing Model vs. Language Model
  • Language model
  • The joint probability p(t, s | G)
  • p(t, s | G) = p(t | G) if yield(t) = s, otherwise 0
  • Under such a model,
  • p(t | G), abbreviated p(t), is the probability of a
    particular parse of a sentence according to the
    grammar G
  • The probabilities of the sentences of the language
    sum to one
  • Σ_{t: yield(t) ∈ L} p(t) = 1
  • p(s) = Σ_t p(t, s) = Σ_{t: yield(t) = s} p(t)
  • So,
  • t̂ = argmax_t p(t | s) = argmax_t p(t, s) / p(s)
    = argmax_t p(t, s)
  • So a language model can be used as a parsing model
    for the purpose of choosing between parses (see the
    toy check below)
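
A toy check (with made-up probabilities) that dividing by
p(s) cannot change which parse wins the argmax:

  # hypothetical joint probabilities p(t, s) for three parses of one sentence s
  joint = {"t1": 0.006, "t2": 0.003, "t3": 0.001}

  p_s = sum(joint.values())                            # p(s) = sum_t p(t, s)
  posterior = {t: p / p_s for t, p in joint.items()}   # p(t | s)

  assert max(joint, key=joint.get) == max(posterior, key=posterior.get)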

11
Parsing Model vs. Language Model
  • Collins' work
  • A language model provides a better foundation for
    modeling

12
Weakening the independence assumptions of PCFGs
  • Discourse context
  • The prior discourse context influences our
    interpretation of later sentences.
  • Many sources of information are incorporated in
    real time while people parse sentences.
  • PCFG independence assumptions
  • Under them, none of these factors are relevant to
    the probability of a parse tree.
  • In fact, all of these sources of evidence are
    relevant to, and might be usable for,
    disambiguating probabilistic parses.
  • Collocations: more local semantic disambiguation
  • Prior text: an indication of discourse context

13
Lexicalization
  • Two weaknesses of the independence assumptions
  • Lack of lexicalization
  • Probabilities must be dependent on structural
    context
  • Lack of lexicalization
  • Refer to Table 12.2
  • Subcategorization frames are needed.
  • Suggests the need to include more information about
    what the actual words are when making decisions
    about the structure of the parse tree

14
Lexicalization
  • Lack of lexicalization
  • The issue of choosing phrasal attachment positions
  • The lexical content of phrases almost always
    provides enough information to decide the correct
    attachment site
  • But the syntactic category of the phrase normally
    provides very little information
  • Lexicalized CFG
  • Refer to (12.8)
  • Each phrasal node is marked by its head word (see
    the sketch below)
  • An effective strategy, but..
  • Does not consider some dependencies between pairs
    of non-heads.
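
A minimal sketch (invented head-finding rules and a toy
tree encoding, not the book's formulation) of marking each
phrasal node with its head word by percolating heads up
from a designated head child:

  # which child category supplies the head (toy rules)
  HEAD_CHILD = {"S": "VP", "VP": "VBD", "NP": "NN", "PP": "IN"}

  def lexicalize(node):
      # node: (label, word) for a preterminal, (label, [children]) otherwise;
      # returns (annotated_node, head_word)
      label, rest = node
      if isinstance(rest, str):
          return (f"{label}({rest})", rest), rest
      new_children, heads = [], {}
      for child in rest:
          new_child, head = lexicalize(child)
          new_children.append(new_child)
          heads[child[0]] = head
      head = heads.get(HEAD_CHILD.get(label), next(iter(heads.values())))
      return (f"{label}({head})", new_children), head

  tree = ("S", [("NP", [("DT", "the"), ("NN", "dog")]),
                ("VP", [("VBD", "barked")])])
  print(lexicalize(tree)[0])   # every phrasal node now carries its head word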

15
Probabilities dependent on structural context
  • Another weakness..
  • PCFGs are also deficient on purely structural
    grounds
  • The assumption of structural context-freeness
    remains (a grammatical assumption)
  • Refer to Table 12.3
  • Refer to Table 12.4
  • It takes a more thorough corpus study to understand
    some of the other effects
  • e.g. pronouns are very infrequent in the second
    object position

16
Tree probabilities and derivational probabilities
  • Parse tree..
  • Can be thought of as a compact record of a
    branching process, each branch conditioned solely
    on the label of the node
  • Derivation, derivational model
  • A sequence of top-down rewrites until one has a
    phrase marker all of whose leaf nodes are
    terminals
  • Refer to Figure 12.3 and (12.11)
  • The probability of a given parse tree is defined in
    terms of the probabilities of its derivations
  • To estimate the probability of a tree, refer to
    (12.12) and the sketch below
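
A minimal sketch (reusing the toy (label, children) tree
encoding from above and assuming a dict of rule
probabilities) of a tree's probability as the product of
the rewrite probabilities in its derivation:

  def tree_prob(node, rule_probs):
      # node: (label, word) for a preterminal, (label, [children]) otherwise
      label, rest = node
      if isinstance(rest, str):                 # P(label -> word)
          return rule_probs[(label, (rest,))]
      children = tuple(child[0] for child in rest)
      p = rule_probs[(label, children)]         # P(label -> children)
      for child in rest:                        # multiply in each subtree
          p *= tree_prob(child, rule_probs)
      return p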

17
Tree probabilities and derivational probabilities
  • Canonical derivation
  • In many cases, the extra complication is
    unnecessary.
  • The choice of derivational order in the PCFG case
    makes no difference to the final probabilities.
  • The final probability for a tree is the same either
    way
  • Simplify things by finding a way of choosing a
    unique derivation for each tree
  • A canonical derivation (e.g. the leftmost
    derivation; see the sketch below)
  • p(t) = p(d) where d is the canonical derivation
    of t
  • Whether this simplification is possible depends
    on the nature of the probabilistic conditioning in
    the model.
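
A small sketch (same toy tree encoding as above) of reading
the canonical leftmost derivation off a tree:

  def leftmost_derivation(node):
      # the rules of the tree read off in pre-order, left to right
      label, rest = node
      if isinstance(rest, str):
          return [(label, (rest,))]
      rules = [(label, tuple(child[0] for child in rest))]
      for child in rest:            # expand leftmost nonterminals first
          rules += leftmost_derivation(child)
      return rules

  # For a PCFG, multiplying the probabilities of these rules gives p(t) = p(d).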

18
Tree probabilities and derivational probabilities
  • Derivation model
  • Form equivalence classes of derivational histories
    via an equivalencing function and estimate from
    these (history-based grammars, IBM)
  • This framework includes PCFGs as a special case.

19
Phrase structure grammars and dependency grammars
  • Dependency grammar
  • Describes linguistic structure in terms of
    dependencies between words
  • Such a framework is referred to as a dependency
    grammar.
  • In a dependency grammar,
  • One word is the head of the sentence.
  • All other words are either a dependent of that
    word, or else dependent on some other word which
    connects to the head word through a sequence of
    dependencies (see the small example below).
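
A tiny illustration (invented sentence) of a dependency
analysis, where the verb heads the sentence and every other
word depends on it directly or through other words:

  words = ["The", "old", "man", "ate", "fish"]
  heads = [2, 2, 3, -1, 3]   # index of each word's head; -1 marks the sentence head
  for w, h in zip(words, heads):
      print(w, "->", "ROOT" if h == -1 else words[h])
  # The -> man, old -> man, man -> ate, ate -> ROOT, fish -> ate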

20
Phrase structure grammars and dependency grammars
  • Lauer's work
  • The relation between phrase structure grammars and
    dependency grammars
  • Disambiguating compound nouns such as "phrase
    structure model"
  • Refer to (12.23): the adjacency model
  • Uses corpus evidence (the collocational bond)
    between
  • "phrase structure" vs. "structure model"
  • Refer to (12.24): the dependency model
  • "phrase model" vs. "phrase structure"
  • The dependency model outperforms the adjacency
    model (see the sketch below).
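
A minimal sketch (with a hypothetical bond() score standing
in for the corpus-derived association measure) of the two
decision rules for a three-noun compound n1 n2 n3 such as
"phrase structure model":

  def bond(w1, w2, counts):
      # hypothetical collocational association score between two nouns
      return counts.get((w1, w2), 0)

  def adjacency_choice(n1, n2, n3, counts):
      # left-branching [[n1 n2] n3] iff the (n1, n2) bond beats the (n2, n3) bond
      return "left" if bond(n1, n2, counts) >= bond(n2, n3, counts) else "right"

  def dependency_choice(n1, n2, n3, counts):
      # left-branching iff n1 attaches better to n2 than to n3
      return "left" if bond(n1, n2, counts) >= bond(n1, n3, counts) else "right"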

21
Phrase structure grammars and dependency grammars
  • Lauer's work
  • Compare (12.24) with the lexicalized PCFG model
    (12.25)
  • Under a lexicalized PCFG, the two bracketings
    differ only in whether "phrase" modifies
    "structure" or "model"
  • So deciding between the possibilities comes down to
    comparing the probabilities of those two
    attachments
  • This is exactly equivalent to comparing the bond
    between
  • "phrase model" vs. "phrase structure"
  • Isomorphisms
  • between various kinds of dependency grammars and
    corresponding types of phrase structure grammars

22
Phrase structure grammars and dependency grammars
  • Two key advantages of dependency grammars
  • Disambiguation decisions are made directly in terms
    of word dependencies,
  • because dependency grammars work directly in
    terms of these word dependencies.
  • They give one a way of decomposing phrase structure
    rules and of estimating their probabilities.
  • A problem with inducing a parser from the Penn
    Treebank is that, because its trees are very flat,
    there are many rare kinds of flat trees with many
    children.
  • In unseen data, one encounters yet other such trees
    that one has never seen before.
  • A PCFG tries to estimate the probability of a whole
    local subtree at once.

23
Phrase structure grammars and dependency grammars
  • How a dependency grammar decomposes a phrase
  • By estimating the probability of each
    head-dependent relationship separately
  • Steps
  • Suppose we have never seen the local tree in Figure
    12.5(a) before.
  • Instead of a PCFG back-off, decompose the tree into
    dependencies as in (b).
  • This provides trees like (c) and (d).
  • A reasonable estimate then becomes possible (see
    the sketch after this list).
  • But this makes a further important independence
    assumption
  • We need some system to account for the relative
    ordering of the dependencies.
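
A rough sketch (hypothetical probability table; the
independence assumption is made explicit, and ordering is
ignored) of estimating an unseen flat local tree from
separately estimated head-dependent probabilities:

  def local_tree_prob(parent, head, dependents, dep_probs):
      # dep_probs[(parent, head, dep)] ~ probability that dep attaches
      # under a parent category headed by head
      p = 1.0
      for dep in dependents:
          # each dependency is estimated separately and, by a further
          # independence assumption, simply multiplied in
          p *= dep_probs.get((parent, head, dep), 1e-6)   # crude back-off
      return p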

24
Evaluation
  • How to evaluate the success of a statistical
    parser?
  • Cross entropy of the model
  • Developed for language models (see the formula
    below)
  • But..
  • Cross entropy or perplexity measures only the
    probabilistic weak equivalence of models, not the
    tree structure.
  • Probabilistically weakly equivalent grammars have
    the same cross entropy even if they are not
    strongly equivalent, and so are not equally useful
    for the task.
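
For reference, the per-word cross entropy of a model m on a
test text of n words w_1 ... w_n (lower is better;
perplexity is 2^H):

  H = -(1/n) log_2 m(w_1 ... w_n)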

25
Evaluation
  • Parse evaluation
  • The ultimate goal is to build a system aimed at
    some task.
  • Task-based evaluation
  • A better way to evaluate parsers is to embed them
    in such a larger system and to investigate the
    differences that the different parsers make.
  • Tree accuracy (exact match)
  • The strictest criterion
  • 1 point if the parser gets the parse completely
    right, otherwise 0
  • It is sensible for the evaluation criterion to
    match what one's parser is maximizing.

26
Evaluation
  • Parse evaluation
  • PARSEVAL measures
  • Originated in an attempt to compare the performance
    of non-statistical parsers
  • But have usually been applied in statistical NLP
    work
  • Basic measures
  • Precision, recall, labeled precision, labeled
    recall, crossing brackets, crossing accuracy
  • Refer to Figures 12.6 and 12.7 and the sketch
    below.
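
A minimal sketch (brackets encoded as (label, start, end)
spans; the usual exclusions, e.g. punctuation, are ignored)
of the basic PARSEVAL computations:

  def parseval(gold, guess):
      # gold, guess: sets of labeled constituent spans (label, start, end)
      g_spans = {(s, e) for _, s, e in gold}
      p_spans = {(s, e) for _, s, e in guess}
      precision = len(p_spans & g_spans) / len(p_spans)
      recall = len(p_spans & g_spans) / len(g_spans)
      labeled_precision = len(guess & gold) / len(guess)
      labeled_recall = len(guess & gold) / len(gold)
      # a proposed bracket "crosses" a gold bracket if they overlap
      # without one containing the other
      crossing = sum(any(s < gs < e < ge or gs < s < ge < e
                         for gs, ge in g_spans)
                     for s, e in p_spans)
      return precision, recall, labeled_precision, labeled_recall, crossing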

27
Evaluation
  • Problems with PARSEVAL
  • The PARSEVAL measures are not very discriminating.
  • Charniak's (1996) vanilla PCFG, which ignores all
    lexical content, already scores well on them.
  • The PARSEVAL measures are quite easy to do well on,
    given the structure of Penn Treebank trees.
  • The PARSEVAL measures reward success at the level
    of individual decisions.
  • But in NLP, getting consecutive decisions right is
    more important and harder.

28
Evaluation
  • The Penn Treebank's problems
  • Trees are too flat.
  • A non-standard adjunction structure is given to
    post-noun-head modifiers.
  • For such structures, the PARSEVAL measures seem too
    harsh.