The Earley Algorithm

Provided by: sergeini
1
The Earley Algorithm: the main idea. The basic
top-down parsing algorithm with
bottom-up filtering (e.g., using the left-corner
heuristic) is the basis for efficient
parsing. The problems of left-recursive rules,
ambiguity and subtree reparsing are tackled
using dynamic programming.
2
In dynamic programming, tables containing
solutions to subproblems are stored and reused
when needed. In parsing, the table contains
subtrees for each complete constituent in the
input. This solves the reparsing problem:
before parsing a constituent, the algorithm
first attempts to look it up in the table. This
also lays the groundwork for ambiguity
resolution by recording all the legal subtrees
that can be combined in a variety of ways in a
complete parse.
3
The Earley algorithm reduces the complexity of
the search to polynomial by eliminating the
repetitions due to backtracking. Its worst-case
complexity is, in fact, cubic in the number of
words in the input.
4
The main loop of the Earley algorithm is a
single left-to-right pass over the input, as a
result of which an array of n+1 entries, called
a chart, is filled (n is the number of words in
the input). For each word position in the
sentence, the chart stores the list of search
states representing the partial parse trees
generated so far. After the pass, the chart
encodes all the possible parses of the input.
Each legal subtree is represented only once and
can be used in any number of complete parses.
5
The states within each chart entry contain three
types of information:
  • a subtree corresponding to a single grammar
    rule;
  • information about progress in completing this
    subtree: a dot within the RHS of a state's
    grammar rule indicates the progress made in
    parsing it (such a rule is called a dotted
    rule);
  • the position of the subtree in the input
    string, represented by two numbers indicating
    where the state begins and where its dot lies.
    These two numbers give the span of the rule at
    a particular point in its derivation.

6
For example, in parsing "Book that flight" the
Earley parser will create, among others, the
following three states:
S → • VP, [0,0]
NP → Det • Nominal, [1,2]
VP → V NP •, [0,3]
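The three pieces of information in a state can be sketched as a small record. This is only an illustration: the class name State and its field names are assumptions made here, not part of the original slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    """A dotted rule plus its span (hypothetical field names)."""
    lhs: str               # left-hand side of the grammar rule
    rhs: Tuple[str, ...]   # right-hand side of the grammar rule
    dot: int               # number of RHS symbols already recognized
    start: int             # input position where the state begins
    end: int               # input position where the dot lies

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        """Symbol to the right of the dot, or None if the rule is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start},{self.end}]"


# The three example states for "Book that flight".
states = [
    State("S", ("VP",), 0, 0, 0),
    State("NP", ("Det", "Nominal"), 1, 1, 2),
    State("VP", ("V", "NP"), 2, 0, 3),
]
for s in states:
    print(s)
# S → • VP, [0,0]
# NP → Det • Nominal, [1,2]
# VP → V NP •, [0,3]
```

Making the record immutable and hashable keeps the later "already in the chart" check a plain equality test.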
7
The fundamental operation of an Earley parser
is to walk through the n+1 sets of states in the
chart left to right, processing the states
within each set in order. The walk consists in
applying one of three operators -- Predictor,
Scanner or Completer -- to each state. The
choice of the actual operator depends on the
state's status. The operation always results in
the addition of new states to the end of either
the current or the next state set in the chart.
8
The algorithm always moves in one direction only,
modifying the chart as it goes. States are never
removed, and the algorithm never backtracks to a
previous chart position once it has finished with
it. The halting condition is the presence of the
state S → α •, [0,n] in the last chart entry.
10
Predictor, Scanner and Completer take a state as
input and create new states from it. The new
states are added to the chart -- unless they
are already in the chart! Predictor and
Completer add states to the current
chart entry. Scanner adds a state to the next
chart entry.
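The "unless they are already in the chart" rule can be sketched as a small helper. The names State and enqueue are assumptions made for this illustration; the chart is modelled as a plain list of n+1 lists.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class State:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int


def enqueue(state: State, chart: List[List[State]], position: int) -> bool:
    """Append state to chart[position] unless an equal state is already there.

    Returns True if the state was added, False if it was a duplicate.
    """
    if state in chart[position]:
        return False
    chart[position].append(state)
    return True


# A three-word input needs n + 1 = 4 chart entries.
chart: List[List[State]] = [[] for _ in range(4)]
seed = State("S", ("VP",), 0, 0, 0)
first = enqueue(seed, chart, 0)    # added
second = enqueue(seed, chart, 0)   # duplicate, ignored
```

The duplicate check is what keeps left-recursive rules from being predicted over and over in the same chart entry.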
11
Predictor creates new states that embody
top-down expectations generated during the
parsing process. Predictor is called with any
state that has a non-terminal other than a
pre-terminal (a single part-of-speech category)
to the right of the dot. Predictor creates a new
state for each alternative expansion of that
non-terminal found in the grammar. The new
states are placed into the same chart entry as
the generating state; they begin and end at the
position where the generating state ends.
12
For example, calling Predictor with the state
S → • VP, [0,0] results in adding the states
VP → • Verb, [0,0] and VP → • Verb NP, [0,0] to
the first chart entry.
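A minimal sketch of Predictor reproducing this example. The toy grammar, the State record and the function name predictor are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class State:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int


# Toy grammar (an assumption): non-terminal -> alternative expansions.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("VP",)],
    "VP": [("Verb",), ("Verb", "NP")],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
}


def predictor(state: State, chart: List[List[State]]) -> None:
    """Add a fresh dotted rule for each expansion of the symbol after the dot."""
    wanted = state.rhs[state.dot]
    j = state.end                          # new states begin and end here
    for rhs in GRAMMAR[wanted]:
        new = State(wanted, rhs, 0, j, j)
        if new not in chart[j]:            # skip states already in the chart
            chart[j].append(new)


chart: List[List[State]] = [[State("S", ("VP",), 0, 0, 0)], [], [], []]
predictor(chart[0][0], chart)
# chart[0] now also holds VP → • Verb, [0,0] and VP → • Verb NP, [0,0]
```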
13
Scanner is called with a state that has a
part-of-speech (pre-terminal) category to the
right of the dot. If the current input word can
belong to that category, Scanner adds to the
chart a state that incorporates the predicted
part of speech: the new state is created by
advancing the dot over the predicted PoS
category in the original state. This is how the
Earley parser predictively resolves
part-of-speech ambiguities: only those
categories of an input word that are predicted
by some state (that is, conform to a particular
grammar rule) are introduced into the chart.
14
For example, while processing the state
VP → • Verb NP, [0,0] Scanner consults the
current input word, since the category after the
dot, Verb, is pre-terminal. Since "book" can be a
verb, the expectations of this state are
matched. As a result, a new state
VP → Verb • NP, [0,1] is created and added to the
chart entry that follows the one currently being
processed.
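A sketch of Scanner as described here, advancing the dot of the original state when the current word can carry the predicted category. The lexicon, the State record and the function name scanner are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class State:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int


# Toy lexicon (an assumption): word -> possible parts of speech.
LEXICON: Dict[str, Set[str]] = {
    "book": {"Verb", "Noun"},
    "that": {"Det"},
    "flight": {"Noun"},
}


def scanner(state: State, words: List[str], chart: List[List[State]]) -> None:
    """Advance the dot over a predicted PoS if the current word can have it."""
    category = state.rhs[state.dot]
    j = state.end
    if j < len(words) and category in LEXICON.get(words[j], set()):
        new = State(state.lhs, state.rhs, state.dot + 1, state.start, j + 1)
        if new not in chart[j + 1]:        # goes into the NEXT chart entry
            chart[j + 1].append(new)


words = ["book", "that", "flight"]
chart: List[List[State]] = [[] for _ in range(len(words) + 1)]
scanner(State("VP", ("Verb", "NP"), 0, 0, 0), words, chart)
# chart[1] now holds VP → Verb • NP, [0,1]
scanner(State("NP", ("Det", "Nominal"), 0, 0, 0), words, chart)
# nothing is added: "book" cannot be a Det
```

Note that the Noun reading of "book" never enters the chart, because no state predicts a Noun at position 0.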
15
Completer is called when the dot has reached the
end of a rule. Intuitively, this means that the
parser has successfully discovered a phrase
(category, constituent) spanning a substring of
the input. Completer finds and advances all
previously created states that were waiting for
this grammatical category at this position in
the input, that is, the states in the chart
entry where the completed phrase begins that
have this category to the right of the dot. New
states are created by copying the old state,
advancing the dot over the expected constituent,
and putting the result in the current chart
entry.
16
For example, when Completer starts processing the
state NP → Det Nominal •, [1,3], it looks in
the chart for states ending at 1 and expecting
an NP. In our example, VP → Verb • NP, [0,1]
will be such a state, created by Scanner.
Completer then adds to the current chart entry
the state VP → Verb NP •, [0,3].
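A sketch of Completer reproducing this example. The State record and the function name completer are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class State:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int


def completer(completed: State, chart: List[List[State]]) -> None:
    """Advance every state that was waiting for completed.lhs at its start."""
    # Waiting states live in the chart entry where the completed phrase begins.
    for waiting in list(chart[completed.start]):
        expects = waiting.rhs[waiting.dot] if waiting.dot < len(waiting.rhs) else None
        if expects == completed.lhs:
            new = State(waiting.lhs, waiting.rhs, waiting.dot + 1,
                        waiting.start, completed.end)
            if new not in chart[completed.end]:   # current chart entry
                chart[completed.end].append(new)


chart: List[List[State]] = [[] for _ in range(4)]
chart[1].append(State("VP", ("Verb", "NP"), 1, 0, 1))   # VP → Verb • NP, [0,1]
np_done = State("NP", ("Det", "Nominal"), 2, 1, 3)      # NP → Det Nominal •, [1,3]
chart[3].append(np_done)
completer(np_done, chart)
# chart[3] now also holds VP → Verb NP •, [0,3]
```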
17
A mnemonic device:
  • Predictor moves top-down: suggests future
    transitions
  • Scanner moves left-to-right: consumes input,
    shifts chart position
  • Completer moves bottom-up: builds larger
    constituents from smaller ones
18
Book that flight
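Putting the three operators together, a whole recognizer for this sentence can be sketched as below. The toy grammar, lexicon and all names are assumptions made for illustration, and the sketch assumes a grammar without empty rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class State:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int

    def next_symbol(self) -> Optional[str]:
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


# Toy grammar and lexicon (assumptions for this sketch).
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("VP",)],
    "VP": [("Verb",), ("Verb", "NP")],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
}
LEXICON: Dict[str, Set[str]] = {
    "book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"},
}
POS = {"Verb", "Noun", "Det"}   # pre-terminal categories


def recognize(words: List[str], start_symbol: str = "S") -> bool:
    words = [w.lower() for w in words]
    n = len(words)
    chart: List[List[State]] = [[] for _ in range(n + 1)]

    def enqueue(state: State, position: int) -> None:
        if state not in chart[position]:
            chart[position].append(state)

    for rhs in GRAMMAR[start_symbol]:
        enqueue(State(start_symbol, rhs, 0, 0, 0), 0)

    for j in range(n + 1):              # single left-to-right pass
        i = 0
        while i < len(chart[j]):        # the entry may grow while we walk it
            state = chart[j][i]
            i += 1
            symbol = state.next_symbol()
            if symbol is None:          # Completer
                for waiting in list(chart[state.start]):
                    if waiting.next_symbol() == state.lhs:
                        enqueue(State(waiting.lhs, waiting.rhs, waiting.dot + 1,
                                      waiting.start, j), j)
            elif symbol in POS:         # Scanner
                if j < n and symbol in LEXICON.get(words[j], set()):
                    enqueue(State(state.lhs, state.rhs, state.dot + 1,
                                  state.start, j + 1), j + 1)
            else:                       # Predictor
                for rhs in GRAMMAR[symbol]:
                    enqueue(State(symbol, rhs, 0, j, j), j)

    # Halting condition: a complete S state spanning [0, n] in the last entry.
    return any(s.lhs == start_symbol and s.next_symbol() is None and s.start == 0
               for s in chart[n])


print(recognize("Book that flight".split()))   # True
print(recognize("that flight".split()))        # False
```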
20
What we have seen so far is actually a
recognizer. Now we need a device for reading a
tree off of the chart. In fact, we need to be
able to read all legal trees off the chart. So,
the representation of each state is augmented
with another field to store information about
completed states that generated this state's
constituents.
21
In practice, this is done through one change in
Completer. Completer creates new states by
advancing older incomplete ones when the
constituent following the dot is found. So, when
it advances a state, Completer adds a pointer to
the completed state to the list of previous
states of the newly created state. Then, to
retrieve a parse tree means to follow these
pointers recursively, starting from each of the
complete S states in the final chart entry.
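The extra field and the recursive read-out can be sketched as follows. This is a simplified illustration under stated assumptions: the field name children and the functions advance and tree are hypothetical, each state keeps a single list of pointers (so it yields one tree, whereas reading all trees would require keeping every alternative list), and the states are built by hand rather than by a full parser.

```python
from dataclasses import dataclass
from typing import List, Tuple

POS = {"Verb", "Noun", "Det"}   # pre-terminal categories (an assumption)


@dataclass(frozen=True)
class State:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int
    # Pointers to the completed states that built the non-terminals
    # to the left of the dot (hypothetical field name).
    children: Tuple["State", ...] = ()


def advance(waiting: State, completed: State) -> State:
    """What the changed Completer does: advance the dot and record a pointer."""
    return State(waiting.lhs, waiting.rhs, waiting.dot + 1, waiting.start,
                 completed.end, waiting.children + (completed,))


def tree(state: State, words: List[str]) -> str:
    """Read a bracketed tree off a complete state by following the pointers."""
    parts, pos, kids = [], state.start, iter(state.children)
    for symbol in state.rhs:
        if symbol in POS:                 # scanned word: one input position
            parts.append(f"({symbol} {words[pos]})")
            pos += 1
        else:                             # completed sub-constituent
            child = next(kids)
            parts.append(tree(child, words))
            pos = child.end
    return f"({state.lhs} {' '.join(parts)})"


words = ["book", "that", "flight"]
nominal = State("Nominal", ("Noun",), 1, 2, 3)                 # Nominal → Noun •, [2,3]
np = advance(State("NP", ("Det", "Nominal"), 1, 1, 2), nominal)
vp = advance(State("VP", ("Verb", "NP"), 1, 0, 1), np)
s = advance(State("S", ("VP",), 0, 0, 0), vp)
print(tree(s, words))
# (S (VP (Verb book) (NP (Det that) (Nominal (Noun flight)))))
```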