Title: Earley's Algorithm: General Context-Free Parsing
1. Earley's Algorithm: General Context-Free Parsing
- Lecture 12
- P. N. Hilfinger
2. Parsing General Context-Free Grammars
- Shift-reduce parsing can work for most practical applications.
- However, one must sometimes munge the grammar, though not as much as for LL(1).
- It cannot handle ambiguity, nor situations where resolving ambiguities requires looking far ahead.
- Today, we'll look at a method that can: Earley's Algorithm.
- In fact, shift-reduce parsing is a highly optimized special case of this algorithm.
3. Earley's Algorithm: Basic Idea
- Scan tokens left-to-right.
- At each point, keep track of all possible subtrees that could include the current point in the input, based on everything seen so far.
- At the end of the input, if there is a tree that is rooted at the start symbol, we've found a parse (possibly many).
4. Some Notation
- If the input is s = s_1 s_2 … s_n, then position k in the input is just after s_k and before s_{k+1}, with position 0 at the beginning and position n at the end.
- At each input position, k, compute a set of items, where each item has the form
- A → α • β, m
- where A → α β is a production and 0 ≤ m ≤ k.
- Together, the items in the set describe all subtrees of possible parse trees that begin or end at position k or have a child that does.
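One way to picture an item in code is as a small record. The sketch below is only an illustration: the class name `Item`, its field names, and the `next_symbol` helper are assumptions made here, not part of the lecture.

```python
from typing import NamedTuple, Optional, Tuple

class Item(NamedTuple):
    """An item A → α • β, m (field names are illustrative assumptions)."""
    lhs: str                # A
    rhs: Tuple[str, ...]    # the symbols of α followed by those of β
    dot: int                # how many rhs symbols lie before the dot, i.e. len(α)
    start: int              # m, the position where the item began

    def next_symbol(self) -> Optional[str]:
        """Symbol just after the dot, or None when β is empty."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, {self.start}"

item = Item("E", ("E", "+", "T"), 1, 0)
print(item)                 # E → E • + T, 0
print(item.next_symbol())   # +
```

Because the record is a tuple, items are hashable and can be stored directly in a Python set, which fits the idea of one item set per input position.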
5. Meaning of an Item
- An item A → α • β, m at position k means:
- The input between positions m and k matches α.
- Depending on what s_{k+1} … s_n is, there might be a subtree formed from production A → α β in the (or a) parse tree for the entire string.
- So when β is empty, the item means that there is a possible handle for A → α that ends at k.
- So that leaves the problem of figuring out what items to put in each set.
6. Example
- Grammar
- E → E + T    E → T
- T → T * int    T → int
- Input
- 0 int 1 + 2 int 3 * 4 int 5
- At position 0, we expect to see an E to our right, formed from one of E's productions.
- Plus, since an E can start with a T, we won't be surprised by a T formed from one of its productions.
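To make the example concrete, here is a small sketch of how the grammar and the numbered input might be written down in Python. The representation (a dictionary from nonterminal to a list of right-hand-side tuples, and items as `(lhs, rhs, dot, start)` tuples) and the helper names are assumptions chosen for illustration.

```python
# Productions as tuples of symbols; the dictionary keys are the nonterminals.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}
TOKENS = ["int", "+", "int", "*", "int"]

def with_positions(tokens):
    """Interleave tokens with position numbers; position k follows the k-th token."""
    parts = ["0"]
    for k, token in enumerate(tokens, start=1):
        parts += [token, str(k)]
    return " ".join(parts)

def initial_items(grammar, start):
    """Items for the start symbol at position 0: dot at the far left, start 0."""
    return {(start, rhs, 0, 0) for rhs in grammar[start]}

print(with_positions(TOKENS))   # 0 int 1 + 2 int 3 * 4 int 5
for item in sorted(initial_items(GRAMMAR, "E")):
    print(item)
```

The items for T that we also expect at position 0 are not produced by `initial_items`; adding them is the job of the closure step.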
7. Example: Getting Started
Input: 0 int 1
Item set 0:
E → • T, 0    E → • E + T, 0    (start with items for start symbol E)
T → • int, 0    T → • T * int, 0    (and, since E can start with T, also add items for T)
8. Closure Items
- Whenever we have an item B → α • A β, j in item set m, it indicates that a substring producing A might start at this position.
- That's what the item A → • γ, m means, so we also add those items (for each production A → γ) to item set m.
- These are called closure items.
- Other items are kernel items.
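The closure rule can be sketched as a worklist loop. This is a minimal illustration, assuming items are `(lhs, rhs, dot, start)` tuples and the grammar is a dictionary whose keys are the nonterminals; the function name `closure` is chosen here and does not come from the slides.

```python
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "int"), ("int",)],
}

def closure(kernel, m, grammar):
    """Return the kernel items of item set m together with their closure items."""
    result = set(kernel)
    work = list(kernel)
    while work:
        lhs, rhs, dot, start = work.pop()
        # The dot sits before a nonterminal A: add A → • γ, m for each A → γ.
        if dot < len(rhs) and rhs[dot] in grammar:
            for gamma in grammar[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0, m)
                if new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return result

# Kernel item E → E + • T, 0 in item set 2 of the running example.
kernel = {("E", ("E", "+", "T"), 2, 0)}
for item in sorted(closure(kernel, 2, GRAMMAR)):
    print(item)
```

For this kernel the loop adds T → • T * int, 2 and T → • int, 2, and then stops because predicting T again produces nothing new.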
9. Example: Computing next item set
Input: 0 int 1
Item set 0:
E → • T, 0    E → • E + T, 0    T → • int, 0    T → • T * int, 0
Item set 1:
T → int •, 0
T → T • * int, 0    E → T •, 0
E → E • + T, 0
10. Computing next item set
- For each item of the form A → α • c β, k in item set m, where c = s_{m+1} is the next input symbol, insert A → α c • β, k in item set m+1.
- For each complete item, A → γ •, k in item set m+1, and each item B → α • A β, j back in item set k, add item B → α A • β, j to item set m+1. (When creating a parse tree, the A in this new item will have children γ, as denoted by dashed red arrows in our examples.)
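The two rules above can be sketched as a single function. This is an illustrative sketch, not the lecture's code: items are assumed to be `(lhs, rhs, dot, start)` tuples, `chart` is assumed to be the list of item sets 0..m built so far (closure already applied), and the grammar is assumed to have no empty productions, so every complete item in set m+1 started at some earlier set.

```python
def next_item_set(chart, token):
    """Compute the kernel of item set m+1 from item sets 0..m and the next token."""
    m = len(chart) - 1
    # Rule 1: move the dot over the terminal that matches the next input symbol.
    new_set = {(lhs, rhs, dot + 1, k)
               for lhs, rhs, dot, k in chart[m]
               if dot < len(rhs) and rhs[dot] == token}
    work = list(new_set)
    while work:
        lhs, rhs, dot, k = work.pop()
        if dot == len(rhs):
            # Rule 2: a complete item for lhs that started at k advances
            # every item in set k whose dot sits just before lhs.
            for b, r, d, j in chart[k]:
                if d < len(r) and r[d] == lhs:
                    advanced = (b, r, d + 1, j)
                    if advanced not in new_set:
                        new_set.add(advanced)
                        work.append(advanced)
    return new_set

# Item set 0 of the running example (grammar E → E + T | T, T → T * int | int).
set0 = {
    ("E", ("T",), 0, 0),
    ("E", ("E", "+", "T"), 0, 0),
    ("T", ("int",), 0, 0),
    ("T", ("T", "*", "int"), 0, 0),
}
for item in sorted(next_item_set([set0], "int")):
    print(item)
```

The result still needs its closure items added before it is used to compute the following set.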
11. Continuing the Example, Set 2
Input: int 1 + 2
Item set 1:
T → int •, 0
T → T • * int, 0    E → T •, 0
E → E • + T, 0
Item set 2:
E → E + • T, 0
closure items:
T → • T * int, 2
T → • int, 2
12. Continuing the Example, Set 3
Input: 2 int 3
Item set 2:
E → E + • T, 0
T → • T * int, 2
T → • int, 2
Item set 3:
T → int •, 2
T → T • * int, 2
E → E + T •, 0
E → E • + T, 0    (from item set 0)
13. Continuing the Example, Sets 4 & 5
Input: int 3 * 4 int 5
Item set 3:
T → int •, 2
T → T • * int, 2
E → E + T •, 0
E → E • + T, 0
Item set 4:
T → T * • int, 2
Item set 5:
T → T * int •, 2
T → T • * int, 2
E → E + T •, 0    ACCEPT!
E → E • + T, 0
14. Accepting the String
- In the last item set, we have a completed item for the start symbol that started in set 0.
- That means the input between 0 and the end matches an entire production for the start symbol, so the string parses correctly.
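The acceptance test is a one-line check on the last item set. A minimal sketch, assuming items are `(lhs, rhs, dot, start)` tuples; the function name `accepts` and the hard-coded sets (copied from the running example) are illustrative.

```python
def accepts(last_item_set, start_symbol):
    """True if the last set holds a completed start-symbol item that began in set 0."""
    return any(lhs == start_symbol and dot == len(rhs) and start == 0
               for lhs, rhs, dot, start in last_item_set)

# Item set 5 of the running example, after reading int + int * int.
set5 = {
    ("T", ("T", "*", "int"), 3, 2),   # T → T * int •, 2
    ("T", ("T", "*", "int"), 1, 2),   # T → T • * int, 2
    ("E", ("E", "+", "T"), 3, 0),     # E → E + T •, 0
    ("E", ("E", "+", "T"), 1, 0),     # E → E • + T, 0
}
# Item set 4: had the input ended after the *, nothing for E would be complete.
set4 = {("T", ("T", "*", "int"), 2, 2)}   # T → T * • int, 2

print(accepts(set5, "E"))   # True
print(accepts(set4, "E"))   # False
```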
15. Retrieving a Parse Tree or Derivation
- Start with a completed item in the last set that produces the whole input (has form S → α •, 0 for start symbol S).
- Follow the red arrows to find how to expand that symbol.
- Work backwards through the sets to find the expansions of the other nonterminals.
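The backward walk can be sketched without storing explicit arrows, by searching the item sets for the items the arrows would point to. The code below is a rough sketch under stated assumptions: items are `(lhs, rhs, dot, start)` tuples, the grammar has no empty productions and no cyclic derivations, and only one tree is returned even if several exist. The names `earley_chart`, `build_tree`, and `parse` are made up for this illustration.

```python
def earley_chart(grammar, start, tokens):
    """Build item sets 0..n for the input."""
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for k in range(n + 1):
        changed = True
        while changed:                          # repeat until set k stops growing
            changed = False
            for lhs, rhs, dot, m in list(chart[k]):
                if dot < len(rhs) and rhs[dot] in grammar:       # closure items
                    new = {(rhs[dot], r, 0, k) for r in grammar[rhs[dot]]}
                elif dot == len(rhs):                            # complete item
                    new = {(b, r, d + 1, j) for b, r, d, j in chart[m]
                           if d < len(r) and r[d] == lhs}
                else:
                    new = set()
                if not new <= chart[k]:
                    chart[k] |= new
                    changed = True
        if k < n:                                                # scan next token
            chart[k + 1] = {(lhs, rhs, dot + 1, m)
                            for lhs, rhs, dot, m in chart[k]
                            if dot < len(rhs) and rhs[dot] == tokens[k]}
    return chart

def build_tree(chart, grammar, item, end):
    """Expand a completed item that ends at position `end` into (lhs, child, ...)."""
    lhs, rhs, _, m = item
    children, k = [], end
    for d in range(len(rhs) - 1, -1, -1):       # walk the right-hand side backwards
        symbol = rhs[d]
        if symbol not in grammar:               # terminal: go back one position
            children.append(symbol)
            k -= 1
            continue
        for cand in chart[k]:
            c_lhs, c_rhs, c_dot, j = cand
            # A completed item for `symbol` starting at j, where set j holds
            # this production with the dot just before `symbol`.
            if c_lhs == symbol and c_dot == len(c_rhs) and (lhs, rhs, d, m) in chart[j]:
                children.append(build_tree(chart, grammar, cand, k))
                k = j
                break
    return (lhs, *reversed(children))

def parse(grammar, start, tokens):
    chart = earley_chart(grammar, start, tokens)
    for item in chart[len(tokens)]:
        lhs, rhs, dot, m = item
        if lhs == start and dot == len(rhs) and m == 0:
            return build_tree(chart, grammar, item, len(tokens))
    return None

GRAMMAR = {"E": [("E", "+", "T"), ("T",)], "T": [("T", "*", "int"), ("int",)]}
print(parse(GRAMMAR, "E", ["int", "+", "int", "*", "int"]))
# ('E', ('E', ('T', 'int')), '+', ('T', ('T', 'int'), '*', 'int'))
```

Recomputing where each child came from keeps the item sets simple; an implementation that records a pointer whenever a completion step adds an item would avoid the search at the cost of extra bookkeeping.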
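16. Getting a Tree from our Example (I)
Item set 5 (after the last int):
T → T * int •, 2
T → T • * int, 2
E → E + T •, 0    (start here)
E → E • + T, 0
Tree so far: E expands to E + T, and that T expands to T * int.
To find out how to expand the T in T * int, go back to chart 3 (before * int).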
17. Getting a Tree from our Example (II)
Item set 3:
T → int •, 2
T → T • * int, 2
E → E + T •, 0
E → E • + T, 0
Tree so far: E expands to E + T; that T expands to T * int, whose T expands to int.
To find out how to expand the E in E + T, go back to chart 1 (before +).
18. Figuring out Where to Look
- In the last slide, we had to figure out where to look for the derivation of the E in E + T.
- We used the items
- T → T • * int, 2 and T → int •, 2
- to get the T in E + T, both of which tell us that the T started after item set 2.
- And since + is a terminal, we then have to go back one more.
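19. Getting a Tree from our Example (III)
Item set 1:
T → int •, 0
T → T • * int, 0    E → T •, 0    (start here)
E → E • + T, 0
Completed tree: E expands to E + T; the left E expands to T, which expands to int; the right T expands to T * int, whose T expands to int.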
20. An Ambiguous Grammar (I)
- Grammar
- E → E + E    E → E * E    E → int
- Input
- 0 int 1 + 2 int 3 * 4 int 5
Input: 0 int 1
Item set 0:
E → • int, 0    E → • E + E, 0    E → • E * E, 0
Item set 1:
E → int •, 0    E → E • + E, 0    E → E • * E, 0
21. An Ambiguous Grammar (II)
Input: 1 + 2 int 3
Item set 1:
E → int •, 0    E → E • + E, 0    E → E • * E, 0
Item set 2:
E → E + • E, 0    E → • int, 2    E → • E + E, 2    E → • E * E, 2
Item set 3:
E → int •, 2    E → E • + E, 2    E → E • * E, 2
E → E + E •, 0    E → E • + E, 0    E → E • * E, 0
22. An Ambiguous Grammar (III)
Input: 3 * 4 int 5
Item set 3:
E → int •, 2    E → E • + E, 2    E → E • * E, 2
E → E + E •, 0    E → E • + E, 0    E → E • * E, 0
Item set 4:
E → E * • E, 2    E → E * • E, 0    E → • int, 4    E → • E + E, 4    E → • E * E, 4
Item set 5:
E → int •, 4    E → E * E •, 2    E → E * E •, 0
E → E • + E, 4    E → E • * E, 4    E → E + E •, 0
There are two ways to produce the E starting at 0, reflecting ambiguity.
23. Just for Fun
Grammar is ferociously ambiguous: produces ε an infinite number of ways!
E → ε    E → E E
Item set 0:
E → •, 0    E → • E E, 0    E → E • E, 0    E → E E •, 0
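Even though there are infinitely many derivations, the item set stays finite, because items are kept in a set and the rules are simply applied until nothing new appears. The sketch below illustrates this for item set 0. It assumes items are `(lhs, rhs, dot, start)` tuples with an empty tuple for the empty right-hand side, and the function name `close_item_set` is invented here. It only handles complete items that started in the same set, which is all that item set 0 needs.

```python
# E → ε is written as the empty tuple.
GRAMMAR = {"E": [(), ("E", "E")]}

def close_item_set(items, k, grammar):
    """Apply closure and same-set completion to item set k until a fixed point."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot, start in list(result):
            new = set()
            if dot < len(rhs) and rhs[dot] in grammar:
                # Closure: the dot is before a nonterminal.
                new = {(rhs[dot], gamma, 0, k) for gamma in grammar[rhs[dot]]}
            elif dot == len(rhs) and start == k:
                # Complete item that began in this same set: advance its parents here.
                new = {(b, r, d + 1, j) for b, r, d, j in result
                       if d < len(r) and r[d] == lhs}
            if not new <= result:
                result |= new
                changed = True
    return result

set0 = close_item_set({("E", (), 0, 0), ("E", ("E", "E"), 0, 0)}, 0, GRAMMAR)
for item in sorted(set0):
    print(item)
```

The loop ends with exactly four items, matching the item set above: the repeated ways of deriving the empty string all collapse onto the same few items.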
24. Relationship to LR Shift-Reduce Parsing
- With an LR(1) grammar, we never have item sets where two items have the same production, with the dot in the same place, but different starting positions.
- So, ignoring the starting positions, there is a finite number of possible item sets.
- These are the states in the shift-reduce parser.