Title: COMBINING COMPATIBLE STATES DURING LR(1) PARSER CONSTRUCTION
1- COMBINING COMPATIBLE STATES DURING LR(1) PARSER
CONSTRUCTION
2-
- The LR(0) algorithm for creating compilers is
one in which contexts are not evaluated, and
states are considered identical if they consist
of the same set of marked productions.
3-
- But this algorithm is insufficient for actual
programming languages, producing parsers with
numerous conflicts.
4-
- The LR(1) algorithm, when applied to creating
compilers for real computer languages such as
Java or C, results in a parsing machine that is
an order of magnitude or more larger than the one
produced by an LR(0) algorithm for the same
grammar.
5- On the other hand, the LR(1) algorithm, which
you made use of in your last assignment, produces
parsers that, for the large grammars employed for
actual computer languages, are a few orders of
magnitude larger than those produced by the LR(0)
algorithm.
6- As a compromise, various methods, including
the one employed by Yacc, have been devised for
subsets of the LR(1) languages, using a hybrid
approach. -
-
7-
- This works well for most programming languages,
but imposes a greater responsibility on the
compiler writer to come up with a grammar that
does not lead to conflicts (i.e. to cases where
more than one action is defined at a parsing
machine state for the same next input symbol).
- These methods only work for a subset of the
LR(1) grammars, and there are applications,
including ones involving natural language
processing, for which they are inadequate.
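The notion of a conflict used above can be made concrete with a small sketch. The table layout (a flat list of (state, symbol, action) triples) and the sample entries are assumptions made purely for illustration; they are not taken from Yacc or any other parser generator.

```python
from collections import defaultdict

def find_conflicts(entries):
    """Group actions by (state, next input symbol) and return the cells
    that hold more than one action.

    `entries` is a hypothetical flat listing of (state, symbol, action) triples.
    """
    cells = defaultdict(set)
    for state, symbol, action in entries:
        cells[(state, symbol)].add(action)
    return {cell: sorted(actions) for cell, actions in cells.items() if len(actions) > 1}

# Illustrative entries only: state numbers and productions are made up.
entries = [
    (4, "a", "shift 7"),
    (4, "b", "reduce A -> c"),
    (5, "d", "reduce B -> e f"),
    (5, "d", "reduce C -> f"),   # second action for the same state and symbol
]

conflicts = find_conflicts(entries)
print(conflicts)
```

In this made-up table only state 5 is reported, because it is the only state with two different actions for the same next input symbol.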
-
8-
- However, one can employ a definition of
compatibility between states which works for all
LR(1) grammars, and which produces parsers of
the same size as those produced by the hybrid
methods referred to previously.
9- DEFINITION. The nucleus of a state consists of the
configurations in the state in which the marker
is in a position greater than zero.
- Example
- A configuration in a state of the form
-     A → bc.d,  x,y
- would be a member of its nucleus, but a
configuration such as
-     A → .bcd,  x,y
- would not be a member.
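As a minimal sketch of this definition, the nucleus can be obtained by filtering a state on the marker position. The `Config` record and its field names are hypothetical, introduced only for this illustration; the two configurations are the ones from the example.

```python
from typing import FrozenSet, NamedTuple, Tuple

class Config(NamedTuple):
    """One LR(1) configuration (hypothetical representation)."""
    lhs: str                   # left-hand side of the production
    rhs: Tuple[str, ...]       # right-hand side symbols
    dot: int                   # marker position: symbols to the left of the marker
    contexts: FrozenSet[str]   # context (lookahead) symbols

def nucleus(state):
    """Keep the configurations whose marker position is greater than zero."""
    return [c for c in state if c.dot > 0]

def show(c):
    """Render a configuration's marked production as text."""
    return f"{c.lhs} → {''.join(c.rhs[:c.dot])}.{''.join(c.rhs[c.dot:])}"

state = [
    Config("A", ("b", "c", "d"), 2, frozenset({"x", "y"})),  # A → bc.d, x,y
    Config("A", ("b", "c", "d"), 0, frozenset({"x", "y"})),  # A → .bcd, x,y
]

print([show(c) for c in nucleus(state)])
```

Only the first configuration survives the filter, since its marker is at position 2 while the second has its marker at position 0.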
10- DEFINITION OF COMPATIBILITY BETWEEN LR(1)
STATES
- Let S and S′ be two states in an LR(1) parsing
machine whose nuclei consist of the same marked
productions, which we will denote as P_1, …, P_n.
- For 1 ≤ t ≤ n, let U_t denote the set of
contexts associated with marked production P_t in
state S, and let U_t′ denote the set of contexts
associated with that marked production in state
S′.
- Then states S and S′ are compatible if, for
all 1 ≤ i < j ≤ n, at least one of the following
conditions holds:
- (a) U_i ∩ U_j′ = ∅ and U_i′ ∩ U_j = ∅
  (∅ is the empty set, i.e. the
intersections involved are both empty)
- (b) U_i ∩ U_j ≠ ∅
- (c) U_i′ ∩ U_j′ ≠ ∅
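The definition translates almost line for line into a predicate. The following is a sketch only: it assumes each state is represented by a list of context sets, one per marked production P_1 … P_n in the same order for both states, and the function name and sample sets are illustrative.

```python
from itertools import combinations

def compatible(U, V):
    """U[t] and V[t] are the context sets of the same marked production
    in states S and S′ respectively (assumed representation)."""
    if len(U) != len(V):
        raise ValueError("nuclei must consist of the same marked productions")
    for i, j in combinations(range(len(U)), 2):              # every pair i < j
        cond_a = not (U[i] & V[j]) and not (V[i] & U[j])     # both cross-intersections empty
        cond_b = bool(U[i] & U[j])                           # contexts already shared within S
        cond_c = bool(V[i] & V[j])                           # contexts already shared within S′
        if not (cond_a or cond_b or cond_c):
            return False
    return True

# Illustrative context sets (not taken from any particular grammar).
print(compatible([{"a"}, {"b"}], [{"c"}, {"d"}]))   # (a) holds for the only pair
print(compatible([{"a"}, {"b"}], [{"b"}, {"c"}]))   # b is shared across the states; (b), (c) fail
```

The first call yields True and the second False: in the second, b is a context of the second production in S and of the first production in S′, while neither state shares a context between its own two productions.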
11- Note
- If states S and S′ are as described above, and
their nuclei consist of only a single
configuration, then according to the above
definition they are compatible, since there is
no pair i < j to be tested.
12- In the case where S and S′ as described above are
compatible, one can combine the states into a
single state whose nucleus consists of the same
marked productions listed above, while for 1 ≤ t
≤ n, the set of contexts associated with marked
production P_t is U_t ∪ U_t′.
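Combining two states is then a per-production union of context sets. The sketch below assumes a nucleus is stored as a mapping from a marked production (written as a string) to its context set; the function name and the sample states are hypothetical, and the function presupposes that the states have already been found compatible.

```python
def combine(nucleus_s, nucleus_s2):
    """Union the context sets production by production.

    Both arguments map a marked production to its set of contexts
    (assumed representation); they must list the same marked productions.
    """
    if nucleus_s.keys() != nucleus_s2.keys():
        raise ValueError("nuclei must consist of the same marked productions")
    return {p: nucleus_s[p] | nucleus_s2[p] for p in nucleus_s}

# Illustrative states only.
s  = {"A → b.c": {"x"},      "B → b.d": {"y"}}
s2 = {"A → b.c": {"x", "z"}, "B → b.d": {"w"}}

combined = combine(s, s2)
print(combined)
```

The result keeps the two marked productions and attaches {x, z} to the first and {w, y} to the second.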
13-
- One way of looking at the definition is to say
that every pair of configurations in the nuclei
must pass a test, and that two states are
compatible only if they all in fact pass.
14- Fortunately, in grammars for actual
programming languages such as Java, C, etc.,
there are at most 6 configurations in the nucleus
of any state. -
- The states may be large, with many immediate
successors, but the nuclei are all quite small. -
-
15- EXAMPLES
-
- We show only the nucleus of the states in
these examples, since, according to the
definition, states are compatible if and only if
their nuclei are. -
-
-
16- S:
    A → ab.c   x,y
    B → b.n    s,t
    C → rb.ed  u,v
S′:
    A → ab.c   d
    B → b.n    s
    C → rb.ed  x,v
- The above two states are not compatible,
because the pair consisting of the first and
last configurations fails the test.
- For this pair condition (a) of the defn. is
not true, since the context of the first
configuration of S contains an x, and so does the
context of the third configuration of S′.
- In addition, neither condition (b) nor
condition (c) is true.
17- S:
    A → ab.c   x,y
    B → b.n    s,t
    C → rb.ed  x,v
S′:
    A → ab.c   x,y,d
    B → b.n    s
    C → rb.ed  x,v
- The first and third configurations in this
case pass the test because condition (b) of the
defn. applies to the first and third
configurations of S: both of these
configurations contain x in their set of
contexts. The states in this case are
compatible.
- Remember that, while every pair of
configurations in the nucleus must pass the test,
it only requires that one of conditions (a), (b)
or (c) be true for a given pair for it to pass.
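The pair-by-pair test for this example and the preceding one can be traced with a short sketch. It assumes each state is given as a list of context sets in the order A, B, C, copied from the two examples as shown; the helper name `pair_report` is hypothetical.

```python
from itertools import combinations

def pair_report(U, V):
    """For every pair i < j (numbered from 1), list which of the
    conditions (a), (b), (c) hold. An empty list means the pair fails."""
    report = {}
    for i, j in combinations(range(len(U)), 2):
        held = []
        if not (U[i] & V[j]) and not (V[i] & U[j]):
            held.append("a")
        if U[i] & U[j]:
            held.append("b")
        if V[i] & V[j]:
            held.append("c")
        report[(i + 1, j + 1)] = held
    return report

# First example: contexts of A, B, C in S and in S′.
first  = pair_report([{"x", "y"}, {"s", "t"}, {"u", "v"}],
                     [{"d"}, {"s"}, {"x", "v"}])
# Second example.
second = pair_report([{"x", "y"}, {"s", "t"}, {"x", "v"}],
                     [{"x", "y", "d"}, {"s"}, {"x", "v"}])

print(first)
print(second)
```

In the first example the pair (1, 3) satisfies none of the conditions, so the states are not compatible. In the second, every pair satisfies at least one condition, which is all the definition requires.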
18- Since the states are compatible, they can be
combined to form one whose nucleus is
    A → ab.c   x,y,d
    B → b.n    s,t
    C → rb.ed  x,u,v
19- Note.
- In the figure on the next slide, where we omit
the context set of various configurations (i.e.
only show the marked production involved), the
implication is that they are irrelevant to
the assertions being made about the figure.
20- States 2 and 8 are not compatible, since the first
configuration of state 2 has d as a context in
common with the second configuration of state 8.
In fact, if we were to combine states 2 and 8, it
would produce a combination of states 3 and 9 as
its u-successor. This state would have a
conflict, in that it would have reduce actions,
when the next input symbol is d, for both Z → tu
and V → ε.
21- Now consider the altered machine obtained if the
production X → aYd were replaced by (say) X →
aYa. In this case the first configuration of
state 2 would be Y → t.W  a. It would then
follow that states 2 and 8 were compatible and
could safely be combined to form
-     Y → t.W   a, e
-     Z → t.u   c, d
-     W → .uV
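The effect of the alteration can be checked mechanically. Since the figure is not reproduced here, the sketch below rests on an ASSUMPTION about how the contexts are split between states 2 and 8: only the d on the first configuration of state 2, the d on the second configuration of state 8, the a after the alteration, and the combined result are stated in the text; the remaining contexts (c in state 2, e in state 8) are chosen to be consistent with them.

```python
from itertools import combinations

def compatible(U, V):
    """Compatibility test on two lists of context sets, one entry per
    marked production of the common nucleus."""
    for i, j in combinations(range(len(U)), 2):
        cond_a = not (U[i] & V[j]) and not (V[i] & U[j])
        if not (cond_a or (U[i] & U[j]) or (V[i] & V[j])):
            return False
    return True

def combine(U, V):
    """Per-production union of context sets."""
    return [u | v for u, v in zip(U, V)]

# Nucleus order in both states: [Y → t.W, Z → t.u].
state2_original = [{"d"}, {"c"}]   # the c is ASSUMED
state8          = [{"e"}, {"d"}]   # the e is ASSUMED
state2_altered  = [{"a"}, {"c"}]   # after replacing X → aYd by X → aYa

print(compatible(state2_original, state8))   # d is shared across the two states
print(compatible(state2_altered, state8))
print(combine(state2_altered, state8))
```

Under these assumed contexts the original states fail the test, the altered ones pass it, and the union reproduces the contexts a, e and c, d listed for the combined state.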
22- The journal paper describing this method of
combining states contains a formal proof of its
correctness. But since ours is a practically
oriented course, we will just consider an
informal justification, based on a few examples, to
supply a flavor of the reasoning involved.
23- The main argument is that if the parsing machine
containing the states S and S′, as described in
the defn. of compatibility, has no conflicts, and
S and S′ are compatible, then the parsing machine
obtained by combining them will also have no
conflicts.
24- The argument is by contradiction. Let's consider
examples of the various ways that two
configurations in the combination of S and S′
could have conflicts or lead to conflicts between
other pairs of configurations in states reachable
from S. In each case we hope to show that either
the parsing machine as it was before S and S′
were combined contained conflicts in the first
place, or that S and S′ could not in fact have
been compatible.
25- Case 1. Let configs 1 and 2 of the combined
state formed from states S and S′ be
-     A → r B.uv   a,b
-     C → t B.uv   a,c
- Seeing that the machine as it was before the
combination contained no conflicts, and
specifically did not contain a conflict in the
uv-successor of these states, either state S must
have contained the a in its version of config 1,
while state S′ contained the a in its version of
config 2, or vice versa.
26- Case 1 contd.
-     A → r B.uv   a,b
-     C → t B.uv   a,c
- In either case neither condition (a) nor (b) of
the defn. would then be true for the two configs,
and since condition (c) is also not true, states
S and S′ could not have been compatible in the
first place.
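Case 1 is small enough to check exhaustively. The sketch below enumerates every way the combined contexts {a,b} and {a,c} could have been split between S and S′, keeps the splits in which neither state has a context shared by its two configs (a shared context would mean two reductions on the same symbol in that state's uv-successor), and confirms that none of the remaining splits passes the compatibility test. It assumes every context set is non-empty; the helper names are hypothetical.

```python
from itertools import combinations, product

def nonempty_subsets(symbols):
    """All non-empty subsets of a set of symbols."""
    items = sorted(symbols)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def pair_passes(Ui, Uj, Vi, Vj):
    """Conditions (a), (b), (c) for one pair; U belongs to S, V to S′."""
    cond_a = not (Ui & Vj) and not (Vi & Uj)
    return cond_a or bool(Ui & Uj) or bool(Vi & Vj)

merged1, merged2 = frozenset("ab"), frozenset("ac")   # contexts of configs 1 and 2

splits = []
for U1, V1 in product(nonempty_subsets(merged1), repeat=2):
    if U1 | V1 != merged1:
        continue                      # must union to the combined contexts of config 1
    for U2, V2 in product(nonempty_subsets(merged2), repeat=2):
        if U2 | V2 != merged2:
            continue                  # likewise for config 2
        if (U1 & U2) or (V1 & V2):
            continue                  # S or S′ would already have had a conflict
        splits.append((U1, U2, V1, V2))

all_incompatible = all(not pair_passes(*s) for s in splits)
print(len(splits), all_incompatible)
```

Every conflict-free split places the a on config 1 in one state and on config 2 in the other, which is exactly the situation condition (a) rules out.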
27- Case 2. Let configs 1 and 2 of the combined
state be
-     A → r B.uv   a,b
-     D → t B.Ca
-     C → .uv      a
- Either S or S′ must contain A → r B.uv  a, ... ,
in which case the original parsing machine would
have had a conflict at its uv-successor. This is
in contradiction to our assumption that the
original parsing machine was conflict-free.
28- Case 3. Let configs 1 and 2 be
-     A → s B.Ea
-     E → .uv      a
-     D → t B.Ca
-     C → .uv      a
- Here again the original parsing machine would
have had conflicts in the uv-successors of both S
and S′.
29- Case 4. Let configs 1 and 2 be
-     A → r B.uv
-     D → t B.uvr
- Here too the original parsing machine would have
had conflicts in the uv-successors of both S and
S′. In this case the conflict would have been
between a reduction and a transition.
30- EXERCISE
- Construct an LR(1) parsing machine for the
grammar on the next slide, combining
compatible states as you encounter them.
31- program → main statement_list end main
- statement_list → statement_list statement
                 | statement
- statement → assign_statement
            | while_statement
            | do_statement
- assign_statement → identifier identifier
- while_statement → while ( condition )
                    statement_list wend
- condition → identifier identifier
- do_statement → do identifier number to number
                 statement_list end do