Title: CS 321 Programming Languages and Compilers
2 Parsing
- Calculate the grammatical structure of a program, like diagramming sentences, where:
  - Tokens = "words"
  - Programs = "sentences"
For further information, read Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools (a.k.a. the "Dragon Book").
3 Outline of coverage
- Context-free grammars
- Parsing
- Tabular Parsing Methods
- One pass
- Top-down
- Bottom-up
- Yacc
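4 What parser does
Extracts the grammatical structure of a program.
[Figure: parse tree of a function definition. The function-def node has children name (main), arguments, and stmt-list; the statement is an expression built from the operator << applied to the variable cout and the string "hello, world\n".]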
5 Context-free languages
- Grammatical structure is defined by a context-free grammar:
  - statement → labeled-statement | expression-statement | compound-statement
  - labeled-statement → ident : statement | case constant-expression : statement
  - compound-statement → { declaration-list statement-list }
- "Context-free" = only one non-terminal in the left part of each production.
- In the productions above, ident, case, ":", "{" and "}" are terminals; the hyphenated names are non-terminals.
6 Parse trees
- Parse tree = tree labeled with grammar symbols, such that:
  - If a node is labeled A, and its children are labeled x1...xn, then there is a production A → x1...xn
- "Parse tree from A" = root labeled with A
- "Complete parse tree" = all leaves labeled with tokens
7 Parse trees and sentences
- Frontier of tree = labels on leaves (in left-to-right order)
- Frontier of tree from S is a sentential form.
- Frontier of a complete tree from S is a sentence.
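The definitions of frontier, sentential form, and sentence can be made concrete with a small sketch. The `Node` class and `frontier` function are hypothetical helpers, and the toy productions L → L E | E and E → a | b are assumed purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A parse-tree node labeled with a grammar symbol (hypothetical helper type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def frontier(node: Node) -> List[str]:
    """Return the labels on the leaves of the tree, in left-to-right order."""
    if not node.children:
        return [node.label]
    labels: List[str] = []
    for child in node.children:
        labels.extend(frontier(child))
    return labels


# An incomplete tree from L: one leaf is still the non-terminal E.
partial = Node("L", [Node("L", [Node("E", [Node("a")])]), Node("E")])
# A complete tree from L: every leaf is a token.
complete = Node("L", [Node("L", [Node("E", [Node("a")])]), Node("E", [Node("b")])])

print(frontier(partial))   # a sentential form
print(frontier(complete))  # a sentence
```

The first frontier still mentions a non-terminal, so it is only a sentential form; the second consists of tokens alone, so it is a sentence.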
8 Example
- G: L → L E | E,  E → a | b
- Syntax trees from start symbol (L)
- Sentential forms
9 Derivations
- Alternate definition of sentence:
  - Given α, β in V*, say α ⇒ β is a derivation step if α = α′Aα″ and β = α′γα″, where A → γ is a production.
  - α is a sentential form iff there exists a derivation (sequence of derivation steps) S ⇒ ... ⇒ α (alternatively, we say that S ⇒* α).
- The two definitions are equivalent, but note that there are many derivations corresponding to each parse tree.
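A derivation step is easy to sketch as a function on tuples of symbols. The names `derive_step` and `derive` are hypothetical, and the toy productions L → L E | E and E → a | b are assumed for illustration. The two derivations below expand non-terminals in different orders but correspond to the same parse tree.

```python
# Toy grammar assumed for illustration: L -> L E | E, E -> a | b.
GRAMMAR = {"L": [("L", "E"), ("E",)], "E": [("a",), ("b",)]}


def derive_step(form, index, rhs, grammar=GRAMMAR):
    """Rewrite the non-terminal at form[index] using the production A -> rhs."""
    nonterminal = form[index]
    if rhs not in grammar.get(nonterminal, []):
        raise ValueError(f"{nonterminal} -> {' '.join(rhs)} is not a production")
    return form[:index] + rhs + form[index + 1:]


def derive(start, steps):
    """Apply (index, rhs) steps in turn and return every sentential form visited."""
    forms = [(start,)]
    for index, rhs in steps:
        forms.append(derive_step(forms[-1], index, rhs))
    return forms


# Always expand the leftmost non-terminal.
leftmost = derive("L", [(0, ("L", "E")), (0, ("E",)), (0, ("a",)), (1, ("b",))])
# Always expand the rightmost non-terminal.
rightmost = derive("L", [(0, ("L", "E")), (1, ("b",)), (0, ("E",)), (0, ("a",))])

for form in leftmost:
    print(" ".join(form))
```

Both sequences start at L and end at the sentence a b; they differ only in the intermediate sentential forms.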
10 Another example
[Figure: parse trees built from L and E nodes with leaves a and b; the tree layout was lost in extraction.]
11 Ambiguity
- For some purposes, it is important to know whether a sentence can have more than one parse tree.
- A grammar is ambiguous if there is a sentence with more than one parse tree.
- Example: E → E + E | E * E | id
[Figure: two different parse trees for the same sentence of three ids.]
12 Ambiguity
- Ambiguity is a function of the grammar rather
than the language. Certain unambiguous grammars
may have equivalent ambiguous ones.
13 Grammar Transformations
- Grammars can be transformed without affecting the language generated.
- Three transformations are discussed next:
  - Eliminating Ambiguity
  - Eliminating Left Recursion (i.e. productions of the form A → A α)
  - Left Factoring
14 Grammar Transformation 1. Eliminating Ambiguity
- Sometimes an ambiguous grammar can be rewritten to eliminate ambiguity.
- For example, expressions involving additions and products can be written as follows:
  - E → E + T | T
  - T → T * id | id
- The language generated by this grammar is the same as that generated by the grammar on transparency 11. Both generate id(+id|*id)*
- However, this grammar is not ambiguous.
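One way to see the difference between the two expression grammars is to count parse trees for the same sentence. The sketch below is a hypothetical helper, not part of the course material; it assumes the productions E → E + E | E * E | id and E → E + T | T, T → T * id | id, and it only works for grammars without ε-productions and without cycles of unit productions.

```python
from functools import lru_cache


def count_parse_trees(grammar, start, tokens):
    """Count parse trees for `tokens`; assumes no ε-productions and no unit cycles."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # A terminal matches exactly one token.
        if symbol not in grammar:
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_sequence(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        # Number of ways the symbols in `rhs` can derive tokens[i:j].
        if not rhs:
            return 1 if i == j else 0
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        total = 0
        # Each symbol covers at least one token, so leave room for the rest.
        for k in range(i + 1, j - len(rhs) + 2):
            left = count_symbol(rhs[0], i, k)
            if left:
                total += left * count_sequence(rhs[1:], k, j)
        return total

    return count_symbol(start, 0, len(tokens))


ambiguous = {"E": [("E", "+", "E"), ("E", "*", "E"), ("id",)]}
unambiguous = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "id"), ("id",)],
}
sentence = ["id", "+", "id", "*", "id"]

print(count_parse_trees(ambiguous, "E", sentence))
print(count_parse_trees(unambiguous, "E", sentence))
```

A count above one for some sentence demonstrates ambiguity; a count of one for a single sentence proves nothing about the grammar as a whole, so this is only a spot check.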
15 Grammar Transformation 1. Eliminating Ambiguity (Cont.)
- One advantage of this grammar is that it represents the precedence between operators. In the parse tree, products appear nested within additions.
[Figure: parse tree with root E → E + T, in which the product subtree T → T * id hangs below the addition.]
16 Grammar Transformation 1. Eliminating Ambiguity (Cont.)
- The most famous example of ambiguity in a programming language is the dangling else.
- Consider:
  - S → if b then S else S | if b then S | a
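17 Grammar Transformation 1. Eliminating Ambiguity (Cont.)
- When there are two nested ifs and only one else, there are two parse trees.
[Figure: the two parse trees for "if b then if b then a else a". In one the else is attached to the outer if; in the other it is attached to the inner if.]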
18 Grammar Transformation 1. Eliminating Ambiguity (Cont.)
- In most languages (including C and Java), each else is assumed to belong to the nearest if that is not already matched by an else. This association is expressed in the following (unambiguous) grammar:
  - S → Matched | Unmatched
  - Matched → if b then Matched else Matched | a
  - Unmatched → if b then S | if b then Matched else Unmatched
19 Grammar Transformation 1. Eliminating Ambiguity (Cont.)
- Ambiguity is a function of the grammar.
- It is undecidable whether a context-free grammar is ambiguous.
- The proof is done by reduction from Post's correspondence problem.
- Although there is no general algorithm, it is possible to isolate certain constructs in productions which lead to ambiguous grammars.
20 Grammar Transformation 1. Eliminating Ambiguity (Cont.)
- For example, a grammar containing the productions A → AA | α would be ambiguous, because the substring ααα has two parses.
[Figure: the two parse trees for ααα, one grouping the first two α's under a single A and the other grouping the last two.]
- This ambiguity disappears if we use the productions
  - A → AB | B and B → α
- or the productions
  - A → BA | B and B → α.
21 Grammar Transformation 1. Eliminating Ambiguity (Cont.)
- Three other examples of ambiguous productions are:
  - A → AαA
  - A → αA | Aβ and
  - A → αA | αAβA
- A language generated by an ambiguous context-free grammar is inherently ambiguous if it has no unambiguous context-free grammar. (This can be proven formally.)
- An example of such a language is L = {a^i b^j c^m | i = j or j = m}, which can be generated by the grammar
  - S → AB | DC
  - A → aA | ε    C → cC | ε
  - B → bBc | ε    D → aDb | ε
22 Grammar Transformations 2. Elimination of Left Recursion
- A grammar is left recursive if it has a nonterminal A and a derivation A ⇒+ Aα for some string α. Top-down parsing methods (to be discussed shortly) cannot handle left-recursive grammars, so a transformation to eliminate left recursion is needed.
- Immediate left recursion (productions of the form A → A α) can be easily eliminated.
- We group the A-productions as
  - A → A α1 | A α2 | … | A αm | β1 | β2 | … | βn
- where no βi begins with A. Then we replace the A-productions by
  - A → β1 A′ | β2 A′ | … | βn A′
  - A′ → α1 A′ | α2 A′ | … | αm A′ | ε
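The replacement rule for immediate left recursion can be sketched in a few lines. The function name and the tuple-based grammar encoding are assumptions made for illustration; the empty tuple stands for ε, and the new non-terminal is named by appending a prime, which is assumed not to clash with an existing symbol.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A alpha | beta as A -> beta A' and A' -> alpha A' | epsilon.

    Each production is a tuple of symbols; the empty tuple stands for epsilon.
    """
    # alpha parts: what follows A in the left-recursive alternatives.
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    # beta parts: alternatives that do not begin with A.
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not alphas:
        return {nonterminal: list(productions)}
    fresh = nonterminal + "'"  # assumed to be an unused name
    return {
        nonterminal: [beta + (fresh,) for beta in betas],
        fresh: [alpha + (fresh,) for alpha in alphas] + [()],
    }


# E -> E + T | T  becomes  E -> T E'  and  E' -> + T E' | epsilon.
result = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
for lhs, alternatives in result.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs) or "ε")
```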
23 Grammar Transformations 2. Elimination of Left Recursion (Cont.)
- The previous transformation, however, does not eliminate left recursion involving two or more steps. For example, consider the grammar
  - S → Aa | b
  - A → Ac | Sd | ε
- S is left-recursive because S ⇒ Aa ⇒ Sda, but it is not immediately left recursive.
24 Grammar Transformations 2. Elimination of Left Recursion (Cont.)
- Algorithm. Eliminate left recursion
  - Arrange nonterminals in some order A1, A2, ..., An
  - for i = 1 to n {
    - for j = 1 to i - 1 {
      - replace each production of the form Ai → Aj γ
      - by the productions Ai → δ1 γ | δ2 γ | … | δn γ
      - where Aj → δ1 | δ2 | … | δn are all the current Aj-productions
    - }
    - eliminate the immediate left recursion among the Ai-productions
  - }
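A minimal sketch of this algorithm follows, using a hypothetical dictionary encoding (non-terminal to list of tuples, with the empty tuple for ε). Textbook statements of the algorithm usually assume a grammar without cycles or ε-productions; the example with S → Aa | b and A → Ac | Sd | ε contains an ε-production that happens to be harmless here.

```python
def eliminate_left_recursion(grammar, order):
    """Eliminate left recursion; `order` lists the non-terminals A1..An."""
    g = {nt: list(rhss) for nt, rhss in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            expanded = []
            for rhs in g[a_i]:
                if rhs and rhs[0] == a_j:
                    # Ai -> Aj gamma becomes Ai -> delta gamma for each Aj -> delta.
                    expanded.extend(delta + rhs[1:] for delta in g[a_j])
                else:
                    expanded.append(rhs)
            g[a_i] = expanded
        # Eliminate immediate left recursion among the Ai-productions.
        alphas = [rhs[1:] for rhs in g[a_i] if rhs and rhs[0] == a_i]
        if alphas:
            betas = [rhs for rhs in g[a_i] if not rhs or rhs[0] != a_i]
            fresh = a_i + "'"  # assumed not to clash with an existing symbol
            g[a_i] = [beta + (fresh,) for beta in betas]
            g[fresh] = [alpha + (fresh,) for alpha in alphas] + [()]
    return g


grammar = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ()],
}
result = eliminate_left_recursion(grammar, ["S", "A"])
for lhs, alternatives in result.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```

With the order S, A the S-productions are left alone, A → Sd is expanded to A → Aad | bd, and the immediate recursion on A is then removed with a new non-terminal A'.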
25 Grammar Transformations 2. Elimination of Left Recursion (Cont.)
- To show that the previous algorithm actually works, all we need to notice is that iteration i only changes productions with Ai on the left-hand side, and that afterwards m > i in all productions of the form Ai → Am α.
- This can be easily shown by induction.
  - It is clearly true for i = 1.
  - If it is true for all i < k, then when the outer loop is executed for i = k, the inner loop will remove all productions Ai → Am α with m < i.
  - Finally, with the elimination of self recursion, m in the Ai → Am α productions is forced to be > i.
- So, at the end of the algorithm, all derivations of the form Ai ⇒+ Am α will have m > i, and therefore left recursion would not be possible.
26 Grammar Transformations 3. Left Factoring
- Left factoring helps transform a grammar for predictive parsing.
- For example, if we have the two productions
  - S → if b then S else S
  -   | if b then S
- on seeing the input token if, we cannot immediately tell which production to choose to expand S.
- In general, if we have A → α β1 | α β2 and the input begins with a string derived from α, we do not know (without looking further) which production to use to expand A.
27 Grammar Transformations 3. Left Factoring (Cont.)
- However, we may defer the decision by expanding A to αA′.
- Then after seeing the input derived from α, we may expand A′ to β1 or to β2. That is, left-factored, the original productions become
  - A → α A′
  - A′ → β1 | β2
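A single left-factoring pass can be sketched as follows. The helper names and the tuple encoding are assumptions for illustration; alternatives are grouped by their first symbol, and each group is factored on its longest common prefix. The new productions may themselves need another pass, which this sketch does not attempt.

```python
from collections import defaultdict


def common_prefix(sequences):
    """Longest prefix shared by all the given tuples."""
    prefix = []
    for symbols in zip(*sequences):
        if any(s != symbols[0] for s in symbols):
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor_once(nonterminal, productions):
    """Factor each group of alternatives that start with the same symbol."""
    groups = defaultdict(list)
    for rhs in productions:
        groups[rhs[:1]].append(rhs)  # the key is () for an epsilon production

    result = {nonterminal: []}
    counter = 0
    for key, group in groups.items():
        if len(group) == 1 or key == ():
            result[nonterminal].extend(group)
            continue
        counter += 1
        fresh = nonterminal + "'" * counter  # assumed fresh names A', A'', ...
        alpha = common_prefix(group)
        result[nonterminal].append(alpha + (fresh,))
        result[fresh] = [rhs[len(alpha):] for rhs in group]  # () means epsilon
    return result


productions = [
    ("if", "b", "then", "S", "else", "S"),
    ("if", "b", "then", "S"),
    ("a",),
]
factored = left_factor_once("S", productions)
for lhs, alternatives in factored.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```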
28 Non-Context-Free Language Constructs
- Examples of non-context-free languages are:
  - L1 = {wcw | w is of the form (a|b)*}
  - L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
  - L3 = {a^n b^n c^n | n ≥ 0}
- Languages similar to these that are context free:
  - L1′ = {wcw^R | w is of the form (a|b)*} (w^R stands for w reversed)
    - This language is generated by the grammar
    - S → aSa | bSb | c
  - L2′ = {a^n b^m c^m d^n | n ≥ 1 and m ≥ 1}
    - This language is generated by the grammar
    - S → aSd | aAd
    - A → bAc | bc
29 Non-Context-Free Language Constructs (Cont.)
  - L2″ = {a^n b^n c^m d^m | n ≥ 1 and m ≥ 1}
    - This language is generated by the grammar
    - S → AB
    - A → aAb | ab
    - B → cBd | cd
  - L3′ = {a^n b^n | n ≥ 1}
    - This language is generated by the grammar
    - S → aSb | ab
    - This language is not definable by any regular expression
30 Non-Context-Free Language Constructs (Cont.)
- Suppose we could construct a DFSM D accepting L3′ = {a^n b^n | n ≥ 1}.
- D must have a finite number of states, say k.
- Consider the sequence of states s1, s2, …, sk+1 entered by D having read a, aa, …, a^(k+1).
- Since D only has k states, two of the states in the sequence have to be equal. Say, si = sj (i ≠ j).
- From si, a sequence of i b's leads to an accepting (final) state. Therefore, the same sequence of i b's will also lead to an accepting state from sj. Therefore D would accept a^j b^i, which means that the language accepted by D is not identical to L3′. A contradiction.
31 Parsing
- The parsing problem is: given a string of tokens w, find a parse tree whose frontier is w. (Equivalently, find a derivation of w.)
- A parser for a grammar G reads a list of tokens and finds a parse tree if they form a sentence (or reports an error otherwise).
- Two classes of algorithms for parsing:
  - Top-down
  - Bottom-up
32 Parser generators
- A parser generator is a program that reads a grammar and produces a parser.
- The best known parser generator is yacc, which produces bottom-up parsers.
- Most parser generators, including yacc, do not work for every CFG; they accept a restricted class of CFGs that can be parsed efficiently using the method employed by that parser generator.
33 Top-down parsing
- Starting from a parse tree containing just S, build the tree down toward the input. Expand the left-most non-terminal.
- Algorithm: (next slide)
34 Top-down parsing (cont.)
- Let input = a1a2...an
- current sentential form (csf) = S
- loop {
  - suppose csf = t1...tkAα
  - if t1...tk ≠ a1...ak, it's an error
  - based on ak+1..., choose production A → β
  - csf becomes t1...tkβα
- }
35 Top-down parsing example
- Grammar H: L → E L | E,  E → a | b
- Input: ab
[Figure: the parse tree grown at each step, next to the sentential form and the input.]
  Sentential form    Input
  L                  ab
  E L                ab
  a L                ab
36 Top-down parsing example (cont.)
[Figure: the parse tree grown at each step, next to the sentential form and the input.]
  Sentential form    Input
  a E                ab
  a b                ab
37 LL(1) parsing
- Efficient form of top-down parsing.
- Use only the first symbol of the remaining input (ak+1) to choose the next production. That is, employ a function M: Σ × N → P in the "choose production" step of the algorithm.
- When this works, the grammar is (usually) called LL(1). (More precise definition to follow.)
38 LL(1) examples
- Example 1:
  - H: L → E L | E,  E → a | b
  - Given input ab, so next symbol is a.
  - Which production to use? Can't tell.
  - ⇒ H not LL(1).
39 LL(1) examples
- Example 2:
  - Exp → Term Exp′
  - Exp′ → $ | + Exp
  - Term → id
  - (Use $ for end-of-input symbol.)
- Grammar is LL(1): Exp and Term have only one production; Exp′ has two productions but only one is applicable at any time.
40 Nonrecursive predictive parsing
- It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls.
- The key problem during predictive parsing is that of determining the production to be applied for a non-terminal.
41 Nonrecursive predictive parsing
- Algorithm. Nonrecursive predictive parsing
  - Set ip to point to the first symbol of w$.
  - repeat
    - Let X be the top of the stack symbol and a the symbol pointed to by ip
    - if X is a terminal or $ then
      - if X = a then
        - pop X from the stack and advance ip
      - else error()
    - else // X is a nonterminal
      - if M[X,a] = X → Y1 Y2 … Yk then
        - pop X from the stack
        - push Yk, Yk-1, …, Y1 onto the stack with Y1 on top
        - (push nothing if Y1 Y2 … Yk is ε)
        - output the production X → Y1 Y2 … Yk
      - else error()
  - until X = $
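The loop above can be sketched as a table-driven parser. Several details are assumptions: the stack is taken to start with the start symbol on top of $, the table is a plain dictionary keyed by (non-terminal, token), and the example grammar is a hypothetical adaptation of the Exp grammar in which Exp′ → + Exp | ε, so that $ only appears as the end marker.

```python
def predictive_parse(table, start, nonterminals, tokens):
    """Return the list of productions used, or raise SyntaxError."""
    END = "$"
    stack = [END, start]               # start symbol on top of the end marker
    buffer = list(tokens) + [END]
    ip = 0
    output = []
    while True:
        top, a = stack[-1], buffer[ip]
        if top not in nonterminals:    # X is a terminal or $
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()
            if top == END:             # matched $ against $: input accepted
                return output
            ip += 1
        else:                          # X is a non-terminal
            rhs = table.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no table entry for ({top!r}, {a!r})")
            stack.pop()
            stack.extend(reversed(rhs))  # Y1 ends up on top; nothing for epsilon
            output.append((top, rhs))


# Hand-written table for the assumed grammar:
#   Exp -> Term Exp'    Exp' -> + Exp | epsilon    Term -> id
table = {
    ("Exp", "id"): ("Term", "Exp'"),
    ("Term", "id"): ("id",),
    ("Exp'", "+"): ("+", "Exp"),
    ("Exp'", "$"): (),
}
nonterminals = {"Exp", "Exp'", "Term"}

for lhs, rhs in predictive_parse(table, "Exp", nonterminals, ["id", "+", "id"]):
    print(lhs, "->", " ".join(rhs) or "ε")
```

The productions are printed in the order of a leftmost derivation, which is the output the algorithm describes.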
42 LL(1) grammars
- No left recursion.
  - A → Aα: If this production is chosen, parse makes no progress.
- No common prefixes.
  - A → αβ | αγ
  - Can fix by "left factoring":
    - A → αA′
    - A′ → β | γ
43 LL(1) grammars (cont.)
- No ambiguity.
- Precise definition requires that production to
choose be unique (choose function M very hard
to calculate otherwise).
44 Top-down Parsing
[Figure: two panels. The start symbol L is the root of the parse tree, with subtrees E0 … En hanging above the input tokens <t0, t1, …, ti, ...>; in the second panel the remaining input tokens are <ti, ...>.]
From left to right, "grow" the parse tree downwards.
45 Checking LL(1)-ness
- For any sequence of grammar symbols α, define set FIRST(α) ⊆ Σ to be those tokens a such that α ⇒ … ⇒ aβ for some β.
- (Notation: write α ⇒* aβ.)
46 Checking LL(1)-ness
- Define: Grammar G = (N, Σ, P, S) is LL(1) if whenever there are two left-most derivations (in which the leftmost non-terminal is always expanded first)
  - S ⇒* wAα ⇒ wβα ⇒* wx
  - S ⇒* wAα ⇒ wγα ⇒* wy
- such that FIRST(x) = FIRST(y), it follows that β = γ.
- In other words, given
  - 1. A string wAα in V* and
  - 2. The first terminal symbol to be derived from Aα, say t
- there is at most one production that can be applied to A to yield a derivation of any terminal string beginning with wt.
- FIRST sets can often be calculated by inspection.
47 FIRST Sets
- Exp → Term Exp′    Exp′ → $ | + Exp    Term → id    (Use $ for end-of-input symbol.)
- FIRST(Term Exp′) = {id}
- FIRST($) = {$}, FIRST(+ Exp) = {+} implies FIRST($) ∩ FIRST(+ Exp) = {}
- FIRST(id) = {id}
- ⇒ grammar is LL(1)
48 FIRST Sets
- H: L → E L | E,  E → a | b
- FIRST(E L) = {a,b} = FIRST(E)
- FIRST(E L) ∩ FIRST(E) ≠ {} ⇒ H not LL(1).
49 How to compute FIRST Sets of Vocabulary Symbols
- Algorithm. Compute FIRST(X) for all grammar symbols X
  - forall X ∈ V do FIRST(X) = {}
  - forall X ∈ Σ (X is a terminal) do FIRST(X) = {X}
  - forall productions X → ε do FIRST(X) = FIRST(X) U {ε}
  - repeat
    - forall productions X → Y1 Y2 … Yk do
      - forall i ∈ [1,k] do
        - FIRST(X) = FIRST(X) U (FIRST(Yi) - {ε})
        - if ε ∉ FIRST(Yi) then continue outer loop
      - FIRST(X) = FIRST(X) U {ε}
  - until no more terminals or ε are added to any FIRST set
50 How to compute FIRST Sets of Strings of Symbols
- FIRST(X1X2…Xn) is the union of FIRST(X1) and all FIRST(Xi) such that ε ∈ FIRST(Xk) for k = 1,2,..,i-1, leaving ε itself out of each set.
- FIRST(X1X2…Xn) contains ε iff ε ∈ FIRST(Xk) for k = 1,2,..,n.
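Both FIRST computations can be sketched as a fixed-point iteration. The function names and the dictionary encoding are assumptions for illustration; the empty tuple encodes an ε-production, so the separate initialization step for X → ε is absorbed into the main loop. The example grammar A → T x | T y, T → w | ε is a toy grammar chosen because T can derive ε.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given FIRST of each single symbol."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)  # every symbol (or the empty string) can derive ε
    return result


def compute_first(grammar, terminals):
    """Fixed-point computation of FIRST(X) for all grammar symbols X."""
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


grammar = {
    "A": [("T", "x"), ("T", "y")],
    "T": [("w",), ()],
}
first = compute_first(grammar, {"x", "y", "w"})
print(sorted(first["A"]))
print(sorted(first["T"]))
print(sorted(first_of_string(("T", "x"), first)))
```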
51 FIRST Sets do not Suffice
- Given the productions
  - A → T x
  - A → T y
  - T → w
  - T → ε
- T → w should be applied when the next input token is w.
- T → ε should be applied whenever the next terminal (the one pointed to by ip) is either x or y.
52 FOLLOW Sets
- For any nonterminal X, define set FOLLOW(X) ⊆ Σ to be those tokens a such that S ⇒* αXaβ for some α and β.
53 How to compute the FOLLOW Set
- Algorithm. Compute FOLLOW(X) for all nonterminals X
  - FOLLOW(S) = {$}
  - forall productions A → αBβ do FOLLOW(B) = FOLLOW(B) U (FIRST(β) - {ε})
  - repeat
    - forall productions A → αB or A → αBβ with ε ∈ FIRST(β) do
      - FOLLOW(B) = FOLLOW(B) U FOLLOW(A)
  - until all FOLLOW sets remain the same
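A sketch of the FOLLOW computation is given below. It folds the one-off FIRST(β) step and the repeated FOLLOW(A) step into a single loop, which reaches the same fixed point. The function names are hypothetical, and the FIRST sets for the toy grammar A → T x | T y, T → w | ε are written out by hand rather than computed.

```python
EPS, END = "ε", "$"


def first_of_string(symbols, first):
    """FIRST of a symbol sequence, from the FIRST sets of single symbols."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    return result | {EPS}


def compute_follow(grammar, start, first):
    """Fixed-point computation of FOLLOW(X) for every non-terminal X."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for a, productions in grammar.items():
            for rhs in productions:
                for i, b in enumerate(rhs):
                    if b not in grammar:   # only non-terminals have FOLLOW sets
                        continue
                    tail = first_of_string(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:        # beta is empty or can derive ε
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


grammar = {"A": [("T", "x"), ("T", "y")], "T": [("w",), ()]}
# FIRST sets written out by hand for this toy grammar.
first = {"x": {"x"}, "y": {"y"}, "w": {"w"}, "T": {"w", EPS}, "A": {"w", "x", "y"}}
follow = compute_follow(grammar, "A", first)
print(sorted(follow["A"]))
print(sorted(follow["T"]))
```

FOLLOW(T) comes out as the tokens x and y, which is exactly the lookahead on which T → ε should be chosen.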
54 Construction of a predictive parsing table
- Algorithm. Construction of a predictive parsing table
  - M[:,:] = {}
  - forall productions A → α do
    - forall a ∈ FIRST(α) do
      - M[A,a] = M[A,a] U {A → α}
    - if ε ∈ FIRST(α) then
      - forall b ∈ FOLLOW(A) do
        - M[A,b] = M[A,b] U {A → α}
  - Make all empty entries of M be error
55 Another Definition of LL(1)
- Define: Grammar G is LL(1) if for every A ∈ N with productions A → α1 | . . . | αn,
  - FIRST(αi FOLLOW(A)) ∩ FIRST(αj FOLLOW(A)) = {} for all i ≠ j
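The table construction and this LL(1) condition can be sketched together: a table entry that ends up holding two productions corresponds to a pair αi, αj whose lookahead sets intersect. Everything below is a hypothetical encoding. The FIRST and FOLLOW sets are written out by hand, absent keys play the role of error entries, and the first grammar is an assumed adaptation of the Exp grammar with Exp′ → + Exp | ε.

```python
EPS, END = "ε", "$"


def first_of_string(symbols, first):
    """FIRST of a symbol sequence, from the FIRST sets of single symbols."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    return result | {EPS}


def build_table(grammar, first, follow):
    """Map (A, a) to the set of right-hand sides predicted for A on lookahead a."""
    table = {}
    for a, productions in grammar.items():
        for rhs in productions:
            lookaheads = first_of_string(rhs, first)
            if EPS in lookaheads:
                lookaheads = (lookaheads - {EPS}) | follow[a]
            for token in lookaheads:
                table.setdefault((a, token), set()).add(rhs)
    return table  # keys that are absent stand for "error"


def conflicts(table):
    """Entries holding more than one production."""
    return {key: rhss for key, rhss in table.items() if len(rhss) > 1}


# Assumed adaptation of the Exp grammar, with ε instead of an explicit $.
exp_grammar = {"Exp": [("Term", "Exp'")], "Exp'": [("+", "Exp"), ()], "Term": [("id",)]}
exp_first = {"id": {"id"}, "+": {"+"}, "Term": {"id"}, "Exp": {"id"}, "Exp'": {"+", EPS}}
exp_follow = {"Exp": {END}, "Exp'": {END}, "Term": {"+", END}}

# Grammar with L -> E L | E and E -> a | b.
h_grammar = {"L": [("E", "L"), ("E",)], "E": [("a",), ("b",)]}
h_first = {"a": {"a"}, "b": {"b"}, "E": {"a", "b"}, "L": {"a", "b"}}
h_follow = {"L": {END}, "E": {"a", "b", END}}

exp_table = build_table(exp_grammar, exp_first, exp_follow)
h_table = build_table(h_grammar, h_first, h_follow)
print(conflicts(exp_table))        # entries with more than one production
print(sorted(conflicts(h_table)))  # clashing (non-terminal, token) pairs
```

A grammar passes the check when `conflicts` returns an empty dictionary; the second grammar fails it because both L-productions are predicted on a and on b.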