Title: Context-Free Languages
1Context-Free Languages
Programming Language Translators
- Prepared by
- Manuel E. Bermúdez, Ph.D.
- Associate Professor
- University of Florida
2Context-Free Grammars
- Definition: A context-free grammar (CFG) is a quadruple G = (Φ, Σ, P, S), where Φ is the set of nonterminals, Σ is the set of terminals, P is the set of productions, and S ∈ Φ is the start symbol. All productions are of the form A → α, where A ∈ Φ and α ∈ (Φ ∪ Σ)*.
- Left-most derivation: At each step, the left-most nonterminal is re-written.
- Right-most derivation: At each step, the right-most nonterminal is re-written.
4Derivation Trees
- Derivation trees: Describe re-writes, independently of the order (left-most or right-most).
- Each tree branch matches a production rule in the grammar.
6Derivation Trees (contd)
- Notes
- Leaves are terminals.
- Bottom contour is the sentence.
- Left recursion causes left branching.
- Right recursion causes right branching.
7Goals of Parsing
- Examine input string, determine whether it's legal.
- Equivalent to building derivation tree.
- Added benefit: tree embodies syntactic structure of input.
- Therefore, tree should be unique.
8Grammar Ambiguity
- Definition: A CFG is ambiguous if there exist two different right-most (or left-most, but not both) derivations for some sentence z.
- (Equivalent) Definition: A CFG is ambiguous if there exist two different derivation trees for some sentence z.
9Ambiguous Grammars
- Classic ambiguities:
- Simultaneous left/right recursion:
- E → E + E
- Dangling else problem:
- S → if E then S
-   → if E then S else S
11Grammar Reduction
- What language does this grammar generate?
- S → a          D → EDBC
- A → BCDEF      E → CBA
- B → ASDFA      F → S
- C → DDCF
- L(G) = {a}
- Problem: Many nonterminals (and productions) cannot be used in the generation of any sentence.
12Grammar Reduction
- Definition: A CFG is reduced iff for all A ∈ Φ,
- a) S ⇒* αAβ, for some α, β ∈ V*
-    (we say A is generable), and
- b) A ⇒* z, for some z ∈ Σ*
-    (we say A is terminable).
- G is reduced iff every nonterminal A is both generable and terminable.
13Grammar Reduction
- Example: S → BB    A → aA
-          B → bB      → a
- B is not terminable, since B ⇏* z, for any z ∈ Σ*.
- A is not generable, since S ⇏* αAβ, for any α, β ∈ V*.
14Grammar Reduction
- To find out which nonterminals are generable:
- Build the graph (Φ, δ), where (A, B) ∈ δ iff A → αBβ is a production.
- Check that all nodes are reachable from S.
15Grammar Reduction
- Example: S → BB    A → aA
-          B → bB      → a
- A is not reachable from S, so A is not generable.
- (Graph figure: nodes S, B, A, with edges S → B, B → B, and A → A.)
16Grammar Reduction
- Algorithmically,
- Generable := {S}
- while (Generable changes) do
-   for each A → αBβ do
-     if A ∈ Generable then
-       Generable := Generable ∪ {B}
- od
- Now, Generable contains the nonterminals that are generable.
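The Generable loop above can be sketched in Python as follows. This is a minimal illustration, not the author's code: the function name `generable`, the representation of productions as `(left part, right part)` pairs, and the convention that a nonterminal is any symbol appearing as a left part are all assumptions.

```python
def generable(productions, start):
    """Return the nonterminals reachable from `start` (the Generable set)."""
    # Assumption: a symbol is a nonterminal iff it is the left part of a production.
    nonterminals = {lhs for lhs, _ in productions}
    result = {start}
    changed = True
    while changed:                      # repeat until Generable stops changing
        changed = False
        for lhs, rhs in productions:
            if lhs in result:
                for symbol in rhs:
                    if symbol in nonterminals and symbol not in result:
                        result.add(symbol)
                        changed = True
    return result


# Example grammar from the slides: S -> BB, A -> aA | a, B -> bB
GRAMMAR = [("S", "BB"), ("A", "aA"), ("A", "a"), ("B", "bB")]
print(sorted(generable(GRAMMAR, "S")))
```

On the example grammar, A never enters the set, which agrees with the graph argument: A is not reachable from S.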
17Grammar Reduction
- To find out which nonterminals are terminable:
- Build the graph (2^Φ, δ), where (N, N ∪ {A}) ∈ δ iff A → X1 … Xn is a production, and for all i, either Xi ∈ Σ or Xi ∈ N.
- Check that the node Φ (set of all nonterminals) is reachable from node ∅ (empty set).
18Grammar Reduction
- Example: S → BB    A → aA
-          B → bB      → a
- {A, S, B} is not reachable from ∅! Only {A} is reachable from ∅. Thus S and B are not terminable.
19Grammar Reduction
- Algorithmically,
- Terminable := { }
- while (Terminable changes) do
-   for each A → X1 … Xn do
-     if every nonterminal among the X's is in Terminable then
-       Terminable := Terminable ∪ {A}
- od
- Now, Terminable contains the nonterminals that are terminable.
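A possible Python rendering of the Terminable loop, under the same assumptions as before (productions as `(left part, right part)` pairs; any symbol that is never a left part is treated as a terminal). The name `terminable` is illustrative.

```python
def terminable(productions):
    """Return the nonterminals that can derive a string of terminals."""
    # Assumption: a symbol is a nonterminal iff it is the left part of a production.
    nonterminals = {lhs for lhs, _ in productions}
    result = set()
    changed = True
    while changed:                      # repeat until Terminable stops changing
        changed = False
        for lhs, rhs in productions:
            # Every nonterminal in the right part must already be terminable.
            if lhs not in result and all(
                s in result or s not in nonterminals for s in rhs
            ):
                result.add(lhs)
                changed = True
    return result


# Example grammar from the slides: S -> BB, A -> aA | a, B -> bB
GRAMMAR = [("S", "BB"), ("A", "aA"), ("A", "a"), ("B", "bB")]
print(sorted(terminable(GRAMMAR)))
```

Only A ends up in the set, so S and B are not terminable, matching the example.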
20Grammar Reduction
- Reducing a grammar:
- Find all generable nonterminals.
- Find all terminable nonterminals.
- Remove any production A → X1 … Xn if either
-   a) A is not generable, or
-   b) any Xi is not terminable.
- If the new grammar is not reduced, repeat the process.
21Grammar Reduction
- Example: E → E + T     F → not F
-            → T         Q → P / Q
-          T → F * T     P → (E)
-            → P           → i
- Generable: {E, T, F, P}; not generable: {Q}.
- Terminable: {P, T, E}; not terminable: {F, Q}.
- So, eliminate every production for Q, and every production whose right-part contains either F or Q.
22Grammar Reduction
- New Grammar:
- E → E + T
-   → T
- T → P
- P → (E)
-   → i
- Generable: {E, T, P}
- Terminable: {P, T, E}
- Now, the grammar is reduced.
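The whole reduction procedure (compute both sets, drop productions, repeat) might look like this in Python. It is a sketch under assumptions: the function name `reduce_grammar` and the list-of-tokens encoding are illustrative, and the operators `+` and `*` in the example grammar are assumed readings of the slide.

```python
def reduce_grammar(productions, start):
    """Repeatedly drop useless productions until the grammar is reduced."""
    nonterminals = {lhs for lhs, _ in productions}
    while True:
        # Terminable set: grow until no new nonterminal qualifies.
        term = set()
        while True:
            new = {l for l, r in productions
                   if all(s in term or s not in nonterminals for s in r)}
            if new == term:
                break
            term = new
        # Generable set: everything reachable from the start symbol.
        gen = {start}
        while True:
            new = gen | {s for l, r in productions if l in gen
                         for s in r if s in nonterminals}
            if new == gen:
                break
            gen = new
        # Keep A -> X1...Xn only if A is generable and every Xi is terminable.
        kept = [(l, r) for l, r in productions
                if l in gen and all(s in term or s not in nonterminals for s in r)]
        if kept == productions:
            return kept
        productions = kept


GRAMMAR = [
    ("E", ["E", "+", "T"]), ("E", ["T"]),
    ("T", ["F", "*", "T"]), ("T", ["P"]),
    ("F", ["not", "F"]),
    ("Q", ["P", "/", "Q"]),
    ("P", ["(", "E", ")"]), ("P", ["i"]),
]
for lhs, rhs in reduce_grammar(GRAMMAR, "E"):
    print(lhs, "->", " ".join(rhs))
```

For this example a single pass is enough; the second pass only confirms that nothing else needs to be removed.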
23Operator Precedence and Associativity
- Let's build a CFG for expressions consisting of:
- elementary identifier i.
- + and - (binary ops) have lowest precedence, and are left associative.
- * and / (binary ops) have middle precedence, and are right associative.
- + and - (unary ops) have highest precedence, and are right associative.
24Sample Grammar for Expressions
- E → E + T       E consists of T's,
-   → E - T       separated by +'s and -'s
-   → T           (lowest precedence).
- T → F * T       T consists of F's,
-   → F / T       separated by *'s and /'s
-   → F           (next precedence).
- F → - F         F consists of a single P,
-   → + F         preceded by +'s and -'s
-   → P           (next precedence).
- P → '(' E ')'   P consists of a parenthesized E,
-   → i           or a single i (highest precedence).
25Operator Precedence and Associativity (contd)
- Operator Precedence:
- The lower in the grammar, the higher the precedence.
- Operator Associativity:
- Left recursion in the grammar means left associativity of the operator, and causes left branching in the tree.
- Right recursion in the grammar means right associativity of the operator, and causes right branching in the tree.
26Building Derivation Trees
- Sample Input
- - i - i ( i i ) / i i
- (Human) derivation tree construction:
- Bottom-up.
- On each pass, scan entire expression, process operators with highest precedence (parentheses are highest).
- Lowest precedence operators are last, at the top of tree.
28Operator Precedence and Associativity
- Exercise
- Write a grammar for expressions that consists of:
- elementary identifier i.
- , , are next (left associative)
- , , are next (right associative)
- @, ! have highest precedence (left associative).
- Parentheses override precedence and associativity.
29Precedence and Associativity
- Grammar: E0 → E0 E1
-            → E0 E1
-            → E0 E1
-            → E1
-          E1 → E2 E1
-            → E2 E1
-            → E2
-          E2 → E2 @ E3
-            → E2 ! E3
-            → E3
-          E3 → (E0)
-            → i
30Operator Precedence and Associativity
- Example: Construct the derivation tree for
- i i @ i i ( i i i ! ) ( i i ) i @ i
- Easier to construct the tree from the leaves to the root.
- On each pass, scan the entire expression, and process first the operators with highest precedence.
- Leave operators with lowest precedence for last.
31Derivation Tree
32Transduction Grammars
- Definition: A transduction grammar (a.k.a. syntax-directed translation scheme) is like a CFG, except for the following generalization:
- Each production is a triple (A, β, ω) ∈ Φ × V* × V*, called a translation rule, denoted A → β => ω, where
- A is the left part,
- β is the right part, and
- ω is the translation part.
33Sample Transduction Grammar
- Translation of infix to postfix expressions.
- E → E + T   =>  E T +
-   → T       =>  T
- T → P * T   =>  P T *
-   → P       =>  P
- P → (E)     =>  E      Note: ()'s discarded
-   → i       =>  i
- The translation part describes how the output is generated, as the input is derived.
34Sample Transduction Grammar
- We keep track of a pair (α, β), where α and β are the sentential forms of the input and output.
- ( E, E )
- => ( E + T, E T + )
- => ( T + T, T T + )
- => ( P + T, P T + )
- => ( i + T, i T + )
- => ( i + P * T, i P T * + )
- => ( i + i * T, i i T * + )
- => ( i + i * i, i i i * + )
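To make the infix-to-postfix transduction concrete, here is a small Python sketch. It does not simulate the paired derivation literally; instead it applies each translation part while parsing with one function per nonterminal. The function name `to_postfix`, the single-character tokens, and the use of `+` and `*` as the two operators are assumptions made for illustration.

```python
def to_postfix(tokens):
    """Translate an infix token list to postfix, one function per nonterminal."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def parse_E():
        out = parse_T()                      # E -> T      => T
        while peek() == "+":                 # E -> E + T  => E T +
            expect("+")
            out = out + parse_T() + ["+"]
        return out

    def parse_T():
        out = parse_P()                      # T -> P      => P
        if peek() == "*":                    # T -> P * T  => P T *
            expect("*")
            out = out + parse_T() + ["*"]
        return out

    def parse_P():
        if peek() == "(":                    # P -> (E)    => E  (parentheses dropped)
            expect("(")
            out = parse_E()
            expect(")")
            return out
        expect("i")                          # P -> i      => i
        return ["i"]

    result = parse_E()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result


print(" ".join(to_postfix(list("i+i*i"))))
```

For the input `i + i * i` this yields the same output string as the last line of the derivation above.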
35String to Tree Transduction
- Transduction to Abstract Syntax Trees
- Notation: < N t1 … tn > denotes the tree with root N and subtrees t1 … tn.
- String-to-tree transduction grammar:
- E → E + T   =>  < + E T >
-   → T       =>  T
- T → P * T   =>  < * P T >
-   → P       =>  P
- P → (E)     =>  E
-   → i       =>  i
36String to Tree Transduction
- Example:
- (E, E)
- => (E + T, < + E T >)
- => (T + T, < + T T >)
- => (P + T, < + P T >)
- => (i + T, < + i T >)
- => (i + P * T, < + i < * P T > >)
- => (i + i * T, < + i < * i T > >)
- => (i + i * P, < + i < * i P > >)
- => (i + i * i, < + i < * i i > >)
37String to Tree Transduction
- Definition: A transduction grammar is simple if for every rule A → α => β, the sequence of nonterminals appearing in α is identical to the sequence appearing in β.
- Example: E → E + T   =>  < + E T >
-            → T       =>  T
-          T → P * T   =>  < * P T >
-            → P       =>  P
-          P → (E)     =>  E
-            → i       =>  i
38String to Tree Transduction
- For notational convenience, we dispense with both the nonterminals and the tree notation in the translation parts, leaving
- E → E + T   =>  +
-   → T
- T → P * T   =>  *
-   → P
- P → (E)
-   → i       =>  i      Look familiar?
39Abstract Syntax Trees
- AST is a condensed version of the derivation tree.
- No noise (intermediate nodes).
- Result of simple String-to-tree transduction grammar.
- Rules of the form A → α => 's'.
- Build 's' tree node, with one child per tree from each nonterminal in α.
- We transduce from vocabulary of input symbols (which appear in α), to vocabulary of tree node names.
40Sample AST
Input: - i - i ( i i ) / i i
(Figure: the derivation tree (DT) and the abstract syntax tree (AST) for this input, under grammar G.)
41The Game of Syntactic Dominoes
- The grammar:
- E → E+T    T → P*T    P → (E)
-   → T        → P        → i
- The playing pieces: An arbitrary supply of each piece (one per grammar rule).
- The game board:
- Start domino at the top.
- Bottom dominoes are the "input".
43Parsing The Game of Syntactic Dominoes (contd)
- Game rules
- Add game pieces to the board.
- Match the flat parts and the symbols.
- Lines are infinitely elastic.
- Object of the game
- Connect start domino with the input dominoes.
- Leave no unmatched flat parts.
44Parsing Strategies
- Same as for the game of syntactic dominoes.
- Top-down parsing: start at the start symbol, work toward the input string.
- Bottom-up parsing: start at the input string, work towards the goal symbol.
- In either strategy, can process the input left-to-right (→) or right-to-left (←).
45Top-Down Parsing
- Attempt a left-most derivation, by predicting the re-write that will match the remaining input.
- Use a string (a stack, really) from which the input can be derived.
46Top-Down Parsing
- Start with S on the stack.
- At every step, two alternatives:
- α (the stack) begins with a terminal t. Match t against the first input symbol.
- α begins with a nonterminal A. Consult an OPF (omniscient parsing function) to determine which production for A would lead to a match with the first symbol of the input.
- The OPF does the "predicting" in such a predictive parser.
48Classical Top-Down Parsing Algorithm
- Push (Stack, S)
- while not Empty (Stack) do
-   if Top(Stack) ∈ Σ
-     then if Top(Stack) = Head(input)
-            then input := tail(input)
-                 Pop(Stack)
-            else error (Stack, input)
-     else P := OPF (Stack, input)
-          Push (Pop(Stack), RHS(P))
- od
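A minimal Python sketch of this driver, with the OPF replaced by a hand-written lookup table. Everything named here is an assumption for illustration: the function `ll1_parse`, the `"$"` end marker standing in for ⊥, and the table, which encodes the small grammar S → A, A → bAd | ε.

```python
def ll1_parse(table, start, nonterminals, tokens, end="$"):
    """Table-driven top-down recognizer; returns True iff tokens are accepted."""
    stack = [start]                     # top of the stack is the end of the list
    tokens = list(tokens) + [end]
    pos = 0
    while stack:
        top = stack.pop()
        if top not in nonterminals:     # terminal on the stack: match the input
            if top != tokens[pos]:
                return False
            pos += 1
        else:                           # nonterminal: consult the table
            rhs = table.get((top, tokens[pos]))
            if rhs is None:
                return False
            stack.extend(reversed(rhs)) # push the right part, leftmost symbol on top
    return tokens[pos] == end


# Hypothetical table for S -> A, A -> bAd | (empty)
TABLE = {
    ("S", "b"): ["A"], ("S", "$"): ["A"],
    ("A", "b"): ["b", "A", "d"], ("A", "d"): [], ("A", "$"): [],
}
print(ll1_parse(TABLE, "S", {"S", "A"}, "bbdd"))
```

Pushing the right part in reverse keeps its leftmost symbol on top of the stack, which is what makes the resulting derivation left-most.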
50Top-Down Parsing (contd)
- Most parsing methods impose bounds on the amount of stack lookback and input lookahead. For programming languages, a common choice is (1,1).
- We must define OPF (A, t), where A is the top element of the stack, and t is the first symbol on the input.
- Storage requirements: O(n²), where n is the size of the grammar vocabulary (a few hundred).
51Top-Down Parsing
- OPF (A, t) = A → ω if
- 1. ω ⇒* tα, for some α, or
- 2. ω ⇒* ε, and S ⇒* αAγtβ, for some α, β, where γ ⇒* ε.
52Top-Down Parsing
- Example (illustrating 1):  S → A      B → b
-                            A → BAd    C → c
-                              → C
- OPF |  b        |  c      |  d
- B   |  B → b    |  B → b  |  B → b
- C   |  C → c    |  C → c  |  C → c
- S   |  S → A    |  S → A  |  S → A
- A   |  A → BAd  |  A → C  |  ???
- OPF (A, b) = A → BAd because BAd ⇒ bAd
- OPF (A, c) = A → C because C ⇒ c
- i.e., B begins with b, and C begins with c.
- Tan entries are optional. So is the ??? entry.
53Top-Down Parsing
- Example (illustrating 2):  S → A     A → bAd
-                                        → ε
- OPF |  b        |  d      |  ⊥
- S   |  S → A    |         |  S → A
- A   |  A → bAd  |  A → ε  |  A → ε
- OPF (S, b) = S → A, because A ⇒ bAd
- OPF (S, d) = --------, because S ⇏* αSdβ
- OPF (S, ⊥) = S → A, because S⊥ is legal
- OPF (A, b) = A → bAd, because A ⇒ bAd
- OPF (A, d) = A → ε, because S ⇒* bAd
- OPF (A, ⊥) = A → ε, because S⊥ ⇒* A⊥
54Top-Down Parsing
- Definition:
- First (A) = {t / A ⇒* tα, for some α}
- Follow (A) = {t / S ⇒* αAtβ, for some α, β}
- Computing First sets:
- 1. Build graph (Φ, δ), where (A,B) ∈ δ if B → αAβ, α ⇒* ε (First(A) ⊆ First(B)).
- 2. Attach to each node an empty set of terminals.
- 3. Add t to the set for A if A → αtβ, α ⇒* ε.
- 4. Propagate the elements of the sets along the edges of the graph.
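The First-set computation can also be written as a plain fixed-point iteration rather than an explicit graph. The sketch below takes that route; it is a swapped-in formulation, not the graph construction itself, although it computes the same sets. The names `nullable_set` and `first_sets` and the single-character symbol encoding are assumptions.

```python
def nullable_set(productions):
    """Nonterminals that derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def first_sets(productions):
    """Return (First sets of all nonterminals, nullable nonterminals)."""
    nonterminals = {lhs for lhs, _ in productions}
    nullable = nullable_set(productions)
    first = {a: set() for a in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            for sym in rhs:
                new = first[sym] if sym in nonterminals else {sym}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
                if sym not in nullable:   # stop at the first non-nullable symbol
                    break
    return first, nullable


# S -> ABCD, A -> CDA | a | (empty), B -> BC | b, C -> A, D -> AC
GRAMMAR = [("S", "ABCD"), ("A", "CDA"), ("A", "a"), ("A", ""),
           ("B", "BC"), ("B", "b"), ("C", "A"), ("D", "AC")]
FIRST, NULLABLE = first_sets(GRAMMAR)
print(sorted(NULLABLE), {a: sorted(s) for a, s in sorted(FIRST.items())})
```

Each pass over the productions plays the role of one round of propagation along the edges of the graph.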
55Top-Down Parsing
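- Example: S → ABCD    A → CDA    C → A
-          B → BC        → a      D → AC
-            → b         → ε
- Nullable = {A, C, D}
- (Graph figure: nodes S, B, A, C, D, each annotated with its First set: S {a, b}, B {b}, A {a}, C {a}, D {a}. White: after step 3. Tan: after step 4.)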
56Top-Down Parsing
- Computing Follow Sets:
- 1. Build graph (Φ, δ), where (A,B) ∈ δ if A → αBβ, β ⇒* ε.
- Follow(A) ⊆ Follow(B), because any symbol X that follows A, also follows B.
57Top-Down Parsing
- 2. Attach to each node an empty set of terminals. Add ⊥ to the set for the start symbol.
- 3. Add First(X) to the set for A (i.e. Follow(A)) if B → αAβXγ, β ⇒* ε.
- 4. Propagate the elements of the sets along the edges of the graph.
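As with First sets, the Follow computation can be sketched as a fixed-point iteration instead of an explicit graph. In this illustration the First sets and the nullable set are supplied as inputs, `"$"` stands in for the end marker ⊥, and the name `follow_sets` is an assumption.

```python
END = "$"   # stands in for the end-of-input marker


def follow_sets(productions, start, first, nullable):
    """Compute Follow sets, given First sets and the nullable nonterminals."""
    nonterminals = {lhs for lhs, _ in productions}
    follow = {a: set() for a in nonterminals}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            # `trailer` holds the terminals that can follow the suffix seen so far.
            trailer = set(follow[lhs])
            for sym in reversed(rhs):
                if sym in nonterminals:
                    if not trailer <= follow[sym]:
                        follow[sym] |= trailer
                        changed = True
                    if sym in nullable:
                        trailer = trailer | first[sym]
                    else:
                        trailer = set(first[sym])
                else:
                    trailer = {sym}
    return follow


# S -> ABCD, A -> CDA | a | (empty), B -> BC | b, C -> A, D -> AC
GRAMMAR = [("S", "ABCD"), ("A", "CDA"), ("A", "a"), ("A", ""),
           ("B", "BC"), ("B", "b"), ("C", "A"), ("D", "AC")]
FIRST = {"S": {"a", "b"}, "A": {"a"}, "B": {"b"}, "C": {"a"}, "D": {"a"}}
NULLABLE = {"A", "C", "D"}
FOLLOW = follow_sets(GRAMMAR, "S", FIRST, NULLABLE)
print({a: sorted(s) for a, s in sorted(FOLLOW.items())})
```

Scanning each right part from right to left handles steps 1 and 3 together: the trailer carries Follow of the left part across nullable suffixes and picks up First sets of whatever comes next.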
58Top-Down Parsing
- Example: S → ABCD    A → CDA    C → A
-          B → BC        → a      D → AC
-            → b         → ε
- Nullable = {A, C, D}
- First(S) = {a, b}
- First(C) = {a}
- First(A) = {a}
- First(D) = {a}
- First(B) = {b}
- (Graph figure: nodes S, B, A, C, D, each annotated with its Follow set. White: after step 3. Tan: after step 4.)
59Top-Down Parsing
- So,
- Follow(S) = {⊥}
- Follow(A) = Follow(C) = Follow(D) = {a, b, ⊥}
- Follow(B) = {a, ⊥}
60Top-Down Parsing
- Back to Parsing:
- We want OPF(A, t) = A → ω if either
- 1. t ∈ First(ω), i.e. ω ⇒* tβ, or
- 2. ω ⇒* ε and t ∈ Follow(A), i.e. S ⇒* αAσ ⇒* αAtβ.
61Top-Down Parsing
- Definition: Select (A → ω) = First(ω) ∪ (if ω ⇒* ε then Follow(A) else ∅)
- So PT(A, t) = A → ω if t ∈ Select(A → ω)
- "Parse Table", rather than OPF, because it isn't omniscient.
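The Select definition translates almost directly into code. The sketch below assumes the First, Follow, and nullable sets are already known and passed in; the function names and the `"$"` end marker (for ⊥) are illustrative. The sample data describes the grammar S → A, A → bAd | ε.

```python
def first_of_string(rhs, first, nullable):
    """Return (First(rhs), True if rhs derives the empty string)."""
    result = set()
    for sym in rhs:
        result |= first.get(sym, {sym})   # a terminal is its own First set
        if sym not in nullable:
            return result, False
    return result, True


def select(lhs, rhs, first, follow, nullable):
    """Select(A -> rhs) = First(rhs), plus Follow(A) if rhs derives empty."""
    f, derives_empty = first_of_string(rhs, first, nullable)
    return (f | follow[lhs]) if derives_empty else f


def is_ll1(productions, first, follow, nullable):
    """True iff Select sets of productions with the same left part are disjoint."""
    seen = {}
    for lhs, rhs in productions:
        sel = select(lhs, rhs, first, follow, nullable)
        if seen.get(lhs, set()) & sel:
            return False
        seen[lhs] = seen.get(lhs, set()) | sel
    return True


# S -> A, A -> bAd | (empty)
GRAMMAR = [("S", "A"), ("A", "bAd"), ("A", "")]
FIRST = {"S": {"b"}, "A": {"b"}}
FOLLOW = {"S": {"$"}, "A": {"d", "$"}}
NULLABLE = {"S", "A"}
for lhs, rhs in GRAMMAR:
    print(lhs, "->", rhs or "(empty)", sorted(select(lhs, rhs, FIRST, FOLLOW, NULLABLE)))
print(is_ll1(GRAMMAR, FIRST, FOLLOW, NULLABLE))
```

Filling the parse table is then one assignment per pair: PT(A, t) receives A → ω for every t in Select(A → ω), and a collision is exactly a failed disjointness check.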
62Top-Down Parsing
- Example: First (S) = {a, b}    Follow (S) = {⊥}
-          First (A) = {a}       Follow (A) = {a, b, ⊥}
-          First (B) = {b}       Follow (B) = {a, ⊥}
-          First (C) = {a}       Follow (C) = {a, b, ⊥}
-          First (D) = {a}       Follow (D) = {a, b, ⊥}
- Grammar and Select sets:
- S → ABCD   {a, b}
- B → BC     {b}
-   → b      {b}
- A → CDA    {a, b, ⊥}
-   → a      {a}
-   → ε      {a, b, ⊥}
- C → A      {a, b, ⊥}
- D → AC     {a, b, ⊥}
- Grammar is not LL(1).
63Top-Down Parsing
- Non-LL(1) grammar: multiple entries in PT.
-    |  a                      |  b               |  ⊥
- S  |  S → ABCD               |  S → ABCD        |
- A  |  A → CDA, A → a, A → ε  |  A → CDA, A → ε  |  A → CDA, A → ε
- B  |                         |  B → BC, B → b   |
- C  |  C → A                  |  C → A           |  C → A
- D  |  D → AC                 |  D → AC          |  D → AC
64LL(1) Grammars
- Definition: A CFG G is LL(1) (Left-to-right, Left-most, (1)-symbol lookahead) iff for all A ∈ Φ, and for all productions A → α, A → β with α ≠ β, Select (A → α) ∩ Select (A → β) = ∅.
- Previous example grammar is not LL(1).
- More later on what to do about it.
65Sample LL(1) Grammar
- S → A      {b, ⊥}
- A → bAd    {b}
-   → ε      {d, ⊥}
- Disjoint! Grammar is LL(1)!
-    |  d      |  b        |  ⊥
- S  |         |  S → A    |  S → A
- A  |  A → ε  |  A → bAd  |  A → ε
- One production per entry.
66Example
- Build the LL(1) parse table for the following grammar.
- S  → begin SL end    {begin}
-    → id := E ;       {id}
- SL → SL S            {begin, id}
-    → S               {begin, id}
- E  → E + T           {(, id}
-    → T               {(, id}
- T  → P * T           {(, id}
-    → P               {(, id}
- P  → (E)             {(}
-    → id              {id}
- Not LL(1).
68Example (contd)
- Lemma: Left recursion always produces a non-LL(1) grammar (e.g., SL, E above).
- Proof: Consider
- A → Aα    First (β) or Follow (A)
-   → β     First (β) or Follow (A)
69Problems with our Grammar
- 1. SL is left recursive.
- 2. E is left recursive.
- 3. T → P * T and T → P both begin with the same sequence of symbols (P).
70Solution to Problem 3
- Change  T → P * T    {(, id}
-           → P        {(, id}
- to      T → P X      {(, id}
-         X → * T      {*}
-           → ε        {+, ;, )}
- Follow(X)
-   ⊇ Follow(T), due to T → P X
-   ⊇ Follow(E), due to E → E+T, E → T
-   = {+, ;, )}, due to E → E+T, S → id := E ; and P → (E)
- Disjoint!
71Solution to Problem 3 (contd)
- In general, change
-    A → αβ1
-      → αβ2
-      . . .
-      → αβn
- to A → α X
-    X → β1
-      . . .
-      → βn
- Hopefully all the β's begin with different symbols.
72Solution to Problems 1 and 2
- We want (((( T + T) + T) + T))
- Instead, (T) (+T) (+T) … (+T)
- Change  E → E + T    {(, id}
-           → T        {(, id}
- To      E → T Y      {(, id}
-         Y → + T Y    {+}
-           → ε        {;, )}
- Follow(Y) ⊆ Follow(E) = {;, )}
- No longer contains '+', because we eliminated the production E → E + T.
73Solution to Problems 1 and 2 (contd)
- In general,
- Change  A → Aα1       A → β1
-           . . .         . . .
-           → Aαn         → βm
- to      A → β1 X      X → α1 X
-           . . .         . . .
-           → βm X        → αn X
-                         → ε
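The general rule for removing immediate left recursion is mechanical enough to sketch in a few lines of Python. The function name `remove_left_recursion`, the list-of-symbols encoding of each alternative, and the caller-supplied name for the new nonterminal are assumptions of this illustration; the empty list stands for ε.

```python
def remove_left_recursion(nt, alternatives, new_nt):
    """Rewrite A -> A a1 | ... | A an | b1 | ... | bm without left recursion."""
    # The alphas: what follows A in each left-recursive alternative.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # The betas: alternatives that do not start with A.
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: alternatives}       # nothing to do
    return {
        nt: [beta + [new_nt] for beta in betas],                 # A -> b X
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],   # X -> a X | empty
    }


print(remove_left_recursion("E", [["E", "+", "T"], ["T"]], "Y"))
print(remove_left_recursion("SL", [["SL", "S"], ["S"]], "Z"))
```

This handles only immediate left recursion (A → Aα), which is the case shown on the slide; indirect left recursion would need an extra substitution step first.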
74Solution to Problems 1 and 2 (contd)
- In our example,
- Change  SL → SL S    {begin, id}
-            → S       {begin, id}
- To      SL → S Z     {begin, id}
-         Z  → S Z     {begin, id}
-            → ε       {end}
75Modified Grammar
- S  → begin SL end    {begin}
-    → id := E ;       {id}
- SL → S Z             {begin, id}
- Z  → S Z             {begin, id}
-    → ε               {end}
- E  → T Y             {(, id}
- Y  → + T Y           {+}
-    → ε               {;, )}
- T  → P X             {(, id}
- X  → * T             {*}
-    → ε               {+, ;, )}
- P  → (E)             {(}
-    → id              {id}
- Disjoint. Grammar is LL(1).
78Recursive Descent Parsing
- Top-down parsing strategy, suitable for LL(1) grammars.
- One procedure per nonterminal.
- Contents of stack embedded in recursive call sequence.
- Each procedure commits to one production, based on the next input symbol, and the select sets.
- Good technique for hand-written parsers.
79Sample Recursive Descent Parser
- proc S              S → begin SL end
-                       → id := E ;
- case Next_Token of
- T_begin:  Read (T_begin)
-           SL
-           Read (T_end)
- T_id:     Read (T_id)
-           Read (T_:=)
-           E
-           Read (T_;)
- otherwise Error
- end
- end
Read (T_X) verifies that the upcoming token is
X, and consumes it.
Next_Token is the upcoming token.
80Sample Recursive Descent Parser
- proc SL             SL → SZ
- S
- Z
- end
- proc E              E → TY
- T
- Y
- end
Technically, should have insisted that Next Token
be either T_begin or T_id, but S will do that
anyway. Checking early would aid error
recovery.
// Ditto for T_( and T_id.
81Sample Recursive Descent Parser
- proc Z              Z → SZ
-                       → ε
- case Next_Token of
- T_begin, T_id:  S
-                 Z
- T_end:          (nothing to do)
- otherwise Error
- end
- end
82Sample Recursive Descent Parser
Could have used a case statement
- proc Y              Y → + TY
-                       → ε
- if Next_Token = T_+ then
-   Read (T_+)
-   T
-   Y
- end
- proc T              T → PX
- P
- X
- end
Could have checked for T_( and T_id.
83Sample Recursive Descent Parser
- proc X              X → * T
-                       → ε
- if Next_Token = T_* then
-   Read (T_*)
-   T
- end
84Sample Recursive Descent Parser
- proc P              P → (E)
-                       → id
- case Next_Token of
- T_(:   Read (T_()
-        E
-        Read (T_))
- T_id:  Read (T_id)
- otherwise Error
- end
- end
85String-To-Tree Transduction
- Can obtain derivation or abstract syntax tree.
- Tree can be generated top-down, or bottom-up.
- We will show how to obtain
- Derivation tree top-down
- AST for the original grammar, bottom-up.
86TD Generation of Derivation Tree
- In each procedure, and for each alternative,
write out the appropriate production AS SOON AS
IT IS KNOWN
87TD Generation of Derivation Tree
- proc S              S → begin SL end
-                       → id := E ;
- case Next_Token of
- T_begin:  Write (S → begin SL end)
-           Read (T_begin)
-           SL
-           Read (T_end)
88TD Generation of Derivation Tree
- T_id:     Write (S → id := E ;)
-           Read (T_id)
-           Read (T_:=)
-           E
-           Read (T_;)
- otherwise Error
- end
- end
89TD Generation of Derivation Tree
- proc SL             SL → SZ
- Write (SL → SZ)
- S
- Z
- end
- proc E              E → TY
- Write (E → TY)
- T
- Y
- end
90TD Generation of Derivation Tree
- proc Z              Z → SZ
-                       → ε
- case Next_Token of
- T_begin, T_id:  Write (Z → SZ)
-                 S
-                 Z
- T_end:          Write (Z → ε)
- otherwise Error
- end
- end
91TD Generation of Derivation Tree
- proc Y              Y → + TY
-                       → ε
- if Next_Token = T_+ then
-   Write (Y → + TY)
-   Read (T_+)
-   T
-   Y
- else Write (Y → ε)
- end
92TD Generation of Derivation Tree
- proc T              T → PX
- Write (T → PX)
- P
- X
- end
- proc X              X → * T
-                       → ε
93TD Generation of Derivation Tree
- if Next_Token = T_* then
-   Write (X → * T)
-   Read (T_*)
-   T
- else Write (X → ε)
- end
94TD Generation of Derivation Tree
- proc P              P → (E)
-                       → id
- case Next_Token of
- T_(:   Write (P → (E))
-        Read (T_()
-        E
-        Read (T_))
- T_id:  Write (P → id)
-        Read (T_id)
- otherwise Error
- end
95Notes
- The placement of the Write statements is obvious precisely because the grammar is LL(1).
- Can build the tree "as we go", or have it built by a post-processor.
96Example
- Input String:
- begin id := (id + id) * id ; end
- Output:
- S → begin SL end, SL → SZ, S → id := E ;, E → TY, T → PX, P → (E), E → TY, T → PX, P → id, X → ε, Y → +TY, T → PX, P → id, X → ε, Y → ε, X → *T, T → PX, P → id, X → ε, Y → ε, Z → ε
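The top-down variant, where each procedure writes its production as soon as it is known, can be sketched as a small Python class. This is an illustration under assumptions: the class name `TopDownParser`, the token spellings `:=`, `;`, `+`, `*`, the `"$"` end marker, and the `eps` spelling of ε in the emitted strings are not from the slides.

```python
class TopDownParser:
    """Recursive descent for the modified LL(1) grammar, emitting productions top-down."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0
        self.out = []                        # productions, in the order they are known

    @property
    def next_token(self):
        return self.tokens[self.pos]

    def read(self, tok):
        if self.next_token != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.next_token!r}")
        self.pos += 1

    def S(self):
        if self.next_token == "begin":
            self.out.append("S -> begin SL end")
            self.read("begin"); self.SL(); self.read("end")
        elif self.next_token == "id":
            self.out.append("S -> id := E ;")
            self.read("id"); self.read(":="); self.E(); self.read(";")
        else:
            raise SyntaxError(f"unexpected {self.next_token!r}")

    def SL(self):
        self.out.append("SL -> S Z"); self.S(); self.Z()

    def Z(self):
        if self.next_token in ("begin", "id"):
            self.out.append("Z -> S Z"); self.S(); self.Z()
        elif self.next_token == "end":
            self.out.append("Z -> eps")
        else:
            raise SyntaxError(f"unexpected {self.next_token!r}")

    def E(self):
        self.out.append("E -> T Y"); self.T(); self.Y()

    def Y(self):
        if self.next_token == "+":
            self.out.append("Y -> + T Y"); self.read("+"); self.T(); self.Y()
        else:
            self.out.append("Y -> eps")

    def T(self):
        self.out.append("T -> P X"); self.P(); self.X()

    def X(self):
        if self.next_token == "*":
            self.out.append("X -> * T"); self.read("*"); self.T()
        else:
            self.out.append("X -> eps")

    def P(self):
        if self.next_token == "(":
            self.out.append("P -> ( E )")
            self.read("("); self.E(); self.read(")")
        elif self.next_token == "id":
            self.out.append("P -> id"); self.read("id")
        else:
            raise SyntaxError(f"unexpected {self.next_token!r}")


parser = TopDownParser("begin id := ( id + id ) * id ; end".split())
parser.S()
print(", ".join(parser.out))
```

Moving each `self.out.append(...)` to the end of its branch would give the bottom-up order instead, since a production would then be emitted only after its whole right part has been parsed.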
98Bottom-up Generation of the Derivation Tree
- We could have placed the write statements at the END of each phrase, instead of the beginning. If we do, the tree will be generated bottom-up.
- In each procedure, and for each alternative, write out the production A → ω AFTER ω is parsed.
99BU Generation of the Derivation Tree
- proc S              S → begin SL end
-                       → id := E ;
- case Next_Token of
- T_begin:  Read (T_begin)
-           SL
-           Read (T_end)
-           Write (S → begin SL end)
- T_id:     Read (T_id)
-           Read (T_:=)
-           E
-           Read (T_;)
-           Write (S → id := E ;)
- otherwise Error
- end
100BU Generation of the Derivation Tree
- proc SL             SL → SZ
- S
- Z
- Write (SL → SZ)
- end
- proc E              E → TY
- T
- Y
- Write (E → TY)
- end
101BU Generation of the Derivation Tree
- proc Z              Z → SZ
-                       → ε
- case Next_Token of
- T_begin, T_id:  S
-                 Z
-                 Write (Z → SZ)
- T_end:          Write (Z → ε)
- otherwise Error
- end
- end
102BU Generation of the Derivation Tree
- proc Y              Y → + TY
-                       → ε
- if Next_Token = T_+ then
-   Read (T_+)
-   T
-   Y
-   Write (Y → + TY)
- else Write (Y → ε)
- end
103BU Generation of the Derivation Tree
- proc T              T → PX
- P
- X
- Write (T → PX)
- end
- proc X              X → * T
-                       → ε
- if Next_Token = T_* then
-   Read (T_*)
-   T
-   Write (X → * T)
- else Write (X → ε)
- end
104BU Generation of the Derivation Tree
- proc P              P → (E)
-                       → id
- case Next_Token of
- T_(:   Read (T_()
-        E
-        Read (T_))
-        Write (P → (E))
- T_id:  Read (T_id)
-        Write (P → id)
- otherwise Error
- end
105Notes
- The placement of the Write statements is still obvious.
- The productions are emitted as procedures quit, not as they start.
- Productions emitted in reverse order, i.e., the sequence of productions must be used in reverse order to obtain a right-most derivation.
- Again, can build tree as we go (need stack of trees), or later.
106Example
- Input String:
- begin id := (id + id) * id ; end
- Output:
- P → id, X → ε, T → PX, P → id, X → ε, T → PX, Y → ε, Y → +TY, E → TY, P → (E), P → id, X → ε, T → PX, X → *T, T → PX, Y → ε, E → TY, S → id := E ;, Z → ε, SL → SZ, S → begin SL end
108Replacing Recursion with Iteration
- Not all the nonterminals are needed.
- The recursion in SL, X, Y and Z can be replaced
with iteration.
109Replacing Recursion with Iteration
- SL → S Z      Z → S Z
-                 → ε
- proc S              S → begin SL end
-                       → id := E ;
- case Next_Token of
- T_begin:  Read (T_begin)
-           repeat                                   (replaces call to SL)
-             S
-           until Next_Token ∉ {T_begin, T_id}       (replaces recursion on Z)
-           Read (T_end)
- T_id:     Read (T_id)
-           Read (T_:=)
-           E
-           Read (T_;)
- otherwise Error
- end
- end
110Replacing Recursion with Iteration
- proc E              E → TY
-                     Y → + TY
-                       → ε
- T
- while Next_Token = T_+ do
-   Read (T_+)
-   T
- od
- end
- Replaces recursion on Y.
111Replacing Recursion with Iteration
- proc T              T → PX
-                     X → * T
-                       → ε
- P
- if Next_Token = T_*
-   then Read (T_*)
-        T
- end
- Replaces call to X.
112Replacing Recursion with Iteration
- proc P              P → (E)
-                       → id
- case Next_Token of
- T_( Read (T_()
- E
- Read (T_))
- T_id Read (T_id)
- otherwise Error
- end
- end
113Construction of Derivation Tree for the Original
Grammar (Bottom Up)
- proc S    (1) S  → begin SL end     (2) S  → begin SL end
-                  → id := E ;               → id := E ;
-               SL → SZ                   SL → SL S
-               Z  → SZ                      → S
-                  → ε
- case Next_Token of
- T_begin:  Read (T_begin)
-           S
-           Write (SL → S)
-           while Next_Token in {T_begin, T_id} do
-             S
-             Write (SL → SL S)
-           od
-           Read (T_end)
-           Write (S → begin SL end)
- T_id:     Read (T_id)
-           Read (T_:=)
-           E
-           Read (T_;)
-           Write (S → id := E ;)
114Construction of Derivation Tree for the Original
Grammar (Bottom Up)
- proc E    (1) E → TY        (2) E → E+T
-               Y → +TY             → T
-                 → ε
- T
- Write (E → T)
- while Next_Token = T_+ do
-   Read (T_+)
-   T
-   Write (E → E+T)
- od
- end
115Construction of Derivation Tree for the Original
Grammar (Bottom Up)
- proc T    (1) T → PX        (2) T → P*T
-               X → *T              → P
-                 → ε
- P
- if Next_Token = T_*
-   then Read (T_*)
-        T
-        Write (T → P*T)
-   else Write (T → P)
- end
116Construction of Derivation Tree for the Original
Grammar (Bottom Up)
- proc P    (1) P → (E)       (2) P → (E)
-                 → id              → id
- // SAME AS BEFORE
- end
117Example
- Input String:
- begin id := (id + id) * id ; end
- Output:
- P → id, T → P, E → T, P → id, T → P, E → E+T, P → (E), P → id, T → P, T → P*T, E → T, S → id := E ;, SL → S, S → begin SL end
119Generating the Abstract Syntax Tree, Bottom Up,
for the Original Grammar
- proc S              S → begin S+ end  => 'block'
-                       → id := E ;     => 'assign'
- var N: integer
- case Next_Token of
- T_begin:  Read (T_begin)
-           S
-           N := 1
-           while Next_Token in {T_begin, T_id} do
-             S
-             N := N + 1
-           od
-           Read (T_end)
-           Build_Tree ('block', N)
- T_id:     Read (T_id)              (assume this builds a node)
-           Read (T_:=)
-           E
-           Read (T_;)
-           Build_Tree ('assign', 2)
- otherwise Error
- Build_Tree (x, n) pops n trees from the stack, builds an 'x' node as their parent, and pushes the resulting tree.
120Generating the Abstract Syntax Tree, Bottom Up,
for the Original Grammar
- proc E              E → E+T  => '+'
-                       → T
- T
- while Next_Token = T_+ do
-   Read (T_+)
-   T
-   Build_Tree ('+', 2)
- od
- end
- Left branching in tree!
121Generating the Abstract Syntax Tree, Bottom Up,
for the Original Grammar
- proc T              T → P*T  => '*'
-                       → P
- P
- if Next_Token = T_*
-   then Read (T_*)
-        T
-        Build_Tree ('*', 2)
- end
- Right branching in tree!
122Generating the Abstract Syntax Tree, Bottom Up,
for the Original Grammar
- proc P              P → (E)
-                       → id
- // SAME AS BEFORE,
- // i.e., no trees built
- end
123Example
- Input String:
- begin id1 := (id2 + id3) * id4 ; end
- Sequence of events:
- id1, id2, id3, BT('+',2), id4, BT('*',2), BT('assign',2), BT('block',1)
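The bottom-up AST construction can be sketched in Python with an explicit stack of trees. This is an illustration under assumptions: trees are nested tuples `(name, child, ...)`, identifiers are any tokens starting with `id`, the token spellings `:=`, `;`, `+`, `*` are assumed, and the class and method names are invented for the example.

```python
class AstParser:
    """Recursive descent with iteration, building the AST bottom-up on a stack."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0
        self.stack = []                      # stack of trees

    def kind(self):
        tok = self.tokens[self.pos]
        return "id" if tok.startswith("id") else tok

    def read(self, kind):
        if self.kind() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.tokens[self.pos]!r}")
        tok = self.tokens[self.pos]
        self.pos += 1
        if kind == "id":
            self.stack.append(tok)           # reading an identifier builds a leaf

    def build_tree(self, name, n):
        """Pop n trees, make them children of a new `name` node, push the result."""
        children = self.stack[len(self.stack) - n:]
        del self.stack[len(self.stack) - n:]
        self.stack.append((name, *children))

    def S(self):
        if self.kind() == "begin":
            self.read("begin"); self.S(); n = 1
            while self.kind() in ("begin", "id"):
                self.S(); n += 1
            self.read("end"); self.build_tree("block", n)
        elif self.kind() == "id":
            self.read("id"); self.read(":="); self.E(); self.read(";")
            self.build_tree("assign", 2)
        else:
            raise SyntaxError(f"unexpected {self.tokens[self.pos]!r}")

    def E(self):
        self.T()
        while self.kind() == "+":            # iteration: left branching
            self.read("+"); self.T(); self.build_tree("+", 2)

    def T(self):
        self.P()
        if self.kind() == "*":               # recursion: right branching
            self.read("*"); self.T(); self.build_tree("*", 2)

    def P(self):
        if self.kind() == "(":
            self.read("("); self.E(); self.read(")")
        elif self.kind() == "id":
            self.read("id")
        else:
            raise SyntaxError(f"unexpected {self.tokens[self.pos]!r}")


ast_parser = AstParser("begin id1 := ( id2 + id3 ) * id4 ; end".split())
ast_parser.S()
print(ast_parser.stack[0])
```

When parsing succeeds, the stack holds exactly one tree, the AST of the whole input.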
125Summary
- Bottom-up or top-down tree construction.
- Original or modified grammar.
- Derivation Tree or Abstract Syntax Tree.
- Technique of choice
- Top-down, recursive descent parser.
- Bottom-up tree construction for the original
grammar.
126LR Parsing
- Procedures in the recursive descent code can be "annotated" with items, i.e. productions with a dot marker somewhere in the right-part.
- We can use the items to describe the operation of the recursive descent parser.
- There is an FSA that describes all possible calling sequences in the R.D. parser.
127Recursive Descent Parser with items
- Example:
- proc E                       E → .E + T, E → .T
-   T                          E → E. + T, E → T.
-   while Next_Token = T_+ do
-                              E → E. + T
-     Read (T_+)               E → E + .T
-     T                        E → E + T.
-   od
-                              E → E + T., E → T.
- end
128FSA Connecting Items
- The FSA is M = (DP, V, δ, S' → .S⊥, {S' → S⊥.})
- where DP is the set of all possible items (DP: dotted productions), and δ is defined such that
- 1. (A → α.Bβ) --ε--> (B → .ω): simulate a call to B.
- 2. (A → α.Xβ) --X--> (A → αX.β): simulate the execution of statement X, if X is a nonterminal, or Read(X), if X is a terminal.
129FSA Connecting Items
- Example: E → E + T    T → i      S → E ⊥
-            → T          → (E)
- (Figure: the nondeterministic FSA over the items of this grammar. A transition on X moves the dot past X; ε-transitions lead from each item with the dot before a nonterminal to the initial items of that nonterminal.)
130FSA Connecting Items
- Need to run this machine with the aid of a stack, i.e. need to keep track of the recursive calling sequence.
- To "return" from A → ω., back up |ω| + 1 states, then advance on A.
- Problem with this machine: it is nondeterministic.
- No problem. Be happy. Transform it to a DFA!
131Deterministic FSA Connecting Items
- (Figure: the deterministic FSA connecting items, whose states are sets of items.)
- THIS IS AN LR(0) AUTOMATON
132LR Parsing
- LR means Left-to-Right, Right-most Derivation.
- Need a stack of states to operate the parser.
- No look-ahead required, thus LR(0).
- DFA describes all possible positions in the R.D. parser's code.
- Once the automaton is built, items can be discarded.
133LR Parsing
- Operation of an LR parser:
- Two moves: shift and reduce.
- Shift: Advance from current state on Next_Token, push new state on stack.
- Reduce (on A → ω): Pop |ω| states from stack. Advance from new top state on A.
134LR Parsing
- Stack        |  Input
- 1            |  i + (i + i) ⊥
- 14           |  + (i + i) ⊥
- 13           |  + (i + i) ⊥
- 12           |  + (i + i) ⊥
- 127          |  (i + i) ⊥
- 1275         |  i + i) ⊥
- 12754        |  + i) ⊥
- 12753        |  + i) ⊥
- 12758        |  + i) ⊥
- 127587       |  i) ⊥
- 1275874      |  ) ⊥
- 1275879      |  ) ⊥
- 12758        |  ) ⊥
- 12758 10     |  ⊥
- 1279         |  ⊥
- 12           |  ⊥
- 126          |  ------
- (Figure: the derivation tree as it is built bottom-up during the parse, next to the LR(0) automaton with states 1 to 10.)
135LR Parsing
- Table Representation of LR Parsers
- Two Tables:
- Action Table: indexed by state, and by terminal symbol. Contains all shift and reduce moves.
- GOTO Table: indexed by state, and by nonterminal symbol. Contains all transitions on nonterminal symbols.
136LR Parsing
-       |  ACTION                                                    |  GOTO
- State |  i          +          (          )          ⊥            |  E    T
- 1     |  S/4                   S/5                                 |  2    3
- 2     |             S/7                              S/6           |
- 3     |  R/E→T      R/E→T      R/E→T      R/E→T      R/E→T         |
- 4     |  R/T→i      R/T→i      R/T→i      R/T→i      R/T→i         |
- 5     |  S/4                   S/5                                 |  8    3
- 6     |  Accept     Accept     Accept     Accept     Accept        |
- 7     |  S/4                   S/5                                 |       9
- 8     |             S/7                   S/10                     |
- 9     |  R/E→E+T    R/E→E+T    R/E→E+T    R/E→E+T    R/E→E+T       |
- 10    |  R/T→(E)    R/T→(E)    R/T→(E)    R/T→(E)    R/T→(E)       |
137LR Parsing
- Algorithm LR_Driver:
- Push (Start_State, S)
- while ACTION (Top(S), ⊥) ≠ Accept do
-   case ACTION (Top(S), Next_Token) of
-     S/r:       Read (Next_Token)
-                Push (r, S)
-     R/A → ω:   Pop (S) |ω| times
-                Push (GOTO (Top(S), A), S)
-     empty:     Error
-   end
- end
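Here is a small Python sketch of such a driver, hard-wired to the ten-state table for E → E + T | T, T → i | (E). The dictionaries `SHIFT`, `REDUCE`, and `GOTO`, the function name `lr_parse`, and the `"$"` end marker (standing in for ⊥) are assumptions made for the example. Because the automaton is LR(0), a state with a reduce entry reduces regardless of the next token.

```python
# Each production name maps to (left part, length of right part).
PRODUCTIONS = {"E->E+T": ("E", 3), "E->T": ("E", 1), "T->i": ("T", 1), "T->(E)": ("T", 3)}
SHIFT = {(1, "i"): 4, (1, "("): 5, (2, "+"): 7, (2, "$"): 6,
         (5, "i"): 4, (5, "("): 5, (7, "i"): 4, (7, "("): 5,
         (8, "+"): 7, (8, ")"): 10}
REDUCE = {3: "E->T", 4: "T->i", 9: "E->E+T", 10: "T->(E)"}
GOTO = {(1, "E"): 2, (1, "T"): 3, (5, "E"): 8, (5, "T"): 3, (7, "T"): 9}
ACCEPT = 6


def lr_parse(tokens):
    """Run the LR(0) driver; return the reductions performed, in order."""
    tokens = list(tokens) + ["$"]
    pos = 0
    stack = [1]                               # stack of states, start state is 1
    reductions = []
    while stack[-1] != ACCEPT:
        state = stack[-1]
        if state in REDUCE:                   # reduce: pop |rhs| states, then goto
            name = REDUCE[state]
            lhs, length = PRODUCTIONS[name]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(name)
        elif pos < len(tokens) and (state, tokens[pos]) in SHIFT:
            stack.append(SHIFT[(state, tokens[pos])])   # shift: consume a token
            pos += 1
        else:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {state}")
    return reductions


print(lr_parse(list("i+(i+i)")))
```

Read in reverse, the returned list of reductions is a right-most derivation of the input.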
138LR Parsing
- Direct Construction of the LR(0) Automaton:
- PT(G) = { Closure({S' → .S⊥}) } ∪ { Closure(P) | P ∈ Successors(P'), P' ∈ PT(G) }
- Closure(P) = P ∪ { A → .ω | B → α.Aβ ∈ Closure(P) }
- Successors(P) = { Nucleus(P, X) | X ∈ V }
- Nucleus(P, X) = { A → αX.β | A → α.Xβ ∈ P }
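The Closure and Nucleus definitions can be turned into a short Python sketch that builds the LR(0) states for the running example. The encoding is an assumption: an item is a pair `(production index, dot position)`, `"$"` stands in for ⊥, and the function names mirror the definitions but are otherwise invented.

```python
# Production 0 is the augmented start production S -> E $.
GRAMMAR = [("S", ("E", "$")), ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("i",)), ("T", ("(", "E", ")"))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add A -> .w for every nonterminal A that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def nucleus(state, symbol):
    """Items of `state` with the dot moved past `symbol`."""
    return {(p, d + 1) for p, d in state
            if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}


def build_lr0():
    """Return (list of states, transition map) of the LR(0) automaton."""
    states = [closure({(0, 0)})]
    transitions = {}
    for state in states:                      # the list grows while we iterate
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for x in sorted(symbols):
            target = closure(nucleus(state, x))
            if target not in states:
                states.append(target)
            transitions[(states.index(state), x)] = states.index(target)
    return states, transitions


STATES, TRANSITIONS = build_lr0()
print(len(STATES), "states,", len(TRANSITIONS), "transitions")
```

The state numbers produced here depend on discovery order and need not match the numbering used on the slides; only the set of states and their transitions should agree.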
139LR Parsing
- Direct Construction of Previous Automaton
- State 1: S → .E⊥, E → .E+T, E → .T, T → .i, T → .(E)    (E: 2, T: 3, i: 4, (: 5)
- State 2: S → E.⊥, E → E.+T    (⊥: 6, +: 7)
- State 3: E → T.
- State 4: T → i.
- State 5: T → (.E), E → .E+T, E → .T, T → .i, T → .(E)    (E: 8, T: 3, i: 4, (: 5)
- State 6: S → E⊥.
- State 7: E → E+.T, T → .i, T → .(E)    (T: 9, i: 4, (: 5)
- State 8: T → (E.), E → E.+T    (): 10, +: 7)
- State 9: E → E+T.
- State 10: T → (E).
140LR Parsing
- Notes:
- Two states are equal if their Nuclei are identical.
- This grammar is LR(0) because there are no conflicts.
- A conflict occurs when a state contains
- i. Both a final (dot-at-the-end) item and a non-final one (shift-reduce), or
- ii. Two or more final items (reduce-reduce).
141LR Parsing
- Example: E → E + T    T → P * T    P → i
-            → T          → P          → (E)
- State 1: S → .E⊥, E → .E+T, E → .T, T → .P*T, T → .P, P → .i, P → .(E)    (E: 2, T: 3, P: 4, i: 5, (: 6)
- State 2: S → E.⊥, E → E.+T    (⊥: 7, +: 8)
- State 3: E → T.
- State 4: T → P.*T, T → P.    (*: 9)
- State 5: P → i.
- State 6: P → (.E), E → .E+T, E → .T, T → .P*T, T → .P, P → .i, P → .(E)    (E: 10, T: 3, P: 4, i: 5, (: 6)
- State 7: S → E⊥.
- State 8: E → E+.T, T → .P*T, T → .P, P → .i, P → .(E)    (T: 11, P: 4, i: 5, (: 6)
- State 9: T → P*.T, T → .P*T, T → .P, P → .i, P → .(E)    (T: 12, P: 4, i: 5, (: 6)
- State 10: P → (E.), E → E.+T    (): 13, +: 8)
- State 11: E → E+T.
- State 12: T → P*T.
- State 13: P → (E).
- State 4 contains both a final and a non-final item. Grammar is not LR(0).
142LR Parsing
- Solution: Use lookahead!
- In LL(1), lookahead is used at the beginning of the production.
- In LR(1), lookahead is used at the end of the production.
- We will use SLR(1): Simple LR(1)
- and LALR(1): Lookahead LR(1)
143LR Parsing
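- The conflict appears in the ACTION table, as multiple entries.
- State |  i        +        *              (        )        ⊥
- 1     |  S/5                               S/6
- 2     |           S/8                                       S/7
- 3     |  R/E→T (every column)
- 4     |  R/T→P    R/T→P    S/9, R/T→P     R/T→P    R/T→P    R/T→P
- 5     |  R/P→i (every column)
- 6     |  S/5                               S/6
- 7     |  Accept
- 8     |  S/5                               S/6
- 9     |  S/5                               S/6
- 10    |           S/8                              S/13
- 11    |  R/E→E+T (every column)
- 12    |  R/T→P*T (every column)
- 13    |  R/P→(E) (every column)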
144LR Parsing
- SLR(1): For each inconsistent state p, compute Follow(A) for each conflict production A → ω. Then place "R/A → ω" in the ACTION table, row p, column t, only if t ∈ Follow(A). In our case, Follow(T) = Follow(E) = {+, ), ⊥}. So,
- State |  i        +        *        (        )        ⊥
- 4     |           R/T→P    S/9               R/T→P    R/T→P
- Grammar is SLR(1).
145LR Parsing
- Example: S → aSb    { a^n b^n / n ≥ 0 }
-            → ε
- State 1: S' → .S⊥, S → .aSb, S → .    (S: 2, a: 3)
- State 2: S' → S.⊥    (⊥: 4)
- State 3: S → a.Sb, S → .aSb, S → .    (S: 5, a: 3)
- State 4: S' → S⊥.
- State 5: S → aS.b    (b: 6)
- State 6: S → aSb.
- State |  a              b           ⊥           |  S
- 1     |  S/3, R/S→ε     R/S→ε       R/S→ε       |  2
- 2     |                             S/4         |
- 3     |  S/3, R/S→ε     R/S→ε       R/S→ε       |  5
- 4     |  Accept         Accept      Accept      |
- 5     |                 S/6                     |
- 6     |  R/S→aSb        R/S→aSb     R/S→aSb     |
- Grammar is not LR(0).
146LR Parsing
- SLR(1) Analysis:
- State 1: Follow(S) = {b, ⊥}. Since a ∉ Follow(S), the shift/reduce conflict is resolved.
- State 3: Same story.
- Rows 1 and 3 become
- State |  a      b          ⊥          |  S
- 1     |  S/3    R/S → ε    R/S → ε    |  2
- 3     |  S/3    R/S → ε    R/S → ε    |  5
- All single entries. Grammar is SLR(1).
147LR Parsing
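- LALR(1) Grammars
- Consider the grammar: S → AbAa    A → a
-                         → Ba      B → a
- LR(0) Automaton (transitions):
-   1 --S--> 2 --⊥--> 6
-   1 --A--> 3 --b--> 7 --A--> 9 --a--> 11    (state 11: S → AbAa.)
-   7 --a--> 10    (state 10: A → a.)
-   1 --B--> 4 --a--> 8    (state 8: S → Ba.)
-   1 --a--> 5    (state 5: A → a., B → a.)
- Grammar is not LR(0): reduce-reduce conflict in state 5.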
148LR Parsing
- SLR(1) Analysis (State 5):
- Follow(A) = {a, b}
- Follow(B) = {a}
- Conflict not resolved. Grammar is not SLR(1).
149LR Parsing
- LALR(1) Technique:
- I. For each conflicting reduction A → ω at each inconsistent state q, find all nonterminal transitions (pi, A) such that the path spelling ω from pi leads to q.
- II. Compute Follow(pi, A) (see below), for all i, and union together the results. The resulting set is the LALR(1) lookahead set for the A → ω reduction at q.
150LR Parsing
- Computation of Follow(p, A):
- Ordinary Follow computation, except on a different grammar, called G'. G' embodies both the structure of G, and the structure of the LR(0) automaton. To build G': For each nonterminal transition (p, A) and for each production A → w1 w2 … wn, there exists the following in the LR(0) automaton: a path p --w1--> p2 --w2--> … --wn--> q, where q contains the final item A → w1 … wn.
- For each such situation, G' contains a production of the form
- (p, A) → (p, w1)(p2, w2) … (pn, wn)
151LR Parsing
- In our example, G: S → AbAa    A → a
-                      → Ba      B → a
- G': (1, S) → (1, A)(3, b)(7, A)(9, a)
-            → (1, B)(4, a)
-     (1, A) → (1, a)
-     (7, A) → (7, a)
-     (1, B) → (1, a)
- The two occurrences of A, (1, A) and (7, A), have split!
152LR Parsing
- For the conflict in state 5, we need
- Follow(1, A) = {(3, b)}
- Follow(1, B) = {(4, a)}.
- Extract the terminal symbols from these to obtain
- State |  a          b
- 5     |  R/B → a    R/A → a
- Conflict is resolved. Grammar is LALR(1).
153LR Parsing
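- Example: S → bBb    B → A
-            → aBa    A → c
-            → acb
- LR(0) Automaton (transitions):
-   1 --S--> 2 --⊥--> 5
-   1 --b--> 3 --B--> 6 --b--> 11    (state 11: S → bBb.)
-   1 --a--> 4 --B--> 9 --a--> 12    (state 12: S → aBa.)
-   4 --c--> 10 --b--> 13    (state 10: S → ac.b, A → c.; state 13: S → acb.)
-   3 --A--> 7, 4 --A--> 7    (state 7: B → A.)
-   3 --c--> 8    (state 8: A → c.)
- State 10 is inconsistent (shift-reduce conflict).
- Grammar is not LR(0).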
154LR Parsing
- SLR(1) Analysis, state 10:
- Follow(A) ⊆ Follow(B) = {a, b}.
- Grammar is not SLR(1).
- LALR(1) Analysis: Need Follow(4, A).
- G': (1, S) → (1, b)(3, B)(6, b)        (3, B) → (3, A)
-            → (1, a)(4, B)(9, a)        (4, B) → (4, A)
-            → (1, a)(4, c)(10, b)       (3, A) → (3, c)
-                                        (4, A) → (4, c)
- Thus Follow(4, A) ⊆ Follow(4, B) = {(9, a)}.
- The lookahead set is {a}. The grammar is LALR(1).
155LR Parsing
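- Example: S → aBb    B → A
-            → aDa    A → a
-            → bBa    D → a
-            → bDb
- LR(0) Automaton (transitions):
-   1 --S--> 2 --⊥--> 15
-   1 --a--> 3 --B--> 5 --b--> 11    (state 11: S → aBb.)
-   3 --D--> 7 --a--> 12    (state 12: S → aDa.)
-   1 --b--> 4 --B--> 9 --a--> 13    (state 13: S → bBa.)
-   4 --D--> 10 --b--> 14    (state 14: S → bDb.)
-   3 --A--> 6, 4 --A--> 6    (state 6: B → A.)
-   3 --a--> 8, 4 --a--> 8    (state 8: A → a., D → a.)
- State 8 is inconsistent. Grammar is not LR(0).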
156LR Parsing
- SLR(1) Analysis: Follow(A) = Follow(B) = {a, b}, Follow(D) = {a, b}. Grammar is not SLR(1).
- LALR(1) Analysis:
- G': (1, S) → (1, a)(3, B)(5, b)        (3, B) → (3, A)
-            → (1, a)(3, D)(7, a)        (3, D) → (3, a)
-            → (1, b)(4, B)(9, a)        (3, A) → (3, a)
-            → (1, b)(4, D)(10, b)       (4, B) → (4, A)
-                                        (4, D) → (4, a)
-                                        (4, A) → (4, a)
- Need Follow(3, A) ∪ Follow(4, A) = {a, b}
-      Follow(3, D) ∪ Follow(4, D) = {a, b}
- The lookahead sets are not disjoint. The grammar is not LALR(1).
157LR Parsing
- Solution: Modify the LR(0) automaton, by splitting state 8 in two states.
- LR(1) Parsers:
- Construction similar to LR(0).
- Difference: lookahead symbol carried explicitly, as part of each item, e.g. A → α.β : t
- PT(G) = { Closure({S' → .S⊥}) } ∪ { Closure(P) | P ∈ Successors(P'), P' ∈ PT(G) }
158LR Parsing
- Closure(P) = P ∪ { A → .ω : t | B → α.Aβ : t' ∈ Closure(P), t ∈ First(βt') }
- Successors(P) = { Nucleus(P, X) | X ∈ V }
- Nucleus(P, X) = { A → αX.β : t | A → α.Xβ : t ∈ P }
- Notes:
- New lookahead symbols appear during Closure.
- Lookahead symbols are carried from state to state.
159LR Parsing
- Example: S → aBb    B → A
-            → aDa    A → a
-            → bBa    D → a
-            → bDb
- (Figure: direct construction of the LR(1) automaton, states 1 to 16. The LR(0) state containing A → a. and D → a. is now split in two. The state reached after a contains B → .A : b, A → .a : b, D → .a : a, and leads on a to the state {A → a. : b, D → a. : a}. The state reached after b contains B → .A : a, A → .a : a, D → .a : b, and leads on a to the state {A → a. : a, D → a. : b}.)
160LR Parsing
- (Figure: the resulting LR(1) automaton, states 1 to 16, with the reductions A → a and D → a appearing in two separate states, each with its own lookahead symbols.)
- No conflicts.
- Grammar is LR(1).
161Summary of Parsing
- Top-Down Parsing
- Hand-written or Table Driven (LL(1))
- (Figure: a tree rooted at S, with labels: part of tree known; part of tree left to predict; stack α; input already parsed w; remaining input β.)
162Summary of Parsing
- (Figure: the Driver consults the LL(1) Table; w is the input already parsed and β is the remaining input.)
- Two moves:
- Terminal on stack: match input.
- Nonterminal on stack: re-write according to Table.