Parsing

Slides: 76

Provided by: samk153

1
Parsing
Giuseppe Attardi Università di Pisa
2
Parsing

Calculate grammatical structure of program, like
diagramming sentences, where
Tokens = words
Programs = sentences

For further information: Aho, Sethi, Ullman,
Compilers: Principles, Techniques, and Tools
(a.k.a. the Dragon Book)
3
Outline of coverage

Context-free grammars
Parsing
Tabular Parsing Methods
One pass
Top-down
Bottom-up
Yacc

4
Parser extracts grammatical structure of program
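[Figure: parse tree for a program. A function-def node has children name (main), arguments, and stmt-list; the stmt is an expression built from an operator (<<) and two expressions, the variable cout and the string "hello, world\n"]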
5
Context-free languages

Grammatical structure defined by context-free
grammar
statement → labeled-statement
          | expression-statement
          | compound-statement
labeled-statement → ident : statement
                  | case constant-expression : statement
compound-statement → { declaration-list statement-list }

Context-free: only one non-terminal in
left-part
[Figure legend: terminal / non-terminal]
6
Parse trees

Parse tree: tree labeled with grammar symbols,
such that
If node is labeled A, and its children are
labeled x1...xn, then there is a production
A → x1...xn
Parse tree from A: root labeled with A
Complete parse tree: all leaves labeled with
tokens

7
Parse trees and sentences

Frontier of tree: labels on leaves (in
left-to-right order)
Frontier of tree from S is a sentential form
Frontier of a complete tree from S is a sentence

8
Example

G: L → L E | E    E → a | b
Syntax trees from start symbol (L)

Sentential forms
9
Derivations

Alternate definition of sentence
Given α, β in V*, say α ⇒ β is a derivation step if
α = α'Aα'' and β = α'γα'', where A → γ is a
production
α is a sentential form iff there exists a
derivation (sequence of derivation steps)
S ⇒ ... ⇒ α (alternatively, we say that S ⇒* α)

Two definitions are equivalent, but note that
there are many derivations corresponding to each
parse tree
10
Another example

H: L → E L | E    E → a | b

[Figure: syntax trees from start symbol L for grammar H, with leaves a and b]
11
Ambiguity

For some purposes, it is important to know
whether a sentence can have more than one parse
tree
A grammar is ambiguous if there is a sentence
with more than one parse tree
Example: E → E + E | E * E | id
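
As a rough illustration of the definition, the sketch below counts the parse trees of a token list under the grammar E → E + E | E * E | id. The operators + and * are assumed from the later slide on additions and products, and count_trees is a hypothetical helper name. A count above 1 means the sentence is ambiguous under this grammar.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways toks[i:j] can be derived from E.
        if j - i == 1:
            return 1 if toks[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if toks[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))


print(count_trees(["id", "+", "id", "*", "id"]))  # prints 2
```

The two trees correspond to the groupings (id + id) * id and id + (id * id).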

12
Notes

If e then if b then d else f
int x y 0
A.b.c d
Id -> s | s.id
E -> E + T -> E + T + T -> T + T + T -> id + T + T
-> id + T * id + T -> id + id * id + T
-> id + id * id + id

13
Ambiguity

Ambiguity is a function of the grammar rather
than the language
Certain ambiguous grammars may have equivalent
unambiguous ones

14
Grammar Transformations

Grammars can be transformed without affecting the
language generated
Three transformations are discussed next
Eliminating Ambiguity
Eliminating Left Recursion (i.e. productions of
the form A → A α)
Left Factoring

15
Eliminating Ambiguity

Sometimes an ambiguous grammar can be rewritten
to eliminate ambiguity
For example, expressions involving additions and
products can be written as follows
E → E + T | T
T → T * id | id
The language generated by this grammar is the
same as that generated by the grammar in slide
"Ambiguity". Both generate id (+ id | * id)*
However, this grammar is not ambiguous

16
Eliminating Ambiguity (Cont.)

One advantage of this grammar is that it
represents the precedence between operators. In
the parse tree, products appear nested within
additions

17
Eliminating Ambiguity (Cont.)

An example of ambiguity in a programming language
is the dangling else
Consider
S → if b then S else S | if b then S | a

18
Eliminating Ambiguity (Cont.)

When there are two nested ifs and only one else, the else can be attached to either if, so the sentence has two parse trees

19
Eliminating Ambiguity (Cont.)

In most languages (including C and Java), each
else is assumed to belong to the nearest if that
is not already matched by an else. This
association is expressed in the following
(unambiguous) grammar
S → Matched
  | Unmatched
Matched → if b then Matched else Matched
        | a
Unmatched → if b then S
          | if b then Matched else Unmatched

20
Eliminating Ambiguity (Cont.)

Ambiguity is a property of the grammar
It is undecidable whether a context free grammar
is ambiguous
The proof is by reduction from Post's
correspondence problem
Although there is no general algorithm, it is
possible to isolate certain constructs in
productions which lead to ambiguous grammars

21
Eliminating Ambiguity (Cont.)

For example, a grammar containing the productions
A → AA | α would be ambiguous, because the
substring ααα has two parses

[Figure: the two parse trees for ααα, grouping the string as (αα)α and as α(αα)]

This ambiguity disappears if we use the
productions
A → AB | B and B → α
or the productions
A → BA | B and B → α.

22
Eliminating Ambiguity (Cont.)

Examples of ambiguous productions
A → AαA
A → αA | Aβ
A → αA | αAβA
A CF language is inherently ambiguous if it has
no unambiguous CFG
An example of such a language is
L = {a^i b^j c^m | i = j or j = m}, which can be generated
by the grammar
S → AB | DC
A → aA | ε    C → cC | ε
B → bBc | ε   D → aDb | ε

23
Elimination of Left Recursion

A grammar is left recursive if it has a
nonterminal A and a derivation A ⇒+ Aα for some
string α.
Top-down parsing methods cannot handle
left-recursive grammars, so a transformation to
eliminate left recursion is needed
Immediate left recursion (productions of the form
A → Aα) can be easily eliminated
1. Group the A-productions as
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with A
2. Replace the A-productions by
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
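
The two steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: right-hand sides are represented as lists of symbols (the empty list stands for ε), the new nonterminal is named by appending a prime, and eliminate_immediate is a hypothetical helper name. It assumes no αi is empty.

```python
def eliminate_immediate(a, rhss):
    """Remove immediate left recursion from the productions of nonterminal `a`.

    `rhss` is a list of right-hand sides, each a list of symbols;
    the empty list stands for epsilon. Returns a dict of new productions.
    """
    # Step 1: split into A -> A alpha_i and A -> beta_i.
    alphas = [rhs[1:] for rhs in rhss if rhs and rhs[0] == a]
    betas = [rhs for rhs in rhss if not rhs or rhs[0] != a]
    if not alphas:
        return {a: rhss}  # nothing to do
    # Step 2: A -> beta_i A'   and   A' -> alpha_i A' | epsilon.
    a2 = a + "'"
    return {
        a: [beta + [a2] for beta in betas],
        a2: [alpha + [a2] for alpha in alphas] + [[]],
    }


# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
print(eliminate_immediate("E", [["E", "+", "T"], ["T"]]))
```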

24
Elimination of Left Recursion (Cont.)

The previous transformation, however, does not
eliminate left recursion involving two or more
steps
For example, consider the grammar
S → Aa | b
A → Ac | Sd | ε
S is left-recursive because S ⇒ Aa ⇒ Sda, but it
is not immediately left recursive

25
Elimination of Left Recursion (Cont.)

Algorithm. Eliminate left recursion
Arrange nonterminals in some order A1, A2, ..., An
for i = 1 to n
  for j = 1 to i - 1
    replace each production of the form Ai → Aj γ
    by the productions Ai → δ1 γ | δ2 γ | … | δk γ
    where Aj → δ1 | δ2 | … | δk are all the
    current Aj-productions
  eliminate the immediate left recursion among the
  Ai-productions
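
A possible rendering of this algorithm, applied to the grammar S → Aa | b, A → Ac | Sd | ε from the previous slide. The dict-of-lists grammar representation, the primed names for new nonterminals, and the function name are assumptions of this sketch. In general the algorithm is only guaranteed to work for grammars without cycles or ε-productions; this particular example happens to come out fine.

```python
def eliminate_left_recursion(grammar, order):
    """grammar: dict nonterminal -> list of right-hand sides (lists of symbols).

    `order` lists the nonterminals A1..An. The empty list stands for epsilon.
    """
    g = {a: [list(r) for r in rhss] for a, rhss in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            new = []
            for rhs in g[ai]:
                if rhs and rhs[0] == aj:
                    # Ai -> Aj gamma  becomes  Ai -> delta gamma
                    # for every current production Aj -> delta.
                    new.extend(delta + rhs[1:] for delta in g[aj])
                else:
                    new.append(rhs)
            g[ai] = new
        # Eliminate immediate left recursion among the Ai-productions.
        alphas = [r[1:] for r in g[ai] if r and r[0] == ai]
        if alphas:
            betas = [r for r in g[ai] if not r or r[0] != ai]
            ai2 = ai + "'"
            g[ai] = [b + [ai2] for b in betas]
            g[ai2] = [al + [ai2] for al in alphas] + [[]]
    return g


grammar = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], []],
}
result = eliminate_left_recursion(grammar, ["S", "A"])
for head, rhss in result.items():
    print(head, "->", " | ".join(" ".join(r) or "ε" for r in rhss))
```

With the order S, A the result is S → A a | b, A → b d A' | A', A' → c A' | a d A' | ε.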

26
Elimination of Left Recursion (Cont.)

To show that the previous algorithm actually
works, notice that iteration i only changes
productions with Ai on the left-hand side, and
that afterwards m > i in all productions of the
form Ai → Am α
Induction proof
Clearly true for i = 1
If it is true for all i < k, then when the outer
loop is executed for i = k, the inner loop will
remove all productions Ai → Am α with m < i
Finally, with the elimination of self recursion,
m in the Ai → Am α productions is forced to be > i
At the end of the algorithm, all derivations of
the form Ai ⇒ Am α will have m > i and therefore
left recursion would not be possible

27
Left Factoring

Left factoring helps transform a grammar for
predictive parsing
For example, if we have the two productions
S → if b then S else S
  | if b then S
on seeing the input token if, we cannot
immediately tell which production to choose to
expand S
In general, if we have A → αβ1 | αβ2 and the
input begins with a string derived from α, we do
not know (without looking further) which
production to use to expand A

28
Left Factoring (Cont.)

However, we may defer the decision by expanding A
to αA'
Then after seeing the input derived from α, we
may expand A' to β1 or to β2
Left-factored, the original productions become
A → αA'
A' → β1 | β2
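
One round of left factoring might look like the sketch below, which groups the alternatives of a nonterminal by their first symbol and pulls out the longest common prefix of each group. The list-of-symbols representation, the primed names, and the function names are assumptions made for illustration; the empty list stands for ε.

```python
def common_prefix(seqs):
    """Longest common prefix of several symbol lists."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(a, rhss):
    """One round of left factoring for the productions of nonterminal `a`."""
    # Group alternatives by their first symbol (None for epsilon).
    groups = {}
    for rhs in rhss:
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)
    result = {a: []}
    n = 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[a].extend(group)  # nothing to factor
            continue
        alpha = common_prefix(group)
        n += 1
        a2 = a + "'" * n
        result[a].append(alpha + [a2])                     # A  -> alpha A'
        result[a2] = [rhs[len(alpha):] for rhs in group]   # A' -> beta_1 | beta_2
    return result


rhss = [
    ["if", "b", "then", "S", "else", "S"],
    ["if", "b", "then", "S"],
    ["a"],
]
print(left_factor("S", rhss))
```

For the dangling-else productions this yields S → if b then S S' | a and S' → else S | ε.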

29
Non-Context-Free Language Constructs

Examples of non-context-free languages are
L1 = {wcw | w is of the form (a|b)*}
L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
L3 = {a^n b^n c^n | n ≥ 0}
Languages similar to these that are context free
L1' = {wcw^R | w is of the form (a|b)*} (w^R stands
for w reversed)
This language is generated by the grammar
S → aSa | bSb | c
L2' = {a^n b^m c^m d^n | n ≥ 1 and m ≥ 1}
This language is generated by the grammar
S → aSd | aAd
A → bAc | bc

30
Non-Context-Free Language Constructs (Cont.)

L2'' = {a^n b^n c^m d^m | n ≥ 1 and m ≥ 1}
is generated by the grammar
S → AB
A → aAb | ab
B → cBd | cd
L3' = {a^n b^n | n ≥ 1}
is generated by the grammar
S → aSb | ab
This language is not definable by any regular
expression

31
Non-Context-Free Language Constructs (Cont.)

Suppose we could construct a DFSM D accepting
L3' = {a^n b^n | n ≥ 1}.
D must have a finite number of states, say k.
Consider the sequence of states s0, s1, s2, …, sk
entered by D having read ε, a, aa, …, a^k.
Since D only has k states, two of the states in
the sequence have to be equal. Say, si = sj (i ≠
j, with i ≥ 1).
From si, a sequence of i b's leads to an accepting
(final) state. Therefore, the same sequence of i
b's will also lead to an accepting state from sj.
Therefore D would accept a^j b^i, which means that
the language accepted by D is not identical to
L3'. A contradiction.

32
Parsing

The parsing problem is: given a string of tokens
w, find a parse tree whose frontier is w.
(Equivalently, find a derivation of w)
A parser for a grammar G reads a list of tokens
and finds a parse tree if they form a sentence
(or reports an error otherwise)
Two classes of algorithms for parsing:
Top-down
Bottom-up

33
Parser generators

A parser generator is a program that reads a
grammar and produces a parser
The best known parser generator is yacc. It
produces bottom-up parsers
Most parser generators, including yacc, do not
work for every CFG; they accept a restricted
class of CFGs that can be parsed efficiently
using the method employed by that parser generator

34
Top-down parsing

Starting from parse tree containing just S, build
tree down toward input. Expand left-most
non-terminal.
Algorithm (next slide)

35
Top-down parsing (cont.)

Let input = a1a2...an
current sentential form (csf) = S
loop
  suppose csf = a1…ak A α
  based on ak+1, choose production
  A → β
  csf becomes a1…ak β α
36
Top-down parsing example

Grammar H: L → E L | E
           E → a | b
Input: ab

Parse tree    Sentential form    Input
(figure)      L                  ab
(figure)      E L                ab
(figure)      a L                ab
37
Top-down parsing example (cont.)

Parse tree    Sentential form    Input
(figure)      a E                ab
(figure)      a b                ab
38
LL(1) parsing

Efficient form of top-down parsing
Use only first symbol of remaining input (ak+1)
to choose next production. That is, employ a
function M: Σ × N → P in the "choose production"
step of algorithm.
When this is possible, grammar is called LL(1)

39
LL(1) examples

Example 1
H: L → E L | E    E → a | b
Given input ab, so next symbol is a.
Which production to use? Can't tell.
⇒ H not LL(1)

40
LL(1) examples

Example 2
Exp → Term Exp'
Exp' → $ | + Exp
Term → id
(Use $ for end-of-input symbol.)

Grammar is LL(1): Exp and Term have only one
production; Exp' has two productions but only
one is applicable at any time.
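
A recursive-descent recognizer shows how one token of lookahead suffices here. The sketch assumes the grammar reads Exp → Term Exp', Exp' → $ | + Exp, Term → id, with + as the operator and $ as the end marker (both symbols were lost in the transcript, so they are assumptions); parse_exp and the inner function names are hypothetical.

```python
def parse_exp(tokens):
    """Predictive recognizer for
         Exp  -> Term Exp'
         Exp' -> $ | + Exp
         Term -> id
    Returns the productions used, in order; raises SyntaxError on bad input.
    """
    toks = list(tokens) + ["$"]  # append the end-of-input marker
    pos = 0
    used = []

    def peek():
        return toks[pos]

    def match(t):
        nonlocal pos
        if toks[pos] != t:
            raise SyntaxError(f"expected {t!r}, got {toks[pos]!r}")
        pos += 1

    def exp():
        used.append("Exp -> Term Exp'")
        term()
        exp_rest()

    def exp_rest():
        # The single lookahead token decides between the two productions.
        if peek() == "$":
            used.append("Exp' -> $")
            match("$")
        elif peek() == "+":
            used.append("Exp' -> + Exp")
            match("+")
            exp()
        else:
            raise SyntaxError(f"unexpected {peek()!r}")

    def term():
        used.append("Term -> id")
        match("id")

    exp()
    return used


for production in parse_exp(["id", "+", "id"]):
    print(production)
```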
41
Nonrecursive predictive parsing

Maintain a stack explicitly, rather than
implicitly via recursive calls
Key problem during predictive parsing
determining the production to be applied for a
non-terminal

42
Nonrecursive predictive parsing

Algorithm. Nonrecursive predictive parsing
Set ip to point to the first symbol of w$.
repeat
  Let X be the top of the stack symbol and a the
  symbol pointed to by ip
  if X is a terminal or $ then
    if X = a then
      pop X from the stack and advance ip
    else error()
  else // X is a nonterminal
    if M[X,a] = X → Y1 Y2 … Yk then
      pop X from the stack
      push Yk, Yk-1, …, Y1 onto the stack with Y1 on
      top
      (push nothing if Y1 Y2 … Yk is ε)
      output the production X → Y1 Y2 … Yk
    else error()
until X = $
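
The loop above can be sketched with an explicit stack and a table stored as a dictionary. The example table is for a left-factored variant of grammar H, namely L → E L', L' → L | ε, E → a | b; that variant, the table encoding, and the name ll1_parse are assumptions of this sketch rather than something given in the slides.

```python
def ll1_parse(table, start, terminals, tokens):
    """Table-driven predictive parser.

    `table` maps (nonterminal, lookahead) to a right-hand side (list of
    symbols, empty for epsilon). Returns the list of productions used.
    """
    w = list(tokens) + ["$"]
    ip = 0
    stack = ["$", start]  # the top of the stack is the end of the list
    output = []
    while True:
        x, a = stack[-1], w[ip]
        if x in terminals or x == "$":
            if x != a:
                raise SyntaxError(f"expected {x!r}, got {a!r}")
            stack.pop()
            ip += 1
            if x == "$":  # until X = $
                break
        else:
            rhs = table.get((x, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{x},{a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # Y1 ends up on top
            output.append((x, rhs))
    return output


table = {
    ("L", "a"): ["E", "L'"], ("L", "b"): ["E", "L'"],
    ("L'", "a"): ["L"], ("L'", "b"): ["L"], ("L'", "$"): [],
    ("E", "a"): ["a"], ("E", "b"): ["b"],
}
for head, rhs in ll1_parse(table, "L", {"a", "b"}, ["a", "b"]):
    print(head, "->", " ".join(rhs) or "ε")
```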

43
LL(1) grammars

No left recursion
A → Aα: If this production is chosen, parse
makes no progress.
No common prefixes
A → αβ | αγ
Can fix by left factoring
A → αA'
A' → β | γ

44
LL(1) grammars (cont.)

No ambiguity
Precise definition requires that production to
choose be unique (choose function M very hard
to calculate otherwise)

45
Top-down Parsing
[Figure: parse tree with root L (the start symbol) and children E0 … En, shown over the input tokens <t0, t1, …, ti, ...> and again over the remaining tokens <ti, ...>]
From left to right, grow the parse tree
downwards
46
Checking LL(1)-ness

For any sequence of grammar symbols α, define set
FIRST(α) ⊆ Σ to be
FIRST(α) = { a | α ⇒* aβ for some β }

47
LL(1) definition

Define: Grammar G = (N, Σ, P, S) is LL(1) iff
whenever there are two left-most derivations (in
which the leftmost non-terminal is always
expanded first)
S ⇒* wAα ⇒ wβα ⇒* wtx
S ⇒* wAα ⇒ wγα ⇒* wty
it follows that β = γ
In other words, given
1. a string wAα in V* and
2. t, the first terminal symbol to be derived
from Aα
there is at most one production that can be
applied to A to
yield a derivation of any terminal string
beginning with wt
FIRST sets can often be calculated by inspection

48
FIRST Sets
Exp → Term Exp'    Exp' → $ | + Exp    Term → id
(Use $ for end-of-input symbol)
FIRST($) = {$}    FIRST(+ Exp) = {+}
FIRST($) ∩ FIRST(+ Exp) = {}  ⇒  grammar is LL(1)
49
FIRST Sets
L → E L | E    E → a | b

FIRST(E L) = {a, b} = FIRST(E)
FIRST(E L) ∩ FIRST(E) ≠ {}  ⇒  grammar not LL(1).
50
Computing FIRST Sets

Algorithm. Compute FIRST(X) for all grammar
symbols X
forall X ∈ V do FIRST(X) = {}
forall X ∈ Σ (X is a terminal) do FIRST(X) = {X}
forall productions X → ε do FIRST(X) = FIRST(X) ∪ {ε}
repeat
  c: forall productions X → Y1Y2…Yk do
       forall i ∈ [1,k] do
         FIRST(X) = FIRST(X) ∪ (FIRST(Yi) - {ε})
         if ε ∉ FIRST(Yi) then continue c
       FIRST(X) = FIRST(X) ∪ {ε}
until no more terminals or ε are added to any
FIRST set
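
The fixed-point iteration can be written compactly as below. The sketch assumes a grammar given as a dict from nonterminals to lists of right-hand sides (lists of symbols, empty for ε); the names first_sets and EPS are hypothetical. Python's for/else plays the role of "continue c": the else branch runs only when no Yi stopped the scan.

```python
EPS = "ε"


def first_sets(grammar, terminals):
    """Compute FIRST(X) for every grammar symbol X by fixed-point iteration."""
    first = {t: {t} for t in terminals}
    for x in grammar:
        first[x] = set()
    changed = True
    while changed:
        changed = False
        for x, rhss in grammar.items():
            for rhs in rhss:
                before = len(first[x])
                for y in rhs:
                    first[x] |= first[y] - {EPS}
                    if EPS not in first[y]:
                        break  # Yi cannot vanish: stop scanning this production
                else:
                    # Every Yi can derive epsilon (or the right-hand side is empty).
                    first[x].add(EPS)
                if len(first[x]) != before:
                    changed = True
    return first


# Grammar: A -> T x | T y,  T -> w | epsilon
grammar = {"A": [["T", "x"], ["T", "y"]], "T": [["w"], []]}
first = first_sets(grammar, {"w", "x", "y"})
print(sorted(first["A"]), sorted(first["T"]))
```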

51
FIRST Sets of Strings of Symbols

FIRST(X1X2…Xn) is the union of FIRST(X1) and all
FIRST(Xi) such that ε ∈ FIRST(Xk) for k = 1, 2,
…, i-1
FIRST(X1X2…Xn) contains ε iff ε ∈ FIRST(Xk) for k
= 1, 2, …, n

52
FIRST Sets do not Suffice

Given the productions
A → T x
A → T y
T → w
T → ε
T → w should be applied when the next input token
is w.
T → ε should be applied whenever the next terminal
is either x or y

53
FOLLOW Sets

For any nonterminal X, define the set FOLLOW(X) ⊆
Σ as
FOLLOW(X) = { a | S ⇒* αXaβ }

54
Computing the FOLLOW Set

Algorithm. Compute FOLLOW(X) for all nonterminals
X
FOLLOW(S) = {$}
forall productions A → αBβ do
  FOLLOW(B) = FOLLOW(B) ∪ (FIRST(β) - {ε})
repeat
  forall productions A → αB or A → αBβ with ε ∈
  FIRST(β) do
    FOLLOW(B) = FOLLOW(B) ∪ FOLLOW(A)
until all FOLLOW sets remain the same
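
One way to code this is sketched below. It folds both rules into a single right-to-left pass over each right-hand side, which is a reformulation of the algorithm above rather than a literal transcription: `trailer` holds the terminals that can follow the symbol currently being visited. The FIRST sets of the nonterminals are taken as input, and follow_sets and EPS are hypothetical names.

```python
EPS = "ε"


def follow_sets(grammar, start, first):
    """Compute FOLLOW(X) for all nonterminals X.

    grammar: dict nonterminal -> list of right-hand sides (lists of symbols).
    first:   dict nonterminal -> FIRST set (may contain EPS).
    """
    follow = {a: set() for a in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                # At the right end, whatever follows A follows the last symbol.
                trailer = set(follow[a])
                for x in reversed(rhs):
                    if x in grammar:  # x is a nonterminal
                        if not trailer <= follow[x]:
                            follow[x] |= trailer
                            changed = True
                        if EPS in first[x]:
                            # x may vanish, so the old trailer still applies.
                            trailer = trailer | (first[x] - {EPS})
                        else:
                            trailer = first[x] - {EPS}
                    else:  # x is a terminal
                        trailer = {x}
    return follow


# Grammar: A -> T x | T y,  T -> w | epsilon
grammar = {"A": [["T", "x"], ["T", "y"]], "T": [["w"], []]}
first = {"A": {"w", "x", "y"}, "T": {"w", EPS}}
print(follow_sets(grammar, "A", first))
```

For this grammar FOLLOW(T) = {x, y}, which is exactly the information needed to decide when to apply T → ε.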

55
Construction of a predictive parsing table

Algorithm. Construction of a predictive parsing
table
M[:,:] = {}
forall productions A → α do
  forall terminals a ∈ FIRST(α) do
    M[A,a] = M[A,a] ∪ {A → α}
  if ε ∈ FIRST(α) then
    forall b ∈ FOLLOW(A) do
      M[A,b] = M[A,b] ∪ {A → α}
Make all empty entries of M be error
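
A sketch of the table construction, taking precomputed FIRST and FOLLOW sets as input. Missing dictionary keys play the role of error entries, and an entry holding more than one production signals that the grammar is not LL(1). The data layout and the names first_of and build_table are assumptions; the example grammars are grammar H (L → E L | E, E → a | b) and a left-factored variant of it.

```python
EPS = "ε"


def first_of(symbols, first):
    """FIRST of a string of symbols, given FIRST sets of the nonterminals."""
    out = set()
    for x in symbols:
        fx = first.get(x, {x})  # a terminal's FIRST set is the terminal itself
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)  # every symbol can vanish (or the string is empty)
    return out


def build_table(grammar, first, follow):
    """Return M as a dict: (nonterminal, terminal) -> list of right-hand sides."""
    table = {}
    for a, rhss in grammar.items():
        for rhs in rhss:
            f = first_of(rhs, first)
            lookaheads = f - {EPS}
            if EPS in f:
                lookaheads |= follow[a]
            for t in lookaheads:
                table.setdefault((a, t), []).append(rhs)
    return table


# Left-factored variant: L -> E L',  L' -> L | epsilon,  E -> a | b
g = {"L": [["E", "L'"]], "L'": [["L"], []], "E": [["a"], ["b"]]}
first = {"L": {"a", "b"}, "L'": {"a", "b", EPS}, "E": {"a", "b"}}
follow = {"L": {"$"}, "L'": {"$"}, "E": {"a", "b", "$"}}
table = build_table(g, first, follow)
print(all(len(entry) == 1 for entry in table.values()))  # prints True

# Grammar H itself: L -> E L | E,  E -> a | b
h = {"L": [["E", "L"], ["E"]], "E": [["a"], ["b"]]}
h_table = build_table(h, {"L": {"a", "b"}, "E": {"a", "b"}},
                      {"L": {"$"}, "E": {"a", "b", "$"}})
print(h_table[("L", "a")])  # two productions compete for the same entry
```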

56
Another Definition of LL(1)

Define: Grammar G is LL(1) if for every A ∈ N
with productions A → α1 | . . . | αn
FIRST(αi FOLLOW(A)) ∩ FIRST(αj FOLLOW(A)) = ∅
for all i ≠ j

57
Regular Languages

Definition. A regular grammar is one whose
productions are all of the type
A → aB
A → a
A Regular Expression is either
a
R1 | R2
R1 R2
R*

58
Nondeterministic Finite State Automaton
[Figure: NFA with start state 0 and states 1, 2, 3, with transitions labeled a and b]
59
Regular Languages

Theorem. The classes of languages
Generated by a regular grammar
Expressed by a regular expression
Recognized by a NDFS automaton
Recognized by a DFS automaton
coincide.

60
Deterministic Finite Automaton
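[Figure: DFA for a scanner. From START, space, tab, and new line loop back to START; digit leads to NUM, which loops on digit; letter leads to KEYWORD; operator characters (including -, /, (, )) lead to OPERATOR]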
61
Scanner code

state = start
loop
  if no input character buffered then read
  one, and add it to the accumulated token
  case state of
    start:
      case input_char of
        A..Z, a..z: state = id
        0..9: state = num
        else ...
      end
    id:
      case input_char of
        A..Z, a..z: state = id
        0..9: state = id
        else ...
      end
    num:
      case input_char of
        0..9: ...