Compilers Modern Compiler Design

Slides: 122

Provided by: wan145

1
Compilers: Modern Compiler Design

3. Syntax Analysis

Introduction to parsing methods
Creating a top-down parser manually
Creating a top-down parser automatically
Creating a bottom-up parser automatically
Parser Generator Tools
NCYU C. H. Wang
2
Introduction

Context-free Grammar
The syntax of programming language constructs can
be described by a context-free grammar.
Important aspects
A grammar serves to impose a structure on the
linear sequence of tokens which is the program.
Using techniques from the field of formal
languages, a grammar can be employed to construct
a parser for it automatically.
Grammars help programmers write syntactically
correct programs and provide answers to detailed
questions about the syntax.

3
Definitions of CFG

A context-free grammar consists of terminals,
nonterminals, a start symbol and productions.
Terminals are the basic symbols from which
strings are formed.
Nonterminals are syntactic variables that denote
sets of strings.
In a grammar, one nonterminal is distinguished as
the start symbol, and the set of strings it
denotes is the language defined by the grammar.
The productions of a grammar specify the manner
in which the terminals and nonterminals can be
combined to form strings. Each production
consists of a nonterminal, followed by an arrow,
followed by a string of nonterminals and terminals.
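The four components of a context-free grammar can be written down directly as data. The sketch below is only an illustration: the class name `Grammar`, its field names and the toy expression grammar are assumptions made for this example, not something taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical container for the four parts of a context-free grammar."""
    terminals: frozenset
    nonterminals: frozenset
    start: str
    productions: dict  # nonterminal -> list of right-hand sides (tuples of symbols)

    def is_well_formed(self) -> bool:
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and set(self.productions) <= self.nonterminals
            and all(sym in symbols
                    for alternatives in self.productions.values()
                    for rhs in alternatives
                    for sym in rhs)
        )


# Toy grammar (assumed for illustration): expressions over identifiers and '+'.
toy = Grammar(
    terminals=frozenset({"IDENTIFIER", "+", "(", ")"}),
    nonterminals=frozenset({"expression", "term"}),
    start="expression",
    productions={
        "expression": [("term",), ("expression", "+", "term")],
        "term": [("IDENTIFIER",), ("(", "expression", ")")],
    },
)
```

Each production is stored as a left-hand nonterminal mapped to its alternatives, which mirrors the "nonterminal, arrow, string of symbols" shape described above.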

4
The role of the parser
5
Two approaches

Deterministic left-to-right top-down: the LL method
Deterministic left-to-right bottom-up: the LR method
Left-to-right: the sequence of tokens is processed
from left to right.
Deterministic: no searching is involved; each token
brings the parser one step closer to the goal of
constructing the syntax tree.

6
Speed issue

The deterministic parsing methods require an
amount of time that is a linear function of the
length of the input: they are linear-time methods.
A grammar copied as is from a language manual
has a very small chance of leading to a
deterministic method, unless of course the
language designer has taken pains to make the
grammar match such a method.
Allowing some searching to take place:
The algorithm can then handle all grammars.
These algorithms are no longer linear-time.

7
Non-ambiguous

A grammar for which a deterministic parser can be
generated is guaranteed to be non-ambiguous
Since an arbitrary grammar will often fail to
match one of the standard parsing methods, it is
important to have techniques to transform the
grammar into a non-ambiguous form.
We will assume that the grammar of the
programming language is non-ambiguous.
That implies that to each input program there
belongs either one syntax tree or no syntax tree
(the program contains one or more errors)

8
Two classes of parsing methods

Syntax tree

9
Pre-order and post-order (1)

The top-down method constructs the syntax tree in
pre-order
The bottom-up method constructs the syntax tree
in post-order

10
Pre-order and post-order (2)
11
Principles of top-down parsing

The main task of a top-down parser is to choose
the correct alternatives for known non-terminals

12
Principles of bottom-up parsing

The main task of a bottom-up parser is to
repeatedly find the first node all of whose
children have already been constructed.

13
Error detection and error recovery

The position at which the error is detected may be
unrelated to the position of the actual error the
user made.
Example
x = a(p+q( - b(r-s);

14
Error recovery

Two strategies
Error correction
Modifies the input token stream and/or the
parser's internal state so that parsing can
continue.
Non-correcting error recovery
Does not modify the input stream, but rather
discards all parser information and continues
parsing the rest of the program with a grammar
for the rest of the program (called a suffix grammar).

15
Creating a top-down parser manually

Recursive descent parsing
Simplest way but has its limitations

16
Recursive descent parsing program (1)
17
Recursive descent parsing program (2)
18
Drawbacks

Three drawbacks
There is still some searching through the
alternatives
The method often fails to produce a correct
parser
Error handling leaves much to be desired

19
Second problem (1)

Example 1
Index_element will never be tried
IDENTIFIER

20
Second problem (2)

Example 2
The recognizer will not recognize ab

21
Second problem (3)

Example 3
Recursive descent parsers cannot handle
left-recursive grammars

22
Creating a top-down parser automatically

The principles of constructing a top-down parser
automatically derive from those of writing one by
hand, by applying precomputation.
Grammars which allow the construction of a
top-down parser to be performed are called LL(1)
grammars.

23
LL(1) parsing

FIRST set
The sets of first tokens produced by all
alternatives in the grammar.
We have to precompute the FIRST sets of all
non-terminals
The FIRST sets of the terminals are obvious.
Finding FIRST(α) is trivial when α starts with a
terminal.
FIRST(N) is the union of the FIRST sets of its
alternatives.

24
Predictive recursive descent parser

The FIRST sets can be used in the construction of
a predictive parser, so called because it predicts the
presence of a given alternative without trying to
find out if it is there.

25
Closure algorithm for computing the FIRST set (1)

Data definitions

26
Closure algorithm for computing the FIRST set (2)

Initializations

27
Closure algorithm for computing the FIRST set (3)

Inference rules
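The data definitions, initializations and inference rules of the closure algorithm can be sketched as a fixed-point loop. This is a minimal sketch under stated assumptions: the dictionary encoding of the grammar, the `EPS` marker, the function name `first_sets`, and the small expression grammar used as input are all choices made for this example.

```python
EPS = "ε"  # assumed marker for the empty string


def first_sets(grammar):
    """grammar: dict non-terminal -> list of alternatives (tuples of symbols).
    Any symbol that is not a key of the dict is treated as a terminal."""
    first = {n: set() for n in grammar}            # initialization: all empty

    def first_of_seq(seq):
        result = set()
        for sym in seq:
            sym_first = first[sym] if sym in grammar else {sym}
            result |= sym_first - {EPS}
            if EPS not in sym_first:
                return result                      # this symbol cannot vanish
        result.add(EPS)                            # every symbol was nullable
        return result

    changed = True
    while changed:                                 # apply the rules until stable
        changed = False
        for n, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_seq(alt)
                if not new <= first[n]:
                    first[n] |= new                # FIRST(N) is the union over alternatives
                    changed = True
    return first


# Assumed example grammar; () denotes the empty alternative.
grammar = {
    "input": [("expression", "EOF")],
    "expression": [("term", "rest_expression")],
    "term": [("IDENTIFIER",), ("(", "expression", ")")],
    "rest_expression": [("+", "expression"), ()],
}
first = first_sets(grammar)
```

The loop terminates because the sets only grow and the number of tokens is finite.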

28
FIRST sets example(1)

Grammar

29
FIRST sets example(2)

The initial FIRST sets

30
FIRST sets example(3)

The final FIRST sets

31
The predictive parser (1)
32
The predictive parser (2)
33
Practice

Find the FIRST sets of all alternatives of the
following grammar.
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

34
Nullable alternatives

A complication arises with the case label for the
empty alternative (e.g. rest_expression). Since it
does not itself start with any token, how can we
decide whether it is the correct alternative?

35
FOLLOW sets

FOLLOW sets
Determine the set of tokens that can
immediately follow a given non-terminal N.
LL(1) parser
LL: the parser works from Left to right,
identifying the nodes in what is called Leftmost
derivation order.
(1): all choices are based on a one-token
look-ahead.

36
Closure algorithm for computing the FOLLOW sets
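A possible shape for this closure algorithm is sketched below. It assumes the FIRST sets are already available and are passed in; the grammar encoding, the function names and the example grammar are assumptions made for this illustration.

```python
EPS = "ε"  # assumed marker for the empty string


def first_of_seq(seq, first):
    """FIRST of a symbol sequence, given the FIRST sets of the non-terminals."""
    result = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})          # terminal t: FIRST(t) = {t}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def follow_sets(grammar, first):
    follow = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:         # only non-terminals have FOLLOW sets
                        continue
                    rest = first_of_seq(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:                # the tail can vanish: inherit FOLLOW(lhs)
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


# Assumed example grammar with its (precomputed) FIRST sets.
grammar = {
    "input": [("expression", "EOF")],
    "expression": [("term", "rest_expression")],
    "term": [("IDENTIFIER",), ("(", "expression", ")")],
    "rest_expression": [("+", "expression"), ()],
}
first = {
    "input": {"IDENTIFIER", "("},
    "expression": {"IDENTIFIER", "("},
    "term": {"IDENTIFIER", "("},
    "rest_expression": {"+", EPS},
}
follow = follow_sets(grammar, first)
```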
37
The first and follow sets
38
Recall the predictive parser
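rest_expression → '+' expression | ε
FIRST(rest_expr) = { '+', ε }
void rest_expression(void) {
    switch (Token.class) {
    case '+':
        token('+'); expression(); break;
    case EOF: case ')':
        break;   /* empty alternative: look-ahead is in FOLLOW(rest_expr) */
    default:
        error();
    }
}
FOLLOW(rest_expr) = { EOF, ')' }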
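The same idea can be tried out as a complete recognizer. The sketch below assumes the small expression grammar input → expression EOF, expression → term rest_expression, term → IDENTIFIER | '(' expression ')', rest_expression → '+' expression | ε; the class and function names are hypothetical, and tokens are represented by their class names only.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOF"]
        self.pos = 0

    @property
    def look(self):
        return self.tokens[self.pos]               # the one-token look-ahead

    def match(self, expected):
        if self.look != expected:
            raise ParseError(f"expected {expected}, got {self.look}")
        self.pos += 1

    def input(self):
        self.expression()
        self.match("EOF")

    def expression(self):
        self.term()
        self.rest_expression()

    def term(self):
        if self.look == "IDENTIFIER":
            self.match("IDENTIFIER")
        elif self.look == "(":
            self.match("(")
            self.expression()
            self.match(")")
        else:
            raise ParseError(f"unexpected {self.look}")

    def rest_expression(self):
        if self.look == "+":                       # FIRST of the non-empty alternative
            self.match("+")
            self.expression()
        elif self.look in ("EOF", ")"):            # FOLLOW(rest_expression): pick ε
            pass
        else:
            raise ParseError(f"unexpected {self.look}")


def accepts(tokens):
    try:
        Parser(tokens).input()
        return True
    except ParseError:
        return False
```

For instance, `accepts(["IDENTIFIER", "+", "(", "IDENTIFIER", ")"])` exercises both alternatives of `rest_expression`: the '+' alternative once, and the empty alternative before ')' and before EOF.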
39
LL(1) conflicts

Example
The codes

40
LL(1) conflicts

FIRST/FIRST conflict
term → IDENTIFIER
     | IDENTIFIER '[' expression ']'
     | '(' expression ')'

41
LL(1) conflicts

FIRST/FOLLOW conflict
                 FIRST set    FOLLOW set
S → A a b        { a }        { }
A → a | ε        { a, ε }     { a }

42
LL(1) conflicts

left recursion
expression → expression '-' term | term
Look-ahead token
The LL(1) method predicts the alternative Ak for a
non-terminal N when the look-ahead token is in
FIRST(Ak) ∪ (if Ak is nullable then FOLLOW(N))
LL(1) grammar
No FIRST/FIRST conflicts
No FIRST/FOLLOW conflicts
No multiple nullable alternatives:
no non-terminal can have more than one nullable
alternative.
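These conditions can be checked mechanically once FIRST and FOLLOW are known: compute the prediction set of every alternative and report any token claimed by two alternatives of the same non-terminal. The sketch below assumes precomputed FIRST and FOLLOW sets; the function names and the two-rule toy grammar are assumptions chosen to show a FIRST/FOLLOW conflict.

```python
EPS = "ε"  # assumed marker for the empty string


def first_of_seq(seq, first):
    result = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})          # terminal t: FIRST(t) = {t}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def ll1_conflicts(grammar, first, follow):
    """Return the (non-terminal, token) pairs predicted by more than one alternative."""
    conflicts = set()
    for n, alternatives in grammar.items():
        seen = set()
        for alt in alternatives:
            predict = first_of_seq(alt, first)
            if EPS in predict:                     # nullable alternative: use FOLLOW(N)
                predict = (predict - {EPS}) | follow[n]
            conflicts |= {(n, tok) for tok in predict & seen}
            seen |= predict
    return conflicts


# Toy grammar (assumed): S -> A a b ; A -> a | ε
toy = {"S": [("A", "a", "b")], "A": [("a",), ()]}
toy_first = {"S": {"a"}, "A": {"a", EPS}}
toy_follow = {"S": set(), "A": {"a"}}
toy_conflicts = ll1_conflicts(toy, toy_first, toy_follow)
```

An empty result means the grammar passes all three tests at once, since two nullable alternatives of one non-terminal would both be predicted by the same FOLLOW tokens.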

43
Solve the LL(1) conflicts

Two options
Use a stronger parser
Make the grammar LL(1)

44
Making a grammar LL(1)

manual labour
rewrite grammar
adjust semantic actions
three rewrite methods
left factoring
substitution
left-recursion removal

45
Left-factoring

term → IDENTIFIER
     | IDENTIFIER '[' expression ']'
factor out common prefix
term → IDENTIFIER after_identifier
after_identifier → ε | '[' expression ']'

The result is LL(1) only if '[' is not in FOLLOW(after_identifier).
46
Substitution

A → a | B c | ε
S → p A q
replace non-terminal by its alternatives
S → p a q | p B c q | p q
Example
S → A a b
A → a | ε
replace non-terminal by its alternatives
S → a a b | a b

47
Left-recursion removal

Three types of left-recursion
Direct left-recursion
N → N α
Indirect left-recursion
Chain structure
N → A ...
A → B ...
...
Z → N ...
Hidden left-recursion
N → α N ... (α can produce ε)

48
Left-recursion removal

N → N α | β
replace by
N → β M
M → α M | ε
example
expression → expression '-' term | term

(N generates β α α α ..., that is, β followed by zero or more α)
expression → term expression_tail_option
expression_tail_option → '-' term expression_tail_option | ε
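The rewrite of a directly left-recursive rule N → N α | β into N → β M, M → α M | ε is mechanical enough to sketch as a function. The function name, the tuple encoding of alternatives and the caller-supplied name for the new non-terminal M are assumptions for this illustration; alternatives of the form N → N (empty α) are assumed not to occur.

```python
def remove_direct_left_recursion(n, alternatives, tail_name):
    """Rewrite N -> N α | β into N -> β M ; M -> α M | ε.

    alternatives: list of tuples of symbols; () is the empty alternative.
    tail_name: name to use for the new non-terminal M (chosen by the caller).
    """
    alphas = [alt[1:] for alt in alternatives if alt[:1] == (n,)]
    betas = [alt for alt in alternatives if alt[:1] != (n,)]
    if not alphas:                                 # nothing to do
        return {n: list(alternatives)}
    return {
        n: [beta + (tail_name,) for beta in betas],
        tail_name: [alpha + (tail_name,) for alpha in alphas] + [()],
    }


rewritten = remove_direct_left_recursion(
    "expression",
    [("expression", "-", "term"), ("term",)],
    "expression_tail_option",
)
```

Applied to the expression rule, this yields expression → term expression_tail_option and expression_tail_option → '-' term expression_tail_option | ε.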
49
Practice

make the following grammar LL(1)
expression → expression '+' term | expression '-' term | term
term → term '*' factor | term '/' factor | factor
factor → '(' expression ')' | func-call | identifier | constant
func-call → identifier '(' expr-list? ')'
expr-list → expression (',' expression)*

50
Answers

substitution
F → '(' E ')' | ID '(' expr-list? ')' | ID | constant
left factoring
E → E ( '+' | '-' ) T | T
T → T ( '*' | '/' ) F | F
F → '(' E ')' | ID ( '(' expr-list? ')' )? | constant
left recursion removal
E → T (( '+' | '-' ) T )*
T → F (( '*' | '/' ) F )*

51
Undoing the semantic effects of grammar
transformations

While it is often possible to transform our
grammar into a new grammar that is acceptable by
a parser generator and that generates the same
language, the new grammar usually assigns a
different structure to strings in the language
than our original grammar did
Fortunately, in many cases we are not really
interested in the structure but rather in the
semantics implied by it.

52
Semantics
Non-left-recursive equivalent
53
Automatic conflict resolution (1)

There are two ways in which LL parsers can be
strengthened:
By increasing the look-ahead
Distinguishing alternatives not by their first
token but by their first two tokens is called
LL(2).
Disadvantage: the parser code can get much
bigger.
By allowing dynamic conflict resolvers
When a conflict arises during parsing, some
conditions are evaluated to resolve it.
The parser generator LLgen requires a conflict
resolver to be placed on the first of two
conflicting alternatives.

54
Automatic conflict resolution (2)

If-else statement in C
else_tail_option: both its FIRST set and its FOLLOW set
contain the token else
Conflict resolver

55
The LL(1) push-down automaton

Transition table for an LL(1) parser

56
Push-down automaton (PDA)

Types of moves
Prediction move
The top of the prediction stack is a non-terminal N.
N is removed from the stack.
The prediction table is looked up using N and the look-ahead token.
The selected alternative of N is pushed onto the prediction
stack.
Match move
The top of the prediction stack is a terminal; it must equal
the current input token, and both are removed.
Termination
Parsing terminates when the prediction stack is
exhausted.
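The prediction and match moves can be sketched as a small table-driven loop. The transition table below was written by hand for an assumed expression grammar (input → expression EOF, expression → term rest_expression, term → IDENTIFIER | '(' expression ')', rest_expression → '+' expression | ε); the function name and the table encoding are likewise assumptions.

```python
def ll1_parse(table, start, tokens):
    """table: dict (non-terminal, look-ahead token) -> tuple of symbols to push."""
    nonterminals = {n for n, _ in table}
    stack = [start]                                # top of the stack = end of the list
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in nonterminals:                    # prediction move
            if (top, look) not in table:
                return False
            stack.extend(reversed(table[(top, look)]))
        else:                                      # match move
            if top != look:
                return False
            pos += 1
    return pos == len(tokens)                      # stack exhausted, input consumed


TABLE = {
    ("input", "IDENTIFIER"): ("expression", "EOF"),
    ("input", "("): ("expression", "EOF"),
    ("expression", "IDENTIFIER"): ("term", "rest_expression"),
    ("expression", "("): ("term", "rest_expression"),
    ("term", "IDENTIFIER"): ("IDENTIFIER",),
    ("term", "("): ("(", "expression", ")"),
    ("rest_expression", "+"): ("+", "expression"),
    ("rest_expression", "EOF"): (),
    ("rest_expression", ")"): (),
}

# Token classes for an input of the shape: name + ( name + name ) EOF
tokens = ["IDENTIFIER", "+", "(", "IDENTIFIER", "+", "IDENTIFIER", ")", "EOF"]
accepted = ll1_parse(TABLE, "input", tokens)
```

The alternative is pushed in reverse so that its first symbol ends up on top of the prediction stack.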

57
Prediction move in an LL(1) PDA
58
Match move in an LL(1) PDA
59
Predictive parsing with an LL(1) PDA
60
PDA example (1)
prediction stack: input
input: aap + ( noot + mies ) EOF
61
PDA example (2)
prediction stack: input
input: aap + ( noot + mies ) EOF
replace non-terminal by transition entry
62
PDA example (3)
prediction stack: expression EOF
input: aap + ( noot + mies ) EOF
63
PDA example (4)
prediction stack: expression EOF
input: aap + ( noot + mies ) EOF
replace non-terminal by transition entry
64
PDA example (5)
prediction stack: term rest-expr EOF
input: aap + ( noot + mies ) EOF
65
PDA example (6)
prediction stack: term rest-expr EOF
input: aap + ( noot + mies ) EOF
replace non-terminal by transition entry
66
PDA example (7)

Please continue!
Example of parsing (i+i)+i

67
LLgen

LLgen is part of the Amsterdam Compiler Kit.
It takes an LL(1) grammar plus semantic actions in C and
generates a recursive descent parser.
The non-terminals in the grammar can have
parameters, and rules can have local variables,
both again expressed in C.
LLgen features
repetition operators
advanced error handling
parameter passing
control over semantic actions
dynamic conflict resolvers

68
LLgen

start from LR(1) grammar
make grammar LL(1)
use repetition operators

%token DIGIT;
main : [ line ]+ ;
line : expr '\n' ;
expr : term [ '+' term ]* ;
term : factor [ '*' factor ]* ;
factor : '(' expr ')' | DIGIT ;

add semantic actions
attach parameters to grammar rules
insert C-code between the symbols
69
Minimal non-left-recursive grammar for expressions
70
LLgen code for a parser
Grammar
Semantics
71
LLgen code for a parser

The code from previous page resides in a file
called parser.g. LLgen converts the file to one
called parser.c, which contains a recursive
descent parser.

72
LLgen interface to lexical analyzer
73
LLgen interface to back-end

LLgen handles syntax errors by inserting missing
tokens and deleting unexpected tokens
LLmessage() is invoked to notify the lexical
analyzer

74
Creating a bottom-up parser automatically

Left-to-right parse, Rightmost-derivation
Create a node when all
children are present.
Handle: the nodes representing
the right-hand side of a
production

75
LR(0) Parsing

Theoretically important but too weak to be
useful.
Running example: expression grammar
input → expression EOF
expression → expression '+' term | term
term → IDENTIFIER | '(' expression ')'
Short-hand notation
Z → E $
E → E + T | T
T → i | ( E )

76
LR(0) Parsing

keep track of progress inside potential
handles when consuming input tokens
LR items: N → α • β
initial set

S0
Z → • E $   E → • E + T   E → • T   T → • i   T → • ( E )
77
ε-closure algorithm for LR(0)
The important part is the inference rule: it
predicts new handle hypotheses from the
hypothesis that we are looking for a certain
non-terminal, and is sometimes called the prediction
rule; it corresponds to an ε move, in that it
allows the automaton to move to another state
without consuming input.
Reduce item: an item with the dot at the end
Shift item: all the others
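The prediction rule can be sketched as a worklist closure over items. This is a minimal sketch that assumes the running example reads Z → E $, E → E + T | T, T → i | ( E ); the item encoding `(lhs, rhs, dot)` and the function names `closure` and `goto` are choices made for this illustration.

```python
def closure(items, grammar):
    """items: set of (lhs, rhs, dot); grammar: non-terminal -> list of rhs tuples."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:     # dot in front of a non-terminal
            for alt in grammar[rhs[dot]]:
                item = (rhs[dot], alt, 0)              # predicted item, dot at the start
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


def goto(items, symbol, grammar):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == symbol}
    return closure(moved, grammar)


def is_reduce_item(item):
    lhs, rhs, dot = item
    return dot == len(rhs)                             # dot at the end


grammar = {
    "Z": [("E", "$")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("i",), ("(", "E", ")")],
}
s0 = closure({("Z", ("E", "$"), 0)}, grammar)
```

Starting from the single item Z → • E $, the closure adds one item per alternative of E and of T, giving the five-item initial set.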
78
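Transition Diagram
[Figure: LR(0) transition diagram. States recoverable from the slide: S1 = { T → i • }, S2 = { E → T • }, S4 = { E → E + • T, T → • i, T → • ( E ) }, S6 = { Z → E $ • }; edges are labelled i, T and E.]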
79
LR(0) parsing example (1)
Grammar: Z → E $   E → E + T   E → T   T → i   T → ( E )
stack: S0
input: i + i $
shift input token (i) onto the stack
compute new state

80
LR(0) parsing example (2)
stack: S0 i S1
input: + i $
reduce handle on top of the stack
compute new state

81
LR(0) parsing example (3)
stack: S0 T S2
input: + i $
reduce handle on top of the stack
compute new state

82
LR(0) parsing example (4)
stack: S0 E S3
input: + i $
shift input token on top of the stack
compute new state

83
LR(0) parsing example (5)
stack: S0 E S3 + S4
input: i $
shift input token on top of the stack
compute new state

84
LR(0) parsing example (6)
stack: S0 E S3 + S4 i S1
input: $
reduce handle on top of the stack
compute new state

85
LR(0) parsing example (7)
stack: S0 E S3 + S4 T S5
input: $
reduce handle on top of the stack
compute new state

86
LR(0) parsing example (8)
stack: S0 E S3
input: $
shift input token on top of the stack
compute new state

87
LR(0) parsing example (9)
stack: S0 E S3 $ S6
input: (empty)
reduce handle on top of the stack
compute new state

88
LR(0) parsing example (10)
stack: S0 Z
input: (empty)
accept!
89
Precomputing the item set (1)

Initial item set

90
Precomputing the item set (2)

Next item set

91
Complete transition diagram
92
The LR push-down automaton

Two major moves and a minor move
Shift move
Removes the first token from the present input and
pushes it onto the stack.
Reduce move
N → α
The symbols of α are removed from the stack.
N is then pushed onto the stack.
Termination
The input has been parsed successfully when it
has been reduced to the start symbol.
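The shift, reduce and termination moves can be sketched with a hand-written LR(0) table. Everything here is an assumption made for illustration: the grammar is taken to be Z → E $, E → T, E → E + T, T → i, T → ( E ), states 0 to 6 are numbered to match the stack traces in the slides, states 7 to 9 (the parenthesis states) use assumed numbers, and only the state stack is kept because each state implies its symbol.

```python
RULES = {1: ("Z", 2), 2: ("E", 1), 3: ("E", 3), 4: ("T", 1), 5: ("T", 3)}  # lhs, rhs length
REDUCE = {1: 4, 2: 2, 5: 3, 6: 1, 9: 5}        # reduce state -> rule number
GOTO = {                                        # (state, symbol) -> next state
    (0, "i"): 1, (0, "("): 7, (0, "T"): 2, (0, "E"): 3,
    (3, "+"): 4, (3, "$"): 6,
    (4, "i"): 1, (4, "("): 7, (4, "T"): 5,
    (7, "i"): 1, (7, "("): 7, (7, "T"): 2, (7, "E"): 8,
    (8, "+"): 4, (8, ")"): 9,
}


def lr0_parse(tokens):
    states = [0]
    pos = 0
    while True:
        state = states[-1]
        if state in REDUCE:                             # reduce move
            lhs, length = RULES[REDUCE[state]]
            del states[len(states) - length:]           # pop the handle
            if lhs == "Z":                              # reduced to the start symbol
                return pos == len(tokens)
            nxt = GOTO.get((states[-1], lhs))
            if nxt is None:
                return False
            states.append(nxt)                          # push N
        else:                                           # shift move
            if pos >= len(tokens) or (state, tokens[pos]) not in GOTO:
                return False
            states.append(GOTO[(state, tokens[pos])])
            pos += 1
```

Because the table is LR(0), the action depends on the state alone: a state either reduces by one fixed rule or shifts, and the look-ahead is only inspected when shifting.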

93
GOTO and ACTION tables
94
LR(0) parsing of the input i + i
95
LR comments

The bottom-up parsing, unlike the top-down
parsing, has no problems with left-recursion.
On the other hand, bottom-up parsing has a slight
problem with right-recursion.

96
LR(0) conflicts (1)

shift-reduce conflict
array indexing: T → i [ E ]
T → i • [ E ] (shift)
T → i • (reduce)
ε-rule: RestExpr → ε
Expr → Term • RestExpr (shift)
RestExpr → • (reduce)

97
LR(0) conflicts (2)

reduce-reduce conflict
assignment statement: Z → V := E $
V → i • (reduce)
T → i • (reduce)
(Different reduce rules)
A typical LR(0) table contains many conflicts.

98
Handling LR(0) conflicts

Use a one-token look-ahead
Use a two-dimensional ACTION table
Different constructions of the ACTION table:
SLR(1): Simple LR
LR(1)
LALR(1): Look-Ahead LR

99
SLR(1) parsing

A handle should not be reduced to a non-terminal
N if the look-ahead is a token that cannot follow
N.
reduce N → α iff token ∈ FOLLOW(N)
FOLLOW(N)
FOLLOW(Z) = { $ }
FOLLOW(E) = { +, ), $ }
FOLLOW(T) = { +, ), $ }

100
SLR(1) ACTION table
shift
101
SLR(1) ACTION/GOTO table
1: Z → E $   2: E → T   3: E → E + T   4: T → i   5: T → ( E )
sn: shift to state n   rn: reduce by rule n
102
Example of resolving conflicts (1)

A new rule: T → i [ E ]

1: Z → E $   2: E → T   3: E → E + T   4: T → i   5: T → ( E )   6: T → i [ E ]
103
Example of resolving conflicts (2)
T → i •   T → i • [ E ]
104
Unfortunately

SLR(1) leaves many shift-reduce conflicts
unsolved.
Problem: the FOLLOW(N) set is the union of all
look-aheads of all alternatives of N in all
states.
Example
S → A | x b
A → a A b | B
B → x

FOLLOW(S) = { $ }   FOLLOW(A) = { b, $ }   FOLLOW(B) = { b, $ }

105
SLR(1) automaton
106
LR(1) parsing

The LR(1) technique does not rely on FOLLOW sets,
but rather keeps the specific look-ahead with
each item.
LR(1) item: N → α • β {σ}
ε-closure for LR(1) item sets:
if set S contains an item P → α • N β {σ}, then
for each production rule N → γ
S must contain the item N → • γ {τ}
where τ = FIRST(β {σ})

107
Creating look-ahead sets

Extended definition of FIRST sets
If FIRST(β) does not contain ε, FIRST(β{σ}) is
just equal to FIRST(β); if β can produce ε,
FIRST(β{σ}) contains all the tokens in FIRST(β),
excluding ε, plus the tokens in {σ}.
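The two cases of the extended definition translate into a few lines. In this sketch FIRST(β) is assumed to be given as a set that may contain the marker `EPS`; the function name is hypothetical.

```python
EPS = "ε"  # assumed marker for the empty string


def first_with_lookahead(first_beta, sigma):
    """FIRST(β{σ}), computed from FIRST(β) and the look-ahead set σ."""
    if EPS not in first_beta:
        return set(first_beta)                     # β cannot vanish: σ is irrelevant
    return (set(first_beta) - {EPS}) | set(sigma)  # β may vanish: σ shows through
```

For an item P → α • N {σ}, where β is empty, FIRST(β) is just {ε}, so the predicted items for N simply inherit σ.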

108
LR(1) automaton
109
LR(1) parsing comments

The LR(1) automaton is more discriminating than the
SLR(1) automaton.
In fact, it is so strong that any language that
can be parsed from left to right with a one-token
look-ahead in linear time can be parsed using
LR(1).
LR(1) tables are big.
Combining sets that are equal apart from their look-aheads,
by merging the look-ahead sets, gives LALR(1).

110
LALR(1)

S3 and S10 are similar in that they are equal if
one ignores the look-ahead sets, and so are S4
and S9, S6 and S11, and S8 and S12.

111
LALR(1) automaton
112
Practice

Derive the LALR(1) ACTION/GOTO table for the
grammar in Fig. 2.95

113
Making a grammar LR(1) or not

Although the chances for a grammar to be LR(1)
are much larger than those of it being SLR(1) or LL(1),
one often encounters a grammar that still is not
LR(1). The reason is generally that the grammar
is ambiguous.
For example
if_statement → 'if' '(' expression ')' statement
             | 'if' '(' expression ')' statement 'else' statement
statement → ... | if_statement | ...
The statement: if (x > 0) if (y > 0) p = 0; else q = 0;

114
Possible syntax trees (1)
115
Possible syntax trees (2)
116
Resolving shift-reduce conflicts (1)

The longest possible sequence of grammar symbols
is taken for reduction.
In a shift-reduce conflict, do the shift.
Another example

input: i + i + i
E → E • + E (shift)
E → E + E • (reduce)
117
Resolving shift-reduce conflicts (2)

The use of precedences between tokens
Example: a shift-reduce conflict on t
P → α • t β (shift item)
Q → γ u R • {... t ...} (reduce item)
where R is either empty or one non-terminal.
If the look-ahead is t, we perform one of the
following three actions:
If symbol u has a higher precedence than symbol
t, we reduce.
If t has a higher precedence than symbol u, we
shift.
If both have equal precedence, we also shift.
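The three-way rule above fits in a single function. This sketch follows the rule exactly as stated on the slide, including "shift on equal precedence"; the function name and the precedence table are assumptions. Note that yacc and bison additionally consult the declared associativity (`%left`, `%right`) when the precedences are equal, which this sketch does not model.

```python
def resolve_shift_reduce(prec, u, t):
    """Choose between shifting look-ahead t and reducing a handle whose
    last terminal is u, given a table of token precedences."""
    if prec[u] > prec[t]:
        return "reduce"
    return "shift"          # t has higher precedence, or both are equal


# Assumed precedence table: '*' binds tighter than '+'.
PREC = {"+": 1, "*": 2}
```

With this table, a handle ending in `*` followed by look-ahead `+` is reduced, while a handle ending in `+` followed by look-ahead `*` leads to a shift.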

118
Bottom-up parser yacc/bison

The most widely used parser generator is yacc
Yacc is an LALR(1) parser generator
A yacc look-alike called bison, provided by GNU

119
A very high-level view of text analysis techniques
120
Yacc code example (constructing parser tree)
121
Yacc code example (auxiliary code)
