Compilers Modern Compiler Design - PowerPoint PPT Presentation

Provided by: wan145

1
Compilers: Modern Compiler Design
  • 3. Syntax Analysis

  • Introduction to parsing methods
  • Creating a top-down parser manually
  • Creating a top-down parser automatically
  • Creating a bottom-up parser automatically
  • Parser generator tools
NCYU C. H. Wang
2
Introduction
  • Context-free Grammar
  • The syntax of programming language constructs can
    be described by context-free grammar
  • Important aspects
  • A grammar serves to impose a structure on the
    linear sequence of tokens which is the program.
  • Using techniques from the field of formal
    languages, a grammar can be employed to construct
    a parser for it automatically.
  • Grammars help programmers write syntactically
    correct programs, and they provide answers to detailed
    questions about the syntax.

3
Definitions of CFG
  • A context-free grammar consists of terminals,
    nonterminals, a start symbol and productions.
  • Terminals are the basic symbols from which
    strings are formed.
  • Nonterminals are syntactic variables that denote
    sets of strings.
  • In a grammar, one nonterminal is distinguished as
    the start symbol, and the set of strings it
    denotes is the language defined by the grammar.
  • The productions of a grammar specify the manner
    in which the terminals and nonterminals can be
    combined to form strings. Each production
    consists of a nonterminal, followed by an arrow,
    followed by a string of nonterminals and terminals.

4
The role of the parser
5
Two approaches
  • Deterministic left-to-right top-down
  • LL method
  • Deterministic left-to-right bottom-up
  • LR method
  • Left-to-right
  • The sequence of tokens is processed from left to
    right
  • Deterministic
  • No searching is involved: each token brings the
    parser one step closer to the goal of
    constructing the syntax tree

6
Speed issue
  • The deterministic parsing methods require an
    amount of time that is a linear function of the
    length of the input: they are linear-time methods.
  • A grammar copied as is from a language manual
    has a very small chance of leading to a
    deterministic method, unless of course the
    language designer has taken pains to make the
    grammar match such a method.
  • Allowing some searching to take place
  • The algorithm can handle all grammars
  • These algorithms are no longer linear-time

7
Non-ambiguous
  • A grammar for which a deterministic parser can be
    generated is guaranteed to be non-ambiguous
  • Since an arbitrary grammar will often fail to
    match one of the standard parsing methods, it is
    important to have techniques to transform the
    grammar to non-ambiguous form.
  • We will assume that the grammar of the
    programming language is non-ambiguous.
  • That implies that to each input program there
    belongs either one syntax tree or no syntax tree
    (the program contains one or more errors)

8
Two classes of parsing methods
  • Syntax tree

9
Pre-order and post-order (1)
  • The top-down method constructs the syntax tree in
    pre-order
  • The bottom-up method constructs the syntax tree
    in post-order

10
Pre-order and post-order (2)
11
Principles of top-down parsing
  • The main task of a top-down parser is to choose
    the correct alternatives for known non-terminals

12
Principles of bottom-up parsing
  • The main task of a bottom-up parser is to
    repeatedly find the first node all of whose
    children have already been constructed.

13
Error detection and error recovery
  • The position at which the error is detected may be
    unrelated to the position of the actual error the
    user made.
  • Example
  • x := a * (p + q * ( - b * (r - s)

14
Error recovery
  • Two strategies
  • Error correction
  • Modifies the input token stream and/or the
    parser's internal state so that parsing can
    continue
  • Non-correcting error recovery
  • Does not modify the input stream, but rather
    discards all parser information and continues
    parsing the rest of the program with a grammar
    for the rest of the program (called a suffix grammar).

15
Creating a top-down parser manually
  • Recursive descent parsing
  • Simplest way but has its limitations
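The recursive descent program on slides 16-17 survives only as an image. The sketch below shows roughly what such a recognizer looks like in C. It makes two assumptions. First, the grammar is assumed from the non-terminals on later slides: input → expression EOF, expression → term rest_expression, term → IDENTIFIER | parenthesized_expression, parenthesized_expression → '(' expression ')', and rest_expression → '+' expression | ε. Second, the scanner is hypothetical: every character is one token, any letter counts as an IDENTIFIER, and '\0' plays the role of EOF. The helpers token() and require() are illustrative names, not taken from the slides.

```c
#include <ctype.h>

/* Hypothetical token encoding: one character per token, any letter is an
   IDENTIFIER, and the terminating '\0' plays the role of EOF. */
#define IDENTIFIER 256
#define EOF_TOKEN  0

static const char *cur;   /* current position in the input */
static int failed;        /* set once a required part is missing */

static int token_class(void) {
    return isalpha((unsigned char)*cur) ? IDENTIFIER : (unsigned char)*cur;
}

/* Accept the current token if it has class tk; never moves past EOF. */
static int token(int tk) {
    if (failed || token_class() != tk) return 0;
    if (*cur != '\0') cur++;
    return 1;
}

/* Once an alternative has been entered, its remaining members are required. */
static int require(int found) {
    if (!found) failed = 1;
    return !failed;
}

static int expression(void);

/* parenthesized_expression -> '(' expression ')' */
static int parenthesized_expression(void) {
    return token('(') && require(expression()) && require(token(')'));
}

/* term -> IDENTIFIER | parenthesized_expression: alternatives tried in order */
static int term(void) {
    return token(IDENTIFIER) || parenthesized_expression();
}

/* rest_expression -> '+' expression | ε */
static int rest_expression(void) {
    if (token('+')) return require(expression());
    return 1;                       /* the empty alternative always succeeds */
}

/* expression -> term rest_expression */
static int expression(void) {
    return term() && require(rest_expression());
}

/* input -> expression EOF; returns 1 if the whole text is recognized. */
int recognize(const char *text) {
    cur = text;
    failed = 0;
    return expression() && require(token(EOF_TOKEN));
}
```

term() tries its alternatives in a fixed order and commits to the first one that succeeds. That works for this grammar, but it is exactly the behaviour that goes wrong in the examples on the following slides.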

16
Recursive descent parsing program (1)
17
Recursive descent parsing program (2)
18
Drawbacks
  • Three drawbacks
  • There is still some searching through the
    alternatives
  • The method often fails to produce a correct
    parser
  • Error handling leaves much to be desired

19
Second problem (1)
  • Example 1
  • Index_element will never be tried: the
    alternative IDENTIFIER is tried first and already
    succeeds on the same leading IDENTIFIER

20
Second problem (2)
  • Example 2
  • The recognizer will not recognize ab

21
Second problem (3)
  • Example 3
  • Recursive descent parsers cannot handle
    left-recursive grammars

22
Creating a top-down parser automatically
  • The principles of constructing a top-down parser
    automatically derive from those of writing one by
    hand, by applying precomputation.
  • Grammars which allow the construction of a
    top-down parser to be performed are called LL(1)
    grammars.

23
LL(1) parsing
  • FIRST set
  • The sets of first tokens produced by all
    alternatives in the grammar.
  • We have to precompute the FIRST sets of all
    non-terminals
  • The first sets of the terminals are obvious.
  • Finding FIRST(α) is trivial when α starts with a
    terminal.
  • FIRST(N) is the union of the FIRST sets of its
    alternatives.

24
Predictive recursive descent parser
  • The FIRST sets can be used in the construction of
    a predictive parser because it predicts the
    presence of a given alternative without trying to
    find out if it is there.

25
Closure algorithm for computing the FIRST set (1)
  • Data definitions

26
Closure algorithm for computing the FIRST set (2)
  • Initializations

27
Closure algorithm for computing the FIRST set (3)
  • Inference rules

28
FIRST sets example(1)
  • Grammar

29
FIRST sets example(2)
  • The initial FIRST sets

30
FIRST sets example(3)
  • The final FIRST sets

31
The predictive parser (1)
32
The predictive parser (2)
33
Practice
  • Find the FIRST sets of all alternatives of the
    following grammar.
  • E → T E'
  • E' → + T E' | ε
  • T → F T'
  • T' → * F T' | ε
  • F → ( E ) | id

34
Nullable alternatives
  • A complication arises with the case label for the
    empty alternative (e.g. in rest_expression). Since it
    does not itself start with any token, how can we
    decide whether it is the correct alternative?

35
FOLLOW sets
  • Follow sets
  • Determining the set of tokens that can
    immediately follow a given non-terminal N.
  • LL(1) parser
  • LL because the parser works from Left to right
    identifying the nodes in what is called Leftmost
    derivation order.
  • (1) because all choices are based on a one
    token look-ahead.

36
Closure algorithm for computing the FOLLOW sets
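The FOLLOW closure can be sketched in the same style. The example assumes the practice grammar from slide 33, with EP and TP standing for E' and T' and END for the end-of-input token. Be aware that it works out the practice answers. The code first computes the FIRST sets. It then seeds FOLLOW(E) with END and repeats two rules until nothing changes. For every rule A → ... X β, FOLLOW(X) includes FIRST(β) without ε. If β is nullable, FOLLOW(X) also includes FOLLOW(A).

```c
#include <stdbool.h>
#include <stddef.h>

enum { E, EP, T, TP, F, N_NONTERMS,
       PLUS = N_NONTERMS, STAR, LPAREN, RPAREN, ID, END, N_SYMBOLS };

typedef unsigned int TokenSet;                 /* bit mask over terminals */
#define BIT(t)      (1u << ((t) - N_NONTERMS))
#define EPSILON_BIT (1u << (N_SYMBOLS - N_NONTERMS))

typedef struct { int lhs; int len; int rhs[3]; } Rule;
static const Rule rules[] = {
    { E,  2, { T, EP } },          { EP, 3, { PLUS, T, EP } },  { EP, 0, { 0 } },
    { T,  2, { F, TP } },          { TP, 3, { STAR, F, TP } },  { TP, 0, { 0 } },
    { F,  3, { LPAREN, E, RPAREN } }, { F, 1, { ID } },
};
#define N_RULES (sizeof rules / sizeof rules[0])

TokenSet first[N_NONTERMS], follow[N_NONTERMS];

static TokenSet first_of_seq(const int *seq, int len) {
    TokenSet result = 0;
    for (int i = 0; i < len; i++) {
        TokenSet f = seq[i] < N_NONTERMS ? first[seq[i]] : BIT(seq[i]);
        result |= f & ~EPSILON_BIT;
        if (!(f & EPSILON_BIT)) return result;
    }
    return result | EPSILON_BIT;               /* empty or all-nullable sequence */
}

/* Add `add` to *set and report whether *set grew. */
static bool add_to(TokenSet *set, TokenSet add) {
    TokenSet old = *set;
    *set |= add;
    return *set != old;
}

void compute_first_and_follow(void) {
    bool changed;
    for (int n = 0; n < N_NONTERMS; n++) first[n] = follow[n] = 0;
    do {                                       /* FIRST closure */
        changed = false;
        for (size_t r = 0; r < N_RULES; r++)
            changed |= add_to(&first[rules[r].lhs],
                              first_of_seq(rules[r].rhs, rules[r].len));
    } while (changed);

    follow[E] = BIT(END);                      /* end of input follows the start symbol */
    do {                                       /* FOLLOW closure */
        changed = false;
        for (size_t r = 0; r < N_RULES; r++) {
            const Rule *p = &rules[r];
            for (int i = 0; i < p->len; i++) {
                int x = p->rhs[i];
                if (x >= N_NONTERMS) continue; /* only non-terminals get FOLLOW sets */
                TokenSet rest = first_of_seq(p->rhs + i + 1, p->len - i - 1);
                changed |= add_to(&follow[x], rest & ~EPSILON_BIT);
                if (rest & EPSILON_BIT)        /* what follows x is nullable */
                    changed |= add_to(&follow[x], follow[p->lhs]);
            }
        }
    } while (changed);
}
```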
37
The first and follow sets
38
Recall the predictive parser
rest_expression ? expression ?
FIRST(rest_expr) , ?
void rest_expression(void) switch
(Token.class) case ''
token('') expression() break case EOF
case ')' break default
error()
FOLLOW(rest_expr) EOF, )
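Putting the pieces together gives a complete predictive parser in C, sketched under the same assumptions as before. Tokens are single characters, letters count as IDENTIFIER, and '\0' serves as EOF. Each routine switches on Token.class. rest_expression picks the empty alternative exactly when the token is in FOLLOW(rest_expr). The function parse_error() and the setjmp-based abort are hypothetical stand-ins for the slide's error().

```c
#include <ctype.h>
#include <setjmp.h>

/* Hypothetical scanner: one character per token, letters are IDENTIFIERs,
   and the terminating '\0' is the EOF token. */
#define IDENTIFIER 256
#define EOF_TOKEN  0

static struct { int class; } Token;   /* Token.class as on the slide */
static const char *cur;
static jmp_buf on_syntax_error;

static void get_next_token(void) {
    Token.class = isalpha((unsigned char)*cur) ? IDENTIFIER : (unsigned char)*cur;
    if (*cur != '\0') cur++;
}

/* Stand-in for the slide's error(): abandon the parse. */
static void parse_error(void) { longjmp(on_syntax_error, 1); }

/* Match a token of class tk and advance, or report an error. */
static void token(int tk) {
    if (Token.class != tk) parse_error();
    get_next_token();
}

static void expression(void);

static void parenthesized_expression(void) {
    token('('); expression(); token(')');
}

static void term(void) {
    switch (Token.class) {
    case IDENTIFIER: token(IDENTIFIER); break;
    case '(':        parenthesized_expression(); break;
    default:         parse_error();
    }
}

static void rest_expression(void) {
    switch (Token.class) {
    case '+': token('+'); expression(); break;  /* Token in FIRST('+' expression) */
    case EOF_TOKEN: case ')': break;            /* Token in FOLLOW(rest_expr): ε */
    default: parse_error();
    }
}

static void expression(void) { term(); rest_expression(); }

/* input -> expression EOF; returns 1 on success, 0 on a syntax error. */
int parse(const char *text) {
    cur = text;
    if (setjmp(on_syntax_error)) return 0;
    get_next_token();
    expression();
    token(EOF_TOKEN);
    return 1;
}
```

No routine ever tries an alternative and then backs out. The current token alone decides, which is what makes the parser predictive.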
39
LL(1) conflicts
  • Example
  • The codes

40
LL(1) conflicts
  • FIRST/FIRST conflict
  • term → IDENTIFIER
        | IDENTIFIER '[' expression ']'
        | '(' expression ')'

41
LL(1) conflicts
  • FIRST/FOLLOW conflict
  • FIRST set vs. FOLLOW set
  • S → A a b      FIRST(S) = { a }
  • A → a | ε      FIRST(A) = { a, ε }, FOLLOW(A) = { a }

42
LL(1) conflicts
  • left recursion
  • expression → expression '-' term | term
  • Look-ahead token
  • The LL(1) method predicts the alternative A_k for a
    non-terminal N when the look-ahead token is in
    FIRST(A_k) ∪ (if A_k is nullable then FOLLOW(N))
  • LL(1) grammar
  • No FIRST/FIRST conflicts
  • No FIRST/FOLLOW conflicts
  • No multiple nullable alternatives
  • No non-terminal can have more than one nullable
    alternative.
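The LL(1) conditions above can be checked mechanically. This is a minimal sketch that assumes token sets are bit masks, with bit 31 reserved for ε (a hypothetical encoding). Alternative A_k is predicted on FIRST(A_k), plus FOLLOW(N) if A_k is nullable. N satisfies the LL(1) conditions when these prediction sets are pairwise disjoint and at most one alternative is nullable.

```c
#include <stdbool.h>

/* Token sets as bit masks; bit 31 is reserved (hypothetically) for ε. */
typedef unsigned int TokenSet;
#define EPSILON_BIT (1u << 31)

/* Look-ahead tokens that predict an alternative whose FIRST set is first_k:
   FIRST(A_k), plus FOLLOW(N) if A_k is nullable. */
static TokenSet predict_set(TokenSet first_k, TokenSet follow_n) {
    TokenSet s = first_k & ~EPSILON_BIT;
    if (first_k & EPSILON_BIT) s |= follow_n;
    return s;
}

/* LL(1) check for one non-terminal N: the prediction sets of its alternatives
   must be pairwise disjoint (no FIRST/FIRST or FIRST/FOLLOW conflict), and at
   most one alternative may be nullable. */
bool is_ll1_nonterminal(const TokenSet *first_alt, int n_alts, TokenSet follow_n) {
    int nullable = 0;
    for (int i = 0; i < n_alts; i++) {
        if (first_alt[i] & EPSILON_BIT) nullable++;
        for (int j = i + 1; j < n_alts; j++)
            if (predict_set(first_alt[i], follow_n) & predict_set(first_alt[j], follow_n))
                return false;
    }
    return nullable <= 1;
}
```

Take the FIRST/FOLLOW example, A → a | ε with FOLLOW(A) = { a }. The ε alternative is then predicted on 'a', just like the first alternative, so the check fails.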

43
Solve the LL(1) conflicts
  • Two options
  • Use a stronger parser
  • Make the grammar LL(1)

44
Making a grammar LL(1)
  • manual labour
  • rewrite grammar
  • adjust semantic actions
  • three rewrite methods
  • left factoring
  • substitution
  • left-recursion removal

45
Left-factoring
  • term → IDENTIFIER
        | IDENTIFIER '[' expression ']'
  • factor out common prefix
  • term → IDENTIFIER after_identifier
  • after_identifier → '[' expression ']' | ε

'[' ∉ FOLLOW(after_identifier)
46
Substitution
  • A → a | B c | ε
  • S → p A q
  • replace non-terminal by its alternatives
  • S → p a q | p B c q | p q
  • Example
  • S → A a b
  • A → a | ε
  • replace non-terminal by its alternatives
  • S → a a b | a b

47
Left-recursion removal
  • Three types of left-recursion
  • Direct left-recursion
  • N → N α
  • Indirect left-recursion
  • Chain structure
  • N → A
  • A → B
  • Z → N
  • Hidden left-recursion
  • N → α N (α can produce ε)

48
Left-recursion removal
  • N → N α | β
  • replace by
  • N → β M
  • M → α M | ε
  • example
  • expression → expression '-' term | term

expression → term expression_tail_option
expression_tail_option → '-' term expression_tail_option | ε
49
Practice
  • make the following grammar LL(1)
  • expression → expression '+' term | expression '-'
    term | term
  • term → term '*' factor | term '/' factor | factor
  • factor → '(' expression ')' | func-call
    | identifier | constant
  • func-call → identifier '(' expr-list? ')'
  • expr-list → expression (',' expression)*

50
Answers
  • substitution
  • F → '(' E ')' | ID '(' expr-list? ')' | ID
    | constant
  • left factoring
  • E → E ( '+' | '-' ) T | T
  • T → T ( '*' | '/' ) F | F
  • F → '(' E ')' | ID ( '(' expr-list? ')' )?
    | constant
  • left recursion removal
  • E → T ( ( '+' | '-' ) T )*
  • T → F ( ( '*' | '/' ) F )*
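The EBNF repetitions in the answer map directly onto loops in a hand-written parser. The sketch below evaluates integer expressions. It assumes integer constants only, so identifiers and func-call are left out. It also does no error reporting, and division by zero simply yields 0. Because the loops combine operands as they go, '-' and '/' stay left-associative, which is what the original left-recursive rules expressed.

```c
#include <ctype.h>

/* Hypothetical input: unsigned integer constants, + - * / and parentheses,
   no spaces, no identifiers or function calls, and no error reporting. */
static const char *cursor;

static long expr(void);

/* F -> '(' E ')' | constant */
static long factor(void) {
    if (*cursor == '(') {
        cursor++;
        long v = expr();
        if (*cursor == ')') cursor++;
        return v;
    }
    long v = 0;
    while (isdigit((unsigned char)*cursor)) v = v * 10 + (*cursor++ - '0');
    return v;
}

/* T -> F (('*' | '/') F)*: combining inside the loop keeps the operators left-associative */
static long term(void) {
    long v = factor();
    while (*cursor == '*' || *cursor == '/') {
        char op = *cursor++;
        long rhs = factor();
        if (op == '*') v *= rhs;
        else v = rhs != 0 ? v / rhs : 0;     /* sketch: division by zero yields 0 */
    }
    return v;
}

/* E -> T (('+' | '-') T)* */
static long expr(void) {
    long v = term();
    while (*cursor == '+' || *cursor == '-') {
        char op = *cursor++;
        long rhs = term();
        v = op == '+' ? v + rhs : v - rhs;
    }
    return v;
}

long evaluate(const char *text) { cursor = text; return expr(); }
```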

51
Undoing the semantic effects of grammar
transformations
  • While it is often possible to transform our
    grammar into a new grammar that is acceptable by
    a parser generator and that generates the same
    language, the new grammar usually assigns a
    different structure to strings in the language
    than our original grammar did
  • Fortunately, in many cases we are not really
    interested in the structure but rather in the
    semantics implied by it.
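To see the semantic effect, take the left-recursion removal example: expression → term expression_tail_option and expression_tail_option → '-' term expression_tail_option | ε. The sketch assumes each term is a single digit, a hypothetical simplification. If semantics are attached naively to the new, right-leaning tree shape, 9-3-2 is evaluated as 9-(3-2). Passing the value computed so far down into expression_tail_option as a parameter restores the meaning of the original grammar.

```c
/* Hypothetical simplification: a term is a single decimal digit and the only
   operator is '-'. */
static const char *cursor;

static int term(void) { return *cursor++ - '0'; }

/* Naive semantics that follow the right-leaning tree of the transformed
   grammar: computes a - (b - (c ...)), not what the original grammar meant. */
static int expression_naive(void) {
    int left = term();
    if (*cursor == '-') {
        cursor++;
        return left - expression_naive();
    }
    return left;
}

/* Undoing the effect: the value computed so far is passed down as a
   parameter, so each '-' is applied from left to right. */
static int expression_tail_option(int left) {
    if (*cursor == '-') {
        cursor++;
        int right = term();
        return expression_tail_option(left - right);
    }
    return left;                      /* ε alternative: the result so far */
}

int eval_naive(const char *s)      { cursor = s; return expression_naive(); }
int eval_left_assoc(const char *s) { cursor = s; return expression_tail_option(term()); }
```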

52
Semantics
Non-left-recursive equivalent
53
Automatic conflict resolution (1)
  • There are two ways in which LL parsers can be
    strengthened
  • By increasing the look-ahead
  • Distinguishing alternatives not by their first
    token but by their first two tokens is called
    LL(2).
  • Disadvantage: the parser code can get much
    bigger.
  • By allowing dynamic conflict resolvers
  • When the conflict arises during parsing, some
    conditions are evaluated to resolve it.
  • The parser generator LLgen requires a conflict
    resolver to be placed on the first of two
    conflicting alternatives.

54
Automatic conflict resolution (2)
  • If-else statement in C
  • else_tail_option: both its FIRST set and its
    FOLLOW set contain the token else
  • Conflict resolver

55
The LL(1) push-down automaton
  • Transition table for an LL(1) parser

56
Push-down automaton (PDA)
  • Types of moves
  • Prediction move
  • Top of the prediction stack is a non-terminal N.
  • N is removed from the stack
  • Look up the prediction table
  • Push the alternative of N onto the prediction
    stack
  • Match move
  • Top of the prediction stack is a terminal
  • Termination
  • Parsing terminates when the prediction stack is
    exhausted.
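A table-driven version of this PDA is sketched below in C. It assumes the same small expression grammar and one-character tokens. The prediction-table entries follow from the FIRST and FOLLOW sets, so the ε alternative of rest_expression is selected on EOF and ')'. The loop makes a prediction move when a non-terminal is on top of the stack and a match move when a terminal is. It stops when the prediction stack is exhausted.

```c
#include <ctype.h>

/* Symbols: non-terminals first, then tokens (one character per token). */
enum { INPUT, EXPRESSION, TERM, PAREN_EXPR, REST_EXPR, N_NONTERMS,
       T_ID = N_NONTERMS, T_PLUS, T_LPAREN, T_RPAREN, T_EOF };
#define END_OF_RHS (-1)
#define STACK_SIZE 256

/* Right-hand sides, each terminated by END_OF_RHS; entry 0 is ε. */
static const int rhs[][4] = {
    { END_OF_RHS },                                   /* 0: ε                        */
    { EXPRESSION, T_EOF, END_OF_RHS },                /* 1: input -> expression EOF  */
    { TERM, REST_EXPR, END_OF_RHS },                  /* 2: expression -> term rest  */
    { T_ID, END_OF_RHS },                             /* 3: term -> IDENTIFIER       */
    { PAREN_EXPR, END_OF_RHS },                       /* 4: term -> paren_expr       */
    { T_LPAREN, EXPRESSION, T_RPAREN, END_OF_RHS },   /* 5: '(' expression ')'       */
    { T_PLUS, EXPRESSION, END_OF_RHS },               /* 6: rest -> '+' expression   */
};

/* Prediction table: [non-terminal][look-ahead] -> rhs index, -1 = error. */
static const int table[N_NONTERMS][5] = {
    /*               ID  '+'  '('  ')'  EOF */
    /* input      */ { 1,  -1,   1,  -1,  -1 },
    /* expression */ { 2,  -1,   2,  -1,  -1 },
    /* term       */ { 3,  -1,   4,  -1,  -1 },
    /* paren_expr */ {-1,  -1,   5,  -1,  -1 },
    /* rest_expr  */ {-1,   6,  -1,   0,   0 },
};

static int classify(char c) {
    if (isalpha((unsigned char)c)) return T_ID;
    switch (c) {
    case '+':  return T_PLUS;
    case '(':  return T_LPAREN;
    case ')':  return T_RPAREN;
    case '\0': return T_EOF;
    default:   return -1;
    }
}

/* Returns 1 if text is accepted, 0 on a syntax error. */
int ll1_parse(const char *text) {
    int stack[STACK_SIZE];
    int sp = 0;
    stack[sp++] = INPUT;                        /* start with the start symbol */
    while (sp > 0) {                            /* stop when the stack is exhausted */
        int tok = classify(*text);
        if (tok < 0) return 0;
        int top = stack[--sp];
        if (top < N_NONTERMS) {                 /* prediction move */
            int alt = table[top][tok - N_NONTERMS];
            if (alt < 0) return 0;
            int len = 0;
            while (rhs[alt][len] != END_OF_RHS) len++;
            if (sp + len > STACK_SIZE) return 0;
            for (int i = len - 1; i >= 0; i--)  /* first symbol ends up on top */
                stack[sp++] = rhs[alt][i];
        } else {                                /* match move */
            if (top != tok) return 0;
            if (*text != '\0') text++;
        }
    }
    return 1;
}
```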

57
Prediction move in an LL(1) PDA
58
Match move in an LL(1) PDA
59
Predictive parsing with an LL(1) PDA
60
PDA example (1)
prediction stack: input
input: aap + ( noot + mies ) EOF
61
PDA example (2)
prediction stack: input
input: aap + ( noot + mies ) EOF
  • replace non-terminal by transition entry
62
PDA example (3)
prediction stack: expression EOF
input: aap + ( noot + mies ) EOF
63
PDA example (4)
prediction stack: expression EOF
input: aap + ( noot + mies ) EOF
  • replace non-terminal by transition entry
64
PDA example (5)
prediction stack: term rest-expr EOF
input: aap + ( noot + mies ) EOF
65
PDA example (6)
prediction stack: term rest-expr EOF
input: aap + ( noot + mies ) EOF
  • replace non-terminal by transition entry
66
PDA example (7)
  • Please continue!!
  • Example of parsing (ii)i

67
LLgen
  • LLgen is part of the Amsterdam Compiler Kit
  • takes an LL(1) grammar plus semantic actions in C
    and generates a recursive descent parser
  • The non-terminals in the grammar can have
    parameters, and rules can have local variables,
    both again expressed in C.
  • LLgen features
  • repetition operators
  • advanced error handling
  • parameter passing
  • control over semantic actions
  • dynamic conflict resolvers

68
LLgen
  • start from LR(1) grammar
  • make grammar LL(1)
  • use repetition operators

    %token DIGIT;
    main   : [line]+ ;
    line   : expr '\n' ;
    expr   : term ['+' term]* ;
    term   : factor ['*' factor]* ;
    factor : '(' expr ')' | DIGIT ;
  • add semantic actions
  • attach parameters to grammar rules
  • insert C-code between the symbols

69
Minimal non-left-recursive grammar for expressions
70
LLgen code for a parser
Grammar
Semantics
71
LLgen code for a parser
  • The code from previous page resides in a file
    called parser.g. LLgen converts the file to one
    called parser.c, which contains a recursive
    descent parser.

72
LLgen interface to lexical analyzer
73
LLgen interface to back-end
  • LLgen handles syntax errors by inserting missing
    tokens and deleting unexpected tokens
  • LLmessage() is invoked to notify the lexical
    analyzer

74
Creating a bottom-up parser automatically
  • Left-to-right parse, Rightmost derivation (in
    reverse)
  • Create a node when all its children are present
  • Handle: the nodes representing the right-hand
    side of a production

75
LR(0) Parsing
  • Theoretically important but too weak to be
    useful.
  • running example expression grammar
  • input → expression EOF
  • expression → expression '+' term | term
  • term → IDENTIFIER | '(' expression ')'
  • short-hand notation
  • Z → E $
  • E → E '+' T | T
  • T → 'i' | '(' E ')'

76
LR(0) Parsing
  • keep track of progress inside potential
    handles when consuming input tokens
  • LR items N → α • β
  • initial set

S0
Z → • E $    E → • E '+' T    E → • T    T → • 'i'    T → • '(' E ')'
77
ε-closure algorithm for LR(0)
The important part is the inference rule: it
predicts new handle hypotheses from the
hypothesis that we are looking for a certain
non-terminal, and is sometimes called the prediction
rule. It corresponds to an ε move, in that it
allows the automaton to move to another state
without consuming input.
Reduce item: an item with the dot at the end.
Shift item: all the others.
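The ε-closure and the step to the next item set can be sketched in C for the running example Z → E $, E → E '+' T | T, T → 'i' | '(' E ')'. Each item is a (rule, dot position) pair. The capacity of 16 items per set is an arbitrary assumption that happens to be enough for this grammar. The rule numbering inside the code is its own and does not match the numbering on the SLR(1) table slide.

```c
/* Symbols of the running example: non-terminals first, then terminals. */
enum { Z, E, T, N_NONTERMS, TK_I = N_NONTERMS, TK_PLUS, TK_LPAREN, TK_RPAREN, TK_END };

typedef struct { int lhs; int len; int rhs[3]; } Rule;
static const Rule rules[] = {
    { Z, 2, { E, TK_END } },                    /* Z -> E $         */
    { E, 3, { E, TK_PLUS, T } },                /* E -> E '+' T     */
    { E, 1, { T } },                            /* E -> T           */
    { T, 1, { TK_I } },                         /* T -> 'i'         */
    { T, 3, { TK_LPAREN, E, TK_RPAREN } },      /* T -> '(' E ')'   */
};
#define N_RULES ((int)(sizeof rules / sizeof rules[0]))
#define MAX_ITEMS 16

typedef struct { int rule, dot; } Item;         /* N -> alpha . beta */
typedef struct { int n; Item items[MAX_ITEMS]; } ItemSet;

static void add_item(ItemSet *s, int rule, int dot) {
    for (int k = 0; k < s->n; k++)
        if (s->items[k].rule == rule && s->items[k].dot == dot) return;  /* already present */
    if (s->n < MAX_ITEMS) s->items[s->n++] = (Item){ rule, dot };
}

/* ε-closure: whenever the dot stands before a non-terminal N, add N -> . gamma
   for every rule of N. Scanning up to the growing s->n reaches the fixed point. */
void closure(ItemSet *s) {
    for (int k = 0; k < s->n; k++) {
        const Rule *r = &rules[s->items[k].rule];
        int dot = s->items[k].dot;
        if (dot < r->len && r->rhs[dot] < N_NONTERMS)
            for (int j = 0; j < N_RULES; j++)
                if (rules[j].lhs == r->rhs[dot]) add_item(s, j, 0);
    }
}

/* Item set reached by moving the dot over symbol x, followed by its closure. */
ItemSet goto_set(const ItemSet *s, int x) {
    ItemSet next = { 0 };
    for (int k = 0; k < s->n; k++) {
        const Rule *r = &rules[s->items[k].rule];
        int dot = s->items[k].dot;
        if (dot < r->len && r->rhs[dot] == x) add_item(&next, s->items[k].rule, dot + 1);
    }
    closure(&next);
    return next;
}
```

The closure of { Z → • E $ } gives the five items of S0. Moving over E then yields the two items Z → E • $ and E → E • '+' T.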
78
Transition Diagram
S1: T → 'i' •
S2: E → T •
S4: E → E '+' • T    T → • 'i'    T → • '(' E ')'
S6: Z → E $ •
79
LR(0) parsing example (1)
Z → E $    E → E '+' T    E → T    T → 'i'    T → '(' E ')'
stack: S0        input: i + i $
  • shift input token (i) onto the stack
  • compute new state

80
LR(0) parsing example (2)
stack: S0 i S1        input: + i $
  • reduce handle on top of the stack
  • compute new state

81
LR(0) parsing example (3)
stack: S0 T S2        input: + i $
  • reduce handle on top of the stack
  • compute new state

82
LR(0) parsing example (4)
stack: S0 E S3        input: + i $
  • shift input token onto the stack
  • compute new state
83
LR(0) parsing example (5)
stack: S0 E S3 + S4        input: i $
  • shift input token onto the stack
  • compute new state
84
LR(0) parsing example (6)
stack: S0 E S3 + S4 i S1        input: $
  • reduce handle on top of the stack
  • compute new state
85
LR(0) parsing example (7)
stack: S0 E S3 + S4 T S5        input: $
  • reduce handle on top of the stack
  • compute new state
86
LR(0) parsing example (8)
stack: S0 E S3        input: $
  • shift input token onto the stack
  • compute new state
87
LR(0) parsing example (9)
stack: S0 E S3 $ S6        input: (empty)
  • reduce handle on top of the stack
  • compute new state
88
LR(0) parsing example (10)
stack: S0 Z        input: (empty)
  • accept!
89
Precomputing the item set (1)
  • Initial item set

90
Precomputing the item set (2)
  • Next item set

91
Complete transition diagram
92
The LR push-down automaton
  • Two major moves and a minor move
  • Shift move
  • Removes the first token from the present input and
    pushes it onto the stack
  • Reduce move
  • N → α
  • the symbols of α are removed from the stack
  • N is then pushed onto the stack
  • Termination
  • The input has been parsed successfully when it
    has been reduced to the start symbol.

93
GOTO and ACTION tables
94
LR(0) parsing of the input i+i
95
LR comments
  • The bottom-up parsing, unlike the top-down
    parsing, has no problems with left-recursion.
  • On the other hand, bottom-up parsing has a slight
    problem with right-recursion.

96
LR(0) conflicts (1)
  • shift-reduce conflict
  • array indexing T → i '[' E ']'
  • T → i • '[' E ']' (shift)
  • T → i • (reduce)
  • ε-rule RestExpr → ε
  • Expr → Term • RestExpr (shift)
  • RestExpr → • (reduce)

97
LR(0) conflicts (2)
  • reduce-reduce conflict
  • assignment statement Z → V ':=' E
  • V → i • (reduce)
  • T → i • (reduce)
  • (Different reduce rules)
  • typical LR(0) table contains many conflicts

98
Handling LR(0) conflicts
  • Use a one-token look-ahead
  • Use a two-dimensional ACTION table
  • different construction of ACTION table
  • SLR(1) Simple LR
  • LR(1)
  • LALR(1) Look-Ahead LR

99
SLR(1) parsing
  • A handle should not be reduced to a non-terminal
    N if the look-ahead is a token that cannot follow
    N.
  • reduce N → α iff token ∈ FOLLOW(N)
  • FOLLOW(N)
  • FOLLOW(Z) = { }
  • FOLLOW(E) = { '+', ')', '$' }
  • FOLLOW(T) = { '+', ')', '$' }

100
SLR(1) ACTION table
shift
101
SLR(1) ACTION/GOTO table
1  Z → E $
2  E → T
3  E → E '+' T
4  T → 'i'
5  T → '(' E ')'
sn: shift to state n;  rn: reduce by rule n
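A driver for such a table is sketched below in C. The ACTION and GOTO entries were derived by hand from the item sets of the running example, so treat them as an illustration rather than a copy of the slide's table. States S0 to S6, and s7 for '(' in S0, agree with the earlier slides; S8 (T → '(' E • ')', E → E • '+' T) and S9 (T → '(' E ')' •) are assumed. Reductions appear only under tokens in FOLLOW(E) = FOLLOW(T) = { '+', ')', '$' }. Reducing by rule 1 (Z → E $) means the input has been accepted.

```c
/* Terminals of the running example; '\0' (end of string) serves as '$'. */
enum { T_I, T_PLUS, T_LPAREN, T_RPAREN, T_END, N_TOKENS };
enum { ERR, SHIFT, REDUCE };

typedef struct { int kind; int arg; } Action;
#define S(n) { SHIFT, n }
#define R(n) { REDUCE, n }
#define X    { ERR, 0 }

/* Rules numbered as on the slide: 1 Z -> E $, 2 E -> T, 3 E -> E '+' T,
   4 T -> 'i', 5 T -> '(' E ')'.  Left-hand sides: 0 = Z, 1 = E, 2 = T. */
static const int rule_lhs[6] = { -1, 0, 1, 1, 2, 2 };
static const int rule_len[6] = { 0, 2, 1, 3, 1, 3 };

static const Action action_tab[10][N_TOKENS] = {
    /*          'i'    '+'    '('    ')'    '$'  */
    /* S0 */ { S(1),  X,     S(7),  X,     X    },
    /* S1 */ { X,     R(4),  X,     R(4),  R(4) },
    /* S2 */ { X,     R(2),  X,     R(2),  R(2) },
    /* S3 */ { X,     S(4),  X,     X,     S(6) },
    /* S4 */ { S(1),  X,     S(7),  X,     X    },
    /* S5 */ { X,     R(3),  X,     R(3),  R(3) },
    /* S6 */ { R(1),  R(1),  R(1),  R(1),  R(1) },   /* Z -> E $ . : accept */
    /* S7 */ { S(1),  X,     S(7),  X,     X    },
    /* S8 */ { X,     S(4),  X,     S(9),  X    },
    /* S9 */ { X,     R(5),  X,     R(5),  R(5) },
};
#undef S
#undef R
#undef X

/* GOTO[state][non-terminal]: column 0 = E, column 1 = T; -1 = no entry. */
static const int goto_tab[10][2] = {
    { 3, 2 }, { -1, -1 }, { -1, -1 }, { -1, -1 }, { -1, 5 },
    { -1, -1 }, { -1, -1 }, { 8, 2 }, { -1, -1 }, { -1, -1 },
};

static int classify(char c) {
    switch (c) {
    case 'i':  return T_I;
    case '+':  return T_PLUS;
    case '(':  return T_LPAREN;
    case ')':  return T_RPAREN;
    case '\0': return T_END;
    default:   return -1;
    }
}

/* Returns 1 if text (e.g. "i+i") is accepted, 0 otherwise. Only states are
   kept on the stack; the grammar symbols between them are implied. */
int slr_parse(const char *text) {
    int stack[256];
    int sp = 0;
    stack[0] = 0;                               /* start in S0 */
    for (;;) {
        int tok = classify(*text);
        if (tok < 0) return 0;
        Action a = action_tab[stack[sp]][tok];
        if (a.kind == SHIFT) {
            if (sp + 1 >= 256) return 0;
            stack[++sp] = a.arg;
            if (tok != T_END) text++;           /* never read past '$' */
        } else if (a.kind == REDUCE) {
            if (a.arg == 1) return 1;           /* reduced to Z: accept */
            sp -= rule_len[a.arg];              /* pop the handle */
            int next = goto_tab[stack[sp]][rule_lhs[a.arg] - 1];
            if (next < 0) return 0;
            stack[++sp] = next;
        } else {
            return 0;                           /* empty ACTION entry: syntax error */
        }
    }
}
```

On "i+i" the driver passes through the same stack configurations as the LR(0) parsing example: S0 i S1, S0 T S2, S0 E S3, and so on up to S0 E S3 $ S6.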
102
Example of resolving conflicts (1)
  • A new rule T → 'i' '[' E ']'

1 Z → E $   2 E → T   3 E → E '+' T   4 T → 'i'
5 T → '(' E ')'   6 T → 'i' '[' E ']'
103
Example of resolving conflicts (2)
T → 'i' •    T → 'i' • '[' E ']'
  • '[' ∉ FOLLOW(T), so the look-ahead '[' selects the
    shift and the tokens in FOLLOW(T) select the reduce
104
Unfortunately
  • SLR(1) leaves many shift-reduce conflicts
    unsolved
  • problem: the FOLLOW(N) set is the union of all
    look-aheads of all alternatives of N in all
    states
  • example
  • S → A | x b
  • A → a A b | B
  • B → x

FOLLOW(S) = { $ }, FOLLOW(A) = { b, $ }, FOLLOW(B) = { b, $ }

105
SLR(1) automaton
106
LR(1) parsing
  • The LR(1) technique does not rely on FOLLOW sets,
    but rather keeps the specific look-ahead with
    each item
  • LR(1) item N → α • β {σ}
  • ε-closure for LR(1) item sets
  • if set S contains an item P → α • N β {σ} then
  • for each production rule N → γ
  • S must contain the item N → • γ {τ}
  • where τ = FIRST(β {σ})

107
Creating look-ahead sets
  • Extended definition of FIRST sets
  • If FIRST(β) does not contain ε, FIRST(β {σ}) is
    just equal to FIRST(β); if β can produce ε,
    FIRST(β {σ}) contains all the tokens in FIRST(β),
    excluding ε, plus the tokens in σ.

108
LR(1) automaton
109
LR(1) parsing comments
  • The LR(1) automaton is more discriminating than
    the SLR(1) automaton.
  • In fact, it is so strong that any language that
    can be parsed from left to right with a one-token
    look-ahead in linear time can be parsed using
    LR(1).
  • LR(1) tables are big
  • Combine item sets that differ only in their
    look-ahead sets by merging those sets: LALR(1).

110
LALR(1)
  • S3 and S10 are similar in that they are equal if
    one ignores the look-ahead sets, and so are S4
    and S9, S6 and S11, and S8 and S12.

111
LALR(1) automaton
112
Practice
  • Derive the LALR(1) ACTION/GOTO table for the
    grammar in Fig. 2.95

113
Making a grammar LR(1) or not
  • Although the chances for a grammar to be LR(1)
    are much larger than those of being SLR(1) or LL(1),
    one often encounters a grammar that still is not
    LR(1). The reason is generally that the grammar
    is ambiguous.
  • For example
  • if_statement → if ( expression ) statement
        | if ( expression ) statement else statement
  • statement → ... | if_statement | ...
  • The statement if (x > 0) if (y > 0) p = 0; else q = 0;

114
Possible syntax trees (1)
115
Possible syntax trees (2)
116
Resolving shift-reduce conflicts (1)
  • The longest possible sequence of grammar symbols
    is taken for reduction.
  • In a shift-reduce conflict do shift.
  • Another example

input: i + i + i
E → E '+' E •    (reduce)
E → E • '+' E    (shift)
117
Resolving shift-reduce conflicts (2)
  • The use of precedences between tokens
  • Example a shift-reduce conflict on t
  • P → α • t β (shift item)
  • Q → γ u R • {t} (reduce item)
  • where R is either empty or one non-terminal.
  • If the look-ahead is t, we perform one of the
    following three actions
  • If symbol u has a higher precedence than symbol
    t, we reduce
  • If t has a higher precedence than symbol u, we
    shift.
  • If both have equal precedence, we also shift
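The three rules translate into a small decision function. This sketch uses assumed precedences, with * and / binding tighter than + and -. Here u is the terminal in the reduce item Q → γ u R • {t}, and t is the look-ahead. It follows the rules exactly as stated on this slide, including shifting when the two precedences are equal.

```c
/* Hypothetical precedence table for a few binary operators; higher binds tighter. */
static int precedence(char op) {
    switch (op) {
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:            return 0;
    }
}

typedef enum { DO_SHIFT, DO_REDUCE } Decision;

/* Shift-reduce conflict on look-ahead t between the shift item P -> α • t β
   and the reduce item Q -> γ u R •, where u is the terminal in the handle. */
Decision resolve_conflict(char u, char t) {
    if (precedence(u) > precedence(t)) return DO_REDUCE;  /* u higher: reduce */
    return DO_SHIFT;                                      /* t higher or equal: shift */
}
```

Because equal precedence leads to a shift, these rules group a - b - c as a - (b - c).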

118
Bottom-up parser yacc/bison
  • The most widely used parser generator is yacc
  • Yacc is an LALR(1) parser generator
  • A yacc look-alike called bison is provided by GNU

119
A very high-level view of text analysis techniques
120
Yacc code example (constructing parser tree)
121
Yacc code example (auxiliary code)