Title: CS412/413
Introduction to Compilers and Translators
Spring 99
Lecture 5: Bottom-up parsing
Outline
- Creating LL(1) grammars
- Limitations of LL(1) grammars
- Bottom-up parsing
- LR(0) parser construction
Administration
- Should have received mail about group assignments by now
- Homework 1 due next class (Friday)
- Monday considered 2 days late (-20), Tuesday 3 days (-40)
- No class next Monday (Feb 8)
Programming Assignment
- Due Monday, Feb 15
- Implement a lexer for the Iota language
- Do not need to implement DFA construction
- Opportunity to work as a group
- We expect high quality
Review
- Can construct recursive-descent parsers for LL(1) grammars:
  Language grammar
    → LL(1) grammar   (how to perform this step?)
    → predictive parse table
    → recursive-descent parser
    → recursive-descent parser w/ AST generation
Grammars
- Have been using a grammar for the language of sums with parentheses, e.g. (1+(3+4))+5
- Original grammar:
  - S → S + E | E
  - E → number | ( S )
- LL(1) grammar for the same language:
  - S → E S'
  - S' → ε | + S
  - E → number | ( S )
Left-recursive vs. Right-recursive
(1+2+(3+4))+5
- Original grammar was left-recursive:
  - S → S + E
  - S → E
- LL(1) grammar is right-recursive (parsed top-down):
  - S → E S'
  - S' → ε | + S
- Left-recursive grammars don't work with top-down parsing: deciding how many times to apply S → S + E needs an arbitrary amount of look-ahead
[Figure: left-recursive parse tree for (...) + (...) + (...) + (...) + ..., a chain of S → S + E nodes]
How to create an LL(1) grammar
- Write a right-recursive grammar:
  - S → E + S
  - S → E
- Left-factor common prefixes, place suffix in new non-terminal:
  - S → E S'
  - S' → ε
  - S' → + S
Right Recursion
(1+2+(3+4))+5
- Right recursion: right-associative
- Left recursion: left-associative
[Figure: parse trees for (1+2+(3+4))+5 under the right-recursive grammar (S → E + S) and the left-recursive grammar (S → S + E)]
Associativity
- We can provide left-associativity by massaging the recursive-descent code:

    void parse_S() {
        switch (token) {
            case '(': case NUMBER: parse_E(); parse_S_prime(); return;
            default: throw new ParseError();
        }
    }
    void parse_S_prime() {
        switch (token) {
            case '+': token = input.read(); parse_S(); return;
            case ')': case EOF: return;
            default: throw new ParseError();
        }
    }
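As a sketch only, the two procedures above can be made runnable in Java. Several assumptions are not in the slides. Tokens are plain strings ending in a sentinel "EOF". The names RightRecursiveParser, parseSPrime and parse are made up for this example. Instead of an AST, each procedure returns a fully parenthesized string, so the grouping the grammar imposes is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: recursive-descent parser for the LL(1) grammar
//   S -> E S'     S' -> epsilon | + S     E -> number | ( S )
// Returns a fully parenthesized string instead of an AST so the
// grouping chosen by the grammar is visible. Names are illustrative.
public class RightRecursiveParser {
    static class ParseError extends RuntimeException {
        ParseError(String msg) { super(msg); }
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    // Tokens: multi-digit numbers, "(", ")", "+", then a sentinel "EOF".
    RightRecursiveParser(String text) {
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
                tokens.add(text.substring(start, i));
            } else if (c == '(' || c == ')' || c == '+') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new ParseError("bad character: " + c);
            }
        }
        tokens.add("EOF");
    }

    private String token() { return tokens.get(pos); }

    // S -> E S'
    String parseS() {
        String left = parseE();
        String rest = parseSPrime();          // null means S' -> epsilon
        return rest == null ? left : "(" + left + "+" + rest + ")";
    }

    // S' -> + S | epsilon   (epsilon is predicted on ')' or EOF)
    String parseSPrime() {
        switch (token()) {
            case "+":
                pos++;
                return parseS();
            case ")":
            case "EOF":
                return null;
            default:
                throw new ParseError("unexpected " + token());
        }
    }

    // E -> number | ( S )
    String parseE() {
        String t = token();
        if (Character.isDigit(t.charAt(0))) { pos++; return t; }
        if (t.equals("(")) {
            pos++;
            String inner = parseS();
            if (!token().equals(")")) throw new ParseError("expected )");
            pos++;
            return inner;
        }
        throw new ParseError("unexpected " + t);
    }

    static String parse(String text) {
        RightRecursiveParser p = new RightRecursiveParser(text);
        String tree = p.parseS();
        if (!p.token().equals("EOF")) throw new ParseError("trailing input");
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("(1+2+(3+4))+5"));
    }
}
```

For example, parse("(1+2+(3+4))+5") returns ((1+(2+(3+4)))+5). The right-recursive grammar groups the inner sum as 1+(2+(3+4)), i.e. right-associatively.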
Associativity
- Inline parse_S' into parse_S; the recursive call is now a tail call:

    void parse_S() {   // parses a sequence of E + E + E ...
        switch (token) {
            case '(': case NUMBER:
                parse_E();
                switch (token) {
                    case '+': token = input.read(); parse_S(); return;   // tail recursion
                    case ')': return;
                    case EOF: return;
                    default: throw new ParseError();
                }
            default: throw new ParseError();
        }
    }
Flattening Associative Operators

    void parse_S() {   // parses an arbitrary sequence of E + E + E ...
        while (true) {
            switch (token) {
                case '(': case NUMBER:
                    parse_E();
                    switch (token) {
                        case '+': token = input.read(); break;
                        case ')': case EOF: return;
                        default: throw new ParseError();
                    }
                    break;
                default: throw new ParseError();
            }
        }
    }

[Figure: flattened tree for (1+2+(3+4))+5]
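The loop version can be sketched the same way. This is again hypothetical Java: the class LoopParser uses string tokens and returns a parenthesized string instead of an AST. Once the tail call has become a loop, each new E can be combined with the tree built so far, which yields left-associative grouping.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: S parsed as E (+ E)* with a loop instead of tail recursion.
// Each new E is attached to the tree built so far -> left-associative.
// Returns a parenthesized string in place of an AST; names are illustrative.
public class LoopParser {
    static class ParseError extends RuntimeException {
        ParseError(String msg) { super(msg); }
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    // Tokens: multi-digit numbers, "(", ")", "+", then a sentinel "EOF".
    LoopParser(String text) {
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
                tokens.add(text.substring(start, i));
            } else if (c == '(' || c == ')' || c == '+') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new ParseError("bad character: " + c);
            }
        }
        tokens.add("EOF");
    }

    private String token() { return tokens.get(pos); }

    String parseS() {
        String tree = parseE();
        while (true) {
            switch (token()) {
                case "+":
                    pos++;
                    tree = "(" + tree + "+" + parseE() + ")";
                    break;                  // leaves the switch; the loop continues
                case ")":
                case "EOF":
                    return tree;
                default:
                    throw new ParseError("unexpected " + token());
            }
        }
    }

    // E -> number | ( S )
    String parseE() {
        String t = token();
        if (Character.isDigit(t.charAt(0))) { pos++; return t; }
        if (t.equals("(")) {
            pos++;
            String inner = parseS();
            if (!token().equals(")")) throw new ParseError("expected )");
            pos++;
            return inner;
        }
        throw new ParseError("unexpected " + t);
    }

    static String parse(String text) {
        LoopParser p = new LoopParser(text);
        String tree = p.parseS();
        if (!p.token().equals("EOF")) throw new ParseError("trailing input");
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("(1+2+(3+4))+5"));
    }
}
```

Here parse("(1+2+(3+4))+5") returns (((1+2)+(3+4))+5). The sums now group to the left, whereas the right-recursive grammar groups them to the right.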
Summary
- Now have a complete recipe for building a parser:
  Language grammar
    → LL(1) grammar
    → predictive parse table
    → recursive-descent parser
    → recursive-descent parser w/ AST generation
Bottom-up parsing
- A more powerful parsing technology
- LR grammars: more expressive than LL
  - can handle left-recursive grammars and virtually all programming languages
  - more natural expression of programming-language syntax
- Shift-reduce parsers
  - automatic parser generators (e.g. yacc)
  - detect errors as soon as possible
  - allow better error recovery
Top-down parsing
S → S + E | E    E → number | ( S )
(1+2+(3+4))+5
- S → S+E → E+E → (S)+E → (S+E)+E → (S+E+E)+E → (E+E+E)+E → (1+E+E)+E → (1+2+E)+E → ...
- In a left-most derivation, the entire tree above a token (2) has been expanded by the time that token is encountered
- Must be able to predict!
[Figure: parse tree for (1+2+(3+4))+5]
Bottom-up parsing
S → S + E | E    E → number | ( S )
- Right-most derivation, backward:
  - Start with the tokens
  - End with the start symbol
- (1+2+(3+4))+5 ← (E+2+(3+4))+5 ← (S+2+(3+4))+5 ← (S+E+(3+4))+5 ← (S+(3+4))+5 ← (S+(E+4))+5 ← (S+(S+4))+5 ← (S+(S+E))+5 ← (S+(S))+5 ← (S+E)+5 ← (S)+5 ← E+5 ← S+E ← S
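Bottom-up parsing
S → S + E | E    E → number | ( S )
Right-most derivation, read backward, next to the input (scanned | unscanned):

    derivation          input
    (1+2+(3+4))+5       | (1+2+(3+4))+5
  ← (E+2+(3+4))+5       (1 | +2+(3+4))+5
  ← (S+2+(3+4))+5       (1 | +2+(3+4))+5
  ← (S+E+(3+4))+5       (1+2 | +(3+4))+5
  ← (S+(3+4))+5         (1+2 | +(3+4))+5
  ← (S+(E+4))+5         (1+2+(3 | +4))+5
  ← (S+(S+4))+5         (1+2+(3 | +4))+5
  ← (S+(S+E))+5         (1+2+(3+4 | ))+5
  ← (S+(S))+5           (1+2+(3+4 | ))+5
  ← (S+E)+5             (1+2+(3+4) | )+5
  ← (S)+5               (1+2+(3+4) | )+5
  ← E+5                 (1+2+(3+4)) | +5
  ← S+E                 (1+2+(3+4))+5 |
  ← S                   (1+2+(3+4))+5 |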
Bottom-up parsing
S → S + E | E    E → number | ( S )
- (1+2+(3+4))+5 ← (E+2+(3+4))+5 ← (S+2+(3+4))+5 ← (S+E+(3+4))+5 ← ...
- Advantage of bottom-up parsing: can select productions based on more information
[Figure: parse tree for (1+2+(3+4))+5]
Top-down vs. Bottom-up
- Bottom-up: don't need to figure out as much of the parse tree for a given amount of input
[Figure: top-down vs. bottom-up parsing, each showing the part of the parse tree determined relative to the scanned and unscanned input]
Shift-reduce parsing
- Parsing is a sequence of shift and reduce operations
- Parser state is a stack of terminals and non-terminals (grows to the right)
- Unconsumed input is a string of terminals
- Current derivation step is always stack + input
- Shift: push head of input onto stack

    stack     input
    (         1+2+(3+4))+5
    (1        +2+(3+4))+5
Reduce
- Replace symbols γ on top of the stack with non-terminal symbol X, corresponding to production X → γ (pop γ, push X)

    stack     input
    (S+E      +(3+4))+5      reduce S → S + E
    (S        +(3+4))+5

- What effect does this have on the derivation?
Shift-reduce parsing
S → S + E | E    E → number | ( S )

    derivation        stack     input stream       action
    (1+2+(3+4))+5               (1+2+(3+4))+5      shift
    (1+2+(3+4))+5     (         1+2+(3+4))+5       shift
    (1+2+(3+4))+5     (1        +2+(3+4))+5        reduce E → num
    (E+2+(3+4))+5     (E        +2+(3+4))+5        reduce S → E
    (S+2+(3+4))+5     (S        +2+(3+4))+5        shift
    (S+2+(3+4))+5     (S+       2+(3+4))+5         shift
    (S+2+(3+4))+5     (S+2      +(3+4))+5          reduce E → num
    (S+E+(3+4))+5     (S+E      +(3+4))+5          reduce S → S + E
    (S+(3+4))+5       (S        +(3+4))+5          shift
    (S+(3+4))+5       (S+       (3+4))+5           shift
    (S+(3+4))+5       (S+(      3+4))+5            shift
    (S+(3+4))+5       (S+(3     +4))+5             reduce E → num
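The trace above can be reproduced by a small Java sketch. It deliberately sidesteps the question raised next, which is how to choose actions. The hypothetical class ShiftReduceSketch uses a hand-written rule that happens to work for this particular grammar, not a parse table. The rule reduces whenever the top of the stack is num, ( S ), S + E or E, and otherwise shifts. Numbers are tokenized as num. After each action the sketch prints the stack and the remaining input.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: shift-reduce parsing for  S -> S + E | E,  E -> number | ( S ).
// ASSUMPTION: actions come from a hand-written rule that happens to work for
// this grammar (reduce any handle on top of the stack, otherwise shift);
// it is not an LR parse table.
public class ShiftReduceSketch {
    // Tokens: "num" for a (multi-digit) number, "(", ")", "+".
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isDigit(c)) {
                while (i + 1 < text.length() && Character.isDigit(text.charAt(i + 1))) i++;
                out.add("num");
            } else if ("()+".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
            } else if (!Character.isWhitespace(c)) {
                throw new IllegalArgumentException("bad character: " + c);
            }
        }
        return out;
    }

    // True if the top symbols of the stack are exactly `suffix` (last = top).
    static boolean endsWith(List<String> stack, String... suffix) {
        int off = stack.size() - suffix.length;
        if (off < 0) return false;
        for (int i = 0; i < suffix.length; i++)
            if (!stack.get(off + i).equals(suffix[i])) return false;
        return true;
    }

    // Reduce: pop the handle (len symbols) and push the non-terminal lhs.
    static void reduce(List<String> stack, int len, String lhs) {
        for (int i = 0; i < len; i++) stack.remove(stack.size() - 1);
        stack.add(lhs);
    }

    // Prints "stack | remaining input | action" after each step and
    // returns the list of actions taken.
    static List<String> parse(String text) {
        List<String> input = tokenize(text);
        List<String> stack = new ArrayList<>();
        List<String> actions = new ArrayList<>();
        int pos = 0;
        while (true) {
            String action;
            if (endsWith(stack, "num")) {
                reduce(stack, 1, "E"); action = "reduce E -> num";
            } else if (endsWith(stack, "(", "S", ")")) {
                reduce(stack, 3, "E"); action = "reduce E -> ( S )";
            } else if (endsWith(stack, "S", "+", "E")) {
                reduce(stack, 3, "S"); action = "reduce S -> S + E";
            } else if (endsWith(stack, "E")) {
                reduce(stack, 1, "S"); action = "reduce S -> E";
            } else if (pos < input.size()) {
                stack.add(input.get(pos++)); action = "shift";
            } else if (stack.size() == 1 && stack.get(0).equals("S")) {
                action = "accept";
            } else {
                throw new IllegalStateException("syntax error, stack = " + stack);
            }
            actions.add(action);
            System.out.println(String.join(" ", stack) + "  |  "
                + String.join(" ", input.subList(pos, input.size())) + "  |  " + action);
            if (action.equals("accept")) return actions;
        }
    }

    public static void main(String[] args) {
        parse("(1+2+(3+4))+5");
    }
}
```

The first twelve actions it prints for (1+2+(3+4))+5 match the table: shift, shift, reduce E → num, reduce S → E, and so on. The rule only works because, in this grammar, an E on top of the stack can always be reduced immediately. In general the choice is not this easy.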
Problem
- How do we know which action to take: whether to shift or reduce, and with which production?
- Sometimes can reduce but shouldn't
  - e.g., X → ε can always be reduced
- Sometimes can reduce in different ways
Action Selection Problem
- Given stack σ and input symbol b, should we
  - shift b onto the stack (making it σb), or
  - reduce some production X → γ, assuming that the stack has the form αγ (making it αX)?
- Should apply reduction X → γ depending on what stack prefix α is, but α is different for different possible reductions, since the γ's have different lengths. How to keep track?
Parser States
- Idea: summarize all possible stack prefixes α as a parser state
- A state transition function updates the parser state as shifts and reductions are performed (a DFA)
- Summarizing discards information
  - affects what grammars the parser handles
  - affects the size of the DFA (number of states)
LR(0) parser
- Left-to-right scanning, Right-most derivation, zero look-ahead tokens
- Too weak to handle most language grammars (including this one)
- But will help us understand how to build better parsers
LR(0) states
- A state is a set of items
- An LR(0) item is a production from the language with a separator "." somewhere in the RHS of the production
  - Stuff before "." is already on the stack (beginnings of possible γ's)
  - Stuff after ".": what we might see next
  - The prefixes α are represented by the state
[Figure: a state drawn as a box of items, e.g. E → number .  and  E → ( . S )]
An LR(0) grammar: non-empty lists
- S → ( L )
- S → id
- L → S
- L → L , S
- Example strings: x   (x,y)   (x, (y,z), w)   ((((x))))   (x, (y, (z, w)))
Closure
S → ( L ) | id    L → S | L , S
Start state: closure of { S' → . S } = { S' → . S,  S → . ( L ),  S → . id }
- Closure of a state adds items for all productions whose LHS occurs in an item in the state just after the "."
- Added items have the "." located at the beginning
- Like the ε-closure step of NFA → DFA conversion
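A minimal Java sketch of the closure operation for the list grammar follows. It assumes an augmented start production S' → S and represents each item as a (production index, dot position) pair. The class and method names are illustrative, not from the lecture.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Sketch: LR(0) closure for the grammar
//   S' -> S     S -> ( L ) | id     L -> S | L , S
public class Lr0Closure {
    // Each row is {lhs, rhs symbols...}; row 0 is the augmented start production.
    static final String[][] GRAMMAR = {
        {"S'", "S"},
        {"S", "(", "L", ")"},
        {"S", "id"},
        {"L", "S"},
        {"L", "L", ",", "S"},
    };

    // An LR(0) item: a production plus the position of the "." in its RHS.
    static final class Item {
        final int prod, dot;
        Item(int prod, int dot) { this.prod = prod; this.dot = dot; }

        // Symbol just after the ".", or null if the "." is at the end.
        String next() {
            String[] p = GRAMMAR[prod];
            return dot + 1 < p.length ? p[dot + 1] : null;
        }

        @Override public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).prod == prod && ((Item) o).dot == dot;
        }
        @Override public int hashCode() { return Objects.hash(prod, dot); }
        @Override public String toString() {
            String[] p = GRAMMAR[prod];
            StringBuilder sb = new StringBuilder(p[0]).append(" ->");
            for (int i = 1; i < p.length; i++) {
                if (i - 1 == dot) sb.append(" .");
                sb.append(' ').append(p[i]);
            }
            if (dot == p.length - 1) sb.append(" .");
            return sb.toString();
        }
    }

    // Repeatedly add X -> . gamma for every non-terminal X that appears
    // just after the "." in some item, until no new items appear.
    static Set<Item> closure(Set<Item> items) {
        Set<Item> result = new LinkedHashSet<>(items);
        List<Item> work = new ArrayList<>(items);
        while (!work.isEmpty()) {
            String x = work.remove(work.size() - 1).next();
            if (x == null) continue;
            for (int p = 0; p < GRAMMAR.length; p++) {
                Item added = new Item(p, 0);
                if (GRAMMAR[p][0].equals(x) && result.add(added)) work.add(added);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Start state: closure of { S' -> . S }
        System.out.println(closure(Collections.singleton(new Item(0, 0))));
        // State reached on "(": closure of { S -> ( . L ) }
        System.out.println(closure(Collections.singleton(new Item(1, 1))));
    }
}
```

Closing {S' → . S} yields the three items of the start state, and closing {S → ( . L )} yields the five items of the state reached on (. Terminals never appear as a left-hand side, so only non-terminals after the "." add anything.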
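Applying shift actions
S → ( L ) | id    L → S | L , S
[Figure: from the start state { S' → . S, S → . ( L ), S → . id }, ( leads to { S → ( . L ), L → . S, L → . L , S, S → . ( L ), S → . id } and id leads to { S → id . }; from the new state, ( loops back to itself and id also leads to { S → id . }]
- In the new state, include all items that have the appropriate input symbol just after the dot, and advance the dot in those items (and take the closure)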
Applying reduce actions
[Figure: the states of the previous slide, plus { S → ( L . ), L → L . , S } reached on L and { L → S . } reached on S from the ( state; { S → id . } and { L → S . } are the states causing reductions]
- Need to set the state after reducing
- On reduction, pop back to the old state and take the DFA transition on the non-terminal reduced
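Full DFA (Appel p. 63)
State 1: S' → . S;  S → . ( L );  S → . id                           [on ( to 3, on id to 2, on S to 4]
State 2: S → id .
State 3: S → ( . L );  L → . S;  L → . L , S;  S → . ( L );  S → . id   [on ( to 3, on id to 2, on S to 7, on L to 5]
State 4: S' → S .   (final state)
State 5: S → ( L . );  L → L . , S                                    [on ) to 6, on , to 8]
State 6: S → ( L ) .
State 7: L → S .
State 8: L → L , . S;  S → . ( L );  S → . id                         [on ( to 3, on id to 2, on S to 9]
State 9: L → L , S .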
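The whole DFA can be generated mechanically with the worklist procedure described in the closure and shift slides. Start from the closure of S' → . S. Apply goto on every grammar symbol: advance the dot over that symbol, then take the closure. Add any state not seen before. The Java sketch below is hypothetical code, not from the lecture or from Appel. As an assumption that only works for tiny grammars, it encodes an item as the integer 10 × production + dot. It should find 9 states, 5 of which contain a completed item, as in the figure. States are numbered 0, 1, 2, ... in discovery order, so the numbers differ from the figure's.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: LR(0) DFA construction for  S' -> S,  S -> ( L ) | id,  L -> S | L , S.
public class Lr0Dfa {
    static final String[][] G = {
        {"S'", "S"}, {"S", "(", "L", ")"}, {"S", "id"}, {"L", "S"}, {"L", "L", ",", "S"},
    };
    static final String[] SYMBOLS = {"(", ")", ",", "id", "S", "L"};

    // ASSUMPTION: item = 10 * production + dot (fine for this tiny grammar).
    // Returns the symbol after the dot, or null if the dot is at the end.
    static String next(int item) {
        String[] p = G[item / 10];
        int dot = item % 10;
        return dot + 1 < p.length ? p[dot + 1] : null;
    }

    static TreeSet<Integer> closure(Set<Integer> kernel) {
        TreeSet<Integer> result = new TreeSet<>(kernel);
        Deque<Integer> work = new ArrayDeque<>(kernel);
        while (!work.isEmpty()) {
            String x = next(work.pop());
            for (int p = 0; p < G.length; p++)
                if (G[p][0].equals(x) && result.add(p * 10)) work.push(p * 10);
        }
        return result;
    }

    // goto(I, X): advance the "." over X wherever X follows it, then close.
    static TreeSet<Integer> gotoOn(Set<Integer> state, String x) {
        Set<Integer> kernel = new TreeSet<>();
        for (int item : state)
            if (x.equals(next(item))) kernel.add(item + 1);
        return closure(kernel);
    }

    // Worklist construction of every reachable state; trans receives the edges.
    static List<TreeSet<Integer>> buildStates(Map<Integer, Map<String, Integer>> trans) {
        List<TreeSet<Integer>> states = new ArrayList<>();
        Map<Set<Integer>, Integer> index = new HashMap<>();
        TreeSet<Integer> start = closure(Collections.singleton(0));
        states.add(start);
        index.put(start, 0);
        for (int i = 0; i < states.size(); i++) {
            for (String x : SYMBOLS) {
                TreeSet<Integer> target = gotoOn(states.get(i), x);
                if (target.isEmpty()) continue;
                Integer j = index.get(target);
                if (j == null) {
                    j = states.size();
                    states.add(target);
                    index.put(target, j);
                }
                trans.computeIfAbsent(i, k -> new LinkedHashMap<>()).put(x, j);
            }
        }
        return states;
    }

    // Renders an item as "A -> alpha . beta".
    static String show(int item) {
        String[] p = G[item / 10];
        int dot = item % 10;
        StringBuilder sb = new StringBuilder(p[0]).append(" ->");
        for (int i = 1; i < p.length; i++) {
            if (i - 1 == dot) sb.append(" .");
            sb.append(' ').append(p[i]);
        }
        if (dot == p.length - 1) sb.append(" .");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Integer>> trans = new HashMap<>();
        List<TreeSet<Integer>> states = buildStates(trans);
        for (int i = 0; i < states.size(); i++) {
            List<String> items = new ArrayList<>();
            for (int item : states.get(i)) items.add(show(item));
            System.out.println("state " + i + ": " + items + "  edges: "
                + trans.getOrDefault(i, Collections.emptyMap()));
        }
    }
}
```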
S → ( L ) | id    L → S | L , S
- Idea: the stack is labeled with states
- Let's try parsing ((x),y):

    derivation   stack               input     action
    ((x),y)      1                   ((x),y)   shift, goto 3
    ((x),y)      1 (3                (x),y)    shift, goto 3
    ((x),y)      1 (3 (3             x),y)     shift, goto 2
    ((x),y)      1 (3 (3 x2          ),y)      reduce S → id
    ((S),y)      1 (3 (3 S7          ),y)      reduce L → S
    ((L),y)      1 (3 (3 L5          ),y)      shift, goto 6
    ((L),y)      1 (3 (3 L5 )6       ,y)       reduce S → ( L )
    (S,y)        1 (3 S7             ,y)       reduce L → S
    (L,y)        1 (3 L5             ,y)       shift, goto 8
    (L,y)        1 (3 L5 ,8          y)        shift, goto 2
    (L,y)        1 (3 L5 ,8 y2       )         reduce S → id
    (L,S)        1 (3 L5 ,8 S9       )         reduce L → L , S
    (L)          1 (3 L5             )         shift, goto 6
    (L)          1 (3 L5 )6                    reduce S → ( L )
    S            1 S4                          done
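The trace above can be replayed with a table-driven LR(0) driver. In this Java sketch the transitions and reductions are transcribed by hand from the Full DFA slide and keep its state numbers. The class name Lr0Driver and the token name id are assumptions, and any identifier tokenizes as id. Only state numbers are kept on the stack, since each state determines the symbol it was entered on.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: LR(0) parsing driver for  S -> ( L ) | id,  L -> S | L , S.
// Tables transcribed by hand from the "Full DFA" slide (same state numbers);
// state 4 (S' -> S .) is the final state.
public class Lr0Driver {
    // A reduction: replace len RHS symbols (len stack states) by lhs.
    static final class Rule {
        final String lhs; final int len; final String text;
        Rule(String lhs, int len, String text) { this.lhs = lhs; this.len = len; this.text = text; }
    }

    static final Map<Integer, Map<String, Integer>> EDGE = new HashMap<>();  // shift/goto edges
    static final Map<Integer, Rule> REDUCE = new HashMap<>();                  // LR(0) reduce states

    static void edge(int from, String symbol, int to) {
        EDGE.computeIfAbsent(from, k -> new HashMap<>()).put(symbol, to);
    }

    static {
        edge(1, "(", 3); edge(1, "id", 2); edge(1, "S", 4);
        edge(3, "(", 3); edge(3, "id", 2); edge(3, "S", 7); edge(3, "L", 5);
        edge(5, ")", 6); edge(5, ",", 8);
        edge(8, "(", 3); edge(8, "id", 2); edge(8, "S", 9);
        REDUCE.put(2, new Rule("S", 1, "S -> id"));
        REDUCE.put(6, new Rule("S", 3, "S -> ( L )"));
        REDUCE.put(7, new Rule("L", 1, "L -> S"));
        REDUCE.put(9, new Rule("L", 3, "L -> L , S"));
    }

    // Any identifier becomes the token "id"; "(", ")" and "," stand for themselves.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                while (i + 1 < text.length() && Character.isLetterOrDigit(text.charAt(i + 1))) i++;
                out.add("id");
            } else if ("(),".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
            } else if (!Character.isWhitespace(c)) {
                throw new IllegalArgumentException("bad character: " + c);
            }
        }
        return out;
    }

    // Returns the actions taken, in the same wording as the trace.
    static List<String> parse(String text) {
        List<String> input = tokenize(text);
        Deque<Integer> states = new ArrayDeque<>();
        states.push(1);
        List<String> actions = new ArrayList<>();
        int pos = 0;
        while (true) {
            int top = states.peek();
            Rule r = REDUCE.get(top);
            if (r != null) {
                // LR(0): reduce without looking at the input. Pop one state per
                // RHS symbol, then follow the uncovered state's edge on the LHS.
                for (int i = 0; i < r.len; i++) states.pop();
                states.push(EDGE.get(states.peek()).get(r.lhs));
                actions.add("reduce " + r.text);
            } else if (top == 4 && pos == input.size()) {
                actions.add("accept");
                return actions;
            } else {
                Map<String, Integer> row = EDGE.getOrDefault(top, Collections.emptyMap());
                Integer next = pos < input.size() ? row.get(input.get(pos)) : null;
                if (next == null) throw new IllegalStateException("syntax error in state " + top);
                states.push(next);
                pos++;
                actions.add("shift, goto " + next);
            }
        }
    }

    public static void main(String[] args) {
        for (String a : parse("((x),y)")) System.out.println(a);
    }
}
```

For ((x),y) the driver produces the 14 actions of the trace followed by accept. On an input such as (x,) it stops with a syntax error in state 8, because state 8 has no edge on ).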
Summary
- Grammars can be parsed bottom-up using a DFA + stack
- State construction converts the grammar into states that capture the information needed to know what action to take
- Stack entries are labeled by state index
- Next time: SLR and LR(1) parsers, automatic parser generators