Title: Syntax Analysis
1Syntax Analysis
- Introduction to parsers
- Context-free grammars
- Push-down automata
- Top-down parsing
- LL grammars and parsers
- Bottom-up parsing
- LR grammars and parsers
- Bison/Yacc - parser generators
- Error handling: detection and recovery
2Introduction to parsers
[Figure: the parser and its interfaces. Labels: source code, token, next token, Parser, syntax tree, Symbol Table.]
3Context Free Grammar
- CFG Terminology
- Rewrite vs. Reduce
- Derivation
- Language and CFL
- Equivalence CNF
- Parsing vs. Derivation
- lm/rm derivation parse tree
- Ambiguity resolution
- Expressive power
Derivation is the reverse of Parsing. If we know
how sentences are derived, we may find a parsing
method in the reversed direction.
4CFG An Example
- Terminals: id, +, -, *, /, (, )
- Nonterminals: expr, op
- Productions:
    expr → expr op expr
    expr → ( expr )
    expr → - expr
    expr → id
    op → + | - | * | /
- The start symbol: expr
5Notational Conventions in CFG
- a, b, c, ..., 0-9, id: symbols in Σ
- A, B, C, ..., S, expr, stmt: symbols in N
- U, V, W, ..., X, Y, Z: grammar symbols in (Σ ∪ N)
- α, β, γ, ...: strings in (Σ ∪ N)*
- u, v, w, ...: strings in Σ*
- A → α | β | ... is an abbreviation of A → α, A → β, ...
- Alternatives: α, β, ... at the RHS
6Notational Conventions in CFG
- Abbreviation
- is the abbreviation of
7Context-Free Grammars
- A set of terminals: basic symbols from which sentences are formed
- A set of nonterminals: syntactic variables denoting sets of strings
- A set of productions: rules specifying how the terminals and nonterminals can be combined to form sentences
- The start symbol: a distinguished nonterminal denoting the language
8CFG ComponentsSpecification for Structures
Constituency
- CFG: formal specification of structure (parse trees)
- G = (Σ, N, P, S)
  - Σ: terminal symbols
  - N: non-terminal symbols
  - P: production rules
  - S: start symbol
9CFG Components
- Σ: terminal symbols
  - the input symbols of the language
  - programming languages: tokens (reserved words, variables, operators, ...)
  - natural languages: words or parts of speech
  - pre-terminals: parts of speech (when words are regarded as terminals)
- N: non-terminal symbols
  - groups of terminals and/or other non-terminals
- S: start symbol, the largest constituent of a parse tree
10CFG Components
- P: production (re-writing) rules
  - form: A → β (A: non-terminal, β: string of terminals and non-terminals)
  - meaning: A re-writes to (consists of, derives) β, or β reduces to A
  - start with S-productions (S → β)
11Derivations
- A derivation step is an application of a production as a rewriting rule: E ⇒ - E
- A sequence of derivation steps E ⇒ - E ⇒ - ( E ) ⇒ - ( id ) is called a derivation of - ( id ) from E
- The symbol ⇒* denotes "derives in zero or more steps"; the symbol ⇒+ denotes "derives in one or more steps"
12CFG Accepted Languages
- Context-Free Language
  - Language accepted by a CFG
  - L(G) = { w ∈ Σ* | S ⇒+ w } (strings of terminals that can be derived from the start symbol)
- Proof of acceptance: by induction
  - On the number of derivation steps
  - On the length of the input string
13Context-Free Languages
- A context-free language L(G) is the language defined by a context-free grammar G
- A string of terminals w is in L(G) if and only if S ⇒+ w; w is called a sentence of G
- If S ⇒* α, where α may contain nonterminals, then we call α a sentential form of G
    E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
- G1 is equivalent to G2 if L(G1) = L(G2)
14CFG Equivalence
- Chomsky Normal Form (CNF) (Chomsky, 1963)
  - ε-free, and
  - every production rule is in one of the following forms:
    - A → A1 A2
    - A → a (A1, A2: non-terminals, a: terminal)
  - i.e., two non-terminals or one terminal at the RHS
- Properties
  - Generates binary parse trees
  - Good simplification for some algorithms
    - e.g., grammar training with the inside-outside algorithm (Baker 1979)
  - Good tool for theoretical proofs
    - e.g., time complexity
15CFG Equivalence
- Every CFG can be converted into a weakly equivalent CNF grammar
  - equivalence: L(G1) = L(G2)
  - strong equivalence: the grammars assign the same phrase structure to each sentence (except for renaming non-terminals)
  - weak equivalence: the grammars generate the same language but need not assign the same phrase structure to each sentence
  - e.g., A → B C D ⇒ A → B X, X → C D
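The last rewriting step (A → B C D becomes A → B X, X → C D) can be sketched in Python as follows. This is only the binarization step, not a full CNF conversion: ε-removal, unit-production removal, and lifting terminals out of mixed right-hand sides are left out. The function name `binarize` and the fresh nonterminal names X1, X2, ... are assumptions made for illustration, and the sketch assumes those names do not already occur in the grammar.

```python
def binarize(productions):
    """Split every right-hand side longer than two symbols into a chain of
    two-symbol productions, introducing fresh nonterminals X1, X2, ...

    productions: list of (lhs, rhs) pairs, where rhs is a list of symbols.
    """
    result = []
    counter = 0
    for lhs, rhs in productions:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            fresh = f"X{counter}"            # hypothetical fresh nonterminal
            result.append((lhs, [rhs[0], fresh]))
            lhs, rhs = fresh, rhs[1:]        # keep splitting the remainder
        result.append((lhs, rhs))
    return result


for lhs, rhs in binarize([("A", ["B", "C", "D"])]):
    print(lhs, "->", " ".join(rhs))
```

The new grammar is only weakly equivalent to the old one: the strings are the same, but the parse tree gains an extra X1 node.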
16CFG An Example
- Terminals: id, +, -, *, /, (, )
- Nonterminals: expr, op
- Productions:
    expr → expr op expr     R1
    expr → ( expr )         R2
    expr → - expr           R3
    expr → id               R4
    op → + | - | * | /
- The start symbol: expr
17Left- Right-most Derivations
- Each derivation step needs to choose
  - a nonterminal to rewrite
  - an alternative to apply
- A leftmost derivation always chooses the leftmost nonterminal to rewrite:
    E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
- A rightmost (canonical) derivation always chooses the rightmost nonterminal to rewrite:
    E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
18Left- Right-most Derivations
- Representation of leftmost/rightmost derivations
  - Use the sequence of productions (or production numbers) to represent a derivation sequence.
- Example
  - E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
  - ⇒ 3, 2, 1, 4, 4 (R3, R2, R1, R4, R4)
- Advantage: a compact representation of the parse tree (data compression)
  - Each parse tree has a unique leftmost/rightmost derivation
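To illustrate how a list of production numbers encodes a whole derivation, the sketch below replays the sequence 3, 2, 1, 4, 4 as a rightmost derivation. It assumes a simplified grammar in which R1 is written directly as E → E + E (the op nonterminal already rewritten to +), as the derivation on the slide does; the function name `replay_rightmost` is hypothetical.

```python
PRODUCTIONS = {
    1: ("E", ["E", "+", "E"]),   # R1, with op already rewritten to + (assumption)
    2: ("E", ["(", "E", ")"]),   # R2
    3: ("E", ["-", "E"]),        # R3
    4: ("E", ["id"]),            # R4
}
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS.values()}


def replay_rightmost(numbers, start="E"):
    """Apply the numbered productions to the rightmost nonterminal, in order,
    and return every sentential form of the derivation."""
    form = [start]
    steps = [" ".join(form)]
    for n in numbers:
        lhs, rhs = PRODUCTIONS[n]
        # Position of the rightmost nonterminal in the current sentential form.
        idx = max(i for i, s in enumerate(form) if s in NONTERMINALS)
        if form[idx] != lhs:
            raise ValueError(f"R{n} does not rewrite {form[idx]}")
        form[idx:idx + 1] = rhs
        steps.append(" ".join(form))
    return steps


print(" => ".join(replay_rightmost([3, 2, 1, 4, 4])))
```

Five small integers are enough to rebuild the entire parse tree, which is the compactness the slide refers to.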
19Parse Trees
- A parse tree is a graphical representation for a
derivation that filters out the order of choosing
nonterminals for rewriting
20Context Free Grammar (CFG) Specification for
Structures Constituency
- Parse tree: graphical representation of structure
  - Root node (S): a sentential-level structure
  - Internal nodes: constituents of the sentence
  - Arcs: relationship between parent nodes and their children (constituents)
  - Terminal nodes: surface forms of the input symbols (e.g., words)
- Bracketed notation: alternative representation
  - e.g., I saw the girl in the park
21Parse TreeI saw the girl in the park
1st parse
22Parse TreeI saw the girl in the park
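2nd parse (the PP "in the park" attaches to the NP "the girl"), in bracketed notation:
[S [NP [pron I]] [VP [v saw] [NP [NP [det the] [n girl]] [PP [p in] [NP [det the] [n park]]]]]]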
23LM RM An Example
E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
24Parse Trees Derivations
- Many derivations may correspond to the same parse
tree, but every parse tree has associated with it
a unique leftmost and a unique rightmost
derivation
25Ambiguous Grammar
- A grammar is ambiguous if it produces more than one parse tree for some sentence
  - i.e., more than one leftmost/rightmost derivation
    E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
    E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
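The two leftmost derivations above correspond to two distinct parse trees. As a rough check, the sketch below counts the parse trees of a token string under the ambiguous grammar E → E + E | E * E | id, which is the grammar these derivations appear to use (an assumption, since the slide does not restate it). The function name `count_parses` is hypothetical.

```python
from functools import lru_cache


def count_parses(tokens):
    """Count the parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parses("id + id * id".split()))   # two trees: the grammar is ambiguous
```

Any sentence with a count above one is a witness that the grammar is ambiguous; a count of one for a single sentence proves nothing about the grammar as a whole.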
26Ambiguous Grammar
27Resolving Ambiguity
- Use disambiguating rules to throw away undesirable parse trees
- Rewrite grammars by incorporating disambiguating rules into the grammars
28An Example
- The dangling-else grammar:
    stmt → if expr then stmt
         | if expr then stmt else stmt
         | other
- Two parse trees for: if E1 then if E2 then S1 else S2
29An Example
Preferred parse: the else matches the closest then
30Disambiguating Rules
- Rule: match each else with the closest previous unmatched then
- Remove undesired state transitions in the pushdown automaton
  - shift/reduce conflict on else
  - 1st parse: reduce
  - 2nd parse: shift
31Grammar Rewriting
stmt → m_stmt | unm_stmt
m_stmt → if expr then m_stmt else m_stmt | other        (m_stmt: with only paired then-else)
unm_stmt → if expr then stmt | if expr then m_stmt else unm_stmt
32RE vs. CFG
- Every language described by a RE can also be described by a CFG
- Example: (a|b)*abb
  - A0 → a A0 | b A0 | a A1
  - A1 → b A2
  - A2 → b A3
  - A3 → ε
- Right branching
- Starts with a terminal symbol
33RE vs. CFG
- Regular Grammar
- Right branching
- Starts with a terminal symbol
[Figure: right-branching parse tree under the regular grammar. Recoverable labels: (a|b)*, abb, b, A2, A3, ε.]
34RE vs. CFG
A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → b A3
A3 → ε
RE: (a|b)*abb
[Figure: transition diagram with states A0, A1, A2, A3.]
35RE vs. CFG
A0 → b A0 | a A1
A1 → a A1 | b A2
A2 → a A1 | b A3
A3 → a A1 | b A0 | ε
[Figure: transition diagram with states A0, A1, A2, A3.]
36CFG Expressive Power (cont.)
- Writing a CFG for a FSA (RE)
  - define a non-terminal Ni for each state with state number i
  - start symbol S = N0 (assuming that state 0 is the initial state)
  - for each transition δ(i, a) = j (from state i to state j on input symbol a), add a new production Ni → a Nj to P (if a = ε: Ni → Nj)
  - for each final state i, add a new production Ni → ε to P
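The construction above is mechanical enough to write down directly. The following Python sketch applies it to a transition list; the function name `fsa_to_grammar`, the triple encoding (i, a, j) for δ(i, a) = j, and the use of the empty string for an ε-move are assumptions made for illustration.

```python
def fsa_to_grammar(transitions, final_states):
    """Build right-linear productions from a finite-state automaton.

    transitions: iterable of (i, a, j), meaning delta(i, a) = j;
                 a == "" stands for an epsilon move.
    final_states: set of accepting state numbers.
    Returns a list of (lhs, rhs) pairs; an empty rhs stands for epsilon.
    """
    productions = []
    for i, a, j in transitions:
        rhs = [f"N{j}"] if a == "" else [a, f"N{j}"]
        productions.append((f"N{i}", rhs))      # Ni -> a Nj (or Ni -> Nj)
    for i in sorted(final_states):
        productions.append((f"N{i}", []))       # Ni -> epsilon
    return productions


# NFA for (a|b)*abb with initial state 0 and final state 3.
NFA = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
for lhs, rhs in fsa_to_grammar(NFA, {3}):
    print(lhs, "->", " ".join(rhs) or "ε")
```

The start symbol is N0, matching the assumption that state 0 is the initial state.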
37CFG Expressive Power (cont.)
N0 → a N0 | b N0 | a N1
N1 → b N2
N2 → b N3
N3 → ε
38CFG Expressive Power
- CFG vs. Regular Expression (R.E.)
  - Every R.E. can be recognized by a FSA
  - Every FSA can be represented by a CFG with production rules of the form A → a B | ε
  - Therefore, L(RE) ⊆ L(CFG)
39CFG Expressive Power (cont.)
- Chomsky Hierarchy
- R.E.: regular sets (recognized by FSAs)
- CFG: context-free (pushdown automata)
- CSG: context-sensitive (linear bounded automata)
- Unrestricted: recursively enumerable (Turing machine)
40Push-Down Automata
41RE vs. CFG
- Why use REs for lexical syntax?
  - Lexical rules do not need a notation as powerful as CFGs
  - REs are more concise and easier to understand than CFGs
  - More efficient lexical analyzers can be constructed from REs than from CFGs
  - REs provide a way of modularizing the front end into two manageable-sized components
42CFG vs. Finite-State Machine
- Inappropriateness of FSA
  - Constituents: only terminals
  - Recursion: does not allow A ⇒ B ⇒ A
- RTN (Recursive Transition Network)
  - FSA augmented with recursion
  - arc: terminal or non-terminal
  - if an arc is a non-terminal: call the sub-transition network and return upon traversal
43Nonregular Constructs
- REs can denote only a fixed number of repetitions or an unspecified number of repetitions of one given construct
  - E.g., a*b*
- A nonregular construct
  - L = { a^n b^n | n ≥ 1 }
44Non-Context-Free Constructs
- CFGs can denote only a fixed number of repetitions or an unspecified number of repetitions of one or two (paired) given constructs
  - E.g., a^n b^n
- Some non-context-free constructs
  - L1 = { w c w | w is in (a|b)* }
    - declaration/use of identifiers
  - L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }
    - formal arguments/actual arguments
  - L3 = { a^n b^n c^n | n ≥ 0 }
    - e.g., b: backspace, c: underscore
45Context-Free Constructs
- FA (RE) cannot keep counts
- CFGs can keep count of two items but not three
- Similar context-free constructs
  - L1' = { w c w^R | w is in (a|b)* }, R: reverse order
  - L2' = { a^n b^m c^m d^n | n ≥ 1 and m ≥ 1 }
  - L2'' = { a^n b^n c^m d^m | n ≥ 1 and m ≥ 1 }
  - L3' = { a^n b^n | n ≥ 1 }
46CFG Parsers
47Types of CFG Parsers
- Universal: can parse any CFG
  - CYK, Earley
  - CYK: exhaustively matches sub-ranges of input tokens against grammar rules, from smaller ranges to larger ranges
  - Earley: exhaustively enumerates possible expectations from left to right, according to the current input token and the grammar
- Non-universal: e.g., recursive descent parser
- Universal (applicable to all grammars) is NOT always efficient
48Types of CFG Parsers
- Practical parsers: what is a good parser?
- Simple: simple program structure
  - Left-to-right (or right-to-left) scan
    - middle-out or island-driven is often not preferred
  - Top-down or bottom-up matching
- Efficient: efficient for good/bad inputs
  - Parse normal syntax quickly
  - Detect errors immediately on the next token
- Deterministic
  - No alternative choices during parsing, given the next token
- Small lookahead buffer (also contributes to efficiency)
49Types of CFG Parsers
- Top down
  - Matching from the start symbol down to terminal tokens
- Bottom up
  - Matching input tokens with reducible rules, from terminals up to the start symbol
50Efficient CFG Parsers
- Top down: LL parsers
  - Matching from the start symbol down to terminal tokens, left to right, according to a leftmost derivation sequence
- Bottom up: LR parsers
  - Matching input tokens with reducible rules, left to right, from terminals up to the start symbol, in the reverse order of a rightmost derivation sequence
51Efficient CFG Parsers
- Efficient, deterministic parsing is only possible for some subclasses of grammars with special parsing algorithms
- Top down
  - Parsing LL grammars with LL parsers
- Bottom up
  - Parsing LR grammars with LR parsers
- LR grammars form a larger class of grammars than LL grammars
52Parsing Table Construction for Efficient Parsers
Good parsers do not change their code when the grammar is revised ⇒ table driven.
- Parsing table
  - A pre-computed table (according to the grammar), indicating the appropriate action(s) to take in any predefined state when some input token(s) is/are under examination
- Lookahead symbol(s): the input symbol(s) under examination for determining the next action(s)
id num
State-0 action-1 action-3
State-1 action-2 action-5
State-2 action-4
53Parsing Table Construction for Efficient Parsers
- Parsing table construction
  - Decide a pre-defined number of lookaheads to use for predicting the next state
  - Define and enumerate all the unique states for the parsing method
  - Decide the actions to take in all states with all possible lookahead(s)
54Parsing Table Construction for Efficient Parsers
- X-Parser: you can invent any parser and call it the X-Parser
  - But its parsing algorithm may not handle all grammars deterministically, and thus efficiently.
- X-Grammar
  - Any grammar whose parsing table for the X-parsing method/X-Parser has no conflicting actions in any state
- Non-X Grammar: has more than one action to take in some state
55Parsing Table Construction for Efficient Parsers
- k: the number of lookahead symbols used by a parser to determine the next action
  - A larger number of lookahead symbols tends to make conflicting actions less likely
  - But it may result in a much larger table that grows exponentially with the number of lookaheads
  - It does not guarantee freedom from conflicts for some grammars (inherently ambiguous)
- X(k) Parser
  - X Parser that uses k lookahead symbols to determine the next action
- X(k) Grammar
  - any grammar deterministically parsable with an X(k) Parser
56Types of Grammars Capable of Efficient Parsing
- LL(k) Grammars
  - Grammars that can be deterministically parsed using an LL(k) parsing algorithm
  - e.g., LL(1) grammar
- LR(k) Grammars
  - Grammars that can be deterministically parsed using an LR(k) parsing algorithm
  - e.g., SLR(1) grammar, LR(1) grammar, LALR(1) grammar
57Top-Down CFG Parsers
- Recursive Descent Parser
- vs.
- Non-Recursive LL(1) Parser
58Top-Down Parsing
- Construct a parse tree from the root to the leaves using leftmost derivation
    S → c A B
    A → a b | a
    B → d
    input: cad
[Figure: parse tree with root S and children c, A, B; A derives a, B derives d.]
59Predictive Parsing
- Top-down parsing without backtracking
  - there is only one alternative production to choose at each derivation step
    stmt → if expr then stmt else stmt
         | while expr do stmt
         | begin stmt_list end
60LL(k) Parsing
- The first L stands for scanning the input from left to right
- The second L stands for producing a leftmost derivation
- The k stands for the number of input symbols of lookahead used to choose alternative productions at each derivation step
61LL(1) Parsing
- Use one input symbol of lookahead
- Makes the same predictions as recursive-descent predictive parsing
- But is implemented as nonrecursive (table-driven) predictive parsing
62Recursive Descent Parsing (more)
- The parser consists of a set of (possibly recursive) procedures
- Each procedure is associated with a nonterminal of the grammar
- The sequence of procedures called in processing the input implicitly defines a parse tree for the input
63An Example
type → simple
     | id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
64An Example
array [ num dotdot num ] of integer
65An Example
procedure type;
begin
  if lookahead is in { integer, char, num } then
    simple
  else if lookahead = id then
    match(id)
  else if lookahead = array then begin
    match(array); match('['); simple; match(']'); match(of); type
  end
  else error
end;
66An Example
procedure match(t: token);
begin
  if lookahead = t then
    lookahead := nexttoken
  else error
end;
67An Example
procedure simple;
begin
  if lookahead = integer then
    match(integer)
  else if lookahead = char then
    match(char)
  else if lookahead = num then begin
    match(num); match(dotdot); match(num)
  end
  else error
end;
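The three procedures above translate almost line for line into Python. The sketch below is one possible rendering, not the slides' own code: the class name `Parser`, the `ParseError` exception, the `"$"` end marker, and the pre-tokenized input list are all assumptions made to keep it self-contained.

```python
class ParseError(Exception):
    """Raised where the pseudocode says `error`."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumption)
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead == t:
            self.pos += 1                    # lookahead := nexttoken
        else:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")

    def type_(self):                         # one procedure per nonterminal
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "id":
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in type")

    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in simple")


def parse(tokens):
    """Return True if `tokens` is a sentence of the type grammar."""
    parser = Parser(tokens)
    parser.type_()
    parser.match("$")                        # all input must be consumed
    return True


print(parse("array [ num dotdot num ] of integer".split()))
```

The call sequence type_, simple, type_, simple mirrors the parse tree of the input, which is the sense in which the procedure calls implicitly define the tree.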
68LL(k) Constraint Left Recursion
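- A grammar is left recursive if it has a nonterminal A such that A ⇒+ A α
    A → A α | β
  is equivalent to
    A → β R
    R → α R | ε
[Figure: the left-recursive parse tree (A nested along the left edge) next to the equivalent right-recursive tree (R nested along the right edge); both derive β α ... α.]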
69Direct/Immediate Left Recursion
A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
    i.e., A → A αi | βj (i = 1..m; j = 1..n)
is equivalent to
A → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε
Both generate (β1 | β2 | ... | βn) (α1 | α2 | ... | αm)*
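The transformation can be sketched in Python as a function over the alternatives of a single nonterminal. The function name `eliminate_immediate_left_recursion`, the list-of-symbols encoding (an empty list standing for ε), and the primed name A' for the new nonterminal are assumptions for illustration; the sketch also assumes there is no alternative of the form A → A and that A' is not already in use.

```python
def eliminate_immediate_left_recursion(a, alternatives):
    """Rewrite A -> A α1 | ... | A αm | β1 | ... | βn as
       A  -> β1 A' | ... | βn A'
       A' -> α1 A' | ... | αm A' | ε

    alternatives: list of right-hand sides, each a list of symbols ([] is ε).
    Returns a dict mapping each nonterminal to its new alternatives.
    """
    a_prime = a + "'"                                   # hypothetical fresh name
    alphas = [rhs[1:] for rhs in alternatives if rhs[:1] == [a]]
    betas = [rhs for rhs in alternatives if rhs[:1] != [a]]
    if not alphas:                                      # nothing to do
        return {a: alternatives}
    return {
        a: [beta + [a_prime] for beta in betas],
        a_prime: [alpha + [a_prime] for alpha in alphas] + [[]],
    }


# E -> E + T | T
for lhs, alts in eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]).items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alts))
```

Indirect left recursion is not handled here; it first has to be turned into immediate left recursion by substitution.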
70An Example
E → E + T | T
T → T * F | F
F → ( E ) | id
becomes
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
71Indirect Left Recursion
G0:
  S → A a | b
  A → A c | S d | ε
Problem: indirect left recursion
  S ⇒ A a ⇒ S d a
Solution, step 1: indirect to direct left recursion
  A → A c | A a d | b d | ε
Solution, step 2: direct left recursion to right recursion
  S → A a | b
  A → b d A' | A'
  A' → c A' | a d A' | ε
- Scan rules top-down
- Do not start with symbols defined earlier (⇒ substitute them if any)
- Resolve direct recursion
72Indirect Left Recursion
Input. Grammar G with no cycles or ε-productions.
Output. An equivalent grammar with no left recursion.
1. Arrange the nonterminals in some order A1, A2, ..., An
2. for i := 1 to n do begin
     // Step 1: substitute leading symbols of Ai-productions that are earlier Aj's
     for j := 1 to i - 1 do begin
       replace each production of the form Ai → Aj γ (j < i)
       by the productions Ai → δ1 γ | δ2 γ | ... | δk γ,
       where Aj → δ1 | δ2 | ... | δk are all the current Aj-productions
     end
     // Step 2
     eliminate direct left recursion among the Ai-productions
   end
73Left Factoring
- Two alternatives of a nonterminal A have a nontrivial common prefix if α ≠ ε and
    A → α β1 | α β2
  Left factoring rewrites them as
    A → α A'
    A' → β1 | β2
74An Example
S → i E t S | i E t S e S | a
E → b
becomes
S → i E t S S' | a
S' → e S | ε
E → b
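A single round of left factoring can be sketched as follows. The sketch groups the alternatives of one nonterminal by their first symbol and pulls out the longest common prefix of each group; it performs one pass only, so suffixes that still share a prefix would need another round. The names `left_factor_once` and `common_prefix`, and the primed names for new nonterminals, are assumptions for illustration.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor_once(a, alternatives):
    """Factor A -> α β1 | α β2 into A -> α A', A' -> β1 | β2 (one pass)."""
    groups = {}
    for rhs in alternatives:
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)

    result = {a: []}
    primes = 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[a].extend(group)              # nothing to factor
            continue
        primes += 1
        new_name = a + "'" * primes              # hypothetical fresh name
        prefix = common_prefix(group)
        result[a].append(prefix + [new_name])
        result[new_name] = [rhs[len(prefix):] for rhs in group]
    return result


S_ALTERNATIVES = [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]]
for lhs, alts in left_factor_once("S", S_ALTERNATIVES).items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alts))
```

Left factoring postpones the decision between the two alternatives until enough input has been seen, but it does not remove ambiguity: the factored dangling-else grammar is still not LL(1).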
75Transition Network as a Plan for
Recursive-Descent Parser
- CFG ⇒ RTN ⇒ simplified RTN ⇒ Parser
  - tail recursion
  - remove unnecessary ε-moves
  - merge sub-networks
  - merge equivalent states
- Section 4.4, Aho 86
- Example: infix expression
- Example: HTML document parser
76Top-Down Parsing as Stack Matching
- Construct a parse tree from the root to the leaves using leftmost derivation
    S → c A B
    A → a b | a
    B → d
    input: cad
[Figure: parse tree with root S and children c, A, B; A derives a, B derives d.]
77Nonrecursive Predictive Parsing General State
[Figure: model of a nonrecursive predictive parser. Input buffer: a b c ... x y z. Stack: X on top. Parsing program (parser/driver): a non-recursive stack driver program, instead of recursive procedures. Output. Parsing table (predictive, pre-computed parsing actions): M[X, a] = X → Y1 Y2 ... Yk.]
78Nonrecursive Predictive Parsing Expand
Non-terminal
[Figure: the same parser model in the expand step. Input buffer: a b c ... x y z. Stack labels: Y1, Y2, ..., Yk. Parsing program (parser/driver): non-recursive stack driver program. Output. Parsing table entry: M[X, a] = X → Y1 Y2 ... Yk.]
79Nonrecursive Predictive Parsing Match Terminal
[Figure: the same parser model in the match-terminal step. Input buffer: a b c ... x y z. Stack labels: Y1, Y2, ..., Yk, a. Parsing program (parser/driver): non-recursive stack driver program. Output. Parsing table entry: M[X, a] = X → Y1 Y2 ... Yk.]
80Nonrecursive Predictive Parsing - Error Recovery
[Figure: the same parser model during error recovery. Input buffer: a b c ... x y z. Stack labels: Y1, Y2, ..., Yk, a, c. Parsing program (parser/driver): non-recursive stack driver program. Output. Parsing table entry: M[X, a] = X → Y1 Y2 ... Yk.]
81Nonrecursive Predictive Parsing - Error Recovery
[Figure: the same parser model during error recovery. Input buffer: a b c ... x y z. Stack labels: Y1, Y2, ..., Yk, a, c. Parsing program (parser/driver): non-recursive stack driver program. Output. Parsing table entry: M[X, a] = X → Y1 Y2 ... Yk.]
82Nonrecursive Predictive Parsing - Error Recovery
[Figure: the same parser model after error recovery. Input buffer: a b c ... x y z. Stack labels: Y1, Y2, ..., Yk, c. Parsing program (parser/driver): non-recursive stack driver program. Output. Parsing table entry: M[X, a] = X → Y1 Y2 ... Yk.]
83Stack Operations
- Match
  - when the top stack symbol is a terminal and it matches the input symbol, pop the top stack symbol and advance the input pointer
- Expand
  - when the top stack symbol is a nonterminal, replace this symbol by the right-hand side of one of its productions
  - Leftmost RHS symbol at top of stack
84An Example
type → simple
     | id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
85An Example
Action  Stack (top at right)        Input
E       type                        array [ num dotdot num ] of integer
M       type of ] simple [ array    array [ num dotdot num ] of integer
M       type of ] simple [          [ num dotdot num ] of integer
E       type of ] simple            num dotdot num ] of integer
M       type of ] num dotdot num    num dotdot num ] of integer
M       type of ] num dotdot        dotdot num ] of integer
M       type of ] num               num ] of integer
M       type of ]                   ] of integer
M       type of                     of integer
E       type                        integer
E       simple                      integer
M       integer                     integer
(E = expand, M = match)
86Parsing program
push $ and then S onto the stack, where S is the start symbol
set ip to point to the first symbol of w$     // try to match S with w
repeat
  let X be the top stack symbol and a the symbol pointed to by ip
  if X is a terminal or $ then
    if X = a then
      pop X from the stack and advance ip
    else error                                // or error_recovery()
  else                                        // X is a nonterminal
    if M[X, a] = X → Y1 Y2 ... Yk then
      pop X from the stack and push Yk ... Y2 Y1 onto the stack
    else error                                // or error_recovery()
until X = $
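A minimal Python sketch of this driver loop follows. The parsing table is written out by hand for the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id used earlier in the deck; the dictionary encoding of M, the function name `ll1_parse`, and the use of `ValueError` for syntax errors are assumptions made for illustration.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[X, a] -> right-hand side of the production to apply ([] is ε).
# Hand-built for the expression grammar (an assumption of this sketch).
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def ll1_parse(tokens, start="E"):
    """Return the productions used, in leftmost-derivation order."""
    w = list(tokens) + ["$"]
    stack = ["$", start]                  # top of stack is the end of the list
    ip = 0
    output = []
    while True:
        x, a = stack[-1], w[ip]
        if x not in NONTERMINALS:         # X is a terminal or $
            if x != a:
                raise ValueError(f"expected {x!r}, found {a!r}")
            if x == "$":                  # until X = $
                return output
            stack.pop()                   # match
            ip += 1
        else:                             # X is a nonterminal
            rhs = TABLE.get((x, a))
            if rhs is None:
                raise ValueError(f"no entry M[{x}, {a}]")
            stack.pop()                   # expand: push Yk ... Y2 Y1
            stack.extend(reversed(rhs))
            output.append((x, rhs))


for lhs, rhs in ll1_parse("id + id * id".split()):
    print(lhs, "->", " ".join(rhs) or "ε")
```

Changing the grammar only changes `TABLE`; the driver itself stays the same, which is the point of a table-driven parser.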
87Parser Driven by a Parsing TableNon-recursive
Descent
Parsing table rows (the columns are the lookahead terminals a, b, c, d):
  X:   M[X, a] = X → Y1 Y2 ... Yk;   M[X, b] = X → Z1 Z2 ... Zm
  Y1:  Y1 → a1,  Y1 → a2
  Z1:  Z1 → b1,  Z1 → b2
where a is in FirstSet(Y1 Y2 ... Yk) and b is in FirstSet(Z1 Z2 ... Zm)

Recursive-descent procedure for matching X:
X()   // WITHOUT ε-production X → ε
  if (LA = a) then Y1(); Y2(); ...; Yk();
  else if (LA = b) Z1(); Z2(); ...; Zm();
  else ERROR();        // no X → ε
  // else RETURN;      // if X → ε exists
88Parser Driven by a Parsing TableNon-recursive
Descent
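Parsing table rows (the columns are the lookahead terminals a, b, c, d):
  X:   M[X, a] = X → Y1 Y2 ... Yk;   M[X, b] = X → Z1 Z2 ... Zm;   M[X, d] = X → ε
  Y1:  Y1 → a1,  Y1 → a2
  Z1:  Z1 → b1,  Z1 → b2
where a is in FirstSet(Y1 Y2 ... Yk), b is in FirstSet(Z1 Z2 ... Zm), and d is in FollowSet(X) (S ⇒* ... X d ...)

Recursive-descent procedure for matching X:
X()   // WITH ε-production X → ε
  if (LA = a) then Y1(); Y2(); ...; Yk();
  else if (LA = b) Z1(); Z2(); ...; Zm();
  // else ERROR();           // used only when there is no X → ε
  else if (LA = d) RETURN;   // if X → ε exists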
89First Sets
- The first set of a string α is the set of terminals that begin the strings derived from α.
- If α ⇒* ε, then ε is also in the first set of α.
  - Used simply to flag whether α can be null when computing first sets
  - Not for matching any real input when parsing
- FIRST(α) = { a | α ⇒* a β } ∪ { ε, if α ⇒* ε }
- FIRST(α) includes ε means that α ⇒* ε
90Compute First Sets
- If X is a terminal, then FIRST(X) is {X}
- If X is a nonterminal and X → ε is a production, then add ε to FIRST(X)
- If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then add a to FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1)
- If ε is in FIRST(Yj) for all j, then add ε to FIRST(X)
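These rules are usually applied repeatedly until no FIRST set grows any further. The sketch below does that for the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id from earlier in the deck. The function name `first_sets`, the dictionary encoding of the grammar, and the convention that any symbol without productions is a terminal are assumptions made for illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],          # [] stands for an ε-production
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_sets(grammar):
    """Compute FIRST(X) for every nonterminal X by fixed-point iteration."""
    first = {a: set() for a in grammar}

    def first_of(symbol):
        # FIRST of a terminal is the terminal itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[a])
                for y in rhs:
                    first[a] |= first_of(y) - {EPS}
                    if EPS not in first_of(y):
                        break              # Yi is not nullable: stop here
                else:
                    first[a].add(EPS)      # every Yi is nullable (or rhs is ε)
                if len(first[a]) != before:
                    changed = True
    return first


for nonterminal, symbols in first_sets(GRAMMAR).items():
    print(f"FIRST({nonterminal}) =", sorted(symbols))
```

The loop terminates because each pass can only add symbols and the sets are bounded by the grammar's terminals plus ε.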
91Follow Sets
- What to do when matching null: A → ε ?
  - TD recursive descent parsing: assumes success
  - LL: more predictive ⇒ use the follow set of A
- The follow set of a nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form; namely, if S ⇒* α A a β, then a is in the follow set of A.
92Compute Follow Sets
- Initialization: place $ in FOLLOW(S), where S is the start symbol and $ is the input right end marker.
- If there is a production A → αBβ, then everything in FIRST(β) except for ε is placed in FOLLOW(B)
  - ε is not considered a visible input to follow any symbol
- If there is a production A → αB, or A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B)
  - S ⇒* γ A a δ implies S ⇒* γ α B a δ
  - YES: every symbol that can follow A will also follow B
  - NO: it does not hold that every symbol that can follow B will also follow A
93An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
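The FOLLOW sets of this example can be reproduced with a short fixed-point computation. In the sketch below the FIRST sets are taken as given (copied from the example) rather than computed; the function names `first_of_string` and `follow_sets` and the dictionary encoding of the grammar are assumptions made for illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {                                  # copied from the example
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols."""
    result = set()
    for x in symbols:
        fx = first.get(x, {x})             # FIRST of a terminal is itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)                        # the whole string can derive ε
    return result


def follow_sets(grammar, first, start):
    follow = {a: set() for a in grammar}
    follow[start].add(END)                 # initialization: $ in FOLLOW(S)
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:   # only nonterminals have FOLLOW sets
                        continue
                    before = len(follow[b])
                    tail = first_of_string(rhs[i + 1:], first)
                    follow[b] |= tail - {EPS}      # A -> αBβ
                    if EPS in tail:                # A -> αB or β nullable
                        follow[b] |= follow[a]
                    if len(follow[b]) != before:
                        changed = True
    return follow


for nonterminal, symbols in follow_sets(GRAMMAR, FIRST, "E").items():
    print(f"FOLLOW({nonterminal}) =", sorted(symbols))
```

Note that FOLLOW(A) flows into FOLLOW(B) and never the other way round, which is the YES/NO distinction made in the rules.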
94Constructing Parsing Table
Input. Grammar G.
Output. Parsing table M.
Method.
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α) (i.e., A → α ⇒* ε), add A → α to M[A, b] for each terminal b (including $) in FOLLOW(A).
   - If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
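The method can be sketched in Python as follows, using the expression grammar and the FIRST and FOLLOW sets from the preceding example as fixed inputs. Each table cell is kept as a list so that a multiply-defined entry shows up as a list with more than one production; missing keys play the role of error entries. The function names and the dictionary encoding are assumptions made for illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols, first):
    result = set()
    for x in symbols:
        fx = first.get(x, {x})             # FIRST of a terminal is itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Return M as a dict: (A, a) -> list of right-hand sides."""
    table = {}
    for a, alternatives in grammar.items():        # step 1
        for rhs in alternatives:
            fs = first_of_string(rhs, first)
            lookaheads = fs - {EPS}                # step 2
            if EPS in fs:                          # step 3 ($ is in FOLLOW)
                lookaheads |= follow[a]
            for t in lookaheads:
                table.setdefault((a, t), []).append(rhs)
    return table                                   # step 4: missing key = error


M = build_table(GRAMMAR, FIRST, FOLLOW)
is_ll1 = all(len(entries) == 1 for entries in M.values())
print(len(M), "entries; LL(1):", is_ll1)
```

A grammar is LL(1) exactly when no cell of this table ends up with more than one production.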
95LL(1) Parsing Table Construction
Entries in row A of the table, by lookahead (rows B, C, ... are filled in the same way):
  a in First(α):                         A → α
  b in Follow(A):                        A → α (α ⇒* ε), including A → ε
  c not in First(α) or Follow(A):        error
When to apply A → ε ?

A()   // WITH/WITHOUT ε-productions A → α (α ⇒* ε)
  if (LA = a in First(Y1 Y2 ... Yk)) then Y1(); Y2(); ...; Yk();
  else if (LA = b in Follow(A) && ε in First(Z1 Z2 ... Zm)) Z1(); Z2(); ...; Zm();   // nullable
  else ERROR();
// Recursive version of the LL(1) parser
96An Example
97An Example
Stack        Input             Output
$E           id + id * id $
$E'T         id + id * id $    E → TE'
$E'T'F       id + id * id $    T → FT'
$E'T'id      id + id * id $    F → id
$E'T'        + id * id $
$E'          + id * id $       T' → ε
$E'T+        + id * id $       E' → +TE'
$E'T         id * id $
$E'T'F       id * id $         T → FT'
$E'T'id      id * id $         F → id
$E'T'        * id $
$E'T'F*      * id $            T' → *FT'
$E'T'F       id $
$E'T'id      id $              F → id
$E'T'        $
$E'          $                 T' → ε
$            $                 E' → ε
98LL(1) Grammars
- A grammar is an LL(1) grammar if its predictive
parsing table has no multiply-defined entries
99A Counter Example
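S → i E t S S' | a
S' → e S | ε
E → b

        a        b        e                   i                  t    $
S       S → a                                 S → i E t S S'
S'                        S' → ε, S' → e S                            S' → ε
E                E → b

M[S', e] is multiply defined: S' → ε because e ∈ FOLLOW(S') = FOLLOW(S), and S' → e S because e ∈ FIRST(e S)
Disambiguation: choose S' → e S (match else with the closest then)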
100LL(1) Grammars or Not ??
- A grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold:
  - For no terminal a do both α and β derive strings beginning with a.
    - otherwise the M[A, FIRST(α) ∩ FIRST(β)] entries will have conflicting actions
  - At most one of α and β can derive the empty string.
    - otherwise the M[A, FOLLOW(A)] entries have conflicting actions
  - If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
    - otherwise the M[A, FIRST(α) ∩ FOLLOW(A)] entries have conflicting actions
101Non-LL(1) GrammarAmbiguous According to LL(1)
Parsing Table Construction
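Conflicting entries in row A of the table, by lookahead (rows B, C, ... are analogous):
  a in First(α) ∩ First(β):      A → α and A → β
  b in Follow(A):                A → α (α ⇒* ε) and A → β (β ⇒* ε)
  a in First(α) ∩ Follow(A):     A → α (α does not derive ε, but α ⇒* a γ) and A → β (β ⇒* ε)
When will A → α and A → β appear in the same table cell?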
102LL(1) Grammars or Not??
- If G is left-recursive or ambiguous, then M will have at least one multiply-defined entry ⇒ non-LL(1)
  - E.g., X → X a | b
    - ⇒ FIRST(X) = { b } (and, of course, FIRST(b) = { b })
    - ⇒ M[X, b] includes both X → X a and X → b
- Ambiguous grammars and grammars with left-recursive productions cannot be LL(1).
  - No LL(1) grammar can be ambiguous
103Error Recovery for LL Parsers
104Syntactic Errors
- Empty entries in a parsing table
  - A syntactic error is encountered when the lookahead symbol corresponding to such an entry is in the input buffer
  - Error recovery information can be encoded in such entries to take appropriate actions upon error
- Error detection
  - (1) Stack top = x (a terminal) and x ≠ input symbol a
  - (2) Stack top = A (a nonterminal) and M[A, a] = empty (error)
105Error Recovery Strategies
- Panic mode: skip tokens until a token in a set of synchronizing tokens appears
  - INS(ertion) type of errors
  - sync at delimiters, keywords, ..., that have clear functions
- Phrase-level recovery
  - local INS(ertion), DEL(etion), SUB(stitution) types of errors
- Error productions
  - define error patterns in the grammar
- Global correction / grammar correction
  - minimum distance correction
106Error Recovery Panic Mode
- Panic mode: skip tokens until a token in a set of synchronizing tokens appears
- Commonly used synchronizing tokens
  - SUB(A, ip): use FOLLOW(A) as the sync set for A (pop A)
  - use the FIRST set of a higher construct as the sync set for a lower construct
  - INS(ip): use FIRST(A) as the sync set for A
  - ε: use the production deriving ε as the default
  - DEL(ip): if a terminal on the stack cannot be matched, pop the terminal
107Error Recovery Panic Mode
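[Figure: stack and input configurations for the three panic-mode actions SUB(A, ip), INS(ip), and DEL(ip). Recoverable labels: stack symbols A, X, x; input symbols a, x; the input pointer ip positioned at a symbol in Follow(A), at a symbol in First(A), or at x.]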
108Error Recovery Actions Using Follow First Sets
to Sync
- Expanding non-terminal A
  - M[A, a] = error (blank)
    - Skip a in the input
    - delete all such a (until synchronized with a sync symbol b) /* panic */
  - M[A, b] = sync (b in FOLLOW(A))
    - Pop A from the stack
    - b is a sync symbol following A
  - M[A, b] = A → α (sync at FIRST(A))
    - Expand A (same as the normal parsing action)
- Matching terminal x
  - (stack top = x) ≠ a
    - Pop x from the stack
    - missing input token x
109An Example
FOLLOW(X) is used to expand ε-productions or to sync (on errors)
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(F) = { +, *, ), $ }
FIRST(X) is used to expand non-ε productions or to sync (on errors)
110An Example
Stack        Input              Output
$E           ) id * + id $      error, skip )
$E           id * + id $        id is in FIRST(E)
$E'T         id * + id $        E → TE'
$E'T'F       id * + id $        T → FT'
$E'T'id      id * + id $        F → id
$E'T'        * + id $
$E'T'F*      * + id $           T' → *FT'
$E'T'F       + id $             error, M[F, +] = synch (+ is in FOLLOW(F))
$E'T'        + id $             F popped
$E'          + id $             T' → ε
$E'T+        + id $             E' → +TE'
$E'T         id $
$E'T'F       id $               T → FT'
$E'T'id      id $               F → id
$E'T'        $
$E'          $                  T' → ε
$            $                  E' → ε
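The recovery actions in this trace can be sketched as a small extension of the table-driven driver. The sketch assumes the expression grammar's parsing table with synch entries added at FOLLOW(E), FOLLOW(T), and FOLLOW(F). It makes one further assumption to reproduce the first row of the trace: when only the start symbol is left on the stack, a synch entry skips the input token instead of popping. All names (`parse_with_recovery`, `SYNCH`, the message strings) are hypothetical.

```python
END, SYNCH = "$", "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", END): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", END): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
# Synch entries: FOLLOW symbols of E, T, F that have no production entry.
for nt, terminals in {"E": [")", END], "T": ["+", ")", END],
                      "F": ["+", "*", ")", END]}.items():
    for t in terminals:
        TABLE.setdefault((nt, t), SYNCH)


def parse_with_recovery(tokens):
    """Parse with panic-mode recovery; return the list of recovery actions."""
    w = list(tokens) + [END]
    stack, ip, errors = [END, "E"], 0, []
    while stack[-1] != END:
        x, a = stack[-1], w[ip]
        if x not in NONTERMINALS:                  # terminal on top
            if x == a:
                ip += 1                            # match
            else:
                errors.append(f"missing {x}")      # pop unmatched terminal
            stack.pop()
            continue
        entry = TABLE.get((x, a))
        only_start = len(stack) == 2               # assumption, see lead-in
        if a != END and (entry is None or (entry == SYNCH and only_start)):
            errors.append(f"skip {a}")             # blank entry: skip input
            ip += 1
        elif entry is None or entry == SYNCH:
            errors.append(f"pop {x}")              # synch: pop nonterminal
            stack.pop()
        else:
            stack.pop()                            # normal expansion
            stack.extend(reversed(entry))
    if w[ip] != END:
        errors.append("unexpected trailing input")
    return errors


print(parse_with_recovery(") id * + id".split()))
```

On the erroneous input the sketch skips the leading ) and later pops F at +, the same two recoveries the trace shows; on a well-formed input the returned list is empty.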
111Parse Tree - Error Recovered
) id * + id ⇒ id * F + id