Syntax Analysis - PowerPoint PPT Presentation

Title:

Syntax Analysis

Description:

Syntax Analysis: introduction to parsers; context-free grammars; push-down automata; top-down parsing; LL grammars and parsers; bottom-up parsing; LR grammars and parsers

Slides: 108
Provided by: edut1550

Transcript and Presenter's Notes


1
Syntax Analysis
  • Introduction to parsers
  • Context-free grammars
  • Push-down automata
  • Top-down parsing
  • LL grammars and parsers
  • Bottom-up parsing
  • LR grammars and parsers
  • Bison/Yacc - parser generators
  • Error handling: detection and recovery

2
Introduction to parsers
[Figure: parser block diagram with labels: source code, token, next token, Parser, syntax tree, Symbol Table]
3
Context Free Grammar
  • CFG Terminology
  • Rewrite vs. Reduce
  • Derivation
  • Language and CFL
  • Equivalence and CNF
  • Parsing vs. Derivation
  • Leftmost/rightmost (lm/rm) derivations and parse trees
  • Ambiguity resolution
  • Expressive power

Derivation is the reverse of Parsing. If we know
how sentences are derived, we may find a parsing
method in the reversed direction.
4
CFG: An Example
  • Terminals: id, +, -, *, /, (, )
  • Nonterminals: expr, op
  • Productions:
    expr → expr op expr | ( expr ) | - expr | id
    op → + | - | * | /
  • The start symbol: expr

5
Notational Conventions in CFG
  • a, b, c, +, -, 0-9, id: symbols in Σ
  • A, B, C, ..., S, expr, stmt: symbols in N
  • U, V, W, ..., X, Y, Z: grammar symbols in (Σ ∪ N)
  • α, β, γ, ... denote strings in (Σ ∪ N)*
  • u, v, w, ... denote strings in Σ*
  • A → α | β is an abbreviation of A → α, A → β
  • Alternatives: α, β at the RHS

6
Notational Conventions in CFG
  • Abbreviation
  • is the abbreviation of

7
Context-Free Grammars
  • A set of terminals basic symbols from which
    sentences are formed
  • A set of nonterminals syntactic variables
    denoting sets of strings
  • A set of productions rules specifying how the
    terminals and nonterminals can be combined to
    form sentences
  • The start symbol a distinguished nonterminal
    denoting the language

8
CFG Components: Specification for Structures and
Constituency
  • CFG: formal specification of structure (parse
    trees)
  • G = (Σ, N, P, S)
  • Σ: terminal symbols
  • N: non-terminal symbols
  • P: production rules
  • S: start symbol

9
CFG Components
  • Σ: terminal symbols
  • the input symbols of the language
  • programming languages: tokens (reserved words,
    variables, operators, ...)
  • natural languages: words or parts of speech
  • pre-terminal: parts of speech (when words are
    regarded as terminals)
  • N: non-terminal symbols
  • groups of terminals and/or other non-terminals
  • S: start symbol, the largest constituent of a
    parse tree

10
CFG Components
  • P: production (re-writing) rules
  • form: A → β (A: non-terminal, β: string of
    terminals and non-terminals)
  • meaning: A re-writes to (consists of, derives
    into) β, or β reduces to A
  • start with S-productions (S → β)

11
Derivations
  • A derivation step is an application of a
    production as a rewriting rule: E ⇒ - E
  • A sequence of derivation steps E ⇒ - E ⇒ - ( E )
    ⇒ - ( id ) is called a derivation of - ( id )
    from E
  • The symbol ⇒* denotes "derives in zero or more
    steps"; the symbol ⇒+ denotes "derives in one or
    more steps"

12
CFG Accepted Languages
  • Context-Free Language
  • Language accepted by a CFG
  • L(G) = { w ∈ Σ* | S ⇒+ w } (strings of terminals that
    can be derived from the start symbol)
  • Proof of acceptance by induction
  • On the number of derivation steps
  • On the length of input string

13
Context-Free Languages
  • A context-free language L(G) is the language
    defined by a context-free grammar G
  • A string of terminals w is in L(G) if and only if
    S ⇒+ w; w is called a sentence of G
  • If S ⇒* α, where α may contain nonterminals, then
    we call α a sentential form of G, e.g. each form
    in E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
  • G1 is equivalent to G2 if L(G1) = L(G2)

14
CFG Equivalence
  • Chomsky Normal Form (CNF) (Chomsky, 1963)
  • ε-free, and
  • Every production rule is in one of the
    following forms:
  • A → A1 A2
  • A → a (A1, A2: non-terminals, a: terminal)
  • i.e., two non-terminals or one terminal at the
    RHS
  • Properties
  • Generate binary parse tree
  • Good simplification for some algorithms
  • e.g., grammar training with the inside-outside
    algorithm (Baker 1979)
  • Good tool for theoretical proving
  • e.g., time complexity

15
CFG Equivalence
  • Every CFG can be converted into a weakly
    equivalent CNF
  • equivalence: L(G1) = L(G2)
  • strongly equivalent: the grammars assign the same
    phrase structure to each sentence (except for
    renaming non-terminals)
  • weakly equivalent: the grammars generate the same
    language but need not assign the same phrase
    structure to each sentence
  • e.g., A → B C D becomes A → B X, X → C D

16
CFG: An Example
  • Terminals: id, +, -, *, /, (, )
  • Nonterminals: expr, op
  • Productions:
    expr → expr op expr    (R1)
    expr → ( expr )        (R2)
    expr → - expr          (R3)
    expr → id              (R4)
    op → + | - | * | /
  • The start symbol: expr

17
Leftmost and Rightmost Derivations
  • Each derivation step needs to choose
  • a nonterminal to rewrite
  • an alternative to apply
  • A leftmost derivation always chooses the leftmost
    nonterminal to rewrite:
    E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E )
    ⇒lm - ( id + E ) ⇒lm - ( id + id )
  • A rightmost (canonical) derivation always chooses
    the rightmost nonterminal to rewrite:
    E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E )
    ⇒rm - ( E + id ) ⇒rm - ( id + id )

18
Leftmost and Rightmost Derivations
  • Representation of leftmost/rightmost derivations
  • Use the sequence of productions (or production
    numbers) to represent a derivation sequence.
  • Example
  • E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E )
    ⇒rm - ( E + id ) ⇒rm - ( id + id )
  • => 3, 2, 1, 4, 4 (R3, R2, R1, R4, R4), treating
    R1 as E → E + E
  • Advantage A compact representation for parse
    tree (data compression)
  • Each parse tree has a unique leftmost/rightmost
    derivation
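
The production-number encoding can be checked mechanically. The following is a minimal sketch, not part of the original slides: it replays a rightmost derivation from a list of rule numbers. It assumes R1 is simplified to E → E + E (as in the derivation shown), and the names `RULES` and `replay_rightmost` are hypothetical.

```python
# Rules numbered as on the slide; R1 is simplified to E -> E + E (assumption).
RULES = {
    1: ("E", ["E", "+", "E"]),
    2: ("E", ["(", "E", ")"]),
    3: ("E", ["-", "E"]),
    4: ("E", ["id"]),
}
NONTERMINALS = {"E"}


def replay_rightmost(start, rule_numbers):
    """Apply each numbered rule to the rightmost nonterminal.

    Returns the list of sentential forms, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for n in rule_numbers:
        lhs, rhs = RULES[n]
        # Position of the rightmost nonterminal in the current form.
        idx = max(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[idx] != lhs:
            raise ValueError(f"rule {n} does not rewrite {form[idx]}")
        form[idx:idx + 1] = rhs
        steps.append(" ".join(form))
    return steps
```

Calling `replay_rightmost("E", [3, 2, 1, 4, 4])` walks through the same sentential forms as the rightmost derivation above, ending in `- ( id + id )`.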

19
Parse Trees
  • A parse tree is a graphical representation for a
    derivation that filters out the order of choosing
    nonterminals for rewriting

20
Context Free Grammar (CFG) Specification for
Structures Constituency
  • Parse Tree graphical representation of structure
  • Root node (S): a sentence-level structure
  • Internal nodes constituents of the sentence
  • Arcs relationship between parent nodes and their
    children (constituents)
  • Terminal nodes surface forms of the input
    symbols (e.g., words)
  • Bracketed notation Alternative representation
  • e.g., I saw the girl in the park

21
Parse Tree: I saw the girl in the park
[Figure: 1st parse]
22
Parse Tree: I saw the girl in the park
[Figure: 2nd parse tree. Node labels: S, NP, VP, PP, pron, v, det, n, p; leaves: I saw the girl in the park]
23
LM and RM: An Example
E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E )
⇒lm - ( id + E ) ⇒lm - ( id + id )
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E )
⇒rm - ( E + id ) ⇒rm - ( id + id )
24
Parse Trees Derivations
  • Many derivations may correspond to the same parse
    tree, but every parse tree has associated with it
    a unique leftmost and a unique rightmost
    derivation

25
Ambiguous Grammar
  • A grammar is ambiguous if it produces more than
    one parse tree for some sentence
  • more than one leftmost/rightmost derivation

E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E
⇒ id + id * id
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E
⇒ id + id * id
26
Ambiguous Grammar
27
Resolving Ambiguity
  • Use disambiguating rules to throw away
    undesirable parse trees
  • Rewrite grammars by incorporating disambiguating
    rules into grammars

28
An Example
  • The dangling-else grammar:
    stmt → if expr then stmt
         | if expr then stmt else stmt
         | other
  • Two parse trees for: if E1 then if E2 then S1
    else S2

29
An Example
Preferred parse: each else matches the closest then
30
Disambiguating Rules
  • Rule match each else with the closest previous
    unmatched then
  • Remove undesired state transitions in the
    pushdown automaton
  • shift/reduce conflict on else
  • 1st parse reduce
  • 2nd parse shift

31
Grammar Rewriting
stmt → m_stmt      (with only paired then-else)
     | unm_stmt
m_stmt → if expr then m_stmt else m_stmt
       | other
unm_stmt → if expr then stmt
         | if expr then m_stmt else unm_stmt
32
RE vs. CFG
  • Every language described by a RE can also be
    described by a CFG
  • Example: (a|b)*abb
  • A0 → a A0 | b A0 | a A1
  • A1 → b A2
  • A2 → b A3
  • A3 → ε
  1. Right branching
  2. Starts with a terminal symbol

33
RE vs. CFG
  • Regular Grammar
  • Right branching
  • Starts with a terminal symbol

[Figure: transition diagram for (a|b)*abb; visible labels: A2, A3, b, ε]
34
RE vs. CFG
A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → b A3
A3 → ε
RE: (a|b)*abb
[Figure: transition diagram with states A0, A1, A2, A3]
35
RE vs. CFG
[Figure: transition diagram with states A0, A1, A2, A3]
A0 → b A0 | a A1
A1 → a A1 | b A2
A2 → a A1 | b A3
A3 → a A1 | b A0 | ε
36
CFG Expressive Power (cont.)
  • Writing a CFG for a FSA (RE)
  • define a non-terminal Ni for a state with state
    number i
  • start symbol S = N0 (assuming that state 0 is the
    initial state)
  • for each transition δ(i, a) = j (from state i to
    state j on input symbol a), add a new production
    Ni → a Nj to P (if a = ε: Ni → Nj)
  • for each final state i, add a new production
    Ni → ε to P
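
The construction above can be sketched in a few lines. This is an illustrative sketch only: the function names `fsa_to_grammar` and `derives`, the tuple encoding of transitions, and the use of the empty string for an ε-move are assumptions, and `derives` assumes the automaton has no ε-cycles.

```python
def fsa_to_grammar(transitions, final_states):
    """Build right-linear productions from FSA transitions.

    transitions: iterable of (state, symbol, next_state); symbol "" is an
    epsilon move. Returns {nonterminal: [rhs, ...]}, where [] means epsilon.
    """
    productions = {}
    for i, a, j in transitions:
        rhs = [f"N{j}"] if a == "" else [a, f"N{j}"]
        productions.setdefault(f"N{i}", []).append(rhs)
    for i in final_states:
        productions.setdefault(f"N{i}", []).append([])  # Ni -> epsilon
    return productions


def derives(productions, nonterminal, word):
    """Check whether `nonterminal` derives `word` (assumes no epsilon cycles)."""
    for rhs in productions.get(nonterminal, []):
        if not rhs:
            if word == "":
                return True
        elif len(rhs) == 1:  # epsilon move: Ni -> Nj
            if derives(productions, rhs[0], word):
                return True
        elif word.startswith(rhs[0]) and derives(
                productions, rhs[1], word[len(rhs[0]):]):
            return True
    return False


# NFA for (a|b)*abb: state 0 is initial, state 3 is final.
NFA = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
GRAMMAR = fsa_to_grammar(NFA, final_states=[3])
```

Here `GRAMMAR` contains the productions N0 → a N0 | b N0 | a N1, N1 → b N2, N2 → b N3, N3 → ε, and `derives(GRAMMAR, "N0", "babb")` succeeds while a string not ending in abb fails.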

37
CFG Expressive Power (cont.)
  • Example RE: (a|b)*abb

N0 → a N0 | b N0 | a N1
N1 → b N2
N2 → b N3
N3 → ε
38
CFG Expressive Power
  • CFG vs. Regular Expression (R.E.)
  • Every R.E. can be recognized by a FSA
  • Every FSA can be represented by a CFG with
    production rules of the form A → a B | ε
  • Therefore, L(RE) ⊂ L(CFG)

39
CFG Expressive Power (cont.)
  • Chomsky Hierarchy
  • R.E.: Regular sets (recognized by FSAs)
  • CFG: Context-free (pushdown automata)
  • CSG: Context-sensitive (linear bounded automata)
  • Unrestricted: Recursively enumerable (Turing
    machine)

40
Push-Down Automata
41
RE vs. CFG
  • Why use REs for lexical syntax?
  • do not need a notation as powerful as CFGs
  • are more concise and easier to understand than
    CFGs
  • More efficient lexical analyzers can be
    constructed from REs than from CFGs
  • Provide a way for modularizing the front end into
    two manageable-sized components

42
CFG vs. Finite-State Machine
  • Inappropriateness of FSA
  • Constituents only terminals
  • Recursion: does not allow A => B => A
  • RTN (Recursive Transition Network)
  • FSA with augmentation of recursion
  • arc terminal or non-terminal
  • if arc is non-terminal call to a sub-transition
    network return upon traversal

43
Nonregular Constructs
  • REs can denote only a fixed number of repetitions
    or an unspecified number of repetitions of one
    given construct
  • E.g. a*b*
  • A nonregular construct
  • L = { a^n b^n | n ≥ 1 }

44
Non-Context-Free Constructs
  • CFGs can denote only a fixed number of
    repetitions or an unspecified number of
    repetitions of one or two (paired) given
    constructs
  • E.g. a^n b^n
  • Some non-context-free constructs
  • L1 = { wcw | w is in (a|b)* }
  • declaration/use of identifiers
  • L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }
  • formal arguments/actual arguments
  • L3 = { a^n b^n c^n | n ≥ 0 }
  • e.g., b = Backspace, c = underscore

45
Context-Free Constructs
  • FA (RE) cannot keep counts
  • CFGs can keep count of two items but not three
  • Similar context-free constructs
  • L1' = { w c w^R | w is in (a|b)* }, R: reverse order
  • L2' = { a^n b^m c^m d^n | n ≥ 1 and m ≥ 1 }
  • L2'' = { a^n b^n c^m d^m | n ≥ 1 and m ≥ 1 }
  • L3' = { a^n b^n | n ≥ 1 }

46
CFG Parsers
47
Types of CFG Parsers
  • Universal can parse any CFG grammar
  • CYK, Earley
  • CYK Exhaustively matching sub-ranges of input
    tokens against grammar rules, from smaller ranges
    to larger ranges
  • Earley Exhaustively enumerating possible
    expectations from left-to-right, according to
    current input token and grammar
  • Non-universal e.g., recursive descent parser
  • Universal (to all grammars) is NOT always
    efficient

48
Types of CFG Parsers
  • Practical Parsers what is a good parser?
  • Simple simple program structure
  • Left-to-right (or right-to-left) scan
  • middle-out or island driven is often not
    preferred
  • Top-down or Bottom up matching
  • Efficient efficient for good/bad inputs
  • Parse normal syntax quickly
  • Detect errors immediately on next token
  • Deterministic
  • No alternative choices during parsing given next
    token
  • Small lookahead buffer (also contribute to
    efficiency)

49
Types of CFG Parsers
  • Top Down
  • Matching from start symbol down to terminal
    tokens
  • Bottom Up
  • Matching input tokens with reducible rules from
    terminal up to start symbol

50
Efficient CFG Parsers
  • Top Down LL Parsers
  • Matching from start symbol down to terminal
    tokens, left-to-right, according to a leftmost
    derivation sequence
  • Bottom Up LR Parsers
  • Matching input tokens with reducible rules,
    left-to-right, from terminal up to start symbol,
    in a reverse order of rightmost derivation
    sequence

51
Efficient CFG Parsers
  • Efficient Deterministic Parsing only possible
    for some subclasses of grammars with special
    parsing algorithms
  • Top Down
  • Parsing LL Grammars with LL Parsers
  • Bottom Up
  • Parsing LR Grammars with LR Parsers
  • LR grammar is a larger class of grammars than LL

52
Parsing Table Construction for Efficient Parsers
Good parsers do not change their code when the
grammar is revised => table driven.
  • Parsing Table
  • A pre-computed table (according to the grammar),
    indicating the appropriate action(s) to take in
    any predefined state when some input token(s)
    is/are under examination
  • Lookahead symbol(s) the input symbol(s) under
    examination for determining next action(s)

          id         num
State-0   action-1   action-3
State-1   action-2   action-5
State-2   action-4
53
Parsing Table Construction for Efficient Parsers
  • Parsing Table Construction
  • Decide a pre-defined number of lookaheads to use
    for predicting next state
  • Define and enumerate all the unique states for
    the parsing method
  • Decide the actions to take in all states with all
    possible lookahead(s)

54
Parsing Table Construction for Efficient Parsers
  • X-Parser you can invent any parser and call it
    the X-Parser
  • But its parsing algorithm may not handle all
    grammars deterministically, thus efficiently.
  • X-Grammar
  • Any grammar whose parsing table for the X-parsing
    method/X-Parser has no conflicting actions in all
    states
  • Non-X Grammar has more than one action to take
    under any state

55
Parsing Table Construction for Efficient Parsers
  • k The number of lookahead symbols used by a
    parser to determine the next action
  • A larger number of lookahead symbols tends to
    make it less possible to have conflicting actions
  • But may result in a much larger table that grows
    exponentially with the number of lookaheads
  • Does not guarantee a conflict-free table for some
    grammars (inherently ambiguous)
  • X(k) Parser
  • X Parser that uses k lookahead symbols to
    determine the next action
  • X(k) Grammar
  • any grammar deterministically parsable with X(k)
    Parser

56
Types of Grammars Capable of Efficient Parsing
  • LL(k) Grammars
  • Grammars that can be deterministically parsed
    using an LL(k) parsing algorithm
  • e.g., LL(1) grammar
  • LR(k) Grammars
  • Grammars that can be deterministically parsed
    using an LR(k) parsing algorithm
  • e.g., SLR(1) grammar, LR(1) grammar, LALR(1)
    grammar

57
Top-Down CFG Parsers
  • Recursive Descent Parser
  • vs.
  • Non-Recursive LL(1) Parser

58
Top-Down Parsing
  • Construct a parse tree from the root to the
    leaves using a leftmost derivation
    Grammar: S → c A B, A → a b | a, B → d
    Input: cad

[Figure: parse tree with root S and children c, A, B; A derives a, B derives d]
59
Predictive Parsing
  • A top-down parsing without backtracking
  • there is only one alternative production to
    choose at each derivation step, e.g.
    stmt → if expr then stmt else stmt
         | while expr do stmt
         | begin stmt_list end

60
LL(k) Parsing
  • The first L stands for scanning the input from
    left to right
  • The second L stands for producing a leftmost
    derivation
  • The k stands for the number of input symbols for
    lookahead used to choose alternative productions
    at each derivation step

61
LL(1) Parsing
  • Use one input symbol of lookahead
  • Same as Recursive-descent parsing
  • But, Nonrecursive predictive parsing

62
Recursive Descent Parsing (more)
  • The parser consists of a set of (possibly
    recursive) procedures
  • Each procedure is associated with a nonterminal
    of the grammar
  • The sequence of procedures called in processing
    the input implicitly defines a parse tree for the
    input

63
An Example
type → simple
     | id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
64
An Example
array [ num dotdot num ] of integer
65
An Example
procedure type
begin
  if lookahead is in { integer, char, num } then simple
  else if lookahead = id then match(id)
  else if lookahead = array then begin
    match(array); match('['); simple; match(']');
    match(of); type
  end
  else error
end
66
An Example
procedure match(t: token)
begin
  if lookahead = t then lookahead := nexttoken
  else error
end
67
An Example
procedure simple
begin
  if lookahead = integer then match(integer)
  else if lookahead = char then match(char)
  else if lookahead = num then begin
    match(num); match(dotdot); match(num)
  end
  else error
end
68
LL(k) Constraint Left Recursion
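  • A grammar is left recursive if it has a
    nonterminal A such that A ⇒+ A α

A → A α | β              (left recursive)
A → β R, R → α R | ε     (equivalent right-recursive form)
[Figure: parse trees for the left-recursive form (nodes A) and the right-recursive form (nodes A, R)]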
69
Direct/Immediate Left Recursion
A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
    i.e. A → A αi | βj (i = 1..m, j = 1..n)
is equivalent to
A → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε
Both generate (β1 | β2 | ... | βn) (α1 | α2 | ... | αm)*
70
An Example
E → E + T | T
T → T * F | F
F → ( E ) | id
becomes
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
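
The rewriting rule for immediate left recursion is mechanical enough to code directly. The following is a small sketch under stated assumptions: right-hand sides are lists of symbols, `[]` stands for ε, the new nonterminal is named by appending a prime, and no alternative is the trivial cycle A → A. The function name is hypothetical.

```python
def eliminate_immediate_left_recursion(a, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    alternatives: list of right-hand sides (lists of symbols); [] is epsilon.
    Returns a dict of productions for A and, if needed, the new A'.
    """
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == a]
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != a]
    if not alphas:
        return {a: alternatives}  # nothing to do
    a_prime = a + "'"  # assumed not to clash with an existing symbol
    return {
        a: [beta + [a_prime] for beta in betas],
        a_prime: [alpha + [a_prime] for alpha in alphas] + [[]],
    }
```

Applied to E → E + T | T, the function yields E → T E' and E' → + T E' | ε, matching the transformed grammar above.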
71
Indirect Left Recursion
G0: S → A a | b
    A → A c | S d | ε
Problem: indirect left recursion, S ⇒ A a ⇒ S d a
Solution, Step 1 (indirect to direct left recursion):
    A → A c | A a d | b d | ε
Solution, Step 2 (direct left recursion to right recursion):
    S → A a | b
    A → b d A' | A'
    A' → c A' | a d A' | ε
  • Scan rules top-down
  • Do not start with symbols defined earlier (=>
    substitute them if any)
  • Resolve direct recursion

72
Indirect Left Recursion
Input. Grammar G with no cycles or ε-productions.
Output. An equivalent grammar with no left recursion.
1. Arrange the nonterminals in some order A1, A2, ..., An
2. for i := 1 to n do begin
     // Step 1: substitute leading symbols of Ai
     // that are earlier Aj's
     for j := 1 to i - 1 do begin
       replace each production of the form Ai → Aj γ (j < i)
       by the productions Ai → δ1 γ | δ2 γ | ... | δk γ,
       where Aj → δ1 | δ2 | ... | δk are all the
       current Aj-productions
     end
     // Step 2
     eliminate direct left recursion among the Ai-productions
   end
73
Left Factoring
  • Two alternatives of a nonterminal A have a
    nontrivial common prefix if α ≠ ε and
    A → α β1 | α β2
  • Left factoring rewrites them as
    A → α A'
    A' → β1 | β2

74
An Example
S → i E t S | i E t S e S | a
E → b
becomes
S → i E t S S' | a
S' → e S | ε
E → b
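
A single left-factoring pass can be sketched as follows. This is an illustration rather than a complete algorithm: it groups alternatives by their first symbol, factors each group once by its longest common prefix, and names new nonterminals by appending primes. The helper names are hypothetical, and repeated application would be needed for nested common prefixes.

```python
def common_prefix(sequences):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for column in zip(*sequences):
        if any(sym != column[0] for sym in column):
            break
        prefix.append(column[0])
    return prefix


def left_factor_once(a, alternatives):
    """Factor alternatives of A that share a first symbol (one pass only)."""
    groups = {}
    for rhs in alternatives:
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)
    result = {a: []}
    new_name = a
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[a].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        new_name += "'"  # A', A'', ... (assumed unused names)
        result[a].append(prefix + [new_name])
        result[new_name] = [rhs[len(prefix):] for rhs in group]
    return result
```

With the dangling-else alternatives S → i E t S | i E t S e S | a, one pass produces S → i E t S S' | a and S' → ε | e S.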
75
Transition Network as a Plan for
Recursive-Descent Parser
  • CFG => RTN => simplified RTN => Parser
  • - tail recursion
  • - remove unnecessary e-move
  • - merge sub-networks
  • - merge equivalent states
  • Section 4.4 Aho 86
  • Example Infix Expression
  • Example HTML Document Parser

76
Top-Down Parsing as Stack Matching
  • Construct a parse tree from the root to the
    leaves using a leftmost derivation
    Grammar: S → c A B, A → a b | a, B → d
    Input: cad

[Figure: parse tree with root S and children c, A, B; A derives a, B derives d]
77
Nonrecursive Predictive Parsing: General State
[Figure: input buffer a b c ... x y z; stack with X on top; the parsing program (a non-recursive stack driver, used instead of recursive procedures) produces the output; the parsing table, with entries such as M[X, a] = X → Y1 Y2 ... Yk, holds the precomputed predictive parsing actions]
78
Nonrecursive Predictive Parsing: Expand
Non-terminal
[Figure: input buffer a b c ... x y z; the stack now holds Y1 Y2 ... Yk (Y1 on top) in place of X, following the parsing table entry M[X, a] = X → Y1 Y2 ... Yk; the stack driver program emits the output]
79
Nonrecursive Predictive Parsing: Match Terminal
[Figure: input buffer a b c ... x y z; the stack holds the terminal a together with Y1 Y2 ... Yk; parsing table entry M[X, a] = X → Y1 Y2 ... Yk; the stack driver program emits the output]
80
Nonrecursive Predictive Parsing - Error Recovery
[Figure: input buffer a b c ... x y z; stack labels Y1 Y2 ... Yk, a and c; parsing table entry M[X, a] = X → Y1 Y2 ... Yk; the stack driver program emits the output]
81
Nonrecursive Predictive Parsing - Error Recovery
[Figure: next animation step; input buffer a b c ... x y z; stack labels Y1 Y2 ... Yk, a and c; parsing table entry M[X, a] = X → Y1 Y2 ... Yk]
82
Nonrecursive Predictive Parsing - Error Recovery
[Figure: final animation step; input buffer a b c ... x y z; stack labels Y1 Y2 ... Yk and c; parsing table entry M[X, a] = X → Y1 Y2 ... Yk]
83
Stack Operations
  • Match
  • when the top stack symbol is a terminal and it
    matches the input symbol, pop the top stack
    symbol and advance the input pointer
  • Expand
  • when the top stack symbol is a nonterminal,
    replace this symbol by the right hand side of one
    of its productions
  • Leftmost RHS symbol at Top-of-Stack

84
An Example
type → simple
     | id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
85
An Example
Action   Stack                       Input
E        type                        array [ num dotdot num ] of integer
M        type of ] simple [ array    array [ num dotdot num ] of integer
M        type of ] simple [          [ num dotdot num ] of integer
E        type of ] simple            num dotdot num ] of integer
M        type of ] num dotdot num    num dotdot num ] of integer
M        type of ] num dotdot        dotdot num ] of integer
M        type of ] num               num ] of integer
M        type of ]                   ] of integer
M        type of                     of integer
E        type                        integer
E        simple                      integer
M        integer                     integer
(E = expand, M = match; the top of the stack is at the right)


86
Parsing program
push $ and then S onto the stack, where S is the start symbol
set ip to point to the first symbol of w$
// try to match S with w
repeat
  let X be the top stack symbol and a the symbol pointed to by ip
  if X is a terminal or $ then
    if X = a then
      pop X from the stack and advance ip
    else error              // or error_recovery()
  else                      // X is a nonterminal
    if M[X, a] = X → Y1 Y2 ... Yk then
      pop X from the stack and push Yk ... Y2 Y1 onto the stack
    else error              // or error_recovery()
until X = $
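
The driver loop above can be made concrete with a hand-written table. The sketch below hard-codes the standard LL(1) table for the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id; the dictionary layout and the function name `ll1_parse` are assumptions for illustration.

```python
# M[X, a] as a dict; [] stands for an epsilon right-hand side.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions applied (a leftmost derivation).

    Raises SyntaxError on a mismatch or an empty table entry.
    """
    stack = ["$", start]           # top of the stack is the end of the list
    tokens = list(tokens) + ["$"]  # append the end marker
    ip = 0
    output = []
    while True:
        x, a = stack[-1], tokens[ip]
        if x not in NONTERMINALS:  # terminal or $
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            if x == "$":
                return output      # stack and input both exhausted
            stack.pop()            # match: pop and advance
            ip += 1
        else:
            rhs = TABLE.get((x, a))
            if rhs is None:
                raise SyntaxError(f"no entry for M[{x}, {a}]")
            stack.pop()            # expand: push Yk ... Y1, Y1 on top
            stack.extend(reversed(rhs))
            output.append((x, rhs))
```

Running `ll1_parse("id + id * id".split())` returns the eleven productions of the leftmost derivation, in the order in which a predictive parser applies them.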
87
Parser Driven by a Parsing Table: Non-recursive
Descent
Columns (lookahead): a, b, c, d
Row X:  X → Y1 Y2 ... Yk (under a); X → Z1 Z2 ... Zm (under b)
Row Y1: Y1 → a1; Y1 → a2
Row Z1: Z1 → b1; Z1 → b2
// Recursive descent procedure for matching X
X()   // WITHOUT ε-production X → ε
  if (LA = a) then Y1(); Y2(); ...; Yk()
  else if (LA = b) then Z1(); Z2(); ...; Zm()
  else ERROR()   // no X → ε
  // else RETURN, if X → ε exists
a in FirstSet(Y1 Y2 ... Yk), b in FirstSet(Z1
Z2 ... Zm)
88
Parser Driven by a Parsing Table: Non-recursive
Descent
Columns (lookahead): a, b, c, d
Row X:  X → Y1 Y2 ... Yk (under a); X → Z1 Z2 ... Zm (under b); X → ε (under d)
Row Y1: Y1 → a1; Y1 → a2
Row Z1: Z1 → b1; Z1 → b2
// Recursive descent procedure for matching X
X()   // WITH ε-production X → ε
  if (LA = a) then Y1(); Y2(); ...; Yk()
  else if (LA = b) then Z1(); Z2(); ...; Zm()
  // else ERROR()   // no X → ε
  else if (LA = d) RETURN   // if X → ε exists
a in FirstSet(Y1 Y2 ... Yk), b in FirstSet(Z1
Z2 ... Zm)
d in FollowSet(X) (S =>* ... X d ...)
89
First Sets
  • The first set of a string α is the set of
    terminals that begin the strings derived from α.
  • If α ⇒* ε, then ε is also in the first set of
    α.
  • ε is used simply to flag whether α can be null
    when computing FIRST sets
  • Not for matching any real input when parsing
  • FIRST(α) = { a | α ⇒* a β } ∪ { ε, if α ⇒* ε }
  • FIRST(α) includes ε means that α ⇒* ε

90
Compute First Sets
  • If X is a terminal, then FIRST(X) is {X}
  • If X is a nonterminal and X → ε is a production,
    then add ε to FIRST(X)
  • If X is a nonterminal and X → Y1 Y2 ... Yk is a
    production, then add a to FIRST(X) if, for some
    i, a is in FIRST(Yi) and ε is in all of
    FIRST(Y1), ..., FIRST(Yi-1).
  • If ε is in FIRST(Yj) for all j, then add ε to
    FIRST(X)
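
These rules are usually applied repeatedly until no FIRST set changes. The following is a minimal fixed-point sketch; the grammar encoding (a dict from nonterminal to a list of right-hand sides, with `[]` for ε) and the names `compute_first` and `EXPR_GRAMMAR` are assumptions for illustration.

```python
EPS = "ε"


def compute_first(grammar):
    """Compute FIRST for every nonterminal by iterating to a fixed point.

    grammar: {nonterminal: [rhs, ...]}, each rhs a list of symbols ([] = ε).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    """
    first = {a: set() for a in grammar}

    def first_of(symbol):
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[a])
                all_nullable = True
                for y in rhs:
                    first[a] |= first_of(y) - {EPS}
                    if EPS not in first_of(y):
                        all_nullable = False
                        break
                if all_nullable:  # every Yj derives ε (or rhs is empty)
                    first[a].add(EPS)
                if len(first[a]) != before:
                    changed = True
    return first


EXPR_GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
```

For `EXPR_GRAMMAR`, the result gives FIRST(E) = FIRST(T) = FIRST(F) = { (, id }, FIRST(E') = { +, ε } and FIRST(T') = { *, ε }.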

91
Follow Sets
  • What to do when matching null: A → ε ?
  • Top-down recursive descent parsing assumes success
  • LL is more predictive => use the follow set of A
  • The follow set of a nonterminal A is the set of
    terminals that can appear immediately to the
    right of A in some sentential form; namely, if
    S ⇒* α A a β, then a is in the follow set of A.

92
Compute Follow Sets
  • Initialization: place $ in FOLLOW(S), where S is
    the start symbol and $ is the input right end
    marker.
  • If there is a production A → α B β, then
    everything in FIRST(β) except for ε is placed in
    FOLLOW(B)
  • ε is not considered a visible input to follow any
    symbol
  • If there is a production A → α B, or A → α B β where
    FIRST(β) contains ε (i.e., β ⇒* ε), then
    everything in FOLLOW(A) is in FOLLOW(B)
  • S ⇒* A a implies S ⇒* α B a
  • YES: every symbol that can follow A will also
    follow B
  • NO: it does not follow that every symbol that can
    follow B will also follow A
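
The FOLLOW rules can also be run to a fixed point. In the sketch below the FIRST sets of the expression grammar are supplied as a literal so that the block stands alone; the function names and the dict encoding are assumptions for illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets of the nonterminals, taken as given here.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols, first):
    """FIRST of a symbol string; a terminal's FIRST set is itself."""
    result = set()
    for y in symbols:
        f = first.get(y, {y})
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)  # every symbol was nullable (or the string is empty)
    return result


def compute_follow(grammar, start, first):
    follow = {a: set() for a in grammar}
    follow[start].add(END)  # initialization: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue  # FOLLOW is defined for nonterminals only
                    beta_first = first_of_string(rhs[i + 1:], first)
                    new = beta_first - {EPS}
                    if EPS in beta_first:  # β is empty or nullable
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow
```

`compute_follow(GRAMMAR, "E", FIRST)` gives FOLLOW(E) = FOLLOW(E') = { ), $ }, FOLLOW(T) = FOLLOW(T') = { +, ), $ } and FOLLOW(F) = { +, *, ), $ }.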

93
An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
94
Constructing Parsing Table
Input. Grammar G.
Output. Parsing table M.
Method.
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α) (i.e., α ⇒* ε), add A → α to M[A, b]
   for each terminal b in FOLLOW(A), including $:
   if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
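
The method translates into a short loop over productions. The sketch below takes the FIRST and FOLLOW sets of the expression grammar as given literals; each table cell is stored as a list so that multiply-defined entries would remain visible. Names such as `build_ll1_table` are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols, first):
    """FIRST of a symbol string; a terminal's FIRST set is itself."""
    result = set()
    for y in symbols:
        f = first.get(y, {y})
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def build_ll1_table(grammar, first, follow):
    """Return {(A, a): [rhs, ...]}; missing keys are error entries."""
    table = {}
    for a, alternatives in grammar.items():
        for rhs in alternatives:
            f = first_of_string(rhs, first)
            targets = f - {EPS}            # step 2
            if EPS in f:
                targets |= follow[a]       # step 3, "$" included
            for terminal in targets:
                table.setdefault((a, terminal), []).append(rhs)
    return table


TABLE = build_ll1_table(GRAMMAR, FIRST, FOLLOW)
```

For this grammar `TABLE` has 13 defined entries, each holding exactly one production; any cell holding two or more would signal that the grammar is not LL(1).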
95
LL(1) Parsing Table Construction
      a in FIRST(α)   b in FOLLOW(A)    c not in FIRST(α) or FOLLOW(A)
A     A → α           A → α (α ⇒* ε)    error
B
C
When to apply A → ε ?
// Recursive version of LL(1) parser, including A → ε
A()   // WITH/WITHOUT ε-productions A → α (α ⇒* ε)
  if (LA = a in First(Y1 Y2 ... Yk)) then Y1(); Y2(); ...; Yk()
  else if (LA = b in Follow(A) and ε in First(Z1 Z2 ... Zm)) then
    Z1(); Z2(); ...; Zm()   // nullable
  else ERROR()
96
An Example
97
An Example
Stack        Input             Output
$E           id + id * id $
$E'T         id + id * id $    E → T E'
$E'T'F       id + id * id $    T → F T'
$E'T'id      id + id * id $    F → id
$E'T'        + id * id $
$E'          + id * id $       T' → ε
$E'T+        + id * id $       E' → + T E'
$E'T         id * id $
$E'T'F       id * id $         T → F T'
$E'T'id      id * id $         F → id
$E'T'        * id $
$E'T'F*      * id $            T' → * F T'
$E'T'F       id $
$E'T'id      id $              F → id
$E'T'        $
$E'          $                 T' → ε
$            $                 E' → ε
98
LL(1) Grammars
  • A grammar is an LL(1) grammar if its predictive
    parsing table has no multiply-defined entries

99
A Counter Example
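S → i E t S S' | a
S' → e S | ε
E → b

      a        b        e                   i                 t    $
S     S → a                                 S → i E t S S'
S'                      S' → ε, S' → e S                           S' → ε
E              E → b

e ∈ FOLLOW(S) = FOLLOW(S'), so S' → ε goes into M[S', e]
e ∈ FIRST(e S), so S' → e S goes into M[S', e]
Disambiguation: choose S' → e S (match each else with the closest then)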
100
LL(1) Grammars or Not ??
  • A grammar G is LL(1) iff whenever A → α | β are
    two distinct productions of G, the following
    conditions hold
  • For no terminal a do both α and β derive strings
    beginning with a.
  • otherwise the M[A, FIRST(α) ∩ FIRST(β)] entries
    will have conflicting actions
  • At most one of α and β can derive the empty
    string
  • otherwise the M[A, FOLLOW(A)] entries have
    conflicting actions
  • If β ⇒* ε, then α does not derive any string
    beginning with a terminal in FOLLOW(A).
  • otherwise the M[A, FIRST(α) ∩ FOLLOW(A)] entries
    have conflicting actions
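
The three conditions can be tested pairwise without building the table. The sketch below applies them to the dangling-else grammar, with its FIRST and FOLLOW sets given as literals; the function name `ll1_violations` and the labels in the returned tuples are assumptions for illustration.

```python
from itertools import combinations

EPS = "ε"

# Dangling-else grammar after left factoring.
GRAMMAR = {
    "S": [["i", "E", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], []],
    "E": [["b"]],
}
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}


def first_of_string(symbols, first):
    """FIRST of a symbol string; a terminal's FIRST set is itself."""
    result = set()
    for y in symbols:
        f = first.get(y, {y})
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def ll1_violations(grammar, first, follow):
    """List (nonterminal, kind, symbols) for every violated condition."""
    problems = []
    for a, alternatives in grammar.items():
        for alpha, beta in combinations(alternatives, 2):
            fa = first_of_string(alpha, first)
            fb = first_of_string(beta, first)
            common = (fa & fb) - {EPS}
            if common:                       # condition 1
                problems.append((a, "FIRST/FIRST", common))
            if EPS in fa and EPS in fb:      # condition 2
                problems.append((a, "both nullable", {EPS}))
            for nullable, other in ((fb, fa), (fa, fb)):
                clash = (other - {EPS}) & follow[a]
                if EPS in nullable and clash:  # condition 3
                    problems.append((a, "FIRST/FOLLOW", clash))
    return problems
```

For the dangling-else grammar the only reported violation is a FIRST/FOLLOW clash on `e` for S', which corresponds to the doubly defined table entry M[S', e].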

101
Non-LL(1) Grammar: Ambiguous According to LL(1)
Parsing Table Construction
      a in FIRST(α) ∩ FIRST(β)   b in FOLLOW(A)     a in FIRST(α) ∩ FOLLOW(A)
A     A → α                      A → α (α ⇒* ε)     A → α (α does not derive ε, but α ⇒* a γ)
      A → β                      A → β (β ⇒* ε)     A → β (β ⇒* ε)
B
C
When will A → α and A → β appear in the same table
cell?
102
LL(1) Grammars or Not??
  • If G is left-recursive or ambiguous, then M will
    have at least one multiply-defined entry
  • => non-LL(1)
  • E.g., X → X a | b
  • => FIRST(X) = {b} (and, of course, FIRST(b) =
    {b})
  • => M[X, b] includes both X → X a and X → b
  • Ambiguous G and G with left-recursive productions
    can not be LL(1).
  • No LL(1) grammar can be ambiguous

103
Error Recovery for LL Parsers
104
Syntactic Errors
  • Empty entries in a parsing table
  • Syntactic error is encountered when the lookahead
    symbol corresponding to this entry is in input
    buffer
  • Error Recovery information can be encoded in such
    entries to take appropriate actions upon error
  • Error Detection
  • (1) Stack top = x and x != input (a)
  • (2) Stack top = A and M[A, a] = empty (error)

105
Error Recovery Strategies
  • Panic mode skip tokens until a token in a set of
    synchronizing tokens appears
  • INS(ertion) type of errors
  • sync at delimiters, keywords, ..., that have clear
    functions
  • Phrase Level Recovery
  • local INS(ertion), DEL(etion), SUB(stitution)
    types of errors
  • Error Production
  • define error patterns in grammar
  • Global Correction Grammar Correction
  • minimum distance correction

106
Error Recovery Panic Mode
  • Panic mode skip tokens until a token in a set of
    synchronizing tokens appears
  • Commonly used Synchronizing tokens
  • SUB(A,ip) use FOLLOW(A) as sync set for A (pop
    A)
  • use the FIRST set of a higher construct as sync
    set for a lower construct
  • INS(ip) use FIRST(A) as sync set for A
  • ip? use the production deriving ε as the
    default
  • DEL(ip) If a terminal on stack cannot be
    matched, pop the terminal

107
Error Recovery Panic Mode
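[Figure: table with columns Action, Stack, Input showing the configuration before and after each action. SUB(A, ip): A on the stack, ip in Follow(A). INS(ip): A on the stack, ip in First(A). DEL(ip): terminal x on the stack, ip at a different symbol.]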
108
Error Recovery Actions: Using FOLLOW and FIRST Sets
to Sync
  • Expanding non-terminal A
  • M[A, a] = error (blank)
  • Skip a in input
  • delete all such a (until sync with a sync
    symbol b) /* panic */
  • M[A, b] = sync (b in FOLLOW(A))
  • Pop A from stack
  • b is a sync symbol following A
  • M[A, b] = A → α (sync at FIRST(A))
  • Expand A (same as normal parsing action)
  • Matching terminal x
  • (sp = x) != a
  • Pop(x) from stack
  • missing input token x
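
These recovery actions can be added to a table-driven driver with a few extra branches. The sketch below uses the expression grammar's table with `synch` entries placed at the FOLLOW sets. Two choices are assumptions of this sketch rather than rules from the slides: when only the start symbol remains on the stack, a `synch` entry skips the input token instead of popping, and at end of input the nonterminal is always popped so the loop terminates.

```python
SYNCH, END = "synch", "$"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", END): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", END): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    # synch entries: the FOLLOW set of each nonterminal
    ("E", ")"): SYNCH, ("E", END): SYNCH,
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", END): SYNCH,
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", END): SYNCH,
}


def parse_with_recovery(tokens, start="E"):
    """Parse with panic-mode recovery; return the list of recovery actions."""
    stack = [END, start]
    tokens = list(tokens) + [END]
    ip = 0
    errors = []
    while True:
        x, a = stack[-1], tokens[ip]
        if x == END and a == END:
            return errors
        if x not in NONTERMINALS:
            if x == a:                      # normal match
                stack.pop()
                ip += 1
            elif x == END:                  # stack exhausted, input remains
                errors.append(f"skip {a}")
                ip += 1
            else:                           # missing input token x
                errors.append(f"missing {x}")
                stack.pop()
            continue
        entry = TABLE.get((x, a))
        if isinstance(entry, list):         # normal expansion
            stack.pop()
            stack.extend(reversed(entry))
        elif a == END or (entry == SYNCH and len(stack) > 2):
            errors.append(f"pop {x}")       # sync at FOLLOW(x)
            stack.pop()
        else:                               # blank entry: skip the token
            errors.append(f"skip {a}")
            ip += 1
```

On the erroneous input `) id * + id` the sketch skips the leading `)` and later pops F when `+` appears where an operand was expected, then finishes the parse.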

109
An Example
FOLLOW(X) is used to expand ε-productions or sync
(on errors)
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(F) = { +, *, ), $ }
FIRST(X) is used to expand non-ε productions or
sync (on errors)
110
An Example
Stack        Input             Output
$E           ) id * + id $     error, skip )
$E           id * + id $       id is in FIRST(E)
$E'T         id * + id $       E → T E'
$E'T'F       id * + id $       T → F T'
$E'T'id      id * + id $       F → id
$E'T'        * + id $
$E'T'F*      * + id $          T' → * F T'
$E'T'F       + id $            error, M[F, +] = synch (+ is in FOLLOW(F))
$E'T'        + id $            F popped
$E'          + id $            T' → ε
$E'T+        + id $            E' → + T E'
$E'T         id $
$E'T'F       id $              T → F T'
$E'T'id      id $              F → id
$E'T'        $
$E'          $                 T' → ε
$            $                 E' → ε
111
Parse Tree - Error Recovered
) id * + id => id * F + id