Title: Syntax Analysis
1Chapter 9
2Contents
- Context free grammars
- Top-down parsing
- Bottom-up parsing
- Attribute grammars
- Dynamic semantics
- Tools for syntax analysis
- Chomsky hierarchy
3The Role of Parser
49.1 Context Free Grammars
- A context free grammar consists of terminals,
nonterminals, a start symbol, and productions.
- Terminals are the basic symbols from which
strings are formed.
- Nonterminals are syntactic variables that denote
sets of strings.
- One nonterminal is distinguished as the start
symbol.
- The productions of a grammar specify the manner
in which the terminals and nonterminals can be
combined to form strings.
- A language that can be generated by a grammar is
said to be a context-free language.
5Example of Grammar
6Notational Conventions
- Aho P.166
- Example P.167
- E → E A E | ( E ) | - E | id
- A → + | - | * | / | ↑
7Derivations
- E ⇒ -E is read "E derives -E".
- E ⇒ -E ⇒ -(E) ⇒ -(id) is called a derivation of
-(id) from E.
- If A → γ is a production and α and β are arbitrary
strings of grammar symbols, we say αAβ ⇒ αγβ.
- If α1 ⇒ α2 ⇒ ... ⇒ αn, we say α1 derives αn.
8Derivations (II)
- ⇒ means derives in one step.
- ⇒* means derives in zero or more steps.
- α ⇒* α
- if α ⇒* β and β ⇒ γ then α ⇒* γ
- ⇒+ means derives in one or more steps.
- If S ⇒* α, where α may contain nonterminals, then
we say that α is a sentential form.
9Derivations (III)
- G: grammar, S: start symbol, L(G): the language
generated by G.
- Strings in L(G) may contain only terminal symbols
of G.
- A string of terminals w is said to be in L(G) if
and only if S ⇒+ w.
- The string w is called a sentence of G.
- A language that can be generated by a grammar is
said to be a context-free language.
- If two grammars generate the same language, the
grammars are said to be equivalent.
10Derivations (IV)
- E → E A E | ( E ) | - E | id
- A → + | - | * | / | ↑
- The string -(id+id) is a sentence of the above
grammar because
- E ⇒ -E ⇒ -(E) ⇒ -(EAE) ⇒ -(idAE) ⇒ -(id+E) ⇒ -(id+id)
- We write E ⇒* -(id+id)
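The derivation of -(id+id) can be replayed mechanically. The sketch below is only an illustration, not part of the original slides: it assumes the grammar E → E A E | ( E ) | - E | id, A → + | - | * | / | ↑ (with ↑ written as "^"), and the names GRAMMAR, leftmost_step and derive are hypothetical.

```python
# Assumed grammar from the slide; "^" stands in for the exponent operator.
GRAMMAR = {
    "E": [["E", "A", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
    "A": [["+"], ["-"], ["*"], ["/"], ["^"]],
}

def leftmost_step(form, rhs):
    """Replace the leftmost nonterminal of `form` by `rhs` (one derivation step)."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            if rhs not in GRAMMAR[sym]:
                raise ValueError(f"{sym} -> {' '.join(rhs)} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to expand")

def derive(start, choices):
    """Return every sentential form obtained by applying `choices` in order."""
    forms = [[start]]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs))
    return forms

# Right-hand sides chosen at each step of the leftmost derivation.
choices = [
    ["-", "E"],        # E      => -E
    ["(", "E", ")"],   # -E     => -(E)
    ["E", "A", "E"],   # -(E)   => -(EAE)
    ["id"],            # leftmost E => id
    ["+"],             # A      => +
    ["id"],            # remaining E => id
]
forms = derive("E", choices)
sentence = "".join(forms[-1])
```

Every element of forms except the last still contains a nonterminal, so each one is a sentential form; the last contains only terminals and is therefore a sentence.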
11Parse Tree
E → E+E | E*E | (E) | -E | id
12Parse Tree (II)
13Two Parse Trees
14Ambiguity
- A grammar that produces more than one parse tree
for some sentence is said to be ambiguous.
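One way to see ambiguity concretely is to count parse trees. The sketch below assumes the fragment E → E + E | E * E | id and the sentence id + id * id; both are illustrative choices, since the two trees on the previous slide did not survive as text. The function name count_parse_trees is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways E can derive tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

trees = count_parse_trees(["id", "+", "id", "*", "id"])
```

A count greater than one for any sentence means the grammar is ambiguous; here the two trees correspond to grouping the sentence as (id + id) * id or as id + (id * id).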
15Eliminating Ambiguity
- Sometimes an ambiguous grammar can be rewritten
to eliminate the ambiguity.
- E.g. match each else with the closest unmatched
then.
16Eliminating Left Recursion
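- A grammar is left recursive if it has a
nonterminal A such that there is a derivation
A ⇒+ Aα for some string α.
- A → Aα | β can be replaced by
- A → βA'
- A' → αA' | ε
- In general, A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
is replaced by
- A → β1A' | β2A' | ... | βnA'
- A' → α1A' | α2A' | ... | αmA' | ε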
17Algorithm Eliminating Left Recursion
18Examples
- S → Aa | b
- A → Ac | Sd | ε
- A → Ac | Aad | bd | ε
- S → Aa | b
- A → bdA' | A'
- A' → cA' | adA' | ε
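The rule for immediate left recursion can be sketched in a few lines. This is a minimal illustration, not the full algorithm from the slides: it handles only productions of a single nonterminal, assumes there is no production A → A, and uses the hypothetical name eliminate_immediate_left_recursion. It is applied to A → Ac | Aad | bd | ε, the A-productions obtained in the example after substituting for S.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    Each alternative is a list of symbols; the empty list stands for epsilon.
    Returns a dict mapping nonterminals to their new alternatives.
    """
    # alpha parts: what follows the leading A in left-recursive alternatives
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # beta parts: alternatives that do not start with A
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],                    # A  -> beta A'
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],    # A' -> alpha A' | epsilon
    }

result = eliminate_immediate_left_recursion(
    "A", [["A", "c"], ["A", "a", "d"], ["b", "d"], []]
)
```

The result corresponds to A → bdA' | A' and A' → cA' | adA' | ε.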
19Left Factoring
- Left factoring is a grammar transformation that
is useful for producing a grammar suitable for
predictive parsing.
- The basic idea is that when it is not clear which
of two alternative productions to use to expand a
nonterminal A, we may be able to rewrite the
A-productions to defer the decision until we have
seen enough of the input to make the right
choice.
- stmt → if expr then stmt else stmt | if expr then stmt
20Algorithm Left Factoring
21Left Factoring (example p178)
- A → αβ1 | αβ2
- The following grammar abstracts the dangling-else
problem:
- S → iEtS | iEtSeS | a
- E → b
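A single left-factoring step can be sketched as follows. This is an illustration under stated assumptions: it performs one pass only (the full algorithm repeats until no two alternatives share a prefix), and the names common_prefix and left_factor_once are hypothetical. It is applied to the dangling-else grammar S → iEtS | iEtSeS | a.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor_once(nt, alternatives):
    """Factor out the longest prefix shared by two or more alternatives."""
    best = []
    for i, a in enumerate(alternatives):
        for b in alternatives[i + 1:]:
            prefix = common_prefix(a, b)
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return {nt: alternatives}  # nothing to factor
    new_nt = nt + "'"
    n = len(best)
    factored = [alt[n:] for alt in alternatives if alt[:n] == best]
    rest = [alt for alt in alternatives if alt[:n] != best]
    return {
        nt: [best + [new_nt]] + rest,  # A  -> alpha A' | other alternatives
        new_nt: factored,              # A' -> beta1 | beta2 (empty list = epsilon)
    }

result = left_factor_once("S", [list("iEtS"), list("iEtSeS"), ["a"]])
```

The result corresponds to S → iEtSS' | a and S' → ε | eS, so the choice between the two if-forms is deferred until the parser has seen whether an e follows.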
229.2 Top Down Parsing
- Recursive-descent parsing
- Predictive parsers
- Nonrecursive predictive parsing
- FIRST and FOLLOW
- Construction of predictive parsing table
- LL(1) grammars
- Error recovery in predictive parsing (if time
permits)
23Recursive-Descent Parsing
- Top-down parsing can be viewed as an attempt to
find a leftmost derivation for an input string.
- It can also be viewed as an attempt to construct a
parse tree for the input string from the root,
creating the nodes of the parse tree in preorder.
Grammar
Input string w = cad
24Predictive Parsers
- By carefully writing a grammar, eliminating left
recursion, and left factoring the resulting
grammar, we can obtain a grammar that can be
parsed by a recursive-descent parser that needs
no backtracking, i.e., a predictive parser.
S → cAd    A → aA'    A' → b | ε
25Predictive Parser (II)
- Recursive-descent parsing is a top-down method of
syntax analysis in which we execute a set of
recursive procedures to process the input.
- A procedure is associated with each nonterminal
of a grammar.
- Predictive parsing is the special case in which the
look-ahead symbol unambiguously determines the
procedure selected for each nonterminal.
- The sequence of procedures called in processing
the input implicitly defines a parse tree for the
input.
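A predictive parser of this kind can be sketched with one procedure per nonterminal. The example below assumes the left-factored grammar S → cAd, A → aA', A' → b | ε and single-character tokens; the class and function names are hypothetical.

```python
class PredictiveParser:
    """Recursive-descent parser for S -> c A d, A -> a A', A' -> b | epsilon."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def lookahead(self):
        # "$" marks the end of the input.
        return self.text[self.pos] if self.pos < len(self.text) else "$"

    def match(self, terminal):
        if self.lookahead() != terminal:
            raise SyntaxError(f"expected {terminal!r} at position {self.pos}")
        self.pos += 1

    def S(self):        # S -> c A d
        self.match("c")
        self.A()
        self.match("d")

    def A(self):        # A -> a A'
        self.match("a")
        self.A_prime()

    def A_prime(self):  # A' -> b | epsilon, chosen by the look-ahead symbol
        if self.lookahead() == "b":
            self.match("b")
        # otherwise take A' -> epsilon and consume nothing

def accepts(text):
    parser = PredictiveParser(text)
    try:
        parser.S()
        return parser.lookahead() == "$"
    except SyntaxError:
        return False
```

No procedure ever backs up over the input: the look-ahead symbol alone selects the alternative, and the chain of calls S, A, A_prime mirrors the parse tree.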
28Nonrecursive predictive parsing
29Predictive Parsing Program
30Parsing Table M
Grammar
Input id + id * id
31Moves Made by Predictive Parser
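The moves of a nonrecursive predictive parser can be sketched with an explicit stack and a parsing table. The grammar and table below are assumptions, since the slide's own grammar and table did not survive as text: they are the usual expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id. The names TABLE and parse are hypothetical.

```python
EPS = ()  # right-hand side of an epsilon production
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Assumed parsing table M[A, a] for the expression grammar above.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): EPS, ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPS, ("T'", "$"): EPS,
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}

def parse(tokens):
    """Return the productions used, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]  # start symbol on top of the end marker
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        a = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            output.append((top, rhs))
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == a:
            pos += 1                     # terminal (or $) matched
        else:
            raise SyntaxError(f"expected {top!r} but saw {a!r}")
    return output

moves = parse(["id", "+", "id", "*", "id"])
```

The productions collected in moves, read in order, form a leftmost derivation of the input.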
32FIRST and FOLLOW
- If α is any string of grammar symbols, FIRST(α)
is the set of terminals that begin the strings
derived from α. If α ⇒* ε then ε is also in
FIRST(α).
- FOLLOW(A), for nonterminal A, is the set of
terminals a that can appear immediately to the
right of A in some sentential form, i.e. the set
of terminals a such that there exists a
derivation of the form S ⇒* αAaβ for some α and β.
- If A can be the rightmost symbol in some
sentential form, then $ is in FOLLOW(A).
33Compute FIRST(X)
34Compute FOLLOW(A)
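FIRST and FOLLOW can be computed by iterating to a fixed point. The sketch below is one possible formulation, not necessarily the procedure shown on these slides; it assumes the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id, and all function names are hypothetical.

```python
EPS = "ε"
GRAMMAR = {  # assumed example grammar; the empty list stands for epsilon
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol (or the empty string) derives epsilon
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")  # the end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of_string(alt[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # sym can end the body: inherit FOLLOW(nt)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

For this grammar the sketch yields, for example, FIRST(E) = {(, id} and FOLLOW(E) = {), $}.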
35Construction of Predictive Parsing Tables
36Example of Producing Parsing Table
37LL(1) Grammars
- A grammar whose parsing table has no
multiply-defined entries is said to be LL(1).
- First L: scanning from left to right
- Second L: producing a leftmost derivation
- 1: using one input symbol of lookahead at each
step to make parsing action decisions.
38Properties of LL(1)
- No ambiguous or left recursive grammar can be
LL(1).
- Grammar G is LL(1) iff whenever A → α | β are two
distinct productions of G:
- For no terminal a do both α and β derive strings
beginning with a.
- FIRST(α) ∩ FIRST(β) = ∅
- At most one of α and β can derive the empty
string.
- If β ⇒* ε, then α does not derive any string
beginning with a terminal in FOLLOW(A).
- FIRST(α FOLLOW(A)) ∩ FIRST(β FOLLOW(A)) = ∅
39LL(1) Grammars Example
40Non-LL(1) Grammar Example
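A multiply-defined table entry can be detected while the table is built. The sketch below assumes the left-factored dangling-else grammar S → iEtSS' | a, S' → eS | ε, E → b as the non-LL(1) example (the slide's own example did not survive as text), with FIRST and FOLLOW sets worked out by hand. The names build_table and conflicts are hypothetical.

```python
EPS = "ε"

# (head, body, FIRST(body)) for each production; FIRST sets worked out by hand.
PRODUCTIONS = [
    ("S", ("i", "E", "t", "S", "S'"), {"i"}),
    ("S", ("a",), {"a"}),
    ("S'", ("e", "S"), {"e"}),
    ("S'", (), {EPS}),
    ("E", ("b",), {"b"}),
]
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}

def build_table(productions, follow):
    """Place A -> alpha in M[A, a] for a in FIRST(alpha), and for a in
    FOLLOW(A) when alpha can derive epsilon."""
    table = {}
    for head, body, first in productions:
        lookaheads = first - {EPS}
        if EPS in first:
            lookaheads |= follow[head]
        for a in lookaheads:
            table.setdefault((head, a), []).append(body)
    return table

def conflicts(table):
    """Entries holding more than one production."""
    return {key: bodies for key, bodies in table.items() if len(bodies) > 1}

table = build_table(PRODUCTIONS, FOLLOW)
CONFLICTS = conflicts(table)
```

Entry M[S', e] receives both S' → eS and S' → ε, so this grammar is not LL(1); the conflict reflects the ambiguity of the dangling else.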
41Error recovery in predictive parsing
- An error is detected during predictive
parsing when the terminal on top of the stack
does not match the next input symbol, or when
nonterminal A is on top of the stack, a is the next
input symbol, and parsing table entry M[A, a] is
empty.
- Panic-mode error recovery is based on the idea of
skipping symbols on the input until a token in a
selected set of synchronizing tokens appears.
42How to select synchronizing set?
- Place all symbols in FOLLOW(A) into the
synchronizing set for nonterminal A. If we skip
tokens until an element of FOLLOW(A) is seen and
pop A from the stack, it is likely that parsing can
continue.
- We might add keywords that begin statements to
the synchronizing sets for the nonterminals
generating expressions.
43How to select synchronizing set? (II)
- If a nonterminal can generate the empty string,
then the production deriving ε can be used as a
default. This may postpone some error detection,
but cannot cause an error to be missed. This
approach reduces the number of nonterminals that
have to be considered during error recovery.
- If a terminal on top of the stack cannot be matched,
a simple idea is to pop the terminal and issue a
message saying that the terminal was inserted.
44Example error recovery
"synch" indicates synchronizing tokens obtained
from the FOLLOW set of the nonterminal in
question. If the parser looks up entry M[A, a]
and finds that it is blank, the input symbol a is
skipped. If the entry is synch, then the
nonterminal on top of the stack is popped. If a
token on top of the stack does not match the
input symbol, then we pop the token from the
stack.
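The three recovery rules can be added to a table-driven parser. The sketch below assumes the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id with synch entries taken from the FOLLOW sets. The table, the treatment of a blank entry at the end marker, and the name parse_with_recovery are assumptions made for illustration.

```python
EPS = ()
SYNCH = "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
TABLE = {  # assumed table; synch entries come from FOLLOW of each nonterminal
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "+"): EPS, ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPS, ("T'", "$"): EPS,
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}

def parse_with_recovery(tokens):
    """Parse with panic-mode recovery; return the list of error messages."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]
    pos = 0
    errors = []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top in NONTERMINALS:
            entry = TABLE.get((top, a))
            if entry is None and a != "$":
                errors.append(f"skipped input symbol {a!r}")  # blank entry
                pos += 1
            elif entry is None or entry == SYNCH:
                # synch entry (or, as an assumption, a blank entry at $)
                errors.append(f"popped nonterminal {top}")
                stack.pop()
            else:
                stack.pop()
                stack.extend(reversed(entry))
        elif top == a:
            stack.pop()
            pos += 1
        else:
            errors.append(f"popped unmatched terminal {top!r}")
            stack.pop()
    if tokens[pos] != "$":
        errors.append(f"unexpected input starting at {tokens[pos]!r}")
    return errors

errors = parse_with_recovery(["id", "*", "+", "id"])
```

On id * + id the parser finds M[F, +] = synch, pops F, reports one error, and then continues to the end of the input.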
45Example error recovery (II)
469.3 Bottom Up Parsing and LR Parsers
- Shift-reduce parsing attempts to construct a
parse tree for an input string beginning at the
leaves (bottom) and working up towards the root
(top).
- It can be viewed as reducing a string w to the
start symbol of a grammar.
- At each reduction step a particular substring
matching the right side of a production is
replaced by the symbol on the left of that
production, and if the substring is chosen
correctly at each step, a rightmost derivation is
traced out in reverse.
47Example
- Grammar
- S → aABe
- A → Abc | b
- B → d
- Reduction
- abbcde
- aAbcde
- aAde
- aABe
- S
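The reduction sequence can be replayed step by step. The sketch below assumes the grammar S → aABe, A → Abc | b, B → d and lists by hand the substring replaced at each step; it does not choose those substrings itself, which is the job of the parsing table. The names STEPS and reduce_all are hypothetical.

```python
# (start index of the substring, production head, production body) per reduction
STEPS = [
    (1, "A", "b"),     # a[b]bcde -> aAbcde
    (1, "A", "Abc"),   # a[Abc]de -> aAde
    (2, "B", "d"),     # aA[d]e   -> aABe
    (0, "S", "aABe"),  # [aABe]   -> S
]

def reduce_all(string, steps):
    """Apply each reduction in turn and return every intermediate string."""
    forms = [string]
    for start, head, body in steps:
        current = forms[-1]
        if current[start:start + len(body)] != body:
            raise ValueError(f"{body!r} not found at position {start} of {current!r}")
        forms.append(current[:start] + head + current[start + len(body):])
    return forms

forms = reduce_all("abbcde", STEPS)
rightmost_derivation = list(reversed(forms))
```

Read backwards, the strings form the rightmost derivation S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde.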
48Operator-Precedence Parsing
Grammar for expression
Can be rewritten as
With the precedence relations inserted, id + id * id
can be written as
49LR(k) Parsers
- L: left-to-right scanning of the input
- R: constructing a rightmost derivation in reverse
- k: the number of input symbols of lookahead that
are used in making parsing decisions.
50LR Parsing
51Shift-Reduce Parser
52Example LR Parsing Table
539.4 Attribute Grammars
- An attribute grammar is a device used to describe
more of the structure of a programming language
than is possible with a context-free grammar.
- Some of the semantic properties can be evaluated
at compile time; they are called "static
semantics". Other properties are determined at
execution time; they are called "dynamic
semantics".
- The static semantics is often represented by
semantic attributes which are associated with the
nonterminals.
54Attribute Grammars
- Grammars with added attributes, attribute
computation functions, and predicate functions.
- Attributes: similar to variables
- Attribute computation functions: specify how
attribute values are computed
- Predicate functions: state some of the syntax and
static semantic rules of the language
55Example of Attribute Grammar
56Example (II)
57Example (III)
58Example (IV)
599.5 Dynamic Semantics
- Informal definition: Only informal explanations
are given (in natural language) which define the
meaning of programs (e.g. language reference
manuals, etc.).
- Operational semantics: The meaning of the
constructs of the programming language is defined
in terms of the translation into another
lower-level language and the semantics of this
lower-level language. Usually only the
translation is defined formally; the semantics of
the lower-level language is defined informally.
60Axiomatic Semantics
- Axiomatic semantics was defined to prove the
correctness of programs.
- This approach is related to the approach of
defining the semantics of a procedure
(independently of its code) in terms of pre- and
post-conditions that define properties of input
and output parameters and values of state
variables.
- Weakest precondition: For a given statement, and
a given postcondition that should hold after its
execution, the weakest precondition is the
weakest condition which ensures, when it holds
before the execution of the statement, that the
given postcondition holds afterwards.
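For an assignment statement the weakest precondition is obtained by substituting the assigned expression into the postcondition. The sketch below illustrates this for x := x + 1 with postcondition x > 0; the statement, the dictionary model of program state, and the name wp_assign are illustrative assumptions.

```python
def wp_assign(var, expr, post):
    """Weakest precondition of `var := expr` with respect to `post`.

    `expr` and `post` are functions of a state (a dict of variable values).
    The precondition holds in a state exactly when `post` holds in the state
    reached by performing the assignment.
    """
    def pre(state):
        new_state = dict(state)
        new_state[var] = expr(state)
        return post(new_state)
    return pre

# Statement x := x + 1, postcondition x > 0.
pre = wp_assign("x", lambda s: s["x"] + 1, lambda s: s["x"] > 0)

holds_at_zero = pre({"x": 0})        # x + 1 > 0 holds when x = 0
holds_at_minus_one = pre({"x": -1})  # x + 1 > 0 fails when x = -1
```

Here the weakest precondition is x + 1 > 0, that is x > -1: any stronger condition such as x > 5 also guarantees the postcondition, but x > -1 is the weakest one that does.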
61Denotational Semantics
- Denotational semantics is a method for describing
the meaning of programs.
- It is based on recursive function theory.
- Grammar
- <bin_num> → 0
- | 1
- | <bin_num> 0
- | <bin_num> 1
- Function Mmin
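A meaning function for binary numerals can be written with one case per grammar rule. The sketch below is an assumption about what the slide's function looked like, since its body did not survive as text; it follows the usual textbook definition, and the name m_bin is hypothetical.

```python
def m_bin(numeral):
    """Map a binary numeral (a string of 0s and 1s) to the integer it denotes.

    One case per grammar rule:
      <bin_num> -> 0            denotes 0
      <bin_num> -> 1            denotes 1
      <bin_num> -> <bin_num> 0  denotes 2 * meaning of the prefix
      <bin_num> -> <bin_num> 1  denotes 2 * meaning of the prefix + 1
    """
    if numeral == "0":
        return 0
    if numeral == "1":
        return 1
    if numeral and numeral[-1] in "01":
        return 2 * m_bin(numeral[:-1]) + int(numeral[-1])
    raise ValueError(f"not a binary numeral: {numeral!r}")

value = m_bin("110")
```

The function is defined by recursion on the structure of the numeral, so the meaning of a numeral is built from the meanings of its parts.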
629.6 Tools for Syntax Analysis (YACC)
64Syntax Graphs
- A graph is a collection of nodes, some of which
are connected by lines (edges).
- A directed graph is one in which the lines are
directional.
- A parse tree is a restricted form of directed
graph.
- A syntax graph is a directed graph representing the
information in BNF rules.
659.7 Chomsky Hierarchy
66Turing Machine
67Turing Machine (II)
- Unrestricted grammar
- Recognized by Turing machine
- A Turing machine consists of a read-write head
that can be positioned anywhere along an infinite
tape.
- This class of languages is not useful for compiler
design.
68Linear-Bounded Automata
69Linear-Bounded Automata (II)
- Context-sensitive
- Restrictions
- Left-hand side of each production must have at
least one nonterminal in it
- Right-hand side must not have fewer symbols than
the left
- There can be no empty productions (N → ε)
70Push-Down Automata
71Push-Down Automata (II)
- Context-free
- Recognized by push-down automata
- A push-down automaton can only read its input tape
but has a stack that can grow to arbitrary depth
where it can save information
- An automaton with a read-only tape and two
independent stacks is equivalent to a Turing
machine.
- A context-free grammar allows only a single
nonterminal (and no terminal) on the left-hand
side of each production.
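The role of the stack can be illustrated with a recognizer for the language of n a's followed by n b's (n ≥ 1), which no finite-state machine can recognize. The language and the function name accepts_anbn are illustrative choices, not taken from the slides.

```python
def accepts_anbn(text):
    """Recognize a^n b^n (n >= 1) with a read-only input and one stack."""
    stack = []
    state = "push"  # "push" while reading a's, "pop" once the first b is seen
    for ch in text:
        if state == "push" and ch == "a":
            stack.append("a")  # remember one a
        elif ch == "b" and stack:
            state = "pop"
            stack.pop()        # cancel one remembered a
        else:
            return False       # a after b, b without a matching a, or other symbol
    return state == "pop" and not stack
```

The stack depth grows with the number of a's read, which is exactly the unbounded memory that a finite set of states cannot provide.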
72Finite-State Automata
73Finite-State Automata (II)
- Regular language
- Anything that must be remembered about the
context of a symbol on the input tape must be
preserved in the state of the machine.
- A regular grammar allows only one symbol (a
nonterminal) on the left-hand side, and only one
or two symbols on the right.
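A finite-state automaton keeps all of its memory in the current state. The sketch below is an illustrative example, not from the slides: a machine that accepts binary strings containing an even number of 1s. The names TRANSITIONS and accepts_even_ones are hypothetical.

```python
# Transition function: (current state, input symbol) -> next state
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

def accepts_even_ones(text):
    """Accept strings over {0, 1} that contain an even number of 1s."""
    state = "even"  # start state; also the only accepting state
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # symbol outside the alphabet
            return False
    return state == "even"
```

The only fact remembered about the input read so far is the parity of the 1s, and that fact is carried entirely by the state.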