Title: Grammars for Syntax Definition
1 Grammars for Syntax Definition
- A Context-free Grammar (CFG) Is Used to Describe the Syntactic Structure of a Language
- A CFG Is Characterized By
  1. A Set of Tokens or Terminal Symbols (T)
  2. A Set of Non-terminals (NT)
  3. A Set of Production Rules. Each Rule Has the Form NT → {T, NT}*, i.e., One Non-terminal on the Left and a Sequence of Terminals and Non-terminals on the Right
  4. A Non-terminal Designated As the Start Symbol
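The four parts of the definition map directly onto a small data structure. The sketch below is only an illustration: the class name `Grammar`, the method `is_well_formed`, and the toy grammar `S → a S b | ε` are assumptions made for this example and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # 1. tokens / terminal symbols
    nonterminals: frozenset   # 2. non-terminals
    productions: tuple        # 3. (lhs, rhs) pairs; rhs is a tuple of symbols
    start: str                # 4. start symbol

    def is_well_formed(self) -> bool:
        """Check that every left side is a non-terminal, every right-side
        symbol is known, and the start symbol is a non-terminal."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.productions
            )
        )


# Hypothetical toy grammar: S -> a S b | (empty right side)
toy = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S"}),
    productions=(("S", ("a", "S", "b")), ("S", ())),
    start="S",
)
print(toy.is_well_formed())
```

An empty tuple stands for an empty right-hand side here; that encoding is a choice of this sketch.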
2 Grammars for Syntax Definition: Example CFG
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(the | means OR)
(So we could have written list → list + digit | list - digit | digit)
3 Grammars are Used to Derive Strings
Using the CFG defined on the previous slide, we can derive the string 9 - 5 + 2 as follows:
list ⇒ list + digit            (P1: list → list + digit)
     ⇒ list - digit + digit    (P2: list → list - digit)
     ⇒ digit - digit + digit   (P3: list → digit)
     ⇒ 9 - digit + digit       (P4: digit → 9)
     ⇒ 9 - 5 + digit           (P4: digit → 5)
     ⇒ 9 - 5 + 2               (P4: digit → 2)
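The derivation can be replayed mechanically. This is a minimal sketch, assuming the grammar is list → list + digit | list - digit | digit (with the `+` operator that the string 9 - 5 + 2 requires) and that each production is applied to the leftmost occurrence of its left-hand side; the function name `derive` is made up for the example.

```python
def derive(start, steps):
    """Apply each (lhs, rhs) production to the leftmost occurrence of lhs
    and return every sentential form along the way."""
    form = [start]
    forms = [" ".join(form)]
    for lhs, rhs in steps:
        i = form.index(lhs)          # raises ValueError if lhs is absent
        form[i:i + 1] = rhs.split()  # replace the non-terminal by the rule body
        forms.append(" ".join(form))
    return forms


steps = [
    ("list", "list + digit"),   # P1
    ("list", "list - digit"),   # P2
    ("list", "digit"),          # P3
    ("digit", "9"),             # P4
    ("digit", "5"),             # P4
    ("digit", "2"),             # P4
]
for form in derive("list", steps):
    print(form)
```

The last line printed is `9 - 5 + 2`, the derived token string.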
4 Grammars are Used to Derive Strings
This derivation could also be represented via a Parse Tree:
list ⇒ list + digit ⇒ list - digit + digit ⇒ digit - digit + digit ⇒ 9 - digit + digit ⇒ 9 - 5 + digit ⇒ 9 - 5 + 2
5 A More Complex Grammar
block → begin opt_stmts end
opt_stmts → stmt_list | ε
stmt_list → stmt_list ; stmt | stmt
What is this grammar for? What does ε represent? What kind of production rule is this?
6 Defining a Parse Tree
- More Formally, a Parse Tree for a CFG Has the Following Properties
  - Root Is Labeled With the Start Symbol
  - Leaf Node Is a Token or ε
  - Interior Node (Not Leaf) Is a Non-Terminal
  - If A → x1 x2 … xn, Then A Is an Interior Node; x1 x2 … xn Are Children of A and May Be Non-Terminals or Tokens
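These properties can be checked in code. The sketch below assumes the list/digit grammar from the earlier example slide (with productions list → list + digit | list - digit | digit) and represents a node as a `(label, children)` pair; the helper names `node`, `leaves`, and `is_parse_tree` are invented for illustration, and ε leaves are not handled because that grammar has no ε production.

```python
def node(label, *children):
    """A node is (label, children); a leaf has an empty children list."""
    return (label, list(children))


def leaves(tree):
    """Yield of the tree: leaf labels read left to right."""
    label, children = tree
    if not children:
        return [label]
    out = []
    for child in children:
        out.extend(leaves(child))
    return out


def is_parse_tree(tree, productions, start):
    """Root is the start symbol; each interior node A with children
    x1..xn matches a production A -> x1 ... xn; leaves are tokens."""
    nonterminals = {lhs for lhs, _ in productions}

    def ok(t):
        label, children = t
        if not children:
            return label not in nonterminals
        rhs = tuple(child[0] for child in children)
        return (label, rhs) in productions and all(ok(c) for c in children)

    return tree[0] == start and ok(tree)


PRODUCTIONS = {
    ("list", ("list", "+", "digit")),
    ("list", ("list", "-", "digit")),
    ("list", ("digit",)),
} | {("digit", (d,)) for d in "0123456789"}


def digit(d):
    return node("digit", node(d))


# Parse tree for 9 - 5 + 2
tree = node(
    "list",
    node("list", node("list", digit("9")), node("-"), digit("5")),
    node("+"),
    digit("2"),
)
print(leaves(tree), is_parse_tree(tree, PRODUCTIONS, "list"))
```

Reading the leaves left to right gives back the token string, which is why the tree and the derivation describe the same sentence.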
7 Other Important Concepts: Ambiguity
Two derivations (Parse Trees) for the same token string.
Grammar: string → string + string | string - string | 0 | 1 | … | 9
Why is this a Problem?
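One way to see the problem is to enumerate every parse tree for a token string and compare the values they imply. The sketch below assumes the ambiguous grammar string → string + string | string - string | 0 | … | 9 and writes each tree as a fully parenthesized expression; the function name `readings` is hypothetical.

```python
def readings(tokens):
    """Return (text, value) for every parse of tokens under
    string -> string + string | string - string | 0 | ... | 9."""
    if len(tokens) == 1:
        t = tokens[0]
        return [(t, int(t))] if len(t) == 1 and t.isdigit() else []
    results = []
    # Try every operator position as the root of the parse tree.
    for i in range(1, len(tokens) - 1):
        op = tokens[i]
        if op in ("+", "-"):
            for ltext, lval in readings(tokens[:i]):
                for rtext, rval in readings(tokens[i + 1:]):
                    value = lval + rval if op == "+" else lval - rval
                    results.append((f"({ltext} {op} {rtext})", value))
    return results


for text, value in readings("9 - 5 + 2".split()):
    print(text, "=", value)
```

For 9 - 5 + 2 this finds two trees, one meaning 9 - (5 + 2) = 2 and one meaning (9 - 5) + 2 = 6, so the same token string would have two different meanings.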
8 Other Important Concepts: Associativity of Operators
Left vs. Right
right → letter = right | letter
letter → a | b | c | … | z
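The difference between the two groupings can be made visible by parenthesizing. This is a small sketch, assuming the right-recursive rule is right → letter = right | letter (assignment, which associates to the right) and contrasting it with a left-associative operator such as `-`; `group_left` and `group_right` are names chosen for this example.

```python
from functools import reduce


def group_left(operands, op):
    """((a op b) op c): the grouping produced by a left-recursive rule."""
    return reduce(lambda acc, x: f"({acc} {op} {x})", operands)


def group_right(operands, op):
    """(a op (b op c)): the grouping produced by a right-recursive rule
    such as right -> letter = right | letter."""
    return reduce(lambda acc, x: f"({x} {op} {acc})", reversed(operands))


print(group_left(["9", "5", "2"], "-"))   # left-associative subtraction
print(group_right(["a", "b", "c"], "="))  # right-associative assignment
```

The first call prints `((9 - 5) - 2)` and the second prints `(a = (b = c))`.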
9 Other Important Concepts: Operator Precedence
What does 9 + 5 * 2 mean?
Typically, ( ) > * / > + - is the precedence order.
This can be incorporated into a grammar via rules:
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | ( expr )
digit → 0 | 1 | 2 | 3 | … | 9
Precedence Is Achieved by Using One Non-terminal (expr, term) for Each Precedence Level. The Rules for Each Level Are Left Recursive, so the Operators Associate to the Left.
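The layering of expr, term, and factor can be turned into an evaluator with one function per precedence level. This is a sketch under stated assumptions, not the slides' own code: operands are single digits, the left-recursive rules are implemented as loops (which keeps the left-to-right grouping), and the function name `evaluate` is invented here.

```python
def evaluate(text):
    """Evaluate single-digit arithmetic, one function per precedence level."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():  # factor -> digit | ( expr )
        tok = peek()
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and "0" <= tok <= "9":
            advance()
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():  # term -> term * factor | term / factor | factor
        value = factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():  # expr -> expr + term | expr - term | term
        value = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("9 + 5 * 2"))    # multiplication binds tighter
print(evaluate("(9 + 5) * 2"))  # parentheses override precedence
```

Because `expr` only ever sees complete `term` values, 9 + 5 * 2 evaluates to 19, while the parenthesized form evaluates to 28.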
10 Syntax-Directed Translation
- Associate Attributes With Grammar Rules & Constructs and Translate As Parsing Occurs
- Our Example Uses Infix to Postfix Notation Translation for Expressions
- Translation May Be Defined Inductively As postfix(E), Where E Is an Expression
  If E = e1 op e2 then postfix(E) = postfix(e1) postfix(e2) op
  If E = (e) then postfix(E) = postfix(e)
  If E = x then postfix(E) = x
Examples
  ( 9 - 5 ) + 2 → 9 5 - 2 +
  9 - ( 5 + 2 ) → 9 5 2 + -
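The inductive definition of postfix translates almost line for line into a recursive function. In this sketch an expression is assumed to be either an operand string `x` or a tuple `(e1, op, e2)`; that representation is a choice made for the example, and parentheses need no case of their own because tuple nesting already records the grouping.

```python
def postfix(e):
    """postfix(x) = x; postfix(e1 op e2) = postfix(e1) postfix(e2) op."""
    if isinstance(e, str):
        return e
    e1, op, e2 = e
    return f"{postfix(e1)} {postfix(e2)} {op}"


print(postfix((("9", "-", "5"), "+", "2")))   # (9 - 5) + 2
print(postfix(("9", "-", ("5", "+", "2"))))   # 9 - (5 + 2)
```

The two calls print `9 5 - 2 +` and `9 5 2 + -`.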
11 Syntax-Directed Definition (2 parts)
- Each Production Has a Set of Semantic Rules
- Each Grammar Symbol Has a Set of Attributes
- For the Following Example, String Attribute t is Associated With Each Grammar Symbol, i.e.,
- What is a Derivation for 9 + 5 - 2?
12 Syntax-Directed Definition (2 parts)
- Each Production Rule of the CFG Has a Semantic Rule
- Note: Semantic Rules for expr Use Synthesized Attributes, Which Obtain Their Values From Other Rules.
13 Semantic Rules are Embedded in Parse Tree
- How Do Semantic Rules Work?
- What Type of Tree Traversal is Being Performed?
- How Can We More Closely Associate Semantic Rules With Production Rules?
14 Examples
rest → + term rest
rest → + term { print('+') } rest   (Print '+' After term for postfix translation)
15 Parsing: Top-Down & Predictive
- Top-Down Parsing: Parse tree / derivation of a token string occurs in a top down fashion.
- For Example, Consider
  type → simple | ↑ id | array [ simple ] of type
  simple → integer | char | num dotdot num
Suppose input is: array [ num dotdot num ] of integer
The parse would begin with: type → array [ simple ] of type
16 Top-Down Parse (type = start symbol)
Input (Tokens): array [ num dotdot num ] of integer
17 Top-Down Parse (type = start symbol)
Input: array [ num dotdot num ] of integer
18 Top-Down Process: Recursive Descent or Predictive Parsing
- Parser Operates by Attempting to Match Tokens in the Input Stream
- Utilize both Grammar and Input Below to Motivate Code for Algorithm
  array [ num dotdot num ] of integer
  type → simple | ↑ id | array [ simple ] of type
  simple → integer | char | num dotdot num

procedure match ( t : token );
begin
    if lookahead = t then
        lookahead := nexttoken
    else error
end;
19 Top-Down Algorithm (Continued)
procedure type;
begin
    if lookahead is in { integer, char, num } then
        simple
    else if lookahead = '↑' then begin
        match ( '↑' ); match ( id )
    end
    else if lookahead = array then begin
        match ( array ); match ( '[' ); simple; match ( ']' ); match ( of ); type
    end
    else error
end;

procedure simple;
begin
    if lookahead = integer then
        match ( integer )
    else if lookahead = char then
        match ( char )
    else if lookahead = num then begin
        match ( num ); match ( dotdot ); match ( num )
    end
    else error
end;
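The match, type, and simple procedures can be tried out directly. The following is a minimal Python sketch of the same recursive-descent scheme, assuming the grammar type → simple | ↑ id | array [ simple ] of type and simple → integer | char | num dotdot num. Tokens are plain strings, `^` stands in for the pointer arrow, and the `$` end marker, the class name `Parser`, and the helper `accepts` are assumptions of this example.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumed)
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        """Consume the lookahead if it equals t, otherwise report an error."""
        if self.lookahead == t:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {t!r}, found {self.lookahead!r}")

    def parse_type(self):
        if self.lookahead in ("integer", "char", "num"):
            self.parse_simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.parse_simple()
            self.match("]")
            self.match("of")
            self.parse_type()
        else:
            raise SyntaxError(f"unexpected token {self.lookahead!r}")

    def parse_simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise SyntaxError(f"unexpected token {self.lookahead!r}")


def accepts(tokens):
    """True if the whole token list derives from the start symbol type."""
    parser = Parser(tokens)
    try:
        parser.parse_type()
        parser.match("$")
    except SyntaxError:
        return False
    return True


print(accepts("array [ num dotdot num ] of integer".split()))
```

Each procedure decides which production to use by looking only at the current lookahead token, which is what makes the parser predictive.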
20 Problem with Top Down Parsing
- Left Recursion in CFG May Cause Parser to Loop Forever
- Solution: Algorithm to Remove Left Recursion
  Left recursive:
    expr → expr + term | expr - term | term
    term → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  Left recursion removed:
    expr → term rest
    rest → + term rest | - term rest | ε
    term → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
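The rewrite follows a general pattern: A → A α | β becomes A → β R and R → α R | ε. The sketch below applies that pattern to one non-terminal. It handles immediate left recursion only, assumes right-hand sides are tuples of symbols with `()` standing for ε, and uses the invented name `remove_left_recursion`.

```python
def remove_left_recursion(nt, alternatives, new_nt):
    """Rewrite nt -> nt alpha | beta as nt -> beta new_nt and
    new_nt -> alpha new_nt | epsilon (written as the empty tuple)."""
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: list(alternatives)}   # nothing to do
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],
    }


grammar = remove_left_recursion(
    "expr",
    [("expr", "+", "term"), ("expr", "-", "term"), ("term",)],
    "rest",
)
for lhs, alts in grammar.items():
    print(lhs, "->", " | ".join(" ".join(a) or "ε" for a in alts))
```

Applied to the expr rules, it yields expr → term rest and rest → + term rest | - term rest | ε.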
New Semantic Actions!
rest → + term { print('+') } rest | - term { print('-') } rest | ε
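With the actions embedded in the rest rules, a predictive parser can emit postfix while it parses. This is a sketch assuming expr → term rest, the rest rules with their print actions, and single-digit terms. The action that emits each digit inside `term` is an assumption of this example (the slides show no action for term), output is collected in a list rather than printed, and `infix_to_postfix` is a made-up name.

```python
def infix_to_postfix(text):
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0
    out = []

    def lookahead():
        return tokens[pos] if pos < len(tokens) else None

    def match(t):
        nonlocal pos
        if lookahead() != t:
            raise SyntaxError(f"expected {t!r}, found {lookahead()!r}")
        pos += 1

    def term():  # term -> digit, with an assumed action that emits the digit
        t = lookahead()
        if t is None or not ("0" <= t <= "9"):
            raise SyntaxError(f"unexpected token {t!r}")
        match(t)
        out.append(t)

    def rest():  # rest -> + term {print('+')} rest | - term {print('-')} rest | epsilon
        op = lookahead()
        if op in ("+", "-"):
            match(op)
            term()
            out.append(op)   # the action runs after term and before rest
            rest()
        # otherwise rest -> epsilon: consume nothing

    def expr():  # expr -> term rest
        term()
        rest()

    expr()
    if lookahead() is not None:
        raise SyntaxError(f"unexpected token {lookahead()!r}")
    return " ".join(out)


print(infix_to_postfix("9 - 5 + 2"))
```

The call prints `9 5 - 2 +`: each operator is emitted right after its second operand, exactly where the action sits in the production.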
21 Comparing Grammars with Left Recursion
- Notice Location of Semantic Actions in Tree
- What is Order of Processing?
22 Comparing Grammars without Left Recursion
- Now, Notice Location of Semantic Actions in Tree for Revised Grammar
- What is Order of Processing in this Case?
23 The Lexical Analysis Process: A Graphical Depiction
- lexan ( ), the lexical analyzer, uses getchar ( ) to read a character
- pushes back c using ungetc (c, stdin)
- returns token to caller
- sets global variable tokenval to attribute value
24 The Lexical Analysis Process: Functional Responsibilities
- Input Token String Is Broken Down
- White Space and Comments Are Filtered Out
- Individual Tokens With Associated Values Are Identified
- Symbol Table Is Initialized and Entries Are Constructed for Each Appropriate Token
- Under What Conditions will a Character be Pushed Back?
- Can You Cite Some Examples in Programming Language Statements?
25 Algorithm for Lexical Analyzer
function lexan : integer;
var lexbuf : array [ 0 .. 100 ] of char;
    c : char;
begin
    loop begin
        read a character into c;
        if c is a blank or a tab then
            do nothing
        else if c is a newline then
            lineno := lineno + 1
        else if c is a digit then begin
            set tokenval to the value of this and following digits;
            return NUM
        end
26 Algorithm for Lexical Analyzer
        else if c is a letter then begin
            place c and successive letters and digits into lexbuf;
            p := lookup ( lexbuf );
            if p = 0 then
                p := insert ( lexbuf, ID );
            tokenval := p;
            return the token field of table entry p
        end
        else begin /* token is a single character */
            set tokenval to NONE; /* there is no attribute */
            return integer encoding of character c
        end
    end
end
Note: Insert / Lookup operations occur against the Symbol Table!
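The lexan algorithm can be sketched in Python as follows. This is an illustration under several assumptions: the token codes `NUM`, `ID`, `DONE`, and `NONE` are invented values, the end-of-input token `DONE` is added for convenience, the symbol table is a plain list whose entry 0 is unused so that lookup can return 0 for "absent", and the `getchar`/`ungetc` methods imitate the C functions with a one-character pushback slot.

```python
import io

NUM, ID, DONE, NONE = 256, 257, 258, -1   # hypothetical token codes


class Lexer:
    def __init__(self, text):
        self.stream = io.StringIO(text)
        self.pushed = None        # one character of pushback, like ungetc
        self.lineno = 1
        self.tokenval = NONE
        self.symtable = [None]    # entry 0 unused: lookup returns 0 for "absent"

    def getchar(self):
        if self.pushed is not None:
            c, self.pushed = self.pushed, None
            return c
        return self.stream.read(1)   # "" at end of input

    def ungetc(self, c):
        self.pushed = c

    def lookup(self, lexeme):
        for p in range(1, len(self.symtable)):
            if self.symtable[p][0] == lexeme:
                return p
        return 0

    def insert(self, lexeme, token):
        self.symtable.append((lexeme, token))
        return len(self.symtable) - 1

    def lexan(self):
        while True:
            c = self.getchar()
            if c == "":
                return DONE
            if c in " \t":
                continue                      # skip blanks and tabs
            if c == "\n":
                self.lineno += 1
            elif "0" <= c <= "9":
                value = 0
                while "0" <= c <= "9":
                    value = value * 10 + int(c)
                    c = self.getchar()
                self.ungetc(c)                # read one character too far
                self.tokenval = value
                return NUM
            elif c.isalpha():
                lexbuf = []
                while c.isalnum():
                    lexbuf.append(c)
                    c = self.getchar()
                self.ungetc(c)                # read one character too far
                lexeme = "".join(lexbuf)
                p = self.lookup(lexeme)
                if p == 0:
                    p = self.insert(lexeme, ID)
                self.tokenval = p
                return self.symtable[p][1]
            else:
                self.tokenval = NONE          # single-character token, no attribute
                return ord(c)


lexer = Lexer("count = 12\n+ count")
tokens = []
while (tok := lexer.lexan()) != DONE:
    tokens.append((tok, lexer.tokenval))
print(tokens, lexer.lineno)
```

A character is pushed back whenever the lexer must read one character past the end of a number or identifier to know that the token has ended.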
27 Symbol Table Considerations
OPERATIONS:
  Insert (string, token_ID)
  Lookup (string)
NOTICE: Reserved words are placed into the symbol table for easy lookup.
Attributes may be associated with each entry, i.e.,
  Semantic Actions
  Typing Info: id → integer
  etc.
[Figure: ARRAY symtable with columns lexptr, token, and attributes, rows indexed 0 to 4, with token entries div, mod, id, id; the lexptr fields point into ARRAY lexemes]
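The two-array layout can be sketched as follows. This is a minimal illustration, assuming that each symtable entry stores a `lexptr` index into one shared `lexemes` character array, that lexemes are terminated by an end-of-string marker, and that entry 0 is left unused so lookup can return 0 for "not found". The token codes and the identifier `count` are hypothetical.

```python
EOS = "\0"                       # end-of-string marker inside lexemes
DIV, MOD, ID = 257, 258, 259     # hypothetical token codes

lexemes = []                     # one flat character array shared by all entries
symtable = [None]                # entry 0 unused: lookup() returns 0 for "not found"


def lexeme_at(lexptr):
    """Read the characters from lexptr up to the next EOS marker."""
    end = lexemes.index(EOS, lexptr)
    return "".join(lexemes[lexptr:end])


def lookup(s):
    for p in range(1, len(symtable)):
        if lexeme_at(symtable[p]["lexptr"]) == s:
            return p
    return 0


def insert(s, token):
    lexptr = len(lexemes)
    lexemes.extend(s)
    lexemes.append(EOS)
    symtable.append({"lexptr": lexptr, "token": token, "attributes": {}})
    return len(symtable) - 1


# Reserved words go in first, so the lexer finds them by ordinary lookup.
insert("div", DIV)
insert("mod", MOD)

p = lookup("count") or insert("count", ID)
print(p, symtable[p]["lexptr"], lexeme_at(symtable[p]["lexptr"]))
```

Storing the characters in a separate array lets lexemes of any length share one fixed-size table entry format.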