Title: Chapter 2 Syntax
1Chapter 2 Syntax
2Syntax
- The syntax of a programming language specifies the structure of the language
- The lexical structure specifies how words can be constituted from characters
- The syntactic structure specifies how sentences can be constituted from words
3Lexical Structure
- The tokens of a programming language are its basic grammatical categories, the building blocks of syntax
- A program is viewed as a stream of tokens
4Standard Token Categories
- Keywords, such as if and while
- Literals or constants, such as 42 (a numeric literal) or "hello" (a string literal)
- Special symbols, such as < and other operators and punctuation marks
- Identifiers, such as x24, putchar, or
monthly_balance
5White Spaces and Comments
- White spaces and comments are ignored, except that they function as delimiters
- Typical white spaces: newlines, tabs, spaces
- Comments
- /* ... */, // ... \n (C, C++, Java)
- -- ... \n (Ada, Haskell)
- (* ... *) (Pascal, ML)
- ; ... \n (Scheme)
6C tokens
There are six classes of tokens: identifiers,
keywords, constants, string literals, operators,
and other separators. Blanks, horizontal and
vertical tabs, newlines, formfeeds, and comments
as described below (collectively, "white space")
are ignored except as they separate tokens. Some
white space is required to separate otherwise
adjacent identifiers, keywords, and constants. If
the input stream has been separated into tokens
up to a given character, the next token is the
longest string of characters that could
constitute a token.
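The longest-match rule in the last sentence can be sketched in a few lines of C. This is only an illustration under stated assumptions: next_token is a hypothetical helper, it recognizes just identifiers, integer constants and a handful of operators, and it is not a full C lexer.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copies the next token of src (starting at *pos)
   into out and advances *pos. Returns 0 at end of input, 1 otherwise. */
int next_token(const char *src, size_t *pos, char *out, size_t outsize)
{
    /* Assumed operator subset; a real lexer knows many more. */
    static const char *two[] = { "<=", ">=", "==", "!=", "++", "--", "&&", "||" };
    size_t start, len, i;

    assert(src != NULL && pos != NULL && out != NULL && outsize > 0);

    /* White space only separates tokens, so skip it. */
    while (isspace((unsigned char)src[*pos]))
        (*pos)++;
    if (src[*pos] == '\0')
        return 0;

    start = *pos;
    if (isalpha((unsigned char)src[*pos]) || src[*pos] == '_') {
        /* Identifier or keyword: longest run of letters, digits and '_'. */
        while (isalnum((unsigned char)src[*pos]) || src[*pos] == '_')
            (*pos)++;
    } else if (isdigit((unsigned char)src[*pos])) {
        /* Integer constant: longest run of digits. */
        while (isdigit((unsigned char)src[*pos]))
            (*pos)++;
    } else {
        /* Operator: prefer a two-character operator over a one-character one. */
        (*pos)++;
        for (i = 0; i < sizeof two / sizeof two[0]; i++) {
            if (strncmp(src + start, two[i], 2) == 0) {
                (*pos)++;
                break;
            }
        }
    }

    len = *pos - start;
    if (len >= outsize)
        len = outsize - 1;      /* truncate rather than overflow */
    memcpy(out, src + start, len);
    out[len] = '\0';
    return 1;
}
```

Applied repeatedly to the text x+++y, this sketch yields x, ++, + and y: after x, the longest string that can form a token is ++, not +.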
7An Example
/* This program counts from 1 to 10. */
main( )
{
    int i;
    for (i = 1; i <= 10; i++)
        printf("%d\n", i);
}
8Backus-Naur Form (BNF)
- BNF is a notation widely used in the formal definition of syntactic structure
- A BNF grammar consists of a set of rewriting rules P, a set of terminal symbols Σ, a set of nonterminal symbols N, and a start symbol S ∈ N
- Each rule in P has the form A → ω, where A ∈ N and ω ∈ (N ∪ Σ)*
9Backus-Naur Form
- The terminals in Σ form the basic alphabet (tokens) from which programs are constructed
- The nonterminals in N identify grammatical categories like Identifier, Integer, Expression, Statement, Function, Program
- The start symbol S identifies the principal grammatical category being defined by the grammar
10Examples
1. binaryDigit → 0
   binaryDigit → 1
   or, equivalently, binaryDigit → 0 | 1
2. Integer → Digit | Integer Digit
   Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The metasymbol | means "or"
Writing symbols side by side means "concatenate"
11Derivation
- Integer
- ⇒ Integer Digit
- ⇒ Integer Digit Digit
- ⇒ Digit Digit Digit
- ⇒ 3 Digit Digit
- ⇒ 3 5 Digit
- ⇒ 3 5 2
Every line of the derivation is a sentential form; the last line, which contains only terminals, is a sentence.
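The derivation can also be generated mechanically. The sketch below is one possible way to do it, not part of the original slides: print_derivation is a hypothetical helper that prints the sentential forms for any non-empty digit string, using the rules Integer → Integer Digit and Integer → Digit and then replacing each Digit from left to right.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Prints a derivation of the digit string s from Integer and returns the
   number of sentential forms printed (0 if s is not a non-empty digit string). */
int print_derivation(const char *s)
{
    size_t n, k, i;
    int forms = 0;

    assert(s != NULL);
    n = strlen(s);
    if (n == 0)
        return 0;
    for (i = 0; i < n; i++)
        if (!isdigit((unsigned char)s[i]))
            return 0;

    /* Start symbol, then Integer -> Integer Digit applied n - 1 times. */
    for (k = 0; k < n; k++) {
        printf("%sInteger", k == 0 ? "" : "=> ");
        for (i = 0; i < k; i++)
            printf(" Digit");
        printf("\n");
        forms++;
    }

    /* Integer -> Digit, then each Digit replaced by its terminal, left to right. */
    for (k = 0; k <= n; k++) {
        printf("=>");
        for (i = 0; i < n; i++) {
            if (i < k)
                printf(" %c", s[i]);
            else
                printf(" Digit");
        }
        printf("\n");
        forms++;
    }
    return forms;
}
```

For the input 352 the function prints the same seven sentential forms as the derivation above, ending with the sentence 3 5 2.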
12Parse Tree
[Figure: parse tree, with a sentential form marked]
13Example Expression
Assignment → Identifier = Expression
Expression → Term | Expression + Term | Expression - Term
Term → Factor | Term * Factor | Term / Factor
Factor → Identifier | Literal | ( Expression )
14Example Expression
[Figure: parse tree for x + 2 * y]
15Syntax for a Subset of C
Program → void main ( ) { Declarations Statements }
Declarations → ε | Declarations Declaration
Declaration → Type Identifiers ;
Type → int | boolean
Identifiers → Identifier | Identifiers , Identifier
Statements → ε | Statements Statement
Statement → Block | Assignment | IfStatement | WhileStatement
Block → { Statements }
Assignment → Identifier = Expression ;
IfStatement → if ( Expression ) Statement | if ( Expression ) Statement else Statement
WhileStatement → while ( Expression ) Statement
16Syntax for a Subset of C
Expression → Conjunction | Expression || Conjunction
Conjunction → Relation | Conjunction && Relation
Relation → Addition | Relation < Addition | Relation <= Addition | Relation > Addition | Relation >= Addition | Relation == Addition | Relation != Addition
Addition → Term | Addition + Term | Addition - Term
Term → Negation | Term * Negation | Term / Negation
Negation → Factor | ! Factor
Factor → Identifier | Literal | ( Expression )
17Example Program
void main ( ) { int x; x = 1; }
[Figure: parse tree for this program]
18Ambiguity
- A grammar is ambiguous if it permits a string to be parsed into two or more different parse trees
AmbExp → Integer | AmbExp - AmbExp
Example: 2 - 3 - 4
19An Example
(2 - 3) - 4
2 - (3 - 4)
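The two parse trees matter because they denote different values. A minimal C sketch makes this concrete; the function names are hypothetical, and the last function relies on the fact that C treats binary - as left-associative.

```c
#include <assert.h>

/* The two possible groupings of a - b - c, made explicit with parentheses. */
int left_grouping(int a, int b, int c)  { return (a - b) - c; }
int right_grouping(int a, int b, int c) { return a - (b - c); }

/* C resolves the unparenthesized form by treating - as left-associative. */
int c_grouping(int a, int b, int c)     { return a - b - c; }

/* Self-check for the example 2 - 3 - 4. */
void check_groupings(void)
{
    assert(left_grouping(2, 3, 4) == -5);
    assert(right_grouping(2, 3, 4) == 3);
    assert(c_grouping(2, 3, 4) == left_grouping(2, 3, 4));
}
```

An ambiguous grammar such as AmbExp gives no basis for choosing between -5 and 3; a language definition has to settle the question, either in the grammar or by an associativity rule.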
20The Dangling Else Problem
if ( x < 0 ) if ( y < 0 ) y = y - 1; else y = 0;
21The Dangling Else Problem
if ( x < 0 ) if ( y < 0 ) y = y - 1; else y = 0;
22The Dangling Else Problem
- Solution I: use a special closing keyword to explicitly end every if statement, as Algol 68 does with fi and Ada does with end if. For example:
IfStatement → if ( E ) S fi | if ( E ) S else S fi
- Solution II: use an explicit rule outside the BNF syntax. For example, in C, every else clause is associated with the closest preceding unmatched if in the statement
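The C rule can be observed directly. In the sketch below (function names are hypothetical), the first function is indented as if the else belonged to the outer if, but the compiler pairs it with the inner one; the second uses braces to force the other reading. Compilers may warn about the first function, which is the point of the example.

```c
#include <assert.h>

/* Misleading indentation: the else looks attached to the outer if,
   but C pairs it with the closest preceding unmatched if. */
int dangling(int x, int y)
{
    if (x < 0)
        if (y < 0)
            y = y - 1;
    else                /* actually belongs to "if (y < 0)" */
        y = 0;
    return y;
}

/* Braces make the else belong to the outer if instead. */
int braced(int x, int y)
{
    if (x < 0) {
        if (y < 0)
            y = y - 1;
    } else {
        y = 0;
    }
    return y;
}

/* Self-check showing where the two readings differ. */
void check_dangling(void)
{
    assert(dangling(1, 5) == 5);    /* outer test fails, nothing runs */
    assert(braced(1, 5) == 0);      /* outer test fails, else runs */
    assert(dangling(-1, 5) == 0);   /* inner test fails, else runs */
    assert(braced(-1, 5) == 5);     /* inner test fails, no else */
}
```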
23Extended BNF (EBNF)
- EBNF introduces 3 kinds of brackets
- It uses { } to denote repetition, which simplifies the specification of recursion
- It uses [ ] to denote the optional part
- It uses ( ) for grouping
24An Example
BNF:
Expression → Term | Expression + Term | Expression - Term
Term → Factor | Term * Factor | Term / Factor
Factor → number | - number | + number
EBNF:
Expression → Term { ( + | - ) Term }
Term → Factor { ( * | / ) Factor }
Factor → [ - | + ] number
( ) grouping
{ } zero or more occurrences
[ ] optional
25Abstract Syntax
- The abstract syntax of a language identifies the essential syntactic elements in a program without describing how they are concretely constructed
Pascal: while i < n do begin i := i + 1 end
C: while (i < n) { i = i + 1; }
26Example Loop
- Thinking about a loop abstractly, the essential elements are a test expression for continuing the loop and a body, which is the statement to be repeated
- All other elements constitute nonessential syntactic sugar
- The complete syntax is usually called concrete syntax
27Example Loop
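Pascal: while i < n do begin i := i + 1 end
C: while (i < n) { i = i + 1; }
[Figure: the abstract syntax tree shared by both loops: a loop node whose test is < applied to i and n, and whose body assigns i + 1 to i]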
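One way to make the shared abstract syntax concrete is to build it as a data structure. The sketch below assumes a hypothetical Node type with a label and at most two children, and prints the tree in prefix form; neither the type nor the output notation comes from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node type: a label plus up to two children. */
struct Node {
    const char *label;
    const struct Node *left;
    const struct Node *right;
};

/* Appends s to the string in out, which has room for size bytes. */
static void append(char *out, size_t size, const char *s)
{
    size_t used = strlen(out);
    assert(used + strlen(s) < size);   /* sketch: caller supplies a big enough buffer */
    strcpy(out + used, s);
}

/* Appends the tree t to out in prefix form, e.g. "(< i n)". */
void to_prefix(const struct Node *t, char *out, size_t size)
{
    assert(t != NULL);
    if (t->left == NULL && t->right == NULL) {   /* leaf: identifier or literal */
        append(out, size, t->label);
        return;
    }
    append(out, size, "(");
    append(out, size, t->label);
    if (t->left != NULL) {
        append(out, size, " ");
        to_prefix(t->left, out, size);
    }
    if (t->right != NULL) {
        append(out, size, " ");
        to_prefix(t->right, out, size);
    }
    append(out, size, ")");
}

/* Builds the tree for "repeat i = i + 1 while i < n" and renders it into out. */
void loop_example(char *out, size_t size)
{
    static const struct Node i = { "i", NULL, NULL };
    static const struct Node n = { "n", NULL, NULL };
    static const struct Node one = { "1", NULL, NULL };
    static const struct Node test = { "<", &i, &n };
    static const struct Node sum = { "+", &i, &one };
    static const struct Node assign = { "=", &i, &sum };
    static const struct Node loop = { "loop", &test, &assign };

    assert(out != NULL && size > 0);
    out[0] = '\0';
    to_prefix(&loop, out, size);
}
```

The tree contains no while, do, begin, end, braces, or semicolons: those belong to the concrete syntax of Pascal or C, so both source loops would map to this same structure.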
28Example Expression
[Figure: tree for x + 2 * y]
29Example Expression
[Figure: tree for x + 2 * y]
30Parser
- A parser of a language accepts or rejects strings based on whether they are legal strings in the language
- In a recursive-descent parser, each nonterminal is implemented as a function, and each terminal is implemented as a match against the current token
31Example Calculator
command → expr \n
expr → term { + term }
term → factor { * factor }
factor → number | ( expr )
number → digit { digit }
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
32Example Calculator
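#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>

int token;     /* current input character */
int pos = 0;   /* position of the current character, for error messages */

/* one function per nonterminal */
void command(void);
void expr(void);
void term(void);
void factor(void);
void number(void);
void digit(void);

/* helpers used by the parsing functions */
void getToken(void);
void parse(void);
void match(char c);
void error(void);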
33Example Calculator
int main(void)
{
    parse();
    return 0;
}

void getToken(void)
{
    token = getchar();
    pos++;
    while (token == ' ') {   /* skip blanks */
        token = getchar();
        pos++;
    }
}

void parse(void)
{
    getToken();
    command();
}
34Example Calculator
command → expr \n

void command(void)
{
    expr();
    match('\n');
}

void match(char c)
{
    if (token == c)
        getToken();
    else
        error();
}
35Example Calculator
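expr → term { + term }
term → factor { * factor }

void term(void)
{
    factor();
    while (token == '*') {
        match('*');
        factor();   /* one factor per repetition, as in term → factor { * factor } */
    }
}

void expr(void)
{
    term();
    while (token == '+') {
        match('+');
        term();
    }
}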
36Example Calculator
factor → number | ( expr )
number → digit { digit }

void factor(void)
{
    if (token == '(') {
        match('(');
        expr();
        match(')');
    } else {
        number();
    }
}

void number(void)
{
    digit();
    while (isdigit(token))
        digit();
}
37Example Calculator
void digit(void)
{
    if (isdigit(token))
        match(token);
    else
        error();
}

void error(void)
{
    printf("parse error: position %d: character %c\n", pos, token);
    exit(1);
}
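The calculator above reads standard input and exits on the first error, which makes it awkward to try on several inputs. The variant below is a sketch, not the slides' code: it recognizes the same grammar but reads from a string and returns accept or reject. The names input, ok, peek and accepts are hypothetical, and unlike the original it does not allow blanks between the digits of a number.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *input;   /* remaining input */
static int ok;              /* cleared on the first syntax error */

static void expr(void);

/* Skips blanks and returns the current character without consuming it. */
static int peek(void)
{
    while (*input == ' ')
        input++;
    return (unsigned char)*input;
}

/* Terminal: consume c if it is the current character, else record an error. */
static void match(int c)
{
    if (ok && peek() == c)
        input++;
    else
        ok = 0;
}

/* number -> digit { digit } */
static void number(void)
{
    if (!isdigit(peek())) {
        ok = 0;
        return;
    }
    while (isdigit((unsigned char)*input))
        input++;
}

/* factor -> number | ( expr ) */
static void factor(void)
{
    if (!ok)
        return;
    if (peek() == '(') {
        match('(');
        expr();
        match(')');
    } else {
        number();
    }
}

/* term -> factor { * factor } */
static void term(void)
{
    factor();
    while (ok && peek() == '*') {
        match('*');
        factor();
    }
}

/* expr -> term { + term } */
static void expr(void)
{
    term();
    while (ok && peek() == '+') {
        match('+');
        term();
    }
}

/* command -> expr \n : returns 1 if line is accepted, 0 otherwise. */
int accepts(const char *line)
{
    assert(line != NULL);
    input = line;
    ok = 1;
    expr();
    match('\n');
    return ok && *input == '\0';
}
```

With this version, accepts("(1+2)*3\n") returns 1, while accepts("1+\n") and accepts("2*(3+4\n") return 0. Recording the error in a flag instead of calling exit lets the caller decide how to report it.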