Title: Syntax
Syntax
- The syntax of a programming language specifies the structure of the language
- The lexical structure specifies how words can be constituted from characters
- The syntactic structure specifies how sentences can be constituted from words
Lexical Structure
- The tokens of a programming language consist of the set of all grammatical categories that are the building blocks of syntax
- A program is viewed as a stream of tokens
Standard Token Categories
- Keywords, such as if and while
- Literals or constants, such as 42 (a numeric literal) or "hello" (a string literal)
- Special symbols, such as ;, <=, or +
- Identifiers, such as x24, putchar, or monthly_balance
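The four categories above can be illustrated with a small classifier. This is only a sketch: the keyword set, the list of special symbols, and the function name classify are assumptions chosen for the example, not the lexical rules of any particular language.

```python
import re

# Hypothetical keyword set and symbol list, chosen only for this illustration.
KEYWORDS = {"if", "while", "else", "int"}

# Earlier alternatives win when several patterns match at the same position.
TOKEN_SPEC = [
    ("literal",    r'\d+|"[^"\n]*"'),
    ("identifier", r"[A-Za-z_]\w*"),
    ("special",    r"<=|>=|==|!=|[-+*/=<>;,(){}]"),
    ("skip",       r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def classify(source):
    """Return a list of (category, text) pairs for the tokens in source."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue  # white space only separates tokens
        if kind == "identifier" and text in KEYWORDS:
            kind = "keyword"  # keywords look like identifiers but are reserved
        tokens.append((kind, text))
    return tokens
```

For instance, classify('if (x24 <= 42)') labels if as a keyword, x24 as an identifier, <= as a special symbol, and 42 as a literal.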
White Spaces and Comments
- White spaces and comments are ignored, except where they function as delimiters
- Typical white spaces: newlines, tabs, spaces
- Comments
  - /* ... */, // ... \n (C, C++, Java)
  - -- ... \n (Ada, Haskell)
  - (* ... *) (Pascal, ML)
  - ; ... \n (Scheme)
C Tokens
There are six classes of tokens: identifiers,
keywords, constants, string literals, operators,
and other separators. Blanks, horizontal and
vertical tabs, newlines, formfeeds, and comments
as described below (collectively, "white space")
are ignored except as they separate tokens. Some
white space is required to separate otherwise
adjacent identifiers, keywords, and constants. If
the input stream has been separated into tokens
up to a given character, the next token is the
longest string of characters that could
constitute a token.
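The "longest string" rule in the last sentence can be sketched as follows. The operator list is a small hypothetical subset picked for the illustration, not the full C operator set, and the helper names are likewise assumptions.

```python
# Hypothetical subset of operators; real C has many more.
OPERATORS = ["++", "+=", "+", "--", "-=", "-", "<=", "<<", "<", "==", "="]

def next_token(text, pos):
    """Return the longest token that starts at text[pos]."""
    ch = text[pos]
    if ch.isalpha() or ch == "_":
        end = pos
        while end < len(text) and (text[end].isalnum() or text[end] == "_"):
            end += 1
        return text[pos:end]
    if ch.isdigit():
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return text[pos:end]
    candidates = [op for op in OPERATORS if text.startswith(op, pos)]
    if not candidates:
        raise ValueError(f"unexpected character {ch!r} at {pos}")
    return max(candidates, key=len)  # prefer the longest matching operator

def tokenize(text):
    """Split text into tokens, skipping white space between them."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        tok = next_token(text, pos)
        tokens.append(tok)
        pos += len(tok)
    return tokens
```

Under this rule the input i+++j is split as i, ++, +, j, because ++ is the longest token available once i has been consumed.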
An Example
    /* This program counts from 1 to 10. */
    #include <stdio.h>

    int main( )
    {
        int i;
        for (i = 1; i <= 10; i++)
            printf("%d\n", i);
        return 0;
    }
Backus-Naur Form (BNF)
- BNF is a notation widely used in the formal definition of syntactic structure
- A BNF grammar consists of a set of rewriting rules P, a set of terminal symbols Σ, a set of nonterminal symbols N, and a start symbol S ∈ N
- Each rule in P has the following form:
    A → ω
  where A ∈ N and ω ∈ (N ∪ Σ)*
Backus-Naur Form
- The terminals in Σ form the basic alphabet from which programs are constructed
- The nonterminals in N identify grammatical categories like Identifier, Integer, Expression, Statement, Function, Program
- The start symbol S identifies the principal grammatical category being defined by the grammar
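The four components of a BNF grammar can be written down directly as a data structure. The sketch below is one possible encoding: the class name Grammar, its field names, and the check method are assumptions made for the illustration. The instance describes a grammar for unsigned integers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # the terminal alphabet
    rules: tuple              # pairs (A, rhs), rhs being a tuple of symbols
    start: str                # S

    def check(self):
        """Return True when the components fit the BNF definition."""
        if self.start not in self.nonterminals:
            return False
        # Conventional extra requirement: no symbol is both terminal and nonterminal.
        if self.nonterminals & self.terminals:
            return False
        symbols = self.nonterminals | self.terminals
        return all(
            lhs in self.nonterminals and all(sym in symbols for sym in rhs)
            for lhs, rhs in self.rules
        )

DIGITS = "0123456789"
integer_grammar = Grammar(
    nonterminals=frozenset({"Integer", "Digit"}),
    terminals=frozenset(DIGITS),
    rules=(
        ("Integer", ("Digit",)),
        ("Integer", ("Integer", "Digit")),
    ) + tuple(("Digit", (d,)) for d in DIGITS),
    start="Integer",
)
```

Keeping the rules as plain (left-hand side, right-hand side) pairs mirrors the form of a rewriting rule, with the left side a single nonterminal and the right side a string over terminals and nonterminals.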
Examples
1. binaryDigit → 0
   binaryDigit → 1
   binaryDigit → 0 | 1
2. Integer → Digit | Integer Digit
   Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- The metasymbol | means "or"
- Juxtaposition, as in Integer Digit, is the metasymbol for concatenation
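Derivation
    Integer
    ⇒ Integer Digit
    ⇒ Integer Digit Digit
    ⇒ Digit Digit Digit
    ⇒ 3 Digit Digit
    ⇒ 3 5 Digit
    ⇒ 3 5 2
- Each string in the derivation, such as Integer Digit Digit, is a sentential form
- The final string, 3 5 2, contains only terminals and is a sentence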
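The derivation of 3 5 2 can be replayed mechanically. The following is a minimal sketch, assuming the Integer/Digit rules given in the examples; the names RULES, apply_rule, derive, and is_sentence are hypothetical helpers introduced for this illustration.

```python
# Right-hand sides for each nonterminal of the Integer grammar.
RULES = {
    "Integer": [["Digit"], ["Integer", "Digit"]],
    "Digit": [[d] for d in "0123456789"],
}

def apply_rule(form, index, rhs):
    """Rewrite the nonterminal at form[index] using the right-hand side rhs."""
    lhs = form[index]
    if rhs not in RULES.get(lhs, []):
        raise ValueError(f"no rule {lhs} -> {' '.join(rhs)}")
    return form[:index] + rhs + form[index + 1:]

def derive(start, steps):
    """Return every sentential form produced by applying steps in order."""
    forms = [[start]]
    for index, rhs in steps:
        forms.append(apply_rule(forms[-1], index, rhs))
    return forms

def is_sentence(form):
    """A sentence is a sentential form that contains no nonterminals."""
    return all(sym not in RULES for sym in form)

# Each step names the position to rewrite and the right-hand side to use.
STEPS = [
    (0, ["Integer", "Digit"]),
    (0, ["Integer", "Digit"]),
    (0, ["Digit"]),
    (0, ["3"]),
    (1, ["5"]),
    (2, ["2"]),
]
FORMS = derive("Integer", STEPS)
```

Every element of FORMS is a sentential form; only the last one passes is_sentence.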
Parse Tree
[Figure: parse tree, with a sentential form labeled]
An Example for an Expression
    Assignment → Identifier = Expression
    Expression → Term | Expression + Term | Expression - Term
    Term → Factor | Term * Factor | Term / Factor
    Factor → Identifier | Literal | ( Expression )
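A parser for the Expression, Term, and Factor rules can be sketched by hand. This is only one possible approach: the left-recursive rules are handled with loops that still build left-leaning trees, the tokenizer and the tuple-based tree format are assumptions made for the illustration, and the Assignment rule is left out to keep the sketch short.

```python
import re

def tokenize(text):
    """Split text into identifiers, integer literals, and operator symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expression(self):
        # Expression -> Term | Expression + Term | Expression - Term
        tree = ("Expression", self.term())
        while self.peek() in ("+", "-"):
            op = self.advance()
            tree = ("Expression", tree, op, self.term())
        return tree

    def term(self):
        # Term -> Factor | Term * Factor | Term / Factor
        tree = ("Term", self.factor())
        while self.peek() in ("*", "/"):
            op = self.advance()
            tree = ("Term", tree, op, self.factor())
        return tree

    def factor(self):
        # Factor -> Identifier | Literal | ( Expression )
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            self.advance()
            inner = self.expression()
            if self.peek() != ")":
                raise SyntaxError("expected )")
            self.advance()
            return ("Factor", "(", inner, ")")
        if tok[0].isdigit():
            return ("Factor", ("Literal", self.advance()))
        if tok[0].isalpha() or tok[0] == "_":
            return ("Factor", ("Identifier", self.advance()))
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    """Return the parse tree of text as nested tuples headed by rule names."""
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Because Term sits below Expression in the grammar, an input such as x + 2 * y (an input chosen for this sketch) groups 2 * y under a single Term node, which is how the grammar encodes the precedence of * over +.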
An Example for an Expression
    x + 2 * y
Syntax for a Subset of C
    Program → void main ( ) { Declarations Statements }
    Declarations → ε | Declarations Declaration
    Declaration → Type Identifiers ;
    Type → int | boolean
    Identifiers → Identifier | Identifiers , Identifier
    Statements → ε | Statements Statement
    Statement → Block | Assignment | IfStatement | WhileStatement
    Block → { Statements }
    Assignment → Identifier = Expression ;
    IfStatement → if ( Expression ) Statement
                | if ( Expression ) Statement else Statement
    WhileStatement → while ( Expression ) Statement
Syntax for a Subset of C
    Expression → Conjunction | Expression || Conjunction
    Conjunction → Relation | Conjunction && Relation
    Relation → Addition
             | Relation < Addition
             | Relation <= Addition
             | Relation > Addition
             | Relation >= Addition
             | Relation == Addition
             | Relation != Addition
    Addition → Term | Addition + Term | Addition - Term
    Term → Negation | Term * Negation | Term / Negation
    Negation → Factor | ! Factor
    Factor → Identifier | Literal | ( Expression )
An Example for a Program
    void main ( ) { int x; x = 1; }
Ambiguity
- A grammar is ambiguous if it permits a string to be parsed into two or more different parse trees
    AmbExp → Integer | AmbExp - AmbExp
    2 - 3 - 4
An Example
    2 - (3 - 4)
    (2 - 3) - 4
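The ambiguity can be made concrete by enumerating every parse tree that the rule AmbExp → Integer | AmbExp - AmbExp allows for a token list. This is a brute-force sketch for illustration only; the function names and the tuple representation of trees are assumptions.

```python
def parse_trees(tokens):
    """Return all parse trees of tokens under AmbExp -> Integer | AmbExp - AmbExp."""
    if len(tokens) == 1 and tokens[0].isdigit():
        return [int(tokens[0])]          # AmbExp -> Integer
    trees = []
    for i, tok in enumerate(tokens):
        if tok == "-":                   # try every - as the top-level operator
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, "-", right))
    return trees

def evaluate(tree):
    """Compute the value a parse tree denotes."""
    if isinstance(tree, int):
        return tree
    left, _, right = tree
    return evaluate(left) - evaluate(right)

TREES = parse_trees("2 - 3 - 4".split())
```

TREES holds two trees, one for each grouping, and they evaluate to different values (3 and -5), which is why the ambiguity matters in practice.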
The Dangling Else Problem
    if ( x < 0 ) if ( y < 0 ) y = y - 1; else y = 0;
[Figures: the two possible parse trees for this statement, one attaching the else to the inner if and one attaching it to the outer if]
The Dangling Else Problem
- Solution I: use a special closing keyword, such as fi, to explicitly close every if statement (Algol 68 uses fi; Ada uses end if). For example:
    IfStatement → if ( Expression ) Statement fi
                | if ( Expression ) Statement else Statement fi
- Solution II: use an explicit rule outside the BNF syntax. For example, in C, every else clause is associated with the closest preceding unmatched if statement
Extended BNF (EBNF)
- EBNF introduces three kinds of brackets. It uses { } to denote repetition (zero or more occurrences), which simplifies the specification of recursion; [ ] to denote an optional part; and ( ) for grouping
    Expression → Term { ( + | - ) Term }
    Term → Factor [ ( * | / ) Term ]
    Factor → - Factor | number
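EBNF brackets map closely onto control structures in a hand-written parser: repetition becomes a loop, an optional part becomes an if, and a group of alternatives becomes a membership test. The sketch below assumes the grammar Expression → Term { ( + | - ) Term }, Term → Factor [ ( * | / ) Term ], Factor → - Factor | number; the tokenizer, the function names, and the tuple-based tree format are all assumptions made for the illustration.

```python
import re

def tokenize(text):
    """Extract numbers and operators; any other character is silently skipped."""
    return re.findall(r"\d+|[-+*/]", text)

def parse_expression(tokens, pos=0):
    # Expression -> Term { ( + | - ) Term }
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):   # { } becomes a loop
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)
    return node, pos

def parse_term(tokens, pos):
    # Term -> Factor [ ( * | / ) Term ]
    node, pos = parse_factor(tokens, pos)
    if pos < len(tokens) and tokens[pos] in ("*", "/"):      # [ ] becomes an if
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)
    return node, pos

def parse_factor(tokens, pos):
    # Factor -> - Factor | number
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "-":
        operand, pos = parse_factor(tokens, pos + 1)
        return ("neg", operand), pos
    if tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")

def parse(text):
    """Parse a whole expression and return its tree."""
    tokens = tokenize(text)
    tree, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

Note that the loop in parse_expression groups operators to the left, while the recursive call in parse_term groups them to the right; that difference follows directly from using repetition in one rule and recursion in the other.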
Abstract Syntax
- The abstract syntax of a language identifies the essential syntactic elements in a program without describing how they are concretely constructed
    Pascal: while i < n do begin i := i + 1 end
    C:      while (i < n) { i = i + 1; }
Abstract Syntax
- Thinking of a loop abstractly, the essential elements are a test expression for continuing the loop and a body, which is the statement to be repeated
- All other elements constitute nonessential syntactic sugar
- The complete syntax is usually called the concrete syntax
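The split between abstract and concrete syntax can be sketched by storing only the essential elements of a loop and generating the sugar on output. The class names Loop and Assignment, the two printing functions, and the sample loop are assumptions made for this illustration; expressions are kept as plain strings to keep it short.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assignment:
    target: str   # variable being assigned
    source: str   # expression text, kept as a string for brevity

@dataclass(frozen=True)
class Loop:
    test: str          # expression that decides whether the loop continues
    body: Assignment   # statement to be repeated

def to_pascal(loop):
    """Render the loop with Pascal's concrete syntax."""
    return f"while {loop.test} do begin {loop.body.target} := {loop.body.source} end"

def to_c(loop):
    """Render the same loop with C's concrete syntax."""
    return f"while ({loop.test}) {{ {loop.body.target} = {loop.body.source}; }}"

LOOP = Loop(test="i < n", body=Assignment(target="i", source="i + 1"))
```

One Loop value yields both concrete forms, so keywords such as do, begin, and end, and punctuation such as parentheses and braces, appear only in the printing functions and never in the abstract structure.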
Parse Trees
    x + 2 * y
Abstract Syntax Trees
    x + 2 * y
[Figure: abstract syntax tree with leaves x, 2, and y]
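An abstract syntax tree can be obtained from a parse tree by dropping the nodes that carry no information of their own. The sketch below assumes parse trees written as nested tuples headed by a nonterminal name, a representation chosen for this illustration; the sample tree is written out by hand for an assumed input x + 2 * y.

```python
def to_ast(node):
    """Collapse a parse tree into an abstract syntax tree."""
    label, *children = node
    if label in ("Identifier", "Literal"):
        return children[0]                  # keep only the token text
    if len(children) == 1:
        return to_ast(children[0])          # chain rule such as Expression -> Term
    if children[0] == "(":
        return to_ast(children[1])          # parentheses are syntactic sugar
    left, op, right = children              # binary rule such as Term -> Term * Factor
    return (op, to_ast(left), to_ast(right))

# Hand-written parse tree for the assumed input x + 2 * y.
PARSE_TREE = (
    "Expression",
    ("Expression", ("Term", ("Factor", ("Identifier", "x")))),
    "+",
    ("Term",
     ("Term", ("Factor", ("Literal", "2"))),
     "*",
     ("Factor", ("Identifier", "y"))),
)
AST = to_ast(PARSE_TREE)
```

The resulting tree keeps the operators and operands only; the Expression, Term, and Factor chain nodes disappear.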