Title: Syntax
1Syntax
- Juan Carlos Guzmán
- CS 3123 Programming Languages Concepts
- Southern Polytechnic State University
2What does your DOS computer do when ?
- > copy a.txt b.txt
- > copy a.txt a.txt
- > del .
- > del 01.
- > type a.txt > null
- > type a.txt > nul
3- How do we know the meaning of our commands?
4Semiotic
- Synthesized from Merriam-Webster (m-w.com)
- a general philosophical theory of signs and symbols that deals especially with their function in both artificially constructed and natural languages and comprises
  - syntactics: the formal relations between signs or expressions in abstraction from their signification and their interpreters
  - semantics: the relations between signs and what they refer to
  - pragmatics: the relation between signs or linguistic expressions and their users
5Syntax
- Two levels
- The language level, properly known as parsing
- The lexeme level, known as lexing
- More information about this topic can be found in
  - Aho, Sethi, Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1988. (on reserve, the Dragon book)
6Lexing
- Specification of the lexemes of the language
- A class of lexemes is known as a token
- Tokens are specified in regular expressions
- letter, empty string
- concatenation
- choice
- closure
- Many convenient extensions
- Recognized by Finite Automata
- Limited in power: cannot count, cannot recognize aⁿbⁿ
7Sample Regular Expressions
- digit = (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)
- ldigit = (1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)
- natural = ldigit digit*
- integer = (- | ε) (natural | 0)
- How about floating points?
- W/o exponents
- add the exponents
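The definitions on this slide can be tried out with Python's `re` module. The sketch below mirrors digit, ldigit, natural and integer; the two floating-point patterns are only one possible answer to the slide's question, and all the names (`DIGIT`, `FLOAT_NO_EXP`, `matches`, and so on) are illustrative assumptions rather than part of the course material.

```python
import re

# Building blocks mirroring the slide's definitions.
DIGIT = r"[0-9]"
LDIGIT = r"[1-9]"                       # leading digit, so no leading zeros
NATURAL = rf"{LDIGIT}{DIGIT}*"          # natural = ldigit digit*
INTEGER = rf"-?(?:{NATURAL}|0)"         # integer = (- | empty) (natural | 0)

# Assumed answers to "How about floating points?" (not from the slides).
FLOAT_NO_EXP = rf"{INTEGER}\.{DIGIT}+"            # without exponents
FLOAT = rf"{FLOAT_NO_EXP}(?:[eE]{INTEGER})?"      # with an optional exponent


def matches(pattern: str, text: str) -> bool:
    """True when the whole text is a lexeme of the token described by pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    for lexeme in ["0", "-42", "007", "3.14", "3.14e-2"]:
        print(lexeme, matches(INTEGER, lexeme), matches(FLOAT, lexeme))
```

Note that, read literally, the slide's integer definition accepts "-0", and so does this pattern.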
8Parsing
- Specification of the language structure
- The parser
- recognizes the phrase, and
- reconstructs its structure (parse tree)
9Context-Free Grammars
- Generate Context-Free Languages
- Allow recursion
- Are specified as G = (N, T, P, S) where
  - N is the set of non-terminals, or variables
  - T is the alphabet (the set of terminals)
  - P is the production set
  - S is the starting symbol for every phrase
10CFG (Example)
- G1 = ({S, A, B}, {a, b}, P, S)
  - where P = { S → ASB, S → BSA, S → ε, A → a, B → b }
- G2 = ({E}, {a, +, *, (, )}, P, E)
  - where P = { E → E+E, E → E*E, E → a, E → (E) }
11Grammars (conventions)
- The empty string: ε
- First uppercase letters of the alphabet (A, B, C, …) → non-terminal
- First lowercase letters of the alphabet (a, b, c, …), or numbers (1, 2, …) → terminal
- First lowercase Greek letters (α, β, γ, …) → string of terminals and non-terminals
- Last lowercase letters of the alphabet (t, u, v, …) → string of terminals
12Derivation
- How do we generate phrases in the language?
- By using a derivation
- αAβ ⇒ αγβ iff A → γ ∈ P
- E ⇒ E+E ⇒ E+E*E ⇒ a+E*E ⇒ a+E*a ⇒ a+a*a
13The Language Generated
- The language generated by the grammar is composed of all strings of terminals that can be derived from S by applying production rules one or more times
- Anything derived from S is called a sentential form
14Derivations
- Leftmost derivation: the leftmost non-terminal is always the one replaced
  - E ⇒ E*E ⇒ E+E*E ⇒ a+E*E ⇒ a+a*E ⇒ a+a*a
- Rightmost derivation: the rightmost non-terminal is always the one replaced
  - E ⇒ E+E ⇒ E+E*E ⇒ E+E*a ⇒ E+a*a ⇒ a+a*a
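Leftmost and rightmost derivations can be replayed mechanically. The following is a minimal sketch, assuming the ambiguous expression grammar E → E+E | E*E | a | (E) with + and * as its operators; the helper `derive` and the two step lists are hypothetical names chosen for illustration.

```python
NONTERMINALS = {"E"}


def derive(start, steps, leftmost=True):
    """Apply each (lhs, rhs) production to the leftmost (or rightmost)
    nonterminal and return the list of sentential forms produced."""
    form = [start]
    history = [" ".join(form)]
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"expected {lhs} at position {i}, found {form[i]}")
        form[i:i + 1] = rhs          # replace the nonterminal by the right-hand side
        history.append(" ".join(form))
    return history


TO_A = ("E", ["a"])
LEFTMOST_STEPS = [("E", ["E", "*", "E"]), ("E", ["E", "+", "E"]), TO_A, TO_A, TO_A]
RIGHTMOST_STEPS = [("E", ["E", "+", "E"]), ("E", ["E", "*", "E"]), TO_A, TO_A, TO_A]

if __name__ == "__main__":
    print(" => ".join(derive("E", LEFTMOST_STEPS, leftmost=True)))
    print(" => ".join(derive("E", RIGHTMOST_STEPS, leftmost=False)))
```

Both runs end in the same phrase, but they pick different productions first, so the two derivations correspond to two different parse trees. That is the ambiguity discussed on the following slides.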
15Parse Tree
- A structured sequence of derivations
- Visually appealing
- From previous example
16Ambiguous Grammar
- Two different parse trees for a single phrase
- Just one phrase with two trees is proof of ambiguity
- Not ambiguous? All phrases must have only one parse tree!
- An ambiguous grammar is quite different from an inherently ambiguous language
17Grammars vs. Languages
- A language is a set
- A grammar is a medium by which the set can be formally specified
- Many grammars specify the same set
18An Expression Grammar
- The grammar for expressions presented before was ambiguous
- Non-ambiguous, with correct precedence (relative priority given to + and *)
  - E → E + T | T
  - T → T * F | F
  - F → a | ( E )
[Figure: parse tree of a + a * a under this grammar]
19Parsing Styles
- Top-down: to derive w from S, start from S, derive until w is obtained
- Bottom-up: to derive w from S, try doing reverse derivations from w until S is obtained
20Parsing Styles
- Top-down LL(k)
- Easy to implement and understand
- hand-coded
- table-driven
- Limited use, many problems
- Bottom-up LR(k)
- More difficult to understand
- table driven
- A nice trade-off between complexity and generality
21An Expression Grammar
- G = ({E, T, F}, {a, +, *, (, )}, P, E)
- where P =
  - E → T + E | T,
  - T → F * T | F,
  - F → a | ( E )
- Is a + a * a in L(G)?
[Figure: parse tree of a + a * a under this grammar]
23A Grammar for a Small Language
- <program> → begin <stmt_list> end
- <stmt_list> → <stmt>
  - | <stmt> ; <stmt_list>
- <stmt> → <var> = <expression>
- <var> → A | B | C
- <expression> → <var> + <var>
  - | <var> - <var>
  - | <var>
24Predictive Parsing
- How many characters of look-ahead are needed to predict the next production to take?
- Is this a finite number?
- Is it 1?
25Another Expression Grammar
- G = ({E, E', T, T', F}, {a, +, *, (, )}, P, E)
- where P =
  - E → T E',
  - E' → + T E' | ε,
  - T → F T',
  - T' → * F T' | ε,
  - F → a | ( E )
- Is a + a * a in L(G)?
[Figure: parse tree of a + a * a under this grammar, with ε leaves under T' and E']
26LL(1) Parsing Table
27LL(1) Algorithm
- Parse(a1 … an, X1 … Xm), where a1 … an is the remaining input and X1 … Xm is the stack (X1 on top)
  - if the input and the stack are both exhausted (a1 and X1 are both the end marker)
    - accept
  - else if X1 is a terminal and (X1 = a1)
    - Parse(a2 … an, X2 … Xm) // match
  - else if Table[X1, a1] = X1 → Y1 … Yk
    - Parse(a1 … an, Y1 … Yk X2 … Xm) // derive
  - else
    - fail
- Call initially with Parse(w, S), where w is the phrase to parse and S is the starting symbol of the grammar
- ai is a terminal; Xj and Yk are terminals or nonterminals
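The recursive Parse procedure can be written as a loop over an explicit stack. The sketch below is an illustration under stated assumptions, not the course's implementation: it uses the grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → a | ( E ), a hand-written `TABLE` for it (the slide's own table did not survive in this transcript), and "$" as an assumed end marker for both input and stack.

```python
END = "$"  # assumed end marker for input and stack

NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Hypothetical LL(1) table: (nonterminal, look-ahead) -> right-hand side.
# An empty list stands for an ε production.
TABLE = {
    ("E", "a"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", END): [],
    ("T", "a"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", END): [],
    ("F", "a"): ["a"], ("F", "("): ["(", "E", ")"],
}


def parse(tokens, start="E"):
    """Return the list of operations performed, or raise SyntaxError."""
    inp = list(tokens) + [END]
    stack = [start, END]            # stack[0] is the top of the stack
    ops = []
    while True:
        a, x = inp[0], stack[0]
        if a == END and x == END:
            ops.append("accept")
            return ops
        if x not in NONTERMINALS and x == a:
            inp.pop(0)              # match: consume one terminal on both sides
            stack.pop(0)
            ops.append("match")
        elif (x, a) in TABLE:
            stack[0:1] = TABLE[(x, a)]   # derive: expand the top nonterminal
            ops.append("derive")
        else:
            raise SyntaxError(f"unexpected {a!r} with {x!r} on top of the stack")


if __name__ == "__main__":
    print(parse("a+a*a"))
```

Each character of the argument is treated as one token, which is enough for this single-letter alphabet; a real front end would take its tokens from a lexer.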
28Parser Operation on a + a * a
Step  Input      Stack       Operation  Sentential form
1     a + a * a  E           derive     E
2     a + a * a  T E'        derive     T E'
3     a + a * a  F T' E'     derive     F T' E'
4     a + a * a  a T' E'     match      a T' E'
5     + a * a    T' E'       derive     a T' E'
6     + a * a    E'          derive     a E'
7     + a * a    + T E'      match      a + T E'
8     a * a      T E'        derive     a + T E'
9     a * a      F T' E'     derive     a + F T' E'
10    a * a      a T' E'     match      a + a T' E'
11    * a        T' E'       derive     a + a T' E'
12    * a        * F T' E'   match      a + a * F T' E'
13    a          F T' E'     derive     a + a * F T' E'
14    a          a T' E'     match      a + a * a T' E'
15    (empty)    T' E'       derive     a + a * a T' E'
16    (empty)    E'          derive     a + a * a E'
17    (empty)    (empty)     accept     a + a * a
29Note how the leftmost derivation of a + a * a is done
- Sentential forms: E ⇒ T E' ⇒ F T' E' ⇒ a T' E' ⇒ a E' ⇒ a + T E' ⇒ a + F T' E' ⇒ a + a T' E' ⇒ a + a * F T' E' ⇒ a + a * a T' E' ⇒ a + a * a E' ⇒ a + a * a
[Figure: parse tree of a + a * a, with ε leaves under T' and E']
30What's the Table Lookup
- Note that the predictive nature of the parser guarantees the uniqueness of the entry for Table[A, b] (or no entry at all)
- When attempting to derive nonterminal A, the look-ahead b must give the correct rule to apply
- This b can be
  - the initial character of the derivation of A, i.e., A ⇒* bβ,
  - or, it can be the initial character of the derivation of what follows A! (A ⇒* ε)
31First Sets
- first(α) is the set of one-character prefixes of strings of terminals that can be derived from α
- If the empty string can be derived from α, then it will also be in the set
  - if α ⇒* aw then a ∈ first(α)
  - if α ⇒* ε then ε ∈ first(α)
32First Sets (II)
- first(ε) = {ε}
- first(a) = {a}
- first(A) = first(α1) ∪ … ∪ first(αn)
  - if A → α1 ∈ P, …, A → αn ∈ P
- first(Xβ) = first(X) ⊕ first(β)
  - where X is either terminal or nonterminal, and ⊕ is bounded concatenation
33Bounded Concatenation
- In computing first(Xβ), our interest is to obtain one-character prefixes (or ε)
- Consider the operation at the char level
  - ε ⊕ γ = γ, where γ is either ε or a terminal
  - a ⊕ γ = a
- Generalize it to work on sets
  - A ⊕ B = { v ⊕ w | v ∈ A, w ∈ B }, where A and B are sets
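The first-set equations and bounded concatenation translate into a small fixed-point computation. This is a sketch under assumptions: the grammar is E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → a | ( E ); ε is represented by the empty Python string; and the names `bconcat` and `first_sets` are illustrative.

```python
EPS = ""  # the empty string ε

# Each nonterminal maps to its alternatives; [] is an ε production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["a"], ["(", "E", ")"]],
}


def bconcat(A, B):
    """Bounded concatenation on sets: keep v unless it is ε, else take w."""
    return {v if v != EPS else w for v in A for w in B}


def first_sets(grammar):
    """Iterate the first-set equations until nothing changes."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                acc = {EPS}                                  # first(ε) = {ε}
                for X in rhs:
                    fx = first[X] if X in grammar else {X}   # first(a) = {a}
                    acc = bconcat(acc, fx)
                if not acc <= first[A]:
                    first[A] |= acc
                    changed = True
    return first


if __name__ == "__main__":
    for nonterminal, symbols in first_sets(GRAMMAR).items():
        print(nonterminal, sorted(symbols))
```

The sets start empty and only grow, so the loop stops at the least solution of the equations.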
34Computation of First Sets
35Computation of First Sets
36Follow Sets
- follow(A) is the set of prefixes of strings of terminals that can follow any derivation of A in G
  - $ ∈ follow(S), where $ denotes the end-of-input marker
  - if (B → αAβ) ∈ P, then
    - first(β) ⊕ follow(B) ⊆ follow(A)
- The definition of follow usually results in recursive set definitions. In order to solve them, you need to do several iterations on the equations
- ε never appears in any follow set
- Note: I had promised a closed definition of follow, but it would be unnecessarily complex. JCG.
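The "several iterations on the equations" can be sketched as a loop that repeats until no follow set grows. The code assumes the grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → a | ( E ), takes its first sets as given data in `FIRST`, and uses "$" as an assumed end-of-input marker; all names are illustrative.

```python
EPS, END = "", "$"   # ε and an assumed end-of-input marker

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["a"], ["(", "E", ")"]],
}

# First sets of the nonterminals, taken as given for this grammar.
FIRST = {"E": {"a", "("}, "E'": {"+", EPS}, "T": {"a", "("},
         "T'": {"*", EPS}, "F": {"a", "("}}


def bconcat(A, B):
    """Bounded concatenation on sets: keep v unless it is ε, else take w."""
    return {v if v != EPS else w for v in A for w in B}


def first_of(symbols):
    """first() of a string of grammar symbols."""
    acc = {EPS}
    for X in symbols:
        acc = bconcat(acc, FIRST.get(X, {X}))
    return acc


def follow_sets(grammar, start):
    follow = {A: set() for A in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for B, alternatives in grammar.items():
            for rhs in alternatives:
                for i, A in enumerate(rhs):
                    if A not in grammar:
                        continue                 # terminals have no follow set
                    # B -> alpha A beta: first(beta) (+) follow(B) is in follow(A)
                    new = bconcat(first_of(rhs[i + 1:]), follow[B])
                    if not new <= follow[A]:
                        follow[A] |= new
                        changed = True
    return follow


if __name__ == "__main__":
    for nonterminal, symbols in follow_sets(GRAMMAR, "E").items():
        print(nonterminal, sorted(symbols))
```

Because the right operand of the bounded concatenation is always a follow set, ε can never enter a result, which matches the remark that ε never appears in any follow set.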
37Computation of Follow Sets
38Computation of Follow Sets
39How to Fill In the Table (Predict)
- For each production (A → α) ∈ P
  - let X = first(α) ⊕ follow(A)
  - then for all x ∈ X
    - A → α ∈ Table[A, x]
- After processing all productions, each cell of the table must have, at most, one production
  - if not, your grammar is not LL(1) (nice try!)
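The table-filling rule can be sketched directly. The code assumes the grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → a | ( E ), with its first and follow sets supplied as data (`FIRST`, `FOLLOW`) and "$" as an assumed end marker; `build_table` is a hypothetical helper name.

```python
EPS, END = "", "$"   # ε and an assumed end-of-input marker

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["a"], ["(", "E", ")"]],
}
FIRST = {"E": {"a", "("}, "E'": {"+", EPS}, "T": {"a", "("},
         "T'": {"*", EPS}, "F": {"a", "("}}
FOLLOW = {"E": {END, ")"}, "E'": {END, ")"}, "T": {"+", END, ")"},
          "T'": {"+", END, ")"}, "F": {"*", "+", END, ")"}}


def bconcat(A, B):
    """Bounded concatenation on sets: keep v unless it is ε, else take w."""
    return {v if v != EPS else w for v in A for w in B}


def first_of(symbols):
    """first() of a string of grammar symbols."""
    acc = {EPS}
    for X in symbols:
        acc = bconcat(acc, FIRST.get(X, {X}))
    return acc


def build_table(grammar):
    """Fill Table[A, x] for every x in first(alpha) (+) follow(A)."""
    table = {}
    for A, alternatives in grammar.items():
        for rhs in alternatives:
            for x in bconcat(first_of(rhs), FOLLOW[A]):
                if (A, x) in table:
                    # Two productions compete for one cell: not LL(1).
                    raise ValueError(f"not LL(1): conflict at Table[{A}, {x}]")
                table[(A, x)] = rhs
    return table


if __name__ == "__main__":
    for (A, x), rhs in sorted(build_table(GRAMMAR).items()):
        print(f"Table[{A}, {x}] = {A} -> {' '.join(rhs) or 'ε'}")
```

For this grammar every cell receives at most one production, so the function returns a table instead of raising.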
40First and Follow Sets
41Predict
42Yet Another Expression Grammar (it's in the book!)
- G = ({E, T, F}, {a, +, *, (, )}, P, E)
- where P =
  - 1. E → E + T,
  - 2. E → T,
  - 3. T → T * F,
  - 4. T → F,
  - 5. F → ( E ),
  - 6. F → a
- Is a + a * a in L(G)?
[Figure: parse tree of a + a * a under this grammar]
43LR(1) Parsing Table
- Sn: shift to state n
- Rn: reduce according to production n
44LR(1) Algorithm
- Parse(S0 X1 S1 X2 S2 … Xr Sr … Xm Sm, a1 … an), where the first argument is the stack (Sm on top) and a1 … an is the remaining input
  - if Action[Sm, a1] = Shift S
    - Parse(S0 X1 S1 X2 S2 … Xm Sm a1 S, a2 … an)
  - else if Action[Sm, a1] = Reduce A → Xr+1 … Xm
    - and GOTO[Sr, A] = S
    - Parse(S0 X1 S1 X2 S2 … Xr Sr A S, a1 … an)
  - else if Action[Sm, a1] = Accept
    - accept
  - else if Action[Sm, a1] = Error
    - error
- Call initially with Parse(S0, w), where w is the phrase to parse and S0 is the initial state of the table
- ai is a terminal; Xj are terminals or nonterminals; Si is a state
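The shift/reduce loop can be sketched as follows. The slide's parsing table did not survive in this transcript, so the `ACTION` and `GOTO` tables below are an assumption: they follow the well-known Dragon book table for the grammar 1. E → E + T, 2. E → T, 3. T → T * F, 4. T → F, 5. F → ( E ), 6. F → a, whose state numbers agree with the parser trace on the next slide. The stack keeps states only, since the grammar symbols are implied by them; "$" is an assumed end marker.

```python
# Production number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

# Assumed tables: "sN" shifts to state N, "rN" reduces by production N.
ACTION = {
    0: {"a": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"a": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"a": "s5", "(": "s4"},
    7: {"a": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def parse(tokens):
    """Return the list of operations performed, or raise SyntaxError."""
    inp = list(tokens) + ["$"]
    states = [0]                      # states[-1] is the top of the stack
    ops = []
    while True:
        act = ACTION[states[-1]].get(inp[0])
        if act is None:               # empty table cell means error
            raise SyntaxError(f"unexpected {inp[0]!r} in state {states[-1]}")
        if act == "acc":
            ops.append("accept")
            return ops
        n = int(act[1:])
        if act[0] == "s":             # shift: consume a token, push state n
            states.append(n)
            inp.pop(0)
            ops.append(f"S{n}")
        else:                         # reduce: pop the handle, then GOTO
            lhs, length = PRODUCTIONS[n]
            del states[-length:]
            states.append(GOTO[(states[-1], lhs)])
            ops.append(f"R{n}")


if __name__ == "__main__":
    print(parse("a+a*a"))
```

Read from last to first, the reduce operations give the productions of a rightmost derivation.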
45Parser Operation on a + a * a
Step  Stack                   Input      Operation    Sentential form
1     0                       a + a * a  S5           a + a * a
2     0 a 5                   + a * a    R6, G[0,F]   a + a * a
3     0 F 3                   + a * a    R4, G[0,T]   F + a * a
4     0 T 2                   + a * a    R2, G[0,E]   T + a * a
5     0 E 1                   + a * a    S6           E + a * a
6     0 E 1 + 6               a * a      S5           E + a * a
7     0 E 1 + 6 a 5           * a        R6, G[6,F]   E + a * a
8     0 E 1 + 6 F 3           * a        R4, G[6,T]   E + F * a
9     0 E 1 + 6 T 9           * a        S7           E + T * a
10    0 E 1 + 6 T 9 * 7       a          S5           E + T * a
11    0 E 1 + 6 T 9 * 7 a 5   (empty)    R6, G[7,F]   E + T * a
12    0 E 1 + 6 T 9 * 7 F 10  (empty)    R3, G[6,T]   E + T * F
13    0 E 1 + 6 T 9           (empty)    R1, G[0,E]   E + T
14    0 E 1                   (empty)    accept       E
46Note how the rightmost derivation of a + a * a is done
- Sentential forms: E ⇒ E + T ⇒ E + T * F ⇒ E + T * a ⇒ E + F * a ⇒ E + a * a ⇒ T + a * a ⇒ F + a * a ⇒ a + a * a
[Figure: parse tree of a + a * a]