Title: Syntax of All Sorts
1Syntax of All Sorts
- COS 441
- Princeton University
- Fall 2004
2Acknowledgements
- Many slides from this lecture have been adapted from John Mitchell's CS-242 course at Stanford
- Good source of supplemental information
- http://theory.stanford.edu/people/jcm/books/cpl-teaching.html
- Differs in foundational approach when compared to Harper's approach; we will discuss this difference in later lectures
3Interpreter vs Compiler
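[Figure: an interpreter takes a source program and its input and produces output. A compiler takes a source program and produces a target program, which then takes input and produces output.]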
4Interpreter vs Compiler
- The difference is actually a bit more fuzzy
- Some interpreters compile to native code
- SML/NJ runs native machine code!
- Does fancy optimizations too
- Some compilers compile to byte-code which is interpreted
- javac and a JVM, which may also compile
5Interactive vs Batch
- SML/NJ is a native code compiler with an interactive interface
- javac is a batch compiler for bytecode, which may be interpreted or compiled to native code by a JVM
- Python compiles to bytecode and then interprets the bytecode; it has both batch and interactive interfaces
- Terms are historical and misleading
- Best to be precise and verbose
6Typical Compiler
Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target Program
7Brief Look at Syntax
- Grammar
- e ::= n | e + e | e - e
- n ::= d | nd
- d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- Expressions in language
- e → e - e → e - e + e → n - n + n → nd - d + d → dd - d + d → ... → 27 - 4 + 3
- Grammar defines a language
- Expressions in language derived by sequence of productions
8Parse Tree
- Derivation represented by tree
- e → e - e → e - e + e → n - n + n → nd - d + d → dd - d + d → ... → 27 - 4 + 3
[Figure: parse tree for 27 - 4 + 3, with internal nodes labeled e and leaves 27, 4, 3]
Tree shows parenthesization of expression
9Dealing with Parsing Ambiguity
- Ambiguity
- Expression 27 - 4 + 3 can be parsed two ways
- Problem: 27 - (4 + 3) ≠ (27 - 4) + 3
- Ways to resolve ambiguity
- Precedence
- Group * before +
- Parse 3 * 4 + 2 as (3 * 4) + 2
- Associativity
- Parenthesize operators of equal precedence to left (or right)
- Parse 3 - 4 + 5 as (3 - 4) + 5
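The two readings can be checked by evaluating each parenthesization directly. The sketch below uses Standard ML's built-in integer operators; the value names are made up for illustration. In SML, `+` and `-` share one precedence level and associate to the left, and `*` binds tighter than `+`.

```sml
(* The two parses of 27 - 4 + 3 give different answers. *)
val rightGrouped = 27 - (4 + 3)   (* 20 *)
val leftGrouped  = (27 - 4) + 3   (* 26 *)

(* SML resolves the ambiguity by left associativity at equal precedence. *)
val smlReading = 27 - 4 + 3       (* 26, same as leftGrouped *)

(* Precedence: * groups before +. *)
val precedence = 3 * 4 + 2        (* 14, parsed as (3 * 4) + 2 *)

(* Associativity: 3 - 4 + 5 is parsed as (3 - 4) + 5. *)
val associativity = 3 - 4 + 5     (* 4 *)
```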
10Rewriting the Grammar
- Harper describes a way to rewrite the grammar to avoid ambiguity
- Eliminating ambiguity in a parsing grammar is a dark art
- Not always even possible
- Depends on the parsing technique you use
- Learn all you want to know and more in a compiler book or course
- Ambiguity is bad when trying to think formally about programming languages
11Abstract Syntax Trees
- We want to reason about expressions inductively
- So we need an inductive description of expressions
- We could use the parse tree
- But it has all this silly term and factor stuff needed to make parsing unambiguous
- Introduce nicer tree after the parsing phase
12Syntax Comparisons
Ambiguous Concrete
Digits d ::= 0 | ... | 9
Numbers n ::= d | n d
Expressions e ::= n | e1 + e2 | e1 * e2
Abstract
Numbers n ∈ N
Expressions e ::= num[n] | plus(e1,e2) | times(e1,e2)
Unambiguous Concrete
Digits d ::= 0 | ... | 9
Numbers n ::= d | n d
Expressions e ::= t | t + e
Terms t ::= f | f * t
Factors f ::= n | (e)
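To make the link between the unambiguous concrete grammar and the abstract syntax concrete, here is a minimal recursive-descent sketch in Standard ML, with one function per nonterminal (expressions, terms, factors). The token type, the exception, and all function names are assumptions introduced for illustration, and numbers are stored as `int` for brevity. Because the grammar is right-recursive, `+` and `*` associate to the right in the resulting trees.

```sml
(* Abstract syntax, following the "Abstract" column. *)
datatype exp = Num of int
             | Plus of exp * exp
             | Times of exp * exp

(* Tokens of the concrete syntax (hypothetical names). *)
datatype token = NUM of int | PLUS | STAR | LPAREN | RPAREN

exception SyntaxError of string

(* Numbers n ::= d | n d : consume a maximal run of digits. *)
fun lexNumber (acc, c :: cs) =
      if Char.isDigit c
      then lexNumber (acc * 10 + (Char.ord c - Char.ord #"0"), cs)
      else (acc, c :: cs)
  | lexNumber (acc, []) = (acc, [])

fun lex [] = []
  | lex (#"+" :: cs) = PLUS :: lex cs
  | lex (#"*" :: cs) = STAR :: lex cs
  | lex (#"(" :: cs) = LPAREN :: lex cs
  | lex (#")" :: cs) = RPAREN :: lex cs
  | lex (c :: cs) =
      if Char.isSpace c then lex cs
      else if Char.isDigit c then
        let val (n, rest) = lexNumber (0, c :: cs)
        in NUM n :: lex rest end
      else raise SyntaxError ("unexpected character " ^ String.str c)

(* Expressions e ::= t | t + e *)
fun parseExp ts =
      (case parseTerm ts of
           (t, PLUS :: rest) =>
             let val (e, rest') = parseExp rest
             in (Plus (t, e), rest') end
         | result => result)
(* Terms t ::= f | f * t *)
and parseTerm ts =
      (case parseFactor ts of
           (f, STAR :: rest) =>
             let val (t, rest') = parseTerm rest
             in (Times (f, t), rest') end
         | result => result)
(* Factors f ::= n | (e) *)
and parseFactor (NUM n :: rest) = (Num n, rest)
  | parseFactor (LPAREN :: rest) =
      (case parseExp rest of
           (e, RPAREN :: rest') => (e, rest')
         | _ => raise SyntaxError "expected )")
  | parseFactor _ = raise SyntaxError "expected a number or ("

(* Parse a whole string, rejecting leftover tokens. *)
fun parse s =
      case parseExp (lex (String.explode s)) of
          (e, []) => e
        | _ => raise SyntaxError "trailing input"
```

For example, `parse "1 + 2 * 3"` yields `Plus (Num 1, Times (Num 2, Num 3))`; the term and factor layers exist only to guide parsing and do not appear in the result.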
13Induction Over Syntax
- We can think of the CFG notation as defining a predicate exp
- Thinking about things this way lets us use rule induction on abstract syntax
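As a small illustration of reasoning by induction over abstract syntax, the sketch below defines two functions by structural recursion, one clause per form of expression. The datatype and the names `leaves`, `operators`, and `holds` are assumptions made for this example. The property "every expression has exactly one more leaf than it has operators" can be proved by rule induction with one case per clause; the code only tests it on particular expressions.

```sml
datatype exp = Num of int
             | Plus of exp * exp
             | Times of exp * exp

(* Number of num leaves in an expression. *)
fun leaves (Num _) = 1
  | leaves (Plus (e1, e2)) = leaves e1 + leaves e2
  | leaves (Times (e1, e2)) = leaves e1 + leaves e2

(* Number of plus and times nodes in an expression. *)
fun operators (Num _) = 0
  | operators (Plus (e1, e2)) = 1 + operators e1 + operators e2
  | operators (Times (e1, e2)) = 1 + operators e1 + operators e2

(* The claim to be proved by induction, checked here on one expression at a time. *)
fun holds e = (leaves e = operators e + 1)
```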
14A More General Pattern
- One might ask what is the general pattern of predicates defined by a CFG that represent abstract syntax
- The signature assigns arities to operators such as num[n], plus, and times
- Note that T1, ..., Tn are all unique
15An Example
16An Example
Operator Arity
zero 0
succ 1
17An Example
Operator Arity
zero 0
succ 1
n ::= zero | succ(n)
18Another Example
19Another Example
Operator Arity
zero 0
succ 1
even ::= zero | succ(odd)
odd ::= succ(even)
20Another Example
Operator Arity
zero 0
succ 1
even ::= zero | succ(odd)
odd ::= succ(even)
Operator Arity
succ 1
21 NOT Abstract Syntax
22Abstract Syntax To ML
n ::= zero | succ(n)
n ∈ N
e ::= num[n] | plus(e1,e2) | times(e1,e2)
23Abstract Syntax To ML
datatype nat = zero | succ of nat
n ::= zero | succ(n)
n ∈ N
e ::= num[n] | plus(e1,e2) | times(e1,e2)
24Abstract Syntax To ML
datatype nat = zero | succ of nat
n ::= zero | succ(n)
n ∈ N
e ::= num[n] | plus(e1,e2) | times(e1,e2)
datatype exp = num of nat | plus of (exp * exp) | times of (exp * exp)
25Abstract Syntax To ML
datatype nat = Zero | Succ of nat
n ::= zero | succ(n)
n ∈ N
e ::= num[n] | plus(e1,e2) | times(e1,e2)
datatype exp = Num of int | Plus of (exp * exp) | Times of (exp * exp)
A small cheat for performance
Capitalize by convention
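Once the abstract syntax is an ML datatype, functions over expressions are written by pattern matching, one clause per constructor. The following is a minimal sketch: `natToInt`, `eval`, and the sample values are illustrative names not taken from the slides, and `eval` assumes the usual arithmetic meaning of plus and times.

```sml
datatype nat = Zero | Succ of nat
datatype exp = Num of int | Plus of (exp * exp) | Times of (exp * exp)

(* Convert a unary natural to an int; Num stores ints directly,
   which is the "small cheat" that avoids unary numerals. *)
fun natToInt Zero = 0
  | natToInt (Succ n) = 1 + natToInt n

(* Evaluate an expression by structural recursion. *)
fun eval (Num n) = n
  | eval (Plus (e1, e2)) = eval e1 + eval e2
  | eval (Times (e1, e2)) = eval e1 * eval e2

val three = natToInt (Succ (Succ (Succ Zero)))

(* plus(num[27], times(num[4], num[3])) *)
val sample = Plus (Num 27, Times (Num 4, Num three))
val result = eval sample   (* 27 + 4 * 3 = 39 *)
```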
26Abstract Syntax To ML
- Converting things is not always straightforward
- Constructor names need to be distinct
- ML lacks some features that would make some things easier
- Operator overloading
- Interacts badly with type-inference
- Subtyping would improve things too
- We can get by with predicates sometimes instead
27Abstract Syntax To ML (cont.)
datatype nat = Zero | Succ of nat
n ::= zero | succ(n)
even ::= zero | succ(odd)
odd ::= succ(even)
28Abstract Syntax To ML (cont.)
datatype nat = Zero | Succ of nat
n ::= zero | succ(n)
datatype even = EvenZero | EvenSucc of (odd)
and odd = OddSucc of even
even ::= zero | succ(odd)
odd ::= succ(even)
29Abstract Syntax To ML (cont.)
datatype nat = Zero | Succ of nat
n ::= zero | succ(n)
fun even(Zero) = true
  | even(Succ(n)) = odd(n)
and odd(Zero) = false
  | odd(Succ(n)) = even(n)
even ::= zero | succ(odd)
odd ::= succ(even)
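The two encodings of even and odd can be put side by side. In this sketch the datatype version is renamed to `evenNat` and `oddNat` (hypothetical names, chosen only to keep it visually apart from the predicates), and `evenToNat` and `oddToNat` are illustrative conversion functions. They show the cost of lacking subtyping: an even number built with the separate datatypes must be converted explicitly before it can be used as a `nat`.

```sml
datatype nat = Zero | Succ of nat

(* Predicate view: one type, two mutually recursive functions. *)
fun even Zero = true
  | even (Succ n) = odd n
and odd Zero = false
  | odd (Succ n) = even n

(* Datatype view: separate, mutually recursive types. *)
datatype evenNat = EvenZero | EvenSucc of oddNat
     and oddNat  = OddSucc of evenNat

(* Explicit conversions stand in for the missing subtyping. *)
fun evenToNat EvenZero = Zero
  | evenToNat (EvenSucc k) = Succ (oddToNat k)
and oddToNat (OddSucc k) = Succ (evenToNat k)

val two = EvenSucc (OddSucc EvenZero)
val twoIsEven = even (evenToNat two)           (* true *)
val oneIsOdd  = odd (oddToNat (OddSucc EvenZero))   (* true *)
```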
30Syntax Trees Are Not Enough
- Programming languages include binding constructs
- variables, function arguments, variable and function declarations, type declarations
- When reasoning about binding constructs we want to ignore irrelevant details
- e.g. the actual spelling of the variable
- we only care whether variables are the same or different
- Still we must be clear about the scope of a variable definition
31Abstract Binding Trees
- Harper uses abstract binding trees (Chapter 5)
- This solution is based on very recent work
- M. J. Gabbay and A. M. Pitts, A New Approach to Abstract Syntax with Variable Binding, Formal Aspects of Computing 13 (2002) 341-363
- The solution is new but the problems are old!
- Alonzo Church. A set of postulates for the foundation of logic. The Annals of Mathematics, Second Series, Volume 33, Issue 2 (Apr. 1932), 346-366.
- We will talk about the problem with binding today and deal with the solutions in the next lecture
32Lambda Calculus
- Formal system with three parts
- Notation for function expressions
- Proof system for equations
- Calculation rules called reduction
- Basic syntactic notions
- Free and bound variables
- Illustrates some ideas about scope of binding
- Symbolic evaluation useful for discussing programs
33History
- Original intention
- Formal theory of substitution
- More successful for computable functions
- Substitution → symbolic computation
- Church/Turing thesis
- Influenced design of Lisp, ML, other languages
- Important part of CS history and theory
34Expressions and Functions
- Expressions
- x + y    x + 2*y + z
- Functions
- λx. (x + y)    λz. (x + 2*y + z)
- Application
- (λx. (x + y)) 3 = 3 + y
- (λz. (x + 2*y + z)) 5 = x + 2*y + 5
- Parsing: λx. f (f x) = λx.( f (f (x)) )
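These expressions can be tried out in Standard ML, where `fn x => e` plays the role of λx. e. ML requires every free variable to be bound, so the sketch assumes the arbitrary values x = 1 and y = 10; those values and the names `applied1`, `applied2`, and `twice` are illustrative only.

```sml
(* Assumed bindings for the free variables x and y. *)
val x = 1
val y = 10

(* (λx. (x + y)) 3 = 3 + y *)
val applied1 = (fn x => x + y) 3           (* 3 + 10 = 13 *)

(* (λz. (x + 2*y + z)) 5 = x + 2*y + 5 *)
val applied2 = (fn z => x + 2 * y + z) 5   (* 1 + 20 + 5 = 26 *)

(* λf. λx. f (f x): application binds tighter than fn,
   so the body is f applied to (f x). *)
val twice = fn f => fn x => f (f x)
val four = twice (fn n => n + 2) 0         (* 4 *)
```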
35Free and Bound Variables
- Bound variable is placeholder
- Variable x is bound in λx. (x+y)
- Function λx. (x+y) is same function as λz. (z+y)
- Compare
- ∫ x+y dx = ∫ z+y dz    ∀x P(x) = ∀z P(z)
- Name of free (unbound) variable does matter
- Variable y is free in λx. (x+y)
- Function λx. (x+y) is not same as λx. (x+z)
- Occurrences
- y is free and bound in λx. ((λy. y+2) x) + y
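The set of free variables can be computed by structural recursion over terms. The sketch below assumes a small term datatype with variables, integer constants, addition, abstraction, and application; the datatype and the helpers `remove` and `union` are made up for illustration. The example term is λx. ((λy. y+2) x) + y, in which the inner y is bound and the final y is free.

```sml
datatype term = Var of string
              | Const of int
              | Add of term * term
              | Lam of string * term
              | App of term * term

(* Remove every occurrence of x from a list of names. *)
fun remove (x, ys) = List.filter (fn y => y <> x) ys

(* Union of two duplicate-free lists of names. *)
fun union (xs, ys) =
      xs @ List.filter (fn y => not (List.exists (fn x => x = y) xs)) ys

(* A variable is free unless an enclosing Lam binds it. *)
fun freeVars (Var x) = [x]
  | freeVars (Const _) = []
  | freeVars (Add (a, b)) = union (freeVars a, freeVars b)
  | freeVars (Lam (x, body)) = remove (x, freeVars body)
  | freeVars (App (f, a)) = union (freeVars f, freeVars a)

(* λx. ((λy. y+2) x) + y *)
val example =
      Lam ("x", Add (App (Lam ("y", Add (Var "y", Const 2)), Var "x"),
                     Var "y"))

val free = freeVars example   (* ["y"] *)
```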
36Reduction
- Basic computation rule is β-reduction
- (λx. e1) e2 → [x←e2]e1
- where substitution involves renaming as needed
- Example
- (λf. λx. f (f x)) (λy. y+x)
37Rename Bound Variables
- Rename bound variables to avoid conflicts
- (λf. λz. f (f z)) (λy. y+x) → ... → λz. ((z+x)+x)
- Substitute blindly
- (λf. λx. f (f x)) (λy. y+x) → ... → λx. ((x+x)+x)
38Reduction
- (λf. λx. f (f x)) (λy. y+x)
39Reduction
- (λf. λz. f (f z)) (λy. y+x)
- Rename bound x to z to avoid conflict with free x
40Reduction
- (λf. λz. f (f z)) (λy. y+x) →
- [f←(λy. y+x)](λz. f (f z))
41Reduction
- (λf. λz. f (f z)) (λy. y+x) →
- λz. (λy. y+x) ((λy. y+x) z)
42Reduction
- (λf. λz. f (f z)) (λy. y+x) →
- λz. (λy. y+x) ((λy. y+x) z) →
- λz. (λy. y+x) ([y←z](y+x))
43Reduction
- (λf. λz. f (f z)) (λy. y+x) →
- λz. (λy. y+x) ((λy. y+x) z) →
- λz. (λy. y+x) (z+x)
44Reduction
- (λf. λz. f (f z)) (λy. y+x) →
- λz. (λy. y+x) ((λy. y+x) z) →
- λz. (λy. y+x) (z+x) →
- λz. [y←(z+x)](y+x)
45Reduction
- (λf. λz. f (f z)) (λy. y+x) →
- λz. (λy. y+x) ((λy. y+x) z) →
- λz. (λy. y+x) (z+x) →
- λz. ((z+x)+x)
46Incorrect Reduction
- (λf. λx. f (f x)) (λy. y+x)
47Incorrect Reduction
- (λf. λx. f (f x)) (λy. y+x) →
- [f←(λy. y+x)](λx. f (f x))
48Incorrect Reduction
- (λf. λx. f (f x)) (λy. y+x) →
- λx. (λy. y+x) ((λy. y+x) x)
49Incorrect Reduction
- (λf. λx. f (f x)) (λy. y+x) →
- λx. (λy. y+x) ((λy. y+x) x) →
- λx. (λy. y+x) ([y←x](y+x))
50Incorrect Reduction
- (λf. λx. f (f x)) (λy. y+x) →
- λx. (λy. y+x) ((λy. y+x) x) →
- λx. (λy. y+x) (x+x)
51Incorrect Reduction
- (λf. λx. f (f x)) (λy. y+x) →
- λx. (λy. y+x) ((λy. y+x) x) →
- λx. (λy. y+x) (x+x) →
- λx. [y←(x+x)](y+x)
52Incorrect Reduction
- (λf. λx. f (f x)) (λy. y+x) →
- λx. (λy. y+x) ((λy. y+x) x) →
- λx. (λy. y+x) (x+x) →
- λx. ((x+x)+x)
53Main Points About λ-calculus
- λ captures essence of variable binding
- Function parameters
- Bound variables can be renamed
- Succinct function expressions
- Simple symbolic evaluator via substitution
- Easy rule: always rename bound variables to be distinct
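The correct and incorrect reductions of (λf. λx. f (f x)) (λy. y+x) can be reproduced with a small symbolic evaluator. This is a sketch under stated assumptions, not the lecture's implementation: the term datatype, the function names, and the fresh-name scheme ("v1", "v2", ..., assumed not to occur in the input) are all invented here. `blind` substitutes without renaming, `subst` renames a bound variable when it would capture a free variable of the substituted term, and `normalize` repeatedly applies β-reduction (it may not terminate on terms without a normal form).

```sml
datatype term = Var of string
              | Add of term * term
              | Lam of string * term
              | App of term * term

(* Does x occur free in the term? *)
fun freeIn (x, Var y) = (x = y)
  | freeIn (x, Add (a, b)) = freeIn (x, a) orelse freeIn (x, b)
  | freeIn (x, Lam (y, body)) = x <> y andalso freeIn (x, body)
  | freeIn (x, App (f, a)) = freeIn (x, f) orelse freeIn (x, a)

(* Fresh names v1, v2, ... (assumed unused by the input term). *)
val counter = ref 0
fun fresh () = (counter := !counter + 1; "v" ^ Int.toString (!counter))

(* Blind substitution [x<-s]t: never renames, so capture can happen. *)
fun blind (x, s, Var y) = if x = y then s else Var y
  | blind (x, s, Add (a, b)) = Add (blind (x, s, a), blind (x, s, b))
  | blind (x, s, App (f, a)) = App (blind (x, s, f), blind (x, s, a))
  | blind (x, s, Lam (y, body)) =
      if x = y then Lam (y, body) else Lam (y, blind (x, s, body))

(* Capture-avoiding substitution [x<-s]t: renames as needed. *)
fun subst (x, s, Var y) = if x = y then s else Var y
  | subst (x, s, Add (a, b)) = Add (subst (x, s, a), subst (x, s, b))
  | subst (x, s, App (f, a)) = App (subst (x, s, f), subst (x, s, a))
  | subst (x, s, Lam (y, body)) =
      if x = y then Lam (y, body)
      else if freeIn (y, s) then
        let val z = fresh ()
        in Lam (z, subst (x, s, subst (y, Var z, body))) end
      else Lam (y, subst (x, s, body))

(* Reduce with the given substitution function. *)
fun normalize sub t =
      case t of
          Var x => Var x
        | Add (a, b) => Add (normalize sub a, normalize sub b)
        | Lam (x, body) => Lam (x, normalize sub body)
        | App (f, a) =>
            (case normalize sub f of
                 Lam (x, body) => normalize sub (sub (x, a, body))
               | f' => App (f', normalize sub a))

(* (λf. λx. f (f x)) (λy. y+x) *)
val example =
      App (Lam ("f", Lam ("x", App (Var "f", App (Var "f", Var "x")))),
           Lam ("y", Add (Var "y", Var "x")))

val wrong   = normalize blind example   (* λx. ((x+x)+x): free x captured *)
val correct = normalize subst example   (* λz. ((z+x)+x) for a fresh z *)
```

In `correct` the bound variable carries a generated name rather than z, which is harmless because bound variables can be renamed.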