Title: CSE Translation of Programming Languages AKA: Compilers
1CSE Translation of Programming Languages AKA Compilers
- Charles B. Owen (Instructor)
- 1138 E. B., 353-6488
- Ken Horne (TA and grading)
- Classroom 1225 Engineering Building
2Introduction
Introduction to the class: Structure, rules, etc.
Getting Started: Why are we here? What does a compiler do?
3Course Objectives
- Understand the processes, algorithms, and mathematics of programming language translation
- Programming methods, algorithms, data structures, mathematics, etc.
4Why Are We Here?
5Well, of course
- It's valuable to know how compilers work
- You can write more efficient code
- You can debug better
- You can impress your friends
- More
6It's not just compilers
- The ideas in this course are useful for
- Expression evaluation in programs
- Adding scripting language features
- Parsing multimedia file formats (like MP3 or MPEG)
- Creating network protocols
- User interface design
- Computer aided design
- Hardware design
- More
7Course Structure
- See the syllabus
- http://www.cse.msu.edu/cse450
- MW Lectures
- Attendance is expected
8Course Materials
- Textbooks
- Compilers: Principles, Techniques, and Tools (2nd Edition), Aho, Lam, Sethi, and Ullman, 2006, ISBN-13: 978-0321486813.
- lex and yacc, Brown, Levine, and Mason, 1995, ISBN-13: 978-1565920002.
- WWW
- http://www.cse.msu.edu/cse450
- And on angel (angel.msu.edu)
9Course Structure
- Exams
- Midterm exam
- Final exam
- Assignments
- 6 programming assignments (planned)
- Toe-tippers
Notice: Bring a red pen to class
10Policies
- Reading
- Read the chapters
- Attendance
- Will take care of itself
11In case you are interested...
- Projects will build as a sequence
- We'll have some group projects
- I'll try to show the use of these techniques beyond basic compilers
12Reading and First Programming Assignment
- I suggest
- Read chapter 1 in text.
- Start reading chapter 2
- Project 1
- Will begin next week
13How are languages implemented?
- Compilers
- Translate a language to some other form
- Might be machine language, or could be a different language or byte-codes or something else
[Diagram: Source program -> Compiler -> Target program]
- Interpreters
- Directly execute the programming language
- Sort of like you do when you hand execute a program
[Diagram: Source program, Input -> Interpreter -> Output]
Note: A language is neither interpreted nor compiled. The implementation is what determines this distinction. Some languages lend themselves better to one method or the other.
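The distinction can be sketched in a few lines. This is only an illustration, not course code: the nested-tuple expression format, the stack-machine instruction names (PUSH, ADD, MUL), and all function names are assumptions made up for the example. The interpreter executes the source form directly, while the compiler only translates it into another form that something else must run.

```python
# A tiny expression language: nested tuples such as ("+", 2, ("*", 3, 4)).

def interpret(expr):
    """Interpreter: walk the source form and execute it directly."""
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b

def compile_to_stack_code(expr):
    """Compiler: translate the source form into another form (here, a list of
    instructions for a made-up stack machine) without executing it."""
    if isinstance(expr, (int, float)):
        return [("PUSH", expr)]
    op, left, right = expr
    code = compile_to_stack_code(left) + compile_to_stack_code(right)
    code.append(("ADD",) if op == "+" else ("MUL",))
    return code

def run_stack_code(code):
    """Stands in for the hardware or virtual machine that runs the target program."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()

source = ("+", 2, ("*", 3, 4))
target = compile_to_stack_code(source)   # translation only, nothing is computed yet
```

Calling `interpret(source)` and `run_stack_code(target)` should give the same value. The difference is when the translation work happens and whether a separate target program exists.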
14Compiler Examples
- C
- Usually compiles directly to machine language
[Diagram: C program -> Compiler -> Machine Language]
- Java
- Compiles to an intermediate code that no CPU actually executes.
- This is then interpreted by a Java Virtual Machine (JVM).
[Diagram: Java program -> Compiler -> Byte-codes]
Java byte-codes can also be compiled to machine language.
15Interpreter Examples
- MATLAB
- You can just type in the statements and they
execute right away.
[Diagram: Matlab program, Input -> Interpreter -> Output]
- Common Language Runtime,
- Java Virtual Machine
- The most basic implementation
[Diagram: Byte codes, Input -> JVM -> Output]
16What are some consequences of each?
[Diagram: Source program -> Compiler -> Target program]
[Diagram: Source program, Input -> Interpreter -> Output]
TT
17What are some consequences of each?
- Compilers?
- Can be as slow as necessary
- Can spend time optimizing code
- Sees the program in its entirety
[Diagram: Source program -> Compiler -> Target program]
- Interpreters?
- Can be interactive
- Fastest time from start to execution
[Diagram: Source program, Input -> Interpreter -> Output]
Mixtures of both are common: interpret dynamic statements and for rapid startup, then compile for better performance later.
18History of High-Level Languages
- 1954: IBM 704
- All programming in assembly
- Programming costs exceeded hardware costs!
TEXT FEED COMPOSITION n20 TAPE v213
TAPE v240 TAPE n20 v140 TAPE n2 v3 0 n0
4 n1 0 v0 4 43) v215 v0 1 v215 v213 /
v215 v218 v213 - v215 n1 0 42) v(20n1)
v(240n1) x v213 n1 n1 1 -gt 42, n1 0
- Solution: Speedcoding
- An interpreted computer language
- Simple language to express floating point
calculations
The IBM 701, for which Speedcoding was first written, had no floating point hardware; floating point was implemented in the Speedcoding interpreter.
19Enter John Backus
- Idea: Translate programs to assembly
      READ INPUT TAPE 5, 501, IA, IB, IC
  501 FORMAT (3I5)
C IA, IB, AND IC MAY NOT BE NEGATIVE
      IF (IA) 777, 777, 701
  701 IF (IB) 777, 777, 702
  702 IF (IC) 777, 777, 703
  703 IF (IA+IB-IC) 777,777,704
  704 IF (IA+IC-IB) 777,777,705
  705 IF (IB+IC-IA) 777,777,799
  777 STOP 1
C USING HERON'S FORMULA WE CALCULATE THE
C AREA OF THE TRIANGLE
  799 S = FLOATF (IA + IB + IC) / 2.0
      AREA = SQRT( S * (S - FLOATF(IA)) * (S - FLOATF(IB)) *
     +     (S - FLOATF(IC)))
      WRITE OUTPUT TAPE 6, 601, IA, IB, IC, AREA
  601 FORMAT (4H A= ,I5,5H  B= ,I5,5H  C= ,I5,8H  AREA= ,F10.2,
     +        13H SQUARE UNITS)
      STOP
      END
- Result: Fortran I
- 1954-1957
- By 1958, 50% of all software is in Fortran!
He invented many of the basic techniques we'll use in this course!
20Structure of a Compiler
These steps are often done in phases or passes. This structure is very common. Each step will be a set of algorithms we'll explore.
[Diagram: compiler pipeline, with the Symbol Table alongside the phases]
Front End
  character stream -> Lexical Analysis -> token stream
  token stream -> Parsing -> syntax tree
  syntax tree -> Semantic Analysis -> syntax tree
  syntax tree -> Intermediate Code Generator -> intermediate code
Back End
  intermediate code -> Optimization -> intermediate code
  intermediate code -> Code Generation -> target machine code
21Lexical Analysis
[Diagram: character stream -> Lexical Analysis -> token stream]
Reads the character stream and converts it into a stream of tokens. A sequential set of characters, called a lexeme, becomes a token. We're recognizing substrings that are meaningful.
What is meaningful about this?
speed = speed + 10 * time
22Lexemes for this string
speed = speed + 10 * time
We'll convert each of these into a token of the form <name, value>. Sometimes the value will be omitted. speed becomes <id, 1>, where id means this is a symbol and 1 is the location in the symbol table. 10 becomes <constant, 10> (or just <10> in your textbook).
[Diagram: Symbol Table]
Sort of like recognizing the words in a sentence.
23Lexemes for this string
speed = speed + 10 * time
Lexical Analysis
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
[Diagram: Symbol Table]
The tool lex creates lexical analyzers.
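As a rough illustration of what a generated lexical analyzer does, here is a hand-written sketch in Python. It is not lex output. The token patterns, the tuple representation of tokens, and the function name `tokenize` are assumptions chosen to cover only the example statement.

```python
import re

# Token patterns for the running example only; a real language needs many more.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+*]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (tokens, symbols). An identifier becomes ("id", i), where i is
    the 1-based location of its name in the symbol table."""
    symbols = []      # symbol table: position i - 1 holds the name for <id, i>
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue                          # whitespace produces no token
        if kind == "id":
            if lexeme not in symbols:
                symbols.append(lexeme)
            tokens.append(("id", symbols.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("constant", int(lexeme)))
        else:
            tokens.append((lexeme,))          # operators carry no value
    return tokens, symbols

tokens, symbols = tokenize("speed = speed + 10 * time")
```

Here `symbols` plays the role of the symbol table. A list with a linear `index` search keeps the sketch short, but it is not an efficient choice for a real compiler.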
24Lexical Analysis
The lexemes and their tokens will be determined
by the language.
sing: func [count rest] [
    prin pick ["99 bottles " "no bottles " "1 bottle " [count "bottles "]] min 4 count + 2
    print rest
]
REBOL
def bottles
  (@bottles.zero? ? "no more" : @bottles).to_s << " bottle" << ("s" unless @bottles == 1).to_s
end
RUBY
Things that become lexemes: punctuation, symbols, keywords, constants, etc.
TT
25Syntax Analysis
[Diagram: token stream -> Parsing -> syntax tree]
Converting the token stream into a syntax tree. In a syntax tree, the nodes are operations and the children are the arguments to the operation.
What are the operations and arguments here?
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
Sort of like diagramming a sentence in English class.
26Syntax Trees
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
Here's an operation for sure:
<=>
  <id, 1>
  <id, 1> <+> <10> <*> <id, 2>
27A complete syntax tree
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
Parsing
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <10>
      <id, 2>
[Diagram: Symbol Table]
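One way to build such a tree is recursive descent. The sketch below is a minimal example, not the method the course prescribes. The grammar in the docstring, the tuple encoding of tokens and tree nodes, and the class name `Parser` are all assumptions made for illustration. Giving `*` its own grammar level below `+` is what places the multiplication deeper in the tree.

```python
# Tokens for: speed = speed + 10 * time  (ids are symbol table locations)
TOKENS = [("id", 1), ("=",), ("id", 1), ("+",), ("constant", 10), ("*",), ("id", 2)]

class Parser:
    """Recursive descent for the assumed grammar
         stmt -> id '=' expr
         expr -> term ('+' term)*
         term -> atom ('*' atom)*
         atom -> id | constant
    """
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Kind of the next token, or None at end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self, kind):
        """Consume and return the next token, which must be of the given kind."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.peek()!r}")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def stmt(self):
        target = self.take("id")
        self.take("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError("trailing tokens after statement")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.take("+")
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.atom()
        while self.peek() == "*":
            self.take("*")
            node = ("*", node, self.atom())
        return node

    def atom(self):
        return self.take("id") if self.peek() == "id" else self.take("constant")

tree = Parser(TOKENS).stmt()
```

Each operation node is a tuple `(operator, left, right)`, so the operation is the node and its arguments are the children.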
28What about this code?
" bottle" << ("s" unless @bottles == 1).to_s
<bottle> <insertion> <(> <s> <unless> <@> <id, 1> <==> <1> <)> <.> <id, 2>
TT
29Semantic Analysis
[Diagram: syntax tree -> Semantic Analysis -> syntax tree]
- Semantics are the meaning of the programming language.
- Now we're going to analyze our syntax tree to see if it is, or can be converted to, a tree that is semantically meaningful.
- Common checks
- Valid arguments
- Type checking
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <10>
      <id, 2>
[Diagram: Symbol Table]
How were the types determined? Do we have any type issues here?
30Silly English analogies for semantic analysis
Jack said Bob is an idiot. Who does idiot refer to?
The rain in Spain stays mainly in the plain. Where does it rain? Where is that soggy plain?
Jack left her homework at home. This is a type mismatch (Jack's a guy).
31Type Coercion
We modify the syntax tree to fix semantic issues that are fixable. What if they are not fixable? What's an example of something not fixable?
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <inttofloat> (Coercion)
        <10>
      <id, 2>
[Diagram: Symbol Table]
How were the types determined? Do we have any type issues here?
32Semantic Analysis
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <10>
      <id, 2>
Semantic Analysis
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <inttofloat>
        <10>
      <id, 2>
TT
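The tree rewrite in the semantic analysis step can be sketched as a recursive pass that compares operand types and inserts a conversion node where they differ. This is a simplified illustration under stated assumptions: the slides do not say how the types were determined, so the sketch assumes the symbol table records both variables as floats. The names `SYMBOL_TYPES`, `type_of`, and `coerce`, and the rule that an int target cannot receive a float, are choices made for this example.

```python
# Assumed symbol table contents: <id, 1> = speed, <id, 2> = time, both float.
SYMBOL_TYPES = {1: "float", 2: "float"}

def type_of(node):
    """Type of an already coerced subtree."""
    kind = node[0]
    if kind == "id":
        return SYMBOL_TYPES[node[1]]
    if kind == "constant":
        return "float" if isinstance(node[1], float) else "int"
    if kind == "inttofloat":
        return "float"
    return type_of(node[1])       # operator: both operands agree after coercion

def coerce(node):
    """Return a copy of the tree with inttofloat inserted wherever an int
    operand meets a float operand."""
    if node[0] in ("id", "constant"):
        return node
    op, left, right = node[0], coerce(node[1]), coerce(node[2])
    left_type, right_type = type_of(left), type_of(right)
    if left_type != right_type:
        if op == "=" and left_type == "int":
            # Assumed policy: silently losing the fraction is not a fixable issue.
            raise TypeError("cannot assign a float value to an int variable")
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
    return (op, left, right)

tree = ("=", ("id", 1), ("+", ("id", 1), ("*", ("constant", 10), ("id", 2))))
fixed = coerce(tree)
```

The `TypeError` branch is one possible example of an issue that is not fixable by rewriting the tree. A real language defines its own rules for which conversions are allowed.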
33Intermediate Code Generator
[Diagram: syntax tree -> Intermediate Code Generator -> intermediate code]
Most compilers convert the syntax tree into some
intermediate code. This is then subject to
optimization and conversion to the final machine
code. Why an intermediate code?
34Intermediate Code Generator
- Intermediate code is usually more general and easier to optimize.
- Many compilers have the same back end for multiple front ends.
gcc compiles both C and C++ to the same intermediate code, then uses a common back end for both.
35Intermediate code example
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <inttofloat>
        <10>
      <id, 2>
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
Each operation became a line of intermediate code. The t values are temporary variables.
The textbook refers to this as three-address code. Each operation has up to 3 operands (some have fewer). Can you see the three operands in each of these statements?
36Intermediate code example
<=>
  <id, 1>
  <+>
    <id, 1>
    <*>
      <inttofloat>
        <10>
      <id, 2>
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
t2 = t1 * id2: Operands are t2, t1, id2. Think of this like an assembly instruction: mult t1, id2, t2
t1 = inttofloat(10): Operands are t1, 10
This is designed as an easy to understand assembly language.
TT
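The step from tree to three-address code can be sketched as a post-order walk that emits one instruction per operation node and hands back the name holding its result. This is a minimal sketch: the tuple tree encoding and the function name `gen_three_address` are assumptions, and it handles only a single assignment statement.

```python
from itertools import count

def gen_three_address(tree):
    """Emit one three-address instruction per operation node (post-order)."""
    code = []
    temps = count(1)                 # t1, t2, t3, ...

    def name(node):
        """Return the name holding this node's value, emitting code if needed."""
        if node[0] == "id":
            return f"id{node[1]}"
        if node[0] == "constant":
            return str(node[1])
        if node[0] == "inttofloat":
            arg = name(node[1])
            t = f"t{next(temps)}"
            code.append(f"{t} = inttofloat({arg})")
            return t
        # Binary operation: operands first, then the operation itself.
        left, right = name(node[1]), name(node[2])
        t = f"t{next(temps)}"
        code.append(f"{t} = {left} {node[0]} {right}")
        return t

    op, target, value = tree
    if op != "=":
        raise ValueError("this sketch handles a single assignment only")
    code.append(f"{name(target)} = {name(value)}")
    return code

tree = ("=", ("id", 1),
        ("+", ("id", 1), ("*", ("inttofloat", ("constant", 10)), ("id", 2))))
code = gen_three_address(tree)
```

Because children are visited before their parent, each temporary is defined before it is used.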
37Optimization
[Diagram: intermediate code -> Optimization -> intermediate code]
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
Optimization: Making the code more efficient. Any optimization ideas here?
38Optimization
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
Optimization
t2 = 10.0 * id2
id1 = id1 + t2
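The two improvements in this example can be sketched as small passes over the intermediate code: doing the int-to-float conversion of a constant at compile time, and removing a copy from a temporary that was defined on the previous line. The `(dest, op, arg1, arg2)` tuple format, the `copy` operation name, and the convention that temporaries are named `t<n>` and used only once are assumptions for this sketch. Real optimizers need data-flow information before making such rewrites.

```python
def optimize(code):
    """Two small passes over (dest, op, arg1, arg2) instructions."""
    # Pass 1: constant folding of inttofloat applied to an integer literal.
    constants = {}                       # temporary -> value known at compile time
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)          # substitute known constants into operands
        b = constants.get(b, b)
        if op == "inttofloat" and isinstance(a, int):
            constants[dest] = float(a)   # convert now, emit no instruction
        else:
            folded.append((dest, op, a, b))

    # Pass 2: "x = t" directly after "t = ..." collapses into "x = ...".
    result = []
    for instr in folded:
        dest, op, a, b = instr
        if op == "copy" and result and result[-1][0] == a and a.startswith("t"):
            prev = result.pop()
            result.append((dest,) + prev[1:])
        else:
            result.append(instr)
    return result

code = [
    ("t1", "inttofloat", 10, None),
    ("t2", "*", "t1", "id2"),
    ("t3", "+", "id1", "t2"),
    ("id1", "copy", "t3", None),
]
optimized = optimize(code)
```

The result corresponds to `t2 = 10.0 * id2` followed by `id1 = id1 + t2`.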
39Code Generation
[Diagram: intermediate code -> Code Generation -> target machine code]
Translate the intermediate code into a target code.
t2 = 10.0 * id2
id1 = id1 + t2
Code Generation
LDF R2, id2
MULF R2, R2, 10.0
LDF R1, id1
ADDF R1, R1, R2
STF id1, R1
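A very naive code generator for this style of target can be sketched as follows. It assumes a hypothetical machine whose LDF, MULF, ADDF, and STF instructions behave as their names suggest and accept a floating point constant as an immediate operand. It hands out registers in order without ever reusing them, so the register numbers and operand order differ from the hand-written listing above even though the computation is the same.

```python
MNEMONIC = {"+": "ADDF", "*": "MULF"}

def generate(code):
    """Translate (dest, op, arg1, arg2) instructions into assembly-like text."""
    asm = []
    reg_of = {}                  # temporary name -> register currently holding it
    next_reg = 1

    def operand(x):
        nonlocal next_reg
        if isinstance(x, float):
            return str(x)                    # immediate constant
        if x in reg_of:
            return reg_of[x]                 # temporary already in a register
        reg = f"R{next_reg}"                 # variable: load it from memory
        next_reg += 1
        asm.append(f"LDF {reg}, {x}")
        return reg

    for dest, op, a, b in code:
        ra, rb = operand(a), operand(b)
        rd = ra if ra.startswith("R") else rb    # reuse an operand register
        asm.append(f"{MNEMONIC[op]} {rd}, {ra}, {rb}")
        if dest.startswith("t"):
            reg_of[dest] = rd                # temporaries stay in registers
        else:
            asm.append(f"STF {dest}, {rd}")  # variables are stored back
    return asm

asm = generate([("t2", "*", 10.0, "id2"), ("id1", "+", "id1", "t2")])
```

Choosing which values live in which registers is the hard part of real code generation. This sketch avoids the problem by never running out of registers.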
40Other issues
Symbol tables are heavily used. You need very efficient data structures. Any ideas? In what ways might we access the symbol table?
Optimization is a major area and may be done after final code generation as well.
Compilers are large, complex pieces of software and a major task for software engineers.
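For the symbol table question, one common answer is a hash table keyed by name, paired with an array so that later phases can use a token's numeric location directly. The sketch below is only one possible design. The class name, method names, and the attributes stored per entry are assumptions, and it ignores scoping entirely.

```python
class SymbolTable:
    """Lookup by name (used by the lexer) and by location (used by later phases)."""

    def __init__(self):
        self._location_of = {}   # name -> 1-based location; dict gives average O(1) lookup
        self._entries = []       # location - 1 -> attributes of that symbol

    def intern(self, name):
        """Return the location of name, adding a new entry on first sight."""
        if name not in self._location_of:
            self._entries.append({"name": name, "type": None})
            self._location_of[name] = len(self._entries)
        return self._location_of[name]

    def entry(self, location):
        """Attributes for the symbol at a location, as in <id, location>."""
        return self._entries[location - 1]

table = SymbolTable()
speed = table.intern("speed")
time_ = table.intern("time")
table.entry(speed)["type"] = "float"     # filled in once the declaration is seen
```

Both access paths are constant time on average, which matters because every identifier occurrence in the program goes through the table.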