Languages and Compilers (SProg og Oversættere) - PowerPoint PPT Presentation

1
Languages and Compilers(SProg og Oversættere)
  • Bent Thomsen
  • Department of Computer Science
  • Aalborg University

With acknowledgement to Norm Hutchinson whose
slides this lecture is based on.
2
Today's lecture
  • Three topics
  • Treating Compilers and Interpreters as
    black-boxes
  • Tombstone- or T- diagrams
  • A first look inside the black-box
  • Your guided tour
  • Some Language Design Issues

3
Terminology
Q: Which programming languages play a role in
this picture?
[Diagram: source program (input) -> Translator -> object program (output)]
A: All of them!
4
Tombstone Diagrams
  • What are they?
  • diagrams consisting of a set of puzzle
    pieces we can use to reason about language
    processors and programs
  • different kinds of pieces
  • combination rules (not all diagrams are well
    formed)

5
Tombstone diagrams: Combination rules
6
Compilation
Example: Compilation of C programs on an x86
machine
[Tombstone diagram; surviving label: x86]
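The combination rules can be made concrete with a small model. The following is a minimal sketch in Java, not taken from the slides: the class Tombstones and all of its members are hypothetical names, and it encodes only two rules (a program can run on a machine only if it is expressed in that machine's code; a translator can process a program only if the program is expressed in the translator's source language).

```java
// Hypothetical model of tombstone pieces; every name here is illustrative.
final class Tombstones {
    /** An ordinary program, expressed in language lang. */
    static final class Program {
        final String name, lang;
        Program(String name, String lang) { this.name = name; this.lang = lang; }
    }

    /** A translator from source to target, itself expressed in language lang. */
    static final class Translator {
        final String source, target, lang;
        Translator(String source, String target, String lang) {
            this.source = source; this.target = target; this.lang = lang;
        }
    }

    /** Rule 1: something runs on a machine only if it is expressed in that machine's code. */
    static boolean canRun(String implLang, String machine) {
        return implLang.equals(machine);
    }

    /** Rule 2: the translator must run on the machine and accept the program's language. */
    static Program translate(Program p, Translator t, String machine) {
        if (!canRun(t.lang, machine))
            throw new IllegalArgumentException("translator is not expressed in " + machine + " code");
        if (!p.lang.equals(t.source))
            throw new IllegalArgumentException("program is not expressed in " + t.source);
        return new Program(p.name, t.target);   // same program, now in the target language
    }

    public static void main(String[] args) {
        Translator cCompiler = new Translator("C", "x86", "x86");
        Program tetris = translate(new Program("Tetris", "C"), cCompiler, "x86");
        System.out.println(tetris.name + " is now expressed in " + tetris.lang);
    }
}
```

An ill-formed combination, such as running a PPC-coded translator on an x86 machine, is rejected with an exception instead of producing a result.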
7
What is Tetris?
Tetris: The World's Most Popular Video Game. Since
its commercial introduction in 1987, Tetris has
been established as the largest selling and most
recognized global brand in the history of the
interactive game software industry. Simple,
entertaining, and yet challenging, Tetris can be
found on more than 60 platforms. Over 65 million
Tetris units have been sold worldwide to date.
8
Cross compilation
Example: A C cross compiler from x86 to PPC.
A cross compiler is a compiler which runs on one
machine (the host machine) but emits code for
another machine (the target machine).
[Tombstone diagram; surviving label: x86]
Q: Are cross compilers useful? Why would/could we
use them?
9
Two Stage Compilation
A two-stage translator is a composition of two
translators. The output of the first translator
is provided as input to the second translator.
[Tombstone diagram; surviving label: x86]
10
Compiling a Compiler
Observation: A compiler is a program! Therefore
it can be provided as input to a language
processor.
Example: compiling a compiler.
11
Interpreters
An interpreter is a language processor
implemented in software, i.e. as a
program.
Terminology: abstract (or virtual)
machine versus real machine.
Example: The Java
Virtual Machine
[Tombstone diagram; surviving labels: JVM x86, x86]
Q: Why are abstract machines useful?
12
Interpreters
Q: Why are abstract machines useful?
1) Abstract machines provide better platform independence.
[Tombstone diagrams; surviving labels: JVM x86, JVM PPC, x86, PPC]
13
Interpreters
Q: Why are abstract machines useful?
2) Abstract machines are useful for testing and
debugging.
Example: Testing the Ultima
processor using hardware emulation.
[Tombstone diagram; surviving labels: Ultima x86, Ultima, x86; caption: Functional equivalence]
Note: we don't have to implement the Ultima emulator
in x86 code; we can write it in a high-level language and
compile it.
14
Interpreters versus Compilers
Q: What are the tradeoffs between compilation and
interpretation?
  • Compilers typically offer more advantages when
  • programs are deployed in a production setting
  • programs are repetitive
  • the instructions of the programming language are
    complex
  • Interpreters typically are a better choice when
  • we are in a development/testing/debugging stage
  • programs are run once and then discarded
  • the instructions of the language are simple
  • the execution speed is overshadowed by other
    factors
  • e.g. on a web server where communication costs
    are much higher than execution costs

15
Interpretive Compilers
  • Why?
  • A tradeoff between fast(er) compilation and a
    reasonable runtime performance.
  • How?
  • Use an intermediate language
  • more high-level than machine code => easier to
    compile to
  • more low-level than source language => easy to
    implement as an interpreter
  • Example: A Java Development Kit for machine M

[Tombstone diagrams; surviving labels: Java->JVM, JVM M, M]
16
Interpretive Compilers
Example: Here is how we use our Java Development
Kit to run a Java program P.
[Tombstone diagrams; surviving labels: JVM M, M, M]
17
Portable Compilers
Example: Two different Java Development Kits
[Tombstone diagrams for Kit 1 and Kit 2; surviving label in each: JVM M]
Q: Which one is more portable?
18
Portable Compilers
  • In the previous example we have seen that
    portability is not an all or nothing kind of
    deal.
  • It is useful to talk about a degree of
    portability as the percentage of code that
    needs to be re-written when moving to a
    dissimilar machine.
  • In practice 100% portability is all but
    impossible.

19
Example: a portable compiler kit
Portable Compiler Kit
[Tombstone diagrams; surviving label: JVM Java]
Q: Suppose we want to run this kit on some
machine M. How could we go about realizing that
goal (with the least amount of effort)?
20
Example: a portable compiler kit
[Tombstone diagrams; surviving label: JVM Java]
Q: Suppose we want to run this kit on some
machine M. How could we go about realizing that
goal (with the least amount of effort)?
[Tombstone diagram; surviving labels: JVM M, M]
21
Example: a portable compiler kit
This is what we have now:
[Tombstone diagrams; surviving labels: JVM Java, JVM M]
Now, how do we run our Tetris program?
22
Bootstrapping
Remember our portable compiler kit:
[Tombstone diagrams; surviving labels: JVM Java, JVM M]
23
Bootstrapping
Q: What can we do with a compiler written in
itself? Is that useful at all?
[Tombstone diagram; caption: Same language!]
  • By implementing the compiler in (a subset of) its
    own language, we become less dependent on the
    target platform => more portable implementation.
  • But isn't this a chicken-and-egg problem? How do we
    get around that?
  • => BOOTSTRAPPING: requires some work to make the
    first egg.
  • There are many possible variations on how to
    bootstrap a compiler written in its own language.

24
Bootstrapping an Interpretive Compiler to
Generate M code
Our portable compiler kit:
[Tombstone diagrams; surviving labels: JVM Java, JVM M, M]
25
Bootstrapping an Interpretive Compiler to
Generate M code (first approach)
Step 1: implement [diagram] by rewriting [diagram]
Step 2: compile it
[Tombstone diagram; surviving labels: JVM M, M]
Step 3: use this to compile again
26
Bootstrapping an Interpretive Compiler to
Generate M code (first approach)
Step 3: Self-compile the Java (in Java) compiler
[Tombstone diagram; surviving labels: JVM M, M]
27
Bootstrapping an Interpretive Compiler to
Generate M code (second approach)
Idea: we will build a two-stage Java -> M
compiler.
We will make this by compiling [diagram]
To get this we implement [diagram]
and compile it.
28
Bootstrapping an Interpretive Compiler to
Generate M code (second approach)
Step 1: implement [diagram]
Step 2: compile it
[Tombstone diagram; surviving labels: JVM M, M]
Step 3: compile this [diagram]
29
Bootstrapping an Interpretive Compiler to
Generate M code (second approach)
Step 3: Self-compile the JVM (in JVM) compiler
[Tombstone diagram; surviving labels: JVM M, M]
30
Bootstrapping an Interpretive Compiler to
Generate M code
Step 4: Compile the Java->JVM compiler into
machine code
[Tombstone diagram; surviving label: M]
We are DONE!
31
Full Bootstrap
A full bootstrap is necessary when we are
building a new compiler from scratch.
Example: We want to implement an Ada compiler for machine M.
We don't currently have access to any Ada
compiler (not on M, nor on any other
machine).
Idea: Ada is very large, so we will
implement the compiler in a subset of Ada (Ada-S) and
bootstrap it from an Ada-S compiler written in
another language (e.g. C).
Step 1: build a compiler for Ada-S in another
language
32
Full Bootstrap
33
Full Bootstrap
Q: Is it hard to rewrite the compiler in Ada-S?
We are now no longer dependent on the
availability of a C compiler!
34
Full Bootstrap
Step 3a: Build a full Ada compiler in Ada-S
Step 3b: Compile with v2 compiler
[Tombstone diagram; surviving label: M]
From this point on we can maintain the compiler
in Ada. We write subsequent versions v4, v5, ... of the
compiler in Ada and compile each with the
previous version.
35
Half Bootstrap
We discussed full bootstrap, which is required
when we have no access to a compiler for our
language at all.
Q: What if we have access to a
compiler for our language on a different machine
HM but want to develop one for TM?
We have: [diagram]
We want: [diagram]
Idea: We can use cross compilation from HM to TM
to bootstrap the TM compiler.
36
Half Bootstrap
Idea: We can use cross compilation from HM to TM
to bootstrap the TM compiler.
Step 1: Implement Ada->TM compiler in Ada
Step 2: Compile on HM
[Tombstone diagram; surviving labels: Ada->TM, HM, HM]
37
Half Bootstrap
Step 3: Cross compile our TM compiler.
[Tombstone diagram; surviving labels: Ada->TM, TM, HM]
From now on we can develop subsequent versions of
the compiler completely on TM
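The half bootstrap can be traced step by step with a small model. This is a minimal sketch under stated assumptions, not the lecture's own material: the class HalfBootstrap, its Compiler type and the compile method are hypothetical, and a compiler is reduced to three labels (source language, target language, language it is expressed in).

```java
// Hypothetical model of the half bootstrap; all names are illustrative.
final class HalfBootstrap {
    /** A compiler from source to target, expressed in language lang. */
    static final class Compiler {
        final String source, target, lang;
        Compiler(String source, String target, String lang) {
            this.source = source; this.target = target; this.lang = lang;
        }
        @Override public String toString() {
            return source + "->" + target + " expressed in " + lang;
        }
    }

    /** Run tool on machine to translate the compiler subject. */
    static Compiler compile(Compiler subject, Compiler tool, String machine) {
        if (!tool.lang.equals(machine))
            throw new IllegalStateException(tool + " cannot run on " + machine);
        if (!subject.lang.equals(tool.source))
            throw new IllegalStateException(tool + " cannot read " + subject.lang);
        // The subject keeps its source and target; only its implementation language changes.
        return new Compiler(subject.source, subject.target, tool.target);
    }

    static Compiler bootstrap() {
        Compiler existing = new Compiler("Ada", "HM", "HM");  // what we have
        Compiler written = new Compiler("Ada", "TM", "Ada");  // step 1: written in Ada
        Compiler cross = compile(written, existing, "HM");    // step 2: Ada->TM expressed in HM
        return compile(written, cross, "HM");                 // step 3: Ada->TM expressed in TM
    }

    public static void main(String[] args) {
        System.out.println(bootstrap());
    }
}
```

Both compilations run on HM; only the final result is expressed in TM code and can be carried over to the target machine.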
38
Bootstrapping to Improve Efficiency
The efficiency of programs and compilers:
Efficiency of programs
- memory usage
- runtime
Efficiency of compilers
- efficiency of the compiler itself
- efficiency of the emitted code
Idea: We start from a simple compiler (generating
inefficient code) and develop more sophisticated
versions of it. We can then use bootstrapping to
improve the performance of the compiler.
39
Bootstrapping to Improve Efficiency
We have: [diagram]
We implement: [diagram]
40
Conclusion
  • To write a good compiler you may have to write
    several simpler ones first
  • You have to think about the source language, the
    target language and the implementation language.
  • Strategies for implementing a compiler
  • Write it in machine code
  • Write it in a lower level language and compile it
    using an existing compiler
  • Write it in the same language that it compiles
    and bootstrap
  • The work of a compiler writer is never finished;
    there is always version 1.x and version 2.0 and ...

41
Compilation
  • So far we have treated language processors
    (including compilers) as black boxes
  • Now we take a first look "inside the box": how
    are compilers built?
  • And we take a look at the different phases and
    their relationships

42
The Phases of a Compiler
Source Program
  -> Syntax Analysis (may emit Error Reports)
  -> Abstract Syntax Tree
  -> Contextual Analysis (may emit Error Reports)
  -> Decorated Abstract Syntax Tree
  -> Code Generation
  -> Object Code
43
Different Phases of a Compiler
  • The different phases can be seen as different
    transformation steps to transform source code
    into object code.
  • The different phases correspond roughly to the
    different parts of the language specification
  • Syntax analysis <-> Syntax
  • Contextual analysis <-> Contextual constraints
  • Code generation <-> Semantics

44
Example Program
  • We now look at each of the three different phases
    in a little more detail. We look at each of the
    steps in transforming an example Triangle program
    into TAM code.

! This program is useless except for
! illustration
let var n: Integer;
    var c: Char
in begin
  c := '&';
  n := n+1
end
45
1) Syntax Analysis
Source Program -> Syntax Analysis -> Abstract Syntax Tree
(Syntax Analysis may also emit Error Reports)
Note: Not all compilers construct an explicit
representation of an AST (e.g. a single-pass
compiler generally has no need to construct an AST).
46
1) Syntax Analysis -> AST
[Figure: AST of the example program. Node labels: Program, LetCommand, SequentialDeclaration, VarDecl, SimpleT, SequentialCommand, AssignCommand, SimpleV, Char.Expr, Char.Lit, BinaryExpr, VNameExp, Op, Int.Expr, Int.Lit, Ident. Surviving leaves: n, Integer, c, Char, c, n, n, 1]
47
2) Contextual Analysis -> Decorated AST
Abstract Syntax Tree -> Contextual Analysis -> Decorated Abstract Syntax Tree
(Contextual Analysis may also emit Error Reports)
  • Contextual analysis
  • Scope checking: verify that all applied
    occurrences of identifiers are declared
  • Type checking: verify that all operations in the
    program are used according to their type rules.
  • Annotate AST
  • Applied identifier occurrences => declaration
  • Expressions => Type
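
The two checks above can be sketched in a few lines. This is a minimal illustration, not the Triangle compiler's actual Checker: the class CheckerSketch and its methods are hypothetical, types are plain strings, and an expression's type is assumed to be known already when an assignment is checked.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, much simplified contextual analyzer.
final class CheckerSketch {
    // Identification table: declared identifier -> its type.
    private final Map<String, String> declared = new HashMap<String, String>();

    void declare(String id, String type) {
        declared.put(id, type);
    }

    /** Scope check: every applied occurrence must have a declaration. Returns its type. */
    String typeOfVar(String id) {
        String type = declared.get(id);
        if (type == null)
            throw new IllegalStateException("SCOPE ERROR: undeclared variable " + id);
        return type;
    }

    /** Type check for "id := expr", given the type already inferred for expr. */
    void checkAssign(String id, String exprType) {
        String varType = typeOfVar(id);
        if (!varType.equals(exprType))
            throw new IllegalStateException("TYPE ERROR: incompatible types in assignment to " + id);
    }

    public static void main(String[] args) {
        CheckerSketch checker = new CheckerSketch();
        checker.declare("n", "int");
        checker.declare("c", "char");
        checker.checkAssign("c", "char");   // accepted
        checker.checkAssign("n", "int");    // accepted
        System.out.println("no contextual errors");
    }
}
```

Assigning a char expression to n, or referring to an undeclared identifier such as foo, makes the sketch throw with a message naming the kind of error.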

48
2) Contextual Analysis -> Decorated AST
[Figure: the AST of the example program, decorated with types. Node labels: Program, LetCommand, SequentialDeclaration, VarDecl, SimpleT, SequentialCommand, AssignCommand, SimpleV, Char.Expr, Char.Lit, BinaryExpr, VNameExp, Op, Int.Expr, Int.Lit, Ident. Type annotations: int, char. Surviving leaves: n, c, n, n, Integer, Char, c, 1]
49
Contextual Analysis
  • Finds scope and type errors.

Example 1: [AssignCommand whose two sides have types char and int]
TYPE ERROR (incompatible types in AssignCommand)
Example 2: [SimpleV with Ident foo; foo not found]
SCOPE ERROR: undeclared variable foo
50
3) Code Generation
Decorated Abstract Syntax Tree -> Code Generation -> Object Code
  • Assumes that program has been thoroughly checked
    and is well formed (scope & type rules)
  • Takes into account semantics of the source
    language as well as the target language.
  • Transforms source program into target code.

51
3) Code Generation
let var n: Integer;
    var c: Char
in begin
  c := '&';
  n := n+1
end

PUSH 2
LOADL 38
STORE 1[SB]
LOAD 0[SB]
LOADL 1
CALL add
STORE 0[SB]
POP 2
HALT

[Figure residue: the declaration of n (Ident n, Ident Integer) is annotated with address 0[SB]]
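How the code generator could arrive at this instruction sequence can be sketched as follows. This is a minimal illustration under stated assumptions, not the Triangle Encoder: the class CodeGenSketch and its methods are hypothetical, every variable is assumed to occupy one word addressed relative to the stack base (written d[SB]), and only two statement shapes are supported.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical code generator for two statement shapes only.
final class CodeGenSketch {
    private final Map<String, Integer> address = new LinkedHashMap<String, Integer>();
    private final List<String> body = new ArrayList<String>();

    /** Allocate the next free word relative to the stack base (SB). */
    void declare(String id) {
        address.put(id, address.size());
    }

    /** Emit code for "id := literal". */
    void assignLiteral(String id, int literal) {
        body.add("LOADL " + literal);
        body.add("STORE " + address.get(id) + "[SB]");
    }

    /** Emit code for "id := id + literal". */
    void assignIncrement(String id, int literal) {
        body.add("LOAD " + address.get(id) + "[SB]");
        body.add("LOADL " + literal);
        body.add("CALL add");
        body.add("STORE " + address.get(id) + "[SB]");
    }

    /** Wrap the body in storage allocation and deallocation for the declared variables. */
    List<String> finish() {
        List<String> code = new ArrayList<String>();
        code.add("PUSH " + address.size());
        code.addAll(body);
        code.add("POP " + address.size());
        code.add("HALT");
        return code;
    }

    static List<String> example() {
        CodeGenSketch gen = new CodeGenSketch();
        gen.declare("n");              // address 0[SB]
        gen.declare("c");              // address 1[SB]
        gen.assignLiteral("c", 38);    // 38 is the character code loaded by LOADL 38
        gen.assignIncrement("n", 1);
        return gen.finish();
    }

    public static void main(String[] args) {
        for (String instruction : example()) System.out.println(instruction);
    }
}
```

The design point is that addresses are decided at declaration time, so each later use of n or c only needs a table lookup.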
52
Compiler Passes
  • A pass is a complete traversal of the source
    program, or a complete traversal of some internal
    representation of the source program.
  • A pass can correspond to a phase but it does
    not have to!
  • Sometimes a single pass corresponds to several
    phases that are interleaved in time.
  • What and how many passes a compiler does over the
    source program is an important design decision.

53
Single Pass Compiler
A single pass compiler makes a single pass over
the source text, parsing, analyzing and
generating code all at once.
Dependency diagram of a typical Single Pass
Compiler:
Compiler Driver calls Syntactic Analyzer
Syntactic Analyzer calls Contextual Analyzer
Syntactic Analyzer calls Code Generator
54
Multi Pass Compiler
A multi pass compiler makes several passes over
the program. The output of a preceding phase is
stored in a data structure and used by subsequent
phases.
Dependency diagram of a typical Multi Pass
Compiler:
Compiler Driver calls Syntactic Analyzer
Compiler Driver calls Contextual Analyzer
Compiler Driver calls Code Generator
55
Example The Triangle Compiler Driver
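public class Compiler {
    public static void compileProgram(...) {
        // One object per phase.
        Parser parser = new Parser(...);
        Checker checker = new Checker(...);
        Encoder generator = new Encoder(...);

        Program theAST = parser.parse();   // syntax analysis builds the AST
        checker.check(theAST);             // contextual analysis decorates it
        generator.encode(theAST);          // code generation
    }

    public static void main(String[] args) {
        ... compileProgram(...) ...
    }
}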
56
Compiler Design Issues
                     Single Pass                 Multi Pass
Speed                better                      worse
Memory               better for large programs   (potentially) better for small programs
Modularity           worse                       better
Flexibility          worse                       better
Global optimization  impossible                  possible
Source Language      single pass compilers are not possible for many
                     programming languages
57
Language Issues
  • Example: Pascal
  • Pascal was explicitly designed to be easy to
    implement with a single pass compiler
  • Every identifier must be declared before its
    first use.

Rejected (Undeclared Variable!):
  procedure inc;
  begin n := n+1 end;
  var n: integer;

Accepted:
  var n: integer;
  procedure inc;
  begin n := n+1 end;
58
Language Issues
  • Example: Pascal
  • Every identifier must be declared before it is
    used.
  • How to handle mutual recursion then?

procedure ping(x: integer);
begin ... pong(x-1) ... end;

procedure pong(x: integer);
begin ... ping(x) ... end;
59
Language Issues
  • Example: Pascal
  • Every identifier must be declared before it is
    used.
  • How to handle mutual recursion then?

procedure pong(x: integer); forward;

procedure ping(x: integer);
begin ... pong(x-1) ... end;

procedure pong(x: integer);
begin ... ping(x) ... end;

OK!
60
Language Issues
  • Example: Java
  • identifiers can be used before they are declared.
  • thus a Java compiler needs at least two passes

class Example {
  void inc() { n = n + 1; }
  int n;
  void use() { n = 0; inc(); }
}
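Why use-before-declaration forces a second pass can be shown with a toy checker. This is a minimal sketch, not how any real Java compiler is organized: the class TwoPassSketch is hypothetical, and a class body is flattened into a list of strings of the form "decl:name" or "use:name" (an encoding invented for this example).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy checker comparing one pass with two passes.
final class TwoPassSketch {
    /** One pass: a use is an error unless its declaration was already seen. */
    static List<String> singlePassErrors(List<String> members) {
        Set<String> declared = new HashSet<String>();
        List<String> errors = new ArrayList<String>();
        for (String m : members) {
            if (m.startsWith("decl:")) declared.add(m.substring(5));
            else if (!declared.contains(m.substring(4))) errors.add(m.substring(4));
        }
        return errors;
    }

    /** Two passes: collect all declarations first, then check the uses. */
    static List<String> twoPassErrors(List<String> members) {
        Set<String> declared = new HashSet<String>();
        for (String m : members)                        // pass 1: declarations
            if (m.startsWith("decl:")) declared.add(m.substring(5));
        List<String> errors = new ArrayList<String>();
        for (String m : members)                        // pass 2: uses
            if (m.startsWith("use:") && !declared.contains(m.substring(4)))
                errors.add(m.substring(4));
        return errors;
    }

    public static void main(String[] args) {
        // Flattened form of class Example: inc() uses n before the field n is declared.
        List<String> example = Arrays.asList(
            "decl:inc", "use:n", "decl:n", "decl:use", "use:n", "use:inc");
        System.out.println("single pass reports: " + singlePassErrors(example));
        System.out.println("two passes report:   " + twoPassErrors(example));
    }
}
```

The single pass wrongly reports n as undeclared, while the two-pass version accepts the class.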
61
Scope of Variable
  • Range of program that can reference that variable
    (i.e. access the corresponding data object by the
    variable's name)
  • Variable is local to program or block if it is
    declared there
  • Variable is nonlocal to program unit if it is
    visible there but not declared there

62
Static vs. Dynamic Scope
  • Under static, sometimes called lexical, scope,
    sub1 will always reference the x defined in big
  • Under dynamic scope, the x it references depends
    on the dynamic state of execution
procedure big;
  var x: integer;
  procedure sub1;
  begin {sub1}
    ... x ...
  end; {sub1}
  procedure sub2;
    var x: integer;
  begin {sub2}
    ...
    sub1;
    ...
  end; {sub2}
begin {big}
  ...
  sub1;
  sub2;
  ...
end; {big}
63
Static Scoping
  • Scope computed at compile time, based on program
    text
  • To resolve the name of a used variable we must
    find the statement declaring that variable
  • Subprograms and blocks generate hierarchy of
    scopes
  • Subprogram or block that declares current
    subprogram or contains current block is its
    static parent
  • General procedure to find declaration
  • First see if variable is local if yes, done
  • If non-local to current subprogram or block
    recursively search static parent until
    declaration is found
  • If no declaration is found this way, undeclared
    variable error detected
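
The general procedure above maps directly onto a chain of scopes. The following is a minimal sketch in Java, with hypothetical names (StaticScope, declare, lookup); each scope stores only a descriptive string per declaration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scope chain for static (lexical) scoping.
final class StaticScope {
    private final StaticScope parent;   // static parent; null for the outermost scope
    private final Map<String, String> locals = new HashMap<String, String>();

    StaticScope(StaticScope parent) {
        this.parent = parent;
    }

    void declare(String name, String declaration) {
        locals.put(name, declaration);
    }

    /** Local first, then recursively the static parent, else an error. */
    String lookup(String name) {
        if (locals.containsKey(name)) return locals.get(name);
        if (parent != null) return parent.lookup(name);
        throw new IllegalStateException("undeclared variable " + name);
    }

    public static void main(String[] args) {
        StaticScope main = new StaticScope(null);
        main.declare("x", "x declared in main");
        StaticScope sub1 = new StaticScope(main);
        sub1.declare("x", "x declared in sub1");
        StaticScope sub2 = new StaticScope(main);       // declares no x of its own
        System.out.println(sub1.lookup("x"));
        System.out.println(sub2.lookup("x"));
    }
}
```

The parent link is fixed when the scope is created from the program text, which is why the lookup can be done entirely at compile time.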

64
Example
  • begin main
  • x
  • end main
  • program main
  • var x integer
  • procedure sub1
  • var x integer
  • begin sub1
  • x
  • end sub1

65
Dynamic Scope
  • Now generally thought to have been a mistake
  • Main example of use: original versions of LISP
  • Scheme uses static scope
  • Perl allows variables to be declared to have
    dynamic scope
  • Determined by the calling sequence of program
    units, not static layout
  • Name bound to corresponding variable most
    recently declared among still active subprograms
    and blocks

66
Example
program main;
  var x: integer;
  procedure sub1;
  begin {sub1}
    x
  end; {sub1}
  procedure sub2;
    var x: integer;
  begin {sub2}
    call sub1
  end; {sub2}
begin {main}
  call sub2
end. {main}
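
The call sequence in this example can be simulated to see which x dynamic scoping selects. This is a minimal sketch with hypothetical names (DynamicScope, enter, exit); each active subprogram is a frame on a stack, and the values 1 and 2 are made up to tell the two declarations of x apart.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of dynamic scoping with a stack of active frames.
final class DynamicScope {
    private final Deque<Map<String, Integer>> active = new ArrayDeque<Map<String, Integer>>();

    void enter() { active.push(new HashMap<String, Integer>()); }   // subprogram becomes active
    void exit() { active.pop(); }                                   // subprogram returns
    void declare(String name, int value) { active.peek().put(name, value); }

    /** Most recently declared binding among the still-active subprograms. */
    int lookup(String name) {
        for (Map<String, Integer> frame : active)   // iterates from the most recent frame
            if (frame.containsKey(name)) return frame.get(name);
        throw new IllegalStateException("undeclared variable " + name);
    }

    /** Replays: main declares x, calls sub2; sub2 declares x, calls sub1; sub1 reads x. */
    static int example() {
        DynamicScope scope = new DynamicScope();
        scope.enter(); scope.declare("x", 1);   // main's x
        scope.enter(); scope.declare("x", 2);   // sub2's x
        scope.enter();                          // sub1 has no x of its own
        int seen = scope.lookup("x");
        scope.exit(); scope.exit(); scope.exit();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("sub1 sees x = " + example());
    }
}
```

Under dynamic scope sub1 sees the x of sub2 (value 2 here); under static scope the same reference would resolve to the x of main.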

67
Binding
  • Binding: an association between an attribute and
    its entity
  • Binding Time: when does it happen?
  • and, when can it happen?

68
Binding of Data Objects and Variables
  • Attributes of data objects and variables have
    different binding times
  • If a binding is made before run time and remains
    fixed through execution, it is called static
  • If the binding first occurs or can change during
    execution, it is called dynamic

69
Binding Time
  • Static
  • Language definition time
  • Language implementation time
  • Program writing time
  • Compile time
  • Link time
  • Load time
  • Dynamic
  • Run time
  • At the start of execution (program)
  • On entry to a subprogram or block
  • When the expression is evaluated
  • When the data is accessed

70
X = X + 10
  • Set of types for variable X
  • Type of variable X
  • Set of possible values for variable X
  • Value of variable X
  • Scope of X
  • lexical or dynamic scope
  • Representation of constant 10
  • Value (10)
  • Value representation (1010 in base 2)
  • big-endian vs. little-endian
  • Type (int)
  • Storage (4 bytes)
  • stack or global allocation
  • Properties of the operator
  • Overloaded or not

71
Little- vs. Big-Endians
  • Big-endian
  • A computer architecture in which, within a given
    multi-byte numeric representation, the most
    significant byte has the lowest address (the word
    is stored 'big-end-first').
  • Motorola and Sun processors
  • Little-endian
  • A computer architecture in which, within a given
    16- or 32-bit word, bytes at lower addresses have
    lower significance (the word is stored
    'little-end-first').
  • Intel processors
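
The difference is easy to observe from Java, whose java.nio.ByteBuffer lets the byte order be chosen explicitly. The following is a minimal sketch; the class name EndianSketch and the helper encode are hypothetical, while ByteBuffer and ByteOrder are standard library classes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper showing the byte layout of a 32-bit integer.
final class EndianSketch {
    /** Returns the four bytes of value, lowest address first, in the given byte order. */
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        // The constant 10 from the earlier slide, stored as a 4-byte int.
        System.out.println("big-endian:    " + Arrays.toString(encode(10, ByteOrder.BIG_ENDIAN)));
        System.out.println("little-endian: " + Arrays.toString(encode(10, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

In big-endian order the only non-zero byte is at the highest address; in little-endian order it is at the lowest.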

from The Jargon Dictionary - http://info.astrian.net/jargon
72
Binding Times summary
  • Language definition time
  • language syntax and semantics, scope discipline
  • Language implementation time
  • interpreter versus compiler,
  • aspects left flexible in definition,
  • set of available libraries
  • Compile time
  • some initial data layout, internal data
    structures
  • Link time (load time)
  • binding of values to identifiers across program
    modules
  • Run time (execution time)
  • actual values assigned to non-constant
    identifiers

The Programming language designer and compiler
implementer have to make decisions about binding
times
73
Syntax Design Criteria
  • Readability
  • syntactic differences reflect semantic
    differences
  • verbose, redundant
  • Writeability
  • concise
  • Ease of verifiability
  • simple semantics
  • Ease of translation
  • simple language
  • simple semantics
  • Lack of ambiguity
  • dangling else
  • Fortran's A(I,J)

74
Lexical Elements
  • Character set
  • Identifiers
  • Operators
  • Keywords
  • Noise words
  • Elementary data
  • numbers
  • integers
  • floating point
  • strings
  • symbols
  • Delimiters
  • Comments
  • Blank space
  • Layout
  • Free- and fixed-field formats

75
Some nitty gritty decisions
  • Primitive data
  • Integers, floating points, bit strings
  • Machine dependent or independent (standards like
    IEEE)
  • Boxed or unboxed
  • Character set
  • ASCII, EBCDIC, UNICODE
  • Identifiers
  • Length, special start symbol (,...), type
    encoded in start letter
  • Operator symbols
  • Infix, prefix, postfix, precedence
  • Comments
  • REM, /* */, //, !, ...
  • Blanks
  • Delimiters and brackets
  • Reserved words or Keywords

76
Syntactic Elements
  • Definitions
  • Declarations
  • Expressions
  • Statements
  • Separate subprogram definitions (Module system)
  • Separate data definitions
  • Nested subprogram definitions
  • Separate interface definitions

77
Overall Program Structure
  • Subprograms
  • shallow definitions
  • C
  • nested definitions
  • Pascal
  • Data (OO)
  • shallow definitions
  • C++, Java, Smalltalk
  • Separate Interface
  • C, Fortran
  • ML, Ada
  • Mixed data and programs
  • C
  • Basic
  • Others
  • Cobol
  • Data description separated from executable
    statements
  • Data and procedure division

78
Some more Programming Language Design Issues
  • A Programming model (sometimes called the
    computer) is defined by the language semantics
  • More about this in the semantics course
  • Programming model given by the underlying system
  • Hardware platform and operating system
  • The mapping between these two programming models
    (or computers) that the language processing
    system must define can be influenced in both
    directions
  • E.g. low level features in high level languages
  • Pointers, arrays, for-loops
  • Hardware support for fast procedure calls

79
Programming Language Implementation
  • Develop layers of machines, each more primitive
    than the previous
  • Translate between successive layers
  • End at basic layer
  • Ultimately hardware machine at bottom
  • To design programming languages and compilers, we
    thus need to understand a bit about computers :-)

80
Why So Many Computers?
  • It is economically feasible to produce in
    hardware (or firmware) only relatively simple
    computers
  • More complex or abstract computers are built in
    software
  • There are exceptions
  • EDS machine to run Prolog (or rather WAM)
  • Alice Machine to run Hope

81
Machines
  • Hardware: computer built out of wires, gates,
    circuit boards, etc.
  • An elaboration of the Von Neumann Machine
  • Software-simulated: computer that is implemented in
    software, which runs on top of another computer

Von Neumann Machine
  • Data
  • Primitive Operations
  • Sequence Control
  • Data Access
  • Storage Management
  • Operating Environment

82
Memory and data
  • Memory
  • Registers
  • PC, data or address
  • Main memory (fixed length words 32 or 64 bits)
  • Cache
  • External
  • Disc, CD-ROM, memory stick, tape drives
  • Order of magnitude in access speed
  • Nanoseconds vs. milliseconds
  • Built-in data types
  • integers, floating point, fixed length strings,
    fixed length bit strings

83
Hardware computer
  • Operations
  • Arithmetic on primitive data
  • Tests (test for zero, positive or negative)
  • Primitive access and modification
  • Jumps (unconditional, conditional, return)
  • Sequence control
  • Next instruction in PC (location counter)
  • Some instructions modify PC
  • Data access
  • Reading and writing
  • Words from main memory, Blocks from external
    storage
  • Storage management
  • Wait for data or multi-programming
  • Paging
  • Cache (32K usually gives 95% hit rate)

84
Virtual Computers
  • How can we execute programs written in the
    high-level computer, given that all we have is
    the low-level computer?
  • Compilation
  • Translate instructions of the high-level computer
    into those of the low-level computer
  • Simulation (interpretation)
  • create a virtual machine
  • Sometimes the simulation is done by hardware
  • This is called firmware

85
Micro Program interpretation and execution
Fetch next instruction
Decode instruction Operation and operands
Fetch designated operands
Branch to designated operation
Execute Primitive Operation
Execute Primitive Operation
Execute Primitive Operation
Execute Primitive Operation
Execute halt
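The cycle above is the same loop that any software interpreter runs. The following is a minimal sketch of it for a made-up four-instruction stack machine; the class FetchExecuteSketch and the instruction names LOADL, ADD, JUMP and HALT are assumptions chosen for illustration, not a real instruction set.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stack machine illustrating the fetch-decode-execute cycle.
final class FetchExecuteSketch {
    /** Runs the program and returns the value on top of the stack at HALT. */
    static int run(String[] program) {
        Deque<Integer> stack = new ArrayDeque<Integer>();
        int pc = 0;                                    // program counter
        while (true) {
            String instruction = program[pc++];        // fetch next instruction, advance PC
            String[] parts = instruction.split(" ");   // decode: operation and operand
            String op = parts[0];
            switch (op) {                              // branch to designated operation
                case "LOADL": stack.push(Integer.parseInt(parts[1])); break;
                case "ADD":   stack.push(stack.pop() + stack.pop()); break;
                case "JUMP":  pc = Integer.parseInt(parts[1]); break;   // modifies the PC
                case "HALT":  return stack.peek();     // assumes a value is on the stack
                default: throw new IllegalArgumentException("unknown operation " + op);
            }
        }
    }

    public static void main(String[] args) {
        String[] program = { "LOADL 2", "LOADL 3", "ADD", "HALT" };
        System.out.println(run(program));
    }
}
```

Most instructions simply let the program counter advance; JUMP is the one operation here that modifies it, matching the sequence-control description on the hardware slide.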
86
A Six-Level Computer
Level 5  Application Level (Applications)
           | Compilers, Editors, Navigators
Level 4  Assembly Language Level
           | Assembler, Linker, Loader
Level 3  Operating System Machine Level
           | Operating System
Level 2  Instruction Set Architecture Level
           | Microprogram or hardware
Level 1  Microarchitecture Level
           | Hardware
Level 0  Digital Logic Level
(Side labels in the figure: Software for the upper levels, Hardware for the lower levels.)
from Andrew S. Tanenbaum, Structured Computer
Organization, 4th Edition, Prentice Hall, 1999.
87
Keep in mind
  • There are many issues influencing the design of a
    new programming language
  • Choice of paradigm
  • Syntactic preferences
  • Even the compiler implementation
  • e.g. number of passes
  • available tools
  • There are many issues influencing the design of
    a new compiler
  • Number of passes
  • The source, target and implementation language
  • Available tools

88
Some advice from an expert
  • Programming languages are for people
  • Design for yourself and your friends
  • Give the programmer as much control as possible
  • Aim for brevity