Title: Introduction to Compilers
1Introduction to Compilers
2Related Area
- Programming languages
- Machine architecture
- Language theory
- Algorithms
- Data structures
- Operating systems
- Software engineering
3Compilers
- A compiler is a program that reads a program
written in one language (the source language)
and translates it into an equivalent program in
another language (the target language).
- Early compilers
- 1950s
4Machine Language, Assembly Language, High-Level
Language
- Machine language is the native language of the
computer on which the program is run.
- Native code
- It consists of bit strings which are interpreted
by the mechanism inside the computer.
- Example in IBM 370
- Binary 0001100000110101
- Hexadecimal 1835
- Copy the content of Register 5 into Register 3
- LR 3, 5
- Assembler, assembly language
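The correspondence between the bit string, its hexadecimal form, and the assembly instruction above can be checked with a short sketch. It assumes the register-to-register layout of an 8-bit operation code followed by two 4-bit register fields; the `decode_rr` helper and the one-entry `RR_OPCODES` table are hypothetical names introduced only for this illustration.

```python
# Hypothetical one-entry opcode table; only LR (hex 18) from the slide is included.
RR_OPCODES = {0x18: "LR"}

def decode_rr(halfword: int) -> str:
    """Decode a 16-bit instruction: 8-bit opcode, then two 4-bit register fields."""
    opcode = (halfword >> 8) & 0xFF
    r1 = (halfword >> 4) & 0xF
    r2 = halfword & 0xF
    mnemonic = RR_OPCODES.get(opcode, f"OP{opcode:02X}")
    return f"{mnemonic} {r1}, {r2}"

word = int("0001100000110101", 2)   # the binary form from the slide
print(f"{word:04X}")                # hexadecimal form of the same bits
print(decode_rr(word))              # assembly form
```

An assembler performs the reverse mapping, from the mnemonic form back to the bit string.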
5Machine Language, Assembly Language, High-Level
Language
- Example
- High-level language
- X = Y + Z
- Assembly language
- L 3, Y Load the working register with Y
- A 3, Z Add Z
- ST 3, X Store the result in X
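To see that the three assembly instructions above compute the sum of Y and Z into X, the sequence can be run on a toy model of the machine. This is only a sketch: the `run` function, the tuple encoding of instructions, and the sample values of Y and Z are assumptions made for illustration, not part of any real instruction set simulator.

```python
def run(program, memory):
    """Execute a list of (op, register, name) tuples against a memory dict."""
    regs = {}
    for op, reg, name in program:
        if op == "L":        # load the register from memory
            regs[reg] = memory[name]
        elif op == "A":      # add a memory operand to the register
            regs[reg] += memory[name]
        elif op == "ST":     # store the register into memory
            memory[name] = regs[reg]
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory

# Sample values for Y and Z are arbitrary.
memory = run([("L", 3, "Y"), ("A", 3, "Z"), ("ST", 3, "X")], {"Y": 2, "Z": 5})
print(memory["X"])
```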
6Terminology
- Source language
- Java, C, C++
- Object language
- Machine language
- Object code
- Object file, object module
- Target machine
- The computer on which the program is to be run
7Terminology
- Cross compiler
- A compiler that generates code for a machine that
is different from the machine on which the
compiler runs.
- Example
- A compiler which can be run on an IBM PC but which
compiles to the machine language of a
special-purpose embedded system.
8Compilers and Interpreters
- Compiler
- Translates the high-level program to the target
program.
- Interpreter
- Executes the source program directly instead of
translating it.
9The Environment of the Compiler
10The Environment of the Compiler
- Example
- COMP myprog Compiles the program
- LINK myprog Links the program
- RUN myprog Runs the program
11Phases of a Compiler
- Five (six) phases of compilation
- Lexical analysis
- Syntactic analysis
- (Semantic analysis)
- Intermediate code generation
- Optimization
- Object code generation
12Phases of a Compiler: Language Processing System
13Phases of a Compiler
14Two Parts of Compilation
- Analysis
- breaks up the source program
- creates an intermediate representation
- Synthesis
- constructs the desired target program from the
intermediate representation
15Analysis
- Lexical Analysis
- linear analysis, scanning
- Syntax Analysis
- parsing, hierarchical analysis
- Semantic Analysis
- Intermediate Code Generation
- Advantage of dividing analysis
- simple design
- compiler efficiency
- compiler portability
16Analysis of the Source Program
- Lexical analysis (linear analysis)
- the stream of characters making up the source
program is read from left to right and grouped
into tokens
- Syntax analysis (hierarchical analysis)
- characters or tokens are grouped hierarchically
into nested collections with collective meaning
- Semantic analysis
- certain checks are performed to ensure that the
components of a program fit together meaningfully
17Lexical Analysis
- Linear analysis, scanning
- Reads the stream of characters in the source
program from left to right, and groups them into
tokens
- Tokens
- are sequences of characters having a collective
meaning
18Example Lexical Analysis
- position := initial + rate * 60
- id1 := id2 + id3 * 60
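A scanner for this example might be sketched as follows, assuming the example statement is `position := initial + rate * 60` (the operators are an assumption, since they do not survive in every copy of the slide). The token names in `TOKEN_SPEC` and the `tokenize` function are hypothetical; the sketch numbers identifiers in order of first appearance, which yields the id1, id2, id3 form shown above.

```python
import re

# Hypothetical token classes; order matters because earlier patterns win.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[+\-*/]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Scan left to right; identifiers become id1, id2, ... on first appearance."""
    symbols = {}
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            # Lexical error: the characters do not form any token.
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "ID":
            text = symbols.setdefault(text, f"id{len(symbols) + 1}")
        tokens.append((kind, text))
    return tokens, symbols

tokens, symbols = tokenize("position := initial + rate * 60")
print(" ".join(text for _, text in tokens))
```

The `symbols` dictionary returned alongside the tokens plays the role of a very small symbol table.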
19Syntax Analysis
- Parsing
- Hierarchical analysis
- Groups the tokens of the source program into
grammatical phrases, represented by a parse tree,
that are used by the compiler to synthesize
output.
20Example Syntax Analysis
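As an illustration of hierarchical grouping, the sketch below parses an assignment statement into a nested tuple that stands in for the tree. It is a minimal recursive-descent parser under assumed grammar rules (`:=` for assignment, the usual precedence of `*` and `/` over `+` and `-`); the function name `parse_assignment` and the tuple shape `(operator, left, right)` are choices made for this example only.

```python
import re

def parse_assignment(source):
    """Parse `id := expr`, where expr uses + - * / and parentheses."""
    # Simplification: characters outside these patterns are silently skipped.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|:=|[-+*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or not re.fullmatch(r"\w+", token):
            raise SyntaxError(f"unexpected token {token!r}")
        return token

    def term():
        node = factor()
        while peek() in ("*", "/"):
            node = (advance(), node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (advance(), node, term())
        return node

    target = advance()
    if target is None or not re.fullmatch(r"[A-Za-z_]\w*", target) or advance() != ":=":
        raise SyntaxError("expected 'identifier :='")
    tree = (":=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_assignment("position := initial + rate * 60"))
```

The multiplication ends up deeper in the tree than the addition, which is how the tree records that it is evaluated first.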
21Semantic Analysis
- Checks the source program for semantic errors and
gathers type or semantic information for the
subsequent code generation phase
22Example Semantic Analysis
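One common semantic check is type checking. The sketch below walks a tuple-based tree of the form `(operator, left, right)`, looks up identifier types, and wraps an integer operand in a conversion node when it meets a real operand. The two-type system, the conversion name `inttoreal`, and the `check` function are assumptions for this illustration rather than the rules of any particular language.

```python
def check(node, types):
    """Return (typed_tree, type); insert inttoreal around int operands where needed."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"
        if node not in types:
            raise TypeError(f"undeclared identifier {node!r}")
        return node, types[node]
    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if op == ":=":
        if left_type == "int" and right_type == "real":
            raise TypeError("cannot assign a real value to an int variable")
        result_type = left_type
    else:
        result_type = "real" if "real" in (left_type, right_type) else "int"
    # Widen any int operand that meets a real one (assumed conversion rule).
    if result_type == "real":
        if left_type == "int":
            left = ("inttoreal", left)
        if right_type == "int":
            right = ("inttoreal", right)
    return (op, left, right), result_type

tree = (":=", "position", ("+", "initial", ("*", "rate", "60")))
declared = {"position": "real", "initial": "real", "rate": "real"}
typed_tree, result_type = check(tree, declared)
print(typed_tree)
```

The type information gathered here is what a later code generation phase would use to choose between integer and floating-point instructions.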
23Error Handler
- When a phase of compilation encounters an error,
it must somehow deal with that error.
- Error in Lexical Phase
- The characters in the input do not form any token
of the language.
- Error in Syntax Phase
- The token stream violates the structure rules
(syntax) of the language.
- Error in Semantic Phase
- Constructs have the right syntactic structure,
but no meaning for the operation involved.
24S/W Tools Performing Analysis
- Structure editors
- a sequence of commands -> a source program
- Pretty printers
- indentation, fonts
- Static checkers
- a program -> discover bugs without running it
- Interpreters
- perform the operations implied by the source
program
25Performing Analysis
- Text formatters
- typeset text
- Silicon compilers
- circuit design
- Query interpreters
- Database (DB) queries
26Intermediate Code Generation
- Explicit intermediate representation
- A program for an abstract machine
- Two properties of intermediate code
- Easy to produce
- Easy to translate into the target program
- Intermediate form
- Three address form (quadruples, triples)
- Two address form
27Three-Address Code
- Has at most three operands
- Each three-address instruction has at most one
operator in addition to the assignment
- The compiler must generate a temporary name to
hold the value computed by each instruction
- May have fewer than 3 operands
28Example Three-Address Code
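The properties of three-address code can be illustrated by flattening an expression tree into instructions that each hold one operator and a freshly generated temporary. This is a sketch under assumptions: the tree is a nested tuple, the temporaries are named `temp1`, `temp2`, and so on, and `three_address` is a hypothetical helper rather than a standard routine.

```python
from itertools import count

def three_address(tree):
    """Flatten a tuple tree into a list of three-address instruction strings."""
    code = []
    temps = count(1)

    def emit(node):
        if isinstance(node, str):
            return node                          # already a usable operand
        if len(node) == 2:                       # unary operator such as inttoreal
            operand = emit(node[1])
            temp = f"temp{next(temps)}"
            code.append(f"{temp} := {node[0]}({operand})")
            return temp
        op, left, right = node
        left, right = emit(left), emit(right)
        temp = f"temp{next(temps)}"              # one temporary per instruction
        code.append(f"{temp} := {left} {op} {right}")
        return temp

    _, target, value = tree
    result = emit(value)
    code.append(f"{target} := {result}")         # fewer than three operands
    return code

tree = (":=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", "60"))))
for line in three_address(tree):
    print(line)
```

The final copy instruction shows the case where an instruction has fewer than three operands.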
29Example Intermediate Code
30Synthesis Part
- The synthesis part constructs the desired target
program from the intermediate representation.
31Code Optimization
- Improves the intermediate code so that
faster-running machine code results
- Optimizing compiler
32Example Code Optimization
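A small sketch of what an optimizer might do with intermediate code: convert an integer constant to real at compile time, and remove a final copy from a temporary. The quadruple layout `(op, arg1, arg2, result)`, the `inttoreal` operator, and the `optimize` function are assumptions for this example, and the copy removal is only safe here because each temporary is assumed to be used exactly once.

```python
def optimize(quads):
    """Apply two simple improvements to (op, arg1, arg2, result) quadruples."""
    constants = {}
    out = []
    for op, arg1, arg2, result in quads:
        # Substitute temporaries whose constant value is already known.
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "inttoreal" and arg1.isdigit():
            constants[result] = f"{arg1}.0"      # do the conversion at compile time
            continue
        if op == ":=" and out and out[-1][3] == arg1 and arg1.startswith("temp"):
            # Copy of the temporary just computed: write to the target directly.
            prev_op, a, b, _ = out.pop()
            out.append((prev_op, a, b, result))
            continue
        out.append((op, arg1, arg2, result))
    return out

quads = [
    ("inttoreal", "60", None, "temp1"),
    ("*", "id3", "temp1", "temp2"),
    ("+", "id2", "temp2", "temp3"),
    (":=", "temp3", None, "id1"),
]
for quad in optimize(quads):
    print(quad)
```

Four instructions shrink to two, and the run-time conversion disappears.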
33Code Generation
- Generates the target code
- relocatable machine code
- assembly code
34Example Code Generation
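The following sketch turns quadruples into assembly text for a made-up two-address machine. Everything about the target is an assumption: the mnemonics in `MNEMONIC`, the `#` prefix for constants, the convention that the second operand is the destination, and an unlimited supply of registers. A real code generator must also handle register allocation and instruction selection for its actual target.

```python
MNEMONIC = {"+": "ADDF", "-": "SUBF", "*": "MULF", "/": "DIVF"}   # hypothetical target

def generate(quads):
    """Translate (op, arg1, arg2, result) quadruples into two-address assembly lines."""
    lines = []
    location = {}          # temporary name -> register currently holding it
    next_reg = 1

    def operand(name):
        if name in location:
            return location[name]
        return f"#{name}" if name[0].isdigit() else name

    for op, arg1, arg2, result in quads:
        reg = location.get(arg1)
        if reg is None:                          # bring the first operand into a register
            reg = f"R{next_reg}"
            next_reg += 1
            lines.append(f"MOVF {operand(arg1)}, {reg}")
        lines.append(f"{MNEMONIC[op]} {operand(arg2)}, {reg}")
        if result.startswith("temp"):
            location[result] = reg               # keep temporaries in registers
        else:
            lines.append(f"MOVF {reg}, {result}")
    return lines

for line in generate([("*", "id3", "60.0", "temp1"), ("+", "id2", "temp1", "id1")]):
    print(line)
```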
35System Support
- There is a certain amount of supporting code to
be supplied to the compilation.
- Symbol table management
- Error handling
36System Support
- Symbol table handler
- The central repository of information about the
names or identifiers in the program
- Error handling
- Implements the compiler's response to errors in
the code it is compiling.
- Diagnostics
- Where the error was found and what kind of error
it was
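A symbol table and its diagnostics can be sketched with two small classes. The field names, the `declare` and `lookup` methods, and the choice to record errors in a list instead of stopping are all assumptions for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Diagnostic:
    line: int       # where the error was found
    kind: str       # what kind of error it was
    message: str

@dataclass
class SymbolTable:
    entries: dict = field(default_factory=dict)
    diagnostics: list = field(default_factory=list)

    def declare(self, name, type_, line):
        """Record a new name, or report a duplicate declaration."""
        if name in self.entries:
            self.diagnostics.append(Diagnostic(line, "semantic", f"{name} already declared"))
        else:
            self.entries[name] = {"type": type_, "line": line}

    def lookup(self, name, line):
        """Return the entry for a name, or report an undeclared identifier."""
        entry = self.entries.get(name)
        if entry is None:
            self.diagnostics.append(Diagnostic(line, "semantic", f"{name} not declared"))
        return entry

table = SymbolTable()
table.declare("rate", "real", line=1)
print(table.lookup("rate", line=2))
print(table.lookup("speed", line=3))
print(table.diagnostics)
```

Collecting diagnostics lets compilation continue after the first error so that several problems can be reported in one run.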
37Passes, Front End, Back End
- The compiler makes one or more passes through the
program.
- A pass consists of reading a version of the
program from a file and writing a new version of
it to an output file.
- A pass normally comprises more than one phase,
but the number of passes, and the phases they
cover, vary.
38Passes, Front End, Back End
- Front End
- Depends on the source language and has little
or no concern with the target machine
- Lexical analysis
- Syntactic analysis
- (Semantic analysis)
- Intermediate code generation
- Back End
- Machine-dependent
- Code optimization
- Target code generation
39Writing a Compiler
- The first compiler was written in assembly
language; there was no other alternative.
- High-level language compilers
- Cross compiler
- Useful tools: compiler-compilers
- Lex
- Yacc
40Retargetable Compilers
- In many cases, a compiler writer will want to
adapt a compiler for use with a new target
machine.
- A compiler that can be modified in this way is
said to be retargetable.
- Cross compiler
- Alternative approaches
- Distinction between Front End and Back End
- Compiler for imaginary machine (virtual machine)
41Cousins of the Compiler
- Preprocessors
- Assemblers
- Loaders and Link-Editors
42Preprocessor
- A preprocessor is a simple translator that is
applied to the source program before it is
submitted to the compiler.
- Before the program is compiled, it is passed
through the preprocessor, which replaces all
occurrences of each predefined expression with
its defined sequence of instructions.
- Example
43Functions of Preprocessors
- Macro processing
- File inclusion
- Language extensions
- DB query languages embedded in high-level
languages
44Example Preprocessors
- The C Programming Language
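Macro processing in the style of C's `#define` can be sketched in a few lines. This is a deliberately reduced model and not the C preprocessor: it handles only object-like macros of the form `#define NAME replacement`, does not rescan replacements, and does not protect string literals. The `preprocess` function is a hypothetical name.

```python
import re

def preprocess(source):
    """Expand object-like `#define NAME replacement` macros, line by line."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+([A-Za-z_]\w*)\s+(.*)", line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
            continue                              # the directive itself is not emitted
        # Replace whole identifiers only, so SIZE does not match inside SIZES.
        line = re.sub(r"[A-Za-z_]\w*", lambda m: macros.get(m.group(), m.group()), line)
        output.append(line)
    return "\n".join(output)

program = "#define SIZE 100\nint buffer[SIZE];\nint SIZES;"
print(preprocess(program))
```

The compiler proper then sees only the expanded text and never the macro name.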
45Assemblers
- Assemblers
- Two-pass assembly
- Loaders and Link-Editors
46Assemblers
- Assembly code
- a mnemonic version of machine code, in which
- names are used instead of binary codes for
operations, and
- names are also given to memory addresses
47Two-Pass Assembly (1)
- In the first pass,
- All the identifiers that denote storage locations
are found and stored in a symbol table
- Identifiers are assigned storage locations as
they are encountered for the first time
- Example: b := a + 2
48Two-Pass Assembly (2)
- In the second pass,
- The assembler scans the input again.
- It translates each operation code into the
sequence of bits representing that operation in
machine language
- It translates each identifier representing a
location into the address given for that
identifier in the symbol table
- The output of the second pass is usually
relocatable machine code
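The two passes can be sketched for a made-up machine. The instruction format here is an assumption chosen for illustration: a 4-bit operation code, a 2-bit register number, a 2-bit tag (00 for an address, 10 for an immediate value), and an 8-bit address or value. The mnemonics, the 4-byte word size, and the `assemble` function are likewise hypothetical. Each output word carries a flag saying whether its address field is relocatable.

```python
OPCODES = {"LOAD": 0b0001, "STORE": 0b0010, "ADD": 0b0011}   # hypothetical encoding
WORD_SIZE = 4                                               # assumed bytes per data word

def assemble(program):
    """program: list of (mnemonic, register, operand); '#n' operands are immediates."""
    # Pass 1: assign a storage location to each identifier on first encounter.
    symbols = {}
    for _, _, operand in program:
        if not operand.startswith("#") and operand not in symbols:
            symbols[operand] = len(symbols) * WORD_SIZE
    # Pass 2: translate operation codes and replace identifiers by their addresses.
    code = []
    for mnemonic, register, operand in program:
        if operand.startswith("#"):
            tag, value, relocatable = 0b10, int(operand[1:]), False
        else:
            tag, value, relocatable = 0b00, symbols[operand], True
        word = f"{OPCODES[mnemonic]:04b} {register:02b} {tag:02b} {value:08b}"
        code.append((word, relocatable))
    return code, symbols

# Assumed source statement: b := a + 2
code, symbols = assemble([("LOAD", 1, "a"), ("ADD", 1, "#2"), ("STORE", 1, "b")])
for word, relocatable in code:
    print(word, "*" if relocatable else "")
```

The asterisk marks the words whose address must be adjusted when the program is loaded.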
49Example Relocatable Addresses
- Altering the relocatable addresses to produce
absolute (unrelocatable) machine code
- Suppose that the address space containing the
data is to be loaded starting at location
L = 00001111
- L must be added to the address of the instruction
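The adjustment can be written out as a short sketch, assuming L = 00001111 (decimal 15). The three sample instructions, their bit patterns, and the `relocate` function are hypothetical; only the entries flagged as relocatable have L added to their address field.

```python
def relocate(code, load_address):
    """code: list of (opcode_bits, address, is_relocatable); returns absolute code."""
    return [
        (bits, address + load_address if is_relocatable else address)
        for bits, address, is_relocatable in code
    ]

L = 0b00001111   # assumed load address, decimal 15
code = [("0001 01 00", 0, True), ("0011 01 10", 2, False), ("0010 01 00", 4, True)]
absolute = relocate(code, L)
for bits, address in absolute:
    print(f"{bits} {address:08b}")
```

The immediate value 2 in the second instruction is left alone because it is a constant, not an address.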
50Loaders and Link-Editors
- Loader
- Performs the two functions of loading and
link-editing
- Loading
- Consists of taking relocatable machine code,
- altering the relocatable addresses, and
- placing the altered instructions and data in
memory at the proper locations
51Link-Editors
- Link-editor
- Allows us to make a single program from several
files of relocatable machine code.
- These files may have been the result of several
different compilations, and
- One or more may be library files.
- External references
- In which the code of one file refers to a
location in another file.
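Resolving external references can be sketched as a two-step process: lay the files out one after another, then patch each reference with the final address of the name it refers to. The module description used here (a dict with `size`, `defines`, and `refs`) and the `link` function are assumptions made for this illustration and do not correspond to any real object file format.

```python
def link(modules):
    """modules: dicts with 'size', 'defines' {name: offset}, 'refs' [(offset, name)]."""
    base, bases, global_symbols = 0, [], {}
    for module in modules:                       # place modules one after another
        bases.append(base)
        for name, offset in module["defines"].items():
            if name in global_symbols:
                raise ValueError(f"duplicate definition of {name}")
            global_symbols[name] = base + offset
        base += module["size"]
    patches = {}                                 # address of reference -> resolved target
    for module_base, module in zip(bases, modules):
        for offset, name in module["refs"]:
            if name not in global_symbols:
                raise ValueError(f"unresolved external reference to {name}")
            patches[module_base + offset] = global_symbols[name]
    return global_symbols, patches

main = {"size": 8, "defines": {"main": 0}, "refs": [(4, "sqrt")]}
library = {"size": 6, "defines": {"sqrt": 2}, "refs": []}
global_symbols, patches = link([main, library])
print(global_symbols, patches)
```

Here the reference at offset 4 of the first file is resolved to the location of `sqrt` inside the library file.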
52Summary
- A quick overall picture of
- what a compiler does,
- what goes into it, and
- how it is organized