Title: Compiler Construction
1 Compiler Construction
- Main Source
- Compiler Construction Lecture Notes,
- Prof. Trevor Mudge and Prof. Mark Hodges
- University of Michigan
2 Lexical Analyzer Generators: Lex/Flex
- Lex helps write programs whose control flow is
directed by instances of regular expressions in
the input stream.
- Flex is a fast scanner generator tool
- Used for automatic generation of scanners
- Hand-coded scanners are faster
- But tedious to write, and error-prone!
- Lex/Flex
- Given a specification of regular expressions
- Generates a table-driven FSA
- Output is a C program that you compile to produce
your scanner
3 Lexical Analyzer Generators: Lex/Flex
- Lex source is a table of regular expressions and
corresponding program fragments.
- The table is translated to a program which reads
an input stream, copying it to an output stream
and partitioning the input into strings which
match the given expressions.
- As each such string is recognized, the
corresponding program fragment is executed.
- The recognition of the expressions is performed
by a deterministic finite automaton (DFA)
generated by Lex.
- The program fragments written by the user are
executed in the order in which the corresponding
regular expressions occur in the input stream.
4 How Does Lex Work?
[Diagram: regular expressions go into the FLEX box and C code comes
out; inside the box, NFAs and DFAs are constructed.]
5 How Does Lex Work?
[Diagram: REs for tokens enter Flex, which performs RE -> NFA,
NFA -> DFA, then optimizes the DFA. DFA simulation then turns the
character stream into a token stream (and errors).]
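The DFA-simulation stage can be sketched in Python. This is a minimal illustration, not Flex output: the transition table is hand-built for just two token kinds (identifiers and integers), whereas Flex generates its tables from the RE specification.

```python
# Minimal table-driven DFA scanner: repeatedly run the DFA from the
# current position and emit the longest match (Flex's matching rule).

def char_class(c):
    if c.isalpha():
        return "letter"
    if c.isdigit():
        return "digit"
    return "other"

# transition table: state -> {character class -> next state}
TRANSITIONS = {
    "start": {"letter": "ident", "digit": "number"},
    "ident": {"letter": "ident"},
    "number": {"digit": "number"},
}
ACCEPTING = {"ident": "IDENT", "number": "NUMBER"}

def scan(text):
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        state, j = "start", i
        last_accept = None          # (end index, kind) of longest match
        while j < len(text):
            nxt = TRANSITIONS.get(state, {}).get(char_class(text[j]))
            if nxt is None:
                break               # DFA is stuck; fall back to last accept
            state = nxt
            j += 1
            if state in ACCEPTING:
                last_accept = (j, ACCEPTING[state])
        if last_accept is None:
            tokens.append(("ERROR", text[i]))   # no rule matched
            i += 1
        else:
            end, kind = last_accept
            tokens.append((kind, text[i:end]))
            i = end
    return tokens
```

For example, `scan("foo 123")` yields `[("IDENT", "foo"), ("NUMBER", "123")]`.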
6 Regular Expression to NFA
- It is possible to construct an NFA from a regular
expression using an algorithm. For example:
- Thompson's construction algorithm builds the NFA
inductively
- Defines rules for each base RE
- Combines rules for more complex REs
[Diagram: a general machine with start state s and final state f,
recognizing expression E]
more on this in supplementary lectures
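Thompson's construction can be sketched as follows. This is a simplified Python rendering under my own representation choices (integer states, epsilon moves written as `None`); each rule produces a fragment with one start and one accepting state, exactly as the inductive scheme prescribes.

```python
# Sketch of Thompson's construction: one builder per RE form, each
# returning an NFA fragment with a single start and accept state.
from collections import defaultdict
import itertools

_ids = itertools.count()

class NFA:
    def __init__(self):
        self.start = next(_ids)
        self.accept = next(_ids)
        # (state, symbol) -> set of next states; symbol None = epsilon
        self.trans = defaultdict(set)

def literal(c):
    n = NFA()
    n.trans[(n.start, c)].add(n.accept)
    return n

def concat(a, b):                       # note: mutates fragment a
    a.trans.update(b.trans)
    a.trans[(a.accept, None)].add(b.start)   # epsilon-link a to b
    a.accept = b.accept
    return a

def union(a, b):
    n = NFA()
    n.trans.update(a.trans); n.trans.update(b.trans)
    n.trans[(n.start, None)] |= {a.start, b.start}
    n.trans[(a.accept, None)].add(n.accept)
    n.trans[(b.accept, None)].add(n.accept)
    return n

def star(a):
    n = NFA()
    n.trans.update(a.trans)
    n.trans[(n.start, None)] |= {a.start, n.accept}   # skip or enter
    n.trans[(a.accept, None)] |= {a.start, n.accept}  # loop or leave
    return n

def accepts(n, s):
    """Simulate the NFA by tracking the set of reachable states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in n.trans[(q, None)]:
                if r not in seen:
                    seen.add(r); stack.append(r)
        return seen
    cur = closure({n.start})
    for c in s:
        cur = closure({r for q in cur for r in n.trans[(q, c)]})
    return n.accept in cur
```

For instance, `concat(literal("a"), star(literal("b")))` accepts "a", "ab", "abbb" but not "b". A real implementation would then apply the subset construction to obtain the DFA, as on the previous slide.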
7 Syntax Analysis: Parser
- Checks the input stream for syntactic correctness
- Framework for subsequent semantic processing
- Implemented as a pushdown automaton (PDA)
- A pushdown automaton is a finite state automaton
that can make use of an auxiliary stack.
- Lots of variations
- Hand-coded
- Table-driven (top-down or bottom-up)
- For any non-trivial language, writing a correct
parser is a challenge!
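Why the stack matters can be shown with a toy example (my own illustration, not a full parser): a finite automaton alone cannot match arbitrarily nested delimiters, but a PDA-style stack handles them directly.

```python
# Sketch of the pushdown idea: push each opening delimiter, pop and
# check on each closing one. Nesting depth is unbounded, which is
# exactly what a plain finite automaton cannot track.
PAIRS = {")": "(", "]": "[", "}": "{"}

def balanced(s):
    stack = []
    for c in s:
        if c in "([{":
            stack.append(c)                       # push on an opener
        elif c in PAIRS:
            if not stack or stack.pop() != PAIRS[c]:
                return False                      # wrong or missing opener
    return not stack                              # everything matched
```

`balanced("(a[b]{c})")` is true; `balanced("(]")` and `balanced("((")` are false.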
8 Parser Generator Tools: Yacc/Bison
- Yacc (yet another compiler-compiler)
- Given a context-free grammar
- Generates a parser for that language (again, a C
program)
- Bison is a general-purpose parser generator that
converts a grammar description for a context-free
grammar into a C program to parse that grammar.
9 Static Semantic Analysis
- Involves several distinct actions:
- Check definitions of identifiers, ascertain that
their usage is correct
- Disambiguate overloaded operators
- Translate from source to IR (intermediate
representation)
- The standard formalism to define the application
of semantic rules is the Attribute Grammar (AG)
10 Static Semantic Analysis: Attribute Grammar (AG)
- An AG provides for the migration of
information around the parse tree.
- An AG defines the information that will need to be
in the parse tree in order to successfully
perform semantic analysis.
- This information is stored as attributes of the
nodes of the tree.
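A minimal sketch of attribute evaluation, using synthesized attributes only (the simplest AG case): each node's `val` attribute is computed bottom-up from its children, as a rule like `E -> E1 '+' T { E.val = E1.val + T.val }` would specify. The node kinds here are hypothetical.

```python
# Bottom-up evaluation of a synthesized attribute 'val' over a parse tree.

class Node:
    def __init__(self, kind, children=(), value=None):
        self.kind, self.children, self.value = kind, list(children), value
        self.attrs = {}                 # attribute storage per node

def evaluate(node):
    for child in node.children:
        evaluate(child)                 # children first: synthesized attrs
    if node.kind == "num":
        node.attrs["val"] = node.value
    elif node.kind == "add":
        node.attrs["val"] = sum(c.attrs["val"] for c in node.children)
    elif node.kind == "mul":
        v = 1
        for c in node.children:
            v *= c.attrs["val"]
        node.attrs["val"] = v
    return node.attrs["val"]
```

Evaluating the tree for `2 * (3 + 4)` stores `val = 14` at the root. Inherited attributes, which flow downward or sideways, would need a more careful evaluation order.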
11 Revisit the General Structure of a Modern Compiler
[Diagram of the compiler pipeline.
Front end: Source Program -> Lexical Analysis (Scanner) ->
Syntax Analysis (Parser) -> Semantic Analysis, building the
high-level IR with the help of the Symbol Table and context/CFG.
Back end: high-level IR to low-level IR conversion ->
control-flow/dataflow analysis -> Optimization -> Code Generation
(machine-independent asm to machine-dependent) -> Assembly Code.]
12 Front-end
- So far we have looked at some of the basic aspects of
front-end compiler phases, dealing with
- Statements, loops, etc.
- These statements must then be broken down into
multiple assembly statements.
13 Backend
- Machine-independent assembly code involves
three-address code (TAC)
- Each TAC instruction can be described as a quadruple
(operator, operand1, operand2, result).
- Each statement has the general form
- x = y op z
- where x, y and z are variables, constants or
temporary variables generated by the compiler, and op
represents any operator, e.g. an arithmetic
operator.
- Infinite virtual registers, infinite resources
- Standard opcode repertoire
- Load/store architecture
- Goals
- Optimize code quality
- Map the application to real hardware
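Lowering an expression into TAC quadruples can be sketched as below. This is an illustrative fragment, not any particular compiler's code; the temporary naming `t1, t2, ...` follows the convention above.

```python
# Lower an expression tree into (operator, operand1, operand2, result)
# quadruples, introducing a fresh temporary for each intermediate value.
import itertools

def to_tac(node, quads, temps):
    """node is either a string (variable/constant) or (op, left, right).
    Appends quadruples to `quads` and returns the name holding the result."""
    if isinstance(node, str):
        return node                      # leaf: already a name
    op, left, right = node
    l = to_tac(left, quads, temps)       # inner expressions first
    r = to_tac(right, quads, temps)
    t = f"t{next(temps)}"                # fresh compiler temporary
    quads.append((op, l, r, t))
    return t
```

For `a + b * c` this emits `("*", "b", "c", "t1")` then `("+", "a", "t1", "t2")`, i.e. `t1 = b * c; t2 = a + t1`.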
14 Dataflow and Control Flow Analysis
- Provide the necessary information about variable
usage and execution behavior to determine when a
transformation is legal/illegal
- Dataflow analysis
- Is a process for collecting run-time information
about data in programs without actually executing
them.
- Identifies when variables contain interesting
values
- Identifies which instructions create or consume
values
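One concrete dataflow question is liveness: which variables may still be read later? A minimal sketch over straight-line TAC follows, assuming single basic-block code (real analyses iterate over the whole CFG) and treating digit-only operands as constants.

```python
# Backward liveness over straight-line TAC quadruples:
# live_before = (live_after - {dest}) U {uses}.
# A definition whose result is never live afterwards is dead code.

def live_before_each(quads, live_out=frozenset()):
    """quads: list of (op, src1, src2, dest).
    Returns the set of live variables before each instruction."""
    live = set(live_out)
    result = [None] * len(quads)
    for i in range(len(quads) - 1, -1, -1):       # scan backwards
        op, s1, s2, dest = quads[i]
        live.discard(dest)                         # killed: (re)defined here
        live |= {s for s in (s1, s2)
                 if s is not None and not s.isdigit()}   # generated: used here
        result[i] = set(live)
    return result
```

For `t1 = b * c; t2 = a + t1` with `t2` live on exit, the sets are `{a, b, c}` before the first instruction and `{a, t1}` before the second.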
15 Control Flow Analysis
- Execution behavior caused by control statements
- Ifs, for/while loops, gotos
- Uses a control flow graph (CFG)
- Source: http://en.wikipedia.org/wiki/Control_flow_graph
- An abstract data structure representation of a
program, maintained internally by a compiler.
- Each node in the graph represents a basic block,
i.e. a straight-line piece of code without any
jumps or jump targets
- Jump targets start a block, and jumps end a
block. Directed edges are used to represent jumps
in the control flow.
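The "jump targets start a block, jumps end a block" rule is the classic leader-based partitioning, sketched here over a toy string-based instruction format of my own (real compilers work on structured IR, and would also build the edges):

```python
# Leader-based basic-block partitioning: the first instruction, every
# label (jump target), and every instruction after a jump is a leader;
# each block runs from one leader up to the next.

def basic_blocks(instrs):
    """instrs: strings like 'label L1', 'goto L1', 'if x < 10 goto L1',
    or plain statements. Returns the list of basic blocks."""
    leaders = {0}                                  # entry is a leader
    for i, ins in enumerate(instrs):
        if ins.startswith("label"):
            leaders.add(i)                         # jump target starts a block
        if "goto" in ins and i + 1 < len(instrs):
            leaders.add(i + 1)                     # jump ends a block
    order = sorted(leaders)
    return [instrs[s:e] for s, e in zip(order, order[1:] + [len(instrs)])]
```

A counting loop such as `x = 0; label L1; x = x + 1; if x < 10 goto L1; y = x` splits into three blocks: the initialization, the loop body, and the code after the loop.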
16 Optimization
- Is about how to make the code go faster.
Classes of optimization include:
- Classical optimizations, which involve
- Dead code elimination: remove useless code
- Common sub-expression elimination: avoid recomputing
the same thing multiple times
- Machine-independent
- Useful for almost all architectures
- Machine-dependent
- Depends on the processor architecture
- Memory system, branches, dependencies
17 Code Generation
- Is the mapping of machine-independent assembly
code to the target architecture.
- Takes care of virtual-to-physical binding:
- Instruction selection
- Register allocation: infinite virtual registers
to N physical registers
- Scheduling: binding to resources
- Assembly emission
- Machine assembly is our output
- Assembler and linker will then take over to create
the binary
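The register-allocation step (infinite virtual registers down to N physical ones) can be sketched as a greedy linear scan over live intervals. The intervals and register count here are hypothetical inputs, and real allocators also choose which value to spill rather than just reporting "spill".

```python
# Greedy linear-scan register allocation: walk intervals in order of
# start point, freeing registers whose intervals have expired.

def linear_scan(intervals, num_regs):
    """intervals: {vreg: (start, end)}. Returns {vreg: physical reg index}
    with the value 'spill' when no register is free."""
    free = list(range(num_regs))
    active = []                                   # (end, vreg, reg) in use
    assignment = {}
    for v, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        for item in [a for a in active if a[0] < start]:
            active.remove(item)                   # interval over: free its reg
            free.append(item[2])
        if free:
            r = free.pop()
            active.append((end, v, r))
            assignment[v] = r
        else:
            assignment[v] = "spill"               # out of physical registers
    return assignment
```

With intervals t1:(0,2), t2:(1,4), t3:(3,5) and two registers, t3 reuses t1's register because their lifetimes do not overlap, so nothing spills.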