1
Languages and Compilers (SProg og Oversættere)
Lecture 15 (2)
  • Bent Thomsen
  • Department of Computer Science
  • Aalborg University

With acknowledgement to Norm Hutchinson whose
slides this lecture is based on.
2
Curriculum (Studieordning)
The purpose of the course is for the student to
gain knowledge of important principles in
programming languages and for the student to gain
an understanding of techniques for describing and
compiling programming languages.
3
What was this course about?
  • Programming Language Design
  • Concepts and Paradigms
  • Ideas and philosophy
  • Syntax and Semantics
  • Compiler Construction
  • Tools and Techniques
  • Implementations
  • The nuts and bolts

4
The principal paradigms
  • Imperative Programming (C)
  • Object-Oriented Programming (C++)
  • Logic/Declarative Programming (Prolog)
  • Functional/Applicative Programming (Lisp)
  • New paradigms?
  • Agent Oriented Programming
  • Business Process Oriented (Web computing)
  • Grid Oriented
  • Aspect Oriented Programming

5
Criteria in a good language design
  • Readability
  • understand and comprehend a computation easily
    and accurately
  • Write-ability
  • express a computation clearly, correctly,
    concisely, and quickly
  • Reliability
  • assures a program will not behave in unexpected
    or disastrous ways
  • Orthogonality
  • A relatively small set of primitive constructs
    can be combined in a relatively small number of
    ways
  • Every possible combination is legal
  • Lack of orthogonality leads to exceptions to
    rules

6
Criteria (Continued)
  • Uniformity
  • similar features should look similar and behave
    similarly
  • Maintainability
  • errors can be found and corrected and new
    features added easily
  • Generality
  • avoid special cases in the availability or use of
    constructs, by combining closely related
    constructs into a single more general one
  • Extensibility
  • provide some general mechanism for the user to
    add new constructs to a language
  • Standardability
  • allow programs to be transported from one
    computer to another without significant change in
    language structure
  • Implementability
  • ensure a translator or interpreter can be written

7
Tennent's Language Design Principles
8
Important!
  • Syntax is the visible part of a programming
    language
  • Programming Language designers can waste a lot of
    time discussing unimportant details of syntax
  • The language paradigm is the next most visible
    part
  • The choice of paradigm, and therefore language,
    depends on how humans best think about the
    problem
  • There is no right model of computation, just
    different models of computation, some more
    suited to certain classes of problems than
    others
  • The most invisible part is the language semantics
  • Clear semantics usually leads to simple and
    efficient implementations

9
Levels of Programming Languages
High-level program:
    class Triangle {
        ...
        float surface() { return b*h/2; }
    }
Low-level program:
    LOAD r1,b
    LOAD r2,h
    MUL r1,r2
    DIV r1,2
    RET
Executable machine code:
    0001001001000101001001001110110010101101001...
10
Terminology
Q: Which programming languages play a role in
this picture?
    source program (input) -> Translator -> object program (output)
A: All of them!
11
Tombstone Diagrams
  • What are they?
  • diagrams consisting of a set of puzzle
    pieces we can use to reason about language
    processors and programs
  • different kinds of pieces
  • combination rules (not all diagrams are well
    formed)

12
Syntax Specification
  • Syntax is specified using Context Free
    Grammars
  • A finite set of terminal symbols
  • A finite set of non-terminal symbols
  • A start symbol
  • A finite set of production rules
  • A CFG defines a set of strings
  • This is called the language of the CFG.

13
Backus-Naur Form
  • Usually CFGs are written in BNF notation.
  • A production rule in BNF notation is written as
  • N ::= α, where N is a non-terminal
    and α a sequence of terminals and non-terminals
  • N ::= α | β | ... is an abbreviation for
    several rules with N
    as left-hand side.
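The sketch below illustrates the abbreviation by expanding one BNF rule with alternatives into the separate rules it stands for. The class and method names (`BnfExpander`, `expand`) are hypothetical. The sketch assumes that symbols are separated by spaces and that `|` never occurs as a terminal symbol.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: expands "N ::= a | b | ..." into the rules "N ::= a", "N ::= b", ...
class BnfExpander {
    // One production: a left-hand non-terminal and a right-hand sequence of symbols.
    record Production(String lhs, List<String> rhs) {
        @Override
        public String toString() {
            return lhs + " ::= " + String.join(" ", rhs);
        }
    }

    static List<Production> expand(String rule) {
        String[] sides = rule.split("::=", 2);
        String lhs = sides[0].trim();
        List<Production> result = new ArrayList<>();
        // Each alternative separated by '|' becomes its own production with the same left-hand side.
        for (String alternative : sides[1].split("\\|")) {
            String trimmed = alternative.trim();
            List<String> rhs = trimmed.isEmpty()
                    ? List.of()                                   // empty alternative: N ::= (empty string)
                    : Arrays.asList(trimmed.split("\\s+"));
            result.add(new Production(lhs, rhs));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Production p : expand("Expression ::= primary-Expression | Expression Operator primary-Expression")) {
            System.out.println(p);
        }
    }
}
```

Running `main` prints the two Mini Triangle rules `Expression ::= primary-Expression` and `Expression ::= Expression Operator primary-Expression`.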

14
Concrete and Abstract Syntax
  • The previous grammar specified the concrete
    syntax of Mini Triangle.

The concrete syntax is important for the
programmer who needs to know exactly how to write
syntactically well-formed programs.
The abstract syntax omits irrelevant syntactic
details and only specifies the essential
structure of programs.
Example: different concrete syntaxes for an
assignment: v := e, (set! v e), e -> v, v = e
15
Abstract Syntax Trees
  • Abstract Syntax Tree for d := d + 10 * n
    (Mini Triangle operators all have the same
    precedence and group to the left)

AssignmentCmd
  SimpleVName
    Ident d
  BinaryExpression
    BinaryExpression
      VNameExp
        SimpleVName
          Ident d
      Op +
      IntegerExp
        Int-Lit 10
    Op *
    VNameExp
      SimpleVName
        Ident n
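Here is a minimal Java sketch of the same tree. It assumes record classes named after the slide's node types. The constructors and the `toString` format are illustrative choices, not the course's actual AST classes. Mini Triangle has no operator precedence, so `d + 10 * n` is built as `(d + 10) * n`.

```java
// Hypothetical AST classes mirroring the node names on the slide.
interface Expression {}

record SimpleVName(String ident) {
    public String toString() { return ident; }
}

record VNameExp(SimpleVName vname) implements Expression {
    public String toString() { return vname.toString(); }
}

record IntegerExp(int literal) implements Expression {
    public String toString() { return Integer.toString(literal); }
}

record BinaryExpression(Expression left, String op, Expression right) implements Expression {
    public String toString() { return "(" + left + " " + op + " " + right + ")"; }
}

record AssignmentCmd(SimpleVName target, Expression value) {
    public String toString() { return target + " := " + value; }
}

class AstDemo {
    // Builds the AST for  d := d + 10 * n  with Mini Triangle's left-to-right grouping.
    static AssignmentCmd example() {
        SimpleVName d = new SimpleVName("d");
        SimpleVName n = new SimpleVName("n");
        Expression sum = new BinaryExpression(new VNameExp(d), "+", new IntegerExp(10));
        return new AssignmentCmd(d, new BinaryExpression(sum, "*", new VNameExp(n)));
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints: d := ((d + 10) * n)
    }
}
```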

16
Contextual Constraints
Syntax rules alone are not enough to specify the
format of well-formed programs.
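Example 1: let const m ~ 2 in m + x
(violates the scope rules: x is not declared)
Example 2: let const m ~ 2; var n: Boolean in
begin n := m < 4; n := n + 1 end
(violates the type rules: n + 1 adds 1 to a Boolean)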
17
Semantics
Specification of semantics is concerned with
specifying the meaning of well-formed programs.
  • Terminology
  • Expressions are evaluated and yield values (and
    may or may not perform side effects)
  • Commands are executed and perform side effects.
  • Declarations are elaborated to produce bindings
  • Side effects
  • change the values of variables
  • perform input/output

18
Phases of a Compiler
  • A compiler's phases are steps in transforming
    source code into object code.
  • The different phases correspond roughly to the
    different parts of the language specification
  • Syntax analysis <-> Syntax
  • Contextual analysis <-> Contextual constraints
  • Code generation <-> Semantics

19
The Phases of a Compiler
Source Program
  -> Syntax Analysis (-> Error Reports)
  -> Abstract Syntax Tree
  -> Contextual Analysis (-> Error Reports)
  -> Decorated Abstract Syntax Tree
  -> Code Generation
  -> Object Code
20
Compiler Passes
  • A pass is a complete traversal of the source
    program, or a complete traversal of some internal
    representation of the source program.
  • A pass can correspond to a phase but it does
    not have to!
  • Sometimes a single pass corresponds to several
    phases that are interleaved in time.
  • What and how many passes a compiler does over the
    source program is an important design decision.

21
Single Pass Compiler
A single pass compiler makes a single pass over
the source text, parsing, analyzing and
generating code all at once.
Dependency diagram of a typical Single Pass
Compiler:
Compiler Driver
  calls Syntactic Analyzer
    calls Contextual Analyzer
    calls Code Generator
22
Multi Pass Compiler
A multi pass compiler makes several passes over
the program. The output of a preceding phase is
stored in a data structure and used by subsequent
phases.
Dependency diagram of a typical Multi Pass
Compiler:
Compiler Driver
  calls Syntactic Analyzer
  calls Contextual Analyzer
  calls Code Generator
23
Syntax Analysis
Dataflow chart:
Source Program (stream of characters)
  -> Scanner (-> Error Reports)
  -> Stream of Tokens
  -> Parser (-> Error Reports)
  -> Abstract Syntax Tree
24
Regular Expressions
  • RE are a notation for expressing a set of strings
    of terminal symbols.

Different kinds of RE:
  ε       The empty string
  t       Generates only the string t
  X Y     Generates any string xy such that x is generated by X and y is generated by Y
  X | Y   Generates any string which is generated either by X or by Y
  X*      The concatenation of zero or more strings generated by X
  (X)     For grouping
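In Java, these constructs correspond directly to `java.util.regex` syntax: juxtaposition for concatenation, `|` for alternation, `*` for repetition and parentheses for grouping. The sketch below writes an identifier and an integer literal as REs. The token definitions are assumptions in the style of Mini Triangle. A character class such as `[a-z]` is shorthand for an alternation of single letters.

```java
import java.util.regex.Pattern;

class TokenRegexes {
    // Identifier ::= Letter (Letter | Digit)*   -- concatenation, alternation, star, grouping
    static final Pattern IDENTIFIER = Pattern.compile("[a-z]([a-z]|[0-9])*");
    // Integer-Literal ::= Digit Digit*          -- at least one digit
    static final Pattern INTEGER_LITERAL = Pattern.compile("[0-9][0-9]*");

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();   // matches() requires the whole string to match
    }

    static boolean isIntegerLiteral(String s) {
        return INTEGER_LITERAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"x1", "42", "1x", ""}) {
            System.out.println("'" + s + "': identifier=" + isIdentifier(s)
                    + ", integer=" + isIntegerLiteral(s));
        }
    }
}
```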
25
FA and the implementation of Scanners
  • Regular expressions, NDFA-εs, NDFAs and DFAs
    are all equivalent formalisms in terms of what
    languages can be defined with them.
  • Regular expressions are a convenient notation for
    describing the tokens of programming languages.
  • Regular expressions can be converted into FAs
    (the algorithm for conversion into an NDFA-ε is
    straightforward)
  • DFAs can be easily implemented as computer
    programs.
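The class below is a sketch of how directly a DFA becomes a program. It hand-codes a DFA that recognises identifiers (a letter followed by letters or digits) and integer literals (one or more digits). The state names and token-class names are hypothetical. A real scanner would also track positions and stop at the longest match instead of classifying a whole string.

```java
class DfaScanner {
    enum State { START, IN_IDENTIFIER, IN_INTEGER, ERROR }

    // Transition function of the DFA: one case per (state, character class).
    static State next(State s, char c) {
        boolean letter = Character.isLetter(c);
        boolean digit = Character.isDigit(c);
        switch (s) {
            case START:
                if (letter) return State.IN_IDENTIFIER;
                if (digit) return State.IN_INTEGER;
                return State.ERROR;
            case IN_IDENTIFIER:
                return (letter || digit) ? State.IN_IDENTIFIER : State.ERROR;
            case IN_INTEGER:
                return digit ? State.IN_INTEGER : State.ERROR;
            default:
                return State.ERROR;    // ERROR is a trap state
        }
    }

    // Runs the DFA over the whole string and reports the token class of the final state.
    static String classify(String text) {
        State s = State.START;
        for (char c : text.toCharArray()) {
            s = next(s, c);
        }
        switch (s) {
            case IN_IDENTIFIER: return "Identifier";
            case IN_INTEGER:    return "IntegerLiteral";
            default:            return "Error";   // START (empty input) and ERROR are not accepting
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("count1"));  // Identifier
        System.out.println(classify("2024"));    // IntegerLiteral
        System.out.println(classify("9lives"));  // Error
    }
}
```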

26
Parsing
  • Parsing = Recognition + determining phrase
    structure (for example by generating an AST)
  • Different types of parsing strategies
  • bottom up
  • top down

27
Top-Down vs Bottom-Up parsing


28
Development of Recursive Descent Parser
  • (1) Express grammar in EBNF
  • (2) Grammar Transformations
  • Left factorization and Left recursion elimination
  • (3) Create a parser class with
  • private variable currentToken
  • methods to call the scanner: accept and acceptIt
  • (4) Implement private parsing methods
  • add private parseN method for each non terminal
    N
  • public parse method that
  • gets the first token from the scanner
  • calls parseS (S is the start symbol of the
    grammar)
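Here is a compact sketch of steps (3) and (4). It assumes a small expression grammar that is already in EBNF, with the left recursion replaced by iteration (see the comment at the top of the code). The names `currentToken`, `accept`, `acceptIt`, `parseN` and `parse` follow the slide. The token kinds, the tiny built-in scanner, and returning a parenthesised string instead of AST objects are simplifications made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Recursive-descent parser for the (assumed) EBNF grammar
//   Expression         ::= primary-Expression (Operator primary-Expression)*
//   primary-Expression ::= Integer-Literal | Identifier | ( Expression )
class ExpressionParser {
    enum Kind { INTLIT, IDENT, OP, LPAREN, RPAREN, EOT }
    record Token(Kind kind, String spelling) {}

    private final List<Token> tokens;   // a pre-scanned token list stands in for the scanner
    private int position = 0;
    private Token currentToken;

    ExpressionParser(String source) { tokens = scan(source); }

    // Minimal scanner: integer literals, identifiers, one-character operators and parentheses.
    private static List<Token> scan(String s) {
        List<Token> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                result.add(new Token(Kind.INTLIT, s.substring(start, i)));
            } else if (Character.isLetter(c)) {
                while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
                result.add(new Token(Kind.IDENT, s.substring(start, i)));
            } else {
                Kind k = c == '(' ? Kind.LPAREN : c == ')' ? Kind.RPAREN : Kind.OP;
                result.add(new Token(k, String.valueOf(c)));
                i++;
            }
        }
        result.add(new Token(Kind.EOT, "<end>"));
        return result;
    }

    // acceptIt: move to the next token unconditionally.
    private void acceptIt() { currentToken = tokens.get(position++); }

    // accept: check the current token against the expected kind, then move on.
    private void accept(Kind expected) {
        if (currentToken.kind() != expected) syntaxError();
        acceptIt();
    }

    private void syntaxError() {
        throw new IllegalStateException("syntax error at '" + currentToken.spelling() + "'");
    }

    // parse: fetch the first token, parse the start symbol, and insist on end of text.
    String parse() {
        acceptIt();
        String tree = parseExpression();
        if (currentToken.kind() != Kind.EOT) syntaxError();
        return tree;
    }

    // Expression ::= primary-Expression (Operator primary-Expression)*
    private String parseExpression() {
        String tree = parsePrimaryExpression();
        while (currentToken.kind() == Kind.OP) {
            String op = currentToken.spelling();
            acceptIt();
            tree = "(" + tree + " " + op + " " + parsePrimaryExpression() + ")";  // groups to the left
        }
        return tree;
    }

    // primary-Expression ::= Integer-Literal | Identifier | ( Expression )
    private String parsePrimaryExpression() {
        switch (currentToken.kind()) {
            case INTLIT:
            case IDENT: {
                String spelling = currentToken.spelling();
                acceptIt();
                return spelling;
            }
            case LPAREN: {
                acceptIt();
                String tree = parseExpression();
                accept(Kind.RPAREN);
                return tree;
            }
            default:
                syntaxError();
                return null;   // not reached: syntaxError always throws
        }
    }

    public static void main(String[] args) {
        System.out.println(new ExpressionParser("d + 10 * n").parse());    // ((d + 10) * n)
        System.out.println(new ExpressionParser("d + (10 * n)").parse());  // (d + (10 * n))
    }
}
```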

29
LL(1) Grammars
  • The presented algorithm to convert EBNF into a
    parser does not work for all possible grammars.
  • It only works for so called LL(1) grammars.
  • Basically, an LL(1) grammar is a grammar which
    can be parsed with a top-down parser with a
    lookahead (in the input stream of tokens) of one
    token.
  • What grammars are LL(1)?
  • How can we recognize that a grammar is (or is
    not) LL(1)?
  • We can deduce the necessary conditions from the
    parser generation algorithm.
  • We can use a formal definition
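One way to make the LL(1) condition concrete is to compute starter sets (FIRST sets). The grammar then satisfies the condition if the alternatives of every non-terminal have disjoint starters. The sketch assumes a grammar without empty (ε) productions, so FIRST of an alternative is just FIRST of its first symbol. With ε-productions, FOLLOW sets would also be needed. The example is a cut-down Mini Triangle style single-Command, shown before and after left factorization. The map-based grammar representation is an assumption made for the example.

```java
import java.util.*;

class LL1Check {
    // Grammar: non-terminal -> list of alternatives; each alternative is a list of symbols.
    // Assumption: no ε-productions, so FIRST of an alternative is FIRST of its first symbol.
    static Map<String, Set<String>> firstSets(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = new HashMap<>();
        for (String n : grammar.keySet()) first.put(n, new HashSet<>());
        boolean changed = true;
        while (changed) {                      // iterate until a fixed point is reached
            changed = false;
            for (var entry : grammar.entrySet()) {
                for (List<String> alt : entry.getValue()) {
                    changed |= first.get(entry.getKey()).addAll(firstOf(alt, grammar, first));
                }
            }
        }
        return first;
    }

    // A terminal starts itself; a non-terminal contributes its current FIRST set.
    static Set<String> firstOf(List<String> alt, Map<String, List<List<String>>> grammar,
                               Map<String, Set<String>> first) {
        String head = alt.get(0);
        return grammar.containsKey(head) ? new HashSet<>(first.get(head)) : Set.of(head);
    }

    // LL(1) condition (without ε): the alternatives of each non-terminal have disjoint starters.
    static boolean isLL1(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = firstSets(grammar);
        for (List<List<String>> alts : grammar.values()) {
            Set<String> seen = new HashSet<>();
            for (List<String> alt : alts) {
                for (String t : firstOf(alt, grammar, first)) {
                    if (!seen.add(t)) return false;   // two alternatives share a starter
                }
            }
        }
        return true;
    }

    static Map<String, List<List<String>>> beforeFactoring() {
        return Map.of(
            "single-Command", List.of(
                List.of("V-name", ":=", "Expression"),
                List.of("Identifier", "(", "Expression", ")")),
            "V-name", List.of(List.of("Identifier")),
            "Expression", List.of(List.of("Identifier"), List.of("Integer-Literal")));
    }

    static Map<String, List<List<String>>> afterFactoring() {
        return Map.of(
            "single-Command", List.of(List.of("Identifier", "Command-rest")),
            "Command-rest", List.of(
                List.of(":=", "Expression"),
                List.of("(", "Expression", ")")),
            "Expression", List.of(List.of("Identifier"), List.of("Integer-Literal")));
    }

    public static void main(String[] args) {
        System.out.println(isLL1(beforeFactoring()));   // false: both alternatives start with Identifier
        System.out.println(isLL1(afterFactoring()));    // true
    }
}
```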

30
Converting EBNF into RD parsers
  • The conversion of an EBNF specification into a
    Java implementation for a recursive descent
    parser is so mechanical that it can easily be
    automated!
  • => JavaCC: Java Compiler Compiler

31
JavaCC and JJTree
32
LR parsing
  • The algorithm makes use of a stack.
  • The first item on the stack is the initial state
    of a DFA
  • A state of the automaton is a set of LR(0)/LR(1)
    items.
  • The initial state is constructed from productions
    of the form S ::= •α (where S is the start
    symbol of the CFG)
  • The stack contains, in alternating order,
  • A DFA state
  • A terminal symbol or part (subtree) of the parse
    tree being constructed
  • The items on the stack are related by transitions
    of the DFA
  • There are two basic actions in the algorithm
  • shift get next input token
  • reduce build a new node (remove children from
    stack)
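The sketch below traces these actions for a deliberately tiny grammar, E ::= E + n | n, with the SLR(1) table worked out by hand. The grammar, the state numbering and the string trees are assumptions for illustration. A generator such as JavaCUP would compute the table automatically. As on the slide, the stack alternates DFA states and subtrees, with a state on top. Single digits play the role of the token n.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Shift-reduce (SLR) parser sketch for the toy grammar  E ::= E + n | n
// DFA states (sets of LR(0) items):
//   0: S' ::= •E,  E ::= •E + n,  E ::= •n      3: E ::= E + •n
//   1: S' ::= E•,  E ::= E• + n                  4: E ::= E + n•
//   2: E ::= n•
class ShiftReduceParser {

    static String parse(String input) {
        String tokens = input.replace(" ", "") + "$";   // '$' marks the end of the input
        Deque<Object> stack = new ArrayDeque<>();       // alternates Integer states and String subtrees
        stack.push(0);                                  // initial DFA state
        int pos = 0;
        while (true) {
            int state = (Integer) stack.peek();
            char t = tokens.charAt(pos);
            char kind = Character.isDigit(t) ? 'n' : t;
            if ((state == 0 || state == 3) && kind == 'n') {          // shift n
                stack.push(String.valueOf(t));
                stack.push(state == 0 ? 2 : 4);
                pos++;
            } else if (state == 1 && kind == '+') {                   // shift +
                stack.push("+");
                stack.push(3);
                pos++;
            } else if (state == 1 && kind == '$') {                   // accept
                stack.pop();                                          // state 1
                return (String) stack.pop();                          // the finished tree
            } else if (state == 2 && (kind == '+' || kind == '$')) {  // reduce E ::= n
                stack.pop();
                String n = (String) stack.pop();
                pushGoto(stack, n);
            } else if (state == 4 && (kind == '+' || kind == '$')) {  // reduce E ::= E + n
                stack.pop();
                String n = (String) stack.pop();
                stack.pop();                                          // state 3
                stack.pop();                                          // "+"
                stack.pop();                                          // state 1
                String e = (String) stack.pop();
                pushGoto(stack, "(" + e + "+" + n + ")");             // build a new node
            } else {
                throw new IllegalArgumentException("syntax error at position " + pos);
            }
        }
    }

    // After a reduction the uncovered state must be 0 (the only state with a move on E); goto(0, E) = 1.
    private static void pushGoto(Deque<Object> stack, String tree) {
        int uncovered = (Integer) stack.peek();
        if (uncovered != 0) throw new IllegalStateException("no goto on E from state " + uncovered);
        stack.push(tree);
        stack.push(1);
    }

    public static void main(String[] args) {
        System.out.println(parse("1+2+3"));   // ((1+2)+3)
    }
}
```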

33
Bottom Up Parsers Overview of Algorithms
  • LR(0) The simplest algorithm, theoretically
    important but rather weak (not practical)
  • SLR An improved version of LR(0) more practical
    but still rather weak.
  • LR(1) LR(0) algorithm with extra lookahead
    token.
  • very powerful algorithm. Not often used because
    of large memory requirements (very big parsing
    tables)
  • LALR Watered down version of LR(1)
  • still very powerful, but has much smaller parsing
    tables
  • most commonly used algorithm today

34
JavaCUP: an LALR parser generator for Java
Grammar (BNF-like specification) -> JavaCUP -> Java file: Parser class
    (uses the Scanner to get tokens; parses the stream of tokens)
Definition of tokens (regular expressions) -> JFlex -> Java file: Scanner class
    (recognizes tokens)
Together, the Parser and Scanner classes form the Syntactic Analyzer.
35
Steps to build a compiler with SableCC
  1. Create a SableCC specification file
  2. Call SableCC
  3. Create one or more working classes, possibly
    inherited from classes generated by SableCC
  4. Create a Main class activating lexer, parser and
    working classes
  5. Compile with Javac

36
Contextual Analysis Phase
  • Purposes
  • Finish syntax analysis by deriving
    context-sensitive information
  • Associate semantic routines with individual
    productions of the context free grammar or
    subtrees of the AST
  • Start to interpret meaning of program based on
    its syntactic structure
  • Prepare for the final stage of compilation:
    code generation

37
Contextual Analysis -> Decorated AST
[Figure: decorated AST for the Mini Triangle program let var n: Integer; var c: Char in begin c := '&'; n := n + 1 end. Annotations: applied occurrences of identifiers are linked to their declarations (result of identification), and expression nodes carry their types, int or char (result of type checking).]
38
Nested Block Structure
A language exhibits nested block structure if
blocks may be nested one within another
(typically with no upper bound on the level of
nesting that is allowed).
Nested
  • There can be any number of scope levels
    (depending on the level of nesting of blocks)
  • Typical scope rules
  • no identifier may be declared more than once
    within the same block (at the same level).
  • for any applied occurrence there must be a
    corresponding declaration, either within the same
    block or in a block in which it is nested.
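Here is a minimal sketch of an identification table for nested block structure. It assumes that one map of declarations per open block is kept on a stack. The class and method names (`openScope`, `closeScope`, `enter`, `retrieve`) are assumptions. Using a description string as the declaration's attribute is a simplification; a compiler would store a reference to the declaration's AST node instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Identification table for nested block structure: one map of declarations per open block.
class IdentificationTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void openScope()  { scopes.push(new HashMap<>()); }
    void closeScope() { scopes.pop(); }

    // Scope rule 1: no identifier may be declared twice in the same block.
    void enter(String id, String attr) {
        Map<String, String> innermost = scopes.peek();
        if (innermost.containsKey(id)) {
            throw new IllegalStateException(id + " is already declared in this block");
        }
        innermost.put(id, attr);
    }

    // Scope rule 2: an applied occurrence must match a declaration in this block or an
    // enclosing one; searching innermost-first makes inner declarations hide outer ones.
    String retrieve(String id) {
        for (Map<String, String> scope : scopes) {     // ArrayDeque iterates from the top (innermost)
            String attr = scope.get(id);
            if (attr != null) return attr;
        }
        throw new IllegalStateException(id + " is not declared");
    }

    public static void main(String[] args) {
        IdentificationTable table = new IdentificationTable();
        table.openScope();                       // outer block declares x: Integer
        table.enter("x", "var x: Integer");
        table.openScope();                       // inner block redeclares x: Char
        table.enter("x", "var x: Char");
        System.out.println(table.retrieve("x")); // var x: Char
        table.closeScope();
        System.out.println(table.retrieve("x")); // var x: Integer
    }
}
```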

39
Type Checking
  • For most statically typed programming languages,
    type checking is a bottom up algorithm over the
    AST
  • Types of expression AST leaves are known
    immediately
  • literals => obvious
  • variables => from the ID table
  • named constants => from the ID table
  • Types of internal nodes are inferred from the
    type of the children and the type rule for that
    kind of expression
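The bottom-up algorithm can be sketched as a recursive function that types the children first and then applies the type rule for the node. The node classes and type names are assumptions for a Mini Triangle style language, and so are the operator rules: `+ - *` on integers give an integer, `< >` give a Boolean, and `=` needs equal types. The ID table is reduced to a map from names to types. The `main` method checks `m < 4`, which is accepted, and `n + 1` with `n` declared Boolean, which is rejected.

```java
import java.util.Map;

// Bottom-up type checking of expressions over a hypothetical mini AST.
class TypeChecker {
    enum Type { INT, BOOL, CHAR }

    interface Expr {}
    record IntLiteral(int value) implements Expr {}
    record CharLiteral(char value) implements Expr {}
    record VarRef(String name) implements Expr {}
    record Binary(Expr left, String op, Expr right) implements Expr {}

    private final Map<String, Type> idTable;   // filled in by identification
    TypeChecker(Map<String, Type> idTable) { this.idTable = idTable; }

    Type check(Expr e) {
        if (e instanceof IntLiteral) return Type.INT;          // leaves: literals are obvious
        if (e instanceof CharLiteral) return Type.CHAR;
        if (e instanceof VarRef v) {                            // leaves: variables come from the ID table
            Type t = idTable.get(v.name());
            if (t == null) throw new IllegalStateException(v.name() + " is not declared");
            return t;
        }
        Binary b = (Binary) e;                                  // internal node: type the children first
        Type left = check(b.left());
        Type right = check(b.right());
        switch (b.op()) {                                       // then apply the node's type rule
            case "+": case "-": case "*":
                require(left == Type.INT && right == Type.INT, b.op());
                return Type.INT;
            case "<": case ">":
                require(left == Type.INT && right == Type.INT, b.op());
                return Type.BOOL;
            case "=":
                require(left == right, b.op());
                return Type.BOOL;
            default:
                throw new IllegalStateException("unknown operator " + b.op());
        }
    }

    private static void require(boolean ok, String op) {
        if (!ok) throw new IllegalStateException("operands of " + op + " have the wrong type");
    }

    public static void main(String[] args) {
        TypeChecker tc = new TypeChecker(Map.of("m", Type.INT, "n", Type.BOOL));
        System.out.println(tc.check(new Binary(new VarRef("m"), "<", new IntLiteral(4)))); // BOOL
        try {
            tc.check(new Binary(new VarRef("n"), "+", new IntLiteral(1)));
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());   // operands of + have the wrong type
        }
    }
}
```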

40
Contextual Analysis
Identification and type checking are combined
into a depth-first traversal of the abstract
syntax tree.
[Figure: AST of the program let var n: Integer; var c: Char in begin c := '&'; n := n + 1 end, traversed depth-first.]
41
Visitor Solution
  • Nodes accept visitors and call appropriate method
    of the visitor
  • Visitors implement the operations and have one
    method for each type of node they visit
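Here is a small Java sketch of the pattern. It uses hypothetical node classes `IntExpr` and `BinaryExpr` and two visitors: `Evaluator` computes a value, and `Printer` produces text. The generic return type `R` is one common way to write visitors. It differs from the `Object visitX(X node, Object arg)` style, which passes an extra argument instead.

```java
// Visitor pattern sketch: nodes accept visitors; each visitor has one method per node type.
interface Visitor<R> {
    R visitIntExpr(IntExpr e);
    R visitBinaryExpr(BinaryExpr e);
}

abstract class Expr {
    abstract <R> R accept(Visitor<R> v);
}

class IntExpr extends Expr {
    final int value;
    IntExpr(int value) { this.value = value; }
    <R> R accept(Visitor<R> v) { return v.visitIntExpr(this); }     // node picks the visitor method
}

class BinaryExpr extends Expr {
    final Expr left, right;
    final char op;
    BinaryExpr(Expr left, char op, Expr right) { this.left = left; this.op = op; this.right = right; }
    <R> R accept(Visitor<R> v) { return v.visitBinaryExpr(this); }
}

// One operation = one visitor class.
class Evaluator implements Visitor<Integer> {
    public Integer visitIntExpr(IntExpr e) { return e.value; }
    public Integer visitBinaryExpr(BinaryExpr e) {
        int l = e.left.accept(this), r = e.right.accept(this);
        switch (e.op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            default: throw new IllegalArgumentException("unknown operator " + e.op);
        }
    }
}

class Printer implements Visitor<String> {
    public String visitIntExpr(IntExpr e) { return Integer.toString(e.value); }
    public String visitBinaryExpr(BinaryExpr e) {
        return "(" + e.left.accept(this) + " " + e.op + " " + e.right.accept(this) + ")";
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        Expr tree = new BinaryExpr(new BinaryExpr(new IntExpr(1), '+', new IntExpr(2)), '*', new IntExpr(3));
        System.out.println(tree.accept(new Printer()) + " = " + tree.accept(new Evaluator())); // ((1 + 2) * 3) = 9
    }
}
```

Adding a new operation, such as type checking, means writing one more visitor class. The node classes stay unchanged.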

42
Runtime organization
  • Data Representation how to represent values of
    the source language on the target machine.
  • Primitives, arrays, structures, unions, pointers
  • Expression Evaluation How to organize computing
    the values of expressions (taking care of
    intermediate results)
  • Register vs. stack machine
  • Storage Allocation How to organize storage for
    variables (considering different lifetimes of
    global, local and heap variables)
  • Activation records, static links
  • Routines How to implement procedures, functions
    (and how to pass their parameters and return
    values)
  • Value vs. reference, closures, recursion
  • Object Orientation Runtime organization for OO
    languages
  • Method tables

43
RECAP: TAM Frame Layout Summary

         arguments     Arguments for the current procedure;
                       they were put here by the caller.
  LB ->  link data     dynamic link, static link, return address
         local data    local variables and intermediate results;
                       grows and shrinks during execution.
  ST ->  (top of stack)
44
Garbage Collection Conclusions
  • Relieves the burden of explicit memory allocation
    and deallocation.
  • Software module coupling related to memory
    management issues is eliminated.
  • An extremely dangerous class of bugs is
    eliminated.
  • The compiler generates code for allocating
    objects
  • The compiler must also generate code to support
    GC
  • The GC must be able to recognize root pointers
    from the stack
  • The GC must know about data-layout and objects
    descriptors
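The slide does not fix a particular collection algorithm. As one illustration of why the collector needs the root pointers and the object layout, here is a minimal mark-and-sweep sketch. Each object's descriptor is reduced to an explicit list of its pointer fields, and the caller passes in the roots. Both are simplifications that stand in for information the compiler and runtime would supply.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Mark-and-sweep sketch over a simulated heap.
class MarkSweepHeap {
    static final class Obj {
        final String name;
        final List<Obj> pointers = new ArrayList<>();   // pointer fields the collector must trace
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();

    Obj allocate(String name) {          // stands in for the allocation code the compiler generates
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    // roots: pointers found on the stack (and in globals) at collection time.
    void collect(List<Obj> roots) {
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {                      // mark phase: everything reachable from a root
            Obj o = work.pop();
            if (o.marked) continue;
            o.marked = true;
            work.addAll(o.pointers);
        }
        heap.removeIf(o -> !o.marked);                 // sweep phase: reclaim unmarked objects
        for (Obj o : heap) o.marked = false;           // reset marks for the next collection
    }

    public static void main(String[] args) {
        MarkSweepHeap h = new MarkSweepHeap();
        Obj a = h.allocate("a"), b = h.allocate("b"), c = h.allocate("c");
        a.pointers.add(b);                               // a -> b; nothing points to c
        c.pointers.add(c);                               // a self-cycle does not keep c alive
        h.collect(List.of(a));
        h.heap.forEach(o -> System.out.println(o.name)); // a, b
    }
}
```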

45
Code Generation
Source program:
    let var n: Integer; var c: Char
    in begin c := '&'; n := n + 1 end
Target program:
    PUSH 2
    LOADL 38
    STORE 1[SB]
    LOAD 0[SB]
    LOADL 1
    CALL add
    STORE 0[SB]
    POP 2
    HALT
Source and target program must be semantically
equivalent.
Semantic specification of the source language is
structured in terms of phrases in the SL:
expressions, commands, etc. => Code generation
follows the same inductive structure.
46
Specifying Code Generation with Code Templates
The code generation functions for Mini Triangle:

Phrase class | Function    | Effect of the generated code
Program      | run P       | Run program P then halt, starting and finishing with an empty stack.
Command      | execute C   | Execute command C. May update variables but does not shrink or grow the stack!
Expression   | evaluate E  | Evaluate E; net result is pushing the value of E on the stack.
V-name       | fetch V     | Push value of constant or variable V on the stack.
V-name       | assign V    | Pop value from stack and store in variable V.
Declaration  | elaborate D | Elaborate declaration D; make space on the stack for constants and variables in the declaration.
47
Code Generation with Code Templates
While command:
  execute [while E do C] =
        JUMP h
    g:  execute [C]
    h:  evaluate [E]
        JUMPIF(1) g
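The template can be implemented with backpatching. The forward `JUMP h` is emitted before `h` is known, and it is fixed up once the code for C has been generated. In this sketch the code for E and C is passed in as ready-made instruction lists, and instructions are plain strings. Both are simplifications, and the instruction spellings only approximate TAM.

```java
import java.util.ArrayList;
import java.util.List;

// Code template for  execute [while E do C]:
//         JUMP h
//     g:  execute [C]
//     h:  evaluate [E]
//         JUMPIF(1) g
class WhileTemplate {
    final List<String> code = new ArrayList<>();

    int nextInstrAddr() { return code.size(); }

    int emit(String instr) { code.add(instr); return code.size() - 1; }

    // Backpatching: overwrite an instruction once its target address is known.
    void patch(int addr, String instr) { code.set(addr, instr); }

    void executeWhile(List<String> evaluateE, List<String> executeC) {
        int jumpAddr = emit("JUMP ?");              // target h not known yet
        int g = nextInstrAddr();
        executeC.forEach(this::emit);               // g: execute C
        int h = nextInstrAddr();
        patch(jumpAddr, "JUMP " + h);
        evaluateE.forEach(this::emit);              // h: evaluate E
        emit("JUMPIF(1) " + g);                     // loop back while E yields true (1)
    }

    public static void main(String[] args) {
        WhileTemplate gen = new WhileTemplate();
        // while n < 10 do n := n + 1, with n at 0[SB] (hypothetical example)
        gen.executeWhile(
            List.of("LOAD 0[SB]", "LOADL 10", "CALL lt"),
            List.of("LOAD 0[SB]", "LOADL 1", "CALL add", "STORE 0[SB]"));
        for (int i = 0; i < gen.code.size(); i++) System.out.println(i + ": " + gen.code.get(i));
    }
}
```

For this hypothetical loop, the generated code starts with `JUMP 5` and ends with `JUMPIF(1) 1`, which jumps back to the first instruction of the loop body.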
48
Developing a Code Generator Visitor
execute [C1 ; C2] = execute [C1]
                    execute [C2]

    public Object visitSequentialCommand(SequentialCommand com, Object arg) {
        com.C1.visit(this, arg);
        com.C2.visit(this, arg);
        return null;
    }

LetCommand, IfCommand, WhileCommand => later.
  - LetCommand is more complex: memory allocation and addresses
  - IfCommand and WhileCommand: complications with jumps
49
Code improvement (optimization)
  • The code generated by our compiler is not
    efficient
  • It computes values at runtime that could be known
    at compile time
  • It computes values more times than necessary
  • We can do better!
  • Constant folding
  • Common sub-expression elimination
  • Code motion
  • Dead code elimination
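Constant folding is the easiest of these to sketch. After the children of a binary node have been folded, the node is replaced by a literal if both operands are literals. The record types are hypothetical. Only `+`, `-` and `*` are folded; other operators are left for run time in this sketch, since folding `/` would have to preserve a division by zero.

```java
// Constant folding sketch over a tiny expression AST (node types are assumptions).
class ConstantFolder {
    interface Expr {}
    record Lit(int value) implements Expr {}
    record Var(String name) implements Expr {}
    record Bin(Expr left, char op, Expr right) implements Expr {}

    static Expr fold(Expr e) {
        if (!(e instanceof Bin b)) return e;                // literals and variables stay as they are
        Expr l = fold(b.left()), r = fold(b.right());       // fold the children first (bottom-up)
        if (l instanceof Lit x && r instanceof Lit y) {
            switch (b.op()) {
                case '+': return new Lit(x.value() + y.value());
                case '-': return new Lit(x.value() - y.value());
                case '*': return new Lit(x.value() * y.value());
                default: break;                             // other operators are left for run time
            }
        }
        return new Bin(l, b.op(), r);
    }

    public static void main(String[] args) {
        // (2 * 60) * n : the subexpression 2 * 60 is known at compile time and folds to 120
        Expr e = new Bin(new Bin(new Lit(2), '*', new Lit(60)), '*', new Var("n"));
        System.out.println(fold(e));
    }
}
```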

50
Optimization implementation
  • Is the optimization correct or safe?
  • Is the optimization an improvement?
  • What sort of analyses do we need to perform to
    get the required information?
  • Local
  • Global

51
Concurrency, distributed computing, the Internet
  • Traditional view
  • Let the OS deal with this
  • => It is not a programming language issue!
  • End of Lecture
  • Wait-a-minute
  • Maybe the traditional view is getting out of
    date?

52
Languages with concurrency constructs
  • Maybe the traditional view was always out of
    date?
  • Simula
  • Modula-3
  • Occam
  • Concurrent Pascal
  • Ada
  • Linda
  • CML
  • Facile
  • JoCaml
  • Java
  • C#
  • Fortress

53
What could languages provide?
  • Abstract model of system
  • abstract machine => abstract system
  • Example high-level constructs
  • Process as the value of an expression
  • Pass processes to functions
  • Create processes as the result of a function call
  • Communication abstractions
  • Synchronous communication
  • Buffered asynchronous channels that preserve msg
    order
  • Mutual exclusion, atomicity primitives
  • Most concurrent languages provide some form of
    locking
  • Atomicity is more complicated, less commonly
    provided
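Java has no channel construct in the language itself. Its library class `java.util.concurrent.LinkedBlockingQueue`, however, behaves like a bounded, buffered asynchronous channel that preserves message order. In the sketch below, the buffer size of 4 and the `-1` end-of-stream sentinel are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A buffered asynchronous channel approximated with a library BlockingQueue (FIFO order).
class ChannelDemo {
    static List<Integer> run(int messages) {
        BlockingQueue<Integer> channel = new LinkedBlockingQueue<>(4);   // buffer of 4 messages
        List<Integer> received = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < messages; i++) channel.put(i);       // blocks only when the buffer is full
                channel.put(-1);                                         // sentinel: end of stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        try {
            for (int m = channel.take(); m != -1; m = channel.take()) {  // blocks while the buffer is empty
                received.add(m);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while receiving", e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run(10));   // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

`run(10)` returns the messages in the order they were sent. This is a library facility rather than a language construct.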

54
Programming Language Life cycle
  • The requirements for the new language are
    identified
  • The language syntax and semantics are designed
  • BNF or EBNF, experiments with front-end tools
  • Informal or formal semantics
  • An informal or formal specification is developed
  • Initial implementation
  • Prototype via interpreter or interpretive
    compiler
  • Language tested by designers, implementers and a
    few friends
  • Feedback on the design and possible
    reconsiderations
  • Improved implementation

55
Programming Language Life cycle
Design -> Specification -> Prototype -> Compiler -> Manuals, Textbooks
56
Programming Language Life cycle
  • Lots of research papers
  • Conference sessions dedicated to the new language
  • Text books and manuals
  • Used in large applications
  • Huge international user community
  • Dedicated conference
  • International standardisation efforts
  • Industry de facto standard
  • Programs written in the language become legacy
    code
  • Language enters hall-of-fame and features are
    taught in CS course on Programming Language
    Design and Implementation

57
The Most Important Open Problem in Computing
  • Increasing Programmer Productivity
  • Write programs correctly
  • Write programs quickly
  • Write programs easily
  • Why?
  • Decreases support cost
  • Decreases development cost
  • Decreases time to market
  • Increases satisfaction

58
Why Programming Languages?
  • 3 ways of increasing programmer productivity
  • Process (software engineering)
  • Controlling programmers
  • Tools (verification, static analysis, program
    generation)
  • Important, but generally of narrow applicability
  • Language design --- the center of the universe!
  • Core abstractions, mechanisms, services,
    guarantees
  • Affect how programmers approach a task (C vs.
    SML)
  • Multi-paradigm integration

59
Programming Languages and Compilers are at the
core of Computing
All software is written in a programming
language. Learning about compilers will teach you
a lot about the programming languages you already
know. Compilers are big; therefore you need to
apply all your knowledge of software
engineering. The compiler is the program from
which all other programs arise.
60
How to recognize a problem that can be solved
with programming language techniques when you see
one?
  • Problem - a Scrabble game to be distributed as an
    applet.
  • Create a dictionary of 50,000 words.
  • Two options
  • Program 1
  • create an external file words.txt and read it
    into an array when the program starts
  • while ((word = f.readLine()) != null)
        words.addElement(word);
  • Program 2
  • create a 50,000 element table in the program and
    initialize it to the words
  • String[] words = {"hill", "fetch", "pail",
    "water", ...};
  • Advantages/disadvantages of each approach?
  • performance
  • flexibility
  • correctness
  • .
  • Example from J. Craig Cleaveland. Program
    Generators with XML and Java, chapter 1

61
A program generator approach
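    import java.io.*;
    import java.util.*;

    class Dictionary1Generator {
        static Vector<String> words = new Vector<>();

        static void loadWords() throws IOException {
            // read the words in file words.txt
            // into the Vector words
            try (BufferedReader f = new BufferedReader(new FileReader("words.txt"))) {
                String word;
                while ((word = f.readLine()) != null)
                    words.addElement(word);
            }
        }

        public static void main(String[] args) throws IOException {
            loadWords();
            // Generate Dictionary1 program
            System.out.println("class Dictionary1 {");
            System.out.println("  String[] words = {");
            for (int j = 0; j < words.size(); j++)
                System.out.println("    \"" + words.elementAt(j) + "\",");
            System.out.println("  };");
            System.out.println("}");
        }
    }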

62
Typical program generator
  • Dictionary example
  • The data
  • simply a list of words
  • Analyzing/transforming data
  • duplicate word removal
  • sorting
  • Generate program
  • simply use print statements to write program text
  • General picture
  • The data
  • some more complex representation of data
  • formal specs,
  • grammar,
  • spreadsheet,
  • XML,
  • etc.
  • Analyzing/transforming data
  • parse, check for inconsistencies, transform to
    other data structures
  • Generate program
  • generate syntax tree, use templates,

63
The next wave of Program Generators: Model-Driven
Development
64
New Programming Language! Why Should I Care?
  • The problem is not designing a new language
  • It's easy! Thousands of languages have been
    developed
  • The problem is how to get wide adoption of the
    new language
  • It's hard! Challenges include
  • Competition
  • Usefulness
  • Interoperability
  • Fear
  • "It's a good idea, but it's a new idea;
    therefore, I fear it and must reject it." ---
    Homer Simpson
  • The financial rewards are low, but

65
Famous Danish Computer Scientists
  • Peter Naur
  • BNF and Algol
  • Per Brinch Hansen
  • Monitors and Concurrent Pascal
  • Dines Bjørner
  • VDM and Ada
  • Bjarne Stroustrup
  • C++
  • Mads Tofte
  • SML
  • Rasmus Lerdorf
  • PHP
  • Anders Hejlsberg
  • Turbo Pascal and C#
  • Jakob Nielsen

66
(No Transcript)
67
(No Transcript)
68
Fancy joining this crowd?
  • Join the Programming Language Technology Research
    Group when you get to DAT5/DAT6 or SW8/SW9
  • Research Programme underway
  • How would you like to programme in 20 years?
  • OO and Functional Programming
  • Lots of MSc projects
  • Languages for testability, verifiability,
    specifiability
  • Java vs. .Net
  • Aspect Oriented Programming on .Net
  • Business Process Management Language
  • Multiple dispatch in C#
  • XML as program representation
  • Java on Mobile Phones
  • OO and DB
  • OO and Concurrency

69
Finally
Keep in mind, the compiler is the program from
which all other programs arise. If your compiler
is under par, all programs created by the
compiler will also be under par. No matter the
purpose or use -- your own enlightenment about
compilers or commercial applications -- you want
to be patient and do a good job with this
program; in other words, don't try to throw this
together on a weekend. Asking a computer
programmer to tell you how to write a compiler is
like saying to Picasso, "Teach me to paint like
you." Sigh. Well, Picasso tried.
70
What I promised you at the start of the course
  • Ideas, principles and techniques to help you
  • Design your own programming language or design
    your own extensions to an existing language
  • Tools and techniques to implement a compiler or
    an interpreter
  • Lots of knowledge about programming

I hope you feel you got what I promised
71
Top 10 reasons COMPILERS must be female
  • 10. Picky, picky, picky.
  • 9. They hear what you say, but not what you mean.
  • 8. Beauty is only shell deep.
  • 7. When you ask what's wrong, they say "nothing".
  • 6. Can produce incorrect results with alarming
    speed.
  • 5. Always turning simple statements into big
    productions.
  • 4. Small talk is important.
  • 3. You do the same thing for years, and suddenly
    it's wrong.
  • 2. They make you take the garbage out.
  • 1. Miss a period and they go wild.