CSC 415: Translators and Compilers

1
CSC 415 Translators and Compilers
  • Dr. Chuck Lillie

2
Course Outline
  • Translators and Compilers
  • Language Processors
  • Compilation
  • Syntactic Analysis
  • Contextual Analysis
  • Run-Time Organization
  • Code Generation
  • Interpretation
  • Major Programming Project
  • Project Definition and Planning
  • Implementation
  • Weekly Status Reports
  • Project Presentation

3
Project
  • Implement a Compiler for the Programming Language
    Triangle
  • Appendix B Informal Specification of the
    Programming Language Triangle
  • Appendix D Class Diagrams for the Triangle
    Compiler
  • Present Project Plan
  • What and How
  • Weekly Status Reports
  • Work accomplished during the reporting period
  • Deliverable progress, as a percentage of
    completion
  • Problem areas
  • Planned activities for the next reporting period

4
Chapter 1 Introduction to Programming Languages
  • Programming Language: a formal notation for
    expressing algorithms.
  • Programming Language Processors: tools to enter,
    edit, translate, and interpret programs on
    machines.
  • Machine Code: basic machine instructions
  • Keep track of the exact address of each data item and
    each instruction
  • Encode each instruction as a bit string
  • Assembly Language: symbolic names for operations,
    registers, and addresses.

5
Programming Languages
  • High-Level Languages: notation similar to
    familiar mathematical notation
  • Expressions: +, -, *, /
  • Data Types: truth values, characters,
    integers, records, arrays
  • Control Structures: if, case, while, for
  • Declarations: constant values, variables,
    procedures, functions, types
  • Abstraction: separates what is to be performed
    from how it is to be performed
  • Encapsulation (or data abstraction): groups
    together related declarations and selectively
    hides some

6
Programming Language Processors
  • Any system that manipulates programs expressed in
    some particular programming language
  • Editors: enter, modify, and save program text
  • Translators and Compilers: a translator translates text from
    one language to another. A compiler translates a
    program from a high-level language to a low-level
    language, preparing it to be run on a machine
  • Checks the program for syntactic and contextual
    errors
  • Interpreters: run a program without compilation
  • Command languages
  • Database query languages

7
Programming Languages Specifications
  • Syntax
  • Form of the program
  • Defines symbols
  • How phrases are composed
  • Contextual constraints
  • Scope: determines the scope of each declaration
  • Type
  • Semantics
  • Meaning of the program

8
Representation
  • Syntax
  • Backus-Naur Form (BNF): a notation for context-free grammars
  • Terminal symbols (>=, while, ;)
  • Non-terminal symbols (Program, Command,
    Expression, Declaration)
  • Start symbol (Program)
  • Production rules (define how phrases are
    composed from terminals and sub-phrases)
  • N ::= a | b | ...
  • Syntax Tree
  • Used to define the language in terms of strings of
    terminal symbols

9
Representation
  • Semantics
  • Abstract Syntax
  • Concentrate on phrase structure alone
  • Abstract Syntax Tree

10
Contextual Constraints
  • Scope
  • Binding
  • Static determined by language processor
  • Dynamic determined at run-time
  • Type
  • Statically language processor can detect all
    errors
  • Dynamically type errors cannot be detected until
    run-time

Will assume static binding and statically typed
11
Semantics
  • Concerned with meaning of program
  • Behavior when run
  • Usually specified informally
  • Declarative sentences
  • Could include side effects
  • Correspond to production rules

12
Mini-Triangle Syntax
  Program        ::= single-Command
  Command        ::= single-Command
                   | Command ; single-Command
  single-Command ::= V-name := Expression
                   | Identifier ( Expression )
                   | if Expression then single-Command
                     else single-Command
                   | while Expression do single-Command
                   | let Declaration in single-Command
                   | begin Command end
  Expression     ::= primary-Expression
                   | Expression Operator primary-Expression

13
Mini-Triangle Syntax
  primary-Expression ::= Integer-Literal
                       | V-name
                       | Operator primary-Expression
                       | ( Expression )
  V-name             ::= Identifier
  Declaration        ::= single-Declaration
                       | Declaration ; single-Declaration
  single-Declaration ::= const Identifier ~ Expression
                       | var Identifier : Type-denoter
  Type-denoter       ::= Identifier
  Operator           ::= + | - | * | / | < | > | = | \
  Identifier         ::= Letter | Identifier Letter | Identifier Digit
  Integer-Literal    ::= Digit | Integer-Literal Digit
  Comment            ::= ! Graphic* eol

14
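Syntax Tree: let var y: Integer in y := y + 1
[Figure: syntax tree of this program. The Program node has one single-Command child, the let-command. Its Declaration subtree is the single-Declaration "var y: Integer" (Identifier y, Type-denoter Integer). Its single-Command subtree is the assignment "y := y + 1", whose Expression combines the primary-Expression y (a V-name), the Operator +, and the primary-Expression 1 (an Integer-Literal).]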


15
Representation
  • Semantics
  • Abstract Syntax
  • Concentrate on phrase structure alone
  • Abstract Syntax Tree

16
Mini-Triangle Abstract Syntax
(right-hand column: AST node label)
  Program    ::= Command                              Program
  Command    ::= V-name := Expression                 AssignCommand
               | Identifier ( Expression )            CallCommand
               | Command ; Command                    SequentialCommand
               | if Expression then Command
                 else Command                         IfCommand
               | while Expression do Command          WhileCommand
               | let Declaration in Command           LetCommand
  Expression ::= Integer-Literal                      IntegerExpression
               | V-name                               VnameExpression
               | Operator Expression                  UnaryExpression
               | Expression Operator Expression       BinaryExpression

17
Mini-Triangle Abstract Syntax
(right-hand column: AST node label)
  V-name       ::= Identifier                         SimpleVname
  Declaration  ::= const Identifier ~ Expression      ConstDeclaration
                 | var Identifier : Type-denoter      VarDeclaration
                 | Declaration ; Declaration          SequentialDeclaration
  Type-denoter ::= Identifier                         SimpleTypeDenoter

18
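Abstract Syntax Tree: let var y: Integer in y := y + 1
[Figure: AST of this program. Program has a LetCommand child. The LetCommand has a VarDeclaration (Identifier y, SimpleTypeDenoter with Identifier Integer) and an AssignmentCommand. The AssignmentCommand has a SimpleVname (Identifier y) and a BinaryExpression made of a VnameExpression (SimpleVname, Identifier y), the Operator +, and an IntegerExpression (Integer-Literal 1).]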

19
Mini-Triangle Semantics
  • A command C is executed in order to update
    variables (this includes input and output).
  • The assignment command V := E is executed as
    follows. The expression E is evaluated to yield
    a value v; then v is assigned to the
    value-or-variable-name V.
  • The call-command I(E) is executed as follows.
    The expression E is evaluated to yield a value v;
    then the procedure bound to I is called with v as
    its argument.
  • The sequential command C1 ; C2 is executed as
    follows. First C1 is executed; then C2 is
    executed.
  • The if-command if E then C1 else C2 is executed
    as follows. The expression E is evaluated to
    yield a truth-value t; if t is true, C1 is
    executed; if t is false, C2 is executed.
  • The while-command while E do C is executed as
    follows. The expression E is evaluated to yield
    a truth-value t; if t is true, C is executed, and
    then the while-command is executed again; if t is
    false, execution of the while-command is
    completed.
  • The let-command let D in C is executed as
    follows. The declaration D is elaborated to
    produce bindings b; C is executed in the
    environment of the let-command overlaid by the
    bindings b. The bindings b have no effect
    outside the let-command.

20
Chapter 2 Language Processors
  • Translators and Compilers
  • Interpreters
  • Real and Abstract Machines
  • Interpretive Compilers
  • Portable Compilers
  • Bootstrapping
  • Case Study The Triangle Language Processor

21
Translators & Compilers
  • Translator: a program that accepts any text
    expressed in one language (the translator's
    source language), and generates a
    semantically equivalent text expressed in another
    language (its target language)
  • Chinese-into-English
  • Java-into-C
  • Java-into-x86
  • x86 assembler

22
Translators & Compilers
  • Assembler: translates from an assembly language
    into the corresponding machine code
  • Generates one machine-code instruction per source
    instruction
  • Compiler: translates from a high-level language
    into a low-level language
  • Generates several machine-code instructions per
    source command

23
Translators & Compilers
  • Disassembler: translates machine code into the
    corresponding assembly language
  • Decompiler: translates a low-level language into
    a high-level language

Question: Why would you want a disassembler or
decompiler?
24
Translators & Compilers
  • Source Program: the source language text
  • Object Program: the target language text

[Figure: the compiler performs a syntax check and a contextual-constraints check]
  • The object program is semantically equivalent to the source
    program
  • If the source program is well-formed

25
Translators & Compilers
  • Why would you want a
  • Java-into-C translator?
  • C-into-Java translator?
  • Assembly-language-into-Pascal decompiler?

26
Translators & Compilers
[Tombstone diagram: program P expressed in language L, running on machine M]
P: program name
L: implementation language
M: target machine
For this to work, L must equal M, that is, the
implementation language must be the same as the
machine language.
[Tombstone diagram: S-into-T translator expressed in language L]
S: source language
T: target language
L: translator's implementation language
An S-into-T translator is itself a program that runs
on machine L.
27
Translators & Compilers
  • Translating a source program P
  • Expressed in language S,
  • Using an S-into-T translator
  • Running on machine M

28
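Translators & Compilers
[Tombstone diagram: the Java program sort is fed to a Java-into-x86 compiler running on an x86 machine, which produces the x86 program sort; that program then runs on the x86 machine]
  • Translating a source program sort
  • Expressed in language Java,
  • Using a Java-into-x86 translator
  • Running on an x86 machine

The object program runs on the same machine
as the compiler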
29
Translators & Compilers
[Tombstone diagram: the Java program sort is fed to a Java-into-PPC compiler running on an x86 machine, which produces the PPC program sort; that program is downloaded to, and runs on, a PPC machine]
  • Translating a source program sort
  • Expressed in language Java,
  • Using a Java-into-PPC translator
  • Running on an x86 machine
  • Downloaded to a PPC machine

Cross Compiler: the object program runs on
a different machine than the compiler
30
Translators & Compilers
[Tombstone diagram: the Java program sort is translated by a Java-into-C translator running on an x86 machine into the C program sort, which a C-into-x86 compiler running on an x86 machine translates into the x86 program sort]
  • Translating a source program sort
  • Expressed in language Java,
  • Using a Java-into-C translator
  • Running on an x86 machine
  • Then translating the C program
  • Using a C-into-x86 compiler
  • Running on an x86 machine
  • Into an x86 object program

Two-stage Compiler: the source program is
translated to another language before being
translated into the object program
31
Translators & Compilers
  • Translator Rules
  • Can run on machine M only if it is expressed in
    machine code M
  • Source program must be expressed in the translator's
    source language S
  • Object program is expressed in the translator's
    target language T
  • Object program is semantically equivalent to the
    source program

32
Interpreters
  • Accepts any program (source program) expressed in
    a particular language (source language) and runs
    that source program immediately
  • Does not translate the source program into object
    code prior to execution

33
Interpreters
[Flowchart: the interpreter repeats Fetch Instruction → Analyze Instruction → Execute Instruction until Program Complete]
  • Source program starts to run as soon as the first
    instruction is analyzed
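
The fetch, analyze, execute cycle can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the instruction names (PUSH, ADD, MUL, HALT) and the class TinyInterpreter are hypothetical and are not part of the course material or of Triangle.

```java
/** Sketch of an interpreter's fetch-analyze-execute loop (hypothetical instruction set). */
class TinyInterpreter {

    /** Runs the instructions in order and returns the value on top of the stack. */
    static int run(String[] program) {
        int[] stack = new int[64];   // operand stack (fixed size, enough for a demo)
        int sp = 0;                  // index of the next free stack slot
        int pc = 0;                  // index of the next instruction

        while (pc < program.length) {                 // program complete?
            String instruction = program[pc++];       // fetch
            String[] parts = instruction.split(" ");  // analyze: opcode and operand
            switch (parts[0]) {                       // execute
                case "PUSH":
                    stack[sp++] = Integer.parseInt(parts[1]);
                    break;
                case "ADD": {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case "MUL": {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left * right;
                    break;
                }
                case "HALT":
                    pc = program.length;              // stop the loop
                    break;
                default:
                    throw new IllegalArgumentException("Unknown instruction: " + instruction);
            }
        }
        return stack[sp - 1];
    }

    public static void main(String[] args) {
        String[] program = {"PUSH 2", "PUSH 3", "ADD", "PUSH 4", "MUL", "HALT"};
        System.out.println(run(program));   // evaluates (2 + 3) * 4
    }
}
```

Each instruction is analyzed again every time it is executed, which is one reason interpretation is slower than running compiled machine code.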

34
Interpreters
  • When to Use Interpretation
  • Interactive mode: want to see the results of an
    instruction before entering the next instruction
  • Program is to be used only once
  • Each instruction is expected to be executed only
    once
  • Instructions have simple formats
  • Disadvantages
  • Slow: up to 100 times slower than machine code

35
Interpreters
  • Examples
  • Basic
  • Lisp
  • Unix Command Language (shell)
  • SQL

36
Interpreters
S interpreter expressed in language L
Program P expressed in language S, using
Interpreter S, running on machine M
Program graph written in Basic running on a Basic
interpreter executed on an x86 machine
37
Real and Abstract Machines
  • Hardware emulation: using software to execute one
    machine's code on another machine
  • Can measure everything about the new machine
    except its speed
  • Abstract machine emulator
  • Real machine actual hardware

An abstract machine is functionally equivalent to
a real machine if they both implement the same
language L
38
Real and Abstract Machines
New Machine Instruction (nmi) interpreter written
in C
nmi interpreter expressed in machine code M
nmi interpreter written in C
The nmi interpreter is translated into machine
code M using the C compiler
Compiler to translate C program into M machine
code
39
Interpretive Compilers
  • Combination of compiler and interpreter
  • Translate source program into an intermediate
    language
  • It is intermediate in level between the source
    language and ordinary machine code
  • Its instructions have simple formats, and
    therefore can be analyzed easily and quickly
  • Translation from the source language into the
    intermediate language is easy and fast

An interpretive compiler combines fast
compilation with tolerable running speed
40
Interpretive Compilers
Java into JVM translator running on machine M
JVM code interpreter running on machine M
A Java program P is first translated into
JVM-code, and then the JVM-code object program is
interpreted
41
Portable Compilers
  • A program is portable if it can be compiled and
    run on any machine, without change
  • A portable program is more valuable than an
    unportable one, because its development cost can
    be spread over more copies
  • Portability is measured by the proportion of code
    that remains unchanged when it is moved to a
    dissimilar machine
  • Language affects portability
  • Assembly language: 0% portable
  • High-level language: approaches 100% portability

42
Portable Compilers
  • Language Processors
  • Valuable and widely used programs
  • Typically written in high-level language
  • Pascal, C, Java
  • Part of language processor is machine dependent
  • Code generation part
  • Language processor is only about 50% portable
  • Compiler that generates intermediate code is more
    portable than a compiler that generates machine
    code

43
Portable Compilers
Java
JVM
Java
Rewrite interpreter in C
44
Bootstrapping
  • The language processor is used to process itself
  • Implementation language is the source language
  • Bootstrapping a portable compiler
  • A portable compiler can be bootstrapped to make a
    true compiler (one that generates machine code)
    by writing an intermediate-language-into-machine-code
    translator
  • Full bootstrap
  • Writing the compiler in itself
  • Using the latest version to upgrade the next
    version
  • Half bootstrap
  • Compiler expressed in itself but targeted for
    another machine
  • Bootstrapping to improve efficiency
  • Upgrade the compiler to optimize code generation
    as well as to improve compile efficiency

45
Bootstrapping
Bootstrap an interpretive compiler to generate
machine code
First, write a JVM-coded-into-M translator in Java
Next, compile translator using existing
interpreter
Use translator to translate itself
Two stage Java-into-M compiler
Translate Java-into-JVM-code translator into
machine code
46
Bootstrapping
Full bootstrap
[Tombstone diagrams: compiler versions v1, v2, v3]
  • Write the Ada-S compiler in C
  • Convert the C version of the Ada-S compiler into an Ada-S version
    of the Ada-S compiler
  • Extend the Ada-S compiler to a (full) Ada compiler
47
Bootstrapping
Half bootstrap
48
Bootstrapping
Bootstrap to improve efficiency
49
Chapter 3 Compilation
  • Phases
  • Syntactic Analysis
  • Contextual Analysis
  • Code Generation
  • Passes
  • Multi-pass Compilation
  • One-pass Compilation
  • Compiler Design Issues
  • Case Study The Triangle Compiler

50
Phases
  • Syntactic Analysis
  • The source program is parsed to check whether it
    conforms to the source language's syntax, and to
    determine its phrase structure
  • Contextual Analysis
  • The parsed program is analyzed to check whether
    it conforms to the source language's contextual
    constraints
  • Code Generation
  • The checked program is translated to an object
    program, in accordance with the semantics of the
    source and target languages

51
Phases
[Figure: Source Program → Syntactic Analysis → AST → Contextual Analysis → Decorated AST → Code Generation → Object Program. Syntactic Analysis and Contextual Analysis each may produce an Error Report.]
52
Syntactic Analysis
  • To determine the source program's phrase
    structure
  • Parsing
  • Contextual analysis and code generation must know
    how the program is composed
  • Commands, expressions, declarations, ...
  • Check for conformance to the source language's
    syntax
  • Construct a suitable representation of its phrase
    structure (AST)
  • AST
  • Terminal nodes corresponding to identifiers,
    literals, and operators
  • Subtrees representing the phrases of the source
    program
  • Blanks and comments not in AST (no meaning)
  • Punctuation and brackets not in AST (only
    separate and enclose)

53
Contextual Analysis
  • Analyzes the parsed program
  • Scope rules
  • Type rules
  • Produces decorated AST
  • AST with information gathered during contextual
    analysis
  • Each applied occurrence of an identifier is
    linked to the corresponding declaration
  • Each expression is decorated by its type T

54
Code Generation
  • The final translation of the checked program to
    an object program
  • After syntactic and contextual analysis is
    completed
  • Treatment of identifiers
  • Constants
  • Binds identifier to value
  • Replace each occurrence of identifier with value
  • Variables
  • Binds identifier to some memory address
  • Replace each occurrence of identifier by address
  • Target language
  • Assembly language
  • Machine code

55
Passes
  • Multi-pass compilation
  • Traverses the program or AST several times
  • One-pass compilation
  • Single traverse of program
  • Contextual analysis and code generation are
    performed on the fly during syntactic analysis

56
Compiler Design Issues
  • Speed
  • Compiler run time
  • Space
  • Storage size of compiler files generated
  • Modularity
  • Multi-pass compiler more modular than one-pass
    compiler
  • Flexibility
  • Multi-pass compiler is more flexible because it
    generates an AST that can be traversed in any
    order by the other phases
  • Semantics-preserving transformations
  • To optimize code must have multi-pass compiler
  • Source language properties
  • May restrict compiler choice: some language
    constructs may require multi-pass compilers

57
Simple Triangle Program
! This program is useless
! except for illustration.
let
  var n: Integer;
  var c: Char
in
  begin
    c := '&';
    n := n + 1
  end
58
Abstract Syntax Tree
[Figure: AST of the simple Triangle program. Program has a LetCommand child. The LetCommand has a SequentialDeclaration of two VarDeclarations (each an Identifier with a SimpleTypeDenoter: n with Integer, c with Char) and a sequence of two AssignmentCommands. The first assigns a CharacterExpression (Character-Literal) to SimpleVname c; the second assigns a BinaryExpression (VnameExpression n, Operator +, IntegerExpression 1) to SimpleVname n. Selected nodes are numbered (1) to (9).]
59
Abstract Syntax Tree
[Figure: the same AST as on the previous slide, with selected nodes numbered (1) to (9)]
60
Chapter 4 Syntactic Analysis
  • Sub-phases of Syntactic Analysis
  • Grammars Revisited
  • Parsing
  • Abstract Syntax Trees
  • Scanning
  • Case Study Syntactic Analysis in the Triangle
    Compiler

61
Structure of a Compiler
[Figure: Source code → Lexical Analyzer → tokens → Parser / Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code. The Symbol Table is shared by the phases.]
62
Syntactic Analysis
  • Main function
  • Parse source program to discover its phrase
    structure
  • Recursive-descent parsing
  • Constructing an AST
  • Scanning to group characters into tokens

63
Sub-phases of Syntactic Analysis
  • Scanning (or lexical analysis)
  • Source program transformed to a stream of tokens
  • Identifiers
  • Literals
  • Operators
  • Keywords
  • Punctuation
  • Comments and blank spaces discarded
  • Parsing
  • To determine the source program's phrase structure
  • Source program is input as a stream of tokens
    (from the Scanner)
  • Treats each token as a terminal symbol
  • Representation of phrase structure
  • AST

64
Lexical Analysis A Simple Example
main() {
  int a, b, c;
  char number[5];
  /* get user inputs */
  a = atoi(gets(number));
  b = atoi(gets(number));
  /* calculate value for c */
  c = 2*(a+b) + a*(a+b);
  /* print results */
  printf("%d", c);
}
  • Scan the file character by character and group
    characters into words and punctuation (tokens),
    remove white space and comments
  • Some tokens for this example
  • main
  • (
  • )
  • int
  • a
  • ,
  • b
  • ,
  • c

65
Creating Tokens Mini-Triangle Example
[Figure: the input converter supplies the character string "let var y: Integer in y := y + 1" to the scanner, which produces the token stream: let, var, Ident. (y), colon (:), Ident. (Integer), in, Ident. (y), becomes (:=), Ident. (y), op. (+), Intlit. (1), eot]
66
Tokens in Triangle
  • // literals, identifiers, operators...
  • INTLITERAL = 0, "<int>",
  • CHARLITERAL = 1, "<char>",
  • IDENTIFIER = 2, "<identifier>",
  • OPERATOR = 3, "<operator>",
  • // reserved words - must be in alphabetical
    order...
  • ARRAY = 4, "array",
  • BEGIN = 5, "begin",
  • CONST = 6, "const",
  • DO = 7, "do",
  • ELSE = 8, "else",
  • END = 9, "end",
  • FUNC = 10, "func",
  • IF = 11, "if",
  • IN = 12, "in",
  • LET = 13, "let",
  • OF = 14, "of",
  • PROC = 15, "proc",
  • // punctuation...
  • DOT = 21, ".",
  • COLON = 22, ":",
  • SEMICOLON = 23, ";",
  • COMMA = 24, ",",
  • BECOMES = 25, ":=",
  • IS = 26, "~",
  • // brackets...
  • LPAREN = 27, "(",
  • RPAREN = 28, ")",
  • LBRACKET = 29, "[",
  • RBRACKET = 30, "]",
  • LCURLY = 31, "{",
  • RCURLY = 32, "}",
  • // special tokens...
  • EOT = 33, "",
  • ERROR = 34, "<error>"
67
Grammars Revisited
  • Context free grammars
  • Generates a set of sentences
  • Each sentence is a string of terminal symbols
  • An unambiguous sentence has a unique phrase
    structure embodied in its syntax tree
  • Develop parsers from context-free grammars

68
Regular Expressions
  • A regular expression (RE) is a convenient
    notation for expressing a set of strings of
    terminal symbols
  • Main features
  • | separates alternatives
  • * indicates that the previous item may be
    repeated zero or more times
  • ( and ) are grouping parentheses

69
Regular Expression Basics
  • ε: the empty string, a special string of length 0
  • Regular expression operations
  • | separates alternatives
  • * indicates that the previous item may be
    repeated zero or more times (repetition)
  • ( and ) are grouping parentheses

70
Regular Expression Basics
  • Algebraic Properties
  • | is commutative and associative
  • r|s = s|r
  • r|(s|t) = (r|s)|t
  • Concatenation is associative
  • (rs)t = r(st)
  • Concatenation distributes over |
  • r(s|t) = rs|rt
  • (s|t)r = sr|tr
  • ε is the identity for concatenation
  • εr = r
  • rε = r
  • * is idempotent
  • r** = r*
  • r* = (r|ε)*

71
Regular Expression Basics
  • Common Extensions
  • r+: one or more of expression r, same as rr*
  • r^k: k repetitions of r
  • r^3 = rrr
  • ~r: the characters not in the expression r
  • ~[\t\n]
  • r-z: range of characters
  • [0-9a-z]
  • r?: zero or one copy of expression r (used for
    fields of an expression that are optional)

72
Regular Expression Example
  • Regular Expression for Representing Months
  • Examples of legal inputs
  • January represented as 1 or 01
  • October represented as 10
  • First Try: (0|1|ε)[0-9]
  • Matches all legal inputs? Yes
  • 1, 2, 3, ..., 10, 11, 12, 01, 02, ..., 09
  • Matches any illegal inputs? Yes
  • 0, 00, 18

73
Regular Expression Example
  • Regular Expression for Representing Months
  • Examples of legal inputs
  • January represented as 1 or 01
  • October represented as 10
  • Second Try: [1-9] | (0[1-9]) | (1[0-2])
  • Matches all legal inputs? Yes
  • 1, 2, 3, ..., 10, 11, 12, 01, 02, ..., 09
  • Matches any illegal inputs? No
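
Both attempts can be checked mechanically. The following is a minimal sketch using java.util.regex; the class and method names are made up for illustration, and the two patterns are transcriptions of the first and second try into Java regex syntax ((0|1|ε) becomes [01]?).

```java
import java.util.regex.Pattern;

/** Compares the two month expressions from the slides (names are illustrative). */
class MonthRegex {
    // First try: an optional 0 or 1 followed by any digit.
    static final Pattern FIRST_TRY = Pattern.compile("[01]?[0-9]");
    // Second try: 1-9, or 01-09, or 10-12.
    static final Pattern SECOND_TRY = Pattern.compile("[1-9]|0[1-9]|1[0-2]");

    static boolean firstTry(String s) {
        return FIRST_TRY.matcher(s).matches();   // matches() tests the whole string
    }

    static boolean secondTry(String s) {
        return SECOND_TRY.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"1", "01", "10", "12", "0", "00", "18"};
        for (String s : inputs) {
            System.out.println(s + ": first=" + firstTry(s) + " second=" + secondTry(s));
        }
    }
}
```

The first pattern accepts the illegal inputs 0, 00, and 18; the second rejects them while still accepting every legal month.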

74
Regular Expression Example
  • Regular Expression for Floating Point Numbers
  • Examples of legal inputs
  • 1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6
  • Assume that a 0 is required before numbers less
    than 1 and does not prevent extra leading zeros,
    so numbers such as 0011 or 0003.14159 are legal
  • Building the regular expression
  • Assume
  • digit → 0|1|2|3|4|5|6|7|8|9
  • Handle simple decimals such as 1.0, 0.2, 3.14159
  • digit+ . digit+
  • Add an optional sign (only minus, no plus)
  • (-|ε) digit+ . digit+ or -? digit+ . digit+

75
Regular Expression Example
  • Regular Expression for Floating Point Numbers
    (cont.)
  • Building the regular expression (cont.)
  • Format for the exponent
  • (E|e)(+|-)?(digit+)
  • Adding it as an optional expression to the
    decimal part
  • (-|ε) digit+ . digit+ ((E|e)(+|-)?(digit+))?
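
As a rough check, the finished expression can be written as a Java regex. This sketch assumes the exponent sign may be either + or - and that at least one digit is required on each side of the decimal point; the class name FloatRegex is hypothetical.

```java
import java.util.regex.Pattern;

/** Java transcription of the floating-point expression (illustrative names). */
class FloatRegex {
    // -?  digit+  .  digit+  ( (E|e) (+|-)? digit+ )?
    static final Pattern FLOAT =
            Pattern.compile("-?[0-9]+\\.[0-9]+([Ee][+-]?[0-9]+)?");

    static boolean isFloat(String s) {
        return FLOAT.matcher(s).matches();   // the whole string must match
    }

    public static void main(String[] args) {
        String[] inputs = {"1.0", "-1.0", "2.7e8", "1.0E-6", "0003.14159", ".5", "+1.0", "1.0e"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isFloat(s));
        }
    }
}
```

Because the expression requires a decimal point, a plain integer such as 0011 does not match it; accepting integers as well would need the fractional part to be made optional.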

76
Extended BNF
  • Extended BNF (EBNF)
  • Combination of BNF and RE
  • N ::= X, where N is a nonterminal symbol and X is
    an extended RE, i.e., an RE constructed from both
    terminal and nonterminal symbols
  • EBNF
  • Right-hand side may use |, *, (, )
  • Right hand side may contain both terminal and
    nonterminal symbols

77
Example EBNF
  • Expression ::= primary-Expression (Operator
    primary-Expression)*
  • primary-Expression ::= Identifier
  • | ( Expression )
  • Identifier ::= a|b|c|d|e
  • Operator ::= +|-|*|/
  • Generates
  • e
  • a + b
  • a - b - c
  • a + (b * c)
  • a * (b + c) / d
  • a - (b - (c - (d - e)))
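
An EBNF grammar of this shape maps almost directly onto a recursive-descent recognizer: one method per nonterminal, a loop for the repetition, and a branch for the alternatives. The sketch below assumes single-letter identifiers a to e and the four operators + - * /; the class name ExprRecognizer and its methods are illustrative, not taken from the Triangle compiler.

```java
/** Recursive-descent recognizer for the small expression grammar (a sketch). */
class ExprRecognizer {
    private final String input;   // source text with blanks removed
    private int pos = 0;          // index of the next unread character

    private ExprRecognizer(String text) {
        this.input = text.replace(" ", "");
    }

    /** Returns true if the whole text is a sentence of the grammar. */
    static boolean accepts(String text) {
        ExprRecognizer parser = new ExprRecognizer(text);
        return parser.parseExpression() && parser.pos == parser.input.length();
    }

    // Expression ::= primary-Expression (Operator primary-Expression)*
    private boolean parseExpression() {
        if (!parsePrimary()) return false;
        while (pos < input.length() && isOperator(input.charAt(pos))) {
            pos++;                             // accept the operator
            if (!parsePrimary()) return false;
        }
        return true;
    }

    // primary-Expression ::= Identifier | ( Expression )
    private boolean parsePrimary() {
        if (pos >= input.length()) return false;
        char c = input.charAt(pos);
        if (c >= 'a' && c <= 'e') {            // Identifier ::= a|b|c|d|e
            pos++;
            return true;
        }
        if (c == '(') {
            pos++;                             // accept '('
            if (!parseExpression()) return false;
            if (pos < input.length() && input.charAt(pos) == ')') {
                pos++;                         // accept ')'
                return true;
            }
        }
        return false;
    }

    private static boolean isOperator(char c) {
        return "+-*/".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(accepts("a * (b + c) / d"));
        System.out.println(accepts("a + "));
    }
}
```

The while loop corresponds to the (Operator primary-Expression)* part of the rule, which is why the EBNF form is more convenient for this style of parser than a left-recursive BNF rule.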

78
Grammar Transformations
  • Left Factorization
  • XY | XZ is equivalent to X(Y | Z)
  • single-Command ::= V-name := Expression
  • | if Expression then single-Command
  • | if Expression then single-Command
    else single-Command
  • becomes
  • single-Command ::= V-name := Expression
  • | if Expression then single-Command
    (ε | else single-Command)

79
Grammar Transformations
  • Elimination of left recursion
  • N ::= X | NY is equivalent to N ::= X(Y)*
  • Identifier ::= Letter
  • | Identifier Letter
  • | Identifier Digit
  • is equivalent to
  • Identifier ::= Letter
  • | Identifier (Letter | Digit)
  • which becomes
  • Identifier ::= Letter (Letter | Digit)*

80
Grammar Transformations
  • Substitution of nonterminal symbols
  • Given N ::= X, we can substitute each occurrence
    of N with X
  • iff N ::= X is nonrecursive and is the only
    production rule for N
  • single-Command ::= for Control-Variable :=
    Expression To-or-Downto
    Expression do single-Command
  • Control-Variable ::= Identifier
  • To-or-Downto ::= to
  • | downto
  • becomes
  • single-Command ::= for Identifier := Expression
    (to | downto)
    Expression do single-Command

81
Scanning (Lexical Analysis)
  • The purpose of scanning is to recognize tokens in
    the source program. Or, to group input
    characters (the source program text) into tokens.
  • Difference between parsing and scanning
  • Parsing groups terminal symbols, which are
    tokens, into larger phrases such as expressions
    and commands and analyzes the tokens for
    correctness and structure
  • Scanning groups individual characters into tokens

82
Structure of a Compiler
[Figure: Source code → Lexical Analyzer → tokens → Parser / Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code. The Symbol Table is shared by the phases.]
83
Creating Tokens Mini-Triangle Example
[Figure: the input converter supplies the character string "let var y: Integer in y := y + 1" to the scanner, which produces the token stream: let, var, Ident. (y), colon (:), Ident. (Integer), in, Ident. (y), becomes (:=), Ident. (y), op. (+), Intlit. (1), eot]
84
What Does a Scanner Do?
  • Handle keywords (reserved words)
  • Recognizes identifiers and keywords
  • Match explicitly
  • Write regular expression for each keyword
  • Identifier is any alpha numeric string which is
    not a keyword
  • Match as an identifier, perform lookup
  • No special regular expressions for keywords
  • When an identifier is found, perform lookup into
    preloaded keyword table
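
The second approach (match as an identifier, then look it up) can be sketched as follows. This is an assumption-laden illustration, not the Triangle scanner: the keyword list is the Mini-Triangle one, the class KeywordLookup and its result strings are hypothetical, and binary search over a sorted table is just one possible lookup strategy.

```java
import java.util.Arrays;

/** Scans a word as Letter (Letter | Digit)* and then consults a keyword table. */
class KeywordLookup {
    // Preloaded keyword table, kept in alphabetical order so binary search works.
    static final String[] KEYWORDS = {
        "begin", "const", "do", "else", "end", "if", "in", "let", "then", "var", "while"
    };

    static boolean isKeyword(String spelling) {
        return Arrays.binarySearch(KEYWORDS, spelling) >= 0;
    }

    /** Scans one word starting at index start and classifies it. */
    static String scanWord(String source, int start) {
        int end = start;
        if (end < source.length() && Character.isLetter(source.charAt(end))) {
            end++;
            while (end < source.length() && Character.isLetterOrDigit(source.charAt(end))) {
                end++;
            }
        }
        String spelling = source.substring(start, end);
        if (spelling.isEmpty()) {
            return "ERROR";                    // the text does not start with a letter
        }
        return (isKeyword(spelling) ? "KEYWORD " : "IDENTIFIER ") + spelling;
    }

    public static void main(String[] args) {
        String source = "let x1 := 3";
        System.out.println(scanWord(source, 0));
        System.out.println(scanWord(source, 4));
    }
}
```

With this design the scanner needs only one rule for words; adding a keyword means adding a table entry rather than another regular expression.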

How does Triangle handle keywords? Discuss in
terms of efficiency and ease to code.
85
What Does a Scanner Do?
  • Remove white space
  • Tabs, spaces, new lines
  • Remove comments
  • Single line
  • -- Ada comment
  • Multi-line, start and end delimiters
  • { Pascal comment }
  • /* C comment */
  • Nested
  • Runaway comments
  • Nonterminated comments can't be detected till end
    of file

86
What Does a Scanner Do?
  • Perform look ahead
  • Multi-character tokens
  • 1..10 vs. 1.10
  • ,
  • <, <=
  • etc
  • Challenging input languages
  • FORTRAN
  • Keywords not reserved
  • Blanks are not a delimiter
  • Example (comma vs. decimal)
  • DO10I=1,5 start of a do loop (equivalent to a C
    for loop)
  • DO10I=1.5 an assignment statement, assignment to
    variable DO10I
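
The 1..10 versus 1.10 case shows why a scanner sometimes needs to peek past the current character. In the sketch below, a '.' after an integer is treated as part of a real literal only if a digit follows it; otherwise the integer ends there. The token names (INT, REAL, DOTDOT, DOT) and the class LookaheadScanner are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Tiny scanner that uses lookahead to separate 1..10 from 1.10 (a sketch). */
class LookaheadScanner {

    static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                // Lookahead: '.' belongs to this number only if a digit follows it.
                if (i + 1 < src.length() && src.charAt(i) == '.'
                        && Character.isDigit(src.charAt(i + 1))) {
                    i++;                                   // consume '.'
                    while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                    tokens.add("REAL " + src.substring(start, i));
                } else {
                    tokens.add("INT " + src.substring(start, i));
                }
            } else if (c == '.' && i + 1 < src.length() && src.charAt(i + 1) == '.') {
                tokens.add("DOTDOT");                      // two-character token
                i += 2;
            } else if (c == '.') {
                tokens.add("DOT");
                i++;
            } else if (Character.isWhitespace(c)) {
                i++;                                       // white space is discarded
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("1..10"));
        System.out.println(scan("1.10"));
    }
}
```

Here two characters of lookahead are enough; the FORTRAN DO example is harder because the scanner may have to read to the comma or period before it can classify the first token.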

87
What Does a Scanner Do?
  • Challenging input languages (cont.)
  • PL/I, keywords not reserved
  • IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;

88
What Does a Scanner Do?
  • Error Handling
  • Error token passed to parser which reports the
    error
  • Recovery
  • Delete characters from current token which have
    been read so far, restart scanning at next unread
    character
  • Delete the first character of the current lexeme
    and resume scanning from the next character.
  • Examples of lexical errors
  • 3.25e: bad format for a constant
  • Var#1: illegal character
  • Some errors that are not lexical errors
  • Mistyped keywords
  • Begim
  • Mismatched parenthesis
  • Undeclared variables

89
Scanner Implementation
  • Issues
  • Simpler design: the parser doesn't have to worry
    about white space, etc.
  • Improved compiler efficiency: allows the
    construction of a specialized and potentially
    more efficient processor
  • Enhanced compiler portability: input alphabet
    peculiarities and other device-specific anomalies
    can be restricted to the scanner

90
Scanner Implementation
  • What are the keywords in Triangle?
  • How are keywords and identifiers implemented in
    Triangle?
  • Is look ahead implemented in Triangle?
  • If so, how?

91
Structure of a Compiler
[Figure: Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code. The Symbol Table is shared by the phases.]
92
Parsing
  • Given an unambiguous, context free grammar,
    parsing is
  • Recognition of an input string, i.e., deciding
    whether or not the input string is a sentence of
    the grammar
  • Parsing of an input string, i.e., recognition of
    the input string plus determination of its phrase
    structure. The phrase structure can be
    represented by a syntax tree, or otherwise.

Unambiguous is necessary so that every sentence
of the grammar will form exactly one syntax tree.
93
Parsing
  • The syntax of programming language constructs is
    described by context-free grammars.
  • Advantages of unambiguous, context-free grammars
  • A precise, yet easy-to understand, syntactic
    specification of the programming language
  • For certain classes of grammars we can
    automatically construct an efficient parser that
    determines if a source program is syntactically
    well formed.
  • Imparts a structure to a programming language
    that is useful for the translation of source
    programs into correct object code and for the
    detection of errors.
  • Easier to add new constructs to the language if
    the implementation is based on a grammatical
    description of the language

94
Parsing
  • Check the syntax (structure) of a program and
    create a tree representation of the program
  • Programming languages have non-regular constructs
  • Nesting
  • Recursion
  • Context-free grammars are used to express the
    syntax for programming languages

95
Context-Free Grammars
  • Comprised of
  • A set of tokens or terminal symbols
  • A set of non-terminal symbols
  • A set of rules or productions which express the
    legal relationships between symbols
  • A start or goal symbol
  • Example
  • expr → expr - digit
  • expr → expr + digit
  • expr → digit
  • digit → 0 | 1 | 2 | … | 9
  • Tokens: -, +, 0, 1, 2, …, 9
  • Non-terminals: expr, digit
  • Start symbol: expr

96
Context-Free Grammars
  1. expr → expr - digit
  2. expr → expr + digit
  3. expr → digit
  4. digit → 0 | 1 | 2 | … | 9

Example input: 3 + 8 - 2
97
Checking for Correct Syntax
  • Given a grammar for a language and a program, how
    do you know if the syntax of the program is
    legal?
  • A legal program can be derived from the start
    symbol of the grammar

Grammar must be unambiguous and context-free
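
As a minimal sketch of recognition for the example grammar, the check below decides whether a token list is a sentence. It assumes the grammar's two operators are + and - and relies on the fact that every sentence of this particular grammar is a digit followed by operator/digit pairs; the name `is_sentence` is hypothetical.

```python
DIGITS = set("0123456789")
OPERATORS = {"+", "-"}

def is_sentence(tokens):
    """Return True if tokens can be derived from expr in the example grammar."""
    # A sentence is one digit followed by zero or more (operator, digit) pairs,
    # so the check can simply run from left to right.
    if not tokens or tokens[0] not in DIGITS:
        return False
    rest = tokens[1:]
    if len(rest) % 2 != 0:
        return False
    return all(op in OPERATORS and d in DIGITS
               for op, d in zip(rest[0::2], rest[1::2]))
```

This only recognizes; it does not build the phrase structure, which is the extra work a parser does.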
98
Deriving a String
  • The derivation begins with the start symbol
  • At each step of a derivation the right hand side
    of a grammar rule is used to replace a
    non-terminal symbol
  • Continue replacing non-terminals until only
    terminal symbols remain

expr ⇒ expr - digit        (Rule 1)
     ⇒ expr - 2            (Rule 4)
     ⇒ expr + digit - 2    (Rule 2)
     ⇒ expr + 8 - 2        (Rule 4)
     ⇒ digit + 8 - 2       (Rule 3)
     ⇒ 3 + 8 - 2           (Rule 4)
99
Rightmost Derivation
  • The rightmost non-terminal is replaced in each
    step

Rule 1
expr ⇒ expr - digit
Rule 4
expr - digit ⇒ expr - 2
Rule 2
expr - 2 ⇒ expr + digit - 2
Rule 4
expr + digit - 2 ⇒ expr + 8 - 2
Rule 3
expr + 8 - 2 ⇒ digit + 8 - 2
Rule 4
digit + 8 - 2 ⇒ 3 + 8 - 2
100
Leftmost Derivation
  • The leftmost non-terminal is replaced in each step

Rule 1
expr ⇒ expr - digit
Rule 2
expr - digit ⇒ expr + digit - digit
Rule 3
expr + digit - digit ⇒ digit + digit - digit
Rule 4
digit + digit - digit ⇒ 3 + digit - digit
Rule 4
3 + digit - digit ⇒ 3 + 8 - digit
Rule 4
3 + 8 - digit ⇒ 3 + 8 - 2
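
The two derivation orders can be replayed mechanically. The sketch below assumes that Rule 1 is the subtraction rule and Rule 2 the addition rule (which is what the derivation steps imply); a step is either a rule number or a digit character standing for Rule 4. The names `RULES` and `derive` are hypothetical.

```python
RULES = {
    1: ("expr", ["expr", "-", "digit"]),
    2: ("expr", ["expr", "+", "digit"]),
    3: ("expr", ["digit"]),
}
NONTERMINALS = {"expr", "digit"}

def derive(steps, leftmost=True):
    """Replay a derivation from expr and return every sentential form."""
    form = ["expr"]
    history = [" ".join(form)]
    for step in steps:
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if isinstance(step, int):
            lhs, rhs = RULES[step]
        else:
            lhs, rhs = "digit", [step]      # Rule 4: digit -> that character
        if form[i] != lhs:
            raise ValueError(f"a rule for {lhs} cannot rewrite {form[i]}")
        form[i:i + 1] = rhs
        history.append(" ".join(form))
    return history
```

For example, `derive([1, 2, 3, "3", "8", "2"])` replays the leftmost derivation and `derive([1, "2", 2, "8", 3, "3"], leftmost=False)` the rightmost one; both end in the same sentence.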
101
Leftmost Derivation
  • The leftmost non-terminal is replaced in each step

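Rule 1
expr ⇒ expr - digit
Rule 2
expr - digit ⇒ expr + digit - digit
Rule 3
expr + digit - digit ⇒ digit + digit - digit
Rule 4
digit + digit - digit ⇒ 3 + digit - digit
Rule 4
3 + digit - digit ⇒ 3 + 8 - digit
Rule 4
3 + 8 - digit ⇒ 3 + 8 - 2
[Figure: syntax tree for 3 + 8 - 2 (nodes expr, digit, -, 3, 8, 2), with numbers linking the derivation steps to tree nodes.]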
102
Bottom-Up Parsing
  • Parser examines terminal symbols of the input
    string, in order from left to right
  • Reconstructs the syntax tree from the bottom
    (terminal nodes) up (toward the root node)
  • Bottom-up parsing reduces a string w to the start
    symbol of the grammar.
  • At each reduction step a particular sub-string
    matching the right side of a production is
    replaced by the symbol on the left of that
    production, and if the sub-string is chosen
    correctly at each step, a rightmost derivation is
    traced out in reverse.

103
Bottom-Up Parsing
  • Types of bottom-up parsing algorithms
  • Shift-reduce parsing
  • At each reduction step a particular sub-string
    matching the right side of a production is
    replaced by the symbol on the left of that
    production, and if the sub-string is chosen
    correctly at each step, a rightmost derivation is
    traced out in reverse.
  • LR(k) parsing
  • L is for left-to-right scanning of the input, the
    R is for constructing a right-most derivation in
    reverse, and the k is for the number of input
    symbols of look-ahead that are used in making
    parsing decisions.
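
A minimal shift-reduce sketch for the expr/digit example grammar follows. It is hand-written for that one grammar (assuming its operators are + and -) and is not a general LR(k) parser; the function name and the trace format are assumptions for the example.

```python
DIGITS = set("0123456789")

def shift_reduce(tokens):
    """Return (accepted, trace) for a list of single-character tokens."""
    stack, trace = [], []
    for tok in tokens:
        stack.append(tok)                       # shift the next input symbol
        trace.append(("shift", tok))
        while True:                             # reduce while a handle is on top
            if stack[-1] in DIGITS:
                trace.append(("reduce", f"digit -> {stack[-1]}"))
                stack[-1] = "digit"
            elif stack[-3:] in (["expr", "+", "digit"], ["expr", "-", "digit"]):
                trace.append(("reduce", "expr -> " + " ".join(stack[-3:])))
                stack[-3:] = ["expr"]
            elif stack == ["digit"]:            # only the first digit becomes expr
                trace.append(("reduce", "expr -> digit"))
                stack = ["expr"]
            else:
                break
    return stack == ["expr"], trace
```

Read from last to first, the reductions recorded in the trace for 3 + 8 - 2 are the steps of the rightmost derivation of that sentence.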

104
Bottom-Up Parsing Example: 3 + 8 - 2
105
Bottom-Up Parsing Example: 3 + 8 - 2
106
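Bottom-Up Parsing Example: abbcde
Input: a b b c d e
Reduction: abbcde ⇒ aAbcde (the first b is reduced to A)
[Figure: partial syntax tree with A above the first b; the current string is aAbcde.]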
107
Bottom-Up Parsing Example: abbcde
Reduction: aAbcde ⇒ aAde (Abc is reduced to A)
[Figure: partial syntax tree with a second A above A, b and c; the current string is aAde.]
108
Bottom-Up Parsing Example: abbcde
Reduction: aAde ⇒ aABe (d is reduced to B)
[Figure: partial syntax tree with B above d; the current string is aABe.]
109
Bottom-Up Parsing Example: abbcde
Reduction: aABe ⇒ S (aABe is reduced to S)
[Figure: complete syntax tree with root S above a, A, B and e.]
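
The four reductions of the abbcde example can be replayed as plain string rewriting. This is only a sketch: the helper names are hypothetical, and the productions used (A → b, A → Abc, B → d, S → aABe) are inferred from the reduction steps rather than stated on the slides.

```python
def reduce_at(form, start, rhs, lhs):
    """Replace the substring rhs, found at index start, by the symbol lhs."""
    if form[start:start + len(rhs)] != rhs:
        raise ValueError(f"{rhs!r} does not occur at index {start} of {form!r}")
    return form[:start] + lhs + form[start + len(rhs):]

# (start index, right-hand side, left-hand side) for each reduction step
STEPS = [(1, "b", "A"), (1, "Abc", "A"), (2, "d", "B"), (0, "aABe", "S")]

def replay(form, steps):
    """Apply the reductions in order and return every intermediate string."""
    forms = [form]
    for start, rhs, lhs in steps:
        form = reduce_at(form, start, rhs, lhs)
        forms.append(form)
    return forms
```

Choosing the sub-string matters: `reduce_at("aAbcde", 2, "b", "A")` gives aAAcde, which under the inferred productions can no longer be reduced to S.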
110
Bottom-Up Parsing Example: the cat sees a rat.
Input: the cat sees a rat .
Reduction: the cat sees a rat. ⇒ the Noun sees a rat. (cat is reduced to Noun)
[Figure: partial syntax tree with Noun above cat.]
111
Bottom-Up Parsing Example: the cat sees a rat.
Reduction: the Noun sees a rat. ⇒ Subject sees a rat. (the Noun is reduced to Subject)
[Figure: partial syntax tree with Subject above the and Noun.]
112
Bottom-Up Parsing Example: the cat sees a rat.
Reduction: Subject sees a rat. ⇒ Subject Verb a rat. (sees is reduced to Verb)
[Figure: partial syntax tree with Verb above sees.]
113
Bottom-Up Parsing Example: the cat sees a rat.
Reduction: Subject Verb a rat. ⇒ Subject Verb a Noun. (rat is reduced to Noun)
[Figure: partial syntax tree with a second Noun above rat.]
114
Bottom-Up Parsing Example: the cat sees a rat.
Reduction: Subject Verb a Noun. ⇒ Subject Verb Object. (a Noun is reduced to Object)
What would happen if we chose Subject → a Noun
instead of Object → a Noun?
[Figure: partial syntax tree with Object above a and Noun.]
115
Bottom-Up Parsing Example: the cat sees a rat.
Reduction: Subject Verb Object. ⇒ Sentence
[Figure: complete syntax tree with root Sentence above Subject, Verb, Object and the final period.]
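
The question about choosing Subject → a Noun instead of Object → a Noun can be explored with a short sketch. It works on lists of symbols, treats the final period as its own token, and uses a hypothetical helper `reduce_at`; the right-hand side of the Sentence rule is taken from the Micro-English grammar used in these slides.

```python
SENTENCE_RHS = ["Subject", "Verb", "Object", "."]

def reduce_at(form, start, rhs, lhs):
    """Replace the symbols rhs, found at index start, by the single symbol lhs."""
    if form[start:start + len(rhs)] != rhs:
        raise ValueError("the chosen sub-string is not at that position")
    return form[:start] + [lhs] + form[start + len(rhs):]

form = ["Subject", "Verb", "a", "Noun", "."]
good = reduce_at(form, 2, ["a", "Noun"], "Object")    # Object -> a Noun
bad = reduce_at(form, 2, ["a", "Noun"], "Subject")    # Subject -> a Noun

can_finish_good = good == SENTENCE_RHS   # matches the Sentence rule
can_finish_bad = bad == SENTENCE_RHS     # Subject Verb Subject . does not
```

With the second choice the parser is left holding Subject Verb Subject . and no production has that right-hand side, so the reduction to Sentence cannot be completed.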
116
Top-Down Parsing
  • The parser examines the terminal symbols of the
    input string, in order from left to right.
  • The parser reconstructs its syntax tree from the
    top (root node) down (towards the terminal
    nodes).

An attempt to find the leftmost derivation for an
input string
117
Top-Down Parsers
  • General rules for top-down parsers
  • Start with just a stub for the root node
  • At each step the parser takes the leftmost stub
  • If the stub is labeled by terminal symbol t, the
    parser connects it to the next input terminal
    symbol, which must be t. (If not, the parser has
    detected a syntactic error.)
  • If the stub is labeled by nonterminal symbol N,
    the parser chooses one of the production rules
    N → X1 … Xn, and grows branches from the node
    labeled by N to new stubs labeled X1, …, Xn (in
    order from left to right).
  • Parsing succeeds when and if the whole input
    string is connected up to the syntax tree.

118
Top-Down Parsing
  • Two forms
  • Backtracking parsers
  • Guess which rule to apply, then back up and change
    the choice if the parser cannot proceed
  • Predictive Parsers
  • Predicts which rule to apply by using look-ahead
    tokens

Backtracking parsers are not very efficient. We
will cover predictive parsers.
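
For contrast with the predictive parsers covered next, here is a minimal backtracking recognizer for the Micro-English grammar used later in these slides. It is a sketch only: the dictionary layout and the names `GRAMMAR`, `match` and `recognizes` are assumptions, and it tries every alternative instead of using look-ahead.

```python
GRAMMAR = {
    "Sentence": [["Subject", "Verb", "Object", "."]],
    "Subject": [["I"], ["a", "Noun"], ["the", "Noun"]],
    "Object": [["me"], ["a", "Noun"], ["the", "Noun"]],
    "Noun": [["cat"], ["mat"], ["rat"]],
    "Verb": [["like"], ["is"], ["see"], ["sees"]],
}

def match(symbols, tokens, pos):
    """Yield every input position reachable after matching symbols from pos."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        for alternative in GRAMMAR[first]:          # guess a rule ...
            for mid in match(alternative, tokens, pos):
                yield from match(rest, tokens, mid)
        # ... and fall through to the next alternative if it leads nowhere
    elif pos < len(tokens) and tokens[pos] == first:
        yield from match(rest, tokens, pos + 1)

def recognizes(tokens):
    """True if the whole token list is a Sentence."""
    return any(end == len(tokens) for end in match(["Sentence"], tokens, 0))
```

The repeated guessing is where the inefficiency comes from; a predictive parser replaces the loop over alternatives with a single choice driven by the look-ahead token.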
119
Predictive Parsers
  • Many types
  • LL(1) parsing
  • The first L is for scanning the input from left to
    right, the second L is for producing a leftmost
    derivation, and the 1 is for using one input
    symbol of look-ahead
  • Table driven, with an explicit stack to maintain
    the parse tree
  • Recursive-descent parsing
  • Uses recursive subroutines to traverse the parse
    tree
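
A table-driven LL(1) parser with an explicit stack can be sketched as follows. The grammar is an assumption chosen for the example: the expr grammar rewritten without left recursion as expr → digit rest, rest → + digit rest | - digit rest | ε. The input is a list of token kinds, and "$" is an assumed end marker.

```python
# Parse table: (nonterminal, look-ahead token) -> right-hand side to push
TABLE = {
    ("expr", "digit"): ["digit", "rest"],
    ("rest", "+"): ["+", "digit", "rest"],
    ("rest", "-"): ["-", "digit", "rest"],
    ("rest", "$"): [],                      # rest -> empty
}
NONTERMINALS = {"expr", "rest"}

def ll1_parse(kinds):
    """Return True if the list of token kinds is a sentence of the grammar."""
    tokens = list(kinds) + ["$"]
    stack = ["$", "expr"]                   # start symbol on top of the end marker
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]                  # the single look-ahead token
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False                # no rule predicted: syntax error
            stack.extend(reversed(rhs))     # leftmost symbol ends up on top
        elif top == look:
            pos += 1                        # terminal matches the input
        else:
            return False
    return pos == len(tokens)
```

Each table lookup selects exactly one rule from the current nonterminal and one token of look-ahead, so the parser never backs up.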

120
Predictive Parsers (Lookahead)
  • Lookahead in predictive parsing
  • The lookahead token (next token in the input) is
    used to determine which rule should be used next
  • For example

[Figure: partial parse tree in which the look-ahead token 7 is matched through the nodes term and num.]

121
Predictive Parsers (Lookahead)
[Figure: successive snapshots of the parse tree as the look-ahead moves over the tokens 7, 3 and -; node labels: term, num.]
122
Predictive Parsers (Lookahead)
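[Figure: successive snapshots ending with the complete parse tree for the tokens 7, 3, - and 2; node labels: term, num, and ε for the final empty alternative.]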
123
Recursive-Descent Parsing
  • Top-down parsing algorithm
  • Consists of a group of methods (programs) parseN,
    one for each nonterminal symbol N of the grammar.
  • The task of each method parseN is to parse a
    single N-phrase
  • These parsing methods cooperate to parse complete
    sentences

124
Recursive-Descent Parsing
[Figure: root node Sentence with four stubs (Subject, Verb, Object and the final period) above the input: the cat sees a rat .]
  • Decide which production rule to apply. There is
    only one, rule 1.
  • This step creates four stubs.

125
Recursive-Descent Parsing
[Figure: the Sentence tree with a Noun node added below Subject; input: the cat sees a rat .]
126
Recursive-Descent Parsing
[Figure: the same tree at the next step of the parse.]
127
Recursive-Descent Parsing
[Figure: the same tree at the next step of the parse.]
128
Recursive-Descent Parsing
[Figure: the Sentence tree with a second Noun node added below Object; input: the cat sees a rat .]
129
Recursive-Descent Parsing
[Figure: the same tree at the next step of the parse.]
130
Recursive-Descent Parsing
[Figure: the same tree at the next step of the parse.]
131
Recursive-Descent Parser for Micro-English
  1. Sentence → Subject Verb Object .
  2. Subject → I | a Noun | the Noun
  3. Object → me | a Noun | the Noun
  4. Noun → cat | mat | rat
  5. Verb → like | is | see | sees
  • parseSentence
  • parseSubject
  • parseObject
  • parseVerb
  • parseNoun

132
Recursive-Descent Parser for Micro-English
  1. Sentence → Subject Verb Object .
  2. Subject → I | a Noun | the Noun
  3. Object → me | a Noun | the Noun
  4. Noun → cat | mat | rat
  5. Verb → like | is | see | sees

  • parseSentence
      • parseSubject
      • parseVerb
      • parseObject
      • parseEnd

[Syntax diagram: Sentence → Subject Verb Object .]
133
Recursive-Descent Parser for Micro-English
  1. Sentence → Subject Verb Object .
  2. Subject → I | a Noun | the Noun
  3. Object → me | a Noun | the Noun
  4. Noun → cat | mat | rat
  5. Verb → like | is | see | sees

  • parseSubject
      • if input = I
          • accept
      • else if input = a
          • accept
          • parseNoun
      • else if input = the
          • accept
          • parseNoun
      • else error

[Syntax diagram: Subject → I | a Noun | the Noun]
134
Recursive-Descent Parser for Micro-English
  1. Sentence → Subject Verb Object .
  2. Subject → I | a Noun | the Noun
  3. Object → me | a Noun | the Noun
  4. Noun → cat | mat | rat
  5. Verb → like | is | see | sees

  • parseNoun
      • if input = cat
          • accept
      • else if input = mat
          • accept
      • else if input = rat
          • accept
      • else error

[Syntax diagram: Noun → cat | mat | rat]
135
Recursive-Descent Parser for Micro-English
  1. Sentence → Subject Verb Object .
  2. Subject → I | a Noun | the Noun
  3. Object → me | a Noun | the Noun
  4. Noun → cat | mat | rat
  5. Verb → like | is | see | sees

  • parseObject
      • if input = me
          • accept
      • else if input = a
          • accept
          • parseNoun
      • else if input = the
          • accept
          • parseNoun
      • else error

[Syntax diagram: Object → me | a Noun | the Noun]
136
Recursive-Descent Parser for Micro-English
  1. Sentence → Subject Verb Object .
  2. Subject → I | a Noun | the Noun
  3. Object → me | a Noun | the Noun
  4. Noun → cat | mat | rat
  5. Verb → like | is | see | sees

  • parseVerb
      • if input = like
          • accept
      • else if input = is
          • accept
      • else if input = see
          • accept
      • else if input = sees
          • accept
      • else error

[Syntax diagram: Verb → like | is | see | sees]
137
Recursive-Descent Parser for Micro-English
  1. Sentence → Subject Verb Object .
  2. Subject → I | a Noun | the Noun
  3. Object → me | a Noun | the Noun
  4. Noun → cat | mat | rat
  5. Verb → like | is | see | sees

  • parseEnd
      • if input = .
          • accept
      • else error

[Syntax diagram: the terminal symbol .]
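
Put together, the parsing methods for Micro-English might look like the sketch below. It is written in Python rather than in the Java of the Triangle compiler, and the class name, the `ParseError` exception and the `<end>` marker are assumptions made for the example; the input is a list of words with the final period as a separate word.

```python
class ParseError(Exception):
    """Raised when the input is not a Micro-English sentence."""

class MicroEnglishParser:
    def __init__(self, words):
        self.words = list(words) + ["<end>"]    # assumed end-of-input marker
        self.pos = 0

    @property
    def current(self):
        return self.words[self.pos]

    def accept(self, expected):
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def parse_sentence(self):                   # Sentence -> Subject Verb Object .
        self.parse_subject()
        self.parse_verb()
        self.parse_object()
        self.accept(".")                        # parseEnd on the slides

    def parse_subject(self):                    # Subject -> I | a Noun | the Noun
        if self.current == "I":
            self.accept("I")
        elif self.current in ("a", "the"):
            self.accept(self.current)
            self.parse_noun()
        else:
            raise ParseError(f"no Subject starts with {self.current!r}")

    def parse_object(self):                     # Object -> me | a Noun | the Noun
        if self.current == "me":
            self.accept("me")
        elif self.current in ("a", "the"):
            self.accept(self.current)
            self.parse_noun()
        else:
            raise ParseError(f"no Object starts with {self.current!r}")

    def parse_noun(self):                       # Noun -> cat | mat | rat
        if self.current not in ("cat", "mat", "rat"):
            raise ParseError(f"{self.current!r} is not a Noun")
        self.accept(self.current)

    def parse_verb(self):                       # Verb -> like | is | see | sees
        if self.current not in ("like", "is", "see", "sees"):
            raise ParseError(f"{self.current!r} is not a Verb")
        self.accept(self.current)

    def parse(self):
        self.parse_sentence()
        if self.current != "<end>":
            raise ParseError(f"unexpected {self.current!r} after the sentence")
```

For example, `MicroEnglishParser("the cat sees a rat .".split()).parse()` returns normally, while leaving out the article before rat raises `ParseError`.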
138
Systematic Development of a Recursive-Descent
Parser
  • Given a (suitable) context-free grammar
  • Express the grammar in EBNF, with a single
    production rule for each nonterminal symbol, and
    perform any necessary grammar transformations
  • Always eliminate left recursion
  • Always left-factorize whenever possible
  • Transcribe each EBNF production rule N ::= X to a
    parsing method parseN, whose body is determined
    by X
  • Make the parser consist of
  • A private variable currentToken
  • Private parsing methods developed in previous
    step
  • Private auxiliary methods accept and acceptIt,
    both of which call the scanner
  • A public parse method that calls parseS (where S
    is the start symbol of the grammar), having first
    called the scanner to store the first input token
    in currentToken
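
The recipe above can be sketched for the earlier expr grammar. Eliminating the left recursion gives the EBNF rule expr ::= digit (('+' | '-') digit)*, which is an assumption about how the grammar would be transformed. The sketch is in Python, with `text.split()` standing in for a real scanner and `<eot>` as an assumed end-of-text token.

```python
DIGITS = set("0123456789")

class Parser:
    """Recursive-descent parser for  expr ::= digit (('+' | '-') digit)*"""

    def __init__(self, text):
        self._tokens = iter(text.split())    # stand-in for a real scanner
        self._current = None                 # plays the role of currentToken

    def _scan(self):
        return next(self._tokens, "<eot>")

    def _accept_it(self):
        """Consume the current token unconditionally."""
        self._current = self._scan()

    def _accept(self, expected):
        """Consume the current token only if it is the expected one."""
        if self._current != expected:
            raise ValueError(f"expected {expected!r}, found {self._current!r}")
        self._accept_it()

    def _parse_digit(self):
        if self._current not in DIGITS:
            raise ValueError(f"digit expected, found {self._current!r}")
        self._accept_it()

    def _parse_expr(self):
        self._parse_digit()
        while self._current in ("+", "-"):   # the (...)* part becomes a loop
            self._accept_it()
            self._parse_digit()

    def parse(self):
        self._current = self._scan()         # store the first input token
        self._parse_expr()                   # expr is the start symbol
        self._accept("<eot>")
```

Calling `Parser("3 + 8 - 2").parse()` returns normally; `Parser("3 +").parse()` raises `ValueError`. With the left-recursive rule expr → expr + digit, `_parse_expr` would call itself before consuming any token and never terminate, which is why the transformation comes first.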

139
Quote of the Week
  • C makes it easy to shoot yourself in the foot;
    C++ makes it harder, but when you do, it blows
    away your whole leg.
  • Bjarne Stroustrup

140
Quote of the Week
  • Did you really say that?
  • Dr. Bjarne Stroustrup
  • Yes, I did say something along the lines of "C
    makes it easy to shoot yourself in the foot; C++
    makes it harder, but when you do, it blows your
    whole leg off." What people tend to miss is that
    what I said about C++ is to a varying extent true
    for all powerful languages. As you protect people
    from simple dangers, they get themselves into new
    and less obvious problems. Someone who avoids
    the simple problems may simply be heading for a
    not-so-simple one. One problem with very
    supporting and protective environments is that
    the hard problems may be discovered too late or
    be too hard to remedy once discovered. Also, a
    rare problem is harder to find than a frequent
    one because you don't suspect it.
  • I also said, "Within C++, there is a much smaller
    and cleaner language struggling to get out." For
    example, that quote can be found on page 207 of
    The Design and Evolution of C++. And no, that
    smaller and cleaner language is not Java or C#.
    The quote occurs in a section entitled "Beyond
    Files and Syntax". I was pointing out that the
    C++ semantics is much cleaner than its syntax. I
    was thinking of programming styles, libraries and
    programming environments that emphasized the
    cleaner and more effective practices over archaic
    uses focused on the low-level aspects of C.

141
Converting EBNF Production Rules to Parsing
Methods
  • For production rule N ::= X
  • Convert the production rule to a parsing method
    named parseN
      • private void parseN() {
      •     parse X
      • }
  • Refine parse ε to a dummy statement
  • Refine parse t (where t is a terminal symbol) to
    accept(t) or acceptIt()
  • Refine parse N (where N is a nonterminal symbol)
    to a call of the corresponding parsing method
      • parseN()
  • Refine parse X Y to
      • parse X
      • parse Y
  • Refine parse X | Y to
      • switch (currentToken.kind) {
      • cases in starters[[X]]:
      •     parse X
      •     break
      • cases in starters[[Y]]:
      •     parse Y
      •     break
      • default:
      •     report a syntactic error
      • }
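
The refinement rules can also be read as a tiny interpreter over EBNF expressions. The sketch below is an illustration, not the Triangle parser: EBNF expressions are encoded as nested tuples (an assumed representation), `starters` ignores the case where the first item of a sequence can be empty, and Micro-English serves as the example grammar.

```python
def words(*spellings):
    """An alternative between terminal symbols."""
    return ("alt",) + tuple(("tok", s) for s in spellings)

RULES = {
    "Sentence": ("seq", ("nt", "Subject"), ("nt", "Verb"), ("nt", "Object"), ("tok", ".")),
    "Subject": ("alt", ("tok", "I"), ("seq", words("a", "the"), ("nt", "Noun"))),
    "Object": ("alt", ("tok", "me"), ("seq", words("a", "the"), ("nt", "Noun"))),
    "Noun": words("cat", "mat", "rat"),
    "Verb": words("like", "is", "see", "sees"),
}

def starters(x):
    """Tokens that can begin a phrase of x (first item assumed not empty)."""
    kind = x[0]
    if kind == "tok":
        return {x[1]}
    if kind == "nt":
        return starters(RULES[x[1]])
    if kind == "seq":
        return starters(x[1])
    if kind == "alt":
        return set().union(*(starters(branch) for branch in x[1:]))
    return set()                                  # "eps"

def parse(x, tokens, pos):
    """Parse x starting at tokens[pos]; return the position just after it."""
    kind = x[0]
    current = tokens[pos] if pos < len(tokens) else None
    if kind == "eps":                             # parse ε: dummy statement
        return pos
    if kind == "tok":                             # parse t: accept(t)
        if current != x[1]:
            raise ValueError(f"expected {x[1]!r}, found {current!r}")
        return pos + 1
    if kind == "nt":                              # parse N: call parseN
        return parse(RULES[x[1]], tokens, pos)
    if kind == "seq":                             # parse X Y: X, then Y
        for part in x[1:]:
            pos = parse(part, tokens, pos)
        return pos
    for branch in x[1:]:                          # parse X | Y: pick by starters
        if current in starters(branch):
            return parse(branch, tokens, pos)
    raise ValueError(f"unexpected {current!r}")
```

The choice between alternatives is only well defined because the starter sets of the branches do not overlap; the Subject and Object rules are written here in left-factorized form for that reason.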

142
Converting EBNF Production Rules to Parsing
Methods
  • For X | Y
  • Choose parse X only if the current token is one
    that can start an X-phrase
  • Choose parse Y only if the current token is one
    that can start a Y-phrase
  • starters[[X]] and starters[[Y]] must be disjoint
  • For X*
  • Choose
  • while (currentToken.kind is in starters