Compilers - PowerPoint PPT Presentation

Title:

Compilers

Description:

A lexer typically strips whitespace and comments. Why? The job of the lexer is to group characters into tokens. ... The output of the lexer is a series of tokens ... – PowerPoint PPT presentation

Slides: 37
Provided by: webhome2
Category:
Tags: compilers | lexer

Transcript and Presenter's Notes


1
Compilers
2
The Great Mystery....
  • How does a computer understand a program written
    in a high-level language such as Java or SML?

3
Compilers
  • A piece of software which translates code written
    in a high-level language to a lower-level
    language (typically assembly)

4
How Do They Work?
  • A compiler has roughly 4 phases:
  • Lexing (or scanning)
  • Parsing
  • Semantic Analysis / Code Generation
  • Code Optimization

5
The Big Picture
6
Lexical Analysis
  • Sometimes called lexing or scanning
  • A lexer typically strips whitespace and comments
  • Why?
  • The job of the lexer is to group characters into
    tokens.

7
Tokens
  • Tokens are the basic syntactic units of our
    language
  • Ex: keywords, identifiers, constants
  • Each token has two attributes:
  • Sequence of characters
  • Classification

8
Token Classifications
9
Lexing Example
  • What tokens appear in the following line of Java
    code?
  • bar = foo + baz - 10;

10
Example (cont)
  • We end up with the following tokens
  • (1, bar)
  • (3, =)
  • (1, foo)
  • (4, +)
  • (1, baz)
  • (5, -)
  • (2, 10)
  • (6, ;)

11
Lexing Example 2
  • What about:
  • if (x == 3)

12
Example 2 (cont)
  • (8, if)
  • (10, ()
  • (1, x)
  • (7, ==)
  • (2, 3)
  • (11, ))
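
A minimal sketch of a lexer in Python that groups characters into (classification, characters) pairs and strips whitespace and comments. The class names (IDENT, NUMBER, and so on), the regular expressions, and the Java-style // comment syntax are assumptions made for this sketch; they stand in for the numeric classifications used on the slides.

```python
import re

# Hypothetical token classes for this sketch. Order matters: comments must be
# tried before the "/" operator, keywords before identifiers, "==" before "=".
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("WHITESPACE", r"\s+"),
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("COMPARE", r"==|!=|<=|>=|<|>"),
    ("ASSIGN", r"="),
    ("ARITH", r"[+\-*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMI", r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(source):
    """Yield (classification, characters) pairs, skipping whitespace and comments."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = match.end()
        if match.lastgroup not in ("WHITESPACE", "COMMENT"):
            yield (match.lastgroup, match.group())
```

For example, list(lex("if (x == 3)")) gives six tokens: a KEYWORD, an LPAREN, an IDENT, a COMPARE, a NUMBER and an RPAREN. The parser never sees the spaces.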

13
Parsing
  • The output of the lexer is a series of tokens
  • These tokens are then fed into a parser which
    determines if the tokens are syntactically
    correct
  • That is, we look at the structure of the code

14
Parsing (cont)
  • Parsing tries to answer the question: is this
    sentence in the language?
  • This is commonly done by creating a parse tree.
  • Example
  • The man bit the dog

15
Example
16
What about...
  • The statement
  • A = b + c

17
Example 2
18
Parsing (cont)
  • If a parse tree cannot be generated for a
    statement, then that statement is not
    syntactically valid.
  • How do we specify the types of statements that
    are recognized for a language?
  • That is, how do we specify a language's grammar?

19
Grammars
  • Grammars specify how tokens can be combined into
    sentences in a language
  • We will look at one type of grammar called
    context-free grammars
  • Identified by Noam Chomsky in his hierarchy of
    grammars

20
Context-Free Grammars
  • Big part of CSC 320 (and CSC 435)
  • CFGs are made up of a set of rules (or
    productions)
  • Each rule maps a single nonterminal symbol to an
    expression
  • Expressions are made up of terminals (tokens) and
    other nonterminals

21
Context-Free Grammars (cont)
  • The goal symbol is the one nonterminal a grammar
    tries to replace
  • The collection of all possible sentences which
    can be generated by a grammar is called the
    language.
  • How do we represent these grammars?
  • BNF!

22
Backus-Naur Form (BNF)
  • Named after the famous computer scientists John
    Backus and Peter Naur
  • Rules take the form
  • <nonterminal> ::= <production1> | <production2> |
    ....

23
Example 1
  • A simplified English grammar
  • <sentence> ::= <subject> <verb> <object>
  • <subject> ::= <noun phrase>
  • <object> ::= <noun phrase>
  • <noun phrase> ::= <article> <noun>
  • <article> ::= the | a | an | ε
  • <noun> ::= man | dog | cat | John
  • <verb> ::= bit | bite | run
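
A minimal sketch of how a parser could build a parse tree for this grammar, written in Python. It assumes each rule lists alternatives, and that the last alternative for the article is the empty string (so that "John" needs no article). The dictionary layout and the function names are hypothetical, and the simple backtracking approach only works for grammars without left recursion.

```python
# Grammar as data: nonterminal -> list of alternatives, each a list of symbols.
GRAMMAR = {
    "sentence": [["subject", "verb", "object"]],
    "subject": [["noun phrase"]],
    "object": [["noun phrase"]],
    "noun phrase": [["article", "noun"]],
    "article": [["the"], ["a"], ["an"], []],  # [] = empty alternative (assumption)
    "noun": [["man"], ["dog"], ["cat"], ["John"]],
    "verb": [["bit"], ["bite"], ["run"]],
}


def parse(symbol, tokens, pos=0):
    """Yield (tree, next_pos) for each way symbol matches tokens from pos."""
    if symbol not in GRAMMAR:  # terminal: must equal the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for alternative in GRAMMAR[symbol]:
        for children, end in match_sequence(alternative, tokens, pos):
            yield (symbol, children), end


def match_sequence(symbols, tokens, pos):
    """Yield (trees, next_pos) for each way the symbols match in order."""
    if not symbols:
        yield [], pos
        return
    for first, middle in parse(symbols[0], tokens, pos):
        for rest, end in match_sequence(symbols[1:], tokens, middle):
            yield [first] + rest, end


def parse_sentence(text):
    """Return a parse tree covering every word of text, or None."""
    tokens = text.split()
    for tree, end in parse("sentence", tokens):
        if end == len(tokens):
            return tree
    return None
```

parse_sentence("the man bit the dog") returns a nested tuple rooted at "sentence", while parse_sentence("bit the man dog") returns None because no parse tree exists.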

24
Example 2
  • Java-ish assignment statement
  • <assnStmt> ::= <symbol> <assnOp> <exp>
  • <exp> ::= <symbol> | <exp> <arithOp> <exp>
  • <symbol> ::= a | b | c | d
  • <assnOp> ::= =
  • <arithOp> ::= + | - | * | /

25
Why Does This Matter?
  • From the text:
  • "If, by repeated applications of the rules of the
    grammar, a parser can convert the sequence of
    input tokens into the goal symbol, then that
    sequence of tokens is a syntactically valid
    statement of the language. Otherwise, it is not a
    syntactically valid statement of the language."

26
Example
  • Given the following grammar
  • <sentence> ::= <noun> <verb>
  • <noun> ::= bees | dogs
  • <verb> ::= buzz | bite
  • Are the following valid sentences?
  • dogs bite
  • bees buzz
  • buzz bees
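
Because this grammar is tiny and not recursive, its whole language can be enumerated. The Python sketch below does that and then checks the three candidate sentences. It assumes the noun and verb rules list alternatives (bees or dogs, buzz or bite); the function names are hypothetical, and the enumeration would not terminate for a recursive grammar.

```python
from itertools import product

# Grammar as data: nonterminal -> list of alternatives, each a list of symbols.
GRAMMAR = {
    "sentence": [["noun", "verb"]],
    "noun": [["bees"], ["dogs"]],
    "verb": [["buzz"], ["bite"]],
}


def language(symbol):
    """Return every word sequence derivable from symbol, as a set of tuples."""
    if symbol not in GRAMMAR:  # a terminal derives only itself
        return {(symbol,)}
    sentences = set()
    for alternative in GRAMMAR[symbol]:
        parts = [language(s) for s in alternative]
        for combo in product(*parts):
            sentences.add(tuple(word for part in combo for word in part))
    return sentences


def is_valid(text):
    """True if text is one of the sentences the grammar can generate."""
    return tuple(text.split()) in language("sentence")
```

The language has four sentences. is_valid("dogs bite") and is_valid("bees buzz") are True; is_valid("buzz bees") is False, since no rule puts a verb before a noun.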

27
A Problem?
  • Draw a parse tree for
  • A = B + C * D

28
Ambiguity
  • There were two different ways of producing a
    parse tree for the last sentence
  • If a grammar can produce two different parse
    trees for the same sentence, it is said to be
    ambiguous.
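
A minimal Python sketch of the ambiguity, assuming the expression rule reads exp ::= symbol | exp arithOp exp with symbols a to d and the four arithmetic operators. It enumerates every parse tree for a token list and evaluates each one; the input "b + c * d" and the variable values are hypothetical.

```python
import operator

SYMBOLS = {"a", "b", "c", "d"}
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def parse_trees(tokens):
    """Return every parse tree for tokens under exp ::= symbol | exp op exp."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0] in SYMBOLS else []
    trees = []
    # Try each operator as the root of the tree and parse both sides.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in OPS:
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


def evaluate(tree, env):
    """Compute the value of a parse tree given variable values in env."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))
```

parse_trees("b + c * d".split()) returns two trees. With b = 2, c = 3 and d = 4 they evaluate to 14 and 20, so the same sentence can mean two different things depending on which tree the parser picks.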
  • Why is this a problem?

29
Parsing...
  • The end result of the parser is a tree
    representation of the code
  • Sometimes called a parse tree
  • Sometimes called an abstract syntax tree (AST)
  • This is fed into the next phase of the compiler:
    semantic analysis

30
Semantic Analysis
  • Parsing focused on the structure of the code
  • Semantic analysis focuses on the meaning of the
    code
  • That is, do the sentences make sense?

31
Semantic Analysis
  • When are statements in a programming language
    syntactically correct, but meaningless or
    nonsensical?
  • Examples?

32
Semantic Analysis
  • Examples of syntactically correct but meaningless
    statements
  • Using a variable before it is declared
  • Using a variable in a way that doesn't make sense
    for its type (adding an int to a boolean for
    example)
  • How do we detect these problems?

33
Symbol Table
  • A dictionary-like structure which maps
    identifiers to their attributes
  • Example...
  • Now
  • If we see a variable declaration we just add a
    new entry to the table
  • If we see a variable usage we look it up in the
    table
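
A minimal Python sketch of a symbol table and the two checks described earlier (use before declaration, and an operation that does not fit the type). The class, the SemanticError exception, and the rule that only int values can be added are assumptions for illustration; a real compiler would also track scopes and more attributes.

```python
class SemanticError(Exception):
    """Raised when code is syntactically valid but meaningless."""


class SymbolTable:
    """Maps identifiers to their attributes (only the type, in this sketch)."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, type_name):
        # A variable declaration adds a new entry to the table.
        self._entries[name] = {"type": type_name}

    def lookup(self, name):
        # A variable usage is looked up; a missing entry is a semantic error.
        if name not in self._entries:
            raise SemanticError(f"{name!r} is used before it is declared")
        return self._entries[name]


def check_add(table, left, right):
    """Type-check left + right; both operands must be int in this sketch."""
    left_type = table.lookup(left)["type"]
    right_type = table.lookup(right)["type"]
    if left_type != "int" or right_type != "int":
        raise SemanticError(f"cannot add {left_type} and {right_type}")
    return "int"
```

After declaring x as int and flag as boolean, check_add(table, "x", "x") returns "int", check_add(table, "x", "flag") raises SemanticError, and so does any lookup of an undeclared name.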

34
Code Generation
  • Once we've determined the program is
    syntactically correct and meaningful, we can then
    generate equivalent code
  • Often the code generated is some sort of
    intermediate language, which can be more easily
    translated into the final target language.
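
One common style of intermediate language is three-address code, where each instruction applies at most one operator. The Python sketch below generates it from an expression tree. The tuple tree shape (left, operator, right), the temporary names t1, t2, ..., and the function names are assumptions for this sketch, not the format used in the course.

```python
from itertools import count


def generate(tree, code, temps):
    """Append instructions for tree to code; return the name holding its value."""
    if isinstance(tree, str):  # a plain variable needs no instruction
        return tree
    left, op, right = tree
    left_name = generate(left, code, temps)
    right_name = generate(right, code, temps)
    result = f"t{next(temps)}"  # fresh temporary for this sub-expression
    code.append(f"{result} = {left_name} {op} {right_name}")
    return result


def compile_assignment(target, tree):
    """Return the three-address instructions for the assignment target = tree."""
    code = []
    value = generate(tree, code, count(1))
    code.append(f"{target} = {value}")
    return code
```

compile_assignment("a", ("b", "+", ("c", "*", "d"))) produces the three instructions "t1 = c * d", "t2 = b + t1" and "a = t2", each of which maps closely onto a single assembly-style operation.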

35
Intermediate Representation (IR)
  • A simplified language that typically is very
    similar to the final target language (usually
    assembly)
  • Why is this a good thing?

36
Intermediate Representation
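  • To compile a new language we only need to solve
    half the problem: translating it into the IR
  • The same holds when adding support for a new
    platform: only the translation from the IR to
    that platform is needed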
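
The saving can be put in numbers. The Python sketch below counts how many translators must be written to cover every language and platform pairing, with and without a shared IR; the function name and the example counts are hypothetical.

```python
def translators_needed(languages: int, platforms: int, shared_ir: bool) -> int:
    """Count the translators needed to cover every language/platform pair."""
    if shared_ir:
        # One front end per language (source -> IR) plus
        # one back end per platform (IR -> target).
        return languages + platforms
    # Without an IR, each language needs its own compiler for each platform.
    return languages * platforms
```

With 3 languages and 4 platforms, a shared IR needs 7 translators instead of 12, and adding a fourth language costs one more front end rather than four more compilers.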