Lexical and Syntax Analysis - PowerPoint Presentation

Provided by: ics78

1
Chapter 4
  • Lexical and Syntax Analysis

2
4.1 Introduction
  • Lexical Analysis and Compiling process

3
Introduction (cont.)
  • Compilation → large industrial application
    development.
  • Pure interpretation → smaller systems in which
    execution efficiency is not critical, such as
    scripts embedded in HTML documents.
  • Hybrid systems → translate high-level programs
    to an intermediate form, e.g., Java and Perl.
  • Nearly all syntax analysis is based on a formal
    description of the syntax of the source language.
  • The syntax analysis portion of a language
    processor nearly always consists of two parts:
  • A low-level part called a lexical analyzer.
  • A high-level part called a syntax analyzer, or
    parser.

4
Introduction (cont.)
  • Using BNF has at least three compelling
    advantages:
  • BNF descriptions of the syntax of programs are
    clear and concise, both for humans and for the
    software systems that use them.
  • The parser can be based directly on the BNF.
  • Implementations based on BNF are relatively easy
    to maintain because of their clear modularity.
  • Reasons to separate lexical and syntax analysis:
  • Simplicity - less complex approaches can be used
    for lexical analysis; separating them simplifies
    the parser
  • Efficiency - separation allows optimization of
    the lexical analyzer
  • Portability - parts of the lexical analyzer may
    not be portable, but the parser is always
    portable

5
4.2 Lexical Analysis
  • A lexical analyzer is a pattern matcher for
    character strings.
  • A lexical analyzer is a front-end for the
    parser.
  • Identifies substrings of the source program that
    belong together - lexemes
  • Each lexeme matches a character pattern, which is
    associated with a lexical category called a
    token.
  • sum = oldsum - value / 100;
  • Lexemes → sum, =, oldsum, -, value, /, 100, ;
  • Tokens → IDENT, ASSIGN_OP, IDENT, SUBTRACT_OP,
    IDENT, DIVISION_OP, INT_LIT, SEMICOLON

6
4.2 Lexical Analysis
  • Lexical analyzers extract lexemes from a given
    input string and produce the corresponding
    tokens.
  • Three approaches to building a lexical analyzer:
  • 1. Write a formal description of the tokens and
    use a software tool that constructs a
    table-driven lexical analyzer from such a
    description
  • 2. Design a state diagram that describes the
    tokens and write a program that implements the
    state diagram
  • 3. Design a state diagram that describes the
    tokens and hand-construct a table-driven
    implementation of the state diagram
  • We discuss only approach 2.

7
4.2 Lexical Analysis
  • A naive state diagram would have a transition
    from every state on every character in the source
    language; such a diagram would be very large!
  • In many cases, transitions can be combined to
    simplify the state diagram
  • When recognizing an identifier, all uppercase and
    lowercase letters are equivalent - Use a
    character class that includes all letters
  • When recognizing an integer literal, all digits
    are equivalent - use a digit class
  • Reserved words and identifiers can be recognized
    together (rather than having a part of the
    diagram for each reserved word)
  • Use a table lookup to determine whether a
    possible identifier is in fact a reserved word

8
4.2 Lexical Analysis (cont.)
  • Convenient utility subprograms:
  • getChar - gets the next character of input, puts
    it in nextChar, determines its class, and puts
    the class in charClass
  • addChar - puts the character from nextChar into
    the place the lexeme is being accumulated, lexeme
  • lookup - determines whether the string in lexeme
    is a reserved word (returns a code)

9
4.2 Lexical Analysis (cont.)
10
The Parsing Problem
  • The part of the process of analyzing syntax that
    is referred to as syntax analysis is often called
    parsing.
  • Goals of the parser, given an input program:
  • Syntax analysis must check the input program to
    determine whether it is syntactically correct.
    When an error is found, the analyzer must produce
    a diagnostic message and recover.
  • Produce either a complete parse tree, or at least
    trace the structure of the complete parse tree.
    In either case, the result is used as the basis
    for translation.
  • Two categories of parsers:
  • Top-down parsers produce the parse tree,
    beginning at the root
  • Order is that of a leftmost derivation.
  • Bottom-up parsers produce the parse tree,
    beginning at the leaves
  • Parse tree is built from the leaves upward to the
    root.

11
Grammar symbols
  • Terminal symbols: lowercase letters at the
    beginning of the alphabet (a, b, ...)
  • Nonterminal symbols: uppercase letters at the
    beginning of the alphabet (A, B, ...)
  • Terminals or nonterminals: uppercase letters at
    the end of the alphabet (W, X, Y, Z)
  • Strings of terminals: lowercase letters at the
    end of the alphabet (w, x, y, z)
  • Mixed strings (terminals and/or nonterminals):
    lowercase Greek letters (α, β, γ)

12
Top-Down Parsers
  • Given a sentential form that is part of a
    leftmost derivation, the parser's task is to find
    the next sentential form in that leftmost
    derivation.
  • The most common top-down parsing algorithms:
  • Recursive descent - a coded implementation
  • LL parsers - use a parsing table rather than
    code

13
Parser
Series of sub-routine calls
14
Bottom-Up Parsers (LR parser)
15
The Complexity of Parsing
  • Parsers that work for any unambiguous grammar
    are complex and inefficient: O(n³), which means
    the time they take is on the order of the cube
    of the length of the string to be parsed.
  • The algorithms used for the syntax analyzers of
    compilers have complexity O(n), which means the
    time taken is linearly related to the length of
    the string to be parsed. They achieve this by
    working only for a restricted subset of all
    possible grammars.

16
The Recursive-Descent Parsing Process
  • A recursive-descent parser is so named because it
    consists of a collection of subprograms, many of
    which are recursive, and it produces a parse tree
    in top-down (descending) order.
  • There is a subprogram for each nonterminal in the
    grammar, which can parse sentences that can be
    generated by that nonterminal.
  • A grammar for simple expressions:
  • <expr> → <term> {(+ | -) <term>}
  • <term> → <factor> {(* | /) <factor>}
  • <factor> → id | ( <expr> )
  • The coding process when there is only one RHS
  • For each terminal symbol in the RHS, compare it
    with the next input token if they match,
    continue, else there is an error
  • For each nonterminal symbol in the RHS, call its
    associated parsing subprogram

17
The Recursive-Descent Parsing Process (cont.)
  /* Function expr
     Parses strings in the language generated by
     the rule:
     <expr> -> <term> {(+ | -) <term>}
  */
  void expr(void) {
    /* Parse the first term */
    term();
    /* As long as the next token is + or -, call lex
       to get the next token, and parse the next term */
    while (nextToken == PLUS_CODE ||
           nextToken == MINUS_CODE) {
      lex();
      term();
    }
  }
  • This particular routine does not detect errors
  • Convention: every parsing routine leaves the next
    token in nextToken

18
The Recursive-Descent Parsing Process (cont.)
  • A nonterminal that has more than one RHS requires
    an initial process to determine which RHS it is
    to parse
  • The correct RHS is chosen on the basis of the
    next token of input (the lookahead)
  • The next token is compared with the first token
    that can be generated by each RHS until a match
    is found
  • If no match is found, it is a syntax error

19
The Recursive-Descent Parsing Process (cont.)
  /* Function factor
     Parses strings in the language generated by
     the rule:
     <factor> -> id | ( <expr> )
  */
  void factor(void) {
    /* Determine which RHS */
    if (nextToken == ID_CODE)
      /* For the RHS id, just call lex */
      lex();
    /* If the RHS is ( <expr> ), call lex to pass over
       the left parenthesis, call expr, and check for
       the right parenthesis */
    else if (nextToken == LEFT_PAREN_CODE) {
      lex();
      expr();
      if (nextToken == RIGHT_PAREN_CODE)
        lex();
      else
        error();
    } /* End of else if (nextToken == ... */
    /* Neither RHS matches: syntax error */
    else
      error();
  }

20
The Recursive-Descent Parsing Process (cont.)