From program text to tokens: the lexical structures

Provided by: ttu2
1
From program text to tokens: the lexical
structures
  • From Section 2.1, Modern Compiler Design, by
    Dick Grune et al.

2
2.1.1 Reading the program text
  • Read the entire file into memory with a single
    system call instead of using the standard
    character-reading routines.
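
A minimal sketch of the idea, assuming a POSIX-style file interface as exposed by Python's os module. The function name read_whole_file is hypothetical, and the loop is only there because a single read request may return fewer bytes than asked for.

```python
import os

def read_whole_file(path):
    """Read the complete program text into memory as bytes."""
    # O_BINARY exists only on Windows; fall back to 0 elsewhere.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        remaining = os.fstat(fd).st_size  # file size in bytes
        chunks = []
        while remaining > 0:
            chunk = os.read(fd, remaining)  # ask for everything that is left
            if not chunk:                   # unexpected end of file
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

The lexical analyzer can then index into the returned buffer directly instead of calling a character-reading routine once per character.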

3
2.1.1.1 The troublesome newline
  • Each OS implements its own convention.
  • UNIX (\012), MS-DOS (\015\012), OS-370 (not a
    character)
  • The "newline" character is really an end-of-line
    character.
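
One way to hide these conventions from the rest of the lexical analyzer is to normalize line endings right after reading the text. The sketch below is an illustration, not the book's code: normalize_newlines is a hypothetical helper, and treating a lone \015 as an end-of-line is an extra assumption that the slide does not mention.

```python
def normalize_newlines(text):
    """Map MS-DOS (\\015\\012) line ends to the UNIX form (\\012)."""
    text = text.replace("\r\n", "\n")  # \015\012 -> \012
    # Assumption: a lone \015 also ends a line.
    return text.replace("\r", "\n")

# The octal and symbolic escapes name the same characters.
assert "\015\012" == "\r\n"
```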

4
2.1.2 Lexical versus syntactic analysis
  • Where the border between the two lies.
  • Lexical analysis produces tokens and syntax
    analysis consumes them, but what exactly is a
    token?
  • If it can be separated from its left and right
    neighbors by white space without changing the
    meaning, it's a token; otherwise, it isn't.
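
The white-space criterion can be tried out on a real language. The sketch below uses Python's own lexical structure as a stand-in, through the standard tokenize module; token_strings is a hypothetical helper. White space around += leaves the token sequence unchanged, so += is a token, while white space inside it produces two different tokens.

```python
import io
import tokenize

def token_strings(source):
    """Return the names, numbers and operators found in one line of Python."""
    readline = io.StringIO(source).readline
    keep = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type in keep]

tight = token_strings("a+=1")
spaced = token_strings("a += 1")   # white space around the token
split = token_strings("a + = 1")   # white space inside the token
```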

5
2.1.3 Regular expressions and regular descriptions
  • An identifier is a sequence of letters, digits,
    and underscores that starts with a letter; no
    consecutive underscores are allowed in it, nor
    can it have a trailing underscore.
  • This is satisfactory for the user of the language.
  • For the purpose of compiler construction, we need
    to express this as a regular expression.
  • A regular expression is a formula that describes
    a possibly infinite set of strings.
  • It can be viewed both as a recipe for generating
    these strings and as a pattern to match these
    strings.

6
2.1.3 Regular expressions and regular descriptions
  • Precedence: ab*|cd? is equivalent to
    (a(b*))|(c(d?))

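
The precedence rule can be checked mechanically, reading the example as ab*|cd? versus the fully parenthesized (a(b*))|(c(d?)). The sketch below assumes Python's re syntax, which uses the same precedence for these operators; strings_up_to and language are hypothetical helpers. It enumerates every string over {a, b, c, d} up to a small length and keeps those that each expression matches, which also shows a regular expression acting as a description of a set of strings.

```python
import re
from itertools import product

implicit = re.compile(r"ab*|cd?")          # relies on operator precedence
explicit = re.compile(r"(a(b*))|(c(d?))")  # every grouping written out

def strings_up_to(alphabet, max_len):
    """Yield all strings over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def language(pattern, alphabet="abcd", max_len=4):
    """Return the strings (up to max_len) that the pattern matches in full."""
    return {s for s in strings_up_to(alphabet, max_len)
            if pattern.fullmatch(s)}
```

Both expressions accept the same strings up to the chosen length, for example a, abbb and cd, and reject strings such as ac or abd.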
7
2.1.3.1 Regular expressions and BNF/EBNF
  • Basic patterns share with the BNF notation the
    invisible concatenation operators and the
    alternative operator, and with EBNF the
    repetition operators and parentheses.

8
2.1.3.2 Escape characters in regular expressions
  • \* denotes the asterisk
  • \\ denotes the backslash
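
The same escape convention exists in Python's re module, which is used here only as an illustration; the pattern names are hypothetical.

```python
import re

star_literal = re.compile(r"a\*")      # 'a' followed by a literal asterisk
star_repeat = re.compile(r"a*")        # zero or more occurrences of 'a'
backslash_literal = re.compile(r"\\")  # one literal backslash
```

Without the backslash the asterisk keeps its meaning as the repetition operator, so star_repeat matches aaa while star_literal matches only the two-character string a*.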

9
2.1.3.3 Regular descriptions
  • A regular description is like a context-free
    grammar in EBNF, with the restriction that no
    non-terminal can be used before it has been fully
    defined.

letter → [a-zA-Z]
digit → [0-9]
underscore → _
letter_or_digit → letter | digit
underscored_tail → underscore letter_or_digit+
identifier → letter letter_or_digit* underscored_tail*

identifier → [a-zA-Z] ([a-zA-Z] | [0-9])*
             (_ ([a-zA-Z] | [0-9])+)*

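
Because no name is used before it is fully defined, a regular description can be turned into a single regular expression by plain substitution. The sketch below assumes the identifier description is read with letter_or_digit as an alternative and with repetition on letter_or_digit and underscored_tail, as the prose definition of an identifier requires. The list layout and the expand helper are hypothetical, and Python's re syntax stands in for the notation on the slide.

```python
import re

# Definitions in order; each right-hand side may only use names above it.
DESCRIPTION = [
    ("letter", "[a-zA-Z]"),
    ("digit", "[0-9]"),
    ("underscore", "_"),
    ("letter_or_digit", "(?:{letter}|{digit})"),
    ("underscored_tail", "(?:{underscore}{letter_or_digit}+)"),
    ("identifier", "{letter}{letter_or_digit}*{underscored_tail}*"),
]

def expand(description):
    """Substitute earlier definitions into later ones, top to bottom."""
    defined = {}
    for name, rhs in description:
        # str.format raises KeyError if rhs uses a name not yet defined.
        defined[name] = rhs.format(**defined)
    return defined

IDENTIFIER = re.compile(expand(DESCRIPTION)["identifier"])
```

With this expansion, a_b1 is accepted, while a__b (consecutive underscores), a_ (trailing underscore) and _a (does not start with a letter) are rejected.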
10
2.1.4 Lexical analysis
  • The basic task of a lexical analyzer is: given a
    set S of token descriptions and a position P in
    the input stream, to determine which of the
    regular expressions in S will match a segment of
    the input starting at P, and what that segment
    is.
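
The task statement above can be sketched directly. Everything named here is an assumption made for illustration: the set TOKEN_DESCRIPTIONS, the helpers match_at and scan, and the rule of preferring the longest matching segment (with the earlier description winning ties), which the slide itself does not state.

```python
import re

# Hypothetical set S of token descriptions.
TOKEN_DESCRIPTIONS = [
    ("identifier", re.compile(r"[a-zA-Z][a-zA-Z0-9]*(?:_[a-zA-Z0-9]+)*")),
    ("integer", re.compile(r"[0-9]+")),
    ("operator", re.compile(r"[-+*/=]")),
    ("layout", re.compile(r"[ \t\n]+")),
]

def match_at(descriptions, text, pos):
    """Return (name, segment) for the longest match starting at pos, or None."""
    best = None
    for name, pattern in descriptions:
        m = pattern.match(text, pos)  # anchored at position pos
        if m and m.end() > pos:
            if best is None or len(m.group()) > len(best[1]):
                best = (name, m.group())
    return best

def scan(text):
    """Repeat match_at from left to right, dropping layout segments."""
    pos, tokens = 0, []
    while pos < len(text):
        found = match_at(TOKEN_DESCRIPTIONS, text, pos)
        if found is None:
            raise ValueError(f"no token description matches at position {pos}")
        name, segment = found
        if name != "layout":
            tokens.append(found)
        pos += len(segment)
    return tokens
```

For the input count_1 = 42, match_at at position 0 reports an identifier with segment count_1, and scan yields an identifier, an operator and an integer.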

11
2.1.5 Creating a lexical analyzer by hand