From program text to tokens: the lexical structures

Provided by: ttu2

1
From program text to tokens: the lexical
structures

From Section 2.1, Modern Compiler Design, by
Dick Grune et al.

2
2.1.1 Reading the program text

Read the entire file into memory with a single
system call instead of using the standard
character-reading routines.
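
A minimal Python sketch of the idea: fetch the whole program text in one read rather than character by character. The function names read_program_text and demo are hypothetical, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile


def read_program_text(path):
    """Return the complete file contents as bytes with a single read call."""
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Write a tiny source file, read it back in one go, then clean up."""
    with tempfile.NamedTemporaryFile("wb", suffix=".src", delete=False) as tmp:
        tmp.write(b"x = 1\ny = 2\n")
        name = tmp.name
    try:
        return read_program_text(name)
    finally:
        os.remove(name)
```

The lexical analyzer can then work on an in-memory buffer and address characters by index.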

3
2.1.1.1 The troublesome newline

Each OS implements its own convention.
UNIX (\012), MS-DOS (\015\012), OS-370 (not a character)
The so-called newline character is really an
end-of-line character.
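
A small sketch of one way to hide the difference between the UNIX and MS-DOS conventions before lexing: rewrite every carriage-return/line-feed pair (octal 015 012) as a single line feed (octal 012). The function name normalize_newlines is hypothetical, and the sketch deliberately handles only these two conventions.

```python
def normalize_newlines(text: bytes) -> bytes:
    """Map MS-DOS line ends (CR LF) onto the UNIX convention (LF)."""
    # b"\r\n" is the byte pair \015\012; b"\n" is the single byte \012.
    return text.replace(b"\r\n", b"\n")
```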

4
2.1.2 Lexical versus syntactic analysis

Where the border between the two lies.
Lexical analysis produces tokens and syntax
analysis consumes them, but what exactly is a
token?
If it can be separated from its left and right
neighbors by white space without changing the
meaning, it's a token; otherwise, it isn't.

5
2.1.3 Regular expressions and regular descriptions

An identifier is a sequence of letters, digits,
and underscores that starts with a letter; no
consecutive underscores are allowed in it, nor
can it have a trailing underscore.
This is satisfactory for the user of the language.
For the purpose of compiler construction, we need
to express this as a regular expression.
A regular expression is a formula that describes
a possibly infinite set of strings.
It can be viewed both as a recipe for generating
these strings and as a pattern to match these
strings.

6
2.1.3 Regular expressions and regular descriptions

ab*|cd? is equivalent to
(a(b*))|(c(d?))

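
A quick check of operator precedence, assuming the expression on this slide is ab*|cd? and its fully parenthesized form is (a(b*))|(c(d?)). Python's re module uses the same precedence (repetition binds tightest, then concatenation, then alternation), so both patterns should accept exactly the same strings. The sample list and the helper name same_language_on are hypothetical.

```python
import re

flat = re.compile(r"ab*|cd?")
grouped = re.compile(r"(a(b*))|(c(d?))")

samples = ["", "a", "ab", "abbb", "c", "cd", "cdd", "abcd", "b", "ac"]


def same_language_on(strings):
    """True if both patterns accept or reject every string identically."""
    return all(
        (flat.fullmatch(s) is None) == (grouped.fullmatch(s) is None)
        for s in strings
    )
```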
7
2.1.3.1 Regular expressions and BNF/EBNF

Basic patterns share with the BNF notation the
invisible concatenation operators and the
alternative operator, and with EBNF the
repetition operators and parentheses.

8
2.1.3.2 Escape characters in regular expressions

\* denotes the asterisk
\\ denotes the backslash
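
The same escape convention can be tried out in Python's re module: a backslash in front of an operator character turns it into an ordinary character. The variable names below are hypothetical.

```python
import re

star_operator = re.compile(r"a*")     # zero or more 'a'
star_literal = re.compile(r"a\*")     # 'a' followed by a literal asterisk
backslash_literal = re.compile(r"\\")  # a single literal backslash
```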

9
2.1.3.3 Regular descriptions

A regular description is like a context-free
grammar in EBNF, with the restriction that no
non-terminal can be used before it has been fully
defined.

letter → [a-zA-Z]
digit → [0-9]
underscore → _
letter_or_digit → letter | digit
underscored_tail → underscore letter_or_digit+
identifier → letter letter_or_digit* underscored_tail*

identifier → [a-zA-Z] ([a-zA-Z] | [0-9])* (_ ([a-zA-Z] | [0-9])+)*

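
A sketch of the identifier description in Python, built bottom-up so that every name is defined before it is used, as a regular description requires. It assumes the identifier rule stated earlier: starts with a letter, no consecutive underscores, no trailing underscore. The constant names and is_identifier are hypothetical.

```python
import re

LETTER = r"[a-zA-Z]"
DIGIT = r"[0-9]"
UNDERSCORE = r"_"
LETTER_OR_DIGIT = rf"(?:{LETTER}|{DIGIT})"
# An underscore must be followed by at least one letter or digit.
UNDERSCORED_TAIL = rf"(?:{UNDERSCORE}{LETTER_OR_DIGIT}+)"
IDENTIFIER = re.compile(rf"{LETTER}{LETTER_OR_DIGIT}*{UNDERSCORED_TAIL}*")


def is_identifier(s):
    """True if the whole string s is an identifier under the rule above."""
    return IDENTIFIER.fullmatch(s) is not None
```

Substituting the definitions textually gives a single regular expression, which is what the compiled pattern contains.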
10
2.1.4 Lexical analysis

The basic task of a lexical analyzer is:
given a set S of token descriptions and a
position P in the input stream,
to determine which of the regular expressions in
S will match a segment of the input starting at P
and what that segment is.
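
A minimal sketch of this task in Python. The set S (TOKEN_DESCRIPTIONS), the token names, and match_at are hypothetical. Choosing the longest match and preferring the earlier description on a tie is an assumption of this sketch, not something stated on the slide.

```python
import re

# Hypothetical set S of token descriptions.
TOKEN_DESCRIPTIONS = [
    ("identifier", re.compile(r"[a-zA-Z][a-zA-Z0-9]*(?:_[a-zA-Z0-9]+)*")),
    ("integer", re.compile(r"[0-9]+")),
    ("plus", re.compile(r"\+")),
]


def match_at(text, pos, descriptions=TOKEN_DESCRIPTIONS):
    """Return (token_name, segment) for the longest match at pos, or None."""
    best = None  # (name, segment, end position)
    for name, pattern in descriptions:
        m = pattern.match(text, pos)  # anchored at position pos
        if m and m.end() > pos and (best is None or m.end() > best[2]):
            best = (name, m.group(), m.end())
    return None if best is None else (best[0], best[1])
```

For the input "count+12", calling match_at at positions 0, 5, and 6 yields the identifier, the plus sign, and the integer in turn.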

11
2.1.5 Creating a lexical analyzer by hand
