Title: Lexical Analysis and Introducing Haskell
1. Lexical Analysis and Introducing Haskell
Please read 3.1-3.3
- Lexical analysis
- Regular Languages and regular expressions
- Lex
- The Haskell programming language
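2. Recall the Structure of a Compiler
character stream
  -> Lexical Analysis -> token stream
  -> Parsing -> syntax tree
  -> Semantic Analysis -> syntax tree
  -> Intermediate Code Generation -> intermediate code
  -> Optimization -> intermediate code
  -> Code Generation -> target machine code
The early phases form the Front End and the later phases form the Back End. The diagram also shows the Symbol Table alongside the phases.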
3. Structure of a Compiler
Today we're going to look at the implementation of the Lexical Analysis phase, which turns the character stream into a token stream. The rest of the pipeline (Parsing through Code Generation) is the same as on the previous slide.
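4. Lexical Analysis
[Figure: a character stream containing the lexemes speed, speed, 10, and time is fed through Lexical Analysis to produce a token stream.]
The tool lex creates lexical analyzers. Lexical
analysis is sometimes referred to as lexing or
scanning.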
5. Identifying Lexemes
The set of all possible lexemes for a given token
is described by a pattern. Patterns are typically
described using regular expressions.
Example: An identifier in C is a string of letters,
digits, or underscores beginning with a letter or
an underscore.
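The identifier pattern above can be checked by hand with a short loop. The following is only a sketch: the function name `is_identifier` is made up for illustration, and it tests the pattern alone (it does not exclude C keywords such as `if`).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if s matches (letter | _)(letter | digit | _)*.
   Hypothetical helper: it checks the pattern only, not keywords. */
bool is_identifier(const char *s)
{
    assert(s != NULL);

    /* First character: a letter or an underscore. */
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return false;

    /* Remaining characters: letters, digits, or underscores. */
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)s[i]) || s[i] == '_'))
            return false;
    }
    return true;
}
```

For example, `is_identifier("_tmp1")` is true, while `is_identifier("1abc")` and `is_identifier("")` are false.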
6. What we need
1) We need to be able to describe our language's
lexemes unambiguously.
2) We need to create a correct and efficient scanner
that will recognize the lexemes.
We have elegant theory for dealing with (1) and
advanced tools that completely automate (2)!
7. Regular Languages
A formal language over an alphabet Σ is a set of
strings made up of characters drawn from Σ. The
regular languages are the languages that can be
accepted by a deterministic finite state machine;
they are the simplest class in the hierarchy of
formal languages. Regular languages work well for
expressing the lexemes in a programming language.
In our case, Σ is all characters (including
newline).
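To make "accepted by a deterministic finite state machine" concrete, here is a minimal table-driven sketch for one small regular language: unsigned integer literals (one or more digits). The state names, the character classes, and the function `dfa_accepts_integer` are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* States of the machine; IN_INT is the only accepting state. */
enum state { START, IN_INT, DEAD, NUM_STATES };

/* Input characters are grouped into classes to keep the table small. */
enum char_class { DIGIT, OTHER, NUM_CLASSES };

/* transition[current state][character class] = next state */
static const enum state transition[NUM_STATES][NUM_CLASSES] = {
    /* START  */ { IN_INT, DEAD },
    /* IN_INT */ { IN_INT, DEAD },
    /* DEAD   */ { DEAD,   DEAD },
};

static enum char_class classify(char c)
{
    return isdigit((unsigned char)c) ? DIGIT : OTHER;
}

/* Runs the machine over the whole string and reports acceptance. */
bool dfa_accepts_integer(const char *s)
{
    assert(s != NULL);
    enum state st = START;
    for (size_t i = 0; s[i] != '\0'; i++)
        st = transition[st][classify(s[i])];
    return st == IN_INT;
}
```

The machine does one table lookup per input character and never backs up, which is why scanners built this way are fast.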
8. Regular Expressions
This is how we will describe our lexemes/tokens.
We'll have a formal notation and a notation
that lex will accept.
Example: C identifiers.
  letter     = A-Za-z
  digit      = 0-9
  identifier = (letter | _)(letter | digit | _)*
9. Regular Expressions Overview
10. Examples
  zero   = "0"
  digit  = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
           (we'll abbreviate these ranges as just 0-9 and often omit the
           quotation marks)
  letter = A-Za-z
11. Examples
  digit      = 0-9
  letter     = A-Za-z
  identifier = (letter | _)(letter | digit | _)*
12. Some common extensions
13. Examples
How about 7-digit phone numbers?
A phone number with an optional area code?
More examples in the ToeTipper.
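One possible answer to the phone number questions, as a hand-written matcher. The slide does not fix a written format, so this sketch assumes `ddd-dddd` with an optional `ddd-` area code in front, which corresponds to the pattern (digit digit digit -)? digit digit digit - digit digit digit digit. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* True if s begins with n digits. Stops at the first non-digit,
   including the terminating NUL, so it never reads past the string. */
static bool digits(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!isdigit((unsigned char)s[i]))
            return false;
    return true;
}

/* Matches exactly ddd-dddd (assumed 7-digit format). */
static bool is_local_number(const char *s)
{
    return digits(s, 3) && s[3] == '-' && digits(s + 4, 4) && s[8] == '\0';
}

/* Matches ddd-dddd, optionally preceded by an area code "ddd-". */
bool is_phone_number(const char *s)
{
    assert(s != NULL);
    if (is_local_number(s))
        return true;
    return digits(s, 3) && s[3] == '-' && is_local_number(s + 4);
}
```

Under these assumptions `"555-1234"` and `"701-555-1234"` match, while `"5551234"` and `"555-12345"` do not.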
14. Regular Expressions for Languages
We'll have one regular expression for each
lexeme/token. Each token will be identified by
an integer defined using #define:
  #define If 10
  #define Else 11
  #define Integer 12
  #define LeftParen 13
  Etc.
For an Integer token, the value will be the integer's value.
lex will run a short bit of code for each matched
regular expression. You'll create the token in
that code.
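A token built in such a piece of code might look like the following sketch. The `#define` codes are the ones from the slide; the `struct token` layout and both helper functions are assumptions, not the representation the project requires.

```c
#include <assert.h>
#include <stdlib.h>

/* Token codes from the slide. */
#define If        10
#define Else      11
#define Integer   12
#define LeftParen 13

/* Hypothetical token representation: a code plus an optional value. */
struct token {
    int type;   /* one of the #define codes above */
    int value;  /* meaningful only when type == Integer */
};

/* Builds an Integer token from the matched text, e.g. "10". */
struct token make_integer_token(const char *lexeme)
{
    assert(lexeme != NULL);
    struct token t;
    t.type = Integer;
    t.value = (int)strtol(lexeme, NULL, 10);
    return t;
}

/* Builds a token that carries no value, such as If or LeftParen. */
struct token make_simple_token(int type)
{
    struct token t = { type, 0 };
    return t;
}
```

With this layout, the lexeme `10` becomes a token with type `Integer` (12) and value 10.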
15. Regular Expressions for Languages
Keywords are easy:
  If   = if
  Else = else
  Do   = do
Each keyword is a single regular expression
consisting of only the keyword.
Note: Ambiguities can occur. What about this
identifier, which begins with the keyword do?
  doWhatYouWant
Sometimes there is only a single keyword token, with
the value being the actual keyword.
lex resolves ambiguities by always choosing the
rule that matches the longest string of input. If
two rules match the same longest string, the rule
listed first wins.
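The keyword/identifier ambiguity can be sketched by hand: take the longest run of identifier characters first, then check whether that whole lexeme is a keyword. This mimics the outcome of longest-match resolution but is not how lex works internally. The codes `Do`, `Identifier`, and `Error` and the function `scan_word` are hypothetical additions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define If         10
#define Else       11
#define Do         14  /* hypothetical code, not from the slides */
#define Identifier 15  /* hypothetical code, not from the slides */
#define Error      16  /* hypothetical code, not from the slides */

static const struct { const char *text; int type; } keywords[] = {
    { "if", If }, { "else", Else }, { "do", Do },
};

/* Classifies the word at the start of input and stores its length
   in *len. Returns Error (with *len == 0) if no word starts there. */
int scan_word(const char *input, size_t *len)
{
    assert(input != NULL && len != NULL);

    /* Longest match: consume every character the pattern allows. */
    size_t n = 0;
    if (isalpha((unsigned char)input[0]) || input[0] == '_') {
        n = 1;
        while (isalnum((unsigned char)input[n]) || input[n] == '_')
            n++;
    }
    *len = n;
    if (n == 0)
        return Error;

    /* A keyword wins only if it covers the entire lexeme. */
    for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; k++) {
        if (strlen(keywords[k].text) == n &&
            strncmp(input, keywords[k].text, n) == 0)
            return keywords[k].type;
    }
    return Identifier;
}
```

Here `do` is classified as `Do`, while `doWhatYouWant` is classified as a single `Identifier` of length 13 rather than the keyword `do` followed by something else.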
16. Whitespace and Error Handling
We often ignore whitespace in the input stream.
We'll often create a Whitespace token that is
just ignored when it is found:
  Whitespace = (" " | \t | \n)+
For an error, create an Error token. If nothing
else is matched, return the Error token.
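Both ideas fit into one small hand-written scanner step, sketched below: whitespace is matched and discarded, and anything that no other pattern matches comes back as an error token. The codes `Error` and `EndOfInput` and the function `next_token` are hypothetical; only `Integer` and `LeftParen` come from the earlier slide.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define Integer    12
#define LeftParen  13
#define Error      98  /* hypothetical code, not from the slides */
#define EndOfInput 99  /* hypothetical code, not from the slides */

/* Scans one token starting at input[*pos] and advances *pos past it. */
int next_token(const char *input, size_t *pos)
{
    assert(input != NULL && pos != NULL);

    /* Whitespace is matched and then simply ignored. */
    while (input[*pos] == ' ' || input[*pos] == '\t' || input[*pos] == '\n')
        (*pos)++;

    char c = input[*pos];
    if (c == '\0')
        return EndOfInput;

    /* Integer: the longest run of digits. */
    if (isdigit((unsigned char)c)) {
        while (isdigit((unsigned char)input[*pos]))
            (*pos)++;
        return Integer;
    }

    /* Consume one character so the scanner always makes progress;
       anything that is not a known single-character token is an error. */
    (*pos)++;
    return c == '(' ? LeftParen : Error;
}
```

Consuming the offending character before returning the error token lets the caller report the problem and keep scanning.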
17. Lex
Lex is a tool for creating lexical analyzers. A
free, widely used version is called flex. Project 1
will introduce the use of lex/flex, and we'll start
building our compiler.
18. Haskell
We're going to be building a compiler for a
subset of Haskell. See http://haskell.org/ and
http://en.wikipedia.org/wiki/Haskell_(programming_language)
Free compilers and interpreters are
available. I'm using WinHugs under Windows and
have asked that it be installed in the labs.
Haskell is a standardized, purely
functional programming language. It uses
something called lazy evaluation.
19. Haskell Examples
We'll always start with this:
  module Main where
  fac n = if n == 0 then 1 else n * fac (n-1)
  main = print (fac 5)
Lots more examples in class.