Title: CS 2130
1CS 2130
- Presentation 18
- Tools
- Lex
2Tools
- An important skill for a computer scientist is knowing when to code the solution to a problem and when to use a tool
- It wasn't always this way
- A key to using tools is getting past the learning curve
- Some of the most useful tools are those which produce usable code
3lex
- lex is a lexical analyzer generator
- It doesn't do lexical analysis. It writes code that will perform lexical analysis for you
- A program which can perform lexical analysis is useful.
- A program that can generate code for a custom lexical analyzer that can be embedded into an application you are creating is a gem
4Why learn lex?
- For some it will be a useful program that they will use over and over
- Others may use it, but infrequently
- It appears in want ads!!!
- http://www.appforge.com/corp/careers/sfw_engineer.html
- The concepts involved with learning lex get at the core material for this course. Even if you never use lex again, knowledge of its operation will help you to better understand the translation process
5More Info?
lex & yacc, 2nd Edition
By John Levine, Tony Mason & Doug Brown
2nd Edition, October 1992
ISBN 1-56592-000-7, Order Number 0007
386 pages, $29.95
http://www.oreilly.com/catalog/lex/
Note: Some code taken from the O'Reilly website
6Basic lex
7lex process
- Create a specification in a file called scan.l
- Note: "scan" is an arbitrary name here
- lex processes scan.l and produces lex.yy.c
- lex.yy.c contains a function called yylex()
- lex scan.l
- cc -c lex.yy.c -> produces lex.yy.o
- Now this object file can be linked with files that call yylex.
8lex file format
- <Definitions> (includes, defines, RegExps)
- ...
- ...
- %%
- <Rules> (Pattern/Action pairs)
- <pattern1> <action1>
- <pattern2> <action2>
- ...
- %%
- <Supplementary code> (Additional code, not always needed)
- ...
- ...
9lex
- Text not matched is echoed as read
- Thus, there is an implied ECHO
- Which can be suppressed. How?
- Lex patterns only match a given input character or string once
- Lex executes the action for the longest possible match for the current input.
10A Simple Example
- %{
- enum {ONE=1, TWO, THREE, IDENT, ENDFILE};
- %}
- one_a (a|A)
- two_as (aa|AA)
- three_as (aaa|AAA)
- ident [A-Za-z_][0-9A-Za-z_]*
- ignore (.|\n)
- %%
- <<EOF>> return ENDFILE;
- {one_a} return ONE;
- {two_as} return TWO;
- {three_as} return THREE;
- {ident} return IDENT;
- {ignore} ;
11Build it
- Put this code in a file called
- scan.l
- Run lex
- lex scan.l
- The output will be
- lex.yy.c
- Which can be compiled
- gcc -c lex.yy.c
- Which produces
- lex.yy.o
12Write a c program
- /* Includes not shown */
- enum {ONE=1, TWO, THREE, IDENT, ENDFILE};
- int yylex(void); /* defined in lex.yy.c */
- int main(void) {
-   int token;
-   int i;
-   /* read at most 10 tokens */
-   for(i=0; i<10; i++) {
-     token = yylex();
-     if(token == ENDFILE) {
-       PRINTF(("Goodbye!\n"));
-       return EXIT_SUCCESS;
-     }
-     PRINTF(("Token %d\n", token));
-   }
-   return EXIT_SUCCESS;
- }
13Put it all together
- gcc -o tester lex.yy.o tester.c -lfl
14Warning
- lex (and flex) as well as yacc contain more bells and whistles than you can shake a stick at...
- Make sure that you understand the basic functionality before attempting advanced projects!!!
15The Simplest Lex Program
- %%
- Put this code in a file called scan.l
- Run lex: lex scan.l
- Compile: gcc lex.yy.c -ll
- Run by typing a.out or a.out < somefile.txt
- Plain a.out reads from stdin. To terminate, type ctrl/d
16Why does this work?
17The Simplest Lex Program
- %%
- Put this code in a file called scan.l
- Run lex: lex scan.l
- Compile: gcc lex.yy.c -ll
- Run by typing a.out or a.out < somefile.txt
- By default this rule exists!
- Plain a.out reads from stdin. To terminate, type ctrl/d
18The Simplest Lex Program
- %%
- .|\n ECHO;
- %%
- int main()
- {
-   yylex();
-   return 0;
- }
We got the rule by default! PLUS the main()
19lex example
A definition involving a regular expression
- wspc [ \t\n]+
- %%
- {wspc} output( ' ' );
Reduce all whitespace to a single space.
Note: This works on acme. On Linux boxes, e.g. helsinki, substitute putc(' ', yyout) for output( ' ' ).
Note the curlies, meaning "substitute the definition of wspc here".
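To see what the whitespace rule does without running lex, here is a minimal plain-C sketch of the same transformation. The function names `is_wspc` and `collapse_wspc` are hypothetical and are not part of lex. The sketch assumes the whitespace class is space, tab, and newline, and that the caller supplies an output buffer at least as large as the input.

```c
#include <stddef.h>

/* One character of the assumed wspc class: space, tab, or newline. */
static int is_wspc(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper (not generated by lex): copies `in` to `out`,
 * replacing each maximal run of whitespace with a single space.
 * `out` must have room for strlen(in) + 1 bytes.
 * Returns the number of characters written, excluding the terminator.
 */
size_t collapse_wspc(const char *in, char *out)
{
    size_t n = 0;

    while (*in != '\0') {
        if (is_wspc(*in)) {
            /* Consume the whole run, as the longest match would. */
            while (is_wspc(*in))
                in++;
            out[n++] = ' ';
        } else {
            /* Anything else is copied through, like the implied ECHO. */
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

Calling `collapse_wspc("a  \t\n b", buf)` leaves `"a b"` in `buf`. The lex version gets the "whole run" behaviour for free from the longest-match rule.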
20lex example
- %{
- #include <stdio.h>
- #include <ctype.h>
- %}
- word [-'A-Za-z]+
- %%
- {word} printf("%c%s", toupper(*yytext),
-               yytext+1);
How to include C code: put it between %{ and %}
21lex example
- %{
- #include <stdio.h>
- #include <ctype.h>
- %}
- word [-'A-Za-z]+
- %%
- {word} printf("%c%s", toupper(*yytext),
-               yytext+1);
yytext is an internal variable containing the text of the word matched
Capitalize the first letter of each word, leaving the remainder of the text unchanged: "the 777 hits" becomes "The 777 Hits"
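The same capitalization can be sketched in plain C to show which characters count as part of a word. This is only an illustration: `is_word_char` and `capitalize_words` are hypothetical names, and the word class is assumed to be letters, hyphen, and apostrophe.

```c
#include <ctype.h>

/* One character of the assumed word class: letter, '-' or '\''. */
static int is_word_char(char c)
{
    return c == '-' || c == '\'' || isalpha((unsigned char)c);
}

/*
 * Hypothetical helper: upper-cases the first character of every
 * maximal run of word characters in `s`, in place. Digits, spaces
 * and punctuation are left untouched, as the implied ECHO would
 * leave them.
 */
void capitalize_words(char *s)
{
    int in_word = 0;

    for (; *s != '\0'; s++) {
        if (is_word_char(*s)) {
            if (!in_word)
                *s = (char)toupper((unsigned char)*s);
            in_word = 1;
        } else {
            in_word = 0;
        }
    }
}
```

Applied to a buffer holding `the 777 hits`, the function yields `The 777 Hits`: `777` matches no word character, so it passes through unchanged.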
22To be more specific...
- If you don't specify a main you get one for free!!!
- If you call yylex it will start scanning the appropriate input and, as it recognizes rules, do the specified action
- Example
- AAA printf("<Found 3 A's>");
- AA printf("<Found 2 A's>");
- Given AAAAAAAA
- Will print
- <Found 3 A's><Found 3 A's><Found 2 A's>
- The scanning continues unless a value is returned!
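The AAA/AA example follows from the longest-match rule. As a rough sketch, the loop below imitates a scanner restricted to literal patterns. It is not what lex generates (lex builds a state machine), and `scan_literals` is a hypothetical name. Instead of running an action, it records the index of the winning pattern as a digit, so it assumes at most ten patterns.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a generated scanner, literal patterns only.
 * At each input position it picks the pattern with the longest match
 * (earlier patterns win ties), appends that pattern's index as a digit
 * to `out`, and skips past the match. Characters no pattern matches
 * are copied through, like the implied ECHO.
 * Assumes npatterns <= 10 and room for strlen(input) + 1 bytes in `out`.
 */
void scan_literals(const char *input, const char *const patterns[],
                   size_t npatterns, char *out)
{
    size_t n = 0;

    while (*input != '\0') {
        size_t best_len = 0;
        size_t best = 0;
        size_t i;

        for (i = 0; i < npatterns; i++) {
            size_t len = strlen(patterns[i]);
            if (len > best_len && strncmp(input, patterns[i], len) == 0) {
                best_len = len;
                best = i;
            }
        }
        if (best_len == 0) {
            out[n++] = *input++;            /* unmatched: echo */
        } else {
            out[n++] = (char)('0' + best);  /* "action": record rule index */
            input += best_len;
        }
    }
    out[n] = '\0';
}
```

With patterns `{"AAA", "AA"}` and input `AAAAAAAA`, the output is `001`: two matches of the three-A rule, then one match of the two-A rule for the remaining `AA`. Listing the patterns in the opposite order changes only the indices, not which pattern wins at each position.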
23lex example
- %{
- #include <stdio.h>
- static int lineno = 0;
- %}
- %%
- [^\n]*\n printf( "%5d ", ++lineno ); ECHO;
Print out file with line numbers
24another way
- %{
- #include <stdio.h>
- static int lineno = 0;
- %}
- line [^\n]*\n
- %%
- {line} printf( "%5d %s", ++lineno, yytext );
Print out file with line numbers
25Or
- %{
- #include <stdio.h>
- static int lineno = 0;
- %}
- line .*\n
- %%
- {line} printf( "%5d %s", ++lineno, yytext );
Print out file with line numbers
26Or even
- %{
- #include <stdio.h>
- static int lineno = 0;
- %}
- line .*\n
- %%
- {line} printf("/* %5d */ %s", ++lineno, yytext );
Print out file with line numbers commented for C
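For comparison, the line-numbering idea can be written by hand in plain C. The sketch below is an assumption-laden illustration, not the course's code: `number_lines` is a hypothetical name, and it writes into a caller-supplied buffer instead of stdout so the result is easy to inspect. Only newline-terminated lines get a number. A trailing fragment without a newline is copied through unnumbered, which is what the implied ECHO would do when the line pattern cannot match.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copies `text` into `out` (capacity `cap`),
 * prefixing each newline-terminated line with a 5-wide line number.
 * Returns the last line number used, or -1 if `out` is too small.
 */
int number_lines(const char *text, char *out, size_t cap)
{
    int lineno = 0;
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    while (*text != '\0') {
        const char *nl = strchr(text, '\n');
        int len;
        int written;

        if (nl != NULL) {
            len = (int)(nl - text) + 1;   /* include the newline */
            written = snprintf(out + used, cap - used, "%5d %.*s",
                               ++lineno, len, text);
        } else {
            len = (int)strlen(text);      /* unterminated tail: echo only */
            written = snprintf(out + used, cap - used, "%.*s", len, text);
        }
        if (written < 0 || (size_t)written >= cap - used)
            return -1;
        used += (size_t)written;
        text += len;
    }
    return lineno;
}
```

Numbering `"a\nbb\n"` produces two lines, `    1 a` and `    2 bb`, and returns 2. The lex versions are shorter because the pattern does the work of finding the end of each line.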
27another example
- %{
- #include "defs.h"
- static char *BigLine = NULL;
- static int BigLineLen = -1;
- %}
- line [^\n]*\n
- %%
- {line} { if( yyleng > BigLineLen ) {
-            free( BigLine );
-            BigLineLen =
-              ( BigLine = strdup( yytext ) )
-                == NULL ? -1 : yyleng;
-          }
-        }
- %%
- int yywrap( void ) {
-   PRINTF(
-     ("%s",( BigLine == NULL ) ? "" : BigLine ));
-   return 1;
- }
yywrap gets called at the end of input
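The bookkeeping in this example (remember the longest line seen so far, report it once the input is exhausted) can be sketched in plain C as follows. `longest_line` is a hypothetical name. Unlike the lex version it takes the whole input as one string and returns the result instead of printing it from yywrap.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a malloc'd copy of the longest
 * newline-terminated line in `text`, newline included. The first of
 * several equally long lines wins. Returns NULL if there is no
 * complete line or if allocation fails. The caller frees the result.
 */
char *longest_line(const char *text)
{
    const char *best = NULL;
    size_t best_len = 0;
    const char *nl;
    char *copy;

    while ((nl = strchr(text, '\n')) != NULL) {
        size_t len = (size_t)(nl - text) + 1;
        if (best == NULL || len > best_len) {
            best = text;
            best_len = len;
        }
        text = nl + 1;
    }
    if (best == NULL)
        return NULL;

    copy = malloc(best_len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, best, best_len);
    copy[best_len] = '\0';
    return copy;
}
```

For the input `"ab\nabcd\nxy\nzzzz\ntail"` the result is `"abcd\n"`: `zzzz` is equally long but comes later, and `tail` has no newline, so it is never considered.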
28count chars, words, lines
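- %{
- #include <stdio.h>
- static int words = 0, lines = 0, chars = 0;
- %}
- word [-'A-Za-z]+
- %%
- {word} { words++; chars += yyleng; }
- \n     { lines++; chars++; }
- .      { chars++; }
- %%
- int yywrap( void ) {
-   /* counters are int, so print them with %d */
-   printf( "%8d%8d%8d\n", lines, words, chars );
-   return 1;
- }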
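A hand-written counterpart makes it easier to check what the three rules count. The sketch below is only an approximation of the lex program: `struct text_counts`, `in_word_class`, and `count_text` are hypothetical names, a word is assumed to be a maximal run of letters, hyphens, and apostrophes, and the input is a string, not a file.

```c
#include <ctype.h>

/* Totals gathered over one input string. */
struct text_counts {
    unsigned lines;
    unsigned words;
    unsigned chars;
};

/* One character of the assumed word class: letter, '-' or '\''. */
static int in_word_class(char c)
{
    return c == '-' || c == '\'' || isalpha((unsigned char)c);
}

/*
 * Hypothetical helper: counts every character, every newline, and
 * every maximal run of word characters in `s`.
 */
struct text_counts count_text(const char *s)
{
    struct text_counts c = { 0, 0, 0 };
    int in_word = 0;

    for (; *s != '\0'; s++) {
        c.chars++;
        if (*s == '\n')
            c.lines++;
        if (in_word_class(*s)) {
            if (!in_word)
                c.words++;      /* first character of a new word */
            in_word = 1;
        } else {
            in_word = 0;
        }
    }
    return c;
}
```

For `"the 777 hits\nok\n"` this gives 2 lines, 3 words, and 16 characters. `777` is not a word under this definition, although its characters are still counted.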
29Distribution of word lengths
30
- %{
- #include "defs.h"
- #define MAX_WORD_LEN 100
- static unsigned int WrdLengArr[ MAX_WORD_LEN ];
- static unsigned int WrdLengSum, NumWords;
- %}
- word [^ \t\n]+
- %%
- {word} { if( yyleng < MAX_WORD_LEN )
-          {
-            WrdLengArr[ yyleng ]++;
-            WrdLengSum += yyleng;
-            NumWords++;
-          }
-        }
- .|\n   /* do nothing */ ;
31
- %%
- int yywrap( void ) {
-   int i;
-   PRINTF(( "Length\tFrqncy\n" ));
-   for( i = 0; i < MAX_WORD_LEN; i++ )
-     if( WrdLengArr[ i ] != 0 )
-     {
-       PRINTF(( "%4u\t%4u\n", i, WrdLengArr[ i ] ));
-     }
-   PRINTF((" Avg\t%0.2f\n", (float)WrdLengSum / NumWords));
-   return 1;
- }
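The histogram logic can also be tried out as an ordinary C function. This is a sketch under assumptions: a word is taken to be any maximal run of characters other than space, tab, and newline, and `is_sep` and `word_length_histogram` are hypothetical names. It fills a caller-supplied array instead of printing from yywrap.

```c
#include <stddef.h>

#define MAX_WORD_LEN 100

/* Separator between words: space, tab, or newline (assumed). */
static int is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper: sets freq[n] to the number of words of length n
 * in `s`, for n < MAX_WORD_LEN. Longer words are skipped. Stores the
 * sum of the counted lengths in *total_len and returns the number of
 * words counted.
 */
size_t word_length_histogram(const char *s, unsigned freq[MAX_WORD_LEN],
                             size_t *total_len)
{
    size_t nwords = 0;
    size_t i;

    *total_len = 0;
    for (i = 0; i < MAX_WORD_LEN; i++)
        freq[i] = 0;

    while (*s != '\0') {
        size_t len = 0;

        while (is_sep(*s))                  /* skip separators */
            s++;
        while (*s != '\0' && !is_sep(*s)) { /* measure one word */
            s++;
            len++;
        }
        if (len > 0 && len < MAX_WORD_LEN) {
            freq[len]++;
            *total_len += len;
            nwords++;
        }
    }
    return nwords;
}
```

For `"a bb bb\tccc\n"` the function counts four words with a total length of 8, so the average length is 2.0. A caller should check for zero words before dividing.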
32Word Replace
33
- %{
- #include "defs.h"
- /* argv[ n ] is only valid when n < argc */
- #define ARG( n ) ( argc <= (n) ? "" : argv[ (n) ] )
- static char *SearchWord;
- static char *InsertWord;
- %}
- word [-a-zA-Z]+
- num [0-9]+
- punct [!.,()]
- %%
34
- {punct} |
- {num} |
- {word} {
-     if( strcmp( yytext, SearchWord ) == 0 )
-     {
-       PRINTF(( "%s", InsertWord ));
-     }
-     else
-     {
-       PRINTF(( "%s", yytext ));
-     }
-   }
- %%
35
- int main( int argc, char *argv[] )
- {
-   const char *OutFile = "output.txt";
-   char *InFile = ARG( 1 );
-   SearchWord = ARG( 2 );
-   InsertWord = ARG( 3 );
-   if( ( yyin = freopen( InFile, "r", stdin ) ) == NULL ||
-       ( yyout = freopen( OutFile, "w", stdout ) ) == NULL )
-   {
-     ERR_MSG( freopen );
-     return EXIT_FAILURE;
-   }
-   return ( yylex( ) == 0 ) ? EXIT_SUCCESS : EXIT_FAILURE;
- }
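The key point of the word-replace example is that the comparison is made against a whole token, not against a substring. The plain-C sketch below illustrates this under assumptions: tokens are runs of letters and hyphens, runs of digits, or single other characters, and the names `is_letterish`, `token_len`, and `replace_word` are hypothetical. It works on strings, not on yyin and yyout.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One character of the assumed word class: letter or '-'. */
static int is_letterish(char c)
{
    return c == '-' || isalpha((unsigned char)c);
}

/* Length of the token starting at `s` (s must not be empty). */
static size_t token_len(const char *s)
{
    size_t n = 0;

    if (is_letterish(*s)) {
        while (is_letterish(s[n]))
            n++;
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)s[n]))
            n++;
    } else {
        n = 1;  /* punctuation or unmatched character */
    }
    return n;
}

/*
 * Hypothetical helper: copies `in` to `out` (capacity `cap`), writing
 * `insert` in place of every token equal to `search`.
 * Returns the number of replacements, or -1 if `out` is too small.
 */
int replace_word(const char *in, const char *search, const char *insert,
                 char *out, size_t cap)
{
    size_t used = 0;
    size_t slen = strlen(search);
    size_t ilen = strlen(insert);
    int replaced = 0;

    if (cap == 0)
        return -1;
    while (*in != '\0') {
        size_t tlen = token_len(in);
        const char *src = in;
        size_t n = tlen;

        if (tlen == slen && strncmp(in, search, slen) == 0) {
            src = insert;
            n = ilen;
            replaced++;
        }
        if (used + n + 1 > cap)
            return -1;
        memcpy(out + used, src, n);
        used += n;
        in += tlen;
    }
    out[used] = '\0';
    return replaced;
}
```

Replacing `cat` with `dog` in `"concat cat, cats."` changes only the middle word and gives `"concat dog, cats."`, because `concat` and `cats` are different tokens.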
36Questions?