Title: Introduction to Yacc
1. Lex and Yacc
COP - 3402
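2. General Compiler Infrastructure
[Figure: stages of the compiler pipeline]
- Program source (stream of characters)
- Scanner (tokenizer), generated by Lex: produces tokens
- Parser, generated by Yacc: produces the syntactic structure
- Semantic Routines: produce the first intermediate representation, IR (1)
- Analysis / Transformations / Optimizations: produce the second intermediate representation, IR (2)
- Code Generator: produces assembly code
- Symbol and Attribute Tables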
3. Lex & Yacc
- Lex
  - generates C code for the lexical analyzer (scanner)
  - token patterns are specified by regular expressions
- Yacc
  - generates C code for an LALR(1) syntax analyzer (parser)
  - the grammar is specified by BNF rules
4. Lex
- lex is a generator of lexical analyzers, widely used on Unix.
- It is mostly used together with the Yacc parser generator.
- Written by Eric Schmidt and Mike Lesk.
- It reads an input file that specifies the lexical analyzer and outputs C source code implementing that analyzer.
- Lex reads patterns (regular expressions) and produces C code for a lexical analyzer that scans the input for text matching those patterns, such as identifiers and numbers.
5. Lex predefined variables and functions
6. Lex Pattern Matching Primitives
7. Lex Pattern Matching Examples
8. Example: Simple Calculator
- Computes the basic arithmetic operations
- Allows declaration of variables
- Enough to illustrate the basic structure of Lex and Yacc programs
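9. Lex program structure
A Lex program has three sections separated by %% lines: definitions, rules, and subroutines.

    %{
    #include <stdio.h>
    #include <stdlib.h>     /* atoi */
    #include "y.tab.h"      /* token codes generated by yacc */
    int c;
    extern int yylval;
    %}
    %%
    " "            ;
    [a-z]          {
                     c = yytext[0];
                     yylval = c - 'a';
                     return(LETTER);
                   }
    [0-9]+         {
                     yylval = atoi(yytext);
                     return(NUMBER);
                   }
    [^a-z0-9\b]    {
                     c = yytext[0];
                     return(c);
                   }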
10. Pattern Matching and Action

    [a-z]     {
                c = yytext[0];
                yylval = c - 'a';
                return(LETTER);
              }
    [0-9]+    {
                yylval = atoi(yytext);
                return(NUMBER);
              }

- [a-z]: match a character in the a-z range
- yytext: the buffer holding the matched text
- yylval = c - 'a': place the offset c - 'a' on the stack
- [0-9]+: match a positive integer (a sequence of 0-9 digits)
- yylval = atoi(yytext): place the integer value on the stack
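To make the pattern/action pairs concrete, here is a minimal hand-written sketch of a scanner that behaves like the Lex rules for this calculator: blanks are skipped, a lowercase letter yields LETTER with its offset from 'a', a run of digits yields NUMBER with its integer value, and any other character is returned as itself. The names sketch_yylex and sketch_yylval and the numeric token codes are assumptions made for illustration; a real build takes the codes from y.tab.h and uses the yylex that Lex generates.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Token codes. Yacc normally generates these in y.tab.h;
   the values here are assumptions for this sketch. */
enum { LETTER = 258, NUMBER = 259 };

/* Stand-in for the yylval variable shared with the parser. */
int sketch_yylval;

/* Returns the next token code from *input and advances *input past
   the matched text. Returns 0 at end of input. */
int sketch_yylex(const char **input)
{
    const char *p = *input;
    assert(p != NULL);

    while (*p == ' ')                     /* blank rule: skip */
        p++;

    if (*p == '\0') {                     /* end of input */
        *input = p;
        return 0;
    }
    if (*p >= 'a' && *p <= 'z') {         /* letter rule */
        sketch_yylval = *p - 'a';         /* offset used as register index */
        *input = p + 1;
        return LETTER;
    }
    if (isdigit((unsigned char)*p)) {     /* digit-sequence rule */
        char *end;
        sketch_yylval = (int)strtol(p, &end, 10);
        *input = end;
        return NUMBER;
    }
    *input = p + 1;                       /* any other character: return it */
    return (unsigned char)*p;
}
```

Calling sketch_yylex repeatedly on the text "c 42+" would yield LETTER (value 2), NUMBER (value 42), the character '+', and then 0.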
11. Yacc
- Grammars are described by rules written in a variant of Backus-Naur Form (BNF)
- Context-free grammars
- An LALR(1) parse table is generated automatically from the rules
- Actions are added to the rules and executed after each reduction
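12. Yacc Program Structure
As in Lex, the three sections (definitions, rules, subroutines) are separated by %% lines.

    %{
    #include <stdio.h>
    int yylex(void);            /* scanner generated by Lex */
    int yyerror(char *s);
    int regs[26];
    int base;
    %}
    %token NUMBER LETTER
    %left '+' '-'
    %left '*' '/'
    %%
    list:   /* empty */
          | list stat '\n'
          | list error '\n'   { yyerrok; }
          ;
    stat:   expr              { printf("%d\n", $1); }
          | LETTER '=' expr   { regs[$1] = $3; }
          ;
    expr:   '(' expr ')'      { $$ = $2; }
          | expr '+' expr     { $$ = $1 + $3; }
          | LETTER            { $$ = regs[$1]; }
          ;
    %%
    int main(void) { return(yyparse()); }
    int yyerror(char *s) { fprintf(stderr, "%s\n", s); return 0; }
    int yywrap(void) { return(1); }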
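The two left-associativity declarations in the definitions section set operator precedence: operators declared on a later line bind more tightly. The sketch below illustrates that meaning with a hand-written precedence-climbing evaluator. This is not the LALR(1) table-driven method that yacc generates. It assumes the usual arrangement of '+' and '-' on the lower level and '*' and '/' on the higher one, handles only integers and parentheses, and every function name in it is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Precedence level of a binary operator: '+' '-' lower, '*' '/' higher.
   Returns 0 for anything that is not a binary operator. */
static int prec_of(int op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

static int apply_op(int op, int lhs, int rhs)
{
    switch (op) {
    case '+': return lhs + rhs;
    case '-': return lhs - rhs;
    case '*': return lhs * rhs;
    default:
        assert(op == '/' && rhs != 0);
        return lhs / rhs;
    }
}

static int parse_expr(const char **p, int min_prec);

/* primary: integer literal or parenthesized expression */
static int parse_primary(const char **p)
{
    while (**p == ' ') (*p)++;
    if (**p == '(') {
        (*p)++;
        int inner = parse_expr(p, 1);
        while (**p == ' ') (*p)++;
        assert(**p == ')');
        (*p)++;
        return inner;
    }
    assert(isdigit((unsigned char)**p));
    char *end;
    int v = (int)strtol(*p, &end, 10);
    *p = end;
    return v;
}

/* Consume operators whose level is at least min_prec. Parsing the right
   operand at level + 1 is what makes each operator left-associative. */
static int parse_expr(const char **p, int min_prec)
{
    int lhs = parse_primary(p);
    for (;;) {
        while (**p == ' ') (*p)++;
        int op = (unsigned char)**p;
        int prec = prec_of(op);
        if (prec == 0 || prec < min_prec)
            return lhs;
        (*p)++;
        int rhs = parse_expr(p, prec + 1);
        lhs = apply_op(op, lhs, rhs);
    }
}

/* Evaluate a whole expression string. */
int eval(const char *text)
{
    return parse_expr(&text, 1);
}
```

With these levels, eval("2+3*4") groups as 2+(3*4), and eval("8-3-2") groups as (8-3)-2 because the operators are left-associative.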
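13. Rule Reduction and Action

    stat:   expr              { printf("%d\n", $1); }
          | LETTER '=' expr   { regs[$1] = $3; }
          ;
    expr:   expr '+' expr     { $$ = $1 + $3; }
          | LETTER            { $$ = regs[$1]; }
          ;

- Grammar rule: the part before the braces, for example expr '+' expr
- Action: the C code in braces, executed when the rule is reduced
- In an action, $$ is the value of the left-hand side and $1, $2, $3 are the values of the right-hand-side symbols
- '|' is the "or" operator, used for multiple right-hand sides (RHS)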
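What a reduction does to the semantic values can be sketched with a toy value stack. The example assumes the rule being reduced is addition with the action that sets the result to the sum of the first and third right-hand-side values. The ValueStack type and the functions vs_push and reduce_add are hypothetical names for illustration, not part of the code yacc generates.

```c
#include <assert.h>

#define STACK_MAX 64

/* A toy semantic-value stack, standing in for the one a yacc parser keeps. */
typedef struct {
    int vals[STACK_MAX];
    int top;                 /* number of values currently on the stack */
} ValueStack;

static void vs_push(ValueStack *s, int v)
{
    assert(s->top < STACK_MAX);
    s->vals[s->top++] = v;
}

/* Reduce by the rule  expr : expr '+' expr  with action  $$ = $1 + $3.
   The three right-hand-side values are the top three stack entries. */
void reduce_add(ValueStack *s)
{
    assert(s->top >= 3);
    int *rhs = &s->vals[s->top - 3];   /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];      /* the action: $$ = $1 + $3 */
    s->top -= 3;                       /* pop the right-hand side */
    vs_push(s, result);                /* push $$ as the value of the new expr */
}
```

After pushing the values 2, '+', and 3 and calling reduce_add, the stack holds the single value 5, which is then available as an operand to whichever rule is reduced next.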
14. Further reading
- A Compact Guide to Lex & Yacc, Thomas Niemann (recommended)
- lex & yacc, John R. Levine, Tony Mason, and Doug Brown (O'Reilly)
- Lots of resources on the web
- Check our website for some suggestions
15. Conclusions
- Yacc and Lex are very helpful for building the compiler front end
- They save a lot of time compared with hand-implementing the parser and scanner
- Both work as a mixture of rules and C code
- The generated C code is merged with the rest of the compiler code