Title: Translating High Level Languages
2. Stages of translation
- Lexical analysis
- Syntactic analysis
- Code generation
- Linking
- Before
- Execution
3. Lexical analysis
- Translates a stream of characters into lexemes
- Lexemes belong to categories called tokens
- The token identity of each lexeme is used at the next stage, syntactic analysis
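The scanning step described above can be sketched as a loop that reads characters and groups them into lexemes. This is a minimal sketch, assuming a hypothetical token set (`identifier`, `int_literal`, `equal_sign`, and so on) and a hypothetical function name `scan`. Each lexeme is returned together with its token, the "token identity" that the syntactic analyzer uses next.

```python
from typing import List, Tuple

# Hypothetical single-character tokens; the names are illustrative, not from the slides.
SINGLE_CHAR_TOKENS = {
    "=": "equal_sign", "(": "left_paren", ")": "right_paren",
    ",": "comma", ";": "semicolon", "+": "plus", "-": "minus",
    "*": "star", "/": "slash",
}

def scan(source: str) -> List[Tuple[str, str]]:
    """Split a stream of characters into (token, lexeme) pairs."""
    pairs: List[Tuple[str, str]] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                   # whitespace separates lexemes but is not one
            i += 1
        elif ch.isalpha() or ch == "_":    # identifier: letter or _, then letters, digits, _
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            pairs.append(("identifier", source[start:i]))
        elif ch.isdigit():                 # integer literal: a run of digits
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            pairs.append(("int_literal", source[start:i]))
        elif ch in SINGLE_CHAR_TOKENS:     # one-character lexemes
            pairs.append((SINGLE_CHAR_TOKENS[ch], ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return pairs

print(scan("count = maxCost + 1;"))
```

Each loop iteration consumes the longest run of characters that still forms one lexeme, which is the usual way hand-written scanners decide where a lexeme ends.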
4. Examples of tokens and lexemes
- Some token categories contain only one lexeme
  - semicolon: ;
- Some tokens categorize many lexemes
  - identifier: count, maxCost, …
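One way to see the difference between the two kinds of token is to describe each token category by a pattern. This is a minimal sketch with a hypothetical table `TOKEN_PATTERNS` and helper `token_of`: the semicolon pattern matches exactly one lexeme, while the identifier pattern matches unboundedly many.

```python
import re
from typing import Optional

# Hypothetical token table: each token category is described by a regular expression.
TOKEN_PATTERNS = {
    "semicolon": r";",                          # exactly one lexeme: ;
    "identifier": r"[A-Za-z_][A-Za-z0-9_]*",    # unboundedly many lexemes
}

def token_of(lexeme: str) -> Optional[str]:
    """Return the token category whose pattern matches the whole lexeme, else None."""
    for token, pattern in TOKEN_PATTERNS.items():
        if re.fullmatch(pattern, lexeme):
            return token
    return None

for lexeme in [";", "count", "maxCost"]:
    print(lexeme, "->", token_of(lexeme))
```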
5. Tokens and lexemes
Example statement: yVal = x … 450 … min ( 100, 4xVal ))
- yVal, x, min: identifier
- =: equal_sign
- (: left_paren
- 4xVal: illegal lexeme
- Lexical analysis
  - identifies lexemes and their token types
  - recognizes illegal lexemes (4xVal)
  - does NOT identify the syntax error ) )
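A lexer of this kind can flag `4xVal` while letting the extra `)` through, because it only looks at one lexeme at a time. This is a minimal sketch: the token names, the rule that "digits glued to letters" is illegal, and the function name `lex` are assumptions for illustration. The example statement also assumes `*` and `-` as the operators between `x`, `450` and `min`.

```python
import re
from typing import List, Tuple

# Hypothetical token specification; order matters because re tries alternatives left to right.
TOKEN_SPEC = [
    ("illegal",     r"\d+[A-Za-z_]\w*"),  # digits glued to letters, e.g. 4xVal
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("left_paren",  r"\("),
    ("right_paren", r"\)"),
    ("comma",       r","),
    ("operator",    r"[+\-*/]"),
    ("skip",        r"\s+"),
    ("mismatch",    r"."),                 # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(source: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (token, lexeme) pairs plus a list of illegal lexemes."""
    tokens: List[Tuple[str, str]] = []
    errors: List[str] = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind in ("illegal", "mismatch"):
            errors.append(lexeme)
        else:
            tokens.append((kind, lexeme))
    return tokens, errors

tokens, errors = lex("yVal = x * 450 - min ( 100, 4xVal ))")
print(errors)  # ['4xVal']
```

The two `right_paren` tokens are accepted without complaint: whether parentheses balance depends on how tokens are arranged, which is the syntactic analyzer's job.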
6. Syntax or grammar of a language
- Rules for
  - generating (used by the programmer) or
  - recognizing (used by the syntactic analyzer during translation)
  a valid sequence of lexemes
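The same rules can be used in both directions. This is a minimal sketch with a hypothetical toy grammar (not from the slides), `<list> -> <id> | <id> , <list>` with `<id> -> a | b`: `generate` produces valid sentences, as a programmer would, and `recognize` decides whether a given lexeme sequence is valid, as the syntactic analyzer would.

```python
import itertools
from typing import Iterator, List

# Hypothetical toy grammar (not from the slides):
#   <list> -> <id> | <id> , <list>
#   <id>   -> a | b
IDS = ["a", "b"]

def generate(max_ids: int) -> Iterator[str]:
    """Generating: yield every sentence with at most max_ids identifiers."""
    for n in range(1, max_ids + 1):
        for choice in itertools.product(IDS, repeat=n):
            yield " , ".join(choice)

def recognize(lexemes: List[str]) -> bool:
    """Recognizing: decide whether a lexeme sequence is a valid <list>."""
    if not lexemes or lexemes[0] not in IDS:   # every <list> starts with an <id>
        return False
    if len(lexemes) == 1:                      # <list> -> <id>
        return True
    return lexemes[1] == "," and recognize(lexemes[2:])   # <list> -> <id> , <list>

print(list(generate(2)))
print(recognize(["a", ",", "b"]), recognize(["a", "b"]))
```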
7. Grammars
- Four categories of grammars (Chomsky hierarchy)
- Two categories are important in computing:
  - Regular grammars, written as regular expressions (pattern matching, e.g. lexical analysis)
  - Context-free grammars (syntax of programming languages)
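The split between the two categories shows up in practice: a regular expression handles flat patterns such as identifiers, but arbitrarily nested parentheses are a standard example of a language no regular expression can describe. A context-free grammar handles it with a recursive rule. This is a minimal sketch; the grammar `<p> -> ( <p> ) <p> | empty` and the names `IDENTIFIER` and `balanced` are chosen for illustration.

```python
import re

# Regular expression (regular grammar): identifiers form a flat pattern.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def balanced(text: str) -> bool:
    """Recognize  <p> -> ( <p> ) <p> | empty  over the characters '(' and ')'."""
    def parse_p(i: int) -> int:
        # Try the rule ( <p> ) <p>; otherwise use the empty alternative.
        if i < len(text) and text[i] == "(":
            j = parse_p(i + 1)
            if j < len(text) and text[j] == ")":
                return parse_p(j + 1)
            raise ValueError("missing )")
        return i
    try:
        return parse_p(0) == len(text)   # the whole input must be consumed
    except ValueError:
        return False

print(bool(IDENTIFIER.fullmatch("maxCost")), balanced("(())()"), balanced("(()"))
```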
8. Context-free grammar
- Meta-language for describing languages
- States rules, or productions, for which lexeme sequences are correct in the language
- Written in Backus-Naur Form (BNF)
9. Example of a BNF rule
- PROBLEM: how can all of these be recognized as correct?
  - y = x
  - f = rVec.length … 1
  - button4.label = Exit
- RULE for defining an assignment statement
  - <assign> -> <variable> = <expression>
  - Assumes other rules for <variable> and <expression>
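The rule only becomes checkable once the other rules are supplied. This is a minimal sketch that assumes hypothetical rules for `<variable>` (identifiers joined by `.`) and `<expression>` (operands joined by `+ - * /`). It also assumes `+` as the operator in the `rVec.length` example. The class name `AssignRecognizer` is illustrative; it is a recursive-descent recognizer with one method per non-terminal.

```python
import re
from typing import Callable, List

# Hypothetical supporting rules (the slide only says such rules exist):
#   <assign>     -> <variable> = <expression>
#   <variable>   -> identifier { . identifier }
#   <expression> -> <operand> { operator <operand> }
#   <operand>    -> <variable> | number
LEXEME = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def is_identifier(lexeme: str) -> bool:
    return re.fullmatch(r"[A-Za-z_]\w*", lexeme) is not None

class AssignRecognizer:
    """Recursive-descent recognizer: one method per non-terminal."""

    def __init__(self, source: str) -> None:
        self.lexemes: List[str] = LEXEME.findall(source)
        self.pos = 0

    def peek(self) -> str:
        return self.lexemes[self.pos] if self.pos < len(self.lexemes) else ""

    def accept(self, test: Callable[[str], bool]) -> bool:
        """Consume the next lexeme if it passes test."""
        if test(self.peek()):
            self.pos += 1
            return True
        return False

    def variable(self) -> bool:
        if not self.accept(is_identifier):
            return False
        while self.accept(lambda s: s == "."):
            if not self.accept(is_identifier):
                return False
        return True

    def operand(self) -> bool:
        return self.accept(str.isdigit) or self.variable()

    def expression(self) -> bool:
        if not self.operand():
            return False
        while self.accept(lambda s: s in ("+", "-", "*", "/")):
            if not self.operand():
                return False
        return True

    def assign(self) -> bool:
        return (self.variable()
                and self.accept(lambda s: s == "=")
                and self.expression()
                and self.pos == len(self.lexemes))

for s in ["y = x", "f = rVec.length + 1", "button4.label = Exit", "y x"]:
    print(s, "->", AssignRecognizer(s).assign())
```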
10. BNF rules
- Non-terminal and terminal symbols
- Non-terminals are defined by at least one rule
- Terminals are tokens or lexemes
- Example: <assignment> -> <var> = <expression>
  (<assignment>, <var> and <expression> are non-terminals; = is a terminal)
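The terminal/non-terminal distinction can be made concrete by storing a grammar as data. This is a minimal sketch with a hypothetical encoding: each non-terminal maps to its alternatives, the rules for `<var>` and `<expression>` are invented for illustration, and `undefined_nonterminals` checks the "defined by at least one rule" requirement.

```python
from typing import Dict, List, Set

# Hypothetical encoding (not from the slides): each non-terminal maps to a list of
# alternatives, and each alternative is a list of symbols.
Grammar = Dict[str, List[List[str]]]

GRAMMAR: Grammar = {
    "<assignment>": [["<var>", "=", "<expression>"]],
    "<var>": [["x"], ["y"]],                              # hypothetical rule
    "<expression>": [["<var>"], ["<var>", "+", "<var>"]],  # hypothetical rule
}

def is_nonterminal(symbol: str) -> bool:
    """By BNF convention, non-terminals are written in angle brackets."""
    return len(symbol) > 2 and symbol.startswith("<") and symbol.endswith(">")

def terminals(g: Grammar) -> Set[str]:
    """Every right-hand-side symbol that is not a non-terminal."""
    return {s for alts in g.values() for alt in alts for s in alt if not is_nonterminal(s)}

def undefined_nonterminals(g: Grammar) -> Set[str]:
    """Non-terminals used on a right-hand side without at least one rule of their own."""
    used = {s for alts in g.values() for alt in alts for s in alt if is_nonterminal(s)}
    return {s for s in used if not g.get(s)}

print(sorted(terminals(GRAMMAR)), undefined_nonterminals(GRAMMAR))
```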
11. Simple sample grammar (p. 113)
<assign> -> <id> = <expr>
<id>     -> A | B | C                  // lexical
<expr>   -> <id> + <expr>
          | <id> * <expr>
          | ( <expr> )
          | <id>
12. Simple sample derivation
<assign> => <id> = <expr>          <- apply one rule at each step,
         => B = <expr>                to the leftmost non-terminal
         => B = <id> * <expr>
         => B = A * <expr>
         => B = A * ( <expr> )
         => B = A * ( <id> + <expr> )
         => B = A * ( C + <expr> )
         => B = A * ( C + <id> )
         => B = A * ( C + C )
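A leftmost derivation is mechanical once the rule choices are fixed: at each step, find the leftmost non-terminal and replace it with one of its alternatives. This is a minimal sketch over the slide 11 grammar, with `+` and `*` assumed as its two operators. The function name `leftmost_derivation` and the encoding of choices as alternative indices are illustrative.

```python
from typing import Dict, List

# Sample grammar, with + and * assumed as the two operators:
#   <assign> -> <id> = <expr>
#   <id>     -> A | B | C
#   <expr>   -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
GRAMMAR: Dict[str, List[List[str]]] = {
    "<assign>": [["<id>", "=", "<expr>"]],
    "<id>": [["A"], ["B"], ["C"]],
    "<expr>": [["<id>", "+", "<expr>"], ["<id>", "*", "<expr>"],
               ["(", "<expr>", ")"], ["<id>"]],
}

def leftmost_derivation(start: str, choices: List[int]) -> List[str]:
    """Apply one rule per step (choice = alternative index) to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        i = next((k for k, sym in enumerate(form) if sym in GRAMMAR), None)
        if i is None:
            raise ValueError("no non-terminal left to expand")
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

for step in leftmost_derivation("<assign>", [0, 1, 1, 0, 2, 0, 2, 3, 2]):
    print(step)
```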
13. Sample parse tree
<assign>
  <id>
    B
  =
  <expr>
    <id>
      A
    *
    <expr>
      (
      <expr>
        <id>
          C
        +
        <expr>
          <id>
            C
      )
The leaves, read left to right, form the sentence of lexemes: B = A * ( C + C )
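A recursive-descent parser builds such a tree directly, with one method per non-terminal. This is a minimal sketch for the slide 11 grammar, again assuming `+` and `*` as its operators. The class name `Parser`, the tuple-based tree format, and the helper `leaves` are hypothetical. Reading the leaves back recovers the sentence, as the slide states.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple[str, list]]   # a leaf lexeme, or (non-terminal, children)

class Parser:
    def __init__(self, lexemes: List[str]) -> None:
        self.lexemes, self.pos = lexemes, 0

    def peek(self) -> str:
        return self.lexemes[self.pos] if self.pos < len(self.lexemes) else ""

    def take(self, expected: str) -> str:
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1
        return expected

    def assign(self) -> Tree:   # <assign> -> <id> = <expr>
        return ("<assign>", [self.id_(), self.take("="), self.expr()])

    def id_(self) -> Tree:      # <id> -> A | B | C
        if self.peek() not in ("A", "B", "C"):
            raise SyntaxError(f"expected an identifier, found {self.peek()!r}")
        return ("<id>", [self.take(self.peek())])

    def expr(self) -> Tree:     # <expr> -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
        if self.peek() == "(":
            return ("<expr>", [self.take("("), self.expr(), self.take(")")])
        first = self.id_()
        if self.peek() in ("+", "*"):
            return ("<expr>", [first, self.take(self.peek()), self.expr()])
        return ("<expr>", [first])

def parse(sentence: str) -> Tree:
    p = Parser(sentence.split())
    tree = p.assign()
    if p.pos != len(p.lexemes):
        raise SyntaxError(f"unexpected {p.peek()!r}")
    return tree

def leaves(tree: Tree) -> List[str]:
    """Read the leaves from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1] for leaf in leaves(child)]

print(" ".join(leaves(parse("B = A * ( C + C )"))))
```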
14. Ambiguous grammar
- A grammar is ambiguous if the same sentence has different parse trees
- Different parse trees mean different translations of the same sentence
- Different machine code for the same source code!
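The effect of ambiguity on translation can be shown by evaluating every parse tree of one sentence. This is a minimal sketch with a hypothetical ambiguous grammar, `<expr> -> <expr> - <expr> | number`, and a function name `all_values` chosen for illustration. Each `-` may sit at the root of a tree, so `2 - 3 - 4` has two trees with two different values.

```python
from typing import List

# Hypothetical ambiguous grammar:  <expr> -> <expr> - <expr> | number
def all_values(lexemes: List[str]) -> List[int]:
    """Return the value of every parse tree of lexemes under the ambiguous grammar."""
    if len(lexemes) == 1:
        return [int(lexemes[0])]
    values: List[int] = []
    # Any '-' can be the operator at the root of a parse tree.
    for i, lexeme in enumerate(lexemes):
        if lexeme == "-":
            for left in all_values(lexemes[:i]):
                for right in all_values(lexemes[i + 1:]):
                    values.append(left - right)
    return values

print(all_values("2 - 3 - 4".split()))  # [3, -5]
```

The value 3 comes from the tree for 2 - (3 - 4) and -5 from (2 - 3) - 4. Code generated from each tree would compute a different result.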
15. Grammars for human conventions (p. 119)
- Putting features of languages into grammars:
  - expressions of any length: recursive rules
  - precedence: an extra non-terminal for each precedence level
  - associativity: the order of recursion in the rules (left or right recursion)
  - nested if statements: the dangling else problem
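Both conventions can be observed by parenthesizing input according to an unambiguous grammar. This is a minimal sketch; the grammar and the function name `group` are hypothetical. The extra `<term>` level makes `*` bind more tightly than `+` and `-`, and the left-recursive rules, run here as loops, group operators of equal precedence from the left. The dangling else problem is not covered.

```python
from typing import List

# Hypothetical unambiguous grammar (illustrative):
#   <expr>   -> <expr> + <term> | <expr> - <term> | <term>   left recursion: left associative
#   <term>   -> <term> * <factor> | <factor>                 extra level: * binds tighter
#   <factor> -> number
def group(source: str) -> str:
    """Return source fully parenthesized according to the grammar above."""
    lexemes: List[str] = source.split()
    pos = 0

    def factor() -> str:
        nonlocal pos
        if pos >= len(lexemes) or not lexemes[pos].isdigit():
            raise ValueError(f"expected a number at position {pos}")
        pos += 1
        return lexemes[pos - 1]

    def term() -> str:
        nonlocal pos
        result = factor()
        # The left-recursive rule runs as a loop: each new * wraps what came before.
        while pos < len(lexemes) and lexemes[pos] == "*":
            pos += 1
            result = f"({result} * {factor()})"
        return result

    def expr() -> str:
        nonlocal pos
        result = term()
        while pos < len(lexemes) and lexemes[pos] in ("+", "-"):
            op = lexemes[pos]
            pos += 1
            result = f"({result} {op} {term()})"
        return result

    result = expr()
    if pos != len(lexemes):
        raise ValueError(f"unexpected {lexemes[pos]!r}")
    return result

print(group("2 - 3 - 4"), group("2 + 3 * 4"))
```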
16. Forms for grammars
- Backus-Naur Form (BNF)
- Extended Backus-Naur Form (EBNF)
  - shortens the set of rules
- Syntax graphs
  - easier to read when learning a language
17. EBNF
- [ ] optional: zero or one occurrence
  - <expr> -> <expr> [+ <term>]
- { } repetition: zero or more occurrences
  - <expr> -> <term> {+ <term>}
- ( | ) choice of alternative symbols
  - <term> -> <term> (* | /) <term>
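Each EBNF construct maps to a familiar control structure in a hand-written parser: `[ ]` becomes an `if`, `{ }` becomes a `while` loop, and `( | )` becomes a test against the listed alternatives. This is a minimal sketch that evaluates arithmetic. The grammar, including the optional unary minus `[-]` in `<factor>`, and the function name `evaluate` are assumptions chosen to exercise all three constructs.

```python
from typing import List

# Hypothetical EBNF grammar:
#   <expr>   -> <term> {(+ | -) <term>}
#   <term>   -> <factor> {(* | /) <factor>}
#   <factor> -> [-] number
def evaluate(source: str) -> float:
    lexemes: List[str] = source.split()
    pos = 0

    def peek() -> str:
        return lexemes[pos] if pos < len(lexemes) else ""

    def factor() -> float:
        nonlocal pos
        sign = 1
        if peek() == "-":              # [ - ]: the optional part becomes an if
            pos += 1
            sign = -1
        if not peek().isdigit():
            raise ValueError(f"expected a number, found {peek()!r}")
        value = int(lexemes[pos])
        pos += 1
        return sign * value

    def term() -> float:
        nonlocal pos
        value = factor()
        while peek() in ("*", "/"):    # { (* | /) <factor> }: loop plus choice test
            op = peek()
            pos += 1
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr() -> float:
        nonlocal pos
        value = term()
        while peek() in ("+", "-"):    # { (+ | -) <term> }
            op = peek()
            pos += 1
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(lexemes):
        raise ValueError(f"unexpected {peek()!r}")
    return result

print(evaluate("- 2 * 3 + 10 / 4"))  # -3.5
```

Because `{ }` is iteration rather than recursion, the loop accumulates from the left, so operators of equal precedence stay left associative.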
18. Syntax graph: basic structures
[Figure: syntax graphs for expr and term, built from term and factor boxes joined through the operators - and /]
19. BNF (p. 121)
<expr> -> <expr> + <term> | <expr> - <term> | <term>
<term> -> <term> * <factor> | <term> / <factor> | <factor>
EBNF
<expr> -> <expr> (+ | -) <term> | <term>
<term> -> <term> (* | /) <factor> | <factor>
or
<expr> -> <term> {(+ | -) <term>}
<term> -> <factor> {(* | /) <factor>}
Syntax Graph
[Figure: syntax graphs for expr (term, repeated through + or -) and term (factor, repeated through * or /)]
20. Attribute grammars
- Problem: context-free grammars cannot describe some features needed in programming languages
  - e.g. rules for using data types
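An attribute grammar attaches values such as types to grammar symbols and adds predicates over them. This is a minimal sketch assuming hypothetical rules: `<assign> -> <var> = <expr>`, `<expr> -> <var> + <var> | <var>`, a hypothetical symbol table `DECLARED`, and a strict rule that the expression's type must equal the target's type. `A = A + C` and `A = A + B` are both valid under the context-free rules. Only the type attributes tell them apart.

```python
from typing import Dict, List

# Hypothetical symbol table: declared types are something a CFG alone cannot express.
DECLARED: Dict[str, str] = {"A": "int", "B": "real", "C": "int"}

def expr_type(operands: List[str]) -> str:
    """Synthesized attribute: actual type of <expr> -> <var> + <var> | <var>."""
    types = [DECLARED[name] for name in operands]
    return "int" if all(t == "int" for t in types) else "real"

def check_assign(target: str, operands: List[str]) -> bool:
    """Predicate (strict rule assumed): the expression's type must equal the target's type."""
    expected_type = DECLARED[target]   # passed down from the <var> on the left side
    return expr_type(operands) == expected_type

print(check_assign("A", ["A", "C"]), check_assign("A", ["A", "B"]))  # True False
```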