Title: CS 2130
1CS 2130
- Presentation 15
- Compiler Introduction
2Big Picture
- Parsing: Matching the code we are translating to the rules of a grammar. Building a representation of the code.
- Scanning: An abstraction that simplifies the parsing process by converting the raw text input into a stream of known objects called tokens.
- Abstract Syntax Tree (e.g.): One of several possible ways of storing the program in a form that captures the essence of the desired operations.
- Code Generation: In general, when the parser finds a match, some sequence of instructions in the target language may be generated.
- Optimization: Making the raw generated code better.
3Overall Operation
- Parser is in control of the overall operation
- Demands scanner to produce a token
- Scanner reads input file into token buffer and forms a token (How?)
- Token is returned to parser
- Parser attempts to match the token (How?)
- Failure: Syntax Error!
- Success:
- Does nothing and returns to get next token
- OR
- Takes Semantic Action
4Overall Operation
- Semantic Action: Look up variable name
- If found: okay
- If not: Put in symbol table
- If semantic checks succeed, do code generation (How?)
- Return to get next token
- No more tokens? Done!
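5Tokenization
Input File
Token Buffer (in the examples, the most recently read character is shown on the left)
6Example
Input: main()
Token Buffer: m
7Example
Input: main()
Token Buffer: am
8Example
Input: main()
Token Buffer: iam
9Example
Input: main()
Token Buffer: niam
10Example
Input: main()
Token Buffer: (niam
11Example
Input: main()
Token Buffer: niam
Keyword: main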
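The buffering steps in the example can be sketched as a small scanner. This is only an illustration, not the course's implementation: the function name `scan`, the tables `KEYWORDS` and `PUNCT`, and the token names SEMI, ASSIGN, PLUS and MINUS are assumptions made here; the other token names follow the slides.

```python
# Token names SEMI, ASSIGN, PLUS and MINUS are hypothetical; the rest
# (MAIN, INT, FLOAT, VAR, OPENPAR, ...) are the names used on the slides.
KEYWORDS = {"main": "MAIN", "int": "INT", "float": "FLOAT"}
PUNCT = {"(": "OPENPAR", ")": "CLOSEPAR", "{": "CURLYOPEN", "}": "CURLYCLOSE",
         ",": "COMMA", ";": "SEMI", "=": "ASSIGN", "+": "PLUS", "-": "MINUS"}

def scan(text):
    """Yield (kind, lexeme) pairs, one token each time the caller asks."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha():
            buffer = ""                      # the token buffer
            while i < len(text) and text[i].isalnum():
                buffer += text[i]            # m, ma, mai, main
                i += 1
            # The character that ended the word (e.g. "(") stays in the input.
            yield (KEYWORDS.get(buffer, "VAR"), buffer)
        elif ch in PUNCT:
            i += 1
            yield (PUNCT[ch], ch)
        else:
            raise SyntaxError(f"unexpected character {ch!r}")

tokens = list(scan("main() { int a,b; a = b; }"))
```

Writing the scanner as a generator matches the "parser demands a token" idea: nothing is scanned until the consumer asks for the next item.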
14Rules
- <C-PROG> → MAIN OPENPAR <PARAMS> CLOSEPAR <MAIN-BODY>
- <PARAMS> → NULL
- <PARAMS> → VAR <VAR-LIST>
- <VAR-LIST> → , VAR <VAR-LIST>
- <VAR-LIST> → NULL
- <MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
- <DECL-STMT> → <TYPE> VAR <VAR-LIST> ;
- <ASSIGN-STMT> → VAR = <EXPR> ;
- <EXPR> → VAR
- <EXPR> → VAR <OP> <EXPR>
- <OP> → +
- <OP> → -
- <TYPE> → INT
- <TYPE> → FLOAT
15Demo
Input: main() { int a,b; a = b; }
Scanner
Token Buffer
Parser
16Demo
Input: main() { int a,b; a = b; }
Scanner
Token Buffer
"Please, get me the next token"
Parser
17Demo
Input: main() { int a,b; a = b; }
Scanner
Token Buffer: m
Parser
18Demo
Input: main() { int a,b; a = b; }
Scanner
Token Buffer: am
Parser
19Demo
Input: main() { int a,b; a = b; }
Scanner
Token Buffer: iam
Parser
20Demo
Input: main() { int a,b; a = b; }
Scanner
Token Buffer: niam
Parser
21Demo
Input: main() { int a,b; a = b; }
Scanner
Token Buffer: (niam
Parser
22Demo
Input: main() { int a,b; a = b; }
Scanner
Token Buffer: niam
Parser
23Demo
Input: main() { int a,b; a = b; }
Scanner
Token Buffer
Token: main
Parser
24Demo
Input: main() { int a,b; a = b; }
Scanner
Token Buffer
Parser
"I recognize this"
25Parsing (Matching)
- Start matching using a rule
- When a match takes place at a certain position, move further (get the next token and repeat the process)
- If expansion needs to be done, choose the appropriate rule (How to decide which rule to choose?)
- If no rule is found, declare an error
- If several rules are found, the grammar (set of rules) is ambiguous
- Grammar ambiguous? Language ambiguous?
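One common answer to "which rule to choose?" is to look at the next token. The sketch below assumes a hand-written table for three nonterminals of the grammar in this presentation. The names `PREDICT` and `choose_rule`, and the token names COMMA and SEMI, are illustrative assumptions, not something the slides define.

```python
# (nonterminal, lookahead token) -> right-hand side to expand with.
# Built by hand for illustration; an empty list stands for NULL.
PREDICT = {
    ("<PARAMS>", "VAR"): ["VAR", "<VAR-LIST>"],
    ("<PARAMS>", "CLOSEPAR"): [],
    ("<VAR-LIST>", "COMMA"): ["COMMA", "VAR", "<VAR-LIST>"],
    ("<VAR-LIST>", "CLOSEPAR"): [],
    ("<VAR-LIST>", "SEMI"): [],
    ("<TYPE>", "INT"): ["INT"],
    ("<TYPE>", "FLOAT"): ["FLOAT"],
}

def choose_rule(nonterminal, lookahead):
    """Return the right-hand side to use, or report that no rule applies."""
    try:
        return PREDICT[(nonterminal, lookahead)]
    except KeyError:
        raise SyntaxError(
            f"no rule for {nonterminal} on token {lookahead}") from None
```

Each (nonterminal, token) pair selects at most one rule here. A missing pair is the "no rule found" case, and a pair that would need two entries is the "several rules found" case.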
26Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
"Please, get me the next token"
Parser
27Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: MAIN
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
28Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
"Please, get me the next token"
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
29Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: OPENPAR
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
30Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: CLOSEPAR
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<PARAMETERS> → NULL
32Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: CLOSEPAR
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
33Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: CURLYOPEN
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
34Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: INT
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<DECL-STMT> → <TYPE> VAR <VAR-LIST> ;
<TYPE> → INT
37Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: VAR
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<DECL-STMT> → <TYPE> VAR <VAR-LIST> ;
<VAR-LIST> → , VAR <VAR-LIST>
<VAR-LIST> → NULL
38Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: ',' COMMA
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<DECL-STMT> → <TYPE> VAR <VAR-LIST> ;
<VAR-LIST> → , VAR <VAR-LIST>
<VAR-LIST> → NULL
39Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: VAR
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<DECL-STMT> → <TYPE> VAR <VAR-LIST> ;
<VAR-LIST> → , VAR <VAR-LIST>
<VAR-LIST> → NULL
40Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: ';'
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<DECL-STMT> → <TYPE> VAR <VAR-LIST> ;
<VAR-LIST> → , VAR <VAR-LIST>
<VAR-LIST> → NULL
45Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: ';'
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<DECL-STMT> → <TYPE> VAR <VAR-LIST> ;
47Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: VAR
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<ASSIGN-STMT> → VAR = <EXPR> ;
<EXPR> → VAR
48Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: '='
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<ASSIGN-STMT> → VAR = <EXPR> ;
<EXPR> → VAR
49Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: VAR
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<ASSIGN-STMT> → VAR = <EXPR> ;
<EXPR> → VAR
52Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: ';'
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
<ASSIGN-STMT> → VAR = <EXPR> ;
55Scanning and Parsing Combined
Input: main() { int a,b; a = b; }
Scanner
Token: CURLYCLOSE
Parser
<C-PROG> → MAIN OPENPAR <PARAMETERS> CLOSEPAR <MAIN-BODY>
<MAIN-BODY> → CURLYOPEN <DECL-STMT> <ASSIGN-STMT> CURLYCLOSE
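The walk-through above can be reproduced with a recursive-descent recognizer, with one method per nonterminal. This is a sketch under assumptions: the class `Parser`, the helper `parses`, and the token names SEMI, ASSIGN, PLUS, MINUS and EOF are invented here, and the grammar is assumed to end declarations and assignments with a semicolon, as the token trace suggests.

```python
class Parser:
    """Recursive-descent recognizer for the demo grammar (a sketch)."""

    def __init__(self, tokens):
        self.tokens = iter(tokens)       # stands in for the scanner
        self.current = None
        self.advance()                   # "Please, get me the next token"

    def advance(self):
        self.current = next(self.tokens, "EOF")

    def match(self, expected):
        if self.current != expected:
            raise SyntaxError(f"expected {expected}, got {self.current}")
        self.advance()

    def c_prog(self):                    # <C-PROG>
        self.match("MAIN")
        self.match("OPENPAR")
        self.params()
        self.match("CLOSEPAR")
        self.main_body()
        self.match("EOF")

    def params(self):                    # <PARAMS> -> VAR <VAR-LIST> | NULL
        if self.current == "VAR":
            self.match("VAR")
            self.var_list()

    def var_list(self):                  # <VAR-LIST> -> , VAR <VAR-LIST> | NULL
        while self.current == "COMMA":
            self.match("COMMA")
            self.match("VAR")

    def main_body(self):                 # <MAIN-BODY>
        self.match("CURLYOPEN")
        self.decl_stmt()
        self.assign_stmt()
        self.match("CURLYCLOSE")

    def decl_stmt(self):                 # <DECL-STMT>
        if self.current not in ("INT", "FLOAT"):
            raise SyntaxError(f"expected a type, got {self.current}")
        self.advance()
        self.match("VAR")
        self.var_list()
        self.match("SEMI")

    def assign_stmt(self):               # <ASSIGN-STMT>
        self.match("VAR")
        self.match("ASSIGN")
        self.expr()
        self.match("SEMI")

    def expr(self):                      # <EXPR> -> VAR | VAR <OP> <EXPR>
        self.match("VAR")
        if self.current in ("PLUS", "MINUS"):
            self.advance()
            self.expr()

def parses(tokens):
    """True if the token kinds form a valid program, False on a syntax error."""
    try:
        Parser(tokens).c_prog()
        return True
    except SyntaxError:
        return False

demo = ["MAIN", "OPENPAR", "CLOSEPAR", "CURLYOPEN", "INT", "VAR", "COMMA",
        "VAR", "SEMI", "VAR", "ASSIGN", "VAR", "SEMI", "CURLYCLOSE"]
```

Here `parses(demo)` accepts the token sequence from the slides, while dropping the final CURLYCLOSE makes the parser stop with a syntax error.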
56What happens?
- During/after parsing?
- Tokens get gobbled
- Screen blinks?
- Smoke comes out?
- Further checking and setup for checks
- Semantic actions
- Semantic checks
- Symbol tables
- What are semantic actions?
- Variables have attributes
- Declarations attach attributes to variables
57Symbol Table
- int a,b;
- Declares a and b
- within current scope
- Of type integer
- Use of a and b now legal
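A symbol table with scopes can be sketched as a stack of dictionaries. The class and method names below (`SymbolTable`, `declare`, `lookup`, and so on) are assumptions for illustration, and raising an error for an undeclared name is one possible policy, not necessarily the one used in the course.

```python
class SymbolTable:
    """Scoped symbol table: one dictionary per open scope (a sketch)."""

    def __init__(self):
        self.scopes = [{}]               # index 0 is the outermost scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        scope = self.scopes[-1]          # declarations go in the current scope
        if name in scope:
            raise NameError(f"{name} already declared in this scope")
        scope[name] = {"name": name, "type": type_name,
                       "scope": len(self.scopes) - 1}

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name} used before declaration")

table = SymbolTable()
for var in ("a", "b"):                   # effect of: int a,b;
    table.declare(var, "int")
```

After these calls, `table.lookup("a")` returns the record for `a`, so later uses of `a` and `b` are legal, while looking up any other name raises an error.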
58Semantic Actions
- What are typical Semantic Actions?
- How do they get invoked?
- What happens after a Semantic Action?
59Typical Semantic Actions
- Enter variable declaration into symbol table
- Look up variables in symbol table
- Do binding of looked-up variables (scoping rules, etc.)
- Do type checking for compatibility
- Keep the semantic context of processing
- a + b + c → t1 = a + b
- t2 = t1 + c
Semantic Context
60How are Semantic Actions Called?
- Action symbols embedded in the grammar
- Each action symbol represents a semantic procedure
- Semantic procedures are called by parser at appropriate places during parsing
- These procedures do things and/or return values
- Semantic stack implements and stores semantic records
- Semantic actions could do checking and/or storage or retrieval of information
61Semantic Actions
- <decl-stmt> → <type> #put-type <var-list> #do-decl
- <var-list> → var , <var-list> #add-decl
- <var-list> → var
- var → ID #proc-decl
- put-type puts given type on semantic stack
- proc-decl builds decl record for var
- add-decl builds decl-chain
- do-decl traverses chain on semantic stack using backwards pointers, entering each var into symbol table
[Figure: semantic stack holding the type record and the chained decl records for id1, id2 and id3 (fields: Name, Type, Scope); do-decl → symbol table entries for id1, id2 and id3]
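The four declaration actions can be sketched with a Python list as the semantic stack. The record layout (dictionaries with a `prev` link) and the way `add_decl` merges records are assumptions made for this example; the slides only say that a chain with backwards pointers is built and then traversed.

```python
semantic_stack = []
symbol_table = {}

def put_type(type_name):
    """#put-type: remember the declared type on the semantic stack."""
    semantic_stack.append({"kind": "type", "type": type_name})

def proc_decl(name):
    """#proc-decl: build a decl record for one variable."""
    semantic_stack.append({"kind": "decl", "name": name, "prev": None})

def add_decl():
    """#add-decl: hang the earlier variable onto the end of the chain."""
    chain = semantic_stack.pop()         # chain for the later variables
    earlier = semantic_stack.pop()       # record for the earlier variable
    tail = chain
    while tail["prev"] is not None:
        tail = tail["prev"]
    tail["prev"] = earlier               # backwards pointer
    semantic_stack.append(chain)

def do_decl():
    """#do-decl: walk the chain backwards, entering each variable."""
    record = semantic_stack.pop()        # head of the decl chain
    type_record = semantic_stack.pop()   # left there by put_type
    while record is not None:
        symbol_table[record["name"]] = type_record["type"]
        record = record["prev"]

# Order of calls the parser would make for: int id1, id2, id3
put_type("int")
proc_decl("id1")
proc_decl("id2")
proc_decl("id3")
add_decl()                               # links id3 back to id2
add_decl()                               # links id2 back to id1
do_decl()
```

Once `do_decl` returns, the semantic stack is empty again and all three names are in the symbol table with type `int`.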
62Semantic Actions
- What else can semantic actions do in addition to storing and looking up names in a symbol table?
- Do type checking for type compatibility in assignment
- Two types of Semantic Actions
- Checking (Binding, Type Compatibility, Scoping, etc.)
- Translation (Generate temporary values, propagate them to keep semantic context).
63Semantic Actions (Translation)
- Consider a = b + c + d
- Grammar
- <ASSGNSTMT> → VAR = <EXPR> #do-assign
- <EXPR> → VAR <EXPRTAIL>
- VAR → ID #process-id
- <EXPRTAIL> → <OP> #process-op VAR #do-infix <EXPRTAIL>
- <EXPRTAIL> → NULL
64Call Chain to Semantic Actions
a = b + c + d
- process-id: Puts semantic record for "a" on stack
- Q. Checking or Translation?
Stack (top first): "a"
65Call Chain to Semantic Actions
a = b + c + d
- process-id: Puts semantic record for "a" on stack
- Q. Checking or Translation?
- A. Checking
Stack (top first): "a"
66Call Chain to Semantic Actions
a = b + c + d
- process-op: Puts semantic record for "=" on stack
- Q. Checking or Translation?
Stack (top first): "=", "a"
67Call Chain to Semantic Actions
a = b + c + d
- process-op: Puts semantic record for "=" on stack
- Q. Checking or Translation?
- A. Checking
Stack (top first): "=", "a"
68Call Chain to Semantic Actions
a = b + c + d
- process-id: Puts semantic record for "b" on stack
Stack (top first): "b", "=", "a"
69Call Chain to Semantic Actions
a = b + c + d
- process-op: Puts semantic record for "+" on stack
Stack (top first): "+", "b", "=", "a"
70Call Chain to Semantic Actions
a = b + c + d
- process-id: Puts semantic record for "c" on stack
Stack (top first): "c", "+", "b", "=", "a"
71Call Chain to Semantic Actions
a = b + c + d
- do-infix
- Get temporary (say t1)
- Evaluate
- IR: t1 = b + c
Stack (top first): "c", "+", "b", "=", "a"
72Call Chain to Semantic Actions
a = b + c + d
- do-infix
- Get temporary (say t1)
- Evaluate
- IR: t1 = b + c
Stack (top first): "t1", "=", "a"
73Call Chain to Semantic Actions
a = b + c + d
- process-op: Puts semantic record for "+" on stack
Stack (top first): "+", "t1", "=", "a"
74Call Chain to Semantic Actions
a = b + c + d
- process-id: Puts semantic record for "d" on stack
Stack (top first): "d", "+", "t1", "=", "a"
75Call Chain to Semantic Actions
a = b + c + d
- do-infix
- Get temporary (say t2)
- Evaluate
- IR: t2 = t1 + d
Stack (top first): "d", "+", "t1", "=", "a"
76Call Chain to Semantic Actions
a = b + c + d
- do-infix
- Get temporary (say t2)
- Evaluate
- IR: t2 = t1 + d
Stack (top first): "t2", "=", "a"
77Call Chain to Semantic Actions
a = b + c + d
- do-assign
- Put value back into variable
- IR: a ← t2
Stack (top first): "t2", "=", "a"
78Call Chain to Semantic Actions
a = b + c + d
- do-assign
- Put value back into variable
- IR: a ← t2
Stack: empty
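The whole call chain can be replayed in a few lines. This is a minimal sketch that follows the stack pictures: semantic records are plain strings, the IR is a list of text lines, and `new_temp` is a hypothetical helper for "get temporary".

```python
stack = []      # semantic stack
ir = []         # generated intermediate representation
temp_count = 0

def process_id(name):
    stack.append(name)

def process_op(op):
    stack.append(op)

def new_temp():
    """Hand out t1, t2, ... (hypothetical helper)."""
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def do_infix():
    right = stack.pop()
    op = stack.pop()
    left = stack.pop()
    temp = new_temp()
    ir.append(f"{temp} = {left} {op} {right}")
    stack.append(temp)       # the temporary replaces the three records

def do_assign():
    value = stack.pop()
    stack.pop()              # the "=" record
    target = stack.pop()
    ir.append(f"{target} = {value}")

# Call chain for: a = b + c + d
process_id("a"); process_op("=")
process_id("b"); process_op("+"); process_id("c"); do_infix()
process_op("+"); process_id("d"); do_infix()
do_assign()
```

The list `ir` then holds `t1 = b + c`, `t2 = t1 + d` and `a = t2`, and the stack is empty, matching the last slide of the sequence.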
79Code Generation
[Diagram: Start, Parser, Scanner (Request Token / Get Token), Semantic Action, Checking, Semantic Error, Intermediate Representation]
80Life After IR
IR → Optimizer → Optimized IR → Code Generator (Second Pass) → Code
81Optionally
Code → Post Pass Optimizer (Third Pass) → Better Code
82One Pass vs. Two Pass
[Diagram, one pass: Start, Scanner, Parser, Semantic Action, Semantic Error, Code Generation, CODE]
- Comparisons
- No optimization performed in one pass
- Need to have full IR available for optimization
- Most compilers are two pass
83Code Generation (Basic)
t1 = a + b
- Get a register to evaluate t1
- say R5
- Load "a" in it (where is "a"?)
- LOAD R5, 4(R6) (R6 = frame ptr)
- Add "b" to register holding "a"
- ADD R5, 5(R6)
- Store R5 into t1's location
- STOR R5, 6(R6)
Frame layout: a at 4(R6), b at 5(R6), t1 at 6(R6); R6 is the frame pointer (FP)
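The three steps above (load, add, store) can be written as a tiny generator for one IR statement. The function `gen_add` and the `OFFSETS` table are assumptions for illustration; the offsets, register names and mnemonics are the ones shown on the slide.

```python
# Assumed frame layout, taken from the offsets used on the slide.
OFFSETS = {"a": 4, "b": 5, "t1": 6}
FRAME_PTR = "R6"

def gen_add(dest, left, right, reg="R5"):
    """Return instructions for: dest = left + right (a sketch)."""
    def addr(name):
        # Address calculation: offset from the frame pointer.
        return f"{OFFSETS[name]}({FRAME_PTR})"
    return [
        f"LOAD {reg}, {addr(left)}",     # load the first operand
        f"ADD {reg}, {addr(right)}",     # add the second operand
        f"STOR {reg}, {addr(dest)}",     # store the result
    ]

code = gen_add("t1", "a", "b")
```

The address calculation is the "hidden action" here: the caller names variables, and the generator turns them into base-plus-offset operands.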
84Typical Code Generator Actions
- Get register for evaluation
- Load operands into registers (maybe!)
- Perform evaluation
- Store results back (maybe!)
- Free registers
- Return to Loop
- Hidden Actions
- Address calculation for operands
- Use of a given instruction format
85Code Generator
- Instruction selection (Load, Store, Add, Mult, etc.)
- Address calculation
- Instruction format selection
- Immediate mode
- Direct mode
- Register Indirect
- Base + Offset
- Generating one or more instructions
- How?
- Use of pattern matching
- Post order traversal of expression tree
Constants, Globals/Static, Pointers, Arrays
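Post-order traversal means generating code for both operands before the operator. The sketch below assumes a tuple-based expression tree, a register-to-register instruction form, symbolic operand names in LOAD, and a SUB mnemonic; none of these are defined on the slides, and registers are never freed in this simplified version.

```python
from itertools import count

def gen(node, code, regs):
    """Post-order walk: children first, then the operator. Returns a register."""
    if isinstance(node, str):                # leaf: a variable name
        reg = f"R{next(regs)}"
        code.append(f"LOAD {reg}, {node}")
        return reg
    op, left, right = node
    left_reg = gen(left, code, regs)         # code for the left subtree
    right_reg = gen(right, code, regs)       # code for the right subtree
    mnemonic = {"+": "ADD", "-": "SUB", "*": "MULT"}[op]
    code.append(f"{mnemonic} {left_reg}, {right_reg}")   # result in left_reg
    return left_reg

tree = ("+", ("+", "b", "c"), "d")           # b + c + d
code = []
result_reg = gen(tree, code, count(1))
```

For this tree the generated sequence is LOAD R1, b; LOAD R2, c; ADD R1, R2; LOAD R3, d; ADD R1, R3, with the final value in R1.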
86Summary
- Parser is the brain of the compiler
- Controls everything
- Most of the time is spent in keeping and checking information (front-end job)
- Translation takes place in back end
- IR useful for code optimization and advanced code generation
- Scanner, parser and code generator are automated (How and Why?)
87Questions?