Title: Lecture 16: Symbol Table
The Symbol Table
- used during all phases of compilation
- stores information about many source language constructs
- incrementally constructed during the analysis phases
- used directly in the code generation phases
- efficient storage and access important in practice
- may be constructed during lexical and syntax analysis, depending on the compiler
Constructing the symbol table
- Three main operations required to build the symbol table
  - determining whether a string has already been stored
  - inserting an entry for a string
  - hiding an entry when it goes out of scope
- Three corresponding functions
  - lookup(s) returns the index of the entry for string s, or 0 if there is no entry
  - insert(s,t) adds a new entry for string s (of token t), and returns its index
  - delete(s) deletes s from the table (or, typically, hides it)
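The three functions above can be sketched over a plain array. This is a minimal illustration, not the lecture's implementation: the linear search, the fixed table size, the hidden flag, the value of ID_T and the name delete_entry (standing in for delete(s)) are all assumptions made for the example. Slot 0 is left unused so that 0 can mean "no entry", as on the slide.

```c
#include <string.h>

#define MAX_ENTRIES 256     /* assumed fixed capacity */
#define ID_T 1              /* hypothetical token code for identifiers */

struct entry {
    const char *str;        /* the stored string (not copied in this sketch) */
    int token;              /* token class, e.g. ID_T */
    int hidden;             /* non-zero once the entry has been "deleted" */
};

/* Slot 0 is never used, so index 0 can signal "no entry". */
static struct entry table[MAX_ENTRIES];
static int last = 0;        /* index of the most recently added entry */

/* Return the index of the most recent visible entry for s, or 0. */
int lookup(const char *s)
{
    for (int i = last; i >= 1; i--)
        if (!table[i].hidden && strcmp(table[i].str, s) == 0)
            return i;
    return 0;
}

/* Add a new entry for string s with token t and return its index
   (0 if the table is full). */
int insert(const char *s, int t)
{
    if (last + 1 >= MAX_ENTRIES)
        return 0;
    last++;
    table[last].str = s;
    table[last].token = t;
    table[last].hidden = 0;
    return last;
}

/* delete(s) from the slide: hide the most recent visible entry for s
   instead of physically removing it. */
void delete_entry(const char *s)
{
    int i = lookup(s);
    if (i != 0)
        table[i].hidden = 1;
}
```

Hiding rather than removing keeps the indices of all other entries stable, so anything that already refers to an entry by index remains valid.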
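A simple implementation
[Figure: symbol table layout. Each entry has the fields index, next, token, atts and strPtr (e.g. entries 1, 2, 7 and 78, all with token ID_T). next links to the next node, atts points to an attribute structure, and strPtr is the position of the name in a string array that holds the characters of all names ("c o u n t", "i", ..., "n a m e", ...). Other labels in the figure: Table, first, length, last.]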
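One way to read this layout is sketched below: entry nodes carry next, token, atts and strPtr, and all names live in a single character array. The use of hashing with chained buckets is an assumption (the slide only shows nodes linked through next), as are the sizes, the hash function and the value of ID_T.

```c
#include <string.h>

#define NBUCKETS 101                /* assumed number of chains */
#define MAX_NODES 512
#define POOL_SIZE 4096
#define ID_T 1                      /* hypothetical token code */

struct node {
    int next;                       /* index of the next node in the chain, 0 = end */
    int token;                      /* token class, e.g. ID_T */
    void *atts;                     /* attribute structure (unused in this sketch) */
    int strPtr;                     /* position of the name in string_array */
};

static int buckets[NBUCKETS];       /* head node of each chain, 0 = empty */
static struct node nodes[MAX_NODES];/* node 0 is unused so 0 can mean "none" */
static int last_node = 0;
static char string_array[POOL_SIZE];/* all names, each terminated by '\0' */
static int pool_used = 0;

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Return the node index for s, or 0 if s is not stored. */
int lookup(const char *s)
{
    for (int n = buckets[hash(s)]; n != 0; n = nodes[n].next)
        if (strcmp(&string_array[nodes[n].strPtr], s) == 0)
            return n;
    return 0;
}

/* Copy s into the string array, link a new node at the head of its
   chain, and return the node index (0 if out of space). */
int insert(const char *s, int t)
{
    size_t len = strlen(s) + 1;
    if (last_node + 1 >= MAX_NODES || (size_t)pool_used + len > POOL_SIZE)
        return 0;
    memcpy(&string_array[pool_used], s, len);
    unsigned h = hash(s);
    int n = ++last_node;
    nodes[n].next = buckets[h];
    nodes[n].token = t;
    nodes[n].atts = NULL;
    nodes[n].strPtr = pool_used;
    buckets[h] = n;
    pool_used += (int)len;
    return n;
}
```

Keeping the names in one array means every node has the same fixed size, whatever the length of the identifier it describes.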
Declarations
- There are four kinds of entity that may require an entry
  - constant
  - variable
  - type (user-defined)
  - function
- The attributes will depend on the object being declared
  - All four will typically have a type signature, representing the data type or (for functions) the return type.
  - Constants may have value bindings.
  - Variables may have pointers to memory locations.
  - Functions may have a pointer to code segments.
  - All four may have scope information.
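The attribute structure could be sketched as a tagged union, with the fields shared by all four kinds outside the union and the kind-specific ones inside it. All names here (kind, type, nesting_level, the type codes, the make_ helpers) are hypothetical, and a real compiler would carry considerably more detail.

```c
enum kind { K_CONSTANT, K_VARIABLE, K_TYPE, K_FUNCTION };
enum type { T_INT, T_CHAR, T_USER };    /* hypothetical type codes */

struct attributes {
    enum kind kind;         /* which of the four entities this is */
    enum type type;         /* data type, or return type for a function */
    int nesting_level;      /* scope information */
    union {
        long value;         /* constant: its value binding */
        long address;       /* variable: its memory location */
        int code_label;     /* function: where its code segment starts */
    } u;                    /* user-defined types use no field here */
};

/* Attributes for a constant of type t bound to value. */
struct attributes make_constant(enum type t, long value, int level)
{
    struct attributes a;
    a.kind = K_CONSTANT;
    a.type = t;
    a.nesting_level = level;
    a.u.value = value;
    return a;
}

/* Attributes for a variable of type t stored at address. */
struct attributes make_variable(enum type t, long address, int level)
{
    struct attributes a;
    a.kind = K_VARIABLE;
    a.type = t;
    a.nesting_level = level;
    a.u.address = address;
    return a;
}

/* Attributes for a function returning t whose code starts at label. */
struct attributes make_function(enum type t, int label, int level)
{
    struct attributes a;
    a.kind = K_FUNCTION;
    a.type = t;
    a.nesting_level = level;
    a.u.code_label = label;
    return a;
}
```

The kind field must be checked before reading the union, since only one of its members is meaningful for any given entry.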
Scope
- In most high-level languages, variables and functions have restricted scope, i.e. they can only be accessed in specific areas of the source code.
- The scope of any particular variable may be global, or within a specific code file, or in a file after its declaration, or within specific code blocks.
- In languages with restrictive scoping rules, it is possible to construct the symbol table during lexical analysis:
    {
        entry = lookup(yytext);
        if (entry == -1)            /* i.e. new ID_T */
            insert(yytext, ID_T);
    }
Scoping Rules
- In block structured languages, the same variable name can be used in different places to refer to different objects.
- We cannot simply look to see if the name is already in the table, as the current use may be a new declaration.

    int i;                  /* i is globally accessible */
    int f1(int k) {         /* a new integer k, in f1 only */
        int j;              /* a new integer j, in f1 only */
        ...
        print i;            /* (the global variable) */
    }
    int f2() {
        int j;              /* a different j, in f2 only */
        ...
    }
Scope and the Symbol Table
- In languages with nested scope, the symbol table functions are more complex.
  - lookup must search for a declaration of the identifier valid in the current scope.
  - insert must not overwrite previous declarations, but make them inaccessible.
  - delete should hide the most recent.
- It is still possible to construct the symbol table during the first pass of the compiler if explicit nesting levels are associated with each entry in the table.
- Many compilers make multiple passes over the code, first constructing a syntax tree, and then the table once the nested structure of the code is known.
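The behaviour required of lookup, insert and delete under nested scope can be illustrated with a table that is only ever appended to and is searched from the newest entry backwards. This is a sketch under assumptions: the level parameter, the hidden flag, and the helpers hide and leave_scope are names invented for the example, and -1 is used for "not found".

```c
#include <string.h>

#define MAX_ENTRIES 256     /* assumed fixed capacity */

struct entry {
    const char *str;        /* identifier name (not copied in this sketch) */
    int level;              /* nesting level at which it was declared */
    int hidden;             /* non-zero once the declaration is out of scope */
};

static struct entry table[MAX_ENTRIES];
static int count = 0;

/* Append a declaration. Earlier declarations of the same name are kept,
   but become inaccessible because lookup searches newest-first. */
int insert(const char *s, int level)
{
    if (count >= MAX_ENTRIES)
        return -1;
    table[count].str = s;
    table[count].level = level;
    table[count].hidden = 0;
    return count++;
}

/* Return the most recent visible declaration of s, or -1. */
int lookup(const char *s)
{
    for (int i = count - 1; i >= 0; i--)
        if (!table[i].hidden && strcmp(table[i].str, s) == 0)
            return i;
    return -1;
}

/* Hide only the most recent visible declaration of s. */
void hide(const char *s)
{
    int i = lookup(s);
    if (i >= 0)
        table[i].hidden = 1;
}

/* On leaving a block, hide every declaration made at that level or deeper. */
void leave_scope(int level)
{
    for (int i = count - 1; i >= 0; i--)
        if (table[i].level >= level)
            table[i].hidden = 1;
}
```

For example, after insert("i", 0) and insert("i", 1), lookup("i") returns the inner declaration; once hide("i") or leave_scope(1) has run, the outer one becomes visible again.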
One-pass symbol table construction
One possible method of constructing the symbol table during the first pass is shown below.

    Prog  -> Dec Prog
    Prog  -> Main
    Dec   -> VDec
    Dec   -> FDec
    VDec  -> int id
    FDec  -> SFDec Par ) CStat      decr(stack)
    SFDec -> int id (               incr(stack)
    Par   -> ε
    Par   -> VDec
    Par   -> PList , VDec
    PList -> VDec
    PList -> VDec , PList

Action when an identifier is read:

    {
        entry = lookup(yytext, stack);
        if (entry == -1)
            insert(yytext, ID_T, stack);
    }
- The stack consists of entries of the form (nesting level, scope value)
- The last index is the index of the last entry added to the symbol table
- Initially, the stack is set to <(0,0)> and last to 0.
- insert associates the top of the stack with the entry
- lookup searches for a matching entry, and obtains its nesting level. It moves down the stack until it finds a stack entry with the same nesting level. If the table index is less than the stack scope value, it ignores it, and continues searching the table. If no match is found, it returns -1.
- decr deletes the top element of the stack
- incr adds a new element to the top of the stack, increments the nesting level, and assigns the last index as the scope value.
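The stack scheme described above can be sketched as follows. Several details are assumptions rather than something the slides state: the table is searched from the newest entry backwards, an entry whose nesting level does not appear on the stack at all is treated as out of scope, the stack is a global rather than a parameter, and declare is a hypothetical helper that wraps the lookup-then-insert action.

```c
#include <string.h>

#define MAX_ENTRIES 256     /* assumed capacities */
#define MAX_DEPTH 32
#define ID_T 1              /* hypothetical token code */

struct scope { int nest; int scope; };      /* (nesting level, scope value) */
struct entry { const char *str; int token; int nest; int scope; };

static struct scope stack[MAX_DEPTH] = { {0, 0} };  /* initially <(0,0)> */
static int top = 0;
static struct entry table[MAX_ENTRIES];
static int count = 0;
static int last = 0;        /* index of the last entry added */

/* Push a new scope: one level deeper, starting after the last entry. */
void incr(void)
{
    if (top + 1 < MAX_DEPTH) {
        stack[top + 1].nest = stack[top].nest + 1;
        stack[top + 1].scope = last;
        top++;
    }
}

/* Pop the current scope (the outermost scope is never removed). */
void decr(void)
{
    if (top > 0)
        top--;
}

/* Add an entry tagged with the pair on top of the stack. */
int insert(const char *s, int t)
{
    if (count >= MAX_ENTRIES)
        return -1;
    table[count].str = s;
    table[count].token = t;
    table[count].nest = stack[top].nest;
    table[count].scope = stack[top].scope;
    last = count;
    return count++;
}

/* Return the index of a declaration of s valid in the current scope, or -1. */
int lookup(const char *s)
{
    for (int i = count - 1; i >= 0; i--) {
        if (strcmp(table[i].str, s) != 0)
            continue;
        int sp = top;       /* move down the stack to the entry's nesting level */
        while (sp >= 0 && stack[sp].nest != table[i].nest)
            sp--;
        if (sp >= 0 && i >= stack[sp].scope)
            return i;       /* declared inside a scope that is still open */
    }
    return -1;
}

/* The identifier action: insert s only if no valid entry exists. */
int declare(const char *s)
{
    int entry = lookup(s);
    return entry == -1 ? insert(s, ID_T) : entry;
}
```

Calling declare("i"), declare("f1"), incr(), declare("k"), declare("j"), decr(), declare("f2"), incr(), declare("j") yields six entries in which the second j has nesting level 1 and scope value 4, because the j declared in f1 (index 3) lies below the scope value of f2 and is skipped.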
Constructed symbol table

    int i;
    int f1(int k) {
        int j;
        ...
        print i;
    }
    int f2() {
        int j;
        ...
    }

    Index  Str  Nest  Scope  Atts
    0      i    0     0      ...
    1      f1   0     0
    2      k    1     1
    3      j    1     1
    4      f2   0     0
    5      j    1     4
Syntax trees and scope
[Figure: syntax tree for the example program. Under the root Prog there are VDec nodes (int id) for i, k and the two j's, func nodes for f1 and f2, and a print node that refers to i.]
Many compilers simply build a syntax tree on the
first pass (while carrying out lexical and
syntax analysis). On a second pass, they
construct the symbol table, check data types,
etc. It should be easier to determine the scope
of the identifiers from the syntax tree.
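A second pass of this kind might look like the sketch below, where the nesting level is simply the depth of the recursion over the tree. The node layout (kind, name, child, sibling), the node kinds and the declare helper are assumptions made for the example and do not come from the lecture.

```c
#include <stddef.h>

enum node_kind { N_PROG, N_VDEC, N_FUNC, N_PRINT };

struct node {
    enum node_kind kind;
    const char *name;           /* identifier declared or used, if any */
    struct node *child;         /* first child */
    struct node *sibling;       /* next node at the same level */
};

struct entry { const char *str; int nest; };

static struct entry table[64];  /* assumed small fixed capacity */
static int count = 0;

static void declare(const char *s, int nest)
{
    if (count < 64) {
        table[count].str = s;
        table[count].nest = nest;
        count++;
    }
}

/* Visit n and its following siblings, recording each declaration
   together with the nesting level at which it occurs. */
void build(const struct node *n, int nest)
{
    for (; n != NULL; n = n->sibling) {
        switch (n->kind) {
        case N_PROG:
            build(n->child, nest);
            break;
        case N_VDEC:
            declare(n->name, nest);
            break;
        case N_FUNC:
            declare(n->name, nest);     /* the name belongs to the enclosing scope */
            build(n->child, nest + 1);  /* parameters and body are one level deeper */
            break;
        case N_PRINT:
            break;                      /* a use, not a declaration */
        }
    }
}
```

Because the tree already encodes the block structure, no explicit scope stack is needed: the recursion depth plays that role.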