Title: Compiler Construction
Run-Time Environments (Chapter 7), Continued: Access to Non-local Names
Non-locals
- Assume we have stack allocation of activation records.
- The SCOPE RULES of the source language determine how we handle non-local references.
- Most languages use LEXICAL (also called STATIC) scoping.
- Lexical scoping means it is possible to determine the declaration corresponding to a reference just by examining the program.
- Pascal, C, Ada, etc. use static scoping.
- Languages with DYNAMIC scoping require examination of the stack, at run time, to find the right declaration.
Block structure
- C and many other languages have BLOCKs:
- stmt -> block
- block -> { decls stmts }
- The scope of a declaration in a block follows the MOST-CLOSELY-NESTED rule:
- The scope of a declaration in block B includes B.
- If x is referred to but not declared in B, then x is in the scope of a declaration in an enclosing block B' such that
- B' has a declaration of x, and
- B' is more closely nested around B than any other block with a declaration of x.
C program with blocks
- int a = 0; (scope B0 - B2)
- int b = 0;
- int b = 1;
- int a = 2;
- int b = 3;
- (What is the output?)
Stack allocation of declarations in blocks
- Declarations in each block can be allocated on the stack.
- It is similar to a procedure call (with no parameters).
- Space is allocated on the stack when we enter the block.
- Space is deallocated from the stack when we exit the block.
Lexical scope without nested procedures
- C and related languages do NOT allow nested procedures.
- A program is a series of declarations and functions.
- All non-local references inside functions must refer to declarations at file (global) scope.
Example lexical scope
- Consider the C code
- int a[11];
- void readarray( void ) { ... a ... }
- int partition( int y, int z ) { ... a ... }
- void quicksort( int m, int n ) { ... }
- int main( void ) { ... a ... }
- The references to a are always to the array declared on the first line.
Lexical scope
- Without nested procedures:
- Locals use stack dynamic allocation.
- All non-local data is allocated in the static data area.
- At compile time, if a reference is not found in the current procedure's AR, we look in the static data area and use the resulting static address.
- Otherwise, the reference is local and accessible relative to the top-of-stack pointer.
- Passing procedures as parameters is also simple if there is no nesting (all non-locals have static addresses).
Lexical scope with nested procedures
- (1) program sort( input, output );
- (2) var a : array [0..10] of integer;
- (3) x : integer;
- (4) procedure readarray;
- (5) var i : integer;
- (6) begin ... a ... end { readarray };
- (7) procedure exchange( i, j : integer );
- (8) begin
- (9) x := a[i]; a[i] := a[j]; a[j] := x
- (10) end { exchange };
- (11) procedure quicksort( m, n : integer );
- (12) var k, v : integer;
- (13) function partition( y, z : integer ) : integer;
- (14) var i, j : integer;
- (15) begin ... a ...
- (16) ... v ...
- (17) ... exchange( i, j ); ...
- (18) end { partition };
- (19) begin ... end { quicksort };
Nesting depth
- The reference to a on line 15:
- The reference is inside partition(), which is inside quicksort().
- The most closely nested declaration is on line 2, at program (global) scope.
- The reference to exchange on line 17:
- The reference is in partition(), which is nested in quicksort().
- The most closely nested declaration is on line 7.
- The compiler needs to keep track of the NESTING DEPTH of each declaration:
- sort() is at depth 1
- quicksort() is at depth 2
- partition() is at depth 3
- i of partition() is at depth 4
Access Links
- We need some way to traverse from one AR to another when searching for the declaration corresponding to a reference.
- A new pointer, the ACCESS LINK, is added to the AR.
- If procedure P is nested immediately inside procedure Q in the program, then the access link in P's AR should point to the access link in the AR of the most recent activation of Q.
Access links
- How do we find a non-local reference using access links?
- Suppose procedure P at nesting depth np refers to a non-local a with nesting depth na < np. We find the storage for variable a as follows:
- When control is in P, there must be an AR for P on top of the stack. We follow np - na access links.
- After following np - na access links, we reach the AR of the procedure in which a is declared. The storage for a is at some fixed offset relative to the beginning of that AR.
Setting up access links
- At compile time, non-local references are represented by the pair (np - na, offset).
- We need to set up the access links at procedure call time.
- Suppose procedure P at depth np calls procedure X at depth nx. The resulting code depends on whether the called procedure is nested within the caller or not.
- Case np < nx: X is nested more deeply than P (it must be declared directly inside P), so X's access link just needs to point to P's AR.
- Case np >= nx: X is at the same level or in an outer scope. We have to find the common ancestor of P and X, i.e., the AR of the procedure that immediately encloses X. This will be np - nx + 1 access links from P.
Parameter Passing
- Parameters are the most common way for a calling procedure to communicate with the callee.
- Different languages have different parameter semantics.
- Mostly, the differences lie in whether an l-value, an r-value, or the text of the actual parameter is passed.
- We consider four protocols:
- Call by value
- Call by reference
- Copy-restore
- Call by name
Call by value
- This is the simplest parameter-passing method.
- The caller computes r-values for the actuals.
- The caller places the resulting values on the stack, in the AR of the callee.
- The callee may change the parameters, but this has no effect on the caller.
- This is the default protocol in Pascal, and the ONLY protocol in C.
Parameter passing example
- program reference( input, output );
- var a, b : integer;
- procedure swap( var x, y : integer );
- var temp : integer;
- begin
- temp := x;
- x := y;
- y := temp
- end;
- begin
- a := 1; b := 2;
- swap( a, b );
- writeln( 'a = ', a ); writeln( 'b = ', b )
- end.
The keyword var in the declaration of swap specifies call-by-reference.
Call by reference
- The caller passes the called procedure a POINTER to the storage address of the actual parameter.
- If the actual has an l-value, it is used.
- If the actual is an expression, we place the result of the expression in a temporary and pass a pointer to the temporary.
- Pascal uses call by reference if the var keyword is used.
- C++ uses call by reference if the formal parameter is declared with & (e.g., int &x).
Copy-restore
- This is a hybrid between call-by-value and call-by-reference.
- Before the callee is activated, we evaluate the actuals and put their r-values in the AR for the callee.
- But we also compute and save the l-values of the actuals.
- In the return sequence, we copy the updated r-values from the callee's AR back to the locations given by the saved l-values.
- Some FORTRAN implementations used this approach.
Call by name (macro expansion)
- In this method, we just substitute the body of the procedure for the procedure call.
- In the copied body, the formal parameters are replaced by the text of the actuals.
- #define macros in C/C++ use this technique.
Symbol Tables
Symbol table implementation
- The symbol table stores many kinds of information about names:
- The NAME itself
- STORAGE information
- SCOPE information
- So a symbol table entry is typically a record data type.
- The table itself could be a simple linear array, or a more complex data structure (hash table, etc.).
The NAME entry
- Most languages put some bound on the length of identifier names.
- If the limit is small, we can place the name in the ST entry itself:
- typedef struct {
- char name[MAX_LENGTH + 1];
- ...
- } tSymbolTableEntry;
- Otherwise, we should use the heap to store the names and simply point to them:
- typedef struct {
- char *name;
- ...
- } tSymbolTableEntry;
Storage information
- The code generator needs to know about the storage required for declared names.
- Statically allocated variables just have an offset relative to the beginning of the static data area.
- Each definition needs to reserve space in the static data area and advance a pointer to the next available location.
- For stack dynamic variables, we need to store the offset of the variable relative to the activation record for the procedure.
- Heap dynamic variable storage requirements are not known until run time.
Linear list representations
- We add new ST entries to the end of an array.
- The array has to be reallocated if it gets too big.
- Search for an item begins at the end and goes backwards, to ensure we get the most recent declaration of a name.
- Finding a name that is present takes n/2 checks on average; a name that is absent takes n checks.
- For n insertions and e lookups, we have O(n(n + e)) time.
- Usually e >> n, so we can write O(ne).
- This running time is generally too large for big programs.
Hash table representations of the ST
- A hash table reduces the time needed to insert into and search the ST.
- OPEN HASHING gives us a run time of O(n(n + e)/m) for any m we desire.
- The table is an array of m BUCKETS, each holding a linked list of entries.
- To determine if s is in the table, we apply a HASH FUNCTION h() to s, such that 0 <= h(s) < m.
- Then we search the linked list in bucket h(s).
Hash table representations of the ST
- Complexity: the average list length is n/m, so as long as m is within a constant factor of n, the search takes nearly constant time.
- For h(s), the simplest method is to add up the ASCII values of the characters in s, divide by m, and take the remainder.
- There are MANY other techniques.
- Most modern languages have library support for hash tables (see hcreate()/hsearch()/hdestroy() if you are a C lover).
Scope and the ST
- Each entry in a ST corresponds to a declaration of a name.
- When we look up a name in the ST, we want the entry for the declaration at the correct scope to be returned.
- The simplest approach is to have a separate hash table for every scope.
- Another way is to give each procedure a unique number, and append the number to each name, guaranteeing uniqueness.
Dynamic Storage Allocation
Explicit vs. implicit alloc/dealloc
- Most languages support dynamic allocation of memory.
- Pascal supports new(p) and dispose(p) for pointer types.
- C provides malloc() and free() in the standard library.
- C++ provides the new and delete operators.
- These are all examples of EXPLICIT allocation.
- Other languages like Python and Lisp have IMPLICIT allocation.
Garbage
- In languages with explicit deallocation, the programmer must be careful to free every dynamically allocated variable, or GARBAGE will accumulate.
- Garbage is dynamically allocated memory that is no longer accessible because no pointers are pointing to it.
- In some languages with implicit deallocation, GARBAGE COLLECTION is occasionally necessary.
- Other languages with implicit deallocation carefully track references to allocated memory and automatically free memory when nobody refers to it any longer.
Dynamic storage allocation
- We assume the heap is an initially empty block of memory.
- As memory is allocated and deallocated, fragmentation occurs.
- For allocation, we must find a HOLE large enough to hold the requested memory.
- For deallocation, we must merge adjacent holes to prevent further fragmentation.