Title: Procedure Optimizations
1- Procedure Optimizations
- Timo Rasi
2What's a Procedure Optimization?
- Optimizations applying to whole procedures instead of procedure contents.
- Tail Call: f() calls g() and then only returns directly after the call, enabling transformation of the call into a branch.
- Tail Recursion: Special case of tail call where f() and g() are the same procedure, enabling transformation of the call into a loop.
- Procedure Integration: Replacement of a procedure call with the procedure contents.
- In-Line Expansion: Hand-crafted assembly language Procedure Integration.
- Leaf Routine Optimization: Elimination of unnecessary procedure prologue and epilogue code from leaf procedures.
- Shrink Wrapping: Deferred procedure prologue and epilogue code.
3Procedure Body Lookup Schemes
- Depending on the case, the optimizer needs to have access to the whole callee contents or at least the stack frame size.
- The easy part: Same compilation unit.
- Other ways:
  - Saved intermediate code,
  - Link time,
  - Manual Labor: Bundling of several disjoint source code files into one.
4Tail Call Optimization
- Tail Call optimization replaces the procedure call with a branch, and the callee then returns on behalf of the caller.
- Tail Call optimization cannot be done in source code form: A branch from one procedure into another would violate almost any (?) high-level language semantics.
- Caller's stack frame is larger than callee's: Callee's procedure epilogue deallocates the caller's whole stack frame.
- Caller's stack frame is smaller than callee's: Before entering the callee, either allocate the remainder or release the stack frame totally and use the standard procedure prologue.
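A minimal C sketch of what counts as a tail call. The names `helper`, `wrapper` and `not_tail` are hypothetical, and nothing in the source forces the optimization: whether the call really becomes a branch depends on the compiler and its settings.

```c
/* Hypothetical callee: any ordinary procedure. */
int helper(int x)
{
    return x * 2 + 1;
}

/* The call to helper() is the last action of wrapper(): its result is
 * returned unchanged, so nothing in wrapper()'s frame is needed
 * afterwards. An optimizer may release wrapper()'s frame and branch to
 * helper(), letting helper() return directly to wrapper()'s caller. */
int wrapper(int x)
{
    return helper(x + 3);      /* tail call */
}

/* Not a tail call: the addition still has to run in this procedure
 * after helper() returns, so the frame must stay alive. */
int not_tail(int x)
{
    return helper(x) + 1;
}
```

For example, `wrapper(1)` evaluates `helper(4)`, and the value `helper` produces is exactly the value the caller of `wrapper` receives.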
5Tail Call and Tail Recursion Book Example, a Curious One
6Tail Call Book Example, a Buggy One
7HP PA-RISC aCC Tail Call Optimization
- Despite experimentation, the author was not able to manifest tail call optimization in practice.
- Compiler insisted on adhering to the PA-RISC procedure call convention:
  - Caller: Branch to a procedure and save the return address to a link register.
  - Callee: Optionally, save the return address into the stack frame.
  - Callee: Optionally, save callee-save registers.
  - Callee: Execute procedure body.
  - Callee: Optionally, restore callee-save registers.
  - Callee: Optionally, restore the return address.
  - Callee: Branch back to caller via the link register.
8Tail Recursion Optimization
- Tail Recursion optimization transforms a recursive call into a loop.
- Procedure call is replaced by parameter renaming, a branch and deletion of a return.
- Can usually be done relatively simply in a high-level code form.
- A blind procedure call offers little to be optimized, but tail recursion optimization opens doors to other optimizations.
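The transformation can be sketched by hand in C. The function below is a hypothetical example (greatest common divisor), not one taken from the slides; `gcd_loop` shows roughly what the three steps (parameter renaming, branch, deleted return) amount to when written out in source form.

```c
/* Tail-recursive form: the recursive call is the last action. */
unsigned gcd_rec(unsigned a, unsigned b)
{
    if (b == 0)
        return a;
    return gcd_rec(b, a % b);   /* tail-recursive call */
}

/* The same procedure after the transformation, done by hand:
 * parameter renaming, a branch back to the top, and no return
 * at the former call site. */
unsigned gcd_loop(unsigned a, unsigned b)
{
top:
    if (b == 0)
        return a;
    {
        unsigned next_a = b;        /* parameter renaming: new a */
        unsigned next_b = a % b;    /* parameter renaming: new b */
        a = next_a;
        b = next_b;
    }
    goto top;                       /* the call became a branch */
}
```

Once the recursion is a loop, the body is visible to loop optimizations that a call would have hidden.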
9Tail Recursion Book Example
10HP PA-RISC aCC Tail Recursion Optimization
- Compiler did a really good job in optimizing a void function.
- Surprise: Compiler refused to optimize even a trivial leaf int function. Is there a good reason for this? There was: the last operation in the example function was an addition instead of a call.
- Loop unrolling didn't manifest.
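The int function used in the experiment is not reproduced in these slides, so the following is a hypothetical function of the same kind. In `sum_to` the last operation is the addition, not the call; `sum_to_acc` is one possible rewrite (an accumulator parameter, an assumption of this sketch) that turns the call into the last action.

```c
/* Not tail-recursive: after the recursive call returns, the addition
 * still has to be performed, so the call is not the last operation. */
int sum_to(int n)
{
    if (n <= 0)
        return 0;
    return n + sum_to(n - 1);
}

/* Hypothetical rewrite with an accumulator parameter: the pending
 * addition is done before the call, so the call becomes the last
 * action and the procedure is tail-recursive. */
int sum_to_acc(int n, int acc)
{
    if (n <= 0)
        return acc;
    return sum_to_acc(n - 1, acc + n);
}
```

Both compute the same sum when the accumulator starts at zero, e.g. `sum_to(10)` and `sum_to_acc(10, 0)`.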
11What is Inlining? What is Inlining Not?
- Yes: Substitution of a procedure call with the procedure body.
- No: Definition of the procedure body simultaneously or separately with the declaration in a C++ header file, although that's equivalent to the inline keyword.
- No: Inline C assembly.
12What's Procedure Integration Good For?
- Eliminates procedure call overhead.
- Produces larger basic blocks.
- Enables other optimizations, for example Constant Propagation, Loop Unrolling etc.
- Eliminates branch pipeline penalty.
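A hand-written before/after sketch of these effects. The procedure `scale` and both callers are hypothetical; the second caller shows what the code could look like after integration and constant propagation, not what any particular compiler emits.

```c
/* Hypothetical small procedure: a candidate for integration. */
static int scale(int value, int factor)
{
    if (factor == 0)
        return 0;
    return value * factor;
}

/* Before integration: each iteration pays for a call and a return. */
int sum_scaled_call(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += scale(v[i], 4);
    return sum;
}

/* After integration, written out by hand: the body replaces the call.
 * With factor known to be 4, constant propagation removes the
 * factor == 0 test, and the loop body becomes one basic block. */
int sum_scaled_inlined(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i] * 4;
    return sum;
}
```

The second loop is also a simpler target for loop unrolling, since it contains no call and no branch besides the loop test.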
13What's Procedure Integration Not So Good For?
- Protests revolve around code bloat and caches.
- Maintenance scheme may have a role:
  - Shared library delivery: no-no.
  - Recompilation of subproduct: maybe.
  - Recompilation of whole product: ok.
- Bad compilers might generate bad code.
- Procedures with local state may generate surprises.
- Uninlined procedures may end up as a static copy in every compilation unit, although the linker is supposed to remove duplicates and/or dead code.
- Integration of a procedure is not a property of the procedure itself but a property of all call sites within an executable after applied optimizations.
- Some considerations: Procedure size, number of calls to the procedure, resulting loop(s), number of constant arguments.
- Very likely the compiler and runtime profiling tools know this stuff better than humans do.
14HP PA-RISC aCC Procedure Integration
- Elaborate scheme: rough rules and finer control, implicit inlining, run-time inline adviser profiler.
- Keen to inline, even when explicitly instructed not to.
- Effective in short C++ methods, augmented by the processor's inability to predict procedure return branches.
- Compiler did a really good job in loop unrolling and procedure integration. One test demonstrated a 60x improvement over a naive call into a shared library.
- One experiment: compilation of 350,000 lines of legacy C from a single file with aggressive inlining and elaborate optimizations -> 46000 inlined functions, 18 hours CPU, 5.5 days wallclock time.
15In-Line Expansion
- In essence, a hand-written assembly language code sequence.
- Used for operations that the compiler cannot do:
  - Operations outside the mainstream code generation scheme,
  - Difficult processor-specific optimizations,
  - Operating system-specific operations,
  - Augmentation of missing language features.
- Compiler has no idea about asm(statement) contents: Effects need to be communicated somehow, or the compiler, for example, takes the safest bet and throws out all optimizations.
- Microsoft has compiler intrinsics compiling directly into MMX instructions. Those could be regarded as compiler-assisted in-line expansion operations.
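One way of communicating effects to the compiler is a clobber list. The sketch below assumes the GCC/Clang extended asm syntax, which is not the mechanism of the aCC compiler discussed in these slides; the function names are hypothetical. The asm template is deliberately empty so that the example does not depend on any processor's instruction set.

```c
/* Assumption: GCC/Clang extended asm syntax. Other compilers use
 * different mechanisms for describing the effects of inline assembly. */

/* The template is empty, so no instruction is emitted, but the
 * "memory" clobber tells the compiler that the statement may read or
 * write memory. The compiler must therefore not keep memory values
 * cached in registers across this point or move memory accesses
 * over it. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* Hypothetical use: as far as the compiler is concerned, the store to
 * *flag is completed before the barrier and *data is loaded after it. */
int publish_and_read(int *flag, const int *data)
{
    *flag = 1;
    compiler_barrier();
    return *data;
}
```

Without such a declaration the compiler either has to assume the worst about the asm statement or risks optimizing on wrong assumptions.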
16HP PA-RISC aCC In-Line Expansion
- Not much to tell about in this case either:
  - Warning 669 "expansion.cc", line 10 The asm declaration is ignored.
  - asm("nop")
17Leaf Routine Optimization
- Lowering the overhead of a leaf procedure call.
- Highly desirable with little effort.
- Are there enough leaf routines in the first place? Yes, for example a binary tree has more leaves than non-leaves.
- Removal of procedure prologue and epilogue code associated with preparation for a subprocedure call.
18HP PA-RISC aCC Compiler Leaf Routine Optimization
- A lightweight leaf routine call was one original PA-RISC design goal due to the simple hardwired RISC instructions.
- A millicode call such as multiply only requires a branch-and-link instruction and utilizes caller-save registers.
- Separate scheme for millicode and high-level language leaf routines.
- Actions a leaf routine can usually or always do without:
  - Saving of the return address register into the stack frame,
  - Allocation of a stack frame,
  - Saving and restoring of scratch registers. Input parameter registers are also eligible as scratch registers.
  - Deallocation of the stack frame,
  - Restoration of the return address register.
19Shrink Wrapping
- In Essence: Lazy Stack Frame construction.
- Moving of prologue and epilogue code to enclose only minimal appropriate code segments.
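Shrink wrapping happens in generated code, so C source can only mark where the save and restore code would move. The sketch below uses hypothetical functions; the comments describe what a shrink-wrapping compiler could do, not what a given compiler is guaranteed to do.

```c
#include <stddef.h>

/* Hypothetical helper: calling it makes the slow path below need
 * values that survive a call, hence callee-save registers. */
static long weight(long x)
{
    return x * x + 1;
}

long total_weight(const long *v, size_t n)
{
    /* Fast path: needs no callee-save registers and no frame.
     * Without shrink wrapping, the register saves would already have
     * been executed at procedure entry, before this test. */
    if (v == NULL || n == 0)
        return 0;

    /* --- with shrink wrapping, register saves are placed here --- */
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += weight(v[i]);     /* sum, i, v, n live across the call */
    /* --- and the matching restores are placed here --- */

    return sum;
}
```

Calls that take the early exit then pay for neither the saves nor the restores.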
20Shrink Wrapping Data Flow Analysis
- A register is anticipatable at a point in a flowgraph if all execution paths from that point contain definitions or uses of it. In other words, the register is being taken into use.
- A register is available at a point in a flowgraph if all execution paths to that point include definitions or uses of the register. In other words, the register is released from use.
- Main Idea: Register saving code is inserted where the register becomes anticipatable and register restoring code is inserted where the register becomes available.
- In Other Words: Saving code is inserted just before the first use and restore code is inserted just after the final use.
- Legend for the example:
  - RANTin(i): Set of registers anticipatable on entry to block i.
  - RANTout(i): Set of registers anticipatable on exit from block i.
  - RAVin(i): Set of registers available on entry to block i.
  - RAVout(i): Set of registers available on exit from block i.
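The two definitions above can be turned into a small iterative solver. This is a sketch under stated assumptions, not the formulas of the following slides: the four-block flowgraph, the bitmask encoding of register sets and the per-block set `ruse` (registers defined or used in a block) are all hypothetical. It applies "all paths from" as an intersection over successors and "all paths to" as an intersection over predecessors.

```c
#define NBLOCKS  4
#define ALL_REGS 0x3u   /* bit 0 = r1, bit 1 = r2 */

/* Hypothetical flowgraph: B0 -> B1, B0 -> B2, B1 -> B3, B2 -> B3. */
static const int edge[NBLOCKS][NBLOCKS] = {
    {0, 1, 1, 0},
    {0, 0, 0, 1},
    {0, 0, 0, 1},
    {0, 0, 0, 0},
};

/* Registers defined or used in each block: B1 uses r1, B2 uses r1, r2. */
static const unsigned ruse[NBLOCKS] = { 0x0u, 0x1u, 0x3u, 0x0u };

unsigned rant_in[NBLOCKS], rant_out[NBLOCKS];
unsigned rav_in[NBLOCKS], rav_out[NBLOCKS];

/* Intersection over the successors (forward = 1) or predecessors
 * (forward = 0) of block i; the empty set if there are none. */
static unsigned meet(int i, const unsigned *set, int forward)
{
    unsigned result = ALL_REGS;
    int found = 0;
    for (int j = 0; j < NBLOCKS; j++) {
        if (forward ? edge[i][j] : edge[j][i]) {
            result &= set[j];
            found = 1;
        }
    }
    return found ? result : 0u;
}

/* Iterate until nothing changes, starting from the optimistic
 * assumption that every register is in every set. */
void solve(void)
{
    int changed = 1;
    for (int i = 0; i < NBLOCKS; i++)
        rant_in[i] = rav_out[i] = ALL_REGS;
    while (changed) {
        changed = 0;
        for (int i = 0; i < NBLOCKS; i++) {
            unsigned old_in = rant_in[i], old_out = rav_out[i];
            rant_out[i] = meet(i, rant_in, 1);   /* all paths from i */
            rant_in[i]  = ruse[i] | rant_out[i];
            rav_in[i]   = meet(i, rav_out, 0);   /* all paths to i */
            rav_out[i]  = ruse[i] | rav_in[i];
            if (rant_in[i] != old_in || rav_out[i] != old_out)
                changed = 1;
        }
    }
}
```

In this flowgraph r1 is used on both branches, so after `solve()` it is anticipatable already on entry to B0, while r2 is anticipatable only on entry to B2 and available only on exit from B2. Following the main idea above, the save and restore of r2 could then be confined to B2.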
21Shrink Wrapping Formulas, Theory
22Shrink Wrapping Example
23Shrink Wrapping Formulas, Practice