Title: Improving Instruction Locality with Just-In-Time Code Layout
1 Improving Instruction Locality with Just-In-Time Code Layout
- J. Bradley Chen and Bradley D. D. Leupen
- Division of Engineering and Applied Sciences
- Harvard University
2 Goals
- Improve instruction reference locality
  - big problem for commodity applications
- Eliminate need for profile information
  - required by current compiler-based solutions
3 How?
- Implement layout dynamically using Activation Order, a new heuristic for code layout.
- Locate procedures in order of use.
4 Requirements
- No special hardware support.
- Minimal changes to the operating system.
- Minimal system overhead.
5 Optimizing Procedure Layout
[Figure: Bad Layout vs. Better Layout]
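To make the difference between a bad and a better layout concrete, here is a minimal sketch of a toy direct-mapped instruction cache. The cache geometry, the procedure names (`hot_a`, `hot_b`, `cold`), their sizes, and the trace are all illustrative assumptions, not data from the talk.

```python
def count_misses(layout, sizes, trace, num_lines=4, line_size=32):
    """Count misses in a toy direct-mapped I-cache (assumed geometry).

    layout: procedure names in memory order, packed back to back.
    sizes:  procedure name -> size in bytes.
    trace:  sequence of procedure activations; each activation fetches
            every cache line of the procedure once.
    """
    # Assign start addresses by packing procedures in layout order.
    start, addr = {}, 0
    for name in layout:
        start[name] = addr
        addr += sizes[name]

    cache = [None] * num_lines  # cache[i] = memory block currently held in set i
    misses = 0
    for name in trace:
        first = start[name] // line_size
        last = (start[name] + sizes[name] - 1) // line_size
        for block in range(first, last + 1):
            index = block % num_lines
            if cache[index] != block:
                cache[index] = block  # evict whatever was there
                misses += 1
    return misses


# Hypothetical workload: two hot procedures called alternately,
# plus one procedure that is never called.
sizes = {"hot_a": 64, "cold": 64, "hot_b": 64}
trace = ["hot_a", "hot_b"] * 100

bad = count_misses(["hot_a", "cold", "hot_b"], sizes, trace)
better = count_misses(["hot_a", "hot_b", "cold"], sizes, trace)
print(bad, better)  # 400 4
```

In the bad layout the two hot procedures map to the same cache sets and evict each other on every call. Placing them next to each other removes the conflict, so only the four cold-start misses remain.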
6 Current Practice: Pettis and Hansen
- Nodes are procedures.
- Edges are caller/callee pairs.
- Weights are call frequency.
7 Pettis and Hansen Layout
layout: (empty)
layout: GetEvent, CheckForInputErrors
layout: EventLoop, GetEvent, CheckForInputErrors
layout: React, EventLoop, GetEvent, CheckForInputErrors
layout: HandleCommonCase, React, EventLoop, GetEvent, CheckForInputErrors
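The step-by-step growth of the layout can be sketched as a greedy merge over the weighted call graph. This is a simplified version of the Pettis and Hansen approach, not their full algorithm: it visits edges from heaviest to lightest and joins the two chains so that caller and callee end up as close as possible. The edge weights and the tie-breaking rule (prefer not to reverse a chain) are assumptions chosen for illustration.

```python
def orientations(a, b):
    """Yield (joined_chain, reversals) for every way of joining chains a and b."""
    for x, rx in ((a, 0), (a[::-1], 1)):
        for y, ry in ((b, 0), (b[::-1], 1)):
            yield x + y, rx + ry
            yield y + x, rx + ry


def greedy_call_graph_layout(edges):
    """edges: dict mapping (caller, callee) -> call frequency."""
    chains = {}  # procedure -> the chain (list) that currently contains it
    for u, v in edges:
        chains.setdefault(u, [u])
        chains.setdefault(v, [v])

    # Visit edges from heaviest to lightest.
    for (u, v), _weight in sorted(edges.items(), key=lambda kv: -kv[1]):
        a, b = chains[u], chains[v]
        if a is b:
            continue  # both procedures are already in the same chain
        # Keep the join that puts u and v closest, reversing as little as possible.
        best, _ = min(
            orientations(a, b),
            key=lambda c: (abs(c[0].index(u) - c[0].index(v)), c[1]),
        )
        for proc in best:
            chains[proc] = best

    # Concatenate whatever distinct chains remain.
    layout, seen = [], set()
    for chain in chains.values():
        if id(chain) not in seen:
            seen.add(id(chain))
            layout.extend(chain)
    return layout


# Hypothetical call frequencies for the procedures named on the slide.
edges = {
    ("GetEvent", "CheckForInputErrors"): 1000,
    ("EventLoop", "GetEvent"): 900,
    ("EventLoop", "React"): 800,
    ("React", "HandleCommonCase"): 700,
}
layout = greedy_call_graph_layout(edges)
print(layout)
```

With these assumed weights the merge order reproduces the sequence of layouts listed above. The point to note is that the weights must come from a profile, which is the dependency JITCL aims to remove.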
8 A New Heuristic
Activation Order: Co-locate procedures that are activated sequentially.
[Figure: Activation Order example]
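As a minimal sketch of the Activation Order idea, the function below places each procedure the first time it is activated. The activation trace is a hypothetical example that reuses the procedure names from the Pettis and Hansen slide, with `main` added as an assumed entry point.

```python
def activation_order(trace):
    """Return procedures in order of first activation."""
    layout, placed = [], set()
    for proc in trace:
        if proc not in placed:
            placed.add(proc)
            layout.append(proc)
    return layout


# Hypothetical activation trace of an event-driven program.
trace = [
    "main", "EventLoop", "GetEvent", "CheckForInputErrors",
    "React", "HandleCommonCase",
    "GetEvent", "React", "HandleCommonCase",
]
ao_layout = activation_order(trace)
print(ao_layout)
```

No call frequencies are needed, so the layout can be built while the program runs. Procedures that are never activated are never placed.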
9 Implementing JITCL
__start:
    perform initializations
    call thunk_main
thunk_main:
    . . .
thunk_foo:
    . . .
__InstructionMemory:
Thunk routines implement code layout on the fly.
10 Thunk routines
// Global variables:
//   ProcPointers[] - one element per procedure
//   INDEX_proc and LENGTH_proc for each procedure
thunk_main:
    // Copy the procedure only if it is not already in the text segment.
    if (!InCodeSegment(ProcPointers[INDEX_main]))
        ProcPointers[INDEX_main] =
            CopyToTextSegment(ProcPointers[INDEX_main], LENGTH_main);
    // Redirect the caller so later calls bypass the thunk.
    PatchCallSite(ProcPointers[INDEX_main],
                  ComputeCallSiteFromReturnAddress(RA));
    jmp ProcPointers[INDEX_main]
The thunk routines copy procedures into the text segment and update call sites at run time.
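The behavior of the thunks can be modeled in a few lines of Python. This is a toy simulation under stated assumptions, not the actual JITCL implementation: the class `JITCLSim`, its fields, and the procedure lengths are hypothetical, addresses are plain byte offsets, and "copying" only reserves space.

```python
class JITCLSim:
    """Toy model of thunk-driven, just-in-time procedure placement."""

    def __init__(self, lengths):
        self.lengths = lengths   # procedure -> length in bytes
        self.text_address = {}   # procedure -> address once copied
        self.next_free = 0       # next free byte in the text segment
        self.call_sites = {}     # (caller, callee) -> patched target address
        self.thunk_calls = 0

    def call(self, caller, callee):
        site = (caller, callee)
        target = self.call_sites.get(site)
        if target is None:                   # call site still points at the thunk
            target = self._thunk(callee)
            self.call_sites[site] = target   # models PatchCallSite
        return target

    def _thunk(self, proc):
        self.thunk_calls += 1
        if proc not in self.text_address:    # models !InCodeSegment(...)
            self.text_address[proc] = self.next_free  # models CopyToTextSegment
            self.next_free += self.lengths[proc]
        return self.text_address[proc]


# Hypothetical program: "unused" is linked in but never called.
lengths = {"main": 100, "foo": 40, "bar": 60, "unused": 500}
sim = JITCLSim(lengths)
for caller, callee in [("__start", "main"), ("main", "foo"),
                       ("main", "bar"), ("main", "foo"), ("foo", "bar")]:
    sim.call(caller, callee)

print(sim.text_address)  # {'main': 0, 'foo': 100, 'bar': 140}
print(sim.thunk_calls)   # 4
print(sim.next_free)     # 200
```

Each call site goes through a thunk at most once; after patching, calls go straight to the copied procedure. Only procedures that are actually called occupy space in the text segment, which is why the simulated text size is 200 bytes rather than 700.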
11 Simulation Methodology
12 Workloads
13 Results
- The AO heuristic is effective.
- The overhead of JITCL is negligible.
- JITCL improves procedure layout without requiring profile information.
- JITCL reduces program memory requirements.
14 Results: The AO Heuristic
[Figure: Improvement in I-Cache Miss Rate]
Conclusion: Effectiveness of the heuristic is comparable to Pettis and Hansen (PH).
15 Overhead of JITCL
- Copy overhead
  - instruction overhead
  - cache overhead
- Cache consistency
- Disk overhead: comparable to demand-loaded text; not evaluated.
16 Results: Overhead
[Figure: Overhead Instructions (%)]
Conclusion: JITCL overhead is less than 0.1% in all cases.
17 Results: Performance
[Figure: Saved Cycles per Instruction]
Conclusion: Overall performance is comparable to PH.
18 JITCL for Win32 Applications
- Windows applications are composed of multiple executable modules.
- When transitions between modules are frequent, intra-module code layout is less effective.
- With JITCL, inter-module code layout is possible and beneficial.
19 Win32 Cache Miss Rates
Conclusion: Careful layout did not help Win32 applications.
20 Text Segment Size
[Figure: Text size in megabytes]
Conclusion: JITCL typically reduces text size by 50%.
21 JITCL vs. PBO
- JITCL provides an alternative to feedback-based procedure layout.
- Many important optimizations still require profile information.
  - instruction scheduling
  - register allocation
  - other intra-procedural optimizations
- Don't expect profile-based optimization to go away!
22 Conclusions
- Just-In-Time code layout achieves comparable benefit to profile-based code layout without the need for profiles.
- The AO heuristic is effective.
- The overhead of procedure copying is low.
- Benefit in I-Cache is comparable to Pettis and Hansen layout.
- JITCL can reduce working set size.
23 The Morph Project
For more information: http://www.eecs.harvard.edu/morph/