Title: Online Subpath Profiling
1Online Subpath Profiling
- Yossi Matias
- David Oren
- Mooly Sagiv
- School of Computer Science
- Tel-Aviv University
2Motivation for Profiling
- Feedback on dynamic program behavior
- The 80-20 rule
- Can be used by
- Computer Architects
- Compiler Writers
- Programmers
- Better program performance
3Motivation for Profiling
4Types of Profiling
- Vertex profiling
- No context, just count of instructions
- Edge profiling
- Branch-transition
- Profile-directed optimization
- Path profiling
- Multiple branch-transition
- Intra- or inter-procedural
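The difference between the profile kinds can be made concrete with a small sketch. It assumes an execution trace is already available as a list of basic-block names; the class and method names are illustrative only, and fixed-length windows stand in for "multiple branch-transition" context (this is not the Ball-Larus path numbering).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ProfileKinds {
    /** Vertex profile: execution count per block, with no context. */
    static Map<String, Integer> vertexProfile(List<String> trace) {
        Map<String, Integer> counts = new HashMap<>();
        for (String block : trace) {
            counts.merge(block, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Counts every run of len consecutive blocks.
     * len = 2 gives an edge profile; larger values give longer subpaths.
     */
    static Map<String, Integer> windowProfile(List<String> trace, int len) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + len <= trace.size(); i++) {
            counts.merge(String.join("-", trace.subList(i, i + len)), 1, Integer::sum);
        }
        return counts;
    }
}
```

With len = 2 each key is a single branch transition; with len = 3 or more the key records which transitions happened together, which is the extra context a path profile provides.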
5Types of Profiling
- Offline
- Results are collected and then displayed
- User in the loop
- Online
- Results are collected and acted upon
- JIT compilation
- Display to user
6Motivation for Subpath Profiling
- Programs may have hot subpaths
- which are part of cold paths
7Challenges
- Large number of subpaths
- >4M distinct subpaths of length 2, 4, ..., 64k in JLex
- >35M total subpaths
- Counting all subpaths is prohibitively expensive
- Memory
- Time
- Non-linear
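A rough count shows why exhaustive counting does not scale. The symbols are assumptions introduced here: a single trace of n basic blocks, a subpath being any run of ℓ consecutive blocks (so n − ℓ + 1 occurrences of each length ℓ), and K the largest exponent with 2^K ≤ n. The left sum covers every length, the right sum only power-of-two lengths.

```latex
\[
\sum_{\ell=2}^{n} (n-\ell+1) = \frac{n(n-1)}{2}
\qquad\text{versus}\qquad
\sum_{k=1}^{K} \left(n - 2^{k} + 1\right) = K(n+1) - \left(2^{K+1}-2\right),
\quad 2^{K} \le n .
\]
```

Counting all lengths grows quadratically in n, which may be what "non-linear" refers to. Restricting to lengths 2, 4, ..., 64k (K = 16) still leaves roughly 16n occurrences to record.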
8Online Subpath Profiler
- Based on an adaptive sampling technique
- Identifies arbitrary hot subpaths
- Low memory overhead
- Low runtime overhead
- Online
- Appropriate for JIT-like compilers
- Can be adapted to different requirements
9Outline
- Algorithm overview
- Adaptive sampling
- Issues
- The OSP algorithm
- Reference implementation
- Experimental results
- Related work
- Conclusion
10Algorithm Overview
- Select on-the-fly a random sample of subpaths
- Count the popularity of sampled subpaths and obtain estimation by scaling
- Achieve high accuracy using limited memory
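The "count, then scale" step can be sketched as follows. This is a minimal illustration with a fixed sampling rate, not the adaptive scheme of the profiler; the class name, the `rate` parameter and the use of strings as subpath identifiers are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

final class ScaledEstimate {
    /**
     * Keeps each occurrence with probability 1/rate, counts the kept
     * occurrences per id, then multiplies each count by rate.
     */
    static Map<String, Long> estimate(String[] stream, int rate, Random rng) {
        Map<String, Long> sampled = new HashMap<>();
        for (String id : stream) {
            if (rng.nextInt(rate) == 0) {
                sampled.merge(id, 1L, Long::sum);
            }
        }
        // Scale the sampled counts back up to an estimate of the true counts.
        sampled.replaceAll((id, count) -> count * rate);
        return sampled;
    }
}
```

With rate = 1 the result is the exact count. Larger rates use less memory and time, at the price of a noisier estimate for rarely occurring ids.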
11Adaptive Sampling
- Based on a hot-list algorithm by Gibbons and Matias (SIGMOD 1998)
- Sample elements from the input set
- Frequently occurring elements will be sampled more often
- Sampling probability determined at runtime, according to the allowed memory usage
- Tradeoff between overhead and accuracy
- Give an estimate of the sample's accuracy
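A simplified sketch of a memory-bounded hot list in the spirit of the points above. It is not the Gibbons-Matias algorithm verbatim: the footprint is approximated by the number of distinct entries, the growth factor 1.5 is arbitrary, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Random;

final class HotListSketch {
    private final Map<Long, Integer> counts = new HashMap<>();
    private final int maxEntries;
    private final Random rng;
    private double threshold = 1.0; // each occurrence is kept with probability 1/threshold

    HotListSketch(int maxEntries, Random rng) {
        this.maxEntries = maxEntries;
        this.rng = rng;
    }

    void offer(long id) {
        if (rng.nextDouble() * threshold >= 1.0) {
            return; // this occurrence was not selected
        }
        counts.merge(id, 1, Integer::sum);
        while (counts.size() > maxEntries) {
            raiseThreshold(threshold * 1.5); // growth factor is an arbitrary choice
        }
    }

    /** Re-filters the sample so it looks as if it had been drawn at the new rate. */
    private void raiseThreshold(double newThreshold) {
        double keep = threshold / newThreshold;
        Iterator<Map.Entry<Long, Integer>> it = counts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Integer> entry = it.next();
            int kept = 0;
            for (int i = 0; i < entry.getValue(); i++) {
                if (rng.nextDouble() < keep) kept++;
            }
            if (kept == 0) it.remove(); else entry.setValue(kept);
        }
        threshold = newThreshold;
    }

    /** Estimated number of occurrences: sampled count scaled by the threshold. */
    double estimate(long id) { return counts.getOrDefault(id, 0) * threshold; }
    int size() { return counts.size(); }
    double threshold() { return threshold; }
}
```

The sampling probability is never fixed in advance: it drops only when the memory bound is hit, which is how the overhead/accuracy tradeoff follows the allowed memory.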
12Concise Samples
- Uniform random sampling
- Maintain an <id, count> pair for each element
- The sample size can be much larger than the memory size
- For skewed input sets the gain is much larger
- Sampling is not applied at every block
- Vitter's reservoir sampling
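One way to avoid a sampling decision at every block is to draw, once per sample, how many blocks to skip. The sketch below uses a geometric gap for a fixed sampling probability p. This is an assumption for illustration: the slides cite Vitter's reservoir sampling but do not give the skip distribution, and this is not Vitter's algorithm itself.

```java
import java.util.Random;

final class SkipChooser {
    /**
     * Draws the number of blocks to pass before the next sample starts,
     * so that each block starts a sample with probability p (0 < p <= 1).
     * Always returns at least 1.
     */
    static int chooseSkipValue(double p, Random rng) {
        if (!(p > 0.0 && p <= 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1]");
        }
        double u = 1.0 - rng.nextDouble();                 // uniform in (0, 1]
        double gap = Math.floor(Math.log(u) / Math.log1p(-p));
        return 1 + (int) Math.min(gap, Integer.MAX_VALUE - 1);
    }
}
```

The expected skip is 1/p, so the per-block cost while skipping reduces to one counter decrement.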
13Concise Samples
14Issues
- Encoding
- Generating a unique ID for paths
- Path length bias
- Longer or shorter paths?
- Path representation
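For the encoding issue, one possible identifier is a rolling hash over block numbers. The slides do not specify the encoding, so this is only a sketch: the multiplier is arbitrary, integer block ids are assumed, and a hash like this is compact but not guaranteed collision-free, so it is not strictly "unique".

```java
final class SubpathId {
    private static final long BASE = 1_000_003L; // arbitrary odd multiplier
    private long id = 0;
    private int blocks = 0;

    /** Folds one more block into the running identifier. */
    void appendBlock(int blockId) {
        id = id * BASE + (blockId + 1); // +1 so block 0 still changes the id
        blocks++;
    }

    long id() { return id; }
    int blocks() { return blocks; }
}
```

The identifier is updated in constant time per block and depends on block order, so a-b and b-a get different ids.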
15The OSP Algorithm
void sampleBlock (BasicBlock b) {
  subpath.appendBlock (b);
  if (--length == 0) {
    updateHotList (subpath.id);
    skip = chooseSkipValue ();
    subpath = new subPath ();
    sampling = false;
  }
}

void enterBlock (BasicBlock b) {
  if (sampling)
    sampleBlock (b);
  else if (--skip == 0) {
    length = choosePathLength ();
    sampling = true;
  }
}
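The two routines can be turned into a small runnable sketch. Several things here are assumptions made for illustration: blocks are plain strings, the subpath id is the joined block names, the hot list is an ordinary map rather than the memory-bounded adaptive one, and the skip and length choosers are injected so the behaviour can be made deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntSupplier;

final class OspSketch {
    private final IntSupplier skipChooser;   // stands in for chooseSkipValue()
    private final IntSupplier lengthChooser; // stands in for choosePathLength()
    private final Map<String, Integer> hotList = new HashMap<>();
    private StringBuilder subpath = new StringBuilder();
    private boolean sampling = false;
    private int skip;
    private int length;

    OspSketch(IntSupplier skipChooser, IntSupplier lengthChooser) {
        this.skipChooser = skipChooser;
        this.lengthChooser = lengthChooser;
        this.skip = skipChooser.getAsInt();
    }

    /** Called on entry to every basic block. */
    void enterBlock(String block) {
        if (sampling) {
            sampleBlock(block);
        } else if (--skip == 0) {
            // The block that ends the skip only arms sampling; the subpath starts at the next block.
            length = lengthChooser.getAsInt();
            sampling = true;
        }
    }

    private void sampleBlock(String block) {
        if (subpath.length() > 0) subpath.append('-');
        subpath.append(block);
        if (--length == 0) {
            hotList.merge(subpath.toString(), 1, Integer::sum); // updateHotList
            skip = skipChooser.getAsInt();
            subpath = new StringBuilder();
            sampling = false;
        }
    }

    Map<String, Integer> hotList() { return hotList; }
}
```

A driver would call enterBlock once per executed block, for example `new OspSketch(() -> 5, () -> 2)` for a fixed skip of 5 and subpaths of two blocks.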
16The OSP Algorithm
17OSP Algorithm Walkthrough
skip = 5, sampling = false
void enterBlock (BasicBlock b) {
  if (sampling)
    sampleBlock (b);
  else if (--skip == 0) {
    length = choosePathLength ();
    sampling = true;
  }
}
States: Skipping, Sampling
18OSP Algorithm Walkthrough
skip = 4, sampling = false
19OSP Algorithm Walkthrough
skip = 3, sampling = false
20OSP Algorithm Walkthrough
skip = 2, sampling = false
21OSP Algorithm Walkthrough
skip = 1, sampling = false
22OSP Algorithm Walkthrough
skip = 0, length = 2, sampling = true
void enterBlock (BasicBlock b) {
  if (sampling)
    sampleBlock (b);
  else if (--skip == 0) {
    length = choosePathLength ();
    sampling = true;
  }
}
23OSP Algorithm Walkthrough
skip = 0, length = 1, sampling = true
24OSP Algorithm Walkthrough
skip = 4, length = 0, sampling = false
Hot list: doA-doCommon = 1
void sampleBlock (BasicBlock b) {
  subpath.appendBlock (b);
  if (--length == 0) {
    updateHotList (subpath.id);
    skip = chooseSkipValue ();
    subpath = new subPath ();
    sampling = false;
  }
}
25OSP Algorithm Walkthrough
skip = 3, sampling = false
Hot list: doA-doCommon = 1
26OSP Algorithm Walkthrough
skip = 2, sampling = false
Hot list: doA-doCommon = 1
27OSP Algorithm Walkthrough
skip = 1, sampling = false
Hot list: doA-doCommon = 1
28OSP Algorithm Walkthrough
skip = 0, length = 2, sampling = true
Hot list: doA-doCommon = 1
29OSP Algorithm Walkthrough
skip = 0, length = 1, sampling = true
Hot list: doA-doCommon = 1
30OSP Algorithm Walkthrough
skip = 8, sampling = false
Hot list: doA-doCommon = 1, doCommon-if2 = 1
31OSP Algorithm Walkthrough
skip = 7, sampling = false
Hot list: doA-doCommon = 1, doCommon-if2 = 1
32OSP Algorithm Walkthrough
skip = 6, sampling = false
Hot list: doA-doCommon = 1, doCommon-if2 = 1
33OSP Algorithm Walkthrough
skip = 5, sampling = false
Hot list: doA-doCommon = 1, doCommon-if2 = 1
34OSP Algorithm Walkthrough
skip = 4, sampling = false
Hot list: doA-doCommon = 1, doCommon-if2 = 1
35OSP Algorithm Walkthrough
skip = 3, sampling = false
Hot list: doA-doCommon = 1, doCommon-if2 = 1
36OSP Algorithm Walkthrough
skip = 2, sampling = false
Hot list: doA-doCommon = 1, doCommon-if2 = 1
37OSP Algorithm Walkthrough
skip = 1, sampling = false
Hot list: doA-doCommon = 1, doCommon-if2 = 1
38OSP Algorithm Walkthrough
skip = 0, length = 2, sampling = true
Hot list: doA-doCommon = 1, doCommon-if2 = 1
39OSP Algorithm Walkthrough
skip = 0, length = 1, sampling = true
Hot list: doA-doCommon = 1, doCommon-if2 = 1
40OSP Algorithm Walkthrough
skip = 6, sampling = false
Hot list: doA-doCommon = 1, doCommon-if2 = 2
41After 1000 Iterations
doCommon-if2 253
if1-doA 130
if2-doD 127
if1-doB 122
if2-doC 118
if1-doA-..-if2 65
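Reporting the hottest subpaths from such a table is a sort by count. A minimal sketch, assuming the hot list is available as a map from subpath name to sampled count (the class and method names are hypothetical):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class HotListReport {
    /** Returns the k subpaths with the largest counts, hottest first. */
    static List<String> hottest(Map<String, Integer> hotList, int k) {
        return hotList.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Applied to the two-block counts on this slide with k = 2, it returns doCommon-if2 followed by if1-doA.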
42Prototype Implementation
- Written in Java, using the Soot Framework
- Handles full Java
- Low memory overhead (50kB)
- Low sampling overhead (5-50%)
- Sampling + skipping overhead (current implementation): 30-360%
- High accuracy on tested benchmarks
43Prototype Implementation
- Limited to paths of length 2^n
- Favorable tradeoff
- Simple encoding
- Tested for practical performance
- Gives more weight to shorter paths
- Only implementation details!
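A power-of-two length chooser that favours shorter paths might look like the sketch below. The halving distribution is an assumption; the slides state only that lengths are powers of two and that shorter paths get more weight.

```java
import java.util.Random;

final class PathLengthChooser {
    /** Returns 2^k for some k in [1, maxExp]; each extra doubling has probability 1/2. */
    static int choosePathLength(int maxExp, Random rng) {
        int exp = 1;
        while (exp < maxExp && rng.nextBoolean()) {
            exp++;
        }
        return 1 << exp;
    }
}
```

With maxExp = 16 the lengths range over 2, 4, ..., 64k, and a length of 2 is drawn about half of the time. A different bias only requires changing the loop condition.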
44Results Runtime Overhead
Program    Full overhead (%)   Sampling overhead (%)
JLex       93                  47
FFT        36                  16
HeapSort   204                 56
MolDyn     241                 32
RayTrace   361                 41
javac      79                  7
45Results Memory Overhead
Program    Program memory (bytes)   Profiler memory (bytes)
JLex       169,728                  43,304
FFT        107,416                  39,742
HeapSort   107,400                  48,960
MolDyn     111,800                  40,864
RayTrace   108,106                  65,800
46Results Accuracy (FFT)
Rank in sample   Accurate rank   Count error
1                1               0.94
2                2               0.11
3                3               1.00
4                4               0.76
5                6               29.27
6                11              3.04
7                12              0.76
47Results Incremental (FFT)
True rank   6    12   18   24   30   36
1           2    6    1    2    2    1
2           3    4    2    1    1    2
3           1    2    3    3    3    4
4           4    1    8    4    4    3
5           5    5    7    5    5    5
48Related Work
- Ball, Larus: Efficient path profiling (MICRO 1996)
- Larus: Whole program paths (PLDI 1999)
- Melski, Reps: Interprocedural path profiling (CC 1999)
- Traub, Schechter, Smith: Ephemeral instrumentation for lightweight program profiling (2000)
- Sastry, Bodik, Smith: Rapid profiling via stratified sampling (Computer Architecture 2001)
- Bala, Duesterwald, Banerjia: Dynamo: a transparent dynamic optimization system (PLDI 2000)
49Related Work
- Ball-Larus path profiler (MICRO 1996) and extensions
- Only acyclic paths
- Whole Program Path (Larus, PLDI 1999)
- Uses an alphabet representing acyclic paths
- Compact image of a whole program trace
- Not online
50Related Work
- Dynamo (PLDI 2000)
- A dynamic compiler for native code
- Locates hot traces and optimizes them
- Limits places where hot traces may start
- It would be interesting to integrate OSP into
Dynamo
51Limitations
- Results are only an approximation
- Other methods are approximations as well
- Guaranteed confidence and accuracy as function of hotness
- Context not taken into account
- Robust, works for arbitrary subpaths
- Stand alone tool
- Integrate into existing tools
52Conclusions
- We have presented a framework for online subpath profiling
- We have a reference implementation
- Simple
- Efficient
- Accurate
54Extensions
- Post-processing
- Reconstruct longer paths from the sampled subpaths
- More memory efficient path representation
- Different path length bias
55Online Subpath Profiler
- Based on an adaptive sampling technique
- At any moment during execution can report hottest subpaths encountered so far
- Can easily be adapted to different requirements