Title: DynamoSim: A Trace-based Dynamically Compiled Instruction Set Simulator
1. DynamoSim: A Trace-based Dynamically Compiled Instruction Set Simulator
- Wai Sum Mong, Jianwen Zhu
- ECE, University of Toronto
- Nov 8th, 2004
ICCAD '04
2. Outline
- Instruction Set Simulation Techniques
- Our Strategies
- Experiments
- Conclusion
3. Interpretive Simulation
- Pro: simple
- Con: slow
  - Software decoding is slow and redundant
  - Each target instruction is simulated by multiple host instructions
E.g. SimpleScalar (Wisconsin)
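To make the cost concrete, here is a minimal Python sketch of an interpretive simulator for a hypothetical toy ISA (the encoding, opcodes and function names are illustrative assumptions, not SimpleScalar's). Every executed instruction is fetched and decoded again in software, so each target instruction costs many host operations.

```python
# Hypothetical 32-bit toy encoding: [op:8][rd:8][rs:8][x:8], where x is a
# signed immediate (ADDI, BNEZ) or a second source register index (ADD).
ADDI, ADD, BNEZ, HALT = 0, 1, 2, 3

def encode(op, rd=0, rs=0, x=0):
    return (op << 24) | (rd << 16) | (rs << 8) | (x & 0xFF)

def decode(word):
    """Software decoding: shifts, masks and sign extension."""
    op, rd, rs, x = (word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
    if x >= 128:
        x -= 256
    return op, rd, rs, x

def interpret(program, num_regs=8):
    regs, pc, executed = [0] * num_regs, 0, 0
    while True:
        op, rd, rs, x = decode(program[pc])  # re-decoded on every execution
        executed += 1
        if op == ADDI:
            regs[rd] = regs[rs] + x
        elif op == ADD:
            regs[rd] = regs[rs] + regs[x]
        elif op == BNEZ and regs[rs] != 0:
            pc += x                          # taken PC-relative branch
            continue
        elif op == HALT:
            return regs, executed
        pc += 1

program = [
    encode(ADDI, rd=1, rs=0, x=5),   # pc 0: r1 = 5
    encode(ADD, rd=2, rs=2, x=1),    # pc 1: r2 = r2 + r1
    encode(ADDI, rd=1, rs=1, x=-1),  # pc 2: r1 = r1 - 1
    encode(BNEZ, rs=1, x=-2),        # pc 3: if r1 != 0 goto pc 1
    encode(HALT),                    # pc 4
]
regs, executed = interpret(program)
```

The three loop instructions are decoded five times each, although their encodings never change.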
4. Static Compiled Simulation
- Pro: avoids expensive and redundant decoding
- Con: cannot handle dynamic program code
E.g. compiled simulator for DSP architectures (Zivojnovic95); ultra-fast instruction set simulator (Zhu99)
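A minimal sketch of static compiled simulation, using the same kind of hypothetical toy encoding: the whole program is decoded once, before simulation, into host-level functions (Python closures stand in for generated host code; all names are illustrative). The last lines illustrate the limitation above: code modified after compilation is never seen.

```python
# Hypothetical 32-bit toy encoding: [op:8][rd:8][rs:8][x:8]
ADDI, ADD, BNEZ, HALT = 0, 1, 2, 3

def encode(op, rd=0, rs=0, x=0):
    return (op << 24) | (rd << 16) | (rs << 8) | (x & 0xFF)

def decode(word):
    x = word & 0xFF
    if x >= 128:                      # sign-extend the 8-bit field
        x -= 256
    return (word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, x

def translate(word, pc):
    """Decode once, at compile time, and return a host-level function that
    updates the registers and returns the next PC (None on HALT)."""
    op, rd, rs, x = decode(word)
    if op == ADDI:
        def host(regs):
            regs[rd] = regs[rs] + x
            return pc + 1
    elif op == ADD:
        def host(regs):
            regs[rd] = regs[rs] + regs[x]
            return pc + 1
    elif op == BNEZ:
        def host(regs):
            return pc + x if regs[rs] != 0 else pc + 1
    else:                             # HALT
        def host(regs):
            return None
    return host

def compile_program(program):
    # Static compilation: translate every instruction before simulation starts.
    return [translate(word, pc) for pc, word in enumerate(program)]

def run(compiled, num_regs=8):
    regs, pc = [0] * num_regs, 0
    while pc is not None:
        pc = compiled[pc](regs)       # no decoding during simulation
    return regs

program = [
    encode(ADDI, rd=1, rs=0, x=5),    # pc 0: r1 = 5
    encode(ADD, rd=2, rs=2, x=1),     # pc 1: r2 = r2 + r1
    encode(ADDI, rd=1, rs=1, x=-1),   # pc 2: r1 = r1 - 1
    encode(BNEZ, rs=1, x=-2),         # pc 3: if r1 != 0 goto pc 1
    encode(HALT),                     # pc 4
]
compiled = compile_program(program)
program[0] = encode(ADDI, rd=1, rs=0, x=3)  # code changed after compilation...
regs = run(compiled)                        # ...but the stale r1 = 5 version runs
```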
5. Dynamic Compiled Simulation
- Translation
  - Native code is dynamically compiled from a chunk of target instructions
  - Translations are cached for reuse
[Figure: the simulator looks up the PC in the translation cache; on a miss, the simulation compiler translates the target instructions into the translation cache]
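The lookup/compile/execute loop can be sketched as below, under stated assumptions: a hypothetical toy ISA written as (op, rd, rs, x) tuples, basic blocks as the translation unit, Python source generated and compiled with `exec` standing in for native code generation, and a dict playing the translation cache. Names are illustrative.

```python
# Hypothetical toy ISA: (op, rd, rs, x) tuples; BNEZ branches to pc + x.
ADDI, ADD, BNEZ, HALT = "addi", "add", "bnez", "halt"

def compile_block(program, start):
    """Translate the basic block starting at `start` (through its branch or
    HALT) into Python source, then compile it into a callable 'host' routine
    that returns the next target PC (None on HALT)."""
    lines = ["def translation(regs):"]
    pc = start
    while True:
        op, rd, rs, x = program[pc]
        if op == ADDI:
            lines.append(f"    regs[{rd}] = regs[{rs}] + {x}")
        elif op == ADD:
            lines.append(f"    regs[{rd}] = regs[{rs}] + regs[{x}]")
        elif op == BNEZ:
            lines.append(f"    return {pc + x} if regs[{rs}] != 0 else {pc + 1}")
            break
        else:  # HALT
            lines.append("    return None")
            break
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["translation"]

def simulate(program, num_regs=8):
    regs, pc = [0] * num_regs, 0
    cache = {}                       # translation cache: target PC -> routine
    stats = {"compiled": 0, "hits": 0}
    while pc is not None:
        translation = cache.get(pc)  # look up the PC in the translation cache
        if translation is None:      # miss: invoke the simulation compiler
            translation = compile_block(program, pc)
            cache[pc] = translation
            stats["compiled"] += 1
        else:
            stats["hits"] += 1
        pc = translation(regs)       # run the generated code
    return regs, stats

program = [
    (ADDI, 1, 0, 5),   # pc 0: r1 = 5
    (ADD, 2, 2, 1),    # pc 1: r2 = r2 + r1
    (ADDI, 1, 1, -1),  # pc 2: r1 = r1 - 1
    (BNEZ, 0, 1, -2),  # pc 3: if r1 != 0 goto pc 1
    (HALT, 0, 0, 0),   # pc 4
]
regs, stats = simulate(program)
```

The blocks entered at pc 0 and pc 1 overlap: a translation is keyed by its entry PC, so the same target instructions can appear in more than one translation.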
6. Dynamic Compiled Simulation (Cont'd)
- Examples
  - Shade (Cmelik94)
  - Embra (Witchel96)
- Other solutions
  - JIT-CCS (Nohl02): caches decoded information
  - IS-CS (Reshadi03): only decodes dynamic code at runtime
- Pro: flexibility of interpretive simulation
- Pro: performance approaches that of static compilation if translations are repeatedly reused
We propose three techniques to improve dynamic compiled simulation.
7. Our Strategies
- Selective Compilation
  - Applying dynamic compilation only to selected parts of the program code
- Widening Translation Scope
  - Extending the compilation region beyond the scope of basic blocks
- Register Allocation
  - Mapping target registers directly to host registers in translations
8. Selective Compilation: Why?
- Dynamic compilation
  - High (one-time) compilation overhead
  - Low variable (per-execution) cost
  - Lower average cost if the translation is reused
- Interpretation
  - High variable cost
  - Cheaper approach for infrequently executed instructions
Solution: interpretation + compilation
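The trade-off can be stated as a simple break-even model (an illustration with hypothetical symbols, not from the slides): for a code region executed n times, let c_i be the cost of interpreting it once, C the one-time cost of compiling it, and c_e < c_i the cost of executing its translation once.

```latex
\begin{aligned}
T_{\text{interp}}(n) &= n \, c_i \\
T_{\text{comp}}(n) &= C + n \, c_e \\
T_{\text{comp}}(n) < T_{\text{interp}}(n) &\iff n > \frac{C}{c_i - c_e}
\end{aligned}
```

Regions executed fewer times than this break-even count are cheaper to interpret, which is why only selected code is worth compiling.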
9. Selective Compilation: How?
- Interpret
  - Observe program behavior
  - Switch to compile when a profitable code portion is detected
  - Switch to execute when a translation is found
- Compile
  - Compile the profitable code
  - Cache the translation
  - Switch to execute
- Execute
  - Execute a translation
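A minimal sketch of the three modes, assuming a per-block execution counter as the "profitable code" detector and a threshold of 3 (both are illustrative choices here; the hot-trace criterion used by DynamoSim is described on the following slides). The toy ISA, function names and the `exec`-based code generator are hypothetical stand-ins.

```python
# Hypothetical toy ISA: (op, rd, rs, x) tuples; BNEZ branches to pc + x.
ADDI, ADD, BNEZ, HALT = "addi", "add", "bnez", "halt"

def step(insn, pc, regs):
    """Interpret one instruction and return the next PC (None on HALT)."""
    op, rd, rs, x = insn
    if op == ADDI:
        regs[rd] = regs[rs] + x
    elif op == ADD:
        regs[rd] = regs[rs] + regs[x]
    elif op == BNEZ:
        return pc + x if regs[rs] != 0 else pc + 1
    elif op == HALT:
        return None
    return pc + 1

def interpret_block(program, start, regs):
    """INTERPRET mode: run one basic block instruction by instruction."""
    pc = start
    while True:
        nxt = step(program[pc], pc, regs)
        if program[pc][0] in (BNEZ, HALT):
            return nxt
        pc = nxt

def compile_block(program, start):
    """COMPILE mode: generate and compile host (Python) code for one block."""
    lines, pc = ["def translation(regs):"], start
    while True:
        op, rd, rs, x = program[pc]
        if op == ADDI:
            lines.append(f"    regs[{rd}] = regs[{rs}] + {x}")
        elif op == ADD:
            lines.append(f"    regs[{rd}] = regs[{rs}] + regs[{x}]")
        elif op == BNEZ:
            lines.append(f"    return {pc + x} if regs[{rs}] != 0 else {pc + 1}")
            break
        else:
            lines.append("    return None")
            break
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["translation"]

def simulate(program, threshold=3, num_regs=8):
    regs, pc = [0] * num_regs, 0
    cache, counters = {}, {}
    stats = {"interpreted": 0, "compiled": 0, "executed": 0}
    while pc is not None:
        if pc in cache:                      # translation found -> EXECUTE
            pc = cache[pc](regs)
            stats["executed"] += 1
            continue
        counters[pc] = counters.get(pc, 0) + 1
        if counters[pc] > threshold:         # profitable code -> COMPILE
            cache[pc] = compile_block(program, pc)
            stats["compiled"] += 1
            continue                         # next iteration executes it
        pc = interpret_block(program, pc, regs)
        stats["interpreted"] += 1
    return regs, stats

program = [
    (ADDI, 1, 0, 10),  # pc 0: r1 = 10
    (ADD, 2, 2, 1),    # pc 1: r2 = r2 + r1
    (ADDI, 1, 1, -1),  # pc 2: r1 = r1 - 1
    (BNEZ, 0, 1, -2),  # pc 3: if r1 != 0 goto pc 1
    (HALT, 0, 0, 0),   # pc 4
]
regs, stats = simulate(program, threshold=3)
```

Here the loop block is interpreted three times, compiled on its fourth visit, and then executed from the cache for the remaining six iterations; the one-time blocks at pc 0 and pc 4 are only ever interpreted.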
10. Widening Translation Scope: Why?
- Small (basic-block) translations incur:
  - Overhead of prologue and epilogue
  - Traffic between interpreter and executor
  - Cache lookup operations
- Wider translations give the simulator more instruction-level parallelism to exploit
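One way to quantify the per-translation overheads of this slide (an illustrative model with hypothetical symbols, not from the slides): suppose a loop runs N iterations along a hot path that spans b basic blocks, and every entry into a translation pays a fixed overhead o for the prologue, epilogue and translation-cache lookup.

```latex
\begin{aligned}
o &= t_{\text{prologue}} + t_{\text{epilogue}} + t_{\text{lookup}} \\
O_{\text{basic block}} &= N \, b \, o \\
O_{\text{trace}} &= N \, o \qquad \text{(if no side exit is taken)}
\end{aligned}
```

Under this model, widening the translation unit from a basic block to a trace divides the entry overhead on the hot path by roughly b.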
11. Widening Translation Scope: Trace
- Trace
  - Sequence of dynamically executed instructions
  - Spans multiple basic blocks
  - Single entry, multiple exits
- Hot trace
  - Frequently executed path
[Figure: a trace spanning basic blocks A, B, C and D]
12. Trace-based Selective Compilation
- Interpret
  - Watch for a hot trace
  - Switch when a hot trace is identified
- Interpret + Compile
  - Work out the trace
  - Interpret each instruction as it is compiled
  - Switch back to interpret
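A sketch of the interpret + compile step, assuming the trace head is already known: the recorder interprets each instruction while logging the path until a backward taken branch, and the generator emits straight-line code in which each conditional branch becomes a guard with its own exit. The result has one entry and several exits and spans several basic blocks. The toy ISA, all function names and the `exec`-based backend are hypothetical.

```python
# Hypothetical toy ISA: (op, rd, rs, x) tuples; BNEZ branches to pc + x.
ADDI, ADD, BNEZ, HALT = "addi", "add", "bnez", "halt"

def step(insn, pc, regs):
    """Interpret one instruction and return the next PC (None on HALT)."""
    op, rd, rs, x = insn
    if op == ADDI:
        regs[rd] = regs[rs] + x
    elif op == ADD:
        regs[rd] = regs[rs] + regs[x]
    elif op == BNEZ:
        return pc + x if regs[rs] != 0 else pc + 1
    elif op == HALT:
        return None
    return pc + 1

def interpret_and_record(program, head, regs):
    """Interpret from the trace head, recording (pc, next_pc) pairs until
    a backward taken branch (end-of-trace condition) or HALT."""
    path, pc = [], head
    while True:
        nxt = step(program[pc], pc, regs)
        path.append((pc, nxt))
        if nxt is None or nxt <= pc:
            return path, nxt
        pc = nxt

def compile_trace(program, path):
    """Emit straight-line code: one entry, one guarded exit per branch."""
    lines = ["def trace(regs):"]
    for pc, nxt in path:
        op, rd, rs, x = program[pc]
        if op == ADDI:
            lines.append(f"    regs[{rd}] = regs[{rs}] + {x}")
        elif op == ADD:
            lines.append(f"    regs[{rd}] = regs[{rs}] + regs[{x}]")
        elif op == BNEZ and nxt == pc + x:   # recorded taken: exit if not taken
            lines.append(f"    if regs[{rs}] == 0: return {pc + 1}")
        elif op == BNEZ:                     # recorded not taken: exit if taken
            lines.append(f"    if regs[{rs}] != 0: return {pc + x}")
        elif op == HALT:
            lines.append("    return None")
    if path[-1][1] is not None:
        lines.append(f"    return {path[-1][1]}")  # end of trace
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)
    return namespace["trace"], source

program = [
    (ADDI, 1, 0, 10),  # pc 0: r1 = 10
    (BNEZ, 0, 4, 2),   # pc 1: if r4 != 0 goto pc 3 (r4 stays 0)
    (ADD, 2, 2, 1),    # pc 2: r2 = r2 + r1
    (ADDI, 1, 1, -1),  # pc 3: r1 = r1 - 1
    (BNEZ, 0, 1, -3),  # pc 4: if r1 != 0 goto pc 1 (backward branch)
    (HALT, 0, 0, 0),   # pc 5
]
regs = [0] * 8
step(program[0], 0, regs)                     # plain interpretation
path, pc = interpret_and_record(program, 1, regs)  # pc 1 assumed hot
trace, source = compile_trace(program, path)
while pc == 1:
    pc = trace(regs)                          # re-enter the cached trace
```

The generated function has a single entry and three exits: a side exit when r4 != 0, a side exit when the loop counter reaches zero, and the end-of-trace return to the trace head at pc 1.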
How can a hot trace be identified from partial execution?
13. Hot Trace Prediction Methodology: Dynamo
- Hot trace prediction using a partial execution profile
- Dynamo (PLDI 2000, HP Labs)
  - A transparent dynamic optimization system
  - Interprets and optimizes native code in software at runtime
  - This is NOT a simulator!
- Methodology
  - Interprets the application in user mode
  - Watches for a hot trace
  - Optimizes the hot trace and caches it for reuse
Predicting hot traces from interpretation
14. Adapting Dynamo's Hot Trace Prediction Method to DynamoSim
- Identifies hot traces in loops
- Associates a counter with the start of each potential hot trace
- Increments the counter each time the start-of-trace condition is satisfied
- If counter > threshold:
  - Mark it as the start of a hot trace
  - The compilation engine starts
  - Interprets and compiles until the end-of-trace condition is satisfied
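The counter mechanism above can be sketched as follows, assuming one counter per candidate PC and the "counter > threshold" test, with the threshold of 3 used in the experiments as the default. The class and method names are hypothetical.

```python
class HotTracePredictor:
    """One counter per start-of-trace candidate; a candidate becomes a
    hot-trace head once its counter exceeds the threshold."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counters = {}  # candidate PC -> times the start condition held

    def on_start_of_trace(self, pc):
        """Call each time the start-of-trace condition holds at `pc`.
        Returns True when the compilation engine should start at `pc`."""
        self.counters[pc] = self.counters.get(pc, 0) + 1
        if self.counters[pc] > self.threshold:
            # Assumption: once a trace is compiled from here, the translation
            # cache serves this PC, so its counter is no longer needed.
            del self.counters[pc]
            return True
        return False

predictor = HotTracePredictor(threshold=3)
decisions = [predictor.on_start_of_trace(0x400100) for _ in range(4)]
```

With a threshold of 3, the fourth time the start-of-trace condition holds at the same PC triggers compilation.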
15. Dynamo's Hot Trace Prediction Method: Hot Trace Definition
- Start-of-trace candidates (each is assigned a counter)
  - Targets of taken backward branches
    - Potential loop headers
  - Exits of previously identified hot traces
    - Statistically likely that the subsequent trace is hot
- End-of-trace (the compiler stops)
  - Backward taken branches
    - Signal the end of a loop
  - Taken branches whose target addresses hit the translation cache
    - Potential start of an existing translation
    - Checked only on taken branches, since a lookup for every instruction is expensive
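The start- and end-of-trace conditions can be written as two small predicates. This is a sketch under assumptions: PCs are plain integers, a branch is "backward" when its target is not greater than its own PC, the caller reports whether control just left a hot trace, and the translation cache is any container supporting `in`. The function names are hypothetical.

```python
def start_of_trace_candidate(pc, next_pc, taken, exited_hot_trace):
    """Return the PC to count as a start-of-trace candidate, or None.

    pc               -- PC of the instruction just executed
    next_pc          -- PC of the next instruction
    taken            -- whether the instruction was a taken branch
    exited_hot_trace -- whether control just left a hot trace via an exit
    """
    if taken and next_pc <= pc:   # target of a taken backward branch
        return next_pc            # (potential loop header)
    if exited_hot_trace:          # exit of a previously identified hot trace
        return next_pc
    return None

def is_end_of_trace(pc, next_pc, taken, translation_cache):
    """Decide whether trace compilation should stop after this instruction."""
    if not taken:
        return False              # no cache lookup on non-branch instructions
    if next_pc <= pc:             # backward taken branch: end of a loop
        return True
    return next_pc in translation_cache  # target already has a translation
```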
16. Register Allocation: Why?
- Pro: reduces memory traffic
- Pro: reduces translation size
- Con: there may not be enough usable host registers for every target register
17. Register Allocation: How?
- Trace-based
  - Host registers are lazily allocated during compilation
  - Allocated host registers are not released until the translation ends
- If no host register can be allocated, use a scratch register
  - Reserved host registers
  - Temporary mapping
- Commit values in dirty registers back to the simulated registers at the end of the translation
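A minimal sketch of this allocation policy, assuming a three-operand add as the only instruction, two reserved scratch registers, and pseudo host instructions emitted as strings. Class, method and register names are hypothetical; `simRegs` follows the slides' name for the simulated register file.

```python
class TraceRegisterAllocator:
    """Lazy, trace-scoped mapping of target registers to host registers."""

    def __init__(self, host_regs, scratch_regs):
        self.free = list(host_regs)       # allocatable host registers
        self.scratch = scratch_regs       # reserved, used only temporarily
        self.mapping = {}                 # target register -> host register
        self.dirty = set()                # target regs whose host copy changed
        self.code = []                    # emitted pseudo host instructions

    def _alloc(self, treg):
        host = self.free.pop(0)           # kept until the translation ends
        self.mapping[treg] = host
        return host

    def _read(self, treg, scratch):
        if treg in self.mapping:
            return self.mapping[treg]
        host = self._alloc(treg) if self.free else scratch  # lazy allocation
        self.code.append(f"load simRegs[{treg}] -> {host}")
        return host

    def emit_add(self, rd, rs, rt):
        """Translate the target instruction rd = rs + rt."""
        a = self._read(rs, self.scratch[0])
        b = self._read(rt, self.scratch[1])
        if rd in self.mapping or self.free:
            d = self.mapping[rd] if rd in self.mapping else self._alloc(rd)
            self.dirty.add(rd)            # value lives in a host register for now
            self.code.append(f"{d} = {a} + {b}")
        else:                             # temporary mapping to a scratch register
            d = self.scratch[0]
            self.code.append(f"{d} = {a} + {b}")
            self.code.append(f"store {d} -> simRegs[{rd}]")

    def end_translation(self):
        """Commit dirty host registers back to the simulated register file."""
        for treg in sorted(self.dirty):
            self.code.append(f"store {self.mapping[treg]} -> simRegs[{treg}]")
        return self.code

alloc = TraceRegisterAllocator(["h0", "h1"], ("s0", "s1"))
alloc.emit_add(3, 1, 2)   # t3 = t1 + t2
alloc.emit_add(1, 1, 2)   # t1 = t1 + t2
code = alloc.end_translation()
```

In this run t1 and t2 get host registers h0 and h1; t3 finds none free, so it is computed in scratch register s0 and stored at once, while the dirty t1 is committed only when the translation ends.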
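18. Register Allocation: How? (Example)
[Figure: "Translation with RA". Host registers h0 to h4, including a scratch register; an allocation table maps target registers (t1, t2) to host registers. The example loads simRegs[1] into h3 and simRegs[2] into h4, combines h3 and h4 into h2 and stores h2 into simRegs[3]; it then loads simRegs[3] into h0, combines h0 and h4 into h2 and stores h2 into simRegs[3].]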
19. Experiment Setup
- SimpleScalar
- Translation cache
  - 4-way set-associative cache
- Trace prediction
  - Threshold: 3
- Dynamic compilation
  - VCODE (Engler95): a fast dynamic code generation system
- SPEC2000 benchmarks
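A sketch of a 4-way set-associative translation cache keyed by target PC, as used in the experiments. The number of sets, the PC-based set index (word-aligned PCs assumed) and LRU replacement are assumptions, not details given in the slides; the class name is hypothetical.

```python
from collections import OrderedDict

class TranslationCache:
    """Set-associative cache mapping target PCs to translations (LRU per set)."""

    def __init__(self, num_sets=256, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set(self, pc):
        return self.sets[(pc >> 2) % self.num_sets]  # word-aligned PCs assumed

    def lookup(self, pc):
        s = self._set(pc)
        if pc in s:
            s.move_to_end(pc)          # mark as most recently used
            return s[pc]
        return None                    # miss: the caller compiles or interprets

    def insert(self, pc, translation):
        s = self._set(pc)
        if pc in s:
            s.move_to_end(pc)
        elif len(s) >= self.ways:
            s.popitem(last=False)      # evict the least recently used entry
        s[pc] = translation

cache = TranslationCache(num_sets=1, ways=4)
for pc in (0, 4, 8, 12):
    cache.insert(pc, f"t{pc}")
cache.lookup(0)                        # touch pc 0 so it is not the LRU entry
cache.insert(16, "t16")                # set is full: evicts pc 4
```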
20. Experimental Result
[Figure: results for three simulator configurations]
- Compiled: traditional dynamic compiled simulation
- Hybrid: dynamic compiled + selective compilation + trace-based
- Hybrid + RA: dynamic compiled + selective compilation + trace-based + register allocation
21. Experimental Result: Statistics of 181.mcf
22. Conclusions
- We propose three techniques to improve dynamic compiled instruction-set simulation
  - Selective compilation
    - Interpretation + compilation
  - Widening translation scope
    - Extends from basic blocks to traces
  - Register allocation
    - Maps target registers directly to host registers
- Our experimental results show that the proposed techniques are effective