Title: TOR AAMODT
1The Predictability of Computations that Produce
Unpredictable Outcomes
- TOR AAMODT
- (aamodt_at_eecg.utoronto.ca)
- Andreas Moshovos, Paul Chow
- Electrical and Computer Engineering
- University of Toronto
- Canada
2Outcome-Based Prediction
- History of outcomes leading up to branch X: TNTTNTT ... NTN ... TNTTNTT
- Next time we encounter X after history TNTTNT, we can predict T as the outcome of branch X.
- Why this works: locality in the outcome stream.
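The idea on this slide can be sketched in a few lines. This is a toy illustration only, not the hardware evaluated in the talk: the class name `HistoryPredictor`, the 6-outcome history length, and the default guess of "taken" are all assumptions made for the example.

```python
from collections import defaultdict


class HistoryPredictor:
    """Toy per-branch predictor (hypothetical): remembers which outcome
    last followed each recent-outcome pattern."""

    def __init__(self, history_len=6):
        self.history_len = history_len
        self.history = defaultdict(str)  # branch -> recent outcomes, e.g. "TNTTNT"
        self.table = {}                  # (branch, history) -> outcome seen last time

    def predict(self, branch):
        # Unseen (branch, history) pairs fall back to "T" (an arbitrary default).
        return self.table.get((branch, self.history[branch]), "T")

    def update(self, branch, outcome):
        h = self.history[branch]
        self.table[(branch, h)] = outcome
        # Keep only the most recent `history_len` outcomes.
        self.history[branch] = (h + outcome)[-self.history_len:]


def run(outcomes, branch="X"):
    """Feed a string of 'T'/'N' outcomes; return (correct predictions, predictor)."""
    predictor = HistoryPredictor()
    correct = 0
    for outcome in outcomes:
        correct += predictor.predict(branch) == outcome
        predictor.update(branch, outcome)
    return correct, predictor
```

Feeding the repeating pattern TNTTNTT through `run` leaves a table entry that maps the history TNTTNT to T, which is the lookup described above. A branch whose outcome stream has no such repetition gives this scheme nothing to learn from.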
3Problem
- Unpredictable Branches: THE Problem.
- No Outcome-Locality
4Operation-Based Prediction
add
ld
slt
bne
- Find locality in the computations that produce
the outcome
5This Work
- First work that looks at the fundamental program behaviour that would facilitate operation-based prediction.
- Related work
- Characterization of slices
- Prefetching loads / pre-execution of branches
6Ideally...
- Slice (i.e., slice trace) will always be the same.
- Slice will contain very few operations spanning a large portion of the original program.
- Easy (fast) to pre-compute.
7Terminology
- Lead: earliest instruction in slice
- Target: branch we want to precompute
add
ld
slt
bne
8What Should a Slice be?
- Committed Instructions
- 32-, 64-, 128-, or 256-entry window
- Ignore Control Flow
- retain side-effect of JAL on r31
- Memory Dependence
- follow resolved load-store dependence (M)
- Restrict Instructions
- R: max 1/4; U: no restriction
FETCH ... COMMIT
older
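The slice definition above can be sketched as a backward walk over a window of committed instructions. This is a minimal illustration under stated assumptions, not the authors' tool: the `Inst` record, the `backward_slice` function, the example register names and addresses, and the `max_len` cap (a stand-in for the restricted "R" option, whose exact rule the slide does not spell out) are all hypothetical.

```python
from typing import NamedTuple, Optional


class Inst(NamedTuple):
    pc: int
    op: str
    dest: Optional[str]         # register written, if any
    srcs: tuple                 # registers read
    addr: Optional[int] = None  # resolved address for loads/stores


def backward_slice(window, follow_mem=True, max_len=None):
    """Return the slice (oldest first) that feeds the last instruction in `window`.

    Control flow is ignored: only register (and optionally memory) dependences
    are followed. A JAL is kept only through its write to r31.
    """
    target = window[-1]
    needed_regs = set(target.srcs)
    needed_addrs = set()
    slice_ = [target]
    for inst in reversed(window[:-1]):
        if max_len is not None and len(slice_) >= max_len:
            break
        hit_reg = inst.dest is not None and inst.dest in needed_regs
        hit_mem = follow_mem and inst.op == "st" and inst.addr in needed_addrs
        if not (hit_reg or hit_mem):
            continue
        slice_.append(inst)
        if hit_reg:
            needed_regs.discard(inst.dest)
        if hit_mem:
            needed_addrs.discard(inst.addr)
        needed_regs.update(inst.srcs)
        if inst.op == "ld" and inst.addr is not None:
            needed_addrs.add(inst.addr)  # an older store to this address feeds the load
    return slice_[::-1]


# Hypothetical window, oldest first, ending in the target branch.
window = [
    Inst(0x00, "add", "r1", ("r2", "r3")),
    Inst(0x04, "st", None, ("r9", "r8"), addr=100),
    Inst(0x08, "ld", "r4", ("r1",), addr=100),
    Inst(0x0C, "sub", "r7", ("r7", "r6")),  # unrelated to the branch
    Inst(0x10, "slt", "r5", ("r4", "r0")),
    Inst(0x14, "bne", None, ("r5",)),
]
```

With `follow_mem=False` the slice of this window is add, ld, slt, bne, with the add as lead and the bne as target. With `follow_mem=True` the store to the loaded address is pulled in as well.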
9Methodology
- 12 programs from SPEC2000
- Baseline Outcome Prediction Hardware
- 64K gshare + 64K bimodal w/ 64K selector
- 64-entry RAS
- sim-outorder (SimpleScalar 3.0)
- 8-way, 128-entry RUU, 64-entry fetch buffer
- 64K dual L1, 256K unified L2
- 64-entry LSQ
- Perfect Memory Disambiguation
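For readers unfamiliar with the baseline, a combined gshare/bimodal predictor with a selector can be sketched as below. This is a generic textbook-style sketch, not SimpleScalar's code: the class names, the reading of "64K" as a table entry count, the 2-bit counters, their initial value, and the `pc >> 2` indexing are all assumptions.

```python
class Counters:
    """Table of 2-bit saturating counters (0..3); a value >= 2 means 'taken'."""

    def __init__(self, entries):
        self.entries = entries
        self.table = [1] * entries  # start weakly not-taken (assumption)

    def predict(self, idx):
        return self.table[idx % self.entries] >= 2

    def update(self, idx, taken):
        i = idx % self.entries
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


class HybridPredictor:
    """gshare + bimodal, with a per-branch selector choosing between them."""

    def __init__(self, entries=64 * 1024):  # power of two, so the mask works
        self.gshare = Counters(entries)
        self.bimodal = Counters(entries)
        self.selector = Counters(entries)  # >= 2 means "trust gshare"
        self.ghr = 0                       # global history of recent outcomes
        self.mask = entries - 1

    def _indices(self, pc):
        word = pc >> 2
        return (word ^ self.ghr) & self.mask, word & self.mask

    def predict(self, pc):
        g, b = self._indices(pc)
        if self.selector.predict(b):
            return self.gshare.predict(g)
        return self.bimodal.predict(b)

    def update(self, pc, taken):
        g, b = self._indices(pc)
        g_ok = self.gshare.predict(g) == taken
        b_ok = self.bimodal.predict(b) == taken
        if g_ok != b_ok:  # train the selector only when the components disagree
            self.selector.update(b, g_ok)
        self.gshare.update(g, taken)
        self.bimodal.update(b, taken)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

Both components predict from past outcomes only, which is why branches with no outcome locality defeat them regardless of table size.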
10Measuring Slice Locality
- locality(1): Probability same slice was seen last time. High value of locality(1) indicates that last-operation based slice prediction would work well.
- locality(N): Probability same slice seen in last N unique slices.
11Measuring Slice Locality
- Save the FOUR unique, most recent slice traces per static branch (only on misprediction).
- Each time a mispredicted branch is encountered, check whether the slice trace was the most recent, 2nd most recent, etc.
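The bookkeeping described on these slides can be sketched as a small most-recently-used list per static branch. This is an illustrative sketch, not the authors' simulator code: the `SliceLocality` class and its method names are hypothetical, and counting a branch's first occurrence as a miss in the denominator is an assumption.

```python
from collections import defaultdict


class SliceLocality:
    """Track the `depth` most recent unique slice traces per static branch."""

    def __init__(self, depth=4):
        self.depth = depth
        self.recent = defaultdict(list)               # pc -> traces, most recent first
        self.hits = defaultdict(lambda: [0] * depth)  # hits[pc][k]: matched (k+1)-th most recent
        self.total = defaultdict(int)                 # pc -> mispredictions recorded

    def record(self, branch_pc, slice_trace):
        """Call once per mispredicted dynamic instance of the branch."""
        trace = tuple(slice_trace)
        mru = self.recent[branch_pc]
        self.total[branch_pc] += 1
        if trace in mru:
            self.hits[branch_pc][mru.index(trace)] += 1
            mru.remove(trace)
        mru.insert(0, trace)   # this trace is now the most recent
        del mru[self.depth:]   # keep only `depth` unique traces

    def locality(self, branch_pc, n):
        """Fraction of recorded instances whose trace was among the last n unique traces."""
        if self.total[branch_pc] == 0:
            return 0.0
        return sum(self.hits[branch_pc][:n]) / self.total[branch_pc]
```

For example, recording the trace sequence A, A, B, A, C, A for one branch gives locality(1) = 1/6 and locality(2) = 3/6 under these assumptions.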
12Measuring Slice Locality
- All results are weighted averages.
- Result for each static branch weighted proportionally to the number of times the operation-based predictor mispredicted it.
- Characteristics of branches that cause most mispredictions emphasized.
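The weighting above amounts to a misprediction-weighted mean over static branches. A minimal sketch follows; the function name and the example numbers are made up for illustration and are not results from the talk.

```python
def weighted_locality(per_branch):
    """per_branch: iterable of (locality, mispredict_count), one pair per static branch."""
    pairs = list(per_branch)
    total = sum(count for _, count in pairs)
    if total == 0:
        return 0.0
    return sum(loc * count for loc, count in pairs) / total


# Hypothetical numbers: one branch mispredicted 900 times, another 100 times.
example = [(0.9, 900), (0.2, 100)]
```

Here `weighted_locality(example)` is about 0.83, whereas the unweighted mean of the two localities would be 0.55. The branch responsible for most mispredictions dominates the reported figure.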
13Unrestricted Slices 32UM
[Chart: locality per benchmark for 32UM, with a "Better" direction marker; benchmark labels include gcc, equake, ammp, bzip]
Saving ONE slice captures most of locality.
14Restricted vs. Unrestricted
[Chart: locality per benchmark for 32UM and 32RM, with a "Better" direction marker; benchmark labels include gcc, equake, ammp, bzip]
Most slices have few instructions.
15Effect of Memory Dependence
[Chart: locality per benchmark for 64R and 64RM, with a "Better" direction marker; benchmark labels include gcc, equake, ammp, bzip]
Tracking Dependence Does Not Affect Locality Much.
16Window Size
[Chart: locality per benchmark for 32RM, 64RM, 128RM, and 256RM, with a "Better" direction marker; benchmark labels include gcc, equake, ammp, bzip]
Locality good even for large windows.
17Effect of Selection Context 128RM
[Chart: locality per benchmark for 128RM, "On Mispredict" vs. "Always", with a "Better" direction marker; benchmark labels include gcc, equake, ammp, bzip]
Focusing on Mispredictions Improves Locality.
18Idealized Predictor
Lead PC
- Spawn and execute instantaneously when lead operation is encountered.
- Store up to 4 slice traces per lead operation
19Idealized Predictor
- Match operations and register dependencies as instructions are fetched.
- After matching there is usually only one prediction per target, if any (>80% of time)...
- Tie-breaker 1: longest lead-target distance.
- Tie-breaker 2: most recently detected slice.
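The two tie-breakers can be read as a lexicographic ordering over the candidate predictions that survive matching. The sketch below assumes that reading; the `Candidate` fields and the `choose_prediction` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    outcome: str        # predicted branch outcome, "T" or "N"
    lead_distance: int  # lead-to-target distance, in instructions
    detected_at: int    # when the slice was last detected (larger = more recent)


def choose_prediction(candidates):
    """Pick one prediction for a target branch, or None if no slice matched."""
    if not candidates:
        return None
    # Tie-breaker 1: longest lead-target distance.
    # Tie-breaker 2: most recently detected slice.
    best = max(candidates, key=lambda c: (c.lead_distance, c.detected_at))
    return best.outcome
```

Since matching usually leaves at most one candidate per target, this ordering only matters in the minority of cases where several slices survive.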
20Correcting Mispredictions
[Chart: results per benchmark for 32RM, 64RM, and 128RM; benchmark labels include gcc, equake, ammp, bzip]
High Coverage of Mispredicted Branches
21Interaction with Outcome-Based Predictor
[Chart: results per benchmark for 32RM, 64RM, and 128RM; benchmark labels include gcc, equake, ammp, bzip]
Very Little Destructive Interference
22Summary
- Slice-locality for mispredicted branches
- average of 70% for restricted slices on a 64-entry window following load-store dependencies (12 SPEC2000 benchmarks).
- Accuracy of idealized predictor
- 74% of mispredicted branches eliminated
23Conclusion
- First work that looks at the fundamental program behaviour, slice-locality, that would facilitate predicting slice traces to pre-execute outcomes.
- SPEC2000 benchmarks show very high slice-locality for mispredicted branches.