TOR AAMODT

Slides: 24

Provided by: toraa

1
The Predictability of Computations that Produce
Unpredictable Outcomes

TOR AAMODT
(aamodt_at_eecg.utoronto.ca)
Andreas Moshovos, Paul Chow
Electrical and Computer Engineering
University of Toronto
Canada

2
Outcome-Based Prediction
Outcome stream of branch X: TNTTNTT ...NTN... TNTTNTT
History: the outcomes leading up to branch X (TNTTNT).
Outcome of branch X: the T that follows that history.
Next time we encounter X after TNTTNT, we can
predict T.
Why this works: locality in the outcome stream.
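
The idea on this slide can be sketched in a few lines of Python. This is a toy illustration only, not the baseline hardware evaluated later in the talk: the class name, the table layout, and the 6-outcome history length are all assumptions made for the example.

```python
from collections import defaultdict


class HistoryPredictor:
    """Toy outcome-based predictor (hypothetical, for illustration only)."""

    def __init__(self, history_len=6):
        self.history_len = history_len       # assumed history length
        self.history = defaultdict(str)      # branch -> recent outcomes, e.g. "TNTTNT"
        self.table = {}                      # (branch, history) -> last outcome seen

    def predict(self, branch):
        # Returns "T", "N", or None if this history was never seen for this branch.
        return self.table.get((branch, self.history[branch]))

    def update(self, branch, outcome):
        h = self.history[branch]
        self.table[(branch, h)] = outcome    # remember what followed this history
        self.history[branch] = (h + outcome)[-self.history_len:]
```

Feeding the predictor the stream TNTTNTT for branch X and later the history TNTTNT again makes `predict("X")` return "T", because that outcome followed the same history before. A branch whose outcome stream has no such locality gives this scheme nothing to latch onto.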
3
Problem

Unpredictable branches are THE problem.
They show no outcome locality.

4
Operation-Based Prediction
add
ld
slt
bne

Find locality in the computations that produce
the outcome

5
This Work

First work that looks at the fundamental program
behaviour that would facilitate operation-based
prediction.
Related work:
Characterization of slices
Prefetching loads / pre-execution of branches

6
Ideally...

Slice (i.e., slice trace) will always be the
same.
Slice will contain very few operations spanning
large portion of original program.
Easy (fast) to pre-compute.

7
Terminology

Lead: earliest instruction in slice
Target: branch we want to precompute

add
ld
slt
bne
8
What Should a Slice be?

Committed Instructions:
32, 64, 128, or 256 window
Ignore Control Flow:
retain side-effect of JAL on r31
Memory Dependence:
follow resolved load-store dependence (M)
Restrict Instructions:
R: max 1/4, U: no restriction

Figure: instruction window from FETCH ... COMMIT, with older instructions marked.
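
A minimal sketch of how such a slice could be collected from a window of committed instructions, assuming a simple trace record. The `Inst` tuple, the `backward_slice` function, and the `follow_memory` flag are hypothetical names for this example; the R/U size restriction and the JAL/r31 special case are not modeled.

```python
from collections import namedtuple

# Hypothetical trace record: dest is a register name or None, srcs is a tuple
# of register names, addr is the resolved address for loads/stores (else None).
Inst = namedtuple("Inst", "pc op dest srcs addr")


def backward_slice(window, follow_memory=True):
    """Return the instructions the last entry of `window` depends on.

    `window` is the committed trace, oldest first, ending at the target branch.
    Control flow is ignored: only register (and optionally memory) dataflow
    is followed backward through the window.
    """
    target = window[-1]
    needed_regs = set(target.srcs)
    needed_addrs = set()
    slice_insts = [target]
    for inst in reversed(window[:-1]):
        produces_reg = inst.dest is not None and inst.dest in needed_regs
        produces_mem = (follow_memory and inst.op == "st"
                        and inst.addr in needed_addrs)
        if not (produces_reg or produces_mem):
            continue
        slice_insts.append(inst)
        if produces_reg:
            needed_regs.discard(inst.dest)   # value is now accounted for
        if produces_mem:
            needed_addrs.discard(inst.addr)
        needed_regs.update(inst.srcs)        # its inputs become needed
        if follow_memory and inst.op == "ld":
            needed_addrs.add(inst.addr)      # look for the feeding store
    slice_insts.reverse()                    # oldest (lead) first
    return slice_insts
```

With `follow_memory=False`, a window containing add, st, or, ld, slt, bne (where the add computes the load address) yields the slice add, ld, slt, bne; with memory dependences followed, the store to the same address joins the slice as well.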
9
Methodology

12 programs from SPEC2000
Baseline Outcome Prediction Hardware
64K gshare + 64K bimodal w/ 64K selector
64-entry RAS
sim-outorder (SimpleScalar 3.0)
8-way, 128-entry RUU, 64-entry fetch buffer
64K dual L1, 256K unified L2
64-entry LSQ
Perfect Memory Disambiguation

10
Measuring Slice Locality

locality(1): probability that the same slice was
seen last time. A high value of locality(1) indicates
that last-operation-based slice prediction would
work well.
locality(N): probability that the same slice was seen
among the last N unique slices.

11
Measuring Slice Locality

Save the FOUR unique, most recent slice traces
per static branch (only on misprediction).
Each time a mispredicted branch is encountered
check whether the slice trace was the most
recent, 2nd most recent, etc...
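
The bookkeeping described here can be sketched as follows. This is an illustrative reconstruction, not the authors' simulator code: the class and method names are made up, slice traces are assumed to be hashable values such as tuples, and a slice seen for the first time is assumed to count as a miss in the denominator.

```python
from collections import defaultdict


class SliceLocality:
    """Tracks the most recent unique slice traces per static branch."""

    def __init__(self, depth=4):
        self.depth = depth                   # FOUR traces per branch on the slide
        self.recent = defaultdict(list)      # branch PC -> traces, most recent first
        self.hits = [0] * depth              # hits[i]: matched the (i+1)-th most recent
        self.total = 0                       # mispredicted dynamic branches seen

    def record(self, branch_pc, slice_trace):
        """Call once for each mispredicted dynamic branch."""
        recent = self.recent[branch_pc]
        self.total += 1
        if slice_trace in recent:
            self.hits[recent.index(slice_trace)] += 1
            recent.remove(slice_trace)
        recent.insert(0, slice_trace)        # now the most recent unique trace
        del recent[self.depth:]              # keep only `depth` traces

    def locality(self, n):
        """Fraction of recorded branches whose slice was among the last n unique."""
        return sum(self.hits[:n]) / self.total if self.total else 0.0
```

Note that this sketch pools all branches into one counter; per-branch results would be kept separately if they are to be averaged with weights.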

12
Measuring Slice Locality

All results are weighted averages.
Result for each static branch weighted
proportionally to the number of times the
operation-based predictor mispredicted it.
Characteristics of branches that cause most
mispredictions emphasized.
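
As a small sketch of the weighting, assuming each static branch has already been reduced to a (locality, misprediction count) pair; the function name and input layout are hypothetical:

```python
def weighted_locality(per_branch):
    """Misprediction-weighted average of per-branch locality.

    `per_branch` maps a static branch PC to (locality, mispredictions).
    """
    total = sum(count for _, count in per_branch.values())
    if total == 0:
        return 0.0
    return sum(loc * count for loc, count in per_branch.values()) / total
```

For example, a branch with locality 0.9 and 90 mispredictions combined with a branch with locality 0.1 and 10 mispredictions averages to about 0.82 rather than the unweighted 0.5, so the branch that causes most of the mispredictions dominates the result.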

13
Unrestricted Slices 32UM
Figure: slice locality by benchmark (labels include gcc, equake, ammp, bzip), with the direction of better locality marked.
Saving ONE slice captures most of locality.
14
Restricted vs. Unrestricted
Figure: slice locality by benchmark for 32UM and 32RM (labels include gcc, equake, ammp, bzip), with the direction of better locality marked.
Most slices have few instructions.
15
Effect of Memory Dependence
Figure: slice locality by benchmark for 64R and 64RM (labels include gcc, equake, ammp, bzip), with the direction of better locality marked.
Tracking Dependence Does Not Affect Locality Much.
16
Window Size
Figure: slice locality by benchmark for 32RM, 64RM, 128RM, and 256RM (labels include gcc, equake, ammp, bzip), with the direction of better locality marked.
Locality good even for large windows.
17
Effect of Selection Context 128RM
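Figure: slice locality by benchmark for slices selected On Mispredict vs. Always (labels include gcc, equake, ammp, bzip), with the direction of better locality marked.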
Focusing on Mispredictions Improves Locality.
18
Idealized Predictor
Lead PC

Spawn and execute instantaneously when lead
operation is encountered.
Store up to 4 slice traces per lead operation

19
Idealized Predictor

Match operations register dependencies as
instructions are fetched.
After matching there is usually only one
prediction per target, if any (>80% of the time)...
Tie-breaker 1: longest lead-target distance.
Tie-breaker 2: most recently detected slice.
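
The two tie-breakers can be sketched as a single ordered comparison. The `Candidate` record and its fields are assumptions for this example; the slides do not say how the idealized predictor represents competing predictions.

```python
from collections import namedtuple

# Hypothetical record for one matched slice's prediction:
#   outcome     - predicted branch outcome ("T" or "N")
#   distance    - lead-to-target distance of the slice, in instructions
#   detected_at - logical time at which the slice was detected (larger = newer)
Candidate = namedtuple("Candidate", "outcome distance detected_at")


def choose_prediction(candidates):
    """Pick one prediction for a target branch, or None if there is none."""
    if not candidates:
        return None
    # Tie-breaker 1: longest lead-target distance.
    # Tie-breaker 2: most recently detected slice.
    best = max(candidates, key=lambda c: (c.distance, c.detected_at))
    return best.outcome
```

Because there is usually at most one surviving candidate per target, the comparison only matters in the minority of cases where several slices match.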

20
Correcting Mispredictions
Figure: results by benchmark for 32RM, 64RM, and 128RM (labels include gcc, equake, ammp, bzip).
High Coverage of Mispredicted Branches
21
Interaction with Outcome-Based Predictor
Figure: results by benchmark for 32RM, 64RM, and 128RM (labels include gcc, equake, ammp, bzip).
Very Little Destructive Interference
22
Summary

Slice-locality for mispredicted branches:
average of 70% for restricted slices on a 64-entry
window following load-store dependencies
(12 SPEC2000 benchmarks).
Accuracy of idealized predictor:
74% of mispredicted branches eliminated.

23
Conclusion

First work that looks at the fundamental program
behaviour, slice-locality, that would facilitate
predicting slice traces to pre-execute outcomes.
SPEC2000 benchmarks show very high slice-locality
for mispredicted branches.
