TOR AAMODT - PowerPoint PPT Presentation


1
The Predictability of Computations that Produce
Unpredictable Outcomes
  • TOR AAMODT
  • (aamodt_at_eecg.utoronto.ca)
  • Andreas Moshovos, Paul Chow
  • Electrical and Computer Engineering
  • University of Toronto
  • Canada

2
Outcome-Based Prediction
  • History of outcomes leading up to branch X:
    TNTTNTT ... NTN ... TNTTNTT
  • Next time we encounter X after the history TNTTNT, we can
    predict the outcome of branch X to be T.
  • Why this works: locality in the outcome stream.
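The idea on this slide can be illustrated with a toy history-indexed predictor. This is only a minimal sketch, not the baseline hardware used in the study: the class name `HistoryPredictor`, the 6-outcome history length, and the default guess of taken are all assumptions made for the example.

```python
class HistoryPredictor:
    """Toy outcome-based predictor: remembers the last outcome seen
    after each (branch, recent-history) pair. Hypothetical names."""

    def __init__(self, history_len=6):
        self.history_len = history_len
        self.history = ""   # most recent outcomes, e.g. "TNTTNT"
        self.table = {}     # (branch_pc, history) -> last outcome seen

    def predict(self, pc):
        # Assumed default when the history has never been seen: taken.
        return self.table.get((pc, self.history), "T")

    def update(self, pc, outcome):
        # Record what followed the current history, then shift the history.
        self.table[(pc, self.history)] = outcome
        self.history = (self.history + outcome)[-self.history_len:]


def train(predictor, pc, outcomes):
    """Feed a string of 'T'/'N' outcomes for one branch into the predictor."""
    for outcome in outcomes:
        predictor.update(pc, outcome)


p = HistoryPredictor()
train(p, "X", "TNTTNTT" * 5 + "TNTTNT")
print(p.predict("X"))   # history is now TNTTNT
```

A predictor of this kind only helps when the same history tends to be followed by the same outcome, which is the outcome locality the slide refers to.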
3
Problem
  • Unpredictable branches: THE problem.
  • No Outcome-Locality

4
Operation-Based Prediction
[Figure: example computation add, ld, slt, bne]
  • Find locality in the computations that produce
    the outcome

5
This Work
  • First work that looks at the fundamental program
    behaviour that would facilitate operation-based
    prediction.
  • Related work
  • Characterization of slices
  • Prefetching loads / pre-execution of branches

6
Ideally...
  • Slice (i.e., slice trace) will always be the
    same.
  • Slice will contain very few operations spanning
    large portion of original program.
  • Easy (fast) to pre-compute.

7
Terminology
  • Lead: earliest instruction in slice
  • Target: branch we want to precompute

[Figure: example slice add, ld, slt, bne]
8
What Should a Slice be?
  • Committed instructions: window of 32, 64, 128, or 256
  • Ignore control flow: retain side effect of JAL on r31
  • Memory dependence (M): follow resolved load-store
    dependences
  • Restrict instructions: R = max 1/4, U = no restriction

[Figure: instruction window between FETCH and COMMIT, with an arrow labeled "older"]
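A minimal sketch of how a slice could be extracted from a window of committed instructions under these rules. It follows register dependences only, so the memory-dependence option (M) is not modeled, and the `Inst` record, the `backward_slice` function, and the `max_len` cap (standing in for the restricted variant) are hypothetical names introduced for illustration. A JAL would simply appear as an instruction whose destination is r31.

```python
from typing import NamedTuple, Optional, Tuple


class Inst(NamedTuple):
    op: str
    dest: Optional[str]      # destination register, or None (e.g. a branch)
    srcs: Tuple[str, ...]    # source registers


def backward_slice(window, max_len=None):
    """Return indices (oldest first) of the instructions the last entry
    of `window` depends on through registers.

    `window` holds committed instructions, oldest first; the last entry is
    the target branch. Control flow is ignored. `max_len` optionally caps
    the number of instructions kept in the slice.
    """
    target = len(window) - 1
    needed = set(window[target].srcs)   # registers still awaiting a producer
    picked = [target]
    for i in range(target - 1, -1, -1): # walk from younger to older
        inst = window[i]
        if inst.dest is not None and inst.dest in needed:
            if max_len is not None and len(picked) >= max_len:
                break
            needed.discard(inst.dest)   # this producer satisfies the register
            needed.update(inst.srcs)    # now its own inputs are needed
            picked.append(i)
    return picked[::-1]


window = [
    Inst("add", "r1", ("r2", "r3")),
    Inst("sub", "r9", ("r9", "r8")),    # unrelated to the branch
    Inst("ld",  "r4", ("r1",)),
    Inst("slt", "r5", ("r4", "r6")),
    Inst("bne", None, ("r5",)),
]
print([window[i].op for i in backward_slice(window)])
```

In this example the first instruction returned plays the role of the lead and the final branch is the target.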
9
Methodology
  • 12 programs from SPEC2000
  • Baseline Outcome Prediction Hardware
  • 64K gshare, 64K bimodal, with 64K selector
  • 64-entry RAS
  • sim-outorder (SimpleScalar 3.0)
  • 8-way, 128-entry RUU, 64-entry fetch buffer
  • 64K dual L1, 256K unified L2
  • 64-entry LSQ
  • Perfect Memory Disambiguation

10
Measuring Slice Locality
  • locality(1): probability that the same slice was seen
    last time. A high value of locality(1) indicates
    that last-operation-based slice prediction would
    work well.
  • locality(N): probability that the same slice was seen
    among the last N unique slices.

11
Measuring Slice Locality
  • Save the FOUR unique, most recent slice traces
    per static branch (only on misprediction).
  • Each time a mispredicted branch is encountered
    check whether the slice trace was the most
    recent, 2nd most recent, etc...
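The bookkeeping described above can be sketched as a small most-recently-used list per static branch. This is an illustrative reconstruction under stated assumptions, not the authors' tooling: the function names `measure_locality` and `locality` are hypothetical, and a slice trace is represented by any hashable value such as a tuple of opcodes.

```python
from collections import defaultdict


def measure_locality(events, depth=4):
    """`events` is an iterable of (branch_pc, slice_trace) pairs, one per
    mispredicted dynamic branch. Returns (hits, total), where hits[pc][k]
    counts how often the trace matched the (k+1)-th most recent unique
    trace and total[pc] counts all events for that branch."""
    recent = defaultdict(list)               # pc -> unique traces, newest first
    hits = defaultdict(lambda: [0] * depth)
    total = defaultdict(int)
    for pc, trace in events:
        total[pc] += 1
        mru = recent[pc]
        if trace in mru:
            hits[pc][mru.index(trace)] += 1  # rank 0 = most recent
            mru.remove(trace)
        mru.insert(0, trace)                 # trace becomes the most recent
        del mru[depth:]                      # keep only `depth` unique traces
    return hits, total


def locality(hits, total, pc, n):
    """locality(n): fraction of events whose trace was among the last n
    unique traces saved for this branch."""
    return sum(hits[pc][:n]) / total[pc]


events = [("X", t) for t in ["A", "A", "B", "A", "B", "C", "A"]]
hits, total = measure_locality(events)
print(hits["X"], total["X"])
```

With this layout, locality(1) is the fraction of rank-0 hits, and locality(N) adds in the hits at ranks 1 through N-1.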

12
Measuring Slice Locality
  • All results are weighted averages.
  • Result for each static branch weighted
    proportionally to the number of times the
    operation-based predictor mispredicted it.
  • Characteristics of branches that cause most
    mispredictions emphasized.
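As a small worked example of this weighting, the sketch below averages per-branch locality values using misprediction counts as weights. The function name `weighted_locality` and the numbers in the example are made up for illustration and are not results from the study.

```python
def weighted_locality(per_branch):
    """`per_branch` is a list of (locality, misprediction_count) pairs,
    one per static branch. Returns the misprediction-weighted average."""
    total_weight = sum(count for _, count in per_branch)
    if total_weight == 0:
        return 0.0
    return sum(loc * count for loc, count in per_branch) / total_weight


# Hypothetical data: one branch with many mispredictions, one with few.
branches = [(0.9, 900), (0.1, 100)]
print(weighted_locality(branches))
```

The frequently mispredicted branch dominates the average, which is the emphasis the slide describes; an unweighted mean of the same two values would be 0.5.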

13
Unrestricted Slices (32UM)
[Chart: slice locality per benchmark; labeled benchmarks include gcc, equake, ammp, bzip]
  • Saving ONE slice captures most of the locality.
14
Restricted vs. Unrestricted
[Chart: slice locality for 32UM and 32RM; labeled benchmarks include gcc, equake, ammp, bzip]
  • Most slices have few instructions.
15
Effect of Memory Dependence
[Chart: slice locality for 64R and 64RM; labeled benchmarks include gcc, equake, ammp, bzip]
  • Tracking memory dependences does not affect locality much.
16
Window Size
[Chart: slice locality for 32RM, 64RM, 128RM, and 256RM; labeled benchmarks include gcc, equake, ammp, bzip]
  • Locality is good even for large windows.
17
Effect of Selection Context (128RM)
[Chart: slice locality when slices are saved "On Mispredict" vs. "Always"; labeled benchmarks include gcc, equake, ammp, bzip]
  • Focusing on mispredictions improves locality.
18
Idealized Predictor
[Figure: Lead PC]
  • Spawn and execute instantaneously when lead
    operation is encountered.
  • Store up to 4 slice traces per lead operation

19
Idealized Predictor
  • Match operations and register dependencies as
    instructions are fetched.
  • After matching there is usually only one
    prediction per target, if any (>80% of the time).
  • Tie-breaker 1: longest lead-target distance.
  • Tie-breaker 2: most recently detected slice.
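The two tie-breakers can be sketched as a lexicographic comparison over the candidate predictions that survive matching. The `Candidate` record, its fields, and the `choose` function are hypothetical and exist only to illustrate the ordering; how the idealized predictor actually represents candidates is not stated on the slide.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    prediction: str    # predicted outcome, "T" or "N"
    distance: int      # lead-to-target distance, in instructions
    detected_at: int   # time stamp of the slice's most recent detection


def choose(candidates):
    """Return the prediction to use for a target, or None if no slice matched.

    Tie-breaker 1: prefer the longest lead-target distance.
    Tie-breaker 2: among equal distances, prefer the most recent detection.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: (c.distance, c.detected_at))
    return best.prediction


matches = [
    Candidate("N", distance=12, detected_at=40),
    Candidate("T", distance=30, detected_at=10),
    Candidate("N", distance=30, detected_at=25),
]
print(choose(matches))
```

Comparing `(distance, detected_at)` tuples applies the second criterion only when the first one ties.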

20
Correcting Mispredictions
[Chart: results for 32RM, 64RM, and 128RM; labeled benchmarks include gcc, equake, ammp, bzip]
  • High coverage of mispredicted branches.
21
Interaction with Outcome-Based Predictor
[Chart: results for 32RM, 64RM, and 128RM; labeled benchmarks include gcc, equake, ammp, bzip]
  • Very little destructive interference.
22
Summary
  • Slice locality for mispredicted branches:
    average of 70% for restricted slices on a 64-entry
    window following load-store dependencies
    (12 SPEC2000 benchmarks).
  • Accuracy of idealized predictor:
    74% of mispredicted branches eliminated.

23
Conclusion
  • First work that looks at the fundamental program
    behaviour, slice-locality, that would facilitate
    predicting slice traces to pre-execute outcomes.
  • SPEC2000 benchmarks show very high slice-locality
    for mispredicted branches.