1
Predicting Conditional Branches With Fusion-Based
Hybrid Predictors
Gabriel H. Loh, Yale University, Dept. of Computer Science
Dana S. Henry, Yale University, Depts. of Elec. Eng. & Comp. Sci.
This research was funded by NSF Grant MIP-9702281.
2
The Branch Prediction Problem
[Figure: pipeline stages from PC compute to branch resolution]
  • 1 out of 5 instructions is a branch
  • May require many cycles to resolve
  • Pentium 4 has a 20-cycle branch resolution pipeline
  • Future pipeline depths are likely to increase
    [Sprangle02]
  • Predict branches to keep pipeline full
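
A rough feel for the cost can be had from a back-of-the-envelope calculation. The sketch below uses the slide's figures (1 branch per 5 instructions, 20-cycle resolution); the 5% misprediction rate and the choice to charge the full resolution depth per misprediction are illustrative assumptions, not numbers from the talk.

```python
def mispredict_cycles_per_instr(branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles lost to branch mispredictions per instruction."""
    return branch_fraction * mispredict_rate * penalty_cycles


# 1 in 5 instructions is a branch, 20-cycle resolution (from the slide).
# The 5% misprediction rate is an assumed value for illustration only.
lost = mispredict_cycles_per_instr(1 / 5, 0.05, 20)
print(f"about {lost:.2f} cycles lost per instruction")
```

Under these assumptions, roughly 0.2 cycles per instruction go to mispredictions, which is large for a machine that aims to retire several instructions per cycle.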

3
Bigger Predictors More Accurate
(but bigger predictors slower)
  • Larger predictors tend to yield more accurate
    predictions
  • Faster cycle times force smaller branch
    predictors
  • Overriding predictor couples a small, fast
    predictor with a large, multi-cycle predictor
    [Jiménez2000]
  • Performs close to an ideal large, fast predictor
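
The overriding idea can be sketched per branch as follows. The function name and the bubble cost model (a 1-cycle fast predictor, so a disagreement wastes `slow_latency - 1` fetch cycles) are assumptions made for illustration, not details from the talk.

```python
def overriding_predict(fast_pred, slow_pred, slow_latency):
    """Return (final_prediction, bubble_cycles) for one branch.

    The fast prediction steers fetch immediately. When the slow predictor
    finishes and disagrees, its prediction overrides the fast one and the
    instructions fetched in the meantime are squashed.
    """
    if fast_pred == slow_pred:
        return slow_pred, 0
    # Assumed cost model: the fast predictor takes 1 cycle, so the wrong
    # path was fetched for (slow_latency - 1) cycles.
    return slow_pred, slow_latency - 1


print(overriding_predict(True, True, 4))
print(overriding_predict(False, True, 4))
```

When the two predictors usually agree, the slow predictor's latency is mostly hidden, which is why the scheme can approach an ideal large, fast predictor.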

4
Hybrid Predictors
  • Wide variety of branch prediction algorithms
    available
  • Hybrid combines more than one stand-alone or
    component predictor [McFarling93]

[Figure: components P1 and P2 with a meta-predictor producing the final prediction]
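
A minimal sketch of a selection-based hybrid in this style: a table of 2-bit counters, indexed by branch address, picks which of two component predictions to use. The class name, table size, initial counter value, and the rule of training only on disagreement are assumptions chosen for illustration.

```python
class SelectionHybrid:
    """Meta-predictor that selects between two component predictions."""

    def __init__(self, table_bits=10):
        self.mask = (1 << table_bits) - 1
        # 2-bit saturating counters; a value >= 2 means "trust P2".
        # Starting at 2 is an arbitrary, assumed initial state.
        self.meta = [2] * (1 << table_bits)

    def predict(self, pc, p1, p2):
        return p2 if self.meta[pc & self.mask] >= 2 else p1

    def update(self, pc, p1, p2, taken):
        i = pc & self.mask
        if p1 != p2:  # train only when the components disagree
            if p2 == taken:
                self.meta[i] = min(3, self.meta[i] + 1)
            else:
                self.meta[i] = max(0, self.meta[i] - 1)


hybrid = SelectionHybrid()
for _ in range(2):
    hybrid.update(0x400, p1=True, p2=False, taken=True)
print(hybrid.predict(0x400, p1=True, p2=False))
```

Note that the final prediction is always one of the component predictions; the selector never produces an answer that no component offered.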
5
Multi-Hybrids
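[Figure: two prior organizations. Multi-Hybrid [Evers96]: components P1, P2, ..., Pn with a priority encoder producing the final prediction. Quad-Hybrid [Evers00]: components P1, P2, P3, P4 with meta-predictors M1, M2, M3 producing the final prediction.]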
6
Our Idea: Prediction Fusion
[Figure: components P1, P2, P3, ..., Pn; label: Prediction Selection]
7
Early Attempt from ML
[Figure: eight weighted components P1 to P8; the two weighted sums are 0.487 and 0.513]
P2, P6 and P7 say not-taken
P1, P3, P4, P5 and P8 say taken
  • Weighted Majority algorithm [LW94]
  • Better predictors get assigned larger weights
  • Make final prediction with larger sum
  • Predictor with largest weight not always correct
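
The bullets above can be sketched as follows. This is the standard multiplicative-update form of Weighted Majority; the class name, the penalty factor `beta = 0.5`, the unnormalized weights, and breaking ties toward taken are assumptions for illustration.

```python
class WeightedMajority:
    """Combine n binary predictions by comparing weighted vote sums."""

    def __init__(self, n, beta=0.5):
        self.weights = [1.0] * n
        self.beta = beta  # assumed penalty factor, 0 < beta < 1

    def predict(self, votes):
        taken = sum(w for w, v in zip(self.weights, votes) if v)
        not_taken = sum(w for w, v in zip(self.weights, votes) if not v)
        return taken >= not_taken  # ties go to taken (arbitrary choice)

    def update(self, votes, outcome):
        # Shrink the weight of every component that voted wrongly.
        self.weights = [
            w * self.beta if v != outcome else w
            for w, v in zip(self.weights, votes)
        ]


wm = WeightedMajority(3)
votes = (True, False, False)
print(wm.predict(votes))   # two equal-weight components outvote the first
wm.update(votes, outcome=True)
print(wm.predict(votes))   # the first component's weight now matches the other two
```

Because only weights change, the scheme can never predict correctly when every component is wrong, which is the limitation the pathological example later in the talk targets.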

8
Outline
  • COLT Predictor
  • Choosing parameters and components
  • Performance
  • Prediction distributions, component choice

9
COLT Organization
[Figure: COLT organization. Labels: components P1, P2, P3, ..., Pn with example outputs 1 0 1 0; branch address; branch history; mapping table; VMT; final prediction.]
10
Pathological Example
P1, P2 and P3 all predict 0 (not-taken)
Actual outcome: 1 (taken)
11
Example (contd)
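Selection: P1, P2 and P3 all predict 0, so whichever component is selected, the outcome is always wrong
COLT: the component outputs 0 0 0 index a VMT entry (shown as 1 1 0 1) that yields prediction 1; COLT can recognize and remember this pattern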
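
A minimal sketch of this behavior, assuming the organization described in the slides: branch address and history bits pick a mapping table, and the component outputs pick a counter inside it. The class name, the concatenated address/history index, and the initial counter value are assumptions, not details taken from the talk.

```python
class ColtSketch:
    """Fusion table: (address, history) picks a mapping table,
    the vector of component predictions picks a counter in it."""

    def __init__(self, n_components, addr_bits, hist_bits, counter_bits=4):
        self.addr_bits = addr_bits
        self.hist_bits = hist_bits
        self.max = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)
        n_tables = 1 << (addr_bits + hist_bits)
        # Assumed initial state: every counter just below the taken threshold.
        self.vmt = [[self.threshold - 1] * (1 << n_components)
                    for _ in range(n_tables)]

    def _locate(self, pc, history, preds):
        addr = pc & ((1 << self.addr_bits) - 1)
        hist = history & ((1 << self.hist_bits) - 1)
        table = (addr << self.hist_bits) | hist  # assumed: simple concatenation
        entry = 0
        for p in preds:
            entry = (entry << 1) | int(p)
        return table, entry

    def predict(self, pc, history, preds):
        t, e = self._locate(pc, history, preds)
        return self.vmt[t][e] >= self.threshold

    def update(self, pc, history, preds, taken):
        t, e = self._locate(pc, history, preds)
        c = self.vmt[t][e]
        self.vmt[t][e] = min(self.max, c + 1) if taken else max(0, c - 1)


colt = ColtSketch(n_components=3, addr_bits=2, hist_bits=2)
all_wrong = (0, 0, 0)           # every component says not-taken
colt.update(0x40, 0b01, all_wrong, taken=True)
print(colt.predict(0x40, 0b01, all_wrong))
```

After a single update the entry for "all components say not-taken" predicts taken, something no selector over the same three components could do.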
12
COLT Lookup Delay
[Figure: timing diagram (time axis) of a COLT lookup, from the component predictors P1, P2, ..., Pn to the final prediction]
13
Design Choices
  • # of branch address bits
  • # of branch history bits
    (address and history bits together determine the
    number of mapping tables)
  • # of components
    (determines the size of individual MTs)
  • Choice of components
    • gshare, PAs, gskewed, ...
    • History length, PHT size, ...
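
How these parameters translate into storage can be sketched with simple arithmetic: one mapping table per combination of address and history bits, and one counter per combination of component outputs. The function name and the example values are assumptions chosen for illustration.

```python
def colt_table_bits(addr_bits, hist_bits, n_components, counter_bits):
    """Storage of the mapping tables alone, in bits (components excluded)."""
    n_tables = 2 ** (addr_bits + hist_bits)     # number of mapping tables
    entries_per_table = 2 ** n_components       # one entry per output vector
    return n_tables * entries_per_table * counter_bits


# Illustrative values: 3 address bits, 8 history bits, 3 components,
# 4-bit counters.
bits = colt_table_bits(addr_bits=3, hist_bits=8, n_components=3, counter_bits=4)
print(bits // 8 // 1024, "KB")
```

Each extra component doubles the size of every mapping table, so the number of components is the most expensive knob.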
14
Predictor Components
  • Global History
    • gshare [McFarling93]
    • Bi-Mode [Lee97]
    • Enhanced gskewed [Michaud97]
    • YAGS [Eden98]
  • Local History
    • PAs [Yeh94]
    • pskewed [Evers96]
  • Other
    • 2bC (bimodal) [Smith81]
    • Loop [Chang95]
    • Alloyed perceptron [Jiménez02]

History lengths optimized on test data sets.
Total of 59 configurations. Sizes vary up to 64KB.
15
Huge Search Space
  • 2^59 ways to choose components
  • ? ways to choose COLT parameters
  • We use a genetic search

Gene format: one bit per candidate component, plus VMT size and history length fields
bit k = 0 means don't include Pk; bit k = 1 means do include Pk
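
Decoding such a gene might look like the sketch below. The field order, the field widths, and the encoding of VMT size as a base-2 logarithm are all assumptions; the slides only say that the gene holds one inclusion bit per component plus VMT size and history length.

```python
def decode_gene(bits, n_components, vmt_bits, hist_bits):
    """Decode a '0'/'1' string into a candidate COLT configuration.

    Assumed layout: n_components inclusion bits, then a VMT-size field,
    then a history-length field.
    """
    if len(bits) != n_components + vmt_bits + hist_bits:
        raise ValueError("gene length does not match the field widths")
    chosen = [k + 1 for k in range(n_components) if bits[k] == "1"]
    vmt_field = bits[n_components:n_components + vmt_bits]
    hist_field = bits[n_components + vmt_bits:]
    return {
        "components": chosen,                  # indices k of the included Pk
        "log2_vmt_size": int(vmt_field, 2),    # assumed log2 encoding
        "history_length": int(hist_field, 2),
    }


# Toy gene with 5 candidate components instead of 59.
print(decode_gene("10100" + "1011" + "0111", 5, 4, 4))
```

A genetic search would then mutate and recombine such bit strings, scoring each decoded configuration by its misprediction rate on the tuning traces.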
16
Methodology
  • SPEC2000 integer benchmarks
  • For tuning/optimization: 10M branches from the test input set
  • For evaluation: 500M branches from the train input set
  • Skipped first 100M branches
  • Compiled with cc -arch ev6 -O4 -fast -non_shared
  • SimpleScalar simulator
    • sim-safe for trace collection
    • MASE for ILP simulations

17
Genetic Search COLT Results
Name  Size (KB)  VMT size  Counter width  History length  Components
a     16         2048      4              8               alpct(34/10), gskewed(12), gshare(8)
b     32         8192      4              7               alpct(34/10), gshare(15), gshare(9), PAs(7)
g     64         16384     4              10              alpct(40/14), gshare(16), YAGS(11), pskewed(6)
d     128        16384     4              7               alpct(40/14), alpct(38/14), gshare(16), gskewed(13), YAGS(12), PAs(8)
h     256        32768     4              4               alpct(50/18), alpct(34/10), gshare(18), Bi-Mode(16), gskewed(15), PAs(8)
18
Overall Predictor Performance
19
Per-Benchmark Performance
20
ILP Performance
  • Simulated CPU
    • 6-issue
    • 20-cycle pipeline
    • Same functional units, latencies, caches as Intel
      P4/NetBurst microarchitecture

Predictor configurations:
  • 1-cycle 2bC
  • 4-cycle OR alpct
  • 4-cycle OR COLT
  • Ideal 1-cycle COLT
21
ILP Impact
22
COLT Parameter Sensitivity
  • Mapping table counter widths
  • Number of mapping tables
  • Number of history bits for VMT index

23
Counter Width
24
VMT Size
25
History Length
26
Explaining Choice of Components
  • Parameter sensitivity results show the GA performed
    well for the COLT parameters
  • Why did it choose the component predictors that
    it did?

27
Classifying COLT Predictions
  • We examined the b (32KB) COLT config.
  • For each mapping table lookup, we examine the
    neighboring entries

[Figure: example lookup with component outputs P1 P2 P3 P4 = 1 0 0 1]
entry 1001: counter 1111 (T)
entry 0001: counter 0010 (NT)
entry 1101: counter 1001 (T)
28
Classifying Predictions (contd)
[Figure: 32KB COLT with components gshare(9), gshare(14), PAs(7), alpct(34/10)]
Classes
  • easy: all neighboring entries agree
  • short: only gshare(9) distinguishes
  • long: only gshare(14) distinguishes
  • local: only PAs(7) distinguishes
  • perceptron: only alpct(34/10) distinguishes
  • multi-length: mix of gshare(9), gshare(14) or alpct
  • mixed: both global and local components
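
One way to read these classes is sketched below. It assumes that a "neighboring entry" is one reached by flipping a single component's output bit, and that a component "distinguishes" when that flip changes the mapping table's prediction; the function name, the bit ordering, and this reading of the classes are assumptions.

```python
def classify(table_preds, entry, components):
    """Classify one mapping-table lookup.

    table_preds: list of bools (True = taken), indexed by entry number.
    components:  list of (name, class_label), most significant bit first.
    """
    n = len(components)
    distinguishing = []
    for k, comp in enumerate(components):
        neighbor = entry ^ (1 << (n - 1 - k))   # flip component k's bit
        if table_preds[neighbor] != table_preds[entry]:
            distinguishing.append(comp)
    if not distinguishing:
        return "easy"
    if len(distinguishing) == 1:
        return distinguishing[0][1]
    labels = {label for _, label in distinguishing}
    return "mixed" if "local" in labels else "multi-length"


COMPONENTS = [("gshare(9)", "short"), ("gshare(14)", "long"),
              ("PAs(7)", "local"), ("alpct(34/10)", "perceptron")]

preds = [True] * 16
preds[0b0001] = False            # only flipping the first component changes the answer
print(classify(preds, 0b1001, COMPONENTS))
```

With the table above, the lookup at entry 1001 is classified as "short", because only the gshare(9) bit changes the outcome.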

29
Prediction Classifications
30
Related Work/Issues
  • Alloyed history [Skadron00]
  • Variable path history length [Stark98]
  • Dynamic history length fitting [Juan98]
  • Interference reduction [lots]
  • COLT handles all of these cases
  • Doesn't support partial update policies

31
Open Research
  • Better individual components
  • Augment with SBI [Manne99], agree [Sprangle97]
  • Better fusion algorithms
  • Hybrid fusion/selection algorithms
  • Other domains (branch confidence prediction,
    value prediction, memory dependence prediction,
    instruction criticality prediction, ...)

32
Summary
  • Fusion is more powerful than selection
  • Combines multiple sources of information
  • Branch behavior is very varied
  • Need long, short, global and local histories,
    multiple simultaneous lengths and types of
    history
  • COLT is one possible fusion-based predictor
  • Combines multiple types of information
  • Current best purely dynamic predictor

33
Questions?