Title: Research on Branch Prediction Algorithms
Baozhen Yu, Wenyuan Xu, Xiaoxuan Li
Department of Electrical and Computer Engineering, Rutgers University
Agenda
- Why branch prediction?
- Our branch prediction simulator
- Our simulator system environment
- Some branch prediction schemes and experimental results
- Comparison
- Conclusions
- Contributions from team members
Why Do We Need Branch Prediction?
How an Instruction Is Processed
Processing can be divided into five stages:
- Instruction fetch
- Instruction decode
- Execute
- Memory access
- Write back
Instruction-Level Parallelism
To speed up processing, pipelining overlaps the execution of multiple instructions, exploiting the parallelism between instructions.
(Figure: overlapped five-stage pipeline; stages: instruction fetch, instruction decode, execute, memory access, write back)
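This small sketch makes the overlap concrete by counting cycles on an idealized pipeline. It assumes every stage takes exactly one cycle and nothing ever stalls, and the function names are hypothetical:

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline: fill the pipe once, then finish one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


if __name__ == "__main__":
    n, k = 100, 5  # 100 instructions on a five-stage pipeline
    print(unpipelined_cycles(n, k))  # 500
    print(pipelined_cycles(n, k))    # 104
```

For large n the ratio n·k / (k + n - 1) approaches k, so an ideal five-stage pipeline approaches a fivefold speedup. Stalls such as branch bubbles are what keep real pipelines from reaching it.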
Control Hazards: Branches
Conditional branches create a problem for pipelining: the next instruction can't be fetched until the branch has executed, several stages later.
(Figure: a branch instruction moving through the pipeline)
Pipelining and Branches
Pipelining overlaps instructions to exploit
parallelism, allowing the clock rate to be
increased. Branches cause bubbles in the
pipeline, where some stages are left idle.
(Figure: five-stage pipeline stalled behind an unresolved branch instruction)
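This sketch shows where the bubbles come from. It assumes no prediction at all and that a branch's outcome is known only at the end of the Execute stage (stage 3 of 5). The resolve stage and the function names are illustrative assumptions, since real pipelines resolve branches at different points:

```python
from typing import List


def fetch_cycles(is_branch: List[bool], resolve_stage: int = 3) -> List[int]:
    """Cycle in which each instruction is fetched when branches are not predicted.

    An instruction fetched in cycle c finishes stage s at the end of cycle
    c + s - 1, so the instruction after a branch cannot be fetched until
    cycle c + resolve_stage, leaving resolve_stage - 1 idle fetch slots.
    """
    cycles = []
    next_fetch = 1
    for branch in is_branch:
        cycles.append(next_fetch)
        next_fetch += resolve_stage if branch else 1
    return cycles


def total_cycles(is_branch: List[bool], n_stages: int = 5, resolve_stage: int = 3) -> int:
    """Cycle in which the last instruction leaves the final stage."""
    if not is_branch:
        return 0
    return fetch_cycles(is_branch, resolve_stage)[-1] + n_stages - 1


if __name__ == "__main__":
    program = [False, True, False, False]  # the second instruction is a branch
    print(fetch_cycles(program))  # [1, 2, 5, 6]: two bubbles after the branch
    print(total_cycles(program))  # 10, versus 8 with no branch
```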
Branch Prediction
A branch predictor allows the processor to
speculatively fetch and execute instructions down
the predicted path.
(Figure: five-stage pipeline speculatively executing instructions down the predicted path)
Branch predictors must be highly accurate to
avoid mispredictions!
Branch Predictors Must Improve
- The cost of a misprediction is proportional to pipeline depth.
  - The Pentium 4 pipeline has 20 stages.
  - Future pipelines will have more than 32 stages.
- As pipelines deepen, we need more accurate branch predictors.
- Our research on branch prediction is therefore based mainly on prediction accuracy.
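A common back-of-the-envelope model makes the point concrete: each mispredicted branch adds a flush penalty to the average cycles per instruction (CPI). This sketch assumes the penalty equals the pipeline depth. The branch frequency and miss rate below are hypothetical numbers, not measurements:

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  miss_rate: float, penalty_cycles: float) -> float:
    """Average CPI once misprediction flushes are charged."""
    return base_cpi + branch_fraction * miss_rate * penalty_cycles


def miss_rate_for_same_cost(miss_rate: float, old_depth: int, new_depth: int) -> float:
    """Miss rate a deeper pipeline needs to keep the same misprediction cost."""
    return miss_rate * old_depth / new_depth


if __name__ == "__main__":
    # Hypothetical workload: 20% of instructions are branches, 5% of those mispredict.
    print(effective_cpi(1.0, 0.20, 0.05, 20))     # ≈ 1.20 (20-stage pipeline)
    print(effective_cpi(1.0, 0.20, 0.05, 32))     # ≈ 1.32 (32-stage pipeline)
    print(miss_rate_for_same_cost(0.05, 20, 32))  # ≈ 0.031
```

Under these assumptions, going from 20 to 32 stages means the miss rate must fall from 5% to about 3.1% just to keep the same misprediction cost.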
Our BP Simulator System Environment
Simulator System
- Our benchmarks are meant to represent general-purpose computing, since sorting is a typical task in both integer and floating-point programs. They use quicksort and heapsort, two of the most widely used sorting algorithms.
- The data file collects two kinds of information:
  - Conditional branch address
  - Branch taken result
- There is one BP simulator for each branch prediction scheme.
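The slides do not give the data-file format, so this sketch assumes one record per conditional branch, written as a hexadecimal address and a 0/1 taken flag. It instruments a Python quicksort by hand. The branch addresses are hypothetical IDs standing in for real instruction addresses, and a real trace would come from the compiled benchmark:

```python
import random
from typing import List, Tuple

Record = Tuple[int, bool]  # (branch address, taken?)

LOOP_BRANCH = 0x400100  # hypothetical address of the partition loop test
CMP_BRANCH = 0x400104   # hypothetical address of the "a[j] <= pivot" test


def quicksort_trace(data: List[int]) -> List[Record]:
    """Sort a copy of data with Lomuto quicksort, logging each conditional branch."""
    a = list(data)
    trace: List[Record] = []

    def partition(lo: int, hi: int) -> int:
        pivot = a[hi]
        i = j = lo
        while True:
            taken = j < hi
            trace.append((LOOP_BRANCH, taken))
            if not taken:
                break
            cond = a[j] <= pivot
            trace.append((CMP_BRANCH, cond))
            if cond:
                a[i], a[j] = a[j], a[i]
                i += 1
            j += 1
        a[i], a[hi] = a[hi], a[i]
        return i

    # An explicit stack avoids Python's recursion limit on large inputs.
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo < hi:
            p = partition(lo, hi)
            stack.append((lo, p - 1))
            stack.append((p + 1, hi))
    assert a == sorted(data)  # sanity check: instrumentation did not break the sort
    return trace


def format_trace(trace: List[Record]) -> str:
    """One line per branch: hex address, then 1 (taken) or 0 (not taken)."""
    return "\n".join(f"{addr:#x} {int(taken)}" for addr, taken in trace)


def parse_trace(text: str) -> List[Record]:
    records = []
    for line in text.splitlines():
        addr, taken = line.split()
        records.append((int(addr, 16), taken == "1"))
    return records


def miss_rate(predictor, trace: List[Record]) -> float:
    """Fraction of branches mispredicted; predictor needs predict() and update()."""
    misses = 0
    for addr, taken in trace:
        if predictor.predict(addr) != taken:
            misses += 1
        predictor.update(addr, taken)
    return misses / len(trace) if trace else 0.0


class AlwaysTaken:
    """Static baseline: predicts every branch taken."""
    def predict(self, addr: int) -> bool:
        return True

    def update(self, addr: int, taken: bool) -> None:
        pass


if __name__ == "__main__":
    trace = parse_trace(format_trace(quicksort_trace(random.sample(range(100000), 1000))))
    print(len(trace), miss_rate(AlwaysTaken(), trace))
```

Any scheme that offers `predict(addr)` and `update(addr, taken)` can be plugged into `miss_rate`. This is one way to keep a separate simulator per scheme while sharing the trace reader.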
Simulator System: Benchmarks
- Heapsort is much more predictable than quicksort.
- The branch patterns in heapsort are much longer than those in quicksort.
- The patterns in heapsort are more obvious.
- Patterns are more regular when sorting long lists of numbers than when sorting short ones.
- Heapsort has more branch addresses.
Branch Prediction Schemes and Some Experimental Results
Scheme 1: Branch History Table (BHT)
- Uses part of the branch address as the index.
- Uses branch history to predict the outcome.
- More bits per history-table entry yield better performance, but at a higher cost.
(Figure: the branch address indexes the branch history table, which produces the prediction; the branch taken result updates the table.)
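The slides leave the entry format open. This minimal sketch assumes each BHT entry is an n-bit saturating counter, which is a common design. It also assumes 4-byte-aligned instructions, so the two lowest address bits are dropped before indexing. The class and helper names are hypothetical:

```python
from typing import List, Tuple


class BHT:
    """Branch history table of n-bit saturating counters indexed by low address bits."""

    def __init__(self, index_bits: int, counter_bits: int = 2) -> None:
        self.mask = (1 << index_bits) - 1
        self.max_count = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)  # counts >= threshold mean "taken"
        # Every entry starts one step below the threshold (weakly not taken).
        self.table = [self.threshold - 1] * (1 << index_bits)

    def _index(self, addr: int) -> int:
        # Assumes 4-byte instructions, so the two lowest address bits carry no information.
        return (addr >> 2) & self.mask

    def predict(self, addr: int) -> bool:
        return self.table[self._index(addr)] >= self.threshold

    def update(self, addr: int, taken: bool) -> None:
        i = self._index(addr)
        if taken:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = max(self.table[i] - 1, 0)


def count_misses(predictor: BHT, trace: List[Tuple[int, bool]]) -> int:
    misses = 0
    for addr, taken in trace:
        misses += predictor.predict(addr) != taken
        predictor.update(addr, taken)
    return misses


if __name__ == "__main__":
    # A loop branch taken three times, then not taken, repeated 100 times.
    loop = [(0x400100, outcome) for _ in range(100) for outcome in (True, True, True, False)]
    print(count_misses(BHT(index_bits=8, counter_bits=1), loop))  # 200
    print(count_misses(BHT(index_bits=8, counter_bits=2), loop))  # 101
```

On this loop, the 1-bit entry mispredicts twice per loop exit. The 2-bit counter's hysteresis absorbs the single not-taken outcome, so it mispredicts only once per exit.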
BHT Prediction for HeapSort (10000)
- Wider entries (more bits) give a lower miss rate, but the miss rate does not improve indefinitely: if the branch pattern is much shorter than the entry width, more bits hurt performance.
- A bigger BHT gives a lower miss rate because there are fewer conflicts (aliases).
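Aliasing can be checked directly: two branches alias when their addresses map to the same table entry. This small sketch uses hypothetical addresses and assumes the table is indexed by low address bits with 4-byte-aligned instructions:

```python
from collections import defaultdict
from typing import Iterable


def aliased_entries(addresses: Iterable[int], index_bits: int) -> int:
    """Number of table entries shared by two or more distinct branch addresses."""
    owners = defaultdict(set)
    for addr in addresses:
        owners[(addr >> 2) & ((1 << index_bits) - 1)].add(addr)
    return sum(1 for branches in owners.values() if len(branches) > 1)


if __name__ == "__main__":
    branches = [0x400100, 0x400104, 0x400200]  # hypothetical branch addresses
    print(aliased_entries(branches, 6))  # 1: 0x400100 and 0x400200 share an entry
    print(aliased_entries(branches, 8))  # 0: a bigger table separates them
```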
BHT Prediction for QuickSort (10000)
- Table size has little influence on the miss rate; the limit comes from the benchmark itself.
- Wider entries do not match the relatively random branch pattern in quicksort, so they hurt performance.
Scheme 2: 2-Level Adaptive Prediction
- Uses two levels of branch history information to make a prediction.
- Uses the branch address and the pattern of branch history as a two-dimensional index.
- Predicts according to the contents of the Pattern History Table (PHT).
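This is a minimal sketch of a two-level predictor. It assumes per-address history registers in the first level and a PHT of 2-bit saturating counters in the second. The PHT is indexed by concatenating address bits with the history, similar to the per-address (PAp) organization described by Yeh and Patt. The slides do not specify the exact variant, and the parameter names are hypothetical:

```python
from typing import List, Tuple


class TwoLevelPredictor:
    """Per-address branch history registers (level 1) feeding 2-bit counters (level 2)."""

    def __init__(self, addr_bits: int, history_bits: int) -> None:
        self.addr_mask = (1 << addr_bits) - 1
        self.history_bits = history_bits
        self.history_mask = (1 << history_bits) - 1
        self.bhr = [0] * (1 << addr_bits)                   # one history register per entry
        self.pht = [1] * (1 << (addr_bits + history_bits))  # 2-bit counters, weakly not taken

    def _indices(self, addr: int) -> Tuple[int, int]:
        a = (addr >> 2) & self.addr_mask                    # assumes 4-byte instructions
        # Two-dimensional index: address bits select a row, history selects the column.
        return a, (a << self.history_bits) | self.bhr[a]

    def predict(self, addr: int) -> bool:
        return self.pht[self._indices(addr)[1]] >= 2

    def update(self, addr: int, taken: bool) -> None:
        a, p = self._indices(addr)
        self.pht[p] = min(self.pht[p] + 1, 3) if taken else max(self.pht[p] - 1, 0)
        # Shift the newest outcome into this branch's history register.
        self.bhr[a] = ((self.bhr[a] << 1) | int(taken)) & self.history_mask


def misses_per_period(predictor, pattern: List[bool], periods: int,
                      addr: int = 0x400100) -> List[int]:
    """Mispredictions in each repetition of pattern for a single branch."""
    result = []
    for _ in range(periods):
        misses = 0
        for taken in pattern:
            misses += predictor.predict(addr) != taken
            predictor.update(addr, taken)
        result.append(misses)
    return result


if __name__ == "__main__":
    counts = misses_per_period(TwoLevelPredictor(addr_bits=4, history_bits=3),
                               [True, True, True, False], periods=10)
    print(counts)  # [3, 2, 0, 0, 0, 0, 0, 0, 0, 0]
```

With 3 history bits, every position in the taken-taken-taken-not-taken loop sees a distinct history. The predictor therefore stops mispredicting after a short warm-up. A single 2-bit counter, by contrast, keeps missing the loop exit.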
2-Level Adaptive Prediction for HeapSort
2-Level Adaptive Prediction for QuickSort
2-Level Adaptive Prediction: Conclusions
- Using the heapsort benchmark:
  - The wider the PHT, the better the performance. Beyond a PHT width of 64, however, the improvement is small, because no address aliases remain beyond that size.
  - The wider the BHR, the better the performance. Beyond a BHR width of 64, however, the improvement is small, because there are no more patterns to capture.
- Using the quicksort benchmark:
  - A wider PHT or BHR does not always bring better performance, possibly because the branches in quicksort lack obvious patterns.
Scheme 3: A Neural Method
- Perceptron
  - An artificial neural network
  - A simple model of the networks of neurons in the brain
  - Learns to recognize and classify patterns
- A small and fast neural method
- Very high accuracy for branch prediction
Branch-Predicting Perceptron
- Inputs (x's) come from the branch history register.
- Weights (w's) are small integers learned by on-line training.
- Output (y) gives the prediction: the dot product of the x's and w's.
- Training finds correlations between the history and the outcome.
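In Jiménez's formulation [6], each history bit is encoded as x_i = +1 (taken) or -1 (not taken). A bias weight w_0 is paired with a constant input x_0 = 1. For a history of length n, the output is:

```latex
y = w_0 + \sum_{i=1}^{n} x_i \, w_i, \qquad x_i \in \{-1, +1\}
```

The branch is predicted taken when y ≥ 0.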
Training Algorithm
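The training rule for the perceptron predictor in [6] can be written as follows. Here t = +1 if the branch was taken and -1 otherwise, and sign(0) is treated as +1 to match the y ≥ 0 prediction rule. The threshold θ keeps training going until the output is confidently correct; [6] chooses it empirically from the history length n:

```latex
\begin{aligned}
&\text{if } \operatorname{sign}(y) \ne t \ \text{ or } \ |y| \le \theta:
  \quad w_i \leftarrow w_i + t\,x_i, \quad i = 0, 1, \dots, n \\
&\theta = \lfloor 1.93\,n + 14 \rfloor
\end{aligned}
```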
Organization of the Perceptron Predictor
- Keeps a table of perceptrons, indexed by branch address.
- Inputs come from the branch history register.
- Predicts taken if the output is ≥ 0; otherwise predicts not taken.
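Putting that organization together, here is a minimal sketch of a perceptron predictor with a global history register. It trains with the threshold θ = ⌊1.93h + 14⌋ reported in [6]. It assumes the table is indexed by the branch address modulo the number of perceptrons, which also works for table sizes that are not powers of two. It also assumes weights saturate at signed 8-bit limits. Both are illustrative assumptions, and the class name is hypothetical:

```python
import math
from typing import List

WEIGHT_MIN, WEIGHT_MAX = -128, 127  # assumed 8-bit signed weights


def _clamp(w: int) -> int:
    return max(WEIGHT_MIN, min(WEIGHT_MAX, w))


class PerceptronPredictor:
    def __init__(self, num_perceptrons: int, history_len: int) -> None:
        self.num_perceptrons = num_perceptrons
        self.theta = math.floor(1.93 * history_len + 14)  # training threshold
        # Row k holds [w_0 (bias), w_1, ..., w_h] for perceptron k.
        self.weights = [[0] * (history_len + 1) for _ in range(num_perceptrons)]
        # Global history, most recent outcome first: +1 taken, -1 not taken.
        self.history = [-1] * history_len

    def _row(self, addr: int) -> List[int]:
        return self.weights[(addr >> 2) % self.num_perceptrons]

    def _output(self, addr: int) -> int:
        w = self._row(addr)
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, addr: int) -> bool:
        return self._output(addr) >= 0  # taken if y >= 0

    def update(self, addr: int, taken: bool) -> None:
        y = self._output(addr)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self._row(addr)
            w[0] = _clamp(w[0] + t)
            for i, x in enumerate(self.history, start=1):
                w[i] = _clamp(w[i] + t * x)
        # Shift the outcome into the global history register, keeping its length fixed.
        self.history = ([t] + self.history)[:len(self.history)]


if __name__ == "__main__":
    p = PerceptronPredictor(num_perceptrons=163, history_len=4)
    misses = []
    for taken in [True, True, True, False] * 50:
        misses.append(p.predict(0x400100) != taken)
        p.update(0x400100, taken)
    print(sum(misses[:20]), sum(misses[20:]))  # 4 0
```

On this loop the outcome simply repeats the one four branches earlier, so the pattern is linearly separable and the perceptron learns it within the first two iterations.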
Perceptron Prediction vs. History Length
Perceptron Prediction vs. Number of Perceptrons
Comparing Different Schemes
Schemes on the QuickSort Data File
(Figure: comparison of the schemes; x-axis: number of random numbers to be sorted, 10 to 100000)
Schemes on the HeapSort Data File
(Figure: comparison of the schemes; x-axis: number of random numbers to be sorted, 10 to 100000)
Scheme Comparison
(Figure: overall comparison of the schemes; x-axis: number of random numbers to be sorted, 10 to 100000)
Conclusions
- The branch pattern of QuickSort is more random than that of HeapSort, so it is harder to predict, especially when the list to be sorted is small.
- For the BHT and 2-level adaptive predictors, the best parameters depend on the branch pattern. Performance does not improve indefinitely as the table size or branch history register grows; sometimes larger parameters even hurt.
Conclusions (continued)
- The perceptron neural predictor is a smart scheme.
  - It predicts accurately even when the branches have no obvious patterns.
- Better representation
  - The weight for each history branch result is not always 1; more important history bits get higher weights.
- The neural predictor needs less hardware (a 15-bit global history register and 163 perceptrons) to achieve higher prediction accuracy.
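The weight storage of a perceptron table is easy to estimate: each perceptron stores one weight per history bit plus a bias weight. This quick sketch reads the configuration above as a history length of 15 and 163 perceptrons. It assumes 8-bit weights, since the slides do not state the weight width, and the function name is hypothetical:

```python
def perceptron_storage_bits(num_perceptrons: int, history_len: int, weight_bits: int = 8) -> int:
    """Bits of weight storage: (history_len + 1) weights per perceptron, including the bias."""
    return num_perceptrons * (history_len + 1) * weight_bits


if __name__ == "__main__":
    bits = perceptron_storage_bits(163, 15)
    print(bits, bits // 8)  # 20864 bits = 2608 bytes (about 2.5 KB)
```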
Contributions
References
- Some slides are from "Neural Methods for Dynamic Branch Prediction" by Professor Daniel A. Jiménez.
- [1] B. Calder, D. Grunwald, and J. Emer, "A System Level Perspective on Branch Prediction Architecture Performance," Proceedings of the 28th Intl. Symposium on Microarchitecture, pp. 199-206, 1995.
- [2] S.-T. Pan, K. So, and J. T. Rahmeh, "Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation," Fifth Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, Boston, MA, Oct. 1992, pp. 76-84.
- [3] T.-Y. Yeh and Y. N. Patt, "A Comprehensive Instruction Fetch Mechanism for a Processor Supporting Speculative Execution," 25th Intl. Symposium on Microarchitecture, Portland, OR, ACM, Dec. 1992, pp. 129-139.
- [4] S. McFarling, "Combining Branch Predictors," WRL Technical Note TN-36, Digital Equipment Corporation, Jun. 1993.
- [5] P.-Y. Chang, E. Hao, T.-Y. Yeh, and Y. N. Patt, "Branch Classification: A New Mechanism for Improving Branch Predictor Performance," Proceedings of the International Conference on Parallel Processing, 1995.
- [6] D. A. Jiménez, "Neural Methods for Dynamic Branch Prediction," ACM Transactions on Computer Systems, Vol. 20, No. 4, Nov. 2002, pp. 369-397.
Questions?