Title: CENG 450 Computer Systems and Architecture Lecture 12
1 CENG 450 Computer Systems and Architecture, Lecture 12
- Amirali Baniasadi
- amirali_at_ece.uvic.ca
2 This Lecture
- Branch Prediction
- Multiple Issue
3 Branch Prediction
- Predicting the outcome of a branch
- Direction
- Taken / Not Taken
- Direction predictors
- Target Address
- PC + offset (Taken) / PC + 4 (Not Taken)
- Target address predictors
- Branch Target Buffer (BTB)
4 Why do we need branch prediction?
- Branch prediction
- Increases the number of instructions available for the scheduler to issue, which increases instruction-level parallelism (ILP)
- Allows useful work to be completed while waiting for the branch to resolve
5 Branch Prediction Strategies
- Static
- Decided before runtime
- Examples
- Always-Not Taken
- Always-Taken
- Backwards Taken, Forward Not Taken (BTFNT)
- Profile-driven prediction
- Dynamic
- Prediction decisions may change during the
execution of the program
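As a minimal sketch of one static strategy listed above, BTFNT can be decided from the branch and target addresses alone. The function name and the example addresses are hypothetical, chosen only for illustration.

```python
def btfnt_predict(branch_pc: int, target_pc: int) -> bool:
    """Return True (predict taken) for a backward branch, False otherwise."""
    # A backward branch jumps to a lower address; loops usually end with one.
    return target_pc < branch_pc


# Hypothetical addresses: a loop-closing branch and a forward skip.
print(btfnt_predict(0x1000, 0x0FF0))  # backward branch
print(btfnt_predict(0x1000, 0x1010))  # forward branch
```

The decision never changes at runtime, which is what makes the scheme static.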
6 What happens when a branch is predicted?
- On misprediction
- No speculative state may commit
- Squash instructions in the pipeline
- Must not allow stores in the pipeline to occur
- Cannot allow stores which would not have happened to commit
- Even for good branch predictors, more than half of the fetched instructions are squashed
7 A Generic Branch Predictor
[Figure: a generic branch predictor. Fetch applies f(PC, x) to produce the predicted stream (PC, T or NT); Resolve produces the actual stream (T or NT). Both streams are shown in execution order.]
- What's f(PC, x)?
- x can be any relevant info; thus far x was empty
8 Bimodal Branch Predictors
- Dynamically store information about the branch behaviour
- Branches tend to behave in a fixed way
- Branches tend to behave in the same way across program execution
- Index a Pattern History Table using the branch address
- 1 bit: branch behaves as it did last time
- Saturating 2-bit counter: branch behaves as it usually does
9 Saturating-Counter Predictors
- Consider a strongly biased branch with an infrequent outcome
- TTTTTTTTNTTTTTTTTNTTTT
- Last-outcome will mispredict twice per infrequent outcome encounter
- TTTTTTTTNTTTTTTTTNTTTT
- Idea: remember the most frequent case
- Saturating counter: hysteresis
- Often called bi-modal predictor
- Captures temporal bias
10 Bimodal Prediction
- Table of 2-bit saturating counters
- Predict the most common direction
- Advantages: simple, cheap, good accuracy
- Bimodal will mispredict once per infrequent outcome encounter
- TTTTTTTTNTTTTTTTTNTTTT
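The difference between last-outcome and 2-bit saturating-counter prediction can be checked on the example stream. This is a minimal sketch for a single branch; the function names and the initial states (last outcome "T", counter at 3, i.e. strongly taken) are assumptions made for illustration.

```python
def last_outcome_mispredictions(stream: str, initial: str = "T") -> int:
    """1-bit scheme: predict whatever the branch did last time."""
    last = initial
    misses = 0
    for outcome in stream:
        if outcome != last:
            misses += 1
        last = outcome
    return misses


def two_bit_mispredictions(stream: str, counter: int = 3) -> int:
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    misses = 0
    for outcome in stream:
        taken = outcome == "T"
        if (counter >= 2) != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


STREAM = "TTTTTTTTNTTTTTTTTNTTTT"
print(last_outcome_mispredictions(STREAM))  # two misses per N in the stream
print(two_bit_mispredictions(STREAM))       # one miss per N in the stream
```

A single N only moves the counter from 3 to 2, so the following T is still predicted taken; that is the hysteresis.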
11 Correlating Predictors
- From program perspective
- Different branches may be correlated
- if (aa == 2) aa = 0
- if (bb == 2) bb = 0
- if (aa != bb) then
- Can be viewed as a pattern detector
- Instead of keeping aggregate history information
- I.e., most frequent outcome
- Keep exact history information
- Pattern of n most recent outcomes
- Example
- BHR: n most recent branch outcomes
- Use PC and BHR (xor?) to access prediction table
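The correlation in the aa/bb example can be checked by enumeration. In this sketch a branch "outcome" means the truth value of the source-level condition, which is an assumption; a compiler may invert the sense of the actual branch instruction. The function name is hypothetical.

```python
from itertools import product


def branch_outcomes(aa: int, bb: int):
    """Return the truth values of the three conditions, in program order."""
    c1 = aa == 2
    if c1:
        aa = 0
    c2 = bb == 2
    if c2:
        bb = 0
    c3 = aa != bb
    return c1, c2, c3


# Whenever the first two conditions are both true, aa == bb == 0,
# so the third condition is always false.
for aa, bb in product(range(4), repeat=2):
    c1, c2, c3 = branch_outcomes(aa, bb)
    if c1 and c2:
        print(aa, bb, "->", c3)
```

A predictor that sees the outcomes of the first two branches in its history can therefore predict the third one exactly in that case.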
12 Pattern-based Prediction
- Nested loops
- for i = 0 to N
- for j = 0 to 3
-
- Branch Outcome Stream for j-for branch
- 11101110111011101110
- Patterns
- 111 -> 0
- 110 -> 1
- 101 -> 1
- 011 -> 1
- 100% accuracy
- Learning time: 4 instances
- Table Index (PC, 3-bit history)
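A minimal sketch of the pattern table for the inner-loop branch follows. It handles a single branch, so the PC part of the index is omitted; each entry stores only the last outcome seen after a history, and unseen histories default to "taken". These simplifications and all names are assumptions, not the scheme from the slide.

```python
def run_pattern_predictor(stream: str, history_bits: int = 3):
    """Predict each outcome from the previous `history_bits` outcomes."""
    table = {}            # history string -> outcome last seen after it
    history = ""
    mispredicted_at = []
    for i, outcome in enumerate(stream):
        if len(history) == history_bits:
            prediction = table.get(history, "1")  # default: predict taken
            if prediction != outcome:
                mispredicted_at.append(i)
            table[history] = outcome
        # Shift the newest outcome into the history register.
        history = (history + outcome)[-history_bits:]
    return table, mispredicted_at


table, misses = run_pattern_predictor("1110" * 5)
print(table)
print(misses)
```

Once each of the four histories has been seen, the table holds the same four patterns listed above and the branch is predicted without error.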
13 Two-level Branch Predictors
- A branch outcome depends on the outcomes of previous branches
- First level: Branch History Registers (BHR)
- Global history / Branch correlation: past executions of all branches
- Self history / Private history: past executions of the same branch
- Second level: Pattern History Table (PHT)
- Use first level information to index a table
- Possibly XOR with the branch address
- PHT: usually saturating 2-bit counters
- Also private, shared or global
14 Gshare Predictor (McFarling)
[Figure: the PC and the global BHR are combined by a function f to index the Branch History Table, which supplies the prediction.]
- PC and BHR can be
- concatenated
- completely overlapped
- partially overlapped
- xored, etc.
- How deep should the BHR be?
- Really depends on program
- But, deeper increases learning time
- May increase quality of information
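A gshare-style predictor using the XOR option can be sketched as follows. The table size (16 entries), the initial counter value (2, weakly taken), dropping the two low PC bits, and the class and function names are all assumptions for illustration.

```python
class Gshare:
    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        self.table = [2] * (1 << index_bits)  # 2-bit counters, weakly taken
        self.ghr = 0                          # global branch history register

    def _index(self, pc: int) -> int:
        # XOR the (word-aligned) PC with the global history.
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        self.table[i] = min(self.table[i] + 1, 3) if taken else max(self.table[i] - 1, 0)
        # Shift the newest outcome into the history.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


def run(predictor, pc, outcomes):
    """Return a list of booleans: was each prediction correct?"""
    correct = []
    for taken in outcomes:
        correct.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return correct


outcomes = [i % 2 == 0 for i in range(40)]  # T, N, T, N, ...
correct = run(Gshare(), 0x40, outcomes)
print(correct.count(False), "mispredictions out of", len(correct))
```

An alternating branch defeats a plain 2-bit counter, but here the two histories (ending in T or ending in N) select different counters, so the predictor settles after a short warm-up.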
15 Hybrid Prediction
- Combining branch predictors
- Use two different branch predictors
- Access both in parallel
- A third table determines which prediction to use
- Two or more predictor components combined
- Different branches benefit from different types of history
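One way to realize the "third table" is a 2-bit chooser counter per entry, sketched below for a single entry. The update policy (move toward whichever component was right, only when they disagree) is an assumption for illustration, as are the function names.

```python
def choose(chooser: int, pred_a: bool, pred_b: bool) -> bool:
    """chooser is a 2-bit counter: 0-1 selects predictor A, 2-3 selects B."""
    return pred_b if chooser >= 2 else pred_a


def update_chooser(chooser: int, pred_a: bool, pred_b: bool, taken: bool) -> int:
    a_ok = pred_a == taken
    b_ok = pred_b == taken
    if b_ok and not a_ok:
        return min(chooser + 1, 3)   # B was right, A was wrong: lean toward B
    if a_ok and not b_ok:
        return max(chooser - 1, 0)   # A was right, B was wrong: lean toward A
    return chooser                   # both right or both wrong: no change


# Start by trusting A; B is right twice in a row while A is wrong.
chooser = 0
for _ in range(2):
    chooser = update_chooser(chooser, pred_a=True, pred_b=False, taken=False)
print(chooser, choose(chooser, pred_a=True, pred_b=False))
```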
16 Issues Affecting Accurate Branch Prediction
- Aliasing
- More than one branch may use the same BHT/PHT entry
- Constructive
- Prediction that would have been incorrect, predicted correctly
- Destructive
- Prediction that would have been correct, predicted incorrectly
- Neutral
- No change in the accuracy
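Destructive aliasing can be illustrated with two branches of opposite bias that map to the same counter. The addresses, table sizes, indexing function and initial counter value below are hypothetical and chosen so that the branches collide in the small table only.

```python
def count_mispredictions(table_size: int, rounds: int = 20) -> int:
    """Interleave an always-taken and a never-taken branch."""
    table = [2] * table_size                     # 2-bit counters, weakly taken
    branches = [(0x100, True), (0x140, False)]   # (pc, actual outcome)
    misses = 0
    for _ in range(rounds):
        for pc, taken in branches:
            i = (pc >> 2) % table_size           # simple PC-based index
            if (table[i] >= 2) != taken:
                misses += 1
            table[i] = min(table[i] + 1, 3) if taken else max(table[i] - 1, 0)
    return misses


print(count_mispredictions(16))   # both branches share entry 0
print(count_mispredictions(256))  # separate entries
```

With the shared entry the taken branch keeps pulling the counter back up, so the not-taken branch is mispredicted every time.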
17 More Issues
- Training time
- Need to see enough branches to uncover pattern
- Need enough time to reach steady state
- Wrong history
- Incorrect type of history for the branch
- Stale state
- Predictor is updated after information is needed
- Operating system context switches
- More aliasing caused by branches in different
programs
18 Performance Metrics
- Misprediction rate
- Mispredicted branches per executed branch
- Unfortunately, the most commonly reported metric
- Instructions per mispredicted branch
- Gives a better idea of the program behaviour
- Branches are not evenly spaced
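The two metrics can be computed side by side. The counts below are made-up numbers for illustration only, not measurements.

```python
def misprediction_rate(mispredicted: int, branches: int) -> float:
    """Mispredicted branches per executed branch."""
    return mispredicted / branches


def instructions_per_misprediction(instructions: int, mispredicted: int) -> float:
    """Average number of instructions between mispredicted branches."""
    return instructions / mispredicted


# Hypothetical run: 1,000,000 instructions, 200,000 branches, 10,000 mispredicted.
rate = misprediction_rate(10_000, 200_000)
ipm = instructions_per_misprediction(1_000_000, 10_000)
print(f"misprediction rate: {rate:.1%}")
print(f"instructions per mispredicted branch: {ipm:.0f}")
```

Two programs with the same misprediction rate can differ widely in the second metric if one of them executes branches far less often.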
19 Upper Limit to ILP: Ideal Machine
Amount of parallelism when there are no branch mispredictions and we're limited only by data dependencies.
- FP: 75 - 150 IPC
- Integer: 18 - 60 IPC
IPC: instructions that could theoretically be issued per cycle.
20 Impact of Realistic Branch Prediction
- Limiting the type of branch prediction.
- FP: 15 - 45 IPC
- Integer: 6 - 12 IPC
21 Multiple Issue
- Multiple issue is the ability of the processor to start more than one instruction in a given cycle.
- Superscalar processors
- Very Long Instruction Word (VLIW) processors
22 1990s Superscalar Processors
- Bottleneck: CPI >= 1
- Limit on scalar performance (single instruction issue)
- Hazards
- Superpipelining? Diminishing returns (hazards + overhead)
- How can we make the CPI = 0.5?
- Multiple instructions in every pipeline stage (super-scalar)
  Cycle  1    2    3    4    5    6    7
  Inst0  IF   ID   EX   MEM  WB
  Inst1  IF   ID   EX   MEM  WB
  Inst2       IF   ID   EX   MEM  WB
  Inst3       IF   ID   EX   MEM  WB
  Inst4            IF   ID   EX   MEM  WB
  Inst5            IF   ID   EX   MEM  WB
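The cycle count in the pipeline diagram follows from a simple formula for an ideal pipeline with no hazards or stalls. That idealization, the function name, and the default of 5 stages are assumptions for this sketch.

```python
import math


def ideal_cycles(n_instructions: int, issue_width: int, stages: int = 5) -> int:
    """Cycles to finish n instructions when `issue_width` enter per cycle."""
    # The last issue group enters at cycle ceil(n / width) and needs
    # `stages` cycles in total to drain.
    return stages + math.ceil(n_instructions / issue_width) - 1


print(ideal_cycles(6, 2))               # the six instructions above, 2-wide
print(ideal_cycles(6, 1))               # the same six on a scalar pipeline
print(ideal_cycles(1000, 2) / 1000)     # CPI approaches 0.5 for long runs
```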
23 Superscalar vs. VLIW
- Religious debate, similar to RISC vs. CISC
- Wisconsin and Michigan (superscalar) vs. Illinois (VLIW)
- Q. Who can schedule code better, hardware or software?
24 Hardware Scheduling
- High branch prediction accuracy
- Dynamic information on latencies (cache misses)
- Dynamic information on memory dependences
- Easy to speculate (and recover from mis-speculation)
- Works for generic, non-loop, irregular code
- Ex: databases, desktop applications, compilers
- Limited reorder buffer size limits lookahead
- High cost/complexity
- Slow clock
25 Software Scheduling
- Large scheduling scope (full program), large lookahead
- Can handle very long latencies
- Simple hardware with fast clock
- Only works well for regular codes (scientific, FORTRAN)
- Low branch prediction accuracy
- Can improve by profiling
- No information on latencies like cache misses
- Can improve by profiling
- Pain to speculate and recover from mis-speculation
- Can improve with hardware support
26 Superscalar Processors
- Pioneer: IBM (America -> RIOS, RS/6000, Power-1)
- Superscalar instruction combinations
- 1 ALU or memory or branch + 1 FP (RS/6000)
- Any 1 + 1 ALU (Pentium)
- Any 1 ALU or FP + 1 ALU + 1 load + 1 store + 1 branch (Pentium II)
- Impact of superscalar
- More opportunity for hazards (why?)
- More performance loss due to hazards (why?)
27 Superscalar Processors
- Issues varying number of instructions per clock
- Scheduling: static (by the compiler) or dynamic (by the hardware)
- Superscalar has a varying number of instructions/cycle (1 to 8), scheduled by compiler or by HW (Tomasulo).
- IBM PowerPC, Sun UltraSparc, DEC Alpha, HP 8000
28 Elements of Advanced Superscalars
- High performance instruction fetching
- Good dynamic branch and jump prediction
- Multiple instructions per cycle, multiple branches per cycle?
- Scheduling and hazard elimination
- Dynamic scheduling
- Not necessarily: Alpha 21064 and Pentium were statically scheduled
- Register renaming to eliminate WAR and WAW
- Parallel functional units, paths/buses/multiple register ports
- High performance memory systems
- Speculative execution
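Register renaming can be sketched as giving every destination a fresh physical register while sources read the latest mapping. The instruction format (destination, list of sources), the register names, and the unlimited supply of physical registers are simplifying assumptions for illustration.

```python
from itertools import count


def rename(instructions):
    """Rename (dest, [sources]) tuples so each write gets a fresh register."""
    mapping = {}                                 # architectural -> physical
    fresh = (f"p{i}" for i in count())           # p0, p1, p2, ...
    renamed = []
    for dest, sources in instructions:
        # Sources read the most recent mapping (true RAW dependences stay).
        new_sources = [mapping.get(s, s) for s in sources]
        new_dest = next(fresh)
        mapping[dest] = new_dest
        renamed.append((new_dest, new_sources))
    return renamed


program = [
    ("r1", ["r2", "r3"]),   # writes r1
    ("r4", ["r1", "r5"]),   # reads r1 (RAW on the first instruction)
    ("r1", ["r6", "r7"]),   # writes r1 again: WAW with 1st, WAR with 2nd
]
for line in rename(program):
    print(line)
```

After renaming, the third instruction writes a different physical register than the first, so it no longer has to wait for the second instruction to read its operand.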
29 SS + DS + Speculation
- Superscalar + Dynamic scheduling + Speculation
- Three great tastes that taste great together
- CPI >= 1?
- Overcome with superscalar
- Superscalar increases hazards
- Overcome with dynamic scheduling
- RAW dependences still a problem?
- Overcome with a large window
- Branches a problem for filling large window?
- Overcome with speculation
30 The Big Picture
[Figure: static program -> fetch and branch predict -> issue -> execution -> reorder and commit]
31 Readings
- New paper on branch prediction online. READ.
- Material will be used in the THIRD quiz