1
CSC 4250Computer Architectures
  • October 31, 2006
  • Chapter 3. Instruction-Level Parallelism and
    Its Dynamic Exploitation

2
Simple 5-Stage Pipeline
  • Branch prediction may not help 5-stage pipeline
  • IF ID EX MEM WB
  • We decode branch instruction, test branch
    condition, and compute branch address during ID
  • No gain in predicting branch outcome in ID
  • How to speed up branch prediction?

3
How to Reduce Branch Penalty
  • 5-stage pipeline: IF ID EX MEM WB
  • Predict fetched instruction as a branch instr.
    - Decide that instr. just fetched is a branch
    during IF
  • Predict target instruction and fetch it next -
    No need to compute address for next instr.
  • Branch penalty becomes zero cycle if prediction
    is correct
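The mechanism above can be sketched as a toy branch-target buffer consulted during IF: on a hit, the predicted target is fetched next with no address computation. This is a minimal illustrative model, not a real design; the class name, size, and direct-mapped indexing are assumptions made for the sketch.

```python
# Toy branch-target buffer (BTB): maps a branch PC to its predicted
# target so the next fetch address is known during IF.
# Sizes and indexing here are illustrative only.

class BranchTargetBuffer:
    def __init__(self, size=16):
        self.size = size
        self.entries = {}              # index -> (tag, target)

    def _index(self, pc):
        return (pc >> 2) % self.size   # direct-mapped on word address

    def lookup(self, pc):
        """Return the predicted target if pc hits; None on a miss."""
        entry = self.entries.get(self._index(pc))
        if entry and entry[0] == pc:
            return entry[1]            # predicted taken: fetch target next
        return None                    # not predicted to be a branch

    def update(self, pc, target):
        """Insert or overwrite an entry after a taken branch resolves."""
        self.entries[self._index(pc)] = (pc, target)

btb = BranchTargetBuffer()
btb.update(0x400, 0x480)               # a taken branch at PC 0x400
print(hex(btb.lookup(0x400)))          # hit: fetch 0x480 next
print(btb.lookup(0x404))               # miss: fetch PC+4 as usual
```

On a hit, the fetch stage redirects to the buffered target in the same cycle, which is why a correct prediction costs zero cycles.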

4
Figure 3.19. A Branch Target Buffer
5
Figure 3.20. Steps to handle an instruction with
a branch-target buffer
6
Figure 3.21
  • Penalties, assuming that we store only taken
    branches in the buffer
  • If the branch is not correctly predicted, the
    penalty is equal to one clock cycle to update the
    buffer with the correct information (during which
    an instruction cannot be fetched) and one clock
    cycle to restart fetching the next correct
    instruction for the branch
  • If the branch is not found and taken, a two-cycle
    penalty is encountered, during which time the
    buffer is updated

Instruction in buffer   Prediction   Actual branch   Penalty cycles
Yes                     Taken        Taken           0
Yes                     Taken        Not taken       2
No                      -            Taken           2
No                      -            Not taken       0
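The four cases of Figure 3.21 can be encoded directly; this sketch simply restates the table as a function, assuming (as the slide does) that only taken branches are stored in the buffer.

```python
# Penalty cycles from Figure 3.21, assuming the buffer stores only
# taken branches (so every buffered branch is predicted taken).

def btb_penalty(in_buffer, actually_taken):
    if in_buffer:
        # buffered => predicted taken; mispredict costs a flush + update
        return 0 if actually_taken else 2
    # not buffered => predicted not taken; a taken branch costs 2 cycles
    return 2 if actually_taken else 0

assert btb_penalty(True,  True)  == 0
assert btb_penalty(True,  False) == 2
assert btb_penalty(False, True)  == 2
assert btb_penalty(False, False) == 0
```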
7
Example (p. 211)
  • Determine the total branch penalty for a
    branch-target buffer assuming the penalty cycles
    from Figure 3.21
  • The following assumptions are made
  • Prediction accuracy is 90% (for instructions in
    the buffer)
  • Hit rate in the buffer is 90% (for branches
    predicted taken)
  • Assume that 60% of the branches are taken

8
Answer (p. 211)
  • Compute the penalty by looking at two events: the
    branch is predicted taken but ends up being not
    taken, and the branch is taken but is not found
    in the buffer. Both carry a penalty of two cycles
  • Probability (branch in buffer, but actually not
    taken)
    = Percent buffer hit rate × Percent incorrect
    predictions
    = 90% × 10% = 0.09
  • Probability (branch not in buffer, but actually
    taken) = 10%
  • Branch penalty = (0.09 + 0.10) × 2 = 0.38
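The arithmetic of the example can be checked in a few lines (variable names are mine, the numbers are the slide's):

```python
# Reproducing the branch-penalty arithmetic of the example (p. 211).
hit_rate = 0.90          # fraction of taken branches found in the buffer
accuracy = 0.90          # prediction accuracy for buffered branches

p_wrong_in_buffer = hit_rate * (1 - accuracy)   # in buffer, not taken: 0.09
p_taken_not_found = 0.10                        # taken, but buffer miss

penalty = (p_wrong_in_buffer + p_taken_not_found) * 2   # 2 cycles each
print(round(penalty, 2))   # 0.38 cycles per branch
```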

9
Comparison
  • Branch-Target Buffer (BTB) versus
  • Branch-Prediction Buffer (BPB)
  • Shape, size, and contents
  • Which stage in pipeline?
  • How to find an entry?
  • Placement of an entry
  • Replacement of an entry
  • With BTB, why need BPB?
  • Does BPB save any clock cycles?
  • If predicted NT, should branch instr. be kept in
    BTB?

10
Variation of Branch-Target Buffer (p. 211)
  • Store one or more target instructions instead of,
    or in addition to, the predicted target address
  • Two potential advantages
  • Allow the branch-target buffer access to take
    longer than the time between successive
    instruction fetches, possibly allowing a larger
    branch-target buffer
  • Allow us to perform an optimization called branch
    folding

11
Branch Folding (p. 213)
  • Use branch folding to obtain zero-cycle
    unconditional branches
  • Consider a branch-target buffer that buffers
    instructions from the predicted path and is being
    accessed with the address of an unconditional
    branch. The only function of the unconditional
    branch is to change the PC. Thus, when the
    branch-target buffer signals a hit and indicates
    that the branch is unconditional, the pipeline
    can simply substitute the instruction from the
    branch-target buffer in place of the instruction
    that is returned from the cache (which is the
    unconditional branch).
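The substitution described above can be sketched as a fetch step that, on a BTB hit for an unconditional branch, returns the buffered target instruction instead of the branch itself. This is a toy model of the idea, not any real pipeline; the instruction strings and table layout are made up for illustration.

```python
# Toy fetch stage with branch folding: the BTB stores, for each
# unconditional branch, the instruction at its target. On a hit the
# branch is "folded" away and the target instruction is supplied in
# its place, for a zero-cycle unconditional branch.

icache = {0x100: "j 0x200", 0x200: "add r1,r2,r3"}
btb = {0x100: ("unconditional", 0x200, "add r1,r2,r3")}

def fetch(pc):
    entry = btb.get(pc)
    if entry and entry[0] == "unconditional":
        kind, target, target_instr = entry
        # substitute the buffered target instruction for the branch
        return target_instr, target + 4
    return icache[pc], pc + 4          # normal fetch

instr, next_pc = fetch(0x100)
print(instr)   # the target's add, not the jump: the jump never executes
```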

12
Integrated Instruction Fetch Unit
  • An instruction fetch unit that integrates several
    functions
  • Integrated branch prediction - the branch
    predictor becomes a part of the integrated unit
    and is constantly predicting branches, so as to
    drive the fetch pipeline
  • Instruction prefetch - the unit autonomously
    manages prefetching, integrating it with branch
    prediction
  • Instruction memory access and buffering - the
    unit uses prefetching to hide the cost of
    crossing cache blocks; it also provides
    buffering, to provide instructions to the issue
    stage as needed and in the quantity needed.
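The interaction of these functions can be sketched as a fetch unit that runs ahead of issue, predicting the next fetch PC each cycle and filling a buffer that the issue stage drains on demand. This is purely illustrative; the predictor, widths, and interfaces are assumptions of the sketch.

```python
# Toy integrated fetch unit: each cycle it predicts the next fetch PC
# and prefetches instructions into a buffer, from which the issue
# stage takes as many instructions as it needs.

from collections import deque

class FetchUnit:
    def __init__(self, predictor, icache):
        self.predictor = predictor     # callable: pc -> predicted next pc
        self.icache = icache           # dict: pc -> instruction
        self.buffer = deque()
        self.pc = 0

    def cycle(self, fetch_width=2):
        for _ in range(fetch_width):   # prefetch ahead of the issue stage
            self.buffer.append(self.icache[self.pc])
            self.pc = self.predictor(self.pc)

    def issue(self, n):
        """Hand up to n buffered instructions to the issue stage."""
        k = min(n, len(self.buffer))
        return [self.buffer.popleft() for _ in range(k)]

icache = {0: "i0", 4: "i1", 8: "i2", 12: "i3"}
fu = FetchUnit(lambda pc: pc + 4, icache)   # trivial fall-through predictor
fu.cycle()
fu.cycle()
print(fu.issue(3))   # ['i0', 'i1', 'i2']
```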

13
Return Address Predictor
  • Want to predict indirect jumps, i.e., jumps whose
    destination address varies at run time
  • Vast majority of indirect jumps come from
    procedure returns (85% for SPEC89)
  • May predict procedure returns with a
    branch-target buffer. But accuracy will be low if
    procedure is called from multiple sites and the
    calls from one site are not clustered in time
  • What can we do?
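The standard answer, anticipating Figure 3.22, is a small return-address stack: calls push the return address, returns pop it as the prediction. A minimal sketch, with depth and overflow policy (drop the oldest entry) chosen arbitrarily for illustration:

```python
# Return-address stack predictor: a call pushes its return address,
# a return pops the top as the predicted target. A small stack works
# well because call depths are typically modest.

class ReturnAddressStack:
    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, return_pc):
        if len(self.stack) == self.depth:
            self.stack.pop(0)          # overflow: drop the oldest entry
        self.stack.append(return_pc)

    def predict_return(self):
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack()
ras.on_call(0x1004)                    # call from site A
ras.on_call(0x2008)                    # nested call from site B
print(hex(ras.predict_return()))       # 0x2008: innermost return first
print(hex(ras.predict_return()))       # 0x1004
```

Because each call site pushes its own return address, accuracy stays high even when a procedure is called from many different sites, which is exactly where a BTB does poorly.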

14
Figure 3.22. Prediction accuracy for a return
address buffer operated as a stack
  • The accuracy is the fraction of return addresses
    predicted correctly. Since call depths are
    typically not large, a modest buffer works well.