1
CSC 4250Computer Architectures
  • October 31, 2006
  • Chapter 3. Instruction-Level Parallelism and
    Its Dynamic Exploitation

2
Simple 5-Stage Pipeline
  • Branch prediction may not help 5-stage pipeline
  • IF ID EX MEM WB
  • We decode branch instruction, test branch
    condition, and compute branch address during ID
  • No gain in predicting branch outcome in ID
  • How to speed up branch prediction?

3
How to Reduce Branch Penalty
  • 5-stage pipeline: IF ID EX MEM WB
  • Predict fetched instruction as a branch instr.
    - Decide that instr. just fetched is a branch
    during IF
  • Predict target instruction and fetch it next -
    No need to compute address for next instr.
  • Branch penalty becomes zero cycle if prediction
    is correct
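The mechanism above can be sketched as a toy branch-target buffer consulted during IF: on a hit, the predicted target is fetched next with no address computation. This is a minimal illustrative model, not a real design; the class name, size, and direct-mapped indexing are assumptions made for the sketch.

```python
# Toy branch-target buffer (BTB): maps a branch PC to its predicted
# target so the next fetch address is known during IF.
# Sizes and indexing here are illustrative only.

class BranchTargetBuffer:
    def __init__(self, size=16):
        self.size = size
        self.entries = {}              # index -> (tag, target)

    def _index(self, pc):
        return (pc >> 2) % self.size   # direct-mapped on word address

    def lookup(self, pc):
        """Return the predicted target if pc hits; None on a miss."""
        entry = self.entries.get(self._index(pc))
        if entry and entry[0] == pc:
            return entry[1]            # predicted taken: fetch target next
        return None                    # not predicted to be a branch

    def update(self, pc, target):
        """Insert or overwrite an entry after a taken branch resolves."""
        self.entries[self._index(pc)] = (pc, target)

btb = BranchTargetBuffer()
btb.update(0x400, 0x480)               # a taken branch at PC 0x400
print(hex(btb.lookup(0x400)))          # hit: fetch 0x480 next
print(btb.lookup(0x404))               # miss: fetch PC+4 as usual
```

On a hit, the fetch stage redirects to the buffered target in the same cycle, which is why a correct prediction costs zero cycles.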

4
Figure 3.19. A Branch Target Buffer
5
Figure 3.20. Steps to handle an instruction with
a branch-target buffer
6
Figure 3.21
  • Penalties, assuming that we store only taken
    branches in the buffer
  • If the branch is not correctly predicted, the
    penalty is equal to one clock cycle to update the
    buffer with the correct information (during which
    an instruction cannot be fetched) and one clock
    cycle to restart fetching the next correct
    instruction for the branch
  • If the branch is not found and taken, a two-cycle
    penalty is encountered, during which time the
    buffer is updated

Instruction in buffer   Prediction   Actual branch   Penalty cycles
Yes                     Taken        Taken           0
Yes                     Taken        Not taken       2
No                      -            Taken           2
No                      -            Not taken       0
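The four cases of Figure 3.21 can be encoded directly; this sketch simply restates the table as a function, assuming (as the slide does) that only taken branches are stored in the buffer.

```python
# Penalty cycles from Figure 3.21, assuming the buffer stores only
# taken branches (so every buffered branch is predicted taken).

def btb_penalty(in_buffer, actually_taken):
    if in_buffer:
        # buffered => predicted taken; mispredict costs a flush + update
        return 0 if actually_taken else 2
    # not buffered => predicted not taken; a taken branch costs 2 cycles
    return 2 if actually_taken else 0

assert btb_penalty(True,  True)  == 0
assert btb_penalty(True,  False) == 2
assert btb_penalty(False, True)  == 2
assert btb_penalty(False, False) == 0
```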
7
Example (p. 211)
  • Determine the total branch penalty for a
    branch-target buffer assuming the penalty cycles
    from Figure 3.21
  • The following assumptions are made
  • Prediction accuracy is 90% (for instructions in
    the buffer)
  • Hit rate in the buffer is 90% (for branches
    predicted taken)
  • Assume that 60% of the branches are taken

8
Answer (p. 211)
  • Compute the penalty by looking at two events: the
    branch is predicted taken but ends up being not
    taken, and the branch is taken but is not found
    in the buffer. Both carry a penalty of two cycles
  • Probability (branch in buffer, but actually not
    taken)
    = Percent buffer hit rate × Percent incorrect
    predictions
    = 90% × 10% = 0.09
  • Probability (branch not in buffer, but actually
    taken) = 10%
  • Branch penalty = (0.09 + 0.10) × 2 = 0.38
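The arithmetic of the example can be checked in a few lines (variable names are mine, the numbers are the slide's):

```python
# Reproducing the branch-penalty arithmetic of the example (p. 211).
hit_rate = 0.90          # fraction of taken branches found in the buffer
accuracy = 0.90          # prediction accuracy for buffered branches

p_wrong_in_buffer = hit_rate * (1 - accuracy)   # in buffer, not taken: 0.09
p_taken_not_found = 0.10                        # taken, but buffer miss

penalty = (p_wrong_in_buffer + p_taken_not_found) * 2   # 2 cycles each
print(round(penalty, 2))   # 0.38 cycles per branch
```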

9
Comparison
  • Branch-Target Buffer (BTB) versus
  • Branch-Prediction Buffer (BPB)
  • Shape, size, and contents
  • Which stage in pipeline?
  • How to find an entry?
  • Placement of an entry
  • Replacement of an entry
  • With BTB, why need BPB?
  • Does BPB save any clock cycles?
  • If predicted NT, should branch instr. be kept in
    BTB?

10
Variation of Branch-Target Buffer (p. 211)
  • Store one or more target instructions instead of,
    or in addition to, the predicted target address
  • Two potential advantages
  • Allow the branch-target buffer access to take
    longer than the time between successive
    instruction fetches, possibly allowing a larger
    branch-target buffer
  • Allow us to perform an optimization called branch
    folding

11
Branch Folding (p. 213)
  • Use branch folding to obtain zero-cycle
    unconditional branches
  • Consider a branch-target buffer that buffers
    instructions from the predicted path and is being
    accessed with the address of an unconditional
    branch. The only function of the unconditional
    branch is to change the PC. Thus, when the
    branch-target buffer signals a hit and indicates
    that the branch is unconditional, the pipeline
    can simply substitute the instruction from the
    branch-target buffer in place of the instruction
    that is returned from the cache (which is the
    unconditional branch).
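The substitution described above can be sketched as a fetch step that, on a BTB hit for an unconditional branch, returns the buffered target instruction instead of the branch itself. This is a toy model of the idea, not any real pipeline; the instruction strings and table layout are made up for illustration.

```python
# Toy fetch stage with branch folding: the BTB stores, for each
# unconditional branch, the instruction at its target. On a hit the
# branch is "folded" away and the target instruction is supplied in
# its place, for a zero-cycle unconditional branch.

icache = {0x100: "j 0x200", 0x200: "add r1,r2,r3"}
btb = {0x100: ("unconditional", 0x200, "add r1,r2,r3")}

def fetch(pc):
    entry = btb.get(pc)
    if entry and entry[0] == "unconditional":
        kind, target, target_instr = entry
        # substitute the buffered target instruction for the branch
        return target_instr, target + 4
    return icache[pc], pc + 4          # normal fetch

instr, next_pc = fetch(0x100)
print(instr)   # the target's add, not the jump: the jump never executes
```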

12
Integrated Instruction Fetch Unit
  • An instruction fetch unit that integrates several
    functions
  • Integrated branch prediction - the branch
    predictor becomes a part of the integrated unit
    and is constantly predicting branches, so as to
    drive the fetch pipeline
  • Instruction prefetch - the unit autonomously
    manages prefetching, integrating it with branch
    prediction
  • Instruction memory access and buffering - the
    unit uses prefetching to hide the cost of
    crossing cache blocks; it also provides
    buffering, to provide instructions to the issue
    stage as needed and in the quantity needed.
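The interaction of these functions can be sketched as a fetch unit that runs ahead of issue, predicting the next fetch PC each cycle and filling a buffer that the issue stage drains on demand. This is purely illustrative; the predictor, widths, and interfaces are assumptions of the sketch.

```python
# Toy integrated fetch unit: each cycle it predicts the next fetch PC
# and prefetches instructions into a buffer, from which the issue
# stage takes as many instructions as it needs.

from collections import deque

class FetchUnit:
    def __init__(self, predictor, icache):
        self.predictor = predictor     # callable: pc -> predicted next pc
        self.icache = icache           # dict: pc -> instruction
        self.buffer = deque()
        self.pc = 0

    def cycle(self, fetch_width=2):
        for _ in range(fetch_width):   # prefetch ahead of the issue stage
            self.buffer.append(self.icache[self.pc])
            self.pc = self.predictor(self.pc)

    def issue(self, n):
        """Hand up to n buffered instructions to the issue stage."""
        k = min(n, len(self.buffer))
        return [self.buffer.popleft() for _ in range(k)]

icache = {0: "i0", 4: "i1", 8: "i2", 12: "i3"}
fu = FetchUnit(lambda pc: pc + 4, icache)   # trivial fall-through predictor
fu.cycle()
fu.cycle()
print(fu.issue(3))   # ['i0', 'i1', 'i2']
```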

13
Return Address Predictor
  • Want to predict indirect jumps, i.e., jumps whose
    destination address varies at run time
  • Vast majority of indirect jumps come from
    procedure returns (85% for SPEC89)
  • May predict procedure returns with a
    branch-target buffer. But accuracy will be low if
    procedure is called from multiple sites and the
    calls from one site are not clustered in time
  • What can we do?
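The standard answer, anticipating Figure 3.22, is a small return-address stack: calls push the return address, returns pop it as the prediction. A minimal sketch, with depth and overflow policy (drop the oldest entry) chosen arbitrarily for illustration:

```python
# Return-address stack predictor: a call pushes its return address,
# a return pops the top as the predicted target. A small stack works
# well because call depths are typically modest.

class ReturnAddressStack:
    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, return_pc):
        if len(self.stack) == self.depth:
            self.stack.pop(0)          # overflow: drop the oldest entry
        self.stack.append(return_pc)

    def predict_return(self):
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack()
ras.on_call(0x1004)                    # call from site A
ras.on_call(0x2008)                    # nested call from site B
print(hex(ras.predict_return()))       # 0x2008: innermost return first
print(hex(ras.predict_return()))       # 0x1004
```

Because each call site pushes its own return address, accuracy stays high even when a procedure is called from many different sites, which is exactly where a BTB does poorly.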

14
Figure 3.22. Prediction accuracy for a return
address buffer operated as a stack
  • The accuracy is the fraction of return addresses
    predicted correctly. Since call depths are
    typically not large, a modest buffer works well.