Provided by: RajeevBala4

1
CS 7960-4 Lecture 7
Combining Branch Predictors, Scott McFarling,
WRL Tech. Report TN-36, 1993
2
Bimodal Branch Prediction
  • Identifies the most common outcome in the recent past
  • Updates happen during commit

[Figure: a 10-bit index taken from the PC selects one of 1024 2-bit saturating counters]
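
The bimodal scheme on this slide can be sketched as follows. This is a minimal illustration, not code from the paper: the class name, the initial counter value, and the choice to drop the two low PC bits (assuming 4-byte instructions) are all assumptions made for the example.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low PC bits."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Counters range over 0..3; starting at 1 (weakly not-taken) is an
        # arbitrary choice for this sketch.
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the two low PC bits carry no
        # information and are dropped before taking the 10-bit index.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        # Upper half of the counter range means "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bp = BimodalPredictor()
outcomes = [True] * 9 + [False]   # loop branch: taken 9 times, then exits
hits = 0
for _ in range(10):               # execute the whole loop 10 times
    for taken in outcomes:
        hits += bp.predict(0x400) == taken
        bp.update(0x400, taken)
```

Once warmed up, the counter mispredicts only the loop exit: a single not-taken outcome moves it from 3 to 2, which still predicts taken on the next loop entry.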
3
Results
  • SPEC89 programs simulated for 10M instructions (modern studies use hard-to-predict programs)
  • A larger predictor reduces contention for counters
  • Prediction rates saturate at 93.5% (at 2K bytes) (Fig. 3)

4
Local Predictors
  • Two-level predictor: the first level holds per-branch history, the second level holds saturating counters
  • History gets updated immediately

[Figure: a 10-bit index taken from the PC selects one of 1024 entries in a table of 4-bit histories; the 4-bit history selects one of 16 2-bit saturating counters]
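
A rough sketch of the two-level local scheme, using the table sizes from the figure (1024 four-bit histories, 16 counters). The class and method names are hypothetical, and for simplicity the sketch updates history and counter together, whereas the slides distinguish immediate history updates from counter updates at commit.

```python
class LocalPredictor:
    """Two-level predictor: per-branch history selects a shared counter."""

    def __init__(self, index_bits=10, history_bits=4):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << index_bits)    # level 1: 4-bit histories
        self.counters = [1] * (1 << history_bits)   # level 2: 2-bit counters

    def _slot(self, pc):
        # Assumption: 4-byte instructions, so the two low PC bits are dropped.
        return (pc >> 2) & self.index_mask

    def predict(self, pc):
        history = self.histories[self._slot(pc)]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        slot = self._slot(pc)
        history = self.histories[slot]
        c = self.counters[history]
        self.counters[history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the new outcome into this branch's history register.
        self.histories[slot] = ((history << 1) | int(taken)) & self.history_mask


lp = LocalPredictor()
results = []
for i in range(40):
    taken = (i % 2 == 0)          # strictly alternating branch
    results.append(lp.predict(0x400) == taken)
    lp.update(0x400, taken)
```

An alternating branch defeats a single 2-bit counter, but here the two steady-state history patterns train two different counters, so the predictions become correct after a short warm-up.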
5
Results
  • For small predictors, there could be contention at both levels, resulting in inaccurate predictions
  • Will also take longer to warm up after every context switch
  • Does very well for large predictors: saturates at 97.1%

6
Global Predictors
  • A single history register: neighboring branches have correlated results
  • However, the PC is not used

[Figure: a 10-bit global history register indexes a table of 1024 2-bit saturating counters]
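
The global scheme can be sketched in the same style. Names and the initial counter value are assumptions for illustration; note that `predict` takes no PC at all, matching the slide.

```python
class GlobalPredictor:
    """One shared history register indexes the counter table; no PC is used."""

    def __init__(self, history_bits=10):
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # Every branch outcome, from any branch, is shifted into the register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


gp = GlobalPredictor()
stream = [True, True, True, True, False, False] * 20   # repeating outcome stream
results = []
for taken in stream:
    results.append(gp.predict() == taken)
    gp.update(taken)
```

Because the last 10 outcomes of this repeating stream pin down the position within the pattern, each history value is always followed by the same outcome, and the counters learn it after a brief warm-up.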
7
Do We Need PC?
  • Note that the global history itself reveals which branch is being examined
  • Hence, the global predictor outperforms bimodal predictors when the transistor budget is large (Fig. 7)
  • The local predictor does better still: it is more important to identify the PC and local history than the behavior of neighboring branches

8
Gselect
  • Use a combination of PC and global history
  • Bimodal and global prediction are special cases (Fig. 9)

[Figure: n PC bits and m global-history bits (5 history bits in this example) are concatenated into an (n+m)-bit index into a table of 1024 2-bit saturating counters]
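
The gselect index is a plain concatenation, which the following sketch makes concrete. The function name and parameter names are hypothetical, and `pc` is assumed to already be the branch-address bits of interest (no alignment shift is applied).

```python
def gselect_index(pc, history, pc_bits=5, history_bits=5):
    """Concatenate low PC bits (high part) with low history bits (low part)."""
    pc_part = pc & ((1 << pc_bits) - 1)
    hist_part = history & ((1 << history_bits) - 1)
    return (pc_part << history_bits) | hist_part


# Special cases: all index bits from the PC gives the bimodal index,
# all index bits from the history gives the global index.
bimodal_like = gselect_index(0b1011001110, 0b1111111111, pc_bits=10, history_bits=0)
global_like = gselect_index(0b1011001110, 0b0101010101, pc_bits=0, history_bits=10)
```

With `history_bits=0` the result depends only on the PC, and with `pc_bits=0` only on the history, which is the sense in which bimodal and global prediction are special cases of gselect.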
9
GShare
  • XOR-ing 10 history bits and 10 PC bits has more info than the concatenation of 5 bits of each, and more info than each individual component

Branch Address  Global History  Gselect 4/4  Gshare 8/8
00000000        00000001        00000001     00000001
00000000        00000000        00000000     00000000
11111111        00000000        11110000     11111111
11111111        10000000        11110000     01111111
01111110        00000001        11100001     01111111
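
The gshare column of the table can be reproduced with a one-line XOR. The function name below is hypothetical; the inputs are the first four address/history pairs from the table.

```python
def gshare_index(pc, history, bits=8):
    """XOR the low `bits` of the branch address with the global history."""
    mask = (1 << bits) - 1
    return (pc ^ history) & mask


rows = [
    (0b00000000, 0b00000001),
    (0b00000000, 0b00000000),
    (0b11111111, 0b00000000),
    (0b11111111, 0b10000000),
]
indices = [gshare_index(pc, h) for pc, h in rows]
for (pc, h), idx in zip(rows, indices):
    print(f"{pc:08b} {h:08b} -> {idx:08b}")
```

In the table, gselect 4/4 maps the third and fourth pairs to the same index (11110000) because it discards the high history bit, while the XOR keeps all four pairs distinct.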
10
Terminology
  • GAg: global history indexes into a global array of saturating counters
  • PAg: per-address history indexes into a global array of saturating counters
  • GAp: global history indexes into each PC's private array of counters (gselect)
  • PAp: per-address history indexes into each PC's private array of counters

11
Trade-Offs
  • Some predictors warm up faster than others
  • Some programs benefit from global history, some from local history
  • Some programs have branches that interfere with each other
  • Note that a 64KB local predictor has fewer saturating counters than a 64KB bimodal predictor, so the former won't be better for every program

12
Combining Predictors
  • Use an array of saturating counters to pick the best available predictor for each PC

[Figure: the PC indexes a table of 1024 2-bit saturating counters, whose output selects between the predictions of Predictor A and Predictor B]
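
A minimal sketch of the selection table follows. The class name is hypothetical, and the component predictions are passed in as plain booleans so the example stays self-contained. The update rule used here (move toward whichever predictor was right when the two disagree, leave the counter alone otherwise) is the usual one for this scheme; treat the initial counter value and the PC alignment shift as assumptions.

```python
class Chooser:
    """Per-PC 2-bit counters that pick between two component predictors."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Counter >= 2 selects predictor A; starting at 2 is an arbitrary choice.
        self.counters = [2] * (1 << index_bits)

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the two low PC bits are dropped.
        return (pc >> 2) & self.mask

    def select(self, pc, pred_a, pred_b):
        return pred_a if self.counters[self._index(pc)] >= 2 else pred_b

    def update(self, pc, pred_a, pred_b, taken):
        i = self._index(pc)
        a_ok, b_ok = pred_a == taken, pred_b == taken
        if a_ok and not b_ok:
            self.counters[i] = min(3, self.counters[i] + 1)
        elif b_ok and not a_ok:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Both right or both wrong: no information, counter unchanged.


ch = Chooser()
# Stand-in components: A always says taken, B always says not-taken.
for _ in range(2):
    ch.update(0x400, True, False, taken=False)   # branch at 0x400 is never taken
    ch.update(0x404, True, False, taken=True)    # branch at 0x404 is always taken
pick_400 = ch.select(0x400, True, False)
pick_404 = ch.select(0x404, True, False)
```

After these updates the chooser follows B for the first branch and A for the second, which is the per-PC selection the slide describes.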
13
Results
  • The combination of local and gshare increases the prediction accuracy to 98.1% (Fig. 16)
  • For smaller transistor budgets, the combination of bimodal and gshare is better (gshare is made twice the size so that the total is a power of two)
  • A 1KB combined predictor does as well as a 16KB gselect predictor

14
Future Work
  • Detect conflicts, correlations, and common predictions through profiling/compiler analysis
  • Functions that compress information in the history or PC
  • Pipelined predictions: predict two branches ahead
  • Hierarchical predictors: get a quick prediction in a cycle and a more accurate one two cycles later

15
Next Week's Paper
  • Design Trade-Offs for the Alpha EV8 Conditional Branch Predictor, Seznec et al., ISCA '02
