EE 382N Guest Lecture: Wish Branches

1
EE 382N Guest Lecture: Wish Branches

Hyesoon Kim
HPS Research Group
The University of Texas at Austin
2
Lecture Outline
  • Predicated execution
  • Wish branches
  • 2D-profiling

3
Motivation
  • Branch predictors are still not perfect.
  • Deeper pipelines and larger instruction windows
    increase the branch misprediction penalty.
  • Predicated execution can eliminate branch
    mispredictions by converting control dependencies
    into data dependencies. However, predicated code
    has overhead.

4
Predicated Execution
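(Figure: control-flow graph in which block A branches on cond to
block B or block C, and both paths join at block D.)
Predicated code:
    A: p1 = (cond)
    B: (!p1) mov b, 1
    C: (p1)  mov b, 0
    D: add x, b, 1
  • Converts a control-flow dependency into a data
    dependency
  • Pro: eliminates hard-to-predict branches
  • Cons: (1) blocks B and C are fetched all the time
    (2) instructions guarded by p1 must wait until p1
    is resolved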
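To make the two costs listed above concrete, here is a minimal Python sketch that interprets the slide's predicated code. The instruction representation (guard, destination, operation) and the names `execute_predicated` and `PREDICATED_CODE` are hypothetical illustrations, not part of the lecture. Every instruction is fetched regardless of `cond` (cost 1), and the two `mov` instructions cannot write back until `p1` is known (cost 2).

```python
from typing import Callable, Dict, List, Tuple

Regs = Dict[str, int]
# One instruction: (guard predicate, destination register, operation).
Instr = Tuple[Callable[[Regs], bool], str, Callable[[Regs], int]]

ALWAYS: Callable[[Regs], bool] = lambda r: True

# The predicated code from the slide; cond is supplied as an input register.
PREDICATED_CODE: List[Instr] = [
    (ALWAYS, "p1", lambda r: int(r["cond"] != 0)),  # p1 = (cond)
    (lambda r: r["p1"] == 0, "b", lambda r: 1),      # (!p1) mov b, 1
    (lambda r: r["p1"] == 1, "b", lambda r: 0),      # (p1)  mov b, 0
    (ALWAYS, "x", lambda r: r["b"] + 1),             # add x, b, 1
]


def execute_predicated(code: List[Instr], regs: Regs) -> Tuple[Regs, int]:
    """Fetch and execute every instruction; only those whose guard is true
    write back. Returns the final registers and the number fetched."""
    regs = dict(regs)
    fetched = 0
    for guard, dest, op in code:
        fetched += 1                 # both B and C are always fetched
        if guard(regs):              # false-predicated instructions are nullified
            regs[dest] = op(regs)
    return regs, fetched
```

With `cond = 1` the result is `x = 1`, and with `cond = 0` it is `x = 2`; in both cases all four instructions are fetched, whereas normal branch code would fetch only one of B or C.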
5
The Overhead of Predicated Execution
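(Figure: average execution time of predicated code relative to
non-predicated code.)
Predicated code:
    p1 = (cond)
    (!p1) mov b, 1
    (p1)  mov b, 0
    add x, b, 1
Predicated code with the overhead ideally eliminated
(predicate value known in advance, here p1 = 1):
    p1 = (cond)
    (0) mov b, 1
    (1) mov b, 0
    add x, b, 1
If all overhead were ideally eliminated, predicated
execution would provide a 16% improvement in
average execution time.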
6
The Problem
  • Due to the predication overhead, predicated
    execution sometimes reduces performance
  • Branch misprediction characteristics depend on
    run-time behavior: input set, control-flow path,
    and phase behavior. The compiler cannot
    accurately estimate the run-time behavior of
    branches.

7
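Predicated Code Performance vs. Branch
Misprediction Rate
(Figure: execution time vs. branch misprediction rate for normal
branch code and predicated code, crossing at point X. Normal
branch code performs better below X; predicated code performs
better above X. The profile-time misprediction rate (input A) and
the run-time misprediction rate (input B) are marked on the axis.)
  • Converting a branch to predicated code could hurt
    performance if the run-time misprediction rate is
    lower than the profile-time misprediction rate
  • Execution time (normal branch code) =
    $\mathit{exec}_T \cdot P(T) + \mathit{exec}_N \cdot P(N) + \mathit{misp\_penalty} \cdot P(\text{misprediction})$
  • Execution time (predicated code) = $\mathit{exec}_{\mathit{pred}}$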
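The two execution-time formulas above imply a break-even misprediction rate, which is the crossover point X. The following Python sketch solves for it. All function names and the example numbers are hypothetical. The 30-cycle penalty echoes the baseline in the simulation methodology, but the per-path times and P(T) are made up for illustration. The model relies on the identity P(N) = 1 - P(T).

```python
def branch_code_time(exec_t: float, p_t: float, exec_n: float,
                     misp_penalty: float, p_misp: float) -> float:
    """exec_T*P(T) + exec_N*P(N) + misp_penalty*P(misprediction), P(N) = 1 - P(T)."""
    return exec_t * p_t + exec_n * (1.0 - p_t) + misp_penalty * p_misp


def breakeven_misp_rate(exec_pred: float, exec_t: float, p_t: float,
                        exec_n: float, misp_penalty: float) -> float:
    """Misprediction rate at which both versions take equal time (point X)."""
    no_misp_time = exec_t * p_t + exec_n * (1.0 - p_t)
    return (exec_pred - no_misp_time) / misp_penalty


def predication_wins(exec_pred: float, exec_t: float, p_t: float,
                     exec_n: float, misp_penalty: float, p_misp: float) -> bool:
    """True if predicated code is faster at the given misprediction rate."""
    return exec_pred < branch_code_time(exec_t, p_t, exec_n, misp_penalty, p_misp)


# Hypothetical numbers: 10-cycle paths, P(T) = 0.5, 16-cycle predicated code,
# 30-cycle misprediction penalty -> break-even misprediction rate of 0.2.
x = breakeven_misp_rate(16, 10, 0.5, 10, 30)
profile_time = predication_wins(16, 10, 0.5, 10, 30, p_misp=0.3)  # input A
run_time = predication_wins(16, 10, 0.5, 10, 30, p_misp=0.1)      # input B
```

With these numbers, a profile run on input A (30% mispredictions) favors predication (16 < 19 cycles). The same branch on input B (10% mispredictions) runs faster as normal branch code (13 < 16 cycles), which is exactly the mismatch the slide warns about.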

8
Lecture Outline
  • Predicated execution
  • Wish branches
  • 2D-profiling

9
Wish Branches [Kim et al., MICRO-38]
  • A new type of control-flow instruction
  • Three types: wish jump/join and wish loop
  • The compiler generates code (with wish branches)
    that can be executed either as predicated code or
    as non-predicated code (normal branch code)
  • The hardware decides at run time whether to
    execute the predicated code or the normal branch
    code, based on the confidence of the branch
    prediction
  • Easy-to-predict branch (high confidence): normal
    branch code
  • Hard-to-predict branch (low confidence):
    predicated code

10
Wish Jump/Join
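(Figure: control-flow graph of blocks A, B, C, D with a wish jump
at the end of A and a wish join at the end of B. High confidence:
the wish jump follows its taken/not-taken prediction like a normal
branch. Low confidence: the wish jump and wish join behave as nops,
so B and C are both executed as predicated code.)
wish jump/join code:
    A:          p1 = (cond)
                wish.jump p1, TARGET
    B:          (!p1) mov b, 1
                wish.join !p1, JOIN
    C: TARGET:  (p1) mov b, 0
    D: JOIN:
Same code in high-confidence mode (the wish jump acts as a normal
branch and predicate values are predicted):
    A:          p1 = (cond)
                branch p1, TARGET
    B:          (1) mov b, 1
                wish.join (1), JOIN
    C: TARGET:  (1) mov b, 0
    D: JOIN: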
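The following Python sketch lists which of the blocks A to D the front-end fetches for the example above. It is a simplified model under stated assumptions: `fetched_blocks` and `flush_if_mispredicted` are hypothetical names, and the sketch ignores timing. It shows why a low-confidence misprediction is harmless: both paths are already in the pipeline under their predicates.

```python
from typing import List


def fetched_blocks(high_confidence: bool, predicted_taken: bool) -> List[str]:
    """Blocks fetched for the A/B/C/D wish jump/join example (hypothetical model)."""
    if not high_confidence:
        # Low confidence: wish jump and wish join fall through, so B and C are
        # both fetched and executed under the predicates !p1 and p1.
        return ["A", "B", "C", "D"]
    # High confidence: the wish jump behaves like a normal branch to TARGET (C);
    # on the not-taken path the wish join jumps over C to JOIN (D).
    return ["A", "C", "D"] if predicted_taken else ["A", "B", "D"]


def flush_if_mispredicted(high_confidence: bool) -> bool:
    """Low-confidence mode executes both paths, so a wrong direction
    prediction needs no pipeline flush; high-confidence mode does."""
    return high_confidence
```

The design choice is visible in the return values: low confidence trades extra fetched work (block B or C is wasted) for never paying the flush penalty.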
11
Wish Loop
(Figure: control-flow graph with loop header H, loop body X (taken
back edge to X) and exit block Y (not taken), annotated with the
high-confidence and low-confidence paths.)
wish loop code:
    H:        mov p1, 1
    X: LOOP:  (p1) add a, a, 1
              (p1) add i, i, 1
              (p1) p1 = (cond)
              wish.loop p1, LOOP
    Y: EXIT:
normal backward branch code:
    X: LOOP:  add a, a, 1
              add i, i, 1
              p1 = (i < N)
              branch p1, LOOP
    Y: EXIT:
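As a sanity check on the two code versions above, this Python sketch runs both of them. It assumes the wish loop's `(cond)` is the same `i < N` test used in the normal code. `wish_loop` takes a hypothetical `fetched_iters` argument, the number of iterations the front-end happens to fetch. Once `p1` becomes false, the remaining fetched iterations have no effect (they behave as nops). If fewer iterations than needed are fetched, however, the result is wrong and the pipeline must be flushed.

```python
from typing import Tuple


def normal_loop(n: int) -> Tuple[int, int]:
    """Normal backward branch code: LOOP: add a; add i; p1 = (i < N); branch p1, LOOP."""
    a = i = 0
    while True:
        a += 1
        i += 1
        if not i < n:          # branch p1, LOOP falls through to EXIT
            break
    return a, i


def wish_loop(n: int, fetched_iters: int) -> Tuple[int, int, int]:
    """Wish loop code with fetched_iters iterations fetched (hypothetical model).
    Returns (a, i, number of iterations that executed as nops)."""
    a = i = 0
    p1 = True                  # H: mov p1, 1
    nops = 0
    for _ in range(fetched_iters):
        if p1:                 # instructions guarded by (p1) take effect
            a += 1
            i += 1
            p1 = i < n         # (p1) p1 = (cond), assumed to be i < N
        else:                  # predicate is false: the iteration does nothing
            nops += 1
    return a, i, nops
```

For example, with N = 3, fetching five iterations still yields `a = i = 3`, and the last two iterations are nops. Fetching only two iterations stops at `a = i = 2`.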
12
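Mispredicted Case 1: Early-Exit
(Figure: correct execution is H, X1, X2, X3, Y. In the early-exit
case (low confidence), the wish loop is predicted to exit after X2,
so Y is fetched too early; the pipeline is flushed and execution
resumes with X3, then Y.)
  • Compared to normal branch code: both flush the
    pipeline, and the wish loop adds a predicate data
    dependency and one extra instruction (-)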

13
Mispredicted Case 2: Late-Exit
(Figure: correct execution is H, X1, X2, X3, Y. In the late-exit
case (low confidence), the front-end fetches X1 through X5 before
Y; X4 and X5 execute with a false predicate and behave as nops, so
no pipeline flush is needed.)
  • Compared to normal branch code:
  • Pro: reduces the flush penalty (+)
  • Cons: predicate data dependency and one
    extra instruction (-)

14
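Mispredicted Case 3: No-Exit
(Figure: correct execution is H, X1, X2, X3, Y. Late-exit is shown
for comparison, with X4 and X5 becoming nops. In the no-exit case,
the front-end keeps fetching iterations X1 through X6 and has not
fetched Y when the loop branch is resolved, so the pipeline is
flushed.)
  • No-exit, compared to normal branch code: both
    flush the pipeline, and the wish loop adds a
    predicate data dependency and one extra
    instruction (-)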
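The three mispredicted cases above can be summarized as a small classifier. This Python sketch is a simplified, hypothetical model. The front-end is described only by how many iterations it fetched before fetching the exit block Y, with `None` meaning it was still looping when the branch resolved. Timing and partial flushes are ignored.

```python
from typing import Optional, Tuple


def classify_wish_loop(actual_iters: int,
                       fetched_iters: Optional[int]) -> Tuple[str, bool, int]:
    """Classify a low-confidence wish loop outcome (hypothetical model).

    actual_iters:  iterations the loop really executes.
    fetched_iters: iterations fetched before the front-end fetched the exit
                   block Y, or None if it was still fetching iterations when
                   the loop branch resolved.
    Returns (case, pipeline_flush_needed, iterations_executed_as_nops).
    """
    if fetched_iters is None:
        return ("no-exit", True, 0)        # never left the loop: flush
    if fetched_iters < actual_iters:
        return ("early-exit", True, 0)     # exited too soon: flush
    if fetched_iters > actual_iters:
        # Extra iterations run with p1 = 0 and become nops: no flush.
        return ("late-exit", False, fetched_iters - actual_iters)
    return ("correct", False, 0)
```

Mapped onto the figures (three real iterations): early-exit fetches two iterations, late-exit fetches five (X4 and X5 become nops), and no-exit never fetches Y before the branch resolves. Only late-exit avoids the flush, which is the benefit wish loops add over normal backward branches.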

15
Questions?
  • What kind of branches should be converted to wish
    branches (jump/join)?
  • Why not all branches?
  • What kind of branches should be converted to wish
    loops?

16
Advantages/Disadvantages of Wish Branches
  • Advantages compared to predicated execution
  • Reduce the overhead of predication
  • Increase the benefits of predicated code by
    allowing the compiler to generate more
    aggressively predicated code
  • Provide a mechanism to exploit predication to
    reduce the branch misprediction penalty for
    backward branches (wish loops)
  • Make predicated code less dependent on machine
    configuration (e.g., the branch predictor)

17
Advantages/Disadvantages of Wish Branches
  • Disadvantages compared to predicated execution
  • Extra branch instructions use machine resources
  • Extra branch instructions increase the contention
    for branch predictor table entries
  • May constrain the compiler's scope for code
    optimizations

18
Wish Branch Support
  • ISA Support
  • predicated execution, wish branch instruction
  • Compiler Support
  • Wish branch generation algorithms
  • The compiler needs to decide which branches are
    predicated, which are converted to wish branches,
    and which stay as normal branches
  • Hardware Support
  • Instruction decode logic
  • Predicate dependency elimination module
  • Confidence estimator
  • Front-end and branch misprediction
    detection/recovery module

19
ISA Support
  • Using existing hint bits (IA-64, x86, PowerPC)
  • Hint bits can be ignored. A wish branch can be
    treated as a normal branch.

OPCODE | btype | wtype | target offset | p
  • btype: branch type (0 = normal branch, 1 = wish
    branch)
  • wtype: wish branch type (0 = jump, 1 = loop,
    2 = join)
  • p: predicate register identifier
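The instruction format above gives the field order but not the field widths. The following Python sketch packs and unpacks such a branch word under hypothetical widths: an 8-bit opcode, 1-bit btype, 2-bit wtype, 15-bit unsigned offset, and 6-bit predicate identifier, for 32 bits total. The 6-bit `p` matches IA-64's 64 predicate registers, but the other widths are illustrative only and are not taken from IA-64, x86, or PowerPC encodings. `is_wish_branch` models the point that hardware without wish support can ignore the hint.

```python
from typing import Dict

# Field order from the slide (MSB -> LSB); the widths are hypothetical.
FIELDS = (("opcode", 8), ("btype", 1), ("wtype", 2), ("offset", 15), ("p", 6))

BTYPE_NORMAL, BTYPE_WISH = 0, 1
WTYPE_JUMP, WTYPE_LOOP, WTYPE_JOIN = 0, 1, 2


def encode(fields: Dict[str, int]) -> int:
    """Pack the fields into one instruction word."""
    word = 0
    for name, width in FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word: int) -> Dict[str, int]:
    """Unpack an instruction word, starting from the least significant field."""
    fields: Dict[str, int] = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


def is_wish_branch(fields: Dict[str, int], hardware_supports_wish: bool = True) -> bool:
    """Hardware without wish support ignores the hint bit and treats the
    instruction as a normal branch."""
    return hardware_supports_wish and fields["btype"] == BTYPE_WISH
```

Because the wish information lives only in the hint fields, an older implementation that decodes the same word with `hardware_supports_wish=False` still executes a correct normal branch.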
21
Compiler Support
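(Figure: major code-generation phases in ORC, in order: region
formation → if-conversion → loop optimizations (software
pipelining, unrolling) → global instruction scheduling → register
allocation → local instruction scheduling; a legend distinguishes
new, modified, and existing phases.)
  • Major phase ordering with wish branch generation
    in the ORC code generator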

22
Wish Branch Generation Algorithm
  • Wish jump/join candidates: all branches that are
    suitable for if-conversion
  • If the number of instructions in the fall-through
    block is greater than N (N = 5), a wish jump and a
    wish join are inserted
  • All other candidate branches are converted to
    predicated code
  • A loop branch is converted into a wish loop when
    the loop body has fewer than L instructions (L = 30)
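The selection rules above fit in a few lines of Python. `BranchInfo` and `choose_code_form` are hypothetical names. The thresholds N = 5 and L = 30 come from the slide. Two choices are assumptions filling gaps the slide leaves open: branches that are not if-conversion candidates stay normal branches, and loop branches whose body has L or more instructions also stay normal branches. This is not ORC's actual code.

```python
from dataclasses import dataclass
from typing import Optional

N_FALL_THROUGH = 5   # N on the slide
L_LOOP_BODY = 30     # L on the slide


@dataclass
class BranchInfo:
    """Hypothetical summary of what the compiler knows about one branch."""
    suitable_for_if_conversion: bool
    fall_through_size: int = 0           # instructions in the fall-through block
    is_loop_branch: bool = False
    loop_body_size: Optional[int] = None


def choose_code_form(br: BranchInfo, n: int = N_FALL_THROUGH,
                     l: int = L_LOOP_BODY) -> str:
    if br.is_loop_branch:
        if br.loop_body_size is not None and br.loop_body_size < l:
            return "wish loop"
        return "normal branch"            # assumption: large loop bodies stay normal
    if br.suitable_for_if_conversion:
        if br.fall_through_size > n:
            return "wish jump/join"       # large enough to be worth skipping
        return "predicated"               # all other candidates
    return "normal branch"                # assumption: non-candidates stay normal
```

The rule reflects a cost trade-off. A short fall-through block is cheap to execute under a predicate, so plain predication is used. A longer block is worth skipping when the branch turns out to be predictable, so the wish jump/join pair is inserted.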

23
Wish Branch Support
  • ISA Support
  • predicated execution, wish branch instruction
  • Compiler Support
  • Wish branch generation algorithms
  • The compiler needs to decide which branches are
    predicated, which are converted to wish branches,
    and which stay as normal branches
  • Hardware Support
  • Instruction decode logic
  • Predicate dependency elimination module
  • Front-end and branch misprediction
    detection/recovery module
  • Confidence estimator

24
Hardware Support
  • Instruction fetch/decode logic
  • Decoder: decodes wish branches
  • BTB: marks wish branches
  • Wish branch state machine hardware
  • A wish loop stays in low-confidence mode until
    the loop exits
  • Predicate dependency elimination module
  • High-confidence mode: predicate values are
    predicted
  • Branch misprediction detection/recovery module
  • No flush if a wish branch is mispredicted during
    low-confidence mode
  • Confidence estimator
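The front-end state machine described above can be sketched in Python as follows. The class and method names are hypothetical, and the model is deliberately simplified: it tracks only the fetch mode and the direction the front-end follows. The flush rule covers wish jump/join. Wish loop exits follow the separate early-exit, late-exit, and no-exit cases, where leaving the loop too early or too late can still require a flush.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    HIGH_CONF = "high-confidence"
    LOW_CONF = "low-confidence"


class WishStateMachine:
    """Hypothetical sketch of the front-end wish branch state machine."""

    def __init__(self) -> None:
        self.mode = Mode.NORMAL

    def on_wish_jump(self, predicted_taken: bool, high_confidence: bool) -> bool:
        """Return the direction the front-end follows for a wish jump."""
        if high_confidence:
            self.mode = Mode.HIGH_CONF
            return predicted_taken         # behave like a normal branch
        self.mode = Mode.LOW_CONF
        return False                       # fall through into predicated code

    def on_wish_join(self, predicted_taken: bool) -> bool:
        """Return the direction followed for a wish join; leave wish mode."""
        direction = False if self.mode is Mode.LOW_CONF else predicted_taken
        self.mode = Mode.NORMAL
        return direction

    def on_wish_loop(self, predicted_taken: bool, high_confidence: bool) -> bool:
        """A wish loop stays in low-confidence mode until the front-end exits."""
        if self.mode is Mode.LOW_CONF or not high_confidence:
            self.mode = Mode.LOW_CONF if predicted_taken else Mode.NORMAL
        else:
            self.mode = Mode.HIGH_CONF if predicted_taken else Mode.NORMAL
        return predicted_taken             # keep following the loop prediction

    @staticmethod
    def flush_on_mispredict(fetched_in: Mode) -> bool:
        """Wish jump/join: no flush if mispredicted during low-confidence mode."""
        return fetched_in is not Mode.LOW_CONF
```

Note the asymmetry in `on_wish_loop`: once a loop enters low-confidence mode, later high-confidence predictions for the same loop do not pull it back out. Only a predicted exit returns the front-end to normal mode.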

25
JRS Confidence Estimator
  • Estimates how much confidence the processor has in
    a branch prediction
  • Trained with branch misprediction information
(Figure: the PC and the global branch history register (BHR) form
an m-bit index into a table of 2^m n-bit counters; the selected
counter classifies the prediction as high confidence or low
confidence.)
  • "Assigning Confidence to Conditional Branch
    Predictions", Jacobsen et al., MICRO-29
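The following Python sketch implements a JRS-style estimator as shown in the figure: 2^m n-bit counters indexed by the PC and the global history. Several details are assumptions for illustration and may differ from the original paper and from the evaluated "1KB tagged 16-bit history" configuration. The index is a gshare-style `PC XOR history`, the table is untagged, and the counters are resetting counters (incremented on a correct prediction, cleared on a misprediction) with a hypothetical threshold. The sketch also assumes `update` is called with the same history that was used for the lookup, so there is no speculative history update.

```python
class JRSConfidenceEstimator:
    """Sketch of a JRS-style confidence estimator (untagged, resetting counters)."""

    def __init__(self, m_bits: int = 10, n_bits: int = 4, threshold: int = 15) -> None:
        self.mask = (1 << m_bits) - 1          # index width m -> 2^m entries
        self.max_count = (1 << n_bits) - 1     # n-bit saturating counters
        self.threshold = threshold             # hypothetical high-confidence cutoff
        self.table = [0] * (1 << m_bits)
        self.bhr = 0                           # global branch history register

    def _index(self, pc: int) -> int:
        # Assumption: gshare-style hashing of PC and global history.
        return (pc ^ self.bhr) & self.mask

    def is_high_confidence(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc: int, taken: bool, mispredicted: bool) -> None:
        """Train with the branch outcome, then shift it into the history."""
        i = self._index(pc)
        if mispredicted:
            self.table[i] = 0                  # reset on a misprediction
        else:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask
```

A counter reaches the threshold only after a run of consecutive correct predictions in the same history context, and a single misprediction drops it back to low confidence. That behavior is what lets the wish branch hardware switch hard-to-predict branches to predicated execution quickly.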

26
Experimental Infrastructure
(Figure: source code → IA-64 compiler (ORC) → IA-64 binary → trace
generation module → IA-64 trace → micro-op translator → µops →
micro-op simulator)
  • IA-64 provides full support for predication
  • Convert IA-64 traces to micro-ops to simulate an
    out-of-order superscalar processor model

27
Simulation Methodology
  • Nine SPEC 2000 integer benchmarks
  • Baseline Processor Configuration
  • Front End
  • Large and accurate branch predictor (64KB
    hybrid branch predictor: gshare + local)
  • Minimum 30-cycle branch misprediction penalty
  • 64KB, 2-cycle latency I-cache
  • Execution Core
  • 8-wide out-of-order processor
  • 512-entry instruction window
  • Confidence Estimator
  • 1KB tagged 16-bit history JRS confidence
    estimator (Jacobsen et al., MICRO-29)

28
Performance Improvement
(Figure: per-benchmark execution time relative to the
non-predicated baseline.)
  • 16% over conditional branch prediction, 11% over
    selective predication, and 7% over aggressive
    predication (all w/o mcf)
  • 14% over conditional branch prediction, 13% over
    selective predication, and 16% over aggressive
    predication
  • 12% over conditional branch prediction, 11% over
    selective predication, and 13% over aggressive
    predication
SELECTIVE-PREDICATION: branches are selectively
predicated using compile-time cost-benefit
analysis
AGGRESSIVE-PREDICATION: all branches that are
suitable for if-conversion are predicated
29
Wish Branch Conclusion
  • New control-flow instructions: wish branches
    (jump/join/loop)
  • Wish branches improve performance by dividing the
    work of predication between the compiler and the
    microarchitecture
  • Compiler analyzes the control-flow graph and
    generates code
  • Microarchitecture makes the run-time decision
    whether to use predication
  • Wish branches provide significant performance
    benefits
  • 16% compared to conditional branch prediction
  • 13% compared to selectively predicated code
  • Wish branches can make predicated execution more
    viable and effective in high performance
    processors
  • By enabling adaptive and aggressive predicated
    execution

30
Lecture Outline
  • Predicated execution
  • Wish branches
  • 2D-profiling

31
2D-profiling
  • Goal: identify input-dependent branches using a
    single input set for profiling
  • If we know a branch is input-dependent, the
    compiler:
  • May choose not to convert it to predicated code
  • May convert it to a wish branch
  • May skip other compiler optimizations, or
    perform them less aggressively
  • Hot-path/trace/superblock-based optimizations
    [Fisher 81, Pettis 90, Hwu 93, Merten 99]

32
Key Insight of 2D-profiling
Phase behavior in prediction accuracy is a good
indicator of input dependence
(Figure: prediction accuracy of a branch over time, showing three
distinct phases: phase 1, phase 2, phase 3.)
33
Traditional Profiling
(Figure: prediction accuracy (pr. Acc) over time for branches brA
and brB, with MEAN pr.Acc(brA) and MEAN pr.Acc(brB) marked.)
MEAN pr.Acc(brA) ≈ MEAN pr.Acc(brB) ⇒ behavior of brA ≈
behavior of brB
34
2D-profiling
(Figure: the same plots with MEAN pr.Acc and STD pr.Acc marked for
brA and brB.)
MEAN pr.Acc(brA) ≈ MEAN pr.Acc(brB), but STD pr.Acc(brA) ≠
STD pr.Acc(brB) ⇒ behavior of brA ≠ behavior of brB
brA: input-dependent branch; brB: input-independent branch
35
2D-profiling Mechanism
  • The profiler collects branch prediction accuracy
    information for every static branch over time
  • Execution is divided into slices of M
    instructions (slice 1, slice 2, ..., slice N); for
    each slice the profiler records every branch's
    accuracy: Pr.Acc(brA, s1), Pr.Acc(brA, s2), ...,
    Pr.Acc(brA, sN), and likewise for brB, ...
  • Calculate MEAN (brA, brB, ...), standard deviation
    (brA, brB, ...), and PAM, Points Above Mean
    (brA, brB, ...)
(Figure: slice accuracies plotted over time; brA fluctuates around
its mean (PAM = 50%), while brB stays at its mean (PAM = 0%).)
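The profiling steps above can be sketched in Python. Several details are assumptions rather than the lecture's specification. The trace format is a hypothetical list of `(dynamic instruction number, branch id, prediction correct)` events. Slices in which a branch does not execute are skipped. MEAN is the unweighted mean of the per-slice accuracies. PAM is read as the percentage of slices whose accuracy lies strictly above that mean. The function names are illustrative.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, bool]  # (dynamic instruction number, branch id, correct?)


def slice_accuracies(events: Iterable[Event], slice_size: int) -> Dict[str, List[float]]:
    """Prediction accuracy of each static branch in each slice of M instructions."""
    # branch -> slice number -> [correct predictions, executions]
    counts: Dict[str, Dict[int, List[int]]] = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for instr_num, branch, correct in events:
        cell = counts[branch][instr_num // slice_size]
        cell[0] += int(correct)
        cell[1] += 1
    # Slices in which a branch never executes are skipped.
    return {
        branch: [per_slice[s][0] / per_slice[s][1] for s in sorted(per_slice)]
        for branch, per_slice in counts.items()
    }


def profile_2d(acc: Dict[str, List[float]]) -> Dict[str, Tuple[float, float, float]]:
    """MEAN, standard deviation and PAM (% of slices above the mean) per branch."""
    stats = {}
    for branch, accs in acc.items():
        mean = statistics.fmean(accs)
        std = statistics.pstdev(accs)
        pam = 100.0 * sum(a > mean for a in accs) / len(accs)
        stats[branch] = (mean, std, pam)
    return stats


# Two branches with the same mean accuracy but different phase behavior.
example = profile_2d({"brA": [0.9, 0.7, 0.9, 0.7], "brB": [0.8, 0.8, 0.8, 0.8]})
```

In `example`, both branches have a mean accuracy of 0.8, so traditional profiling would treat them alike. The second dimension separates them: brA has a standard deviation of about 0.1 and PAM = 50%, while brB has a standard deviation of 0 and PAM = 0%. By the lecture's reasoning, brA is the candidate input-dependent branch.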

36
2D-profiling Conclusion and Future Work
  • 2D-profiling is a new profiling technique to find
    input-dependent characteristics by using a single
    input data set for profiling
  • 2D-profiling uses time-varying information
    instead of just average data
  • Phase behavior in prediction accuracy in a
    profile run ⇒ input-dependent
  • Future Work
  • Better predicated code/wish branch generation
    algorithms
  • Detecting other input-dependent program
    characteristics