Pipeline Control, Data Hazards and Branch Hazards - PowerPoint PPT Presentation

About This Presentation

Title:

Pipeline Control, Data Hazards and Branch Hazards

Description:

Single-cycle model (non-overlapping) The instruction ... solutions: forwarding/bypassing, stall/bubble ...

Slides: 46

Provided by: francis55

Learn more at: http://bear.ces.cwru.edu

Transcript and Presenter's Notes

Title: Pipeline Control, Data Hazards and Branch Hazards

1
Pipeline Control, Data Hazards and Branch Hazards
EECS 322 Computer Architecture
Instructor: Francis G. Wolff, wolff@eecs.cwru.edu
Case Western Reserve University
This presentation uses PowerPoint animation: please view the show
2
Models
Single-cycle model (non-overlapping)
The instruction latency is a single cycle
Every instruction, and so the clock cycle, must be stretched to the slowest instruction (p. 438)
Multi-cycle model (non-overlapping)
The instruction latency is multiple cycles
The clock cycle must be stretched to the slowest step
Ability to share functional units within the execution of a single instruction
Pipeline model (overlapping, p. 522)
The instruction latency is multiple cycles
The clock cycle must be stretched to the slowest step
The throughput is mainly one clock cycle per instruction
Gains efficiency by overlapping the execution of multiple instructions, increasing hardware utilization (p. 377)
3
Recap: Can pipelining get us into trouble?

Yes: Pipeline Hazards
structural hazards: attempt to use the same resource two different ways at the same time
e.g., multiple memory accesses, multiple register writes
solutions: multiple memories (separate instruction and data memory), stretch pipeline
control hazards: attempt to make a decision before the condition is evaluated
e.g., any conditional branch
solutions: prediction, delayed branch
data hazards: attempt to use an item before it is ready
e.g., add r1,r2,r3; sub r4,r1,r5; lw r6,0(r7); or r8,r6,r9
solutions: forwarding/bypassing, stall/bubble
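The data-hazard example above can be checked mechanically. The following is a minimal sketch, assuming each instruction is described by hand as a destination register plus its source registers; the tuple layout and the function name `adjacent_raw_hazards` are illustrative and not part of the slides. It only looks at back-to-back instructions, which is enough for this four-instruction example.

```python
# Each entry is (instruction text, destination register, source registers).
# The four instructions are the data-hazard example from the slide.
program = [
    ("add r1,r2,r3", "r1", ("r2", "r3")),
    ("sub r4,r1,r5", "r4", ("r1", "r5")),
    ("lw r6,0(r7)", "r6", ("r7",)),
    ("or r8,r6,r9", "r8", ("r6", "r9")),
]


def adjacent_raw_hazards(instructions):
    """Return (producer, consumer, register) for each back-to-back dependency."""
    hazards = []
    for prev, curr in zip(instructions, instructions[1:]):
        prev_text, prev_dest, _ = prev
        curr_text, _, curr_sources = curr
        # The consumer reads a register that the previous instruction writes.
        if prev_dest in curr_sources:
            hazards.append((prev_text, curr_text, prev_dest))
    return hazards


for producer, consumer, reg in adjacent_raw_hazards(program):
    print(f"{consumer} needs {reg} from {producer}")
```

The sketch reports two dependencies: sub needs r1 from add, and or needs r6 from lw.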

4
Review Single-Cycle Datapath
[Figure: single-cycle datapath. Labels: PC, Instruction memory (Read address, Instruction), Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), Sign extend (16 to 32), Shift left 2, Add (PC + 4), Add result, ALU (Zero, ALU result), Data memory (Address, Read data, Write data), Mux, And. Control signals: Branch, RegWrite, MemWrite, MemRead, ALUctl, RegDst, ALUSrc, MemtoReg]
5
Review Multi vs. Single-cycle Processor Datapath
Combine adders: add 1½ Mux and 3 temp. registers (A, B, ALUOut)
Combine memory: add 1 Mux and 2 temp. registers (IR, MDR)
[Figure: multi-cycle datapath. Labels: PC, Memory (Address, MemData, Write data), Instruction register, Memory data register, Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), A, B, ALU (Zero, ALU result), ALUOut, Sign extend (16 to 32), Shift left 2, ALU control, Mux. Control signals: IorD, MemRead, MemWrite, RegDst, RegWrite, ALUSrcA, IRWrite]
Single-cycle: 1 ALU + 2 Mem + 4 Muxes + 2 adders + Opcode Decoders
Multi-cycle: 1 ALU + 1 Mem + 5½ Muxes + 5 Reg (IR, A, B, MDR, ALUOut) + FSM
6
Multi-cycle Processor Datapath
Single-cycle: 1 ALU + 2 Mem + 4 Muxes + 2 adders + Opcode Decoders
Multi-cycle: 1 ALU + 1 Mem + 5½ Muxes + 5 Reg (IR, A, B, MDR, ALUOut) + FSM
[Figure: multi-cycle datapath, the same diagram as on the previous slide]
5x32 = 160 additional FFs for multi-cycle processor over single-cycle processor
7
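Figure 6.25
PC: 32 bits
Controlpath register bits
ID/EX: 2 WB + 3 M + 4 EX
EX/MEM: 2 WB + 3 M
MEM/WB: 2 WB
Total: 16 FFs
Datapath Registers
IF/ID: PC 32 bits, IR 32 bits
ID/EX: PC 32, A 32, B 32, Si 32, RT 5, RD 5
EX/MEM: PC 32, Z 1, ALUOut 32, B 32, D 5
MEM/WB: MDR 32, ALUOut 32, D 5
Multi-cycle datapath registers: 160 FFs
Additional pipeline datapath registers: 213 FFs
213 + 16 = 229 additional FFs for pipeline over multi-cycle processor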
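The flip-flop totals can be re-derived from the field widths listed on this slide. This is a small sketch, assuming the fields are grouped into the four pipeline registers the same way the single-stepping slide lays them out (IF/ID holds PC and IR, and so on); the dictionary names are illustrative.

```python
# Field widths in bits, grouped by pipeline register (grouping is an assumption
# taken from the <PC, IR>, <PC, A, B, S, Rt, Rd>, ... layout used later in the deck).
pipeline_registers = {
    "IF/ID": {"PC": 32, "IR": 32},
    "ID/EX": {"PC": 32, "A": 32, "B": 32, "Si": 32, "RT": 5, "RD": 5},
    "EX/MEM": {"PC": 32, "Z": 1, "ALUOut": 32, "B": 32, "D": 5},
    "MEM/WB": {"MDR": 32, "ALUOut": 32, "D": 5},
}

# Control bits carried along: WB = 2, M = 3, EX = 4.
control_bits = {"ID/EX": 2 + 3 + 4, "EX/MEM": 2 + 3, "MEM/WB": 2}

MULTI_CYCLE_FFS = 5 * 32  # IR, A, B, MDR, ALUOut

datapath_ffs = sum(sum(fields.values()) for fields in pipeline_registers.values())
control_ffs = sum(control_bits.values())
additional_ffs = (datapath_ffs - MULTI_CYCLE_FFS) + control_ffs

print("pipeline datapath FFs:", datapath_ffs)
print("pipeline control FFs:", control_ffs)
print("additional FFs over multi-cycle:", additional_ffs)
```

Under that grouping the widths add up to 373 datapath bits and 16 control bits, which gives the 213 + 16 = 229 figure quoted on the slide.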
8
Overhead
Chip Area vs. Speed
Single-cycle model (non-overlapping): 8 ns Clock (125 MHz)
1 ALU + 2 adders, 0 Muxes, 0 Datapath Register bits (Flip-Flops)
Multi-cycle model (non-overlapping): 2 ns Clock (500 MHz)
1 ALU + Controller, 5 Muxes, 160 Datapath Register bits (Flip-Flops)
Pipeline model (overlapping): 2 ns Clock (500 MHz)
2 ALU + Controller, 4 Muxes, 373 Datapath + 16 Controlpath Register bits (Flip-Flops)
9
Pipeline Control: Controlpath Register bits
ID/EX: 9 control bits
EX/MEM: 5 control bits
MEM/WB: 2 control bits
Figure 6.29
10
Pipeline Control: Controlpath table
Figure 5.20, Single Cycle
Instruction  RegDst  ALUSrc  MemReg  RegWrt  MemRed  MemWrt  Branch  ALUop1  ALUop0
R-format     1       0       0       1       0       0       0       1       0
lw           0       1       1       1       1       0       0       0       0
sw           X       1       X       0       0       1       0       0       0
beq          X       0       X       0       0       0       1       0       1
Figure 6.28
             ID/EX control lines             EX/MEM control lines    MEM/WB control lines
Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRed  MemWrt  RegWrt  MemReg
R-format     1       1       0       0       0       0       0       1       0
lw           0       0       0       1       0       1       0       1       1
sw           X       0       0       1       0       0       1       0       X
beq          X       0       1       0       1       0       0       0       X
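One way to read the Figure 6.28 grouping is that each pipeline register only carries the control lines still needed by later stages: 9 bits into ID/EX, 5 into EX/MEM, 2 into MEM/WB. The sketch below encodes the table as strings and slices it per register. The names `CONTROL_TABLE`, `CARRIED` and `control_bits` are illustrative, and the lw row uses RegDst = 0 on the assumption that lw writes the register named by its rt field.

```python
EX_LINES = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_LINES = ("Branch", "MemRead", "MemWrite")
WB_LINES = ("RegWrite", "MemtoReg")
ALL_LINES = EX_LINES + MEM_LINES + WB_LINES

# One character per control line, in ALL_LINES order; "X" means don't care.
CONTROL_TABLE = {
    "R-format": "1100" + "000" + "10",
    "lw": "0001" + "010" + "11",
    "sw": "X001" + "001" + "0X",
    "beq": "X010" + "100" + "0X",
}

# Control lines that each pipeline register still has to hold.
CARRIED = {
    "ID/EX": ALL_LINES,
    "EX/MEM": MEM_LINES + WB_LINES,
    "MEM/WB": WB_LINES,
}


def control_bits(instruction, pipeline_register):
    """Return the control lines latched in the given pipeline register."""
    values = dict(zip(ALL_LINES, CONTROL_TABLE[instruction]))
    return {name: values[name] for name in CARRIED[pipeline_register]}


for register in ("ID/EX", "EX/MEM", "MEM/WB"):
    print(register, control_bits("lw", register))
```

The EX-stage lines are consumed first and dropped, then the memory-stage lines, so only RegWrite and MemtoReg reach MEM/WB.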
11
Pipeline Hazards
Pipeline hazards
Solution 1: always works (for non-realtime applications): stall, delay, procrastinate!
Structural Hazards (i.e. fetching same memory bank)
Solution 2: partition architecture
Control Hazards (i.e. branching)
Solution 1: stall! but decreases throughput
Solution 2: guess and back-track
Solution 3: delayed decision: delay branch, fill slot
Data Hazards (i.e. register dependencies)
Worst case situation
Solution 2: re-order instructions
Solution 3: forwarding or bypassing, delayed load
12
Pipeline Datapath and Controlpath
Figure 6.30
13
load inst.
Figure 6.30
14
load inst.
Figure 6.30
15
Pipeline single stepping
Contents of Register: C1=3, C2=4, C3=4, C4=6, C5=7, C10=8; Memory[23]=9
Formats: add $rd,$rsA,$rtB    lw $rtB,@($rsA)
Clock  <IF/ID>               <ID/EX>                   <EX/MEM>                         <MEM/WB>
       <PC, IR>              <PC, A, B, S, Rt, Rd>     <PC, Z, ALU, B, R>               <MDR, ALU, R>
0      <0,?>                 <?,?,?,?,?,?>             <?,?,?,?,?>                      <?,?,?>
1      <4,lw $10,20($1)>     <0,?,?,?,?,?>             <?,?,?,?,?>                      <?,?,?>
2      <8,sub $11,$2,$3>     <4,C1=3,C10=8,20,10,0>    <0,?,?,?,?>                      <?,?,?>
3      <12,and $12,$4,$5>    <8,C2=4,C3=4,X,3,11>      <4+20<<2=84,0,20+3=23,8,10>      <?,?,?>
4      <16,or $13,$6,$7>     <12,C4=6,C5=7,X,5,12>     <X,1,4-4=0,4,11>                 <Mem[23]=9,23,10>
5      <20,add $14,$8,$9>    <16,C6,C7,X,7,13>         <X,0,1,7,12>                     <X,0,11>
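The EX/MEM and MEM/WB entries for the first two instructions can be recomputed from the register and memory contents given on this slide. This is a sketch under the assumption that the EX stage always runs both the branch adder (PC + 4 plus the offset shifted left by 2) and the ALU, as the trace suggests; the helper names are illustrative.

```python
# Register and memory contents stated on the slide.
regs = {1: 3, 2: 4, 3: 4, 4: 6, 5: 7, 10: 8}
memory = {23: 9}


def ex_stage_lw(pc_plus_4, base_reg, offset):
    """EX stage for lw: branch adder output and effective address."""
    branch_adder = pc_plus_4 + (offset << 2)  # computed even though lw ignores it
    address = offset + regs[base_reg]
    return branch_adder, address


def ex_stage_sub(rs, rt):
    """EX stage for sub: Zero flag and ALU result."""
    result = regs[rs] - regs[rt]
    zero = 1 if result == 0 else 0
    return zero, result


# lw $10,20($1): EX in clock 3, MEM in clock 4.
branch_adder, address = ex_stage_lw(4, 1, 20)
print(branch_adder, address, memory[address])

# sub $11,$2,$3: EX in clock 4.
print(ex_stage_sub(2, 3))
```

The lw values (84, 23, then 9 read from memory) and the sub values (Zero = 1, result 0) match the clock 3 and clock 4 rows of the trace.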
16
Clock 1 Figure 6.31a
PC=4
PC=0
IR=lw $10,20($1)
17
Clock 2 Figure 6.31b
PC=4
A=C1
B=X
PC=4
S=20
T=10
D=0
18
Clock 3 Figure 6.32a
PC=4+20<<2
PC=8
ALU=20+C1
D=10
19
Clock 4 Figure 6.32b
PC=20
20
Data Dependencies that can be resolved by
forwarding
Figure 6.36
21
Data Hazards arithmetic
Figure 6.37
22
Data Dependencies: no forwarding
sub $2,$1,$3
and $12,$2,$5
Suppose every instruction is dependent: 1 + 2 stalls = 3 clocks
MIPS = Clock / CPI = 500 MHz / 3 = 167 MIPS
CPI = 3
23
Data Dependencies: no forwarding
A dependent instruction will take 1 + 2 stalls = 3 clocks
An independent instruction will take 1 + 0 stalls = 1 clock
Suppose 10% of the time the instructions are dependent?
Average instruction time = 10% * 3 + 90% * 1 = 0.10 * 3 + 0.90 * 1 = 1.2 clocks
MIPS = Clock / CPI = 500 MHz / 1.2 = 417 MIPS (10% dependency), CPI = 1.2
MIPS = Clock / CPI = 500 MHz / 3 = 167 MIPS (100% dependency), CPI = 3
MIPS = Clock / CPI = 500 MHz / 1 = 500 MIPS (0% dependency), CPI = 1
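The three cases above follow one formula: a weighted average of the 3-clock dependent case and the 1-clock independent case, then MIPS = Clock / CPI. A minimal sketch, with function names chosen here for illustration and the 2-stall penalty taken from the slide:

```python
def cpi_no_forwarding(dependent_fraction, stall_cycles=2):
    """Average clocks per instruction when dependent instructions stall."""
    dependent_time = 1 + stall_cycles
    independent_time = 1
    return (dependent_fraction * dependent_time
            + (1 - dependent_fraction) * independent_time)


def mips(clock_mhz, cpi):
    """Millions of instructions per second for a given clock and CPI."""
    return clock_mhz / cpi


for fraction in (0.10, 1.00, 0.00):
    cpi = cpi_no_forwarding(fraction)
    print(f"{fraction:.0%} dependency: CPI = {cpi:.1f}, {mips(500, cpi):.0f} MIPS")
```

Setting `stall_cycles=0` models the forwarding case, where the CPI stays at 1 regardless of the dependency fraction.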
24
Data Dependencies: with forwarding
sub $2,$1,$3
and $12,$2,$5
Detected Data Hazard 1a: ID/EX.rs = EX/MEM.rd
Suppose every instruction is dependent: 1 + 0 stalls = 1 clock
MIPS = Clock / CPI = 500 MHz / 1 = 500 MIPS
CPI = 1
25
Data Dependencies Hazard Conditions
A Data Hazard Condition occurs whenever a data source needs a result that a previous instruction's data destination has not yet made available.
Data Hazard Detection always compares a destination with a source.
26
Data Dependencies Hazard Conditions
27
Data Dependencies: Worst case
Data Hazard
sub $2, $1, $3     sub rd, rs, rt
and $12, $2, $2    and rd, rs, rt
or $13, $2, $2     or rd, rs, rt
28
Data Dependencies: Hazard Conditions
Hazard Type  Source    Destination
1a.          ID/EX.rs  EX/MEM.rdest
1b.          ID/EX.rt  EX/MEM.rdest
Pipeline Registers
ID/EX: rs, rt, rd
EX/MEM: rd
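Conditions 1a and 1b are plain register-number comparisons between the ID/EX sources and the EX/MEM destination. The sketch below assumes two small record types, `IdEx` and `ExMem`, invented for illustration. The extra checks that the earlier instruction actually writes a register and that the destination is not register 0 are assumptions borrowed from the usual textbook formulation; they are not stated on this slide.

```python
from dataclasses import dataclass


@dataclass
class IdEx:
    rs: int
    rt: int


@dataclass
class ExMem:
    rd: int
    reg_write: bool = True


def ex_hazards(id_ex, ex_mem):
    """Return the hazard types (1a, 1b) that call for forwarding from EX/MEM."""
    found = []
    # Assumption: nothing to forward if the result is not written,
    # or if it targets register 0.
    if not ex_mem.reg_write or ex_mem.rd == 0:
        return found
    if ex_mem.rd == id_ex.rs:
        found.append("1a")
    if ex_mem.rd == id_ex.rt:
        found.append("1b")
    return found


# sub $2,$1,$3 sits in EX/MEM while and $12,$2,$5 sits in ID/EX.
print(ex_hazards(IdEx(rs=2, rt=5), ExMem(rd=2)))
# Worst case: and $12,$2,$2 reads $2 through both source fields.
print(ex_hazards(IdEx(rs=2, rt=2), ExMem(rd=2)))
```

The first call reports hazard 1a only; the worst-case instruction triggers both 1a and 1b.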
29
Figure 6.38
30
Data Hazards Loads
Figure 6.44
31
Data Hazards load stalling
Figure 6.45
32
Data Hazards: Hazard detection unit (page 490)
Stall Condition
Source: IF/ID.rs, IF/ID.rt
Destination: ID/EX.rt, when ID/EX.MemRead = 1
No Stall Example (only need to look at next instruction)
lw $2, 20($1)     lw rt, addr(rs)
and $4, $1, $5    and rd, rs, rt
or $8, $2, $6     or rd, rs, rt
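The stall condition compares the load's destination (ID/EX.rt) with the two source fields of the instruction right behind it (IF/ID.rs and IF/ID.rt), and only matters when ID/EX.MemRead is set. A minimal sketch, with a function name and argument order chosen here for illustration:

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in IF/ID uses the result of a load in ID/EX."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2,20($1) is in ID/EX; and $4,$1,$5 follows it: no stall.
print(must_stall(1, 2, 1, 5))

# If or $8,$2,$6 came directly after the load instead, it would stall.
print(must_stall(1, 2, 2, 6))
```

In the slide's example the or instruction is two slots behind the load, so only the and instruction is checked against it and no stall is inserted.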
33
Data Hazards: Hazard detection unit (page 490)
No Stall Example (only need to look at next instruction)
lw $2, 20($1)     lw rt, addr(rs)
and $4, $1, $5    and rd, rs, rt
or $8, $2, $6     or rd, rs, rt
Example load: assume half of the loads are immediately followed by an instruction that uses the loaded value.
What is the average number of clocks for the load?
load instruction time = 50% * (1 clock) + 50% * (2 clocks) = 1.5
34
Hazard Detection Unit when to stall
Figure 6.46
35
Data Dependency Units
36
Data Dependency Units
Pipeline Registers
Forwarding Comparisons: ID/EX rs, rt, rd
Stalling Comparisons: IF/ID rs, rt, rd
37
Branch Hazards: Soln 1, Stall until Decision made (fig. 6.4)
Decision made in ID stage: do load
Stall
38
Branch Hazards: Soln 2, Predict until Decision made
Clock            1    2    3    4    5    6    7    8
beq $1,$3,7      IF   ID   EX   M    WB
and $12,$2,$5         IF   ID   EX   M    WB
lw $4,50($7)               IF   ID   EX   M    WB
Predict false branch
discard and $12,$2,$5 instruction
Decision made in ID stage: discard, branch
39
Branch Hazards: Soln 3, Delayed Decision
Clock            1    2    3    4    5    6    7    8
beq $1,$3,7      IF   ID   EX   M    WB
add $4,$6,$6          IF   ID   EX   M    WB
lw $4,50($7)               IF   ID   EX   M    WB
Move instruction before branch
Do not need to discard instruction
Decision made in ID stage: branch
40
Branch Hazards: Soln 3, Delayed Decision
Clock            1    2    3    4    5    6    7    8
beq $1,$3,7      IF   ID   EX   M    WB
and $12,$2,$5         IF   ID   EX   M    WB
lw $4,50($7)               IF   ID   EX   M    WB
Decision made in ID stage: do branch
41
Branch Hazards: Decision made in the ID stage (figure 6.4)
Clock            1    2    3    4    5    6    7    8
beq $1,$3,7      IF   ID   EX   M    WB
nop                   IF   ID   EX   M    WB
lw $4,50($7)               IF   ID   EX   M    WB
No decision yet: insert a nop
Decision: do load
42
Branch Hazards: Soln 2, Predict until Decision made
Branch decision made in MEM stage: discard values when the prediction is wrong
Predict false branch
Same effect as 3 stalls
Figure 6.50
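How much a wrong prediction costs depends on the stage in which the branch is decided: 3 discarded instructions when the decision is made in MEM, 1 when it is made in ID. The sketch below turns that into an average branch time. The 25% misprediction rate is the figure assumed on the Performance slide of this deck, and the MEM-stage result is computed here for illustration only; it does not appear on the slides.

```python
def average_branch_clocks(mispredict_rate, penalty_clocks):
    """Average clocks per branch: 1 clock plus the expected flush penalty."""
    return 1 + mispredict_rate * penalty_clocks


MISPREDICT_RATE = 0.25  # assumption: rate used on the Performance slide

decided_in_id = average_branch_clocks(MISPREDICT_RATE, penalty_clocks=1)
decided_in_mem = average_branch_clocks(MISPREDICT_RATE, penalty_clocks=3)

print("decision in ID: ", decided_in_id)
print("decision in MEM:", decided_in_mem)
```

Moving the comparison earlier in the pipeline lowers the penalty term, which is the motivation for the early branch comparison hardware.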
43
Figure 6.51
Early branch comparison
Flush if wrong prediction, add nops
44
Performance
load: assume half of the loads are immediately followed by an instruction that uses the loaded value (i.e. data dependency). load instruction time = 50% * (1 clock) + 50% * (2 clocks) = 1.5
Jump: assume that jumps always pay 1 full clock cycle delay (stall). Jump instruction time = 2
Branch: assume the branch delay of a misprediction is 1 clock cycle and that 25% of the branches are mispredicted. branch time = 75% * (1 clock) + 25% * (2 clocks) = 1.25
45
Performance, page 504
Instruction  Pipeline Cycles         Instruction Mix  Single-Cycle   Multi-Cycle Clocks
loads        1.5 (50% dependency)    23%              1              5
stores       1                       13%              1              4
arithmetic   1                       43%              1              4
branches     1.25 (25% dependency)   19%              1              3
jumps        2                       2%               1              3
Clock speed  500 MHz, 2 ns                            125 MHz, 8 ns  500 MHz, 2 ns
CPI          1.18                                     1              4.02
MIPS         424 MIPS                                 125 MIPS       125 MIPS
Pipeline Cycles: also known as the instruction latency within a pipeline
CPI = sum of Cycles * Mix (pipeline throughput)
MIPS = Clock / CPI
load instruction time = 50% * (1 clock) + 50% * (2 clocks) = 1.5
branch time = 75% * (1 clock) + 25% * (2 clocks) = 1.25
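The CPI row of the table is a mix-weighted sum of the per-class cycle counts, and the MIPS row divides the clock rate by that CPI. This short sketch recomputes both for the pipeline and multi-cycle columns. The dictionary names are illustrative. The unrounded MIPS values come out slightly below the slide's figures, which appear to be rounded (424 matches 500 / 1.18).

```python
# Instruction mix and per-class cycle counts from the table.
mix = {"loads": 0.23, "stores": 0.13, "arithmetic": 0.43,
       "branches": 0.19, "jumps": 0.02}
pipeline_cycles = {"loads": 1.5, "stores": 1, "arithmetic": 1,
                   "branches": 1.25, "jumps": 2}
multi_cycle_clocks = {"loads": 5, "stores": 4, "arithmetic": 4,
                      "branches": 3, "jumps": 3}


def cpi(cycles):
    """Mix-weighted average clocks per instruction."""
    return sum(cycles[name] * mix[name] for name in mix)


def mips(clock_mhz, cycles_per_instruction):
    return clock_mhz / cycles_per_instruction


pipeline_cpi = cpi(pipeline_cycles)
multi_cycle_cpi = cpi(multi_cycle_clocks)

print(f"pipeline:     CPI = {pipeline_cpi:.4f}, {mips(500, pipeline_cpi):.1f} MIPS")
print(f"multi-cycle:  CPI = {multi_cycle_cpi:.2f}, {mips(500, multi_cycle_cpi):.1f} MIPS")
print(f"single-cycle: CPI = 1, {mips(125, 1):.0f} MIPS")
```

Changing the entries in `mix` shows how sensitive the pipeline's advantage is to the share of loads and branches in the workload.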
