Pipelining - Hazards - PowerPoint PPT Presentation

Provided by: Ata124


1
Pipelining - Hazards
2
Can Pipelining Get Us Into Trouble?

Yes: Pipeline Hazards
Structural hazards: attempt to use the same
resource two different ways at the same time
E.g., a combined washer/dryer would be a structural
hazard, or the folder is busy doing something else
(watching TV)
Control hazards: attempt to make a decision
before the condition is evaluated
E.g., washing football uniforms and need to get
the proper detergent level; need to see the result
after the dryer before putting the next load in
Branch instructions
Data hazards: attempt to use an item before it is
ready
E.g., one sock of a pair is in the dryer and one is
in the washer; can't fold until the sock from the
washer gets through the dryer
Instruction depends on the result of a prior
instruction still in the pipeline

3
Structural Hazard

A relation between two instructions indicating
that the two instructions may want to use the
same hardware resource (function unit, register
file port, shared bus, cache port, etc.) at the
same time
The MIPS pipeline as designed so far does not have
structural hazards
But only because we designed it to avoid them
Usually occurs when a functional unit is not
fully pipelined (e.g., in floating point pipeline)

4
Single Memory Port / Structural Hazard
[Figure: pipeline timing diagram, time in clock cycles (Cycle 1 to Cycle 7) against instruction order: Load, Instr 1, Instr 2, Instr 3, Instr 4]
5
Single Memory Port / Structural Hazard
[Figure: pipeline timing diagram, time in clock cycles (Cycle 1 to Cycle 7) against instruction order: Load, Instr 1, Instr 2, Stall, Instr 3; the stall is inserted where an instruction fetch would collide with the Load's DMem access]
How do you bubble the pipe?
6
Single Memory Port / Structural Hazard

Instead of stalling the pipeline
Other solutions
Make dual ported memory
Physically separate memory architecture into
instruction and data (Harvard Architecture from
Harvard Mark I project of IBM led by Dr. Howard
Aiken)
Another typical structural hazard
Functional unit is not fully pipelined due to
cost/complexity
Pipeline interval > 1 pipe stage

7
Example Cost of Structural Hazard
Suppose that 40% of the instruction mix are loads or
stores, and that the ideal CPI of the pipelined
machine is 1. Assume that the machine with the
structural hazard has a clock rate that is 5%
higher than the clock rate of the machine
without the hazard. Which pipeline is faster,
and by how much?
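
One way to work this example, as a rough sketch. It assumes that every load or store costs exactly one stall cycle on the machine with the hazard (the slide does not state the stall length), and it treats the load/store share as 40% and the clock-rate advantage as 5%. The variable names are illustrative.

```python
# Assumptions (not all stated on the slide): each load/store stalls 1 cycle
# on the machine with the hazard, and its clock is 5% faster.
mem_fraction = 0.40      # loads + stores in the instruction mix
ideal_cpi = 1.0
stall_cycles = 1         # assumed stall per load/store
clock_ratio = 1.05       # hazard machine clock rate / no-hazard clock rate

cpi_hazard = ideal_cpi + mem_fraction * stall_cycles   # 1.4

# Average instruction time, in units of the no-hazard machine's cycle time.
time_no_hazard = ideal_cpi * 1.0
time_hazard = cpi_hazard / clock_ratio

speedup = time_hazard / time_no_hazard
print(f"machine without the hazard is about {speedup:.2f}x faster")
```

Under these assumptions the faster clock does not make up for the stalls: the machine without the structural hazard comes out roughly 1.33 times faster.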
8
Data Hazards
9
Three Generic Data Hazards

True (or Flow) Dependency (Read After Write, or
RAW)
A later instruction tries to read operand before
earlier instructions write it

I: add r1,r2,r3
J: sub r4,r1,r3
10
RAW Hazards

True (value, flow) dependence between
instructions i and j means i produces a result
value that j uses
This is a producer-consumer relationship
This is a dependence based on values, not on the
names of the containers of the values
Every true dependence is a RAW hazard
Not every RAW hazard is a true dependence
Any RAW hazard that cannot be removed by renaming
is a true dependence

Original program
1: A = B op C
2: A = D op E
3: G = A op H
True dependence (2,3); RAW hazard (1,3), (2,3)
Renamed program
1: X = B op C
2: A = D op E
3: G = A op H
True dependence (2,3); RAW hazard (2,3)
11
Three Generic Data Hazards

Anti-Dependency (Write After Read, or WAR)
A later instruction tries to write operand before
earlier instructions read it
This hazard results from reuse of the same
register
Can't happen in our simple 5-stage pipeline
because
All instructions take 5 stages, and
Reads are always in stage 2, and
Writes are always in stage 5

I: add r2,r1,r3
J: sub r1,r4,r3
12
Three Generic Data Hazards

Output Dependency (Write After Write, or WAW)
A later instruction tries to write operand before
earlier instructions write it
This hazard results from reuse of the same
register
Can't happen in our simple 5-stage pipeline
because
All instructions take 5 stages, and
Reads are always in stage 2, and
Writes are always in stage 5

I: add r1,r2,r3
J: sub r1,r4,r3
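
The three definitions can be checked mechanically. The sketch below is illustrative only: it represents each instruction as a hypothetical (destination, sources) pair and reports which hazards are possible by register name. Whether a hazard actually causes trouble still depends on the pipeline, as the slides note for WAR and WAW.

```python
def classify(first, second):
    """Return the set of possible data hazards between an earlier and a later instruction.

    Each instruction is (dest, sources); registers are plain strings.
    """
    d1, s1 = first
    d2, s2 = second
    hazards = set()
    if d1 in s2:
        hazards.add("RAW")   # later instruction reads what the earlier one writes
    if d2 in s1:
        hazards.add("WAR")   # later instruction writes what the earlier one reads
    if d1 == d2:
        hazards.add("WAW")   # both instructions write the same register
    return hazards

# The I/J pairs from the three "Generic Data Hazards" slides.
raw = classify(("r1", ("r2", "r3")), ("r4", ("r1", "r3")))
war = classify(("r2", ("r1", "r3")), ("r1", ("r4", "r3")))
waw = classify(("r1", ("r2", "r3")), ("r1", ("r4", "r3")))
print(raw, war, waw)
```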
13
More on WAR and WAW

WAR and WAW hazards are name dependences
Two instructions happen to use the same register
(name), although they don't have to
Can often be eliminated by renaming, either in
software or hardware
Implies the use of additional resources, hence
additional cost
Renaming is not always possible: implicit
operands such as accumulator, PC, or condition
codes cannot be renamed
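
A minimal sketch of renaming, assuming an unlimited supply of fresh names (`p0`, `p1`, ... are hypothetical) and instructions given as (destination, sources) pairs. Real hardware or compilers work with a finite register pool, and implicit operands are not modeled here.

```python
from itertools import count

def rename(program):
    """Give every destination a fresh name; sources follow the latest mapping."""
    fresh = (f"p{n}" for n in count())
    latest = {}                      # architectural name -> current fresh name
    renamed = []
    for dest, sources in program:
        # Read sources first so an instruction never sees its own new name.
        new_sources = tuple(latest.get(s, s) for s in sources)
        latest[dest] = next(fresh)
        renamed.append((latest[dest], new_sources))
    return renamed

# WAR pair from the slides: add r2,r1,r3 ; sub r1,r4,r3
print(rename([("r2", ("r1", "r3")), ("r1", ("r4", "r3"))]))
```

After renaming, the second instruction no longer writes a name the first one reads, so the WAR hazard is gone. A true (RAW) dependence survives, because the consumer's source is mapped to the producer's new name.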

14
How to Break the Dependency

Dependency reduces concurrency
Can we break
True dependency (RAW)?
Name dependency or false dependency (WAR, WAW)?

15
Software Solution

Have compiler guarantee no hazards
Where do we insert the nops?
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Problem: this really slows us down!
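
A rough sketch of what the compiler would have to do. It assumes no forwarding and a register file that is written in the first half of a cycle and read in the second half, so a consumer must sit at least three slots after its producer (`min_distance=3`). Both the assumption and the tuple encoding of instructions are illustrative, not taken from the slides.

```python
NOP = ("nop", None, ())

def insert_nops(program, min_distance=3):
    """Pad a program with nops so no instruction reads a register too early.

    Each instruction is (opcode, dest, sources). min_distance is the assumed
    minimum number of slots between a producer and its consumer.
    """
    out = []
    for op, dest, srcs in program:
        needed = 0
        # Look at the instructions that would still be "too close".
        for back in range(1, min_distance):
            idx = len(out) - back
            if idx < 0:
                break
            prev_dest = out[idx][1]
            if prev_dest is not None and prev_dest != "$0" and prev_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out

program = [
    ("sub", "$2", ("$1", "$3")),
    ("and", "$12", ("$2", "$5")),
    ("or", "$13", ("$6", "$2")),
    ("add", "$14", ("$2", "$2")),
    ("sw", None, ("$15", "$2")),
]
padded = insert_nops(program)
print([op for op, _, _ in padded])
```

Under these assumptions two nops go between sub and and; the later instructions are then far enough from sub on their own.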

16
Hardware Solution: Forwarding
[Figure: pipeline timing diagram, time (clock cycles) against instruction order]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
17
Forwarding (simplified)
[Figure: datapath showing the ID/EX, EX/MEM, and MEM/WB pipeline registers, register file, ALU, data memory, and operand MUXes]
18
Forwarding Unit
1. Forwarding between ALUOut and ALUMuxA
sub $2, $1, $3
and $12, $2, $5
EX/MEM.RegisterRd = ID/EX.RegisterRs = $2
=> Use EX/MEM.ALUOut instead of ID/EX.A
a. Some instructions do not write registers
b. Every use of $0 as an operand must yield an
operand value of zero

If ( EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs) )
ForwardA = 01
19
Forwarding Unit
2. Forwarding between ALUOut and ALUMuxB
sub $2, $1, $3
and $12, $5, $2
EX/MEM.RegisterRd = ID/EX.RegisterRt = $2
=> Use EX/MEM.ALUOut instead of ID/EX.B

If ( EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt) )
ForwardB = 01
20
Forwarding (from EX/MEM)
[Figure: datapath showing forwarding from the EX/MEM pipeline register to the ALU input MUXes; ID/EX, EX/MEM, MEM/WB, register file, ALU, data memory]
21
Forwarding Unit
3. Forwarding between ALUOut and ALUMuxA
sub $2, $1, $3
and $12, $2, $5
or $13, $2, $6
MEM/WB.RegisterRd = ID/EX.RegisterRs = $2
=> Use MEM/WB.ALUOut instead of ID/EX.A

If ( MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs) )
ForwardA = 10
22
Forwarding Unit
4. Forwarding between ALUOut and ALUMuxB
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
MEM/WB.RegisterRd = ID/EX.RegisterRt = $2
=> Use MEM/WB.ALUOut instead of ID/EX.B

If ( MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt) )
ForwardB = 10
23
Forwarding (from MEM/WB)
[Figure: datapath showing forwarding from the MEM/WB pipeline register to the ALU input MUXes; ID/EX, EX/MEM, MEM/WB, register file, ALU, data memory]
24
Forwarding (operand selection)
[Figure: datapath in which the Forwarding Unit drives the select lines of the ALU input MUXes; ID/EX, EX/MEM, MEM/WB, register file, ALU, data memory]
25
Forwarding (operand propagation)
[Figure: datapath in which Rs and Rt from ID/EX, together with EX/MEM Rd and MEM/WB Rd, are propagated to the Forwarding Unit; ID/EX, EX/MEM, MEM/WB, register file, ALU, data memory, MUXes]
26
Forwarding
27
Datapath with Forwarding Unit
28
Forwarding Unit
add $1, $1, $2
add $1, $1, $3
add $1, $1, $4

If ( MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs) )
ForwardA = 10
If ( MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt) )
ForwardB = 10
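
The forwarding conditions can be put together in a small sketch. The `PipeReg` class and `forward` function are hypothetical names; the 01 / 10 encodings follow the slides. The sketch gives EX/MEM priority by testing it first, which has the same intent as the extra EX/MEM.RegisterRd term above but is not literally the same condition (it also checks EX/MEM.RegWrite).

```python
from dataclasses import dataclass

@dataclass
class PipeReg:
    reg_write: bool   # RegWrite control bit carried in the pipeline register
    rd: int           # destination register number

NO_FORWARD, FROM_EX_MEM, FROM_MEM_WB = 0b00, 0b01, 0b10

def forward(ex_mem, mem_wb, src):
    """Select the ALU operand source for register number src (Rs or Rt)."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return FROM_EX_MEM            # most recent result wins
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return FROM_MEM_WB
    return NO_FORWARD                 # use the value read from the register file

# Third "add $1, $1, $4" is in EX: both older adds wrote $1.
forward_a = forward(PipeReg(True, 1), PipeReg(True, 1), src=1)
forward_b = forward(PipeReg(True, 1), PipeReg(True, 1), src=4)
print(format(forward_a, "02b"), format(forward_b, "02b"))
```

For the three-add sequence, operand A is taken from EX/MEM (the second add), not from the stale MEM/WB value, and operand B comes from the register file.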
29
Some Other Data Dependencies

lw $1, 0($2)   F D X M W
sw $1, 0($7)     F D X M W
sw $1, 0($8)       F D X M W
sw $1, 0($9)         F D X M W

30
Can't always forward

Load word can still cause a hazard

[Figure: pipeline timing diagram, time (clock cycles) against instruction order]
31
Data Hazard Even with Forwarding
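[Figure: pipeline timing diagram, time (clock cycles) against instruction order; the instruction after the load is not issued for one cycle (bubble) until the loaded value is available from DMem]
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
Thus, we need a hazard detection unit to stall
the instruction that uses the loaded value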
32
Stalling

Hazard detection unit

If ( ID/EX.MemRead and
((ID/EX.RegisterRt = IF/ID.RegisterRs) or
(ID/EX.RegisterRt = IF/ID.RegisterRt)) )
stall the pipeline

When the pipeline is stalled
Do not fetch a new instruction: prevent the PC and
IF/ID registers from changing
Create a bubble in the pipeline: set all control
signals to 0 to create a do-nothing instruction
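
The stall condition and its two effects might be sketched as follows. The function names and the dictionary of control signals are hypothetical, and the PC increment of 4 assumes 4-byte MIPS instructions.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs a value that a load in EX has not read yet."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)

def next_state(pc, stall, decoded_controls):
    """Return (next PC, control signals sent down the pipe) for one cycle.

    On a stall the PC is held (IF/ID would be held too) and all control
    signals are zeroed so that a bubble enters EX.
    """
    if stall:
        return pc, {name: 0 for name in decoded_controls}
    return pc + 4, decoded_controls   # assumed 4-byte instructions

# lw r1, 0(r2) is in EX while sub r4, r1, r6 is in ID.
stall = must_stall(True, id_ex_rt=1, if_id_rs=1, if_id_rt=6)
print(stall, next_state(100, stall, {"RegWrite": 1, "MemRead": 0}))
```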

33
Hazard Detection Unit
34
Code rescheduling to Avoid Load Hazards
Try producing fast code for
a = b + c
d = e - f
assuming a, b, c, d, e, and f are in memory.
Slow code
LW Rb,b
LW Rc,c
ADD Ra,Rb,Rc
SW a,Ra
LW Re,e
LW Rf,f
SUB Rd,Re,Rf
SW d,Rd

Fast code
LW Rb,b
LW Rc,c
LW Re,e
ADD Ra,Rb,Rc
LW Rf,f
SW a,Ra
SUB Rd,Re,Rf
SW d,Rd

Compiler optimizes for performance. Hardware
checks for safety.
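
To see why the rescheduled code is faster, one can count load-use stalls in both versions. This is a simplified sketch: it assumes full forwarding, so the only stall is one cycle when an instruction immediately follows the LW that produces one of its sources. The tuple encoding (opcode, destination, sources) is illustrative.

```python
def load_use_stalls(program):
    """Count instructions that directly follow a load of one of their sources."""
    stalls = 0
    for (op, dest, _), (_, _, next_srcs) in zip(program, program[1:]):
        if op == "LW" and dest in next_srcs:
            stalls += 1
    return stalls

slow = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
print(load_use_stalls(slow), load_use_stalls(fast))
```

Under this model the slow code stalls twice (ADD after LW Rc, SUB after LW Rf) and the fast code not at all.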
35
Branch in the Pipelined Datapath
Computes branch target address
Computes branch outcome
Changes PC
36
Branch (Control) Hazards

When we decide to branch, other instructions are
in the pipeline!

37
Solving Branch Hazards

Stall the pipeline until the branch is complete
Branch is detected in ID stage
Pipeline is stalled
Pipeline is restarted in IF stage with either
the next instruction, or
the branch target
Three clock cycles will be lost for each branch!

38
Reducing Taken Branch Penalty

Compute branch target address earlier
Compute branch outcome earlier

39
Reducing Taken Branch Penalty

Branch is completed in ID stage
If branch is taken, flush the pipeline
1 cycle loss for a taken branch

Taken branch    F  D  X  M  W
Branch + 1         F  FL FL FL FL
Branch target         F  D  X  M  W
BT + 1                   F  D  X  M  W
40
Flushing the Instruction After Branch
41
Predict-not-Taken (Predict-Untaken)

Continue execution after the branch
If branch is not taken, no penalty
If branch is taken, flush the pipeline and lose
1 clock cycle
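
A back-of-the-envelope comparison of the two schemes. The branch frequency and taken fraction below are hypothetical values chosen only for illustration (the slides give none); the penalties of 3 cycles and 1 cycle are the ones stated on the slides.

```python
def effective_cpi(branch_freq, penalty_cycles, penalized_fraction=1.0, base_cpi=1.0):
    """Base CPI plus the average number of cycles lost to branches per instruction."""
    return base_cpi + branch_freq * penalized_fraction * penalty_cycles

branch_freq = 0.20      # hypothetical: 20% of instructions are branches
taken_fraction = 0.60   # hypothetical: 60% of branches are taken

# Stall on every branch: 3 cycles lost each time.
stall_always = effective_cpi(branch_freq, 3)
# Predict-not-taken with the branch resolved in ID: 1 cycle lost only when taken.
predict_not_taken = effective_cpi(branch_freq, 1, taken_fraction)
print(f"{stall_always:.2f} {predict_not_taken:.2f}")
```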

What about Predict-Taken?

42
Delayed Branches

Execution cycle with a branch delay of length n:
branch instruction
sequential successor 1
sequential successor 2
........
sequential successor n
branch target if taken
Instructions in the branch delay slot are
executed irrespective of branch outcome

Branch delay of length n
43
Delayed Branches on MIPS

One branch delay slot on MIPS
Taken and untaken branch behaviour are similar
Compiler must fill in the branch delay slot with
useful instructions

44
Delayed Branches

Question: What instruction do we put in the
branch delay slot?
Fill with NOP (always possible)
Fill from before (not always possible)
Fill from target (not always possible)
Fill from fall-through (not always possible)

45
Filling Branch Delay Slot
Make sure R7 will not be used in taken path
before redefined
46
Filling Branch Delay Slot
47
Cancelling Branches

Improves the ability of the compiler to fill in
delay slots
Instruction includes a bit showing its predicted
direction
When branch behaves as predicted, instruction in
the delay slot is executed
When branch is incorrectly predicted, instruction
in the delay slot is turned to NOP

48
Predict-Taken Cancelling Branch
49
Summary Pipelining