Slides: 50
Provided by: Ata124
Title: Pipelining - Hazards


1
Pipelining - Hazards
2
Can Pipelining Get Us Into Trouble?
  • Yes: pipeline hazards
  • Structural hazards: an attempt to use the same
    resource in two different ways at the same time
  • E.g., a combined washer/dryer would be a structural
    hazard, or the folder being busy doing something else
    (watching TV)
  • Control hazards: an attempt to make a decision
    before the condition is evaluated
  • E.g., washing football uniforms and needing to get
    the proper detergent level: we need to see the result
    after the dryer before putting the next load in
  • Branch instructions
  • Data hazards: an attempt to use an item before it is
    ready
  • E.g., one sock of a pair is in the dryer and one is in
    the washer: we can't fold until the sock from the washer
    gets through the dryer
  • An instruction depends on the result of a prior
    instruction still in the pipeline

3
Structural Hazard
  • A relation between two instructions indicating
    that the two instructions may want to use the
    same hardware resource (function unit, register
    file port, shared bus, cache port, etc.) at the
    same time
  • The MIPS pipeline as designed so far does not have
    a structural hazard
  • But we had to avoid it
  • Usually occurs when a functional unit is not
    fully pipelined (e.g., in floating point pipeline)

4
Single Memory Port / Structural Hazard
[Figure: pipeline diagram, time (clock cycles 1 to 7) vs. instruction order, for Load, Instr 1, Instr 2, Instr 3, Instr 4]
5
Single Memory Port / Structural Hazard
[Figure: pipeline diagram, time (clock cycles 1 to 7) vs. instruction order, for Load, Instr 1, Instr 2, Stall, Instr 3; figure label: DMem]
How do you bubble the pipe?
6
Single Memory Port / Structural Hazard
  • Instead of stalling the pipeline
  • Other solutions
  • Make dual ported memory
  • Physically separate memory architecture into
    instruction and data (Harvard Architecture from
    Harvard Mark I project of IBM led by Dr. Howard
    Aiken)
  • Another typical structural hazard
  • Functional unit is not fully pipelined due to
    cost/complexity
  • Pipeline interval > 1 pipe stage

7
Example Cost of Structural Hazard
Suppose that 40% of the instruction mix are loads or
stores, and that the ideal CPI of the pipelined
machine is 1. Assume that the machine with the
structural hazard has a clock rate that is 5%
higher than the clock rate of the machine
without the hazard. Which pipeline is faster,
and by how much?
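The question above can be worked through numerically. The sketch below reads the figures as 40% loads/stores and a 5% faster clock, and assumes (the slide does not say so) that every load or store costs exactly one stall cycle on the machine with the hazard. The function and parameter names are illustrative only.

```python
def average_instruction_time(cpi, cycle_time):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time


def structural_hazard_comparison(mem_fraction=0.40, clock_ratio=1.05,
                                 ideal_cpi=1.0, stall_cycles=1):
    """Return how many times faster the hazard-free machine is."""
    # Machine without the hazard: ideal CPI, reference cycle time of 1.0.
    t_without = average_instruction_time(ideal_cpi, 1.0)
    # Machine with the hazard: each load/store adds stall_cycles
    # (assumed to be 1), but the clock is clock_ratio times faster.
    cpi_with = ideal_cpi + mem_fraction * stall_cycles
    t_with = average_instruction_time(cpi_with, 1.0 / clock_ratio)
    return t_with / t_without


ratio = structural_hazard_comparison()
print(f"Machine without the hazard is about {ratio:.2f}x faster")
```

Under these assumptions the CPI with the hazard is 1.4, so the faster clock does not make up for the stalls.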
8
Data Hazards
9
Three Generic Data Hazards
  • True (or Flow) Dependency (Read After Write, or
    RAW)
  • A later instruction tries to read an operand before
    an earlier instruction writes it

I: add r1,r2,r3
J: sub r4,r1,r3
10
RAW Hazards
  • True (value, flow) dependence between
    instructions i and j means i produces a result
    value that j uses
  • This is a producer-consumer relationship
  • This is a dependence based on values, not on the
    names of the containers of the values
  • Every true dependence is a RAW hazard
  • Not every RAW hazard is a true dependence
  • Any RAW hazard that cannot be removed by renaming
    is a true dependence

Original program
  1: A = B + C
  2: A = D + E
  3: G = A + H
  True dependence: (2,3)
  RAW hazards: (1,3), (2,3)
Renamed program
  1: X = B + C
  2: A = D + E
  3: G = A + H
  True dependence: (2,3)
  RAW hazard: (2,3)
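The distinction between a name-based RAW hazard and a value-based true dependence can be checked mechanically. This is a minimal sketch, assuming each statement is modelled only as a destination name plus a tuple of source names (the operators do not matter here); the helper names are hypothetical.

```python
def raw_hazards(program):
    """All pairs (i, j), i < j, where statement j reads a name that i writes."""
    return [(i + 1, j + 1)
            for j, (_, srcs) in enumerate(program)
            for i in range(j)
            if program[i][0] in srcs]


def true_dependences(program):
    """Pairs (i, j) where j reads the value actually produced by i,
    i.e. i is the most recent writer of that name before j."""
    last_writer = {}
    deps = []
    for j, (dest, srcs) in enumerate(program):
        for name in dict.fromkeys(srcs):  # de-duplicate, keep order
            if name in last_writer:
                deps.append((last_writer[name] + 1, j + 1))
        last_writer[dest] = j
    return deps


# (destination, sources) for the three statements on the slide
original = [("A", ("B", "C")), ("A", ("D", "E")), ("G", ("A", "H"))]
renamed = [("X", ("B", "C")), ("A", ("D", "E")), ("G", ("A", "H"))]

print(raw_hazards(original), true_dependences(original))
print(raw_hazards(renamed), true_dependences(renamed))
```

Renaming the destination of statement 1 removes the (1,3) hazard, while (2,3) survives because statement 3 really consumes the value statement 2 produces.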
11
Three Generic Data Hazards
  • Anti-Dependency (Write After Read, or WAR)
  • A later instruction tries to write an operand before
    an earlier instruction reads it
  • This hazard results from reuse of the same
    register
  • Can't happen in our simple 5-stage pipeline
    because
  • All instructions take 5 stages, and
  • Reads are always in stage 2, and
  • Writes are always in stage 5

I: add r2,r1,r3
J: sub r1,r4,r3
12
Three Generic Data Hazards
  • Output Dependency (Write After Write, or WAW)
  • A later instruction tries to write an operand before
    an earlier instruction writes it
  • This hazard results from reuse of the same
    register
  • Can't happen in our simple 5-stage pipeline
    because
  • All instructions take 5 stages, and
  • Reads are always in stage 2, and
  • Writes are always in stage 5

I: add r1,r2,r3
J: sub r1,r4,r3
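The three hazard classes can be told apart by comparing register names alone. The sketch below assumes an instruction is modelled as a (destination, sources) pair and that I comes before J in program order; `classify_hazards` is a hypothetical helper, not part of any tool.

```python
def classify_hazards(i_instr, j_instr):
    """Return the data hazards that a later instruction J has on an earlier I."""
    i_dest, i_srcs = i_instr
    j_dest, j_srcs = j_instr
    hazards = []
    if i_dest in j_srcs:
        hazards.append("RAW")  # J reads what I writes
    if j_dest in i_srcs:
        hazards.append("WAR")  # J writes what I reads
    if i_dest == j_dest:
        hazards.append("WAW")  # J writes what I writes
    return hazards


# The I/J pairs from the three "Generic Data Hazards" slides
raw_pair = (("r1", ("r2", "r3")), ("r4", ("r1", "r3")))
war_pair = (("r2", ("r1", "r3")), ("r1", ("r4", "r3")))
waw_pair = (("r1", ("r2", "r3")), ("r1", ("r4", "r3")))

for pair in (raw_pair, war_pair, waw_pair):
    print(classify_hazards(*pair))
```

This only reports potential hazards by name; whether one can actually occur depends on the pipeline, as the slides note for WAR and WAW in the 5-stage design.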
13
More on WAR and WAW
  • WAR and WAW hazards are name dependences
  • Two instructions happen to use the same register
    (name), although they don't have to
  • Can often be eliminated by renaming, either in
    software or hardware
  • Implies the use of additional resources, hence
    additional cost
  • Renaming is not always possible: implicit
    operands such as the accumulator, PC, or condition
    codes cannot be renamed

14
How to Break the Dependency
  • Dependency reduces concurrency
  • Can we break
  • True dependency (RAW)
  • Name dependency or False dependency (WAR, WAW)

15
Software Solution
  • Have the compiler guarantee no hazards
  • Where do we insert the nops?
      sub $2, $1, $3
      and $12, $2, $5
      or  $13, $6, $2
      add $14, $2, $2
      sw  $15, 100($2)
  • Problem: this really slows us down!
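One way to answer the nop question is to compute the padding mechanically. The sketch below assumes a 5-stage pipeline with no forwarding in which the register file is written in the first half of a cycle and read in the second half, so a consumer must be at least three instruction slots behind its producer. That timing assumption is not stated on the slide, register $0 is not treated specially, and the tuple encoding is illustrative.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad a list of (op, dest, srcs) so that every reader is at least
    min_distance slots after the latest writer of each source register."""
    out = []
    last_write = {}  # register -> index in `out` of its latest writer
    for op, dest, srcs in program:
        need = 0
        for reg in srcs:
            if reg in last_write:
                gap = len(out) - last_write[reg]
                need = max(need, min_distance - gap)
        out.extend([NOP] * need)
        if dest is not None:
            last_write[dest] = len(out)
        out.append((op, dest, srcs))
    return out


program = [
    ("sub", "$2", ("$1", "$3")),
    ("and", "$12", ("$2", "$5")),
    ("or", "$13", ("$6", "$2")),
    ("add", "$14", ("$2", "$2")),
    ("sw", None, ("$15", "$2")),  # sw $15, 100($2) reads both registers
]

padded = insert_nops(program)
print([op for op, _, _ in padded])
```

Under these assumptions the two nops land between `sub` and `and`; the later readers of `$2` are then far enough away on their own.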

16
Hardware Solution: Forwarding
[Figure: pipeline diagram, time (clock cycles) vs. instruction order, for the sequence below]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
17
Forwarding (simplified)
[Figure: simplified datapath showing the ID/EX, EX/MEM, and MEM/WB pipeline registers, register file, MUX, ALU, and data memory]
18
Forwarding Unit
1. Forwarding between ALUOut and ALUMuxA
   sub $2, $1, $3
   and $12, $2, $5
EX/MEM.RegisterRd = ID/EX.RegisterRs = $2
=> Use EX/MEM.ALUOut instead of ID/EX.A
a. Some instructions do not write registers
b. Every use of $0 as an operand must yield an
operand value of zero

If ( EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs) )
ForwardA = 01
19
Forwarding Unit
2. Forwarding between ALUOut and ALUMuxB
   sub $2, $1, $3
   and $12, $5, $2
EX/MEM.RegisterRd = ID/EX.RegisterRt = $2
=> Use EX/MEM.ALUOut instead of ID/EX.B

If ( EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt) )
ForwardB = 01
20
Forwarding (from EX/MEM)
[Figure: datapath with forwarding from EX/MEM, showing the ID/EX, EX/MEM, and MEM/WB pipeline registers, register file, MUX, ALU, and data memory]
21
Forwarding Unit
3. Forwarding between ALUOut and ALUMuxA
   sub $2, $1, $3
   and $12, $2, $5
   or  $13, $2, $6
MEM/WB.RegisterRd = ID/EX.RegisterRs = $2
=> Use MEM/WB.ALUOut instead of ID/EX.A

If ( MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs) )
ForwardA = 10
22
Forwarding Unit
4. Forwarding between ALUOut and ALUMuxB
   sub $2, $1, $3
   and $12, $2, $5
   or  $13, $6, $2
MEM/WB.RegisterRd = ID/EX.RegisterRt = $2
=> Use MEM/WB.ALUOut instead of ID/EX.B

If ( MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt) )
ForwardB = 10
23
Forwarding (from MEM/WB)
[Figure: datapath with forwarding from MEM/WB, showing the ID/EX, EX/MEM, and MEM/WB pipeline registers, register file, MUX, ALU, and data memory]
24
Forwarding (operand selection)
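[Figure: datapath with operand-selection MUXes controlled by the forwarding unit, showing the ID/EX, EX/MEM, and MEM/WB pipeline registers, register file, ALU, and data memory]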
25
Forwarding (operand propagation)
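[Figure: datapath showing operand propagation: Rs, Rt, and Rd are carried through ID/EX, and EX/MEM Rd and MEM/WB Rd feed the forwarding unit; also shown are the register file, MUX, ALU, and data memory]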
26
Forwarding
27
Datapath with Forwarding Unit
28
Forwarding Unit
add $1, $1, $2
add $1, $1, $3
add $1, $1, $4

If ( MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs) )
ForwardA = 10
If ( MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt) )
ForwardB = 10
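The forwarding conditions from these slides can be collected into one selection function. This is a minimal sketch that transcribes the EX/MEM and MEM/WB conditions as written above; the `PipelineReg` class, the function name, and the use of "00" to mean "no forwarding, take the register file value" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PipelineReg:
    reg_write: bool  # RegWrite control bit held in the pipeline register
    rd: int          # destination register number (RegisterRd)


def forward_select(src, ex_mem, mem_wb):
    """Mux select for one ALU operand whose source register is `src`
    (ID/EX.RegisterRs for ForwardA, ID/EX.RegisterRt for ForwardB)."""
    # Most recent result: forward from EX/MEM.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "01"
    # Older result: forward from MEM/WB only if EX/MEM does not match.
    if (mem_wb.reg_write and mem_wb.rd != 0
            and ex_mem.rd != src and mem_wb.rd == src):
        return "10"
    return "00"  # assumed encoding for "no forwarding"


# Third instruction of: add $1,$1,$2 ; add $1,$1,$3 ; add $1,$1,$4
ex_mem = PipelineReg(reg_write=True, rd=1)  # second add
mem_wb = PipelineReg(reg_write=True, rd=1)  # first add
forward_a = forward_select(1, ex_mem, mem_wb)
forward_b = forward_select(4, ex_mem, mem_wb)
print(forward_a, forward_b)
```

With both earlier adds writing `$1`, the operand comes from EX/MEM, which holds the newer value; that is the case the extra `EX/MEM.RegisterRd ≠ ID/EX.RegisterRs` term is there to handle.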
29
Some Other Data Dependencies
  • lw $1, 0($2)   F D X M W
  • sw $1, 0($7)     F D X M W
  • sw $1, 0($8)       F D X M W
  • sw $1, 0($9)         F D X M W


30
Can't always forward
  • Load word can still cause a hazard

[Figure: pipeline diagram, time (clock cycles) vs. instruction order]
31
Data Hazard Even with Forwarding
[Figure: pipeline diagram, time (clock cycles) vs. instruction order, for the sequence below; figure labels: NO ISSUE, Bubble, ALU, DMem]
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
Thus, we need a hazard detection unit to stall
the instruction that depends on the load
32
Stalling
  • Hazard detection unit

If ( ID/EX.MemRead and
((ID/EX.RegisterRt = IF/ID.RegisterRs) or
(ID/EX.RegisterRt = IF/ID.RegisterRt)) ) stall the pipeline
  • When the pipeline is stalled
  • Do not fetch a new instruction: prevent the PC and
    IF/ID registers from changing
  • Create a bubble in the pipeline: set all control
    signals to 0 to create a do-nothing instruction
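The load-use check above is a single boolean condition. A minimal sketch, with plain arguments standing in for the pipeline register fields (the function and argument names are illustrative):

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the result of a load in EX."""
    return bool(id_ex_mem_read
                and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


# lw r1, 0(r2) is in EX (its Rt is r1) while sub r4,r1,r6 is in ID.
stall_sub = must_stall(True, id_ex_rt=1, if_id_rs=1, if_id_rt=6)
# Same registers, but the instruction in EX is not a load.
stall_alu = must_stall(False, id_ex_rt=1, if_id_rs=1, if_id_rt=6)
print(stall_sub, stall_alu)
```

When the function returns True, the PC and IF/ID registers are held and the control signals entering ID/EX are zeroed, as listed above.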

33
Hazard Detection Unit
34
Code Rescheduling to Avoid Load Hazards
Try producing fast code for
  a = b + c
  d = e - f
assuming a, b, c, d, e, and f are in memory.
  • Slow code
  • LW Rb,b
  • LW Rc,c
  • ADD Ra,Rb,Rc
  • SW a,Ra
  • LW Re,e
  • LW Rf,f
  • SUB Rd,Re,Rf
  • SW d,Rd
  • Fast code
  • LW Rb,b
  • LW Rc,c
  • LW Re,e
  • ADD Ra,Rb,Rc
  • LW Rf,f
  • SW a,Ra
  • SUB Rd,Re,Rf
  • SW d,Rd

Compiler optimizes for performance. Hardware
checks for safety.
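The benefit of rescheduling can be counted directly. The sketch below assumes full forwarding, so the only stall is the one-cycle bubble when an instruction uses a register loaded by the instruction immediately before it; the (op, dest, srcs) encoding and the helper name are illustrative.

```python
def count_load_use_stalls(program):
    """Count instructions that read a register loaded by the previous LW."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "LW" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


slow = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()), ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()),
    ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]

print(count_load_use_stalls(slow), count_load_use_stalls(fast))
```

Both versions contain the same eight instructions; only the order differs, and the reordered version never places a consumer directly after its load.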
35
Branch in the Pipelined Datapath
Computes branch target address
Computes branch outcome
Changes PC
36
Branch (Control) Hazards
  • When we decide to branch, other instructions are
    in the pipeline!

37
Solving Branch Hazards
  • Stall the pipeline until the branch is complete
  • Branch is detected in the ID stage
  • Pipeline is stalled
  • Pipeline is restarted in the IF stage with either
  • the next instruction, or
  • the branch target
  • Three clock cycles will be lost for each branch!

38
Reducing Taken Branch Penalty
  • Compute branch target address earlier
  • Compute branch outcome earlier

39
Reducing Taken Branch Penalty
  • Branch is completed in ID stage
  • If branch is taken, flush the pipeline
  • 1 cycle loss for a taken branch

Taken branch     F  D  X  M  W
Branch + 1          F  FL FL FL FL
Branch target          F  D  X  M  W
BT + 1                    F  D  X  M  W
40
Flushing the Instruction After Branch
41
Predict-Not-Taken (Predict-Untaken)
  • Continue execution after the branch
  • If the branch is not taken, no penalty
  • If the branch is taken, flush the pipeline and
    lose 1 clock cycle
  • What about Predict-Taken?
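The effect of the two schemes on CPI can be compared with a short calculation. The penalties (3 cycles when every branch stalls, 1 cycle for a taken branch under predict-not-taken) come from the slides; the branch frequency and taken fraction below are hypothetical values chosen only for illustration.

```python
def cpi_stall_on_every_branch(branch_freq, penalty=3, ideal_cpi=1.0):
    """Every branch stalls the pipeline for `penalty` cycles."""
    return ideal_cpi + branch_freq * penalty


def cpi_predict_not_taken(branch_freq, taken_fraction, penalty=1,
                          ideal_cpi=1.0):
    """Only taken branches pay the flush penalty."""
    return ideal_cpi + branch_freq * taken_fraction * penalty


# Hypothetical workload: 20% branches, 60% of them taken.
cpi_stall = cpi_stall_on_every_branch(0.20)
cpi_pnt = cpi_predict_not_taken(0.20, 0.60)
print(f"stall: {cpi_stall:.2f}  predict-not-taken: {cpi_pnt:.2f}")
```

The same formula applies to other schemes once their penalty and the fraction of branches that pay it are known.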

42
Delayed Branches
  • Execution cycle with a branch delay of length n
      branch instruction
      sequential successor 1
      sequential successor 2
      ........
      sequential successor n
      branch target if taken
    (successors 1 to n form the branch delay of length n)
  • Instructions in the branch delay slot are
    executed irrespective of branch outcome
43
Delayed Branches on MIPS
  • One branch delay slot on MIPS
  • Taken and untaken branch behaviour are similar
  • Compiler must fill in the branch delay slot with
    useful instructions

44
Delayed Branches
  • Question: What instruction do we put in the
    branch delay slot?
  • Fill with NOP (always possible)
  • Fill from before (not always possible)
  • Fill from target (not always possible)
  • Fill from fall-through (not always possible)

45
Filling Branch Delay Slot
Make sure R7 will not be used in taken path
before redefined
46
Filling Branch Delay Slot
47
Cancelling Branches
  • Improves the ability of the compiler to fill in
    delay slots
  • Instruction includes a bit showing its predicted
    direction
  • When branch behaves as predicted, instruction in
    the delay slot is executed
  • When branch is incorrectly predicted, instruction
    in the delay slot is turned to NOP
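The cancelling rule above reduces to comparing the predicted and actual directions. A minimal sketch, with a hypothetical function name and string results chosen for readability:

```python
def delay_slot_action(predicted_taken, actually_taken):
    """Decide what happens to the delay-slot instruction of a
    cancelling branch."""
    if predicted_taken == actually_taken:
        return "execute"  # branch behaved as predicted
    return "nop"          # misprediction: the slot is cancelled


for predicted in (True, False):
    for actual in (True, False):
        print(predicted, actual, delay_slot_action(predicted, actual))
```

Because the slot is cancelled on a misprediction, the compiler can fill it from the predicted path without proving the instruction is harmless on the other path.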

48
Predict-Taken Cancelling Branch
49
Summary: Pipelining
  • Reduce CPI by overlapping many instructions
  • Average throughput of approximately 1 CPI with
    fast clock
  • Utilize capabilities of the Datapath
  • Start next instruction while working on the
    current one
  • Limited by length of longest stage (plus
    fill/flush)
  • Detect and resolve hazards
  • What makes it easy?
  • All instructions are the same length
  • Just a few instruction formats
  • Memory operands appear only in loads and stores
  • What makes it hard?
  • Structural hazards: suppose we had only one
    memory
  • Control hazards: need to worry about branch
    instructions
  • Data hazards: an instruction depends on a
    previous instruction