CS152

1
CS152 Computer Architecture and Engineering
Lecture 12: Pipeline Wrap-up: Control Hazards, RAW/WAR/WAW
2004-10-07 John Lazzaro (www.cs.berkeley.edu/lazzaro)
Dave Patterson (www.cs.berkeley.edu/patterson)
www-inst.eecs.berkeley.edu/cs152/
2
Pipelining Review

What makes it easy:
all instructions are the same length
just a few instruction formats
memory operands appear only in loads and stores
Hazards limit performance:
Structural: need more HW resources
Data: need forwarding, compiler scheduling
Data hazards must be handled carefully
MIPS I instruction set architecture made the pipeline visible (delayed branch, delayed load)

3
Outline

Pipelined Control
Control Hazards
RAW, WAR, WAW
Brainstorm on pipeline bugs

4
MIPS Pipeline Data / Control Paths A (fast)
[Figure: pipelined MIPS datapath (PC, Instruction Memory, Register File, Sign Extend, ALU, Data Memory) with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. One Control block produces the EX, MEM, and WB control fields. Signals shown: PCSrc, Branch, RegWrite, MemtoReg, ALUSrc, ALUOp, ALU cntrl, RegDst, MemWrite, MemRead.]
5
MIPS Pipeline Data / Control Paths (debug)
[Figure: the same pipelined MIPS datapath, debug version. The instruction (Instr) is carried along in the ID/EX, EX/MEM, and MEM/WB pipeline registers, and two Control blocks are shown. Signals shown: PCSrc, Branch, RegWrite, MemtoReg, ALUSrc, ALUOp, ALU cntrl, RegDst, MemWrite, MemRead.]
6
MIPS Pipeline Control (pipelined debug)
[Figure: the same pipelined MIPS datapath, pipelined debug version. The instruction (Instr) is carried along in the ID/EX, EX/MEM, and MEM/WB pipeline registers, and three Control blocks are shown. Signals shown: PCSrc, Branch, RegWrite, MemtoReg, ALUSrc, ALUOp, ALU cntrl, RegDst, MemWrite, MemRead.]
7
Control Hazards

A control hazard occurs when the flow of instruction addresses is not what the pipeline expects; it is incurred by change-of-flow instructions:
Conditional branches (beq, bne)
Unconditional branches (j)
Possible solutions:
Stall
Move decision point earlier in the pipeline
Predict
Delay decision (requires compiler support)
Control hazards occur less frequently than data hazards, but there is nothing as effective against control hazards as forwarding is for data hazards

8
Datapath Branch and Jump Hardware
9
Datapath Branch and Jump Hardware
10
Administrivia

Finish Lab 3; meet with TA Friday
Midterm Tue Oct 12, 5:30 - 8:30 in 101 Morgan
Northwest corner of campus, near Arch and Hearst
Midterm review Sunday Oct 10, 7 PM, 306 Soda
Bring 1 page of handwritten notes, both sides
Nothing electronic: no calculators, cell phones, pagers
Meet at LaVal's Northside afterwards for pizza

11
Jumps Incur One Stall

Jumps are not decoded until ID, so one stall is needed

[Figure: pipeline diagram in instruction order: j, lw, and]

Fortunately, jumps are very infrequent: only 2% of the SPECint instruction mix

12
Review: Branches Incur Three Stalls
[Figure: pipeline diagram in instruction order, starting with beq]
Can fix the branch hazard by waiting (stall), but this affects throughput
13
Moving Branch Decisions Earlier in Pipe

Move the branch decision hardware back to the EX stage
Reduces the number of stall cycles to two
Adds an AND gate and a 2x1 mux to the EX timing path
Add hardware to compute the branch target address and evaluate the branch decision to the ID stage
Reduces the number of stall cycles to one (like with jumps)
Computing the branch target address can be done in parallel with the RegFile read (done for all instructions, only used when needed)
Comparing the registers can't be done until after the RegFile read, so comparing and updating the PC adds a comparator, an AND gate, and a 3x1 mux to the ID timing path
Need forwarding hardware in the ID stage
For longer pipelines, decision points are later in the pipeline, incurring more stalls, so we need a better solution
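
The stall counts quoted above (three with the decision in MEM, two in EX, one in ID) can be turned into a rough cost estimate. The following is a minimal sketch, not part of the original slides: the function names and the 20% branch fraction are hypothetical, and it assumes every branch simply stalls until its decision stage.

```python
# Classic 5-stage pipeline, in order. The list index of the stage where the
# branch is decided equals the number of fetch slots lost while waiting.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def branch_stall_cycles(decision_stage: str) -> int:
    """Stall cycles per branch if the pipeline just waits for the decision."""
    return STAGES.index(decision_stage)


def effective_cpi(branch_fraction: float, decision_stage: str,
                  base_cpi: float = 1.0) -> float:
    """CPI when every branch pays the full stall penalty.

    branch_fraction is an assumed workload parameter, not a figure from the slides.
    """
    return base_cpi + branch_fraction * branch_stall_cycles(decision_stage)


for stage in ("MEM", "EX", "ID"):
    print(stage, branch_stall_cycles(stage), round(effective_cpi(0.2, stage), 2))
```

With the assumed 20% branch fraction, moving the decision from MEM to ID cuts the stall contribution to CPI by two thirds, which is the motivation for the extra ID-stage hardware.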

14
Early Branch Forwarding Issues

Bypass of source operands from the EX/MEM pipeline register:
if (IDcontrol.Branch
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = IF/ID.RegisterRs))
  ForwardC = 1
if (IDcontrol.Branch
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = IF/ID.RegisterRt))
  ForwardD = 1

Forwards the result from the second previous instr. to either input of the Compare

MEM/WB dependency also needs to be forwarded
If the instruction 2 before the branch is a load, then a stall will be required, since the MEM stage memory access is occurring at the same time as the ID stage branch compare operation
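
The two forwarding conditions and the load stall can be sketched in software as follows. This is an illustrative model only: the `PipelineState` fields mirror the pipeline-register names on the slide, while `ex_mem_is_load` and the returned `stall` flag are assumptions added here to capture the load case described in the last bullet.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    id_is_branch: bool            # IDcontrol.Branch
    if_id_rs: int                 # IF/ID.RegisterRs
    if_id_rt: int                 # IF/ID.RegisterRt
    ex_mem_rd: int                # EX/MEM.RegisterRd
    ex_mem_is_load: bool = False  # assumed flag: instruction in MEM is a load


def branch_forward_signals(s: PipelineState):
    """Return (ForwardC, ForwardD, stall) for a branch compared in ID."""
    usable = s.id_is_branch and s.ex_mem_rd != 0  # register 0 is never forwarded
    match_rs = usable and s.ex_mem_rd == s.if_id_rs
    match_rt = usable and s.ex_mem_rd == s.if_id_rt
    if s.ex_mem_is_load and (match_rs or match_rt):
        # The load's data is still being read from memory, so the
        # compare cannot use it this cycle: stall instead of forwarding.
        return 0, 0, True
    return int(match_rs), int(match_rt), False


print(branch_forward_signals(PipelineState(True, 1, 2, 1)))        # (1, 0, False)
print(branch_forward_signals(PipelineState(True, 1, 2, 2, True)))  # (0, 0, True)
```

Like the slide's conditions, this sketch does not check whether the instruction in MEM actually writes a register; a fuller model would probably also test a RegWrite bit.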

15
Branch Prediction

Resolve branch hazards by assuming a given outcome and proceeding without waiting to see the actual branch outcome
Predict not taken: always predict branches will not be taken and continue to fetch from the sequential instruction stream; only when a branch is taken does the pipeline stall
If taken, flush the instructions in the pipeline after the branch:
in IF, ID, and EX if branch logic is in MEM (three stalls)
in IF if branch logic is in ID (one stall)
ensure that those flushed instructions haven't changed machine state; this is automatic in the MIPS pipeline since machine-state-changing operations are at the tail end of the pipeline (MemWrite or RegWrite)
restart the pipeline at the branch destination
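
To see what predict-not-taken buys over always stalling, the flush rules above can be applied to a list of branch outcomes. This is a rough sketch under stated assumptions: the function name and the example outcome list are hypothetical, and only the two decision stages mentioned on the slide are modeled.

```python
# Stages flushed after a taken branch, per the slide:
# decision in MEM flushes IF, ID, EX; decision in ID flushes only IF.
FLUSHED_STAGES = {"MEM": ("IF", "ID", "EX"), "ID": ("IF",)}


def penalty_cycles(outcomes, decision_stage, predict_not_taken=True):
    """Total lost cycles for a sequence of branch outcomes (True = taken)."""
    per_branch = len(FLUSHED_STAGES[decision_stage])
    if predict_not_taken:
        # Only taken branches mispredict and pay the flush penalty.
        return per_branch * sum(1 for taken in outcomes if taken)
    # Without prediction, every branch stalls until it is resolved.
    return per_branch * len(outcomes)


outcomes = [True, False, False, True]  # assumed example trace
print(penalty_cycles(outcomes, "MEM"))                           # 6
print(penalty_cycles(outcomes, "MEM", predict_not_taken=False))  # 12
print(penalty_cycles(outcomes, "ID"))                            # 2
```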

16
Flushing with Misprediction (Not Taken)
4 beq $1,$2,2
8 sub $4,$1,$5

To flush the IF stage instruction, add an IF.Flush control line that zeros the instruction field of the IF/ID pipeline register (transforming it into a noop)
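
The flush works because an all-zero instruction word decodes in MIPS as sll $0,$0,0, which changes no state. A minimal sketch of the IF/ID register update, assuming the instruction at address 8 is sub $4,$1,$5; the class and function names here are hypothetical:

```python
from dataclasses import dataclass

NOP = 0x00000000  # all-zero word; decodes as sll $0,$0,0, i.e. a no-op


def encode_r_type(rs, rt, rd, shamt, funct):
    """Pack a MIPS R-type instruction (opcode 0) into a 32-bit word."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


@dataclass
class IFIDRegister:
    instruction: int = NOP
    pc_plus_4: int = 0


def clock_if_id(reg, fetched, pc_plus_4, if_flush):
    """Latch the fetched instruction, or zero it when IF.Flush is asserted."""
    reg.instruction = NOP if if_flush else fetched
    reg.pc_plus_4 = pc_plus_4
    return reg


SUB_4_1_5 = encode_r_type(rs=1, rt=5, rd=4, shamt=0, funct=0x22)  # sub $4,$1,$5
flushed = clock_if_id(IFIDRegister(), SUB_4_1_5, pc_plus_4=12, if_flush=True)
print(hex(flushed.instruction))  # 0x0
```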

18
Branch Prediction, cont

Resolve branch hazards by statically assuming a given outcome and proceeding
Predict taken: always predict branches will be taken
Predict taken always incurs a stall (if branch destination hardware has been moved to the ID stage)
As the branch penalty increases (for deeper pipelines), a simple static prediction scheme will hurt performance
With more hardware, it is possible to try to predict branch behavior dynamically during program execution
Dynamic branch prediction: predict branches at run-time using run-time information

19
Dynamic Branch Prediction

A branch prediction buffer (aka branch history table (BHT)) in the IF stage, addressed by the lower bits of the PC, contains a bit that tells whether the branch was taken the last time it was executed
The bit may predict incorrectly (it may be from a different branch with the same low-order PC bits, or may be a wrong prediction for this branch), but this doesn't affect correctness, just performance
If the prediction is wrong, flush the incorrect instructions in the pipeline, restart the pipeline with the right instructions, and invert the prediction bit
The BHT predicts when a branch is taken, but does not tell where it's taken to!
A branch target buffer (BTB) in the IF stage can cache the branch target address (or even the branch target instruction) so that a stall can be avoided
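
A small software model can make the BHT/BTB split concrete. The sketch below is an assumption-laden illustration, not the hardware from the lecture: the class name, the 4-bit index, and the use of a dictionary keyed by full PC for the BTB are all hypothetical simplifications.

```python
class BranchPredictor:
    """1-bit BHT indexed by low-order PC bits, plus a simple BTB."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.bht = [0] * (1 << index_bits)  # 1 = taken last time
        self.btb = {}                       # branch PC -> cached target address

    def _index(self, pc):
        return (pc >> 2) & self.mask        # drop the word-alignment bits

    def predict_taken(self, pc):
        return self.bht[self._index(pc)] == 1

    def next_fetch(self, pc):
        """Address to fetch next: the cached target if predicted taken."""
        target = self.btb.get(pc)
        if self.predict_taken(pc) and target is not None:
            return target
        return pc + 4

    def update(self, pc, taken, target):
        # Recording the actual outcome is the same as inverting a
        # 1-bit prediction whenever it was wrong.
        self.bht[self._index(pc)] = int(taken)
        if taken:
            self.btb[pc] = target


p = BranchPredictor()
p.update(0x40, taken=True, target=0x10)
print(hex(p.next_fetch(0x40)))  # 0x10
print(p.predict_taken(0x80))    # True
```

The second print shows the aliasing case from the slide: 0x80 shares its low-order index bits with 0x40, so it inherits that branch's prediction bit even though it has never been executed.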

20
1-bit Prediction Accuracy

A 1-bit predictor in a loop is incorrect twice when not taken

Assume predict_bit = 0 to start (indicating branch not taken) and loop control is at the bottom of the loop code
First time through the loop, the predictor mispredicts the branch, since the branch is taken back to the top of the loop; invert the prediction bit (predict_bit = 1)
As long as the branch is taken (looping), the prediction is correct
Exiting the loop, the predictor again mispredicts the branch, since this time the branch is not taken (falling out of the loop); invert the prediction bit (predict_bit = 0)

Loop: 1st loop instr
      2nd loop instr
      . . .
      last loop instr
      bne $1,$2,Loop
      fall out instr

For 10 times through the loop we have an 80% prediction accuracy for a branch that is taken 90% of the time
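
The 80% figure can be checked by replaying the loop branch through a 1-bit predictor. A minimal sketch, with a hypothetical function name, assuming the branch is taken 9 times and then falls out on the 10th:

```python
def one_bit_accuracy(outcomes, predict_bit=0):
    """Fraction of correct predictions for a 1-bit predictor.

    outcomes is a list of booleans (True = branch taken).
    """
    correct = 0
    for taken in outcomes:
        if predict_bit == int(taken):
            correct += 1
        else:
            predict_bit ^= 1  # misprediction: invert the prediction bit
    return correct / len(outcomes)


loop_outcomes = [True] * 9 + [False]  # 10 times through the loop
print(one_bit_accuracy(loop_outcomes))  # 0.8
```

The two misses are the first iteration (bit starts at 0) and the fall-out iteration, matching the walk-through above.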

21
2-bit Predictors

A 2-bit scheme can give 90% accuracy, since a prediction must be wrong twice before the prediction bit is changed.

Loop: 1st loop instr
      2nd loop instr
      . . .
      last loop instr
      bne $1,$2,Loop
      fall out instr

[Figure: 2-bit predictor state diagram with two Predict Taken states and two Predict Not Taken states, connected by Taken and Not taken transitions]
22
2-bit Predictors

A 2-bit scheme can give 90% accuracy, since a prediction must be wrong twice before the prediction bit is changed

Loop: 1st loop instr
      2nd loop instr
      . . .
      last loop instr
      bne $1,$2,Loop
      fall out instr

[Figure: the same 2-bit predictor state diagram, with the Predict Taken states labeled 1 and the Predict Not Taken states labeled 0. Annotations: right 9 times; wrong on loop fall out; right on 1st iteration]
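
The 90% figure can be reproduced with a small simulation. This sketch assumes a 2-bit saturating counter (states 0 to 3, predicting taken in states 2 and 3) that starts in the strongest Predict Taken state; the state diagram on the slide may use slightly different transitions, but for this loop pattern the result is the same. The function name is hypothetical.

```python
def two_bit_accuracy(outcomes, state=3):
    """Return (accuracy, final_state) for a 2-bit saturating counter.

    States 2 and 3 predict taken; states 0 and 1 predict not taken.
    """
    correct = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes), state


one_pass = [True] * 9 + [False]        # 10 times through the loop
print(two_bit_accuracy(one_pass))      # (0.9, 2)
print(two_bit_accuracy(one_pass * 2))  # (0.9, 2)
```

After the fall-out misprediction the counter is still in a Predict Taken state, so re-entering the loop is predicted correctly on its first iteration; a 1-bit predictor would miss there.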
23
Delayed Decision

First, move the branch decision hardware and target address calculation to the ID pipeline stage
A delayed branch always executes the next sequential instruction; the branch takes effect after that next instruction
MIPS software moves an instruction that is not affected by the branch (a safe instruction) to immediately after the branch, thereby hiding the branch delay

As processors go to deeper pipelines and multiple issue, the branch delay grows and more than one delay slot is needed.
Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches
Growth in available transistors has made dynamic approaches relatively cheaper

24
Scheduling Branch Delay Slots
A. From before branch:
  add $1,$2,$3
  if $2=0 then
    delay slot
B. From branch target:
  sub $4,$5,$6
  add $1,$2,$3
  if $1=0 then
    delay slot
C. From fall through:
  add $1,$2,$3
  if $1=0 then
    delay slot
  sub $4,$5,$6

A is the best choice: it fills the delay slot and reduces instruction count (IC)
In B, the sub instruction may need to be copied, increasing IC
In B and C, it must be okay to execute sub when the branch fails

25
3 Generic Data Hazards: RAW, WAR, WAW

Read After Write (RAW): InstrJ tries to read operand before InstrI writes it
Caused by a Dependence (in compiler nomenclature). This hazard results from an actual need for communication.
Forwarding handles many, but not all, RAW dependencies in the 5-stage MIPS pipeline

I: add r1,r2,r3
J: sub r4,r1,r3
26
3 Generic Data Hazards: RAW, WAR, WAW

Write After Read (WAR): InstrJ writes operand before InstrI reads it
Called an anti-dependence by compiler writers. This results from reuse of the name r1.
Can't happen in the MIPS 5-stage pipeline because:
All instructions take 5 stages, and
Reads are always in stage 2, and
Register Writes must be in stage 5

27
3 Generic Data Hazards: RAW, WAR, WAW

Write After Write (WAW): InstrJ writes operand before InstrI writes it.
Called an output dependence by compiler writers. This also results from the reuse of name r1.
Can't happen in the MIPS 5-stage pipeline because:
All instructions take 5 stages, and
Register Writes must be in stage 5
Can see WAR and WAW in more complicated pipes
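
The three definitions reduce to simple checks on the destination and source registers of an earlier instruction I and a later instruction J. A minimal sketch: the `Instr` type and `dependences` function are hypothetical, and apart from the RAW pair taken from the slide, the example instructions are made up to reuse the name r1.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read


def dependences(i: Instr, j: Instr) -> Set[str]:
    """Classify dependences of later instruction j on earlier instruction i."""
    found = set()
    if i.dest is not None and i.dest in j.srcs:
        found.add("RAW")  # j reads what i writes (true dependence)
    if j.dest is not None and j.dest in i.srcs:
        found.add("WAR")  # j writes what i reads (anti-dependence)
    if i.dest is not None and i.dest == j.dest:
        found.add("WAW")  # both write the same register (output dependence)
    return found


add = Instr("add", "r1", ("r2", "r3"))
print(dependences(add, Instr("sub", "r4", ("r1", "r3"))))  # {'RAW'}
print(dependences(Instr("sub", "r4", ("r1", "r3")), add))  # {'WAR'}
print(dependences(add, Instr("sub", "r1", ("r4", "r3"))))  # {'WAW'}
```

This only detects the dependences. Whether one becomes a hazard depends on the pipeline: in the 5-stage MIPS pipe only RAW can, for the reasons listed above.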

28
Supporting ID Stage Branches
[Figure: pipelined MIPS datapath with the branch resolved in ID. A Compare unit on the RegFile outputs and a branch-target adder (Shift left 2, Add) sit in the ID stage; a Hazard Unit and two Forward Units are added. Signals shown: PCSrc, Branch, ALU cntrl.]
29
Brainstorm on pipeline bugs

Where are bugs likely to hide in a pipelined processor?
How can you write tests to uncover these likely bugs?
Once it passes a test, do you never need to run it again in the design process?

30
Brainstorm on pipeline bugs

Depending on the branch solution (move to ID, delayed, static prediction, dynamic prediction), where are bugs likely to hide?
How can you write tests to uncover these likely bugs?
Once it passes a test, don't you need to run it again?

31
Peer Instruction
[Figure: pipeline timing diagram over clock cycles 1 to 7 for three instructions (1st add, 2nd lw, 3rd add), with the combined Mem/Wr stage marked for the adds]

Suppose we use a 4-stage pipeline that combines the memory access and write back stages for all instructions but load, stalling when there are structural hazards. Impact?
1. The branch delay slot is now 0 instructions
2. Most loads cause a stall, since there is often a structural hazard on reg. writes
3. Most stores cause a stall, since they have a structural hazard
4. Both 2 & 3: most loads & stores cause a stall due to structural hazards
5. Most loads cause a stall, but there is no load-use hazard anymore
6. Both 2 & 3, but there is no load-use hazard anymore
7. None of the above

32
Peer Instruction

Q: Why not say every load stalls?
A: Not all next instructions write in the Wr stage
33
Summary: Designing a Pipelined Processor

Go back and examine your data path and control diagram
Associate resources with states
Be sure there are no structural hazards: one use / clock cycle
Add pipeline registers between stages to balance clock cycle
Amdahl's Law suggests splitting the longest stage
Resolve all data and control dependencies
If backwards in time in pipeline drawing to registers => data hazard: forward or stall to resolve them
If backwards in time in pipeline drawing to PC => control hazard (we'll see next time)
5-stage pipeline with reads early in same stage, writes later in same stage, avoids WAR/WAW hazards
Assert control in appropriate stage
Develop test instruction sequences likely to uncover pipeline bugs (if you don't test it, it won't work)
