1
CS152 Computer Architecture and Engineering
Lecture 12: Pipeline Wrap-up: Control Hazards, RAW/WAR/WAW
2004-10-07
John Lazzaro (www.cs.berkeley.edu/lazzaro)
Dave Patterson (www.cs.berkeley.edu/patterson)
www-inst.eecs.berkeley.edu/cs152/
2
Pipelining Review
  • What makes it easy
    • all instructions are the same length
    • just a few instruction formats
    • memory operands appear only in loads and stores
  • Hazards limit performance
    • Structural: need more HW resources
    • Data: need forwarding, compiler scheduling
    • Data hazards must be handled carefully
  • The MIPS I instruction set architecture made the
    pipeline visible (delayed branch, delayed load)

3
Outline
  • Pipelined Control
  • Control Hazards
  • RAW, WAR, WAW
  • Brainstorm on pipeline bugs

4
MIPS Pipeline Data / Control Paths A (fast)
[Figure: pipelined MIPS datapath with control: PC, instruction memory, register file, sign extend, ALU and ALU control, data memory, branch adder (Shift left 2, Add), and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers carrying EX, MEM, and WB control fields (RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, RegWrite, PCSrc)]
5
MIPS Pipeline Data / Control Paths (debug)
[Figure: the same pipelined datapath, with the instruction (Instr) also carried in the ID/EX, EX/MEM, and MEM/WB pipeline registers for debugging]
6
MIPS Pipeline Control (pipelined debug)
[Figure: the same pipelined datapath, with the instruction (Instr) carried in the ID/EX, EX/MEM, and MEM/WB pipeline registers and a Control block in each of those stages]
7
Control Hazards
  • When the flow of instruction addresses is not
    what the pipeline expects; incurred by change-of-flow
    instructions
    • Conditional branches (beq, bne)
    • Unconditional branches (j)
  • Possible solutions
    • Stall
    • Move decision point earlier in the pipeline
    • Predict
    • Delay decision (requires compiler support)
  • Control hazards occur less frequently than data
    hazards, but there is nothing as effective against
    control hazards as forwarding is for data hazards

8
Datapath Branch and Jump Hardware
10
Administrivia
  • Finish Lab 3; meet with TA Friday
  • Midterm Tue Oct 12, 5:30 - 8:30 in 101 Morgan
    • Northwest corner of campus, near Arch and Hearst
    • Midterm review Sunday Oct 10, 7 PM, 306 Soda
    • Bring 1 page of handwritten notes, both sides
    • Nothing electronic: no calculators, cell phones,
      pagers, …
    • Meet at LaVal's Northside afterwards for pizza

11
Jumps Incur One Stall
  • Jumps are not decoded until ID, so one stall is needed

[Figure: pipeline diagram (instruction order vs. time) for the sequence j, lw, and]

  • Fortunately, jumps are very infrequent: only 2% of
    the SPECint instruction mix

12
Review: Branches Incur Three Stalls
[Figure: pipeline diagram (instruction order vs. time) for a beq and the instructions that follow it]
Can fix the branch hazard by waiting (stalling), but this
hurts throughput
13
Moving Branch Decisions Earlier in Pipe
  • Move the branch decision hardware back to the EX
    stage
    • Reduces the number of stall cycles to two
    • Adds an and gate and a 2x1 mux to the EX timing
      path
  • Add hardware to the ID stage to compute the branch
    target address and evaluate the branch decision
    • Reduces the number of stall cycles to one (like
      with jumps)
    • Computing the branch target address can be done in
      parallel with the RegFile read (done for all
      instructions; only used when needed)
    • Comparing the registers can't be done until after
      the RegFile read, so comparing and updating the PC
      adds a comparator, an and gate, and a 3x1 mux to
      the ID timing path
    • Need forwarding hardware in the ID stage
  • For longer pipelines, decision points are later
    in the pipeline, incurring more stalls, so we
    need a better solution
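A rough way to see what each decision point costs is to count stall cycles per instruction. The sketch below assumes every branch simply stalls until it is resolved (no prediction). The 2% jump frequency comes from the jump slide; the 15% branch frequency is a made-up value for illustration only.

```python
# Only JUMP_FREQ comes from the slides; BRANCH_FREQ is an assumed value.
JUMP_FREQ = 0.02     # jumps: about 2% of SPECint
BRANCH_FREQ = 0.15   # conditional branches: hypothetical, for illustration
JUMP_STALLS = 1      # jumps are decoded in ID -> one stall


def cpi(branch_stalls: int, base_cpi: float = 1.0) -> float:
    """Effective CPI when every branch stalls `branch_stalls` cycles
    and every jump stalls JUMP_STALLS cycles."""
    return base_cpi + BRANCH_FREQ * branch_stalls + JUMP_FREQ * JUMP_STALLS


for stage, stalls in [("MEM", 3), ("EX", 2), ("ID", 1)]:
    print(f"branch decided in {stage}: {stalls} stall(s), CPI = {cpi(stalls):.2f}")
```

With these assumed numbers, moving the decision from MEM to ID takes the CPI from about 1.47 to about 1.17; the shape of the trade-off, not the exact values, is the point.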

14
Early Branch Forwarding Issues
  • Bypass of source operands from the EX/MEM pipeline register
      if (IDcontrol.Branch
          and (EX/MEM.RegisterRd != 0)
          and (EX/MEM.RegisterRd == IF/ID.RegisterRs))
        ForwardC = 1
      if (IDcontrol.Branch
          and (EX/MEM.RegisterRd != 0)
          and (EX/MEM.RegisterRd == IF/ID.RegisterRt))
        ForwardD = 1

    Forwards the result from the second previous
    instr. to either input of the Compare
  • The MEM/WB dependency also needs to be forwarded
  • If the instruction 2 before the branch is a load,
    then a stall will be required since the MEM stage
    memory access is occurring at the same time as
    the ID stage branch compare operation
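The two forwarding conditions above can be written as a small Python function. This is a sketch only: the field names mirror the slide, and the `mem_read` flag used for the load stall check is an assumed extra field, not something the slide defines.

```python
from dataclasses import dataclass


@dataclass
class IFID:
    rs: int  # IF/ID.RegisterRs
    rt: int  # IF/ID.RegisterRt


@dataclass
class EXMEM:
    rd: int                 # EX/MEM.RegisterRd (destination register)
    mem_read: bool = False  # assumed field: True if the instruction in MEM is a load


def branch_forwarding(id_branch: bool, ifid: IFID, exmem: EXMEM) -> tuple[int, int]:
    """Return (ForwardC, ForwardD) for the ID-stage branch comparator,
    following the conditions on the slide."""
    forward_c = int(id_branch and exmem.rd != 0 and exmem.rd == ifid.rs)
    forward_d = int(id_branch and exmem.rd != 0 and exmem.rd == ifid.rt)
    return forward_c, forward_d


def load_two_before_stall(id_branch: bool, ifid: IFID, exmem: EXMEM) -> bool:
    """Stall if the instruction two ahead of the branch is a load that writes
    a register the branch compares (its data is still being read in MEM)."""
    return (id_branch and exmem.mem_read and exmem.rd != 0
            and exmem.rd in (ifid.rs, ifid.rt))
```

For example, `branch_forwarding(True, IFID(rs=1, rt=2), EXMEM(rd=2))` returns `(0, 1)`: only the Rt input of the comparator is bypassed. When `load_two_before_stall` is true, the stall has to take priority over forwarding, because the loaded value does not exist yet.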

15
Branch Prediction
  • Resolve branch hazards by assuming a given
    outcome and proceeding without waiting to see the
    actual branch outcome
  • Predict not taken: always predict branches will
    not be taken and continue to fetch from the
    sequential instruction stream; only when a branch
    is taken does the pipeline stall
    • If taken, flush the instructions in the pipeline
      after the branch
      • in IF, ID, and EX if the branch logic is in MEM:
        three stalls
      • in IF if the branch logic is in ID: one stall
    • Ensure that those flushed instructions haven't
      changed machine state: automatic in the MIPS
      pipeline, since machine-state-changing operations
      are at the tail end of the pipeline (MemWrite or
      RegWrite)
    • Restart the pipeline at the branch destination
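Under predict not taken, only taken branches pay the flush penalty. A minimal sketch of the average cost per instruction, where the branch frequency and taken fraction are hypothetical values chosen for illustration:

```python
def predict_not_taken_penalty(branch_freq: float, taken_frac: float,
                              flush_cycles: int) -> float:
    """Average extra cycles per instruction: only taken branches are flushed."""
    return branch_freq * taken_frac * flush_cycles


# Assumed workload numbers, for illustration only.
BRANCH_FREQ, TAKEN_FRAC = 0.15, 0.6
for where, flushed in [("MEM", 3), ("ID", 1)]:
    extra = predict_not_taken_penalty(BRANCH_FREQ, TAKEN_FRAC, flushed)
    print(f"branch logic in {where}: +{extra:.3f} CPI")
```

Compared with always stalling, the not-taken branches now cost nothing, so the saving grows with the fraction of branches that fall through.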

16
Flushing with Misprediction (Not Taken)
[Figure: pipeline diagram for 4: beq $1,$2,2 followed by 8: sub $4,$1,$5]
  • To flush the IF stage instruction, add an IF.Flush
    control line that zeros the instruction field of
    the IF/ID pipeline register (transforming it into
    a noop)
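The IF.Flush behavior can be modeled in a few lines. This sketch assumes a simple IF/ID register that holds PC+4 and the instruction word, and relies on the MIPS nop being the all-zeros word (sll $0,$0,0); the class and function names are hypothetical.

```python
from dataclasses import dataclass

NOP = 0x00000000  # MIPS nop is the all-zeros word (sll $0,$0,0)


@dataclass
class IFIDRegister:
    pc_plus_4: int
    instruction: int


def clock_if_id(fetched_pc: int, fetched_instr: int, if_flush: bool) -> IFIDRegister:
    """Latch the IF/ID register. When IF.Flush is asserted, the instruction
    field is zeroed, so the wrong-path instruction continues as a nop."""
    return IFIDRegister(pc_plus_4=fetched_pc + 4,
                        instruction=NOP if if_flush else fetched_instr)
```

Here `0x00252022` is the encoding of sub $4,$1,$5. If the beq at address 4 is resolved as taken in ID, clocking with `if_flush=True` turns the sub into a nop: `clock_if_id(8, 0x00252022, True).instruction == 0`.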


18
Branch Prediction, cont
  • Resolve branch hazards by statically assuming a
    given outcome and proceeding
  • Predict taken: always predict branches will be
    taken
    • Predict taken always incurs a stall (if branch
      destination hardware has been moved to the ID
      stage)
  • As the branch penalty increases (for deeper
    pipelines), a simple static prediction scheme
    will hurt performance
  • With more hardware, it is possible to try to predict
    branch behavior dynamically during program
    execution
  • Dynamic branch prediction: predict branches at
    run-time using run-time information

19
Dynamic Branch Prediction
  • A branch prediction buffer (aka branch history
    table (BHT)) in the IF stage, addressed by the
    lower bits of the PC, contains a bit that tells
    whether the branch was taken the last time it was
    executed
    • The bit may predict incorrectly (it may be from a
      different branch with the same low-order PC bits,
      or may be a wrong prediction for this branch), but
      that doesn't affect correctness, just performance
    • If the prediction is wrong, flush the incorrect
      instructions in the pipeline, restart the pipeline
      with the right instructions, and invert the
      prediction bit
  • The BHT predicts when a branch is taken, but does
    not tell where it's taken to!
    • A branch target buffer (BTB) in the IF stage can
      cache the branch target address (or even the
      branch target instruction) so that a stall can be
      avoided
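A 1-bit BHT is essentially a small untagged array indexed by PC bits. The sketch below assumes 16 entries and drops the two low PC bits (MIPS instructions are word aligned); the class name and size are illustrative. Because there are no tags, it also exhibits the aliasing described above.

```python
class BranchHistoryTable:
    """1-bit branch history table indexed by low-order PC bits (no tags)."""

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        self.bits = [0] * (1 << index_bits)  # 0 = predict not taken

    def _index(self, pc: int) -> int:
        # Drop the two always-zero bits of a word-aligned PC, keep the low bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.bits[self._index(pc)] == 1

    def update(self, pc: int, taken: bool) -> None:
        # Inverting the bit on a misprediction (and leaving it alone on a
        # correct one) is the same as storing the actual outcome.
        self.bits[self._index(pc)] = int(taken)


bht = BranchHistoryTable()
bht.update(0x40, True)
print(bht.predict(0x80))  # True: 0x80 aliases to the same entry as 0x40
```

Both 0x40 and 0x80 map to entry 0, so the second branch inherits the first one's history; this affects only performance, never correctness.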

20
1-bit Prediction Accuracy
  • A 1-bit predictor in a loop is wrong twice per pass
    through the loop, even though the branch is not
    taken only once
    • Assume predict_bit = 0 to start (indicating
      branch not taken) and loop control is at the
      bottom of the loop code
    • First time through the loop, the predictor
      mispredicts the branch since the branch is taken
      back to the top of the loop; invert the prediction
      bit (predict_bit = 1)
    • As long as the branch is taken (looping), the
      prediction is correct
    • Exiting the loop, the predictor again mispredicts
      the branch since this time the branch is not
      taken, falling out of the loop; invert the
      prediction bit (predict_bit = 0)

Loop: 1st loop instr
      2nd loop instr
      . . .
      last loop instr
      bne $1,$2,Loop
      fall out instr
  • For 10 times through the loop we have 80%
    prediction accuracy for a branch that is taken
    90% of the time
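The 80% figure can be checked with a short simulation of the loop-closing bne, assuming the branch is taken on every iteration except the last and predict_bit starts at 0 as on the slide:

```python
def one_bit_accuracy(iterations: int, runs: int = 1) -> float:
    """Prediction accuracy of a 1-bit predictor for a loop-closing branch.

    The branch is taken on every iteration except the last (fall-out).
    predict_bit starts at 0 (predict not taken).
    """
    predict_bit = 0
    correct = total = 0
    for _ in range(runs):
        for i in range(iterations):
            taken = i < iterations - 1
            correct += (predict_bit == 1) == taken
            total += 1
            predict_bit = int(taken)  # same as inverting the bit on a mispredict
    return correct / total


print(one_bit_accuracy(10))          # 0.8
print(one_bit_accuracy(10, runs=5))  # 0.8
```

Because the bit is left at 0 after every loop exit, each later run of the loop mispredicts twice again, so the accuracy stays at 80%.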

22
2-bit Predictors
  • A 2-bit scheme can give 90% accuracy since a
    prediction must be wrong twice before the
    prediction bit is changed

Loop: 1st loop instr
      2nd loop instr
      . . .
      last loop instr
      bne $1,$2,Loop
      fall out instr

[Figure: 2-bit predictor state diagram: two Predict Taken states (1) and two Predict Not Taken states (0), connected by Taken / Not taken transitions; annotated for the loop above: right on 1st iteration, right 9 times, wrong on loop fall out]
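A similar simulation with a 2-bit predictor reproduces the 90% figure. The sketch uses a 2-bit saturating counter, one common 2-bit scheme; the state diagram on the slide may differ in its exact transitions, but for this loop the result matches the slide's annotations (right on the 1st iteration, wrong only on fall-out).

```python
def two_bit_accuracy(iterations: int, runs: int, start_state: int = 3) -> float:
    """Accuracy of a 2-bit saturating counter for a loop-closing branch.

    States 3 and 2 predict taken, 1 and 0 predict not taken. A taken branch
    moves the counter up and a not-taken branch moves it down, saturating
    at 3 and 0.
    """
    state = start_state
    correct = total = 0
    for _ in range(runs):
        for i in range(iterations):
            taken = i < iterations - 1
            correct += (state >= 2) == taken
            total += 1
            state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / total


print(two_bit_accuracy(10, runs=1))  # 0.9
print(two_bit_accuracy(10, runs=5))  # 0.9
```

The single not-taken outcome at loop exit only moves the counter from strongly to weakly taken, so the next entry into the loop is still predicted correctly.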
23
Delayed Decision
  • First, move the branch decision hardware and
    target address calculation to the ID pipeline
    stage
  • A delayed branch always executes the next
    sequential instruction; the branch takes effect
    after that next instruction
    • MIPS software moves an instruction that is not
      affected by the branch (a safe instruction) to
      immediately after the branch, thereby hiding the
      branch delay
  • As processors go to deeper pipelines and multiple
    issue, the branch delay grows and more than one
    delay slot is needed
    • Delayed branching has lost popularity compared to
      more expensive but more flexible dynamic
      approaches
    • Growth in available transistors has made dynamic
      approaches relatively cheaper
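The delayed-branch rule (the next sequential instruction always executes) can be illustrated with a toy interpreter. The three-opcode ISA below ('addi', 'beq', 'halt') is hypothetical and shows only instruction ordering, not MIPS encodings or timing; it also assumes no branch sits in a delay slot.

```python
from typing import Optional

# Hypothetical toy program: (opcode, operands...)
PROGRAM = [
    ("beq", 1, 2, 3),    # 0: if r1 == r2, branch to index 3
    ("addi", 3, 3, 1),   # 1: delay slot: r3 = r3 + 1
    ("addi", 4, 4, 1),   # 2: skipped when the branch is taken
    ("halt",),           # 3: branch target
]


def run(program, regs, delayed=True, max_steps=100):
    """Execute `program` and return the list of executed instruction indices.

    With delayed=True, a taken beq redirects the PC only after the next
    sequential instruction (the delay slot) has executed.
    """
    pc = 0
    pending_target: Optional[int] = None  # set by a taken delayed branch
    trace = []
    for _ in range(max_steps):
        op, *args = program[pc]
        trace.append(pc)
        if op == "halt":
            break
        next_pc = pc + 1
        if pending_target is not None:  # this instruction is the delay slot
            next_pc, pending_target = pending_target, None
        if op == "addi":
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "beq":
            rs, rt, target = args
            if regs[rs] == regs[rt]:
                if delayed:
                    pending_target = target
                else:
                    next_pc = target
        pc = next_pc
    return trace
```

With `delayed=True` the trace is `[0, 1, 3]` and r3 ends up as 1; with `delayed=False` the trace is `[0, 3]` and the addi at index 1 is skipped.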

24
Scheduling Branch Delay Slots
A. From before branch
     add $1,$2,$3
     if $2=0 then
       delay slot

B. From branch target
     sub $4,$5,$6      <- branch target
     . . .
     add $1,$2,$3
     if $1=0 then
       delay slot

C. From fall through
     add $1,$2,$3
     if $1=0 then
       delay slot
     sub $4,$5,$6

  • A is the best choice: it fills the delay slot and
    reduces instruction count (IC)
  • In B, the sub instruction may need to be copied,
    increasing IC
  • In B and C, it must be okay to execute sub when
    the branch fails
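Whether strategy A applies can be checked mechanically: the instruction just before the branch may move into the delay slot only if the branch does not read the register it writes. The sketch below is a simplified check (it ignores, for instance, candidates that are themselves branches), and the `Instr` type is hypothetical.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[int]     # register written, or None
    srcs: tuple[int, ...]   # registers read


def can_fill_from_before(candidate: Instr, branch: Instr) -> bool:
    """True if `candidate` (just before `branch`) can move into its delay slot."""
    return candidate.dest is None or candidate.dest not in branch.srcs


add = Instr("add $1,$2,$3", 1, (2, 3))
print(can_fill_from_before(add, Instr("if $2=0 then", None, (2,))))  # True
print(can_fill_from_before(add, Instr("if $1=0 then", None, (1,))))  # False
```

The second case is why B and C must look elsewhere for an instruction: the branch tests $1, which the add produces.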

25
3 Generic Data Hazards RAW, WAR, WAW
  • Read After Write (RAW): InstrJ tries to read an
    operand before InstrI writes it
    • Caused by a dependence (in compiler
      nomenclature). This hazard results from an
      actual need for communication.
    • Forwarding handles many, but not all, RAW
      dependencies in the 5-stage MIPS pipeline

    I: add r1,r2,r3
    J: sub r4,r1,r3
26
3 Generic Data Hazards RAW, WAR, WAW
  • Write After Read (WAR): InstrJ writes an operand
    before InstrI reads it
    • Called an anti-dependence by compiler
      writers. This results from reuse of the name
      r1.
  • Can't happen in the MIPS 5-stage pipeline because:
    • All instructions take 5 stages, and
    • Reads are always in stage 2, and
    • Register writes must be in stage 5

27
3 Generic Data Hazards RAW, WAR, WAW
  • Write After Write (WAW): InstrJ writes an operand
    before InstrI writes it
    • Called an output dependence by compiler
      writers. This also results from the reuse of the
      name r1.
  • Can't happen in the MIPS 5-stage pipeline because:
    • All instructions take 5 stages, and
    • Register writes must be in stage 5
  • Can see WAR and WAW in more complicated pipes
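The three cases can be told apart by comparing the registers each instruction reads and writes. The sketch below only names the dependence from an earlier InstrI to a later InstrJ; whether it turns into an actual hazard depends on the pipeline, as the slides explain. The `Instr` type and the WAR/WAW example instructions are illustrative.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    dest: Optional[str]     # register written, or None
    srcs: tuple[str, ...]   # registers read


def dependences(i: Instr, j: Instr) -> set[str]:
    """Classify the dependences from InstrI to the later InstrJ."""
    found = set()
    if i.dest is not None and i.dest in j.srcs:
        found.add("RAW")  # true dependence: J reads what I writes
    if j.dest is not None and j.dest in i.srcs:
        found.add("WAR")  # anti-dependence: J writes what I reads
    if i.dest is not None and i.dest == j.dest:
        found.add("WAW")  # output dependence: both write the same register
    return found


add_r1 = Instr("r1", ("r2", "r3"))                     # add r1,r2,r3
print(dependences(add_r1, Instr("r4", ("r1", "r3"))))  # {'RAW'}  then sub r4,r1,r3
print(dependences(Instr("r4", ("r1", "r3")), add_r1))  # {'WAR'}  sub r4,r1,r3 first
print(dependences(Instr("r1", ("r4", "r3")), add_r1))  # {'WAW'}  sub r1,r4,r3 first
```

In the 5-stage MIPS pipeline only the RAW case needs forwarding or a stall; the WAR and WAW cases are harmless there because reads happen in stage 2 and writes in stage 5.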

28
Supporting ID Stage Branches
[Figure: pipelined datapath supporting ID-stage branches: a Compare unit and branch-target adder (Shift left 2, Add) in ID, a Hazard Unit, and two Forward Units, alongside the usual PC, instruction memory, RegFile, ALU, data memory, and pipeline registers]
29
Brainstorm on pipeline bugs
  • Where are bugs likely to hide in a pipelined
    processor?
  • How can you write tests to uncover these likely
    bugs?
  • Once it passes a test, never need to run it again
    in the design process?

30
Brainstorm on pipeline bugs
  • Depending on the branch solution (move to ID,
    delayed, static prediction, dynamic prediction),
    where are bugs likely to hide?
  • How can you write tests to uncover these likely
    bugs?
  • Once it passes a test, don't need to run it
    again?

31
Peer Instruction
[Figure: pipeline timing over clock cycles 1-7 for a 1st add, a 2nd lw, and a 3rd add; the adds use a combined Mem/Wr stage]
  • Suppose we use a 4-stage pipeline that
    combines the memory access and write back stages for
    all instructions but load, stalling when there
    are structural hazards. Impact?
  • 1. The branch delay slot is now 0 instructions
  • 2. Most loads cause stall since there is often a
    structural hazard on reg. writes
  • 3. Most stores cause stall since they have a
    structural hazard
  • 4. Both 2 & 3: most loads and stores cause stall due
    to structural hazards
  • 5. Most loads cause stall, but there is no
    load-use hazard anymore
  • 6. Both 2 & 3, but there is no load-use hazard
    anymore
  • 7. None of the above


Q: Why not say every load stalls?
A: Not all following instructions write in the Wr stage.
33
Summary Designing a Pipelined Processor
  • Go back and examine your datapath and control
    diagram
  • Associate resources with states
  • Be sure there are no structural hazards: one use
    per clock cycle
  • Add pipeline registers between stages to balance
    the clock cycle
    • Amdahl's Law suggests splitting the longest stage
  • Resolve all data and control dependencies
    • If backwards in time in the pipeline drawing to
      registers => data hazard: forward or stall to
      resolve them
    • If backwards in time in the pipeline drawing to
      PC => control hazard: we'll see next time
    • A 5-stage pipeline with reads early in the same
      stage and writes later in the same stage avoids
      WAR/WAW hazards
  • Assert control in the appropriate stage
  • Develop test instruction sequences likely to
    uncover pipeline bugs (If you don't test it, it
    won't work)