Title: CPE 431/531 Chapter 6 - Enhancing Performance with Pipelining
Slide 1: CPE 431/531 Chapter 6 - Enhancing Performance with Pipelining
Slide 2: 6.1 An Overview of Pipelining
- Pipelining is an implementation technique in which multiple instructions are overlapped in execution.
- Pipelining helps ______________, not individual execution time.
- You can't skip a stage.
Slide 3: 6.1 An Overview of Pipelining - The Laundry Analogy
Slide 4: 6.1 An Overview of Pipelining - MIPS Processor Stages
- Instruction fetch (IF)
- Instruction decode and register read (ID)
- Execution (calculate address) (EX)
- Memory access (MEM)
- Register write (WB)
Slide 5: 6.1 An Overview of Pipelining - Single Cycle vs. Pipelined Performance
- Consider lw, sw, add, sub, and, or, slt, beq
- Operation times: memory and ALU 200 ps, register file 100 ps
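The operation times above can be turned into per-instruction latencies and clock periods. The sketch below is only illustrative: the mapping of each instruction class to functional units follows the usual textbook breakdown and is an assumption, as are all the names used.

```python
# Functional-unit latencies in picoseconds, taken from the slide.
MEM_PS, ALU_PS, REG_PS = 200, 200, 100

# Assumed unit usage per instruction class, in order:
# instruction fetch, register read, ALU, data access, register write.
UNITS_USED = {
    "lw":       [MEM_PS, REG_PS, ALU_PS, MEM_PS, REG_PS],
    "sw":       [MEM_PS, REG_PS, ALU_PS, MEM_PS],
    "R-format": [MEM_PS, REG_PS, ALU_PS, REG_PS],
    "beq":      [MEM_PS, REG_PS, ALU_PS],
}

def instruction_time(kind):
    """Total latency of one instruction when nothing is overlapped."""
    return sum(UNITS_USED[kind])

def single_cycle_clock():
    """The single-cycle clock must fit the slowest instruction."""
    return max(instruction_time(kind) for kind in UNITS_USED)

def pipelined_clock():
    """The pipelined clock must fit the slowest stage."""
    return max(MEM_PS, ALU_PS, REG_PS)

if __name__ == "__main__":
    for kind in UNITS_USED:
        print(kind, instruction_time(kind), "ps")
    print("single-cycle clock:", single_cycle_clock(), "ps")
    print("pipelined clock:", pipelined_clock(), "ps")
```

Under these assumptions the single-cycle clock is set by lw, while the pipelined clock is set by the slowest stage.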
Slide 6: 6.1 An Overview of Pipelining - Single Cycle vs. Pipelined Timeline
Speedup
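A rough way to see how the speedup behaves is to compare total execution times for n instructions. This is a sketch under stated assumptions: an 800 ps single-cycle clock and a 200 ps pipelined clock (derived from the operation times on the previous slide), five stages, and no hazards. The function names are made up for illustration.

```python
def single_cycle_time(n, clock_ps=800):
    """Each instruction takes one full (long) clock cycle."""
    return n * clock_ps

def pipelined_time(n, clock_ps=200, stages=5):
    """The first instruction needs `stages` cycles; every later
    instruction finishes one cycle after its predecessor."""
    return (stages + n - 1) * clock_ps

def speedup(n):
    return single_cycle_time(n) / pipelined_time(n)

if __name__ == "__main__":
    for n in (3, 1_000_003):
        print(n, "instructions: speedup", round(speedup(n), 2))
```

For three instructions the ratio is 2400 ps to 1400 ps; for a long run it approaches the ratio of the two clock periods, not the number of stages, because the stages are not perfectly balanced.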
Slide 7: 6.1 An Overview of Pipelining - Designing Instruction Sets for Pipelining
- All MIPS instructions are the same ______.
- MIPS has only a few ________________.
- Memory operands appear only in ________.
- Operands must be ________ in memory.
Slide 8: 6.1 An Overview of Pipelining - Pipeline Hazards
- Structural Hazard - not enough hardware
- Data Hazards - one instruction needs the result of another
- Control Hazard - decisions aren't made
- Conservative Approach - stall
- Alternative Approach - predict
Slide 9: 6.1 An Overview of Pipelining - Data Hazards
- A data hazard occurs when a needed result has not yet been written to the register file.
- Consider
- add $s0, $t0, $t1
- sub $t2, $s0, $t3
- Though the result is not written until WB, it is available after the add has finished the EX stage, __________ it to the right place.
Slide 10: 6.1 An Overview of Pipelining - Data Hazards: Two Instruction Forwarding
- Forwarding paths are valid only if the
destination stage is _________ than the source
stage.
Slide 11: 6.1 An Overview of Pipelining - Data Hazards: More on Forwarding
- Forwarding can't fix everything.
- Consider
- lw $s0, 20($t1)
- sub $t2, $s0, $t3
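The difference between an ALU result and a loaded value can be sketched with a small timing model. It assumes the classic five-stage pipeline with full forwarding, where an ALU result exists at the end of EX and a loaded value at the end of MEM; the names and the cycle numbering (IF = 1 for the producer) are illustrative assumptions.

```python
# Cycle at the end of which the producer's result exists (producer IF = cycle 1).
RESULT_READY = {"alu": 3, "load": 4}  # end of EX, end of MEM
EX_STAGE = 3                          # EX is the third stage

def stalls_with_forwarding(producer, distance=1):
    """Bubbles needed before a consumer issued `distance` instructions
    after `producer` can enter EX with the forwarded value."""
    consumer_ex = EX_STAGE + distance           # cycle the consumer would enter EX
    earliest_ex = RESULT_READY[producer] + 1    # first cycle the value can be used
    return max(0, earliest_ex - consumer_ex)

if __name__ == "__main__":
    print("add then sub:", stalls_with_forwarding("alu"))
    print("lw then sub: ", stalls_with_forwarding("load"))
    print("lw, other, sub:", stalls_with_forwarding("load", distance=2))
```

In this model the add/sub pair needs no bubble, while the lw/sub pair still needs one, which is why forwarding alone cannot remove the load-use hazard.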
Slide 12: 6.1 An Overview of Pipelining - Data Hazards: The Compiler Can Help
- Consider the following
- A = B + E
- C = B + F
- lw $t1, 0($t0)
- lw $t2, 4($t0)
- add $t3, $t1, $t2
- sw $t3, 12($t0)
- lw $t4, 8($t0)
- add $t5, $t1, $t4
- sw $t5, 16($t0)
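One way to see what the compiler can gain is to count load-use stalls before and after reordering. The sketch below assumes a five-stage pipeline with forwarding, so only a load immediately followed by a user of the loaded register costs a bubble. The reordered sequence (hoisting the third lw above the first add) is the usual fix and is not shown on the slide, so treat it as an assumption; the helper names are hypothetical.

```python
# Each instruction is (opcode, destination register, source registers).
ORIGINAL = [
    ("lw",  "t1", ("t0",)),
    ("lw",  "t2", ("t0",)),
    ("add", "t3", ("t1", "t2")),
    ("sw",  None, ("t3", "t0")),
    ("lw",  "t4", ("t0",)),
    ("add", "t5", ("t1", "t4")),
    ("sw",  None, ("t5", "t0")),
]

# Assumed reordering: move the third load up so no load is followed by its user.
REORDERED = [ORIGINAL[0], ORIGINAL[1], ORIGINAL[4],
             ORIGINAL[2], ORIGINAL[3], ORIGINAL[5], ORIGINAL[6]]

def load_use_stalls(program):
    """Count adjacent pairs where a lw feeds the very next instruction."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in curr[2]:
            stalls += 1
    return stalls

def total_cycles(program, stages=5):
    """Cycles to drain the pipeline: fill time plus one per instruction plus bubbles."""
    return len(program) + (stages - 1) + load_use_stalls(program)

if __name__ == "__main__":
    print("original: ", load_use_stalls(ORIGINAL), "stalls,", total_cycles(ORIGINAL), "cycles")
    print("reordered:", load_use_stalls(REORDERED), "stalls,", total_cycles(REORDERED), "cycles")
```

The checker is deliberately simple: it treats every use right after a load as a stall and ignores other hazards.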
Slide 13: 6.1 An Overview of Pipelining - Control Hazards: Stalling
Performance of Stall on Branch
Slide 14: 6.1 An Overview of Pipelining - Control Hazards: Prediction
Slide 15: 6.2 A Pipelined Datapath - Identifying the Stages
Slide 16: 6.2 A Pipelined Datapath - Representing Multiple Instruction Execution
Slide 17: 6.2 A Pipelined Datapath - Adding Pipeline Registers
Slide 18: 6.2 A Pipelined Datapath - lw Instruction Execution: IF Stage
Slide 19: 6.2 A Pipelined Datapath - lw Instruction Execution: ID Stage
Slide 20: 6.2 A Pipelined Datapath - lw Instruction Execution: EX Stage
Slide 21: 6.2 A Pipelined Datapath - lw Instruction Execution: MEM Stage
Slide 22: 6.2 A Pipelined Datapath - lw Instruction Execution: WB Stage
Slide 23: 6.2 A Pipelined Datapath - sw Instruction Execution: EX Stage
Slide 24: 6.2 A Pipelined Datapath - sw Instruction Execution: MEM Stage
Slide 25: 6.2 A Pipelined Datapath - sw Instruction Execution: WB Stage
Slide 26: 6.2 A Pipelined Datapath - Additions for lw and R-type
Slide 27: 6.2 A Pipelined Datapath - Datapath used by lw
Slide 28: 6.2 A Pipelined Datapath - Stylized Multiple Clock Cycle Diagrams
Slide 29: 6.2 A Pipelined Datapath - Traditional Multiple Clock Cycle Diagrams
Slide 30: 6.2 A Pipelined Datapath - Single Cycle Diagram: Cycle 5 Slice
Slide 31: 6.3 Pipelined Control - Identifying Control Lines Needed
Slide 32: 6.3 Pipelined Control - Generating and Saving Control Lines
Slide 33: 6.3 Pipelined Control - Putting it all Together
Slide 34: 6.4 Data Hazards and Forwarding - Data Dependencies
- In the previous example, there were no data dependencies. Now, the rest of the story.
- sub $2, $1, $3
- and $12, $2, $5
- or $13, $6, $2
- add $14, $2, $2
- sw $15, 100($2)
Slide 35: 6.4 Data Hazards and Forwarding - Which Data Dependencies are Hazards?
- A multiple-clock-cycle diagram is useful for
looking at the effects of data dependencies.
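What such a diagram shows can be sketched in a few lines for the sub/and/or/add/sw sequence. The model assumes no forwarding and a register file that is written in the first half of a cycle and read in the second half, so a read in the same cycle as the write sees the new value; these assumptions and the helper names are illustrative, not taken from the slide.

```python
# Each instruction is (opcode, destination register, source registers).
PROGRAM = [
    ("sub", "$2",  ("$1", "$3")),
    ("and", "$12", ("$2", "$5")),
    ("or",  "$13", ("$6", "$2")),
    ("add", "$14", ("$2", "$2")),
    ("sw",  None,  ("$15", "$2")),
]

ID_OFFSET = 1  # an instruction reads registers one cycle after its fetch
WB_OFFSET = 4  # and writes its result four cycles after its fetch

def hazardous_readers(program, producer_index=0):
    """Opcodes of later instructions that would read the producer's
    destination register before it has been written back."""
    _, dest, _ = program[producer_index]
    write_cycle = producer_index + WB_OFFSET
    hazards = []
    for i in range(producer_index + 1, len(program)):
        op, _, sources = program[i]
        read_cycle = i + ID_OFFSET
        if dest in sources and read_cycle < write_cycle:
            hazards.append(op)
    return hazards

if __name__ == "__main__":
    print(hazardous_readers(PROGRAM))
```

All four later instructions depend on $2, but in this model only the first two read it too early; the add reads in the same cycle as the write and the sw reads after it.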
Slide 36: 6.4 Data Hazards and Forwarding - Classifying Hazards
- Type 1: The information needed in the EX stage by an instruction is the result of the instruction one stage ahead (found in the EX/MEM pipeline register)
- A. The information is needed in Rrs
- B. The information is needed as Rrt
- Type 2: The information needed in the EX stage by an instruction is the result of the instruction two stages ahead (found in the MEM/WB pipeline register)
- A. The information is needed in Rrs
- B. The information is needed as Rrt
Slide 37: 6.4 Data Hazards and Forwarding - Forwarding in Action
Slide 38: 6.4 Data Hazards and Forwarding - Datapath without and with Forwarding
Slide 39: 6.4 Data Hazards and Forwarding - Forwarding Unit Implementation
Slide 40: 6.4 Data Hazards and Forwarding - Forwarding Unit: A Complication
- Consider
- add $1, $1, $2
- add $1, $1, $3
- add $1, $1, $4
- MEM Hazard
- If (MEM/WB.RegWrite = 1 and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs) and ____________) ForwardA = 01
- If (MEM/WB.RegWrite = 1 and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt) and ____________) ForwardB = 01
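The complication is that both pipeline registers can hold a result for the same register, and the more recent one must win. A minimal sketch of one forwarding-mux select follows, assuming the common textbook encoding (00 = register file, 10 = EX/MEM, 01 = MEM/WB). The function and parameter names are hypothetical, and giving the EX hazard priority is one standard way to express the extra MEM-hazard condition, not necessarily the wording intended for the blank.

```python
def forward_select(id_ex_reg, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Return the 2-bit mux select for one ALU operand.

    id_ex_reg is ID/EX.RegisterRs (for ForwardA) or ID/EX.RegisterRt (for ForwardB).
    Encoding is an assumption: 0b00 register file, 0b10 EX/MEM, 0b01 MEM/WB.
    """
    ex_hazard = ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg
    mem_hazard = mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg
    if ex_hazard:
        return 0b10  # most recent result wins
    if mem_hazard:
        return 0b01  # only when the EX/MEM stage is not also producing this register
    return 0b00

if __name__ == "__main__":
    # Third add of the sequence in EX: both earlier adds wrote $1.
    print(format(forward_select(1, True, 1, True, 1), "02b"))
```

For the three-add sequence, the third add must take $1 from the second add (EX/MEM), not from the first (MEM/WB).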
Slide 41: 6.4 Data Hazards and Forwarding - Forwarding Datapath with Control
Slide 42: 6.4 Data Hazards and Forwarding - EX Forwarding Completed
Slide 43: 6.4 Data Hazards and Forwarding - Forwarding in the MEM Stage
- Look at what happens when a load is followed by a store to the same address. Forwarding is possible in the MEM stage.
Slide 44: 6.5 Data Hazards and Stalls - Forwarding Can't Always Save the Day
Slide 45: 6.5 Data Hazards and Stalls - Hazard Detection Unit
Slide 46: 6.5 Data Hazards and Stalling - Stalling in Pictures
Slide 47: 6.5 Data Hazards and Stalling - Stalling in Pictures
Slide 48: 6.6 Branch Hazard - Example
Slide 49: 6.6 Branch Hazards - Approaches
- Assume Branch Not Taken
- ___________________________________________
- ___________________________________________
- Reducing the Delay of Branches
- ___________________________________________
Slide 50: 6.6 Branch Hazards - Datapath Changes
Slide 51: 6.6 Branch Hazards - Action Shots: Hazard is Detected
- 36 sub $10, $4, $8
- 40 beq $1, $3, 7
- 44 and $12, $2, $5
- 48 or $13, $2, $6
- 52 add $14, $4, $2
- 56 slt $15, $6, $7
- ...
- 72 lw $4, 50($7)
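The address 72 of the final lw is where the beq at address 40 lands if it is taken. A small sketch of the arithmetic, assuming the standard MIPS rule that the offset is counted in words relative to the instruction after the branch (the function name is made up):

```python
def branch_target(pc, word_offset):
    """Target of a MIPS conditional branch located at byte address `pc`."""
    return pc + 4 + (word_offset << 2)  # PC + 4, plus the offset converted to bytes

if __name__ == "__main__":
    print(branch_target(40, 7))
```

If the branch is taken, the instruction already fetched from address 44 is the one that has to be discarded.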
Slide 52: 6.6 Branch Hazards - Action Shots: and Instruction is Flushed
- 36 sub $10, $4, $8
- 40 beq $1, $3, 7
- 44 and $12, $2, $5
- 48 or $13, $2, $6
- 52 add $14, $4, $2
- 56 slt $15, $6, $7
- 72 lw $4, 50($7)
Slide 53: 6.6 Branch Hazards - Branch Prediction
- Dynamic Branch Prediction
- Assuming branch not taken is _____________________.
- Dynamic approaches
- ___________________________
- ___________________________
- Loops and Predictions
- Consider a one-bit prediction scheme. How accurate is it for a loop branch that's taken 9 times and then not taken once?
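The loop question can be explored with a tiny simulation. This sketch assumes a one-bit predictor that simply predicts whatever the branch did last time, and measures a steady-state pass, meaning the predictor starts in the "not taken" state left behind by the previous loop exit. The names are illustrative.

```python
def one_bit_correct(outcomes, initial_prediction):
    """Count correct predictions of a one-bit (last-outcome) predictor."""
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # remember only the most recent outcome
    return correct

# One execution of the loop: taken 9 times, then not taken once.
LOOP_PASS = [True] * 9 + [False]

if __name__ == "__main__":
    correct = one_bit_correct(LOOP_PASS, initial_prediction=False)
    print(correct, "of", len(LOOP_PASS), "predictions correct")
```

In steady state the predictor is wrong twice per pass, on the first iteration and on the exit, even though the branch itself is taken 90% of the time.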
Slide 54: 6.6 Branch Hazards - Delayed Branch Slot
- Problem: A branch predictor tells us whether or not a branch is taken, but still requires the calculation of the _____________
- Solution: Use a cache to hold the _______________ or the ___________________
Slide 55: 6.6 Branch Hazards - Comparing Performance
- Compare single-cycle, multicycle, and pipeline
- Functional Unit Times: Memory access 200 ps, ALU operation 100 ps, Register file read/write 50 ps
- CTsingle = 600 ps, CPIsingle = 1, CTmulti = 200 ps, CPImulti = 4.12
- Instructions: 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU
- Pipeline Assumptions: 50% of load instructions are followed immediately by a use, the branch delay on misprediction is one cycle and 25% of the branches are mispredicted, jumps always pay one full clock cycle of delay
- CPIpipe = 0.25 × (0.5 × (1 + 2)) + 0.1 × 1 + 0.11 × (0.25 × 2 + 0.75 × 1) + 0.02 × 2 + 0.52 × 1 = 1.17
- Average instruction time = CPI × CT
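The comparison can be checked numerically. The sketch below uses the instruction mix and pipeline assumptions from the slide; the 200 ps pipelined clock is an assumption (the slide states clock times only for the single-cycle and multicycle designs), and the names are illustrative.

```python
# Instruction mix as fractions of all instructions (from the slide).
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}

def pipelined_cpi(mix, load_use_fraction=0.5, mispredict_fraction=0.25):
    """Average CPI: one cycle per instruction plus one-cycle penalties."""
    load_cpi = (1 - load_use_fraction) * 1 + load_use_fraction * 2
    branch_cpi = (1 - mispredict_fraction) * 1 + mispredict_fraction * 2
    jump_cpi = 2  # jumps always pay one extra cycle
    return (mix["load"] * load_cpi + mix["store"] * 1
            + mix["branch"] * branch_cpi + mix["jump"] * jump_cpi
            + mix["alu"] * 1)

def average_instruction_time(cpi, clock_ps):
    return cpi * clock_ps

if __name__ == "__main__":
    cpi_pipe = pipelined_cpi(MIX)
    print("CPIpipe:", round(cpi_pipe, 4))
    print("single-cycle:", average_instruction_time(1, 600), "ps")
    print("multicycle:  ", round(average_instruction_time(4.12, 200), 1), "ps")
    # 200 ps pipelined clock is an assumption, not stated on the slide.
    print("pipelined:   ", round(average_instruction_time(cpi_pipe, 200), 1), "ps")
```

With these numbers the pipelined design averages well under half the time per instruction of either alternative.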
Slide 56: 6.6 Branch Hazards - Final Datapath and Control
Slide 57: 6.8 Exceptions - Datapath Changes
Slide 58: 6.8 Exceptions - Exception in a Pipelined Computer
- 40hex sub $11, $2, $4
- 44hex and $12, $2, $5
- 48hex or $13, $2, $6
- 4Chex add $1, $2, $1
- 50hex slt $15, $6, $7
- 54hex lw $16, 50($7)
- 40000040hex sw $25, 1000($0)
- 40000044hex sw $26, 1004($0)
Slide 59: 6.8 Exceptions - Exception in a Pipelined Computer
- 40hex sub $11, $2, $4
- 44hex and $12, $2, $5
- 48hex or $13, $2, $6
- 4Chex add $1, $2, $1
- 50hex slt $15, $6, $7
- 54hex lw $16, 50($7)
- 40000040hex sw $25, 1000($0)
- 40000044hex sw $26, 1004($0)
Slide 60: 6.8 Exceptions - More About Exception Causes and Handling
- Causes of Exceptions
- _________________
- _________________
- _________________
- _________________
- Issues
- Which instruction in the pipeline is responsible for the exception?
- What happens if multiple exceptions occur in a single clock cycle?
- Solutions
- Prioritize the exceptions - easy to determine which is serviced first.
- Most MIPS implementations have hardware that sorts exceptions so that the earliest instruction is interrupted.
- I/O device requests and hardware malfunctions are not associated with a specific instruction, so there is some flexibility.
Slide 61: 6.9 Advanced Pipelining - Extracting More Performance
- Pipelining exploits the potential parallelism, called _______________________, among instructions.
- Methods for increasing performance
- Increase the depth of the pipeline
- Replicate the internal components of the computer so that it can launch multiple instructions in every cycle _______________
- Launching multiple instructions per cycle allows the instruction execution rate to exceed the clock rate, the CPI _______, so we consider IPC ________________ instead
- Today's high end processors attempt to issue from _______ to _______ instructions in every cycle
- Multiple Issue
- Static Multiple Issue
- Dynamic Multiple Issue
Slide 62: 6.9 Advanced Pipelining - Extracting More Performance
- Multiple Issue
- Static Multiple Issue
- Dynamic Multiple Issue
- Issues in Multiple Issue Pipelines
- Packaging instructions into issue slots
- Dealing with control and data hazards
- The Concept of Speculation
- The compiler or processor guesses about the properties of an instruction, so as to enable execution to begin for dependent instructions.
- Any speculation mechanism must be able to check _______________ ______________ and to undo ____________________________.
- Speculation in software requires ________________________, speculation in hardware usually requires __________________ _____________________________.
Slide 63: 6.9 Advanced Pipelining - Extracting More Performance: Static Multiple Issue
- The set of instructions that issue in a given clock cycle is called an _____________________.
- An issue packet is basically _____________________ ______ with _____________________(_____).
- An Example: Static Multiple Issue with the MIPS ISA
- Two-issue
- _____________________
- ______________________
Slide 64: 6.9 Advanced Pipelining - Extracting More Performance: Static Two-Issue Datapath
Slide 65: 6.9 Advanced Pipelining - Extracting More Performance: Multiple-Issue Scheduling
Loop: lw   $t0, 0($s1)
      addu $t0, $t0, $s2
      sw   $t0, 0($s1)
      addi $s1, $s1, -4
      bne  $s1, $zero, Loop
Slide 66: 6.9 Advanced Pipelining - Extracting More Performance: Multiple-Issue Scheduling
Loop: lw   $t0, 0($s1)
      lw   $t1, -4($s1)
      addu $t0, $t0, $s2
      addu $t1, $t1, $s2
      sw   $t0, 0($s1)
      sw   $t1, -4($s1)
      addi $s1, $s1, -8
      bne  $s1, $zero, Loop
Slide 67: 6.9 Advanced Pipelining - Extracting More Performance: Multiple-Issue Scheduling
Loop: lw   $t0, 0($s1)
      lw   $t1, -4($s1)
      lw   $t2, -8($s1)
      lw   $t3, -12($s1)
      addu $t0, $t0, $s2
      addu $t1, $t1, $s2
      addu $t2, $t2, $s2
      addu $t3, $t3, $s2
      sw   $t0, 0($s1)
      sw   $t1, -4($s1)
      sw   $t2, -8($s1)
      sw   $t3, -12($s1)
      addi $s1, $s1, -16
      bne  $s1, $zero, Loop
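Unrolling only helps if the unrolled loop does the same work. A minimal Python model of the original loop and a four-way unrolled version is sketched below, assuming memory is a dictionary from byte address to word, the trip count is a multiple of four, and overflow is ignored. All names are illustrative.

```python
def original_loop(memory, s1, s2):
    """lw / addu / sw / addi -4 / bne: add s2 to each word, walking down to address 0."""
    mem = dict(memory)
    while True:
        mem[s1] = mem[s1] + s2
        s1 -= 4
        if s1 == 0:
            break
    return mem

def unrolled_by_four(memory, s1, s2):
    """Same work, four elements per iteration, using separate temporaries t0..t3."""
    mem = dict(memory)
    while True:
        t0, t1, t2, t3 = mem[s1], mem[s1 - 4], mem[s1 - 8], mem[s1 - 12]
        t0, t1, t2, t3 = t0 + s2, t1 + s2, t2 + s2, t3 + s2
        mem[s1], mem[s1 - 4], mem[s1 - 8], mem[s1 - 12] = t0, t1, t2, t3
        s1 -= 16
        if s1 == 0:
            break
    return mem

if __name__ == "__main__":
    words = {addr: addr * 10 for addr in range(4, 36, 4)}  # 8 words at addresses 4..32
    print(original_loop(words, 32, 5) == unrolled_by_four(words, 32, 5))
```

The separate temporaries mirror the register renaming ($t0 to $t3) in the unrolled code, which removes the false dependences between the copies of the loop body and lets the loads, adds, and stores be grouped for scheduling.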
Slide 68: 6.9 Advanced Pipelining - Extracting More Performance: Dynamic Multiple-Issue
Slide 69: 6.10 Real Stuff - The Pentium 4 Pipeline
Slide 70: 6.11 Fallacies and Pitfalls
- Fallacy - Pipelining is easy.
- Fallacy - Pipelining ideas can be implemented independent of technology.
- Pitfall - Failure to consider instruction set design can adversely impact pipelining.