Title: Pipelining Recap
1Pipelining(Recap)
2MIPS 5-stage pipeline
- The MIPS processor (DLX processor) needs 5 stages to execute instructions
- Pipelining stages
- IF - Instruction Fetch
- ID - Instruction Decode
- EX - Execute / Address Calculation
- MEM - Memory Access (read / write)
- WB - Write Back (results into register file)
- Not all instructions need all the stages (e.g., the add instruction does not need the MEM stage)
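The point about unused stages can be sketched as a small lookup table. The table below is an illustrative assumption based on the usual 5-stage MIPS datapath, not something taken from the slides; the names `USEFUL_STAGES` and `idle_stages` are hypothetical. Note that in the pipeline an instruction still passes through every stage; it simply does no useful work in the idle ones.

```python
# Stages in pipeline order, as listed on the slide.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Stages in which each instruction does useful work.
# Assumption: textbook 5-stage MIPS behaviour (illustrative table only).
USEFUL_STAGES = {
    "lw":  {"IF", "ID", "EX", "MEM", "WB"},  # reads memory, writes a register
    "sw":  {"IF", "ID", "EX", "MEM"},        # writes memory, no register write
    "add": {"IF", "ID", "EX", "WB"},         # ALU result goes straight to WB
    "sub": {"IF", "ID", "EX", "WB"},
}

def idle_stages(op):
    """Return the stages in which `op` does no useful work, in pipeline order."""
    return [stage for stage in STAGES if stage not in USEFUL_STAGES[op]]

for op in USEFUL_STAGES:
    print(op, idle_stages(op))
```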
3Basic MIPS Pipelined Processor
Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB (datapath figure)
4Pipelined Example - Executing Multiple
Instructions
- Consider the following instruction sequence
- lw r0, 10(r1)
- sw r3, 20(r4)
- add r5, r6, r7
- sub r8, r9, r10
5Executing Multiple Instructions - Clock Cycle 1
LW (IF)
6Executing Multiple Instructions - Clock Cycle 2
LW (ID)
SW (IF)
7Executing Multiple Instructions - Clock Cycle 3
LW (EX)
SW (ID)
ADD (IF)
8Executing Multiple Instructions - Clock Cycle 4
LW (MEM)
SW (EX)
ADD (ID)
SUB (IF)
9Executing Multiple Instructions - Clock Cycle 5
LW (WB)
SW (MEM)
ADD (EX)
SUB (ID)
10Executing Multiple Instructions - Clock Cycle 6
SW (WB)
ADD (MEM)
SUB (EX)
11Executing Multiple Instructions - Clock Cycle 7
ADD (WB)
SUB (MEM)
12Executing Multiple Instructions - Clock Cycle 8
SUB (WB)
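The cycle-by-cycle slides above can be reproduced with a short simulation. This is a minimal sketch that assumes an ideal pipeline (one instruction issued per cycle, no stalls); the function name `pipeline_occupancy` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_occupancy(instructions):
    """Map each clock cycle (1-based) to {instruction: stage}.

    Assumes an ideal pipeline: one instruction issued per cycle, no stalls.
    """
    total_cycles = len(instructions) + len(STAGES) - 1
    table = {}
    for cycle in range(1, total_cycles + 1):
        in_flight = {}
        for i, name in enumerate(instructions):
            # Instruction i enters IF in cycle i + 1 and advances one stage per cycle.
            stage_index = cycle - 1 - i
            if 0 <= stage_index < len(STAGES):
                in_flight[name] = STAGES[stage_index]
        table[cycle] = in_flight
    return table

occupancy = pipeline_occupancy(["LW", "SW", "ADD", "SUB"])
for cycle, in_flight in occupancy.items():
    print(f"Cycle {cycle}: {in_flight}")
```

Four instructions in a 5-stage pipeline take 4 + 5 - 1 = 8 cycles, which matches the eight clock-cycle slides.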
13Alternative View - Multicycle Diagram
14Processor Pipelining
- There are two ways that pipelining can help
- Reduce the clock cycle time, and keep the same CPI
- Reduce the CPI, and keep the same clock cycle time
- CPU time = Instruction count x CPI x Clock cycle time
15Reduce the clock cycle time, and keep the same CPI
CPI = 1, Clock = X Hz
16Reduce the clock cycle time, and keep the same CPI
CPI = 1, Clock = 5X Hz
(datapath figure: PC, instruction memory, register file, ALU, data memory)
17Reduce the CPI, and keep the same cycle time
CPI = 5, Clock = 5X Hz
18Reduce the CPI, and keep the same cycle time
CPI = 1, Clock = 5X Hz
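The two views above can be checked against the CPU time equation. The sketch below compares the three design points from these slides; the instruction count `N` and base clock rate `X` are hypothetical values chosen only for illustration, and the comparison assumes the ideal case with no pipeline overhead or hazards.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """CPU time = instruction count x CPI x clock cycle time (1 / clock rate)."""
    return instruction_count * cpi / clock_hz

N = 1_000_000   # hypothetical instruction count
X = 100e6       # hypothetical base clock rate "X", in Hz

single_cycle = cpu_time(N, cpi=1, clock_hz=X)       # CPI = 1, Clock = X Hz
multi_cycle = cpu_time(N, cpi=5, clock_hz=5 * X)    # CPI = 5, Clock = 5X Hz
pipelined = cpu_time(N, cpi=1, clock_hz=5 * X)      # CPI = 1, Clock = 5X Hz

# Ideal speedup of the pipelined design over each baseline.
print(single_cycle / pipelined)
print(multi_cycle / pipelined)
```

Either way of looking at it gives the same ideal result: the pipelined design is about 5 times faster than both baselines, which is the number of stages.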
19Pipeline performance
- Ideally we get a speedup (by reducing the clock cycle or reducing the CPI) equal to the number of stages.
- In practice, we do not achieve that but we get close
- Pipelining has additional overhead (e.g., pipeline registers)
- Pipeline hazards
20Pipeline Hazards
- Hazards are situations in pipelining which prevent the next instruction in the instruction stream from executing during the designated clock cycle.
- Hazards reduce the ideal speedup gained from pipelining (e.g., CPI = 1) and are classified into three classes
- Structural hazards
- Data hazards
- Control hazards
21Structural Hazards
- If a resource conflict arises due to a hardware resource being required by more than one instruction in a single cycle, and one or more such instructions cannot be accommodated, then a structural hazard has occurred, for example
- when a machine has only one register file write port
- or when a pipelined machine has a single memory shared by data and instructions.
- Remedy: stall the pipeline for one cycle for the register write or memory data access
22Register File/Structural Hazards
Operation on the register file by 2 different instructions in the same clock cycle
(timing diagram, cycles 1-7: Load followed by Instr 1 to Instr 4; in cycle 5 Load writes the register file in WB while Instr 3 reads it in ID)
23Register File/Structural Hazards
We need 3 stall cycles in order to solve this hazard
(timing diagram, cycles 1-7: Load and Instr 1 to Instr 4, with 3 stall cycles inserted)
24Register File/Structural Hazards
Allow writing registers in the first ½ of the cycle and reading in the 2nd ½ of the cycle
No stalls are required
(timing diagram, cycles 1-7: Load followed by Instr 1 to Instr 4, with the register write and read sharing cycle 5)
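The split-cycle idea can be modelled in a few lines. This is a toy sketch, not a hardware description: the class `RegisterFile` and its `clock` method are hypothetical names, and the model only captures the ordering (write first, then read) within one cycle.

```python
class RegisterFile:
    """Toy register file: the write lands in the first half of a cycle and
    reads are served in the second half (hypothetical model)."""

    def __init__(self, size=32):
        self.regs = [0] * size

    def clock(self, write=None, reads=()):
        """Simulate one clock cycle.

        write: optional (register_number, value) pair coming from the WB stage.
        reads: register numbers requested by the ID stage in the same cycle.
        """
        if write is not None:                   # first half of the cycle: write
            number, value = write
            self.regs[number] = value
        return [self.regs[r] for r in reads]    # second half of the cycle: read

rf = RegisterFile()
# Same cycle: a load in WB writes r2 while a later instruction in ID reads r2 and r3.
values = rf.clock(write=(2, 42), reads=[2, 3])
print(values)
```

Because the write completes before the read, the instruction in ID sees the freshly written value and neither instruction has to wait.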
25One Memory Port/Structural Hazards
Operation on memory by 2 different instructions in the same clock cycle
(timing diagram: Load followed by Instruction1 to Instruction4; in cycle 4 the data access of Load and the fetch of Instruction3 both need the single memory port)
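The conflict on this slide can be located mechanically. The sketch below assumes a 5-stage pipeline with one instruction issued per cycle and a single memory port shared by instruction fetch (IF) and data access (MEM); the function name `memory_conflicts` and the per-instruction flag saying whether it accesses data memory are assumptions made for illustration.

```python
def memory_conflicts(instructions):
    """Return (cycle, data_access_instr, fetched_instr) for every cycle in
    which a single-ported memory is needed twice.

    instructions: list of (name, accesses_data_memory) in program order,
    issued one per cycle into a 5-stage pipeline with no stalls.
    """
    conflicts = []
    for i, (name, uses_data_memory) in enumerate(instructions):
        if not uses_data_memory:
            continue
        mem_cycle = i + 4             # IF in cycle i + 1, MEM three cycles later
        fetch_index = mem_cycle - 1   # index of the instruction fetched in that cycle
        if fetch_index < len(instructions):
            conflicts.append((mem_cycle, name, instructions[fetch_index][0]))
    return conflicts

# Assumption: only Load accesses data memory in this sequence.
program = [("Load", True), ("Instruction1", False), ("Instruction2", False),
           ("Instruction3", False), ("Instruction4", False)]
print(memory_conflicts(program))
```

For this sequence the only clash is in cycle 4, between the data access of Load and the fetch of Instruction3.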
26Inserting Bubbles (Stalls)
3 stall cycles with 1-port memory
(timing diagram: Load, Instruction1, Instruction2, then stall rows of bubbles before Instruction3)
27Two Memory Ports/Structural Hazards (Read and Write at the same time)
No stalls with 2 memory ports
(timing diagram: Load followed by Instruction1 to Instruction4, with instruction fetch and data access using separate ports)
28Performance of Pipelines with Stalls
- Hazards in pipelines may make it necessary to stall the pipeline by one or more cycles, thus degrading performance from the ideal CPI of 1.
- CPI pipelined = Ideal CPI + Pipeline stall clock cycles per instruction
- Speedup = CPI unpipelined / (1 + Pipeline stall cycles per instruction)
- With CPI unpipelined equal to the pipeline depth: Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction)
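The formulas above translate directly into code. This is a minimal sketch under the same assumptions as the slide (ideal CPI of 1, CPI unpipelined equal to the pipeline depth, and equal clock cycle times); the function names and the sample stall rate are hypothetical.

```python
def pipelined_cpi(stall_cycles_per_instruction, ideal_cpi=1.0):
    """CPI pipelined = ideal CPI + pipeline stall cycles per instruction."""
    return ideal_cpi + stall_cycles_per_instruction

def speedup(pipeline_depth, stall_cycles_per_instruction):
    """Speedup over the unpipelined machine, assuming ideal CPI = 1,
    CPI unpipelined = pipeline depth, and equal clock cycle times."""
    return pipeline_depth / pipelined_cpi(stall_cycles_per_instruction)

print(speedup(5, 0.0))    # no stalls: speedup equals the pipeline depth
print(speedup(5, 0.25))   # hypothetical: one stall cycle every four instructions
```

Even a modest stall rate cuts noticeably into the ideal speedup, since the stall term adds directly to the denominator.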
29Example Dual-port vs. Single-port Memory
- Machine A: Dual-ported memory (0 stalls)
- Machine B: Single-ported memory (3 stall cycles per load), but its pipelined implementation has a 1.05 times faster clock rate
- Ideal CPI = 1 for both
- Loads are 40% of instructions executed
- SpeedUpA = Pipeline Depth/(1 + 0) x (clockunpipe/clockpipe)
- = Pipeline Depth
- SpeedUpB = Pipeline Depth/(1 + 0.4 x 3) x (clockunpipe/(clockunpipe / 1.05))
- = (Pipeline Depth/2.2) x 1.05
- = 0.48 x Pipeline Depth
- SpeedUpA / SpeedUpB = Pipeline Depth/(0.48 x Pipeline Depth) = 2.1
- Machine A is 2.1 times faster
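The arithmetic of this example can be re-run as a quick check. The sketch below uses the figures from the slide (40% loads, 3 stall cycles per load, 1.05 times faster clock for Machine B); the pipeline depth of 5 is an assumption that cancels out of the final ratio, and the `clock_factor` parameter is a hypothetical name for the clock-rate advantage.

```python
def speedup(depth, stalls_per_instruction, clock_factor=1.0):
    """Pipeline speedup with stalls, scaled by a relative clock-rate factor."""
    return depth / (1.0 + stalls_per_instruction) * clock_factor

DEPTH = 5              # assumed pipeline depth; it cancels in the ratio
LOAD_FRACTION = 0.4    # loads are 40% of instructions executed
STALLS_PER_LOAD = 3    # single-ported memory penalty from the slide

speedup_a = speedup(DEPTH, 0.0)
speedup_b = speedup(DEPTH, LOAD_FRACTION * STALLS_PER_LOAD, clock_factor=1.05)
ratio = speedup_a / speedup_b

print(round(speedup_b / DEPTH, 2))   # fraction of the ideal speedup Machine B achieves
print(round(ratio, 1))               # how many times faster Machine A is
```

The faster clock of Machine B recovers only a small part of what the load stalls cost, so the dual-ported design still wins by roughly a factor of two.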