Pipelining Recap - PowerPoint PPT Presentation

Slides: 30

Provided by: mot112

Transcript and Presenter's Notes

1
Pipelining (Recap)
2
MIPS 5-stage pipeline

The MIPS processor (DLX processor) needs 5 stages to execute instructions.
Pipeline stages:
IF - Instruction Fetch
ID - Instruction Decode
EX - Execute / Address Calculation
MEM - Memory Access (read / write)
WB - Write Back (results into register file)
Not all instructions need all the stages (e.g., an add instruction does not need the MEM stage).
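As a minimal sketch of the last point, the table below records which stages do useful work for three instructions from the classic 5-stage design. The `USES` table and the `idle_stages` helper are illustrative names, not part of the slides, and the table covers only these three instructions.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Stages in which each instruction does useful work (illustrative table).
USES = {
    "lw":  {"IF", "ID", "EX", "MEM", "WB"},  # reads memory, then writes a register
    "sw":  {"IF", "ID", "EX", "MEM"},        # writes memory, no register write-back
    "add": {"IF", "ID", "EX", "WB"},         # no data memory access
}

def idle_stages(op):
    """Return the stages the instruction passes through without doing work."""
    return [stage for stage in STAGES if stage not in USES[op]]

for op in ("lw", "sw", "add"):
    print(op, idle_stages(op))
```

In a pipeline, every instruction still moves through all five stages so that the stages stay in lockstep; an idle stage just means no work is done there.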

3
Basic MIPS Pipelined Processor
[Figure: pipelined datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]
4
Pipelined Example - Executing Multiple Instructions

Consider the following instruction sequence:
lw r0, 10(r1)
sw r3, 20(r4)
add r5, r6, r7
sub r8, r9, r10

5
Executing Multiple Instructions - Clock Cycle 1
LW in IF
6
Executing Multiple Instructions - Clock Cycle 2
LW in ID
SW in IF
7
Executing Multiple Instructions - Clock Cycle 3
LW in EX
SW in ID
ADD in IF
8
Executing Multiple Instructions - Clock Cycle 4
LW in MEM
SW in EX
ADD in ID
SUB in IF
9
Executing Multiple Instructions - Clock Cycle 5
LW in WB
SW in MEM
ADD in EX
SUB in ID
10
Executing Multiple Instructions - Clock Cycle 6
SW in WB
ADD in MEM
SUB in EX
11
Executing Multiple Instructions - Clock Cycle 7
ADD in WB
SUB in MEM
12
Executing Multiple Instructions - Clock Cycle 8
SUB in WB
13
Alternative View - Multicycle Diagram
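The diagram for this slide did not survive in the transcript. As a rough stand-in, the sketch below builds the multicycle view for the four-instruction example, assuming no stalls and one instruction issued per cycle. The function names are illustrative, not from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def multicycle_diagram(instructions):
    """Return one (name, cells) row per instruction for a stall-free pipeline."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        # Instruction i enters IF in cycle i + 1 and advances one stage per cycle.
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append((name, cells))
    return rows

def format_diagram(rows):
    """Render the rows as a plain-text table with one column per clock cycle."""
    n_cycles = len(rows[0][1])
    header = "instr " + " ".join(f"C{c:<3}" for c in range(1, n_cycles + 1))
    lines = [header]
    for name, cells in rows:
        lines.append(f"{name:<5} " + " ".join(f"{cell:<4}" for cell in cells))
    return "\n".join(lines)

rows = multicycle_diagram(["lw", "sw", "add", "sub"])
print(format_diagram(rows))
```

Each row is shifted one cycle to the right of the row above it, so the four instructions finish in 4 + 5 - 1 = 8 cycles, matching the cycle-by-cycle slides.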
14
Processor Pipelining

There are two ways that pipelining can help:
Reduce the clock cycle time, and keep the same CPI
Reduce the CPI, and keep the same clock cycle time
CPU time = Instruction count × CPI × Clock cycle time

15
Reduce the clock cycle time, and keep the same CPI
CPI = 1, Clock = X Hz
16
Reduce the clock cycle time, and keep the same CPI
CPI = 1, Clock = X × 5 Hz
[Figure: pipelined datapath showing the PC, instruction memory, register file, ALU, and data memory]
17
Reduce the CPI, and keep the same cycle time
CPI = 5, Clock = X × 5 Hz
18
Reduce the CPI, and keep the same cycle time
CPI = 1, Clock = X × 5 Hz
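The two views can be checked against the CPU time equation. This is a sketch under stated assumptions: the instruction count `N` and the base clock rate `X` are made-up values, and the pipeline is treated as ideal (5 perfectly balanced stages, no overhead, no hazards).

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """CPU time = instruction count x CPI x clock cycle time (1 / clock rate)."""
    return instruction_count * cpi / clock_hz

N = 1_000_000   # hypothetical instruction count
X = 100e6       # hypothetical base clock rate "X" in Hz
STAGES = 5

# View 1: single-cycle machine (CPI = 1 at X Hz) vs. pipeline (CPI = 1 at 5X Hz).
single_cycle = cpu_time(N, cpi=1, clock_hz=X)
# View 2: multicycle machine (CPI = 5 at 5X Hz) vs. the same pipeline.
multi_cycle = cpu_time(N, cpi=STAGES, clock_hz=X * STAGES)
pipelined = cpu_time(N, cpi=1, clock_hz=X * STAGES)

print("speedup over single-cycle:", single_cycle / pipelined)
print("speedup over multicycle:  ", multi_cycle / pipelined)
```

Both ratios come out at about 5, the number of stages: the two views describe the same ideal gain from different starting points.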
19
Pipeline performance

Ideally we get a speedup (by reducing the clock cycle or reducing the CPI) equal to the number of stages.
In practice, we do not achieve that, but we get close, for two reasons:
Pipelining has additional overhead (e.g., pipeline registers)
Pipeline hazards

20
Pipeline Hazards

Hazards are situations in pipelining that prevent the next instruction in the instruction stream from executing during its designated clock cycle.
Hazards reduce the ideal speedup gained from pipelining (e.g., they raise the CPI above the ideal value of 1) and are classified into three classes:
Structural hazards
Data hazards
Control hazards

21
Structural Hazards

A structural hazard occurs when a hardware resource is required by more than one instruction in a single cycle and one or more of those instructions cannot be accommodated. Examples:
when a machine has only one register file write port
when a pipelined machine has a single memory shared by data and instructions
A simple remedy is to stall the pipeline for one cycle for the register write or memory data access.
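The shared-memory case can be sketched as a small conflict detector. It assumes an unstalled 5-stage schedule in which every instruction uses memory in IF and only `lw`/`sw` use it in MEM. The function names, the `ports` parameter, and the sample program are illustrative, not taken from the slides.

```python
from collections import defaultdict

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_accesses(instructions):
    """Map each clock cycle to the (instruction, stage) pairs that touch memory."""
    uses = defaultdict(list)
    for i, op in enumerate(instructions):
        start = i + 1                       # cycle in which instruction i is in IF
        uses[start].append((op, "IF"))      # every instruction is fetched from memory
        if op in ("lw", "sw"):              # only loads and stores access data memory
            uses[start + STAGES.index("MEM")].append((op, "MEM"))
    return uses

def structural_hazards(instructions, ports=1):
    """Return the cycles in which more accesses are requested than ports exist."""
    accesses = memory_accesses(instructions)
    return {cycle: users for cycle, users in sorted(accesses.items())
            if len(users) > ports}

program = ["lw", "add", "sub", "and", "or"]
print(structural_hazards(program, ports=1))  # conflict in cycle 4: lw (MEM) vs. and (IF)
print(structural_hazards(program, ports=2))  # no conflicts with two ports
```

The detector only reports where the conflict arises in the ideal schedule; how many stall cycles are inserted to resolve it depends on the hardware.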

22
Register File/Structural Hazards
Operation on the register set by 2 different instructions in the same clock cycle
[Figure: pipeline diagram, instruction order (Load, Instr 1 to Instr 4) against time in clock cycles 1 to 7, with stages Ifetch, Reg, DMem, Reg]
23
Register File/Structural Hazards
We need 3 stall cycles in order to resolve this hazard
[Figure: pipeline diagram, instruction order (Load, Instr 1 to Instr 4) against time in clock cycles 1 to 7, with 3 stall cycles inserted]
24
Register File/Structural Hazards
Allow writing registers in the first ½ of the cycle and reading in the 2nd ½ of the cycle
No stalls are required
[Figure: pipeline diagram, instruction order (Load, Instr 1 to Instr 4) against time in clock cycles 1 to 7, with stages Ifetch, Reg, DMem, Reg]
25
1 Memory Port/Structural Hazards
Operation on memory by 2 different instructions in the same clock cycle
[Figure: pipeline diagram, instruction order (Load, Instruction 1 to Instruction 4) against time in cycles, with the Mem accesses of each instruction marked]
26
Inserting Bubbles (Stalls)
3 stall cycles with 1-port memory
[Figure: pipeline diagram against time in cycles for Load and Instruction 1 to Instruction 3, with bubbles (stalls) inserted before Instruction 3]
27
2 Memory Ports/Structural Hazards (Read and Write at the same time)
No stalls with 2 memory ports
[Figure: pipeline diagram, instruction order (Load, Instruction 1 to Instruction 4) against time in cycles, with the Mem accesses of each instruction marked]
28
Performance of Pipelines with Stalls

Hazards in pipelines may make it necessary to stall the pipeline by one or more cycles, thus degrading performance from the ideal CPI of 1.
CPI pipelined = Ideal CPI + Pipeline stall clock cycles per instruction
Speedup = CPI unpipelined / (1 + Pipeline stall cycles per instruction)
Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction)
The last form holds when the unpipelined CPI equals the pipeline depth and the clock cycle time is unchanged.
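The formulas above can be tried out directly. This is a minimal sketch assuming an ideal CPI of 1, an unpipelined CPI equal to the pipeline depth, and the same clock cycle time for both machines; the function names and the sample stall values are illustrative.

```python
def pipelined_cpi(stall_cycles_per_instr, ideal_cpi=1.0):
    """CPI pipelined = ideal CPI + pipeline stall cycles per instruction."""
    return ideal_cpi + stall_cycles_per_instr

def speedup(pipeline_depth, stall_cycles_per_instr):
    """Speedup = pipeline depth / (1 + pipeline stall cycles per instruction)."""
    return pipeline_depth / (1.0 + stall_cycles_per_instr)

# Sample stall rates (made up) for a 5-stage pipeline.
for stalls in (0.0, 0.25, 1.0):
    print(f"stalls/instr={stalls}: CPI={pipelined_cpi(stalls)}, "
          f"speedup={speedup(5, stalls)}")
```

With no stalls the speedup equals the pipeline depth; an average of one stall cycle per instruction halves it.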

29
Example: Dual-port vs. Single-port Memory

Machine A: Dual-ported memory (0 stalls)
Machine B: Single-ported memory (3 stalls), but its pipelined implementation has a 1.05 times faster clock rate
Ideal CPI = 1 for both
Loads are 40% of instructions executed
SpeedUpA = Pipeline Depth / (1 + 0) × (clock_unpipe / clock_pipe)
= Pipeline Depth
SpeedUpB = Pipeline Depth / (1 + 0.4 × 3) × (clock_unpipe / (clock_unpipe / 1.05))
= (Pipeline Depth / 2.2) × 1.05
= 0.48 × Pipeline Depth
SpeedUpA / SpeedUpB = Pipeline Depth / (0.48 × Pipeline Depth) = 2.1
Machine A is 2.1 times faster
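The arithmetic of this example can be reproduced with a short script. The pipeline depth of 5 is an assumption made only so the code can run; it cancels out of the final ratio. The function and parameter names are illustrative.

```python
def speedup(depth, load_fraction, stalls_per_load, clock_speedup=1.0):
    """Speedup = depth / (1 + stall cycles per instruction) x clock-rate ratio."""
    stall_cycles_per_instr = load_fraction * stalls_per_load
    return depth / (1.0 + stall_cycles_per_instr) * clock_speedup

DEPTH = 5  # assumed pipeline depth; cancels in the A/B ratio

speedup_a = speedup(DEPTH, load_fraction=0.4, stalls_per_load=0)
speedup_b = speedup(DEPTH, load_fraction=0.4, stalls_per_load=3, clock_speedup=1.05)

print("SpeedUpA / depth:", round(speedup_a / DEPTH, 2))
print("SpeedUpB / depth:", round(speedup_b / DEPTH, 2))
print("A over B:", round(speedup_a / speedup_b, 2))
```

Machine B's 5% faster clock does not make up for the 1.2 extra stall cycles per instruction caused by its single memory port.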