Title: Recap (Pipelining)
1Recap(Pipelining)
2What is Pipelining?
- A way of speeding up execution of tasks
- Key idea
- overlap execution of multiple taks
3Automobile Manufacturing
1. Build frame. 60 min.
2. Add engine. 50 min.
3. Build body. 80 min.
4. Paint. 40 min.
5. Finish. 45 min.
275 min.
Latency Time from start to finish for one car.
275 minutes per car.
(smaller is better)
Throughput Number of finished cars per time unit.
1 car/275 min 0.218 cars/hour
(larger is better)
Issues How can we make the process better by
adding?
4An Assembly line
80
80
60
50
80
80
80
40
45
Last two stages only receive onecar/80 min to
work on.
Latency 400 min/car
Throughput 4 cars/640 min (1 car/160 min)
Will approach 1 car/80 min as time goes on
First two stagescant produce faster thanone
car/80 min or a backlog will occurat third stage.
5Pipelining a Digital System
- Key idea break big computation up into
piecesSeparate each piece with a pipeline
register
6Pipelining a Digital System
- Why do this? Because it's faster for repeated
computations
7Comments about pipelining
- Pipelining increases throughput, but not latency
- Answer available every 200ps, BUT
- A single computation still takes 1ns
- Limitations
- Computations must be divisible into stages of
equal sizes - Pipeline registers add overhead
8Another Example
Unpipelined System
Delay 33ns Throughput 30MHz
Op1
Op2
Op3
??
Time
- One operation must complete before next can begin
- Operations spaced 33ns apart
93 Stage Pipelining
Delay 39ns Throughput 77MHz
Op1
- Space operations 13ns apart
- 3 operations occur simultaneously
Op2
Op3
Op4
Time
10Limitation Nonuniform Pipelining
Delay 18 3 54 ns Throughput 55MHz
Clock
- Throughput limited by slowest stage
- Delay determined by clock period number of
stages - Must attempt to balance stages
11Limitation Deep Pipelines
Delay 48ns, Throughput 128MHz
- Diminishing returns as add more pipeline stages
- Register delays become limiting factor
- Increased latency
- Small throughput gains
- More hazards
12MIPSPipelining
13MIPS 5-stage pipeline
- The MIPS processor needs 5 stages to execute
instructions - Pipelining stages
- IF - Instruction Fetch
- ID - Instruction Decode
- EX - Execute / Address Calculation
- MEM - Memory Access (read / write)
- WB - Write Back (results into register file)
- Not all instructions need all the stages (e.g.,
add instruction does not need the MEM stage)
14Basic MIPS Pipelined Processor
IF/ID
ID/EX
EX/MEM
MEM/WB
15Pipelined Example - Executing Multiple
Instructions
- Consider the following instruction sequence
- lw r0, 10(r1)
- sw sr3, 20(r4)
- add r5, r6, r7
- sub r8, r9, r10
16Executing Multiple InstructionsClock Cycle 1
LW
17Executing Multiple InstructionsClock Cycle 2
LW
SW
18Executing Multiple InstructionsClock Cycle 3
LW
SW
ADD
19Executing Multiple InstructionsClock Cycle 4
LW
SW
ADD
SUB
20Executing Multiple InstructionsClock Cycle 5
LW
SW
ADD
SUB
21Executing Multiple InstructionsClock Cycle 6
SW
ADD
SUB
22Executing Multiple InstructionsClock Cycle 7
ADD
SUB
23Executing Multiple InstructionsClock Cycle 8
SUB
24Alternative View - Multicycle Diagram
25Processor Pipelining
- There are two ways that pipelining can help
- Reduce the clock cycle time, and keep the same
CPI - Reduce the CPI, and keep the same clock cycle
time - CPU time Instruction count CPI Clock cycle
time
26Reduce the clock cycle time, and keep the same CPI
CPI 1 Clock X Hz
27Reduce the clock cycle time, and keep the same CPI
CPI 1 Clock X5 Hz
4
PC
ltlt2
Instruction
I
RD
ADDR
32
32
16
5
5
5
Instruction
Memory
RN1
RN2
WN
RD1
Register File
ALU
WD
RD2
ADDR
Data
RD
Memory
16
32
WD
28Reduce the CPI, and keep the same cycle time
CPI 5 Clock X5 Hz
29Reduce the CPI, and keep the same cycle time
CPI 1 Clock X5 Hz
30Pipeline performance
- Ideally we get a speedup (by reducing clock cycle
or reducing the CPI) equal to the number of
stages. - In practice, we do not achieve that but we get
close - Pipelining has additional overhead (e.g.,
pipeline registers) - Pipeline hazards
31Pipeline Hazards
- Hazards are situations in pipelining which
prevent the next instruction in the instruction
stream from executing during the designated clock
cycle. - Hazards reduce the ideal speedup gained from
pipelining (e.g., CPI 1) and are classified into
three classes - Structural hazards
- Data hazards
- Control hazards