Recap (Pipelining)

Slides: 32

Provided by: Ham71

Transcript and Presenter's Notes

1
Recap (Pipelining)
2
What is Pipelining?

A way of speeding up execution of tasks
Key idea: overlap the execution of multiple tasks

3
Automobile Manufacturing
1. Build frame. 60 min.
2. Add engine. 50 min.
3. Build body. 80 min.
4. Paint. 40 min.
5. Finish. 45 min.
Total: 275 min.
Latency: time from start to finish for one car.
275 minutes per car.
(smaller is better)
Throughput: number of finished cars per time unit.
1 car/275 min = 0.218 cars/hour
(larger is better)
Issues How can we make the process better by
adding?
4
An Assembly line
[Figure: assembly line with five stages; stage work times 60, 50, 80, 40 and 45 min, each stage allotted an 80-min slot]
Last two stages only receive one car/80 min to
work on.
Latency: 400 min/car
Throughput: 4 cars/640 min (1 car/160 min)
Will approach 1 car/80 min as time goes on
First two stages can't produce faster than one
car/80 min or a backlog will occur at the third stage.
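
The assembly-line figures above can be reproduced with a short calculation. This is only an illustrative sketch: the function name `assembly_line_stats` is hypothetical, and it assumes that all stages advance in lock step with a slot equal to the slowest stage's time (80 min), as on the slide.

```python
# Stage times in minutes, taken from the Automobile Manufacturing slide.
STAGE_MINUTES = [60, 50, 80, 40, 45]


def assembly_line_stats(stage_minutes, cars):
    """Return (latency, total_time, minutes_per_car) for a lock-step line.

    Assumption: all stages advance together, once per slot, and the slot
    length equals the slowest stage's time.
    """
    slot = max(stage_minutes)                  # slowest stage sets the pace
    latency = slot * len(stage_minutes)        # one car passes through every stage
    total_time = latency + (cars - 1) * slot   # then one more car finishes per slot
    return latency, total_time, total_time / cars


if __name__ == "__main__":
    print("unpipelined latency:", sum(STAGE_MINUTES), "min")
    for cars in (1, 4, 100):
        latency, total, per_car = assembly_line_stats(STAGE_MINUTES, cars)
        print(f"{cars} cars: latency {latency} min, "
              f"total {total} min, {per_car:.1f} min/car")
```

With more cars the minutes-per-car figure moves toward the 80-min slot, which is the "will approach 1 car/80 min" remark on the slide.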
5
Pipelining a Digital System

Key idea: break big computation up into pieces
Separate each piece with a pipeline register

6
Pipelining a Digital System

Why do this? Because it's faster for repeated
computations

7
Comments about pipelining

Pipelining increases throughput, but does not reduce latency
Answer available every 200 ps, BUT
a single computation still takes 1 ns
Limitations
Computations must be divisible into stages of
equal sizes
Pipeline registers add overhead

8
Another Example
Unpipelined System
Delay = 33 ns, Throughput = 30 MHz
[Figure: timing diagram with Op1, Op2, Op3, ... executed one after another along the time axis]

One operation must complete before next can begin
Operations spaced 33ns apart

9
3 Stage Pipelining
Delay = 39 ns, Throughput = 77 MHz
[Figure: timing diagram with Op1 to Op4 overlapping along the time axis]

Space operations 13 ns apart
3 operations occur simultaneously
10
Limitation Nonuniform Pipelining
Delay = 18 × 3 = 54 ns, Throughput = 55 MHz
[Figure: pipeline with nonuniform stage delays and a shared clock]

Throughput limited by slowest stage
Delay determined by clock period × number of
stages
Must attempt to balance stages
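
The relation between stage delays, clock period, delay and throughput on these slides can be sketched as follows. The individual nonuniform stage delays (8, 18 and 13 ns) are assumed values, chosen only so that the slowest stage is 18 ns; the slides give just the totals. The function name `pipeline_timing` is hypothetical.

```python
def pipeline_timing(stage_delays_ns):
    """Return (clock_period_ns, delay_ns, throughput_mhz) for a pipeline.

    Each entry is the delay of one stage, including its pipeline register.
    """
    clock_period = max(stage_delays_ns)          # slowest stage sets the clock
    delay = clock_period * len(stage_delays_ns)  # every stage takes one full cycle
    throughput_mhz = 1000.0 / clock_period       # 1 / (1 ns) = 1000 MHz
    return clock_period, delay, throughput_mhz


if __name__ == "__main__":
    configs = {
        "unpipelined": [33],
        "3 balanced stages": [13, 13, 13],
        "3 nonuniform stages (assumed split)": [8, 18, 13],
    }
    for name, stages in configs.items():
        period, delay, mhz = pipeline_timing(stages)
        print(f"{name}: clock {period} ns, delay {delay} ns, "
              f"throughput {mhz:.1f} MHz")
```

The nonuniform case runs at the 18 ns clock of its slowest stage, so both delay and throughput are worse than with three balanced 13 ns stages.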

11
Limitation Deep Pipelines
Delay = 48 ns, Throughput = 128 MHz

Diminishing returns as we add more pipeline stages
Register delays become limiting factor
Increased latency
Small throughput gains
More hazards

12
MIPS Pipelining
13
MIPS 5-stage pipeline

The MIPS processor needs 5 stages to execute
instructions
Pipelining stages
IF - Instruction Fetch
ID - Instruction Decode
EX - Execute / Address Calculation
MEM - Memory Access (read / write)
WB - Write Back (results into register file)
Not all instructions need all the stages (e.g.,
add instruction does not need the MEM stage)

14
Basic MIPS Pipelined Processor
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
15
Pipelined Example - Executing Multiple
Instructions

Consider the following instruction sequence
lw r0, 10(r1)
sw r3, 20(r4)
add r5, r6, r7
sub r8, r9, r10

16
Executing Multiple Instructions: Clock Cycle 1
LW
17
Executing Multiple Instructions: Clock Cycle 2
LW
SW
18
Executing Multiple Instructions: Clock Cycle 3
LW
SW
ADD
19
Executing Multiple Instructions: Clock Cycle 4
LW
SW
ADD
SUB
20
Executing Multiple Instructions: Clock Cycle 5
LW
SW
ADD
SUB
21
Executing Multiple Instructions: Clock Cycle 6
SW
ADD
SUB
22
Executing Multiple Instructions: Clock Cycle 7
ADD
SUB
23
Executing Multiple Instructions: Clock Cycle 8
SUB
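
The cycle-by-cycle walk-through above can be reproduced with a small simulation. This is a minimal sketch that assumes one instruction enters the pipeline per cycle and nothing stalls; the function name `pipeline_schedule` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
INSTRUCTIONS = ["LW", "SW", "ADD", "SUB"]


def pipeline_schedule(instructions, stages):
    """Map each clock cycle (1-based) to a {stage: instruction} dictionary.

    Assumption: one instruction is fetched per cycle and there are no stalls.
    """
    total_cycles = len(instructions) + len(stages) - 1
    schedule = {}
    for cycle in range(1, total_cycles + 1):
        occupancy = {}
        for index, instruction in enumerate(instructions):
            stage_index = cycle - 1 - index   # instruction i enters IF in cycle i + 1
            if 0 <= stage_index < len(stages):
                occupancy[stages[stage_index]] = instruction
        schedule[cycle] = occupancy
    return schedule


if __name__ == "__main__":
    for cycle, occupancy in pipeline_schedule(INSTRUCTIONS, STAGES).items():
        row = ", ".join(f"{stage}={instr}" for stage, instr in occupancy.items())
        print(f"cycle {cycle}: {row}")
```

Four instructions on five stages finish in 4 + 5 - 1 = 8 cycles, and the pipeline holds all four instructions only in cycles 4 and 5, matching the slides.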
24
Alternative View - Multicycle Diagram
25
Processor Pipelining

There are two ways that pipelining can help
Reduce the clock cycle time, and keep the same
CPI
Reduce the CPI, and keep the same clock cycle
time
CPU time = Instruction count × CPI × Clock cycle time
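
As a worked example of the CPU time equation, the sketch below compares the CPI and clock combinations shown on the following slides. The instruction count and the base clock rate X are made-up values, and `cpu_time` is a hypothetical helper name.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time (1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    n = 1_000_000   # assumed instruction count
    x = 100e6       # assumed base clock rate X = 100 MHz

    cpi1_at_x = cpu_time(n, cpi=1, clock_rate_hz=x)        # CPI = 1, clock = X
    cpi5_at_5x = cpu_time(n, cpi=5, clock_rate_hz=5 * x)   # CPI = 5, clock = 5X
    cpi1_at_5x = cpu_time(n, cpi=1, clock_rate_hz=5 * x)   # CPI = 1, clock = 5X

    print(f"CPI 1 at X:  {cpi1_at_x * 1e3:.1f} ms")
    print(f"CPI 5 at 5X: {cpi5_at_5x * 1e3:.1f} ms")
    print(f"CPI 1 at 5X: {cpi1_at_5x * 1e3:.1f} ms")
```

Under these assumptions the first two designs take the same time, and the design with CPI = 1 at five times the clock rate is five times faster than either, whichever of the two it is compared against.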

26
Reduce the clock cycle time, and keep the same CPI
CPI = 1, Clock = X Hz
27
Reduce the clock cycle time, and keep the same CPI
CPI = 1, Clock = X × 5 Hz
[Figure: datapath with PC, instruction memory, register file, ALU and data memory]
28
Reduce the CPI, and keep the same cycle time
CPI = 5, Clock = X × 5 Hz
29
Reduce the CPI, and keep the same cycle time
CPI = 1, Clock = X × 5 Hz
30
Pipeline performance

Ideally we get a speedup (by reducing clock cycle
or reducing the CPI) equal to the number of
stages.
In practice, we do not achieve that but we get
close
Pipelining has additional overhead (e.g.,
pipeline registers)
Pipeline hazards
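
Why the ideal speedup equals the number of stages can be seen from a cycle count. The sketch below assumes the same clock cycle time in both designs, no hazards and no register overhead; it compares against a design that spends one cycle per stage on every instruction. The function names are hypothetical.

```python
def pipelined_cycles(instructions, stages):
    """Cycles to run `instructions` on a `stages`-deep pipeline with no stalls."""
    return stages + instructions - 1   # fill the pipeline once, then one per cycle


def ideal_speedup(instructions, stages):
    """Speedup over a design that spends `stages` cycles on every instruction."""
    unpipelined_cycles = instructions * stages
    return unpipelined_cycles / pipelined_cycles(instructions, stages)


if __name__ == "__main__":
    for count in (4, 100, 1_000_000):
        print(f"{count} instructions on 5 stages: "
              f"speedup {ideal_speedup(count, 5):.3f}")
```

For the four-instruction example the speedup is only 20/8 = 2.5; it approaches 5 as the instruction count grows, because the cycles spent filling the pipeline become negligible.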

31
Pipeline Hazards

Hazards are situations in pipelining which
prevent the next instruction in the instruction
stream from executing during the designated clock
cycle.
Hazards reduce the ideal speedup gained from
pipelining (e.g., CPI = 1) and are classified into
three classes:
Structural hazards
Data hazards
Control hazards
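
One common way to quantify the effect of hazards is to add the average number of stall cycles per instruction to the ideal CPI. The sketch below uses that textbook simplification, which is not taken from the slides; the stall rates are made-up values and the function names are hypothetical.

```python
def effective_cpi(ideal_cpi, stall_cycles_per_instruction):
    """CPI once stalls caused by hazards are included."""
    return ideal_cpi + stall_cycles_per_instruction


def speedup_with_stalls(pipeline_depth, stall_cycles_per_instruction):
    """Speedup over an unpipelined design with CPI = pipeline_depth.

    Assumption: both designs run at the same clock cycle time.
    """
    return pipeline_depth / effective_cpi(1.0, stall_cycles_per_instruction)


if __name__ == "__main__":
    for stalls in (0.0, 0.25, 1.0):   # assumed average stalls per instruction
        print(f"{stalls} stalls/instruction: "
              f"speedup {speedup_with_stalls(5, stalls):.2f}")
```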