Recap (Pipelining)

Slides: 32
Provided by: Ham71
1
Recap (Pipelining)
2
What is Pipelining?
  • A way of speeding up execution of tasks
  • Key idea
  • overlap execution of multiple tasks

3
Automobile Manufacturing
1. Build frame. 60 min.
2. Add engine. 50 min.
3. Build body. 80 min.
4. Paint. 40 min.
5. Finish. 45 min.
Total: 275 min.
Latency: Time from start to finish for one car.
275 minutes per car.
(smaller is better)
Throughput: Number of finished cars per time unit.
1 car/275 min = 0.218 cars/hour
(larger is better)
Issues: How can we make the process better by
adding?
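The latency and throughput figures above can be checked with a few lines of arithmetic. This is a minimal sketch; the stage times come from the slide, while the variable names are only illustrative.

```python
# Stage times in minutes, as listed on the slide.
stage_minutes = {
    "Build frame": 60,
    "Add engine": 50,
    "Build body": 80,
    "Paint": 40,
    "Finish": 45,
}

# Without an assembly line, one car is built start to finish before the next begins.
latency_min = sum(stage_minutes.values())
throughput_per_hour = 60 / latency_min

print(f"Latency: {latency_min} min per car")
print(f"Throughput: {throughput_per_hour:.3f} cars/hour")
```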
4
An Assembly line
[Figure: assembly line; stages take 60, 50, 80, 40, and 45 min, each within an 80 min slot]
Last two stages only receive one car/80 min to
work on.
Latency: 400 min/car
Throughput: 4 cars/640 min (1 car/160 min)
Will approach 1 car/80 min as time goes on
First two stages can't produce faster than one
car/80 min or a backlog will occur at third stage.
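The assembly-line numbers follow from pacing every stage at the speed of the slowest one. The sketch below assumes, as the slide does, that each car spends one 80 min slot in each of the five stages; the function name `finish_time` is hypothetical.

```python
stage_minutes = [60, 50, 80, 40, 45]

# Every stage advances at the pace of the slowest stage.
slot = max(stage_minutes)
latency = slot * len(stage_minutes)


def finish_time(n_cars):
    """Minutes until the n-th car leaves the line (n_cars >= 1)."""
    return latency + (n_cars - 1) * slot


for n in (1, 4, 100):
    total = finish_time(n)
    print(f"{n} cars: {total} min total, {total / n:.1f} min per car")
```

With more cars the average time per car moves toward the 80 min slot, which is the limit the slide refers to.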
5
Pipelining a Digital System
  • Key idea: break a big computation up into pieces
  • Separate each piece with a pipeline register

6
Pipelining a Digital System
  • Why do this? Because it's faster for repeated
    computations

7
Comments about pipelining
  • Pipelining increases throughput, but does not reduce latency
  • Answer available every 200ps, BUT
  • A single computation still takes 1ns
  • Limitations
  • Computations must be divisible into stages of
    equal sizes
  • Pipeline registers add overhead

8
Another Example
Unpipelined System
Delay = 33 ns, Throughput = 30 MHz
[Timing diagram: Op1, Op2, Op3, ... executed one after another along the time axis]
  • One operation must complete before next can begin
  • Operations spaced 33ns apart

9
3 Stage Pipelining
Delay = 39 ns, Throughput = 77 MHz
[Timing diagram: Op1 through Op4 overlapping along the time axis]
  • Space operations 13 ns apart
  • 3 operations occur simultaneously
10
Limitation Nonuniform Pipelining
Delay = 18 × 3 = 54 ns, Throughput = 55 MHz
  • Throughput limited by slowest stage
  • Delay determined by clock period × number of
    stages
  • Must attempt to balance stages
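The effect of unbalanced stages can be sketched numerically. The per-stage delays below are hypothetical; they are chosen only so that the slowest stage takes 18 ns, as on the slide.

```python
# Hypothetical per-stage delays in ns (assumption: slowest stage is 18 ns).
stage_delays_ns = [8, 18, 13]

# The clock must be slow enough for the slowest stage.
clock_ns = max(stage_delays_ns)
delay_ns = clock_ns * len(stage_delays_ns)
throughput_mhz = 1000 / clock_ns

print(f"Clock period: {clock_ns} ns")
print(f"Delay: {delay_ns} ns")
print(f"Throughput: {throughput_mhz:.1f} MHz")

# For comparison: the same total work split evenly across the stages.
balanced_clock_ns = sum(stage_delays_ns) / len(stage_delays_ns)
print(f"Balanced clock period: {balanced_clock_ns:.0f} ns")
```

With these assumed delays, balancing the stages would bring the clock period down to 13 ns, the value used in the 3-stage example.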

11
Limitation Deep Pipelines
Delay = 48 ns, Throughput = 128 MHz
  • Diminishing returns as more pipeline stages are added
  • Register delays become limiting factor
  • Increased latency
  • Small throughput gains
  • More hazards
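The diminishing returns can be illustrated with a simple model. It assumes 30 ns of total logic delay and 3 ns of register overhead per stage; these two values are assumptions, picked because they reproduce the 33 ns, 39 ns, and 48 ns delays quoted on the slides. Under this model a 6-stage pipeline reaches 125 MHz, close to but not exactly the 128 MHz on the slide.

```python
LOGIC_NS = 30.0     # assumed total combinational logic delay
REGISTER_NS = 3.0   # assumed overhead of one pipeline register


def pipeline(stages):
    """Return (total delay in ns, throughput in MHz) for an evenly split pipeline."""
    clock_ns = LOGIC_NS / stages + REGISTER_NS
    return clock_ns * stages, 1000.0 / clock_ns


for stages in (1, 3, 6, 12):
    delay_ns, mhz = pipeline(stages)
    print(f"{stages:2d} stages: delay {delay_ns:.0f} ns, throughput {mhz:.1f} MHz")
```

Each doubling of the stage count adds latency while the throughput gain shrinks, because the fixed register delay takes up a growing share of the clock period.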

12
MIPS Pipelining
13
MIPS 5-stage pipeline
  • The MIPS processor needs 5 stages to execute
    instructions
  • Pipelining stages
  • IF - Instruction Fetch
  • ID - Instruction Decode
  • EX - Execute / Address Calculation
  • MEM - Memory Access (read / write)
  • WB - Write Back (results into register file)
  • Not all instructions need all the stages (e.g.,
    add instruction does not need the MEM stage)

14
Basic MIPS Pipelined Processor
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]
15
Pipelined Example - Executing Multiple Instructions
  • Consider the following instruction sequence
  • lw r0, 10(r1)
  • sw r3, 20(r4)
  • add r5, r6, r7
  • sub r8, r9, r10

16
Executing Multiple Instructions: Clock Cycle 1
LW
17
Executing Multiple Instructions: Clock Cycle 2
LW
SW
18
Executing Multiple Instructions: Clock Cycle 3
LW
SW
ADD
19
Executing Multiple Instructions: Clock Cycle 4
LW
SW
ADD
SUB
20
Executing Multiple Instructions: Clock Cycle 5
LW
SW
ADD
SUB
21
Executing Multiple Instructions: Clock Cycle 6
SW
ADD
SUB
22
Executing Multiple Instructions: Clock Cycle 7
ADD
SUB
23
Executing Multiple Instructions: Clock Cycle 8
SUB
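The cycle-by-cycle slides above can be reproduced with a short simulation. This is a sketch under two assumptions: every instruction passes through all five stages, and there are no stalls or hazards. The helper name `occupancy` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
program = ["LW", "SW", "ADD", "SUB"]


def occupancy(cycle):
    """Map each in-flight instruction to its stage for a 1-based clock cycle."""
    in_flight = {}
    for i, instr in enumerate(program):
        # Instruction i enters IF in cycle i + 1 and advances one stage per cycle.
        stage_index = cycle - 1 - i
        if 0 <= stage_index < len(STAGES):
            in_flight[instr] = STAGES[stage_index]
    return in_flight


total_cycles = len(STAGES) + len(program) - 1
for cycle in range(1, total_cycles + 1):
    row = ", ".join(f"{instr} in {stage}" for instr, stage in occupancy(cycle).items())
    print(f"Cycle {cycle}: {row}")
```

The four instructions finish in 8 cycles, compared with 20 cycles if each one had to complete all five stages before the next could start.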
24
Alternative View - Multicycle Diagram
25
Processor Pipelining
  • There are two ways that pipelining can help
  • Reduce the clock cycle time, and keep the same
    CPI
  • Reduce the CPI, and keep the same clock cycle
    time
  • CPU time = Instruction count × CPI × Clock cycle
    time
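The CPU time relation can be applied to the CPI and clock combinations that appear on the slides that follow (CPI 1 at X Hz, CPI 5 at 5X Hz, CPI 1 at 5X Hz). In this sketch the instruction count and the value of X are hypothetical, and the design labels are an interpretation rather than names taken from the slides.

```python
def cpu_time_s(instruction_count, cpi, clock_hz):
    """CPU time = instruction count x CPI x clock cycle time (1 / clock rate)."""
    return instruction_count * cpi / clock_hz


X_HZ = 100e6                  # hypothetical base clock rate
INSTRUCTIONS = 1_000_000      # hypothetical instruction count

designs = {
    "CPI 1, clock X": cpu_time_s(INSTRUCTIONS, 1, X_HZ),
    "CPI 5, clock 5X": cpu_time_s(INSTRUCTIONS, 5, 5 * X_HZ),
    "CPI 1, clock 5X": cpu_time_s(INSTRUCTIONS, 1, 5 * X_HZ),
}

for name, seconds in designs.items():
    print(f"{name}: {seconds * 1e3:.1f} ms")
```

Either starting point leads to the same result: with CPI 1 at the faster clock, the CPU time drops by a factor equal to the number of stages.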

26
Reduce the clock cycle time, and keep the same CPI
CPI = 1, Clock = X Hz
27
Reduce the clock cycle time, and keep the same CPI
CPI = 1, Clock = X × 5 Hz
[Figure: MIPS datapath with PC, instruction memory, register file, ALU, and data memory]
28
Reduce the CPI, and keep the same cycle time
CPI = 5, Clock = X × 5 Hz
29
Reduce the CPI, and keep the same cycle time
CPI = 1, Clock = X × 5 Hz
30
Pipeline performance
  • Ideally we get a speedup (by reducing clock cycle
    or reducing the CPI) equal to the number of
    stages.
  • In practice, we do not achieve that but we get
    close
  • Pipelining has additional overhead (e.g.,
    pipeline registers)
  • Pipeline hazards

31
Pipeline Hazards
  • Hazards are situations in pipelining which
    prevent the next instruction in the instruction
    stream from executing during the designated clock
    cycle.
  • Hazards reduce the ideal speedup gained from
    pipelining (e.g., CPI = 1) and are classified into
    three classes:
  • Structural hazards
  • Data hazards
  • Control hazards
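One common way to quantify the cost of hazards is to treat each stall as extra cycles per instruction. The sketch below assumes an ideal CPI of 1 and ignores clock overhead; the stall rates are hypothetical and the function name is illustrative.

```python
def pipeline_speedup(depth, stall_cycles_per_instr):
    """Speedup over an unpipelined design, assuming ideal CPI = 1 and no clock overhead."""
    effective_cpi = 1.0 + stall_cycles_per_instr
    return depth / effective_cpi


# Hypothetical average stall cycles per instruction for a 5-stage pipeline.
for stalls in (0.0, 0.25, 0.5, 1.0):
    print(f"{stalls:.2f} stalls/instr: speedup {pipeline_speedup(5, stalls):.2f}")
```

With no stalls the speedup equals the pipeline depth; one stall cycle per instruction on average halves it.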