Pipelining Datapath - PowerPoint PPT Presentation

Provided by: sid50
1
Pipelining Datapath
  • Adapted from the lecture notes of Dr. John
    Kubiatowicz (UC Berkeley)
  • and Hank Walker (TAMU)

2
Pipelining is Natural!
  • Laundry Example
  • Ann, Brian, Cathy, Dave each have one load of
    clothes to wash, dry, and fold
  • Washer takes 30 minutes
  • Dryer takes 40 minutes
  • Folder takes 20 minutes

3
Sequential Laundry
[Figure: timeline from 6 PM to midnight (axes: Time, Task Order); the four loads run one after another, each taking 30 + 40 + 20 minutes]
  • Sequential laundry takes 6 hours for 4 loads

4
Pipelined Laundry: Start Work ASAP
[Figure: timeline from 6 PM to midnight (axes: Time, Task Order); the four loads overlap]
  • Pipelined laundry takes 3.5 hours for 4 loads
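
The two totals (6 hours and 3.5 hours) can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and it assumes a load can wait between machines for as long as needed.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Simulate a pipeline in which each stage handles one load at a time.

    finish[s] is the time at which stage s finished its most recent load.
    """
    finish = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time at which this load leaves the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, finish[s])  # wait for the load and for the machine
            finish[s] = start + minutes
            ready = finish[s]
    return finish[-1]


stages = [30, 40, 20]  # washer, dryer, folder (minutes)
print(sequential_minutes(stages, 4) / 60)  # 6.0 hours
print(pipelined_minutes(stages, 4) / 60)   # 3.5 hours
```

Changing `stages` (for example, making the folder take 40 minutes) shows how the slowest stage sets the pace of the whole pipeline.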

5
Pipelining Lessons
  • Latency vs. Throughput
  • Question
  • What is the latency in both cases?
  • What is the throughput in both cases?

Pipelining doesn't help the latency of a single task;
it helps the throughput of the entire workload
6
Pipelining Lessons contd
  • Question
  • What is the fastest operation in the example?
  • What is the slowest operation in the example?

Pipeline rate is limited by the slowest pipeline stage
7
Pipelining Lessons contd
Multiple tasks operating simultaneously using
different resources
8
Pipelining Lessons contd
  • Question
  • Would the speedup increase if we had more steps?

Potential speedup = number of pipe stages
9
Pipelining Lessons contd
  • Washer takes 30 minutes
  • Dryer takes 40 minutes
  • Folder takes 20 minutes
  • Question
  • Would it matter if the Folder also took 40 minutes?

Unbalanced lengths of pipe stages reduce speedup
10
Pipelining Lessons contd
Time to fill the pipeline and time to drain it
reduce speedup
11
Five Stages of an Instruction
[Figure: a Load instruction occupying Cycles 1-5, one stage per cycle]
  • Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory
  • Reg/Dec: Register Fetch and Instruction Decode
  • Exec: Calculate the memory address
  • Mem: Read the data from the Data Memory
  • Wr: Write the data back to the register file

12
Conventional Pipelined Execution Representation
[Figure: overlapped instructions; axes: Time, Program Flow]
13
Example
14
Example contd
  • Time (pipelined) = Time (non-pipelined) / number of pipe stages
  • Assumptions
  • Stages are perfectly balanced
  • Ideal conditions
  • Ideally, time per instruction = 8 ns / 5 = 1.6 ns
  • Most cases are not ideal!


15
Example contd
  • Speedup in this case = 24 / 14 ≈ 1.7
  • Let's add 1000 more instructions
  • Time (non-pipelined) = 1000 x 8 + 24 ns = 8024 ns
  • Time (pipelined) = 1000 x 2 + 14 ns = 2014 ns
  • Speedup = 8024 / 2014 ≈ 3.98 ≈ 4 = 8/2

Instruction throughput (as opposed to the latency of an
individual instruction) is the important metric, because
real programs execute billions of instructions.
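
The arithmetic on this slide can be reproduced for any instruction count. The sketch below is illustrative only: the function names are hypothetical, and the defaults (8 ns per non-pipelined instruction, 2 ns per pipeline stage, 5 stages) are the figures the slides appear to use.

```python
def non_pipelined_ns(n_instructions, instr_ns=8):
    """Instructions run back to back, each taking the full instr_ns."""
    return n_instructions * instr_ns


def pipelined_ns(n_instructions, stage_ns=2, stages=5):
    """The first instruction needs all stages; each later one adds one stage time."""
    return (stages + n_instructions - 1) * stage_ns


def speedup(n_instructions):
    return non_pipelined_ns(n_instructions) / pipelined_ns(n_instructions)


for n in (3, 1003):
    print(n, non_pipelined_ns(n), pipelined_ns(n), round(speedup(n), 2))
```

For 3 instructions this gives 24 ns against 14 ns; for 1003 instructions it gives 8024 ns against 2014 ns, so the speedup approaches 8/2 = 4 as fill and drain time become negligible.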
16
Pipeline Hazards
  • Structural Hazard

Program Flow
17
Pipeline Hazard contd
  • Control Hazard
  • Example
  • add $4, $5, $6
  • beq $1, $2, 40
  • lw $3, 300($0)

18
Pipeline Hazard contd
  • Data Hazards
  • Example
  • add $s0, $t0, $t1
  • sub $t2, $s0, $t3

19
Summary Pipelining Lessons
  • Pipelining doesn't help the latency of a single task;
    it helps the throughput of the entire workload
  • Pipeline rate is limited by the slowest pipeline stage
  • Multiple tasks operate simultaneously using
    different resources
  • Potential speedup = number of pipe stages
  • Unbalanced lengths of pipe stages reduce speedup
  • Time to fill the pipeline and time to drain it
    reduce speedup
  • Stall for dependences

[Figure: pipelined laundry timeline; axes: Time (6 PM onward), Task Order]
20
Summary of Pipeline Hazards
  • Structural Hazard: hardware design
  • Control Hazard: decision based on results
  • Data Hazard: data dependency

21
Control Signals for existing Datapath
The Right to Left Control can lead to hazards
22
Place registers between each step

23
Example
10   lw   r1, r2(35)
14   addI r2, r2, 3
20   sub  r3, r4, r5
24   beq  r6, r7, 100
30   ori  r8, r9, 17
34   add  r10, r11, r12
100  and  r13, r14, 15
24
Start: Fetch 10
[Figure: pipelined datapath (Inst. Mem, IR, Decode, Reg. File, Exec, Mem Access, Data Mem, with Mem Ctrl and WB Ctrl); every stage is empty while the instruction at 10 is fetched]
25
Fetch 14, Decode 10
[Figure: pipelined datapath; IR holds lw r1, r2(35); the later stages are still empty]
26
Fetch 20, Decode 14, Exec 10
[Figure: pipelined datapath; IR holds addI r2, r2, 3; lw r1 is in Exec with operands r2 and 35]
27
Fetch 24, Decode 20, Exec 14, Mem 10
[Figure: pipelined datapath; IR holds sub r3, r4, r5; addI r2, r2, 3 is in Exec; lw r1 is in Mem with address r2+35]
28
Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
[Figure: pipelined datapath; IR holds beq r6, r7, 100; sub r3 is in Exec; addI r2 is in Mem; lw r1 is in WB; values shown: r4, r2+3, M[r2+35]]
29
Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
[Figure: pipelined datapath; IR holds ori r8, r9, 17; beq is in Exec; sub r3 is in Mem; addI r2 is in WB; values shown: r6, r4-r5, r2+3, r1 = M[r2+35]]
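
The stage occupancy in the preceding slides follows a simple rule: the instruction fetched in cycle c is decoded in cycle c+1, executed in c+2, and so on. A minimal sketch of that rule is below; the names are hypothetical, and the fetch order is copied from the slide titles, where the instruction at 30 is still fetched after the beq and the following fetch is the branch target 100.

```python
STAGES = ["Fetch", "Decode", "Exec", "Mem", "WB"]

# Addresses in the order the slides show them being fetched.
FETCH_ORDER = [10, 14, 20, 24, 30, 100]


def occupancy(cycle, fetch_order=FETCH_ORDER):
    """Return {stage: address} for a 1-based cycle of an ideal 5-stage pipeline."""
    state = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - 1 - depth  # instruction that is `depth` stages past Fetch
        if 0 <= index < len(fetch_order):
            state[stage] = fetch_order[index]
    return state


for cycle in range(1, 7):
    print(cycle, occupancy(cycle))
```

Cycle 5 reproduces "Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10" and cycle 6 reproduces "Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14".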
30
Pipelining Load Instruction
[Figure: successive lw instructions (including the 2nd lw and 3rd lw) overlapped across Cycles 1-7 of the Clock]
  • The five independent functional units in the
    pipeline datapath are
  • Instruction Memory for the Ifetch stage
  • Register File's read ports (bus A and bus B) for
    the Reg/Dec stage
  • ALU for the Exec stage
  • Data Memory for the Mem stage
  • Register File's write port (bus W) for the Wr
    stage

31
Pipelining the R Instruction
[Figure: an R-type instruction occupying Cycles 1-4, one stage per cycle]
  • Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory
  • Reg/Dec: Register Fetch and Instruction Decode
  • Exec
  • ALU operates on the two register operands
  • Update PC
  • Wr: Write the ALU output back to the register file

32
Pipelining Both Load and R-type
[Figure: R-type, R-type, Load, R-type, R-type overlapped across Cycles 1-9. Oops! We have a problem!]
  • We have a pipeline conflict, or structural hazard
  • Two instructions try to write to the register
    file at the same time!
  • Only one write port

33
Important Observations
  • Each functional unit can only be used once per
    instruction
  • Each functional unit must be used at the same
    stage for all instructions
  • Load uses the Register File's write port during its
    5th stage
  • R-type uses the Register File's write port during its
    4th stage

34
Solution
  • Delay the R-type's register write by one cycle
  • Now R-type instructions also use the Reg File's write
    port at Stage 5
  • Mem stage is a NOOP stage: nothing is being done.

[Figure: R-type now has stages 1-5, with Exec as stage 3 and Mem as stage 4; R-type, R-type, Load, R-type, R-type overlapped across Cycles 1-9]
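
The write-port conflict and its fix can be checked mechanically. The sketch below is an illustration under stated assumptions, not the slides' own method: the function name is hypothetical, and one instruction is assumed to issue per cycle starting at cycle 1.

```python
from collections import defaultdict


def write_port_conflicts(program, write_stage):
    """Find cycles in which more than one instruction writes the register file.

    program:     list of instruction kinds, issued one per cycle from cycle 1.
    write_stage: maps kind -> 1-based stage in which it uses the write port.
    Returns {cycle: [instruction indices]} for the conflicting cycles.
    """
    writers = defaultdict(list)
    for i, kind in enumerate(program):
        issue_cycle = i + 1
        writers[issue_cycle + write_stage[kind] - 1].append(i)
    return {c: idx for c, idx in writers.items() if len(idx) > 1}


program = ["R-type", "R-type", "Load", "R-type", "R-type"]
print(write_port_conflicts(program, {"R-type": 4, "Load": 5}))  # {7: [2, 3]}
print(write_port_conflicts(program, {"R-type": 5, "Load": 5}))  # {}
```

With a 4-stage R-type, the Load and the R-type issued right after it both reach the write port in cycle 7; once R-type also writes in stage 5, no cycle has two writers.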
35
Datapath (Without Pipeline)
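[Figure: non-pipelined datapath (PC, Next PC, Inst. Mem, IR, Reg. File, Exec, Equal, Mem Access, Data Mem) annotated with the register transfers below]
Inst. Mem:   IR <- Mem[PC]; PC <- PC + 4
Reg. File:   A <- R[rs]; B <- R[rt]
Exec:        S <- A + B;  S <- A + SX;  S <- A or ZX;  S <- A + SX;  if Cond: PC <- PC + SX
Mem Access:  M <- Mem[S];  Mem[S] <- B
Reg. File:   R[rd] <- S;  R[rd] <- M;  R[rt] <- S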
36
Datapath (With Pipeline)
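[Figure: pipelined datapath (PC, Next PC, Inst. Mem, IR, Reg. File, Exec, Equal, S, Mem Access, Data Mem) annotated with the register transfers below]
Inst. Mem:   IR <- Mem[PC]; PC <- PC + 4
Reg. File:   A <- R[rs]; B <- R[rt]
Exec:        S <- A + B;  S <- A + SX;  S <- A or ZX;  S <- A + SX;  if Cond: PC <- PC + SX
Mem Access:  M <- Mem[S];  Mem[S] <- B;  M <- S;  M <- S
Reg. File:   R[rd] <- M;  R[rd] <- M;  R[rt] <- M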
37
Structural Hazard and Solution
[Figure: Load followed by Instr 1 to Instr 4 (axes: Time in clock cycles, Instr. Order), showing the Mem and Reg accesses of each instruction]
38
Control Hazard - 1: Stall
  • Stall: wait until the decision is clear
  • Impact: 2 lost cycles (i.e. 3 clock cycles per
    branch instruction) => slow

39
Control Hazard - 2: Predict
  • Predict: guess one direction, then back up if
    wrong
  • Impact: 0 lost cycles per branch instruction if
    right, 1 if wrong (right about 50% of the time)
  • More dynamic scheme: history of 1 branch
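
The stall and predict schemes can be compared as an expected cost per branch. This is a rough sketch with a hypothetical helper; it assumes a branch costs one cycle plus its lost cycles, which matches the "2 lost cycles, i.e. 3 clock cycles per branch" figure given for stalling.

```python
def cycles_per_branch(lost_if_right, lost_if_wrong, p_right):
    """Expected clock cycles per branch: 1 for the branch plus expected lost cycles."""
    expected_lost = p_right * lost_if_right + (1 - p_right) * lost_if_wrong
    return 1 + expected_lost


# Stall: always lose 2 cycles, so the prediction accuracy is irrelevant.
print(cycles_per_branch(2, 2, p_right=0.5))  # 3.0
# Predict: lose 0 cycles when right, 1 when wrong, right half of the time.
print(cycles_per_branch(0, 1, p_right=0.5))  # 1.5
```

Raising `p_right` models a better predictor, such as one that uses branch history.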

40
Control Hazard - 3: Delayed Branch
  • Delayed Branch: redefine branch behavior (takes
    place after next instruction)
  • Impact: 0 clock cycles per branch instruction if
    an instruction can be found to put in the slot
    (about 50% of the time)

41
Data Hazards (RAW)
  • Dependencies backwards in time are hazards

[Figure: pipeline diagram (axes: Time in clock cycles, Instr. Order; stages IF, ID/RF, EX, MEM, WB) for the instructions below]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or  r8,r1,r9
xor r10,r1,r11
42
Data Hazards contd
  • Forward result from one stage to another

[Figure: pipeline diagram (axes: Time in clock cycles, Instr. Order; stages IF, ID/RF, EX, MEM, WB) for the instructions below, with results forwarded between stages]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or  r8,r1,r9
xor r10,r1,r11
43
Data Hazards contd
  • Dependencies backwards in time are hazards
  • Can't solve with forwarding
  • Must delay/stall instruction dependent on loads

[Figure: pipeline diagram (Time in clock cycles; stages IF, ID/RF, EX, MEM, WB) for lw r1,0(r2) followed by sub r4,r1,r3, with a Stall before the sub]
44
Hazard Detection
[Figure: pairs of overlapping instruction timelines (stages such as IFetch, DCD, OpFetch, Exec, Mem, WB), one pair per hazard type]
  • Structural Hazard
  • Control Hazard
  • RAW (read after write) Data Hazard
  • WAW (write after write) Data Hazard
  • WAR (write after read) Data Hazard
45
Hazard Detection
  • Suppose instruction i is about to be issued and
    a predecessor instruction j is in the
    instruction pipeline.
  • A RAW hazard exists on register ρ if ρ ∈ Rregs(i) ∩ Wregs(j)
  • A WAW hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Wregs(j)
  • A WAR hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Rregs(j)
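
Each of the three conditions is an intersection between the registers one instruction reads or writes and those the other reads or writes. A minimal sketch follows; the function name and the register-set encoding are assumptions for illustration, and the example pair is the add/sub sequence from the RAW slide.

```python
def hazards(i_reads, i_writes, j_reads, j_writes):
    """Classify hazards between instruction i (about to issue) and an
    earlier instruction j still in the pipeline, using register sets."""
    return {
        "RAW": set(i_reads) & set(j_writes),   # i reads what j writes
        "WAW": set(i_writes) & set(j_writes),  # both write the same register
        "WAR": set(i_writes) & set(j_reads),   # i writes what j reads
    }


# j: add r1,r2,r3    i: sub r4,r1,r3
result = hazards(i_reads={"r1", "r3"}, i_writes={"r4"},
                 j_reads={"r2", "r3"}, j_writes={"r1"})
print(result)
```

Here only the RAW set is non-empty (it contains r1), which is the dependence that forwarding or a stall has to resolve.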

46
Computing CPI
  • Start with base CPI
  • Add stalls
  • Suppose
  • CPI_base = 1
  • freq_branch = 20%, freq_load = 30%
  • Suppose branches always cause a 1 cycle stall
  • Loads cause a 2 cycle stall
  • Then CPI = 1 + (1 x 0.20) + (2 x 0.30) = 1.8
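
The same calculation generalizes to any mix of stall sources. The helper below is a hypothetical sketch: it takes the base CPI and a list of (frequency, stall cycles) pairs.

```python
def effective_cpi(base_cpi, stall_sources):
    """base_cpi plus the sum of frequency * stall_cycles over all stall sources."""
    return base_cpi + sum(freq * cycles for freq, cycles in stall_sources)


# Branches: 20% of instructions, 1 stall cycle. Loads: 30%, 2 stall cycles.
cpi = effective_cpi(1.0, [(0.20, 1), (0.30, 2)])
print(round(cpi, 2))  # 1.8
```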

47
Summary
  • Control Signals need to be propagated
  • Insert Registers between every stage to
    remember and propagate values
  • Solutions to control hazards are Stall, Predict,
    and Delayed Branch
  • The solution to data hazards is Forwarding
  • Effective CPI = CPI_ideal + CPI_stall