CS 162 Computer Architecture Lecture 3: Pipelining Contd' - PowerPoint PPT Presentation

Slides: 22

Provided by: davep173

Learn more at: http://www.cs.ucr.edu


1
CS 162 Computer Architecture Lecture 3
Pipelining Contd.

Instructor: L.N. Bhuyan
www.cs.ucr.edu/bhuyan/cs162

2
Single Cycle Datapath (From Ch 5)
[Figure: single-cycle datapath with PC, instruction memory (Imem), register file (Regs), sign extend, shift left 2, two adders (PC + 4 and branch target), ALU, data memory (Dmem), and multiplexors; control signals PCSrc, RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg; instruction fields 25-21, 20-16, 15-11, and 15-0.]
3
Required Changes to Datapath

Introduce registers to separate the 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath.
The next PC value is computed in the 3rd stage, but we need to bring in the next instruction in the next cycle.
Move the PCSrc mux to the 1st stage: the PC is incremented unless there is a new branch address.
The branch address is computed in the 3rd stage, but with pipelining the PC value has changed by then! We must carry the PC value along with the instruction. Width of the IF/ID register = IR (32) + PC (32) = 64 bits.

4
Changes to Datapath Contd.

For an lw instruction, we need the write register address in stage 5. But the IR is now occupied by another instruction! So we must carry the destination field of the IR along as the instruction moves through the stages. See the connection in the figure.
Length of the ID/EX register = Reg1 (32) + Reg2 (32) + offset (32) + PC (32) + destination register (5) = 133 bits
Assignment: What are the lengths of the EX/MEM and MEM/WB registers?
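Each pipeline-register width above is just a sum of field widths. A minimal Python sketch, assuming 32-bit data, offset, and PC fields and a 5-bit register number as on these slides. The EX/MEM and MEM/WB field lists are an assumption (a textbook-style MIPS datapath), not something these slides list; they do reproduce the 102-bit and 69-bit labels on the pipelined-datapath figure. Control bits are not counted.

```python
# Bits carried by each pipeline register (control bits excluded).
# IF/ID and ID/EX follow the slides; the EX/MEM and MEM/WB field lists
# are assumptions based on a textbook-style MIPS datapath.
PIPELINE_REGS = {
    "IF/ID": {"IR": 32, "PC": 32},
    "ID/EX": {"Reg1": 32, "Reg2": 32, "offset": 32, "PC": 32, "dest": 5},
    "EX/MEM": {"branch_target": 32, "zero": 1, "alu_result": 32,
               "Reg2": 32,      # store data for sw
               "dest": 5},
    "MEM/WB": {"mem_data": 32, "alu_result": 32, "dest": 5},
}


def width(fields):
    """Total width in bits of one pipeline register."""
    return sum(fields.values())


for name, fields in PIPELINE_REGS.items():
    print(f"{name}: {width(fields)} bits")
```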

5
Pipelined Datapath (with Pipeline Regs) (6.2)
[Figure: the datapath divided into Fetch, Decode, Execute, Memory, and Write Back stages by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, with PC, Imem, Regs, sign extend, shift left 2, adders, ALU, Dmem, and multiplexors; register widths: IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits, MEM/WB 69 bits.]
6
Pipelined Control (6.3)

Start with the single-cycle controller
Group the control lines by the pipeline stage that needs them
Extend the pipeline registers with the control bits

[Figure: the Control unit decodes the instruction into EX, M, and WB groups of control bits. ID/EX carries EX, M, and WB; EX/MEM carries M (Branch, MemRead, MemWrite) and WB; MEM/WB carries WB (MemToReg, RegWrite).]
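The three bullets can be sketched as data: group each control line by the stage that consumes it, then let each pipeline register carry only the groups still needed downstream. The signal names (with the 2-bit ALUOp split into ALUOp1/ALUOp0) follow the textbook's single-cycle MIPS controller and are an assumption here, as is the hypothetical helper `control_fields`.

```python
# Control lines grouped by the stage that uses them (assumed textbook names).
EX_BITS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
M_BITS = ("Branch", "MemRead", "MemWrite")
WB_BITS = ("RegWrite", "MemToReg")


def control_fields(register):
    """Control bits a given pipeline register must carry forward."""
    carried = {
        "ID/EX": EX_BITS + M_BITS + WB_BITS,  # all three groups
        "EX/MEM": M_BITS + WB_BITS,           # EX bits were consumed in EX
        "MEM/WB": WB_BITS,                    # only write-back remains
    }
    return carried[register]


for reg in ("ID/EX", "EX/MEM", "MEM/WB"):
    print(reg, len(control_fields(reg)), control_fields(reg))
```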
7
Pipelined Processor Datapath Control

More work is needed to handle pipeline hazards correctly

[Figure: the pipelined datapath with its control added. The Control unit reads the instruction in ID and produces EX, M, and WB control bits that travel through ID/EX, EX/MEM, and MEM/WB. Signals shown: RegDst, ALUOp, ALUSrc (EX); Branch, MemRead, MemWrite (M); MemToReg, RegWrite (WB); PCSrc selects the next PC; an ALU control unit is driven by ALUOp.]
8
Recap

If we can keep all pipeline stages busy, we can retire (complete) up to one instruction per clock cycle, thereby achieving single-cycle throughput.
The pipeline paradox (for MIPS): any single instruction still takes 5 cycles to execute, even though the pipeline retires one instruction per cycle.
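Cycle counts make the paradox concrete. A sketch assuming an ideal 5-stage pipeline that starts one instruction per cycle and never stalls; `pipeline_cycles` is a hypothetical helper:

```python
def pipeline_cycles(n_instructions, depth=5):
    """Cycles to complete n instructions on an ideal, stall-free pipeline."""
    if n_instructions <= 0:
        return 0
    # The first instruction needs `depth` cycles (its latency); after that,
    # one instruction retires every cycle.
    return depth + n_instructions - 1


print(pipeline_cycles(1))            # latency of one instruction: 5
print(pipeline_cycles(100))          # 104
print(pipeline_cycles(100) / 100)    # average cycles per instruction: 1.04
```

As the instruction count grows, the average approaches 1 cycle per instruction, while each individual instruction still spends 5 cycles in the pipeline.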

9
Problems for Pipelining

Hazards prevent the next instruction from executing during its designated clock cycle, limiting speedup:
Structural hazards: the hardware cannot support this combination of instructions (e.g., a single memory for instructions and data)
Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
Control hazards: conditional branches and other instructions may stall the pipeline, delaying later instructions

10
Single Memory is a Structural Hazard
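[Figure: pipeline diagram over time (clock cycles), in instruction order, for Load, Instr 1, Instr 2, Instr 3, and Instr 4, each using memory (M) and registers (Reg); in the 4th cycle, Load's data access and Instr 3's instruction fetch both need the single memory.]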

Can't read the same memory twice in the same clock cycle
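A sketch that finds these collisions, assuming a 5-stage pipeline that starts one instruction per cycle, one memory port shared by IF and MEM, and that of the five instructions only Load touches data memory. `memory_conflicts` is a hypothetical helper:

```python
# Hypothetical sketch: 5-stage pipeline, one instruction enters IF per cycle,
# and a single memory port shared by instruction fetch (IF) and data access (MEM).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_conflicts(program):
    """Map cycle -> instructions that need the single memory in that cycle,
    keeping only cycles with more than one user."""
    users = {}
    for i, (name, accesses_data) in enumerate(program):
        fetch_cycle = i + 1                       # cycles numbered from 1
        users.setdefault(fetch_cycle, []).append(name)
        if accesses_data:                         # loads and stores use MEM
            mem_cycle = fetch_cycle + STAGES.index("MEM")
            users.setdefault(mem_cycle, []).append(name)
    return {cycle: names for cycle, names in users.items() if len(names) > 1}


program = [("Load", True), ("Instr 1", False), ("Instr 2", False),
           ("Instr 3", False), ("Instr 4", False)]
print(memory_conflicts(program))   # {4: ['Load', 'Instr 3']}
```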

11
Ex: MIPS Multicycle Datapath: Structural Hazard in Memory
[Figure: multicycle datapath in which one Memory (Address, Data) supplies both instructions and data ("Instruction or Data"), feeding the Instruction Register and the Memory Data Register; also shown: PC, Registers, A, B, ALU, ALUOut.]
12
Structural Hazards limit performance

Example: if there are 1.3 memory accesses per instruction (1 instruction fetch + 0.3 data accesses, since 30% of instructions are loads and stores) and only one memory access per cycle, then
Average CPI ≥ 1.3
Otherwise the datapath resource would be more than 100% utilized.

Structural hazard solution: add more hardware
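The bound can be computed directly. A sketch, assuming every instruction is fetched from the same memory that serves data accesses and that the ideal CPI is 1; `min_cpi` is a hypothetical helper:

```python
def min_cpi(data_access_fraction, memory_ports=1, ideal_cpi=1.0):
    """Lower bound on average CPI when instruction fetches and data
    accesses share `memory_ports` memory accesses per cycle."""
    accesses_per_instr = 1.0 + data_access_fraction   # 1 fetch + data accesses
    return max(ideal_cpi, accesses_per_instr / memory_ports)


print(min_cpi(0.30))                  # single port: 1.3
print(min_cpi(0.30, memory_ports=2))  # "add more hardware": 1.0
```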
13
Speed Up Equation for Pipelining

$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}}$$

With Ideal CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}}$$
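The speedup equation translates directly into a function. A sketch; the parameter names are illustrative, and clock cycle times may be in any consistent unit:

```python
def pipeline_speedup(depth, stall_cpi, ideal_cpi=1.0,
                     clock_unpipelined=1.0, clock_pipelined=1.0):
    """Speedup of the pipelined machine over the unpipelined one."""
    cpi_pipelined = ideal_cpi + stall_cpi            # CPI_pipelined
    cpi_ratio = ideal_cpi * depth / cpi_pipelined
    return cpi_ratio * (clock_unpipelined / clock_pipelined)


print(pipeline_speedup(5, 0.0))    # no stalls: speedup equals depth, 5.0
print(pipeline_speedup(5, 0.25))   # 0.25 stall cycles per instruction: 4.0
```

With equal clock cycles and no stalls the speedup equals the pipeline depth; every stall cycle per instruction eats into it.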
14
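Example: Dual-port vs. Single-port

Machine A: dual-ported memory
Machine B: single-ported memory, but its pipelined implementation has a 1.05 times faster clock rate
Ideal CPI = 1 for both
Loads are 40% of instructions executed (on Machine B each load costs 1 stall cycle)

$$\text{SpeedUp}_A = \frac{\text{Pipeline Depth}}{1 + 0} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{pipe}}} = \text{Pipeline Depth}$$

$$\text{SpeedUp}_B = \frac{\text{Pipeline Depth}}{1 + 0.4 \times 1} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{unpipe}} / 1.05} = \frac{\text{Pipeline Depth}}{1.4} \times 1.05 = 0.75 \times \text{Pipeline Depth}$$

$$\frac{\text{SpeedUp}_A}{\text{SpeedUp}_B} = \frac{\text{Pipeline Depth}}{0.75 \times \text{Pipeline Depth}} = 1.33$$

Machine A is 1.33 times faster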
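A numeric check of the example. It assumes, as the slide does, that Machine A's pipelined and unpipelined clocks are equal and that each load costs Machine B exactly one stall cycle; the depth of 5 is an arbitrary choice because it cancels.

```python
depth = 5              # arbitrary: the depth cancels in the A/B ratio
load_fraction = 0.40   # loads are 40% of instructions executed
stall_per_load = 1     # single port: each load's data access costs 1 cycle

speedup_a = depth / (1 + 0) * 1.0                                   # same clock
speedup_b = depth / (1 + load_fraction * stall_per_load) * 1.05     # 1.05x clock

print(f"SpeedUpB = {speedup_b / depth:.2f} x depth")          # 0.75 x depth
print(f"SpeedUpA / SpeedUpB = {speedup_a / speedup_b:.2f}")   # 1.33
```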

15
Data Hazard on Register $1 (6.4)
add $1, $2, $3
sub $4, $1, $3
and $6, $1, $7
or $8, $1, $9
xor $10, $1, $11
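A sketch that lists the hazards in this sequence, assuming the 5-stage pipeline with no forwarding and a register file that writes in the first half of a cycle and reads in the second half, so only the next two instructions after `add` would read a stale `$1`. The tuple encoding and `raw_hazards` are hypothetical:

```python
# Each instruction: (assembly text, destination register, source registers).
PROGRAM = [
    ("add $1,$2,$3", 1, (2, 3)),
    ("sub $4,$1,$3", 4, (1, 3)),
    ("and $6,$1,$7", 6, (1, 7)),
    ("or $8,$1,$9", 8, (1, 9)),
    ("xor $10,$1,$11", 10, (1, 11)),
]


def raw_hazards(program, window=2):
    """(producer, consumer) pairs where the consumer reads a register written
    by one of the `window` instructions just before it."""
    hazards = []
    for i, (text, _dest, sources) in enumerate(program):
        for j in range(max(0, i - window), i):
            producer, dest, _ = program[j]
            if dest != 0 and dest in sources:   # $0 is never a hazard
                hazards.append((producer, text))
    return hazards


for producer, consumer in raw_hazards(PROGRAM):
    print(f"{consumer} reads $1 before {producer} writes it back")
```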
16
Data Hazard Solution

Forward the result from one stage to another
The "or" is OK if the register file is implemented properly (write in the first half of the clock cycle, read in the second half)

[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) over time, in instruction order, for add $1,$2,$3; sub $4,$1,$3; and $6,$1,$7; or $8,$1,$9; xor $10,$1,$11, with the result of add forwarded to the later instructions that need $1 before it is written back.]
17
Hazard Detection for Forwarding

A hazard must be detected just before execution so that, in case of a hazard, the data can be forwarded to the input of the ALU.
It can be detected when a source register (Rs, Rt, or both) of the instruction in the EX stage is equal to the destination register (Rd) of an earlier instruction still in the pipeline (in the MEM or WB stage).
Compare the Rs and Rt values in the ID/EX register with Rd in the EX/MEM and MEM/WB registers
=> We need to carry the Rs, Rt, and Rd values from the IF/ID register to the ID/EX register (only Rd was carried before).
If they match, forward the data to the input of the ALU through the multiplexor.
See Fig. 6.43, p. 488 of the text.
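The comparison above can be sketched as a forwarding-unit function for one ALU operand. The RegWrite and `$0` checks and the priority of EX/MEM over MEM/WB come from the textbook's forwarding conditions rather than from this slide, and `forward_select` is a hypothetical name:

```python
def forward_select(src, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Pick the source for one ALU operand whose register number is `src`
    (Rs or Rt of the instruction in EX)."""
    # EX/MEM holds the most recent result, so it wins if both stages match.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    return "ID/EX"   # no hazard: use the value read from the register file


# sub $4,$1,$3 in EX while add $1,$2,$3 is in MEM:
print(forward_select(1, True, 1, False, 0))   # EX/MEM (Rs = $1)
print(forward_select(3, True, 1, False, 0))   # ID/EX  (Rt = $3)
# and $6,$1,$7 in EX while sub $4 is in MEM and add $1 is in WB:
print(forward_select(1, True, 4, True, 1))    # MEM/WB
```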

18
Forwarding What about Loads?

Dependencies backward in time are hazards
Can't solve them with forwarding alone
Must stall the instruction dependent on the load ("load-use hazard")

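[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for lw $1,0($2) followed by sub $4,$1,$3; the loaded value is not available until the end of lw's MEM stage, but sub needs it at the start of its EX stage in the same cycle.]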
19
Data Hazard Even with Forwarding

Must stall the pipeline for 1 cycle (insert 1 bubble)

[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) over time for lw $1, 0($2); sub $4,$1,$6; and $6,$1,$7; or $8,$1,$9, with a one-cycle bubble inserted after lw so that the loaded value can be forwarded to sub.]
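The one-cycle stall can be detected while the dependent instruction is still in ID. A sketch of the textbook's load-use check (not spelled out on this slide): if the instruction in EX is a load (MemRead set) and its destination Rt matches either source of the instruction in ID, stall. In the textbook's design the stall holds the PC and IF/ID and zeroes the ID/EX control bits to form the bubble. `load_use_stall` is a hypothetical name.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True if the instruction in ID must wait one cycle for a load in EX.

    A load in EX (MemRead set) writes register Rt; if the instruction being
    decoded reads that register, forwarding cannot deliver the value in time.
    """
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# lw $1,0($2) in EX, sub $4,$1,$6 in ID: stall
print(load_use_stall(True, 1, 1, 6))    # True
# an instruction that is not a load in EX: no stall
print(load_use_stall(False, 1, 1, 6))   # False
```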
20
Compiler Schemes to Improve Load Delay

The compiler detects the data dependency and inserts nop instructions until the data is available:
sub $2, $1, $3
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Alternatively, the compiler finds independent instructions to fill in the delay slots
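A toy version of the nop-inserting pass, assuming forwarding hardware handles every dependence except a load followed by a use within `delay` instructions (one slot here, matching the single bubble of the load-use case). `insert_nops` and the tuple format are hypothetical:

```python
NOP = ("nop", None, ())


def insert_nops(program, delay=1, producers=("lw",)):
    """Return a copy of `program` (tuples of (op, dest, sources)) with nops
    added so no consumer sits within `delay` slots of a producer it reads."""
    out = []
    for instr in program:
        _op, _dest, sources = instr
        needed = 0
        recent = out[max(0, len(out) - delay):]   # last `delay` emitted slots
        for distance, (pop, pdest, _) in enumerate(reversed(recent), start=1):
            if pop in producers and pdest is not None and pdest in sources:
                needed = max(needed, delay - distance + 1)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


program = [("lw", 1, (2,)), ("sub", 4, (1, 6)), ("and", 6, (1, 7))]
print([op for op, _, _ in insert_nops(program)])   # ['lw', 'nop', 'sub', 'and']
```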

21
Software Scheduling to Avoid Load Hazards
Try producing fast code for a = b + c; d = e - f; assuming a, b, c, d, e, and f are in memory.
Slow code:
LW Rb,b
LW Rc,c
ADD Ra,Rb,Rc
SW a,Ra
LW Re,e
LW Rf,f
SUB Rd,Re,Rf
SW d,Rd
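Counting the load-use stalls in the slow code, assuming full forwarding so that only a load immediately followed by a use of its result costs a cycle. The tuple encoding and `load_use_stalls` are hypothetical:

```python
# Slow code from the slide: (opcode, destination, sources).
# Registers are named symbolically; SW has no destination register.
SLOW = [
    ("LW", "Rb", ()),
    ("LW", "Rc", ()),
    ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)),
    ("LW", "Re", ()),
    ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]


def load_use_stalls(code):
    """Count 1-cycle stalls: a load immediately followed by a user of its
    result (all other dependences are assumed covered by forwarding)."""
    return sum(
        1
        for (op, dest, _), (_, _, sources) in zip(code, code[1:])
        if op == "LW" and dest in sources
    )


stalls = load_use_stalls(SLOW)
print(stalls, "stalls,", 5 + len(SLOW) - 1 + stalls, "cycles on a 5-stage pipeline")
# 2 stalls, 14 cycles on a 5-stage pipeline
```

Scheduling independent instructions between each load and its first use is what removes these stalls.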