Chapter Six: Enhancing Performance with Pipelining

Slides: 44
Provided by: TodA1

Transcript and Presenter's Notes

1
Chapter Six: Enhancing Performance with Pipelining
2
Definition
  • Pipelining is an implementation technique in which
    multiple instructions are overlapped in
    execution.
  • We'll use a laundry analogy for pipelining to
    explain the main concepts.
  • There are four stages in doing the laundry:
  • put dirty clothes in the washer (wash)
  • place washed clothes in the dryer (dry)
  • place the dry load on the table and fold (fold)
  • put clothes away (store)
  • What about MIPS instructions?

3
Single-Cycle vs Pipelined Performance
  • Look at lw, sw, add, sub, and, or, slt, and beq.
  • Operation time for major functional components:
  • 200 ps for memory access
  • 200 ps for ALU operation
  • 100 ps for register file read or write
  • Total execution time for 3 instructions:
  • 3 x 800 ps = 2.4 ns for a single-cycle, non-pipelined
    processor
  • 1.4 ns (see the figure on the next slide) for a pipelined
    processor
  • Total execution time for 1003 instructions:
  • 1000 x 800 ps + 2400 ps = 802.4 ns for a
    single-cycle, non-pipelined processor
  • 1000 x 200 ps + 1400 ps = 201.4 ns for a pipelined
    processor
  • Speedup is less than the number of stages
    because
  • stages may be imperfectly balanced
  • overhead is involved
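
The arithmetic above can be reproduced with a small sketch. It assumes a five-stage split (IF, ID, EX, MEM, WB) whose latencies follow the component times on the slide, with lw as the slowest instruction; the function names are made up for this illustration.

```python
# Stage latencies in picoseconds, taken from the component times on the slide.
# The five-stage split is an assumption, modelled on what lw needs.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_time_ps(n_instructions: int) -> int:
    """Every instruction takes one long cycle sized for the slowest one (lw)."""
    cycle = sum(STAGE_PS.values())  # 800 ps
    return n_instructions * cycle


def pipelined_time_ps(n_instructions: int) -> int:
    """The clock is set by the slowest stage; the first instruction fills the pipe."""
    cycle = max(STAGE_PS.values())  # 200 ps
    return (len(STAGE_PS) + n_instructions - 1) * cycle


for n in (3, 1003):
    single = single_cycle_time_ps(n)
    piped = pipelined_time_ps(n)
    print(f"{n} instructions: {single} ps vs {piped} ps, speedup {single / piped:.2f}")
```

For 3 instructions this gives 2400 ps against 1400 ps, and for 1003 instructions 802400 ps against 201400 ps, matching the figures on the slide. The speedup stays below 5 because the 100 ps stages are padded to the 200 ps clock.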

4
Pipelining
  • Improve performance by increasing instruction
    throughput
  • Each instruction still takes the same
    time to execute
  • Ideal speedup is the number of stages in the
    pipeline. Do we achieve this?

[Figure: pipelined execution of instructions. Vertical axis: program execution order (in instructions). Stage labels: Instruction fetch, Reg, ALU, Data access, Reg. Stage time: 2 ns.]

5
Pipelining in MIPS: What makes it easy?
  • All instructions are the same length: instruction
    fetch (1st pipeline stage) and decoding (2nd
    stage) are much easier
  • MIPS has just a few instruction formats, with the source
    register fields in the same location => register
    file read and instruction decoding can be done at
    the same time
  • Memory operands appear only in loads and stores
    (as opposed to 80x86, where we could operate on
    the operands in memory)
  • Operands must be aligned in memory: no need to
    worry about a single data transfer instruction
    requiring two data memory accesses.

6
Pipelining in MIPS: What makes it hard?
  • Structural hazards: suppose we had only one
    memory
  • Control hazards: need to worry about branch
    instructions
  • Data hazards: an instruction depends on a
    previous instruction

7
Structural Hazards
  • What if we have a fourth instruction in the following
    figure?
  • What happens between time 6
    and 8 ns?

[Figure: pipelined execution of instructions. Vertical axis: program execution order (in instructions). Stage labels: Instruction fetch, Reg, ALU, Data access, Reg. Stage time: 2 ns.]

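One way to approach the question about 6 to 8 ns is to check which instructions need the memory in the same cycle. The sketch below assumes the five-stage IF, ID, EX, MEM, WB pipeline, one instruction issued per 2 ns cycle, a single memory shared by instruction fetch and data access, and four back-to-back lw instructions; the function name memory_conflicts is made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_conflicts(program):
    """Return (cycle, fetching_index, accessing_index) for each clash on one shared memory.

    Instruction i is issued in cycle i + 1, so it is in IF during cycle i + 1
    and in MEM during cycle i + 4. Only lw and sw use memory in MEM.
    """
    conflicts = []
    for i, op in enumerate(program):
        if op not in ("lw", "sw"):
            continue
        mem_cycle = i + 1 + STAGES.index("MEM")
        fetch_index = mem_cycle - 1  # instruction whose IF falls in the same cycle
        if fetch_index < len(program):
            conflicts.append((mem_cycle, fetch_index, i))
    return conflicts


# Four back-to-back loads: the fourth fetch collides with the first load's data access.
print(memory_conflicts(["lw", "lw", "lw", "lw"]))
```

With 2 ns cycles, cycle 4 is the interval from 6 to 8 ns: the fourth instruction (index 3) wants to fetch while the first lw (index 0) is reading data, which is the structural hazard.
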
8
Control Hazards
  • Possible solution: stall
  • pause before continuing the pipeline;
    not efficient if we have a long pipeline
  • a pipeline stall is also known as a bubble

[Figure: pipeline stalling on a branch. Vertical axis: program execution order (in instructions). Time ticks: 2, 4, 6, 8, 10, 12, 14, 16.]

The above figure assumes that we have extra
hardware in place to resolve the branch in the
second stage. Otherwise the pause will be longer
than 4 ns.
9
Control Hazards
  • Another solution: predict

[Figure: pipeline with branch prediction. Vertical axis: program execution order (in instructions). Stage labels: Instruction fetch, Reg, ALU, Data access, Reg. Other labels: bubble (five times), 2 ns, 4 ns. Time ticks: 10, 12, 14.]

10
Control Hazards
  • Delayed branch

[Figure: pipeline with a delayed branch and its delayed branch slot. Vertical axis: program execution order (in instructions). Stage time: 2 ns.]

11
Data Hazards
  • Look at the following example:
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
  • We need the result $s0 from the add instruction
    to do the subtraction.
  • Is the data ready?
  • The compiler alone cannot handle this issue
  • Solution: forwarding or bypassing, i.e., getting
    the missing item early from the internal
    resources.

12
Graphical representation of the instruction pipeline
  • IF: instruction fetch
  • ID: instruction decode
  • EX: execution
  • MEM: memory access
  • WB: write back
  • Shaded: element used; white: element not used
  • Right half shaded: read; left half shaded: write

[Figure: add $s0, $t0, $t1 passing through IF, ID, EX, MEM, WB. Time ticks: 2, 4, 6, 8, 10.]

13
Forwarding
  • As soon as the ALU finishes the add, forward the
    result

[Figure: forwarding from add $s0, $t0, $t1 to sub $t2, $s0, $t3, each passing through IF, ID, EX, MEM, WB. Vertical axis: program execution order (in instructions). Time ticks: 2, 4, 6, 8, 10.]

14
Forwarding with stall
  • When an R-format instruction follows a load and
    tries to use the loaded data, a load-use data
    hazard occurs.
  • We need to stall in this case, even with forwarding.

[Figure: pipeline diagram with bubbles inserted for the stall.]

15
Reordering Code to Avoid Pipeline Stalls
  • Original code (register $t1 has the address of
    v[k]):
    lw $t0, 0($t1)    # reg $t0 = v[k]
    lw $t2, 4($t1)    # reg $t2 = v[k+1]
    sw $t2, 0($t1)    # v[k] = reg $t2
    sw $t0, 4($t1)    # v[k+1] = reg $t0
  • Data hazard occurs on register $t2 between the
    second lw and the first sw
  • Modified code removes the hazard (register $t1
    has the address of v[k]):
    lw $t0, 0($t1)    # reg $t0 = v[k]
    lw $t2, 4($t1)    # reg $t2 = v[k+1]
    sw $t0, 4($t1)    # v[k+1] = reg $t0
    sw $t2, 0($t1)    # v[k] = reg $t2
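
A rough way to check the reordering is to count the places where an lw is immediately followed by an instruction that reads the loaded register. The sketch below is only a model: Instr and load_use_stalls are illustrative names, each such pair is assumed to cost one bubble, and, following the slide, a store that needs the loaded value right after the lw counts as a hazard.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str]   # register written, if any
    srcs: tuple           # registers read


def load_use_stalls(program):
    """Count one bubble for each lw whose result is read by the very next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "lw" and prev.dest in cur.srcs:
            stalls += 1
    return stalls


original = [
    Instr("lw", "$t0", ("$t1",)),        # lw $t0, 0($t1)
    Instr("lw", "$t2", ("$t1",)),        # lw $t2, 4($t1)
    Instr("sw", None, ("$t2", "$t1")),   # sw $t2, 0($t1)
    Instr("sw", None, ("$t0", "$t1")),   # sw $t0, 4($t1)
]
# Swap the two stores, as in the modified code.
reordered = [original[0], original[1], original[3], original[2]]

print(load_use_stalls(original), load_use_stalls(reordered))
```

Under these assumptions the original order has one load-use pair (the second lw and the first sw) and the reordered version has none.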

16
A Pipelined Datapath
  • What do we need to add to actually split the
    datapath into stages?

[Figure: datapath divided into stages. Surviving stage labels: EX: Execute/address calculation, MEM: Memory access, WB: Write back.]

17
Pipelined Datapath
  • Can you find a problem even if
    there are no dependencies? What instructions
    can we execute to manifest the problem?

[Figure: pipelined datapath. Surviving labels: ID/EX pipeline register, Registers (Read register 1, Read register 2, Read data 1, Read data 2, Write register, Write data), ALU (Zero, ALU result), Data memory (Address, Read data, Write data), Sign extend (16-bit input), multiplexers.]

18
IF Stage
19
ID Stage
20
EX Stage
21
MEM Stage
22
WB Stage
23
Corrected Datapath
24
Portions of the Datapath used by a load instruction
25
Graphically Representing Pipelines
  • Can help with answering questions like:
  • how many cycles does it take to execute this
    code?
  • what is the ALU doing during cycle 4?
  • use this representation to help understand
    datapaths

26
Pipeline Control
27
Pipeline control
  • We have 5 stages. What needs to be controlled in
    each stage?
  • Instruction Fetch and PC Increment
  • Instruction Decode / Register Fetch
  • Execution: RegDst, ALUOp, ALUSrc
  • Memory Stage: Branch, MemRead, MemWrite
  • Write Back: MemtoReg, RegWrite
  • How would control be handled in an automobile
    plant?
  • a fancy control center telling everyone what to
    do?
  • should we use a finite state machine?

28
Pipeline Control
  • Pass control signals along just like the data
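
Passing control along with the data can be pictured as each pipeline register keeping only the signals that later stages still need. The sketch below is a simplified model, not the actual hardware: the signal grouping follows the previous slide (with MemtoReg as the write-back select), the lw values are the usual textbook settings and are an assumption here, and split_control is a made-up helper name.

```python
# Control signals grouped by the stage that consumes them.
EX_SIGNALS = ("RegDst", "ALUOp", "ALUSrc")
MEM_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("MemtoReg", "RegWrite")


def split_control(control: dict) -> dict:
    """Model which control bits each pipeline register still carries."""
    def pick(names):
        return {name: control[name] for name in names}

    return {
        "ID/EX": pick(EX_SIGNALS + MEM_SIGNALS + WB_SIGNALS),
        "EX/MEM": pick(MEM_SIGNALS + WB_SIGNALS),
        "MEM/WB": pick(WB_SIGNALS),
    }


# Assumed control values for lw, generated once in the decode stage.
LW_CONTROL = {
    "RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1,
    "Branch": 0, "MemRead": 1, "MemWrite": 0,
    "MemtoReg": 1, "RegWrite": 1,
}

for register, signals in split_control(LW_CONTROL).items():
    print(register, signals)
```

The design point is that no central controller tracks where each instruction is: the signals are produced once in decode and travel with the instruction, shrinking as each stage uses its share.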

29
Datapath with Control
30
Dependencies
  • Problem with starting the next instruction before
    the first is finished
  • dependencies that go backward in time are data
    hazards

31
Hazard Conditions
  • Type 1a: EX/MEM.RegisterRd = ID/EX.RegisterRs
  • Type 1b: EX/MEM.RegisterRd = ID/EX.RegisterRt
  • Type 2a: MEM/WB.RegisterRd = ID/EX.RegisterRs
  • Type 2b: MEM/WB.RegisterRd = ID/EX.RegisterRt
  • Classify the dependencies in the following
    sequence:
    sub $2, $1, $3     # Reg. $2 set by sub
    and $12, $2, $5    # 1st operand ($2)
    or  $13, $6, $2    # 2nd operand ($2)
    add $14, $2, $2
    sw  $15, 100($2)
  • sub-and: Type 1a hazard
  • sub-or: Type 2b hazard
  • sub-add: no hazard, sub-sw: no hazard
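
The classification can be checked mechanically. The sketch below is a simplified model with illustrative names (Instr, hazards): a producer one instruction ahead is treated as sitting in EX/MEM (Type 1) and one two ahead as sitting in MEM/WB (Type 2), while anything further away is assumed safe because the register file is written before it is read in the same cycle.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    name: str
    rd: Optional[int]   # register written, None if the instruction writes none
    rs: int             # first source register
    rt: Optional[int]   # second source register, None if unused


def hazards(program):
    """List (producer, consumer, type) for every Type 1/2 hazard in program order."""
    found = []
    for j, consumer in enumerate(program):
        # distance 1 -> producer sits in EX/MEM; distance 2 -> producer sits in MEM/WB
        for distance, kind in ((1, "1"), (2, "2")):
            i = j - distance
            if i < 0:
                continue
            producer = program[i]
            if producer.rd is None or producer.rd == 0:
                continue  # nothing written, or a write to $0
            if producer.rd == consumer.rs:
                found.append((producer.name, consumer.name, kind + "a"))
            if producer.rd == consumer.rt:
                found.append((producer.name, consumer.name, kind + "b"))
    return found


program = [
    Instr("sub", 2, 1, 3),      # sub $2, $1, $3
    Instr("and", 12, 2, 5),     # and $12, $2, $5
    Instr("or", 13, 6, 2),      # or  $13, $6, $2
    Instr("add", 14, 2, 2),     # add $14, $2, $2
    Instr("sw", None, 2, 15),   # sw  $15, 100($2)
]
print(hazards(program))
```

For this sequence the model reports sub-and as Type 1a and sub-or as Type 2b, with nothing for add or sw.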

32
Forwarding
  • Use temporary results, don't wait for them to be
    written
  • register file forwarding to handle read/write to
    same register
  • ALU forwarding
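
ALU forwarding comes down to choosing, for each ALU input, between the register file value and a result still sitting in a pipeline register. The sketch below shows that choice for the first operand only. The function name forward_a, the 0b10/0b01/0b00 select encoding, and the check against register 0 follow a common textbook convention and are assumptions rather than something stated on the slide.

```python
def forward_a(ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd, id_ex_rs):
    """Return the select for the ALU's first operand: 0b10, 0b01, or 0b00."""
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return 0b10  # take the result from EX/MEM (the most recent value)
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return 0b01  # take the value from MEM/WB
    return 0b00      # no hazard: use the value read from the register file


# sub $2, $1, $3 followed directly by and $12, $2, $5: forward from EX/MEM.
print(forward_a(True, 2, False, 0, 2))
```

Checking EX/MEM first matters when both pipeline registers hold a value for the same register: the younger result is the one the consumer should see.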

33
Forwarding
34
Can't always forward
  • Load word can still cause a hazard
  • an instruction tries to read a register following
    a load instruction that writes to the same
    register.
  • Thus, we need a hazard detection unit to stall
    the instruction that follows the load

35
Stalling
  • We can stall the pipeline by keeping an
    instruction in the same stage

36
Hazard Detection Unit
  • Stall by letting an instruction that won't write
    anything go forward
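
The check made by a hazard detection unit can be sketched as a single condition. The version below assumes the usual textbook formulation (a load in EX, identified by MemRead, whose destination Rt matches a source register of the instruction in ID); must_stall and its parameter names are illustrative.

```python
def must_stall(id_ex_mem_read: bool, id_ex_rt: int, if_id_rs: int, if_id_rt: int) -> bool:
    """Stall when the instruction in EX is a load whose target is read by the one in ID."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# lw $2, 20($1) followed by and $4, $2, $5: the and must wait one cycle.
print(must_stall(True, 2, 2, 5))
```

When the condition holds, the instruction in ID is held in place and a bubble (an instruction that writes nothing) moves forward in its stead.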

37
Branch Hazards
  • When we decide to branch, other instructions are
    in the pipeline!
  • We are predicting branch not taken
  • need to add hardware for flushing instructions if
    we are wrong
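
The cost of predicting not taken can be estimated with a simple CPI model. Everything numeric below is hypothetical: the branch fraction, the taken fraction, and the flush penalty are placeholders chosen for illustration, not measurements, and the function names are made up.

```python
def cpi_predict_not_taken(branch_fraction: float, taken_fraction: float,
                          flush_penalty: int) -> float:
    """Base CPI of 1 plus flushed cycles, paid only when a branch is actually taken."""
    return 1.0 + branch_fraction * taken_fraction * flush_penalty


def cpi_always_stall(branch_fraction: float, stall_cycles: int) -> float:
    """Base CPI of 1 plus a fixed stall on every branch."""
    return 1.0 + branch_fraction * stall_cycles


# Hypothetical workload: 20% branches, half of them taken, 1 cycle lost per flush.
print(cpi_predict_not_taken(0.2, 0.5, 1), cpi_always_stall(0.2, 1))
```

Under this model prediction never does worse than always stalling, and the gap grows with the share of branches that fall through.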

38
Flushing Instructions

39
Improving Performance
  • Try to avoid stalls! E.g., reorder these
    instructions:
  • lw $t0, 0($t1)
  • lw $t2, 4($t1)
  • sw $t2, 0($t1)
  • sw $t0, 4($t1)
  • Add a branch delay slot
  • the next instruction after a branch is always
    executed
  • rely on compiler to fill the slot with something
    useful

40
More on improving performance
  • Superpipelining: decompose the stages further (not
    always practical)
  • Superscalar: start more than one instruction in
    the same cycle (extra coordination required)
  • CPI can be less than 1
  • IPC: instructions per clock cycle
  • Dynamic pipelining:
  • lw $t0, 20($s2)
  • addu $t1, $t0, $t2
  • sub $s4, $s4, $t3
  • slti $t5, $s4, 20
  • Combine extra hardware resources so later
    instructions can proceed in parallel.
  • More complicated pipeline control
  • More complicated instruction execution model

41
Superscalar MIPS
  • Assume two instructions are issued per clock
    cycle, say one integer ALU operation or branch,
    the other load or store.
  • Need to fetch and decode 64 bits of instruction
  • Extra resources are required.
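
The two-issue rule can be illustrated with a toy issue model. It is only a sketch: instructions are paired greedily in program order when one fits the ALU/branch slot and the other the load/store slot, and data dependences are ignored, which a real compiler or processor could not do. The opcode sets and the name issue_cycles are assumptions made for this example.

```python
ALU_OR_BRANCH = {"add", "addu", "sub", "and", "or", "slt", "slti", "beq"}
LOAD_STORE = {"lw", "sw"}


def issue_cycles(ops):
    """Greedy in-order issue: pair adjacent instructions when one fits each slot.

    Data dependences are ignored here, which real hardware or a compiler cannot do.
    """
    cycles = 0
    i = 0
    while i < len(ops):
        if i + 1 < len(ops):
            a, b = ops[i], ops[i + 1]
            pair_ok = (a in ALU_OR_BRANCH and b in LOAD_STORE) or (
                a in LOAD_STORE and b in ALU_OR_BRANCH
            )
            if pair_ok:
                i += 2
                cycles += 1
                continue
        i += 1
        cycles += 1
    return cycles


ops = ["add", "lw", "sub", "sw"]
cycles = issue_cycles(ops)
print(f"{len(ops)} instructions in {cycles} cycles, IPC = {len(ops) / cycles}")
```

Four instructions that alternate between the two slots issue in two cycles, an IPC of 2 (a CPI of 0.5), while two adjacent ALU operations cannot share a cycle in this model.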

42
Dynamic Scheduling
  • The hardware performs the scheduling?
  • hardware tries to find instructions to execute
  • out-of-order execution is possible
  • speculative execution and dynamic branch
    prediction

43
Real Stuff
  • All modern processors are very complicated
  • DEC Alpha 21264: 9-stage pipeline, 6-instruction
    issue
  • PowerPC and Pentium: branch history table
  • Compiler technology important