Pipelining Processor - PowerPoint PPT Presentation

Slides: 27
Provided by: cpEngC

Transcript and Presenter's Notes

1
Pipelining Processor
2
Instruction Cycle
  • pc = 0
  • do
  •   ir = memory[pc]     Fetch the instruction.
  •   decode(ir)          Decode the instruction.
  •   fetch(operands)     Fetch the operands.
  •   execute             Execute the instruction.
  •   store(results)      Store the results.
  • while (ir != HALT)
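
The cycle above can be sketched as a tiny interpreter. This is only an illustration, not code from the slides: the three-field instruction tuples, the opcode names `LOADI` and `ADD`, and the function `run` are hypothetical, and the program-counter increment (not shown on the slide) is an assumption added so that the loop makes progress.

```python
# Toy fetch-decode-execute loop. The instruction format and the opcodes
# LOADI / ADD / HALT are hypothetical, chosen only for illustration.

def run(memory):
    """Run the program stored in `memory` and return the final registers."""
    regs = {}
    pc = 0
    while True:
        ir = memory[pc]                 # fetch the instruction
        pc += 1                         # assumed: advance to the next instruction
        op, dst, src = ir               # decode the instruction
        if op == "HALT":
            break
        if op == "LOADI":               # operand is an immediate value
            result = src
        elif op == "ADD":               # fetch the operands from registers
            result = regs[dst] + regs[src]   # execute
        else:
            raise ValueError(f"unknown opcode: {op}")
        regs[dst] = result              # store the result
    return regs


program = [
    ("LOADI", "r1", 9),
    ("LOADI", "r2", 5),
    ("ADD", "r1", "r2"),
    ("HALT", None, None),
]
print(run(program))
```

Each pass through the loop handles exactly one instruction from start to finish, which is the behaviour that pipelining improves on.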

3
Pipelining
  • Improve the execution speed.
  • Divide instruction cycle into stages.
  • Each stage executes independently and
    concurrently.
  • Pipelining is natural! (From David Patterson's
    lecture notes.)

4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
Pipelining Lessons
  • Pipelining doesn't help the latency of a single task.
  • It helps the throughput of the entire workload.
  • Multiple tasks operate simultaneously using
    different resources.
  • Potential speedup = number of pipe stages.

8
Pipelining in Modern Processor
  • The instruction cycle is divided into five stages:
    Fetch (F), Decode (D), Operand Fetch (O), Execute (E), Store (S).

9
Pipelining Execution
  • Time          1 2 3 4 5 6 7
  • Inst 1        F D O E S
  • Inst 2          F D O E S
  • Inst 3            F D O E S
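
The staggered chart above can be generated mechanically. The sketch below assumes an ideal pipeline in which a new instruction enters every cycle and no stage ever stalls; the function name `pipeline_rows` and the `.` filler for idle cycles are choices made for this example only.

```python
# Timing chart for an ideal five-stage pipeline (no stalls assumed).
STAGES = "FDOES"  # Fetch, Decode, Operand fetch, Execute, Store


def pipeline_rows(n_instructions, stages=STAGES):
    """Return one list of stage letters per instruction, indexed by cycle."""
    total_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_cycles
        for s, letter in enumerate(stages):
            row[i + s] = letter      # instruction i is in stage s at cycle i + s
        rows.append(row)
    return rows


rows = pipeline_rows(3)
print("Time   ", " ".join(str(t + 1) for t in range(len(rows[0]))))
for number, row in enumerate(rows, start=1):
    print(f"Inst {number} ", " ".join(row))
```

Three instructions finish in 3 + 5 - 1 = 7 cycles, matching the time axis of the chart.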

10
Performance of Pipeline
  • What do we gain?
  • Suppose we execute 1000 instructions on
    non-pipelined and pipelined CPUs.
  • Clock speed = 500 MHz (1 clock = 2 ns).
  • Non-pipelined CPU
  •   total time = 2 ns/cycle x 5 cycles/instr. x 1000
      instr. = 10 µs.
  • Perfectly pipelined CPU
  •   total time = 2 ns/cycle x (1 cycle/instr. x 1000
      instr. + 4 cycles drain) = 2.008 µs.
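
The arithmetic above can be checked with a few lines of Python. The function names are made up for this sketch, and the model assumes what the slide assumes: a perfect pipeline with no stalls and a 2 ns clock period (1 / 500 MHz).

```python
# Execution-time model for the example above (times in nanoseconds).
CYCLE_NS = 2        # 1 / 500 MHz
STAGES = 5
N_INSTR = 1000


def non_pipelined_ns(n_instr, cycle_ns=CYCLE_NS, stages=STAGES):
    # every instruction occupies the whole CPU for all five stages
    return cycle_ns * stages * n_instr


def pipelined_ns(n_instr, cycle_ns=CYCLE_NS, stages=STAGES):
    # one instruction completes per cycle, plus (stages - 1) extra cycles
    return cycle_ns * (n_instr + stages - 1)


slow = non_pipelined_ns(N_INSTR)
fast = pipelined_ns(N_INSTR)
print(f"non-pipelined: {slow} ns, pipelined: {fast} ns")
print(f"speedup: {slow / fast:.2f}")
```

The speedup comes out just under 5, the number of pipe stages, and approaches 5 as the instruction count grows.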

11
Nothing is perfect!
  • Problem with branches.
  • We don't know what to fetch next until the branch is decoded.
  • Time          1 2 3 4 5 6 7 8
  • Inst 1        F D O E S
  • Inst 2 JMP X    F D O E S
  • Inst X              F D O E S

The branch target address is not available until JMP X has been decoded
(cycle 3), so Inst X cannot be fetched before cycle 4.
12
Stalled Pipe
  • When the pipeline does not flow smoothly, we say it is
    stalled.
  • Branches stall the pipe. What else?
  •   Subroutine calls
  •   Memory accesses
  •   Multi-cycle execution
  • Can we do better? Yes, but we will discuss that later.

13
Branching in SPARC
  • SPARC uses a 5-stage pipeline.
  • Recall: the pipe is stalled by a branch!
  • Time          1 2 3 4 5 6 7 8
  • Inst 1        F D O E S
  • Inst 2 JMP X    F D O E S
  • Inst X              F D O E S

The branch target address is not available until JMP X has been decoded
(cycle 3), so Inst X cannot be fetched before cycle 4.
14
Branching in SPARC
  • However, SPARC does not stall; it executes
    the instruction following the branch (or call)
    instruction BEFORE the branch actually takes effect.
  • The position of that instruction is called the delay slot.

15
Delay Slot
  • Time          1 2 3 4 5 6 7 8
  • Inst 1        F D O E S
  • Inst 2 JMP X    F D O E S
  • Inst 3            F D O E S      <- delay slot
  • Inst X              F D O E S

Inst 3 occupies the delay slot: it is fetched in cycle 3, while the branch
target address is still not available.
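
To compare the stalled pipe with the delay-slot version, the sketch below draws both charts from the cycle in which each instruction is fetched. The fetch cycles are assumptions read off the eight-cycle time axis (Inst X fetched in cycle 4 in both cases), and the helper names `finish_cycle` and `show` are hypothetical.

```python
# Compare a stalled branch with a delayed branch on a five-stage pipeline.
STAGES = "FDOES"


def finish_cycle(fetch_cycles):
    """Cycle in which the last instruction leaves the Store stage."""
    return max(fetch_cycles.values()) + len(STAGES) - 1


def show(fetch_cycles):
    """Print one chart row per instruction, shifted by its fetch cycle."""
    for name, start in fetch_cycles.items():
        print(f"{name:<8}" + "  " * (start - 1) + " ".join(STAGES))


# Assumed fetch cycles: the target is fetched in cycle 4 either way.
stalled = {"Inst 1": 1, "JMP X": 2, "Inst X": 4}
delayed = {"Inst 1": 1, "JMP X": 2, "Inst 3": 3, "Inst X": 4}

for label, case in (("stalled", stalled), ("delay slot", delayed)):
    print(f"{label}: {len(case)} instructions in {finish_cycle(case)} cycles")
    show(case)
```

Both versions finish in cycle 8, but the delay-slot version completes one more instruction in the same time.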
16
Filling Delay Slots with NOP
  •       .global main
  • main: save %sp, -64, %sp
  •       mov 9, %l0
  •       sub %l0, 2, %o0
  •       add %l0, 14, %o1   ! Instruction before branch
  •       call .mul
  •       nop                ! Delay slot -> wasted
  •       add %l0, 8, %o1    ! Instruction before branch
  •       call .div
  •       nop                ! Delay slot -> wasted
  •       mov %o0, %l1
  •       mov 1, %g1
  •       ta 0

17
Filling Delay Slots
  •       .global main
  • main: save %sp, -64, %sp
  •       mov 9, %l0
  •       sub %l0, 2, %o0
  •       call .mul
  •       add %l0, 14, %o1   ! Delay slot filled
  •       call .div
  •       add %l0, 8, %o1    ! Delay slot filled
  •       mov %o0, %l1
  •       mov 1, %g1
  •       ta 0

18
Optimizing Our Second Program
  • Can we fill the delay slot?
  •       ...
  •       mov %o0, %l1       ! Store it in y
  •       add %l0, 1, %l0    ! x++
  •       cmp %l0, 11        ! x < 11 ?
  •       bl loop
  •       nop                ! Delay slot -> wasted
  •       ...
  • Not with cmp (bl depends on it), and not with add (cmp depends on add).
  • mov can! No instruction after it (up to and including bl)
    depends on it.

19
Optimizing Our Second Program
  •       ...
  •       add %l0, 1, %l0    ! x++
  •       cmp %l0, 11        ! x < 11 ?
  •       bl loop
  •       mov %o0, %l1       ! Store it in y (moved into the delay slot)
  •       ...
  • The key is to fill the delay slot with an instruction whose
    result no other instruction before the branch, nor the branch
    itself, depends on.

20
Filling Delay Slot Summary
  • After every branch and call there is one delay slot.
  • Always try to fill the delay slot to improve
    performance.
  • When filling the slot, don't change the results
    the program computes.
  • No other instruction (before the branch, or the
    branch itself) may depend on the instruction in the
    delay slot.
  • You can always fill the slot with nop.
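
The dependence rule in this summary can be expressed as a small check. The sketch below is an illustration under stated assumptions, not a real scheduler: each instruction is described by hand as the set of registers it reads and writes, `icc` stands for the integer condition codes set by `cmp`, and the function name `can_move_to_delay_slot` is hypothetical. The example data is the loop tail of the second program (mov, add, cmp, bl).

```python
# Can an instruction be moved past the branch into the delay slot?

def can_move_to_delay_slot(block, index):
    """block: list of (text, reads, writes) tuples ending with the branch.

    Returns True if block[index] may be moved after the branch without
    changing what the later instructions in the block compute.
    """
    _, reads, writes = block[index]
    for _, later_reads, later_writes in block[index + 1:]:
        if writes & later_reads:     # a later instruction needs our result
            return False
        if writes & later_writes:    # a later instruction overwrites our result
            return False
        if reads & later_writes:     # a later instruction changes our inputs
            return False
    return True


loop_tail = [
    ("mov %o0, %l1",    {"%o0"}, {"%l1"}),
    ("add %l0, 1, %l0", {"%l0"}, {"%l0"}),
    ("cmp %l0, 11",     {"%l0"}, {"icc"}),
    ("bl loop",         {"icc"}, set()),
]

for i, (text, _, _) in enumerate(loop_tail[:-1]):
    print(f"{text:<18} movable: {can_move_to_delay_slot(loop_tail, i)}")
```

Only `mov` qualifies: `cmp` reads the register that `add` writes, and `bl` reads the condition codes that `cmp` sets.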

21
Do-While Delay Slot
  • How to fill the delay slot?
  •   With an independent instruction
  •   With the target instruction and an annulled branch,
      when you cannot find any independent instruction
  • Independent instruction
  •   See our second program.

22
Filling with Target Instruction
  •       sub %l0, 1, %o0    ! (x-1) to %o0, executed once
  • loop: call .mul
  •       sub %l0, 7, %o1    ! (x-7) to %o1, delay slot
  •       call .div
  •       sub %l0, 11, %o1   ! (x-11) to %o1, delay slot
  •       mov %o0, %l1       ! Store it in y
  •       add %l0, 1, %l0    ! x++
  •       cmp %l0, 11        ! x < 11 ?
  •       bl,a loop
  •       sub %l0, 1, %o0    ! (x-1) to %o0 (delay slot)

23
Annulled Branch
  • An annulled branch executes the instruction in the delay slot if and
    only if the branch is taken.
  • The program is one instruction longer and wastes one
    cycle when the loop exits.
  • There is no need to find an independent instruction.
  • Can be used with any conditional branch (with ba,a the
    delay-slot instruction is always annulled).
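
The difference between a plain delayed branch and an annulled one can be seen in a small trace. The sketch below is a simplified model, not a SPARC simulator: the tuple instruction format and the names `trace` and `run_loop` are hypothetical, only conditional branches are modelled, and branch outcomes are supplied as a fixed list rather than computed from condition codes. It does follow the SPARC idea of a program counter plus a next-program-counter.

```python
# Trace which instructions execute around a delayed conditional branch.

def trace(program, max_steps=100):
    """Return the names of the executed instructions, in order.

    program entries are either
      ("op", name)                         an ordinary instruction, or
      ("br", name, target, taken, annul)   a delayed conditional branch,
    where `taken` is a zero-argument callable giving the branch outcome.
    """
    executed = []
    pc, npc = 0, 1                       # program counter and next-PC
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]
        executed.append(instr[1])
        next_pc, next_npc = npc, npc + 1
        if instr[0] == "br":
            _, _, target, taken, annul = instr
            if taken():
                next_npc = target        # delay slot runs, then the target
            elif annul:
                next_pc, next_npc = npc + 1, npc + 2   # skip the delay slot
        pc, npc = next_pc, next_npc
    return executed


def run_loop(annul, outcomes=(True, True, False)):
    """Loop body + branch + delay slot, with the given branch outcomes."""
    results = iter(outcomes)
    program = [
        ("op", "body"),
        ("br", "branch", 0, lambda: next(results), annul),
        ("op", "delay"),
        ("op", "after"),
    ]
    return trace(program)


print("plain:   ", run_loop(annul=False))
print("annulled:", run_loop(annul=True))
```

With the annul bit set, the delay-slot instruction runs only on the two taken branches and is skipped when the loop exits; without it, the delay slot runs all three times.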

24
While Loop Optimization
  • Reduce the number of instructions executed
    inside the loop.
  • Do this by first jumping to the comparison at the end of
    the loop.
  • Then fill the delay slots.

25
While Loop Optimization Example
  •       ba test            ! Initial jump
  •       nop                ! Delay slot
  • loop:
  •       add %l0, %l1, %l0  ! a = a + b
  •       add %l2, 1, %l2    ! c++
  • test:
  •       cmp %l0, 17        ! Check condition
  •       ble loop           ! Repeat if true
  •       nop                ! Delay slot

26
While Loop Optimization Example
  •       ba test            ! Initial jump
  •       cmp %l0, 17        ! Check condition (DS)
  • loop: add %l2, 1, %l2    ! c++
  •       cmp %l0, 17        ! Check condition
  • test: ble,a loop        ! Repeat if true
  •       add %l0, %l1, %l0  ! a = a + b, very tricky! (DS)
  • Performance improvement (instructions executed):
  •   Direct translation: 7 x number of loop iterations
  •   With initial jump: 5 x number of loop iterations
  •   With filled delay slots: 4 x number of loop iterations
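
The per-iteration counts can be reproduced by listing the instructions each version executes on one pass through the loop. The last two lists are read off the slides; the direct-translation list is an assumption (test at the top, exit branch, and a jump back, each with a nop in its delay slot), since that version is not shown. The names in the sketch are illustrative only.

```python
# Instructions executed per loop iteration in each version of the while loop.
PER_ITERATION = {
    # assumed direct translation: test at the top, jump back at the bottom
    "direct translation": ["cmp", "bg exit", "nop", "add a", "add c",
                           "ba loop", "nop"],
    # initial jump to the test at the bottom, delay slot still a nop
    "initial jump": ["add a", "add c", "cmp", "ble loop", "nop"],
    # delay slot of the annulled branch filled with the first loop instruction
    "filled delay slots": ["add c", "cmp", "ble,a loop", "add a"],
}


def instructions_executed(version, iterations):
    """Instructions executed inside the loop for the given iteration count."""
    return len(PER_ITERATION[version]) * iterations


for version in PER_ITERATION:
    print(f"{version:<20} {instructions_executed(version, 100)}")
```

For 100 iterations this gives 700, 500, and 400 instructions, ignoring the few instructions executed once on entry and exit.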