CPE 431/531 Chapter 6 - Enhancing Performance with Pipelining

1
CPE 431/531 Chapter 6 - Enhancing Performance with Pipelining
  • Dr. Rhonda Kay Gaede
  • UAH

2
6.1 An Overview of Pipelining
  • Pipelining is an implementation technique in
    which multiple instructions are overlapped in
    execution.
  • Pipelining helps throughput, not individual
    execution time.
  • You can't skip a stage.

3
6.1 An Overview of Pipelining - The Laundry Analogy
4
6.1 An Overview of Pipelining - MIPS Processor Stages
  • Instruction fetch (IF)
  • Instruction decode and register read (ID)
  • Execution (calculate address) (EX)
  • Memory access (MEM)
  • Register write (WB)

5
6.1 An Overview of Pipelining - Single Cycle vs. Pipelined Performance
  • Consider lw, sw, add, sub, and, or, slt, beq
  • Operation times: memory access and ALU operation 200 ps,
    register file read or write 100 ps
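
A rough sketch of the comparison on this slide. The stage latencies come from the slide; the table of which stages each instruction class uses is an assumption (the usual five-stage breakdown), and the function names are made up for illustration.

```python
# Stage latencies in picoseconds, taken from the slide.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Assumed: which stages each instruction class really needs.
USES = {
    "lw":     ["IF", "ID", "EX", "MEM", "WB"],
    "sw":     ["IF", "ID", "EX", "MEM"],
    "R-type": ["IF", "ID", "EX", "WB"],      # add, sub, and, or, slt
    "beq":    ["IF", "ID", "EX"],
}

def instruction_time(kind):
    """Time one instruction needs if it runs alone, start to finish."""
    return sum(STAGE_PS[s] for s in USES[kind])

def single_cycle_clock():
    """Single-cycle design: the clock must fit the slowest instruction."""
    return max(instruction_time(k) for k in USES)

def pipelined_clock():
    """Pipelined design: the clock must fit the slowest stage."""
    return max(STAGE_PS.values())

def total_time_single(n):
    return n * single_cycle_clock()

def total_time_pipelined(n):
    # The first instruction takes 5 cycles, each later one finishes 1 cycle after.
    return (n + len(STAGE_PS) - 1) * pipelined_clock()

print(single_cycle_clock(), pipelined_clock())          # 800 200
print(total_time_single(3), total_time_pipelined(3))    # 2400 1400
```

For long instruction streams the ratio of the two totals approaches 800 / 200 = 4, which is lower than the number of stages because the stages are not perfectly balanced.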

6
6.1 An Overview of Pipelining - Single Cycle vs. Pipelined Timeline
Speedup
7
6.1 An Overview of Pipelining - Designing Instruction Sets for Pipelining
  • All MIPS instructions are the same length.
  • MIPS has only a few instruction formats.
  • Memory operands appear only in loads and stores.
  • Operands must be aligned in memory.

8
6.1 An Overview of Pipelining - Pipeline Hazards
  • Structural Hazard - not enough hardware
  • Data Hazard - one instruction needs the result
    of another
  • Control Hazard - decisions aren't made yet
  • Conservative Approach - stall
  • Alternative Approach - predict

9
6.1 An Overview of Pipelining - Data Hazards
  • A data hazard occurs when a needed result has not
    yet been written to the register file.
  • Consider
  • add $s0, $t0, $t1
  • sub $t2, $s0, $t3
  • Though the result is not written until WB, it is
    available after the add has finished the EX
    stage, so we can forward it to the right place.

10
6.1 An Overview of Pipelining - Data Hazards: Two-Instruction Forwarding
  • Forwarding paths are valid only if the
    destination stage is later in time than the source
    stage.

11
6.1 An Overview of Pipelining - Data Hazards: More on Forwarding
  • Forwarding can't fix everything.
  • Consider
  • lw $s0, 20($t1)
  • sub $t2, $s0, $t3
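
A small sketch of why the lw case differs from the add case. It assumes the classic five-stage timing with full forwarding: an ALU result exists at the end of EX, a loaded value only at the end of MEM. The names RESULT_READY and stalls_with_forwarding are hypothetical.

```python
# Cycle (counted from the producer's IF = cycle 1) at whose END the value exists.
RESULT_READY = {"alu": 3, "load": 4}   # end of EX for add/sub, end of MEM for lw

def stalls_with_forwarding(producer_kind, distance=1):
    """Bubbles needed before a consumer that is `distance` instructions later.

    Without stalls the consumer's EX stage falls in cycle 1 + distance + 2,
    and the forwarded value must exist by the end of the cycle before that.
    """
    consumer_ex = 1 + distance + 2
    latest_ok = consumer_ex - 1
    return max(0, RESULT_READY[producer_kind] - latest_ok)

print(stalls_with_forwarding("alu"))       # add then sub: 0
print(stalls_with_forwarding("load"))      # lw then sub: 1
print(stalls_with_forwarding("load", 2))   # one unrelated instruction in between: 0
```

The load case needs one bubble because the data would otherwise have to travel backward in time, from the end of MEM to the start of the same cycle's EX.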

12
6.1 An Overview of Pipelining - Data Hazards: The Compiler Can Help
  • Consider the following
  • A = B + E
  • C = B + F
  • lw $t1, 0($t0)
  • lw $t2, 4($t0)
  • add $t3, $t1, $t2
  • sw $t3, 12($t0)
  • lw $t4, 8($t0)
  • add $t5, $t1, $t4
  • sw $t5, 16($t0)
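
A minimal sketch of how reordering removes load-use stalls in this code. It assumes a five-stage pipeline with forwarding, so the only stall is a load immediately followed by an instruction that reads the loaded register. The parser handles only the three instruction forms on the slide, and the REORDERED sequence is one possible schedule, not necessarily the one intended by the slide.

```python
def parse(line):
    """Return (op, dest, sources) for lw, sw and three-register ALU forms."""
    op, rest = line.split(None, 1)
    args = [a.strip() for a in rest.split(",")]
    if op in ("lw", "sw"):
        base = args[1][args[1].index("(") + 1:-1]   # register inside "off(base)"
        if op == "lw":
            return op, args[0], [base]
        return op, None, [args[0], base]            # sw reads data and base
    return op, args[0], args[1:]

def load_use_stalls(program):
    """Count bubbles: a lw directly followed by a reader of its destination.

    Conservative: a sw that stores the just-loaded register also counts.
    """
    stalls = 0
    prev_op, prev_dest = None, None
    for line in program:
        op, dest, srcs = parse(line)
        if prev_op == "lw" and prev_dest in srcs:
            stalls += 1
        prev_op, prev_dest = op, dest
    return stalls

ORIGINAL = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "lw $t4, 8($t0)", "add $t5, $t1, $t4",
    "sw $t5, 16($t0)",
]
# Hoist the third load so no add directly follows the load it depends on.
REORDERED = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)",
    "add $t3, $t1, $t2", "sw $t3, 12($t0)", "add $t5, $t1, $t4",
    "sw $t5, 16($t0)",
]

print(load_use_stalls(ORIGINAL), load_use_stalls(REORDERED))   # 2 0
```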

13
6.1 An Overview of Pipelining - Control Hazards: Stalling
Performance of Stall on Branch
14
6.1 An Overview of Pipelining - Control Hazards: Prediction
15
6.2 A Pipelined Datapath - Identifying the Stages
16
6.2 A Pipelined Datapath - Representing Multiple Instruction Execution
17
6.2 A Pipelined Datapath - Adding Pipeline Registers
18
6.2 A Pipelined Datapath - lw Instruction Execution: IF Stage
19
6.2 A Pipelined Datapath - lw Instruction Execution: ID Stage
20
6.2 A Pipelined Datapath - lw Instruction Execution: EX Stage
21
6.2 A Pipelined Datapath - lw Instruction Execution: MEM Stage

22
6.2 A Pipelined Datapath - lw Instruction Execution: WB Stage

23
6.2 A Pipelined Datapath - sw Instruction Execution: EX Stage
24
6.2 A Pipelined Datapath - sw Instruction Execution: MEM Stage
25
6.2 A Pipelined Datapath - sw Instruction Execution: WB Stage
26
6.2 A Pipelined Datapath - Additions for lw and R-type
27
6.2 A Pipelined Datapath - Datapath used by lw
28
6.2 A Pipelined Datapath - Stylized Multiple Clock Cycle Diagrams
29
6.2 A Pipelined Datapath - Traditional Multiple Clock Cycle Diagrams
30
6.2 A Pipelined Datapath - Single Cycle Diagram: Cycle 5 Slice
31
6.3 Pipelined Control - Identifying Control Lines Needed
32
6.3 Pipelined Control - Generating and Saving Control Lines
  • EX
  • MEM
  • WB

33
6.3 Pipelined Control - Putting it all Together
34
6.4 Data Hazards and Forwarding - Data Dependencies
  • In the previous example, there were no data
    dependencies. Now, the rest of the story.
  • sub $2, $1, $3
  • and $12, $2, $5
  • or $13, $6, $2
  • add $14, $2, $2
  • sw $15, 100($2)

35
6.4 Data Hazards and Forwarding - Which Data Dependencies are Hazards?
  • A multiple-clock-cycle diagram is useful for
    looking at the effects of data dependencies.
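
As an illustration, a short sketch that prints a text version of a multiple-clock-cycle diagram for the five-instruction sequence from the previous slide. It assumes an ideal pipeline with no stalls; the function name and layout are made up here.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

def pipeline_diagram(program):
    """One row per instruction, one column per clock cycle (no stalls)."""
    n_cycles = len(program) + len(STAGES) - 1
    width = max(len(instr) for instr in program)
    header = " " * width + " |" + "".join(
        f"{'CC' + str(c):>5}" for c in range(1, n_cycles + 1))
    rows = [header]
    for i, instr in enumerate(program):
        cells = [""] * n_cycles
        for offset, stage in enumerate(STAGES):
            cells[i + offset] = stage        # instruction i enters IF in cycle i + 1
        rows.append(f"{instr:<{width}} |" + "".join(f"{cell:>5}" for cell in cells))
    return "\n".join(rows)

print(pipeline_diagram(PROGRAM))
```

Reading down a column shows what every instruction is doing in that cycle. In column CC3, for example, sub is in EX while and is still in ID, so the value and wants from $2 has not reached the register file yet.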

36
6.4 Data Hazards and Forwarding - Classifying Hazards
  • Type 1: The information needed in the EX stage
    by an instruction is the result of the
    instruction one stage ahead (found in the EX/MEM
    pipeline register).
  • A. The information is needed as Rrs
  • B. The information is needed as Rrt
  • Type 2: The information needed in the EX stage
    by an instruction is the result of the
    instruction two stages ahead (found in the MEM/WB
    pipeline register).
  • A. The information is needed as Rrs
  • B. The information is needed as Rrt
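
A hedged sketch that applies this classification to the readers of $2 in the sub/and/or/add/sw sequence. It assumes the register file is written in the first half of a cycle and read in the second half, so a dependence three or more instructions back is not a hazard. The tuple layout and the classify helper are hypothetical.

```python
PRODUCER_DEST = "$2"   # written by: sub $2, $1, $3

# (instruction text, rs, rt) for the instructions that follow the sub, in order.
CONSUMERS = [
    ("and $12, $2, $5", "$2", "$5"),
    ("or $13, $6, $2",  "$6", "$2"),
    ("add $14, $2, $2", "$2", "$2"),
    ("sw $15, 100($2)", "$2", "$15"),
]

def classify(distance, rs, rt, dest):
    """Return hazard labels such as '1A' or '2B'; empty list means no hazard."""
    if distance > 2:
        return []                      # value is already in the register file
    kind = "1" if distance == 1 else "2"
    labels = []
    if rs == dest:
        labels.append(kind + "A")      # needed as the rs operand
    if rt == dest:
        labels.append(kind + "B")      # needed as the rt operand
    return labels

for distance, (text, rs, rt) in enumerate(CONSUMERS, start=1):
    print(text, classify(distance, rs, rt, PRODUCER_DEST))
```

Under these assumptions the and is a Type 1A hazard, the or is Type 2B, and the add and sw read $2 late enough that no forwarding is needed.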

37
6.4 Data Hazards and Forwarding - Forwarding in Action
38
6.4 Data Hazards and Forwarding - Datapath without and with Forwarding
39
6.4 Data Hazards and Forwarding - Forwarding Unit Implementation
  • EX Hazard
  • MEM Hazard

40
6.4 Data Hazards and Forwarding - Forwarding Unit: A Complication
  • Consider
  • add $1, $1, $2
  • add $1, $1, $3
  • add $1, $1, $4
  • MEM Hazard
  • If (MEM/WB.RegWrite = 1 and (MEM/WB.RegisterRd
    ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)
    and ___________________________________________
    ) ForwardA = 10
  • If (MEM/WB.RegWrite = 1 and (MEM/WB.RegisterRd
    ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)
    and ___________________________________________
    ) ForwardB = 10
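
One way to sketch the forwarding decision for a single ALU operand. This is an illustration under assumptions, not the slide's exact logic: it returns symbolic source names instead of ForwardA/ForwardB bit codes, and it handles the complication by checking the EX/MEM register first so the most recent result wins. PipeReg and forward_source are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class PipeReg:
    """The two fields of a pipeline register that the forwarding unit looks at."""
    reg_write: bool   # will the instruction in this stage write a register?
    rd: int           # destination register number

def forward_source(src_reg, ex_mem, mem_wb):
    """Pick where an operand of the instruction in EX should come from."""
    # EX hazard: the instruction one ahead produces the register we read.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return "EX/MEM"
    # MEM hazard: only reached when there is no EX hazard for this register.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return "MEM/WB"
    return "REGFILE"

# Third "add $1, $1, $4": both older adds write $1; the newer value must win.
print(forward_source(1, PipeReg(True, 1), PipeReg(True, 1)))   # EX/MEM
```

The rd != 0 test keeps a write to $0 from being forwarded, since $0 always reads as zero.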

41
6.4 Data Hazards and Forwarding - Forwarding Datapath with Control
42
6.4 Data Hazards and Forwarding - EX Forwarding Completed
43
6.4 Data Hazards and Forwarding - Forwarding in the MEM Stage
  • Look at what happens when a load is followed by a
    store to the same address. Forwarding is possible
    in the MEM stage.

44
6.5 Data Hazards and Stalls - Forwarding Can't Always Save the Day
45
6.5 Data Hazards and Stalls - Hazard Detection Unit
46
6.5 Data Hazards and Stalls - Stalling in Pictures
47
6.5 Data Hazards and Stalls - Stalling in Pictures
48
6.6 Branch Hazards - Example
49
6.6 Branch Hazards - Approaches
  • Assume Branch Not Taken
  • ___________________________________________
  • ___________________________________________
  • Reducing the Delay of Branches
  • ___________________________________________

50
6.6 Branch Hazards - Datapath Changes
51
6.6 Branch Hazards - Action Shots: Hazard is Detected
  • 36 sub $10, $4, $8
  • 40 beq $1, $3, 7
  • 44 and $12, $2, $5
  • 48 or $13, $2, $6
  • 52 add $14, $4, $2
  • 56 slt $15, $6, $7
  • 72 lw $4, 50($7)

52
6.6 Branch Hazards - Action Shots: and Instruction is Flushed
  • 36 sub $10, $4, $8
  • 40 beq $1, $3, 7
  • 44 and $12, $2, $5
  • 48 or $13, $2, $6
  • 52 add $14, $4, $2
  • 56 slt $15, $6, $7
  • 72 lw $4, 50($7)

53
6.6 Branch Hazards - Branch Prediction
  • Dynamic Branch Prediction
  • Assuming branch not taken is _____________________
    .
  • Dynamic approaches
  • ___________________________
  • ___________________________
  • Loops and Predictions
  • Consider a one-bit prediction scheme. How
    accurate is it for a loop branch that's taken 9
    times and then not taken once?
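
A quick way to check the loop question is to simulate it. This sketch assumes the loop is executed many times in a row and that the predictor bit starts at "not taken"; both are assumptions, and one_bit_accuracy is a made-up name.

```python
def one_bit_accuracy(pattern, passes=100, initial=False):
    """Fraction of correct predictions for a one-bit predictor.

    pattern: branch outcomes for one pass through the loop (True = taken).
    The predictor simply predicts whatever the branch did last time.
    """
    prediction = initial
    correct = total = 0
    for _ in range(passes):
        for taken in pattern:
            correct += (prediction == taken)
            prediction = taken          # one bit of history: remember last outcome
            total += 1
    return correct / total

LOOP = [True] * 9 + [False]             # taken 9 times, then falls through once
print(one_bit_accuracy(LOOP))           # 0.8
```

The predictor misses twice per pass: once on the final not-taken branch, and once more on the first iteration of the next pass, because the exit flipped the bit. So a branch that is taken 90% of the time is predicted correctly only 80% of the time.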

54
6.6 Branch Hazards - Delayed Branch Slot
  • Problem: A branch predictor tells us whether or
    not a branch is taken, but still requires the
    calculation of the _____________
  • Solution: Use a cache to hold the _______________
    or the ___________________

55
6.6 Branch Hazards - Comparing Performance
  • Compare single-cycle, multicycle, and pipeline
  • Functional Unit Times: Memory access 200 ps, ALU
    operation 100 ps, Register file read/write 50
    ps
  • CT_single = 600 ps, CPI_single = 1, CT_multi = 200
    ps, CPI_multi = 4.12
  • Instructions: 25% loads, 10% stores, 11%
    branches, 2% jumps, 52% ALU
  • Pipeline Assumptions: 50% of load instructions
    are followed immediately by a use, the branch
    delay on misprediction is one cycle and 25% of
    the branches are mispredicted, jumps always pay
    one full clock cycle of delay
  • CPI_pipe = 0.25(0.5(1 + 2)) + 0.10(1) + 0.11(0.25(2) + 0.75(1))
    + 0.02(2) + 0.52(1)
  • = 1.17
  • Average instruction time = CPI × CT
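
The arithmetic on this slide can be reproduced with a few lines. The instruction mix and pipeline penalties come from the slide. The per-class cycle counts for the multicycle design and the 200 ps pipelined clock are assumptions; they are the usual textbook values and they do reproduce the slide's CPI of 4.12.

```python
# Instruction mix from the slide (fractions of all instructions).
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}

# Assumed multicycle cycle counts per class.
MULTI_CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}

# Pipelined cycles per class, from the slide's assumptions.
PIPE_CYCLES = {
    "load":   0.5 * 1 + 0.5 * 2,     # half the loads stall one cycle
    "store":  1,
    "branch": 0.25 * 2 + 0.75 * 1,   # 25% mispredicted, one-cycle penalty
    "jump":   2,                     # always pays one extra cycle
    "alu":    1,
}

def weighted_cpi(cycles_per_class):
    return sum(MIX[k] * cycles_per_class[k] for k in MIX)

# Clock cycle times in ps; the pipelined value is an assumption.
CT = {"single": 600, "multi": 200, "pipe": 200}
CPI = {
    "single": 1.0,
    "multi": weighted_cpi(MULTI_CYCLES),
    "pipe": weighted_cpi(PIPE_CYCLES),
}

for design in ("single", "multi", "pipe"):
    # Average instruction time = CPI x CT
    print(design, round(CPI[design], 4), round(CPI[design] * CT[design], 1), "ps")
```

The unrounded pipelined CPI is 1.1725, which the slide rounds to 1.17. Under these assumptions the average instruction times come out to roughly 600 ps, 824 ps and 234.5 ps.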

56
6.6 Branch Hazards - Final Datapath and Control
57
6.8 Exceptions - Datapath Changes
58
6.8 Exceptions - Exception in a Pipelined Computer
  • 40hex sub $11, $2, $4
  • 44hex and $12, $2, $5
  • 48hex or $13, $2, $6
  • 4Chex add $1, $2, $1
  • 50hex slt $15, $6, $7
  • 54hex lw $16, 50($7)
  • 40000040hex sw $25, 1000($0)
  • 40000044hex sw $26, 1004($0)

59
6.8 Exceptions - Exception in a Pipelined Computer
  • 40hex sub $11, $2, $4
  • 44hex and $12, $2, $5
  • 48hex or $13, $2, $6
  • 4Chex add $1, $2, $1
  • 50hex slt $15, $6, $7
  • 54hex lw $16, 50($7)
  • 40000040hex sw $25, 1000($0)
  • 40000044hex sw $26, 1004($0)

60
6.8 Exceptions - More About Exception Causes & Handling
  • Causes of Exceptions
  • _________________
  • _________________
  • _________________
  • _________________
  • Issues
  • Which instruction in the pipeline is responsible
    for the exception?
  • What happens if multiple exceptions occur in a
    single clock cycle?
  • Solutions
  • Prioritize the exceptions - easy to determine
    which is serviced first.
  • Most MIPS implementations have hardware that
    sorts exceptions so that the earliest instruction
    is interrupted.
  • I/O device requests and hardware malfunctions are
    not associated with a specific instruction, so
    there is some flexibility.
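
A minimal sketch of the "earliest instruction first" rule. It assumes each pending exception is tagged with the pipeline stage that raised it; since older instructions sit in later stages, the exception from the latest stage is serviced first. The cause strings are illustrative only.

```python
# Pipeline order: an instruction in a later stage entered the pipeline earlier.
STAGE_ORDER = ["IF", "ID", "EX", "MEM", "WB"]

def exception_to_service(pending):
    """pending: list of (stage, cause) pairs raised in the same clock cycle.

    Returns the pair belonging to the earliest instruction, or None.
    """
    if not pending:
        return None
    return max(pending, key=lambda p: STAGE_ORDER.index(p[0]))

same_cycle = [("ID", "undefined instruction"), ("EX", "arithmetic overflow")]
print(exception_to_service(same_cycle))   # ('EX', 'arithmetic overflow')
```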

61
6.9 Advanced Pipelining - Extracting More Performance
  • Pipelining exploits the potential parallelism,
    called instruction-level parallelism (ILP), among
    instructions.
  • Methods for increasing performance
  • Increase the depth of the pipeline
  • Replicate the internal components of the computer
    so that it can launch multiple instructions in
    every cycle (multiple issue)
  • Launching multiple instructions per cycle allows
    the instruction execution rate to exceed the
    clock rate, so the CPI is less than 1, and we
    consider IPC (instructions per clock cycle) instead
  • Today's high-end processors attempt to issue from
    _______ to _______ instructions in every cycle
  • Multiple Issue
  • Static Multiple Issue
  • Dynamic Multiple Issue

62
6.9 Advanced Pipelining - Extracting More Performance
  • Multiple Issue
  • Static Multiple Issue
  • Dynamic Multiple Issue
  • Issues in Multiple Issue Pipelines
  • Packaging instructions into issue slots
  • Dealing with control and data hazards
  • The Concept of Speculation
  • The compiler or processor guesses about the
    properties of an instruction, so as to enable
    execution to begin for dependent instructions.
  • Any speculation mechanism must be able to check
    _______________ ______________ and to undo
    ____________________________.
  • Speculation in software requires
    ________________________, speculation in hardware
    usually requires __________________
    _____________________________.

63
6.9 Advanced Pipelining - Extracting More Performance: Static Multiple Issue
  • The set of instructions that issue in a given
    clock cycle is called an issue packet.
  • An issue packet is basically _____________________
    ______ with _____________________(_____).
  • An Example Static Multiple Issue with the MIPS
    ISA
  • Two-issue
  • _____________________
  • ______________________

64
6.9 Advanced Pipelining - Extracting More Performance: Static Two-Issue Datapath
65
6.9 Advanced Pipelining - Extracting More Performance: Multiple-Issue Scheduling
  • Loop: lw $t0, 0($s1)
  • addu $t0, $t0, $s2
  • sw $t0, 0($s1)
  • addi $s1, $s1, -4
  • bne $s1, $zero, Loop

66
6.9 Advanced Pipelining - Extracting More Performance: Multiple-Issue Scheduling
  • Loop: lw $t0, 0($s1)
  • lw $t1, -4($s1)
  • addu $t0, $t0, $s2
  • addu $t1, $t1, $s2
  • sw $t0, 0($s1)
  • sw $t1, -4($s1)
  • addi $s1, $s1, -8
  • bne $s1, $zero, Loop

67
6.9 Advanced Pipelining - Extracting More Performance: Multiple-Issue Scheduling
  • Loop: lw $t0, 0($s1)
  • lw $t1, -4($s1)
  • lw $t2, -8($s1)
  • lw $t3, -12($s1)
  • addu $t0, $t0, $s2
  • addu $t1, $t1, $s2
  • addu $t2, $t2, $s2
  • addu $t3, $t3, $s2
  • sw $t0, 0($s1)
  • sw $t1, -4($s1)
  • sw $t2, -8($s1)
  • sw $t3, -12($s1)
  • addi $s1, $s1, -16
  • bne $s1, $zero, Loop

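
Before scheduling an unrolled loop it helps to confirm that unrolling did not change what the loop computes. The sketch below models memory as a Python dict keyed by byte address and walks through the original loop and the four-times-unrolled loop statement by statement. It assumes the iteration count is a multiple of four and ignores 32-bit wraparound in addu; the function names are made up.

```python
def run_original(mem, s1, s2):
    mem = dict(mem)                  # work on a copy
    while True:
        t0 = mem[s1]                 # lw   $t0, 0($s1)
        t0 = t0 + s2                 # addu $t0, $t0, $s2
        mem[s1] = t0                 # sw   $t0, 0($s1)
        s1 = s1 - 4                  # addi $s1, $s1, -4
        if s1 == 0:                  # bne  $s1, $zero, Loop
            break
    return mem

def run_unrolled4(mem, s1, s2):
    mem = dict(mem)
    while True:
        t0 = mem[s1]                 # four loads, each into its own register
        t1 = mem[s1 - 4]
        t2 = mem[s1 - 8]
        t3 = mem[s1 - 12]
        t0 = t0 + s2                 # four independent adds
        t1 = t1 + s2
        t2 = t2 + s2
        t3 = t3 + s2
        mem[s1] = t0                 # four stores
        mem[s1 - 4] = t1
        mem[s1 - 8] = t2
        mem[s1 - 12] = t3
        s1 = s1 - 16                 # addi $s1, $s1, -16
        if s1 == 0:                  # bne  $s1, $zero, Loop
            break
    return mem

# Eight words at byte addresses 4, 8, ..., 32; start at the highest address.
MEM = {addr: 10 * addr for addr in range(4, 36, 4)}
print(run_original(MEM, 32, 7) == run_unrolled4(MEM, 32, 7))   # True
```

Using separate registers $t0 to $t3 for the four copies is what removes the false dependences between iterations, so the loads, adds and stores can be interleaved freely by the scheduler.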
68
6.9 Advanced Pipelining - Extracting More Performance: Dynamic Multiple-Issue
  • Dynamic Pipelining

69
6.10 Real Stuff - The Pentium 4 Pipeline
  • Characteristics

70
6.11 Fallacies and Pitfalls
  • Fallacy - Pipelining is easy.
  • Fallacy - Pipelining ideas can be implemented
    independent of technology.
  • Pitfall - Failure to consider instruction set
    design can adversely impact pipelining.