Appendix A. Pipelining: Basic and Intermediate Concept

1
Appendix A. Pipelining: Basic and Intermediate
Concept
Rung-Bin Lin

What is Pipelining?
Pipelining is an implementation technique whereby
multiple instructions are overlapped in execution.
Pipe stage (pipe segment)
Throughput
Machine cycle: the time required to move
an instruction one step down the pipeline. This
time is equal to the time required for the
slowest pipe stage.
In a computer, the machine cycle is usually one
clock cycle.
The pipeline designer's goal is to balance the
length of each pipe stage.
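If the stages are perfectly balanced, the time per
instruction on the pipelined machine equals the
time per instruction on the unpipelined machine
divided by the number of pipe stages.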

2
A Simple Implementation of A RISC ISA

Five-cycle implementation
Instruction fetch cycle (IF)
Instruction decode/register fetch cycle (ID)
Operand fetches
Sign-extending the immediate field
Decoding is done in parallel with reading
registers. This technique is known as fixed-field
decoding
Test the branch condition and compute the branch
target address; the branch completes at the end
of this cycle.
Execution/effective address cycle (EX)
Memory reference
Register-Register ALU instruction
Register-Immediate ALU instruction
Memory access/branch completion cycle (MEM)
Write-back cycle (WB)
Register-Register ALU instruction
Register-Immediate ALU instruction
Load instruction

3
Performance of the Five-Cycle Implementation

CPI = 4.54
Branch instructions (12%) take 2 cycles
Store instructions (10%) require 4 cycles
Others take 5 cycles
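
The CPI figure can be checked as a weighted average. This is a minimal sketch that reads the slide's 12 and 10 as percentages and assumes the remaining 78% of instructions are the "others"; the names `mix` and `average_cpi` are illustrative only.

```python
# Instruction mix: (fraction of instructions, cycles per instruction).
mix = {
    "branch": (0.12, 2),
    "store": (0.10, 4),
    "other": (0.78, 5),  # assumed remainder: 1 - 0.12 - 0.10
}

def average_cpi(mix):
    """Weighted average of cycles per instruction over the mix."""
    return sum(fraction * cycles for fraction, cycles in mix.values())

print(round(average_cpi(mix), 2))  # 0.24 + 0.40 + 3.90
```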

4
The Classic Five-Stage Pipeline for a RISC
Processor
5
The RISC Pipeline with Registers
6
Instruction Issue

The process of letting an instruction move from
the instruction decode stage (ID) into execution
stage (EX) of this pipeline.

7
Basic Performance Issues in Pipelining

Pipelining increases instruction execution
throughput, but it does not reduce the execution
time of an individual instruction; pipeline
overhead in fact slightly increases it.
Register delay
Clock skew
The limitation of pipeline depth is due to
Pipeline latency
Pipe stage imbalance
Pipeline overhead
Example in A-10.
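
The effect of overhead on speedup can be sketched numerically. The clock period, instruction mix, and overhead below are assumed values chosen for illustration, not necessarily those of the referenced example; `pipelining_speedup` is a hypothetical helper.

```python
def pipelining_speedup(clock_ns, mix, overhead_ns):
    """Speedup of an ideal pipeline (CPI = 1) over a multicycle design.

    mix: list of (fraction, cycles) pairs for the unpipelined machine.
    overhead_ns: register delay plus clock skew added to each stage.
    """
    unpipelined_time = clock_ns * sum(f * c for f, c in mix)
    pipelined_time = clock_ns + overhead_ns  # one instruction per cycle
    return unpipelined_time / pipelined_time

# Assumed: 1 ns clock, 40% ALU ops (4 cycles), 20% branches (4 cycles),
# 40% memory ops (5 cycles), 0.2 ns of pipeline overhead.
mix = [(0.4, 4), (0.2, 4), (0.4, 5)]
speedup = pipelining_speedup(1.0, mix, 0.2)
print(round(speedup, 2))
```

With these numbers the speedup is well below the pipeline depth of 5, which is the point of the overhead argument.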

8
The Major Hurdle of Pipelining - Pipeline
Hazards

A hazard is a situation that prevents the next
instruction in the instruction stream from
executing during its designated clock cycle.
Three classes of hazards
Structural hazards: arise from resource conflicts.
Data hazards: arise when an instruction depends on
the results of a previous instruction.
Control hazards: arise from branches and other
instructions that change the PC.
A pipeline can be stalled by a hazard. When an
instruction is stalled,
Instructions issued later than the stalled
instruction are also stalled.
Instructions issued earlier than the stalled one
must continue.
Note that a cache miss stalls the whole pipeline.

9
Performance of Pipeline with Stalls

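When pipelining is thought of as decreasing the
CPI,
Speedup = CPI unpipelined / (1 + Pipeline stall
cycles per instruction)
If every instruction takes the same number of
cycles, equal to the pipeline depth, this becomes
Speedup = Pipeline depth / (1 + Pipeline stall
cycles per instruction)

When pipelining is thought of as improving the
clock cycle time,
Speedup = 1 / (1 + Pipeline stall cycles per
instruction) x (Clock cycle unpipelined / Clock
cycle pipelined)
With perfectly balanced stages and no overhead,
the clock cycle ratio equals the pipeline depth.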
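
A small sketch of how stalls erode speedup, under the usual simplifying assumptions that the ideal pipelined CPI is 1, the stages are perfectly balanced, and overhead is ignored. In that case speedup = depth / (1 + stall cycles per instruction). The function name is illustrative.

```python
def speedup_with_stalls(pipeline_depth, stall_cycles_per_instr):
    """Speedup over the unpipelined machine, assuming ideal CPI = 1."""
    return pipeline_depth / (1.0 + stall_cycles_per_instr)

print(speedup_with_stalls(5, 0.0))   # no stalls: speedup equals the depth
print(speedup_with_stalls(5, 0.25))  # a quarter of a stall cycle per instruction
```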

11
Structural Hazards

Due to resource conflicts (Example in A-14)
Some functional unit is not fully
pipelined.
Some resource has not been duplicated
enough.

12
Data Hazards

A memory access depends on the results of
unfinished instructions.

13
Forwarding (Bypassing) ALU Results To Minimize
Hazards
14
Forwarding (Bypassing) Results to Store
15
Bypassing Results of LOAD
16
Data Hazard Classification

Consider two instructions i and j, with i
occurring before j. The possible hazards are:
RAW (read after write): j tries to read a source
before i writes it.
WAW (write after write): j tries to write an
operand before it is written by i. For example,
LW   R1, 0(R2)     IF  ID  EX  MEM1  MEM2  WB
DADD R1, R2, R3        IF  ID  EX    WB
WAR (write after read): j tries to write a
destination before it is read by i. For example,
if the read is done in the second half of MEM2 and
the write is done in the first half of WB:
SW   0(R1), R2     IF  ID  EX  MEM1  MEM2  WB
DADD R2, R3, R4        IF  ID  EX    WB
RAR (read after read): not a hazard.
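
The classification can be sketched as a register comparison between i and j. This only finds the register dependences that could become hazards; whether one actually occurs depends on the pipeline timing, as in the examples above. The dictionary layout and the name `classify_hazards` are assumptions made for this sketch.

```python
def classify_hazards(i, j):
    """Return the potential hazard classes between i and a later j.

    Each instruction is a dict with 'dest' (register name or None)
    and 'srcs' (set of register names).
    """
    hazards = set()
    if i["dest"] is not None and i["dest"] in j["srcs"]:
        hazards.add("RAW")  # j reads what i writes
    if i["dest"] is not None and i["dest"] == j["dest"]:
        hazards.add("WAW")  # both write the same register
    if j["dest"] is not None and j["dest"] in i["srcs"]:
        hazards.add("WAR")  # j writes what i reads
    return hazards

lw = {"dest": "R1", "srcs": {"R2"}}            # LW   R1, 0(R2)
dadd1 = {"dest": "R1", "srcs": {"R2", "R3"}}   # DADD R1, R2, R3
sw = {"dest": None, "srcs": {"R1", "R2"}}      # SW   0(R1), R2
dadd2 = {"dest": "R2", "srcs": {"R3", "R4"}}   # DADD R2, R3, R4

print(classify_hazards(lw, dadd1))
print(classify_hazards(sw, dadd2))
```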

17
Data Hazards Requiring Stalls

Pipeline interlock
A piece of hardware that detects a hazard and
stalls the pipeline until the hazard is cleared.
Load interlock
Example (Fig. A.10 at A-21)
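
A load interlock can be sketched as a check on adjacent instructions: if a load is immediately followed by an instruction that reads the loaded register, one stall cycle is inserted. The sketch assumes a five-stage pipeline with full forwarding, so only this load-use case stalls; the program and the tuple layout `(op, dest, srcs)` are illustrative.

```python
def count_load_stalls(program):
    """Count one-cycle load-use stalls in a straight-line program.

    program: list of (op, dest, srcs) tuples in program order.
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_srcs = curr
        if prev_op == "LD" and prev_dest in curr_srcs:
            stalls += 1  # the interlock holds curr in ID for one cycle
    return stalls

program = [
    ("LD", "R1", ("R2",)),         # LD   R1, 0(R2)
    ("DSUB", "R4", ("R1", "R5")),  # DSUB R4, R1, R5  (uses R1 right away)
    ("AND", "R6", ("R1", "R7")),   # AND  R6, R1, R7  (far enough behind the load)
]
print(count_load_stalls(program))
```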

18
Control Hazards

Caused by instructions that change the PC.
Some basics
If a branch changes the PC to its target address,
it is a taken branch. If it does not change the
PC, it falls through or it is not taken.
Recall that if an instruction i is a taken
branch, the PC is normally not changed until the
end of ID. A stall cycle is required.
Branch instruction      IF  ID  EX  MEM  WB
Branch successor            IF  IF  ID   EX   MEM  WB
Branch successor + 1                IF   ID   EX   MEM  WB
Branch successor + 2                     IF   ID   EX   MEM  WB

19
Branch Penalty

Branch delay: the length of a control hazard.
Branch penalty: the branch delay, unless it is
dealt with, turns into branch penalty.
The deeper the pipeline, the worse the branch
penalty.
The number of branch stalls can be reduced by two
steps
Find out whether the branch is taken or not taken
earlier in the pipeline.
Compute the taken PC (i.e., the address of the
branch target) earlier.
Branch behavior in programs
Average frequency of taken branches: 67%
60% of the forward branches are taken.
85% of the backward branches are taken.
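
The cost of branch stalls can be sketched as an addition to the ideal CPI. The 12% branch frequency is borrowed from the earlier performance slide and the one-cycle penalty from the stall diagram; both are assumptions for this illustration, as is the helper name `cpi_with_branches`. The `taken_fraction` parameter models a scheme in which only taken branches pay the penalty.

```python
def cpi_with_branches(branch_freq, branch_penalty, taken_fraction=1.0,
                      ideal_cpi=1.0):
    """Effective CPI when a fraction of branches pays a stall penalty."""
    return ideal_cpi + branch_freq * taken_fraction * branch_penalty

# Every branch stalls for one cycle.
print(round(cpi_with_branches(0.12, 1), 4))
# Only taken branches (67% of them) stall for one cycle.
print(round(cpi_with_branches(0.12, 1, taken_fraction=0.67), 4))
```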

20
Reducing Pipeline Branch Penalties

Static branch prediction methods (Compile-time
guess).
Freeze or flush the pipeline
Holding or deleting any instructions after the
branch until the branch destination is known.
Predict-not-taken (untaken) (Fig. A.12 in A-23)
Predict-taken
Does it have any advantage in this pipeline? No:
the target address is not known any earlier than
the branch outcome.
Delayed branch
The execution order with a branch delay of n is
Branch instruction
Sequential successor 1
Sequential successor 2
...
Sequential successor n (n = 1 for MIPS)
Branch target if taken

21
Scheduling the Branch Delay Slot
22
Effectiveness of Scheduling Branch Delay Slots

Requirements for being effective
Scheduling from before: always
Scheduling from target: when the branch is taken
Scheduling from fall-through: when the branch is
not taken
The limitation on delayed-branch scheduling
arises from
The restrictions on the instructions that are
scheduled into the delay slots.
The ability to predict at compile time whether a
branch is likely to be taken or not.
Using canceling (nullifying) branches to relieve
the limits
In a canceling branch, the instruction includes
the direction that the branch was predicted. When
the branch behaves as predicted, the instruction
in the branch delay slot is simply executed.
Otherwise, the instruction in the branch delay
slot is simply turned into a No-Op.
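
The canceling-branch rule reduces to one comparison. A minimal sketch, with an illustrative function name:

```python
def delay_slot_action(predicted_taken, actually_taken):
    """What happens to the delay-slot instruction of a canceling branch."""
    if predicted_taken == actually_taken:
        return "execute"  # branch behaved as predicted
    return "no-op"        # mispredicted: the slot instruction is cancelled

print(delay_slot_action(predicted_taken=True, actually_taken=True))
print(delay_slot_action(predicted_taken=True, actually_taken=False))
```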

23
How Is Pipelining Implemented?

Unpipelined 5-cycle implementation

24
Simple Pipelining Implementation for MIPS
25
Implementing the Control for MIPS Pipeline

Implementing the control focuses on detecting
hazards and generating control signals for
forwarding.
Hazard detection
All the data hazards can be checked and
forwarding control signals can be set during the
ID phase. If a data hazard exists, the
instruction is stalled before it is issued.
Or, alternatively, hazards and forwarding are checked
at the beginning of a clock cycle that uses an
operand (EX and MEM for the MIPS pipeline).
Implementing the logic for hazard detection
Hazard detection by comparing the destination and
sources of adjacent instructions (fig. A.20 on
page A-34).
An example detects all load interlocks while
the instruction using the load result is in the
ID stage (fig. A.21 on page A-34).

26
Implementing Forwarding Logic

Forwarding sources: ALU output or data memory output.
Forwarding destinations: ALU input, data memory
input, or zero detection unit (for branches).
Forwarding can be implemented by checking the
following conditions:
EX/MEM.IR.destination = ID/EX.IR.source?
MEM/WB.IR.destination = ID/EX.IR.source?
MEM/WB.IR.destination = EX/MEM.IR.source?
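
The first two comparisons can be sketched for the ALU inputs as follows. This is a simplified illustration, not the full control logic: it treats the pipeline-register destinations as plain register names, ignores which opcodes actually write a register, and prefers the EX/MEM value because it is the more recent result. R0 is skipped because it is hardwired to zero in MIPS. The function name is hypothetical.

```python
def forwarding_sources(ex_mem_dest, mem_wb_dest, id_ex_srcs):
    """Map each source of the instruction in ID/EX to the pipeline
    register its value should be forwarded from, if any."""
    forward = {}
    for src in id_ex_srcs:
        if src == "R0":
            continue  # always zero, never forwarded
        if src == ex_mem_dest:
            forward[src] = "EX/MEM"  # newest result wins
        elif src == mem_wb_dest:
            forward[src] = "MEM/WB"
    return forward

# DADD R1, R2, R3 is one stage ahead of DSUB R4, R1, R5.
print(forwarding_sources("R1", None, ["R1", "R5"]))
```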

27
Forwarding Data to the Two ALU Inputs
28
Dealing with Branches in the Pipeline
29
What Makes Pipelining Hard to Implement

Exception (interrupt, fault) makes pipelining
difficult to implement.
Instruction set complications

30
Types of Exceptions

Types
I/O device request
Invoking an OS service from a user program
Tracing instruction execution
Breakpoint
Integer arithmetic overflow or underflow
FP arithmetic anomaly
Page fault
Misaligned memory access
Memory-protection violation
Using an undefined instruction
Hardware malfunction
Power failure
Exceptions for different architectures (fig. A.26
on page A-40).

31
Classification of Exceptions

Synchronous versus asynchronous
If the event occurs at the same place every time
that the program is executed with the same data
and memory allocation, the event is called
synchronous.
User requested versus coerced
User maskable versus nonmaskable
Within versus between instruction
Depends on whether the event prevents instruction
completion by occurring in the middle of
execution or whether it is recognized between
instructions.
Resume versus terminate (fig. 3.40 on page 182).

32
Action Requirements for Different Exception Types
(Fig. A.27 on page A-42)

Actions
Resume
Terminate
The most difficult exceptions have two
properties
They occur within instructions (i.e., in the EX or
MEM stages).
They must be restartable (must save the PC of the
instruction at which to restart).

33
Exception Handling

Stopping and restarting execution
Force a trap instruction on the next IF
Until the trap is taken, turn off all writes for
the faulting instruction and for all instructions
that follow in the pipeline.
After the exception-handling routine in the
operating system receives control, it immediately
saves the PC of the faulting instruction.
IF   ID   EX   MEM  WB                       <--- Faulting instruction
     IF   ID   EX   MEM  WB
          IF   ID   EX   MEM  WB
               IF   ID   EX   MEM  WB
                    IF   ID   EX   MEM
Trap instruction ->      IF   ID   EX
If delayed branch is used, we need to save and
restore as many PCs as the length of the branch
delay plus one.

34
Precise Interrupt

An interrupt is precise if the pipeline can be
stopped so that the instructions just before the
faulting instruction are completed and those
after it can be restarted from scratch.
Supporting precise interrupts is a requirement in
many systems.
Exceptions in DLX
With pipelining, multiple exceptions may occur in
the same clock cycle. (fig. A.28 on page A-44).

35
Implementations of Precise Exceptions

Principle
The pipeline should be able to handle the
exceptions caused by instruction i prior to the
exceptions caused by instruction i+1.
Implementation
Hardware posts all exceptions caused by a given
instruction in a status vector associated with
that instruction.
Once an exception indication is set in the
exception status vector, any control signal that
may cause a data value to be written is turned
off.
When an instruction enters WB, the exception
status vector is checked; if any exceptions are
posted, they are handled in the order in which
they would occur in time on an unpipelined
machine.
This guarantees that all exceptions will be
seen on instruction i before any are seen on i+1.
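
The status-vector scheme can be sketched as a check made in program order when each instruction reaches WB. The example models a load with a data page fault in MEM followed by an instruction whose fetch faults in IF: the second fault happens earlier in time, but the first instruction's exception is the one handled. The data layout and the name `first_exception_at_wb` are assumptions of this sketch.

```python
# Pipe stages in the order an unpipelined machine would reach them.
STAGE_ORDER = ["IF", "ID", "EX", "MEM"]

def first_exception_at_wb(instructions):
    """Return (instruction name, exception) for the first exception seen.

    instructions: list in program order, which is also the order in
    which they reach WB. Each has a 'status_vector' list of
    (stage, description) pairs posted while it moved down the pipeline.
    """
    for instr in instructions:
        posted = instr["status_vector"]
        if posted:
            # Within one instruction, the earliest stage comes first.
            _, description = min(posted, key=lambda e: STAGE_ORDER.index(e[0]))
            return instr["name"], description
    return None

program = [
    {"name": "LD", "status_vector": [("MEM", "data page fault")]},
    {"name": "DADD", "status_vector": [("IF", "instruction page fault")]},
]
print(first_exception_at_wb(program))
```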

36
Instruction Committed

When an instruction is guaranteed to complete, it
is called committed.
In the MIPS pipeline, all instructions are
committed when they reach the end of the MEM
stage and no instruction updates the state before
that stage. Thus precise exceptions are
straightforward.

37
Instruction Set Complications

Some machines have instructions that change the
state in the middle of instruction execution.
VAX: autoincrement addressing mode.
VAX or IBM 360: string copy.
Implicitly set condition code.
Cause difficulties in scheduling any pipeline
delays between setting condition code and the
branch.
ADD XXX        <--- Sets condition code C.
...            <--- Cannot place instructions that change C here.
BR C, YYY      <--- Uses C for the branch.
In fact, the condition code must be treated as an
operand that requires hazard detection for RAW
hazards with branches, regardless of whether the
condition code is set implicitly or explicitly.
Multicycle operations in VAX

38
Extending the MIPS Pipeline to Handle Multi-Cycle
Operations

Assuming four separate functional units in our
MIPS implementation
Integer unit
Handle loads and stores, ALU operations and
branches.
FP and integer multiplier
FP adder
FP and integer divider
If an instruction cannot proceed to the EX stage,
the entire pipeline behind that instruction
will be stalled.

39
MIPS Pipeline with Multi-cycle Functional Units
40
Pipelining Multi-cycle Functional Units
41
Latency and Initiation (Repeat) Interval

Latency
The number of intervening cycles between an
instruction that produces a result and an
instruction that uses the result.
Initiation (repeat) interval
The number of cycles that must elapse between
issuing two operations of a given type.
Latency and initiation interval for pipelining
multi-cycle functional units
Functional unit          Latency   Initiation interval
Integer ALU                 0          1
Data memory access          1          1
FP add                      3          1
FP (integer) multiply       6          1
FP (integer) divide        24         25
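
Given the definition of latency as the number of intervening cycles between producer and consumer, the stall count for a dependent instruction can be sketched as below. The sketch assumes full forwarding, single issue, and no other stalls; the unit keys and function name are illustrative.

```python
# Latency values from the table above.
LATENCY = {"int_alu": 0, "mem": 1, "fp_add": 3, "fp_mul": 6, "fp_div": 24}

def raw_stall_cycles(producer_unit, independent_between=0):
    """Stall cycles for a consumer that follows its producer with
    `independent_between` unrelated instructions in between."""
    return max(0, LATENCY[producer_unit] - independent_between)

print(raw_stall_cycles("fp_add"))     # consumer immediately follows an FP add
print(raw_stall_cycles("fp_mul", 2))  # two independent instructions in between
print(raw_stall_cycles("mem", 3))     # far enough away: no stall
```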

42
Hazards and Forwarding in Longer Latency Pipelines

Hazard detection and forwarding for a pipeline as
before.
Structural hazards can occur because the divide
unit is not fully pipelined.
The number of register writes required in a cycle
can be larger than 1 because instructions have
varying running times.
WAW hazards are possible, but WAR hazards are not
possible.
Instructions can complete in a different order
than they were issued, causing problems with
exceptions.
Stalls for RAW hazards will be more frequent
because of longer latency.
Assuming all hazard detection is done in ID,
three checks must be done before issuing an
instruction
Check for structural hazards
Check for a RAW data hazard
Check for a WAW data hazard

43
RAW Hazards Caused by Longer Pipeline

Fig. A.33

44
Structural Hazards in Longer Pipeline

Fig. A.34

45
Maintaining Precise Exceptions (1)

Problems caused by out-of-order completion
DIV.D F0, F2, F4
ADD.D F10, F10, F8
SUB.D F12, F12, F14
Four possible approaches
Ignore the problem and settle for imprecise
exceptions.
Buffer the results of an operation until all the
operations that were issued earlier are
completed.
History file approach: buffer the original
register values.
Future file approach: keep the newer values of
registers.
Allow the exceptions to become somewhat
imprecise, but keep enough information so that
the trap-handling routines can create a precise
sequence for exceptions. This means knowing what
operations were in the pipeline and their PCs.
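
Out-of-order completion for the three-instruction sequence above can be sketched by computing when each instruction reaches WB. The EX-stage cycle counts (24 for divide, 4 for add and subtract) are assumptions for illustration, and the model ignores stalls; the three instructions share no registers, so none are needed.

```python
# Assumed number of EX cycles per operation (illustrative values).
EX_CYCLES = {"DIV.D": 24, "ADD.D": 4, "SUB.D": 4}

def wb_cycle(issue_index, op):
    """Cycle in which an instruction reaches WB.

    The instruction at issue_index (0-based) is fetched in cycle
    issue_index + 1, then spends 1 cycle in ID, EX_CYCLES[op] cycles
    in EX, 1 in MEM, and reaches WB in the following cycle.
    """
    fetch_cycle = issue_index + 1
    return fetch_cycle + 1 + EX_CYCLES[op] + 1 + 1

sequence = ["DIV.D", "ADD.D", "SUB.D"]
finish = {op: wb_cycle(k, op) for k, op in enumerate(sequence)}
completion_order = sorted(sequence, key=lambda op: finish[op])
print(finish)
print(completion_order)
```

If DIV.D raises an exception late in its execution, ADD.D and SUB.D have already written their results, which is the situation the approaches above have to deal with.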

46
Maintaining Precise Exceptions (2)

Worst-case scenario
Instruction 1: a long-running instruction that
eventually interrupts execution.
Instruction 2: not completed.
...
Instruction n-1: not completed.
Instruction n: completed. <-- The latest
completed instruction.
The software must simulate instructions 1
through n-1 and restart execution
at instruction n+1.
The fourth approach allows instruction issue to
continue only if it is certain that all the
instructions before the issuing instruction will
complete without causing an exception. This
sometimes means stalling the machine to maintain
precise exceptions.