Cosc 2150 - PowerPoint PPT Presentation

About This Presentation

Title:

Cosc 2150

Description:

Cosc 2150, Chapter 9a: Instruction Level Parallelism and Superscalar Processors. Speedups of Machine Organizations Without Procedural Dependencies ...

Slides: 38

Provided by: csUwyoEd

Learn more at: https://www.cs.uwyo.edu

Transcript and Presenter's Notes

Title: Cosc 2150

1
Cosc 2150

Chapter 9 a
Instruction Level Parallelism
and Superscalar Processors

2
Introduction

Before we look at the different methods used to
increase the speed of processors, we need to take
a closer look at the fetch/execute cycle

3
Micro-Operations

A computer executes a program
Fetch/Execute cycle
Each cycle has a number of steps
see pipelining
Called micro-operations
Each step does very little
Atomic operation of CPU

4
Constituent Elements of Program Execution
5
Fetch - 4 Registers

Memory Address Register (MAR)
Connected to address bus
Specifies address for read or write op
Memory Buffer Register (MBR)
Connected to data bus
Holds data to write or last data read
Program Counter (PC)
Holds address of next instruction to be fetched
Instruction Register (IR)
Holds last instruction fetched

6
Fetch Sequence

Address of next instruction is in PC
Address (MAR) is placed on address bus
Control unit issues READ command
Result (data from memory) appears on data bus
Data from data bus copied into MBR
PC incremented by 1 (in parallel with data fetch
from memory)
Data (instruction) moved from MBR to IR
MBR is now free for further data fetches

7
Fetch Sequence (symbolic)

(tx = time unit/clock cycle)
t1: MAR <- (PC)
t2: MBR <- (memory)
    PC <- (PC) + 1
t3: IR <- (MBR)
or
t1: MAR <- (PC)
t2: MBR <- (memory)
t3: PC <- (PC) + 1
    IR <- (MBR)
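
The first grouping above can be traced in a few lines of Python. This is only a sketch: the dictionary-based register file, the toy memory contents, and the function name fetch are assumptions made for illustration, not part of the slides.

```python
def fetch(state, memory):
    """Apply the fetch micro-operations in the order of the first grouping."""
    # t1: MAR <- (PC)
    state["MAR"] = state["PC"]
    # t2: MBR <- (memory) and PC <- (PC) + 1 (independent, so they share a cycle)
    state["MBR"] = memory[state["MAR"]]
    state["PC"] = state["PC"] + 1
    # t3: IR <- (MBR)
    state["IR"] = state["MBR"]
    return state


# Assumed toy memory: address -> instruction text
memory = {100: "ADD R1,X", 101: "BSA Y"}
state = fetch({"PC": 100, "MAR": 0, "MBR": 0, "IR": None}, memory)
print(state)
```

After the call, IR holds the instruction that was at address 100 and PC already points at the next instruction.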

8
Rules for Clock Cycle Grouping

Proper sequence must be followed
MAR <- (PC) must precede MBR <- (memory)
Conflicts must be avoided
Must not read and write same register at same time
MBR <- (memory) and IR <- (MBR) must not be in same
cycle
Also PC <- (PC) + 1 involves addition
Use ALU
May need additional micro-operations
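
The conflict rule can be checked mechanically. In the sketch below each micro-operation is modelled as a (destination, sources) pair; that encoding and the name conflicts are assumptions made for illustration. Two different micro-operations conflict in one cycle if one writes a register that the other reads or writes.

```python
def conflicts(group):
    """Return True if two different micro-ops in one cycle clash on a register."""
    for i, (dest_a, srcs_a) in enumerate(group):
        for dest_b, srcs_b in group[i + 1:]:
            if dest_a == dest_b or dest_a in srcs_b or dest_b in srcs_a:
                return True
    return False


mbr_load = ("MBR", {"MAR", "memory"})  # MBR <- (memory)
pc_inc = ("PC", {"PC"})                # PC <- (PC) + 1
ir_load = ("IR", {"MBR"})              # IR <- (MBR)

print(conflicts([mbr_load, pc_inc]))   # False: may share a cycle
print(conflicts([mbr_load, ir_load]))  # True: MBR written and read together
```

The check covers only register conflicts; the ordering rule (MAR must be loaded before memory is read) still has to be enforced by the sequence of cycles.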

9
Indirect Cycle

MAR <- (IR(address)) - address field of IR
MBR <- (memory)
IR(address) <- (MBR(address))
MBR contains an address
IR is now in same state as if direct addressing
had been used
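
A minimal sketch of the indirect cycle, assuming the IR is modelled as a dictionary with opcode and address fields and that memory is a plain address-to-value mapping (both are illustrative choices, not from the slides):

```python
def indirect(state, memory):
    """Replace the indirect address in IR with the operand's real address."""
    # MAR <- (IR(address))
    state["MAR"] = state["IR"]["address"]
    # MBR <- (memory)
    state["MBR"] = memory[state["MAR"]]
    # IR(address) <- (MBR(address))
    state["IR"]["address"] = state["MBR"]
    return state


memory = {20: 300, 300: 7}  # location 20 holds the address of the operand
state = {"IR": {"opcode": "ADD", "address": 20}, "MAR": 0, "MBR": 0}
indirect(state, memory)
print(state["IR"])
```

After the cycle the address field holds 300, exactly as if the instruction had named location 300 directly.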

10
Interrupt Cycle

t1: MBR <- (PC)
t2: MAR <- save-address
    PC <- routine-address
t3: memory <- (MBR)
This is a minimum
May be additional micro-ops to get addresses
N.B. saving context is done by interrupt handler
routine, not micro-ops
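
The minimum interrupt sequence, sketched in Python. The concrete values passed as save_address and routine_address are hypothetical; a real processor obtains them in an implementation-specific way, which is what the "additional micro-ops" remark refers to.

```python
def interrupt(state, memory, save_address, routine_address):
    """Save the return address and jump to the interrupt routine."""
    # t1: MBR <- (PC)
    state["MBR"] = state["PC"]
    # t2: MAR <- save-address; PC <- routine-address
    state["MAR"] = save_address
    state["PC"] = routine_address
    # t3: memory <- (MBR)
    memory[state["MAR"]] = state["MBR"]
    return state


memory = {}
state = {"PC": 101, "MAR": 0, "MBR": 0}
interrupt(state, memory, save_address=500, routine_address=900)
print(state["PC"], memory[500])
```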

11
Execute Cycle

Different for each instruction
In general, complete the task of the instruction
Example
ADD R1,X - add the contents of location X to
Register 1, result in R1
t1: MAR <- (IR(address))
t2: MBR <- (memory)
t3: R1 <- (R1) + (MBR)
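
The ADD example can be traced the same way. The register values and the memory layout below are assumed for illustration.

```python
def execute_add(state, memory):
    """Execute cycle for ADD R1,X."""
    # t1: MAR <- (IR(address))
    state["MAR"] = state["IR"]["address"]
    # t2: MBR <- (memory)
    state["MBR"] = memory[state["MAR"]]
    # t3: R1 <- (R1) + (MBR)
    state["R1"] = state["R1"] + state["MBR"]
    return state


memory = {300: 7}  # location X = 300 holds 7
state = {"IR": {"opcode": "ADD", "address": 300}, "R1": 5, "MAR": 0, "MBR": 0}
execute_add(state, memory)
print(state["R1"])
```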

12
Execute Cycle (BSA)

BSA X - Branch and save address
Address of instruction following BSA is saved in
X
Execution continues from X+1
t1: MAR <- (IR(address))
    MBR <- (PC)
t2: PC <- (IR(address))
    memory <- (MBR)
t3: PC <- (PC) + 1
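
A sketch of the BSA execute cycle, assuming the fetch cycle has already advanced PC so that it holds the address of the instruction following BSA. The addresses used are illustrative.

```python
def execute_bsa(state, memory):
    """Execute cycle for BSA X: save return address in X, continue at X + 1."""
    # t1: MAR <- (IR(address)); MBR <- (PC)
    state["MAR"] = state["IR"]["address"]
    state["MBR"] = state["PC"]
    # t2: PC <- (IR(address)); memory <- (MBR)
    state["PC"] = state["IR"]["address"]
    memory[state["MAR"]] = state["MBR"]
    # t3: PC <- (PC) + 1
    state["PC"] = state["PC"] + 1
    return state


memory = {}
state = {"IR": {"opcode": "BSA", "address": 200}, "PC": 101, "MAR": 0, "MBR": 0}
execute_bsa(state, memory)
print(memory[200], state["PC"])
```

Location 200 now holds the return address 101, and execution continues from 201.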

13
Instruction Cycle

Each phase decomposed into sequence of elementary
micro-operations
E.g. fetch, indirect, and interrupt cycles
Execute cycle
One sequence of micro-operations for each opcode
Need to tie sequences together
Assume new 2-bit register
Instruction cycle code (ICC) designates which
part of cycle processor is in
00 Fetch
01 Indirect
10 Execute
11 Interrupt
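
The ICC can be read as the state of a small state machine. The slide gives only the four codes; the transition rules in this sketch (fetch goes to indirect only when indirect addressing is used, execute goes to interrupt only when one is pending) are an assumption based on the usual instruction cycle, and the function and parameter names are hypothetical.

```python
FETCH, INDIRECT, EXECUTE, INTERRUPT = 0b00, 0b01, 0b10, 0b11


def next_icc(icc, indirect_addressing=False, interrupt_pending=False):
    """Return the ICC value for the next phase of the instruction cycle."""
    if icc == FETCH:
        return INDIRECT if indirect_addressing else EXECUTE
    if icc == INDIRECT:
        return EXECUTE
    if icc == EXECUTE:
        return INTERRUPT if interrupt_pending else FETCH
    return FETCH  # after the interrupt cycle, fetch the routine's first instruction


icc = FETCH
trace = [icc]
for _ in range(3):
    icc = next_icc(icc, indirect_addressing=True)
    trace.append(icc)
print([format(code, "02b") for code in trace])
```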

14
What is Superscalar?

Common instructions (arithmetic, load/store,
conditional branch) can be initiated and executed
independently
Equally applicable to RISC and CISC
In practice usually RISC

15
General Superscalar Organization
16
Superpipelined

Many pipeline stages need less than half a clock
cycle
Double internal clock speed gets two tasks per
external clock cycle
Superscalar allows parallel fetch and execute

17
Superscalar v Superpipeline
18
Limitations

Instruction level parallelism
Compiler based optimisation
Hardware techniques
Limited by
True data dependency
Procedural dependency
Resource conflicts
Output dependency
Antidependency

19
True Data Dependency

ADD r1, r2 (r1 := r1 + r2)
MOVE r3, r1 (r3 := r1)
Can fetch and decode second instruction in
parallel with first
Can NOT execute second instruction until first is
finished
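
A true data dependency can be detected by comparing the register one instruction writes with the registers the next one reads. The (destination, sources) encoding below is an assumption made for illustration.

```python
def true_dependency(first, second):
    """Return True if `second` reads the register that `first` writes."""
    dest, _ = first
    _, sources = second
    return dest in sources


add = ("r1", ("r1", "r2"))  # ADD r1, r2
move = ("r3", ("r1",))      # MOVE r3, r1
print(true_dependency(add, move))  # True: MOVE must wait for ADD's result
```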

20
Procedural Dependency

Cannot execute instructions after a branch in
parallel with instructions before a branch
Also, if instruction length is not fixed,
instructions have to be decoded to find out how
many fetches are needed
This prevents simultaneous fetches

21
Resource Conflict

Two or more instructions requiring access to the
same resource at the same time
e.g. two arithmetic instructions
Can duplicate resources
e.g. have two arithmetic units

22
Effect of Dependencies
23
Design Issues

Instruction level parallelism
Instructions in a sequence are independent
Execution can be overlapped
Governed by data and procedural dependency
Machine Parallelism
Ability to take advantage of instruction level
parallelism
Governed by number of parallel pipelines

24
Instruction Issue Policy

Order in which instructions are fetched
Order in which instructions are executed
Order in which instructions change registers and
memory

25
In-Order Issue In-Order Completion

Issue instructions in the order they occur
Not very efficient
May fetch more than one instruction
Instructions must stall if necessary

26
In-Order Issue In-Order Completion (Diagram)
27
In-Order Issue Out-of-Order Completion

Output dependency
R3 := R3 + R5 (I1)
R4 := R3 + 1 (I2)
R3 := R5 + 1 (I3)
I2 depends on result of I1 - data dependency
If I3 completes before I1, R3 is left holding
I1's result instead of I3's - output (write-write) dependency
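
The effect of completion order on R3 can be shown with I1 and I3 alone. The sketch assumes both instructions are additions, that both read their operands before either one writes, and that the initial register values are 10 and 2; all of these are illustrative assumptions.

```python
def final_r3(completion_order):
    """Return R3 after I1 and I3 write back in the given completion order."""
    r3, r5 = 10, 2                           # assumed initial values
    results = {"I1": r3 + r5, "I3": r5 + 1}  # both computed from original operands
    for name in completion_order:
        r3 = results[name]                   # each completion overwrites R3
    return r3


print(final_r3(["I1", "I3"]))  # 3: program order, I3's value survives
print(final_r3(["I3", "I1"]))  # 12: I1 completes last and clobbers I3's value
```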

28
In-Order Issue Out-of-Order Completion (Diagram)
29
Out-of-Order Issue Out-of-Order Completion

Decouple decode pipeline from execution pipeline
Can continue to fetch and decode until this
pipeline is full
When a functional unit becomes available an
instruction can be executed
Since instructions have been decoded, processor
can look ahead

30
Out-of-Order Issue Out-of-Order Completion
(Diagram)
31
Antidependency

Read-write dependency
R3 := R3 + R5 (I1)
R4 := R3 + 1 (I2)
R3 := R5 + 1 (I3)
R7 := R3 + R4 (I4)
I3 cannot complete before I2 starts as I2 needs
a value in R3 and I3 changes R3
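
Antidependencies in a short sequence can be listed by looking for a later instruction that writes a register an earlier one reads. The (name, destination, sources) encoding is an assumption for illustration, and the scan is deliberately naive: it compares every pair of instructions.

```python
program = [
    ("I1", "R3", ("R3", "R5")),
    ("I2", "R4", ("R3",)),
    ("I3", "R3", ("R5",)),
    ("I4", "R7", ("R3", "R4")),
]


def antidependencies(program):
    """Return (reader, writer, register) triples where a later write hits an earlier read."""
    found = []
    for i, (early, _, early_srcs) in enumerate(program):
        for late, late_dest, _ in program[i + 1:]:
            if late_dest in early_srcs:
                found.append((early, late, late_dest))
    return found


for reader, writer, reg in antidependencies(program):
    print(f"{writer} must not write {reg} before {reader} has read it")
```

The pair (I2, I3) on R3 is the case described above.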

32
Register Renaming

Output and antidependencies occur because
register contents may not reflect the correct
ordering from the program
May result in a pipeline stall
Registers allocated dynamically
i.e. registers are not specifically named

33
Register Renaming example
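
The slide's own example is not reproduced in this transcript. As a stand-in, here is a minimal sketch of renaming applied to the four-instruction sequence from the antidependency slide. The letter suffixes (R3a, R3b, ...) are an assumed naming convention: every write to a register allocates a fresh version, and every read uses the latest version.

```python
from collections import defaultdict

SUFFIXES = "abcdefghijklmnopqrstuvwxyz"


def rename(program):
    """Give each register write a new version; reads use the current version."""
    version = defaultdict(int)  # architectural register -> latest version index

    def current(reg):
        return f"{reg}{SUFFIXES[version[reg]]}"

    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(current(s) for s in srcs)  # read before allocating
        version[dest] += 1                          # allocate a fresh register
        renamed.append((current(dest), new_srcs))
    return renamed


program = [
    ("R3", ("R3", "R5")),  # I1
    ("R4", ("R3",)),       # I2
    ("R3", ("R5",)),       # I3
    ("R7", ("R3", "R4")),  # I4
]
renamed = rename(program)
for dest, srcs in renamed:
    print(dest, ":=", ", ".join(srcs))
```

After renaming, I3 writes R3c while I2 reads R3b, so the output dependency and antidependency on R3 disappear and only the true data dependencies remain.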