Computer Organization and Architecture - PowerPoint PPT Presentation



Provided by: adria213


1
Computer Organization and Architecture

Instruction Level Parallelism
and Superscalar Processors

Chapter 14
2
What is Superscalar?

Common instructions (arithmetic, load/store,
conditional branch) can be initiated and executed
independently
Equally applicable to RISC & CISC
In practice usually RISC

3
Why Superscalar?

Most operations are on scalar quantities (see
RISC notes)
Improve these operations to get an overall
improvement

4
General Superscalar Organization
5
Superpipelined

Many pipeline stages need less than half a clock
cycle
Double internal clock speed gets two tasks per
external clock cycle
Superscalar allows parallel fetch & execute
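To make the comparison concrete, here is an idealised timing model. It is our own simplifying assumption, not taken from the slides: n independent instructions, no stalls, a k-stage base pipeline, and a degree m that is either the superpipeline's internal clock multiplier or the superscalar machine's number of parallel pipelines. All times are in base (external) clock cycles.

```python
from fractions import Fraction

# Idealised model (assumption): n independent instructions, no stalls,
# k-stage base pipeline, degree m. All times in base clock cycles.

def scalar_cycles(n: int, k: int) -> Fraction:
    # First instruction takes k cycles, then one completes per cycle.
    return Fraction(k + n - 1)

def superpipelined_cycles(n: int, k: int, m: int) -> Fraction:
    # Stages clocked m times faster: after the first instruction,
    # one more completes every 1/m of a base cycle.
    return Fraction(k) + Fraction(n - 1, m)

def superscalar_cycles(n: int, k: int, m: int) -> Fraction:
    # m parallel pipelines: instructions complete in groups of m,
    # one group per base cycle after the first group.
    groups = (n + m - 1) // m
    return Fraction(k + groups - 1)

for label, t in [
    ("scalar", scalar_cycles(12, 4)),
    ("superpipelined", superpipelined_cycles(12, 4, 2)),
    ("superscalar", superscalar_cycles(12, 4, 2)),
]:
    print(f"{label:15} {float(t)} base cycles")
```

With n = 12, k = 4 and m = 2 this gives 15, 9.5 and 9 base cycles. Under this model the small gap between the two approaches comes from the superscalar machine starting m instructions in the very first cycle.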

6
Superscalar v Superpipeline
7
Limitations

Instruction level parallelism
Compiler based optimisation
Hardware techniques
Limited by
True data dependency
Procedural dependency
Resource conflicts
Output dependency
Antidependency

8
True Data Dependency

ADD r1, r2 (r1 := r1 + r2)
MOVE r3,r1 (r3 := r1)
Can fetch and decode second instruction in
parallel with first
Can NOT execute second instruction until first is
finished
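A minimal sketch of the test the hardware has to make, assuming each instruction is described only by the register it writes and the registers it reads. The `Instr` record is a hypothetical helper, not a real decoder format. ADD r1, r2 reads r1 and r2 and writes r1; MOVE r3,r1 reads r1, so it has a true (read-after-write) dependency on the ADD.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    text: str
    dest: str          # register written
    srcs: frozenset    # registers read

def true_dependency(first: Instr, second: Instr) -> bool:
    # Read-after-write: the later instruction reads what the earlier one
    # writes, so it may be fetched and decoded early but cannot execute
    # until the earlier instruction has produced its result.
    return first.dest in second.srcs

add = Instr("ADD r1, r2", dest="r1", srcs=frozenset({"r1", "r2"}))
move = Instr("MOVE r3,r1", dest="r3", srcs=frozenset({"r1"}))
print(true_dependency(add, move))   # True
print(true_dependency(move, add))   # False
```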

9
Procedural Dependency

Cannot execute instructions after a branch in
parallel with instructions before a branch
Also, if instruction length is not fixed,
instructions have to be decoded to find out how
many fetches are needed
This prevents simultaneous fetches
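The fetch problem with variable-length instructions shows up even in a toy decoder. The encoding below is entirely hypothetical: the low two bits of an instruction's first byte give its length minus one, so lengths run from 1 to 4 bytes. Each instruction's start address is only known after the previous instruction has been decoded, so the fetches form a serial chain.

```python
def instruction_starts(code: bytes) -> list:
    """Byte offsets where instructions begin (hypothetical encoding:
    length = (first_byte & 0b11) + 1)."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        # The next start is unknown until this byte has been decoded,
        # which serialises the fetches.
        length = (code[pc] & 0b11) + 1
        pc += length
    return starts

code = bytes([0x01, 0xAA, 0x00, 0x03, 0x10, 0x20, 0x30, 0x02, 0x55, 0x66])
print(instruction_starts(code))  # [0, 2, 3, 7]
```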

10
Resource Conflict

Two or more instructions requiring access to the
same resource at the same time
e.g. two arithmetic instructions
Can duplicate resources
e.g. have two arithmetic units
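The effect of duplicating a unit can be seen with a small issue count. This is a sketch under stated assumptions: instructions are independent, each occupies its functional unit for one cycle, issue is in order, and there is no separate limit on issue width.

```python
from collections import Counter

def cycles_needed(unit_types, units):
    """Cycles to issue a sequence of independent one-cycle instructions.

    unit_types: the functional unit each instruction needs, in program order.
    units: how many copies of each unit type exist.
    Issue is in order: within a cycle we stop at the first instruction
    whose unit type is already fully booked (a resource conflict).
    """
    if any(units.get(t, 0) < 1 for t in unit_types):
        raise ValueError("each instruction needs at least one unit of its type")
    cycles, i = 0, 0
    while i < len(unit_types):
        used = Counter()
        while i < len(unit_types) and used[unit_types[i]] < units[unit_types[i]]:
            used[unit_types[i]] += 1
            i += 1
        cycles += 1
    return cycles

print(cycles_needed(["alu", "alu"], {"alu": 1}))  # 2: second instruction waits for the ALU
print(cycles_needed(["alu", "alu"], {"alu": 2}))  # 1: duplicated ALU removes the conflict
```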

11
Effect of Dependencies
12
Design Issues

Instruction level parallelism
Instructions in a sequence are independent
Execution can be overlapped
Governed by data and procedural dependency
Machine Parallelism
Ability to take advantage of instruction level
parallelism
Governed by number of parallel pipelines

13
Instruction Issue Policy

Order in which instructions are fetched
Order in which instructions are executed
Order in which instructions change registers and
memory

14
In-Order Issue In-Order Completion

Issue instructions in the order they occur
Not very efficient
May fetch >1 instruction
Instructions must stall if necessary

15
In-Order Issue In-Order Completion (Diagram)
16
In-Order Issue Out-of-Order Completion

Output dependency
R3 := R3 op R5 (I1)
R4 := R3 + 1 (I2)
R3 := R5 + 1 (I3)
I2 depends on result of I1 - data dependency
If I3 completes before I1, R3 ends up holding I1's
result instead of I3's - output (write-write) dependency
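The output dependency can be reproduced by computing the results of I1-I3 in program order and then writing them back in two different completion orders. The slide leaves I1's `op` unspecified, so addition is assumed here, and the starting register values are arbitrary.

```python
def compute_results(r3, r5):
    """Values produced by I1-I3 when every read sees the program-order value.
    I1's 'op' is unspecified on the slide; + is assumed here."""
    i1 = r3 + r5      # I1: R3 := R3 op R5
    i2 = i1 + 1       # I2: R4 := R3 + 1 (reads I1's result: true dependency)
    i3 = r5 + 1       # I3: R3 := R5 + 1
    return {"I1": ("R3", i1), "I2": ("R4", i2), "I3": ("R3", i3)}

def apply_writes(results, completion_order, regs):
    """Write each result to a copy of the register file in completion order."""
    regs = dict(regs)
    for name in completion_order:
        dest, value = results[name]
        regs[dest] = value
    return regs

initial = {"R3": 10, "R4": 0, "R5": 3}
results = compute_results(initial["R3"], initial["R5"])
print(apply_writes(results, ["I1", "I2", "I3"], initial)["R3"])  # 4: correct
print(apply_writes(results, ["I3", "I1", "I2"], initial)["R3"])  # 13: I1's older write lands last
```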

17
In-Order Issue Out-of-Order Completion (Diagram)
18
Out-of-Order Issue Out-of-Order Completion

Decouple decode pipeline from execution pipeline
Can continue to fetch and decode until this
pipeline is full
When a functional unit becomes available an
instruction can be executed
Since instructions have been decoded, processor
can look ahead
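The benefit of looking ahead can be shown with a toy issue model. Its assumptions are ours: unlimited functional units, only true dependencies are checked, destination registers are distinct (so no renaming is needed), and a result is usable `latency` cycles after its producer issues. I1 is a slow 3-cycle operation, I2 needs its result, and I3 is independent.

```python
from typing import NamedTuple

class Op(NamedTuple):
    name: str
    dest: str
    srcs: tuple
    latency: int   # cycles until a dependent op may use the result

def issue_cycles(ops, width, in_order):
    """Return {op name: issue cycle}, issuing at most `width` ops per cycle."""
    # Producer of each source = most recent earlier op writing that register.
    last_writer, deps = {}, {}
    for op in ops:
        deps[op.name] = [last_writer[r] for r in op.srcs if r in last_writer]
        last_writer[op.dest] = op
    issued = {}
    waiting = list(ops)          # the instruction window, in program order
    cycle = 0
    while waiting:
        slots = width
        for op in list(waiting):
            if slots == 0:
                break
            ready = all(p.name in issued and issued[p.name] + p.latency <= cycle
                        for p in deps[op.name])
            if ready:
                issued[op.name] = cycle
                waiting.remove(op)
                slots -= 1
            elif in_order:
                break            # in-order issue: a stalled op blocks all later ones
        cycle += 1
    return issued

ops = [Op("I1", "R1", ("R2",), 3),
       Op("I2", "R3", ("R1", "R4"), 1),
       Op("I3", "R5", ("R6", "R7"), 1)]
print(issue_cycles(ops, width=2, in_order=True))   # I3 waits behind I2
print(issue_cycles(ops, width=2, in_order=False))  # I3 issues alongside I1
```

In the in-order run I3 issues in cycle 3, behind the stalled I2; with out-of-order issue it goes in cycle 0.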

19
Out-of-Order Issue Out-of-Order Completion
(Diagram)
20
Antidependency

Read-write dependency
R3 := R3 op R5 (I1)
R4 := R3 + 1 (I2)
R3 := R5 + 1 (I3)
R7 := R3 op R4 (I4)
I3 cannot complete before I2 starts, as I2 needs
the value in R3 that I3 overwrites
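The three kinds of dependency in I1-I4 can be found mechanically by scanning the sequence in program order. The scheme here (remember the latest writer of each register and which instructions have read it since) is our own illustration, not a description of any particular processor's logic. RAW is the true data dependency, WAW the output dependency, and WAR the antidependency.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    name: str
    dest: str
    srcs: tuple

def find_dependencies(program):
    """Return (kind, earlier, later, register) tuples in discovery order."""
    last_writer = {}   # register -> most recent earlier writer
    readers = {}       # register -> instructions that read it since that write
    deps = []
    for ins in program:
        for r in ins.srcs:                        # true dependency
            if r in last_writer:
                deps.append(("RAW", last_writer[r], ins.name, r))
        d = ins.dest
        if d in last_writer:                      # output dependency
            deps.append(("WAW", last_writer[d], ins.name, d))
        for reader in readers.get(d, []):         # antidependency
            deps.append(("WAR", reader, ins.name, d))
        for r in ins.srcs:
            readers.setdefault(r, []).append(ins.name)
        last_writer[d] = ins.name
        readers[d] = []                           # the new value has no readers yet
    return deps

program = [Instr("I1", "R3", ("R3", "R5")),
           Instr("I2", "R4", ("R3",)),
           Instr("I3", "R3", ("R5",)),
           Instr("I4", "R7", ("R3", "R4"))]
for dep in find_dependencies(program):
    print(dep)
```

The WAR entry between I2 and I3 on R3 is the antidependency described on the slide.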

21
Register Renaming

Output and antidependencies occur because
register contents may not reflect the correct
ordering from the program
May result in a pipeline stall
Registers allocated dynamically
i.e. registers are not specifically named

22
Register Renaming example

R3b := R3a op R5a (I1)
R4b := R3b + 1 (I2)
R3c := R5a + 1 (I3)
R7b := R3c op R4b (I4)
Without subscript refers to logical register in
instruction
With subscript is hardware register allocated
Note that R3a, R3b and R3c are three different hardware registers
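A minimal renaming pass that reproduces the subscripts in the example, assuming every logical register starts out mapped to its `a` copy and each write allocates the next letter. Real hardware draws physical registers from a free list rather than using letters, and the `(dest, srcs, fmt)` tuples are a hypothetical input format.

```python
from string import ascii_lowercase

def rename(program):
    """program: list of (dest, srcs, fmt) where fmt has {0}, {1}... for srcs."""
    version = {}   # logical register -> index of its current hardware copy

    def current(reg):
        # Letters limit this sketch to 26 copies per logical register.
        return reg + ascii_lowercase[version.setdefault(reg, 0)]

    out = []
    for dest, srcs, fmt in program:
        renamed_srcs = [current(r) for r in srcs]    # read sources first
        version[dest] = version.get(dest, 0) + 1     # fresh copy for the result
        out.append(f"{current(dest)} := {fmt.format(*renamed_srcs)}")
    return out

program = [("R3", ("R3", "R5"), "{0} op {1}"),   # I1
           ("R4", ("R3",), "{0} + 1"),           # I2
           ("R3", ("R5",), "{0} + 1"),           # I3
           ("R7", ("R3", "R4"), "{0} op {1}")]   # I4
for line in rename(program):
    print(line)
```

Because I3 writes R3c instead of overwriting R3b, neither its output dependency on I1 nor its antidependency on I2 can stall it any more.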

23
Machine Parallelism

Duplication of Resources
Out of order issue
Renaming
Not worth duplicating functions without register
renaming
Need instruction window large enough (more than
8)

24
Branch Prediction

80486 fetches both next sequential instruction
after branch and branch target instruction
Gives two cycle delay if branch taken

25
RISC - Delayed Branch

Calculate result of branch before unusable
instructions pre-fetched
Always execute single instruction immediately
following branch
Keeps pipeline full while fetching new
instruction stream
Not as good for superscalar
Multiple instructions need to execute in delay
slot
Instruction dependence problems
Revert to branch prediction
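The delay-slot rule can be illustrated with a tiny interpreter for a hypothetical ISA in which every branch is unconditional and has exactly one delay slot. The instruction after the branch always executes, and only then does control move to the target.

```python
def delayed_branch_trace(program, max_steps=100):
    """Return indices of executed instructions, in execution order.

    program entries: ("OP", label), ("BR", target_index) or ("HALT",).
    Hypothetical toy ISA: every BR is unconditional with one delay slot.
    """
    pc = 0
    pending_target = None      # branch target waiting for the delay slot
    executed = []
    while 0 <= pc < len(program) and len(executed) < max_steps:
        instr = program[pc]
        executed.append(pc)
        if instr[0] == "HALT":
            break
        next_pc = pc + 1
        if pending_target is not None:        # this instruction sat in a delay slot
            next_pc, pending_target = pending_target, None
        if instr[0] == "BR":
            pending_target = instr[1]         # redirect after the next instruction
        pc = next_pc
    return executed

program = [("BR", 3),
           ("OP", "delay slot: always executed"),
           ("OP", "skipped"),
           ("OP", "branch target"),
           ("HALT",)]
print(delayed_branch_trace(program))  # [0, 1, 3, 4]
```

With a single slot the compiler only has to find one useful independent instruction; a superscalar machine would need several per branch, which is the difficulty noted above.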

26
Superscalar Execution
27
Superscalar Implementation

Simultaneously fetch multiple instructions
Logic to determine true dependencies involving
register values
Mechanisms to communicate these values
Mechanisms to initiate multiple instructions in
parallel
Resources for parallel execution of multiple
instructions
Mechanisms for committing process state in
correct order

28
Pentium 4

80486 - CISC
Pentium - some superscalar components
Two separate integer execution units
Pentium Pro - full-blown superscalar
Subsequent models refine & enhance superscalar
design

29
Pentium 4 Block Diagram
30
Pentium 4 Operation

Fetch instructions from memory in order of static
program
Translate instruction into one or more fixed
length RISC instructions (micro-operations)
Execute micro-ops on superscalar pipeline
micro-ops may be executed out of order
Commit results of micro-ops to register set in
original program flow order
Outer CISC shell with inner RISC core
Inner RISC core pipeline at least 20 stages
Some micro-ops require multiple execution stages
Longer pipeline
cf. five-stage pipeline on x86 up to Pentium
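Committing out-of-order results in program order is commonly done with a reorder buffer. The slide does not name the mechanism, so treat the following as a generic sketch rather than the Pentium 4's actual design. Micro-ops enter the buffer in program order, may finish in any order, and retire only from the head. The finish cycles and the retire width of 3 are made-up example values.

```python
from collections import deque

def retire(program_order, finish_cycle, retire_width):
    """Return (micro_op, retire_cycle) pairs.

    program_order: micro-op names in original program order.
    finish_cycle: cycle in which each micro-op's result is ready (any order).
    retire_width: maximum micro-ops committed per cycle.
    """
    rob = deque(program_order)          # head = oldest micro-op
    log = []
    cycle = 0
    while rob:
        retired = 0
        # Only the head may commit; a slow head blocks younger, finished ops.
        while rob and retired < retire_width and finish_cycle[rob[0]] <= cycle:
            log.append((rob.popleft(), cycle))
            retired += 1
        cycle += 1
    return log

finish = {"u1": 5, "u2": 2, "u3": 3, "u4": 7}   # made-up example values
print(retire(["u1", "u2", "u3", "u4"], finish, retire_width=3))
# [('u1', 5), ('u2', 5), ('u3', 5), ('u4', 7)]
```

u2 and u3 finish early but are held until u1 commits, so the register set is updated in the original program flow order.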

31
Pentium 4 Pipeline
32
Pentium 4 Pipeline Operation (1)
33
Pentium 4 Pipeline Operation (2)
34
Pentium 4 Pipeline Operation (3)
35
Pentium 4 Pipeline Operation (4)
36
Pentium 4 Pipeline Operation (5)
37
Pentium 4 Pipeline Operation (6)
38
PowerPC