Csci 136 Computer Architecture II - PowerPoint PPT Presentation


Provided by: xiuzhe

Learn more at: https://www2.seas.gwu.edu


Transcript and Presenter's Notes

Title: Csci 136 Computer Architecture II

1
Csci 136 Computer Architecture II: Superscalar
and Dynamic Pipelining

Xiuzhen Cheng
cheng_at_gwu.edu

2
Announcement

Homework assignment 11: due April 8.
Reading: Section 6.8
Problems: 6.30, 6.31
Project 3 is due on April 10, 2004
Final: Tuesday, May 4th, 11:00 AM-1:00 PM
Note: you must pass the final to pass this course!

3
SW Is in EX Stage
[Figure: pipelined datapath with sw in the EX stage; labels: sw, R-Type or lw, R-Type, Sign-Ext]

ID/EX.MemWrite and MEM/WB.RegWrite and
MEM/WB.RegisterRd = ID/EX.RegisterRt and
EX/MEM.RegisterRd != ID/EX.RegisterRt and
MEM/WB.RegisterRd != 0

ID/EX.MemWrite and EX/MEM.RegWrite
and EX/MEM.RegisterRd = ID/EX.RegisterRt
and EX/MEM.RegisterRd != 0
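
The two conditions for a sw in the EX stage can be sketched as a small Python function. This is only an illustration: the field names mirror the pipeline register names on the slide, while the class name, the function name, and the returned labels ("EX/MEM", "MEM/WB", "REGFILE") are hypothetical and not part of the original deck.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegs:
    """Only the pipeline-register fields that the two conditions read."""
    id_ex_mem_write: bool
    id_ex_register_rt: int
    ex_mem_reg_write: bool
    ex_mem_register_rd: int
    mem_wb_reg_write: bool
    mem_wb_register_rd: int


def sw_forward_source(r: PipelineRegs) -> str:
    """Pick the source of the store data for a sw that is in EX.

    The return labels are hypothetical names for the mux inputs.
    """
    # Condition with EX/MEM: the newest producer wins, so test it first.
    if (r.id_ex_mem_write and r.ex_mem_reg_write
            and r.ex_mem_register_rd == r.id_ex_register_rt
            and r.ex_mem_register_rd != 0):
        return "EX/MEM"
    # Condition with MEM/WB: used only when EX/MEM does not name the same register.
    if (r.id_ex_mem_write and r.mem_wb_reg_write
            and r.mem_wb_register_rd == r.id_ex_register_rt
            and r.ex_mem_register_rd != r.id_ex_register_rt
            and r.mem_wb_register_rd != 0):
        return "MEM/WB"
    # No hazard: use the value read from the register file in ID.
    return "REGFILE"
```

For example, `sw_forward_source(PipelineRegs(True, 8, False, 9, True, 8))` selects the MEM/WB value, because only the instruction in WB writes register 8.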
4
The Big Picture: Where Are We Now?

The Five Classic Components of a Computer
Current Topics:
Superscalar and Dynamic Pipelining

Processor
Input
Control
Memory
Datapath
Output
5
Is a Faster Processor Possible?

Potentially, pipelining can provide CPI = 1. Is it
possible to design a faster processor?
Yes
Superpipelining: longer pipelines
Divide the washer into 3 machines: wash, rinse, spin
Superscalar: replicate the internal components
of the computer so that it can launch multiple
instructions per clock cycle (CC).
Buy 3 washers, 3 dryers, etc.
Dynamic pipelining: use hardware to avoid
pipeline hazards
Out-of-order execution is possible
More complicated pipeline control and instruction
execution model.

6
Issuing Multiple Instructions/Cycle

Two main variations: superscalar and VLIW
Superscalar: varying number of instructions per cycle (1 to
6)
Parallelism and dependencies determined/resolved
by HW
IBM PowerPC 604, Sun UltraSparc, DEC Alpha 21164,
HP 7100
Very Long Instruction Word (VLIW): fixed number
of instructions (16); parallelism determined by
the compiler
Pipeline is exposed: the compiler must schedule
delays to get the right result
Explicitly Parallel Instruction Computing (EPIC),
Intel
128-bit packets containing 3 instructions (can
execute sequentially)
Can link 128-bit packets together to allow more
parallelism
Compiler determines parallelism; HW checks
dependencies and forwards/stalls
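
To make the 128-bit packet concrete, here is a minimal Python sketch that packs and unpacks one packet. It assumes the IA-64 bundle layout of a 5-bit template in the low bits followed by three 41-bit instruction slots (5 + 3 x 41 = 128); the slide does not spell out this layout, and the function names are hypothetical.

```python
TEMPLATE_BITS = 5      # assumed: IA-64 template field width
SLOT_BITS = 41         # assumed: IA-64 instruction slot width
SLOTS_PER_BUNDLE = 3

TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Combine a template and three slot encodings into one 128-bit integer."""
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly 3 instruction slots")
    bundle = template & TEMPLATE_MASK
    for i, slot in enumerate(slots):
        bundle |= (slot & SLOT_MASK) << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def split_bundle(bundle):
    """Return (template, [slot0, slot1, slot2]) for a 128-bit bundle."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & TEMPLATE_MASK
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots
```

In this layout the template field is where the compiler records which slots may run in parallel, which is how the compiler's decisions reach the hardware.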

7
Superscalar MIPS

Assume two instructions are issued per clock
cycle:
One ALU operation or branch
One memory access instruction

Instruction type             Pipe stages
ALU or branch instruction    IF  ID  EX  MEM WB
Load or store instruction    IF  ID  EX  MEM WB
ALU or branch instruction        IF  ID  EX  MEM WB
Load or store instruction        IF  ID  EX  MEM WB
ALU or branch instruction            IF  ID  EX  MEM WB
Load or store instruction            IF  ID  EX  MEM WB
ALU or branch instruction                IF  ID  EX  MEM WB
Load or store instruction                IF  ID  EX  MEM WB
8
Additional Hardware Requirement

Instructions must be paired and aligned
Extra ports in the register file to serve 2 instructions
Separate adder for lw/sw address computation
What will happen for load-use instructions?

9
Simple Superscalar Example

How would this loop be scheduled on a superscalar
pipeline for MIPS?
Loop: lw   $t0, 0($s1)
      addu $t0, $t0, $s2
      sw   $t0, 0($s1)
      addi $s1, $s1, -4
      bne  $s1, $zero, Loop
Re-order the instructions to avoid as many
pipeline stalls as possible
Solution Hints:
Figure out which instructions have data dependencies;
they cannot be moved past each other!
Figure out load-use instructions requiring
pipeline stalls
Any performance (in CPI) improvement?
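
One possible answer can be sketched in Python. The schedule below follows the usual textbook treatment of this loop rather than anything printed on the slide, so treat it as an assumption: addi is hoisted above sw (so the sw offset becomes 4), and addu waits one extra cycle for the load. The helper names are hypothetical.

```python
ALU_OPS = {"addu", "addi", "bne"}   # allowed in the ALU/branch slot
MEM_OPS = {"lw", "sw"}              # allowed in the load/store slot

# One tuple per clock cycle: (ALU/branch slot, load/store slot); None is an empty slot.
schedule = [
    (None,                    "lw   $t0, 0($s1)"),
    ("addi $s1, $s1, -4",     None),
    ("addu $t0, $t0, $s2",    None),
    ("bne  $s1, $zero, Loop", "sw   $t0, 4($s1)"),
]


def opcode(instr):
    """Return the mnemonic, i.e. the first word of the instruction text."""
    return instr.split()[0]


def check_slots(sched):
    """Raise ValueError if an instruction sits in the wrong issue slot."""
    for alu, mem in sched:
        if alu is not None and opcode(alu) not in ALU_OPS:
            raise ValueError(f"{alu!r} cannot use the ALU/branch slot")
        if mem is not None and opcode(mem) not in MEM_OPS:
            raise ValueError(f"{mem!r} cannot use the load/store slot")


def cpi(sched):
    """Cycles per instruction for one loop iteration."""
    instructions = sum(1 for pair in sched for ins in pair if ins is not None)
    return len(sched) / instructions


check_slots(schedule)
print(cpi(schedule))  # 4 cycles / 5 instructions = 0.8
```

Under these assumptions the loop runs at CPI 0.8, against the ideal of 0.5 for a two-issue machine, because three of the eight issue slots stay empty.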

10
Loop Unrolling

Purpose: to achieve more performance improvement
from loops
Idea:
Schedule multiple copies of the loop body
together
For the previous example, assume the loop index is a
multiple of 4
What is the performance improvement?
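
A rough way to estimate the gain is to count issue slots. The Python sketch below assumes the loop body of the previous example (one lw, one addu, and one sw per copy, plus a single addi and bne per unrolled iteration) and a two-issue machine with one ALU/branch slot and one load/store slot. It gives only a lower bound on cycles, since it ignores data dependencies; the function names are hypothetical.

```python
def instruction_count(unroll):
    """lw + addu + sw per copy of the body, plus one addi and one bne."""
    return 3 * unroll + 2


def min_cycles(unroll):
    """Lower bound on cycles from slot pressure alone (dependencies ignored).

    The load/store slot must issue 2 * unroll instructions (lw and sw);
    the ALU/branch slot must issue unroll + 2 (addu copies, addi, bne).
    """
    return max(2 * unroll, unroll + 2)


def cpi_lower_bound(unroll):
    """Best-case cycles per instruction for the given unroll factor."""
    return min_cycles(unroll) / instruction_count(unroll)


for k in (1, 2, 4):
    print(k, instruction_count(k), min_cycles(k), round(cpi_lower_bound(k), 2))
```

For an unroll factor of 4 this gives 14 instructions in at least 8 cycles, a CPI of about 0.57. For the loop without unrolling, the bound of 3 cycles is optimistic, because the load-use dependency forces a fourth cycle.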

11
Dynamic Pipeline Scheduling

The hardware performs the scheduling
Hardware tries to find instructions to execute
Out-of-order execution is possible
Speculative execution and dynamic branch
prediction
Basic Idea:
Dynamic pipeline scheduling (DPS) tries to find later
instructions to execute while waiting for a stall to be resolved
Pipeline is divided into 3 major units:
Instruction fetch and issue unit: IF, ID
Execute unit: 5 to 10 independent functional
units
Commit unit: determines when to put the result
back to register or memory
In-order completion vs. out-of-order completion
12
Basic Idea
13
Summary

All modern processors are very complicated
DEC Alpha 21264: 9-stage pipeline, 6 instructions
in parallel, 4 instructions per CC.
PowerPC and Pentium/Itanium: branch history
table, dynamic pipelining
Compiler technology is important
Combining dynamic pipelining with branch
prediction is very challenging
Commit unit should know how to roll back, that is, to
discard instructions when a prediction is wrong
Dynamic execution is based on prediction
Hide memory latency
Avoid stalls
Execute instructions while waiting for hazards to be
resolved

14
Exercise 6.20

lw $2, 100($5)
sw $2, 200($6)
In which stage should forwarding be done?
How about hazard detection?

15
Forwarding Unit in EX Stage
Conditions?
16
Forwarding Unit in MEM Stage

Is it possible? -- YES
Steps
Change the control unit such that RegDst is valid to
select ID/EX.RegisterRt for the sw instruction, even
though sw does not require it
Add a multiplexer to the write port of data memory
Conditions for the forwarding unit to generate
the selector signal?
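
One plausible answer to the question above, sketched in Python. It assumes that, after the RegDst change, EX/MEM.RegisterRd carries the rt field of a sw in the MEM stage, and that the instruction in WB is the producer. The function name and argument names are hypothetical.

```python
def mem_stage_forward(ex_mem_mem_write, ex_mem_register_rd,
                      mem_wb_reg_write, mem_wb_register_rd):
    """Return True when the new mux should feed the MEM/WB value to the
    data-memory write port instead of the value carried in EX/MEM.

    Assumption: EX/MEM.RegisterRd holds sw's rt (selected via RegDst).
    """
    return bool(
        ex_mem_mem_write                              # a store is in MEM
        and mem_wb_reg_write                          # the instruction in WB writes a register
        and mem_wb_register_rd != 0                   # register $0 is never forwarded
        and mem_wb_register_rd == ex_mem_register_rd  # same register as sw's rt
    )


# lw $2, 100($5) is in WB while sw $2, 200($6) is in MEM.
print(mem_stage_forward(True, 2, True, 2))  # True
```

With forwarding done in MEM, the lw/sw pair from Exercise 6.20 needs no stall, because the loaded value is available in MEM/WB exactly when the store writes memory.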

17
Hazard Detection
Conditions?
18
Questions?
