Title: Csci 136 Computer Architecture II
1. Csci 136 Computer Architecture II: Superscalar and Dynamic Pipelining
- Xiuzhen Cheng
- cheng_at_gwu.edu
2. Announcement
- Homework assignment 11 is due April 8.
  - Reading: Section 6.8
  - Problems: 6.30, 6.31
- Project 3 is due on April 10, 2004
- Final: Tuesday, May 4th, 11:00-1:00 PM
- Note: you must pass the final to pass this course!
3. SW Is in EX Stage
[Figure: pipelined datapath with sw in the EX stage; labels: sw, R-Type or lw, R-Type, Sign-Ext]
- Forward from MEM/WB when:
  ID/EX.MemWrite and MEM/WB.RegWrite
  and MEM/WB.RegisterRd = ID/EX.RegisterRt
  and EX/MEM.RegisterRd != ID/EX.RegisterRt
  and MEM/WB.RegisterRd != 0
- Forward from EX/MEM when:
  ID/EX.MemWrite and EX/MEM.RegWrite
  and EX/MEM.RegisterRd = ID/EX.RegisterRt
  and EX/MEM.RegisterRd != 0
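The two forwarding conditions on this slide can be written out as a small Python sketch. It reads each register comparison as equality (=) or inequality (!=). The class name, field names, and return labels are hypothetical and chosen only for illustration; the logic follows the slide's conditions term by term and makes no claim beyond them.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegs:
    """Only the pipeline-register fields that the two conditions read."""
    id_ex_mem_write: bool   # ID/EX.MemWrite (the instruction in EX is sw)
    id_ex_rt: int           # ID/EX.RegisterRt (register whose value sw stores)
    ex_mem_reg_write: bool  # EX/MEM.RegWrite
    ex_mem_rd: int          # EX/MEM.RegisterRd
    mem_wb_reg_write: bool  # MEM/WB.RegWrite
    mem_wb_rd: int          # MEM/WB.RegisterRd


def sw_store_data_source(r: PipelineRegs) -> str:
    """Pick where the store data of a sw in EX should come from."""
    from_ex_mem = (r.id_ex_mem_write and r.ex_mem_reg_write
                   and r.ex_mem_rd == r.id_ex_rt
                   and r.ex_mem_rd != 0)
    from_mem_wb = (r.id_ex_mem_write and r.mem_wb_reg_write
                   and r.mem_wb_rd == r.id_ex_rt
                   and r.ex_mem_rd != r.id_ex_rt
                   and r.mem_wb_rd != 0)
    if from_ex_mem:
        return "EX/MEM"
    if from_mem_wb:
        return "MEM/WB"
    return "REGFILE"  # no hazard: use the value read in ID


# sw stores $8; the instruction one ahead of it (now in MEM) writes $8.
print(sw_store_data_source(PipelineRegs(True, 8, True, 8, False, 0)))
# sw stores $8; the instruction two ahead of it (now in WB) writes $8.
print(sw_store_data_source(PipelineRegs(True, 8, False, 0, True, 8)))
```

The third term of the MEM/WB condition keeps the most recent producer: when both earlier instructions write the same register, the EX/MEM value wins.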
4. The Big Picture: Where Are We Now?
- The Five Classic Components of a Computer
- Current Topics
  - Superscalar and Dynamic Pipelining
[Figure: Processor (Control, Datapath), Memory, Input, Output]
5. Is a Faster Processor Possible?
- Pipelining can potentially provide CPI = 1. Is it possible to design a faster processor?
- Yes
  - Superpipelining: longer pipelines
    - Divide the washer into 3 machines: wash, rinse, spin
  - Superscalar: replicate the internal components of the computer so that it can launch multiple instructions per CC
    - Buy 3 washers, 3 dryers, etc.
  - Dynamic pipelining: use hardware to avoid pipeline hazards
    - Out-of-order execution is possible
    - More complicated pipeline control and instruction execution model
6. Issuing Multiple Instructions/Cycle
- Two main variations: Superscalar and VLIW
- Superscalar: varying no. of instructions/cycle (1 to 6)
  - Parallelism and dependencies determined/resolved by HW
  - IBM PowerPC 604, Sun UltraSparc, DEC Alpha 21164, HP 7100
- Very Long Instruction Words (VLIW): fixed number of instructions (16); parallelism determined by compiler
  - Pipeline is exposed; compiler must schedule delays to get the right result
- Explicit Parallel Instruction Computer (EPIC)/Intel
  - 128-bit packets containing 3 instructions (can execute sequentially)
  - Can link 128-bit packets together to allow more parallelism
  - Compiler determines parallelism; HW checks dependencies and forwards/stalls
7. Superscalar MIPS
- Assume two instructions are issued per clock cycle
  - ALU operation or branch
  - Memory access instruction

Instruction type            Pipe stages
ALU or branch instruction   IF  ID  EX  MEM WB
Load or store instruction   IF  ID  EX  MEM WB
ALU or branch instruction       IF  ID  EX  MEM WB
Load or store instruction       IF  ID  EX  MEM WB
ALU or branch instruction           IF  ID  EX  MEM WB
Load or store instruction           IF  ID  EX  MEM WB
ALU or branch instruction               IF  ID  EX  MEM WB
Load or store instruction               IF  ID  EX  MEM WB
8. Additional Hardware Requirements
- Instructions must be paired and aligned
- Extra ports in the register file to serve 2 instructions per cycle
- Separate adder for lw/sw address computation
- What will happen for load-use instructions?
9. Simple Superscalar Example
- How would this loop be scheduled on a superscalar pipeline for MIPS?
    Loop: lw   $t0, 0($s1)
          addu $t0, $t0, $s2
          sw   $t0, 0($s1)
          addi $s1, $s1, -4
          bne  $s1, $zero, Loop
- Reorder the instructions to avoid as many pipeline stalls as possible
- Solution hints
  - Figure out instructions with data dependencies: they cannot be reordered!
  - Figure out load-use instructions requiring pipeline stalls
- Any performance (in CPI) improvement?
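As a hedged illustration of the hints above, the sketch below checks one possible two-issue schedule of the loop and computes its CPI. It is a simplified model, not the course's official solution. It assumes one ALU/branch slot and one load/store slot per cycle, assumes an ALU result is usable in the next cycle, and assumes a loaded value is usable only after one intervening cycle. It is also conservative in rejecting any packet where one instruction reads a register the other writes. `Instr` and `check_schedule` are hypothetical names. In the schedule, `addi` is moved above `sw`, so the `sw` offset becomes 4 to compensate.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    kind: str             # "alu", "branch", "lw", or "sw"
    dest: Optional[str]   # register written, if any
    srcs: tuple           # registers read


def check_schedule(packets):
    """Validate a two-issue schedule and return its length in cycles.

    packets holds one (alu_or_branch, load_or_store) pair per cycle;
    None stands for a nop in that slot.
    """
    ready = {}  # register -> first cycle in which a consumer may issue
    for cycle, (alu, mem) in enumerate(packets, start=1):
        if alu is not None and alu.kind not in ("alu", "branch"):
            raise ValueError(f"cycle {cycle}: {alu.text} needs the load/store slot")
        if mem is not None and mem.kind not in ("lw", "sw"):
            raise ValueError(f"cycle {cycle}: {mem.text} needs the ALU/branch slot")
        issued = [i for i in (alu, mem) if i is not None]
        for ins in issued:
            partner_dests = {o.dest for o in issued if o is not ins and o.dest}
            for reg in ins.srcs:
                if ready.get(reg, 0) > cycle:
                    raise ValueError(f"cycle {cycle}: {ins.text} reads {reg} too early")
                if reg in partner_dests:
                    raise ValueError(f"cycle {cycle}: {ins.text} depends on its partner")
        for ins in issued:
            if ins.dest:
                ready[ins.dest] = cycle + (2 if ins.kind == "lw" else 1)
    return len(packets)


lw = Instr("lw $t0, 0($s1)", "lw", "$t0", ("$s1",))
addi = Instr("addi $s1, $s1, -4", "alu", "$s1", ("$s1",))
addu = Instr("addu $t0, $t0, $s2", "alu", "$t0", ("$t0", "$s2"))
sw = Instr("sw $t0, 4($s1)", "sw", None, ("$t0", "$s1"))
bne = Instr("bne $s1, $zero, Loop", "branch", None, ("$s1", "$zero"))

schedule = [(None, lw), (addi, None), (addu, None), (bne, sw)]
cycles = check_schedule(schedule)
cpi = cycles / 5  # five instructions per loop iteration
print(cycles, cpi)  # 4 cycles, CPI 0.8
```

Under these assumptions the loop needs 4 cycles for 5 instructions, a CPI of 0.8, against an ideal of 0.5 for a two-issue machine.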
10. Loop Unrolling
- Purpose: to achieve more performance improvement from looping
- Idea
  - Schedule multiple copies of the loop body together
  - For the previous example, assume the loop index is a multiple of 4
- What is the performance improvement?
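The unrolling idea above can be sketched in Python as a behavioral model, not as a cycle-accurate one. It assumes the loop adds a scalar to every array element and that the element count is a multiple of 4. The function names and the instruction-count constants are illustrative assumptions. The counts follow from the loop body: each copy needs its own lw, addu, and sw, while a single addi and bne serve all four copies.

```python
def add_scalar_rolled(values, s2):
    """One element per iteration, walking down from the last element."""
    a = list(values)
    i = len(a) - 1
    while i >= 0:          # bne closes every iteration
        a[i] += s2         # lw, addu, sw
        i -= 1             # addi
    return a


def add_scalar_unrolled4(values, s2):
    """Four copies of the loop body per iteration."""
    if len(values) % 4 != 0:
        raise ValueError("element count must be a multiple of 4")
    a = list(values)
    i = len(a) - 1
    while i >= 0:          # one bne per four elements
        a[i] += s2         # copy 1: lw, addu, sw
        a[i - 1] += s2     # copy 2
        a[i - 2] += s2     # copy 3
        a[i - 3] += s2     # copy 4
        i -= 4             # one addi per four elements
    return a


# Instructions executed per four elements under each version.
ROLLED_PER_4 = 4 * 5        # 4 iterations of lw, addu, sw, addi, bne
UNROLLED_PER_4 = 4 * 3 + 2  # 4 x (lw, addu, sw) + addi + bne

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(add_scalar_unrolled4(data, 10) == add_scalar_rolled(data, 10))
print(ROLLED_PER_4, UNROLLED_PER_4)  # 20 vs 14
```

Besides removing loop overhead, the unrolled body exposes independent lw/addu/sw chains that a scheduler can interleave to fill both issue slots.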
11. Dynamic Pipeline Scheduling
- The hardware performs the scheduling
  - Hardware tries to find instructions to execute
  - Out-of-order execution is possible
  - Speculative execution and dynamic branch prediction
- Basic idea
  - DPS tries to find later instructions to execute while waiting for a stall to be resolved
- Pipeline is divided into 3 major units
  - Instruction fetch and issue unit: IF, ID
  - Execute unit: 5 to 10 independent functional units
  - Commit unit: determines when to put the result back to register or memory
- In-order completion vs. out-of-order completion
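The difference between out-of-order completion in the execute unit and in-order commit can be illustrated with a toy timing model. This is a sketch under strong assumptions: unlimited functional units, every instruction already fetched at cycle 0, no speculation, and made-up latencies. The `simulate` function, the 3-cycle load latency, and the independent `sub` instruction are hypothetical and not taken from the slides.

```python
def simulate(program):
    """Return (finish, commit) cycle lists, both indexed in program order.

    program is a list of (text, latency, deps); deps holds the indices of
    earlier instructions whose results this instruction needs.
    """
    finish, commit = [], []
    for text, latency, deps in program:
        # Execute unit: start as soon as every producer has finished.
        start = max((finish[d] for d in deps), default=0)
        finish.append(start + latency)
        # Commit unit: never retire before the previous instruction.
        previous = commit[-1] if commit else 0
        commit.append(max(finish[-1], previous))
    return finish, commit


program = [
    ("lw   $t0, 0($s1)",   3, []),   # assumed 3-cycle load
    ("addu $t0, $t0, $s2", 1, [0]),  # must wait for the lw
    ("sub  $t3, $t4, $t5", 1, []),   # independent, can run ahead
]
finish, commit = simulate(program)
completion_order = sorted(range(len(program)), key=lambda i: finish[i])

print(finish)            # [3, 4, 1]: sub finishes first
print(completion_order)  # [2, 0, 1]: out-of-order completion
print(commit)            # [3, 4, 4]: results written back in program order
```

The sub finishes executing at cycle 1 but is held until cycle 4, after the lw and addu ahead of it have committed.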
12. Basic Idea
13. Summary
- All modern processors are very complicated
  - DEC Alpha 21264: 9-stage pipeline, 6 instructions in parallel, 4 instructions per CC
  - PowerPC and Pentium/Itanium: branch history table, dynamic pipelining
- Compiler technology is important
- Combining dynamic pipelining with branch prediction is very challenging
  - The commit unit should know how to roll back, i.e., discard instructions when a prediction is wrong
- Dynamic execution is based on prediction
  - Hide memory latency
  - Avoid stalls
  - Execute instructions while waiting for hazards to be resolved
14. Exercise 6.20
- lw $2, 100($5)
  sw $2, 200($6)
- In which stage should forwarding be done?
- How about hazard detection?
15. Forwarding Unit in EX Stage
[Figure: datapath with the forwarding unit in the EX stage; multiplexer inputs labeled 0 and 1]
Conditions?
16. Forwarding Unit in MEM Stage
- Is it possible? Yes
- Steps
  - Change the control unit so that RegDst is valid and selects ID/EX.RegisterRt for the sw instruction, even though sw does not require it
  - Add a multiplexer to the write port of data memory
- Conditions for the forwarding unit to generate the selector signal?
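One plausible answer to the question above is sketched below. It is offered as an assumption, not as the slide's stated solution. It assumes that, after the RegDst change, sw's rt register number travels down the pipeline in EX/MEM.RegisterRd, and that the new multiplexer chooses between the register value (selector 0) and the MEM/WB write-back value (selector 1). The function and parameter names are hypothetical.

```python
def forward_store_data_in_mem(ex_mem_mem_write: bool,
                              ex_mem_rd: int,
                              mem_wb_reg_write: bool,
                              mem_wb_rd: int) -> bool:
    """Selector for the assumed mux on the data memory write port.

    True selects the value being written back from MEM/WB; False keeps
    the register value that sw carried down the pipeline.
    """
    return bool(ex_mem_mem_write           # instruction in MEM is sw
                and mem_wb_reg_write       # instruction in WB writes a register
                and mem_wb_rd != 0         # register $0 is never forwarded
                and mem_wb_rd == ex_mem_rd)  # WB writes the register sw stores


# lw $2, 100($5) followed by sw $2, 200($6):
# when sw reaches MEM, lw is in WB and writes $2.
print(forward_store_data_in_mem(True, 2, True, 2))
```

With this path in place, the lw/sw pair from Exercise 6.20 would need no stall, because the loaded value is available in MEM/WB in the same cycle the sw writes memory.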
17. Hazard Detection
Conditions?
18. Questions?