Csci 136 Computer Architecture II - PowerPoint PPT Presentation

Slides: 15
Provided by: xiuzhe

Transcript and Presenter's Notes


1
Csci 136 Computer Architecture II: Superscalar
and Dynamic Pipelining
  • Xiuzhen Cheng
  • cheng_at_gwu.edu

2
Announcement
  • Homework assignment 11: due April 8
  • Reading: Section 6.8
  • Problems: 6.30, 6.31
  • Project 3 is due on April 10, 2004
  • Final: Tuesday, May 4th, 11:00 AM-1:00 PM
  • Note: you must pass the final to pass this course!

3
SW Is in EX Stage
[Figure: pipeline diagram; labels: sw, R-Type or lw, R-Type, Sign-Ext]
  • Forward the store data from MEM/WB when:
    ID/EX.MemWrite and MEM/WB.RegWrite and
    MEM/WB.RegisterRd = ID/EX.RegisterRt and
    EX/MEM.RegisterRd != ID/EX.RegisterRt and
    MEM/WB.RegisterRd != 0
  • Forward the store data from EX/MEM when:
    ID/EX.MemWrite and EX/MEM.RegWrite and
    EX/MEM.RegisterRd = ID/EX.RegisterRt and
    EX/MEM.RegisterRd != 0

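
The two conditions on this slide can be written out as a small selector function. This is a minimal sketch, not the course's reference solution: the `PipelineState` class, its field names, and the returned labels are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    # Hypothetical snapshot of the pipeline-register fields used on the slide.
    id_ex_mem_write: bool    # ID/EX.MemWrite (the instruction in EX is sw)
    id_ex_rt: int            # ID/EX.RegisterRt (register holding the store data)
    ex_mem_reg_write: bool   # EX/MEM.RegWrite
    ex_mem_rd: int           # EX/MEM.RegisterRd
    mem_wb_reg_write: bool   # MEM/WB.RegWrite
    mem_wb_rd: int           # MEM/WB.RegisterRd


def store_data_source(s: PipelineState) -> str:
    """Pick where the store data for sw in EX should come from."""
    # Newest producer first: the instruction one ahead of sw (in EX/MEM).
    if (s.id_ex_mem_write and s.ex_mem_reg_write
            and s.ex_mem_rd == s.id_ex_rt and s.ex_mem_rd != 0):
        return "EX/MEM"
    # Otherwise the instruction two ahead of sw (in MEM/WB).
    if (s.id_ex_mem_write and s.mem_wb_reg_write
            and s.mem_wb_rd == s.id_ex_rt
            and s.ex_mem_rd != s.id_ex_rt and s.mem_wb_rd != 0):
        return "MEM/WB"
    return "REGFILE"  # no hazard: use the value read in ID
```

Checking EX/MEM first mirrors the extra term in the MEM/WB condition: when both earlier instructions write the same register, the more recent result wins.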
4
The Big Picture: Where Are We Now?
  • The five classic components of a computer
  • Current topic
  • Superscalar and dynamic pipelining

[Figure: the five components. Processor (Control, Datapath), Memory, Input, Output]
5
Is a Faster Processor Possible?
  • Ideally, pipelining can provide CPI = 1. Is it
    possible to design a faster processor?
  • Yes
  • Superpipelining: longer pipelines
  • Divide the washer into 3 machines: wash, rinse, spin
  • Superscalar: replicate the internal components
    of the computer so that it can launch multiple
    instructions per clock cycle (CC)
  • Buy 3 washers, 3 dryers, etc.
  • Dynamic pipelining: use hardware to avoid
    pipeline hazards
  • Out-of-order execution is possible
  • More complicated pipeline control and instruction
    execution model
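
To see why both approaches can beat CPI = 1 pipelining, recall that execution time is instruction count times CPI times clock cycle time. The sketch below uses made-up numbers purely for illustration: superpipelining is modeled as halving the cycle time, and a 2-way superscalar as halving the ideal CPI.

```python
def cpu_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Illustrative numbers only (assumptions, not measurements).
N = 1_000_000
baseline = cpu_time_ns(N, cpi=1.0, cycle_time_ns=1.0)        # ideal 5-stage pipeline
superpipelined = cpu_time_ns(N, cpi=1.0, cycle_time_ns=0.5)  # shorter stages
superscalar = cpu_time_ns(N, cpi=0.5, cycle_time_ns=1.0)     # 2 instructions per CC
```

Both ideal variants finish in half the baseline time; real designs fall short of this because hazards add stall cycles.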

6
Issuing Multiple Instructions/Cycle
  • Two main variations: superscalar and VLIW
  • Superscalar: varying number of instructions/cycle
    (1 to 6)
  • Parallelism and dependencies determined/resolved
    by HW
  • IBM PowerPC 604, Sun UltraSparc, DEC Alpha 21164,
    HP 7100
  • Very Long Instruction Word (VLIW): fixed number
    of instructions (16); parallelism determined by
    the compiler
  • Pipeline is exposed; the compiler must schedule
    delays to get the right result
  • Explicitly Parallel Instruction Computing (EPIC)/
    Intel
  • 128-bit packets containing 3 instructions (can
    execute sequentially)
  • Can link 128-bit packets together to allow more
    parallelism
  • Compiler determines parallelism; HW checks
    dependencies and forwards/stalls
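
As an illustration of a 128-bit packet holding 3 instructions, the sketch below packs and unpacks such a packet. It assumes the IA-64 (Itanium) bundle layout of a 5-bit template in the low bits followed by three 41-bit instruction slots; the slide does not give the layout, and the function names are hypothetical.

```python
TEMPLATE_BITS = 5   # assumed: IA-64 template field
SLOT_BITS = 41      # assumed: IA-64 instruction slot width
NUM_SLOTS = 3       # 5 + 3 * 41 = 128 bits


def pack_bundle(template: int, slots: tuple[int, int, int]) -> int:
    """Pack a template and three instruction slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template does not fit in 5 bits")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("slot does not fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> tuple[int, tuple[int, ...]]:
    """Split a 128-bit bundle back into (template, slots)."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    mask = (1 << SLOT_BITS) - 1
    slots = tuple((bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & mask
                  for i in range(NUM_SLOTS))
    return template, slots
```

In this layout the template is what tells the hardware which slots may issue together, which is how the compiler communicates the parallelism it found.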

7
Superscalar MIPS
  • Assume two instructions are issued per clock
    cycle
  • One ALU operation or branch
  • One memory access instruction (load or store)

Instruction type            Pipe stages
ALU or branch instruction   IF  ID  EX  MEM WB
Load or store instruction   IF  ID  EX  MEM WB
ALU or branch instruction       IF  ID  EX  MEM WB
Load or store instruction       IF  ID  EX  MEM WB
ALU or branch instruction           IF  ID  EX  MEM WB
Load or store instruction           IF  ID  EX  MEM WB
ALU or branch instruction               IF  ID  EX  MEM WB
Load or store instruction               IF  ID  EX  MEM WB
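
The slot restriction above (one ALU/branch instruction plus one load/store per cycle) can be sketched as a simple in-order pairing routine. This is an illustration only: it looks at instruction types, ignores data dependences entirely, and the names `MEM_OPS` and `pair_in_order` are hypothetical.

```python
MEM_OPS = {"lw", "sw"}  # instructions that must use the load/store slot


def is_mem(instr: str) -> bool:
    return instr.split()[0] in MEM_OPS


def pair_in_order(program: list[str]) -> list[tuple[str, str]]:
    """Group instructions into (ALU/branch slot, load/store slot) packets.

    Only slot types are checked; a real issue unit or compiler must also
    check data dependences before pairing two instructions.
    """
    packets = []
    i = 0
    while i < len(program):
        first = program[i]
        if is_mem(first):
            packets.append(("nop", first))           # ALU slot stays empty
            i += 1
        elif i + 1 < len(program) and is_mem(program[i + 1]):
            packets.append((first, program[i + 1]))  # full packet
            i += 2
        else:
            packets.append((first, "nop"))           # load/store slot stays empty
            i += 1
    return packets
```

Every `nop` in the result is a wasted issue slot, which is why instruction order matters so much on a statically scheduled two-issue machine.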
8
Additional Hardware Requirement
  • Instructions must be paired and aligned
  • Extra ports in the register file to serve
    2 instructions per cycle
  • Separate adder for lw/sw address computation
  • What happens with load-use instructions?

9
Simple Superscalar Example
  • How would this loop be scheduled on a superscalar
    pipeline for MIPS?
      Loop: lw   $t0, 0($s1)
            addu $t0, $t0, $s2
            sw   $t0, 0($s1)
            addi $s1, $s1, -4
            bne  $s1, $zero, Loop
    Re-order the instructions to avoid as many
    pipeline stalls as possible
  • Solution hints
  • Identify instructions with data dependencies:
    they cannot be executed out of order!
  • Identify load-use instructions that require
    pipeline stalls
  • Any performance (in CPI) improvement?
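
One possible schedule for this loop on the two-issue pipeline is sketched below, together with a helper that computes its CPI. The schedule is an assumption worked out from the hints, not taken from the slides: addi is moved ahead of sw, so the sw offset becomes 4, and addu stays one full cycle behind lw to respect the load-use delay. The helper name `cpi` is hypothetical.

```python
# Each entry is one clock cycle: (ALU/branch slot, load/store slot).
schedule = [
    ("nop",                  "lw $t0, 0($s1)"),
    ("addi $s1, $s1, -4",    "nop"),
    ("addu $t0, $t0, $s2",   "nop"),
    ("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),
]


def cpi(packets: list[tuple[str, str]]) -> float:
    """Cycles per useful (non-nop) instruction for one loop iteration."""
    useful = sum(op != "nop" for packet in packets for op in packet)
    return len(packets) / useful
```

With 5 useful instructions in 4 cycles, `cpi(schedule)` evaluates to 0.8, better than the single-issue ideal of 1 but well short of the two-issue ideal of 0.5.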

10
Loop Unrolling
  • Purpose: to get more performance out of loops
  • Idea
  • Schedule multiple copies of the loop body
    together
  • For the previous example, assume the loop index
    is a multiple of 4
  • What is the performance improvement?
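
At the source level, unrolling the previous loop four times looks roughly like the sketch below. It is written in Python for illustration and mirrors the idea rather than the exact MIPS addressing; the function names are hypothetical, and the element count is assumed to be a multiple of 4, as on the slide.

```python
def add_scalar(a: list[int], s: int) -> None:
    """Original loop: one element per iteration, walking down the array."""
    i = len(a) - 1
    while i >= 0:
        a[i] += s
        i -= 1


def add_scalar_unrolled4(a: list[int], s: int) -> None:
    """Same work with the loop body copied 4 times per iteration."""
    if len(a) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    i = len(a) - 4
    while i >= 0:
        # Four independent "loads" into separate temporaries (register renaming).
        t0, t1, t2, t3 = a[i + 3], a[i + 2], a[i + 1], a[i]
        # Four independent adds: no dependence between copies.
        t0, t1, t2, t3 = t0 + s, t1 + s, t2 + s, t3 + s
        # Four "stores".
        a[i + 3], a[i + 2], a[i + 1], a[i] = t0, t1, t2, t3
        i -= 4  # one index update and one branch per 4 elements
```

Using distinct temporaries per copy is what gives the scheduler independent instructions to pair, and the loop overhead (index update and branch) is paid once per four elements instead of once per element.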

11
Dynamic Pipeline Scheduling
  • The hardware performs the scheduling
  • Hardware tries to find instructions to execute
  • Out-of-order execution is possible
  • Speculative execution and dynamic branch
    prediction
  • Basic idea
  • Dynamic pipeline scheduling (DPS) tries to find
    later instructions to execute while waiting for
    a stall to be resolved
  • Pipeline is divided into 3 major units
  • Instruction fetch and issue unit: IF, ID
  • Execute unit: 5 to 10 independent functional
    units
  • Commit unit: determines when to put the result
    back into a register or memory
  • In-order completion vs. out-of-order completion
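
The difference between out-of-order execution and in-order completion can be sketched with a tiny model of the commit unit. This is a simplification under stated assumptions: each instruction has a known cycle at which its functional unit finishes, and the commit unit may retire any number of instructions per cycle. The function name is hypothetical.

```python
def in_order_commit(finish_cycles: list[int]) -> list[int]:
    """Return the commit cycle of each instruction, in program order.

    finish_cycles[i] is the cycle in which instruction i finishes executing.
    An instruction commits only once it has finished AND every earlier
    instruction has committed.
    """
    commit_cycles = []
    last_commit = 0
    for finish in finish_cycles:
        last_commit = max(last_commit, finish)
        commit_cycles.append(last_commit)
    return commit_cycles
```

For example, with finish cycles `[5, 2, 3, 7]` (say instruction 0 is a slow load), instructions 1 and 2 finish early but the result is `[5, 5, 5, 7]`: their results are held until instruction 0 commits.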

12
Basic Idea
13
Summary
  • All modern processors are very complicated
  • DEC Alpha 21264: 9-stage pipeline, 6 instructions
    in parallel, 4 instructions per CC
  • PowerPC and Pentium/Itanium: branch history
    table, dynamic pipelining
  • Compiler technology is important
  • Combining dynamic pipelining with branch
    prediction is very challenging
  • Commit unit should know how to roll back, that is,
    discard instructions when a prediction is wrong
  • Dynamic execution is based on prediction
  • Hide memory latency
  • Avoid stalls
  • Execute instructions while waiting for hazards to
    be resolved
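
The rollback requirement on the commit unit can be sketched as follows. This is a hypothetical model, not any particular processor's mechanism: `RobEntry` and `commit_with_rollback` are names introduced here, and the buffer is assumed to hold fully executed instructions in program order.

```python
from dataclasses import dataclass


@dataclass
class RobEntry:
    # Hypothetical reorder-buffer entry for an executed instruction.
    name: str
    is_branch: bool = False
    mispredicted: bool = False


def commit_with_rollback(rob: list[RobEntry]) -> tuple[list[str], list[str]]:
    """Commit entries in program order; squash everything after a bad branch.

    Returns (committed names, discarded names).
    """
    committed: list[str] = []
    discarded: list[str] = []
    for index, entry in enumerate(rob):
        committed.append(entry.name)  # the branch itself still commits
        if entry.is_branch and entry.mispredicted:
            # Younger instructions were fetched down the wrong path:
            # their results must never reach registers or memory.
            discarded = [younger.name for younger in rob[index + 1:]]
            break
    return committed, discarded
```

Because results are written back only at commit, discarding the speculative entries is enough to undo the wrong-path work.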

14
Exercise 6.20
  • lw $2, 100($5)
    sw $2, 200($6)
  • In which stage should forwarding be done?
  • How about hazard detection?

15
Forwarding Unit in EX Stage
[Figure: datapath with a multiplexer (inputs 0 and 1)]
Conditions?
16
Forwarding Unit in MEM Stage
  • Is it possible? -- YES
  • Steps
  • Change the control unit so that RegDst selects
    ID/EX.RegisterRt for the sw instruction, even
    though sw does not otherwise need it
  • Add a multiplexer to the write-data port of data
    memory
  • Conditions for the forwarding unit to generate
    the selector signal?
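
One plausible answer to the question above is sketched below. It is an assumption, not the course's official solution: it supposes that, after the control change in the first step, the EX/MEM register carries the number of the register sw wants to store, and the function and parameter names are hypothetical.

```python
def forward_store_data_in_mem(ex_mem_mem_write: bool,
                              ex_mem_store_reg: int,
                              mem_wb_reg_write: bool,
                              mem_wb_rd: int) -> bool:
    """Should the new multiplexer take the store data from MEM/WB?

    ex_mem_store_reg: register number carried in EX/MEM for the sw in MEM
                      (assumed to be its Rt, selected by RegDst).
    mem_wb_rd:        destination register of the instruction in WB.
    """
    return bool(ex_mem_mem_write         # instruction in MEM is a store
                and mem_wb_reg_write     # instruction in WB writes a register
                and mem_wb_rd != 0       # register 0 is never forwarded
                and mem_wb_rd == ex_mem_store_reg)
```

For the pair `lw $2, 100($5)` followed by `sw $2, 200($6)`, the lw is in WB when the sw reaches MEM, so the call `forward_store_data_in_mem(True, 2, True, 2)` selects the forwarded value.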

17
Hazard Detection
Conditions?
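
A possible set of conditions is sketched below, starting from the usual load-use check (stall when the instruction in EX is a load whose destination is a source of the instruction in ID). The relaxation for stores is an assumption that only holds if MEM-stage forwarding of store data has been added; all names are hypothetical.

```python
def must_stall(id_ex_mem_read: bool, id_ex_rt: int,
               if_id_rs: int, if_id_rt: int,
               if_id_is_store: bool = False,
               mem_stage_forwarding: bool = False) -> bool:
    """Load-use hazard detection for the instruction currently in ID."""
    if not id_ex_mem_read:
        return False          # instruction in EX is not a load
    if id_ex_rt == if_id_rs:
        return True           # loaded value needed in EX (ALU input or address)
    if id_ex_rt == if_id_rt:
        # A store needs Rt only as store data in MEM; with MEM-stage
        # forwarding (assumed), the loaded value arrives in time.
        return not (if_id_is_store and mem_stage_forwarding)
    return False
```

For `lw $2, 100($5)` followed by `sw $2, 200($6)`, the call `must_stall(True, 2, 6, 2, if_id_is_store=True, mem_stage_forwarding=True)` reports no stall, while the same pair without MEM-stage forwarding needs one bubble.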
18
Questions?