Chapter Six - PowerPoint PPT Presentation

Provided by: TodA151
Learn more at: http://cse.unl.edu
Transcript and Presenter's Notes

Title: Chapter Six


1
Chapter Six
2
Pipelining
  • The laundry analogy

3
Pipelining
  • Improve performance by increasing instruction
    throughput
  • Ideal speedup is number of stages in the
    pipeline. Do we achieve this?

Note: timing assumptions changed for this
example
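The ideal-speedup claim can be sanity-checked with a short sketch (the function names are mine, not from the slides): with k stages and n instructions, a pipeline needs k + (n - 1) cycles, so the speedup over a design that spends all k stage-times on every instruction approaches k as n grows — but never quite reaches it.

```python
# Sketch (names illustrative): ideal pipeline speedup for k stages.
def single_cycle_time(n, k):
    # Unpipelined: each of the n instructions occupies all k stage-times.
    return n * k

def pipelined_time(n, k):
    # Pipelined: k cycles to fill, then one instruction completes per cycle.
    return k + (n - 1)

n, k = 1_000_000, 5
print(single_cycle_time(n, k) / pipelined_time(n, k))  # just under 5
```

This is why the answer to "do we achieve the ideal speedup?" is no even before hazards: the fill (and drain) cycles are pure overhead.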
4
Pipelining
  • What makes it easy?
  • all instructions are the same length
  • just a few instruction formats
  • memory operands appear only in loads and stores
  • What makes it hard?
  • structural hazards: suppose we had only one
    memory
  • control hazards: we need to worry about branch
    instructions
  • data hazards: an instruction depends on a
    previous instruction
  • We'll build a simple pipeline and look at these
    issues
  • We'll talk about modern processors and what
    really makes it hard
  • exception handling
  • trying to improve performance with out-of-order
    execution, etc.

5
Basic Idea
  • What do we need to add to actually split the
    datapath into stages?

6
Pipelined Datapath
  • Can you find a problem even if
    there are no dependencies? What instructions
    can we execute to manifest the problem?

7
Corrected Datapath
8
Inst. Flow in a Pipelined Datapath: IF & ID Stages
9
Inst. Flow in a Pipelined Datapath: EX Stage
10
Inst. Flow in a Pipelined Datapath: MEM & WB
Stages
11
Graphically Representing Pipelines
  • Can help with answering questions like
  • how many cycles does it take to execute this
    code?
  • what is the ALU doing during cycle 4?
  • use this representation to help understand
    datapaths

12
Pipeline Control
13
Pipeline control
  • We have 5 stages. What needs to be controlled in
    each stage?
  • Instruction Fetch and PC Increment
  • Instruction Decode / Register Fetch
  • Execution
  • Memory Stage
  • Write Back
  • How would control be handled in an automobile
    plant?
  • a fancy control center telling everyone what to
    do?
  • should we use a finite state machine?

14
Pipeline Control
  • Pass control signals along just like the data
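"Passing control along like data" can be sketched in a few lines (the signal names follow a MIPS-style datapath, but the exact bundle is my assumption): control is computed once in ID, and each pipeline register carries only the signals still needed downstream.

```python
# Sketch: control bundles for a load word (signal names illustrative).
def decode_control(opcode):
    if opcode == "lw":
        return {"ex":  {"ALUSrc": 1, "RegDst": 0},
                "mem": {"MemRead": 1, "MemWrite": 0},
                "wb":  {"RegWrite": 1, "MemtoReg": 1}}
    # default: R-type arithmetic
    return {"ex":  {"ALUSrc": 0, "RegDst": 1},
            "mem": {"MemRead": 0, "MemWrite": 0},
            "wb":  {"RegWrite": 1, "MemtoReg": 0}}

ctrl = decode_control("lw")
id_ex  = ctrl                                  # ID/EX: ex + mem + wb signals
ex_mem = {k: ctrl[k] for k in ("mem", "wb")}   # EX consumed its signals
mem_wb = {k: ctrl[k] for k in ("wb",)}         # MEM consumed its signals
```

Each stage "peels off" what it uses, so no central controller is needed — the answer to the automobile-plant question above.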

15
Datapath with Control
16
Dependencies
  • Problem with starting next instruction before
    first is finished
  • dependencies that go backward in time are data
    hazards

17
Software Solution
  • Have the compiler guarantee no hazards by forcing
    the consumer instruction to wait (i.e., inserting
    no-ops)
  • Where do we insert the no-ops?
      sub $2, $1, $3
      and $12, $2, $5
      or  $13, $6, $2
      add $14, $2, $2
      sw  $15, 100($2)
  • Problem: this really slows us down!
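A compiler pass of this kind can be sketched as follows (the tuple encoding and the two-slot hazard window are my assumptions; with register-file write/read forwarding, a reader three slots behind the writer is safe):

```python
# Sketch: insert nops so no instruction reads a register written by
# one of the two immediately preceding real instructions.
# Instructions are (opcode, dest_register, source_registers).
def insert_nops(program, hazard_window=2):
    out = []
    for op, dest, srcs in program:
        recent = out[-hazard_window:]          # last slots actually emitted
        needed = 0
        for age, (p_op, p_dest, _) in enumerate(reversed(recent)):
            if p_op != "nop" and p_dest in srcs:
                needed = max(needed, hazard_window - age)
        out.extend([("nop", None, ())] * needed)
        out.append((op, dest, srcs))
    return out

prog = [("sub", "$2",  ("$1", "$3")),
        ("and", "$12", ("$2", "$5")),
        ("or",  "$13", ("$6", "$2")),
        ("add", "$14", ("$2", "$2")),
        ("sw",  None,  ("$15", "$2"))]
```

On this sequence the pass emits two nops between `sub` and `and`; after that, `or`, `add`, and `sw` read `$2` late enough to be safe — which is exactly the slowdown the slide complains about.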

18
Forwarding
  • Use temporary results, don't wait for them to be
    written
  • register file forwarding to handle read/write to
    the same register
  • ALU forwarding

19
Forwarding
  • The main idea
  • Detect conditions for dependencies (e.g., an
    earlier instruction's Rd matches the current
    instruction's Rs or Rt)
  • If the condition is true, select the forwarded
    input of the ALU mux; otherwise select the
    normal mux input
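That condition is roughly what a MIPS-style forwarding unit computes for each ALU operand. A sketch (mux encodings and field names are my assumptions):

```python
# Sketch of one ALU-operand forwarding mux:
# "10" = take the EX/MEM ALU result, "01" = take the MEM/WB value,
# "00" = no hazard, read the register file.
def forward_select(src_reg, ex_mem, mem_wb):
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src_reg:
        return "10"   # most recent producer wins
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == src_reg:
        return "01"
    return "00"

# e.g. sub $2,... immediately followed by and ...,$2,...:
sel = forward_select(2, {"RegWrite": 1, "Rd": 2}, {"RegWrite": 0, "Rd": 0})
print(sel)  # "10": forward the just-computed ALU result
```

Note the `Rd != 0` guard: register 0 is hardwired to zero in MIPS, so a "write" to it must never trigger forwarding.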

20
Can't always forward
  • Load word can still cause a hazard
  • an instruction tries to read a register following
    a load instruction that writes to the same
    register.
  • Thus, we need a hazard detection unit to stall
    the load instruction
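The condition that hazard detection unit checks can be sketched as (field names assumed from a MIPS-style pipeline):

```python
# Sketch: stall when the load currently in EX will write a register
# that the instruction in ID is about to read — the loaded value only
# exists after MEM, one cycle too late to forward into EX.
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)

# lw $2, 20($1) in EX, and $4, $2, $5 in ID -> stall one cycle
print(load_use_stall(True, 2, 2, 5))  # True
```

When the condition holds, the unit freezes PC and IF/ID and zeroes the control signals in ID/EX, which is the nop-injection described on the next slide.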

21
Stalling
  • We can stall the pipeline by keeping an
    instruction in the same stage

22
Hazard Detection Unit
  • Stall by letting an instruction that won't write
    anything (i.e., a nop) go forward

23
Branch Hazards
  • When we decide to branch, other instructions are
    in the pipeline!
  • We are predicting the branch as not taken
  • need to add hardware for flushing the three
    instructions already in the pipeline if we are
    wrong
  • Mis-prediction penalty: 3 cycles

24
Flushing Instructions

Note: we've also moved the branch decision to the ID
stage to reduce the branch penalty (from 3 to 1)
25
Branches
  • If the branch is taken, we have a penalty of one
    cycle
  • For our simple design, this is reasonable
  • With deeper pipelines, penalty increases and
    static branch prediction drastically hurts
    performance
  • Solution: dynamic branch prediction

A 2-bit prediction scheme
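The 2-bit scheme in the figure can be sketched as a saturating counter (the state encoding is mine): two wrong guesses are needed to flip a strong prediction, so a loop branch that is almost always taken mispredicts only at loop exit, not twice per trip around.

```python
# Sketch: 2-bit saturating counter. States 0-1 predict not taken,
# states 2-3 predict taken; each outcome moves the counter one step.
class TwoBitPredictor:
    def __init__(self, state=0):
        self.state = state          # 0 = strong not-taken ... 3 = strong taken

    def predict(self):
        return self.state >= 2      # True means "predict taken"

    def update(self, taken):
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)

p = TwoBitPredictor(3)              # strongly taken (e.g., inside a loop)
p.update(False)                     # loop exit: one misprediction...
print(p.predict())                  # True: still predicts taken
```

Compare a 1-bit predictor, which would flip immediately and then mispredict again on the next loop entry.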
26
Branch Prediction
  • Sophisticated Techniques
  • A branch target buffer to help us look up the
    destination
  • Correlating predictors that base prediction on
    global behavior and recently executed branches
    (e.g., prediction for a specific branch
    instruction based on what happened in previous
    branches)
  • Tournament predictors that use different types of
    prediction strategies and keep track of which one
    is performing best.
  • A branch delay slot which the compiler tries to
    fill with a useful instruction (make the one
    cycle delay part of the ISA)
  • Branch prediction is especially important because
    it enables other more advanced pipelining
    techniques to be effective!
  • Modern processors predict correctly 95% of the
    time!

27
Improving Performance
  • Try to avoid stalls! E.g., reorder these
    instructions:
  • lw $t0, 0($t1)
  • lw $t2, 4($t1)
  • sw $t2, 0($t1)
  • sw $t0, 4($t1)
  • Dynamic Pipeline Scheduling
  • Hardware chooses which instructions to execute
    next
  • Will execute instructions out of order (e.g.,
    doesn't wait for a dependency to be resolved, but
    rather keeps going!)
  • Speculates on branches and keeps the pipeline
    full (may need to rollback if prediction
    incorrect)
  • Trying to exploit instruction-level parallelism
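The reordering in the example above can be checked mechanically. This sketch (the helper name is mine) counts load-use pairs before and after swapping the two stores — a legal swap here because the stores write different addresses:

```python
# Sketch: count load-use hazards (a loaded register read by the very
# next instruction) in a straight-line MIPS snippet.
def load_use_stalls(seq):
    stalls = 0
    for a, b in zip(seq, seq[1:]):
        if a.startswith("lw"):
            dest = a.split()[1].rstrip(",")   # destination register of the load
            if dest in b.split(maxsplit=1)[1]:
                stalls += 1
    return stalls

before = ["lw $t0, 0($t1)", "lw $t2, 4($t1)",
          "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
reordered = [before[0], before[1], before[3], before[2]]
print(load_use_stalls(before), load_use_stalls(reordered))  # 1 0
```

The original order uses `$t2` in the slot right after its load (one stall even with forwarding); the reordered sequence has no load-use pair, so it runs without stalling.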

28
Advanced Pipelining
  • Increase the depth of the pipeline
  • Start more than one instruction each cycle
    (multiple issue)
  • Loop unrolling to expose more ILP (better
    scheduling)
  • Superscalar processors
  • DEC Alpha 21264: 9-stage pipeline, 6-instruction
    issue
  • All modern processors are superscalar and issue
    multiple instructions, usually with some
    limitations (e.g., different pipes)
  • VLIW (very long instruction word): static
    multiple issue (relies more on compiler
    technology)
  • This class has given you the background you need
    to learn more!

29
Chapter 6 Summary
  • Pipelining does not improve latency, but does
    improve throughput