Pujan Joshi - PowerPoint PPT Presentation

1
Slipstream Processors: Improving both Performance
and Fault Tolerance
  • Pujan Joshi
  • May 6th, 2008

2
Analogy - NASCAR
3
The Paradigm
  • It is possible to make forward progress by
    executing only a subset of the original program.
  • Ineffectual instructions:
      • Non-modifying writes
      • Unreferenced writes
      • Correctly-predicted branches
      • ...also their dependence chains
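
To make the two kinds of ineffectual writes concrete, here is a minimal Python sketch that scans a toy trace of reads and writes. The `Op` record, the trace encoding, and the function name are assumptions made for illustration; they are not taken from the slides or from the slipstream hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    kind: str                     # "write" or "read" (hypothetical encoding)
    loc: str                      # register or memory location name
    value: Optional[int] = None   # value written (writes only)


def find_ineffectual_writes(trace, initial=None):
    """Return (non_modifying, unreferenced) lists of write indices.

    non_modifying: writes that store the value the location already holds.
    unreferenced:  writes overwritten by a later write before any read.
    Writes still pending at the end of the trace are left unclassified.
    """
    state = dict(initial or {})
    pending = {}                  # loc -> index of latest write not yet read
    non_modifying, unreferenced = [], []
    for i, op in enumerate(trace):
        if op.kind == "read":
            pending.pop(op.loc, None)        # the pending write was referenced
        elif op.kind == "write":
            if op.loc in state and state[op.loc] == op.value:
                non_modifying.append(i)      # no change: earlier write stays live
                continue
            if op.loc in pending:
                unreferenced.append(pending[op.loc])
            pending[op.loc] = i
            state[op.loc] = op.value
    return non_modifying, unreferenced


trace = [
    Op("write", "r1", 5),
    Op("write", "r1", 5),   # non-modifying: r1 already holds 5
    Op("write", "r2", 7),   # unreferenced: overwritten before any read
    Op("write", "r2", 9),
    Op("read", "r2"),
    Op("read", "r1"),
]
result = find_ineffectual_writes(trace)
```

In this toy trace the write at index 1 is non-modifying and the write at index 2 is unreferenced. The sketch does not follow dependence chains or branches.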

4
Slipstreaming
  • A slipstream processor creates a shorter
    instruction stream by skipping the ineffectual
    instructions.
  • Two copies of the same program are run: the full
    program and the shortened program.

5
Slipstream Execution
  • The shortened program is speculatively reduced
    and runs slightly ahead of the other.
  • The leading program is called the advanced stream
    (A-stream) and the trailing program is called the
    redundant stream (R-stream).

6
Advanced Stream
  • A-stream
  • Executes fewer instructions
  • Needs some hardware support
  • Results are communicated to R-stream

7
Redundant Stream
  • R-stream
  • Executes the whole program
  • Executes efficiently because it receives control and
    data outcomes as predictions from the A-stream
  • Compares these values against its own outcomes; if
    a deviation is detected, the corrupted A-stream
    context is recovered from the R-stream context.
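
The checking step can be sketched in a few lines of Python. This is only an illustration under assumed encodings: `a_predictions` holds one value per instruction, with `None` standing for an instruction the A-stream skipped, and the function name is hypothetical.

```python
def first_deviation(a_predictions, r_outcomes):
    """Return the index of the first R-stream outcome that contradicts the
    A-stream's prediction, or None if every delivered prediction checks out.

    a_predictions[i] is None when the A-stream skipped instruction i and
    therefore delivered no value for it (assumed encoding for this sketch).
    """
    for i, (predicted, actual) in enumerate(zip(a_predictions, r_outcomes)):
        if predicted is not None and predicted != actual:
            return i
    return None


a_predictions = [3, None, 8, 1]   # A-stream skipped instruction 1
r_outcomes = [3, 4, 9, 1]         # R-stream executes every instruction
deviation = first_deviation(a_predictions, r_outcomes)
```

Here the two streams disagree at instruction 2, which is the point where recovery of the A-stream context would be triggered.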

8
Interpretation
  • A-stream: a program-based predictor
  • R-stream: a fast checker

9
Slipstream Architecture
10

Micro-Architecture
  • Each processor is a conventional superscalar core
    with a branch predictor, instruction and data
    caches, an execution engine, and a reorder buffer.
  • New components added:
      • Instruction removal predictor (IR-predictor)
      • Instruction removal detector (IR-detector)
      • Delay buffer
      • Recovery controller

11
The Instruction Removal Detector
  • Monitors the R-stream.
  • Checks for ineffectual instructions.
  • Checks for correctly predicted branches.
  • Informs the IR-predictor of the instructions
    that can be skipped.

12
Removal Mechanism
  • Removes confidently predicted branches and the
    computations needed for them.
  • Removes highly value-predictable computations.
  • Ineffectual and branch-predictable instructions are
    removed and the PC is updated.
  • Value-predictable computations are replaced by the
    predicted value.

13
Instruction Removal Predictor
  • Generates PC for next block of instructions.
  • Removes the instructions suggested by the
    IR-detector in the fetch block.
  • Strategy
  • Built on top of a conventional trace predictor.
  • Adds a few more fields: IR-vector, intermediate
    PCs, confidence counters.

14
Instruction Removal Predictor (continued)
  • Confidence counters
  • The IR-detector updates these counters.
  • The corresponding instruction is removed if its
    counter is saturated.
  • The IR-vector and intermediate PCs are used to
    remove instructions.
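
A minimal Python sketch of saturating confidence counters and an IR-vector follows. The threshold value, the reset-on-miss update policy, and all names are assumptions for illustration; the slides only state that the IR-detector updates the counters and that an instruction is removed once its counter saturates.

```python
class ConfidenceCounter:
    """Saturating counter for one instruction slot (illustrative only)."""

    def __init__(self, threshold=4):      # threshold is an assumed value
        self.threshold = threshold
        self.count = 0

    def update(self, removable):
        # Assumed policy: count up while the IR-detector keeps flagging the
        # instruction as removable, reset as soon as it does not.
        if removable:
            self.count = min(self.count + 1, self.threshold)
        else:
            self.count = 0

    @property
    def saturated(self):
        return self.count >= self.threshold


def build_ir_vector(counters):
    """One bit per instruction: True means 'skip in the A-stream'."""
    return [c.saturated for c in counters]


def apply_ir_vector(block, ir_vector):
    """Return the instructions of a fetch block that survive removal."""
    return [ins for ins, remove in zip(block, ir_vector) if not remove]


counters = [ConfidenceCounter(threshold=2) for _ in range(3)]
for _ in range(2):                        # detector flags slot 1 twice in a row
    counters[1].update(True)
ir_vector = build_ir_vector(counters)
kept = apply_ir_vector(["add", "branch", "store"], ir_vector)
```

With a threshold of 2, only the middle instruction has a saturated counter, so the A-stream would fetch the block without it.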

15
Delay Buffer
  • FIFO Queue.
  • Control flow: trace IDs, IR-vectors
  • Data flow: operand register names, values,
    load/store addresses.
  • A-stream enqueues; R-stream dequeues.
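
The delay buffer behaves like a bounded FIFO between the two streams. The Python sketch below assumes a fixed capacity, a stall-on-full policy, and hypothetical field and class names; none of these details come from the slides.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DelayEntry:
    trace_id: int                                 # control flow
    ir_vector: list                               # which instructions were skipped
    operands: dict = field(default_factory=dict)  # data flow: register -> value


class DelayBuffer:
    """Bounded FIFO: the A-stream enqueues, the R-stream dequeues."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._queue = deque()

    def enqueue(self, entry):
        """Called by the A-stream; returns False when it would have to stall."""
        if len(self._queue) >= self.capacity:
            return False
        self._queue.append(entry)
        return True

    def dequeue(self):
        """Called by the R-stream; returns None when nothing is waiting."""
        return self._queue.popleft() if self._queue else None

    def flush(self):
        self._queue.clear()


buf = DelayBuffer(capacity=2)
ok1 = buf.enqueue(DelayEntry(trace_id=10, ir_vector=[False, True]))
ok2 = buf.enqueue(DelayEntry(trace_id=11, ir_vector=[False, False], operands={"r1": 5}))
ok3 = buf.enqueue(DelayEntry(trace_id=12, ir_vector=[True, False]))   # buffer full
first = buf.dequeue()
```

Entries leave in the order they arrived, so the R-stream always consumes the oldest outstanding prediction first.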

16
Recovery Controller
  • A-stream context can be corrupted.
  • Tracks the addresses that store instructions
    write to.
  • Recovers the corrupted A-stream context from
    R-stream (Both register and memory values).
  • Delay Buffer flushed.
  • PCs of A-stream restored.
  • IR-predictor backed up to precise program
    counter.
  • Entire register file copied via delay buffer.
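
The recovery steps above can be sketched in Python as follows. The context layout (dictionaries with `regs`, `mem`, and `pc` keys), the use of a plain list as the delay buffer, and the class and method names are all assumptions for illustration, not the actual hardware interface.

```python
class RecoveryController:
    """Illustrative model of A-stream recovery from the R-stream context."""

    def __init__(self):
        self.tracked_addresses = set()   # addresses written by stores

    def note_store(self, address):
        self.tracked_addresses.add(address)

    def recover(self, a_ctx, r_ctx, delay_buffer):
        # Copy the entire register file from the R-stream.
        a_ctx["regs"] = dict(r_ctx["regs"])
        # Repair only the memory locations that may have diverged.
        for addr in self.tracked_addresses:
            if addr in r_ctx["mem"]:
                a_ctx["mem"][addr] = r_ctx["mem"][addr]
            else:
                a_ctx["mem"].pop(addr, None)
        # Restart the A-stream at the R-stream's precise program counter.
        a_ctx["pc"] = r_ctx["pc"]
        # Outstanding predictions are stale, so the delay buffer is flushed.
        delay_buffer.clear()
        self.tracked_addresses.clear()


a_ctx = {"regs": {"r1": 99}, "mem": {0x10: 7, 0x20: 1}, "pc": 0x400}
r_ctx = {"regs": {"r1": 5}, "mem": {0x10: 3, 0x20: 1}, "pc": 0x3F0}
delay_buffer = ["stale entry"]

controller = RecoveryController()
controller.note_store(0x10)
controller.recover(a_ctx, r_ctx, delay_buffer)
```

After `recover`, the A-stream registers, the tracked memory location, and the PC match the R-stream, and the delay buffer is empty.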

17
Simulation Environment
18
Slipstream Performance
19
Doubling superscalar complexity
20
Results
  • A 7% average performance improvement is achieved by
    harnessing an otherwise unused, additional
    processor in a single-chip multiprocessor (CMP).
  • Doubling the window size and issue bandwidth of the
    superscalar processor improves performance by 28%
    on average.

21
Advantages
  • Exploiting existing, otherwise unused processor
    in a CMP
  • Speeds up a single program
  • Competitive with superscalar
  • 1/4 of the speedup of the larger superscalar.
    (An improved slipstream design performs comparably
    or better; see MICRO-33.)

22
Related Work/Motivation
  • AR-SMT: A-Stream/R-Stream Simultaneous
    Multithreading.
  • Reduced programs with same output.

23
More Flexible Architecture
  • CMP/SMT: throughput and parallel program
    performance.
  • Slipstream: improved single-program performance
    and reliability.
  • AR-SMT / SRT: high reliability with little
    performance overhead.

24
References
  • Slipstream Processors: Improving both
    Performance and Fault Tolerance, Karthik
    Sundaramoorthy, Zach Purser, Eric Rotenberg.
  • A Study of Slipstream Processors, Zach Purser,
    Karthik Sundaramoorthy, Eric Rotenberg.
  • A Simple Mechanism for Detecting Ineffectual
    Instructions in Slipstream Processors, Jinson J.
    Koppanalil and Eric Rotenberg.