Pujan Joshi

Slides: 25

Provided by: cseU3

Transcript and Presenter's Notes

1
Slipstream Processors: Improving both Performance
and Fault Tolerance

Pujan Joshi
May 6th, 2008

2
Analogy - NASCAR
3
The Paradigm

It is possible to make forward progress by
executing only a subset of the original program.
Ineffectual instructions:
Non-modifying writes
Unreferenced writes
Correctly predicted branches
...and the dependence chains that feed them

4
Slipstreaming

A slipstream processor creates a shorter
instruction stream by skipping the ineffectual
instructions.
Two copies of the same program are run: the full
program and the shortened program.
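As a rough illustration, the shortening step can be modeled as filtering an instruction trace with one removal bit per instruction. The `Instr` record, the string encoding of instructions, and the `shorten` function are hypothetical; real hardware skips instructions at fetch rather than filtering a stored list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    pc: int
    text: str  # human-readable instruction, for illustration only


def shorten(trace: list[Instr], ir_vector: list[bool]) -> list[Instr]:
    """Return the shortened program: the trace minus instructions marked for removal.

    ir_vector[i] is True when trace[i] is predicted ineffectual
    (hypothetical one-bit-per-instruction encoding).
    """
    if len(trace) != len(ir_vector):
        raise ValueError("need exactly one removal bit per instruction")
    return [ins for ins, remove in zip(trace, ir_vector) if not remove]


full_program = [
    Instr(0x00, "r1 = r2 + r3"),
    Instr(0x04, "r4 = r4"),           # non-modifying write
    Instr(0x08, "beq r1, r0, done"),  # assumed to be correctly predicted
]
shortened = shorten(full_program, [False, True, True])
print([hex(i.pc) for i in shortened])  # ['0x0']
```

The full program is left untouched, mirroring the fact that the second copy still executes every instruction.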

5
Slipstream Execution

The shortened program is speculatively reduced
and runs slightly ahead of the other.
The leading program is called the advanced stream
(A-stream) and the trailing program is called the
redundant stream (R-stream).

6
Advanced Stream

A-stream
Executes fewer instructions
Needs some hardware support
Results are communicated to R-stream

7
Redundant Stream

R-stream
Executes the whole program
Executes efficiently because it receives the
A-stream's control and data outcomes as predictions
Compares these values against its own outcomes; if
a deviation is detected, the corrupted A-stream
context is recovered from the R-stream context.
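The checking role of the R-stream can be sketched as a comparison at retirement. In this hypothetical model the A-stream's forwarded outcomes arrive as a dictionary keyed by instruction index, and instructions the A-stream skipped have no entry, so the R-stream simply executes them without a check. `RStreamChecker`, `retire`, and the dictionary layout are illustrative assumptions, not taken from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RStreamChecker:
    """Toy R-stream acting as a fast checker of A-stream outcomes."""

    predictions: dict[int, int]  # instruction index -> outcome forwarded by the A-stream
    deviations: list[int] = field(default_factory=list)

    def retire(self, index: int, actual: int) -> bool:
        """Compare the R-stream's own outcome with the A-stream's prediction.

        Returns True when a deviation is found, i.e. the A-stream context is
        presumed corrupted and must be recovered from the R-stream context.
        """
        predicted = self.predictions.get(index)
        if predicted is None:
            return False  # skipped by the A-stream: nothing forwarded to check
        if predicted != actual:
            self.deviations.append(index)
            return True
        return False


checker = RStreamChecker(predictions={0: 5, 2: 7})
print(checker.retire(0, 5))  # False: prediction confirmed
print(checker.retire(1, 3))  # False: no forwarded outcome
print(checker.retire(2, 9))  # True: deviation, trigger recovery
```

A wrongly removed instruction is not caught directly here; its effect shows up later as a mismatched value from an instruction the A-stream did execute.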

8
Interpretation

A-stream: a program-based predictor
R-stream: a fast checker

9
Slipstream Architecture
10

Micro-Architecture

Each of the two cores is a conventional superscalar
processor with a branch predictor, instruction and
data caches, an execution engine, and a reorder buffer.
New Components Added
Instruction Removal predictor
Instruction Removal detector
Delay buffer
Recovery controller

11
The Instruction Removal Detector

Monitors the R-stream.
Checks for ineffectual instructions.
Checks for correctly predicted branches.
Tells the IR-predictor which instructions
can be skipped.
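A minimal sketch of the detection rules, assuming the IR-detector sees a window of retired R-stream instructions annotated with the value written, the value the destination held before, and (for branches) whether the prediction was correct. The `Retired` record and `detect` function are hypothetical. A write whose destination is neither read nor overwritten before the window ends is conservatively kept.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Retired:
    kind: str                        # "alu", "store", or "branch"
    dest: Optional[str]              # register or memory location written, if any
    srcs: tuple[str, ...]            # locations read
    value: Optional[int] = None      # value written
    old_value: Optional[int] = None  # value the destination held before the write
    predicted_ok: bool = False       # branches only: was the prediction correct?


def _unreferenced(trace: list[Retired], i: int) -> bool:
    """True if trace[i]'s destination is overwritten before anything reads it."""
    dest = trace[i].dest
    for later in trace[i + 1:]:
        if dest in later.srcs:
            return False  # the value is referenced
        if later.dest == dest:
            return True   # overwritten first: the write was unreferenced
    return False          # window ended: cannot prove it is dead, so keep it


def detect(trace: list[Retired]) -> list[bool]:
    """Build an IR-vector: True marks an instruction the A-stream may skip."""
    ir_vector = []
    for i, ins in enumerate(trace):
        if ins.kind == "branch":
            ir_vector.append(ins.predicted_ok)  # correctly predicted branch
        elif ins.dest is None:
            ir_vector.append(False)
        else:
            non_modifying = ins.value == ins.old_value
            ir_vector.append(non_modifying or _unreferenced(trace, i))
    return ir_vector


window = [
    Retired("alu", "r1", ("r2",), value=5, old_value=0),         # overwritten before read
    Retired("alu", "r1", ("r3",), value=7, old_value=5),         # read by the store below
    Retired("store", "M[0x40]", ("r1",), value=7, old_value=7),  # non-modifying write
    Retired("branch", None, ("r1",), predicted_ok=True),
    Retired("branch", None, ("r1",), predicted_ok=False),
]
print(detect(window))  # [True, False, True, True, False]
```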

12
Removal Mechanism

Removes confidently predicted branches and the
computations needed to resolve them.
Removes highly value-predictable computations.
Ineffectual instructions and predictable branches
are removed, and the PC is updated.
Value-predictable computations are replaced by
their predicted values.
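Removing "the computations needed" for a removed branch implies that removal propagates backward through dependence chains. One way to sketch this, under the assumption that a producer may be removed only when its value is overwritten inside the analyzed window and every reader of that value is already removed, is a single backward pass. `Op` and `propagate_removal` are hypothetical names, and value-predicted computations are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    dest: Optional[str]    # location written (None for branches)
    srcs: tuple[str, ...]  # locations read


def propagate_removal(trace: list[Op], removed: list[bool]) -> list[bool]:
    """Extend a removal vector backward through dependence chains.

    A producer becomes removable when its value is overwritten inside the
    window (so all of its readers are known) and every such reader is removed.
    Walking from the end means readers are always decided before producers.
    """
    result = list(removed)
    for i in range(len(trace) - 1, -1, -1):
        dest = trace[i].dest
        if result[i] or dest is None:
            continue
        readers_removed = True
        killed = False
        for j in range(i + 1, len(trace)):
            if dest in trace[j].srcs:
                readers_removed = readers_removed and result[j]
            if trace[j].dest == dest:
                killed = True
                break
        if killed and readers_removed:
            result[i] = True
    return result


window = [
    Op("r1", ("r2",)),  # r1 = r2 + 1, feeds only the branch's chain
    Op("r3", ("r1",)),  # r3 = r1 * 2, feeds only the branch
    Op(None, ("r3",)),  # branch on r3, confidently predicted, so already removed
    Op("r1", ("r4",)),  # r1 redefined
    Op("r3", ("r4",)),  # r3 redefined
]
print(propagate_removal(window, [False, False, True, False, False]))
# [True, True, True, False, False]
```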

13
Instruction Removal Predictor

Generates the PC of the next block of instructions.
Removes from the fetch block the instructions that
the IR-detector marked as removable.
Strategy
Built on top of a conventional trace predictor.
Adds extra per-trace information: an IR-vector,
intermediate PCs, and confidence counters.

14
Continued..

Confidence Counters
The IR-detector updates these counters.
The corresponding instruction is removed if its
counter is saturated.
The IR-vector and intermediate PCs are used to
remove instructions.
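A sketch of the confidence mechanism, assuming one resetting saturating counter per instruction slot of each trace: the IR-detector increments a slot's counter each time it finds that instruction ineffectual and resets it otherwise, and the A-stream removes the instruction only once the counter saturates. The per-slot granularity, the 2-bit threshold (`CONF_MAX = 3`), and the class and method names are assumptions; intermediate PCs are omitted.

```python
from __future__ import annotations

CONF_MAX = 3  # hypothetical saturation value (2-bit counter)


class IRPredictor:
    """Toy removal side of an IR-predictor: per-trace confidence counters."""

    def __init__(self) -> None:
        self.counters: dict[str, list[int]] = {}  # trace id -> one counter per slot

    def update(self, trace_id: str, ir_vector: list[bool]) -> None:
        """Called by the IR-detector with the IR-vector it computed for a trace."""
        slots = self.counters.setdefault(trace_id, [0] * len(ir_vector))
        for k, ineffectual in enumerate(ir_vector):
            if ineffectual:
                slots[k] = min(slots[k] + 1, CONF_MAX)
            else:
                slots[k] = 0  # resetting counter: one effectual instance clears confidence

    def removal_vector(self, trace_id: str) -> list[bool]:
        """Bits used at A-stream fetch: remove only instructions whose counter saturated."""
        return [c == CONF_MAX for c in self.counters.get(trace_id, [])]


pred = IRPredictor()
for _ in range(CONF_MAX):
    pred.update("T1", [True, False, True])
print(pred.removal_vector("T1"))  # [True, False, True]
pred.update("T1", [False, False, True])
print(pred.removal_vector("T1"))  # [False, False, True]
```

An unknown trace yields an empty vector, which here means nothing is removed. A resetting counter (rather than a decrementing one) makes removal cautious: a single contrary observation withdraws it.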

15
Delay Buffer

FIFO queue.
Control flow: trace IDs and IR-vectors.
Data flow: operand register names, values, and
load/store addresses.
The A-stream enqueues; the R-stream dequeues.
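The delay buffer can be modeled as a bounded FIFO of per-trace records. This sketch assumes one record per trace holding the control-flow part (trace ID and IR-vector) and the data-flow part (register name, value, optional load/store address), and that the A-stream stalls when the buffer is full. The capacity, record layout, and names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataEntry:
    reg: str                        # operand register name
    value: int                      # value produced by the A-stream
    mem_addr: Optional[int] = None  # load/store address, if any


@dataclass(frozen=True)
class TraceRecord:
    trace_id: str                # control flow: which trace the A-stream followed
    ir_vector: tuple[bool, ...]  # control flow: which instructions it skipped
    data: tuple[DataEntry, ...]  # data flow: outcomes of executed instructions


class DelayBuffer:
    """Bounded FIFO: the A-stream enqueues, the R-stream dequeues."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._fifo: deque[TraceRecord] = deque()

    def enqueue(self, record: TraceRecord) -> bool:
        """A-stream side. Returns False when full (the A-stream must stall)."""
        if len(self._fifo) >= self.capacity:
            return False
        self._fifo.append(record)
        return True

    def dequeue(self) -> Optional[TraceRecord]:
        """R-stream side. Returns the oldest record, or None if empty."""
        return self._fifo.popleft() if self._fifo else None

    def flush(self) -> None:
        """Discard all in-flight records (used during recovery)."""
        self._fifo.clear()


buf = DelayBuffer(capacity=2)
buf.enqueue(TraceRecord("T1", (False, True), (DataEntry("r1", 5),)))
buf.enqueue(TraceRecord("T2", (True,), ()))
print(buf.enqueue(TraceRecord("T3", (), ())))  # False: full, A-stream stalls
print(buf.dequeue().trace_id)                   # T1 (first in, first out)
```

The buffer's capacity bounds how far the A-stream can run ahead of the R-stream.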

16
Recovery Controller

The A-stream context can be corrupted.
Tracks the addresses written by store
instructions.
Recovers the corrupted A-stream context from the
R-stream (both register and memory values):
Delay buffer flushed.
A-stream PCs restored.
IR-predictor backed up to the precise program
counter.
Entire register file copied via the delay buffer.
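A simplified sketch of the recovery steps, assuming each stream keeps its own register and memory image (dictionaries here) and that store addresses from both streams since the last recovery are recorded as a conservative superset. The delay buffer is stood in for by a plain list, the register file is copied directly rather than through the delay buffer, and restoring the IR-predictor is not modeled. `Context`, `RecoveryController`, `note_store`, and `recover` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Context:
    pc: int
    regs: dict[str, int]
    mem: dict[int, int]  # address -> value; absent means never written


class RecoveryController:
    def __init__(self) -> None:
        self.store_addrs: set[int] = set()

    def note_store(self, addr: int) -> None:
        """Record an address written by a store (either stream, conservatively)."""
        self.store_addrs.add(addr)

    def recover(self, a: Context, r: Context, delay_buffer: list) -> None:
        """Repair a corrupted A-stream context from the R-stream context."""
        delay_buffer.clear()           # flush in-flight A-stream outcomes
        a.pc = r.pc                    # restart the A-stream at the R-stream's PC
        a.regs = dict(r.regs)          # copy the entire register file
        for addr in self.store_addrs:  # repair only memory that stores touched
            if addr in r.mem:
                a.mem[addr] = r.mem[addr]
            else:
                a.mem.pop(addr, None)
        self.store_addrs.clear()


rc = RecoveryController()
a_ctx = Context(pc=0x120, regs={"r1": 99}, mem={0x40: 1, 0x80: 3})
r_ctx = Context(pc=0x100, regs={"r1": 5}, mem={0x40: 7, 0x80: 3})
rc.note_store(0x40)
pending = ["T7"]
rc.recover(a_ctx, r_ctx, pending)
print(hex(a_ctx.pc), a_ctx.regs, a_ctx.mem, pending)
# 0x100 {'r1': 5} {64: 7, 128: 3} []
```

Tracking store addresses keeps memory repair proportional to what was written, instead of copying the whole memory image.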

17
Simulation Environment
18
Slipstream Performance
19
Doubling superscalar complexity
20
Results

A 7% average performance improvement is achieved by
harnessing an otherwise unused, additional
processor in a single-chip multiprocessor (CMP).
Doubling the window size and issue bandwidth of the
superscalar processor improves performance by 28%
on average.

21
Advantages

Exploits an existing, otherwise unused processor
in a CMP
Speeds up a single program
Competitive with superscalar
About 1/4 of the larger superscalar's speedup
(7% vs. 28%). (An improved slipstream design
performs comparably or better; see MICRO-33.)

22
Related Work/Motivation

A-Stream/R-Stream Simultaneous Multithreading.
Reduced programs with same output.

23
More Flexible Architecture

CMP/SMT: throughput and parallel-program
performance.
Slipstream: improved single-program performance
and reliability.
AR-SMT / SRT: high reliability with little
performance overhead.

24
References

Slipstream Processors: Improving both
Performance and Fault Tolerance, Karthik
Sundaramoorthy, Zach Purser, Eric Rotenberg.
A Study of Slipstream Processors, Zach Purser,
Karthik Sundaramoorthy, Eric Rotenberg.
A Simple Mechanism for Detecting Ineffectual
Instructions in Slipstream Processors, Jinson J.
Koppanalil and Eric Rotenberg.
