1
Reducing Misspeculation Penalty in Trace-Level
Speculative Multithreaded Architectures
ISHPC-VI, Nara City (Japan) - September 7-9, 2005
  • Carlos Molina (1, 3)
  • Jordi Tubella (3)
  • Antonio González (2, 3)

(1) Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain. carlos.molina_at_urv.net
(2) Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain. antoniox.gonzalez_at_intel.com
(3) Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain. {antonio,cmolina,jordit}_at_ac.upc.edu
2
Techniques to Boost Instruction Execution
Computation Repetition
  • Avoid serialization caused by data dependences
  • Determine results of instructions without
    executing them
  • Target is to boost the execution of programs

5
Trace Level Speculation
  • Avoids serialization caused by data dependences
  • Skips multiple instructions in a row
  • Predicts values based on the past
  • Introduces penalties due to misspeculations
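
The idea in the bullets above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Trace` record, the `TraceTable` class and the `speculate` function are hypothetical names, and the "past" is reduced to a last-value prediction, which the slides do not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Trace:
    """A previously observed trace (hypothetical record layout)."""
    start_pc: int                 # initial point of the trace
    end_pc: int                   # final point of the trace
    length: int                   # number of instructions the trace covers
    live_outputs: Dict[int, int] = field(default_factory=dict)  # reg -> value


class TraceTable:
    """Remembers the last trace seen at each start PC (assumed last-value policy)."""

    def __init__(self) -> None:
        self._traces: Dict[int, Trace] = {}

    def record(self, trace: Trace) -> None:
        self._traces[trace.start_pc] = trace

    def lookup(self, pc: int) -> Optional[Trace]:
        return self._traces.get(pc)


def speculate(pc: int, regs: Dict[int, int],
              table: TraceTable) -> Tuple[int, Dict[int, int], int]:
    """Return (next_pc, speculative_regs, instructions_skipped)."""
    trace = table.lookup(pc)
    if trace is None:
        return pc, dict(regs), 0          # nothing to skip at this PC
    spec_regs = dict(regs)
    spec_regs.update(trace.live_outputs)  # predicted values, possibly wrong
    return trace.end_pc, spec_regs, trace.length


table = TraceTable()
table.record(Trace(start_pc=0x100, end_pc=0x140, length=16, live_outputs={3: 42}))
next_pc, spec_regs, skipped = speculate(0x100, {3: 0, 4: 7}, table)
print(hex(next_pc), spec_regs, skipped)
```

Because the live outputs are predicted rather than computed, every skipped trace has to be checked later, which is where the misspeculation penalty comes from.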

6
Trace Level Speculation with Live Output Test
[Figure: timelines of the speculative thread (ST) and the non-speculative thread (NST); trace misspeculation detection and recovery actions]
7
Motivation
  • Two orthogonal issues
      • microarchitecture support for trace speculation
      • control and data speculation techniques
          • prediction of initial and final points
          • prediction of live output values
  • This work focuses on
      • microarchitecture support (TSMA)
      • concretely, on reducing penalties due to
        misspeculations

Molina, González, Tubella, "Trace-Level Speculative Multithreaded Architecture (TSMA)", ICCD'02
Molina, González, Tubella, "Compiler Analysis for TSMA", INTERACT'05
8
Outline
  • TSMA (Trace-level Speculative Multithreaded
    Architecture)
  • Verification Engine
  • Enhanced Verification Engine
  • Experimental Framework
  • Simulation Results
  • Conclusions

9
TSMA Block Diagram
[Figure: TSMA block diagram. ST stores its committed instructions in the Look-Ahead Buffer (LAB).]
10
Verification Engine
  • Program Counters
  • Operation Type
  • Source & Destination Register Numbers
  • Source & Destination Register Values
  • Effective Address
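
The fields listed above could be held in one record per instruction. A minimal sketch, assuming a FIFO buffer that refuses new entries when full; `LabEntry`, `LookAheadBuffer` and all field names are illustrative and not taken from the slides. The default capacity of 128 matches the look-ahead buffer size given in the simulation parameters.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass(frozen=True)
class LabEntry:
    """One instruction record (hypothetical field names)."""
    pc: int                                   # program counter
    op_type: str                              # e.g. "branch", "arith", "load", "store"
    src_regs: Tuple[int, ...]                 # source register numbers
    dst_reg: Optional[int]                    # destination register number, if any
    src_values: Tuple[int, ...]               # source values seen by the speculative thread
    dst_value: Optional[int]                  # destination value it produced, if any
    effective_address: Optional[int] = None   # loads and stores only


class LookAheadBuffer:
    """Fixed-capacity FIFO of entries (assumed policy: reject when full)."""

    def __init__(self, capacity: int = 128) -> None:
        self.capacity = capacity
        self._entries: Deque[LabEntry] = deque()

    def push(self, entry: LabEntry) -> bool:
        if len(self._entries) >= self.capacity:
            return False                      # buffer full, entry not stored
        self._entries.append(entry)
        return True

    def pop_oldest(self) -> LabEntry:
        return self._entries.popleft()        # oldest instruction is checked first

    def __len__(self) -> int:
        return len(self._entries)
```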
14
Verification Engine
  • BRANCHES: source value tested, program counter
    updated
  • ARITH INSTRUCTIONS: source values tested,
    destination register updated
  • STORES: effective address verified, destination
    memory updated
  • LOADS: effective address verified, memory value
    checked, register updated
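
The four rules above can be sketched as a single check per instruction. This is a simplified model, not the authors' hardware: `Entry`, `ArchState` and `verify` are hypothetical names, and "effective address verified" is approximated by testing the source register values, since the address is computed from them.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical record of one speculatively executed instruction."""
    op: str                            # "branch", "arith", "store" or "load"
    src_regs: Tuple[int, ...]          # source register numbers
    src_values: Tuple[int, ...]        # values the speculative thread used
    dst_reg: Optional[int] = None      # destination register (arith, load)
    dst_value: Optional[int] = None    # register result, or the value to store
    address: Optional[int] = None      # effective address (load, store)
    next_pc: Optional[int] = None      # branch outcome


@dataclass
class ArchState:
    """Non-speculative architectural state."""
    regs: Dict[int, int]
    mem: Dict[int, int]
    pc: int = 0


def sources_match(e: Entry, st: ArchState) -> bool:
    return all(st.regs.get(r) == v for r, v in zip(e.src_regs, e.src_values))


def verify(e: Entry, st: ArchState) -> bool:
    """Commit e and return True if it passes; False means a squash."""
    if not sources_match(e, st):       # assumption: also stands in for the address check
        return False
    if e.op == "branch":
        st.pc = e.next_pc              # program counter updated
    elif e.op == "arith":
        st.regs[e.dst_reg] = e.dst_value
    elif e.op == "store":
        st.mem[e.address] = e.dst_value
    elif e.op == "load":
        if st.mem.get(e.address) != e.dst_value:
            return False               # memory value check failed
        st.regs[e.dst_reg] = e.dst_value
    else:
        raise ValueError(f"unknown operation type: {e.op}")
    return True
```

In this model any mismatch discards the instruction, even when later entries do not depend on it.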
15
Squashed Instructions from LAB
  • On average, up to 85 instructions are squashed
    from LAB in each thread synchronization

16
Correctly Executed Instructions
  • On average, over 20% of the squashed instructions
    were correctly executed by ST

17
Our Proposal
  • Enhanced Verification Engine
      • does not throw away execution results of
        instructions that are independent of the
        mispredicted point
      • reduces the number of instructions fetched and
        executed
      • thread synchronizations can be delayed or even
        aborted
      • verification of branches, loads, stores and
        single-cycle instructions is reconsidered

18
Related Work
  • Instruction reissue (Lipasti 1997; González &
    González 1997; Sato 1998)
  • Squash reuse (Sodani & Sohi 1997)
  • Control independence in trace processors
    (Rotenberg et al. 1997)
  • Dynamic control independence (Chou et al. 1999)
  • Register integration (Roth & Sohi 2000)

22
Enhanced Verification Engine
  • BRANCHES: the branch target is validated instead
    of the source values.
  • ARITH INSTRUCTIONS: if the source values do not
    match, the instruction is re-executed.
  • STORES: the effective address is re-computed if
    verification fails, and memory is updated with
    the value obtained from the non-speculative
    architectural state.
  • LOADS: the effective address is re-computed if
    verification fails, and the destination value
    obtained from memory is committed to the
    register file.
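
A rough software model of these relaxed rules follows. It is a sketch under several assumptions that are not in the slides: the `ALU` table of single-cycle operations, the branch condition (taken when the source register is non-zero), the base-plus-offset address form, and every class, field and function name. Multi-cycle operations and the limit on re-executions per cycle are not modeled, and loads always commit the value read from non-speculative memory.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed set of single-cycle operations the engine can re-execute.
ALU = {"add": operator.add, "sub": operator.sub,
       "and": operator.and_, "or": operator.or_}


@dataclass
class Entry:
    """Hypothetical record of one speculatively executed instruction."""
    op: str                           # "branch", "arith", "store" or "load"
    src_regs: Tuple[int, ...]         # arith: (a, b); store: (base, data); load, branch: (reg,)
    src_values: Tuple[int, ...]       # values the speculative thread used
    alu: str = "add"                  # arith only: key into ALU
    dst_reg: Optional[int] = None
    dst_value: Optional[int] = None   # speculative arith result
    offset: int = 0                   # load, store: address = base + offset
    address: Optional[int] = None     # speculative effective address
    taken_pc: int = 0                 # branch: target when the source is non-zero
    fall_pc: int = 0                  # branch: target otherwise
    next_pc: int = 0                  # branch: target the speculative thread followed


@dataclass
class ArchState:
    """Non-speculative architectural state."""
    regs: Dict[int, int]
    mem: Dict[int, int]
    pc: int = 0


def verify_enhanced(e: Entry, st: ArchState) -> str:
    """Handle one entry and return "ok", "reexecuted" or "squash"."""
    cur = tuple(st.regs[r] for r in e.src_regs)   # non-speculative source values
    matched = cur == e.src_values
    if e.op == "branch":
        target = e.taken_pc if cur[0] != 0 else e.fall_pc
        if target != e.next_pc:
            return "squash"                       # wrong path was followed
        st.pc = target                            # same target, sources may differ
        return "ok"
    if e.op == "arith":
        st.regs[e.dst_reg] = e.dst_value if matched else ALU[e.alu](*cur)
    elif e.op == "store":
        addr = e.address if matched else cur[0] + e.offset
        st.mem[addr] = cur[1]                     # data from non-speculative state
    elif e.op == "load":
        addr = e.address if matched else cur[0] + e.offset
        st.regs[e.dst_reg] = st.mem.get(addr, 0)  # unmapped memory reads 0 in this toy
    else:
        raise ValueError(f"unknown operation type: {e.op}")
    return "ok" if matched else "reexecuted"
```

In this model only a branch that resolves to a different target forces a squash; mismatching arithmetic and memory instructions are repaired in place.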
23
Incorrectly Speculated Instructions
  • Only 1% of the instructions inserted in the LAB are
    incorrectly predicted
  • On average, close to 90% of the instructions are
    branches, loads, stores and single-cycle
    instructions

24
Experimental Framework
  • Simulator
      • Alpha version of the SimpleScalar Toolset
  • Benchmarks
      • SPEC2000, ref input
  • Maximum Optimization Level
      • DEC C and F77 compilers with -non_shared -O5
  • Statistics collected for 250 million instructions
      • Skipping an initial part of 500 million
        instructions

25
Simulation Parameters
  • Base microarchitecture
      • out-of-order machine, 4 instructions per cycle
      • instruction cache 16KB, data cache 16KB,
        shared L2 256KB
      • bimodal predictor
  • TSMA additional structures
      • each thread: instruction window, reorder
        buffer, register file
      • speculative data cache: 1KB
      • trace table: 128 entries, 4-way set associative
      • look-ahead buffer: 128 entries
      • verification engine: up to 8 instructions per
        cycle
      • only one instruction re-executed per cycle
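
For reference, the parameters above can be collected in one configuration record, from which two figures follow directly. This is an illustrative sketch: `TsmaConfig` and its field names are hypothetical and are not SimpleScalar options.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TsmaConfig:
    """Slide parameters under hypothetical field names."""
    issue_width: int = 4              # instructions per cycle, out-of-order base machine
    icache_kb: int = 16               # instruction cache
    dcache_kb: int = 16               # data cache
    l2_shared_kb: int = 256           # shared L2
    branch_predictor: str = "bimodal"
    spec_dcache_kb: int = 1           # speculative data cache
    trace_table_entries: int = 128
    trace_table_ways: int = 4         # 4-way set associative
    lab_entries: int = 128            # look-ahead buffer
    ve_width: int = 8                 # upper bound on instructions verified per cycle
    reexec_per_cycle: int = 1         # instructions re-executed per cycle

    def trace_table_sets(self) -> int:
        return self.trace_table_entries // self.trace_table_ways

    def min_lab_drain_cycles(self) -> int:
        """Fewest cycles to verify a full LAB when nothing is re-executed."""
        return math.ceil(self.lab_entries / self.ve_width)


cfg = TsmaConfig()
print(cfg.trace_table_sets())         # 32
print(cfg.min_lab_drain_cycles())     # 16
```

With these numbers the trace table has 32 sets, and a full look-ahead buffer needs at least 16 cycles to verify.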

26
Thread Synchronizations
[Figure: thread synchronizations, conventional verification engine (VE) vs. enhanced VE]
  • On average, the number of thread synchronizations
    is about 10% lower (from 30% to 20%)

27
Speedup
[Figure: speedup (vertical axis from 1.00 to 1.45), conventional VE vs. enhanced VE]
  • On average, the performance improvement is
    around 9%

28
Reduction in Executed Instructions
  • On average, the enhanced VE reduces the number of
    executed instructions by almost 8%

29
Conclusions
  • TSMA
      • a significant number of instructions are
        correctly executed, but discarded when
        synchronizing
      • novel hardware technique to enhance TSMA
  • Enhanced Verification Engine
      • thread synchronizations are delayed or even
        aborted
      • verification of branches, loads, stores and
        single-cycle instructions is reconsidered
  • Results show
      • speedup of 38% (9% improvement)
      • misprediction rate of 20% (10% reduction)

30
Future Work
  • Aggressive trace level predictors
  • Generalization to multiple threads

31
Questions & Answers