1
Slipstream Processors
  • Presenter: Vimal Reddy
  • Advisor: Eric Rotenberg

2
The Slipstream paradigm
  • Only a fraction of the dynamic instruction stream
    is required for correct program execution
    [Rotenberg99]
  • Detect and remove ineffectual instructions: run a
    shortened, effectual version of the program
    (Advanced stream, or A-stream)
  • Ensure correctness by running a complete version
    of the program (Redundant stream, or R-stream)
  • A slipstream is created: the shortened A-stream
    finishes fast; the R-stream consumes near-perfect
    predictions from the A-stream and finishes close
    behind
  • The redundant arrangement is much faster than
    conventional, non-redundant execution

3
Slipstream microarchitecture
  • Multiple cores of a Chip Multiprocessor (CMP) are
    used to run the A-stream and R-stream concurrently
  • Slipstream components:
    • Instruction Removal (IR) Detector
    • IR Predictor
    • Delay Buffer
  • Pipeline changes to enable instruction removal
    and ensure correctness
  • Core running the A-stream:
    • Gets predictions from the IR predictor (not a
      conventional branch predictor)
    • Skips ineffectual instructions in fetch
    • Writes only to its private L1 data cache (not the
      shared L2)
  • Core running the R-stream:
    • Gets predictions from the Delay Buffer and
      verifies them

4
Slipstream microarchitecture (contd.)
5
Creating the slipstream
  • Main steps involved:
    • Create a reduced A-stream
    • Communicate A-stream outcomes to the R-stream
    • Check the A-stream's forward progress and recover
      from deviations

6
Creating a reduced A-stream
  • The IR detector and IR predictor combine to create
    the A-stream
  • IR detector:
    • Monitors retired R-stream instructions
    • Detects (past) ineffectualness and conveys it to
      the IR predictor
  • IR predictor:
    • Removes an instruction from the A-stream after
      repeated indications from the IR detector

7
Communicating outcomes
  • The Delay Buffer is used to pass outcomes from the
    A-stream to the R-stream
  • Separate control and data FIFOs
  • Control flow information is complete: the IR
    predictor predicts all branches
  • Data flow information is incomplete: 1 bit per
    dynamic instruction binds values to instructions
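The control/data FIFO split can be sketched in software as follows. This is a behavioral sketch, not hardware; the class and method names (`DelayBuffer`, `push_instr`, and so on) are invented for illustration, and the per-instruction bit is modeled as a third FIFO that tells the R-stream whether a value prediction exists for each instruction.

```python
from collections import deque

class DelayBuffer:
    """Behavioral sketch of the Delay Buffer: complete control flow,
    partial data flow, with one bit per dynamic instruction binding
    values to instructions."""

    def __init__(self):
        self.control_fifo = deque()    # one outcome per branch
        self.has_value_bits = deque()  # 1 bit per dynamic instruction
        self.value_fifo = deque()      # values for flagged instructions only

    # --- A-stream (producer) side ---
    def push_branch(self, taken):
        self.control_fifo.append(taken)

    def push_instr(self, value=None):
        # The bit records whether a value accompanies this instruction.
        self.has_value_bits.append(value is not None)
        if value is not None:
            self.value_fifo.append(value)

    # --- R-stream (consumer) side ---
    def pop_branch_prediction(self):
        return self.control_fifo.popleft()

    def pop_value_prediction(self):
        # Returns None when the A-stream supplied no value for this
        # instruction (the data flow information is incomplete).
        if self.has_value_bits.popleft():
            return self.value_fifo.popleft()
        return None
```

In hardware the two FIFOs would be fixed-size SRAM queues (the evaluated configuration buffers 256 data entries and 4K branch predictions); the deques here simply model the ordering.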

8
Memory hierarchy
  • A-stream loads and stores should not interfere
    with the R-stream's
  • Solution: exploit the typical memory hierarchy
    found in CMPs
  • Both the A-stream and R-stream read and write their
    respective private L1 data caches
  • The R-stream L1 cache is write-through: it writes to
    a shared L2 cache
  • The A-stream L1 cache is neither write-through nor
    write-back: its stores are not propagated to the
    shared L2 cache
  • Because the R-stream runs close behind the A-stream,
    an evicted line is generally regenerated by the
    R-stream in the shared L2
  • If the A-stream re-references an evicted line in the
    shared L2 before regeneration, it gets stale data
    and diverges

9
Memory hierarchy (contd.)
10
A-stream deviation detection and recovery
  • When?
    • The A-stream deviates due to incorrect removal or
      a stale data access in its L1 data cache
  • Detection?
    • A branch or value mispredict in the R-stream (known
      as an IR misprediction)
  • Recovery?
    • Restore A-stream register state: copy values from
      R-stream registers using the delay buffer or a
      shared-memory exception handler
    • Restore A-stream memory state: invalidate the
      A-stream L1 data cache (more recovery models
      later)

11
Slipstream components: The IR detector
  • Monitor retired R-stream instructions for three
    triggering conditions:
    • Unreferenced writes
    • Non-modifying writes
    • Correctly-predicted branches
  • Select triggering instructions as candidates for
    removal
  • Also select their computation chains for removal:
    remove an instruction if it is killed and all of its
    consumers are selected for removal
  • Computation chains are implicitly removed:
    removing consumers makes their producers
    unreferenced writes next time around

12
Slipstream components: The IR detector

13
IR detection example
14
Slipstream components: The IR predictor
  • Augmented gshare branch predictor
  • Each table entry corresponds to one basic block
    in the dynamic instruction stream:
    • Tag: start PC of the basic block
    • 2-bit counter for prediction of the branch
      terminating the basic block
    • Confidence counters, one per basic-block
      instruction, to predict its ineffectualness
  • Updated by the IR detector:
    • Counter incremented if the instruction is detected
      as removable
    • Counter reset to zero otherwise
  • Saturated counter ⇒ instruction removed from the
    A-stream when next encountered
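The confidence-counter update rule can be written out directly. A minimal sketch, using the threshold of 64 from the evaluated configuration; the function names are invented for illustration.

```python
CONF_THRESHOLD = 64  # confidence threshold from the evaluated configuration

def update_confidence(counter, detected_removable):
    """Per-instruction update: increment on a removal indication from
    the IR detector, reset to zero otherwise; saturate at threshold."""
    if detected_removable:
        return min(counter + 1, CONF_THRESHOLD)
    return 0

def should_remove(counter):
    # Once saturated, the instruction is removed from the A-stream
    # the next time it is encountered.
    return counter >= CONF_THRESHOLD
```

Resetting to zero on any non-removal indication makes the predictor conservative: an instruction must be ineffectual many times in a row before it is skipped.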

15
Slipstream components: The improved IR predictor
  • Key ideas:
    • Use ineffectual information to skip fetch for
      completely ineffectual basic blocks
    • If execution bandwidth is high, slipstream still
      performs well due to the fetch cycles saved

16
Slipstream components: The improved IR
predictor (contd.)
17
Memory recovery models
  • Invalidate the entire A-stream L1 cache
    • Complete recovery
    • Easy to implement: an invalidation signal resets
      the valid bits
    • Compulsory A-stream cache misses after recovery
  • Invalidate only dirty lines
    • Fewer compulsory misses
    • Easy to implement: an invalidation signal resets
      the valid bits of dirty lines
    • Needless invalidation of lines made dirty before
      the A-stream diverged
    • Incomplete recovery:
      • Persistent-stale problem: clean lines brought in
        from L2 before the A-stream diverged persist
      • Persistent-skipped-write problem: lines left clean
        by incorrectly-skipped stores persist

18
Memory recovery models (contd.)
  • Use invalidated lines as value predictions in the
    A-stream
  • Key ideas:
    • Invalidating a line preserves its tag and data
    • Only a few lines are corrupt when the A-stream
      diverges
    • Match the tag even if the cache line is invalid;
      use the data as a value prediction
  • Summary:
    • Memory recovery after A-stream divergence is easy
    • Only hardware support needed:
      • Make the L2 cache R-stream-only
      • Provide cache invalidate signals based on the
        recovery model
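The "invalidated lines as value predictions" idea can be sketched as follows. A behavioral model under stated assumptions: direct-mapped lookup, dirty-lines-only invalidation, and invented names (`AStreamL1`, `ALine`, `load`); real hardware would only clear valid bits, exactly as modeled here.

```python
class ALine:
    """One cache line: invalidation drops the valid bit but keeps
    the tag and data."""
    def __init__(self, tag, data):
        self.tag, self.data = tag, data
        self.valid, self.dirty = True, False

class AStreamL1:
    """Sketch of the recovery model: invalidate only dirty lines,
    then reuse tag-matching invalid lines as value predictions."""

    def __init__(self):
        self.lines = {}  # index -> ALine (direct-mapped for simplicity)

    def recover(self):
        # On an IR misprediction, reset valid bits of dirty lines only;
        # tag and data are preserved.
        for line in self.lines.values():
            if line.dirty:
                line.valid = False
                line.dirty = False

    def load(self, index, tag):
        """Returns (data, trusted). An invalid line with a matching tag
        still supplies data, but only as a value prediction the
        R-stream must verify."""
        line = self.lines.get(index)
        if line is None or line.tag != tag:
            return None, False          # ordinary miss
        if line.valid:
            return line.data, True      # ordinary hit
        return line.data, False        # value prediction from invalid line
```

Since only a few lines are actually corrupt at divergence, most of these predictions are correct, which cuts the compulsory misses of full invalidation.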

19
Primary performance results (SPEC2K)
  • Slipstream configuration used:
  • IR predictor:
    • 2^20 entries, gshare-indexed
    • 16 confidence counters per entry
    • Confidence threshold: 64
  • IR detector:
    • Number of entries buffered: 128
  • Delay buffer:
    • Data flow buffer: 256 entries
    • Control flow buffer: 4K branch predictions
  • Memory model:
    • Invalidate dirty lines; use invalidated data as
      value predictions

20
Using a second processor for slipstreaming
21
Designing deployable slipstream components
  • IR detector:
    • Operand rename table (ORT) to detect trigger
      instructions
    • FIFO to update the IR predictor on a per-basic-block
      basis
    • A small ORT-like cache to detect ineffectual stores
  • Delay buffer is a simple FIFO
  • Existing memory hierarchies work well for
    slipstream
  • IR predictor is complex:
    • Ineffectual information is tied up with the gshare
      predictor
    • A tag and confidence counters per instruction add
      too much overhead

22
New IR predictor experiments: Where does removal
lie?
  • Observation:
    • Removal is a 90-10 case: 90% of removal is
      contributed by 10% of all dynamic basic blocks

23
New IR predictor designs
  • Key ideas:
    • The current design stores ineffectual removal (IR)
      information for the most frequently accessed basic
      blocks
    • Instead, store IR information only for the basic
      blocks that contribute most removal (the 10% of
      basic blocks)

24
New IR predictor designs (contd.)
  • Design based on a simple filter
    • Cache the IR information and index it with
      (PC, BHR)
    • Problem: frequently accessed basic blocks will
      evict the infrequent basic blocks that contribute
      most removal
    • Fix: use a simple filter of counters
    • Use a regular gshare for branch prediction

25
New IR predictor designs (contd.)
  • Integrate confidence counters into the I-cache
    • A table of confidence counters, one per
      instruction in the I-cache
    • Eliminates tag storage; leverages I-cache tags

26
New IR predictor designs Roving confidence
counter (contd.)
  • Use one roving counter per basic block
  • Eliminates having one confidence counter per
    instruction in a basic block
  • Instructions in a basic block time-share a single
    counter
  • An instruction relinquishes the counter if
  • IR-detector does not select it for removal, OR
  • IR-detector selects it and the counter is
    saturated
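The two relinquish rules can be sketched as a small state machine. A behavioral sketch with invented names (`RovingCounter`, `update`) and a hypothetical saturation value; the slides do not give the counter width.

```python
SATURATION = 63  # hypothetical saturation value for the shared counter

class RovingCounter:
    """One roving confidence counter time-shared by the instructions
    of a single basic block, instead of one counter per instruction."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.owner = 0          # index of the instruction holding the counter
        self.count = 0
        self.removable = set()  # instructions judged removable so far

    def update(self, instr_index, selected_for_removal):
        if instr_index != self.owner:
            return  # only the owning instruction updates the counter
        if not selected_for_removal:
            # Rule 1: not selected by the IR detector -> relinquish.
            self._relinquish()
        elif self.count >= SATURATION:
            # Rule 2: selected and saturated -> removable; relinquish.
            self.removable.add(instr_index)
            self._relinquish()
        else:
            self.count += 1

    def _relinquish(self):
        self.owner = (self.owner + 1) % self.block_size
        self.count = 0
```

The trade-off is time: each instruction in the block must wait its turn to build confidence, in exchange for a large reduction in counter storage.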

27
New IR predictor designs: preliminary results
  • IR predictor size:
    • Large: 78 MB!!
    • Filter cache: 56 KB
    • I-cache: 12 KB

28
Putting slipstream components to work
  • Observations:
    • Slipstream does not always yield a speedup (low
      instruction removal)
    • Slipstream components are off the critical path
  • Key ideas:
    • Use slipstream components for profiling in the
      background and predict slipstream performance
      (while running in both modes, slipstream and
      non-slipstream)
    • Perform opportunity-based slipstreaming

29
Opportunity-based slipstreaming
  • Goal: get comparable slipstream performance with
    the minimum required slipstream-on time
  • How to find the best slipstreaming opportunities?
    • Use the percentage of predicted-ineffectual
      instructions
  • Main steps:
    • Count the number of instructions predicted
      ineffectual by the IR predictor
    • Monitor the count over an interval of retired
      instructions
    • Slipstream on the next interval if predicted removal
      for the current interval exceeds a threshold
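The interval-based turn-on policy above can be sketched as follows, assuming the 4K-instruction interval from the next slide and assuming the turn-on threshold of 30 is a percentage of the interval; the class name `OpportunityMonitor` is invented for illustration.

```python
INTERVAL = 4096    # retired instructions per interval (4K on the slides)
THRESHOLD = 0.30   # assumed: 30% of the interval predicted ineffectual

class OpportunityMonitor:
    """Count instructions the IR predictor marks ineffectual over an
    interval of retirements; slipstream the next interval only if the
    current interval's count clears the threshold."""

    def __init__(self):
        self.retired = 0
        self.ineffectual = 0
        self.slipstream_on = False  # decision applied to the next interval

    def retire(self, predicted_ineffectual):
        self.retired += 1
        if predicted_ineffectual:
            self.ineffectual += 1
        if self.retired == INTERVAL:
            # End of interval: decide for the next one, then reset.
            self.slipstream_on = self.ineffectual / INTERVAL >= THRESHOLD
            self.retired = self.ineffectual = 0
```

The counters involved already exist in the IR predictor's update path, so this decision logic costs little beyond two counters and a comparator.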

30
Using predicted-ineffectual instructions: Speedup
(interval: 4K instructions; slipstream turn-on
threshold: 30% predicted-ineffectual instructions)
31
Using predicted-ineffectual instructions:
Slipstream-on time
32
Opportunity-based slipstreaming (contd.)
  • Problems with the previous approach:
    • Instruction removal is not the right measure for
      predicting performance; it is the cycles saved due
      to instruction removal that matter
    • Program behavior may change across intervals, so a
      prediction based on the current interval may be
      wrong in the next interval

33
Opportunity-based slipstreaming (contd.)
  • New approach:
    • Count the cycles saved by removing
      predicted-ineffectual instructions

34
Opportunity-based slipstreaming (contd.)
  • Slipstream if the cycles saved exceed a threshold
  • Add confidence to handle across-interval changes in
    program behavior: slipstream only if the cycles saved
    exceed the threshold repeatedly
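The refined policy, cycles saved plus confidence, can be sketched as follows. Both constants are hypothetical (the slides give no threshold or confidence depth), and the names `CyclesSavedMonitor` and `end_interval` are invented for illustration.

```python
CYCLE_THRESHOLD = 1000   # hypothetical cycles-saved threshold per interval
CONF_NEEDED = 3          # hypothetical: consecutive intervals required

class CyclesSavedMonitor:
    """Turn slipstreaming on only after the cycles saved by removal
    exceed the threshold for several intervals in a row, guarding
    against across-interval changes in program behavior."""

    def __init__(self):
        self.streak = 0
        self.slipstream_on = False

    def end_interval(self, cycles_saved):
        if cycles_saved >= CYCLE_THRESHOLD:
            self.streak += 1
        else:
            self.streak = 0  # any weak interval resets confidence
        self.slipstream_on = self.streak >= CONF_NEEDED
```

Compared with counting removed instructions, this measures what actually matters (cycles), and the streak requirement filters out transient phases.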

35
Opportunity-based slipstreaming (contd.)
  • Other fun stuff: managing resources

36
Conclusions
  • Slipstream: a novel means of using CMP cores to
    • Enhance single-program speedup, and
    • Implicitly enhance fault tolerance
  • Preliminary experiments with the new IR predictor
    designs are encouraging
  • Existing slipstream components can be used to
    implement opportunity-based slipstreaming
  • A slipstream management unit allows better
    utilization of CMP cores under many job constraints

37
Questions?
38
Slipstream performance (SPEC2K)
  • Models used for comparison:
    • SS(64x4): a single 4-way superscalar processor
      with 64 ROB entries
    • SS(128x8): a single 8-way superscalar processor
      with 128 ROB entries
    • SS(256x16): a single 16-way superscalar processor
      with 256 ROB entries
    • CMP(2x64x4): slipstreaming on a CMP composed of
      two SS(64x4) cores
    • CMP(2x64x4)/byp: same as the previous, but the
      A-stream can bypass instruction fetching
    • CMP(2x128x8): slipstreaming on a CMP composed of
      two SS(128x8) cores
    • CMP(2x128x8)/byp: same as the previous, but the
      A-stream can bypass instruction fetching

39
Slipstream performance using an extra core for
slipstreaming
40
Slipstream performance: two small cores vs. one
large core
41
Instruction removal
42
Memory recovery model results