Predictor-Directed Stream Buffers (PowerPoint presentation)

Slides: 25

1
Predictor-Directed Stream Buffers
  • Timothy Sherwood
  • Suleyman Sair
  • Brad Calder

2
Overview
  • Introduction
  • Past Stream Buffer work
  • Predictor-Directed Stream Buffers
  • Policy Improvements
  • Results
  • Contributions

3
Introduction
  • Memory wall
  • Latency reduction through prefetching
    ◦ without consuming too much bandwidth
  • Stream buffers are one of the most widely used prefetching schemes
    ◦ simple to implement
    ◦ very efficient
  • Pointer-based codes

4
Past Stream Buffer work
  • Jouppi 1990
    ◦ FIFO of consecutive cache lines
  • Palacharla and Kessler 1994
    ◦ non-unit strides (based on memory chunks)
    ◦ allocation filters
  • Farkas et al. 1997
    ◦ PC-based strides
    ◦ fully associative lookup, non-overlapping buffers
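Here is a minimal Python sketch of the Jouppi-style baseline. It assumes a 64-byte line and a 4-entry FIFO, and the class and method names are hypothetical. The model tracks only line addresses: no data and no outstanding-request timing. A lookup compares only the FIFO head. A head miss flushes the buffer and restarts the stream after the missing line.

```python
from collections import deque

LINE_SIZE = 64  # bytes per cache line (assumed)


class SequentialStreamBuffer:
    """Jouppi-style stream buffer: a FIFO of consecutive cache-line addresses."""

    def __init__(self, depth=4):
        self.depth = depth
        self.fifo = deque()
        self.next_line = 0  # next line address to append at the tail

    def allocate(self, miss_addr):
        """Flush the buffer and start a stream at the line after the missing one."""
        miss_line = miss_addr // LINE_SIZE * LINE_SIZE
        self.fifo.clear()
        self.next_line = miss_line + LINE_SIZE
        self._fill()

    def _fill(self):
        # Issue sequential prefetches until the FIFO is full.
        while len(self.fifo) < self.depth:
            self.fifo.append(self.next_line)
            self.next_line += LINE_SIZE

    def lookup(self, addr):
        """Called on an L1 miss: compare only against the FIFO head."""
        line = addr // LINE_SIZE * LINE_SIZE
        if self.fifo and self.fifo[0] == line:
            self.fifo.popleft()  # line moves into the cache
            self._fill()         # keep the stream running ahead
            return True
        self.allocate(addr)      # head miss: restart the stream here
        return False


if __name__ == "__main__":
    sb = SequentialStreamBuffer(depth=4)
    sb.allocate(0x1000)  # L1 miss on 0x1000 starts a stream at 0x1040
    print([sb.lookup(0x1000 + i * LINE_SIZE) for i in range(1, 6)])
    # prints [True, True, True, True, True]
```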

5
Past Stream Buffer work
[Figure: stride-based stream buffer architecture. Labels: N buffers; to data cache, register file, and MSHRs; from/to next lower level of memory; store predict_stride in stream buffer on allocation.]

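The stride-based variant can be sketched the same way. In this hypothetical model, a PC-indexed table learns each load's stride. When a buffer is allocated, the stride is copied into it as `predict_stride`, and later prefetches are generated from the buffer's own copy. Several simplifying assumptions apply:
- the confidence rule (the same nonzero stride seen twice);
- the Farkas-style associative lookup that discards entries older than the hit;
- the use of raw addresses instead of line addresses.

```python
from dataclasses import dataclass, field


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confident: bool = False  # same nonzero stride seen twice in a row


class StrideTable:
    """Hypothetical PC-indexed stride predictor."""

    def __init__(self):
        self.entries = {}

    def update(self, pc, addr):
        e = self.entries.setdefault(pc, StrideEntry(last_addr=addr))
        new_stride = addr - e.last_addr
        e.confident = new_stride != 0 and new_stride == e.stride
        e.stride = new_stride
        e.last_addr = addr
        return e


@dataclass
class StrideStreamBuffer:
    depth: int = 4
    predict_stride: int = 0  # copied from the stride table at allocation
    next_addr: int = 0
    entries: list = field(default_factory=list)

    def allocate(self, addr, stride):
        # Store predict_stride in the buffer; no further table reads are needed.
        self.predict_stride = stride
        self.next_addr = addr + stride
        self.entries = []
        self.fill()

    def fill(self):
        while len(self.entries) < self.depth:
            self.entries.append(self.next_addr)
            self.next_addr += self.predict_stride

    def lookup(self, addr):
        # Associative check over all entries, not just the head.
        if addr in self.entries:
            idx = self.entries.index(addr)
            del self.entries[: idx + 1]  # drop the hit and any older entries
            self.fill()
            return True
        return False


if __name__ == "__main__":
    table = StrideTable()
    for a in (1000, 1256, 1512):
        e = table.update(0x400100, a)
    buf = StrideStreamBuffer()
    if e.confident:
        buf.allocate(e.last_addr, e.stride)
    print(buf.entries)  # [1768, 2024, 2280, 2536]
```
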
6
Past Stream Buffer work
  • Past work targeted streaming through arrays
    ◦ either in sequential order
    ◦ or in stride order (multidimensional arrays)
  • Could not handle pointer codes
    ◦ repetitive, non-striding references
  • Need a more general predictor

7
Predictor-Directed Stream Buffer
  • Goal: simple and efficient hardware-based
    prefetching of complex but predictable streams
  • Approach: take a general predictor and hook it up
    to the well-established stream buffer front end
    ◦ separate the predictor from the prefetcher
    ◦ can use almost any predictor: 2-delta, context, Markov
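To illustrate the decoupling, the sketch below puts two interchangeable predictors behind the same buffer-filling loop. The method names `train`, `start_state`, and `predict` and the function `fill_buffer` are hypothetical. Each predictor models a single load's address stream. A strided array walk is covered by the 2-delta predictor, while a repeated pointer walk is covered only by the Markov table.

```python
from typing import List, Optional, Protocol


class AddressPredictor(Protocol):
    def train(self, addr: int) -> None: ...
    def start_state(self) -> dict: ...
    def predict(self, state: dict) -> Optional[int]: ...


class TwoDeltaPredictor:
    """Stride changes only after the same delta is seen twice in a row."""

    def __init__(self):
        self.last = None
        self.last_delta = 0
        self.stride = 0

    def train(self, addr):
        if self.last is not None:
            delta = addr - self.last
            if delta == self.last_delta:
                self.stride = delta
            self.last_delta = delta
        self.last = addr

    def start_state(self):
        return {"last": self.last, "stride": self.stride}

    def predict(self, state):
        if state["stride"] == 0:
            return None
        state["last"] += state["stride"]
        return state["last"]


class MarkovPredictor:
    """First-order Markov: each address maps to the address that followed it."""

    def __init__(self):
        self.last = None
        self.table = {}

    def train(self, addr):
        if self.last is not None:
            self.table[self.last] = addr
        self.last = addr

    def start_state(self):
        return {"last": self.last}

    def predict(self, state):
        nxt = self.table.get(state["last"])
        if nxt is not None:
            state["last"] = nxt
        return nxt


def fill_buffer(predictor: AddressPredictor, depth: int) -> List[int]:
    """Predictor-agnostic front end: request addresses until full or no prediction."""
    state = predictor.start_state()  # the buffer's private copy of predictor state
    entries: List[int] = []
    while len(entries) < depth:
        addr = predictor.predict(state)
        if addr is None:
            break
        entries.append(addr)
    return entries
```

Take the pointer walk `0x100, 0x480, 0x220, 0x9A0, 0x100` and train both predictors on it. The 2-delta predictor never settles on a stride, so `fill_buffer` returns an empty list. The Markov predictor fills the buffer with `0x480, 0x220, 0x9A0, 0x100`.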

8
PSB Generalized Architecture
[Figure: PSB generalized architecture. Labels: N buffers; prediction info table with fields Load PC, History, Stride, Confidence, Last Address; subset of prediction info; predicted address; update prediction information; to data cache, register file, and MSHRs; from/to next lower level of memory.]

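One way to read the prediction-table fields is the hypothetical Python model below. Executed loads drive the table's update path. An allocated buffer gets its own copy of a subset of the entry, here the last address and stride. It advances that copy to produce predicted addresses, so speculative predictions never overwrite the table. The 2-bit confidence counter, the two-address history, and the stride-only prediction rule are assumptions. `history` is kept only to show where context or Markov state would live.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PredictionEntry:
    """One prediction-table row (fields named on the slide)."""
    load_pc: int
    last_address: int
    history: List[int] = field(default_factory=list)  # assumed: last two addresses
    stride: int = 0
    confidence: int = 0  # assumed 2-bit saturating counter


class PredictionTable:
    def __init__(self):
        self.rows: Dict[int, PredictionEntry] = {}

    def update(self, pc: int, addr: int) -> None:
        """'Update prediction information': driven only by executed loads."""
        row = self.rows.get(pc)
        if row is None:
            self.rows[pc] = PredictionEntry(load_pc=pc, last_address=addr, history=[addr])
            return
        new_stride = addr - row.last_address
        if new_stride == row.stride:
            row.confidence = min(row.confidence + 1, 3)
        else:
            row.confidence = max(row.confidence - 1, 0)
            row.stride = new_stride
        row.history = (row.history + [addr])[-2:]
        row.last_address = addr


@dataclass
class BufferState:
    """Subset of prediction info copied into a stream buffer at allocation."""
    last_address: int
    stride: int

    def next_predicted(self) -> int:
        # Advance the buffer's private copy; the shared table is not modified.
        self.last_address += self.stride
        return self.last_address


def allocate(table: PredictionTable, pc: int) -> BufferState:
    row = table.rows[pc]
    return BufferState(last_address=row.last_address, stride=row.stride)
```
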
9
PSB Stages
  • Allocation
  • Prediction
  • Probe
  • Prefetching
  • Lookup

10
Stage Descriptions
  • Allocation
    ◦ a stream buffer is allocated to a particular load
    ◦ the buffer is initialized
    ◦ subject to allocation filters
  • Prediction
    ◦ an empty buffer entry asks the predictor for an address
    ◦ subject to limited predictor speed

11
Stage Descriptions (Continued)
  • Probe
    ◦ if there are free cache ports, probe the cache and
      remove useless prefetches (lines already present)
    ◦ not mandatory
  • Prefetching
    ◦ prefetches are sent to memory, subject to
      scheduling for ports and priority
  • Lookup
    ◦ when a load performs an L1 access, the stream
      buffers are checked in parallel
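The five stages can be strung together in a toy per-cycle model. It rests on several assumptions:
- the predictor is any function from a buffer's last address to the next address (a plain `dict.get` over a hypothetical Markov table works);
- the L1 is a set of resident addresses;
- allocation uses simple round-robin replacement with no filter;
- the per-cycle limits on predictions, probe ports, and prefetch ports default to one each.

`StreamBuffer`, `PSB`, and their methods are illustrative names, not from the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

Predictor = Callable[[int], Optional[int]]  # buffer's last address -> next address


@dataclass
class StreamBuffer:
    last: int = 0                                     # private prediction state
    pending: List[int] = field(default_factory=list)  # predicted, not yet sent
    issued: List[int] = field(default_factory=list)   # prefetches already sent
    active: bool = False

    def size(self) -> int:
        return len(self.pending) + len(self.issued)


class PSB:
    def __init__(self, n_buffers: int, depth: int, predictor: Predictor, cache: Set[int]):
        self.buffers = [StreamBuffer() for _ in range(n_buffers)]
        self.depth = depth
        self.predictor = predictor
        self.cache = cache          # addresses resident in L1 (stand-in for probing)
        self.sent: List[int] = []   # prefetches issued to the next memory level
        self.victim = 0             # round-robin replacement pointer

    def allocate(self, miss_addr: int) -> None:
        """Allocation: initialize a buffer for a load (an allocation filter would gate this)."""
        buf = self.buffers[self.victim]
        self.victim = (self.victim + 1) % len(self.buffers)
        buf.last, buf.pending, buf.issued, buf.active = miss_addr, [], [], True

    def cycle(self, predictions: int = 1, probe_ports: int = 1, prefetch_ports: int = 1) -> None:
        active = [b for b in self.buffers if b.active]
        # Prediction: a buffer with a free entry asks for one address, up to the per-cycle limit.
        for buf in active:
            if predictions == 0:
                break
            if buf.size() < self.depth:
                predictions -= 1
                nxt = self.predictor(buf.last)
                if nxt is not None:
                    buf.last = nxt
                    buf.pending.append(nxt)
        # Probe (optional): spare ports check pending addresses against the cache.
        for buf in active:
            for addr in list(buf.pending):
                if probe_ports == 0:
                    break
                probe_ports -= 1
                if addr in self.cache:
                    buf.pending.remove(addr)  # useless prefetch: line already cached
        # Prefetching: send pending addresses to memory while ports remain.
        for buf in active:
            while buf.pending and prefetch_ports > 0:
                addr = buf.pending.pop(0)
                self.sent.append(addr)
                buf.issued.append(addr)
                prefetch_ports -= 1

    def lookup(self, addr: int) -> bool:
        """Lookup: done alongside the L1 access; a hit removes the line from the buffer
        (the resulting cache fill is not modeled)."""
        for buf in self.buffers:
            if addr in buf.issued:
                buf.issued.remove(addr)
                return True
        return False
```

Use a Markov table `{0x100: 0x480, 0x480: 0x220, 0x220: 0x9A0}`, a cache holding `0x220`, and one buffer allocated at `0x100`. Four calls to `cycle()` then send prefetches for `0x480` and `0x9A0` only. The probe stage drops `0x220` because it is already cached.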

12
PSB Implementation
  • Tried many different address predictors
  • Best is Stride-Filtered Markov (SFM)
    ◦ similar to Joseph and Grunwald's predictor
    ◦ first-order Markov
    ◦ striding behavior is filtered out
    ◦ address differences are stored to reduce table size
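Here is a rough sketch of a stride-filtered Markov predictor with difference storing. It assumes one arrangement that matches the bullets:
- transitions the stride predictor already gets right never reach the Markov table;
- mispredicted transitions are recorded as a signed difference from the current address;
- a buffer prefers a Markov entry over the stride when one exists.

The untagged table indexed by cache-line address, the 64-byte line, and the 16-bit difference width are also assumptions. Only the 2K-entry table size comes from the methods slide.

```python
TABLE_ENTRIES = 2048  # 2K-entry table, as on the methods slide
LINE_BITS = 6         # assumed 64-byte lines for indexing
DIFF_BITS = 16        # assumed width of a stored difference


def fits(diff: int, bits: int = DIFF_BITS) -> bool:
    """True if diff is representable as a signed `bits`-bit value."""
    return -(1 << (bits - 1)) <= diff < (1 << (bits - 1))


def index(addr: int) -> int:
    """Untagged direct-mapped index from the cache-line address (assumption)."""
    return (addr >> LINE_BITS) & (TABLE_ENTRIES - 1)


class StrideFilteredMarkov:
    def __init__(self):
        self.markov = [None] * TABLE_ENTRIES  # each slot: stored difference or None
        self.last = None
        self.stride = 0

    def train(self, addr: int) -> None:
        """Update with one executed address of a single load's stream."""
        if self.last is not None:
            delta = addr - self.last
            if delta != self.stride:
                # Stride mispredicted: keep this transition as a difference, if it fits
                # (a difference that does not fit leaves the slot unchanged).
                if fits(delta):
                    self.markov[index(self.last)] = delta
                self.stride = delta
            # Stride predicted correctly: filtered out, Markov table untouched.
        self.last = addr

    def predict(self, last: int) -> int:
        """Next address for a buffer whose latest address is `last`."""
        diff = self.markov[index(last)]
        return last + (diff if diff is not None else self.stride)
```

Under these assumed widths, 2048 entries of 16 bits take 4 KB, compared with 16 KB for the same table holding full 64-bit addresses. In the sketch, a 10-element array walk writes a single Markov entry (its first transition), while every transition of a repeated pointer walk is recorded.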

13
Difference Storing
14
PSB with SFM
15
Methods
  • SimpleScalar 3.0
    ◦ rewrote the memory hierarchy
    ◦ modeled bandwidth between all levels
    ◦ added perfect store sets
  • Ran over a set of pointer-based benchmarks
  • 2K-entry predictor table
  • 8 stream buffers x 4 entries each
  • 32 KB 4-way set-associative cache
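The simulated configuration can be written down as a small Python record with a few derived quantities. The field names are hypothetical. The 32-byte cache line is an assumption, because the slide does not give a line size. With it, the 32 KB 4-way cache has 256 sets, and the 8 x 4 stream buffers hold 32 lines (1 KB).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    predictor_entries: int = 2048   # "2K entry predictor table"
    n_buffers: int = 8              # "8 buffers x 4 entry Stream Buffers"
    buffer_depth: int = 4
    cache_bytes: int = 32 * 1024    # "32k 4-way associative cache"
    cache_ways: int = 4
    line_bytes: int = 32            # assumption: line size is not on the slide

    @property
    def cache_sets(self) -> int:
        return self.cache_bytes // (self.cache_ways * self.line_bytes)

    @property
    def buffer_entries(self) -> int:
        return self.n_buffers * self.buffer_depth

    @property
    def buffer_bytes(self) -> int:
        return self.buffer_entries * self.line_bytes
```
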

16
Speedup from PSB
17
Allocation Filtering
  • Farkas et al. showed how two-miss filtering
    prevents too many streams from requesting resources
  • Does not work as well for pointer codes
    ◦ irregular miss patterns
  • We use priority and accuracy counters
    ◦ track the behavior of loads
    ◦ allocate buffers to loads that are behaving well
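The contrast between the two filters can be sketched as follows. `TwoMissFilter` is a simplified stand-in that allocates only when a load misses on two consecutive accesses, which an irregular pointer load may never do. `AccuracyFilter` instead keeps a saturating counter per load that rises when its prefetches are used and falls when they are wasted. The class names, the 2-bit counter width, the threshold, and starting new loads at the threshold are all assumptions.

```python
from collections import defaultdict

ACC_MAX = 3        # assumed 2-bit saturating accuracy counter
ACC_THRESHOLD = 2  # assumed minimum accuracy to receive a stream buffer


class TwoMissFilter:
    """Simplified two-miss filter: allocate on a load's second consecutive miss."""

    def __init__(self):
        self.missed_last_time = defaultdict(bool)

    def on_access(self, pc: int, missed: bool) -> bool:
        allow = missed and self.missed_last_time[pc]
        self.missed_last_time[pc] = missed
        return allow


class AccuracyFilter:
    """Allocate only to loads whose past prefetches were mostly used."""

    def __init__(self):
        # New loads start at the threshold so they get one chance (assumption).
        self.acc = defaultdict(lambda: ACC_THRESHOLD)

    def prefetch_used(self, pc: int) -> None:
        self.acc[pc] = min(self.acc[pc] + 1, ACC_MAX)

    def prefetch_wasted(self, pc: int) -> None:
        self.acc[pc] = max(self.acc[pc] - 1, 0)

    def on_miss(self, pc: int) -> bool:
        return self.acc[pc] >= ACC_THRESHOLD
```

Consider a load whose accesses alternate between miss and hit. The two-miss filter never allocates it a buffer. The accuracy filter keeps allocating to it while its counter stays at or above the threshold.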

18
Allocation Filtering Speedup
19
Stream Buffer Priority
  • Round robin
    ◦ give each active buffer equal resources
      (predictor and prefetching)
  • Priority counters
    ◦ a small counter is kept with each buffer
    ◦ the counters are used to rank the buffers
    ◦ better-performing buffers get more resources
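Here is a sketch of the two scheduling policies. It assumes each buffer carries a 3-bit saturating priority counter, bumped on every buffer hit and periodically halved so stale streams lose rank. Each cycle's predictor and prefetch slots are then handed out either in round-robin order or highest-priority first. The counter width, the halving decay, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

PRI_MAX = 7  # assumed 3-bit saturating priority counter


@dataclass
class BufState:
    name: str
    priority: int = 0


def on_buffer_hit(buf: BufState) -> None:
    """A load hit in this buffer: its stream is useful, so raise its priority."""
    buf.priority = min(buf.priority + 1, PRI_MAX)


def decay(buffers: List[BufState]) -> None:
    """Periodically halve every counter so stale streams lose rank (assumed policy)."""
    for b in buffers:
        b.priority //= 2


def round_robin(buffers: List[BufState], slots: int, start: int) -> List[str]:
    """Equal share: hand out slots in turn, beginning at index `start`."""
    n = len(buffers)
    return [buffers[(start + i) % n].name for i in range(slots)]


def by_priority(buffers: List[BufState], slots: int) -> List[str]:
    """Priority scheduling: highest counters first; ties keep buffer order (stable sort)."""
    ranked = sorted(buffers, key=lambda b: -b.priority)
    return [b.name for b in ranked[:slots]]
```

Suppose buffer C hits three times and B once. A two-slot cycle then goes to A and B under round robin (starting at A), but to C and B under priority scheduling.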

20
Priority Scheduling Speedup
21
Latency Reduction
22
Contributions
  • Predictor-Directed Stream Buffers allow
    decoupling of Stream Buffer front end from
    address generation
  • Using accuracy-based allocation filtering and
    priority scheduling can make a large difference
    in performance
  • With some simple compression, even small Markov
    tables can be very effective

23
Accuracy
24
Bus Results