Predictor-Directed Stream Buffers


Provided by: timshe6

Learn more at: https://sites.cs.ucsb.edu


1
Predictor-Directed Stream Buffers

Timothy Sherwood
Suleyman Sair
Brad Calder

2
Overview

Introduction
Past Stream Buffer work
Predictor-Directed Stream Buffers
Policy Improvements
Results
Contribution

3
Introduction

Memory Wall
Latency reduction through prefetching
without eating too much bandwidth
Stream Buffers are one of the most used
simple to implement
very efficient
Pointer-based codes

4
Past Stream Buffer work

Jouppi 1990
consecutive cache line FIFO
Palacharla and Kessler 1994
non-unit stride (based on memory chunk)
allocation filters
Farkas et al. 1997
PC-based stride
fully associative / non-overlapping
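
The sequential and strided schemes listed above can be illustrated with a minimal sketch. This is not code from any of the cited papers: the class name, the 32-byte line size, the depth of 4, and the head-only lookup are assumptions chosen for illustration.

```python
from collections import deque

LINE_SIZE = 32  # bytes per cache line (assumed for illustration)


class SequentialStreamBuffer:
    """FIFO of prefetch addresses that follow a cache miss."""

    def __init__(self, depth=4, stride_lines=1):
        self.depth = depth
        # stride_lines=1 gives consecutive lines; larger values give a fixed stride.
        self.stride = stride_lines * LINE_SIZE
        self.fifo = deque()
        self.next_addr = None

    def allocate(self, miss_addr):
        """Restart the stream one stride past the line that missed."""
        line = miss_addr - (miss_addr % LINE_SIZE)
        self.fifo.clear()
        self.next_addr = line + self.stride
        self._refill()

    def _refill(self):
        # Keep the FIFO full by generating the next addresses in the stream.
        while len(self.fifo) < self.depth:
            self.fifo.append(self.next_addr)
            self.next_addr += self.stride

    def lookup(self, addr):
        """Hit only when the head entry holds the requested line."""
        line = addr - (addr % LINE_SIZE)
        if self.fifo and self.fifo[0] == line:
            self.fifo.popleft()
            self._refill()
            return True
        return False
```

Because the next address is always the previous one plus a constant, a buffer like this has nothing useful to fetch when successive misses follow pointers rather than a fixed stride.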

5
Past Stream Buffer work

[Diagram: N stream buffers placed between the next lower level of memory (from/to) and the data cache, register file, and MSHRs; predict_stride is stored in the stream buffer on allocation]

6
Past Stream Buffer work

Past work targeted at streaming in arrays
either in sequential order
or stride order (multidimensional array)
Could not handle Pointer Codes
repetitive non-striding references
Need a more General Predictor

7
Predictor-Directed Stream Buffer

The Goal: simple and efficient hardware-based
prefetching of complex but predictable streams
Approach: take a general predictor and hook it up
to the well-established stream buffer front end.
Separate the predictor from the prefetcher
Can use almost any predictor
2-Delta
Context
Markov
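
The separation of predictor and prefetcher can be sketched as follows. This is an illustrative sketch, not the authors' design: the `predict`/`train` interface, both class names, and the simple PC-indexed stride table are assumptions. The point is only that the buffer asks some predictor for its next address instead of computing it itself.

```python
class StridePredictor:
    """Hypothetical PC-indexed stride table. Any object offering the same
    predict/train methods could be plugged into the buffer below."""

    def __init__(self):
        self.table = {}  # load PC -> (last address, last stride)

    def train(self, pc, addr):
        last, _ = self.table.get(pc, (addr, 0))
        self.table[pc] = (addr, addr - last)

    def predict(self, pc, last_addr):
        _, stride = self.table.get(pc, (None, 0))
        return last_addr + stride


class PredictorDirectedBuffer:
    """Stream buffer front end that gets its addresses from a predictor."""

    def __init__(self, predictor, depth=4):
        self.predictor = predictor
        self.depth = depth
        self.pc = None
        self.last_addr = None
        self.entries = []

    def allocate(self, pc, miss_addr):
        # The buffer keeps just enough state to ask for the next prediction.
        self.pc, self.last_addr, self.entries = pc, miss_addr, []

    def fill(self):
        # Each new entry is predicted from the previously predicted address,
        # so the buffer can run ahead of the load that allocated it.
        while len(self.entries) < self.depth:
            nxt = self.predictor.predict(self.pc, self.last_addr)
            self.entries.append(nxt)
            self.last_addr = nxt
```

Swapping in a context or Markov predictor would only require another class with the same two methods; the buffer code would not change.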

8
PSB Generalized Architecture

[Diagram: N stream buffers placed between the next lower level of memory (from/to) and the data cache, register file, and MSHRs. Labeled fields: Load PC, History, Stride, Confidence, Last Address. Labeled flows: prediction info, subset of prediction info, predicted address, update prediction information]

9
PSB Stages

Allocation
Prediction
Probe
Prefetching
Lookup

10
Stage Descriptions

Allocation
Stream Buffer is allocated to a particular load
the buffer is initialized
subject to Allocation Filters
Prediction
an empty buffer entry asks for an address
subject to limited predictor speed.

11
Stage Descriptions (Continued)

Probe
if there are free ports, remove useless prefetches
not mandatory
Prefetching
subject to scheduling for ports and priority,
prefetches are sent to memory
Lookup
when a load performs an L1 access, the Stream
Buffers are checked in parallel
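
Three of the stages above (Prediction, Prefetching, Lookup) can be sketched as per-cycle functions. This is a simplified illustration under stated assumptions: the names, the port limits, and the fixed `step` that stands in for a real address predictor are all hypothetical, and memory latency is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    addr: int
    issued: bool = False  # True once the prefetch has been sent to memory


@dataclass
class Buffer:
    depth: int = 4
    next_addr: int = 0
    step: int = 64  # stand-in for the address predictor (assumed)
    entries: list = field(default_factory=list)


def predict_stage(buffers, predictor_ports=1):
    """Give up to predictor_ports empty entries a predicted address."""
    for buf in buffers:
        if predictor_ports == 0:
            break
        if len(buf.entries) < buf.depth:
            buf.entries.append(Entry(buf.next_addr))
            buf.next_addr += buf.step
            predictor_ports -= 1


def prefetch_stage(buffers, memory_ports=1):
    """Send up to memory_ports pending prefetches; return their addresses."""
    sent = []
    for buf in buffers:
        for entry in buf.entries:
            if len(sent) == memory_ports:
                return sent
            if not entry.issued:
                entry.issued = True
                sent.append(entry.addr)
    return sent


def lookup_stage(buffers, addr):
    """Check every buffer for addr, as is done alongside the L1 access."""
    for buf in buffers:
        for entry in buf.entries:
            if entry.issued and entry.addr == addr:
                buf.entries.remove(entry)
                return True
    return False
```

Here the buffers are simply served in list order; which buffer gets the predictor and the memory ports first is a scheduling policy choice.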

12
PSB Implementation

Tried many different address predictors
Best is Stride Filtered Markov
similar to Joseph and Grunwald's predictor
first order Markov
striding behavior is filtered out
Difference is stored to reduce size
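
A rough sketch of a stride-filtered, first-order Markov predictor with difference storing follows. Several details are assumptions made for illustration and are not taken from the slides: the 16-bit signed difference, the unbounded dictionaries in place of a fixed-size table, and the rule that a Markov hit takes precedence over the stride prediction.

```python
class StrideFilteredMarkov:
    def __init__(self, delta_bits=16):
        # Signed range that a stored difference must fit in (assumed width).
        self.limit = 1 << (delta_bits - 1)
        self.stride = {}  # load PC -> (last miss address, last stride)
        self.markov = {}  # miss address -> difference to the next miss

    def train(self, pc, addr):
        last, stride = self.stride.get(pc, (None, 0))
        if last is not None:
            delta = addr - last
            # Stride filter: only transitions the stride predictor would
            # have missed are written into the Markov table.
            mispredicted = delta != stride
            # Difference storing: keep the delta, not the full address,
            # and skip deltas that do not fit in the narrow field.
            if mispredicted and -self.limit <= delta < self.limit:
                self.markov[last] = delta
            stride = delta
        self.stride[pc] = (addr, stride)

    def predict(self, pc, addr):
        # First-order Markov: the prediction depends only on the last address.
        if addr in self.markov:
            return addr + self.markov[addr]
        _, stride = self.stride.get(pc, (None, 0))
        return addr + stride
```

For example, after training on the repeating miss sequence 1000, 5000, 2000, 1000 for one load, `predict` returns 5000 for 1000 and 2000 for 5000, while a steady stride of 64 leaves at most its first transition in the Markov table.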

13
Difference Storing
14
PSB with SFM
15
Methods

SimpleScalar 3.0
Rewrote memory hierarchy
Model bandwidth between all levels
Added perfect store sets
Ran over set of Pointer Benchmarks
2K entry predictor table
8 buffers x 4 entry Stream Buffers
32k 4-way associative cache

16
Speedup from PSB
17
Allocation Filtering

Farkas et al. showed how two-miss filtering
prevents too many streams requesting resources
Does not work as well for pointer codes
irregular miss patterns
We use Priority and Accuracy Counters
track behavior of Loads
allocate to Loads that are Behaving well
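
One way to picture accuracy-based allocation is a small saturating counter per load. The sketch below is an assumption-laden illustration, not the mechanism measured in the talk: the counter width, the threshold, and the class and method names are all hypothetical.

```python
class AccuracyFilter:
    """Per-load saturating counters that gate stream buffer allocation."""

    def __init__(self, max_count=7, threshold=2):
        self.max_count = max_count  # saturation value (assumed 3-bit counter)
        self.threshold = threshold  # minimum count needed to allocate (assumed)
        self.counters = {}          # load PC -> accuracy counter

    def record(self, pc, predicted_addr, actual_addr):
        """Reward a correct address prediction, penalize a wrong one."""
        count = self.counters.get(pc, 0)
        if predicted_addr == actual_addr:
            count = min(count + 1, self.max_count)
        else:
            count = max(count - 1, 0)
        self.counters[pc] = count

    def may_allocate(self, pc):
        """Only loads that have been predicting well get a stream buffer."""
        return self.counters.get(pc, 0) >= self.threshold
```

A load with an irregular miss pattern keeps its counter near zero and never takes a buffer away from a load whose addresses are being predicted correctly.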

18
Allocation Filtering Speedup
19
Stream Buffer Priority

Round Robin
give each active buffer equal resources
predictor and prefetching
Priority Counters
uses small counters with each buffer
use the counters to rank buffers
more resources to better performing buffers
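
The two scheduling policies can be contrasted with a short sketch. The dictionary layout, the `wants` flag, and the function names are assumptions for illustration; how the priority counters are updated is left out because the slides do not specify it.

```python
def pick_round_robin(buffers, last):
    """Return the next buffer after index `last` that has work, or None."""
    n = len(buffers)
    for offset in range(1, n + 1):
        idx = (last + offset) % n
        if buffers[idx]["wants"]:
            return idx
    return None


def pick_priority(buffers):
    """Return the buffer with work that has the highest priority counter."""
    ready = [i for i, buf in enumerate(buffers) if buf["wants"]]
    if not ready:
        return None
    # Ties go to the lowest index, since max() keeps the first maximum.
    return max(ready, key=lambda i: buffers[i]["priority"])


# Hypothetical state: three buffers, the third has nothing to request.
buffers = [
    {"wants": True, "priority": 1},
    {"wants": True, "priority": 3},
    {"wants": False, "priority": 7},
]
```

With this state, round robin alternates between buffers 0 and 1, while the priority policy keeps choosing buffer 1 for as long as it has the higher counter and still wants the resource.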

20
Priority Scheduling Speedup
21
Latency Reduction
22
Contributions

Predictor-Directed Stream Buffers allow
decoupling of Stream Buffer front end from
address generation
Using accuracy based allocation filtering and
priority scheduling can make a large difference
in performance
With some simple compression, even small Markov
tables can be very effective

23
Accuracy
24
Bus Results
