Access Map Pattern Matching Prefetch: Optimization Friendly Method

Description:

Title: DPC Slide. Author: yishii. Last modified by: yishii. Document presentation format: 4:3.

Learn more at: https://jilp.org
Transcript and Presenter's Notes

1
Access Map Pattern Matching Prefetch: Optimization Friendly Method
  • Yasuo Ishii (1), Mary Inaba (2), and Kei Hiraki (2)
  • (1) NEC Corporation
  • (2) The University of Tokyo

2
Background
  • The speed gap between processor and memory has widened
  • Many techniques have been proposed to hide long memory latency
  • The importance of HW data prefetching has increased
  • Many HW prefetchers have been proposed

3
Conventional Methods
  • Prefetchers use
    • Instruction address
    • Memory access order
    • Memory address
  • Optimizations scramble this information
    • Out-of-order memory access
    • Loop unrolling

4
Limitation of Stride Prefetch [Chen 95]: Out-of-Order Memory Access

for (int i = 0; i < N; i++) load A[2*i]    (load instruction tagged A)

[Figure: memory address space from 0xAAFF to 0xABFF, one row per cache line. Accesses 1 to 4 from load (A) fall on lines between 0xAB00 and 0xAB06 and reach memory out of order.]

Stride table entry: Tag A, Address 0xAB04, Stride 2, State steady

Cannot detect strides
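The failure on this slide can be reproduced with a small sketch. It is a simplified two-state stride entry, not the full reference prediction table of Chen 95, and the four addresses and the reordering are hypothetical values chosen inside the range shown in the figure.

```python
class StrideEntry:
    """One table entry: last address, last stride, and a training state."""
    def __init__(self, addr):
        self.addr = addr
        self.stride = 0
        self.state = "init"

def train(entry, addr):
    """Update the entry; return a prefetch address once the stride repeats."""
    stride = addr - entry.addr
    if stride == entry.stride and stride != 0:
        entry.state = "steady"       # same stride seen twice in a row
    else:
        entry.state = "transient"    # stride changed, so retrain
    entry.stride = stride
    entry.addr = addr
    return addr + stride if entry.state == "steady" else None

def run(addresses):
    entry = StrideEntry(addresses[0])
    return [train(entry, a) for a in addresses[1:]]

in_order = [0xAB00, 0xAB02, 0xAB04, 0xAB06]       # hypothetical program order
out_of_order = [0xAB00, 0xAB04, 0xAB02, 0xAB06]   # same accesses, reordered

print(run(in_order))       # [None, 43782, 43784]
print(run(out_of_order))   # [None, None, None]
```

With the same four addresses, the reordered stream never shows the same stride twice in a row, so the entry never reaches the steady state and no prefetch is issued.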

5
Weakness of Conventional Methods
  • Out-of-order memory access
    • Scrambles memory access order
    • Prefetcher cannot detect address correlations
  • Loop unrolling
    • Requires additional table entries
    • Each entry is trained slowly
  • An optimization-friendly prefetcher is required
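The loop-unrolling cost can be illustrated with a rough count. The sketch assumes a table indexed by load instruction, one load instruction per unrolled copy, and three observed addresses per entry before a stride is confirmed; all three are assumptions for illustration, not figures from the slides.

```python
def accesses_until_trained(unroll, needed=3):
    """Return (table entries used, loop accesses until every entry is trained).

    `unroll` is the unrolling factor; `needed` is the assumed number of
    addresses an entry must see before its stride is confirmed.
    """
    seen = {}    # hypothetical load-instruction id -> addresses seen so far
    count = 0
    while len(seen) < unroll or min(seen.values()) < needed:
        pc = count % unroll          # unrolled copies take turns
        seen[pc] = seen.get(pc, 0) + 1
        count += 1
    return len(seen), count

print(accesses_until_trained(1))   # (1, 3)
print(accesses_until_trained(4))   # (4, 12)
```

Unrolling by four uses four entries instead of one, and each entry sees only every fourth access, so training takes four times as many loop accesses in this model.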

6
Access Map Pattern Matching
  • Pattern matching
    • Order-free prefetching
    • Optimization-friendly prefetching
  • Access map
    • Map-based history
    • 2-bit state map
    • Each state is attached to a cache block

7
State Diagram for Each Cache Block
  • Init: initialized state
  • Access: already accessed
  • Prefetch: prefetch request issued
  • Success: prefetched data accessed
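A minimal sketch of the four states as a 2-bit value per cache block. The transitions are an assumption read from the state names: an access moves Init to Access and Prefetch to Success, a prefetch moves Init to Prefetch, and every other event leaves the state unchanged.

```python
from enum import Enum

class State(Enum):
    INIT = 0       # initialized state
    ACCESS = 1     # already accessed
    PREFETCH = 2   # prefetch request issued
    SUCCESS = 3    # prefetched data accessed

def on_access(state):
    """Demand access to the block (assumed transitions)."""
    if state is State.INIT:
        return State.ACCESS
    if state is State.PREFETCH:
        return State.SUCCESS
    return state

def on_prefetch(state):
    """Prefetch request issued for the block (assumed transition)."""
    return State.PREFETCH if state is State.INIT else state

print(on_access(on_prefetch(State.INIT)))   # State.SUCCESS
```

Four states fit in two bits, which matches the 2-bit state map described for the access map.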

[State diagram: nodes Init, Access, Prefetch, and Success, with edges labeled Access and Prefetch.]

8
Memory Access Pattern Map
  • Corresponding to memory address space
  • Cache line granularity
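How an address could be mapped onto a zone and a position inside that zone's map, as a sketch. The 64-byte cache line is an assumption; 256 lines per zone follows the "256 states / map" configuration given later in the deck.

```python
LINE_BYTES = 64        # assumed cache line size
LINES_PER_ZONE = 256   # one state per cache line, 256 states per map

def locate(addr):
    """Return (zone id, line index inside the zone) for a byte address."""
    line = addr // LINE_BYTES
    return line // LINES_PER_ZONE, line % LINES_PER_ZONE

print(locate(0x12345))   # (4, 141)
```

Under these assumptions one zone covers 64 * 256 = 16384 bytes of address space.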

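[Figure: the memory address space is divided into zones of a fixed zone size. Each zone has a memory access pattern map with one state per cache line (for example I, I, A, S, P, A), which feeds the pattern match logic.]

9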
Pattern Matching Logic
  • Access Map Shifter
  • Pattern Detector
  • Pipeline Register
  • Prefetch Selector
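The shifter, detector, and selector stages can be sketched as three small functions. The matching rule used here is an assumption about the detector: line Addr+k is a candidate when the lines at Addr-k and Addr-2k have both been accessed. The pipeline register is omitted, and the 8-line map is a hypothetical example.

```python
def shift_map(states, index):
    """Access map shifter: states behind the current line, nearest first."""
    return states[:index][::-1]

def detect(behind):
    """Pattern detector: bit k-1 is True if distances k and 2k were accessed."""
    hit = [s in "AS" for s in behind]   # Access and Success count as accessed
    return [hit[k - 1] and hit[2 * k - 1] for k in range(1, len(hit) // 2 + 1)]

def select(states, index, bits):
    """Prefetch selector: nearest candidate still in the Init state, or None."""
    for k, matched in enumerate(bits, start=1):
        target = index + k
        if matched and target < len(states) and states[target] == "I":
            return target
    return None

states = "AIAIIIII"   # hypothetical zone; lines 0 and 2 already accessed
index = 4             # line touched by the current request
bits = detect(shift_map(states, index))
print(bits, select(states, index, bits))   # [False, True] 6
```

Here the lines at distance 2 and 4 behind the request were accessed, so the stride-2 bit is set and line 6 is selected.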

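[Figure: the access map shifter aligns the memory access pattern map (states I A A A I A I I I A A) to the requested address Addr. The pattern detector outputs candidate bits (1 0 1), and one candidate is labeled (Addr+2).]

10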
Parallel Pattern Matching
  • Detects patterns from memory access map
  • Detects address correlations in parallel
  • Searches candidates effectively
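Because the history is a map rather than a sequence, every stride can be tested at once and the result does not depend on access order. The sketch below assumes the same rule as a stride test (Addr+k is a candidate when Addr-k and Addr-2k were accessed) and a hypothetical 16-line zone.

```python
def strides(behind):
    """Every k whose distance-k and distance-2k lines were both accessed.

    behind[d - 1] is True if the line at distance d behind was accessed,
    so behind[1::2] lines up distance 2k with distance k.
    """
    pairs = zip(behind, behind[1::2])
    return [k for k, (near, far) in enumerate(pairs, start=1) if near and far]

def candidates(history, t, zone_lines=16):
    """Build the map from past accesses (any order) and match at line t."""
    accessed = [False] * zone_lines
    for line in history:
        accessed[line] = True
    behind = accessed[:t][::-1]
    return [t + k for k in strides(behind) if t + k < zone_lines]

print(candidates([0, 2, 4], 6))    # [8]
print(candidates([0, 4, 2], 6))    # [8], same result when reordered
print(candidates(range(6), 6))     # [7, 8, 9], three strides match at once
```

In hardware each stride test would be an independent AND of two map bits, which is what allows the detection to run in parallel.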

[Figure: memory access pattern map with per-line states (I I A A I I A S I A I A I I A), searched for patterns in parallel.]

11
AMPM Prefetch
  • Memory address space is divided into zones
    • Detects hot zones
  • Memory access map table
    • LRU replacement
  • Pattern matching
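A sketch of a memory access map table with LRU replacement. The class and method names are hypothetical; the default sizes (52 maps, 256 states per map) are taken from the configuration slide.

```python
from collections import OrderedDict

class AccessMapTable:
    """Fully associative table of per-zone access maps with LRU replacement."""

    def __init__(self, capacity=52, lines_per_zone=256):
        self.capacity = capacity
        self.lines_per_zone = lines_per_zone
        self.maps = OrderedDict()   # zone id -> list of per-line states

    def lookup(self, zone):
        """Return the map for `zone`, allocating it (and evicting) if needed."""
        if zone in self.maps:
            self.maps.move_to_end(zone)          # mark as most recently used
        else:
            if len(self.maps) >= self.capacity:
                self.maps.popitem(last=False)    # evict least recently used
            self.maps[zone] = ["I"] * self.lines_per_zone
        return self.maps[zone]

table = AccessMapTable(capacity=2, lines_per_zone=4)
table.lookup(1)[0] = "A"
table.lookup(2)
table.lookup(1)            # zone 1 becomes most recently used
table.lookup(3)            # evicts zone 2
print(list(table.maps))    # [1, 3]
```

Zones that keep being accessed stay in the table, so hot zones retain their history while cold zones are replaced.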

[Figure: zones of the memory address space are tracked by the access map table, and pattern matching produces prefetch requests.]

12
Features of AMPM Prefetcher
  • Pattern-matching-based prefetching
    • Map-based history
    • Optimization-friendly prefetching
  • Parallel pattern matching
    • Searches candidates effectively
    • Complexity-effective implementation

13
Configuration for DPC Competition
  • AMPM Prefetcher
    • Fully associative, 52 maps, 256 states per map
  • Adaptive Stream Prefetcher [Hur 2006]
    • 16 Histograms, 8 Stream Length
  • MSHR Configuration
    • 16 entries for demand requests (default)
    • 32 entries for prefetch requests (additional)
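The state storage implied by this configuration can be worked out directly. The sketch counts only the 2-bit states; tags, replacement information, the stream prefetcher, and the MSHR entries are not included.

```python
MAPS = 52              # fully associative maps
STATES_PER_MAP = 256   # one state per cache line in a zone
BITS_PER_STATE = 2     # 2-bit state map

state_bits = MAPS * STATES_PER_MAP * BITS_PER_STATE
print(state_bits, state_bits // 8)   # 26624 3328
```

So the access maps alone take 26624 bits, or 3328 bytes.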

14
Budget Count

15
Methodology
  • Simulation environment
    • DPC framework
    • Skips the first 4000M instructions and evaluates the following 100M instructions
  • Benchmark
    • SPEC CPU2006 benchmark suite
    • Compile options: -O3 -fomit-frame-pointer -funroll-all-loops

16
IPC Measurement
  • Improves performance by 53%
  • Improves performance in all benchmarks

17
L2 Cache Miss Count
  • Reduces L2 cache misses by 76%

18
Related Works
  • Sequence-based prefetching
    • Sequential Prefetch [Smith 1978]
    • Stride Prefetching Table [Fu 1992]
    • Markov Predictor [Joseph 1997]
    • Global History Buffer [Nesbit 2004]
  • Adaptive prefetching
    • AC/DC [Nesbit 2004]
    • Feedback Directed Prefetch [Srinath 2007]
    • Focus Prefetching [Manikantan 2008]

19
Conclusion
  • Access Map Pattern Matching Prefetch
    • Order-free prefetch
    • Optimization-friendly prefetching
  • Parallel pattern matching
    • Complexity-effective implementation
  • Optimized AMPM achieves good performance
    • Improves IPC by 53%
    • Reduces L2 cache misses by 76%

20
Q & A

[Figure: timeline of prefetching methods. Category labels in the figure: Software, Adaptive, Spatial, Hybrid, and Sequence-Based (Order Sensitive). Commercial processors shown: SuperSPARC, PA7200, R10000, Pentium 4, Power4.]
  • Buffer Block [Gindele 1977]
  • Sequential [Smith 1978]
  • Software Support [Mowry 1992]
  • Stride Prefetch [Fu 1992]
  • Adaptive Seq. [Dahlgren 1993]
  • HW/SW Integrate [Gornish 1994]
  • RPT [Chen 1995]
  • Markov Prefetch [Joseph 1997]
  • Hybrid [Hsu 1998]
  • Locality Detect [Johnson 1998]
  • Tag Correlation [Hu 2003]
  • GHB [Nesbit 2004]
  • AC/DC [Nesbit 2004]
  • Spatial Pat. [Chen 2004]
  • Adaptive Stream [Hur 2006]
  • SMS [Somogyi 2006]
  • FDP [Srinath 2007]
  • AMPM Prefetch [Ishii 2009]
  • Feedback based [Honjo 2009]