Title: Access Map Pattern Matching Prefetch: Optimization Friendly Method
1 Access Map Pattern Matching Prefetch: Optimization Friendly Method
- Yasuo Ishii (1), Mary Inaba (2), and Kei Hiraki (2)
- (1) NEC Corporation
- (2) The University of Tokyo
2 Background
- The speed gap between processor and memory has been increasing
- Many techniques have been proposed to hide long memory latency
- The importance of HW data prefetching has increased
- Many HW prefetchers have been proposed
3 Conventional Methods
- Prefetchers use
- Instruction Address
- Memory Access Order
- Memory Address
- Optimizations scramble this information
- Out-of-Order memory access
- Loop unrolling
4 Limitation of Stride Prefetch [Chen95]: Out-of-Order Memory Access
- Example loop: for (int i = 0; i < N; i++) load A[2*i]
- Stride table entry: Tag A, Address 0xAB04, Stride 2, State steady
- [Figure: Accesses 1 to 4 to cache lines of the memory address space (0xAAFF to 0xABFF) arrive out of order]
- Cannot detect strides
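The limitation on this slide can be reproduced with a very small model. The sketch below is a simplified single-entry stride detector, loosely in the spirit of a stride table; the class name `StrideEntry`, the two-state training policy, and the example address sequences are assumptions made for illustration and are not taken from the slides.

```python
class StrideEntry:
    """One stride-table entry for a single load instruction (hypothetical)."""

    def __init__(self):
        self.last_addr = None
        self.stride = 0
        self.state = "init"

    def access(self, addr):
        """Train on one address; return a predicted address or None."""
        prediction = None
        if self.last_addr is not None:
            new_stride = addr - self.last_addr
            if new_stride == self.stride and new_stride != 0:
                # Same stride seen twice in a row: predict the next address.
                self.state = "steady"
                prediction = addr + self.stride
            else:
                # Stride changed: retrain and issue nothing.
                self.state = "transient"
                self.stride = new_stride
        self.last_addr = addr
        return prediction


def run(addresses):
    entry = StrideEntry()
    return [entry.access(a) for a in addresses]


# Same set of cache lines, stride 2, in program order and reordered.
in_order = [0xAB00, 0xAB02, 0xAB04, 0xAB06]
out_of_order = [0xAB00, 0xAB04, 0xAB02, 0xAB06]

in_order_predictions = run(in_order)
out_of_order_predictions = run(out_of_order)
```

With the in-order sequence the entry reaches the steady state and predicts 0xAB06 and then 0xAB08. With the reordered sequence the observed stride changes on every access (4, then -2, then 4), so no prediction is ever issued even though the same lines are touched.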
5 Weakness of Conventional Methods
- Out-of-Order Memory Access
- Scrambles memory access order
- Prefetcher cannot detect address correlations
- Loop Unrolling
- Requires additional table entries
- Each entry is trained slowly
- An optimization-friendly prefetcher is required
6 Access Map Pattern Matching
- Pattern Matching
- Order-Free Prefetching
- Optimization-Friendly Prefetch
- Access Map
- Map-based history
- 2-bit state map
- Each state is attached to a cache block
7 State Diagram for Each Cache Block
- Init
- Initialized state
- Access
- Already accessed
- Prefetch
- Prefetch request issued
- Success
- Prefetched data accessed
[State diagram: Init to Access on an access; Init to Prefetch on a prefetch; Prefetch to Success on an access]
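The four per-block states fit in two bits, and the transitions can be written as two small functions. This is a minimal sketch: the numeric encoding in `State` and the function names `on_access` and `on_prefetch` are assumptions, and the transitions are the ones read off the state descriptions above (an access moves Init to Access and Prefetch to Success; issuing a prefetch moves Init to Prefetch).

```python
from enum import IntEnum


class State(IntEnum):
    # Encoding is an assumption; any 2-bit assignment would do.
    INIT = 0
    ACCESS = 1
    PREFETCH = 2
    SUCCESS = 3


def on_access(state):
    """State of a cache block after a demand access to it."""
    if state == State.INIT:
        return State.ACCESS
    if state == State.PREFETCH:
        return State.SUCCESS  # prefetched data was actually used
    return state  # ACCESS and SUCCESS stay where they are


def on_prefetch(state):
    """State of a cache block after a prefetch request is issued for it."""
    if state == State.INIT:
        return State.PREFETCH
    return state  # already accessed or prefetched: nothing changes


# A block that is prefetched and later used ends in SUCCESS.
useful_prefetch = on_access(on_prefetch(State.INIT))
```

Keeping the state per cache block rather than per instruction is what lets the history be read without regard to the order in which accesses arrived.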
8 Memory Access Pattern Map
- Corresponds to the memory address space
- Cache line granularity
[Figure: a zone of the memory address space (zone size) is mapped to a memory access pattern map holding one state per cache line (example: I I A S P A), which feeds the pattern match logic]
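Mapping an address to its map entry is plain integer arithmetic. The sketch below assumes a 64-byte cache line and 256 cache lines per zone; both constants and the function name `split_address` are illustrative assumptions rather than values stated on this slide.

```python
LINE_BYTES = 64        # assumed cache line size
LINES_PER_ZONE = 256   # assumed number of cache lines covered by one map


def split_address(byte_addr):
    """Return (zone tag, index of the cache line inside the zone)."""
    line = byte_addr // LINE_BYTES
    return line // LINES_PER_ZONE, line % LINES_PER_ZONE


zone_tag, index = split_address(0x12345)
```

Under these assumptions one zone covers 16 KiB, and address 0x12345 falls in zone 4 at line index 141.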
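9 Pattern Matching Logic
- Access Map Shifter
- Pattern Detector
- Pipeline Register
- Prefetch Selector
[Figure: the memory access pattern map (example states: I A A A I A I I I A A) is aligned to the requested address Addr by the access map shifter; the pattern detector produces match bits (example: 1 0 1) and a prefetch candidate is selected (Addr2)]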
10 Parallel Pattern Matching
- Detects patterns from the memory access map
- Detects address correlations in parallel
- Searches candidates effectively
[Figure: memory access pattern map, example states I I A A I I A S I A I A I I A]
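Pattern matching over a map can be sketched as follows. The matching rule used here is an assumption for illustration: for a stride k, if the lines at distance k and 2k behind the current line are already marked as accessed, the line at distance k ahead becomes a prefetch candidate (and symmetrically in the backward direction). Hardware would test all strides at once; the Python loop stands in for that parallel check. The function name and the 16-line example zone are hypothetical.

```python
def prefetch_candidates(accessed, index, max_stride=None):
    """Return line indices worth prefetching around `index`.

    accessed: list of bools, one per cache line in the zone.
    index:    line index of the current demand access.
    """
    n = len(accessed)
    if max_stride is None:
        max_stride = n // 2
    candidates = []
    for k in range(1, max_stride + 1):
        # Forward pattern: index-2k and index-k were accessed.
        if index - 2 * k >= 0 and index + k < n:
            if accessed[index - k] and accessed[index - 2 * k] \
                    and not accessed[index + k]:
                candidates.append(index + k)
        # Backward pattern: index+k and index+2k were accessed.
        if index + 2 * k < n and index - k >= 0:
            if accessed[index + k] and accessed[index + 2 * k] \
                    and not accessed[index - k]:
                candidates.append(index - k)
    return candidates


# Lines 4 and 2 were touched in "wrong" order, then line 6 is accessed.
zone = [False] * 16
for line in (4, 2):
    zone[line] = True
zone[6] = True
result = prefetch_candidates(zone, 6)
```

Because the map records only which lines were touched, the order of the earlier accesses to lines 2 and 4 does not affect the result: the stride-2 pattern is found and line 8 is proposed.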
11 AMPM Prefetch
- The memory address space is divided into zones
- Detects hot zones
- Memory Access Map Table
- LRU replacement
- Pattern Matching
[Figure: zones of the memory address space are tracked by the map table, which issues prefetch requests]
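A table of access maps with LRU replacement can be modelled in a few lines. This is a sketch under stated assumptions: the class name `AccessMapTable`, its methods, and the tiny sizes in the example are hypothetical, and each map stores a simple accessed flag per line instead of the full 2-bit state.

```python
from collections import OrderedDict


class AccessMapTable:
    """Fixed number of per-zone access maps, evicted in LRU order."""

    def __init__(self, num_maps, lines_per_zone):
        self.num_maps = num_maps
        self.lines_per_zone = lines_per_zone
        self.maps = OrderedDict()  # zone tag -> list of per-line flags

    def lookup(self, zone_tag):
        """Return the map for a zone, allocating (and evicting) if needed."""
        if zone_tag in self.maps:
            self.maps.move_to_end(zone_tag)      # mark as most recently used
        else:
            if len(self.maps) >= self.num_maps:
                self.maps.popitem(last=False)    # drop least recently used
            self.maps[zone_tag] = [False] * self.lines_per_zone
        return self.maps[zone_tag]

    def record_access(self, line_addr):
        """Mark a cache line (given as a line address) as accessed."""
        zone_tag, index = divmod(line_addr, self.lines_per_zone)
        self.lookup(zone_tag)[index] = True
        return zone_tag, index


# Two maps of four lines each, to make the eviction visible.
table = AccessMapTable(num_maps=2, lines_per_zone=4)
for line_addr in (0, 5, 1, 9):
    table.record_access(line_addr)
resident_zones = list(table.maps)
```

In this run zone 1 is the least recently used when zone 2 arrives, so it is evicted; zones that keep being accessed (the hot zones) stay resident and keep their history.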
12 Features of AMPM Prefetcher
- Pattern-matching-based prefetching
- Map-based history
- Optimization-friendly prefetching
- Parallel pattern matching
- Searches candidates effectively
- Complexity-effective implementation
13 Configuration for DPC Competition
- AMPM Prefetcher
- Fully associative, 52 maps, 256 states per map
- Adaptive Stream Prefetcher [Hur 2006]
- 16 Histograms, 8 Stream Length
- MSHR Configuration
- 16 entries for Demand Requests (Default)
- 32 entries for Prefetch Requests (Additional)
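The storage taken by the state maps alone follows directly from this configuration and the 2-bit state per cache block. The arithmetic below is only a rough sketch: it counts the state bits and nothing else, so zone tags, LRU bookkeeping, the MSHR entries, and the adaptive stream prefetcher are not included, and the variable names are illustrative.

```python
NUM_MAPS = 52          # maps in the table
STATES_PER_MAP = 256   # one state per cache block in a zone
BITS_PER_STATE = 2     # 2-bit state map

state_bits = NUM_MAPS * STATES_PER_MAP * BITS_PER_STATE
state_bytes = state_bits // 8
```

This gives 26,624 bits, or 3,328 bytes, for the state maps.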
14 Budget Count
15 Methodology
- Simulation Environment
- DPC Framework
- Skips the first 4000M instructions and evaluates the following 100M instructions
- Benchmark
- SPEC CPU2006 benchmark suite
- Compile options: -O3 -fomit-frame-pointer -funroll-all-loops
16 IPC Measurement
- Improves performance by 53%
- Improves performance in all benchmarks
17 L2 Cache Miss Count
- Reduces L2 cache misses by 76%
18 Related Works
- Sequence-based Prefetching
- Sequential Prefetch [Smith 1978]
- Stride Prefetching Table [Fu 1992]
- Markov Predictor [Joseph 1997]
- Global History Buffer [Nesbit 2004]
- Adaptive Prefetching
- AC/DC [Nesbit 2004]
- Feedback Directed Prefetch [Srinath 2007]
- Focus Prefetching [Manikantan 2008]
19 Conclusion
- Access Map Pattern Matching Prefetch
- Order-Free Prefetch
- Optimization-friendly prefetching
- Parallel Pattern Matching
- Complexity-effective implementation
- Optimized AMPM achieves good performance
- Improves IPC by 53%
- Reduces L2 cache misses by 76%
20 Q&A
[Figure: timeline of prefetching methods. Category labels: Software, Adaptive, Spatial, Hybrid, Sequence-Based (Order Sensitive), Commercial Processors]
- Buffer Block [Gindele 1977]
- Sequential [Smith 1978]
- Software Support [Mowry 1992]
- Stride Prefetch [Fu 1992]
- Adaptive Seq. [Dahlgren 1993]
- HW/SW Integrate [Gornish 1994]
- RPT [Chen 1995]
- Markov Prefetch [Joseph 1997]
- Hybrid [Hsu 1998]
- Locality Detect [Johnson 1998]
- Tag Correlation [Hu 2003]
- GHB [Nesbit 2004]
- AC/DC [Nesbit 2004]
- Spatial Pat. [Chen 2004]
- Adaptive Stream [Hur 2006]
- SMS [Somogyi 2006]
- FDP [Srinath 2007]
- AMPM Prefetch [Ishii 2009]
- Feedback based [Honjo 2009]
- Commercial processors shown: SuperSPARC, PA7200, R10000, Pentium 4, Power4