Using the Compiler to Improve Cache Replacement Decisions - PowerPoint PPT Presentation

1
Using the Compiler to Improve Cache Replacement Decisions
Zhenlin Wang, UMass Amherst
Kathryn S. McKinley, UT Austin
Arnold L. Rosenberg, UMass Amherst
Charles C. Weems, UMass Amherst
2
Motivation and Background
  • LRU is not always effective
  • Optimal cache replacement must peek into the future
  • Compiler locality analysis determines the data access patterns of numeric applications
  • Cache-line tag bit(s) and an ISA extension control cache replacement explicitly
  • Replacement logic augments LRU with compiler hints

3
LRU vs. Compiler Control
2-way cache with LRU
SUBROUTINE TEST(N)
  INTEGER A[N], B[N], C[N]
  DO I = 1, N
    C[I] = A[I] + B[I]
  ENDDO
  DO I = 1, N
    A[I] = C[I] * 5
  ENDDO
END
[Figure: cache sets 1, 2, 3, ..., 128 (N = 128); A[1], B[1], and C[1] map to set 1, and A[2] to set 2]
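To make the LRU behaviour on this slide concrete, here is a small Python sketch that replays the word accesses of TEST through a 2-way, 128-set LRU cache. It assumes (the slide does not say) one word per cache line, arrays laid out so that A[i], B[i], and C[i] map to the same set, write-allocate stores, and that each statement reads its right-hand side before writing the left-hand side. `subroutine_trace` and `simulate_lru` are hypothetical names.

```python
from collections import OrderedDict

NUM_SETS = 128  # N = 128, one set per array index (assumed layout)
WAYS = 2


def subroutine_trace(n=NUM_SETS):
    """Word accesses of TEST: loop 1 reads A(I), B(I) then writes C(I);
    loop 2 reads C(I) then writes A(I). Stores are assumed to allocate."""
    loop1 = [(arr, i) for i in range(n) for arr in ("A", "B", "C")]
    loop2 = [(arr, i) for i in range(n) for arr in ("C", "A")]
    return loop1, loop2


def simulate_lru(trace, num_sets=NUM_SETS, ways=WAYS):
    """Return one boolean per access: True if it misses in the LRU cache."""
    sets = [OrderedDict() for _ in range(num_sets)]  # each ordered LRU -> MRU
    missed = []
    for array, index in trace:
        lines = sets[index % num_sets]  # assumed: A[i], B[i], C[i] share set i
        key = (array, index)
        if key in lines:
            lines.move_to_end(key)  # hit: becomes most recently used
            missed.append(False)
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[key] = True
            missed.append(True)
    return missed


loop1, loop2 = subroutine_trace()
missed = simulate_lru(loop1 + loop2)
loop1_misses = sum(missed[: len(loop1)])
loop2_misses = sum(missed[len(loop1):])
print(loop1_misses, loop2_misses)  # 384 128
```

Under LRU, C[i] evicts A[i] in the first loop, so every A[i] misses again in the second loop, while B[i], which is never reused, stays in the cache.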
4
Compiler Locality Analysis
SUBROUTINE TEST(N)
  INTEGER A[N], B[N], C[N]
  DO I = 1, N
    C[I] = A[I] + B[I]
  ENDDO
  DO I = 1, N
    A[I] = C[I] * 5
  ENDDO
END
[Figure: Locality Graph. Nest N1 contains B[I], C[I], and A[I]; nest N2 contains C[I] and A[I]. Each reference has spatial locality with direction vector (=, <). Cross-loop temporal edges link C[I] and A[I] in N1 to the same references in N2.]
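The locality graph can be sketched as a small Python function. This is a simplified reading, not the authors' analysis: it assumes every subscript is I plus a constant in a stride-1 loop, gives each reference a self-spatial edge with direction vector (=, <), and links a reference to the first later nest that uses the same array element with a cross-loop temporal edge (<, =), treating the sequence of nests as an outer loop. `Ref` and `locality_graph` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    array: str
    nest: int    # position of the loop nest in program order (1, 2, ...)
    offset: int  # the subscript is I + offset


def locality_graph(refs):
    """Return (source, sink, kind, direction_vector) edges."""
    edges = []
    for r in refs:
        # Stride-1 subscript: consecutive iterations of I touch adjacent words.
        edges.append((r, r, "spatial", ("=", "<")))
    for r in refs:
        later = [s for s in refs
                 if s.nest > r.nest and s.array == r.array and s.offset == r.offset]
        if later:
            # Reuse in the nearest later nest, at the same value of I.
            sink = min(later, key=lambda s: s.nest)
            edges.append((r, sink, "cross-loop temporal", ("<", "=")))
    return edges


# SUBROUTINE TEST: nest 1 references A, B, C; nest 2 references C, A.
subroutine_refs = [Ref("A", 1, 0), Ref("B", 1, 0), Ref("C", 1, 0),
                   Ref("C", 2, 0), Ref("A", 2, 0)]
for src, dst, kind, dv in locality_graph(subroutine_refs):
    if kind != "spatial":
        print(f"{src.array}[I] in nest {src.nest} -> nest {dst.nest}: {kind} {dv}")
```

B[I] gets only its spatial edge, which is why it later becomes the candidate for the evict-me bit.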
5
An Abstract Model
  • An optimal algorithm uses exact reuse distances
  • Given the trace a b c d a c d e b f a, the reuse distance of a is 4
  • Reuse level: a range in which the next reuse will occur
    • [i, j] < [k, l] if j < k
    • For example, a reuse level of a is [3, 5] (a b c d a c d e b f a)
  • We combine data dependences with loop iteration points to compute reuse levels
    • For example, (=, <) < (<, =)
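A Python sketch of the abstract model on the slide's trace. It measures reuse distance as the number of accesses from one use of an item to its next use, which reproduces the slide's value of 4 for a (counting distinct intervening items instead would give 3). It also encodes the reuse-level order [i, j] < [k, l] iff j < k, and compares LRU with Belady's MIN policy, which evicts the item whose next use is furthest away and therefore needs exactly these future reuse distances. The capacity of 3 items is an arbitrary choice for illustration, and the function names are hypothetical.

```python
import math
from collections import OrderedDict

TRACE = "a b c d a c d e b f a".split()


def reuse_distances(trace):
    """Accesses from each access to the next use of the same item (inf if none)."""
    dist = [math.inf] * len(trace)
    next_pos = {}
    for t in range(len(trace) - 1, -1, -1):  # scan backwards
        item = trace[t]
        if item in next_pos:
            dist[t] = next_pos[item] - t
        next_pos[item] = t
    return dist


def level_less(x, y):
    """Reuse levels are ranges: [i, j] < [k, l] iff j < k."""
    return x[1] < y[0]


def misses_lru(trace, capacity):
    cache = OrderedDict()  # least recently used first
    misses = 0
    for item in trace:
        if item in cache:
            cache.move_to_end(item)
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)
            cache[item] = True
    return misses


def misses_opt(trace, capacity):
    """Belady's MIN: evict the resident item whose next use is furthest away."""
    dist = reuse_distances(trace)
    next_use = {}  # resident item -> time of its next access
    misses = 0
    for t, item in enumerate(trace):
        if item not in next_use:
            misses += 1
            if len(next_use) == capacity:
                del next_use[max(next_use, key=next_use.get)]
        next_use[item] = t + dist[t]  # exact future knowledge
    return misses


d = reuse_distances(TRACE)[0]
print(d, 3 <= d <= 5)                              # 4 True
print(level_less((3, 5), (6, 10)))                 # True
print(misses_lru(TRACE, 3), misses_opt(TRACE, 3))  # 9 7
```

With three slots, MIN keeps a resident across d and e because it knows a returns soon, which is the information the compiler's reuse levels try to approximate.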

6
The Architecture: Evict-Me Bit
  • Inspired by the Alpha 21264 prefetch-and-evict-next and evict instructions
  • Each cache line has an extra evict-me bit
  • On a replacement, choose the cache line with the evict-me bit set
  • Use the LRU policy if no evict-me bit is set
  • Extend the ISA with load/store instructions that set the evict-me bit
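A minimal Python model of one cache set with evict-me bits, following the replacement rule on this slide. Two details are assumptions the slide leaves open: when several lines have the bit set, the least recently used of them is chosen, and a hit overwrites the bit with the value carried by the new access (so an ordinary load clears it). `EvictMeSet` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    tag: int
    evict_me: bool = False


@dataclass
class EvictMeSet:
    """One set of an associative cache; `lines` is kept in LRU order
    (index 0 is the least recently used line)."""
    ways: int
    lines: list = field(default_factory=list)

    def access(self, tag, evict_me=False):
        """Model a load/store; evict_me=True models the ISA-extended variant
        that sets the bit. Returns True on a hit."""
        for i, line in enumerate(self.lines):
            if line.tag == tag:
                del self.lines[i]
                line.evict_me = evict_me  # assumed: bit follows latest access
                self.lines.append(line)   # now most recently used
                return True
        if len(self.lines) == self.ways:
            del self.lines[self.victim_index()]
        self.lines.append(Line(tag, evict_me))
        return False

    def victim_index(self):
        # Prefer a line with the evict-me bit set (scanning from the LRU end);
        # with no bit set this falls back to plain LRU.
        for i, line in enumerate(self.lines):
            if line.evict_me:
                return i
        return 0


s = EvictMeSet(ways=2)
s.access(1)
s.access(2, evict_me=True)
s.access(3)  # plain LRU would evict tag 1; evict-me evicts tag 2
print([line.tag for line in s.lines])  # [1, 3]
```

Because the fallback is ordinary LRU, a program that never issues the new instructions behaves exactly as on an unmodified cache.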

7
Heuristics for Setting Evict-me Bits
  • On a replacement, evict the cache line if its evict-me bit is set; otherwise, use the LRU bits
  • Compiler heuristics:
    • Set the evict-me bit if the reuse distance of a reference is greater than the cache size
    • Intuition: even a fully associative cache cannot exploit the reuse
    • Reuse levels: [1, cache size] and [cache size + 1, ∞]
  • Volume-based heuristics:
    • Its reuse crosses nests whose data volume is greater than 2 × cache size
    • Or its reuse crosses nests with nesting level > 2
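The two kinds of heuristic can be written as predicates. This Python sketch is a reading of the slide, not the authors' code: it assumes reuse distances, data volumes, and cache size are all in the same unit (for example words, with one-word lines), takes "nesting level > 2" to mean the maximum loop depth of a crossed nest, and sums the volumes of all nests the reuse crosses. Function names are hypothetical.

```python
def reuse_level(reuse_distance, cache_size):
    """Level 0 is [1, cache_size]; level 1 is [cache_size + 1, infinity]."""
    return 0 if reuse_distance <= cache_size else 1


def distance_heuristic(reuse_distance, cache_size):
    # Slide's intuition: reuse beyond the cache size cannot be exploited even
    # by a fully associative cache, so the reference sets the evict-me bit.
    return reuse_level(reuse_distance, cache_size) == 1


def volume_heuristic(crossed_nests, cache_size):
    """crossed_nests: (data_volume, nesting_depth) for each nest the reuse crosses."""
    total_volume = sum(volume for volume, _ in crossed_nests)
    deepest = max((depth for _, depth in crossed_nests), default=0)
    return total_volume > 2 * cache_size or deepest > 2


# In TEST, A's reuse crosses nest 1 only: 384 words, depth 1, cache of 256 words.
print(volume_heuristic([(384, 1)], 256))  # False: keep A's lines
```

The factor of 2 leaves slack for conflict misses in a set-associative cache, so only reuse that is clearly out of reach is given up.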

8
Algorithm for Setting Evict-me Bits
  • Mark the evict-me bit for an array reference if:
    • it has no temporal locality in its nest
    • its reuse crosses nests whose data volume is > 2 × cache size
  • Spatial locality is resolved by run-time address calculation or loop unrolling

DO I = 1, N
  A(I)
ENDDO
DO I = 1, N
  A(I)
  A(I+1)
  A(I+2)
  A(I+3)
ENDDO
[Figure: array elements A[1], A[2], A[3] with evict-me bit values 0 and 1]
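One way to read the spatial-locality bullet, sketched in Python under stated assumptions: a reference with only spatial reuse should set the evict-me bit on the last access to a line, not the first, or the line would be evicted while its remaining words are still needed. A run-time check can test whether the address is the last word of its line; alternatively, unrolling by the line length (assumed here to be 4 words, with a line-aligned array and a trip count that is a multiple of 4) lets the compiler attach the evict-me variant to one fixed reference in the body. Addresses are 0-based word offsets, and both function names are hypothetical.

```python
def evict_me_runtime(address, words_per_line):
    """Run-time check: set the bit only when touching the line's last word."""
    return (address + 1) % words_per_line == 0


def unrolled_references(n, words_per_line):
    """(address, evict_me) for a loop over A unrolled by words_per_line.
    Only the last reference in each unrolled body uses the evict-me variant."""
    refs = []
    for base in range(0, n, words_per_line):  # one line per unrolled iteration
        for k in range(words_per_line):       # A(I), A(I+1), ...
            refs.append((base + k, k == words_per_line - 1))
    return refs


refs = unrolled_references(8, 4)
print([addr for addr, evict in refs if evict])  # [3, 7]
```

The run-time version costs a check per access but needs no alignment assumption; the unrolled version moves that decision to compile time.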
9
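Evict-me: An Example
2-way cache with evict-me
SUBROUTINE TEST(N)
  INTEGER A[N], B[N], C[N]
  DO I = 1, N
    C[I] = A[I] + B[I]
  ENDDO
  DO I = 1, N
    A[I] = C[I] * 5
  ENDDO
END
[Figure: cache sets 1, 2, 3, ..., 128 (N = 128); in set 1, B[1] has its evict-me bit set (1), while A[1] and C[1] have it clear (0)]
Nest 1 volume: 384 words < 2 × 256 words
Cache size: 256 words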
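The example can be replayed with a small Python simulation that extends plain LRU with evict-me bits. Following the slide, B[I] is loaded with the evict-me variant (B has no reuse), while A[I] and C[I] are not, since their reuse crosses only nest 1, whose 384 words are below 2 × 256. The sketch assumes one-word lines, arrays aligned so that index i maps to set i, write-allocate stores, and right-hand-side reads before the store; `simulate` and `subroutine_trace` are hypothetical names.

```python
NUM_SETS, WAYS = 128, 2  # N = 128; 2 x 128 one-word lines = 256 words (assumed)


def subroutine_trace(mark_b):
    """(array, index, evict_me) accesses for TEST; only B may be marked."""
    loop1 = [(arr, i, mark_b and arr == "B")
             for i in range(NUM_SETS) for arr in ("A", "B", "C")]
    loop2 = [(arr, i, False) for i in range(NUM_SETS) for arr in ("C", "A")]
    return loop1 + loop2


def simulate(trace, num_sets=NUM_SETS, ways=WAYS):
    """Count misses. Each set is a list of [key, evict_me] entries in LRU order
    (index 0 is least recently used)."""
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for array, index, evict_me in trace:
        lines = sets[index % num_sets]
        key = (array, index)
        entry = next((e for e in lines if e[0] == key), None)
        if entry is not None:            # hit: move to MRU, refresh the bit
            lines.remove(entry)
            entry[1] = evict_me
            lines.append(entry)
            continue
        misses += 1
        if len(lines) == ways:           # evict-me line first, else LRU
            marked = [e for e in lines if e[1]]
            lines.remove(marked[0] if marked else lines[0])
        lines.append([key, evict_me])
    return misses


print(simulate(subroutine_trace(mark_b=False)))  # 512 (plain LRU)
print(simulate(subroutine_trace(mark_b=True)))   # 384 (B marked evict-me)
```

In the marked run, C[i] replaces B[i] rather than A[i], so the second loop hits on both A[i] and C[i]; the 128 misses saved are exactly the second-loop misses on A under LRU.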
10
Experimental Framework
  • Implemented in Scale, a compiler infrastructure developed at UMass
  • Scale includes optimizations such as partial redundancy elimination, scalar replacement, value numbering, sparse conditional constant propagation, and register allocation
  • Generates SPARC assembly
  • Simulates the evict-me cache with URSIM:
    • Out-of-order execution
    • Lockup-free caches
    • SDRAM

[Figure: tool flow. Source code → Scale → SPARC assembly → native assembler and linker → SPARC executable → URSIM]
11
Cache configurations
  • Both levels are lockup-free, with 8 MSHRs each

[Table: size and associativity; latencies (cycles)]
12
Miss reduction (level 1)
13
Miss reduction (level 2)
14
Performance Impact of Evict-me (Conf. 2)
15
Evict-me and Prefetching Combined (Conf. 3)
16
Summary
  • The compiler can improve cache replacement decisions
  • The evict-me algorithm seldom degrades performance
  • Architectural support for evict-me is practical
  • Effectiveness depends on cache configuration, data set size, and access patterns