Title: EFFICIENT ANALYTICAL MODELING OF DATA CACHE BEHAVIOUR
1 EFFICIENT ANALYTICAL MODELING OF DATA CACHE BEHAVIOUR
- By
- Japinder Singh Chawla
- Anil Kumar Gadgotra
2 Input-Output
- INPUT: Any benchmark application and the cache parameters, e.g. line size and cache size.
- OUTPUT: Memory performance estimates for different cache parameter values.
- Memory performance estimates include quantification of reuse in the program, cache hits and cache misses.
3 Modeling of the Program
- Any memory reference
- A[f1(i1,i2,..)][f2(i1,i2,..)]..[fn(i1,i2,..)]
- for any stride values s1, s2, .., sn and loop variable limits
- (l1,h1), (l2,h2), .., (ln,hn) can be expressed as
- A[a1*i1 + a2*i2 + .. + an*in + a]
- with stride values of 1 and loop variable limits
- (1,N1), (1,N2), .., (1,Nn)
- We generate a data structure corresponding to each cache line (or cache set if associativity K > 1)
- The data structure contains information about the memory accesses that map to that cache line
- The following example generates a data structure for the cache line L = 3
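The normalization to unit stride described on this slide can be illustrated with a minimal Python sketch. It assumes a single loop of the form `for i = l to h step s` with a positive stride; the helper names `normalize_loop` and `linear_index` are hypothetical and do not come from the slides.

```python
def normalize_loop(l, h, s):
    """Map 'for i = l to h step s' (s > 0) onto 'for k = 1 to N step 1'.

    Returns N and a function giving the original i for a normalized k.
    """
    if s <= 0:
        raise ValueError("this sketch only handles positive strides")
    n = max(0, (h - l) // s + 1)            # number of iterations
    return n, (lambda k: l + (k - 1) * s)   # i = l + (k - 1) * s


def linear_index(coeffs, const, iters):
    """Evaluate a1*i1 + a2*i2 + .. + an*in + a for one iteration vector."""
    return sum(c * i for c, i in zip(coeffs, iters)) + const


# Example: 'for i = 3 to 11 step 4' visits i = 3, 7, 11.
n, to_original = normalize_loop(3, 11, 4)
visited = [to_original(k) for k in range(1, n + 1)]
print(n, visited)

# A reference A[2*i + 1] in the original loop becomes A[8*k - 1] after
# normalization, because i = 3 + 4*(k - 1) gives 2*i + 1 = 8*k - 1.
original = [linear_index([2], 1, [i]) for i in visited]
normalized = [linear_index([8], -1, [k]) for k in range(1, n + 1)]
print(original == normalized)
```

The stride and lower bound are absorbed into the coefficients a1..an and the constant a, which is why the rest of the model can assume loops that run from 1 to N with step 1.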
4 Modeling of the Program
  L1 = 1: for i1 = 1 to N1 step 1
    L2 = 1: for i2 = 1 to N2 step 1
      aR = 1: if (i2 >= 2) Read b[i1][i2-1]
      aR = 2: if ((i1 >= 2) && (i1+i2 >= 10)) Read a[i1-1][i2]
    L2 = 2: for i2 = 1 to N2 step 1
      aR = 1: Read b[i1][i2]
  L1 = 2: for i1 = 1 to N1 step 1
    L2 = 1: for i2 = 1 to N2 step 1
      aR = 1: Read a[i1][i2]
- Any memory access can be uniquely modeled by the access vector
- ( L1 , i1 , L2 , i2 , .. , Ln , in , aR )
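As a rough illustration of why the access vector identifies each access uniquely, the following Python sketch enumerates the example program and records one vector per read. The comparison operators in the two guards are an assumption (they are not legible in the slide text and are taken here as `>=`), N1 = N2 = 20 is the value used later in the slides, and the function name `access_vectors` is hypothetical.

```python
N1 = N2 = 20  # loop bounds, the values used later in the slides


def access_vectors(n1, n2):
    """Yield (L1, i1, L2, i2, aR, reference) in execution order."""
    for i1 in range(1, n1 + 1):                 # L1 = 1
        for i2 in range(1, n2 + 1):             # L2 = 1
            if i2 >= 2:                         # assumed guard operator
                yield (1, i1, 1, i2, 1, f"b[{i1}][{i2 - 1}]")
            if i1 >= 2 and i1 + i2 >= 10:       # assumed guard operators
                yield (1, i1, 1, i2, 2, f"a[{i1 - 1}][{i2}]")
        for i2 in range(1, n2 + 1):             # L2 = 2
            yield (1, i1, 2, i2, 1, f"b[{i1}][{i2}]")
    for i1 in range(1, n1 + 1):                 # L1 = 2
        for i2 in range(1, n2 + 1):             # L2 = 1
            yield (2, i1, 1, i2, 1, f"a[{i1}][{i2}]")


trace = list(access_vectors(N1, N2))
vectors = [t[:5] for t in trace]

# Every access has its own vector, and lexicographic order on the
# vectors coincides with execution order.
print(len(vectors) == len(set(vectors)))
print(vectors == sorted(vectors))
print(trace[0], trace[-1])
```

Because lexicographic order on the vectors matches execution order, the "first" and "last" access to a cache line can be found as the minimum and maximum vector, which is what the MIN and MAX entries on the later slides denote.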
5 L = 3
  L1 = 1
    L2 = 1
      b[i1][i2-1] (aR = 1)
      a[i1-1][i2] (aR = 2)
    L2 = 2
      b[i1][i2] (aR = 1)
  L1 = 2
    L2 = 1
      a[i1][i2] (aR = 1)
6 Approach
7 Solving the Equation
- The program is the one shown on slide 4.
- In this program, let N1 = N2 = 20, C = 32 and L = 4. For cache line 3, memory reference b[i1][i2-1] and @b = 4, the equation is
- 8 <= (20*i1) mod 32 + i2 mod 32 - 18 < 12
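The equation can be checked by brute force. The Python sketch below is not the analytical solver of the approach; it simply enumerates the iteration space under stated assumptions: row-major storage, 1-based array indices, C and L measured in array elements, cache lines numbered from 1, and a guard `i2 >= 2` on the reference. Under these assumptions the address of b[i1][i2-1] is 4 + 20*(i1-1) + (i2-2) = 20*i1 + i2 - 18, which matches the constant in the equation.

```python
N1 = N2 = 20      # loop bounds
C, L = 32, 4      # cache size and line size, in array elements (assumed unit)
BASE_B = 4        # @b, base address of b
TARGET_LINE = 3   # cache lines numbered from 1 (assumption)


def address_b(i1, i2):
    """Address of b[i1][i2-1], assuming row-major storage and 1-based indices."""
    return BASE_B + N2 * (i1 - 1) + ((i2 - 1) - 1)   # = 20*i1 + i2 - 18


def cache_line(addr):
    """Direct-mapped cache line holding addr, numbered from 1."""
    return (addr % C) // L + 1


solutions = [
    (i1, i2)
    for i1 in range(1, N1 + 1)
    for i2 in range(2, N2 + 1)          # assumed guard i2 >= 2 on the reference
    if cache_line(address_b(i1, i2)) == TARGET_LINE
]

# Group the i2 ranges by beta1 = (20 * i1) mod 32.
by_beta1 = {}
for i1, i2 in solutions:
    by_beta1.setdefault((N2 * i1) % C, set()).add(i2)

for beta1 in sorted(by_beta1):
    i2s = by_beta1[beta1]
    print(beta1, (min(i2s), max(i2s)))

print("MIN", min(solutions), "MAX", max(solutions))
```

The enumeration clips i2 at N2 = 20, so an interval that the equation allows up to 21 is reported up to 20.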
8 b[i1][i2-1] (aR = 1)
  β1 = (20*i1) mod 32   i1           β2 = i2 mod 32   i2
  20                    1, 9, 17     (6,9)            (6,9)
  8                     2, 10, 18    (18,21)          (18,21)
  16                    4, 12, 20    (10,13)          (10,13)
  12                    7, 15        (14,17)          (14,17)
  24                    6, 14        (2,5)            (2,5)
- MIN = (1,6)
- MAX = (20,13)
9 Inter-loop Reuse for Direct Mapped Cache
- Conditions for inter-loop reuse between two loop nests L1 and L2 for a cache line l:
- There is no memory access to cache line l between the two loop nests.
- The memory access vectors corresponding to the last memory access of L1, a, and the first memory access of L2, b, access the same array and lie on the same memory line,
- i.e. mlR1(a) = mlR2(b).
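The two conditions can be written as a small predicate. This is a minimal Python sketch, assuming that the last access of the first nest and the first access of the second nest have already been found; the `Access` type, the function name and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    array: str        # name of the array being read
    memory_line: int  # memory line touched by the access


def inter_loop_reuse(last_of_first, first_of_second, accesses_between=0):
    """Return True if both conditions for inter-loop reuse hold."""
    if accesses_between != 0:
        return False  # cache line l was touched between the two nests
    return (last_of_first.array == first_of_second.array
            and last_of_first.memory_line == first_of_second.memory_line)


# Hypothetical values, for illustration only.
same = inter_loop_reuse(Access("b", 98), Access("b", 98))
other_array = inter_loop_reuse(Access("b", 98), Access("a", 98))
other_line = inter_loop_reuse(Access("b", 98), Access("b", 99))
interrupted = inter_loop_reuse(Access("b", 98), Access("b", 98), accesses_between=1)
print(same, other_array, other_line, interrupted)
```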
10 Inter-loop Reuse for Direct Mapped Cache
11 L = 3
  L1 = 1
    L2 = 1
      b[i1][i2-1] (aR = 1): MIN (1,6), MAX (20,13)
      a[i1-1][i2] (aR = 2): MIN (3,9), MAX (19,12)
    L2 = 2
      b[i1][i2] (aR = 1): MIN (1,5), MAX (20,12)
  L1 = 2
    L2 = 1
      a[i1][i2] (aR = 1): MIN (2,9), MAX (20,4)
- MIN = B (1,6)
- MAX = B (20,13)
12 L = 3
- Same data structure and per-reference MIN and MAX values as on slide 11
- MIN = A (2,9)
- MAX = B (20,12)
13 Inter-loop Reuse for K-way Set Associative Cache
- Conditions for inter-loop reuse of the memory access vector aI, 1 <= I <= K, between two loop nests L1 and L2 for a cache set s:
- There are no more than I-1 memory accesses to the cache set between the two loop nests that access different memory lines and are also different from mlR(aI).
- Let the number of such accesses be J. Then memory access vector aI is reused iff there exists a vector b among the first I-J minimum memory access vectors of L2 that accesses the same array as aI and lies on the same memory line, i.e. mlR2(b) = mlR1(aI).
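The K-way condition can be sketched in the same style. The Python function below is an illustration under assumptions, not the implementation from the slides: accesses are represented as (array, memory line) pairs, the minimum access vectors of the second nest are given already sorted, and all names and sample values are hypothetical.

```python
def k_way_inter_loop_reuse(a_array, a_line, index_i, lines_between, second_nest_minima):
    """Check inter-loop reuse of the access vector a_I of the first nest.

    a_array, a_line     array and memory line of a_I
    index_i             I, with 1 <= I <= K
    lines_between       memory lines touched in the set between the two nests
    second_nest_minima  (array, memory_line) pairs of the second nest,
                        ordered from the minimum access vector upwards
    """
    # J = number of distinct intervening memory lines other than a_I's own line.
    j = len(set(lines_between) - {a_line})
    if j > index_i - 1:
        return False
    candidates = second_nest_minima[: index_i - j]
    return any(arr == a_array and line == a_line for arr, line in candidates)


# Hypothetical values, for illustration only (K = 2, I = 2).
second = [("a", 3), ("b", 100), ("b", 101)]
no_gap = k_way_inter_loop_reuse("b", 100, 2, [], second)          # J = 0, first 2 minima
one_gap = k_way_inter_loop_reuse("b", 100, 2, [57], second)       # J = 1, first minimum only
two_gaps = k_way_inter_loop_reuse("b", 100, 2, [57, 58], second)  # J = 2 exceeds I - 1
print(no_gap, one_gap, two_gaps)
```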
14 b[i1][i2-1] (aR = 1)
- β1 = 20: i1 = 1, 9, 17
- β1 = 28: i1 = 3, 11, 19
- β1 = 16: i1 = 4, 12, 20
- β1 = 0: i1 = 8, 16
- β1 = 24: i1 = 6, 14
- β1 = 4: i1 = 5, 13
- β2 = (14,20): i2 = (14,20)
- β2 = (6,13): i2 = (6,13)
- β2 = (18,20): i2 = (18,20)
- β2 = (2,6): i2 = (2,6)
- β2 = (10,17): i2 = (10,17)
- β2 = (2,9): i2 = (2,9)
- MIN = (1,14)
- MAX = (20,20)
- Memory line 100: the 2nd MAX should not lie on memory line 100
- Memory line 3: the 2nd MIN should not lie on memory line 3
15 b[i1][i2-1] (aR = 1)
- Same solution table as on slide 14
- 2nd MIN = (1,18)
- 2nd MAX = (19,13)
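The rule that the 2nd MIN and 2nd MAX must lie on a different memory line from the MIN and MAX can be sketched as follows. This Python example makes several assumptions: row-major storage with 1-based indices, N2 = 20, L = 4 and @b = 4 as on slide 7, memory lines numbered from 0, and the first three β1 rows of the table paired with the first three i2 intervals in the order listed. Only those three rows are needed, since they contain the smallest and the two largest values of i1. The helper names are hypothetical.

```python
N2, L, BASE_B = 20, 4, 4   # row length, line size, @b (values from slide 7)


def memory_line(i1, i2):
    """Memory line of b[i1][i2-1]; row-major, 1-based indices, lines from 0."""
    return (BASE_B + N2 * (i1 - 1) + (i2 - 2)) // L


def extremes_on_distinct_lines(vectors, count, largest=False):
    """First `count` vectors, scanning from MIN (or MAX), on distinct memory lines."""
    picked, seen = [], set()
    for v in sorted(vectors, reverse=largest):
        line = memory_line(*v)
        if line not in seen:
            seen.add(line)
            picked.append(v)
            if len(picked) == count:
                break
    return picked


# i1 values and i2 intervals for the rows beta1 = 20, 28 and 16 (assumed pairing).
rows = {(1, 9, 17): (14, 20), (3, 11, 19): (6, 13), (4, 12, 20): (18, 20)}
vectors = [(i1, i2)
           for i1s, (lo, hi) in rows.items()
           for i1 in i1s
           for i2 in range(lo, hi + 1)]

minima = extremes_on_distinct_lines(vectors, 2)                 # MIN, 2nd MIN
maxima = extremes_on_distinct_lines(vectors, 2, largest=True)   # MAX, 2nd MAX
print(minima, maxima, memory_line(20, 20))
```

For a K-way set associative cache the same scan would be run with `count=K`, so that each of the K tracked vectors refers to a different memory line.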
16 Inter-loop Reuse for K-way Set Associative Cache
17-22 L3 (figure sequence; one entry is added per slide)
- Slide 17: Ref A, v1 = (i11, i12), for A[f1(i11)][f2(i12)]
- Slide 18: adds Ref A, v2 = (i21, i22), for A[f1(i21)][f2(i22)]
- Slide 19: adds Ref B, v3 = (i31, i32), for B[f1(i31)][f2(i32)]
- Slide 20: adds Ref B, v4 = (i41, i42), for B[f1(i41)][f2(i42)]
- Slide 21: adds Ref A, v5 = (i51, i52), for A[f1(i51)][f2(i52)]
- Slide 22: the vectors are grouped as (v1, v2), (v3, v4), (v5, v6), .. for arrays A, B, A
23 Intra-loop Reuse for Direct Mapped Cache
24 Self-Spatial Reuse
- Self-spatial reuse occurs when a memory reference accesses the same cache line in different iterations
- Let the number of iteration vectors in an interval be A. The self-spatial reuse within that interval is Rint
- (Rint = A - Nmiss) where Nmiss is the number of different memory lines brought into the cache line
- Nmiss = Nr_miss + L/an if group-spatial reuse with the preceding interval is zero, otherwise Nmiss = Nr_miss
- Nr_miss = A / (L/an) is the number of replacement misses
25 Intra-loop Reuse for K-Way Set Associative Cache
26 Group-Spatial Reuse
- Conditions for group-spatial reuse of the memory access vector aI, 1 <= I <= K, between two intervals I1 and I2 for a cache set s:
- The memory references corresponding to the two intervals access the same array.
- The memory access vector aI is reused iff there exists a vector b among the first I minimum memory access vectors of I2 that lies on the same memory line, i.e. mlR2(b) = mlR1(aI).
27 Self-Spatial Reuse
- The self-spatial reuse within an interval is Rint
- (Rint = A - Nmiss) where Nmiss is the number of different memory lines brought into the cache set.
- Nmiss = Nr_miss + (K-m)L/an where group-spatial reuse with the preceding interval is m.
- Nr_miss = A / (L/an) is the number of replacement misses
- The number of iteration vectors in the interval (A) will increase because of associativity.
28 Space and Time Complexity
- The approach includes the following steps:
- Building of a data structure in O(SlCSrefsMAX_REFS?i1N)
- Computing inter-loop reuse in O(KSlCSfMAX_FORSSrefsMAX_REFSN)
- Computing intra-loop reuse in O(KSlCSfMAX_FORSSsC/(LXK)N)
- So the time complexity of the approach is O(KSlCSfMAX_FORSSsC/(LXK)N)
- The space complexity of the approach is O(SrefsMAX_REFSN)