Title: Data cache analysis Anca Molnos
1. Data cache analysis
Anca Molnos
2. Outline
- Motivation
- General context
- Research questions
- Data cache analysis
- Examples
- Conclusions
3. Motivation
- Increasing processor-memory gap.
- Possible solution: caches.
- Need for aggressive cache optimisations.
4. General context
Functional system description
Attempted architecture
What performance is intrinsically needed by the system?
Which HW components are allowed?
How to enable an efficient mapping?
5. Overflow of implementation details!
6. Data view
Hierarchical distributed memory system
Memory accesses
yapi_read A <- fifo1
for (...) access A
yapi_write A -> fifo2

yapi_read line <- fifo2
yapi_read line <- fifo3
for (...) access line
yapi_write line -> fifo4
- Data mapping
  - data granularity
  - address allocation
  - access order
7. Research questions
- Where are the big cache problems (the hot spots)?
- What type of optimisations can be made?
8. Data cache analysis
[Diagram: analysis flow. Elements: YAPI code; (cross-)compilation / profiling; execution; cache simulator; range analyser. Feedback actions: rewrite code, change addresses.]
9. Data cache analysis
- YAPI
  - C, task primitives, communication primitives.
- Execution
  - simulation (TriMedia, MIPS, ...)
  - instrumentation (Aspects) and execution
- Cache simulation and range analysis
  - single processor
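
The slides do not describe the range analyser's internals. As a minimal sketch, assuming its job is to attribute each traced memory access to a named address range (the per-array and per-fifo statistics in the examples suggest this), the lookup could look as follows. The `Range` type, the `attribute_access` function, and the half-open `[begin, end)` convention are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one data object (an array, a fifo buffer, ...). */
typedef struct {
    const char   *name;     /* label used in the report */
    uint64_t      begin;    /* first byte address */
    uint64_t      end;      /* one past the last byte address (assumed) */
    unsigned long accesses; /* number of traced accesses that fell inside */
} Range;

/* Finds the range that contains addr, counts the access, and returns it.
   Returns NULL when the address belongs to no registered range. */
static Range *attribute_access(Range *ranges, size_t n, uint64_t addr)
{
    assert(ranges != NULL);
    for (size_t i = 0; i < n; i++) {
        if (addr >= ranges[i].begin && addr < ranges[i].end) {
            ranges[i].accesses++;
            return &ranges[i];
        }
    }
    return NULL;
}
```

A linear scan is enough for a handful of ranges; a trace with many data objects would probably call for a sorted table and binary search.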
10. YAPI 1: Streaming application
- Cache interference
  - inside tasks.
  - between local task data and fifos.
11. YAPI 2: Parallel streaming applications
Proposed Solution
- Cache interference
  - inside tasks fifos
  - between parallel tasks fifos.
12. Example 1: inside-task cache conflicts
for (i = 0; i < 2000; i++) A[i] = init();
for (i = 1; i < 2000; i++) B[i] = f(A[i-1]);

A (3759573700 - 3759581700) Memory level 0 (689, 641)
  rHits = 561, rMisses = 1444, wHits = 3, wMisses = 2003
B (3759584116 - 3759592116) Memory level 0 (1245, 1197)
Read miss rate: 71.478%
L0 cache: 2048/4/1
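
The pair printed after "Memory level 0" can be reproduced from the addresses. The sketch below assumes that "2048/4/1" means 2048 sets, 4-byte lines, and 1 way (direct mapped); the slides do not spell this out, but under that reading the first and last address of A map to sets 689 and 641, and those of B to sets 1245 and 1197, which matches the output. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reading of "2048/4/1": sets / line size in bytes / associativity. */
enum { NUM_SETS = 2048, LINE_BYTES = 4 };

/* Cache set that a byte address maps to in a direct-mapped cache. */
static unsigned set_index(uint64_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Number of cache lines covered by the byte range [begin, end). */
static unsigned lines_spanned(uint64_t begin, uint64_t end)
{
    assert(end > begin);
    return (unsigned)((end - begin) / LINE_BYTES);
}
```

Each array spans 2000 lines, fewer than the 2048 sets, so neither array conflicts with itself; the misses come from the relative placement of A and B.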
13. Example 1: inside-task cache conflicts
B[i] = f(A[i-1])
14. Example 1: inside-task cache conflicts
for (i = 0; i < 2000; i++) A[i] = init();
for (i = 1; i < 2000; i++) B[i] = f(A[i-1]);

A (3759573700 - 3759581700) Memory level 0 (689, 641)
  rHits = 2003, rMisses = 2, wHits = 3, wMisses = 2003
B (3759589904 - 3759597904) Memory level 0 (644, 596) ...
Read miss rate: 0.2%
L0 cache: 2048/4/1
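
The effect of moving B can be replayed with a small direct-mapped cache model. This is a sketch under assumptions the slides do not confirm: 2048 sets of 4-byte lines, 4-byte array elements (8000 bytes for 2000 elements), and allocation of a line on every miss, including write misses. It models only the accesses to A and B, so its counts are close to the reported ones rather than identical. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { NUM_SETS = 2048, LINE_BYTES = 4, ELEM_BYTES = 4, N = 2000 };

typedef struct {
    uint64_t line[NUM_SETS];  /* line address currently held by each set */
    bool     valid[NUM_SETS];
} Cache;

/* Touches one address and returns true on a hit. A miss always allocates
   the line (write-allocate is an assumption, not stated in the slides). */
static bool cache_access(Cache *c, uint64_t addr)
{
    uint64_t line = addr / LINE_BYTES;
    unsigned set = (unsigned)(line % NUM_SETS);
    if (c->valid[set] && c->line[set] == line)
        return true;
    c->valid[set] = true;
    c->line[set] = line;
    return false;
}

/* Replays both loops and returns the number of read misses on A. */
static unsigned read_misses_on_a(uint64_t a_base, uint64_t b_base)
{
    Cache c;
    unsigned misses = 0;
    assert(a_base != b_base);
    memset(&c, 0, sizeof c);

    for (unsigned i = 0; i < N; i++)             /* A[i] = init() */
        cache_access(&c, a_base + (uint64_t)ELEM_BYTES * i);

    for (unsigned i = 1; i < N; i++) {           /* B[i] = f(A[i-1]) */
        if (!cache_access(&c, a_base + (uint64_t)ELEM_BYTES * (i - 1)))
            misses++;
        cache_access(&c, b_base + (uint64_t)ELEM_BYTES * i);
    }
    return misses;
}
```

With B at 3759584116, each write to B[i] lands on the set holding A[i+556], which is evicted before it is read; the model counts 1442 read misses on A (the slide reports 1444). With B at 3759589904, B[i] replaces A[i-45], which has already been consumed, and the model counts no read misses on A (the slide reports 2).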
15. Example 2: fifo-level cache conflicts
- Smaller data granularity
  - bigger locality -> expected better data cache behaviour
  - but more synchronisation points.
16. Example 2: fifo-level cache conflicts
fifo1 (size 1024*4) miss rate 34.4% Memory level 0 (76, 75)
  evicted by fifo1 209 times
  evicted by fifo2 447 times
fifo2 (size 1024*4) miss rate 29.8% Memory level 0 (84, 83)
  evicted by fifo1 488 times
  evicted by fifo2 220 times
miss rate 34.15%
L0 cache 128/16/1

fifo1 (size 64*4) miss rate 0.3% Memory level 0 (88, 103)
  evicted by fifo2 114 times
fifo2 (size 1024*4) miss rate 33.9% Memory level 0 (116, 115)
  evicted by fifo1 104 times
  evicted by fifo2 107 times
miss rate 16.4%
L0 cache 128/16/1
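
The self-evictions in the first configuration follow from the buffer footprint. The sketch below assumes that "128/16/1" means 128 sets of 16-byte lines, direct mapped (2 KB in total), and that the fifo sizes read as 1024*4 and 64*4 bytes. Under that reading a 4096-byte fifo covers 256 lines and wraps around the 128 sets twice, so it can evict its own data, while a 256-byte fifo covers only 16 sets. The `Footprint` type and `footprint` function are hypothetical, and the base addresses in the usage note are made up to land on the reported start sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed reading of "128/16/1": sets / line size in bytes / associativity. */
enum { NUM_SETS = 128, LINE_BYTES = 16 };

typedef struct {
    unsigned first_set;     /* set of the buffer's first line */
    unsigned last_set;      /* set of the buffer's last line */
    unsigned lines;         /* number of cache lines the buffer spans */
    bool     self_conflict; /* true if the buffer maps onto itself */
} Footprint;

/* Computes which sets a buffer of size_bytes starting at base occupies. */
static Footprint footprint(uint64_t base, uint64_t size_bytes)
{
    Footprint f;
    uint64_t first_line, last_line;

    assert(size_bytes > 0);
    first_line = base / LINE_BYTES;
    last_line = (base + size_bytes - 1) / LINE_BYTES;

    f.lines = (unsigned)(last_line - first_line + 1);
    f.first_set = (unsigned)(first_line % NUM_SETS);
    f.last_set = (unsigned)(last_line % NUM_SETS);
    f.self_conflict = f.lines > NUM_SETS;
    return f;
}
```

For example, `footprint(76 * 16, 1024 * 4)` yields sets (76, 75) with a self-conflict, and `footprint(88 * 16, 64 * 4)` yields sets (88, 103) without one, matching the pairs shown for fifo1 in the two configurations.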
17. Conclusions
- The analysis shows where the data cache problems are.
- It gives hints for optimisations.
18. Future work
- Analysis
  - parallel cache simulation and range analysis
- Optimisation
  - address changing, compact memory image.
  - task switching vs. cache misses trade-off.
  - minimise inter-task conflict misses (allocating cache parts to tasks, ...).