Data cache analysis Anca Molnos - PowerPoint PPT Presentation

Verilog. VHDL. Verification. Testability. CAD tools. I first want a quick overview before focusing on all these details! Philips Research & TU Delft ... – PowerPoint PPT presentation

1
Data cache analysis
Anca Molnos
2
Outline
  • Motivation
  • General context
  • Research questions
  • Data cache analysis
  • Examples
  • Conclusions

3
Motivation
  • Increasing processor-memory gap.
  • Possible solution: caches.
  • Need for aggressive cache optimisations.

4
General context
Functional system description
Attempted architecture
What performance is intrinsically needed by the
system?
Which HW components are allowed?
How to enable an efficient mapping?
5
Overflow of implementation details!
6
Data view
Hierarchical distributed memory system
Memory accesses
yapi_read A <- fifo1
for () access A
...
yapi_write A -> fifo2

yapi_read line <- fifo2
yapi_read line <- fifo3
for () access line
...
yapi_write line -> fifo4
  • Data mapping
  • data granularity
  • address allocation
  • access order

7
Research questions
  • Where are the big cache problems (hot spots)?
  • What type of optimisations can be made?

8
Data cache analysis
(cross)compilation/ profiling
YAPI code
Execution
  • Rewrite code
  • Change addresses

Cache simulator + Range analyser
  • Change parameters

9
Data cache analysis
  • YAPI
  • C with task primitives and communication primitives.
  • Execution
  • simulation (TriMedia, MIPS, ...)
  • instrumentation (Aspects) and execution
  • Cache simulation and range analysis
  • single processor

10
YAPI 1. Streaming application
  • Cache interference
  • inside tasks.
  • between local task data and fifos.

11
YAPI 2. Parallel streaming applications
Proposed Solution
  • Cache interference
  • inside tasks and fifos.
  • between parallel tasks and fifos.

12
Example 1 - inside task cache conflicts
for (i = 0; i < 2000; i++) A[i] = init();
for (i = 1; i < 2000; i++) B[i] = f(A[i-1]);
A (3759573700 - 3759581700) Memory level 0 (689, 641)
  rHits 561 rMisses 1444 wHits 3 wMisses 2003
B (3759584116 - 3759592116) Memory level 0 (1245, 1197)
Read miss rate 71.478
L0 cache 2048/4/1
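The pairs in parentheses after "Memory level 0" can be reproduced from the address ranges. The sketch below assumes that "2048/4/1" means 2048 lines of 4 bytes in a direct-mapped cache, and that each pair holds the cache line index of the first and last address of the range. Neither reading is stated on the slide. The names line_index and range_indices are hypothetical.

```c
#include <stdint.h>

/* Assumed L0 geometry: "2048/4/1" read as lines / bytes per line / ways. */
enum { NUM_LINES = 2048, LINE_BYTES = 4 };

/* Cache line index a byte address maps to in a direct-mapped cache. */
unsigned line_index(uint64_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_LINES);
}

/* Hypothetical helper: line index of the first and last address of a range. */
struct index_range { unsigned first, last; };

struct index_range range_indices(uint64_t start, uint64_t end)
{
    struct index_range r = { line_index(start), line_index(end) };
    return r;
}
```

With these assumptions, range_indices(3759573700, 3759581700) gives (689, 641) for A and range_indices(3759584116, 3759592116) gives (1245, 1197) for B, which matches the report.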
13
Example 1 - inside task cache conflicts
B[i] = f(A[i-1])
14
Example 1 - inside task cache conflicts
for (i = 0; i < 2000; i++) A[i] = init();
for (i = 1; i < 2000; i++) B[i] = f(A[i-1]);
A (3759573700 - 3759581700) Memory level 0 (689, 641)
  rHits 2003 rMisses 2 wHits 3 wMisses 2003
B (3759589904 - 3759597904) Memory level 0 (644, 596) ...
Read miss rate 0.2
L0 cache 2048/4/1
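The effect of moving B can be checked with a small replay of the two loops. This is a minimal sketch, not the simulator used for the slides. It assumes a direct-mapped, write-allocate cache of 2048 lines of 4 bytes, 4-byte array elements, and a trace that contains only the accesses to A and B. The names touch and simulate are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumed geometry and element size; none of this is stated on the slide. */
enum { NUM_LINES = 2048, LINE_BYTES = 4, ELEM_BYTES = 4, N = 2000 };

struct read_stats { unsigned hits, misses; };

/* Direct-mapped cache state: one line address per slot. */
static uint64_t tag[NUM_LINES];
static unsigned char valid[NUM_LINES];

/* Returns 1 on a hit, 0 on a miss; a miss installs the line (write-allocate). */
static int touch(uint64_t addr)
{
    uint64_t line = addr / LINE_BYTES;
    unsigned idx = (unsigned)(line % NUM_LINES);
    if (valid[idx] && tag[idx] == line)
        return 1;
    valid[idx] = 1;
    tag[idx] = line;
    return 0;
}

/* Replays only the A and B accesses of the two loops; counts the reads of A. */
struct read_stats simulate(uint64_t a_base, uint64_t b_base)
{
    struct read_stats s = { 0, 0 };
    memset(valid, 0, sizeof valid);
    for (unsigned i = 0; i < N; i++)            /* A[i] = init() */
        touch(a_base + (uint64_t)ELEM_BYTES * i);
    for (unsigned i = 1; i < N; i++) {          /* B[i] = f(A[i-1]) */
        if (touch(a_base + (uint64_t)ELEM_BYTES * (i - 1)))
            s.hits++;
        else
            s.misses++;
        touch(b_base + (uint64_t)ELEM_BYTES * i);
    }
    return s;
}
```

Under these assumptions, simulate(3759573700, 3759584116) gives 557 read hits and 1442 read misses, and simulate(3759573700, 3759589904) gives 1999 read hits and no read misses. The slide figures (561/1444 and 2003/2) are close but not identical, presumably because the real trace contains accesses that this sketch leaves out. In the first layout each write to B[i] evicts an element of A that has not been read yet. In the second layout it evicts an element that was already read.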
15
Example 2 - fifo level cache conflicts
  • Smaller data granularity
  • bigger locality -> expected better data cache
    behaviour
  • but more synchronisation points.

16
Example 2 - fifo level cache conflicts
fifo1 (size 1024*4) miss rate 34.4 Memory level 0 (76, 75)
  evicted by fifo1 209 times
  evicted by fifo2 447 times
fifo2 (size 1024*4) miss rate 29.8 Memory level 0 (84, 83)
  evicted by fifo1 488 times
  evicted by fifo2 220 times
miss rate 34.15
L0 cache 128/16/1

fifo1 (size 64*4) miss rate 0.3 Memory level 0 (88, 103)
  evicted by fifo2 114 times
fifo2 (size 1024*4) miss rate 33.9 Memory level 0 (116, 115)
  evicted by fifo1 104 times
  evicted by fifo2 107 times
miss rate 16.4
L0 cache 128/16/1
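The "evicted by" counts can be gathered by recording, for every cached line, which buffer owns it. The sketch below shows one way to do that. It assumes a direct-mapped cache of 128 lines of 16 bytes (one reading of "128/16/1") and 4-byte tokens. The trace is hypothetical (token k of each fifo is accessed in turn), so it illustrates the bookkeeping only and does not reproduce the numbers above. All names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumed geometry: 128 lines of 16 bytes, direct mapped; two fifos. */
enum { NUM_LINES = 128, LINE_BYTES = 16, NUM_REGIONS = 2 };

struct region { uint64_t base, bytes; };

struct report {
    unsigned misses[NUM_REGIONS];
    unsigned evicted_by[NUM_REGIONS][NUM_REGIONS];  /* [victim][evictor] */
};

static uint64_t tag[NUM_LINES];
static int owner[NUM_LINES];    /* region owning the cached line, -1 if empty */

/* One access by region `who`; on a miss, charge the eviction to `who`. */
static void touch(struct report *rep, int who, uint64_t addr)
{
    uint64_t line = addr / LINE_BYTES;
    unsigned idx = (unsigned)(line % NUM_LINES);
    if (owner[idx] >= 0 && tag[idx] == line)
        return;                                     /* hit */
    rep->misses[who]++;
    if (owner[idx] >= 0)
        rep->evicted_by[owner[idx]][who]++;
    owner[idx] = who;
    tag[idx] = line;
}

/* Hypothetical trace: 4-byte token k of fifo 0, then of fifo 1, and so on. */
struct report interleave(const struct region fifo[NUM_REGIONS], unsigned tokens)
{
    struct report rep;
    memset(&rep, 0, sizeof rep);
    for (unsigned i = 0; i < NUM_LINES; i++)
        owner[i] = -1;
    for (unsigned k = 0; k < tokens; k++)
        for (int f = 0; f < NUM_REGIONS; f++)
            touch(&rep, f, fifo[f].base + 4u * (k % (fifo[f].bytes / 4)));
    return rep;
}
```

With two 4096-byte fifos at made-up addresses 0 and 4096, every access maps both fifos onto the same cache line, and 1024 tokens cause 1024 misses per fifo. Moving the second fifo by 1024 bytes (to 5120) separates them by half the cache, and the same trace causes 256 misses per fifo, one per line.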
17
Conclusions
  • The analysis shows where the data cache problems are.
  • It gives hints for optimisations.

18
Future work
  • Analysis
  • parallel cache simulation and range analysis
  • Optimisation
  • address changing, compact memory image.
  • task switching - cache misses trade-off.
  • minimise inter-tasks conflict misses (allocating
    cache parts to tasks, ...).