Title: Xiaomi An, Jiqiang Song, Wendong Wang
1Temporal Distribution Based Software Cache
Partition To Reduce I-Cache Misses
- Xiaomi An, Jiqiang Song, Wendong Wang
- SimpLight Nanoelectronics Ltd
- 2008/03/24
2Outline
- Traditional code layout optimizations
- Code layout optimizations in Open64 compiler
- Temporal distribution based software cache partition to reduce I-Cache misses
- Future work
3Traditional code layout optimizations
- Code layout optimization changes how code is organized in memory.
- Main benefits of code layout
- Improve branch prediction through placement of basic blocks
- Reduce I-cache misses by changing how code maps onto the cache (mainly compulsory misses and conflict misses)
- Fit code into a complex memory hierarchy (e.g. scratch-pad memory and cache)
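To illustrate how changing the mapping of code onto the cache affects conflict misses, here is a minimal sketch. It assumes a direct-mapped I-cache of 8 KiB with 32-byte lines; the cache geometry, the function addresses, and the helper name `cache_line` are hypothetical and do not come from the slides.

```python
LINE_SIZE = 32    # bytes per cache line (assumed)
NUM_LINES = 256   # 256 lines * 32 bytes = 8 KiB direct-mapped cache (assumed)

def cache_line(addr):
    """Return the cache line index an address maps to in a direct-mapped cache."""
    return (addr // LINE_SIZE) % NUM_LINES

func_a = 0x1000        # hypothetical start address of a hot function
func_b = 0x3000        # hypothetical start address of another hot function
func_b_moved = 0x3020  # the same function after the layout shifts it by one line

# func_a and func_b are exactly one cache size apart, so they collide.
print(cache_line(func_a), cache_line(func_b))        # same line index
# After moving func_b by 32 bytes, the two entry points no longer conflict.
print(cache_line(func_a), cache_line(func_b_moved))  # different line indices
```

If the two functions call each other in a loop, the first placement would evict one on every call to the other, while the second placement lets both stay resident.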
4Traditional code layout optimizations
- Representation of temporal relationship
- control flow graph with edge frequency
- weighted call graph
- temporal relation graph
- Consideration of cache architecture
- Linearize code without considering cache architecture (Pettis and Hansen)
- Distribute temporally interleaved code onto different cache lines (Hashemi, Gloy, etc.)
5Code layout optimizations in Open64 compiler
- Profile-based basic block reordering and procedure splitting in CG
- Based on control flow graph with edge frequency
- Pettis and Hansen based algorithm
- Procedure reordering in IPA
- Based on weighted call graph with call-edge frequency
- A kind of Pettis and Hansen based algorithm
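The idea behind procedure reordering on a weighted call graph can be sketched as follows. This is a simplified greedy chain-merging procedure in the spirit of Pettis and Hansen, not the Open64 implementation: it processes call edges from heaviest to lightest and joins the two chains in whichever orientation places the two procedures closest together. The function name `order_procedures` and the sample call graph are hypothetical.

```python
def order_procedures(call_counts):
    """call_counts maps (caller, callee) -> call frequency; returns a layout order."""
    # Fold the directed call edges into undirected weights.
    weight = {}
    for (a, b), w in call_counts.items():
        if a != b:
            key = tuple(sorted((a, b)))
            weight[key] = weight.get(key, 0) + w

    chains = {p: [p] for key in weight for p in key}  # chain id -> procedures
    owner = {p: p for p in chains}                    # procedure -> chain id

    # Visit edges from heaviest to lightest (ties broken by name).
    for (a, b), _ in sorted(weight.items(), key=lambda kv: (-kv[1], kv[0])):
        ca, cb = owner[a], owner[b]
        if ca == cb:
            continue  # already placed in the same chain
        x, y = chains.pop(ca), chains.pop(cb)
        # Try the four ways of joining the chains; keep a and b as close as possible.
        candidates = [x + y, x + y[::-1], x[::-1] + y, x[::-1] + y[::-1]]
        best = min(candidates, key=lambda c: abs(c.index(a) - c.index(b)))
        chains[ca] = best
        for p in best:
            owner[p] = ca

    return [p for chain in chains.values() for p in chain]

# Hypothetical profile: main calls parse often, parse calls lex often.
profile = {("main", "parse"): 100, ("parse", "lex"): 90,
           ("main", "emit"): 10, ("emit", "fmt"): 5}
order = order_procedures(profile)
print(order)
```

Note that this kind of ordering only linearizes the code; it does not look at the cache geometry, which is the limitation the later slides address.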
6Software cache partition
- What is software cache partition?
- Through code layout optimization, different code blocks are mapped to different regions of the I-cache.
- Benefits of software cache partition
- Reduce cache misses
- Remove interference between multiple programs and avoid additional hardware support (embedded systems)
- Soft implementation of scratch pad memory on top of I-cache
7Benefits of software cache partition (1)
- Remove interference between multiple programs and avoid additional hardware support
The I-cache is partitioned according to the performance demands and code locality of the video application and the audio application.
8Benefits of software cache partition (2)
- Soft implementation of scratch pad memory on top
of I-cache
The I-cache is partitioned to guarantee that code with real-time requirements is not replaced once it has been brought into the cache.
9Benefits of software cache partition (3)
Runtime trace of code blocks: ABCDEF(UV)^5 ABCDEF(PQ)^5 ABCDEF(XY)^5 ABCDEF
Layout 1: 24 misses
Layout 2: 18 misses
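The effect of layout on miss counts can be explored with a small simulation of the trace above. The two layouts from the slide figure are not recoverable here, so this sketch uses its own assumptions: a direct-mapped cache of 8 lines, one cache line per code block, and two hypothetical layouts (`sequential` and `partitioned`). Its miss counts are therefore not expected to match the 24 and 18 reported on the slide.

```python
def build_trace():
    """Expand ABCDEF(UV)^5 ABCDEF(PQ)^5 ABCDEF(XY)^5 ABCDEF into a list of blocks."""
    trace = []
    for pair in ("UV", "PQ", "XY"):
        trace += list("ABCDEF") + list(pair) * 5
    return trace + list("ABCDEF")

def count_misses(trace, layout, num_lines):
    """Count misses in a direct-mapped cache where each block fills one line."""
    cache = [None] * num_lines
    misses = 0
    for block in trace:
        line = layout[block] % num_lines
        if cache[line] != block:
            cache[line] = block  # fetch the block, evicting the previous occupant
            misses += 1
    return misses

NUM_LINES = 8  # assumed cache size in lines

# Hypothetical layout 1: blocks placed one after another, wrapping around the cache.
sequential = {b: i for i, b in enumerate("ABCDEFUVPQXY")}
# Hypothetical layout 2: A-F hold lines 0-5 exclusively; the rest share lines 6-7.
partitioned = {b: i for i, b in enumerate("ABCDEF")}
partitioned.update({"U": 6, "P": 6, "X": 6, "V": 7, "Q": 7, "Y": 7})

trace = build_trace()
seq_misses = count_misses(trace, sequential, NUM_LINES)
part_misses = count_misses(trace, partitioned, NUM_LINES)
print(seq_misses, part_misses)
```

In the partitioned layout, the blocks that recur throughout the trace are never evicted, while the short-lived pairs only evict each other after their own burst has finished.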
10Temporal distribution based layout of code blocks
in the partitioned cache
- Selection of good candidates for holding cache lines exclusively
- Hot, dense and temporal distribution
[Figure: mapping of code blocks into the I-cache; two regions are labeled "Share cache lines"]
11Temporal distribution
- Temporal locality and temporal regularity
- Trace: ABCDEF(UV)^5 ABCDEF(PQ)^5 ABCDEF(XY)^5 ABCDEF
- A, B, C, D, E, F have good temporal regularity since they are uniformly distributed along the trace.
- U, V, P, Q, X, Y have good temporal locality since they exhibit a large skew in the reference distribution.
[Figure: our mapping, 18 misses in total; two regions are labeled "Share cache lines"]
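The difference between a uniform and a skewed reference distribution can be made visible by counting references per time window. The sketch below is only an illustration: the window length of 16 references and the `spread` measure (fraction of windows in which a block is referenced) are assumptions chosen for this example, not the metrics used by the authors.

```python
WINDOW = 16  # hypothetical window length, in block references

def build_trace():
    """Expand ABCDEF(UV)^5 ABCDEF(PQ)^5 ABCDEF(XY)^5 ABCDEF into a list of blocks."""
    trace = []
    for pair in ("UV", "PQ", "XY"):
        trace += list("ABCDEF") + list(pair) * 5
    return trace + list("ABCDEF")

def window_counts(trace, block, window=WINDOW):
    """Count how often `block` is referenced in each consecutive window."""
    n_windows = (len(trace) + window - 1) // window
    counts = [0] * n_windows
    for pos, b in enumerate(trace):
        if b == block:
            counts[pos // window] += 1
    return counts

def spread(counts):
    """Fraction of windows that contain at least one reference."""
    return sum(1 for c in counts if c > 0) / len(counts)

trace = build_trace()
for block in "AUPX":
    counts = window_counts(trace, block)
    print(block, counts, spread(counts))
```

Block A appears once in every window, whereas U, P and X each concentrate all of their references in a single window, which is the skew the slide refers to.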
12Quantification of temporal distribution
- Variance of reuse distance
- Weighted temporal distribution
13Iterative partition and layout
- Func Partition(RB, IRB)
- Sort nodes in RB by instruction density // highest instruction density first
- RB_SIZE = Calc_rb_size(RB)
- IRB_SIZE = Calc_irb_size(IRB)
- While (RB_SIZE + IRB_SIZE > CACHE_SIZE) {
- Adjust(RB, IRB)
- RB_SIZE = Calc_rb_size(RB)
- IRB_SIZE = Calc_irb_size(IRB)
- }
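The loop above can be made concrete in several ways; the following is one hedged sketch. The slide does not define RB, IRB, or the helpers, so everything here is an assumption: RB is treated as the set of blocks that hold cache lines exclusively, IRB as the blocks that share one region, `calc_rb_size` sums the block sizes, `calc_irb_size` takes the largest sharing block, and `adjust` demotes the least dense RB block into IRB. The `Block` type and the sample data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    size: int        # size in cache lines (assumed unit)
    density: float   # instruction density (assumed to come from a profile)

def calc_rb_size(rb):
    """Assumption: every RB block gets its own lines, so sizes add up."""
    return sum(b.size for b in rb)

def calc_irb_size(irb):
    """Assumption: IRB blocks share one region sized for the largest of them."""
    return max((b.size for b in irb), default=0)

def adjust(rb, irb):
    """Assumption: demote the least dense RB block (last after sorting) to IRB."""
    irb.append(rb.pop())

def partition(rb, irb, cache_size):
    rb.sort(key=lambda b: b.density, reverse=True)  # highest density first
    rb_size, irb_size = calc_rb_size(rb), calc_irb_size(irb)
    # Extra guard (not on the slide): stop when RB has nothing left to demote.
    while rb_size + irb_size > cache_size and rb:
        adjust(rb, irb)
        rb_size, irb_size = calc_rb_size(rb), calc_irb_size(irb)
    return rb, irb

rb = [Block("c", 3, 2.0), Block("a", 3, 9.0), Block("d", 2, 1.0), Block("b", 2, 7.0)]
irb = [Block("u", 2, 0.5), Block("v", 1, 0.4)]
rb, irb = partition(rb, irb, cache_size=8)
print([b.name for b in rb], [b.name for b in irb])
```

With these sample numbers the loop demotes the two least dense blocks and stops once the exclusive region plus the shared region fit into the 8-line cache.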
14Experiments and results (1)
Cumulative effect of optimizations on I-cache
miss reduction
15Experiments and results (2)
Reduction of I-cache misses by TD (temporal distribution), PH (Pettis and Hansen) and TRG (temporal relation graph).
16Experiments and results (3)
H264 codec I-cache miss reduction by TD, PH and
TRG with various inputs
17Future work
- Improve the current iterative partition algorithm
- Incorporate more cache configuration parameters into the layout algorithm, e.g. cache line size and L2 cache
- Develop an effective software cache partition method for multi-threaded programs on our memory hierarchy
18Thank You!