Title: Associative Caches in Formal Software Timing Analysis
1 Associative Caches in Formal Software Timing Analysis
- Fabian Wolf
- Volkswagen AG, Wolfsburg, Germany
- Jan Staschulat, Rolf Ernst
- Technical University of Braunschweig, Germany
2 Outline
- Introduction
- Running time analysis using program segments
- Local cache simulation
- Data flow based cache analysis
- Experiments
- Conclusion
3 Motivation
- The amount of software in embedded systems grows rapidly
- Many innovations in automotive systems are based on software functions
- Analysis of software running time
- Guarantees for hard real-time constraints (fuel mass calculation, ignition timing, ...)
- System performance and throughput (data transport, ...)
- Verification of non-functional software properties becomes essential in design automation
4 Introduction
- Software running time is input data dependent
- Process control flow
- Assembly instruction execution
- Software properties as behavioral intervals
- Running time
- Power consumption
- Communicated data
5 Software Running Time and Caches
- Influences on software running time
- Context switch time
- Communication time
- Cache behavior
- Core execution time
- Caches have a significant influence
- Always-hit assumptions are not conservative
- Always-miss assumptions significantly overestimate the process running time
- Safe cache analysis can decrease system cost
6 Timing Analysis by Simulation
- Running time is often determined by simulation
- Test pattern selection for the input data is
- unsafe (critical cases may be missed)
- complex (many unnecessary cases are simulated)
7 Formal Software Timing Analysis
- Conservative formal approaches overestimate the exact running time interval
- not critical (real-time guarantees still hold)
- but leads to expensive designs
- Goal: the overestimation must be minimized
- Formal analysis: separation of path analysis and architecture modeling
8 Path Analysis
- T, t, (d,) x are intervals; two ILPs are solved for x (lower and upper bound)
- Example: for(j=0; j<15; j++) if(j<3) a[j]=a[j+1];
- Running time: T = Σ_i t_i · x_i
- Structural constraints, e.g. (see the ILP sketch below)
- x4 = d3,4 = d4,5
- x3 = d3,5 + d4,5 = x5
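As an illustration, the path analysis ILP for this example loop might look as follows in Python with the PuLP library; the control-flow graph, the per-block times t_i and the loop/branch bounds are placeholder assumptions for this sketch, not values produced by the actual architecture modeling.

```python
import pulp

# Hypothetical worst-case block times t_i (cycles); in the real flow these
# come from local basic-block/segment simulation, not from this sketch.
t = {"BB1": 2, "BB2": 3, "BB3": 2, "BB4": 5, "BB5": 4}

# Assumed CFG of  for(j=0; j<15; j++) if(j<3) a[j]=a[j+1];
# BB1: init, BB2: loop test, BB3: if test, BB4: then-body, BB5: increment
edges = [("entry", "BB1"), ("BB1", "BB2"), ("BB2", "BB3"), ("BB2", "exit"),
         ("BB3", "BB4"), ("BB3", "BB5"), ("BB4", "BB5"), ("BB5", "BB2")]

prob = pulp.LpProblem("upper_bound", pulp.LpMaximize)
x = {b: pulp.LpVariable("x_" + b, lowBound=0, cat="Integer") for b in t}
d = {e: pulp.LpVariable("d_%s_%s" % e, lowBound=0, cat="Integer") for e in edges}

# Objective: T = sum_i t_i * x_i (maximized here; a second ILP with
# LpMinimize yields the lower bound of the interval)
prob += pulp.lpSum(t[b] * x[b] for b in t)

# Structural constraints: execution count = edge inflow = edge outflow
for b in t:
    prob += x[b] == pulp.lpSum(d[e] for e in edges if e[1] == b)
    prob += x[b] == pulp.lpSum(d[e] for e in edges if e[0] == b)
prob += pulp.lpSum(d[e] for e in edges if e[0] == "entry") == 1
prob += pulp.lpSum(d[e] for e in edges if e[1] == "exit") == 1

# Functional constraints from the loop bound and the data-independent branch
prob += x["BB2"] == 16   # loop test runs 16 times for 15 iterations
prob += x["BB4"] <= 3    # then-branch taken at most 3 times (j < 3)

prob.solve()
print("upper bound T =", pulp.value(prob.objective), "cycles")
```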
9 Architecture Modeling
- Architecture modeling on basic block (BB) level: local running time intervals t
- Source code tracing and instruction timing tables
- Cycle-true simulation
- Conservative overheads for the local basic block simulation must cover
- Register spills
- Pipeline stalls
- Cache misses
- because the basic block execution sequence is not considered (see the overhead sketch below)
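To make the effect of these overheads concrete, here is a minimal sketch with invented penalty values; they do not correspond to any particular processor model in the paper.

```python
# Hypothetical worst-case penalties in cycles; real values depend on the
# target processor and are assumptions made only for this sketch.
MISS_PENALTY, PIPELINE_STALL, REGISTER_SPILL = 30, 4, 6

def conservative_bb_interval(core_min, core_max, mem_accesses):
    """Widen a locally simulated basic-block interval [core_min, core_max].

    Because the block is simulated in isolation, the incoming cache and
    pipeline state is unknown: every memory access may miss, and the block
    entry may stall the pipeline or spill registers."""
    lower = core_min                                    # best case: all hits
    upper = (core_max
             + mem_accesses * MISS_PENALTY              # all accesses miss
             + PIPELINE_STALL + REGISTER_SPILL)         # unknown entry state
    return lower, upper

print(conservative_bb_interval(core_min=12, core_max=15, mem_accesses=5))
```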
10 Previous Formal Analysis Approaches
- Architecture modeling on basic block level: overhead for every basic block
- Puschner and Koza
- Park and Shaw
- Li and Malik
- Hergenhahn and Rosenstiel
- Ferdinand, Theiling and Wilhelm
- Stappert and Altenbernd
11 Analysis Precision: Basic Blocks
- Conservative overheads have to be added on top of the overestimation that is already unavoidable
- Goal: reduction of these overheads
- Idea: extension of basic blocks by considering predictable control flow
12 Extension of Basic Blocks
- Data-independent control flow (paths)
- Global solution: ILP on segments instead of basic blocks
13 Analysis Precision: Process Segments
- The consideration of basic block sequences
- improves analysis precision
- potentially reduces analysis problem size
- reduces functional constraint annotation
- Local compiler optimization is allowed
14 Cache Analysis: Related Work
- Caches have a significant impact on the process running time (large overheads)
- Formal cache analysis approaches determine overheads for basic blocks
- Li and Malik: cache state transition graph
- Difficult annotations from the designer are needed
- The ILP problem can get very complex
- No full consideration of basic block sequences
- Ferdinand et al.: abstract interpretation
- Healy et al.: local simulation of loop nests ...
15 Local Cache Simulation
- Process segments: local simulation
- Instruction cache: the segment address sequence is known; local simulation using first hit/miss
- Data cache: access addresses are needed
- hit/miss for unknown data accesses
- The access sequence in program segments often depends only on loops (a[j]=a[j+1])
- This single data sequence is covered by local simulation of process segments (first hit/miss)
- Goal: reduction of the first hit/miss assumptions and of the resulting overheads at segment beginnings (a simulation sketch follows below)
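The following is a minimal sketch of the local simulation idea, assuming a small LRU set-associative cache and a known instruction address sequence for one segment; the cache geometry and addresses are invented. The first access to each cache set, whose content at segment entry is unknown, stays classified as "first hit/miss"; all later accesses are resolved exactly.

```python
from collections import OrderedDict

# Placeholder cache geometry (bytes per line, number of sets, associativity);
# not the configuration used in the experiments.
LINE_SIZE, NUM_SETS, ASSOC = 16, 8, 2

def simulate_segment(addresses):
    """Classify each access of a segment's known address sequence."""
    sets = [OrderedDict() for _ in range(NUM_SETS)]     # per-set LRU state
    classification = []
    for addr in addresses:
        block = addr // LINE_SIZE
        index, tag = block % NUM_SETS, block // NUM_SETS
        lru = sets[index]
        first_touch = len(lru) == 0                     # set state unknown at entry
        if tag in lru:
            lru.move_to_end(tag)                        # LRU update on a hit
            outcome = "hit"
        else:
            if len(lru) >= ASSOC:
                lru.popitem(last=False)                 # evict least recently used
            lru[tag] = True
            outcome = "miss"
        classification.append("first hit/miss" if first_touch else outcome)
    return classification

# Example: instruction fetches of a small loop body executed twice
print(simulate_segment([0x100, 0x104, 0x108, 0x100, 0x104, 0x108]))
```

The "first hit/miss" entries are exactly the pessimistic assumptions that the global analysis of the following slides tries to remove.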
16 Global Cache Analysis
- No first miss for cache set CS1 in segment PrS2
- Reduction of the overheads for the start of the local cache simulation (not only first hit/miss)
- Global analysis on PrS with reduced overheads
17 Using DFA for Global Cache Analysis
- A definition of a cache set during PrS simulation is
- a priority change (miss) when reading the I-/D-cache
- every write to the D-cache
- The gen_PrS, kill_PrS, in_PrS and out_PrS sets can be defined
- The set of definitions leaving the PrS is composed of the definitions generated in the PrS plus the definitions entering the PrS that are not replaced: out_PrS = gen_PrS ∪ (in_PrS - kill_PrS)
- The in_PrS sets are defined as the intersections of the predecessor out_PrS sets: in_PrS = ∩ out_pred over all predecessors of the PrS
- Refined in_PrS sets reduce the overheads (see the fixed-point sketch below)
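The propagation can be sketched as a classic forward data-flow fixed point, iterating out_PrS = gen_PrS ∪ (in_PrS - kill_PrS) with in_PrS taken as the intersection of the predecessor out-sets; the segment graph and the gen/kill contents below are invented for illustration.

```python
# Invented segment graph and gen/kill sets of cache-set definitions,
# used only to illustrate the propagation.
preds = {"PrS1": [], "PrS2": ["PrS1"], "PrS3": ["PrS1"], "PrS4": ["PrS2", "PrS3"]}
gen   = {"PrS1": {("CS1", "a")}, "PrS2": {("CS2", "b")},
         "PrS3": {("CS2", "b"), ("CS1", "c")}, "PrS4": set()}
kill  = {"PrS1": set(), "PrS2": set(), "PrS3": {("CS1", "a")}, "PrS4": set()}

ALL_DEFS = set().union(*gen.values(), *kill.values())

def propagate(preds, gen, kill):
    """Iterate out = gen | (in - kill), in = intersection of predecessor outs,
    until a fixed point is reached; intersecting keeps only definitions that
    hold on every incoming path, so the result stays safe."""
    inset = {s: set() for s in preds}
    # must-analysis: non-entry segments start at the full set (top element)
    outset = {s: set(gen[s]) if not preds[s] else set(ALL_DEFS) for s in preds}
    changed = True
    while changed:
        changed = False
        for s in preds:
            new_in = set(ALL_DEFS)
            for p in preds[s]:
                new_in &= outset[p]
            if not preds[s]:
                new_in = set()                 # nothing enters an entry segment
            new_out = gen[s] | (new_in - kill[s])
            if new_in != inset[s] or new_out != outset[s]:
                inset[s], outset[s] = new_in, new_out
                changed = True
    return inset, outset

inset, outset = propagate(preds, gen, kill)
print(inset["PrS4"])   # definitions guaranteed to reach PrS4 on every path
```

A definition that is guaranteed to reach a segment on every path means the corresponding first hit/miss assumption at that segment's start can be dropped, which is how refined in-sets reduce the overheads.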
18 Experiments
- SYMTA tool suite: implementation of the concepts
- Experiment
- Exact bounds as a reference
- Analysis on the basic block level
- Consideration of program segments
- Consideration of set definition propagation (SDP)
- Target: StrongARM SA-110, GNU compiler
- Segment/basic block simulation in one file and in isolated files
- Consideration of compiler optimization (-O1)
19 Result I
Intervals for isolated PrS, no optimization; all times in ms

Benchmark   Exact            ILP on BB        lb   ILP on Seg.      ILP on SDP
arrcalc     [19.45, 20.37]   [2.305, 206.9]   4    [3.339, 29.92]   [9.200, 28.93]
chkdata     [15.62, 20.72]   [0.582, 226.0]   2    [1.233, 152.2]   [9.039, 39.82]
bsort       [58.69, 104.6]   [3.484, 3046]    2    [8.696, 1316]    [15.09, 846.2]
circle      [47.96, 151.1]   [4.269, 622.1]   1    [4.287, 154.4]   [5.962, 153.5]
FIRfilter   [72.15, 100.0]   [38.53, 2566]    4    [42.99, 158.9]   [60.17, 136.5]
countsort   [38.10, 41.47]   [15.77, 1079]    2    [16.28, 475.9]   [29.50, 290.5]
exchsort    [43.18, 43.96]   [17.46, 1164]    2    [19.40, 237.9]   [30.51, 49.34]
20 Result I
21 Result II
22 Result III
23 Result IV
24 Conclusion
- The extension from basic blocks to program segments improves the precision of formal running time analysis
- The combination of set definition propagation and local simulation improves instruction and data cache analysis precision
- The approach can be applied to a variety of target architectures because path analysis, cache analysis and processor modeling are decoupled