Associative Caches in Formal Software Timing Analysis - PowerPoint PPT Presentation

1
Associative Caches in Formal Software Timing
Analysis
  • Fabian Wolf
  • Volkswagen AG, Wolfsburg, Germany
  • Jan Staschulat, Rolf Ernst
  • Technical University of Braunschweig,
    Germany

2
Outline
  • Introduction
  • Running time analysis using program segments
  • Local cache simulation
  • Data flow based cache analysis
  • Experiments
  • Conclusion

3
Motivation
  • The amount of software in embedded systems grows
    rapidly
  • Many innovations in automotive systems are based
    on software functions
  • Analysis of software running time
  • Guarantees for hard real-time constraints
    (Fuel mass calculation, Ignition timing, ...)
  • System performance and throughput
    (Data transport, ...)
  • Verification of non-functional software
    properties becomes essential in design automation

4
Introduction
  • Software running time is input data dependent
  • Process control flow
  • Assembly instruction execution
  • Software properties: behavioral intervals
  • Running time
  • Power consumption
  • Communicated data

5
Software Running Time and Caches
  • Influences on software running time
  • Context switch time
  • Communication time
  • Cache behavior
  • Core execution time
  • Caches have a significant influence
  • Always hit assumptions are not conservative
  • Always miss assumptions significantly
    overestimate the process running time
  • Safe cache analysis can decrease system cost
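The effect of the two extreme assumptions can be sketched with a little arithmetic. This is only an illustration: the function name, the cycle counts and the miss penalty are invented for the example and do not come from the presentation.

```python
def running_time_bounds(core_cycles, accesses, miss_penalty,
                        proven_hits=0, proven_misses=0):
    """Return a conservative (lower, upper) running time in cycles.

    Accesses that the analysis could not classify are counted as hits
    for the lower bound and as misses for the upper bound.
    """
    lower = core_cycles + proven_misses * miss_penalty
    upper = core_cycles + (accesses - proven_hits) * miss_penalty
    return lower, upper


# Hypothetical process: 1000 core cycles, 200 cache accesses, 10-cycle miss penalty.
no_cache_analysis = running_time_bounds(1000, 200, 10)
with_cache_analysis = running_time_bounds(1000, 200, 10,
                                          proven_hits=180, proven_misses=5)
```

Without any classification the upper bound is the always-miss bound; every access that a safe analysis proves to be a hit removes one miss penalty from it, and every proven miss raises the lower bound.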

6
Timing Analysis by Simulation
  • Running time is often determined by simulation
  • Test pattern selection for the input data is
  • unsafe (critical cases)
  • complex (unnecessary cases)

7
Formal Software Timing Analysis
  • Conservative formal approaches overestimate the
    exact running time interval
  • not critical (real-time guarantees)
  • expensive designs
  • Goal: overestimation must be minimized
  • Formal analysis: separation of path analysis and
    architecture modeling

8
Path Analysis
  • T, t, (d,) x are intervals: solving two ILPs for x

for (j = 0; j < 15; j++) if (j < 3) a[j] = a[j+1];
  • Running time: $T = \sum_i t_i x_i$
  • Structural constraints
  • $x_4 = d_{3,4} = d_{4,5}$
  • $x_3 = d_{3,5} + d_{4,5} = x_5$
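The path analysis on this slide can be sketched as follows. The local timing intervals are invented for illustration, and the block numbering (3 = condition, 4 = body of the if, 5 = loop increment) is an assumption read off the structural constraints. Because every coefficient is non-negative, the two ILPs (one minimizing, one maximizing T) are solved here by inspection rather than with an ILP solver.

```python
LOOP_ITERATIONS = 15  # loop bound of the for loop on this slide

# Hypothetical local running-time intervals t_i = (lower, upper) in cycles.
t = {
    3: (2, 3),  # block 3: evaluate the if condition
    4: (4, 9),  # block 4: body of the if
    5: (1, 2),  # block 5: loop increment and branch
}


def running_time_interval(x4_range):
    """Return (T_min, T_max) for T = sum_i t_i * x_i.

    Structural constraints fix x_3 = x_5 = LOOP_ITERATIONS, while
    x_4 = d_{3,4} may lie anywhere in x4_range.
    """
    x4_lo, x4_hi = x4_range
    x3 = x5 = LOOP_ITERATIONS
    t_min = t[3][0] * x3 + t[4][0] * x4_lo + t[5][0] * x5
    t_max = t[3][1] * x3 + t[4][1] * x4_hi + t[5][1] * x5
    return t_min, t_max


# Structural constraints only: the body may run 0 to 15 times.
structural_only = running_time_interval((0, LOOP_ITERATIONS))
# Added functional constraint: j < 3 holds for exactly 3 iterations.
with_functional = running_time_interval((3, 3))
```

With these made-up numbers the functional constraint narrows the interval from both sides, which is the kind of annotation the path analysis relies on for data-dependent branches.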

9
Architecture Modeling
  • Architecture modeling on basic block (BB) level:
    local running time intervals t
  • Source code tracing and instruction timing tables
  • Cycle-true simulation
  • Conservative overheads for local basic block
    simulation must cover
  • Register spills
  • Pipeline stalls
  • Cache misses
  • because the basic block execution sequence is
    not considered

10
Previous Formal Analysis Approaches
  • Architecture modeling on basic block level:
    overhead for every basic block
  • Puschner and Koza
  • Park and Shaw
  • Li and Malik
  • Hergenhahn and Rosenstiel
  • Ferdinand, Theiling and Wilhelm
  • Stappert and Altenbernd

11
Analysis Precision: Basic Blocks
  • Conservative overheads need to be added to the
    necessary overestimations
  • Goal: reduction of the overheads
  • Idea: extension of basic blocks considering
    predictable control flow

12
Extension of Basic Blocks
  • Data independent control flow (paths)
  • Global solution: ILP on segments instead of basic
    blocks

13
Analysis Precision: Process Segments
  • Overheads can be reduced
  • The consideration of basic block sequences
  • improves analysis precision
  • potentially reduces analysis problem size
  • reduces functional constraint annotation
  • Local compiler optimization is allowed

14
Cache Analysis: Related Work
  • Caches have a significant impact on the process
    running time (large overheads)
  • Formal cache analysis approaches determine
    overheads for basic blocks
  • Li and Malik: cache state transition graph
  • Difficult annotations from the designer are
    needed
  • The ILP problem can get very complex
  • No full consideration of basic block sequences
  • Ferdinand et al.: abstract interpretation
  • Healy et al.: local simulation of loop nests ...

15
Local Cache Simulation
  • Process segments: local simulation
  • Instruction cache: segment address sequence is
    known, local simulation using first hit/miss
  • Data cache: access addresses are needed
  • hit/miss for unknown data accesses
  • The access sequence in program segments often
    depends only on loops (a[j] = a[j+1])
  • This single data sequence is covered by local
    simulation of process segments (first hit/miss)
  • Goal: reduction of first hit/miss assumptions and
    the resulting overheads for segment beginnings
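Local simulation of one segment can be sketched as below, assuming an LRU set-associative cache whose content at segment entry is unknown. The function name, the default cache geometry and the address sequence are assumptions made for the example, not taken from the presentation or from the SYMTA tool suite. An access is labelled "first" when its outcome depends on the unknown entry state, so it has to be counted as a hit for the lower bound and as a miss for the upper bound.

```python
from collections import OrderedDict


def simulate_segment(addresses, num_sets=2, ways=2, line_size=16):
    """Classify each access of a segment as 'hit', 'miss' or 'first'.

    Each cache set keeps the lines touched inside the segment in LRU
    order. With LRU replacement, a line not among the last `ways`
    distinct lines of its set is certainly evicted, whatever the set
    held at segment entry.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    outcomes = []
    for addr in addresses:
        line = addr // line_size
        lru = sets[line % num_sets]
        if line in lru:
            outcomes.append("hit")      # touched recently enough: certain hit
            lru.move_to_end(line)
        elif len(lru) == ways:
            outcomes.append("miss")     # set fully overwritten: certain miss
            lru.popitem(last=False)     # evict the least recently used line
            lru[line] = True
        else:
            outcomes.append("first")    # depends on the unknown entry state
            lru[line] = True
    return outcomes


# Hypothetical address sequence mapped to a single 2-way set.
example = simulate_segment([0, 16, 0, 32, 16], num_sets=1, ways=2)
```

Only the "first" accesses carry an overhead for the segment beginning, so these are the ones a global analysis across segments can try to resolve.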

16
Global Cache Analysis
  • No first miss for cache set CS1 in segment PrS2
  • Reduction of the overheads for the start of local
    cache simulation (not only first hit/miss)
  • Global analysis on PrS with reduced overheads

17
Using DFA for Global Cache Analysis
  • A definition of a cache set during PrS simulation
    is
  • a priority change (miss) when reading the
    I-/D-cache
  • every write to the D-cache
  • The sets genset_PrS, killset_PrS, inset_PrS and
    outset_PrS can be defined
  • The set of definitions leaving the PrS is
    composed of the definitions made in the PrS
    plus the definitions entering the PrS that
    are not replaced
  • The inset_PrS sets are defined from the
    intersections of the predecessor outset_PrS sets
  • Refined inset_PrS sets reduce overheads
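The propagation rules above correspond to a standard forward data flow iteration, which might be sketched as follows. The segment graph, the definition labels such as "CS1:A" (line A placed in cache set CS1) and the function name are hypothetical, and how gen and kill sets are derived for an associative cache is left out; only the equations out = gen ∪ (in − kill) and in = ∩ predecessor outs are shown.

```python
def propagate(segments, preds, gen, kill):
    """Iterate the data flow equations to a fixed point.

    outset[s] = gen[s] | (inset[s] - kill[s])
    inset[s]  = intersection of outset[p] over all predecessors p
    """
    universe = set().union(*gen.values())
    inset = {s: set() for s in segments}
    # Optimistic start, as usual when the join is an intersection.
    outset = {s: set(universe) for s in segments}
    changed = True
    while changed:
        changed = False
        for s in segments:
            if preds[s]:
                new_in = set.intersection(*(outset[p] for p in preds[s]))
            else:
                new_in = set()  # entry segment: cache content unknown
            new_out = gen[s] | (new_in - kill[s])
            if new_in != inset[s] or new_out != outset[s]:
                inset[s], outset[s] = new_in, new_out
                changed = True
    return inset, outset


# Hypothetical graph: PrS1 branches to PrS2 and PrS3, which join in PrS4.
segments = ["PrS1", "PrS2", "PrS3", "PrS4"]
preds = {"PrS1": [], "PrS2": ["PrS1"], "PrS3": ["PrS1"],
         "PrS4": ["PrS2", "PrS3"]}
gen = {"PrS1": {"CS1:A"}, "PrS2": {"CS2:B"}, "PrS3": {"CS1:C"}, "PrS4": set()}
kill = {"PrS1": set(), "PrS2": set(), "PrS3": {"CS1:A"}, "PrS4": set()}

inset, outset = propagate(segments, preds, gen, kill)
```

In this example the definition of CS1 made in PrS1 reaches PrS2, so PrS2 needs no first-miss assumption for that line, whereas nothing survives the intersection at PrS4 because PrS3 replaces it.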

18
Experiments
  • SYMTA tool suite: implementation of the concepts
  • Experiment
  • Exact bounds as a reference
  • Analysis on the basic block level
  • Consideration of program segments
  • Consideration of set definition propagation (SDP)
  • Target: StrongARM SA-110, GNU compiler
  • Segment/basic block simulation in one file and
    isolated files
  • Consideration of compiler optimization -O1

19
Result I
Intervals for isolated PrS, no optimization, in ms

Benchmark  Exact           ILP on BB       lb  ILP on Seg.     ILP on SDP
arrcalc    [19.45, 20.37]  [2.305, 206.9]  4   [3.339, 29.92]  [9.200, 28.93]
chkdata    [15.62, 20.72]  [0.582, 226.0]  2   [1.233, 152.2]  [9.039, 39.82]
bsort      [58.69, 104.6]  [3.484, 3046]   2   [8.696, 1316]   [15.09, 846.2]
circle     [47.96, 151.1]  [4.269, 622.1]  1   [4.287, 154.4]  [5.962, 153.5]
FIRfilter  [72.15, 100.0]  [38.53, 2566]   4   [42.99, 158.9]  [60.17, 136.5]
countsort  [38.10, 41.47]  [15.77, 1079]   2   [16.28, 475.9]  [29.50, 290.5]
exchsort   [43.18, 43.96]  [17.46, 1164]   2   [19.40, 237.9]  [30.51, 49.34]
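As a worked reading of the arrcalc row, the sketch below checks that each analysis interval encloses the exact interval and compares the upper bounds. The interval values are those listed for arrcalc; the helper names are made up for the example.

```python
# arrcalc intervals from the results above (lower, upper), in the slide's time unit.
exact = (19.45, 20.37)
bounds = {
    "ILP on BB": (2.305, 206.9),
    "ILP on Seg.": (3.339, 29.92),
    "ILP on SDP": (9.200, 28.93),
}


def is_conservative(bound, reference):
    """True if the analysis interval encloses the reference interval."""
    return bound[0] <= reference[0] and reference[1] <= bound[1]


def upper_overestimation(bound, reference):
    """Factor by which the analysed upper bound exceeds the exact one."""
    return bound[1] / reference[1]


conservative = {name: is_conservative(b, exact) for name, b in bounds.items()}
factors = {name: upper_overestimation(b, exact) for name, b in bounds.items()}
```

For this benchmark all three intervals are conservative, and the upper-bound factor shrinks from roughly ten on the basic block level to below 1.5 for program segments and SDP.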

20
Result I

21
Result II

22
Result III

23
Result IV

24
Conclusion
  • The extension from basic blocks to program
    segments improves formal running time analysis
    precision
  • The combination of set definition propagation and
    local simulation improves instruction and data
    cache analysis precision
  • The approach can be applied using a variety of
    target architectures because of decoupled path
    analysis, cache analysis and processor modeling

25
  • Thank you!