Associative Caches in Formal Software Timing Analysis - PowerPoint PPT Presentation


About This Presentation

Title:

Associative Caches in Formal Software Timing Analysis

Description:

Volkswagen AG, Wolfsburg, Germany. Jan Staschulat, Rolf Ernst ... The amount of software in embedded systems grows rapidly. Many innovations in automotive ...


Slides: 26


Transcript and Presenter's Notes


1
Associative Caches in Formal Software Timing
Analysis

Fabian Wolf
Volkswagen AG, Wolfsburg, Germany
Jan Staschulat, Rolf Ernst
Technical University of Braunschweig,
Germany

2
Outline

Introduction
Running time analysis using program segments
Local cache simulation
Data flow based cache analysis
Experiments
Conclusion

3
Motivation

The amount of software in embedded systems grows
rapidly
Many innovations in automotive systems are based
on software functions
Analysis of software running time
Guarantees for hard real-time constraints
(Fuel mass calculation, Ignition timing, ...)
System performance and throughput
(Data transport, ...)
Verification of non-functional software
properties becomes essential in design automation

4
Introduction

Software running time is input data dependent
Process control flow
Assembly instruction execution
Software properties as behavioral intervals
Running time
Power consumption
Communicated data

5
Software Running Time and Caches

Influences on software running time
Context switch time
Communication time
Cache behavior
Core execution time

Caches have a significant influence
Always hit assumptions are not conservative
Always miss assumptions significantly
overestimate the process running time
Safe cache analysis can decrease system cost
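The two extreme assumptions can be made concrete with a little arithmetic. This is a hedged sketch: the function name `cache_time` and all cycle counts are hypothetical and are not taken from the slides. The sketch only shows why an always-hit estimate is unsafe and why an always-miss estimate is pessimistic.

```python
def cache_time(core_cycles, accesses, misses, hit_cycles, miss_cycles):
    """Cycles for one path: core execution plus its memory accesses."""
    hits = accesses - misses
    return core_cycles + hits * hit_cycles + misses * miss_cycles


# Hypothetical path: 100 core cycles, 20 memory accesses,
# 1 cycle per hit, 30 cycles per miss.
always_hit = cache_time(100, 20, 0, 1, 30)    # optimistic, not a safe bound
always_miss = cache_time(100, 20, 20, 1, 30)  # safe, but heavily pessimistic
actual = cache_time(100, 20, 4, 1, 30)        # if 4 accesses really miss
print(always_hit, actual, always_miss)        # 120 236 700
```

Any real miss pushes the actual time above the always-hit value. The gap to the always-miss value is the overestimation that a safe cache analysis tries to remove.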

6
Timing Analysis by Simulation

Running time is often determined by simulation

Test pattern selection for the input data is
unsafe (critical cases may be missed)
complex (many unnecessary cases)

7
Formal Software Timing Analysis

Conservative formal approaches overestimate the
exact running time interval
Not critical for real-time guarantees
But it leads to expensive designs

Goal: overestimation must be minimized
Formal analysis: separation of path analysis and
architecture modeling

8
Path Analysis

T, t, (d,) x are intervals; two ILPs are solved for x

for (j = 0; j < 15; j++) if (j < 3) a[j] = a[j] + 1;

Running time $T = \sum_i t_i x_i$
Structural constraints
$x_4 = d_{3,4} = d_{4,5}$
$x_3 = d_{3,5} + d_{4,5} = x_5$
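A brute-force stand-in for the two ILPs makes the interval computation concrete for the loop on this slide. This is a sketch under several assumptions:

- BB3 is the `if` test, BB4 is the assignment, and BB5 is the loop increment.
- BB3 runs once per iteration, so it executes 15 times.
- The local intervals t_i are hypothetical cycle counts.
- The names `Interval`, `T_LOCAL` and `path_bounds` are illustrative.

Enumerating the branch count replaces an ILP solver, which is only viable for a tiny example like this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi], e.g. a running time bound in cycles."""
    lo: float
    hi: float


# Hypothetical local running-time intervals t_i (cycles) for BB3..BB5.
T_LOCAL = {3: Interval(2, 3), 4: Interval(5, 9), 5: Interval(1, 2)}


def path_bounds(iterations, taken_choices):
    """Stand-in for the two ILPs: minimize and maximize T = sum t_i * x_i.

    Each candidate d34 (times the if-branch is taken) fixes the counts via
    x4 = d34 = d45 and x3 = d35 + d45 = x5, with x3 = iterations.
    """
    t_min = t_max = None
    for d34 in taken_choices:
        d45 = d34
        d35 = iterations - d45
        if d35 < 0:
            continue  # violates the structural constraints
        x = {3: d35 + d45, 4: d34, 5: d35 + d45}
        lo = sum(T_LOCAL[i].lo * x[i] for i in x)  # lower-bound objective
        hi = sum(T_LOCAL[i].hi * x[i] for i in x)  # upper-bound objective
        t_min = lo if t_min is None else min(t_min, lo)
        t_max = hi if t_max is None else max(t_max, hi)
    if t_min is None:
        raise ValueError("no feasible execution counts")
    return Interval(t_min, t_max)


# Structural constraints only: the branch may be taken 0..15 times.
print(path_bounds(15, range(16)))  # Interval(lo=45, hi=210)
# Functional constraint from if (j < 3): the branch is taken exactly 3 times.
print(path_bounds(15, [3]))        # Interval(lo=60, hi=102)
```

Without the functional constraint, the bound is [45, 210]. Adding the constraint tightens it to [60, 102], which shows how much designer annotations matter at the basic block level.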

9
Architecture Modeling

Architecture modeling on basic block (BB) level
Local running time intervals t
Source code tracing and instruction timing tables
Cycle-true simulation
Conservative overheads for local basic block
simulation must cover
Register spills
Pipeline stalls
Cache misses
because the basic block execution sequence is
not considered

10
Previous Formal Analysis Approaches

Architecture modeling on basic block level
Overhead for every basic block

Puschner and Koza
Park and Shaw
Li and Malik
Hergenhahn and Rosenstiel
Ferdinand, Theiling and Wilhelm
Stappert and Altenbernd

11
Analysis Precision Basic Blocks

Conservative overheads need to be added to the
necessary overestimations

Goal: reduction of the overheads
Idea: extension of basic blocks considering
predictable control flow

12
Extension of Basic Blocks

Data-independent control flow (paths)

Global solution: ILP on segments instead of basic
blocks

13
Analysis Precision Process Segments

Overheads can be reduced

The consideration of basic block sequences
improves analysis precision
potentially reduces analysis problem size
reduces functional constraint annotation
Local compiler optimization is allowed
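A toy accounting model shows why segments reduce the conservative overheads. This is a hedged sketch, not the SYMTA model. It makes these assumptions:

- Every locally simulated unit pays one fixed overhead at its start.
- A segment's simulated time equals the sum of its basic block times. Real sequence simulation can be lower, for example through pipeline overlap.
- All cycle counts are hypothetical.

```python
def upper_bound_per_bb(bb_cycles, overhead):
    """Each basic block is simulated in isolation, so every block is
    padded with the conservative overhead (stalls, spills, cache misses)."""
    return sum(cycles + overhead for cycles in bb_cycles)


def upper_bound_per_segment(bb_cycles, overhead):
    """The fixed basic block sequence is simulated as one segment, so the
    overhead is paid only once, at the segment start (assumes the segment
    time equals the sum of the block times)."""
    return sum(bb_cycles) + overhead


blocks = [4, 6, 3]  # hypothetical cycles of three BBs on one segment path
print(upper_bound_per_bb(blocks, 10))       # 43
print(upper_bound_per_segment(blocks, 10))  # 23
```

The saving grows with the number of basic blocks that a data-independent segment merges and with how often the segment executes.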

14
Cache Analysis Related Work

Caches have a significant impact on the process
running time (large overheads)
Formal cache analysis approaches determine
overheads for basic blocks
Li and Malik: cache state transition graph
Difficult annotations from the designer are
needed
The ILP problem can get very complex
No full consideration of basic block sequences
Ferdinand et al.: abstract interpretation
Healy et al.: local simulation of loop nests ...

15
Local Cache Simulation

Process segments: local simulation
Instruction cache: the segment address sequence is
known; local simulation with first hit/miss
Data cache: access addresses are needed;
hit/miss is assumed for unknown data accesses
The access sequence in program segments often
depends only on loops (a[j] = a[j] + 1)
This single data sequence is covered by local
simulation of process segments (first hit/miss)
Goal: reduction of first hit/miss assumptions and
the resulting overheads at segment beginnings
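Local simulation of one segment can be sketched with a small set-associative cache model that uses LRU replacement. This is a hedged illustration, not the SYMTA implementation. It makes these assumptions:

- The cache geometry is 4 sets, 2 ways and 16-byte lines.
- Replacement is LRU.
- The names `LruCache` and `simulate_segment` are illustrative.

Starting from an empty cache models the pessimistic (miss) side of the first hit/miss assumption at the segment beginning: every first reference to a line counts as a miss.

```python
from collections import OrderedDict


class LruCache:
    """Set-associative cache with LRU replacement (toy model)."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set: keys are tags, order is LRU -> MRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss, and update the LRU order."""
        block = addr // self.line_size
        lines = self.sets[block % self.num_sets]
        tag = block // self.num_sets
        if tag in lines:
            lines.move_to_end(tag)  # becomes most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = None
        return False


def simulate_segment(addresses, cache):
    """Simulate one segment's address sequence; return (hits, misses)."""
    hits = misses = 0
    for addr in addresses:
        if cache.access(addr):
            hits += 1
        else:
            misses += 1
    return hits, misses
```

Consider a read walk over 15 four-byte elements starting at 0x100. `simulate_segment([0x100 + 4 * j for j in range(15)], LruCache(4, 2, 16))` returns `(11, 4)`, one miss per 16-byte line touched.

Now consider three lines that map to the same set and are reused cyclically, `[0, 64, 128, 0, 64, 128]`. In a 2-way LRU set they miss every time. An associative cache analysis has to capture this kind of conflict behavior.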

16
Global Cache Analysis

No first miss for cache set CS1 in segment PrS2
Reduction of the overheads for the start of local
cache simulation (not only first hit/miss)
Global analysis on PrS with reduced overheads

17
Using DFA for Global Cache Analysis

A definition of a cache set during PrS simulation
is:
a priority change (miss) when reading the
I-/D-cache
every write to the D-cache
The gen(PrS), kill(PrS), in(PrS) and out(PrS)
sets can be defined
The set of definitions leaving the PrS consists of
the set of definitions in the PrS plus the set of
definitions entering the PrS that are not replaced:
$out(PrS) = gen(PrS) \cup (in(PrS) \setminus kill(PrS))$
The in(PrS) set is the intersection of the out
sets of all predecessor PrS:
$in(PrS) = \bigcap_{P \in pred(PrS)} out(P)$
Refined in(PrS) sets reduce overheads
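Set definition propagation can be sketched as a standard forward must-analysis, using intersection at joins, solved by fixpoint iteration. This is a hedged sketch under these assumptions:

- A definition is modeled as a hypothetical (cache set, memory block) pair.
- The kill sets are supplied by hand. For an associative cache they would have to follow from the replacement policy, which this sketch does not model.
- The name `cache_set_dfa`, the segments PrS1 to PrS4 and their sets are illustrative.

```python
def cache_set_dfa(succ, gen, kill, entry):
    """Forward must-analysis of cache set definitions over process segments.

    out(P) = gen(P) | (in(P) - kill(P))
    in(P)  = intersection of out(Q) over all predecessors Q of P;
             empty for the entry segment (unknown cache state).
    """
    preds = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            preds[t].append(n)
    universe = set().union(*gen.values())
    # Optimistic start for the out sets, as usual for a must-analysis.
    out = {n: set(universe) for n in succ}
    in_ = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == entry or not preds[n]:
                new_in = set()
            else:
                new_in = set.intersection(*(out[p] for p in preds[n]))
            new_out = gen[n] | (new_in - kill[n])
            if new_in != in_[n] or new_out != out[n]:
                in_[n], out[n] = new_in, new_out
                changed = True
    return in_, out


# Hypothetical diamond: PrS1 branches to PrS2 or PrS3, both join in PrS4.
succ = {"PrS1": ["PrS2", "PrS3"], "PrS2": ["PrS4"], "PrS3": ["PrS4"], "PrS4": []}
gen = {"PrS1": {("CS1", "A"), ("CS2", "B")}, "PrS2": {("CS2", "C")},
       "PrS3": set(), "PrS4": set()}
kill = {"PrS1": set(), "PrS2": {("CS2", "B")}, "PrS3": set(), "PrS4": set()}
in_sets, out_sets = cache_set_dfa(succ, gen, kill, "PrS1")
print(in_sets["PrS4"])  # {('CS1', 'A')}
```

Only `("CS1", "A")` reaches PrS4 on both incoming paths. The local simulation of PrS4 therefore need not assume a first miss for block A in cache set CS1, which is the overhead reduction that refined in sets aim for.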

18
Experiments

SYMTA tool suite: implementation of the concepts
Experiment
Exact bounds as a reference
Analysis on the basic block level
Consideration of program segments
Consideration of set definition propagation (SDP)
Target: StrongARM SA-110, GNU compiler
Segment/basic block simulation in one file and
isolated files
Consideration of compiler optimization (-O1)

19
Result I
Intervals for isolated PrS, no optimization, in ms

Benchmark   Exact           ILP on BB       ILP on Seg.     ILP on SDP
arrcalc     [19.45, 20.37]  [2.305, 206.9]  [3.339, 29.92]  [9.200, 28.93]
chkdata     [15.62, 20.72]  [0.582, 226.0]  [1.233, 152.2]  [9.039, 39.82]
bsort       [58.69, 104.6]  [3.484, 3046]   [8.696, 1316]   [15.09, 846.2]
circle      [47.96, 151.1]  [4.269, 622.1]  [4.287, 154.4]  [5.962, 153.5]
FIRfilter   [72.15, 100.0]  [38.53, 2566]   [42.99, 158.9]  [60.17, 136.5]
countsort   [38.10, 41.47]  [15.77, 1079]   [16.28, 475.9]  [29.50, 290.5]
exchsort    [43.18, 43.96]  [17.46, 1164]   [19.40, 237.9]  [30.51, 49.34]
20
Result I
21
Result II
22
Result III
23
Result IV
24
Conclusion

The extension from basic blocks to program
segments improves formal running time analysis
precision
The combination of set definition propagation and
local simulation improves instruction and data
cache analysis precision
The approach can be applied to a variety of
target architectures because path analysis, cache
analysis and processor modeling are decoupled