Title: Fault Tolerant Design of Distributed Automotive Systems
Claudio Pinello (pinello_at_eecs.berkeley.edu),
Prof. Sangiovanni-Vincentelli, UC Berkeley
Design flow
Architecture
Programming model: Fault-Tolerant Dataflow (FTDF)
Fault Behavior
- Connectivity
  - bipartite graph Arch
    - ECUs (Electronic Control Units)
    - channels
  - Actuator/Sensor location
- Performance
  - matrix of actor/ECU execution times
  - matrix of data/channel transmission times
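The architecture description above can be sketched as a small data structure. This is only an illustration: the `Arch` class, the topology (every ECU attached to both channels), the actor and data names, and all timing values are assumptions, not taken from the poster or from any tool it mentions.

```python
from dataclasses import dataclass, field


@dataclass
class Arch:
    """Bipartite architecture graph: ECUs on one side, channels on the other."""
    ecus: set = field(default_factory=set)
    channels: set = field(default_factory=set)
    links: set = field(default_factory=set)  # (ecu, channel) edges

    def connect(self, ecu, channel):
        """Attach an ECU to a channel, creating either node if needed."""
        self.ecus.add(ecu)
        self.channels.add(channel)
        self.links.add((ecu, channel))

    def shared_channels(self, ecu_a, ecu_b):
        """Channels over which two ECUs can exchange data directly."""
        return {ch for ch in self.channels
                if (ecu_a, ch) in self.links and (ecu_b, ch) in self.links}


# Hypothetical topology: three ECUs, each attached to both channels.
arch = Arch()
for ecu in ("ECU0", "ECU1", "ECU2"):
    for channel in ("CH0", "CH1"):
        arch.connect(ecu, channel)

# Hypothetical performance matrices (time units are arbitrary).
# A missing entry means the actor cannot run on that ECU,
# e.g. because the sensor or actuator it needs is located elsewhere.
exec_time = {
    ("sens", "ECU0"): 1.0,
    ("coarse_ctrl", "ECU0"): 2.0,
    ("coarse_ctrl", "ECU1"): 2.5,
    ("fine_ctrl", "ECU2"): 6.0,
}
tx_time = {
    ("sens_data", "CH0"): 0.5,
    ("sens_data", "CH1"): 0.8,
}


def candidate_ecus(actor):
    """ECUs on which the actor has a defined execution time."""
    return sorted(ecu for (a, ecu) in exec_time if a == actor)
```

Storing the matrices sparsely, keyed by (actor, ECU) and (data, channel), lets sensor and actuator location constraints appear simply as missing entries.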
- Actors have criticality; inputs may have fan-in from redundant sources (replicas)
- Execution is synchronous and periodic: at each period all tasks are executed (data-driven or time-triggered), satisfying precedence constraints
- Inputs and Arbiters have partial firing rules
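A partial firing rule can be read as: the actor may fire when only a subset of its redundant inputs has arrived. The sketch below illustrates that reading for an arbiter that forwards the best available result. The function name `arbiter_best`, the source names, and the use of `None` for a missing token are assumptions made for illustration, not the FTDF library's API.

```python
def arbiter_best(inputs, preference):
    """Select one value from redundant inputs under a partial firing rule.

    inputs: dict mapping source name -> value, or None if that source
            delivered nothing in this period (e.g. its ECU failed).
    preference: source names ordered from most to least preferred.
    Returns the value of the most preferred available source, or None
    if no preferred source delivered a value (the arbiter does not fire).
    """
    for source in preference:
        value = inputs.get(source)
        if value is not None:
            return value
    return None


# Hypothetical sources: a fine controller preferred over a coarse one.
PREFERENCE = ["fine_ctrl", "coarse_ctrl"]

both_alive = arbiter_best({"fine_ctrl": 1.50, "coarse_ctrl": 1.00}, PREFERENCE)
fine_lost = arbiter_best({"fine_ctrl": None, "coarse_ctrl": 1.00}, PREFERENCE)
all_lost = arbiter_best({"fine_ctrl": None, "coarse_ctrl": None}, PREFERENCE)
```

With both sources alive the arbiter forwards the fine result; when the fine controller is lost it still fires on the coarse result alone, which is the degraded mode that the partial rule makes possible.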
- Failure patterns Pi ⊆ Arch
  - subsets of the Arch graph that may fail simultaneously (in the same iteration)
- For each Pi, specify which functionalities must be guaranteed
  - typically, functionality is chosen based on criticality
- Sample fault behavior
  - no failure: all actors
  - ECU0 or ECU1 or ECU2: only critical actors
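A fault behavior of this kind can be checked against a candidate replica mapping. The sketch below assumes that the first sample row refers to the fault-free case, and it uses a hypothetical mapping and actor set. It deliberately ignores channels, sensor/actuator location, and timing, so it is a simplified coverage check and not the poster's verification tool.

```python
def surviving(mapping, failed):
    """Actors with at least one replica on an ECU outside the failure pattern."""
    return {actor for actor, ecus in mapping.items() if set(ecus) - set(failed)}


def covers(mapping, fault_behavior):
    """True if every failure pattern leaves all of its required actors alive."""
    return all(required <= surviving(mapping, pattern)
               for pattern, required in fault_behavior)


# Hypothetical actors and criticality.
ALL_ACTORS = {"coarse_ctrl", "fine_ctrl"}
CRITICAL = {"coarse_ctrl"}

# Fault behavior: (failure pattern, actors that must be guaranteed).
fault_behavior = [
    (frozenset(), ALL_ACTORS),        # no failure: all actors
    (frozenset({"ECU0"}), CRITICAL),  # any single ECU failure:
    (frozenset({"ECU1"}), CRITICAL),  # only critical actors
    (frozenset({"ECU2"}), CRITICAL),
]

# Hypothetical mappings: actor -> ECUs hosting a replica of it.
replicated = {"coarse_ctrl": ["ECU0", "ECU1"], "fine_ctrl": ["ECU2"]}
unreplicated = {"coarse_ctrl": ["ECU0"], "fine_ctrl": ["ECU2"]}

replicated_ok = covers(replicated, fault_behavior)
unreplicated_ok = covers(unreplicated, fault_behavior)
```

Replicating only the critical actor is enough to satisfy this fault behavior, while the unreplicated mapping fails as soon as ECU0 is in the failure pattern.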
- Metropolis library to model FTDF netlists
- Support for simulation, fault injection and visualization
- Early assessment of closed-loop behavior in degraded modes
Design space exploration
Introduction
Designing cost-sensitive real-time control systems for safety-critical applications requires a careful analysis of the cost/fault-coverage trade-offs. This further complicates the task of deploying the corresponding embedded software on the execution platform, which is typically distributed around the plant. We propose a synthesis-based design methodology that relieves designers from specifying how to tolerate execution platform faults and instead involves them in defining the overall fault-tolerance strategy: how to address plant faults (adaptive control algorithms) and how to select a cost-effective execution platform. Verification tools analyze the solution to extract timing and to check the fault behavior (replica determinism, coverage, etc.). Finally, a run-time library is being developed for the deployment of the resulting distributed system.
[Figure labels: Parse.exe, SynDEx]
- Verification provides timing and coverage
- If not satisfactory:
  - change architecture
    - more/fewer components
    - vary the mix of performance
  - change algorithms
    - introduce pipelining, reduce/increase granularity
  - change fault behavior
    - degrade sooner/later
  - provide hints to the synthesis tool
    - replicate some actors, mapping constraints, precedence constraints
[Figure: design flow. Stage labels: Specification, Mapping, Timing, Verification, Synthesis; tool label: Schedule.exe.]
[Figure: example schedule on ECU0, ECU1 and ECU2 with channels CH0 and CH1; actors shown: Sens, Input, Coarse CTRL, Fine CTRL, Arbiter Best, Output, Act.]
Timing analysis: dynamic (shown) and time-triggered execution
Case Studies: BMW, GM
Vehicle Level Data-Flow Architecture
System Faults
- Plant faults (plant, sensors, actuators)
  - estimation and control algorithms
- Architecture faults (channels, ECUs)
  - hardware redundancy
  - software replication
  - redundancy management
- Application faults (bugs)
  - can be reduced by disciplined coding
  - code generation from formal models
  - simulation
  - formal verification
Conclusions
- Proposed design flow enables
  - greater separation of concerns
    - application, architecture, fault behavior
  - formal specification and verification of fault-tolerant systems
  - design space exploration
- C. Pinello, L. P. Carloni, and A. L. Sangiovanni-Vincentelli, "Fault-Tolerant Deployment of Embedded Software for Cost-Sensitive Real-Time Feedback-Control Applications," Proc. Conf. Design, Automation and Test in Europe (DATE), February 2004.