Bus Architectures for Safety-Critical Embedded Systems - PowerPoint PPT Presentation

About This Presentation

Slides: 19

Provided by: Monday

Learn more at: http://vega.cs.kent.edu

Transcript and Presenter's Notes

Title: Bus Architectures for Safety-Critical Embedded Systems

1
Bus Architectures for Safety-Critical Embedded Systems

2
Introduction

Safety-critical systems are federated.
Each function has its own fault-tolerant embedded control system, with only minor interconnections to the others.
This provides a strong barrier to fault propagation.
The federated approach is expensive (replication).

3
[Figure: hosts connected through interfaces to the bus interconnect]

4
Buses

5
Why not an event-triggered system?

In a safety-critical system it is necessary to guarantee some basic quality of service, even in the presence of faults.
Guaranteed low latency is required.
Events arriving at different nodes may have to contend for access to the bus, so some form of media access control is required.
Ethernet resolves contention probabilistically.
Contention can be resolved deterministically by letting the lowest-numbered message win the arbitration, but then latency increases as the load increases.
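The dependence of latency on load under lowest-number-wins arbitration can be illustrated with a minimal sketch. The function name and the simplifying assumptions (one message per slot, unique identifiers, no new arrivals while waiting) are hypothetical and chosen only for illustration.

```python
def slots_waited(pending_ids, my_id):
    """Number of bus slots a message waits before it is sent.

    Assumes one message per slot, unique identifiers, no new arrivals,
    and that the lowest identifier always wins the arbitration.
    """
    contenders = sorted(set(pending_ids) | {my_id})
    return contenders.index(my_id)


# Light load: one higher-priority message is pending.
light = slots_waited([3], my_id=10)
# Heavy load: six higher-priority messages are pending.
heavy = slots_waited([1, 2, 3, 4, 5, 6], my_id=10)
print(light, heavy)
```

The same message waits longer purely because more higher-priority traffic is pending, which is why this scheme cannot guarantee latency by itself.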

In the presence of faults, a message may be retransmitted, thereby delaying the next message even if it has higher priority.
Furthermore, faulty nodes may make excessive demands for service.
ARINC 629 uses a technique called minislotting.
Each node has to wait a certain period after sending a message before it can contend to send another.
But here too, latency is a function of load.
Byteflight (BMW) extends this mechanism with guaranteed, preallocated slots for critical messages.
This provides no protection against a faulty node that fails to recognize the slots; this kind of fault is called the babbling idiot failure.

7
Time-triggered bus

Static preallocation of communication bandwidth in the form of a global schedule.
Thus, contention is resolved at design time rather than at run time.
But what about the babbling idiot failure?
Each node has an independent component, called a bus guardian, that allows the node to transmit only when the schedule permits.
The guardian has an independent clock and independent knowledge of the schedule, and allows its node to broadcast only when indicated by the schedule.
No source or destination address is needed in the message, because the schedule already determines which node sends in each slot.
This reduces the size of the message and increases the message bandwidth of the bus.
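The interplay of a static schedule and a bus guardian can be sketched as follows. This is an illustrative model only: the slot length, the four-node schedule, and the names BusGuardian and try_transmit are assumptions, not taken from any of the buses discussed.

```python
from dataclasses import dataclass

SLOT_US = 100                      # hypothetical slot length, microseconds
SCHEDULE = ("A", "B", "C", "D")    # hypothetical global schedule, one slot per node


@dataclass(frozen=True)
class BusGuardian:
    """Holds its own copy of the schedule and is driven by its own clock."""
    node: str
    schedule: tuple = SCHEDULE
    slot_us: int = SLOT_US

    def permits(self, guardian_time_us):
        # Map the guardian's own time to a slot, then check slot ownership.
        slot = (guardian_time_us // self.slot_us) % len(self.schedule)
        return self.schedule[slot] == self.node


def try_transmit(guardian, guardian_time_us, payload):
    """Return the frame put on the bus, or None if the guardian blocks it.

    The frame carries no source address: the slot identifies the sender.
    """
    if guardian.permits(guardian_time_us):
        return payload
    return None


guardian_b = BusGuardian("B")
in_slot = try_transmit(guardian_b, 150, b"brake=0.4")    # slot 1 belongs to B
babbling = try_transmit(guardian_b, 250, b"brake=0.4")   # slot 2 belongs to C
print(in_slot, babbling)
```

Because the guardian decides from its own clock and its own schedule copy, a node that tries to transmit outside its slot is simply cut off from the bus.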

8
Continued

Fault-tolerant clock synchronization is a fundamental requirement for a time-triggered bus architecture.
The abstraction of a global clock is realized by each node having a local clock that is closely synchronized with the clocks of all other nodes.
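One well-known way to compute a correction that tolerates faulty clocks is the fault-tolerant midpoint: discard the f largest and f smallest readings and take the midpoint of what remains. The slides do not say which algorithm a given bus uses, so the sketch below is a generic illustration; the function name and the sample readings are hypothetical.

```python
def fault_tolerant_midpoint(readings, f):
    """Midpoint of the readings after dropping the f lowest and f highest.

    Tolerates up to f arbitrarily faulty clocks, which requires more
    than 3f readings in total.
    """
    if len(readings) <= 3 * f:
        raise ValueError("need more than 3f clock readings")
    ordered = sorted(readings)
    trimmed = ordered[f:len(ordered) - f]
    return (trimmed[0] + trimmed[-1]) / 2


# Three clocks agree closely; one faulty clock reports a wild value.
readings = [100, 101, 102, 9999]
estimate = fault_tolerant_midpoint(readings, f=1)
print(estimate)
```

With f=1 the wild reading is discarded together with the lowest one, so the estimate stays within the range of the good clocks.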

9
Fault hypotheses and Fault Containment Units

A fault hypothesis must describe:
The modes of faults that are to be tolerated
Their maximum number
Their arrival rates
It must also identify the different fault containment units (FCUs).
There must be no propagation of faults from one FCU to another.
There must be no common-mode failures, in which a single physical event produces faults in multiple FCUs.
A fault may exhibit different modes at different levels of the protocol hierarchy.
Example: an intermediate voltage at the electrical level can appear as a Byzantine failure at the message level.
Such faults must be controlled at an underlying or intermediate level.

10
Basic Dimensions of faults

Faults can affect value, time or space.
A value fault causes an incorrect value to be computed, transmitted or received.
A timing fault causes a value to be computed, transmitted or received at the wrong time.
A spatial proximity fault is one where all the matter in some specified volume is destroyed.
Bus topologies are exposed to this fault because redundant buses come into close proximity at each node.
A central hub topology is more resilient.

11
Fault Classification

Manifest: a fault that can be reliably detected, for example one that causes an FCU to cease transmitting.
Symmetric: whatever the effect, it is the same for all observers.
Arbitrary: may be asymmetric or Byzantine, meaning that its effect is perceived differently by different observers.
Slightly out of specification (SOS) fault: for example an intermediate electrical voltage or a weak edge.

The redundancy required for fault tolerance depends on the type of fault considered.
Number of FCUs required for clock synchronization:
n > 3a + 2s + m
where a = number of arbitrary faults
s = number of symmetric faults
m = number of manifest faults
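The bound n > 3a + 2s + m can be evaluated directly. The helper names below are hypothetical; the sketch only restates the inequality.

```python
def min_fcus(a, s, m):
    """Smallest n satisfying n > 3a + 2s + m."""
    return 3 * a + 2 * s + m + 1


def tolerates(n, a, s, m):
    """True if n FCUs satisfy the bound for the given fault counts."""
    return n > 3 * a + 2 * s + m


one_arbitrary = min_fcus(a=1, s=0, m=0)
one_of_each = min_fcus(a=1, s=1, m=1)
print(one_arbitrary, one_of_each)
```

A single arbitrary fault needs four FCUs, whereas a single manifest fault needs only two, which shows how much the assumed fault mode drives the cost of redundancy.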

Some architectures can tolerate only one fault at a time; they then reconfigure and are able to tolerate additional faults.
In such architectures, the fault arrival rate is very important.
Faults must not arrive faster than the architecture can reconfigure.
These architectures operate according to static schedules, which consist of rounds or frames that are executed repeatedly.
The acceptable fault arrival rate is expressed in faults per round.
Sometimes the system may experience many simultaneous faults (for example due to HIRF).
A restart is usually initiated.
Detection of such a failure and the restart must be very fast.
An estimate for a steer-by-wire automobile application is 50 ms.

14
Services

The basic purpose of these architectures is to support building reliable distributed applications.
Basic services:
clock synchronization
time-triggered activation
reliable message delivery
The problem of distributing data consistently in the presence of faults is known, among other names, as interactive consistency. It requires:
Agreement: all nonfaulty receivers obtain the same message.
Validity: if the transmitter is nonfaulty, then nonfaulty receivers obtain the message actually sent.
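The two requirements can be written as simple checks over what each receiver obtained. This is a sketch of the property definitions only, not of a protocol that achieves them; the node names and messages are made up for illustration.

```python
def agreement(received, nonfaulty):
    """All nonfaulty receivers obtained the same message."""
    return len({received[r] for r in nonfaulty}) <= 1


def validity(received, nonfaulty, sent, transmitter_nonfaulty):
    """If the transmitter is nonfaulty, nonfaulty receivers got what it sent."""
    if not transmitter_nonfaulty:
        return True  # validity only constrains a nonfaulty transmitter
    return all(received[r] == sent for r in nonfaulty)


# Hypothetical outcome: receiver D is faulty and sees a different value.
received = {"B": "x", "C": "x", "D": "y"}
ok_agreement = agreement(received, nonfaulty={"B", "C"})
ok_validity = validity(received, {"B", "C"}, sent="x", transmitter_nonfaulty=True)
broken = agreement(received, nonfaulty={"B", "C", "D"})
print(ok_agreement, ok_validity, broken)
```

Note that a faulty transmitter makes validity hold trivially, while agreement must still hold among the nonfaulty receivers.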

Failure notification or membership service.
The service must produce consistent knowledge:
if one nonfaulty node thinks that a particular node has failed, then all nonfaulty nodes must hold the same opinion.
Each node maintains a private membership list.
Agreement: the membership lists of all nonfaulty nodes are the same.
Validity: the membership lists of all nonfaulty nodes contain all nonfaulty nodes and at most one faulty node.
When unable to maintain accurate membership, the best recourse is to maintain agreement but sacrifice validity. This weakened requirement is called clique avoidance.
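The membership requirements can be checked in the same style. The sketch assumes that every node outside the nonfaulty set is faulty; node names and list contents are hypothetical.

```python
def membership_agreement(lists, nonfaulty):
    """The membership lists of all nonfaulty nodes are the same."""
    views = {frozenset(lists[n]) for n in nonfaulty}
    return len(views) <= 1


def membership_validity(lists, nonfaulty):
    """Each nonfaulty list holds all nonfaulty nodes and at most one faulty node."""
    for n in nonfaulty:
        members = set(lists[n])
        if not nonfaulty <= members:
            return False
        if len(members - nonfaulty) > 1:
            return False
    return True


# Hypothetical system: A and B are nonfaulty, C and D are faulty.
nonfaulty = {"A", "B"}
good = {"A": {"A", "B", "C"}, "B": {"A", "B", "C"}}
stale = {"A": {"A", "B", "C", "D"}, "B": {"A", "B", "C", "D"}}
print(membership_agreement(good, nonfaulty), membership_validity(good, nonfaulty))
print(membership_agreement(stale, nonfaulty), membership_validity(stale, nonfaulty))
```

The second case keeps agreement while violating validity, which is the situation the weakened requirement accepts.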

16
Practical Implementations

SAFEbus: developed by Honeywell for cockpit displays.
The interfaces, or BIUs (bus interface units), are duplicated. BIUs perform clock synchronization, message scheduling and transmission functions.
Each BIU of a pair is a separate FCU.
The interconnect bus is quad-redundant.
Each BIU of a pair drives a different pair of interconnect buses but is able to read all four.
Each interconnect bus comprises two data lines and a clock line and operates at 30 MHz.
It can handle arbitrary faults and a high rate of fault arrivals. It also tolerates spatial proximity faults.
Considered to be the best; used in the Boeing 777 passenger aircraft.

SPIDER: Scalable Processor-Independent Design for Electromagnetic Resilience.
Developed at NASA Langley Research Center.
It is a research platform for exploring recovery strategies for radiation-induced (HIRF) faults.
Uses a star configuration, in which the interfaces may be located either with their hosts or in a centralized hub.
Services include interactively consistent message broadcast and identification of failed nodes (membership service).
FlexRay: developed for powertrain and chassis control in cars.
More flexible than the other buses:
supports both static time-triggered operation and dynamic event-triggered operation.