Bus Architectures for Safety-Critical Embedded Systems - PowerPoint PPT Presentation

Learn more at: http://vega.cs.kent.edu
Transcript and Presenter's Notes

1
Bus Architectures for Safety-Critical Embedded Systems
  • by Harit Desai

2
Introduction
  • Safety-critical systems have traditionally been federated.
  • Each function has its own fault-tolerant embedded
    control system, with only minor interconnections
    to the others.
  • This provides a strong barrier to fault propagation.
  • The federated approach is expensive because
    resources are replicated for every function.

3
[Figure: several hosts, each connected through an interface to a shared bus interconnect]

4
Buses
  • Time-triggered buses:
  • All activities are driven by the passage of time.
  • The system interacts with the environment according
    to an internal schedule.
  • Event-triggered buses:
  • All activities are driven by the occurrence of events.
  • The system is under the control of the environment
    and responds to stimuli as they occur.

5
Why not an event-triggered system?
  • In a safety-critical system it is necessary to
    guarantee some basic quality of service, even in
    the presence of faults.
  • Guaranteed low latency is required.
  • Events arriving at different nodes may have to
    contend for access to the bus.
  • So some form of media access control is required.
  • Ethernet resolves contention probabilistically.
  • Contention can be resolved deterministically by
    letting the lowest-numbered message win the
    arbitration, but latency then increases as the
    load increases.
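
The link between deterministic arbitration and load-dependent latency can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name `arbitration_latencies`, the one-message-per-slot timing, and the sample identifiers are hypothetical and do not come from the presentation or from any particular bus standard.

```python
def arbitration_latencies(pending_ids):
    """Return {message_id: slots waited} for a bus that sends one message
    per slot and always lets the lowest identifier win arbitration."""
    latencies = {}
    for slot, msg_id in enumerate(sorted(pending_ids)):
        # A message waits for every pending message with a lower number.
        latencies[msg_id] = slot
    return latencies


# Assumed sample loads: the same message (id 42) under light and heavy load.
light_load = arbitration_latencies([7, 42])
heavy_load = arbitration_latencies([3, 7, 11, 19, 23, 42])

print("light load, message 42 waits:", light_load[42], "slot(s)")
print("heavy load, message 42 waits:", heavy_load[42], "slot(s)")
```

The lowest-numbered message is never delayed in this model, while every other message waits longer as more lower-numbered traffic is pending, which is why latency cannot be bounded independently of load.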

6
  • In the presence of faults, a message may be
    retransmitted, thereby delaying the next message
    even if it has higher priority.
  • Furthermore, faulty nodes may make excessive
    demands for service.
  • ARINC 629 uses a technique called minislotting:
    each node has to wait a certain period after
    sending a message before it can contend to send
    another.
  • Even so, latency is still a function of load.
  • Byteflight (BMW) extends this mechanism with
    guaranteed, preallocated slots for critical
    messages.
  • This provides no protection against a faulty node
    that fails to recognize those slots and transmits
    anyway; this kind of fault is called the babbling
    idiot failure.

7
Time-triggered bus
  • Communication bandwidth is statically preallocated
    in the form of a global schedule.
  • Thus, contention is resolved at design time
    rather than at run time.
  • But what about the babbling idiot failure?
  • Each node has an independent component, called a
    bus guardian, that lets the node transmit only
    when it is allowed to do so.
  • The guardian has an independent clock and independent
    knowledge of the schedule, and allows its node to
    broadcast only when indicated by the schedule.
  • Because the schedule identifies sender and receivers,
    there is no need for a source or destination address
    in the message.
  • This reduces the size of the message.
  • It also increases the message bandwidth of the bus.
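
A static schedule and a bus guardian can be sketched in a few lines. The slot length, the four-node round, and the names `BusGuardian` and `may_transmit` are assumptions made for illustration; real guardians are independent hardware with their own clock, which this model represents only by passing the guardian's own time reading.

```python
class BusGuardian:
    """Gates one node's transmitter using its own copy of the schedule."""

    def __init__(self, node_id, schedule, slot_length_us):
        self.node_id = node_id
        self.schedule = list(schedule)      # independent knowledge of the schedule
        self.slot_length_us = slot_length_us

    def may_transmit(self, guardian_time_us):
        """True only while the guardian's clock is inside the node's slot."""
        slot = (guardian_time_us // self.slot_length_us) % len(self.schedule)
        return self.schedule[slot] == self.node_id


# Assumed round: four nodes, one 100-microsecond slot each, repeated forever.
ROUND = ["A", "B", "C", "D"]
guardian_b = BusGuardian("B", ROUND, slot_length_us=100)

print(guardian_b.may_transmit(150))   # time 150 us falls in B's slot
print(guardian_b.may_transmit(250))   # time 250 us falls in C's slot
print(guardian_b.may_transmit(550))   # second round, B's slot again
```

Because the slot index alone determines the sender, a receiver can identify the source of a frame from the time at which it arrives, which is why no address field is needed in this scheme.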

8
Time-triggered bus (continued)
  • Fault-tolerant clock synchronization is a
    fundamental requirement for a time-triggered bus
    architecture.
  • The abstraction of a global clock is realized by each
    node having a local clock that is closely
    synchronized with the clocks of all other nodes.
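
The presentation does not name a synchronization algorithm. As one well-known possibility, the sketch below uses the fault-tolerant midpoint rule (due to Welch and Lynch): each node discards the largest and smallest readings that could come from faulty clocks and moves toward the midpoint of what remains. The function name, the microsecond units, and the sample readings are assumptions for illustration.

```python
def fault_tolerant_midpoint(offsets_us, max_faulty):
    """Clock correction computed from observed offsets to the other clocks.

    The max_faulty largest and max_faulty smallest readings are discarded,
    so a bounded number of wildly wrong clocks cannot drag the result.
    """
    if len(offsets_us) <= 3 * max_faulty:
        raise ValueError("need more than 3 * max_faulty readings")
    ordered = sorted(offsets_us)
    trimmed = ordered[max_faulty:len(ordered) - max_faulty]
    return (trimmed[0] + trimmed[-1]) / 2


# Assumed readings: three clocks agree closely, one faulty clock is far off.
readings = [-2, 1, 3, 500]
correction = fault_tolerant_midpoint(readings, max_faulty=1)
print("apply correction of", correction, "microseconds")
```

A plain average of the same readings would be pulled far off by the single faulty value, which is the behavior the trimming step is meant to prevent.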

9
Fault Hypotheses and Fault Containment Units
  • A fault hypothesis must describe:
  • the modes of faults that are to be tolerated,
  • their maximum number,
  • and their arrival rates.
  • It must also identify the different fault containment
    units (FCUs).
  • There must be no propagation of faults from one
    FCU to another.
  • There must be no common-mode failures, in which a
    single physical event produces faults in multiple FCUs.
  • A fault may exhibit different modes at different
    levels of the protocol hierarchy.
  • Example: an intermediate voltage at the electrical
    level may appear as a Byzantine failure at the
    message level.
  • Such faults must be controlled at the underlying
    or intermediate levels.

10
Basic Dimensions of Faults
  • Faults can affect value, time, or space.
  • A value fault causes an incorrect value to be
    computed, transmitted, or received.
  • A timing fault causes a value to be computed,
    transmitted, or received at the wrong time.
  • A spatial proximity fault is one where all the
    matter in some specified volume is destroyed.
  • Bus topologies are vulnerable to it because the
    redundant buses come into close proximity at each
    node.
  • A central hub topology is more resilient.

11
Fault Classification
  • Manifest: a fault that can be reliably detected,
    for example one that causes an FCU to cease
    transmitting.
  • Symmetric: whatever the effect, it is the
    same for all observers.
  • Arbitrary: may be asymmetric, or Byzantine,
    meaning that its effect is perceived differently
    by different observers.
  • A slightly-out-of-specification (SOS) fault is an
    example: an intermediate electrical voltage or a
    weak edge may be read differently by different
    receivers.

12
  • The redundancy required for fault tolerance depends
    on the types of fault considered.
  • Number of FCUs required for clock synchronization:
  • $n > 3a + 2s + m$
  • where a = number of arbitrary faults,
  • s = number of symmetric faults,
  • m = number of manifest faults.
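
Reading the bound above as n > 3a + 2s + m, the smallest admissible number of FCUs is one more than the right-hand side. The helper below is a minimal sketch of that arithmetic; the function name `min_fcus` and the sample fault counts are assumptions for illustration.

```python
def min_fcus(arbitrary, symmetric, manifest):
    """Smallest integer n that satisfies n > 3a + 2s + m."""
    return 3 * arbitrary + 2 * symmetric + manifest + 1


# One arbitrary (Byzantine) fault alone needs four FCUs.
print(min_fcus(arbitrary=1, symmetric=0, manifest=0))

# One fault of each kind needs seven FCUs.
print(min_fcus(arbitrary=1, symmetric=1, manifest=1))

# Two manifest faults need only three FCUs.
print(min_fcus(arbitrary=0, symmetric=0, manifest=2))
```

The weights show why the classification matters: an arbitrary fault costs three FCUs of redundancy, a symmetric fault two, and a manifest fault only one.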

13
  • Some architectures can tolerate only one fault at
    a time; they then reconfigure and are able to
    tolerate additional faults.
  • In such architectures, the fault arrival rate is
    very important:
    faults must not arrive faster than the
    architecture can reconfigure.
  • These architectures operate according to static
    schedules, which consist of rounds or frames that
    are executed repeatedly.
  • The acceptable fault arrival rate is therefore
    expressed in faults per round.
  • Sometimes a system may experience many simultaneous
    faults, for example due to high-intensity radiated
    fields (HIRF).
  • In that case a restart is usually initiated.
  • Detection of such a failure and the restart must be
    very fast.
  • For a steer-by-wire automobile application, the
    estimated limit is 50 ms.

14
Services
  • The basic purpose of these architectures is to
    support building reliable distributed applications.
  • Basic services:
  • clock synchronization
  • time-triggered activation
  • reliable message delivery
  • The problem of distributing data consistently in
    the presence of faults is known as interactive
    consistency (also called Byzantine agreement).
    It has two requirements:
  • Agreement: all nonfaulty receivers obtain the
    same message.
  • Validity: if the transmitter is nonfaulty, then
    nonfaulty receivers obtain the message actually
    sent.
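
The two requirements can be written as checks over the outcome of one broadcast. This is a sketch of the properties only, not of a protocol that achieves them; the node names, message values, and function names are hypothetical.

```python
def agreement(received, faulty):
    """All nonfaulty receivers obtained the same message."""
    values = {msg for node, msg in received.items() if node not in faulty}
    return len(values) <= 1


def validity(received, faulty, transmitter, sent):
    """If the transmitter is nonfaulty, nonfaulty receivers obtained `sent`."""
    if transmitter in faulty:
        return True   # the property imposes nothing on a faulty transmitter
    return all(msg == sent
               for node, msg in received.items() if node not in faulty)


# Assumed run 1: transmitter A is nonfaulty, receiver D is faulty.
run1 = {"B": "brake", "C": "brake", "D": "steer"}
print(agreement(run1, faulty={"D"}), validity(run1, {"D"}, "A", "brake"))

# Assumed run 2: transmitter A is Byzantine and sends different values.
run2 = {"B": "brake", "C": "steer", "D": "brake"}
print(agreement(run2, faulty={"A"}), validity(run2, {"A"}, "A", "brake"))
```

The second run shows why agreement is the harder property: validity holds vacuously when the transmitter is faulty, but the nonfaulty receivers have still ended up with different data.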

15
  • Failure notification, or membership service:
  • The service must produce consistent knowledge:
    if one nonfaulty node thinks that a particular
    node has failed, then all the nonfaulty nodes must
    hold the same opinion.
  • Each node maintains a private membership list.
  • Agreement: the membership lists of all nonfaulty
    nodes are the same.
  • Validity: the membership lists of all nonfaulty
    nodes contain all nonfaulty nodes and at most one
    faulty node.
  • When unable to maintain accurate membership, the best
    recourse is to maintain agreement but sacrifice
    validity. This weakened requirement is called
    clique avoidance.
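
The membership requirements can likewise be expressed as checks over the private lists. The sketch below tests the two properties as stated on the slide; it is not a membership protocol, and the node names and function names are hypothetical.

```python
def membership_agreement(lists, faulty):
    """The membership lists of all nonfaulty nodes are the same."""
    views = {frozenset(members)
             for node, members in lists.items() if node not in faulty}
    return len(views) <= 1


def membership_validity(lists, faulty, all_nodes):
    """Each nonfaulty list holds every nonfaulty node and at most one faulty one."""
    nonfaulty = set(all_nodes) - set(faulty)
    for node in nonfaulty:
        members = set(lists[node])
        if not nonfaulty <= members:
            return False
        if len(members & set(faulty)) > 1:
            return False
    return True


NODES = {"A", "B", "C", "D"}

# Assumed case 1: D has failed and every nonfaulty node has removed it.
good = {"A": {"A", "B", "C"}, "B": {"A", "B", "C"}, "C": {"A", "B", "C"}}
print(membership_agreement(good, {"D"}), membership_validity(good, {"D"}, NODES))

# Assumed case 2: nonfaulty nodes have split into two cliques.
split = {"A": {"A", "B"}, "B": {"A", "B"}, "C": {"C"}}
print(membership_agreement(split, {"D"}), membership_validity(split, {"D"}, NODES))
```

The second case is the situation that clique avoidance is meant to rule out: groups of nonfaulty nodes that each hold an internally consistent but mutually different view.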

16
Practical Implementations
  • SAFEbus: developed by Honeywell for cockpit
    displays.
  • The interfaces, called bus interface units (BIUs),
    are duplicated. BIUs perform clock synchronization,
    message scheduling, and transmission functions.
  • Each BIU of a pair is a different FCU.
  • The interconnect bus is quad-redundant.
  • Each BIU of a pair drives a different pair of
    interconnect buses but is able to read all
    four.
  • Each interconnect bus comprises two data lines
    and a clock line and operates at 30 MHz.
  • It can handle arbitrary faults and a high rate of
    fault arrivals. It also tolerates spatial
    proximity faults.
  • Considered to be the best of these buses; used in
    the Boeing 777 passenger aircraft.

17
  • SPIDER: Scalable Processor-Independent Design
    for Electromagnetic Resilience.
  • Developed at NASA Langley Research Center.
  • It is a research platform for exploring recovery
    strategies for radiation-induced (HIRF) faults.
  • Uses a star configuration, in which the interfaces
    may be located either with their hosts or in
    the centralized hub.
  • Services include interactively consistent message
    broadcast and identification of failed nodes
    (membership service).
  • FlexRay: developed for powertrain and chassis
    control in cars.
  • More flexible than the other buses:
  • it supports both static time-triggered operation
    and dynamic event-triggered operation.
