Fault-Tolerant Design for Long-Life Deep Space Missions

1
Fault-Tolerant Design for Long-Life Deep Space
Missions
  • Yigit Kültür
  • 2006702835

2
Contents
  • Introduction
  • Fault-Tolerant System Considerations and
    Techniques
  • Historical Perspective
  • Future Approach
  • Conclusion

3
Introduction
  • Recently, Mars has been the focal point of
    astronomical attention because it will play a
    key role in humanity's expansion into deep
    space
  • Future Mars transportation will require reliable
    operation over a lifespan of years, unlike
      • the Space Shuttle, which requires operation
        over months
      • the Space Station, which is close enough to
        Earth for maintenance logistics

4
Introduction
  • The long operation period associated with deep
    space missions demands
      • innovative fault-tolerant technology
        development
      • application of advanced redundancy techniques
  • To enable Mars exploration, safety, reliability
    and autonomy must be improved
  • A new technology plan is needed to guide the
    development of next-generation fault-tolerant
    computing technology

5
Fault-Tolerant System Considerations
  • Traditionally, avionics systems have achieved
    fault tolerance through redundancy management
  • The redundancy management technique
      • detects and isolates a failure
      • performs hardware reconfiguration
  • Combining self-monitoring and cross-comparison
    strategies yields comprehensive fault coverage
    at reduced risk and cost
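A minimal sketch of how self-monitoring and cross-comparison might be combined in one redundancy-management step, assuming each redundant channel reports a value together with the result of its own self-test. `Channel`, `select_output` and the `tolerance` parameter are hypothetical illustrations, not taken from any flight system.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Channel:
    """One redundant channel (hypothetical): its output and self-test flag."""
    name: str
    value: float
    self_test_ok: bool  # result of the channel's own built-in self-monitoring


def select_output(channels, tolerance):
    """Return (selected value, names of isolated channels).

    Step 1 (self-monitoring): drop channels whose own self-test failed.
    Step 2 (cross-comparison): isolate any remaining channel whose value
    differs from the median of the healthy channels by more than `tolerance`.
    """
    isolated = [c.name for c in channels if not c.self_test_ok]
    healthy = [c for c in channels if c.self_test_ok]
    if not healthy:
        raise RuntimeError("no channel passed its self-test")

    reference = median(c.value for c in healthy)
    for c in healthy:
        if abs(c.value - reference) > tolerance:
            isolated.append(c.name)

    survivors = [c.value for c in healthy if c.name not in isolated]
    if not survivors:
        raise RuntimeError("cross-comparison left no agreeing channels")
    return median(survivors), isolated


# Usage: channel C passes its self-test but disagrees with its peers.
chans = [Channel("A", 10.0, True), Channel("B", 10.1, True), Channel("C", 42.0, True)]
value, isolated = select_output(chans, tolerance=0.5)
print(value, isolated)
```

Self-monitoring catches faults a channel can detect in itself; cross-comparison catches faults that only show up as disagreement with its peers, which is why the two are combined.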

6
Fault-Tolerant System Considerations
  • Primary Flight Control System (PFCS) baseline
    requirements
      • Mission reliability: 0.95 success probability
        at 10 years with no repair
      • Throughput: 100 million instructions per
        second (MIPS)
      • Expandable I/O: 100 Mbit/s
      • Expandable memory: 1 GByte
      • Mass storage capacity: 1 TByte
      • Cycle rate: 100 Hz
      • Hardware: N-fail operational
      • Low life-cycle cost
      • Low power and mass
      • Radiation tolerance
      • Building-block approach (look for existing
        solutions to parts of the problem and
        combine them)
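To put the reliability and throughput figures in perspective, the sketch below assumes a constant failure rate, so that R(t) = exp(-λt). This exponential model and the 365.25-day year are assumptions of the example, not part of the stated requirements; the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365.25  # assumption: 365.25-day years


def required_failure_rate(reliability, years):
    """Constant failure rate (per hour) that yields `reliability` after `years`,
    under the assumed exponential model R(t) = exp(-lambda * t)."""
    return -math.log(reliability) / (years * HOURS_PER_YEAR)


def instructions_per_frame(mips, cycle_hz):
    """Instructions available per control cycle at the given throughput."""
    return mips * 1_000_000 / cycle_hz


lam = required_failure_rate(0.95, 10)
print(f"failure rate: {lam:.3e} per hour")
print(f"equivalent MTTF: {1 / lam / HOURS_PER_YEAR:.0f} years")
print(f"budget: {instructions_per_frame(100, 100):,.0f} instructions per frame")
```

Under that assumption, 0.95 at 10 years corresponds to an equivalent system MTTF of about 195 years, and 100 MIPS at a 100 Hz cycle rate leaves a budget of one million instructions per frame.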

7
Fault-Tolerant Techniques for Mars Applications
  • Ultra-reliable systems for long-life applications
    such as human Mars exploration must tolerate
      • permanent faults
      • transient (temporary) faults
      • intermittent (recurring, non-continuous) faults
      • timing faults
      • latent (hidden) faults
      • worst-case fault scenarios with a low
        probability of occurrence

8
Fault-Tolerant Techniques for Mars Applications
  • Distributed architectures are better suited to
    long-life space applications because they offer
      • function integration
      • parallel computation
      • graceful performance growth
      • selective technology upgrades
      • appropriate levels of function reliability
      • graceful degradation of system capabilities
        in the presence of faults
      • efficient use of hardware resources

9
Historical Perspective
  • Long-Life Unmanned Redundant Systems

Viking
Voyager
Galileo
10
Historical Perspective
  • Safety-Critical High-Reliability Systems

Columbia Challenger Discovery
Atlantis Endeavour
11
Long-Life Unmanned Redundant Systems: Viking
  • Viking is an instance of the pre-1970
    Thermoelectric Outer Planets Spacecraft (TOPS)
    concept
  • This spacecraft was the first to use a computer
    as a fault manager, which attempted to
    reconfigure and restore the spacecraft to an
    operational configuration
  • The fundamental strategy was to switch power on
    and off to various alternative subsystems until
    either the built-in fault monitoring indicated
    that operation was restored or, in the case of
    faults in the communication chain, commands from
    Earth were detected
  • There was no real-time masking of faults, so if a
    fault occurred during a maneuver, an incorrect
    maneuver would be performed

Viking Fault-Tolerant Architecture
CCS: Command Computer Subsystem; FDS: Flight Data
Subsystem
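The recovery strategy described above can be sketched as a simple loop. The callables (`power_on`, `power_off`, `monitor_ok`, `ground_command_received`) and unit names are hypothetical stand-ins for hardware switching and built-in monitoring; the example only illustrates the try-alternatives-until-healthy idea, not Viking's actual flight software.

```python
from typing import Callable, Iterable, Optional


def recover(
    alternatives: Iterable[str],
    power_on: Callable[[str], None],
    power_off: Callable[[str], None],
    monitor_ok: Callable[[], bool],
    ground_command_received: Callable[[], bool],
) -> Optional[str]:
    """Power alternative units on in turn until operation is restored.

    Stops when built-in monitoring reports normal operation or, for faults
    in the communication chain, a command from Earth is detected.
    Returns the unit left powered on, or None if no alternative worked.
    """
    previous = None
    for unit in alternatives:
        if previous is not None:
            power_off(previous)  # drop the alternative that did not help
        power_on(unit)
        if monitor_ok() or ground_command_received():
            return unit  # operation restored: keep this unit powered
        previous = unit
    if previous is not None:
        power_off(previous)
    return None  # every alternative tried, none restored operation


# Simulated hardware: only receiver_B works, and a working receiver is
# recognized when a command from Earth gets through.
healthy = {"receiver_B"}
powered = set()
result = recover(
    ["receiver_A", "receiver_B"],
    power_on=powered.add,
    power_off=powered.discard,
    monitor_ok=lambda: False,
    ground_command_received=lambda: bool(powered & healthy),
)
print(result, powered)
```

Because the loop runs only after a fault has already disturbed operation, nothing masks the erroneous output in the meantime, which is the limitation noted for maneuvers.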
12
Long-Life Unmanned Redundant Systems: Voyager
  • Like Viking, Voyager is an instance of the
    pre-1970 Thermoelectric Outer Planets Spacecraft
    (TOPS) concept
  • Voyager improves on Viking in only limited ways,
    such as the addition of a pair of separate
    computers for attitude and articulation control
  • Both spacecraft used standby redundancy. The
    standby spares were cross-strapped so that
    either unit could be switched in to communicate
    with the other units
  • Cross-strapping and switching allowed
    reconfiguration around failed components, either
    automatically or by ground command

Voyager Fault-Tolerant Architecture
CCS: Command Computer Subsystem; FDS: Flight Data
Subsystem; AACS: Attitude and Articulation Control
Subsystem
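The benefit of cross-strapping can be illustrated by enumerating which combinations of units remain usable after failures. The example assumes two subsystems with an A and a B unit each; `working_configurations` and the unit labels are hypothetical.

```python
from itertools import product


def working_configurations(subsystems, failed, cross_strapped):
    """List usable unit assignments (one unit per subsystem) that avoid failures.

    `subsystems` maps a subsystem to its redundant units in string order,
    e.g. {"CCS": ["CCS-A", "CCS-B"]}. Without cross-strapping, units are
    hard-wired into fixed strings (all A units together, all B units together);
    with cross-strapping, any unit can be switched in to work with any other.
    """
    names = list(subsystems)
    if cross_strapped:
        candidates = product(*(subsystems[n] for n in names))
    else:
        candidates = zip(*(subsystems[n] for n in names))
    return [
        dict(zip(names, combo))
        for combo in candidates
        if not any(unit in failed for unit in combo)
    ]


units = {"CCS": ["CCS-A", "CCS-B"], "FDS": ["FDS-A", "FDS-B"]}
failed = {"CCS-A", "FDS-B"}
print(working_configurations(units, failed, cross_strapped=False))
print(working_configurations(units, failed, cross_strapped=True))
```

With CCS-A and FDS-B failed, both fixed strings are broken, while cross-strapping still lets CCS-B work with FDS-A.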
13
Long-Life Unmanned Redundant Systems: Galileo
  • The Galileo mission is a follow-on to the
    Voyager Jupiter fly-by mission
  • Galileo's design borrows heavily from the
    Voyager experience
  • Block redundancy, in which each subsystem is
    built from duplicated hardware blocks, is used
    throughout the spacecraft
  • All subsystems except the CDS operate as
    active/standby pairs
  • The CDS uses active redundancy: each block can
    issue independent commands, or both blocks can
    operate in parallel on the same critical
    activity

Galileo Fault-Tolerant Architecture
CDS: Command and Data Subsystem; AACS: Attitude
and Articulation Control Subsystem
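A minimal sketch of the two CDS operating modes, assuming each block either executes a command or is dead. `Block`, `dispatch` and the `preferred` parameter are hypothetical; the sketch does not model how Galileo actually sequenced commands.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One CDS block (hypothetical): executes commands while healthy."""
    name: str
    healthy: bool = True
    log: list = field(default_factory=list)

    def execute(self, command):
        if self.healthy:
            self.log.append(command)
        return self.healthy


def dispatch(blocks, command, critical, preferred=0):
    """Send a command to the CDS blocks.

    Non-critical commands go to one block (`preferred`), so the blocks can
    run independent activities. Critical commands go to every block in
    parallel, so the activity completes as long as one block is healthy.
    Returns the names of the blocks that actually executed the command.
    """
    targets = blocks if critical else [blocks[preferred]]
    return [b.name for b in targets if b.execute(command)]


a, b = Block("CDS-A"), Block("CDS-B", healthy=False)
print(dispatch([a, b], "critical_sequence", critical=True))  # only CDS-A executes
```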
14
Long-Life Unmanned Redundant Systems: Galileo
  • The major departure from the Voyager architecture
    is the extensive use of microprocessors and the
    consequent use of a bus-oriented architecture to
    facilitate communication among them
  • Galileo's on-board fault detection software is
    designed to alleviate the effects and symptoms of
    faults rather than to pinpoint the exact faults
  • Fault identification and isolation are performed
    through ground intervention

Galileo Fault-Tolerant Architecture
CDS: Command and Data Subsystem; AACS: Attitude
and Articulation Control Subsystem
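Symptom-oriented fault protection can be sketched as a lookup from an observed symptom to a response, with each event logged for later diagnosis on the ground. The symptom names, responses and the `enter_safe_mode` fallback are hypothetical examples, not Galileo's actual fault-protection tables.

```python
# Hypothetical symptom -> response table; real fault-protection tables differ.
SYMPTOM_RESPONSES = {
    "attitude_error_high": "switch_to_sun_pointing",
    "bus_undervoltage": "shed_noncritical_loads",
    "uplink_lost": "swap_receiver",
}


def respond(symptoms, telemetry):
    """Map each observed symptom to a corrective response.

    No attempt is made to find the faulty component; every event is appended
    to `telemetry` so that the ground can perform identification and isolation.
    Unknown symptoms fall back to a generic safe mode.
    """
    actions = []
    for symptom in symptoms:
        action = SYMPTOM_RESPONSES.get(symptom, "enter_safe_mode")
        actions.append(action)
        telemetry.append({"symptom": symptom, "action": action})
    return actions


downlink = []
print(respond(["bus_undervoltage", "unexpected_reset"], downlink))
```

The on-board code never asks which component caused the symptom; the logged telemetry is what the ground team works from.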
15
Safety-Critical High-Reliability Systems: Shuttles
  • Operational differences from planetary probes
      • Absolute certainty is required that no fault
        propagates to the effectors during a
        relatively short operation cycle
      • Rather than relying on fault monitors to
        interrupt processing and trigger a
        reconfiguration, several redundant strings
        are powered on and operate in parallel

16
Safety-Critical High-Reliability Systems: Shuttles
  • Voting occurs both in the General Purpose
    Computers (GPCs) and at the final effectors
  • Voting is much more brute force than fault
    monitoring, requiring more hardware but also
    providing greater fault coverage
  • Voting is much better suited to real-time,
    safety-critical maneuver control than the
    reconfiguration-oriented strategy of Viking,
    Voyager and Galileo

Conceptual Shuttle Orbiter Fault-Tolerant
Architecture
GPC: General Purpose Computer
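A simplified exact-match majority vote over redundant computer outputs is sketched below. It illustrates the voting principle only; it does not model the Shuttle's actual GPC output comparison or the effector-level voting, and `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(outputs):
    """Exact-match majority vote over redundant computer outputs.

    `outputs` maps a computer name to its output word. Returns the value
    produced by a strict majority and the sorted names of dissenting
    computers; raises ValueError if no strict majority exists.
    """
    value, votes = Counter(outputs.values()).most_common(1)[0]
    if 2 * votes <= len(outputs):
        raise ValueError("no strict majority")
    dissenters = sorted(name for name, out in outputs.items() if out != value)
    return value, dissenters


# Four redundant computers, one of which produces a bad output word.
print(majority_vote({"GPC1": 0x1F, "GPC2": 0x1F, "GPC3": 0x00, "GPC4": 0x1F}))
```

Unlike the reconfiguration loops of the planetary probes, the voted output is already correct in the frame in which the fault occurs, which is why voting suits real-time maneuver control.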
17
Mars Advanced Fault-Tolerant Computing Approach
Future Manned Mars Missions
  • Parallel-hybrid redundancy will be the basis
    for future long-life deep space missions
  • It combines the attractive features of parallel
    processing and redundant computation
  • Computational elements can be arranged to
    provide high throughput, ultra-high reliability
    or a combination of both, depending on the
    mission phase

18
Mars Advanced Fault-Tolerant Computing Approach
Future Manned Mars Missions
  • Parallel-hybrid redundancy was first used in
    1979, when the Fault-Tolerant Multiprocessor
    (FTMP) was designed and built
  • FTMP used a conventional shared-memory
    multiprocessor architecture
  • Each virtual processor consisted of three real
    processors working as a triad to provide
    real-time fault masking
  • Upon detection of a fault in a processor, the
    faulty unit is replaced from a pool of spares
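An FTMP-style virtual processor can be sketched as a triad whose members vote on every result, with any dissenting member retired and replaced from a shared spare pool. `VirtualProcessor` and the processor names are hypothetical, and resynchronizing a spare's state before it joins the triad is not modeled.

```python
from collections import Counter


class VirtualProcessor:
    """Hypothetical triad of real processors with replacement from spares."""

    def __init__(self, members, spares):
        self.members = list(members)  # the three real processors
        self.spares = spares          # shared spare pool, mutated in place

    def step(self, results):
        """Vote on one frame of results (processor name -> value).

        The majority value is returned immediately (real-time masking);
        any dissenting member is then replaced by a spare, if one is left.
        """
        value, votes = Counter(results[m] for m in self.members).most_common(1)[0]
        if votes < 2:
            raise RuntimeError("no majority: the fault cannot be masked")
        for m in list(self.members):
            if results[m] != value:
                self.members.remove(m)
                if self.spares:
                    self.members.append(self.spares.pop(0))
        return value


pool = ["P4", "P5"]
vp = VirtualProcessor(["P1", "P2", "P3"], pool)
print(vp.step({"P1": 5, "P2": 5, "P3": 6}), vp.members, pool)
```

If the pool runs dry, the triad simply degrades to a duplex, which can still detect but no longer mask a disagreement.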

19
Mars Advanced Fault-Tolerant Computing Approach
Future Manned Mars Missions
  • FTMP's implementation of parallel-hybrid
    redundancy had certain drawbacks
  • It was not explicitly designed to meet the
    rigorous requirements of Byzantine resilience
    (correctly functioning components of a
    Byzantine-resilient system reach the same group
    decisions regardless of the behavior of
    Byzantine-faulty components), which is necessary
    to provide
      • coverage of random hardware faults
      • ultra-high reliability
      • ease of validation
  • It was hard to expand because of the redundant
    bus connections between processors and main
    memory
  • It did not support mixed redundancy, because
    processors were arranged to work in triads
    regardless of the criticality of the application
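For reference, the classical lower bounds for Byzantine agreement are summarized below, where f is the number of simultaneously Byzantine-faulty components to be tolerated. These are general results (the 3f + 1 bound is due to Lamport, Shostak and Pease), not figures taken from the FTMP or FTPP design documents.

```latex
\begin{aligned}
N      &\ge 3f + 1 && \text{(fault containment regions)}\\
\kappa &\ge 2f + 1 && \text{(network connectivity: disjoint paths between regions)}\\
r      &\ge f + 1  && \text{(rounds of message exchange)}
\end{aligned}
```

For a single Byzantine fault (f = 1), this means at least four fault containment regions, at least three disjoint communication paths between any two of them, and two rounds of message exchange.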

20
Mars Advanced Fault-Tolerant Computing Approach
Future Manned Mars Missions
  • To overcome the deficiencies of FTMP, a new
    architecture called the Fault-Tolerant Parallel
    Processor (FTPP) was conceived
  • It is designed to meet all requirements for
    tolerating random hardware faults, including
    Byzantine resilience
  • FTPP will be the basis of fault tolerance for
    future manned Mars missions

FTPP Architecture
21
Mars Advanced Fault-Tolerant Computing Approach
Features of FTPP: Parallel Processing
  • Parallel processing is provided by
      • 40 Processing Elements (PEs) in 5 Fault
        Containment Regions (FCRs), 8 PEs per FCR
      • 2 Input/Output Controllers (IOCs) per FCR

FTPP Architecture
22
Mars Advanced Fault-Tolerant Computing Approach
Features of FTPP: Scalable Performance
  • Increasing the number of PEs in a single cluster
    creates a communication bottleneck in the
    Network Elements (NEs)
  • FTPP therefore scales performance
    hierarchically, by assembling clusters connected
    via IOCs

FTPP Architecture
23
Mars Advanced Fault-Tolerant Computing Approach
Features of FTPP: Mixed Redundancy
  • Most fault-tolerant computers are designed to
    operate only in a redundant mode, which wastes
    resources on non-critical tasks
  • FTPP allows the processing elements to be
    configured as
      • simplex: non-critical tasks
      • triplex: tasks that require real-time fault
        masking
      • quadruplex or higher: when two or more
        sequential faults must be tolerated within a
        short time window without the benefit of
        reconfiguration
  • The configuration in the figure uses all 40 PEs
    (4 × 4 + 3 × 3 + 15 = 40)
      • 4 quads
      • 3 triplexes
      • 15 simplexes

FTPP Architecture
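The figure's mix of groups can be checked with a small allocation sketch. It assumes that every member of a redundant group must sit in a different FCR, so that a single faulty FCR affects at most one member of any group, and it uses a simple greedy placement; `allocate` is a hypothetical helper, not FTPP's actual configuration software.

```python
def allocate(fcr_capacity, group_sizes):
    """Greedy placement of redundant groups onto fault containment regions.

    Assumed rule: all members of one group must be in different FCRs.
    Groups are placed largest first, each into the FCRs with the most free PEs.
    Returns one tuple of FCR indices per group, in the order of `group_sizes`.
    """
    free = list(fcr_capacity)
    placement = [None] * len(group_sizes)
    order = sorted(range(len(group_sizes)), key=lambda i: -group_sizes[i])
    for i in order:
        size = group_sizes[i]
        # FCRs with the most free PEs first; take one PE from each of `size` FCRs
        fcrs = sorted(range(len(free)), key=lambda f: -free[f])[:size]
        if len(fcrs) < size or free[fcrs[-1]] == 0:
            raise ValueError(f"cannot place a group of size {size}")
        for f in fcrs:
            free[f] -= 1
        placement[i] = tuple(sorted(fcrs))
    return placement


# The mix shown in the figure: 4 quads, 3 triplexes and 15 simplexes
# on 5 FCRs with 8 PEs each.
groups = [4] * 4 + [3] * 3 + [1] * 15
print(sum(groups), allocate([8] * 5, groups)[:4])
```

With five FCRs of 8 PEs each, the greedy pass places all 40 PEs without putting two members of any group in the same FCR.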
24
Mars Advanced Fault-Tolerant Computing Approach
Features of FTPP: Dynamic Reconfiguration
  • A mission consists of several phases, such as
    launch, ascent, cruise from Earth orbit to Mars,
    Mars orbit injection and Mars landing
  • Throughput, latency, iteration rate and
    criticality requirements vary over a wide range
    from phase to phase, so the architecture must be
    flexible
  • Reconfiguration from high throughput to high
    reliability
      • 3 PEs operating as independent simplex
        elements (S2, S3, S13) can be synchronized
        to run the same task
  • Replacing failed members
      • A simplex in the same FCR as the failed member
        is synchronized with the non-failed members of
        the virtual group (if channel A of Q1 fails,
        S2, S7 or S12 can replace it)

FTPP Architecture
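Both reconfiguration cases can be sketched with two helpers, assuming a table `fcr_of` that maps each PE to its FCR. The helper names are hypothetical; only the placement of S2, S7 and S12 in the same FCR as channel A of Q1 follows from the slide, and the FCRs given to S3 and S13 are made up for the example.

```python
def replace_member(group, failed_pe, simplexes, fcr_of):
    """Swap a failed group member for a simplex PE from the same FCR.

    `group` lists the PEs of a virtual group, `simplexes` the PEs currently
    running as independent simplex elements (mutated in place), and `fcr_of`
    maps each PE to its FCR. Picking a spare from the failed member's FCR
    keeps every group member in a different FCR. State resynchronization
    of the new member is not modeled.
    """
    target = fcr_of[failed_pe]
    for pe in simplexes:
        if fcr_of[pe] == target:
            simplexes.remove(pe)  # safe: we return right after removing
            return [pe if m == failed_pe else m for m in group]
    raise LookupError(f"no simplex PE available in FCR {target}")


def form_group(pes, simplexes, fcr_of):
    """Synchronize independent simplex PEs into one redundant group."""
    if len({fcr_of[pe] for pe in pes}) != len(pes):
        raise ValueError("members of a group must be in different FCRs")
    for pe in pes:
        simplexes.remove(pe)
    return list(pes)


# Hypothetical FCR map: Q1's channel A and S2, S7, S12 share FCR "A".
fcr_of = {"Q1a": "A", "Q1b": "B", "Q1c": "C", "Q1d": "D",
          "S2": "A", "S7": "A", "S12": "A", "S3": "B", "S13": "C"}
simplexes = ["S2", "S3", "S7", "S12", "S13"]
t1 = form_group(["S2", "S3", "S13"], simplexes, fcr_of)  # throughput -> reliability
q1 = replace_member(["Q1a", "Q1b", "Q1c", "Q1d"], "Q1a", simplexes, fcr_of)
print(t1, q1, simplexes)
```

Because S2 has already been absorbed into the new triplex, the replacement for channel A of Q1 comes from the remaining simplexes in FCR A (S7 here).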
25
Mars Advanced Fault-Tolerant Computing Approach
Features of FTPP: Low Fault-Tolerance Overhead
  • Frequently executed fault-tolerance functions,
    such as fault/error detection, error masking
    (voting) and synchronization, are implemented in
    the Network Element hardware
  • Less frequent functions, such as identification
    of faulty modules, reconfiguration and
    reintegration, are implemented in software that
    executes on the PEs
  • Each NE services 8 PEs

FTPP Architecture
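The split between frequent and infrequent functions can be sketched as two layers: a voter that runs on every message and reports a syndrome (the set of channels that disagreed), and slower software that accumulates syndromes to identify a faulty module. `ne_vote`, `FaultIdentifier` and the window and threshold values are hypothetical; in the FTPP the per-message voting is done by the network element itself, not by software like this.

```python
from collections import Counter, deque


def ne_vote(outputs):
    """Frequent path (per message): vote and report a syndrome.

    `outputs` maps channel name -> value. Assumes a majority exists,
    e.g. at most one faulty member in a triplex.
    """
    value, _ = Counter(outputs.values()).most_common(1)[0]
    syndrome = frozenset(ch for ch, v in outputs.items() if v != value)
    return value, syndrome


class FaultIdentifier:
    """Infrequent path (software on the PEs): flag modules that appear in
    too many recent syndromes. Window and threshold are hypothetical."""

    def __init__(self, window=10, threshold=3):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def record(self, syndrome):
        self.history.append(syndrome)

    def faulty_modules(self):
        counts = Counter(ch for s in self.history for ch in s)
        return sorted(ch for ch, n in counts.items() if n >= self.threshold)


# Channel C of a triplex is wrong in frames 0, 2 and 4 (an intermittent fault).
identifier = FaultIdentifier()
for frame in range(5):
    value, syndrome = ne_vote({"A": 1, "B": 1, "C": 0 if frame % 2 == 0 else 1})
    identifier.record(syndrome)
print(identifier.faulty_modules())
```

Every frame is masked by the vote as it happens, while the decision that C is faulty waits until enough evidence accumulates, the kind of work that can run at a lower rate on the PEs.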
26
Mars Advanced Fault-Tolerant Computing Approach
Features of FTPP: Open Architecture
  • FTPP provides an open architecture for both
    hardware and software, including
      • processors
      • I/O modules
      • fiber optic links
      • operating systems

FTPP Architecture
27
Mars Advanced Fault-Tolerant Computing Approach
Features of FTPP: Small Physical Size
  • Packaging technology is the key to meeting the
    weight, volume and power requirements
  • Multichip Modules (MCMs) will be used
  • An NE is implemented on a single MCM with an
    area of less than 4 cm²

FTPP Architecture
28
Conclusion
  • Future manned deep space missions will require
    reliable operation over years and real-time
    masking of critical faults
  • Current approaches are insufficient, so a new
    fault-tolerant approach is needed
  • FTPP is a powerful candidate for the spacecraft
    that will bring humans to Mars

29
References
  • Benjamin, A.L.; Lala, J.H., "Advanced fault
    tolerant computing for future manned space
    missions," 16th Digital Avionics Systems
    Conference (DASC), AIAA/IEEE, vol. 2,
    26-30 Oct. 1997, pp. 8.5-26 to 8.5-32
  • NASA website: Computers in Spaceflight: The
    NASA Experience,
    http://www.hq.nasa.gov/office/pao/History/computers/Ch6-2.html
  • NASA Jet Propulsion Laboratory website: Voyager:
    The Interstellar Mission,
    http://voyager.jpl.nasa.gov/spacecraft/index.html