1
Fault-Tolerant Design for Long-Life Deep Space
Missions

Yigit Kültür
2006702835

2
Contents

Introduction
Fault-Tolerant System Considerations and
Techniques
Historical Perspective
Future Approach
Conclusion

3
Introduction

Mars has recently been at the focal point
of astronomical attention because it will play
a key role in humanity's expansion into deep
space
Future Mars transportation will require reliable
operation over a lifespan of years, unlike
the Space Shuttle, which requires operation over
months
the Space Station, which is close enough to Earth
for maintenance logistics

4
Introduction

The long operation period associated with deep
space missions demands
Innovative fault-tolerant technology development
Application of advanced redundancy techniques
To enable Mars exploration, safety, reliability
and autonomy must be improved
A new technology plan is needed to guide the
development of the next generation of
fault-tolerant computing technology

5
Fault Tolerant System Considerations

Traditionally, avionic systems have achieved
fault tolerance through redundancy management
A redundancy management technique
Detects and isolates a failure
Performs hardware reconfiguration
A combination of self-monitoring and
cross-comparison strategies leads to
comprehensive fault coverage at reduced risk and
cost

6
Fault Tolerant System Considerations

Primary Flight Control System (PFCS) Baseline
Requirements
Mission reliability: 0.95 success probability at
10 years with no repair
Throughput: 100 million instructions per second
(MIPS)
Expandable I/O: 100 Mbits/sec
Expandable Memory: 1 GByte
Mass Storage Capacity: 1 Terabyte
Cycle Rate: 100 Hz
Hardware N-fail operation
Low life-cycle cost
Low power and mass
Radiation tolerance
Building block approach (look for existing
solutions to the parts of the problem and combine
them)
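A rough worked example of what these baseline numbers imply. The constant failure rate (exponential reliability model) is an assumption made here for illustration and is not stated in the requirements; the function and variable names are hypothetical.

```python
import math

def max_constant_failure_rate(reliability: float, years: float) -> float:
    """Largest failure rate (per year) that still meets the target,
    ASSUMING an exponential model R(t) = exp(-rate * t)."""
    return -math.log(reliability) / years

# Mission reliability requirement: 0.95 at 10 years with no repair
rate_per_year = max_constant_failure_rate(0.95, 10.0)
mttf_years = 1.0 / rate_per_year  # mean time to failure under the same model

# Throughput and cycle rate requirements: 100 MIPS at 100 Hz
instructions_per_second = 100e6
cycle_rate_hz = 100
cycle_period_ms = 1000 / cycle_rate_hz
instructions_per_cycle = instructions_per_second / cycle_rate_hz

print(f"failure rate <= {rate_per_year:.5f} per year")
print(f"MTTF >= {mttf_years:.0f} years")
print(f"{cycle_period_ms:.0f} ms and {instructions_per_cycle:.0f} instructions per cycle")
```

Under this assumption the system as a whole would need a mean time to failure of roughly 195 years, and each 10 ms control cycle has a budget of one million instructions.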

7
Fault Tolerant Techniques for Mars Applications

Ultra-reliable systems for long-life applications
such as human Mars exploration are required to
tolerate
Permanent faults
Transient (temporary) faults
Intermittent (not continuous) faults
Timing faults
Latent (hidden) faults
Worst-case fault scenarios with a lower
probability of occurrence

8
Fault Tolerant Techniques for Mars Applications

Distributed architectures are better suited to
long-life space applications, offering
Function integration
Parallel computation
Graceful performance growth
Selective technology upgrade
Appropriate levels of function reliability
Graceful degradation of system capabilities in
the presence of faults
Efficient use of hardware resources

9
Historical Perspective

Long-Life Unmanned Redundant Systems

Viking
Voyager
Galileo

10
Historical Perspective

Safety Critical High Reliability Systems

Columbia, Challenger, Discovery, Atlantis,
Endeavour

11
Long-Life Unmanned Redundant Systems: Viking

Viking is an instance of the pre-1970
Thermoelectric Outer Planets Spacecraft (TOPS)
concept
This spacecraft was the first to use a computer
as a fault manager, attempting to reconfigure and
restore the spacecraft to an operational
configuration
The fundamental strategy was to switch power on
and off to various alternative subsystems until
either the built-in fault monitoring indicated
that operation was restored or, in the case of
faults in the communication chain, commands from
Earth were detected
There was no real-time masking of faults, so if a
fault occurred during a maneuver, an incorrect
maneuver would be performed
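The power-switching strategy described above can be sketched as a simple search loop. This is an illustrative sketch only, not Viking flight logic: the function name, the callbacks and the simulated receiver units are all hypothetical.

```python
from typing import Callable, Optional, Sequence

def recover_by_power_switching(
    alternatives: Sequence[str],
    switch_power_to: Callable[[str], None],
    monitor_reports_ok: Callable[[], bool],
    ground_command_detected: Callable[[], bool],
) -> Optional[str]:
    """Cycle through alternative subsystems until operation is restored.

    Returns the alternative that was active when recovery was observed,
    or None if every alternative was tried without success.
    """
    for unit in alternatives:
        switch_power_to(unit)  # power this alternative on, the others off
        if monitor_reports_ok() or ground_command_detected():
            return unit
    return None

# Simulated spacecraft state (entirely made up for the demo).
powered = {"unit": "receiver_A"}
working_units = {"receiver_B"}  # receiver_A has failed

selected = recover_by_power_switching(
    alternatives=["receiver_A", "receiver_B"],
    switch_power_to=lambda unit: powered.update(unit=unit),
    monitor_reports_ok=lambda: powered["unit"] in working_units,
    ground_command_detected=lambda: False,
)
print(selected)
```

Note that nothing in this loop masks the fault while the search is running, which is the weakness the last point above describes.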

Figure: Viking Fault-Tolerant Architecture
CCS: Computer Command Subsystem; FDS: Flight Data
Subsystem

12
Long-Life Unmanned Redundant Systems: Voyager

Like Viking, Voyager is an instance of the
pre-1970 Thermoelectric Outer Planets Spacecraft
(TOPS) concept
It improves on Viking only in limited ways, such
as the addition of a pair of separate computers
for attitude and articulation control
Both used standby redundancy. The standby spares
were cross-strapped so that either unit could be
switched in to communicate with the other units
Cross-strapping and switching allowed
reconfiguration around failed components, either
automatically or by ground command

Figure: Voyager Fault-Tolerant Architecture
CCS: Computer Command Subsystem; FDS: Flight Data
Subsystem; AACS: Attitude and Articulation Control
Subsystem

13
Long-Life Unmanned Redundant Systems: Galileo

The Galileo mission is a follow-on to the Voyager
Jupiter fly-by mission
The Galileo design borrows heavily from the
experience of Voyager
Block redundancy (each subsystem is duplicated as
a whole block rather than at the component
level) is used throughout the subsystems
All subsystems except the CDS operate as
active/standby pairs
The CDS uses active redundancy, in which each
block can issue independent commands, or they can
operate in parallel on the same critical activity

Figure: Galileo Fault-Tolerant Architecture
CDS: Command and Data Subsystem; AACS: Attitude
and Articulation Control Subsystem

14
Long-Life Unmanned Redundant Systems: Galileo

The major departure from the Voyager architecture
is the extensive use of microprocessors and the
consequent use of a bus-oriented architecture to
facilitate communication among them
Galileo's on-board fault detection software is
designed to alleviate the effects and symptoms of
faults rather than to pinpoint the exact faults
Fault identification and isolation are performed
through ground intervention

Figure: Galileo Fault-Tolerant Architecture
CDS: Command and Data Subsystem; AACS: Attitude
and Articulation Control Subsystem

15
Safety Critical High Reliability Systems: Shuttles

Operational differences from planetary probes
The priority is being absolutely certain that no
fault propagates to the effectors during a
relatively short operation cycle
Rather than relying on fault monitors to
interrupt processing and go through a
reconfiguration, several redundant strings are
powered on and operated in parallel

16
Safety Critical High Reliability Systems: Shuttles

Voting occurs both in the General Purpose
Computers (GPCs) and at the final effectors
Voting is much more brute force than fault
monitoring, requiring more hardware but also
providing greater fault coverage
Voting is much better suited to real-time
safety-critical maneuver control than a
reconfiguration-oriented strategy as in Viking,
Voyager and Galileo
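A minimal sketch of majority voting over the outputs of redundant strings running in parallel. It assumes exact-match comparison of outputs and is not a description of the actual Shuttle voting logic; the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence

def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of strings, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

def disagreeing_strings(outputs: Sequence[Hashable], voted: Hashable) -> List[int]:
    """Indices of the strings whose output differs from the voted value."""
    return [i for i, out in enumerate(outputs) if out != voted]

# Four redundant strings compute the same command; string 2 is faulty.
commands = [10, 10, 99, 10]
voted = majority_vote(commands)
suspects = disagreeing_strings(commands, voted)
print(voted, suspects)
```

The faulty value never reaches the effector because it is outvoted in the same cycle, with no interruption for reconfiguration. The cost is that every string must be powered and computing all the time.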

Figure: Conceptual Shuttle Orbiter Fault-Tolerant
Architecture
GPC: General Purpose Computer

17
Mars Advanced Fault Tolerant Computing
Approach: Future Manned Mars Missions

Parallel-Hybrid Redundancy will be the basis for
future long-life deep space missions
It combines the attractive features of parallel
processing and redundant computation
Computational elements can be arranged to provide
high throughput, ultra-high reliability or a
combination of the two, depending on the mission
phase

18
Mars Advanced Fault Tolerant Computing
Approach: Future Manned Mars Missions

Parallel-Hybrid Redundancy was first used in 1979,
when the Fault Tolerant Multi-Processor (FTMP)
was designed and built
FTMP used a conventional shared-memory
multiprocessor architecture
Each virtual processor consisted of three real
processors working as a triad to provide
real-time fault masking
Upon detection of a fault in a processor, the
faulty unit is replaced from a pool of spares
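The triad-plus-spares idea can be sketched as follows. This is a simplified illustration under stated assumptions, not FTMP's actual design: processors are modeled as plain Python functions, outputs are compared for exact equality, and the `Triad` class is hypothetical.

```python
from collections import Counter
from typing import Callable, List

Processor = Callable[[int], int]

class Triad:
    """Three real processors acting as one virtual processor."""

    def __init__(self, members: List[Processor], spares: List[Processor]):
        if len(members) != 3:
            raise ValueError("a triad needs exactly three members")
        self.members = list(members)
        self.spares = list(spares)

    def run(self, x: int) -> int:
        outputs = [p(x) for p in self.members]
        voted, count = Counter(outputs).most_common(1)[0]
        if count < 2:
            raise RuntimeError("no majority among the three members")
        # Mask the fault now (return the voted value) and swap the
        # disagreeing member for a spare, if one is left in the pool.
        for i, out in enumerate(outputs):
            if out != voted and self.spares:
                self.members[i] = self.spares.pop(0)
        return voted

def good(x: int) -> int:
    return 2 * x

def faulty(x: int) -> int:
    return -1

triad = Triad(members=[good, faulty, good], spares=[good])
result = triad.run(21)
print(result, len(triad.spares))
```

The voted result is correct on the very cycle in which the fault appears, and the triad is back to three healthy members for the next cycle.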

19
Mars Advanced Fault Tolerant Computing
Approach: Future Manned Mars Missions

Parallel-Hybrid Redundancy as implemented in FTMP
had certain drawbacks
It was not explicitly designed to meet the
rigorous requirements of Byzantine resilience
(the correctly functioning components of a
Byzantine fault tolerant system can reach the
same group decisions regardless of what the
Byzantine faulty components do), which is
necessary to provide
Coverage of random hardware faults
Ultra-high reliability
Ease of validation
It lacked ease of expandability due to redundant
bus connections between processors and main
memory
It did not support mixed redundancy because
processors were arranged to work in triads
regardless of the criticality of the application
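For orientation, the classical requirements for Byzantine resilience can be written down as simple arithmetic. The bounds below are the standard results from the Byzantine agreement literature (without message authentication), not figures taken from these slides, and the function names are hypothetical.

```python
def byzantine_requirements(f: int) -> dict:
    """Classical minimums for tolerating f arbitrary (Byzantine) faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return {
        "min_participants": 3 * f + 1,  # independent fault containment regions
        "min_connectivity": 2 * f + 1,  # disjoint paths between participants
        "min_rounds": f + 1,            # rounds of message exchange
    }

def can_tolerate(participants: int, f: int) -> bool:
    """True if the participant count alone satisfies the 3f + 1 bound."""
    return participants >= 3 * f + 1

print(byzantine_requirements(1))
print(can_tolerate(3, 1), can_tolerate(4, 1))
```

By this count alone, three participants are not enough to agree in the presence of one arbitrarily faulty participant, while four are.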

20
Mars Advanced Fault Tolerant Computing
Approach: Future Manned Mars Missions

To remedy the deficiencies of FTMP, a new
architecture called the Fault Tolerant Parallel
Processor (FTPP) was conceived
It meets all requirements for covering random
hardware faults
FTPP will be the basis of fault tolerance for
future manned Mars missions

Figure: FTPP Architecture

21
Mars Advanced Fault Tolerant Computing
Approach: Features of FTPP: Parallel Processing

Parallel Processing is provided by
40 Processing Elements (PEs) in 5 Fault
Containment Regions (FCRs)
2 Input/Output Controllers (IOCs) per FCR

Figure: FTPP Architecture

22
Mars Advanced Fault Tolerant Computing
Approach: Features of FTPP: Scalable Performance

Increasing the number of PEs in a single cluster
creates a communication bottleneck in the Network
Elements (NEs)
FTPP relies on a hierarchical approach to scaling
performance by assembling clusters via IOCs

Figure: FTPP Architecture

23
Mars Advanced Fault Tolerant Computing
Approach: Features of FTPP: Mixed Redundancy

Most fault tolerant computers are designed to
operate in a redundant mode only, which wastes
resources on non-critical tasks
FTPP allows the processing elements to be
configured as
Simplex: non-critical tasks
Triplex: tasks that require real-time fault
masking
Quadruplex or higher: when two or more sequential
faults must be tolerated in a small time window
without the benefit of reconfiguration
In the figure
4 quads
3 triplexes
15 simplexes
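A quick check that the configuration listed for the figure accounts for the 40 PEs of a cluster, and how many independent tasks it supports compared with an all-triplex arrangement. The dictionary layout and function name are hypothetical; only the counts come from the slides.

```python
# Number of PEs consumed by one virtual group of each kind.
REDUNDANCY_LEVEL = {"simplex": 1, "triplex": 3, "quad": 4}

def processing_elements_used(groups: dict) -> int:
    """Total PEs needed for the given number of groups of each kind."""
    return sum(REDUNDANCY_LEVEL[kind] * count for kind, count in groups.items())

figure_config = {"quad": 4, "triplex": 3, "simplex": 15}

total_pes = processing_elements_used(figure_config)  # 16 + 9 + 15
virtual_groups = sum(figure_config.values())         # independent tasks
all_triplex_groups = 40 // REDUNDANCY_LEVEL["triplex"]

print(total_pes, virtual_groups, all_triplex_groups)
```

With mixed redundancy the same 40 PEs host 22 virtual groups, whereas forcing every task into a triplex would leave room for only 13.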

Figure: FTPP Architecture

24
Mars Advanced Fault Tolerant Computing
Approach: Features of FTPP: Dynamic
Reconfiguration

A mission consists of several phases such as
launch, ascent, cruise from Earth orbit to Mars,
Mars orbit injection and Mars landing
From phase to phase the throughput, latency,
iteration rate and criticality requirements vary
over a wide range, so the architecture must be
flexible
Reconfiguration from high throughput to high
reliability
3 PEs that are operating as independent simplex
elements can be synchronized to run the same task
(S2, S3, S13)
Replacing failed members
A simplex in the same FCR as the failed member is
synchronized with the non-failed members of the
virtual group (if channel A of Q1 fails, S2, S7
or S12 can replace it)
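The replacement rule above can be sketched as a lookup for simplexes in the failed member's FCR. The data layout is hypothetical: placing S2, S7 and S12 in FCR A follows from the example on the slide, while the placements of S3 and S13 and the member names of Q1 are assumptions made for the demo.

```python
from typing import Dict, List, Optional

def replacement_candidates(failed_fcr: str, simplexes: Dict[str, str]) -> List[str]:
    """Simplex PEs located in the same FCR as the failed member."""
    return [pe for pe, fcr in simplexes.items() if fcr == failed_fcr]

def replace_failed_member(
    group: Dict[str, str], failed_fcr: str, simplexes: Dict[str, str]
) -> Optional[str]:
    """Swap the failed member of a virtual group for a simplex in its FCR.

    `group` maps FCR name to the PE serving the group in that FCR.
    Returns the chosen PE, or None if that FCR has no simplex available.
    """
    candidates = replacement_candidates(failed_fcr, simplexes)
    if not candidates:
        return None
    spare = candidates[0]
    group[failed_fcr] = spare  # spare joins the group (would be synchronized)
    del simplexes[spare]       # and stops being an independent simplex
    return spare

# Simplex PE -> FCR it sits in (S3 and S13 placements are assumed).
simplexes = {"S2": "A", "S7": "A", "S12": "A", "S3": "B", "S13": "C"}
# Quad Q1 has one member per FCR (member names are made up).
q1 = {"A": "Q1-A", "B": "Q1-B", "C": "Q1-C", "D": "Q1-D"}

candidates = replacement_candidates("A", simplexes)
chosen = replace_failed_member(q1, "A", simplexes)
print(candidates, chosen)
```

Restricting the search to the same FCR keeps every member of the group in a different fault containment region after the swap.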

Figure: FTPP Architecture

25
Mars Advanced Fault Tolerant Computing
Approach: Features of FTPP: Low Fault Tolerance
Overhead

Frequent fault-tolerance-related functions such
as fault/error detection, error masking (voting)
and synchronization are implemented in the
Network Element (NE)
Less frequent functions such as identification of
faulty modules, reconfiguration and reintegration
are implemented in software that executes on the
PEs
Each NE services 8 PEs

Figure: FTPP Architecture

26
Mars Advanced Fault Tolerant Computing
Approach: Features of FTPP: Open Architecture

FTPP provides an open architecture for both
hardware and software, including
Processors
I/O modules
Fiber optic links
Operating systems

Figure: FTPP Architecture

27
Mars Advanced Fault Tolerant Computing
Approach: Features of FTPP: Small Physical Size

The key element in meeting the weight, volume and
power requirements is the packaging technology
Multi-Chip Modules (MCMs) will be used
An NE fits on a single MCM with an area of less
than 4 cm²

Figure: FTPP Architecture

28
Conclusion

Future manned deep space missions will require
reliable operation over years and real-time
masking of critical faults
Current approaches are not sufficient, and a new
fault tolerant approach is needed
FTPP is a strong candidate for the spacecraft
that will carry humans to Mars
