One-line presentation headline

About This Presentation
Author: Kathryn Weiss
Last modified by: rk
Created Date: 2/6/2004 4:48:55 PM
Document presentation format: On-screen Show (PowerPoint)

Slides: 19
Provided by: Kathry133
Learn more at: http://www.klabs.org


Transcript and Presenter's Notes



1
An Analysis of Causation in Aerospace Accidents
Kathryn Anne Weiss
weissk_at_mit.edu
http://www.mit.edu/weissk
Complex Systems Research Laboratory (CSRL)
Department of Aeronautics and Astronautics
Massachusetts Institute of Technology
Tuesday, September 7, 2004
This paper was presented at the Digital Avionics
Systems Conference in 2001. This paper and
similar papers on accidents, accident modeling
and accident reports can be found at
http://sunnyday.mit.edu/accidents/index.html
2
Recent Aerospace Losses
Ariane 5
Titan/Centaur/Milstar
SOlar Heliospheric Observatory (SOHO)
Mars Climate Orbiter
3
Ariane 5
  • June 4, 1996, 40 seconds after launch, the
    launcher veered off its nominal flight path and
    exploded
  • Reused the Inertial Reference System (IRS)
    software from Ariane 4 on the Ariane 5
  • The time sequence of the Ariane 5 lift-off is
    significantly different from that of the Ariane 4
  • A function was left in the Ariane 5 software for
    commonality reasons, based on the view that,
    unless proven necessary, it was not wise to make
    changes in software which worked well on Ariane
    4
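  • An unhandled exception (an overflow when
    converting a 64-bit floating-point value to a
    16-bit signed integer) was raised in the IRS
    software, causing the nozzles of the solid
    rocket boosters to deflect, which subjected the
    launcher to high aerodynamic loads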
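The conversion failure can be illustrated with a short sketch. It assumes a 16-bit signed target and uses invented input values (hypothetical, not taken from the inquiry report). `OperandError` is a stand-in name for the exception. The unprotected routine raises on an out-of-range input, much as the reused Ariane 4 code did. The saturating variant shows one possible guard.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Hypothetical stand-in for the unhandled conversion exception."""


def to_int16_unprotected(x: float) -> int:
    # No handler: an out-of-range input raises, mirroring the reused routine.
    n = int(x)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OperandError(f"{x} does not fit in a 16-bit signed integer")
    return n


def to_int16_saturating(x: float) -> int:
    # One possible guard: clamp to the representable range instead of raising.
    return max(INT16_MIN, min(INT16_MAX, int(x)))


within_old_envelope = 20_000.0  # hypothetical value that fits in 16 bits
beyond_old_envelope = 64_000.0  # hypothetical larger value on a faster trajectory
print(to_int16_unprotected(within_old_envelope))  # 20000
print(to_int16_saturating(beyond_old_envelope))   # 32767
try:
    to_int16_unprotected(beyond_old_envelope)
except OperandError as err:
    print("unhandled in flight:", err)
```

Clamping keeps the software running but silently reports a wrong value, so it is not a fix by itself. The systemic lesson in these slides is to analyze whether a reused function, and the assumptions behind it, still hold in the new system.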

4
Mars Climate Orbiter
  • Relied heavily on previous designs of Mars Global
    Surveyor (MGS) and Mars Pathfinder
  • There was an error of nearly 100 km in the
    spacecraft's navigation measurements, which
    resulted in a much lower altitude than expected
    during Mars Orbit Insertion (MOI) and led to the
    vehicle's break-up in the atmosphere
  • The conversion factor from English to metric
    units was erroneously left out of the AMD
    (angular momentum desaturation) files
  • The interface specification required that the
    impulse-bit calculations be done in metric
    units (newton-seconds)
  • The software that produced these files,
    supplied by a vendor, used English units
    (pound-force-seconds)
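A minimal sketch shows one way to make the unit travel with the number. Under this approach, a consumer bound by a metric-only interface specification must convert or reject each value explicitly. The `Impulse` record type and `impulse_bit_metric` function are hypothetical. The conversion factor is exact by definition (1 lbf = 0.45359237 kg × 9.80665 m/s²).

```python
from dataclasses import dataclass

# 1 lbf = 0.45359237 kg * 9.80665 m/s^2, exact by definition
NEWTONS_PER_POUND_FORCE = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical impulse record that carries its unit with the value."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> "Impulse":
        if self.unit == "N*s":
            return self
        if self.unit == "lbf*s":
            return Impulse(self.value * NEWTONS_PER_POUND_FORCE, "N*s")
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def impulse_bit_metric(record: Impulse) -> float:
    # The consumer never assumes a unit: it converts explicitly,
    # and an unrecognized unit is an error rather than a silent guess.
    return record.to_newton_seconds().value


vendor_record = Impulse(1.0, "lbf*s")  # hypothetical vendor output
print(impulse_bit_metric(vendor_record))  # about 4.448
```

If a value in pound-force-seconds were read as newton-seconds without conversion, the impulse would be understated by a factor of about 4.45.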

5
Titan/Centaur/Milstar
  • Mission to place Milstar in a geosynchronous
    orbit
  • Roll rate filter constant should have been
    entered as 1.992476, but was entered as
    0.1992476
  • Centaur/Milstar began experiencing instability
    about the roll axis during the first burn
  • The instability was greatly magnified during
    the Centaur's second main engine burn, resulting
    in vehicle tumbling
  • The Centaur attempted to compensate with its
    reaction control system (RCS), which ultimately
    depleted the available propellant
  • The third engine burn terminated early
  • Milstar satellite placed in a low elliptical
    final orbit
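One way to catch a misplaced decimal point before flight is to compare every loaded constant with an independently maintained reference value. Mismatches that are close to a power of ten can then be flagged as likely decimal-point errors. The sketch below assumes such a reference exists. `check_constant` and its tolerances are hypothetical; the values are the ones quoted on this slide.

```python
import math


def check_constant(name: str, loaded: float, reference: float,
                   rel_tol: float = 1e-6) -> str:
    """Compare a loaded constant with an independently specified reference value."""
    if math.isclose(loaded, reference, rel_tol=rel_tol):
        return f"{name}: OK"
    # Only a same-sign, nonzero pair can differ by an exact power of ten.
    if loaded != 0 and reference != 0 and (loaded > 0) == (reference > 0):
        exponent = math.log10(loaded / reference)
        k = round(exponent)
        if k != 0 and math.isclose(exponent, k, abs_tol=1e-6):
            return f"{name}: MISMATCH, off by a factor of 10^{k} (decimal-point error?)"
    return f"{name}: MISMATCH"


print(check_constant("roll_rate_filter_constant", 0.1992476, 1.992476))
```

A power-of-ten test is only a heuristic. The stronger safeguard is an independent review of every flight constant against its source.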

6
SOHO Background
  • SOHO, or the SOlar Heliospheric Observatory, is a
    joint effort between NASA and ESA to perform
    helioseismology and monitor the solar atmosphere,
    corona and wind
  • SOHO was launched on December 2, 1995, was
    declared fully operational in April of 1996, and
    completed a successful two-year primary mission
    in May of 1998
  • It then entered its extended mission phase
  • After roughly two months of nominal activity,
    contact with SOHO was lost on June 25, 1998

7
SOHO Loss (1/4)
  • The loss was preceded by a routine calibration of
    the spacecraft's three roll gyroscopes (named A,
    B and C) and by a momentum management maneuver
  • In order to increase the amount of science done
    during the mission and to extend the gyros'
    lifespans, a decision was made to compress the
    timeline of the operational procedures for
    momentum management, gyro calibration and science
    instrument calibration into one continuous
    sequence
  • The previous process had included a day between
    completing gyro calibration and beginning the
    momentum management procedures

8
SOHO Loss (2/4)
  • Because the gyro calibration in the new
    compressed timeline was immediately followed by a
    momentum management procedure, despinning the
    gyros at the end of the gyro calibration and
    re-enabling the on-board software gyro control
    function was not required
  • However, after the gyro calibration, Gyro A was
    specifically despun in order to conserve its
    life, while Gyros B and C remained active

9
SOHO Loss (3/4)
  • The modified predefined command sequence in the
    on-board control software had an error: it did
    not contain a necessary function to reactivate
    Gyro A, which was needed by the Emergency Sun
    Reacquisition (ESR)
  • This omission disabled the spacecraft's normal
    safe mode, ESR, and ultimately caused the
    sequence of events that led to the loss of
    telemetry
  • In addition, another software error left Gyro B
    in its high gain setting following the momentum
    management maneuver
  • This second error is what triggered the ESR in
    the first place
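A post-maneuver check can compare the commanded end state of each gyro with its reported state. Such a check would expose a setting left behind, such as Gyro B remaining in high gain. The sketch below is hypothetical: the dictionary layout, setting names, and expected values are illustrative assumptions, not SOHO's actual telemetry.

```python
from typing import Dict, List

Config = Dict[str, Dict[str, str]]  # gyro name -> setting name -> value


def verify_configuration(expected: Config, actual: Config) -> List[str]:
    """Return one message per setting whose reported value differs from the expected one."""
    problems = []
    for gyro, settings in expected.items():
        for setting, want in settings.items():
            have = actual.get(gyro, {}).get(setting, "<missing>")
            if have != want:
                problems.append(f"Gyro {gyro} {setting}: expected {want}, found {have}")
    return problems


# Hypothetical state after the momentum management maneuver
expected = {"B": {"gain": "normal"}, "C": {"gain": "normal"}}
reported = {"B": {"gain": "high"}, "C": {"gain": "normal"}}
for problem in verify_configuration(expected, reported):
    print(problem)  # Gyro B gain: expected normal, found high
```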

10
SOHO Loss (4/4)
  • The first error was contained within a software
    function called A_CONFIG_N
  • ESR requires the use of Gyro A for roll control
  • Any procedure that spins down Gyro A must set a
    flag in the computer to respin Gyro A whenever
    the safe mode is triggered
  • When A_CONFIG_N was modified, the software enable
    command was omitted due to a lack of system
    knowledge of the person who modified the
    procedure
  • Because the change had not been properly
    communicated, the operator procedures did not
    indicate that Gyro A had been spun down
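The rule on this slide is that despinning Gyro A must be paired with a flag to respin it when safe mode triggers. That is the kind of invariant that can be checked mechanically before a command sequence is used. The toy model below is a sketch under stated assumptions: the command names and the `GyroAState` type are hypothetical, and only the two commands relevant to the rule are modeled. It checks the invariant after every step, so a safe-mode trigger at any point in the sequence would still find Gyro A usable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GyroAState:
    spinning: bool = True
    respin_on_safe_mode: bool = False


def apply(state: GyroAState, command: str) -> None:
    # Hypothetical command names; every other command leaves Gyro A unchanged here.
    if command == "DESPIN_GYRO_A":
        state.spinning = False
    elif command == "ENABLE_GYRO_A_RESPIN":
        state.respin_on_safe_mode = True


def esr_can_use_gyro_a(state: GyroAState) -> bool:
    # ESR needs Gyro A for roll control: it must be spinning already,
    # or be flagged to respin as soon as safe mode is triggered.
    return state.spinning or state.respin_on_safe_mode


def check_sequence(commands: List[str]) -> str:
    """Simulate a command sequence, checking the ESR invariant after every step."""
    state = GyroAState()
    for step, command in enumerate(commands, start=1):
        apply(state, command)
        if not esr_can_use_gyro_a(state):
            return f"unsafe after step {step} ({command})"
    return "ok"


correct = ["GYRO_CALIBRATION", "ENABLE_GYRO_A_RESPIN", "DESPIN_GYRO_A"]
flawed = ["GYRO_CALIBRATION", "DESPIN_GYRO_A"]  # enable command omitted
print(check_sequence(correct))  # ok
print(check_sequence(flawed))   # unsafe after step 2 (DESPIN_GYRO_A)
```

Setting the flag before despinning, rather than after, leaves no window in which a safe-mode trigger would find Gyro A both stopped and unflagged.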

11
Lessons Learned
  • We can learn lessons from these and other (all
    very different) aerospace accidents by examining
    the factors common among them
  • These factors are systemic and indicative of many
    accidents involving aerospace software systems
  • Systemic factors can be grouped into the
    following categories:
      • Flaws in the Safety Culture
      • Ineffective Organizational Structure
      • Ineffective Technical Activities

12
Flaws in the Safety Culture
  • Overconfidence and Complacency
      • Success is, ironically, one of the progenitors
        of accidents
      • In SOHO, this led to inadequate testing and
        review of changes to ground-issued commands, a
        false sense of confidence in the team's ability
        to recover from an ESR, the use of challenging
        schedules, etc.
  • Discounting or Not Understanding Software Risks
      • An engineering culture that has unrealistic
        expectations about software and the use of
        computers
      • Changing software (as in SOHO) without
        introducing errors or undesired behavior is
        much more difficult than building correct
        software initially

13
Flaws in the Safety Culture (Cont.)
  • Assuming Risk Decreases over Time
      • In the Titan/Centaur/Milstar loss, the Titan
        Program Office decided that because software
        was mature, stable, and had not experienced
        problems in the past, they could use the
        limited resources available after the initial
        development effort to address hardware issues
  • Inadequate Emphasis on Risk Management
  • Incorrect Prioritization of Changes
  • Slow Understanding of the Problems Associated
    with Human-Automation Mismatch

14
Ineffective Organizational Structure
  • Diffusion of Responsibility and Authority
      • In almost all of the spacecraft accidents,
        there appeared to be serious organizational
        and communication problems among the
        geographically dispersed partners
  • Low-Level Status or Missing System Safety Program
      • The SOHO report makes no mention of any formal
        safety program
  • Limited Communication Channels and Poor
    Information Flow

15
Ineffective Technical Activities
  • Flawed or Inadequate Review Process
      • For SOHO, the changes to the ground-generated
        commands were subjected to very limited review
  • Inadequate Specifications
      • Software-related accidents are almost always
        due to misunderstandings about what the
        software should do
  • Inadequate System and Software Engineering
  • Software Reuse Without Appropriate Analysis of
    its Safety
      • Two of the spacecraft accidents, Titan and
        Ariane, involved reused software originally
        developed for other systems
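Reuse analysis can start with something simple: checking that the range of each input in the new system lies inside the range over which the reused component was validated. In the sketch below, `Envelope`, `reuse_findings`, the variable names, and all numeric ranges are hypothetical illustrations, not figures from the Ariane or Titan reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Envelope:
    """Range of one variable, either validated for a component or required by a system."""
    name: str
    low: float
    high: float

    def covers(self, other: "Envelope") -> bool:
        return self.low <= other.low and other.high <= self.high


def reuse_findings(validated: List[Envelope], required: List[Envelope]) -> List[str]:
    """List every required range that the reused component was not validated for."""
    by_name = {env.name: env for env in validated}
    findings = []
    for need in required:
        have = by_name.get(need.name)
        if have is None:
            findings.append(f"{need.name}: never validated")
        elif not have.covers(need):
            findings.append(
                f"{need.name}: needs [{need.low}, {need.high}], "
                f"validated only for [{have.low}, {have.high}]"
            )
    return findings


old_system = [Envelope("horizontal_velocity", -30_000, 30_000)]  # hypothetical
new_system = [Envelope("horizontal_velocity", -5_000, 60_000),   # hypothetical
              Envelope("roll_rate", -10, 10)]
for finding in reuse_findings(old_system, new_system):
    print(finding)
```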

16
Ineffective Technical Activities (Cont.)
  • Unnecessary Complexity and Software Functions
      • The Ariane 5 and Titan IVB-32 accidents clearly
        involved software that was not needed, but,
        surprisingly, the decision to put in or to keep
        these features (in the case of reuse) was not
        questioned in the accident reports
  • Inadequate System Safety Engineering
  • Test and Simulation Environments that do not
    Match the Operational Environment
      • A general principle in testing aerospace systems
        is to fly what you test and test what you fly
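"Test what you fly" can be partly automated by diffing the configuration that was tested against the configuration that will fly. The sketch assumes both configurations are available as flat key/value dictionaries. The function name and keys are hypothetical.

```python
from typing import Any, Dict, List


def config_differences(tested: Dict[str, Any], flight: Dict[str, Any]) -> List[str]:
    """Return the sorted keys whose values differ, or that appear in only one configuration."""
    missing = object()  # sentinel so a key absent on one side always counts as a difference
    keys = sorted(set(tested) | set(flight))
    return [k for k in keys if tested.get(k, missing) != flight.get(k, missing)]


tested = {"software_build": "1.4.2", "trajectory_profile": "previous_vehicle"}
flight = {"software_build": "1.4.2", "trajectory_profile": "new_vehicle",
          "gyro_count": 3}
print(config_differences(tested, flight))  # ['gyro_count', 'trajectory_profile']
```

An empty result does not prove the test environment is representative. It only confirms that the recorded settings match.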

17
Ineffective Technical Activities (Cont.)
  • Deficiencies in Safety-Related Information
    Collection and Use
  • Operational Personnel Not Understanding the
    Automation
      • The SOHO report says that the software enable
        function had not been included as part of the
        modification to A_CONFIG_N due to a lack of
        system knowledge of the person who modified
        the procedure
  • Inadequate and Ineffective Cognitive Engineering
    and Feedback
      • SOHO controllers did not have the information
        they needed about the state of the gyros and
        the spacecraft in general to make appropriate
        decisions

18
Conclusions
  • By examining recent, software-related aerospace
    accidents, we notice similarities, or systemic
    factors, involved in the losses
  • These similarities and parallels should help in
    focusing efforts to prevent future accidents