Title: Satellite Operations Risk Assessment Team
- Appendix B Best Practices
- Satellite Operations Risk Assessment (SORA) Team Product
- GSFC On-orbit Study Team Product
- Best Practices
- Key elements to reduce the risk and ensure mission safety for operating missions
  - Test the spacecraft and instruments as they will be operated, and operate the spacecraft and instruments as they were tested
  - Utilize state-of-the-art processors and system architectures to design robust safe holds initiated on board the spacecraft to reduce the time criticality of ground intervention
  - Test and verify all safe modes on the ground before launch
  - Identify, define, and document those remaining spacecraft and / or instrument failures requiring time-critical ground intervention, and document clear and concise recovery procedures for each
  - Utilize the Flight Operations Team (FOT) to operate the spacecraft and instruments during I&T (integration and test) prior to launch
  - Operate the mission as it was designed
  - Ensure the flight and ground software is frozen, under configuration control, well documented, stable, and thoroughly tested well before launch
  - Develop and maintain an Engineering Support Contact Plan and contact list to aid in rapid anomaly resolution
- When an anomaly occurs, stop, get the facts, and think. Assess the spacecraft health and safety and the time available to resolve the anomaly. Do not react more quickly than needed. Consult the sustaining engineering experts within the time that is available. Mission health and safety must take priority over temporary loss of data. It is better to lose a few hours of science data than the mission.
- Anomaly Handling Process
  - Assure spacecraft / instrument / component is in a safe operating mode
  - Determine anomaly severity
  - Determine whether anomaly is spacecraft or ground system based
  - Initiate anomaly notification process
  - Evaluate near-term impact on operations to determine need for immediate action
  - Gather plots / telemetry / event reports
  - Engineers assess situation and take appropriate follow-up action
  - Convene anomaly meeting including all appropriate players, and identify lead individual and responsibilities of rest of team
  - Reach consensus on action (always including latest available information and incorporating most current analysis)
  - Initiate recovery script development and coordination across all elements
  - Script review and revision as necessary
  - Simulation and revision as necessary
  - Recovery and analysis / verification
  - Execute corrective actions, ensuring necessary ops revisions (e.g., procedures, databases, etc.) are incorporated; verify expected results are attained; communicate results and future plans
  - Transition back to normal operations, ensuring that resulting system configuration, limitations, constraints, etc. are documented and communicated adequately
  - Wrap up ancillary paperwork (e.g., provide full anomaly resolution, lessons learned, etc., into center-wide processes)
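
The Anomaly Handling Process steps above can be tracked as an ordered checklist. The following is a minimal Python sketch, not a description of any GSFC tool: the step names paraphrase the bullets, and the `AnomalyRecord` class and its strict no-skipping rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Step names paraphrase the Anomaly Handling Process bullets (hypothetical wording).
ANOMALY_STEPS = [
    "assure safe operating mode",
    "determine severity",
    "determine spacecraft or ground origin",
    "initiate notification",
    "evaluate near-term impact",
    "gather plots / telemetry / event reports",
    "engineering assessment",
    "convene anomaly meeting",
    "reach consensus on action",
    "develop recovery script",
    "review script",
    "simulate",
    "recover and verify",
    "execute corrective actions",
    "transition to normal operations",
    "wrap up paperwork",
]


@dataclass
class AnomalyRecord:
    """Tracks progress of one anomaly through the checklist."""
    title: str
    done: list = field(default_factory=list)

    def next_step(self):
        """Return the first step not yet completed, or None when finished."""
        for step in ANOMALY_STEPS:
            if step not in self.done:
                return step
        return None

    def complete(self, step):
        """Mark a step done. Refusing to skip ahead is an assumption of this sketch."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.done.append(step)
```

A real process would likely allow some steps to run in parallel (for example, notification and data gathering); the strict ordering here only makes the "safe the spacecraft first" priority explicit.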
- Utilize an integrated team approach to mission design and operations
  - A review and reverification of the Operations Concept should be included as an integral part of all formal mission reviews, beginning with the Systems Requirements Review, at both the system and the element level (e.g., spacecraft, instrument, ground)
  - Evolve the spacecraft / instrument design and the operations concept in parallel during the development phase
  - Include the FOT and science team members early in the instrument design
  - Utilize the FOT to operate the spacecraft and instruments during the I&T process
  - Include the FOT in the science team discussions of mission changes and development of new procedures after launch
- Mission development team, in cooperation with the operations team, should prepare training materials for the operations phase which describe spacecraft and instrument systems, define the procedures for normal operation, identify processes for recognizing mission-threatening conditions, and highlight the contingency responses to spacecraft and instrument anomalies. These materials should be maintained and updated as the spacecraft ages and procedures change.
- The metrics for tracking the operations contractor's performance should emphasize mission health and safety. The current principal metric (percentage of science data captured) is, by itself, producing the opposite effect. Long-term mission success is driven by other factors such as
  - On-board and ground system anomalies
  - Overtime worked
  - Response time (e.g., special operations requests, discrepancy close-out, etc.)
  - Number of workarounds in place, average time workarounds remain in the system, etc.
  - Quality and timeliness of internal deliveries
  - Percentage of FOT certified to operate spacecraft
- Reduced staffing, automated paging, etc. assume that safeholds are safe and that spacecraft can be safely left in safeholds for extended periods and then recovered. If the safeholds are not inherently safe or will not remain safe during the time required for the FOT to return to the control center, there is no basis for reducing the FOT staffing and shift coverage. Safeholds should be tested and verified prior to staff and shift reductions.
- The key to controlling the operations cost is in the spacecraft and instrument design, with ground system automation also important
- The contingency preparedness process starts with the documentation of failure modes for each subsystem, event tables, Standard Operating Procedures (SOPs), contingency procedures, and command scripts. This is followed by an authorization process with formal go / no-go decisions from all cognizant personnel. After launch, established Anomaly Resolution Teams are used to respond to any subsystem anomaly.
- Automated Monitoring / Paging system characteristics
  - Provide remote access to operations network to assess the problems and the need for an immediate response
  - Minimize the user interfaces and complexity of the paging system
  - Rotate the on-call individual to relieve the burden on any one person
  - Ground system anomalies as well as flight system anomalies should trigger a page
  - Persistent paging (repeat until answered) should be used
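
The "persistent paging" and "rotate the on-call individual" characteristics above can be illustrated with a small Python sketch. Everything named here is hypothetical: `send_page` stands in for whatever paging service is used, escalating through the roster is an assumed policy, and `max_attempts` exists only so the example terminates. A real system would also wait between pages.

```python
from itertools import cycle


def page_until_answered(roster, send_page, max_attempts=10):
    """Page on-call staff repeatedly until someone acknowledges.

    roster: list of names, current primary on-call first (hypothetical).
    send_page: callable(name) -> bool, True if the page was acknowledged.
    Returns (name, attempts), or (None, attempts) if nobody answered.
    """
    if not roster:
        raise ValueError("roster must not be empty")
    attempts = 0
    # cycle() repeats the roster forever, so pages persist until answered.
    for name in cycle(roster):
        if attempts >= max_attempts:
            return None, attempts
        attempts += 1
        if send_page(name):
            return name, attempts


def rotate(roster):
    """Move the current primary to the end so the on-call burden is shared."""
    return roster[1:] + roster[:1]
```

The same function can be triggered by either a flight system or a ground system anomaly, which matches the recommendation that both should page.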
- Perform routine flight dynamics functions within the FOT, especially for FOTs remotely located relative to GSFC, and focus the FDF team on capability development, anomaly response, special cases, etc.
- Sponsor regular mission functions (business and / or social) to maintain the ties among the FOT, Science Team, Sustaining Engineering Team, etc.
- Realtime Health and Safety Monitoring Process
  - Planning / Preparation Process
    - Conduct pre-mission and follow-on meetings with spacecraft developer, subsystem discipline engineers, instrumenters, I&T, etc. to specifically define, on a mission, instrument, and special event basis, what parameters are the most important to be monitored in realtime (on a spacecraft and instrument configuration / state basis) and what one should be looking for in those parameters.
    - Conduct periodic re-education sessions with the realtime support personnel to review spacecraft, instrument, and ground system elements to identify impact of changes that affect realtime monitoring and decision making.
  - Pre-Pass Process
    - Generate pass manning schedule, prioritize acquisitions, resolve conflicts
    - Review spacecraft configuration, playback times, devise command plan
    - Brief remote tracking site, perform set-up
  - Pass Execution Process
    - Acquire telemetry and verify status of downlinks, recorders, transmitters
    - Verify spacecraft state; take corrective action as required
    - Verify command levels; take corrective action as required
    - Execute and verify realtime commands; take corrective action as required
    - Execute and verify stored command sequences; activate as required; take corrective action as required
    - Monitor and record data quality; take corrective action as required
    - Verify spacecraft health and safety using specified engineering pages and limits; compare spacecraft telemetry with state table for current configuration; annotate new limit violations; confirm old violations against engineering notes; take immediate and follow-up corrective action as required
    - Annotate realtime execution on procedures and logs as required
  - Post-Pass Process
    - Perform pass debrief
    - Follow up on required paperwork, including documentation of pass anomalies
- Playback Analysis and Trending Process
  - Ensure that key parameters for trending each subsystem / instrument have been selected and provided to the FOT by cognizant engineers.
  - There should be a clear description of each key parameter and the significance of its data readings and trends. Limits should be assigned to each key parameter, with specific instructions provided for FOT handling of out-of-limit occurrences.
  - Data sets and products should be defined for short- and long-term trending. The capability should exist to provide data in ASCII format for use in off-line engineering analysis. The data sets and products should be updated, as necessary, as the spacecraft configuration changes due to failures, etc.
  - Trending data products should be produced in a consistent format on a regular basis.
  - Ensure that trending data are reviewed, analyzed, and found acceptable by knowledgeable individuals (preferably subsystem / instrument engineers).
  - Unexpected trending results should be further analyzed, with potential impact evaluated across the full operations team, and procedures updated to reflect required operations changes to track future conditions relating to the unexpected results.
  - Periodic reports should be generated that discuss the trending analysis and highlight any areas of potential concern. These should be reviewed by a senior member of the technical or system engineering staff, with appropriate feedback provided to the FOT.
  - Ensure that results are analyzed for potential incorporation into other operational missions and / or development missions to lower risk and aid in reliability engineering.
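
The limit handling described above (assigning limits to key parameters, annotating new violations, and confirming old violations against engineering notes) can be sketched in Python as follows. The parameter names, limit values, and the `check_limits` function are illustrative assumptions, not taken from any mission database.

```python
def check_limits(telemetry, limits, known_violations):
    """Split out-of-limit parameters into new and previously known ones.

    telemetry: {parameter: value} for the current pass or playback.
    limits: {parameter: (low, high)} for the current spacecraft configuration.
    known_violations: set of parameters already covered by engineering notes.
    Returns (new, known), each a sorted list of parameter names.
    """
    new, known = [], []
    for name, value in sorted(telemetry.items()):
        if name not in limits:
            continue  # no limit assigned to this parameter
        low, high = limits[name]
        if low <= value <= high:
            continue  # within limits
        if name in known_violations:
            known.append(name)  # confirm against engineering notes
        else:
            new.append(name)    # annotate and follow up
    return new, known


# Hypothetical values for illustration only.
telemetry = {"BATT_V": 24.1, "WHEEL_TEMP": 61.0, "BUS_I": 3.2}
limits = {"BATT_V": (26.0, 34.0), "WHEEL_TEMP": (-10.0, 50.0), "BUS_I": (0.0, 10.0)}
new, known = check_limits(telemetry, limits, known_violations={"WHEEL_TEMP"})
```

Keeping `limits` keyed by configuration lets the table be swapped when the spacecraft state changes, in line with the recommendation to update data sets as the configuration changes.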
- Special Events Handling Process
  - Establish special event objectives, success criteria, and decision-making processes
  - Develop plan to go from initial state to target state, including generation of a timeline which coordinates all support elements and identifies procedures to be executed by the FOT
  - Develop contingency plans
  - Identify external support required
  - Ensure continuous knowledge of spacecraft state; avoid conducting special events in the blind
  - Develop and test any new or special procedures required for the event
  - Hold an event readiness review with all participants, management, and peers
  - Perform and verify successful execution of special event; refer to Anomaly Handling Process if unexpected results are encountered
  - Return spacecraft to normal operations and document results
- Configuration Management Process
  - Document change proposal on change request form
  - Provide systems engineering review by all elements (operations, ground software, database, hardware developers) to ensure impacts to total system are incorporated.
  - Prioritize change against other open change requests, maintenance, and enhancements.
  - Conduct peer review of proposed design of change
  - Ensure change is implemented and tested by the developers in test environment prior to transitioning to operations environment.
  - Ensure that changes are prioritized and scheduled based on need and planned future operations activities as well as available resources. Often a need exists to coordinate a change with a database update and / or ground software release in support of flight software. Releases may be periodic or, sometimes, emergency.
  - Ensure that operations plan is developed to release change into operations. Include back-out plan, request for engineering time on the observatory, and operations procedures updates (e.g., potential modifications to realtime monitoring and control procedures, etc.)
  - Conduct operational testing on off-line / parallel system, with simulator if available.
  - Hold engineering operations readiness reviews; managers, engineers, developers, users, and customers participate as appropriate.
  - Install change in approved time slot and closely monitor initial operations under modification to ensure successful operations are maintained; back-out plan allows for easy return to prior state if a problem occurs.
- Access to Engineering Expertise Process
  - Develop and maintain Contact Plan to access in-house and / or external engineering support to aid in sustaining engineering and anomaly resolution tasks.
  - Develop / maintain process for identifying situations when experts need to be contacted and how to determine which experts to notify. (This can be part of other operations or contingency plans.)
  - Maintain matrix of required areas of engineering expertise, names, and contact numbers. Include alternate names in vital areas.
  - Maintain contact with engineering experts (e.g., keep them in the loop on a routine basis through e-mail, provide trending reports, etc.) so they know the current status of their responsible subsystems.
  - Maintain list of engineering organization managers to access additional engineering resources as required (e.g., personnel, tools, models, etc.) beyond known mission-specific expertise maintained in first sub-bullet above.
  - Identify and maintain information / processes on how to reimburse experts' companies as they change positions / organizations.
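
The expertise matrix described above (areas, names, contact numbers, and alternates in vital areas) lends itself to a simple periodic check. The Python sketch below assumes a hypothetical layout in which each area maps to a list of `(name, phone)` pairs with the primary contact first; the area names and numbers are placeholders.

```python
def areas_missing_alternates(matrix, vital_areas):
    """Return vital areas that lack a primary or an alternate contact.

    matrix: {area: [(name, phone), ...]}, primary contact first (assumed layout).
    vital_areas: set of areas that must have at least two contacts.
    """
    missing = []
    for area in sorted(vital_areas):
        contacts = matrix.get(area, [])
        if len(contacts) < 2:
            missing.append(area)
    return missing


# Placeholder entries for illustration only.
matrix = {
    "power": [("A. Primary", "555-0100"), ("B. Alternate", "555-0101")],
    "attitude control": [("C. Primary", "555-0102")],
}
gaps = areas_missing_alternates(matrix, {"power", "attitude control", "thermal"})
```

Running such a check whenever the contact list is updated gives an early warning when an expert changes position and leaves a vital area without backup.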
- Operations Management Transition Process
  - Agreement developed during development phase to identify agreed-to roles and responsibilities of all operational support elements for all life-cycle phases (e.g., pre-launch, initial operational phases, during and following possible mission operations transition / hand-over, etc.) and preliminary transition plan / approach, if transition required. Agreement to be updated / refined during development phase.
  - Transition occurs after activation, testing, and check-out are complete, including such tasks as deployments, payload activation and outgassing, and demonstration of special operations (e.g., orbit maneuvers, attitude maneuvers, safe modes, etc.) to extent agreed to by all parties.
  - Performance testing completed. Performance specification for system (spacecraft, instruments, and ground) established pre-launch and verified for agreed-to period of time during actual operations.
  - Documentation (e.g., operations handbook, simulator, spacecraft handbook from vendor, etc.) completed and updated, reflecting check-out and on-orbit experiences.
  - System demonstrated to be stable and ready for on-going operations. Operations environment (command / telemetry database, trending plans, contingency operations procedures, command procedures) is updated to reflect on-orbit experiences, configured, and in place.
  - Training and certification of operations and sustaining engineering staff completed. Continuing training and processes developed and in place.
  - Configuration management requirements identified and implementation plan clearly established.
  - Ground system and flight software completed, delivered, and tested, including essential corrections and enhancements. Known problems documented and prioritized, with required workarounds defined and in place. Sustaining engineering plans in place, with agreed-to process for scheduling future deliveries.
  - Mechanisms / processes in place to access appropriate engineering expertise (e.g., in-house, development contractor, analysis support, etc.) to support routine operations and potential anomalous conditions. Summary list of known points of contact should be included.
  - Handover review held, documenting status / configuration of system at handover, known risks (including remaining open incident reports), and status and plans to address known issues and / or open discrepancies. Summary report, including actions and agreements reached at handover review, is documented and distributed, and actions are tracked.