Title: Time, Synchronization, and Data Transfer on the ISS
1 Time, Synchronization, and Data Transfer on the ISS
- Kevin Murphy
- The Boeing Company
2 Problem Statement
- How do you maintain data coherency in systems which use multi-ported memory?
  - Multi-ported memory is memory which can be read and written to by multiple different devices.
  - E.g. the CPU, a co-processor, Direct Memory Access (DMA), etc.
- This presentation provides a general discussion of how the International Space Station uses a synchronous architecture to maintain data coherency.
- This presentation includes a lesson learned from the International Space Station's experience of what can happen when processors get out of sync, and how difficult that situation can be to troubleshoot.
3 Definitions
- MDM: Multiplexer-DeMultiplexer, a configurable computer with a 386SX system board which can accommodate a variety of I/O boards
- SPD card: Serial Parallel Data card, an I/O card with a 186 co-processor, two separate 1553 a/b channel pairs, two other serial interfaces and a parallel interface
- Subframe: the 12.5 millisecond period between the 80 Hz interrupts
  - Numbered from 0-7
- Processing frame: a 100 millisecond time period, a collection of 8 consecutive subframes
  - Numbered from 0-99
- Minor frame: a 1 second collection of 10 processing frames
- Major frame: a 10 second collection of processing frames (100 processing frames)
- Boxcar: a 32 data word 1553 message
- Functional data: a data item which is used as an input by flight software on the Station (exclusive of displays)
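The frame hierarchy defined above is plain integer arithmetic, and a minimal sketch may make it concrete. It assumes time is counted in microseconds from a major frame boundary; the function name `locate` and the 0-9 minor frame index are illustrative choices, not taken from the flight software.

```python
SUBFRAME_US = 12_500                           # 12.5 ms between 80 Hz interrupts
SUBFRAMES_PER_FRAME = 8                        # subframes are numbered 0-7
FRAME_US = SUBFRAME_US * SUBFRAMES_PER_FRAME   # 100 ms processing frame
FRAMES_PER_MINOR = 10                          # minor frame = 1 second
FRAMES_PER_MAJOR = 100                         # major frame = 10 seconds, frame count 0-99


def locate(elapsed_us):
    """Return (frame_count, minor_frame, subframe) for a time in microseconds.

    Assumption: elapsed_us is measured from a major frame boundary.
    """
    frames_elapsed, offset_us = divmod(elapsed_us, FRAME_US)
    frame_count = frames_elapsed % FRAMES_PER_MAJOR   # wraps 99 -> 0
    minor_frame = frame_count // FRAMES_PER_MINOR     # hypothetical 0-9 index
    subframe = offset_us // SUBFRAME_US
    return frame_count, minor_frame, subframe


print(locate(1_234_567))  # 1.234567 s into a major frame -> (12, 1, 2)
```

The frame count wraps back to 0 after 100 processing frames, which is why it can serve as a commutation key over a full 10 second major frame.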
4 Good and Bad Decommutation
[Figure: 10 Hz, 1 Hz, and 0.1 Hz data blocks after a good and a bad decommutation. Color key: green = data from frame X-1, yellow = data from frame X, red = data from frame X+1.]
5 Overview of the MDM Hardware
[Diagram labels: Bus Interface Adapter, Bus Controller, 1553, Multi-Ported Memory, Remote Terminal]
6 Command and Data Handling System Summary Architecture
[Diagram, dated 01/28/2000. Labels recoverable from the figure:]
- Processors and MDMs: Crew Interface Processors; Command Control MDMs (3); Internal MDMs (2); External MDMs (2); Pwr Mgmt MDMs (2); Payload MDMs (2); GNC MDMs (2); RS Central Computers (3); HCZ MDMs (2); JEM Processors; APM Processors; MSS Processors; Orbiter Interface Units (2); PM MDMs (2); PM Equip - JD MDMs (2); Airlock MDM (1); S3/P3 MDMs (4); US Lab MDMs (3); STR/PTR MDMs (2); Node1 MDMs (2); FGB MDMs (2); Node2 MDMs (2); S1/S0/P1 MDMs (6); Node3 MDMs (2); CAM MDMs (2); PVCU MDMs (8); Hab MDMs (TBD) (2)
- Equipment: CT Equipment; Orb PCS; CHeCS Equip; SSRMS; RMS; SPDM; MBS; MT Equip; RS GNC Equip; US GNC Equip (Rate Gyros, CMGs, SIGIs); System Equip; Pri Power Dist; Sec Power Dist; Internal Payloads; External Payloads; Payloads; Payload Data Sys Equip; ECLSS Equip; ITCS Equip; External Mechs; IMPLMs; Airlock Equip; SARJ Equip; US Lab Equip; Node1/Z1/P6 Equip; Thermal Radiator Equip; Ext Therm Equip; Node2 Equip; Node3 Equip; FGB Equip; CAM Equip; CRV; PV Array Equip; Hab Equip (TBD)
- Note on the figure: Assy Restart Ops Only
7 Bus Profile
[Figure: bus transactions laid out across subframes 0-7 of one 100 ms processing frame, with DDCU, CBM, and SMCC transactions marked. Transactions shown:]
- RT to BC Data Acquisition Polls, 10 Hz (shown twice)
- RT to BC Status from PCS, 10 Hz, if active
- BC to RT Telemetry Transfer PCS1, 10 Hz
- BC to RT Telemetry Transfer PCS2, 10 Hz
- Telemetry BC to RT (US to RS), approx. every 0.4 sec
- RT to BC Cmd Poll, commands from PCS, 1 Hz
- BC to RT Broadcast Time, 1 Hz
- BC to RT Cmd Transfers to PCS, 1 Hz each PCS, on demand
- BC to RT DDCU Cmds, 1 Hz
- BC to RT USOS Status, 10 Hz
- BC to RT Data Load to PCS, 10 Hz maximum when operating
- BC to RT Cmd Transfers, 10 Hz
- RT to BC SM Cmd Poll, 1 Hz
- BC to RT Cmd Transfers, 1 Hz
- BC to RT Broadcast Ancillary Data, 10 Hz
- BC to RT Broadcast Sync With Data, 10 Hz
Note: Only the N1 MDM that is in the Primary State will communicate with the SMCC and PCS. The N1 MDM in the Secondary State will only have CBM and DDCU transactions. CB GNC-1 is connected to N1-1 and CB GNC-2 is connected to N1-2.
8 Processing Frames Synchronization
[Figure, not to scale: the 80 Hz interrupt shown against Tier 1 (CC MDM), Tier 2 (INT), and Tier 3 (NCS processor) processing frames. Annotations on the figure:]
  - At this time, Tier 2 is lagging the time in Tier 1 by, say, 20 µs.
  - At this time, Tier 3 is lagging the time in Tier 1 by, say, 10 µs, but leading the time in Tier 2 by 10 µs.
- The processing frame boundaries are lined up, and Interface Control Documents specify when the subaddresses are read, so it is a design function to avoid updating the subaddresses when the Bus Controller is scheduled to poll them.
- The synchronization tolerance for this system is +/- 350 µs. Note that this value is less than half the bit-times required for a full boxcar (approximately 753 µs).
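As a rough illustration of the tier offsets in the figure and of the tolerance just quoted, the sketch below computes the relative offset between each pair of tiers from their lag behind Tier 1. The function names are hypothetical. Checking every pair against the tolerance is a simplifying assumption, since the tolerance as described applies between a computer and its bus controller.

```python
SYNC_TOLERANCE_US = 350   # +/- tolerance quoted on the slide
BOXCAR_US = 753           # approximate bit-times for a full boxcar


def pairwise_offsets(lag_us):
    """Map (tier_a, tier_b) -> how far tier_b lags tier_a, in microseconds.

    lag_us maps a tier name to its lag behind Tier 1. A negative result
    means tier_b is leading tier_a.
    """
    names = sorted(lag_us)
    return {
        (a, b): lag_us[b] - lag_us[a]
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def in_sync(lag_us, tolerance_us=SYNC_TOLERANCE_US):
    """True when every pair of tiers is within the tolerance (simplification)."""
    return all(abs(d) <= tolerance_us for d in pairwise_offsets(lag_us).values())


lags = {"tier1": 0, "tier2": 20, "tier3": 10}   # example values from the figure
print(pairwise_offsets(lags))
print(in_sync(lags), SYNC_TOLERANCE_US < BOXCAR_US / 2)
```

With the figure's example values, Tier 3 comes out 10 µs ahead of Tier 2, and all three tiers sit well inside the tolerance.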
9 Cyclic Data Acquisition Packet Structure
[Figure: the logical packet structure (10 Hz data, 1 Hz data, 0.1 Hz data) and the physical packet allocation to Boxcar 1 and Boxcar 2. Annotations on the figure:]
- The first bit is the Loss of Sync indicator, set when delta time is more than 350 µs, or when the time broadcast has been absent for 3 seconds.
- The last 7 bits are the frame count (0-99), which is used as a commutation/decommutation key.
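The header layout described on this slide can be sketched as bit packing. The 16-bit width is the standard 1553 data word size; treating the "first bit" as the most significant bit and leaving the bits in between zero are assumptions, since the slide does not give the full layout. All function names are illustrative.

```python
LOS_BIT = 0x8000            # assumption: "first bit" = most significant bit
FRAME_COUNT_MASK = 0x007F   # last 7 bits carry the frame count (0-99)


def loss_of_sync(delta_us, seconds_since_broadcast):
    """LOS condition as stated on the slide."""
    return abs(delta_us) > 350 or seconds_since_broadcast >= 3


def pack_header(frame_count, los):
    """Build a 16-bit header word; the middle bits are left at zero (assumption)."""
    if not 0 <= frame_count <= 99:
        raise ValueError("frame count must be 0-99")
    return (LOS_BIT if los else 0) | frame_count


def unpack_header(word):
    """Return (los, frame_count) from a header word."""
    return bool(word & LOS_BIT), word & FRAME_COUNT_MASK


word = pack_header(99, loss_of_sync(delta_us=400, seconds_since_broadcast=0))
print(hex(word), unpack_header(word))
```

Seven bits can hold values up to 127, so the 0-99 frame count fits with room to spare.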
10 Data Decommutation
[Figure: from the perspective of the CC MDM's memory map. For each of the INT, EXT, and GNC MDMs, the map shows a 10 Hz data block, multiple 1 Hz data blocks, and multiple 0.1 Hz data blocks.]
- 10 Hz data block overwritten every processing frame
  - As such, 10 Hz data is immune to corruption due to commutation/decommutation errors
- 1 Hz data written to CVT over 10 processing frames
- 0.1 Hz data written to CVT over 100 processing frames
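A minimal sketch of decommutation keyed on the frame count follows. The CVT layout used here (one slot per processing frame, with the 1 Hz slot chosen by frame count modulo 10) is an assumption made for illustration. The slide states only that 1 Hz and 0.1 Hz data are spread over 10 and 100 processing frames.

```python
def new_cvt():
    """Hypothetical CVT image for one MDM: one slot per commutated slice."""
    return {"10Hz": None, "1Hz": [None] * 10, "0.1Hz": [None] * 100}


def decommutate(cvt, frame_count, data_10hz, data_1hz, data_01hz):
    """File one packet's three data blocks into the CVT using the frame count key."""
    cvt["10Hz"] = list(data_10hz)                   # overwritten every processing frame
    cvt["1Hz"][frame_count % 10] = list(data_1hz)   # complete after 10 frames
    cvt["0.1Hz"][frame_count] = list(data_01hz)     # complete after 100 frames


cvt = new_cvt()
decommutate(cvt, frame_count=12, data_10hz=[1], data_1hz=[2], data_01hz=[3])
print(cvt["10Hz"], cvt["1Hz"][2], cvt["0.1Hz"][12])
```

If the key does not match the frame the data was built in, the 1 Hz and 0.1 Hz slices land in the wrong slots and stay there until those slots are next rewritten. The 10 Hz block is simply replaced on the next frame.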
11 Synchronized Data Acquisition and Decommutation: The Nominal Case
- The RT builds its data packet in the output subaddresses, with the frame count of the frame it was built in located in word 1.
  - It does this at a time when the Bus Controller is not polling.
- The BC comes along later, after the subaddress has been completely updated, and commands the RT to transmit the data via the 1553 protocols.
- The BC uses the value of the frame count in the packet header to decommutate the data into its correct location.
- As long as the entire packet was updated prior to the 1553 chain commanding the RT to transmit the data, all is good.
Remember, this is multi-ported memory. The reads and writes must not be coincident.
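To illustrate why the reads and writes must not coincide, the sketch below models a packet as a list of (frame, value) words and shows what a reader sees if it takes the buffer while the writer is only part way through an update. This is a simplified model with hypothetical names, not the MDM's actual memory behavior.

```python
def snapshot(old_packet, new_packet, words_written):
    """Buffer contents seen by a reader after only the first
    `words_written` words have been replaced by the writer."""
    return new_packet[:words_written] + old_packet[words_written:]


def is_coherent(packet):
    """True when every word in the packet was built in the same frame."""
    return len({frame for frame, _ in packet}) == 1


old = [(41, v) for v in range(4)]   # packet built in frame 41
new = [(42, v) for v in range(4)]   # packet being built in frame 42

print(is_coherent(snapshot(old, new, 4)))   # update finished before the poll
print(is_coherent(snapshot(old, new, 2)))   # poll landed mid-update
```

In the mid-update case the first word already carries frame 42 while the tail still holds frame 41 data, so a reader keying on that first word would file stale data under the new frame count.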
12 Drift Compensation
- Drift compensation compares the value of the time broadcast message with the value in the MDM clock, and uses the delta time in a PID controller to determine the amount of correction to apply to the Real Time Clock via the drift compensation register.
  - The function runs at 1 Hz.
- This compensation is applied as a hexadecimal value ranging from 0 to 43 in the drift compensation register.
  - A value of 22 represents the null setting for the register.
  - Numbers greater than 22 slow down the oscillator.
  - Numbers less than 22 speed up the oscillator.
  - The resolution of this register is 10.43 µs per second.
- The oscillator frequency determines the spacing between the 80 Hz interrupts (i.e. the size of the subframes).
- The time value is used to determine the location of the subframe boundaries.
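The control loop described above might be sketched as follows. The gains, the class name, and the sign convention for the error are all hypothetical. The register limits read the slide's "hexadecimal value ranging from 0 to 43" literally as 0x00 to 0x43 with a null of 0x22, which is an interpretation rather than a confirmed fact.

```python
REG_MIN, REG_NULL, REG_MAX = 0x00, 0x22, 0x43   # assumed literal hex reading
STEP_US_PER_SEC = 10.43                         # register resolution from the slide


class DriftCompensator:
    """Illustrative PID loop, called once per second (the function runs at 1 Hz)."""

    def __init__(self, kp, ki=0.0, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd   # hypothetical gains
        self.integral = 0.0
        self.prev_error = None

    def update(self, mdm_clock_us, broadcast_time_us):
        """Return the register value to apply for the next second."""
        error = mdm_clock_us - broadcast_time_us   # > 0: MDM clock is ahead
        self.integral += error
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        correction = self.kp * error + self.ki * self.integral + self.kd * derivative
        steps = round(correction / STEP_US_PER_SEC)
        # Values above null slow the oscillator, so a clock that is ahead
        # gets a value above null; clamp to the register's range.
        return max(REG_MIN, min(REG_MAX, REG_NULL + steps))


comp = DriftCompensator(kp=0.5)
print(hex(comp.update(mdm_clock_us=1_100, broadcast_time_us=1_000)))
```

Because the output is clamped to the register range, the achievable correction rate is bounded, and the last value written keeps acting on the clock if updates stop.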
13 When the System is Poorly Synchronized
- The term "poorly synchronized" is used here instead of "Loss of Sync". The Loss of Sync (LOS) indicator is set when the RT computes delta times of greater than +/- 350 µs from its bus controller for at least 3 seconds, or when the time broadcast is absent for 3 seconds.
- This tolerance is less than half of a single boxcar's bit times (approx. 800 µs).
- This means the data can be reliably passed between BC and RT, even when the LOS bit is set.
- The data is corrupted only a small percentage of the time. If, for example, we are talking about a 2 boxcar status poll, that is about 1,600 µs out of the whole 100 millisecond frame where data from the RT is not transferred correctly. That is an exposure of 1.6%.
- When the system is out of sync, one computer's clock is drifting with respect to its bus controller's clock. The drift rate tells you how long you would have to wait for the data poll to be clear.
- In the above example, if the drift rate were 20 µs per second, it would take you just 80 seconds to drift out of the data corruption region.
- And it would take you 4920 seconds to drift back into it.
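The arithmetic in the example above can be reproduced in a few lines. The function name is illustrative, and the drift rate is assumed constant.

```python
FRAME_US = 100_000   # 100 ms processing frame
BOXCAR_US = 800      # approximate bit-times for one boxcar


def exposure(boxcars, drift_us_per_sec):
    """Return (fraction of frame exposed, seconds to drift out of the
    corruption region, seconds until drifting back into it)."""
    window_us = boxcars * BOXCAR_US
    fraction = window_us / FRAME_US
    seconds_to_clear = window_us / drift_us_per_sec
    seconds_until_return = (FRAME_US - window_us) / drift_us_per_sec
    return fraction, seconds_to_clear, seconds_until_return


print(exposure(boxcars=2, drift_us_per_sec=20))   # (0.016, 80.0, 4920.0)
```

The three results match the slide's figures: 1.6% exposure, 80 seconds to drift clear, and 4920 seconds before the poll drifts back into the corruption region.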
14 Node1 MDM Connectivity Diagram
[Diagram, dated 10/13/98: bus connectivity of MDMs N1-1 and N1-2. Labels recoverable from the figure:]
- Computers: MDM N1-1, MDM N1-2, MDM INT-1, MDM INT-2, MDM CC-1, MDM CC-2, MDM CC-3, MDM GNC-1, MDM GNC-2, MDM LA-1, MDM LA-2, MDM LA-3, MDM FGB-1, MDM FGB-2, MDM PVCU-2B, MDM PVCU-4B, RS Central Computer-1/2/3 (SM), OIU-1, OIU-2
- Buses: LB SYS-LAB-1, LB SYS-LAB-2, CB GNC-1, CB GNC-2, UB EPS-N1-23, UB EPS-N1-14, UB ORB-N1-1, UB ORB-N1-2
- Other labels: BIA, BC, RT, RPCMs, DDCUs (Z1-3B, Z1-4B), PCUs (Z1-3B, Z1-4B), CBMs (N1FWD, N1ZEN, N1STB, N1NAD, N1PRT, LAFWD; -S and -P), ECOMM, CTP, S-Band, Ku-Band, ACBSP-2, RFGRP-2, XPDR-2, SGTRC, Crew I/F, SM PCR-1/2, FGB PCR-1/2, Orb PCR, Orbiter, Mech, ITCS, ECLSS, EPS, PPA LAP6, PPA LAS6, PCA (USL), MCA-1 (USL), DCSU SCA, BCDU, SSU, ECU SAW, ECU BGA, PFCS, EETCS PFCS
Notes on the figure:
- These devices are connected to this bus after Node1-USL internal utilities connections are made on Flt 5A.
- These buses are controlled by INT MDMs after USL activation on Flt 5A.
- MDMs PVCU-2B and PVCU-4B are disconnected from these buses when P6 is relocated on Flt 13A.
- N1FWD CBMs are permanently disconnected from this bus after Node1-USL utilities connections are made on Flt 5A.
- These devices are connected to this bus prior to Node1-USL internal utilities connections on Flt 5A.
- These buses are controlled by CC MDMs after USL activation on Flt 5A.
- ECOMM CTP shares CBM N1STB bus stubs on a non-interference basis.
- PV P6 User Buses are as follows: PVCU-2B/4B: UB PVB-24-1, UB PVB-24-2.
15 ISS Lesson Learned
- The Node (NCS) computer is not programmed to shut itself off. There is no requirement for this, no code for this, nor has this been observed in any testing.
- One day, it shut itself off.
- The system (must have) reacted to corrupted data. So how do we get corrupted data?
16 NCS Background Information
- Power Commanding
  - The Node computers are cross-strapped as bus controllers on the two buses. They both need to control Remote Power Control Module (RPCM) switches as part of their heater control logic.
  - The interface between the two computers to accomplish this is that the MDM which is the RT on the bus (N1-1) sends a heater request word to N1-2. The request is in the form of a bit-mapped word: Word X => RPCM X, Bit Y => switch Y, 0 means open, 1 means close.
- Drift Compensation
  - If the INT goes away (fails, state transition, etc.) when both NCS MDMs are synchronized to the INT, the NCS MDMs will use their last computed value of the drift compensation register.
  - This means they will drift relative to each other, and relative to CCS.
- Input/Output
  - NCS updates its buffers for the next processing frame in the subframe after the BC polls for the current processing frame.
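The bit-mapped heater request described under Power Commanding can be sketched as a simple decode. The 16-bit word width, numbering bits from the least significant bit, and zero-based word and switch indices are assumptions. The slide gives only the Word X / Bit Y mapping and the meaning of 0 and 1.

```python
def decode_heater_request(words, bits_per_word=16):
    """Map (rpcm, switch) -> "close" or "open" for each bit of each request word.

    Assumptions: word index X selects RPCM X, bit Y (counted from the
    least significant bit) selects switch Y, 1 means close, 0 means open.
    """
    commands = {}
    for rpcm, word in enumerate(words):
        for switch in range(bits_per_word):
            commands[(rpcm, switch)] = "close" if (word >> switch) & 1 else "open"
    return commands


request = decode_heater_request([0b0101, 0b0000])
print(request[(0, 0)], request[(0, 1)], request[(0, 2)], request[(1, 0)])
```

Every bit in such a word is a live switch command with no redundancy, so a single wrong bit in a corrupted word reads as a valid open or close request.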
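17 Loss of Time Source
[Figure: delta time versus time for N1-1 and N1-2, showing the delta time of each clock from the reference and the delta time of the clocks from each other, diverging after the point marked "Loss of time source". Annotation: rate of divergence is less than or equal to 2 × 10.43 × 1 µs/sec.]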
- It is virtually impossible for the 2 MDMs to be oscillating exactly in phase and amplitude.
- At the worst case drift rate, the clocks can drift a subframe (12.5 millisec) apart in under 11 minutes. INT RM can take that long.
18 NCS Data Acquisition Paths
[Diagram labels: INT (time source), CCS, Sys-LAB-1, Sys-LAB-2, GNC-1, GNC-2, N1-1, N1-2, EPS-23, EPS-14, BC, RT. Annotations: "BC's version of time", "RT's version of time", "Load SRAM", "Read SRAM", "Delta time", "Different delta time".]
19 So What Happened
- During the INT software update, the NCS MDMs were out of communication with their BIA bus controller long enough for their clocks to drift more than a subframe apart (N1-1 relative to N1-2).
- This corrupted lots of data, and it had an effect because it corrupted functional data. One corrupted item was a status word with a bit set in it, which effectively requested N1-2 to power down the N1-1 MDM. That is why we noticed it.
- Future operational workarounds include sending a command to synchronize the two N1 MDMs to each other, and monitoring the delta time values, during any planned outage of the INT MDM.
- Other modifications will be made to the software to increase the robustness of the system, as part of our normal Software Product Improvement process, which will apply to unplanned outages (failures) of the INT MDM.
- Out of process changes (software patches) to address unplanned outages were not justifiable, as such outages are caused by scenarios more than two failures deep.
20 Conclusion
- On the ISS, there are unique features driven by assembly sequence requirements and other special connectivity (architectural) issues.
- These unique features lead to situations like the one just described. As we acquire operational experience with this vehicle, there will undoubtedly be others.
- The system, as designed, was safe. The offending computer shut itself off. Critical power system loads require 2 step commanding.
- Use of synchronization to assure reliable data delivery in large embedded realtime systems is an effective choice.