Title: SEE Validation of SEU Mitigation Methods for FPGAs
1 SEE Validation of SEU Mitigation Methods for FPGAs
Carl Carmichael (1), Sana Rezgui (1), Gary Swift (2), Jeff George (3), Larry Edmonds (2)
(1) Xilinx Corporation, San Jose, CA
(2) Jet Propulsion Laboratory, Pasadena, CA
(3) Aerospace Corporation, Albuquerque, NM
"This work was carried out in part by the Jet
Propulsion Laboratory, California Institute of
Technology, under contract with the National
Aeronautics and Space Administration."
"Reference herein to any specific commercial
product, process, or service by trade name,
trademark, manufacturer, or otherwise, does not
constitute or imply its endorsement by the United
States Government or the Jet Propulsion
Laboratory, California Institute of Technology."
2 XTMR SEE Testing
- Experiments were devised to focus TMR mitigation on major architectural elements of the Virtex-II FPGA.
- Sequential state machines were created with registers, multipliers, and memories.
- Configurable Logic Block
  - Combinatorial logic, sequential logic, arithmetic, multiplexing.
  - Design implementation is an array of counters.
- Multipliers
  - Dedicated 18 x 18 bit multiply function blocks.
  - Design implementation is an array of multiply-and-accumulate functions.
- Block Memories
  - Synchronous dual-port 18 kbit RAM blocks.
  - First design is a large memory block rewritten externally.
  - Second design is implemented as an array of ROMs initialized to incrementing values with internal EDAC.
3 Plot Definitions
- Predicted SEFI cross-section
  - Static and dynamic SEE characterization of the Virtex-II FPGA revealed several Single Event Functional Interrupt (SEFI) modes: POR (2.5E-06), SMAP (1.72E-06), IOB (4.2E-06).
  - These combined cross-sections represent the minimum functional error cross-section for a single Virtex-II (XQR2V6000) device on orbit.
- Worst Case Orbital Upset Rate
  - The CREME96 calculation of the worst-case orbital upset rate for an XQR2V6000 is 7,740 bit-errors/day (9E-02 bit-errors/sec) in a GEO orbit at 36,000 km during the worst day of an anomalously large solar flare, accounting for both heavy ions and protons. In a 40 MeV Kr beam the same upset rate is achieved with a flux of 1.25E-01 p/cm2/s. The equivalent upset rates for all other orbits and solar conditions therefore lie to the LEFT of this line.
- Single Event Functional Interrupts
  - This is the average cross-section of the SEFIs observed while collecting the data represented in the plot. This cross-section is not flux dependent. Variations from the predicted value are due to the limited statistical significance of the total fluence accumulated during each test.
- Functional Errors
  - Data plot of the observed events in which the Device Under Test (DUT) returned an incorrect result. The cross-section is the number of error events divided by the total fluence at the specified flux. TMR denotes that the DUT design was fully mitigated with XTMR and scrubbing. The unmitigated results were obtained with a functionally identical design without XTMR; scrubbing was also used for the unmitigated test.
- Extrapolation
  - A derived function describing mitigation failure as a function of upset rate. Extending the function predicts functional error cross-sections at worst-case orbital upset rates to be less than the SEFI cross-sections.
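The plot definitions reduce to simple arithmetic. The following is a minimal sketch and not part of the original slides: the function names are hypothetical, and the SEFI cross-sections are assumed to be in cm2 per device, since the slide gives no unit.

```python
# Illustrative arithmetic only; names are hypothetical, values are from the slide.
SECONDS_PER_DAY = 86_400

# SEFI mode cross-sections from the slide (assumed to be cm^2 per device).
SEFI_MODES = {"POR": 2.5e-06, "SMAP": 1.72e-06, "IOB": 4.2e-06}


def combined_sefi_cross_section(modes):
    """Sum the per-mode cross-sections into one device-level SEFI cross-section."""
    return sum(modes.values())


def per_day_to_per_second(rate_per_day):
    """Convert an upset rate from events/day to events/second."""
    return rate_per_day / SECONDS_PER_DAY


def cross_section(error_events, fluence):
    """Cross-section = observed error events / total fluence (particles/cm^2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return error_events / fluence


if __name__ == "__main__":
    sefi = combined_sefi_cross_section(SEFI_MODES)
    print(f"combined SEFI cross-section: {sefi:.3g}")
    print(f"worst-case GEO upset rate: {per_day_to_per_second(7740):.2g} bit-errors/s")
```

The three SEFI modes sum to 8.42E-06, and 7,740 bit-errors/day converts to roughly 9E-02 bit-errors/sec, matching the figure quoted on the slide.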
4 PLOT 1
[Plot. X-axis: Configuration Bit Errors per Scrub Cycle, decade ticks from 3.5E-02 to 3.5E03. Beam: 40 MeV Kr, LET 22.3 MeV-cm2/mg. Annotations: 36,000 km GEO orbit, worst-day solar flare, 8,000 bit-errors/day; all other orbits.]
- SEFIs drive error rate for all designs and all orbits.
- Mitigation errors on orbit are always less than SEFI errors by orders of magnitude.
5 PLOT 2
[Plot. X-axis: Configuration Bit Errors per Scrub Cycle, decade ticks from 3.5E-02 to 3.5E03. Beam: 40 MeV Kr, LET 22.3 MeV-cm2/mg. Annotations: 36,000 km GEO orbit, worst-day solar flare, 8,000 bit-errors/day; all other orbits.]
- SEFIs drive error rate for all designs and all orbits.
- Mitigation errors on orbit are always less than SEFI errors by orders of magnitude.
6 PLOT 3
[Plot. X-axis: Configuration Bit Errors per Scrub Cycle, decade ticks from 3.5E-02 to 3.5E03. Beam: 40 MeV Kr, LET 22.3 MeV-cm2/mg. Annotations: 36,000 km GEO orbit, worst-day solar flare, 8,000 bit-errors/day; all other orbits.]
- SEFIs drive error rate for all designs and all orbits.
- Mitigation errors on orbit are always less than SEFI errors by orders of magnitude.
7 Improved SEE Test Methodology for Mitigation
- There is an expected physical relationship between the functional error rate of a mitigated system and the upset rate. The relationship predicts the increasing probability of upsetting bit combinations that will cause a mitigated (TMR) system to fail as the bit upset rate increases. Its terms are:
  - R: Mitigation error rate
  - M: Number of groups of relevant bits
  - NB: Average number of relevant bits per group
  - TC: Scrub time
  - r: Upset rate of relevant bits
- Therefore, testing at extremely high fluxes, varied over several orders of magnitude, can be performed to reveal this functional relationship between mitigation error rate and bit upset rate.
- This function can then be extrapolated to make predictions at the much lower upset rates of Earth orbits.
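As a rough illustration of why such a relationship is expected, the following is a simplified low-rate sketch and not the authors' derived function. It assumes that r is a per-bit rate, that upsets arrive independently (Poisson statistics), that each group has one block of NB relevant bits in each of the three TMR modules, that scrubbing clears all upsets once per interval TC, and that a group fails only when at least two of its three blocks are upset within the same interval. The symbol p (probability that a block is upset within one scrub interval) is introduced here for convenience.

```latex
\begin{align}
p &= 1 - e^{-N_B\, r\, T_C} \\
R &\approx \frac{M}{T_C}\left(3p^2 - 2p^3\right) \\
R &\approx 3\, M\, N_B^2\, r^2\, T_C \qquad \text{for } N_B\, r\, T_C \ll 1
\end{align}
```

Under these assumptions the mitigation error rate grows with the square of the upset rate, so the functional error cross-section (R divided by flux) falls in proportion to the flux as the flux decreases. That scaling is what would allow high-flux measurements to be extrapolated down to orbital upset rates.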
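8 Mitigation System Topology
[Diagram: three redundant modules (Module 1, Module 2, Module 3), each partitioned into Group 1, Group 2, ..., Group M. Group M consists of Block (M,1), Block (M,2), and Block (M,3), one block per module, each labeled NM bits.]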
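To illustrate the topology, the sketch below models one group as three redundant blocks (one per module) whose outputs pass through a bitwise majority voter. It is an assumed simplification for illustration, not the XTMR implementation; `majority_vote` and `group_fails` are hypothetical names.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant words."""
    return (a & b) | (a & c) | (b & c)


def group_fails(correct: int, blocks) -> bool:
    """A group fails when the voted output differs from the correct value."""
    a, b, c = blocks
    return majority_vote(a, b, c) != correct


if __name__ == "__main__":
    good = 0b1010
    flipped = good ^ 0b0100  # same word with one bit upset

    # One upset block is outvoted by the other two blocks of the group.
    print(group_fails(good, (good, flipped, good)))

    # The same upset in two blocks of one group defeats the voter.
    print(group_fails(good, (flipped, flipped, good)))
```

A single upset block is masked, while matching upsets in two blocks of the same group propagate to the output. This is the bit combination that a mitigated system must accumulate within one scrub cycle in order to fail.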
9 Probability Function Fit for Counter Data
- M = 9224
- Ni = 200 (same number of bits in each block)
- Sigma per bit = 2.1E-8 cm2
- TC = 0.266 sec
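The sketch below shows how these parameters could feed a simple probability model. It is not the authors' fit function: it assumes a 2-of-3 failure model in which a group fails when at least two of its three blocks are upset within one scrub interval, independent Poisson upsets, and a per-bit upset rate equal to sigma per bit times flux. All function names are hypothetical.

```python
import math

# Parameters from the slide (counter design).
M = 9224            # number of groups
N_I = 200           # bits in each block
SIGMA_BIT = 2.1e-8  # cm^2 per bit
T_C = 0.266         # scrub time, seconds


def mitigation_error_rate(flux: float) -> float:
    """System errors/second under the assumed 2-of-3 model (not the authors' fit)."""
    r = SIGMA_BIT * flux                 # assumed per-bit upset rate, 1/s
    p = -math.expm1(-N_I * r * T_C)      # P(a block is upset within one scrub cycle)
    q = 3 * p**2 - 2 * p**3              # P(at least 2 of 3 blocks in a group upset)
    if q >= 1.0:
        return 1.0 / T_C                 # saturated: a failure in every scrub cycle
    p_system = -math.expm1(M * math.log1p(-q))  # 1 - (1 - q)**M, numerically stable
    return p_system / T_C


def error_cross_section(flux: float) -> float:
    """Functional error cross-section in cm^2: error rate divided by flux."""
    if flux <= 0:
        raise ValueError("flux must be positive")
    return mitigation_error_rate(flux) / flux


if __name__ == "__main__":
    for flux in (1.25e-1, 1.0e2, 1.0e4, 1.0e6):
        sigma = error_cross_section(flux)
        print(f"flux {flux:9.3g} p/cm2/s -> sigma {sigma:.3g} cm2")
```

At low flux the per-cycle system failure probability 1 - (1 - q)^M is approximately M q, so the modeled cross-section scales linearly with flux. With these assumptions the sketch gives a cross-section on the order of 1E-08 cm2 at a flux of 1.25E-01 p/cm2/s, the flux quoted earlier as matching the worst-case GEO upset rate.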
10 Conclusions
- Efficiency and accuracy of validating mitigation techniques are greatly improved by demonstrating the upset-rate dependency of the mitigation method, testing at flux rates that overwhelm the mitigation.
- The static SEFI cross-section is the dominating factor for calculating orbital error rates for any Virtex-II design when mitigated with full XTMR and scrubbing.
- Additional Work
  - Self-scrubbing BlockRAMs
  - Self-scrubbing FPGA configuration
  - Soft-core processors (e.g. MicroBlaze)