Title: The Challenges and Benefits of Partial Mitigation of FPGAs
Microelectronics Reliability and Qualification Workshop, Manhattan Beach, CA
- Michael Wirthlin, Brian Pratt and Keith Morgan
- wirthlin_at_ee.byu.edu
- Associate Professor
- Department of Electrical and Computer Engineering
- Brigham Young University
This work was supported by Los Alamos National Laboratory, U.S. Department of Energy (LA-UR 02-3820), Deployable Adaptive Systems Project (DAPS), and by the NASA Earth-Sun System Technology Office as a subcontract through USC-ISI East.
Benefits of SRAM FPGA-based Processing
- FPGAs can be customized to application-specific algorithms
  - Customizable datapath for application-specific computations
  - Often faster and more efficient than a programmable processor
  - Provide high-density logic (over 1 million gates)
- FPGAs are reprogrammable
  - Can be configured after the spacecraft has been launched
  - FPGA resources can be used for multiple instruments, missions, or changing spacecraft objectives
  - Errors in an FPGA design can be repaired while in orbit
  - Permanent faults in FPGAs can be avoided
- Challenge: exploit the benefits of programmable FPGAs safely in a harsh radiation environment
SEU Sensitivity
- SEUs from heavy ions and trapped protons
- SEU heavy-ion cross-section
  - CLB flip-flop: 6.5E-8 cm²/bit
  - Configuration latch: 8.0E-8 cm²/bit
  - Block RAM bit: 1.6E-7 cm²/bit
- Estimated SEU rate (1000 km orbit)
  - Quiet Sun: 0.4/hour
  - Flare enhanced: 3.2/hour
  - Peak rate (South Atlantic Anomaly region): 12.6/hour
- Design techniques for SEU mitigation are essential
Carmichael, Fuller, Blain, and Caffrey, "SEU Mitigation Techniques for Virtex FPGAs in Space Applications," 3rd Annual Conference on Military and Aerospace Programmable Logic Devices (MAPLD), September 1999.
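The per-hour rates above translate directly into a mean time between upsets and, if we assume upsets arrive as a Poisson process (our modeling assumption, not a claim from the slide), into the chance of at least one upset within a given window. A minimal sketch; the function names and the 1-minute window are illustrative only:

```python
import math

# Estimated SEU rates (upsets per hour) at a 1000 km orbit, from the slide above.
SEU_RATES_PER_HOUR = {
    "Quiet Sun": 0.4,
    "Flare enhanced": 3.2,
    "Peak (SAA region)": 12.6,
}


def mean_hours_between_upsets(rate_per_hour: float) -> float:
    """Mean time between upsets for a constant upset rate."""
    return 1.0 / rate_per_hour


def prob_at_least_one_upset(rate_per_hour: float, window_hours: float) -> float:
    """Poisson probability of one or more upsets within window_hours (assumed model)."""
    return 1.0 - math.exp(-rate_per_hour * window_hours)


if __name__ == "__main__":
    window = 1.0 / 60.0  # illustrative 1-minute window
    for condition, rate in SEU_RATES_PER_HOUR.items():
        print(f"{condition:18s} mean {mean_hours_between_upsets(rate):6.2f} h between upsets, "
              f"P(>=1 upset per minute) = {prob_at_least_one_upset(rate, window):.4f}")
```

At the quiet-Sun rate the mean interval is 2.5 hours; at the SAA peak it drops to under 5 minutes.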
SRAM FPGA Architecture
[Figure: SRAM FPGA architecture with routing matrices]
SRAM FPGA Configuration Bits
[Figure: configuration bitstream bits mapped onto a look-up table (LUT), a user flip-flop (FF), and the routing matrix]
- 5,603,456 configuration bits
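To make the LUT part of the figure concrete, here is a minimal sketch that assumes a generic 4-input LUT whose 16 configuration bits are its truth table (bit i is the output for input pattern i). The names are hypothetical, and no real Virtex bitstream layout is modeled. Flipping one configuration bit changes the implemented function for exactly one input pattern:

```python
def lut_eval(config: int, inputs: tuple) -> int:
    """Evaluate a hypothetical 4-input LUT.

    `config` holds 16 configuration bits; bit i is the output for the input
    pattern whose binary value is i (inputs[0] is the least-significant bit).
    """
    index = inputs[0] | (inputs[1] << 1) | (inputs[2] << 2) | (inputs[3] << 3)
    return (config >> index) & 1


def truth_table(config: int) -> list:
    """Outputs for all 16 input patterns, in pattern order 0..15."""
    return [lut_eval(config, tuple((i >> k) & 1 for k in range(4))) for i in range(16)]


AND4 = 1 << 15           # 4-input AND: only pattern 15 (all ones) outputs 1
UPSET = AND4 ^ (1 << 3)  # an SEU flips configuration bit 3

if __name__ == "__main__":
    changed = [i for i, (a, b) in enumerate(zip(truth_table(AND4), truth_table(UPSET))) if a != b]
    print("patterns whose output changed:", changed)  # [3], i.e. inputs (1, 1, 0, 0)
```

If the design never applies pattern 3 to this LUT, the upset is a don't-care and the bit does not affect design behavior.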
FPGA Design
FPGA Design: Routing Upset
FPGA Design: Logic Upset
Configuration Sensitivity
- Not all FPGA configuration bits affect design behavior
  - Many logic/routing resources are unused by a design
  - Many don't-care conditions within a design
- The configuration sensitivity of an FPGA design is the number of FPGA configuration bits that affect the design behavior
  - Dependent on the design style and density
  - Only sensitive configuration bits will cause a design to fail when upset
- The configuration sensitivity metric will be used to evaluate design reliability
- Several methods for measuring configuration sensitivity
  - Fault injection (BYU Fault Injection)
  - Dynamic radiation testing
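Fault injection measures sensitivity by flipping one configuration bit at a time and comparing the design's outputs against a golden run. The sketch below does this exhaustively for a toy 32-bit "bitstream" made of two hypothetical LUTs, one of them unused and the other with one input tied low. It only illustrates the metric; it is not the BYU fault-injection system, which operates on real device bitstreams.

```python
from itertools import product


def lut_eval(config: int, a: int, b: int, c: int, d: int) -> int:
    """16-bit truth-table LUT; the selected bit index is a + 2b + 4c + 8d."""
    return (config >> (a | (b << 1) | (c << 2) | (d << 3))) & 1


def design_outputs(bitstream: int) -> list:
    """Toy design: bits 0-15 configure LUT0, bits 16-31 configure an unused LUT1.

    LUT0's fourth input is tied low, so truth-table entries 8-15 are don't-cares.
    """
    lut0 = bitstream & 0xFFFF  # bits 16-31 are never read: an unused resource
    return [lut_eval(lut0, a, b, c, 0) for a, b, c in product((0, 1), repeat=3)]


def sensitive_bits(golden: int, n_bits: int) -> list:
    """Flip each configuration bit in turn; a bit is sensitive if any output changes."""
    reference = design_outputs(golden)
    return [bit for bit in range(n_bits)
            if design_outputs(golden ^ (1 << bit)) != reference]


GOLDEN = (1 << 7) | (1 << 15)  # LUT0 implements a AND b AND c (d is a don't-care)

if __name__ == "__main__":
    hits = sensitive_bits(GOLDEN, 32)
    print(f"configuration sensitivity: {len(hits)} of 32 bits -> {hits}")
```

Only the 8 reachable truth-table entries of LUT0 are sensitive; the don't-care entries and the unused LUT account for the other 24 bits.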
Sample Sensitivity Results
[Figure: FPGA Editor layout and sensitivity map for each design]
Design            Slices (%)    Sensitive bits (%)
DSP Kernel        5,746 (46)    575,448 (9.9)
Synthetic Design  2,538 (20)    189,835 (3.3)
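As a rough illustration of how sensitivity feeds into reliability, assume the device-level rates from the SEU Sensitivity slide apply uniformly across the configuration bits and that every upset in a sensitive bit causes a failure (both simplifying assumptions). The expected design failure rate then scales with the sensitive fraction:

```python
# Sensitive fraction of configuration bits (Sample Sensitivity Results slide).
SENSITIVE_FRACTION = {"DSP Kernel": 0.099, "Synthetic Design": 0.033}

# Device-level SEU rates in upsets/hour at 1000 km (SEU Sensitivity slide).
DEVICE_RATE_PER_HOUR = {"Quiet Sun": 0.4, "Flare enhanced": 3.2, "Peak (SAA region)": 12.6}


def design_failure_rate(device_rate: float, sensitive_fraction: float) -> float:
    """Expected failures/hour, assuming upsets are spread uniformly over the
    configuration bits and every sensitive-bit upset causes a failure."""
    return device_rate * sensitive_fraction


if __name__ == "__main__":
    for design, fraction in SENSITIVE_FRACTION.items():
        for condition, rate in DEVICE_RATE_PER_HOUR.items():
            failures = design_failure_rate(rate, fraction)
            print(f"{design:16s} {condition:18s} {failures:.4f}/h "
                  f"(~{1.0 / failures:.1f} h between failures)")
```

For the DSP Kernel under quiet-Sun conditions this gives 0.4 × 0.099 ≈ 0.04 failures per hour, roughly one every 25 hours.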
Tolerating Configuration Upsets
- Configuration upsets are not permanent and can be repaired at run time
  - Upsets in configuration memory can be identified through the device readback operation
  - Configuration faults can be repaired through traditional device configuration techniques
- Configuration scrubbing
  - Initialize the FPGA with a full configuration of the device
  - Continuously reconfigure the FPGA one frame at a time
  - Use the original configuration bitstream
  - Configuration SEUs are repaired during the scrub cycle
  - Scrubbing can operate during circuit execution
Carmichael, Caffrey, and Salazar, "Correcting Single-Event Upsets Through Virtex Partial Configuration," Xilinx Application Note XAPP216 (v1.0), Xilinx Corporation, June 1, 2000.
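The readback and scrubbing steps above can be sketched with configuration frames modeled as lists of 32-bit words. Here `readback_compare` models detection by readback, and `scrub_cycle` models blind scrubbing, which rewrites every frame from the original bitstream one frame at a time. The frame size and all function names are hypothetical; real Virtex frame lengths and the configuration interface are device-specific and are not modeled.

```python
import random

FRAME_WORDS = 4  # hypothetical frame size; real devices use much longer frames


def make_bitstream(n_frames: int, seed: int = 0) -> list:
    """Random stand-in for the original (golden) configuration bitstream."""
    rng = random.Random(seed)
    return [[rng.getrandbits(32) for _ in range(FRAME_WORDS)] for _ in range(n_frames)]


def readback_compare(config_memory: list, golden: list) -> list:
    """Detection: indices of frames whose readback differs from the golden frame."""
    return [i for i, (frame, ref) in enumerate(zip(config_memory, golden)) if frame != ref]


def scrub_cycle(config_memory: list, golden: list) -> None:
    """Repair: rewrite every frame, one at a time, from the original bitstream."""
    for i, ref in enumerate(golden):
        config_memory[i] = list(ref)  # copy so memory and golden stay independent


def inject_upset(config_memory: list, frame: int, word: int, bit: int) -> None:
    """Model an SEU by flipping one configuration bit."""
    config_memory[frame][word] ^= 1 << bit


if __name__ == "__main__":
    golden = make_bitstream(8)
    memory = [list(f) for f in golden]  # initial full configuration
    inject_upset(memory, frame=5, word=2, bit=17)
    print(readback_compare(memory, golden))  # [5]
    scrub_cycle(memory, golden)
    print(readback_compare(memory, golden))  # []
```

Blind scrubbing does not need readback to repair an upset; readback is only required if the system also wants to detect and count upsets.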
Configuration Scrubbing Example
[Figure: an upset configuration bit (marked x) is rewritten during the scrub cycle; configuration upset repaired]
Persistent vs. Non-persistent Upsets
- Not all sensitive upsets cause permanent failure
  - Some upsets are repaired through scrubbing
- Non-persistent upsets are repairable through scrubbing
- Persistent upsets require reconfiguration
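[Figure: error magnitude vs. time cycle for a non-persistent upset and a persistent upset. In both, an upset is followed by a bitstream repair; the non-persistent case returns to correct output after the repair, while the persistent case keeps producing incorrect output.]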
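The difference between the two cases can be reproduced with a toy circuit whose configured coefficient has one bit flipped during a window of cycles and is then restored, which stands in for a bitstream repair by scrubbing. The coefficient, input stream, and window are arbitrary assumptions. In the feed-forward version the error vanishes once the bitstream is repaired; in the feedback (accumulator) version the corrupted state persists:

```python
GOLDEN_COEFF = 3  # hypothetical configured coefficient


def run(feedback: bool, upset_window=None, cycles: int = 40) -> list:
    """Simulate one circuit; during upset_window = (start, end) a configuration
    upset flips bit 2 of the coefficient, and at `end` scrubbing restores it."""
    state = 0
    outputs = []
    for t in range(cycles):
        x = t % 5  # arbitrary repeating input stream
        coeff = GOLDEN_COEFF
        if upset_window is not None and upset_window[0] <= t < upset_window[1]:
            coeff ^= 0b100  # corrupted configuration until the bitstream is repaired
        if feedback:
            state = (state + coeff * x) & 0xFFFF  # accumulator: result is fed back
            outputs.append(state)
        else:
            outputs.append(coeff * x)  # feed-forward: nothing carried between cycles
    return outputs


def error_magnitude(feedback: bool, upset_window=(10, 20)) -> list:
    """Per-cycle |golden - faulty| output difference."""
    golden, faulty = run(feedback), run(feedback, upset_window)
    return [abs(g - f) for g, f in zip(golden, faulty)]


if __name__ == "__main__":
    print("feed-forward:", error_magnitude(feedback=False))
    print("feedback:    ", error_magnitude(feedback=True))
```

In this run the accumulator stays off by 80 after the repair, so only resetting its state (for example, through reconfiguration) clears the error.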
Configuration Persistence
[Figure: FPGA Editor layout, sensitivity map, and persistence map for each design]
Design            Slices (%)    Sensitive bits (%)    Persistent bits (%)
DSP Kernel        5,746 (46)    575,448 (9.9)         13,841 (0.24)
Synthetic Design  2,538 (20)    189,835 (3.3)         77,159 (1.3)
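Dividing the persistent column by the sensitive column shows how differently the two designs behave: only about 2.4% of the DSP Kernel's sensitive bits are persistent, versus about 41% for the synthetic design. A quick check using only the numbers in the table:

```python
# (sensitive bits, persistent bits) from the Configuration Persistence slide.
RESULTS = {
    "DSP Kernel": (575_448, 13_841),
    "Synthetic Design": (189_835, 77_159),
}


def persistent_fraction(sensitive: int, persistent: int) -> float:
    """Share of sensitive configuration bits whose upsets are persistent."""
    return persistent / sensitive


if __name__ == "__main__":
    for design, (sensitive, persistent) in RESULTS.items():
        print(f"{design:16s} {100 * persistent_fraction(sensitive, persistent):5.1f}% persistent")
```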
Persistent Circuit Structures
- Persistent: components and routing in feedback loops and in the input to feedback
- Non-persistent: components and routing in the feed-forward path
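This split suggests a simple structural classification of a netlist treated as a directed graph of cells. A cell is in feedback if it can reach itself. We assume "input to feedback" means any other cell that can reach a feedback cell, and everything else is feed-forward. This is a hypothetical sketch: a production tool would more likely compute strongly connected components (for example, with Tarjan's algorithm) than run one search per cell.

```python
from collections import deque


def reachable(graph: dict, start: str) -> set:
    """Cells reachable from `start` through one or more edges."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, ()))
    return seen


def classify(graph: dict):
    """Split cells into (feedback, input_to_feedback, feed_forward) sets."""
    cells = set(graph) | {s for succs in graph.values() for s in succs}
    reach = {c: reachable(graph, c) for c in cells}
    feedback = {c for c in cells if c in reach[c]}  # c lies on a cycle
    input_to_fb = {c for c in cells - feedback if reach[c] & feedback}
    feed_forward = cells - feedback - input_to_fb
    return feedback, input_to_fb, feed_forward


if __name__ == "__main__":
    # Hypothetical netlist: edges point from a cell to the cells it drives.
    netlist = {
        "in_logic": ["ff1"],
        "ff1": ["acc_logic"],
        "acc_logic": ["ff2"],
        "ff2": ["acc_logic", "out_logic"],  # ff2 -> acc_logic closes the loop
        "out_logic": ["ff3"],
    }
    print(classify(netlist))
```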
Partial Mitigation
- Reduce the cost of mitigation by applying it only to the persistent components
[Figure: a circuit of logic and flip-flop (FF) stages with TMR applied to only part of it]
Partial TMR Example
[Figure: initial unmitigated circuit of logic and flip-flop (FF) stages]
[Figure: feedback-only mitigation, with the feedback logic and FFs triplicated and voters inserted]
[Figure: feedback and feedback-input mitigation, with voters inserted]
[Figure: full mitigation]
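A minimal sketch of why voters sit on the feedback paths. It assumes a triplicated 8-bit accumulator in which one replica's logic is corrupted by a configuration upset for a few cycles and is then scrubbed. The output voter masks the error in both variants, but only when the feedback path is voted does the faulty replica resynchronize after the repair. Without the feedback voter, that replica stays wrong, and a later upset in a second replica would defeat the TMR. All names and parameters are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority voter."""
    return (a & b) | (a & c) | (b & c)


def run_tmr_accumulator(inputs, faulty_replica=0, upset_window=(3, 6), vote_feedback=True):
    """Triplicated 8-bit accumulator (hypothetical example).

    During upset_window = (start, end) a configuration upset corrupts the adder of
    one replica; from `end` on, scrubbing has repaired it. Returns the voted
    outputs and the final contents of the three replica registers.
    """
    regs = [0, 0, 0]
    outputs = []
    for t, x in enumerate(inputs):
        voted = majority(*regs)
        nxt = []
        for r in range(3):
            # With feedback voting each replica reads the voted state; otherwise its own.
            state = voted if vote_feedback else regs[r]
            inc = x
            if r == faulty_replica and upset_window[0] <= t < upset_window[1]:
                inc ^= 0x01  # corrupted logic in this replica until scrubbed
            nxt.append((state + inc) & 0xFF)
        regs = nxt
        outputs.append(majority(*regs))  # output voter
    return outputs, regs


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    for vote in (True, False):
        out, regs = run_tmr_accumulator(data, vote_feedback=vote)
        print(f"vote_feedback={vote}: outputs={out}, final replicas={regs}")
```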
BLTmr Tool Flow
[Figure: original design and user constraints → Parse EDIF → Create design database → Analysis (feedback, input to feedback, etc.) → Cell triplication → Voter insertion → Partially mitigated design]
- Analyze circuit structure
- Rank priority of sub-structures
- Triplicate circuit cells
  - Apply as much triplication as possible
- Insert voters on feedback paths
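The ranking and "as much triplication as possible" steps can be illustrated with a greedy selection under an area budget. This is a sketch under stated assumptions, not BLTmr's actual algorithm: the priority order is feedback, then input to feedback, then feed-forward; triplicating a cell costs two extra copies of its slices; and voter area is ignored.

```python
PRIORITY = {"feedback": 0, "input_to_feedback": 1, "feed_forward": 2}


def select_for_tmr(cells, slice_budget):
    """Greedy partial-TMR selection (hypothetical, not BLTmr's algorithm).

    cells: list of (name, category, slices). Triplicating a cell adds two extra
    copies of its slices; voter cost is ignored. Cells are visited in priority
    order (stable within a class) and taken while the budget allows.
    """
    selected, used = [], 0
    for name, category, slices in sorted(cells, key=lambda cell: PRIORITY[cell[1]]):
        extra = 2 * slices
        if used + extra <= slice_budget:
            selected.append(name)
            used += extra
    return selected, used


if __name__ == "__main__":
    netlist = [
        ("out_logic", "feed_forward", 40),
        ("acc_logic", "feedback", 30),
        ("ff2", "feedback", 5),
        ("in_logic", "input_to_feedback", 50),
        ("ff1", "input_to_feedback", 5),
    ]
    print(select_for_tmr(netlist, slice_budget=200))
```

Because the loop skips cells that do not fit and keeps going, a small lower-priority cell can still be selected after a larger higher-priority cell has been rejected.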
Experimental Results: DSP Kernel
[Figure: FPGA Editor layout, sensitivity map, and persistence map for each version]
Version                                        Slices (%)    Sensitive bits (%)    Persistent bits (%)
Unmitigated                                    5,746 (46)    575,448 (9.90)        13,841 (0.24)
Partial TMR (feedback and input to feedback)   8,036 (65)    569,700 (9.81)        152 (0.0026)
Experimental Results: Synthetic DSP
[Figure: FPGA Editor layout, sensitivity map, and persistence map for each version]
Version           Slices (%)     Sensitive bits (%)    Persistent bits (%)
Unmitigated       2,538 (20)     189,835 (3.27)        77,159 (1.33)
Full TMR          11,961 (97)    20,256 (0.35)         671 (0.012)
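The trade-off in the two results tables can be summarized as an area factor against reduction factors for sensitive and persistent bits. The figures below are computed only from the numbers reported on the two results slides:

```python
# (before, after) values from the two Experimental Results slides.
RESULTS = {
    "DSP Kernel, partial TMR": {
        "slices": (5_746, 8_036), "sensitive": (575_448, 569_700), "persistent": (13_841, 152)},
    "Synthetic DSP, full TMR": {
        "slices": (2_538, 11_961), "sensitive": (189_835, 20_256), "persistent": (77_159, 671)},
}


def summarize(r: dict) -> dict:
    """Area growth versus reduction factors for sensitive and persistent bits."""
    (s0, s1), (v0, v1), (p0, p1) = r["slices"], r["sensitive"], r["persistent"]
    return {
        "area_factor": s1 / s0,
        "sensitive_reduction": v0 / v1,
        "persistent_reduction": p0 / p1,
    }


if __name__ == "__main__":
    for name, r in RESULTS.items():
        s = summarize(r)
        print(f"{name:24s} area x{s['area_factor']:.2f}, "
              f"sensitive /{s['sensitive_reduction']:.1f}, "
              f"persistent /{s['persistent_reduction']:.0f}")
```

Partial TMR cuts the DSP Kernel's persistent bits by about 91× for 1.4× the slices, while full TMR on the synthetic design cuts them by about 115× for about 4.7× the slices.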
Experimental Results
[Figure: MTBF (days, log scale from 1 to 100,000) for the DSP Kernel and Synthetic designs in six cases: static cross-section; unmitigated, sensitive; unmitigated, persistent; feedback TMR, persistent; feedback + input TMR, persistent; max TMR, persistent]
- GPS orbit (22,200 km altitude, 55° inclination)
- AP-8 Solar Minimum, JPL Solar Proton Quiet, CREME96 Solar Minimum
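The MTBF values in the chart depend on the orbit's particle environment, which is not reproduced here. A first-order model treats the failure rate as critical bits × per-bit cross-section × flux, so MTBF scales inversely with the number of sensitive or persistent bits. The flux below is a hypothetical placeholder, not the GPS-orbit value. A real estimate integrates the cross-section curve over the orbit's particle spectrum, which is the role of environment models such as CREME96.

```python
SECONDS_PER_DAY = 86_400
SIGMA_CONFIG = 8.0e-8  # cm^2/bit, configuration-latch heavy-ion cross-section (SEU Sensitivity slide)
FLUX = 1.0e-4          # particles/cm^2/s: HYPOTHETICAL placeholder, not the GPS-orbit value


def mtbf_days(critical_bits: int, sigma: float = SIGMA_CONFIG, flux: float = FLUX) -> float:
    """First-order MTBF: failures/s = critical bits x per-bit cross-section x flux."""
    failures_per_second = critical_bits * sigma * flux
    return 1.0 / (failures_per_second * SECONDS_PER_DAY)


if __name__ == "__main__":
    # DSP Kernel bit counts from the Experimental Results slides.
    for label, bits in [("unmitigated, sensitive", 575_448),
                        ("unmitigated, persistent", 13_841),
                        ("partial TMR, persistent", 152)]:
        print(f"{label:24s} MTBF ~ {mtbf_days(bits):10.2f} days (hypothetical flux)")
```

Whatever the actual flux, the ratio between two cases depends only on their bit counts; for example, partial TMR raises the persistent-upset MTBF of the DSP Kernel by 13,841 / 152 ≈ 91×.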
Conclusions
- FPGAs are susceptible to soft errors
  - Configuration sensitivity
  - Configuration persistence
- Mitigation is required to provide high availability
  - Configuration scrubbing
  - Redundancy (TMR)
- Mitigation techniques are expensive
- Partial TMR can provide adequate availability at lower cost
  - Mitigate against persistent configuration faults
  - Automated design analysis and fault mitigation
- Results have been verified through fault injection and radiation testing