Title: SEFI Mitigation Techniques for Microprocessors
1. SEFI Mitigation Techniques for Microprocessors
Space Micro Inc.
Author: David Czajkowski, (760) 815-5330, dcz@spacemicro.com
2. MSFC Space Micro Mtg Agenda
- Background need for SEFI mitigation
- Hardened Core SEFI Mitigation Description
- Hardened Core Test Setup
- Proton Radiation Test Results
- Hardened Core Design
- Hardened Core Roadmap
- Conclusions
Paper P15
3. Background: Need for SEFI Mitigation
- Hardened Core
- (aka SEFI Watchdog Controller)
4. Single Event Functional Interrupt: Microprocessors
PowerPC SEFI Data
- SEFI (aka Hangs)
- Processor hangs by SEU
- By protons or heavy ions
- All CPUs susceptible
- CPU Hangs from
- Illegal Branching
- Upsets in Program Counter
- Undefined State Machines
- Approx. rate: 1 every 100 days for SOI PPC (1 every 10 days for CMOS)
- SEFI problem is severe, not easily solvable
- Power down is current industry solution
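To put the approximate rates quoted on this slide in mission terms, here is a minimal sketch. It assumes SEFIs arrive as a Poisson process, which the slide does not state, and the 30-day window is purely illustrative.

```python
import math

def events_per_year(days_between_events: float) -> float:
    """Convert a mean interval between SEFIs (in days) into an annual rate."""
    return 365.0 / days_between_events

def prob_at_least_one(rate_per_day: float, days: float) -> float:
    """P(at least one SEFI) within `days`, assuming Poisson arrivals (assumption)."""
    return 1.0 - math.exp(-rate_per_day * days)

soi_rate = 1.0 / 100.0   # SEFI per day, SOI PowerPC (approximate, from the slide)
cmos_rate = 1.0 / 10.0   # SEFI per day, CMOS (approximate, from the slide)

print("SOI  SEFI/year:", events_per_year(100.0))
print("CMOS SEFI/year:", events_per_year(10.0))
print("SOI  P(hang in 30 days):", prob_at_least_one(soi_rate, 30))
print("CMOS P(hang in 30 days):", prob_at_least_one(cmos_rate, 30))
```

Under this assumption, a hang within any given month is likely for the CMOS part and far from negligible for the SOI part, which is why powering down alone is an unattractive answer.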
5. Single Event Functional Interrupt: SDRAMs
- No known SDRAM without SEFI
- SEFI problem greater than SEU problem
- SEFI causes >5,000 errors, loss of memory
- SEFI not correctable with Hamming EDAC
- Reed Solomon EDAC bad for random access
- No known solution
Mitigation                      SEU rate              SEFI rate
No correction (Elpida 2 Gbit)   4.1 SEU/year          0.96 SEFI/year
Single-bit EDAC                 0.006 SEU/year        1.08 SEFI/year
Reed-Solomon 1-nibble EDAC      0.00000017 SEU/year   0.18 SEFI/year
Reed-Solomon 2-nibble EDAC      1.5E-20 SEU/year      1.6E-7 SEFI/year
Note: Data provided above from Maxwell Technologies
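The claim that the SEFI problem outweighs the SEU problem can be checked directly from the rates quoted above. The sketch below only re-expresses those numbers. The "SEFI share" metric is an illustrative choice, not something the slide defines.

```python
# Rates per year, copied from the figures quoted above (Maxwell Technologies data).
rates = {
    "No correction (Elpida 2 Gbit)": {"seu": 4.1, "sefi": 0.96},
    "Single-bit EDAC": {"seu": 0.006, "sefi": 1.08},
    "Reed-Solomon 1-nibble EDAC": {"seu": 1.7e-7, "sefi": 0.18},
    "Reed-Solomon 2-nibble EDAC": {"seu": 1.5e-20, "sefi": 1.6e-7},
}

def sefi_share(entry):
    """Fraction of the remaining event rate (SEU + SEFI) that is due to SEFI."""
    return entry["sefi"] / (entry["seu"] + entry["sefi"])

for name, entry in rates.items():
    print(f"{name:32s} SEFI share of remaining events: {sefi_share(entry):.4f}")
```

Once any EDAC scheme is applied, nearly all of the remaining event rate is SEFI, which is the point the slide makes.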
6. Shuttle Upgrade SEFI Problem
- CAU program
- 36 Intel flash parts
- No replacement
- SEFI driving system reliability over spec limit
- Even with system changes, CAU over spec limit
- Improving flash SEFI problem allows CAU to meet system reliability requirements
- Caused major redesign of 3 subsystems
Flash Parts
7. Hardened Core SEFI Mitigation
- Description of the Technique
8. New SEE Mitigation Techniques
- SEFI Hardened Core detects and corrects SEFI faults in microprocessor
- Time-Triple Modular Redundancy corrects SEU faults in microprocessor
- Both enable the use of advanced commercial microprocessors in space computers
- Enables space computers >1,500 MIPS
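The slide does not describe how Time-Triple Modular Redundancy works internally. As a rough illustration only, the sketch below assumes the general idea of redundancy in time: run the same computation three times on one processor and take a majority vote. The function name `run_with_time_redundancy` is hypothetical, and this is not Space Micro's implementation.

```python
from collections import Counter

def run_with_time_redundancy(compute, *args):
    """Run `compute` three times in sequence and majority-vote the results.

    Returns (value, votes). Raises if all three results disagree.
    Results must be hashable so they can be counted.
    """
    results = [compute(*args) for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority: all three results disagree")
    return value, votes

# Simulated upset: the second of three executions returns a corrupted value.
calls = {"n": 0}

def flaky_double(x):
    calls["n"] += 1
    corrupted = calls["n"] == 2
    return x * 2 + (1 if corrupted else 0)

print(run_with_time_redundancy(flaky_double, 21))
```

Voting of this kind masks a single upset in a result, but it cannot help when the processor stops executing altogether, which is the gap the Hardened Core is meant to cover.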
9. Hardened Core System
- More than a Watchdog
- H-Core generates periodic signal
- If OK, CPU responds
- If SEFI, H-Core
- Toggles interrupt
- S/W reboot
- H/W reset
- Power cycle
- Post SEFI status flags
- Recovery software code
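The behaviour listed on this slide can be modelled as an escalation ladder. The sketch below is a software model under stated assumptions: that the four actions are tried in the order listed, that the H-Core stops at the first one that restores a response, and that the actions taken are what the status flags record. The slide does not confirm any of these details, and all class and function names are hypothetical.

```python
from enum import Enum

class Recovery(Enum):
    """Recovery actions, in the order they are listed on the slide."""
    INTERRUPT = 1     # toggle an interrupt line
    SW_REBOOT = 2     # software reboot
    HW_RESET = 3      # hardware reset
    POWER_CYCLE = 4   # power cycle

class HardenedCoreModel:
    """Toy model of one H-Core period: challenge the CPU, escalate if silent."""

    def __init__(self, cpu_responds, apply_action):
        self.cpu_responds = cpu_responds   # callable returning True if the CPU answers
        self.apply_action = apply_action   # callable taking a Recovery action
        self.status_flags = []             # actions used in the last period

    def heartbeat(self):
        """Run one period and return the list of recovery actions taken."""
        self.status_flags = []
        if self.cpu_responds():
            return []                      # CPU is OK, nothing to do
        for action in Recovery:            # Enum iterates in definition order
            self.apply_action(action)
            self.status_flags.append(action)
            if self.cpu_responds():
                break                      # recovered, stop escalating
        return list(self.status_flags)

# Simulated CPU that stays hung until it receives a hardware reset.
cpu = {"hung": True}

def cpu_responds():
    return not cpu["hung"]

def apply_action(action):
    if action is Recovery.HW_RESET:
        cpu["hung"] = False

core = HardenedCoreModel(cpu_responds, apply_action)
print([a.name for a in core.heartbeat()])
```

Recovery software on the CPU could then read the recorded flags to decide how much state needs to be rebuilt.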
10. Technical Objectives
- Determine the characteristics of SEFI on a CPU
- Develop software prototype of Hardened Core. Verify performance in radiation environment
- Develop Hardened Core architecture and initial product design
- Determine SEFI rate in combo with TTMR SEU rate
- Determine performance of TTMR computer with SEFI Watchdog
11. Hardened Core Test Setup
- SEFI Mitigation Radiation Test Set using Pentium III in Versalogic VSBC-8 Computer
12. Test Set Challenges
- Finding a processor that is not plastic, flip-chip and has known SEFI became difficult and caused schedule risk
- Selected a plastic, flip-chip Pentium III (850 MHz)
- Changed to proton radiation source to penetrate plastic
- Solved de-lidding/thinning issues
- Beam availability and high cost lowered available beam time
- Resulting in less information on SEFI signatures
- Found partial hardware watchdog in VSBC-8d
- Provided unexpected additional prototype data
13. H-Core SEU Test System
VSBC-8d Computer
RS-232
Communication Link - Ethernet
Interrupt Lines
Reset Line
- Software/Hardware Include
- SEFI test loop
- Diagnostic self-test routine
- Hardware watchdog to Reset
- Linux software watchdog
- Local APIC (PIII) routines
- Diagnostic self-test
- Recovery code display to screen
Monitor Computer
- Software Routines
- Mode control
- Data Collection
- SEFI Identification
- Diagnostic self-test
14. Pentium III SEFI Test Set
VSBC Video
Monitor PC
VSBC Hardware - PIII
15. VSBC Pentium Hardware
External Fan
IDE Drive
Pentium w/ Heat Sink
Network Switch
Multiplex Card
VSBC Computer
16. SEFI Test Software
- VSBC Linux OS
- VSBC SEFI Test Loop
- Ethernet/serial communication
- Math test
- Timer test
- Network test
- IDE test
- Monitor
- Communication
- Mode control
- Datalog
- Parallel port control software
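A SEFI test loop of the kind listed above repeatedly exercises the processor and reports each result to the monitor, so that a missing report signals a hang. The sketch below is illustrative only: the actual test code is not given in the slides, the network and IDE tests are omitted because they need real hardware, and the `report` callback stands in for the Ethernet/serial link.

```python
import time

def math_test():
    """Known-answer arithmetic check: sum of squares 0..999."""
    return sum(i * i for i in range(1000)) == 332833500

def timer_test(sleep_s=0.05, tolerance_s=0.5):
    """Check that a short sleep takes roughly as long as requested."""
    start = time.monotonic()
    time.sleep(sleep_s)
    elapsed = time.monotonic() - start
    return 0.5 * sleep_s <= elapsed <= sleep_s + tolerance_s

def run_test_loop(tests, report, iterations):
    """Run every test each iteration and hand the results to `report`.

    On the real test set the report would go to the monitor PC, and a gap
    in the reports would mark a candidate SEFI.
    """
    for i in range(iterations):
        results = {name: bool(test()) for name, test in tests.items()}
        report(i, results)

if __name__ == "__main__":
    run_test_loop(
        {"math": math_test, "timer": timer_test},
        lambda i, results: print(i, results),
        iterations=3,
    )
```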
17. Selected Pentium Control Signals
- BINIT - bus state machine reset
- INIT - resets integer registers
- LINT0 - INTR interrupt (no available interrupt vector)
- IRQ5 - INTR hardware signal through PCI bus
- LINT1 - non-maskable interrupt, or NMI
- RESET - PIII hardware reset
- SMI - system management interrupt (not tested, no available interrupt vector on VSBC)
18. How to Connect to PIII Signals?
- NOT EASY
- Multiplex with VSBC's signals
- De-populate PIII pins and hardwire to MUX circuit
- Have good technicians
19. Proton Radiation Test Results
- Tested Hardened Core using Intel Pentium III in Proton Environment
20. Hardened Core Radiation Test
- Tested at UC Davis with 51 MeV Protons
- Test CPU was Pentium III, Intel, 850 MHz
- Summary Results
- 21 SEFIs induced
- 21 recoveries by SEFI Watchdog Functions
- IRQ, NMI and Reset brought back Pentium III
- Patent Pending
- RESULT: Hardened Core proven with protons
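With 21 recoveries in 21 induced SEFIs, the sample size limits how strongly the recovery probability can be bounded. The sketch below computes the standard one-sided exact (Clopper-Pearson) lower bound for the all-successes case. It assumes the 21 events are independent trials with a common success probability, which the slides do not claim.

```python
def lower_bound_all_successes(trials: int, confidence: float = 0.95) -> float:
    """One-sided exact lower bound on success probability when every trial succeeds.

    With n successes in n trials the bound p solves p**n = 1 - confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    return (1.0 - confidence) ** (1.0 / trials)

print(lower_bound_all_successes(21))    # roughly 0.87 at 95% confidence
print(lower_bound_all_successes(100))   # more samples tighten the bound
```

Under these assumptions the 21 events support a recovery probability of at least roughly 87% at 95% confidence. A larger sample would tighten that bound.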
21. Detailed Test Results
22. H-Core Success Rate by Signal
23. Hardened Core Design
24. Hardened Core is More Than a Chip!
- Timer code when NO SEFI
- KILL Threads post SEFI
- Read H-Core Status Flags
- Flush cache registers
- Recovery routines
- Rollback software routines
- Rollback data stored in Memory
- Store critical variable periodically
- Store instruction pointer locations
- Software and hardware
- Software allows for post SEFI Recovery
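The rollback idea on this slide (periodically store critical variables and the point of execution, then resume from them after a SEFI) can be sketched as follows. This is a simplified model, not the author's recovery code. A step counter stands in for the stored instruction pointer, an in-memory object stands in for protected memory, and all names are hypothetical.

```python
import copy

class RollbackStore:
    """Holds the most recent checkpoint: (step, critical variables)."""

    def __init__(self):
        self._checkpoint = None

    def save(self, step, variables):
        self._checkpoint = (step, copy.deepcopy(variables))

    def restore(self):
        if self._checkpoint is None:
            raise LookupError("no checkpoint saved")
        step, variables = self._checkpoint
        return step, copy.deepcopy(variables)

def run_with_rollback(total_steps, checkpoint_every, sefi_at=None):
    """Accumulate 0 + 1 + ... + (total_steps - 1), surviving one simulated SEFI."""
    store = RollbackStore()
    state = {"acc": 0}
    step = 0
    store.save(step, state)
    sefi_pending = sefi_at
    while step < total_steps:
        if sefi_pending is not None and step == sefi_pending:
            sefi_pending = None
            state = None                      # working state lost in the SEFI
            step, state = store.restore()     # roll back to the last checkpoint
            continue
        state["acc"] += step
        step += 1
        if step % checkpoint_every == 0:
            store.save(step, state)           # store critical variables periodically
    return state["acc"]

print(run_with_rollback(10, 3, sefi_at=7))
```

The work done between the last checkpoint and the SEFI is repeated, so the checkpoint interval trades storage traffic against lost time after an event.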
25. Programmable Hardened Core Block Diagram
- Usable for all CPUs
- Min 8 Interrupt signals
- MOSFET driver OUT for power cycle control
- Variable pulse width
- Variable timer length
- 1 ms, 1 s, 1 min, etc
- Status of CPU saved
- Flags available
- External ON/OFF control
- External H-Core reset
26. Predicted SEU/SEFI Rates: Proton100k Computer
- SEFI 1E-2 corrected resets/day
- Using Hardened Core
- 2,400 MIPS, 64 bits @ 400 MHz
- >1,440 MIPS SEU corrected
- SEU < 1E-5 uncorrected errors/day
- No SEL
- Total Dose > 100 krad
- 4.9 W CPU, 8W total power
- VxWorks and Linux OS s/w
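The predicted rates above are easier to read as intervals and ratios. The short calculation below only rearranges the quoted figures. Treating the bounds (">" and "<") as exact values is a simplification.

```python
sefi_resets_per_day = 1e-2        # corrected resets/day (from the slide)
seu_uncorrected_per_day = 1e-5    # upper bound on uncorrected errors/day
peak_mips = 2400                  # quoted peak throughput
seu_corrected_mips = 1440         # lower bound with SEU correction

days_between_resets = 1 / sefi_resets_per_day
years_between_uncorrected_seu = 1 / seu_uncorrected_per_day / 365
throughput_fraction = seu_corrected_mips / peak_mips

print(f"Mean time between corrected resets: {days_between_resets:.0f} days")
print(f"Mean time between uncorrected SEUs: more than {years_between_uncorrected_seu:.0f} years")
print(f"Throughput kept with SEU correction: at least {throughput_fraction:.0%}")
```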
27. Hardened Core Roadmap
- From Inception to Availability
28. Hardened Core Roadmap
Chip Design
Preliminary Design
H-Core Inception
Benchtop Model
Radiation Verification
Software Design
Effort is Complete
Future
- Verification of H-Core complete
- Preliminary H-Core Design Complete
- Design and manufacture as rad-hard chip
- Improve H-Core software routines
29. Future Research Options
- Collect additional microprocessor recovery data
- SEFI test additional processors (PowerPC, BSP-15, TI DSP)
- SEFI test more samples (statistical improvement)
- Radiation test simpler microprocessor structures
- State machine logic (in FPGA)
- Instruction pointer to software (in simple micro-controller)
- Memory cells
- Embedded test logic
- Radiation test improved recovery software routines
- Thread kill cleanup routines
- H-Core status flag check, used as pointer to restart routines
- Restart routines
30. Hardened Core Planned Availability
- Hardened Core has been added to Space Micro's Proton100k computer product
- Circuit for H-Core in Actel FPGAs available now
- Stand-alone H-Core IC product available in 2004
- Application software kernels will be made available to customers
31. Conclusions
- SEFI is a growing problem for microprocessors
- New Hardened Core H/W and S/W solution
- Hardened Core benchtop model radiation tested
- 850 MHz Intel Pentium III test device
- Proton radiation testing completed
- Results show 100% success rate
- Preliminary design of H-Core complete
- Added to Proton100k satellite computer
- Space Micro plans to design and manufacture a rad-hard chip for commercial availability