Radiation Tolerant Intelligent Memory Stack (RTIMS)

Title: RTIMS Second Interim Review 2005
Author: Jeff Herath

Learn more at: http://klabs.org
1
Radiation Tolerant Intelligent Memory Stack
(RTIMS)
Tak-kwong Ng, Jeffrey Herath
Electronics Systems Branch
Systems Engineering Directorate
NASA Langley Research Center
t.ng@nasa.gov, jeffrey.a.herath@nasa.gov
757-864-1097 (Tak), 757-864-1098 (Jeff)
2
Agenda
  • What is it?
  • Goals
  • Components selection
  • FPGA SEU mitigation
  • XTMR tools
  • Status
  • Future work
  • Points to ponder

3
What is it?
  • Radiation tolerant
  • Use commercial-off-the-shelf (COTS) components
  • Reprogrammable FPGA
  • High performance
  • Lower cost
  • Pick parts with applicable mitigation techniques
  • Shielding, over-current protection, triple modular
    redundancy, FPGA configuration scrubbing
  • Intelligent
  • Reprogrammable FPGA
  • SDRAM controller
  • Capacity to add custom logic
  • Memory
  • Large capacity
  • SDRAM
  • Stack
  • 3D vs 2D, board space saving

4
Goals
  • Large memory capacity
  • 256 MB EDAC
  • Single 3.3V power supply
  • Simple interface, LVTTL compatible
  • Throughput
  • 32 MWord write
  • 16 MWord read
  • Reprogram via the JTAG interface
  • Spare FPGA gate capacity for user application
  • Radiation characteristics
  • Total ionizing dose of 100 krad (Si) at 25 °C
  • SEU best practice
  • SEL of 60 MeV-cm2/mg requirement
  • Operating temperature -40 °C to 85 °C

5
Components Selection (1/3)
  • FPGA
  • Reprogrammable
  • Xilinx Virtex, Virtex-II
  • XQR2V1000
  • Total ionizing dose of 200 krad (Si) (data
    sheet)
  • SEL of 160 MeV-cm2/mg (data sheet)
  • Current limiters
  • Limited SEFI
  • POR, SelectMAP, JTAG
  • 1.5E-6 upsets/device/day (data sheet)
  • SOFT
  • Mitigation techniques TMR, configuration
    scrubbing
  • XQ2V1000-4BG575
  • Military version for lower cost
  • SEL may not be as good as XQR2V1000
  • SEL of 124 MeV-cm2/mg
  • Capacity of 1 M gates
  • 328 Signal I/Os

6
Components Selection (2/3)
  • EEPROM
  • Xilinx XQR18V04
  • Total ionizing dose of 10 krad (Si) (data sheet)
  • 30 krad (Si) for read only (data sheet)
  • SEL of 120 MeV-cm2/mg (data sheet)
  • SEU of 120 MeV-cm2/mg (data sheet)
  • SDRAM
  • Elpida EDS5108ABTA (512Mb)
  • Total ionizing dose of 50 krad (Si)
  • SEL of 80 MeV-cm2/mg at 85 °C, 100 °C, 125 °C
  • SEU
  • Bit error rate of 6.96E-12 errors/bit-day
  • SEFI error rate of 1.3E-4 errors/device-day
  • Linear Regulator
  • Texas Instruments TPS75715 (1.5V LDO regulator)
  • Total ionizing dose of 10 krad (Si)
  • SEL of 60 MeV-cm2/mg
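
As a rough worked example of what the SDRAM figures above imply, the sketch below scales the quoted bit error rate to one whole device. It assumes that 512 Mb means 512 x 2^20 bits and that upsets are independent; the function name is illustrative and does not come from the presentation.

```python
# Figures quoted on the slide for the SDRAM
BIT_ERROR_RATE = 6.96e-12        # errors per bit-day
SEFI_RATE = 1.3e-4               # errors per device-day

# Assumption: a 512 Mb device holds 512 * 2**20 bits
BITS_PER_DEVICE = 512 * 2**20


def expected_bit_upsets_per_day(bits=BITS_PER_DEVICE, rate=BIT_ERROR_RATE):
    """Expected bit upsets per device per day, assuming independent upsets."""
    return bits * rate


upsets = expected_bit_upsets_per_day()
print(f"bit upsets per device-day: {upsets:.3e}")
print(f"mean days between bit upsets: {1 / upsets:.0f}")
print(f"mean days between SEFIs: {1 / SEFI_RATE:.0f}")
```

Under these assumptions a single device sees a bit upset far more often than a SEFI, which is consistent with handling bit upsets by TMR or EDAC and treating SEFI separately.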

7
Components Selection (3/3)
  • Current limiters
  • Maxim-IC MAX893L (1.2 A), MAX891L (0.5 A)
  • Total ionizing dose SEL of 30 krad (Si)
  • Power-On-Reset circuit
  • Maxim-IC MAX803
  • Total ionizing dose of 20 krad (Si)
  • Stacking technology
  • Provided by 3D Plus

8
Radiation Mitigation
  • Total ionizing dose
  • Local shielding
  • Package shielding, thickness depends on
    requirement
  • SEL
  • Current limiting device
  • SEU
  • Memory contents
  • TMR, EDAC
  • FPGA SEU
  • Configuration scrubbing, TMR
  • SEFI
  • Best effort to minimize the SEFI rate
  • Mitigate at higher level
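
The TMR entries above rely on a two-out-of-three majority vote. A minimal software sketch of that vote is shown here; it is only a model of the idea, not the voter that the FPGA tools insert, and the function name is hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value
    held by at least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)


# One copy of a 16-bit word suffers a single-bit upset.
word = 0xA5A5
upset_copy = word ^ 0x0010

# The two good copies outvote the upset one.
voted = majority_vote(word, upset_copy, word)
print(hex(voted))
```

The vote masks an upset in any one copy. Two copies upset in the same bit position would defeat it, which is why the stored state also needs periodic correction.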

9
Block Diagram
10
FPGA SEU Mitigation (1/5)
  • Input
  • Xilinx recommendation
  • Use 3 pins per signal, connected on the board
  • Bus signals use one pin per signal, add EDAC,
    save pins
  • The sending side must generate EDAC check bits
  • Pins can be used up quickly
  • Implementation
  • Module Interface
  • Use 3 pins per signal for address/controls
  • Use 1 pin per signal for Din
  • EDAC is optional
  • Single point failure rate increases without EDAC
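
To make the "sending side must generate EDAC check bits" point concrete, here is a small sketch of one common EDAC scheme, a Hamming code with an extra overall parity bit (single error correction, double error detection). The presentation does not say which code RTIMS uses, so the choice of code, the 8-bit word, and all names below are assumptions.

```python
def encode(data_bits):
    """Hamming SEC-DED encode. Index 0 holds the overall parity bit;
    indexes 1..n use the classic layout with check bits at powers of two."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:      # number of Hamming check bits needed
        r += 1
    n = m + r
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity             # makes parity over this group even
    code[0] = sum(code[1:]) % 2      # overall parity for double-error detection
    return code


def decode(code):
    """Return (data_bits, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome <= n:
        code[syndrome] ^= 1          # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:
        status = "uncorrectable"     # even number of errors, or out-of-range syndrome
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status


# Sender generates check bits; one pin is upset in transit; receiver corrects.
word = [1, 0, 1, 1, 0, 0, 1, 0]
sent = encode(word)
received = list(sent)
received[5] ^= 1
print(decode(received))
```

With one pin per data bit, a single upset path is corrected by the receiver; without the check bits that same path is a single point of failure, as the slide notes.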

11
FPGA SEU Mitigation (2/5)
  • Output
  • Xilinx recommendation
  • Use 3 pins per signal, connected on the board
  • Not glitch-free
  • Signal integrity
  • Bus signals use one pin per signal, add EDAC,
    save pins
  • The receiving side must also implement EDAC
  • Pins can be used up quickly
  • Implementation
  • Module interface
  • Use 3 pins per signal for controls
  • Use 1 pin per signal for Dout
  • EDAC is optional
  • Single point failure rate increases without EDAC

12
FPGA SEU Mitigation (3/5)
  • Output
  • Implementation
  • SDRAM interface
  • Clock, Address
  • 3 sets, equivalent signals are not connected
    together on the board
  • Each set drives two SDRAMs
  • Controls
  • 4 sets, equivalent signals are not connected
    together on the board
  • Two of the sets, each drives two SDRAMs
  • The other two sets, each drives one SDRAM
  • Switch EDAC/TMR configured SDRAM
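
The signal sets above can be used to sanity-check the memory capacity. The sketch below assumes every SDRAM is the 512 Mb part from the components slide and that each device is counted once per signal group; the helper name is hypothetical, and the device count is an inference from the slide rather than a stated figure.

```python
DEVICE_MBIT = 512   # per-device density from the components slide


def device_count(groups):
    """Sum devices over (number_of_sets, sdrams_per_set) pairs."""
    return sum(n_sets * per_set for n_sets, per_set in groups)


# Clock/address: 3 sets, each drives two SDRAMs
clock_addr_devices = device_count([(3, 2)])
# Controls: two sets drive two SDRAMs each, two sets drive one each
control_devices = device_count([(2, 2), (2, 1)])

raw_mbytes = clock_addr_devices * DEVICE_MBIT // 8   # total raw storage
tmr_mbytes = raw_mbytes // 3                         # three copies of every word
edac_goal_mbytes = 256                               # EDAC capacity from the Goals slide
check_mbytes = raw_mbytes - edac_goal_mbytes         # storage left for check bits

print(clock_addr_devices, control_devices)
print(raw_mbytes, tmr_mbytes, check_mbytes)
```

Both groupings give the same device count under this reading, and the raw total leaves one third of the storage for check bits in the EDAC configuration.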

13
FPGA SEU Mitigation (4/5)
  • Bi-directional
  • Xilinx recommendation
  • Use 1 pin per signal
  • Path from voter to the pin becomes possible
    single point failure
  • Implementation
  • SDRAM Interface
  • TMR configured SDRAMs
  • 3 sets of data bus
  • EDAC configured SDRAMs
  • Use 1 pin per signal

14
FPGA SEU Mitigation (5/5)
  • Implication on data integrity of the SDRAM
    contents
  • EDAC configured SDRAMs
  • 256 MB
  • Output drivers and input receivers are possible
    single point failure
  • TMR configured SDRAMs
  • 128 MB
  • No single point failure
  • Background SDRAM content scrubbing
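
Background scrubbing of TMR configured memory can be pictured as a loop that reads the three copies of each word, votes, and rewrites any copy that disagrees. The sketch below models the three SDRAM sets as Python lists; it is an illustration under that assumption, not the RTIMS scrubber, and the names are hypothetical.

```python
def vote(a, b, c):
    """Bitwise 2-of-3 majority of three words."""
    return (a & b) | (a & c) | (b & c)


def scrub(banks):
    """Rewrite every word in all three banks with the voted value.
    Returns the number of copies that had to be corrected."""
    corrected = 0
    for addr in range(len(banks[0])):
        good = vote(banks[0][addr], banks[1][addr], banks[2][addr])
        for bank in banks:
            if bank[addr] != good:
                bank[addr] = good
                corrected += 1
    return corrected


# Three identical copies, then a single-bit upset in one of them.
banks = [[0x11, 0x22, 0x33] for _ in range(3)]
banks[0][1] ^= 0x04
print(scrub(banks), banks[0] == banks[1] == banks[2])
```

Running the loop often enough keeps a single upset from being joined by a second upset in the same bit of another copy, at which point the vote would return the wrong value.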

15
XTMR Tool (1/4)
  • Fairly fast
  • Gates utilized
  • Average utilization cost of TMR is 3.2x
  • RTIMS actual
  • 4.3x
  • Gates multiplier = 3 + 3 × (fraction of flops +
    fraction of I/Os)
  • It is closer to 3x for a design that is mostly
    gates
  • It is closer to 6x for a design that is mostly
    flops
  • RTIMS actual 36% flops
  • Additional multiplier for design with SRL16

16
XTMR Tool (2/4)
  • Internal performance degradation
  • Average performance impact of TMR is 10%
  • RTIMS actual
  • 20%
  • 6 logic levels original
  • Add a voter, 7 levels
  • 15% performance impact
  • Longer routing
  • 3.8x gates
  • 5% performance impact
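
The split of the RTIMS slowdown above can be reproduced with simple arithmetic. The sketch assumes the bare numbers on the slide are percentages and that every logic level contributes the same delay, which is a simplification; the function name is hypothetical.

```python
def added_level_penalty(levels_before, levels_added=1):
    """Fractional delay increase when logic levels are added,
    assuming equal delay per level."""
    return levels_added / levels_before


voter_penalty = added_level_penalty(6)     # 6 levels become 7 with the voter
quoted_voter = 0.15                        # figure from the slide
quoted_routing = 0.05                      # figure from the slide
quoted_total = quoted_voter + quoted_routing

print(f"estimated voter penalty: {voter_penalty:.1%}")
print(f"quoted total: {quoted_total:.0%}")
```

The equal-delay estimate lands close to the quoted voter figure, and the two quoted contributions add up to the overall RTIMS number.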

17
XTMR Tool (3/4)
  • I/O performance degradation
  • Input Pin
  • TMR
  • Voters after the FF
  • Lock the FF in the IOB
  • No TMR on input pin
  • 3 FFs after the input receiver
  • Can't lock the FF in the IOB
  • Performance penalty
  • RTIMS actual increased from 1.8 ns to 3.6 ns

18
XTMR Tool (4/4)
  • Output Pin
  • Triplicate pin, tied together on board
  • Add Voter before the output driver
  • Glitch
  • Can't lock the FF in the IOB
  • Performance penalty
  • Signal integrity
  • Not triplicating pin
  • Add voter before the output driver
  • Glitch
  • Can't lock the FF in the IOB
  • Performance penalty
  • RTIMS actual increased from 4.5 ns to 6.4 ns

19
Storage state
  • Correct an SEU on storage state before the next
    SEU makes it uncorrectable
  • Memory content
  • Scrubbing
  • Flop state
  • Basic Xilinx flop FDCPE(PRE, D, CE, C, CLR, Q)
  • Inputs of FLOP are corrected
  • Unless CE is active, the Flop state is not
    corrected.
  • 3 minority voters and 3 OR gates can be added to
    force a CE on error detected
  • Expensive to apply this universally
  • For an almost static flop, the following FLOP is
    used (figure not reproduced in this transcript)

20
A few other things (1/4)
  • Digital Clock Manager
  • Use 3 DCMs for each DCM that is in the original
    design
  • DCM is a unit
  • SEU on a FLOP in the DCM
  • Corrected by configuration scrubbing
  • Reset only
  • 3 counters, each counter is clocked by a DCM
  • When one of the counter values differs from the
    other two, we know which DCM is operating
    differently from the others
  • Each counter is TMR so that an SEU on the counter,
    other than on the clock path, will not produce an
    error
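
The counter comparison described above amounts to finding the odd one out among three values. The sketch below shows that decision in software; it assumes the three counts can be compared for exact equality at a common sample point, which real hardware with three clock domains may need to relax, and the function name is hypothetical.

```python
def find_faulty_dcm(counts):
    """Given the three counter values, return the index (0, 1 or 2) of the
    DCM whose counter disagrees with the other two, None if all agree,
    or -1 if all three differ and no single DCM can be blamed."""
    a, b, c = counts
    if a == b == c:
        return None
    if a == b:
        return 2
    if a == c:
        return 1
    if b == c:
        return 0
    return -1


# DCM 1 has lost lock and its counter has fallen behind.
print(find_faulty_dcm([1000, 987, 1000]))
```

Once the disagreeing DCM is identified, only that DCM needs the reset mentioned on the slide.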

21
A few other things (2/4)
  • Configuration scrubbing
  • Similar to Virtex
  • Virtex II
  • Whole configuration is loaded with 1 type 2
    command
  • The order of configuration loading is
  • GCLK, CLB and IOB, Memory Content, and Memory
    Control
  • Script to split the loading into three type 2
    commands
  • GCLK, CLB, IOB
  • Memory control
  • Memory content
  • On power up the whole configuration is loaded
  • On scrubbing, only GCLK, CLB, IOB, and memory
    control are loaded
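
The split described above can be summarized as a small lookup: three separately loadable sections, all of them loaded at power up and only some of them loaded during scrubbing. The section labels are taken from the slide; the event names and the function are hypothetical, and the sketch says nothing about the actual SelectMAP command stream.

```python
# The three type 2 loads produced by the split, in the order listed on the slide
SECTIONS = ("GCLK, CLB, IOB", "Memory control", "Memory content")


def sections_to_load(event):
    """Return the configuration sections to load for a given event."""
    if event == "power_up":
        return list(SECTIONS)
    if event == "scrub":
        # Memory content is left alone during scrubbing
        return [s for s in SECTIONS if s != "Memory content"]
    raise ValueError(f"unknown event: {event}")


print(sections_to_load("power_up"))
print(sections_to_load("scrub"))
```

The slides do not state why memory content is skipped during scrubbing; a plausible reason is that reloading it would overwrite data the running design has stored there.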

22
A few other things (3/4)
  • Configuration scrubbing
  • Scrubber logic is TMR and it is part of the FPGA
    code
  • Master SelectMAP is used for configuration, with
    the configuration clock continuing to run after
    the initial load
  • Scrubber logic is clocked by the configuration
    clock
  • The generation of the configuration clock becomes
    a possible single point failure
  • Can switch to Slave SelectMAP and add an external
    oscillator

23
A few other things (4/4)
  • SelectMAP interface SEFI detection
  • Implement a 16x1 distributed memory as SRL16 with
    initial value of all zeros
  • Instruct XTMR not to convert it to registers
  • Write a signature into this memory prior to
    configuration scrubbing
  • This memory shall be cleared by the reloading of
    the CLB during configuration scrubbing
  • Read the memory content after configuration
    scrubbing
  • A non-zero content indicates scrubbing failure
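
The signature check above can be walked through with a toy model. The class below stands in for the FPGA and its 16x1 memory; it is purely illustrative, the signature value is an arbitrary assumption, and none of the names correspond to a real Xilinx interface.

```python
SIGNATURE = 0xA5A5   # hypothetical non-zero 16-bit pattern


class FakeFpga:
    """Toy stand-in for the device: one 16-bit SRL16 and a scrub operation."""

    def __init__(self, scrub_works=True):
        self.scrub_works = scrub_works
        self.srl16 = 0               # initial value of all zeros

    def write_signature(self, value):
        self.srl16 = value & 0xFFFF

    def scrub(self):
        # A working scrub reloads the CLB, restoring the all-zero initial value.
        if self.scrub_works:
            self.srl16 = 0

    def read_signature(self):
        return self.srl16


def scrub_and_check(fpga):
    """Write the signature, scrub, then confirm the memory reads back as zero."""
    fpga.write_signature(SIGNATURE)
    fpga.scrub()
    return fpga.read_signature() == 0


print(scrub_and_check(FakeFpga(scrub_works=True)))
print(scrub_and_check(FakeFpga(scrub_works=False)))
```

A scrub that silently fails, for example because the configuration interface has suffered a SEFI, leaves the signature in place and is caught by the read-back.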

24
Stack
SDRAM
MISC
25
Status
  • 20 Modules
  • Related paper "Radiation Tolerant and
    Intelligent Memory for Space" (P1025)
  • 144-Lead QFP package
  • Dimensions: 42.5 mm x 42.5 mm x 13.0 mm
  • Mass 70 g with radiation shielding
  • Power 4.0 W peak
  • To Be Verified / Analyzed
  • Total Ionizing Dose > 100 krad (Si)
  • SEU in GEO less than 1.5E-6 per day
  • Latch-Up Immune to 60 MeV-cm2/mg

26
Future Work
  • VHDL and Place & Route
  • Work in progress
  • Minimize SEFI
  • Error detection and recording
  • Error recovery
  • What is the SEFI rate of RTIMS?
  • Environment testing
  • Life test (accelerated component life testing)
  • 100 krad (Si) TID radiation tests
  • SEL and SEU radiation tests
  • Vacuum and temperature tests
  • Mechanical stress tests
  • Electrostatic discharge tests

27
Points to ponder
  • XTMR
  • Not a turnkey process
  • Scrub memory content
  • Almost static flop
  • DCM failure detection and reset
  • Glitch-free output is no longer glitch-free
  • Signal integrity with dotted output
  • IO
  • 3 pins for one signal, EDAC
  • Tie the triplicate IO together vs carry three
    signals on the board with the voter implemented
    on the receiving side
  • One size does not fit all