Title: In-Place Decomposition for Robustness in FPGA
1In-Place Decomposition for Robustness in FPGA
Ju-Yueh Lee, Zhe Feng, and Lei He
Electrical Engineering Dept., UCLA
Presented by Ju-Yueh Lee
Address comments to Lei He (lhe_at_ee.ucla.edu)
2Outline
- Preliminaries and Motivations
- In-Place Decomposition (IPD) Formulations
- IPD Properties and Algorithms
- Experimental Results
- Conclusions and Future Work
3Single Event Upset (SEU)
- Soft errors are caused by single event upsets (SEUs) due to cosmic rays, supply voltage fluctuations, and electromagnetic coupling
- SEUs result in bit flips in the affected memory elements
- SRAM-based FPGA
- High density
- Reprogrammability
- Susceptible to SEU
- Configuration memory, user flip-flops (FFs), and registers
- Soft errors in configuration memory have a permanent impact until scrubbing
4Related Work
- Explicit redundancy
- Triple Modular Redundancy (TMR)
- Logic masking with little or no area overhead
- Robust FPGA resynthesis based on fault-tolerant Boolean matching, Hu et al., ICCAD 2008
- IPR: In-place reconfiguration for FPGA fault tolerance, Feng et al., ICCAD 2009
5Related Work
- Fault-tolerant Resynthesis with Dual-Output LUT, Lee et al., ASP-DAC 2010
[Figure: original LUT, with duplication and encoding]
6Limitation: Slow Design Closure
- Logic coding in the fanout LUT leads to:
- Extra interconnects → more delay, more routing congestion, and more interconnect faults
- Slow design closure between logic and physical syntheses
[Figure: extra interconnects]
7Limitation: Not Applicable to Large Functions
- Duplication cannot be applied to
- 6-input functions for Virtex-5 with LUT6
- 5- and 6-input functions for Stratix-IV with ALM
8Outline
- Preliminaries and Motivation
- In-Place Decomposition (IPD) Formulations
- IPD Properties and Algorithms
- Experimental Results
- Conclusions and Future Work
9Fault Metrics
- Criticality of a configuration SRAM bit
- Mean-Time-To-Failure (MTTF)
- System level measurement of reliability
- For the single-fault model, MTTF ∝ 1/average(Cb), where Cb is the criticality of a configuration SRAM bit
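The proportionality above gives a quick way to estimate a relative MTTF gain from per-bit criticalities. The sketch below is only an illustration under the single-fault model; the helper name `mttf_improvement` and the sample criticality values are hypothetical and are not taken from the experiments.

```python
def mttf_improvement(crit_before, crit_after):
    """Relative MTTF gain, assuming MTTF is proportional to 1/average(Cb)."""
    avg_before = sum(crit_before) / len(crit_before)
    avg_after = sum(crit_after) / len(crit_after)
    if avg_after == 0:
        raise ValueError("average criticality after optimization is zero")
    return avg_before / avg_after


# Hypothetical per-bit criticalities for the same set of configuration bits.
before = [0.2, 0.2, 0.4, 0.2]
after = [0.0, 0.2, 0.0, 0.2]
print(f"{mttf_improvement(before, after):.2f}X")
```

Because only the ratio of averages matters, the unknown proportionality constant cancels out.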
10Logic Decomposition
- Decomposition: F = C(F1, F2, ..., Fn), where C is the logic function of the converging logic, e.g., AND or OR
[Figure: an original LUT and the decomposed LUT produced by decomposition]
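A decomposition is valid when the converging logic applied to the sub-function outputs reproduces F for every input pattern. The following is a minimal sketch of that check; the function names and the 3-input AND example are hypothetical illustrations, not taken from the slides.

```python
from itertools import product


def converge(op, bits):
    """Converging logic C: AND or OR over the sub-function outputs."""
    if op == "AND":
        return int(all(bits))
    if op == "OR":
        return int(any(bits))
    raise ValueError("op must be 'AND' or 'OR'")


def is_valid_decomposition(f, subfunctions, op, n_inputs):
    """Check F(x) == C(F1(x), ..., Fn(x)) for every input pattern x."""
    for x in product((0, 1), repeat=n_inputs):
        if f(x) != converge(op, [g(x) for g in subfunctions]):
            return False
    return True


# Hypothetical example: a 3-input AND split into two 2-input ANDs.
f = lambda x: x[0] & x[1] & x[2]
f1 = lambda x: x[0] & x[1]
f2 = lambda x: x[1] & x[2]
print(is_valid_decomposition(f, [f1, f2], "AND", 3))  # True
print(is_valid_decomposition(f, [f1, f2], "OR", 3))   # False
```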
11Circuit Implementation of Decomposition
- Logic decomposition by decomposable LUT
- Converging logic by fanout programmable logic block (PLB) or carry chains/adders
- Carry chains/adders have only a 10% to 50% utilization rate, making them an ideal choice for converging logic
[Figure: Altera Stratix-IV and Xilinx Virtex-5]
13In-Place Decomposition
- Converging within the same PLB → in-place decomposition
[Figure: an original LUT decomposed into a decomposable LUT followed by a carry chain or adder]
14Carry Chain/Adder Configuration
- Configure carry chains/adders as AND/OR gates
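The slide does not spell out the configuration. One way to see that an adder can act as a gate is the carry-out of a one-bit full adder, which is the majority of its three inputs: tying the carry-in to 0 yields AND, and tying it to 1 yields OR. The sketch below checks this identity. It models a generic full adder, which is an assumption, not the exact Virtex-5 or Stratix-IV carry logic.

```python
def full_adder_carry(a, b, cin):
    """Carry-out of a one-bit full adder (majority of a, b, cin)."""
    return (a & b) | (cin & (a ^ b))


def and_via_carry(a, b):
    """Carry-in tied to 0: the carry-out equals a AND b."""
    return full_adder_carry(a, b, 0)


def or_via_carry(a, b):
    """Carry-in tied to 1: the carry-out equals a OR b."""
    return full_adder_carry(a, b, 1)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_via_carry(a, b), or_via_carry(a, b))
```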
15Example 1: In-Place Duplication
- An example of in-place duplication on the n24 node from the alu4 benchmark circuit
- Original LUT: F = 0xA280, Avg. Crit. 0.6876
- After IPD (in-place duplication): F1 = 0xA280, F2 = 0xA280, Avg. Crit. 0.2116
16Example 2: In-Place Decomposition
- Original LUT: F = 0x8000 (duplication not applicable), Avg. Crit. 0.8
- After IPD (in-place decomposition): F1 = 0x7, F2 = 0x7, Avg. Crit. 0.4
17Recap of IPD
Given: a placed and routed circuit
Objective: decompose LUTs to minimize criticalities
- IPD Advantages
- No PLB level overhead
- Fast design closure
- Can be applied to large functions
18Outline
- Preliminaries and Motivation
- In-Place Decomposition (IPD) Formulations
- IPD Properties and Algorithms
- Experimental Results
- Conclusions and Future Work
19IPD Property 1
- An optimal LUT decomposition can be obtained by
duplication when duplication is applicable
21In-Place Duplication vs. Decomposition
[Figure: in-place duplication compared with in-place decomposition]
22IPD Property 2
- An optimal LUT decomposition can be obtained by
applying AND or OR converging logic
24IPD Algorithm
- The IPD problem is formulated as an integer linear programming (ILP) problem and solved optimally for each PLB
Objective
- Minimize LUT criticality
Subject to
- Circuit function is preserved
- LUT architecture constraints
- Logic masking constraints
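The ILP itself is not given on the slide. To show the shape of the problem, the sketch below swaps the ILP for an exhaustive search over two sub-functions and an AND or OR converging gate, which is feasible only for very small LUTs. Everything here is an assumption for illustration: the function names, the fixed input supports, and the cost, which is a crude proxy (the number of pattern/LUT pairs where a flipped bit is not masked by the other LUT) rather than the criticality values used in the talk.

```python
from itertools import product


def project(x, support):
    """Pick the input bits listed in `support` out of the full pattern x."""
    return tuple(x[i] for i in support)


def search_decomposition(f, n_inputs, support1, support2):
    """Exhaustively find (cost, op, F1, F2) with op(F1, F2) == f and minimal cost.

    f maps each input pattern (tuple of bits) to 0 or 1. The cost counts, over
    all patterns, how many addressed LUT bits would corrupt the output if
    flipped (assumed proxy for criticality).
    """
    patterns = list(product((0, 1), repeat=n_inputs))
    keys1 = list(product((0, 1), repeat=len(support1)))
    keys2 = list(product((0, 1), repeat=len(support2)))
    best = None
    for op in ("AND", "OR"):
        # Value of the other LUT that lets a flip reach the output.
        passing = 1 if op == "AND" else 0
        for bits1 in product((0, 1), repeat=len(keys1)):
            t1 = dict(zip(keys1, bits1))
            for bits2 in product((0, 1), repeat=len(keys2)):
                t2 = dict(zip(keys2, bits2))
                cost, preserved = 0, True
                for x in patterns:
                    v1 = t1[project(x, support1)]
                    v2 = t2[project(x, support2)]
                    out = (v1 & v2) if op == "AND" else (v1 | v2)
                    if out != f[x]:  # circuit function must be preserved
                        preserved = False
                        break
                    cost += (v2 == passing) + (v1 == passing)
                if preserved and (best is None or cost < best[0]):
                    best = (cost, op, t1, t2)
    return best


# Hypothetical target: 3-input AND, with F1 on inputs (0, 1) and F2 on (1, 2).
target = {x: x[0] & x[1] & x[2] for x in product((0, 1), repeat=3)}
cost, op, t1, t2 = search_decomposition(target, 3, (0, 1), (1, 2))
print(op, cost)  # AND 4
```

The function-preservation test inside the loop plays the role of the Boolean matching constraints, and the fixed supports stand in for the LUT architecture constraints.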
25Boolean Matching Constraints
- The circuit function is preserved by Boolean matching constraints that consider the LUT architecture
- Boolean matching for each input pattern
- Each SRAM bit is matched such that the LUT function is preserved
26Logic Masking Constraints
- The criticality reduction is constrained by the logic masking that IPD can produce
- Criticality update constraint
28Outline
- Preliminaries and Motivation
- In-Place Decomposition (IPD) Formulations
- IPD Properties and Algorithms
- Experimental Results
- Conclusions and Future Work
29Experimental Settings
- Assume single fault
- Apply to architectures similar to Xilinx Virtex-5 and Altera Stratix-IV FPGAs
- Use the 10 largest MCNC combinational circuits, mapped to 6-input LUTs by the ABC technology mapper
30Experiment on Virtex-5 Similar Architecture
- 2.43X MTTF improvement on the ex1010 circuit under a 0% carry chain utilization rate
- 5-input or smaller functions are in-place duplicated
- 6-input functions are hardly improved
31Experiment on Stratix-IV Similar Architecture
- 9.67X MTTF improvement on the apex2 circuit under a 0% adder utilization rate
- 4-input or smaller functions are in-place duplicated
- 5- and 6-input functions are decomposed with four inputs shared
32IPD Improvement Indicator
- The criticality difference between the on-set and off-set SRAM bits is a good indicator of IPD improvement
33Outline
- Preliminaries and Motivation
- In-Place Decomposition (IPD) Formulations
- IPD Properties and Algorithms
- Experimental Results
- Conclusions and Future Work
34Conclusions and Future Work
- Proposed a new robustness technique leveraging decomposable LUTs and built-in carry chains/adders in modern FPGAs
- 1.59X MTTF improvement for an architecture similar to Xilinx Virtex-5
- 4.51X MTTF improvement for an architecture similar to Altera Stratix-IV
- Future work
- Develop a more accurate robustness analysis and a more efficient algorithm for sequential circuits
- Study the interaction between IPD and existing fault-tolerant techniques for robustness optimization
35Thanks
In-Place Decomposition for Robustness in FPGA
Ju-Yueh Lee, Zhe Feng, and Lei He
36IPD Experimental Results (1)
- IPD on 10 biggest MCNC combinational circuits
37IPD Experimental Results (1)
- IPD on 10 biggest MCNC combinational circuits
- 9.67X improvement
38IPD Algorithm Flow
39Dual-Output LUT Configurations
- Xilinx Virtex-5 6-input LUT
Func. sizes   # of shared inputs
5, 5          5
- Altera Stratix-IV ALM
Func. sizes   # of shared inputs
4, 4          0
5, 3          0
5, 4          1
5, 5          2
6, 6          4
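The configurations listed above can be captured as a small lookup table. The sketch below assumes that the last number in each row is the number of inputs shared by the two functions; the dictionary and helper names are hypothetical. It also computes how many distinct inputs each configuration uses.

```python
# (size of function 1, size of function 2) -> number of shared inputs,
# copied from the configurations listed on this slide.
DUAL_OUTPUT_CONFIGS = {
    "Virtex-5 LUT6": {(5, 5): 5},
    "Stratix-IV ALM": {(4, 4): 0, (5, 3): 0, (5, 4): 1, (5, 5): 2, (6, 6): 4},
}


def distinct_inputs(size1, size2, shared):
    """Distinct inputs used by two functions that share `shared` inputs."""
    return size1 + size2 - shared


for arch, configs in DUAL_OUTPUT_CONFIGS.items():
    for (size1, size2), shared in configs.items():
        print(arch, (size1, size2), shared, distinct_inputs(size1, size2, shared))
```

Under this reading, every listed ALM configuration uses eight distinct inputs, and the Virtex-5 configuration uses five.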
40IPD Experimental Results
- IPD under different carry chain/adder utilization rates
- 9.67X MTTF improvement on the apex2 circuit
41Criticality Update for In-Place Duplication (1)
Step: Duplication
Average Crit. 0.125 (one used table and one unused table); Average Crit. 0.25 (used tables only)
Used LUT table (shown three times on the slide):
Input  Output  Crit.
00     0       0.2
01     1       0.2
10     0       0.4
11     1       0.2
Unused LUT table:
Input  Output    Crit.
00     not used  0
01     not used  0
10     not used  0
11     not used  0
42Criticality Update for In-Place Duplication (2)
Step: AND encoding
Average Crit. 0.25 (original criticalities); Average Crit. 0.08 (after AND encoding)
LUT table with the original criticalities (shown twice on the slide):
Input  Output  Crit.
00     0       0.2
01     1       0.2
10     0       0.4
11     1       0.2
LUT table after AND encoding (shown twice on the slide):
Input  Output  Crit.
00     0       0
01     1       0.2
10     0       0
11     1       0.2
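The tables on this slide suggest a simple update rule for a duplicated LUT: with AND converging logic, a flipped 0 bit is masked by the other copy, so its criticality drops to 0, while 1 bits keep their criticality. The sketch below encodes that reading. The rule for OR is the mirrored assumption and does not appear on the slide, and the function name is hypothetical.

```python
def update_criticality(outputs, crits, op):
    """Per-bit criticality of one LUT copy after in-place duplication.

    Assumed rule: with AND converging, bits storing 0 are masked by the other
    copy; with OR converging, bits storing 1 are masked.
    """
    if op not in ("AND", "OR"):
        raise ValueError("op must be 'AND' or 'OR'")
    masked_value = 0 if op == "AND" else 1
    return [0.0 if out == masked_value else c for out, c in zip(outputs, crits)]


outputs = [0, 1, 0, 1]          # rows 00, 01, 10, 11
crits = [0.2, 0.2, 0.4, 0.2]    # original criticalities from the slide
print(update_criticality(outputs, crits, "AND"))  # [0.0, 0.2, 0.0, 0.2]
print(update_criticality(outputs, crits, "OR"))   # [0.2, 0.0, 0.4, 0.0]
```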
43Criticality Update for In-Place Decomposition
Step: Decomposition
Average Crit. 0.125 (original 3-input LUT); Average Crit. 0.075 (two decomposed 2-input LUTs)
Original LUT:
Input  Output  Crit.
000    0       0.05
001    0       0.05
010    0       0.05
011    0       0.05
100    0       0.05
101    0       0.05
110    0       0.05
111    1       0.65
Decomposed LUT 1:
Input  Output  Crit.
00     0       0.1
01     0       0.05
10     0       0.1
11     1       0.05
Decomposed LUT 2:
Input  Output  Crit.
00     0       0.1
01     0       0.1
10     0       0.05
11     1       0.05
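The two averages quoted on this slide can be reproduced from the tables, assuming the 3-input table is the original LUT and the two 2-input tables are the decomposed functions. The variable names below are illustrative only.

```python
def average(values):
    """Arithmetic mean of a list of per-bit criticalities."""
    return sum(values) / len(values)


original = [0.05] * 7 + [0.65]       # 3-input table, rows 000..111
part_a = [0.1, 0.05, 0.1, 0.05]      # first 2-input table
part_b = [0.1, 0.1, 0.05, 0.05]      # second 2-input table

print(round(average(original), 3))         # 0.125
print(round(average(part_a + part_b), 3))  # 0.075
```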