Fault-Tolerant Clustering for FPGAs

Jason Cong and Brian Tagiku, VLSI CAD Laboratory, Computer Science Department, University of California, Los Angeles

1
Fault-Tolerant Clustering for FPGAs
  • Jason Cong and Brian Tagiku
  • VLSI CAD Laboratory
  • Computer Science Department
  • University of California, Los Angeles
  • {cong,btagiku}@cs.ucla.edu
  • http://cadlab.cs.ucla.edu/

2
Outline
  • Background
  • Problem Model and Formulation
  • Fault-Tolerant Clustering
  • Fault Assignment and Reconfiguration
  • Future Work

3
Previous Work
  • Flat (Non-Hierarchical) FPGAs
    • Hatori et al. (Toshiba, 1993): spare rows of CLBs
    • Howard et al. (Univ. of York, 1994): spare blocks of CLBs
    • Hanchek and Dutt (Intel/UIUC, 1996): node covering,
      each CLB assigned a node to cover
    • Lach et al. (UCLA, 1998): tiling, FPGA partitioned
      into tiles and alternate configurations for each
      tile precomputed
  • Hierarchical FPGAs
    • Lakamraju and Tessier (Univ. of Mass., 2000):
      spare elements in each block level

4
Related Work
  • Fault covering in memory arrays
    • Spare rows and columns available
    • Must use spares to cover the entire row or column in
      which faults occur
    • Difficulty lies in finding a set of covering rows
      and columns
  • Comparison to fault tolerance in FPGAs
    • A set of spares to cover faults is easy to find
    • Difficulty is finding a set that allows a target
      delay to be met

5
Hierarchical FPGAs
  • Two-level hierarchical circuit logic
  • Level 0 blocks: LUTs
  • Level 1 blocks: clusters of LUTs
  • Uses locality of interconnections to improve
    circuit performance

6
Redundancy in FPGAs
  • LUTs can fail with some probability
  • Allocate extra components (e.g. LUTs) into the
    system
  • Re-route inputs and outputs to a spare LUT
  • Ideally, want the spare LUT to be close to the
    failure so that delay does not increase
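
As a rough illustration of why allocating extra LUTs helps, the sketch below computes the probability that a cluster can still host its logic. It assumes LUTs fail independently with the same probability, which the slides do not state; the function name and the example cluster (3 LUTs, 1 of them a spare) are hypothetical.

```python
from math import comb

def prob_cluster_usable(total_luts: int, spare_luts: int, p_fail: float) -> float:
    """Probability that at most `spare_luts` of `total_luts` LUTs fail.

    Assumes independent failures with identical probability p_fail
    (an illustrative assumption, not taken from the slides).
    """
    return sum(
        comb(total_luts, i) * p_fail**i * (1 - p_fail) ** (total_luts - i)
        for i in range(spare_luts + 1)
    )

# Hypothetical cluster: 3 LUTs, failure probability 0.1.
print(round(prob_cluster_usable(3, 1, 0.1), 3))  # one spare: 0.972
print(round(prob_cluster_usable(3, 0, 0.1), 3))  # no spare:  0.729
```

A spare inside the same cluster covers a failure without adding an intercluster edge, which is why the slide prefers spares close to the failure.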

9
Redundancy in FPGAs
  • LUTs can fail with some probability
  • Allocate extra components (e.g. LUTs) into the
    system
  • Recover from defects by using spare LUTs
  • Ideally, want the spare LUT to be close to the
    failure so that delay does not increase

11
Fault-Tolerant Clustering
  • Inputs
    • A DAG G (LUT netlist)
    • An HFPGA with k clusters of c LUTs
    • Intercluster and intracluster edge delays
    • Probability of LUT defects
    • Target delay D
  • Goal
    • Map G into the HFPGA to maximize the probability of
      achieving delay D after reconfiguration of
      failures

12
Motivational Example
Probability of LUT failure: 0.1

Maximum intercluster edges along path    Probability
1                                        0.89
2                                        0.09
failure                                  0.02

Maximum intercluster edges along path    Probability
1                                        0.97
2                                        0.01
failure                                  0.02

13
Dynamic Programming Heuristic
  • Use a dynamic programming matrix A
  • Each entry A[i,j,k] stores a clustering solution
    of node i and its predecessors such that
    • Exactly j clusters are used
    • Arrival time at i is at most k
    • The probability of achieving delay k is maximized
  • Allows node duplication
  • Assumes constant fan-in

14
DP Heuristic Performance
  • 10% failure rate
  • Intracluster edge delay = 0
  • Intercluster edge delay = 3
  • LUT delay = 1
  • 8 clusters, each of 3 LUTs
  • Target delay of 7
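
The delay model used in this experiment (LUT delay 1, intracluster edge delay 0, intercluster edge delay 3) can be sketched as a longest-path computation over the LUT DAG. This is a minimal sketch, not the authors' tool: the function name, the four-LUT chain, and the cluster labels are hypothetical, and primary inputs are assumed to arrive at time 0. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

def circuit_delay(preds, cluster_of, lut_delay=1, intra_delay=0, inter_delay=3):
    """Return the latest arrival time over all LUTs.

    preds:      dict mapping each LUT to a list of its predecessor LUTs
    cluster_of: dict mapping each LUT to the id of the cluster it sits in
    """
    arrival = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        worst_input = 0  # assumption: primary inputs arrive at time 0
        for p in preds.get(node, ()):
            same = cluster_of[p] == cluster_of[node]
            edge = intra_delay if same else inter_delay
            worst_input = max(worst_input, arrival[p] + edge)
        arrival[node] = worst_input + lut_delay
    return max(arrival.values())

# Hypothetical chain a -> b -> c -> d.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
one_crossing = {"a": 0, "b": 0, "c": 1, "d": 1}
two_crossings = {"a": 0, "b": 0, "c": 1, "d": 2}
print(circuit_delay(chain, one_crossing))   # 4 LUTs + 1 crossing  = 7
print(circuit_delay(chain, two_crossings))  # 4 LUTs + 2 crossings = 10
```

With a target delay of 7, the first placement meets the target and the second does not, so each extra intercluster edge on the critical path matters.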

15
DP Heuristic Performance
Min-delay clustering: achieves delay 7 with probability 0.28
DP clustering: achieves delay 7 with probability 0.39

16
Difficulties
  • Best known algorithm for calculating probability
    distribution of delays is exponential
  • Doesn't specify how to reconfigure a circuit

17
Failure Assignment
  • Inputs
    • A DAG G
    • An HFPGA with k clusters of c LUTs
    • Intercluster and intracluster edge delays
    • Target delay D
    • A mapping of G into the FPGA
    • A set of failed LUTs
  • Goal
    • Reassign failed LUTs to spare LUTs so that the
      delay D is still met.
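
To make the Failure Assignment problem concrete, here is a brute-force sketch that tries every way of moving failed LUTs onto spares and keeps the first one that meets the target delay. It is exponential and only meant for tiny instances. The function names, the `(cluster, slot)` position encoding, and the example are hypothetical; delays use LUT delay 1, intracluster edge delay 0, and intercluster edge delay 3, with primary inputs assumed to arrive at time 0.

```python
from graphlib import TopologicalSorter
from itertools import permutations

def delay(preds, cluster_of, lut_delay=1, intra=0, inter=3):
    """Longest-path arrival time for a given node-to-cluster map."""
    arrival = {}
    for v in TopologicalSorter(preds).static_order():
        latest = max(
            (arrival[u] + (intra if cluster_of[u] == cluster_of[v] else inter)
             for u in preds.get(v, ())),
            default=0,
        )
        arrival[v] = latest + lut_delay
    return max(arrival.values())

def assign_failures(preds, placement, spares, failed, target):
    """Try every injective assignment of failed LUTs to working spares.

    placement: node -> (cluster, slot); spares: list of free (cluster, slot);
    failed: set of defective (cluster, slot) positions.
    Returns a repaired placement meeting `target`, or None if none exists.
    """
    usable = [s for s in spares if s not in failed]
    broken = [v for v, pos in placement.items() if pos in failed]
    for choice in permutations(usable, len(broken)):
        trial = dict(placement)
        trial.update(zip(broken, choice))
        clusters = {v: pos[0] for v, pos in trial.items()}
        if delay(preds, clusters) <= target:
            return trial
    return None

# Hypothetical chain a -> b -> c -> d; the LUT hosting b has failed.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
placed = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}
fixed = assign_failures(chain, placed, [(2, 0), (0, 2)], {(0, 1)}, target=7)
print(fixed["b"])  # (0, 2): the spare in b's own cluster keeps the delay at 7
```

Moving b to the spare in cluster 2 would add a second intercluster edge and give delay 10, so only the spare in cluster 0 satisfies the target in this example.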

18
More Difficulties
  • Failure Assignment is NP-complete
    • Even with fixed cluster sizes c 4
    • Even if spares are guaranteed to be non-defective
    • Even if we are guaranteed at least m spares per
      cluster
    • Even if no more than P percent of the LUTs fail
  • These results seem to imply that Fault-Tolerant
    Clustering is also a hard problem

19
Online Failure Assignment
  • Defects and faults announced online (one at a
    time)
    • We must assign a fault to a spare when announced
    • We cannot change our mind at a later time
  • Related Problems
    • Online Routing
      • Faults need to be connected to spares
      • ?(log n)-competitive algorithm known
    • Online Bipartite Matching
      • Faults need to be matched to spares
      • O(log^3 n) randomized algorithm for metric cases
        known
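
The online setting can be illustrated with a simple greedy rule: each announced fault is immediately and irrevocably matched to the nearest free spare. This is a sketch for intuition only, not one of the competitive algorithms cited above, and greedy can be far from optimal. The class name, the grid coordinates, and the Manhattan distance are assumptions made for the example.

```python
class OnlineAssigner:
    """Greedy online matching of faults to spares (illustrative only)."""

    def __init__(self, spares):
        self.free = set(spares)   # spare positions still available
        self.assignment = {}      # fault position -> chosen spare

    def announce(self, fault):
        """Assign `fault` to the closest free spare; the choice is final."""
        if not self.free:
            return None
        # Manhattan distance on hypothetical (x, y) positions;
        # ties are broken by coordinate order for determinism.
        best = min(
            self.free,
            key=lambda s: (abs(s[0] - fault[0]) + abs(s[1] - fault[1]), s),
        )
        self.free.remove(best)
        self.assignment[fault] = best
        return best

assigner = OnlineAssigner([(0, 0), (4, 0)])
print(assigner.announce((1, 0)))  # (0, 0), distance 1
print(assigner.announce((0, 1)))  # (4, 0), distance 5: the near spare is gone
```

Here greedy pays a total distance of 6, while matching the first fault to (4, 0) and the second to (0, 0) would cost 4. That gap is the price of not being able to change an earlier decision.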

20
Future Work
  • Try to modify inputs to Failure Assignment
    • So that the problem admits a poly-time exact (or
      approximation) algorithm
    • Restrict the given mapping to allocate spares in
      some manner
  • Use results from Failure Assignment to guide
    clustering algorithms
  • Generalize delay model so that LUT placement is
    also considered

21
Future Applications
  • Nanoscale FPGAs
  • Integrate with BIST to make a self-repairing
    system
  • Produce profitable yield despite high defect rates