Unreliable Silicon: Myth or Reality?

About This Presentation
Title: Unreliable Silicon: Myth or Reality?
Author: Shubu Mukherjee
Last modified by: ssmukher
Created Date: 10/16/2001 5:01:05 PM
Document presentation format: PowerPoint presentation


Transcript and Presenter's Notes


1
Unreliable Silicon: Myth or Reality?
  • Shubu Mukherjee
  • Principal Engineer
  • Director, SPEARS Group
  • (SPEARS: Simulation and Pathfinding of Efficient
    and Reliable Systems)
  • Intel Corporation
  • Workshop on Computer Architecture Research
    Directions (CARD)
  • Feb. 11th, 2007

2
What's the Truth?
  • There are three versions of the truth
      • My truth
      • Your truth
      • The truth

3
The Truth: Silicon is Becoming Unreliable

4
The End-User's Truth
  • End-users
      • Care deeply about reliable systems
      • May not be able to determine why their system
        failed
      • Expect the industry to produce reliable systems
        for them
  • Goal of silicon vendors
      • Keep silicon errors low enough (e.g., < 0.1% of
        all errors)
      • Low enough that end-users don't notice or don't
        care
  • Point Risks
      • An individual corruption or crash may be critical
        (e.g., a Windows 98 crash during a Gates demo)
      • End-users may demand chip replacement, even if
        the error was not permanent

5
The IT Manager's or Vendor's Truth
  • The Lightbulb Phenomenon
      • A house with 48 lightbulbs, each with a 4-year MTTF
      • Will replace a lightbulb every month, on average
  • Negative impact on business: billions of dollars
    involved
      • Increased total cost of ownership
      • Product returns and replacement
      • Loss of data and/or availability
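
The lightbulb arithmetic can be checked with a short sketch. It assumes each unit fails independently at a constant rate, so the expected number of failures per month is the population size divided by the MTTF in months. The function name and the constant-rate assumption are illustrative and are not taken from the slides.

```python
def expected_failures_per_month(num_units: int, mttf_years: float) -> float:
    """Expected failures per month across a population of identical units.

    Assumes independent units with a constant failure rate (an assumption
    made for this sketch), so the population rate is num_units / MTTF.
    """
    mttf_months = mttf_years * 12
    return num_units / mttf_months


# The slide's example: 48 lightbulbs, each with a 4-year MTTF.
bulbs = expected_failures_per_month(num_units=48, mttf_years=4)
print(bulbs)  # 1.0 replacement per month, on average
```

The same scaling applies to a vendor or IT manager responsible for a large population of parts: a long MTTF per part can still mean frequent failures across the whole population.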

6
The Designer's Awakening
  • Shock
      • "SER is the crabgrass in the lawn of computer
        design"
  • Denial
      • "We will do the SER work two months before
        tapeout"
  • Anger
      • "Our reliability target is too ambitious"
  • Acceptance
      • "You can deny physics only for so long"

Designers have accepted silicon reliability as a
challenge they will have to deal with

7
The Designer's Challenge
  • Protection comes from
      • Process: improved process technology
      • Materials: shielding for alpha particles
      • Circuits: rad-hard cells
      • Architecture: ECC, parity, hardened gates,
        redundant execution
      • Software: can provide detection and recovery at a
        higher level
  • Companies constantly making trade-offs for
    reliability
      • Cost of protection (performance and die size) vs.
        chip reliability
      • Products must meet the end-user's reliability
        expectations

Industry will produce reliably operating parts

8
Industry Needs Help with Research
  • Academia has some misconceptions
      • MTBF is only a rough estimate of an individual
        part's life
      • A system hang does not protect from data
        corruption
      • Adding protection without correction does not
        reduce the overall error rate
  • Research needed in different areas of silicon
    reliability
      • How do we predict and/or measure error rates from
        radiation, wearout, and variability?
      • How do we detect soft errors, wearout, and
        variability on individual parts?
      • Many traditional solutions exist, but how do we
        make them cheaper?