Top 5 Reasons Reliability is the Biggest Fallacy in Computer Architecture Research

About This Presentation
Description:

Scott Mahlke, University of Michigan. Thanks to Jason Blome, Shuguang Feng, and ...

Slides: 11
Provided by: fank150

Transcript and Presenter's Notes


1
Top 5 Reasons Reliability is the Biggest Fallacy
in Computer Architecture Research
  • Scott Mahlke
  • University of Michigan

Thanks to Jason Blome, Shuguang Feng, and
Shantanu Gupta for putting their research on
reliable systems on hold to help with this
presentation.
2
Disclaimer
Still a need for high-reliability designs for
mission-critical systems
  • Space shuttle, airplanes, etc.
  • Cost is not an issue: use high degrees of
    redundancy

I would like to convince you that reliability is a
fallacy for mainstream computer systems used in
consumer/business electronics.
The speaker may not agree with this position.
3
Reason 1: It's the Software, Stupid!
  • "Mature OS can have an MTTF measured in months,
    while newer OS may crash every few days." (Peter
    Chen, "Reliability Hierarchies," HotOS 1999)
  • Sources:
    [1] www.nstl.com
    [2] "A system-level approach for memory robustness," ICMTD 2005
    [3] "Lifetime Reliability: Towards an architectural solution," IEEE Micro 2005
    [4] www.calce.umd.edu
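
The argument behind this slide can be illustrated with a small calculation. The sketch below assumes software and hardware fail independently with exponentially distributed lifetimes, so their failure rates add. The specific MTTF values are hypothetical placeholders chosen for illustration; they are not taken from the talk or its sources.

```python
def series_mttf(*mttfs_days):
    """MTTF of a system that fails as soon as any one component fails.

    Assumes independent, exponentially distributed failures, in which
    case the component failure rates (1 / MTTF) simply add.
    """
    return 1.0 / sum(1.0 / m for m in mttfs_days)


# Hypothetical numbers for illustration only.
SOFTWARE_MTTF_DAYS = 90.0        # stand-in for "measured in months"
HARDWARE_MTTF_DAYS = 10 * 365.0  # assumed 10-year hardware MTTF

baseline = series_mttf(SOFTWARE_MTTF_DAYS, HARDWARE_MTTF_DAYS)
better_hw = series_mttf(SOFTWARE_MTTF_DAYS, 2 * HARDWARE_MTTF_DAYS)

print(f"baseline system MTTF:      {baseline:.1f} days")
print(f"with hardware MTTF doubled: {better_hw:.1f} days")
```

Under these assumed values, doubling the hardware MTTF extends the system MTTF by roughly one day, because the software failure rate dominates the sum.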

4
Hmm, My ATM Does Not Work
5
Reason 2: Disposable Electronics
  • The average working life of a mobile phone is 7
    years, but the average consumer changes their
    mobile every 11 months.

6
PCs/Laptops Not Far Behind
  • Take-away something.
  • http://ieeexplore.ieee.org/iel5/9100/28876/01299720.pdf

7
Reason 3: A Transient Fault is About As Likely As

8
Reason 4: Does Anyone Care?
Which is flawed?
  • Can a human identify errors in video, images, or
    sound?
  • Glitches are accepted by the consumer (dropped
    cell calls)
  • Natural redundancy and resiliency in software

100% reliable operation of hardware is not
important or worth the extra cost in many situations
9
Reason 5: This Problem is Better Solved Closer to
the Circuit Level
Electromigration in copper
  • Lower overhead
  • Many designs benefit
  • In-situ solutions naturally handle variation

10
Some Hope?
What if we assume reliability is a looming
problem? Then we need solutions that are:
  • Low overhead, high rate of return
    solutions: joint circuit/architectural
    techniques
  • Domain-specific solutions: know thy customer
  • Reliability features that provide other
    benefits: it's not just a tax

The bottom line