Title: Top 5 Reasons Reliability is the Biggest Fallacy in Computer Architecture Research
- Scott Mahlke
- University of Michigan
Thanks to Jason Blome, Shuguang Feng, and
Shantanu Gupta for putting their research on
reliable systems on hold to help with this
presentation.
Disclaimer
There is still a need for high-reliability designs for
mission-critical systems
- Space shuttle, airplanes, etc.
- Cost is not an issue: use high degrees of
redundancy
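For readers less familiar with what "high degrees of redundancy" buys, here is a minimal sketch of triple modular redundancy (TMR), one classic form of it; TMR is chosen here as an example and is not something the slide specifies. Three replicas compute the same result, and a bitwise 2-of-3 voter masks any single faulty replica. The function names are hypothetical, and the reliability formula assumes a perfect voter and replicas that fail independently.

```python
# Minimal TMR sketch (hypothetical helper names; the voter is assumed perfect).


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit takes the value held by at least two replicas."""
    return (a & b) | (a & c) | (b & c)


def tmr_reliability(r: float) -> float:
    """Probability the voted output is correct, assuming independent replicas
    that are each correct with probability r and a voter that never fails."""
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    good = 0b1011_0010
    corrupted = good ^ 0b0000_1000  # one replica suffers a single-bit upset
    print(majority_vote(good, corrupted, good) == good)  # the upset is masked
    for r in (0.5, 0.9, 0.99):
        print(f"replica reliability {r:.2f} -> TMR reliability {tmr_reliability(r):.4f}")
```

The price is visible in the sketch: roughly three times the hardware plus a voter, and `tmr_reliability(r)` only exceeds `r` when `r > 0.5`. That trade is easy to justify for a space shuttle and hard to justify for a phone.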
I would like to convince you that reliability is a
fallacy for mainstream computer systems used in
consumer/business electronics
The speaker may not agree with this position
Reason 1: It's the Software, Stupid!
- "Mature OS can have an MTTF measured in months,
while newer OS may crash every few days." (Peter
Chen, Reliability Hierarchies, HotOS 1999)
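The quote implies that once the OS is in the picture, hardware faults barely move the needle. A minimal sketch of why, assuming a series system (the machine is unusable if any part fails) whose components fail independently at constant rates, so that failure rates add and MTTF_sys = 1 / Σ(1/MTTF_i). The component list and every MTTF value below are illustrative assumptions, not data from the talk.

```python
# Hypothetical series-system model: constant, independent failure rates.
# All MTTF values are illustrative assumptions, not measured data.

HOURS_PER_DAY = 24


def system_mttf(component_mttf_hours: dict[str, float]) -> float:
    """MTTF of a series system of exponential components: 1 / sum(1 / MTTF_i)."""
    total_rate = sum(1.0 / m for m in component_mttf_hours.values())
    return 1.0 / total_rate


def failure_share(component_mttf_hours: dict[str, float]) -> dict[str, float]:
    """Fraction of system failures attributable to each component."""
    total_rate = sum(1.0 / m for m in component_mttf_hours.values())
    return {name: (1.0 / m) / total_rate for name, m in component_mttf_hours.items()}


# Assumed numbers: hardware parts last years; a new OS crashes every ~3 days.
components = {
    "cpu": 10 * 365 * HOURS_PER_DAY,
    "dram": 5 * 365 * HOURS_PER_DAY,
    "disk": 3 * 365 * HOURS_PER_DAY,
    "new_os": 3 * HOURS_PER_DAY,
}

if __name__ == "__main__":
    mttf = system_mttf(components)
    print(f"system MTTF ~ {mttf / HOURS_PER_DAY:.2f} days")
    for name, share in sorted(failure_share(components).items(), key=lambda kv: -kv[1]):
        print(f"{name:>7}: {share:.1%} of failures")
```

With these assumed numbers the new OS accounts for well over 99% of system failures and the system MTTF lands just under the OS's own three days; making the DRAM ten times more reliable would barely change either figure.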
Hmm, My ATM Does Not Work
Reason 2: Disposable Electronics
- The average working life of a mobile phone is 7
years, but the average consumer changes their
mobile every 11 months.
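Taking the two figures at face value, a quick ratio shows how little of a phone's working life its first owner actually uses (both figures come from the slide; the rest is arithmetic):

```latex
\frac{\text{time with first owner}}{\text{working life}}
  = \frac{11\ \text{months}}{7 \times 12\ \text{months}}
  = \frac{11}{84} \approx 0.13
```

Roughly 87% of the working life remains when the phone is replaced, so engineering the hardware to last even longer buys that owner very little.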
PCs/Laptops Not Far Behind
- Take-away something.
- http://ieeexplore.ieee.org/iel5/9100/28876/01299720.pdf
Reason 3: A Transient Fault is About As Likely As
Reason 4: Does Anyone Care?
Which is flawed?
- Can a human identify errors in video, images, or
sound?
- Glitches are accepted by the consumer (dropped
cell calls)
- Natural redundancy and resiliency in software
100% reliable operation of hardware is not
important or worth extra cost in many situations
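To make the first bullet concrete, here is a hypothetical sketch (not from the talk) of how much a single flipped bit perturbs an 8-bit pixel or audio sample. Flipping bit k always changes the value by exactly 2^k, so half of all possible single-bit flips (bits 0 to 3) move the value by at most 8 of 255 levels, about 3% of full scale, which is often hard to notice in a moving image. The pixel value of 200 is an arbitrary assumption.

```python
# Hypothetical illustration: magnitude of a single-bit upset in an 8-bit sample.


def flip_bit(value: int, bit: int) -> int:
    """Return the 8-bit value with one bit inverted (a single-event upset)."""
    if not 0 <= value <= 255 or not 0 <= bit <= 7:
        raise ValueError("expects an 8-bit value and a bit index 0-7")
    return value ^ (1 << bit)


def flip_error(value: int, bit: int) -> int:
    """Absolute change in the stored value caused by flipping `bit`."""
    return abs(flip_bit(value, bit) - value)


if __name__ == "__main__":
    pixel = 200  # assumed example intensity
    for bit in range(8):
        err = flip_error(pixel, bit)
        print(f"bit {bit}: {pixel} -> {flip_bit(pixel, bit):3d} "
              f"(error {err}, {err / 255:.1%} of full scale)")
```

Only the top one or two bits produce errors a viewer is likely to spot, which is one reason media workloads tolerate imperfect hardware.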
Reason 5: This Problem is Better Solved Closer to
the Circuit Level
Electromigration in copper
- Lower overhead
- Many designs benefit
- In-situ solutions naturally handle variation
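As an aside not taken from the slides, the usual empirical model for electromigration lifetime, Black's equation, shows why this wear-out mechanism is tied to circuit-level quantities: MTTF = A · J^(-n) · exp(E_a / (k·T)), where J is current density and T is absolute temperature. The sketch compares two operating points so the unknown prefactor A cancels. The defaults n = 2 and E_a = 0.9 eV are assumed illustrative values (real values depend on the process and metallization), and `em_mttf_ratio` is a hypothetical helper name.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def em_mttf_ratio(j_ratio: float, t1_kelvin: float, t2_kelvin: float,
                  n: float = 2.0, ea_ev: float = 0.9) -> float:
    """Return MTTF(point 2) / MTTF(point 1) under Black's equation.

    j_ratio is J2 / J1 (current density at point 2 relative to point 1).
    MTTF = A * J**(-n) * exp(Ea / (k * T)), so A cancels in the ratio.
    The defaults n=2.0 and ea_ev=0.9 are assumed, illustrative values.
    """
    current_term = j_ratio ** (-n)
    thermal_term = math.exp(
        (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t2_kelvin - 1.0 / t1_kelvin)
    )
    return current_term * thermal_term


if __name__ == "__main__":
    # Hotter wire, same current density: 80 C -> 100 C.
    print(f"{em_mttf_ratio(1.0, 353.15, 373.15):.3f}")
    # Same temperature, current density doubled.
    print(f"{em_mttf_ratio(2.0, 353.15, 353.15):.3f}")
```

Under these assumptions, going from 80 °C to 100 °C at the same current density returns roughly 0.2 (about a 5x shorter lifetime), and doubling current density at fixed temperature returns 0.25. Both knobs (wire sizing, current density, thermal design) are circuit- and layout-level decisions.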
Some Hope?
What if we assume reliability is a looming
problem? Then we need solutions that are:
- Low overhead, with a high rate of return
- Joint circuit/architectural techniques
- Domain specific: know thy customer
- Reliability features that provide other
benefits: it's not just a tax
The bottom line