Title: Computers for the Post-PC Era
1. Computers for the Post-PC Era
- Aaron Brown, Jim Beck, Rich Martin, David Oppenheimer, Kathy Yelick, and David Patterson
- http://iram.cs.berkeley.edu/istore
- 2000 Grad Visit Day
2. Berkeley Approach to Systems
- Find an important problem crossing the HW/SW interface, with a HW/SW prototype at the end, typically as part of graduate courses
- Assemble a band of 3-6 faculty, 12-20 grad students, and 1-3 staff to tackle it over 4 years
- Meet twice a year for 3-day retreats with invited outsiders
  - Builds team spirit
  - Get advice on direction, and change course
  - Offers milestones for project stages
  - Grad students give 6 to 8 talks ⇒ great speakers
- Write papers, go to conferences, get PhDs, jobs
- End-of-project party, reshuffle faculty, go to step 1
3. For Example, Projects I Have Worked On
- RISC I, II
  - Sequin, Ousterhout (CAD)
- SOAR (Smalltalk On A RISC)
  - Ousterhout (CAD)
- SPUR (Symbolic Processing Using RISCs)
  - Fateman, Hilfinger, Hodges, Katz, Ousterhout
- RAID I, II (Redundant Array of Inexpensive Disks)
  - Katz, Ousterhout, Stonebraker
- NOW I, II (Network of Workstations) and Tertiary Disk (TD)
  - Culler, Anderson
- IRAM I (Intelligent RAM)
  - Yelick, Kubiatowicz, Wawrzynek
- ISTORE I, II (Intelligent Storage)
  - Yelick, Kubiatowicz
4. SPUR (Symbolic Processing Using RISCs), '85-'89
- Before commercial RISC chips
- Built a workstation, multiprocessor, and operating system from scratch(!)
  - Sprite Operating System
- 3 chips: Processor, Cache Controller, FPU
- Coined the term "snooping" cache protocol
- 3Cs model of cache misses: compulsory, capacity, conflict
5. Group Photo (in souvenir jackets)
- Jim Larus, Wisconsin, Microsoft
- George Taylor, Founder, ?
- David Wood, Wisconsin
- Dave Lee, Founder, Silicon Image
- John Ousterhout, Founder, Scriptics
- Ben Zorn, Colorado, Microsoft
- Mark Hill, Wisconsin
- Mendel Rosenblum, Stanford, Founder, VMware
- Susan Eggers, Washington
- Brent Welch, Founder, Scriptics
- Shing Kong, Transmeta
- Garth Gibson, CMU, Founder, ?
- See www.cs.berkeley.edu/Projects/ARC to learn more about Berkeley Systems
6. SPUR 10-Year Reunion, January '99
- Everyone from North America came!
- 19 PhDs: 9 to academia
  - 8/9 got tenure; 2 full professors (already)
  - 2 Romnes fellows (the 3rd and 4th at Wisconsin)
  - 3 NSF Presidential Young Investigator winners
  - 2 ACM Dissertation Awards
  - They in turn produced 30 PhDs (as of 1/99)
- 10 to industry
  - Founders of 5 startups (1 failed)
  - 2 department heads (AT&T Bell Labs, Microsoft)
- A very successful group: the SPUR project gave them a taste of success, and lifelong friends
7. Network of Workstations (NOW), '94-'98
- Leveraged commodity workstations and OSes to harness the power of clustered machines connected via high-speed switched networks
- Constructed HW/SW prototypes: NOW-1 with 32 SuperSPARCs, and NOW-2 with 100 UltraSPARC 1s
- NOW-2 cluster held the world record for the fastest disk-to-disk sort for 2 years, 1997-1999
- NOW-2 cluster was 1st to crack the 40-bit key in a key-cracking challenge offered by RSA, 1997
- NOW-2 made the list of Top 200 supercomputers, 1997
- NOW was a foundation of the Virtual Interface (VI) Architecture by Compaq, Intel, and Microsoft, a standard that allows protected, direct user-level access to the network
- NOW technology led directly to one Internet startup company (Inktomi); many other Internet companies use cluster technology
8. Network of Workstations (NOW), '94-'98
- 12 PhDs. Note that 3/4 of them went into academia, and that 1/3 are female
  - Andrea Arpaci-Dusseau, Asst. Professor, Wisconsin, Madison
  - Remzi Arpaci-Dusseau, Asst. Professor, Wisconsin, Madison
  - Mike Dahlin, Asst. Professor, University of Texas, Austin
  - Jeanna Neefe Matthews, Asst. Professor, Clarkson Univ.
  - Douglas Ghormley, Researcher, Los Alamos National Labs
  - Kim Keeton, Researcher, Hewlett-Packard Labs
  - Steve Lumetta, Asst. Professor, Illinois
  - Alan Mainwaring, Researcher, Sun Microsystems Labs
  - Rich Martin, Asst. Professor, Rutgers University
  - Nisha Talagala, Researcher, Network Storage, Sun Microsystems
  - Amin Vahdat, Asst. Professor, Duke University
  - Randy Wang, Asst. Professor, Princeton University
9. Research in Berkeley Courses
- RISC, SPUR, RAID, NOW, IRAM, and ISTORE all started in advanced graduate courses
- Make the transition from undergraduate student to researcher in first-year graduate courses
- In the first-year architecture and operating systems courses, students select a topic, do research, write a paper, and give a talk
  - Professor meets each team 1-on-1 three times, with TA help
  - Some papers get submitted and published
- Requires class size < 40 (e.g., Berkeley)
  - If 1st-year course size is ~100 students ⇒ cannot do research in grad courses the 1st year or so
  - If a school offers a combined BS/MS (e.g., MIT) or a professional MS via TV broadcast (e.g., Stanford), then effective class size is 150-250
10. Outline
- Background: Berkeley Approach to Systems
- Post-PC Motivation
- Post-PC Microprocessor: IRAM
- Post-PC Infrastructure Motivation
- Post-PC Infrastructure: ISTORE
  - Hardware Architecture
  - Software Architecture
- Conclusions and Feedback
11. Perspective on the Post-PC Era
- The Post-PC Era will be driven by 2 technologies:
- 1) Gadgets: tiny embedded or mobile devices
  - ubiquitous: in everything
  - e.g., successor to the PDA, the cell phone, wearable computers
- 2) Infrastructure to support such devices
  - e.g., successor to Big Fat Web Servers, database servers
12. Intelligent RAM: IRAM
- Microprocessor + DRAM on a single chip:
  - 10X capacity vs. SRAM
  - on-chip memory: 5-10X better latency, 50-100X better bandwidth
  - 2X-4X better energy efficiency (no off-chip bus)
  - serial I/O: 5-10X vs. buses
  - smaller board area/volume
- IRAM advantages extend to:
  - a single-chip system
  - a building block for larger systems
13. Revive Vector Architecture
- The old objections to vector machines, and the IRAM answers:
  - Cost? $1M each? ⇒ a single-chip CMOS MPU/IRAM
  - Low-latency, high-BW memory system? ⇒ IRAM
  - Code density? ⇒ much smaller than VLIW
  - Compilers? ⇒ for sale, mature (>20 years); we retarget Cray compilers
  - Performance? ⇒ easy to scale speed with technology
  - Power/Energy? ⇒ parallel to save energy, keep performance
  - Limited to scientific applications? ⇒ multimedia apps vectorize too: N x 64b, 2N x 32b, 4N x 16b (see the sketch below)
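To make that last point concrete, here is a minimal NumPy sketch (my illustration, not VIRAM code): a fixed-width register file holds 1 x 64b, 2 x 32b, or 4 x 16b elements per lane, so narrower media types get proportionally more parallelism, as in a saturating add over 16-bit audio samples.

```python
import numpy as np

# One 512-byte "vector register", viewed at three element widths.
# A 64-bit datapath handles 1 x 64b, 2 x 32b, or 4 x 16b per lane per cycle,
# so halving the element width doubles the elements per operation.
raw = np.zeros(512, dtype=np.uint8)
for dtype in (np.uint64, np.uint32, np.uint16):
    v = raw.view(dtype)
    print(f"{np.dtype(dtype)}: {v.size} elements per register")

# Typical media kernel: saturating add of 16-bit audio samples.
# On VIRAM-style hardware this loop becomes a few strip-mined vector ops.
a = np.array([30000, -30000, 100], dtype=np.int16)
b = np.array([10000, -10000, 200], dtype=np.int16)
wide = a.astype(np.int32) + b.astype(np.int32)        # avoid wraparound
sat = np.clip(wide, -32768, 32767).astype(np.int16)   # saturate to int16
print(sat)  # [ 32767 -32768    300]
```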
14. VIRAM-1: System on a Chip
- Prototype scheduled for end of Summer 2000
- 0.18 µm EDL process
- 16 MB DRAM, 8 banks
- MIPS scalar core and caches @ 200 MHz
- 4 64-bit vector unit pipelines @ 200 MHz
- 4 100-MB/s parallel I/O lines
- 17x17 mm, 2 Watts
- 25.6 GB/s memory (6.4 GB/s per direction and per Xbar)
- 1.6 GFLOPS (64-bit), 6.4 GOPS (16-bit) (arithmetic checked below)
- 140M transistors (more than Intel?)
[Die floorplan: Memory (64 Mbits / 8 MBytes) | Xbar | I/O | Memory (64 Mbits / 8 MBytes)]
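As a back-of-the-envelope check that the quoted peak rates are self-consistent (assuming each 64-bit pipeline issues one fused multiply-add, i.e., 2 ops, per cycle, which matches the numbers on the slide):

```latex
\begin{align*}
\text{64-bit: } & 4\ \text{pipes} \times 2\ \tfrac{\text{ops}}{\text{cycle}} \times 200\ \text{MHz} = 1.6\ \text{GFLOPS}\\
\text{16-bit: } & 4\ \text{pipes} \times 4\ \tfrac{\text{elems}}{\text{pipe}} \times 2\ \tfrac{\text{ops}}{\text{cycle}} \times 200\ \text{MHz} = 6.4\ \text{GOPS}
\end{align*}
```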
15. Outline
- Background: Berkeley's Past; Post-PC Infrastructure Motivation
- Post-PC Motivation
- Post-PC Device Microprocessor: IRAM
- Post-PC Infrastructure Motivation
- ISTORE Goals
  - Hardware Architecture
  - Software Architecture
- Conclusions and Feedback
16. Background: Tertiary Disk (part of NOW)
- Tertiary Disk (1997)
  - A cluster of 20 PCs hosting 364 3.5" IBM disks (8.4 GB each) in seven 19" x 33" x 84" racks, or 3 TB. The 200-MHz, 96-MB P6 PCs run FreeBSD, and a switched 100-Mb/s Ethernet connects the hosts. Also 4 UPS units.
- Hosts the world's largest art database: 80,000 images, in cooperation with the San Francisco Fine Arts Museum. Try www.thinker.org
17. Tertiary Disk HW Failure Experience
- Reliability of hardware components (20 months):
  - 7 IBM SCSI disk failures (out of 364, or 2%)
  - 6 IDE (internal) disk failures (out of 20, or 30%)
  - 1 SCSI controller failure (out of 44, or 2%)
  - 1 SCSI cable (out of 39, or 3%)
  - 1 Ethernet card failure (out of 20, or 5%)
  - 1 Ethernet switch (out of 2, or 50%)
  - 3 enclosure power supplies (out of 92, or 3%)
  - 1 short power outage (covered by UPS)
- Did not match expectations: SCSI disks were more reliable than SCSI cables!
  - The difference between simulation and prototypes
18. SCSI Time-Outs + Hardware Failures (m11)
[Chart: SCSI time-outs and hardware failures over time on SCSI Bus 0 of machine m11]
19. Can We Predict a Disk Failure?
- Yes: look for "Hardware Error" messages
  - These messages lasted for 8 days, between 8-17-98 and 8-25-98
  - On disk 9 there were:
    - 1763 "Hardware Error" messages, and
    - 297 "SCSI Timed Out" messages
- On 8-28-98, disk 9 on SCSI Bus 0 of m11 was "fired", i.e., it appeared to be about to fail, so it was swapped (a log-scanning sketch follows)
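A minimal sketch of this kind of log-based predictor (the message formats and thresholds are illustrative assumptions, not the project's actual tooling):

```python
import re
from collections import Counter

# Patterns loosely modeled on the messages named above; the exact syslog
# line format here is an invented assumption for illustration.
HW_ERR = re.compile(r"disk (\d+).*Hardware Error", re.IGNORECASE)
TIMEOUT = re.compile(r"disk (\d+).*SCSI Timed Out", re.IGNORECASE)

def scan(log_lines, hw_threshold=100, timeout_threshold=50):
    """Count per-disk error messages and flag disks that look like they
    are about to fail, so an operator can swap ("fire") them early."""
    hw_errors, timeouts = Counter(), Counter()
    for line in log_lines:
        if m := HW_ERR.search(line):
            hw_errors[int(m.group(1))] += 1
        elif m := TIMEOUT.search(line):
            timeouts[int(m.group(1))] += 1
    return sorted(
        d for d in hw_errors | timeouts
        if hw_errors[d] >= hw_threshold or timeouts[d] >= timeout_threshold
    )

log = (["Aug 17 m11 sd: disk 9: Hardware Error"] * 1763
       + ["Aug 18 m11 sd: disk 9: SCSI Timed Out"] * 297)
print(scan(log))  # [9] -> disk 9 should be swapped before it fails
```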
20. Lessons from the Tertiary Disk Project
- Maintenance is hard on current systems
  - Hard to know what is going on, who is to blame
- Everything can break
  - It's not what you expect in advance
  - Follow the rule of no single point of failure
- Nothing fails fast
  - Eventually a component behaves badly enough that the operator "fires" the poor performer, but it doesn't quit on its own
- Most failures may be predicted
21. Outline
- Background: Berkeley Approach to Systems
- Post-PC Motivation
- Post-PC Microprocessor: IRAM
- Post-PC Infrastructure Motivation
- Post-PC Infrastructure: ISTORE
  - Hardware Architecture
  - Software Architecture
- Conclusions and Feedback
22. The Problem Space: Big Data
- Big demand for enormous amounts of data
  - today: high-end enterprise and Internet applications
    - enterprise decision-support, data-mining databases
    - online applications: e-commerce, mail, web, archives
  - future: infrastructure services, richer data
    - computational storage back-ends for mobile devices
    - more multimedia content
    - more use of historical data to provide better services
- Today's SMP server designs can't easily scale
  - Bigger scaling problems than performance!
23. The Real Scalability Problems: AME
- Availability
  - systems should continue to meet quality-of-service goals despite hardware and software failures
- Maintainability
  - systems should require only minimal ongoing human administration, regardless of scale or complexity
- Evolutionary Growth
  - systems should evolve gracefully in terms of performance, maintainability, and availability as they are grown/upgraded/expanded
- These are problems at today's scales, and will only get worse as systems grow
24. Principles for Achieving AME (1)
- No single points of failure
- Redundancy everywhere
- Performance robustness is more important than peak performance
  - performance robustness implies that real-world performance is comparable to best-case performance
- Performance can be sacrificed for improvements in AME
  - resources should be dedicated to AME
  - compare biological systems, which spend > 50% of their resources on maintenance
  - performance can be made up by scaling the system
25. Principles for Achieving AME (2)
- Introspection
  - reactive techniques to detect and adapt to failures, workload variations, and system evolution
  - proactive (preventative) techniques to anticipate and avert problems before they happen (a minimal sketch follows)
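As a toy illustration of these two halves of introspection (the brick names, health probes, and recovery actions are all hypothetical stand-ins):

```python
# Hypothetical cluster state: brick name -> health probe. In a real system
# the probe would be a heartbeat or RPC; here it is stubbed out.
nodes = {
    "brick-1": lambda: True,
    "brick-2": lambda: False,   # simulate a failed storage brick
    "brick-3": lambda: True,
}

def reactive_monitor(nodes, on_failure):
    """Reactive introspection: detect failures and adapt by invoking a
    recovery action, e.g. re-replicating the failed brick's data."""
    for name, is_healthy in nodes.items():
        if not is_healthy():
            on_failure(name)

def proactive_monitor(nodes, self_test):
    """Proactive introspection: self-test nodes that still look healthy,
    to anticipate and avert problems before they happen."""
    return [n for n, healthy in nodes.items() if healthy() and not self_test(n)]

reactive_monitor(nodes, on_failure=lambda n: print(f"adapt: re-replicate {n}"))
suspects = proactive_monitor(nodes, self_test=lambda n: n != "brick-3")
print("anticipate:", suspects)  # brick-3 fails its self-test -> act early
```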
26. Hardware Techniques (2)
- No central processor unit: distribute processing with storage
- Serial lines and switches are also growing with Moore's Law: less need today to centralize vs. bus-oriented systems
- Most storage servers are limited by the speed of their CPUs: why does this make sense?
- Why not amortize the sheet metal, power, and cooling infrastructure for the disk to add a processor, memory, and a network?
- If AME is important, we must provide resources to be used to help AME: local processors responsible for the health and maintenance of their storage
27. ISTORE-1 Hardware Platform
- 80-node x86-based cluster, 1.4 TB storage
  - cluster nodes are plug-and-play, intelligent, network-attached storage "bricks"
    - a single field-replaceable unit to simplify maintenance
  - each node is a full x86 PC w/ 256 MB DRAM, 18 GB disk
  - more CPU than NAS; fewer disks/node than a cluster
- Intelligent Disk "Brick": portable PC CPU (Pentium II/266), DRAM, redundant NICs (4 x 100 Mb/s links), diagnostic processor
- ISTORE Chassis
  - 80 nodes, 8 per tray
  - 2 levels of switches
    - 20 x 100 Mbit/s
    - 2 x 1 Gbit/s
  - environment monitoring:
    - UPS, redundant PS, fans, heat and vibration sensors...
28. A Glimpse into the Future?
- System-on-a-chip enables computer, memory, and redundant network interfaces without significantly increasing the size of the disk
- ISTORE HW in 5-7 years:
  - building block: a 2006 MicroDrive integrated with IRAM
    - 9 GB disk, 50 MB/sec from disk
  - connected via crossbar switch
  - 10,000 nodes fit into one rack!
- O(10,000) scale is our ultimate design point
29. Development Techniques
- Benchmarking
  - One reason for the 1000X improvement in processor performance was the ability to measure (vs. debate) which design is better
    - e.g., which is most important to improve: clock rate, clocks per instruction, or instructions executed?
  - Need AME benchmarks (a sketch of one follows this list)
    - "what gets measured gets done"
    - benchmarks shape a field
    - quantification brings rigor
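A minimal sketch of an availability benchmark in this spirit (the workload, fault hook, and metric are stand-ins I invented for illustration): measure delivered quality of service over time, inject a fault partway through, and report how far performance drops relative to the fault-free baseline.

```python
import random
import statistics

def availability_benchmark(serve_request, inject_fault, rounds=100, fault_at=40):
    """Drive a workload, inject a fault partway through, and record
    per-round throughput so degradation and recovery are visible."""
    throughput = []
    for t in range(rounds):
        if t == fault_at:
            inject_fault()            # e.g., kill a disk in a SW RAID set
        ok = sum(serve_request() for _ in range(50))
        throughput.append(ok)         # successful requests this round
    baseline = statistics.mean(throughput[:fault_at])
    worst = min(throughput[fault_at:])
    return baseline, worst, throughput

# Toy system under test: after the "fault", 30% of requests fail for a while
# (roughly 30 rounds), mimicking degraded service during reconstruction.
state = {"degraded_until": 0, "t": 0}
def serve():
    state["t"] += 1
    degraded = state["t"] < state["degraded_until"]
    return 1 if random.random() > (0.3 if degraded else 0.0) else 0

def fault():
    state["degraded_until"] = state["t"] + 1500

base, worst, _ = availability_benchmark(serve, fault)
print(f"baseline {base:.1f} req/round, worst {worst} req/round under fault")
```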
30. Example Results: Multiple Faults
[Graphs: SW RAID reconstruction under multiple faults, Windows 2000/IIS vs. Linux/Apache]
- Windows reconstructs 3x faster than Linux
- Windows reconstruction noticeably affects application performance, while Linux reconstruction does not
31. Software Techniques (1)
- Proactive introspection
  - Continuous online self-testing of HW and SW
    - in deployed systems!
    - goal is to shake out "Heisenbugs" before they're encountered in normal operation
    - needs data redundancy, node isolation, fault injection
- Techniques (a scrubbing sketch follows this list)
  - fault injection: triggering hardware and software error-handling paths to verify their integrity/existence
  - stress testing: pushing HW/SW to their limits
  - scrubbing: periodic restoration of potentially decaying hardware or software state
    - self-scrubbing data structures (like MVS)
    - ECC scrubbing for disks and memory
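A minimal sketch of the scrubbing idea (the block layout, checksum choice, and repair-from-replica step are my illustrative assumptions): periodically re-read stored blocks, verify a checksum, and restore any block that has silently decayed before an application ever reads it.

```python
import hashlib
import random

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_store(blocks):
    # Toy store: block id -> (data, checksum). A replica holds the same
    # blocks so a corrupted block can be restored; both are illustrative.
    return {i: (d, checksum(d)) for i, d in enumerate(blocks)}

def scrub(store, replica):
    """Periodic scrub: re-read every block, verify its checksum, and
    repair silent decay from the replica."""
    repaired = []
    for i, (data, chk) in store.items():
        if checksum(data) != chk:        # bit rot detected
            store[i] = replica[i]        # restore the good copy
            repaired.append(i)
    return repaired

blocks = [bytes(random.randrange(256) for _ in range(64)) for _ in range(8)]
store, replica = make_store(blocks), make_store(blocks)
data, chk = store[3]
corrupted = data[:10] + bytes([data[10] ^ 0xFF]) + data[11:]  # flip one byte
store[3] = (corrupted, chk)                                   # silent decay
print("repaired blocks:", scrub(store, replica))              # -> [3]
```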
32. Conclusions (1): ISTORE
- Availability, maintainability, and evolutionary growth are the key challenges for server systems
  - more important even than performance
- ISTORE is investigating ways to bring AME to large-scale, storage-intensive servers
  - via clusters of network-attached, computationally-enhanced storage nodes running distributed code
  - via hardware and software introspection
  - we are currently performing application studies to investigate and compare techniques
- Availability benchmarks: a powerful tool?
  - revealed undocumented design decisions affecting SW RAID availability on Linux and Windows 2000
33. Conclusions (2)
- IRAM is attractive for two Post-PC applications because of its low power, small size, and high memory bandwidth
  - Gadgets: embedded/mobile devices
  - Infrastructure: intelligent storage and networks
- Post-PC infrastructure requires:
  - New goals: availability, maintainability, evolution
  - New principles: introspection, performance robustness
  - New techniques: isolation/fault insertion, software scrubbing
  - New benchmarks: measure and compare AME metrics
34. Berkeley Future Work
- IRAM: fab and test the chip
- ISTORE
  - implement AME-enhancing techniques in a variety of Internet, enterprise, and info-retrieval applications
  - select the best techniques and integrate them into a generic runtime system with an AME API
  - add maintainability benchmarks
    - can we quantify the administrative work needed to maintain a certain level of availability?
- Perhaps look at data security via encryption?
- Even consider denial of service?