Title: Computers for the Post-PC Era
1. Computers for the Post-PC Era
- Aaron Brown, Jim Beck, Rich Martin, David Oppenheimer, Kathy Yelick, and David Patterson
- http://iram.cs.berkeley.edu/istore
- 2000 Grad Visit Day
2. Berkeley Approach to Systems
- Find an important problem crossing the HW/SW interface, with a HW/SW prototype at the end, typically as part of graduate courses
- Assemble a band of 3-6 faculty, 12-20 grad students, and 1-3 staff to tackle it over 4 years
- Meet twice a year for 3-day retreats with invited outsiders
  - Builds team spirit
  - Get advice on direction, and change course
  - Offers milestones for project stages
  - Grad students give 6 to 8 talks ⇒ great speakers
- Write papers, go to conferences, get PhDs, jobs
- End-of-project party, reshuffle faculty, go to step 1
3. For Example, Projects I Have Worked On
- RISC I, II
  - Sequin, Ousterhout (CAD)
- SOAR (Smalltalk On A RISC)
  - Ousterhout (CAD)
- SPUR (Symbolic Processing Using RISCs)
  - Fateman, Hilfinger, Hodges, Katz, Ousterhout
- RAID I, II (Redundant Array of Inexpensive Disks)
  - Katz, Ousterhout, Stonebraker
- NOW I, II (Network of Workstations) and Tertiary Disk (TD)
  - Culler, Anderson
- IRAM I (Intelligent RAM)
  - Yelick, Kubiatowicz, Wawrzynek
- ISTORE I, II (Intelligent Storage)
  - Yelick, Kubiatowicz
4. SPUR (Symbolic Processing Using RISCs), '85-'89
- Before commercial RISC chips
- Built a workstation, multiprocessor, and operating system from scratch(!)
  - Sprite Operating System
- 3 chips: Processor, Cache Controller, FPU
- Coined the term "snooping" cache protocol
- 3Cs model of cache misses: compulsory, capacity, conflict
5. Group Photo (in souvenir jackets)
- Jim Larus, Wisconsin, Microsoft
- George Taylor, Founder, ?
- David Wood, Wisconsin
- Dave Lee, Founder, Silicon Image
- John Ousterhout, Founder, Scriptics
- Ben Zorn, Colorado, Microsoft
- Mark Hill, Wisconsin
- Mendel Rosenblum, Stanford, Founder, VMware
- Susan Eggers, Washington
- Brent Welch, Founder, Scriptics
- Shing Kong, Transmeta
- Garth Gibson, CMU, Founder, ?
- See www.cs.berkeley.edu/Projects/ARC to learn more about Berkeley Systems
6. SPUR 10-Year Reunion, January '99
- Everyone from North America came!
- 19 PhDs: 9 to academia
  - 8/9 got tenure; 2 full professors (already)
  - 2 Romnes fellows (the 3rd and 4th at Wisconsin)
  - 3 NSF Presidential Young Investigator winners
  - 2 ACM Dissertation Awards
  - They in turn produced 30 PhDs (as of 1/99)
- 10 to industry
  - Founders of 5 startups (1 failed)
  - 2 department heads (AT&T Bell Labs, Microsoft)
- A very successful group: the SPUR project gave them a taste of success, and lifelong friends
7. Network of Workstations (NOW), '94-'98
- Leveraged commodity workstations and OSes to harness the power of clustered machines connected via high-speed switched networks
- Constructed HW/SW prototypes: NOW-1 with 32 SuperSPARCs, and NOW-2 with 100 UltraSPARC 1s
- NOW-2 cluster held the world record for the fastest disk-to-disk sort for 2 years, 1997-1999
- NOW-2 cluster was 1st to crack the 40-bit key in a key-cracking challenge offered by RSA, 1997
- NOW-2 made the list of Top 200 supercomputers, 1997
- NOW was a foundation of the Virtual Interface (VI) Architecture by Compaq, Intel, and Microsoft, a standard that allows protected, direct user-level access to the network
- NOW technology led directly to one Internet startup company (Inktomi); many other Internet companies use cluster technology
8. Network of Workstations (NOW), '94-'98
- 12 PhDs. Note that 3/4 of them went into academia, and that 1/3 are female
  - Andrea Arpaci-Dusseau, Asst. Professor, Wisconsin, Madison
  - Remzi Arpaci-Dusseau, Asst. Professor, Wisconsin, Madison
  - Mike Dahlin, Asst. Professor, University of Texas, Austin
  - Jeanna Neefe Matthews, Asst. Professor, Clarkson Univ.
  - Douglas Ghormley, Researcher, Los Alamos National Labs
  - Kim Keeton, Researcher, Hewlett-Packard Labs
  - Steve Lumetta, Asst. Professor, Illinois
  - Alan Mainwaring, Researcher, Sun Microsystems Labs
  - Rich Martin, Asst. Professor, Rutgers University
  - Nisha Talagala, Researcher, Network Storage, Sun Microsystems
  - Amin Vahdat, Asst. Professor, Duke University
  - Randy Wang, Asst. Professor, Princeton University
9. Research in Berkeley Courses
- RISC, SPUR, RAID, NOW, IRAM, and ISTORE all started in advanced graduate courses
- Make the transition from undergraduate student to researcher in first-year graduate courses
- In the first-year architecture and operating systems courses, students select a topic, do research, write a paper, and give a talk
  - Professor meets each team 1-on-1 three times, with TA help
  - Some papers get submitted and published
- Requires class size < 40 (e.g., Berkeley)
  - If 1st-year course size is ~100 students ⇒ cannot do research in grad courses the 1st year or so
  - If a school offers a combined BS/MS (e.g., MIT) or a professional MS via TV broadcast (e.g., Stanford), then effective class size is 150-250
10. Outline
- Background: Berkeley Approach to Systems
- Post-PC Motivation
- Post-PC Microprocessor: IRAM
- Post-PC Infrastructure Motivation
- Post-PC Infrastructure: ISTORE
  - Hardware Architecture
  - Software Architecture
- Conclusions and Feedback
11. Perspective on the Post-PC Era
- The Post-PC Era will be driven by 2 technologies:
- 1) Gadgets: tiny embedded or mobile devices
  - ubiquitous: in everything
  - e.g., successor to the PDA, the cell phone, wearable computers
- 2) Infrastructure to support such devices
  - e.g., successor to Big Fat Web Servers, database servers
12. Intelligent RAM: IRAM
- Microprocessor + DRAM on a single chip:
  - 10X capacity vs. SRAM
  - on-chip memory: 5-10X better latency, 50-100X better bandwidth
  - 2X-4X better energy efficiency (no off-chip bus)
  - serial I/O: 5-10X vs. buses
  - smaller board area/volume
- IRAM advantages extend to:
  - a single-chip system
  - a building block for larger systems
13. Revive Vector Architecture
- The old objections to vector machines, and the IRAM answers:
  - Cost? $1M each? ⇒ a single-chip CMOS MPU/IRAM
  - Low-latency, high-BW memory system? ⇒ IRAM
  - Code density? ⇒ much smaller than VLIW
  - Compilers? ⇒ for sale, mature (>20 years); we retarget Cray compilers
  - Performance? ⇒ easy to scale speed with technology
  - Power/Energy? ⇒ parallel to save energy, keep performance
  - Limited to scientific applications? ⇒ multimedia apps vectorize too: N x 64b, 2N x 32b, 4N x 16b (see the sketch below)
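To make that last point concrete, here is a minimal NumPy sketch (my illustration, not VIRAM code): a fixed-width register file holds 1 x 64b, 2 x 32b, or 4 x 16b elements per lane, so narrower media types get proportionally more parallelism, as in a saturating add over 16-bit audio samples.

```python
import numpy as np

# One 512-byte "vector register", viewed at three element widths.
# A 64-bit datapath handles 1 x 64b, 2 x 32b, or 4 x 16b per lane per cycle,
# so halving the element width doubles the elements per operation.
raw = np.zeros(512, dtype=np.uint8)
for dtype in (np.uint64, np.uint32, np.uint16):
    v = raw.view(dtype)
    print(f"{np.dtype(dtype)}: {v.size} elements per register")

# Typical media kernel: saturating add of 16-bit audio samples.
# On VIRAM-style hardware this loop becomes a few strip-mined vector ops.
a = np.array([30000, -30000, 100], dtype=np.int16)
b = np.array([10000, -10000, 200], dtype=np.int16)
wide = a.astype(np.int32) + b.astype(np.int32)        # avoid wraparound
sat = np.clip(wide, -32768, 32767).astype(np.int16)   # saturate to int16
print(sat)  # [ 32767 -32768    300]
```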
14. VIRAM-1: System on a Chip
- Prototype scheduled for end of Summer 2000
- 0.18 µm EDL process
- 16 MB DRAM, 8 banks
- MIPS scalar core and caches @ 200 MHz
- 4 64-bit vector unit pipelines @ 200 MHz
- 4 100-MB/s parallel I/O lines
- 17x17 mm, 2 Watts
- 25.6 GB/s memory (6.4 GB/s per direction and per Xbar)
- 1.6 GFLOPS (64-bit), 6.4 GOPS (16-bit) (arithmetic checked below)
- 140M transistors (more than Intel?)
[Die floorplan: Memory (64 Mbits / 8 MBytes) | Xbar | I/O | Memory (64 Mbits / 8 MBytes)]
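As a back-of-the-envelope check that the quoted peak rates are self-consistent (assuming each 64-bit pipeline issues one fused multiply-add, i.e., 2 ops, per cycle, which matches the numbers on the slide):

```latex
\begin{align*}
\text{64-bit: } & 4\ \text{pipes} \times 2\ \tfrac{\text{ops}}{\text{cycle}} \times 200\ \text{MHz} = 1.6\ \text{GFLOPS}\\
\text{16-bit: } & 4\ \text{pipes} \times 4\ \tfrac{\text{elems}}{\text{pipe}} \times 2\ \tfrac{\text{ops}}{\text{cycle}} \times 200\ \text{MHz} = 6.4\ \text{GOPS}
\end{align*}
```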
15. Outline
- Background: Berkeley's Past; Post-PC Infrastructure Motivation
- Post-PC Motivation
- Post-PC Device Microprocessor: IRAM
- Post-PC Infrastructure Motivation
- ISTORE Goals
  - Hardware Architecture
  - Software Architecture
- Conclusions and Feedback
16. Background: Tertiary Disk (part of NOW)
- Tertiary Disk (1997)
  - A cluster of 20 PCs hosting 364 3.5" IBM disks (8.4 GB each) in seven 19" x 33" x 84" racks, or 3 TB. The 200-MHz, 96-MB P6 PCs run FreeBSD, and a switched 100-Mb/s Ethernet connects the hosts. Also 4 UPS units.
- Hosts the world's largest art database: 80,000 images, in cooperation with the San Francisco Fine Arts Museum. Try www.thinker.org
17. Tertiary Disk HW Failure Experience
- Reliability of hardware components (20 months):
  - 7 IBM SCSI disk failures (out of 364, or 2%)
  - 6 IDE (internal) disk failures (out of 20, or 30%)
  - 1 SCSI controller failure (out of 44, or 2%)
  - 1 SCSI cable (out of 39, or 3%)
  - 1 Ethernet card failure (out of 20, or 5%)
  - 1 Ethernet switch (out of 2, or 50%)
  - 3 enclosure power supplies (out of 92, or 3%)
  - 1 short power outage (covered by UPS)
- Did not match expectations: SCSI disks were more reliable than SCSI cables!
  - The difference between simulation and prototypes
18. SCSI Time-Outs + Hardware Failures (m11)
[Chart: SCSI time-outs and hardware failures over time on SCSI Bus 0 of machine m11]
19. Can We Predict a Disk Failure?
- Yes: look for "Hardware Error" messages
  - These messages lasted for 8 days, between 8-17-98 and 8-25-98
  - On disk 9 there were:
    - 1763 "Hardware Error" messages, and
    - 297 "SCSI Timed Out" messages
- On 8-28-98, disk 9 on SCSI Bus 0 of m11 was "fired", i.e., it appeared to be about to fail, so it was swapped (a log-scanning sketch follows)
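A minimal sketch of this kind of log-based predictor (the message formats and thresholds are illustrative assumptions, not the project's actual tooling):

```python
import re
from collections import Counter

# Patterns loosely modeled on the messages named above; the exact syslog
# line format here is an invented assumption for illustration.
HW_ERR = re.compile(r"disk (\d+).*Hardware Error", re.IGNORECASE)
TIMEOUT = re.compile(r"disk (\d+).*SCSI Timed Out", re.IGNORECASE)

def scan(log_lines, hw_threshold=100, timeout_threshold=50):
    """Count per-disk error messages and flag disks that look like they
    are about to fail, so an operator can swap ("fire") them early."""
    hw_errors, timeouts = Counter(), Counter()
    for line in log_lines:
        if m := HW_ERR.search(line):
            hw_errors[int(m.group(1))] += 1
        elif m := TIMEOUT.search(line):
            timeouts[int(m.group(1))] += 1
    return sorted(
        d for d in hw_errors | timeouts
        if hw_errors[d] >= hw_threshold or timeouts[d] >= timeout_threshold
    )

log = (["Aug 17 m11 sd: disk 9: Hardware Error"] * 1763
       + ["Aug 18 m11 sd: disk 9: SCSI Timed Out"] * 297)
print(scan(log))  # [9] -> disk 9 should be swapped before it fails
```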
20. Lessons from the Tertiary Disk Project
- Maintenance is hard on current systems
  - Hard to know what is going on, who is to blame
- Everything can break
  - It's not what you expect in advance
  - Follow the rule of no single point of failure
- Nothing fails fast
  - Eventually a component behaves badly enough that the operator "fires" the poor performer, but it doesn't quit on its own
- Most failures may be predicted
21. Outline
- Background: Berkeley Approach to Systems
- Post-PC Motivation
- Post-PC Microprocessor: IRAM
- Post-PC Infrastructure Motivation
- Post-PC Infrastructure: ISTORE
  - Hardware Architecture
  - Software Architecture
- Conclusions and Feedback
22. The Problem Space: Big Data
- Big demand for enormous amounts of data
  - today: high-end enterprise and Internet applications
    - enterprise decision-support, data-mining databases
    - online applications: e-commerce, mail, web, archives
  - future: infrastructure services, richer data
    - computational storage back-ends for mobile devices
    - more multimedia content
    - more use of historical data to provide better services
- Today's SMP server designs can't easily scale
  - Bigger scaling problems than performance!
23. The Real Scalability Problems: AME
- Availability
  - systems should continue to meet quality-of-service goals despite hardware and software failures
- Maintainability
  - systems should require only minimal ongoing human administration, regardless of scale or complexity
- Evolutionary Growth
  - systems should evolve gracefully in terms of performance, maintainability, and availability as they are grown/upgraded/expanded
- These are problems at today's scales, and will only get worse as systems grow
24. Principles for Achieving AME (1)
- No single points of failure
- Redundancy everywhere
- Performance robustness is more important than peak performance
  - performance robustness implies that real-world performance is comparable to best-case performance
- Performance can be sacrificed for improvements in AME
  - resources should be dedicated to AME
  - compare biological systems, which spend > 50% of their resources on maintenance
  - performance can be made up by scaling the system
25. Principles for Achieving AME (2)
- Introspection
  - reactive techniques to detect and adapt to failures, workload variations, and system evolution
  - proactive (preventative) techniques to anticipate and avert problems before they happen (a minimal sketch follows)
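As a toy illustration of these two halves of introspection (the brick names, health probes, and recovery actions are all hypothetical stand-ins):

```python
# Hypothetical cluster state: brick name -> health probe. In a real system
# the probe would be a heartbeat or RPC; here it is stubbed out.
nodes = {
    "brick-1": lambda: True,
    "brick-2": lambda: False,   # simulate a failed storage brick
    "brick-3": lambda: True,
}

def reactive_monitor(nodes, on_failure):
    """Reactive introspection: detect failures and adapt by invoking a
    recovery action, e.g. re-replicating the failed brick's data."""
    for name, is_healthy in nodes.items():
        if not is_healthy():
            on_failure(name)

def proactive_monitor(nodes, self_test):
    """Proactive introspection: self-test nodes that still look healthy,
    to anticipate and avert problems before they happen."""
    return [n for n, healthy in nodes.items() if healthy() and not self_test(n)]

reactive_monitor(nodes, on_failure=lambda n: print(f"adapt: re-replicate {n}"))
suspects = proactive_monitor(nodes, self_test=lambda n: n != "brick-3")
print("anticipate:", suspects)  # brick-3 fails its self-test -> act early
```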
26. Hardware Techniques (2)
- No central processor unit: distribute processing with storage
- Serial lines and switches are also growing with Moore's Law: less need today to centralize vs. bus-oriented systems
- Most storage servers are limited by the speed of their CPUs: why does this make sense?
- Why not amortize the sheet metal, power, and cooling infrastructure for the disk to add a processor, memory, and a network?
- If AME is important, we must provide resources to be used to help AME: local processors responsible for the health and maintenance of their storage
27. ISTORE-1 Hardware Platform
- 80-node x86-based cluster, 1.4 TB storage
  - cluster nodes are plug-and-play, intelligent, network-attached storage "bricks"
    - a single field-replaceable unit to simplify maintenance
  - each node is a full x86 PC w/ 256 MB DRAM, 18 GB disk
  - more CPU than NAS; fewer disks/node than a cluster
- Intelligent Disk "Brick": portable PC CPU (Pentium II/266), DRAM, redundant NICs (4 x 100 Mb/s links), diagnostic processor
- ISTORE Chassis
  - 80 nodes, 8 per tray
  - 2 levels of switches
    - 20 x 100 Mbit/s
    - 2 x 1 Gbit/s
  - environment monitoring:
    - UPS, redundant PS, fans, heat and vibration sensors...
28. A Glimpse into the Future?
- System-on-a-chip enables computer, memory, and redundant network interfaces without significantly increasing the size of the disk
- ISTORE HW in 5-7 years:
  - building block: a 2006 MicroDrive integrated with IRAM
    - 9 GB disk, 50 MB/sec from disk
  - connected via crossbar switch
  - 10,000 nodes fit into one rack!
- O(10,000) scale is our ultimate design point
29. Development Techniques
- Benchmarking
  - One reason for the 1000X improvement in processor performance was the ability to measure (vs. debate) which design is better
    - e.g., which is most important to improve: clock rate, clocks per instruction, or instructions executed?
  - Need AME benchmarks (a sketch of one follows this list)
    - "what gets measured gets done"
    - benchmarks shape a field
    - quantification brings rigor
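A minimal sketch of an availability benchmark in this spirit (the workload, fault hook, and metric are stand-ins I invented for illustration): measure delivered quality of service over time, inject a fault partway through, and report how far performance drops relative to the fault-free baseline.

```python
import random
import statistics

def availability_benchmark(serve_request, inject_fault, rounds=100, fault_at=40):
    """Drive a workload, inject a fault partway through, and record
    per-round throughput so degradation and recovery are visible."""
    throughput = []
    for t in range(rounds):
        if t == fault_at:
            inject_fault()            # e.g., kill a disk in a SW RAID set
        ok = sum(serve_request() for _ in range(50))
        throughput.append(ok)         # successful requests this round
    baseline = statistics.mean(throughput[:fault_at])
    worst = min(throughput[fault_at:])
    return baseline, worst, throughput

# Toy system under test: after the "fault", 30% of requests fail for a while
# (roughly 30 rounds), mimicking degraded service during reconstruction.
state = {"degraded_until": 0, "t": 0}
def serve():
    state["t"] += 1
    degraded = state["t"] < state["degraded_until"]
    return 1 if random.random() > (0.3 if degraded else 0.0) else 0

def fault():
    state["degraded_until"] = state["t"] + 1500

base, worst, _ = availability_benchmark(serve, fault)
print(f"baseline {base:.1f} req/round, worst {worst} req/round under fault")
```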
30. Example Results: Multiple Faults
[Graphs: SW RAID reconstruction under multiple faults, Windows 2000/IIS vs. Linux/Apache]
- Windows reconstructs 3x faster than Linux
- Windows reconstruction noticeably affects application performance, while Linux reconstruction does not
31. Software Techniques (1)
- Proactive introspection
  - Continuous online self-testing of HW and SW
    - in deployed systems!
    - goal is to shake out "Heisenbugs" before they're encountered in normal operation
    - needs data redundancy, node isolation, fault injection
- Techniques (a scrubbing sketch follows this list)
  - fault injection: triggering hardware and software error-handling paths to verify their integrity/existence
  - stress testing: pushing HW/SW to their limits
  - scrubbing: periodic restoration of potentially decaying hardware or software state
    - self-scrubbing data structures (like MVS)
    - ECC scrubbing for disks and memory
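A minimal sketch of the scrubbing idea (the block layout, checksum choice, and repair-from-replica step are my illustrative assumptions): periodically re-read stored blocks, verify a checksum, and restore any block that has silently decayed before an application ever reads it.

```python
import hashlib
import random

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_store(blocks):
    # Toy store: block id -> (data, checksum). A replica holds the same
    # blocks so a corrupted block can be restored; both are illustrative.
    return {i: (d, checksum(d)) for i, d in enumerate(blocks)}

def scrub(store, replica):
    """Periodic scrub: re-read every block, verify its checksum, and
    repair silent decay from the replica."""
    repaired = []
    for i, (data, chk) in store.items():
        if checksum(data) != chk:        # bit rot detected
            store[i] = replica[i]        # restore the good copy
            repaired.append(i)
    return repaired

blocks = [bytes(random.randrange(256) for _ in range(64)) for _ in range(8)]
store, replica = make_store(blocks), make_store(blocks)
data, chk = store[3]
corrupted = data[:10] + bytes([data[10] ^ 0xFF]) + data[11:]  # flip one byte
store[3] = (corrupted, chk)                                   # silent decay
print("repaired blocks:", scrub(store, replica))              # -> [3]
```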
32. Conclusions (1): ISTORE
- Availability, maintainability, and evolutionary growth are the key challenges for server systems
  - more important even than performance
- ISTORE is investigating ways to bring AME to large-scale, storage-intensive servers
  - via clusters of network-attached, computationally-enhanced storage nodes running distributed code
  - via hardware and software introspection
  - we are currently performing application studies to investigate and compare techniques
- Availability benchmarks: a powerful tool?
  - revealed undocumented design decisions affecting SW RAID availability on Linux and Windows 2000
33. Conclusions (2)
- IRAM is attractive for two Post-PC applications because of its low power, small size, and high memory bandwidth
  - Gadgets: embedded/mobile devices
  - Infrastructure: intelligent storage and networks
- Post-PC infrastructure requires:
  - New goals: availability, maintainability, evolution
  - New principles: introspection, performance robustness
  - New techniques: isolation/fault insertion, software scrubbing
  - New benchmarks: measure and compare AME metrics
34. Berkeley Future Work
- IRAM: fab and test the chip
- ISTORE
  - implement AME-enhancing techniques in a variety of Internet, enterprise, and info-retrieval applications
  - select the best techniques and integrate them into a generic runtime system with an AME API
  - add maintainability benchmarks
    - can we quantify the administrative work needed to maintain a certain level of availability?
- Perhaps look at data security via encryption?
- Even consider denial of service?