Title: Berkeley RAD Lab: Research in Internet-scale Computing Systems
1. Berkeley RAD Lab: Research in Internet-scale Computing Systems
- Randy H. Katz
- randy@cs.berkeley.edu
- 28 March 2007
2. Five Year Mission
- Observation: Internet systems are complex, fragile, manually managed, and evolving rapidly
- To scale eBay, you must build an eBay-sized company
- To scale YouTube, get acquired by a Google-sized company
- Mission: Enable a single person to create, evolve, and operate the next-generation IT service
- The "Fortune 1 Million," by enabling rapid innovation
- Approach: Create core technology spanning systems, networking, and machine learning
- Focus: Make the datacenter easier to manage, enabling one person to Analyze, Deploy, and Operate a scalable IT service
3. Jan 07 Announcements by Microsoft and Google
- Microsoft and Google race to build next-gen DCs
- Microsoft announces a $550 million DC in TX
- Google confirms plans for a $600 million site in NC
- Google: two more DCs in SC may cost another $950 million -- about 150,000 computers each
- Internet DCs are the next computing platform
- Power availability drives deployment decisions
4. Datacenter is the Computer
- Google "program": Web search, Gmail, ...
- Google "computer": the datacenter
- "Warehouse-sized facilities and workloads likely more common" (Luiz Barroso's talk at RAD Lab, 12/11/06)
Sun Project Blackbox (10/17/06)
- Compose a datacenter from 20 ft. shipping containers!
- Power/cooling for 200 kW
- External taps for electricity, network, cold water
- 250 servers, 7 TB DRAM, or 1.5 PB disk in 2006
- 20% energy savings
- 1/10th? the cost of a building
5. Datacenter Programming System
- Ruby on Rails: open-source Web framework "optimized for programmer happiness and sustainable productivity"
- Convention over configuration (model sketch below)
- Scaffolding: automatic, Web-based UI to stored data
- Program the client: write browser-side code in Ruby, compile to Javascript
- Duck Typing / Mix-Ins
- Proven expressiveness
- Lines of code, Java vs. RoR: 3:1
- Lines of configuration, Java vs. RoR: 10:1
- More than a fad: Java on Rails, Python on Rails, ...
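To make "convention over configuration" concrete, here is a minimal ActiveRecord-style model sketch as it would sit inside a Rails application (illustrative only, not code from the RAD Lab prototypes): the class name alone determines the database table, and one-line declarations replace configuration files.

```ruby
# app/models/order.rb -- minimal ActiveRecord-style model sketch.
# By convention, class Order maps to an "orders" table and its columns
# become attributes automatically; no XML or separate config files.
class Order < ActiveRecord::Base
  belongs_to :customer        # convention: expects an orders.customer_id column
  has_many   :line_items      # convention: expects a line_items.order_id column

  validates_presence_of :customer_id   # one-line declarative validation
end
```

Scaffolding then generates a basic Web UI (listings and forms) over such a model, which is where the code and configuration savings over a comparable Java stack come from.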
6. Datacenter Synthesis OS
- Synthesis: change the DC via a written specification
- DC Spec Language compiled to a logical configuration (hypothetical sketch below)
- OS: allocate, monitor, and adjust during operation
- Director uses machine learning; Drivers send commands
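The slide does not show the DC Spec Language itself. Purely as a hypothetical illustration of "change the DC via a written specification," a spec could be written as plain data and lowered by a trivial compiler pass into an initial logical configuration for the Director to adjust at runtime; every name and field below is invented for illustration.

```ruby
# Hypothetical sketch only -- not the actual DC Spec Language.
SPEC = {
  service: "web_service",
  tiers: {
    frontend: { instances: 4..40, sla_latency_ms: 200 },  # Director may scale within this range
    storage:  { replicas: 3, may_spin_down: true }        # energy-saving actions allowed
  }
}

# Trivial "compiler" pass: lower the spec to an initial logical
# configuration, starting each tier at its minimum size.
def compile(spec)
  spec[:tiers].map do |name, tier|
    count = tier[:instances] ? tier[:instances].first : tier[:replicas]
    { tier: name, instances: count }
  end
end

p compile(SPEC)   # minimum-size starting configuration for each tier
```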
7. System + Statistical Machine Learning
- S2ML strengths
- Handles SW churn: train rather than hand-write the logic (toy example below)
- Beyond queuing models: learns how to handle/make policy between steady states
- Beyond control theory: copes with complex cost functions
- Discovery: finding trends, needles in the data haystack
- Exploits cheap processing: advances fast enough to run online
- S2ML as an integral component of the DC OS
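A toy illustration of "train rather than hand-write the logic" (plain Ruby, invented numbers): instead of hard-coding a scale-up rule, a threshold is derived from observed utilization and SLA data, so it can simply be re-learned when the software underneath churns.

```ruby
# Toy example: learn a scale-up threshold from observations instead of
# hand-writing "add a server when CPU > 70%".
observations = [   # [cpu_utilization, sla_violated?] -- invented data
  [0.55, false], [0.62, false], [0.71, false],
  [0.78, true],  [0.83, true],  [0.90, true]
]

ok_max  = observations.reject { |_, bad| bad }.map(&:first).max  # highest utilization that met the SLA
bad_min = observations.select { |_, bad| bad }.map(&:first).min  # lowest utilization that violated it
threshold = (ok_max + bad_min) / 2.0

puts "learned scale-up threshold: #{threshold}"
```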
8. Datacenter Monitoring
- S2ML needs data to analyze
- DC components already come with sensors
- CPUs (performance counters)
- Disks (SMART interface)
- Add sensors to software
- Log files
- DTrace for Solaris, Mac OS
- Trace 10K nodes within and between DCs
- Trace: app-oriented path recording framework
- X-Trace: cross-layer/cross-domain, including the network layer
9. Middleboxes in Today's DC
- Middleboxes inserted on the physical path
- Policy via plumbing
- Weakest link: single point of failure and bottleneck
- Expensive to upgrade and to introduce new functionality
- Identity-based Routing Layer: policy, not plumbing, to route classified packets to the appropriate middlebox services
[Figure: high-speed network with intrusion detector, load balancer, and firewall middleboxes on the path]
10. First Milestone: DC Energy Conservation
- DCs are limited by power
- For each dollar spent on servers, add $0.48 (2005) / $0.71 (2010) for power and cooling
- $26B spent to power and cool servers in 2005, growing to $45B in 2010
- Attractive application of S2ML
- Bringing processor resources on/off-line: dynamic environment, complex cost function, measurement-driven decisions (decision-rule sketch below)
- Preserve 100% of Service Level Agreements
- Don't hurt hardware reliability
- Then conserve energy
- Conserve energy and improve reliability
- MTTF: stress of on/off cycles vs. benefits of off-hours
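The milestone's ordering of concerns can be written as an explicit decision rule. The sketch below is illustrative only (invented structures and numbers, not the RAD Lab controller): a node may be powered down only if the SLA is still met without it and its power-cycle budget is not exhausted.

```ruby
# Illustrative sketch: preserve SLAs first, protect hardware second,
# and only then conserve energy.
Node = Struct.new(:id, :cycles_today)

CYCLE_BUDGET_PER_DAY = 24   # assumed per-node reliability limit (illustrative)

def may_power_down?(node, active_nodes, predicted_demand, capacity_per_node)
  remaining_capacity = (active_nodes - [node]).size * capacity_per_node
  return false if remaining_capacity < predicted_demand        # would violate the SLA
  return false if node.cycles_today >= CYCLE_BUDGET_PER_DAY    # would stress the hardware
  true                                                         # safe to save energy
end

nodes = (1..10).map { |i| Node.new(i, rand(0..30)) }
candidates = nodes.select { |n| may_power_down?(n, nodes, 600, 100) }
puts "could power down #{candidates.size} of #{nodes.size} nodes"
```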
11. DC Networking and Power
- Within DC racks, network equipment is often the hottest component in the hot spot
- Network opportunities for power reduction
- Transition to higher-speed interconnects (10 Gb/s) at DC scales and densities
- High-function/high-power assists embedded in network elements (e.g., TCAMs)
12. Thermal Image of Typical Cluster Rack
M. K. Patterson, A. Pratt, P. Kumar, "From UPS to Silicon: An End-to-End Evaluation of Datacenter Efficiency," Intel Corporation
13. DC Networking and Power
- Selectively power down ports/portions of network elements (hypothetical sketch below)
- Enhanced power-awareness in the network stack
- Power-aware routing and support for system virtualization
- Support for datacenter slice power-down and restart
- Application- and power-aware media access/control
- Dynamic selection of full/half duplex
- Directional asymmetry to save power, e.g., 10 Gb/s send, 100 Mb/s receive
- Power-awareness in applications and protocols
- Hard state (proxying), soft state (caching), protocol/data streamlining for power as well as bandwidth reduction
- Power implications for topology design
- Tradeoffs in redundancy/high availability vs. power consumption
- VLAN support for power-aware system virtualization
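Purely as a hypothetical sketch of the first idea above, with the redundancy-vs-power tradeoff made explicit (this is not a real switch API; all names are invented):

```ruby
# Hypothetical sketch: sleep idle ports, but never drop below a minimum
# number of active ports and never sleep a port whose redundant pair is down.
Port = Struct.new(:name, :utilization, :redundant_pair_up)

def ports_to_sleep(ports, min_active)
  idle = ports.select { |p| p.utilization < 0.05 && p.redundant_pair_up }
  idle.take([ports.size - min_active, 0].max)   # keep at least min_active ports up
end

ports = [Port.new("ge-0/0/1", 0.01, true),
         Port.new("ge-0/0/2", 0.40, true),
         Port.new("ge-0/0/3", 0.02, false)]
p ports_to_sleep(ports, 2).map(&:name)          # => ["ge-0/0/1"]
```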
14. Why University Research?
- Imperative that future technical leaders learn to deal with scale in modern computing systems
- Draws on talented but inexperienced people
- Pick from a worldwide talent pool for students and faculty
- They don't know what they can't do
- Inexpensive -- allows focus on speculative ideas
- Mostly grad student salaries
- Faculty part time
- Tech transfer engine
- Success: train students to go forth and replicate
- Promiscuous publication, including source code
- Ideal launching point for startups
15. Why a New Funding Model?
- DARPA has exited long-term research in experimental computing systems
- NSF is swamped with proposals, yielding even more conservative decisions
- Community emphasis on theoretical over experimental, systems-building research
- Alternative: turn to industry for funding
- Opportunity to shape the research agenda
16. New Funding Model
- 30 grad students + 5 undergrads + 6 faculty + 4 staff
- Foundation companies: $500K/yr for 5 years
- Google, Microsoft, Sun Microsystems
- Prefer founding partners' technology in prototypes
- Many people from each company attend retreats, advise on directions, and get a head start on research results
- IP placed in the public domain, so partners can use it without being sued
- Large affiliates ($100K/yr): Fujitsu, HP, IBM, Siemens
- Small affiliates ($50K/yr): Nortel, Oracle
- State matching programs (MICRO, Discovery) add $1M/year
17. Summary
- DC is the Computer
- OS: ML + VM; Net: Identity-based Routing; FS: Web Storage
- Prog Sys: RoR; Libraries: Web Services
- Development environment: RAMP (simulator), AWE (tester), Web 2.0 apps (benchmarks)
- Debugging environment: Trace + X-Trace
- Milestones
- DC energy conservation + reliability enhancement
- Web 2.0 apps in RoR
18. Conclusions
- Develop, Analyze, Deploy, and Operate modern systems at Internet scale
- Ruby on Rails for rapid application development
- Declarative datacenter for correct-by-construction system configuration and operation
- Resource management by System + Statistical Machine Learning
- Virtual machines and network storage for flexible resource allocation
- Power reduction and reliability enhancement by fast power-down/restart of processing nodes
- Pervasive monitoring, tracing, simulation, and workload generation for runtime analysis/operation
19. Discussion Points
- Jointly designed datacenter testbed
- Mini-DC consisting of clusters, middleboxes, and network equipment
- Representative network topology
- Power-aware networking
- Evaluation of existing network elements
- Platform for investigating power-reduction schemes in network elements
- Mutual information exchange
- Network storage architecture
- System + Statistical Machine Learning
20. Ruby on Rails: DC PL
- Reasons to love Ruby on Rails
- Convention over configuration
- Rails framework features enabled by Ruby language features (meta-object programming)
- Scaffolding: automatic, Web-based (pedestrian) user interface to stored data
- Program the client (v 1.1): write browser-side code in Ruby, then compile to Javascript
- Duck Typing / Mix-Ins (sketch below)
- Looks like a string, responds like a string: it's a string!
- Mix-ins: an improvement over multiple inheritance
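A minimal plain-Ruby sketch of the two language features named above (illustrative only):

```ruby
# Duck typing: an object that responds like a string can be used as one.
class LogLine
  def initialize(text); @text = text; end
  def to_str; @text; end                # String#+ and friends will accept it
end
puts "entry: " + LogLine.new("GET /index 200")

# Mix-in: share behavior through a module instead of multiple inheritance.
module Monitorable
  def report; "#{self.class.name}: #{status}"; end
end

class Server
  include Monitorable
  def status; "up"; end
end
puts Server.new.report                   # => "Server: up"
```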
21. DC Monitoring
- Imagine a world where path information is always passed along, so that user requests can always be tracked throughout the system
- Across apps, OS, network components and layers, different computers on the LAN, ...
- Unique request ID
- Components touched
- Time of day
- Parent of this request
22. Trace: The 1% Solution
- Trace goal: make path-based analysis low-overhead enough that it can be always on inside the datacenter
- Baseline path-info collection with 1% overhead
- Selectively add more local detail for specific requests
- Trace: an end-to-end path recording framework (sketch below)
- Capture a timestamp and a unique request ID across all system components
- Top-level log contains path traces
- Local logs contain additional detail, correlated to the path ID
- Built on X-Trace
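An illustrative sketch (not the actual Trace/X-Trace implementation) of what each component would record: every event carries the unique request ID, its parent, the component name, and a timestamp into the always-on top-level log, while extra local detail is kept only for a small sampled fraction of requests.

```ruby
require 'securerandom'

TraceEvent = Struct.new(:request_id, :parent, :component, :timestamp)

DETAIL_SAMPLE_RATE = 0.01    # assumed: extra local detail for ~1% of requests
TOP_LEVEL_LOG = []           # always-on path traces
LOCAL_LOG     = []           # additional detail, correlated by request ID

def record(request_id, parent, component, local_detail = nil)
  TOP_LEVEL_LOG << TraceEvent.new(request_id, parent, component, Time.now)
  if local_detail && rand < DETAIL_SAMPLE_RATE
    LOCAL_LOG << [request_id, local_detail]
  end
  component   # returned so the caller can pass it on as the next hop's parent
end

req = SecureRandom.uuid                       # unique request ID
hop = record(req, nil, "web-frontend")
hop = record(req, hop, "app-server")
hop = record(req, hop, "database", "full query plan")
```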
23. X-Trace: Comprehensive Tracing Through Layers, Networks, and Apps
- Trace connectivity of distributed components
- Capture causal connections between requests/responses
- Cross-layer
- Includes network and middleware services such as IP and LDAP
- Cross-domain
- Multiple datacenters, composed services, overlays, mash-ups
- Control rests with individual administrative domains
- Network path sensor
- Puts individual requests/responses, at different network layers, in the context of an end-to-end request
24. Actuator: Policy-based Routing Layer
- Assign an ID to incoming packets (hash-table lookup)
- Route based on IDs, not locations (i.e., not IP addresses); sketch below
- Sets up logical paths without changing the network topology
- A set of common middleboxes gets a single ID
- No single weakest link: robust, scalable throughput
[Figure: Load Balancer (ID_LB), Intrusion Detection (ID_ID), Service (ID_S), and Firewall (ID_F) attached to the Identity-based Routing Layer]
- So simple it could be done in an FPGA?
- More general than MPLS
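An illustrative sketch of routing on identities rather than IP addresses (invented classifier and tables, not the RAD Lab prototype): a lookup assigns an ID to each packet, and a set of equivalent middleboxes shares one ID so any instance can serve the packet.

```ruby
# Classify a packet to an identity (hash-table style lookup).
def classify(packet)
  return :ID_F  if packet[:dst_port] == 22              # firewall service
  return :ID_ID if packet[:flags].include?(:suspicious) # intrusion detection
  :ID_LB                                                # default: load balancer
end

# Identity -> set of middlebox instances sharing that ID.
ROUTES = {
  ID_F:  ["firewall-1", "firewall-2"],
  ID_ID: ["ids-1"],
  ID_LB: ["lb-1", "lb-2", "lb-3"]
}

def next_hop(packet)
  ROUTES[classify(packet)].sample    # any instance carrying the ID will do
end

puts next_hop({ dst_port: 80, flags: [] })   # one of lb-1, lb-2, lb-3
```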
25. Other RAD Lab Projects
- Research Accelerator for Multiple Processors (RAMP): DC simulator
- Automatic Workload Evaluator (AWE): DC tester
- Web Storage (GFS, Bigtable, Amazon S3): DC file system
- Web Services (MapReduce, Chubby): DC libraries
26. 1st Milestone: DC Energy Conservation
- Good match to machine learning
- An optimization, so imperfection is not catastrophic
- Lots of data to measure, a dynamically changing workload, a complex cost function
- Not steady state, so not queuing theory
- PG&E is trying to change the behavior of datacenters
- Properly stated, the problem is:
- Preserve 100% of Service Level Agreements
- Don't hurt hardware reliability
- Then conserve energy
- Radical idea: can conserving energy improve hardware reliability?
27. 1st Milestone: Conserve Energy + Improve Reliability
- Improve component reliability?
- Disks: lifetimes are measured in powered-on hours, but limited to 50,000 start/stop cycles
- Idea: if disks are turned off 50% of the time, the annual failure rate is roughly halved, as long as the 50,000 start/stop cycles are not exceeded (about once per hour; arithmetic check below)
- Integrated circuits: lifetimes affected by thermal cycling (fast change is bad), electromigration (turning off helps), and dielectric breakdown (turning off helps)
- Idea: if the number of thermal cycles is limited, could IC failure rates due to EM and DB be cut by 30%?
See "A Case for Adaptive Datacenters to Conserve Energy and Improve Reliability," Peter Bodik, Michael Armbrust, Kevin Canini, Armando Fox, Michael Jordan, and David Patterson, 2007.
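A back-of-envelope check of the disk numbers above (the five-year service life is an assumption for illustration, not from the slide):

```ruby
start_stop_budget  = 50_000          # lifetime start/stop cycles (from the slide)
service_life_hours = 5 * 365 * 24    # assumed 5-year service life = 43,800 hours
cycles_per_hour    = start_stop_budget.to_f / service_life_hours
puts format("allowed start/stop cycles per hour: %.2f", cycles_per_hour)  # ~1.14, i.e. roughly once per hour
```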
28. RAD Lab 2.0, 2nd Milestone: Killer Web 2.0 Apps
- Demonstrate the RAD Lab vision of one person creating the next great service and scaling it up
- Where to get example great apps, given that grad students are creating the technology?
- Use undergraduate computing clubs to create exciting apps in RoR using RAD Lab equipment and technology
- Armando Fox is the RoR club leader
- Recruited a real-world RoR programmer to develop code and advise the RoR computing club
- 30 students joined the club in Jan 2007
- Hire the best undergrads to build RoR apps in the RAD Lab
29. Miracle of University Research
- Talented (if inexperienced) people
- Pick from a worldwide talent pool for students and faculty
- They don't know what they can't do
- Inexpensive
- Mostly grad student salaries ($50K-75K/yr with overhead)
- Faculty part time ($75K-100K/yr including overhead)
- Berkeley and Stanford swing for the fences (R, not r or D)
- Even if we hit a single, we train the next generation of leaders
- Technology transfer engine
- Success: train students to go forth and multiply
- Publish everything, including source code
- Ideal launching point for startups
30. Chance to Partner with a Great University
- Chance to work on the Next Great Thing
- US News & World Report ranking of CS systems programs: #1 Berkeley, #2 CMU, #2 MIT, #4 Stanford
- Berkeley and Stanford are among the top suppliers of systems students to industry (and academia)
- A National Academy study mentions Berkeley in 7 of 19 $1B industries arising from IT research, Stanford 4 times
- Timesharing (SDS 940), Client-Server Computing (BSD Unix), Graphics, Entertainment, Internet, LANs, Workstations (SUN), GUI, VLSI Design (Spice), RISC (ARM, MIPS, SPARC), Relational DB (Ingres/Postgres), Parallel DB, Data Mining, Parallel Computing, RAID, Portable Communication (BWRC), WWW, Speech Recognition, Broadband last mile (DSL)
31. Years to a >$1B IT Industry from Research Start
[Chart from the National Research Council Computer Science and Telecommunications Board, 2003]
32. Physical RAD Lab: Radical Collocation
- Innovation comes from spontaneous meetings of people with different areas of expertise
- Communication is inversely proportional to distance
- Almost never happens if > 100 feet apart or on a different floor
- Everyone (including faculty) in open offices
- Great meeting rooms, ubiquitous whiteboards
- Technology to concentrate: cell phone, iPod, laptop
- Google "Physical RAD Lab" to learn more
33. Example of the Next Great Thing
- Berkeley Reliable Adaptive Distributed systems Laboratory (RAD Lab)
- Founded 12/2005 with Google, Microsoft, and Sun as founding partners
- Armando Fox, Randy Katz, Mike Jordan, Anthony Joseph, Dave Patterson, Scott Shenker, Ion Stoica
- Google "RAD Lab" to learn more
34. RAD Lab Goal: Enable the Next eBay
- Create technology that enables the next great Internet service to grow rapidly without growing the organization rapidly
- Machine Learning + Systems is the secret sauce
- Position: the datacenter is the computer
- The leverage point is simplifying datacenter management
- What is the programming language of the datacenter?
- What is CAD for the datacenter?
- What is the OS for the datacenter?