1
Berkeley RAD Lab: Research in Internet-scale
Computing Systems
  • Randy H. Katz
  • randy@cs.berkeley.edu
  • 28 March 2007

2
Five Year Mission
  • Observation: Internet systems are complex, fragile,
    manually managed, and evolving rapidly
  • To scale eBay, must build an eBay-sized company
  • To scale YouTube, get acquired by a Google-sized
    company
  • Mission: Enable a single person to create,
    evolve, and operate the next-generation IT
    service
  • Create the "Fortune 1 Million" by enabling rapid
    innovation
  • Approach: Create core technology spanning
    systems, networking, and machine learning
  • Focus: Make the datacenter easier to manage, so
    one person can analyze, deploy, and operate a
    scalable IT service

3
Jan '07 Announcements by Microsoft and Google
  • Microsoft and Google race to build next-gen DCs
  • Microsoft announces a $550 million DC in TX
  • Google confirms plans for a $600 million site in
    NC
  • Google: two more DCs in SC may cost another $950
    million -- about 150,000 computers each
  • Internet DCs are the next computing platform
  • Power availability drives deployment decisions

4
Datacenter is the Computer
  • Google "program": Web search, Gmail, ...
  • Google "computer": warehouse-sized facilities and
    workloads, likely more common (Luiz Barroso's
    talk at RAD Lab, 12/11/06)

Sun Project Blackbox (10/17/06)
  • Compose datacenter from 20 ft. containers!
  • Power/cooling for 200 kW
  • External taps for electricity, network, cold
    water
  • 250 servers, 7 TB DRAM, or 1.5 PB disk in 2006
  • 20% energy savings
  • 1/10th? the cost of a building

5
Datacenter Programming System
  • Ruby on Rails: open-source Web framework
    "optimized for programmer happiness and
    sustainable productivity"
  • Convention over configuration
  • Scaffolding: automatic, Web-based UI to stored
    data
  • Program the client: write browser-side code in
    Ruby, compile to JavaScript
  • Duck Typing/Mix-Ins
  • Proven expressiveness:
  • Lines of code, Java vs. RoR: 3:1
  • Lines of configuration, Java vs. RoR: 10:1
  • More than a fad:
  • Java on Rails, Python on Rails, ...

6
Datacenter Synthesis OS
  • Synthesis: change the DC via a written specification
  • DC Spec Language compiled to a logical
    configuration (see the sketch below)
  • OS: allocate, monitor, adjust during operation
  • Director (using machine learning) decides; Drivers
    send commands
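
What might compiling a spec down to a logical configuration look like? A minimal sketch, assuming a dict-based spec and a naive allocator -- none of this is the actual DC Spec Language; the format and field names are illustrative assumptions:

```python
# Hypothetical sketch of a DC spec being "compiled" into a
# logical configuration. Spec format, field names, and the
# allocator are illustrative, not the RAD Lab DC Spec Language.

SPEC = {
    "service": "photo-share",
    "tiers": [
        {"name": "web", "min_nodes": 3},
        {"name": "db",  "min_nodes": 2},
    ],
}

def compile_spec(spec, free_nodes):
    """Map each tier's minimum node count onto free machines."""
    config, pool = {}, list(free_nodes)
    for tier in spec["tiers"]:
        n = tier["min_nodes"]
        if len(pool) < n:
            raise RuntimeError("not enough nodes for tier " + tier["name"])
        config[tier["name"]] = [pool.pop() for _ in range(n)]
    return config

print(compile_spec(SPEC, [f"node{i:02d}" for i in range(8)]))
# {'web': ['node07', 'node06', 'node05'], 'db': ['node04', 'node03']}
```

The Director would then monitor the running system and re-invoke this kind of allocation as conditions change, with Drivers issuing the actual commands.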

7
System + Statistical Machine Learning
  • S2ML strengths:
  • Handles SW churn: train, rather than write, the
    logic (sketch below)
  • Beyond queuing models: learns how to handle/make
    policy between steady states
  • Beyond control theory: copes with complex cost
    functions
  • Discovery: finding trends, needles in the data
    haystack
  • Exploits cheap processing advances: fast enough to
    run online
  • S2ML as an integral component of the DC OS
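
As a toy illustration of "train rather than write the logic": instead of hand-deriving a queuing formula for latency under load, fit a simple model to observed samples and query it. All numbers and the linear model below are made up:

```python
# Illustrative only: learn the load/latency relationship from
# measurements rather than writing it down analytically.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

load    = [100, 200, 300, 400, 500]   # requests/sec (hypothetical)
latency = [ 21,  25,  32,  45,  70]   # observed ms  (hypothetical)

a, b = fit_line(load, latency)
print(f"predicted latency at 450 req/s: {a * 450 + b:.1f} ms")
```

When the software churns, you retrain on fresh measurements instead of re-deriving the model by hand.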

8
Datacenter Monitoring
  • S2ML needs data to analyze
  • DC components come with sensors already:
  • CPUs (performance counters)
  • Disks (SMART interface)
  • Add sensors to software:
  • Log files
  • DTrace for Solaris, Mac OS
  • Trace 10K nodes within and between DCs:
  • Trace: app-oriented path recording framework
  • X-Trace: cross-layer/-domain tracing, including
    the network layer

9
Middleboxes in Today's DC
  • Middleboxes inserted on the physical path
  • Policy via plumbing
  • Weakest link: a single point of failure and a
    bottleneck
  • Expensive to upgrade and to introduce new
    functionality
  • Identity-based Routing Layer: policy, not plumbing,
    routes classified packets to the appropriate
    middlebox services

[Diagram: firewall, load balancer, and intrusion
detector inserted on a high-speed network path]
10
First Milestone: DC Energy Conservation
  • DCs limited by power
  • For each dollar spent on servers, add $0.48
    (2005) / $0.71 (2010) for power/cooling (worked
    example below)
  • $26B spent to power and cool servers in 2005,
    growing to $45B in 2010
  • Attractive application of S2ML:
  • Bringing processor resources on/off-line: dynamic
    environment, complex cost function, measurement-
    driven decisions
  • Preserve 100% of Service Level Agreements
  • Don't hurt hardware reliability
  • Then conserve energy
  • Conserve energy and improve reliability?
  • MTTF: stress of on/off cycles vs. benefit of
    off-hours
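
A small worked example of those overhead ratios; the $1M server spend is a hypothetical input, the ratios are the slide's:

```python
# Power/cooling dollars per dollar of server spend, per the
# figures quoted on the slide. Server spend is hypothetical.

overhead = {"2005": 0.48, "2010": 0.71}
server_spend = 1_000_000

for year, ratio in overhead.items():
    print(f"{year}: ${server_spend * ratio:,.0f} of power/cooling "
          f"per ${server_spend:,} of servers")
```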

11
DC Networking and Power
  • Within DC racks, network equipment is often the
    hottest component in the hot spot
  • Network opportunities for power reduction:
  • Transition to higher-speed interconnects (10 Gb/s)
    at DC scales and densities
  • High-function/high-power assists embedded in
    network elements (e.g., TCAMs)

12
Thermal Image of Typical Cluster Rack
M. K. Patterson, A. Pratt, P. Kumar, "From UPS
to Silicon: An End-to-End Evaluation of
Datacenter Efficiency," Intel Corporation
13
DC Networking and Power
  • Selectively power down ports/portions of network
    elements
  • Enhanced power-awareness in the network stack:
  • Power-aware routing and support for system
    virtualization
  • Support for datacenter slice power-down and
    restart
  • Application- and power-aware media access/control:
  • Dynamic selection of full/half duplex
  • Directional asymmetry to save power, e.g.,
    10 Gb/s send, 100 Mb/s receive
  • Power-awareness in applications and protocols:
  • Hard state (proxying), soft state (caching),
    protocol/data streamlining for power as well
    as bandwidth reduction
  • Power implications for topology design:
  • Tradeoffs in redundancy/high-availability vs.
    power consumption
  • VLAN support for power-aware system
    virtualization

14
Why University Research?
  • Imperative that future technical leaders learn to
    deal with scale in modern computing systems
  • Draws on talented but inexperienced people:
  • Pick from a worldwide talent pool for students and
    faculty
  • "Don't know what they can't do"
  • Inexpensive -- allows focus on speculative ideas:
  • Mostly grad student salaries
  • Faculty part time
  • Tech transfer engine:
  • Success: train students to "go forth and
    replicate"
  • Promiscuous publication, including source code
  • Ideal launching point for startups

15
Why a New Funding Model?
  • DARPA has exited long-term research in
    experimental computing systems
  • NSF swamped with proposals, yielding even more
    conservative decisions
  • Community emphasis on theoretical vs.
    experimental, systems-building research
  • Alternative: turn to industry for funding
  • Opportunity to shape the research agenda

16
New Funding Model
  • 30 grad students + 5 undergrads + 6 faculty + 4
    staff
  • Foundation companies: $500K/yr for 5 years
  • Google, Microsoft, Sun Microsystems
  • Prefer founding-partner technology in prototypes
  • Many from each company attend retreats, advise on
    directions, get a head start on research results
  • Putting IP in the public domain, so partners can
    use it but not be sued
  • Large affiliates ($100K/yr): Fujitsu, HP, IBM,
    Siemens
  • Small affiliates ($50K/yr): Nortel, Oracle
  • State matching programs add $1M/year (MICRO,
    Discovery)

17
Summary
  • DC is the Computer
  • OS: MLVM; Net: Identity-based Routing; FS: Web
    Storage
  • Prog Sys: RoR; Libraries: Web Services
  • Development environment: RAMP (simulator), AWE
    (tester), Web 2.0 apps (benchmarks)
  • Debugging environment: Trace + X-Trace
  • Milestones:
  • DC Energy Conservation + Reliability Enhancement
  • Web 2.0 Apps in RoR

18
Conclusions
  • Develop-Analyze-Deploy-Operate modern systems at
    Internet scale
  • Ruby on Rails for rapid application development
  • Declarative datacenter for correct-by-construction
    system configuration and operation
  • Resource management by System + Statistical
    Machine Learning
  • Virtual machines and network storage for flexible
    resource allocation
  • Power reduction and reliability enhancement by
    fast power-down/restart of processing nodes
  • Pervasive monitoring, tracing, simulation, and
    workload generation for runtime
    analysis/operation

19
Discussion Points
  • Jointly designed datacenter testbed:
  • Mini-DC consisting of clusters, middleboxes, and
    network equipment
  • Representative network topology
  • Power-aware networking:
  • Evaluation of existing network elements
  • Platform for investigating power-reduction
    schemes in network elements
  • Mutual information exchange:
  • Network storage architecture
  • System + Statistical Machine Learning

20
Ruby on Rails: DC PL
  • Reasons to love Ruby on Rails:
  • Convention over configuration
  • Rails framework features enabled by Ruby language
    features (meta-object programming)
  • Scaffolding: automatic, Web-based, (pedestrian)
    user interface to stored data
  • Program the client (v1.1): write browser-side code
    in Ruby, then compile to JavaScript
  • Duck Typing/Mix-Ins
  • Looks like a string, responds like a string, it's
    a string! (sketch below)
  • Mix-ins: an improvement over multiple inheritance
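
The duck-typing slogan can be sketched in a few lines -- shown here in Python rather than Ruby, with hypothetical classes: any object that responds to the expected method works, regardless of its declared type.

```python
# Duck typing: "looks like a line source, responds like a line
# source, it's a line source." Classes are hypothetical.

class LogFile:
    def each_line(self):
        yield "GET /index 200"

class NetworkStream:
    def each_line(self):
        yield "GET /api 404"

def count_errors(source):
    """Works on anything that 'quacks' like a line source."""
    return sum(1 for line in source.each_line() if " 404" in line)

print(count_errors(LogFile()))        # 0
print(count_errors(NetworkStream()))  # 1
```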

21
DC Monitoring
  • Imagine a world where path information is always
    passed along, so user requests can always be
    tracked throughout the system
  • Across apps, OS, network components and layers,
    different computers on the LAN, ...
  • Unique request ID
  • Components touched
  • Time of day
  • Parent of this request (record sketch below)
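
The four fields above suggest a per-hop record along these lines; a minimal sketch with hypothetical field names, not an actual RAD Lab format:

```python
# One trace record per component touched, sharing a request ID
# along the whole path.

from dataclasses import dataclass
from typing import Optional
import time, uuid

@dataclass
class TraceRecord:
    request_id: str               # unique request ID, shared along the path
    component: str                # component touched
    timestamp: float              # time of day
    parent: Optional[str] = None  # component that issued this sub-request

root = TraceRecord(uuid.uuid4().hex, "load-balancer", time.time())
hop  = TraceRecord(root.request_id, "app-server", time.time(),
                   parent=root.component)
print(root, hop, sep="\n")
```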

22
Trace: The 1% Solution
  • Trace goal: make path-based analysis low-overhead
    enough that it can be always on inside the
    datacenter
  • Baseline path info collection with 1%
    overhead
  • Selectively add more local detail for specific
    requests (sketch below)
  • Trace: an end-to-end path recording framework
  • Captures timestamp and a unique requestID across
    all system components
  • Top-level log contains path traces
  • Local logs contain additional detail, correlated
    to the path ID
  • Built on X-Trace
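
A minimal sketch of this two-level scheme, under assumed parameters (the 1% sampling rate is from the slide; the in-memory logs and components are hypothetical): every request gets a cheap baseline path record, while a sampled subset also gets expensive local detail.

```python
# Always-on baseline path log plus sampled detailed local log.

import random, time

DETAIL_RATE = 0.01          # ~1% of requests get full detail
path_log, local_log = [], []

def record_hop(request_id, component, detailed):
    path_log.append((time.time(), request_id, component))
    if detailed:            # extra, more expensive local detail
        local_log.append((request_id, component, "full state dump here"))

def handle_request(request_id):
    detailed = random.random() < DETAIL_RATE   # decide once per path
    for component in ("firewall", "load-balancer", "app", "db"):
        record_hop(request_id, component, detailed)

for i in range(1000):
    handle_request(f"req-{i}")
print(len(path_log), "path records,", len(local_log), "detail records")
```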

23
X-Trace: Comprehensive Tracing Through Layers,
Networks, Apps
  • Trace connectivity of distributed components
  • Capture causal connections between
    requests/responses
  • Cross-layer:
  • Includes network and middleware services such as
    IP and LDAP
  • Cross-domain:
  • Multiple datacenters, composed services,
    overlays, mash-ups
  • Control left to individual administrative domains
  • Network path sensor:
  • Puts individual requests/responses, at different
    network layers, in the context of an end-to-end
    request

24
Actuator: Policy-based Routing Layer
  • Assign an ID to incoming packets (hash table
    lookup)
  • Route based on IDs, not locations (i.e., not IP
    addresses); see the sketch after the diagram
  • Sets up logical paths without changing the network
    topology
  • A set of common middleboxes gets a single ID
  • No single weakest link: robust, scalable
    throughput

[Diagram: Identity-based Routing Layer connecting
Load-Balancer (ID_LB), Intrusion-Detection (ID_ID),
Service (ID_S), and Firewall (ID_F)]
  • So simple it could be done in an FPGA?
  • More general than MPLS
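
A minimal sketch of the classify-then-route idea, assuming a hash-table classifier and per-ID middlebox chains; all rules, IDs, and chains below are hypothetical:

```python
# Classify a packet to an identity via a hash-table lookup,
# then forward through the middlebox chain registered for it.

# Classification rules: (protocol, dst_port) -> identity
CLASSIFIER = {
    ("tcp", 80):  "ID_WEB",
    ("tcp", 443): "ID_WEB",
    ("tcp", 25):  "ID_MAIL",
}

# Identity -> ordered middlebox chain (policy, not plumbing)
POLICY = {
    "ID_WEB":  ["firewall", "load-balancer", "service"],
    "ID_MAIL": ["firewall", "intrusion-detection", "service"],
}

def route(packet):
    ident = CLASSIFIER.get((packet["proto"], packet["port"]), "ID_DEFAULT")
    chain = POLICY.get(ident, ["firewall", "service"])
    return ident, chain

print(route({"proto": "tcp", "port": 80}))
# ('ID_WEB', ['firewall', 'load-balancer', 'service'])
```

This is the "policy, not plumbing" point: editing POLICY redirects traffic through a different middlebox chain with no physical rewiring.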
25
Other RAD Lab Projects
  • Research Accelerator for MP (RAMP): DC
    simulator
  • Automatic Workload Evaluator (AWE): DC tester
  • Web Storage (GFS, Bigtable, Amazon S3): DC file
    system
  • Web Services (MapReduce, Chubby): DC libraries

26
1st Milestone: DC Energy Conservation
  • Good match to machine learning:
  • An optimization, so imperfection is not catastrophic
  • Lots of data to measure, a dynamically changing
    workload, a complex cost function
  • Not steady state, so not queuing theory
  • PG&E trying to change the behavior of datacenters
  • Properly state the problem (policy sketch below):
  • Preserve 100% of Service Level Agreements
  • Don't hurt hardware reliability
  • Then conserve energy
  • Radical idea: can conserving energy improve
    hardware reliability?
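
One way to read that priority ordering is as a guarded power-down policy. The sketch below is a hypothetical illustration: the headroom factor, cycle budget, and load prediction are assumptions, not RAD Lab parameters.

```python
# "SLA first, reliability second, energy third" as a node
# power-down policy. All thresholds are illustrative.

CYCLE_BUDGET_PER_DAY = 24        # don't exceed ~1 on/off cycle per hour
SLA_HEADROOM = 1.25              # keep 25% spare capacity for the SLA

def nodes_to_power_down(active, capacity_per_node,
                        predicted_load, cycles_today):
    # 1. Preserve SLAs: never drop below predicted load + headroom
    needed = int(predicted_load * SLA_HEADROOM / capacity_per_node) + 1
    surplus = max(0, active - needed)
    # 2. Don't hurt reliability: respect the start/stop cycle budget
    surplus = min(surplus, max(0, CYCLE_BUDGET_PER_DAY - cycles_today))
    # 3. Then conserve energy: turn off what's left over
    return surplus

print(nodes_to_power_down(active=100, capacity_per_node=50,
                          predicted_load=2000, cycles_today=20))  # -> 4
```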

27
1st Milestone: Conserve Energy and Improve
Reliability
  • Improve component reliability?
  • Disks: lifetimes measured in powered-on hours,
    but limited to 50,000 start/stop cycles
  • Idea: if disks are turned off 50% of the time, get
    50% of the annual failure rate, as long as the
    50,000 start/stop cycles aren't exceeded (at most
    once per hour); arithmetic checked below
  • Integrated circuits: lifetimes affected by
    thermal cycling (fast change is bad),
    electromigration (turning off helps), and
    dielectric breakdown (turning off helps)
  • Idea: if the number of thermal cycles is limited,
    could the IC failure rate due to EM and DB be cut
    by 30%?

See "A Case For Adaptive Datacenters To Conserve
Energy and Improve Reliability," Peter Bodik,
Michael Armbrust, Kevin Canini, Armando Fox,
Michael Jordan, and David Patterson, 2007.
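
A quick check of the disk arithmetic, under a hypothetical 5-year service life (the lifetime and the baseline failure rate are assumptions, not from the slide):

```python
# Cycling at most once per hour stays under the 50,000
# start/stop budget over an assumed 5-year life, while halving
# powered-on hours halves the POH-driven annual failure rate.

CYCLE_LIMIT  = 50_000
HOURS_PER_YR = 8_760
LIFETIME_YRS = 5                  # assumed service life

max_cycles = LIFETIME_YRS * HOURS_PER_YR      # one cycle/hour, worst case
print(f"cycles at <= 1/hour over {LIFETIME_YRS} yrs: {max_cycles:,} "
      f"({'within' if max_cycles <= CYCLE_LIMIT else 'over'} budget)")

base_afr = 0.04                   # hypothetical 4% annual failure rate
print(f"AFR with disks off 50% of the time: ~{base_afr * 0.5:.1%}")
```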
28
RAD Lab 2.0, 2nd Milestone: Killer Web 2.0 Apps
  • Demonstrate the RAD Lab vision of one person
    creating the next great service, and scale it up
  • Where to get example great apps, given that grad
    students are creating the technology?
  • Use undergraduate computing clubs to create
    exciting apps in RoR using RAD Lab equipment and
    technology
  • Armando Fox is the RoR club leader
  • Recruited a real-world RoR programmer to develop
    code and advise the RoR computing club
  • 30 students joined the club in Jan 2007
  • Hire the best ugrads to build RoR apps in the RAD
    Lab

29
Miracle of University Research
  • Talented (inexperienced) people:
  • Pick from a worldwide talent pool for students and
    faculty
  • "Don't know what they can't do"
  • Inexpensive:
  • Mostly grad student salaries ($50K-75K/yr with
    overhead)
  • Faculty part time ($75K-100K/yr including
    overhead)
  • Berkeley and Stanford swing for the fences (R, not
    r or D)
  • Even if we hit a single, we train the next
    generation of leaders
  • Technology transfer engine:
  • Success: train students to go forth and multiply
  • Publish everything, including source code
  • Ideal launching point for startups

30
Chance to Partner with a Great University
  • Chance to work on the Next Great Thing
  • US News & World Report ranking of CS systems
    programs: #1 Berkeley, #2 CMU, #2 MIT, #4
    Stanford
  • Berkeley and Stanford are among the top suppliers
    of systems students to industry (and academia)
  • A National Academy study mentions Berkeley in 7 of
    19 $1B industries from IT research, Stanford 4
    times:
  • Timesharing (SDS 940), Client-Server Computing
    (BSD Unix), Graphics, Entertainment, Internet,
    LANs, Workstations (SUN), GUI, VLSI Design
    (Spice), RISC (ARM, MIPS, SPARC), Relational DB
    (Ingres/Postgres), Parallel DB, Data Mining,
    Parallel Computing, RAID, Portable Communication
    (BWRC), WWW, Speech Recognition, Broadband last
    mile (DSL)

31
Years to >$1B IT Industry from Research Start
(National Research Council Computer Science and
Telecommunications Board, 2003)
[Chart not reproduced in transcript]
32
Physical RAD Lab: Radical Collocation
  • Innovation comes from spontaneous meetings of
    people with different areas of expertise
  • Communication is inversely proportional to
    distance
  • Almost never if > 100 feet apart or on different
    floors
  • Everyone (including faculty) in open offices
  • Great meeting rooms, ubiquitous whiteboards
  • Technology to concentrate: cell phone, iPod,
    laptop
  • Google "Physical RAD Lab" to learn more

33
Example of the Next Great Thing
  • Berkeley Reliable Adaptive Distributed systems
    Laboratory (RAD Lab)
  • Founded 12/2005 with Google, Microsoft, and Sun as
    founding partners
  • Armando Fox, Randy Katz, Mike Jordan, Anthony
    Joseph, Dave Patterson, Scott Shenker, Ion Stoica
  • Google "RAD Lab" to learn more

34
RAD Lab Goal: Enable the Next eBay
  • Create technology to enable the next great Internet
    service to grow rapidly without growing the
    organization rapidly
  • Machine learning + systems is the secret sauce
  • Position: the datacenter is the computer
  • Leverage point: simplifying datacenter
    management
  • What is the programming language of the
    datacenter?
  • What is CAD for the datacenter?
  • What is the OS for the datacenter?