Title: S06: Open-Source Stack for Cloud Computing
1 S06 Open-Source Stack for Cloud Computing
Milind Bhandarkar Yahoo!
Michael Kozuch Intel
Michael Ryan Intel
Richard Gass Intel
2 Agenda
- Sessions
  - (A) Introduction 8.30-9.00
  - (B) Hadoop 9.00-10.00
  - Break 10.00-10.30
  - Hadoop/Pig 10.30-12.00
  - Lunch 12.00-1.30
  - (C) Pig 1.30-2.00
  - (D) Tashi 2.00-3.00
  - Break 3.00-3.30
  - PRS 3.30-4.45
  - Wrap-up 4.45-5.00
- Speaker intros
- Motivation
- Open Cirrus
- Open Cirrus software stack
- Getting involved
Zoni
3 Session A: Introduction
4 Michael Kozuch (Intro)
- Michael Kozuch is a Principal Engineer with Intel Labs Pittsburgh and manager of the ILP Systems Research and Engineering group
- Manages the Intel Open Cirrus cluster and is the PI for the Tashi research project
- Michael is a 12-year veteran of Intel and contributed to the development of Intel's VT and TXT technologies
- He has published 25 scientific papers and is named on 20 patents
5 Milind Bhandarkar (Hadoop)
- Has led the Yahoo! Grid Solutions Team since June 2005
- Contributor to Hadoop since January 2006
- Trained 1000 Hadoop users at Yahoo! and elsewhere
- 20 years of experience in parallel programming
6 Michael Ryan (Tashi)
- Michael is a research engineer with Intel Labs Pittsburgh
- Lead developer for Tashi
- Serves as sysadmin for the Intel Open Cirrus site
- Coordinates the Global Monitoring service for Open Cirrus
7 Richard Gass (Zoni)
- Richard is currently a research engineer with Intel Labs Pittsburgh
- Lead developer for Zoni
- Serves as sysadmin for the Intel Open Cirrus site
- Richard has published 9 scientific papers and is also an (imminent) PhD candidate with University Pierre and Marie Curie (LIP6) in Paris
8 Motivation
9 Why Open and Cloud Make Sense
- Cloud Computing is a new, critical technology
  - Efficiency: Admin costs are aggregated
  - Scalability: From 1 to 1000 servers in 10 sec. flat
  - Empowerment: Anyone can buy a cluster
- Open Communities enable rapid innovation
  - Exchange of ideas: Knowledge grows
  - Constructive Darwinism: Best tools survive/evolve
  - Empowerment: Anyone can build a LAMP stack
Rapidly developing and deploying innovative computing technologies
10 Research Interest: Big Data
- Interesting applications are data hungry
- The data grows over time
- The data is immobile
  - 100 TB @ 1 Gbps ≈ 10 days
- Compute comes to the data
- Big Data clusters are the new libraries
(Data-Rich Computing theme proposal, J. Campbell et al., 2007)
The value of a cluster is its data
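The "100 TB @ 1 Gbps ≈ 10 days" estimate is simple arithmetic and can be checked directly. The following is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bit/s) and a link sustained at full line rate with no protocol overhead. The function name is illustrative.

```python
# Back-of-the-envelope check of "100 TB @ 1 Gbps ~ 10 days".
# Assumptions: decimal units, full line rate, no protocol overhead.

def transfer_days(terabytes: float, gigabits_per_sec: float) -> float:
    """Return the days needed to move `terabytes` over a `gigabits_per_sec` link."""
    bits = terabytes * 10**12 * 8                  # bytes -> bits
    seconds = bits / (gigabits_per_sec * 10**9)    # bits / (bits per second)
    return seconds / 86_400                        # seconds -> days


if __name__ == "__main__":
    print(f"{transfer_days(100, 1):.2f} days")
```

The result is about 9.26 days. Even a 10 Gbps link needs roughly 22 hours for the same transfer, which is why it is usually cheaper to move the computation to the data.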
11 Open Cirrus
12 Open Cirrus Cloud Computing Testbed
Collaboration between industry and academia, sharing
- hardware infrastructure
- software infrastructure
- research
- applications and data sets
Participating sites include KIT, ISPRAS, UIUC, ETRI, IDA, and MIMOS.
13 Open Cirrus
- Objectives
  - Foster systems research around cloud computing
  - Vendor-neutral open-source stacks and APIs for the cloud
  - Expose the research community to enterprise-level requirements
  - Provide realistic traces of cloud workloads
- How are we unique?
  - Support for systems research and applications research
  - Federation of heterogeneous datacenters
  - Collection of interesting data sets
Independently managed sites providing a cooperative research testbed
14 User Access to Open Cirrus
- User access is organized around Research Projects
  - Led by a Principal Investigator (PI)
- Project PIs apply to each site separately
  - Identifying additional team members
- Contact information for applications to each site is available on the Open Cirrus Web site (http://opencirrus.org)
- Each Open Cirrus site decides which users and projects get access to its site
15 Open Cirrus Research Projects
Example research areas of interest
- Datacenter federation
- Datacenter management
- Web services
- Data-intensive systems
Projects typically not of interest
- Traditional HPC app development
- Production apps looking for free cycles
- Closed-source system development
16 Software Stack
17 Open Cirrus Software Components
Single Sign-On
Global Monitoring
Global User Directories
Global Services
Data Location
Resource Telemetry
Billing/Accounting
Site Services
Compute Node Services
18 Physical Machine Allocation: Zoni
- Zoni dynamically divides compute nodes into isolated subdomains
- Provides each project with a mini-datacenter
- Isolates experiments from one another
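The partitioning idea can be illustrated with a small model: a shared pool of nodes is split into per-project domains, each node belongs to at most one domain, and nodes return to the pool when the domain is released. This is a hypothetical sketch and does not reflect Zoni's actual API. The class and method names are illustrative. The sketch also assumes that isolation is represented by giving each domain its own VLAN ID.

```python
# Hypothetical sketch of Zoni-style physical partitioning. Names are
# illustrative, not Zoni's API; per-domain VLAN IDs are an assumption.
from dataclasses import dataclass, field


@dataclass
class Domain:
    project: str
    vlan_id: int                        # assumed isolation mechanism
    nodes: set[str] = field(default_factory=set)


class NodePartitioner:
    def __init__(self, nodes: list[str], first_vlan: int = 100) -> None:
        self.free: set[str] = set(nodes)
        self.domains: dict[str, Domain] = {}
        self._next_vlan = first_vlan

    def allocate(self, project: str, count: int) -> Domain:
        """Give `project` a mini-datacenter of `count` dedicated nodes."""
        if project in self.domains:
            raise ValueError(f"{project} already has a domain")
        if count > len(self.free):
            raise RuntimeError(f"only {len(self.free)} free nodes, {count} requested")
        chosen = set(sorted(self.free)[:count])   # deterministic pick for the sketch
        self.free -= chosen                       # a node is in at most one domain
        domain = Domain(project, self._next_vlan, chosen)
        self._next_vlan += 1                      # each domain gets a distinct VLAN
        self.domains[project] = domain
        return domain

    def release(self, project: str) -> None:
        """Tear down a project's domain and return its nodes to the free pool."""
        domain = self.domains.pop(project)
        self.free |= domain.nodes
```

For example, with eight nodes in the pool, allocating 3 nodes to one project and 4 to another leaves a single free node. The two domains never share a node or a VLAN ID.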
19 Cluster Storage: HDFS
- Storage system that aggregates standard (commodity) storage devices
- High-performance, parallel access
- High data reliability through replication
- Exposing location information enables intelligent placement of computation
Storage Service
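Replication and exposed block locations work together, and a simplified model shows how. The sketch below places three replicas, loosely following the shape of HDFS's default policy. The first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. A scheduler then runs the task on an idle node that already holds a replica. This is a toy model, not HDFS code. It assumes the writer is a datanode and that the remote rack has at least two nodes. Real HDFS also considers load, free space, and other factors. The cluster layout and function names are hypothetical.

```python
# Toy model of HDFS-style replica placement plus data-local scheduling.
# Not HDFS code; function names and the rack layout are hypothetical.
import random
from typing import Optional


def place_replicas(writer: str, racks: dict[str, list[str]],
                   rng: random.Random) -> list[str]:
    """Choose 3 nodes for one block: the writer's node, a node on another
    rack, and a second node on that same remote rack."""
    rack_of = {node: rack for rack, nodes in racks.items() for node in nodes}
    remote_rack = rng.choice([r for r in racks if r != rack_of[writer]])
    second, third = rng.sample(racks[remote_rack], 2)   # two distinct nodes
    return [writer, second, third]


def schedule_task(block_locations: list[str], idle_nodes: set[str]) -> Optional[str]:
    """Run the task on an idle node that already stores the block, so the
    computation moves to the data instead of the data crossing the network."""
    for node in block_locations:
        if node in idle_nodes:
            return node
    return None  # a real scheduler would fall back to rack-local, then remote


if __name__ == "__main__":
    racks = {"rackA": ["a1", "a2", "a3"], "rackB": ["b1", "b2", "b3"]}
    locs = place_replicas("a1", racks, random.Random(42))
    print(locs, schedule_task(locs, {"a2", locs[2]}))
```

Placing two replicas on one remote rack limits cross-rack write traffic while still surviving the loss of a whole rack.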
20 Virtual Machine Allocation: Tashi
- An open-source Apache Software Foundation incubator project
- Infrastructure for cloud computing on Big Data
- http://incubator.apache.org/projects/tashi
- Support for AWS interface
- OS, FS, and VMM agnostic
- Research focus
  - Location-aware co-scheduling of compute, storage, and power
  - Seamless physical/virtual migration
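Location-aware co-scheduling of compute, storage, and power can be illustrated with a simple scoring rule. The sketch below prefers hosts that store the VM's input blocks locally, then hosts on a rack that holds the data, and then hosts that are already powered on, so that sleeping machines are not woken unnecessarily. This is a hypothetical illustration of the research idea and is not Tashi's scheduler. The `Host` fields, the scoring order, and the function names are all assumptions.

```python
# Hypothetical location- and power-aware VM placement, illustrating the idea
# of co-scheduling compute, storage, and power. Not Tashi's actual scheduler.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    rack: str
    free_cores: int
    powered_on: bool
    local_blocks: frozenset[str]  # data blocks stored on this host's local disks


def pick_host(hosts: list[Host], cores: int, data_blocks: set[str],
              data_racks: set[str]) -> Optional[Host]:
    """Pick a host for a VM that needs `cores` cores and reads `data_blocks`
    stored on `data_racks`. Returns None if no host has enough free cores."""

    def score(h: Host) -> tuple[int, int, int]:
        local = len(data_blocks & h.local_blocks)  # blocks readable from local disk
        rack_local = int(h.rack in data_racks)     # data reachable within the rack
        awake = int(h.powered_on)                  # avoid waking a sleeping host
        return (local, rack_local, awake)          # tuples compare left to right

    candidates = [h for h in hosts if h.free_cores >= cores]
    return max(candidates, key=score, default=None)
```

Because the score is a tuple, data locality always dominates rack locality, and rack locality dominates power state. A weighted sum would let these factors trade off against each other instead.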
21 Application Service: Hadoop
- An open-source Apache Software Foundation project sponsored by Yahoo!
- http://hadoop.apache.org
- Provides a scalable, parallel programming model (MapReduce) and the associated runtime
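The MapReduce programming model has three phases: map, shuffle, and reduce. The sketch below shows them in a single process using the classic word-count example. It does not use Hadoop's API, and the function names are illustrative. Hadoop runs the same phases in parallel across a cluster and handles partitioning, sorting, and fault tolerance itself.

```python
# Single-process illustration of MapReduce word count. Not Hadoop's API;
# Hadoop runs these phases in parallel across many nodes.
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))  # each reducer sees its keys in sorted order


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum all counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
```

The mapper and reducer are pure functions of their inputs. This is what allows the runtime to spread them across many machines and to re-run failed tasks safely.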
22 Getting Involved
23 Summary
- Open communities can shape the development of Cloud Computing
- Open Cirrus is a multi-partner testbed for research in Cloud Computing
- The Open Cirrus software stack provides a good starting point for open-source cloud computing software development
24 Getting Involved
http://opencirrus.org
- Contact Open Cirrus with research proposals
- Contribute to the Open Cirrus software stack
- Zoni, Tashi, Hadoop
- Apache Software Foundation
25 The Rest of the Day
26 Ground Rules
- Questions?
  - Please ask; we'd love an interactive day
  - But if the answer is not of general interest, we may defer it until the break
- Need to step out?
  - That's OK, but please take your belongings
    - Including your lunch
- Please be considerate
  - And keep conversations focused on the topic