Title: Cloud Computing - I
1. Cloud Computing - I
- Presenters: Abhishek Verma, Nicolas Zea
2. Cloud Computing
- MapReduce
  - Clean abstraction
  - Extremely rigid two-stage group-by-aggregate data flow
  - Code reuse and maintenance are difficult
- Google → MapReduce, Sawzall
- Yahoo → Hadoop, Pig Latin
- Microsoft → Dryad, DryadLINQ
- Improving MapReduce in heterogeneous environments
3. MapReduce: A group-by-aggregate
[Diagram: input records are split across map tasks; each map's output is locally sorted (quicksort), shuffled by key to the reduce tasks, and aggregated by the reduce tasks to produce the output records.]
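The two-stage group-by-aggregate pattern above can be sketched in a few lines. Below is a minimal, single-process Python sketch of the map → shuffle → reduce flow (an illustration only, not the Hadoop API; the record format and function names are assumptions for this example):

from collections import defaultdict

# map: emit (key, value) pairs from each input record
def map_fn(record):
    url, user = record
    yield (url, 1)

# reduce: aggregate all values that share a key
def reduce_fn(key, values):
    return (key, sum(values))

def mapreduce(records):
    groups = defaultdict(list)                 # "shuffle": group intermediate pairs by key
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return [reduce_fn(k, vs) for k, vs in groups.items()]   # reduce phase

visits = [("cnn.com", "Amy"), ("bbc.com", "Amy"), ("cnn.com", "Fred")]
print(mapreduce(visits))                       # [('cnn.com', 2), ('bbc.com', 1)]

Everything other than map_fn and reduce_fn is boilerplate the framework provides, which is exactly the rigidity the next slide criticizes.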
4. Shortcomings
- Extremely rigid data flow
  - Other flows (stages, joins, splits) must be hacked in
- Common operations must be coded by hand
  - Join, filter, projection, aggregation, sorting, distinct
- Semantics hidden inside the map and reduce functions
  - Difficult to maintain, extend, and optimize
5. Pig Latin: A Not-So-Foreign Language for Data Processing
- Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
- Yahoo! Research
6. Pig Philosophy
- Pigs Eat Anything
  - Can operate on data with or without metadata: relational, nested, or unstructured
- Pigs Live Anywhere
  - Not tied to one particular parallel framework
- Pigs Are Domestic Animals
  - Designed to be easily controlled and modified by its users
  - UDFs: transformation functions, aggregates, grouping functions, and conditionals
- Pigs Fly
  - Processes data quickly (?)
7. Features
- Dataflow language
- Procedural (different from SQL)
- Quick Start and Interoperability
- Nested Data Model
- UDFs as First-Class Citizens
- Parallelism Required
- Debugging Environment
8. Pig Latin
- Data Model
  - Atom: 'cs'
  - Tuple: ('cs', 'ece', 'ee')
  - Bag: {('cs', 'ece'), ('cs')}
  - Map: 'courses' → ('523', '525', '599')
- Expressions
  - Fields by position: $0
  - Fields by name: f1
  - Map lookup: #
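For concreteness, the nested data model can be mirrored with ordinary Python values (an illustrative analogy, not Pig syntax): an atom is a scalar, a tuple is an ordered sequence of fields, a bag is a collection of tuples, and a map associates keys with (possibly nested) values.

# Pig Latin data model mirrored in Python (illustration only)
atom = 'cs'                               # Atom: a single scalar value
tup  = ('cs', 'ece', 'ee')                # Tuple: an ordered sequence of fields
bag  = [('cs', 'ece'), ('cs',)]           # Bag: a collection of tuples (duplicates allowed)
mp   = {'courses': ('523', '525', '599')} # Map: keys mapped to nested values

# Expressions over this data:
print(tup[0])          # field by position ($0 in Pig Latin) -> 'cs'
print(mp['courses'])   # map lookup (# in Pig Latin) -> ('523', '525', '599')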
9. Example Data Analysis Task
- Find the top 10 most-visited pages in each category

Visits:
| User | URL        | Time |
|------|------------|------|
| Amy  | cnn.com    | 800  |
| Amy  | bbc.com    | 1000 |
| Amy  | flickr.com | 1005 |
| Fred | cnn.com    | 1200 |

URL Info: (url, category, pRank)
10. Data Flow
Load Visits
Group by url
Foreach url generate count
Load Url Info
Join on url
Group by category
Foreach category generate top10 urls
11. In Pig Latin
visits      = load '/data/visits' as (user, url, time);
gVisits     = group visits by url;
visitCounts = foreach gVisits generate url, count(visits);
urlInfo     = load '/data/urlInfo' as (url, category, pRank);
visitCounts = join visitCounts by url, urlInfo by url;
gCategories = group visitCounts by category;
topUrls     = foreach gCategories generate top(visitCounts, 10);
store topUrls into '/data/topUrls';
12. Quick Start and Interoperability
(Same script as in Slide 11.)
Operates directly over files
13. Optional Schemas
(Same script as in Slide 11.)
Schemas are optional and can be assigned dynamically
14. UDFs as First-Class Citizens
(Same script as in Slide 11.)
UDFs can be used in every construct
15. Operators
- LOAD: specifying input data
- FOREACH: per-tuple processing
- FLATTEN: eliminates nesting
- FILTER: discarding unwanted data
- COGROUP: getting related data together
- GROUP, JOIN
- STORE: asking for output
- Others: UNION, CROSS, ORDER, DISTINCT
16. COGROUP vs. JOIN
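(The original slide shows this as a figure.) COGROUP collects the tuples from each input that share a key into nested bags, one output tuple per key; JOIN additionally flattens those bags into a per-key cross-product. A rough Python sketch of the difference, with made-up example data (not Pig syntax):

from collections import defaultdict
from itertools import product

results = [('lakers', 'nba.com'), ('lakers', 'espn.com'), ('kings', 'nhl.com')]
revenue = [('lakers', 50), ('kings', 30), ('kings', 10)]

def cogroup(left, right):
    # one output tuple per key, holding a bag of matching tuples from each input
    groups = defaultdict(lambda: ([], []))
    for t in left:
        groups[t[0]][0].append(t)
    for t in right:
        groups[t[0]][1].append(t)
    return [(k, bags[0], bags[1]) for k, bags in groups.items()]

def join(left, right):
    # JOIN = COGROUP followed by flattening the per-key bags into a cross-product
    return [l + r for _, lbag, rbag in cogroup(left, right)
                  for l, r in product(lbag, rbag)]

print(cogroup(results, revenue))   # nested bags, grouped by key
print(join(results, revenue))      # flat tuples, one per matching pair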
17. Compilation into MapReduce
- Every group or join operation forms a map-reduce boundary
- Other operations are pipelined into the map and reduce phases
[Diagram: the data flow of Slide 10 split at those boundaries: Load Visits, Group by url (Map1/Reduce1) → Foreach url generate count; Load Url Info; Join on url (Map2/Reduce2) → Group by category (Map3/Reduce3) → Foreach category generate top10 urls]
18. Debugging Environment
- Write-run-debug cycle
- Sandbox dataset
- Objectives
- Realism
- Conciseness
- Completeness
- Problems
- UDFs
19. Future Work
- Optional "safe" query optimizer
  - Performs only high-confidence rewrites
- User interface
  - Boxes-and-arrows UI
  - Promote collaboration, sharing of code fragments and UDFs
- Tight integration with a scripting language
  - Use loops and conditionals of the host language
20. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
- Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey
21. Dryad System Architecture
[Diagram: the job manager on the control plane holds the job schedule and consults a name server (NS); per-node daemons (PD) in the cluster run the vertices (V); vertices exchange data over the data plane via files, TCP, or FIFOs across the network.]
22. LINQ
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
23. DryadLINQ Constructs
- Partitioning: Hash, Range, RoundRobin
- Apply, Fork
- Hints
24. Dryad + LINQ = DryadLINQ
(Same LINQ fragment as Slide 22.)
[Diagram: the query over the data collection is compiled into vertex code and a query plan (a Dryad job); the collection is partitioned across vertices, which compute the results.]
25. DryadLINQ Execution Overview
[Diagram: a C# program on the client machine builds a query expression over input tables; calling ToDryadTable invokes DryadLINQ, which compiles a distributed query plan and submits a Dryad job to the job manager (JM) in the data center; Dryad executes the plan, writes the output tables, and returns the results to the client as an output DryadTable of C# objects that the program iterates with foreach (step 11 in the paper's figure).]
26. System Implementation
- LINQ expressions are converted to an execution plan graph (EPG)
  - Similar to a database query plan
  - A DAG, annotated with metadata properties
- The EPG is the skeleton of the Dryad dataflow graph
  - As long as native operations are used, properties can propagate, helping optimization
27. Static Optimizations
- Pipelining
  - Multiple operations in a single process
- Removing redundancy
- Eager aggregation (see the sketch after this list)
  - Move aggregations in front of partitionings
- I/O reduction
  - Try to use TCP and in-memory FIFOs instead of disk
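The eager-aggregation rewrite is essentially the combiner idea: apply a partial aggregate to each node's local data before repartitioning, so far less data crosses the network. A rough Python sketch under that interpretation (the function names and word-count workload are assumptions for illustration, not the DryadLINQ API):

from collections import Counter

def partial_aggregate(local_records):
    # runs on each node before repartitioning: collapse local duplicates
    return Counter(local_records)

def final_aggregate(partials):
    # runs after the (much smaller) partial counts are shuffled
    total = Counter()
    for p in partials:
        total.update(p)
    return total

node_inputs = [["a", "b", "a"], ["b", "b", "c"]]
partials = [partial_aggregate(recs) for recs in node_inputs]   # per-node pre-aggregation
print(final_aggregate(partials))       # Counter({'b': 3, 'a': 2, 'c': 1})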
28. Dynamic Optimizations
- As information from the running job becomes available, mutate the execution graph
- Decisions based on dataset sizes
- Intelligent partitioning of data
29. Dynamic Optimizations
- Aggregation can turn into a tree to improve I/O, based on locality
- Example: part of the computation is done locally, then aggregated before being sent across the network
30. Evaluation
- 240-computer cluster of 2.6 GHz dual-core AMD Opterons
- Sort 10 billion 100-byte records on a 10-byte key
- Each computer stores 3.87 GB
31. Evaluation
- DryadLINQ vs. Dryad on the SkyServer query
  - The Dryad version is hand-optimized
  - No dynamic optimization overhead
  - DryadLINQ is within about 10% of the native code
32. Main Benefits
- High level and data type transparent
- Automatic optimization friendly
- Manual optimizations using Apply operator
- Leverage any system running LINQ framework
- Support for interacting with SQL databases
- Single computer debugging made easy
- Strong typing, narrow interface
- Deterministic replay execution
33. Discussion
- Dynamic optimizations appear data-intensive
  - What kind of overhead?
  - EPG analysis overhead → high latency?
- No real comparison with other systems
- Progress tracking is difficult
  - No speculation
- Will solid-state drives diminish the advantages of MapReduce?
- Why not use parallel databases?
- MapReduce vs. Dryad
- How different from Sawzall and Pig?
34. Comparison

| Language           | Sawzall                 | Pig Latin                   | DryadLINQ                     |
|--------------------|-------------------------|-----------------------------|-------------------------------|
| Built by           | Google                  | Yahoo                       | Microsoft                     |
| Programming        | Imperative              | Imperative                  | Imperative/declarative hybrid |
| Resemblance to SQL | Least                   | Moderate                    | Most                          |
| Execution engine   | Google MapReduce        | Hadoop                      | Dryad                         |
| Performance        | Very efficient          | 5-10 times slower           | 1.3-2 times slower            |
| Implementation     | Internal, inside Google | Open source, Apache license | Internal, inside Microsoft    |
| Model              | Operate per record      | Sequence of MapReduce jobs  | DAGs                          |
| Usage              | Log analysis            | Machine learning            | Iterative computations        |
35. Improving MapReduce Performance in Heterogeneous Environments
- Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica
- University of California at Berkeley
36. Hadoop Speculative Execution Overview
- Speculative tasks are executed only if there are no failed or waiting tasks available
- Notion of progress
  - Three phases of execution (for a reduce task): copy, sort, reduce
  - Each phase is weighted by the data processed (see the sketch after this list)
  - Determines whether a task has failed or is a straggler and available for speculation
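Per the paper's description of Hadoop's default scheduler, a reduce task's progress score averages the three phases (each weighted 1/3), and a task becomes a speculation candidate when its score falls a fixed margin (0.2) below the average for its category. A minimal Python sketch of that scoring (the phase fractions used here are made-up example values):

def reduce_progress(copy_frac, sort_frac, reduce_frac):
    # each of the three phases contributes 1/3 of the overall progress score
    return (copy_frac + sort_frac + reduce_frac) / 3.0

def is_speculation_candidate(score, category_avg, margin=0.2):
    # Hadoop's heuristic: a task is a straggler if it falls a fixed
    # margin below the average progress of its category
    return score < category_avg - margin

scores = [reduce_progress(1.0, 1.0, 0.8),   # almost done
          reduce_progress(1.0, 0.4, 0.0)]   # stuck in the sort phase
avg = sum(scores) / len(scores)
print([is_speculation_candidate(s, avg) for s in scores])   # [False, True]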
37. Hadoop's Assumptions
- Nodes can perform work at exactly the same rate
- Tasks progress at a constant rate throughout time
- There is no cost to launching a speculative task on an idle node
- The three phases of execution take approximately the same time
- Tasks with a low progress score are stragglers
- Maps and reduces require roughly the same amount of work
38. Breaking Down the Assumptions
- Virtualization breaks down homogeneity
  - Amazon EC2: multiple VMs on the same physical host
  - VMs compete for memory and network bandwidth
  - Example: two map tasks can compete for disk bandwidth, causing one to become a straggler
39. Breaking Down the Assumptions
- The progress threshold in Hadoop is fixed and assumes that low progress means a faulty node
- Too many speculative tasks get executed
- Speculative execution can harm running tasks
40. Breaking Down the Assumptions
- Task phases are not equal
  - The copy phase is typically the most expensive due to network communication cost
  - Causes a rapid jump from 1/3 progress to 1 for many tasks, creating fake stragglers
  - Real stragglers get usurped
  - Unnecessary copying due to fake stragglers
- Because a task is speculated only when its score falls 0.2 below the category average (which is at most 1), anything with more than 80% progress is never speculatively executed
41. LATE Scheduler
- Longest Approximate Time to End
- Primary assumption: the best task to speculatively execute is the one that will finish furthest into the future
- Secondary assumption: tasks make progress at an approximately constant rate
- ProgressRate = ProgressScore / T, where T is the time the task has been running
- Estimated time to completion = (1 - ProgressScore) / ProgressRate
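LATE's core heuristic can be sketched directly from these two formulas: estimate each running task's progress rate, estimate its remaining time, and speculate the task with the longest estimated time left (subject to the caps and thresholds on the next slide). A minimal Python sketch with made-up task names and values:

def time_left(progress_score, elapsed):
    # progress rate = ProgressScore / T; time left = (1 - ProgressScore) / rate
    rate = progress_score / elapsed
    return (1.0 - progress_score) / rate

# (progress score, seconds running) for three hypothetical running tasks
tasks = [("t1", 0.9, 60), ("t2", 0.3, 60), ("t3", 0.5, 30)]
estimates = {name: time_left(p, t) for name, p, t in tasks}
print(estimates)
# t2 has the largest estimate, so LATE would speculate it first (on a fast node)
print(max(estimates, key=estimates.get))   # 't2'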
42. LATE Scheduler
- Launch speculative tasks on fast nodes
  - Best chance to overcome a straggler, versus using the first available node
- Cap on the total number of speculative tasks
- Minimum slowness threshold before a task can be speculated
- Does not take data locality into account
43. Performance Comparison Without Stragglers
- EC2 test cluster
  - 1.0-1.2 GHz Opteron/Xeon with 1.7 GB memory
[Figure: Sort benchmark results]
44. Performance Comparison With Stragglers
- Manually slowed down 8 VMs with background processes
[Figure: Sort benchmark results]
45. Performance Comparison With Stragglers
[Figures: WordCount and Grep benchmark results]
46. Sensitivity
47. Sensitivity
48. Takeaways
- Make decisions early
- Use finishing times
- Nodes are not equal
- Resources are precious
49. Further Questions
- Is focusing the work on small VMs fair?
  - Would it be better to pay for a large VM and implement a system with more customized control?
- Could this be used in other systems?
  - Progress tracking is key
- Is this a fundamental contribution, or just an optimization?
- Good research?