Benchmarking MapReduce-Style Parallel Computing

Transcript and Presenter's Notes

1
Benchmarking MapReduce-Style Parallel Computing
Randal E. Bryant, Carnegie Mellon University
http://www.cs.cmu.edu/~bryant
2
Programming with MapReduce
  • Background
  • Developed at Google for aggregating web data
  • Dean & Ghemawat, "MapReduce: Simplified Data
    Processing on Large Clusters," OSDI 2004
  • Strengths
  • Easy way to write scalable parallel programs
  • Powerful programming model
  • Beyond web search applications
  • Runtime system automatically handles many of the
    challenges of parallel programming
  • Scheduling, load balancing, fault tolerance

3
Overall Execution Model
  • General Form
  • Input
  • Large set of files
  • Compute
  • Aggregate information
  • Output
  • Files containing aggregations
  • Example: Word Count Index
  • Input
  • 10^10 cached web pages
  • Stored on cluster of 1000 machines, each with own
    local disk
  • Compute
  • Index of words with occurrence counts
  • Output
  • File containing count for each word

4
MapReduce Programming
  • Map
  • Function generating keyword/value pairs from
    input file
  • E.g., word/count for each word in document
  • Reduce
  • Function aggregating values for single keyword
  • E.g., sum word counts
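
A minimal sketch of these two user-supplied functions for the word-count
example, written in Python. The function names and the tiny in-memory driver
are illustrative only, not the API of any particular MapReduce implementation:

```python
from collections import defaultdict

def map_func(filename, contents):
    # Emit a (word, 1) pair for every word in the input document.
    for word in contents.split():
        yield word, 1

def reduce_func(word, counts):
    # Aggregate all values emitted for a single keyword.
    yield word, sum(counts)

def run_mapreduce(inputs):
    # Toy single-process driver: the real runtime would run the map
    # tasks in parallel, shuffle by key, then run the reduce tasks.
    groups = defaultdict(list)
    for filename, contents in inputs.items():
        for key, value in map_func(filename, contents):
            groups[key].append(value)
    return dict(kv for key, values in groups.items()
                for kv in reduce_func(key, values))

print(run_mapreduce({"doc1": "the cat sat", "doc2": "the dog sat"}))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

In a system such as Hadoop or Google's MapReduce, the programmer supplies only
the two functions; parallel execution, shuffling, and scheduling are handled by
the runtime.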

5
MapReduce Implementation
  • (Somewhat naïve implementation)
  • Map
  • Spawn mapping task for each input file
  • Execute on processor local to file
  • Generate file for each keyword/value
  • Shuffle
  • Redistribute files by hashing keywords: K → P_h(K)
  • Reduce
  • Spawn reduce task for each keyword
  • On processor to which keyword hashes: P_h(K)
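
A rough sketch of the shuffle step, assuming a fixed number of reduce
partitions and an MD5 digest as a stand-in for the hash function h; the names
and partition count are illustrative:

```python
import hashlib

NUM_REDUCERS = 4  # illustrative partition count

def p_h(keyword):
    # P_h(K): map a keyword to the index of the reducer it hashes to.
    digest = hashlib.md5(keyword.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_REDUCERS

def shuffle(pairs):
    # Redistribute intermediate (keyword, value) pairs so that every
    # value for a given keyword lands on the same reduce task.
    buckets = [[] for _ in range(NUM_REDUCERS)]
    for keyword, value in pairs:
        buckets[p_h(keyword)].append((keyword, value))
    return buckets

print(shuffle([("the", 1), ("cat", 1), ("the", 1)]))
```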

6
Appealing Features
  • Ease of Programming
  • Programmer provides only two functions
  • Express in terms of computation over data, not
    detailed execution on system
  • Robustness
  • Tolerant to failures of disks, processors,
    network
  • Source files stored redundantly
  • Runtime monitor detects and re-executes failed
    tasks
  • Dynamic scheduling automatically adapts to
    resource limitations

7
Tolerating Failures
  • Dean & Ghemawat, OSDI 2004
  • Sorting 10^10 100-byte records (about 1 TB) with
    1800 processors
  • Proactively restart delayed computations to
    achieve better performance and fault tolerance
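
A toy illustration of the "proactively restart" idea (often called speculative
execution): if the primary copy of a task looks delayed, launch a backup copy
and accept whichever copy finishes first. The task body, timeout, and
thread-pool driver below are assumptions for illustration, not how the Google
runtime is implemented:

```python
import concurrent.futures
import random
import time

def task(copy_id):
    # Stand-in for a map or reduce task; some copies land on a slow
    # ("straggler") machine and take much longer.
    time.sleep(random.choice([0.1, 5.0]))
    return copy_id

def run_with_backup(timeout=1.0):
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    primary = pool.submit(task, "primary")
    done, _ = concurrent.futures.wait([primary], timeout=timeout)
    if not done:
        # Primary looks like a straggler: speculatively launch a backup
        # copy and take whichever copy completes first.
        backup = pool.submit(task, "backup")
        done, _ = concurrent.futures.wait(
            [primary, backup],
            return_when=concurrent.futures.FIRST_COMPLETED)
    winner = next(iter(done)).result()
    pool.shutdown(wait=False)  # don't wait for the losing copy
    return winner

print("completed first:", run_with_backup())
```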

8
Our Data-Driven World
  • Science
  • Databases from astronomy, genomics, natural
    languages, seismic modeling, …
  • Humanities
  • Scanned books, historic documents, …
  • Commerce
  • Corporate sales, stock market transactions,
    census, airline traffic, …
  • Entertainment
  • Internet images, Hollywood movies, MP3 files, …
  • Medicine
  • MRI & CT scans, patient records, …

9
Big Data Computing Beyond Web Search
  • Application Domains
  • Rely on large, ever-changing data sets
  • Collecting & maintaining data is a major effort
  • Computational Requirements
  • Extract information from large volumes of raw
    data
  • Hypothesis
  • Can apply MapReduce-style computation to many
    other application domains
  • Give it a Try!
  • Hadoop: Open-source implementation of parallel
    file system & MapReduce

10
Q1: Workload Characteristics
  • Hardware
  • 1000s of nodes
  • Each with processor(s), disk(s), network
    interface
  • High-speed, local network using commodity
    technology
  • E.g., gigabit Ethernet with switches
  • Data Organization
  • Distributed file system providing uniform name
    space and redundant storage
  • Computation
  • Each task executed as separate process with file
    I/O
  • Rely on file system for data transfer

11
Q2: Hardware/Software Challenges
  • Performance Issues
  • Disk bandwidth limitations
  • ≈ 3.6 hours to read data from a 1 TB disk (see the
    sketch after this list)
  • Data transfer across network
  • Process file I/O overhead
  • Runtime Issues
  • Detecting and mitigating effects of failed
    components
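
A back-of-the-envelope check of the 3.6-hour figure above, assuming a
sustained sequential read rate of roughly 80 MB/s for a commodity disk of that
era (the rate is an assumption, not stated on the slide):

```python
disk_capacity_bytes = 1e12      # 1 TB disk
read_rate_bytes_per_s = 80e6    # assumed ~80 MB/s sustained sequential read

hours = disk_capacity_bytes / read_rate_bytes_per_s / 3600
print(f"{hours:.1f} hours to read the full disk")  # ~3.5 hours, close to the slide's 3.6
```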

12
Q3: Benchmarking Challenges
  • Generalizing Results
  • Beyond a specific data set & cluster configuration
  • Performance depends on many different factors
  • Can we predict how program will scale?
  • Identifying Bottlenecks
  • Many interacting parts to system
  • Evaluating Robustness
  • Creating realistic failure modes

13
Q4: University Contributions
  • Currently, industry is ahead of universities
  • Dealing with massive data sets
  • Computing at very large scale
  • Developing new programming/runtime approaches
  • Google, Yahoo!, Microsoft
  • University Role
  • More open and systematic inquiry
  • Apply to noncommercial problems
  • Extend and improve programming model and
    notations
  • Expose students to emerging styles of computing

14
Background Information
  • "Data-Intensive Supercomputing: The Case for
    DISC"
  • Tech Report CMU-CS-07-128
  • Available from http://www.cs.cmu.edu/~bryant