MapReduce for the Cell B. E. Architecture - PowerPoint PPT Presentation

About This Presentation

Title:

MapReduce for the Cell B. E. Architecture

Slides: 28

Provided by: marcde1

Learn more at: https://pages.cs.wisc.edu

Transcript and Presenter's Notes

Title: MapReduce for the Cell B. E. Architecture

1
MapReduce for the Cell B. E. Architecture
Marc de Kruijf, University of Wisconsin-Madison
Advised by Professor Sankaralingam
2
MapReduce

A model for parallel programming
Proposed by Google
Large-scale distributed systems: 1,000-node clusters
Applications
Distributed sort
Distributed grep
Indexing
Simple, high-level interface
Runtime handles: parallelization, scheduling, synchronization, and communication

3
Cell B. E. Architecture

A heterogeneous computing platform
1 PPE, 8 SPEs
Programming is hard
Multi-threading is explicit
SPE local memories are software-managed
The Cell is like a cluster-on-a-chip

4
Motivation

MapReduce
Scalable parallel model
Simple interface

Cell B. E.
Complex parallel architecture
Hard to program

MapReduce for the Cell B.E. Architecture
5
Overview

Motivation
MapReduce
Cell B.E. Architecture
MapReduce Example
Design
Evaluation
Workload Characterization
Application Performance
Conclusions and Future Work

6
MapReduce Example

Counting word occurrences in a set of documents
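
As a rough illustration of this example (not code taken from the slides), word counting can be expressed as a Map function that emits a (word, 1) pair for every word and a Reduce function that sums the counts collected for one word. The names `Pair`, `wc_map`, `wc_reduce`, and `MAX_WORD` are hypothetical, and the sketch is sequential; grouping the pairs by word between the two calls is left to the runtime.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_WORD 32  /* hypothetical maximum word length, including the terminator */

/* Hypothetical intermediate record: one (word, count) pair. */
typedef struct {
    char word[MAX_WORD];
    int  count;
} Pair;

/* Map: emit (word, 1) for every alphabetic word in `doc`.
 * Writes at most `cap` pairs to `out` and returns how many were written. */
size_t wc_map(const char *doc, Pair *out, size_t cap)
{
    size_t n = 0;
    assert(doc != NULL && out != NULL);
    while (*doc != '\0' && n < cap) {
        /* Skip separators between words. */
        while (*doc != '\0' && !isalpha((unsigned char)*doc))
            doc++;
        if (*doc == '\0')
            break;
        /* Copy one lower-cased word, truncating it to MAX_WORD - 1 characters. */
        size_t len = 0;
        while (isalpha((unsigned char)*doc)) {
            if (len < MAX_WORD - 1)
                out[n].word[len++] = (char)tolower((unsigned char)*doc);
            doc++;
        }
        out[n].word[len] = '\0';
        out[n].count = 1;
        n++;
    }
    return n;
}

/* Reduce: collapse the list of counts gathered for one word into a total. */
int wc_reduce(const char *word, const int *counts, size_t n)
{
    int total = 0;
    (void)word;  /* the key is not needed to compute the sum */
    for (size_t i = 0; i < n; i++)
        total += counts[i];
    return total;
}
```

Calling `wc_map` on each document and then `wc_reduce` once per distinct word, with that word's list of counts, yields the occurrence totals.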

7
Overview

Motivation
MapReduce
Cell B.E. Architecture
MapReduce Example
Design
Evaluation
Workload Characterization
Application Performance
Conclusions and Future Work

8
Design
Flow of Execution
Five stages: Map, Partition, Quick-sort, Merge-sort, Reduce
9
Design
Flow of Execution
Five stages: Map, Partition, Quick-sort, Merge-sort, Reduce
1. Map: streams key/value pairs
10
Design

Flow of Execution
Five stages: Map, Partition, Quick-sort, Merge-sort, Reduce
1. Map: streams key/value pairs
2. Partition: hash and distribute
3. Quick-sort
4. Merge-sort
Key grouping implemented as two-phase external sort
13
Design

Flow of Execution
Five stages: Map, Partition, Quick-sort, Merge-sort, Reduce
1. Map: streams key/value pairs
2. Partition: hash and distribute
3. Quick-sort
4. Merge-sort
5. Reduce: reduces key/list-of-values pairs to key/value pairs
Key grouping implemented as two-phase external sort
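
The stages that follow Map can be sketched sequentially as below. This is one plausible reading of the slide, not the actual runtime: the pairs are hashed into partitions, each partition is quick-sorted in small runs (a stand-in for chunks that fit in an SPE local store), the runs are merged, and Reduce then walks each sorted partition. The `KV` type, the constants `NPART`, `RUN`, and `MAXKV`, the multiplicative hash, and the choice of summation as the Reduce operation are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NPART 2   /* hypothetical number of partitions */
#define RUN   4   /* hypothetical run size, standing in for local-store capacity */
#define MAXKV 64  /* hypothetical buffer capacity */

typedef struct { int key; int value; } KV;

static int cmp_kv(const void *a, const void *b)
{
    const KV *x = (const KV *)a, *y = (const KV *)b;
    return (x->key > y->key) - (x->key < y->key);
}

/* Stage 2, Partition: hash the key to choose a partition. */
static size_t partition_of(int key)
{
    return (size_t)(((unsigned)key * 2654435761u) % NPART);
}

/* Stage 3, Quick-sort: sort each run of at most RUN pairs independently. */
static void sort_runs(KV *kv, size_t n)
{
    for (size_t i = 0; i < n; i += RUN)
        qsort(kv + i, (n - i < RUN) ? n - i : RUN, sizeof(KV), cmp_kv);
}

/* Stage 4, Merge-sort: merge sorted runs pairwise until one run remains. */
static void merge_runs(KV *kv, size_t n)
{
    KV tmp[MAXKV];
    for (size_t w = RUN; w < n; w *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * w) {
            size_t mid = (lo + w < n) ? lo + w : n;
            size_t hi  = (lo + 2 * w < n) ? lo + 2 * w : n;
            size_t i = lo, j = mid, k = lo;
            while (i < mid && j < hi)
                tmp[k++] = (cmp_kv(&kv[j], &kv[i]) < 0) ? kv[j++] : kv[i++];
            while (i < mid) tmp[k++] = kv[i++];
            while (j < hi)  tmp[k++] = kv[j++];
        }
        memcpy(kv, tmp, n * sizeof(KV));
    }
}

/* Stage 5, Reduce: collapse each key's list of values into one pair (here, a sum). */
static size_t reduce_sorted(const KV *kv, size_t n, KV *out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; ) {
        KV r = { kv[i].key, 0 };
        while (i < n && kv[i].key == r.key)
            r.value += kv[i++].value;
        out[m++] = r;
    }
    return m;
}

/* Runs stages 2 to 5 over Map output; returns the number of pairs written to out. */
size_t group_and_reduce(const KV *mapped, size_t n, KV *out)
{
    KV part[NPART][MAXKV];
    size_t len[NPART] = {0};
    size_t m = 0;
    assert(n <= MAXKV);
    for (size_t i = 0; i < n; i++) {
        size_t p = partition_of(mapped[i].key);
        part[p][len[p]++] = mapped[i];
    }
    for (size_t p = 0; p < NPART; p++) {
        sort_runs(part[p], len[p]);
        merge_runs(part[p], len[p]);
        m += reduce_sorted(part[p], len[p], out + m);
    }
    return m;
}
```

Because equal keys always hash to the same partition, each partition can be sorted and reduced independently, which is what would let the work be spread across SPEs.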
14
Overview

Motivation
MapReduce
Cell B.E. Architecture
MapReduce Example
Design
Evaluation
Workload Characterization
Application Performance
Conclusions and Future Work

15
Evaluation Methodology

MapReduce Model Characterization
Synthetic micro-benchmark with six parameters
Run on a 3.2 GHz Cell Blade
Measured effect of each parameter on execution time
Application Performance Comparison
Six full applications
MapReduce versions run on 3.2 GHz Cell Blade
Single-threaded versions run on 2.4 GHz Core 2 Duo
Evaluation
Measured speedup by comparing execution times
Measured overheads on the Cell by monitoring SPE idle time
Measured ideal speedup, assuming no Cell overheads
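
The slides do not spell out how these metrics are computed. A minimal sketch, assuming that speedup is the ratio of baseline to Cell execution time and that the ideal speedup is obtained by discounting the fraction of Cell time the SPEs spend idle, might look like this. The function names and the `idle_fraction` parameter are hypothetical.

```c
#include <assert.h>

/* Speedup of the Cell run over the single-threaded baseline run. */
double speedup(double t_baseline, double t_cell)
{
    assert(t_baseline > 0.0 && t_cell > 0.0);
    return t_baseline / t_cell;
}

/* Ideal speedup under the ASSUMPTION that runtime overhead equals the
 * fraction of Cell execution time during which the SPEs are idle. */
double ideal_speedup(double t_baseline, double t_cell, double idle_fraction)
{
    assert(idle_fraction >= 0.0 && idle_fraction < 1.0);
    return speedup(t_baseline, t_cell * (1.0 - idle_fraction));
}
```

With no idle time the two functions agree; any measured idle time raises the ideal figure above the observed one.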

16
MapReduce Model Characterization

Model Characteristics

Effect on Execution Time
Characteristic     Description
Map intensity      Execution cycles per input byte to Map
Reduce intensity   Execution cycles per input byte to Reduce
Map fan-out        Ratio of input size to output size in Map
Reduce fan-in      Number of values per key in Reduce
Partitions         Number of partitions
Input size         Input size in bytes
17
Application Performance

Applications
histogram   counts bitmap RGB occurrences
kmeans      clustering algorithm
linearReg   least-squares linear regression
wordCount   word count
NAS_EP      EP benchmark from NAS suite
distSort    distributed sort

18
Speedup Over Core 2 Duo
19
Runtime Overheads
20
Overview

Motivation
MapReduce
Cell B.E. Architecture
MapReduce Example
Design
Evaluation
Workload Characterization
Application Performance
Conclusions and Future Work

21
Conclusions and Future Work

Conclusions
Programmability benefits
High performance on computationally intensive workloads
Not applicable to all application types
Future Work
Additional performance tuning
Extend for clusters of Cell processors
Hierarchical MapReduce

22
Questions?
23
Backup Slides
24
MapReduce API

void MapReduce_exec(MapReduceSpecification specification)
The exec function initializes the MapReduce runtime and executes MapReduce according to the user specification.
void MapReduce_emitIntermediate(void **key, void **value)
void MapReduce_emit(void **value)
These two functions are called by the user-defined Map and Reduce functions, respectively. They take references to pointers as arguments and modify the referenced pointers to point to pre-allocated storage. It is then the responsibility of the application to provision this storage.
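
The calling convention described above can be illustrated with a small mock. This is a sketch under stated assumptions, not the real runtime: it takes "references to pointers" to mean `void **` arguments, reads "provision" as "fill in", and uses fixed-size slots. `mock_emitIntermediate`, `map_word`, `KEY_SIZE`, `MAX_PAIRS`, and the storage arrays are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_SIZE  16  /* hypothetical fixed key size in bytes */
#define MAX_PAIRS 8   /* hypothetical capacity of the mock buffer */

/* Mock runtime-owned, pre-allocated storage for intermediate pairs. */
char   key_store[MAX_PAIRS][KEY_SIZE];
int    value_store[MAX_PAIRS];
size_t pairs_emitted = 0;

/* Mock of an emit call in the style of MapReduce_emitIntermediate:
 * it points *key and *value at the next pre-allocated slot. */
void mock_emitIntermediate(void **key, void **value)
{
    assert(pairs_emitted < MAX_PAIRS);
    *key   = key_store[pairs_emitted];
    *value = &value_store[pairs_emitted];
    pairs_emitted++;
}

/* Hypothetical user-defined Map step for word count: request a slot,
 * then fill the storage the runtime handed back. */
void map_word(const char *word)
{
    void *key, *value;
    mock_emitIntermediate(&key, &value);
    strncpy((char *)key, word, KEY_SIZE - 1);
    ((char *)key)[KEY_SIZE - 1] = '\0';
    *(int *)value = 1;
}
```

Handing out storage this way lets the runtime decide where output lives, so the application writes its results in place rather than having the runtime copy them afterwards.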