Matchmaking: A New MapReduce Scheduling Technique

1
Matchmaking: A New MapReduce Scheduling Technique

Chen He, Dr. Ying Lu, Dr. David Swanson

2
Problem Statement

MapReduce cluster scheduling algorithms are becoming increasingly important
An efficient MapReduce scheduler must avoid unnecessary data transmission
We focus on decreasing data transmission in a MapReduce cluster

3
Contributions

Build a matchmaking algorithm to improve the data locality of Hadoop MapReduce jobs
The MatchMaking algorithm leads to a higher data locality rate and shorter map task response time
Substituting the MatchMaking algorithm for the Delay algorithm in the Fair-sharing scheduler also yields better performance

4
Outline

Background
Delay Algorithm
MatchMaking algorithm
Evaluation
Conclusion
Questions

5
Background

Hadoop FIFO scheduler
The scheduler searches the first job for local tasks and assigns them
If the first job has no local task, one of its non-local tasks is assigned
Strict FIFO job order is followed
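
The default policy on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than Hadoop's actual scheduler code: the `MapTask` and `Job` classes and the `fifo_assign` function are hypothetical names, and each call is assumed to fill a single free map slot.

```python
from dataclasses import dataclass, field


@dataclass
class MapTask:
    task_id: str
    input_nodes: set  # nodes that store a replica of the task's input block


@dataclass
class Job:
    job_id: str
    pending: list = field(default_factory=list)  # unassigned map tasks


def fifo_assign(job_queue, node):
    """Return (task, is_local) for one free map slot on `node`, or None."""
    for job in job_queue:
        if not job.pending:
            continue  # jobs with nothing left to assign are skipped
        # Prefer a task whose input data is stored on the requesting node.
        for task in job.pending:
            if node in task.input_nodes:
                job.pending.remove(task)
                return task, True
        # No local task in the first job: hand out one of its non-local tasks.
        return job.pending.pop(0), False
    return None
```

Because only the first job with pending tasks is ever inspected, a node can receive a non-local task from that job even when a later job in the queue holds a task that is local to it.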

6
Background

Hadoop FIFO scheduler

7
Background

Hadoop FIFO scheduler

8
Background

Hadoop FIFO scheduler

9
Background

Hadoop FIFO scheduler

10
Background

Hadoop FIFO scheduler

11
Background

Hadoop FIFO scheduler deficiencies
On the node side, strict FIFO job order reduces data locality
On the job side, FIFO cannot provide a fair opportunity for each worker node

12
Delay Algorithm

Driven by Facebook event logs saved in their Hadoop data warehouse
The default Hadoop FIFO scheduler results in unnecessarily long job response times and a lack of fairness in resource sharing
Focuses on two points: fair sharing and data locality

13
Delay Algorithm

Workload

Bin  # Maps    % Jobs at Facebook  # Maps in Benchmark  # of jobs in Benchmark
1    1         39                  1                    38
2    2         16                  2                    16
3    3-20      14                  10                   14
4    21-60     9                   50                   8
5    61-150    6                   100                  6
6    151-300   6                   200                  6
7    301-500   4                   400                  4
8    501-1500  4                   800                  4
9    >1501     3                   4800                 4

Source: Matei Zaharia et al., "Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling"

14
Delay Algorithm

Fairness
Task execution percentage between jobs, groups, and users
Data locality
For the Map stage, a map task runs on a node that contains its input data
For the Reduce stage?

15
Delay Scheduling

Fairness vs. data locality

16
Delay Algorithm

Fair-sharing principle: hierarchical principle

17
Delay Scheduling: including rack locality

18
Delay Algorithm

Relax the strict job order
The scheduler can search other jobs in the job queue to find a local task
Each job has a Maximum Delay Time (MDT) to avoid starvation
MDT is a user-defined maximum time for which the scheduler can delay assigning a job's non-local map tasks
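
The rule above can be illustrated with a simplified Python sketch. It is not the Fair-sharing scheduler's real code: `delay_assign`, the `waiting_since` field, and the use of explicit `now` timestamps are assumptions made for illustration, and rack locality is ignored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MapTask:
    task_id: str
    input_nodes: set  # nodes that store a replica of the task's input block


@dataclass
class Job:
    job_id: str
    pending: list = field(default_factory=list)  # unassigned map tasks
    waiting_since: Optional[float] = None  # when the job was first skipped


def delay_assign(job_queue, node, now, mdt):
    """Return (task, is_local) for one free slot on `node`, or None."""
    for job in job_queue:
        if not job.pending:
            continue
        for task in job.pending:
            if node in task.input_nodes:
                job.pending.remove(task)
                job.waiting_since = None  # a local launch ends the delay
                return task, True
        # This job has no local task for the node: start or continue its delay.
        if job.waiting_since is None:
            job.waiting_since = now
        if now - job.waiting_since >= mdt:
            # The job has waited for MDT, so it may run a non-local task.
            return job.pending.pop(0), False
        # Otherwise skip this job and try the next one in the queue.
    return None
```

With `mdt=0` the sketch behaves like the strict FIFO policy, since the first job is never skipped. Larger values let the scheduler skip a job for longer in the hope that a node holding its data asks for work.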

19
Delay Algorithm
20
Delay Algorithm
21
Delay Algorithm
22
Delay algorithm
23
Delay algorithm
24
Delay algorithm
25
Delay Algorithm Properties

MDT determines the data locality rate
The locality rate Rl is an increasing function of MDT, with a ceiling value of 1
However, average response time

26
Delay Algorithm Deficiency

To achieve the best response time, we need to vary the MDT value for:
different types of jobs
different cluster sizes
different job execution orders

27
Outline

Background
Delay Algorithm
MatchMaking algorithm
Evaluation
Conclusion
Questions

28
MatchMaking Algorithm

Relax the strict job order
Search all jobs in the queue for local tasks
Give every node a fair chance to grab its local tasks
When a node fails to find a local task for the first time in a row, no non-local task is assigned to it
When a node fails to find a local task for the second time in a row, a non-local task is assigned to it
A node can be assigned at most one non-local task in every heartbeat interval
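
The rules on this slide can be sketched as a per-heartbeat function. This is an illustrative Python sketch, not the authors' implementation: `matchmaking_heartbeat` and the `missed` dictionary are hypothetical names. Two details are assumptions because the slide does not state them: a heartbeat counts as a miss only when it finds no local task at all, and the miss marker is cleared only when a local task is found.

```python
from dataclasses import dataclass, field


@dataclass
class MapTask:
    task_id: str
    input_nodes: set  # nodes that store a replica of the task's input block


@dataclass
class Job:
    job_id: str
    pending: list = field(default_factory=list)  # unassigned map tasks


def matchmaking_heartbeat(job_queue, node, free_slots, missed):
    """Return the (task, is_local) pairs assigned to `node` in one heartbeat.

    `missed` maps a node to True if its previous heartbeat found no local task.
    """
    assigned = []
    if free_slots <= 0:
        return assigned
    # Search every job in the queue, not only the first, for local tasks.
    for job in job_queue:
        for task in list(job.pending):
            if len(assigned) == free_slots:
                break
            if node in task.input_nodes:
                job.pending.remove(task)
                assigned.append((task, True))
    if assigned:
        missed[node] = False  # the node found local work, so the streak ends
        return assigned
    if not missed.get(node, False):
        # First miss in a row: record it and assign nothing this heartbeat.
        missed[node] = True
        return assigned
    # Second miss in a row: assign at most one non-local task.
    for job in job_queue:
        if job.pending:
            assigned.append((job.pending.pop(0), False))
            break
    return assigned
```

Unlike the Delay sketch, there is no time threshold to tune here. A node waits exactly one heartbeat before it is allowed to take non-local work.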

29
MatchMaking Algorithm
30
MatchMaking Algorithm
31
MatchMaking Algorithm
32
MatchMaking Algorithm
33
MatchMaking Algorithm
34
MatchMaking Algorithm
35
Outline

Background
Delay Algorithm
MatchMaking algorithm
Evaluation
Conclusion
Questions

36
Evaluation

Environment
Hardware
1 head node with 2 AMD Opteron 2.2 GHz 64-bit CPUs, 8 GB memory, 1 Gbps Ethernet
30 worker nodes with the same CPUs and network but 4 GB memory
Software
Hadoop 0.21
Redhat Linux CentOS 5.5
Test cases
Loadgen
Wordcount
Metrics
Locality Rate
Average Response Time
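
As a rough illustration of the two metrics, the sketch below computes them from per-task records. The definitions are assumptions, since the slides do not spell them out: the locality rate is taken as the fraction of map tasks that ran on a node storing their input data, and the response time of a task as its finish time minus its start of waiting. The function names are hypothetical.

```python
def locality_rate(local_flags):
    """Fraction of map tasks that ran on a node holding their input data."""
    if not local_flags:
        return 0.0
    return sum(1 for is_local in local_flags if is_local) / len(local_flags)


def average_response_time(intervals):
    """Mean of (finish - start) over a list of (start, finish) pairs."""
    if not intervals:
        return 0.0
    return sum(finish - start for start, finish in intervals) / len(intervals)
```

For example, three local tasks out of four give a locality rate of 0.75.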

37
Evaluation

Hadoop Configuration
HDFS
Block size is 128 MB
100 blocks evenly distributed across 30 worker nodes
Replication factor is 2
MapReduce
2 map slots and 1 reduce slot for each worker node
Facebook production workload

Source: Matei Zaharia et al., "Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling"

38
Evaluation

FIFO Scheduler
Default locality policy
Delay policy
Matchmaking policy
Fair-sharing Scheduler
Delay policy
Matchmaking policy

39
Evaluation

FIFO scheduler locality rate
loadgen wordcount

40
Evaluation

FIFO scheduler MTART
loadgen
wordcount

41
Evaluation

Fair sharing scheduler locality rate

42
Evaluation

Fair sharing scheduler response time

43
Conclusion

We created the MatchMaking algorithm to improve the data locality of MapReduce schedulers without tuning
It obtains good performance in a medium-sized cluster with the Facebook production workload
It can be easily integrated with other schedulers such as the FIFO or Fair-sharing scheduler

44
Discussion