MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads

1
MROrder: Flexible Job Ordering Optimization for
Online MapReduce Workloads
Shanjiang Tang, Bu-Sung Lee, Bingsheng He
School of Computer Engineering, Nanyang
Technological University
30th Aug 2013
2
Outline
  • Background and Motivations
  • MROrder
  • Evaluation
  • Conclusion

3
MapReduce Computation Model
[Figure: Input Data → Map-Phase Computation → Intermediate Results → Reduce-Phase Computation → Output Results → Final Result]
4
Hadoop Execution Model
  • Hadoop is an open-source implementation of
    the MapReduce model.
  • The cluster's computation resources are divided
    into map slots and reduce slots, which are
    configured by the Hadoop administrator in advance.
  • A MapReduce job generally consists of map tasks
    and reduce tasks.
  • Map tasks must be allocated map slots, and
    reduce tasks must be allocated reduce slots.


5
Hadoop Execution Model


[Figure labels: Map slots, Reduce slots]
Map tasks can only run on map slots; reduce
tasks can only run on reduce slots.
Map tasks start before reduce tasks.
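
The slot constraints can be illustrated with a minimal Python sketch. It is a simplification and not Hadoop's actual scheduler: the function names and task durations are hypothetical, each task is greedily placed on the earliest-free slot, and reduce tasks are assumed to start only after every map task of the job has finished.

```python
import heapq


def phase_finish_time(task_durations, num_slots, start_time=0):
    """Finish time of one phase when tasks are placed, in order,
    on whichever slot becomes free first (assumed greedy policy)."""
    slots = [start_time] * num_slots  # time at which each slot is free
    heapq.heapify(slots)
    finish = start_time
    for duration in task_durations:
        free_at = heapq.heappop(slots)   # earliest-free slot
        end = free_at + duration
        finish = max(finish, end)
        heapq.heappush(slots, end)
    return finish


def job_finish_time(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Map tasks use only map slots, reduce tasks only reduce slots.
    Assumption: the reduce phase starts once the map phase is done."""
    map_done = phase_finish_time(map_tasks, map_slots)
    return phase_finish_time(reduce_tasks, reduce_slots, start_time=map_done)


# Four map tasks of length 4 on 2 map slots, then two reduce tasks
# of length 3 on 1 reduce slot.
print(job_finish_time([4, 4, 4, 4], [3, 3], map_slots=2, reduce_slots=1))  # 14
```

With these made-up numbers the map phase ends at time 8 and the two reduce tasks run back to back on the single reduce slot, so the job finishes at time 14.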
6
Job Order vs. Performance
[Figure: two schedules plotted against time, each with a Map Phase row and a Reduce Phase row]
  • Implication: different job orders have a
    significant impact on performance.
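
A small worked example makes the effect concrete. The sketch below uses an assumed, simplified model rather than the authors' simulator: each job is reduced to one map-phase time and one reduce-phase time, map phases run one after another, and a job's reduce phase starts once its own map phase and the previous job's reduce phase are both done. The job values are invented.

```python
def completion_times(jobs):
    """jobs: list of (map_time, reduce_time) in FIFO order.
    Returns the completion time of each job under the assumed
    two-stage pipeline model."""
    map_end = 0
    reduce_end = 0
    done = []
    for map_time, reduce_time in jobs:
        map_end += map_time
        # Reduce waits for this job's maps and the previous job's reduces.
        reduce_end = max(map_end, reduce_end) + reduce_time
        done.append(reduce_end)
    return done


def makespan(jobs):
    """Completion time of the last job in the batch."""
    return completion_times(jobs)[-1]


def total_completion_time(jobs):
    """Sum of the completion times of all jobs."""
    return sum(completion_times(jobs))


job_a = (4, 1)  # long map phase, short reduce phase (hypothetical)
job_b = (1, 4)  # short map phase, long reduce phase (hypothetical)

print(makespan([job_a, job_b]), total_completion_time([job_a, job_b]))  # 9 14
print(makespan([job_b, job_a]), total_completion_time([job_b, job_a]))  # 6 11
```

Running the map-light job first lets its long reduce phase overlap with the other job's map phase, which is why both metrics improve in the second order.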


7
Our Goals
  • Job ordering optimization is a non-trivial
    way to improve the performance of MapReduce
    workloads (i.e., batches of MapReduce jobs).
  • Our work focuses on job ordering optimization for
    online MapReduce workloads under the FIFO scheduler,
    where jobs arrive over time.
  • Different performance metrics are considered,
    e.g., makespan and total completion time.


8
Outline
  • Background and Motivations
  • MROrder
  • Evaluation
  • Conclusion

9
Architecture Overview of MROrder


10
Policy Module
  • Determines when and how to perform job ordering
    optimization for MapReduce jobs.
  • We provide two alternative solutions for
    determining when to perform job ordering
    optimization:
  • PNJ-Dominated Solution.
  • Performs job ordering when the number of jobs
    in the queue reaches a threshold.
  • TP-Dominated Solution.
  • Invokes job ordering periodically, after each
    time interval.
  • Note: PNJ is the policy for the number of
    jobs; TP is the time-based policy.
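
The two triggers can be sketched as small policy objects. This is an illustrative sketch only: the class names, the `should_reorder` method, and the numeric values are hypothetical and are not taken from the MROrder source code.

```python
from dataclasses import dataclass


@dataclass
class PNJPolicy:
    """Trigger ordering when the queue holds at least `threshold` jobs."""
    threshold: int

    def should_reorder(self, queue_length, now):
        return queue_length >= self.threshold


@dataclass
class TPPolicy:
    """Trigger ordering once every `interval` time units."""
    interval: float
    last_run: float = 0.0

    def should_reorder(self, queue_length, now):
        if now - self.last_run >= self.interval:
            self.last_run = now  # remember when ordering last ran
            return True
        return False


pnj = PNJPolicy(threshold=3)
print(pnj.should_reorder(queue_length=2, now=0.0))  # False
print(pnj.should_reorder(queue_length=3, now=0.0))  # True

tp = TPPolicy(interval=10.0)
print(tp.should_reorder(queue_length=5, now=4.0))   # False
print(tp.should_reorder(queue_length=5, now=10.0))  # True
```

Giving both policies the same method signature lets a caller switch between them without changing the surrounding loop, which is one way to read the "alternative solutions" wording on the slide.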


11
Policy Module
  • TP-Dominated Solution
  • TP-Dominated Solution with Fixed Time Interval
    (TP-FTI).
  • Performs job ordering periodically, at a
    fixed time interval.
  • TP-Dominated Solution with Adaptive Time Interval
    (TP-ATI).
  • Performs job ordering dynamically, with an
    adaptive time interval based on the estimated
    running time of the workloads.
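
The slide does not state how the adaptive interval is derived from the estimates. Purely as an assumption for illustration, one possible rule sets the next interval to the summed estimated running time of the queued jobs, clamped to a minimum and maximum. The function name, the clamping bounds, and the rule itself are hypothetical.

```python
def adaptive_interval(estimated_job_times, min_interval=1.0, max_interval=60.0):
    """Hypothetical TP-ATI-style rule: wait roughly as long as the
    queued workload is expected to run, within fixed bounds."""
    estimate = sum(estimated_job_times)
    return max(min_interval, min(max_interval, estimate))


print(adaptive_interval([5, 10]))     # 15
print(adaptive_interval([]))          # 1.0
print(adaptive_interval([100, 200]))  # 60.0
```

A fixed-interval policy (TP-FTI) corresponds to always returning the same constant, so the contrast between the two variants is confined to this one function in the sketch.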


12
TP-FTI
13
TP-ATI
14
Ordering Engine
  • Responsible for performing job ordering
    optimization.
  • Two types of job ordering approaches:
  • Simulation-based Ordering Approach (SIM).
  • We developed a Hadoop simulator, Hsim, to search
    for optimal results. It is a brute-force method.
  • Algorithm-based Ordering Approach (ALG).
  • We provide efficient heuristic job ordering
    algorithms for different performance metrics,
    e.g., makespan and total completion time.
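
The contrast between the two approaches can be sketched in Python. The code below is not the authors' SIM or ALG: the brute-force search evaluates every permutation under an assumed two-stage pipeline model (one map time and one reduce time per job), and the heuristic is Johnson's rule, a classic two-stage flow-shop ordering rule used here as a stand-in because the slides do not show the actual algorithms. All job values are invented.

```python
from itertools import permutations


def completion_times(jobs):
    """jobs: sequence of (map_time, reduce_time) in execution order."""
    map_end = 0
    reduce_end = 0
    done = []
    for map_time, reduce_time in jobs:
        map_end += map_time
        reduce_end = max(map_end, reduce_end) + reduce_time
        done.append(reduce_end)
    return done


def makespan(jobs):
    return completion_times(jobs)[-1] if jobs else 0


def brute_force_order(jobs, metric):
    """SIM-like stand-in: try all n! orders and keep the best one."""
    return list(min(permutations(jobs), key=metric))


def johnson_order(jobs):
    """ALG-like stand-in (Johnson's rule): map-light jobs first by
    ascending map time, then map-heavy jobs by descending reduce time."""
    front = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    back = sorted((j for j in jobs if j[0] > j[1]),
                  key=lambda j: j[1], reverse=True)
    return front + back


jobs = [(4, 1), (1, 4), (3, 3), (2, 5)]
print(johnson_order(jobs))                          # [(1, 4), (2, 5), (3, 3), (4, 1)]
print(makespan(johnson_order(jobs)))                # 14
print(makespan(brute_force_order(jobs, makespan)))  # 14
```

In this simplified model the heuristic matches the exhaustive search while doing only a sort, whereas the exhaustive search grows factorially with the number of jobs. `brute_force_order` accepts any metric function, so a total-completion-time metric could be passed in the same way.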


15
ALG for Makespan
16
ALG for Total Completion Time
17
Outline
  • Background and Motivations
  • MROrder
  • Evaluation
  • Conclusion

18
Experiment Setup
  • Environment
  • A Hadoop cluster consisting of 10 nodes, each
    with two Intel X5675 CPUs, 24GB memory and 56GB
    hard disks.
  • Workloads
  • Synthetic Facebook Workload.
  • Generated based on previous related work.
    Most of its jobs are small; we use it to
    evaluate the total completion time.
  • Tested Workload.
  • Most of its jobs are large; we use it
    to evaluate the makespan.

19
TP-FTI vs. TP-ATI
  • TP-ATI is smarter and works better than
    TP-FTI.

Δt: the suitable threshold of the time period for
the time-based policy. PITCT: performance improvement
of total completion time.
20
ALG vs. SIM
  • SIM performs better than ALG but takes
    more time, especially when the number of jobs is
    large.

21
Performance Improvement by MROrder (Simulation Result)
Total completion time is sensitive to workloads
dominated by small jobs.
22
Performance Improvement by MROrder (Real Experiment Result)
Makespan is sensitive to workloads dominated by
large jobs.
23
Outline
  • Background and Motivations
  • MROrder
  • Evaluation
  • Conclusion

24
Conclusion
  • Job ordering optimization is a non-trivial method
    to improve the efficiency of slot resource
    utilization and the performance of MapReduce workloads.
  • MROrder is a prototype system for online
    MapReduce workloads that is flexible across various
    performance metrics.
  • Experimental results show that MROrder improves
    the performance of MapReduce workloads
    significantly.
  • The source code of MROrder is available at
    http://sourceforge.net/projects/mrorder/

25
Ongoing and Future Work
  • Integrating MROrder into the Hadoop system.
  • Considering the performance improvement for other
    schedulers, e.g., Hadoop Fair Scheduler, Capacity
    Scheduler.
  • Exploring other alternative approaches to improve
    the cluster utilization and performance of
    MapReduce workloads.

26
Acknowledgement
  • This work is supported by the User and Domain
    driven data analytics as a Service framework
    project under the A*STAR Thematic Strategic
    Research Programme (SERC Grant No. 1021580034).

27
Thank You!
Questions?
28
Accuracy Evaluation of HSim
29
Impact of Inaccuracy in Estimated Map/Reduce Task Times