
1
Job Scheduling in MapReduce
  • Jaideep Dhok
  • jaideep_at_research.iiit.ac.in

2
Hadoop Terminology
  • JobTracker, TaskTracker
  • HDFS
  • Task, Job
  • Slot
  • Queue

3
MapReduce workflow
  • Workers (TaskTrackers) ask the master for tasks
  • Master (JobTracker) chooses tasks from jobs in a
    queue and assigns them to workers
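
A minimal sketch of this pull model, assuming one FIFO job queue and interchangeable slots. In Hadoop the request for work rides on each TaskTracker's periodic heartbeat; the `JobTracker` and `Job` classes and the `heartbeat` method below are hypothetical stand-ins, not Hadoop's Java API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    pending: deque = field(default_factory=deque)  # task ids not yet assigned


class JobTracker:
    """Hypothetical, much-simplified master holding jobs in a FIFO queue."""

    def __init__(self):
        self.queue = deque()

    def submit(self, job):
        self.queue.append(job)

    def heartbeat(self, tracker, free_slots):
        """A worker reports `free_slots`; return the tasks it should run.

        `tracker` is unused here; a real master would use it for locality.
        """
        assigned = []
        while free_slots > 0 and self.queue:
            job = self.queue[0]
            if not job.pending:  # every task of this job is assigned: dequeue it
                self.queue.popleft()
                continue
            assigned.append((job.name, job.pending.popleft()))
            free_slots -= 1
        return assigned


jt = JobTracker()
jt.submit(Job("A", deque(["m0", "m1"])))
jt.submit(Job("B", deque(["m0"])))
print(jt.heartbeat("tt1", 2))  # [('A', 'm0'), ('A', 'm1')]
print(jt.heartbeat("tt2", 2))  # [('B', 'm0')]
```

Because workers pull work, the master only makes a decision when a slot is actually free.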

4
Issues in MapReduce scheduling
  • Maps before Reduce: a reduce cannot finish until every map has finished
  • Data Locality
  • Intermediate Data

5
Issues in MapReduce scheduling
  • Speculative Execution
  • Shared Environments
  • Example: Facebook, 3200 jobs/day; the average job has
    50 maps.
  • Preemption
  • Job/Task Recovery
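
Speculative execution launches a backup copy of a task that runs much slower than its peers. A sketch of the straggler test, assuming each task reports a progress score between 0 and 1; the 0.2 gap and 60-second minimum runtime follow the heuristic commonly described for early Hadoop releases, but treat the exact values, and the `speculation_candidates` name, as assumptions.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    task_id: str
    progress: float   # progress score in [0, 1]
    runtime_s: float  # seconds since the task started


def speculation_candidates(tasks, gap=0.2, min_runtime_s=60.0):
    """Return ids of tasks that look like stragglers and deserve a backup copy.

    A task is flagged when its progress trails the average progress of its
    peers by more than `gap` and it has run for at least `min_runtime_s`.
    """
    if not tasks:
        return []
    avg = sum(t.progress for t in tasks) / len(tasks)
    return [t.task_id for t in tasks
            if t.progress < avg - gap and t.runtime_s >= min_runtime_s]


maps = [
    TaskStatus("m0", 0.90, 120),
    TaskStatus("m1", 0.85, 120),
    TaskStatus("m2", 0.30, 120),  # slow node: well behind its peers
    TaskStatus("m3", 0.10, 30),   # also behind, but it only just started
]
print(speculation_candidates(maps))  # ['m2']
```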

6
Data Locality
  • Try to assign data-local tasks, if available
  • If none is available, assign a non-local task anyway
  • Problem?
  • Yes, for small jobs: with only a few input blocks,
    they rarely find a data-local slot.
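
The greedy policy above can be sketched as follows; `pick_task` and its arguments are hypothetical names. For a small job most requesting nodes hold none of its few input blocks, so the fallback branch fires and the task reads its input over the network.

```python
def pick_task(pending, node, block_locations):
    """Prefer a data-local task for `node`; otherwise fall back to any pending task.

    pending: task ids still waiting to run, in job order
    block_locations: task id -> set of nodes holding a replica of its input block
    Returns (task_id, is_local), or None if nothing is pending.
    """
    for task in pending:
        if node in block_locations.get(task, set()):
            return task, True
    if pending:
        return pending[0], False  # no local work here: run a remote task anyway
    return None


# A small job with two map tasks whose input blocks live on n1/n2 and on n3.
locations = {"m0": {"n1", "n2"}, "m1": {"n3"}}
print(pick_task(["m0", "m1"], "n3", locations))  # ('m1', True)
print(pick_task(["m0"], "n4", locations))        # ('m0', False)
```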

7
Shared Environments
  • Multiple users share a single cluster
  • FIFO is clearly not suitable: small jobs wait behind large ones
  • Isolation is important
  • Multiplexing improves utilization
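
A toy simulation of why FIFO hurts small jobs on a shared cluster, under assumed conditions: 10 slots, unit-length tasks, and a 100-task job submitted just before a 2-task job. The "fair" branch is a plain round-robin split of slots, not the actual Fair Scheduler, and `completion_times` is a hypothetical helper.

```python
def completion_times(jobs, slots, policy):
    """Simulate unit-length tasks on `slots` slots.

    jobs: dict job -> number of tasks, in submission order.
    policy: "fifo" fills slots from the earliest job first;
            "fair" hands slots out one at a time, round-robin over jobs.
    Returns dict job -> time step at which its last task finishes.
    """
    remaining = dict(jobs)
    done = {}
    t = 0
    while remaining:
        t += 1
        free = slots
        if policy == "fifo":
            for name in remaining:
                take = min(free, remaining[name])
                remaining[name] -= take
                free -= take
        else:
            while free > 0 and any(left > 0 for left in remaining.values()):
                for name in remaining:
                    if free > 0 and remaining[name] > 0:
                        remaining[name] -= 1
                        free -= 1
        for name in [n for n, left in remaining.items() if left == 0]:
            done[name] = t
            del remaining[name]
    return done


jobs = {"large": 100, "small": 2}
print(completion_times(jobs, 10, "fifo"))  # {'large': 10, 'small': 11}
print(completion_times(jobs, 10, "fair"))  # {'small': 1, 'large': 11}
```

Sharing slots cuts the small job's response time from 11 steps to 1, while the large job finishes only one step later.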

8
Expectations from a scheduling algorithm
  • Fairness
  • Avoid starvation
  • Maximize throughput
  • Minimize response time
  • Optimal use of resources

9
Native Scheduler
  • FIFO
  • Limited support for job priorities
  • Memory aware: limits the physical and virtual memory
    allowed for a task
  • Start reduce after all the maps have finished?
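
A sketch of the native scheduler's job ordering and of the reduce-start question. Jobs are sorted by priority, then by submission time; the five level names mirror Hadoop's job priorities. The `slowstart` fraction corresponds to what Hadoop 1.x exposes as `mapred.reduce.slowstart.completed.maps` (default 0.05), where 1.0 means waiting for every map. The classes and functions themselves are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Lower value is scheduled first.
    VERY_HIGH = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3
    VERY_LOW = 4


@dataclass
class Job:
    name: str
    priority: Priority
    submit_time: float


def fifo_order(jobs):
    """FIFO with priorities: higher priority first, then earlier submission."""
    return sorted(jobs, key=lambda j: (j.priority, j.submit_time))


def reduces_may_start(finished_maps, total_maps, slowstart=0.05):
    """Allow reduce tasks once a fraction `slowstart` of the maps has finished.

    A small value lets reduces start copying map output early;
    slowstart=1.0 starts reduces only after all maps have finished.
    """
    return total_maps == 0 or finished_maps / total_maps >= slowstart


jobs = [Job("a", Priority.NORMAL, 1.0), Job("b", Priority.HIGH, 2.0),
        Job("c", Priority.NORMAL, 0.0)]
print([j.name for j in fifo_order(jobs)])                    # ['b', 'c', 'a']
print(reduces_may_start(2, 50), reduces_may_start(3, 50))   # False True
print(reduces_may_start(49, 50, slowstart=1.0))              # False
```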

10
FAIR Scheduler
  • Pick the job we have been most unfair to
  • Similar approaches
  • Linux CFS (Completely Fair Scheduler) process scheduler
  • Linux CFQ (Completely Fair Queuing) disk scheduler
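
One way to make "most unfair to" concrete, assuming every running job has an equal share: each job accumulates a deficit, the running total of (fair share - running tasks) × elapsed time, and a freed slot goes to the job with the largest deficit. This is a simplification for illustration, and `update_deficits` and `pick_job` are hypothetical names.

```python
def update_deficits(deficits, running, total_slots, dt):
    """Add (equal share - running tasks) * dt to each job's deficit."""
    if not running:
        return
    share = total_slots / len(running)
    for job, n in running.items():
        deficits[job] = deficits.get(job, 0.0) + (share - n) * dt


def pick_job(deficits, wanting):
    """Give a free slot to the job with the largest deficit that still has tasks."""
    candidates = [j for j in wanting if j in deficits]
    return max(candidates, key=deficits.get) if candidates else None


deficits = {}
update_deficits(deficits, {"big": 4, "small": 0}, total_slots=4, dt=10)
print(deficits)                              # {'big': -20.0, 'small': 20.0}
print(pick_job(deficits, ["big", "small"]))  # small
```

Like CFS picking the process with the least virtual runtime, the job that has received the least service relative to its share goes next.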

11
FAIR Scheduler - Goals
  • Improve response time of small jobs
  • Guarantee resources for large jobs
  • Multiplex job execution

12
FAIR scheduler - Pools
  • Not exactly WFQ (Weighted Fair Queuing)
  • Create pools of jobs
  • Each pool is guaranteed a minimum share of slots
  • Divide slots equally between jobs in a pool
  • By default, one pool per user
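
A simplified sketch of pool-based shares, assuming one pool per user, unweighted pools, and minimum shares that fit in the cluster. Idle pools get nothing; each active pool receives its minimum share plus an equal part of the leftover slots, split evenly across the pool's jobs. The real scheduler also caps a job at its demand and supports weights; `job_shares` is a hypothetical name.

```python
def job_shares(total_slots, pools):
    """pools: pool -> {"min_share": int, "jobs": [job names]}.

    Returns job -> slot share (possibly fractional).
    """
    active = {p: cfg for p, cfg in pools.items() if cfg["jobs"]}
    if not active:
        return {}
    guaranteed = sum(cfg["min_share"] for cfg in active.values())
    leftover = max(total_slots - guaranteed, 0)
    shares = {}
    for cfg in active.values():
        pool_share = cfg["min_share"] + leftover / len(active)
        for job in cfg["jobs"]:
            shares[job] = pool_share / len(cfg["jobs"])  # equal split inside a pool
    return shares


pools = {
    "alice": {"min_share": 20, "jobs": ["a1", "a2"]},
    "bob": {"min_share": 40, "jobs": ["b1"]},
    "carol": {"min_share": 10, "jobs": []},  # idle pool: its guarantee is not reserved
}
print(job_shares(100, pools))  # {'a1': 20.0, 'a2': 20.0, 'b1': 60.0}
```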

13
FAIR Scheduler
  • Locality wait
  • Weighted shares: both jobs and pools can be given
    weights
  • Preemption
  • FIFO pools
  • Limitations?
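
Locality wait can be sketched as delay scheduling: a job at the head of the fairness order with no data-local task on the requesting node is skipped, and only after it has waited `max_wait` seconds may it take a non-local slot. `DelayScheduler`, `assign`, and `max_wait` are hypothetical names, and the wait value in the example is arbitrary.

```python
class DelayScheduler:
    """Lets a job skip non-local slots for up to `max_wait` seconds."""

    def __init__(self, max_wait):
        self.max_wait = max_wait
        self.waiting_since = {}  # job -> time it first skipped a slot

    def assign(self, jobs, node, now):
        """jobs: list of (job, {task: nodes holding its input}) in fairness order.

        Returns (job, task, is_local), or None if every job chose to wait.
        """
        for job, tasks in jobs:
            if not tasks:
                continue
            local = [t for t, nodes in tasks.items() if node in nodes]
            if local:
                self.waiting_since.pop(job, None)  # got locality: reset the wait
                return job, local[0], True
            start = self.waiting_since.setdefault(job, now)
            if now - start >= self.max_wait:
                self.waiting_since.pop(job, None)  # waited long enough: go non-local
                return job, next(iter(tasks)), False
            # Otherwise skip this job and offer the slot to the next one.
        return None


sched = DelayScheduler(max_wait=5)
jobs = [("small", {"m0": {"n1"}}), ("big", {"x0": {"n2"}, "x1": {"n3"}})]
print(sched.assign(jobs, "n3", now=0))  # ('big', 'x1', True): small skips
print(sched.assign(jobs, "n3", now=6))  # ('small', 'm0', False): wait expired
```

The short wait trades a little fairness for much better locality, which matters most for small jobs.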

14
Capacity Scheduler
  • Multiple Queues
  • Queues are guaranteed a fraction of the capacity
    of the cluster
  • Useful in very large clusters.
  • Memory management

15
Capacity Scheduler
  • Select a queue that needs to reclaim capacity; otherwise,
    the queue whose ratio of running tasks to guaranteed
    tasks is lowest
  • Within a queue, use FIFO
  • Reclaim capacity by killing tasks
  • User quotas
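
The selection rule can be sketched as below, assuming each queue's guarantee is `capacity * total_slots` slots. A queue below its guarantee has a ratio under 1, so ordering by the ratio also favors queues that need to reclaim capacity; killing tasks and user quotas are left out. The `Queue` class and both functions are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    capacity: float  # guaranteed fraction of the cluster's slots (assumed > 0)
    running: int = 0
    jobs: deque = field(default_factory=deque)  # FIFO of jobs with pending tasks


def pick_queue(queues, total_slots):
    """Among queues with work, pick the lowest running/guaranteed ratio."""
    candidates = [q for q in queues if q.jobs and q.capacity > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda q: q.running / (q.capacity * total_slots))


def next_job(queues, total_slots):
    """Pick a queue, then take the job at the head of that queue (FIFO)."""
    q = pick_queue(queues, total_slots)
    return (q.name, q.jobs[0]) if q is not None else None


queues = [
    Queue("prod", 0.7, running=60, jobs=deque(["p1"])),
    Queue("research", 0.3, running=10, jobs=deque(["r1", "r2"])),
]
print(next_job(queues, total_slots=100))  # ('research', 'r1')
```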

16
Dynamic Priority Scheduler
  • In the FAIR and Capacity schedulers, users have to
    negotiate their share with the administrator
  • Not feasible when there are many users
  • Priority depends on many factors
  • E.g., conference deadlines

17
Dynamic Priority Scheduler
  • Users given a fixed budget
  • Users define their spending rate
  • Every allocation deducts money from the user's
    account
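
A minimal sketch of this budget model, assuming slots are shared in proportion to each user's spending rate and that a user pays its rate for every slot it holds, per second. The proportional rule and the `allocate` function are assumptions for illustration; the refinements listed next (effective rate, excess allocations) are not modeled.

```python
def allocate(users, total_slots, interval_s):
    """Split slots in proportion to spending rate and charge each user's budget.

    users: name -> {"budget": credits left, "rate": credits per slot per second}.
    Users with no budget left or a zero rate get nothing this interval.
    """
    payers = {u: v for u, v in users.items() if v["budget"] > 0 and v["rate"] > 0}
    total_rate = sum(v["rate"] for v in payers.values())
    alloc = {}
    for name, v in payers.items():
        slots = total_slots * v["rate"] / total_rate
        alloc[name] = slots
        v["budget"] -= v["rate"] * slots * interval_s  # each allocation deducts credits
    return alloc


users = {"alice": {"budget": 1000.0, "rate": 2.0}, "bob": {"budget": 1000.0, "rate": 1.0}}
print(allocate(users, total_slots=30, interval_s=10))    # {'alice': 20.0, 'bob': 10.0}
print(users["alice"]["budget"], users["bob"]["budget"])  # 600.0 900.0
```

Doubling one's spending rate doubles one's share but also drains the budget faster, which is the trade-off users make.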

18
Dynamic Priority Scheduler
  • Supports Preemption
  • Each user has a separate queue
  • Amount charged is not always equal to the spending rate
  • Effective rate
  • Excess allocations
  • Below guaranteed allocation

19
Dynamic Priority Scheduler
  • Economic approach to job scheduling
  • Current limitations
  • Currency is virtual, replaceable
  • Users overestimate spending rate
  • Abuse

20
Dynamic Priority Scheduler
  • Possible extensions
  • Demand and Supply driven pricing
  • Reservation
  • Trading virtual currency

21
Open Areas
  • Speculative Execution

22
Open Areas
  • Context Switching
  • Task Migration

23
Open Areas
  • Predicting job resource requirements
  • Categorizing jobs based on resource requirements

24
Open Areas
  • Throttling / Admission Control

25
Open Areas
  • Billing and Accounting
  • Scalable and accurate resource usage measurement

26
Open Areas
  • Federation
  • Job migration between clusters

27
Open Areas
  • Enabling the MapReduce Cloud

28
Questions?
29
Thank You