Title: Scheduling Data-Intensive Workflows
1Scheduling Data-Intensive Workflows
- Tim H. Wong, Daniel Zinn, Bertram Ludäscher
- (UC Davis)
2Outline
- Problem motivation
- Assumptions
- Cost model
- Problem formalization
- Different simplifications and their complexity
- Prototypical Java implementation for Kepler
- Summary
3Motivation: Distributed Execution of Scientific Workflows
4Motivation: Distributed Execution of Scientific Workflows
- Process a set of data on a set of machines
- GOAL: Minimize WF execution time!
- Allocation Problem: Which actors are computed on which hosts?
6Cost Model
- Communication Time: TC
- Function Execution Time: TE
- Total Time: TT = TC + TE
- Shipping and Handling Problem: Schedule all tasks such that the total time is minimal
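A minimal sketch of how this cost model could be evaluated for one candidate allocation. The slide only states that the total time is the sum of communication and execution time. The sketch further assumes that communication time is data size divided by transfer rate for every edge whose endpoints run on different hosts, and that all times simply add up (no overlap or parallelism). All names (`total_time`, `exec_time`, `rate`) are hypothetical.

```python
def total_time(edges, assignment, exec_time, rate):
    """Return (TC, TE, TT) for one assignment of actors to hosts.

    edges:      list of (producer, consumer, data_size)
    assignment: dict actor -> host
    exec_time:  dict (actor, host) -> execution time (assumed known)
    rate:       dict (host_a, host_b) -> transfer rate (assumed known)
    """
    tc = 0.0
    for src, dst, size in edges:
        h_src, h_dst = assignment[src], assignment[dst]
        if h_src != h_dst:  # data is only shipped between different hosts
            tc += size / rate[(h_src, h_dst)]
    te = sum(exec_time[(actor, host)] for actor, host in assignment.items())
    return tc, te, tc + te


# Hypothetical three-actor pipeline on two hosts.
edges = [("read", "filter", 100.0), ("filter", "plot", 10.0)]
exec_time = {
    ("read", "h1"): 1.0, ("read", "h2"): 1.0,
    ("filter", "h1"): 8.0, ("filter", "h2"): 2.0,
    ("plot", "h1"): 1.0, ("plot", "h2"): 1.0,
}
rate = {("h1", "h2"): 50.0, ("h2", "h1"): 50.0}

all_on_h1 = {"read": "h1", "filter": "h1", "plot": "h1"}
split = {"read": "h1", "filter": "h2", "plot": "h2"}
```

In this toy instance, moving `filter` and `plot` to the faster host pays for the one transfer it causes: the split plan has a total time of 6.0 versus 10.0 for keeping everything on `h1`.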
7Problem Variants and Complexities
- Shipping and Handling Problem (SHP): Communication Cost non-uniform, Function Execution Cost non-uniform. Complexity: NP-complete. Reduction from Task Scheduling Problem [ERLA94]
- Task Handling Problem (THP): Communication Cost zero, Function Execution Cost non-uniform. Complexity: NP-complete. Reduction from Multiprocessor Scheduling Problem [KA99]
- Data Shipping Problem (DSP): Communication Cost non-uniform, Function Execution Cost zero. Complexity: NP-complete. Reduction from 1-Multiterminal Cut
8easy-DSP: Uniform Transfer Rate, Uniform Data Size
- Given
- Directed Acyclic Graph, Set of Colors
- Some vertices are already colored
- Edge Weight 1, if two adjacent vertices are of different colors
- Edge Weight 0, otherwise
- TASK
- Color the rest of the vertices such that total weight is minimal!
Cost Model: Minimize Total Shipped Volume!
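The coloring formulation above can be sketched in a few lines. The snippet below is only an illustration under the slide's assumptions (unit edge weights, colors standing for hosts). It computes the shipped volume of a coloring and finds an optimal completion by exhaustive search, which is exponential and only usable on tiny instances. The function names are hypothetical.

```python
from itertools import product


def shipped_volume(edges, coloring):
    """Number of edges whose endpoints have different colors."""
    return sum(1 for u, v in edges if coloring[u] != coloring[v])


def best_coloring(vertices, edges, colors, precolored):
    """Try every coloring of the uncolored vertices and keep a cheapest one."""
    free = [v for v in vertices if v not in precolored]
    best, best_cost = None, None
    for choice in product(colors, repeat=len(free)):
        coloring = dict(precolored)          # fixed colors are never changed
        coloring.update(zip(free, choice))   # one candidate completion
        cost = shipped_volume(edges, coloring)
        if best_cost is None or cost < best_cost:
            best, best_cost = coloring, cost
    return best, best_cost


# Hypothetical diamond-shaped workflow: source fixed to "red", sink to "blue".
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
precolored = {"a": "red", "d": "blue"}
coloring, cost = best_coloring(vertices, edges, ["red", "blue"], precolored)
```

For this diamond every completion ships data over exactly two edges, so the optimum is 2 no matter how `b` and `c` are colored.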
9 1-Multi-Terminal Cut (1-MTC)
- Given
- Undirected Graph G = (V,E)
- Set of Terminals S ⊆ V
- Edge Weights: 1
- TASK
- Find a multi-way cut of G with a minimum number of edges
Minimize edges between different terminals!
NP-complete for 3 or more terminals!
10Reduction: 1-MTC < DSP
- Order graph, color terminals
[Figure: a 1-MTC instance and the corresponding DSP instance]
11Reduction: 1-MTC < DSP
[Figure: a 1-MTC instance with unit edge weights and the corresponding DSP instance]
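One way to read the "order graph, color terminals" step is sketched below. This is an interpretation of the slide, not the authors' construction. It fixes an arbitrary total order on the vertices, orients every undirected edge from the earlier to the later vertex (which yields a DAG), and precolors each terminal with its own color. The names `mtc_to_dsp` and `cut_edges` are hypothetical.

```python
def mtc_to_dsp(vertices, undirected_edges, terminals):
    """Build a DSP instance (DAG edges, colors, precoloring) from a 1-MTC instance."""
    order = {v: i for i, v in enumerate(vertices)}  # arbitrary total order
    dag_edges = [
        (u, v) if order[u] < order[v] else (v, u)   # always point "forward"
        for u, v in undirected_edges
    ]
    precolored = {t: f"color_{i}" for i, t in enumerate(terminals)}
    colors = list(precolored.values())
    return dag_edges, colors, precolored


def cut_edges(undirected_edges, coloring):
    """Edges joining differently colored vertices: the cut induced by a coloring."""
    return [(u, v) for u, v in undirected_edges if coloring[u] != coloring[v]]


# Hypothetical 1-MTC instance with three terminals s1, s2, s3.
vertices = ["s1", "x", "s2", "y", "s3"]
undirected_edges = [("x", "s1"), ("s2", "x"), ("y", "s2"), ("s3", "y"), ("y", "x")]
dag_edges, colors, precolored = mtc_to_dsp(
    vertices, undirected_edges, ["s1", "s2", "s3"]
)

# Any full coloring of the DSP instance induces a multiway cut of the same size.
coloring = dict(precolored, x="color_1", y="color_1")
induced_cut = cut_edges(undirected_edges, coloring)
```

Removing the edges in `induced_cut` leaves only single-colored components, so no two terminals remain connected. This is why a minimum-weight coloring corresponds to a minimum multiway cut under this reading.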
13NP-Hard ... But We Still Need to Solve It
- Greedy Algorithm
- Dynamic Programming Algorithm
- Investigate approximation algorithms for MTC and related problems!
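The slides do not spell out the greedy algorithm, so the following is only one plausible heuristic for the coloring version of DSP, shown for illustration. It visits the vertices in a given (assumed topological) order and gives each uncolored vertex the color that is most common among its already-colored neighbors. It carries no optimality guarantee, and the name `greedy_coloring` is hypothetical.

```python
from collections import Counter


def greedy_coloring(vertices, edges, colors, precolored):
    """Color each uncolored vertex like the majority of its colored neighbors."""
    neighbors = {v: [] for v in vertices}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    coloring = dict(precolored)
    for v in vertices:  # assumption: vertices are listed in topological order
        if v in coloring:
            continue
        seen = Counter(coloring[n] for n in neighbors[v] if n in coloring)
        # Fall back to the first color when no neighbor is colored yet.
        coloring[v] = seen.most_common(1)[0][0] if seen else colors[0]
    return coloring


# Hypothetical workflow: source pinned to host1, sink pinned to host2.
vertices = ["src", "a", "b", "sink"]
edges = [("src", "a"), ("src", "b"), ("a", "b"), ("b", "sink")]
precolored = {"src": "host1", "sink": "host2"}
plan = greedy_coloring(vertices, edges, ["host1", "host2"], precolored)
shipped = sum(1 for u, v in edges if plan[u] != plan[v])
```

Here the heuristic keeps `a` and `b` on `host1` and ships data only over the final edge into `sink`, which happens to be optimal for this small instance. In general a greedy choice made early can force extra transfers later.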
14Prototypical Implementation ...
- abstract: only some nodes assigned
- concrete: all nodes assigned
- scheduling: turns an abstract workflow into a concrete one
15Prototypical Implementation ... in Kepler!
Abstract Workflow ...
SCHEDULING
16Prototypical Implementation ... in Kepler!
Concrete Workflow ...
17Future Work
- Use heuristics about looping to guess multiplicities (then not ACYCLIC any more!)
- Investigate approximation algorithms with error guarantees for 1-MTC, then try to apply them to DSP
- ALSO: Relevant for COMAD workflows, which can be compiled into a low-level conventional WF
18Summary
- Bad news
- Scheduling is hard
- DSP is hard (for BEST plans)
- Good news
- Finding a quite good plan is easy
- Greedy/Dynamic Programming Algorithms
- Open Problems
- Approximation Quality of simple algorithms?
- When do they perform badly?
- Does this occur often in real-life workflows?
19References
20Thank You. Questions?