Multi-Objective Scheduling of Streaming Workflows - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Multi-Objective Scheduling of Streaming Workflows


1
Multi-Objective Scheduling of Streaming Workflows
2nd Scheduling in Aussois Workshop, May 18-21, 2008
Bi-criteria Scheduling of Streaming Workflows
  • Naga Vydyanathan 1, Umit V. Catalyurek 2,3,
  • Tahsin Kurc 2, P. Sadayappan 1 and Joel Saltz 1,2
  • 1 Dept. of Computer Science and Engineering
  • 2 Dept. of Biomedical Informatics
  • 3 Dept. of Electrical and Computer Engineering
  • The Ohio State University

2
Current and Emerging Applications
Satellite Data Processing
High Energy Physics
Quantum Chemistry
DCE-MRI Analysis
Image Processing
Multimedia
Video Surveillance
Montage
3
Challenges
  • Complex and diverse processing structures

[Figure: example data analysis application workflows; legend: task / file / sequential or parallel task]
4-9
Challenges
  • Complex and diverse processing structures
  • Varied parallelism
  • Bag-of-tasks applications: task-parallelism
  • Non-streaming workflows: task- and data-parallelism
  • Streaming workflows: task-, data- and pipelined-parallelism

10
Challenges
  • Different performance criteria
  • Bag-of-tasks: batch execution time [CCGrid'05, HCW'05, JSSPP'06, HPDC'06]
  • Non-streaming workflows: makespan [ICPP'05, HCW'06, ICPP'06, Cluster'06]
  • Streaming workflows: latency, throughput [SC'02, EuroPar'07, ICPP'08]
  • Significant communication/data transfer overheads

11
Scheduling Streaming Workflows
[Figure: taxonomy of data analysis applications: bag-of-tasks applications vs. workflows (non-streaming and streaming)]
12
Scheduling Streaming Workflows
  • Image processing, multimedia, and computer vision
    applications often act on a stream of input data
  • Scheduling challenges
  • Multiple performance criteria (related formally in the sketch below)
  • Latency (time to process one data item)
  • Throughput (aggregate rate of processing)
  • Multiple forms of parallelism
  • Pipelined parallelism
  • Task parallelism
  • Data parallelism

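These two criteria can be related with a small amount of notation (a minimal sketch; P, t_in and t_out are symbols assumed here rather than taken from the slides: P is the steady-state pipeline period, and t_in(i), t_out(i) are the arrival and completion times of data item i):

```latex
\text{throughput} = \frac{1}{P}, \qquad
\text{latency}(i) = t_{\text{out}}(i) - t_{\text{in}}(i)
```

For instance, the schedule on the next slide accepts a new data item every 10 time units (throughput 0.1) while each item takes 37 time units end to end (latency 37).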
13
An Example Pipelined Schedule
[Figure: Gantt chart of an example pipelined schedule for tasks T1-T4 (weights 10, 8, 12, 15), with successive instances T1(1), T1(2), ... overlapped across processors to illustrate pipelined, task and data parallelism; Throughput = 0.1, Latency = 37]
14
Optimizing Latency while Meeting Throughput Constraints
  • Given:
  • A workflow DAG with runtime and data volume estimates
  • A collection of homogeneous processors
  • A throughput constraint
  • Goal:
  • Generate a schedule that meets the throughput
    constraint while minimizing workflow latency

This requires leveraging pipelined, task and data parallelism
in a coordinated manner; the problem input is sketched below.
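A minimal sketch of this problem input as a data structure; the class and field names below are illustrative choices, not identifiers from the paper:

```python
from dataclasses import dataclass

@dataclass
class StreamingWorkflow:
    """Workflow DAG with runtime and data-volume estimates (illustrative)."""
    tasks: list[str]                              # DAG nodes
    edges: list[tuple[str, str]]                  # precedence edges (u, v)
    exec_time: dict[str, float]                   # estimated runtime per task
    data_volume: dict[tuple[str, str], float]     # data transferred per edge

@dataclass
class SchedulingProblem:
    """Latency optimization under a throughput constraint."""
    workflow: StreamingWorkflow
    num_processors: int       # homogeneous processors available
    throughput: float         # required data items per unit time
```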
15
Pipelined Scheduling Heuristic
  • Three-phase approach (a skeleton is sketched below)
  • Phase 1: Satisfying the throughput requirement
  • Assumes an unbounded number of processors
  • Employs clustering, replication and duplication
    to meet the throughput requirement
  • Phase 2: Limiting the number of processors used
  • Merges task clusters to reduce the number of
    processors used until a feasible schedule is
    obtained
  • Preference given to decisions that minimize
    latency
  • Phase 3: Minimizing the workflow latency
  • Minimizes communication costs along the critical
    path by duplication and clustering

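A much-simplified, runnable skeleton of the three phases follows; it keeps only the shape of the heuristic (rate-driven replication, then merging to fit the processor budget) and omits communication costs, duplication and the latency-aware choices the slides describe:

```python
import math

def pipelined_schedule(exec_time, num_procs, throughput):
    """Toy sketch of the three-phase heuristic (illustrative, not the authors' code).

    exec_time: dict mapping task name -> runtime estimate.
    """
    period = 1.0 / throughput

    # Phase 1 (sketch): one cluster per task, replicated enough times that
    # each cluster can emit one result per period.
    clusters = [{"tasks": [t], "work": w, "procs": max(1, math.ceil(w / period))}
                for t, w in exec_time.items()]

    # Phase 2 (sketch): merge the two lightest clusters until the processor
    # budget is met (the real heuristic prefers latency-friendly merges).
    while sum(c["procs"] for c in clusters) > num_procs and len(clusters) > 1:
        clusters.sort(key=lambda c: c["work"])
        a, b = clusters[0], clusters[1]
        work = a["work"] + b["work"]
        clusters = clusters[2:] + [{"tasks": a["tasks"] + b["tasks"],
                                    "work": work,
                                    "procs": max(1, math.ceil(work / period))}]

    # Phase 3 (sketch): latency refinement (duplication and clustering along
    # the critical path) is omitted in this toy version.
    return clusters
```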
16
Task Clustering
17
Task Replication
  • Throughput = 0.1
  • Replication to (a replica-count bound is sketched below):
  • Improve computation throughput
  • Improve communication throughput

[Figure: example DAG with tasks T1-T4 (weights including 10, 18, 12 and 15); the heavier tasks are replicated so the schedule can sustain throughput 0.1]
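The computation-throughput side of replication can be captured by a simple bound: a task needs enough copies that, together, they finish one data item per period. A minimal sketch under that assumption (the function name is illustrative):

```python
import math

def replicas_needed(exec_time, throughput):
    """Copies of a task needed so its aggregate rate matches the target
    throughput (a simplification of the replication step on this slide)."""
    return max(1, math.ceil(exec_time * throughput))

# With the slide's throughput constraint of 0.1 (one item every 10 time units),
# a task of weight 15 needs ceil(15 * 0.1) = 2 replicas,
# while a task of weight 10 keeps up with a single copy.
```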
18
Task Duplication
[Figure: sample application DAG; (a) schedule without duplication, (b) schedule with duplication]
19
Duplication-based Scheduling of Streaming Workflows
  • In the context of streaming workflows, duplication can be
    used to avoid bottleneck data transfers without
    compromising task parallelism
  • Minimize workflow latency

[Figure: example with T = 0.1 and P = 4; T1 (weight 15) is duplicated so that T2 and T3 (weight 8 each) can run task-parallel without a bottleneck data transfer, feeding T4 (weight 10)]
20
Duplication-based Scheduling of Streaming Workflows
  • However,
  • Duplication can require more processors due to
    redundant computation
  • Depends on weight of duplicated task and
    throughput constraint
  • Extra communication to broadcast input data to
    duplicates
  • May increase latency too!
  • Selectively duplicate ancestors (see the decision sketch below)
  • Duplication is done only if:
  • There are available processors
  • It proves beneficial in terms of latency
  • It does not involve expensive communications that
    violate the throughput requirement

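The three conditions above translate directly into a guard around each candidate duplication. A minimal sketch; the inputs are assumed to come from the scheduler's own estimators, and the parameter names are illustrative:

```python
def should_duplicate(free_procs, latency_without, latency_with,
                     comm_violates_throughput):
    """Decide whether to duplicate an ancestor task (illustrative guard).

    free_procs: processors still available for the duplicate
    latency_without / latency_with: estimated workflow latency before and
        after the candidate duplication
    comm_violates_throughput: True if broadcasting the input data to the
        duplicates would break the throughput constraint
    """
    return (free_procs > 0
            and latency_with < latency_without
            and not comm_violates_throughput)
```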
21
Estimating Throughput and Latency
  • Execution Model
  • Realistic k-port communication model
  • Communication and computation overlap
  • Throughput Estimate: min(CompRate, CommRate) (sketched below)
  • Computation Rate (CompRate) Estimate:
  • min over all clusters Ci of Procs(Ci) / exec_time(Ci)
  • Communication Rate (CommRate) Estimate:
  • Greedy priority-based scheduling of communications
    to channels/ports
  • min over all transfers trj of
    ParallelTransfers(trj) / min_cycle_time(trj)
  • Latency Estimate:
  • Takes into account both communication and
    computation dependencies

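The throughput estimate above can be written out directly: the computation rate is limited by the slowest cluster, the communication rate by the slowest transfer cycle, and the overall estimate is their minimum. A minimal sketch, with the greedy port-scheduling step abstracted into precomputed per-transfer values:

```python
def computation_rate(clusters):
    """CompRate = min over clusters Ci of Procs(Ci) / exec_time(Ci)."""
    return min(c["procs"] / c["exec_time"] for c in clusters)

def communication_rate(transfers):
    """CommRate = min over transfers trj of
    ParallelTransfers(trj) / min_cycle_time(trj)."""
    return min(t["parallel_transfers"] / t["min_cycle_time"] for t in transfers)

def throughput_estimate(clusters, transfers):
    """Overall estimate: min(CompRate, CommRate)."""
    return min(computation_rate(clusters), communication_rate(transfers))
```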
22
An Example
  • P = 4, throughput constraint T = 0.1
  • Satisfying the throughput:
  • nr(T1) = 0.8, nr(T2) = 1, nr(T3) = 0.4, nr(T4) = 0.5,
    nr(T5) = 0.4, nr(T6) = 0.2 (reproduced in the sketch below)
  • Expensive communications: e(T1,T2), e(T3,T4), e(T3,T5)
  • Cluster T1 and T2
  • Duplicate T3
  • Limiting the number of processors:
  • P_used = 5
  • Two options to reduce P_used:
  • Merging T1, T2 and T6
  • Merging T3, T5 and T6
  • Merge T3, T5 and T6 -> reduces latency
  • Minimizing latency:
  • Nothing to be done!

[Figure: example DAG with six tasks T1-T6 (computation weights 8, 10, 4, 5, 4 and 2) and their edge data-transfer weights]
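Assuming nr(Ti) denotes the fractional processor requirement exec_time(Ti) × T (an interpretation consistent with the numbers above rather than a definition stated on the slide), the values can be reproduced from the task weights of the example DAG:

```python
# Task weights of the example DAG and the throughput constraint T = 0.1.
exec_time = {"T1": 8, "T2": 10, "T3": 4, "T4": 5, "T5": 4, "T6": 2}
T = 0.1

nr = {task: weight * T for task, weight in exec_time.items()}
print(nr)  # {'T1': 0.8, 'T2': 1.0, 'T3': 0.4, 'T4': 0.5, 'T5': 0.4, 'T6': 0.2}
```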
23
The Pipelined Schedule
Throughput = 0.1, Latency = 28
[Figure: Gantt chart of the resulting pipelined schedule on processors P1-P4: the T1-T2 cluster is replicated on two processors, T3 is duplicated, and the T3/T4 and T3/T5/T6 instances run in pipelined fashion on the other two processors]
24
Performance on Synthetic Benchmarks
[Figure: latency results on synthetic benchmarks for (a) CCR = 0.1, (b) CCR = 1 and (c) CCR = 10]
  • As CCR is increased, there are more instances where FCP and
    EXPERT do not meet the throughput requirement
  • The proposed approach always meets the throughput
    requirement and produces lower latencies
25
Benefit of Task Duplication
[Figure: benefit of task duplication for (a) CCR = 1 and (b) CCR = 10]
  • As the throughput constraint is relaxed, greater benefit is
    observed (more processors are available for duplication)
  • For a negligible throughput constraint, clustering
    doesn't have much adverse impact on latency

26
Performance on Applications
[Figure: performance of MPEG video compression on 32 processors, (a) latency ratio and (b) utilization ratio]
  • MPEG frames are processed in order of arrival, so
    no replication is used
  • The throughput constraint is assumed to be the reciprocal
    of the weight of the largest task
  • The proposed approach yields latency similar to FCP,
    but with lower resource utilization
  • The proposed approach generates lower latency than
    EXPERT

27
Throughput Optimization under Latency Constraint
  • Relation between throughput and latency:
  • Monotonically increasing
  • Binary search algorithm on the inverse problem (sketched below)
  • L = required latency
  • If L > L_max, output T_max
  • If L_min < L < L_max, do binary search
    (starting at T = T_max/2)
  • However, as we use heuristics, the monotonic
    relation is not guaranteed
  • We use look-ahead techniques to avoid local optima

[Figure: throughput as a function of achievable latency, increasing from (L_min, 0) to (L_max, T_max)]
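A minimal sketch of this binary search; `achieved_latency` stands in for running the latency-optimization heuristic of the previous slides under a given throughput constraint, and the look-ahead safeguards against non-monotonic heuristic behaviour are omitted:

```python
def max_throughput_under_latency(achieved_latency, L, T_max,
                                 tol=1e-3, max_iter=50):
    """Binary-search the largest throughput whose schedule meets latency L.

    achieved_latency(T): latency of the schedule produced by the heuristic
    under throughput constraint T (assumed roughly increasing in T).
    """
    # If even the tightest constraint T_max already meets L (i.e. L >= L_max),
    # output T_max directly.
    if achieved_latency(T_max) <= L:
        return T_max

    lo, hi = 0.0, T_max          # invariant: lo feasible, hi infeasible
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0    # first probe is T_max / 2
        if achieved_latency(mid) <= L:
            lo = mid             # latency met: try a larger throughput
        else:
            hi = mid             # latency violated: back off
        if hi - lo < tol:
            break
    return lo
```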
28
Throughput Optimization under Latency Constraint
[Figure: throughput achieved under latency constraints for (a) CCR = 0.1 and (b) CCR = 1]
  • The proposed approach generates schedules with larger
    throughput that meet the latency constraints
  • It meets latency constraints even when other schemes
    fail

29
Related Work
  • Bag-of-Tasks applications
  • H. Casanova, D. Zagorodnov, F. Berman, and A.
    Legrand. Heuristics for scheduling parameter
    sweep applications in grid environments. HCW00.
  • Arnaud Giersch, Yves Robert, and Frédéric Vivien.
    Scheduling tasks sharing files on heterogeneous
    master-slave platforms. Journal of Systems
    Architecture, 2006.
  • K. Kaya and C. Aykanat. Iterative-improvement-based
    heuristics for adaptive scheduling of tasks
    sharing files on heterogeneous master-slave
    environments. IEEE TPDS, 2006.
  • Non-streaming workflows
  • S. Ramaswamy, S. Sapatnekar, and P. Banerjee. A
    framework for exploiting task and data
    parallelism on distributed memory multicomputers.
    IEEE TPDS, 1997.
  • A. Radulescu and A. van Gemund. A low-cost
    approach towards mixed task and data parallel
    scheduling. ICPP, 2001.
  • A. Radulescu, C. Nicolescu, A. J. C. van Gemund, and
    P. Jonker. CPR: Mixed task and data parallel
    scheduling for distributed systems. IPDPS, 2001.
  • K. Aida and H. Casanova. Scheduling
    Mixed-Parallel Applications with Advance
    Reservations. HPDC, 2008.
  • Streaming workflows
  • F. Guirado, A. Ripoll, C. Roig, and E. Luque.
    Optimizing latency under throughput requirements
    for streaming applications on cluster execution.
    Cluster Computing, 2005.
  • Matthew Spencer, Renato Ferreira, Michael Beynon,
    Tahsin Kurc, Umit Catalyurek, Alan Sussman, and
    Joel Saltz. Executing multiple pipelined data
    analysis operations in the grid. SC, 2002
  • Anne Benoit and Yves Robert. Mapping pipeline
    skeletons onto heterogeneous platforms. Technical
    Report LIP RR-2006-40, 2006.
  • Anne Benoit and Yves Robert. Complexity results
    for throughput and latency optimization of
    replicated and data-parallel workflows. Technical
    Report LIP RR-2007-12, 2007.
  • Anne Benoit, Harald Kosch, Veronika Rehn-Sonigo
    and Yves Robert. Optimizing latency and
    reliability of pipeline workflow applications,
    Technical Report LIP RR-2008-12, 2008.

30
Conclusions and Future Work
  • Streaming Workflows
  • Co-ordinated use of task-, data- and
    pipelined-parallelism
  • Multiple performance objectives (latency and
    throughput)
  • Consistently meets throughput requirements
  • Lower latency schedules using fewer resources
  • Larger throughput schedules while meeting latency
    requirements
  • Future Work
  • Scheduling for multi-core clusters
  • Deeper memory hierarchies
  • Power-aware approaches
  • Fault-tolerant approaches

31
Thanks
  • Questions?
  • Contact Info: Umit Catalyurek
  • umit@bmi.osu.edu
  • http://bmi.osu.edu/umit
  • OSU Dept. of Biomedical Informatics:
    http://bmi.osu.edu