A Stable Broadcast Algorithm - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: A Stable Broadcast Algorithm


1
A Stable Broadcast Algorithm
  • Kei Takahashi, Hideo Saito
  • Takeshi Shibata, Kenjiro Taura
  • (The University of Tokyo, Japan)

CCGrid 2008 - Lyon, France
2
Broadcasting Large Messages
  • Distribute the same large data to many nodes
  • e.g. content delivery
  • Widely used in parallel processing

3
Problem of Broadcast
  • Usually, in a broadcast, the source can deliver far less data to each destination than in a single transfer from the source

[Figure: a single transfer delivers 100 from S to D, while a broadcast to four destinations delivers only 25 to each]
4
Problem of Slow Nodes
  • Pipelined transfers improve performance
  • Even in a pipelined transfer, nodes with small bandwidth (slow nodes) may degrade the receiving bandwidth of all other nodes

[Figure: a pipeline over links of bandwidth 100 and 10; the slow links limit what the other nodes can receive]
5
Contributions
  • Propose the notion of a Stable Broadcast
  • In a stable broadcast:
  • Slow nodes never degrade the receiving bandwidth of other nodes
  • All nodes receive the maximum possible amount of data

6
Contributions (cont.)
  • Propose a stable broadcast algorithm for tree topologies
  • Proved stable in a theoretical model
  • Also improves performance on general graph networks
  • In a real-machine experiment, our algorithm achieved 2.5 times the aggregate bandwidth of the previous algorithm (FPFR)

7
Agenda
  • Introduction
  • Problem Settings
  • Related Work
  • Proposed Algorithm
  • Evaluation
  • Conclusion

8
Problem Settings
  1. Target: broadcast of large messages
  2. Only computational nodes handle messages

lt A Stable Broadcast Algorithm gt Kei Takahashi,
Hideo Saito, Takeshi Shibata and Kenjiro Taura
9
Problem Settings (cont.)
  • Only bandwidth matters for large messages
  • (Transfer time) ≫ (Latency), as the worked example below shows
  • Bandwidth is limited only by link capacities
  • Assume that nodes and switches have enough processing throughput

[Figure: example with message size 1 GB, bandwidth 1 Gbps, latency 50 msec]
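As a rough check of the claim above, a minimal calculation with the numbers shown on the slide (1 GB message, 1 Gbps link, 50 msec latency):

```python
# Rough check: a 1 GB message on a 1 Gbps link takes about 8 s,
# so the 50 msec latency is negligible for large messages.
message_bits = 1e9 * 8          # 1 GB expressed in bits
bandwidth_bps = 1e9             # 1 Gbps
latency_s = 0.05                # 50 msec
transfer_time_s = message_bits / bandwidth_bps
print(transfer_time_s, ">>", latency_s)   # 8.0 >> 0.05
```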
10
Problem Settings (cont.)
  • Bandwidth-annotated topologies are given in
    advance
  • Bandwidth and topologies can be rapidly inferred
  • - Shirai et al. A Fast Topology Inference - A
    building block for network-aware parallel
    computing. (HPDC 2007)
  • - Naganuma et al. Improving Efficiency of Network
    Bandwidth Estimation Using Topology Information
    (SACSIS 2008, Tsukuba, Japan)

[Figure: a topology annotated with link bandwidths such as 80, 10, 30, 100 and 40]
11
Evaluation of Broadcast
  • Previous algorithms evaluated a broadcast by its completion time
  • However, completion time cannot capture the effect of slowly receiving nodes
  • It is desirable that each node receives as much data as possible
  • Aggregate bandwidth is a more reasonable evaluation criterion in many cases

12
Definition of Stable Broadcast
  • All nodes receive the maximum possible bandwidth
  • The receiving bandwidth of each node is not reduced by adding other nodes to the broadcast (see the toy check below)

[Figure: a single transfer gives D2 a bandwidth of 120; in a stable broadcast D0-D3 receive 10, 120, 100 and 100, matching their direct transfers]
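As an illustration of the definition (not from the paper), a toy check using the figure's numbers, assuming the broadcast gives D0-D3 exactly their direct-transfer bandwidths:

```python
# A broadcast is stable if no destination receives less bandwidth than it
# would in a direct transfer from the source (toy check; numbers from the figure).
def is_stable(broadcast_bw, direct_bw):
    return all(broadcast_bw[d] >= direct_bw[d] for d in direct_bw)

direct = {"D0": 10, "D1": 120, "D2": 100, "D3": 100}
print(is_stable({"D0": 10, "D1": 120, "D2": 100, "D3": 100}, direct))  # True
print(is_stable({"D0": 10, "D1": 60, "D2": 100, "D3": 100}, direct))   # False: D1 degraded
```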
13
Properties of Stable Broadcast
  • Maximize aggregate bandwidth
  • Minimize completion time

14
Agenda
  • Introduction
  • Problem Settings
  • Related Work
  • Proposed Algorithm
  • Evaluation
  • Conclusion

15
Single-Tree Algorithms
  • Flat tree
  • The outgoing link from the source becomes a bottleneck
  • Random pipeline
  • Some links are used by many transfers and become bottlenecks
  • Depth-first pipeline
  • Each link is used only once, but fast nodes suffer from slow nodes
  • Dijkstra
  • Fast nodes do not suffer from slow nodes, but some links are used many times

[Figure: example trees built by Flat Tree, Random Pipeline, Dijkstra and Depth-First (FPFR)]
16
FPFR Algorithm
  • FPFR (Fast Parallel File Replication) improves aggregate bandwidth over algorithms that use only one tree
  • Idea:
  • (1) Construct multiple spanning trees
  • (2) Use these trees in parallel

Izmailov et al. Fast Parallel File
Replication in Data Grid. (GGF-10, March
2004.)
17
Tree constructions in FPFR
  • Iteratively construct spanning trees (a sketch follows below)
  • Create a spanning tree (Tn) by tracing every destination
  • Set the throughput (Vn) to the bottleneck bandwidth in Tn
  • Subtract Vn from the remaining bandwidth of each link

[Figure: the first spanning tree T1 with its bottleneck link; tree throughputs are V1 and V2]
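A minimal sketch of this loop (an illustration only, not the authors' implementation). Link capacities are kept per direction in a nested dict, and the depth-first trace keeps every reachable node, which is a simplification of a real FPFR tree:

```python
def depth_first_tree(cap, src, dests):
    """Trace from src over links with spare capacity.
    Returns (tree edges, destinations reached, bottleneck bandwidth)."""
    edges, visited, stack = [], {src}, [src]
    while stack:
        u = stack.pop()
        for v in cap[u]:
            if v not in visited and cap[u][v] > 0:
                visited.add(v)
                edges.append((u, v))
                stack.append(v)
    bottleneck = min((cap[u][v] for u, v in edges), default=0)
    return edges, visited & set(dests), bottleneck

def fpfr_trees(cap, src, dests):
    """FPFR: build spanning trees until none is left in the residual network."""
    trees = []
    while True:
        tree, reached, v_n = depth_first_tree(cap, src, dests)
        if v_n == 0 or reached != set(dests):
            break                      # no spanning tree remains
        for a, b in tree:              # subtract Vn from every link the tree uses
            cap[a][b] -= v_n
        trees.append((tree, v_n))
    return trees
```

For example, a capacity map like `{"S": {"A": 100}, "A": {"S": 100, "B": 40}, "B": {"A": 40}}` with source "S" and destinations ["A", "B"] yields one spanning tree with throughput 40.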
18
Data transfer with FPFR
  • Each tree sends a different fraction of the data in parallel (see the sketch below)
  • The proportion of data sent through each tree may be optimized by linear programming (Balanced Multicasting)

[Figure: T1 sends the former part of the data and T2 the latter part, at throughputs V1 and V2]
den Burger et al. Balanced Multicasting: High-throughput Communication for Grid Applications. (SC 2005)
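A small sketch of the proportional split (illustration only; the linear-programming refinement of Balanced Multicasting is not shown):

```python
def split_message(size, throughputs):
    """Give each tree a share of the message proportional to its throughput,
    so all trees finish their share at roughly the same time."""
    total = sum(throughputs)
    return [size * v / total for v in throughputs]

# e.g. a 1 GB message over trees with throughputs V1=300 and V2=100
# is split into roughly 750 MB and 250 MB.
print(split_message(10**9, [300, 100]))
```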
19
Problems of FPFR
  • In FPFR, slow nodes degrade the receiving bandwidth of other nodes
  • For tree topologies, FPFR outputs only one depth-first pipeline, which cannot exploit the network's potential performance

20
Agenda
  • Introduction
  • Problem Settings
  • Related Work
  • Our Algorithm
  • Evaluation
  • Conclusion

21
Our Algorithm
  • Modify the FPFR algorithm
  • Create both spanning trees and partial trees
  • Stable for tree topologies whose links have the same bandwidth in both directions

22
Tree Constructions
  • Iteratively construct trees (a sketch follows below)
  • Create a tree Tn by tracing every reachable destination
  • Set the throughput Vn to the bottleneck bandwidth in Tn
  • Subtract Vn from the remaining link capacities

[Figure: over source S and nodes A, B, C, the first tree T1 is a spanning tree with throughput V1, while the second and third trees T2 and T3 are partial trees with throughputs V2 and V3]
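A sketch of the modification relative to FPFR, reusing the depth_first_tree helper from the FPFR sketch earlier in this transcript (illustration only): the loop keeps building partial trees for whichever destinations remain reachable, instead of stopping when no spanning tree is left.

```python
def stable_trees(cap, src, dests):
    """Build spanning trees first, then partial trees, until the source
    can no longer reach any destination with spare capacity."""
    trees = []
    while True:
        tree, reached, v_n = depth_first_tree(cap, src, dests)
        if v_n == 0 or not reached:
            break                          # no destination is reachable any more
        for a, b in tree:                  # subtract Vn from the links used
            cap[a][b] -= v_n
        trees.append((tree, reached, v_n)) # remember which destinations Tn covers
    return trees
```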
23
Data Transfer
  • Send data proportional to the tree throughput Vn (a sketch of the staging follows below)
  • Example:
  • Stage 1: use T1, T2 and T3
  • Stage 2: use T1 and T2 to send the data previously sent by T3
  • Stage 3: use T1 to send the data previously sent by T2

[Figure: transfer schedule for trees T1, T2, T3 (throughputs V1, V2, V3) over source S and destinations A, B, C]
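A toy sketch of the staging on this slide (my reading of the example, not the authors' code), with the trees ordered so that T1 covers the most destinations:

```python
def schedule(trees):
    """trees: [(name, throughput), ...], widest coverage first.
    Stage 1 uses all trees; each later stage drops the narrowest remaining
    tree and re-sends its chunk through the wider trees."""
    stages = []
    for k in range(len(trees)):
        active = [name for name, _ in trees[:len(trees) - k]]
        chunk = "their own shares" if k == 0 else f"the chunk of {trees[len(trees) - k][0]}"
        stages.append((k + 1, active, chunk))
    return stages

for stage, active, chunk in schedule([("T1", 300), ("T2", 200), ("T3", 100)]):
    print(f"stage {stage}: {active} send {chunk}")
# stage 1: ['T1', 'T2', 'T3'] send their own shares
# stage 2: ['T1', 'T2'] send the chunk of T3
# stage 3: ['T1'] send the chunk of T2
```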
24
Properties of Our Algorithm
  • Our algorithm is stable for tree topologies (whose links have the same capacity in both directions)
  • Every node receives the maximum possible bandwidth
  • For any topology, it achieves greater aggregate bandwidth than the baseline algorithm (FPFR)
  • Partial trees let it fully utilize link capacity
  • The computational cost of creating a broadcast plan is small

25
Agenda
  • Introduction
  • Problem Settings
  • Related Work
  • Proposed Algorithm
  • Evaluation
  • Conclusion

26
(1) Simulations
  • Simulated 5 broadcast algorithms on a real topology
  • Compared the aggregate bandwidth of each method
  • Many bandwidth distributions
  • Broadcasts to 10, 50 and 100 nodes
  • 10 different (source, destination) conditions

27
Compared Algorithms
  • Random
  • Flat Tree
  • Depth-First (FPFR)
  • Dijkstra
  • Ours
28
Result of Simulations
  • Mixed two kinds of links (100 and 1000)
  • Vertical axis: speedup over Flat Tree
  • 40 times more than Random and 3 times more than Depth-First (FPFR) with 100 nodes

29
Result of Simulations (cont.)
  • Tested 8 bandwidth distributions
  • Uniform distribution (500-1000)
  • Uniform distribution (100-1000)
  • Mixed 100 and 1000 links
  • Uniform distribution (500-1000) between switches
  • (for each distribution, tested two conditions: link bandwidths equal in both directions, and different)
  • Our method achieved the largest aggregate bandwidth in 7 of 8 cases
  • The improvement is especially large when the bandwidth variance is large
  • With the uniform distribution (100-1000) and different bandwidths in the two directions, Dijkstra achieved 2% more aggregate bandwidth

30
(2) Real Machine Experiment
  • Performed broadcasts across 4 clusters
  • Number of destinations: 10, 47 and 105 nodes
  • Bandwidth of each link: 10 Mbps - 1 Gbps
  • Compared the aggregate bandwidth of 4 algorithms:
  • Our algorithm
  • Depth-first (FPFR)
  • Dijkstra
  • Random (best among 100 trials)

31
Theoretical Maximum Aggregate Bandwidth
  • We also calculated the theoretical maximum aggregate bandwidth (a sketch follows below)
  • It is the sum of the receiving bandwidths obtained when the source makes a separate direct transfer to each destination

[Figure: separate direct transfers from the source give D0-D3 bandwidths of 10, 120, 100 and 100]
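A sketch of that calculation for a tree topology (illustration only; assumes the same capacity-map representation as the earlier sketches):

```python
def direct_bandwidth(cap, src, dst):
    """Bottleneck capacity on the unique src-to-dst path of a tree topology."""
    def dfs(u, prev, bottleneck):
        if u == dst:
            return bottleneck
        for v, c in cap[u].items():
            if v != prev:
                found = dfs(v, u, min(bottleneck, c))
                if found is not None:
                    return found
        return None
    return dfs(src, None, float("inf"))

def theoretical_max(cap, src, dests):
    """Sum of each destination's direct-transfer bandwidth
    (e.g. 10 + 120 + 100 + 100 = 330 for the figure above)."""
    return sum(direct_bandwidth(cap, src, d) for d in dests)
```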
32
Evaluation of Aggregate Bandwidth
  • For the 105-node broadcast, 2.5 times the aggregate bandwidth of the baseline algorithm Depth-First (FPFR)
  • However, our algorithm achieved only 50-70% of the theoretical maximum aggregate bandwidth
  • Computational nodes cannot fully utilize their uplink and downlink

33
Evaluation of Stability
  • Compared the aggregate bandwidth of 9 nodes before and after adding one slow node
  • Unlike Depth-First (FPFR), in our algorithm the existing nodes do not suffer when a slow node is added
  • Achieved 1.6 times the bandwidth of Dijkstra

34
Agenda
  • Introduction
  • Problem Settings
  • Related Work
  • Our Algorithm
  • Evaluation
  • Conclusion

35
Conclusion
  • Introduced the notion of a Stable Broadcast
  • Slow nodes never degrade the receiving bandwidth of fast nodes
  • Proposed a stable broadcast algorithm for tree topologies
  • Proved stable in a theoretical model
  • Achieved 2.5 times the aggregate bandwidth of the previous algorithm (FPFR) in real-machine experiments
  • Confirmed speedups in simulations under many different conditions

36
Future Work
  • Algorithm that maximizes aggregate bandwidth in
    general graph topologies
  • Algorithm that changes relay schedule by
    detecting bandwidth fluctuations

38
All the graphs
39
Broadcast with BitTorrent
  • BitTorrent gradually improves the transfer schedule by adaptively choosing parent nodes
  • Since the relaying structure created by BitTorrent has many branches, the links at those branches may become bottlenecks

[Figure: a snapshot of the transfer tree; a branching link becomes the bottleneck]
Wei et al. Scheduling Independent Tasks Sharing Large Data Distributed with BitTorrent. (GRID 2005)
40
Simulation 1
  • Uniform distribution (100-1000) between switches
  • Vertical axis: speedup over Flat Tree
  • 36 times more than Flat Tree and 1.2 times more than Depth-First (FPFR) for the 100-node broadcast

41
Topology-unaware pipeline
  • Trace all the destinations from the source
  • Some links used by many transfers become
    bottlenecks

42
Depth-first Pipeline
  • Construct a depth-first pipeline by using
    topology information
  • Avoid link sharing by using each link only once
  • Minimize the completion time in a tree topology
  • Slow nodes degrade the performance of other nodes

Shirai et al. A Fast Topology Inference - A
building block for network-aware parallel
computing. (HPDC 2007)
43
Dijkstra Algorithm
  • Construct a relaying structure greedily (a sketch follows below)
  • Add nodes one by one, each time attaching the node reachable with the maximum bandwidth
  • The effect of slow nodes is small
  • Some links may be used by many transfers and become bottlenecks

Wang et al. A novel data grid coherence protocol using pipeline-based aggressive copy method. (GPC, pages 484-495, 2007)
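A sketch of this greedy construction as a widest-path (maximum-bottleneck) variant of Dijkstra's algorithm (an illustration, not the exact method of Wang et al.):

```python
import heapq

def widest_tree(cap, src):
    """Attach nodes one by one, always via the path whose bottleneck
    bandwidth from the source is largest. Returns a child -> parent map."""
    best = {src: float("inf")}            # best bottleneck bandwidth found so far
    parent = {src: None}
    heap = [(-best[src], src)]
    while heap:
        neg_b, u = heapq.heappop(heap)
        if -neg_b < best[u]:
            continue                      # stale queue entry
        for v, c in cap[u].items():
            b = min(best[u], c)           # bottleneck of the path via u
            if b > best.get(v, 0):
                best[v], parent[v] = b, u
                heapq.heappush(heap, (-b, v))
    return parent
```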