Title: A Stable Broadcast Algorithm
1. A Stable Broadcast Algorithm
- Kei Takahashi, Hideo Saito
- Takeshi Shibata, Kenjiro Taura
- (The University of Tokyo, Japan)
CCGrid 2008 - Lyon, France
2. Broadcasting Large Messages
- Distribute the same, large data to many nodes
- e.g., content delivery
- Widely used in parallel processing
[Figure: the same data is delivered to every node]
3. Problem of Broadcast
- Usually, in a broadcast transfer, the source can deliver much less data to each node than in a single transfer from the source
[Figure: a single transfer from S to D achieves 100, while a naive broadcast from S delivers only 25 to each of four destinations]
4. Problem of Slow Nodes
- Pipelined transfers improve performance
- Even in a pipelined transfer, nodes with small bandwidth (slow nodes) may degrade the receiving bandwidth of all other nodes
[Figure: a pipeline over links of bandwidth 100 is throttled once it passes through a slow node with bandwidth 10]
5. Contributions
- Propose the idea of a Stable Broadcast
- In a stable broadcast:
- Slow nodes never degrade the receiving bandwidth of other nodes
- All nodes receive the maximum possible amount of data
6. Contributions (cont.)
- Propose a stable broadcast algorithm for tree topologies
- Proved to be stable in a theoretical model
- Improves performance on general graph networks
- In a real-machine experiment, our algorithm achieved 2.5 times the aggregate bandwidth of the previous algorithm (FPFR)
7. Agenda
- Introduction
- Problem Settings
- Related Work
- Proposed Algorithm
- Evaluation
- Conclusion
8. Problem Settings
- Target: broadcast of large messages
- Only computational nodes handle messages
9. Problem Settings (cont.)
- Only bandwidth matters for large messages
- (Transfer time) ≈ (Message size) / (Bandwidth) ≫ (Latency) (see the worked example below)
- Bandwidth is limited only by link capacities
- Assume that nodes and switches have sufficient processing throughput
[Figure: example with a 1 GB message, 1 Gbps bandwidth, and 50 ms latency]
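A hedged worked example with the figure's values (1 GB message, 1 Gbps bandwidth, 50 ms latency; the arithmetic is ours):

\[ T_{\text{transfer}} \approx \frac{\text{Message Size}}{\text{Bandwidth}} = \frac{8\,\text{Gbit}}{1\,\text{Gbps}} = 8\,\text{s} \gg 50\,\text{ms} = \text{Latency} \]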
10. Problem Settings (cont.)
- Bandwidth-annotated topologies are given in advance
- Bandwidth and topologies can be inferred rapidly:
- Shirai et al. A Fast Topology Inference - A Building Block for Network-aware Parallel Computing. (HPDC 2007)
- Naganuma et al. Improving Efficiency of Network Bandwidth Estimation Using Topology Information. (SACSIS 2008, Tsukuba, Japan)
[Figure: example topology annotated with link bandwidths of 80, 10, 30, 100, and 40]
11. Evaluation of Broadcast
- Previous algorithms evaluated a broadcast by its completion time
- However, completion time cannot capture the effect of slowly receiving nodes
- It is desirable that each node receives as much data as possible
- Aggregate bandwidth is a more reasonable evaluation criterion in many cases (formalized below)
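In symbols (notation ours, not from the slides): if b_d denotes the average receiving bandwidth of destination d during the broadcast, the aggregate bandwidth over the destination set D is

\[ B_{\text{agg}} = \sum_{d \in D} b_d. \]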
12. Definition of Stable Broadcast
- All nodes receive the maximum possible bandwidth
- The receiving bandwidth of each node is not reduced by adding other nodes to the broadcast (stated formally after the figure below)
[Figure: comparison of a single transfer (120 to D2) with a broadcast to D0-D3 over links of bandwidth 10, 120, 100, and 100]
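One hedged way to state the definition (notation ours): writing b_d^single for the bandwidth destination d would obtain from a dedicated single transfer out of the source, a broadcast is stable when

\[ b_d^{\text{broadcast}} = b_d^{\text{single}} \quad \text{for every destination } d, \]

so adding further destinations never lowers any node's receiving bandwidth.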
13. Properties of Stable Broadcast
- Maximize aggregate bandwidth
- Minimize completion time
14. Agenda
- Introduction
- Problem Settings
- Related Work
- Proposed Algorithm
- Evaluation
- Conclusion
15. Single-Tree Algorithms
- Flat tree
- The outgoing link from the source becomes a bottleneck
- Random pipeline
- Some links are used many times and become bottlenecks
- Depth-first pipeline
- Each link is used only once, but fast nodes suffer from slow nodes
- Dijkstra
- Fast nodes do not suffer from slow nodes, but some links are used many times
[Figure: example trees built by Flat Tree, Random Pipeline, Dijkstra, and Depth-First (FPFR)]
16. FPFR Algorithm
- FPFR (Fast Parallel File Replication) improves the aggregate bandwidth over algorithms that use only one tree
- Idea:
- (1) Construct multiple spanning trees
- (2) Use these trees in parallel
Izmailov et al. Fast Parallel File Replication in Data Grid. (GGF-10, March 2004)
17. Tree Constructions in FPFR
- Iteratively construct spanning trees (see the sketch after the figure below)
- Create a spanning tree (Tn) by tracing every destination
- Set the throughput (Vn) to the bottleneck bandwidth in Tn
- Subtract Vn from the remaining bandwidth of each link
[Figure: the first spanning tree (T1), its bottleneck link, and the tree throughputs V1 and V2]
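A minimal Python sketch of this loop (our reading of the slide; the identifiers and the depth-first trace helper are illustrative, not taken from the FPFR paper):

    def trace_spanning_tree(capacity, source, destinations):
        # Depth-first trace from the source over links that still have capacity.
        # Returns the traced edges, or None if some destination is unreachable.
        visited, edges, stack = {source}, [], [source]
        while stack:
            u = stack.pop()
            for (a, b), c in capacity.items():
                if a == u and b not in visited and c > 0:
                    visited.add(b)
                    edges.append((a, b))
                    stack.append(b)
        if not all(d in visited for d in destinations):
            return None
        return edges  # a full implementation would prune branches that reach no destination

    def fpfr_trees(capacity, source, destinations):
        # capacity: dict {(u, v): remaining bandwidth} for directed links.
        trees = []
        while True:
            tree = trace_spanning_tree(capacity, source, destinations)
            if tree is None:                    # no further spanning tree exists
                break
            v = min(capacity[e] for e in tree)  # bottleneck bandwidth = tree throughput Vn
            for e in tree:
                capacity[e] -= v                # subtract Vn from every link of the tree
            trees.append((tree, v))
        return trees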
18. Data Transfer with FPFR
- Each tree sends a different fraction of the data in parallel
- The proportion of data sent through each tree may be optimized by linear programming (Balanced Multicasting); a simpler proportional split is sketched below
[Figure: T1 sends the former part of the message and T2 sends the latter part, at rates V1 and V2]
den Burger et al. Balanced Multicasting: High-throughput Communication for Grid Applications. (SC 2005)
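A small sketch of the simpler proportional split, in which each tree carries a contiguous share of the message proportional to its throughput (the Balanced Multicasting work instead derives the shares with linear programming; the function is illustrative):

    def split_message(message_size, throughputs):
        # Assign each tree a contiguous byte range proportional to its throughput Vn.
        total = sum(throughputs)
        ranges, start = [], 0
        for v in throughputs:
            length = message_size * v // total
            ranges.append((start, start + length))
            start += length
        if ranges:
            # hand any rounding remainder to the last tree
            ranges[-1] = (ranges[-1][0], message_size)
        return ranges

    # Example: split_message(1000, [60, 40]) -> [(0, 600), (600, 1000)]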
19. Problems of FPFR
- In FPFR, slow nodes degrade the receiving bandwidth of other nodes
- For tree topologies, FPFR outputs only one depth-first pipeline, which cannot exploit the network's potential performance
[Figure: a bottleneck link in the single FPFR pipeline slows all downstream nodes]
20. Agenda
- Introduction
- Problem Settings
- Related Work
- Our Algorithm
- Evaluation
- Conclusion
21. Our Algorithm
- Modifies the FPFR algorithm
- Creates both spanning trees and partial trees
- Stable for tree topologies whose links have the same bandwidth in both directions
22. Tree Constructions
- Iteratively construct trees (see the sketch after the figure below)
- Create a tree Tn by tracing every reachable destination
- Set the throughput Vn to the bottleneck in Tn
- Subtract Vn from the remaining capacities
[Figure: the first tree T1 is a spanning tree over S, A, B, C with throughput V1; the second and third trees, T2 and T3, are partial trees with throughputs V2 and V3]
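A hedged sketch of the modified loop (identifiers are ours): it uses the same kind of depth-first trace as the FPFR sketch above, but keeps going with partial trees once a full spanning tree can no longer be built, stopping only when no destination is reachable.

    def trace_reachable(capacity, source, destinations):
        # Depth-first trace over links that still have capacity; also reports
        # which destinations the traced tree actually covers.
        visited, edges, stack = {source}, [], [source]
        while stack:
            u = stack.pop()
            for (a, b), c in capacity.items():
                if a == u and b not in visited and c > 0:
                    visited.add(b)
                    edges.append((a, b))
                    stack.append(b)
        covered = [d for d in destinations if d in visited]
        return edges, covered

    def stable_broadcast_trees(capacity, source, destinations):
        trees = []
        while True:
            edges, covered = trace_reachable(capacity, source, destinations)
            if not edges or not covered:         # no destination reachable any more
                break
            v = min(capacity[e] for e in edges)  # bottleneck = throughput Vn of this tree
            for e in edges:
                capacity[e] -= v
            # a spanning tree if covered == destinations, otherwise a partial tree
            trees.append((edges, covered, v))
        return trees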
23. Data Transfer
- Send data proportional to each tree's throughput Vn
- Example (see the sketch after the figure below):
- Stage 1: use T1, T2, and T3
- Stage 2: use T1 and T2 to send the data previously sent by T3
- Stage 3: use T1 to send the data previously sent by T2
[Figure: the staged schedule over trees T1 (V1), T2 (V2), and T3 (V3) for source S and nodes A, B, C]
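A toy sketch of the staged plan above, assuming the trees are ordered from widest coverage (T1) to narrowest (T3); in each later stage the narrowest remaining tree is dropped and the wider trees re-send the data it had delivered only to its own subset (the ordering assumption is ours):

    def stage_plan(tree_names):
        # tree_names ordered from widest to narrowest coverage, e.g. ["T1", "T2", "T3"].
        return [tree_names[:k] for k in range(len(tree_names), 0, -1)]

    # Example: stage_plan(["T1", "T2", "T3"]) -> [['T1', 'T2', 'T3'], ['T1', 'T2'], ['T1']]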
24. Properties of Our Algorithm
- Our algorithm is stable for tree topologies (whose links have the same capacities in both directions)
- Every node receives the maximum bandwidth
- For any topology, it achieves greater aggregate bandwidth than the baseline algorithm (FPFR)
- Partial trees fully utilize the link capacities
- The computational cost of creating a broadcast plan is small
25. Agenda
- Introduction
- Problem Settings
- Related Work
- Proposed Algorithm
- Evaluation
- Conclusion
26. (1) Simulations
- Simulated 5 broadcast algorithms on a real topology
- Compared the aggregate bandwidth of each method
- Many bandwidth distributions
- Broadcasts to 10, 50, and 100 nodes
- 10 different conditions (source, destinations)
27. Compared Algorithms
- Random
- Flat Tree
- Depth-First (FPFR)
- Dijkstra
- Ours
28. Results of Simulations
- Mixed two kinds of links (100 and 1000)
- Vertical axis: speedup over FlatTree
- 40 times more than Random and 3 times more than Depth-First (FPFR) with 100 nodes
29. Results of Simulations (cont.)
- Tested 8 bandwidth distributions:
- Uniform distribution (500-1000)
- Uniform distribution (100-1000)
- Mixed 100 and 1000 links
- Uniform distribution (500-100) between switches
- (for each distribution, tested two conditions: link bandwidths in the two directions are the same, or different)
- Our method achieved the largest aggregate bandwidth in 7 of 8 cases
- The improvement is especially large when the bandwidth variance is large
- With the uniform distribution (100-1000) and different bandwidths in the two directions, Dijkstra achieved 2% more aggregate bandwidth
30. (2) Real-Machine Experiment
- Performed broadcasts across 4 clusters
- Number of destinations: 10, 47, and 105 nodes
- Link bandwidths: 10 Mbps - 1 Gbps
- Compared the aggregate bandwidth of 4 algorithms:
- Our algorithm
- Depth-First (FPFR)
- Dijkstra
- Random (best among 100 trials)
31. Theoretical Maximum Aggregate Bandwidth
- We also calculated the theoretical maximum aggregate bandwidth
- The total of the receiving bandwidths when a separate direct transfer is made from the source to each destination (illustrated below)
[Figure: separate direct transfers from the source give D0-D3 bandwidths of 10, 120, 100, and 100]
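A hedged illustration of how this baseline can be computed; the values below are read off the figure, and their assignment to D0-D3 is our assumption:

    # Bandwidth each destination would receive in its own separate direct transfer
    # from the source (illustrative values; units are not given on the slide).
    direct_bandwidth = {"D0": 10, "D1": 120, "D2": 100, "D3": 100}
    theoretical_max = sum(direct_bandwidth.values())  # 10 + 120 + 100 + 100 = 330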
32. Evaluation of Aggregate Bandwidth
- For the 105-node broadcast, 2.5 times the aggregate bandwidth of the baseline algorithm, Depth-First (FPFR)
- However, our algorithm stayed at 50-70% of the theoretical maximum aggregate bandwidth
- Computational nodes cannot fully utilize the up/down links of the network
33. Evaluation of Stability
- Compared the aggregate bandwidth of 9 nodes before/after adding one slow node
- Unlike Depth-First (FPFR), in our algorithm the existing nodes do not suffer from the added slow node
- Achieved 1.6 times the bandwidth of Dijkstra
34. Agenda
- Introduction
- Problem Settings
- Related Work
- Our Algorithm
- Evaluation
- Conclusion
35. Conclusion
- Introduced the notion of a Stable Broadcast
- Slow nodes never degrade the receiving bandwidth of fast nodes
- Proposed a stable broadcast algorithm for tree topologies
- Proved stable in a theoretical model
- 2.5 times the aggregate bandwidth in real-machine experiments
- Confirmed speedups in simulations under many different conditions
36. Future Work
- An algorithm that maximizes aggregate bandwidth in general graph topologies
- An algorithm that changes the relay schedule by detecting bandwidth fluctuations
38. All the graphs
39. Broadcast with BitTorrent
- BitTorrent gradually improves the transfer schedule by adaptively choosing parent nodes
- Since the relaying structure created by BitTorrent has many branches, some links may become bottlenecks
[Figure: snapshot of a BitTorrent transfer tree with a bottleneck link]
Wei et al. Scheduling Independent Tasks Sharing Large Data Distributed with BitTorrent. (GRID 2005)
40. Simulation 1
- Uniform distribution (100-1000) between switches
- Vertical axis: speedup over FlatTree
- 36 times more than FlatTree and 1.2 times more than Depth-First (FPFR) for the 100-node broadcast
41. Topology-unaware Pipeline
- Trace all the destinations from the source
- Some links are used by many transfers and become bottlenecks
42. Depth-first Pipeline
- Construct a depth-first pipeline using topology information (see the sketch below)
- Avoid link sharing by using each link only once
- Minimizes the completion time in a tree topology
- Slow nodes degrade the performance of other nodes
Shirai et al. A Fast Topology Inference - A
building block for network-aware parallel
computing. (HPDC 2007)
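A minimal sketch of turning a known tree topology into a depth-first pipeline order; each node forwards the data to the next node in the chain, so every link is traversed at most once in each direction (the code is illustrative, not the cited implementation):

    def depth_first_order(children, root):
        # children: dict mapping each node to its child nodes in the physical tree.
        order, stack = [], [root]
        while stack:
            u = stack.pop()
            order.append(u)
            stack.extend(reversed(children.get(u, [])))
        return order  # forward data along this chain: order[0] -> order[1] -> ...

    # Example: depth_first_order({"S": ["A", "B"], "A": ["C"]}, "S") -> ['S', 'A', 'C', 'B']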
43. Dijkstra Algorithm
- Construct a relaying structure in a greedy manner (see the sketch below)
- Add, one by one, the node reachable with the maximum bandwidth
- The effect of slow nodes is small
- Some links may be used by many transfers and become bottlenecks
Wang et al. A novel data grid coherence protocol using pipeline-based aggressive copy method. (GPC 2007, pages 484-495)
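A sketch in the spirit of this greedy construction: starting from the source, repeatedly attach the node that can be reached from the already-connected set with the largest bottleneck bandwidth (a Prim-style widest-path tree; identifiers are illustrative):

    def widest_path_tree(capacity, source, nodes):
        # capacity: dict {(u, v): link bandwidth}; returns parent pointers of the tree.
        width = {source: float("inf")}
        parent = {source: None}
        connected, remaining = {source}, set(nodes) - {source}
        while remaining:
            best = None
            for u in connected:
                for v in remaining:
                    if (u, v) in capacity:
                        w = min(width[u], capacity[(u, v)])
                        if best is None or w > best[0]:
                            best = (w, u, v)
            if best is None:        # the remaining nodes are unreachable
                break
            w, u, v = best
            width[v], parent[v] = w, u
            connected.add(v)
            remaining.remove(v)
        return parent

    # Example: widest_path_tree({("S", "A"): 100, ("S", "B"): 10, ("A", "B"): 100}, "S", ["A", "B"])
    # -> {'S': None, 'A': 'S', 'B': 'A'}   (B attaches via A, avoiding the slow S->B link)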