Title: Large Scale File Distribution: Sequential Branching Distribution
1. Large Scale File Distribution: Sequential Branching Distribution
- Final Presentation
- Grad Operating Systems
- Presented by Chris Miller and Pramita Mitra
- Dec 13, 2006
2. Problem Statement
- Research requires distributing large datasets across distributed networks
- Methods such as multicast are too complicated to implement reliably
- Available tools for file distribution:
  - Chirp
  - Parrot
- An algorithm is needed to schedule the distribution of files efficiently
3. Solution
- Use the CCL storage pool as a model of a distributed network
- Use small, measured steps to find which aspects of distribution work best in implementation
- Sequential distribution
  - The distributor sends the file to one node per stage (Stage 1, Stage 2, ..., Stage n)
  - Inefficient use of network resources; total time for distribution is O(n)
- Parallel distribution
  - The distributor sends the file to Node 1, Node 2, ..., Node n at the same time
  - Total time for distribution is O(n)
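Why both approaches end up at O(n) can be seen under a simplified model, which is an assumption here and not taken from the slides: every transfer runs at the same bandwidth, and the distributor's uplink is the bottleneck. A minimal sketch (function names and numbers are hypothetical):

```python
def sequential_time(n_nodes: int, file_mb: float, bandwidth_mb_s: float) -> float:
    """Sequential: the distributor sends the full file to one node per stage."""
    one_transfer = file_mb / bandwidth_mb_s  # seconds for a single full-speed copy
    return n_nodes * one_transfer            # n stages, one after another


def parallel_time(n_nodes: int, file_mb: float, bandwidth_mb_s: float) -> float:
    """Parallel: all n transfers start at once and share the distributor's uplink."""
    per_node_bandwidth = bandwidth_mb_s / n_nodes  # uplink split n ways
    return file_mb / per_node_bandwidth            # every node finishes together


if __name__ == "__main__":
    # Hypothetical numbers: a 100MB file over a 10MB/s distributor uplink.
    for n in (1, 2, 4, 8, 16):
        print(n, sequential_time(n, 100.0, 10.0), parallel_time(n, 100.0, 10.0))
```

In this model, doubling the number of nodes doubles the total time for both methods. Parallel distribution does not help because the distributor's single link still carries n full copies of the file.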
4. Baseline Results
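5. Sequential Branching Distribution
[Figure: the distributor and the nodeset. In each stage, every host that already holds the file sends it to one new node (transfers labeled "Thirdput"), so Stage 1 has one transfer, Stage 2 has two, and Stage 3 has four.]
- Total time for distribution O(log2 n)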
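The stage pattern in the figure (one transfer, then two, then four) can be sketched as a schedule generator. This is a minimal illustration, not the presenters' implementation. It assumes every host holding the file forwards it to exactly one new node per stage, and the names `branching_schedule` and `expected_stages` are hypothetical:

```python
import math
from collections import deque


def branching_schedule(nodes: list[str], distributor: str = "distributor") -> list[list[tuple[str, str]]]:
    """Return a list of stages; each stage is a list of (sender, receiver) pairs.

    Every host that already holds the file sends it to one new host per stage,
    so the number of holders doubles: 1 transfer, then 2, then 4, ...
    """
    holders = [distributor]
    pending = deque(nodes)
    stages = []
    while pending:
        stage = []
        for sender in holders:  # only hosts that held the file before this stage
            if not pending:
                break
            stage.append((sender, pending.popleft()))
        holders.extend(receiver for _, receiver in stage)
        stages.append(stage)
    return stages


def expected_stages(n_nodes: int) -> int:
    """Stages needed for n receivers when holders double: ceil(log2(n + 1))."""
    return math.ceil(math.log2(n_nodes + 1))


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 8)]
    for number, stage in enumerate(branching_schedule(nodes), start=1):
        print(f"Stage {number}: {stage}")
```

With seven nodes this produces three stages of 1, 2, and 4 transfers, matching the figure. In general, n nodes need ceil(log2(n + 1)) stages, which is where the O(log2 n) total time comes from when each stage takes roughly one file-transfer time.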
6. Best Neighbor Approximation
7. Best Neighbor Approximation
8. Probabilistic Weighted Average
9. Best Neighbor Approximation
Approximation | Reduction in Transfer Time | 100MB Overhead | 100MB Net Reduction | 250MB Overhead | 250MB Net Reduction | 500MB Overhead | 500MB Net Reduction | 1GB Overhead | 1GB Net Reduction
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1MB | 16.64 | 39.93 | -23.3 | 15.97 | 0.7 | 7.99 | 8.7 | 3.99 | 12.6
2MB | 28.44 | 44.32 | -15.9 | 17.73 | 10.7 | 8.86 | 19.6 | 4.43 | 24.0
3MB | 29.12 | 50.16 | -21.0 | 20.07 | 9.1 | 10.03 | 19.1 | 5.02 | 24.1
4MB | 23.20 | 55.61 | -32.4 | 22.24 | 1.0 | 11.12 | 12.1 | 5.56 | 17.6
5MB | 27.39 | 67.07 | -39.7 | 26.83 | 0.6 | 13.41 | 14.0 | 6.71 | 20.7
Latency | 16.59 | 15.55 | 1.0 | 6.22 | 10.4 | 3.11 | 13.5 | 1.56 | 15.0
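In the slide 9 table, every Net Reduction entry equals Reduction in Transfer Time minus Overhead. Each Overhead entry is the 100MB overhead scaled by 100MB divided by the data file size, with 1GB treated as 1000MB. This relationship is inferred from the numbers rather than stated on the slide, and the slide gives no units, so none are assumed. The sketch below recomputes the table from its first three columns and picks the approximation with the largest net reduction for each data file size. The function names are hypothetical.

```python
# Transcribed from the table: approximation -> (reduction in transfer time,
# overhead for a 100MB data file).
MEASURED = {
    "1MB": (16.64, 39.93),
    "2MB": (28.44, 44.32),
    "3MB": (29.12, 50.16),
    "4MB": (23.20, 55.61),
    "5MB": (27.39, 67.07),
    "Latency": (16.59, 15.55),
}


def overhead(approximation: str, data_file_mb: float) -> float:
    """Overhead scales inversely with data file size (1GB taken as 1000MB)."""
    _, overhead_100mb = MEASURED[approximation]
    return overhead_100mb * 100.0 / data_file_mb


def net_reduction(approximation: str, data_file_mb: float) -> float:
    """Net Reduction = Reduction in Transfer Time - Overhead."""
    reduction, _ = MEASURED[approximation]
    return reduction - overhead(approximation, data_file_mb)


def best_approximation(data_file_mb: float) -> str:
    """Approximation with the largest net reduction for this data file size."""
    return max(MEASURED, key=lambda name: net_reduction(name, data_file_mb))


if __name__ == "__main__":
    for size_mb in (100, 250, 500, 1000):
        print(size_mb, best_approximation(size_mb))
```

With these numbers, latency measurement is the only approximation with a positive net reduction for a 100MB file. The 2MB and 3MB file transfers do best for the larger files, because their fixed overhead is spread over more data.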
10. Results
11. Conclusions
- A fast and reliable distribution method is possible using simple file transfer methods
- The distribution system is fault tolerant for all nodes except the distributor node
- Latency measurement
  - Moderate indicator of transfer rate
  - Low overhead
- Small file transfer approximation
  - Strong indicator of transfer rate
  - High overhead
- Performance is near O(log2 n)
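Latency measurement and small file transfer approximation differ only in what is measured for each candidate neighbor. Below is a minimal sketch of best-neighbor selection under stated assumptions. The `probe` callback is hypothetical and returns either a measured latency or the time to send a small file, with lower values treated as a faster link. The linear extrapolation from a small file to the full data file is also an assumption and was not taken from the slides.

```python
from typing import Callable, Iterable


def best_neighbor(candidates: Iterable[str], probe: Callable[[str], float]) -> str:
    """Return the candidate with the lowest probe cost.

    `probe` is a hypothetical callback: it could return a measured latency
    (cheap, moderate predictor) or the time to send a small file (costlier,
    stronger predictor). Lower values are treated as a faster link.
    """
    return min(candidates, key=probe)


def extrapolate_transfer_time(probe_seconds: float, probe_mb: float, data_mb: float) -> float:
    """Assumed linear scaling from a small-file probe to the full data file."""
    return probe_seconds * data_mb / probe_mb


if __name__ == "__main__":
    # Hypothetical small-file probe times, in seconds, for three candidates.
    probe_seconds = {"nodeA": 0.40, "nodeB": 0.15, "nodeC": 0.22}
    choice = best_neighbor(probe_seconds, lambda node: probe_seconds[node])
    print(choice, extrapolate_transfer_time(probe_seconds[choice], 2.0, 500.0))
```

Swapping the probe is the only change needed to move between the two approximations. A latency probe is cheap per candidate. A small-file probe adds transfer overhead for every candidate it measures, which is the trade-off the conclusions describe.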