Title: Dynamic Data Grid Replication Strategy based on Internet Hierarchy
- Sang Min Park, Jai-Hoon Kim, and Young-Bae Ko
- Ajou University
- South Korea
Contents
- Introduction to Data Grid
- Optimizations in Data Grid
- Novel Replication Strategy based on Internet Hierarchy
- Simulation
- Simulation Results
- Conclusions
Introduction to Data Grid
- Data Grid Motivations
  - Petabyte-scale data production
  - Distributed data storage, with each site storing part of the data
  - Distributed computing resources that process the data
- Two Most Important Approaches for Data Grid
  - Secure, reliable, and efficient data transport protocols (e.g., GridFTP)
  - Replication (e.g., Replica Catalog)
- Replication
  - Large files are partially replicated among sites
  - Reduces data access time
  - Application scheduling and dynamic replication issues are emerging
Introduction to Data Grid
- Typical Job Execution Scenario
Optimizations in Data Grid
- Goal: reducing the overall job execution time
- Scheduling Optimization
  - Deciding where to allocate the job
  - Considering the location of replicas and the computational capabilities of sites
- Short-term Optimization
  - Deciding from where to fetch replicas
  - Considering the available network bandwidth between sites
- Long-term Optimization (Dynamic Replication Strategy)
  - Storage at a site is limited
  - Deciding which files should remain as replicas
  - Popular files are better candidates for replication because they are likely to be requested again
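The scheduling and short-term optimizations above can be sketched as follows. This is a minimal illustration, assuming a static table of link bandwidths in Mbit/s, file sizes in MB, a per-site compute speed, and missing input files fetched one after another. `Site`, `best_source`, and `schedule_job` are hypothetical names, not part of OptorSim or the authors' scheduler.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    compute_speed: float  # hypothetical: work units processed per second
    files: set[str]       # files currently held as replicas at this site


def transfer_time(size_mb: float, bandwidth_mbps: float) -> float:
    """Seconds to move size_mb megabytes over a link of bandwidth_mbps megabits/s."""
    return size_mb * 8 / bandwidth_mbps


def best_source(file: str, dest: str, sites: list[Site],
                bandwidth: dict[tuple[str, str], float],
                size_mb: float) -> tuple[str, float]:
    """Short-term optimization: fetch from the replica holder with the fastest link to dest.

    Raises ValueError if no site holds the file.
    """
    candidates = [
        (s.name, 0.0 if s.name == dest else transfer_time(size_mb, bandwidth[(s.name, dest)]))
        for s in sites
        if file in s.files
    ]
    return min(candidates, key=lambda c: c[1])


def schedule_job(job_files: list[str], work: float, sites: list[Site],
                 bandwidth: dict[tuple[str, str], float],
                 sizes: dict[str, float]) -> tuple[str, float]:
    """Scheduling optimization: choose the site with the lowest estimated fetch + compute time."""
    def estimate(site: Site) -> float:
        # Assumed model: missing files are fetched sequentially from their best source.
        fetch = sum(best_source(f, site.name, sites, bandwidth, sizes[f])[1]
                    for f in job_files if f not in site.files)
        return fetch + work / site.compute_speed

    best = min(sites, key=estimate)
    return best.name, estimate(best)
```

Summing per-file fetch times reflects the sequential-transfer assumption; with parallel transfers the estimate would use the slowest fetch instead.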
Existing Dynamic Replication Strategies
- Replica Optimization based on Site-level Locality
  - Replicate the files that are predicted to be used in the future, from the perspective of a single site
  - Tries to reduce the number of fetches
  - Delete Oldest and Delete LRU methods
- Economic Strategy from the European Data Grid
  - Developed the OptorSim Data Grid optimization simulator
  - Uses an auction protocol to trigger long-term optimization
  - Site-level locality based on file access patterns
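Delete Oldest and Delete LRU are classic cache-replacement policies applied to a site's replica store. A minimal sketch of one site under either policy, assuming every non-local file is replicated on access (evicting as needed) and sizes are in MB; `SiteStorage` is a hypothetical name, not OptorSim's class.

```python
from collections import OrderedDict


class SiteStorage:
    """Replica store for one site with a fixed capacity (MB) and a simple eviction policy.

    policy="lru":    evict the least recently used replica (Delete LRU)
    policy="oldest": evict the replica that was stored first (Delete Oldest)
    """

    def __init__(self, capacity_mb: float, policy: str = "lru"):
        if policy not in ("lru", "oldest"):
            raise ValueError("policy must be 'lru' or 'oldest'")
        self.capacity = capacity_mb
        self.policy = policy
        # name -> size in MB; the front of the ordering is the next eviction victim
        self.files = OrderedDict()

    def used(self) -> float:
        return sum(self.files.values())

    def access(self, name: str, size_mb: float) -> bool:
        """Request a file: True on a local hit, otherwise replicate it here and return False."""
        if name in self.files:
            if self.policy == "lru":
                self.files.move_to_end(name)  # refresh recency; "oldest" keeps insertion order
            return True
        if size_mb > self.capacity:
            return False  # can never fit: read remotely without replicating
        while self.used() + size_mb > self.capacity:
            self.files.popitem(last=False)  # evict from the front of the ordering
        self.files[name] = size_mb
        return False
```

The only difference between the two policies is whether a hit refreshes a file's position, which is why both fit in one class.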
Existing Dynamic Replication Strategies
- Limitations of Site-level Optimization
  - A site's storage size is inevitably limited, so the locality of data requests it can exploit is also limited
  - It assumes predictable file access patterns, but there is no guarantee that such patterns exist
Replication Strategy based on Bandwidth Hierarchy (BHR)
- Network-level Locality
  - A site is not the only possible source of locality
  - Another source of locality is network-level locality
  - If a replica is located at a nearby site, it can be fetched without a long delay
[Figure: Network Region (e.g., a country); Fast Replica Transmission; Slow Replica Transmission]
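Network-level locality can be made concrete with a two-level bandwidth model. This minimal sketch assumes exactly two bandwidth values (inside a region and between regions), whereas real networks have more levels and fluctuating available bandwidth; the function, site, and region names are hypothetical.

```python
def link_bandwidth(src: str, dst: str, region_of: dict[str, str],
                   intra_mbps: float, inter_mbps: float) -> float:
    """Two-level bandwidth hierarchy: fast links inside a region, slow links between regions."""
    return intra_mbps if region_of[src] == region_of[dst] else inter_mbps


def fetch_time(size_mb: float, src: str, dst: str, region_of: dict[str, str],
               intra_mbps: float, inter_mbps: float) -> float:
    """Seconds to copy a replica of size_mb megabytes from src to dst (0 if already local)."""
    if src == dst:
        return 0.0
    return size_mb * 8 / link_bandwidth(src, dst, region_of, intra_mbps, inter_mbps)


def nearest_replica(dst: str, holders: list[str], region_of: dict[str, str],
                    intra_mbps: float, inter_mbps: float, size_mb: float = 1.0) -> str:
    """Return the replica holder that dst can fetch from fastest."""
    return min(holders, key=lambda src: fetch_time(size_mb, src, dst, region_of,
                                                   intra_mbps, inter_mbps))
```

With 1000 Mbit/s inside a region and 100 Mbit/s between regions, a 100 MB replica takes 0.8 s to fetch from a same-region site but 8 s from another region, so a replica anywhere in the region is nearly as good as a local one.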
Replication Strategy based on Bandwidth Hierarchy (BHR)
- Maximizing Network-level Locality
  1. Avoiding replica duplication within a region
  2. Considering the popularity of file requests at the region level
[Figure: a region containing two sites; one site receiving a new replica has no free space ("No space here! We should remove some file"), and replica X is duplicated within the region ("Replica X is duplicated here!")]
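The two principles can be combined into a replacement decision for a full site. This is a minimal sketch, not the authors' exact algorithm: it assumes region-level popularity is the sum of per-site request counts, deletes replicas duplicated elsewhere in the region first, and (an assumed rule) refuses to delete a unique replica that is at least as popular as the incoming file. `region_popularity` and `choose_victims` are hypothetical names.

```python
from collections import Counter
from typing import Optional


def region_popularity(access_log: dict[str, dict[str, int]], region_sites: list[str]) -> Counter:
    """Sum per-site request counts into region-level file popularity."""
    popularity = Counter()
    for site in region_sites:
        popularity.update(access_log.get(site, {}))
    return popularity


def choose_victims(site_files: set[str], sizes: dict[str, float], new_file: str,
                   free_mb: float, other_region_files: list[set[str]],
                   popularity: Counter) -> Optional[list[str]]:
    """Pick replicas to delete so that new_file fits at this site.

    Returns the victims, [] if new_file already fits, or None if the new
    replica should not be stored here.
    """
    needed = sizes[new_file] - free_mb
    if needed <= 0:
        return []
    # Principle 1: files also held by another site in the region are cheap to lose.
    duplicated = set().union(*other_region_files)
    # Duplicated files first, then ascending region-level popularity; name breaks ties.
    order = sorted(site_files, key=lambda f: (f not in duplicated, popularity[f], f))
    victims, freed = [], 0.0
    for f in order:
        if freed >= needed:
            break
        # Principle 2 (assumed rule): keep unique replicas at least as popular as new_file.
        if f not in duplicated and popularity[f] >= popularity[new_file]:
            return None
        victims.append(f)
        freed += sizes[f]
    return victims if freed >= needed else None
```

Because unique replicas are visited in ascending popularity, the first one that is too popular to delete means every remaining candidate is too, so returning early is safe.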
Simulation
- OptorSim
  - Data Grid dynamic replication simulation tool
  - Developed as part of the European Data Grid project
  - Implemented in Java
- We implemented our own region-based optimizer in OptorSim
Simulation
General configuration of parameters
Bandwidth and storage size
Simulation Results
Total job times of the three strategies
Simulation Results
Total job time with varying bandwidth and storage size
Conclusions
- Existing dynamic replication strategies are based only on the site-level locality of file requests
- The BHR strategy is based on network-level locality
- BHR performs quite well when the bandwidth hierarchy is clearly pronounced and the storage size at each site is small
- We extend current site-level replica optimization research in a more scalable direction
References
- William H. Bell, David G. Cameron, Luigi Capozza, A. Paul Millar, Kurt Stockinger, and Floriano Zini. Simulation of Dynamic Grid Replication Strategies in OptorSim. In Proc. of the 3rd Int'l. IEEE Workshop on Grid Computing (Grid'2002), Baltimore, USA, November 2002. Springer Verlag, Lecture Notes in Computer Science.
- William H. Bell, David G. Cameron, Ruben Carvajal-Schiaffino, A. Paul Millar, Kurt Stockinger, and Floriano Zini. Evaluation of an Economy-Based File Replication Strategy for a Data Grid. In International Workshop on Agent based Cluster and Grid Computing at CCGrid 2003, Tokyo, Japan, May 2003. IEEE Computer Society Press.
- Mark Carman, Floriano Zini, Luciano Serafini, and Kurt Stockinger. Towards an Economy-Based Optimisation of File Access and Replication on a Data Grid. In International Workshop on Agent based Cluster and Grid Computing at International Symposium on Cluster Computing and the Grid (CCGrid'2002), Berlin, Germany, May 2002. IEEE Computer Society Press.
- Ann Chervenak, Ian Foster, Carl Kesselman, Charles Salisbury, and Steven Tuecke. The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. Journal of Network and Computer Applications, 23:187-200, 2001.
- EU Data Grid Project. http://www.eu-datagrid.org
- I. Foster, C. Kesselman, and S. Tuecke. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, 15(3), 2001.
- Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger, and Kurt Stockinger. Data Management in an International Data Grid Project. 1st IEEE/ACM International Workshop on Grid Computing (Grid'2000), Bangalore, India, Dec. 2000.
- OptorSim: A Replica Optimizer Simulation. http://edg-wp2.web.cern.ch/edg-wp2/optimization/optorsim.html
- Sang-Min Park and Jai-Hoon Kim. Chameleon: A Resource Scheduler in a Data Grid Environment. 2003 IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'2003), Tokyo, Japan, May 2003. IEEE Computer Society Press.
- Kavitha Ranganathan and Ian Foster. Design and Evaluation of Dynamic Replication Strategies for a High Performance Data Grid. International Conference on Computing in High Energy and Nuclear Physics, Beijing, September 2001.
- Kavitha Ranganathan and Ian Foster. Identifying Dynamic Replication Strategies for a High Performance Data Grid. International Workshop on Grid Computing, Denver, November 2001.