Title: Data Grids
1-2. Slides from UT: Drs. Faisal N. Abu-Khzam and Michael A. Langston
3. What is Grid Computing?
- Computational Grids
- Homogeneous (e.g., Clusters)
- Heterogeneous (e.g., with one-of-a-kind instruments)
4. Computational Grids
- A network of geographically distributed resources: computers, peripherals, switches, instruments, and data.
- Each user has a single login account to access all resources.
- Resources are owned by diverse organizations.
5. Computational Grids
- Grids are typically managed by gridware
- Gridware - a special type of middleware that enables
- sharing
- management of grid components
- based on user requirements and resource attributes
6. Simplistically
- Large number of users
- Large volume of data
- Large computational task involved
7. Cousins of Grid Computing
- Distributed Computing
- Parallel Computing
- Peer-to-Peer Computing
- Many others: Cluster Computing, Network Computing, Client/Server Computing, Internet Computing, etc.
8. Distributed Computing
- Question: Is Grid Computing a fancy new name for the concept of distributed computing?
- In general, NO
- Distributed Computing - distributing the load of a program across two or more processes.
9. Parallel Computing
- Single task, multiple machines
- Divide task into smaller tasks
- Share resources, e.g. memory
10. Peer-to-Peer Computing
- Sharing of computer resources and services by direct exchange between systems.
- Computers can act as clients or servers depending on what role is most efficient for the network.
11. Grid is more
- The term Grid is borrowed from the electrical grid
- Users obtain computing power through the Internet by using the Grid, just like electrical power from any wall socket
- By connecting to a Grid, users can get
- needed computing power
- storage space and data
12. Methods of Grid Computing
- Distributed Supercomputing
- High-Throughput Computing
- On-Demand Computing
- Data-Intensive Computing
- Collaborative Computing
- Logistical Networking
13. Distributed Supercomputing
- Combine multiple high-capacity resources on a computational grid into a single, virtual distributed supercomputer.
- Tackle problems that cannot be solved on a single system.
14. High-Throughput Computing
- Use the grid to schedule large numbers of loosely coupled or independent tasks.
- Goal: putting unused processor cycles to work.
15. On-Demand Computing
- Uses grid capabilities to meet short-term requirements for resources that are not locally accessible.
- Models real-time computing demands.
16. Data-Intensive Computing
- Synthesize new information from data maintained in geographically distributed repositories, digital libraries, and databases.
- Particularly useful for distributed data mining.
17. Collaborative Computing
- Concerned primarily with enabling and enhancing human-to-human interactions.
- Applications are often structured in terms of a virtual shared space.
18. Who Needs Grid Computing?
- A chemist utilized hundreds of processors to screen thousands of compounds per hour.
- Teams of engineers worldwide pool resources to analyze terabytes of structural data.
- Meteorologists visualize and analyze petabytes of climate data with enormous computational demands.
19. An Illustrative Example
- A NASA research scientist collected microbiological samples in the tidewaters around Wallops Island, Virginia.
- Needs: the high-performance microscope at the National Center for Microscopy and Imaging Research (NCMIR), University of California, San Diego.
20. Example (continued)
- The samples were sent to San Diego, and she used NPACI's Telescience Grid and NASA's Information Power Grid (IPG) to view and control the output of the microscope from her desk on Wallops Island.
- She viewed the samples and moved the platform holding them, making adjustments to the microscope.
21. Example (continued)
- The microscope produced a huge dataset of images.
- This dataset was stored using a storage resource broker on NASA's IPG.
- The scientist was able to run algorithms on this dataset while watching the results in real time.
22. Online Replication Strategy to Increase Availability in Data Grids
- Ming Lei, PhD student
- Department of Computer Science
- University of Alabama
23. Outline
- 1. Introduction
- 2. Two metrics of system availability
- System Bytes Missing Rate
- System File Missing Rate
- 3. Our analytical model and the new dynamic replica algorithms
- 4. Replica optimizer to minimize the Data Miss Rate (MinDmr)
- 5. Simulation results
- 6. Conclusions and Future work
24. 1. Introduction
- Properties of a Grid system
- Millions of files, thousands of users world-wide
- Dynamic behavior of Grid users
- Unavailability of a file: job hang, delay in job
- Storage space is limited
- File sizes are different
- Data Grid: a Grid computing system for processing and managing this large volume of data
25. Introduction
- Early work
- Decrease access latency
- Network bandwidth
- How to improve file access time and availability in a Data Grid?
- Data Replication
26. Introduction
- Related work
- Economical model: replica decision based on an auction protocol (Carman, Zini, et al.)
- Hotzone: places replicas so client-to-replica latency is minimized (Szymaniak et al.)
- Replica strategies: dynamic, shortest turnaround, least relative load (Tang et al.); consider only LRU
- Multi-tiered Grid: Simple Bottom Up and Aggregate Bottom Up (Tang et al.)
- Replicate fragments of files, with a block mapping procedure for direct user access (Chang and Chen)
27. Two metrics of system reliability
- Instead of access time, what about availability (what if a file access fails)?
- 1. System File Missing Rate (SFMR) = (number of files potentially unavailable) / (number of all the files requested by all the jobs)
- 2. System Bytes Missing Rate (SBMR) = (number of bytes potentially unavailable) / (total number of bytes requested by all jobs)
28. Data Grid Architecture
- Simulated Data Grid Architecture
29. Two metrics of system reliability
- Availability of a file:
- P_i = 1 - (1 - p_1)(1 - p_2) ... (1 - p_k)
- k - the number of copies of file f_i
- p_l - the file's availability at the particular SE holding copy l
- P_i - file f_i's availability
30. Two metrics of system reliability
- Definition of SFMR (System File Missing Rate):
- SFMR = ( Σ_{i=1..n} Σ_{j=1..m} (1 - P_j) ) / (n · m)
- n - total number of jobs
- m - number of file accesses for each job
- P_j - file f_j's availability
31. Two metrics of system reliability
- Definition of SBMR (System Bytes Missing Rate):
- SBMR = ( Σ_{i=1..n} Σ_{j=1..m} S_j (1 - P_j) ) / ( Σ_{i=1..n} Σ_{j=1..m} S_j )
- n - total number of jobs
- m - number of file access operations for each job
- P_j - file f_j's availability
- S_j - size of file f_j in bytes
- (Both metrics are sketched in code below.)
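To make the two metrics concrete, here is a minimal Python sketch of file availability, SFMR, and SBMR as defined on the last three slides; the data structure and field names are illustrative assumptions, not from the paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FileInfo:
    copy_availabilities: List[float]  # availability of each of the k copies at its SE
    size_bytes: int                   # S_j

def file_availability(f: FileInfo) -> float:
    """P_j = 1 - (1 - p_1)(1 - p_2)...(1 - p_k)."""
    miss = 1.0
    for p in f.copy_availabilities:
        miss *= (1.0 - p)
    return 1.0 - miss

def sfmr(jobs: List[List[FileInfo]]) -> float:
    """System File Missing Rate: expected number of unavailable file accesses
    over all file accesses requested by all jobs."""
    missing = sum(1.0 - file_availability(f) for job in jobs for f in job)
    total = sum(len(job) for job in jobs)
    return missing / total

def sbmr(jobs: List[List[FileInfo]]) -> float:
    """System Bytes Missing Rate: expected number of unavailable bytes
    over the total number of bytes requested by all jobs."""
    missing = sum(f.size_bytes * (1.0 - file_availability(f)) for job in jobs for f in job)
    total = sum(f.size_bytes for job in jobs for f in job)
    return missing / total
```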
32. Long-term performance
- Must make long-term performance decisions
- Each file access operation r_i, at instant T, is associated with a variable V_i
- V_i is set to the number of times the file will be accessed in the future
- Assign a future value to the file via a prediction function
33. New dynamic replica algorithms
- Prediction of V_i via four kinds of prediction functions (sketched below)
- Bio Prediction - a binomial distribution is used to predict V_i based on the file access history
- Zipf Prediction - a Zipf distribution is used to predict V_i based on the file access history
- Queue Prediction - the current job queue is used to predict the V_i of the file
- No Prediction - no predictions of the file are made; V_i will always be 1
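The four options can be pictured with a small Python sketch; the concrete binomial and Zipf formulas below are simple stand-ins consistent with the slide's one-line descriptions, not the authors' exact predictors, and all names are illustrative.

```python
from collections import Counter
from typing import List

def v_no_prediction(file_id: str) -> int:
    # No Prediction: every requested file gets V_i = 1.
    return 1

def v_queue_prediction(file_id: str, job_queue: List[List[str]]) -> int:
    # Queue Prediction: count how often the file appears in the current job queue.
    return sum(job.count(file_id) for job in job_queue)

def v_bio_prediction(file_id: str, history: List[str], future_requests: int) -> float:
    # Binomial-style prediction (assumed form): expected future accesses = n * p,
    # where p is the file's observed access frequency in the history window.
    p = history.count(file_id) / max(len(history), 1)
    return future_requests * p

def v_zipf_prediction(file_id: str, history: List[str], s: float = 1.0) -> float:
    # Zipf-style prediction (assumed form): value decays with the file's
    # popularity rank in the access history, V ~ 1 / rank^s.
    ranked = [f for f, _ in Counter(history).most_common()]
    rank = ranked.index(file_id) + 1 if file_id in ranked else len(ranked) + 1
    return 1.0 / (rank ** s)
```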
34.
- File set d = {f1, f2, ..., fk}
- Achieve the maximum of Σ (P_i · V_i)
- or, equivalently, minimize the expected data miss Σ (1 - P_i) · V_i
35. On-line optimal replication problem
- The optimization problem is the classic knapsack problem
- Aggregate file replica storage cost is the weight of item (f_i)
- Convert the optimization problem to a fractional knapsack problem
- Assume the storage capacity is sufficiently large and holds a sufficiently large number of files
- The amount of space left after storing the maximum is assumed negligible (see the greedy sketch below)
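The knapsack view suggests the usual greedy by value density. The short sketch below (illustrative names, assumed value = P_i · V_i) ranks candidate replicas by value per byte and fills the SE until it runs out of room, ignoring the fractional leftover exactly as the assumption above permits.

```python
def select_replicas(candidates, capacity_bytes):
    """candidates: iterable of (file_id, value, size_bytes), value ~ P_i * V_i.
    Returns the ids kept by the greedy fractional-knapsack order."""
    kept, used = [], 0
    # Highest value per byte first.
    for file_id, value, size in sorted(candidates, key=lambda c: c[1] / c[2], reverse=True):
        if used + size <= capacity_bytes:
            kept.append(file_id)
            used += size
    return kept
```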
36. New dynamic replica algorithms - MinDmr algorithm
- For each file request:
- If there is enough space
- replicate the file
- Else
- sort the stored files by a weight W
- replace file(s) if the value gained by replicating > the value lost by replacing a file
37. New dynamic replica algorithms
- MinDmr replica optimizer
- In our greedy algorithm, we introduce the file weight as
- W = (P_j · V_j) / (C_j · S_j)
- P_j - file f_j's availability
- C_j - the number of copies of f_j
- S_j - the size of f_j
38. MinDmr algorithm
- MinDmr Optimizer() (a runnable sketch follows below):
- 1. The requested file f_i exists in the site: do nothing.
- 2. The requested file f_i does not exist in the site and the site has enough free space: retrieve f_i from a remote site and store it.
- 3. The requested file f_i does not exist in the site and the site does not have enough free space:
- Sort the files in the current SE by the file weight W_i (equation (9)) in ascending order.
- Fetch files from the sorted list in order and add them to the candidate list until the accumulated size of the candidate files is greater than or equal to the size of the requested file.
- 4. Replicate the file if the value gained by replicating file f_i is greater than the accumulated value lost by deleting the candidate files f_j from the SE, where value gained = ΔP_i · V_i and accumulated value loss = Σ ΔP_j · V_j (ΔP is the absolute change in the file's availability before and after it is replicated or replaced).
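The four steps translate into a short greedy routine. The sketch below follows the structure on this slide, but the data structures, the proxy used for the ΔP·V terms, and the helper names are assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class StoredFile:
    availability: float   # P_j
    copies: int           # C_j, number of copies in the Grid
    size: int             # S_j, size in bytes
    value: float          # V_j, predicted future accesses

def weight(f: StoredFile) -> float:
    # W = (P_j * V_j) / (C_j * S_j), the file weight of equation (9).
    return (f.availability * f.value) / (f.copies * f.size)

def min_dmr_optimizer(se: Dict[str, StoredFile], free_space: int,
                      req_id: str, req: StoredFile, gain: float) -> None:
    """Handle one request; gain is the value gained by replicating (delta-P_i * V_i)."""
    if req_id in se:
        return                                        # step 1: file already local, do nothing
    if free_space >= req.size:
        se[req_id] = req                              # step 2: enough space, just replicate
        return
    # Step 3: sort stored files by weight (ascending) and collect eviction candidates.
    candidates, freed, loss = [], 0, 0.0
    for fid, f in sorted(se.items(), key=lambda kv: weight(kv[1])):
        candidates.append(fid)
        freed += f.size
        loss += f.availability * f.value / f.copies   # assumed proxy for delta-P_j * V_j
        if freed + free_space >= req.size:
            break
    # Step 4: replicate only if the value gained exceeds the accumulated value lost.
    if freed + free_space >= req.size and gain > loss:
        for fid in candidates:
            del se[fid]
        se[req_id] = req
```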
39. OptorSim
- Evaluate the performance of our MinDmr (MD) replica and replacement strategy using OptorSim
- OptorSim was developed by the EU DataGrid Project to test dynamic replica schemes
40. Eco model
- Compare to the Economical Model in OptorSim
- Eco: a file is replicated if it maximizes the profit of the SE (e.g., what is earned over time based on the predicted number of file requests)
- Eco prediction functions:
- EcoBio
- EcoZipf
- Queue Prediction
- No Prediction
41. Strategies compared
- Compare the performance of 8 strategies:
- LRU
- LFU
- EcoBio
- EcoZipf
- BioMD
- ZipfMD
- MDNoPred
- MDQuePred
42. Grid Topology (figure)
43. Configuration
44. Access Patterns
- Consider 4 access patterns:
- Random
- Random Walk Gaussian
- Sequential
- Random Walk Zipf
45. 5. Simulation results
- Results for equal-size files
- SFMR with varying replica optimizers
46. Simulation results
- MinDmr best performance
- LFU slightly better than LRU
- Eco worst
- ZipfMD not as good as other MD
47. Simulation results
48. Simulation results
- MinDmr shorter total job times
- LRU shorter job time, although larger SFMR
49. Simulation results
50. Simulation results
- Random Scheduler
- Shortest Queue
- Access Cost
- Queue Access Cost
- Job scheduler does not change SFMR tendency
51. Simulation results
52. Simulation results
- When the job queue is short, SFMR is higher for MDQuePred
- When the job queue is too long, SFMR can increase slightly
- due to valuable files always staying in storage
53. Simulation results
54. Simulation results
- Total job time decreases as the job queue length increases
55. Simulation results
56. Simulation results
- Vary the size of the files (all files the same size)
- The larger the file size, the larger the SFMR
57. Simulation results
58. Simulation results
- Different-size files
- Higher SBMR than SFMR
- Replica schemes prefer small-size files
- LFU is not affected; it decides based on access frequency
- MinDmr better than Eco
59. 6. Conclusions and Future work
- Results indicate the performance (data availability) of MinDmr is better than the others with
- varying file sizes
- prediction functions
- System load
- Queue length
- job schedulers
- file access patterns
- Prediction functions help improve the performance of MinDmr, but MinDmr is not dependent on the prediction function used
60. Replication for Fairness
- Fairness is ignored when the focus is on system turnaround time
- Propose a new metric of fairness
- Remote Data Access Element (RDAE)
- Data Backfill scheduling strategy
- Sliding window replica protocol
61.
- All jobs are submitted to a resource broker
- Jobs are dispatched to different sites
- The global resource broker cannot make a perfect schedule that guarantees the first job to arrive will execute first, because of
- network bandwidth
- data replication
62. Scheduler Fairness
- A measurement of the degree to which the scheduler will guarantee that a later-arriving job will not block a job that arrived earlier
- Unfairly blocked time
- T_block - the time an earlier job is blocked by a later job
63. Fairness Performance Index
- Loss-fairness - compared to the most fair schedule
- Gain-per - compared to the slowest schedule
64. Data Backfill scheduling
- Remote Data Access Element (RDAE)
- Allows the CE to focus on processing
- The CE sends a request to the RDAE, swaps out the job, and processes the next arrived job (sketched below)
- The RDAE can be a logical unit
- The RDAE handles remote data fetching and passes the data to the CE
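A minimal sketch of the backfill idea, with the RDAE modelled as a small worker pool that stages remote data while the CE keeps processing locally available jobs; every function and parameter name here is an illustrative assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def run_ce(jobs, has_local_data, fetch_remote, process):
    """jobs: job ids in arrival order; has_local_data(job) -> bool;
    fetch_remote(job) stages the job's remote data; process(job) runs it."""
    deferred = []
    with ThreadPoolExecutor(max_workers=2) as rdae:   # the RDAE fetches data in the background
        for job in jobs:
            if has_local_data(job):
                process(job)                          # data already local: run the job now
            else:
                # Swap the job out and delegate the remote fetch to the RDAE.
                deferred.append((job, rdae.submit(fetch_remote, job)))
        for job, staged in deferred:
            staged.result()                           # wait until the RDAE has staged the data
            process(job)                              # backfill: run the job once its data is local
```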
65. (No transcript)
66. Sliding Window Replica Scheme
- An alternative to future prediction, which can overemphasize future access times when the queue is long
67. Sliding window replica protocol
- Build a sliding window: the set of files used immediately in the future
- Size is bounded by the size of the local SE
- Includes all files the current job will access and distinct files from the next arriving jobs
- The sliding window slides forward one more file each time the system finishes processing a file
- The sliding window is dynamic
68. Sliding window replica protocol
- Sum of the sizes of all files in the window < size of the SE
- No duplicate files in the sliding window (see the sketch below)
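The window-building rule can be sketched as below; the function and parameter names are assumptions, and in the real protocol the window would be rebuilt (slid forward by one file) each time the system finishes processing a file.

```python
def build_sliding_window(current_job_files, upcoming_jobs, file_sizes, se_capacity):
    """Return the ordered, duplicate-free list of file ids that fit in the window."""
    window, seen, used = [], set(), 0
    # Candidate order: all files of the current job, then distinct files of the next jobs.
    for fid in list(current_job_files) + [f for job in upcoming_jobs for f in job]:
        if fid in seen:
            continue                         # no duplicate files in the sliding window
        if used + file_sizes[fid] > se_capacity:
            break                            # window size is bounded by the local SE's capacity
        window.append(fid)
        seen.add(fid)
        used += file_sizes[fid]
    return window
```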
69. (No transcript)
70. Simulation results
- Assume the OptorSim topology
- 10,000 jobs in the Grid
- Each job accesses 3-10 files
- Storage available: 100 MB-10 GB
- File size: 0.5-1.5 GB
- Replica strategies compared: LRU, LFU, EcoBio, EcoZipf, No Prediction, Sliding Window
71.
- First, study the sliding window without RDAE
- Measure running time
72. Figure 7. Running Time with Varying File Accessing Pattern.
73.
- The sliding window replica scheme always has the best turnaround time
- No replication and EcoBio are the worst
- LFU is second best
74. Figure 8. Impact of network bandwidth w/o RDAE
Figure 9. Impact of network bandwidth with RDAE
75.
- The higher the bandwidth, the shorter the running time
- Running time is reduced by an average of 15% with RDAE
- The sliding window replica scheme is always the best
- EcoZipf and EcoBio perform almost the same as no prediction
76. Figure 10. Impact of varying the job switch time
77.
- The longer the switch time, the longer the total running time
- Sliding window is the best
- LRU and LFU are second best
- The improvement provided by the sliding window over LRU and LFU is greatest for smaller switch times
- The improvement provided by the sliding window over EcoBio and EcoZipf remains high for higher switch times
78. Figure 11. Running Time with Varying Schedulers
79. Figure 13. Fairness Performance Index
80. 6. Conclusions and Future work
- File bundle situation
- Preferential treatment of smaller size files
- Green Grids