Data Grids - PowerPoint PPT Presentation

1
  • Data Grids

2
  • Slides from UT Drs. Faisal N. Abu-Khzam
    Michael A. Langston

3
What is Grid Computing?
  • Computational Grids
  • Homogeneous (e.g., Clusters)
  • Heterogeneous (e.g., with one-of-a-kind
    instruments)

4
Computational Grids
  • A network of geographically distributed
    resources
  • computers, peripherals, switches, instruments,
    and data.
  • Each user - a single login account to access all
    resources.
  • Resources - owned by diverse organizations.

5
Computational Grids
  • Grids are typically managed by gridware
  • Gridware - a special type of middleware that
    enables
  • sharing
  • management of grid components
  • based on user requirements and resource attributes

6
Simplistically
  • Large number of users
  • Large volume of data
  • Large computational task involved

7
Cousins of Grid Computing
  • Distributed Computing
  • Parallel Computing
  • Peer-to-Peer Computing
  • Many others: Cluster Computing, Network
    Computing, Client/Server Computing, Internet
    Computing, etc.

8
Distributed Computing
  • Question
  • Is Grid Computing a fancy new name for the
    concept of distributed computing?
  • In general, NO
  • Distributed Computing - distributing the load of
    a program across two or more processes.

9
Parallel computing
  • Single task, multiple machines
  • Divide task into smaller tasks
  • Share resources, e.g. memory

10
Peer-to-Peer Computing
  • Sharing of computer resources and services by
    direct exchange between systems.
  • Computers can act as clients or servers depending
    on what role is most efficient for the network.

11
Grid is more
  • The term "Grid" is borrowed from the electrical
    grid
  • Users obtain computing power through the Internet
    by using the Grid, just like electrical power from
    any wall socket
  • By connecting to a Grid, one can get
  • needed computing power
  • storage space and data

12
Methods of Grid Computing
  • Distributed Supercomputing
  • High-Throughput Computing
  • On-Demand Computing
  • Data-Intensive Computing
  • Collaborative Computing
  • Logistical Networking

13
Distributed Supercomputing
  • Combine multiple high-capacity resources on a
    computational grid into
  • a single, virtual distributed supercomputer.
  • Tackle problems that cannot be solved on a single
    system.

14
High-Throughput Computing
  • Use grid to schedule large numbers of loosely
    coupled or independent tasks.
  • goal of putting unused processor cycles to work.

15
On-Demand Computing
  • Uses grid capabilities to meet short-term
    requirements for resources that are not locally
    accessible.
  • Models real-time computing demands.

16
Data-Intensive Computing
  • Synthesize new information from data that is
    maintained in
  • geographically distributed repositories, digital
    libraries, and databases.
  • Particularly useful for distributed data mining.

17
Collaborative Computing
  • Concerned primarily with enabling and enhancing
    human-to-human interactions.
  • Applications are often structured in terms of a
    virtual shared space.

18
Who Needs Grid Computing?
  • A chemist utilized hundreds of processors to
    screen thousands of compounds per hour.
  • Teams of engineers worldwide - pool resources to
    analyze terabytes of structural data.
  • Meteorologists - visualize and analyze petabytes
    of climate data with enormous computational
    demands.

19
An Illustrative Example
  • NASA research scientist
  • collected microbiological samples in the
    tidewaters around Wallops Island, Virginia.
  • Needs
  • high-performance microscope at National Center
    for Microscopy and Imaging Research (NCMIR),
    University of California, San Diego.

20
Example (continued)
  • Samples were sent to San Diego, and she used
    NPACI's Telescience Grid and NASA's Information
    Power Grid (IPG) to view and control the output of
    the microscope from her desk on Wallops Island.
  • She viewed the samples and moved the platform
    holding them, making adjustments to the microscope.

21
Example (continued)
  • The microscope produced a huge dataset of images.
  • This dataset was stored using a storage resource
    broker on NASA's IPG.
  • The scientist was able to run algorithms on this
    dataset while watching the results in real time.

22
Online replication strategy to Increase
Availability in Data Grids
  • Ming Lei, PhD student
  • Department of Computer Science
  • University of Alabama

23
Outline
  • 1. Introduction
  • 2. Two metrics of system availability
  • System Bytes Missing Rate
  • System File Missing Rate
  • 3. Our analytical model and the new dynamic
    replica algorithms
  • 4. Replica optimizer to minimize the Data Miss
    Rate (MinDmr)
  • 5. Simulation results
  • 6. Conclusions and Future work

24
1. Introduction
  • Property of a Grid System
  • Millions of files, thousands of users world-wide
  • Dynamic behavior of Grid users
  • Unavailability of a file causes job hangs and
    delays
  • Storage space is limited
  • File sizes are different
  • Data Grid - a Grid computing system for processing
    and managing this large volume of data

25
Introduction
  • Early work
  • Decrease access latency
  • Network bandwidth
  • How to improve file access time and availability
    in a Data Grid
  • Data Replication

26
Introduction
  • Related work
  • Economical model - replica decisions based on an
    auction protocol (Carman, Zini, et al.)
  • HotZone - places replicas so client-to-replica
    latency is minimized (Szymaniak et al.)
  • Replica strategies - dynamic, shortest
    turnaround, least relative load (Tang et al.);
    consider only LRU
  • Multi-tiered Grid - Simple Bottom Up and
    Aggregate Bottom Up (Tang et al.)
  • Replicate fragments of files, with a block mapping
    procedure for direct user access (Chang & Chen)

27
Two metrics of system reliability
  • Instead of access time, what about availability
    (what if a file access fails?)
  • 1. System File Missing Rate (SFMR)
  • number of files potentially unavailable /
    number of all files requested by all the jobs
  • 2. System Bytes Missing Rate (SBMR)
  • number of bytes potentially unavailable /
    total number of bytes requested by all jobs

28
Data Grid Architecture
  • Simulated Data Grid Architecture

29
Two metrics of system reliability
  • Availability of a file
  • k - number of copies of file fi
  • p - file availability at a particular SE
  • Pj - file fj's availability
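
The availability formula on this slide was an image that did not survive the transcript. A plausible reconstruction, assuming the k copies fail independently and each resides on an SE with availability p, is:

```latex
P_j = 1 - (1 - p)^k
```

That is, file fj is missed only if all k of its replicas are simultaneously unavailable.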

30
Two metrics of system reliability
  • Definition of SFMR (System File Missing Rate)
  • n - total number of jobs
  • m number of files accesses for each job
  • Pj - file fjs availability

SFMR

31
Two metrics of system reliability
  • Definition of SBMR (System Bytes Missing Rate)
  • SBMR = ( Σ over all requests of Sj · (1 - Pj) ) /
    ( Σ over all requests of Sj )
  • n - total number of jobs
  • m - number of file access operations for each job
  • Pj - file fj's availability
  • Sj - size of file fj in bytes
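
The two metrics follow directly from the textual definitions above; the slides' exact equations were images lost in this transcript, so the following is a reconstruction. Modeling each job as a list of (Pj, Sj) pairs is my assumption, not the authors' data model:

```python
# Sketch of the two availability metrics (SFMR, SBMR).
# jobs: list of jobs; each job is a list of (availability Pj, size Sj) pairs,
# one pair per file access the job performs.

def sfmr(jobs):
    """System File Missing Rate: expected fraction of requested files missed."""
    missed = sum(1 - p for job in jobs for (p, _s) in job)
    total = sum(len(job) for job in jobs)
    return missed / total

def sbmr(jobs):
    """System Bytes Missing Rate: expected fraction of requested bytes missed."""
    missed = sum(s * (1 - p) for job in jobs for (p, s) in job)
    total = sum(s for job in jobs for (_p, s) in job)
    return missed / total

# Two jobs: the first accesses two files, the second accesses one
jobs = [[(0.9, 100), (0.5, 400)], [(0.99, 50)]]
print(sfmr(jobs))  # (0.1 + 0.5 + 0.01) / 3 requests
print(sbmr(jobs))  # (10 + 200 + 0.5) bytes missed / 550 bytes requested
```

Note how SBMR weights each miss by file size, which is why (slide 58) the two metrics diverge once file sizes differ.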

32
Long term performance
  • Must make long-term performance decisions
  • Each file access operation ri, at instant T, is
    associated with a variable Vi
  • Vi is set to the number of times the file will be
    accessed in the future
  • Assign a future value to the file via a prediction
    function

33
New dynamic replica algorithms
  • Prediction via four kinds of prediction
    functions
  • Bio Prediction binomial distribution is used to
    predict Vi based on file access history
  • Zipf Prediction Zipf distribution is used to
    predict Vi based on file access history
  • Queue Prediction The current job queue is used
    to predict the Vi of the file
  • No Prediction No predictions of the file are
    made, Vi will always be 1
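
Of the four, Queue Prediction is the simplest to make concrete: estimate Vi by counting how often each file appears in the currently queued jobs. A minimal sketch (names and data shapes are mine, not from the slides):

```python
from collections import Counter

def queue_prediction(job_queue):
    """Predict Vi for each file as its occurrence count in the pending jobs."""
    return Counter(f for job in job_queue for f in job)

# Three queued jobs, each a list of the files it will access
queue = [["f1", "f2"], ["f1", "f3"], ["f1"]]
v = queue_prediction(queue)
print(v["f1"], v["f2"])  # 3 1
```

No Prediction corresponds to returning Vi = 1 for every file; the Bio and Zipf variants would instead fit a binomial or Zipf distribution to the file access history.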

34
  • Choose a file set d = {f1, f2, ..., fk} to
    achieve the maximum of
  • Σ Pj · Vj (fewest file misses), or
  • Σ Pj · Vj · Sj (fewest byte misses)

35
On-line Optimal replication problem
  • The optimization problem is the classic Knapsack
    problem
  • Aggregate file replica storage cost is the weight
    of item (fi)
  • Convert the optimization problem to the fractional
    knapsack problem
  • Assume storage capacity is sufficiently large and
    holds a sufficiently large number of files
  • Amount of space left after storing the maximum is
    negligible ???
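
The fractional relaxation the slide appeals to has a classic greedy solution: sort items by value density and fill capacity in that order. A generic sketch (not the authors' implementation); for replication, an item's value would be Pi · Vi and its weight the storage cost Si:

```python
# Greedy solution to the fractional knapsack problem: taking items in
# decreasing value-per-weight order is optimal when fractions are allowed.

def fractional_knapsack(items, capacity):
    """items: list of (value, weight) pairs. Returns the maximum total value."""
    total = 0.0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if capacity <= 0:
            break
        take = min(weight, capacity)      # whole item, or the fraction that fits
        total += value * take / weight
        capacity -= take
    return total

print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))  # 240.0
```

The slide's assumption (capacity large relative to file sizes) is what makes the fractional answer a good approximation of the 0/1 answer: at most one item is taken fractionally, and its contribution is negligible.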

36
New dynamic replica algorithms MinDmr Algorithm
  • For each file request
  • If enough space
  • replicate file
  • Else
  • Sort stored files by a weight W
  • Replace file(s)
  • if value gained by replicating >
  • value lost by replacing a file

37
New dynamic replica algorithms
  • MinDmr replica optimizer
  • In our greedy algorithm, we introduce the file
    weight as
  • W = (Pj · Vj) / (Cj · Sj)
  • Pj - file fj's availability
  • Cj - the number of copies of fj
  • Sj - the size of fj

38
MinDmr algorithm
  • MinDmr Optimizer ()
  • 1. Requested file fi exists in the site
  • Do nothing
  • 2. Requested file fi does not exist in the site,
    and the site has enough free space
  • Retrieve fi from a remote site and store it
  • 3. Requested file fi does not exist in the site,
    and the site does not have enough free space
  • Sort the files in the current SE by the file
    weight Wi (equation (9)) in ascending order.
  • Fetch files from the sorted list in order and add
    them to the candidate list until the accumulated
    size of the candidate files is greater than or
    equal to the size of the requested file.
  • 4. Replicate the file if the value gained by
    replicating file fi is greater than the accumulated
    value lost by deleting the candidate files fj from
    the SE, where value gained = ΔPi · Vi and
    accumulated value loss = Σ ΔPj · Vj
    (ΔP is the absolute change in a file's availability
    from replicating or replacing it)
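
Putting slides 36-38 together, the eviction decision can be sketched as below. This is a paraphrase under simplifying assumptions of mine, not the authors' code: files are plain dicts, and ΔP is computed from the independent-replica availability 1 - (1 - p)^k with a single per-SE availability parameter p:

```python
def weight(f):
    """File weight from equation (9): W = (P * V) / (C * S)."""
    return f["P"] * f["V"] / (f["C"] * f["S"])

def delta_p(p_se, copies):
    """Availability gained by going from `copies` to `copies + 1` replicas,
    assuming independent replicas each available with probability p_se."""
    avail = lambda k: 1 - (1 - p_se) ** k
    return avail(copies + 1) - avail(copies)

def min_dmr_decide(stored, new_file, free_space, p_se=0.9):
    """Return (replicate?, victims): collect lowest-weight files until the
    new file fits, then compare value gained against accumulated value lost."""
    if free_space >= new_file["S"]:
        return True, []                      # case 2: enough free space
    victims, freed = [], free_space
    for f in sorted(stored, key=weight):     # ascending weight (step 3)
        victims.append(f)
        freed += f["S"]
        if freed >= new_file["S"]:
            break
    gain = delta_p(p_se, new_file["C"]) * new_file["V"]
    loss = sum(delta_p(p_se, f["C"] - 1) * f["V"] for f in victims)
    return gain > loss, victims              # step 4

stored = [{"P": 0.99, "V": 1, "C": 3, "S": 100},
          {"P": 0.90, "V": 5, "C": 1, "S": 100}]
new = {"P": 0.0, "V": 10, "C": 0, "S": 150}
ok, victims = min_dmr_decide(stored, new, free_space=50)
print(ok, len(victims))  # True 1
```

Note how the weight W penalizes files that are already well replicated (large C) or expensive to hold (large S), so those are the first eviction candidates.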
39
OptorSim
  • Evaluate the performance of our MinDmr (MD)
    replication and replacement strategy
  • Using OptorSim
  • OptorSim was developed by the EU DataGrid Project
    to test dynamic replica schemes

40
Eco model
  • Compare to the Economical Model in OptorSim
  • Eco
  • A file is replicated if it maximizes the profit of
    the SE (e.g., what is earned over time based on the
    predicted number of file requests)
  • Eco prediction functions
  • EcoBio
  • EcoZipf
  • Queue Prediction
  • No Prediction

41
Strategies compared
  • Compare performance of 8 strategies
  • LRU
  • LFU
  • EcoBio
  • EcoZipf
  • BioMD
  • ZipfMD
  • MDNoPred
  • MDQuePred

42
Grid topology
43
Configuration
44
Access Patterns
  • Consider 4 access patterns
  • Random
  • Random Walk Gaussian
  • Sequential
  • Random Walk Zipf

45
5. Simulation results
  • Results for equal size files
  • SFMR with varying replica optimizers

46
Simulation results
  • MinDmr best performance
  • LFU slightly better than LRU
  • Eco worst
  • ZipfMD not as good as other MD

47
Simulation results
48
Simulation results
  • MinDmr shorter total job times
  • LRU shorter job time, although larger SFMR

49
Simulation results
50
Simulation results
  • Random Scheduler
  • Shortest Queue
  • Access Cost
  • Queue Access Cost
  • Job scheduler does not change SFMR tendency

51
Simulation results
52
Simulation results
  • When the job queue is short, SFMR is higher for
    MDQuePred
  • When the job queue is too long, SFMR can increase
    slightly
  • because valuable files always stay in storage

53
Simulation results
54
Simulation results
  • Total job time decreases as job queue increases

55
Simulation results
56
Simulation results
  • Vary the file size (all files the same size)
  • The larger the file size, the larger the SFMR

57
Simulation results
58
Simulation results
  • Different size files
  • Higher SBMR than SFMR
  • Replica schemes prefer small-size files
  • LFU not affected; it decides based on access
    frequency
  • MinDmr better than Eco

59
6. Conclusions and Future work
  • Results indicate performance (data availability)
    of MinDmr is better than others with
  • varying file sizes
  • prediction functions
  • System load
  • Queue length
  • job schedulers
  • file access patterns
  • Prediction functions help improve performance of
    MinDmr, but MinDmr not dependent on prediction
    function used

60
Replication for Fairness
  • Fairness ignored when focus on system turnaround
    time
  • Propose new metric of fairness
  • Remote Data Access Element
  • Data Backfill scheduling strategy
  • Sliding window replica protocol

61
  • All jobs submitted to resource broker
  • Jobs dispatched to different sites
  • The global resource broker cannot make a perfect
    schedule that guarantees the first arriving job
    will execute first, because of
  • Network bandwidth
  • Data replication

62
Scheduler Fairness
  • A measurement of the degree to which the
    scheduler will guarantee a later arriving job
    will not block a job that arrived earlier
  • Unfairly blocked time
  • Tblock - time an earlier job is blocked by a
    later job

63
Fairness Performance Index
  • Loss-fairness - compared to the most fair
  • Gain-per - compared to the slowest

64
Data Backfill scheduling
  • Remote Data Access Element (RDAE)
  • Allows the CE to focus on processing
  • The CE sends a request to the RDAE, swaps out the
    job, and processes the next arrived job
  • The RDAE can be a logical unit
  • The RDAE handles remote data fetching and passes
    data to the CE

65
(No Transcript)
66
Sliding Window Replica Scheme
  • An alternative to future prediction, which can
    overemphasize future access times when the queue
    is long

67
Sliding window replica protocol
  • Build a sliding window - the set of files used
    immediately in the future
  • Size bounded by the size of the local SE
  • Includes all files the current job will access and
    distinct files from the next arriving jobs
  • The sliding window slides forward one more file
    each time the system finishes processing a file
  • The sliding window is dynamic

68
Sliding window replica protocol
  • Sum of sizes of all files < size of SE
  • No duplicate files in sliding window
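
The window construction described above can be sketched as a single pass over the future access sequence. This is my reading of slides 67-68; the function name and data shapes are assumptions:

```python
def build_window(future_accesses, sizes, se_capacity):
    """Collect distinct files, in future-access order, until the next file
    would overflow the storage element (SE)."""
    window, used = [], 0
    for f in future_accesses:
        if f in window:
            continue                      # no duplicate files in the window
        if used + sizes[f] > se_capacity:
            break                         # window bounded by SE size
        window.append(f)
        used += sizes[f]
    return window

sizes = {"a": 4, "b": 3, "c": 5, "d": 2}
# Current job's files first, then the next arriving jobs' files
print(build_window(["a", "b", "a", "c", "d"], sizes, 10))  # ['a', 'b']
```

The slides do not say whether the window stops at the first file that does not fit or skips it and continues; stopping, as above, preserves the "used immediately in the future" ordering.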

69
(No Transcript)
70
Simulation results
  • Assume OptorSim topology
  • 10,000 jobs in Grid
  • Each job accesses 3-10 files
  • Storage available: 100 MB - 10 GB
  • File size: 0.5 - 1.5 GB
  • Compare replica strategies
  • LRU, LFU, EcoBio, EcoZipf, No Prediction, Sliding
    Window

71
  • First, study the sliding window without RDAE
  • Measure running time

72
Figure 7. Running Time with Varying File
Accessing Pattern.
73
  • Sliding window replica scheme always best
    turnaround time
  • No replication, EcoBio the worst
  • LFU second best

74
Figure 8. Impact of network bandwidth w/o RDAE
Figure 9. Impact of network bandwidth with RDAE
75
  • The higher the bandwidth, the shorter the running
    time
  • Running time reduced by an average of 15% with RDAE
  • Sliding window replica always the best
  • EcoZipf and EcoBio perform almost the same as no
    prediction

76
Figure 10. Impact of varying the job switch time
77
  • The longer the switch time, the longer the total
    running time
  • Sliding window the best
  • LRU, LFU second best
  • Improvement provided by sliding window over LRU,
    LFU is greatest for smaller switch times
  • Improvement provided by sliding window over
    EcoBio, EcoZipf remains high for higher switch
    times

78
Figure 11. Running Time with Varying Schedulers
79
Figure 13. Fairness Performance Index
80
6. Conclusions and Future work
  • File bundle situation
  • Preferential treatment of smaller size files
  • Green Grids