Data Replication Service - PowerPoint PPT Presentation

1
Data Replication Service
  • Sandeep Chandra
  • GEON Systems Group
  • San Diego Supercomputer Center

2
Outline
  • Motivation
  • Data Replication Service (DRS)
  • Components for DRS
  • RLS, GridFTP, RFT
  • DRS Deployment
  • DRS setup on GEON
  • Next Steps

3
Motivation
  • Science domains spend considerable effort
    collecting and managing large amounts of data
  • Science domains develop customized data
    management services that vary with the type of
    application
  • Common data management requirements:
  • Publish and replicate large datasets
  • Register data replicas in catalogs and discover
    them
  • Perform metadata-based discovery of datasets
  • May require the ability to validate the
    correctness of replicas

4
Motivation (cont.)
  • These systems demand considerable resources to
    design, implement, and maintain
  • They typically cannot be reused by other applications
  • Need for a long-term solution:
  • Generalize the functionality provided by these data
    management systems
  • Provide a suite of application-independent services
  • Design and build on lower-level grid services:
  • Globus Reliable File Transfer (RFT) service
  • Replica Location Service (RLS)
  • GridFTP

5
A Possible Solution: Data Replication Service (DRS)
  • A higher-level data management service built on
    lower-level data management components such as RLS
    and RFT
  • Its primary functions are to:
  • Allow users to identify a set of desired files
    existing in their grid environment
  • Make local replicas of those data files by
    transferring files from one or more source
    locations
  • Register the new replicas in a Replica Location
    Service
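A minimal sketch of these three functions, assuming a hypothetical in-memory catalog and a stub transfer() standing in for the real RLS and RFT clients. The names ReplicaCatalog, transfer, and replicate are illustrative only, not Globus APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-ins for the lower-level services; these are NOT the
# Globus RLS/RFT client APIs, just in-memory placeholders for illustration.


@dataclass
class ReplicaCatalog:
    """Maps a logical file name to the physical locations of its replicas."""
    mappings: Dict[str, List[str]] = field(default_factory=dict)

    def lookup(self, lfn: str) -> List[str]:
        return list(self.mappings.get(lfn, []))

    def register(self, lfn: str, pfn: str) -> None:
        self.mappings.setdefault(lfn, []).append(pfn)


def transfer(source_pfn: str, dest_pfn: str) -> bool:
    """Placeholder for a reliable transfer (e.g. handed to RFT); always succeeds here."""
    return True


def replicate(lfns: List[str], catalog: ReplicaCatalog, local_prefix: str) -> Dict[str, str]:
    """Copy each desired file to the local site and register the new replica."""
    created: Dict[str, str] = {}
    for lfn in lfns:
        sources = catalog.lookup(lfn)            # 1. identify existing replicas
        dest = f"{local_prefix}/{lfn}"
        for src in sources:                      # 2. try source locations in turn
            if transfer(src, dest):
                catalog.register(lfn, dest)      # 3. register the new local replica
                created[lfn] = dest
                break
    return created
```

Files with no registered replica are simply skipped here; a real service would report them back to the client.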

6
Replica Location Service (RLS)
  • A simple registry that keeps track of where
    replicas exist on physical storage systems.
  • Users or services register files in RLS when the
    files are created.
  • Users query RLS servers to find these replicas.
  • RLS can be a distributed registry, consisting of
    multiple servers at different sites.
  • A distributed RLS increases overall scale and can
    store more mappings than would be possible in a
    single, centralized catalog.

7
RLS (cont.)
  • A logical file name is a unique identifier for
    the contents of a file.
  • A physical file name is the location of a copy of
    the file on a storage system.
  • RLS maintains mappings between logical file names
    and one or more physical file names of replicas.
  • Users can provide a logical file name to an RLS
    server and ask for all the registered physical
    file names of replicas.
  • Users can also query an RLS server to find the
    logical file name associated with a particular
    physical file location.

Figure: logical file name XYZ mapped to three replicas (XYZ replica 1, 2, and 3) stored at Sites 1, 2, and 3.
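Both query directions can be sketched with a toy catalog that keeps a forward index (logical name to physical names) and a reverse index (physical name to logical name). This is an illustrative in-memory model that assumes each physical location holds a copy of exactly one logical file; LocalReplicaCatalog and its methods are hypothetical, not the RLS client API.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class LocalReplicaCatalog:
    """Toy catalog of logical-name <-> physical-name mappings (not the RLS API)."""

    def __init__(self) -> None:
        self._lfn_to_pfns: Dict[str, Set[str]] = defaultdict(set)
        self._pfn_to_lfn: Dict[str, str] = {}

    def add(self, lfn: str, pfn: str) -> None:
        # Assumption: a physical location stores a copy of only one logical file.
        existing = self._pfn_to_lfn.get(pfn)
        if existing is not None and existing != lfn:
            raise ValueError(f"{pfn} is already mapped to {existing}")
        self._lfn_to_pfns[lfn].add(pfn)
        self._pfn_to_lfn[pfn] = lfn

    def physical_names(self, lfn: str) -> List[str]:
        """All registered replica locations for a logical file name."""
        return sorted(self._lfn_to_pfns.get(lfn, set()))

    def logical_name(self, pfn: str) -> Optional[str]:
        """The logical file name stored at a given physical location, if any."""
        return self._pfn_to_lfn.get(pfn)
```

Keeping the reverse index makes the physical-to-logical query a single dictionary lookup instead of a scan over every mapping.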
8
RLS (cont.)
  • Two types of server: the Local Replica Catalog
    (LRC) and the Replica Location Index (RLI)
  • The LRC stores mappings between logical names for
    data items and the physical locations of
    replicas.
  • Clients query the LRC to discover replicas
    associated with a logical name.
  • The RLI server collects information about the
    logical name mappings stored in one or more LRCs.
  • The RLI returns a list of all the LRCs it is aware
    of that contain mappings for the logical name
    contained in a query.
  • The client then queries these LRCs to find the
    physical locations of replicas.

Figure: Replica Location Index (RLI) nodes (three RLIs) indexing Local Replica Catalogs (four LRCs).
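The two-step lookup can be sketched as follows. This toy model assumes each LRC pushes its full list of logical names to the RLI (the real RLS can also send compressed Bloom filter summaries); the classes LRC and RLI and the function find_replicas are hypothetical illustrations, not the RLS API.

```python
from typing import Dict, List, Set


class LRC:
    """Toy Local Replica Catalog: logical name -> physical replica locations."""

    def __init__(self, name: str, mappings: Dict[str, List[str]]) -> None:
        self.name = name
        self.mappings = mappings

    def query(self, lfn: str) -> List[str]:
        return list(self.mappings.get(lfn, []))


class RLI:
    """Toy Replica Location Index: logical name -> names of LRCs holding a mapping."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = {}

    def receive_update(self, lrc: LRC) -> None:
        # Full, uncompressed summary of which logical names this LRC knows about.
        for lfn in lrc.mappings:
            self.index.setdefault(lfn, set()).add(lrc.name)

    def query(self, lfn: str) -> List[str]:
        return sorted(self.index.get(lfn, set()))


def find_replicas(lfn: str, rli: RLI, lrcs: Dict[str, LRC]) -> List[str]:
    """Ask the RLI which LRCs to contact, then query each of those LRCs."""
    replicas: List[str] = []
    for lrc_name in rli.query(lfn):
        replicas.extend(lrcs[lrc_name].query(lfn))
    return replicas
```

The RLI answers only "which catalogs know this name"; the physical locations always come from the LRCs themselves.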
9
RLS in Context
  • The RLS is one component in a layered data
    management architecture
  • Consistency management is provided by
    higher-level services

Figure: layered data management architecture comprising Replica Consistency Management Services, Metadata Service, Reliable Replication Service, Replica Location Service, Reliable Data Transfer Service, and GridFTP.
10
GridFTP
  • The GridFTP protocol provides secure, robust,
    fast, and efficient transfer of data, especially
    bulk data.
  • The Globus Toolkit provides the most commonly used
    implementation of the protocol, though others
    exist.
  • The Globus Toolkit provides:
  • a server implementation called globus-gridftp-server
  • a scriptable command-line client called
    globus-url-copy
  • a set of development libraries for custom clients
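As an illustration of scripting globus-url-copy, the hypothetical helper below only assembles the command line. The -p (parallel data streams) and -tcp-bs (TCP buffer size) flags come from the client's documented options, but they should be checked against the installed version.

```python
from typing import List, Optional


def globus_url_copy_argv(
    source_url: str,
    dest_url: str,
    parallel_streams: Optional[int] = None,
    tcp_buffer_bytes: Optional[int] = None,
) -> List[str]:
    """Build a globus-url-copy command line (verify flags against your client version)."""
    argv = ["globus-url-copy"]
    if parallel_streams is not None:
        argv += ["-p", str(parallel_streams)]      # number of parallel data streams
    if tcp_buffer_bytes is not None:
        argv += ["-tcp-bs", str(tcp_buffer_bytes)]  # TCP buffer size in bytes
    argv += [source_url, dest_url]                 # source first, then destination
    return argv
```

On a host with the Globus client installed and a valid proxy credential, the resulting list could be passed to subprocess.run(argv, check=True).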

11
Reliable File Transfer (RFT)
  • A WSRF-compliant web service that provides
    job-scheduler-like functionality for data movement.
  • You provide a list of source and destination URLs
    (directories or files); the service writes your job
    description into a database and moves the files on
    your behalf.
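A minimal sketch of the job-description step, assuming an SQLite table as the service's database (the schema, table name, and functions are hypothetical, not RFT's own). Each request's source/destination pairs are written to the database before any transfer starts, so pending work can be found again after a restart.

```python
import sqlite3
from typing import List, Tuple


def submit_transfer(db: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> int:
    """Persist a transfer request (source/destination URL pairs) and return its id."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS transfers ("
        " request_id INTEGER, source TEXT, dest TEXT, status TEXT)"
    )
    row = db.execute("SELECT COALESCE(MAX(request_id), 0) + 1 FROM transfers").fetchone()
    request_id = row[0]
    db.executemany(
        "INSERT INTO transfers VALUES (?, ?, ?, 'PENDING')",
        [(request_id, src, dst) for src, dst in pairs],
    )
    db.commit()  # the job description is durable before any data moves
    return request_id


def pending(db: sqlite3.Connection, request_id: int) -> List[Tuple[str, str]]:
    """Transfers not yet completed; a restarted service can resume from these."""
    return db.execute(
        "SELECT source, dest FROM transfers"
        " WHERE request_id = ? AND status = 'PENDING' ORDER BY rowid",
        (request_id,),
    ).fetchall()
```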

12
RFT (cont.)
  • Accepts a SOAP description of the desired transfer
  • Provides service methods for querying the
    transfer status
  • Clients can use WSRF tools to subscribe for
    notifications of state-change events
  • Supports all the same options as globus-url-copy
    (buffer size, etc.)
  • Increased reliability, because state is stored in
    a database
  • Supports concurrency: multiple files are
    transferred at once for better performance
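The reliability and concurrency points can be sketched with a thread pool that transfers several files at once and retries each one a bounded number of times, recording a per-file status that a client could query. The copy callable, the retry limit, and treating OSError as a transient failure are assumptions for illustration; RFT itself keeps this state in its database rather than in memory.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_transfers(
    pairs: List[Tuple[str, str]],
    copy: Callable[[str, str], None],
    max_workers: int = 4,
    max_attempts: int = 3,
) -> Dict[Tuple[str, str], str]:
    """Transfer several files concurrently, retrying failures; return per-file status."""

    def attempt(pair: Tuple[str, str]) -> str:
        for _ in range(max_attempts):
            try:
                copy(*pair)
                return "DONE"
            except OSError:
                continue  # assumed transient failure: try again
        return "FAILED"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = list(pool.map(attempt, pairs))  # map preserves input order
    return dict(zip(pairs, statuses))
```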

13
Globus Services
  • WSRF services:
  • Data Replication Service
  • Delegation Service
  • Reliable File Transfer Service
  • Pre-WSRF components:
  • Replica Location Service (Local Replica Catalog,
    Replica Location Index)
  • GridFTP Server

Figure: local site. A Web Service Container hosts the Data Replication Service (Replicator resource), the Reliable File Transfer Service (RFT resource), and the Delegation Service (delegated credential); the GridFTP server, Local Replica Catalog, and Replica Location Index run alongside it.
14
DRS Deployment
  • Local storage system
  • GridFTP server for file transfer
  • Replica Location Service:
  • LRCs store mappings from logical names to
    storage locations
  • RLI collects state summaries from LRCs
  • RFT WSRF service to perform data transfer
  • DRS: the master replication service

Figure: single-site deployment. The DRS Service creates transfer requests for the RFT Service (backed by a database); the Replica Location Index, Local Replica Catalog, GridFTP server, and site storage system complete the site.

15
Figure: DRS request flow among a client, the local site, and remote sites 1 to N, with numbered steps 1 to 13 (including the client's file request). Each site runs a Web Service Container hosting the Data Replication Service (Replicator resource), the Reliable File Transfer Service (RFT resource), and the Delegation Service (delegated credential), alongside a GridFTP server, Local Replica Catalog, and Replica Location Index.
16
DRS Functionality
  • Initiate a DRS request:
  • Create a delegated credential (Delegate
    Authority)
  • Create a Replicator resource (Replication
    Service)
  • Monitor the Replicator resource (Status)
  • Discover replicas of files in RLS and select
    among them
  • Start data transfer to the local site with the
    RFT service
  • Check status
  • Register new replicas in RLS catalogs
  • Allow client inspection of DRS results
  • Destroy the Replicator resource
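This request lifecycle can be sketched as a small state machine for a hypothetical Replicator resource. The status names, the replica-selection rule (prefer the first gsiftp:// location), and the plain-dict catalog are assumptions; the RFT transfer and the delegated credential are reduced to placeholders.

```python
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    PENDING = "Pending"
    ACTIVE = "Active"
    FINISHED = "Finished"
    FAILED = "Failed"
    DESTROYED = "Destroyed"


class Replicator:
    """Hypothetical Replicator resource mirroring the DRS request steps."""

    def __init__(self, credential: str, lfns: List[str]) -> None:
        self.credential = credential  # delegated credential handle (opaque here)
        self.lfns = lfns
        self.status = Status.PENDING
        self.results: Dict[str, Optional[str]] = {}  # inspected by the client

    def run(self, catalog: Dict[str, List[str]], local_prefix: str) -> None:
        self.status = Status.ACTIVE
        for lfn in self.lfns:
            replicas = catalog.get(lfn, [])
            # Assumed selection policy: first replica reachable over GridFTP.
            source = next((r for r in replicas if r.startswith("gsiftp://")), None)
            if source is None:
                self.results[lfn] = None
                continue
            dest = f"{local_prefix}/{lfn}"
            # A real DRS would hand source -> dest to RFT and poll its status here.
            catalog[lfn].append(dest)  # register the new replica
            self.results[lfn] = dest
        all_ok = all(v is not None for v in self.results.values())
        self.status = Status.FINISHED if all_ok else Status.FAILED

    def destroy(self) -> None:
        self.status = Status.DESTROYED
```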

17
GEON DRS Test Setup
Figure: two-site test setup between ASU and SDSC. Each site runs a Globus container with a DRS Service that creates transfer requests, an RFT Service with its database, a Replica Location Index, a Local Replica Catalog, a GridFTP server, and a site storage system; data is transferred between the two sites.
18
Next Tasks
  • Transfer LIDAR data from ASU to SDSC resources
    (HPSS, etc.).
  • Extend the testbed to include more nodes.
  • Benchmark data movement.
  • Package DRS and its components with the GEON
    software stack, version 2.0.

19
Acknowledgement
  • Ann Chervenak and Robert Schuler (ISI)
  • www.globus.org (slides)

20
Questions?