LHCb SC3 Experience
1
LHCb Data Replication During SC3
Author: Andrew C. Smith

Abstract: LHCb's participation in LCG's Service Challenge 3 involves testing the bulk data transfer infrastructure developed to allow high bandwidth distribution of data across the grid in accordance with the computing model. To enable reliable bulk replication of data, LHCb's DIRAC system has been integrated with gLite's File Transfer Service middleware component to make use of dedicated network links between LHCb computing centres. DIRAC's Data Management tools previously allowed the replication, registration and deletion of files on the grid. For SC3, supplementary functionality has been added to allow bulk replication of data (using FTS) and efficient mass registration to the LFC replica catalog. Provisional performance results have shown that the system developed can meet the expected data replication rate required by the computing model in 2007. This paper details the experience and results of integration and utilisation of DIRAC with the SC3 transfer machinery.
  • LHCb Transfer Aims During SC3
  • The extended Service Phase of SC3 was to allow the experiments to test their specific software and validate their computing models using the machinery provided. LHCb's data replication goals during SC3 can be summarised as:
  • Replication of 1 TB of stripped DST data from CERN to all Tier-1s.
  • Replication of 8 TB of digitised data from CERN/Tier-0 to the participating LHCb Tier-1 centres in parallel.
  • Removal of 50k replicas (via LFN) from all Tier-1 centres.
  • Moving 4 TB of data from Tier-1 centres to Tier-0 and to other participating Tier-1 centres.
  • Introduction to DIRAC Data Management
    Architecture
  • DIRAC architecture split into three main
    component types
  • Services - independent functionalities deployed
    and administered centrally on machines accessible
    by all other DIRAC components
  • Resources - GRID compute and storage resources at
    remote sites
  • Agents - lightweight software components that
    request jobs from the central Services for a
    specific purpose.
  • The DIRAC Data Management System is made up of an assortment of these components (a minimal sketch of the Agent/Service pattern follows below).
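
As a rough illustration of the Agent/Service split described above, the following Python sketch shows a lightweight agent polling a central service for work. The RequestService and TransferAgent classes and their methods are invented for this example and are not the real DIRAC interfaces.

    import time

    class RequestService:
        """Central Service: work items held centrally and reachable by all
        other components. Hypothetical stand-in for a real DIRAC service."""
        def __init__(self):
            self._queue = []

        def addRequest(self, request):
            self._queue.append(request)

        def getWaitingRequest(self):
            return self._queue.pop(0) if self._queue else None

    class TransferAgent:
        """Lightweight Agent: periodically asks the central Service for jobs
        to process for one specific purpose."""
        def __init__(self, service, pollInterval=5):
            self.service = service
            self.pollInterval = pollInterval

        def run_once(self):
            request = self.service.getWaitingRequest()
            if request is None:
                return False
            print("processing request:", request)
            return True

    if __name__ == "__main__":
        service = RequestService()
        service.addRequest({"operation": "replicate", "lfn": "/lhcb/example.dst"})
        agent = TransferAgent(service)
        while agent.run_once():
            time.sleep(0)  # a real agent would sleep pollInterval between cycles
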
  • Integration of DIRAC with FTS
  • SC3 replication machinery utilised gLite's File Transfer Service (FTS)
  • lowest-level data movement service defined in the
    gLite architecture
  • offers reliable point-to-point bulk transfers of physical files (SURLs) between SRM-managed SEs
  • accepts source-destination SURL pairs
  • assigns file transfers to dedicated transfer channels
  • takes advantage of the dedicated networking between CERN and the Tier-1s
  • routing of transfers is not provided
  • A higher-level service is required to resolve SURLs and hence decide on routing; the DIRAC Data Management System is employed for these tasks.
  • Main components of the DIRAC Data Management
    System
  • Storage Element
  • abstraction of GRID storage resources
  • actual access by specific plug-ins
  • srm, gridftp, bbftp, sftp, http supported
  • namespace management, file upload/download, deletion, etc. (see the plug-in sketch below)
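
A hedged sketch of the plug-in idea above: a StorageElement front end delegates to a protocol-specific back end chosen by name. The class and method names are illustrative, not the actual DIRAC API.

    class SRMStorage:
        """Hypothetical SRM plug-in implementing the common storage interface."""
        def putFile(self, localPath, surl):
            print(f"srm: upload {localPath} -> {surl}")
        def removeFile(self, surl):
            print(f"srm: remove {surl}")

    class GridFTPStorage:
        """Hypothetical GridFTP plug-in with the same interface."""
        def putFile(self, localPath, surl):
            print(f"gridftp: upload {localPath} -> {surl}")
        def removeFile(self, surl):
            print(f"gridftp: remove {surl}")

    PLUGINS = {"srm": SRMStorage, "gridftp": GridFTPStorage}

    class StorageElement:
        """Abstracts a GRID storage resource; actual access goes via a plug-in."""
        def __init__(self, protocol):
            self.backend = PLUGINS[protocol]()
        def putFile(self, localPath, surl):
            return self.backend.putFile(localPath, surl)
        def removeFile(self, surl):
            return self.backend.removeFile(surl)

    StorageElement("srm").putFile("/tmp/f.dst", "srm://srm.cern.ch/lhcb/f.dst")
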
  • Replica Manager
  • provides an API for the available data management
    operations
  • single point of contact for users of the data management system
  • removes direct interaction with Storage Elements and File Catalogs
  • uploading/downloading of files to/from GRID SEs, replication of files, file registration, file removal (see the sketch below)
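
A minimal sketch of a Replica Manager acting as this single point of contact, hiding the catalog and storage layers from the user. The in-memory catalog and the method names (registerFile, replicateAndRegister) are assumptions for illustration only.

    class InMemoryCatalog:
        """Toy File Catalog: maps LFN -> {SE name: SURL}."""
        def __init__(self):
            self.replicas = {}
        def addReplica(self, lfn, se, surl):
            self.replicas.setdefault(lfn, {})[se] = surl
        def getReplica(self, lfn, se):
            return self.replicas[lfn][se]

    class ReplicaManager:
        """Single point of contact for data management operations; users do not
        talk to Storage Elements or File Catalogs directly."""
        def __init__(self, catalog):
            self.catalog = catalog
        def registerFile(self, lfn, se, surl):
            self.catalog.addReplica(lfn, se, surl)
        def replicateAndRegister(self, lfn, sourceSE, targetSE):
            sourceSURL = self.catalog.getReplica(lfn, sourceSE)
            targetSURL = sourceSURL.replace(sourceSE, targetSE)  # toy SURL mapping
            # a real implementation would trigger the copy via a Storage Element here
            self.catalog.addReplica(lfn, targetSE, targetSURL)
            return targetSURL

    rm = ReplicaManager(InMemoryCatalog())
    rm.registerFile("/lhcb/prod/f.dst", "CERN", "srm://CERN/lhcb/prod/f.dst")
    print(rm.replicateAndRegister("/lhcb/prod/f.dst", "CERN", "CNAF"))
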
  • File Catalog
  • standard API exposed for a variety of available catalogs
  • allows redundancy across several catalogs (see the sketch below)
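
To illustrate the redundancy point, the sketch below fans a single registration call out to several catalog back ends so that a failure of one does not block the others. The DummyCatalog and FileCatalogProxy names are invented for this example.

    class DummyCatalog:
        """Minimal stand-in back end used only for this example."""
        def __init__(self, name):
            self.name = name
            self.entries = []
        def addReplica(self, lfn, se, surl):
            self.entries.append((lfn, se, surl))

    class FileCatalogProxy:
        """Exposes one standard catalog API and mirrors every registration to
        all configured back ends, giving redundancy across several catalogs."""
        def __init__(self, backends):
            self.backends = backends
        def addReplica(self, lfn, se, surl):
            results = {}
            for backend in self.backends:
                try:
                    backend.addReplica(lfn, se, surl)
                    results[backend.name] = "OK"
                except Exception as err:  # one failing catalog must not block the rest
                    results[backend.name] = f"Failed: {err}"
            return results

    proxy = FileCatalogProxy([DummyCatalog("LFC"), DummyCatalog("BookkeepingDB")])
    print(proxy.addReplica("/lhcb/f.dst", "CERN", "srm://CERN/lhcb/f.dst"))
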
  • Integration requirements
  • new methods developed in Replica Manager
  • previous Data Management operations were single-file and blocking
  • bulk operation functionality added to the
    Transfer Agent/Request
  • monitoring of asynchronous FTS jobs required
  • information for monitoring is stored within the Request DB entry (an illustrative entry is sketched below)
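
The requirements above imply that each Request DB entry must carry enough state to monitor an asynchronous FTS job. One possible shape of such an entry is sketched below; the field names are assumptions, not the actual DIRAC schema.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class BulkTransferRequest:
        """Illustrative Request DB entry for a bulk FTS transfer."""
        lfns: List[str]                 # files to replicate (bulk, not single-file)
        sourceSE: str                   # source Storage Element
        targetSE: str                   # target Storage Element
        ftsGUID: str = ""               # identifier returned on FTS submission
        ftsServer: str = ""             # endpoint polled to monitor the async job
        status: str = "Waiting"         # Waiting -> Submitted -> Done / Failed
        completedFiles: List[str] = field(default_factory=list)

    request = BulkTransferRequest(
        lfns=["/lhcb/prod/00001/f_0001.dst", "/lhcb/prod/00001/f_0002.dst"],
        sourceSE="CERN-disk",
        targetSE="CNAF-disk",
    )
    print(request.status, len(request.lfns))
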

2
  • Operation of DIRAC Bulk Transfer Mechanics
  • DIRAC integration with FTS deployed centrally
  • on a managed machine at CERN
  • servicing all data replication jobs for SC3
  • Lifetime of bulk replication job
  • bulk replication requests submitted to the DIRAC
    WMS
  • a JDL file with an XML file in the input sandbox
  • the XML contains the key parameters, i.e. LFNs and source/target SE (an illustrative example follows this list)
  • the DIRAC WMS populates the Request DB of the central machine with the XML
  • Transfer Agent polls Request DB periodically for
    waiting requests
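
As an illustration of the request carried in the input sandbox, the snippet below parses a hypothetical XML file holding the LFNs and the source/target SE; the real DIRAC request schema may differ.

    import xml.etree.ElementTree as ET

    # Hypothetical request XML shipped in the job's input sandbox.
    REQUEST_XML = """
    <request type="bulkReplication">
      <sourceSE>CERN-disk</sourceSE>
      <targetSE>PIC-disk</targetSE>
      <files>
        <lfn>/lhcb/production/00001/f_0001.digi</lfn>
        <lfn>/lhcb/production/00001/f_0002.digi</lfn>
      </files>
    </request>
    """

    root = ET.fromstring(REQUEST_XML)
    lfns = [lfn.text for lfn in root.find("files")]
    print(root.findtext("sourceSE"), root.findtext("targetSE"), lfns)
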
  • A combined 40 MB/s from CERN to the 6 LHCb Tier-1s was required to meet the SC3 goals
  • this aggregated daily rate was obtained
  • overall SC3 machinery not completely stable
  • target rate not sustained over the required
    period
  • peak rates of 100MB/s were observed over several
    hours
  • Rerun of exercise planned to demonstrate the
    required rates.
  • Bulk File Removal Operations
  • Bulk removal of files performed on completion of
    T0-T1 replication.
  • bulk operation of srm-advisory-delete used
  • takes a list of SURLs and removes the physical files
  • functionality added to Replica Manager and
    Storage Element
  • additions required for SRM Storage Element
    plug-in
  • Replica Manager SURL resolution tools reused
  • Different interpretations of the SRM standard have led to different underlying behaviour between SRM solutions.
  • Initially bulk removal operations executed by a
    single central agent
  • SC3 goal of 50K replicas in 24 hours shown to be
    unattainable
  • Several parallel agents instantiated
  • each performing physical and catalog removal for a specific SE (sketched after this list)
  • 10K replicas were removed from 5 sites in 28
    hours
  • performance loss observed in replica deletion on the LCG File Catalog (LFC)
  • unnecessary SSL authentications are CPU intensive
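
A hedged sketch of the parallelisation step described above: one removal worker per Storage Element, run concurrently instead of a single central loop. The function and SE names are illustrative; real code would call the SRM plug-in and the replica catalog instead of printing.

    from concurrent.futures import ThreadPoolExecutor

    def remove_replicas_for_se(se_name, surls):
        """Hypothetical per-SE removal agent: bulk physical deletion
        (srm-advisory-delete style) followed by catalog entry removal."""
        print(f"{se_name}: advisory-delete on {len(surls)} SURLs")
        return se_name, len(surls)

    # One worker per Tier-1 SE, running in parallel.
    work = {
        "CNAF-disk": ["srm://storm.cnaf.infn.it/lhcb/f1", "srm://storm.cnaf.infn.it/lhcb/f2"],
        "PIC-disk":  ["srm://srm.pic.es/lhcb/f1"],
    }
    with ThreadPoolExecutor(max_workers=len(work)) as pool:
        results = pool.map(lambda item: remove_replicas_for_se(*item), work.items())
        for se, count in results:
            print(f"{se}: removed {count} replicas")
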
  • Once the Transfer Agent obtains the Request XML file:
  • replica information for LFNs obtained
  • replicas matched against source SE and target SE
  • SURL pairs resolved using endpoint information
  • SURL pairs are then submitted via the FTS Client
  • the FTS GUID and other job information are stored back in the XML file (these steps are sketched below)
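
The steps above can be summarised in a short sketch: resolve the replicas for each LFN, match them against the source SE, build source/target SURL pairs and submit them, keeping the returned GUID for monitoring. The replica table, endpoint table and FTS submission stub are assumptions; the real workflow queries the LFC and uses the gLite FTS client.

    import uuid

    # Toy replica table: LFN -> {SE name: SURL}; stands in for an LFC lookup.
    REPLICAS = {
        "/lhcb/prod/f_0001.dst": {
            "CERN-disk": "srm://srm.cern.ch/castor/lhcb/f_0001.dst",
            "RAL-disk":  "srm://srm.rl.ac.uk/lhcb/f_0001.dst",
        },
    }

    # Toy SE endpoint table used to turn an LFN into a target SURL.
    ENDPOINTS = {"CNAF-disk": "srm://storm.cnaf.infn.it/lhcb"}

    def resolve_surl_pairs(lfns, source_se, target_se):
        """Match each LFN's replicas against the source SE and build the
        source/target SURL pairs expected by FTS."""
        pairs = []
        for lfn in lfns:
            source_surl = REPLICAS[lfn][source_se]
            target_surl = ENDPOINTS[target_se] + lfn   # endpoint info -> target SURL
            pairs.append((source_surl, target_surl))
        return pairs

    def submit_to_fts(pairs):
        """Stub for the FTS client submission; returns a job GUID which the
        Transfer Agent stores back in the request XML for later monitoring."""
        return str(uuid.uuid4())

    pairs = resolve_surl_pairs(["/lhcb/prod/f_0001.dst"], "CERN-disk", "CNAF-disk")
    print("FTS job GUID:", submit_to_fts(pairs))
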
  • Tier-1 to Tier-1 Replication Activity Ongoing
  • During T0-T1 replication FTS was found to be most
    efficient when replicating files pre-staged on
    disk.
  • dedicated disk pools set up at T1 sites for seed files
  • 1.5 TB of seed files transferred to the dedicated disk
  • FTS Servers were installed by the T1 sites
  • channels set up directly between sites
  • Replication activity for this exercise is ongoing.