Light weight Disk Pool Manager: Status Update

1
Light weight Disk Pool Manager: Status Update
  • Jean-Philippe Baud, IT-GD-CERN
  • Gilbert Grosdidier, LAL-IN2P3-CNRS & IT-GD-CERN
  • October 2005

2
DPM Goals
  • Provide a solution for the small Tier-2s in LCG-2
  • This implies a few tens of Terabytes in 2005
  • Focus on manageability
  • Easy to install
  • Easy to configure
  • Low effort for ongoing maintenance
  • Easy to add/remove resources
  • Integrated security (authentication/authorization)
  • Authentication a la Globus
  • Meant as a replacement for a Classic SE

3
Architecture
  • The Light Weight Disk Pool Manager consists of
  • The Disk Pool Manager with its configuration and
    request DB
  • The Disk Pool Name server with its NS/ACL/replica
    DB
  • The stateless SRM v1 and v2 servers
  • The MySQL server (if any)
  • All of the above usually run on the same node,
    although this is not a requirement
  • The GSI- and DPM-aware RFIOD and GsiFTP servers
  • They have to be duplicated on each disk server
  • and installed on the master node if it hosts
    local data areas

4
Current available functionalities
  • Management of disk space
  • Overall physical space management
  • automated garbage collection (removal of expired
    volatile files)
  • disk space balancing (when several filesystems)
  • Management of name space (including ACLs)
  • Control interfaces
  • socket
  • SRM v1.0 (no srmCopy method yet)
  • SRM v2.1 (without any user/VO global space
    reservation yet)
  • Data access protocols: secure RFIO, GsiFTP
    (Globus 2.4)
  • Support for multiple disk servers
  • support for multiple physical disk partitions for
    each server
  • support for several pools for each server (load
    balancing)
  • support for several space types (volatile and
    permanent pools)
  • support for multiple replicas of a file within
    the disk pools
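
The two disk space features on this slide, balancing across filesystems and garbage collection of expired volatile files, can be illustrated with a small sketch. This is not the DPM implementation: the slides do not state the actual selection policy, so "pick the filesystem with the most free space" is an assumption, and all class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileSystem:
    server: str
    mount: str
    capacity: int  # bytes
    used: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.used


@dataclass
class Replica:
    name: str
    size: int
    fs: FileSystem
    expires_at: Optional[float] = None  # None marks a permanent file


def pick_filesystem(filesystems: List[FileSystem], size: int) -> FileSystem:
    """Assumed balancing policy: the filesystem with the most free space wins."""
    candidates = [fs for fs in filesystems if fs.free >= size]
    if not candidates:
        raise RuntimeError("no filesystem in the pool can hold this file")
    return max(candidates, key=lambda fs: fs.free)


def store(filesystems: List[FileSystem], name: str, size: int,
          expires_at: Optional[float] = None) -> Replica:
    """Place a file on the chosen filesystem and account for its space."""
    fs = pick_filesystem(filesystems, size)
    fs.used += size
    return Replica(name, size, fs, expires_at)


def collect_garbage(replicas: List[Replica], now: float) -> List[Replica]:
    """Drop expired volatile replicas, free their space, return the survivors."""
    kept = []
    for rep in replicas:
        if rep.expires_at is not None and rep.expires_at <= now:
            rep.fs.used -= rep.size
        else:
            kept.append(rep)
    return kept


# Two filesystems on two disk servers; sizes are arbitrary units.
pool = [FileSystem("disk01", "/data1", 1000, used=900),
        FileSystem("disk02", "/data1", 1000, used=200)]
volatile = store(pool, "tmp.root", 300, expires_at=100.0)
permanent = store(pool, "raw.root", 300)
survivors = collect_garbage([volatile, permanent], now=200.0)
```

Both files land on disk02 because disk01 is nearly full, and only the permanent file survives the collection pass.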

5
First experience with DPM (Tier2s)
  • 18 sites in early Sept. 05
  • 3 ways to install the DPM: manual, Yaim, Quattor
    (LAL)
  • Sites using Yaim had almost no problems installing,
  • with all servers on the same node
  • or with disk servers segregated from DPM servers
  • Sites doing a manual installation (for example
    gLite sites) had quite a few problems in the
    security area
  • the DPM install must come on top of the Globus
    security step
  • The overall feeling is that the DPM is very stable

6
Service Challenge 3
  • NDGF (Denmark/Norway/Sweden) used a distributed
    Disk Pool with a single head node
  • Very stable
  • Constant data rate (150 MB/s)
  • Good resiliency observed after an accidental
    network fiber cut
  • Required several DPM restarts because of wrong
    free space reported (no reboot -> short
    disruption)
  • Bad disk nodes not blacklisted for all error
    conditions (needs integration with monitoring),
    but FTS could also disable filesystems in case of
    errors
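
To put the 150 MB/s figure in perspective against the "few tens of Terabytes" target from the goals slide, here is a short worked calculation. It assumes decimal units (1 TB = 1,000,000 MB) and uses 30 TB as an illustrative site size; neither choice comes from the slides.

```python
RATE_MB_PER_S = 150            # sustained rate quoted for NDGF in SC3
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000          # decimal units, an assumption


def volume_tb(rate_mb_per_s: float, seconds: float) -> float:
    """Data volume moved at a constant rate, in TB."""
    return rate_mb_per_s * seconds / MB_PER_TB


def days_to_move(size_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to move size_tb at a constant rate."""
    return size_tb * MB_PER_TB / rate_mb_per_s / SECONDS_PER_DAY


per_day_tb = volume_tb(RATE_MB_PER_S, SECONDS_PER_DAY)  # 12.96 TB per day
days_for_30_tb = days_to_move(30, RATE_MB_PER_S)        # a bit over 2.3 days
```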

7
Missing functionalities (SC3 feedback)
  • Sysadmin level requests
  • Support for filesystems larger than 2TB (UK)
  • Support for disk servers spread over multiple
    subnets
  • Support for TCP port range used by RFIO v3
    (firewall issue)
  • Needs VOMS integration
  • User level requests
  • Not compatible with dCache srmcp (Phedex)
  • available space reported was wrong
  • Insufficient documentation about ACLs

8
Current Status
  • DPNS, DPM, SRM v1 and v2 (without Copy or global
    space reservation) have been heavily tested
  • The secure versions of RFIO and GsiFTP interfaced
    to the DPM are now available
  • they have been thoroughly tested as well
  • DPM first public release was part of LCG 2.5.0 in
    May
  • mainly for testing, since not all functionalities
    were available
  • DPM 1.3.8 RPMs are already available
  • 4 flavors are now built
  • SL3 with MySQL or Oracle backend
  • same with RH73
  • DPM 1.4.x is just around the corner
  • to be distributed along with LCG 2.7.0 very soon
  • will include VOMS, virtual IDs, srmCopy

9
Documentation/Packaging/Installation
  • Reference man pages are provided for each method
    and server
  • Installation guide is also ready
  • more use cases are merged in as needs arise
  • see the Twiki page
  • https://uimon.cern.ch/twiki/bin/view/LCG/DpmAdminGuide
  • The DPM is available as a tarball or as a set of
    RPMs (e.g., for the MySQL backend)
  • DPM-server-mysql-1.3.8-1sec_sl3
  • DPM-name-server-mysql-1.3.8-1sec_sl3
  • DPM-srm-server-mysql-1.3.8-1sec_sl3
  • DPM-rfio-server-1.3.8-2sec_sl3
  • DPM-gsiftp-server-1.3.8-1sec_sl3
  • DPM-client-1.3.8-1sec_sl3
  • lcg-dm-common-1.3.8-1_sl3
  • Installation scripts (Yaim, Quattor) are also
    available
  • A support discussion list is available as well
  • hep-service-dpm@cern.ch

10
Disk Pool Manager short term plan
  • Bugfix release for incorrect free space reported:
    1.3.8
  • Support for large filesystems (> 2 TB): 1.3.8
  • Procedure to convert a Classic SE into a DPM:
    1.3.8
  • Integration with VOMS: in progress
  • DNs will be mapped to virtual UIDs; the virtual
    UID is created on the fly the first time the
    system receives a request for this DN (no pool
    account)
  • VOMS roles will be mapped to virtual GIDs
  • A given user may have one UID and several GIDs
  • Integration with CSEC and CGSI
  • Administrative tools to update the DB mapping
    table
  • Propagation of permissions to Storage Elements
  • We need a Consistency Server or RRS
  • srmCopy: in progress
  • du command: in progress
  • RFIO_PORT_RANGE: in progress
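
The virtual ID scheme in the VOMS bullets above (a UID created on first sight of a DN, one GID per VOMS role, one UID with several GIDs per user) can be sketched as follows. This is a minimal in-memory illustration, not the DPM mapping table: the class name, the starting ID of 101, and the example DNs and role strings are all hypothetical.

```python
import itertools
from typing import Dict, List


class VirtualIdMapper:
    """In-memory stand-in for the DB mapping table (hypothetical)."""

    def __init__(self, first_id: int = 101) -> None:
        self._uids: Dict[str, int] = {}
        self._gids: Dict[str, int] = {}
        self._uid_seq = itertools.count(first_id)
        self._gid_seq = itertools.count(first_id)

    def uid_for(self, dn: str) -> int:
        # Created on the fly the first time this DN is seen; no pool account.
        if dn not in self._uids:
            self._uids[dn] = next(self._uid_seq)
        return self._uids[dn]

    def gids_for(self, roles: List[str]) -> List[int]:
        # Each role maps to its own virtual GID, allocated on first use.
        gids = []
        for role in roles:
            if role not in self._gids:
                self._gids[role] = next(self._gid_seq)
            gids.append(self._gids[role])
        return gids


mapper = VirtualIdMapper()
alice = "/DC=org/DC=example/CN=Alice"   # illustrative DN
bob = "/DC=org/DC=example/CN=Bob"       # illustrative DN
alice_uid = mapper.uid_for(alice)
alice_gids = mapper.gids_for(["/atlas", "/atlas/Role=production"])
```

A repeated request for the same DN returns the same UID, which is what lets ownership in the name server stay stable without pre-provisioned accounts.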

11
DPM medium term plan (end 05)
  • DPM with disk servers on several network domains
    (distributed Tier2)
  • Integration with Information System - BDII (DESY)
  • already partly available in the simplest case
  • MySQL backups
  • DPM DSI plugin for Globus 4 gridFTP server
  • Drain of a pool, a server or a filesystem
  • Integration with fabric monitoring
  • Limit number of streams per disk server (may be
    useful for some applications like bulk
    replication, so pool dependent)
  • Support for ROOTD/XROOTD transfer protocol
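
The planned limit on streams per disk server could work along the lines of the sketch below, which keeps one counting semaphore per server and refuses a new stream once the limit is reached. It is only an illustration of the idea: the class and method names are hypothetical, and since the slide says the limit is pool dependent, one limiter instance per pool is assumed.

```python
import threading
from typing import Dict


class StreamLimiter:
    """Caps concurrent transfer streams per disk server (one instance per pool)."""

    def __init__(self, max_streams: int) -> None:
        self._max = max_streams
        self._lock = threading.Lock()
        self._sems: Dict[str, threading.BoundedSemaphore] = {}

    def _sem(self, server: str) -> threading.BoundedSemaphore:
        # Create the per-server semaphore lazily, under a lock.
        with self._lock:
            if server not in self._sems:
                self._sems[server] = threading.BoundedSemaphore(self._max)
            return self._sems[server]

    def try_open(self, server: str) -> bool:
        """Return True if a stream slot was taken, False if the server is full."""
        return self._sem(server).acquire(blocking=False)

    def close(self, server: str) -> None:
        """Give the slot back when the transfer ends."""
        self._sem(server).release()


limiter = StreamLimiter(max_streams=2)
opened = [limiter.try_open("disk01") for _ in range(3)]  # third one is refused
other = limiter.try_open("disk02")                       # separate budget
limiter.close("disk01")
retry = limiter.try_open("disk01")                       # a slot is free again
```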

12
DPM longer term plan (first half 06)
  • RFIO client library compatible with CASTOR
  • very urgent in fact
  • Quotas (INFN/DESY)
  • (automatic) replication inside of a pool
  • Global space reservation with max lifetime
  • required by experiments
  • Streaming mode (SRM v3)
  • implementation of a migrator/recaller to
  • either recall/migrate files automatically between
    Tier1 and Tier2s
  • or interface to a tape/DVD backend

13
DPM Insiders
  • Many ways to manage a given DPM user file
  • to transfer a file (back and forth) from local to
    SE
  • thru the socket interface
  • involves NS, DPM and RFIO/GsiFTP servers
  • thru the SRM v1/v2 interfaces
  • involves only the relevant SRM server on top of
    the 4 above
  • thru direct RFIO/GsiFTP commands
  • involves only the relevant server on top of the
    NS and DPM ones
  • implies special syntax for these commands
  • thru lcg-utils commands (e.g. lcg-cr, lcg-gt, ...)
  • involves SRM v1 and GsiFTP servers only, on top
    of NS and DPM
  • to remotely access a DPM SE file
  • thru GFAL C API (POSIX interface, e.g. gfal_open,
    gfal_read, ...)
  • involves SRM v1 and RFIO servers only, on top of
    NS and DPM
  • when running on a WN, I strongly advise this
    direct access type since then the file does not
    require any local space
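
The access modes listed on this slide and the servers each one involves can be summarised as a lookup table, which also makes it easy to see which modes break when a given server is down. The mode keys below are hypothetical labels chosen for this sketch; the server sets follow the bullets above, with "RFIO/GsiFTP" for the socket interface read as both servers.

```python
# NS and DPM servers take part in every access mode.
CORE = frozenset({"NS", "DPM"})

SERVERS_BY_MODE = {
    "socket": CORE | {"RFIO", "GsiFTP"},
    "srm_v1": CORE | {"RFIO", "GsiFTP", "SRM v1"},
    "srm_v2": CORE | {"RFIO", "GsiFTP", "SRM v2"},
    "rfio_direct": CORE | {"RFIO"},
    "gsiftp_direct": CORE | {"GsiFTP"},
    "lcg_utils": CORE | {"SRM v1", "GsiFTP"},
    "gfal": CORE | {"SRM v1", "RFIO"},
}


def servers_for(mode):
    """Servers that must be up for the given access mode."""
    return SERVERS_BY_MODE[mode]


def modes_affected_by(server):
    """Access modes that stop working if this server is unavailable."""
    return sorted(m for m, needed in SERVERS_BY_MODE.items() if server in needed)


gfal_servers = servers_for("gfal")
srm_v1_outage = modes_affected_by("SRM v1")
```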

14
DPM Insiders - more details
  • All these modes are interoperable
  • DPNS/DPM commands allow for direct access to the
    NS catalog
  • Once again, all servers are authenticated
  • RFIO and GsiFTP need to be installed only on disk
    servers
  • increased security, increased stability
  • physical files are accessible (RW) only to the
    dpmmgr account
  • the same RFIO and GsiFTP client commands are still
    able to manage ordinary files on other servers
  • with authentication
  • Castor and DPM access is still possible from the
    same client node thru use of the LCG_RFIO_TYPE
    env. var., from within GFAL only

15
Test Cycle
  • Tests were run from very early in the development
    cycle and were improved as soon as new
    functionalities were merged into the servers
  • Test results were immediately taken into account
    for code development
  • Test suites for SRM v2, DPM socket interface and
    RFIO written by Gilbert Grosdidier (Orsay)
  • Suite for SRM v1 written initially by Jiri Kosina
    (Prague)
  • Each suite includes up to 120 sub-tests
  • A global suite merging all of the above now
    available (GG)
  • now also includes lcg-util and GFAL tests
  • Stress testing (50-100 such suites running in
    parallel) to assess robustness and performance
  • Everything was tested on RH73 and SLC3 for both
    Oracle and MySQL backends
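
The stress-testing approach described above, many copies of a suite running in parallel, might look roughly like the following harness. It is a sketch under stated assumptions: the real suites issue SRM, RFIO and socket requests against live servers, whereas `fake_subtest` here is a hypothetical stand-in that fails exactly one sub-test so the report has something to show.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_suite(suite_id: int, subtest: Callable[[int, int], bool],
              n_subtests: int = 120) -> List[int]:
    """Run one suite and return the indices of the sub-tests that failed."""
    return [i for i in range(n_subtests) if not subtest(suite_id, i)]


def stress(n_suites: int, subtest: Callable[[int, int], bool],
           n_subtests: int = 120) -> Dict[int, List[int]]:
    """Run n_suites copies in parallel; report only suites with failures."""
    with ThreadPoolExecutor(max_workers=n_suites) as pool:
        futures = {sid: pool.submit(run_suite, sid, subtest, n_subtests)
                   for sid in range(n_suites)}
        results = {sid: fut.result() for sid, fut in futures.items()}
    return {sid: failed for sid, failed in results.items() if failed}


def fake_subtest(suite_id: int, index: int) -> bool:
    # Hypothetical stand-in: everything passes except sub-test 7 of suite 3.
    return not (suite_id == 3 and index == 7)


report = stress(50, fake_subtest)
```

Returning only the failing suites keeps the report short when 50 to 100 suites of up to 120 sub-tests each run at once.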

16
Testing (2)
  • Took about 50% of the global effort
  • together with install phase setup
  • This was required
  • to assess robustness to failure conditions
  • many failure use cases are inserted in the suites
  • to review carefully error message handling
  • but nothing is perfect
  • These suites are now routinely used to check new
    installs
  • esp. access permission and authentication
    cross-checks
  • specifically, the global suite merging all
    possible access modes is very well suited to this
  • Many ill-coded client requests leading to SRM
    server crashes have been fixed in the last
    release
  • the gSOAP interface is weakly protected from these

17
Plans for gLite production service
  • Provide a plausible solution for small Tier-2s
  • Migration of a Classic SE towards a DPM SE
  • Only metadata operations needed (the data does
    not need to be copied)
  • Satisfies gLite requirement for SRM interface at
    small sites
  • Several test installs already achieved at CERN

18
Current sites using DPM
  • NDGF (Norway-Denmark-Sweden)
  • Glasgow (UK)
  • Edinburgh (UK)
  • QMUL (UK)
  • RAL (UK)
  • INFN-Catania (Italy)
  • INFN-Legnaro (Italy)
  • INFN-Bari (Italy)
  • INFN-Padova (Italy)
  • CERN (Switzerland)
  • CSCS (Switzerland)
  • LAL-DAPNIA-LPNHE (France)
  • NIKHEF (Netherlands)
  • IFCA (Spain)
  • BNL (USA)
  • HIP (Finland)
  • TW-NCUHEP (Taiwan)
  • ASCC (Taiwan)