1
ARDA status and plans
  • Massimo Lamanna / CERN

2
TOC
  • History
  • Prototype activity
  • Highlights
  • LHCb, ALICE, CMS
  • Some more detail on ATLAS!
  • But go to the ADA meeting tomorrow (a lot of
    material/details/demo from the ARDA team)!
  • Other activities
  • Conclusions

I have more material than I will be able to show.
I will try to be very fast in flashing the
transparencies and use them for
questions/discussions.
3
The ARDA project
  • History
  • LCG ARDA RTAG and 1st ARDA workshop
    recommendation
  • LCG PEB project definition (February 2004)
  • ARDA project starts in April 2004
  • ARDA set up as an independent project
  • LCG PEB decision (Experiments decision)
  • Focus on prototype activities
  • Starting point: the existing distributed
    systems in the experiments
  • gLite as back-end technology

4
ARDA working group recommendations: our starting
point
  • New service decomposition
  • Strong influence of the AliEn system
  • the Grid system developed by the ALICE
    experiment and used by a wide scientific
    community (not only HEP)
  • Role of experience, existing technology
  • Web service framework
  • Interfacing to existing middleware to enable
    its use in the experiment frameworks
  • Early deployment of (a series of) prototypes to
    ensure functionality and coherence

EGEE Middleware (gLite)
ARDA project
5
End-to-end prototypes: why?
  • Provide a fast feedback to the EGEE MW
    development team
  • Avoid uncoordinated evolution of the middleware
  • Coherence between users' expectations and the
    final product
  • Experiments ready to benefit from the new MW as
    soon as possible
  • Frequent snapshots of the middleware available
  • Expose the experiments (and the community in
    charge of the deployment) to the current
    evolution of the whole system
  • Experiment systems are very complex and still
    evolving
  • Move forward towards new-generation real systems
    (analysis!)
  • Prototypes should be exercised with realistic
    workload and conditions
  • No academic exercises or synthetic demonstrations
  • LHC experiment users are absolutely required
    here! (EGEE Pilot Application)
  • A lot of work (experience and useful software) is
    involved in current experiments data challenges
  • Concrete starting point
  • Adapt/complete/refactor the existing systems:
    we do not need another system!
  • Contact with the experiments (including
    agreeing on the programme of work) mainly via
    the experiment interface persons

6
End-to-end prototypes: how?
  • The initial prototype will have a reduced scope
  • Components selection for the first prototype
  • Experiments components not in use for the first
    prototype are not ruled out (and used/selected
    ones might be replaced later on)
  • Not all use cases/operation modes will be
    supported
  • Every experiment has a production system (with
    multiple backends, like PBS, LCG, Grid2003,
    NorduGrid, ...). We focus on end-user analysis
    on an EGEE-MW-based infrastructure
  • Adapt/complete/refactor the existing
    experiment (sub)system!
  • Collaborative effort (not a parallel development)
  • Attract and involve users
  • Many users are absolutely required
  • An informal use case:
  • A physicist selects a data sample (from current
    Data Challenges)
  • With an example/template as a starting point,
    (s)he prepares a job to scan the data
  • The job is split into sub-jobs, dispatched to
    the Grid, some error recovery is performed
    automatically, and the results are merged back
    into a single output
  • The output (histograms, ntuples) is returned
    together with simple information on the job-end
    status

7
ARDA @ Regional Centres
  • Deployability is a key factor of MW success
  • On different time scales gLite prototype and Pre
    Production Service
  • Understand Deployability issues
  • Quick feedback loop
  • Extend the test bed for ARDA users
  • Stress and performance tests could be ideally
    located outside CERN
  • This is for experiment-specific components (e.g.
    a Meta Data catalogue)
  • Leverage Regional Centres' local know-how
  • Database technologies
  • Web services
  • Pilot sites might enlarge the resources available
    and give fundamental feedback in terms of
    deployability to complement the EGEE SA1
    activity (EGEE/LCG operations Pre Production
    Service)
  • Running ARDA pilot installations
  • Experiment data available where the experiment
    prototype is deployed

8
People
  • Massimo Lamanna
  • (EGEE NA4 Frank Harris)
  • Birger Koblitz
  • Dietrich Liko
  • Frederik Orellana
  • Derek Feichtinger
  • Andreas Peters
  • Julia Andreeva
  • Juha Herrala
  • Andrew Maier
  • Kuba Moscicki

Russia
  • Andrey Demichev
  • Viktor Pose
  • Alex Kryukov

Taiwan
  • Wei-Long Ueng
  • Tao-Sheng Chen

Visitors
  • 2 PhD students (just starting)
  • Many student requests

Experiment interfaces
  • Piergiorgio Cerello (ALICE)
  • David Adams (ATLAS)
  • Lucia Silvestris (CMS)
  • Ulrik Egede (LHCb)
9
Prototype overview
10
Related activities
  • Integrating with gLite
  • Enabling job submission through GANGA to gLite
  • Job splitting and merging
  • Result retrieval
  • Enabling real analysis jobs to run on gLite
  • Running DaVinci jobs on gLite (custom code user
    algorithms)
  • Installation of LHCb software using gLite package
    manager
  • Participating in the overall development of Ganga
  • Software process (initially)
  • CVS, Savannah, release management
  • Major contributions to new versions
  • CLI, Ganga clients
  • LHCb Metadata catalogue performance tests
  • In collaboration with colleagues from Taiwan
  • New activity started using the ARDA metadata
    prototype (new version, in collaboration with
    GridPP people)

11
Current Status
  • A GANGA job submission handler for gLite has
    been developed
  • A DaVinci job runs on gLite, submitted through
    GANGA (see the sketch below)

Presented at the LHCb software week
Demo in Rio and Den Haag
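
As an illustration, a Ganga-style submission
script for this workflow might look like the
minimal sketch below. It assumes a Ganga GPI
session; the class names (DaVinci, gLite,
ArgSplitter) and the version string are
illustrative and may not match the exact handler
names of the time.

    # Minimal sketch of submitting a DaVinci job to gLite through GANGA.
    # Runs inside a Ganga (GPI) session; class names are illustrative.
    j = Job()
    j.application = DaVinci(version="v12r11")   # hypothetical version string
    j.backend = gLite()                         # the ARDA-developed handler
    j.splitter = ArgSplitter(args=[[i] for i in range(10)])  # 10 sub-jobs
    j.submit()                                  # dispatch to the gLite WMS
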
12
Ganga clients
13
Interactive Session
  • Demo at Supercomputing 04 and Den Haag

Demo in the ALICE software week
14
CMS - Using MonALISA for user job monitoring
A single job is submitted to gLite; the JDL
contains job-splitting instructions; the master
job is split by gLite into sub-jobs
  • Demo at Supercomputing 04

Dynamic monitoring of the total number of events
processed by all sub-jobs belonging to the same
master job
15
CMS: getting output from gLite
  • When the jobs are finished, the output files
    created by all sub-jobs belonging to the same
    master are retrieved by the Workflow Planner to
    the directory defined by the user.
  • On user request, output files are merged by the
    Workflow Planner (currently implemented for Root
    trees and histograms; see the sketch below).
  • A Root session is started by the Workflow
    Planner.
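
The merge step can be sketched with PyROOT's
TFileMerger, which sums histograms and chains
trees. This is only an illustration of the
operation, not the Workflow Planner's actual
code; file names are placeholders.

    # Sketch: merge sub-job Root outputs into one file, as the
    # Workflow Planner does on request. File names are placeholders.
    import ROOT

    merger = ROOT.TFileMerger()
    merger.OutputFile("master_output.root")
    for name in ["subjob_1.root", "subjob_2.root", "subjob_3.root"]:
        merger.AddFile(name)       # register each sub-job output
    merger.Merge()                 # histograms are summed, trees chained
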

Presentation Friday (APROM meeting)
16
Related Activities
  • Job submission to gLite via PhySH
  • Physicist Shell
  • Integrates Grid Tools
  • Collaboration with CLARENS
  • ARDA participates also in
  • Evolution of PubDB
  • Effective access to data
  • Redesign of RefDB
  • Metadata catalog

17
ATLAS/ARDA
Presentations tomorrow (ADA meeting)
  • Main component
  • Contribute to the DIAL evolution
  • gLite analysis server
  • Embedded in the experiment
  • AMI tests and interaction
  • Production and CTB tools
  • Job submission (ATHENA jobs)
  • Integration of the gLite Data Management within
    Don Quijote
  • Benefit from the other experiments prototypes
  • First look at interactivity/resiliency issues
  • Agent-based approach (a la DIRAC)
  • GANGA (Principal component of the LHCb prototype,
    key component of the overall ATLAS strategy)

18
DIAL @ gLite
  • Interface to gLite Task Queue
  • Available since May
  • For release 0.92
  • Proof of concept OK
  • Interface to WMS
  • Available since October (still under test)
  • Need full WMS interface!
  • For release 0.94
  • AFS now only used for installation
  • Service available on LXB0712
  • Move to machines with external connectivity when
    they are available

19
Data Management
Don Quijote: locate and move data over grid
boundaries. ARDA has connected gLite.
Presentation tomorrow (ADA meeting)

[Diagram: a DQ client talks to DQ servers sitting
in front of the RLS catalogues and storage
elements (SEs) of GRID3, NorduGrid, gLite and
LCG.]
20
ATCOM @ CTB
  • Combined Testbeam
  • Various extensions were made to accommodate the
    new database schema used for CTB data analysis.
  • New panes to edit transformations, datasets and
    partitions were implemented.
  • Production System
  • A first step is to provide a prototype with
    limited functionality, but support for the new
    production system.

21
Combined Test Beam
Real data processed on gLite: standard Athena for
the testbeam, data from CASTOR, processed on a
gLite worker node.

[Plot: example ATLAS TRT data analysis done by
PNPI St. Petersburg (number of straw hits per
layer).]
22
Prototype overview
23
Prototype Deployment
  • Currently 34 worker nodes are available at CERN
  • 10 nodes (RH7.3, PBS)
  • 20 nodes (low end, SLC, LSF)
  • 4 nodes (high end, SLC, LSF)
  • 1 node is available in Wisconsin
  • Number of CPUs will increase
  • Number of sites will increase
  • FZK Karlsruhe is preparing to connect another
    site
  • Basic middleware components already installed
  • One person hired (6-month contract), up and
    running
  • One person to arrive in January
  • Further extensions are under discussion right now

Access granted on May 18th!
24
Access Authorization
  • gLite uses Globus Grid certificates (X.509) to
    authenticate and authorize; sessions are not
    encrypted
  • VOMS is used for VO management
  • Getting access to gLite is often painful for a
    new user due to registration problems
  • It takes a minimum of one day; for some it can
    take up to two weeks!

25
Accessing gLite
  • Easy access to gLite is considered very
    important
  • Three shells available
  • Alien shell
  • ARDA shell
  • gLiteIO shell
  • Too many

26
ARDA shell: C/C++ API
  • A C/C++ access library for gLite has been
    developed by ARDA
  • High performance
  • Protocol quite proprietary...
  • Essential for the ALICE prototype
  • Generic enough for general use
  • Using this API, grid commands have been added
    seamlessly to the standard shell

27
ARDA Feedback
  • Lightweight shell is important
  • Ease of installation
  • No root access
  • Behind NAT routers
  • Shell goes together with the GAS
  • Should present the user with a simplified
    picture of the grid
  • Strong aspect of the architecture
  • Not everybody liked it when it was presented
  • But 'not everybody' implies that the rest
    liked the idea
  • Role of GAS should be clarified

28
Workload Management
  • ARDA has been evaluating two WMSs
  • WMS derived from the AliEn Task Queue
  • available since April
  • pull model (see the sketch below)
  • integrated with the gLite shell, file catalog
    and package manager
  • WMS derived from EDG
  • available since mid-October
  • currently push model (pull model not yet
    possible but foreseen)
  • not yet integrated with other gLite components
    (file catalogue, package manager, gLite shell)
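
The pull model can be sketched as follows; the
TaskQueue method names are hypothetical and serve
only to illustrate the control flow, in contrast
to the push model, where a broker selects a
computing element and sends the job to it.

    # Sketch of the pull model: an agent near the worker node asks the
    # Task Queue for work matching its local resources. All names are
    # hypothetical; this only illustrates the control flow.
    import time

    def pull_agent(task_queue, local_resources):
        while True:
            job = task_queue.fetch_matching_job(local_resources)  # pull
            if job is None:
                time.sleep(60)          # nothing matched; poll again later
                continue
            job.run()                   # execute on the local resources
            task_queue.report(job)      # send back status and output refs
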

29
Stability
  • Job queues monitored at CERN every hour by ARDA
  • 80% success rate (the jobs don't do anything
    real)
  • Component support should not depend on single key
    persons

30
Job submission
  • Submitting a user job to gLite (see the sketch
    below)
  • Register the executable in the user bin
    directory
  • Create a JDL file with requirements
  • Submit the JDL
  • Straightforward; we did not experience any
    problems except system stability
  • Advanced features tested by ARDA
  • Job splitting based on the gLite file catalogue
    LFN hierarchy
  • Collection of the outputs of split jobs in a
    master job directory
  • This functionality is widely used in the ARDA
    prototypes
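
The three submission steps can be sketched as
below. The JDL attributes follow the general
EDG/gLite ClassAd style and the CLI command name
is an assumption; both may differ in detail from
the interface of the time.

    # Sketch of the submission steps: write a JDL file describing the
    # registered executable plus requirements, then hand it to the CLI.
    # Attribute details and the command name are assumptions.
    import subprocess

    jdl = '''
    Executable    = "myanalysis.sh";
    StdOutput     = "stdout.log";
    StdError      = "stderr.log";
    OutputSandbox = {"stdout.log", "stderr.log", "histos.root"};
    Requirements  = other.GlueHostOperatingSystemName == "SLC";
    '''

    with open("analysis.jdl", "w") as f:
        f.write(jdl)

    # Hypothetical command name; the actual CLI may differ.
    subprocess.run(["glite-job-submit", "analysis.jdl"], check=True)
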

31
Data Management
  • ARDA has been evaluating two DMSs
  • gLite File Catalog
  • (deployed in April)
  • Allowed us to access experiment data from CERN
    CASTOR and, with low efficiency, from the
    Wisconsin installation
  • The LFN name space is organized as a very
    intuitive hierarchical structure
  • MySQL backend
  • Local File Catalogue (Fireman)
  • (deployed in November)
  • Just delivered to us
  • gLiteIO
  • Oracle backend

32
Performance
  • gLite File Catalog
  • Good performance due to streaming
  • 80 concurrent queries, 0.35 s/query, 2.6 s
    startup time (see the measurement sketch below)
  • Fireman catalog
  • First attempt to use the catalog: quite a high
    entrance fee
  • Good performance
  • No stable results yet due to unexpected crashes
  • We are interacting with the developers
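
Figures like these come from measurements of the
kind sketched below; query_catalog() is a
stand-in for the real catalogue client call, and
the thread count mirrors the 80 concurrent
clients quoted above.

    # Sketch of the concurrency measurement: N threads issue the same
    # catalogue query; the mean time per query is reported.
    import time
    from concurrent.futures import ThreadPoolExecutor

    def query_catalog(pattern):
        time.sleep(0.01)               # stand-in for the real client call

    def timed_query(pattern):
        start = time.time()
        query_catalog(pattern)
        return time.time() - start

    def benchmark(n_clients=80, pattern="/grid/cms/dc04/*"):
        with ThreadPoolExecutor(max_workers=n_clients) as pool:
            durations = list(pool.map(timed_query, [pattern] * n_clients))
        print("mean s/query:", sum(durations) / len(durations))
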

33
Fireman tests
  • Single entries, up to 100,000
  • Successful, but no stable performance numbers
    yet
  • Timeouts in reading back (ls)
  • Erratic values for bulk insertion
  • Problems with concurrency
  • Bulk registration
  • After some crashes, it seems to work more
    stably
  • No statistics yet
  • Bulk registration as a transaction
  • In case of error, no file is registered (OK)
  • Interactions with gLite
  • First draft note ready (ARDA site)

34
gLiteIO
  • Simple test procedure (sketched below)
  • Create a small random file
  • Copy it to the SE and read it back
  • Check whether it is still intact
  • Repeat until a problem is observed
  • A number of crashes observed
  • From the client side the problem cannot be
    understood
  • In one case, data corruption has been observed
  • We are interacting with the developers
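
A round-trip test of this kind can be sketched as
follows; the glite-io-put/get command names are
hypothetical stand-ins for the actual gLiteIO
client.

    # Sketch of the round-trip test: write a small random file, copy it
    # to the SE and back, compare checksums, repeat until a mismatch or
    # crash is observed. The copy command names are hypothetical.
    import hashlib, os, subprocess, tempfile

    def md5(path):
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest()

    def round_trip(se_path="se://somehost/test.dat"):
        src = os.path.join(tempfile.mkdtemp(), "src.dat")
        dst = src + ".back"
        with open(src, "wb") as f:
            f.write(os.urandom(4096))                 # small random file
        subprocess.run(["glite-io-put", src, se_path], check=True)
        subprocess.run(["glite-io-get", se_path, dst], check=True)
        return md5(src) == md5(dst)                   # False: corruption

    while round_trip():
        pass                           # repeat until a problem shows up
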

35
ARDA Feedback
  • We keep on testing the catalogs
  • We are in contact with the developers
  • Consider a clean C API for the catalogs
  • Hide the SOAP toolkit
  • Probably handcrafted
  • Or is there a better toolkit?
  • gLiteIO has to be rock solid

36
Package management
  • Multiple approaches exist for handling the
    experiment software and users' private packages
    on the Grid
  • Pre-installation of the experiment software is
    done by a site manager, with subsequent
    publishing of the installed software. A job can
    run only on a site where the required package
    is preinstalled.
  • Installation on demand at the worker node. The
    installation can be removed as soon as job
    execution is over.
  • The current gLite package management
    implementation can handle lightweight
    installations, close to the second approach
  • Clearly more work has to be done to satisfy the
    different use cases

37
Metadata
  • gLite has provided a prototype interface and
    implementation, mainly for the Biomed community
  • The gLite file catalog has some metadata
    functionality and has been tested by ARDA
  • Information containing file properties (file
    metadata attributes) can be defined in a tag
    attached to a directory in the file catalog
  • Access to the metadata attributes is via the
    gLite shell
  • Knowledge of the schema is required
  • No schema evolution
  • Can these limitations be overcome?

38
ARDA Metadata
  • ARDA preparatory work
  • Stress testing of the existing experiment
    metadata catalogues was performed (for ATLAS,
    good collaboration with the AMI team)
  • The existing implementations turned out to
    share similar problems
  • ARDA technology investigation
  • In parallel, the usage of extended file
    attributes in modern file systems (NTFS, NFS,
    EXT2/3 (SLC3), ReiserFS, JFS, XFS) was analyzed
    (see the example below)
  • a sound POSIX standard exists!
  • Presentation in LCG-GAG and discussion with
    gLite
  • As a result of the metadata studies, a
    prototype for a metadata catalogue was
    developed
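
For illustration, the POSIX extended-attribute
interface mentioned above can be exercised
directly, here through Python's os module on
Linux; the file and attribute names are
illustrative.

    # POSIX extended attributes via Python's os module (Linux).
    # User-defined attributes live in the "user." namespace.
    import os

    path = "event_1234.root"
    open(path, "a").close()                            # ensure file exists
    os.setxattr(path, "user.run_number", b"1234")      # attach metadata
    os.setxattr(path, "user.beam_energy", b"450GeV")
    print(os.listxattr(path))                    # ['user.run_number', ...]
    print(os.getxattr(path, "user.run_number"))  # b'1234'
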

39
Performance
  • Tested operations
  • querying the catalogue by meta attributes
  • attaching meta attributes to files
  • First client: LHCb

40
ARDA workshops and related activities
  • ARDA workshop (January 2004 at CERN; open)
  • ARDA workshop (June 21-23 at CERN; by
    invitation)
  • The first 30 days of EGEE middleware
  • NA4 meeting (15 July 2004 in Catania; EGEE open
    event)
  • ARDA workshop (October 20-22 at CERN; open)
  • LCG ARDA Prototypes
  • NA4 meeting, 24 November (EGEE conference in
    Den Haag)
  • ARDA workshop (early 2005; open)
  • Sharing of the AA meeting (Wed afternoon) to
    start soon (recommendation of the ARDA
    workshop)
  • gLite document discussions fostered by ARDA
    (review process, workshop, invitation of the
    experiments to the EGEE PTF)
  • GAG meetings

41
Conclusions
  • ARDA has been set up to
  • enable distributed HEP analysis on gLite
  • Contacts have been established
  • with the experiments
  • with the middleware developers
  • Experiment activities are progressing rapidly
  • Prototypes for LHCb, ALICE, ATLAS and CMS are
    on the way
  • Complementary aspects are studied
  • Good interaction with the experiments'
    environments
  • Desperately seeking users! (more interested in
    physics than in middleware; we support them!)
  • ARDA is providing early feedback to the
    development team
  • First use of components
  • Trying to run real-life HEP applications
  • Following the development on the prototype
  • Some of the experiment-related ARDA activities
    could be of general use
  • Shell access (originally in ALICE/ARDA)