Title: ARDA status and plans
1. ARDA status and plans
2. TOC
- History
- Prototype activity
- Highlights
- LHCb, ALICE, CMS
- Some more detail on ATLAS!
- But go to the ADA meeting tomorrow (a lot of material/details/demos from the ARDA team)!
- Other activities
- Conclusions
I have more material I will be able to show. I will try to be very fast in flashing transparencies and use them for questions/discussions.
3. The ARDA project
- History
- LCG ARDA RTAG and 1st ARDA workshop recommendation
- LCG PEB project definition (February 2004)
- ARDA project starts in April 2004
- ARDA set up as an independent project
- LCG PEB decision (experiments' decision)
- Focus on prototype activities
- Starting point: the existing distributed systems in the experiments
- gLite as back-end technology
4. ARDA working group recommendations: our starting point
- New service decomposition
- Strong influence of the AliEn system
- the Grid system developed by the ALICE experiment and used by a wide scientific community (not only HEP)
- Role of experience, existing technology
- Web service framework
- Interfacing to existing middleware to enable its use in the experiment frameworks
- Early deployment of (a series of) prototypes to ensure functionality and coherence
[Diagram: EGEE Middleware (gLite) and the ARDA project]
5. End-to-end prototypes: why?
- Provide fast feedback to the EGEE MW development team
- Avoid uncoordinated evolution of the middleware
- Coherence between user expectations and the final product
- Experiments ready to benefit from the new MW as soon as possible
- Frequent snapshots of the middleware available
- Expose the experiments (and the community in charge of the deployment) to the current evolution of the whole system
- Experiment systems are very complex and still evolving
- Move forward towards new-generation real systems (analysis!)
- Prototypes should be exercised with realistic workload and conditions
- No academic exercises or synthetic demonstrations
- LHC experiment users absolutely required here!!!
EGEE Pilot Application
- A lot of work (experience and useful software) is involved in the current experiment data challenges
- Concrete starting point
- Adapt/complete/refactor the existing systems: we do not need another system!
- Contact with the experiments (including agreeing on the programme of work) mainly via the experiment interface persons
6. End-to-end prototypes: how?
- The initial prototype will have a reduced scope
- Component selection for the first prototype
- Experiment components not in use for the first prototype are not ruled out (and used/selected ones might be replaced later on)
- Not all use cases/operation modes will be supported
- Every experiment has a production system (with multiple backends, like PBS, LCG, Grid2003, NorduGrid, ...). We focus on end-user analysis on an EGEE-MW-based infrastructure
- Adapt/complete/refactor the existing experiment (sub)system!
- Collaborative effort (not a parallel development)
- Attract and involve users
- Many users are absolutely required
- Informal use cases (sketched below), like:
- A physicist selects a data sample (from current Data Challenges)
- With an example/template as a starting point, (s)he prepares a job to scan the data
- The job is split into sub-jobs, dispatched to the Grid, some error recovery is performed automatically, and the results are merged back into a single output
- The output (histograms, ntuples) is returned together with simple information on the job-end status
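A minimal sketch of this informal use case, assuming purely hypothetical helper functions (select_sample, submit_subjob, wait_for, merge_outputs are illustrative placeholders, not ARDA or gLite calls):

```python
# Hypothetical driver for the use case above: split a data sample into
# sub-jobs, submit them, retry failures once, and merge the outputs.
def run_analysis(sample_query, job_template, files_per_subjob=50):
    lfns = select_sample(sample_query)        # pick LFNs from the Data Challenge catalogue
    chunks = [lfns[i:i + files_per_subjob]
              for i in range(0, len(lfns), files_per_subjob)]

    outputs, failed = [], []
    for chunk in chunks:
        job_id = submit_subjob(job_template, input_lfns=chunk)
        status = wait_for(job_id)
        if status != "DONE":                  # simple automatic error recovery: one resubmission
            job_id = submit_subjob(job_template, input_lfns=chunk)
            status = wait_for(job_id)
        (outputs if status == "DONE" else failed).append(job_id)

    merged = merge_outputs(outputs)           # histograms / ntuples merged into one file
    return merged, {"done": len(outputs), "failed": len(failed)}
```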
7. ARDA @ Regional Centres
- Deployability is a key factor of MW success
- On different time scales: gLite prototype and Pre-Production Service
- Understand deployability issues
- Quick feedback loop
- Extend the test bed for ARDA users
- Stress and performance tests could ideally be located outside CERN
- This is for experiment-specific components (e.g. a metadata catalogue)
- Leverage the Regional Centres' local know-how
- Database technologies
- Web services
- ...
- Pilot sites might enlarge the resources available and give fundamental feedback in terms of deployability, complementing the EGEE SA1 activity (EGEE/LCG operations, Pre-Production Service)
- Running ARDA pilot installations
- Experiment data available where the experiment prototype is deployed
8. People
- Massimo Lamanna
- (EGEE NA4: Frank Harris)
- Birger Koblitz
- Dietrich Liko
- Frederik Orellana
- Derek Feichtinger
- Andreas Peters
- Julia Andreeva
- Juha Herrala
- Andrew Maier
- Kuba Moscicki
Russia
- Andrey Demichev
- Viktor Pose
- Alex Kryukov
Taiwan
- Wei-Long Ueng
- Tao-Sheng Chen
Visitors
- 2 PhD students (just starting)
- Many student requests
- ...
(On the slide, team members are also grouped by experiment: ALICE, ATLAS, CMS, LHCb)
Experiment interface persons: Piergiorgio Cerello (ALICE), David Adams (ATLAS), Lucia Silvestris (CMS), Ulrik Egede (LHCb)
9. Prototype overview
10. Related activities (LHCb)
- Integrating with gLite
- Enabling job submission through GANGA to gLite
- Job splitting and merging
- Result retrieval
- Enabling real analysis jobs to run on gLite
- Running DaVinci jobs on gLite (custom code, user algorithms)
- Installation of LHCb software using the gLite package manager
- Participating in the overall development of Ganga
- Software process (initially)
- CVS, Savannah, release management
- Major contribution to new versions
- CLI, Ganga clients
- LHCb metadata catalogue performance tests
- In collaboration with colleagues from Taiwan
- New activity started using the ARDA metadata prototype (new version, collaboration with GridPP people)
11. Current status
- GANGA job submission handler for gLite is developed
- DaVinci job runs on gLite, submitted through GANGA
- Presented at the LHCb software week
- Demo in Rio and Den Haag
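For illustration, a Ganga (GPI) session for such a submission might look roughly like this; the class and attribute names, in particular the gLite backend handler and the DaVinci options attribute, are assumptions rather than the exact ARDA implementation:

```python
# Sketch of a Ganga session submitting a DaVinci job to gLite.
# Runs inside the Ganga interpreter, where Job, DaVinci and the backend
# handlers are predefined; names used here are illustrative assumptions.
j = Job()
j.name = 'davinci-analysis'
j.application = DaVinci()                   # LHCb analysis application
j.application.optsfile = 'MyAnalysis.opts'  # hypothetical user job options
j.backend = gLite()                         # gLite submission handler developed by ARDA
j.submit()

print(j.status)                             # 'submitted' -> 'running' -> 'completed'
```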
12. Ganga clients
13. Interactive Session
- Demo at Supercomputing 04 and Den Haag
- Demo in the ALICE sw week
14. CMS: using MonALISA for user job monitoring
- A single job is submitted to gLite; the JDL contains job-splitting instructions; the master job is split by gLite into sub-jobs
- Demo at Supercomputing 04
- Dynamic monitoring of the total number of events processed by all sub-jobs belonging to the same master job
15. CMS: getting output from gLite
- When the jobs are over, the output files created by all sub-jobs belonging to the same master are retrieved by the Workflow Planner into the directory defined by the user.
- On user request, output files are merged by the Workflow Planner (currently implemented for ROOT trees and histograms; see the sketch below).
- A ROOT session is started by the Workflow Planner.
Presentation Friday (APROM meeting)
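A minimal PyROOT sketch of this kind of merging (the sub-job file names are placeholders; the Workflow Planner's actual implementation may differ):

```python
import ROOT

# Merge the ROOT outputs of several sub-jobs into one master file.
merger = ROOT.TFileMerger()
merger.OutputFile("master_output.root")
for name in ("subjob_1.root", "subjob_2.root", "subjob_3.root"):
    merger.AddFile(name)              # output file retrieved from each sub-job
ok = merger.Merge()                   # histograms are summed, trees are merged
print("merge succeeded" if ok else "merge failed")
```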
16. Related activities (CMS)
- Job submission to gLite by PhySH
- Physicist Shell
- Integrates Grid tools
- Collaboration with CLARENS
- ARDA also participates in:
- Evolution of PubDB
- Effective access to data
- Redesign of RefDB
- Metadata catalog
17. ATLAS/ARDA
Presentations tomorrow (ADA meeting)
- Main component
- Contribute to the DIAL evolution
- gLite analysis server
- Embedded in the experiment
- AMI tests and interaction
- Production and CTB tools
- Job submission (ATHENA jobs)
- Integration of the gLite Data Management within Don Quijote
- Benefit from the other experiments' prototypes
- First look at interactivity/resiliency issues
- Agent-based approach (à la DIRAC)
- GANGA (principal component of the LHCb prototype, key component of the overall ATLAS strategy)
18. DIAL @ gLite
- Interface to gLite Task Queue
- Available since May
- For release 0.92
- Proof of concept OK
- Interface to WMS
- Available since October (still under test)
- Need full WMS interface!
- For release 0.94
- AFS now only used for installation
- Service available on LXB0712
- Move to machines with external connectivity when
they are available
19. Data Management
Don Quijote: locate and move data across grid boundaries. ARDA has connected gLite.
Presentation tomorrow (ADA meeting)
[Diagram: a DQ client talks to DQ servers, each fronting an RLS catalogue and an SE, for GRID3, NorduGrid, gLite and LCG]
20. ATCOM @ CTB
- Combined Test Beam
- Various extensions were made to accommodate the new database schema used for CTB data analysis.
- New panes to edit transformations, datasets and partitions were implemented.
- Production System
- A first step is to provide a prototype with limited functionality, but support for the new production system.
21. Combined Test Beam
Real data processed on gLite: standard Athena for the test beam, data from CASTOR, processed on a gLite worker node.
Example: ATLAS TRT data analysis done by PNPI St. Petersburg (plot: number of straw hits per layer).
22. Prototype overview
23. Prototype Deployment
- Currently 34 worker nodes are available at CERN
- 10 nodes (RH7.3, PBS)
- 20 nodes (low end, SLC, LSF)
- 4 nodes (high end, SLC, LSF)
- 1 node is available in Wisconsin
-
- Number of CPUs will increase
- Number of sites will increase
- FZK Karlsruhe is preparing to connect another site
- Basic middleware components already installed
- One person hired (6-month contract), up and running
- One person to arrive in January
- Further extensions are under discussion right now
Access granted on May 18th!?
24. Access Authorization
- gLite uses Globus grid certificates (X.509) to authenticate/authorize; the session is not encrypted
- VOMS is used for VO management
- Getting access to gLite for a new user is often painful due to registration problems
- It takes a minimum of one day; for some it can take up to two weeks!
25. Accessing gLite
- Easy access to gLite considered very important
- Three shells available
- AliEn shell
- ARDA shell
- gLiteIO shell
- Too many
26. ARDA shell: C/C++ API
- A C/C++ access library for gLite has been developed by ARDA
- High performance
- Protocol quite proprietary...
- Essential for the ALICE prototype
- Generic enough for general use
- Using this API, grid commands have been added seamlessly to the standard shell
27. ARDA Feedback
- Lightweight shell is important
- Ease of installation
- No root access
- Behind NAT routers
- Shell goes together with the GAS
- Should present the user with a simplified picture of the grid
- Strong aspect of the architecture
- Not everybody liked it when it was presented
- But "not everybody" implies that the rest liked the idea
- Role of the GAS should be clarified
28. Workload Management
- ARDA has been evaluating two WMSs
- WMS derived from the AliEn Task Queue
- available since April
- pull model
- integrated with the gLite shell, file catalogue and package manager
- WMS derived from EDG
- available since mid-October
- currently push model (pull model not yet possible, but foreseen)
- not yet integrated with other gLite components (file catalogue, package manager, gLite shell)
29. Stability
- Job queues monitored at CERN every hour by ARDA
- 80% success rate (the jobs don't do anything real)
- Component support should not depend on single key persons
30. Job submission
- Submitting a user job to gLite (see the sketch below):
- Register the executable in the user bin directory
- Create a JDL file with requirements
- Submit the JDL
- Straightforward, did not experience any problems
- except system stability
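A sketch of step 2, generating a minimal JDL file from Python; the attribute names follow common EDG/gLite JDL conventions, and the concrete values are placeholders:

```python
# Write a minimal JDL file for a user analysis job (values are placeholders).
jdl = '''
Executable    = "myAnalysis.sh";
Arguments     = "dataset.list";
StdOutput     = "stdout.log";
StdError      = "stderr.log";
OutputSandbox = {"stdout.log", "stderr.log", "histos.root"};
Requirements  = other.GlueCEPolicyMaxCPUTime > 60;
'''

with open("analysis.jdl", "w") as f:
    f.write(jdl)
# analysis.jdl is then submitted via the gLite shell (step 3 above).
```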
- Advanced features tested by ARDA
- Job splitting based on the gLite file catalogue LFN hierarchy
- Collection of outputs of split jobs in a master job directory
- This functionality is widely used in the ARDA prototypes
31. Data Management
- ARDA has been evaluating two DMSs
- gLite File Catalog
- (deployed in April)
- Allowed access to experiment data from CERN CASTOR and, with low efficiency, from the Wisconsin installation
- LFN name space is organized as a very intuitive hierarchical structure
- MySQL backend
- Local File Catalogue (Fireman)
- (deployed in November)
- Just delivered to us
- gLiteIO
- Oracle backend
32. Performance
- gLite File Catalog
- Good performance due to streaming
- 80 concurrent queries, 0.35 s/query, 2.6 s startup time (see the sketch below)
- Fireman catalog
- First attempt to use the catalog: quite a high entrance fee
- Good performance
- Not yet stable results due to unexpected crashes
- We are interacting with the developers
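For illustration, numbers like those above can be obtained with a simple concurrent-client benchmark; query_catalogue() below is a hypothetical stand-in for the real catalogue client call, and the LFN pattern is made up:

```python
import threading, time

def query_catalogue(lfn_pattern):
    pass  # placeholder for the actual gLite/Fireman catalogue query

def benchmark(n_clients=80, pattern="/grid/dc04/*"):
    durations, lock = [], threading.Lock()

    def worker():
        t0 = time.time()
        query_catalogue(pattern)
        with lock:
            durations.append(time.time() - t0)

    threads = [threading.Thread(target=worker) for _ in range(n_clients)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("total %.1f s, mean %.2f s/query" % (time.time() - start,
                                               sum(durations) / len(durations)))

benchmark()
```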
33. Fireman tests
- Single entries up to 100000
- Successful, but no stable performance numbers yet
- Timeouts in reading back (ls)
- Erratic values for bulk insertion
- Problems with concurrency
- Bulk registration
- After some crashes, it seems to work more stably
- No statistics yet
- Bulk registration as a transaction
- In case of error, no file is registered (OK)
- Interactions with gLite
- First draft note ready (ARDA site)
34. gLiteIO
- Simple test procedure (sketched below)
- Create a small random file
- Copy it to an SE and read it back
- Check if it is still ok
- Repeat until one observes a problem
- ...
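A sketch of this round-trip test; write_to_se() and read_from_se() are hypothetical wrappers around the gLiteIO client, and the LFNs are made up:

```python
import hashlib, os

def round_trip_test(size=4096, max_iterations=10000):
    for i in range(max_iterations):
        data = os.urandom(size)                        # small random file content
        lfn = "/grid/arda/iotest_%d" % i               # made-up LFN
        write_to_se(lfn, data)                         # copy to the SE via gLiteIO
        back = read_from_se(lfn)                       # read it back
        if hashlib.md5(back).digest() != hashlib.md5(data).digest():
            return i                                   # data corruption detected
    return None                                        # no problem observed
```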
- A number of crashes observed
- From the client side the problem cannot be understood
- In one case, data corruption was observed
- We are interacting with the developers
35. ARDA Feedback
- We keep on testing the catalogs
- We are in contact with the developers
- Consider a clean C API for the catalogs
- Hide the SOAP toolkit
- Probably handcrafted
- Or is there a better toolkit ????
- gLiteIO has to be rock stable
36. Package management
- Multiple approaches exist for handling the experiment software and private user packages on the Grid:
- Pre-installation of the experiment software by a site manager, with subsequent publishing of the installed software. A job can run only on a site where the required package is pre-installed.
- Installation on demand on the worker node (sketched below). The installation can be removed as soon as the job execution is over.
- The current gLite package management implementation can handle lightweight installations, close to the second approach
- Clearly more work has to be done to satisfy different use cases
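A purely illustrative sketch of the second approach (installation on demand on the worker node); the package URL, environment variable and paths are placeholders, not the gLite package manager interface:

```python
import os, shutil, subprocess, tarfile, tempfile, urllib.request

def run_with_on_demand_install(package_url, job_cmd):
    workdir = tempfile.mkdtemp(prefix="pkg-")        # scratch area on the worker node
    try:
        tarball = os.path.join(workdir, "pkg.tar.gz")
        urllib.request.urlretrieve(package_url, tarball)
        with tarfile.open(tarball) as tar:
            tar.extractall(workdir)                   # lightweight, job-local installation
        env = dict(os.environ, MYSW_ROOT=workdir)     # hypothetical environment variable
        return subprocess.call(job_cmd, shell=True, env=env)
    finally:
        shutil.rmtree(workdir)                        # remove as soon as the job is over
```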
37. Metadata
- gLite has provided a prototype interface and implementation, mainly for the Biomed community
- The gLite file catalog has some metadata functionality and has been tested by ARDA
- Information describing file properties (file metadata attributes) can be defined in a tag attached to a directory in the file catalog.
- Access to the metadata attributes is via the gLite shell
- Knowledge of the schema is required
- No schema evolution
- Can these limitations be overcome?
38. ARDA Metadata
- ARDA preparatory work
- Stress testing of the existing experiment metadata catalogues was performed (for ATLAS, good collaboration with the AMI team)
- Existing implementations were shown to share similar problems
- ARDA technology investigation
- On the other hand, the usage of extended file attributes in modern file systems (NTFS, NFS, EXT2/3, SCL3, ReiserFS, JFS, XFS) was analyzed (see the sketch below)
- a sound POSIX standard exists!
- Presentation in LCG-GAG and discussion with gLite
- As a result of the metadata studies, a prototype metadata catalogue was developed
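For illustration, the extended-attribute mechanism referred to above looks like this on a modern Linux system (Python 3; the file name and attribute names are examples only):

```python
import os

path = "run1234.root"
open(path, "a").close()                         # make sure the example file exists

os.setxattr(path, b"user.experiment", b"LHCb")  # attach metadata directly to the file
os.setxattr(path, b"user.events", b"250000")

for attr in os.listxattr(path):                 # query the attributes back
    print(attr, os.getxattr(path, attr))
```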
39. Performance
- Tested operations
- querying the catalogue by metadata attributes
- attaching metadata attributes to files
- First client: LHCb
40. ARDA workshops and related activities
- ARDA workshop (January 2004 at CERN; open)
- ARDA workshop (June 21-23 at CERN; by invitation)
- The first 30 days of EGEE middleware
- NA4 meeting (15 July 2004 in Catania; EGEE open event)
- ARDA workshop (October 20-22 at CERN; open)
- LCG ARDA Prototypes
- NA4 meeting 24 November (EGEE conference in Den Haag)
- ARDA workshop (early 2005; open)
- Sharing of the AA meeting (Wednesday afternoon) to start soon (recommendation of the ARDA workshop)
- gLite document discussions fostered by ARDA (review process, workshop, invitation of the experiments to the EGEE PTF)
- GAG meetings
41. Conclusions
- ARDA has been set up to
- enable distributed HEP analysis on gLite
- Contacts have been established
- With the experiments
- With the middleware
- Experiment activities are progressing rapidly
- Prototypes for LHCb, ALICE, ATLAS and CMS are on the way
- Complementary aspects are studied
- Good interaction with the experiments' environment
- Desperately seeking users! (more interested in physics than in MW; we support them!)
- ARDA is providing early feedback to the development team
- First use of components
- Try to run real-life HEP applications
- Follow the development on the prototype
- Some of the experiment-related ARDA activities could be of general use
- Shell access (originally in ALICE/ARDA)