Title: LCG ARDA project Status and plans
1 LCG ARDA project: Status and plans
2 Overview
- ARDA in a nutshell
- ARDA prototypes
- 4 experiments
- ARDA feedback
- Middleware components on the development test bed
- ARDA workshops
- Outlook and conclusions
3 The ARDA project
- ARDA is an LCG project
- Its main activity is to enable LHC analysis on the grid
- ARDA is contributing to EGEE
- It uses the entire CERN NA4-HEP resource (NA4 = Applications)
- Interface with the new EGEE middleware (gLite)
- By construction, ARDA uses the new middleware
- Use the grid software as it matures
- Verify the components in an analysis environment
- Contribution to the experiments' frameworks (discussion, direct contribution, benchmarking, ...)
- Users are needed here, namely physicists who need distributed computing to perform their analyses
- Provide early and continuous feedback
4 ARDA prototype overview
5 Current Status
- A GANGA job submission handler for gLite has been developed
- DaVinci jobs run on gLite, submitted through GANGA
- GANGA is a joint ATLAS/LHCb project
- and a core component of the ARDA/LHCb activity
- Presented at the LHCb software week
- Demos in Rio and Den Haag
6 ARDA contributions
- Integration of the LHCb environment with gLite
- Enabling job submission through GANGA to gLite
- Job splitting and merging
- Result retrieval
- Enabling real analysis jobs to run on gLite
- Running DaVinci jobs on gLite (custom code, user algorithms)
- Installation of LHCb software using the gLite package manager
- Participating in the overall development of Ganga
- Software process (initially): CVS, Savannah, release management
- Major contributions to the new versions
- Python command-line interface, Ganga clients
7 Ganga 4 - Introduction
- Ganga 4 is the next major release of Ganga
- Now in ALPHA (for one month; several releases, exposed to expert users)
- Ganga 3 has reached its limitations
- Ganga 4 is based on the Ganga scripting interface; the GUI will follow later
- Ganga 4 will overcome some of the shortcomings of the current release
- by using a modular design
- applying functional decomposition of modules
- having extended functionality
- Ganga 4 software reengineering
- Starting point: recode Ganga 4 from scratch where needed, add the GUI later
- GPI (Ganga Public Interface): 95% based on the Ganga 3 CLIP (a sketch of a GPI-style session follows below)
- Ganga Core: fast and reliable, self-contained (no GUI), clear architecture and module dependencies
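To give a feel for the GPI style mentioned above, the following sketch shows what a job definition could look like at the interactive Ganga prompt. The class, attribute and backend names are indicative only (drawn from the general Ganga job model, not verified against the Ganga 4 alpha); inside a Ganga session the GPI names are pre-loaded, so no imports appear.

```python
# Sketch of a GPI-style session (indicative names only; in the Ganga
# shell the GPI classes used below are already available).
j = Job()                                    # a new analysis job
j.application = Executable(exe='run_davinci.sh', args=['--events', '1000'])
j.backend = 'gLite'                          # or a local/batch backend
j.submit()                                   # hand the job over to the backend

print(j.status)                              # e.g. 'submitted', 'running', 'completed'
```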
8 Ganga 4
- Major version
- Important contribution from the ARDA team
- Interesting concepts
- Note that GANGA is a joint ATLAS-LHCb project
- Contacts with CMS (exchange of ideas, code snippets, ...)
9 GANGA Workshop, 13-15 June
GANGA Workshop at Imperial College London (organised by U. Egede)
http://agenda.cern.ch/fullAgenda.php?ida=a052763
10 LHCb user
S. K. Paterson (LHCb) Glasgow Univ.
11 Related activities
- GANGA-DIRAC (LHCb production system)
- Convergence with GANGA (components, experience)
- Submitting jobs to DIRAC using GANGA
- GANGA-Condor
- Enabling submission of jobs via GANGA
- LHCb metadata catalogue performance tests
- In collaboration with colleagues from Taiwan
- New activity started using the ARDA metadata prototype (new version, collaboration with GridPP/LHCb people)
- New prototypes for the GUI
12 ALICE prototype
- ROOT and PROOF
- ALICE provides
- the UI
- the analysis application (AliROOT)
- The GRID middleware (gLite) provides all the rest
- ARDA/ALICE is evolving the ALICE analysis system
[Diagram: end-to-end chain - UI shell, application, middleware]
13 PROOF
[Diagram: a user session talks to the PROOF master server, which drives PROOF slaves at Sites A, B and C]
Demo based on a hybrid system using the 2004 prototype
14 Interactive Session
- Demo at Supercomputing 04 and in Den Haag
- Demo in the ALICE software week
15 ARDA shell + C/C++ API
- A C/C++ access library for gLite has been developed by ARDA
- High performance
- Protocol quite proprietary...
- Essential for the ALICE prototype
- Generic enough for general use
- Using this API, grid commands have been added seamlessly to the standard shell (see the sketch below)
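As a purely illustrative sketch of how such an access library can be pulled into a scripting environment, the snippet below wraps a shared library with ctypes. The library name (libglite_access.so) and the function ga_ls are hypothetical placeholders; the real ARDA library and its entry points are not spelled out in these slides.

```python
import ctypes

# Hypothetical names throughout: "libglite_access.so" and "ga_ls" are
# placeholders standing in for the real ARDA/gLite access library.
try:
    _lib = ctypes.CDLL("libglite_access.so")
except OSError:
    _lib = None  # library not installed on this machine

def grid_ls(path):
    """List a grid catalogue directory through the access library."""
    if _lib is None:
        return "(gLite access library not available)"
    _lib.ga_ls.argtypes = [ctypes.c_char_p]
    _lib.ga_ls.restype = ctypes.c_char_p
    return _lib.ga_ls(path.encode()).decode()

if __name__ == "__main__":
    print(grid_ls("/grid/alice/"))
```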
16 Current Status
- Developed the gLite C/C++ API and API Service
- providing a generic interface to any GRID service
- The C/C++ API is integrated into ROOT
- In the ROOT CVS
- Job submission and job status queries for batch analysis can be done from inside ROOT
- A bash interface for gLite commands with catalogue expansion has been developed
- More powerful than the original shell
- In use in ALICE
- Considered a generic middleware contribution (essential for ALICE, interesting in general)
- First version of the interactive analysis prototype is ready
- The batch analysis model has been improved
- submission and status queries are integrated into ROOT
- job splitting is based on XML query files
- the application (AliRoot) reads files using xrootd without prestaging
17 ATLAS/ARDA
- Main component
- Contribute to the DIAL evolution
- gLite analysis server
- Embedded in the experiment
- AMI tests and interaction
- Production and CTB tools
- Job submission (ATHENA jobs)
- Integration of the gLite Data Management within Don Quijote
- Benefit from the other experiments' prototypes
- First look at interactivity/resiliency issues
- Agent-based approach (a la DIRAC)
- GANGA (principal component of the LHCb prototype, key component of the overall ATLAS strategy)
18 Data Management
- Don Quijote: locate and move data across grid boundaries
- ARDA has connected gLite to it
[Diagram: a DQ client talks to DQ servers in front of GRID3, NorduGrid, gLite and LCG, each with its own replica catalogue (RLS) and storage elements (SE)]
19 ATCOM @ CTB
- Combined Test Beam
- Various extensions were made to accommodate the new database schema used for CTB data analysis
- New panes to edit transformations, datasets and partitions were implemented
- Production System
- A first step is to provide a prototype with limited functionality, but with support for the new production system
20 Combined Test Beam
- Real data processed on gLite: standard Athena for the test beam, data from CASTOR, processed on a gLite worker node
- Example: ATLAS TRT data analysis done by PNPI St. Petersburg (number of straw hits per layer)
21 ATLAS: first look at interactivity matters
- Using DIANE
22 ARDA/CMS
- Pattern of the ARDA/CMS activity
- Prototype (ASAP)
- Contributions to CMS-specific components
- RefDB/PubDB
- Usage of components used by CMS
- Notably MonALISA
- Contribution to CMS-specific developments
- PhySh
23 ARDA/CMS
- ARDA/CMS prototype
- RefDB redesign and PubDB
- Taking part in the RefDB redesign
- Developing the schema for PubDB and supervising the development of the first PubDB version
- Analysis prototype connected to MonALISA
- Tracking the progress of an analysis task is troublesome when the task is split into several (hundreds of) sub-jobs
- The analysis prototype gives each sub-job a built-in identity and the capability to report its progress to the MonALISA system
- The MonALISA service receives and combines the progress reports of the single sub-jobs and publishes the overall progress of the whole task (a simplified reporting sketch follows below)
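To make the reporting pattern concrete, here is a minimal sketch of a sub-job sending periodic progress reports as key/value pairs over UDP. This is a simplified stand-in: the host, port and message format are assumptions for illustration, not the actual MonALISA reporting API used by the prototype.

```python
import os
import socket
import time

# Assumed monitoring endpoint; the real prototype reports to a MonALISA
# service, whose API is not reproduced here.
MONITOR_HOST = os.environ.get("MONITOR_HOST", "localhost")
MONITOR_PORT = 8884

def report_progress(task_id, subjob_id, events_done, events_total):
    """Send one progress report (fire-and-forget UDP datagram)."""
    msg = "task=%s subjob=%d done=%d total=%d ts=%d" % (
        task_id, subjob_id, events_done, events_total, int(time.time()))
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(msg.encode(), (MONITOR_HOST, MONITOR_PORT))
    sock.close()

# Example: a sub-job reporting after every 1000 processed events
for done in range(0, 5001, 1000):
    report_progress("higgs-2tau-analysis", subjob_id=7,
                    events_done=done, events_total=5000)
```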
24 ARDA/CMS
- PhySh (Physicist Shell)
- ASAP is Python-based and uses XML-RPC calls for client-server interaction, like Clarens and PhySh (see the sketch below)
- In addition, to enable future integration, the analysis prototype has a CVS repository structured similarly to the PhySh project
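The XML-RPC call pattern itself is standard; the short sketch below shows a toy server and client using Python's standard library. The method name submit_task and its arguments are invented for illustration and are not the actual ASAP or PhySh interface.

```python
# Toy XML-RPC server and client illustrating the call pattern mentioned
# above; "submit_task" and its arguments are invented for illustration.
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def submit_task(owner, dataset, n_jobs):
    return "task-0001 for %s on %s (%d subjobs)" % (owner, dataset, n_jobs)

server = SimpleXMLRPCServer(("localhost", 8123), logRequests=False, allow_none=True)
server.register_function(submit_task)
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ServerProxy("http://localhost:8123")
print(client.submit_task("physicist", "/higgs/2tau/sample", 50))
```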
25 ARDA-CMS
- CMS prototype (ASAP: Arda Support for cms Analysis Processing)
- The first version of the CMS analysis prototype, capable of creating, submitting and monitoring CMS analysis jobs on the gLite middleware, was developed by the end of 2004
- Demonstrated at the CMS week in December 2004
- The prototype has evolved to support both RB versions deployed at the CERN testbed (the prototype task queue and the gLite 1.0 WMS)
- Currently submission to both RBs is available and completely transparent for the users (same configuration file, same functionality)
- Plan to implement a gLite job submission handler for CRAB
- Users?
- Starting from February 2005, CMS users began working on the testbed, submitting jobs through ASAP
- Positive feedback; suggestions from the users are implemented asap
- Plan to involve more users as soon as the preproduction farm is available
- Plan to try out in the prototype the new functionality provided by the WMS (DAGs, interactive jobs for testing purposes)
26 ASAP work and information flow - First scenario
[Diagram: information flow between the ASAP UI (job generation, submission, querying job status, saving output), gLite (JDL), the job running on the worker node with its job monitoring directory, the monitoring system (MonALISA) and the CMS catalogues (RefDB, PubDB); the output file locations are returned to the user]
- In the configuration file the user defines: application, application version, executable, ORCA data cards, data sample, working directory, Castor directory to save the output, number of events to be processed, number of events per job
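For illustration, the kind of task configuration listed above could look like the INI-style sketch below, read back with Python's standard library. The section and key names are invented; the real ASAP configuration file may use different ones.

```python
# Invented section/key names, illustrating the configuration items
# listed on the slide above.
from configparser import ConfigParser
from io import StringIO

EXAMPLE_CFG = """
[application]
name = ORCA
version = 8.7.1
executable = myAnalysis
orca_data_cards = analysis.orcarc

[task]
data_sample = /PTDR/higgs_2tau
working_dir = /afs/cern.ch/user/p/physicist/work
castor_output_dir = /castor/cern.ch/user/p/physicist/higgs
events_total = 100000
events_per_job = 1000
"""

cfg = ConfigParser()
cfg.read_file(StringIO(EXAMPLE_CFG))
n_jobs = cfg.getint("task", "events_total") // cfg.getint("task", "events_per_job")
print("task would be split into %d subjobs" % n_jobs)
```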
27 ASAP work and information flow - Second scenario
[Diagram: the ASAP UI delegates the user credentials to the ASAP Job Monitoring service using MyProxy; the service takes care of job submission, checking the job status, resubmission in case of failure, fetching the results and storing them to Castor, and publishes the job status on the web; the same configuration items as in the first scenario drive the flow through gLite (JDL), MonALISA, RefDB and PubDB, and the output file locations are reported back]
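The service loop in this second scenario is essentially "poll, resubmit on failure, fetch on success". The schematic sketch below shows that loop; get_status, resubmit and fetch_output are placeholders for the real gLite and storage operations, and the status names are simplified.

```python
import random
import time

# Placeholders for the real operations (gLite L&B query, WMS
# resubmission, copy of the output to Castor); simulated here.
def get_status(job_id):
    return random.choice(["Running", "Done", "Aborted"])

def resubmit(job_id):
    print("resubmitting", job_id)

def fetch_output(job_id):
    print("storing output of", job_id, "to Castor")

pending = {"subjob-%03d" % i for i in range(5)}
while pending:
    for job_id in sorted(pending):
        status = get_status(job_id)
        if status == "Aborted":
            resubmit(job_id)
        elif status == "Done":
            fetch_output(job_id)
            pending.discard(job_id)
    time.sleep(1)   # a real monitoring service would poll far less often
```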
28 CMS - Using MonALISA for user job monitoring
- A single job is submitted to gLite; the JDL contains job-splitting instructions; the master job is split by gLite into sub-jobs
- Dynamic monitoring of the total number of events processed by all sub-jobs belonging to the same master job
- Demo at Supercomputing 04
29 ASAP: starting point for the users
- The user is familiar with the experiment application needed to perform the analysis (the ORCA application for CMS)
- The user knows how to create an executable able to run the analysis task (reading selected data samples, using the data to compute derived quantities, taking decisions, filling histograms, selecting events, etc.). The executable is based on the experiment framework
- The user has debugged the executable on small data samples, on a local computer or computing services (e.g. lxplus at CERN)
- How to go for larger samples, which can be located at any regional centre CMS-wide?
- The users should not be forced
- to change anything in the compiled code
- to change anything in the configuration file for ORCA
- to know where the data samples are located
- ...
30 Submission
- The system hides the details of the submission to the grid infrastructure
- The location of the data sample (many file names, distributed data) is resolved through the grid and experiment catalogues and experiment-specific tools
- The task is split into a set of identical subjobs (every subjob processes a different chunk of data); a minimal splitting sketch follows below
- The jobs are submitted using the gLite Workload Management Service
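A minimal sketch of the splitting step, assuming (for illustration) that the chunking is done by input files; the file names and chunk size are invented.

```python
# Cut a task over a list of input files into identical subjobs, each
# processing a different chunk (invented file names, for illustration).
def split_task(files, files_per_job):
    """Return one (subjob_id, chunk_of_files) pair per subjob."""
    chunks = [files[i:i + files_per_job]
              for i in range(0, len(files), files_per_job)]
    return list(enumerate(chunks))

sample = ["higgs_2tau_%03d.root" % i for i in range(10)]
for subjob_id, chunk in split_task(sample, files_per_job=3):
    print("subjob", subjob_id, "->", chunk)
```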
31 Job monitoring
- Although the system hides the details of the grid infrastructure, the user needs ways to understand what is going on
- How to follow the progress of the analysis task (consisting of multiple jobs)?
- How many subjobs of the task are still running?
- How many have failed?
- What is the estimated time for the task to be completed?
- A certain subjob, or the whole task, has failed
- Because of the Grid?
- Because it encountered unexpected conditions or corrupted data?
- Because of a bug of mine?
- The gLite middleware plus other components from the CMS framework allow monitoring tools to be built: the ASAP info monitor (MonALISA) and the gLite Logging and Bookkeeping
- All these activities are not trivial at all on the Grid
- No direct access to the node where the code is executed
- Authentication/authorization mechanisms are essential (gLite, MyProxy)
- Obviously the user cannot do without them
- Both framework info and user-defined info are sent to MonALISA
32 Retrieval of the results
- Once the jobs are accomplished, the output files have to be retrieved from the Grid
- An (optional) merging stage is run at the end
- The user will get
- one log file per job
- histograms summed over all subjob results
- a chain of ntuples with all the selected events
- Retrieve the data from the gLite Storage Element, then run experiment-specific tools (a schematic merging step is sketched below)
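Below is a schematic version of the merging stage, assuming (purely for illustration) that each subjob wrote a pickle file containing a histogram as a bin-to-count mapping plus a list of selected events; in the real workflow the outputs are ROOT histograms and ntuples handled by experiment tools.

```python
import pickle

# Schematic merge: sum per-subjob histograms and concatenate the lists
# of selected events (toy data format, not the real ROOT outputs).
def merge_outputs(paths):
    total_hist, all_events = {}, []
    for path in paths:
        with open(path, "rb") as f:
            hist, events = pickle.load(f)
        for bin_centre, count in hist.items():
            total_hist[bin_centre] = total_hist.get(bin_centre, 0) + count
        all_events.extend(events)
    return total_hist, all_events

# Example with two fake subjob outputs written on the fly
fake_outputs = [({100: 3, 120: 1}, ["evt-1"]), ({100: 2}, ["evt-2", "evt-3"])]
for i, data in enumerate(fake_outputs):
    with open("subjob_%d.pkl" % i, "wb") as f:
        pickle.dump(data, f)

print(merge_outputs(["subjob_0.pkl", "subjob_1.pkl"]))
```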
33 First CMS users on gLite
- A demo of the first working version of the prototype was given to the CMS community in December 2004
- ASAP is the first ARDA prototype which migrated to gLite version 1.0
- The first CMS physicists started to work on the gLite testbed using ASAP at the beginning of February 2005
- Currently we support 5 users from different physics groups (we cannot allow more before moving to the preproduction farm)
- 3 users - Higgs group
- 1 user - SUSY group
- 1 user - Standard Model
- Positive feedback from the users; many suggestions for improving the interface and functionality. Fruitful collaboration.
- ASAP has a support mailing list and a web page where we have started to create a user guide
- http://arda-cms.cern.ch/asap/doc
34 H -> 2τ -> 2 jets analysis: background data available (all signal events processed with ARDA)
A. Nikitenko (CMS)
35 Higgs boson mass (Mττ) reconstruction
The Higgs boson mass was reconstructed after basic off-line cuts: reconstructed ET(τ jet) > 60 GeV, ETmiss > 40 GeV. The Mττ evaluation is shown for the consecutive cuts pτ > 0 GeV/c, pν > 0 GeV/c, Δφ(j1,j2) < 175°.
σ(MH) ∝ σ(ETmiss) / sin(φj1j2)
Mττ and σ(Mττ) are in very good agreement with the old results (CMS Note 2001/040, Table 3): Mττ = 455 GeV/c², σ(Mττ) = 77 GeV/c² (ORCA4, Spring 2000 production).
A. Nikitenko (CMS)
36 CMS: A -> 2τ -> 2 jets event at low luminosity
A. Nikitenko (CMS)
37 ARDA ASAP
- The first users were able to process their data on gLite
- The work of these pilot users can be regarded as a first round of validation of the gLite middleware and of the analysis prototypes
- The number of users should increase as soon as the preproduction system becomes available
- Interest in having CPUs at the centres where the data sit (LHC Tier-1s)
- To enable user analysis on the Grid
- we will continue to work in close collaboration with the physics community and the gLite developers
- ensuring a good level of communication between them
- providing constant feedback to the gLite development team
- Key factors for progress
- Increasing number of users
- Larger distributed systems
- More middleware components
38 Prototype Deployment
- 2004
- Prototype available (CERN + Madison Wisconsin)
- A lot of activity (4 experiment prototypes)
- Main limitation: size
- Experiment data available!
- Just a handful of worker nodes
- 2005
- Coherent move to prepare a gLite package to be deployed on the pre-production service
- ARDA contribution
- Mentoring and tutorials
- Actual tests!
- A lot of testing during 05Q1
- The PreProduction Service is about to start!
- Access granted on May 18th!
39 Workload Management System (WMS)
- Last-day monitor
- "Hello World!" jobs, 1 per minute
- Logging & Bookkeeping info on the web to help the developers (a sketch of such a heartbeat submitter follows below)
[Plots: submissions over the last day and the last week]
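A heartbeat submitter of this kind can be as simple as the sketch below, which sends one trivial job per minute and logs the outcome. The submission command name and the JDL file are assumptions for illustration; the actual monitor may be implemented differently.

```python
import datetime
import subprocess
import time

# Assumed command and JDL file describing a trivial "Hello World!" job;
# both are illustrative, not taken from the actual monitor.
JDL = "hello_world.jdl"
SUBMIT_CMD = ["glite-job-submit", JDL]

def submit_once(logfile="wms_heartbeat.log"):
    """Submit one job and append the outcome to a local log file."""
    result = subprocess.run(SUBMIT_CMD, capture_output=True, text=True)
    stamp = datetime.datetime.utcnow().isoformat()
    with open(logfile, "a") as log:
        log.write("%s rc=%d %s\n" % (stamp, result.returncode,
                                     (result.stdout or result.stderr).strip()))

if __name__ == "__main__":
    while True:
        submit_once()
        time.sleep(60)   # one job per minute, as on the monitor page
```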
40 WMS monitor
41 Certification activity
- Certification activity
- Performed by the operations team
- Using tests from other sources
- Re-using tests (developed by the operations group) which proved effective in pinning down problems in LCG2
- LCG2 -> gLite
- A lot of effort from ARDA (mainly Hurng-Chun Lee, ASCC)
- Several storm tests migrated
- Helping other people get up to full speed on this
42 Data Management
- Central component together with the WMS
- Early tests started in 2004
- Two main components
- gLiteIO (protocol server to access the data)
- FiReMan (file catalogue)
- The two components are not isolated: for example, gLiteIO uses the ACLs as recorded in FiReMan, and FiReMan exposes the physical location of files for the WMS to optimise the job submissions
- Both LFC and FiReMan offer large improvements over RLS
- LFC is the most recent LCG2 catalogue
- Still some issues remaining
- Scalability of FiReMan
- Bulk entry for LFC missing
- More work needed to understand performance and bottlenecks
- Need to test some real use cases
- In general, the validation of DM tools takes time!
43 FiReMan Performance - Inserts
- Inserted 1M entries in bulk, with an insert time of about 5 ms
- Insert rate for different bulk sizes
[Plot: inserts per second (up to ~350) versus number of threads (1 to 50)]
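For context, a harness behind this kind of measurement could look like the sketch below, which drives bulk inserts from several threads and reports the aggregate rate. CatalogueClient and its bulk_insert method are stand-ins (here simulating a ~5 ms bulk insert), not the FiReMan client API.

```python
import threading
import time

# Stand-in for a real catalogue client; bulk_insert just sleeps to
# simulate a ~5 ms bulk-insert round trip.
class CatalogueClient:
    def bulk_insert(self, entries):
        time.sleep(0.005)

def worker(client, n_batches, bulk_size, counter, lock):
    for _ in range(n_batches):
        client.bulk_insert(["/grid/file_%06d" % i for i in range(bulk_size)])
        with lock:
            counter[0] += bulk_size

def measure(n_threads, bulk_size, n_batches=20):
    """Return the aggregate insert rate (entries/second)."""
    counter, lock = [0], threading.Lock()
    threads = [threading.Thread(target=worker,
                                args=(CatalogueClient(), n_batches,
                                      bulk_size, counter, lock))
               for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0] / (time.time() - start)

for n in (1, 2, 5, 10, 20, 50):
    print("%2d threads: %8.0f inserts/s" % (n, measure(n, bulk_size=100)))
```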
44 FiReMan Performance - Queries
[Plot: entries returned per second (up to ~1200) versus number of threads (5 to 50), for Fireman single-entry queries and bulk queries of size 1, 10, 100, 500, 1000 and 5000]
45 FiReMan Performance - Queries
[Plot: entries returned per second (up to ~1200) versus number of threads (1 to 100), comparing Fireman single-entry, Fireman bulk-100 and LFC]
46 More data coming: C. Munro (ARDA and Brunel Univ.) at ACAT 05
47 Summary of gLite usage and testing
- Info also available under http://lcg.web.cern.ch/lcg/PEB/arda/LCG_ARDA_Glite.htm
- gLite version 1
- WMS
- Continuous monitor available on the web (active since the 17th of February)
- Concurrency tests
- Usage with ATLAS and CMS jobs (using the Storage Index)
- Good improvements observed
- DMS (FiReMan + gLiteIO)
- Early usage and feedback (since Nov 04) on functionality, performance and usability
- Considerable improvement in performance/stability observed since
- Some of the tests given to the development team for tuning and to JRA1 to be used in the testing suite
- Most of the tests given to JRA1 to be used in the testing suite
- Performance/stability measurements: heavy-duty testing needed for real validation
- Contribution to the common testing effort to finalise gLite 1 (with SA1, JRA1 and NA4-testing)
- Migration of certification tests into the certification test suite (LCG -> gLite)
- Comparison between LFC (LCG) and FiReMan
- Mini-tutorial to facilitate the usage of gLite within the NA4 testing
48 Metadata services on the Grid
- gLite provided a prototype for the EGEE Biomed community (in 2004)
- The requirements in ARDA (HEP) were not all satisfied by that early version
- ARDA preparatory work
- Stress testing of the existing experiment metadata catalogues
- The existing implementations turned out to share similar problems
- ARDA technology investigation
- On the other hand, the usage of extended file attributes in modern file systems (NTFS, NFS, EXT2/3 SCL3, ReiserFS, JFS, XFS) was analysed
- a sound POSIX standard exists! (a small demonstration follows below)
- Prototype activity in ARDA
- Discussion in LCG and EGEE and in the UK GridPP Metadata group
- Synthesis
- A new interface which will be maintained by EGEE, benefiting from the activity in ARDA (tests and benchmarking of different databases and direct collaboration with LHCb/GridPP)
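The extended-attribute mechanism referred to above is available directly from Python on Linux; the snippet below tags a file with a couple of attributes and reads them back. It needs a filesystem mounted with user xattr support, and the attribute names ("user.run_number", "user.trigger_stream") are just examples.

```python
import os
import tempfile

# Requires Linux (os.setxattr/getxattr/listxattr) and a filesystem with
# user_xattr support; attribute names in the "user." namespace are
# free-form examples.
with tempfile.NamedTemporaryFile(dir=".", delete=False) as f:
    path = f.name

os.setxattr(path, b"user.run_number", b"12345")
os.setxattr(path, b"user.trigger_stream", b"muon")

for name in os.listxattr(path):
    print(name, "=", os.getxattr(path, name).decode())

os.remove(path)
```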
49 ARDA Prototype for Metadata
- Validate the proposed interface
- Architecture
- Metadata organized in a hierarchy
- Schemas can contain sub-schemas
- and can inherit attributes
- Analogy to a file system: schema <-> directory, entry <-> file
- Stability with large responses
- A common problem discovered in our exploratory work
- Send large responses in chunks
- Otherwise preparing large responses could crash the server
- Stateful server
- DB -> server: data streamed using DB cursors
- Server -> client: response sent in chunks
50 ARDA Implementation
- Multiple back ends
- Currently Oracle, PostgreSQL, SQLite
- Dual front ends
- TCP streaming
- chosen for performance
- SOAP
- a formal requirement of EGEE
- Compare SOAP with TCP streaming
- Also implemented as a standalone Python library
- Data stored on the file system
51 Dual Front End
- Text-based protocol (TCP streaming)
- Data streamed to the client in a single connection
- Implementations
- Server: C++, multiprocess
- Clients: C++, Java, Python, Perl, Ruby
- SOAP front end: most operations are SOAP calls
- Based on iterators (sketched below)
- A session is created
- The initial chunk of data and a session token are returned
- For subsequent requests the client calls nextQuery() using the session token
- A clean way to study the performance implications of the protocols
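The iterator pattern can be summarised with the toy sketch below: the first call returns an initial chunk plus a session token, and the client keeps calling nextQuery() until the result set is exhausted. The classes here are simplified in-process stand-ins, not the ARDA metadata server or its wire protocol.

```python
# Toy stand-in for the stateful, chunked query pattern described above.
class MetadataServer:
    CHUNK = 100

    def __init__(self, entries):
        self._entries = entries
        self._sessions = {}

    def query(self, pattern):
        """Start a query: return (session token, first chunk)."""
        token = "session-%d" % len(self._sessions)
        matches = [e for e in self._entries if pattern in e]
        chunks = [matches[i:i + self.CHUNK]
                  for i in range(0, len(matches), self.CHUNK)]
        self._sessions[token] = iter(chunks)
        return token, next(self._sessions[token], [])

    def nextQuery(self, token):
        """Return the next chunk, or [] when the result set is exhausted."""
        return next(self._sessions[token], [])

server = MetadataServer(["/lhcb/run%05d" % i for i in range(1000)])
token, chunk = server.query("/lhcb/run0")
results = list(chunk)
while chunk:
    chunk = server.nextQuery(token)
    results.extend(chunk)
print(len(results), "entries streamed in chunks of", MetadataServer.CHUNK)
```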
52 More data coming: N. Santos (ARDA and Coimbra Univ.) at ACAT 05
- Test protocol performance
- No work done on the backend
- Switched 100 Mbit/s LAN
- Language comparison
- TCP-S with similar performance in all languages
- SOAP performance varies strongly with the toolkit
- Protocol comparison
- Keepalive improves performance significantly
- In Java and Python, SOAP is several times slower than TCP-S
- Measure scalability of the protocols
- Switched 100 Mbit/s LAN
- TCP-S 3x faster than gSOAP (with keepalive)
- Poor performance without keepalive
- Around 1,000 ops/sec (both gSOAP and TCP-S) for 1000 pings
53 Current uses of the ARDA Metadata prototype
- Evaluated by LHCb bookkeeping
- Bookkeeping metadata migrated to the ARDA prototype
- 20M entries, 15 GB
- Feedback valuable in improving the interface and fixing bugs
- The interface was found to be complete
- The ARDA prototype is showing good scalability
- Ganga (LHCb, ATLAS)
- User analysis job management system
- Stores job status in the ARDA prototype
- Highly dynamic metadata
- Discussed within the community
- EGEE
- UK GridPP Metadata group
54 Performance study and summary of the ARDA metadata prototype
- SOAP is increasingly used as a standard protocol for GRID computing
- Promising web-services standard; interoperability
- Some potential weaknesses
- XML encoding increases the message size (4x to 10x typical)
- XML processing is compute and memory intensive
- How significant are these weaknesses? What is the cost of using SOAP?
- The ARDA metadata implementation is ideal for comparing SOAP with a traditional RPC protocol
- A common metadata interface was developed by ARDA and gLite
- Endorsed by the EGEE standards committee
- Interface validated by the ARDA prototype
- Prototype in use by LHCb (bookkeeping, Ganga) and ATLAS (Ganga)
- SOAP performance studied using the ARDA implementation
- Toolkit performance varies widely
- Large SOAP overhead (over 100%)
55 ARDA workshops and related activities
- ARDA workshop (January 2004 at CERN; open)
- ARDA workshop (June 21-23 at CERN; by invitation)
- "The first 30 days of EGEE middleware"
- NA4 meeting (15 July 2004 in Catania; EGEE open event)
- ARDA workshop (October 20-22 at CERN; open)
- "LCG ARDA Prototypes"
- NA4 meeting, 24 November (EGEE conference in Den Haag)
- ARDA workshop (March 7-8, 2005 at CERN; open)
- ARDA workshop (October 2005, together with the LCG Service Challenges)
- Wednesday afternoon meetings started in 2005
- Presentations from experts and discussion (not necessarily from ARDA people)
- Available from http://arda.cern.ch
56 Conclusions (1/3)
- ARDA has been set up to
- enable distributed HEP analysis on gLite
- Contacts have been established
- with the experiments
- with the middleware developers
- Experiment activities are progressing rapidly
- Prototypes for ALICE, ATLAS, CMS and LHCb
- Complementary aspects are studied
- Good interaction with the experiments' environments
- Always seeking users!!!
- People are more interested in physics than in middleware: we support them!
- 2005 will be the key year (gLite version 1 is becoming available on the pre-production service)
57 Conclusions (2/3)
- ARDA provides special feedback to the development team
- First use of components (e.g. gLite prototype activity)
- Trying to run real-life HEP applications
- Dedicated studies offer complementary information
- Some of the experiment-related ARDA activities produce elements of general use
- Shell access (originally developed in ALICE/ARDA)
- Metadata catalogue (proposed and under test in LHCb/ARDA)
- (Pseudo-)interactivity: interesting issues (something in/from all experiments)
58 Conclusions (3/3)
- ARDA is a privileged observatory to follow, contribute to and influence the evolution of HEP analysis
- Analysis prototypes are a good idea!
- Technically, they complement the data-challenge experience
- Key point: these systems are exposed to users
- The approach of 4 parallel lines is not too inefficient
- Contributions in the experiments from day zero
- Difficult environment
- Commonality cannot be imposed
59 Outlook
- Commonality is a very tempting concept, indeed
- Sometimes a bit fuzzy, maybe
- Maybe it is becoming possible (and valuable)
- A lot of experience in the whole community!
- Baseline services ideas
- LHC schedule: physics is coming!
- Maybe it is emerging (examples are not exhaustive)
- Interactivity is a genuine requirement, e.g. PROOF and DIANE
- Portals / toolkits for the users to build applications on top of the computing infrastructure, e.g. GANGA
- Metadata/workflow systems open to the users are needed!
- This area has yet to be diagonalised
- Monitoring and discovery services open to users, e.g. MonALISA in ASAP
- Strong preference for an a posteriori approach
- All experiments still need their own systems
- Since it is really needed, we should do it
- No doubt that technically we can
- "We" = the HEP community in collaboration with the middleware experts
60 People
- Massimo Lamanna
- Frank Harris (EGEE NA4)
- Birger Koblitz
- Andrey Demichev
- Viktor Pose
- Victor Galaktionov
- Derek Feichtinger
- Andreas Peters
- Hurng-Chun Lee
- Dietrich Liko
- Frederik Orellana
- Tao-Sheng Chen
- Julia Andreeva
- Juha Herrala
- Alex Berejnoi
- 2 PhD students
- Craig Munro (Brunel Univ.): distributed analysis within CMS, working mainly with Julia
- Nuno Santos (Coimbra Univ.): metadata and resilient computing, working mainly with Birger
- Catalin Cirstoiu and Slawomir Biegluk (short-term LCG visitors)
- Good collaboration with EGEE/LCG Russian institutes and with ASCC Taipei
- The team covers all four experiments: ALICE, ATLAS, CMS and LHCb