Title: Alice DC Status
1Alice DC Status
- P. Cerello
- March 19th, 2004
2Summary
- Status of AliRoot
- Status of AliEn
- Physics Data Challenge
- Conclusions
3AliRoot layout
G3
G4
FLUKA
ISAJET
AliEn
AliRoot
Virtual MC
HIJING
EVGEN
MEVSIM
HBTAN
STEER
PYTHIA6
PDF
EMCAL
ZDC
ITS
PHOS
TRD
TOF
RICH
PMD
HBTP
CRT
FMD
MUON
TPC
START
RALICE
STRUCT
ROOT
4AliRoot Current status
- Major changes in the last year
- New multi-file I/O finally in full production
- New coordinate system (and we survived!)
- New reconstruction and simulations drivers
- First attempt at the ESD and analysis framework
- Improvements in reconstruction and simulation
- Clearly the system works well, however many changes to come
- ESD: the philosophy is still evolving
- Introduction of FLUKA and new geometrical modeller
- Development of the analysis framework
- Raw data for all the detectors
- Introduction of the condition database infrastructure
5Software Development Process
- ALICE opted for a light core CERN offline team
- Concentrate on framework, software distribution and maintenance
- plus some people from the collaboration
- GRID coordination (Torino), World Computing Model (Nantes), Detector Construction Database (Warsaw), Web and VMC (La Habana)
- Close integration with physics!
- The ALICE Physics Coordinator is also a member of the offline team
- A development cycle adapted to ALICE
- Developers work on the most important feature at any moment
- A stable production version exists
- Collective ownership of the code
- Flexible release cycle and simple packaging and installation
- Micro-cycles happen continuously, macro-cycles 2-3 times per year
- Discussed and implemented at Off-line meetings and Code Reviews
6The ALICE Approach (AliEn)
- Standards are now emerging for the basic building blocks of a GRID
- There are millions of lines of code in the OS domain dealing with these issues
- Why not use these to build the minimal GRID that does the job?
- Fast development of a prototype, no problem in exploring new roads, restarting from scratch, etc.
- Hundreds of users and developers
- Immediate adoption of emerging standards
- An example: AliEn by ALICE (5% of code developed, 95% imported)
7 AliEn Timeline
Functionality Simulation
Interoperability Reconstruction
Performance, Scalability, Standards Analysis
8AliEn ROOT (A)
- USER provides: Analysis Macro, Input Files
- Query for Input Data: new TAliEnAnalysis Object produces a List of Input Data Locations
- Job Splitting: IO Object 1 for Site A, IO Object 2 for Site A, IO Object 1 for Site B, IO Object 1 for Site C
- Job Submission: Job Object 1 for Site A, Job Object 2 for Site A, Job Object 1 for Site B, Job Object 1 for Site C
- Execution
- Results: Histogram Merging, Tree Chaining
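The split/submit/merge flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the file names, the `max_files_per_job` parameter and the toy "histogram" (a plain counter) are all hypothetical and are not the TAliEnAnalysis API.

```python
from collections import Counter, defaultdict


def split_by_site(file_locations, max_files_per_job=2):
    """Group input files by the site holding them, then cut each site's
    list into jobs of at most max_files_per_job files (hypothetical rule)."""
    per_site = defaultdict(list)
    for lfn, site in file_locations:
        per_site[site].append(lfn)
    jobs = []
    for site in sorted(per_site):
        files = per_site[site]
        for start in range(0, len(files), max_files_per_job):
            jobs.append({"site": site, "files": files[start:start + max_files_per_job]})
    return jobs


def run_job(job):
    """Stand-in for the user's analysis macro: returns a toy histogram
    with one bin per site, filled with the number of files processed."""
    return Counter({job["site"]: len(job["files"])})


def merge_histograms(partials):
    """Add the partial histograms bin by bin."""
    total = Counter()
    for hist in partials:
        total.update(hist)
    return total


# Hypothetical result of the catalogue query: (file name, site) pairs.
locations = [
    ("f1.root", "A"), ("f2.root", "A"), ("f3.root", "A"),
    ("f4.root", "B"), ("f5.root", "C"),
]
jobs = split_by_site(locations)
merged = merge_histograms(run_job(job) for job in jobs)
print([job["site"] for job in jobs])  # ['A', 'A', 'B', 'C']
print(dict(merged))                   # {'A': 3, 'B': 1, 'C': 1}
```

With these toy inputs the split yields two jobs for site A and one each for sites B and C, the same shape as the diagram.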
9PROOF of AliEn (B)
PROOF uses AliEn Grid File Catalogue and Data
Management to map LFNs to a chain of PFNs and
Workload Management to detect which nodes in a
cluster can be used in a parallel session
Nice! Now I can finally analyze my datasets on
the Grid and produce a histogram. And it is fast
too!
The PROOF system allows parallel analysis of
objects in a set of files and parallel execution of
scripts on clusters of heterogeneous machines
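A minimal sketch of the LFN-to-PFN mapping mentioned above, assuming a catalogue that stores one or more replicas per logical file name. The catalogue contents, host names and the `resolve_chain` helper are invented for illustration; the real lookup goes through the AliEn file catalogue.

```python
# Hypothetical catalogue: logical file name -> list of (site, physical file name).
CATALOGUE = {
    "/alice/prod/run1/galice.root": [
        ("CERN", "castor://cern.example/run1/galice.root"),
        ("Torino", "file://to.example/run1/galice.root"),
    ],
    "/alice/prod/run2/galice.root": [
        ("CERN", "castor://cern.example/run2/galice.root"),
    ],
}


def resolve_chain(lfns, catalogue, usable_sites):
    """Map each LFN to one PFN, preferring a replica at a usable site
    and falling back to the first replica otherwise."""
    chain = []
    for lfn in lfns:
        replicas = catalogue[lfn]
        chosen = next(
            (pfn for site, pfn in replicas if site in usable_sites),
            replicas[0][1],
        )
        chain.append(chosen)
    return chain


chain = resolve_chain(sorted(CATALOGUE), CATALOGUE, usable_sites={"Torino"})
for pfn in chain:
    print(pfn)
```

The resulting list of PFNs plays the role of the chain handed to the parallel session; the first file resolves to its Torino replica, the second falls back to CERN.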
10ALICE Physics Data Challenges
Period (milestone) | Fraction of the final capacity (%) | Physics Objective
06/01-12/01 | 1 | pp studies, reconstruction of TPC and ITS
06/02-12/02 | 5 | First test of the complete chain from simulation to reconstruction for the PPR; simple analysis tools; digits in ROOT format
01/04-06/04 | 10 | Complete chain used for trigger studies; prototype of the analysis tools; comparison with parameterised Monte Carlo; simulated raw data
01/06-06/06 | 20 | Test of the final system for reconstruction and analysis
11PDC 3 schema
AliEn job control
Data transfer
Production of RAW
Shipment of RAW to CERN
Reconstruction of RAW in all T1s
CERN
Analysis
Tier2
Tier1
Tier2
Tier1
12Merging
Mixed signal
13AliEn, Genius EDG/LCG seen by ALICE
User submits jobs
Server
Alien CE LCG UI
Alien CEs/SEs
LCG RB
LCG CEs/SEs
LCG PFN
Catalog
Catalog
LCG LFN
LCG PFN AliEn LFN
14AliEn EDG Interface
Mar 11th, 2003: first AliRoot job, driven by
AliEn, run on EDG
Status report
15ALICE PDC-3 LCG
- All the production will be started via AliEn, the analysis will be done via Root/Proof/AliEn
- LCG-2 will be one CE element of AliEn, which will seamlessly integrate LCG and non-LCG resources
- If LCG-2 works well, it will absorb a large number of jobs, and it will be used heavily
- If LCG-2 does not work well, AliEn will favour other resources, and it will be less used
- In all cases we will use LCG-2 as much as possible
- We will not need to take any decision: the performance of the system will decide for us
- The figure of merit will be
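The idea that performance decides the load share can be illustrated with a toy simulation. It assumes a pull-style scheduler in which every free slot takes the next job from a shared queue; the slot counts, job durations and the `simulate_pull` function are hypothetical and do not describe the actual AliEn implementation.

```python
import heapq


def simulate_pull(n_jobs, resources):
    """resources maps a name to (slots, time_per_job). Each slot pulls a
    new job from the shared queue as soon as it becomes free. Returns
    the number of jobs taken by each resource."""
    taken = {name: 0 for name in resources}
    free_at = []  # heap of (time the slot becomes free, resource name, slot id)
    for name, (slots, _) in resources.items():
        for slot in range(slots):
            heapq.heappush(free_at, (0.0, name, slot))
    for _ in range(n_jobs):
        now, name, slot = heapq.heappop(free_at)
        taken[name] += 1
        heapq.heappush(free_at, (now + resources[name][1], name, slot))
    return taken


# Equal performance: the load splits evenly.
even = simulate_pull(100, {"LCG-2": (10, 1.0), "AliEn native": (10, 1.0)})
# LCG-2 four times slower per job: it ends up with far fewer jobs.
slow = simulate_pull(100, {"LCG-2": (10, 4.0), "AliEn native": (10, 1.0)})
print(even)  # {'LCG-2': 50, 'AliEn native': 50}
print(slow)  # {'LCG-2': 20, 'AliEn native': 80}
```

No explicit policy decision appears anywhere in the code: the share each resource receives follows only from how quickly its slots come back for more work.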
16AliEn LCG Data Challenge
Alien CE/SE
A User submits jobs
Alien CE/SE
Submission
Alien CE/SE
Server
LCG CE/SE
Alien CE LCG UI
Catalog
LCG CE/SE
LCG RB
Catalog
LCG CE/SE
17AliEn LCG Interface
- Remote AliEn and AliRoot installation OK on all LCG-2 sites
- Job management interface works with no real problem
- No reliable SE available on the LCG production infrastructure
- Generated data is always moved to CERN CASTOR as soon as the job finishes, using AliEn tools (AIOd).
- An interface to LCG storage is anyhow available, and it will be tested as soon as LCG provides storage support on the EIS testbed.
18Software Installation on LCG
- Via LCG jobs
- VO_ALICE_SW_DIR/root/v3-10-02/
- geant3/v0-6/
- aliroot/v4-01-Rev-00/
- alien/
- AliEn/
LCG site
installAlice.sh
installAlice.jdl
LCG site
LCG-UI
LCG site
installAliEn.sh
LCG site
installAliEn.jdl
LCG site
19First Event Round on LCG
Batch | Submitted | OK | Aborted by LCG | Zombie | Aborted by AliEn | Still running
Friday batch | 480 | 157 | 5 | 201 | 117 | 0
Sunday batch | 250 | 149 | 0 | 0 | 1 | 100
- OK: as reported by AliEn. Output transferred to CERN CASTOR and registered in the AliEn Data Catalogue
- Aborted by LCG: reported as Aborted by LB.
- Zombie: lost contact between AliEn and the job. All due to server and gateway restarts, many probably finished correctly on LCG.
- Aborted by AliEn: failed. Many due to server and gateway problems, since fixed.
- Still running: as reported by AliEn on Sunday, Feb 29th, 5 p.m.
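The numbers in the table can be cross-checked with a short calculation. The dictionary keys and the `summarize` helper below are names chosen for this sketch; the counts are the ones quoted on the slide.

```python
batches = {
    "Friday": {"submitted": 480, "ok": 157, "aborted_lcg": 5,
               "zombie": 201, "aborted_alien": 117, "running": 0},
    "Sunday": {"submitted": 250, "ok": 149, "aborted_lcg": 0,
               "zombie": 0, "aborted_alien": 1, "running": 100},
}


def summarize(batch):
    """Check that the outcome columns add up to the submitted count and
    compute the success fraction of submitted and of completed jobs."""
    outcomes = sum(count for key, count in batch.items() if key != "submitted")
    completed = batch["submitted"] - batch["running"]
    return {
        "consistent": outcomes == batch["submitted"],
        "ok_of_submitted": batch["ok"] / batch["submitted"],
        "ok_of_completed": batch["ok"] / completed,
    }


summary = {name: summarize(batch) for name, batch in batches.items()}
for name, result in summary.items():
    print(name, result["consistent"],
          f"{result['ok_of_submitted']:.1%}", f"{result['ok_of_completed']:.1%}")
# Friday True 32.7% 32.7%
# Sunday True 59.6% 99.3%
```

Both rows are internally consistent. Counting only jobs that had already completed, the success fraction goes from about a third on Friday to nearly all jobs on Sunday.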
20Short history
- Jan 03: Requirements for ALICE PDC04 presented to PEB
- End Dec 03: Announcement of LCG-2 by mid February 2004
- Beg Jan 04: Decision to delay PDC04 by one month waiting for LCG-2
- End Jan 04: LCG announces that there will be no SE in LCG-2
- Beg Feb 04: The WAN resources allocated by LCG for data storage are insufficient/inadequate
- Mid Feb 04: Development of an ALICE solution, developed in haste and working against all odds!
- End Feb 04: IT has also come up with a solution responding to a CMS requirement
- End Feb 04: Production started, new sites being added
- Confusing that during all this time LCG-2 has been declared ready for ALICE on a day-by-day basis!
- Beg Mar 04: CASTOR database has to be reinstalled (running on Linux 6.2!)
- Beg Mar 04: CASTOR servers have to be reinstalled for security
- Beg Mar 04: LCG RB works differently at the different centres. CNAF has to be switched on and off by hand, otherwise it swallows all the jobs!
- Beg Mar 04: we are now getting close to 10 TB, 30 were promised by LCG on 1/1/04
- Mid Mar 04: Files on the IT-provided pool are erased before being copied to tape(!)
- 18 Mar 04: restart production, insert Grid.it
21Snapshot on Mar 16th
- file:///C:/Documents%20and%20Settings/Piergiorgio%20Cerello/My%20Documents/Alice/AlienControls.htm
22Data Challenge Statistics
- First round, closed on Mar 16th
23Data Challenge Statistics
- First round, closed on Mar 16th
24Data Challenge Statistics
- First round, closed on Mar 16th
25DC Monitoring: http://alien.cern.ch
- MonALISA: http://aliens3.cern.ch:8080
26Snapshot on Mar 18th
- file:///C:/Documents%20and%20Settings/Piergiorgio%20Cerello/My%20Documents/Alice/AlienControls2.htm
27Data Challenge Statistics
- Second round, started on Mar 18th: 1713 jobs
28Data Challenge Statistics
- Second round, started on Mar 18th: 1051, 680
29Data Challenge Statistics
- Second round, started on Mar 18th: 592, 476
30Present Status
- AliEn native sites
- CERN, CNAF, Cyfronet, Catania, FZK, JINR, LBL, Lyon, OSC, Prague, Torino
- LCG-2 sites
- CERN, CNAF, RAL: ok (up to 400 concurrent jobs)
- FZK: problems with installation, solved as of Mar 18th
- NIKHEF: old version of aliroot in PATH, solved as of Mar 18th
- TAIWAN: intermittent problems (network?)
- Fermilab: not an ALICE site
- Grid.it sites
- Installation (aliroot, AliEn) ok everywhere but Bo
- In production as of Mar 18th
- Ba, Ct, Fe, LNL, Pd, To: ok
- Bo-INGV, Pi: not seen by RB
- Bo, Rm: minor installation problems
- Mar 19th, 00:30: Ba 1, Ct 7, Fe 7, LNL 97, Pd 70, To 17 (199 running jobs)
31Double access at CNAF
WN
A User submits jobs
Alien/CNAF CE/SE
WN
Submission
Server
WN
Alien CE LCG UI
LCG/CNAF CE/SE
WN
LCG RB
WN
32Remarks
- First GRID production with fully transparent common access to different middlewares (AliEn and LCG)
- Relevant improvement in the LCG stability (450/12 hours vs. 450/2 months)
- AliEn/LCG load is about 50-50
- Optimal situation compared with any other choice (AliEn only or LCG only): the availability of resources is doubled
- There is room for improvement (on both sides)
- but
- The Data Challenge started well, although it is just at the beginning
- We hope for continued support from LCG
- And centres should provide us with the promised resources
- AliEn already provides functionality for distributed analysis
- LCG/ARDA will improve it
33Conclusions
- ALICE has solutions that are evolving into a solid computing infrastructure
- Major decisions have been taken and users have adopted them
- Collaboration between physicists and computer scientists is excellent
- The tight integration with ROOT allows a fast prototyping and development cycle
- AliEn goes a long way toward providing a GRID solution adapted to HEP needs
- It allowed us to do large productions with very few people in charge
- Many ALICE-developed solutions have a high potential to be adopted by other experiments and indeed are becoming common solutions
35AliEn
[Diagram: service interactions through the API, in four steps: 1. lookup, 2. authenticate, 3. register, 4. bind]
36ARDA in a nutshell
Long they laboured in the regions of Eä, which
are vast beyond the thought of Elves and Men,
until in the time appointed was made Arda... -
J.R.R Tolkien, Valaquenta
- ARDA RTAG
- Found AliEn the most complete system among all considered in Sep 03
- Suggested a fast prototype in 6 months
- Six months went to calm the turmoil spurred by this report!
- ARDA is now started as suggested by the report
- At least so we hope!
- ARDA, if successful, will form the basis for the EGEE MW
37AliEn (ARDA)
38ROOT, ALICE LCG
- LCG has brought support for ROOT and FLUKA
- We will continue to develop our system
- Providing basic technology, e.g. VMC and geometrical modeller
- and we will try to collaborate with LCG wherever possible
- Possible convergence in the simulation area, collaboration on simple benchmarks
- We have proposed to base LCG on ROOT and AliEn
- LCG established a client-provider relationship with ROOT, which is rapidly evolving
- Is now adopting AliEn via ARDA/EGEE
- LCG decided to develop alternatives for some ROOT elements or hide them with interfaces
- We expressed our worries
- No time to develop and deploy a new system
- Duplication and dispersion of efforts
- Divergence with the rest of HEP
- We will keep looking for opportunities to collaborate