1 ATLAS Collaboration
Data Challenges in ATLAS Computing
- Invited talk at ACAT2002, Moscow, Russia
- June 25, 2002
- Alexandre Vaniachine (ANL)
- vaniachine@anl.gov
2 Outline / Acknowledgements
- World Wide computing model
- Data persistency
- Application framework
- Data Challenges: Physics and Grid
- Grid integration in Data Challenges
- Data QA and Grid validation
- Thanks to all ATLAS collaborators whose
contributions I used in my talk
3 Core Domains in ATLAS Computing
- ATLAS Computing is right in the middle of the first period of Data Challenges
- A Data Challenge (DC) is for the software what a Test Beam is for the detector: many components have to be brought together to work
- The separation of the data and the algorithms in the ATLAS software architecture determines our core domains:
  - persistency solutions for event data storage
  - a software framework for data processing algorithms
  - Grid computing for the data processing flow
4 World Wide Computing Model
The focus of my presentation is on the integration of these three core software domains in the ATLAS Data Challenges towards a highly functional software suite, plus a World Wide computing model which gives all ATLAS members equal and equal-quality access to ATLAS data
5 ATLAS Computing Challenge
- The emerging World Wide computing model is an answer to the LHC computing challenge
- For ATLAS the raw data alone amount to 1.3 PB/year; adding reconstructed events and Monte Carlo data results in ~10 PB/year (3 PB on disk) - a back-of-envelope sketch of such estimates follows below
- The required CPU estimates, including analysis, are 1.6M SpecInt95
- CERN alone can handle only a fraction of these resources
- The computing infrastructure, which was centralized in the past, will now be distributed (in contrast to the reverse trend for experiments that were more distributed in the past)
- Validation of the new Grid computing paradigm in the period before the LHC requires Data Challenges of increasing scope and complexity
- These Data Challenges will use as much as possible the Grid middleware being developed in Grid projects around the world
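To make the scale concrete, here is a rough back-of-envelope sketch of how such a storage estimate is assembled; the event size, trigger rate and live time are illustrative assumptions of mine, not numbers taken from this talk.

```python
# Rough back-of-envelope sketch of an LHC-era storage estimate.
# The event size, trigger rate and live time are illustrative assumptions.
RAW_EVENT_SIZE_MB = 1.3      # assumed raw event size
TRIGGER_RATE_HZ = 100        # assumed event recording rate after the trigger
LIVE_SECONDS_PER_YEAR = 1e7  # canonical "accelerator year"

events_per_year = TRIGGER_RATE_HZ * LIVE_SECONDS_PER_YEAR
raw_pb_per_year = events_per_year * RAW_EVENT_SIZE_MB / 1e9  # MB -> PB

print(f"events per year: {events_per_year:.1e}")
print(f"raw data:        {raw_pb_per_year:.2f} PB/year")
# Reconstructed output and Monte Carlo multiply this several times over,
# which is how totals of order 10 PB/year arise.
```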
6 Technology Independence
- Ensuring that the application software is independent of the underlying persistency technology is one of the defining characteristics of the ATLAS software architecture (the transient/persistent split); a minimal illustration follows below
- Integrated operation of the framework and database domains demonstrated the capability of
  - switching between persistency technologies
  - reading the same data from different frameworks
- The implementation data description (persistent dictionary) is stored together with the data; the application framework uses the transient data dictionary for transient/persistent conversion
- The Grid integration problem is very similar to the transient/persistent issue, since all objects become just a bytestream, either on disk or on the net
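As a language-neutral illustration of the transient/persistent split (this is not the actual Athena/GAUDI converter API, and the class names are invented for the example), application code below works only with a transient object, while a pluggable converter owns the storage format:

```python
# Minimal sketch of the transient/persistent split: application code uses
# only the transient Hit object; a swappable converter owns the storage
# technology. Illustrative only, not the Athena converter interface.
import json
import pickle
from dataclasses import dataclass, asdict

@dataclass
class Hit:                      # transient data object
    detector_id: int
    energy: float

class Converter:                # persistency-technology interface
    def write(self, path, hit): ...
    def read(self, path): ...

class PickleConverter(Converter):           # one "technology"
    def write(self, path, hit):
        with open(path, "wb") as f:
            pickle.dump(asdict(hit), f)
    def read(self, path):
        with open(path, "rb") as f:
            return Hit(**pickle.load(f))

class JsonConverter(Converter):             # a swapped-in "technology"
    def write(self, path, hit):
        with open(path, "w") as f:
            json.dump(asdict(hit), f)
    def read(self, path):
        with open(path) as f:
            return Hit(**json.load(f))

# Changing the persistency baseline means changing only this line:
converter = JsonConverter()
converter.write("hit.json", Hit(detector_id=42, energy=11.5))
print(converter.read("hit.json"))
```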
7 ATLAS Database Architecture
Independent of underlying persistency technology
Data description stored together with the data
Ready for Grid integration
8 Change of Persistency Baseline
- For some time ATLAS has had both a baseline technology (Objectivity) and a baseline evaluation strategy
- We implemented persistency in Objectivity for DC0
- A ROOT-based conversion service (AthenaROOT) provides the persistency technology for Data Challenge 1
- The technology strategy is to adopt the LHC-wide LHC Computing Grid (LCG) common persistence infrastructure (a hybrid relational and ROOT-based streaming layer) as soon as this is feasible
- ATLAS is committed to common solutions and looks forward to LCG being the vehicle for providing these in an effective way
- Changing the persistency mechanism (e.g. Objectivity -> ROOT I/O) requires a change of converter, but of nothing else
- The ease of the baseline change demonstrates the benefits of decoupling transient and persistent representations
- Our architecture is, in principle, capable of providing language independence (in the long term)
9 Athena Software Framework
- ATLAS Computing is steadily progressing towards a highly functional software suite and implementing the World Wide model
- (Note that a legacy software suite was produced, still exists and is used, so it can be done for the ATLAS detector!)
- The Athena Software Framework is used in the Data Challenges for
  - generator event production
  - fast simulation
  - data conversion
  - production QA
  - reconstruction (off-line and High Level Trigger)
- Work in progress: integrating detector simulations
- Future direction: Grid integration
10 Athena Architecture Features
- Separation of data and algorithms
- Memory management
- Transient/Persistent separation
Athena has a common code base with the GAUDI framework (LHCb)
11 ATLAS Detector Simulations
- Scale of the problem
  - 25.5 million distinct volume copies
  - 23 thousand different volume objects
  - 4,673 different volume types
  - managing up to a few hundred pile-up events
  - one million hits per event on average
12 Universal Simulation Box
Diagram: MC events (HepMC) and the DetDescription enter the detector simulation program, which produces Hits and MCTruth that are passed, together with the MC event (HepMC), to Digitisation.
With all interfaces clearly defined, simulations become Geant-neutral: you can in principle run G3, G4, FLUKA or a parameterized simulation with no effect on the end users (a sketch of such an engine-neutral interface follows below). A G4 robustness test was completed in DC0.
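The sketch below illustrates the "universal simulation box" idea with hypothetical class names (these are not the real G3/G4/FLUKA bindings): generator events go in, hits come out, and the concrete engine can be swapped without touching the digitisation step.

```python
# Engine-neutral simulation interface (illustrative names only): the
# downstream digitisation sees hits, never the engine that made them.
import random

class SimulationEngine:
    """Common interface every simulation engine implements."""
    def simulate(self, event):
        raise NotImplementedError

class ParametrisedSim(SimulationEngine):
    def simulate(self, event):
        # fake a few hits per generated particle
        return [{"particle": p, "edep": random.random()} for p in event["particles"]]

class FullSim(SimulationEngine):
    def simulate(self, event):
        # a full engine (e.g. Geant4) would track particles through the
        # detector description here; we only mimic the interface
        return [{"particle": p, "edep": 2.0 * random.random()} for p in event["particles"]]

def digitise(hits):
    # downstream step depends only on the hit format
    return [round(h["edep"] * 100) for h in hits]

event = {"particles": ["e-", "e+", "gamma"]}     # stand-in for a HepMC record
for engine in (ParametrisedSim(), FullSim()):
    print(type(engine).__name__, digitise(engine.simulate(event)))
```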
13 Data Challenges
- Data Challenges prompted increasing integration of Grid components in ATLAS software
- DC0 was used to test the software readiness and the production pipeline continuity/robustness
  - the scale was limited to < 1M events
  - physics-oriented output for leptonic channel analyses and legacy Physics TDR data
- Despite the centralized production in DC0 we started deployment of our DC infrastructure (organized in 13 work packages), covering in particular Grid-related areas such as
  - production tools
  - Grid tools for metadata bookkeeping and replica management
- We started distributed production on the Grid in DC1
14 DC0 Data Flow
- Multiple production pipelines
- Independent data transformation steps
- Quality Assurance procedures
15 Data Challenge 1
- Reconstruction and analysis on a large scale: exercise the data model, study ROOT I/O performance, identify bottlenecks, exercise distributed analysis, ...
- Produce data for the High Level Trigger (HLT) TDR and the Physics groups
  - study the performance of Athena and the algorithms for use in the High Level Trigger
  - test the data flow through the HLT: byte-stream -> HLT -> algorithms -> recorded data
  - high statistics needed (background rejection study)
- Scale: 10M simulated events in 10-20 days, O(1000) PCs
- Exercising the LHC Computing model: involvement of CERN and outside-CERN sites
  - deployment of ATLAS Grid infrastructure at outside sites is essential at this event scale
- Phase 1 (started in June)
  - 10M generated-particle events (all data produced at CERN)
  - 10M simulated detector-response events (June-July)
  - 10M reconstructed-object events
- Phase 2 (September-December)
  - introduction and use of the new Event Data Model and Detector Description
  - more countries/sites/processors
  - distributed reconstruction
  - additional samples including pile-up
  - distributed analyses
  - further tests of GEANT4
16 DC1 Phase 1 Resources
- The organizational infrastructure is in place, led by the CERN ATLAS group
- 2000 processors, 1.5 x 10^11 SI95-sec
  - adequate for 4 x 10^7 simulated events
- 2/3 of the data produced outside of CERN
- Production on a global scale: Asia, Australia, Europe and North America
- 17 countries, 26 production sites
Australia: Melbourne; Canada: Alberta, TRIUMF; Czech Republic: Prague; Denmark: Copenhagen; France: CCIN2P3 Lyon; Germany: Karlsruhe; Italy: INFN (CNAF, Milan, Roma1, Naples); Japan: Tokyo; Norway: Oslo; Portugal: FCUL Lisboa; Russia: RIVK BAK (JINR Dubna, ITEP Moscow, SINP MSU Moscow, IHEP Protvino); Spain: IFIC Valencia; Sweden: Stockholm; Switzerland: CERN; Taiwan: Academia Sinica; UK: RAL, Lancaster, Liverpool (MAP); USA: BNL, . . .
17 Data Challenge 2
- Schedule: Spring-Autumn 2003
- Major physics goals
  - physics samples have hidden new physics
  - Geant4 will play a major role
  - testing calibration and alignment procedures
- Scope increased with respect to what has been achieved in DC0 and DC1
  - scale: a sample of 10^8 events
  - a system at a complexity ~50% of the 2006-2007 system
- Distributed production, simulation, reconstruction and analysis
- Use of Grid testbeds which will be built in the context of Phase 1 of the LHC Computing Grid Project
  - automatic splitting and gathering of long jobs, best available sites for each job
  - monitoring on a gridified logging and bookkeeping system, interface to a full replica catalog system, transparent access to the data for different MSS systems
  - Grid certificates
18 Grid Integration in Data Challenges
- Grid and Data Challenge communities have overlapping objectives
- Grid middleware
- testbed deployment, packaging, basic sequential
services, user portals - Data management
- replicas, reliable file transfers, catalogs
- Resource management
- job submission, scheduling, fault tolerance
- Quality Assurance
- data reproducibility, application and data
signatures, Grid QA
19 Grid Middleware ?
20 Grid Middleware !
21 ATLAS Grid Testbeds
US-ATLAS Grid Testbed
NorduGrid
EU DataGrid
For more information see presentations by Roger
Jones and Aleksandr Konstantinov
22 Interfacing Athena to the GRID
- Making the Athena framework work in the GRID environment requires an architectural design with components making use of the Grid services
Diagram components: GANGA/Grappa GUI, GRID Services, Athena/GAUDI Application, Virtual Data, Algorithms, Histograms, Monitoring, Results
- Areas of work
  - data access (persistency), event selection
  - GANGA (job configuration and monitoring, resource estimation and booking, job scheduling, etc.)
  - Grappa, a Grid user interface for Athena (a hypothetical sketch of this job/back-end separation follows below)
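As a rough, hypothetical sketch of the separation that GANGA and Grappa aim at (this is not their real API, and all names below are placeholders), the job description is kept independent of the back-end that runs it, so the same Athena job could go to a local queue or to a Grid resource broker:

```python
# Hypothetical sketch of separating the job description from the back-end
# that runs it (illustrative only, not the GANGA or Grappa API).
from dataclasses import dataclass, field

@dataclass
class AthenaJob:
    job_options: str                     # e.g. a jobOptions file name (placeholder)
    input_dataset: str                   # placeholder dataset name
    output_files: list = field(default_factory=list)

class LocalBackend:
    def submit(self, job):
        print(f"running {job.job_options} locally on {job.input_dataset}")

class GridBackend:
    def submit(self, job):
        # a real back-end would hand the job to Grid middleware here
        print(f"submitting {job.job_options} to a Grid resource broker for {job.input_dataset}")

# The same job description is handed to different back-ends unchanged.
job = AthenaJob("dc1_simulation.txt", "dc1.002000.evgen", ["hits.root"])
for backend in (LocalBackend(), GridBackend()):
    backend.submit(job)
```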
23 Data Management Architecture
AMI - ATLAS Metadata Interface
MAGDA - MAnager for Grid-based DAta
VDC - Virtual Data Catalog
24 AMI Architecture
Data warehousing principle (star architecture)
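As a generic illustration of the data-warehousing star layout (this is not AMI's actual schema; tables and values are invented), a central fact table of datasets references small dimension tables describing the production context:

```python
# Generic star-schema illustration with sqlite3 (not AMI's real schema):
# one fact table of datasets joined to dimension tables for site and release.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_site     (site_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_release  (release_id INTEGER PRIMARY KEY, version TEXT);
CREATE TABLE fact_dataset (
    dataset_id   INTEGER PRIMARY KEY,
    logical_name TEXT,
    n_events     INTEGER,
    site_id      INTEGER REFERENCES dim_site(site_id),
    release_id   INTEGER REFERENCES dim_release(release_id)
);
INSERT INTO dim_site     VALUES (1, 'CERN'), (2, 'BNL');
INSERT INTO dim_release  VALUES (1, '3.2.1');
INSERT INTO fact_dataset VALUES (1, 'dc1.002000.simul', 50000, 2, 1);
""")

# A typical metadata query joins the fact table with its dimensions:
for row in db.execute("""
    SELECT f.logical_name, f.n_events, s.name, r.version
    FROM fact_dataset f
    JOIN dim_site    s USING (site_id)
    JOIN dim_release r USING (release_id)"""):
    print(row)
```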
25 MAGDA Architecture
Component-based architecture emphasizing
fault-tolerance
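A minimal sketch of what a replica catalogue provides (a hypothetical interface with placeholder file names, not MAGDA's real one): mapping a logical file name to its physical copies and choosing one, with a fallback when a site is unavailable.

```python
# Hypothetical replica-catalogue lookup: logical file name -> physical
# replicas, with a simple fallback when a site cannot be used.
replica_catalogue = {
    "dc1.002000.simul._0001.root": [
        "gsiftp://cern.ch/atlas/dc1/002000/_0001.root",   # placeholder URLs
        "gsiftp://bnl.gov/atlas/dc1/002000/_0001.root",
    ],
}

def locate(lfn, unavailable_sites=()):
    """Return a usable physical replica of the logical file 'lfn'."""
    for pfn in replica_catalogue.get(lfn, []):
        if not any(site in pfn for site in unavailable_sites):
            return pfn
    raise LookupError(f"no usable replica of {lfn}")

print(locate("dc1.002000.simul._0001.root"))
print(locate("dc1.002000.simul._0001.root", unavailable_sites=("cern.ch",)))
```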
26 VDC Architecture
27 Introducing Virtual Data
- The recipes for producing the data (jobOptions, kumacs) have to be fully tested, and the produced data have to be validated through a QA step
- Preparing production recipes takes time and effort and encapsulates considerable knowledge; in DC0 more time was spent assembling the proper recipes than running the production jobs
- Once you have the proper recipes, producing the data is straightforward
- After the data have been produced, what do we do with the developed recipes? Do we really need to save them?
- Data are primary, recipes are secondary
28 Virtual Data Perspective
- The GriPhyN project (www.griphyn.org) provides a different perspective
  - recipes are as valuable as the data
  - production recipes are the Virtual Data
- If you have the recipes you do not need the data (you can reproduce them)
  - recipes are primary, data are secondary
- Do not throw away the recipes: save them (in the VDC)
- From the OO perspective, methods (recipes) are encapsulated together with the data in Virtual Data Objects (a minimal sketch follows below)
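A minimal sketch of the virtual-data idea in OO terms (illustrative only, not the VDC implementation): the recipe is stored with the data reference, so a missing file can be re-materialised from its recipe instead of being copied around.

```python
# Illustrative Virtual Data Object: keep the recipe next to the data
# reference and re-run the recipe whenever the data are not on disk.
import os
import subprocess

class VirtualDataObject:
    def __init__(self, output_path, recipe_cmd):
        self.output_path = output_path      # where the data live (or will)
        self.recipe_cmd = recipe_cmd        # the recipe that produces them

    def materialise(self):
        if os.path.exists(self.output_path):
            return self.output_path         # data already materialised
        subprocess.run(self.recipe_cmd, shell=True, check=True)
        return self.output_path

# Placeholder recipe: a real one would be a jobOptions file or a kumac.
vdo = VirtualDataObject("hello.txt", "echo 'regenerated from recipe' > hello.txt")
print(open(vdo.materialise()).read().strip())
```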
29 VDC-based Production System
- High-throughput features
  - scatter-gather data processing architecture
- Fault-tolerance features
  - independent agents
  - pull model for agent task assignment (vs. push)
  - local caching of output and input data (except Objectivity input)
- ATLAS DC0 and DC1 parameter settings for simulations are recorded in the Virtual Data Catalog database using normalized components: parameter collections structured orthogonally
  - data reproducibility
  - application complexity
  - Grid location
- Automatic garbage collection by the job scheduler
- Agents pull the next derivation from the VDC
- After the data have been materialized, agents register success in the VDC
- When a previous invocation has not completed within the specified timeout period, it is invoked again (a sketch of this pull-with-timeout loop follows below)
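The sketch below illustrates the pull model with timeout-based retry described above (illustrative only, not the actual VDC code): agents ask the catalogue for the next derivation, and a derivation stuck in RUNNING past its timeout is handed out again.

```python
# Pull-model sketch: agents pull the next pending (or timed-out) derivation,
# run it, and register success; stale RUNNING entries are retried.
import time

TIMEOUT = 60.0  # seconds before an unfinished invocation is retried

catalogue = [
    {"id": 1, "status": "PENDING", "started": None},
    {"id": 2, "status": "RUNNING", "started": time.time() - 300},  # stale
    {"id": 3, "status": "DONE",    "started": None},
]

def pull_next():
    """Return the next derivation an agent should work on, if any."""
    now = time.time()
    for d in catalogue:
        stale = d["status"] == "RUNNING" and now - d["started"] > TIMEOUT
        if d["status"] == "PENDING" or stale:
            d["status"], d["started"] = "RUNNING", now
            return d
    return None

def register_success(d):
    d["status"] = "DONE"

while (job := pull_next()) is not None:
    print(f"agent materialising derivation {job['id']}")
    register_success(job)          # a real agent would run the recipe here
```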
30 Tree-like Data Flow
Exercising the rich possibilities of data processing composed of multiple independent data transformation steps (a sketch of such a dependency tree follows below the diagram summary)
Diagram: the DC0/DC1 data-flow tree, with transformation steps (Athena Generators, atlsim, Athena conversion, Athena Atlfast, Atlfast recon, Athena recon, Athena QA) linking data products (HepMC.root, digis.zebra, geometry.zebra, digis.root, geometry.root, recon.root, Atlfast.root, filtering.ntuple, QA.ntuple).
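The sketch below shows how such a tree of independent transformation steps can be expressed and walked in dependency order; the product names are taken from the diagram, but the dependencies between them are illustrative assumptions, not the exact DC0 pipeline.

```python
# Tree-like data flow: each target names the transformation that makes it
# and the inputs it needs. One product (HepMC.root) feeds several branches.
steps = {
    "HepMC.root":   ("generate",    []),
    "digis.zebra":  ("simulate",    ["HepMC.root"]),
    "digis.root":   ("convert",     ["digis.zebra"]),
    "recon.root":   ("reconstruct", ["digis.root"]),
    "Atlfast.root": ("fast_sim",    ["HepMC.root"]),
    "QA.ntuple":    ("qa",          ["recon.root"]),
}

def materialise(target, done):
    """Run the transformations needed for 'target' in dependency order."""
    if target in done:
        return
    transform, inputs = steps[target]
    for inp in inputs:
        materialise(inp, done)
    print(f"{transform}: {inputs or ['(none)']} -> {target}")
    done.add(target)

done = set()
materialise("QA.ntuple", done)
materialise("Atlfast.root", done)   # reuses the already-produced HepMC.root
```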
31 Data Reproducibility
- The goal is to validate DC sample productions by ensuring the reproducibility of simulations run at different sites
- We need a tool capable of establishing the similarity or the identity of two samples produced in different conditions, e.g. at different sites
- This is a very important (and sometimes overlooked) component of Grid computing deployment
- It is complementary to the software and/or data digital-signature approaches that are still in the R&D phase
32 Grid Production Validation
- Simulations are run in different conditions
  - for instance, the same generation input but different production sites
- For each sample, reconstruction (i.e. Atrecon) is run to produce standard CBNT ntuples
- The validation application launches specialized independent analyses for the ATLAS subsystems
- For each sample standard histograms are produced
33 Comparison Procedure
Figure: the test and reference samples superimposed, with the per-bin contributions to χ2.
34 Summary of Comparison
The comparison procedure ends with a χ2 bar-chart summary, giving a pretty nice overview of how the samples compare (a simplified sketch of the comparison follows below).
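A simplified sketch of the comparison idea (not the actual ATLAS validation code; the Gaussian toy samples and binning are invented): histogram the same quantity for a test and a reference sample and sum the per-bin χ2 contributions.

```python
# Toy chi2 comparison of a test sample against a reference sample.
import random

random.seed(1)
reference = [random.gauss(91.0, 3.0) for _ in range(5000)]   # toy reference sample
test      = [random.gauss(91.0, 3.0) for _ in range(5000)]   # toy sample from another site

def histogram(values, lo=80.0, hi=100.0, nbins=20):
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
    return counts

h_ref, h_test = histogram(reference), histogram(test)

# chi2 contribution per bin for two equally normalized histograms
chi2_bins = [
    (a - b) ** 2 / (a + b) if (a + b) > 0 else 0.0
    for a, b in zip(h_ref, h_test)
]
print(f"chi2 / nbins = {sum(chi2_bins):.1f} / {len(chi2_bins)}")
# A bar chart of chi2_bins is the per-bin summary shown in the talk.
```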
35 Example of a Finding
Comparing the energy in the calorimeters for Z -> 2l samples from DC0 and DC1. The difference is caused by the η cut at generation. It works!
36 Summary
- ATLAS computing is in the middle of the first period of Data Challenges of increasing scope and complexity, and is steadily progressing towards a highly functional software suite, plus a World Wide computing model which gives all ATLAS members equal and equal-quality access to ATLAS data
- These Data Challenges are executed at the prototype tier centers and use as much as possible the Grid middleware being developed in Grid projects around the world
- In close collaboration between the Grid and Data Challenge communities, ATLAS is testing large-scale testbed prototypes, deploying prototype components to integrate and test Grid software in a production environment, and running Data Challenge 1 production at 26 prototype tier centers in 17 countries on four continents
- Quite a promising start for the ATLAS Data Challenges!