1 ATLAS Collaboration
Data Challenges in ATLAS Computing
- Invited talk at ACAT2002, Moscow, Russia
- June 25, 2002
- Alexandre Vaniachine (ANL)
- vaniachine@anl.gov
2 Outline / Acknowledgements
- World Wide computing model
- Data persistency
- Application framework
- Data Challenges: Physics and Grid
- Grid integration in Data Challenges
- Data QA and Grid validation
- Thanks to all ATLAS collaborators whose
contributions I used in my talk
3 Core Domains in ATLAS Computing
- ATLAS Computing is right in the middle of the first period of Data Challenges
- A Data Challenge (DC) is for the software what a Test Beam is for the detector: many components have to be brought together to work
- The separation of the data and the algorithms in the ATLAS software architecture determines our core domains:
  - persistency solutions for event data storage
  - a software framework for data processing algorithms
  - Grid computing for the data processing flow
4 World Wide Computing Model
The focus of my presentation is on the integration of these three core software domains in the ATLAS Data Challenges towards a highly functional software suite, plus a World Wide computing model which gives all ATLAS members equal and equal-quality access to ATLAS data
5 ATLAS Computing Challenge
- The emerging World Wide computing model is an answer to the LHC computing challenge
- For ATLAS the raw data alone amount to 1.3 PB/year; adding reconstructed events and Monte Carlo data results in ~10 PB/year (3 PB on disk) - a back-of-envelope sketch of such estimates follows below
- The required CPU estimates, including analysis, are 1.6M SpecInt95
- CERN alone can handle only a fraction of these resources
- The computing infrastructure, which was centralized in the past, will now be distributed (in contrast to the reverse trend for experiments that were more distributed in the past)
- Validation of the new Grid computing paradigm in the period before the LHC requires Data Challenges of increasing scope and complexity
- These Data Challenges will use as much as possible the Grid middleware being developed in Grid projects around the world
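To make the scale concrete, here is a rough back-of-envelope sketch of how such a storage estimate is assembled; the event size, trigger rate and live time are illustrative assumptions of mine, not numbers taken from this talk.

```python
# Rough back-of-envelope sketch of an LHC-era storage estimate.
# The event size, trigger rate and live time are illustrative assumptions.
RAW_EVENT_SIZE_MB = 1.3      # assumed raw event size
TRIGGER_RATE_HZ = 100        # assumed event recording rate after the trigger
LIVE_SECONDS_PER_YEAR = 1e7  # canonical "accelerator year"

events_per_year = TRIGGER_RATE_HZ * LIVE_SECONDS_PER_YEAR
raw_pb_per_year = events_per_year * RAW_EVENT_SIZE_MB / 1e9  # MB -> PB

print(f"events per year: {events_per_year:.1e}")
print(f"raw data:        {raw_pb_per_year:.2f} PB/year")
# Reconstructed output and Monte Carlo multiply this several times over,
# which is how totals of order 10 PB/year arise.
```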
6 Technology Independence
- Ensuring that the application software is independent of the underlying persistency technology is one of the defining characteristics of the ATLAS software architecture (the transient/persistent split); a minimal illustration follows below
- Integrated operation of the framework and database domains demonstrated the capability of
  - switching between persistency technologies
  - reading the same data from different frameworks
- The implementation data description (persistent dictionary) is stored together with the data; the application framework uses the transient data dictionary for transient/persistent conversion
- The Grid integration problem is very similar to the transient/persistent issue, since all objects become just a bytestream, either on disk or on the net
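As a language-neutral illustration of the transient/persistent split (this is not the actual Athena/GAUDI converter API, and the class names are invented for the example), application code below works only with a transient object, while a pluggable converter owns the storage format:

```python
# Minimal sketch of the transient/persistent split: application code uses
# only the transient Hit object; a swappable converter owns the storage
# technology. Illustrative only, not the Athena converter interface.
import json
import pickle
from dataclasses import dataclass, asdict

@dataclass
class Hit:                      # transient data object
    detector_id: int
    energy: float

class Converter:                # persistency-technology interface
    def write(self, path, hit): ...
    def read(self, path): ...

class PickleConverter(Converter):           # one "technology"
    def write(self, path, hit):
        with open(path, "wb") as f:
            pickle.dump(asdict(hit), f)
    def read(self, path):
        with open(path, "rb") as f:
            return Hit(**pickle.load(f))

class JsonConverter(Converter):             # a swapped-in "technology"
    def write(self, path, hit):
        with open(path, "w") as f:
            json.dump(asdict(hit), f)
    def read(self, path):
        with open(path) as f:
            return Hit(**json.load(f))

# Changing the persistency baseline means changing only this line:
converter = JsonConverter()
converter.write("hit.json", Hit(detector_id=42, energy=11.5))
print(converter.read("hit.json"))
```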
7 ATLAS Database Architecture
Independent of underlying persistency technology
Data description stored together with the data
Ready for Grid integration
8 Change of Persistency Baseline
- For some time ATLAS has had both a baseline technology (Objectivity) and a baseline evaluation strategy
- We implemented persistency in Objectivity for DC0
- A ROOT-based conversion service (AthenaROOT) provides the persistency technology for Data Challenge 1
- The technology strategy is to adopt the LHC-wide LHC Computing Grid (LCG) common persistence infrastructure (a hybrid relational and ROOT-based streaming layer) as soon as this is feasible
- ATLAS is committed to common solutions and looks forward to LCG being the vehicle for providing these in an effective way
- Changing the persistency mechanism (e.g. Objectivity -> ROOT I/O) requires a change of converter, but of nothing else
- The ease of the baseline change demonstrates the benefits of decoupling transient and persistent representations
- Our architecture is, in principle, capable of providing language independence (in the long term)
9 Athena Software Framework
- ATLAS Computing is steadily progressing towards a highly functional software suite and implementing the World Wide model
- (Note that a legacy software suite was produced, still exists and is used, so it can be done for the ATLAS detector!)
- The Athena Software Framework is used in the Data Challenges for
  - generator event production
  - fast simulation
  - data conversion
  - production QA
  - reconstruction (off-line and High Level Trigger)
- Work in progress: integrating detector simulations
- Future direction: Grid integration
10 Athena Architecture Features
- Separation of data and algorithms
- Memory management
- Transient/Persistent separation
Athena has a common code base with the GAUDI framework (LHCb)
11 ATLAS Detector Simulations
- Scale of the problem
  - 25.5 million distinct volume copies
  - 23 thousand different volume objects
  - 4,673 different volume types
  - managing up to a few hundred pile-up events
  - one million hits per event on average
12 Universal Simulation Box
Diagram: MC events (HepMC) and the DetDescription enter the detector simulation program, which produces Hits and MCTruth that are passed, together with the MC event (HepMC), to Digitisation.
With all interfaces clearly defined, simulations become Geant-neutral: you can in principle run G3, G4, FLUKA or a parameterized simulation with no effect on the end users (a sketch of such an engine-neutral interface follows below). A G4 robustness test was completed in DC0.
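The sketch below illustrates the "universal simulation box" idea with hypothetical class names (these are not the real G3/G4/FLUKA bindings): generator events go in, hits come out, and the concrete engine can be swapped without touching the digitisation step.

```python
# Engine-neutral simulation interface (illustrative names only): the
# downstream digitisation sees hits, never the engine that made them.
import random

class SimulationEngine:
    """Common interface every simulation engine implements."""
    def simulate(self, event):
        raise NotImplementedError

class ParametrisedSim(SimulationEngine):
    def simulate(self, event):
        # fake a few hits per generated particle
        return [{"particle": p, "edep": random.random()} for p in event["particles"]]

class FullSim(SimulationEngine):
    def simulate(self, event):
        # a full engine (e.g. Geant4) would track particles through the
        # detector description here; we only mimic the interface
        return [{"particle": p, "edep": 2.0 * random.random()} for p in event["particles"]]

def digitise(hits):
    # downstream step depends only on the hit format
    return [round(h["edep"] * 100) for h in hits]

event = {"particles": ["e-", "e+", "gamma"]}     # stand-in for a HepMC record
for engine in (ParametrisedSim(), FullSim()):
    print(type(engine).__name__, digitise(engine.simulate(event)))
```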
13 Data Challenges
- Data Challenges prompted increasing integration of Grid components in ATLAS software
- DC0 was used to test the software readiness and the production pipeline continuity/robustness
  - the scale was limited to < 1M events
  - physics-oriented output for leptonic channel analyses and legacy Physics TDR data
- Despite the centralized production in DC0 we started deployment of our DC infrastructure (organized in 13 work packages), covering in particular Grid-related areas such as
  - production tools
  - Grid tools for metadata bookkeeping and replica management
- We started distributed production on the Grid in DC1
14 DC0 Data Flow
- Multiple production pipelines
- Independent data transformation steps
- Quality Assurance procedures
15 Data Challenge 1
- Reconstruction and analysis on a large scale: exercise the data model, study ROOT I/O performance, identify bottlenecks, exercise distributed analysis, ...
- Produce data for the High Level Trigger (HLT) TDR and the Physics groups
  - study the performance of Athena and the algorithms for use in the High Level Trigger
  - test the data flow through the HLT: byte-stream -> HLT -> algorithms -> recorded data
  - high statistics needed (background rejection study)
- Scale: 10M simulated events in 10-20 days, O(1000) PCs
- Exercising the LHC Computing model: involvement of CERN and outside-CERN sites
  - deployment of ATLAS Grid infrastructure at outside sites is essential at this event scale
- Phase 1 (started in June)
  - 10M generated-particle events (all data produced at CERN)
  - 10M simulated detector-response events (June-July)
  - 10M reconstructed-object events
- Phase 2 (September-December)
  - introduction and use of the new Event Data Model and Detector Description
  - more countries/sites/processors
  - distributed reconstruction
  - additional samples including pile-up
  - distributed analyses
  - further tests of GEANT4
16 DC1 Phase 1 Resources
- The organizational infrastructure is in place, led by the CERN ATLAS group
- 2000 processors, 1.5 x 10^11 SI95-sec
  - adequate for 4 x 10^7 simulated events
- 2/3 of the data produced outside of CERN
- Production on a global scale: Asia, Australia, Europe and North America
- 17 countries, 26 production sites
Australia: Melbourne; Canada: Alberta, TRIUMF; Czech Republic: Prague; Denmark: Copenhagen; France: CCIN2P3 Lyon; Germany: Karlsruhe; Italy: INFN (CNAF, Milan, Roma1, Naples); Japan: Tokyo; Norway: Oslo; Portugal: FCUL Lisboa; Russia: RIVK BAK (JINR Dubna, ITEP Moscow, SINP MSU Moscow, IHEP Protvino); Spain: IFIC Valencia; Sweden: Stockholm; Switzerland: CERN; Taiwan: Academia Sinica; UK: RAL, Lancaster, Liverpool (MAP); USA: BNL, . . .
17 Data Challenge 2
- Schedule: Spring-Autumn 2003
- Major physics goals
  - physics samples have hidden new physics
  - Geant4 will play a major role
  - testing calibration and alignment procedures
- Scope increased with respect to what has been achieved in DC0 and DC1
  - scale: a sample of 10^8 events
  - a system at a complexity ~50% of the 2006-2007 system
- Distributed production, simulation, reconstruction and analysis
- Use of Grid testbeds which will be built in the context of Phase 1 of the LHC Computing Grid Project
  - automatic splitting and gathering of long jobs, best available sites for each job
  - monitoring on a gridified logging and bookkeeping system, interface to a full replica catalog system, transparent access to the data for different MSS systems
  - Grid certificates
18 Grid Integration in Data Challenges
- Grid and Data Challenge communities have overlapping objectives
- Grid middleware
- testbed deployment, packaging, basic sequential
services, user portals - Data management
- replicas, reliable file transfers, catalogs
- Resource management
- job submission, scheduling, fault tolerance
- Quality Assurance
- data reproducibility, application and data
signatures, Grid QA
19 Grid Middleware ?
20 Grid Middleware !
21 ATLAS Grid Testbeds
US-ATLAS Grid Testbed
NorduGrid
EU DataGrid
For more information see presentations by Roger
Jones and Aleksandr Konstantinov
22 Interfacing Athena to the GRID
- Making the Athena framework work in the GRID environment requires an architectural design with components making use of the Grid services
Diagram components: GANGA/Grappa GUI, GRID Services, Athena/GAUDI Application, Virtual Data, Algorithms, Histograms, Monitoring, Results
- Areas of work
  - data access (persistency), event selection
  - GANGA (job configuration and monitoring, resource estimation and booking, job scheduling, etc.)
  - Grappa, a Grid user interface for Athena (a hypothetical sketch of this job/back-end separation follows below)
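As a rough, hypothetical sketch of the separation that GANGA and Grappa aim at (this is not their real API, and all names below are placeholders), the job description is kept independent of the back-end that runs it, so the same Athena job could go to a local queue or to a Grid resource broker:

```python
# Hypothetical sketch of separating the job description from the back-end
# that runs it (illustrative only, not the GANGA or Grappa API).
from dataclasses import dataclass, field

@dataclass
class AthenaJob:
    job_options: str                     # e.g. a jobOptions file name (placeholder)
    input_dataset: str                   # placeholder dataset name
    output_files: list = field(default_factory=list)

class LocalBackend:
    def submit(self, job):
        print(f"running {job.job_options} locally on {job.input_dataset}")

class GridBackend:
    def submit(self, job):
        # a real back-end would hand the job to Grid middleware here
        print(f"submitting {job.job_options} to a Grid resource broker for {job.input_dataset}")

# The same job description is handed to different back-ends unchanged.
job = AthenaJob("dc1_simulation.txt", "dc1.002000.evgen", ["hits.root"])
for backend in (LocalBackend(), GridBackend()):
    backend.submit(job)
```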
23 Data Management Architecture
AMI - ATLAS Metadata Interface
MAGDA - MAnager for Grid-based DAta
VDC - Virtual Data Catalog
24 AMI Architecture
Data warehousing principle (star architecture)
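As a generic illustration of the data-warehousing star layout (this is not AMI's actual schema; tables and values are invented), a central fact table of datasets references small dimension tables describing the production context:

```python
# Generic star-schema illustration with sqlite3 (not AMI's real schema):
# one fact table of datasets joined to dimension tables for site and release.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_site     (site_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_release  (release_id INTEGER PRIMARY KEY, version TEXT);
CREATE TABLE fact_dataset (
    dataset_id   INTEGER PRIMARY KEY,
    logical_name TEXT,
    n_events     INTEGER,
    site_id      INTEGER REFERENCES dim_site(site_id),
    release_id   INTEGER REFERENCES dim_release(release_id)
);
INSERT INTO dim_site     VALUES (1, 'CERN'), (2, 'BNL');
INSERT INTO dim_release  VALUES (1, '3.2.1');
INSERT INTO fact_dataset VALUES (1, 'dc1.002000.simul', 50000, 2, 1);
""")

# A typical metadata query joins the fact table with its dimensions:
for row in db.execute("""
    SELECT f.logical_name, f.n_events, s.name, r.version
    FROM fact_dataset f
    JOIN dim_site    s USING (site_id)
    JOIN dim_release r USING (release_id)"""):
    print(row)
```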
25 MAGDA Architecture
Component-based architecture emphasizing
fault-tolerance
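A minimal sketch of what a replica catalogue provides (a hypothetical interface with placeholder file names, not MAGDA's real one): mapping a logical file name to its physical copies and choosing one, with a fallback when a site is unavailable.

```python
# Hypothetical replica-catalogue lookup: logical file name -> physical
# replicas, with a simple fallback when a site cannot be used.
replica_catalogue = {
    "dc1.002000.simul._0001.root": [
        "gsiftp://cern.ch/atlas/dc1/002000/_0001.root",   # placeholder URLs
        "gsiftp://bnl.gov/atlas/dc1/002000/_0001.root",
    ],
}

def locate(lfn, unavailable_sites=()):
    """Return a usable physical replica of the logical file 'lfn'."""
    for pfn in replica_catalogue.get(lfn, []):
        if not any(site in pfn for site in unavailable_sites):
            return pfn
    raise LookupError(f"no usable replica of {lfn}")

print(locate("dc1.002000.simul._0001.root"))
print(locate("dc1.002000.simul._0001.root", unavailable_sites=("cern.ch",)))
```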
26 VDC Architecture
27 Introducing Virtual Data
- The recipes for producing the data (jobOptions, kumacs) have to be fully tested, and the produced data have to be validated through a QA step
- Preparing production recipes takes time and effort and encapsulates considerable knowledge; in DC0 more time was spent assembling the proper recipes than running the production jobs
- Once you have the proper recipes, producing the data is straightforward
- After the data have been produced, what do we do with the developed recipes? Do we really need to save them?
- Data are primary, recipes are secondary
28 Virtual Data Perspective
- The GriPhyN project (www.griphyn.org) provides a different perspective
  - recipes are as valuable as the data
  - production recipes are the Virtual Data
- If you have the recipes you do not need the data (you can reproduce them)
  - recipes are primary, data are secondary
- Do not throw away the recipes: save them (in the VDC)
- From the OO perspective, methods (recipes) are encapsulated together with the data in Virtual Data Objects (a minimal sketch follows below)
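A minimal sketch of the virtual-data idea in OO terms (illustrative only, not the VDC implementation): the recipe is stored with the data reference, so a missing file can be re-materialised from its recipe instead of being copied around.

```python
# Illustrative Virtual Data Object: keep the recipe next to the data
# reference and re-run the recipe whenever the data are not on disk.
import os
import subprocess

class VirtualDataObject:
    def __init__(self, output_path, recipe_cmd):
        self.output_path = output_path      # where the data live (or will)
        self.recipe_cmd = recipe_cmd        # the recipe that produces them

    def materialise(self):
        if os.path.exists(self.output_path):
            return self.output_path         # data already materialised
        subprocess.run(self.recipe_cmd, shell=True, check=True)
        return self.output_path

# Placeholder recipe: a real one would be a jobOptions file or a kumac.
vdo = VirtualDataObject("hello.txt", "echo 'regenerated from recipe' > hello.txt")
print(open(vdo.materialise()).read().strip())
```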
29 VDC-based Production System
- High-throughput features
  - scatter-gather data processing architecture
- Fault-tolerance features
  - independent agents
  - pull model for agent task assignment (vs. push)
  - local caching of output and input data (except Objectivity input)
- ATLAS DC0 and DC1 parameter settings for simulations are recorded in the Virtual Data Catalog database using normalized components: parameter collections structured orthogonally
  - data reproducibility
  - application complexity
  - Grid location
- Automatic garbage collection by the job scheduler
- Agents pull the next derivation from the VDC
- After the data have been materialized, agents register success in the VDC
- When a previous invocation has not completed within the specified timeout period, it is invoked again (a sketch of this pull-with-timeout loop follows below)
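The sketch below illustrates the pull model with timeout-based retry described above (illustrative only, not the actual VDC code): agents ask the catalogue for the next derivation, and a derivation stuck in RUNNING past its timeout is handed out again.

```python
# Pull-model sketch: agents pull the next pending (or timed-out) derivation,
# run it, and register success; stale RUNNING entries are retried.
import time

TIMEOUT = 60.0  # seconds before an unfinished invocation is retried

catalogue = [
    {"id": 1, "status": "PENDING", "started": None},
    {"id": 2, "status": "RUNNING", "started": time.time() - 300},  # stale
    {"id": 3, "status": "DONE",    "started": None},
]

def pull_next():
    """Return the next derivation an agent should work on, if any."""
    now = time.time()
    for d in catalogue:
        stale = d["status"] == "RUNNING" and now - d["started"] > TIMEOUT
        if d["status"] == "PENDING" or stale:
            d["status"], d["started"] = "RUNNING", now
            return d
    return None

def register_success(d):
    d["status"] = "DONE"

while (job := pull_next()) is not None:
    print(f"agent materialising derivation {job['id']}")
    register_success(job)          # a real agent would run the recipe here
```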
30 Tree-like Data Flow
Exercising the rich possibilities of data processing composed of multiple independent data transformation steps (a sketch of such a dependency tree follows below the diagram summary)
Diagram: the DC0/DC1 data-flow tree, with transformation steps (Athena Generators, atlsim, Athena conversion, Athena Atlfast, Atlfast recon, Athena recon, Athena QA) linking data products (HepMC.root, digis.zebra, geometry.zebra, digis.root, geometry.root, recon.root, Atlfast.root, filtering.ntuple, QA.ntuple).
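The sketch below shows how such a tree of independent transformation steps can be expressed and walked in dependency order; the product names are taken from the diagram, but the dependencies between them are illustrative assumptions, not the exact DC0 pipeline.

```python
# Tree-like data flow: each target names the transformation that makes it
# and the inputs it needs. One product (HepMC.root) feeds several branches.
steps = {
    "HepMC.root":   ("generate",    []),
    "digis.zebra":  ("simulate",    ["HepMC.root"]),
    "digis.root":   ("convert",     ["digis.zebra"]),
    "recon.root":   ("reconstruct", ["digis.root"]),
    "Atlfast.root": ("fast_sim",    ["HepMC.root"]),
    "QA.ntuple":    ("qa",          ["recon.root"]),
}

def materialise(target, done):
    """Run the transformations needed for 'target' in dependency order."""
    if target in done:
        return
    transform, inputs = steps[target]
    for inp in inputs:
        materialise(inp, done)
    print(f"{transform}: {inputs or ['(none)']} -> {target}")
    done.add(target)

done = set()
materialise("QA.ntuple", done)
materialise("Atlfast.root", done)   # reuses the already-produced HepMC.root
```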
31 Data Reproducibility
- The goal is to validate DC sample productions by ensuring the reproducibility of simulations run at different sites
- We need a tool capable of establishing the similarity or the identity of two samples produced in different conditions, e.g. at different sites
- This is a very important (and sometimes overlooked) component of Grid computing deployment
- It is complementary to the software and/or data digital-signature approaches that are still in the R&D phase
32 Grid Production Validation
- Simulations are run in different conditions
  - for instance, the same generation input but different production sites
- For each sample, reconstruction (i.e. Atrecon) is run to produce standard CBNT ntuples
- The validation application launches specialized independent analyses for the ATLAS subsystems
- For each sample standard histograms are produced
33 Comparison Procedure
Figure: the test and reference samples superimposed, with the per-bin contributions to χ2.
34 Summary of Comparison
The comparison procedure ends with a χ2 bar-chart summary, giving a pretty nice overview of how the samples compare (a simplified sketch of the comparison follows below).
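A simplified sketch of the comparison idea (not the actual ATLAS validation code; the Gaussian toy samples and binning are invented): histogram the same quantity for a test and a reference sample and sum the per-bin χ2 contributions.

```python
# Toy chi2 comparison of a test sample against a reference sample.
import random

random.seed(1)
reference = [random.gauss(91.0, 3.0) for _ in range(5000)]   # toy reference sample
test      = [random.gauss(91.0, 3.0) for _ in range(5000)]   # toy sample from another site

def histogram(values, lo=80.0, hi=100.0, nbins=20):
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
    return counts

h_ref, h_test = histogram(reference), histogram(test)

# chi2 contribution per bin for two equally normalized histograms
chi2_bins = [
    (a - b) ** 2 / (a + b) if (a + b) > 0 else 0.0
    for a, b in zip(h_ref, h_test)
]
print(f"chi2 / nbins = {sum(chi2_bins):.1f} / {len(chi2_bins)}")
# A bar chart of chi2_bins is the per-bin summary shown in the talk.
```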
35 Example of a Finding
Comparing the energy in the calorimeters for Z -> 2l samples from DC0 and DC1. The difference is caused by the η cut at generation. It works!
36 Summary
- ATLAS computing is in the middle of the first period of Data Challenges of increasing scope and complexity, and is steadily progressing towards a highly functional software suite, plus a World Wide computing model which gives all ATLAS members equal and equal-quality access to ATLAS data
- These Data Challenges are executed at the prototype tier centers and use as much as possible the Grid middleware being developed in Grid projects around the world
- In close collaboration between the Grid and Data Challenge communities, ATLAS is testing large-scale testbed prototypes, deploying prototype components to integrate and test Grid software in a production environment, and running Data Challenge 1 production at 26 prototype tier centers in 17 countries on four continents
- Quite a promising start for the ATLAS Data Challenges!