Title: ATLAS Data Challenges
1. ATLAS Data Challenges
- CHEP03
- La Jolla, California
- 24th March 2003
- Gilbert Poulard
- ATLAS Data Challenges Co-ordinator
- CERN EP-ATC
- For the ATLAS DC Grid team
2. Outline
- Introduction
- ATLAS Data Challenges activities
- Grid activities in ATLAS DC1
- Beyond DC1
- Summary
- Conclusion
3. Introduction
- There are several talks on the LHC project and on Grid projects at this conference
- I will not repeat what is said in other presentations on the LHC, LCG and the Grid
- This talk concentrates on the ongoing Data Challenge activities of ATLAS, one of the four LHC experiments
4. ATLAS experiment
- ATLAS: 2000 collaborators, 150 institutes, 34 countries
- Diameter: 25 m
- Barrel toroid length: 26 m
- End-cap end-wall chamber span: 46 m
- Overall weight: 7000 tons
5. ATLAS and building 40
6. Aims of the Experiment
- Measure the Standard Model Higgs Boson
- Detect Supersymmetric states
- Study Standard Model QCD, EW, HQ Physics
- New Physics?
7. The ATLAS Trigger/DAQ System
- LVL1
- Hardware based (FPGA and ASIC)
- Coarse calorimeter granularity
- Trigger muon detectors (RPCs and TGCs)
- LVL2
- Region-of-Interest (RoI)
- Specialized algorithms
- Fast selection with early rejection
- EF (Event Filter)
- Full event available
- Offline-derived algorithms
- Seeding by LVL2
- Best calibration / alignment
- Latency less demanding
8. Data
- Every event will consist of 1-1.5 MB (all detectors together)
- After on-line selection, events will be written to permanent storage at a rate of 100-200 Hz
- Total amount of raw data: ~1 PB/year (see the estimate after this list)
- To reconstruct and analyze this data: a complex worldwide Computing Model and Event Data Model
- Raw data at CERN
- Reconstructed data distributed
- All members of the collaboration must have access to ALL public copies of the data (at a reasonable access speed)
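A back-of-the-envelope check of the ~1 PB/year figure, assuming roughly 10^7 seconds of effective data taking per year (a common rule of thumb, not a number from this talk):

```python
# Rough estimate of the ATLAS raw-data volume per year.
# Assumption (not from the slides): ~1e7 s of effective data taking per year.
event_size_mb = (1.0, 1.5)      # MB per event, all detectors together
trigger_rate_hz = (100, 200)    # events written to permanent storage per second
seconds_per_year = 1e7          # effective running time (assumption)

low = event_size_mb[0] * trigger_rate_hz[0] * seconds_per_year / 1e9   # MB -> PB
high = event_size_mb[1] * trigger_rate_hz[1] * seconds_per_year / 1e9
print(f"Raw data per year: {low:.0f} to {high:.0f} PB")  # ~1 to 3 PB
```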
9. ATLAS Computing Challenge
- The emerging worldwide computing model is an answer to the LHC computing challenge
- In this model the Grid must take care of
- data replicas and catalogues
- conditions database replicas, updates and synchronization
- access authorizations for individual users, working groups, production managers
- access priorities and job queues
- Validation of the new Grid computing paradigm in the period before the LHC requires Data Challenges of increasing scope and complexity
10. Systems Tests & Data Challenges
- Data Challenges are the way to test the prototype infrastructure before the start of the real experiment (2007)
- ATLAS plans to run one Data Challenge per year, with increasing complexity and amount of data
- Each Data Challenge consists of the following steps (see the sketch after this list)
- Physics event generation (Pythia, Herwig, ...)
- Event simulation (Geant3, Geant4)
- Background simulation, pile-up and detector response simulation (all of these depend on luminosity)
- Event reconstruction
- Event analysis
- Data can be (re-)distributed to different production sites between any of these steps: this is the real challenge!
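The chain can be pictured as a staged pipeline in which each step may run at a different site. The sketch below is purely illustrative: the stage functions are placeholders, not ATLAS software, and the event records are dummies.

```python
# Illustrative sketch of a Data Challenge production chain (not ATLAS code).
def generate(n_events):              # stands in for Pythia/Herwig
    return [{"id": i} for i in range(n_events)]

def simulate(events):                # stands in for Geant3/Geant4
    return [dict(e, hits=True) for e in events]

def add_pileup(events, mu):          # luminosity-dependent background mixing
    return [dict(e, pileup_mu=mu) for e in events]

def reconstruct(events):
    return [dict(e, reco=True) for e in events]

def analyse(events):
    return sum(1 for e in events if e.get("reco"))

# Each arrow in the chain is also a potential data (re-)distribution step
# between production sites - the real challenge.
events = generate(1000)
events = simulate(events)
events = add_pileup(events, mu=23)   # ~23 min-bias events per BC at 1e34
events = reconstruct(events)
print(analyse(events), "events analysed")
```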
11. DC0: readiness & continuity tests (December 2001 - June 2002)
- 3 lines for full simulation
- 1) Full chain with new geometry (as of January 2002): Generator → (Objy) → Geant3 → (Zebra → Objy) → Athena recon. → (Objy) → Analysis
- 2) Reconstruction of Physics TDR data within Athena: (Zebra → Objy) → Athena rec. → (Objy) → Simple analysis
- 3) Geant4 robustness test: Generator → (Objy) → Geant4 → (Objy)
- 1 line for fast simulation: Generator → (Objy) → Atlfast → (Objy)
- Continuity test: everything from the same release (3.0.2) for the full chain
- We learnt a lot (we underestimated the implications of that statement)
- Completed in June 2002
12. ATLAS DC1 Phase I (July-August 2002)
- Primary concern was delivery of events to the High Level Trigger (HLT) community
- Goal: 10^7 events (several samples!)
- Put in place the MC event generation & detector simulation chain
- Switch to Athena-Root I/O (for event generation)
- Updated geometry
- Filtering
- Validate the chain: Athena/Event Generator → (Root I/O) → Atlsim/Dice/Geant3 → (Zebra)
- Put in place the distributed Monte Carlo production (a job-wrapper sketch follows this list)
- ATLAS kit (rpm)
- Scripts and tools (monitoring, bookkeeping)
- AMI database, Magda replica catalogue, VDC
- Quality Control and Validation of the full chain
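The structure of such a distributed production job can be sketched as a wrapper that runs one simulation partition and then records bookkeeping and replica information. The sketch below is hypothetical: register_metadata() and register_replica() only stand in for the real AMI, Magda and VDC interfaces, whose APIs are not shown in this talk, and the dataset name is made up.

```python
# Hypothetical production-job wrapper, not the actual DC1 scripts.
import hashlib
import json
import pathlib
import subprocess

def register_metadata(dataset, partition, info):
    # Placeholder for an AMI-style bookkeeping entry.
    print("metadata:", dataset, partition, json.dumps(info))

def register_replica(logical_name, site, path):
    # Placeholder for a Magda-style replica-catalogue entry.
    print("replica:", logical_name, "at", site, "->", path)

def run_partition(dataset, partition, site):
    output = pathlib.Path(f"{dataset}.{partition:04d}.zebra")
    # In DC1 this step was Atlsim/Dice/Geant3 reading Athena-Root generator
    # files; here it is replaced by a simple echo command.
    subprocess.run(["echo", f"simulating {dataset} partition {partition}"], check=True)
    output.write_bytes(b"placeholder output")
    checksum = hashlib.md5(output.read_bytes()).hexdigest()
    register_metadata(dataset, partition, {"events": 5000, "md5": checksum})
    register_replica(output.name, site, str(output.resolve()))

run_partition("dc1.dijet.sample", partition=1, site="example-site")
```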
13. Tools used in DC1
(Diagram) The DC1 tool set:
- AMI: physics metadata
- Magda: replica catalogue
- VDC: recipe catalogue
- permanent and transient production logs
- AtCom: interactive production framework
- GRAT: automatic production framework
14. ATLAS Geometry
- Scale of the problem
- 25.5 million distinct volume copies
- 23 thousand different volume objects
- 4,673 different volume types
- managing up to a few hundred pile-up events
- one million hits per event on average
15. DC1/Phase I Task Flow
- As an example, for one sample of di-jet events (see the partitioning sketch after this slide):
- Event generation: 1.5 x 10^7 events in 150 partitions
- Detector simulation: 3000 jobs
(Diagram) Task flow: Pythia 6 event generation writes 10^5 events per partition (HepMC via Athena-Root I/O); each Atlsim/Geant3 job reads 5000 generated events, applies the filter (about 450 events survive) and writes hits/digits and MC truth to Zebra.
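The partition and job counts quoted above are consistent with 10^5 generated events per partition and 5000 events per simulation job. A small illustrative helper (not part of the DC1 tooling) reproduces those numbers:

```python
# Illustrative partitioning arithmetic for the di-jet sample.
# The figures are taken from the slide; the helper itself is hypothetical.
def split_sample(total_events, events_per_partition, events_per_sim_job):
    partitions = total_events // events_per_partition
    sim_jobs = partitions * (events_per_partition // events_per_sim_job)
    return partitions, sim_jobs

partitions, sim_jobs = split_sample(
    total_events=15_000_000,       # 1.5 x 10^7 generated di-jet events
    events_per_partition=100_000,  # 10^5 events per generation partition
    events_per_sim_job=5_000,      # events read by each Atlsim/Geant3 job
)
print(partitions, "generation partitions,", sim_jobs, "simulation jobs")
# -> 150 generation partitions, 3000 simulation jobs
```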
16. DC1 validation & quality control
- We defined two types of validation
- Validation of the sites
- We processed the same data in the various centres and compared the results
- To ensure that the same software was running in all production centres
- We also checked the random number sequences
- Validation of the simulation
- We used both old generated data & new data
- Validation datasets: di-jets, single π, e, μ, H→4e/2γ/2e2μ/4μ
- About 10^7 events reconstructed in June, July and August
- Comparison also made with previous simulations
- QC is a key issue for success
17. Comparison Procedure
(Figure) Test sample and reference sample superimposed, together with the per-bin contributions to χ².
18. Summary of Comparison
The comparison procedure ends with a χ² bar-chart summary, which gives a pretty good overview of how the samples compare (a minimal sketch of this kind of comparison follows).
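A minimal sketch of such a histogram comparison, assuming two histograms of raw event counts with comparable statistics; this is illustrative plain Python, not the actual DC1 validation code:

```python
# Chi-square comparison of a test histogram against a reference histogram.
def chi2_contributions(test, reference):
    """Per-bin chi-square contributions, assuming Poisson errors on both."""
    contributions = []
    for t, r in zip(test, reference):
        if t + r == 0:
            contributions.append(0.0)   # empty bin in both samples
        else:
            contributions.append((t - r) ** 2 / (t + r))
    return contributions

test_hist = [98, 205, 310, 190, 95]     # e.g. a distribution from one production site
ref_hist  = [100, 200, 300, 200, 100]   # the same distribution from the reference site
per_bin = chi2_contributions(test_hist, ref_hist)
ndf = sum(1 for t, r in zip(test_hist, ref_hist) if t + r > 0)
print(f"chi2/ndf = {sum(per_bin):.2f}/{ndf}")   # large per-bin values flag a problem
```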
19. Data Samples I
- Validation samples (740k events)
- single particles (e, γ, μ, π), jet scans, Higgs events
- Single-particle production (30 million events)
- single π (low pT to pT = 1000 GeV, with 2.8 < |η| < 3.2)
- single μ (pT = 3, ..., 100 GeV)
- single e and γ
- different energies (E = 5, 10, ..., 200, 1000 GeV)
- fixed η points & η scans (|η| < 2.5), η crack scans (1.3 < |η| < 1.8)
- standard beam spread (σz = 5.6 cm)
- fixed vertex z-components (z = 0, 4, 10 cm)
- Minimum-bias production (1.5 million events)
- different η regions (|η| < 3, 5, 5.5, 7)
20. Data Samples II
- QCD di-jet production (5.2 million events)
- different cuts on ET (hard scattering) during generation
- large production of ET > 11, 17, 25, 55 GeV samples, applying particle-level filters
- large production of ET > 17, 35 GeV samples, without filtering, full simulation within |η| < 5
- smaller production of ET > 70, 140, 280, 560 GeV samples
- Physics events requested by various HLT groups (e/γ, Level-1, jet/ETmiss, B-physics, b-jet, μ; 4.4 million events)
- large samples for the b-jet trigger simulated with
- default (3 pixel layers) and staged (2 pixel layers) layouts
- B-physics (PL) events taken from old TDR tapes
21. ATLAS DC1 Phase 1 (July-August 2002)
3200 CPUs, 110 kSI95, 71,000 CPU-days
39 institutes in 18 countries
- Australia
- Austria
- Canada
- CERN
- Czech Republic
- France
- Germany
- Israel
- Italy
- Japan
- Nordic
- Russia
- Spain
- Taiwan
- UK
- USA
Grid tools used at 11 sites
5 x 10^7 events generated, 1 x 10^7 events simulated, 3 x 10^7 single particles, 30 TB, 35,000 files
22. ATLAS DC1 Phase II (November 2002 - March 2003)
- Provide data with and without pile-up for HLT studies
- Pile-up production
- new data samples (huge amount of requests)
- byte-stream format to be produced
- Introduction & testing of the new Event Data Model (EDM)
- This includes the new Detector Description
- Production of data for Physics and Computing Model studies
- Both ESD and AOD produced from Athena Reconstruction
- Testing of the computing model of distributed analysis using AOD
- Use GRID middleware more widely
23. Luminosity Effect Simulation
- Aim: study interesting processes at different luminosities L (cm^-2 s^-1)
- Separate simulation of physics events & minimum-bias events
- and of cavern background for muon studies
- Merging of
- Primary stream (physics)
- Background stream(s) (pile-up + cavern background)
(Diagram) One primary stream (KINE, HITS) and N(L) background streams (KINE, HITS) are merged in the DIGITIZATION step into a bunch crossing (DIGI).
24. Pile-up features
- Different detectors have different memory times, requiring very different numbers of minimum-bias events to be read in
- Silicons, Tile calorimeter: t < 25 ns
- Straw tracker: t < 40-50 ns
- LAr calorimeters: 100-400 ns
- Muon Drift Tubes: 600 ns
- Still, we want the pile-up events to be the same in different detectors!
- For muon studies, cavern background in addition
25. Pile-up task flow
(Diagram) ATLSIM pile-up task flow, with event sizes and CPU times:
- Physics event: 2 MB (340 s)
- Minimum-bias event: 0.5 MB (460 s)
- Cavern background: 20 KB (0.4 s)
- Pile-up output: 7.5 MB at high luminosity, 400 s (mixing 80 s, digitization 220 s)
- High luminosity (10^34): 23 events/bunch crossing, 61 bunch crossings (see the arithmetic sketch below)
- Low luminosity: 2 x 10^33; background 0.5 MB
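These figures imply a large event-mixing bookkeeping: at 10^34 cm^-2 s^-1 there are on average 23 minimum-bias interactions per 25 ns bunch crossing, and each detector integrates over as many bunch crossings as its memory time allows. A small illustrative calculation (standard arithmetic, not ATLAS code; the window function is a rough approximation):

```python
# Rough pile-up bookkeeping at high luminosity (1e34 cm^-2 s^-1).
# Figures from the slides: 23 min-bias events per bunch crossing on average,
# 61 bunch crossings mixed per physics event, 25 ns bunch spacing.
MU_PER_BC = 23
BC_SPACING_NS = 25.0
MIXED_BCS = 61

def bunch_crossings_in_window(memory_time_ns):
    """Approximate number of bunch crossings a detector integrates over."""
    return max(1, round(memory_time_ns / BC_SPACING_NS))

# Memory times quoted on the previous slide.
for name, memory_ns in [("Silicons, Tile", 25), ("Straw tracker", 50),
                        ("LAr calorimeters", 400), ("Muon Drift Tubes", 600)]:
    n_bc = bunch_crossings_in_window(memory_ns)
    print(f"{name:18s} ~{n_bc:3d} BCs -> ~{n_bc * MU_PER_BC:5d} min-bias events")

# The DC1 mixing window of 61 bunch crossings therefore corresponds to
# roughly 61 * 23 ~ 1400 minimum-bias events merged with each physics event.
print("mixing window:", MIXED_BCS * MU_PER_BC, "min-bias events on average")
```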
26. Higgs into two photons, no pile-up
27. Higgs into two photons, with pile-up at L = 10^34
28. ATLAS DC1/Phase II (November 2002 - March 2003)
- Goals: produce the data needed for the HLT TDR; get as many ATLAS institutes involved as possible
- Worldwide collaborative activity
- Participation: 56 institutes
- Australia
- Austria
- Canada
- CERN
- China
- Czech Republic
- Denmark
- France
- Germany
- Greece
- Israel
- Italy
- Japan
- Norway
- Poland
- Russia
- Spain
- Sweden
- Taiwan
- UK
- USA
- New countries or institutes
- using Grid
29. Preparation for Reconstruction
- On-going activities (in several areas)
- Put in place the infrastructure for the production
- Get the reconstruction software ready and validated
- Both Physics & HLT communities involved
- Include the dedicated code for HLT studies
- LVL1, LVL2 & Event Filter
- Today we are in the validation phase
- By the end of March we expect to reconstruct and analyse
- a full high-statistics sample without pile-up
- 10% of a high-statistics sample with pile-up
- Data being concentrated at 8 sites
- Production on both standard batch and GRID systems
30. Primary data (at 8 sites)
- Pile-up: low luminosity 4 x 10^6 events (~4 x 10^3 NCU-days); high luminosity 3 x 10^6 events (~12 x 10^3 NCU-days)
- Data volumes (TB): simulation 23.7 (40%); pile-up 35.4 (60%), of which lumi02 (low luminosity) 14.5 and lumi10 (high luminosity) 20.9
- Data replication using Grid tools (Magda)
31. Grid in ATLAS DC1
- NorduGrid: part of the Phase 1 production; full Phase 2 production
- US-ATLAS Grid: part of the Phase 1 production; Phase 2 production
- EDG Testbed: reproduced part of the Phase 1 data; several tests
- See other ATLAS talks for more details
32. DC1 production on the Grid
- Grid test-beds in Phase 1
- 11 out of 39 sites (5% of the total production)
- NorduGrid (Bergen, Grendel, Ingvar, OSV, NBI, Oslo, Lund, LSCF)
- all production done on the Grid
- US-Grid (LBL, UTA, OU)
- 10% of US DC1 production (900 CPU-days)
- Phase 2
- NorduGrid (full pile-up production)
- US Grid
- pile-up in progress
- 8 TB of pile-up data, 5000 CPU-days, 6000 jobs
- will be used for reconstruction
33. Summary on DC1
- Phase 1 (summer 2002) was a real success
- The pile-up production ran quite smoothly
- We expect to have it completed by the end of March
- The concentration of the data is on its way
- Replication mostly performed with Magda
- Progress is being made in the organization
- Integration of tools (production, bookkeeping, replication)
- Validation of the offline reconstruction software is progressing well
- HLT-dedicated software will then have to be added
- Massive production for reconstruction expected by the beginning of April
34. DC2, DC3, DC4, ...
- DC2
- Probably Q4/2003 - Q2/2004
- Goals
- Full deployment of the EDM & Detector Description
- Geant4 replacing Geant3
- Test the calibration and alignment procedures
- Use LCG common software (POOL, ...)
- Use GRID middleware widely
- Perform large-scale physics analysis
- Further tests of the computing model (analysis)
- Run on LCG-1
- Scale
- As for DC1: ~10^7 fully simulated events
- DC3: Q3/2004 - Q2/2005
- Goals to be defined; scale: 5 x DC2
- DC4: Q3/2005 - Q2/2006
- Goals to be defined; scale: 2 x DC3
35. Summary (1)
- ATLAS computing is in the middle of its first period of Data Challenges of increasing scope and complexity, and is steadily progressing towards a highly functional software suite, plus a worldwide computing model which gives all ATLAS members equal access, and equal quality of access, to ATLAS data
36. Summary (2)
- These Data Challenges are executed at the
prototype tier centers and use as much as
possible the Grid middleware being developed in
Grid projects around the world
37. Conclusion
- Quite promising start for ATLAS Data Challenges!
38. Thanks to all DC-team members (working in 14 work packages)
A-WP1 Event generation
A-WP2 Geant3 simulation
A-WP3 Geant4 Simulation
A-WP4 Pile-up
A-WP5 Detector response
A-WP6 Data Conversion
A-WP7 Event filtering
A-WP8 Reconstruction
A-WP9 Analysis
A-WP10 Data Management
A-WP11 Tools
A-WP12 Teams, Production, Validation, ...
A-WP13 Tier Centres
A-WP14 Fast Simulation