1
ATLAS Collaboration
Data Challenges in ATLAS Computing
  • Invited talk at ACAT2002, Moscow, Russia
  • June 25, 2002
  • Alexandre Vaniachine (ANL)
  • vaniachine@anl.gov

2
Outline and Acknowledgements
  • World Wide computing model
  • Data persistency
  • Application framework
  • Data Challenges: Physics and Grid
  • Grid integration in Data Challenges
  • Data QA and Grid validation
  • Thanks to all ATLAS collaborators whose
    contributions I used in my talk

3
Core Domains in ATLAS Computing
  • ATLAS Computing is right in the middle of the first period of Data Challenges
  • A Data Challenge (DC) is for software what a Test Beam is for the detector: many components have to be brought together to work
  • Separation of the data and the algorithms in
    ATLAS software architecture determines our core
    domains
  • Persistency solutions for event data storage
  • Software framework for data processing algorithms
  • Grid computing for the data processing flow

4
World Wide Computing Model
The focus of my presentation is on the integration of these three core software domains in the ATLAS Data Challenges towards a highly functional software suite, plus a World Wide computing model which gives all ATLAS members equal ease and quality of access to ATLAS data
5
ATLAS Computing Challenge
  • The emerging World Wide computing model is an
    answer to the LHC computing challenge
  • For ATLAS the raw data alone constitute 1.3 PB/year; adding reconstructed events and Monte Carlo data results in ~10 PB/year (3 PB on disk)
  • The required CPU, including analysis, is estimated at 1.6M SpecInt95
  • CERN alone can handle only a fraction of these resources
  • The computing infrastructure, which was centralized in the past, will now be distributed (in contrast to the reverse trend for experiments that were more distributed in the past)
  • Validation of the new Grid computing paradigm in
    the period before the LHC requires Data
    Challenges of increasing scope and complexity
  • These Data Challenges will use as much as
    possible the Grid middleware being developed in
    Grid projects around the world

6

Technology Independence
  • Ensuring that the application software is
    independent of underlying persistency technology
    is one of the defining characteristics of the
    ATLAS software architecture (transient/persistent
    split)
  • Integrated operation of the framework and database domains demonstrated the capability of
  • switching between persistency technologies
  • reading the same data from different frameworks
  • The implementation data description (persistent dictionary) is stored together with the data; the application framework uses the transient data dictionary for transient/persistent conversion
  • The Grid integration problem is very similar to the transient/persistent issue, since all objects become just a byte stream, whether on disk or on the network
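As an illustration of the transient/persistent split, here is a minimal sketch in Python with invented class names (this is not ATLAS/Athena code): the application works only with transient objects, interchangeable converters turn them into a byte stream for a given persistency technology, and a persistent dictionary describing the layout travels with the data.

```python
"""Minimal sketch of a transient/persistent split (illustrative only;
all names are invented, not the actual ATLAS/Athena classes)."""
import json
import pickle


class TransientHit:
    """Transient (in-memory) representation used by the algorithms."""
    def __init__(self, detector_id, energy):
        self.detector_id = detector_id
        self.energy = energy


class JsonConverter:
    """One persistency technology: a text/JSON byte stream."""
    name = "json"

    def write(self, hit):
        # The persistent dictionary (schema description) travels with the data.
        record = {"schema": {"class": "TransientHit",
                             "fields": ["detector_id", "energy"]},
                  "data": {"detector_id": hit.detector_id,
                           "energy": hit.energy}}
        return json.dumps(record).encode()

    def read(self, blob):
        record = json.loads(blob.decode())
        d = record["data"]
        return TransientHit(d["detector_id"], d["energy"])


class PickleConverter:
    """A second, interchangeable persistency technology."""
    name = "pickle"

    def write(self, hit):
        return pickle.dumps({"schema": "TransientHit", "data": vars(hit)})

    def read(self, blob):
        return TransientHit(**pickle.loads(blob)["data"])


# Switching persistency baselines means switching the converter, nothing else.
for converter in (JsonConverter(), PickleConverter()):
    blob = converter.write(TransientHit(detector_id=42, energy=1.3))
    restored = converter.read(blob)
    print(converter.name, restored.detector_id, restored.energy)
```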

7
ATLAS Database Architecture
Independent of underlying persistency technology
Data description stored together with the data
Ready for Grid integration
8

Change of Persistency Baseline
  • For some time ATLAS has had both a baseline
    technology (Objectivity) and a baseline
    evaluation strategy
  • We implemented persistency in Objectivity for DC0
  • A ROOT-based conversion service (AthenaROOT)
    provides the persistence technology for Data
    Challenge 1
  • The technology strategy is to adopt the LHC-wide LHC Computing Grid (LCG) common persistence infrastructure (a hybrid relational and ROOT-based streaming layer) as soon as this is feasible
  • ATLAS is committed to common solutions and looks forward to LCG being the vehicle for providing these in an effective way
  • Changing the persistency mechanism (e.g. Objectivity -> ROOT I/O) requires a change of converter, but of nothing else
  • The ease of the baseline change demonstrates the benefits of decoupling the transient and persistent representations
  • Our architecture is, in principle, capable of providing language independence in the long term

9
Athena Software Framework
  • ATLAS Computing is steadily progressing towards a highly functional software suite and is implementing the World Wide model
  • (Note that a legacy software suite was produced, still exists, and is in use, so it can be done for the ATLAS detector!)
  • Athena Software Framework is used in Data
    Challenges for
  • generator events production
  • fast simulation
  • data conversion
  • production QA
  • reconstruction (off-line and High Level Trigger)
  • Work in progress: integrating detector simulations
  • Future directions: Grid integration

10
Athena Architecture Features
  • Separation of data and algorithms
  • Memory management
  • Transient/Persistent separation

Athena has a common code base with GAUDI
framework (LHCb)
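A minimal sketch of the data/algorithm separation, using invented names rather than the real Athena/GAUDI interfaces: algorithms never own event data; they retrieve inputs from and record outputs to a framework-managed transient event store.

```python
"""Sketch of a framework with data/algorithm separation (invented names,
not the real Athena/GAUDI interfaces)."""


class TransientEventStore:
    """Framework-managed event store; owns the data, not the algorithms."""
    def __init__(self):
        self._objects = {}

    def record(self, key, obj):
        self._objects[key] = obj

    def retrieve(self, key):
        return self._objects[key]


class SmearEnergyAlg:
    """An algorithm only reads inputs from and writes outputs to the store."""
    def execute(self, store):
        raw = store.retrieve("RawEnergies")
        store.record("CalibEnergies", [0.98 * e for e in raw])


class SumEnergyAlg:
    def execute(self, store):
        calib = store.retrieve("CalibEnergies")
        store.record("TotalEnergy", sum(calib))


# The framework drives the event loop and clears the store between events.
algorithms = [SmearEnergyAlg(), SumEnergyAlg()]
for event in ([10.0, 20.0], [5.0, 7.5]):
    store = TransientEventStore()
    store.record("RawEnergies", event)
    for alg in algorithms:
        alg.execute(store)
    print("total energy:", store.retrieve("TotalEnergy"))
```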
11
ATLAS Detector Simulations
  • Scale of the problem
  • 25.5 million distinct volume copies
  • 23 thousand different volume objects
  • 4,673 different volume types
  • managing up to a few hundred pile-up events
  • one million hits per event on average

12
Universal Simulation Box
Diagram: Detector Description and the MC event (HepMC) feed the detector simulation program, which produces Hits and MCTruth; the MC event and Hits then go to Digitisation
With all interfaces clearly defined, simulations become Geant-neutral: you can in principle run G3, G4, Fluka, or a parameterized simulation with no effect on the end users. The G4 robustness test was completed in DC0.
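The Geant-neutral point can be pictured as a common simulation interface behind which the concrete engine is interchangeable. The sketch below uses invented names and a trivial parameterized stand-in, not real G3/G4/Fluka bindings.

```python
"""Sketch of a Geant-neutral simulation box (invented interface; the real
engines G3/G4/Fluka are represented here by a trivial stand-in)."""
import random


class SimulationEngine:
    """Common interface: HepMC-like generator events in, hits + MC truth out."""
    def simulate(self, mc_event):
        raise NotImplementedError


class ParameterizedSimulation(SimulationEngine):
    """Toy stand-in for one concrete engine (e.g. a fast parameterization)."""
    def simulate(self, mc_event):
        hits = [{"particle": p, "edep": random.uniform(0.0, 1.0)}
                for p in mc_event["particles"]]
        truth = {"event_number": mc_event["event_number"]}
        return hits, truth


def run_simulation(engine, mc_events):
    """End users see only the interface; swapping engines has no effect here."""
    for mc_event in mc_events:
        hits, truth = engine.simulate(mc_event)
        print(truth["event_number"], "->", len(hits), "hits")


run_simulation(ParameterizedSimulation(),
               [{"event_number": 1, "particles": ["mu-", "mu+"]},
                {"event_number": 2, "particles": ["e-", "e+", "gamma"]}])
```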
13
Data Challenges
  • Data Challenges prompted increasing integration
    of grid components in ATLAS software
  • DC0 was used to test the software readiness and the production pipeline continuity/robustness
  • Scale was limited to < 1M events
  • Physics-oriented output for leptonic channel analyses and legacy Physics TDR data
  • Despite the centralized production in DC0 we started deployment of our DC infrastructure (organized in 13 work packages) covering in particular Grid-related areas such as
  • production tools
  • Grid tools for metadata bookkeeping and replica
    management
  • We started distributed production on the Grid in
    DC1

14
DC0 Data Flow
  • Multiple production pipelines
  • Independent data transformation steps
  • Quality Assurance procedures
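A toy sketch of such a pipeline, with invented step names (not the actual DC0 jobs): each transformation step is independent and is followed by a QA check.

```python
"""Sketch of a multi-step production pipeline with per-step QA (illustrative;
the step and data names are invented, not the actual DC0 jobs)."""


def generate(n_events):
    return [{"event_number": i} for i in range(n_events)]


def simulate(events):
    return [dict(ev, n_hits=100 + ev["event_number"]) for ev in events]


def qa_check(name, data, expected_events):
    """QA between steps: verify the sample is complete before continuing."""
    assert len(data) == expected_events, f"{name}: incomplete sample"
    print(f"QA ok: {name} ({len(data)} events)")


# Each step is independent: its output could be written to disk and the next
# step run later, possibly at another site, as in the DC0 pipelines.
n = 5
generated = generate(n)
qa_check("generation", generated, n)
simulated = simulate(generated)
qa_check("simulation", simulated, n)
```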

15
Data Challenge 1
  • Reconstruction and analysis on a large scale: exercise the data model, study ROOT I/O performance, identify bottlenecks, exercise distributed analysis
  • Produce data for the High Level Trigger (HLT) TDR and Physics groups
  • Study the performance of Athena and of algorithms for use in the High Level Trigger
  • Test of data flow through the HLT: byte-stream -> HLT algorithms -> recorded data
  • High statistics needed (background rejection study)
  • Scale: 10M simulated events in 10-20 days on O(1000) PCs
  • Exercising the LHC Computing model: involvement of CERN and outside-CERN sites
  • Deployment of the ATLAS Grid infrastructure at outside sites is essential for this event scale
  • Phase 1 (started in June)
  • 100M generator-particle events (all data produced at CERN)
  • 10M simulated detector response events (June-July)
  • 10M reconstructed objects events
  • Phase 2 (September-December)
  • Introduction and use of new Event Data Model and
    Detector Description
  • More Countries/Sites/Processors
  • Distributed Reconstruction
  • Additional samples including pile-up
  • Distributed analyses
  • Further tests of GEANT4

16
DC1 Phase 1 Resources
  • Organization and infrastructure are in place, led by the CERN ATLAS group
  • 2000 processors, 1.5×10^11 SI95·sec
  • adequate for 4×10^7 simulated events
  • 2/3 of the data produced outside of CERN
  • production on a global scale: Asia, Australia, Europe and North America
  • 17 countries, 26 production sites

  • Australia: Melbourne
  • Canada: Alberta, TRIUMF
  • Czech Republic: Prague
  • Denmark: Copenhagen
  • France: CCIN2P3 Lyon
  • Germany: Karlsruhe
  • Italy: INFN (CNAF, Milan, Roma1, Naples)
  • Japan: Tokyo
  • Norway: Oslo
  • Portugal: FCUL Lisboa
  • Russia: RIVK BAK (JINR Dubna, ITEP Moscow, SINP MSU Moscow, IHEP Protvino)
  • Spain: IFIC Valencia
  • Sweden: Stockholm
  • Switzerland: CERN
  • Taiwan: Academia Sinica
  • UK: RAL, Lancaster, Liverpool (MAP)
  • USA: BNL, . . .
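A back-of-envelope check of the resource figures above (the per-event cost and the implied per-processor power are derived here, not quoted from the slide):

```python
# Back-of-envelope check of the DC1 Phase 1 resource figures (rounded values).
total_cpu = 1.5e11        # SI95*sec available
n_events = 4e7            # simulated events the allocation is adequate for
n_processors = 2000

per_event = total_cpu / n_events          # ~3750 SI95*sec per simulated event
per_processor = total_cpu / n_processors  # ~7.5e7 SI95*sec per processor

# Implied average processor power if the sample is produced in ~20 days:
days = 20
implied_si95 = per_processor / (days * 86400)   # ~43 SI95 per processor

print(f"{per_event:.0f} SI95*sec/event, ~{implied_si95:.0f} SI95 per CPU over {days} days")
```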
17
Data Challenge 2
  • Schedule: Spring-Autumn 2003
  • Major physics goals
  • Physics samples have hidden new physics
  • Geant4 will play a major role
  • Testing calibration and alignment procedures
  • Scope increased relative to what has been achieved in DC0 and DC1
  • Scale: a sample of 10^8 events
  • System complexity at ~50% of the 2006-2007 system
  • Distributed production, simulation,
    reconstruction and analysis
  • Use of Grid testbeds which will be built in the context of Phase 1 of the LHC Computing Grid Project
  • Automatic splitting and gathering of long jobs, best available sites for each job
  • Monitoring on a gridified logging and bookkeeping system, interface to a full replica catalog system, transparent access to the data for different MSS systems
  • Grid certificates

18
Grid Integration in Data Challenges
  • Grid and Data Challenge communities: overlapping objectives
  • Grid middleware
  • testbed deployment, packaging, basic sequential
    services, user portals
  • Data management
  • replicas, reliable file transfers, catalogs
  • Resource management
  • job submission, scheduling, fault tolerance
  • Quality Assurance
  • data reproducibility, application and data
    signatures, Grid QA

19
Grid Middleware ?
20
Grid Middleware !
21
ATLAS Grid Testbeds
US-ATLAS Grid Testbed
NorduGrid
EU DataGrid
For more information see presentations by Roger
Jones and Aleksandr Konstantinov
22
Interfacing Athena to the GRID
  • Making the Athena framework work in the Grid environment requires
  • Architectural design: components making use of the Grid services

Diagram: the GANGA/Grappa GUI connects the Athena/GAUDI application to Grid services (virtual data, algorithms, histograms, monitoring, results)
  • Areas of work
  • Data access (persistency), event selection
  • GANGA (job configuration and monitoring, resource estimation and booking, job scheduling, etc.)
  • Grappa: a Grid user interface for Athena
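The job configuration and submission role can be pictured as a thin wrapper around Grid services. The mock-up below uses invented classes and a placeholder dataset name; it is not the actual GANGA or Grappa API.

```python
"""Generic sketch of framework-to-Grid job submission (invented classes;
not the actual GANGA/Grappa interfaces)."""
from dataclasses import dataclass


@dataclass
class AthenaJob:
    """What the user specifies: an application plus its configuration."""
    job_options: str          # e.g. a jobOptions file for the Athena application
    input_dataset: str
    n_events: int


class GridBackend:
    """Stand-in for Grid services: scheduling, monitoring, output retrieval."""
    def submit(self, job):
        print(f"submitting {job.job_options} on {job.input_dataset} "
              f"({job.n_events} events)")
        return "job-0001"     # a fake job identifier

    def status(self, job_id):
        return "RUNNING"


# The user works at the level of the application configuration; the backend
# hides resource estimation, scheduling and bookkeeping.
backend = GridBackend()
job_id = backend.submit(AthenaJob("example_simulation_jobOptions.py",
                                  "dc1.example.evgen", n_events=1000))
print(job_id, backend.status(job_id))
```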

23
Data Management Architecture
AMI: ATLAS Metadata Interface
MAGDA: MAnager for Grid-based DAta
VDC: Virtual Data Catalog
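From a production job's point of view the three components might be used together as in the following mock-up; the Python interfaces are invented for illustration (AMI, MAGDA and VDC are real services, but they do not expose these calls).

```python
"""Illustrative mock-up of the data management roles (invented interfaces;
AMI, MAGDA and VDC do not expose these Python calls)."""


class AMI:
    """Metadata catalog: which datasets exist and what they contain."""
    def find_datasets(self, **query):
        return ["dc1.sample.simul"]            # placeholder dataset name


class MAGDA:
    """Replica manager: where the files of a dataset physically live."""
    def locate(self, dataset):
        return ["site_a:/data/file_001.root"]  # placeholder replica


class VDC:
    """Virtual data catalog: the recipe (transformation) that made the data."""
    def recipe(self, dataset):
        return {"transformation": "atlsim", "parameters": {"geometry": "v1"}}


ami, magda, vdc = AMI(), MAGDA(), VDC()
for ds in ami.find_datasets(project="dc1", step="simul"):
    print(ds, magda.locate(ds), vdc.recipe(ds))
```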
24
AMI Architecture
Data warehousing principle (star architecture)
25
MAGDA Architecture
Component-based architecture emphasizing
fault-tolerance
26
VDC Architecture
  • Two-layer architecture

27
Introducing Virtual Data
  • Recipes for producing the data (jobOptions, kumacs) have to be fully tested; the produced data have to be validated through a QA step
  • Preparing production recipes takes time and effort and encapsulates considerable knowledge. In DC0 more time was spent assembling the proper recipes than running the production jobs
  • Once you have the proper recipes, producing the data is straightforward
  • After the data have been produced, what do we
    have to do with the developed recipes? Do we
    really need to save them?
  • Data are primary, recipes are secondary

28
Virtual Data Perspective
  • GriPhyN project (www.griphyn.org) provides a
    different perspective
  • recipes are as valuable as the data
  • production recipes are the Virtual Data
  • If you have the recipes you do not need the data
    (you can reproduce them)
  • recipes are primary, data are secondary
  • Do not throw away the recipes,
  • save them (in VDC)
  • From the OO perspective
  • Methods (recipes) are encapsulated together with
    the data in Virtual Data Objects
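The "recipes are primary" view amounts to a virtual data object that stores the recipe and (re)materializes the data on demand; a minimal sketch with invented names:

```python
"""Minimal sketch of a virtual data object (invented names): the recipe is
stored, the data are (re)materialized from it on demand."""


class VirtualDataObject:
    def __init__(self, recipe, produce):
        self.recipe = recipe        # e.g. jobOptions / kumac parameters
        self._produce = produce     # the transformation to run
        self._data = None           # not materialized yet

    def materialize(self):
        """Run the recipe only if the data do not already exist."""
        if self._data is None:
            self._data = self._produce(self.recipe)
        return self._data


def toy_transformation(recipe):
    return [recipe["seed"] * i for i in range(recipe["n_events"])]


vdo = VirtualDataObject({"seed": 7, "n_events": 4}, toy_transformation)
print(vdo.materialize())   # produced from the recipe
print(vdo.materialize())   # already materialized, recipe not re-run
```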

29
VDC-based Production System
  • High-throughput features
  • scatter-gather data processing architecture
  • Fault tolerance features
  • independent agents
  • pull model for agent task assignment (vs. push)
  • local caching of output and input data (except Objectivity input)
  • ATLAS DC0 and DC1 parameter settings for simulations are recorded in the Virtual Data Catalog database using normalized components: parameter collections structured orthogonally
  • Data reproducibility
  • Application complexity
  • Grid location
  • Automatic garbage collection by the job
    scheduler
  • Agents pull the next derivation from the VDC
  • After the data have been materialized, agents register success in the VDC
  • When a previous invocation has not completed within the specified timeout period, it is invoked again
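The pull model with timeout-based retry can be sketched as follows (a toy in-memory catalog stands in for the VDC database; real agents run on many hosts and would execute the transformation instead of just registering success):

```python
"""Toy sketch of the pull model with timeout-based retry (in-memory catalog;
the real VDC is a database, and agents run on different hosts)."""
import time

# Each derivation carries its status and the time it was last claimed.
catalog = {
    "derivation-1": {"status": "pending", "claimed_at": None},
    "derivation-2": {"status": "pending", "claimed_at": None},
}
TIMEOUT = 3600.0  # seconds; a claimed derivation older than this is retried


def pull_next():
    """Agents pull work (instead of being pushed jobs by a central scheduler)."""
    now = time.time()
    for name, entry in catalog.items():
        timed_out = (entry["status"] == "claimed"
                     and now - entry["claimed_at"] > TIMEOUT)
        if entry["status"] == "pending" or timed_out:
            entry.update(status="claimed", claimed_at=now)
            return name
    return None


def register_success(name):
    """After the data are materialized, the agent records success."""
    catalog[name]["status"] = "done"


while (task := pull_next()) is not None:
    register_success(task)   # a real agent would run the transformation here
print(catalog)
```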

30
Tree-like Data Flow
Exercising the rich possibilities of data processing composed of multiple independent data transformation steps
Diagram: tree-like data flow with transformation steps (Athena Generators, Athena Atlfast, Atlfast recon, atlsim, Athena conversion, Athena recon, Athena QA) linking the data products HepMC.root, Atlfast.root, recon.root, filtering.ntuple, digis.zebra, digis.root, geometry.zebra, geometry.root and QA.ntuple
31
Data Reproducibility
  • The goal is to validate DC sample production by ensuring the reproducibility of simulations run at different sites
  • We need a tool capable of establishing the similarity or the identity of two samples produced under different conditions, e.g. at different sites
  • A very important (and sometimes overlooked)
    component for the Grid computing deployment
  • It is complementary to the software and/or data digital signature approaches that are still in the R&D phase

32
Grid Production Validation
  • Simulations are run under different conditions
  • for instance, the same generation input but different production sites
  • For each sample, reconstruction (i.e. Atrecon) is run to produce standard CBNT ntuples
  • The validation application launches specialized
    independent analyses for ATLAS subsystems
  • For each sample standard histograms are produced

33
Comparison Procedure
Plots: test and reference samples superimposed, and their bin-by-bin contributions to χ²
34
Summary of Comparison
The comparison procedure ends with a χ² bar-chart summary, giving a pretty nice overview of how the samples compare
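The bin-by-bin comparison behind the χ² bar chart can be sketched in a few lines; the histogram contents and the χ²/ndf threshold below are illustrative, and only the standard two-histogram χ² formula is taken as given.

```python
"""Sketch of the sample-comparison step: a per-histogram chi2 between a test
and a reference sample (plain Python; histogram names and contents are
illustrative)."""


def chi2(test_bins, ref_bins):
    """chi2 = sum (n_i - m_i)^2 / (n_i + m_i) over non-empty bins."""
    total = 0.0
    for n, m in zip(test_bins, ref_bins):
        if n + m > 0:
            total += (n - m) ** 2 / (n + m)
    return total


# One entry per standard histogram produced for each sample.
histograms = {
    "calorimeter_energy": ([120, 480, 300, 95], [130, 465, 310, 100]),
    "track_multiplicity": ([50, 200, 150, 40], [48, 210, 145, 42]),
}

# Summary "bar chart" as numbers: one chi2 per histogram, flagged if large.
for name, (test, ref) in histograms.items():
    ndf = sum(1 for n, m in zip(test, ref) if n + m > 0)
    value = chi2(test, ref)
    flag = "CHECK" if value / ndf > 2.0 else "ok"
    print(f"{name}: chi2/ndf = {value / ndf:.2f}  [{flag}]")
```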
35
Example of Finding
Comparing energy in calorimeters for Z → 2l samples from DC0 and DC1
The difference is caused by the η cut at generation
It works!
36
Summary
  • ATLAS computing is in the middle of the first period of Data Challenges of increasing scope and complexity and is steadily progressing towards a highly functional software suite, plus a World Wide computing model which gives all ATLAS members equal ease and quality of access to ATLAS data
  • These Data Challenges are executed at the
    prototype tier centers and use as much as
    possible the Grid middleware being developed in
    Grid projects around the world
  • In close collaboration between the Grid and Data
    Challenge communities ATLAS is testing
    large-scale testbed prototypes, deploying
    prototype components to integrate and test Grid
    software in a production environment, and running
    Data Challenge 1 production in 26 prototype tier
    centers in 17 countries on four continents
  • Quite promising start for ATLAS Data Challenges!