ATLAS Data Challenge on NorduGrid

1
ATLAS Data Challenge on NorduGrid
  • CHEP2003 UCSD
  • Anders Wäänänen (waananen@nbi.dk)

2
NorduGrid project
  • Launched in spring of 2001, with the aim of
    creating a Grid infrastructure in the Nordic
    countries.
  • The idea was to have a MONARC-style architecture
    with a common Tier-1 centre
  • Partners from Denmark, Norway, Sweden, and
    Finland
  • Initially meant to be the Nordic branch of the EU
    DataGrid (EDG) project
  • 3 full-time researchers, plus a few externally funded ones

3
Motivations
  • NorduGrid was initially meant to be a pure
    deployment project
  • One goal was to have the ATLAS data challenge run
    by May 2002
  • Should be based on the Globus Toolkit
  • Available Grid middleware:
    - The Globus Toolkit
      - A toolbox, not a complete solution
    - European DataGrid software
      - Not mature for production at the beginning
        of 2002
      - Architecture problems

4
A Job Submission Example
[Diagram: components involved in job submission: Replica Catalogue, Information Service, Resource Broker, Authorization and Authentication, Storage Element, Job Submission Service, Logging and Book-keeping, Compute Element]

5
Architecture requirements
  • No single point of failure
  • Should be scalable
  • Resource owners should have full control over
    their resources
  • As few site requirements as possible
  • Local cluster installation details (method, OS
    version, configuration, etc.) should not be
    dictated
  • Compute nodes should not be required to be on the
    public network
  • Clusters need not be dedicated to the Grid

6
User interface
  • The NorduGrid user interface provides a set of
    commands for interacting with the grid:
    - ngsub to submit jobs
    - ngstat to query the status of jobs and clusters
    - ngcat to view the stdout/stderr of running jobs
    - ngget to retrieve the results of finished jobs
    - ngkill to kill running jobs
    - ngclean to delete finished jobs from the system
    - ngcopy to copy files to, from and between file
      servers and replica catalogs
    - ngremove to delete files from file servers and
      replica catalogs
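The descriptions above imply which job-management commands make sense at which stage of a job's life. A minimal sketch of that mapping, assuming a simplified two-state model ("running" and "finished"); the real Grid Manager tracks finer-grained states, and the table is an illustration built from the slide, not NorduGrid code. ngsub (which creates the job) and the file commands ngcopy and ngremove are left out.

```python
# Hypothetical lookup table: NorduGrid UI command -> job states in which the
# slide describes it as applicable. "running"/"finished" is a simplification.
COMMANDS: dict[str, set[str]] = {
    "ngstat": {"running", "finished"},  # query status of jobs (and clusters)
    "ngcat": {"running"},               # view stdout/stderr of a running job
    "ngkill": {"running"},              # kill a running job
    "ngget": {"finished"},              # retrieve results of a finished job
    "ngclean": {"finished"},            # delete a finished job from the system
}


def applicable_commands(state: str) -> list[str]:
    """Return, in alphabetical order, the commands usable in `state`."""
    return sorted(cmd for cmd, states in COMMANDS.items() if state in states)


print(applicable_commands("running"))   # ['ngcat', 'ngkill', 'ngstat']
print(applicable_commands("finished"))  # ['ngclean', 'ngget', 'ngstat']
```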

7
ATLAS Data Challenges
  • A series of computing challenges within ATLAS of
    increasing size and complexity
  • Preparing for data-taking and analysis at the
    LHC
  • Thorough validation of the complete ATLAS
    software suite
  • Introduction and use of Grid middleware as quickly
    and as widely as possible

8
Data Challenge 1
  • Main goals
    - Produce data for the High Level Trigger (HLT) and
      Physics groups
    - Study the performance of the Athena framework and
      algorithms for use in the HLT
  • High statistics needed
    - A few samples of up to 10^7 events in 10-20 days,
      on O(1000) CPUs
  • Simulation and pile-up
  • Reconstruction and analysis on a large scale
    - Learn about the data model and I/O performance;
      identify bottlenecks, etc.
  • Data management
    - Use/evaluate persistency technology (AthenaRoot
      I/O)
  • Learn about distributed analysis
  • Involvement of sites outside CERN
  • Use of the Grid as and when possible and appropriate

9
DC1, phase 1 Task Flow
  • Example: one sample of di-jet events
  • PYTHIA event generation: 1.5 × 10^7 events split
    into partitions (ROOT files)
  • Detector simulation: 20 jobs per partition, ZEBRA
    output

[Diagram: event generation with Pythia6 writes di-jet events (HepMC, via Athena-Root I/O) in partitions of 10^5 events; each partition feeds several detector-simulation jobs (Atlsim/Geant3 + filter; labels "5000 evts" and "450 evts") that write Hits/Digits and MCTruth to Zebra]

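The job count behind this task flow follows from the numbers on the slide. A short worked example, assuming the "10^5 events" label in the diagram is the partition size (the slide text itself only says the events are split into partitions):

```python
# Job-count arithmetic for the di-jet sample.
# Assumption: the diagram's "10^5 events" label is the partition size.
total_events = 15_000_000        # 1.5 x 10^7 generated PYTHIA events
events_per_partition = 100_000   # 10^5 events per partition (assumed)
jobs_per_partition = 20          # detector-simulation jobs per partition

partitions = total_events // events_per_partition
sim_jobs = partitions * jobs_per_partition
events_per_job = events_per_partition // jobs_per_partition

print(partitions, sim_jobs, events_per_job)  # 150 3000 5000
```

The resulting 5000 input events per simulation job agrees with the "5000 evts" label in the diagram, which supports the assumed partition size.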
10
DC1, phase 1 Summary
  • July-August 2002
  • 39 institutes in 18 countries
  • 3200 CPUs, approx. 110 kSI95, 71,000 CPU-days
  • 5 × 10^7 events generated
  • 1 × 10^7 events simulated
  • 30 TB produced
  • 35,000 output files
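Two rough averages follow from these totals. The sketch assumes decimal units (1 TB = 10^12 bytes), that the 30 TB are spread over the 35,000 output files, and charges all 71,000 CPU-days to the 10^7 simulated events; since event generation also used CPU time, the per-event figure is an upper bound on the simulation cost.

```python
# Totals from the DC1 phase 1 summary; decimal units assumed.
cpu_days = 71_000
events_simulated = 10_000_000   # 1 x 10^7
output_bytes = 30e12            # 30 TB
n_files = 35_000

# Upper bound: all CPU time (generation included) charged to simulated events.
cpu_seconds_per_event = cpu_days * 86_400 / events_simulated
mean_file_size_gb = output_bytes / n_files / 1e9

print(f"{cpu_seconds_per_event:.0f} s/event, {mean_file_size_gb:.2f} GB/file")
# 613 s/event, 0.86 GB/file
```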

11
DC1, phase 1 for NorduGrid
  • Simulation
    - Datasets 2000 and 2003 (different event
      generation) assigned to NorduGrid
    - Total number of fully simulated events: 287,296
      (from 1.15 × 10^7 input events)
    - Total output size: 762 GB
    - All files uploaded to a Storage Element
      (University of Oslo) and registered in the
      Replica Catalog
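The NorduGrid share can be summarised the same way. Assuming decimal gigabytes, and reading the 1.15 × 10^7 input events as the generated events fed to the simulation jobs, roughly one input event in 40 ended up fully simulated:

```python
input_events = 11_500_000   # 1.15 x 10^7 input events
simulated_events = 287_296  # fully simulated events on NorduGrid
output_gb = 762             # total output size, decimal GB assumed

kept_fraction = simulated_events / input_events
mb_per_event = output_gb * 1000 / simulated_events

print(f"{kept_fraction:.2%} kept, {mb_per_event:.2f} MB/event")
# 2.50% kept, 2.65 MB/event
```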

12
Job xRSL script
    (executable=ds2000.sh)
    (arguments=1244)
    (stdout=dc1.002000.simul.01244.hlt.pythia_jet_17.log)
    (join=yes)
    (inputfiles=(ds2000.sh http://www.nordugrid.org/applications/dc1/2000/dc1.002000.simul.NG.sh))
    (outputfiles=
      (atlas.01244.zebra rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.zebra)
      (atlas.01244.his rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.his)
      (dc1.002000.simul.01244.hlt.pythia_jet_17.log rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.log)
      (dc1.002000.simul.01244.hlt.pythia_jet_17.AMI rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.AMI)
      (dc1.002000.simul.01244.hlt.pythia_jet_17.MAG rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.MAG))
    (jobname=dc1.002000.simul.01244.hlt.pythia_jet_17)
    (runtimeEnvironment=DC1-ATLAS)
    (replicacollection=ldap://grid.uio.no:389/lc=ATLAS,rc=NorduGrid,dc=nordugrid,dc=org)
    (maxCPUTime=2000)(maxDisk=1200)
    (notify=e waananen@nbi.dk)
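Every partition needs its own copy of this description, differing only in the partition number. A minimal sketch of generating them with a hypothetical helper, make_xrsl (not part of the NorduGrid toolkit); it assumes the naming pattern of the partition-1244 example above holds for every dataset-2000 partition, and it leaves out the notify relation:

```python
def make_xrsl(partition: int) -> str:
    """Return an xRSL description for one dataset-2000 simulation partition.

    Hypothetical helper: the file-name pattern is copied from the slide's
    example (partition 1244) and assumed to hold for other partitions.
    The notify relation is omitted.
    """
    part = f"{partition:05d}"                  # 1244 -> "01244"
    base = f"dc1.002000.simul.{part}.hlt.pythia_jet_17"
    dest = "rc://dc1.uio.no/2000/log/"         # destination prefix from the slide
    script = ("http://www.nordugrid.org/applications/dc1/2000/"
              "dc1.002000.simul.NG.sh")

    # (name in the job's session directory, destination URL)
    outputs = [(f"atlas.{part}.zebra", dest + base + ".zebra"),
               (f"atlas.{part}.his", dest + base + ".his")]
    outputs += [(base + ext, dest + base + ext) for ext in (".log", ".AMI", ".MAG")]

    lines = [
        "(executable=ds2000.sh)",
        f"(arguments={partition})",
        f"(stdout={base}.log)",
        "(join=yes)",
        f"(inputfiles=(ds2000.sh {script}))",
        "(outputfiles=",
        *(f"  ({name} {url})" for name, url in outputs),
        ")",
        f"(jobname={base})",
        "(runtimeEnvironment=DC1-ATLAS)",
        "(replicacollection=ldap://grid.uio.no:389/"
        "lc=ATLAS,rc=NorduGrid,dc=nordugrid,dc=org)",
        "(maxCPUTime=2000)(maxDisk=1200)",
    ]
    return "\n".join(lines)


print(make_xrsl(1244))
```

Generating descriptions from one template keeps the file names, job name and output destinations consistent across the many partitions of a dataset.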

13
NorduGrid job submission
  • The user submits an xRSL file specifying the
    job options.
  • The xRSL file is processed by the User Interface.
  • The User Interface queries the NorduGrid Information
    System for resources and the NorduGrid Replica
    Catalog for the location of input files, and
    submits the job to the selected resource.
  • There the job is processed by the Grid Manager,
    which downloads or links files to the local
    session directory.
  • The Grid Manager submits the job to the local
    resource management system.
  • After the simulation finishes, the Grid Manager
    moves the requested output to Storage Elements and
    registers it in the NorduGrid Replica Catalog.
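The submission flow described above can be modelled in a few lines. This is a toy, in-memory sketch: the Information System, Replica Catalog and Storage Elements are plain dictionaries, the local resource management system is a function call, and the "most free CPUs" selection rule is an assumed heuristic, not the actual NorduGrid brokering algorithm. All class names, host names and URLs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Cluster:
    name: str
    free_cpus: int
    runtime_envs: set[str]


@dataclass
class Grid:
    clusters: list[Cluster]                # stand-in for the Information System
    replica_catalog: dict[str, list[str]]  # logical file name -> physical URLs
    storage: dict[str, bytes] = field(default_factory=dict)  # URL -> contents (SEs)


def select_cluster(grid: Grid, runtime_env: str) -> Cluster:
    """User Interface: choose a cluster offering the runtime environment.

    Picking the one with the most free CPUs is an assumed heuristic.
    """
    candidates = [c for c in grid.clusters
                  if runtime_env in c.runtime_envs and c.free_cpus > 0]
    if not candidates:
        raise RuntimeError(f"no cluster offers {runtime_env}")
    return max(candidates, key=lambda c: c.free_cpus)


def submit(grid: Grid, runtime_env: str, inputs: list[str],
           job: Callable[[dict[str, bytes]], dict[str, bytes]],
           outputs: dict[str, str]) -> str:
    """Run one job through the submission flow; return the chosen cluster."""
    cluster = select_cluster(grid, runtime_env)
    # User Interface: look up each input file in the Replica Catalog.
    locations = {lfn: grid.replica_catalog[lfn][0] for lfn in inputs}
    # Grid Manager: download the inputs into the session directory.
    session = {lfn: grid.storage[url] for lfn, url in locations.items()}
    # Grid Manager -> local resource management system (here, a plain call).
    results = job(session)
    # Grid Manager: upload requested outputs to a Storage Element and
    # register each one in the Replica Catalog.
    for name, url in outputs.items():
        grid.storage[url] = results[name]
        grid.replica_catalog.setdefault(name, []).append(url)
    return cluster.name


grid = Grid(
    clusters=[Cluster("small", 4, {"DC1-ATLAS"}),
              Cluster("large", 64, {"DC1-ATLAS"}),
              Cluster("no-atlas", 128, set())],
    replica_catalog={"input.zebra": ["gsiftp://se1.example.org/input.zebra"]},
    storage={"gsiftp://se1.example.org/input.zebra": b"simulated events"},
)
chosen = submit(grid, "DC1-ATLAS", ["input.zebra"],
                job=lambda files: {"output.zebra": files["input.zebra"] + b" + pile-up"},
                outputs={"output.zebra": "gsiftp://se2.example.org/output.zebra"})
print(chosen)  # large
```

The point of the split is that the User Interface only decides where the job goes, while all data movement (download, upload, registration) happens on the cluster side through the Grid Manager.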

14
NorduGrid job submission
[Diagram: NorduGrid job submission; labelled components include the Gatekeeper, GridFTP and the Grid Manager]

15
NorduGrid Production sites

16

17
NorduGrid Pileup
  • DC1 pile-up
    - Low-luminosity pile-up for the phase 1 events
  • Number of jobs: 1300
    - Dataset 2000: 300
    - Dataset 2003: 1000
  • Total output size: 1083 GB
    - Dataset 2000: 463 GB
    - Dataset 2003: 620 GB

18
Pileup procedure
  • Each job downloaded one zebra file from
    dc1.uio.no, of approximately
    - 900 MB for dataset 2000
    - 400 MB for dataset 2003
  • Locally present minimum-bias zebra files were used
    to "pile up" events on top of the original
    simulated ones in the downloaded file. Each output
    file was about 50% bigger than the original
    downloaded file, i.e.
    - 1.5 GB for dataset 2000
    - 600 MB for dataset 2003
  • Output files were uploaded to the dc1.uio.no and
    dc2.uio.no SEs
  • and registered in the RC.
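The per-file sizes can be cross-checked against the job counts and output totals on the "NorduGrid Pileup" slide. Assuming decimal units and exactly one output file per job, the averages come out at roughly 1.5 GB and 620 MB per output file; the growth over the downloaded input is about 55% for dataset 2003 but closer to 70% for dataset 2000, so "about 50% bigger" is only a rough figure there.

```python
# Totals from the "NorduGrid Pileup" slide and input sizes from this slide;
# decimal units assumed (1 GB = 1000 MB).
jobs = {"2000": 300, "2003": 1000}
output_gb = {"2000": 463, "2003": 620}
input_mb = {"2000": 900, "2003": 400}

# Average output per job, assuming each job wrote exactly one output file.
per_file_mb = {ds: output_gb[ds] * 1000 / jobs[ds] for ds in jobs}
growth = {ds: per_file_mb[ds] / input_mb[ds] - 1 for ds in jobs}

for ds in jobs:
    print(f"dataset {ds}: {per_file_mb[ds]:.0f} MB per file, {growth[ds]:+.0%} vs input")
# dataset 2000: 1543 MB per file, +71% vs input
# dataset 2003: 620 MB per file, +55% vs input
```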

19
Other details
  • At peak production, up to 200 jobs were managed
    by NorduGrid at the same time
  • NorduGrid has most of the Scandinavian production
    clusters under its belt (2 of them are in the
    Top 500)
    - However, not all of them allow installation of
      the ATLAS software
  • The ATLAS job manager ATLAS Commander supports the
    NorduGrid toolkit
  • Issues
    - Replica Catalog scalability problems
    - MDS/OpenLDAP hangs: solved
    - Software threading problems: partly solved
      - Problems partly in the Globus libraries

20
NorduGrid DC1 timeline
  • April 5th 2002
    - First ATLAS job submitted (Athena Hello World)
  • May 10th 2002
    - First pre-DC1 validation job submitted
      (ATLSIM test using ATLAS release 3.0.1)
  • End of May 2002
    - Now clear that NorduGrid was mature enough to
      handle real production
  • Spring 2003 (now)
    - Keep running Data Challenges and improve the
      toolkit

21
Quick client installation/job run
  • As a normal user (no system privileges
    required)
  • Retrieve nordugrid-standalone-0.3.17.rh72.i386.tgz
    - tar xfz nordugrid-standalone-0.3.17.rh72.i386.tgz
    - cd nordugrid-standalone-0.3.17
    - source ./setup.sh
  • Get a personal certificate
    - grid-cert-request
    - Install the certificate per the instructions
  • Get authorized on a cluster
  • Run a job
    - grid-proxy-init
    - ngsub '(executable=/bin/echo)(arguments="Hello World")'

22
Resources
  • Documentation and source code are available for
    download
  • Main Web site
    - http://www.nordugrid.org/
  • ATLAS DC1 with NorduGrid
    - http://www.nordugrid.org/applications/dc1/
  • Software repository
    - ftp://ftp.nordugrid.org/pub/nordugrid/

23
The NorduGrid core group
  • Aleksandr Konstantinov
  • Balázs Kónya
  • Mattias Ellert
  • Oxana Smirnova
  • Jakob Langgaard Nielsen
  • Trond Myklebust
  • Anders Wäänänen