1
PandaRoot on Grid
  • Dan Protopopescu
  • Glasgow, UK

GSI, December 2007
2
The challenge
  • Goal
  • We want to submit a job that runs the analysis
    chain presented by Soeren Lange in his talk of
    18/09/2007 at GSI. This chain consists of five
    example macros, using PandaRoot, run in sequence.
  • Preconditions
  • - The PandaRoot software package is already
    installed on PANDA Grid (else see the wiki:
    http://nuclear.gla.ac.uk/twiki/bin/view.pl/Main/PackMan)
  • - We have the AliEn client installed and an
    AliEn user certificate (else see the wiki:
    http://nuclear.gla.ac.uk/twiki/bin/view.pl/Main/AliEn2ClientInstall)

3
The analysis chain
  • In a bash script, the chain looks like this:
  • -bash-3.00$ cat sim_ana_chain.sh
  • #!/bin/bash
  • echo "This is the simulation, digitization, reconstruction and analysis chain"
  • echo "presented by Soeren Lange in his talk on 18/09/2007 at GSI"
  • echo "-----------------------------------------------------------------------"
  • root -b -q sim_emc2.C\($1\) || exit 11
  • root -b -q hit_emc.C || exit 12
  • root -b -q digi_emc.C || exit 13
  • root -b -q reco_emc.C || exit 14
  • root -b -q reco_analys2.C || exit 15
  • echo "-----------------------------------------------------------------------"
  • echo "Chain finished successfully"
  • What this wrapper script does is simple: it runs
    the five ROOT macros in sequence and exits with a
    non-zero error code if one of them fails. We have
    this script on the local disk in /home/protopop/.
    So, sim_ana_chain.sh is our job to run.

We pass an argument here that will be used as
seed for the random generator
Error handling is important for job status
identification
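
The error-handling pattern used by the wrapper can be factored into a small helper. This is only a sketch: run_step is a hypothetical name, not part of PandaRoot or AliEn, and it assumes that each stage should abort the whole chain with its own exit code.

```shell
#!/bin/bash
# run_step CODE CMD [ARGS...]
# Run a command; if it fails, report the failing step on stderr and
# exit the whole script with CODE so the failed stage can be identified.
# (run_step is a hypothetical helper introduced for illustration.)
run_step() {
    local code="$1"
    shift
    "$@" || { echo "step failed: $*" >&2; exit "$code"; }
}

# A succeeding command lets the script continue.
run_step 99 true

# In the chain, the calls would look like this (requires root on the PATH):
# run_step 11 root -b -q "sim_emc2.C($1)"
# run_step 12 root -b -q hit_emc.C
```

A distinct exit code per stage makes it possible to tell from the job status alone which macro failed.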
4
AliEn Login
  • We initiate the authentication by creating a
    certificate proxy:
  • -bash-3.00$ alien proxy-init
  • Your identity: /C=GB/O=PANDA-GRID/OU=PANDA/CN=Dan Protopopescu
  • Creating proxy .......................................... Done
  • Your proxy is valid until Fri Nov 16 05:28:31 2007
  • Then we log in:
  • -bash-3.00$ alien login
  • .
  • .
  • .
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/ >
  • And we are now at the AliEn prompt.

pbartest is the generic panda grid test uid
5
Adding files to the catalogue
  • To add the macros to the macro directory on the
    logical file system (the catalogue), we change to
    the logical directory
    /panda/user/p/pbartest/macros/ and add the files:
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/ > cd macros
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/macros/ > add sim_emc2.C /tmp/sim_emc2.C
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/macros/ > add hit_emc.C /tmp/hit_emc.C
  • and so on. Note that in this example all the
    macros were on the local physical file system
    (PFS), in the /tmp directory. Now they are all
    in the catalogue:
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/macros/ > ls
  • digi_emc.C hit_emc.C reco_analys2.C reco_emc.C sim_emc2.C
  • We also add the wrapper script to the
    catalogue:
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/macros/ > cd ../bin/
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/bin/ > add sim_ana_chain.sh /home/protopop/sim_ana_chain.sh

Alien prompt
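
The "and so on" above stands for one add command per macro. As a sketch, the full list of commands to type at the AliEn prompt can be generated with a short loop. The function name gen_add_commands is hypothetical, and the commands are only printed, not sent to AliEn.

```shell
#!/bin/bash
# Print one "add <catalogue name> <local path>" command per macro.
# The local directory defaults to /tmp, as in the example above.
# (gen_add_commands is a hypothetical helper introduced for illustration.)
gen_add_commands() {
    local local_dir="${1:-/tmp}"
    local macro
    for macro in sim_emc2.C hit_emc.C digi_emc.C reco_emc.C reco_analys2.C; do
        printf 'add %s %s/%s\n' "$macro" "$local_dir" "$macro"
    done
}

gen_add_commands /tmp
```

The printed lines can then be pasted at the prompt after changing to the macros/ directory.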
6
The JDL
  • We now write a JDL for this job. We prepare a
    file on the local disk:
  • -bash-3.00$ cat /tmp/sim_ana_chain.txt
  • Executable = "sim_ana_chain.sh";
  • Packages = "pbarprod@pandaroot::0.2";
  • Arguments = "#alien_counter#";
  • InputFile = {
  • "LF:/panda/user/p/pbartest/macros/sim_emc2.C",
  • "LF:/panda/user/p/pbartest/macros/hit_emc.C",
  • "LF:/panda/user/p/pbartest/macros/digi_emc.C",
  • "LF:/panda/user/p/pbartest/macros/reco_emc.C",
  • "LF:/panda/user/p/pbartest/macros/reco_analys2.C"};
  • OutputArchive = {
  • "out.archive:stdout,stderr,resources",
  • "root.archive:simparams.root,cluster_emc.root,digi_emc.root,hit_emc.root@PANDA::Glasgow::raid0"};
  • Split = "production:1-100";
  • OutputDir = "/panda/user/p/pbartest/ana/output/run$1/#alien_counter#";

This is passed as argument to the executable, and
used in the first macro as seed for the random
generator
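
To see how the counter placeholder ties the split to the output directories, the substitution can be emulated locally. This is a sketch under assumptions: it treats the placeholder (written #alien_counter# in AliEn JDL) as a plain text substitution of the subjob number, and expand_counter is a hypothetical helper, not an AliEn command.

```shell
#!/bin/bash
# Emulate the replacement of #alien_counter# by the subjob number n.
# (expand_counter is a hypothetical helper; AliEn performs the real
# substitution when it splits the job.)
expand_counter() {
    local template="$1" n="$2"
    printf '%s\n' "$template" | sed "s/#alien_counter#/${n}/g"
}

# Output directory of subjob 10 for a run directory named run11.
expand_counter "/panda/user/p/pbartest/ana/output/run11/#alien_counter#" 10
```

With a split of 1 to 100, each subjob gets its own number both as the argument of the executable and as the last component of its output directory.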
7
Job submission
  • We add this JDL to the AliEn catalogue:
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/macros/ > cd ../jdl
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/jdl/ > add sim_ana_chain.jdl /tmp/sim_ana_chain.txt
  • And we submit the job:
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/ > submit jdl/sim_ana_chain.jdl 11
  • Nov 15 18:27:54 info Submitting job
    '/panda/user/p/pbartest/bin/sim_ana_chain.sh
    #alien_counter#'...
  • Nov 15 18:27:54 info There is no time to live
    (TTL) defined in the jdl... putting the default
    '6 hours'
  • Nov 15 18:27:54 info There is no price defined
    for this job in the jdl. Putting the default
    '1.0'
  • Nov 15 18:27:54 info Calling directly
    getListPackages (list -silent -all)
  • Nov 15 18:27:54 info Job is going to be split
    for production, running from 1 to 100
  • Nov 15 18:27:54 info Input Box: digi_emc.C
    hit_emc.C reco_analys2.C reco_emc.C sim_emc2.C
  • Nov 15 18:27:54 info Command submitted (job
    68417)!!
  • Job ID is 68417 - 0
  • Note that the job was automatically split into
    100 identical subjobs. This is useful when one
    wants to run a production with N identical jobs
    without going through N separate submissions.

Just some arbitrary job id
8
Status
  • Let's check the status:
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/ > masterJob 68417 -printsite
  • Nov 15 18:30:40 info Checking the masterjob of
    68417
  • Nov 15 18:30:40 info The job 68417 is in
    status: SPLIT
  • It has the following subjobs:
  • Subjobs in WAITING: 87
  • Subjobs in RUNNING (grid8.gsi.de): 11
  • Subjobs in RUNNING (kvip81.kvi.nl): 1
  • Subjobs in RUNNING (smigrid01.smi.oeaw.ac.at): 1
  • In total, there are 100 subjobs
  • or
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/ > top
  • JobId  Status   Command name                                 Submithost
  • 68418  RUNNING  /panda/user/p/pbartest/bin/sim_ana_chain.sh  pbartest@npcfs.physics.gla.ac.uk
  • 68419  RUNNING  /panda/user/p/pbartest/bin/sim_ana_chain.sh  pbartest@npcfs.physics.gla.ac.uk
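
When many sites are involved, the per-site lines of the masterJob listing can be summed per state. The following is a sketch, assuming the listing has been saved as text with lines of the form "Subjobs in STATE (site): COUNT"; count_by_state is a hypothetical helper, and the exact output format of masterJob may differ.

```shell
#!/bin/bash
# Sum the subjob counts per state from masterJob-style output on stdin.
# Assumes the state is the third word and the count is the last word
# of every line starting with "Subjobs in".
# (count_by_state is a hypothetical helper introduced for illustration.)
count_by_state() {
    awk '$1 == "Subjobs" && $2 == "in" {
             state = $3
             sub(/:$/, "", state)   # drop a trailing colon, if any
             total[state] += $NF
         }
         END { for (s in total) print s, total[s] }' | sort
}

# Sample taken from the status listing above.
count_by_state <<'EOF'
Subjobs in WAITING: 87
Subjobs in RUNNING (grid8.gsi.de): 11
Subjobs in RUNNING (kvip81.kvi.nl): 1
Subjobs in RUNNING (smigrid01.smi.oeaw.ac.at): 1
In total, there are 100 subjobs
EOF
```

For the sample this prints one line per state, with the RUNNING counts of the three sites added together.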

9
Output
  • After some jobs have finished, we list one of the
    output directories:
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/ > cd ana/output/run11/10/
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/ana/output/run11/10/ > ls
  • cluster_emc.root
  • digi_emc.root
  • hit_emc.root
  • out.archive
  • resources
  • root.archive
  • simparams.root
  • sim_emc.root
  • stderr
  • stdout

10
The stdout
  • We can check the stdout file:
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest/ana/output/run11/10/ > cat stdout
  • . . .
  • EXT PARAMETER                APPROXIMATE   STEP         FIRST
  • NO.  NAME  VALUE        ERROR        SIZE         DERIVATIVE
  • 1    p0    1.80169e+03  2.52610e+04  8.40019e-01  -4.57108e-10
  • 2    p1    9.18940e-02  3.06292e-02  1.25847e-06  3.19386e-04
  • 3    p2    9.98018e-03  9.57160e-03  3.25014e-07  -1.23676e-03
  • -----------------------------------------------------------------------
  • Chain finished successfully

  • PandaRoot Validation Script v2.0
  • Time: Thu Nov 15 19:35:05 CET 2007
  • Dir: /home/kvigrid/alien-job-68419
  • Workdir: /home/kvigrid/alien-job-68419
  • ----------------------------------------------------
  • total 50636
  • . . .

Lots of output here
This message is from the validation script
11
Resubmissions
  • Let's say we check the status of this job a bit
    later:
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest > masterJob 68417 -printsite
  • Nov 16 10:49:10 info Checking the masterjob of
    68417
  • Nov 16 10:49:10 info The job 68417 is in
    status: DONE
  • It has the following subjobs:
  • Subjobs in DONE (grid8.gsi.de): 57
  • Subjobs in DONE (ikp663.ikp.kfa-juelich.de): 5
  • Subjobs in DONE (kvip81.kvi.nl): 7
  • Subjobs in DONE (npcfs.physics.gla.ac.uk): 20
  • Subjobs in DONE (pandafarm01.to.infn.it): 5
  • Subjobs in DONE (smigrid01.smi.oeaw.ac.at): 4
  • Subjobs in ERROR_E (smigrid01.smi.oeaw.ac.at): 2
  • In total, there are 100 subjobs
  • We can resubmit the ones that finished with
    errors:
  • pgdb1.physics.gla.ac.uk:3307 /panda/user/p/pbartest > masterJob 68417 -status ERROR_E resubmit
  • Nov 16 10:51:31 info Checking the masterjob of
    68417
  • Nov 16 10:51:35 info resubmiting subjob 68428

Oops!
12
Your own macro
  • To run your own macro, you would have to:
  • Write a wrapper script for it. The simplest would
    be:
  • -bash-3.00$ cat myscript.sh
  • root -b -q mymacro.C
  • Add the script to the catalogue in your bin/
    directory
  • Add the macro to the catalogue
  • Write a JDL that executes the script. It will
    contain the lines:
  • Executable = "myscript.sh";
  • InputFile = {"LF:/panda/user/p/pbartest/macros/mymacro.C"};
  • Submit your JDL
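
The JDL step above can be sketched as a small generator that prints the two lines for a given script and macro. The function name make_jdl is hypothetical, and the sketch assumes the usual JDL form of "Name = value;" statements with the "LF:" prefix for catalogue files; a real job will normally need further fields, such as those of the JDL shown earlier.

```shell
#!/bin/bash
# Print a minimal JDL fragment for one wrapper script and one macro.
# $1: name of the wrapper script in the catalogue bin/ directory
# $2: logical file name (catalogue path) of the macro
# (make_jdl is a hypothetical helper introduced for illustration.)
make_jdl() {
    local script="$1" macro_lfn="$2"
    cat <<EOF
Executable = "${script}";
InputFile = {"LF:${macro_lfn}"};
EOF
}

make_jdl myscript.sh /panda/user/p/pbartest/macros/mymacro.C
```

The output can be redirected to a local file, added to the catalogue and submitted in the same way as sim_ana_chain.jdl.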

13
The appropriate tool for the task
One would use a hammer for 1 nail
and a nail gun for 1000
14
Final remarks
  • For an update on PANDA Grid status, please attend
    my plenary talk on
  • Wednesday, December 12, at 10 AM
  • See a more detailed wiki tutorial at
  • http://nuclear.gla.ac.uk/twiki/bin/view.pl/Main/SubmitExample
  • Learn more by participating in our Grid
    workshops
  • http://nuclear.gla.ac.uk/grid-workshop/
  • Next PANDA Grid workshop (aka Gridathlon)
  • INFN Frascati, February 4-8, 2007