Scheduler tutorial

Author: Gabriele Carcassi
Created: 2/11/2003

Learn more at: https://www.star.bnl.gov

1
Scheduler tutorial
  • Gabriele Carcassi
  • STAR Collaboration

2
Why use the scheduler?
  • Allows the distributed disk model
  • MuDST are now resident on the local disk of each
    node
  • I/O performance
  • more space available for data files
  • Allows us to change implementation
  • migrate to GRID tools while keeping the user
    interface unchanged

3
Why use the scheduler?
  • Easy to use
  • No more scripts to write
  • Don't have to keep track of where the data files
    are located
  • You just write a small XML file to dispatch
    hundreds of jobs

4
An example
  • We want to execute a ROOT macro on all the MuDSTs
    with the minbias trigger and deuteron-gold
    collisions at 200 GeV
  • The output of our macro will be a ROOT file
    containing a histogram

5
The macro
void SchedulerExample(const char* fileList, const char* outFile) {
  load();
Among the input parameters of the macro there are
the file list (that is, the name of a file
containing a list of input files on which the
macro will operate) and the output file in
which the histogram will be saved
Then we prepare the chain
We give the output filename to our analysis maker
6
The macro
// now create StMuDstMaker
// arguments are:
//   0            : read mode
//   0            : name mode (has no effect on read mode)
//   ""           : input directory, ignored when filename or fileList is specified
//   fileList     : list of files to read
//   ""           : filter
//   1e9          : maximum number of files to read
//   "MuDstMaker" : name of the maker
StMuDebug::setLevel(0);
StMuDstMaker* muDstMaker = new StMuDstMaker(0,0,"",fileList,"",10,"MuDstMaker");
We create the MuDstMaker, giving the fileList as
a parameter
7
The macro
Finally we loop over the events and we clean the
chain
int iret = 0;
int iev = 0;
// now loop over events, makers are called in order of creation
while ( !iret ) {
  cout << "SchedulerExample.C -- Working on eventNumber " << iev++ << endl;
  chain->Clear();
  iret = chain->Make(iev); // This should call the Make() method in ALL makers
} // Event Loop
chain->Finish(); // This should call the Finish() method in ALL makers
8
The Job description
  • We write an XML file with the description of our
    request

<?xml version="1.0" encoding="utf-8" ?>
<job maxFilesPerProcess="500">
  <command>root4star -q -b rootMacros/numberOfEventsList.C\(\"$FILELIST\",\"$SCRATCH/dAu200_MC_$JOBID.root\"\)</command>
  <stdout URL="file:/star/u/carcassi/scheduler/out/$JOBID.out" />
  <input URL="catalog:star.bnl.gov?collision=dAu200,trgsetupname=minbias,filetype=MC_reco_MuDst" preferStorage="local" nFiles="all"/>
  <output fromScratch="*.root" toURL="file:/star/u/carcassi/scheduler/out/" />
</job>
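
As a reading aid, the pieces of the job description can be pulled apart with a few lines of Python. This is only a sketch using the standard XML parser: the function name `summarize_job` and the dictionary keys are made up for illustration and are not part of the scheduler, which does its own parsing.

```python
import xml.etree.ElementTree as ET

# The job description from the slide, with its punctuation restored.
JOB_XML = r"""<?xml version="1.0" encoding="utf-8" ?>
<job maxFilesPerProcess="500">
  <command>root4star -q -b rootMacros/numberOfEventsList.C\(\"$FILELIST\",\"$SCRATCH/dAu200_MC_$JOBID.root\"\)</command>
  <stdout URL="file:/star/u/carcassi/scheduler/out/$JOBID.out" />
  <input URL="catalog:star.bnl.gov?collision=dAu200,trgsetupname=minbias,filetype=MC_reco_MuDst"
         preferStorage="local" nFiles="all" />
  <output fromScratch="*.root" toURL="file:/star/u/carcassi/scheduler/out/" />
</job>
"""


def summarize_job(xml_text):
    """Return the main fields of a job description as a plain dict (hypothetical helper)."""
    job = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "max_files_per_process": int(job.get("maxFilesPerProcess")),
        "command": job.findtext("command"),        # what each process runs
        "stdout": job.find("stdout").get("URL"),   # where standard output goes
        "input": job.find("input").get("URL"),     # catalog query selecting the files
        "output_pattern": job.find("output").get("fromScratch"),
        "output_target": job.find("output").get("toURL"),
    }


summary = summarize_job(JOB_XML)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Read this way, the request has four parts: the command to run, where its standard output goes, which files it takes as input, and which files are copied back from the scratch area.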
9
The Job description
  • Let's look at it carefully

10
The Job description
11
The Job description
12
Submitting your job
  • Having your job description, you now just need to
    pass it to the star-submit command

13
What has the scheduler done?
  • In the directory where you run star-submit you
    will see lots of .csh and .list files
  • If you execute bjobs, you will see many jobs
    submitted for you
  • From the single job description the scheduler
    has
  • created many processes
  • assigned an input file list
  • dispatched them to LSF
  • How is this done?

14
Dividing the input files
Job description: test.xml
<?xml version="1.0" encoding="utf-8" ?>
<job maxFilesPerProcess="500">
  <command>root4star -q -b rootMacros/numberOfEventsList.C\(\"$FILELIST\"\)</command>
  <stdout URL="file:/star/u/carcassi/scheduler/out/$JOBID.out" />
  <input URL="catalog:star.bnl.gov?production=P02gd,filetype=daq_reco_mudst" preferStorage="local" nFiles="all"/>
  <output fromScratch="*.root" toURL="file:/star/u/carcassi/scheduler/out/" />
</job>
15
Dividing the input files
  • Every process will receive a different input file
    list
  • $FILELIST will be different for each process
  • The list is divided according to how the files
    are distributed on the nodes of the farm, and on
    the maxFilesPerProcess limit set
  • $FILELIST is the filename of a text file that
    contains a list of files (one per line)
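
The splitting rule above can be sketched in a few lines of Python. This is an illustration only, not the scheduler's actual algorithm: the function `split_into_processes`, the node names, and the file paths are all hypothetical. It assumes files are grouped by the node that holds them and that each group is cut whenever it reaches the maxFilesPerProcess limit.

```python
def split_into_processes(files_by_node, max_files_per_process):
    """Group files node by node, cutting each group at the per-process limit."""
    processes = []
    for node in sorted(files_by_node):
        files = files_by_node[node]
        for start in range(0, len(files), max_files_per_process):
            chunk = files[start:start + max_files_per_process]
            processes.append((node, chunk))
    return processes


# Hypothetical catalog result: node name -> MuDst files on its local disk.
catalog = {
    "node01": ["/data/a1.MuDst.root", "/data/a2.MuDst.root", "/data/a3.MuDst.root"],
    "node02": ["/data/b1.MuDst.root"],
}

processes = split_into_processes(catalog, max_files_per_process=2)
for number, (node, files) in enumerate(processes):
    # One file name per line, as in the text file that $FILELIST points to.
    print(f"process {number} on {node}:")
    print("\n".join(files))
```

With a limit of 2, the three files on the first node become two processes and the single file on the second node becomes a third, so no process mixes files from different nodes.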

16
Processes and their outputs
Job description: test.xml
<?xml version="1.0" encoding="utf-8" ?>
<job maxFilesPerProcess="500">
  <command>root4star -q -b rootMacros/numberOfEventsList.C\(\"$FILELIST\"\)</command>
  <stdout URL="file:/star/u/carcassi/scheduler/out/$JOBID.out" />
  <input URL="catalog:star.bnl.gov?production=P02gd,filetype=daq_reco_mudst" preferStorage="local" nFiles="all"/>
  <output fromScratch="*.root" toURL="file:/star/u/carcassi/scheduler/out/" />
</job>
17
Processes and their outputs
  • All the jobs are automatically dispatched to LSF
  • The output of each process must be different
  • If two processes wrote to the same file,
    one would overwrite the other
  • One quick way is to use the $JOBID (which is
    different for every process) to generate unique
    names

18
Environment variables
  • The scheduler uses some environment variables to
    communicate with your job
  • $FILELIST is the name of a file containing the
    input file list for the process
  • $INPUTFILECOUNT tells you how many files were
    assigned to the process
  • $INPUTFILExx allows you to iterate over the file
    names in a script
  • $JOBID gives you a unique identifier composed of
    two parts: the request id and the process number
    (e.g. 1043250413862_0).
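
A script run by the scheduler could read these variables roughly as follows. This is a sketch under assumptions: the helper `read_scheduler_env` is hypothetical, the per-file variables are assumed to be numbered from 0 (INPUTFILE0, INPUTFILE1, ...), and the example values are invented. In a real job you would pass `os.environ` instead of the example dictionary.

```python
def read_scheduler_env(env):
    """Collect the scheduler variables of one process from an environment mapping."""
    count = int(env["INPUTFILECOUNT"])
    # Assumption: the per-file variables are INPUTFILE0, INPUTFILE1, ...
    files = [env[f"INPUTFILE{i}"] for i in range(count)]
    # JOBID is "<request id>_<process number>", e.g. 1043250413862_0.
    request_id, process_number = env["JOBID"].rsplit("_", 1)
    return {
        "filelist": env["FILELIST"],
        "files": files,
        "request_id": request_id,
        "process_number": int(process_number),
    }


# Invented values for illustration only.
example_env = {
    "FILELIST": "/tmp/example_0.list",
    "INPUTFILECOUNT": "2",
    "INPUTFILE0": "/data/a1.MuDst.root",
    "INPUTFILE1": "/data/a2.MuDst.root",
    "JOBID": "1043250413862_0",
}

info = read_scheduler_env(example_env)
print(info)
```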

19
Environment variables
  • More variables
  • $SCRATCH is a temporary directory for a single
    process, located on the node where the process
    will execute. You should write your output here,
    and let the scheduler retrieve it for you
  • You can pass the variables to your macro
  • In the example we passed the $FILELIST and we
    built the output filename with $SCRATCH and $JOBID
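
The naming scheme from the example (scratch directory, a fixed prefix, then the job id) can be written as a small helper. This is a sketch only: `output_file` is a hypothetical name, and the scratch paths are invented. The `dAu200_MC_` prefix is the one used in the example job description.

```python
import posixpath


def output_file(env, prefix="dAu200_MC_"):
    """Build a per-process output path inside the scratch directory."""
    return posixpath.join(env["SCRATCH"], f"{prefix}{env['JOBID']}.root")


# Invented values for two processes of the same request.
first = output_file({"SCRATCH": "/tmp/scratch0", "JOBID": "1043250413862_0"})
second = output_file({"SCRATCH": "/tmp/scratch1", "JOBID": "1043250413862_1"})
print(first)
print(second)
```

Because the process number is part of the job id, the two file names differ even after both are copied to the same output directory.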

20
Input from a catalog query
  • The best way to specify the input is through a
    file catalog query
  • you don't have to worry about where the files are
  • it will work both at BNL and at PDSF
  • The file catalog has a lot of attributes to
    select your files
  • collision, trgname, library, production, runtype,
    magvalue, configuration, ...
  • You can get familiar with the file catalog by
    using the get_file_list command. Its cond
    parameter takes the same query that is passed to
    the scheduler.
  • <input URL="catalog:star.bnl.gov?production=P02gd,filetype=daq_reco_mudst" preferStorage="local" nFiles="all"/>
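
To make the structure of the catalog URL explicit, here is a minimal Python sketch that splits it into the catalog host and the attribute conditions. The function `parse_catalog_url` is hypothetical and only reflects the format visible in the example (a `catalog:` prefix, a host, then comma-separated `attribute=value` pairs). The scheduler's own parsing may accept more than this.

```python
def parse_catalog_url(url):
    """Split 'catalog:<host>?<attr>=<value>,...' into (host, conditions)."""
    scheme, rest = url.split(":", 1)
    if scheme != "catalog":
        raise ValueError(f"not a catalog URL: {url}")
    host, _, query = rest.partition("?")
    # Each comma-separated item is one attribute=value condition.
    conditions = dict(item.split("=", 1) for item in query.split(",") if item)
    return host, conditions


host, conditions = parse_catalog_url(
    "catalog:star.bnl.gov?production=P02gd,filetype=daq_reco_mudst"
)
print(host)
print(conditions)
```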

21
What changes are required to my analysis code?
  • The macro must take the filelist as an argument
  • The macro must write to a different output file
    for each execution
  • use $JOBID, the filelist or the input files to
    generate unique names

22
Where can you use it?
  • The scheduler is installed both at BNL and at
    PDSF
  • At present, the file catalog at PDSF is not ready
  • For any help and information you can consult the
    scheduler website and the scheduler mailing list
    on HyperNews

23
References
  • Scheduler HyperNews
  • Scheduler manual
  • http://www.star.bnl.gov/STAR/comp/Grid/scheduler/
  • File Catalog manual
  • http://www.star.bnl.gov/comp/sofi/FileCatalog.html