Title: Grid Computing Status
1 Grid Computing Status
- Michael P. McCumber
- Task Force Meeting
- April 3, 2006
2 Grid Operational Status
Submit from: RCF, SBC, CCJ, UNM, VAN
Process at: RCF, SBC, CCJ, UNM, VAN
Return to: dCache, SBC, RCF (SRM dcache_ferry)
Dotted returns: globus-url-copy is needed cluster-wide, or an additional SRM-like implementation on the grid host.
Green submit paths: working pre-upgrade.
3 Current Grid Job/SRM Job Logistics
Diagram components: Local Grid Host, Pacman Repository, Remote Grid Host, Remote Node, dCache
Directories: scratch-dir, semaphore-dir, output-dir
To remove dependence on the execution cluster, these directories have been standardized.
4 Grid Job XML Layout
<xml ... />       tag defines XML version
<job ... >        tag sets input file number
  <command>
    call pacman, source environment
    link output to $HOME/dcache_ferry/transfer
    set seeds, initialize code
    execute code
    create semaphore
  </command>
  <input ... />   tag sets input file
  <stdout ... />  tag sets output log file
  <stderr ... />  tag sets error log file
</job>
PJS will generate this automatically for simulation users.
The scratch directory is available in $SCRATCH.
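The layout above could be generated along the following lines. This is only a minimal sketch, not what PJS actually does: the element names (job, command, input, stdout, stderr) come from the slide, but the attribute names `nFiles` and `URL`, the function name, and the argument names are hypothetical placeholders, since the slide does not show the real schema.

```python
import xml.etree.ElementTree as ET


def build_job_xml(n_input_files, command_lines, input_url, stdout_url, stderr_url):
    """Return a job description string following the slide's element order.

    Attribute names ("nFiles", "URL") are assumptions for illustration only.
    """
    job = ET.Element("job", {"nFiles": str(n_input_files)})

    # The command body is the shell script run on the remote node:
    # source the environment, link the output directory, run the code,
    # and finally create the semaphore.
    command = ET.SubElement(job, "command")
    command.text = "\n" + "\n".join(command_lines) + "\n"

    # Input file and log destinations follow the command block.
    ET.SubElement(job, "input", {"URL": input_url})
    ET.SubElement(job, "stdout", {"URL": stdout_url})
    ET.SubElement(job, "stderr", {"URL": stderr_url})

    return ET.tostring(job, encoding="unicode")
```

Keeping the command body as a plain list of shell lines means the same generator can serve different simulation stages by changing only that list.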
5 dCache_ferry (SRM job)
As before, a crontab job checks for semaphore files and runs an SRM transfer. The cycle is currently throttled to 5 minutes to prevent local disk quota overruns. To the same end, I've designed the transfers to delete local copies once a dCache transfer has succeeded. Also, since the ferry must run on the remote cluster without a user login, I've redesigned SRM to open a grid proxy when file transfers are needed. (Put the proxy password in /srm/bin/open_proxy.csh.) It could be run by a ferry captain with world-writable directories.
Required directory structure:
$HOME
  dcache_ferry
    semaphore
      todo
      doing
      done
    transfer
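One ferry cycle over this directory structure might look like the sketch below. It is not the actual dcache_ferry script. Several things are assumptions: that todo, doing, and done mark the stages of a semaphore's life, that a failed transfer is put back in todo for the next cycle, and that each semaphore holds a source path and a dCache destination separated by a space. The `transfer` argument is a hypothetical stand-in for the real SRM copy (including opening the grid proxy) and should return True on success.

```python
import shutil
from pathlib import Path


def parse_semaphore(path):
    """Return (local_file, dcache_dir) from the first non-empty line."""
    for line in Path(path).read_text().splitlines():
        if line.strip():
            local_file, dcache_dir = line.split()
            return local_file, dcache_dir
    raise ValueError(f"empty semaphore: {path}")


def run_ferry_cycle(ferry_root, transfer):
    """Process every semaphore currently in todo; return the names completed."""
    sem = Path(ferry_root) / "semaphore"
    todo, doing, done = sem / "todo", sem / "doing", sem / "done"
    for directory in (todo, doing, done):
        directory.mkdir(parents=True, exist_ok=True)

    completed = []
    for entry in sorted(todo.iterdir()):
        # Skip hidden or partially written files and anything that is not a file.
        if entry.name.startswith(".") or not entry.is_file():
            continue

        # Claim the semaphore so an overlapping cycle does not pick it up again.
        claimed = doing / entry.name
        shutil.move(str(entry), str(claimed))

        local_file, dcache_dir = parse_semaphore(claimed)
        if transfer(local_file, dcache_dir):
            # Delete the local copy only after a successful transfer,
            # which keeps the local disk under quota.
            Path(local_file).unlink()
            shutil.move(str(claimed), str(done / entry.name))
            completed.append(entry.name)
        else:
            # Assumed retry policy: return the semaphore to todo.
            shutil.move(str(claimed), str(todo / entry.name))
    return completed
```

A crontab entry would then call `run_ferry_cycle` once per throttled interval.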
6 Raising a Semaphore
- Once output has been sent to the dcache_ferry/transfer directory, a semaphore is made in semaphore/todo such that:
- 1) The file has a unique name (usually based on JobID, run number, etc.)
- 2) The file contains a line specifying the full path of the file to be transferred and the destination directory in dCache, separated by a space.
- Ex:
- /phenix/u/mccumber/dcache_ferry/transfer/exampleOutput.root /pnfs/rcf.bnl.gov/user/mccumber/someDirectory
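The two rules above can be sketched as a small helper. This is an illustration under stated assumptions, not the production job code: the function name, the `.sem` suffix, and the write-then-rename step (added so a ferry scan never reads a half-written file) are not from the slides.

```python
import os
import tempfile
from pathlib import Path


def raise_semaphore(ferry_root, output_file, dcache_dir, tag):
    """Write a semaphore into <ferry_root>/semaphore/todo and return its path.

    `tag` should be unique per job, for example JobID plus run number.
    """
    todo = Path(ferry_root) / "semaphore" / "todo"
    todo.mkdir(parents=True, exist_ok=True)

    final = todo / f"{tag}.sem"  # ".sem" suffix is an assumption
    if final.exists():
        raise FileExistsError(f"semaphore name is not unique: {final}")

    # Write to a hidden temporary file, then rename it into place, so the
    # semaphore appears in todo only once its single line is complete.
    fd, tmp = tempfile.mkstemp(dir=todo, prefix=".tmp-")
    with os.fdopen(fd, "w") as handle:
        # One line: full local path, a space, then the dCache destination.
        handle.write(f"{output_file} {dcache_dir}\n")
    os.replace(tmp, final)
    return final
```

For the example on this slide, the call would be `raise_semaphore("/phenix/u/mccumber/dcache_ferry", "/phenix/u/mccumber/dcache_ferry/transfer/exampleOutput.root", "/pnfs/rcf.bnl.gov/user/mccumber/someDirectory", tag)` with a job-specific `tag`.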
7 Current Available Policies
- Usage: phenix-grid-submit -p <Policy Name> <jobXML>
- Policies:
- Grid_Policy (contains all queues)
- Sim_Policy (general queues and simulation queues)
- BNL_Only_Policy
- SBC_Only_Policy
- RCAS_Only_Policy
- RCRS_Only_Policy
8 Current Development Issue
The old design makes pointless and cumbersome grid transfers.
Old Structure: Exodus -> RCF, PISA -> RCF, Reco -> dCache
New Structure: Exodus, PISA, Reco -> dCache
Still solving internal conflicts.
9 Other Current/Future Spearheads
- Diagnose and solve upgrade issues
- Updating changes to PJS in CVS and Grid documentation
- Incorporate into grid: clusters at Vanderbilt (positive response), UNM and CCJ (died in limbo, waiting for requested access)
- Solve remote-transfer grid-proxy management problem (Done)
- Add option of non-dCache return of output (Currently not feasible)
- Need automated grid-monitoring software? Could be done via dynamic production of globalConfig.xml based on open pathways? Ferry Captain?
- Simplify, simplify, simplify