Title: Putting Existing Farms on the Testbed
1. Putting Existing Farms on the Testbed
- Manchester DZero/Atlas and BaBar farms are available via the Testbed.
- Done with a handful of modifications to the Testbed site and to the existing farms.
- This talk describes what we did and how you can do it too...
Andrew McNab - Manchester HEP - 17 September 2002
2. Farms at Manchester HEP
Farm            Nodes   CPU clock
BaBar           80      0.8 GHz
GridFarm        16      1.0 GHz
DZero / Atlas   60      1.5 GHz
3. The problem
- We want to make existing farms available on the Testbed.
- But we don't want to massively reconfigure/reinstall farms:
  - they're in production, so they need to be kept stable
  - they are already configured the way their owners need
- We might want to keep reinstalling as EDG software is updated:
  - this is labour intensive unless we install from scratch with LCFG install
  - we don't want to have to make many manual changes to the CE etc. every time we install/upgrade
- The solution that has been mentioned several times is to have a standard EDG Testbed Site as a front end to the Existing Farm.
- So we want to find the minimal set of changes to Farm and Testbed Site that will put the Farm on the Testbed.
4. Standard Testbed Site
- All elements installed from LCFG server.
- Computing Element (CE) shares /home directories by NFS.
- Storage Element (SE) shares /flatfiles with data by NFS.
- PBS Server on CE talks to PBS on Worker Nodes (WN).
[Diagram: LCFG server; CE running the PBS Server and sharing /home; SE sharing /flatfiles; three WNs, each a PBS Node.]
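As a rough sketch of the two NFS shares in this layout, the helper below prints the lines the CE and the SE might carry in their /etc/exports files. It assumes Linux-style exports syntax; the host names and the rw,sync options are hypothetical placeholders, not values from the LCFG configuration.

```python
"""Sketch: /etc/exports lines for the NFS shares of a standard Testbed site.

Hypothetical: the host names and the "rw,sync" options are placeholders,
not values taken from the EDG/LCFG configuration.
"""


def exports_line(path: str, clients: list, options: str = "rw,sync") -> str:
    """Return one Linux /etc/exports line: <path> <host>(<options>) ..."""
    entries = " ".join(f"{host}({options})" for host in clients)
    return f"{path} {entries}"


if __name__ == "__main__":
    # Hypothetical Worker Node names; a farm's PBS Nodes could be appended later
    worker_nodes = ["wn01.example.org", "wn02.example.org", "wn03.example.org"]
    print(exports_line("/home", worker_nodes))       # exported by the CE
    print(exports_line("/flatfiles", worker_nodes))  # exported by the SE
```

Keeping the client list as a plain Python list makes it easy to see that adding an existing farm's PBS Nodes is just a matter of extending the same list.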
5. What we want
[Diagram: a Grid Farm / Testbed Site (LCFG; CE with PBS Server and /home; SE with /flatfiles; WNs as PBS Nodes) alongside the BaBar or DZero/Atlas Farm (its own PBS Server and PBS Nodes); the CE submits to the farm's PBS Server with qsub.]
6. Reconfigure Existing Farm
- PBS Server must allow access from CE, but only for the right users.
  - Add CE to list of valid job submission clients (e.g. in hosts.equiv).
  - Create special queue (bfq or dfq) for Testbed jobs.
  - Limit queues so the desired pool of accounts (e.g. atlas001 etc.) can submit jobs to the bfq/dfq but other queues/pools are forbidden.
- PBS Nodes need access to pool accounts, home directories on CE, and /flatfiles area on SE.
  - If already using NFS automount, then it is easy to add /home on CE and /flatfiles on SE (e.g. as /nfs/gf-home and /nfs/gf-flatfiles).
  - Add pool accounts to /etc/passwd (or NIS).
  - Make symbolic links in /home to the automounted CE /home directories.
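The farm-side steps above could be scripted roughly as follows. This is a sketch, not the recipe used at Manchester: it assumes OpenPBS/Torque-style qmgr queue attributes (acl_user_enable, acl_users), and the pool prefix, UID/GID numbers and paths are hypothetical. The hosts.equiv change is just the CE's host name on its own line, so it is not shown. Keeping the pool accounts out of the farm's other queues needs matching ACLs on those queues, which this sketch does not generate.

```python
"""Sketch of the farm-side changes: Testbed queue, pool accounts, /home links.

Assumptions (hypothetical, adapt locally): OpenPBS/Torque-style qmgr queue
attributes; pool account prefix, count, UID/GID numbers and paths.
"""
import os


def pool_accounts(prefix: str, count: int) -> list:
    """Pool account names such as atlas001, atlas002, ..."""
    return [f"{prefix}{n:03d}" for n in range(1, count + 1)]


def qmgr_commands(queue: str, users: list) -> list:
    """qmgr input for a Testbed-only execution queue open to the given users."""
    return [
        f"create queue {queue} queue_type=execution",
        f"set queue {queue} acl_user_enable = true",  # reject users not listed
        f"set queue {queue} acl_users = {','.join(users)}",
        f"set queue {queue} enabled = true",
        f"set queue {queue} started = true",
    ]


def passwd_lines(users: list, first_uid: int, gid: int) -> list:
    """/etc/passwd entries (name:x:uid:gid:gecos:home:shell) for the pool."""
    return [
        f"{user}:x:{first_uid + i}:{gid}:Testbed pool account:/home/{user}:/bin/bash"
        for i, user in enumerate(users)
    ]


def link_pool_homes(users: list, link_dir: str, target_root: str = "/nfs/gf-home") -> None:
    """Create link_dir/<user> -> target_root/<user> (the CE's /home via automount)."""
    for user in users:
        link = os.path.join(link_dir, user)
        if not os.path.islink(link):  # leave existing links alone
            os.symlink(os.path.join(target_root, user), link)


if __name__ == "__main__":
    pool = pool_accounts("atlas", 3)
    # Dry run: print what would be fed to qmgr and appended to /etc/passwd
    print("\n".join(qmgr_commands("dfq", pool)))
    print("\n".join(passwd_lines(pool, first_uid=20001, gid=2000)))
```

The script only prints the qmgr and passwd text by default; link_pool_homes is the one step that touches the filesystem, so it is left for the administrator to call explicitly on each PBS Node.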
7. Software on PBS Nodes
- For current EDG job submissions to work, you need to install the globus-url-copy RPMs on PBS Nodes.
- PBS Nodes currently need to make outgoing gridftp connections to the Resource Broker.
  - GridFTP is possible with NAT, but difficult.
- Other middleware RPMs will be needed if you also intend to manipulate the SE and RC during jobs.
- For use with the EDG Testbed, you should also install the relevant application RPMs.
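Before adding a node to the Testbed queue, the two requirements above (globus-url-copy installed, outgoing GridFTP possible) could be checked with a sketch like this. It assumes 2811 is the GridFTP control port in use, which is conventional but should be confirmed locally; a plain TCP connect does not exercise the separate GridFTP data connections.

```python
"""Sketch: check that a PBS Node has what EDG job submission currently needs.

Assumption: 2811 is the conventional GridFTP control port; check locally.
"""
import shutil
import socket

GRIDFTP_PORT = 2811  # conventional GridFTP control port (assumption)


def have_globus_url_copy() -> bool:
    """True if globus-url-copy (from the Globus RPMs) is on the PATH."""
    return shutil.which("globus-url-copy") is not None


def can_connect(host: str, port: int = GRIDFTP_PORT, timeout: float = 5.0) -> bool:
    """True if an outgoing TCP connection to host:port succeeds.

    This only tests the control connection; GridFTP also opens separate
    data connections, which can complicate NAT setups.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False
```

On a node, one might call can_connect("rb.example.org") (a hypothetical RB name) together with have_globus_url_copy(), and only add the node to the Testbed queue if both return True.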
8. Changes to Testbed Site
- Have attempted to minimise changes:
  - easier to document and support
  - easier to maintain as EDG software changes
- Basic philosophy: modify EDG scripts to make remote qsub and qstat calls to PBS Server machines on the farms.
- Only need to edit 3 scripts on the CE:
  - /opt/globus/libexec/globus-script-pbs-queue
  - /opt/edg/info/mds/sbin/skel/ce-globus.skel
  - /opt/edg/info/mds/bin/ce-pbs
- Create grid-mapfile and ce-static.ldif for each queue.
- Include farm queue and PBS nodes in LCFG site-cfg.h.
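One way to express the remote qsub and qstat calls is PBS's "queue@server" destination syntax and "job_id@server" job identifiers. The slides do not say whether the modified scripts use this or a remote shell, so treat this as an assumption. The sketch also assumes the usual OpenPBS/Torque "qstat -f" layout when reading job states back; server names are hypothetical.

```python
"""Sketch of remote PBS calls from the CE and parsing of the job states.

Assumptions: PBS "queue@server" destinations and "job_id@server" job
identifiers; the usual OpenPBS/Torque "qstat -f" output layout.
"""


def remote_qsub_argv(script: str, queue: str, server: str) -> list:
    """qsub command line sending script to queue on another PBS Server."""
    return ["qsub", "-q", f"{queue}@{server}", script]


def remote_qstat_argv(job_id: str, server: str) -> list:
    """qstat -f command line asking a remote PBS Server about one job."""
    return ["qstat", "-f", f"{job_id}@{server}"]


def job_states(qstat_f_output: str) -> dict:
    """Map each "Job Id:" block in qstat -f output to its job_state value."""
    states, current = {}, None
    for line in qstat_f_output.splitlines():
        if line.startswith("Job Id:"):
            current = line.split(":", 1)[1].strip()
        elif current is not None and line.strip().startswith("job_state ="):
            states[current] = line.split("=", 1)[1].strip()
    return states
```

The argument lists would be passed to subprocess.run(..., capture_output=True, text=True) on the CE, for example with a hypothetical farm server name such as babar-pbs.example.org; keeping them as lists avoids shell quoting problems.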
9. New behaviour
- Modified ce-pbs queries PBS Server using remote qstat.
- Publishes edited grid-mapfile listing only the right users.
- Jobs can be submitted using Resource Broker, based on published information.
- When received by CE, globus-script-pbs-queue submits job to remote PBS Server.
- EDG Globus jobmanager on CE monitors job status via remote qstat and transmits it to Logging as normal.
- Job runs on PBS Node with access to pool account /home.
- Job completes and returns files to RB via gridftp.
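The "edited grid-mapfile listing only the right users" could be derived from a full grid-mapfile with a filter like the one below. This is a sketch assuming the usual layout ("quoted DN" followed by a local account) and the EDG convention that a leading "." names a pool of accounts (".atlas" for atlas001 etc.). The DNs in the example are made up.

```python
"""Sketch: derive a per-queue grid-mapfile that lists only the right users.

Assumptions: the usual grid-mapfile layout ("<quoted DN>" <account>[,<account>]),
with a leading "." naming a pool (".atlas" -> atlas001, atlas002, ...).
"""
import shlex


def filter_gridmap(lines, allowed):
    """Keep entries whose local account or pool name is in `allowed`."""
    kept = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # blank lines and comments
        fields = shlex.split(text)  # handles the quoted DN, which contains spaces
        if len(fields) < 2:
            continue
        accounts = [a.lstrip(".") for a in fields[1].split(",")]
        if any(a in allowed for a in accounts):
            kept.append(text)
    return kept


if __name__ == "__main__":
    gridmap = [
        '"/C=UK/O=eScience/OU=Manchester/L=HEP/CN=example user one" .atlas',
        '"/C=UK/O=eScience/OU=Manchester/L=HEP/CN=example user two" .babar',
    ]
    print("\n".join(filter_gridmap(gridmap, {"atlas"})))
```

Running the same filter once per queue (e.g. {"atlas"} for dfq, {"babar"} for bfq) gives the separate per-queue lists that the information system can publish.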
10. Example logs
- Three jobmanagers visible to GridPP MDS and RB:
  - gf18.hep.man.ac.uk:2119/jobmanager-pbs-gfq (Grid Farm/Testbed)
  - gf18.hep.man.ac.uk:2119/jobmanager-pbs-dfq (DZero/Atlas farm)
  - gf18.hep.man.ac.uk:2119/jobmanager-pbs-bfq (BaBar farm)
- Different operating system, grid-mapfile lists of users etc. for each queue.
- Can submit job to RB and have it matchmake the requirements,
  - including dynamic properties like free nodes.
- Example log shows submitting a job from UI at RAL via RB at IC, which decides which farm at Manchester matches and sends the job there.
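All three contact strings share one CE gatekeeper (gf18 on port 2119) and differ only in the queue, which is what selects the farm. The sketch below splits them into parts. It assumes every contact follows the "<host>:<port>/jobmanager-<batch system>-<queue>" pattern of these three examples; the queue-to-farm labels are copied from this slide.

```python
"""Sketch: split jobmanager contact strings into their parts.

Assumption: contacts follow the "<host>:<port>/jobmanager-<lrms>-<queue>"
pattern of the three examples; the farm labels come from the slide.
"""
import re

CONTACT = re.compile(
    r"^(?P<host>[^:/]+):(?P<port>\d+)/jobmanager-(?P<lrms>[a-z]+)-(?P<queue>\w+)$"
)
FARMS = {"gfq": "Grid Farm/Testbed", "dfq": "DZero/Atlas farm", "bfq": "BaBar farm"}


def parse_contact(contact: str) -> dict:
    """Return host, port, batch system, queue and farm for one contact string."""
    match = CONTACT.match(contact)
    if match is None:
        raise ValueError(f"unrecognised contact string: {contact!r}")
    parts = match.groupdict()
    parts["port"] = int(parts["port"])
    parts["farm"] = FARMS.get(parts["queue"], "unknown")
    return parts


if __name__ == "__main__":
    for queue in ("gfq", "dfq", "bfq"):
        info = parse_contact(f"gf18.hep.man.ac.uk:2119/jobmanager-pbs-{queue}")
        print(info["queue"], "->", info["farm"])
```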
11. Applying this to other sites
- This recipe is being written up for http://www.gridpp.ac.uk/tb-support/
- With the current EDG release, the PBS Nodes need outgoing direct internet access (not NAT).
- You need to be able to make minor changes to PBS Server permissions, NFS mounts etc. as described.
- You should have some (3?) dedicated Testbed machines, or add the farm to an existing GridPP/EDG Testbed setup.
  - We use Microdirect.co.uk 1.5GHz/256MB/40GB boxes at £250 each.
- If you don't use an EDG-supported batch system (PBS etc.), you need to modify ce-pbs and the globus-script-pbs-* scripts to use your job submission commands.
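The kind of substitution involved in porting ce-pbs and the globus-script-pbs-* scripts to another batch system can be pictured as a command table. This is illustrative only: the choice of SGE and LSF as examples is an assumption, real scripts must also parse each system's output and map its job states, and LSF users often pass the job script to bsub on stdin rather than as an argument.

```python
"""Sketch: a table of batch-system commands for adapting the PBS scripts.

Illustrative only: output parsing and job-state mapping are not shown.
"""

COMMANDS = {
    "pbs": {"submit": ["qsub"], "status": ["qstat"], "cancel": ["qdel"]},
    "sge": {"submit": ["qsub"], "status": ["qstat", "-j"], "cancel": ["qdel"]},
    "lsf": {"submit": ["bsub"], "status": ["bjobs"], "cancel": ["bkill"]},
}


def submit_argv(system: str, queue: str, script: str) -> list:
    """Submit `script` to `queue`; all three systems take the queue with -q."""
    return COMMANDS[system]["submit"] + ["-q", queue, script]


def status_argv(system: str, job_id: str) -> list:
    """Ask the batch system about one job."""
    return COMMANDS[system]["status"] + [job_id]


def cancel_argv(system: str, job_id: str) -> list:
    """Remove one job from the batch system."""
    return COMMANDS[system]["cancel"] + [job_id]
```

Keeping the site-specific commands in one table means the rest of the submission and monitoring logic need not change when the batch system does.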
12. Summary
- It's not at all difficult to access existing PBS farms via an EDG Testbed site:
  - include CE and SE in NFS and PBS configuration of farm
  - include pool accounts in farm's passwd file
  - enforce security by account pools
- Only need to modify a handful of files on the Testbed CE.
- Should be relatively straightforward to apply this to other batch queue systems even if you don't use PBS.
- We've demonstrated putting our 150 1 GHz nodes on the current Testbed and submitting jobs via GridPP RB.
- You can too.