Title: Oxford PP Computing Site Report
1Oxford PP Computing Site Report
- HEPSYSMAN
- 28th April 2003
- Pete Gronbech
2General Strategy
- Approx 200 Windows 2000 desktop PCs with Exceed, used to access central Linux systems
- Digital Unix and VMS phased out for general use
- Red Hat Linux 7.3 is becoming the standard
3Network Access
[Network diagram: connection to Super Janet 4 at 2.4Gb/s. Elements shown: OUCS Firewall, Campus Backbone Router, Backbone Edge Routers, Physics Firewall, Physics Backbone Router and departments (depts), joined by links labelled 100Mb/s and 1Gb/s.]
4Physics Backbone Upgrade to Gigabit Autumn 2002
[Network diagram. Elements shown: Linux Server, Win 2k Server, Server Gb/s switch, Physics Firewall, Physics Backbone Router, Particle Physics, Clarendon Lab, Astro, Atmos, Theory and desktops, joined by links labelled 1Gb/s and 100Mb/s.]
5Autumn 2002
[Systems diagram.
Groups: CDF, General Purpose Systems, minos DAQ, cresst DAQ, Grid Development, Atlas DAQ.
Hosts: pplx1, pplx2, pplx3 (SNO), morpheus, pplxfs1, pplxgen, ppminos1, ppminos2, ppnt117 (HARP), ppcresst1, ppcresst2, grid, pplxbatch, pptb01, pptb02, tblcfg, tbse01, tbce01, sam testing, edg ui, ppatlas1, atlassbc.
Operating system labels: Fermi7.3.1, RH7.3, RH7.1, RH6.2. One link is labelled 1Gb/s.]
6General Purpose Systems
[Systems diagram. Hosts: pplx2, pplxfs1, pplxgen. Operating system labels: RH7.3 (twice), RH6.2. One link is labelled 1Gb/s.]
7Zero-D X-3i SCSI-IDE RAID: 12 x 160GB Maxtor Drives
This proved to be a disaster and was rejected in favour of bare SCSI disks, which we mounted internally in our rack-mounted file server.
Supplied by Compusys.
8The Linux File Server pplxfs1: 8 x 146GB SCSI disks
9General Purpose Linux Server pplxgen
pplxgen is a dual 2.2GHz Pentium 4 Xeon based
system with 2GB RAM, running Red Hat 7.3. It
was brought online at the end of August 2002 to
share the load with pplx2 as users migrated off
al1 (the Digital Unix server).
10PP batch farm running Red Hat 7.3 with OpenPBS
can be seen below pplxgen. This service became
fully operational in Feb 2003.
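As an illustration of how work might be handed to an OpenPBS farm like this one, the sketch below builds a minimal PBS job script as a string. The job name, queue name (`short`), command and walltime are hypothetical and not taken from the slides; only the `#PBS` directive syntax and the `PBS_O_WORKDIR` variable are standard PBS features.

```python
def pbs_script(name: str, command: str, queue: str,
               nodes: int = 1, walltime: str = "01:00:00") -> str:
    """Return the text of a minimal PBS job script.

    All argument values are illustrative; the real farm's queue names
    and limits are not given in the slides.
    """
    lines = [
        "#!/bin/sh",
        f"#PBS -N {name}",               # job name
        f"#PBS -q {queue}",              # destination queue (hypothetical name)
        f"#PBS -l nodes={nodes}",        # number of nodes requested
        f"#PBS -l walltime={walltime}",  # wall-clock limit
        "#PBS -j oe",                    # merge stderr into stdout
        "cd $PBS_O_WORKDIR",             # start in the directory qsub was run from
        command,
    ]
    return "\n".join(lines) + "\n"


script = pbs_script("mc_test", "./run_mc.sh", queue="short")
print(script)
```

The resulting text would be saved to a file and submitted with `qsub`.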
11FEBRUARY 2003
[Systems diagram.
Groups: CDF, LHCB MC, Grid Development.
Hosts: pplx1 (new), morpheus, matrix, cdfsam, node1, node9, grid, pplxbatch, tbgen01, pptb01, pptb02, tblcfg, tbse01, tbce01, tbwn01, tbwn02, edg ui, sam testing.
Operating system labels: Fermi7.3.1, RH7.3, RH7.1, RH6.2, RH6.1. One link is labelled 1Gb/s.]
12Grid development systems, including the EDG software
testbed setup.
13New Linux Systems
- Morpheus is an IBM x370 8 way SMP 700MHz Xeon with 4GB RAM and 1TB of Fibre Channel disks
- Installed August 2001
- Purchased as part of a JIF grant for the CDF group
- Runs Red Hat 7.1
- Will use CDF software developed at Fermilab and here to process data from the CDF experiment
14Tape Backup is provided by a Qualstar
TLS4480 tape robot with 80 slots and dual Sony
AIT3 drives. Each tape can hold 100GB of data.
Installed January 2002. NetVault software from
BakBone is used, running on morpheus, for backup
of both CDF and particle physics systems.
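The slot count and per-tape figure above imply an upper bound on what the robot can hold at once. A small sketch of that arithmetic, assuming every one of the 80 slots holds a data tape (no slot reserved for a cleaning cartridge), no compression, and decimal units:

```python
def library_capacity_gb(slots: int, gb_per_tape: int) -> int:
    """Native capacity of a tape library with a data tape in every slot."""
    return slots * gb_per_tape


SLOTS = 80         # from the slide
GB_PER_TAPE = 100  # from the slide (AIT3)

total_gb = library_capacity_gb(SLOTS, GB_PER_TAPE)
print(f"{total_gb} GB = {total_gb / 1000:.1f} TB")  # prints "8000 GB = 8.0 TB"
```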
15Second round of CDF JIF tender: Dell Cluster -
MATRIX
10 dual 2.4GHz P4 Xeon servers running Fermi
Linux 7.3.1 and SCALI cluster software. Installed
December 2002.
16Approx 7.5 TB of SCSI RAID 5 disks are attached
to the master node. Each shelf holds 14 x 146GB
disks. These are shared via NFS with the worker
nodes. OpenPBS batch queuing software is used.
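The quoted total can be sanity-checked against the shelf size. The sketch below assumes each shelf is a single RAID 5 set, which costs one disk's worth of space for parity, and assumes four shelves. The slides state neither the number of shelves nor the RAID layout, so both are guesses chosen only because they land close to the quoted figure.

```python
def raid5_usable_gb(disks: int, disk_gb: int, hot_spares: int = 0) -> int:
    """Usable capacity of one RAID 5 set: one member disk's worth goes to parity."""
    members = disks - hot_spares
    if members < 3:
        raise ValueError("RAID 5 needs at least three member disks")
    return (members - 1) * disk_gb


DISKS_PER_SHELF = 14  # from the slide
DISK_GB = 146         # from the slide
ASSUMED_SHELVES = 4   # assumption: the shelf count is not stated in the slides

per_shelf_gb = raid5_usable_gb(DISKS_PER_SHELF, DISK_GB)
total_gb = ASSUMED_SHELVES * per_shelf_gb
print(per_shelf_gb, total_gb)  # prints "1898 7592"
```

Under these assumptions the total is about 7.6 TB. Reserving one hot spare per shelf would lower it to about 7.0 TB.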
17Plenty of space in the second rack for expansion
of the cluster.
18Lhcb Monte Carlo Setup
Compute Node: 8 way 700MHz Xeon Server (RH6.2, OpenAFS, OpenPBS)
Grid Gateway: grid (RH6.2, Globus 1.1.3, OpenAFS, OpenPBS)
The 8 way SMP has now been reloaded as an MS
Windows Terminal Server, and LHCb MC jobs will be
run on the new PP farm.
19Problems
- IDE RAID proved to be unreliable and caused lots of down time.
- Problems with NAT: using iptables caused NFS problems and hangs. Solved by dropping NAT and using real IP addresses for the PP farm.
- Trouble with ext3 journal errors.
- Hackers
20Problems
- Lack of manpower!
- Number of operating systems slowly reducing: Digital Unix and VMS very nearly gone, NT4 also practically eliminated.
- Getting closer to standardising on RH 7.3, especially as the EDG software is now heading that way.
- Still finding it very hard to support laptops, but we now have a standard clone and recommend IBM laptops.
- Would be good to have more time to concentrate on security. (See later talk)