1
Oxford PP Computing Site Report
  • HEPSYSMAN
  • 28th April 2003
  • Pete Gronbech

2
General Strategy
  • Approx 200 Windows 2000 Desktop PCs with Exceed
    used to access central Linux systems
  • Digital Unix and VMS phased out for general use.
  • Red Hat Linux 7.3 is becoming the standard

3
Network Access
[Network diagram: Super Janet 4 connection at 2.4Gb/s; OUCS Firewall; Campus Backbone Router; Backbone Edge Routers on 1Gb/s links serving departments at 100Mb/s; Physics Firewall and Physics Backbone Router connected at 100Mb/s.]
4
Physics Backbone Upgrade to Gigabit Autumn 2002
[Diagram: Physics backbone after the Gigabit upgrade. Linux and Win 2k servers on a Gb/s server switch with 1Gb/s links; Physics Backbone Router linking the Physics Firewall, Particle Physics, the Clarendon Lab, Astro, Atmos and Theory over a mixture of 1Gb/s and 100Mb/s links; desktops connected at 100Mb/s.]
5
Autumn 2002
[Diagram: particle physics Linux systems, Autumn 2002, on a 1Gb/s backbone. CDF: morpheus (Fermi 7.3.1). General purpose: pplx1, pplx2, pplxfs1, pplxgen. Experiment systems: ppminos1, ppminos2 (MINOS DAQ), pplx3 (SNO), ppnt117 (HARP), ppcresst1, ppcresst2 (CRESST DAQ), ppatlas1, atlassbc (Atlas DAQ). Grid development: grid, pplxbatch, pptb01, pptb02, tblcfg, tbse01, tbce01, plus SAM testing and an EDG UI. Operating systems range over RH 6.2, RH 7.1, RH 7.3 and Fermi 7.3.1.]
6
General Purpose Systems
[Photo/diagram: general purpose systems pplx2, pplxfs1 and pplxgen (RH 6.2 and RH 7.3) on a 1Gb/s connection.]
7
Zero-D X-3i SCSI-IDE RAID with 12 x 160GB Maxtor drives
This proved to be a disaster and was rejected in favour of bare SCSI disks, which we mounted internally in our rack-mounted file server.
Supplied by Compusys.
8
The Linux File Server pplxfs1 with 8 x 146GB SCSI disks
9
General Purpose Linux Server pplxgen
pplxgen is a dual 2.2GHz Pentium 4 Xeon based system with 2GB of RAM, running Red Hat 7.3. It was brought on line at the end of August 2002 to share the load with pplx2 as users migrated off al1 (the Digital Unix server).
10
The PP batch farm, running Red Hat 7.3 with OpenPBS, can be seen below pplxgen. This service became fully operational in February 2003.
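As an illustration of how work reaches such a farm, the sketch below (not part of the original slides) builds a minimal OpenPBS job script and submits it with qsub; the queue name, job name and executable are hypothetical placeholders rather than the farm's real settings.

```python
import subprocess

# Minimal OpenPBS job script. The queue "pp", the job name and the
# executable are hypothetical examples, not the Oxford farm's settings.
JOB_SCRIPT = """#!/bin/sh
#PBS -N example-job
#PBS -q pp
#PBS -l nodes=1
cd $PBS_O_WORKDIR
./run_analysis
"""

def submit(script: str) -> str:
    """Submit a job script via qsub (which reads it from stdin)
    and return the job identifier that qsub prints."""
    result = subprocess.run(
        ["qsub"], input=script, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print("Submitted job:", submit(JOB_SCRIPT))
```

A user would then typically follow the job's progress with qstat.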
11
FEBRUARY 2003
[Diagram: CDF and grid systems, February 2003, on a 1Gb/s backbone. CDF: pplx1 (new), morpheus, the matrix cluster (node1 ... node9, Fermi 7.3.1) and cdfsam. LHCb MC: grid and pplxbatch. Grid development (mainly RH 6.2): pptb01, pptb02, tblcfg, tbse01, tbce01, tbwn01, tbwn02 and tbgen01, plus SAM testing and an EDG UI.]
12
Grid development systems, including the EDG software testbed setup.
13
New Linux Systems
Morpheus is an IBM x370 8-way SMP with 700MHz Xeon processors, 4GB RAM and 1TB of Fibre Channel disk, installed August 2001. It was purchased as part of a JIF grant for the CDF group and runs Red Hat 7.1. It will use CDF software developed at Fermilab and here to process data from the CDF experiment.
14
Tape backup is provided by a Qualstar TLS-4480 tape robot with 80 slots and dual Sony AIT-3 drives; each tape can hold 100GB of data. Installed January 2002. NetVault software from BakBone, running on morpheus, is used for backup of both CDF and particle physics systems.
15
Second round of the CDF JIF tender: Dell Cluster - MATRIX
10 dual 2.4GHz P4 Xeon servers running Fermi Linux 7.3.1 and SCALI cluster software. Installed December 2002.
16
Approx 7.5TB of SCSI RAID 5 disk is attached to the master node; each shelf holds 14 x 146GB disks. These are shared via NFS with the worker nodes. OpenPBS batch queuing software is used.
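As a rough illustration of this layout, the sketch below (not from the slides) shows the kind of check a worker-node job might perform before reading data from the NFS-mounted area exported by the master node; the mount point is a hypothetical placeholder.

```python
import os
import sys

# Hypothetical mount point for the master node's RAID array as seen on a
# worker node; the real path used on matrix is not given in the slides.
DATA_DIR = "/data/cdf"

def main() -> None:
    # Fail early if the shared area has not been NFS-mounted on this node.
    if not os.path.ismount(DATA_DIR):
        sys.exit(f"{DATA_DIR} is not mounted on this worker node")

    # List the input files the batch job would go on to process.
    for name in sorted(os.listdir(DATA_DIR)):
        path = os.path.join(DATA_DIR, name)
        print(f"{path}: {os.path.getsize(path)} bytes")

if __name__ == "__main__":
    main()
```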
17
Plenty of space in the second rack for expansion
of the cluster.
18
LHCb Monte Carlo Setup
[Diagram: Compute Node - an 8-way 700MHz Xeon server running RH 6.2, OpenAFS and OpenPBS; Grid Gateway - grid, running RH 6.2, Globus 1.1.3, OpenAFS and OpenPBS.]
The 8-way SMP has now been reloaded as an MS Windows Terminal Server, and LHCb MC jobs will be run on the new PP farm.
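For context on how such a gateway accepts work, here is a minimal sketch (not from the slides) of submitting a one-off command through a Globus gatekeeper to the local PBS job manager; the gatekeeper hostname is a hypothetical placeholder, and the exact client command available depends on the installed Globus Toolkit release.

```python
import subprocess

# Hypothetical gatekeeper contact string; the "jobmanager-pbs" service
# hands submitted work to the OpenPBS queue behind the gateway.
CONTACT = "grid.example.ac.uk/jobmanager-pbs"

# globus-job-run is the simple GRAM client for one-off commands;
# here it just runs hostname on whichever node PBS assigns.
result = subprocess.run(
    ["globus-job-run", CONTACT, "/bin/hostname"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())
```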
19
Problems
  • IDE RAID proved to be unreliable and caused lots
    of downtime.
  • Problems with NAT (using iptables caused NFS
    problems and hangs). Solved by dropping NAT and
    using real IP addresses for the PP farm.
  • Trouble with ext3 journal errors.
  • Hackers

20
Problems
  • Lack of Manpower!
  • Number of operating systems slowly reducing:
    Digital Unix and VMS very nearly gone, and NT4
    also practically eliminated.
  • Getting closer to standardising on RH 7.3,
    especially as the EDG software is now heading
    that way.
  • Still finding it very hard to support laptops, but
    we now have a standard clone and recommend IBM
    laptops.
  • Would be good to have more time to concentrate on
    security. (See later talk)