Title: A Multidisciplinary Computer Centre - is it possible?
1. A Multidisciplinary Computer Centre - is it possible?
- John Gordon
- CCLRC eSC
- CHEP, March 2003
2. The Problem?
- A UK colleague, quoted a few years ago when Linux for physics was just becoming common:
- "We have four Linux systems: one for users to log in, one for CERN Linux, one for DESY Linux, one for Fermilab Linux. And I think we will need one for BaBar Linux soon."
- Things have changed, but by how much?
- Many of the talks in this session describe implementing a solution for one experiment, but the staff requirements of this solution scale with the number of experiments supported, and the fragmentation of resources is inefficient.
- Can we run a single centre for everyone?
3. LHC Hierarchical Model
lcg.web.cern.ch/lcg
4. Sitting in the Centre
- Running US Experiments
- Future LHC Experiments
- A site like ours sits between many experiments and grids
5. The multi-experiment centre
- So what does a big centre look like these days?
- A big Linux cluster and lots of disk?
- Many types of hardware
- All flavours of Unix (still VMS!!)
- All uses from desktop to supercomputer
- Different disks (SCSI, IDE, RAID, SAN)
- Different tapes
- Different user communities
6. The multi-experiment centre
- Unlikely to be able to run a centre for all disciplines if we cannot even run one for all HEP experiments
- This talk focuses on the problems of supporting many different HEP experiments
7. Not a problem
- Lots of hardware problems, but the same ones as at big and small centres
- Lots of anecdotes about hardware problems, but sharing between experiments hasn't been an issue recently.
- Apart from Suns for BaBar
- and we backed away from AMD once because an experiment wouldn't accept them.
8. The problems
- Software levels
- experts
- Local rules
- Security
- Firewalls
- The accelerator centres
9. Software Levels
- Experiment A must upgrade the OS (or compiler, etc.); Experiment B cannot.
- Linux brings more hardware dependencies
- Experiment A needs one kernel, but the Fibre Channel driver is only available in another
- Now we have middleware too!!
- Experiments can disagree over middleware and OS.
- And the middleware might not match the OS
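
The clash described on this slide can be sketched as a set intersection: a shared cluster is only possible if at least one OS release is acceptable to every experiment and supported by every middleware stack. This is a minimal illustration, not a tool used at any site; the function name, experiment names and release labels are all hypothetical.

```python
def common_platforms(experiment_os, middleware_os):
    """Return the OS releases acceptable to every experiment and every middleware stack.

    experiment_os: dict mapping experiment name -> set of OS releases it can run on
    middleware_os: dict mapping middleware name -> set of OS releases it supports
    """
    requirement_sets = list(experiment_os.values()) + list(middleware_os.values())
    if not requirement_sets:
        return set()
    return set.intersection(*requirement_sets)

# Hypothetical requirements, invented for illustration only.
experiments = {
    "ExperimentA": {"os-7.3"},            # must upgrade
    "ExperimentB": {"os-6.2", "os-7.2"},  # cannot upgrade yet
}
middleware = {"grid-mw": {"os-6.2"}}      # middleware lags behind the OS

shared = common_platforms(experiments, middleware)
print(shared or "no common platform: the cluster has to be split")
```

An empty result is exactly the fragmentation the talk complains about: each extra incompatible requirement forces another separately managed partition.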
10. experts
- A 200 GB disk costs 100 in Best Buy
- Therefore 100 TB should cost 50K
- If you pay more, you are profligate and are wasting HEP funds!!!
- and you should probably be able to negotiate a further discount for bulk purchase!
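
The arithmetic behind this argument can be reproduced in a few lines. This is a minimal sketch, assuming decimal units (1 TB = 1000 GB) and leaving the currency unspecified, as on the slide; the function name is illustrative only.

```python
def naive_storage_cost(target_tb, disk_gb=200, disk_price=100):
    """Price of enough retail disks to reach target_tb, counting raw disks only."""
    # Ceiling division: number of whole disks needed, with 1 TB = 1000 GB.
    disks_needed = -(-target_tb * 1000 // disk_gb)
    return disks_needed, disks_needed * disk_price

disks, cost = naive_storage_cost(100)
print(f"{disks} disks, total {cost}")
```

The estimate counts raw disks only. Servers, controllers, redundancy, power and the staff effort of running a service are not in it, which is where the argument breaks down.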
11. Local Rules
- A responsible site probably has a policy for who can use its resources, with forms, acceptable use conditions and other safeguards.
- Most countries have legal obligations to trace users in case of law-breaking.
- Do we really want them to throw these away for the grid?
- Even if we want to, only a purely HEP lab can overrule the rules themselves
- Even they usually have masters (DoE)
12. Security - Why Do We Care?
- Illegal use of resources (stolen software, child pornography ...)
- Base for high bandwidth attack on other targets (commercial, government ...)
- Unauthorised access to local data (data protection, financial info)
- Health and safety, e.g. beam-line control
- Destruction of local data, disruption of local service
- Gain passwords, keys to attack peer sites
13. Security
- Most security issues are common to all sites
- Issues especially relevant here are
- Accelerator Centres (see earlier)
- Distributed computing crosses security boundaries
- Authentication models, trust
- Remote users are less attached to your integrity
- Shared usernames: how can you trace?
- Software often under active development
- Smaller user community and far fewer developers than (e.g.) Apache
14. Why Do We Need a Firewall?
- You do not need a firewall if
- Either you have perfect (bug free) operating systems and you have infallible system administrators AND users
- Or you don't care if you have security incidents (unauthorised access to resources)
15. How Do Hackers Break In?
- Coding errors in server software
- Buffer overflows give more than expected (poor bounds checking)
- Provide unexpected control info (e.g. append unexpected commands)
- Trojans and viruses: backdoors
- Inadequate access control, e.g.
- NFS export of root filesystem R/W to world
- https server allows googlebot access to control menus: file delete, really delete!!!
- Scanning rate: hundreds per minute
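
The "append unexpected commands" item can be illustrated with a small sketch. It assumes a hypothetical service that lists a file named by the user; the function names and the use of `ls` are illustrative only, and nothing is executed here.

```python
import re

# Allow-list of characters accepted in a filename (an assumption for this sketch).
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")

def unsafe_command(filename):
    """Builds a shell string by pasting in user input: open to appended commands."""
    return "ls -l " + filename

def safe_command(filename):
    """Validates the input and returns an argument list, so no shell ever parses it."""
    if not SAFE_NAME.fullmatch(filename) or filename.startswith("-"):
        raise ValueError(f"rejected suspicious filename: {filename!r}")
    return ["ls", "-l", "--", filename]

hostile = "data.txt; rm -rf ~"
print(unsafe_command(hostile))   # a shell would run the appended command as well
try:
    safe_command(hostile)
except ValueError as err:
    print(err)
```

An argument list of this shape can be handed to `subprocess.run` without `shell=True`, so the input is never interpreted as shell syntax.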
16. Common Firewall Policies
- Don't bother! Very unlikely disasters!
- Simple exclusion of some protocols, e.g. prevent SNMP off site.
- Only allow some protocols
- e.g. only allow kerberised or encrypted protocols.
- Protected host ranges
- e.g. keep some hosts/networks safe
- Protect large ranges of ports
- e.g. privileged port range.
- Access control by host/port
- Different sites probably different policy!
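
Several of the policies above can be combined in an ordered rule table. The sketch below is a toy model, not the configuration of any real firewall: the rule table, the addresses (taken from the reserved documentation ranges) and the permissive default are assumptions, and the TCP/UDP distinction is ignored for brevity.

```python
import ipaddress

# Hypothetical rule table, checked top to bottom; the first match wins.
# Each rule: (action, source network, (lowest port, highest port)).
RULES = [
    ("deny",  "0.0.0.0/0",    (161, 162)),   # simple exclusion: SNMP never crosses the boundary
    ("allow", "192.0.2.0/24", (1, 65535)),   # access control by host: a trusted peer network
    ("allow", "0.0.0.0/0",    (22, 22)),     # only an encrypted protocol (ssh) from elsewhere
    ("deny",  "0.0.0.0/0",    (1, 1023)),    # protect the rest of the privileged port range
]

def decide(source_ip, port, rules=RULES, default="allow"):
    """Return 'allow' or 'deny' for an inbound connection."""
    address = ipaddress.ip_address(source_ip)
    for action, network, (low, high) in rules:
        if address in ipaddress.ip_network(network) and low <= port <= high:
            return action
    return default

for source, port in [("198.51.100.7", 161), ("192.0.2.10", 80),
                     ("198.51.100.7", 22), ("198.51.100.7", 80),
                     ("198.51.100.7", 8080)]:
    print(source, port, decide(source, port))
```

Because rule order and the default action both change the outcome, two sites with similar-looking tables can still behave differently, which is the difficulty for software that has to run at many sites.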
17. The accelerator centres
- You will run
- Our Linux
- Our software
- Our middleware
- Our applications
- Our security model
- Don't bother us with your local restrictions or firewalls
- Oh, and by the way, you'll give us root access to your machines to install it and sort out any problems
18. The Answers
- so far
- I hope I can learn more this week
19. Software levels
- Will never get hardware vendors to remove dependence on OS
- Lobby middleware developers to be OS independent
- and to keep up reasonably quickly with latest releases
- Experiment developers should code to support multiple versions of everything
- Don't rush to use new features
20. experts
- Ignore
- Politely tell them to go away
- Explain the realities of 24x365 use
- Ask them to demonstrate their solutions
- And be prepared to accept if they are correct
- Evaluate the most likely of their suggestions
21. Local Rules (BaBar/RAL example)
- RAL is a Tier A centre for BaBar
- BaBar users have already signed up to conditions for SLAC, BaBar, Objectivity
- They get an X.509 certificate
- Sign EDG acceptable use conditions
- Users are made aware of RAL-specific issues
- network traffic might be monitored
- RAL is happy that they know who the users are and can trace them.
- They are allowed to run as grid users
22. Local Rules
- Use other sites as examples
- Common acceptable use policies
- The more sites involved in writing them, the more likely they are to be acceptable
- Get ACs to act as legal entity for a VO
- Need to trust the integrity of the VO
- Local admins feel better if they can sue someone
- Don't tell them they have no chance of suing CERN
23. Security
- Educate users through their sysadmins. Make them aware of the risks and responsibilities
- PKI and the Grid offer roles and groups, so someone can act as production simulation manager but still be identifiable.
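
The idea of acting in a shared role while staying individually traceable can be sketched as follows. This is a plain Python illustration, not the API of any grid middleware; the class, function, subject name and role name are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GridIdentity:
    subject: str       # certificate subject (DN) of the individual
    roles: frozenset   # roles the VO has granted to this individual

def run_as(identity, role, action, audit_log):
    """Perform an action under a shared role while logging the personal identity."""
    if role not in identity.roles:
        raise PermissionError(f"{identity.subject} does not hold role {role!r}")
    # The audit record keeps the person, not just the role, so the action is traceable.
    audit_log.append((identity.subject, role, action))
    return f"{action} executed as {role}"

# Hypothetical user and role, invented for illustration.
alice = GridIdentity("/C=XX/O=ExampleLab/CN=Alice Example",
                     frozenset({"production-manager"}))
log = []
print(run_as(alice, "production-manager", "submit simulation batch", log))
print(log)
```

Unlike a shared username, the role here is an attribute attached to a personal identity, so the question "how can you trace?" has an answer in the audit record.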
24. Firewalls
- One can often persuade the local network admin to make an exception once.
- But not many times
- Establish the trust of your network admin
- Convince them that you take security seriously.
- Less likely to achieve this if your machines are regularly broken into.
- Experiment and middleware developers need to address firewall issues in their design
- Security Group of LCG might help here.
25. The accelerator centres
- They are not used to being questioned.
- Put them face to face to resolve clashes
- HEPiX is a good forum for this. Successes so far:
- AFS, profiles
- Large Cluster Workshop
- Surveys on firewalls support.
- But the grid has been a step back
- Different centres, different grids.
26. The accelerator centres
- This problem works against experiments' interests.
- Experiments should take more control over their software environments, take their own compilers and libraries with them.
- Lobby for standard distributions
- and use them
27. Summary
- It is possible to take the first steps towards a truly multidisciplinary computer centre
- Starting with HEP
- Labs and experiments need to talk and adopt new/common practices
- Need a culture of collaboration in many dimensions
- Lab-lab, experiment-experiment, and experiment-labs
- Don't forget that your experiment/software/middleware is not the only one and some poor is having to cope with them all.