Title: Tier-1 Status
1 Tier-1 Status
- Andrew Sansum
- GRIDPP20
- 12 March 2008
2 Tier-1 Capacity delivered to WLCG (2007)
[Chart: Tier-1 capacity delivered to WLCG in 2007, with RAL's contribution highlighted]
3 Tier-1 CPU Share by 2007 MoU
4 Wall Time
5 CPU Use by VO (2007)
[Chart: CPU use by VO in 2007 - ATLAS, ALICE, CMS, LHCb]
6 Experiment Shares (2008)
7 Grid Only
- Non-Grid access to the Tier-1 has now ended. Only special cases (contact us if you believe you are one) still have access to:
  - UIs
  - Job submission
- Until end of May 2008:
  - IDs will be maintained (disabled)
  - Home directories will be maintained online
  - Mail forwarding will be maintained
- After end of May 2008:
  - IDs will be deleted
  - Home filesystem will be backed up
  - Mail spool will be backed up
  - Mail forwarding will stop
- AFS service continues for BaBar (and just in case)
8 Reliability
- Feb: mainly due to the power failure, plus 8 hours of network downtime
- Jan/Dec: mainly CASTOR problems over the Christmas period (despite multiple callouts)
- Out-of-hours on-call will help, but some problems take time to diagnose/fix
9 Power Failure: Thursday 7th February, 13:00
- Work on the power supply since December
  - Down to 1 transformer (from 2) for extended periods (weeks); increased risk of disaster
  - Single transformer running at maximum operating load
  - No problems until the work finished and the casing was closed: a control line was crushed and the power supply tripped
- Total loss of power to the whole building
  - First power interruption for over 3 years
- Restart (effort > 200 FTE hours)
  - Most global/national/Tier-1 core systems up by Thursday evening
  - Most of CASTOR and part of batch up by Friday
  - Remaining batch on Saturday
  - Still problems to iron out in CASTOR on Monday/Tuesday
- Lessons
  - Communication was prompt and sufficient, but ad hoc
  - Broadcast unavailable as RAL runs the GOCDB (now fixed by caching)
  - Careful restart of disk servers was slow and labour intensive (it worked, but will not scale)
- See http://www.gridpp.rl.ac.uk/blog/2008/02/18/review-of-the-recent-power-failure/
10 Hardware: Disk
- Production capacity: 138 servers, 2800 drives, 850TB (usable)
- 1.6PB of capacity delivered in January by Viglen:
  - 91 Supermicro 3U servers with dual AMD 2220E (2.8GHz) dual-core CPUs, 8GB RAM, IPMI
    - 1 x 3ware 9650 4-port PCIe RAID controller with 2 x 250GB WD HDD
    - 1 x 3ware 9650 16-port PCIe RAID controller with 14 x 750GB WD HDD
  - 91 Supermicro 3U servers with dual Intel E5310 (1.6GHz) quad-core CPUs, 8GB RAM, IPMI
    - 1 x 3ware 9650 4-port PCIe RAID controller with 2 x 400GB Seagate HDD
    - 1 x 3ware 9650 16-port PCIe RAID controller with 14 x 750GB Seagate HDD
- Acceptance tests running; scheduled to be available by the end of March
- 5400 spinning drives after the planned phase-out in April (expect a drive failure every 3 days; see the rough estimate below)
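As a rough sanity check of that failure-every-3-days figure (a sketch only: the ~2.5% annualised drive failure rate used here is an assumed value, not a measured Tier-1 number):

```python
# Rough check of "one drive failure every ~3 days" for 5400 spinning drives.
# The annualised failure rate (AFR) below is an assumption for illustration only.
drives = 5400
afr = 0.025                                    # assumed: ~2.5% of drives fail per year
failures_per_year = drives * afr               # ~135 failures per year
days_between_failures = 365 / failures_per_year
print(f"~{failures_per_year:.0f} failures/year, one every ~{days_between_failures:.1f} days")
# -> ~135 failures/year, one every ~2.7 days
```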
11 Hardware: CPU
- Production: about 1500 KSI2K on 600 systems
- Recently upgraded about 50% of capacity to 2GB/core
- Recent procurement (approximately 3000 KSI2K, but YMMV) delivered and under test:
  - Streamline: 57 x 1U servers (114 systems, 3 racks), each system with
    - dual Intel E5410 (2.33GHz) quad-core CPUs
    - 2GB/core, 1 x 500GB HDD
  - Clustervision: 56 x 1U servers (112 systems, 4 racks), each system with
    - dual Intel E5440 (2.83GHz) quad-core CPUs
    - 2GB/core, 1 x 500GB HDD
12 Hardware: Tape
- Tape drives
  - 8 x 9940B drives
    - Used on the legacy ADS/dCache services; phase-out soon
  - 18 T10K tape drives and associated servers delivered; 15 in production, remainder soon
    - Planned bandwidth: 50MB/s per drive
    - Actual bandwidth: 8-80MB/s (a work in progress)
- Media
  - Approximately 2PB on site
13 Hardware: Network
[Diagram: RAL site and Tier-1 network. Tier-1 CPU and disk switch stacks (Nortel 5510/5530) connect to a Force10 C300 8-slot router (64 x 10Gb); 10Gb/s links run to the OPN router (10Gb/s to CERN) with a firewall bypass, and to the Site Access Router (10Gb/s to SJ5). Also shown: RAL Tier 2 (N x 1Gb/s), ADS caches, Oracle systems, and a 1Gb/s test link to Lancaster]
14 RAL links
[Diagram: RAL network links, marked as implemented, to be implemented soon, or never]
15 Backplane Failures (Supermicro)
- 3 servers burnt out a backplane
  - 2 of which set off VESDA
  - 1 called out the fire brigade
- Safety risk assessment: urgent rectification needed
- Good response from supplier/manufacturer
  - PCB fault in a bad batch
  - Replacement nearly complete
16 Machine Rooms
- Existing machine room
  - Approximately 100 racks of equipment
  - Getting close to power/cooling capacity
- New machine room
  - Work still proceeding close to schedule
  - 800 m2 can accommodate 300 racks and 5 robots
  - 2.3MW power/cooling capacity (some UPS)
  - Scheduled to be available for September 2008
17 CASTOR Memory Lane
[Timeline figure, 4Q05 to 1Q08 ("Happy days!"), with annotations including:]
- CASTOR1 tests OK
- CASTOR2 core running; hard to install dependencies
- CMS on CASTOR for CSA06. Encouraging. Declare production service.
- 2.1.2 bad
- 2.1.3 good but missing functionality
- ATLAS on CASTOR? LHCb on CASTOR?
- Problems with functionality and performance: it doesn't work!
- OC committees note improvement but are concerned
- Service stopped for extended upgrade
- 2.1.4 upgrade goes well. Disk1 support!
- CSA07 encouraging
- CCRC08 reasonably successful
18 Growth in Use of CASTOR
19 Test Architecture
[Diagram: CASTOR test architecture - Development, Preproduction and Certification Testbed instances, each built from shared services (Oracle NS/vmgr), Oracle stager/DLF/repack databases, stager, DLF, repack and LSF components, a tape server, and 1 diskserver (variable)]
20 CASTOR Production Architecture
[Diagram: production CASTOR layout - shared services (Oracle NS/vmgr, Name Server 2) and tape servers, plus four stager instances (CMS, ATLAS, LHCb, and Repack and Small User), each with its own Oracle stager and DLF databases, stager, DLF and LSF components, and diskservers]
21 ATLAS Data Flow Model
[Diagram: ATLAS data flow between the T0, T1s (including a partner T1) and T2s, showing RAW, ESD, AODm and TAG datasets (ESD1/AODm1, ESD2/AODm2, AODm2/TAG), plus simulated RAW (simRaw), T0 RAW and stripping (StripInput/simStrip) flows]
22 CMS Dataflow
[Diagram: CMS data flow through three disk pools, all disk0tape1 - WanIn (8 LSF slots per server, fed from T0/T1/T2), WanOut (16 LSF slots per server, serving T1/T2) and FarmRead (50 LSF slots per server, serving the batch farm) - with disk-to-disk copies between pools and tape recall]
23 CMS Disk Server Tuning (CSA06/CSA07)
- Problem: network performance too low (network fixes sketched after this list)
  - Increase default/maximum TCP window size
  - Increase TCP ring buffers and tx queue
  - Ext3 journal changed to data=writeback
- Problem: performance still too low
  - Reduce number of gridftp slots per server
  - Reduce number of streams per file
- Problem: PhEDEx transfers now time out
  - Reduce FTS slots to match disk pools
- Problem: servers sticky or crash with OOM
  - Limit total TCP buffer space
  - Protect low memory
  - Aggressive cache flushing
- See http://www.gridpp.ac.uk/wiki/RAL_Tier1_Disk_Server_Tuning
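To make the first group of changes concrete, TCP window/buffer sizes and the transmit queue are normally adjusted via kernel sysctls and the interface queue length. The sketch below shows the general shape only; the parameter values and the eth0 interface name are assumptions, not the settings documented on the RAL tuning wiki.

```python
# Illustrative sketch only: the kind of TCP window/buffer and tx-queue tuning
# described above. Values are assumptions, not the RAL production settings.
import subprocess

TCP_SYSCTLS = {
    "net.core.rmem_max": "16777216",             # max socket receive buffer (bytes)
    "net.core.wmem_max": "16777216",             # max socket send buffer (bytes)
    "net.ipv4.tcp_rmem": "4096 87380 16777216",  # min/default/max TCP receive window
    "net.ipv4.tcp_wmem": "4096 65536 16777216",  # min/default/max TCP send window
    "net.core.netdev_max_backlog": "30000",      # larger ingress packet backlog
}

def apply_sysctls(settings):
    """Apply each setting with 'sysctl -w key=value' (requires root)."""
    for key, value in settings.items():
        subprocess.check_call(["sysctl", "-w", f"{key}={value}"])

if __name__ == "__main__":
    apply_sysctls(TCP_SYSCTLS)
    # Longer transmit queue on the data interface (interface name is an assumption):
    subprocess.check_call(["ip", "link", "set", "dev", "eth0", "txqueuelen", "10000"])
    # The ext3 change is a mount option, e.g.: mount -o remount,data=writeback <filesystem>
```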
24 3Ware Write Throughput
25 CCRC08 Disk Server Tuning
- Migration rate to tape very bad (5-10MB/s) when concurrent with writing data to disk
  - Was OK in CSA06 (50MB/s per server) on the Areca servers
- 3Ware 9550 performance terrible under concurrent read/write (2MB/s read, 120MB/s write)
  - 3Ware appears to prioritise writes
- Tried many tweaks, most with little success, except (both options sketched after this list):
  - Either changing the elevator to anticipatory
    - Downside: write throughput reduced
    - Good under benchmarking; testing in production this week
  - Or increasing block device read-ahead
    - Read throughput high but erratic under test
    - But seems OK in production (30MB/s per server)
- See http://www.gridpp.rl.ac.uk/blog/2008/02/29/3ware-raid-controllers-and-tape-migration-rates/
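For illustration, the two mitigations above correspond to standard Linux knobs on 2.6-era kernels: the per-device I/O scheduler in sysfs and the block-device read-ahead set via blockdev. A minimal sketch, assuming a device name and a read-ahead size that are not the RAL values:

```python
# Minimal sketch of the two mitigations discussed above (2.6-era kernels).
# The device name and read-ahead size are assumptions, not RAL production values.
import subprocess

def set_io_scheduler(device, scheduler="anticipatory"):
    """Switch the I/O elevator for one block device, e.g. cfq -> anticipatory."""
    with open(f"/sys/block/{device}/queue/scheduler", "w") as f:
        f.write(scheduler)

def set_readahead(device, sectors=8192):
    """Raise the block-device read-ahead (in 512-byte sectors), here ~4MB."""
    subprocess.check_call(["blockdev", "--setra", str(sectors), f"/dev/{device}"])

if __name__ == "__main__":
    set_io_scheduler("sda", "anticipatory")   # option 1: anticipatory elevator
    set_readahead("sda", 8192)                # option 2: larger read-ahead
```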
26 CCRC (CMS WanIn)
[Charts: CMS WanIn pool during CCRC08 - network in/out, PhEDEx rate, migration queue and CPU load, with a 300MB/s marker and the Tier-0 rate shown]
27 CCRC (WanOut)
[Charts: WanOut pool during CCRC08 - network in/out, PhEDEx rate and CPU load, with a 300MB/s marker, before and after replication]
28 CASTOR Plans for May CCRC08
- Still problems
  - Optimising end-to-end transfer performance remains a balancing act
  - Hard to manage the complex configuration
- Working on
  - ALICE/xrootd deployment
  - Preparation for the 2.1.6 upgrade
  - Installation of Oracle RACs (resilient Oracle services for CASTOR)
  - Provisioning and configuration management
29 dCache Closure
- Agreed with the UB that we would give 6 months' notice before terminating the dCache service
- dCache closure announced to the UB as May 2008
- ATLAS and LHCb working to migrate their data
  - Migration slower than hoped
- Service much reduced in size now (10-12 servers remain) and operational overhead much lower
- Migration of the remaining non-LHC experiments delayed by the low priority of non-CCRC work
  - Work on the Gen instance of CASTOR will recommence shortly
- Pragmatically, closure may be delayed by several months until Minos and the tiny VOs have migrated
30 Termination of GridPP Use of ADS Service
- GridPP funding and use of the old legacy Atlas Datastore service scheduled to end at the end of March 2008
  - No GridPP access via the tape command after this
  - Also no access via the C-callable VTP interface
- RAL will continue to operate the ADS service and experiments are free to purchase capacity directly from the Datastore team
- Pragmatically, closure cannot happen until
  - dCache ends (it uses the ADS back end)
  - CASTOR is available for small VOs
  - Probably 6 months away
31 Conclusions
- Hardware for the 2008 MoU is in the machine room and moving satisfactorily through acceptance
  - Volume not yet a problem, but warning signs are starting to appear
- CASTOR situation continues to improve
  - Reliable during CCRC08
  - Hardware performance improving; the tape migration problem is reasonably understood and partly solved, with scope for further improvement
  - Progressing various upgrades
- Remaining Tier-1 infrastructure essentially problem free
- Availability fair but stagnating; need to progress:
  - Incident response staff
  - On-call
  - Disaster planning and national/global/cluster resilience
- Concerned that we have still not seen all experiment use cases