NIKHEF Data Processing Facility - PowerPoint PPT Presentation

Slides: 16
Provided by: david2676

1
NIKHEF Data Processing Facility
  • Status overview as of 2004.10.27
  • David Groep, NIKHEF

2
A historical view
  • Started in 2000 with a dedicated farm for DØ
  • 50 Dual P3-800 MHz
  • tower model Dell Precision 220
  • 800 GByte 3ware disk array

3
Many different farms
  • 2001: EU DataGrid WP6 Application test bed
  • 2002: addition of the development test bed
  • 2003: LCG-1 production facility
  • April 2004: amalgamation of all nodes into LCG-2
  • September 2004: addition of
    • EGEE PPS
    • VL-E P4 CTB
    • EGEE JRA1 LTB

4
Growth of resources
  • Intel Pentium III, 800 MHz: 100 CPUs (2000)
  • Intel Pentium III, 933 MHz: 40 CPUs (2001)
  • AMD Athlon MP2000, 2 GHz: 132 CPUs (2002)
  • Intel XEON, 2.8 GHz: 54 CPUs (2003)
  • Intel XEON, 2.8 GHz: 20 CPUs (2003)
  • Total WN resources (raw): 353 THz·hr/mo (200 kSI2k)
  • Total on-line disk cache: 7 TByte
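
The clock-rate accounting behind a figure in THz·hr/mo can be sketched as follows. This is a minimal illustration, not the method used for the slide: the batch list is copied from the bullets above, while the function names and the 730-hour month are assumptions. The nominal sum need not reproduce the 353 THz·hr/mo quoted on the slide, whose derivation is not given.

```python
# (label, nominal clock in GHz, CPU count), copied from the slide.
CPU_BATCHES = [
    ("Intel Pentium III", 0.800, 100),
    ("Intel Pentium III", 0.933, 40),
    ("AMD Athlon MP2000", 2.0, 132),
    ("Intel XEON", 2.8, 54),
    ("Intel XEON", 2.8, 20),
]


def total_cpus(batches):
    """Number of CPUs across all batches."""
    return sum(count for _, _, count in batches)


def aggregate_ghz(batches):
    """Sum of nominal clock rates over all CPUs, in GHz."""
    return sum(ghz * count for _, ghz, count in batches)


def thz_hours_per_month(ghz, hours_per_month=730.0):
    """Convert an aggregate clock rate (GHz) into THz-hours per month.

    hours_per_month is an assumption (average calendar month).
    """
    return ghz * hours_per_month / 1000.0


def implied_aggregate_ghz(thz_hours, hours_per_month=730.0):
    """Inverse conversion: THz-hours per month back to an aggregate GHz."""
    return thz_hours * 1000.0 / hours_per_month


if __name__ == "__main__":
    ghz = aggregate_ghz(CPU_BATCHES)
    print(f"CPUs: {total_cpus(CPU_BATCHES)}")
    print(f"Nominal aggregate clock: {ghz:.2f} GHz")
    print(f"Nominal capacity: {thz_hours_per_month(ghz):.1f} THz-hours/month")
```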

5
Node types
  • 2U "pizza boxes": PIII 933 MHz, 1 GByte RAM, 43 GByte disk
  • 1U GFRC (NCF): AMD MP2000, 1 GByte RAM, 60 GByte disk; thermodynamic challenges
  • 1U Halloween: XEON 2.8 GHz, 2 GByte RAM, 80 GByte disk; first GigE nodes

6
Connecting things together
  • Collapsed backbone strategy
  • Foundry Networks BigIron 15000
  • 14 GigE SX, 2x GigE LX
  • 16 1000BaseTX
  • 48 100BaseTX
  • Service nodes directly GigE connected
  • Farms connected via local switches
  • WN oversubscription typically 1:5 to 1:7
  • Dynamic re-assignment of nodes to facilities
  • DHCP Relay
  • built-in NAT support (for worker nodes)
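
For farms connected via local switches, the oversubscription of a switch uplink is the ratio of the combined worker-node link bandwidth to the uplink bandwidth. The sketch below shows that arithmetic; the function name and the example switch (48 nodes on 100 Mb/s links behind a single 1 Gb/s uplink) are hypothetical and not taken from the slides.

```python
def oversubscription(node_count, node_link_mbps, uplink_mbps):
    """Ratio of total worker-node bandwidth to uplink bandwidth.

    A result of 5.0 means the nodes could together offer five times
    more traffic than the uplink can carry.
    """
    if uplink_mbps <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return (node_count * node_link_mbps) / uplink_mbps


if __name__ == "__main__":
    # Hypothetical local switch: 48 Fast Ethernet nodes, one GigE uplink.
    ratio = oversubscription(node_count=48, node_link_mbps=100, uplink_mbps=1000)
    print(f"oversubscription: {ratio:.1f} to 1")
```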

7
NIKHEF Farm Network
8
Network Uplinks
  • NIKHEF links
  • 1 Gb/s IPv4 and 1 Gb/s IPv6 via SURFnet
  • 2 Gb/s WTCW (to SARA)
  • SURFnet links

9
NDPF Usage
  • Analyzed production batch logs since May 2002
  • total of 1.94 PHz·hours provided in 306 000 jobs

[Usage chart annotations: "Added Halloween", "LHC Data Challenges", "Added NCF GFRC"; experimental use and tests not shown]

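
The two totals on this slide (1.94 PHz·hours delivered in 306 000 jobs) imply an average amount of work per job. The sketch below does that division and, as an assumed illustration only, expresses the result as wall-clock hours on a single 2.8 GHz CPU; the function names are hypothetical.

```python
TOTAL_PHZ_HOURS = 1.94   # from the slide
TOTAL_JOBS = 306_000     # from the slide


def average_ghz_hours_per_job(total_phz_hours, jobs):
    """Average delivered work per job, in GHz-hours (1 PHz = 1e6 GHz)."""
    return total_phz_hours * 1e6 / jobs


def equivalent_hours(ghz_hours, cpu_ghz):
    """Hours a single CPU of the given clock rate needs for that work."""
    return ghz_hours / cpu_ghz


if __name__ == "__main__":
    avg = average_ghz_hours_per_job(TOTAL_PHZ_HOURS, TOTAL_JOBS)
    print(f"average per job: {avg:.2f} GHz-hours")
    # Assumption for illustration: a 2.8 GHz XEON worker node CPU.
    print(f"on one 2.8 GHz CPU: {equivalent_hours(avg, 2.8):.2f} hours")
```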
10
Usage per Virtual Organisation
Real-time web info: www.nikhef.nl/grid/ and www.dutchgrid.nl/Org/Nikhef/farmstats.html
  • Dzero acts as background fill
  • Usage doesn't (yet) reflect shares

11
Usage monitoring
  • Live viewgraphs
  • farm occupancy
  • per-VO distribution
  • network loads
  • Tools
  • Cricket (network)
  • home-grown scripts + rrdtool

12
Central services
  • VO-LDAP services LHC VOs
  • DutchGrid CA
  • edg-testbed-stuff
  • Torque and Maui distribution
  • installation support components

13
Some of the issues
  • Data access patterns in Grids
  • jobs tend to clutter CWD
  • high load when shared over NFS
  • shared homes required for traditional batch MPI
  • Garbage collection for foreign jobs
  • OpenPBS/Torque transient TMPDIR patch
  • Policy management
  • maui fair-share policies
  • CPU capping
  • max-queued-jobs capping
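
The idea behind a transient per-job TMPDIR (keep job clutter out of shared directories and collect the garbage when the job ends) can be sketched in a few lines. This is an illustration of the concept only, not the OpenPBS/Torque patch itself; the function name, the `job-` prefix, and the `scratch_root` parameter are hypothetical.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_in_transient_tmpdir(command, scratch_root=None):
    """Run `command` inside a fresh per-job directory.

    The directory is used both as the working directory and as TMPDIR,
    and is removed afterwards whether or not the job succeeded.
    Returns (exit code, path of the already-deleted job directory).
    """
    job_dir = tempfile.mkdtemp(prefix="job-", dir=scratch_root)
    env = dict(os.environ, TMPDIR=job_dir)
    try:
        result = subprocess.run(command, cwd=job_dir, env=env)
        return result.returncode, job_dir
    finally:
        # Garbage collection: nothing the job wrote here survives.
        shutil.rmtree(job_dir, ignore_errors=True)


if __name__ == "__main__":
    # A stand-in "job" that clutters its working directory.
    job = [sys.executable, "-c", "open('clutter.txt', 'w').write('x')"]
    code, path = run_in_transient_tmpdir(job)
    print(code, os.path.exists(path))
```

Placing the scratch directory on node-local disk rather than on an NFS-shared home would also avoid the shared-filesystem load mentioned above.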

14
Developments and work in progress
  • Parallel Virtual File Systems
  • From LCFGng to Quattor (Jeff)
  • Monitoring and disaster recovery (Davide)

15
Team