Title: Grid Computing for High Energy Physics in Japan
1. Grid Computing for High Energy Physics in Japan
- Hiroyuki Matsunaga
- International Center for Elementary Particle Physics (ICEPP), The University of Tokyo
- International Workshop on e-Science for Physics 2008
2. Major High Energy Physics Program in Japan
- KEK-B (Tsukuba)
- Belle
- J-PARC (Tokai)
- Japan Proton Accelerator Research Complex
- Operation will start within this year
- T2K (Tokai to Kamioka)
- long baseline neutrino experiment
- Kamioka
- SuperKamiokande
- KamLAND
- International collaboration
- CERN LHC (ATLAS, ALICE)
- Fermilab Tevatron (CDF)
- BNL RHIC (PHENIX)
3. Grid Related Activities
- ICEPP, University of Tokyo
- WLCG Tier2 site for ATLAS
- Regional Center for ATLAS-Japan group
- Hiroshima University
- WLCG Tier2 site for ALICE
- KEK
- Two EGEE production sites
- BELLE experiment, J-PARC, ILC
- University support
- NAREGI
- Grid deployment at universities
- Nagoya U. (Belle), Tsukuba U. (CDF)
- Network
4. Grid Deployment at the University of Tokyo
- ICEPP, University of Tokyo
- Involved in international HEP experiments since 1974
- Operated a pilot system since 2002
- Current computer system started working last year
- TOKYO-LCG2, gLite 3 installed
- CC-IN2P3 (Lyon, France) is the associated Tier 1 site within the ATLAS computing model
- Detector data from CERN go through CC-IN2P3
- Exceptionally far distance for a T1-T2 pair
- RTT 280 msec, 10 hops
- Challenge for efficient data transfer
- Data catalog for the files in Tokyo is located at Lyon
- ASGC (Taiwan) could be an additional associated Tier 1
- Geographically nearest Tier 1 (RTT 32 msec)
- Operations have been supported by ASGC
- Neighboring timezone
5. Hardware resources
- Tier-2 site plus (non-grid) regional center facility
- Supports local user analysis by the ATLAS Japan group
- Blade servers
- 650 nodes (2600 cores)
- Disk arrays
- 140 Boxes (6TB/box)
- 4Gb Fibre-Channel
- File servers
- Attach 5 disk arrays
- 10 GbE NIC
- Tape robot (LTO3)
- 8000 tapes, 32 drives
(Photos: tape robot, blade servers, disk arrays)
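For orientation, the raw capacities implied by these numbers can be summed up quickly; a back-of-the-envelope sketch in Python (the LTO3 native capacity of 400 GB per cartridge is an assumption, not a figure from the slide):

    # Raw capacity implied by the hardware figures above.
    disk_boxes = 140           # disk array boxes
    tb_per_box = 6             # TB per box
    tapes = 8000               # LTO3 cartridges
    tb_per_tape = 0.4          # assumed LTO3 native capacity (400 GB); not stated on the slide
    cores = 650 * 4            # 650 blade nodes, 4 cores each

    print(f"disk: {disk_boxes * tb_per_box} TB")          # 840 TB
    print(f"tape: {tapes * tb_per_tape / 1000:.1f} PB")   # 3.2 PB
    print(f"CPU : {cores} cores")                         # 2600 cores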
6. SINET3
- SINET3 (Japanese NREN)
- Third generation of SINET, since Apr. 2007
- Provided by NII (National Institute of Informatics)
- Backbone up to 40 Gbps
- Major universities connect at 1-10 Gbps
- 10 Gbps to Tokyo RC
- International links
- 2 x 10 Gbps to US
- 2 x 622 Mbps to Asia
7. International Link
- 10 Gbps between Tokyo and CC-IN2P3
- Route: SINET3 - GEANT - RENATER (French NREN)
- public network (shared with other traffic)
- 1Gbps link to ASGC (to be upgraded to 2.4 Gbps)
(Map: international links among Tokyo, New York, Lyon, and Taipei over SINET3, GEANT, and RENATER, each at 10 Gbps)
8. Network test with Iperf
- Memory-to-memory tests performed with the iperf program
- Linux boxes dedicated to the iperf test at both ends
- 1 Gbps limited by the NIC
- Linux kernel 2.6.9 (BIC TCP)
- Window size 8 Mbytes, 8 parallel streams
- For Lyon-Tokyo, long recovery time after packet loss due to the long RTT
(Plots: iperf throughput, Taipei <-> Tokyo (RTT 32 ms) and Lyon <-> Tokyo (RTT 280 ms))
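The window size and parallel streams can be put in perspective against the bandwidth-delay product of the Lyon path; a minimal sketch of that arithmetic (figures taken from the bullets above, idealized TCP with no loss):

    # TCP window vs. bandwidth-delay product on the Lyon-Tokyo path.
    link_bps = 1e9              # 1 Gbps, limited by the NIC
    rtt_s    = 0.280            # Lyon <-> Tokyo round-trip time
    window_B = 8 * 1024**2      # 8 Mbyte window per stream
    streams  = 8                # parallel iperf streams

    bdp_B = link_bps / 8 * rtt_s                         # bytes in flight needed to fill the link
    per_stream_bps = min(window_B / rtt_s * 8, link_bps)
    aggregate_bps  = min(streams * per_stream_bps, link_bps)

    print(f"bandwidth-delay product : {bdp_B / 1024**2:.0f} Mbytes")    # ~33 Mbytes
    print(f"one stream, 8 MB window : {per_stream_bps / 1e6:.0f} Mbps") # ~240 Mbps
    print(f"8 streams (NIC-limited) : {aggregate_bps / 1e6:.0f} Mbps")  # ~1000 Mbps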
9. Data Transfer from the Lyon Tier 1 center
- Data transferred from Lyon to Tokyo
- Used Storage Elements in production
- ATLAS MC simulation data
- Storage Elements
- Lyon: dCache (>30 gridFTP servers, Solaris, ZFS)
- Tokyo: DPM (6 gridFTP servers, Linux, XFS)
- FTS (File Transfer Service)
- Main tool for bulk data transfer
- Executes multiple file transfers concurrently (using gridFTP)
- Number of gridFTP streams per transfer can be set
- Used in ATLAS Distributed Data Management system
10. Performance of data transfer
Throughput per file transfer
- >500 Mbytes/s observed in May 2008
- File size: 3.5 Gbytes
- 20 files in parallel, 10 streams each
- 40 Mbytes/s for each file transfer
- Low activity at CC-IN2P3 during the period (other than ours)
(Plots: per-file throughput in Mbytes/s; aggregate throughput reached 500 Mbytes/s)
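The aggregate rate is essentially the product of the FTS concurrency and the per-file rate; a small sketch of that arithmetic using the figures above (the per-file rate is the observed value, not a guarantee):

    # Aggregate throughput implied by the FTS settings quoted above.
    concurrent_files = 20        # files transferred in parallel by FTS
    per_file_MBps    = 40        # observed per-file rate (10 gridFTP streams each)
    file_size_GB     = 3.5

    print(f"nominal aggregate : {concurrent_files * per_file_MBps} Mbytes/s")   # 800 Mbytes/s
    print(f"time per file     : {file_size_GB * 1000 / per_file_MBps:.0f} s")   # ~88 s
    # The sustained rate seen in practice (>500 Mbytes/s) is below the nominal
    # product, e.g. because transfers ramp up, finish, and wait for new files.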
11. Data transfer between ASGC and Tokyo
- Transferred 1000 files per test (1 Gbyte file size)
- Tried various numbers of concurrent files / streams, from 4/1 to 25/15
- Saturates the 1 Gbps WAN bandwidth
(Plots: throughput for Tokyo -> ASGC and ASGC -> Tokyo at various concurrent-files/streams settings: 4/1, 4/2, 4/4, 8/1, 8/2, 16/1, 20/10, 25/10, 25/15)
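For scale, one such 1000-file test moved about 1 TB; at a saturated 1 Gbps link this takes a bit over two hours, as the following idealized estimate shows (protocol overhead is ignored):

    # Idealized wall-clock time for one 1000-file test at the saturated WAN rate.
    files        = 1000
    file_size_GB = 1.0
    link_Gbps    = 1.0                         # saturated WAN bandwidth

    total_GB = files * file_size_GB
    seconds  = total_GB * 8 / link_Gbps        # GB * 8 bit/byte / (Gbit/s)
    print(f"volume : {total_GB:.0f} GB")
    print(f"time   : {seconds / 3600:.1f} hours at {link_Gbps} Gbps")   # ~2.2 hours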
12. CPU Usage in the last year (Sep 2007 - Aug 2008)
- 3,253,321 kSI2k-hours of CPU time in the last year
- Most jobs are ATLAS MC simulation
- Job submission is coordinated by CC-IN2P3 (the associated Tier 1)
- Outputs are uploaded to the data storage at CC-IN2P3
- Large contribution to the ATLAS MC production
(Charts: TOKYO-LCG2 CPU time per month; CPU time at large Tier 2 sites)
13. ALICE Tier 2 center at Hiroshima University
- WLCG/EGEE site
- JP-HIROSHIMA-WLCG
- Possible Tier 2 site for ALICE
14. Status at Hiroshima
- Just became EGEE production site
- Aug. 2008
- Associated Tier1 site will likely be CC-IN2P3
- No ALICE Tier1 in Asia-Pacific region
- Resources
- 568 CPU cores
- Dual-Core Xeon (3 GHz) x 2 CPUs x 38 boxes
- Quad-Core Xeon (2.6 GHz) x 2 CPUs x 32 boxes
- Quad-Core Xeon (3 GHz) x 2 CPUs x 20 blades
- Storage: 200 TB next year
- Network 1Gbps
- On SINET3
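The quoted 568 cores are just the sum over the three node types listed above; a quick check of that arithmetic:

    # Core count at Hiroshima, summed over the three node types on the slide.
    node_types = [
        (2, 2, 38),   # dual-core Xeon, 2 CPUs per box, 38 boxes
        (4, 2, 32),   # quad-core Xeon, 2 CPUs per box, 32 boxes
        (4, 2, 20),   # quad-core Xeon, 2 CPUs per blade, 20 blades
    ]
    total_cores = sum(cores * cpus * boxes for cores, cpus, boxes in node_types)
    print(total_cores)   # 568, matching the figure on the slide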
15. KEK
- Belle experiment has been running
- Needs access to the existing petabytes of data
- Site operations
- KEK does not support any LHC experiment
- Aims to gain experience by operating sites in order to prepare for a future Tier 1-level Grid center
- University support
- NAREGI
(Aerial photo: KEK Tsukuba campus, showing Mt. Tsukuba, the Belle experiment, KEKB, and the Linac)
16. Grid Deployment at KEK
- Two EGEE sites
- JP-KEK-CRC-1
- Rather experimental use and R&D
- JP-KEK-CRC-2
- More stable services
- NAREGI
- Used beta version for testing and evaluation
- Supported VOs
- belle (main target at present), ilc, calice,
- LCG VOs are not supported
- VOMS operation
- belle (registered in CIC)
- ppj (accelerator science in Japan), naokek
- g4med, apdg, atlasj, ail
17. Belle VO
- Federation established
- 5 countries, 7 institutes, 10 sites
- Nagoya Univ., Univ. of Melbourne, ASGC, NCU, CYFRONET, Korea Univ., KEK
- VOMS is provided by KEK
- Activities
- Submit MC production jobs
- Functional and performance tests
- Interface to existing peta-bytes of data
18. Takashi Sasaki (KEK)
19. ppj VO
- Federated among major universities and KEK
- Tohoku U. (ILC, KamLAND)
- U. Tsukuba (CDF)
- Nagoya U. (Belle, ATLAS)
- Kobe U. (ILC, ATLAS)
- Hiroshima IT (ATLAS, Computing Science)
- Common VO for accelerator science in Japan
- Does NOT depend on specific projects; resources are shared
- KEK acts as GOC
- Remote installation
- Monitoring
- Based on Nagios and Wiki
- Software update
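The slides do not show the checks themselves; as an illustration of Nagios-based monitoring of federated sites, here is a minimal plugin sketch in Python (the host and port are hypothetical; only the Nagios exit-code convention of 0 = OK and 2 = CRITICAL is standard):

    #!/usr/bin/env python
    # Minimal Nagios-style check: is a remote Grid service port reachable?
    # Host and port are hypothetical examples, not actual ppj VO endpoints.
    import socket
    import sys

    HOST, PORT = "ce01.example.ac.jp", 2119   # e.g. a Computing Element's gatekeeper port

    try:
        with socket.create_connection((HOST, PORT), timeout=10):
            print(f"OK - {HOST}:{PORT} is reachable")
            sys.exit(0)
    except OSError as err:
        print(f"CRITICAL - {HOST}:{PORT} unreachable ({err})")
        sys.exit(2)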
20. KEK Grid CA
- In operation since Jan. 2006
- Accredited as an IGTF (International Grid Trust Federation) compliant CA
(Chart: number of issued certificates)
21. NAREGI
- NAREGI: NAtional REsearch Grid Initiative
- Host institute: National Institute of Informatics (NII)
- R&D of Grid middleware for research and industrial applications
- Main targets are nanotechnology and biotechnology
- More focused on computing grid
- Data grid part integrated later
- Ver. 1.0 of middleware released in May, 2008
- Software maintenance and user support services will be continued
22. NAREGI at KEK
- NAREGI-b (beta) versions installed on the testbed
- 1.0.1: Jun. 2006 - Nov. 2006
- Manual installation for all the steps
- 1.0.2: Feb. 2007
- 2.0.0: Oct. 2007
- apt-rpm installation
- 2.0.1: Dec. 2007
- Site federation tests
- KEK - NAREGI/NII: Oct. 2007
- KEK - National Astronomical Observatory (NAO): Mar. 2008
- Evaluation of the NAREGI application environment
- job submission/retrieval, remote data stage-in/out
23. Takashi Sasaki (KEK)
24. Data Storage: Gfarm
- Gfarm distributed file system
- DataGrid part in NAREGI
- Data are stored in multiple disk servers
- Tests performed
- Stage-in and stage-out to the Gfarm storage
- GridFTP interface
- Between gLite site and NAREGI site
- File access from application
- Accessed via FUSE (Filesystem in Userspace)
- No need to change the application program
- I/O speed is several times slower than local disk
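Because the FUSE mount makes the Gfarm volume look like an ordinary directory, applications can keep using plain POSIX I/O; a minimal sketch (the mount point and file name are hypothetical):

    # Reading a file from a FUSE-mounted Gfarm volume with ordinary POSIX I/O.
    # The mount point and file name below are hypothetical examples.
    import time

    path = "/gfarm/belle/mc/sample.dat"   # a file on the Gfarm mount

    start = time.time()
    with open(path, "rb") as f:           # no Gfarm-specific API is needed
        data = f.read()
    elapsed = time.time() - start

    print(f"read {len(data) / 1024**2:.1f} MB in {elapsed:.2f} s")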
25. Future Plans for NAREGI at KEK
- Migration to the production version
- Test of interoperability with gLite
- Improve the middleware in the application domain
- Development of the new API to the application
- Virtualization of the middleware
- For script languages (to be used at the web portal as well)
- Monitoring
- Jobs, sites,
26. Summary
- WLCG
- ATLAS Tier2 at Tokyo
- Stable operation
- ALICE Tier2 at Hiroshima
- Just started operation in production
- Coordinated effort led by KEK
- Site operations with gLite and NAREGI middleware
- Belle VO: SRB
- Will be replaced with iRODS
- ppj VO deployment at universities
- Supported and monitored by KEK
- NAREGI
- R&D, interoperability