1
Grid Computing for High Energy Physics in Japan
  • Hiroyuki Matsunaga
  • International Center for Elementary Particle
    Physics (ICEPP),
  • The University of Tokyo
  • International Workshop on e-Science for Physics
    2008

2
Major High Energy Physics Program in Japan
  • KEK-B (Tsukuba)
  • Belle
  • J-PARC (Tokai)
  • Japan Proton Accelerator Research Complex
  • Operation will start within this year
  • T2K (Tokai to Kamioka)
  • long baseline neutrino experiment
  • Kamioka
  • SuperKamiokande
  • KamLAND
  • International collaboration
  • CERN LHC (ATLAS, ALICE)
  • Fermilab Tevatron (CDF)
  • BNL RHIC (PHENIX)

3
Grid Related Activities
  • ICEPP, University of Tokyo
  • WLCG Tier2 site for ATLAS
  • Regional Center for ATLAS-Japan group
  • Hiroshima University
  • WLCG Tier2 site for ALICE
  • KEK
  • Two EGEE production sites
  • BELLE experiment, J-PARC, ILC
  • University support
  • NAREGI
  • Grid deployment at universities
  • Nagoya U. (Belle), Tsukuba U. (CDF)
  • Network

4
Grid Deployment at University of Tokyo
  • ICEPP, University of Tokyo
  • Involved in international HEP experiments since
    1974
  • Operated a pilot system since 2002
  • The current computer system started working last year
  • TOKYO-LCG2 site, gLite 3 installed
  • CC-IN2P3 (Lyon, France) is the associated Tier 1
    site within ATLAS computing model
  • Detector data from CERN go through CC-IN2P3
  • Exceptionally long distance for a T1-T2 pair
  • RTT 280msec, 10 hops
  • Challenge for efficient data transfer
  • Data catalog for the files in Tokyo located at
    Lyon
  • ASGC (Taiwan) could be an additional associated
    Tier 1
  • Geographically nearest Tier 1 (RTT 32msec)
  • Operations have been supported by ASGC
  • Neighboring timezone

5
Hardware resources
  • Tier-2 site plus (non-grid) regional center
    facility
  • Support local user analysis by the ATLAS Japan group
  • Blade servers
  • 650 nodes (2600 cores)
  • Disk arrays
  • 140 Boxes (6TB/box)
  • 4Gb Fibre-Channel
  • File servers
  • 5 disk arrays attached to each
  • 10 GbE NIC
  • Tape robot (LTO3)
  • 8000 tapes, 32 drives

(Photos: tape robot, blade servers, disk arrays)
6
SINET3
  • SINET3 (Japanese NREN)
  • Third generation of SINET, since Apr. 2007
  • Provided by NII (National Institute of
    Informatics)
  • Backbone up to 40 Gbps
  • Major universities connect at 1-10 Gbps
  • 10 Gbps to Tokyo RC
  • International links
  • 2 x 10 Gbps to US
  • 2 x 622 Mbps to Asia

7
International Link
  • 10Gbps between Tokyo and CC-IN2P3
  • via SINET3, GEANT, and RENATER (French NREN)
  • Public network (shared with other traffic)
  • 1Gbps link to ASGC (to be upgraded to 2.4 Gbps)

(Route diagram: Tokyo - New York - Lyon over SINET3 (10 Gbps), GEANT (10 Gbps), and RENATER (10 Gbps); separate link Tokyo - Taipei)
8
Network test with Iperf
  • Memory-to-memory test performed with Iperf
    program
  • Use Linux boxes dedicated for iperf test at both
    ends
  • 1 Gbps limit set by the NICs
  • Linux kernel 2.6.9 (BIC TCP)
  • Window size 8 Mbytes, 8 parallel streams (see the
    sketch below)
  • For Lyon-Tokyo, recovery after packet loss takes
    long due to the long RTT

(Iperf throughput plots: Taipei <-> Tokyo, RTT 32 ms; Lyon <-> Tokyo, RTT 280 ms)
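A minimal sketch of the memory-to-memory test described above, written in Python around the iperf command line. The remote host name is a placeholder, and an iperf server is assumed to be already listening at the far end (e.g. started with "iperf -s -w 8M").

import subprocess

REMOTE_HOST = "iperf-test.example.org"   # hypothetical dedicated test node at the far end

def run_iperf_client(duration_s: int = 60) -> str:
    """Run the iperf client with an 8 Mbyte TCP window and 8 parallel streams."""
    cmd = [
        "iperf",
        "-c", REMOTE_HOST,       # client mode, connect to the remote test node
        "-w", "8M",              # TCP window size of 8 Mbytes, as in the test above
        "-P", "8",               # 8 parallel TCP streams
        "-t", str(duration_s),   # test duration in seconds
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    print(run_iperf_client())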
9
Data Transfer from Lyon Tier1 center
  • Data transferred from Lyon to Tokyo
  • Used Storage Elements in production
  • ATLAS MC simulation data
  • Storage Elements
  • Lyon dCache (>30 GridFTP servers, Solaris, ZFS)
  • Tokyo DPM (6 gridFTP servers, Linux, XFS)
  • FTS (File Transfer Service)
  • Main tool for bulk data transfer
  • Executes multiple file transfers concurrently, each
    carried out by GridFTP (see the sketch below)
  • Sets the number of GridFTP streams per file
  • Used in the ATLAS Distributed Data Management system
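As an illustration of the mechanism just described, the sketch below copies a single file over GridFTP with several parallel streams, which is what FTS does for many files concurrently. The source and destination URLs are placeholders, and a valid grid proxy is assumed to be available.

import subprocess

# Placeholder SURLs; real transfers use the production dCache and DPM endpoints.
SRC = "gsiftp://source.example.fr/data/atlas/mc/example.file"
DST = "gsiftp://dest.example.jp/dpm/atlas/mc/example.file"

def gridftp_copy(src: str, dst: str, streams: int = 10) -> None:
    """Copy one file over GridFTP with the given number of parallel data streams."""
    subprocess.run(
        ["globus-url-copy",
         "-p", str(streams),   # parallel TCP streams per file (the FTS "streams" setting)
         "-vb",                # print transfer performance
         src, dst],
        check=True,
    )

if __name__ == "__main__":
    gridftp_copy(SRC, DST, streams=10)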

10
Performance of data transfer
Throughput per file transfer
  • >500 Mbytes/s observed in May 2008
  • File size 3.5 Gbytes
  • 20 files in parallel, 10 streams each
  • 40Mbytes/s for each file transfer
  • Low activity at CC-IN2P3 during the period (other
    than ours)

(Plots: throughput per file transfer [Mbytes/s], around 40 Mbytes/s each; aggregate throughput reaching 500 Mbytes/s)
11
Data transfer between ASGC and Tokyo
  • Transferred 1000 files per test (1 Gbyte file size)
  • Tried various numbers of concurrent files / streams
  • From 4 files / 1 stream up to 25 files / 15 streams
  • Saturates the 1 Gbps WAN bandwidth

(Throughput plots for Tokyo -> ASGC and ASGC -> Tokyo with various concurrent-files/streams settings, from 4/1 up to 25/15)
12
CPU Usage in the last year (Sep 2007 to Aug 2008)
  • 3,253,321 kSI2k-hours of CPU time in the last year
  • Most jobs are ATLAS MC simulation
  • Job submission is coordinated by CC-IN2P3 (the
    associated Tier1)
  • Outputs are uploaded to the data storage at
    CC-IN2P3
  • Large contribution to the ATLAS MC production

(Charts: TOKYO-LCG2 CPU time per month; CPU time at large Tier-2 sites)
13
ALICE Tier2 center at Hiroshima University
  • WLCG/EGEE site
  • JP-HIROSHIMA-WLCG
  • Possible Tier 2 site for ALICE

14
Status at Hiroshima
  • Just became EGEE production site
  • Aug. 2008
  • Associated Tier1 site will likely be CC-IN2P3
  • No ALICE Tier1 in Asia-Pacific region
  • Resources
  • 568 CPU cores
  • Dual-core Xeon (3 GHz) x 2 CPUs x 38 boxes
  • Quad-core Xeon (2.6 GHz) x 2 CPUs x 32 boxes
  • Quad-core Xeon (3 GHz) x 2 CPUs x 20 blades
  • Storage: 200 TB next year
  • Network 1Gbps
  • On SINET3

15
KEK
  • Belle experiment has been running
  • Need to have access to existing petabytes of data
  • Site operations
  • KEK does not support any LHC experiment
  • Try to gain experience by operating sites in
    order to prepare for future Tier1 level Grid
    center
  • University support
  • NAREGI

(Aerial photo: KEK Tsukuba campus, with Mt. Tsukuba, the Belle experiment, the KEKB ring, and the Linac)
16
Grid Deployment at KEK
  • Two EGEE sites
  • JP-KEK-CRC-1
  • Rather experimental use and R&D
  • JP-KEK-CRC-2
  • More stable services
  • NAREGI
  • Used beta version for testing and evaluation
  • Supported VOs
  • belle (main target at present), ilc, calice, ...
  • LCG VOs are not supported
  • VOMS operation (see the proxy sketch below)
  • belle (registered in CIC)
  • ppj (accelerator science in Japan), naokek
  • g4med, apdg, atlasj, ail
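A minimal illustration of the VOMS service in use: before submitting jobs, a member of one of the VOs listed above obtains a VOMS proxy. The commands are standard gLite clients; the VO name is taken from the slide, everything else is a generic invocation.

import subprocess

def make_voms_proxy(vo: str = "belle", lifetime: str = "24:00") -> None:
    """Create a VOMS-extended proxy for the given VO (registered in the KEK VOMS server)."""
    subprocess.run(
        ["voms-proxy-init",
         "--voms", vo,           # request VO membership attributes from the VOMS server
         "--valid", lifetime],   # proxy lifetime as hh:mm
        check=True,
    )

if __name__ == "__main__":
    make_voms_proxy("belle")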

17
Belle VO
  • Federation established
  • 5 countries, 7 institutes, 10 sites
  • Nagoya Univ., Univ. of Melbourne, ASGC, NCU,
    CYFRONET, Korea Univ., KEK
  • VOMS is provided by KEK
  • Activities
  • Submit MC production jobs
  • Functional and performance tests
  • Interface to existing petabytes of data

18
Takashi Sasaki (KEK)
19
ppj VO
  • Federated among major universities and KEK
  • Tohoku U. (ILC, KamLAND)
  • U. Tsukuba (CDF)
  • Nagoya U. (Belle, ATLAS)
  • Kobe U. (ILC, ATLAS)
  • Hiroshima IT (ATLAS, Computing Science)
  • Common VO for accelerator science in Japan
  • Does not depend on specific projects, but resources
    are shared
  • KEK acts as the GOC
  • Remote installation
  • Monitoring
  • Based on Nagios and a Wiki (see the check sketch
    below)
  • Software update
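A minimal sketch of the kind of check such Nagios-based monitoring performs; the host and port below are placeholders, and the actual checks used at KEK are not specified on the slide. The plugin reports whether a site's GridFTP port answers, using the standard Nagios exit codes.

import socket
import sys

def check_tcp(host: str, port: int, timeout: float = 5.0) -> int:
    """Nagios-style check: 0 (OK) if the port accepts connections, 2 (CRITICAL) otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            print(f"OK - {host}:{port} is reachable")
            return 0
    except OSError as err:
        print(f"CRITICAL - {host}:{port} is unreachable ({err})")
        return 2

if __name__ == "__main__":
    # 2811 is the standard GridFTP control port; the host name is hypothetical.
    sys.exit(check_tcp("ce.example.ac.jp", 2811))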

20
KEK Grid CA
  • In operation since Jan. 2006
  • Accredited as an IGTF (International Grid Trust
    Federation) compliant CA

(Chart: number of issued certificates)
21
NAREGI
  • NAREGI: NAtional REsearch Grid Initiative
  • Host institute: National Institute of Informatics
    (NII)
  • R&D of Grid middleware for research and industrial
    applications
  • Main targets are nanotechnology and biotechnology
  • More focused on computing grid
  • Data grid part integrated later
  • Ver. 1.0 of middleware released in May, 2008
  • Software maintenance and user support services
    will be continued

22
NAREGI at KEK
  • NAREGI beta versions installed on the testbed
  • 1.0.1: Jun. 2006 to Nov. 2006
  • Manual installation for all the steps
  • 1.0.2 Feb 2007
  • 2.0.0 Oct. 2007
  • apt-rpm installation
  • 2.0.1 Dec. 2007
  • Site federation test
  • KEK-NAREGI/NII Oct. 2007
  • KEK - National Astronomical Observatory (NAO), Mar.
    2008
  • Evaluation of application environment of NAREGI
  • job submission/retrieval, remote data stage-in/out

23
Takashi Sasaki (KEK)
24
Data Storage: Gfarm
  • Gfarm distributed file system
  • DataGrid part in NAREGI
  • Data are stored in multiple disk servers
  • Tests performed
  • Stage-in and stage-out to the Gfarm storage
  • GridFTP interface
  • Between gLite site and NAREGI site
  • File access from application
  • Accessed via FUSE (Filesystem in Userspace); see the
    sketch below
  • No need to change the application program
  • I/O speed is several times slower than local disk
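A minimal sketch of the FUSE-based access described above; the mount point and file path are placeholders. Once the Gfarm file system is mounted with gfarm2fs, an unmodified application reads files through ordinary POSIX I/O.

import subprocess

MOUNT_POINT = "/gfarm"   # hypothetical local mount point

def mount_gfarm(mount_point: str = MOUNT_POINT) -> None:
    """Mount the Gfarm file system in user space via gfarm2fs."""
    subprocess.run(["gfarm2fs", mount_point], check=True)

def read_first_line(path: str) -> str:
    """Plain POSIX file access; the application needs no Gfarm-specific code."""
    with open(path) as f:
        return f.readline()

if __name__ == "__main__":
    mount_gfarm()
    print(read_first_line(f"{MOUNT_POINT}/example/data.txt"))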

25
Future Plan on NAREGI at KEK
  • Migration to the production version
  • Test of interoperability with gLite
  • Improve the middleware in the application domain
  • Development of the new API to the application
  • Virtualization of the middleware
  • for scripting languages (to be used at the web
    portal as well)
  • Monitoring
  • Jobs, sites,

26
Summary
  • WLCG
  • ATLAS Tier2 at Tokyo
  • Stable operation
  • ALICE Tier2 at Hiroshima
  • Just started operation in production
  • Coordinated effort led by KEK
  • Site operations with gLite and NAREGI middleware
  • Belle VO uses SRB
  • Will be replaced with iRODS
  • ppj VO deployment at universities
  • Supported and monitored by KEK
  • NAREGI
  • R&D, interoperability