1
The Computing System for the Belle Experiment
  • Ichiro Adachi
  • KEK
  • representing the Belle DST/MC production group
  • CHEP03, La Jolla, California, USA
  • March 24, 2003
  • Introduction to Belle
  • Belle software tools
  • Belle computing system / PC farm
  • DST/MC production
  • Summary

2
Introduction
  • Belle experiment
  • B-factory experiment at KEK
  • study of CP violation in the B meson system, started in 1999
  • recorded 120M B meson pairs so far
  • KEKB accelerator is still improving its
    performance

120 fb-1: the largest B meson data sample in the Υ(4S) region in the world
3
Belle detector
[Event display: example of event reconstruction, a fully reconstructed event]
4
Belle software tools
Event flow
  • Home-made kits
  • B.A.S.F. for framework
  • Belle AnalySis Framework
  • unified framework for every step of event processing
  • event-by-event parallel processing on SMP
  • Panther for I/O package
  • unified data format from DAQ to user analysis
  • bank system with zlib compression
  • reconstruction and simulation libraries
  • written in C
  • Other utilities
  • CERNLIB/CLHEP
  • Postgres for database

[Event flow diagram: input with Panther → modules (unpacking, calibration, tracking, vertexing, clustering, particle ID, diagnosis) loaded dynamically as shared objects into B.A.S.F. → output with Panther]
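To make the module pattern above concrete, here is a minimal C++ sketch of what this slide describes: a reconstruction step packaged as a module with run/event hooks, compiled into a shared object and driven by the framework. The class and method names are illustrative assumptions, not the actual B.A.S.F. interface.

// Illustrative sketch only; the real B.A.S.F. API differs.
#include <cstdio>

class Module {                      // hypothetical module base class
public:
  virtual ~Module() {}
  virtual void begin_run(int run) = 0;
  virtual void event(int evt)     = 0;   // called once per event
  virtual void end_run(int run)   = 0;
};

class TrackingModule : public Module {   // e.g. the "tracking" box above
public:
  void begin_run(int run) override { std::printf("load constants for run %d\n", run); }
  void event(int)         override { /* read Panther banks, fit tracks, write banks */ }
  void end_run(int run)   override { std::printf("run %d finished\n", run); }
};

// The framework would dlopen() the shared object, instantiate the module
// and drive the event loop; on an SMP host one event loop runs per CPU.
int main() {
  TrackingModule m;
  m.begin_run(1);
  for (int evt = 0; evt < 3; ++evt) m.event(evt);
  m.end_run(1);
  return 0;
}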
5
Belle computing system
6
Computing requirements
Reprocess the entire beam data sample within 3 months: once the reconstruction code is updated or the calibration constants are improved, fast turn-around is essential to perform physics analyses in a timely manner.
The MC sample must be at least 3 times larger than the real data: analyses are maturing, and understanding systematic effects in detail requires a sufficiently large MC sample.
→ Added more PC farms and disks (a rough estimate of the implied CPU demand is sketched below).
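As a back-of-envelope check of these requirements, the sketch below combines the 120 fb-1 recorded so far (slide 2) with the figure quoted later in this talk that about 180 GHz processes 1 fb-1 per day. Mapping the 3x MC factor directly onto CPU demand is a simplification, since simulation adds its own cost on top of reconstruction.

// Rough CPU-demand estimate from the numbers quoted in this talk.
#include <cstdio>

int main() {
  const double recorded_fb    = 120.0;  // beam data recorded so far
  const double days_allowed   = 90.0;   // "reprocess in 3 months"
  const double ghz_per_fb_day = 180.0;  // ~180 GHz processes 1 fb-1/day
  const double mc_factor      = 3.0;    // MC at least 3x real data

  double fb_per_day = recorded_fb / days_allowed;        // ~1.3 fb-1/day
  double ghz_data   = fb_per_day * ghz_per_fb_day;       // ~240 GHz
  double ghz_total  = ghz_data * (1.0 + mc_factor);      // crude scaling for MC
  std::printf("reprocessing alone: ~%.0f GHz\n", ghz_data);
  std::printf("with 3x generic MC: ~%.0f GHz\n", ghz_total);
  return 0;
}

The result, of order several hundred GHz to about 1000 GHz, is consistent with the roughly 1500 GHz farm described on the next slides.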
7
PC farm upgrade
[Chart: total CPU capacity vs. time, with processor speed (GHz), # of CPUs, and # of nodes per purchase; now about 1500 GHz]
The total CPU capacity has tripled in the last two years, and 60 TB (total) of disk has also been purchased for storage.
8
Belle PC farm CPUs
  • heterogeneous system from various vendors
  • CPU processors (Intel Xeon / Pentium III / Pentium 4 / Athlon)

  Dell: 36 PCs (Pentium III 0.5 GHz)
  Compaq: 60 PCs (Intel Xeon 0.7 GHz), 168 GHz
  Fujitsu: 127 PCs (Pentium III 1.26 GHz), 320 GHz
  Appro: 113 PCs (Athlon 2000+), 380 GHz, setting up done
  NEC: 84 PCs (Pentium 4 2.8 GHz), 470 GHz, will come soon
9
DST production and skimming scheme
1. Production (reprocessing): raw data is transferred via Sun tape servers to the PC farm; DST data, histograms, and log files are written to disk.
2. Skimming: DST data on disk or HSM is read back on Sun servers; skims such as the hadronic data sample are written to disk, together with histograms and log files, for user analysis.
10
Output skims
  • Physics skims from reprocessing
  • Mini-DST (4-vector) format
  • Create a hadronic sample as well as typical physics channels (up to 20
    skims), so that many users do not have to go through the whole hadronic
    sample
  • Write data directly onto disk at Nagoya (350 km from KEK) using NFS,
    thanks to the 1 Gbps Super-SINET link

[Diagram: reprocessing output at the KEK site is written as mini-DST over the 1 Gbps link to Nagoya, 350 km from KEK]
11
Processing power and failure rate
  • Processing power
  • Processing of 1 fb-1 per day with 180 GHz
  • Allocate 40 PC hosts (0.7 GHz x 4 CPU) for daily production to catch up
    with DAQ
  • 2.5 fb-1 per day possible
  • Processing speed (in the case of MC) with one 1 GHz CPU, per B meson pair:
  • Reconstruction: 3.4 sec
  • Geant simulation: 2.3 sec
  • Failure rate
    module crash: < 0.01%
    tape I/O error: 1%
    process communication error: 3%
    network trouble / system error: negligible
The farm throughput implied by these per-event times is sketched below.
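The sketch below converts the per-B-pair times quoted above into an approximate throughput for the 40-host daily-production slice. Treating every recorded event as a B meson pair and ignoring I/O overhead are simplifying assumptions, so this is only an order-of-magnitude illustration.

// Farm throughput implied by the per-event processing times above.
#include <cstdio>

int main() {
  const double t_reco = 3.4;                 // GHz*s per B pair, reconstruction
  const double t_sim  = 2.3;                 // GHz*s per B pair, Geant simulation
  const double farm_ghz = 40 * 4 * 0.7;      // 40 hosts x 4 CPU x 0.7 GHz = 112 GHz
  const double ghz_s_per_day = farm_ghz * 86400.0;

  std::printf("daily-production slice: %.0f GHz\n", farm_ghz);
  std::printf("data (reco only)     : %.1fM B pairs/day\n",
              ghz_s_per_day / t_reco / 1e6);
  std::printf("MC (sim + reco)      : %.1fM B pairs/day\n",
              ghz_s_per_day / (t_reco + t_sim) / 1e6);
  return 0;
}

With roughly one million B pairs per fb-1 (120M pairs in about 120 fb-1, slide 2), a few million B pairs per day is broadly consistent with the quoted 1 to 2.5 fb-1 per day.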
12
Reprocessing in 2001 and 2002
  • Reprocessing
  • major library / constants update in April
  • sometimes we have to wait for constants
  • The final portion of beam data taken before the summer shutdown has
    always been reprocessed in time

For 2001 summer: 30 fb-1 reprocessed in 2.5 months
For 2002 summer: 78 fb-1 reprocessed in 3 months
13
MC production
  • Produce 2.5 fb-1 per day with 400 GHz of Pentium III
  • Resources at remote sites also used
  • Size: 15-20 GB per 1M events
  • 4-vectors only
  • Run dependent

[Diagram: for each run (Run xxx), the beam data file provides the run-dependent background and IP profile, from which a minimal set of generic MC is produced in mini-DST format: B0, B+B-, charm, and light-quark MC]
14
MC production 2002
  • Keep producing MC generic samples
  • PC farm shared with DST
  • Switch from DST to MC production can be made
    easily
  • Reached 1100M events in March 2003; a sample 3 times larger than the
    78 fb-1 data set has been completed

[Plot: accumulated generic MC events vs. time, with markers for a minor change and a major update of the library]
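Taking the previous slide's size figure as roughly 15 to 20 GB per million events, the total mini-DST volume of the generic MC sample can be estimated as below; both inputs are read off these slides, so treat the result as indicative only.

// Rough storage estimate for the generic MC sample.
#include <cstdio>

int main() {
  const double events_millions   = 1100.0;  // generic MC events by March 2003
  const double gb_per_million_lo = 15.0;    // size per 1M events (low estimate)
  const double gb_per_million_hi = 20.0;    // size per 1M events (high estimate)
  std::printf("mini-DST volume: %.0f - %.0f TB\n",
              events_millions * gb_per_million_lo / 1000.0,
              events_millions * gb_per_million_hi / 1000.0);
  return 0;
}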
15
MC production at remote sites
[Chart: CPU resource available (GHz) at each production site]
  • Total CPU resources at remote sites are similar to those at KEK
  • 44% of the MC samples have been produced at remote sites
  • All data are transferred to KEK via the network
  • 68 TB in 6 months

[Chart: MC events produced, 44% at remote sites]
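For a sense of the sustained network rate behind the 68 TB / 6 months figure, the arithmetic is below; actual transfers are of course bursty, so this is only the long-term average.

// Average network rate implied by 68 TB transferred in 6 months.
#include <cstdio>

int main() {
  const double bytes   = 68e12;             // 68 TB
  const double seconds = 6 * 30 * 86400.0;  // ~6 months
  double mb_s   = bytes / seconds / 1e6;
  double mbit_s = mb_s * 8.0;
  std::printf("average: %.1f MB/s (~%.0f Mbit/s)\n", mb_s, mbit_s);
  return 0;
}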
16
Future prospects
  • Short term
  • Software: standardize utilities
  • Purchase more CPUs and/or disks if budget
    permits
  • Efficient use of resources at remote sites
  • Centralized at KEK → distributed Belle-wide
  • Grid computing technology: just started surveying its application to
  • data file management
  • CPU usage
  • SuperKEKB project
  • Aim at a luminosity of 10^35 (or more) cm-2 s-1 from 2006
  • Physics rate of ~100 Hz for B meson pairs
  • 1 PB/year expected
  • A new computing system like those of the LHC experiments could be a
    candidate
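To put the 1 PB/year figure in context of the current setup, the sketch below expresses it in units of the DTF2 cartridges and tape library described in the backup slides; it is a scale comparison only.

// SuperKEKB data volume expressed in current tape-library units.
#include <cstdio>

int main() {
  const double pb_per_year = 1.0;     // "1 PB/year expected"
  const double tape_gb     = 200.0;   // DTF2 cartridge capacity
  const double library_tb  = 500.0;   // current tape library capacity
  double tapes_per_year = pb_per_year * 1.0e6 / tape_gb;   // PB -> GB
  double libraries      = pb_per_year * 1000.0 / library_tb;
  std::printf("~%.0f DTF2 tapes per year (about %.0fx the current %.0f TB library)\n",
              tapes_per_year, libraries, library_tb);
  return 0;
}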

17
Summary
  • The Belle computing system has been working fine.
    More than 250fb-1 of real beam data has been
    successfully (re)processed.
  • MC samples 3 times larger than the beam data have been produced so far.
  • Will add more CPU in near future for quick
    turn-around as we accumulate more data.
  • Grid computing technology could serve us well; we have started
    considering its application in our system.
  • For SuperKEKB, we will need many more resources, which may have a
    rather big impact on our system.

18
backup
19
dbasf data flow
  • Sun machines as tape servers for I/O
  • input/output daemons
  • stdout/histogram daemon
  • I/O speed of 25 MB/s
  • Linux cluster
  • RedHat 6/7
  • 15 PCs, each with 4 CPUs of 0.7 GHz Intel Xeon
  • communication via network shared memory (NSM)
  • 200 pb-1 per day for 1 cluster
  • Processing limited by CPU
  • Possible to add 30 PCs
  • Need optimization

[Diagram: a tape server running inputd and outputd feeds a Linux cluster (one master PC plus basf worker processes) over the NSM network]
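As a quick check of the "processing limited by CPU" statement, the sketch below compares what one tape server can deliver with what one cluster consumes, taking roughly 1 TB of raw data per fb-1 from the data-size backup slide (120 TB for 120 fb-1).

// Is one dbasf cluster I/O-bound or CPU-bound?  Rough check.
#include <cstdio>

int main() {
  const double tape_mb_s      = 25.0;   // tape server I/O speed
  const double cluster_fb_day = 0.2;    // 200 pb-1 per day per cluster
  const double raw_tb_per_fb  = 1.0;    // ~120 TB / 120 fb-1

  double deliver_tb_day = tape_mb_s * 86400.0 / 1.0e6;    // tape server capability
  double consume_tb_day = cluster_fb_day * raw_tb_per_fb; // cluster demand
  std::printf("tape server can deliver ~%.1f TB/day, one cluster reads ~%.1f TB/day\n",
              deliver_tb_day, consume_tb_day);
  return 0;
}

The tape path can supply roughly ten times more raw data per day than one cluster processes, consistent with the processing being CPU-limited.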
20
CPU disk servers
  • Sun CPU
  • 9 servers (0.5 GHz x 4 CPU)
  • 38 computing servers (ibid.)
  • operated under the LSF batch system
  • tape drives (2 each for 20 hosts)
  • Linux CPU
  • 60 computing servers (Intel Xeon, 0.7 GHz x 4 CPU)
  • central CPU engines for DST/MC production
  • Disk servers / storage
  • Tape library
  • DTF2 tapes (200 GB), 24 MB/s I/O
  • 500TB total
  • 40 tape drives
  • 8TB NFS file servers
  • 120TB HSM servers
  • 4TB staging disk

21
Data size
Raw data: 35 KB/event
DST data: 58 KB/event
mini-DST data: 12 KB/event (for a hadronic event)
Total raw data: 120 TB for 120 fb-1, on 1000 DTF2 (200 GB) tapes
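The per-event sizes above imply the event count and minimum tape count below; the quoted 1000 tapes are above this naive minimum.

// Event count and minimum tape count implied by the sizes above.
#include <cstdio>

int main() {
  const double raw_kb_per_event = 35.0;
  const double total_raw_tb     = 120.0;   // for 120 fb-1
  const double tape_gb          = 200.0;   // DTF2 cartridge capacity

  double events    = total_raw_tb * 1.0e9 / raw_kb_per_event;  // TB -> KB
  double min_tapes = total_raw_tb * 1000.0 / tape_gb;
  std::printf("~%.1e recorded events, >= %.0f tapes for the raw data alone\n",
              events, min_tapes);
  return 0;
}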