Title: The Computing System for the Belle Experiment
- Ichiro Adachi
- KEK
- representing the Belle DST/MC production group
- CHEP03, La Jolla, California, USA
- March 24, 2003
- Introduction: the Belle experiment
- Belle software tools
- Belle computing system / PC farm
- DST/MC production
- Summary
2. Introduction
- Belle experiment
- B-factory experiment at KEK
- studies CP violation in the B meson system; running since 1999
- recorded 120 million B meson pairs (120 fb-1) so far
- KEKB accelerator is still improving its performance
- The largest B meson data sample in the Υ(4S) region in the world
3. Belle detector
[Figure: the Belle detector and an example of event reconstruction (a fully reconstructed event)]
4. Belle software tools
- Home-made kits
- B.A.S.F. as the framework
- Belle AnalySis Framework
- a single framework for every step of event processing (see the module sketch below)
- event-by-event parallel processing on SMP machines
- Panther as the I/O package
- a single data format from DAQ to user analysis
- bank system with zlib compression
- reconstruction / simulation library
- written in C
- Other utilities
- CERNLIB/CLHEP
- PostgreSQL for the database
[Event-flow diagram: input with Panther -> B.A.S.F. runs dynamically loaded shared-object modules (unpacking, calibration, tracking/vertexing, clustering, particle ID, diagnosis) -> output with Panther]
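To make the module structure above concrete, here is a minimal sketch of what an event-processing module compiled into a shared object could look like. The Module base class, the method names, and the create_module() factory are illustrative assumptions rather than the actual B.A.S.F. interface, and the event-by-event SMP parallelism of the real framework is omitted.

    // Minimal sketch of a dynamically loadable event-processing module
    // (illustrative names only; not the real B.A.S.F. API).
    #include <cstdio>

    class Module {                        // assumed framework base class
    public:
      virtual ~Module() {}
      virtual void begin_run(int run) = 0;
      virtual void event(int evt)     = 0;   // called once per event
      virtual void end_run(int run)   = 0;
    };

    class TrackCountModule : public Module {   // example user module
      long n_events_;
    public:
      TrackCountModule() : n_events_(0) {}
      void begin_run(int run) { std::printf("begin run %d\n", run); }
      void event(int)         { ++n_events_; /* unpack banks, reconstruct, ... */ }
      void end_run(int run)   { std::printf("end run %d: %ld events\n", run, n_events_); }
    };

    // Factory with C linkage: the framework would dlopen() the shared object
    // and locate this symbol with dlsym() to instantiate the module.
    extern "C" Module* create_module() { return new TrackCountModule; }

    int main() {
      // Stand-in for the framework's event loop after loading the module.
      Module* m = create_module();
      m->begin_run(1);
      for (int evt = 0; evt < 5; ++evt) m->event(evt);
      m->end_run(1);
      delete m;
      return 0;
    }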
5. Belle computing system
6. Computing requirements
- Reprocess the entire beam data sample within 3 months
- once reconstruction code is updated or calibration constants are improved, fast turn-around is essential to perform physics analyses in a timely manner
- MC sample at least 3 times larger than the real data
- analyses are maturing, and understanding systematic effects in detail requires a sufficiently large MC sample
- Added more PC farms and disks
7. PC farm upgrade
[Chart: PC farm upgrade history, showing processor speed (GHz), # of CPUs, # of nodes, and total CPU (GHz); the farm now totals about 1500 GHz]
- Total CPU capacity has become 3 times larger over the last two years
- 60 TB (total) of disk has also been purchased for storage
8. Belle PC farm CPUs
- heterogeneous system from various vendors
- CPU processors (Intel Xeon / Pentium-III / Pentium 4 / Athlon)
- Dell: 36 PCs (Pentium-III 0.5 GHz)
- Compaq: 60 PCs (Intel Xeon 0.7 GHz), 168 GHz
- Fujitsu: 127 PCs (Pentium-III 1.26 GHz), 320 GHz
- Appro: 113 PCs (Athlon 2000+), 380 GHz, setting up done
- NEC: 84 PCs (Pentium 4 2.8 GHz), 470 GHz, will come soon
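The vendor-by-vendor totals above can be cross-checked with a quick sum. In this sketch the CPUs-per-node counts are assumptions (dual-CPU nodes, except the quad-CPU Compaq Intel Xeon servers listed on the CPU / disk servers slide) chosen so that the products roughly reproduce the quoted figures.

    #include <cstdio>

    struct Farm { const char* vendor; int nodes; int cpus_per_node; double ghz; };

    int main() {
      // CPUs-per-node values are assumptions inferred from the quoted totals.
      const Farm farms[] = {
        {"Dell",    36, 2, 0.5},    // Pentium-III
        {"Compaq",  60, 4, 0.7},    // Intel Xeon
        {"Fujitsu",127, 2, 1.26},   // Pentium-III
        {"Appro",  113, 2, 1.67},   // Athlon 2000+ (~1.67 GHz)
        {"NEC",     84, 2, 2.8},    // Pentium 4, "will come soon"
      };
      double total = 0;
      for (const Farm& f : farms) {
        double ghz = f.nodes * f.cpus_per_node * f.ghz;
        std::printf("%-8s %6.0f GHz\n", f.vendor, ghz);
        total += ghz;
      }
      // compare with the ~1500 GHz total on the PC farm upgrade slide
      std::printf("total   %6.0f GHz\n", total);
      return 0;
    }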
9. DST production / skimming scheme
1. Production (reprocessing)
[Diagram: raw data is transferred via the Sun servers to the PC farm; the resulting DST data, histograms, and log files are written to disk]
2. Skimming
[Diagram: DST data on disk is read via the Sun servers; skims such as the hadronic data sample go to disk or HSM for user analysis, along with histograms and log files]
10. Output skims
- Physics skims from reprocessing
- Mini-DST (4-vector) format
- Create the hadronic sample as well as typical physics channels (up to 20 skims, sketched below)
- many users do not have to go through the whole hadronic sample
- Write data onto disk at Nagoya (350 km away from KEK) directly using NFS, thanks to the Super-SINET link of 1 Gbps
[Diagram: reprocessing output at the KEK site is written as mini-DST over the 1 Gbps link to Nagoya, 350 km from KEK]
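A skim is essentially an event filter run over the reprocessing output. The sketch below shows the flavour of a hadronic-event skim; the track-multiplicity and visible-energy cuts, and all structure and function names, are illustrative placeholders rather than the actual Belle hadronic selection.

    #include <cstdio>
    #include <vector>

    // Minimal stand-ins for quantities a real skim would read from the DST banks.
    struct Track   { double p; };        // track momentum in GeV/c
    struct Cluster { double e; };        // calorimeter cluster energy in GeV

    struct Event {
      std::vector<Track>   tracks;
      std::vector<Cluster> clusters;
    };

    // Illustrative hadronic-skim decision: enough charged tracks and enough
    // visible energy relative to the centre-of-mass energy. The thresholds
    // are placeholders, not the published Belle hadronic-event selection.
    bool pass_hadronic_skim(const Event& ev, double ecm = 10.58) {
      if (ev.tracks.size() < 3) return false;
      double evis = 0.0;
      for (const Track& t : ev.tracks)     evis += t.p;
      for (const Cluster& c : ev.clusters) evis += c.e;
      return evis > 0.2 * ecm;
    }

    int main() {
      Event ev;
      ev.tracks   = {{1.2}, {0.8}, {0.5}, {0.3}};
      ev.clusters = {{0.7}, {0.4}};
      std::printf("hadronic skim: %s\n", pass_hadronic_skim(ev) ? "keep" : "drop");
      return 0;
    }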
11. Processing power / failure rate
- Processing power
- processing 1 fb-1 per day with 180 GHz
- allocate 40 PC hosts (0.7 GHz x 4 CPUs) for daily production to catch up with DAQ
- 2.5 fb-1 per day possible
- Processing speed (for MC) on one 1 GHz CPU, per B meson pair
- reconstruction: 3.4 sec
- Geant simulation: 2.3 sec
- Failure rate
- module crash: < 0.01%
- tape I/O error: 1%
- process communication error: 3%
- network trouble / system error: negligible
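The per-event CPU times quoted above translate directly into farm-size requirements. The sketch below works through that arithmetic; the number of events per fb^-1 is a hypothetical input chosen only for illustration, not a figure from this talk.

    #include <cstdio>

    int main() {
      // Per-event CPU cost on a 1 GHz processor, from the slide (one B meson pair):
      const double reco_ghz_s = 3.4;   // reconstruction
      const double sim_ghz_s  = 2.3;   // Geant simulation (MC only)

      // Hypothetical workload parameters, for illustration only:
      const double events_per_fb   = 4.0e6;   // assumed events per fb^-1 to be processed
      const double farm_ghz        = 180.0;   // CPU power quoted for 1 fb^-1/day production
      const double seconds_per_day = 86400.0;

      // DST production: reconstruction only.
      double ghz_s_per_fb = events_per_fb * reco_ghz_s;
      double fb_per_day   = farm_ghz * seconds_per_day / ghz_s_per_fb;
      std::printf("DST: %.1f fb^-1/day with %.0f GHz\n", fb_per_day, farm_ghz);

      // MC production: simulation plus reconstruction per event.
      double mc_events_per_day = farm_ghz * seconds_per_day / (reco_ghz_s + sim_ghz_s);
      std::printf("MC : %.1fM events/day with %.0f GHz\n", mc_events_per_day / 1e6, farm_ghz);
      return 0;
    }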
12. Reprocessing in 2001 and 2002
- Reprocessing
- major library / constants update in April
- sometimes we have to wait for constants
- the final bit of beam data taken before the summer shutdown has always been reprocessed in time
- for 2001 summer: 30 fb-1 in 2.5 months
- for 2002 summer: 78 fb-1 in 3 months
13. MC production
- Produce 2.5 fb-1 per day with 400 GHz of Pentium-III
- Resources at remote sites are also used
- Size: 15-20 GB for 1M events
- 4-vectors only
- Run dependent (sketched below)
[Diagram: for each run ("Run xxx" beam data file / mini-DST), a minimal set of generic MC (B0, B+B-, charm, and light-quark samples) is produced using the run-dependent background and IP profile]
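Run dependence means that each generated sample is tied to the conditions of one real run. Below is a minimal sketch of that bookkeeping, assuming simple illustrative structures for the IP profile and the background file; none of the names correspond to the actual Belle MC production code.

    #include <cstdio>
    #include <string>
    #include <vector>

    // Illustrative per-run conditions: the interaction-point profile and the
    // file of beam-background events overlaid on the simulation.
    struct RunConditions {
      int         run;
      double      ip_x, ip_y, ip_z;      // mean IP position (cm), placeholder values
      std::string background_file;       // random-trigger background for this run
    };

    // One generic-MC job per run and per sample type.
    struct McJob {
      int         run;
      std::string sample;                // e.g. "B0B0bar", "B+B-", "charm", "uds"
      long        n_events;
    };

    int main() {
      RunConditions cond = {123, 0.01, -0.02, 0.3, "bg_run123.dat"};  // hypothetical run
      const char* samples[] = {"B0B0bar", "B+B-", "charm", "uds"};

      std::vector<McJob> jobs;
      for (const char* s : samples)
        jobs.push_back(McJob{cond.run, s, 100000});  // in practice scaled to the run's luminosity

      for (const McJob& j : jobs)
        std::printf("run %d  %-8s %ld events  (IP z=%.2f cm, bg=%s)\n",
                    j.run, j.sample.c_str(), j.n_events,
                    cond.ip_z, cond.background_file.c_str());
      return 0;
    }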
14. MC production in 2002
- Keep producing generic MC samples
- PC farm is shared with DST production
- the switch from DST to MC production can be made easily
- Reached 1100M events in March 2003; samples 3 times larger than the 78 fb-1 beam data completed
[Plot: MC production history, with markers indicating minor changes and major updates of the library]
15. MC production at remote sites
[Chart: CPU resources available (GHz) at KEK and the remote sites]
- Total CPU resources at remote sites are similar to KEK
- 44% of the MC samples have been produced at remote sites
- All data is transferred to KEK via the network
- 68TB in 6 months
[Chart: MC events produced per site; annotations: 300 GHz, 44% at remote sites]
16. Future prospects
- Short term
- software: standardize utilities
- purchase more CPUs and/or disks if the budget permits
- efficient use of resources at remote sites
- centralized at KEK -> distributed Belle-wide
- Grid computing technology: just started to survey its application
- data file management
- CPU usage
- SuperKEKB project
- aim: luminosity of 10^35 cm^-2 s^-1 (or more) from 2006
- physics rate of 100 Hz for B meson pairs
- 1 PB/year expected
- a new computing system like those of the LHC experiments can be a candidate
17. Summary
- The Belle computing system has been working well. More than 250 fb-1 of real beam data has been successfully (re)processed.
- MC samples 3 times larger than the beam data have been produced so far.
- Will add more CPUs in the near future for quick turn-around as we accumulate more data.
- Grid computing technology would be a good friend of ours; we have started considering its application in our system.
- For SuperKEKB, we need far more resources, which may have a rather big impact on our system.
18. Backup
19. dbasf data flow
- Sun as tape servers for I/O
- input/output daemons
- stdout/histogram daemon
- I/O speed of 25 MB/s
- Linux cluster
- RedHat 6/7
- 15 PCs with 4 CPUs of 0.7 GHz Intel Xeon each
- communication by network shared memory (NSM)
- 200 pb-1 per day for 1 cluster
- processing limited by CPU
- possible to add 30 PCs
- needs optimization
[Diagram: the tape server runs inputd and outputd; the Linux cluster consists of a master PC and basf processes, connected over the NSM network]
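The inputd / basf / outputd arrangement above can be mimicked with ordinary thread-safe queues. The sketch below uses in-process queues purely to illustrate the data flow; in the real system the pieces run as separate daemons on the tape server and cluster nodes and communicate over NSM, which is not reproduced here.

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // A small thread-safe queue standing in for the NSM-based transport
    // between inputd, the basf worker processes, and outputd.
    template <typename T>
    class Queue {
      std::queue<T> q_;
      std::mutex m_;
      std::condition_variable cv_;
    public:
      void push(T v) { { std::lock_guard<std::mutex> l(m_); q_.push(v); } cv_.notify_one(); }
      T pop() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return !q_.empty(); });
        T v = q_.front(); q_.pop(); return v;
      }
    };

    int main() {
      const int n_events  = 20;
      const int n_workers = 4;          // one basf process per CPU in the real system
      Queue<int> to_workers, to_output; // event id -1 is used as an end-of-stream marker

      // "inputd": reads events from tape and hands them to the workers.
      std::thread inputd([&] {
        for (int evt = 0; evt < n_events; ++evt) to_workers.push(evt);
        for (int i = 0; i < n_workers; ++i)      to_workers.push(-1);
      });

      // "basf" workers: process each event and forward the result.
      std::vector<std::thread> workers;
      for (int w = 0; w < n_workers; ++w)
        workers.emplace_back([&] {
          for (int evt; (evt = to_workers.pop()) != -1; )
            to_output.push(evt);        // reconstruction would happen here
        });

      // "outputd": collects processed events and writes them back to tape.
      int done = 0;
      for (; done < n_events; ++done) to_output.pop();
      std::printf("wrote %d events\n", done);

      inputd.join();
      for (auto& w : workers) w.join();
      return 0;
    }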
20. CPU / disk servers
- Sun CPU
- 9 servers (0.5 GHz x 4 CPUs)
- 38 computing servers (same configuration)
- operated under the LSF batch system
- tape drives (2 each for 20 hosts)
- Linux CPU
- 60 computing servers (Intel Xeon, 0.7 GHz x 4 CPUs)
- central CPU engines for DST/MC production
- Disk servers / storage
- Tape library
- DTF2 tapes (200 GB), 24 MB/s I/O
- 500 TB total
- 40 tape drives
- 8 TB NFS file servers
- 120 TB HSM servers
- 4 TB staging disk
21. Data size
- Raw data: 35 KB/event
- DST data: 58 KB/event
- mini-DST data: 12 KB/event (for hadronic events)
- Total raw data: 120 TB for 120 fb-1
- 1000 DTF2 (200 GB) tapes
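Since the table above fixes the per-event sizes, the implied event count follows from simple division. The sketch below works that through, treating the raw data as exactly 35 KB per event, which is of course only an order-of-magnitude exercise.

    #include <cstdio>

    int main() {
      const double raw_kb_per_event = 35.0;    // raw data size per event
      const double total_raw_tb     = 120.0;   // for 120 fb^-1
      const double lumi_fb          = 120.0;

      // Implied number of recorded events and raw-data volume per fb^-1.
      double n_events  = total_raw_tb * 1e9 / raw_kb_per_event;  // 1 TB = 1e9 KB
      double tb_per_fb = total_raw_tb / lumi_fb;
      std::printf("recorded events : %.1fe9\n", n_events / 1e9);
      std::printf("raw data volume : %.1f TB per fb^-1\n", tb_per_fb);
      return 0;
    }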