1
BaBar Clusters
  • Charles Young, SLAC

2
Supported Platforms
  • Initially AIX, HP and OSF.
  • Added Solaris and Linux. Dropped HP and AIX.
  • Now:
  • Linux: rapid growth.
  • OSF: a little bit.
  • Solaris: the only one for online.
  • Future?
  • Good to have more than one vendor.
  • Not too many.

3
BaBar Clusters
  • Online.
  • Offline.
  • Multiple reconstruction farms.
  • Prompt and Non-Prompt (or reprocessing).
  • Physics skims.
  • Monte Carlo production.
  • At SLAC and other sites, e.g. ½ in U.K.
  • Analysis.
  • At SLAC and other sites, e.g. CC-IN2P3.

4
Reconstruction Cluster
  • Prompt Reconstruction in almost real time.
  • Feedback to data taking.
  • Calibration constants.
  • Output for downstream physics analyses.
  • Good uptime and reliability.
  • Controlled environment, organized activity.
  • Equipment manageability and density issues.
  • Initially (and still) all Solaris.

5
Reconstruction Cluster Configuration
  • Up to 200 farm CPUs.
  • Solaris only, due to online related code.
  • Sun T1: 1 RU, 1 CPU, 440 MHz, 256 MB.
  • Several (Objectivity) file servers.
  • All Solaris.
  • Typically Sun 420R with 4 CPUs.
  • Typically 2 x T3 500-GB disk arrays.
  • Gigabit Ethernet.

6
Reconstruction Cluster Experience
  • Hardware generally reliable.
  • Customized job control instead of a batch system (sketched below).
  • More focused on specific needs.
  • Increased maintenance.
  • Scaling issues.
  • Tightly coupled system.
  • Serialization or choke points.
  • Reliability concerns when scaling up.
  • (Weak) connection to accelerator schedule.
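
A minimal sketch of what "customized job control instead of a batch system" can look like: a dispatcher hands runs to a fixed set of farm nodes and retries failed pieces. Everything here is hypothetical (node names, the recoJob executable, run numbers) and the loop is serialized for brevity; it only illustrates why such a system is more focused on specific needs but also carries more maintenance than a general batch system.

    # Hypothetical dispatcher sketch, not the actual BaBar control system.
    import subprocess

    NODES = ["farm%03d" % i for i in range(1, 201)]   # assumed farm node names
    RUNS = list(range(10000, 10200))                  # assumed run numbers

    def process(node, run):
        """Run a (hypothetical) reconstruction executable for one run on one node."""
        return subprocess.call(["ssh", node, "recoJob", "--run", str(run)])

    pending = list(RUNS)
    while pending:
        batch, pending = pending[:len(NODES)], pending[len(NODES):]
        failed = [run for node, run in zip(NODES, batch) if process(node, run) != 0]
        pending += failed        # simply retry failures (no retry limit in this sketch)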

7
Reconstruction Cluster Plans
  • Aggressively pursuing Linux farm nodes.
  • VA Linux 1220: 1 RU, 2 CPUs, 866 MHz, 1 GB.
  • Higher density, faster CPUs and lower cost.
  • Reduced networking per unit CPU power.
  • Fewer links per CPU: one link for 2 CPUs (see the estimate below).
  • Increase file server capacity.
  • Linux compatibility issues.
  • Some online related code.
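
A back-of-the-envelope version of the networking point, assuming each farm box has a single network link of the same speed (the farm-node link speed is not given on the slides):

    # Relative network bandwidth per unit CPU power, old vs. new farm nodes.
    # Assumes one identical link per box; purely illustrative arithmetic.
    t1_power     = 1 * 440    # Sun T1: 1 CPU x 440 MHz
    va1220_power = 2 * 866    # VA Linux 1220: 2 CPUs x 866 MHz

    ratio = va1220_power / t1_power
    print("CPU power per box (and per link) grows by ~%.1fx" % ratio)
    # ~3.9x more CPU power behind each link, hence the push to grow
    # file server capacity along with the farm upgrade.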

8
Monte Carlo Cluster
  • MC production.
  • Physics simulation.
  • Mix in real beam background.
  • Reconstruct with standard code.
  • Capacity related to volume of beam data.
  • Controlled environment, organized activity.
  • Very similar to Reconstruction.

9
Monte Carlo Cluster Configuration
  • 100 farm CPUs. Mix of Solaris and Linux.
  • Sun T1: 1 RU, 1 CPU, 440 MHz, 256 MB.
  • VA Linux 1220: 1 RU, 2 CPUs, 866 MHz, 1 GB.
  • Several (Objectivity) file servers.
  • All Solaris.
  • Typically Sun E4500 with 4 CPUs.
  • Typically 1 x A3500 500-GB disk array.
  • Gigabit Ethernet.

10
Monte Carlo Cluster Experience
  • Hardware generally reliable.
  • Batch system (LSF) generally adequate.
  • Scaling issues.
  • Little schedule interaction with accelerator.
  • Want to be up 24 x 365?
  • No major platform issues.
  • Code runs on all BaBar platforms.
  • External code, e.g. Objectivity, Geant4.

11
Monte Carlo Cluster Plans
  • Capacity to scale with integrated luminosity.
  • Many production sites.
  • Increase compute power.
  • More and faster (Linux) nodes.
  • Increase file server capacity.
  • More flexibility in solving scaling issues.
  • Very few choke points.
  • Embarrassingly parallel (see the submission sketch below).
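
Because each production run is independent, scaling out mostly means submitting more jobs. A minimal sketch of driving the LSF batch system from a script; bsub and its -q/-o options are standard LSF, while the queue name, executable and run numbers are hypothetical:

    # Submit independent MC production jobs, one per run.
    import subprocess

    for run in range(20000, 20100):            # assumed run-number range
        subprocess.check_call([
            "bsub",                            # LSF job submission command
            "-q", "mc_prod",                   # hypothetical queue name
            "-o", "logs/run%d.log" % run,      # per-job log file
            "bbsim", "--run", str(run),        # hypothetical production executable
        ])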

12
Analysis Cluster
  • Data stored (primarily) in Objectivity format.
  • Disk cache with HPSS back end.
  • Different levels of detail.
  • Tag, Micro, Mini, Reco, Raw, etc.
  • Varied access pattern (illustrated below).
  • High rate and many parallel jobs to Micro.
  • Lower rate and fewer jobs to Raw.
  • Uncontrolled environment and activities.
  • Activities coupled to conferences.
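
The varied access pattern follows from the tiered event format: many jobs scan the compact Tag and Micro levels at high rate, and only a few touch the bulky Reco or Raw levels. A schematic sketch of that pattern; the dict-based event store and the cuts are stand-ins, not the real Objectivity/HPSS interface:

    # Schematic tiered access: scan Tag/Micro for every event, fetch Raw
    # only for the survivors. "store" is a plain dict standing in for the
    # real Objectivity event store with its HPSS-backed disk cache.

    def select_candidates(store, event_ids):
        """High-rate pass over the small Tag and Micro data."""
        for ev in event_ids:
            tag = store["Tag"][ev]              # a few bytes per event
            if tag["nTracks"] >= 4:             # cheap pre-selection
                micro = store["Micro"][ev]      # summary physics quantities
                if micro["mass"] > 5.2:         # illustrative cut
                    yield ev

    def detailed_study(store, event_ids):
        """Low-rate pass: touch the large Raw data only when needed."""
        for ev in select_candidates(store, event_ids):
            raw = store["Raw"][ev]              # big, possibly staged from tape
            print("event", ev, ":", len(raw), "bytes of raw data")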

13
Analysis Cluster Configuration
  • 200 farm CPUs.
  • Sun T1: 1 RU, 1 CPU, 440 MHz, 256 MB.
  • VA Linux 1220: 1 RU, 2 CPUs, 866 MHz, 1 GB.
  • Several (Objectivity) file servers.
  • All Solaris.
  • Typically Sun E4500 with 4 CPUs.
  • Typically 2 x T3 500-GB disk arrays.
  • Gigabit Ethernet.

14
Analysis Cluster Experience
  • Hardware generally reliable.
  • No major platform issues.
  • Code runs on all BaBar platforms.
  • External code, e.g. Objectivity, Geant4.
  • Batch system (LSF) generally adequate.
  • Minor schedule interaction with accelerator.
  • Major schedule interaction with conferences.

15
Analysis Cluster Plans
  • Increase computing power.
  • More and faster (Linux) nodes.
  • Increase file server capacity.
  • Scales with integrated luminosity.
  • Flexibility in solving scaling issues.
  • Reduce choke points.
  • Embarrassingly parallel.

16
Offline Clusters: Equipment
  • Similar servers.
  • 40 4-CPU Sun machines with 2 x 500 GB disks each.
  • Gigabit Ethernet.
  • Two kinds of farm nodes.
  • 900 x Sun T1: 1 RU, 1 CPU, 440 MHz, 256 MB.
  • 60 x VA Linux 1220: 1 RU, 2 CPUs, 866 MHz, 1 GB.
  • Straightforward to reassign servers and clients (see the sketch below).
  • Especially batch nodes.
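
Reassigning a batch node is mostly a matter of draining it and moving it to another host group in the LSF configuration. A sketch of the drain step, using standard LSF commands (badmin hclose/hopen, bjobs) with a hypothetical host name:

    # Drain a farm node before reassigning it; "farm042" is hypothetical.
    import subprocess, time

    HOST = "farm042"
    subprocess.check_call(["badmin", "hclose", HOST])   # stop new dispatch

    # Poll until no running or pending jobs are listed on the host.
    while True:
        out = subprocess.run(["bjobs", "-u", "all", "-m", HOST],
                             capture_output=True, text=True).stdout
        if "RUN" not in out and "PEND" not in out:
            break
        time.sleep(60)

    # The node can now be moved to another host group in the LSF
    # configuration and reopened with: badmin hopen farm042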

17
Batch System (LSF) Issues
  • Licensing costs.
  • Significant compared with H/W.
  • Scaling concerns.
  • Many jobs.
  • Complex priority algorithm.
  • Failed to dispatch fast enough.
  • Idle nodes? (See the monitoring sketch below.)
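
One way to see the dispatch problem is to compare pending jobs with idle job slots: if both stay nonzero for long, the scheduler itself is the choke point. A small monitoring sketch using standard LSF query commands (bjobs, bhosts); the column parsing is simplified and the interpretation is illustrative:

    # Crude "pending jobs while slots sit idle" check.
    import subprocess

    def lsf_lines(cmd):
        return subprocess.run(cmd, capture_output=True,
                              text=True).stdout.splitlines()

    # Pending jobs across all users.
    pending = sum(1 for l in lsf_lines(["bjobs", "-u", "all", "-p"])
                  if " PEND " in l)

    # Idle slots per host: MAX minus NJOBS (4th and 5th columns of the
    # default bhosts output).
    idle = 0
    for line in lsf_lines(["bhosts"])[1:]:              # skip header line
        cols = line.split()
        if len(cols) >= 5 and cols[3].isdigit() and cols[4].isdigit():
            idle += int(cols[3]) - int(cols[4])

    print("pending jobs:", pending, "idle slots:", idle)
    if pending and idle:
        print("jobs are waiting while slots are free: dispatch is the bottleneck")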

18
Offline Clusters: Schedule
  • Few outage opportunities.
  • Prompt Reconstruction tied to accelerator.
  • Peak analysis load before conferences.
  • Vendor schedule.
  • H/W and S/W availability.
  • BaBar code readiness.
  • Extensive testing required.

19
Offline Cluster Plans
  • Cost concerns.
  • Choice among multiple platforms/vendors.
  • Schedule concerns.
  • Must support multiple OS levels.
  • Essential to have capacity headroom.
  • Scalability concerns.
  • Must reduce choke points.
  • Should be embarrassingly parallel.

20
BaBar Computing Centers
  • Divided into Tier A, B and C.
  • Union of all Tier As > all data.
  • Two Tier-A centers currently.
  • SLAC.
  • CC-IN2P3 fully operational by end of 2001.
  • Potentially more in the future.
  • (Almost) all usual activities at all Tier-A sites.
  • Simulation, reprocessing, analysis.
  • Prompt reconstruction only at SLAC.

21
The G Word
  • Nirvana.
  • Automatic discovery of resources.
  • Automatic job configuration and execution.
  • Results at desktop instantly.
  • Foreseeable future.
  • Assisted discovery of resources.
  • Assisted job configuration and submission.
  • Results at desktop before retirement.
  • Rapid evolution and changes.

22
GRID and Future of BaBar Clusters
  • Offline clusters more likely GRID adaptable.
  • Distributed resources.
  • Distributed management.
  • Distributed users.
  • Cautious awareness and participation.
  • Be involved but not be seduced.