Title: US-CMS Core Application Software Demonstration
1. US-CMS Core Application Software Demonstration
- DOE/NSF Review
- November 27, 2001
2. Introduction
- Progress through all levels of the CMS production chain
  - Simulated Event Production
    - Job Specification
    - Generation and Simulation
    - Reconstruction
    - Digitization and Combination with minimum bias
  - Distributed Computing
    - Grid Scheduling
    - Use of Distributed Computing Resources
    - Automatic Return of results using grid applications
  - Data Analysis
    - User interactions with the database
    - Plotting/Fitting
    - Event Visualization
- Some of the pieces will be shown in abbreviated time
3. Specification and Job Creation
- US-CMS has taken the lead on job specification, creation, and submission.
- Two products were created:
  - IMPALA, to specify job parameters for all elements of the CMS production chain
  - MC_RUNJOB, which specifies scripts, provides an interface for job tracking and parameter storage, and allows chaining of steps
- IMPALA has made production smoother and more reproducible.
  - Used by almost all production sites
- MC_RUNJOB is being released for production this week.
  - A joint development effort between CMS and D0
  - Builds on the IMPALA specification
  - Will make production more automated
4. CMS Production with IMPALA
The IMPALA scripts:
(1) Discover input sources and parameters
(2) Fix parameters, create production scripts, and track production jobs
(3) Define input parameters, executable production scripts, and the tracking DB interface
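As a rough illustration of this pattern, the Python sketch below discovers a list of input parameters, fills a script template, and records the resulting jobs in a small tracking file. The command name, parameter names, and file layout are invented for illustration; this is not the actual IMPALA implementation.

```python
import json
from string import Template

# Hypothetical script template: the placeholders stand in for the parameters an
# IMPALA-style tool would fix per job (run number, input file, number of events).
SCRIPT_TEMPLATE = Template(
    "#!/bin/sh\n"
    "# auto-generated production script (illustrative only)\n"
    "simulate --run $run --input $input_file --nevents $nevents\n"
)

def discover_inputs():
    """Stand-in for step (1): discovering input sources and parameters."""
    return [{"run": 3001, "input_file": "gen_3001.ntpl", "nevents": 500},
            {"run": 3002, "input_file": "gen_3002.ntpl", "nevents": 500}]

def create_jobs(inputs, tracking_db="jobs.json"):
    """Steps (2)-(3): fix parameters, emit scripts, record them for tracking."""
    tracked = []
    for params in inputs:
        script_name = f"run_{params['run']}.sh"
        with open(script_name, "w") as f:
            f.write(SCRIPT_TEMPLATE.substitute(params))
        tracked.append({"script": script_name, "status": "created", **params})
    with open(tracking_db, "w") as f:
        json.dump(tracked, f, indent=2)   # minimal stand-in for a tracking DB
    return tracked

if __name__ == "__main__":
    print(create_jobs(discover_inputs()))
```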
6. Production Chain
- CMKIN is the Pythia-based event generator
- CMSIM is the last piece of CMS Fortran code, providing the GEANT3 simulation
  - FZ Zebra files are created
- ORCA is the reconstruction code; reconstructed events are stored in Objectivity
  - Signal and minimum bias are combined
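A minimal sketch of the chain's ordering and data handoffs is shown below; the file names, extensions, and launch mechanism are assumptions, not the real tools' interfaces.

```python
from pathlib import Path

# Illustrative chain mirroring the slide's ordering; names are stand-ins.
CHAIN = [
    ("CMKIN (Pythia generation)",   None,          "events.ntpl"),
    ("CMSIM (GEANT3 simulation)",   "events.ntpl", "events.fz"),
    ("ORCA hit formatting",         "events.fz",   "hits.objy"),
    ("ORCA digitization + pile-up", "hits.objy",   "digis.objy"),
]

def run_step(name, infile, outfile):
    """Stand-in for launching one production step on the previous step's output."""
    if infile is not None and not Path(infile).exists():
        raise FileNotFoundError(f"{name} needs {infile} from the previous step")
    Path(outfile).touch()          # placeholder for the real output file
    print(f"{name}: {infile or 'generator cards'} -> {outfile}")

for name, infile, outfile in CHAIN:
    run_step(name, infile, outfile)
```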
7. Farm Setup
- Almost any computer can run the CMKIN and CMSIM steps.
8. Farm Setup
- The first step of the reconstruction is Hit Formatting, where simulated data is taken from the FZ files, formatted, and entered into the Objectivity database.
- The process is fast enough, and involves enough data, that more than 10-20 concurrent jobs will bog down the database server (a throttling sketch follows below).
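One way to picture the constraint is a fixed pool of slots on the database server, as in the hedged Python sketch below; the limit of 10 concurrent jobs and the job function are illustrative, not the mechanism actually used on the farm.

```python
import threading
import time

# Hypothetical throttle: cap the number of hit-formatting jobs talking to the
# database server at once (the slide suggests 10-20 is already too many).
MAX_CONCURRENT_FORMATTING = 10
db_slots = threading.Semaphore(MAX_CONCURRENT_FORMATTING)

def hit_formatting_job(job_id):
    with db_slots:                 # wait for a free slot on the DB server
        time.sleep(0.1)            # stand-in for formatting + database insertion
        print(f"job {job_id} finished formatting")

threads = [threading.Thread(target=hit_formatting_job, args=(i,)) for i in range(40)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```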
9. Farm Setup
- The most advanced production step is digitization with pile-up.
- The response of the detector is digitized, the physics objects are reconstructed and stored persistently, and at full luminosity 200 minimum bias events are combined with each signal event.
- Because of the large number of minimum bias events, multiple Objectivity AMS data servers are needed. Several configurations have been tried.
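The sketch below is a purely conceptual picture of pile-up mixing: a fixed number of minimum bias events is attached to each signal event, with the reads spread across several pile-up servers. The server names, event representation, and round-robin policy are invented for illustration.

```python
import random

PILEUP_SERVERS = [f"ams{n}" for n in range(1, 10)]   # e.g. 9 AMS pile-up servers
MINBIAS_PER_SIGNAL = 200                             # full-luminosity pile-up

def digitize(signal_event):
    """Combine one signal event with MINBIAS_PER_SIGNAL minimum bias events."""
    mixed = [signal_event]
    for i in range(MINBIAS_PER_SIGNAL):
        server = PILEUP_SERVERS[i % len(PILEUP_SERVERS)]   # spread the read load
        mixed.append({"type": "minbias", "from": server,
                      "id": random.randrange(10**6)})
    return mixed

event = digitize({"type": "signal", "id": 42})
print(len(event), "events combined for digitization")   # 1 signal + 200 minimum bias
```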
10. CMS Fall 2000 Production (FNAL)
- FNAL Hardware
  - 40 dual-node 750 MHz Intel-based worker nodes
  - 3 quad-node 650 MHz Intel-based server nodes
  - 1 250 GB RAID5 partition (Dell Powervault) per server
  - 1 soon-to-be-retired dual-CPU server with 1.5 TB RAID (the RAID will be salvaged)
  - 100 Mb/s Ethernet (soon to be upgraded to Gb Ethernet)
  - 1 8-CPU 400 MHz Sun server with 1 TB RAID for the user federation
- FNAL Experience (or why we use multiple federations)
  - Limited by the AMS server to 15 concurrent formatting jobs, but overcame this by going to multiple federations
  - More than 3 hit formatting jobs per federation will starve digitization
  - The file descriptor limit for the AMS server was raised to 4096
  - Pileup, pileup, and more pileup
- FNAL Farm
  - 60 CPUs processing digitization jobs require about 833 Mb/s of pileup data on average (see the back-of-envelope check below)
  - 9 pileup servers on the 100 Mb/s network are used for full pileup
  - But we did not reach the network limit
  - The FBSNG batch manager is used to configure the farm
  - Pileup-intensive jobs are required NOT to run on pileup-serving worker nodes
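A back-of-envelope check of the numbers quoted on this slide, assuming the load spreads evenly over the nine pile-up servers:

```python
# Illustrative arithmetic only, using the figures from the slide.
digitization_cpus = 60
total_pileup_rate = 833          # Mb/s of pile-up data needed on average
pileup_servers    = 9
link_speed        = 100          # Mb/s Ethernet per server

per_job_rate    = total_pileup_rate / digitization_cpus      # ~14 Mb/s per digitization job
per_server_rate = total_pileup_rate / pileup_servers         # ~93 Mb/s per pile-up server
headroom        = pileup_servers * link_speed - total_pileup_rate   # ~67 Mb/s aggregate

print(f"{per_job_rate:.0f} Mb/s per job, {per_server_rate:.0f} Mb/s per server, "
      f"{headroom:.0f} Mb/s of aggregate headroom")
```

This is consistent with the statement that the network limit was not reached, though with little margin on the 100 Mb/s links.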
11. Example Objy Server Deployment at FNAL
- 4 production federations at FNAL (the catalog is used only to locate database files)
- 3 FNAL servers plus several worker nodes are used in this configuration:
  - 3 federation hosts with attached RAID partitions
  - 2 lock servers
  - 4 journal servers
  - 9 pileup servers
12. Distributed Computing
- The production required to complete the TDRs and Data Challenges rapidly overwhelms any single production facility.
- To complete the required production, CMS must enlist the help of many centers.
| Site        | Simulation        | Digitization (no PU) | Digitization (PU) | GDMP        | Production tools |
|-------------|-------------------|----------------------|-------------------|-------------|------------------|
| CERN        | Fully operational | Fully operational    | Fully operational | ?           | ?                |
| FNAL        | Fully operational | Fully operational    | Fully operational | ?           | ?                |
| Moscow      | Fully operational | Fully operational    | Fully operational | ?           |                  |
| INFN        | Fully operational | Fully operational    | Fully operational | In progress | ?                |
| UCSD        | Fully operational | Fully operational    | Fully operational | ?           | ?                |
| Caltech     | Fully operational | Fully operational    | Fully operational | ?           | ?                |
| Wisconsin   | Operational       | Operational          | Starting          | ?           |                  |
| IN2P3       | Operational       | Operational          | Not operational   | In progress | ?                |
| Bristol/RAL | Operational       | Operational          | Starting          | ?           | ?                |
| Helsinki    | Operational       | Not operational      | Not operational   |             |                  |
| UFL         | Starting          | Not operational      | Not operational   |             |                  |
13. CMS-PPDG SuperComputing 2001 Demo
14. User Analysis
- Once data is simulated and in the database, the analysis can begin.
- A separate summary format can be created:
  - ROOT file example created at Fermilab
  - Ntuple files: JetMet PRS group analysis ntuples generated and stored at FNAL
  - Both of these have the disadvantage of breaking the connection to the database
- A TAG summary format is created:
  - Ntuple-like data is stored in the database
  - The connection is maintained; the user can access higher levels of the database
15. Creation of TAGs
- Users can create tags or use tags generated at production time.
- To create tags, a shallow copy of the database is created.
16. Processing Tags
- Tags can be used to select events and perform basic end-game analysis steps (see the sketch below):
  - Making plots
  - Applying cuts
  - Performing fits
- The tags are fairly small
- The shallow copy is stored locally
- Tags allow the user to access higher levels of the database for the selected events
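A minimal sketch of these end-game steps on TAG-like data is given below; the field names and toy numbers are invented, and only the cut/histogram/fit pattern follows the slide.

```python
import numpy as np

# Toy TAG-like summary: one small record per event (fields are hypothetical).
rng = np.random.default_rng(1)
tags = {
    "jet_et":   rng.exponential(50.0, size=10_000),
    "met":      rng.normal(30.0, 8.0, size=10_000),
    "event_id": np.arange(10_000),
}

# Apply a cut (select events) on the small, locally stored summary.
selected = tags["jet_et"] > 100.0
print(f"{selected.sum()} events pass the jet Et cut")

# Make a plot-ready histogram of missing Et for the selected events.
counts, edges = np.histogram(tags["met"][selected], bins=40)
print(f"histogram peak bin contains {counts.max()} events")

# Perform a simple fit: Gaussian parameters estimated from the selected sample.
mu, sigma = tags["met"][selected].mean(), tags["met"][selected].std()
print(f"fitted mean = {mu:.1f}, width = {sigma:.1f}")

# The surviving event_ids are what one would use to reach back into the full
# database for the higher-level objects of the selected events.
selected_ids = tags["event_id"][selected]
```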
17. Event Visualization
- Once a small set of events has been selected, they can be visualized using IGUANA.