Title: PRS steering committee meeting February 12th, 2003
Towards defining the key parameters for the CMS Physics Model
- N. Neumeister, S. Wynhoff
2. Introduction
- Computing Model
  - define a possible computing model for LHC start-up; not optimized, but realistic
- LHC start-up scenario
  - proton-proton and heavy-ion physics
- Physics Model, Analysis Model, Data Model, Computing Model
- Some key parameters are needed in order to start simulation of the computing model
- Comparison with Run II Computing Review, June 2002
3. Computing Model
4. TriDAS
- Level 3 rate to tape: 100 Hz
- Days of running/year: 200 days
- LHC live time: 50%
- Number of express lines: 5
- Low and high luminosity, and heavy ions
- Divide into event output streams (data streams, datasets)
  - defined by HLT trigger table?
- Structure of express lines
  - duplication of raw data should be small
  - full reconstruction online or at T0?
- Number of express lines ~ number of analysis groups
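As a quick sanity check, the parameters above fix the yearly event count and raw-data volume. A minimal sketch; the 1 MByte raw event size is taken from the "Raw Data" slide later in this talk:

```python
# Yearly event count and raw-data volume implied by the TriDAS parameters.
# The 1 MByte raw event size comes from the "Raw Data" slide; everything
# else is from the bullets above.
rate_hz = 100          # Level 3 rate to tape
days_per_year = 200    # days of running per year
live_fraction = 0.5    # LHC live time (50%)
raw_event_mb = 1.0     # raw event size in MByte

live_seconds = days_per_year * 24 * 3600 * live_fraction
events_per_year = rate_hz * live_seconds
raw_pb_per_year = events_per_year * raw_event_mb / 1e9  # MByte -> PByte

print(f"{events_per_year:.2e} events/year")     # 8.64e+08
print(f"{raw_pb_per_year:.2f} PByte raw/year")  # 0.86
```

About 0.9 × 10^9 events and roughly a PByte of raw data per year, before any reconstructed or simulated data is added.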
5. TriDAS
(diagram: FU = Filter Unit, SU = Server Unit, PU = Processing Unit; HLT; monitoring, calibration; reconstruction, reprocessing, analysis; express lines)
- The result of the reconstruction will be saved along with the raw data in a large object database
6. Raw Data
- Raw data event size: 1 MByte
  - depends on event type, luminosity, proton-proton vs. heavy ions
- Reconstructed event size: 0.5 MByte
  - can be measured by PRS groups
- Raw data format: chunks or objects?
- Save information from HLT reconstruction?
- Data compression?
- Raw data size dominated by tracker
- CDF: raw 250 kByte, reco 65 kByte
- DØ: raw 250 kByte, reco 150 kByte
- Different raw/reco ratios - why?
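The raw/reco ratios quoted on this slide can be tabulated directly; a small illustrative sketch using only the numbers above:

```python
# Raw vs. reconstructed event sizes quoted on this slide, in kByte.
sizes_kb = {
    "CMS (planned)": (1000, 500),  # 1 MByte raw, 0.5 MByte reco
    "CDF": (250, 65),
    "D0": (250, 150),
}

# Ratio of raw to reconstructed event size per experiment.
ratios = {name: raw / reco for name, (raw, reco) in sizes_kb.items()}
for name, r in ratios.items():
    print(f"{name}: raw/reco = {r:.1f}")
```

CMS plans a ratio of 2, sitting between CDF (~3.8) and DØ (~1.7); the question on the slide is why the running experiments differ so much.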
7. Simulation
Event sizes and stored data for tape and central analysis disk cache; number of events on tape and disk for each tier, relative to raw data (DØ table below).
- Number of simulated events
- Fast vs. full simulation
- Event size
- Save Hits or Digis?
- Simulation time: GEANT4 vs. GEANT3 (CMSIM)
- Production team
- DØ: data samples comparable to half the collider data rate; all MC generation is offsite (25% full GEANT and 75% parameterized)
  - The MC TMB is twice as large as the comparable collider data tier because more information is stored.
- CDF: data samples comparable to half the collider data rate; most of the MC generation is performed at FNAL.
DØ:
Data type          Size (MB)  Tape factor  Disk factor
raw event          0.25       1            0.001
data DST           0.15       1.2          0.1
data TMB           0.01       2            1
data root/derived  0.01       8            0
MC DØGstar         0.7        0.1          0
MC DØSim           0.3        0            0
MC DST             0.15       1            0.2
MC TMB             0.02       3            0.5
Fast MC            0.02       2            0.5
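The tape and disk factors in the DØ table multiply the per-event size by the fraction (or multiplicity) of events kept in each tier. A sketch of the implied per-collider-event storage for the data tiers:

```python
# Per-event tape/disk volume implied by the D0 table above for the
# collider-data tiers: volume = size x factor, summed over tiers.
tiers = {  # name: (size in MB, tape factor, disk factor)
    "raw event":         (0.25, 1.0, 0.001),
    "data DST":          (0.15, 1.2, 0.1),
    "data TMB":          (0.01, 2.0, 1.0),
    "data root/derived": (0.01, 8.0, 0.0),
}

tape_mb = sum(size * tape for size, tape, _ in tiers.values())
disk_mb = sum(size * disk for size, _, disk in tiers.values())
print(f"tape: {tape_mb:.3f} MB/event, disk: {disk_mb:.5f} MB/event")
```

Roughly 0.53 MB goes to tape per collider event, but only ~0.025 MB to disk: the small TMB tier dominates what analysts actually touch.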
8. Calibration/Alignment
- Quasi-online?
  - laser runs, pedestal runs
  - slow control, e.g. temperature
  - minbias events
  - special physics events (W, Z, ...)
- Number of calibration streams
- Number of events needed -> PRS
- Correlation between calibrations
- Feedback to online from
  - Level 3 (minutes)
  - fast offline (hours-days)
  - detailed offline (days-months)
- Trigger efficiencies
  - hope to get first numbers soon from PRS groups
9. Reconstruction
- Reconstruction is structured in several hierarchical steps:
  - Detector-specific processing: detector data unpacking and decoding, applying detector calibration constants, reconstructing clusters or hit objects.
  - Tracking: hits in the silicon and fiber detectors are used to reconstruct global tracks.
    - This is one of the most CPU-intensive activities.
  - Vertexing: primary and secondary vertex candidates.
  - Particle identification: produces the objects most associated with physics analyses. Using a wide variety of sophisticated algorithms, standard physics object candidates are created (electrons, photons, muons, missing ET, jets, heavy quarks, tau decays).
- Re-reconstruction passes (CDF: twice)
  - requires calibration/alignment
  - pre-requisites (tracker, calo)
- Reconstruction time (tracking)
- Luminosity dependency (instantaneous and bunch spacing)
- Output of reconstruction: DST?
- Measure ORCA timing for different event types and different luminosities
10. Reconstruction
- DØ
  - Today: 15 sec/event (@ 500 MHz CPU), ~6.5 GHz-sec/evt
    - detector: 2 sec, tracking: 8 sec, vertexing: 0.2 sec, particle identification: 3 sec
  - These times will grow significantly as the instantaneous luminosity of the accelerator (and thus the number of interactions per event) increases: an increase of a factor of 4 is observed in tracking times when going from 2 to 5 interactions per event. Processing time increases dramatically for physics-enriched samples; assume a reconstruction processing time of 50 sec/event for Run 2b.
- CDF
  - 2.5 GHz-sec/evt, dominated by tracking
  - increase of a factor of 2 at high luminosity!
DØ:
Instantaneous luminosity (cm^-2 s^-1)  Reconstruction time (@ 500 MHz CPU, sec/evt)
9 x 10^31                              25
20 x 10^31                             35
50 x 10^31 (6 interactions/bx)         80
50 x 10^31 (2 interactions/bx)         32
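These per-event times translate directly into farm size: a farm keeps up when (event rate) x (reconstruction time) CPUs are available. A rough sketch using the DØ times above; the 100 Hz rate is the CMS Level 3 rate from the TriDAS slide, used here only for illustration:

```python
# Naive farm sizing: CPUs needed = event rate (Hz) x reco time (sec/evt).
# Times are the D0 numbers above (500 MHz CPUs); the 100 Hz rate is an
# assumption borrowed from the CMS TriDAS parameters for illustration.
rate_hz = 100
reco_times = {  # luminosity label -> sec/event
    "9e31": 25,
    "20e31": 35,
    "50e31, 6 int/bx": 80,
}

cpus_needed = {lumi: rate_hz * t for lumi, t in reco_times.items()}
for lumi, cpus in cpus_needed.items():
    print(f"L = {lumi} cm^-2 s^-1: {cpus} x 500 MHz CPUs to keep up")
```

The factor-of-3 growth in reconstruction time from low to high luminosity maps one-to-one onto the required farm size.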
11. Event Data Model
- At present CMS has no Event Data Model (CCS/PRS)
- Should start with a prototype soon
- Number of standard levels (size?)
- Linked: raw <-> DST <-> ESD (tags?) <-> analysis-specific ntuples
- Typical analysis: track refit, event display (interactive analysis)
- Meta data (lumi per event/dataset, trigger configuration, etc.)
- Physics groups: standard ntuples?
- DST content: compressed raw data objects? analysis objects (tracks, electrons, muons, jets, etc.)
- DST stripping?
- Is it possible to define standard/common parts of ESD / ntuples?
12. DØ
- DST (Data Summary Tier)
  - All high-level physics objects, plus some detector-level information to allow calibration and limited re-reconstruction.
  - Slightly more DST than raw data will be stored on tape to allow for some reprocessing.
  - 100% on tape, partial set on disk (150 kByte/evt).
- TMB (Thumbnail)
  - All high-level physics objects; good for most physics analyses.
  - Physics summary format, presumed to be the starting point for most user analyses.
  - 100% on tape, 100% on disk at central and regional centers (10 kByte/evt).
- Derived Datasets
  - Physics/ID groups or their subgroups may create derived datasets from either DST or TMB in their chosen format and are responsible for maintaining these datasets. Root-tuples (0.2 sec/event to create). The TMB can be used directly to perform many useful analyses. In addition, it allows the rapid development of event selection criteria that will subsequently be applied to the DST sample.
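The 0.2 sec/event figure for root-tuple creation sets the turnaround time for spinning a derived dataset off the TMB. A sketch; the dataset size and farm size are assumptions for illustration, only the per-event time comes from the slide:

```python
# Turnaround for creating a derived root-tuple from the TMB.
# 0.2 sec/event is the D0 figure quoted above; the dataset size and
# the number of CPUs are assumed for illustration.
events = 1e9           # assumed dataset size (events)
sec_per_event = 0.2    # root-tuple creation time, from the slide
cpus = 100             # assumed farm size

days = events * sec_per_event / cpus / 86400
print(f"{days:.1f} days on {cpus} CPUs")
```

Even at 0.2 sec/event, a billion-event pass takes weeks on a modest farm, which is why starting most user analyses from the compact TMB rather than the DST matters.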
13. CDF
- Primary Data Sets (PAD)
  - raw data + physics objects, 100 kByte/evt on disk
  - a collection of ~100 data sets, each 10^7 events
- Secondary Data Sets
  - produced in an official, organized way by physics groups
  - fast access format
  - 50-100 kByte/event
- Tertiary Data Sets
  - 2 types of standard ntuple formats (ROOT based), optimized stripped PADs
14. Organization/Analysis
- Number of physics groups
- Number of ID groups??
- Number of analyses (correlation with publications)
- Number of active physicists
- Prioritizing of analyses (in case of discoveries)
- Physics coordination
- DØ
  - 6 analysis groups (Higgs, B physics, New Phenomena, QCD, Top, WZ)
  - 8 ID groups (EM ID, Muon ID, Jet/MET ID, Tau ID, B ID, Luminosity, Jet Energy Scale, Fwd proton)
- CDF
  - 5 analysis groups (B, Electroweak, Top, Exotics, QCD), with subgroups
  - 6 ID groups (electron, muon, tau, photon, jets, b-tagging)