Title: Persistent User Data using Objectivity
The missing Milestone
- Vincenzo Innocente
- CERN/EP/CMC
Introduction
- The last RD45 milestone was about private persistent data and classes
- Although a model was developed and prescriptions were provided, there was no evidence that it would work in a HEP-experiment production environment
- In CMS, following and extending the RD45 model, we have developed procedures which allow any physicist
  - to develop and test private persistent classes
  - to manage their own private persistent objects
HEP Data
- Environmental data
  - Detector and Accelerator status
  - Calibrations, Alignments
- Event-Collection Meta-Data
  - (luminosity, selection criteria, ...)
- Event Data, User Data
Navigation is essential for effective physics analysis. Complexity requires coherent access mechanisms.
Requirements
- Software Development
  - Physics reconstruction developers should be able to develop, test and integrate persistent classes without interfering with other developments (as for transient classes)
  - End users should be able to develop and use private persistent classes
- Data
  - Physicists (end users) should be able to access any kind of data without interfering with its production
  - Physicists should be able to populate private databases, using and referencing common objects, without interfering with production activities
- Environment
  - The development and running environment should be the same for system (experiment-wide) and user data
  - Access mechanisms should be the same for system and user data
Technical Solutions
- FD-Shallow-copy
  - A federation shallow-copy is a local copy of the .boot and .FDDB files (ooinstalled -nocatalog), with all original database files made read-only
- Development
  - Named schemas (a few, 5 or so) are used to avoid interference and ease integration
  - Development and tests are performed against an fd-shallow-copy
  - Schema is exchanged using ooschemadump/upgrade
  - Standard scripts (today making use of SCRAM, tomorrow integrated into SCRAM) are provided to parse DDL
  - A rich middleware of C++ classes, often templates, is provided to reduce (to zero?) the Objectivity-specific code that physicists need to know
  - In particular, a user development environment is provided to develop concrete Tags of simple structure
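To illustrate the fd-shallow-copy idea, the sketch below models a federation catalog as an in-memory C++ map from database name to file entry. It is a minimal sketch under stated assumptions. `FederationCatalog`, `DbFile`, `shallowCopy` and `isWritable` are hypothetical names, not Objectivity API. The real mechanism copies the .boot and .FDDB files, not an object.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical catalog entry: where a database file lives and whether it may be written.
struct DbFile {
    std::string path;
    bool readOnly;
};

// Hypothetical stand-in for a federation catalog (not Objectivity API).
class FederationCatalog {
public:
    // New databases created in this catalog are writable.
    void addDatabase(const std::string& name, const std::string& path) {
        dbs_[name] = DbFile{path, false};
    }

    // Shallow copy: the catalog is copied, the database files are not.
    // Every inherited entry still points at the original file but is read-only.
    FederationCatalog shallowCopy() const {
        FederationCatalog copy;
        for (const auto& [name, file] : dbs_) {
            copy.dbs_[name] = DbFile{file.path, true};
        }
        return copy;
    }

    bool isWritable(const std::string& name) const {
        auto it = dbs_.find(name);
        if (it == dbs_.end()) throw std::out_of_range("unknown database: " + name);
        return !it->second.readOnly;
    }

    const std::string& pathOf(const std::string& name) const {
        return dbs_.at(name).path;
    }

private:
    std::map<std::string, DbFile> dbs_;
};
```

A user's private databases are added to the shallow copy and are writable there. The shared production files stay reachable but cannot be modified from that copy.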
Technical Solutions
- Object shallow-copy
  - Local copy with (one-way) references to constituents
- Object deep-copy
  - Local copy with local copies of constituents
- Data
  - Users always start with a local federation shallow-copy
  - Events are never modified in place: reconstruction always generates a new event collection and a new event-data structure with a shallow copy of the parent event
  - Users can produce a deep copy of (part of) the event for a selected sample and generate a user collection
  - Concrete Tags (user private persistent objects) can be added to a user collection
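The difference between an object shallow-copy and deep-copy can be sketched in plain C++. This is a hedged illustration, not the COBRA implementation. `std::shared_ptr` stands in for a one-way persistent reference (ooRef), and `Hit`, `Event`, `shallowCopy` and `deepCopy` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for persistent objects; std::shared_ptr plays
// the role of a one-way reference to a constituent.
struct Hit {
    double x, y, z;
};

struct Event {
    std::string owner;
    std::vector<std::shared_ptr<const Hit>> hits;  // references to constituents
};

// Shallow copy: a new top-level object whose constituents are shared by reference.
Event shallowCopy(const Event& parent, const std::string& newOwner) {
    return Event{newOwner, parent.hits};
}

// Deep copy: a new top-level object with private replicas of every constituent.
Event deepCopy(const Event& parent, const std::string& newOwner) {
    Event copy{newOwner, {}};
    copy.hits.reserve(parent.hits.size());
    for (const auto& h : parent.hits) {
        copy.hits.push_back(std::make_shared<Hit>(*h));
    }
    return copy;
}
```

A shallow copy is cheap but still depends on the parent's storage. A deep copy can be moved to a private database and read without the original.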
Navigation
- Top Level
  - The user sees and navigates a Unix-like tree structure through a C++ or Python API (Shell)
  - Implementation is by Objy naming (the root is a database system name) or by any other object-containment mechanism mapped to a Unix-like tree by the Shell
- Collections
  - We use a fully hierarchical composite collection system with metadata associated with each component
  - It allows sequential and random access, with full support for fast user selection on metadata
  - It can be used to organize any kind of objects that need indexing but are updated only infrequently
- Event
  - Navigation within the event structure, and from the event to the configuration, is implemented using one-way references (pure ooRefs)
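The collection and top-level navigation points combine naturally. The minimal C++ sketch below shows a composite collection in which every node carries metadata and can be reached by a Unix-like path. Selection on metadata walks the tree. `Collection`, `addChild`, `find` and `select` are hypothetical names, and event ids stand in for persistent references.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical composite collection: every node carries metadata and may
// hold child collections (like directories) and leaf items (event ids here).
class Collection {
public:
    std::map<std::string, std::string> metadata;
    std::vector<int> items;

    // Return the named child, creating it if it does not exist yet.
    Collection& addChild(const std::string& name) {
        auto& slot = children_[name];
        if (!slot) slot = std::make_unique<Collection>();
        return *slot;
    }

    // Resolve a Unix-like path such as "/2000/jetmet"; nullptr if absent.
    const Collection* find(const std::string& path) const {
        const Collection* node = this;
        std::istringstream in(path);
        std::string part;
        while (std::getline(in, part, '/')) {
            if (part.empty()) continue;  // skip leading or repeated '/'
            auto it = node->children_.find(part);
            if (it == node->children_.end()) return nullptr;
            node = it->second.get();
        }
        return node;
    }

    // Walk this node and all descendants, collecting the items of each
    // component whose own metadata passes the predicate.
    void select(const std::function<bool(const std::map<std::string, std::string>&)>& pred,
                std::vector<int>& out) const {
        if (pred(metadata)) out.insert(out.end(), items.begin(), items.end());
        for (const auto& [name, child] : children_) child->select(pred, out);
    }

private:
    std::map<std::string, std::unique_ptr<Collection>> children_;
};
```

The selection decision reads only metadata, never the events themselves, which is what makes selection on metadata fast.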
Dataset Collection
[Diagram labels: MetaData; User Tag; Run Collection; Rec Event]
User Collection By Reference
[Diagram labels: MetaData; User Tag; DB Name (physical location); Context Name; Collection Name; Run Collection; Original RecEvent]
User Collections are populated by User Filters. Multiple User Filters (each populating a different User Collection) are allowed in a single ORCA job.
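Running several User Filters in one job can be sketched as a single pass over the input in which each event is offered to every filter. This is a hedged C++ illustration, not the ORCA filter interface. `RecEvent` here is a hypothetical minimal record. `UserFilter` and `runJob` are also hypothetical names, and event ids stand in for references to the original RecEvents.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical minimal event record: only the fields the filters look at.
struct RecEvent {
    int id;
    int nMuons;
    double missingEt;
};

// A user filter owns the user collection it populates. The collection
// stores references (here: event ids) to the original events, not copies.
struct UserFilter {
    std::string collectionName;
    std::function<bool(const RecEvent&)> accept;
    std::vector<int> collection;
};

// One pass over the input: every event is offered to every filter,
// so several user collections are filled in a single job.
void runJob(const std::vector<RecEvent>& input, std::vector<UserFilter>& filters) {
    for (const RecEvent& ev : input) {
        for (UserFilter& f : filters) {
            if (f.accept(ev)) f.collection.push_back(ev.id);
        }
    }
}
```

Because each collection holds only references, an event accepted by several filters is stored once and referenced from each collection.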
RecApplication I/O
[Diagram labels: Federation; Dataset Collection or User Collection; RecReader; Request; Store; Append new Run to a Dataset; Create/extend User Collections; Histograms, Tags]
The output Run is a new event collection containing new data (Digis, RecObjs) and either references to or replicas of the input data.
Output User Collections are unmodified sub-samples of the input collection.
Top Level Event Structure (COBRA5)
[Diagram labels: Run; Crossing; Trigger; Pile-up; SimEvent]
Raw Event
RawData are identified by the corresponding ReadOut. RawData belonging to different detectors are clustered into different containers. The granularity will be adjusted to optimize I/O performance. An index at RawEvent level avoids accessing every container when searching for a given RawData. A range index at RawData level could be used for fast random access in complex detectors.
[Diagram labels: RawEvent; ReadOut, ReadOut, ...; RawData, RawData]
The index is implemented as an ordered vector of pairs.
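The slide states only that the RawEvent index is an ordered vector of pairs. The minimal C++ sketch below assumes the pairs are (ReadOut id, container id) with integer identifiers. Those identifiers and the `RawEventIndex` class are hypothetical; the real index holds persistent references. Keeping the vector sorted lets a lookup use binary search instead of opening every detector container.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical RawEvent-level index: (readOutId, containerId) pairs kept
// sorted by readOutId, so finding a RawData's container is a binary search.
class RawEventIndex {
public:
    using Entry = std::pair<int, int>;  // (readOutId, containerId)

    void add(int readOutId, int containerId) {
        Entry e{readOutId, containerId};
        auto pos = std::lower_bound(entries_.begin(), entries_.end(), e);
        entries_.insert(pos, e);  // insert at the sorted position
    }

    // Returns the container holding the RawData of this ReadOut, or -1.
    int containerOf(int readOutId) const {
        auto pos = std::lower_bound(
            entries_.begin(), entries_.end(), readOutId,
            [](const Entry& e, int id) { return e.first < id; });
        return (pos != entries_.end() && pos->first == readOutId) ? pos->second : -1;
    }

private:
    std::vector<Entry> entries_;
};
```

A sorted vector makes insertion O(n) but lookup O(log n) with compact, contiguous storage. That trade-off suits an index that is written once when the event is stored and read many times afterwards.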
CMS Reconstructed Objects
Reconstructed Objects produced by a given algorithm are managed by a Reconstructor.
A Reconstructed Object (Track) is split into several independent persistent objects so that each can be clustered according to its access pattern (physics analysis, reconstruction, detailed detector studies, etc.). The top-level object acts as a proxy. Intermediate reconstructed objects (RHits) are cached by value in the final objects.
[Diagram labels: RecEvent; S-Track Reconstructor; esd; Track SecInfo; rec; S Track; ..; Track Constituents; aod; Vector of RHits; S Track]
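The proxy idea can be sketched in C++ as a top-level object that always holds a small summary and fetches the detailed constituents only on first use. This is a hedged illustration under stated assumptions. `TrackProxy`, `TrackSummary` and `TrackConstituents` are hypothetical names. The in-memory vector `stored_` stands in for data that would live in a separate persistent container.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical split of a reconstructed Track into independently stored parts.
struct TrackSummary {       // compact part, read by most analyses
    double pt, eta, phi;
};
struct TrackConstituents {  // detailed part, with RHits cached by value
    std::vector<double> rhitRadii;
};

// Top-level object acting as a proxy: the summary is always present,
// the constituents are materialized only on first access.
class TrackProxy {
public:
    TrackProxy(TrackSummary s, std::vector<double> radii)
        : summary_(s), stored_(std::move(radii)) {}

    const TrackSummary& summary() const { return summary_; }

    const TrackConstituents& constituents() {
        if (!loaded_) {
            loaded_ = std::make_unique<TrackConstituents>(TrackConstituents{stored_});
            ++loads;
        }
        return *loaded_;
    }

    int loads = 0;  // how many times the detailed part was fetched

private:
    TrackSummary summary_;
    std::vector<double> stored_;  // stands in for a separate persistent container
    std::unique_ptr<TrackConstituents> loaded_;
};
```

An analysis that reads only `summary()` never triggers the fetch. With persistent storage, that means the container holding the detailed part is never opened.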
Re-Reconstruction Clones
[Diagram labels: Run; Run; Id-1; Local Replica; Crossing; Trigger; Pile-up]
Collection By Value
[Diagram labels: MetaData; User Tag; New Owner Name; DataSet Name; Run Collection]
New RecEvent with new or cloned Digis and RecObjs
Physical Clustering