Title: Software Architecture and Data Model, Part I: Software Architecture
Slide 1: Software Architecture and Data Model, Part I: Software Architecture

Software framework, services and persistency in high-level trigger, reconstruction and analysis

- Vincenzo Innocente
- CERN/EP/CMC
Slide 2: CMS (offline) Software

[Diagram: data flow of the CMS offline software. The Event Filter Objectivity Formatter, the Simulation (G3 and/or G4) and the Quasi-online Reconstruction store data into the Persistent Object Store Manager of the Object Database Management System; reconstruction also stores rec-Obj and calibrations. Online Monitoring, Data Quality / Calibrations / Group Analysis, and on-demand User Analysis each request parts of events from the store; environmental data arrive via Slow Control.]
Slide 3: Requirements (from the CTP)

- Multiple environments
  - The various software modules must be able to run in a variety of environments, from level-3 triggering to individual analysis
- Migration between environments
  - Physics modules should move easily from one environment to another (e.g. from individual analysis to level-3 triggering)
- Migration to new technologies
  - Should not affect the physics software modules
Slide 4: Requirements (from the CTP)

- Dispersed code development
  - The software will be developed by organizationally and geographically dispersed groups of part-time, non-professional programmers
- Flexibility
  - Not all software requirements will be fully known in advance
- Not only performance
  - Also modularity, flexibility, maintainability, quality assurance and documentation
Slide 5: CMS Software Architecture R&D

- 95-96 RD41 --- OO detector reconstruction
  - Detector model, local hit cache, pattern recognition
- 95-97 RD45 --- OO event model (persistent)
  - Event structure, raw data, reconstructed objects
- 95-97 RD45 --- calibration database
  - Time-dependent data, versioning, experience with Objy
- 96-98 Implicit invocation
  - Event dispatching, reconstruction on demand
- 97-98 Test-beam (H2, X5)
  - OO DAQ, online filtering, ODB population
- 99-00 ORCA production
  - MetaData, concurrent jobs, multi-threading, run-time dynamic loading
Slide 6: Use Cases (current functionality in ORCA)
- Simulated Hits Formatting
- Digitization of Piled-up Events
- Test-Beam DAQ Analysis
- L1 Trigger Simulation
- Track Reconstruction
- Calorimeter Reconstruction
- Global Reconstruction
- Physics Analysis
Slide 7: Reconstruction Scenario

- Reproduce the detector status at the moment of the interaction
  - front-end electronics signals (digis)
  - calibrations
  - alignments
- Perform local reconstruction as a continuation of the front-end data reduction, until objects detachable from the detectors are obtained
- Use these objects to perform global reconstruction and physics analysis of the event
- Store/retrieve the results of computing-intensive processes
Slide 8: Reconstruction Sources
Slide 9: Components
- Reconstruction Algorithms
- Event Objects
- Physics Analysis modules
- Other services (detector objects, environmental data, parameters, etc.)
- Legacy non-OO data (GEANT3)
- The instances of these components must be properly orchestrated to produce the results specified by the user
Slide 10: CARF (CMS Analysis Reconstruction Framework)

[Diagram: layered view of CARF. Physics modules (Reconstruction Algorithms, Event Filter, Data Monitoring, Physics Analysis) plug into the Application Framework, which manages Calibration Objects, Event Objects and MetaData Objects; a Utility Toolkit is available to all layers.]
Slide 11: Architecture structure

- An application framework, CARF (CMS Analysis Reconstruction Framework),
  - customisable for each of the computing environments
- Physics software modules
  - with clearly defined interfaces, so that they can be plugged into the framework
- A service and utility toolkit
  - that can be used by any of the physics modules
- Nothing terribly new, but...
  - a traditional architecture cannot cope with the complexity of an LHC collaboration
Slide 12: Problems with traditional architectures

- A traditional framework schedules a priori the sequence of operations required to bring a given task to completion
- Major management problems are produced by changes in the dependencies among the various operations
- Example 1
  - Tracks of type T1 are reconstructed using only tracker hits
  - Tracks of type T2 require calorimetric clusters as seeds
  - Fast simulation reconstructs tracks of type T3 directly from generator information
  - When switching from T1 to T2, the framework should determine that calorimeter reconstruction must run first
  - If T3 tracks are used, most of the tracker software is not required
- Example 2
  - The global initialization sequence must be changed because, for one detector, conditions change more often than foreseen
Slide 13: Framework Basic Dynamics

- Avoid a monolithic structure
  - instead, a collection of loosely coupled mechanisms which implement, in the abstract, the tasks of HEP reconstruction and analysis software
- Implicit invocation architecture
  - no central ordering of actions, no explicit control of data flow: only implicit dependencies
  - external dependencies are managed through an Event-Driven Notification to subscribers (a minimal sketch follows)
  - internal dependencies are managed through an Action-on-Demand mechanism
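The implicit-invocation idea is easiest to see in code. What follows is a minimal C++ sketch, not CARF's actual interface (the Dispatcher class, the signal names and the lambdas are all hypothetical): providers fire a named signal and whoever has subscribed reacts, with no central ordering of actions and no explicit control of data flow.

#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// A minimal dispatcher: observers subscribe to named signals, and the
// framework fires signals without knowing who listens (implicit invocation).
class Dispatcher {
public:
  using Callback = std::function<void()>;
  void subscribe(const std::string& signal, Callback cb) {
    observers_[signal].push_back(std::move(cb));
  }
  void notify(const std::string& signal) {
    for (auto& cb : observers_[signal]) cb();  // no central ordering of actions
  }
private:
  std::unordered_map<std::string, std::vector<Callback>> observers_;
};

int main() {
  Dispatcher d;
  // Detector elements observe physics events; each decides for itself what to do.
  d.subscribe("newEvent", [] { std::cout << "Tracker: cache raw hits\n"; });
  d.subscribe("newEvent", [] { std::cout << "Calo: cache raw digis\n"; });
  d.notify("newEvent");  // the dispatcher never names its subscribers
}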
Slide 14: Event-Driven Notification

Observers are instantiated by static factories residing in shared libraries; these libraries are loaded on demand during application configuration (a sketch of such self-registering factories follows).

[Diagram: a central Dispatcher notifies observers Obs1-Obs4, which act as clients or providers. Detector elements observe physics events; factories observe user requests.]
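The "static factories residing in shared libraries" mechanism can be sketched as follows; every name below is invented for illustration, and CARF's real plug-in machinery is richer. The C++ property being exploited is that static objects in a dynamically loaded library are constructed at load time, so merely loading the library registers its factory with the framework.

#include <map>
#include <memory>
#include <string>

struct Observer {
  virtual ~Observer() = default;
  virtual void update() = 0;
};

// Central registry: maps a name to a factory function (hypothetical design).
class FactoryRegistry {
public:
  using Maker = std::unique_ptr<Observer> (*)();
  static FactoryRegistry& instance() { static FactoryRegistry r; return r; }
  void add(const std::string& name, Maker m) { makers_[name] = m; }
  std::unique_ptr<Observer> make(const std::string& name) { return makers_.at(name)(); }
private:
  std::map<std::string, Maker> makers_;
};

// In a shared library: the static registrar's constructor runs at load time.
struct TrackerObserver : Observer { void update() override {} };
std::unique_ptr<Observer> makeTracker() { return std::make_unique<TrackerObserver>(); }
struct Registrar {
  Registrar() { FactoryRegistry::instance().add("Tracker", &makeTracker); }
} const registrar;

int main() {
  auto obs = FactoryRegistry::instance().make("Tracker");  // created on demand
  obs->update();
}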
Slide 15: Action on Demand

Example: compare the results of two different track reconstruction algorithms.

[Diagram: dependency graph for the comparison. The Event supplies Hits to Detector Elements, which produce RecHits. Rec T1 builds T1 tracks from RecHits; Rec T2 builds T2 tracks from RecHits and from the CaloCl clusters produced by Rec CaloCl. The Analysis consumes both T1 and T2; each producer runs only when its output is demanded.]
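The diagram's behaviour can be sketched as lazy evaluation in C++; the types and method names below are hypothetical, not ORCA's. The point is that requesting T2 tracks transparently triggers calorimeter-cluster reconstruction first, while requesting T1 tracks never would.

#include <iostream>
#include <optional>
#include <vector>

struct CaloCluster {};
struct Track {};

class RecEvent {
public:
  const std::vector<CaloCluster>& caloClusters() {
    if (!clusters_) {                     // reconstruct only when first asked
      std::cout << "running calo reconstruction\n";
      clusters_ = std::vector<CaloCluster>{};
    }
    return *clusters_;
  }
  std::vector<Track> tracksT2() {
    auto& seeds = caloClusters();         // implicit dependency pulled in on demand
    std::cout << "reconstructing T2 tracks from " << seeds.size() << " seeds\n";
    return {};
  }
private:
  std::optional<std::vector<CaloCluster>> clusters_;
};

int main() {
  RecEvent ev;
  ev.tracksT2();  // calo reconstruction runs first, without being scheduled a priori
}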
Slide 16: Persistency Services

- Persistent object management is fully integrated in CARF, using an ODBMS
- CARF manages
  - multi-threaded transactions
  - creation of databases and containers
  - meta-data and event collections
  - physical clustering of event objects
  - the persistent event structure and its relations with transient objects
- Use of the database is transparent to detector developers
  - users access persistent objects through C++ pointers (a generic sketch follows)
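To illustrate "access through C++ pointers", here is a generic sketch of a persistent smart pointer. It is deliberately not Objectivity's real API (ObjectStore and TrackRef are invented names): dereferencing looks like ordinary pointer use, while underneath the store resolves the object id and faults the object into its I/O cache.

#include <cstdint>
#include <memory>
#include <unordered_map>

struct Track { double pt = 0; };

class ObjectStore {  // stands in for the ODBMS I/O cache manager
public:
  Track* fetch(std::uint64_t oid) {
    auto& slot = cache_[oid];
    if (!slot) slot = std::make_unique<Track>();  // a real store would read from disk here
    return slot.get();
  }
private:
  std::unordered_map<std::uint64_t, std::unique_ptr<Track>> cache_;
};

class TrackRef {  // smart pointer: automatic id-to-pointer conversion
public:
  TrackRef(ObjectStore& s, std::uint64_t oid) : store_(&s), oid_(oid) {}
  Track* operator->() const { return store_->fetch(oid_); }
private:
  ObjectStore* store_;
  std::uint64_t oid_;
};

int main() {
  ObjectStore store;
  TrackRef t(store, 42);
  double pt = t->pt;  // reads like a plain C++ pointer; I/O happens underneath
  (void)pt;
}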
Slide 17: Framework Ancillary Services

- User interface
  - not just a GUI
- Error reporting and exception management
- Logging facilities
- Timing facility (statistics gathering)
- Utility library
  - notably Objy utilities, wrappers and generic persistent-capable classes
- Not architecture-specific
  - candidates for a common effort with other interested parties
Slide 18: Software Architecture and Data Model, Part II: Data Model

- Vincenzo Innocente
- CERN/EP/CMC
Slide 19: Results of R&D and ORCA experience

- Traditional software architectures (main program and subroutines, pipes and filters) have been found inadequate for CMS (multiple environments, evolving requirements, a long time-scale)
- An implicit invocation architecture is a flexible software solution which can scale with the complexity of the CMS project
- An ODBMS, integrated into the framework,
  - provides coherent management of persistent objects
  - coupled with run-time dynamic loading, allows an application to be configured automatically
- The framework can effectively shield physics modules from the underlying technology without penalizing performance
Slide 20: HEP Data

- Environmental data
  - detector and accelerator status
  - calibrations, alignments
- Event-collection meta-data
  - (luminosity, selection criteria, ...)
- Event data, user data

Navigation is essential for effective physics analysis; the complexity requires coherent access mechanisms.
Slide 21: Do I need a DBMS? (a self-assessment)

- Do I encode meta-data (run number, version id) in file names?
- How many files and logbooks must I consult to determine the luminosity corresponding to a histogram?
- How easily can I determine whether two events have been reconstructed with the same version of a program, using the same calibrations?
- How many lines of code must I write, and what fraction of the data must I read, to select all events with two μ's with pT > 11.5 GeV and |η| < 2.7 (sketched after this list)?
- The same at generator level?
- If the answers scare you, you need a DBMS!
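For concreteness, the di-muon selection in the fourth question might be written as a predicate like this sketch (the Muon and Event types are invented). Without a DBMS you must write this loop yourself and read every event; the point of a DBMS is to answer such queries from managed meta-data instead.

#include <algorithm>
#include <cmath>
#include <vector>

struct Muon { double pt; double eta; };
struct Event { std::vector<Muon> muons; };

// True if the event has at least two muons with pT > 11.5 GeV and |eta| < 2.7.
bool twoGoodMuons(const Event& ev) {
  auto n = std::count_if(ev.muons.begin(), ev.muons.end(), [](const Muon& m) {
    return m.pt > 11.5 && std::abs(m.eta) < 2.7;
  });
  return n >= 2;
}

int main() {
  Event ev{{{20.0, 1.1}, {15.0, -2.0}}};
  return twoGoodMuons(ev) ? 0 : 1;
}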
Slide 22: Can CMS do without a DBMS?

- An experiment lasting 20 years cannot rely on ASCII files and file systems alone for its production bookkeeping, condition database, etc.
- Even today at LEP, the management of all real and simulated data-sets (from raw data to n-tuples) is a major enterprise
  - multiple models used (DST, N-tuple, HEPDB, FATMAN, ASCII)
- A DBMS is the modern answer to this problem and, given the choice of OO technology for the CMS software, an ODBMS (or a DBMS with an OO interface) is the natural solution for a coherent and scalable approach
Slide 23: A BLOB Model

[Diagram: Event database objects (RawEvent, RecEvent) each point to a Blob.]

A Blob is a sequence of bytes; decoding it is a user responsibility.

Why should Blobs not be stored in the DBMS?
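A tiny sketch of what the BLOB model demands of the user; the RawChannel layout is invented for illustration. The store hands back opaque bytes, and the correctness of the decode rests entirely on out-of-band knowledge of the layout, which is precisely what storing real objects in the DBMS avoids.

#include <cstdint>
#include <cstring>
#include <vector>

struct RawChannel { std::uint32_t id; std::uint16_t adc; };

// Decoding is a user responsibility: nothing checks that the bytes really
// hold RawChannels written with this exact layout and endianness.
std::vector<RawChannel> decodeBlob(const std::vector<std::uint8_t>& blob) {
  std::vector<RawChannel> out(blob.size() / sizeof(RawChannel));
  std::memcpy(out.data(), blob.data(), out.size() * sizeof(RawChannel));
  return out;
}

int main() {
  std::vector<std::uint8_t> blob(2 * sizeof(RawChannel), 0);
  auto channels = decodeBlob(blob);  // two all-zero channels
  return channels.size() == 2 ? 0 : 1;
}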
Slide 24: Raw Event

RawData are identified by the corresponding ReadOut. RawData belonging to different detectors are clustered into different containers; the granularity will be adjusted to optimize I/O performance. An index at RawEvent level is used to avoid accessing all containers when searching for a given RawData. A range index at RawData level could be used for fast random access in complex detectors.

[Diagram: a RawEvent points to ReadOut objects, each owning its RawData; the index is implemented as an ordered vector of pairs (see the sketch below).]
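The RawEvent index described above might look like this sketch (the type names are placeholders, and a real persistent reference would be an ODBMS handle rather than an integer): an ordered vector of (ReadOut id, container reference) pairs, searched by binary search so that one detector's RawData is located without touching the other containers.

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using ReadOutId = std::uint32_t;
using ContainerRef = std::uint64_t;  // stands in for a persistent reference

class RawEventIndex {
public:
  void add(ReadOutId id, ContainerRef ref) {
    index_.emplace_back(id, ref);  // assumed filled in id order at write time
  }
  const ContainerRef* find(ReadOutId id) const {
    auto it = std::lower_bound(index_.begin(), index_.end(),
                               std::make_pair(id, ContainerRef{0}));
    return (it != index_.end() && it->first == id) ? &it->second : nullptr;
  }
private:
  std::vector<std::pair<ReadOutId, ContainerRef>> index_;  // ordered vector of pairs
};

int main() {
  RawEventIndex idx;
  idx.add(3, 1001);
  idx.add(7, 1002);
  const ContainerRef* ref = idx.find(7);
  return (ref && *ref == 1002) ? 0 : 1;
}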
Slide 25: CMS Reconstructed Objects

Reconstructed objects produced by a given algorithm are managed by a Reconstructor. A reconstructed object (e.g. a Track) is split into several independent persistent objects to allow their clustering according to their access patterns (physics analysis, reconstruction, detailed detector studies, etc.). The top-level object acts as a proxy. Intermediate reconstructed objects (RHits) are cached by value into the final objects.

[Diagram: the RecEvent points to an S-Track Reconstructor; the S-Track proxy (aod) refers to the Track Constituents holding a vector of RHits (esd) and to the Track SecInfo (rec).]
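A sketch of the split-object idea under invented names: the user-visible Track is a cheap proxy, and the heavier constituent data live in a separately clustered container that is loaded only on access, so an analysis job that never touches constituents never pays their I/O cost.

#include <memory>
#include <vector>

struct RecHit {};
struct TrackConstituents {        // "esd"-level piece, clustered on its own
  std::vector<RecHit> hits;       // intermediate RHits cached by value
};

class Track {                     // "aod"-level proxy the analysis sees
public:
  const TrackConstituents& constituents() {
    if (!esd_) esd_ = load();     // fetched from its own container on demand
    return *esd_;
  }
private:
  std::unique_ptr<TrackConstituents> load() {
    return std::make_unique<TrackConstituents>();  // a real proxy would follow a persistent reference
  }
  std::unique_ptr<TrackConstituents> esd_;
};

int main() {
  Track t;                        // cheap: nothing loaded yet
  auto n = t.constituents().hits.size();  // first access triggers the load
  return static_cast<int>(n);
}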
Slide 26: Physical clustering
Slide 27: Is an ODBMS overkill for histograms?

- Maybe, if histograms are your sole I/O
  - (I use my Sun Ultra 5 to read mail through pine, even if a line-mode terminal would be more than adequate)
- N-tuples are user event-data and, for any serious use, require a level of management and book-keeping similar to that of the experiment-wide event data
- What counts is the efficiency and reliability of the analysis
  - the most sophisticated histogramming package is useless if you are unable to determine the luminosity corresponding to a given histogram!
Slide 28: Objectivity Features CMS (really) uses

- Persistent objects are real C++ (and Java) objects
  - coherent access to any kind of object
- I/O cache (memory) management
  - no explicit read and write
  - no need to delete the previous event
- Smart pointers (automatic id-to-pointer conversion)
- Efficient containers by value (VArray)
- Full direct navigation in the complete federation
  - from meta-data to event data
  - from event data back to meta-data
- Flexible physical clustering of objects
- Object naming
  - as a top-level entry point (at collection level)
  - as a rapid-prototyping tool
Slide 29: More ODBMS (Objy) Advantages

- Novel access methods
  - a collection of electrons with no reference to events
  - direct reference from event objects to the condition database
  - direct reference to event data from user data
- Flexible run-time clustering of heterogeneous-type objects
  - cluster together all tracks, or all objects belonging to the same event
- Real DB management of reconstructed objects
  - add or modify, in place and on demand, parts of an event
Slide 30: CMS Experience

- Designing and implementing persistent classes is no harder than doing it for native C++ classes
- Easy and transparent distinction between logical associations and physical clustering
- Fully transparent I/O, with performance essentially limited by the disk speed (random access)
- File-size overhead (~5% for realistic CMS object sizes) no larger than for other products such as ZEBRA, BOS, etc.
- Objectivity/DB (compared to other products we are used to) is robust, stable and well documented; it also provides many additional useful features
- All our tests show that Objectivity/DB can satisfy CMS requirements in terms of performance, scalability and flexibility
Slide 31: CMS Experience

- There are additional configuration elements to care about: ddl files, schema-definition databases, database catalogs
- In organized software development, rapid prototyping is not impossible, but its integration into a product should be done with care
- Performance degradations often wait for you around the corner
  - monitoring of running applications is essential; off-the-shelf solutions often exist (BaBar, Compass)
- Objectivity/DB is a bare product
  - integration into a framework is our responsibility
- Objectivity is slow to apply OUR changes to their product
  - Is this a real problem? Do we really want a product whose kernel is changed at each user request?
Slide 32: CMS Experience (missing features, 1999)

- Scalability: 64K files are not enough (fix scheduled for Dec 2000)
- Containers are the natural Objectivity units, yet for some things the OS (and files) is still preferred
  - bulk data transfer (to mass storage, among sites)
  - access control, space allocation to users, etc.
- An efficient and secure AMS (OK in 5.2!!!)
  - with MSS and WAN support
- Support for private user classes and user data (w.r.t. experiment-wide ones)
  - many custom solutions based on multi-federations
- Active schema
- A user application layer
  - like a rapid-prototyping environment
Slide 33: Objy-HEP: Building a Partnership

- Objectivity recognizes that HEP requirements anticipate the future requirements of other clients
  - the next versions will include solutions to almost all our improvement requests
- The new AMS has been essentially developed at SLAC
- CERN has built version 5.2.1 for Linux RH 6.1
- CERN will help in building a full port to Solaris CC 5
- CERN will prototype a new lock-server monitor
- It is essential to continue to develop this partnership and increase the trust of both partners in each other
Slide 34: Alternatives: ODBMS

- Versant is a viable commercial alternative to Objectivity
  - do we have time to build an effective partnership (e.g. MSS interface)?
- Espresso (by IT/DB) should be able to produce a fully fledged ODBMS in a couple of years, once the proof-of-concept prototype is ready
  - CMS will test Espresso in the context of CARF this summer
- Migrating CARF from Objectivity to another ODBMS
  - we expect that it would take about one year
  - such a transition will not affect the basic principles of the CMS software architecture and data model
  - it will involve only the core CARF development team
  - it will not disrupt production and physics analysis
Slide 35: Alternatives: ORDBMS

- ORDBMSs (relational DBs with an OO interface) are appearing on the market
- They look targeted at those who already have a relational system and wish to make a transition to OO
- No serious evaluation of their usage in HEP has been performed yet
  - no experiment is using (or planning to use) them
  - IT/DB will visit Oracle in the autumn
- It is still too early to assess the impact of ORDBMSs on the CMS data model and on the migration effort
Slide 36: Fallback Solution: Hybrid Models

- (R)DBMS for meta-data, calibration, etc.
- Object-stream files for event data
- Ad-hoc networked data server and MSS interface
- Less flexible
  - rigid split between the DBMS and event data
  - one-way navigation from the DBMS to event data
- More complex
  - two different I/O systems
  - more effort to learn and maintain
- This approach will be used by several experiments at BNL and Fermilab
  - (the RDBMS is not directly accessible from user applications)
  - CMS and IT/DB are following these experiences closely
- We believe that this solution could seriously compromise our ability to carry out our physics program competitively
Slide 37: ODBMS Summary

- A DBMS is required to manage the large data set of CMS (including user data)
- An ODBMS provides a coherent and scalable solution for managing data in an OO software environment
- Once an ODBMS is deployed to manage the experiment data, it will be very natural to use it to manage any kind of data related to detector studies and physics analysis
- Objectivity/DB is a robust and stable kernel, ideal as the base on which to build a custom storage framework
- Objectivity is starting to respond to our peculiar requirements