Software Frameworks for CMS Data Analysis - PowerPoint PPT Presentation

1
Software Frameworks for CMS Data Analysis
  • Vincenzo Innocente
  • CERN/EP

2
Data Analysis Micro-Process
  • Physics analysis is to a large degree an
    iterative process of:
  • Reducing data samples to more interesting subsets
  • Distilling the sample into information at a
    higher abstraction level
  • By summarising lower-level information
  • By calculating statistical entities from the
    samples
  • A large part of the work can be done on very
    high-level entities in an interactive analysis
    and presentation tool
  • Hence the focus on tools that work on simple
    summary information (DSTs, N-tuples, tag
    databases, ...)
  • Additional tools for detector and event
    visualisation
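The reduce-then-distill loop described above can be sketched in a few lines of Python; the function names and toy event records are illustrative, not part of any CMS tool:

```python
# Hypothetical sketch of the analysis micro-process: first reduce a
# sample to an interesting subset, then distill the subset into a
# higher-level statistical entity. All names here are illustrative.

def reduce_sample(events, selection):
    """Keep only the events passing the selection cut."""
    return [e for e in events if selection(e)]

def summarise(events, key):
    """Distill the subset into a statistical entity (here: a mean)."""
    values = [e[key] for e in events]
    return sum(values) / len(values) if values else 0.0

# Toy "events" standing in for DST / N-tuple rows.
sample = [{"pt": 12.0}, {"pt": 45.0}, {"pt": 60.0}, {"pt": 8.0}]

subset = reduce_sample(sample, lambda e: e["pt"] > 10.0)  # step 1: reduce
mean_pt = summarise(subset, "pt")                         # step 2: distill
```

Each iteration of the analysis would repeat these two steps on progressively smaller, more abstract samples.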

3
CMS Data Analysis Model
[Diagram: CMS data-flow model. The Event Filter / Object Formatter, Online Monitoring, Detector Control (environmental data), Quasi-online Reconstruction, Data Quality / Calibrations / Group Analysis, Simulation, and on-demand User Analysis all store reconstructed objects and calibrations in, and request parts of events from, a Persistent Object Store Manager backed by a Database Management System, ultimately yielding a physics paper.]
4
Offline Architecture: New Requirements
  • Bigger experiment, higher rate, more data
  • Larger and dispersed user community performing
    non-trivial queries against a large event store
  • New IT technologies to make the best use of
  • Increased demand for both flexibility and
    coherence
  • ability to plug-in new algorithms
  • ability to run the same algorithms in multiple
    environments
  • guarantees of quality and reproducibility
  • high-performance user-friendliness

5
Analysis Environments
  • Real Time Event Filtering and Monitoring
  • Data driven pipeline
  • High reliability
  • Pre-emptive Simulation, Reconstruction and Event
    Classification
  • Massive parallel batch-sequential process
  • Excellent error recovery and rollback mechanisms
  • Excellent scheduling and bookkeeping systems
  • Interactive Statistical Analysis
  • Rapid Application Development environment
  • Excellent visualization and browsing tools
  • Human readable navigation

6
Migration
  • Today's Nobel prize becomes tomorrow's trigger
  • (and the background the day after)
  • Boundaries between running environments are fuzzy
  • Physics analysis algorithms should migrate up
    to the online to make the trigger more selective
  • Robust batch systems should be made available for
    physics analysis of large data samples
  • The results of offline calibrations should be fed
    back to the online to make the trigger more
    efficient

7
Coherent Analysis Environment
[Diagram: a coherent analysis environment linking Network Services, Visualization Tools, Reconstruction, Simulation, Batch Services, Analysis Tools, and Persistency Services.]
8
The Challenge
  • Beyond the interactive analysis tool (User point
    of view)
  • Data analysis presentation: N-tuples,
    histograms, fitting, plotting, ...
  • A great range of other activities with fuzzy
    boundaries (Developer point of view)
  • Batch
  • Interactive: from point-and-click to Emacs-like
    power tools to scripting
  • Setting up configuration management tools,
    application frameworks and reconstruction
    packages
  • Data store operations: replicating entire data
    stores; copying runs, events, and event parts
    between stores; not just copying but also doing
    something more complicated (filtering,
    reconstruction, analysis, ...)
  • Browsing data stores down to object detail level
  • 2D and 3D visualisation
  • Moving code across final analysis, reconstruction
    and triggers
  • Today this involves (too) many tools
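The "not just copying" data-store operation in the list above can be sketched as a filter-while-copying loop; `Store`, the selection, and the transform are toy stand-ins, not any real CMS store API:

```python
# Hypothetical sketch: copy events between stores while filtering
# and doing something more (here, tagging) along the way.

class Store:
    """Toy stand-in for a persistent event store."""
    def __init__(self, events=()):
        self.events = list(events)

def copy_filtered(src, dst, selection, transform):
    """Copy selected events from src to dst, transforming each."""
    for ev in src.events:
        if selection(ev):
            dst.events.append(transform(ev))

source = Store([{"ntracks": 3}, {"ntracks": 12}, {"ntracks": 7}])
target = Store()
copy_filtered(source, target,
              selection=lambda e: e["ntracks"] > 5,
              transform=lambda e: {**e, "busy": True})
```

The same loop shape covers plain replication (selection always true, identity transform) as well as filtering or reconstruction on the way.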

9
Analysis & Reconstruction Framework
[Diagram: layered architecture. Physics modules (Reconstruction Algorithms, Data Monitoring, Event Filter, Physics Analysis) plug into a Specific Framework built on a Generic Application Framework, which manages Calibration Objects, Event Objects, and Configuration Objects and rests on adapters and extensions over a Utility Toolkit.]
10
Why Frameworks
  • Physicists concentrate on the development of
    reconstruction and analysis algorithms as plug-in
    modules
  • The framework
  • orchestrates instances of these modules
  • hides system-related complexities
  • allows sharing of code for common or related
    tasks
  • Changes in the physics reconstruction and
    analysis logic affect only the plug-ins
  • Changes in system services, or migration to new
    IT technologies, affect only the framework
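A minimal sketch of this division of labour, with the framework orchestrating registered plug-ins; `Framework`, `register`, and the toy plug-ins are illustrative names, not COBRA's actual API:

```python
# Hypothetical plug-in framework: the framework orchestrates the
# registered modules and hides everything else from them.

class Framework:
    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        """Physicists contribute algorithms as plug-in modules."""
        self._plugins.append(plugin)

    def process(self, event):
        """Run each registered module on the event, in order."""
        for plugin in self._plugins:
            event = plugin(event)
        return event

# Two toy "physics" plug-ins; changing the physics logic touches
# only these functions, never the Framework class.
def calibrate(event):
    return {**event, "energy": event["raw"] * 1.05}

def tag(event):
    return {**event, "interesting": event["energy"] > 50.0}

fw = Framework()
fw.register(calibrate)
fw.register(tag)
result = fw.process({"raw": 60.0})
```

Swapping `calibrate` for a new algorithm, or running the same plug-ins in a different environment, leaves `Framework` untouched, which is the point of the separation.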

11
Questions
  • What is the role of an experiment-specific
    framework
  • How it integrates with more generic frameworks
  • How the user can have a coherent and consistent
    view of the Analysis process
  • How new tools (new frameworks) can be integrated
    without disrupting the existing architecture

12
Difficult Balance
"The most profoundly elegant framework will never
be reused unless the cost of understanding it and
then reusing its abstractions is lower than the
programmer's perceived cost of writing them from
scratch" (G. Booch, 1994)
  • Flexibility (many abstractions)
  • Wide range of applications
  • Great potential for extension and migration
  • Difficult to understand and to use
  • Rigidity (few abstractions, many concrete
    classes)
  • Easy to use
  • Limited range of applications
  • Difficult to migrate and extend

13
Incoherent Solution
  • The experiment kernel deals with just one
    problem: event processing
  • External tools are kept as they are
  • Communication through I/O converters
  • Persistency is just one (or more) of the external
    tools
  • Users see a different environment for each part
    of the problem domain

14
Coherent, Monolithic Solution
  • Framework Kernel is expanded to cover the whole
    problem domain
  • Users see The Framework
  • New tools should be incorporated into the
    framework
  • Imported classes should be modified to derive
    from framework base classes to preserve coherence
  • Persistency is implemented by the framework
  • Example MS

15
Coherent, Non-invasive Solution
  • Users see a standard environment that acts also
    as integration glue
  • The experiment kernel is composed of a hierarchy
    of application-frameworks reusable in various
    parts of the problem domain
  • External frameworks are integrated directly, if
    they conform to the standard environment, or
    through wrappers, if not.
  • Persistency is encapsulated by one of the kernel
    application-frameworks
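The "integrated through wrappers, if not" option above is essentially the adapter pattern; a minimal sketch, where `StandardHistogram` and `LegacyHisto` are hypothetical names standing in for the standard environment's interface and an external tool:

```python
# Hypothetical sketch of non-invasive integration: an external tool
# with its own interface is made to conform to the standard
# environment through a wrapper, without modifying the tool itself.

class StandardHistogram:
    """The interface the standard environment expects."""
    def fill(self, value): ...

class LegacyHisto:
    """An external tool with a different, incompatible interface."""
    def __init__(self):
        self.entries = []
    def add_entry(self, x):
        self.entries.append(x)

class LegacyHistoWrapper(StandardHistogram):
    """Adapter: conforms to the standard interface, delegates inside."""
    def __init__(self, legacy):
        self._legacy = legacy
    def fill(self, value):
        self._legacy.add_entry(value)

h = LegacyHistoWrapper(LegacyHisto())
for v in (1.0, 2.0, 3.0):
    h.fill(v)
```

Tools that already conform need no wrapper and are integrated directly.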

Which Glue?
16
Python
  • Python is an interpreted, object-oriented
    language introduced in the early 1990s
  • It spread quickly, particularly among scientific
    communities in search of a rapid application
    development tool able to efficiently integrate
    existing, highly optimized scientific software
    (example: http://sources.redhat.com/gsl)
  • Python provides
  • Scripting functionality similar to that of Perl
    or Tcl
  • Runtime dynamic loading
  • A standard OO library for system level support
  • Simple mechanisms for interfacing to C++ objects
  • A large body of open-source modules covering a
    wide spectrum of application domains, scientific
    in particular
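The "runtime dynamic loading" point above is what makes Python usable as glue: a module can be chosen and loaded while the program runs. A minimal standard-library example (the module name would normally come from configuration or user input):

```python
# Runtime dynamic loading with the standard library: the module to
# use is resolved from a string at run time, not at import time.

import importlib

module_name = "math"              # could come from a config file
mod = importlib.import_module(module_name)
result = mod.sqrt(16.0)           # function resolved at run time
```

The same mechanism lets an interactive session pull in experiment-specific extension modules on demand.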

17
Python as a glue
  • Integration in Python is non-intrusive
  • Only the class interface is exported to Python:
    encapsulation is preserved
  • The original (C++) representation is respected:
    no translation, no conversion
  • Additional Python-specific extensions do not
    impact the original design and functionality
  • Binding with Python happens at runtime
  • Batch applications need not be Python-aware
  • Interactive applications can be extended
    (actually constructed) and modified at runtime
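The "Python-specific extensions do not impact the original design" point can be illustrated with a pure-Python stand-in: a convenience method is attached to an exported class at run time, leaving the original definition untouched. `ExportedTrack` and `pt` are hypothetical names, not CMS classes:

```python
# Hypothetical sketch: ExportedTrack stands in for a class whose
# interface was exported from C++ to Python. A Python-side helper
# is added at run time without touching the original definition.

class ExportedTrack:
    """Stand-in for an exported (wrapped) C++ class."""
    def __init__(self, px, py):
        self.px, self.py = px, py

def pt(self):
    """Python-specific extension: transverse momentum."""
    return (self.px**2 + self.py**2) ** 0.5

ExportedTrack.pt = pt       # attached at run time, non-intrusively

track = ExportedTrack(3.0, 4.0)
value = track.pt()
```

Batch code that never imports the Python layer sees none of this, which is what "batch applications need not be Python-aware" amounts to.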

18
Examples (personal experience)
  • Exporting the interface of an application
    framework such as Objectivity/DB took a few hours
  • The CERN/IT physics analysis environment (ANAPHE)
    provides a complete Python binding (Lizard) which
    does not affect the core C++ library
  • Seamless integration of the CMS framework kernel
    (COBRA) and the CERN/IT ANAPHE library through
    their (independent) Python interfaces
  • Direct application of other Python modules
    (regular expressions, string/list manipulation,
    numerics, etc.) to ANAPHE or COBRA objects
  • Zero effort in downloading, installing and using
    GSL with Lizard

19
[Slide of screenshots: Emacs used to edit a CMS C++ plug-in that creates and fills histograms; an OpenInventor-based display of a selected event; the Lizard Qt plotter; an ANAPHE histogram extended with pointers to CMS events; a Python shell with Lizard and CMS modules loaded.]
20
Example of Today's Data Analysis
[Diagram: Python (Lizard) drives interactive user analysis with on-demand reconstruction and visualization; it asks to visualize one event, requests user data and parts of events, and stores selected events through the Persistent Object Store Manager / Database Management System, which also serves offline reconstruction, analysis, and simulation.]
21
Key Components
  • Unique context for persistent objects
  • Today limited in COBRA to a single Objectivity
    federation
  • A unique persistent Object Identifier
  • Used in communication among Threads and Processes
  • Ability to process a single event with no other
    a priori knowledge
  • Navigation from event to environment (conditions)
  • On demand reconstruction (implicit invocation)
  • Python used as glue
  • Replacing ksh, csh, tcl, perl, kuip, sigma (you
    name it)
  • Plug and play environment
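The "on demand reconstruction (implicit invocation)" idea above can be sketched as a lazily computed attribute: touching it triggers reconstruction the first time, after which the result is cached. The `Event` class and doubling "reconstruction" are illustrative, not COBRA code:

```python
# Hypothetical sketch of on-demand reconstruction via implicit
# invocation: accessing the attribute triggers the computation.

class Event:
    def __init__(self, raw):
        self.raw = raw
        self._rec = None
        self.reco_calls = 0      # instrumentation for the sketch

    @property
    def reconstructed(self):
        # Implicit invocation: reconstruction runs only on first
        # access, then the cached result is returned.
        if self._rec is None:
            self.reco_calls += 1
            self._rec = [x * 2 for x in self.raw]  # stand-in "reco"
        return self._rec

ev = Event([1, 2, 3])
first = ev.reconstructed     # triggers reconstruction
second = ev.reconstructed    # served from the cache
```

Clients navigate to reconstructed objects without knowing, or caring, whether they were already produced.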

22
HEP Data
  • Event-Collection Meta-Data
  • Environmental data
  • Detector and Accelerator status
  • Calibrations, Alignments
  • (luminosity, selection criteria, ...)
  • Event Data, User Data

Navigation is essential for effective physics
analysis. Complexity requires coherent access
mechanisms.
23
Conclusions (Challenges)
  • Today's HEP experiments
  • Bigger, higher rate, more data, longer-lived
  • Larger and dispersed user communities
  • IT
  • Ubiquitous
  • Develops fast
  • Becomes obsolete even faster
  • Traditional HEP analysis software architectures
  • Monolithic
  • Incoherent

24
Conclusions (Solutions)
  • Hierarchy of non-intrusive, loosely-connected
    Frameworks
  • Easier Maintenance, Evolution, Migration
  • Standard framework acting as glue
  • Easier integration
  • Coherent user view
  • Powerful, flexible persistency mechanism
  • Uniform
  • Transparent data access