KANGA: ROOT Access to BABAR Data for Physics Analysis

1
KANGA ROOT Access to BABAR Data for Physics
Analysis
  • David Kirkby, UC Irvine
  • for the BABAR Computing Group
  • CHEP 03 - Data Management Persistency
  • 25 March 2003

Primary Reference: T.J. Adye, A. Dorigo,
R. Dubitzky, A. Forti, S.J. Gowdy, G. Hamel de
Monchenault, R.G. Jacobsen, D. Kirkby, S. Kluth,
E. Leonardi, A. Salnikov, L. Wilden, Comp. Phys.
Comm. 150, p. 197-214 (2003).
2
The BABAR Experiment
  • The BABAR experiment records e+e- collisions at
    the SLAC PEP-II collider. BABAR has 600
    collaborators from 77 institutions in 10
    countries. Approximately half are from US
    institutions.

3
The BABAR Detector
  • The BABAR detector has 200k channels read out
    at 100 Hz into a typical raw-data event size of
    25 kB. The experiment wrote 300 TB to tape for
    the 40/fb recorded during 2001, with 10 TB kept
    on disk at SLAC. Projected luminosity increases
    will deliver an integrated 500/fb by the end of
    2006.

4
BABAR Physics Analysis and Data Access
  • BABAR has published 36 physics papers since Feb
    2001. The typical physics analysis only needs
    access to a micro-DST for sparse subsets of the
    data and Monte Carlo.

Per-event sizes of the BABAR data tiers:

Tag: 0.7 kB/evt
Micro-DST (incl. truth subset): 3.0 kB/evt
Analysis objs.: 8.5 kB/evt
Event summary data
Reconstructed data: 120 kB/evt
Raw/Simulated hit data: 53 kB/evt
Monte Carlo truth data: 15 kB/evt

Until 1999, data was stored exclusively in an Objy.
database (now >750 TB). No longer keeping Raw, Sim,
Reco.
5
BABAR Analysis Framework
  • BABAR analysis uses a standard software
    framework
  • Begin/NextEvent/Finalize transitions.
  • Each transition is passed through a sequence of
    execution modules with common base class.
  • Special modules handle data I/O and the
    conversion between persistent and transient
    object representations.
  • User modules deal only with transient object
    representations.
  • Data access is handled differently for event- and
    non-event (conditions) sources.
  • This framework design completely decouples the
    reconstruction and analysis code from the data
    store technology, at some cost in performance.
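The module sequence described above can be sketched in a few lines. This is a minimal illustration of the pattern, not BABAR's actual framework; all class and method names are invented.

```python
# Sketch of the framework pattern: each transition (begin,
# next_event, finalize) is passed through a sequence of modules
# sharing a common base class. The input module converts
# persistent records to transient objects; user modules see
# only the transient representation.

class Module:
    def begin(self): pass
    def next_event(self, event): pass
    def finalize(self): pass

class InputModule(Module):
    """Converts persistent records into transient event objects."""
    def __init__(self, records):
        self._records = iter(records)
    def next_event(self, event):
        rec = next(self._records, None)
        if rec is None:
            return False          # no more events: stop the loop
        event.update(rec)         # "conversion" to transient form
        return True

class CountingModule(Module):
    """A user analysis module: deals only with transient objects."""
    def __init__(self):
        self.n_events = 0
    def next_event(self, event):
        self.n_events += 1
        return True

def run(modules):
    for m in modules:
        m.begin()
    running = True
    while running:
        event = {}
        for m in modules:
            if m.next_event(event) is False:
                running = False
                break
    for m in modules:
        m.finalize()

inp = InputModule([{"ntrk": 3}, {"ntrk": 5}])
cnt = CountingModule()
run([inp, cnt])
```

Because only `InputModule` touches the persistent form, swapping the data-store technology means re-implementing that one module, which is the decoupling the slide credits for KANGA's rapid deployment.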

6
Motivation for KANGA
  • An Aug 99 review of BABAR Computing examined the
    challenges involved in producing first physics
    results under conference deadline pressure.
    Access to data, both at SLAC and at remote
    sites, was identified as a critical bottleneck
    in physics analysis.
  • Objectivity (Objy) performance problems were
    recognized as a weakness of the computing model
    at the time, in particular the limitations
    imposed by large files (2 GB for analysis data)
    and poor lock-server scaling with many (>100)
    clients. The review committee recommended that
    BABAR develop a limited-function
    short-to-medium-term solution.

7
KANGA Design Requirements
  • This recommendation led to the following design
    requirements:
  • 1. Access to the identical micro-DST data
    available from Objy; no support for access to
    lower-level data.
  • 2. Compatibility with the existing framework and
    user analysis code; changes almost transparent
    to analysis users (relink required).
  • 3. Fast event filtering using simple attribute
    (TAG) data.
  • 4. Simple and efficient distribution of data to
    remote (non-SLAC) sites.

8
The Implementation: KANGA(ROO)
  • Kind ANd Gentle Analysis (without Relying On
    Objectivity). The key technical decision was to
    use ROOT objects and files for the persistent
    data store. In general, there are many tradeoffs
    involved in the Objy/ROOT decision. Our decision
    was made in the context of a limited-function,
    short-term solution that would enhance the
    capabilities of a continuing Objy data store and
    that could be completed quickly. KANGA was
    implemented and deployed in 4 months by a small
    (5-person) team in 1999.

9
Event Data Overview
  • KANGA event data is stored in ROOT TTree
    objects. Each branch represents a small set of
    persistent classes with one branch instance per
    event. Events from one run are usually grouped
    into a single file containing 2 trees (Analysis
    objs, Tag attributes). Typical size is 1.7 kB
    per event for data (21.6 GB per /fb) and 4.7 kB
    for Monte Carlo. Tag attributes are stored as
    built-in types.

KANGA file (10^6 of these now):
  Tag attributes tree: attr-1 ... attr-m
  Analysis objs tree: class-1 ... class-n
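The two-tree layout above is what makes requirement 3 (fast TAG filtering) cheap: a selection can scan only the small tag tree and fetch matching analysis objects by entry number. A plain-Python model of that layout (not actual ROOT I/O; all names and values invented):

```python
# Model of a per-run KANGA file: two parallel "trees" with one
# entry per event. The tag tree holds small built-in-type
# attributes; the analysis tree holds the expensive objects.

class RunFile:
    def __init__(self):
        self.tag_tree = []        # small per-event attribute records
        self.analysis_tree = []   # full analysis objects

    def fill(self, tag, analysis_objs):
        self.tag_tree.append(tag)
        self.analysis_tree.append(analysis_objs)

    def select(self, predicate):
        """Fast filtering: scan only the tag tree, then fetch
        the matching analysis-tree entries by index."""
        return [self.analysis_tree[i]
                for i, tag in enumerate(self.tag_tree)
                if predicate(tag)]

f = RunFile()
f.fill({"n_muons": 2, "e_total": 9.8}, "event-0 objects")
f.fill({"n_muons": 0, "e_total": 4.1}, "event-1 objects")
f.fill({"n_muons": 1, "e_total": 9.9}, "event-2 objects")

dimuon = f.select(lambda t: t["n_muons"] >= 2)
```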
10
Event Data Architecture
  • BABAR event data I/O is managed by
    special-purpose framework execution modules. Only
    those modules dealing directly with persistent
    analysis objects and Tag attributes were
    re-implemented for KANGA.

Reconstruction path (RAW to mDST):
  Input Module -> Reco. Module -> Reco. Module -> Output Module

Analysis path (mDST):
  Input Module -> Analysis Module -> Analysis Module
A significant factor in the rapid deployment of
KANGA was the earlier design decision to
completely decouple the event store technology
from the analysis framework.
11
Event Data Attribute Tags
  • The design requirement of fast selection on a
    sparse set of event attributes (total energy,
    number of muons, etc.) required a small
    compromise in the persistent/transient
    decoupling to gain improved efficiency.
  • Instead of converting attributes, use adapter
    pattern to implement transient interface
    directly in terms of persistent objects.
  • This compromise ties transient class directly to
    ROOT persistent class, but without exposing
    persistent class to user code.
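The adapter compromise described above can be sketched as follows. This is an illustration with invented names, not BABAR's actual classes: the transient interface forwards lookups to the persistent record instead of copying attributes, so no conversion pass is needed, yet user code never touches the persistent class directly.

```python
# Adapter pattern for tag attributes: the transient interface
# is implemented directly in terms of the persistent object.

class PersistentTag:
    """Stands in for the ROOT-side record of built-in types."""
    def __init__(self, data):
        self.data = data          # e.g. {"eTotal": 9.2, "nMuons": 2}

class TagAdapter:
    """Transient interface seen by user code: forwards lookups
    to the persistent object instead of converting it."""
    def __init__(self, persistent):
        self._p = persistent
    def get_float(self, name):
        return float(self._p.data[name])
    def get_int(self, name):
        return int(self._p.data[name])

tag = TagAdapter(PersistentTag({"eTotal": 9.2, "nMuons": 2}))
```

The cost, as the slide notes, is that `TagAdapter` is now tied to the layout of `PersistentTag`; the benefit is that filtering reads no more than the attributes it asks for.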

12
Event Data Object References
  • Direct references (e.g., by pointer) between
    transient classes require special handling to be
    persisted. A general mechanism was implemented
    to support persistence of references between
    transient objects valid in a single execution
    context. In practice, this limits references to
    be within an event and does not support
    inter-event references. BABAR transient classes
    do not use direct references, and rely instead
    on indirect indexing, so this feature is not
    currently being exploited.
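A common way to persist in-event references, sketched here with invented names (the slides do not specify KANGA's exact scheme), is to replace each pointer by the index of its target in the event's object list on write, and resolve indices back to objects on read:

```python
# Persisting direct references valid within one event:
# write phase replaces object references by per-event indices;
# read phase resolves the indices back to fresh objects.

class Track:
    def __init__(self, pt, partner=None):
        self.pt = pt
        self.partner = partner    # direct reference to another Track

def persist_event(tracks):
    index = {id(t): i for i, t in enumerate(tracks)}
    return [(t.pt,
             None if t.partner is None else index[id(t.partner)])
            for t in tracks]

def restore_event(records):
    tracks = [Track(pt) for pt, _ in records]
    for t, (_, ref) in zip(tracks, records):
        if ref is not None:
            t.partner = tracks[ref]   # index -> object
    return tracks

a = Track(1.2)
b = Track(3.4, partner=a)
restored = restore_event(persist_event([a, b]))
```

Because the index table is rebuilt per event, references across events cannot be expressed, matching the limitation stated above.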

13
Event Data Schema Evolution
  • Schema describes the organization of data in a
    persistent object.
  • Schema evolution is desirable to support
    improvements in data representation and pruning
    of obsolete data.
  • ROOT I/O supports schema evolution for TObject
    subclasses via user-managed version numbers for
    each persistent class that are used to dispatch
    appropriate input-streamer code at obj-read time.
  • KANGA additionally requires updated classes to
    implement a standard (frozen) interface for
    persistent-to-transient conversion.
14
  • After schema evolution, only new objects are
    written by new code. New and existing code must
    be linked against all versions of the persistent
    classes. No change is required to user modules.

15
Conditions Data Overview
  • Non-event data tracks slowly-varying (<1 Hz)
    data-taking conditions, e.g. high voltages, gas
    flows, temperatures.
  • Calibration results are also considered
    conditions.
  • Conditions data is accessed using time as a key,
    unlike event data.
  • The full BABAR conditions DB is implemented in
    Objy and supports a flexible revision mechanism.

16
Kanga Conditions Data
  • KANGA supports access to the limited set of
    conditions needed for typical physics analysis.
    Access is read-only and limited to a single
    revision. The most recent revision of specific
    conditions is automatically extracted from Objy
    and stored in a single ROOT file of 20 MB, with
    separate files for data and MC. The ROOT
    persistent implementation uses a binary tree
    (BTree class) for efficient time-key lookup with
    1 s resolution. Correct association of event and
    non-event ROOT files requires some non-trivial
    bookkeeping.
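Time-keyed lookup of the kind described above can be sketched with a sorted key list and binary search standing in for the BTree class (names and values invented): each entry is valid from its timestamp, at 1 s resolution, until the next entry begins.

```python
# Conditions lookup keyed by time: binary search over sorted
# validity-start times returns the condition in effect at t.

import bisect

class ConditionsSet:
    def __init__(self, entries):
        # entries: {unix_time: value}, e.g. a calibration constant
        self._times = sorted(entries)
        self._values = [entries[t] for t in self._times]

    def at(self, t):
        """Return the condition valid at integer time t (1 s
        resolution): the entry with the latest start <= t."""
        i = bisect.bisect_right(self._times, int(t)) - 1
        if i < 0:
            raise KeyError("no conditions recorded before t=%d" % t)
        return self._values[i]

hv = ConditionsSet({100: 1500.0, 200: 1480.0, 300: 1500.0})
```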

17
Event Collections
  • Physics analysis typically involves analyzing
    sparse subsets of the events in a data file, but
    different analyses require different subsets.
    Sparse collections used for analysis are grouped
    into 100 skims. Skims were initially written
    using self-contained copies of each event;
    grouping correlated skims into 20 streams
    limited the event-duplication overhead to 200%.
  • More recently, pointer-based collections were
    implemented. These are more efficient for bulk
    storage and distribution, but carry additional
    book-keeping overhead. BABAR is now moving in
    this direction.
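The two skim styles contrasted above can be sketched as follows (file names and event contents invented): a self-contained skim duplicates the selected events into itself, while a pointer-based skim stores only lightweight (file, entry) references and reads the events back from the bulk store on demand.

```python
# Copy-based vs pointer-based event collections ("skims").

# Bulk store: one list of events per file.
store = {
    "run1.root": ["evt-a", "evt-b", "evt-c"],
    "run2.root": ["evt-d", "evt-e"],
}

def copy_skim(selection):
    """Self-contained skim: duplicates the event data."""
    return [store[f][i] for f, i in selection]

def pointer_skim(selection):
    """Pointer-based skim: stores only (file, entry) refs."""
    return list(selection)

def read_pointer_skim(skim):
    """Resolving a pointer skim requires the bulk store, which
    is the extra book-keeping burden noted above."""
    return [store[f][i] for f, i in skim]

sel = [("run1.root", 2), ("run2.root", 0)]
```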

18
KANGA Book-keeping Production
  • The set of available KANGA event-data files and
    their processing history is tracked in a
    relational DB managed with Perl scripts (the
    SkimTools package). This DB is used to schedule
    and monitor jobs for producing KANGA files from
    Objy (as well as physics skims from unfiltered
    data and MC). Users can query this database to
    prepare a TCL fragment that configures their
    analysis job to analyze a dataset. The size of
    the DB is 400 MB. Tables and scripts are
    compatible with Oracle and MySQL.
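A dataset query of the kind described above might look like the following sketch, with an invented one-table schema, invented TCL commands, and sqlite3 standing in for Oracle/MySQL (the real SkimTools package is Perl):

```python
# Query the bookkeeping DB for a dataset's files and emit a
# TCL-style fragment an analysis job could source.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT, dataset TEXT)")
db.executemany("INSERT INTO files VALUES (?, ?)", [
    ("run1-micro.root", "BtoJpsiKs"),
    ("run2-micro.root", "BtoJpsiKs"),
    ("run9-micro.root", "other"),
])

def tcl_fragment(dataset):
    rows = db.execute(
        "SELECT path FROM files WHERE dataset = ? ORDER BY path",
        (dataset,)).fetchall()
    # "module talk EvtInput" / "input add" are illustrative only
    lines = ["module talk EvtInput"]
    lines += ["input add %s" % path for (path,) in rows]
    return "\n".join(lines)

frag = tcl_fragment("BtoJpsiKs")
```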

19
Data Export
  • Straightforward and efficient data export was a
    primary requirement of the KANGA design. Goals:
  • - only transfer files that are new (once
    created, a file is assumed to never change)
  • - mirror the SLAC filesystem layout to simplify
    logical-to-physical name mapping between sites.
  • The initial implementation based on rsync was
    not efficient for typical directories containing
    O(1000) files. The present implementation uses
    the relational DB to efficiently generate lists
    of new files to transfer.
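Because files never change once created, the DB-driven export reduces to a set difference: everything the bookkeeping DB knows about that a site has not yet mirrored must be transferred, with no per-file content comparison of the kind rsync performs. A sketch with invented paths:

```python
# Generate the transfer list for a remote site: files known to
# the bookkeeping DB minus files already present at the site.
# The SLAC layout is mirrored verbatim, so no name remapping.

def new_files(db_files, site_files):
    # sorted for a stable, reproducible transfer order
    return sorted(set(db_files) - set(site_files))

db_files = ["data/run1.root", "data/run2.root", "mc/run1.root"]
site_files = ["data/run1.root"]
todo = new_files(db_files, site_files)
```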

20
Experience and Outlook
  • Since May 2002, the primary KANGA event store
    has been based at Rutherford (RAL). RAL
    currently stores 22 TB of data and Monte Carlo
    (8B events) in 1.1M files.
  • A survey in early 2002 found that at least 19
    institutions operated a local KANGA event store,
    including 5 with the majority of the data
    available. Head-to-head comparisons of analysis
    results obtained with KANGA and Objy provide a
    valuable QA tool.

21
  • Although conceived as a short-term solution,
    KANGA is still with us 3 years later. The burden
    of duplicated support and storage is becoming
    unsustainable. BABAR is now implementing a new
    Computing Model in which ROOT is the primary
    event store technology. This migration involves
    the eventual complete phase-out of Objectivity
    from the event store, and possibly significant
    changes to the original KANGA design to support
    other features of the new Computing Model.