AIDA: Abstract Interfaces for Data Analysis

Salamanca, July 2002. Andreas Pfeiffer, CERN/IT-API, andreas.pfeiffer_at_cern.ch

1
AIDA: Abstract Interfaces for Data Analysis
  • http://aida.freehep.org

Andreas Pfeiffer, CERN IT/API, andreas.pfeiffer_at_cern.ch
2
Outline
  • What is AIDA
  • History/Collaboration/Documentation
  • Some Details
  • Examples
  • Ongoing work
  • Summary

3
What is AIDA
  • Abstract Interfaces for Data Analysis (AIDA)
  • The goal of the AIDA project is to define
    abstract interfaces for common physics analysis
    objects, such as histograms, ntuples, fitters, IO
    etc. Adopting these interfaces should make
    it easier for developers and users to choose
    and switch between tools without having to learn
    new interfaces or change their code. In addition,
    it should be possible to exchange data (objects)
    between AIDA-compliant applications.

4
Motivation
  • Advantages
  • The user needs to learn only one set of
    interfaces
  • Same user code can be used with different
    AIDA-compliant analysis applications
  • Pool experience of different developer teams
  • LHC++, OpenScientist, JAS
  • Different analysis tools can exchange analysis
    objects
  • same storage format, use functionality from other
    tools

5
Abstract Interfaces
  • Abstract Interfaces
  • only pure virtual methods, inheritance only from
    other A.I.
  • components use other components only through
    their A.I.
  • define a kind of protocol for a component
  • Maximize flexibility and re-use of packages
  • allow each component to develop independently
  • re-use of existing packages to implement
    components reduces start-up time significantly
  • De-couple implementation of a package from its use
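The idea on this slide can be sketched in a few lines. This is a minimal illustration in Python, not the AIDA API: the interface declares operations only, and the two implementing classes (`ListHistogram1D`, `CountingHistogram1D`) and the helper `fill_track_quality` are hypothetical names invented for the example.

```python
from abc import ABC, abstractmethod


class IHistogram1D(ABC):
    """Abstract interface: declares operations, holds no implementation."""

    @abstractmethod
    def fill(self, x: float, weight: float = 1.0) -> None:
        ...

    @abstractmethod
    def entries(self) -> int:
        ...


class ListHistogram1D(IHistogram1D):
    """Hypothetical implementation A: keeps every filled value in a list."""

    def __init__(self) -> None:
        self._values = []

    def fill(self, x: float, weight: float = 1.0) -> None:
        self._values.append((x, weight))

    def entries(self) -> int:
        return len(self._values)


class CountingHistogram1D(IHistogram1D):
    """Hypothetical implementation B: keeps only a running count."""

    def __init__(self) -> None:
        self._count = 0

    def fill(self, x: float, weight: float = 1.0) -> None:
        self._count += 1

    def entries(self) -> int:
        return self._count


def fill_track_quality(hist: IHistogram1D, values) -> int:
    """User code: depends only on the interface, never on a concrete class."""
    for value in values:
        hist.fill(value)
    return hist.entries()
```

Either implementation can be passed to `fill_track_quality` unchanged, which is the de-coupling of implementation from use that the slide describes.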

6
AIDA Example
  • Use same code with any AIDA-compliant analysis
    tool.

7
Use of Components with Abstract Interfaces
  • User Code uses only Interface classes
  • IHistogram1D* hist =
    histoFactory->create1D("track quality", 100, 0., 10.);
  • Actual implementations are selected at run-time
  • loading of shared libraries
  • No change at all to user code but keep freedom
    to choose implementation
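A rough sketch of run-time selection follows. It is an analogy only: a Python dictionary stands in for the set of loadable shared libraries, and `BinnedHistogram`, `UnbinnedHistogram`, `IMPLEMENTATIONS`, `load_factory` and `sum_bin_heights` are hypothetical names, not part of AIDA. Only the `create1D` call mirrors the slide.

```python
class BinnedHistogram:
    """Hypothetical implementation 1: fixed-width bin contents."""

    def __init__(self, title, n_bins, lo, hi):
        self.title, self.n_bins, self.lo, self.hi = title, n_bins, lo, hi
        self.bins = [0.0] * n_bins

    def fill(self, x, weight=1.0):
        if self.lo <= x < self.hi:
            index = int((x - self.lo) / (self.hi - self.lo) * self.n_bins)
            self.bins[min(index, self.n_bins - 1)] += weight

    def sum_bin_heights(self):
        return sum(self.bins)


class UnbinnedHistogram:
    """Hypothetical implementation 2: stores raw values, bins nothing."""

    def __init__(self, title, n_bins, lo, hi):
        self.title, self.lo, self.hi = title, lo, hi
        self.points = []

    def fill(self, x, weight=1.0):
        if self.lo <= x < self.hi:
            self.points.append((x, weight))

    def sum_bin_heights(self):
        return sum(weight for _, weight in self.points)


class HistogramFactory:
    """Hands out histograms of whichever implementation was selected."""

    def __init__(self, implementation):
        self._implementation = implementation

    def create1D(self, title, n_bins, lo, hi):
        return self._implementation(title, n_bins, lo, hi)


# Registry standing in for the shared libraries that could be loaded.
IMPLEMENTATIONS = {"binned": BinnedHistogram, "unbinned": UnbinnedHistogram}


def load_factory(name):
    """Pick the implementation by name at run time (e.g. from a config)."""
    return HistogramFactory(IMPLEMENTATIONS[name])


def user_code(histo_factory):
    """Unchanged regardless of which implementation was selected."""
    hist = histo_factory.create1D("track quality", 100, 0.0, 10.0)
    for x in (0.5, 2.5, 2.5, 11.0):  # 11.0 lies outside the axis range
        hist.fill(x)
    return hist.sum_bin_heights()
```

Switching from `load_factory("binned")` to `load_factory("unbinned")` changes the implementation without touching `user_code`.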

8
Systems implementing AIDA
  • Three implementations of AIDA exist
  • Anaphe/Lizard (C++)
  • http://anaphe.web.cern.ch/anaphe
  • Open Scientist (C++)
  • http://www.lal.in2p3.fr/OpenScientist
  • JAIDA/JAS (Java), AIDA-JNI 1.0 (C++)
  • http://java.freehep.org/lib/freehep/doc/aida
  • GEANT4 adopted AIDA for analysis

9
AIDA Interfaces Summary
  • AIDA Factories
  • ITuple
  • IHistogram
  • ICloud
  • ITree

10
[Class diagram: Logical View / Main, from /data/pfeiffer/Rose/AIDA-2.2.mdl, Sun Jun 30 17:18:39 2002]
11
IHistogram (1D-3D)
  • Binned histogram: IHistogram1D, 2D, 3D
  • fill methods (with/without weight)
  • Histogram info: entries, mean, rms, axis
  • Bin info: centre, entries, height, error
  • Histogram arithmetic: add, multiply, divide
  • Convenience methods, like coordinate-to-index
    conversion
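The operations listed above can be illustrated with a small stand-alone class. This is a sketch under stated assumptions, not the AIDA interface: the method names are invented, out-of-range values are simply ignored, and `coord_to_index` returning -1 outside the axis is a convention chosen only for this example.

```python
import math


class Histogram1D:
    """Minimal fixed-width 1D histogram (illustrative, not the AIDA class)."""

    def __init__(self, title, n_bins, lower, upper):
        self.title, self.n_bins = title, n_bins
        self.lower, self.upper = lower, upper
        self._heights = [0.0] * n_bins   # sum of weights per bin
        self._entries = [0] * n_bins     # number of fills per bin
        self._sum_w = self._sum_wx = self._sum_wx2 = 0.0

    def coord_to_index(self, x):
        """Return the bin index for x, or -1 outside the axis range."""
        if not (self.lower <= x < self.upper):
            return -1
        width = (self.upper - self.lower) / self.n_bins
        return min(int((x - self.lower) / width), self.n_bins - 1)

    def fill(self, x, weight=1.0):
        index = self.coord_to_index(x)
        if index < 0:
            return  # this sketch does not track underflow/overflow
        self._heights[index] += weight
        self._entries[index] += 1
        self._sum_w += weight
        self._sum_wx += weight * x
        self._sum_wx2 += weight * x * x

    def entries(self):
        return sum(self._entries)

    def bin_entries(self, index):
        return self._entries[index]

    def bin_height(self, index):
        return self._heights[index]

    def bin_centre(self, index):
        width = (self.upper - self.lower) / self.n_bins
        return self.lower + (index + 0.5) * width

    def mean(self):
        return self._sum_wx / self._sum_w if self._sum_w else 0.0

    def rms(self):
        if not self._sum_w:
            return 0.0
        variance = self._sum_wx2 / self._sum_w - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))
```

Mean and rms are accumulated from the filled values rather than from bin centres, so they do not depend on the chosen binning.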

12
ITuple
  • ITuple: interface to the data
  • get/set methods for double, float, int, ...
  • Information about columns: min, max, mean, rms
  • Navigating: start(), next(), skip(int nRows)
  • Project ITuple into 1D, 2D, 3D histogram
  • Project ITuple into 1D, 2D, 3D histogram
  • New features for AIDA 2.3
  • Support for complex internal structures
    (subfolders)
  • Merging and chaining of ITuples under discussion
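A cursor-style tuple along the lines of these bullets might look as follows. The sketch assumes that `start()` positions the cursor before the first row and that `next()` and `skip()` report whether the cursor still points at a valid row; `add_row`, `get_double`, the `column_*` helpers and `project1d` are hypothetical names for illustration.

```python
class Tuple:
    """Row-wise table with a cursor, loosely modelled on the ITuple bullets."""

    def __init__(self, column_names):
        self._names = list(column_names)
        self._rows = []
        self._cursor = -1  # positioned before the first row

    def add_row(self, *values):
        if len(values) != len(self._names):
            raise ValueError("one value per column is required")
        self._rows.append(tuple(float(v) for v in values))

    def start(self):
        self._cursor = -1

    def next(self):
        self._cursor += 1
        return self._cursor < len(self._rows)

    def skip(self, n_rows):
        self._cursor += n_rows
        return self._cursor < len(self._rows)

    def get_double(self, column):
        return self._rows[self._cursor][self._names.index(column)]

    def _column(self, column):
        index = self._names.index(column)
        return [row[index] for row in self._rows]

    def column_min(self, column):
        return min(self._column(column))

    def column_max(self, column):
        return max(self._column(column))

    def column_mean(self, column):
        values = self._column(column)
        return sum(values) / len(values)

    def project1d(self, column, n_bins, lower, upper):
        """Project one column into fixed-width bin counts."""
        counts = [0] * n_bins
        width = (upper - lower) / n_bins
        for x in self._column(column):
            if lower <= x < upper:
                counts[min(int((x - lower) / width), n_bins - 1)] += 1
        return counts
```

A typical loop is `t.start()` followed by `while t.next(): ...`, reading the current row with `get_double`.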

13
ICloud
  • Unbinned collection of points: ICloud1D, 2D, 3D
  • Can represent a scatter plot or a dynamically
    rebinnable histogram
  • Can be converted to a binned histogram
  • Standard get/set methods for entries
  • Collection info: lower, upper, mean, rms
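Because a cloud keeps the individual points, the binning can be chosen after the data are collected. The following sketch shows that idea; `Cloud1D` here is an illustrative class with invented method names, not the AIDA interface.

```python
import math


class Cloud1D:
    """Unbinned collection of weighted points (illustrative sketch)."""

    def __init__(self):
        self._points = []

    def fill(self, x, weight=1.0):
        self._points.append((x, weight))

    def entries(self):
        return len(self._points)

    def lower_edge(self):
        return min(x for x, _ in self._points)

    def upper_edge(self):
        return max(x for x, _ in self._points)

    def mean(self):
        total = sum(w for _, w in self._points)
        return sum(w * x for x, w in self._points) / total

    def rms(self):
        total = sum(w for _, w in self._points)
        mean_sq = sum(w * x * x for x, w in self._points) / total
        return math.sqrt(max(mean_sq - self.mean() ** 2, 0.0))

    def to_histogram(self, n_bins, lower, upper):
        """Bin the stored points; call again with other values to rebin."""
        heights = [0.0] * n_bins
        width = (upper - lower) / n_bins
        for x, weight in self._points:
            if lower <= x < upper:
                heights[min(int((x - lower) / width), n_bins - 1)] += weight
        return heights
```

Calling `to_histogram` twice with different bin counts gives two binned views of the same data, which a histogram that stores only bin contents cannot do.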

14
IFunction and Fitting
  • Fitting: IFunction, IFitFunction
  • IFunction: simple interface, allows setting
    parameters and getting the function value
  • IFitFunction: fits a function to a histogram
  • Extends IFunction
  • Various fit control methods: step size, bounds,
    etc.
  • Allows performing the fit and getting results
  • AIDA 2.2: fitting functionality fairly limited
  • AIDA 2.3 (under discussion): extended
    functionality

15
ITree
  • ITree
  • directory-like structure (Unix directory
    convention)
  • Methods like cd, ls, mkdir, etc.
  • AIDA analysis objects (tuples, histograms,
    clouds, etc.) exist within ITree directories
  • save/restore functionality, hides storage
    details from the user
  • Compatible with database or file storage
  • Can support multiple file formats
  • Mount/Unmount functionality (like Unix) allows
    multiple stores to be seamlessly merged
  • AIDA XML format is defined for data interchange
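The directory-like behaviour, including mounting one store inside another, can be sketched with nested dictionaries. This is an illustration under assumptions: `Tree`, `put`, `find` and `pwd` are hypothetical names, storage is in memory only, and stored objects must not themselves be dictionaries, since dictionaries are used here to represent directories.

```python
class Tree:
    """In-memory directory tree with Unix-style paths (illustrative only)."""

    def __init__(self):
        self._root = {}
        self._cwd = []  # current directory as a list of path components

    def _resolve(self, path):
        parts = [] if path.startswith("/") else list(self._cwd)
        for part in path.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                if parts:
                    parts.pop()
            else:
                parts.append(part)
        return parts

    def _node(self, parts):
        node = self._root
        for part in parts:
            node = node[part]  # KeyError if the directory does not exist
            if not isinstance(node, dict):
                raise NotADirectoryError(part)
        return node

    def mkdir(self, path):
        parts = self._resolve(path)
        self._node(parts[:-1]).setdefault(parts[-1], {})

    def cd(self, path):
        parts = self._resolve(path)
        self._node(parts)  # raises if the target is missing
        self._cwd = parts

    def pwd(self):
        return "/" + "/".join(self._cwd)

    def ls(self, path="."):
        return sorted(self._node(self._resolve(path)))

    def put(self, path, obj):
        parts = self._resolve(path)
        self._node(parts[:-1])[parts[-1]] = obj

    def find(self, path):
        parts = self._resolve(path)
        return self._node(parts[:-1])[parts[-1]]

    def mount(self, path, other):
        """Make another tree's root visible under the given path."""
        parts = self._resolve(path)
        self._node(parts[:-1])[parts[-1]] = other._root
```

After `tree.mount("/run2", other)`, objects in `other` are reachable through ordinary paths in `tree`, which is the seamless merging of stores mentioned above.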

16
Summary
  • Abstract Interfaces de-couple components of
    frameworks
  • Weakly coupled components and frameworks have
    large advantages
  • User code needs no change when the
    implementation changes
  • Even across language boundaries (JAIDA)
  • Ease of re-use of a component
  • Flexibility through independence of
    implementation
  • Maintainability through independent evolution of
    components
  • Example using Geant4 and AIDA-compliant analysis
    tools (see tutorial)

17
Anaphe: OO Libraries for Data Analysis using C++
and Python
  • http://anaphe.web.cern.ch/anaphe

Andreas Pfeiffer, CERN IT/API, andreas.pfeiffer_at_cern.ch
18
Outline
  • Motivation
  • Anaphe Components
  • C++
  • Lizard: Interactive Data Analysis
  • Python
  • Summary

19
Lifetime of LHC software 25 yrs
20
Anaphe: what it is
  • Analysis for physics experiments
  • Modular (OO/C++) replacement of CERNLIB
    functionality for use in HEP experiments
  • memory management
  • I/O
  • foundation classes
  • histogramming
  • minimizing/fitting
  • visualization
  • interactive data analysis
  • Trying to use standards wherever possible
  • Trying to re-use existing class libraries

21
Anaphe Components
22
Layered Approach
  • Basic functionalities (histograms, fitting,
    etc.) are available as individual C++ class
    libraries.
  • Easy to replace one part without throwing away
    everything
  • Insulate components through Abstract Interfaces
  • Apply s/w quality control tools
  • code checking, testing

23
ANAPHE Components
[Component diagram: Python / SWIG, Objectivity/DB, HBook, NAG-C, Minuit, Qt (free edition); User Interface - using Abstract Types]
24
Basic 3D Graphic Libraries
  • OpenGL (basic graphics)
  • De-facto industry standard for basic 3D graphics
  • Used in CAD/CAE, games, VR, medical imaging
  • OpenInventor (scene mgmt.)
  • OO 3D toolkit for graphics
  • Cubes, polygons, text, materials
  • Cameras, lights, picking
  • 3D viewers/editors,animation
  • Based on OpenGL/MesaGL

25
2D Graphics libraries
  • Qt
  • multi-platform C++ GUI toolkit
  • C++ class library, not a wrapper around C libs
  • superset of Motif and MFC
  • available on Unix and MS Windows
  • no change for developer
  • commercial but with public domain version
  • www.troll.no
  • Qplotter
  • add-on functionality for HEP
  • HIGZ/HPLOT

26
Mathematical Libraries
  • NAG (Numerical Algorithms Group) C Library
  • Covers a broad range of functionality
  • Linear algebra
  • differential equations
  • quadrature, etc.
  • Special functions of CERNLIB added to Mark-6
    release
  • mostly for theory and accelerator
  • Quality assurance
  • extensive testing done by NAG
  • www.nag.com

27
CLHEP - foundation classes
  • HEP foundation class library
  • Random number generators
  • Physics vectors
  • 3- and 4-vectors
  • Geometry
  • Linear algebra
  • System of units
  • more packages recently added
  • will continue to evolve
  • wwwinfo.cern.ch/asd/lhc/clhep/

28
Histograms: the HTL package
  • Histograms are the basic tool for physics
    analysis
  • Statistical information of density distributions
  • Histogram Template Library (HTL)
  • design based on C++ templates
  • Modular: separation between sampling and
    display
  • Extensible: open for user-defined binning
    systems
  • Flexible: supports transient/persistent at the
    same time
  • Open: large use of abstract interfaces

29
Fitting and Minimization
  • Fitting and Minimization Library (FML)
  • common OO interface
  • NAG-C, MINUIT
  • based on Abstract Interfaces
  • IVector, IModelFunction, ...
  • fitting as a special case of minimization
  • minimize distance between data and model
  • replacement for HepFitting (and Gemini)
  • Gemini
  • common interface to minimizer engine
  • very thin layer
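The statement that fitting is a special case of minimization can be made concrete with a one-parameter example. This is a hedged sketch, not FML or Gemini code: a small golden-section search stands in for a real minimizer engine such as MINUIT or NAG-C, and `golden_section_minimize`, `chi_square` and `fit` are names invented here. The search assumes the distance function has a single minimum inside the given bounds.

```python
import math


def golden_section_minimize(f, lo, hi, tol=1e-10):
    """Tiny stand-in for a minimizer engine (one parameter, unimodal f)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


def chi_square(model, xs, ys, errors):
    """Build the distance between data and model as a function of the parameter."""
    def distance(parameter):
        return sum(
            ((y - model(x, parameter)) / e) ** 2
            for x, y, e in zip(xs, ys, errors)
        )
    return distance


def fit(model, xs, ys, errors, lo, hi, minimizer=golden_section_minimize):
    """Fitting = handing the data-model distance to any minimizer."""
    return minimizer(chi_square(model, xs, ys, errors), lo, hi)
```

The `minimizer` argument plays the role of the thin common interface: any routine with the same call signature could be passed in without changing `fit` or the model.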

30
Tags, Ntuples and Events
  • Tags - a special kind of Ntuple
  • Always associated with an underlying persistent
    store
  • Tags may be used to store ntuple-like data
  • extracted from all over the event
  • minPt, maxEmiss, nJets, nMuon, trigger, ...
  • Main use: speed up data selection for analysis
  • Tag simplifies selection without losing
    complexity
  • Events more complex than a tree structure (CWN)
  • lots of cross-references between classes,
    containers
  • Association from the Tag to the Event may be used
    to navigate to any other part of the Event
  • even from an interactive visualization program

31
  • Lizard: a tool for Interactive Data Analysis

32
Interactive Data Analysis
  • Aim: OO replacement for PAW (at least)
  • analysis of ntuple-like data (Tags,
    Ntuples, ...)
  • visualisation of data (Histograms, scatter-plot,
    Vectors)
  • fitting of histograms (and other data)
  • access to experiment specific data/code
  • Maximize flexibility and re-use
  • Foresee customization/integration
  • allow use from within experiments' s/w
  • Plan for extensions
  • code for now, design for the future
  • Ensure maintainability
  • use of s/w quality control tools

33
Scripting - why
  • Typical use of scripting is quite different from
    programming (reconstruction, analysis, ...)
  • history: go back to where I was before
  • repetition/looping - with modifiable parameters
  • avoid "one size fits all" or using a power tool
    as a hammer
  • rapid prototyping in scripting language
  • quick turn-around times
  • performance critical code in core language
  • exploit richer set of features/functionality
    (e.g. templates in C++)
  • scripting languages usually less susceptible to
    changes than mainstream languages
  • potentially longer lives

34
Python - why
  • Python - OO (scripting) language
  • no strange !-variables
  • sensitive to indentation
  • Easier for users
  • as Java
  • Lots of user supplied modules available and ready
    for use
  • scientific, numerics, graphics, GUI, network, OS,
    games, DBs, ...
  • example: http://www.vex.net/parnassus/
  • Parnassus totals: 1173 items in 49 categories.
  • Also usable in Java (Jython)
  • used in JAS for scripting
  • minimize changes needed within AIDA compliant
    environments

35
Python - how
  • SWIG to (semi-) automatically create connection
    to chosen scripting language
  • allows flexibility to choose amongst several
    scripting languages
  • Python, Perl, Tcl, Guile, Ruby, (Java)
  • Very easy to use
  • swig -c++ -python -shadow -c myClass.h
  • create shared lib from myClass.cpp and
    myClass_wrap.c
  • start python and import myClass to use it
  • Very easy to extend
  • simply inherit from swiggified class in python
  • modifications can later be fed back into C++
  • performance, type safety, special language
    features (templates), ...
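Extending a wrapped class from Python is ordinary inheritance. In the sketch below, `MyClass` is a pure-Python stand-in for the shadow class that SWIG would generate from myClass.h (so the example runs without a compiled module); its `transform` method and the subclass `OffsetMyClass` are hypothetical.

```python
class MyClass:
    """Stand-in for the shadow class SWIG would generate from myClass.h."""

    def __init__(self, scale):
        self._scale = scale

    def transform(self, x):
        return self._scale * x


class OffsetMyClass(MyClass):
    """Python-side extension, prototyped here before any port to C++."""

    def __init__(self, scale, offset):
        super().__init__(scale)
        self._offset = offset

    def transform(self, x):
        # Reuse the wrapped behaviour and add the new feature on top.
        return super().transform(x) + self._offset
```

Once the behaviour of `OffsetMyClass` has settled, the same logic could be moved into the C++ class for performance and type safety, as the last bullets suggest.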