Title: Analysis Software Strategy
1Analysis Software Strategy
AIDA ANAPHE LIZARD
- Jürgen Knobloch
- HTASC, DESY 9 October 2001
2Anaphe Lizard - AIDA
- Anaphe - Libraries for Hep Computing
- Full replacement of CERNLIB
- Open source, mostly license-free
- Lizard is based on Anaphe components and the
Python scripting language (through SWIG)
- PAW functionality
- young, but built on a very solid base of mature Anaphe components
- real plug-in structure
- AIDA project (Abstract Interfaces for Data
Analysis)
- Interface definitions (classes) for C++ and Java
3ANAPHE Components
(Layer diagram, top to bottom)
- User Interface (using Abstract Types): Lizard, Commander
- Abstract types, AIDA (Abstract Interfaces for Data Analysis): Histograms, NTuples, Fitting, Plotting, VectorOfPoints, Functions, Analyzer
- Implementations (HEP-specific): HTL, Tags (HepODBMS), Gemini/HepFitting, Qplotter, VectorOfPoints
- non-HEP components: Python, SWIG, Objectivity/DB, NAG-C, Qt
- CLHEP: Class Libraries for HEP
4ANAPHE - History
- LHC++: Libraries for HEP Computing
- 1995: project initiated with the aim to develop a
modular replacement of CERNLIB (libraries and tools)
- using standards (STL, OpenGL, ...) and commercial
components (e.g., NAG_C, OpenInventor, IrisExplorer)
- First iteration on physics data analysis tool
- data-driven approach (based on IRIS Explorer)
- GUI based, not command-line driven
- Request to create new physics analysis tool
- September 1999: new requirements defined together
with experiments
- identified categories/components and Abstract
Types
5AIDA - History
- started in Sept. 1999 (HepVis 99, Orsay)
- several (mini-) workshops since then
- main ones: Paris 2000 and Boston 2001
- release 1.0: summer 2000
- concentrated on developers' view
- Histogram package only
- IAxis, IHistogram, IHistogram1D, IHistogram2D,
IHistogramFactory
- release 2.0: May 2001 (Boston release)
- about 20 Interface classes
- aiming at discussion and gathering feedback
6History and Status - Lizard
- Started after CHEP-2000
- Full version out since June 2001
- PAW-like analysis functionality plus
- on-demand loading of compiled code using shared
libraries
- gives full access to experiments' analysis code
and data
- based on Abstract Interfaces
- flexible and extensible
- License-free version available
7USDP-like approach
- Start with OO analysis
- collection of user requirements
- OO design phase
- define categories and classes
- find patterns
- Create prototype
- considered throw-away
- get feedback from users
- Iterate, iterate, iterate ...
8User requirements
- Ease of use (like PAW)
- Foresee customization/integration
- e.g., use persistency/messaging/... from
experiment
- Framework used will not be exposed/imposed
- needs to be compatible with experiments'
framework
- Plan for extensions
- code for now, design for the future
- Maximize flexibility/interoperability
9Architecture Overview
- Maximize flexibility and re-use
- Abstract Interfaces allow each component to
develop independently
- not bound to a specific implementation of any
component (plug-in style)
- re-use of existing packages to implement
components reduces start-up time significantly
- Identify and use patterns
- avoid anti-patterns
- learn from other people's experiences/failures
10Ignominy: Tool to Quantify Modularity
- NCCD (spaghetti index)
- ≈ 1.0: good toolkit
- < 1.0: independent packages
- > 1.0: strongly coupled
(Chart, values not recoverable: NCCD compared for ATLAS, ROOT, ORCA (CMS), GEANT4, COBRA (CMS), Anaphe / Lizard and IGUANA; one annotation reads "Includes Fortran")
See 8-024
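As an illustration of what the spaghetti index measures, here is a minimal sketch that assumes NCCD follows Lakos's definition: the cumulative component dependency (CCD) of the package graph, divided by the CCD of a balanced binary tree with the same number of components. This is not Ignominy's own code, and the package names are hypothetical.

```python
import math


def ccd(deps):
    """Cumulative component dependency.

    For each component, count itself plus everything it depends on
    (directly or transitively), then sum these counts.
    `deps` maps every component name to the list of components it uses.
    """
    def closure(name, seen):
        if name in seen:
            return seen
        seen.add(name)
        for dependency in deps.get(name, ()):
            closure(dependency, seen)
        return seen

    return sum(len(closure(name, set())) for name in deps)


def nccd(deps):
    """CCD normalized to a balanced binary tree of the same size."""
    n = len(deps)
    balanced_tree_ccd = (n + 1) * (math.log2(n + 1) - 1) + 1
    return ccd(deps) / balanced_tree_ccd


# Hypothetical three-package systems
independent = {"histo": [], "fit": [], "plot": []}
tree = {"ui": ["histo", "plot"], "histo": [], "plot": []}
cyclic = {"histo": ["fit"], "fit": ["plot"], "plot": ["histo"]}

for label, graph in [("independent", independent), ("tree", tree), ("cyclic", cyclic)]:
    print(label, ccd(graph), round(nccd(graph), 2))
```

With these inputs the independent packages come out below 1.0, the tree-shaped system at 1.0, and the cyclically coupled one above 1.0, which matches the three ranges quoted on the slide.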
11CLHEP
- HEP foundation class library
- Random number generators
- Physics vectors
- 3- and 4- vectors
- Geometry
- Linear algebra
- System of units
- more packages recently added
- will continue to evolve
- wwwinfo.cern.ch/asd/lhc++/clhep/
12 2D Graphics libraries
- Qt
- multi-platform C++ GUI toolkit
- C++ class library, not wrapper around C libs
- superset of Motif and MFC
- available on Unix and MS Windows
- no change for developer
- commercial but with public domain version
- www.troll.no
- Qplotter
- add-on functionality for HEP
- HIGZ/HPLOT
13Basic 3D Graphic Libraries
- OpenGL (basic graphics)
- De-facto industry standard for basic 3D graphics
- Used in CAD/CAE, games, VR, medical imaging
- OpenInventor (scene mgmt.)
- OO 3D toolkit for graphics
- Cubes, polygons, text, materials
- Cameras, lights, picking
- 3D viewers/editors, animation
- Based on OpenGL/MesaGL
14Mathematical Libraries
- NAG (Numerical Algorithms Group) C Library
- Covers a broad range of functionality
- Linear algebra
- differential equations
- quadrature, etc.
- Special functions of CERNLIB added to Mark-6
release
- mostly for theory and accelerator
- Quality assurance
- extensive testing done by NAG
- www.nag.com
15Histograms: the HTL package
- Histograms are the basic tool for physics
analysis
- Statistical information of density distributions
- Histogram Template Library (HTL)
- design based on C++ templates
- Modular: separation between sampling and
display
- Extensible: open for user-defined binning
systems
- Flexible: support transient/persistent at the
same time
- Open: large use of abstract interfaces
- recent addition: 3D histograms
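The "modular" and "extensible" points can be pictured with a small sketch. It is written in Python rather than with C++ templates, and every class and function name in it is hypothetical; it does not reproduce the HTL API. The histogram only samples, the binning system is a replaceable object, and the display code lives outside the histogram.

```python
import bisect


class UniformBinning:
    """Equal-width bins on [low, high)."""

    def __init__(self, nbins, low, high):
        self.nbins, self.low, self.high = nbins, low, high

    def index(self, x):
        if not (self.low <= x < self.high):
            return None
        i = int((x - self.low) / (self.high - self.low) * self.nbins)
        return min(i, self.nbins - 1)  # guard against floating-point round-up


class VariableBinning:
    """User-defined binning given by an increasing list of bin edges."""

    def __init__(self, edges):
        self.edges = list(edges)
        self.nbins = len(self.edges) - 1

    def index(self, x):
        if not (self.edges[0] <= x < self.edges[-1]):
            return None
        return bisect.bisect_right(self.edges, x) - 1


class Histogram1D:
    """Sampling only: knows how to fill, not how to draw."""

    def __init__(self, binning):
        self.binning = binning
        self.contents = [0.0] * binning.nbins
        self.out_of_range = 0.0

    def fill(self, x, weight=1.0):
        i = self.binning.index(x)
        if i is None:
            self.out_of_range += weight
        else:
            self.contents[i] += weight


def render_text(hist, width=20):
    """Display is a separate function that only reads the bin contents."""
    peak = max(hist.contents) or 1.0
    return ["%2d | %s" % (i, "#" * round(width * c / peak))
            for i, c in enumerate(hist.contents)]


uniform = Histogram1D(UniformBinning(4, 0.0, 4.0))
variable = Histogram1D(VariableBinning([0.0, 1.0, 10.0]))
for value in (0.5, 1.5, 1.7, 9.0):
    uniform.fill(value)
    variable.fill(value)
print("\n".join(render_text(uniform)))
```

Swapping `UniformBinning` for `VariableBinning` changes how values are sampled without touching either `Histogram1D` or `render_text`.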
16Fitting and Minimization
- Fitting and Minimization Library (FML)
- common OO interface
- NAG-C, MINUIT
- based on Abstract Interfaces
- IVector, IModelFunction,
- Gemini
- common minimization interface
- very thin layer
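A rough sketch of what a common minimization interface buys: the fitting code is written once against the interface, and the engine behind it can be exchanged. The names `IMinimizer`, `GridScan`, `GoldenSection` and `fit_constant` are assumptions made for this illustration. The two toy engines merely stand in for real back ends such as NAG-C or MINUIT, and nothing here reproduces the FML or Gemini API.

```python
from abc import ABC, abstractmethod


class IMinimizer(ABC):
    """Hypothetical common interface for one-dimensional minimizers."""

    @abstractmethod
    def minimize(self, func, low, high):
        """Return an x in [low, high] that approximately minimizes func."""


class GridScan(IMinimizer):
    """Toy engine: evaluate func on a regular grid and keep the best point."""

    def __init__(self, steps=1000):
        self.steps = steps

    def minimize(self, func, low, high):
        xs = [low + (high - low) * i / self.steps for i in range(self.steps + 1)]
        return min(xs, key=func)


class GoldenSection(IMinimizer):
    """Toy engine: golden-section search, valid for unimodal functions."""

    def __init__(self, tol=1e-8):
        self.tol = tol

    def minimize(self, func, low, high):
        inv_phi = (5 ** 0.5 - 1) / 2
        a, b = low, high
        while b - a > self.tol:
            c = b - inv_phi * (b - a)
            d = a + inv_phi * (b - a)
            if func(c) < func(d):
                b = d  # minimum lies in [a, d]
            else:
                a = c  # minimum lies in [c, b]
        return (a + b) / 2


def fit_constant(values, errors, minimizer):
    """Chi-square fit of a constant to data, using any IMinimizer."""
    def chi2(p):
        return sum(((v - p) / e) ** 2 for v, e in zip(values, errors))
    return minimizer.minimize(chi2, min(values), max(values))


data, sigma = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]
for engine in (GridScan(), GoldenSection()):
    print(type(engine).__name__, round(fit_constant(data, sigma, engine), 4))
```

Both engines should land close to the mean of the data, since with equal errors the chi-square of a constant is minimized there.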
17Tags, Ntuples and Events
- NtupleTag Library
- Ntuple navigation and analysis
- common OO interface for different storage
- ODBMS
- HBook (CERNLIB)
- Exploiting Tag concept
- enhanced Ntuples
- associated with an underlying persistent store
- optional association to the Event may be used to
navigate to any other part of the Event
- even from an interactive visualization program
- main use: speed up data selection for analysis
- Tag data is typically better clustered than the
original data
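The Tag idea can be sketched as follows: cuts are applied to small tag records, and the full Event is read from the store only for tags that pass. All names here (`Event`, `Tag`, `EventStore`, `n_tracks`, `total_pt`) are hypothetical and chosen for illustration; this is not the NtupleTag library interface.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: int
    tracks: list = field(default_factory=list)  # stands in for bulky event data


@dataclass
class Tag:
    n_tracks: int
    total_pt: float
    event_id: int  # optional association back to the full Event


class EventStore:
    """Toy persistent store that counts how often a full Event is read."""

    def __init__(self, events):
        self._events = {e.event_id: e for e in events}
        self.reads = 0

    def load(self, event_id):
        self.reads += 1
        return self._events[event_id]


def make_tag(event):
    """Summarize an Event into a few selection quantities."""
    return Tag(len(event.tracks), sum(event.tracks), event.event_id)


def select(tags, store, cut):
    """Apply the cut on tags only; navigate to the Event for those that pass."""
    return [store.load(tag.event_id) for tag in tags if cut(tag)]


events = [Event(1, [1.0, 2.0]), Event(2, [10.0, 20.0, 30.0]), Event(3, [])]
store = EventStore(events)
tags = [make_tag(e) for e in events]

selected = select(tags, store, lambda tag: tag.total_pt > 5.0)
print([e.event_id for e in selected], "full events read:", store.reads)
```

Only the events whose tags pass the cut are loaded, which is where the speed-up in data selection comes from.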
18Interactive Data Analysis
- Aim: OO replacement for PAW
- analysis of ntuple-like data (Tags,
Ntuples, ...)
- visualisation of data (Histograms, scatter-plot,
Vectors)
- fitting of histograms (and other data)
- access to experiment-specific data/code
- Maximize flexibility and re-use
- plug-in structure
- careful design with limited source and binary
dependencies
- Foresee customization/integration
- allow use from within experiments' s/w framework!
19Lizard Interfaces
20Anaphe components
21Scripting (Architectural issue)
- Typical use of scripting is quite different from
programming (reconstruction, analysis, ...)
- history: go back to where I was before
- repetition/looping
- with modifiable parameters
- SWIG to (semi-) automatically create connection
to chosen scripting language
- allows flexibility to choose amongst several
scripting languages
- Python, Perl, Tcl, Guile, Ruby, (Java)
- Python: OO scripting, no strange !-variables
- other scripting languages possible (through SWIG)
- Can be enhanced and/or replaced by a GUI
- scripting window within GUI application
22Sample session
23Future Enhancements
- Access to other implementations of components
- HBOOK histograms and ntuples (RWN): mostly done
- OpenScientist, ROOT histograms?
- Adding other scripting languages
- Perl, Tcl, CINT?
- Communication with Java tools/packages
- via AIDA
- JAS
- WIRED
24Distributed Computing
- Motivation
- move code to data
- parallel analysis
- Techniques
- services via Abstract Interfaces (AI)
- late binding
- plug-in architecture
- End-user (Lizard)
- look-and-feel of local analysis
- R&D started and first prototype available soon
- CORBA
25The AIDA project
- AIDA project (Abstract Interfaces for Data
Analysis)
- Presently active: mainly developers from existing
packages
- Tony Johnson (JAS)
- Andreas Pfeiffer (Lizard/Anaphe)
- Guy Barrand (OpenScientist)
- Mark Dönszelmann (Wired)
- Developers from LHCb/Gaudi
26AIDA
- Design Interfaces for Data Analysis (in HEP)
- The goals of the AIDA project are to define
abstract interfaces for common physics analysis
tools, such as histograms. The adoption of these
interfaces should make it easier for developers
and users to select to use different tools
without having to learn new interfaces or change
their code. In addition it should be possible to
exchange data (objects) between AIDA compliant
applications. (http://aida.freehep.org)
- Open for contributions of any kind
- questions, suggestions, code, implementations
27Motivation
- Minimize coupling between components
- Provide flexibility to interchange
implementations of these interfaces
- Allow for and try to re-use existing packages
- even across language boundaries
- e.g., C++ analysis using Java Histograms
- Allow for faster turn-around time
28Components Abstract Interfaces
- User Code uses only Interface classes
- IHistogram1D* hist = histoFactory->create1D("track quality", 100, 0., 10.);
- Actual implementations are selected at run-time
- loading of shared libraries
- No change at all to user code but keep freedom to
choose implementation
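The pattern on this slide can be sketched in Python, under the assumption that a name-to-class registry stands in for loading shared libraries at run-time. The interfaces below are simplified stand-ins that borrow the names `IHistogram1D`, `IHistogramFactory` and `create1D` from the slides, not the AIDA definitions themselves. The helper names (`bin_height`, `register_factory`, `load_factory`) and both implementations are hypothetical.

```python
from abc import ABC, abstractmethod


class IHistogram1D(ABC):
    @abstractmethod
    def fill(self, x, weight=1.0): ...

    @abstractmethod
    def bin_height(self, index): ...

    @abstractmethod
    def entries(self): ...


class IHistogramFactory(ABC):
    @abstractmethod
    def create1D(self, title, nbins, low, high): ...


def _bin_index(x, nbins, low, high):
    """Shared helper: bin number for x, or None if x is outside [low, high)."""
    if not (low <= x < high):
        return None
    return min(int((x - low) / (high - low) * nbins), nbins - 1)


class DenseHistogram1D(IHistogram1D):
    """Implementation A: one list slot per bin."""

    def __init__(self, title, nbins, low, high):
        self._axis = (nbins, low, high)
        self._bins = [0.0] * nbins
        self._entries = 0

    def fill(self, x, weight=1.0):
        i = _bin_index(x, *self._axis)
        if i is not None:
            self._bins[i] += weight
            self._entries += 1

    def bin_height(self, index):
        return self._bins[index]

    def entries(self):
        return self._entries


class SparseHistogram1D(IHistogram1D):
    """Implementation B: only non-empty bins are stored."""

    def __init__(self, title, nbins, low, high):
        self._axis = (nbins, low, high)
        self._bins = {}
        self._entries = 0

    def fill(self, x, weight=1.0):
        i = _bin_index(x, *self._axis)
        if i is not None:
            self._bins[i] = self._bins.get(i, 0.0) + weight
            self._entries += 1

    def bin_height(self, index):
        return self._bins.get(index, 0.0)

    def entries(self):
        return self._entries


_FACTORIES = {}  # stand-in for the set of loadable shared libraries


def register_factory(name, histogram_class):
    class Factory(IHistogramFactory):
        def create1D(self, title, nbins, low, high):
            return histogram_class(title, nbins, low, high)
    _FACTORIES[name] = Factory


def load_factory(name):
    """Select an implementation by name at run-time."""
    return _FACTORIES[name]()


register_factory("dense", DenseHistogram1D)
register_factory("sparse", SparseHistogram1D)


def user_analysis(histo_factory, values):
    """User code: written against the interfaces only."""
    hist = histo_factory.create1D("track quality", 100, 0., 10.)
    for value in values:
        hist.fill(value)
    return hist


for name in ("dense", "sparse"):
    hist = user_analysis(load_factory(name), [0.05, 0.05, 9.99, 42.0])
    print(name, hist.entries(), hist.bin_height(0))
```

`user_analysis` is identical for both runs; only the string passed to `load_factory` changes.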
29Architectural issue: Abstract Interfaces
- Abstract Interfaces
- Only pure virtual methods, inheritance only from
other Abstract Interfaces
- Components use other components only through
their Abstract Interface
- Defines a kind of a protocol for a component
- Allow each component to develop independently
- reduces maintenance effort significantly
- Maximize flexibility and re-use of packages
- Re-use of existing packages to implement
components reduces start-up time significantly
30Architectural issue Components (I)
- Identify components by functionality
- Define protocol using Abstract Interfaces
- Emphasize separation of different aspects for
each component
- Example: Histogram
- statistical entity (density distribution of a
physics quantity)
- view of a collection of data points (which can
be a density distribution but also a detector
efficiency curve)
- command to manipulate/store/plot/fit/...
31Architectural issue Components (II)
- User's view is different from implementor's
(developer's) view
- separate Abstract Interfaces for both aspects
- command-layer vs. implementation-layer(s)
- UserInterface as a separate component
- by definition couples to most of the other
components
- Facade pattern
- promotes weak coupling between the other
components
- interfaces to scripting and/or GUI
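A minimal sketch of the Facade idea described here: the UserInterface is the only component that knows the others, so the histogram component and the plotter never reference each other. All class and method names are hypothetical and do not come from Lizard.

```python
class HistogramManager:
    """Component 1: books and fills histograms; knows nothing about plotting."""

    def __init__(self):
        self._hists = {}

    def create(self, name, nbins, low, high):
        self._hists[name] = {"nbins": nbins, "low": low, "high": high,
                             "bins": [0] * nbins}

    def fill(self, name, x):
        h = self._hists[name]
        if h["low"] <= x < h["high"]:
            i = int((x - h["low"]) / (h["high"] - h["low"]) * h["nbins"])
            h["bins"][min(i, h["nbins"] - 1)] += 1

    def bins(self, name):
        return list(self._hists[name]["bins"])


class TextPlotter:
    """Component 2: draws any list of numbers; knows nothing about histograms."""

    def plot(self, heights):
        return ["#" * n for n in heights]


class UserInterface:
    """Facade: the command layer a script or GUI would call."""

    def __init__(self, histograms, plotter):
        self._histograms = histograms
        self._plotter = plotter

    def book(self, name, nbins, low, high):
        self._histograms.create(name, nbins, low, high)

    def fill(self, name, values):
        for value in values:
            self._histograms.fill(name, value)

    def plot(self, name):
        return self._plotter.plot(self._histograms.bins(name))


ui = UserInterface(HistogramManager(), TextPlotter())
ui.book("pt", 2, 0.0, 2.0)
ui.fill("pt", [0.5, 1.5, 1.7, 5.0])
print("\n".join(ui.plot("pt")))
```

Replacing `TextPlotter` with another plotter only requires a change where the `UserInterface` is constructed, which is the weak coupling the slide refers to.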
32Initial Categories and dependencies
33Across the languages
- JAida: C++ access to Java libs
- using C++ proxies that implement the C++ Abstract
Interfaces and forward to the Java interfaces
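The proxy arrangement can be sketched generically: a proxy class implements the caller-side abstract interface and forwards each call to an object that lives behind a language boundary. The slides do not say how JAida performs the forwarding, so the `invoke` mechanism and all names below are assumptions, and a plain Python object plays the role of the Java histogram.

```python
from abc import ABC, abstractmethod


class IHistogram1D(ABC):
    """Caller-side abstract interface (simplified stand-in)."""

    @abstractmethod
    def fill(self, x, weight=1.0): ...

    @abstractmethod
    def entries(self): ...


class ForeignHistogram:
    """Stands in for a histogram living in another runtime.

    It is reachable only through a generic invoke(method, *args) call,
    as a cross-language bridge might offer.
    """

    def __init__(self):
        self.calls = []

    def invoke(self, method, *args):
        self.calls.append((method, args))
        if method == "entries":
            return sum(1 for name, _ in self.calls if name == "fill")
        return None


class HistogramProxy(IHistogram1D):
    """Implements the local interface by forwarding to the foreign object."""

    def __init__(self, remote):
        self._remote = remote

    def fill(self, x, weight=1.0):
        # Convert arguments to a type the other side is assumed to expect.
        self._remote.invoke("fill", float(x), float(weight))

    def entries(self):
        return self._remote.invoke("entries")


remote = ForeignHistogram()
hist = HistogramProxy(remote)
hist.fill(1)
hist.fill(2.5, 2.0)
print(hist.entries(), remote.calls[0])
```

Code written against `IHistogram1D` cannot tell the proxy from a native implementation, which is what allows an analysis in one language to use histograms implemented in another.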
34Infrastructure (I)
- Location of repository (Java and C++)
- Anonymous CVS access
- :pserver:anoncvs@cvs.freehep.org:/cvs/aida
(passwd: aida)
- module: aida
- Release area for C++ code
- /afs/cern.ch/sw/contrib/AIDA/<version>/AIDA/
- #include <AIDA/IHistogram.h>
- version: x.y.p
- tarballs (C++) available in /afs/cern.ch/sw/contrib/AIDA/tar/AIDA-<version>.tar.gz
35Infrastructure (III)
- Mailing lists (archived)
- <listName>@cern.ch
- project-aida-dev (open)
- project-aida (open)
- project-aida-announce (posting moderated,
subscription open)
- Web page (http://aida.freehep.org/)
- Updated automatically from repository
- On web page: links to implementations
- from whoever provides one (and informs us)
36Release schedule / plan
- Release for discussion and feedback
- even if not (yet) complete
- V 2.0: mid May 2001 (Boston release)
- C++ and Java version of the Interfaces
- V 2.1: Aug. 2001 (Genova release)
- updates from discussions at Geant-4 workshop
- fixed some problems with C++ version
- Aiming for 2-3 month release frequency
37More information
- cern.ch/Anaphe
- cern.ch/Anaphe/Lizard
- aida.freehep.org/
- cern.ch/DB
- wwwinfo.cern.ch/asd/lhc++/clhep/
38Use-cases of AIDA
- Java reference implementation in FreeHEP
repository
- JAS, OpenScientist and Lizard/Anaphe plan for
implementations of version 2.x by Dec. 2001
- Used by Gaudi/Athena (LHCb, ATLAS, Harp)
- Gaudi people involved in design
- Adopted and used in Geant-4 examples/testing
- new category created in Geant-4 for analysis
- No need to go for least common denominator
- use reasonable superset and concentrate on
design
39AIDA - Summary
- Design of Abstract Interfaces for Data Analysis
- Maximize flexibility and re-use
- Allow for faster turn-around time
- Allow for and try to re-use existing packages
- No need to go for least common denominator
- use reasonable superset
- concentrate on proper design