1
Update on the Goodness of Fit Toolkit
  • B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon,
    P. Viarengo

PHYSTAT 2005, Oxford, 11-15 September 2005
http://www.ge.infn.it/geant4/analysis/HEPstatistics
http://www.ge.infn.it/statisticaltoolkit
2
Historical background
Validation of Geant4 physics models through
comparison of simulation vs experimental data or
reference databases
3
Some use cases
The computation of the test statistics concerns the
agreement between the two samples' empirical
distribution functions
  • Regression testing
      • Throughout the software life-cycle
  • Online DAQ
      • Monitoring detector behaviour w.r.t. a reference
  • Simulation validation
      • Comparison with experimental data
  • Reconstruction
      • Comparison of reconstructed vs. expected
        distributions
  • Physics analysis
      • Comparisons of experimental distributions (ATLAS
        vs. CMS Higgs?)
      • Comparison with theoretical distributions (data
        vs. Standard Model)
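
The statement above about comparing the two samples' empirical distribution functions (EDFs) can be illustrated with a minimal Python sketch. This is not the toolkit's code: the function names `edf` and `max_edf_distance` and the two small data sets are hypothetical, chosen only to show how an EDF-based distance is obtained from two unbinned samples.

```python
from bisect import bisect_right

def edf(sample):
    """Return the empirical distribution function F(x) = fraction of values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def max_edf_distance(sample_a, sample_b):
    """Largest absolute gap between the two EDFs (the Kolmogorov-Smirnov distance)."""
    f_a, f_b = edf(sample_a), edf(sample_b)
    # Both EDFs are step functions that jump only at data points,
    # so it is enough to evaluate the gap at every pooled data point.
    return max(abs(f_a(x) - f_b(x)) for x in list(sample_a) + list(sample_b))

# Hypothetical numbers standing in for a reference and a simulated sample.
reference = [0.1, 0.4, 0.7, 1.2, 1.9]
simulated = [0.2, 0.5, 0.9, 1.0, 2.5]
distance = max_edf_distance(reference, simulated)
print(distance)
```

Other EDF-based tests differ in how they summarise the gap between the two step functions (maximum, sum of squares, weighted sum), not in how the EDFs themselves are built.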

4
G.A.P. Cirrone, S. Donadio, S. Guatelli, A.
Mantero, B. Mascialino, S. Parlati, M.G. Pia, A.
Pfeiffer, A. Ribon, P. Viarengo,
"A Goodness-of-Fit Statistical Toolkit",
IEEE Transactions on Nuclear Science 51 (5) (2004)
2056-2063.
StatisticsTesting-V1-01-00 release downloadable
from the web:
http://www.ge.infn.it/geant4/analysis/HEPstatistics/
5
Vision of the project
  • Basic vision
      • General purpose tool
      • Toolkit approach (choice open to users)
      • Open source product
      • Independent from specific analysis tools
      • Easily usable in analysis and other tools

Clearly define scope, objectives
Software quality
  • Rigorous software process

Flexible, extensible, maintainable system
  • Build on a solid architecture

6
Software process guidelines
  • Adopt a process
      • the key to software quality...
  • Unified Process, specifically tailored to the
    project
      • practical guidance and tools from the RUP
      • both rigorous and lightweight
      • mapping onto ISO 15504 (and CMM)
  • Incremental and iterative life-cycle
      • 1st cycle: 2-sample GoF tests
      • 1-sample GoF in preparation

7
Architectural guidelines
  • The project adopts a solid architectural approach
      • to offer the functionality and the quality needed
        by the users
      • to be maintainable over a long time scale
      • to be extensible, to accommodate future
        evolutions of the requirements
  • Component-based architecture
      • to facilitate re-use and integration in diverse
        frameworks
      • layer architecture pattern
      • core component for statistical computation
      • independent components for interface to user
        analysis environments
  • Dependencies
      • no dependence on any specific analysis tool
      • can be used by any analysis tool, or together
        with any analysis tool
      • offer a (HEP) standard (AIDA) for the user layer
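
The layering described above (a core statistical component plus independent interface components) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `ToyCloud`, `from_toy_cloud`, `ALGORITHMS` and `compare` are hypothetical names and do not reproduce the toolkit's classes or the AIDA interfaces.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# ---- Core component: pure statistics, no knowledge of any analysis tool ----
def ks_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Maximum gap between the empirical distribution functions of a and b."""
    sa, sb = sorted(a), sorted(b)
    return max(abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
               for x in sa + sb)

# Registry of algorithms; adding a test does not touch the user layer.
ALGORITHMS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "kolmogorov-smirnov": ks_distance,
}

# ---- User layer: one small adapter per analysis environment ----
@dataclass
class ToyCloud:
    """Hypothetical stand-in for an unbinned analysis object of some tool."""
    entries: List[float]

def from_toy_cloud(obj: ToyCloud) -> List[float]:
    """Adapter: translate the tool-specific object into plain numbers."""
    return list(obj.entries)

def compare(obj_a, obj_b, algorithm: str, adapter: Callable) -> float:
    """User-facing call: choose the analysis objects and the algorithm, nothing else."""
    return ALGORITHMS[algorithm](adapter(obj_a), adapter(obj_b))

distance = compare(ToyCloud([1.0, 2.0, 3.0]), ToyCloud([1.5, 2.5, 3.5]),
                   "kolmogorov-smirnov", from_toy_cloud)
print(distance)
```

In this sketch, supporting another analysis environment means writing one more adapter function; the core component and the registry stay unchanged.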

10
User Layer
  • Simple user layer
  • Shields the user from the complexity of the
    underlying algorithms and design
  • Deals only with the user's analysis objects and
    the choice of comparison algorithm

11
GoF algorithms (currently implemented)
  • Algorithms for binned distributions
      • Anderson-Darling test
      • Chi-squared test
      • Fisz-Cramer-von Mises test
      • Tiku test (Cramer-von Mises test in chi-squared
        approximation)
  • Algorithms for unbinned distributions
      • Anderson-Darling test
      • Cramer-von Mises test
      • Goodman test (Kolmogorov-Smirnov test in
        chi-squared approximation)
      • Kolmogorov-Smirnov test
      • Kuiper test
      • Tiku test (Cramer-von Mises test in chi-squared
        approximation)
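
To make the list above more concrete, the following Python sketch computes three of the named statistics in their textbook two-sample forms: the chi-squared statistic for histograms with identical binning, and the Kuiper and Cramer-von Mises statistics for unbinned samples. It is an illustration under stated assumptions, not the toolkit's implementation: the function names are hypothetical, and only the test statistics are computed, not the p-values.

```python
from bisect import bisect_right
from math import sqrt

def chi2_binned(counts_a, counts_b):
    """Two-sample chi-squared statistic for two histograms with the same bins.

    The scale factors account for histograms with different total contents.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    k_a, k_b = sqrt(total_b / total_a), sqrt(total_a / total_b)
    return sum((k_a * a - k_b * b) ** 2 / (a + b)
               for a, b in zip(counts_a, counts_b) if a + b > 0)  # skip empty bins

def _edf_differences(sample_a, sample_b):
    """F_a(x) - F_b(x) evaluated at every point of the pooled sample."""
    sa, sb = sorted(sample_a), sorted(sample_b)
    return [bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb)
            for x in sa + sb]

def kuiper_statistic(sample_a, sample_b):
    """V = D+ + D-: largest positive gap plus largest negative gap."""
    diffs = _edf_differences(sample_a, sample_b)
    # The gap is 0 at the largest pooled point, so max >= 0 and min <= 0.
    return max(diffs) - min(diffs)

def cramer_von_mises_statistic(sample_a, sample_b):
    """T = N*M/(N+M)^2 * sum of squared EDF gaps over the pooled sample."""
    n, m = len(sample_a), len(sample_b)
    diffs = _edf_differences(sample_a, sample_b)
    return n * m / (n + m) ** 2 * sum(d * d for d in diffs)

print(chi2_binned([10, 20], [20, 10]))
print(kuiper_statistic([1, 2, 3], [4, 5, 6]))
print(cramer_von_mises_statistic([1, 2, 3], [4, 5, 6]))
```

The Kolmogorov-Smirnov distance uses only the single largest gap, whereas the Kuiper statistic combines the largest gaps of both signs and the Cramer-von Mises statistic accumulates all of them.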

12
Recent extensions: algorithms
  • Fisz-Cramer-von Mises test and Anderson-Darling
    test
      • exact asymptotic distribution (earlier: critical
        values)
  • Tiku test
      • Cramer-von Mises test in a chi-squared
        approximation
  • New tests: weighted Kolmogorov-Smirnov, weighted
    Cramer-von Mises
      • various weighting functions available in the
        literature
  • In preparation
      • Watson test (can be applied in case of cyclic
        observations, like the Kuiper test)
      • Girone test
  • It is the most complete software for the
    comparison of two distributions, even among
    commercial/professional statistics tools
      • goal: provide all 2-sample GoF algorithms
        existing in the statistics literature
  • Publication in preparation to describe the new
    algorithms
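
For orientation, weighted two-sample statistics of the Cramer-von Mises type are commonly written in the generic form below. The notation is an assumption introduced here, not taken from the slides: F_N and G_M are the empirical distribution functions of the two samples of sizes N and M, H is the empirical distribution function of the pooled sample, and ψ is the weighting function. The slides do not say which weighting functions the toolkit implements.

```latex
W^2_{\psi} \;=\; \frac{NM}{N+M} \int_{-\infty}^{\infty}
  \left[ F_N(x) - G_M(x) \right]^2 \, \psi\!\left( H(x) \right) \, dH(x),
\qquad
H(x) \;=\; \frac{N\,F_N(x) + M\,G_M(x)}{N+M}
```

With ψ(u) = 1 this reduces to the unweighted Cramer-von Mises statistic, and with ψ(u) = 1/[u(1-u)] it gives the Anderson-Darling statistic, which puts more weight on the tails.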

13
Recent extensions: user layer
  • First release: user layer for AIDA analysis
    objects
      • LCG Architecture Blueprint, Geant4 requirement
  • July 2005: added user layer for ROOT histograms
      • in response to user requirements
  • Other user layer implementations foreseen
      • easy to add
      • sound architecture decouples the mathematical
        component and the user's representation of
        analysis objects
      • different requirements from various user
        communities: satisfy them without introducing
        dependencies on any analysis tools

14
Software release
  • Releases are publicly downloadable from the web
      • code, documentation etc.
  • For the convenience of LCG users, releases are
    also distributed with LCG AA software as
    external contributions
  • Also ported to Java, distributed with JAS
  • Release with new algorithms planned in autumn
      • publication on recent extensions
  • Releases include extensive user documentation
      • statistics algorithms
      • how to use the software
  • The project is systematically accompanied by
    publications in refereed journals to document the
    recognition of its scientific value

15
Usage
  • Geant4 physics validation
      • rigorous approach: quantitative evaluation of
        Geant4 physics models with respect to established
        reference data
      • see for instance K. Amako et al., "Comparison of
        Geant4 electromagnetic physics models against the
        NIST reference data", IEEE Trans. Nucl. Sci. 52 (4)
        (2005) 910-918
  • LCG Simulation Validation project
      • see for instance A. Ribon, "Testing Geant4 with a
        simplified calorimeter setup",
        http://www.ge.infn.it/geant4/events/july2005
  • CMS
      • validation of new histograms w.r.t. reference
        ones in OSCAR Validation Suite
  • Usage also in space science, medicine etc.

16
Power of GoF tests
  • Do we really need such a wide collection of GoF
    tests? Why?
  • Which is the most appropriate test to compare two
    distributions?
  • How good is a test at recognizing real
    equivalent distributions and rejecting fake ones?

Which test to use?
17
Systematic study of GoF tests
  • No comprehensive study of the relative power of
    GoF tests exists in the literature
      • novel research in statistics (not only in physics
        data analysis!)
  • Systematic study of all existing GoF tests in
    progress
      • made possible by the extensive collection of
        tests in the Statistical Toolkit
  • Provide guidance to the users based on sound
    quantitative arguments
  • Preliminary results available
  • Publication in preparation

18
Method for the evaluation of power
Pseudoexperiment: a random drawing of two
samples from two parent distributions
N = 1000 Monte Carlo replicas
For each test, the p-value computed by the GoF
Toolkit derives from the analytical calculation
of the asymptotic distribution, often depending
on the sample sizes
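
The procedure on this slide can be sketched as a small Monte Carlo loop in Python. The sketch makes several assumptions that are not in the slides: the parent distributions are two Gaussians, the test is Kolmogorov-Smirnov with the textbook asymptotic p-value series, the significance level is 0.05, and all function names are hypothetical. It is meant to show the shape of the method, not to reproduce the authors' study.

```python
import random
from bisect import bisect_right
from math import exp, sqrt

def ks_distance(a, b):
    """Maximum gap between the empirical distribution functions of a and b."""
    sa, sb = sorted(a), sorted(b)
    return max(abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
               for x in sa + sb)

def ks_asymptotic_p_value(d, n, m):
    """Asymptotic p-value: Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)."""
    lam = sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # Q is 1 to high accuracy here, and the series is ill-behaved at 0
    total = sum((-1) ** (j - 1) * exp(-2.0 * j * j * lam * lam) for j in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))

def estimate_power(draw_a, draw_b, size, replicas=1000, alpha=0.05, seed=12345):
    """Fraction of pseudoexperiments in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(replicas):
        # One pseudoexperiment: two samples drawn from the two parents.
        a = [draw_a(rng) for _ in range(size)]
        b = [draw_b(rng) for _ in range(size)]
        if ks_asymptotic_p_value(ks_distance(a, b), size, size) < alpha:
            rejections += 1
    return rejections / replicas

# Assumed parents: a unit Gaussian, and the same Gaussian shifted by one sigma.
gauss = lambda rng: rng.gauss(0.0, 1.0)
shifted = lambda rng: rng.gauss(1.0, 1.0)

same_parent = estimate_power(gauss, gauss, size=50)
different_parent = estimate_power(gauss, shifted, size=50)
print(same_parent, different_parent)
```

When both samples come from the same parent, the rejection fraction estimates the rate of false rejections and should stay close to the chosen level; when the parents differ, it estimates the power. Repeating the call with several values of `size` shows how the power depends on the sample size.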
19
Parent distributions
Also Breit-Wigner, other distributions being
considered
20
Characterization of distributions
Skewness
Tailweight
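
The defining formulas for these two quantities did not survive in this transcript. The Python sketch below therefore uses an assumed quantile-based definition of the kind used in robust statistics: skewness as the ratio of the upper to the lower half-spread, and tailweight as the ratio of a wide quantile range to a narrower one. The specific quantile levels (0.025, 0.125, 0.5, 0.875, 0.975) and the function names are assumptions for illustration and may differ from the definitions used by the authors.

```python
def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted sample, 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def skewness_index(sample):
    """Assumed definition: (x_0.975 - x_0.5) / (x_0.5 - x_0.025); equals 1 if symmetric."""
    s = sorted(sample)
    return (quantile(s, 0.975) - quantile(s, 0.5)) / (quantile(s, 0.5) - quantile(s, 0.025))

def tailweight_index(sample):
    """Assumed definition: (x_0.975 - x_0.025) / (x_0.875 - x_0.125); larger for longer tails."""
    s = sorted(sample)
    return (quantile(s, 0.975) - quantile(s, 0.025)) / (quantile(s, 0.875) - quantile(s, 0.125))

# Evenly spaced, symmetric toy sample standing in for a uniform distribution.
uniform_like = list(range(-100, 101))
print(skewness_index(uniform_like), tailweight_index(uniform_like))
```

Quantile-based indices of this kind remain defined for distributions whose moments do not exist, such as the Breit-Wigner, which is one reason to prefer them over moment-based skewness and kurtosis in a study that covers long-tailed parents.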
21
Case: Parent1 = Parent2
The location-scale problem
Preliminary
Kolmogorov-Smirnov test, CL = 0.05
The power increases with the sample size
(analytical calculation of the asymptotic
distribution)
(Plot: power versus sample size N, for small size
and moderate size samples)
22
Case: Parent1 ≠ Parent2
Preliminary
The general shape problem
A) Symmetric distributions
(S1 = S2 = 1)
For short/medium tailed distributions
For long tailed distributions
B) Skewed versus symmetric distributions
T2
23
Comparative evaluation of tests
Preliminary
Tailweight
Skewness
24
Preliminary results
  • No clear winner for all the considered
    distributions in general
      • the performance of a test depends on its
        intrinsic features as well as on the features of
        the distributions to be compared
  • Practical recommendations
      • first classify the type of the distributions in
        terms of skewness and tailweight
      • choose the most appropriate test given the type
        of distributions
  • Systematic study of the power in progress
      • for both binned and unbinned distributions
  • Topic still subject to research activity in the
    domain of statistics
  • Publication in preparation

25
Outlook
  • 1-sample GoF tests (comparison w.r.t. a function)
  • Comparison of two/multi-dimensional distributions
  • Systematic study of the power of GoF tests
  • Goal: to provide an extensive set of the algorithms
    published so far in the statistics literature, with a
    critical evaluation of their relative strengths
    and applicability
  • Treatment of errors, filtering
  • New release coming soon
  • New papers in preparation
  • Other components beyond GoF? Suggestions are
    welcome

26
Conclusions
  • A novel, complete software toolkit for
    statistical analysis is being developed
      • rich set of algorithms
      • rigorous architectural design
      • rigorous software process
  • A systematic study of the power of GoF tests is
    in progress
      • unexplored area of research
  • Application in various domains
      • Geant4, HEP, space science, medicine
  • Feedback and suggestions are very much
    appreciated
  • The project is open to developers interested in
    statistical methods