Update on the Goodness of Fit Toolkit - PowerPoint PPT Presentation

Title: Update on the Goodness of Fit Toolkit

1
Update on the Goodness of Fit Toolkit

B. Mascialino, A. Pfeiffer, M.G. Pia, A. Ribon,
P. Viarengo

PHYSTAT 2005 Oxford, 11-15 September 2005
http://www.ge.infn.it/geant4/analysis/HEPstatistics
http://www.ge.infn.it/statisticaltoolkit
2
Historical background
Validation of Geant4 physics models through
comparison of simulation vs experimental data or
reference databases
3
Some use cases
The computation of the test statistic concerns the
agreement between the empirical distribution
functions of the two samples
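
The idea of measuring agreement between two empirical distribution functions can be sketched in a few lines. This is only an illustration in plain Python, not the toolkit's code: the names edf and ks_distance are hypothetical, and the statistic shown is the Kolmogorov-Smirnov distance, one of several EDF-based statistics.

```python
from bisect import bisect_right


def edf(sample):
    """Return the empirical distribution function of a sample as a callable."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # Fraction of observations less than or equal to x.
        return bisect_right(ordered, x) / n

    return F


def ks_distance(sample_a, sample_b):
    """Largest absolute difference between the two samples' EDFs."""
    F_a, F_b = edf(sample_a), edf(sample_b)
    # Both EDFs are step functions, so the largest difference
    # is reached at one of the observed values.
    pooled = list(sample_a) + list(sample_b)
    return max(abs(F_a(x) - F_b(x)) for x in pooled)


if __name__ == "__main__":
    a = [0.1, 0.4, 0.7, 1.2]
    b = [0.3, 0.9, 1.5, 2.0]
    print("KS distance:", ks_distance(a, b))
```

A distance of 0 means the two EDFs coincide; a distance of 1 means the samples do not overlap at all.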

Regression testing: throughout the software life-cycle
Online DAQ: monitoring detector behaviour w.r.t. a reference
Simulation validation: comparison with experimental data
Reconstruction: comparison of reconstructed vs. expected distributions
Physics analysis: comparison of experimental distributions (ATLAS vs. CMS Higgs?) and comparison with theoretical distributions (data vs. Standard Model)

4
G.A.P. Cirrone, S. Donadio, S. Guatelli, A.
Mantero, B. Mascialino, S. Parlati, M.G. Pia, A.
Pfeiffer, A. Ribon, P. Viarengo, "A
Goodness-of-Fit Statistical Toolkit", IEEE
Transactions on Nuclear Science 51 (5) (2004)
2056-2063.
StatisticsTesting-V1-01-00 release downloadable
from the web: http://www.ge.infn.it/geant4/analysis/HEPstatistics/
5
Vision of the project

Basic vision
General purpose tool
Toolkit approach (choice open to users)
Open source product
Independent from specific analysis tools
Easily usable in analysis and other tools

Clearly define scope, objectives
Software quality

Rigorous software process

Flexible, extensible, maintainable system

Build on a solid architecture

6
Software process guidelines

Adopt a process
the key to software quality...
Unified Process, specifically tailored to the
project
practical guidance and tools from the RUP
both rigorous and lightweight
mapping onto ISO 15504 (and CMM)
Incremental and iterative life-cycle
1st cycle: 2-sample GoF tests
1-sample GoF in preparation

7
Architectural guidelines

The project adopts a solid architectural approach
to offer the functionality and the quality needed
by the users
to be maintainable over a large time scale
to be extensible, to accommodate future
evolutions of the requirements
Component-based architecture
to facilitate re-use and integration in diverse
frameworks
layer architecture pattern
core component for statistical computation
independent components for interface to user
analysis environments
Dependencies
no dependence on any specific analysis tool
can be used by any analysis tools, or together
with any analysis tools
offer a (HEP) standard (AIDA) for the user layer
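
The layering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's actual design: HypotheticalCloud, cloud_to_sequence, ALGORITHMS and compare are invented names. The point is only that the core function sees plain numbers, while a thin user layer converts an analysis object into them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


# --- Core component: statistics only, no knowledge of any analysis tool ---
def ks_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov distance on plain sequences."""
    xs, ys = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        x = min(xs[i], ys[j])
        # Advance both samples past every value <= x (handles ties).
        while i < len(xs) and xs[i] <= x:
            i += 1
        while j < len(ys) and ys[j] <= x:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


# Registry of comparison algorithms offered by the core (hypothetical).
ALGORITHMS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "kolmogorov-smirnov": ks_distance,
}


# --- User layer: one small adapter per analysis environment ---
@dataclass
class HypotheticalCloud:
    """Stand-in for an unbinned analysis object of some analysis tool."""
    entries: List[float]


def cloud_to_sequence(cloud: HypotheticalCloud) -> List[float]:
    """Adapter: extract plain numbers from the user's analysis object."""
    return list(cloud.entries)


def compare(obj_a, obj_b, algorithm: str, to_sequence: Callable) -> float:
    """The user picks the objects and the algorithm; nothing else is exposed."""
    return ALGORITHMS[algorithm](to_sequence(obj_a), to_sequence(obj_b))


if __name__ == "__main__":
    a = HypotheticalCloud([0.1, 0.4, 0.7, 1.2])
    b = HypotheticalCloud([0.3, 0.9, 1.5, 2.0])
    print(compare(a, b, "kolmogorov-smirnov", cloud_to_sequence))
```

Supporting another analysis environment would then mean writing another adapter, leaving the core untouched.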

8
(No Transcript)
9
(No Transcript)
10
User Layer

Simple user layer
Shields the user from the complexity of the
underlying algorithms and design
Deals only with the user's analysis objects and
the choice of comparison algorithm

11
GoF algorithms (currently implemented)

Algorithms for binned distributions
Anderson-Darling test
Chi-squared test
Fisz-Cramer-von Mises test
Tiku test (Cramer-von Mises test in chi-squared
approximation)
Algorithms for unbinned distributions
Anderson-Darling test
Cramer-von Mises test
Goodman test (Kolmogorov-Smirnov test in
chi-squared approximation)
Kolmogorov-Smirnov test
Kuiper test
Tiku test (Cramer-von Mises test in chi-squared
approximation)
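
As an illustration of a test for binned distributions, the chi-squared statistic for two histograms with identical binning can be sketched as below. This is the textbook form for histograms with possibly different totals, written in plain Python; the function name chi2_two_histograms is hypothetical and the code is not taken from the toolkit. Bins that are empty in both histograms are skipped.

```python
def chi2_two_histograms(counts_a, counts_b):
    """Chi-squared statistic comparing two histograms with the same binning.

    Each bin contributes
        (sqrt(B/A) * a_i - sqrt(A/B) * b_i)**2 / (a_i + b_i),
    where A and B are the total contents of the two histograms.
    For equal totals this reduces to (a_i - b_i)**2 / (a_i + b_i).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("histograms must not be empty")

    scale_a = (total_b / total_a) ** 0.5
    scale_b = (total_a / total_b) ** 0.5

    chi2 = 0.0
    for n_a, n_b in zip(counts_a, counts_b):
        if n_a == 0 and n_b == 0:
            continue  # an empty bin carries no information
        chi2 += (scale_a * n_a - scale_b * n_b) ** 2 / (n_a + n_b)
    return chi2


if __name__ == "__main__":
    reference = [12, 30, 41, 25, 9]
    candidate = [10, 33, 38, 29, 7]
    print("chi2:", chi2_two_histograms(reference, candidate))
```

Turning the statistic into a p-value additionally requires the number of degrees of freedom, which depends on how the histograms were normalised; that step is left out of this sketch.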

12
Recent extensions: algorithms

Fisz-Cramer-von Mises test and Anderson-Darling test
exact asymptotic distribution (earlier: critical values)
Tiku test
Cramer-von Mises test in a chi-squared approximation
New tests: weighted Kolmogorov-Smirnov, weighted Cramer-von Mises
various weighting functions available in the literature
In preparation
Watson test (can be applied in case of cyclic observations, like the Kuiper test)
Girone test
It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools
goal: provide all 2-sample GoF algorithms existing in the statistics literature
Publication in preparation to describe the new algorithms
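
The remark about cyclic observations can be made concrete with the Kuiper statistic, which adds the largest positive and the largest negative deviation between the two empirical distribution functions. A minimal sketch in plain Python follows; the function names are hypothetical and the angles are made-up illustration values. It shows that moving the origin of a circular variable leaves the Kuiper statistic unchanged, while the Kolmogorov-Smirnov distance changes.

```python
from bisect import bisect_right


def _edf_differences(sample_a, sample_b):
    """EDF differences F_a(x) - F_b(x) evaluated at every observed value."""
    xs, ys = sorted(sample_a), sorted(sample_b)
    return [
        bisect_right(xs, x) / len(xs) - bisect_right(ys, x) / len(ys)
        for x in xs + ys
    ]


def kuiper_statistic(sample_a, sample_b):
    """V = D+ + D-, the sum of the largest deviations in each direction."""
    diffs = _edf_differences(sample_a, sample_b)
    d_plus = max(0.0, max(diffs))
    d_minus = max(0.0, -min(diffs))
    return d_plus + d_minus


def ks_distance(sample_a, sample_b):
    """D = max(D+, D-), the Kolmogorov-Smirnov distance."""
    return max(abs(d) for d in _edf_differences(sample_a, sample_b))


def rotate(angles, shift):
    """Move the origin of angles measured in degrees on [0, 360)."""
    return [(x + shift) % 360 for x in angles]


if __name__ == "__main__":
    a = [10, 20, 30, 200]
    b = [100, 110, 120, 350]
    for shift in (0, 340):
        ra, rb = rotate(a, shift), rotate(b, shift)
        print(shift, "Kuiper:", kuiper_statistic(ra, rb), "KS:", ks_distance(ra, rb))
```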

13
Recent extensions: user layer

First release: user layer for AIDA analysis objects
LCG Architecture Blueprint, Geant4 requirement
July 2005: added user layer for ROOT histograms
in response to user requirements
Other user layer implementations foreseen
easy to add
sound architecture decouples the mathematical component and the user's representation of analysis objects
different requirements from various user communities: satisfy them without introducing dependencies on any analysis tools

14
Software release

Releases are publicly downloadable from the web
code, documentation etc.
For the convenience of LCG users, releases are
also distributed with LCG AA software as
external contributions
Also ported to Java, distributed with JAS
Release with new algorithms planned in autumn
publication on recent extensions
Releases include extensive user documentation
statistics algorithms
how to use the software
The project is systematically accompanied by
publications in refereed journals to document the
recognition of its scientific value

15
Usage

Geant4 physics validation
rigorous approach: quantitative evaluation of Geant4 physics models with respect to established reference data
see for instance K. Amako et al., "Comparison of Geant4 electromagnetic physics models against the NIST reference data", IEEE Trans. Nucl. Sci. 52 (4) (2005) 910-918
LCG Simulation Validation project
see for instance A. Ribon, "Testing Geant4 with a simplified calorimeter setup", http://www.ge.infn.it/geant4/events/july2005
CMS
validation of new histograms w.r.t. reference
ones in OSCAR Validation Suite
Usage also in space science, medicine etc.

16
Power of GoF tests

Do we really need such a wide collection of GoF
tests? Why?
Which is the most appropriate test to compare two
distributions?
How good is a test at recognizing real
equivalent distributions and rejecting fake ones?

Which test to use?
17
Systematic study of GoF tests

No comprehensive study of the relative power of
GoF tests exists in the literature
novel research in statistics (not only in physics
data analysis!)
Systematic study of all existing GoF tests in
progress
made possible by the extensive collection of
tests in the Statistical Toolkit
Provide guidance to the users based on sound
quantitative arguments
Preliminary results available
Publication in preparation

18
Method for the evaluation of power
Pseudoexperiment: a random drawing of two
samples from two parent distributions
N = 1000 Monte Carlo replicas
For each test, the p-value computed by the GoF
Toolkit derives from the analytical calculation
of the asymptotic distribution, often depending
on the sample sizes
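
The procedure can be sketched as a small Monte Carlo loop: draw two samples, compute the p-value, count how often the test rejects. The sketch below is an illustration in plain Python, not the toolkit's implementation. It assumes the Kolmogorov-Smirnov test with the plain asymptotic Kolmogorov series for the p-value, and the names estimate_power, draw_a and draw_b, as well as the fixed seed, are hypothetical.

```python
import math
import random
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between the EDFs."""
    xs, ys = sorted(a), sorted(b)
    return max(
        abs(bisect_right(xs, x) / len(xs) - bisect_right(ys, x) / len(ys))
        for x in xs + ys
    )


def ks_asymptotic_p_value(d, n, m):
    """p-value from the asymptotic Kolmogorov distribution.

    Approximate for finite samples; it depends on the sample sizes
    through the effective size n*m/(n+m).
    """
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The p-value equals 1 to many digits in this region.
        return 1.0
    series = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2) for j in range(1, 101)
    )
    return min(1.0, max(0.0, 2.0 * series))


def estimate_power(draw_a, draw_b, n, m, alpha=0.05, replicas=1000, seed=12345):
    """Fraction of pseudoexperiments in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(replicas):
        a = [draw_a(rng) for _ in range(n)]
        b = [draw_b(rng) for _ in range(m)]
        if ks_asymptotic_p_value(ks_distance(a, b), n, m) < alpha:
            rejections += 1
    return rejections / replicas


if __name__ == "__main__":
    shifted = lambda rng: rng.gauss(1.0, 1.0)
    standard = lambda rng: rng.gauss(0.0, 1.0)
    for size in (10, 50):
        print(size, estimate_power(shifted, standard, size, size))
```

When both samples come from the same parent, the same loop estimates the false rejection rate instead of the power.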
19
Parent distributions
Also Breit-Wigner, other distributions being
considered
20
Characterization of distributions
Skewness
Tailweight
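
The formulas used on this slide did not survive in the transcript. As an illustration only, the sketch below uses one common quantile-based choice, which is an assumption and not necessarily the authors' definition: skewness as the ratio of the upper to the lower half-spread around the median, and tailweight as the ratio of a wide to a narrow quantile range. With this choice a symmetric distribution has skewness 1. All function names are hypothetical.

```python
def quantile(sample, q):
    """Linearly interpolated sample quantile, 0 <= q <= 1."""
    ordered = sorted(sample)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def skewness_index(sample):
    """Assumed definition: (x_0.975 - x_0.5) / (x_0.5 - x_0.025).

    Equals 1 for a symmetric sample; undefined if the lower spread is zero.
    """
    median = quantile(sample, 0.5)
    return (quantile(sample, 0.975) - median) / (median - quantile(sample, 0.025))


def tailweight_index(sample):
    """Assumed definition: (x_0.975 - x_0.025) / (x_0.875 - x_0.125).

    Larger values indicate heavier tails.
    """
    wide = quantile(sample, 0.975) - quantile(sample, 0.025)
    narrow = quantile(sample, 0.875) - quantile(sample, 0.125)
    return wide / narrow


if __name__ == "__main__":
    flat = list(range(-100, 101))               # symmetric, short tails
    heavy = [x ** 3 for x in range(-100, 101)]  # symmetric, longer tails
    skewed = [x * x for x in range(201)]        # right-skewed
    for name, data in (("flat", flat), ("heavy", heavy), ("skewed", skewed)):
        print(name, skewness_index(data), tailweight_index(data))
```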
21
Case: Parent 1 = Parent 2
The location-scale problem
Preliminary
Kolmogorov-Smirnov test, CL = 0.05
The power increases with the sample size
(analytical calculation of the asymptotic
distribution)
(Plot: power vs. sample size N; small size samples and moderate size samples)
22
Case: Parent 1 ≠ Parent 2
Preliminary
The general shape problem
A) Symmetric distributions
(S1 = S2 = 1)
For short/medium tailed distributions
For long tailed distributions
B) Skewed versus symmetric distributions
T2
23
Comparative evaluation of tests
Preliminary
Tailweight
Skewness
24
Preliminary results

No clear winner for all the considered
distributions in general
the performance of a test depends on its
intrinsic features as well as on the features of
the distributions to be compared
Practical recommendations
first classify the type of the distributions in
terms of skewness and tailweight
choose the most appropriate test given the type
of distributions
Systematic study of the power in progress
for both binned and unbinned distributions
Topic still subject to research activity in the
domain of statistics
Publication in preparation

25
Outlook

1-sample GoF tests (comparison w.r.t. a function)
Comparison of two/multi-dimensional distributions
Systematic study of the power of GoF tests
Goal: to provide an extensive set of the algorithms
published so far in the statistics literature, with a
critical evaluation of their relative strengths
and applicability
Treatment of errors, filtering
New release coming soon
New papers in preparation
Other components beyond GoF? Suggestions are
welcome

26
Conclusions

A novel, complete software toolkit for
statistical analysis is being developed
rich set of algorithms
rigorous architectural design
rigorous software process
A systematic study of the power of GoF tests is
in progress
unexplored area of research
Application in various domains
Geant4, HEP, space science, medicine
Feedback and suggestions are very much
appreciated
The project is open to developers interested in
statistical methods