Title: TMVA, Toolkit for Multivariate Analysis - Xavier Prudent (LAPP)
1. Andreas Höcker (ATLAS), Kai Voss (ATLAS), Helge Voss (LHCb), Jörg Stelzer (BaBar), Peter Speckmayer (CERN)
Xavier Prudent - LAPP
BaBar Collaboration Meeting, June 2006 - Montreal
2. Multivariate analysis is widely used in HEP (LEP, BaBar, Belle, D0, MiniBooNE, ...)
Common reproaches to multivariate methods:
- In the presence of correlations, cuts are no longer transparent
- "Black box" methods
- The training sample may not describe the data correctly
  - This creates no bias, only degraded performance
  - A control sample is needed
- Systematics?
- Independent, unwieldy implementations
→ Need for a global tool that would:
- provide the most common MV methods
- do both the training and the evaluation of these methods
- enable easy computation of systematics
3. TMVA means Toolkit for Multivariate Analysis
A ROOT package written by Andreas Höcker, Kai Voss, Helge Voss, Jörg Stelzer and Peter Speckmayer for the evaluation of MV methods in parallel with an analysis.
MV methods available so far:
- Rectangular cut optimization
- Correlated likelihood estimator (PDE)
- Multi-dimensional likelihood estimator (PDE)
- Fisher / Mahalanobis discriminant
- H-Matrix (χ2 estimator)
- Neural network (2 different implementations)
- Boosted decision trees
TMVA provides training, testing and evaluation of these methods. A dedicated class lets you plug the training results into your favorite analysis.
4. Cut Optimization
→ Scan in signal efficiency for the highest background rejection
Correlated / de-correlated likelihood
- PDE approach, generalization to a multi-dimensional likelihood
- Output transformed by an inverse Fermi function (→ less peaked)
- De-correlation possible with the square root of the covariance matrix
Fisher discriminant and H-matrix
→ Classical definitions
Neural network
- 2 NNs, both multi-layer perceptrons with stochastic learning
- Clermont-Ferrand ANN (used for the ALEPH Higgs analysis)
- TMultiLayerPerceptron (ANN from ROOT)
(Boosted) decision trees
- Inspired by MiniBooNE
- Sequential application of cuts
5. Training
What is a boosted decision tree?
- Each event has a weight Wi (= 1 to start)
[Figure: example decision tree; a node is shown with S/B = 52/48]
6. How to get and use TMVA?
7. How to download TMVA?
- Get a tgz file from the TMVA website http://tmva.sourceforge.net, then click on "Download"
- Via CVS:
  cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/tmva co -P TMVA
→ Automatic creation of 6 directories:
  src/          sources for the TMVA library
  example/      example of how to use TMVA
  lib/          TMVA library once compiled
  reader/       all functionality needed to apply the MV weights
  macros/       ROOT macros to display the results
  development/  working and testing directories
For your own analysis: > cp example myTMVA, then modify the makefile for compilation in /myTMVA
8. Detailed steps for the example
How to compile TMVA?
Include TMVA/lib in your PATH:
  /home> cd TMVA
  /home/TMVA> source setup.csh
  /home/TMVA> cd src/
  /home/TMVA/src> make
Compiles the library → libTMVA.so
9. How to choose the MV method I want?
Go to the examples/ directory and open TMVAnalysis.cpp:
  /home/TMVA/src> cd ../examples
You will find a list of available methods (Booleans); switch the methods you want / don't want to 1/0:

  Bool_t Use_Cuts           = 1;
  Bool_t Use_Likelihood     = 0;
  Bool_t Use_LikelihoodD    = 0;
  Bool_t Use_PDERS          = 0;
  Bool_t Use_HMatrix        = 0;
  Bool_t Use_Fisher         = 1;
  Bool_t Use_CFMlpANN       = 1;
  Bool_t Use_TMlpANN        = 0;
  Bool_t Use_BDT_GiniIndex  = 0;
  Bool_t Use_BDT_CrossEntro = 0;
  Bool_t Use_BDT_SdivStSpB  = 0;
  Bool_t Use_BDT_MisClass   = 0;

You just have to switch the Booleans on or off! Here, for instance, I will compare Cuts, Fisher and the CFM neural net.
10. How to point TMVA to the training samples and variables?
In TMVAnalysis.cpp:
- Both ASCII and ROOT files can be used as input
- Creation of the Factory object
- How to point to the input ASCII files
- How to point to the variables (example with 4 variables)
In /examples/data: toy_sig.dat, bkg_toy.dat
11. How to change the training options?
In TMVAnalysis.cpp: training cycles, hidden layers, neurons per layer, ...
  factory->PrepareTrainingAndTestTree( mycut, 2000, 4000 );
                                               (events used for training / testing)
→ Description of every option in the class BookMethod
12. How do I run TMVA?
  /home/TMVA/src> cd ../examples
  /home/TMVA/examples> make
  /home/TMVA/examples> TMVAnalysis myOutput.root
                                   (name of the output ROOT file)
What does it create?
- Weight files for each trained MV method in weight/
- A ROOT file in the main directory with the MV outputs and efficiencies
How to look at the results?
Use the ROOT macros in the macros/ directory:
  /home/TMVA/examples> root -l
  root [0] .L ../macros/efficiencies.C
  root [1] efficiencies("myOutput.root")
Plots are created in the plots/ directory.
13. Which ROOT macros are available? (1)
variables.C
→ Distributions of the input variables
14. Which ROOT macros are available? (2)
correlations.C
→ Colored correlation matrix of the input variables; the numeric values are displayed while TMVA runs
15. Which ROOT macros are available? (3)
mvas.C
→ Outputs of the MV methods
16. Which ROOT macros are available? (4)
efficiencies.C
→ Background rejection vs. signal efficiency: direct comparison of all MV methods!
17. I have trained the MV method I want and I have the weight files.
How do I use this MV method in my analysis?
- A detailed example: TMVA/reader/TMVApplication.cpp
- Dedicated class: reader/TMVA_reader.hh
- The next slide shows what must be included in your analysis program
- Work in progress (being implemented in ROOT), hence possible differences with later versions
18.
  #include "TMVA_reader.h"
  using TMVApp::TMVA_Reader;

  void MyAnalysis() {
    vector<string> inputVars;
    inputVars.push_back( "var1" );
    inputVars.push_back( "var2" );
    inputVars.push_back( "var3" );
    inputVars.push_back( "var4" );

    TMVA_Reader *tmva = new TMVA_Reader( inputVars );
    tmva->BookMVA( TMVA_Reader::Fisher, "TMVAnalysis_Fisher.weights" );

    vector<double> varValues;
    varValues.push_back( var1 );
    varValues.push_back( var2 );
    varValues.push_back( var3 );
    varValues.push_back( var4 );

    double mvaFi = tmva->EvaluateMVA( varValues, TMVA_Reader::Fisher );

    delete tmva;
  }
19. TMVA is already used by several AWGs in BaBar
Group Dalitz Charmless UK: TMVA Fisher for continuum rejection in the Dalitz-plot analyses of KSππ and Kππ (BADs 1376 and 1512).
Use of 11 input variables; pictures taken from BAD 1376.
20. Group D0h0: TMVA Clermont-Ferrand NN for continuum rejection in the measurement of the BFs of the color-suppressed modes B0 → D0h0 (h0 = η, η′, ω, ..., π0) and in the measurement of the CKM angle β.
Use of 4 input variables.
21. Measurement of sin(2α) with B → ρπ
→ Uses the Clermont-Ferrand NN to reject combinatorial background
Measurement of the CKM angle γ with the GLW method (Emmanuel Latour, LLR)
→ Uses a Fisher discriminant to reject combinatorial background
Signal MC: B → D*K, D* → D0π0, D0 → Kπ; background MCs: udsc
22. What to keep in mind about TMVA?
- A powerful multivariate toolkit with 12 different methods (more are coming)
- A user-friendly package, from training to plots!
- Already used in BaBar
- Easy comparison between the different MV methods
- C++ / ROOT functionality, announced in ROOT version v5-11-06: http://root.cern.ch/
Have a look at http://tmva.sourceforge.net/ !!
Talk by Kai Voss at CERN: http://agenda.cern.ch/askArchive.php?base=agenda&categ=a057207&id=a057207s27t6/transparencies
TMVA tutorial: https://twiki.cern.ch/twiki/bin/view/Atlas/AnalysisTutorial1105#TMVA_Multi_Variate_Data_Analysis
Physics Analysis HN advertisement: http://babar-hn.slac.stanford.edu:5090/HyperNews/get/physAnal/2989.html
A similar tool has been developed by Ilya Narsky (StatPatternRecognition).
23. Back-up slides
24. Available options for every method in TMVAnalysis.cpp
- Rectangular cut optimization
- Correlated likelihood estimator (PDE)
- Multi-dimensional likelihood estimator (PDE)
- Fisher / Mahalanobis discriminant
- H-Matrix (χ2 estimator)
- Neural network (2 different implementations)
- Boosted decision trees
25. Rectangular cuts
  factory->BookMethod( "MethodCuts", "Method:nBin:OptionVar1:...:OptionVarn" );
- "MethodCuts": the TMVA method
- nBin: bins in the histogram of efficiency S/B
- Method of cut:
  - "MC": Monte Carlo optimization (recommended)
  - "FitSel": Minuit fit, "Fit_Migrad" or "Fit_Simplex"
  - "FitPDF": PDF-based, only useful for uncorrelated input variables
- Option for each variable:
  - "FMax": ForceMax (the max cut is fixed to the maximum of variable i)
  - "FMin": ForceMin (the min cut is fixed to the minimum of variable i)
  - "FSmart": ForceSmart (the min or max cut is fixed to min/max, based on the mean value)
  - Adding "All" to "option_vari", e.g. "AllFSmart", will use this option for all variables
  - If "option_vari" is empty (""), no assumptions on the cut min/max are made
26. Likelihood
  factory->BookMethod( "MethodLikelihood", "TypeOfSpline:NbSmooth:NbBin:Decorr" );
- "MethodLikelihood": the TMVA method
- TypeOfSpline: which spline is used for smoothing the PDFs, "Splinei" with i = 1, 2, 3, 5
- NbSmooth: how often the input histograms are smoothed
- NbBin: average number of events per PDF bin to trigger a warning
- Decorr: option for decorrelation or not:
  - "NoDecorr": do not use the square-root matrix to decorrelate the variable space
  - "Decorr": decorrelate the variable space
27. Fisher discriminant and H-matrix
  factory->BookMethod( "MethodFisher", "Fisher" );
- "MethodFisher": the TMVA method
- Which method: "Fisher" or "Mahalanobis" (another definition of distance)
  factory->BookMethod( "MethodHMatrix" );
- "MethodHMatrix": the TMVA method
28. Artificial neural network
  factory->BookMethod( WhichANN, "NbCycles:NeuronsL1:NeuronsL2:...:NeuronsLn" );
- WhichANN: which type of NN:
  - "MethodCFMlpANN": Clermont-Ferrand NN, used for the Higgs search in ALEPH
  - "MethodTMlpANN": ROOT's NN
- NbCycles: number of training cycles
- NeuronsLi: number of neurons in each layer; the first layer necessarily has as many neurons as there are input variables
29. Boosted decision trees
  factory->BookMethod( "MethodBDT", "nTree:BoostType:SeparationType:nEvtMin:MaxNodePurity:nCuts" );
- "MethodBDT": the TMVA method
- nTree: number of trees
- BoostType: method of boosting, "AdaBoost" or "EpsilonBoost"
- SeparationType: method for evaluating the misclassification:
  - "GiniIndex"
  - "CrossEntropy"
  - "SdivSqrtSplusB"
  - "MisClassificationError"
- nEvtMin: minimum number of events in a node (leaf criterion)
- MaxNodePurity: upper bound on the purity for a leaf or intermediate node
- nCuts: number of steps in the optimization of the cut for a node