Title: TMVA Toolkit for Multivariate Data Analysis with ROOT

1. TMVA: Toolkit for Multivariate Data Analysis with ROOT
Helge Voss, MPI-K Heidelberg, on behalf of
Andreas Höcker, Fredrik Tegenfeldt, Joerg Stelzer,
and contributors A. Christov, S. Henrot-Versillé, M. Jachowski,
A. Krasznahorkay Jr., Y. Mahalalel, X. Prudent, P. Speckmayer,
M. Wolter, A. Zemla

- Supply an environment to easily
  - apply different sophisticated data selection algorithms
  - have them all trained, tested and evaluated
  - find the best one for your selection problem

http://tmva.sourceforge.net/
arXiv: physics/0703039
2. Motivation / Outline

- ROOT is the analysis framework used by most (HEP) physicists
- Idea: rather than just implementing new MVA techniques and making them
  somehow available in ROOT (as e.g. TMultiLayerPerceptron does), have one
  common platform/interface for all MVA classifiers
  - easy to use and compare different MVA classifiers
  - train/test on the same data sample and evaluate consistently
- Outline
- introduction
- the MVA classifiers available in TMVA
- demonstration with toy examples
- summary
3. Multivariate Event Classification

- All multivariate classifiers condense (correlated) multi-variable input
  information into a single scalar output variable: a mapping R^n → R with
  y(Bkg) → 0 and y(Signal) → 1
- One variable to base your decision on
4. What is in TMVA

- TMVA currently includes
  - Rectangular cut optimisation
  - Projective and multi-dimensional likelihood estimators
  - Fisher discriminant and H-Matrix (χ² estimator)
  - Artificial Neural Networks (3 different implementations)
  - Boosted/bagged Decision Trees
  - Rule Fitting
  - Support Vector Machines
- All classifiers are highly customizable
- Common pre-processing of the input: de-correlation, principal component
  analysis
- Support of arbitrary pre-selections and individual event weights
- The TMVA package provides training, testing and evaluation of the
  classifiers
- Each classifier provides a ranking of the input variables
- Classifiers produce weight files that are read by the Reader class for MVA
  application
- Integrated in ROOT (since release 5.11/03) and very easy to use!
5. Preprocessing the Input Variables: Decorrelation

- Commonly realised for all methods in TMVA (centrally in the DataSet class)
- Removal of linear correlations by rotating the variables
  - using the square root of the correlation matrix
  - using Principal Component Analysis

[Figure: input variable distributions: original, SQRT-decorrelated,
PCA-decorrelated]

- Note that this de-correlation is complete only if
  - the input variables are Gaussian
  - the correlations are linear
- In practice the gain from de-correlation is often rather modest, or even
  harmful
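For reference, the square-root variant can be sketched in formulas (standard
linear algebra in our own notation, not TMVA's):

    x' = C^{-1/2} x, \qquad C = S D S^{T} \;\Rightarrow\; C^{-1/2} = S D^{-1/2} S^{T}

where C is the covariance (or correlation) matrix of the inputs, S the matrix
of its eigenvectors and D the diagonal matrix of its eigenvalues; the
components of x' are then linearly uncorrelated.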
6. Cut Optimisation

- Simplest method: cut in a rectangular volume
  - scan the signal efficiency from 0 to 1 and maximise the background
    rejection
  - from this scan, the optimal working point in terms of S, B numbers can be
    derived
- Technical problem: how to perform the optimisation
  - TMVA uses random sampling, Simulated Annealing or a Genetic Algorithm
  - speed improvement in the volume search: training events are sorted in
    Binary Search Trees
- Works in the normal variable space or in the de-correlated variable space
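As an illustration, booking the genetic-algorithm variant in the Factory
could look as follows; this is a sketch, and the exact option names should be
checked against the TMVA Users Guide:

    // book rectangular cut optimisation, fitted with a Genetic Algorithm
    factory->BookMethod( TMVA::Types::kCuts, "CutsGA",
                         "!V:FitMethod=GA:EffMethod=EffSel" );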
7. Projective Likelihood Estimator (PDE Approach)

- Combine the probabilities from the different variables for an event to be
  signal- or background-like
- Optimal if there are no correlations and the PDFs are correct (known)
  - usually this is not true → development of different methods
- Technical problem: how to implement the reference PDFs
  - 3 ways: counting, function fitting, parametric fitting (splines, kernel
    estimators, ...)
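The combination itself is the usual likelihood ratio: for event i with
variable values x_k(i),

    y_L(i) = \frac{L_S(i)}{L_S(i) + L_B(i)}, \qquad
    L_{S(B)}(i) = \prod_{k=1}^{N_{var}} p_{S(B),k}\big(x_k(i)\big)

where p_{S,k} and p_{B,k} are the one-dimensional reference PDFs of variable
k for signal and background.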
8. Multidimensional Likelihood Estimator

- Generalisation of the 1D PDE approach to N_var dimensions
- Optimal method in theory, if the true N-dim PDF were known
- Practical challenge: derive the N-dim PDF from the training sample

[Figure: signal (S) and background (B) training events in the (x1, x2)
plane, with the counting volume around a test event]

- TMVA implementation: range search (PDERS)
  - count the number of signal and background events in the vicinity of a
    test event → fixed-size or adaptive volume (the latter gives kNN-type
    classifiers)
  - volumes can be rectangular or spherical
  - multi-D kernels (Gaussian, triangular, ...) can be used to weight events
    within a volume
  - the range search is sped up by sorting the training events in Binary
    Trees (Carli-Koblitz, NIM A501, 576 (2003))
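In the simplest counting version (no kernel weighting, and equal-size signal
and background training samples assumed for readability), the response for a
test event at point x is

    y_{PDERS}(x) = \frac{n_S\big(V(x)\big)}{n_S\big(V(x)\big) + n_B\big(V(x)\big)}

with n_S and n_B the numbers of signal and background training events inside
the volume V(x) around the test event.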
9. Fisher Discriminant (and H-Matrix)

- Well-known, simple and elegant classifier
- Determine the linear variable transformation in which
  - linear correlations are removed
  - the mean values of signal and background are pushed as far apart as
    possible
- The computation of the Fisher response is very simple: a linear combination
  of the event variables with the Fisher coefficients
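Explicitly, in the standard Fisher notation:

    y_{Fi}(i) = F_0 + \sum_{k=1}^{N_{var}} F_k\, x_k(i), \qquad
    F \propto W^{-1}\,(\bar{x}_S - \bar{x}_B)

where W is the sum of the signal and background covariance matrices,
\bar{x}_{S(B)} are the class mean vectors, and the F_k are the Fisher
coefficients.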
10. Artificial Neural Network (ANN)

- Obtain a non-linear classifier response by feeding linear combinations of
  the input variables into nodes with a non-linear activation function
- The nodes (or neurons) are arranged in layers
  → feed-forward Multilayer Perceptrons (3 different implementations in TMVA)
- Training: adjust the weights using events of known type such that signal
  and background are best separated
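For a single hidden layer and sigmoid activation A(t) = 1/(1 + e^{-t}), the
response of such a network can be written as (generic MLP, our notation):

    y_{ANN}(x) = A\Big( \sum_j w_j^{(2)}\, A\big( \sum_i w_{ij}^{(1)} x_i + \theta_j \big) + \theta \Big)

with the weights w and biases \theta adjusted during training.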
11. Decision Trees

- Sequential application of cuts splits the data into nodes, and the final
  nodes (leaves) classify an event as signal or background
- Training: growing a decision tree
  - start with the root node
  - split the training sample according to a cut on the best variable at this
    node
  - splitting criterion: e.g. maximum of the Gini index,
    purity × (1 − purity)
  - continue splitting until the minimum number of events or the maximum
    purity is reached
  - classify the leaf nodes according to the majority of events (or give them
    a weight); unknown test events are classified accordingly

[Figure: decision tree before and after pruning]

- Bottom-up pruning
  - remove statistically insignificant nodes → avoid overtraining
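With the node purity p = S/(S+B), the Gini index is p(1 − p), and the cut at
a node is chosen to maximise the gain of the split into left and right
daughter nodes (our notation):

    \Delta G = G_{parent} - \frac{n_L}{n}\, G_L - \frac{n_R}{n}\, G_R, \qquad G = p\,(1-p)

where n_L and n_R are the numbers of events sent to the two daughters.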
12. Boosted Decision Trees

- Decision trees have been well known for a long time, but were hardly used
  in HEP (although they are very similar to simple cuts)
- Disadvantage: instability, since small changes in the training sample can
  give large changes in the tree structure
- Boosted Decision Trees (1996): combine several decision trees into a forest
  - the classifier output is the (weighted) majority vote of the individual
    trees
  - the trees are derived from the same training sample with different event
    weights
    - e.g. AdaBoost: wrongly classified training events are given a larger
      weight
    - bagging (re-sampling with replacement) → random weights
- Remark: bagging/boosting → create a basis of classifiers
  - the final classifier is a linear combination of the base classifiers
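For AdaBoost, if tree m has misclassification rate err_m on the weighted
training sample, the events and trees are weighted as follows (textbook
AdaBoost; TMVA's conventions may differ in detail):

    \alpha_m = \ln\frac{1 - err_m}{err_m}, \qquad
    w_i \to w_i\, e^{\alpha_m} \ \text{(misclassified events)}, \qquad
    y_{BDT}(x) = \sum_m \alpha_m\, h_m(x)

with h_m(x) = \pm 1 the response of the individual tree m.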
13. Rule Fitting (Predictive Learning via Rule Ensembles)

- Follows RuleFit from Friedman-Popescu (Friedman-Popescu, Tech. Rep., Stat.
  Dept., Stanford U., 2003)
- The classifier is a linear combination of simple base classifiers, called
  rules, which here are sequences of cuts
- The procedure is:
  - create the rule ensemble → generated from a set of decision trees
  - fit the coefficients → gradient-directed regularization (Friedman et al.)
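Schematically, following the RuleFit paper (linear terms in the inputs enter
alongside the rules):

    y_{RF}(x) = a_0 + \sum_m a_m\, r_m(x) + \sum_k b_k\, x_k

where each rule r_m(x) \in \{0, 1\} flags whether the event passes the
corresponding cut sequence, and the coefficients a_m, b_k are fitted with the
gradient-directed regularization mentioned above.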
14. Support Vector Machines

- Find the hyperplane that best separates signal from background
  - best separation: maximum distance between the closest events (support
    vectors) and the hyperplane
  - gives a linear decision boundary
- Non-linear cases:
  - transform the variables into a higher-dimensional feature space where a
    linear boundary (hyperplane) can separate the data
  - the transformation is done implicitly using kernel functions, which
    effectively introduce a metric for the distance measure that mimics the
    transformation
  - choose a kernel and fit the hyperplane
- Available kernels: Gaussian, polynomial, sigmoid

[Figure: separating hyperplane and support vectors in the (x1, x2) plane,
before and after the kernel transformation]
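The resulting decision function has the standard SVM form (generic notation;
the Gaussian kernel is shown as an example):

    y_{SVM}(x) = \sum_i \alpha_i\, y_i\, K(x_i, x) + b, \qquad
    K(x, x') = \exp\big( -|x - x'|^2 / 2\sigma^2 \big)

where the sum runs over the support vectors x_i with class labels
y_i = \pm 1.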
15. A Complete Example Analysis

void TMVAnalysis( )
{
   // output file for the training results
   TFile* outputFile = TFile::Open( "TMVA.root", "RECREATE" );

   // create the Factory that steers training, testing and evaluation
   TMVA::Factory *factory = new TMVA::Factory( "MVAnalysis", outputFile, "!V" );

   // register the signal and background training trees
   TFile *input = TFile::Open( "tmva_example.root" );
   TTree *signal     = (TTree*)input->Get( "TreeS" );
   TTree *background = (TTree*)input->Get( "TreeB" );
   factory->AddSignalTree    ( signal,     1. );
   factory->AddBackgroundTree( background, 1. );

   // define the input variables (simple expressions are allowed)
   factory->AddVariable( "var1+var2", 'F' );
   factory->AddVariable( "var1-var2", 'F' );
   factory->AddVariable( "var3", 'F' );
   factory->AddVariable( "var4", 'F' );

   factory->PrepareTrainingAndTestTree( "",
      "NSigTrain=3000:NBkgTrain=3000:SplitMode=Random:!V" );

   // book the classifiers to be compared
   factory->BookMethod( TMVA::Types::kLikelihood, "Likelihood",
      "!V:!TransformOutput:Spline=2:NSmooth=5:NAvEvtPerBin=50" );
   factory->BookMethod( TMVA::Types::kMLP, "MLP",
      "!V:NCycles=200:HiddenLayers=N+1,N:TestRate=5" );

   // train, test and evaluate all booked methods
   factory->TrainAllMethods();
   factory->TestAllMethods();
   factory->EvaluateAllMethods();

   outputFile->Close();
   delete factory;
}
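A note on usage: saved as TMVAnalysis.C, the macro can be run from the ROOT
prompt via "root -l TMVAnalysis.C". Training fills TMVA.root with the
evaluation histograms and writes the classifier weight files into the
weights/ directory that the application example below reads from.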
16. Example Application

void TMVApplication( )
{
   // create the Reader and declare the input variables
   TMVA::Reader *reader = new TMVA::Reader( "!Color" );

   Float_t var1, var2, var3, var4;
   reader->AddVariable( "var1+var2", &var1 );
   reader->AddVariable( "var1-var2", &var2 );
   reader->AddVariable( "var3", &var3 );
   reader->AddVariable( "var4", &var4 );

   // book the trained classifier from its weight file
   reader->BookMVA( "MLP method", "weights/MVAnalysis_MLP.weights.txt" );

   TFile *input = TFile::Open( "tmva_example.root" );
   TTree* theTree = (TTree*)input->Get( "TreeS" );

   Float_t userVar1, userVar2;
   theTree->SetBranchAddress( "var1", &userVar1 );
   theTree->SetBranchAddress( "var2", &userVar2 );
   theTree->SetBranchAddress( "var3", &var3 );
   theTree->SetBranchAddress( "var4", &var4 );

   // loop over the events and evaluate the classifier response
   for (Long64_t ievt = 3000; ievt < theTree->GetEntries(); ievt++) {
      theTree->GetEntry( ievt );
      var1 = userVar1 + userVar2;
      var2 = userVar1 - userVar2;
      cout << reader->EvaluateMVA( "MLP method" ) << endl;
   }
   delete reader;
}
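Two remarks: the loop starts at event 3000, skipping the events used for
training, and in a real analysis one would not print the MLP response but cut
on it at the working point chosen from the background rejection versus signal
efficiency curve.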
17. A Purely Academic Toy Example

- Use a data set with 4 linearly correlated, Gaussian distributed variables

  ---------------------------------------
  Rank   Variable   Separation
  ---------------------------------------
     1   var3       3.834e-01
     2   var2       3.062e-01
     3   var1       1.097e-01
     4   var0       5.818e-02
  ---------------------------------------
18. Validating the Classifier Training

- Check the training itself: projective likelihood PDFs, MLP training,
  BDTs, ...

[Figure: training validation plots; for the BDTs the average number of nodes
before/after pruning is 4193 / 968]
19. Classifier Output

- TMVA output distributions

[Figure: classifier output distributions for Likelihood, PDERS, Fisher,
Neural Network, Boosted Decision Trees and Rule Fitting; annotations mark
effects due to correlations and with correlations removed]
20. Evaluation Output

- TMVA output distributions for Fisher, Likelihood, BDT and MLP
- For this case the Fisher discriminant provides the theoretically best
  possible method → same performance as the de-correlated Likelihood
- Cuts and Likelihood without de-correlation are inferior
- Note: just about all realistic use cases are much more difficult than this
  one
21. Evaluation Output (taken from the TMVA printout)

Evaluation results ranked by best signal efficiency and purity (area); a
better classifier appears higher in the table:

  -----------------------------------------------------------------------
  MVA          Signal efficiency at bkg eff. (error)    Sepa-    Signifi-
  Methods      @B=0.01    @B=0.10    @B=0.30    Area    ration   cance
  -----------------------------------------------------------------------
  Fisher       0.268(03)  0.653(03)  0.873(02)  0.882   0.444    1.189
  MLP          0.266(03)  0.656(03)  0.873(02)  0.882   0.444    1.260
  LikelihoodD  0.259(03)  0.649(03)  0.871(02)  0.880   0.441    1.251
  PDERS        0.223(03)  0.628(03)  0.861(02)  0.870   0.417    1.192
  RuleFit      0.196(03)  0.607(03)  0.845(02)  0.859   0.390    1.092
  HMatrix      0.058(01)  0.622(03)  0.868(02)  0.855   0.410    1.093
  BDT          0.154(02)  0.594(04)  0.838(03)  0.852   0.380    1.099
  CutsGA       0.109(02)  1.000(00)  0.717(03)  0.784   0.000    0.000
  Likelihood   0.086(02)  0.387(03)  0.677(03)  0.757   0.199    0.682
  -----------------------------------------------------------------------

Testing efficiency compared to training efficiency (over-training check):

  -----------------------------------------------------------------------
  MVA          Signal efficiency from test sample (from training sample)
  Methods      @B=0.01          @B=0.10          @B=0.30
  -----------------------------------------------------------------------
  Fisher       0.268 (0.275)    0.653 (0.658)    0.873 (0.873)
  MLP          0.266 (0.278)    0.656 (0.658)    0.873 (0.873)
  LikelihoodD  0.259 (0.273)    0.649 (0.657)    0.871 (0.872)
  PDERS        0.223 (0.389)    0.628 (0.691)    0.861 (0.881)
  RuleFit      0.196 (0.198)    0.607 (0.616)    0.845 (0.848)
  HMatrix      0.058 (0.060)    0.622 (0.623)    0.868 (0.868)
  BDT          0.154 (0.268)    0.594 (0.736)    0.838 (0.911)
  CutsGA       0.109 (0.123)    1.000 (0.424)    0.717 (0.715)
  Likelihood   0.086 (0.092)    0.387 (0.379)    0.677 (0.677)
  -----------------------------------------------------------------------

A large difference between test and training efficiency (e.g. for PDERS and
BDT) signals over-training.
22. More Toys: Linear, Cross and Circular Correlations

- Illustrate the behaviour of linear and nonlinear classifiers

[Figure: toy data set with circular correlations (same for signal and
background)]
23. Illustration: Events Weighted by MVA Response

- Example: how do the classifiers deal with the correlation patterns?

[Figure: events weighted by classifier response; linear classifiers (Fisher,
Likelihood, de-correlated Likelihood) and non-linear classifiers (Decision
Trees, PDERS)]
24. Final Classifier Performance

- Background rejection versus signal efficiency curve

[Figure: circular example]
25. More Toys: Schachbrett (Chess Board)

- Performance achieved without parameter adjustments: PDERS and BDT are best
  "out of the box"
- After some parameter tuning, SVM and ANN (MLP) also perform well

[Figure: event distribution, theoretical maximum, and events weighted by the
SVM response]
26. TMVA Users Guide

We (finally) have a Users Guide!
Available from tmva.sf.net

TMVA Users Guide: 78 pp., incl. code examples
arXiv: physics/0703039
27. Summary

- TMVA unifies highly customizable, well-performing multivariate
  classification algorithms in a single user-friendly framework
- This ensures objective classifier comparisons and simplifies their use
- TMVA is available from tmva.sf.net and in ROOT (> 5.11/03)
- A typical TMVA analysis requires user interaction with a Factory (for
  classifier training) and a Reader (for classifier application)
  - a set of ROOT macros displays the evaluation results
- We will continue to improve flexibility and add new classifiers
  - Bayesian classifiers
  - Committee method → combination of different MVA techniques
  - C-code output for trained classifiers (for selected methods)
28. More Toys: Linear, Cross and Circular Correlations

- Illustrate the behaviour of linear and nonlinear classifiers

[Figure: toy data sets; linear correlations (same for signal and background),
linear correlations (opposite for signal and background), circular
correlations (same for signal and background)]
29. Illustration: Events Weighted by MVA Response

- How well do the classifiers resolve the various correlation patterns?

[Figure: events weighted by classifier response for the three toy data sets;
linear correlations (same for signal and background), linear correlations
(opposite for signal and background), circular correlations (same for signal
and background)]
30. Final Classifier Performance

- Background rejection versus signal efficiency curves

[Figure: linear, cross and circular examples]
31. Stability with Respect to Irrelevant Variables

- Toy example with 2 discriminating and 4 non-discriminating variables

[Figure: performance when using only the two discriminating variables in the
classifiers versus using all variables]
32. Using TMVA in Training and Application

- Scripts can be ROOT macros, C++ executables or Python scripts (via PyROOT),
  or any other high-level language that interfaces with ROOT
33. Introduction: Event Classification

- Different techniques use different ways to exploit (all) the features
  → compare and choose
  - rectangular cuts? a linear boundary? a nonlinear one?
- How to place the decision boundary?
  → let the machine learn it from training events