Transcript and Presenter's Notes

Title: TMVA Toolkit for Multivariate Data Analysis


1
TMVA Toolkit for Multivariate Data Analysis with
ROOT
Helge Voss, MPI-K Heidelberg on behalf of
Andreas Höcker, Fredrik Tegenfeldt, Joerg Stelzer
  • Supply an environment to easily
  • apply different sophisticated data selection
    algorithms
  • have them all trained, tested and evaluated
  • find the best one for your selection problem

and contributors A.Christov, S.Henrot-Versillé,
M.Jachowski, A.Krasznahorkay Jr., Y.Mahalalel,
X.Prudent, P.Speckmayer, M.Wolter, A.Zemla
http://tmva.sourceforge.net/
arXiv: physics/0703039
2
Motivation/Outline
  • ROOT is the analysis framework used by most
    (HEP)-physicists
  • Idea: rather than just implementing new MVA
    techniques and making them somehow available in
    ROOT (as e.g. TMultiLayerPerceptron does),
  • have one common platform/interface for all MVA
    classifiers
  • easy to use and to compare different MVA classifiers
  • train/test on the same data sample and evaluate
    consistently
  • Outline
  • introduction
  • the MVA classifiers available in TMVA
  • demonstration with toy examples
  • summary

3
Multivariate Event Classification
  • All multivariate classifiers condense
    (correlated) multi-variable input information
    into a single scalar output variable, y: R^n → R

y(Bkg) → 0, y(Signal) → 1
One variable to base your decision on

4
What is in TMVA
  • TMVA currently includes
  • Rectangular cut optimisation
  • Projective and Multi-dimensional likelihood
    estimator
  • Fisher discriminant and H-Matrix (χ² estimator)
  • Artificial Neural Network (3 different
    implementations)
  • Boosted/bagged Decision Trees
  • Rule Fitting
  • Support Vector Machines
  • all classifiers are highly customizable
  • common pre-processing of the input: de-correlation,
    principal component analysis
  • support of arbitrary pre-selections and
    individual event weights
  • TMVA package provides training, testing and
    evaluation of the classifiers
  • each classifier provides a ranking of the input
    variables
  • classifiers produce weight files that are read by
    the Reader class for MVA application
  • integrated in ROOT (since release 5.11/03) and
    very easy to use!

5
Preprocessing the Input Variables: Decorrelation
  • Commonly realised for all methods in TMVA
    (centrally in DataSet class)
  • Removal of linear correlations by rotating
    variables
  • using the square-root of the correlation matrix
  • using the Principal Component Analysis

(Figure: scatter plots of the original, SQRT-decorrelated
and PCA-decorrelated variables)
  • Note that this de-correlation is only complete if
  • the input variables are Gaussian
  • the correlations are linear only
  • in practice the gain from de-correlation is often
    rather modest, or even harmful (see the sketch of
    the square-root variant below)
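To make the square-root decorrelation concrete, here is a
minimal standalone sketch (plain C++, not TMVA code; the toy
sample and all numbers are invented for illustration). It
computes the sample covariance matrix C of two variables,
diagonalises it analytically, and applies C^(-1/2) to each
centred event:

// Standalone sketch: decorrelate two variables by applying the
// inverse square root of their covariance matrix, x' = C^(-1/2) x.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  // toy sample: y strongly correlated with x
  std::vector<double> x = {1.0, 2.0, 3.0, 4.0, 5.0};
  std::vector<double> y = {1.1, 2.3, 2.8, 4.2, 4.9};
  const int n = (int)x.size();

  // sample means
  double mx = 0, my = 0;
  for (int i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
  mx /= n; my /= n;

  // covariance matrix elements
  double cxx = 0, cyy = 0, cxy = 0;
  for (int i = 0; i < n; ++i) {
    cxx += (x[i] - mx) * (x[i] - mx);
    cyy += (y[i] - my) * (y[i] - my);
    cxy += (x[i] - mx) * (y[i] - my);
  }
  cxx /= n - 1; cyy /= n - 1; cxy /= n - 1;

  // analytic eigen-decomposition of the symmetric 2x2 matrix
  double tr = cxx + cyy, det = cxx * cyy - cxy * cxy;
  double l1 = tr / 2 + std::sqrt(tr * tr / 4 - det);   // eigenvalues
  double l2 = tr / 2 - std::sqrt(tr * tr / 4 - det);
  double theta = 0.5 * std::atan2(2 * cxy, cxx - cyy); // eigenvector angle
  double c = std::cos(theta), s = std::sin(theta);

  // rotate each centred event into the eigenbasis, scale to unit variance
  for (int i = 0; i < n; ++i) {
    double u = ( c * (x[i] - mx) + s * (y[i] - my)) / std::sqrt(l1);
    double v = (-s * (x[i] - mx) + c * (y[i] - my)) / std::sqrt(l2);
    printf("event %d: decorrelated (%.3f, %.3f)\n", i, u, v);
  }
  return 0;
}

After this transformation the variables have unit variance and
zero linear correlation; as noted above, nonlinear correlations
or non-Gaussian inputs are not removed this way.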

6
Cut Optimisation
  • Simplest method: cut on a rectangular volume
  • scan the signal efficiency from 0 to 1 and maximise
    the background rejection
  • from this scan, the optimal working point in
    terms of S and B numbers can be derived
  • Technical challenge: how to perform the optimisation
  • TMVA uses random sampling, Simulated Annealing
    or a Genetic Algorithm (see the sketch below)
  • speed improvement in the volume search:
    training events are sorted in Binary Search Trees
  • do this in normal variable space or de-correlated
    variable space
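The random-sampling strategy can be sketched as follows
(illustration only; the TMVA optimisers are more refined and
use the binary-search-tree speed-up mentioned above). Random
rectangular cut windows are drawn on a toy sample, and the best
background rejection found in each signal-efficiency bin is kept:

// Sketch of rectangular-cut optimisation by random sampling.
#include <cstdio>
#include <random>
#include <vector>

struct Event { double x, y; };

int main() {
  std::mt19937 rng(42);
  std::normal_distribution<double> sig(1.0, 1.0), bkg(-1.0, 1.0);

  // toy samples: signal centred at (1,1), background at (-1,-1)
  std::vector<Event> S(5000), B(5000);
  for (auto& e : S) e = {sig(rng), sig(rng)};
  for (auto& e : B) e = {bkg(rng), bkg(rng)};

  double bestRej[10] = {0};  // best rejection per efficiency bin
  std::uniform_real_distribution<double> cut(-4.0, 4.0);

  for (int trial = 0; trial < 20000; ++trial) {
    double cx = cut(rng), cy = cut(rng);  // lower cut edges: x > cx, y > cy
    int nS = 0, nB = 0;
    for (const auto& e : S) if (e.x > cx && e.y > cy) ++nS;
    for (const auto& e : B) if (e.x > cx && e.y > cy) ++nB;
    double effS = double(nS) / S.size();
    double rejB = 1.0 - double(nB) / B.size();
    int bin = int(effS * 10); if (bin > 9) bin = 9;
    if (rejB > bestRej[bin]) bestRej[bin] = rejB;
  }
  for (int b = 0; b < 10; ++b)
    printf("eff(S) in [%.1f,%.1f): best rej(B) = %.3f\n",
           b / 10.0, (b + 1) / 10.0, bestRej[b]);
  return 0;
}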

7
Projective Likelihood Estimator (PDE Approach)
  • Combine the probabilities from different variables
    for an event to be signal- or background-like
  • Optimal if there are no correlations and the PDFs
    are correct (known)
  • usually this is not true → different methods
    were developed
  • Technical challenge: how to implement the reference
    PDFs
  • 3 ways: counting, function fitting, non-parametric
    fitting (splines, kernel estimators); see the
    sketch below
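A minimal sketch of the projective likelihood ratio, assuming
Gaussian reference PDFs purely for illustration (TMVA derives
the reference PDFs from the training sample): the response is
y = L_S / (L_S + L_B), where each L is a product of
per-variable PDFs:

// Sketch of the projective likelihood ratio with assumed Gaussian PDFs.
#include <cmath>
#include <cstdio>

const double PI = 3.141592653589793;

double gauss(double x, double mean, double sigma) {
  double z = (x - mean) / sigma;
  return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * PI));
}

int main() {
  const int nvar = 3;
  double x[nvar]     = {0.8, -0.2, 1.5};    // one test event
  double muS[nvar]   = {1.0,  0.5, 1.0};    // signal PDF means (toy)
  double muB[nvar]   = {-1.0, -0.5, 0.0};   // background PDF means (toy)
  double sigma[nvar] = {1.0,  1.0, 1.0};

  double LS = 1.0, LB = 1.0;
  for (int k = 0; k < nvar; ++k) {          // product over variables
    LS *= gauss(x[k], muS[k], sigma[k]);
    LB *= gauss(x[k], muB[k], sigma[k]);
  }
  printf("likelihood ratio y = %.4f\n", LS / (LS + LB));
  return 0;
}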

8
Multidimensional Likelihood Estimator
  • Generalisation of the 1D PDE approach to N_var
    dimensions
  • Optimal method in theory, if the true N-dim PDF
    were known
  • Practical challenge: derive the N-dim PDF from
    the training sample

(Figure: signal (S) and background (B) training events in the
x1-x2 plane, with a counting volume around the test event)

  • TMVA implementation: Range search (PDERS)
  • count the number of signal and background events in
    the vicinity of the data event → fixed-size or
    adaptive volume (the latter gives kNN-type classifiers)
  • volumes can be rectangular or spherical
  • use multi-D kernels (Gaussian, triangular, ...)
    to weight events within a volume
  • speed up the range search by sorting the training
    events in Binary Trees (see the counting sketch
    below)

Carli-Koblitz, NIM A501, 576 (2003)
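The counting idea behind PDERS can be sketched as follows
(fixed-size rectangular volume, toy Gaussian samples, and no
binary-tree speed-up; not the TMVA implementation):

// Sketch of PDERS-style counting in a box around the test event.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

struct Event { double x1, x2; };

int main() {
  std::mt19937 rng(1);
  std::normal_distribution<double> sig(1.0, 1.0), bkg(-1.0, 1.0);
  std::vector<Event> S(10000), B(10000);
  for (auto& e : S) e = {sig(rng), sig(rng)};
  for (auto& e : B) e = {bkg(rng), bkg(rng)};

  Event test = {0.5, 0.3};
  const double half = 0.5;  // half-width of the counting box

  auto inBox = [&](const Event& e) {
    return std::fabs(e.x1 - test.x1) < half &&
           std::fabs(e.x2 - test.x2) < half;
  };
  int nS = 0, nB = 0;
  for (const auto& e : S) if (inBox(e)) ++nS;
  for (const auto& e : B) if (inBox(e)) ++nB;

  // classifier response: local signal fraction around the test event
  printf("n_S = %d, n_B = %d, y = %.3f\n", nS, nB,
         nS + nB > 0 ? double(nS) / (nS + nB) : 0.5);
  return 0;
}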
9
Fisher Discriminant (and H-Matrix)
  • Well-known, simple and elegant classifier
  • determine the linear variable transformation in which
  • linear correlations are removed
  • the mean values of signal and background are pushed
    as far apart as possible
  • the computation of the Fisher response is very
    simple: a linear combination of the event variables,
    weighted by the Fisher coefficients (see the sketch
    below)
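A standalone sketch of the Fisher discriminant in two variables
(toy within-class covariance matrix and class means, not TMVA
code): the coefficients follow from F = W^(-1) (mu_S - mu_B),
and the response is their linear combination with the event
variables:

// Sketch of the Fisher discriminant with toy inputs.
#include <cstdio>

int main() {
  // assumed within-class covariance matrix W and class means
  double W[2][2] = {{1.0, 0.3}, {0.3, 1.0}};
  double muS[2] = {1.0, 0.8}, muB[2] = {-1.0, -0.8};

  // analytic inverse of the 2x2 matrix W
  double det = W[0][0] * W[1][1] - W[0][1] * W[1][0];
  double Winv[2][2] = {{ W[1][1] / det, -W[0][1] / det},
                       {-W[1][0] / det,  W[0][0] / det}};

  // Fisher coefficients F = W^(-1) (mu_S - mu_B)
  double d0 = muS[0] - muB[0], d1 = muS[1] - muB[1];
  double F[2] = {Winv[0][0] * d0 + Winv[0][1] * d1,
                 Winv[1][0] * d0 + Winv[1][1] * d1};

  // response for one test event: a simple linear combination
  double x[2] = {0.4, 0.6};
  printf("F = (%.3f, %.3f), y = %.3f\n",
         F[0], F[1], F[0] * x[0] + F[1] * x[1]);
  return 0;
}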
10
Artificial Neural Network (ANN)
  • Obtain a non-linear classifier response by feeding
    linear combinations of the input variables into nodes
    with a non-linear activation function
  • Nodes (or neurons) are arranged in series
  • → Feed-Forward Multilayer Perceptrons (3
    different implementations in TMVA)

(Figure: network layout with the activation function)

  • Training: adjust the weights using known events such
    that signal and background are best separated (see
    the sketch below)
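A minimal sketch of the feed-forward evaluation through one
hidden layer (toy weights, not a trained network; tanh is one
common choice of activation function):

// Sketch of a feed-forward pass through one hidden layer.
#include <cmath>
#include <cstdio>

int main() {
  const int nIn = 3, nHid = 2;
  double x[nIn] = {0.5, -1.2, 0.8};            // input variables
  double wHid[nHid][nIn] = {{ 0.4, -0.3, 0.7}, // input -> hidden weights
                            {-0.6,  0.2, 0.5}};
  double wOut[nHid] = {1.1, -0.9};             // hidden -> output weights

  double out = 0.0;
  for (int j = 0; j < nHid; ++j) {
    double sum = 0.0;                          // weighted sum of inputs
    for (int i = 0; i < nIn; ++i) sum += wHid[j][i] * x[i];
    out += wOut[j] * std::tanh(sum);           // non-linear activation
  }
  printf("network response y = %.4f\n", std::tanh(out));
  return 0;
}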

11
Decision Trees
  • sequential application of cuts splits the
    data into nodes, and the final nodes (leaves)
    classify an event as signal or background
  • Training: growing a decision tree
  • Start with the root node
  • Split the training sample according to a cut on the
    best variable at this node
  • Splitting criterion: e.g. the Gini index,
    purity × (1 - purity) (see the split-search
    sketch below)
  • Continue splitting until the min. number of events
    or the max. purity is reached
  • Classify each leaf node according to the majority of
    its events, or give it a weight; unknown test events
    are classified accordingly

(Figure: decision tree before and after pruning)

  • Bottom-up pruning:
  • remove statistically insignificant nodes →
    avoid overtraining
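The split search at a single node can be sketched as follows
(one variable and unweighted toy events; a real tree scans all
variables and uses event weights): scan cut values and keep the
cut that minimises the weighted sum of the daughter nodes' Gini
indices p (1 - p):

// Sketch of a single-node split search using the Gini index.
#include <cstdio>
#include <vector>

struct Event { double x; bool isSignal; };

double gini(int nS, int nB) {
  int n = nS + nB;
  if (n == 0) return 0.0;
  double p = double(nS) / n;  // signal purity of the node
  return p * (1.0 - p);
}

int main() {
  std::vector<Event> evts = {{0.1, false}, {0.3, false}, {0.4, false},
                             {0.5, true},  {0.7, false}, {0.9, true},
                             {1.2, true},  {1.5, true}};
  double bestCut = 0.0, bestCrit = 1e9;
  for (double cut = 0.0; cut <= 2.0; cut += 0.05) {
    int lS = 0, lB = 0, rS = 0, rB = 0;
    for (const auto& e : evts) {
      if (e.x < cut) (e.isSignal ? ++lS : ++lB);
      else           (e.isSignal ? ++rS : ++rB);
    }
    // weighted sum of daughter Gini indices: smaller means purer nodes
    int nL = lS + lB, nR = rS + rB;
    double crit = (nL * gini(lS, lB) + nR * gini(rS, rB)) / (nL + nR);
    if (crit < bestCrit) { bestCrit = crit; bestCut = cut; }
  }
  printf("best split: x < %.2f (criterion %.4f)\n", bestCut, bestCrit);
  return 0;
}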

12
Boosted Decision Trees
  • Decision trees have been well known for a long time,
    but were hardly used in HEP (although they are very
    similar to simple cuts)
  • Disadvantage: instability; small changes in the
    training sample can give large changes in the tree
    structure
  • Boosted Decision Trees (1996): combine
    several decision trees into a forest
  • the classifier output is the (weighted) majority
    vote of the individual trees
  • the trees are derived from the same training sample
    with different event weights
  • e.g. AdaBoost: misclassified training
    events are given a larger weight (see the
    reweighting sketch below)
  • bagging (re-sampling with replacement) → random
    weights
  • Remark: bagging/boosting create a basis of
    classifiers
  • final classifier is a linear combination of
    base classifiers
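The AdaBoost reweighting step can be sketched as follows (toy
weights; an illustration of the idea rather than the TMVA
code): compute the weighted misclassification rate err of the
current tree, boost the misclassified events by
alpha = (1 - err) / err, renormalise, and give the tree a vote
weight ln(alpha):

// Sketch of one AdaBoost reweighting step.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  // per-event weights and whether the current tree misclassified them
  std::vector<double> w = {0.25, 0.25, 0.25, 0.25};
  std::vector<bool> wrong = {false, true, false, false};

  // weighted misclassification rate of the current tree
  double err = 0.0, sumw = 0.0;
  for (size_t i = 0; i < w.size(); ++i) {
    sumw += w[i];
    if (wrong[i]) err += w[i];
  }
  err /= sumw;

  double alpha = (1.0 - err) / err;  // boost weight
  double norm = 0.0;
  for (size_t i = 0; i < w.size(); ++i) {
    if (wrong[i]) w[i] *= alpha;     // boost the misclassified events
    norm += w[i];
  }
  for (double& wi : w) wi /= norm;   // renormalise to unit sum

  printf("err = %.3f, tree vote weight ln(alpha) = %.3f\n",
         err, std::log(alpha));
  for (size_t i = 0; i < w.size(); ++i) printf("w[%zu] = %.3f\n", i, w[i]);
  return 0;
}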

13
Rule Fitting (Predictive Learning via Rule Ensembles)
  • Following RuleFit from Friedman-Popescu

Friedman-Popescu, Tech. Rep., Statistics Dept., Stanford
U., 2003
  • The classifier is a linear combination of simple base
    classifiers
  • that are called rules and are here
    sequences of cuts
  • The procedure is:
  • create the rule ensemble → from a set of
    decision trees
  • fit the coefficients → gradient-directed
    regularisation (Friedman et al.); see the
    sketch below
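Evaluating a rule ensemble can be sketched as follows (toy
rules and coefficients; this illustrates the response, not the
RuleFit fitting procedure): each rule is an indicator function
on a rectangular region, and the response is an offset plus the
coefficients of all rules that fire:

// Sketch of a rule-ensemble response for one test event.
#include <cstdio>

struct Rule { double xLo, xHi, yLo, yHi, coef; };

int main() {
  // toy rules (cut sequences from decision-tree nodes) with coefficients
  Rule rules[] = {{ 0.0, 2.0,  0.0, 2.0,  0.8},
                  {-2.0, 0.0, -2.0, 0.0, -0.9},
                  { 0.5, 2.0, -2.0, 2.0,  0.3}};
  double offset = 0.05;

  double x = 0.7, y = 0.4;       // test event
  double response = offset;
  for (const auto& r : rules)    // add the coefficient if the rule fires
    if (x > r.xLo && x < r.xHi && y > r.yLo && y < r.yHi)
      response += r.coef;
  printf("rule-ensemble response = %.3f\n", response);
  return 0;
}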

14
Support Vector Machines
  • Find the hyperplane that best separates signal from
    background
  • best separation: maximum distance of the closest
    events (support vectors) to the hyperplane
  • linear decision boundary

(Figure: separating hyperplane in the x1-x2 plane, with the
margin defined by the support vectors)

  • Non-linear cases:
  • transform the variables into a higher-dimensional
    feature space where a linear boundary
    (hyperplane) can separate the data
  • the transformation is done implicitly using kernel
    functions, which effectively introduce a metric
    for the distance measure that mimics the
    transformation
  • choose a kernel and fit the hyperplane (see the
    sketch below)

Available kernels: Gaussian, polynomial, sigmoid
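A kernelised response with the Gaussian kernel can be sketched
as follows (toy support vectors, coefficients and bias, not a
trained machine): y(x) = b + sum_i alpha_i y_i K(x, x_i) with
K(a,b) = exp(-|a - b|^2 / (2 sigma^2)):

// Sketch of an SVM-style decision function with a Gaussian kernel.
#include <cmath>
#include <cstdio>

double kernel(const double a[2], const double b[2], double sigma) {
  double d2 = (a[0] - b[0]) * (a[0] - b[0]) +
              (a[1] - b[1]) * (a[1] - b[1]);
  return std::exp(-d2 / (2.0 * sigma * sigma));
}

int main() {
  // toy support vectors with signed weights (alpha_i * y_i) and a bias
  const int nSV = 3;
  double sv[nSV][2] = {{1.0, 1.2}, {0.8, -0.5}, {-1.1, -0.9}};
  double coef[nSV]  = {0.7, 0.4, -1.0};
  double bias = 0.1, sigma = 1.0;

  double x[2] = {0.5, 0.4};      // test event
  double y = bias;
  for (int i = 0; i < nSV; ++i)  // weighted sum over the support vectors
    y += coef[i] * kernel(x, sv[i], sigma);
  printf("SVM response y = %.4f -> %s\n",
         y, y > 0 ? "signal" : "background");
  return 0;
}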
15
A Complete Example Analysis
void TMVAnalysis( )
{
   // output file for the training and evaluation results
   TFile* outputFile = TFile::Open( "TMVA.root", "RECREATE" );

   // the Factory trains, tests and evaluates all booked methods
   TMVA::Factory *factory = new TMVA::Factory( "MVAnalysis", outputFile, "!V" );

   TFile *input = TFile::Open("tmva_example.root");
   TTree *signal     = (TTree*)input->Get("TreeS");
   TTree *background = (TTree*)input->Get("TreeB");
   factory->AddSignalTree    ( signal,     1. );
   factory->AddBackgroundTree( background, 1. );

   factory->AddVariable("var1+var2", 'F');
   factory->AddVariable("var1-var2", 'F');
   factory->AddVariable("var3", 'F');
   factory->AddVariable("var4", 'F');

   factory->PrepareTrainingAndTestTree( "",
      "NSigTrain=3000:NBkgTrain=3000:SplitMode=Random:!V" );

   factory->BookMethod( TMVA::Types::kLikelihood, "Likelihood",
      "!V:!TransformOutput:Spline=2:NSmooth=5:NAvEvtPerBin=50" );
   factory->BookMethod( TMVA::Types::kMLP, "MLP",
      "!V:NCycles=200:HiddenLayers=N+1,N:TestRate=5" );

   factory->TrainAllMethods();
   factory->TestAllMethods();
   factory->EvaluateAllMethods();

   outputFile->Close();
   delete factory;
}
16
Example Application
void TMVApplication( )
{
   // the Reader applies a trained classifier to new events
   TMVA::Reader *reader = new TMVA::Reader("!Color");

   Float_t var1, var2, var3, var4;
   reader->AddVariable( "var1+var2", &var1 );
   reader->AddVariable( "var1-var2", &var2 );
   reader->AddVariable( "var3", &var3 );
   reader->AddVariable( "var4", &var4 );

   // book the MLP method from its training weight file
   reader->BookMVA( "MLP method", "weights/MVAnalysis_MLP.weights.txt" );

   TFile *input = TFile::Open("tmva_example.root");
   TTree* theTree = (TTree*)input->Get("TreeS");

   Float_t userVar1, userVar2;
   theTree->SetBranchAddress( "var1", &userVar1 );
   theTree->SetBranchAddress( "var2", &userVar2 );
   theTree->SetBranchAddress( "var3", &var3 );
   theTree->SetBranchAddress( "var4", &var4 );

   // start at event 3000: the first events were used for training
   for (Long64_t ievt=3000; ievt<theTree->GetEntries(); ievt++) {
      theTree->GetEntry(ievt);
      var1 = userVar1 + userVar2;
      var2 = userVar1 - userVar2;
      cout << reader->EvaluateMVA( "MLP method" ) << endl;
   }
   delete reader;
}
17
A Purely Academic Toy Example
  • Use a data set with 4 linearly correlated, Gaussian
    distributed variables

---------------------------------------
Rank : Variable : Separation
---------------------------------------
   1 : var3     : 3.834e-02
   2 : var2     : 3.062e-02
   3 : var1     : 1.097e-02
   4 : var0     : 5.818e-03
---------------------------------------
18
Validating the Classifier Training
  • Projective likelihood PDFs, MLP training, BDTs, ...

average no. of nodes before/after pruning: 4193 / 968
19
Classifier Output
  • TMVA output distributions

(Figure: TMVA output distributions for Likelihood, PDERS,
Fisher, Neural Network, Boosted Decision Trees and Rule
Fitting; the Likelihood shape is distorted due to
correlations, which the de-correlated variant removes)
20
Evaluation Output
  • TMVA output distributions for Fisher, Likelihood,
    BDT and MLP

For this case the Fisher discriminant provides the
theoretically best possible method → the same as the
de-correlated Likelihood
Cuts and Likelihood without de-correlation are
inferior
Note: almost all realistic use cases are much more
difficult than this one
21
Evaluation Output (taken from TMVA printout)
Evaluation results ranked by best signal efficiency and
purity (area):
------------------------------------------------------------------------------
MVA           Signal efficiency at bkg eff. (error)         Sepa-    Signifi-
Methods       @B=0.01     @B=0.10     @B=0.30     Area      ration   cance
------------------------------------------------------------------------------
Fisher        0.268(03)   0.653(03)   0.873(02)   0.882     0.444    1.189
MLP           0.266(03)   0.656(03)   0.873(02)   0.882     0.444    1.260
LikelihoodD   0.259(03)   0.649(03)   0.871(02)   0.880     0.441    1.251
PDERS         0.223(03)   0.628(03)   0.861(02)   0.870     0.417    1.192
RuleFit       0.196(03)   0.607(03)   0.845(02)   0.859     0.390    1.092
HMatrix       0.058(01)   0.622(03)   0.868(02)   0.855     0.410    1.093
BDT           0.154(02)   0.594(04)   0.838(03)   0.852     0.380    1.099
CutsGA        0.109(02)   1.000(00)   0.717(03)   0.784     0.000    0.000
Likelihood    0.086(02)   0.387(03)   0.677(03)   0.757     0.199    0.682
------------------------------------------------------------------------------

Testing efficiency compared to training efficiency
(overtraining check):
------------------------------------------------------------------------------
MVA           Signal efficiency from test sample (from training sample)
Methods       @B=0.01         @B=0.10         @B=0.30
------------------------------------------------------------------------------
Fisher        0.268 (0.275)   0.653 (0.658)   0.873 (0.873)
MLP           0.266 (0.278)   0.656 (0.658)   0.873 (0.873)
LikelihoodD   0.259 (0.273)   0.649 (0.657)   0.871 (0.872)
PDERS         0.223 (0.389)   0.628 (0.691)   0.861 (0.881)
RuleFit       0.196 (0.198)   0.607 (0.616)   0.845 (0.848)
HMatrix       0.058 (0.060)   0.622 (0.623)   0.868 (0.868)
BDT           0.154 (0.268)   0.594 (0.736)   0.838 (0.911)
CutsGA        0.109 (0.123)   1.000 (0.424)   0.717 (0.715)
Likelihood    0.086 (0.092)   0.387 (0.379)   0.677 (0.677)
------------------------------------------------------------------------------
(Annotations on the slide: larger area → better classifier;
the test-vs-training comparison is the check for over-training)
22
More Toys: Circular Correlations
  • Illustrate the behaviour of linear and nonlinear
    classifiers

Circular correlations (same for signal and
background)
23
Illustration: Events Weighted by the Classifier Response
  • Example: how do the classifiers deal with the
    correlation patterns?

(Figure: events weighted by classifier response; linear
classifiers: Fisher, Likelihood, de-correlated Likelihood;
non-linear classifiers: Decision Trees, PDERS)
24
Final Classifier Performance
  • Background rejection versus signal efficiency
    curve

Circular Example
25
More Toys: Schachbrett (Chessboard)
  • Performance achieved without parameter
    adjustments
  • PDERS and BDT are best "out of the box"
  • After some parameter tuning, SVM and
    ANN (MLP) also perform well

(Figure: chessboard-pattern event distribution; events
weighted by the SVM response; performance compared with the
theoretical maximum)
26
TMVA Users Guide
We (finally) have a Users Guide!
Available from tmva.sf.net
TMVA Users Guide: 78 pp., incl. code examples
arXiv: physics/0703039
27
Summary
  • TMVA unifies highly customisable and well-performing
    multivariate classification algorithms in a
    single user-friendly framework
  • This ensures objective classifier
    comparisons and simplifies their use
  • TMVA is available from tmva.sf.net and in ROOT
    (> 5.11/03)
  • A typical TMVA analysis requires user interaction
    with a Factory (for classifier training) and a
    Reader (for classifier application)
  • a set of ROOT macros displays the evaluation
    results
  • We will continue to improve flexibility and add
    new classifiers
  • Bayesian classifiers
  • Committee method → combination of different
    MVA techniques
  • C-code output for trained classifiers (for
    selected methods)

28
More Toys: Linear, Cross, Circular Correlations
  • Illustrate the behaviour of linear and nonlinear
    classifiers

Linear correlations (same for signal and
background)
Linear correlations (opposite for signal and
background)
Circular correlations (same for signal and
background)
29
Illustration: Events Weighted by the Classifier Response
  • How well do the classifiers resolve the various
    correlation patterns?

Linear correlations (same for signal and
background)
Linear correlations (opposite for signal and
background)
Circular correlations (same for signal and
background)
30
Final Classifier Performance
  • Background rejection versus signal efficiency
    curve

Linear Example
Cross Example
Circular Example
31
Stability with Respect to Irrelevant Variables
  • Toy example with 2 discriminating and 4
    non-discriminating variables:

(Figure: classifier performance using only the two
discriminating variables vs. using all variables)
32
Using TMVA in Training and Application
These can be ROOT scripts, C++ executables or Python
scripts (via PyROOT), or any other high-level
language that interfaces with ROOT
33
Introduction: Event Classification
  • Different techniques try to exploit the (full set of)
    features in different ways
  • → compare and choose

(Figure: signal (S) and background events with candidate
decision boundaries: rectangular cuts? a linear boundary?
a nonlinear one?)

  • How to place the decision boundary?
  • → Let the machine learn it from training events