Automating cuts in HEP: the classification or decision tree

UW EPE seminar, 21 Oct 2004. T. Burnett.
1
Automating cuts in HEP: the classification (or
decision) tree
  • Introduction
  • Application to Dzero Single Top analysis
  • Serious use by GLAST
  • A new technique: boosted decision trees

2
Introduction
  • In HEP experiments (and GLAST) we collect an
    event sample, determined by trigger conditions
  • Each event may be a desired signal, or an
    unavoidable background to be rejected
  • Each event is characterized by a set of measured
    variables; we can predict the dependence of
    signal or background on these variables, usually
    with a Monte Carlo simulation of the assumed
    background source and the detector response
  • The big question: how to use the set of measured
    variables to select signal events, or just to
    measure the signal rate?

3
A toy example
4
Example, cont
  • What is the best way to measure the signal rate?
  • Significance: inverse variance per signal event
    (N signal events, event rate S, statistical
    error σs)
  • Other physics/science needs might call for a pure
    sample at some sacrifice of efficiency
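The counting-above-a-cut approach can be made concrete with a small numeric sketch. This assumes Gaussian signal and background shapes and the usual counting-experiment significance S/√(S+B); all numbers are illustrative, not the toy from the slides:

```python
import math

def gaussian_cdf(x, mu, sigma):
    """P(X < x) for a Gaussian with mean mu and width sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def expected_counts(cut, n_sig=100.0, n_bkg=1000.0):
    """Expected events surviving x > cut: signal ~ N(2,1), background ~ N(0,1)."""
    s = n_sig * (1.0 - gaussian_cdf(cut, 2.0, 1.0))
    b = n_bkg * (1.0 - gaussian_cdf(cut, 0.0, 1.0))
    return s, b

def significance(cut):
    """Counting significance S / sqrt(S + B) above the cut."""
    s, b = expected_counts(cut)
    return s / math.sqrt(s + b) if s + b > 0 else 0.0

# scan cut positions and keep the one maximizing the significance
best_cut = max((c / 10.0 for c in range(-20, 50)), key=significance)
```

The scan lands between the two peaks, closer to the signal mean because the background dominates at low values of the cut.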

5
Significance: compare max likelihood with counting
above a cut (toy, again)
6
The real world: we want a function of all those
variables
  • The traditional choice (in HEP) is the Neural
    Network
  • First, the many neurons in the intermediate
    layers must have their weights set by training
    with background and signal
  • Classification trees are very similar, but much
    more transparent
  • Important variables are identified easily
  • The tree can be examined in detail
  • Invented long ago (1960s); not used in HEP since
    the 70s. Breiman, L., Friedman, J., Olshen, R.,
    and Stone, C. (1984), "Classification and
    Regression Trees", Wadsworth.

7
A simple example with Dzero
  • single top production at the Tevatron

(Diagrams: t-channel, s-channel, W + 2 jets background)
8
Introducing Insightful
  • World HQ on west Lake Union. Markets:
  • S-PLUS statistical software system
  • Insightful Miner data-mining software

9
Insightful Miner demo of a classification tree
(with real D0 data)
The classification node
Input tabular data files
10
The tree itself
11
Classification variable importance
Using the Gini criterion
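The Gini criterion can be sketched concretely: a split's quality is the decrease in Gini impurity it produces, and ranking variables by that decrease gives the importance shown on the slide. A minimal unweighted version (toy data; the variable names "x" and "y" are hypothetical, not D0 quantities):

```python
def gini(n_sig, n_bkg):
    """Gini impurity of a node: 2 p (1 - p), with p the signal purity."""
    n = n_sig + n_bkg
    if n == 0:
        return 0.0
    p = n_sig / n
    return 2.0 * p * (1.0 - p)

def split_gain(events, var, cut):
    """Decrease in count-weighted Gini impurity from splitting on var at cut.
    `events` is a list of (features_dict, is_signal) pairs."""
    left = [(f, y) for f, y in events if f[var] < cut]
    right = [(f, y) for f, y in events if f[var] >= cut]
    def counts(sub):
        s = sum(1 for _, y in sub if y)
        return s, len(sub) - s
    n = len(events)
    parent = gini(*counts(events))
    child = sum(len(sub) / n * gini(*counts(sub)) for sub in (left, right))
    return parent - child

# toy sample: "x" separates signal from background, "y" does not
events = [({"x": 0.1, "y": 0.5}, 0), ({"x": 0.2, "y": 0.4}, 0),
          ({"x": 0.8, "y": 0.6}, 1), ({"x": 0.9, "y": 0.5}, 1)]
```

Here `split_gain(events, "x", 0.5)` recovers the full parent impurity (a perfect split), while any cut on "y" gains much less; summing gains per variable over a grown tree is the importance measure.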
12
Bottom line: how does it do?

13
GLAST
Under construction; launch 2007
14
LAT overview
  • Precision Si-strip Tracker (TKR): 18 XY
    tracking planes of single-sided silicon strip
    detectors (228 µm pitch). Measure the photon
    direction; gamma ID.
  • Hodoscopic CsI Calorimeter (CAL): array of
    1536 CsI(Tl) crystals in 8 layers. Measure the
    photon energy; image the shower.
  • Segmented Anticoincidence Detector (ACD): 89
    plastic scintillator tiles. Reject background
    of charged cosmic rays; segmentation removes
    self-veto effects at high energy.
  • Electronics System: includes flexible, robust
    hardware trigger and software filters.

15
GLAST: pioneer HEP CT user
  • Discovered, applied, promoted by Bill Atwood
  • Created in the 60s; actually applied to HEP at
    SLAC by Jerry Friedman. Breiman, L., Friedman,
    J., Olshen, R., and Stone, C. (1984),
    "Classification and Regression Trees", Wadsworth.
  • Separate applications:
  • Identify events with well-measured energy
  • Select events with well-measured tracks
  • Separate cosmic-ray induced background from
    actual gamma rays

16
Case I: use a CT as energy filter
Problem: the large gaps in the CAL and the thick
layers of the Tracker compromise the energy
determination.
Strategy: identify poorly measured events and
eliminate them.
Technique: split events into energy classes, and
for each class use a Classification Tree to
determine the well-measured events.
(Diagrams: splits, trees)
17
Results
All
Good
Bad
18
A problem that was solved here
  • How to incorporate the decision trees into our
    standard analysis?
  • Answer: a class that reads the XML description
    from IM and implements the decision tree structure.
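A class of that shape can be sketched briefly. IM's actual XML schema is not shown in the slides, so the element and attribute names below (`Node`, `var`, `cut`, `purity`) are assumptions chosen for illustration; only the reader-plus-evaluator structure reflects the slide:

```python
import xml.etree.ElementTree as ET

# Hypothetical tree description (NOT IM's real schema): internal nodes carry
# a variable name and cut value; the first child is taken when var < cut;
# leaves carry the signal purity of that branch.
TREE_XML = """
<Node var="energy" cut="1.5">
  <Node purity="0.1"/>
  <Node var="tkr_hits" cut="20">
    <Node purity="0.3"/>
    <Node purity="0.9"/>
  </Node>
</Node>
"""

class DecisionTree:
    """Reads a decision tree from XML and evaluates events given as dicts."""
    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def evaluate(self, event, node=None):
        node = self.root if node is None else node
        if "purity" in node.attrib:      # leaf: return the branch purity
            return float(node.attrib["purity"])
        left, right = list(node)         # first child taken when var < cut
        if event[node.attrib["var"]] < float(node.attrib["cut"]):
            return self.evaluate(event, left)
        return self.evaluate(event, right)

tree = DecisionTree(TREE_XML)
```

An event then gets scored by walking the cuts, e.g. `tree.evaluate({"energy": 2.0, "tkr_hits": 30})` follows the high-energy, many-hits branch to its leaf purity.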

19
Weighting and Boosting
  • How about weighted events?
  • Very natural for Monte Carlo, and absolutely
    necessary for D0 analysis
  • Used to describe triggering and tagging
    probability
  • But not supported by either S-PLUS or IM.

20
An improved CT: boosting
  • Applied to MiniBooNE by Byron Roe and
    collaborators (arXiv: physics/0408124)
  • It solves two problems:
  • The trees are unstable (IM deals with this by
    averaging the results from multiple trees,
    trained with independent data samples)
  • There are nodes that do not select well

21
Boosting
  • Basic idea: increase the weight of misclassified
    events, then run the tree again, and again, and
    again (they did 1000 iterations!)
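That loop can be sketched in the style of AdaBoost, which is the scheme Roe et al. use. This is a minimal illustration, not the MiniBooNE code: single-cut "stumps" stand in for full trees, and the data and variable names are toy values:

```python
import math

def train_stump(events, weights):
    """Best single-cut classifier (err, var, cut, polarity) by weighted error.
    events: list of (features_dict, label) pairs with label in {-1, +1}."""
    best = None
    for var in events[0][0]:
        for cut in sorted({f[var] for f, _ in events}):
            for pol in (1, -1):
                err = sum(w for (f, y), w in zip(events, weights)
                          if pol * (1 if f[var] >= cut else -1) != y)
                if best is None or err < best[0]:
                    best = (err, var, cut, pol)
    return best

def boost(events, n_rounds=10):
    """Reweight-and-retrain loop: misclassified events get larger weights,
    then a new stump is trained on the reweighted sample."""
    n = len(events)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, var, cut, pol = train_stump(events, weights)
        err = max(err, 1e-10)            # avoid log(0) on a perfect stump
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, var, cut, pol))
        # boost step: up-weight misclassified events, then renormalize
        for i, (f, y) in enumerate(events):
            pred = pol * (1 if f[var] >= cut else -1)
            weights[i] *= math.exp(-alpha * y * pred)
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def classify(ensemble, f):
    """Weighted vote of all boosted stumps."""
    score = sum(a * p * (1 if f[v] >= c else -1) for a, v, c, p in ensemble)
    return 1 if score >= 0 else -1

events = [({"x": 0.1}, -1), ({"x": 0.4}, -1), ({"x": 0.6}, 1), ({"x": 0.9}, 1)]
ensemble = boost(events)
```

The final classifier is the α-weighted vote of all the trained stumps, which is why even individually weak cuts combine into a strong selection.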

22
Details of weighted training
Ws, Wb: weights of signal, background
Define the purity of the sample on a branch:
  P = Ws / (Ws + Wb)
For a given branch, minimize the Gini index:
  Gini = P (1 - P) (Ws + Wb)
Boost: increase weights for events that are
misclassified
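A sketch of the weighted split search, assuming the definitions P = Ws / (Ws + Wb) and Gini = P (1 - P) (Ws + Wb) from the slide; the variable name "x" and the toy events are illustrative:

```python
def weighted_gini(ws, wb):
    """Gini = P (1 - P) (Ws + Wb), with purity P = Ws / (Ws + Wb)."""
    total = ws + wb
    if total == 0:
        return 0.0
    p = ws / total
    return p * (1.0 - p) * total

def best_split(events, var):
    """Cut on `var` minimizing Gini(left branch) + Gini(right branch).
    events: list of (features_dict, is_signal, weight) triples."""
    def branch_weights(sub):
        ws = sum(w for _, y, w in sub if y)
        wb = sum(w for _, y, w in sub if not y)
        return ws, wb
    best_cut, best_val = None, None
    for cut in sorted({f[var] for f, _, _ in events}):
        left = [e for e in events if e[0][var] < cut]
        right = [e for e in events if e[0][var] >= cut]
        val = (weighted_gini(*branch_weights(left)) +
               weighted_gini(*branch_weights(right)))
        if best_val is None or val < best_val:
            best_cut, best_val = cut, val
    return best_cut, best_val

# toy weighted sample: background below 0.5, signal above
events = [({"x": 0.1}, 0, 1.0), ({"x": 0.2}, 0, 2.0),
          ({"x": 0.7}, 1, 1.0), ({"x": 0.9}, 1, 3.0)]
cut, gini_val = best_split(events, "x")
```

Because the weights enter only through Ws and Wb, the same search works unchanged for Monte Carlo events carrying trigger or tagging probabilities, which is exactly the capability missing from S-PLUS and IM.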
23
Status
  • GLAST: a standard and successful part of
    reconstruction, but boosting can probably help!
  • D0: Insightful tools cannot deal with weighting,
    boosting
  • Code that runs in the context of both is needed
    to create and apply trees -- starting such a
    project