Title: Automating cuts in HEP: the classification or decision tree
1. Automating cuts in HEP: the classification (or decision) tree
- Introduction
- Application to the DZero single top analysis
- Serious use by GLAST
- A new technique: boosted decision trees
2. Introduction
- In HEP experiments (and GLAST) we collect an event sample, determined by trigger conditions
- Each event may be a desired signal, or an unavoidable background to be rejected
- Each event is characterized by a set of measured variables; we can predict the dependence of signal and background on these variables, usually with a Monte Carlo simulation of the assumed background source and the detector response
- The big question: how to use the set of measured variables to select signal events, or to just measure the signal rate?
3. A toy example
4. Example, cont.
- What is the best way to measure the signal rate?
- Significance: the inverse variance per signal event. With $N$ signal events and measured event rate $S$ with statistical error $\sigma_S$, the significance is $S/\sigma_S$
- Other physics/science needs might call for a pure sample, at some sacrifice of efficiency
5. Significance: compare the maximum-likelihood fit with counting above a cut (toy, again)
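A minimal toy sketch of the counting side of this comparison; the Gaussian signal, flat background, sample sizes, and the $S/\sqrt{S+B}$ figure of merit are illustrative assumptions, not the talk's actual toy.

```python
import numpy as np

rng = np.random.default_rng(42)
sig = rng.normal(1.0, 0.2, 1000)      # toy signal in one variable x
bkg = rng.uniform(0.0, 2.0, 10000)    # toy flat background

def significance(cut):
    """Counting experiment above a cut: S / sqrt(S + B)."""
    S = np.sum(sig > cut)
    B = np.sum(bkg > cut)
    return S / np.sqrt(S + B) if S + B > 0 else 0.0

cuts = np.linspace(0.0, 2.0, 101)
best = max(cuts, key=significance)
print(f"best cut: x > {best:.2f}, significance {significance(best):.1f}")
```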
6. The real world: we want a function of all those variables
- The traditional tool (in HEP) is the Neural Network
- First the many neurons in the intermediate layers must be set by training with background and signal
- Classification trees are very similar, but much more transparent
- Important variables are identified easily
- The tree can be examined in detail
- Invented long ago (the 60s), not used in HEP since the 70s: Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984), "Classification and Regression Trees", Wadsworth
7. A simple example with DZero
- Single top production at the Tevatron:
  - t-channel
  - s-channel
  - W + 2 jets background
8. Introducing Insightful
- World HQ on west Lake Union
- Markets:
  - S-PLUS statistical software system
  - Insightful Miner data-mining software
9. Insightful Miner demo of a classification tree (with real D0 data)
(Screenshot: the classification node, fed by input tabular data files)
10. The tree itself
11. Classification variable importance
Using the Gini criterion (defined below)
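For reference, the Gini impurity of a node with signal fraction $p$, which this criterion is built on; a variable's importance is the total impurity decrease summed over all splits made on that variable:

```latex
\[
  G \;=\; 1 - p^{2} - (1-p)^{2} \;=\; 2\,p\,(1-p),
  \qquad p = \frac{n_s}{n_s + n_b}
\]
```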
12. Bottom line: how does it do?
13. GLAST
Under construction; launch in 2007
14. LAT overview
- Precision Si-strip Tracker (TKR): 18 XY tracking planes of single-sided silicon strip detectors (228 µm pitch). Measures the photon direction; gamma ID.
- Hodoscopic CsI Calorimeter (CAL): array of 1536 CsI(Tl) crystals in 8 layers. Measures the photon energy; images the shower.
- Segmented Anticoincidence Detector (ACD): 89 plastic scintillator tiles. Rejects the charged cosmic-ray background; segmentation removes self-veto effects at high energy.
- Electronics System: includes flexible, robust hardware trigger and software filters.
15. GLAST: pioneer HEP CT user
- Discovered, applied, and promoted by Bill Atwood
- Created in the 60s; actually applied to HEP at SLAC by Jerry Friedman: Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984), "Classification and Regression Trees", Wadsworth
- Separate applications:
  - Identify events with well-measured energy
  - Select events with well-measured tracks
  - Separate cosmic-ray induced background from actual gamma rays
16. Case I: use a CT for the energy filter
Problem: the large gaps in the CAL and the thick layers of the Tracker compromise the energy determination.
Strategy: identify poorly measured events and eliminate them.
Technique: split events into energy classes, and for each class use a Classification Tree to determine the well-measured events (a sketch follows).
(Figures: the splits and the resulting trees)
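A minimal sketch of this per-class technique, assuming scikit-learn stands in for the IM tooling; the energy bin edges, the feature matrix X, and the well_measured labels are placeholders, not GLAST's actual configuration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

energy_edges = [0.1, 1.0, 10.0]  # GeV; illustrative class boundaries

def train_energy_filters(X, energy, well_measured):
    """Train one classification tree per energy class.

    X: (n_events, n_features) array of measured variables
    energy: (n_events,) reconstructed energies
    well_measured: (n_events,) 1 if the event's energy is well measured
    """
    trees = {}
    bins = np.digitize(energy, energy_edges)
    for b in np.unique(bins):
        sel = bins == b
        trees[b] = DecisionTreeClassifier(max_depth=4).fit(X[sel], well_measured[sel])
    return trees
```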
17. Results
(Plots: all, good, and bad event samples)
18. A problem that was solved here
- How to incorporate the decision trees in our standard analysis?
- Answer: a class that reads the XML description from IM and implements the decision tree structure (sketched below)
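The talk does not show the IM export schema, so the sketch below assumes a hypothetical layout: nested <node> elements carrying var/cut attributes for splits, with leaves carrying a class attribute; the first child is taken when the event value is at or below the cut.

```python
import xml.etree.ElementTree as ET

class XmlClassificationTree:
    """Reads a (hypothetical) IM-style XML tree and applies it to events."""

    def __init__(self, path):
        self.root = ET.parse(path).getroot()

    def classify(self, event):
        """Walk from the root to a leaf; event maps variable name -> value."""
        node = self.root
        while "class" not in node.attrib:
            below, above = list(node)  # two children per split node
            node = above if event[node.get("var")] > float(node.get("cut")) else below
        return node.get("class")
```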
19. Weighting and Boosting
- How about weighted events?
- Very natural for Monte Carlo, and absolutely necessary for the D0 analysis
- Used to describe triggering and tagging probability
- But not supported by either S-PLUS or IM (an illustration follows)
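For contrast, a minimal illustration of the missing feature: modern tree implementations accept per-event weights directly at training time, e.g. scikit-learn's sample_weight argument. The variables and the stand-in trigger/tagging weights here are made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((1000, 3))              # toy measured variables
y = (X[:, 0] > 0.5).astype(int)        # toy signal/background label
w = rng.uniform(0.2, 1.0, size=1000)   # stand-in trigger*tagging probability

tree = DecisionTreeClassifier(max_depth=3)
tree.fit(X, y, sample_weight=w)        # splits now optimize weighted purity
```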
20. An improved CT: boosting
- Applied to MiniBooNE by Byron Roe and collaborators (arXiv:physics/0408124)
- It solves two problems:
  - The trees are unstable (IM deals with this by averaging the results from multiple trees, trained with independent data samples)
  - There are nodes that do not select well
21. Boosting
- Basic idea: increase the weight for "bad" (misclassified) events, then run the tree again, and again, and again (they did 1000 iterations!); a sketch follows
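A sketch of that loop in Python, in the AdaBoost style of the cited paper; the tree depth, the beta factor, and the stopping rule are illustrative choices, not Roe et al.'s exact settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, w, n_trees=1000, beta=0.5):
    """Repeatedly bump the weights of misclassified events and retrain."""
    trees, alphas = [], []
    w = w / w.sum()
    for _ in range(n_trees):
        t = DecisionTreeClassifier(max_depth=3).fit(X, y, sample_weight=w)
        miss = t.predict(X) != y
        err = w[miss].sum()                       # weighted misclassification rate
        if err <= 0.0 or err >= 0.5:
            break
        alpha = beta * np.log((1.0 - err) / err)
        w = np.where(miss, w * np.exp(alpha), w)  # boost the bad events
        w = w / w.sum()
        trees.append(t)
        alphas.append(alpha)
    return trees, alphas
```

The final classifier is then the alpha-weighted vote of all the trees.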
22. Details of weighted training
$W_s$, $W_b$: total weights of the signal and background events on a branch.
Define the purity of the sample on a branch: $P = W_s / (W_s + W_b)$.
For a given branch, minimize $\mathrm{Gini} = P\,(1-P)\,(W_s + W_b)$; a split is chosen to minimize the summed Gini of the two daughter branches.
Boost: increase the weights for events that are misclassified.
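These definitions translate directly into a split-scoring sketch (assumed array layout: x is one variable, y is 1 for signal and 0 for background, w holds the per-event weights).

```python
import numpy as np

def gini(w_s, w_b):
    """Weighted Gini of one branch: P*(1-P)*(Ws+Wb) with P = Ws/(Ws+Wb)."""
    w = w_s + w_b
    if w == 0.0:
        return 0.0
    p = w_s / w
    return p * (1.0 - p) * w

def split_cost(x, y, w, cut):
    """Summed Gini of the two branches produced by the cut x <= cut;
    training picks the variable and cut value minimizing this."""
    cost = 0.0
    for side in (x <= cut, x > cut):
        cost += gini(w[side & (y == 1)].sum(), w[side & (y == 0)].sum())
    return cost
```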
23. Status
- GLAST: a standard and successful part of reconstruction, but boosting can probably help!
- D0: the Insightful tools cannot deal with weighting or boosting
- Code is needed to create and apply trees, running in the context of both -- starting such a project