Title: Gamma/Hadron separation in atmospheric Cherenkov telescopes
1. Gamma/Hadron separation in atmospheric Cherenkov telescopes - Overview
- multi-wavelength astrophysics
- imaging Cherenkov telescopes (IACTs)
- image classification
- methods under study
- trying for a rigorous comparison
2. Wavelength regimes in astrophysics
- they extend over 20 orders of magnitude in energy, if one adds infrared, radio and microwave observations
- Cherenkov telescopes use visible light, but with so few quanta, imaging takes on a different meaning
- some instruments have to be satellite-based, due to the absorbing effect of the atmosphere
3. Full sky at different wavelengths
4. An AGN at different wavelengths
5. Objects of interest: active galactic nuclei
Black holes spin and develop a jet with shock waves; electrons and protons get accelerated and impart their energy to high-energy gamma rays.
6. Principle of imaging Cherenkov telescopes
- a shower develops in the atmosphere; charged relativistic particles emit Cherenkov radiation (at wavelengths from the visible to the UV)
- some of the photons arrive at sea level and get reflected by a mirror onto a camera
- high sensitivity and good time resolution are vital, high precision is not: this calls for high-reflectivity mirrors and the best possible photomultipliers in the camera
7. Principle of imaging Cherenkov telescopes
8. Principle of image parameters
- hadron showers (cosmics) dominate the hardware trigger; image analysis must discriminate gammas from hadrons
- the showers show different characteristics (like in any calorimeter); feature extraction using principal component analysis and other characteristics must be used, and experimented with in view of the best separation (a small sketch of moment-based image parameters follows below)
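The feature extraction alluded to above can be illustrated with a minimal sketch of moment-based image parameters (in the spirit of the Hillas parameters). The cleaned image is assumed to be given as pixel coordinates and photoelectron amplitudes; all names and conventions here are illustrative, not the actual MAGIC analysis code.

```python
import numpy as np

def image_parameters(x, y, q):
    """Second-moment (Hillas-style) parameters of a cleaned shower image.

    x, y : pixel coordinates in the camera plane (deg)
    q    : pixel amplitudes (photoelectrons) after image cleaning
    """
    size = q.sum()                                  # total light content
    xm, ym = np.average(x, weights=q), np.average(y, weights=q)
    # central second moments of the light distribution
    sxx = np.average((x - xm) ** 2, weights=q)
    syy = np.average((y - ym) ** 2, weights=q)
    sxy = np.average((x - xm) * (y - ym), weights=q)
    # eigenvalues of the 2x2 moment matrix give the image length and width
    trace, det = sxx + syy, sxx * syy - sxy ** 2
    disc = np.sqrt(max(trace ** 2 / 4 - det, 0.0))
    length = np.sqrt(trace / 2 + disc)
    width = np.sqrt(max(trace / 2 - disc, 0.0))
    dist = np.hypot(xm, ym)                         # centroid distance from camera centre
    return {"size": size, "length": length, "width": width, "dist": dist}
```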
9. One of the predecessor telescopes (HEGRA) in 1999
10. Photomontage of the MAGIC telescope in La Palma (2000)
11. Installing the mirror dish of MAGIC (La Palma, Dec 2001)
12. (No transcript - image slide)
13. Multivariate classification
- cuts live in the n-space of features (in our case image parameters); the problem gets unwieldy even at low n
- correlations between the features make simple cuts on individual variables an ineffective method
- decorrelation by standard methods (e.g. Karhunen-Loeve) does not solve the problem, being a linear operation
- finding new variables does help, and so do cut parameters along one axis that depend on features along a different axis: dynamic cuts (subjective!) - see the sketch after this list
- ideally, a transformation to a single test statistic should be found
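As an illustration of a dynamic cut, the sketch below makes the acceptance threshold on one image parameter (width) a function of another (the logarithm of the image size). The functional form and the coefficients are invented for the example; in practice they would be tuned on training data, which is exactly the subjective element noted above.

```python
import numpy as np

def dynamic_width_cut(width, size, a=0.10, b=0.04):
    """Accept an event if its width lies below a size-dependent bound.

    The bound a + b*log10(size) and its coefficients are purely
    illustrative placeholders, not values from any experiment.
    """
    return width < a + b * np.log10(size)

# example: keep only events passing the dynamic cut
# mask = dynamic_width_cut(width_array, size_array)
```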
14. Different classification methods
- cuts in the image parameters (including dynamic cuts)
- mathematically optimized cuts in the image parameters: classification and regression trees (CART), commercial products available
- linear discriminant analysis (LDA)
- composite (2-D) probabilities (CP)
- kernel methods
- artificial neural networks (ANN)
15. There are many general methods on the market (this slide from A. Faruque, Mississippi State University)
16. Method details and comments: cuts and supercuts
- wide experience exists in many physics experiments and for all IACTs; any method claiming to be superior must use results from these as a yardstick (a schematic example is sketched below)
- needs an optimization criterion, and will not result in a relation between gamma acceptance and hadron contamination (i.e. no single test statistic)
- usually leads to separate studies and approximations for each new data set (this is past experience)
- often difficult to reproduce
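For concreteness, a static box cut of the supercuts type might look like the sketch below. The parameter names follow the usual Hillas nomenclature; the numerical cut values are placeholders chosen for the example, not the published Whipple values.

```python
import numpy as np

def supercuts(params):
    """Accept events inside a fixed box in image-parameter space.

    params: dict of numpy arrays with keys 'width', 'length', 'dist',
    'alpha' (all in deg).  Cut values below are illustrative only.
    """
    return ((params["width"]  > 0.05) & (params["width"]  < 0.12) &
            (params["length"] > 0.15) & (params["length"] < 0.30) &
            (params["dist"]   > 0.40) & (params["dist"]   < 1.00) &
            (params["alpha"]  < 10.0))
```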
17. Method details and comments: CART
- developed originally by high-energy physicists to do away with the randomness in optimizing cuts (Breiman, Friedman, Olshen, Stone, 1984)
- has since developed into a data mining method, commercially available from several companies
- basic operations: growing a tree, pruning it, splitting the leaves again - done in some heuristic succession
- the problem is to find a robust measure to choose among the many trees that are (or can be) grown
- made for large samples; no experience with IACTs yet, but there are promising early results (a minimal sketch follows below)
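A minimal sketch of the tree approach, using scikit-learn's DecisionTreeClassifier as a stand-in for a CART package and toy data in place of real image parameters; min_samples_leaf and ccp_alpha are illustrative choices for growing and pruning the tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# toy stand-in for image parameters: gamma images narrower than hadron images
X = np.vstack([rng.normal([0.10, 0.25], 0.03, (1000, 2)),    # "gammas"
               rng.normal([0.16, 0.35], 0.06, (1000, 2))])   # "hadrons"
y = np.array([1] * 1000 + [0] * 1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# grow a tree; ccp_alpha applies cost-complexity pruning against overtraining
tree = DecisionTreeClassifier(min_samples_leaf=50, ccp_alpha=1e-3).fit(X_train, y_train)
gamma_score = tree.predict_proba(X_test)[:, 1]   # tree-based test statistic
```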
18. Method details and comments: LDA
- parametric method, finding linear combinations of the original image parameters such that the separation between the signal (gamma) and background (hadron) distributions gets maximized
- fast, simple and (probably) very robust
- ignores non-linear correlations in n-dimensional space (because of the linear transformation)
- little experience with LDA in IACTs; early tests show that higher-order variables are needed (e.g. x, y -> x²y)
19. Method details and comments: LDA
20. Method details and comments: LDA
Like Principal Component Analysis (PCA), LDA is used for data classification and dimensionality reduction. LDA maximizes the ratio of between-class variance to within-class variance, for any pair of data sets; this guarantees maximal separability. The prime difference between LDA and PCA is that PCA performs feature classification (e.g. of image parameters!) while LDA performs data classification. PCA changes both the shape and location of the data in its transformed space, whereas LDA provides more class separability by building a decision region between the classes. The formalism is simple: the transformation into the best separable space is performed by the eigenvectors of a matrix readily derived from the data (for our application, from the two classes, gammas and hadrons). Caveat: both PCA and LDA are linear transformations; they may be of limited efficiency when non-linearity is involved.
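A small numerical sketch of the two-class formalism just described: the discriminant direction is the leading eigenvector of S_W⁻¹ S_B, built from the within-class and between-class scatter matrices. Toy data stand in for the gamma and hadron image parameters.

```python
import numpy as np

def lda_direction(X_gamma, X_hadron):
    """Two-class LDA: eigenvector of inv(S_W) @ S_B with the largest eigenvalue."""
    mg, mh = X_gamma.mean(axis=0), X_hadron.mean(axis=0)
    # within-class scatter matrix
    Sw = np.cov(X_gamma, rowvar=False) * (len(X_gamma) - 1) \
       + np.cov(X_hadron, rowvar=False) * (len(X_hadron) - 1)
    # between-class scatter matrix (rank 1 for two classes)
    d = (mg - mh).reshape(-1, 1)
    Sb = d @ d.T
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    return np.real(vecs[:, np.argmax(np.real(vals))])

# toy example with two image parameters
rng = np.random.default_rng(1)
gammas  = rng.normal([0.10, 0.25], [0.03, 0.05], (500, 2))
hadrons = rng.normal([0.16, 0.35], [0.06, 0.08], (500, 2))
w = lda_direction(gammas, hadrons)
score_g, score_h = gammas @ w, hadrons @ w   # one-dimensional test statistic
```

For two classes this reduces to w ∝ S_W⁻¹(μ_gamma - μ_hadron); the higher-order variables mentioned on slide 18 would simply be appended as extra feature columns before computing the scatter matrices.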
21. Method details and comments: kernel
- kernel density estimation is a nonparametric multivariate classification technique; its advantage is the generality of the class-conditional densities, which are estimated consistently
- uses individual event likelihoods, defined as the closeness to the population of gamma events or of hadron events in n-dimensional space; the closeness is expressed by a kernel function acting as metric (a sketch follows below)
- mathematically convincing, but leads into practical problems, including limitations in dimensionality; there is also some randomness in choosing the kernel function
- has been toyed with in Whipple (the earliest functioning IACT); the results look convincing, however Whipple still uses supercuts only; first experience with kernels in MAGIC is positive
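A sketch of the kernel idea using Gaussian kernel density estimates: the gamma and hadron densities are estimated from training samples and each event is scored by its likelihood ratio. scipy's gaussian_kde and the toy samples are stand-ins for whatever kernel function and data the experiment actually uses.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
train_g = rng.normal([0.10, 0.25], [0.03, 0.05], (500, 2))   # "MC gamma" training sample
train_h = rng.normal([0.16, 0.35], [0.06, 0.08], (500, 2))   # "hadron (off)" training sample

# gaussian_kde expects data with shape (n_dimensions, n_events)
kde_g = gaussian_kde(train_g.T)
kde_h = gaussian_kde(train_h.T)

def gamma_likelihood_ratio(events):
    """Per-event test statistic: gamma density / (gamma density + hadron density)."""
    pg, ph = kde_g(events.T), kde_h(events.T)
    return pg / (pg + ph)

# example: score a control sample of events
control = rng.normal([0.12, 0.28], [0.05, 0.07], (100, 2))
score = gamma_likelihood_ratio(control)
```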
22. Method details and comments: kernel
23. Method details and comments: composite probabilities (2-D)
- intuitive determination of event probabilities by multiplying the probabilities in all 2-D projections that can be made from the image parameters, using constant bin content for some of the data
- shown on some IACT data to at least match the best existing results (but strict comparisons suffered from moving data sets)
24. Method details and comments: composite probabilities (2-D)
The CP program uses same-content binning in 2 dimensions: bins are set up for gammas (red), probabilities are evaluated for protons (blue), and all possible 2-D projections are used.
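One possible reading of this recipe, sketched below: for every pair of image parameters, bin edges are placed at quantiles of the gamma training sample (same-content binning), the hadron probability of the event's bin is looked up, and the log-probabilities are summed over all 2-D projections. The binning, smoothing and combination here are illustrative assumptions, not the actual CP program.

```python
import numpy as np
from itertools import combinations

def equal_content_edges(values, nbins=5):
    """Bin edges at quantiles of the gamma sample -> equal gamma content per bin."""
    return np.quantile(values, np.linspace(0.0, 1.0, nbins + 1))

def composite_probability(event, gammas, hadrons, nbins=5):
    """Sum log hadron probabilities over all 2-D projections (illustrative)."""
    logp = 0.0
    for i, j in combinations(range(gammas.shape[1]), 2):
        ex = equal_content_edges(gammas[:, i], nbins)
        ey = equal_content_edges(gammas[:, j], nbins)
        h, _, _ = np.histogram2d(hadrons[:, i], hadrons[:, j], bins=[ex, ey])
        h = (h + 1.0) / (h.sum() + h.size)          # normalize, with mild smoothing
        bx = np.clip(np.searchsorted(ex, event[i]) - 1, 0, nbins - 1)
        by = np.clip(np.searchsorted(ey, event[j]) - 1, 0, nbins - 1)
        logp += np.log(h[bx, by])
    return logp   # low values: hadron-unlike, i.e. gamma-like

# toy usage with three illustrative image parameters per event
rng = np.random.default_rng(4)
g = rng.normal([0.10, 0.25, 0.8], [0.03, 0.05, 0.2], (500, 3))
h = rng.normal([0.16, 0.35, 1.0], [0.06, 0.08, 0.3], (500, 3))
score = composite_probability(h[0], g, h)
```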
25. Method details and comments: ANNs
- the method has been presented often in the past; it resembles the CART method but works on locally linearly transformed data
- substantial randomness in choosing the depth of the tree, the training method, the transfer function, ... (see the sketch below)
- so far no convincing results on IACTs; Whipple have tried and rejected it
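For completeness, a minimal ANN sketch with scikit-learn's MLPClassifier on toy data; the hidden-layer sizes, activation ("transfer function") and training settings are exactly the arbitrary choices criticized above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0.10, 0.25], 0.04, (1000, 2)),    # "gammas"
               rng.normal([0.16, 0.35], 0.07, (1000, 2))])   # "hadrons"
y = np.array([1] * 1000 + [0] * 1000)

# architecture, activation and solver settings are illustrative choices
ann = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(8, 8), activation="tanh",
                                  max_iter=2000, random_state=0))
ann.fit(X, y)
gamma_score = ann.predict_proba(X)[:, 1]   # network output as test statistic
```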
26. Gamma events in MAGIC before and after cleaning
27. Proton events in MAGIC before and after cleaning
28. Comparison MC gammas / MC protons
29. Comparison MC gammas / MC protons
30. Comparison MC gammas / MC protons
31. Different methods on the same data set
Typically, the optimization parameters are fully defined by cost, purity, and sample size.
32. We are running a comparative study - criteria
- strictly defined disjoint training and control samples
- the method must give estimators for hadron contamination and gamma acceptance (purity and cost)
- it should ideally result in a smooth function relating purity with cost, i.e. result in a single test statistic (see the sketch after this list)
- if not, it must show results for several optimization criteria, e.g. estimated hadron contamination at fixed gamma acceptance values, significance, etc.
- for MC events, the results can be controlled by comparing the classification to the known origin of the events
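These criteria can be made concrete by sweeping a cut on any single test statistic and recording gamma acceptance against estimated hadron contamination on the disjoint control samples. The sketch below assumes per-event scores are available for Monte Carlo gammas and hadrons; all names are illustrative.

```python
import numpy as np

def acceptance_vs_contamination(score_gamma, score_hadron, thresholds):
    """Gamma acceptance and hadron contamination as the cut on the test statistic varies."""
    curve = []
    for t in thresholds:
        n_g = np.count_nonzero(score_gamma > t)     # accepted gammas
        n_h = np.count_nonzero(score_hadron > t)    # surviving hadrons
        acceptance = n_g / len(score_gamma)
        contamination = n_h / max(n_g + n_h, 1)     # hadron fraction of the accepted sample
        curve.append((t, acceptance, contamination))
    return curve

# example: compare two methods by their curves at fixed acceptance values
# curve_lda  = acceptance_vs_contamination(lda_score_g,  lda_score_h,  np.linspace(0, 1, 50))
# curve_cart = acceptance_vs_contamination(cart_score_g, cart_score_h, np.linspace(0, 1, 50))
```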
33. Even if there were a clear conclusion...
- there remain some serious caveats
- these methods all assume an abstract space of image parameters, which is OK in Monte Carlo situations only
- real data are subject to influences that distort this space:
- starfield and night sky background
- atmospheric conditions
- unavoidable detector changes and malfunctions
- no method can invent new independent parameters
- we assume that in the final analysis the gammas will be Monte Carlo and the measurements on/off; we must deal with variables which may not be representative in Monte Carlo events and yet influence the observed image parameters (e.g. the zenith angle changes continuously, and energy is something we want to observe, hence unknown)
- some compromise between frequent Monte Carlo production and parametric corrections to the parameters is the likely solution