Gamma/Hadron separation in atmospheric Cherenkov telescopes

1
  • Gamma/Hadron separation in atmospheric Cherenkov
    telescopes
  • Overview
  • multi-wavelength astrophysics
  • imaging Cherenkov telescopes (IACTs)
  • image classification
  • methods under study
  • trying for a rigorous comparison

2
Wavelength regimes in astrophysics
  • extend over 20 orders of magnitude in energy, if
    one adds infrared, radio and microwave
    observations
  • Cherenkov telescopes use visible light, but
    with only a few quanta, imaging takes on a
    different meaning
  • some instruments have to be satellite-based, due
    to the absorbing effect of the atmosphere

3
Full sky at different wavelengths
4
An AGN at different wavelengths
5
Objects of interest: active galactic nuclei
Black holes spin and develop a jet with shock
waves; electrons and protons get accelerated and
impart their energy to high-energy gamma-rays
6
Principle of imaging Cherenkov telescopes
  • a shower develops in the atmosphere; charged
    relativistic particles emit Cherenkov radiation
    (at wavelengths from the visible to the UV)
  • some photons arrive at sea level and get
    reflected by a mirror onto a camera
  • high sensitivity and good time resolution are
    vital, precision is not: high-reflectivity
    mirrors, the best possible photomultipliers in
    the camera

7
Principle of imaging Cherenkov telescopes
8
Principle of image parameters
  • hadron showers (cosmics) dominate the hardware
    trigger; image analysis must discriminate gammas
    from hadrons
  • showers show different characteristics (like in
    any calorimeter); feature extraction using
    principal component analysis and other
    techniques must be used - one has to experiment
    in view of the best separation

9
One of the predecessor telescopes (HEGRA) in 1999
10
Photomontage of the MAGIC telescope in La Palma
(2000)
11
Installing the mirror dish of MAGIC, La Palma,
Dec 2001
12
(No Transcript)
13
  • Multivariate classification
  • cuts are in the n-space of features (in our case
    image parameters); the problem gets unwieldy even
    at low n
  • correlations between the features make simple
    cuts in single variables an ineffective method
  • decorrelation by standard methods (e.g.
    Karhunen-Loève) does not solve the problem, being
    a linear operation
  • finding new variables does help; so do cut
    parameters along one axis that depend on
    features along a different axis: dynamic cuts
    (subjective! - a sketch follows below)
  • ideally, a transformation to a single test
    statistic should be found
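
A minimal sketch of the dynamic-cut idea in Python; the
parameter names (width, size) and all numbers are hypothetical,
chosen only to show a cut boundary that depends on a second
feature:

import numpy as np

rng = np.random.default_rng(0)
size = rng.uniform(100.0, 10000.0, 1000)                # hypothetical image parameter
width = rng.normal(0.10 + 0.02 * np.log10(size), 0.03)  # correlated with size

# static cut: one constant boundary for every event
passes_static = width < 0.18

# dynamic cut: the boundary on width is itself a function of size
passes_dynamic = width < 0.10 + 0.025 * np.log10(size)

The dynamic boundary follows the correlation instead of cutting a
constant slab out of the n-space; the price is the subjectivity
noted above.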

14
  • Different classification methods
  • cuts in the image parameters (including dynamic
    cuts)
  • mathematically optimized cuts in the image
    parameters: classification and regression trees
    (CART), commercial products available
  • linear discriminant analysis (LDA)
  • composite (2-D) probabilities (CP)
  • kernel methods
  • artificial neural networks (ANN)

15
There are many general methods on the market
(this slide from A. Faruque, Mississippi State
University)
16
  • Method details and comments: cuts and supercuts
  • wide experience exists in many physics
    experiments and for all IACTs; any method
    claiming to be superior must use results from
    these as a yardstick
  • does need an optimization criterion; will not
    result in a relation between gamma acceptance and
    hadron contamination (i.e. no single test
    statistic)
  • usually leads to separate studies and
    approximations for each new data set (this is
    past experience) - often difficult to reproduce
    (a sketch follows below)
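
A minimal sketch of fixed ("super") cuts as a boolean selection
in Python; the parameter names and cut windows are placeholders,
not the published supercuts values:

import numpy as np

def supercuts_pass(length, width, alpha):
    # rectangular windows on Hillas-type image parameters
    # (placeholder values, to be re-optimized per data set)
    return ((0.16 < length) & (length < 0.30) &
            (0.07 < width) & (width < 0.15) &
            (alpha < 10.0))

Each window has to be re-tuned for every new data set, which is
exactly the reproducibility problem listed above.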

17
  • Method details and comments: CART
  • developed originally to do away with the
    randomness in optimizing cuts (Breiman,
    Friedman, Olshen, Stone, 1984)
  • now developed into a data mining method,
    commercially available from several companies
  • basic operations: growing a tree, pruning it,
    splitting the leaves again - done in some
    heuristic succession
  • the problem is to find a robust measure to
    choose among the many trees that are (or can be)
    grown
  • made for large samples; no experience with
    IACTs, but there are promising early results
    (a sketch follows below)
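
A minimal sketch of the grow-then-prune recipe, using the CART
implementation in scikit-learn rather than one of the commercial
products; the image-parameter matrix X and the gamma/hadron
labels y are random stand-ins:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))          # stand-in image parameters
y = rng.integers(0, 2, 2000)            # 1 = gamma, 0 = hadron

# grow a deep tree, then prune by cost-complexity; the pruning
# strength ccp_alpha is one such "robust measure" to choose
# among the many possible trees
tree = DecisionTreeClassifier(ccp_alpha=1e-3, random_state=0)
tree.fit(X, y)
p_gamma = tree.predict_proba(X)[:, 1]   # per-event gamma probability

In practice ccp_alpha would be chosen on a disjoint control
sample, e.g. by scanning cost_complexity_pruning_path.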

18
  • Method details and comments: LDA
  • parametric method, finding linear combinations
    of the original image parameters such that the
    separation between the signal (gamma) and
    background (hadron) distributions gets maximized
  • fast, simple and (probably) very robust
  • ignores non-linear correlations in n-dimensional
    space (because of the linear transformation)
  • little experience with LDA in IACTs; early
    tests show that higher-order variables are needed
    (e.g. x, y -> x^2 y)

19
Method details and comments: LDA
20
Method details and comments: LDA
Like Principal Component Analysis (PCA), LDA is
used for data classification and dimensionality
reduction. LDA maximizes the ratio of
between-class variance to within-class variance
for any pair of data sets; this guarantees
maximal separability. The prime difference
between LDA and PCA is that PCA performs feature
classification (e.g. image parameters!) while LDA
performs data classification. PCA changes both
the shape and location of the data in its
transformed space, whereas LDA provides more
class separability by building a decision region
between the classes. The formalism is simple: the
transformation into the best separable space is
performed by the eigenvectors of a matrix readily
derived from the data (for our application, with
two classes, gammas and hadrons). Caveat: both
PCA and LDA are linear transformations; they may
be of limited efficiency when non-linearity is
involved. (A sketch follows below.)
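
For two classes the eigenvalue problem collapses to a single
direction, so the matrix mentioned above reduces to the
within-class scatter. A minimal sketch in Python, with Gaussian
toy samples standing in for gamma and hadron image parameters:

import numpy as np

def fisher_direction(X_g, X_h):
    # two-class LDA: w = Sw^-1 (mean_g - mean_h) is the only
    # eigenvector of Sw^-1 Sb with a non-zero eigenvalue
    m_g, m_h = X_g.mean(axis=0), X_h.mean(axis=0)
    Sw = (np.cov(X_g, rowvar=False) * (len(X_g) - 1)
          + np.cov(X_h, rowvar=False) * (len(X_h) - 1))
    w = np.linalg.solve(Sw, m_g - m_h)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(1)
X_g = rng.normal(0.0, 1.0, (500, 4))    # toy gammas (events x parameters)
X_h = rng.normal(0.5, 1.0, (500, 4))    # toy hadrons
w = fisher_direction(X_g, X_h)
t_g, t_h = X_g @ w, X_h @ w             # one test statistic per event

Projecting every event onto w yields exactly the single test
statistic asked for earlier; the caveat about linearity applies
unchanged.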
21
  • Method details and comments: kernel
  • kernel density estimation is a nonparametric
    multivariate classification technique; its
    advantage is the generality of the
    class-conditional and consistently estimated
    densities
  • uses individual event likelihoods, defined as
    the closeness to the population of gamma events
    or hadron events in n-dimensional space; the
    closeness is expressed by a kernel function as
    metric
  • mathematically convincing, but leading into
    practical problems, including limitations in
    dimensionality; there is also some randomness in
    choosing the kernel function
  • has been toyed with in Whipple (the earliest
    functioning IACT), and results look convincing;
    however, Whipple still uses supercuts; only first
    experience with kernels in MAGIC so far: positive
    (a sketch follows below)
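
A minimal sketch of the kernel idea with a Gaussian kernel, using
scipy's density estimator; the three-parameter toy samples stand
in for real image parameters:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
gammas = rng.normal(0.0, 1.0, (3, 500))    # (parameters x events)
hadrons = rng.normal(0.8, 1.2, (3, 2000))

kde_g = gaussian_kde(gammas)    # class-conditional density estimates
kde_h = gaussian_kde(hadrons)

def log_likelihood_ratio(events):
    # per-event closeness to the gamma vs. the hadron population
    return np.log(kde_g(events) + 1e-300) - np.log(kde_h(events) + 1e-300)

The bandwidth hidden inside gaussian_kde is precisely the
randomness in choosing the kernel function noted above, and the
need to evaluate every test event against the full training
sample is the practical limitation.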

22
Method details and comments: kernel
23
  • Method details and comments: composite
    probabilities (2-D)
  • intuitive determination of event probabilities
    by multiplying the probabilities in all 2-D
    projections that can be made from the image
    parameters, using constant bin content for one
    of the samples
  • shown on some IACT data to at least match the
    best existing results (but strict comparisons
    suffered from moving data sets)

24
Method details and comments: composite
probabilities (2-D)
The CP program uses same-content binning in
2 dimensions: bins are set up for gammas (red),
probabilities are evaluated for protons (blue);
all possible 2-D projections are used. (A sketch
follows below.)
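
A minimal sketch of the composite-probability idea in Python; the
same-content binning is approximated here by per-axis gamma
quantiles, and all names are stand-ins:

import numpy as np
from itertools import combinations

def composite_log_prob(gammas, protons, events, n_bins=8):
    # bin edges from gamma quantiles (approximate same-content
    # binning); bin probabilities are filled from the proton
    # sample; each event sums the log-probabilities of its bins
    # over all 2-D projections
    q = np.linspace(0.0, 1.0, n_bins + 1)
    logp = np.zeros(len(events))
    for i, j in combinations(range(gammas.shape[1]), 2):
        ex = np.quantile(gammas[:, i], q)
        ey = np.quantile(gammas[:, j], q)
        hist, _, _ = np.histogram2d(protons[:, i], protons[:, j],
                                    bins=[ex, ey])
        prob = hist / hist.sum()
        ix = np.clip(np.searchsorted(ex, events[:, i]) - 1, 0, n_bins - 1)
        iy = np.clip(np.searchsorted(ey, events[:, j]) - 1, 0, n_bins - 1)
        logp += np.log(prob[ix, iy] + 1e-12)
    return logp

Summing logs avoids underflow when many small probabilities from
the different projections are multiplied.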
25
  • Method details and comments: ANNs
  • the method has been presented often in the past -
    resembles the CART method but works on locally
    linearly transformed data
  • substantial randomness in choosing the depth of
    the net, the training method, the transfer
    function, ...
  • so far no convincing results on IACTs; Whipple
    have tried and rejected it (a sketch follows
    below)
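
For completeness, a minimal sketch of a small feed-forward net
with scikit-learn; the layer size, the tanh transfer function
and the training method below are exactly the free choices the
slide calls randomness:

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 5))          # stand-in image parameters
y = rng.integers(0, 2, 2000)            # 1 = gamma, 0 = hadron

ann = MLPClassifier(hidden_layer_sizes=(10,), activation='tanh',
                    solver='adam', max_iter=500, random_state=0)
ann.fit(X, y)
p_gamma = ann.predict_proba(X)[:, 1]    # per-event gamma probability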

26
Gamma events in MAGIC before and after cleaning
27
Proton events in MAGIC before and after cleaning
28
Comparison MC gammas / MC protons
29
Comparison MC gammas / MC protons
30
Comparison MC gammas / MC protons
31
Different methods on the same data set
Typically, optimization parameters are fully
defined by cost, purity, and sample size
32
  • We are running a comparative study; the criteria:
  • strictly defined disjoint training and control
    samples
  • must give estimators for hadron contamination
    and gamma acceptance (purity and cost)
  • should ideally result in a smooth function
    relating purity with cost, i.e. result in a
    single test statistic (a sketch follows after
    this list)
  • if not, must show results for several
    optimization criteria, e.g. estimated hadron
    contamination at fixed gamma acceptance values,
    significance, etc.
  • for MC events, one can control results by
    comparing the classification to the known origin
    of the events
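
A minimal sketch of the desired smooth purity-cost relation,
assuming a method has been reduced to a per-event test statistic;
the hadron passing fraction is used as a proxy for contamination
(the true contamination also depends on the relative fluxes):

import numpy as np

def acceptance_vs_contamination(t_gamma, t_hadron, n_points=50):
    # scan cut values over the range of the test statistic on
    # disjoint control samples; higher t means more gamma-like
    cuts = np.linspace(min(t_gamma.min(), t_hadron.min()),
                       max(t_gamma.max(), t_hadron.max()), n_points)
    acceptance = np.array([(t_gamma >= c).mean() for c in cuts])
    contamination = np.array([(t_hadron >= c).mean() for c in cuts])
    return cuts, acceptance, contamination

Any single-statistic method above (LDA projection, kernel
likelihood ratio, CART or ANN gamma probability) can be fed into
this scan; plain cuts cannot, which is the point of the third
criterion.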

33
  • Even if there were a clear conclusion ...
  • there remain some serious caveats:
  • these methods all assume an abstract space of
    image parameters, which is OK in Monte Carlo
    situations only
  • real data are subject to influences that distort
    this space:
  • starfield and night-sky background
  • atmospheric conditions
  • unavoidable detector changes and malfunctions
  • no method can invent new independent parameters
  • we assume that in the final analysis, gammas will
    be Monte Carlo and measurements are on/off; we
    must deal with variables which may not be
    representative in Monte Carlo events and yet
    influence the observed image parameters, e.g.
    the zenith angle changes continuously, and energy
    is something we want to observe, hence unknown
  • some compromise between frequent Monte Carlo-ing
    and parametric corrections to the parameters is
    the likely solution