Gamma/Hadron separation in atmospheric Cherenkov telescopes

1
  • Gamma/Hadron separation in atmospheric Cherenkov
    telescopes
  • Overview
  • multi-wavelength astrophysics
  • imaging Cherenkov telescopes (IACTs)
  • image classification
  • methods under study
  • trying for a rigorous comparison

2
Wavelength regimes in astrophysics
  • extend over 20 orders of magnitude in energy, if
    one adds infrared, radio and microwave
    observations
  • Cherenkov telescopes use visible light, but
    with only a few quanta, imaging takes on a
    different meaning
  • some instruments have to be satellite-based, due
    to the absorbing effect of the atmosphere

3
Full sky at different wavelengths
4
An AGN at different wavelengths
5
Objects of interest: active galactic nuclei
Black holes spin and develop a jet with shock
waves; electrons and protons get accelerated and
impart their energy to high-energy gamma-rays
6
Principle of imaging Cherenkov telescopes
  • a shower develops in the atmosphere; charged
    relativistic particles emit Cherenkov radiation
    (at wavelengths from the visible to the UV)
  • some photons arrive at sea level and get
    reflected by a mirror onto a camera
  • high sensitivity and good time resolution are
    vital, precision is not: high-reflectivity
    mirrors, the best possible photomultipliers in
    the camera

7
Principle of imaging Cherenkov telescopes
8
Principle of image parameters
  • hadron showers (cosmics) dominate the hardware
    trigger; image analysis must discriminate gammas
    from hadrons
  • showers show different characteristics (like in
    any calorimeter); feature extraction using
    principal component analysis and other
    techniques must be used - one has to experiment
    in view of the best separation

9
One of the predecessor telescopes (HEGRA) in 1999
10
Photomontage of the MAGIC telescope in La Palma
(2000)
11
Installing the mirror dish of MAGIC, La Palma,
Dec 2001
12
(No Transcript)
13
  • Multivariate classification
  • cuts are in the n-space of features (in our case
    image parameters); the problem gets unwieldy even
    at low n
  • correlations between the features make simple
    cuts in single variables an ineffective method
  • decorrelation by standard methods (e.g.
    Karhunen-Loève) does not solve the problem, being
    a linear operation
  • finding new variables does help; so do cut
    parameters along one axis that depend on
    features along a different axis: dynamic cuts
    (subjective! - a sketch follows below)
  • ideally, a transformation to a single test
    statistic should be found
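
A minimal sketch of the dynamic-cut idea in Python; the
parameter names (width, size) and all numbers are hypothetical,
chosen only to show a cut boundary that depends on a second
feature:

import numpy as np

rng = np.random.default_rng(0)
size = rng.uniform(100.0, 10000.0, 1000)                # hypothetical image parameter
width = rng.normal(0.10 + 0.02 * np.log10(size), 0.03)  # correlated with size

# static cut: one constant boundary for every event
passes_static = width < 0.18

# dynamic cut: the boundary on width is itself a function of size
passes_dynamic = width < 0.10 + 0.025 * np.log10(size)

The dynamic boundary follows the correlation instead of cutting a
constant slab out of the n-space; the price is the subjectivity
noted above.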

14
  • Different classification methods
  • cuts in the image parameters (including dynamic
    cuts)
  • mathematically optimized cuts in the image
    parameters: classification and regression trees
    (CART), commercial products available
  • linear discriminant analysis (LDA)
  • composite (2-D) probabilities (CP)
  • kernel methods
  • artificial neural networks (ANN)

15
There are many general methods on the market
(this slide from A. Faruque, Mississippi State
University)
16
  • Method details and comments: cuts and supercuts
  • wide experience exists in many physics
    experiments and for all IACTs; any method
    claiming to be superior must use results from
    these as a yardstick
  • does need an optimization criterion; will not
    result in a relation between gamma acceptance and
    hadron contamination (i.e. no single test
    statistic)
  • usually leads to separate studies and
    approximations for each new data set (this is
    past experience) - often difficult to reproduce
    (a sketch follows below)
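
A minimal sketch of fixed ("super") cuts as a boolean selection
in Python; the parameter names and cut windows are placeholders,
not the published supercuts values:

import numpy as np

def supercuts_pass(length, width, alpha):
    # rectangular windows on Hillas-type image parameters
    # (placeholder values, to be re-optimized per data set)
    return ((0.16 < length) & (length < 0.30) &
            (0.07 < width) & (width < 0.15) &
            (alpha < 10.0))

Each window has to be re-tuned for every new data set, which is
exactly the reproducibility problem listed above.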

17
  • Method details and comments: CART
  • developed originally to do away with the
    randomness in optimizing cuts (Breiman,
    Friedman, Olshen, Stone, 1984)
  • now developed into a data mining method,
    commercially available from several companies
  • basic operations: growing a tree, pruning it,
    splitting the leaves again - done in some
    heuristic succession
  • the problem is to find a robust measure to
    choose among the many trees that are (or can be)
    grown
  • made for large samples; no experience with
    IACTs, but there are promising early results
    (a sketch follows below)
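
A minimal sketch of the grow-then-prune recipe, using the CART
implementation in scikit-learn rather than one of the commercial
products; the image-parameter matrix X and the gamma/hadron
labels y are random stand-ins:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))          # stand-in image parameters
y = rng.integers(0, 2, 2000)            # 1 = gamma, 0 = hadron

# grow a deep tree, then prune by cost-complexity; the pruning
# strength ccp_alpha is one such "robust measure" to choose
# among the many possible trees
tree = DecisionTreeClassifier(ccp_alpha=1e-3, random_state=0)
tree.fit(X, y)
p_gamma = tree.predict_proba(X)[:, 1]   # per-event gamma probability

In practice ccp_alpha would be chosen on a disjoint control
sample, e.g. by scanning cost_complexity_pruning_path.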

18
  • Method details and comments: LDA
  • parametric method, finding linear combinations
    of the original image parameters such that the
    separation between the signal (gamma) and
    background (hadron) distributions gets maximized
  • fast, simple and (probably) very robust
  • ignores non-linear correlations in n-dimensional
    space (because of the linear transformation)
  • little experience with LDA in IACTs; early
    tests show that higher-order variables are needed
    (e.g. x, y -> x^2 y)

19
Method details and comments: LDA
20
Method details and comments: LDA
Like Principal Component Analysis (PCA), LDA is
used for data classification and dimensionality
reduction. LDA maximizes the ratio of
between-class variance to within-class variance
for any pair of data sets; this guarantees
maximal separability. The prime difference
between LDA and PCA is that PCA performs feature
classification (e.g. image parameters!) while LDA
performs data classification. PCA changes both
the shape and location of the data in its
transformed space, whereas LDA provides more
class separability by building a decision region
between the classes. The formalism is simple: the
transformation into the best separable space is
performed by the eigenvectors of a matrix readily
derived from the data (for our application, with
two classes, gammas and hadrons). Caveat: both
PCA and LDA are linear transformations; they may
be of limited efficiency when non-linearity is
involved. (A sketch follows below.)
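
For two classes the eigenvalue problem collapses to a single
direction, so the matrix mentioned above reduces to the
within-class scatter. A minimal sketch in Python, with Gaussian
toy samples standing in for gamma and hadron image parameters:

import numpy as np

def fisher_direction(X_g, X_h):
    # two-class LDA: w = Sw^-1 (mean_g - mean_h) is the only
    # eigenvector of Sw^-1 Sb with a non-zero eigenvalue
    m_g, m_h = X_g.mean(axis=0), X_h.mean(axis=0)
    Sw = (np.cov(X_g, rowvar=False) * (len(X_g) - 1)
          + np.cov(X_h, rowvar=False) * (len(X_h) - 1))
    w = np.linalg.solve(Sw, m_g - m_h)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(1)
X_g = rng.normal(0.0, 1.0, (500, 4))    # toy gammas (events x parameters)
X_h = rng.normal(0.5, 1.0, (500, 4))    # toy hadrons
w = fisher_direction(X_g, X_h)
t_g, t_h = X_g @ w, X_h @ w             # one test statistic per event

Projecting every event onto w yields exactly the single test
statistic asked for earlier; the caveat about linearity applies
unchanged.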
21
  • Method details and comments: kernel
  • kernel density estimation is a nonparametric
    multivariate classification technique; its
    advantage is the generality of the
    class-conditional and consistently estimated
    densities
  • uses individual event likelihoods, defined as
    the closeness to the population of gamma events
    or hadron events in n-dimensional space; the
    closeness is expressed by a kernel function as
    metric
  • mathematically convincing, but leading into
    practical problems, including limitations in
    dimensionality; there is also some randomness in
    choosing the kernel function
  • has been toyed with in Whipple (the earliest
    functioning IACT), and results look convincing;
    however, Whipple still uses supercuts; only first
    experience with kernels in MAGIC so far: positive
    (a sketch follows below)
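
A minimal sketch of the kernel idea with a Gaussian kernel, using
scipy's density estimator; the three-parameter toy samples stand
in for real image parameters:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
gammas = rng.normal(0.0, 1.0, (3, 500))    # (parameters x events)
hadrons = rng.normal(0.8, 1.2, (3, 2000))

kde_g = gaussian_kde(gammas)    # class-conditional density estimates
kde_h = gaussian_kde(hadrons)

def log_likelihood_ratio(events):
    # per-event closeness to the gamma vs. the hadron population
    return np.log(kde_g(events) + 1e-300) - np.log(kde_h(events) + 1e-300)

The bandwidth hidden inside gaussian_kde is precisely the
randomness in choosing the kernel function noted above, and the
need to evaluate every test event against the full training
sample is the practical limitation.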

22
Method details and comments: kernel
23
  • Method details and comments: composite
    probabilities (2-D)
  • intuitive determination of event probabilities
    by multiplying the probabilities in all 2-D
    projections that can be made from the image
    parameters, using constant bin content for one
    of the samples
  • shown on some IACT data to at least match the
    best existing results (but strict comparisons
    suffered from moving data sets)

24
Method details and comments: composite
probabilities (2-D)
The CP program uses same-content binning in
2 dimensions: bins are set up for gammas (red),
probabilities are evaluated for protons (blue);
all possible 2-D projections are used. (A sketch
follows below.)
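
A minimal sketch of the composite-probability idea in Python; the
same-content binning is approximated here by per-axis gamma
quantiles, and all names are stand-ins:

import numpy as np
from itertools import combinations

def composite_log_prob(gammas, protons, events, n_bins=8):
    # bin edges from gamma quantiles (approximate same-content
    # binning); bin probabilities are filled from the proton
    # sample; each event sums the log-probabilities of its bins
    # over all 2-D projections
    q = np.linspace(0.0, 1.0, n_bins + 1)
    logp = np.zeros(len(events))
    for i, j in combinations(range(gammas.shape[1]), 2):
        ex = np.quantile(gammas[:, i], q)
        ey = np.quantile(gammas[:, j], q)
        hist, _, _ = np.histogram2d(protons[:, i], protons[:, j],
                                    bins=[ex, ey])
        prob = hist / hist.sum()
        ix = np.clip(np.searchsorted(ex, events[:, i]) - 1, 0, n_bins - 1)
        iy = np.clip(np.searchsorted(ey, events[:, j]) - 1, 0, n_bins - 1)
        logp += np.log(prob[ix, iy] + 1e-12)
    return logp

Summing logs avoids underflow when many small probabilities from
the different projections are multiplied.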
25
  • Method details and comments: ANNs
  • the method has been presented often in the past -
    resembles the CART method but works on locally
    linearly transformed data
  • substantial randomness in choosing the depth of
    the net, the training method, the transfer
    function, ...
  • so far no convincing results on IACTs; Whipple
    have tried and rejected it (a sketch follows
    below)
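
For completeness, a minimal sketch of a small feed-forward net
with scikit-learn; the layer size, the tanh transfer function
and the training method below are exactly the free choices the
slide calls randomness:

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 5))          # stand-in image parameters
y = rng.integers(0, 2, 2000)            # 1 = gamma, 0 = hadron

ann = MLPClassifier(hidden_layer_sizes=(10,), activation='tanh',
                    solver='adam', max_iter=500, random_state=0)
ann.fit(X, y)
p_gamma = ann.predict_proba(X)[:, 1]    # per-event gamma probability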

26
Gamma events in MAGIC before and after cleaning
27
Proton events in MAGIC before and after cleaning
28
Comparison MC gammas / MC protons
29
Comparison MC gammas / MC protons
30
Comparison MC gammas / MC protons
31
Different methods on the same data set
Typically, optimization parameters are fully
defined by cost, purity, and sample size
32
  • We are running a comparative study; the criteria:
  • strictly defined disjoint training and control
    samples
  • must give estimators for hadron contamination
    and gamma acceptance (purity and cost)
  • should ideally result in a smooth function
    relating purity with cost, i.e. result in a
    single test statistic (a sketch follows after
    this list)
  • if not, must show results for several
    optimization criteria, e.g. estimated hadron
    contamination at fixed gamma acceptance values,
    significance, etc.
  • for MC events, one can control results by
    comparing the classification to the known origin
    of the events
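
A minimal sketch of the desired smooth purity-cost relation,
assuming a method has been reduced to a per-event test statistic;
the hadron passing fraction is used as a proxy for contamination
(the true contamination also depends on the relative fluxes):

import numpy as np

def acceptance_vs_contamination(t_gamma, t_hadron, n_points=50):
    # scan cut values over the range of the test statistic on
    # disjoint control samples; higher t means more gamma-like
    cuts = np.linspace(min(t_gamma.min(), t_hadron.min()),
                       max(t_gamma.max(), t_hadron.max()), n_points)
    acceptance = np.array([(t_gamma >= c).mean() for c in cuts])
    contamination = np.array([(t_hadron >= c).mean() for c in cuts])
    return cuts, acceptance, contamination

Any single-statistic method above (LDA projection, kernel
likelihood ratio, CART or ANN gamma probability) can be fed into
this scan; plain cuts cannot, which is the point of the third
criterion.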

33
  • Even if there were a clear conclusion ...
  • there remain some serious caveats:
  • these methods all assume an abstract space of
    image parameters, which is OK in Monte Carlo
    situations only
  • real data are subject to influences that distort
    this space:
  • starfield and night-sky background
  • atmospheric conditions
  • unavoidable detector changes and malfunctions
  • no method can invent new independent parameters
  • we assume that in the final analysis, gammas will
    be Monte Carlo and measurements are on/off; we
    must deal with variables which may not be
    representative in Monte Carlo events and yet
    influence the observed image parameters, e.g.
    the zenith angle changes continuously, and energy
    is something we want to observe, hence unknown
  • some compromise between frequent Monte Carlo-ing
    and parametric corrections to the parameters is
    the likely solution