Multivariate Analysis: Past, Present and Future

1
Multivariate Analysis: Past, Present and Future
  • Harrison B. Prosper
  • Florida State University
  • PHYSTAT 2003
  • 10 September 2003

2
Outline
  • Introduction
  • Historical Note
  • Current Practice
  • Issues
  • Summary

3
Introduction
  • Data are invariably multivariate
  • Particle physics (η, φ, E, f)
  • Astrophysics (θ, φ, E, t)

4
Introduction II: A Textbook Example
  • Objects (number of variables each)
  • Jet 1 (b): 3
  • Jet 2: 3
  • Jet 3: 3
  • Jet 4 (b): 3
  • Positron: 3
  • Neutrino: 2
  • Total: 17

5
Introduction III
  • Astrophysics/Particle physics Similarities
  • Events
  • Interesting events occur at random
  • Poisson processes
  • Backgrounds are important
  • Experimental response functions
  • Huge datasets

6
Introduction IV
  • Differences
  • In particle physics we control when events occur
    and under what conditions
  • We have detailed predictions of the relative
    frequency of various outcomes

7
Introduction V: All we do is Count!
  • Our experiments are ideal Bernoulli trials
  • At Fermilab, each collision, that is, each trial, is
    conducted the same way every 400 ns
  • de Finetti's analysis of exchangeable trials is
    an accurate model of what we do

8
Introduction VI
  • Typical analysis tasks
  • Data Compression
  • Clustering and cluster characterization
  • Classification/Discrimination
  • Estimation
  • Model selection/Hypothesis testing
  • Optimization

9
Historical Note
Karl Pearson (1857-1936)
R.A. Fisher (1890-1962)
P.C. Mahalanobis (1893-1972)
10
Historical Note: Iris Data
Iris Versicolor
Iris Setosa
R.A. Fisher, The Use of Multiple Measurements in
Taxonomic Problems, Annals of Eugenics, v. 7,
p. 179-188 (1936)
11
Iris Data
  • Variables
  • X1 Sepal length
  • X2 Sepal width
  • X3 Petal length
  • X4 Petal width
  • "What linear function of the four measurements
    will maximize the ratio of the difference between
    the specific means to the standard deviations
    within species?" (R.A. Fisher)

12
Fisher Linear Discriminant (1936)
Solution
Which is the same, within a constant, as
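
The equations on this slide did not survive transcription. As a hedged sketch only, the textbook form of Fisher's solution is a weight vector w proportional to S_W^{-1}(m1 - m2), where m1 and m2 are the class means and S_W is the pooled within-class scatter matrix; for Gaussian classes with a common covariance matrix, w·x agrees with the log-likelihood ratio up to a constant. The toy Gaussian data and all names below are assumptions, not taken from the talk.

```python
import numpy as np

def fisher_direction(x1, x2):
    """Return w proportional to S_W^{-1} (m1 - m2); rows of x1, x2 are events."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # Pooled within-class scatter matrix
    s_w = (x1 - m1).T @ (x1 - m1) + (x2 - m2).T @ (x2 - m2)
    return np.linalg.solve(s_w, m1 - m2)

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])      # assumed common covariance
class_a = rng.multivariate_normal([0.0, 0.0], cov, size=2000)
class_b = rng.multivariate_normal([1.0, 0.0], cov, size=2000)

w = fisher_direction(class_a, class_b)

def separation(direction):
    """Difference of projected means divided by the within-class spread."""
    pa, pb = class_a @ direction, class_b @ direction
    return abs(pa.mean() - pb.mean()) / np.sqrt(0.5 * (pa.var() + pb.var()))

sep_fisher = separation(w)                     # along Fisher's direction
sep_x_only = separation(np.array([1.0, 0.0]))  # using the first variable alone
```

In this toy setup the second variable has the same mean in both classes, yet including it through w improves the separation because it is correlated with the first.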
13
Current Practice in Particle Physics
  • Reducing number of variables
  • Principal Component Analysis (PCA)
  • Discrimination/Classification
  • Fisher Linear Discriminant (FLD)
  • Random Grid Search (RGS)
  • Feedforward Neural Network (FNN)
  • Kernel Density Estimation (KDE)

14
Current Practice II
  • Parameter Estimation
  • Maximum Likelihood (ML)
  • Bayesian (KDE and analytical methods)
  • e.g., see talk by Florencia Canelli (12A)
  • Weighting
  • Usually 0, 1, referred to as cuts
  • Sometimes use the R. Barlow method

15
Cuts (0, 1 weights)
Points that lie below the cuts are cut out (weight 0);
points that pass are kept (weight 1)
We refer to (x0, y0) as a cut-point
16
Grid Search
Apply cuts at each grid point
compute some measure of their effectiveness and
choose most effective cuts
Curse of dimensionality: number of cut-points
~ $N_{\text{bin}}^{N_{\text{dim}}}$
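
To make the scaling concrete, here is a small sketch that counts the cut-points of a regular grid. The choice of 10 grid values per variable is an assumption for illustration.

```python
def n_cut_points(n_bin, n_dim):
    """Number of cut-points on a regular grid with n_bin values per variable."""
    return n_bin ** n_dim

# Assumed example: 10 grid values per variable, for several dimensionalities
counts = {n_dim: n_cut_points(10, n_dim) for n_dim in (1, 2, 5, 10)}
```

With ten variables the grid already has ten billion cut-points, each of which needs a pass over the data.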
17
Random Grid Search
Take each point of the signal class as a
cut-point
[Figure: axes x and y]
n events in sample, k events after
cuts, fraction k/n
H.B.P. et al, Proceedings, CHEP 1995
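
A minimal sketch of the idea, assuming two variables, toy Gaussian samples, and s/sqrt(s+b) as the measure of effectiveness. None of these choices come from the talk; the function names are hypothetical.

```python
import math
import random

def passes(event, cut_point):
    """An event passes if every variable is at or above the cut-point value."""
    return all(v >= c for v, c in zip(event, cut_point))

def random_grid_search(signal, background):
    """Try each signal point as a cut-point; return the best one with its score."""
    best = None
    for cut_point in signal:
        s = sum(passes(e, cut_point) for e in signal)
        b = sum(passes(e, cut_point) for e in background)
        # Assumed figure of merit; s >= 1 because the cut-point passes its own cut
        score = s / math.sqrt(s + b)
        if best is None or score > best[0]:
            best = (score, cut_point, s / len(signal), b / len(background))
    return best

random.seed(1)
signal = [(random.gauss(2.0, 1.0), random.gauss(2.0, 1.0)) for _ in range(300)]
background = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(300)]

score, cut_point, signal_fraction, background_fraction = random_grid_search(signal, background)
```

The number of cut-points tried equals the number of signal events, whatever the dimensionality, and the cut-points fall where the signal actually lies.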
18
Example: DØ Top Discovery (1995)
19
Optimal Discrimination
Bayes Discriminant
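
The formula on this slide was not transcribed. As a hedged sketch, the usual definition of the Bayes discriminant is the posterior signal probability D(x) = p(x|s) p(s) / [p(x|s) p(s) + p(x|b) p(b)]. The two 1-D Gaussian class densities and the equal priors below are assumptions for illustration.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_discriminant(x, prior_signal=0.5):
    """Posterior probability p(s|x) for two assumed 1-D Gaussian classes."""
    s = gaussian_pdf(x, mean=1.0, sigma=1.0) * prior_signal           # signal
    b = gaussian_pdf(x, mean=-1.0, sigma=1.0) * (1.0 - prior_signal)  # background
    return s / (s + b)

values = [bayes_discriminant(x) for x in (-2.0, 0.0, 2.0)]
```

Cutting on D(x) is equivalent to cutting on the likelihood ratio p(x|s)/p(x|b), since one is a monotonic function of the other for fixed priors.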
20
FeedForward Neural Networks
  • Applications
  • Discrimination
  • Parameter estimation
  • Function and density estimation
  • Basic Idea
  • Encode a mapping (Kolmogorov, 1950s)
    using a set of 1-D functions.

21
Example: DØ Search for Leptoquarks
[Figure: diagram with particle labels l, q, LQ, g]
22
Issues
  • Method choice
  • Life is short and data finite so how should one
    choose a method?
  • Model complexity
  • How to reduce dimensionality of data, while
    minimizing loss of information?
  • How many model parameters?
  • How should one avoid over-fitting?

23
Issues II
  • Model robustness
  • Is a cut on a multivariate discriminant
    necessarily more sensitive to modeling errors
    than a cut on each of its input variables?
  • What is a practical, but useful, way to assess
    sensitivity to modeling errors and robustness
    with respect to assumptions?

24
Issues III
  • Accuracy of predictions
  • How should one place error bars on
    multivariate-based results?
  • Is a Bayesian approach useful?
  • Goodness of fit
  • How can this be done in multiple dimensions?

25
Summary
  • After 80 years of effort we have many powerful
    methods of analysis
  • A few of which are now used routinely in physics
    analyses
  • The most pressing need is to understand some
    issues better so that when the data tsunami
    strikes we can respond sensibly

26
FNN Probabilistic Interpretation
Minimize the empirical risk function with respect
to w
Solution (for large N)
If t(x) = kδ[1 - I(x)], where I(x) = 1 if x is of
class k, 0 otherwise
D.W. Ruck et al., IEEE Trans. Neural Networks
1(4), 296-298 (1990)
E.A. Wan, IEEE Trans. Neural Networks
1(4), 303-305 (1990)
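
The equations on this slide were not transcribed. The result usually attributed to the cited papers is that a function fitted by minimizing the squared error against 0/1 class targets approximates the posterior probability p(k|x) when the sample is large. The sketch below illustrates this with a deliberate stand-in for the network: one free constant per bin of x, for which the least-squares solution is simply the mean target in the bin. The Gaussian toy model and equal class sizes are assumptions.

```python
import math
import random

random.seed(2)
# Class k ~ N(+1, 1) with target 1; other class ~ N(-1, 1) with target 0 (assumed)
events = [(random.gauss(1.0, 1.0), 1.0) for _ in range(20000)]
events += [(random.gauss(-1.0, 1.0), 0.0) for _ in range(20000)]

# Stand-in for the network: one free constant per bin of width 0.5 on [-1, 1).
# The constant minimizing the summed squared error in a bin is the mean target.
edges = [-1.0, -0.5, 0.0, 0.5, 1.0]
fitted, exact = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    targets = [t for x, t in events if lo <= x < hi]
    fitted.append(sum(targets) / len(targets))
    centre = 0.5 * (lo + hi)
    # Exact posterior for equal priors and these Gaussians: 1 / (1 + exp(-2x))
    exact.append(1.0 / (1.0 + math.exp(-2.0 * centre)))
```

The fitted constants track the exact posterior at the bin centres to within binning and statistical error; a network with enough flexibility plays the same role without the binning.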
27
Self Organizing Map
  • Basic Idea (Kohonen, 1988)
  • Map each of K feature vectors X = (x1,..,xN)^T
    into one of M regions of interest defined by the
    vector wm so that all X mapped to a given wm are
    closer to it than to all remaining wm.
  • Basically, perform a coarse-graining of the
    feature space.
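
A minimal sketch of this mapping, assuming a 1-D chain of M = 3 units, a fixed neighbourhood of one unit on each side, a linearly decaying learning rate, and toy 2-D data. These are illustrative choices, not Kohonen's exact prescription.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(x, prototypes):
    """Index m of the prototype w_m closest to x."""
    return min(range(len(prototypes)), key=lambda m: dist2(x, prototypes[m]))

def train_som(data, n_units, n_epochs=20, rate=0.5):
    """1-D chain of units; the winner and its chain neighbours move toward x."""
    prototypes = [list(x) for x in random.sample(data, n_units)]
    for epoch in range(n_epochs):
        lr = rate * (1.0 - epoch / n_epochs)             # decaying learning rate
        for x in data:
            win = nearest(x, prototypes)
            for m in (win - 1, win, win + 1):
                if 0 <= m < n_units:
                    step = lr if m == win else 0.5 * lr  # neighbours move less
                    prototypes[m] = [w + step * (xi - w)
                                     for w, xi in zip(prototypes[m], x)]
    return prototypes

random.seed(3)
centres = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
data = [(random.gauss(cx, 0.5), random.gauss(cy, 0.5))
        for cx, cy in centres for _ in range(100)]
prototypes = train_som(data, n_units=3)
regions = [nearest(x, prototypes) for x in data]   # region index for each X
```

After training, `regions` assigns every feature vector to the w_m it is closest to, which is the coarse-graining described above.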

28
Support Vector Machines
  • Basic Idea
  • Data that are non-separable in N-dimensions have
    a higher chance of being separable if mapped into
    a space of higher dimension
  • Use a linear discriminant to partition the high
    dimensional feature space.
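
A toy illustration of the mapping idea only, not a trained support vector machine: 1-D points that no single threshold can separate become linearly separable after the assumed map x -> (x, x^2). The data and the hand-picked weights are assumptions.

```python
def feature_map(x):
    """Map a 1-D point into 2-D: x -> (x, x^2)."""
    return (x, x * x)

inner = [-0.5, -0.2, 0.1, 0.4]   # class A: |x| < 1
outer = [-2.0, -1.5, 1.3, 1.8]   # class B: |x| > 1, on both sides of class A

def linear_discriminant(z, w=(0.0, 1.0), b=-1.0):
    """w.z + b in the mapped space; weights chosen by hand for this toy data."""
    return w[0] * z[0] + w[1] * z[1] + b

scores_inner = [linear_discriminant(feature_map(x)) for x in inner]
scores_outer = [linear_discriminant(feature_map(x)) for x in outer]
```

A real support vector machine would choose w and b by maximizing the margin and would typically apply the map implicitly through a kernel function.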

29
Independent Component Analysis
  • Basic Idea
  • Assume X = (x1,..,xN)^T is a linear sum X = AS of
    independent sources S = (s1,..,sN)^T. Both A,
    the mixing matrix, and S are unknown.
  • Find a de-mixing matrix T such that the
    components of U = TX are statistically independent
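
A minimal two-source sketch of this idea, assuming uniformly distributed sources and using whitening followed by a scan over rotation angles that maximizes the summed absolute excess kurtosis. This is one simple way to look for independent components, chosen for brevity; the mixing matrix and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
S = rng.uniform(-1.0, 1.0, size=(2, n))     # independent, non-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])      # mixing matrix (unknown in practice)
X = A @ S                                   # observed data

# Step 1: whiten X so its components are uncorrelated with unit variance.
Xc = X - X.mean(axis=1, keepdims=True)
eigval, eigvec = np.linalg.eigh(np.cov(Xc))
W = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
Z = W @ Xc

# Step 2: after whitening only a rotation remains; scan for the angle that
# maximizes non-Gaussianity, measured by the summed |excess kurtosis|.
def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def non_gaussianity(U):
    return np.abs((U ** 4).mean(axis=1) - 3.0).sum()

angles = np.linspace(0.0, np.pi / 2.0, 181)
best = max(angles, key=lambda t: non_gaussianity(rotation(t) @ Z))
T = rotation(best) @ W                      # de-mixing matrix
U = T @ Xc

# Each recovered component should match one source up to order, sign and scale.
corr = np.abs(np.corrcoef(np.vstack([U, S]))[:2, 2:])
```

The sources can only be recovered up to permutation, sign and scale, which is why the check compares absolute correlations instead of the values themselves.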
