Title: Multivariate Analysis: Past, Present and Future
- Harrison B. Prosper
- Florida State University
- PHYSTAT 2003
- 10 September 2003
Outline
- Introduction
- Historical Note
- Current Practice
- Issues
- Summary
Introduction
- Data are invariably multivariate
- Particle physics (η, φ, E, t)
- Astrophysics (θ, φ, E, t)
Introduction II: A Textbook Example
- Objects and the number of measured variables for each:

  Object      Variables
  Jet 1 (b)   3
  Jet 2       3
  Jet 3       3
  Jet 4 (b)   3
  Positron    3
  Neutrino    2
  Total       17

- The neutrino contributes only two variables because only its transverse momentum components can be inferred from the missing transverse energy
Introduction III
- Similarities between astrophysics and particle physics
- Events
- Interesting events occur at random
- Poisson processes
- Backgrounds are important
- Experimental response functions
- Huge datasets
Introduction IV
- Differences
- In particle physics we control when events occur and under what conditions
- We have detailed predictions of the relative frequency of various outcomes
Introduction V: All We Do Is Count!
- Our experiments are ideal Bernoulli trials
- At Fermilab, each collision, that is, each trial, is conducted the same way every 400 ns
- de Finetti's analysis of exchangeable trials is an accurate model of what we do
Introduction VI
- Typical analysis tasks
- Data Compression
- Clustering and cluster characterization
- Classification/Discrimination
- Estimation
- Model selection/Hypothesis testing
- Optimization
Historical Note
Karl Pearson (1857–1936)
R.A. Fisher (1890–1962)
P.C. Mahalanobis (1893–1972)
Historical Note: Iris Data
Iris versicolor
Iris setosa
R.A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems", Annals of Eugenics 7, 179–188 (1936)
Iris Data
- Variables
- X1: sepal length
- X2: sepal width
- X3: petal length
- X4: petal width
- "What linear function of the four measurements will maximize the ratio of the difference between the specific means to the standard deviations within species?" (R.A. Fisher)
Fisher Linear Discriminant (1936)
Solution:

$$ w \propto \Sigma_W^{-1} (\mu_1 - \mu_2), \qquad f(x) = w^T x $$

which is the same, within a constant, as the log-likelihood ratio $\ln[p(x|1)/p(x|2)]$ for two Gaussian densities with means $\mu_1$ and $\mu_2$ and common covariance matrix $\Sigma_W$.
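A minimal sketch of this computation in Python, using the scikit-learn copy of the iris data as a stand-in for Fisher's original measurements:

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the iris data; classes 0 and 1 are setosa and versicolor,
# two of the species in Fisher's 1936 paper.
X, y = load_iris(return_X_y=True)
x1, x2 = X[y == 0], X[y == 1]

# Class means and pooled within-class covariance matrix.
mu1, mu2 = x1.mean(axis=0), x2.mean(axis=0)
S_w = np.cov(x1, rowvar=False) + np.cov(x2, rowvar=False)

# Fisher's solution: w is proportional to S_w^{-1} (mu1 - mu2).
w = np.linalg.solve(S_w, mu1 - mu2)

# Project the events of the two classes onto the discriminant axis.
f = np.concatenate([x1, x2]) @ w
print("discriminant direction:", w)
```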
Current Practice in Particle Physics
- Reducing number of variables
- Principal Component Analysis (PCA)
- Discrimination/Classification
- Fisher Linear Discriminant (FLD)
- Random Grid Search (RGS)
- Feedforward Neural Network (FNN)
- Kernel Density Estimation (KDE)
Current Practice II
- Parameter Estimation
- Maximum Likelihood (ML)
- Bayesian (KDE and analytical methods)
- e.g., see talk by Florencia Canelli (12A)
- Weighting
- Usually 0 or 1, referred to as cuts
- Sometimes use the R. Barlow method
Cuts (0, 1 Weights)
Points that lie below the cuts are cut out: they receive weight 0, while points that pass receive weight 1.
We refer to (x0, y0) as a cut-point.
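As a sketch, cuts are simply 0/1 weights applied event by event; the variable names and cut values here are illustrative:

```python
import numpy as np

# Toy data: each event has two measured variables, x and y.
rng = np.random.default_rng(0)
x, y = rng.normal(size=1000), rng.normal(size=1000)

# A cut-point (x0, y0): keep events with x > x0 and y > y0.
x0, y0 = 0.5, -0.2
weights = ((x > x0) & (y > y0)).astype(float)  # 1 = keep, 0 = cut out

print("fraction of events passing the cuts:", weights.mean())
```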
Grid Search
Apply cuts at each grid point, compute some measure of their effectiveness, and choose the most effective cuts.
Curse of dimensionality: the number of cut-points is $N_{\mathrm{bin}}^{N_{\mathrm{dim}}}$. For example, 10 bins per axis in 10 dimensions already gives $10^{10}$ cut-points.
Random Grid Search
Take each point of the signal class as a cut-point.
With n events in a sample and k events surviving the cuts, the fraction passing is k/n.
H.B. Prosper et al., Proceedings of CHEP 1995
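A minimal sketch of the idea, with toy signal and background samples; the one-sided cuts in every variable and the figure of merit s/√(s+b) are illustrative assumptions:

```python
import numpy as np

def random_grid_search(signal, background):
    # Use each signal point as a candidate cut-point; the cut keeps
    # events that exceed the cut-point in every variable.
    best, best_fom = None, -np.inf
    for cut_point in signal:
        s = np.sum(np.all(signal > cut_point, axis=1))      # signal passing
        b = np.sum(np.all(background > cut_point, axis=1))  # background passing
        fom = s / np.sqrt(s + b) if s + b > 0 else 0.0      # effectiveness measure
        if fom > best_fom:
            best, best_fom = cut_point, fom
    return best, best_fom

rng = np.random.default_rng(1)
signal = rng.normal(1.0, 1.0, size=(500, 2))      # toy signal sample
background = rng.normal(0.0, 1.0, size=(500, 2))  # toy background sample
cuts, fom = random_grid_search(signal, background)
print("best cut-point:", cuts, "figure of merit:", fom)
```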
Example: DØ Top Discovery (1995)
Optimal Discrimination
Bayes Discriminant
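In its standard form, the Bayes discriminant for a signal class s and background class b with prior probabilities P(s) and P(b) is the posterior probability of the signal class:

```latex
D(x) = \frac{p(x \mid s)\, P(s)}{p(x \mid s)\, P(s) + p(x \mid b)\, P(b)}
```

A cut on D(x) yields, for a given signal efficiency, the lowest achievable background efficiency, which is why this discriminant is optimal.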
Feedforward Neural Networks
- Applications
- Discrimination
- Parameter estimation
- Function and density estimation
- Basic Idea
- Encode the mapping x → y(x) using a set of 1-D functions (Kolmogorov, 1950s), as sketched below
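As a sketch, a single hidden layer realizes this idea: the multivariate mapping is built entirely from nested 1-D functions (here tanh and an output function g); the symbols are illustrative.

```latex
n(x, w) = g\!\left( b + \sum_{j=1}^{H} v_j \tanh\!\Big( a_j + \sum_{i=1}^{d} u_{ji}\, x_i \Big) \right),
\qquad w = (a, b, u, v)
```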
Example: DØ Search for Leptoquarks
[Feynman diagram: leptoquark (LQ) production and decay, with external legs labeled l, q, and g]
Issues
- Method choice
- Life is short and data are finite, so how should one choose a method?
- Model complexity
- How does one reduce the dimensionality of the data while minimizing the loss of information?
- How many model parameters?
- How should one avoid over-fitting?
Issues II
- Model robustness
- Is a cut on a multivariate discriminant necessarily more sensitive to modeling errors than a cut on each of its input variables?
- What is a practical, but useful, way to assess sensitivity to modeling errors and robustness with respect to assumptions?
Issues III
- Accuracy of predictions
- How should one place error bars on multivariate-based results?
- Is a Bayesian approach useful?
- Goodness of fit
- How can this be done in multiple dimensions?
Summary
- After 80 years of effort we have many powerful methods of analysis
- A few of these are now used routinely in physics analyses
- The most pressing need is to understand some issues better, so that when the data tsunami strikes we can respond sensibly
FNN: Probabilistic Interpretation
Minimize the empirical risk function with respect to the weights w:

$$ R(w) = \frac{1}{2N} \sum_{i=1}^{N} \big[\, n(x_i, w) - t_i \,\big]^2 $$

Solution (for large N):

$$ n(x, w^*) \approx E[\, t \mid x \,] $$

If $t(x) = I(x)$, where $I(x) = 1$ if x is of class k and 0 otherwise, then $n(x, w^*) \approx p(k \mid x)$, the probability that x belongs to class k.

D.W. Ruck et al., IEEE Trans. Neural Networks 1(4), 296–298 (1990); E.A. Wan, IEEE Trans. Neural Networks 1(4), 303–305 (1990)
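A sketch of why the large-N minimizer is the conditional expectation: the empirical risk approaches its expectation, which decomposes around E[t|x].

```latex
R(w) \;\to\; \frac{1}{2}\int \big[\, n(x,w) - E[t \mid x] \,\big]^2 p(x)\, dx
\;+\; \frac{1}{2}\int \operatorname{Var}[\, t \mid x \,]\, p(x)\, dx
```

The second term does not depend on w, so minimizing R drives n(x, w) toward E[t|x].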
Self-Organizing Map
- Basic Idea (Kohonen, 1988)
- Map each of K feature vectors X = (x1, ..., xN)^T into one of M regions of interest, defined by the vectors w_m, so that all X mapped to a given w_m are closer to it than to all remaining w_m
- Basically, perform a coarse-graining of the feature space, as sketched below
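A minimal sketch of that coarse-graining, written as a 1-D Kohonen map with a Gaussian neighborhood; the node count, learning rate, and decay schedule are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))   # K = 1000 feature vectors, N = 3
M = 10                           # number of map nodes (regions of interest)
w = rng.normal(size=(M, 3))      # the vectors w_m defining the regions

eta, sigma = 0.5, 2.0            # learning rate and neighborhood width
for x in X:
    m = np.argmin(np.linalg.norm(w - x, axis=1))             # best-matching node
    h = np.exp(-((np.arange(M) - m) ** 2) / (2 * sigma**2))  # neighborhood on the map
    w += eta * h[:, None] * (x - w)                          # pull nodes toward the input
    eta *= 0.995                                             # slowly freeze the map
    sigma *= 0.995

# Coarse-graining: each X is assigned to its nearest w_m.
labels = np.argmin(np.linalg.norm(X[:, None, :] - w[None], axis=2), axis=1)
print("events per region:", np.bincount(labels, minlength=M))
```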
Support Vector Machines
- Basic Idea
- Data that are non-separable in N dimensions have a higher chance of being separable if mapped into a space of higher dimension, as illustrated below
- Use a linear discriminant to partition the high-dimensional feature space
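A toy illustration of the mapping idea (not a full SVM): points that no single threshold can separate in one dimension become linearly separable under the illustrative map x → (x, x²).

```python
import numpy as np

# Class A sits near the origin; class B lies on both sides of it,
# so no single threshold on x separates them in one dimension.
a = np.linspace(-1, 1, 50)
b = np.concatenate([np.linspace(-3, -2, 25), np.linspace(2, 3, 25)])

# Map each point into two dimensions: phi(x) = (x, x^2).
phi = lambda x: np.stack([x, x**2], axis=1)
A, B = phi(a), phi(b)

# In the mapped space the horizontal line x2 = 2 separates the classes.
print("class A points above the line:", np.sum(A[:, 1] > 2))  # 0
print("class B points above the line:", np.sum(B[:, 1] > 2))  # 50
```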
Independent Component Analysis
- Basic Idea
- Assume X = (x1, ..., xN)^T is a linear sum X = AS of independent sources S = (s1, ..., sN)^T. Both A, the mixing matrix, and S are unknown.
- Find a de-mixing matrix T such that the components of U = TX are statistically independent, as sketched below
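A minimal sketch using scikit-learn's FastICA as one way to estimate the de-mixing; the sources and mixing matrix here are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent sources: a sinusoid and a square wave.
t = np.linspace(0, 8, 2000)
S = np.stack([np.sin(2 * t), np.sign(np.sin(3 * t))], axis=1)

# Mix them with a mixing matrix A (unknown in a real problem): X = S A^T.
A = np.array([[1.0, 0.5], [0.3, 1.0]])
X = S @ A.T

# Estimate U = T X such that the components of U are statistically independent.
ica = FastICA(n_components=2, random_state=0)
U = ica.fit_transform(X)   # recovered sources, up to order and scale
```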