Linear Techniques for Regression and Classification on Functional Data
Gilbert Saporta, Chaire de Statistique Appliquée, CEDRIC, Conservatoire National des Arts et Métiers, 292 rue Saint Martin, F-75141 Paris Cedex 03
saporta_at_cnam.fr, http://cedric.cnam.fr/saporta
Joint work with D. Costanzo (U. Calabria) and C. Preda (U. Lille 2)
Outline
- 1. Introduction
- 2. OLS regression on functional data
- 3. PLS functional regression
- 4. Clusterwise regression
- 5. Discrimination
- 6. Anticipated prediction
- 7. Applications
- 8. Conclusion and perspectives
1. Introduction
- Very high-dimensional data: an infinite number of variables
- Regression on functional data
- Example 1: Y = amount of crop, $X_t$ = temperature curves, $p = \infty$?
- R.A. Fisher, The Influence of Rainfall on the Yield of Wheat at Rothamsted, Philosophical Transactions of the Royal Society, B 213, 89-142 (1924)
- Example 2: growth index of 84 shares at the Paris stock exchange during 60 minutes
- How to predict $X_t$ from t = 55 to t = 60 for a new share, knowing $X_t$ from t = 0 to t = 55?
Discrimination on functional data
- Example 3: kneading curves for cookies (Danone Vitapole)
- After smoothing with cubic B-splines (Lévéder et al., 2004)
- How to predict the quality of the cookies?
- Linear combination: integral regression (Fisher 1924)
- $\hat Y = \int_0^T \beta(t)\,X_t\,dt$
- instead of a finite sum $\sum_{j=1}^{p} \beta_j X_{t_j}$
- Discrimination on functional data: a particular case of regression where the response is binary
- Anticipation: determine an optimal time $t^* < T$ giving a prediction based on $[0, t^*]$ almost as good as the prediction using all the data on $[0, T]$
2. OLS regression on functional data
- A real response Y and a process $(X_t)_{t \in [0,T]}$, all with zero mean
- 2.1 The OLS problem
- Minimizing $E\big[\big(Y - \int_0^T \beta(t)\,X_t\,dt\big)^2\big]$
- leads to the normal, or Wiener-Hopf, equations
- $\int_0^T C(t,s)\,\beta(s)\,ds = E(Y X_t)$ for all $t$
- where $C(t,s) = \mathrm{cov}(X_t, X_s) = E(X_t X_s)$
- 2.2 Karhunen-Loève decomposition (functional PCA)
- $X_t = \sum_{i \ge 1} f_i(t)\,\xi_i$
- factor loadings $f_i$: eigenfunctions of $C$, $\int_0^T C(t,s)\,f_i(s)\,ds = \lambda_i f_i(t)$
- principal components $\xi_i = \int_0^T f_i(t)\,X_t\,dt$, with $\mathrm{var}(\xi_i) = \lambda_i$
- Picard's theorem: $\beta$ is unique if and only if
- $\sum_{i \ge 1} \frac{[E(Y \xi_i)]^2}{\lambda_i^2} < \infty$
- Generally not true; in particular, when n is finite, $p > n$ and one gets a perfect fit when minimizing
- $\sum_{i=1}^{n} \big(y_i - \int_0^T \beta(t)\,x_i(t)\,dt\big)^2$
- Even if $\beta$ is unique, the Wiener-Hopf equation is not an ordinary integral equation: the solution is more often a distribution than a function
- Constrained solutions are needed (cf. Green & Silverman 1994, Ramsay & Silverman 1997)
- 2.3 Regression on principal components
- Rank-q approximation: $\hat Y_q = \sum_{i=1}^{q} \frac{E(Y \xi_i)}{\lambda_i}\,\xi_i$
Numerical computations
- Solve integral equations in the general case
- For step functions (a finite number of variables and of units), the operators are matrices, but of very high dimension
- Approximation by discretisation of time, as in the sketch below
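A minimal sketch of this discretisation in Python with NumPy: curves sampled on a regular grid turn the covariance operator into a $p \times p$ matrix and the Wiener-Hopf equation into a linear system. The simulated Brownian-like data, the grid size and the ridge penalty standing in for the constrained solutions mentioned above are illustrative assumptions, not the exact procedure of the talk.

```python
import numpy as np

# n curves sampled at p points: int_0^T beta(s) x_i(s) ds becomes the Riemann
# sum  sum_j beta(t_j) x_i(t_j) dt,  and C(t, s) becomes a p x p matrix.
rng = np.random.default_rng(0)
n, p, T = 100, 60, 1.0
dt = T / p
X = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n, p)), axis=1)  # Brownian-like
beta_true = np.sin(np.linspace(0.0, np.pi, p))                     # toy beta(t)
y = X @ beta_true * dt + rng.normal(scale=0.1, size=n)

X -= X.mean(axis=0)        # zero-mean process and response, as assumed above
y -= y.mean()

C = (X.T @ X / n) * dt     # discretized covariance operator C(t, s)
c_y = X.T @ y / n          # discretized cross-covariance E(Y X_t)

# The raw system is ill-posed (perfect fit as soon as p >= n), so a small
# ridge penalty illustrates the "constrained solutions" mentioned above.
lam = 1e-3
beta_hat = np.linalg.solve(C + lam * np.eye(p), c_y)
```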
Which principal components?
- The first q?
- The q best correlated with Y?
- Principal components are computed irrespective of the response; both choices are sketched below
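A short NumPy sketch comparing the two choices on discretized curves; the simulated data and q = 5 are arbitrary illustrations.

```python
import numpy as np

# Principal-component regression on discretized, centered curves.
rng = np.random.default_rng(0)
n, p, q = 100, 60, 5
X = np.cumsum(rng.normal(size=(n, p)), axis=1)
y = X[:, -1] + rng.normal(scale=0.1, size=n)
X -= X.mean(axis=0)
y -= y.mean()

lam, F = np.linalg.eigh(X.T @ X / n)    # eigenvalues/eigenvectors of C
lam, F = lam[::-1], F[:, ::-1]          # sort by decreasing eigenvalue
xi = X @ F                              # principal components

# Option 1: the first q components; option 2: the q components best
# correlated with Y. Only option 2 uses the response.
corr = np.abs([np.corrcoef(xi[:, j], y)[0, 1] for j in range(p)])
for idx in (np.arange(q), np.argsort(corr)[::-1][:q]):
    coef = xi[:, idx].T @ y / (n * lam[idx])   # E(Y xi_i) / lambda_i
    print("in-sample MSE:", np.mean((y - xi[:, idx] @ coef) ** 2))
```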
3. Functional PLS regression
- Use PLS components instead of principal components
- First PLS component: $\xi_1 = \int_0^T w_1(t)\,X_t\,dt$
- where $w_1$ maximizes $\mathrm{cov}^2\big(Y, \int_0^T w(t)\,X_t\,dt\big)$ subject to $\|w\| = 1$
- solution: $w_1(t) \propto E(X_t Y)$
- Further PLS components are obtained as usual, by iterating on the residuals
- Order-q approximation of Y by $X_t$: $\hat Y_q = \int_0^T \beta_q(t)\,X_t\,dt$
- Convergence theorem: $\lim_{q \to \infty} E\big[(\hat Y_q - \hat Y)^2\big] = 0$
- q has to be finite in order to get a formula!
- Usually q is selected by cross-validation, as in the sketch below (Preda & Saporta, 2005a)
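A sketch of the discretized version with scikit-learn, whose PLSRegression implements standard multivariate PLS; the simulated data and the grid of candidate q are illustrative assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# Discretized functional PLS: each curve is a row of p sampled values and
# the number of components q is chosen by cross-validation.
rng = np.random.default_rng(0)
n, p = 100, 60
X = np.cumsum(rng.normal(size=(n, p)), axis=1)
y = X[:, -1] + rng.normal(scale=0.1, size=n)

cv_scores = [cross_val_score(PLSRegression(n_components=q), X, y,
                             cv=5, scoring="r2").mean()
             for q in range(1, 11)]
best_q = int(np.argmax(cv_scores)) + 1
print("q selected by cross-validation:", best_q)

pls = PLSRegression(n_components=best_q).fit(X, y)
beta_hat = pls.coef_.ravel()   # plays the role of the discretized beta_q(t)
```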
- The first PLS component is easily interpretable: its coefficients have the same sign as $r(Y, X_t)$
- No integral equation to solve
- PLS fits better than PCR
- Same proof as in De Jong, 1993
4. Clusterwise regression
- 4.1 Model
- G: a latent variable with K categories (sub-populations); within each cluster, a local linear regression of Y on $X_t$
- 4.2 OLS and clusterwise regression
- Residual variance of the global regression = within-cluster residual variance + variance due to the difference between the local (clusterwise) and global (OLS) regressions
- 4.3 Estimation (Charles, 1977)
- The number of clusters K needs to be known
- Alternated least squares (see the sketch after this list):
- For a given partition, estimate a linear regression for each cluster
- Reallocate each point to the closest regression line (or surface)
- Equivalent to ML for the fixed-regressors, fixed-partition model (Hennig, 2000)
- 4.4 Optimal K: AIC, BIC, cross-validation
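A minimal sketch of the alternated scheme on simulated one-dimensional data, with K known; the data, the iteration cap and the stopping rule are illustrative, and empty clusters are not handled.

```python
import numpy as np

# Clusterwise linear regression by alternated least squares (Charles, 1977):
# two noisy lines, K = 2, random initial partition.
rng = np.random.default_rng(0)
n, K = 200, 2
x = rng.uniform(-1, 1, size=n)
g_true = rng.integers(K, size=n)
y = np.where(g_true == 0, 2 * x, -2 * x) + rng.normal(scale=0.1, size=n)

g = rng.integers(K, size=n)                      # random initial partition
for _ in range(50):
    # Step 1: for the current partition, fit one OLS line per cluster
    # (this sketch assumes no cluster becomes empty or too small).
    coefs = [np.polyfit(x[g == k], y[g == k], deg=1) for k in range(K)]
    # Step 2: reallocate each point to the closest regression line.
    resid = np.stack([np.abs(y - np.polyval(c, x)) for c in coefs])
    g_new = resid.argmin(axis=0)
    if np.array_equal(g, g_new):                 # stable partition: stop
        break
    g = g_new
print("cluster sizes:", np.bincount(g, minlength=K))
```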
4.5 Clusterwise functional PLS regression
- OLS functional regression is not adequate to give estimations in each cluster
- Our proposal: estimate the local models with functional PLS regression
- Is the clusterwise algorithm still consistent? Proof in Preda & Saporta, 2005b
Prediction
- Allocate a new observation to a cluster (nearest neighbour or another classification technique), then use the corresponding local model, as sketched below
- May be generalised if Y is itself a random vector
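A sketch of that prediction step, assuming `centroids` holds the cluster mean curves and `local_models` one fitted local PLS model per cluster; nearest-centroid allocation is one possible rule among those mentioned above.

```python
import numpy as np

def predict_clusterwise(x_new, centroids, local_models):
    """Allocate the new curve to the nearest cluster mean, then apply the
    corresponding pre-fitted local model."""
    k = int(np.argmin([np.linalg.norm(x_new - c) for c in centroids]))
    return local_models[k].predict(x_new.reshape(1, -1)).ravel(), k
```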
4.6 Application to stock market data
- Growth index during 1 hour (between 10:00 and 11:00) of 84 shares at the Paris Stock Exchange
- Goal: predict a new share between 10:55 and 11:00 using its data between 10:00 and 10:55
- Exact computations would need 1366 variables (the number of intervals on which the 85 curves are constant)
- Discretisation in 60 intervals
- Comparison between PCR and PLS
- Crash of share 85 not detected!
Clusterwise PLS
- Four clusters, of sizes 17, 32, 10 and 25
- Number of PLS components for each cluster: 1, 3, 2, 2 (cross-validation)
- Share 85 classified into cluster 1
5. Functional linear discrimination
- LDA: find linear combinations $\int_0^T a(t)\,X_t\,dt$
- maximizing the ratio between-group variance / within-group variance
- For 2 groups, Fisher's LDF is obtained via a regression between a coded Y and $X_t$
- e.g. coding the two groups as $\sqrt{n_0/n_1}$ and $-\sqrt{n_1/n_0}$
- (Preda & Saporta, 2005a)
- PLS regression with q components gives an approximation of $\beta(t)$ and of the score $d_T(x) = \int_0^T \beta(t)\,x(t)\,dt$, as sketched below
- For more than 2 groups: PLS2 regression between the k-1 indicators of Y and $X_t$
- First PLS component given by the first eigenvector of the product of the Escoufier operators $W_X W_Y$
- Preda & Saporta, 2002 and Barker & Rayens, 2003
Quality measures
- For k = 2: ROC curve and AUC
- For a specific threshold s, x is classified into G1 if $d_T(x) > s$
- Sensitivity, or true positive rate: $P(d_T(x) > s \mid Y = 1) = 1 - \beta$
- 1 - specificity, or 1 - true negative rate: $P(d_T(x) > s \mid Y = 0) = \alpha$
ROC curve
- Perfect discrimination: the ROC curve follows the edges of the unit square
- Identical conditional distributions: the ROC curve coincides with the diagonal
- The ROC curve is invariant under any increasing monotonic transformation of the score
- Area under the ROC curve (AUC): a global measure of performance allowing (partial) model comparisons
- With $X_1$ drawn from G1 and $X_2$ from G2: $AUC = P(X_1 > X_2)$
- AUC estimated by the proportion of concordant pairs $n_c$
- Wilcoxon-Mann-Whitney statistic W: $U = W - 0.5\,n_1(n_1+1)$, $AUC = U/(n_1 n_2)$; see the sketch below
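This estimate is one line with SciPy, whose Mann-Whitney test returns the U statistic above; the simulated scores are illustrative.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# AUC as the proportion of concordant pairs: AUC = P(X1 > X2) = U / (n1 n2).
rng = np.random.default_rng(0)
s1 = rng.normal(loc=1.0, size=50)   # scores d_T(x) for group G1
s2 = rng.normal(loc=0.0, size=40)   # scores d_T(x) for group G2

u, _ = mannwhitneyu(s1, s2, alternative="greater")
auc = u / (len(s1) * len(s2))

# Direct check: concordant pairs, ties counted as 1/2.
d = s1[:, None] - s2[None, :]
print(auc, np.mean(d > 0) + 0.5 * np.mean(d == 0))
```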
6. Anticipated prediction
- Find $t^* < T$ such that the analysis on $[0, t^*]$ gives predictions almost as good as those based on $[0, T]$
- Solution: when increasing s from 0 to T, look for the first value $s^*$ such that AUC(s) does not differ significantly from AUC(T)
A bootstrap procedure
- Stratified resampling of the data
- For each replication b, $AUC_b(s)$ and $AUC_b(T)$ are computed
- Student's t test or Wilcoxon test on the B paired differences $\delta_b = AUC_b(s) - AUC_b(T)$; a sketch follows
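A sketch of the procedure, assuming the two score vectors, computed on [0, s] and on [0, T] (e.g. by the PLS discriminant score above), are given; in the full procedure the scores would be refitted on every resample, which this sketch skips for brevity. B = 50 mirrors the application below; everything else is illustrative.

```python
import numpy as np
from scipy.stats import wilcoxon

def auc(scores, labels):
    """AUC as the proportion of concordant pairs (ties count 1/2)."""
    d = scores[labels == 1][:, None] - scores[labels == 0][None, :]
    return np.mean(d > 0) + 0.5 * np.mean(d == 0)

def bootstrap_auc_test(score_s, score_T, labels, B=50, seed=0):
    """Stratified bootstrap of the paired differences AUC_b(s) - AUC_b(T),
    then a Wilcoxon signed-rank test of H0: AUC(s) = AUC(T)."""
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(B):
        idx = np.concatenate([rng.choice(np.flatnonzero(labels == c),
                                         size=np.sum(labels == c))
                              for c in (0, 1)])                 # stratified
        diffs.append(auc(score_s[idx], labels[idx])
                     - auc(score_T[idx], labels[idx]))
    return wilcoxon(diffs)

# Illustrative call with synthetic scores where [0, s] is slightly noisier:
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], [40, 50])
score_T = labels + rng.normal(scale=0.8, size=90)
score_s = labels + rng.normal(scale=1.0, size=90)
print(bootstrap_auc_test(score_s, score_T, labels))
```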
7. Applications
- 7.1 Simulated data
- Two classes with equal priors
- W(t): Brownian motion
- 7.2 Kneading curves
- After T = 480 s of kneading, one gets cookies whose quality Y is rated
- 115 observations: 50 good, 40 bad and 25 adjustable
- 241 equally spaced measurements
- Smoothing with cubic B-splines, 16 knots
Performance for Y in {good, bad}
- Repeat 100 times the split into learning and test samples of sizes (60, 30)
- Average error rate: 0.142 with principal components, 0.112 with PLS components
- Average AUC: 0.746
- [Figure: estimated coefficient function $\beta(t)$]
Anticipated prediction
- B = 50 bootstrap replications
- $t^* = 186$ s
- The recording period of the dough resistance can be reduced to less than half of the current one
8. Conclusions and perspectives
- PLS regression is an efficient and simple way to get linear predictions for functional data
- We have proposed a bootstrap procedure for the problem of anticipated prediction
Work in progress
- On-line forecasting: instead of using the same anticipated decision time $t^*$ for all data, adapt $t^*$ to each new trajectory given its incoming measurements
- Clusterwise discrimination
- Comparison with functional logistic regression (Aguilera et al., 2006)
References
- Aguilera A.M., Escabias M., Valderrama M.J. (2006) Using principal components for estimating logistic regression with high-dimensional multicollinear data, Computational Statistics & Data Analysis, 50, 1905-1924.
- Barker M., Rayens W. (2003) Partial least squares for discrimination, Journal of Chemometrics, 17, 166-173.
- Charles C. (1977) Régression typologique et reconnaissance des formes, Ph.D. thesis, Université Paris IX.
- Costanzo D., Preda C., Saporta G. (2006) Anticipated prediction in discriminant analysis on functional data for binary response, in COMPSTAT'2006, 821-828, Physica-Verlag.
- Hennig C. (2000) Identifiability of models for clusterwise linear regression, Journal of Classification, 17, 273-296.
- Lévéder C., Abraham C., Cornillon P.A., Matzner-Lober E., Molinari N. (2004) Discrimination de courbes de pétrissage, Chimiométrie 2004, 37-43.
- Preda C., Saporta G. (2005a) PLS regression on a stochastic process, Computational Statistics and Data Analysis, 48, 149-158.
- Preda C., Saporta G. (2005b) Clusterwise PLS regression on a stochastic process, Computational Statistics and Data Analysis, 49, 99-108.
- Preda C., Saporta G., Lévéder C. (2007) PLS classification of functional data, Computational Statistics.
- Ramsay J.O., Silverman B.W. (1997) Functional Data Analysis, Springer.