Title: Columbia University
Columbia University
Advanced Machine Learning and Perception, Fall 2006 Term Project
Nonlinear Dimensionality Reduction and K-Nearest Neighbor Classification Applied to Global Climate Data
Carlos Henrique Ribeiro Lima, New York, Dec/2006
Outline
- Goals
- Motivation and Dataset
- Methodology
- Results
- Low-Dimensional Manifold
- KNN on Low-Dimensional Manifold
- Conclusion
1. Goals
- Use of kernel PCA based on Semidefinite Embedding to identify the low-dimensional, nonlinear manifold of climate data sets → identification of the main modes of spatial variability
- Classification on the feature space → predictions on the original space (KNN method)
2. Motivation
Dataset of Monthly Sea Surface Temperature (SST)
Huge economic and social impacts of extreme El Niño events (e.g. 1997) → need for forecasting models!
2. Dataset
- Monthly Sea Surface Temperature (SST) Data
- from Jan/1856 to Dec/2005
- Latitudinal band 25°S-25°N
- Grid with 599 cells
- Training data: Jan/1856 to Dec/1975 (120 years)
- Testing set: Jan/1976 to Dec/2005 (30 years)
- Input matrix: n = 1440 points, m = 599 dimensions
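The sizes quoted above follow directly from the date ranges. A minimal sketch of the split, assuming the record is stored as one row per month and one column per grid cell; the random values are placeholders for the real SST data, and `month_index` is a hypothetical helper:

```python
import numpy as np

N_CELLS = 599  # grid cells in the 25°S-25°N band (from the slide)

def month_index(year, month, start_year=1856):
    """Row index of a calendar month, counting from January of start_year."""
    return (year - start_year) * 12 + (month - 1)

# Placeholder values standing in for the real SST record (assumption).
n_months = month_index(2005, 12) + 1        # Jan/1856 .. Dec/2005
rng = np.random.default_rng(0)
sst = rng.normal(size=(n_months, N_CELLS))

split = month_index(1976, 1)                # first month of the testing set
X_train, X_test = sst[:split], sst[split:]
print(X_train.shape, X_test.shape)
```

The training block has 120 years x 12 months = 1440 rows, which matches the n given for the input matrix.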
3. Methodology
1) Semidefinite Embedding (code from K. Q. Weinberger)
- Positive semidefiniteness
- Inner products centered on the origin
- Isometry: local distances of the input space are preserved on the feature space
2) KNN → Euclidean distance
3) Probabilistic forecasting → skill score (RPS)
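The three Semidefinite Embedding properties listed in step 1 can be written as constraints on the kernel (Gram) matrix K. The block below is a sketch of the formulation as it is usually published for Semidefinite Embedding; the trace objective and the symbols K, x_i are taken from that standard formulation rather than from the slide, so treat them as assumptions about what the referenced code solves.

```latex
\begin{aligned}
\max_{K}\quad & \operatorname{tr}(K) \\
\text{s.t.}\quad
  & K \succeq 0
    && \text{(positive semidefiniteness)} \\
  & \sum_{i,j} K_{ij} = 0
    && \text{(centering on the origin)} \\
  & K_{ii} - 2K_{ij} + K_{jj} = \lVert x_i - x_j \rVert^2
    && \text{(isometry, for neighboring pairs } i, j\text{)}
\end{aligned}
```

Here each x_i would be one monthly SST field (a row of the input matrix), and the neighboring pairs come from a local nearest-neighbor graph on the inputs.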
4. Results: Low-Dimensional Manifold
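For reference, low-dimensional coordinates are usually read off a learned kernel matrix by an eigendecomposition (the kernel PCA step). The sketch below assumes a centered, positive semidefinite Gram matrix `K` is already available; `embed_from_kernel` is a hypothetical helper, and the toy linear Gram matrix is a placeholder rather than the output of Semidefinite Embedding.

```python
import numpy as np

def embed_from_kernel(K, n_components=3):
    """Return coordinates and explained-variance fraction from a Gram matrix."""
    eigvals, eigvecs = np.linalg.eigh(K)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)   # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    # Coordinates: leading eigenvectors scaled by the square roots of their eigenvalues.
    Y = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return Y, explained

# Toy check: linear Gram matrix of centered 3-D points (placeholder data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X -= X.mean(axis=0)
Y, explained = embed_from_kernel(X @ X.T, n_components=3)
print(Y.shape, round(explained, 6))
```

The `explained` ratio is the share of the total eigenvalue mass captured by the retained components, which is one common way to quantify how much variance a few dimensions explain.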
4. Results: Labeling on the feature space
4. Results: Forecasts on the Testing Set, KNN method and skill score
E.g. March 1997:
1) Want to predict the class of NINO3 in Dec/1997 → lead time of 9 months.
2) KNN on the feature space (March, 1856 to 1975).
3) Take the classes and weights of the k neighbors.
4) Skill score.
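The forecasting steps above can be sketched as follows. The slide does not say how the neighbor weights are defined or how many classes are used, so the inverse-distance weights, the three ordered classes, and the unnormalized form of the ranked probability score (RPS) are all assumptions made for illustration; `knn_class_probs` and `rps` are hypothetical helper names.

```python
import numpy as np

def knn_class_probs(Y_train, labels, y_query, k, n_classes):
    """Weighted class probabilities from the k nearest training points."""
    d = np.linalg.norm(Y_train - y_query, axis=1)   # Euclidean distances
    nearest = np.argsort(d)[:k]                     # indices of the k neighbors
    w = 1.0 / (d[nearest] + 1e-12)                  # inverse-distance weights (assumption)
    probs = np.bincount(labels[nearest], weights=w, minlength=n_classes)
    return probs / probs.sum()

def rps(probs, observed_class):
    """Ranked probability score: squared gap between cumulative distributions."""
    obs = np.zeros_like(probs)
    obs[observed_class] = 1.0
    return float(np.sum((np.cumsum(probs) - np.cumsum(obs)) ** 2))

# Toy feature-space points; each label is the class observed 9 months later (placeholders).
Y_train = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [5, 5, 5]], dtype=float)
labels = np.array([0, 0, 1, 2])                     # ordered classes 0 < 1 < 2
probs = knn_class_probs(Y_train, labels, np.array([0.1, 0.0, 0.0]), k=3, n_classes=3)
print(probs, rps(probs, observed_class=0))
```

A lower RPS is better, with 0 for a forecast that puts all probability on the observed class. A skill score would then compare this value against a reference forecast such as climatology.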
4. Results: Forecasts on the Testing Set, KNN method and skill score: El Niño of 1982 and 1997
5. Conclusions
- Semidefinite Embedding performs well on the SST data (high dimensional → just 3 dimensions, 90% of explained variance)
- KNN method provides very good classification and forecasts
- Need to check sensitivity to changes in some parameters (local neighbors, KNN)
- Plan to extend to other climate datasets
- Try other metrics, multivariate data, etc.