Title: Introduction to Chemometrics
1Introduction to Chemometrics
- Sergey Kucheryavskiy
- ACABS research group
- Aalborg University, campus Esbjerg
- www.acabs.dk
2What is Chemometrics?
- Wikipedia: Chemometrics is the application of mathematical or statistical methods to chemical data
- International Chemometrics Society: Chemometrics is the science of relating measurements made on a chemical system or process to the state of the system via application of mathematical or statistical methods
- D. L. Massart: Chemometrics is the chemical discipline that uses mathematical, statistical and other methods employing formal logic to design or select optimal measurement procedures and experiments, and to provide maximum relevant chemical information by analyzing chemical data
3What is Chemometrics?
- B.M. Mariyanov, Lectures on Chemometrics, Tomsk State University:
- Quality control in chemical analysis
- Mathematical modeling
- Processing of analyte signals
- Pattern recognition
- Databases. Artificial intelligence
- Application of Information Theory in Chemistry
4What is Chemometrics?
- A.V. Garmash, Introduction to Chemometrics and Chemical Metrology, Moscow State University:
- Mathematical statistics basics
- Normal distribution
- Variance analysis
- Variance analysis in Chemistry
- Correlation analysis
- Classification and identification. Pattern recognition
- Regression and calibration
- Design of experiments
Chemometrics = Statistics?
Chemometrics = Statistics in Analytical Chemistry?
5What is Chemometrics?
- Chemometrics is a data analytical discipline that:
- Deals with multivariate (and multiway) data
- Is based on soft modeling
- Uses projection methods and the concept of latent variables
- Considers data as information + noise
- Considers noise as useless information
6Multivariate data
[Figure: data table with samples as rows and variables as columns]
7Multivariate and multiway data
Many inputs induce an effect
Many effects are derived from one input
etc.
8Many variables and many samples
One measurement = one spectrum (600 points)
9Soft and hard modeling
10Projection methods and latent variables
Data without structure
Data with hidden structure
11Projection methods and latent variables
Formal dimension = number of variables
Effective dimension = number of latent variables that cover all data variance
Formal dimension = 3
12Projection methods and latent variables
Formal dimension = number of variables
Effective dimension = number of latent variables that cover all data variance
Effective dimension = 1
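The difference between formal and effective dimension can be checked numerically. The sketch below is only an illustration on made-up, noise-free data: the latent variable `t`, the vector `direction` and the tolerance `1e-10` are assumptions, not values from the slides. It counts the non-negligible singular values of the mean-centered data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 samples driven by one hidden factor t, measured through 3 variables.
t = rng.normal(size=(50, 1))               # hypothetical latent variable
direction = np.array([[1.0, 2.0, -1.0]])   # how t shows up in each variable
X = t @ direction                          # 50 x 3 data matrix, no noise

Xc = X - X.mean(axis=0)                    # mean-center each variable
s = np.linalg.svd(Xc, compute_uv=False)    # singular values, largest first

formal_dim = X.shape[1]                          # number of variables
effective_dim = int(np.sum(s > 1e-10 * s[0]))    # non-negligible directions

print("formal:", formal_dim, "effective:", effective_dim)
```

With real, noisy data the small singular values are not exactly zero, so the cut-off becomes a modeling decision rather than a fixed tolerance.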
13Projection methods and latent variables
- Projection onto the latent variable subspace:
- reduces the dimension
- makes visual data analysis possible
How to find latent variables?
14Principal component space
- Choose a latent variable (first principal component, PC1) along the direction of maximum variance
- Project all samples onto PC1
- Residual variance is either:
- considered as noise (useless information), or
- modeled by PC2
PC1
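The steps above can be sketched with a singular value decomposition, which is one common way to obtain the maximum-variance direction (the slides do not say which algorithm is used). The two-variable data set is simulated, and all names such as `pc1` and `scores1` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-variable data stretched along the (1, 1) direction.
t = rng.normal(scale=3.0, size=(200, 1))
X = t @ np.array([[1.0, 1.0]]) / np.sqrt(2) + rng.normal(scale=0.3, size=(200, 2))
Xc = X - X.mean(axis=0)                  # mean-center before PCA

# Rows of Vt are unit vectors ordered by decreasing variance along them.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                              # direction of maximum variance

scores1 = Xc @ pc1                       # coordinates of the samples on PC1
residual = Xc - np.outer(scores1, pc1)   # what PC1 leaves unexplained

var_total = np.sum(Xc ** 2)
var_pc1 = np.sum(scores1 ** 2)
var_resid = np.sum(residual ** 2)

print("PC1 direction:", pc1)
print("share of variance on PC1:", var_pc1 / var_total)
```

The residual can then either be treated as noise or passed through the same step again to obtain PC2.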
15Principal component space
16Principal component analysis
$X = TP^{T} + E$ (Data = Model + Noise)
- $X$: raw data
- $T$: scores
- $P$: loadings
- $E$: residuals
- $TP^{T}$: explained variance
- $E$: residual variance
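A minimal sketch of this decomposition, assuming mean-centered data and an SVD-based PCA. The helper name `pca_decompose` and the simulated data are hypothetical. It returns scores, loadings and residuals and reports the explained share of the variance.

```python
import numpy as np

def pca_decompose(X, n_components):
    """Return centered data Xc, scores T, loadings P and residuals E."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores: samples x components
    P = Vt[:n_components].T                      # loadings: variables x components
    E = Xc - T @ P.T                             # residuals: same shape as Xc
    return Xc, T, P, E

rng = np.random.default_rng(2)
# Simulated data: 30 samples, 8 variables, 3 hidden factors plus small noise.
X = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 8))
X = X + 0.05 * rng.normal(size=(30, 8))

Xc, T, P, E = pca_decompose(X, n_components=2)
explained = 1.0 - np.sum(E ** 2) / np.sum(Xc ** 2)

print("T:", T.shape, "P:", P.shape, "E:", E.shape)
print("explained variance fraction:", explained)
```

Adding components moves variance from the residual part to the model part, so the explained fraction never decreases.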
17PCA Scores
18PCA Scores
T
- Row: PC coordinates of one sample
- Column: projection of all samples onto one PC
19PCA Loadings
$P^{T}$
- Row: PC basis vector in variable space
- Column: projection of a variable basis vector onto PC space
20Residual matrix E
- $e_i$: distance from sample $i$ to the PC space
- $e^{2}_{tot}$: residual variance
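As an illustration, the per-sample distances and the total residual can be computed from the rows of E. This sketch uses simulated data and a one-component model. Taking the total as an unnormalised sum of squares is an assumption, since the slide does not give the normalisation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: 40 samples, 5 variables, one dominant direction plus noise.
X = rng.normal(size=(40, 1)) @ rng.normal(size=(1, 5))
X = X + 0.1 * rng.normal(size=(40, 5))
Xc = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:1].T                       # loadings of a one-component model
T = Xc @ P                         # scores
E = Xc - T @ P.T                   # residual matrix

e = np.linalg.norm(E, axis=1)      # e_i: distance of each sample to the PC space
e2_tot = np.sum(e ** 2)            # total residual sum of squares (assumed form)

print("largest distance:", e.max())
print("total residual:", e2_tot)
```

Samples with an unusually large distance are poorly described by the model and are candidates for a closer look.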
21Example 1 Wine analysis
Three types of Italian wine grown in the same region but fermented from three different cultivars (grapes)
178 samples x 13 variables
22Example 1 Wine analysis
Scores and loadings plots: the main PCA tools
Scores
Loadings
23Example 2 AMT analysis of coated pellets
[Figure panels: AMT-spectra; PCA scores]
24PCA conclusions
- Principal Component Analysis
- Works with X data: measurements, observations, etc.
- Chooses latent variables (principal components) along maximum-variance directions
- Scores and loadings plots are the main PCA tools
- Principal components are orthogonal!
25Multivariate calibration
Spectra
Concentrations
26Example 3 polyaromatic hydrocarbons
27Example 3 polyaromatic hydrocarbons
Simulated spectra
28Example 3 polyaromatic hydrocarbons
MLR-regression
29Projection on Latent Structure
- Both X and Y data are modelled jointly
- Two sets of scores (T, U) and loadings (P, Q), plus loading weights (W), are calculated
- Calculations are iterative and aim to maximize Covariance(T, U)
- Prediction: $Y = T_{new} B_{t}$ or $Y = X_{new} B$, with $B = W(P^{T}W)^{-1}Q^{T}$

$X = TP^{T} + E_{x}$, $Y = UQ^{T} + E_{y}$
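The iteration can be sketched with a NIPALS-style loop. This is a minimal illustration, not necessarily the variant used for the examples in this presentation. The function name `pls_nipals`, the tolerance and the simulated data are all assumptions. For simplicity the Y-loadings Q are obtained by regressing Y on the X-scores, so the inner relation between U and T is folded into Q.

```python
import numpy as np

def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """Return B such that centered Y is approximated by centered X @ B."""
    X = X - X.mean(axis=0)                        # work on centered copies
    Y = Y - Y.mean(axis=0)
    n_vars, n_resp = X.shape[1], Y.shape[1]
    W = np.zeros((n_vars, n_components))          # loading weights
    P = np.zeros((n_vars, n_components))          # X-loadings
    Q = np.zeros((n_resp, n_components))          # Y-loadings
    for a in range(n_components):
        u = Y[:, [int(np.argmax(Y.var(axis=0)))]]  # start from one Y column
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)                # unit-length weight vector
            t = X @ w                             # X-scores
            q = Y.T @ t / (t.T @ t)               # Y-loadings (Y regressed on t)
            u_new = Y @ q / (q.T @ q)             # Y-scores
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)                   # X-loadings
        X = X - t @ p.T                           # deflate X
        Y = Y - t @ q.T                           # deflate Y
        W[:, [a]], P[:, [a]], Q[:, [a]] = w, p, q
    return W @ np.linalg.inv(P.T @ W) @ Q.T       # B = W (P^T W)^-1 Q^T

# Simulated calibration set: 20 collinear variables driven by 3 latent factors.
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 20))
Y = X @ rng.normal(size=(20, 2))                  # two noise-free responses

B = pls_nipals(X, Y, n_components=3)
Y_hat = (X - X.mean(axis=0)) @ B + Y.mean(axis=0)
print("max prediction error:", np.abs(Y_hat - Y).max())
```

Because the simulated X has only three underlying factors and Y is noise-free, three components reproduce Y almost exactly, even though X has 20 strongly collinear variables.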
30Example 3 polyaromatic hydrocarbons
PLS-regression (panels: C1, C2, C3)
31Example 3 polyaromatic hydrocarbons
PLS and MLR (panels: C1, C3)
32Example 4 estimation of octane number
K. Esbensen. Multivariate Data Analysis in
Practice. Camo, 2002
33Example 4 estimation of octane number
34What can Chemometrics do?
Tasks and methods:
- Dimension reduction, analysis of data structure: PCA
- Discrimination and classification: PCA, SIMCA
- Multivariate calibration, prediction: PCR, PLS
and much more!
35What can Chemometrics do?
Multivariate Curve Resolution
Multivariate Image Analysis
MSPC/PAT
Nonlinear regression
Multiway analysis
Design of Experiments
Chemometrics
Linear algebra
Statistics
Sampling
Matlab
Excel
Instruments