Title: 2. The PARAFAC model
- Quimiometria Teórica e Aplicada
- Instituto de Química - UNICAMP
2 Example fluorescence data (1)
- Each fluorescence spectrum is a matrix of emission vs. excitation wavelengths, $\mathbf{X}_i$ (201 × 61).
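3 Example fluorescence data (2)
- Each spectrum is a linear sum of three components: tryptophan, phenylalanine and tyrosine.
$$\mathbf{X}_i = a_{i1}\mathbf{b}_1\mathbf{c}_1^T + a_{i2}\mathbf{b}_2\mathbf{c}_2^T + a_{i3}\mathbf{b}_3\mathbf{c}_3^T + \mathbf{E}_i$$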
4 Example fluorescence data (3)
- Five samples were measured and stacked to give a three-way array X (5 × 201 × 61).
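The stacking described above can be sketched in NumPy. This is a minimal illustration, assuming random non-negative numbers as stand-ins for the real concentrations and spectra (the variable names and the random data are hypothetical, not taken from the measured data set), and it omits the noise term.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 5, 201, 61, 3  # samples, emission, excitation, components

# Hypothetical loadings standing in for concentrations (A),
# emission spectra (B) and excitation spectra (C).
A = rng.random((I, R))
B = rng.random((J, R))
C = rng.random((K, R))

def sample_matrix(i):
    """Noise-free X_i = sum over r of a_ir * b_r c_r^T (201 x 61)."""
    return sum(A[i, r] * np.outer(B[:, r], C[:, r]) for r in range(R))

# Stack the five 201 x 61 matrices along a new first axis.
X = np.stack([sample_matrix(i) for i in range(I)])

# The same array in one step: X[i, j, k] = sum_r A[i, r] B[j, r] C[k, r]
X_einsum = np.einsum("ir,jr,kr->ijk", A, B, C)
```

Both constructions give the same 5 × 201 × 61 array; the `einsum` form makes the trilinear structure explicit.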
5 Example fluorescence data (4)
- If we are given a set of fluorescence spectra, X, how can we determine:
  - How many chemical species are present?
  - Which chemical species are present? What are their pure excitation and emission spectra? I.e. self-modelling curve resolution (SMCR).
  - What is the concentration of each species in each sample? I.e. (second-order) calibration.
- Answer: use the PARAFAC model!
6 The PARAFAC model (1)
7 The PARAFAC model (2)
[Figure: three-way array X with modes I, J and K]
- Loadings
  - A (I × R) describes variation in the first mode.
  - B (J × R) describes variation in the second mode.
  - C (K × R) describes variation in the third mode.
- Residuals
  - E (I × J × K) are the model residuals.
8 Example fluorescence data (5)
- Loadings
  - A (5 × 3) describes the component concentrations.
  - B (201 × 3) describes the pure component emission spectra.
  - C (61 × 3) describes the pure component excitation spectra.
- Residuals
  - E (5 × 201 × 61) describes instrument noise.
9 Example fluorescence data (6)
- A 3-component PARAFAC model describes 99.94% of X.
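A percentage of this kind is commonly computed from sums of squares. The sketch below assumes that definition (residual sum of squares relative to the total sum of squares of the uncentred data); the function name is hypothetical, and the slide does not state how its figure was obtained.

```python
import numpy as np

def explained_variation(X, X_hat):
    """Percentage of the sum of squares of X reproduced by the model X_hat.

    Assumes the uncentred definition 100 * (1 - ||X - X_hat||^2 / ||X||^2).
    """
    E = X - X_hat  # model residuals
    return 100.0 * (1.0 - np.sum(E ** 2) / np.sum(X ** 2))
```

A perfect model gives 100, and a model of all zeros gives 0.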
10 Example fluorescence data (7)
- The A-loadings describe the relative amounts of
species 1 (tryptophan), 2 (tyrosine) and 3
(phenylalanine) in each sample
Concentrations (ppm)
2.6685
0.0141
0.0471
1.5455
- In order to know the absolute amounts, it is
necessary to use a standard of known
concentrations, i.e. sample 5.
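The scaling step can be sketched as follows, assuming the A-loading of a species is proportional to its concentration with no offset. The loading values and the 2.0 ppm standard below are hypothetical illustrations, not values from the data set.

```python
import numpy as np

def absolute_concentrations(a, standard_index, standard_conc):
    """Scale relative A-loadings of one species to absolute concentrations.

    Assumes loading = constant * concentration, so one standard of known
    concentration fixes the constant.
    """
    a = np.asarray(a, dtype=float)
    return a * (standard_conc / a[standard_index])

# Hypothetical loadings of one species in five samples;
# the fifth sample (index 4) is the standard at a hypothetical 2.0 ppm.
a_species = [0.2, 0.4, 0.1, 0.3, 0.5]
conc = absolute_concentrations(a_species, standard_index=4, standard_conc=2.0)
```

Each species would be scaled separately with its own known concentration in the standard.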
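11 The PARAFAC formula
$$\mathbf{X}_{I \times JK} = \mathbf{A}(\mathbf{C} \odot \mathbf{B})^T + \mathbf{E}_{I \times JK}$$
where $\odot$ is the Khatri-Rao (column-wise Kronecker) product.
- Data array
  - X (I × J × K) is matricized into $\mathbf{X}_{I \times JK}$ (I × JK)
- Loadings
  - A (I × R) describes variation in the first mode
  - B (J × R) describes variation in the second mode
  - C (K × R) describes variation in the third mode
- Residuals
  - E (I × J × K) is matricized into $\mathbf{E}_{I \times JK}$ (I × JK)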
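The matricized formula can be checked numerically. The sketch below assumes a column ordering in which the second-mode index j runs fastest within each third-mode index k, which is the ordering that matches the Khatri-Rao product of C and B; the helper names `khatri_rao` and `unfold_first_mode` are hypothetical.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r equals kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J, _ = B.shape
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

def unfold_first_mode(X):
    """Matricize X (I x J x K) into I x JK, with column index k * J + j."""
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I, K * J)

rng = np.random.default_rng(0)
I, J, K, R = 5, 7, 4, 3
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

# Noise-free trilinear array: X[i, j, k] = sum_r A[i, r] B[j, r] C[k, r]
X = np.einsum("ir,jr,kr->ijk", A, B, C)

X_unfolded = unfold_first_mode(X)
X_model = A @ khatri_rao(C, B).T
```

With noise-free data the two matrices agree to rounding error, so the residual matrix is zero.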
12 PCA vs PARAFAC
PCA
- Components are calculated sequentially, in order of importance.
- Loadings are orthogonal, i.e. $\mathbf{B}^T\mathbf{B} = \mathbf{I}$.
- Solution has rotational freedom.
PARAFAC
- Components are calculated simultaneously, in random order.
- Loadings are not (usually) orthogonal.
- Solution is unique (i.e. it is not possible to rotate factors without losing fit).
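13 Rotational freedom
- The bilinear model $\mathbf{X} = \mathbf{A}\mathbf{B}^T + \mathbf{E}$ contains rotational freedom. There are many sets of loadings (and scores) which give exactly the same residuals, E:
$$\mathbf{X} = \mathbf{A}\mathbf{B}^T + \mathbf{E}$$
$$= \mathbf{A}\mathbf{R}\mathbf{R}^{-1}\mathbf{B}^T + \mathbf{E}$$
$$= \tilde{\mathbf{A}}\tilde{\mathbf{B}}^T + \mathbf{E} \quad (\tilde{\mathbf{A}} = \mathbf{A}\mathbf{R},\ \tilde{\mathbf{B}}^T = \mathbf{R}^{-1}\mathbf{B}^T)$$
- This model is not unique: there are many different sets of loadings which give the same fit.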
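A small numerical check of this rotational freedom, using random matrices as hypothetical stand-ins for scores and loadings. The matrix R is built to be diagonally dominant only so that it is guaranteed to be invertible; any invertible R would do.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((5, 3))    # scores
B = rng.random((201, 3))  # loadings

# An arbitrary invertible 3 x 3 matrix (strictly diagonally dominant).
R = rng.random((3, 3)) + 3.0 * np.eye(3)

A_rot = A @ R                       # rotated scores, A R
B_rot_T = np.linalg.inv(R) @ B.T    # rotated loadings, R^-1 B^T

X_original = A @ B.T
X_rotated = A_rot @ B_rot_T
```

`X_original` and `X_rotated` are equal to rounding error although `A_rot` differs from `A`, so the data alone cannot distinguish the two solutions.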
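14 PARAFAC solution is unique
- The trilinear model $\mathbf{X} = \mathbf{A}(\mathbf{C} \odot \mathbf{B})^T + \mathbf{E}$ is said to be unique, because it is not possible to rotate the loadings without changing the residuals, E:
$$\mathbf{X} = \mathbf{A}(\mathbf{C} \odot \mathbf{B})^T + \mathbf{E}$$
$$= \mathbf{A}\mathbf{R}\mathbf{R}^{-1}(\mathbf{C} \odot \mathbf{B})^T + \mathbf{E}$$
$$\neq \tilde{\mathbf{A}}(\tilde{\mathbf{C}} \odot \tilde{\mathbf{B}})^T + \mathbf{E} \quad (\tilde{\mathbf{A}} = \mathbf{A}\mathbf{R})$$
- In general, $\mathbf{R}^{-1}(\mathbf{C} \odot \mathbf{B})^T$ cannot be written as $(\tilde{\mathbf{C}} \odot \tilde{\mathbf{B}})^T$, so the rotated loadings no longer form a PARAFAC model.
- This is why PARAFAC is able to find the correct fluorescence profiles: the unique solution is close to the true solution.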
15 Spot the difference!
[Figure: PCA loadings compared with PARAFAC loadings]
16 Alternating least squares (ALS)
- How to estimate the PCA model $\mathbf{X} = \mathbf{A}\mathbf{B}^T + \mathbf{E}$?
- Step 3: Check for convergence. If not converged, go to Step 1.
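A minimal sketch of ALS for the bilinear model, assuming the usual scheme in which Steps 1 and 2 are least-squares updates of A and B in turn (an assumption; only the convergence check is stated above). The function name, the random initialization and the stopping rule are illustrative choices. Note that this plain ALS fits the bilinear model without imposing the orthogonality of PCA loadings.

```python
import numpy as np

def bilinear_als(X, R, n_iter=500, tol=1e-12, seed=0):
    """Fit X ~ A B^T by alternating least squares (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((X.shape[1], R))  # Step 0: initialize B
    ssq_total = np.sum(X ** 2)
    ssq_old = ssq_total
    for _ in range(n_iter):
        # Step 1: least-squares estimate of A given B
        A = X @ np.linalg.pinv(B.T)
        # Step 2: least-squares estimate of B given A
        B = X.T @ np.linalg.pinv(A.T)
        # Step 3: stop when the fit no longer improves noticeably
        ssq = np.sum((X - A @ B.T) ** 2)
        if ssq_old - ssq <= tol * ssq_total:
            break
        ssq_old = ssq
    return A, B

# Usage on hypothetical noise-free rank-3 data.
rng = np.random.default_rng(42)
X_demo = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 40))
A_hat, B_hat = bilinear_als(X_demo, R=3)
```

Because of rotational freedom, `A_hat` and `B_hat` reproduce `X_demo` but need not equal the matrices used to generate it.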
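17 Three different unfoldings: the formula is symmetric
$$\mathbf{X}_{I \times JK} = \mathbf{A}(\mathbf{C} \odot \mathbf{B})^T + \mathbf{E}_{I \times JK}$$
or
$$\mathbf{X}_{J \times KI} = \mathbf{B}(\mathbf{A} \odot \mathbf{C})^T + \mathbf{E}_{J \times KI}$$
or
$$\mathbf{X}_{K \times IJ} = \mathbf{C}(\mathbf{B} \odot \mathbf{A})^T + \mathbf{E}_{K \times IJ}$$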
18 How is the PARAFAC model calculated?
- How to estimate the model $\mathbf{X} = \mathbf{A}(\mathbf{C} \odot \mathbf{B})^T + \mathbf{E}$?
- Step 4: Check for convergence. If not converged, go to Step 1.
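A minimal sketch of PARAFAC by alternating least squares, assuming Steps 1 to 3 update A, B and C in turn, each from its own unfolding of X (an assumption; only the convergence check is stated above). The function names, the random initialization and the stopping rule are illustrative, and no constraints or line search are included.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: column r equals kron(P[:, r], Q[:, r])."""
    return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])

def parafac_als(X, R, n_iter=5000, tol=1e-12, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] by ALS (illustrative sketch)."""
    I, J, K = X.shape
    # The three unfoldings, with columns ordered to match the Khatri-Rao products.
    X_I = X.transpose(0, 2, 1).reshape(I, K * J)  # = A (C kr B)^T
    X_J = X.transpose(1, 0, 2).reshape(J, I * K)  # = B (A kr C)^T
    X_K = X.transpose(2, 1, 0).reshape(K, J * I)  # = C (B kr A)^T

    rng = np.random.default_rng(seed)
    B = rng.standard_normal((J, R))  # Step 0: initialize B and C
    C = rng.standard_normal((K, R))
    ssq_total = np.sum(X ** 2)
    ssq_old = ssq_total
    for _ in range(n_iter):
        A = X_I @ np.linalg.pinv(khatri_rao(C, B).T)  # Step 1: update A
        B = X_J @ np.linalg.pinv(khatri_rao(A, C).T)  # Step 2: update B
        C = X_K @ np.linalg.pinv(khatri_rao(B, A).T)  # Step 3: update C
        # Step 4: stop when the fit no longer improves noticeably
        ssq = np.sum((X_I - A @ khatri_rao(C, B).T) ** 2)
        if ssq_old - ssq <= tol * ssq_total:
            break
        ssq_old = ssq
    return A, B, C

# Usage on hypothetical noise-free three-component data.
rng = np.random.default_rng(7)
A0 = rng.standard_normal((5, 3))
B0 = rng.standard_normal((20, 3))
C0 = rng.standard_normal((10, 3))
X_demo = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
A_hat, B_hat, C_hat = parafac_als(X_demo, R=3)
X_fit = np.einsum("ir,jr,kr->ijk", A_hat, B_hat, C_hat)
```

The recovered loadings can differ from `A0`, `B0` and `C0` in column order and scaling, which is consistent with components being calculated in random order.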
19 Good initialization is sometimes important
[Figure: response surface]
- Initialization methods
  - random numbers (do this ten times and compare models)
  - use another method to give a rough estimate (e.g. DTLD, MCR)
  - use sensible guesses (e.g. elution profiles are Gaussian)
20 Conclusions (1)
- The PARAFAC model decomposes a three-way array into three sets of loadings, one for each mode. Each set of loadings describes the variation in that mode, e.g. differences in concentration, changes in time, spectral profiles etc.
- PARAFAC components are calculated together and have no particular order. PARAFAC components are not orthogonal and cannot be rotated.
- PARAFAC can be used for curve resolution and for calibration.
21 Conclusions (2)
- Some data sets have a chemical structure which is particularly suitable for the PARAFAC model, e.g. fluorescence spectroscopy.
- The PARAFAC model can also be used for four-way, five-way, N-way etc. data by simply using more sets of loadings.
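As an illustration of the last point, a four-way model adds one more loading matrix to the same sum of products. The sketch below uses random numbers and hypothetical dimensions; D is the extra set of loadings for the fourth mode.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 2  # number of components

# One loading matrix per mode; the sizes are hypothetical.
A, B, C, D = (rng.random((n, R)) for n in (4, 5, 6, 7))

# Noise-free four-way array: X4[i,j,k,l] = sum_r A[i,r] B[j,r] C[k,r] D[l,r]
X4 = np.einsum("ir,jr,kr,lr->ijkl", A, B, C, D)
```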