Multivariate Resolution in Chemistry

1
Multivariate Resolution in Chemistry

Lecture 1

Roma Tauler, IIQAB-CSIC, Spain
E-mail: rtaqam_at_iiqab.csic.es
2
Lecture 1

Introduction to data structures and
soft-modelling methods.
Factor Analysis of two-way data Bilinear models.
Rotation and intensity ambiguities.
Pseudo-rank, local rank and rank deficiency.
Evolving Factor Analysis.

3
Chemical sensors and analytical data structures
- one variable: x1, e.g. pH
- two variables: x1, x2, e.g. pH and T
- three variables: x1, x2 and x3, e.g. pH, T and P
- n variables: ?
4
Data Structures: Zero order / Zero-way data
x: one sample gives one scalar (tensor of order 0)
Examples
- selective electrodes, pH
- absorption at one wavelength λ
- height/area of a chromatographic peak
Assumptions
- total selectivity
- known linear response
Tools
- univariate algebra and statistics
Advantages
- simple and easy to understand
Disadvantages
- information on only one compound
- total selectivity is required
- one sensor for every analyte
- low information content
[Figure: signal versus time; response h_i versus concentration c_i]
5
Data Structures: First order / One-way data
x1, x2, ..., xn: one sample gives one vector (tensor of order 1)
Examples
- array of sensors
- absorption at many wavelengths λ (spectra)
- chromatograms at a single wavelength λ
- current intensities at many potentials E
- readings with time (kinetics)
- ...
Assumptions
- known linear responses
- different and independent responses
Tools
- linear algebra
- multivariate statistics
- spectral analysis
- chemometrics (PCA, MLR, PCR, PLS, ...)
Advantages
- calibration in the presence of interferences is possible
- multicomponent analysis is possible
Disadvantages
- interferences must be present in the calibration samples
[Figure: chromatogram versus time]
6
Data Structures: Second order / Two-way data
x_ij: each sample gives a data table/matrix (tensor of order 2), $X = \sum_k x_k y_k^T$
Examples
- LC-DAD, LC-FTIR, GC-MS, LC-MS, FIA-DAD, CE-MS, ... (hyphenated techniques)
- excitation/emission spectroscopy (fluorescence)
- MS/MS, 2D NMR, GCxGC-MS, ...
- spectroscopic/voltammetric monitoring of chemical
reactions/processes with pH, time, T, etc.
Assumptions
- linear responses
- sufficient rank (of the data matrices)
Tools
- linear algebra
- chemometrics
Advantages
- calibration for the analyte in the presence of
interferences not modelled in the calibration samples is possible
- full characterization of the analyte and interferents may be possible
- few calibration samples are needed (only one-sample calibration)
7
Data Structures: Third order / Three-way data
x_ijk: each sample gives a data cube (tensor of order 3), $X = \sum_k x_k \otimes y_k \otimes z_k$
Examples
- several spectroscopic matrices
- several hyphenated chromatographic matrices
- hyphenated multidimensional chromatography (GC x GC / MS)
- excitation/emission/time
- ...
Assumptions
- bilinear/trilinear responses
- sufficient rank (of the data matrices)
Tools
- multilinear and tensor algebra
- chemometrics
Advantages
- unique solutions (no ambiguities)
- calibration for the analyte in the presence of
interferences not modelled in the calibration samples is possible
- full characterization of the analyte and interferents is possible
- few calibration samples are needed (only one-sample calibration)
Multi-way data analysis (PARAFAC, GRAM)
Extended multivariate resolution

8
0th order data: ISE, pH, ...
1st order data: spectra
2nd order data: LC/DAD, GC/MS, fluorescence
3rd order data: time/excitation/emission
9
Examples: Monitoring chemical reactions.
Chemical reaction systems monitored
using spectroscopic measurements (even at
femtosecond scale) to follow the evolution of a
reaction with time, pH, temperature, etc., and
the detection of the formation and disappearance
of intermediate and transient species.
[Figure: experimental set-up with peristaltic pump (0.050 ml), spectrophotometer and thermostatic bath (T = 37 °C); data matrix D(NR,NC) with pH and wavelength axes]
10
Examples: Process analysis.
Quality control and optimisation of
industrial batch reactions and processes, where
on-line measurements are applied to monitor the
process.
[Figure: probe; data matrix D(NR,NC) with time and wavelength axes]
11
Examples: Chromatographic hyphenated techniques (LC-DAD, GC-MS, LC-MS, LC-MS/MS, ...).
Analytical characterisation of complex
environmental, industrial and food mixtures using
hyphenated techniques (chromatography, continuous flow
methods with spectroscopic detection).
[Figure: data matrix D(NR,NC) with time and wavelength axes]
12
Examples: FIA-DAD-UV with pH gradient for the
analysis of a mixture of drugs.
[Figure: data matrix D(NR,NC) with pH and wavelength axes]
13
Examples: Excitation-emission (fluorescence) EEM techniques.
Analytical characterisation of complex
sea-water samples by means of excitation-emission
spectra for an unknown with triphenyltin (in the
reaction with flavonol).
[Figure: data matrix D(NR,NC) with excitation and emission axes]
14
Examples: Conformation changes.
Protein folding and dynamic
protein-nucleic acid interaction processes. In
the post-genomic era, understanding these
complex evolving biochemical processes is one of
the main challenges of current proteomics
research.
Primary structure: amino acids (Val Leu Ser Ala Asp Ala Trp Gly Val His)
Secondary structure: helix and sheet formation (α-helix, β-sheet, turn, random coil)
Tertiary structure: globule formation
Quaternary structure: assembled subunits
[Figure: data matrix D(NR,NC) with temperature and wavelength axes]
15
Examples: Spectroscopic image analysis.
Image analysis of spatially distributed
chemicals on 2D surfaces measured using coupled
microscopy-spectroscopy techniques in geological
samples, biological tissues or food samples.
16
Data Structures in Chemistry
Experimental data: two orders/ways/modes of measurement, D(NR,NC)
Row order (way, mode): usually a change in
chemical composition (concentration order)
Column order (way, mode): usually a change in
system properties, as in spectroscopy,
voltammetry, ... (spectral order)
17
Chemical data tables (two-way data)
Data table or matrix D: I rows by J columns
Rows: I spectra measured as the concentration changes (time, temperature, pH, ...)
Columns: J variables (wavelengths) of the instrumental measurements (spectra, voltammograms, ...)
Plot of spectra (rows)
Plot of elution profiles (columns)
18
Chemical data modelling

Chemical data modelling methods may be divided
into:
Hard-modelling methods (deterministic)
Soft-modelling methods (data driven)
Hybrid hard-soft modelling methods

Hard modelling: Data → Physical hard model → Analytical information
Soft modelling: Data → Data-driven soft model → Physical model (?) → Analytical information
19
Hard-modelling

Hard-modelling approaches for chemical
(stationary, dynamic, evolving) systems are
based on an accurate physical description of the
system and on the solution of complex systems of
(differential) equations that are fitted to the experimental
measurements describing the evolution and
dynamics of these systems. They are deterministic
models.
Hard-modelling methods usually use non-linear
least squares regression (Marquardt algorithm)
and optimisation methods to find the best
values for the parameters of the model.
Hard-modelling usually deals with univariate data.
It was widely used until the
advent of modern instrumentation and computers,
which produce large amounts of data.
Hard-modelling is often successful for laboratory
experiments, where all the variables are under
control and the physicochemical nature of the
dynamic system is known and can be fully described
using a known mathematical model.

20
Hard-modelling

However, and even at a laboratory level, there
are examples where hard-modelling requirements
and constraints are not totally fulfilled or no
physicochemical model is known to describe the
process (e.g. in chromatographic separations or
in protein folding experiments).
Data sets obtained from the study of natural and
industrial evolving processes are too complex and
difficult to analyse using hard-modelling
methods. In these cases, there is no known
physical model available or it is too complex to
be set in a general way.
Advanced hard-modelling in industrial
applications has been attempted to model
experimental difficulties, such as changes in
temperature, pH, ionic strength and activity
coefficients. This is a very difficult task!

Reference: P. Gans, Data Fitting in the Chemical Sciences,
John Wiley and Sons, New York, 1992
21
Hard modelling

Output C, S and model parameters.
The model should describe all the variation in
the experimental measurements.

22
Soft-modelling

Soft-modelling, in contrast, attempts to describe
these systems without postulating an a priori
physical or (bio)chemical model. The
goal of soft-modelling methods is to explain
the variations observed in the systems using
minimal and softer assumptions about the data. They
are data driven models.
Soft models usually give an improved analytical
description of the analysed process.
Soft modelling needs more data than
hard-modelling. Soft modelling methods deal with
multivariate data. Their use has increased in
recent years because of the advent of modern
analytical instrumentation and computers
providing large amounts of data.
The disadvantage of soft models is their poorer
extrapolating capabilities (compared with
hard-modelling).

23
Soft-modelling

A soft model is hardly able to predict the
behaviour of the system under conditions very
different from those under which it was derived.
Complex multivariate soft-modelling data analysis
methods, such as those derived from Factor Analysis,
have been introduced for the study of
chemical processes/systems.
Factor Analysis is a multivariate technique for
reducing matrices of data to their lowest
dimensionality by the use of orthogonal factor
space and transformations that yield predictions
and/or recognizable factors.

Reference: E.R. Malinowski, Factor Analysis in Chemistry, 3rd Edition,
Wiley, New York, 2002
24
Soft modelling
$D = C S^T$
Constrained ALS optimisation:
LS(D, C) → $S^T$
LS(D, $S^T$) → C
$\min \lVert D - C S^T \rVert$

Output: C and S.
All absorbing contributions in and out of
the process are modelled.
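
The alternating least-squares loop named on this slide can be sketched in a few lines of NumPy. This is only a minimal illustration, not the published MCR-ALS implementation: the simulated Gaussian profiles, the initial estimate `C0`, the fixed iteration count and the use of simple clipping as the non-negativity constraint are all assumptions made for the example.

```python
import numpy as np

def gauss(x, centre, width):
    """Gaussian profile used to simulate peaks (illustrative only)."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

def mcr_als(D, C0, n_iter=100):
    """Alternate the two least-squares steps, clipping negatives to zero."""
    C = C0.copy()
    for _ in range(n_iter):
        # LS(D, C) -> S^T, followed by a crude non-negativity constraint
        ST = np.clip(np.linalg.lstsq(C, D, rcond=None)[0], 0.0, None)
        # LS(D, S^T) -> C, followed by the same constraint
        C = np.clip(np.linalg.lstsq(ST.T, D.T, rcond=None)[0].T, 0.0, None)
    return C, ST

# Simulated two-component data set (hypothetical profiles and spectra)
t = np.linspace(0, 1, 50)          # process/elution axis
w = np.linspace(0, 1, 40)          # spectral axis
C_true = np.column_stack([gauss(t, 0.40, 0.08), gauss(t, 0.60, 0.08)])
S_true = np.column_stack([gauss(w, 0.30, 0.15), gauss(w, 0.70, 0.15)])
D = C_true @ S_true.T

# Rough initial estimate of the concentration profiles (an assumption)
C0 = np.column_stack([gauss(t, 0.38, 0.09), gauss(t, 0.62, 0.09)])
C, ST = mcr_als(D, C0)

# Relative lack of fit of the bilinear model D ~ C S^T
lack_of_fit = np.linalg.norm(D - C @ ST) / np.linalg.norm(D)
```

Clipping is the simplest possible constraint; practical implementations typically use proper non-negative least squares and a convergence test on the lack of fit.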

25
Lecture 1

Introduction to data structures and
soft-modelling methods.
Factor Analysis of two-way data Bilinear models.
Rotation and intensity ambiguities.
Pseudo-rank, local rank and rank deficiency.
Evolving Factor Analysis.

26
Soft-modelling
Factor Analysis (Bilinear Model)
Experimental data are modelled as a linear sum of
weighted (scores) factors (loadings).
In matrix form:
data = scores × loadings
27
Soft-modelling
BILINEARITY
Assumption Bilinearity (the contributions of the
components in the two orders of measurement are
additive)
28
Soft-modelling
GOALS OF BILINEAR MODEL
Recovery of the responses of every component
(chemical species) in the different modes of
measurement
29
Soft-modelling Factor Analysis
Real Factor Models
Predictions
30
Soft-modelling Factor Analysis (traditional
approach)
[Flow diagram. Boxes: data matrix, covariance matrix, abstract factors, new abstract factors, real factors. Steps: matrix multiplication, decomposition, combination, abstract reproduction, abstract rotation, target transformation.]
31
Soft-modelling methods (I)

Factor Analysis methods based on the use of
latent variables or eigenvalue/singular value
data matrix decompositions. Examples
PCA, SVD, rotation FA methods
Evolving Factor Analysis methods
Rank Annihilation methods
Window Factor Analysis methods
Heuristic Evolving Latent Projections methods
Subwindow Factor Analysis methods
..

32
Soft-modelling methods (II)

Multivariate Resolution methods do a data matrix
decomposition into their pure components
without using explicitly latent variables
analysis techniques. Examples
SIMPLISMA
Orthogonal Projection Approach (OPA),
Positive Matrix Factorization methods (and
Multilinear Engine extensions)
Multivariate Curve Resolution-Alternating Least
Squares (MCR-ALS)
Gentle
.....

33
Soft-modelling methods (III)

Three-way and Multiway methods
which decompose three-way or multiway data
structures. Examples
Multiway and multiset extensions of PCA
Generalized Rank Annihilation Method (GRAM), Direct
Trilinear Decomposition (DTD, TLD)
Multiway and multiset extensions of MCR-ALS
methods
PARAFAC-ALS
Tucker3-ALS
.......

34
Soft-modelling
References:
Factor Analysis in Chemistry, 3rd Ed., E.R. Malinowski, John Wiley & Sons, New York, 2002
Principal Component Analysis, 2nd Ed., I.T. Jolliffe, Springer, Berlin, 2002
Multiway Analysis, Applications in the Chemical Sciences, A. Smilde, R. Bro and P. Geladi, John Wiley & Sons, New York, 2004
Multivariate Image Analysis, P. Geladi, John Wiley and Sons, 1996
Soft Modeling of Analytical Data, A. de Juan, E. Casassas and R. Tauler, Encyclopedia of Analytical Chemistry: Instrumentation and Applications, edited by R.A. Meyers, John Wiley & Sons, 2000, Vol. 11, 9800-9837
35
Soft-modelling
Data structures and types of models
One-way data (vectors) → linear and non-linear models
$d_i = b_0 + b\,c_i$
$d_i = f_{\text{non-linear}}(c_i)$
Two-way data (matrices) → bilinear and non-bilinear models
Non-bilinear data can still be linear in one of the two modes.
Three-way data (cubes) → trilinear and non-trilinear models
Non-trilinear data can still be bilinear in two modes.
[Figure: vector d_i (I samples); matrix D with elements d_ij (I samples by J variables)]
36
Soft-modelling
Bilinear models for two way data
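$d_{ij} = \sum_{n=1}^{N} c_{in}\, s_{nj} + e_{ij}$
D is the data matrix, with I rows and J columns.
d_ij is the data measurement (response) of variable j in sample i.
n = 1, ..., N indexes the components (species, sources, ...).
c_in is the concentration of component n in sample i.
s_nj is the response of component n at variable j.
e_ij is the part of d_ij not explained by the model (residual).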
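
As a small illustration of the bilinear model described on this slide, the sketch below builds a noise-free data matrix from two simulated components and checks one element against the element-wise sum over components. The Gaussian shapes, the matrix sizes and the variable names are assumptions chosen for the example.

```python
import numpy as np

def gauss(x, centre, width):
    """Gaussian profile used to simulate peaks (illustrative only)."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

# Hypothetical system: I = 50 samples, J = 40 variables, N = 2 components
t = np.linspace(0, 1, 50)
w = np.linspace(0, 1, 40)
C = np.column_stack([gauss(t, 0.4, 0.08), gauss(t, 0.6, 0.08)])   # (I, N), c_in
S = np.column_stack([gauss(w, 0.3, 0.15), gauss(w, 0.7, 0.15)])   # (J, N), column n is s_n

# Bilinear model in matrix form (no noise): D = C S^T
D = C @ S.T

# The same model written element-wise: d_ij = sum_n c_in * s_nj
i, j = 10, 5
d_ij = sum(C[i, n] * S[j, n] for n in range(C.shape[1]))

# With two distinct components the noise-free matrix has rank 2
rank_D = np.linalg.matrix_rank(D)
```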
37
Soft-modelling
Bilinear models for two way data
D (I by J) = U or C (I by N) times $V^T$ or $S^T$ (N by J) + E (I by J), with N << I or J
PCA: $D = U V^T + E$
- U orthogonal, $V^T$ orthonormal
- $V^T$ in the direction of maximum variance
- Unique solutions but without physical meaning
- Useful for interpretation but not for resolution!
MCR: $D = C S^T + E$
- Other constraints (non-negativity, unimodality, local rank, ...)
- U = C and $V^T = S^T$ non-negative, ...
- C or $S^T$ normalization
- Non-unique solutions but with physical meaning
- Useful for resolution (and obviously for interpretation)!
38
PCA Model (Principal Component Analysis): $X = U V^T + E$
U: scores matrix (orthogonal)
$V^T$: loadings matrix (orthonormal)
SVD Model (Singular Value Decomposition): $D = U S V^T + E$
U: scores matrix (orthonormal)
S: diagonal matrix of the singular values s, with $s = \lambda^{1/2}$,
where λ are the eigenvalues of the covariance matrix $D D^T$
$V^T$: loadings matrix (orthonormal)
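
The relation between singular values and eigenvalues stated on this slide can be checked numerically. The sketch below uses a random matrix as a stand-in for experimental data (an assumption) and takes the slide's unscaled product D D^T as the "covariance" matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(8, 5))            # stand-in for a small data matrix

# SVD model: D = U S V^T, with orthonormal U and V^T
U, s, Vt = np.linalg.svd(D, full_matrices=False)

# Eigenvalues of D D^T, sorted in descending order; only the first
# min(I, J) = 5 are non-zero for an 8 x 5 matrix
eigvals = np.linalg.eigvalsh(D @ D.T)[::-1][: s.size]

# PCA-style scores are the SVD scores scaled by the singular values
scores = U * s                         # orthogonal but not normalised columns
D_rebuilt = scores @ Vt
```

Here `s` equals the square root of `eigvals`, and `scores @ Vt` reproduces `D` because all five components are kept.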
39
PCA Model: $D = U V^T + E$
U: scores
$V^T$: loadings (projections)
E: unexplained variance
$D = u_1 v_1^T + u_2 v_2^T + \dots + u_n v_n^T + E$
n: number of components (<< number of variables in D)
Each term $u_k v_k^T$ is a rank-1 matrix.
40

PCA Model
$X = U V^T + E$
X = structure + noise
The model is an approximation to the experimental data
matrix X.
Loadings, projections ($V^T$): relationships between the
original variables
and the principal components (eigenvectors of the
covariance matrix).
The vectors in $V^T$ (loadings) are orthonormal
(orthogonal and normalized).
Scores, targets (U): relationships between the
samples (coordinates of the
samples or objects in the space defined by the
principal components).
The vectors in U (scores) are orthogonal.
Noise (E): experimental error, non-explained
variance.

41
Summary of Principal Component Analysis (PCA)
1. Formulation of the problem to solve.
2. Plot of the original data.
3. Data pretreatment
(data centering, autoscaling, logarithmic
transformation).
4. Build the PCA model. Determination of the number
of components (graphical inspection of
explained/residual plots).
5. Study of the PCA model. Multivariate
data exploration:
- loadings plot → map of the variables
- scores plot → map of the samples
6. Interpretation of the PCA model. Identification of
the main sources of data variance.
7. Analysis of the residuals matrix $E = D - U V^T$.
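
Steps 3, 4 and 7 of this summary can be sketched with NumPy alone. The data set below is synthetic (two underlying factors plus a small amount of noise), and the choice of column mean-centring and of two components are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 30 samples x 6 variables, 2 underlying factors plus noise
T_true = rng.normal(size=(30, 2))
P_true = rng.normal(size=(2, 6))
D = T_true @ P_true + 0.01 * rng.normal(size=(30, 6))

# Step 3: data pretreatment (column mean-centring)
Dc = D - D.mean(axis=0)

# Step 4: build the model and inspect the explained variance per component
U_full, s, Vt_full = np.linalg.svd(Dc, full_matrices=False)
explained = s**2 / np.sum(s**2)

n = 2                                  # number of components retained
U = U_full[:, :n] * s[:n]              # scores (orthogonal columns)
VT = Vt_full[:n]                       # loadings (orthonormal rows)

# Step 7: residuals matrix E = D - U V^T (on the centred data)
E = Dc - U @ VT
```

Plotting `explained` against the component number gives the explained-variance plot mentioned in step 4; columns of `U` and rows of `VT` give the scores and loadings plots of step 5.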

42
PCA
U scores
VT loadings
43
Multivariate Curve Resolution (MCR)
Pure component information: $D = C S^T$, with C = (c_1 ... c_n) and $S^T$ = (s_1 ... s_n)
$S^T$, pure signals: compound identity, source
identification and interpretation.
C, pure concentration profiles (e.g. along retention times): chemical
model, process evolution, compound
contribution, relative quantitation.
44
Lecture 1

Introduction to data structures and
soft-modelling methods.
Factor Analysis of two-way data Bilinear models.
Rotation and intensity ambiguities.
Pseudo-rank, local rank and rank deficiency.
Evolving Factor Analysis.

45
Factor Analysis: ambiguities in the analysis of a
data matrix (two-way data)
Rotation and scale/intensity ambiguities
Rotation Ambiguities
Factor Analysis (PCA) data matrix decomposition: $D = U V^T + E$
True data matrix decomposition: $D = C S^T + E$
46
Factor Analysis: ambiguities in the analysis of a
data matrix (two-way data)
Rotation and scale/intensity ambiguities
Rotation Ambiguities
$D = U T T^{-1} V^T + E = C S^T + E$
$C = U T$, $S^T = T^{-1} V^T$
How to find the rotation matrix T?
47
Rotation and scale/intensity ambiguities

$D = C S^T + E = D^* + E$
$C_{new} = C\,T$, with dimensions (NR,N) = (NR,N)(N,N)
$S^T_{new} = T^{-1} S^T$, with dimensions (N,NC) = (N,N)(N,NC)
$D^* = C S^T = C_{new} S^T_{new}$
Matrix decomposition is not unique!
T(N,N) is any non-singular matrix.
Rotational freedom for any T.
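
The rotational ambiguity can be verified numerically: any non-singular T gives a different pair of factor matrices with exactly the same product. The random C and S^T and the particular T below are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
NR, N, NC = 20, 3, 15
C = rng.random((NR, N))                # concentration profiles (NR, N)
ST = rng.random((N, NC))               # spectra (N, NC)

# Any non-singular (N, N) matrix will do; this one is an arbitrary example
T = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.3],
              [0.1, 0.0, 1.0]])

C_new = C @ T                          # (NR, N) = (NR, N)(N, N)
ST_new = np.linalg.inv(T) @ ST         # (N, NC) = (N, N)(N, NC)

D_old = C @ ST
D_new = C_new @ ST_new                 # same data, different factors
```

Note that `ST_new` may contain negative values even though `ST` does not, which is why constraints such as non-negativity reduce the set of acceptable T matrices.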

48
Rotation and scale/intensity ambiguities
Rotation ambiguities and rotation matrix T(N,N)
49
Rotation and scale/intensity ambiguities
Intensity (scale) ambiguities
For any scalar k:
$c\,s^T = (c\,k)\left(\tfrac{1}{k}\,s^T\right)$
Intensity/scale ambiguities make it difficult to
obtain quantitative information. When they are
solved, it is also possible to obtain
quantitative information.

50
Rotation and scale/intensity ambiguities
Intensity (scale) ambiguities
$c_{new} = k\,c_{old}$
$s_{new} = \tfrac{1}{k}\,s_{old}$
$c_{old}\,s_{old} = (k\,c_{old})\left(\tfrac{1}{k}\,s_{old}\right) = c_{new}\,s_{new}$
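
A short numerical check of the intensity ambiguity for a single component follows. The profile values and the factor k are made up for the example; the normalisation at the end is one common way of fixing the scale, shown here only as an assumption about how the ambiguity might be handled.

```python
import numpy as np

c_old = np.array([0.0, 0.5, 1.0, 0.5, 0.0])    # concentration profile
s_old = np.array([0.2, 0.8, 0.4])              # spectrum
k = 4.0                                        # any non-zero scalar

c_new = c_old * k
s_new = s_old / k

# Both pairs give the same rank-1 contribution to D
D_old = np.outer(c_old, s_old)
D_new = np.outer(c_new, s_new)

# Fixing the scale, e.g. by normalising the spectrum to unit length,
# removes the freedom in k (the concentrations absorb the scale)
s_unit = s_old / np.linalg.norm(s_old)
c_scaled = c_old * np.linalg.norm(s_old)
```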
51
Rotation and scale/intensity ambiguities

Questions to answer
Is it possible to have unique solutions?
What are the conditions to have unique solutions?
If total unique solutions are not possible
Is it still possible at least to find out some of
the possible solutions?
Is it possible to have an estimation of the band
or range of possible/feasible solutions?
How can this range of feasible solutions be
reduced?

52
Lecture 1

Introduction to data structures and
soft-modelling methods.
Factor Analysis of two-way data Bilinear models.
Rotation and intensity ambiguities.
Pseudo rank, local rank and rank deficiency.
Evolving Factor Analysis.

53
Definitions
Mathematical rank of a data matrix: the minimum number of linearly independent
rows or columns describing the variance of the
whole data set, i.e. the minimum number of basis vectors
spanning the row and column vector spaces. It may
be obtained by SVD or PCA.
Pseudo-rank or chemical rank: the mathematical rank in the absence
of experimental error/noise. Usually it is equal
to the number of chemical/physical components
contributing to the observed data variance apart
from experimental noise/error. It is obtained from the
number of larger components from PCA, SVD or
other FA methods.
Local rank: the chemical
rank of data submatrices. It is obtained from EFA,
EFF, SIMPLISMA, OPA, or other FA submatrix
analysis methods.
Rank deficiency: occurs when the chemical
rank is lower than the known number of
contributions. Rank deficiency may be
broken/solved by data matrix augmentation and
perturbation strategies.
Rank overlap: rank
deficiency caused by equal vector profiles of
different chemical/physical components in one or
more modes.
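
The difference between mathematical rank and pseudo-rank can be illustrated on simulated data. In the sketch below, the three Gaussian components, the noise level and the threshold rule (twice the approximate largest singular value of a pure-noise matrix) are all assumptions; with real data the noise level is usually not known and has to be estimated.

```python
import numpy as np

def gauss(x, centre, width):
    """Gaussian profile used to simulate peaks (illustrative only)."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 60)
w = np.linspace(0, 1, 40)
C = np.column_stack([gauss(t, c, 0.08) for c in (0.3, 0.5, 0.7)])
S = np.column_stack([gauss(w, c, 0.12) for c in (0.2, 0.5, 0.8)])

noise_sd = 1e-3                                   # assumed known noise level
D = C @ S.T + noise_sd * rng.normal(size=(60, 40))

s = np.linalg.svd(D, compute_uv=False)

# Mathematical rank: noise makes the matrix full rank
math_rank = np.linalg.matrix_rank(D)

# Pseudo-rank: count singular values clearly above the noise level.
# noise_sd * (sqrt(I) + sqrt(J)) approximates the largest singular value
# of an I x J matrix of pure Gaussian noise; the factor 2 is a safety margin.
I, J = D.shape
threshold = 2 * noise_sd * (np.sqrt(I) + np.sqrt(J))
pseudo_rank = int(np.sum(s > threshold))
```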
54
Pseudo Rank Number of contributions (factors,
components)
Principal Component Analysis
Gives an abstract (orthogonal) bilinear model to
describe optimally the variation in our data set.
Useful chemical information Size of the
model (chemical rank)
Number of chemical contributions
55
Pseudo Rank Number of contributions (factors,
components)
56
Pseudo Rank Number of contributions (factors,
components)
Principal Component Analysis (SVD algorithm)
large size ?
small size ?
57
Pseudo Rank Number of contributions (factors,
components)

Overestimations of rank (overfitting).
Large overestimation: the measurements may not
follow a bilinear model.
Small overestimation: presence of structured
noise or high noise levels.

Underestimations of rank (rank deficiency).
Linear dependencies:
Contributions with very similar signals or
concentration profiles.
Compounds with non-measurable signals.
Minor compounds.

58
Rank deficiency

Are all the signals distinguishable and
independent?
Are all the concentration profiles
distinguishable and independent?

No → Rank-deficient systems
Detectable rank < number of process contributions
Examples:
1) 2nd order reaction A B C, B C: 3 chemical species/contributions, but rank = 2
2) Enantiomer conversion monitored by UV, where
spectrum D = spectrum L: two chemical
species/components, but rank = 1 (rank overlap)
59
Rank deficiency

Closed reaction systems. Some concentration
profiles are described as linear combinations of
others.
System: HA / A-, HB / B-
$C_A = [HA] + [A^-]$
$C_B = [HB] + [B^-]$
$C_B = k\,C_A$
[HA], [HB]
$[A^-] = C_A - [HA]$
$[B^-] = C_B - [HB] = k\,C_A - [HB]$

Rank 3
$[HA], [HB], [A^-], [B^-] = f([HA], [HB], C_A)$
60
Breaking rank-deficiency by matrix augmentation
Rank deficiency

Matrix Augmentation in the rank-deficient
direction

Data set
Rank 4
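
The closed two-acid system and the effect of augmentation can be simulated as follows. The pKa values, total concentration, random spectra and the choice of a second experiment with a different ratio k are all hypothetical; they are one possible way to picture "augmentation in the rank-deficient direction", not necessarily the data set used in the lecture.

```python
import numpy as np

pH = np.linspace(2, 12, 50)
pKa_A, pKa_B = 4.5, 9.0        # hypothetical pKa values
CA = 1.0                       # total concentration of acid A (arbitrary units)

def profiles(k):
    """Concentration profiles [HA, A-, HB, B-] for a closed system with CB = k * CA."""
    CB = k * CA
    HA = CA / (1 + 10 ** (pH - pKa_A))
    HB = CB / (1 + 10 ** (pH - pKa_B))
    return np.column_stack([HA, CA - HA, HB, CB - HB])

def numerical_rank(M, rel_tol=1e-10):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(4)
S = rng.random((30, 4))        # four different (random, hypothetical) spectra

# Single experiment: four species, but only three independent profiles
D1 = profiles(k=2.0) @ S.T
rank_single = numerical_rank(D1)

# Augmentation: stack a second experiment with a different ratio k on top,
# sharing the same spectra; the closure relation no longer holds jointly
D2 = profiles(k=0.5) @ S.T
D_aug = np.vstack([D1, D2])
rank_aug = numerical_rank(D_aug)
```

In this simulation `rank_single` is 3 and `rank_aug` is 4, matching the ranks quoted on the two slides.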
61
Lecture 1

Introduction to data structures and
soft-modelling methods.
Factor Analysis of two-way data Bilinear models.
Rotation and intensity ambiguities.
Pseudo-rank and rank deficiency.
Local Rank and Evolving Factor Analysis.

62
Local exploratory analysis
Study of the variation of the number of
contributions in the process or system. Study of
the rank variation during the process.
Evolving Factor Analysis (EFA)
Fixed Size Moving Window - Evolving Factor
Analysis (FSMW-EFA)

63
Evolving Factor Analysis

Stepwise chemometric monitoring of a process.
Forward Evolving FA (from beginning to end)
Backward Evolving FA (from end to beginning)

Working procedure: display of successive PCA
analyses along gradually increasing data set
windows.
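
The working procedure can be sketched as a loop over growing submatrices. This is a minimal illustration rather than the algorithm as published: it tracks singular values instead of eigenvalues, and the simulated two-component data, noise level and function name `efa` are assumptions.

```python
import numpy as np

def gauss(x, centre, width):
    """Gaussian profile used to simulate peaks (illustrative only)."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

def efa(D, n_sv=3):
    """Forward and backward evolving factor analysis on the rows of D.

    forward[i]  holds the first n_sv singular values of rows 0..i.
    backward[i] holds the first n_sv singular values of rows i..end.
    Entries that do not exist for very small windows are left as NaN.
    """
    n_rows = D.shape[0]
    forward = np.full((n_rows, n_sv), np.nan)
    backward = np.full((n_rows, n_sv), np.nan)
    for i in range(1, n_rows + 1):
        s = np.linalg.svd(D[:i], compute_uv=False)[:n_sv]
        forward[i - 1, : s.size] = s
        s = np.linalg.svd(D[n_rows - i:], compute_uv=False)[:n_sv]
        backward[n_rows - i, : s.size] = s
    return forward, backward

# Simulated HPLC-DAD-like data: two overlapping peaks plus noise
rng = np.random.default_rng(5)
t = np.linspace(0, 1, 60)
w = np.linspace(0, 1, 40)
C = np.column_stack([gauss(t, 0.4, 0.08), gauss(t, 0.6, 0.08)])
S = np.column_stack([gauss(w, 0.3, 0.15), gauss(w, 0.7, 0.15)])
D = C @ S.T + 1e-3 * rng.normal(size=(60, 40))

forward, backward = efa(D)
```

Plotting the logarithm of each column of `forward` and `backward` against the row index gives the forward and backward EFA plots; a singular value rising above the noise level marks the appearance (forward) or disappearance (backward) of a component.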
64
Evolving Factor Analysis

HPLC-DAD example

D
65
Evolving Factor Analysis

Forward Evolving Factor Analysis

66
Evolving Factor Analysis

Forward Evolving Factor Analysis

Noise level
67
Evolving Factor Analysis

Backward Evolving Factor Analysis

68
Evolving Factor Analysis

Backward Evolving Factor Analysis

Noise level
69
Evolving Factor Analysis

Combined EFA plot (forward and backward EFA)

70
Evolving Factor Analysis
Consecutive emergence-decay profiles. No embedded
compounds.

Sequential processes

71
Evolving Factor Analysis

Approximate concentration profiles

EFA derived concentration profiles
Real concentration profiles
72
Fixed Size Moving Window-Evolving FA (FSMW-EFA)

Local rank map along the process direction or the
signal direction.

Working procedure: successive PCA analyses in fixed-size
windows moving stepwise along the data set.
Window size ≥ min(number of components + 1)
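
A fixed-size moving window version can be sketched in the same way. The window size, the noise-free simulated data and the threshold used to turn singular values into a local rank are assumptions for the example; with real data the threshold would come from an estimate of the noise level.

```python
import numpy as np

def gauss(x, centre, width):
    """Gaussian profile used to simulate peaks (illustrative only)."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

def fsmw_efa(D, window=5, n_sv=3):
    """Singular values of each window of `window` consecutive rows of D."""
    n_windows = D.shape[0] - window + 1
    return np.array([
        np.linalg.svd(D[i:i + window], compute_uv=False)[:n_sv]
        for i in range(n_windows)
    ])

# Simulated noise-free data: two overlapping peaks along the process axis
t = np.linspace(0, 1, 60)
w = np.linspace(0, 1, 40)
C = np.column_stack([gauss(t, 0.4, 0.08), gauss(t, 0.6, 0.08)])
S = np.column_stack([gauss(w, 0.3, 0.15), gauss(w, 0.7, 0.15)])
D = C @ S.T

sv = fsmw_efa(D, window=5, n_sv=3)

# Local rank map: number of singular values above an assumed noise level
threshold = 1e-2
local_rank = (sv > threshold).sum(axis=1)
```

In this simulation the local rank is 0 at the start of the data set, 1 where only one peak is significant, and 2 in the overlap region.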
73
FSMW-EFA
74
FSMW-EFA
Noise level
75
Local rank detection
[Figure: EFA plots compared with FSMW-EFA plots]
76
FSMW-EFA vs. EFA

EFA
Displays the evolution of the process.
The compounds are well identified (concentration
windows)
Local rank information is not easily interpreted.

FSMW-EFA
Clear definition of local rank.
Sensitive to detection of minor compounds.
The idea of process evolution is not preserved.

77
Getting Local rank information from Evolving
Factor Analysis methods

Detection of the selective windows or regions
where only one species exists (total selectivity)
Detection of zero concentration windows or
regions (no species is present)
Detection of windows or regions where a
particular species is not present
Detection of the concentration windows or regions
where one species is present (other species can
coexist)

78
References

EFA
H. Gampp, M. Maeder, C.J. Meyer and A.D.
Zuberbühler. Talanta, 32, 1133-1139 (1985).
M. Maeder. Anal. Chem. 59, 527-530 (1987).
FSMW-EFA
H.R. Keller and D.L. Massart. Anal. Chim. Acta,
246, 379-390 (1991).
SIMPLISMA
W. Windig and J. Guilment. Anal. Chem., 63,
1425-1432 (1991).