Multivariate Resolution in Chemistry - PowerPoint PPT Presentation

1
Multivariate Resolution in Chemistry
  • Lecture 1

Roma Tauler, IIQAB-CSIC, Spain
e-mail: rtaqam_at_iiqab.csic.es
2
Lecture 1
  • Introduction to data structures and
    soft-modelling methods.
  • Factor Analysis of two-way data Bilinear models.
  • Rotation and intensity ambiguities.
  • Pseudo-rank, local rank and rank deficiency.
  • Evolving Factor Analysis.

3
Chemical sensors and analytical data structures
  - one variable: x1, e.g. pH
  - two variables: x1, x2, e.g. pH and T
  - three variables: x1, x2 and x3, e.g. pH, T and P
  - n variables: ?????
[Figure: axes labelled pH and P]
4
Data Structures: Zero order / Zero-way data
x: one sample gives one scalar (tensor of order 0)
Examples
  - selective electrodes, pH
  - absorption at one wavelength λ
  - height/area of a chromatographic peak
Assumptions
  - total selectivity
  - known linear response
Tools
  - univariate algebra and statistics
Advantages
  - simple and easy to understand
Disadvantages
  - information on only one compound
  - total selectivity is required
  - one sensor for every analyte
  - low information content
[Figure: plots with axes labelled Time, hi and ci]
5
Data Structures: First order / One-way data
x1, x2, ..., xn: one sample gives one vector (tensor of order 1)
Examples
  - matrix of sensors
  - absorption at many λ (spectra)
  - chromatograms at a single λ
  - current intensities at many E
  - readings with time (kinetics)
  - ...
Assumptions
  - known linear responses
  - different and independent responses
Tools
  - linear algebra
  - multivariate statistics
  - spectral analysis
  - chemometrics (PCA, MLR, PCR, PLS, ...)
Advantages
  - calibration in the presence of interferences is possible
  - multicomponent analysis is possible
Disadvantages
  - interferences should be present in the calibration samples
[Figure: chromatogram plotted against time]
6
Data Structures: Second order / Two-way data
xij: each sample gives a data table/matrix (tensor of order 2), $X = \sum_k x_k y_k^T$
Examples
  - LC-DAD, LC-FTIR, GC-MS, LC-MS, FIA-DAD, CE-MS, ... (hyphenated techniques)
  - excitation/emission spectroscopy (fluorescence)
  - MS/MS, 2D NMR, GCxGC-MS, ...
  - spectroscopic/voltammetric monitoring of chemical reactions/processes with pH, time, T, etc.
Assumptions
  - linear responses
  - sufficient rank (of the data matrices)
Tools
  - linear algebra
  - chemometrics
Advantages
  - calibration for the analyte in the presence of interferences not modelled in the calibration samples is possible
  - full characterization of the analyte and interferents may be possible
  - few calibration samples are needed (only one sample calibration)
7
Data Structures: Third order / Three-way data
xijk: each sample gives a data cube (tensor of order 3), $X = \sum_k x_k \otimes y_k \otimes z_k$
Examples
  - several spectroscopic matrices
  - several hyphenated chromatographic matrices
  - hyphenated multidimensional chromatography (GC x GC / MS)
  - excitation/emission/time
  - ...
Assumptions
  - bilinear/trilinear responses
  - sufficient rank (of the data matrices)
Tools
  - multilinear and tensor algebra
  - chemometrics
Advantages
  - unique solutions (no ambiguities)
  - calibration for the analyte in the presence of interferences not modelled in the calibration samples is possible
  - full characterization of the analyte and interferents is possible
  - few calibration samples are needed (only one sample calibration)
Multi-way data analysis (PARAFAC, GRAM)
Extended multivariate resolution

8
0th order data: ISE, pH, ...
1st order data: spectra
2nd order data: LC/DAD, GC/MS, fluorescence
3rd order data: time / excitation / emission
9
Examples: Chemical reaction systems monitored using spectroscopic
measurements (even at the femtosecond scale) to follow the evolution
of a reaction with time, pH, temperature, etc., and to detect the
formation and disappearance of intermediate and transient species.
Monitoring chemical reactions.
[Figure: experimental set-up with peristaltic pump, spectrophotometer, pH, 0.050 ml and thermostatic bath (T = 37 °C); data matrix D (NR,NC) with pH and wavelength as axes]
10
Examples: Quality control and optimisation of industrial batch
reactions and processes, where on-line measurements are applied to
monitor the process.
Process analysis
[Figure: probe; data matrix D (NR,NC) with time and wavelength as axes]
11
Examples: Analytical characterisation of complex environmental,
industrial and food mixtures using hyphenated techniques
(chromatography or continuous flow methods with spectroscopic
detection).
Chromatographic hyphenated techniques: LC-DAD, GC-MS, LC-MS, LC-MS/MS, ...
[Figure: data matrix D (NR,NC) with time and wavelength as axes]
12
Examples: FIA-DAD-UV with pH gradient for the analysis of a mixture
of drugs.
[Figure: data matrix D (NR,NC) with pH and wavelength as axes]
13
Examples: Analytical characterisation of complex sea-water samples by
means of excitation-emission spectra, for an unknown with triphenyltin
(in the reaction with flavonol).
Excitation-emission (fluorescence) EEM techniques
[Figure: data matrix D (NR,NC) with excitation and emission as axes]
14
Examples: Protein folding and dynamic protein-nucleic acid interaction
processes. In the post-genomic era, understanding these complex,
evolving biochemical processes is one of the main challenges of
current proteomics research.
Conformation changes
  - Primary structure: amino acids (Val Leu Ser Ala Asp Ala Trp Gly Val His)
  - Secondary structure: helix and sheet formation (α-helix, β-sheet, turn, random coil)
  - Tertiary structure: globule formation
  - Quaternary structure: assembled subunits
[Figure: data matrix D (NR,NC) with temperature and wavelength as axes]
15
Examples: Image analysis of spatially distributed chemicals on 2D
surfaces, measured using coupled microscopy-spectroscopy techniques in
geological samples, biological tissues or food samples.
Spectroscopic image analysis
16
Data Structures in Chemistry
Experimental data: two orders/ways/modes of measurement
D(NR,NC)
Row order (way, mode): usually a change in chemical composition
(concentration order)
Column order (way, mode): usually a change in system properties, as in
spectroscopy, voltammetry, ... (spectral order)
17
Chemical data tables (two-way data)
Data table or matrix D
J variables (wavelengths): instrumental measurements (spectra, voltammograms, ...)
I spectra (times): measurements following concentration changes (time, temperature, pH, ...)
Plot of spectra (rows)
Plot of elution profiles (columns)
18
Chemical data modelling
  • Chemical data modelling methods may be divided
    into:
  • Hard-modelling methods (deterministic)
  • Soft-modelling methods (data driven)
  • Hybrid hard-soft modelling methods

[Diagram comparing the two approaches. Hard modelling: physical hard model, data, analytical information. Soft modelling: data, data-driven soft model, analytical information, physical model (?)]
19
Hard-modelling
  • Hard-modelling approaches for chemical
    (stationary, dynamic, evolving) systems are
    based on an accurate physical description of the
    system and on the solution of complex systems of
    (differential) equations fitting the experimental
    measurements describing the evolution and
    dynamics of these systems. They are deterministic
    models.
  • Hard-modelling methods usually use non-linear
    least squares regression (Marquardt algorithm)
    and optimisation methods to find out the best
    values for the parameters of the model.
  • Hard-modelling usually deals with univariate data.
    It was widely used in the past, before the advent
    of modern instrumentation and computers giving
    large amounts of data output.
  • Hard-modelling is often successful for laboratory
    experiments, where all the variables are under
    control and the physicochemical nature of the
    dynamic model is known and can be fully described
    using a known mathematical model.

20
Hard-modelling
  • However, and even at a laboratory level, there
    are examples where hard-modelling requirements
    and constraints are not totally fulfilled or no
    physicochemical model is known to describe the
    process (e.g. in chromatographic separations or
    in protein folding experiments).
  • Data sets obtained from the study of natural and
    industrial evolving processes are too complex and
    difficult to analyse using hard-modelling
    methods. In these cases, there is no known
    physical model available or it is too complex to
    be set in a general way.
  • In industrial applications, advanced hard-modelling
    has attempted to account for experimental
    difficulties, such as changes in temperature, pH,
    ionic strength and activity coefficients. This is
    a very difficult task!

Data Fitting in the Chemical Sciences, P. Gans,
John Wiley and Sons, New York, 1992
21
Hard modelling
  • Output C, S and model parameters.
  • The model should describe all the variation in
    the experimental measurements.
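To make the hard-modelling output (C, S and model parameters) concrete, here is a minimal sketch that fits one rate constant to simulated spectroscopic data. The first-order reaction A -> B, the function names and every numerical value are assumptions chosen for illustration, and a golden-section search stands in for the Marquardt algorithm named on the earlier slide.

```python
import numpy as np

def kinetic_profiles(k, t):
    """Concentration profiles of A and B for an assumed first-order reaction A -> B."""
    a = np.exp(-k * t)
    return np.column_stack([a, 1.0 - a])

def residual_ss(k, t, D):
    """Sum of squared residuals of D - C(k) S^T, with S^T from linear least squares."""
    C = kinetic_profiles(k, t)
    St = np.linalg.lstsq(C, D, rcond=None)[0]
    return float(np.sum((D - C @ St) ** 2))

def fit_rate_constant(t, D, lo=0.05, hi=3.0, n_iter=60):
    """Golden-section search for k (assumes a single minimum inside [lo, hi])."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(n_iter):
        c, d = b - g * (b - a), a + g * (b - a)
        if residual_ss(c, t, D) < residual_ss(d, t, D):
            b = d
        else:
            a = c
    k = 0.5 * (a + b)
    C = kinetic_profiles(k, t)
    St = np.linalg.lstsq(C, D, rcond=None)[0]
    return k, C, St

# Simulated, noise-free experiment: 40 spectra at 30 wavelengths, true k = 0.5
t = np.linspace(0.0, 10.0, 40)
w = np.arange(30)
S_true = np.vstack([np.exp(-0.5 * ((w - 10) / 4.0) ** 2),
                    np.exp(-0.5 * ((w - 20) / 4.0) ** 2)])   # shape (2, 30)
D = kinetic_profiles(0.5, t) @ S_true
k_fit, C_fit, St_fit = fit_rate_constant(t, D)
```

The non-linear parameter k is searched for, while the spectra are eliminated at each step by linear least squares, so the model has to explain all the variation in D.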

22
Soft-modelling
  • Soft-modelling, instead, attempts to describe
    these systems without the need to postulate an a
    priori physical or (bio)chemical model. The goal
    of these methods is to explain the variation
    observed in the systems using minimal and softer
    assumptions about the data. They are data driven
    models.
  • Soft models usually give an improved analytical
    description of the analysed process.
  • Soft modelling needs more data than
    hard-modelling. Soft modelling methods deal with
    multivariate data. Their use has increased in
    recent years because of the advent of modern
    analytical instrumentation and computers
    providing large amounts of data output.
  • The disadvantage of soft models is their poorer
    extrapolating capabilities (compared with
    hard-modelling).

23
Soft-modelling
  • A soft model can hardly predict the behaviour of
    the system under conditions very different from
    those under which it was derived.
  • Complex multivariate soft-modelling data analysis
    methods, such as those derived from Factor
    Analysis, have been introduced for the study of
    chemical processes/systems.
  • Factor Analysis is a multivariate technique for
    reducing matrices of data to their lowest
    dimensionality by the use of orthogonal factor
    space and transformations that yield predictions
    and/or recognizable factors.

Factor Analysis in Chemistry 3rd Edition,
E.R.Malinowski, Wiley, New York 2002
24
Soft modelling
$D = C S^T$
Constrained ALS optimisation:
LS $(D, C) \rightarrow S^T$
LS $(D, S^T) \rightarrow C$
$\min \lVert D - C S^T \rVert$
  • Output: C and S.
  • All absorbing contributions in and out of
    the process are modelled.
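The two alternating least-squares steps can be sketched as follows. This is an illustrative sketch only: non-negativity is imposed by crudely clipping negative values to zero (dedicated non-negative least squares would be the more rigorous choice), and the function names, Gaussian test profiles and initial estimate are all assumptions.

```python
import numpy as np

def gauss(x, centre, width):
    """Gaussian profile used to simulate peaks and bands (illustrative)."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

def mcr_als(D, C0, n_iter=200):
    """Alternate LS(D, C) -> S^T and LS(D, S^T) -> C, clipping negatives to zero."""
    C = C0.copy()
    for _ in range(n_iter):
        St = np.linalg.lstsq(C, D, rcond=None)[0]        # solve C S^T = D for S^T
        St = np.clip(St, 0.0, None)                      # crude non-negativity
        C = np.linalg.lstsq(St.T, D.T, rcond=None)[0].T  # solve S C^T = D^T for C
        C = np.clip(C, 0.0, None)
    lack_of_fit = np.linalg.norm(D - C @ St) / np.linalg.norm(D)
    return C, St, lack_of_fit

# Simulated two-component data and a rough initial estimate of C
t, w = np.arange(50), np.arange(80)
C_true = np.column_stack([gauss(t, 20, 5), gauss(t, 28, 5)])
St_true = np.vstack([gauss(w, 30, 10), gauss(w, 50, 12)])
D = C_true @ St_true
C0 = np.column_stack([gauss(t, 18, 5), gauss(t, 30, 5)])
C, St, lack_of_fit = mcr_als(D, C0)
```

A small lack of fit does not mean the true profiles were recovered: any other non-negative pair that reproduces D equally well is an equally valid answer of this procedure.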

25
Lecture 1
  • Introduction to data structures and
    soft-modelling methods.
  • Factor Analysis of two-way data Bilinear models.
  • Rotation and intensity ambiguities.
  • Pseudo-rank, local rank and rank deficiency.
  • Evolving Factor Analysis.

26
Soft-modelling
Factor Analysis (Bilinear Model)
Experimental data are modelled as a linear sum of
weighted (scores) factors (loadings).
In matrix form: data = scores × loadings
27
Soft-modelling
BILINEARITY
Assumption Bilinearity (the contributions of the
components in the two orders of measurement are
additive)
28
Soft-modelling
GOALS OF BILINEAR MODEL
[Figure: two panels of profiles; only the axis tick values survive]
Recovery of the responses of every component
(chemical species) in the different modes of
measurement
29
Soft-modelling Factor Analysis
Real Factor Models
Predictions
30
Soft-modelling Factor Analysis (traditional
approach)
[Flow chart relating the data matrix, covariance matrix, abstract factors, new abstract factors and real factors through the steps: matrix multiplication, decomposition, combination, abstract reproduction, target transformation and abstract rotation]
31
Soft-modelling methods (I)
  • Factor Analysis methods based on the use of
    latent variables or eigenvalue/singular value
    data matrix decompositions. Examples
  • PCA, SVD, rotation FA methods
  • Evolving Factor Analysis methods
  • Rank Annihilation methods
  • Window Factor Analysis methods
  • Heuristic Evolving Latent Projections methods
  • Subwindow Factor Analysis methods
  • ..

32
Soft-modelling methods (II)
  • Multivariate Resolution methods decompose a data
    matrix into its pure components without
    explicitly using latent variable analysis
    techniques. Examples
  • SIMPLISMA
  • Orthogonal Projection Approach (OPA),
  • Positive Matrix Factorization methods (and
    Multilinear Engine extensions)
  • Multivariate Curve Resolution-Alternating Least
    Squares (MCR-ALS)
  • Gentle
  • .....

33
Soft-modelling methods (III)
  • Three-way and multiway methods, which decompose
    three-way or multiway data structures. Examples
  • Multiway and multiset extensions of PCA
  • Generalized Rank Annihilation (GRAM), Direct
    Trilinear Decomposition (DTD, TLD)
  • Multiway and multiset extensions of MCR-ALS
    methods       
  • PARAFAC-ALS
  • Tucker3-ALS
  • .......

34
Soft-modelling
Factor Analysis in Chemistry, 3rd Ed., E.R. Malinowski, John Wiley & Sons, New York, 2002
Principal Component Analysis, 2nd Ed., I.T. Jolliffe, Springer, Berlin, 2002
Multiway Analysis, Applications in the Chemical Sciences, A. Smilde, R. Bro and P. Geladi, John Wiley & Sons, New York, 2004
Multivariate Image Analysis, P. Geladi, John Wiley and Sons, 1996
Soft modeling of Analytical Data, A. de Juan, E. Casassas and R. Tauler, Encyclopedia of Analytical Chemistry: Instrumentation and Applications, edited by R.A. Meyers, John Wiley & Sons, 2000, Vol. 11, 9800-9837
35
Soft-modelling
Data structures → Type of models
One-way data (vectors) → linear and non-linear models
    $d_i = b_0 + b\,c_i$
    $d_i = f_{\text{non-linear}}(c_i)$
Two-way data (matrices) → bilinear and non-bilinear models
    Non-bilinear data can still be linear in one of the two modes.
Three-way data (cubes) → trilinear and non-trilinear models
    Non-trilinear data can still be bilinear in two modes.
[Figure: vector with elements di (I samples) and matrix D with elements dij (I samples, J variables)]
36
Soft-modelling
Bilinear models for two way data
[Figure: data matrix D (I rows, J columns) with elements dij]
$d_{ij} = \sum_{n=1}^{N} c_{in}\, s_{nj} + e_{ij}$
dij is the data measurement (response) of variable j in sample i
n = 1, ..., N indexes the components (species, sources, ...)
cin is the concentration of component n in sample i
snj is the response of component n at variable j
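The bilinear model can be illustrated by simulating a small data matrix. This is only a sketch: the Gaussian elution profiles and spectra, the matrix sizes and the helper names are assumptions, not values taken from the lecture.

```python
import numpy as np

def gauss(x, centre, width):
    """Gaussian profile used for both elution peaks and spectral bands (illustrative)."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

def simulate_bilinear(n_rows=50, n_cols=80, noise=0.0, seed=0):
    """Build D = C S^T + E for two overlapping components."""
    t = np.arange(n_rows)                                             # row mode, e.g. time
    w = np.arange(n_cols)                                             # column mode, e.g. wavelength
    C = np.column_stack([gauss(t, 20, 5), gauss(t, 28, 5)])           # (I, N) concentrations
    S = np.column_stack([gauss(w, 30, 10), gauss(w, 50, 12)])         # (J, N) responses
    E = noise * np.random.default_rng(seed).standard_normal((n_rows, n_cols))
    return C, S, C @ S.T + E

C, S, D = simulate_bilinear()

# Element-wise form of the model for one entry: d_ij = sum_n c_in * s_nj
i, j = 22, 40
d_ij = sum(C[i, n] * S[j, n] for n in range(C.shape[1]))
```

Without noise, the rank of D equals the number of components, which is the property the later rank analysis relies on.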
37
Soft-modelling
Bilinear models for two way data
[Figure: D (I × J) = U or C (I × N) times VT or ST (N × J) + E (I × J)]
N << I or J
PCA: $D = U V^T + E$
  - U orthogonal, $V^T$ orthonormal
  - $V^T$ in the direction of maximum variance
  - Unique solutions but without physical meaning
  - Useful for interpretation but not for resolution!
MCR: $D = C S^T + E$
  - Other constraints (non-negativity, unimodality, local rank, ...)
  - $U = C$ and $V^T = S^T$ non-negative, ...
  - C or $S^T$ normalization
  - Non-unique solutions but with physical meaning
  - Useful for resolution (and obviously for interpretation)!
38
PCA Model (Principal Component Analysis): $X = U V^T + E$
  - U: scores matrix (orthogonal)
  - $V^T$: loadings matrix (orthonormal)
SVD Model (Singular Value Decomposition): $D = U S V^T + E$
  - U: scores matrix (orthonormal)
  - S: diagonal matrix of the singular values s, with $s = \lambda^{1/2}$
  - $\lambda$: eigenvalues of the covariance matrix $D D^T$
  - $V^T$: loadings matrix (orthonormal)
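The relation between the two models can be checked numerically. The sketch below uses a small random matrix as stand-in data (an assumption) and NumPy's `numpy.linalg.svd`; the PCA scores are obtained by absorbing the singular values into U.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.random((6, 4))                               # small illustrative data matrix

# SVD model: D = U diag(s) V^T, with orthonormal columns in U and orthonormal rows in V^T
U, s, Vt = np.linalg.svd(D, full_matrices=False)

# PCA model: scores = U diag(s) are orthogonal but not normalised
scores = U * s
reconstruction = scores @ Vt

# Singular values are the square roots of the eigenvalues of D D^T
eigvals = np.linalg.eigvalsh(D @ D.T)[::-1][: len(s)]   # largest eigenvalues first
```

With all components retained the reconstruction is exact; E only appears once the model is truncated to fewer components.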
39
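PCA Model: $D = U V^T + E$
[Figure: D = U (scores) times VT (loadings, projections) + E (unexplained variance)]
$D = u_1 v_1^T + u_2 v_2^T + \dots + u_n v_n^T + E$
n: number of components (<< number of variables in D)
[Figure: D drawn as the sum of the rank-1 matrices u1v1T, u2v2T, ..., unvnT plus E]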
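The expansion into rank-1 contributions can be sketched as below. The random test matrix and the function name are assumptions; each term is written as s_k u_k v_k^T with unit-length u_k, which is equivalent to u_k v_k^T once the singular value is absorbed into the score vector.

```python
import numpy as np

def rank_one_terms(D, n):
    """Return the first n rank-1 contributions of D and the residual matrix E."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    terms = [s[k] * np.outer(U[:, k], Vt[k]) for k in range(n)]
    E = D - sum(terms)
    return terms, E

rng = np.random.default_rng(6)
D = rng.random((8, 5))
terms, E = rank_one_terms(D, 2)
s_all = np.linalg.svd(D, compute_uv=False)
```

The squared norm of E equals the sum of the squared singular values that were left out, which is why the discarded singular values measure the unexplained variance.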
40
  • PCA Model
  • $X = U V^T + E$
  • X = structure + noise
  • It is an approximation to the experimental data
    matrix X.
  • Loadings, Projections $V^T$: relationships between
    the original variables and the principal
    components (eigenvectors of the covariance
    matrix). Vectors in $V^T$ (loadings) are
    orthonormal (orthogonal and normalized).
  • Scores, Targets U: relationships between the
    samples (coordinates of samples or objects in the
    space defined by the principal components).
    Vectors in U (scores) are orthogonal.
  • Noise E: experimental error, non-explained
    variances

41
  • Summary of Principal Component Analysis (PCA)
  • 1. Formulation of the problem to solve.
  • 2. Plot of the original data.
  • 3. Data pretreatment (data centering, autoscaling,
    logarithmic transformation).
  • 4. Build the PCA model. Determination of the number
    of components (graphical inspection of
    explained/residual plots).
  • 5. Study of the PCA model. Multivariate data
    exploration:
      - loadings plot → map of the variables
      - scores plot → map of the samples
  • 6. Interpretation of the PCA model. Identification
    of the main sources of data variance.
  • 7. Analysis of the residuals matrix $E = D - U V^T$
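Steps 3, 4 and 7 of this summary can be sketched in a few lines. The function name, its arguments and the random test data are assumptions made for illustration; the plotting and interpretation steps are left out.

```python
import numpy as np

def pca_workflow(X, n_components, autoscale=False):
    """Pretreat X, build a PCA model by SVD and return its main outputs."""
    Xp = X - X.mean(axis=0)                          # step 3: column mean-centering
    if autoscale:
        Xp = Xp / X.std(axis=0, ddof=1)              # optional scaling to unit variance
    U, s, Vt = np.linalg.svd(Xp, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)              # step 4: variance explained per component
    scores = U[:, :n_components] * s[:n_components]  # map of the samples
    loadings = Vt[:n_components]                     # map of the variables
    residuals = Xp - scores @ loadings               # step 7: E = D - U V^T
    return scores, loadings, explained, residuals

rng = np.random.default_rng(5)
X = rng.random((20, 6))
scores, loadings, explained, residuals = pca_workflow(X, 2)
```

Plotting `explained` against the component number gives the kind of explained/residual plot used to choose the number of components.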

42
PCA
U scores
VT loadings
43
Multivariate Curve Resolution (MCR)
Pure component information
[Figure: D resolved into concentration profiles C = (c1 ... cn) along the retention times and pure spectra ST = (s1 ... sn)]
Pure signals (ST): compound identity, source identification and interpretation
Pure concentration profiles (C): chemical model, process evolution, compound contribution, relative quantitation
44
Lecture 1
  • Introduction to data structures and
    soft-modelling methods.
  • Factor Analysis of two-way data Bilinear models.
  • Rotation and intensity ambiguities.
  • Pseudo-rank, local rank and rank deficiency.
  • Evolving Factor Analysis.

45
Factor Analysis: ambiguities in the analysis of a
data matrix (two-way data)
Rotation and scale/intensity ambiguities
Rotation Ambiguities
Factor Analysis (PCA) data matrix decomposition: $D = U V^T + E$
True data matrix decomposition: $D = C S^T + E$
46
Factor Analysis Ambiguities in the analysis of a
data matrix (two-way data)
Rotation and scale/intensity ambiguities
Rotation Ambiguities
$D = U T T^{-1} V^T + E = C S^T + E$
$C = U T$, $S^T = T^{-1} V^T$
How to find the rotation matrix T?
47
Rotation and scale/intensity ambiguities
  • $D = C S^T + E = D^* + E$
  • $C_{new} = C\,T$
  • (NR,N) = (NR,N) (N,N)
  • $S^T_{new} = T^{-1} S^T$
  • (N,NC) = (N,N) (N,NC)
  • $D^* = C S^T = C_{new} S^T_{new}$
  • Matrix decomposition is not unique!
  • T(N,N) is any non-singular matrix
  • Rotational freedom for any T
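A short numerical check of this rotational freedom follows. The random matrices and the particular T are arbitrary assumptions; any non-singular T would do.

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.random((10, 2))            # (NR, N) concentration profiles
St = rng.random((2, 7))            # (N, NC) spectra
D = C @ St                         # noise-free data for clarity

T = np.array([[1.0, 0.3],
              [0.2, 1.0]])         # an arbitrary non-singular (N, N) matrix
C_new = C @ T                      # (NR, N) = (NR, N) (N, N)
St_new = np.linalg.inv(T) @ St     # (N, NC) = (N, N) (N, NC)
D_new = C_new @ St_new
```

Both pairs of factors reproduce D equally well, so the fit alone cannot decide between them; constraints such as non-negativity are what narrow down the admissible T.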

48
Rotation and scale/intensity ambiguities
Rotation ambiguities and rotation matrix T(N,N)
49
Rotation and scale/intensity ambiguities
Intensity (scale) ambiguities
For any scalar k: $c\,s^T = (k\,c)\left(\tfrac{1}{k}\,s^T\right)$
Intensity/scale ambiguities make it difficult to
obtain quantitative information. When they are
solved, it is also possible to obtain
quantitative information.

50
Rotation and scale/intensity ambiguities
Intensity (scale) ambiguities
$c_{new} = k \times c_{old}$
$s_{new} = \tfrac{1}{k} \times s_{old}$
$c_{old}\, s_{old} = (c_{old} \times k)\left(\tfrac{1}{k} \times s_{old}\right) = c_{new}\, s_{new}$
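One common way to fix the scale factor is normalization of C or ST. The sketch below gives every spectrum unit length and moves the scale into the concentration profile; the function name and the random test data are assumptions.

```python
import numpy as np

def normalise_spectra(C, St):
    """Give every row of S^T unit Euclidean norm and move the scale into C."""
    k = np.linalg.norm(St, axis=1)          # one scale factor per component
    return C * k, St / k[:, None]

rng = np.random.default_rng(7)
C = rng.random((10, 2))
St = rng.random((2, 7))

# A second, equally valid solution that differs only in scale
k = np.array([3.0, 0.25])
C_scaled, St_scaled = C * k, St / k[:, None]

C1, St1 = normalise_spectra(C, St)
C2, St2 = normalise_spectra(C_scaled, St_scaled)
```

After normalization the two solutions coincide, but the concentration profiles are then only in relative units; absolute quantitation needs additional information such as a standard of known concentration.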
51
Rotation and scale/intensity ambiguities
  • Questions to answer
  • Is it possible to have unique solutions?
  • What are the conditions to have unique solutions?
  • If total unique solutions are not possible
  • Is it still possible at least to find out some of
    the possible solutions?
  • Is it possible to have an estimation of the band
    or range of possible/feasible solutions?
  • How can this range of feasible solutions be
    reduced?

52
Lecture 1
  • Introduction to data structures and
    soft-modelling methods.
  • Factor Analysis of two-way data Bilinear models.
  • Rotation and intensity ambiguities.
  • Pseudo rank, local rank and rank deficiency.
  • Evolving Factor Analysis.

53
Definitions
Mathematical rank of a data matrix: the minimum number of linearly
independent rows or columns describing the variance of the whole data
set; the minimum number of basis vectors spanning the row and column
vector spaces. It may be obtained by SVD or PCA.
Pseudo-rank or chemical rank: the mathematical rank in the absence of
experimental error/noise. Usually it is equal to the number of
chemical/physical components contributing to the observed data
variance apart from experimental noise/error. Obtained from the number
of larger components from PCA, SVD or other FA methods.
Local rank: the chemical rank of data submatrices. Obtained from EFA,
EFF, SIMPLISMA, OPA, or other FA submatrix analysis methods.
Rank deficiency: when the chemical rank is lower than the known number
of contributions. Rank deficiency may be broken/solved by data matrix
augmentation and perturbation strategies.
Rank overlap: rank deficiency caused by equal vector profiles of
different chemical/physical components in one or more modes.
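The difference between mathematical rank and pseudo-rank shows up as soon as noise is added. In the sketch below the simulated data, the function name and in particular the threshold rule are assumptions: the threshold is twice the approximate largest singular value of a pure Gaussian noise matrix of the same size, a heuristic rather than a standard criterion.

```python
import numpy as np

def gauss(x, centre, width):
    """Gaussian profile used to simulate peaks and bands (illustrative)."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

def pseudo_rank(D, noise_level):
    """Count the singular values that clearly exceed those expected from noise alone."""
    s = np.linalg.svd(D, compute_uv=False)
    n_rows, n_cols = D.shape
    # Heuristic (assumed) threshold: 2 * noise_level * (sqrt(I) + sqrt(J))
    threshold = 2.0 * noise_level * (np.sqrt(n_rows) + np.sqrt(n_cols))
    return int(np.sum(s > threshold)), s

# Two-component bilinear data plus a small amount of Gaussian noise
t, w = np.arange(50), np.arange(80)
C = np.column_stack([gauss(t, 20, 5), gauss(t, 28, 5)])
St = np.vstack([gauss(w, 30, 10), gauss(w, 50, 12)])
noise_level = 0.001
D = C @ St + noise_level * np.random.default_rng(0).standard_normal((50, 80))

n_components, singular_values = pseudo_rank(D, noise_level)
mathematical_rank = int(np.linalg.matrix_rank(D))
```

The noisy matrix has full mathematical rank, whereas only the singular values above the noise threshold count towards the pseudo-rank.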
54
Pseudo Rank Number of contributions (factors,
components)
Principal Component Analysis
Gives an abstract (orthogonal) bilinear model to
describe optimally the variation in our data set.
Useful chemical information Size of the
model (chemical rank)
Number of chemical contributions
55
Pseudo Rank Number of contributions (factors,
components)
56
Pseudo Rank Number of contributions (factors,
components)
Principal Component Analysis (SVD algorithm)
large size ?
small size ?
57
Pseudo Rank Number of contributions (factors,
components)
  • Overestimations of rank (overfitting).
  • Large overestimation: the measurements may not
    follow a bilinear model.
  • Small overestimation: presence of structured
    noise or high noise levels.
  • Underestimations of rank (rank deficiency).
  • Linear dependencies
  • Contributions with very similar signals or
    concentration profiles.
  • Compounds with non-measurable signals.
  • Minor compounds.

58
Rank deficiency
  • Are all the signals distinguishable and
    independent?
  • Are all the concentration profiles
    distinguishable and independent?

No → Rank-deficient systems
Detectable rank < nr. of process contributions
Examples
1) 2nd order reaction A B C, B C: 3 chemical species/contributions, but Rank = 2
2) Enantiomer conversion monitored by UV, where spectrum D = spectrum L: two chemical species/components, but Rank = 1 (rank overlap)
59
Rank deficiency
  • Closed reaction systems. Some concentration
    profiles are described as linear combinations of
    others.
  • System HA / A-, HB / B-
  • $C_A = [HA] + [A^-]$
  • $C_B = [HB] + [B^-]$
  • $C_B = k\,C_A$
  • [HA], [HB]
  • $[A^-] = C_A - [HA]$
  • $[B^-] = C_B - [HB] = k\,C_A - [HB]$

Rank = 3
[HA], [HB], [A-], [B-] = f([HA], [HB], CA)
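The closure relations above can be checked numerically. In this sketch the sample concentrations, the value of k and the random spectra are all assumed values; only the mass-balance equations come from the slide.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, k = 12, 0.5

CA = rng.uniform(1.0, 2.0, n_samples)               # total concentration of A
HA = rng.uniform(0.1, 0.9, n_samples) * CA          # [HA]
HB = rng.uniform(0.1, 0.9, n_samples) * k * CA      # [HB], with C_B = k C_A
A = CA - HA                                         # [A-] = C_A - [HA]
B = k * CA - HB                                     # [B-] = k C_A - [HB]

C = np.column_stack([HA, HB, A, B])                 # four species
S = rng.random((30, 4))                             # four different spectra
D = C @ S.T

rank_C = int(np.linalg.matrix_rank(C))
rank_D = int(np.linalg.matrix_rank(D))
```

Although four species with four different spectra are present, the constraint C_B = k C_A leaves only three independent concentration profiles, so both C and D have rank 3.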
60
Breaking rank-deficiency by matrix augmentation
Rank deficiency
  • Matrix Augmentation in the rank-deficient
    direction

Data set
Rank 4
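One way to picture the augmentation is to analyse together two experiments that share the same spectra but were run with different ratios k. The sketch below assumes this reading of the slide; the helper name and all numerical values are illustrative.

```python
import numpy as np

def closed_system(k, n_samples, rng):
    """Concentrations [HA], [HB], [A-], [B-] for a closed system with C_B = k C_A."""
    CA = rng.uniform(1.0, 2.0, n_samples)
    HA = rng.uniform(0.1, 0.9, n_samples) * CA
    HB = rng.uniform(0.1, 0.9, n_samples) * k * CA
    return np.column_stack([HA, HB, CA - HA, k * CA - HB])

rng = np.random.default_rng(4)
S = rng.random((30, 4))                       # spectra common to both experiments
D1 = closed_system(0.5, 12, rng) @ S.T        # experiment 1, k = 0.5
D2 = closed_system(2.0, 12, rng) @ S.T        # experiment 2, k = 2.0
D_aug = np.vstack([D1, D2])                   # augmented data set, one matrix on top of the other

ranks = [int(np.linalg.matrix_rank(M)) for M in (D1, D2, D_aug)]
```

Each matrix on its own is rank-deficient, but the linear dependence between the concentration profiles differs between the two experiments, so the augmented matrix recovers the full rank of 4.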
61
Lecture 1
  • Introduction to data structures and
    soft-modelling methods.
  • Factor Analysis of two-way data Bilinear models.
  • Rotation and intensity ambiguities.
  • Pseudo-rank and rank deficiency.
  • Local Rank and Evolving Factor Analysis.

62
  • Local exploratory analysis
  • Study of the variation of the number of
    contributions in the process or system. Study of
    the rank variation during the process.
  • Evolving Factor Analysis (EFA)
  • Fixed Size Moving Window - Evolving Factor
    Analysis (FSMW-EFA)

63
Evolving Factor Analysis
  • Stepwise chemometric monitoring of a process.
  • Forward Evolving FA (from beginning to end)
  • Backward Evolving FA (from end to beginning)

Working procedure: display of successive PCA
analyses on gradually increasing data set
windows.
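The working procedure can be sketched as follows. The function name, the number of singular values tracked and the simulated two-component data are assumptions; singular values are used in place of eigenvalues, which only changes the scale of the resulting plot.

```python
import numpy as np

def efa(D, n_sv=3):
    """Forward and backward EFA: singular values of gradually growing row windows."""
    n_rows = D.shape[0]
    forward = np.zeros((n_rows, n_sv))
    backward = np.zeros((n_rows, n_sv))
    for i in range(n_rows):
        sf = np.linalg.svd(D[: i + 1], compute_uv=False)   # rows 0..i (from the beginning)
        sb = np.linalg.svd(D[i:], compute_uv=False)        # rows i..end (from the end)
        forward[i, : min(n_sv, sf.size)] = sf[:n_sv]
        backward[i, : min(n_sv, sb.size)] = sb[:n_sv]
    return forward, backward

def gauss(x, centre, width):
    """Gaussian profile used to simulate peaks and bands (illustrative)."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

# Simulated two-component HPLC-DAD-like data
t, w = np.arange(50), np.arange(80)
C = np.column_stack([gauss(t, 20, 5), gauss(t, 28, 5)])
St = np.vstack([gauss(w, 30, 10), gauss(w, 50, 12)])
D = C @ St
forward, backward = efa(D)
```

Plotting the logarithm of each column of `forward` and `backward` against the row index gives the forward and backward EFA plots; a singular value rising above the noise level marks the emergence (forward) or the decay (backward) of a component.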
64
Evolving Factor Analysis
  • HPLC-DAD example

D
65
Evolving Factor Analysis
  • Forward Evolving Factor Analysis

66
Evolving Factor Analysis
  • Forward Evolving Factor Analysis

Noise level
67
Evolving Factor Analysis
  • Backward Evolving Factor Analysis

68
Evolving Factor Analysis
  • Backward Evolving Factor Analysis

Noise level
69
Evolving Factor Analysis
  • Combined EFA plot (forward and backward EFA)

70
Evolving Factor Analysis
Consecutive emergence-decay profiles. No embedded
compounds.
  • Sequential processes

71
Evolving Factor Analysis
  • Approximate concentration profiles

EFA derived concentration profiles
Real concentration profiles
72
Fixed Size Moving Window-Evolving FA (FSMW-EFA)
  • Local rank map along the process direction or the
    signal direction.

Working procedure: successive PCA in fixed size
windows moving stepwise along the data set.
Window size ≥ min (number of components + 1)
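A minimal sketch of the moving-window procedure is given below. The window size, the noise threshold and the simulated, noise-free two-component data are assumptions chosen so that the local rank is easy to read.

```python
import numpy as np

def fsmw_efa(D, window=5, n_sv=3):
    """Singular values of fixed-size row windows moved one row at a time."""
    n_windows = D.shape[0] - window + 1
    sv = np.zeros((n_windows, n_sv))
    for i in range(n_windows):
        s = np.linalg.svd(D[i : i + window], compute_uv=False)
        sv[i] = s[:n_sv]                      # requires n_sv <= min(window, n_columns)
    return sv

def gauss(x, centre, width):
    """Gaussian profile used to simulate peaks and bands (illustrative)."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

# Two partially separated components, noise-free for clarity
t, w = np.arange(50), np.arange(80)
C = np.column_stack([gauss(t, 15, 4), gauss(t, 35, 4)])
St = np.vstack([gauss(w, 30, 10), gauss(w, 50, 12)])
D = C @ St

sv = fsmw_efa(D, window=5, n_sv=3)
threshold = 1e-3                              # assumed noise level
local_rank = (sv > threshold).sum(axis=1)     # local rank map along the process direction
```

Window 13 (rows 13 to 17) lies under the first peak only and has local rank 1, whereas window 23 (rows 23 to 27) lies where both peaks contribute and has local rank 2.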
73
FSMW-EFA
74
FSMW-EFA
Noise level
75
EFA
EFA
Local rank detection
FSMW-EFA
FSMW-EFA
76
FSMW-EFA vs. EFA
  • EFA
  • Displays the evolution of the process.
  • The compounds are well identified (concentration
    windows)
  • Local rank information is not easily interpreted.
  • FSMW-EFA
  • Clear definition of local rank.
  • Sensitive to detection of minor compounds.
  • The idea of process evolution is not preserved.

77
Getting Local rank information from Evolving
Factor Analysis methods
  • Detection of the selective windows or regions
    where only one species exists (total selectivity)
  • Detection of zero concentration windows or
    regions (no species is present)
  • Detection of windows or regions where a
    particular species is not present
  • Detection of the concentration windows or regions
    where one species is present (other species can
    coexist)

78
References
  • EFA
  • H. Gampp, M. Maeder, C.J. Meyer and A.D.
    Zuberbühler. Talanta, 32, 1133-1139 (1985).
  • M. Maeder. Anal. Chem. 59, 527-530 (1987).
  • FSMW-EFA
  • H.R. Keller and D.L. Massart. Anal. Chim. Acta,
    246, 379-390 (1991).
  • SIMPLISMA
  • W. Windig and J. Guilment. Anal. Chem., 63,
    1425-1432 (1991).