1Uncertainty in multivariate calibration:
application to embedded NIR data
Juan Antonio Fernández Pierna
Scientific collaborator F.N.R.S. Brussels,
Belgium - Statistics and Informatics
Department Univ. of Agronomical Sciences of
Gembloux (FUSAGx), Belgium - Quality of
Agricultural Products Department Walloon
Agricultural Research Centre (CRA-W), Gembloux,
Belgium
IV Winter Symposium on Chemometrics, February
15-18, Moscow (Chernogolovka), Russia
2PART I: Uncertainty study. Embedded NIR
PART II: Imaging using a NIR camera
3PART I based on:
Estimation of partial least squares regression
prediction uncertainty when the reference values
carry a sizeable measurement error.
J.A. Fernández Pierna, L. Jin, F. Wahl, N. Faber,
D.L. Massart, Chemometrics and Intelligent
Laboratory Systems 65 (2003) 281-291
4Summary
1- Introduction
2- Uncertainty?
3- How to determine the uncertainty
4- Examples
5- Conclusions
5The uncertainty of a calculated value is
statistically defined as the interval around that
value such that any repetition of the calculation
will produce a new result that lies within this
interval with a given probability.
A result is not complete without an associated
measure of uncertainty.
6So, a result without reliability (uncertainty)
statement cannot be published or communicated
because it is not (yet) a result. I am appealing
to my colleagues of all analytical journals not
to accept papers anymore which do not respect
this simple logic.
P. De Bièvre, Editorial: Measurement results
without statements of reliability (uncertainty)
should not be taken seriously, Accreditation and
Quality Assurance 2 (1997) 269.
Source N. Faber, BCS Workshop Uncertainty
estimation in multivariate calibration
Antwerp, November 3, 2004
7IUPAC guidelines for single component calibration
K. Danzer and L.A. Currie, Guidelines for
calibration in analytical chemistry. Part 1.
Fundamentals and single component calibration,
Pure Appl. Chem. 70 (1998) 993. This document
shows that the error analysis for univariate
calibration is fairly simple.
IUPAC guidelines for multicomponent calibration
K. Danzer, M. Otto and L.A. Currie, Guidelines
for calibration in analytical chemistry. Part 2.
Multispecies calibration, Pure Appl. Chem. 76
(2004) 1215. This document illustrates that the
error analysis for multivariate calibration is
relatively complex.
Source N. Faber, BCS Workshop Uncertainty
estimation in multivariate calibration
Antwerp, November 3, 2004
8Notation
X: true predictor matrix (NIR spectra)
y: true predictand vector (property of interest)
9Introduction
PLS regression
Measured values (observable)
Unobservable measurement error
True values (unobservable)
PLS prediction
UNCERTAINTY
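The labels on this slide can be read as a measurement model. The following LaTeX is only a sketch of that reading: the tilde for measured quantities, the Delta terms for measurement errors, the regression vector b-hat and the index u for an unknown sample are notational assumptions, not symbols taken from the slide, and mean centering is left out for brevity.

```latex
\begin{align*}
% measured (observable) = true (unobservable) + unobservable measurement error
\tilde{X} &= X + \Delta X, \qquad \tilde{y} = y + \Delta y \\
% PLS prediction for an unknown sample u, built from measured data only
\hat{y}_u &= \tilde{x}_u^{\mathsf{T}} \hat{b} \\
% the uncertainty of interest concerns the error with respect to the true value
e_u &= \hat{y}_u - y_u
\end{align*}
```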
10Uncertainty
- PLS/PCR: until now, theory on how to estimate
the quality of each individual prediction was
scarce and not well tested.
11Multivariate empirical validation that
implicitly accounts for all error sources
- Root mean squared error of prediction (RMSEP) for
test set of N samples
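The RMSEP named above is the square root of the mean squared difference between predictions and reference values over the N test samples. A minimal sketch, assuming plain Python lists as input (the function name and the example numbers are illustrative, not from the slides):

```python
import math

def rmsep(y_ref, y_pred):
    """Root mean squared error of prediction over a test set of N samples."""
    if len(y_ref) != len(y_pred) or len(y_ref) == 0:
        raise ValueError("y_ref and y_pred must be non-empty and of equal length")
    n = len(y_ref)
    squared_errors = [(p - r) ** 2 for r, p in zip(y_ref, y_pred)]
    return math.sqrt(sum(squared_errors) / n)

# Hypothetical reference values and predictions for four test samples.
reference = [20.1, 22.4, 19.8, 25.0]
predicted = [20.5, 22.0, 20.1, 24.2]
print(round(rmsep(reference, predicted), 3))
```

Note that the result is a single number for the whole test set, whatever the sample.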
12Obvious problems with test set validation
- The result (RMSEP) is a constant measure of
prediction uncertainty, so it cannot lead to
prediction intervals with correct coverage
probabilities (say 95%).
- A crucial assumption is that the reference
values are sufficiently precise. This is
certainly not always true (octane rating,
classical Kjeldahl); often the prediction is
even better than the reference value.
- The high intrinsic variability of the RMSEP
estimate requires N to be large.
Source N. Faber, BCS Workshop Uncertainty
estimation in multivariate calibration
Antwerp, November 3, 2004
13Some benefits of a sample-specific multivariate
prediction uncertainty
- Construction of prediction intervals, e.g. for
monitoring the performance of an analysis using
control samples (see ASTM standard E1655,
Standard practices for infrared, multivariate,
quantitative analysis).
- Realistic estimation of the limit of detection,
since RMSEP, a constant value, poorly describes
extreme samples.
- Opportunities for sample design and variable
selection.
Source N. Faber, BCS Workshop Uncertainty
estimation in multivariate calibration
Antwerp, November 3, 2004
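As an illustration of the first benefit, a prediction interval can be built from a prediction and its sample-specific standard error. The sketch below assumes normally distributed prediction errors and uses a normal quantile; with few degrees of freedom a t quantile would be the more careful choice. The function name and the example values are hypothetical.

```python
from statistics import NormalDist

def prediction_interval(y_hat, std_err, coverage=0.95):
    """Two-sided interval y_hat +/- z * std_err (normality is an assumption)."""
    if std_err < 0 or not 0.0 < coverage < 1.0:
        raise ValueError("std_err must be >= 0 and coverage must lie in (0, 1)")
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return y_hat - z * std_err, y_hat + z * std_err

# Two hypothetical samples with the same prediction but different uncertainty:
# an ordinary sample and an extreme (high-leverage) one.
for label, s in (("ordinary", 0.4), ("extreme", 1.1)):
    low, high = prediction_interval(21.0, s)
    print(label, round(low, 2), round(high, 2))
```

A constant RMSEP would give both samples the same interval width, which is the limitation the slides point out.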
14How to determine the sample-specific multivariate
prediction uncertainty?
1 - Repeating the experiment under relevant
conditions
2 - Resampling methods (Monte-Carlo)
3 - Equations in the literature
15Monte-Carlo simulation, Bootstrapping
- Generation of data sets by introducing
artificial perturbations that emulate the
effect of the perturbation of the original
data
16Noise addition method
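The body of this slide did not survive, so the following is only a sketch of the general idea of noise addition: perturb the calibration data with artificial noise many times, refit the model each time, and take the spread of the resulting predictions as an uncertainty estimate. It assumes numpy, a small PLS1 (NIPALS) routine written for the example, and hypothetical noise levels `sd_x` and `sd_y`; none of these names or settings come from the presentation.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """PLS1 by NIPALS; returns (x_mean, y_mean, regression vector b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight vector
        t = E @ w                       # scores
        tt = t @ t
        p = E.T @ t / tt                # X loadings
        q_a = f @ t / tt                # y loading
        E = E - np.outer(t, p)          # deflate X
        f = f - q_a * t                 # deflate y
        W.append(w); P.append(p); q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, b

def pls1_predict(model, X):
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean

def noise_addition_std(X, y, X_new, n_factors, sd_x, sd_y, n_rep=100, seed=0):
    """Standard deviation of predictions over n_rep noise-perturbed refits."""
    rng = np.random.default_rng(seed)
    preds = np.empty((n_rep, X_new.shape[0]))
    for r in range(n_rep):
        X_pert = X + rng.normal(0.0, sd_x, X.shape)   # assumed spectral noise
        y_pert = y + rng.normal(0.0, sd_y, y.shape)   # assumed reference noise
        preds[r] = pls1_predict(pls1_fit(X_pert, y_pert, n_factors), X_new)
    return preds.std(axis=0, ddof=1)

# Synthetic stand-in data (not the embedded NIR data from the slides).
rng = np.random.default_rng(1)
X_cal = rng.normal(size=(30, 10))
y_cal = X_cal @ rng.normal(size=10) + rng.normal(0.0, 0.1, 30)
X_test = rng.normal(size=(5, 10))
print(noise_addition_std(X_cal, y_cal, X_test, n_factors=3, sd_x=0.01, sd_y=0.1))
```

The output is one standard deviation per test sample, so the estimate is sample-specific. How well it reflects reality depends on how realistic the assumed noise levels are.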
17Bootstrapping
PLSR (F factors)
Randomly sampling with replacement, B times:
e1 (e2 e1 e2 e4)
e2 (e1 e1 e2 e3)
e3 (e1 e2 e3 e4)
e4 (e2 e3 e3 e4)
...
PLSR (F factors)
Uncertainty estimation
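The flow on this slide (fit PLSR, resample with replacement B times, refit, estimate the uncertainty) could be sketched as below. The sketch assumes that the resampled quantities e1, e2, ... are the calibration residuals, uses numpy with a small PLS1 (NIPALS) routine written for the example, and runs on synthetic data. All function names and settings are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """PLS1 by NIPALS; returns (x_mean, y_mean, regression vector b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        q_a = f @ t / tt
        E = E - np.outer(t, p)
        f = f - q_a * t
        W.append(w); P.append(p); q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)

def pls1_predict(model, X):
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean

def bootstrap_residual_std(X, y, X_new, n_factors, n_boot=200, seed=0):
    """Spread of predictions over B residual-bootstrap refits of the model."""
    rng = np.random.default_rng(seed)
    y_fit = pls1_predict(pls1_fit(X, y, n_factors), X)
    residuals = y - y_fit                      # e1, e2, ..., eN
    preds = np.empty((n_boot, X_new.shape[0]))
    for b in range(n_boot):
        e_star = rng.choice(residuals, size=residuals.size, replace=True)
        model_b = pls1_fit(X, y_fit + e_star, n_factors)   # refit on pseudo-data
        preds[b] = pls1_predict(model_b, X_new)
    return preds.std(axis=0, ddof=1)

# Synthetic stand-in data (not the data sets shown in the slides).
rng = np.random.default_rng(2)
X_cal = rng.normal(size=(30, 10))
y_cal = X_cal @ rng.normal(size=10) + rng.normal(0.0, 0.2, 30)
X_test = rng.normal(size=(4, 10))
print(bootstrap_residual_std(X_cal, y_cal, X_test, n_factors=3))
```

The spread obtained this way reflects the variability of the fitted model. It is not by itself the full prediction error, which also includes the noise of the new sample.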
18The Martens-De Vries equation
- Expression used in the Unscrambler software
package (CAMO)
19The Martens-De Vries equation
20The Faber and Kowalski equation
K Faber, B. R. Kowalski, Chemom. Intell. Lab.
Syst. 34 (1996) 283-292
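The equation itself is not preserved in this transcript. As a stand-in, the sketch below uses a simplified leverage-based approximation of the kind often quoted in this context: prediction error variance is roughly (1 + h) times MSEC minus the variance of the reference measurement error. Treat that form, the function name and the numbers as assumptions. Here h is the leverage of the sample and MSEC is the mean squared error of calibration.

```python
import math

def sample_specific_std(leverage, msec, var_ref=0.0):
    """Approximate standard error of prediction for one sample.

    Assumed simplified form: var ~= (1 + leverage) * msec - var_ref.
    The result is clipped at zero, since the subtraction can go negative
    when the reference error dominates.
    """
    if leverage < 0 or msec < 0 or var_ref < 0:
        raise ValueError("leverage, msec and var_ref must be non-negative")
    variance = (1.0 + leverage) * msec - var_ref
    return math.sqrt(max(variance, 0.0))

# Hypothetical numbers: same model, two samples with different leverage.
msec, var_ref = 0.30, 0.10
print(round(sample_specific_std(0.05, msec, var_ref), 3))  # sample near the centre
print(round(sample_specific_std(0.60, msec, var_ref), 3))  # extreme sample
```

Subtracting the reference error variance is what makes the estimate refer to the true value, not the noisy reference value. Setting `var_ref=0.0` corresponds to neglecting the measurement error in the reference method.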
21Variations
22Comparison
Independent error estimates are not required:
all ingredients are estimated directly from the
data.
Valid only if measurement errors can be neglected;
it is assumed that all variables are observable.
23Embedded NIR
24Aim: a direct, in-field determination of the
dry matter content of the forage. For breeders,
knowing the dry matter content of the forage in
the field is really important.
25Embedded NIR instrument
Diode Array Zeiss instrument
26Embedded NIR instrument
InGaAs detector, 128 diodes
ZEISS CORONA 45, 950-1700 nm
[Diagram: source, grating, InGaAs diode-array detector and data treatment; output is absorbance (abs) versus wavelength (λ)]
Source G. Sinnaeve, 2nd International
Conference on Embedded NIR spectroscopy
Gembloux, November 18-19, 2004
27Constructing the models
Identification
Weighing
Fresh sample
Oven 70 °C, 48 h
Dry sample
Hammer Mill
Weighing
Coarse grinding
Dry Matter
Cyclotec Mill
1st grinding
Fine grinding
NIR predictions
Other criteria
28Dry Matter PLS calibration
PLS (8)
Calibration set
Test set
29- Comparison of the yield, expressed in kg DM/ha,
obtained with the oven and the embedded NIR methods
Yield (kg DM/ha)
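For reference, the arithmetic behind an oven-based dry matter value and a yield in kg DM/ha is short. The sketch below assumes that dry matter is the dry weight as a percentage of the fresh weight and that the DM yield is the fresh yield times that fraction. The function names and the numbers are hypothetical.

```python
def dry_matter_percent(fresh_weight_g, dry_weight_g):
    """Dry matter content (%) from weighing before and after oven drying."""
    if fresh_weight_g <= 0 or not 0 <= dry_weight_g <= fresh_weight_g:
        raise ValueError("need 0 <= dry weight <= fresh weight and fresh weight > 0")
    return 100.0 * dry_weight_g / fresh_weight_g

def dm_yield_kg_per_ha(fresh_yield_kg_per_ha, dm_percent):
    """Yield expressed in kg of dry matter per hectare."""
    return fresh_yield_kg_per_ha * dm_percent / 100.0

# Hypothetical plot: 200 g fresh sample weighs 50 g after drying.
dm = dry_matter_percent(200.0, 50.0)
print(dm, dm_yield_kg_per_ha(40000.0, dm))
```

With the embedded NIR method, `dm_percent` would come from the PLS prediction instead of the oven weighing, and the two resulting yields can then be compared plot by plot.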
30- Comparison of the classification or ranking
of the cultivars according to their yield,
expressed in kg DM/ha, obtained with the oven
and the embedded NIR methods
31Embedded data
Noise addition
32Embedded data
Faber and Kowalski equation
33Gas oil data
Faber and Kowalski equation
34Polyether Polyol data
Faber and Kowalski equation
35Formula versus resampling
- Formula
- insight in dominant sources of error
- evaluation is (usually) fast
- often difficult to obtain
- quite restrictive in their application, because
of distributional assumptions
36Formula versus resampling
- Resampling
- easy to implement
- not (very) restrictive in their application
- little insight (black box)
- evaluation is (relatively) slow
- not always clear how to resample
37Conclusions PART I, 1
- Monte-Carlo methods allow working on the
residuals and operating directly on the noise
- Monte-Carlo methods are easy to implement
38Conclusions PART I, 2
- The uncertainty obtained (Faber equation) is
with respect to the true values and is
sample-specific.
- The De Vries formula only works under the
classical regression assumption that all
variables are observable with negligible
measurement noise.
- Leverage-based formulas have recently been
proposed (successfully) for non-linear
variations of PLSR, multiway PLSR and PLSR after
OSC.
- In the future, all techniques (e.g. ANN, SVM)
should be adapted for the estimation of the
prediction uncertainty.
39PART II Imaging using a NIR camera
40NIR camera
- InGaAs camera
- 900-1700 nm / 10 nm
- 240 x 320 pixels
- Pixel size: 80 µm x 80 µm
- Surface analysed: 5 cm²
- 76 800 spectra, 24 MB
- 300-350 separated particles
- Time of analysis: 5 minutes
Spectral volume
Wavelengths
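The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes that both ends of the 900-1700 nm range are included at a 10 nm step and that each value is stored on 4 bytes; both are assumptions, not stated on the slide.

```python
# Spectral volume: rows x columns x wavelengths.
rows, cols = 240, 320
start_nm, stop_nm, step_nm = 900, 1700, 10
pixel_size_um = 80.0
bytes_per_value = 4  # assumption: 32-bit values

n_spectra = rows * cols                              # one spectrum per pixel
n_bands = (stop_nm - start_nm) // step_nm + 1        # assumes inclusive end points
pixel_area_cm2 = (pixel_size_um * 1e-4) ** 2         # 1 µm = 1e-4 cm
area_cm2 = n_spectra * pixel_area_cm2
size_mb = n_spectra * n_bands * bytes_per_value / 1e6

print(n_spectra, n_bands, round(area_cm2, 2), round(size_mb, 1))
```

Under these assumptions the numbers agree with the slide: 76 800 spectra, a little under 5 cm² and roughly 24 to 25 MB.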
43Analysis of raspberries by NIR imaging showing a
grading in the maturity
[Figure: PC 3 image with berries labelled i, ii and iii]
(Berries: i, low maturity; ii, medium maturity;
iii, ripe.)
Source Walloon Agricultural Research Centre,
2004 - 2005
44Analysis of white currants by NIR imaging.
[Figure: NIR images at 1340 nm and at 1410 nm]
Source Walloon Agricultural Research Centre,
Gembloux, Belgium (2004-2005)
45Analysis of single kernels by NIR imaging to
detect insect infested grains
A) Image at 1400 nm of three infested wheat kernels
B) Fifth PC image (PC 5) of the intact (1) and
infested (2, 3) coffee beans
Source Walloon Agricultural Research Centre,
Gembloux, Belgium (2004-2005)
46Analysis of wheat grains by NIR imaging
NIR image at 1140 nm, spectra of the germ (dotted
line) and albumen (continuous line), as well as
sixth PC image (PC 6) bringing to the fore the
germ of each kernel.
Source Walloon Agricultural Research Centre,
Gembloux, Belgium (2004-2005)
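The slides do not say how the PC images were computed. A common way to obtain such images, shown here only as a sketch, is to unfold the spectral volume into a pixels-by-wavelengths matrix, run a principal component analysis, and fold each score vector back into an image. The example assumes numpy and a random stand-in cube; the function name is hypothetical.

```python
import numpy as np

def pc_score_images(cube, n_components):
    """PCA score images from a (rows, cols, bands) spectral volume."""
    rows, cols, bands = cube.shape
    spectra = cube.reshape(rows * cols, bands)      # one spectrum per pixel
    centered = spectra - spectra.mean(axis=0)       # mean-centre each wavelength
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components] # PCA scores
    return scores.reshape(rows, cols, n_components) # fold back into images

# Random stand-in volume, much smaller than the 240 x 320 camera frame.
cube = np.random.default_rng(0).random((24, 32, 81))
images = pc_score_images(cube, n_components=6)
pc6_image = images[:, :, 5]                         # sixth PC as a grey-level image
print(images.shape, pc6_image.shape)
```

Each slice of the result can be displayed as an image in which pixels with similar spectral behaviour along that component get similar values.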
47References
Uncertainty
Estimation of partial least squares regression
prediction uncertainty when the reference values
carry a sizeable measurement error. J.A.
Fernández Pierna, L. Jin, F. Wahl, N. Faber, D.L.
Massart, Chemometrics and Intelligent Laboratory
Systems 65 (2003) 281-291
Imaging NIR camera
Combination of Support Vector Machines (SVM) and
Near Infrared (NIR) imaging spectroscopy for the
detection of meat and bone meal (MBM) in compound
feeds. J.A. Fernández Pierna, V. Baeten, A.
Michotte Renier, R.P. Cogdill and P. Dardenne,
Journal of Chemometrics 18 (2005)
Acknowledgements
Dr. N. Faber, http://www.chemometry.com/
Dr. V. Baeten, CRA-W
Dr. G. Sinnaeve, CRA-W
Dr. P. Dardenne, CRA-W
Prof. J.J. Claustriaux, FUSAGx
F.N.R.S. for financial support