Definition and overview of chemometrics
Paul Geladi
Head of Research, NIRCE; Chairperson, NIR Nord
Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå
Technobothnia, Vasa
paul.geladi _at_ btk.slu.se, paul.geladi _at_ syh.fi
Project geography
Chemometrics
- Mathematics
- Statistics
- Computer Science
- In Chemistry
Similar fields
- Biometrics 1900
- Psychometrics 1930
- Econometrics 1950
- Technometrics 1960
Chemometrics
- Design of Experiments (DOE)
- Exploratory Data Analysis
- Classification
- Regression and Calibration
Design of Experiments
- The most important approach, wherever it is possible
- Uses
  - ANOVA
  - F-test
  - t-test
  - Plots
  - Response surfaces
Design of Experiments
- Model: $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_K x_K + b_{11} x_1^2 + b_{22} x_2^2 + \dots + b_{KK} x_K^2 + b_{12} x_1 x_2 + \dots + e$
- Factors $x_1, x_2, \dots, x_K$ are changed systematically
- Response $y$ is measured and modeled
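Once the designed experiments have been run, the quadratic model can be fitted by ordinary least squares. A minimal sketch in Python with NumPy, assuming two factors (K = 2) in coded units, a hypothetical 3 x 3 full factorial design, and simulated responses; the coefficients in `b_true` are made up for illustration and do not come from any real experiment:

```python
import numpy as np

def quadratic_design_matrix(X):
    """Model columns for two factors: 1, x1, x2, x1^2, x2^2, x1*x2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

# Hypothetical 3 x 3 full factorial design in coded units (-1, 0, +1).
levels = [-1.0, 0.0, 1.0]
X = np.array([[a, b] for a in levels for b in levels])

# Hypothetical coefficients b0, b1, b2, b11, b22, b12, used only to simulate y.
b_true = np.array([10.0, 2.0, -1.5, -3.0, 0.5, 1.2])
rng = np.random.default_rng(0)
A = quadratic_design_matrix(X)
y = A @ b_true + rng.normal(scale=0.1, size=len(X))  # e: small random error

# Least-squares estimates of the model coefficients and the residuals.
b_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ b_hat
print(np.round(b_hat, 2))
```

With nine runs and six coefficients, three degrees of freedom remain for estimating the error, which is what the ANOVA and F-tests listed above operate on.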
Exploratory Data Analysis
- Design not possible
- Sampling situations
- Find structure
- Find groupings
- Find outliers
Classification
- Check for groupings (unsupervised)
- Use existing groupings (supervised)
- Visualize groupings
- Classify
- Test
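For the supervised case, one of the simplest classifiers assigns each new object to the class whose mean vector is nearest. This is a swapped-in illustrative technique (nearest class mean with Euclidean distance), not necessarily a method from these slides; the training data and the helper names `fit_class_means` and `classify` are hypothetical:

```python
import numpy as np

def fit_class_means(X, labels):
    """Return the class names and one mean vector per class."""
    classes = np.unique(labels)
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    return classes, means

def classify(X_new, classes, means):
    """Assign each row of X_new to the class with the nearest mean."""
    # Euclidean distance from every new object to every class mean.
    d = np.linalg.norm(X_new[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

# Hypothetical training set: two groups of objects, two variables each.
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
                    [4.0, 4.2], [4.3, 3.9], [3.8, 4.1]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
classes, means = fit_class_means(X_train, y_train)

# Test step: classify objects that were not used to build the model.
X_test = np.array([[0.9, 1.1], [4.1, 4.0]])
pred = classify(X_test, classes, means)
print(pred)  # ['A' 'B']
```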
Regression / Calibration
- Two types of variables X / y
- Relationship linear / nonlinear
- Model
- Diagnostics
- Residual
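A univariate straight-line calibration illustrates the model, diagnostics and residual steps. A minimal sketch, assuming the model $y = b_0 + b_1 x$ and hypothetical calibration standards (the numbers are invented); note that `np.polyfit` returns the slope first and the intercept second:

```python
import numpy as np

# Hypothetical calibration standards: known x (e.g. concentration), measured y.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.02, 0.21, 0.39, 0.62, 0.80, 1.01])

# Model: fit y = b0 + b1*x by least squares.
b1, b0 = np.polyfit(x, y, deg=1)
y_fit = b0 + b1 * x

# Residuals and two simple diagnostics.
residuals = y - y_fit
s_res = np.sqrt(np.sum(residuals**2) / (len(x) - 2))  # residual standard deviation
r2 = 1 - np.sum(residuals**2) / np.sum((y - y.mean())**2)

# Use the calibration: estimate x for an unknown sample from its measured y.
y_unknown = 0.50
x_pred = (y_unknown - b0) / b1
print(round(b1, 4), round(b0, 4), round(x_pred, 3))
```

Plotting `residuals` against `x` is the usual check for nonlinearity: a curved pattern suggests the linear model is not adequate.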
(Figure: y plotted against x)
Multivariate Data Analysis
- Sampled data, and designed experiments with too many responses
- Mining
- Hospitals
- Agriculture
- Food industry
- More
Nomenclature
- Samples are called objects
- Each property measured on an object is a variable
(Figure: a single measured value, 34.92; a spectrum with variables 1 to K; samples 1 to I, each represented as a vector)
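A vector is a collection of numbers. By convention, it is always a column vector.
$$\mathbf{a} = \begin{bmatrix} 12 \\ 3.6 \\ 11.1 \\ 5.9 \\ 34 \\ 0.5 \\ 1.4 \\ 17 \end{bmatrix}$$
The transpose of a vector is a row vector. The symbols for transpose are ′ and T: $\mathbf{a}'$ or $\mathbf{a}^{\mathsf{T}}$.
$$\mathbf{a}^{\mathsf{T}} = \begin{bmatrix} 12 & 3.6 & 11.1 & 5.9 & 34 & 0.5 & 1.4 & 17 \end{bmatrix}$$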
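In NumPy a plain 1-D array has no row or column orientation, so the sketch below stores the vector from this slide explicitly as an 8 x 1 column and obtains the row vector with `.T`; the variable names are illustrative:

```python
import numpy as np

# The column vector a, stored as an 8 x 1 array.
a = np.array([12, 3.6, 11.1, 5.9, 34, 0.5, 1.4, 17]).reshape(-1, 1)

# Its transpose: a 1 x 8 row vector.
a_T = a.T

print(a.shape)    # (8, 1)
print(a_T.shape)  # (1, 8)
```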
Particle size, 1 sample
Small particles, 35 samples
The data matrix
A data matrix is a vector of vectors: I objects (rows) by K variables (columns).
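Stacking the object vectors as rows gives the I x K data matrix. A sketch with hypothetical numbers (I = 3 objects, K = 4 variables); the row-per-object layout follows the nomenclature above:

```python
import numpy as np

# Hypothetical measurements: each object is a vector of K = 4 variables.
samples = [
    np.array([0.12, 0.35, 0.80, 0.44]),
    np.array([0.10, 0.33, 0.78, 0.47]),
    np.array([0.15, 0.40, 0.85, 0.41]),
]

# Each object vector becomes one row of X, so X has shape (I, K).
X = np.vstack(samples)
I, K = X.shape
print(I, K)  # 3 4

# Row i holds all variables of object i; column k holds variable k for all objects.
object_0 = X[0, :]
variable_2 = X[:, 2]
```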
Size histograms, all samples
(Figure axis: particle area)
(Figure: times in a batch reaction against NIR wavelengths)
Geometry of multivariate space
Problem
- I and K can be large
- Variables are correlated
- Univariate statistics do not apply
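Correlation between variables can be inspected through the K x K correlation matrix. A sketch on simulated data (seeded random numbers, not the patient data discussed next) in which the second variable is constructed from the first and the third is independent of both:

```python
import numpy as np

rng = np.random.default_rng(1)
I = 50                                     # number of objects
x1 = rng.normal(size=I)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=I)   # strongly related to x1
x3 = rng.normal(size=I)                    # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])          # I x K data matrix, K = 3

# K x K correlation matrix; rowvar=False means columns are the variables.
R = np.corrcoef(X, rowvar=False)
print(np.round(R, 2))
```

Here variables 1 and 2 carry almost the same information, something that analyzing each variable separately would not reveal.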
3 variables: blood oxygen, iron, hemoglobin
I patients
(Figures: three-dimensional plots with axes Hb, Fe and O2, one point per patient)
Properties of multivariate space
- Rotation
  - vectors unchanged / distances unchanged
- Translation
  - vectors changed / distances unchanged
- Rescaling / change of units
  - everything changes
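These properties can be checked numerically by comparing all pairwise distances between objects before and after each transformation. A sketch with random data; the rotation angle, shift vector and scale factor are arbitrary choices made for illustration:

```python
import numpy as np

def pairwise_distances(X):
    """I x I matrix of Euclidean distances between the rows (objects) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=2)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # hypothetical: 5 objects, 3 variables

theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])  # rotation about the third axis

D = pairwise_distances(X)
D_rot = pairwise_distances(X @ R.T)                            # rotation
D_shift = pairwise_distances(X + np.array([10.0, -3.0, 2.0]))  # translation
D_scaled = pairwise_distances(X * np.array([1000.0, 1.0, 1.0]))  # new units for variable 1

print(np.allclose(D, D_rot), np.allclose(D, D_shift), np.allclose(D, D_scaled))
# True True False
```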
Consequences
- We can move the coordinate system around
- The relative distances between objects do not change
- We can rotate the coordinate system
- Scale changes are important
- Move the coordinate system to the center of the data
- Scale properly
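Moving the origin to the center of the data is mean-centering. For "scale properly", one common choice (an assumption here; the slide does not name a method) is autoscaling, which divides each centered variable by its standard deviation so that every variable has unit variance. A sketch with hypothetical data; `autoscale` is an illustrative helper name:

```python
import numpy as np

def autoscale(X):
    """Mean-center each column, then divide by its standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std

# Hypothetical data: 4 objects, 2 variables in very different units.
X = np.array([[1.0, 200.0],
              [2.0, 180.0],
              [3.0, 260.0],
              [4.0, 240.0]])

X_centered = X - X.mean(axis=0)    # origin moved to the center of the data
X_auto, mean, std = autoscale(X)   # centered and scaled to unit variance
```

Without scaling, the second variable, simply because of its units, would dominate every distance between objects.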
Vectors (physics)
$$\mathbf{x} = [x_1, x_2, x_3]$$
$$\|\mathbf{x}\| = \left(x_1^2 + x_2^2 + x_3^2\right)^{1/2}$$
Geometry
$$c^2 = a^2 + b^2$$
(Figure: right triangle with legs a and b and hypotenuse c)
Vectors (K dimensions)
$$\mathbf{x} = [x_1, x_2, \dots, x_K]$$
$$\|\mathbf{x}\| = \left(x_1^2 + x_2^2 + \dots + x_K^2\right)^{1/2}$$
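The length formula works for any K, and the Euclidean distance between two objects is the length of their difference vector. A plain-Python sketch; the function names are illustrative:

```python
import math

def norm(x):
    """Length of a vector with any number of elements K."""
    return math.sqrt(sum(xi**2 for xi in x))

def distance(x, y):
    """Euclidean distance: the length of the difference vector x - y."""
    return norm([xi - yi for xi, yi in zip(x, y)])

print(norm([3.0, 4.0]))                      # 5.0
print(norm([1.0, 2.0, 2.0]))                 # 3.0
print(distance([1, 1, 1, 1], [2, 2, 2, 2]))  # 2.0
```

The first call is the Pythagorean case with a = 3, b = 4 and c = 5.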
Problem
- We cannot see in more than 3 dimensions
- Paper and computer screens offer only 2 to 2.5 dimensions
(Figures: three-dimensional plots with axes Hb, Fe and O2)
Projection
- 2D plane (screen, paper)
- Many projections possible
- Find a good one
- Find a few good ones
- What is good?
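One common answer to "what is good?" in chemometrics, assumed here rather than stated on the slide, is the projection that keeps as much of the data's variance as possible: principal component analysis (PCA). A sketch using the singular value decomposition of mean-centered, simulated 3-variable data that lie close to a plane:

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated data for I = 100 objects: spread mainly within a plane in 3D.
t = rng.normal(size=(100, 2)) * np.array([3.0, 1.0])
W = np.array([[0.6, 0.8, 0.0],
              [0.0, 0.0, 1.0]])              # two orthonormal directions
X = t @ W + 0.05 * rng.normal(size=(100, 3))  # small off-plane noise

Xc = X - X.mean(axis=0)                   # move the origin to the data center
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:2].T                              # K x 2: directions of the best plane
T = Xc @ P                                # I x 2: coordinates to plot on paper/screen

explained = s**2 / np.sum(s**2)           # variance fraction per direction
print(T.shape)                            # (100, 2)
print(explained[:2].sum())                # fraction kept by the 2D projection
```

Plotting the two columns of `T` against each other gives the 2D picture; `explained` indicates how much of the original spread that picture retains.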