Title: Model-based Classification in Food Authenticity Studies
- D. Toher (1, 2), G. Downey (1) and T.B. Murphy (2)
- Presented by Deirdre Toher
- (1) Ashtown Food Research Centre, Teagasc (formerly The National Food Centre), Dublin 15
- (2) Dept of Statistics, School of Computer Science and Statistics, Trinity College Dublin, Dublin 2
Outline
- Food authenticity
- Spectroscopic data
- Current mathematical methods
- Proposed alternative
- Dimension reduction
- Model-based clustering
- Updating
- Example near-infrared data with results
Food Authenticity: What and Why?
- Detecting when foods are not what they are claimed to be
- Tampering/adulteration, mislabelling
- Economic fraud worth millions of US dollars globally
- Promote quality products
- Build consumer trust
Food Authenticity: How?
- Near infrared spectroscopy
- Non-invasive
- Relatively inexpensive
- Multivariate Mathematics
- Partial Least Squares Regression
- Factorial Discriminant Analysis
- Model-based Clustering
- Other methods available (sp..)
Spectroscopic Data
- Near infrared transflectance spectroscopy
- High dimensional data
- Range 1100-2498 nm, reading every 2 nm
- 700 values for each sample
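The count of 700 values follows directly from the stated wavelength range and step; a quick check:

```python
# Wavelengths read by the instrument: 1100 nm to 2498 nm inclusive, every 2 nm.
wavelengths = list(range(1100, 2498 + 1, 2))

# Number of readings per spectrum: (2498 - 1100) / 2 + 1
n_values = len(wavelengths)
print(n_values)  # 700
```

Since 700 exceeds the 478 spectra in the honey example, a full 700 x 700 sample covariance matrix estimated from these data would be singular (its rank is at most the number of samples minus one), which is why dimension reduction is needed before model-based clustering.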
Current Mathematical Methods
- Discriminant Partial Least Squares Regression
- Factorial Discriminant Analysis
- Problem?
- Limited to two-group classification problems
- No quantification of certainty
Proposed Alternative
- Model-based clustering
- Extension of discriminant analysis
- Allows clusters to vary in shape and size
- Gives the probability of a sample belonging to each cluster/group
- Can classify problems with more than two groups
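Once each group has a fitted density, the probability of a sample belonging to each group follows from Bayes' rule. The sketch below is a minimal illustration, assuming Gaussian groups with diagonal covariance matrices (a simplification of the full covariance models) and any number of groups; the function names and example parameters are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (simplifying assumption)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posterior_probabilities(x, groups):
    """groups: list of (prior, mean, var) tuples, one per group (two or more).

    Returns P(group g | x) for each g by Bayes' rule, computed in log space
    (log-sum-exp) so that very small densities do not underflow to zero.
    """
    logs = [math.log(p) + log_gaussian_diag(x, m, v) for p, m, v in groups]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three groups differing in location, size and shape.
groups = [
    (0.5, [0.0, 0.0], [1.0, 1.0]),
    (0.3, [3.0, 3.0], [0.5, 2.0]),
    (0.2, [-3.0, 2.0], [4.0, 4.0]),
]
probs = posterior_probabilities([0.1, -0.2], groups)
```

Unlike a hard two-group rule, the output is a full probability vector, so the certainty of each assignment can be reported alongside it.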
Possible Cluster Shapes
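A common way of describing cluster shapes in model-based clustering is the eigenvalue decomposition of each group's covariance matrix. Here the scalar lambda_g controls the volume (size) of group g, the orthogonal matrix D_g its orientation, and the diagonal matrix A_g (with determinant 1) its shape. Holding each factor equal across groups or letting it vary yields a family of models; which members of this family were used in this study is not stated here.

```latex
\Sigma_g = \lambda_g \, D_g \, A_g \, D_g^{\top}, \qquad g = 1, \dots, G
```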
The Dimensionality Problem
- Model-based clustering requires dimension reduction
- for efficient computation
- to prevent singular covariance matrices
- Use wavelet analysis with thresholding
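Wavelet analysis with thresholding compresses each spectrum into a small number of wavelet coefficients. The sketch below uses the Haar wavelet (the simplest choice) and a hypothetical rule that keeps the k coefficient positions with the largest mean absolute value across spectra; the wavelet family, the thresholding rule and the function names are assumptions, not necessarily the method used in this study. The Haar transform as written needs a power-of-two length, so a 700-point spectrum would first have to be padded (e.g. to 1024) or truncated (e.g. to 512), which is also an assumption.

```python
import math


def haar_dwt(signal):
    """Full orthonormal Haar wavelet decomposition.

    Output order: [final approximation, coarsest details, ..., finest details].
    Assumes len(signal) is a power of two (e.g. a padded or truncated spectrum).
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    coeffs = [float(v) for v in signal]
    root2 = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        # Pairwise averages (approximation) and differences (detail), scaled to be orthonormal.
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / root2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / root2 for i in range(half)]
        coeffs[:half] = approx
        coeffs[half:length] = detail
        length = half
    return coeffs


def select_coefficients(spectra, k):
    """Hypothetical thresholding rule: keep the k coefficient positions with the
    largest mean absolute value over all spectra and discard the rest."""
    transformed = [haar_dwt(s) for s in spectra]
    p = len(transformed[0])
    score = [sum(abs(t[j]) for t in transformed) / len(transformed) for j in range(p)]
    keep = sorted(sorted(range(p), key=lambda j: score[j], reverse=True)[:k])
    reduced = [[t[j] for j in keep] for t in transformed]
    return keep, reduced
```

Because the transform is orthonormal, the total sum of squares of each spectrum is preserved, so discarding small coefficients loses little of the signal while reducing the number of variables each covariance matrix must describe.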
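EM Algorithm and Updating
- EM algorithm
- E-step: computes the expected value of the complete-data log-likelihood, given the current parameter estimates
- M-step: maximises this expected value with respect to the parameters
- commonly used in statistics for estimation when some values are missing
- Updating
- uses the previous estimates of the labels as a starting point for the iteration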
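A minimal sketch of EM with updating, assuming a one-dimensional Gaussian model per group (a deliberate simplification; in practice the inputs would be the reduced wavelet coefficients). Labelled training samples keep their known groups, while the test labels are treated as the missing data. The argument `z` holds the starting membership probabilities for the test samples: under updating, these would be the label estimates from a previous run. The function `em_update` and its arguments are hypothetical.

```python
import math


def log_normal(x, mu, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def em_update(train_x, train_y, test_x, z, n_groups, n_iter=50):
    """Semi-supervised EM sketch for 1-D Gaussian groups (hypothetical simplification).

    train_y: known group indices for the training samples.
    z: starting membership probabilities for each test sample (e.g. previous estimates).
    Assumes every group has at least one labelled training sample.
    """
    xs = list(train_x) + list(test_x)
    params = []
    for _ in range(n_iter):
        # M-step: weighted estimates; training samples weigh 1 or 0, test samples weigh z.
        params = []
        for g in range(n_groups):
            w = [1.0 if y == g else 0.0 for y in train_y] + [zi[g] for zi in z]
            n_g = sum(w)
            mu = sum(wi * xi for wi, xi in zip(w, xs)) / n_g
            var = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, xs)) / n_g
            params.append((n_g / len(xs), mu, max(var, 1e-6)))  # floor avoids zero variance
        # E-step: posterior group probabilities for each test sample (log-sum-exp).
        z = []
        for x in test_x:
            logs = [math.log(p) + log_normal(x, mu, var) for p, mu, var in params]
            top = max(logs)
            ws = [math.exp(l - top) for l in logs]
            total = sum(ws)
            z.append([wv / total for wv in ws])
    return params, z


# Usage: two groups; the starting z here is uninformative (0.5 / 0.5).
params, z = em_update([0.0, 0.2, 5.0, 5.2], [0, 0, 1, 1], [0.1, 5.1],
                      [[0.5, 0.5], [0.5, 0.5]], n_groups=2)
```

In this sketch the only difference between a fresh run and an updating run is the starting value of `z`; a good starting point can speed convergence and steer EM towards a better local maximum.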
Example: Honey Adulteration
- Irish honey extended with
- fructose:glucose mixtures
- fully inverted beet syrup
- high fructose corn syrup
- Total of 478 spectra
- 157 pure and 321 adulterated
- 225 with fructose:glucose mixtures
- 56 with fully inverted beet syrup
- 40 with high fructose corn syrup
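The results that follow use training sets with the correct proportions of each group. One way to draw such a set is to sample separately within each group. The sketch below uses the counts listed above; the helper `stratified_split`, the short group labels and the use of simple random sampling without replacement are assumptions, since the slides do not say how the splits were drawn.

```python
import random

# Sample counts from the honey data set (157 pure + 321 adulterated = 478).
COUNTS = {"pure": 157, "fructose:glucose": 225, "beet syrup": 56, "HFCS": 40}


def stratified_split(labels, train_fraction, seed=0):
    """Hypothetical helper: draw a training set that keeps each group's proportion."""
    rng = random.Random(seed)
    by_group = {}
    for i, lab in enumerate(labels):
        by_group.setdefault(lab, []).append(i)
    train = []
    for idx in by_group.values():
        k = round(train_fraction * len(idx))  # number of training samples from this group
        train.extend(rng.sample(idx, k))
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test


labels = [group for group, n in COUNTS.items() for _ in range(n)]
train, test = stratified_split(labels, 0.5)  # a 50 / 50 training / test split
```

For the pure or adulterated question alone, the same helper could be applied to two-valued labels ("pure" / "adulterated") instead.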
Classification Achieved
- Classification rates (%) on test set data, with the correct proportions of each type of adulterant in the training set, for the pure or adulterated question.
Training / Test   EM             EM Updating
50 / 50           94.72 (1.12)   94.43 (1.10)
25 / 75           93.22 (1.08)   93.05 (1.03)
10 / 90           90.82 (1.76)   92.22 (1.11)
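Each rate in these tables is the percentage of test spectra assigned to the correct group. Reading the bracketed figures as standard deviations over repeated random training/test splits is an assumption, as the slides do not define them; under that reading, one table cell could be computed as in this sketch (function names hypothetical).

```python
import statistics


def classification_rate(true_labels, predicted):
    """Percentage of test samples whose predicted group matches the true group."""
    correct = sum(t == p for t, p in zip(true_labels, predicted))
    return 100.0 * correct / len(true_labels)


def summarise(rates):
    """Mean and sample standard deviation of rates over repeated splits (assumed)."""
    return statistics.mean(rates), statistics.stdev(rates)
```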
Classification Achieved
Classification rates (%) on test set data, with the correct proportions of pure / adulterated samples in the training set, for the pure or adulterated question.
Training / Test   EM             EM Updating
50 / 50           94.38 (1.16)   94.11 (0.89)
25 / 75           93.50 (1.08)   93.03 (1.02)
10 / 90           90.54 (1.80)   92.05 (1.09)
Classification Achieved
Classification rates (%) on test set data using a 50 / 50 training / test split, with the correct proportions of pure / adulterated samples in the training set, for the type of adulteration question.
Question                 EM             EM Updating
Pure or adulterated?     91.09 (1.40)   90.64 (1.36)
Type of adulteration     86.23 (1.20)   84.12 (1.67)
Classification Achieved
Classification rates (%) on test set data using a 50 / 50 training / test split, with the correct proportions of each type of adulterant in the training set, for the type of adulteration question.
Question                 EM             EM Updating
Pure or adulterated?     89.41 (1.76)   88.61 (1.82)
Type of adulteration     85.70 (1.96)   83.57 (2.23)
Probability v Accurate Classification
- Probability of group membership, shown by colour (black: pure; red: adulterated)
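One way to relate the probability of group membership to classification accuracy is to split the test samples by the size of their largest posterior probability and compare accuracy in each part. The function below and its threshold of 0.9 are hypothetical choices for illustration.

```python
def accuracy_by_certainty(posteriors, true_labels, threshold=0.9):
    """Compare accuracy of confident vs uncertain predictions.

    posteriors: per-sample lists of group membership probabilities.
    true_labels: true group indices. Returns accuracy (0-1) per part, or None if empty.
    """
    parts = {"confident": [], "uncertain": []}
    for probs, truth in zip(posteriors, true_labels):
        pred = max(range(len(probs)), key=probs.__getitem__)  # most probable group
        key = "confident" if probs[pred] >= threshold else "uncertain"
        parts[key].append(pred == truth)
    return {k: (sum(v) / len(v) if v else None) for k, v in parts.items()}
```

If the probabilities are informative, accuracy should be noticeably higher among the confident predictions, so the uncertain ones can be flagged for further testing.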
Conclusions
- EM algorithm gives a method of predicting group membership
- Updating procedures effective with small training sets
- Quantifying certainty
- Allows cost of misclassification to be easily incorporated into modelling
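Because the model gives probabilities of group membership, misclassification costs can be folded in by choosing the group with the smallest expected cost rather than the largest probability. In this sketch the cost matrix values are hypothetical; with equal costs the rule reduces to assigning each sample to its most probable group.

```python
def min_expected_cost_decision(posterior, cost):
    """cost[j][k]: hypothetical cost of deciding group j when the true group is k.

    Returns (decision, expected_costs), where decision minimises
    sum_k cost[j][k] * posterior[k].
    """
    expected = [sum(c * p for c, p in zip(row, posterior)) for row in cost]
    decision = min(range(len(expected)), key=expected.__getitem__)
    return decision, expected


# Groups: 0 = pure, 1 = adulterated. Passing adulterated honey as pure is
# assumed here to be five times as costly as wrongly rejecting pure honey.
cost = [[0.0, 5.0],
        [1.0, 0.0]]
decision, expected = min_expected_cost_decision([0.7, 0.3], cost)
```

Here a sample that is 70% likely to be pure is still labelled adulterated, because the assumed cost of missing adulteration outweighs the higher probability of purity.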
Questions?
- Funded by
- Teagasc under the Walsh Fellowship Scheme
- Irish Department of Agriculture and Food
- (FIRM programme)
- Science Foundation Ireland
- Basic Research Grant scheme (Grant 04/BR/M0057)