Model-based Classification in Food Authenticity Studies

1
Model-based Classification in Food Authenticity
Studies
  • D. Toher (1,2), G. Downey (1) and T.B. Murphy (2)
  • Presented by Deirdre Toher
  • (1) Ashtown Food Research Centre, Teagasc
    (formerly The National Food Centre), Dublin 15
  • (2) Dept of Statistics, School of Computer Science
    and Statistics, Trinity College Dublin, Dublin 2

2
Outline
  • Food authenticity
  • Spectroscopic data
  • Current mathematical methods
  • Proposed alternative
  • Dimension reduction
  • Model-based clustering
  • Updating
  • Example: near-infrared data with results

3
Food Authenticity: what and why?
  • Detecting when foods are not what they are
    claimed to be
  • Tampering/adulteration, mislabelling
  • Economic fraud worth millions of US dollars
    globally
  • Promote quality products
  • Build consumer trust

4
Food Authenticity: how?
  • Near infrared spectroscopy
      • Non-invasive
      • Relatively inexpensive
  • Multivariate Mathematics
      • Partial Least Squares Regression
      • Factorial Discriminant Analysis
      • Model-based Clustering
      • Other methods available (sp..)

5
Spectroscopic Data
  • Near infrared transflectance spectroscopy
  • High dimensional data
  • Range 1100-2498 nm, reading every 2 nm
  • 700 values for each sample
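
The count of 700 values follows from the wavelength range and step quoted above. A short check, assuming both end points (1100 nm and 2498 nm) are included in the readings:

```python
# Wavelength grid: 1100 nm to 2498 nm inclusive, one reading every 2 nm.
# Assumption: both end points are recorded.
start_nm, stop_nm, step_nm = 1100, 2498, 2

wavelengths = list(range(start_nm, stop_nm + step_nm, step_nm))
n_values = len(wavelengths)

print(n_values)         # 700
print(wavelengths[-1])  # 2498
```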

6
Current Mathematical Methods
  • Discriminant Partial Least Squares Regression
  • Factorial Discriminant Analysis
  • Problem?
  • Limited to two-group classification problems
  • No quantification of certainty

7
Proposed Alternative
  • Model-based clustering
  • Expansion of discriminant analysis
  • Allows clusters to vary in shape and size
  • Gives probability of a sample being in each
    cluster/group
  • Can classify situations with more than two
    groupings
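
To illustrate what "probability of a sample being in each cluster/group" can look like, here is a minimal sketch for a single one-dimensional measurement and three Gaussian groups. The group names, weights, means and variances are hypothetical values chosen for illustration; they are not taken from the study, and the real spectra are multivariate.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def membership_probabilities(x, groups):
    """groups maps name -> (weight, mean, variance); returns name -> probability."""
    weighted = {name: w * normal_pdf(x, m, v) for name, (w, m, v) in groups.items()}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}

# Hypothetical groups with different sizes (weights) and spreads (variances).
groups = {
    "pure": (0.3, 0.0, 1.0),
    "adulterant_A": (0.5, 3.0, 2.0),
    "adulterant_B": (0.2, -2.0, 0.5),
}

probs = membership_probabilities(0.5, groups)
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

Because the output is a probability for every group rather than a single label, the same calculation covers two groups or more than two.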

8
Possible Cluster Shapes

9
The Dimensionality Problem
  • Model-based clustering requires dimension
    reduction
      • for efficient computation
      • to prevent singular covariance matrices
  • Use wavelet analysis with thresholding
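
The slides do not say which wavelet family or thresholding rule was used. As a rough sketch of the idea, the code below assumes an orthonormal Haar wavelet and a hard threshold, applied to a made-up signal of length 8 (this simple version needs a power-of-two length, which a real 700-point spectrum does not have).

```python
import math

def haar_dwt(signal):
    """Full orthonormal Haar transform; len(signal) must be a power of two."""
    coeffs = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        coeffs = detail + coeffs  # coarser detail coefficients go in front
    return approx + coeffs

def hard_threshold(coeffs, threshold):
    """Set coefficients with magnitude at or below the threshold to zero."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]

# Made-up "spectrum": two flat regions with small wiggles.
spectrum = [1.0, 1.1, 0.9, 1.0, 4.0, 4.1, 3.9, 4.0]

coeffs = haar_dwt(spectrum)
kept = hard_threshold(coeffs, threshold=0.5)  # threshold value is an assumption
n_nonzero = sum(1 for c in kept if c != 0.0)

print(n_nonzero, "of", len(coeffs), "coefficients kept")
```

Only the few large coefficients survive, so each sample is described by far fewer numbers than the raw spectrum, which is what keeps the covariance matrices estimable.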

10
EM Algorithm and Updating
  • EM algorithm
      • E-step: computes the expected value of the
        log-likelihood function
      • M-step: maximises that expected value
      • commonly used in statistics for estimating
        missing values
  • Updating
      • uses previous estimates of labels as a starting
        point for iteration
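
The following is one possible reading of the EM-plus-updating idea, not the authors' implementation. It assumes a one-dimensional, two-group Gaussian model: labelled samples keep their known labels, the unlabelled labels are treated as missing, and the estimates from the labelled data alone are the starting point for the iteration. All data values and function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def m_step(xs, resp):
    """M-step: weighted (weight, mean, variance) estimates for the two groups."""
    params = []
    for g in range(2):
        n_g = sum(r[g] for r in resp)
        mean = sum(r[g] * x for r, x in zip(resp, xs)) / n_g
        var = sum(r[g] * (x - mean) ** 2 for r, x in zip(resp, xs)) / n_g
        params.append((n_g / len(xs), mean, max(var, 1e-6)))
    return params

def e_step(xs, params):
    """E-step: probability of each group for every sample in xs."""
    resp = []
    for x in xs:
        dens = [w * normal_pdf(x, m, v) for w, m, v in params]
        total = sum(dens)
        resp.append([d / total for d in dens])
    return resp

def em_with_unlabelled(labelled_x, labels, unlabelled_x, n_iter=50):
    # Known labels are held fixed as 0/1 membership indicators.
    fixed = [[1.0, 0.0] if lab == 0 else [0.0, 1.0] for lab in labels]
    # Starting point: parameters estimated from the labelled data only.
    params = m_step(labelled_x, fixed)
    all_x = list(labelled_x) + list(unlabelled_x)
    for _ in range(n_iter):
        resp_unlabelled = e_step(unlabelled_x, params)    # update label estimates
        params = m_step(all_x, fixed + resp_unlabelled)   # refit on all samples
    return params, e_step(unlabelled_x, params)

# Hypothetical data: group 0 near 0, group 1 near 4.
labelled_x = [-1.0, 0.0, 1.0, 3.0, 4.0, 5.0]
labels = [0, 0, 0, 1, 1, 1]
unlabelled_x = [-0.5, 0.5, 0.2, 3.5, 4.5, 4.2]

params, resp = em_with_unlabelled(labelled_x, labels, unlabelled_x)
predicted = [0 if r[0] > r[1] else 1 for r in resp]
print(predicted)
```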

11
Example: Honey Adulteration
  • Irish honey extended with
      • fructose:glucose mixtures
      • fully inverted beet syrup
      • high fructose corn syrup
  • Total of 478 spectra
      • 157 pure and 321 adulterated
      • 225 with fructose:glucose mixtures
      • 56 with fully inverted beet syrup
      • 40 with high fructose corn syrup

12
Classification Achieved
  • Classification rates (%) on test-set data for the
    "pure or adulterated?" question, with the correct
    proportions of each type of adulterant in the
    training set.

Training % / Test %    EM              EM Updating
50 / 50                94.72 (1.12)    94.43 (1.10)
25 / 75                93.22 (1.08)    93.05 (1.03)
10 / 90                90.82 (1.76)    92.22 (1.11)
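
The slides do not describe how the training and test sets were drawn. As a sketch of what "correct proportions of each type of adulterant in the training set" could mean, the code below samples the same fraction from every group, using the group sizes from the honey example. The function name, the seed and the rounding rule are assumptions.

```python
import random

# Group sizes from the honey adulteration example.
group_sizes = {
    "pure": 157,
    "fructose:glucose": 225,
    "beet syrup": 56,
    "high fructose corn syrup": 40,
}

def stratified_split(group_sizes, train_fraction, seed=0):
    """Return (train, test) lists of (group, index) pairs, sampled per group."""
    rng = random.Random(seed)
    train, test = [], []
    for group, size in group_sizes.items():
        indices = list(range(size))
        rng.shuffle(indices)
        n_train = round(size * train_fraction)  # same fraction from every group
        train += [(group, i) for i in indices[:n_train]]
        test += [(group, i) for i in indices[n_train:]]
    return train, test

train, test = stratified_split(group_sizes, train_fraction=0.25)
print(len(train), len(test))
```

Repeating the split with different seeds gives a set of classification rates from which an average rate and its spread can be computed.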

13
Classification Achieved
Classification rates (%) on test-set data for the
"pure or adulterated?" question, with the correct
proportions of pure / adulterated samples in the
training set.

Training % / Test %    EM              EM Updating
50 / 50                94.38 (1.16)    94.11 (0.89)
25 / 75                93.50 (1.08)    93.03 (1.02)
10 / 90                90.54 (1.80)    92.05 (1.09)

14
Classification Achieved
Classification rates (%) on test-set data for the
"type of adulteration" question, using 50% training /
50% test data, with the correct proportion of pure /
adulterated samples in the training set.

Question                 EM              EM Updating
Pure or adulterated?     91.09 (1.40)    90.64 (1.36)
Type of adulteration     86.23 (1.20)    84.12 (1.67)

15
Classification Achieved
Classification rates (%) on test-set data for the
"type of adulteration" question, using 50% training /
50% test data, with the correct proportions of each
type of adulterant in the training set.

Question                 EM              EM Updating
Pure or adulterated?     89.41 (1.76)    88.61 (1.82)
Type of adulteration     85.70 (1.96)    83.57 (2.23)

16
Probability vs. Accurate Classification
  • Probability of group membership, by colour
    (black being pure, red being adulterated)

17
Conclusions
  • EM algorithm gives a method of predicting group
    membership
  • Updating procedures effective with small training
    sets
  • Quantifying certainty
  • Allows cost of misclassification to be easily
    incorporated into modelling
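
Since the model returns a probability for each group, a misclassification cost can be folded in by choosing the decision with the smallest expected cost. A minimal sketch follows; the probabilities and the cost values are hypothetical and only illustrate the calculation.

```python
def min_expected_cost_decision(probs, cost):
    """probs: true group -> probability; cost[decision][true group] -> cost."""
    expected = {
        decision: sum(cost[decision][group] * p for group, p in probs.items())
        for decision in cost
    }
    return min(expected, key=expected.get), expected

# Hypothetical group membership probabilities for one sample.
probs = {"pure": 0.7, "adulterated": 0.3}

# Hypothetical costs: passing an adulterated sample as pure is 5 times
# worse than flagging a pure sample as adulterated.
cost = {
    "pure": {"pure": 0.0, "adulterated": 5.0},
    "adulterated": {"pure": 1.0, "adulterated": 0.0},
}

decision, expected = min_expected_cost_decision(probs, cost)
print(decision, expected)
```

With these costs the sample is flagged as adulterated even though "pure" is the more probable group, which a hard 0/1 classifier without probabilities could not express.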

18
Questions?
  • Funded by
      • Teagasc under the Walsh Fellowship Scheme
      • Irish Department of Agriculture and Food
        (FIRM programme)
      • Science Foundation of Ireland
        Basic Research Grant scheme (Grant 04/BR/M0057)