Net Analyte Signal Based Multivariate Calibration Methods

1

4th Iranian Chemometrics Workshop (ICW)
Zanjan, 2004
2
The Problem of Factor Selection in
PCA-Based Calibration Methods
  • By
  • Bahram Hemmateenejad
  • Medicinal & Natural Products Chemistry Research
    Center, Shiraz University of Medical Sciences

3
Multivariate Calibration
  • Regression equation relating measurements on m
    samples to k different variables:
  • $y = Xb$
  • $y$ ($m \times 1$): dependent variable or predicted
    variable
  • $X$ ($m \times k$): independent variables or predictor
    variables
  • $b$ ($k \times 1$): regression coefficients

4
  • Multicomponent Analysis
  • y: concentration of the analyte
  • X: analytical signals recorded at k different
    channels, e.g. absorbance at different wavelengths
  • QSAR/QSPR Studies
  • y: chemical property or biological activity
  • X: molecular descriptors representing structural
    features of molecules numerically

5
Problems associated with MLR
  • Collinearity between the independent variables (X)
  • Number of independent variables (k) should be much
    lower than the number of samples (m)

A reduced number of variables must be used
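
The two MLR problems listed above can be seen numerically. The following is a minimal NumPy sketch on synthetic data (the sizes, seed, and noise level are illustrative assumptions, not values from the talk): with more variables than samples, X^T X is rank deficient, and with two nearly identical columns its condition number becomes very large.

```python
import numpy as np

rng = np.random.default_rng(0)

# Case 1: more variables (k) than samples (m).
m, k = 10, 25
X = rng.normal(size=(m, k))
XtX = X.T @ X                          # k x k, but its rank cannot exceed m
rank = np.linalg.matrix_rank(XtX)      # rank < k, so (X^T X)^-1 does not exist

# Case 2: collinearity between two independent variables.
m2 = 50
x1 = rng.normal(size=m2)
x2 = x1 + 1e-6 * rng.normal(size=m2)   # almost a copy of x1
X_col = np.column_stack([x1, x2])
cond = np.linalg.cond(X_col.T @ X_col) # very large: inversion is unstable

print("rank of X^T X:", rank, "of", k)
print("condition number with collinear columns:", cond)
```

In both cases the MLR coefficients are either undefined or extremely sensitive to noise, which is the motivation for reducing the number of variables.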
6
  • Feature selection
  • The variables are selected based on their
    generalization ability using selection methods
    such as stepwise variable selection, genetic
    algorithms, simulated annealing, ...
  • Feature extraction
  • The variables are projected onto new
    coordinate axes of lower dimension
  • Principal Component Analysis (PCA) or Factor
    Analysis (FA)

7
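PCA or FA or PFA
  • $X = TP^T$
  • $X$ ($m \times k$)
  • $T$ ($m \times k$)
  • $P$ ($k \times k$)
  • Columns of $T$: $t_1, t_2, t_3, t_4, t_5, \dots, t_k$ (scores)
  • Rows of $P^T$: $p_1^T, p_2^T, p_3^T, p_4^T, p_5^T, \dots, p_k^T$ (loadings)
  • $\lambda$: $\lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5, \dots, \lambda_k$ (eigenvalues)
  • $\lambda_1 > \lambda_2 > \lambda_3 > \lambda_4 > \lambda_5 > \dots > \lambda_k$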

8
  • Each vector of T or P is called an eigenvector,
    PC, or factor
  • $\lambda_i$ gives the amount of variance in the X
    matrix that is explained by the corresponding
    eigenvector ($t_i$ or $p_i$)
  • Only a reduced set of PCs is needed to reproduce
    the original data matrix without losing
    significant information

$(m \times k) = (m \times f)\,(f \times k)$
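
As an illustration of the decomposition and of the reduced reproduction with f factors, here is a minimal NumPy sketch. It assumes the convention that scores are the columns of T and loadings the columns of P (so X = T P^T), uses the SVD to obtain them, and works on synthetic, uncentered data. The sizes and noise level are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, f = 20, 6, 2

# Synthetic data: rank-f signal plus small noise (illustrative only).
X = rng.normal(size=(m, f)) @ rng.normal(size=(f, k))
X = X + 1e-3 * rng.normal(size=(m, k))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U * s          # scores, m x k (columns t_i)
P = Vt.T           # loadings, k x k (columns p_i)
eigvals = s ** 2   # eigenvalues of X^T X, already in decreasing order

X_full = T @ P.T                 # all k factors reproduce X exactly
X_f = T[:, :f] @ P[:, :f].T      # (m x f)(f x k): reduced reproduction

print("eigenvalues:", eigvals)
print("max abs residual with f factors:", np.abs(X - X_f).max())
```

The first f eigenvalues dominate, and the residual left by the reduced model is of the order of the added noise.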
9
  • f is the number of significant factors
  • f is the rank of the original data matrix
  • f describes the complexity of the X matrix
  • Ideally, f is the number of nonzero eigenvalues
  • f can be determined by the theory of FA
  • Scree plot, indicator function, imbedded error,
    real error, ...
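
The slide names these criteria without giving formulas. The sketch below assumes the commonly cited Malinowski definitions (an assumption, not taken from the talk): for a matrix with m rows and k columns (m >= k) and eigenvalues λ_j of X^T X, RE(f) = sqrt(Σ_{j>f} λ_j / (m (k − f))), IE(f) = RE(f) sqrt(f / k), and IND(f) = RE(f) / (k − f)^2. The function name `factor_indicators` and the synthetic data are hypothetical.

```python
import numpy as np

def factor_indicators(X):
    """Return a list of (f, RE, IE, IND) for f = 1 .. k-1."""
    m, k = X.shape
    if m < k:                      # the formulas assume rows >= columns
        X, (m, k) = X.T, (k, m)
    lam = np.linalg.svd(X, compute_uv=False) ** 2   # decreasing eigenvalues
    out = []
    for f in range(1, k):
        re = np.sqrt(lam[f:].sum() / (m * (k - f)))  # real error
        ie = re * np.sqrt(f / k)                     # imbedded error
        ind = re / (k - f) ** 2                      # indicator function
        out.append((f, re, ie, ind))
    return out

# Synthetic rank-2 data with small noise (illustrative only).
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 8))
X = X + 1e-3 * rng.normal(size=(60, 8))

table = factor_indicators(X)
f_ind = min(table, key=lambda row: row[3])[0]   # f at the minimum of IND
for f, re, ie, ind in table:
    print(f, re, ie, ind)
print("f suggested by the IND minimum:", f_ind)
```

A scree plot would show the same information graphically: the eigenvalues drop sharply up to f and then level off.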

10
PCA-Based Regression Methods
  • MLR (Classical Least Squares)
  • $y = Xb$
  • $b = (X^T X)^{-1} X^T y$
  • $y_{new} = x_{new}\, b$
  • Principal Component Regression (PCR)
  • $X = TP^T$
  • $y = Tb$
  • $b = (T^T T)^{-1} T^T y$
  • $t_{new} = x_{new}\, P$
  • $y_{new} = t_{new}\, b$
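
The MLR and PCR equations on this slide can be written out directly. The following is a minimal NumPy sketch on synthetic data. It assumes loadings are stored as the columns of P (so T = X P), keeps the first f PCs, and skips mean-centering for brevity. The latent sources `S`, profiles `L`, and coefficients `a` are hypothetical values invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
m, k, f = 30, 5, 2
S = rng.normal(size=(m, f))            # hypothetical latent sources
L = rng.normal(size=(f, k))            # hypothetical source profiles
a = np.array([1.0, -2.0])              # hypothetical true coefficients
X = S @ L + 1e-3 * rng.normal(size=(m, k))
y = S @ a + 1e-3 * rng.normal(size=m)

# MLR: b = (X^T X)^-1 X^T y
b_mlr = np.linalg.solve(X.T @ X, X.T @ y)

# PCR: regress y on the first f score vectors instead of on X
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:f].T                           # k x f loadings
T = X @ P                              # m x f scores
b_pcr = np.linalg.solve(T.T @ T, T.T @ y)

# Prediction for a new sample
s_new = rng.normal(size=(1, f))
x_new = s_new @ L
y_true = (s_new @ a)[0]
y_mlr = (x_new @ b_mlr)[0]             # y_new = x_new b
y_pcr = (x_new @ P @ b_pcr)[0]         # t_new = x_new P, y_new = t_new b

print(y_true, y_mlr, y_pcr)
```

Because the score vectors are mutually orthogonal, T^T T is diagonal and its inversion is well conditioned even when X^T X is not.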

11
Some Questions
  • How many PCs must be used in PCR?
  • Which PCs should be considered in PCR modeling?
  • Is the magnitude of an eigenvalue necessarily a
    measure of its significance for the calibration?
  • Significance of factor selection

12
Top-down Eigenvalue Ranking (ER)
  • Factors are entered into the model one after
    another in order of decreasing eigenvalue
  • Each time a new factor is entered, the regression
    model is built and its performance is validated
    by an existing procedure such as cross-validation
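
A minimal sketch of the ER procedure, assuming leave-one-out cross-validation and PRESS (the sum of squared prediction errors) as the validation criterion. The slide names cross-validation only as an example, so both choices are assumptions, as are the function name `press_by_n_factors` and the synthetic data. The PCA is recomputed inside each fold and the data are left uncentered for brevity.

```python
import numpy as np

def press_by_n_factors(X, y, max_f):
    """Leave-one-out PRESS of PCR using the first 1..max_f PCs."""
    m = X.shape[0]
    press = np.zeros(max_f)
    for i in range(m):
        train = np.arange(m) != i
        Xtr, ytr = X[train], y[train]
        # PCs of the training set, ordered by decreasing eigenvalue
        _, _, Vt = np.linalg.svd(Xtr, full_matrices=False)
        for f in range(1, max_f + 1):
            P = Vt[:f].T
            b = np.linalg.lstsq(Xtr @ P, ytr, rcond=None)[0]
            press[f - 1] += (y[i] - X[i] @ P @ b) ** 2
    return press

# Synthetic data with three latent sources (illustrative only).
rng = np.random.default_rng(4)
S = rng.normal(size=(25, 3))
X = S @ rng.normal(size=(3, 10)) + 1e-3 * rng.normal(size=(25, 10))
y = S @ np.array([1.0, -1.0, 0.5]) + 1e-3 * rng.normal(size=25)

press = press_by_n_factors(X, y, max_f=5)
best_f = int(np.argmin(press)) + 1
print(press, best_f)
```

Factors are added strictly in eigenvalue order here, so a PC with a small eigenvalue but a strong relation to y can only enter after all larger ones.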

21
Top-down Correlation Ranking (CR)
  • First, the correlation between each of the
    factors and the dependent variable
    (concentration, y) is determined
  • Then the factors are entered into the model
    consecutively in order of decreasing correlation
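
A minimal sketch of correlation ranking, assuming the absolute Pearson correlation between each score vector and y as the ranking criterion (the slide does not say which correlation measure is used). The function name `correlation_ranking` is hypothetical. The synthetic data are built so that y depends on the third PC, which has a small eigenvalue.

```python
import numpy as np

def correlation_ranking(T, y):
    """Order the columns of T by decreasing |Pearson r| with y."""
    r = np.array([abs(np.corrcoef(T[:, j], y)[0, 1])
                  for j in range(T.shape[1])])
    return np.argsort(-r), r

rng = np.random.default_rng(5)
m, k = 40, 4
U, _ = np.linalg.qr(rng.normal(size=(m, k)))   # orthonormal score directions
V, _ = np.linalg.qr(rng.normal(size=(k, k)))   # orthonormal loadings
s = np.array([10.0, 5.0, 1.0, 0.5])            # chosen singular values
X = (U * s) @ V.T
y = U[:, 2] + 0.01 * rng.normal(size=m)        # y follows the 3rd PC

_, _, Vt = np.linalg.svd(X, full_matrices=False)
T = X @ Vt.T                                   # scores in eigenvalue order
order, r = correlation_ranking(T, y)
print("CR order (0-based PC indices):", order)
print("|r| per PC:", r)
```

With eigenvalue ranking the third PC would enter only after the first two, whereas correlation ranking places it first.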

24
Other factor selection methods
  • Stepwise selection procedure
  • Search algorithms
  • Simulated annealing
  • Genetic algorithm

25
Some references
  • Xie YL, Kalivas JH. Evaluation of principal
    component selection methods to form a global
    prediction model by principal component
    regression. Anal. Chim. Acta 1997, 348, 19-27.
  • Sutter JM, Kalivas JH. Which principal components
    to utilize for principal component regression. J.
    Chemometrics 1992, 6, 217-225.
  • Sun J. A correlation principal component
    regression analysis of NIR data. J. Chemometrics
    1995, 9, 21-29.
  • Depczynski U, Frost VJ, Molt K. Genetic
    algorithms applied to the selection of factors in
    principal component regression. Anal. Chim. Acta
    2000, 420, 217-227.
  • Barros AS, Rutledge DN. Genetic algorithm applied
    to the selection of principal components.
    Chemometrics Intell. Lab. Syst. 1998, 40, 65-81.
  • Verdu-Andres J, Massart DL. Comparison of
    prediction- and correlation-based methods to
    select the best subset of principal components
    for principal component regression and detect
    outlying objects. Appl. Spectrosc. 1998, 52,
    1425-1434.
  • Xie YL, Kalivas JH. Local prediction models by
    principal component regression. Anal. Chim. Acta
    1997, 348, 29-38.
  • Ferre L. Selection of components in principal
    component analysis: a comparison of methods.
    Comput. Stat. Data Anal. 1995, 19, 669-682.

26
A QSPR example
  • Quantitative Structure-Electrochemistry
    Relationship Study of Some Organic Compounds
  • Dependent variable
  • Half-wave reduction potential ($E_{1/2}$) of 69
    compounds
  • Independent variables
  • 1150 theoretical molecular descriptors calculated
    by DRAGON software

30
Principal Component-Artificial Neural Network
(PC-ANN)
  • ANN is a nonlinear, non-parametric modeling method
  • Feature selection is more important for ANN
  • Feature selection-based ANN modeling is a complex
    procedure
  • Orthogonalizing the variables before introducing
    them to the network substantially decreases the
    computational time and improves the overall
    performance of the ANN
  • PC-ANN is a feature extraction-based algorithm
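
The PC-ANN idea can be sketched as PCA for feature extraction followed by a small network trained on the scores. The example below is a toy illustration only: the one-hidden-layer tanh network, the plain gradient-descent training, the layer size, the learning rate, and the synthetic data are all assumptions, not the architecture or data used in the cited work.

```python
import numpy as np

rng = np.random.default_rng(6)
m, k, f, h = 60, 8, 2, 4               # samples, variables, PCs, hidden units

# Synthetic data: y is a nonlinear function of two latent sources.
S = rng.normal(size=(m, f))
X = S @ rng.normal(size=(f, k)) + 1e-3 * rng.normal(size=(m, k))
y = np.tanh(S[:, 0]) + 0.5 * S[:, 1] ** 2

# Feature extraction: scores on the first f PCs, scaled to unit variance.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
T = X @ Vt[:f].T
Z = T / T.std(axis=0)

# One-hidden-layer network: pred = tanh(Z W1 + b1) W2 + b2
W1 = 0.5 * rng.normal(size=(f, h))
b1 = np.zeros(h)
W2 = 0.5 * rng.normal(size=h)
b2 = 0.0

lr, losses = 0.05, []
for _ in range(2000):
    H = np.tanh(Z @ W1 + b1)           # hidden activations, m x h
    err = H @ W2 + b2 - y              # prediction error
    losses.append(np.mean(err ** 2))
    g = 2.0 * err / m                  # d(MSE)/d(pred)
    gH = np.outer(g, W2) * (1.0 - H ** 2)
    gW1, gb1 = Z.T @ gH, gH.sum(axis=0)
    gW2, gb2 = H.T @ g, g.sum()
    W1 -= lr * gW1
    b1 -= lr * gb1
    W2 -= lr * gW2
    b2 -= lr * gb2

print("MSE before training:", losses[0], "after:", losses[-1])
```

The network sees only f orthogonal inputs instead of k correlated ones, which keeps the number of weights small. Which PCs to feed in is the factor selection question raised earlier.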

31
PC-GA-ANN Algorithm
  • Genetic algorithm applied to the selection of
    factors in PC-ANN modeling
  • The set of PCs selected by GA could model the
    structure-antagonist activity of the calcium
    channel blockers better than the ER procedure
  • B. Hemmateenejad, M. Akhond, R. Miri, M.
    Shamsipur, J. Chem. Inf. Comput. Sci. 43 (2003)
    1328.
  • How can the factors be ranked by their
    correlation coefficient in PC-ANN?

32
CR-PC-ANN Algorithm
  • Correlation ranking procedure for factor
    selection in PC-ANN modeling
  • The nonlinear relationship between each of
    the PCs and the dependent variable (y) was
    modeled by a separate ANN model
  • The subset of PCs selected by CR was nearly the
    same as that selected by GA, so the two factor
    selection procedures gave similar results
  • B. Hemmateenejad, Chemometrics and Intelligent
    Laboratory Systems, 2004, accepted.

33
  • Application of ab initio theory to QSAR study of
    the 1,4-dihydropyridine-based calcium channel
    blockers using GA-MLR and PC-GA-ANN procedures,
    B. Hemmateenejad, M.A. Safarpour, R. Miri, F.
    Taghavi, Journal of Computational Chemistry 25
    (2004) 1495.
  • Highly Correlating Distance-Connectivity-Based
    Topological Indices. 2. Prediction of 15
    Properties of a Large Set of Alkanes Using a
    Stepwise Factor Selection-Based PCR Analysis, M.
    Shamsipur, R. Ghavami, B. Hemmateenejad, H.
    Sharghi, QSAR & Combinatorial Science, 2004,
    accepted.
  • Quantitative Structure-Electrochemistry
    Relationship Study of Some Organic Compounds
    Using PCR and PC-ANN, B. Hemmateenejad, M.
    Shamsipur, Internet Electronic Journal of
    Molecular Design 3 (2004) 316.
  • Toward an Optimal Procedure for PC-ANN Model
    Building: Prediction of the Carcinogenic Activity
    of a Large Set of Drugs, B. Hemmateenejad, M.A.
    Safarpour, R. Miri, N. Nesari, Journal of
    Chemical Information and Computer Sciences,
    revised.
  • Optimal QSAR analysis of the carcinogenic
    activity of drugs by correlation ranking and
    genetic algorithm-based PCR, B. Hemmateenejad,
    Journal of Chemometrics, submitted.

34
Future Work
  • Selection of Latent Variables in PLS
  • Application of other selection algorithms such as
    successive projections algorithm
  • Comparison between the importance of factor
    selection in multicomponent analysis and
    QSAR/QSPR studies
  • Application of the factor selection-based ANN
    modeling in multicomponent analysis
  • Validation of the different factor selection
    algorithms by new criteria