Chance Correlation in QSAR studies - PowerPoint PPT Presentation


1
Chance Correlation in QSAR studies
Ahmadreza Mehdipour
Medicinal Natural Product Chemistry Research Center
2
Correlation or causation?
  • Correlation is essential but not sufficient
  • Correlation is meaningless unless its cause (or
    role) in the biological activity is interpreted
  • A satisfactory QSAR correlation does not prove
    that a particular descriptor is the cause of the
    compound's activity

3
Chance Correlation
  • Topliss Ratio (J. Med. Chem. 1972, 15, 1066)
  • A common misconception: the ratio of variables in
    the model to sample size
  • What actually matters: the ratio of variables in
    the data pool to sample size
  • The problem was re-examined by Livingstone
    (J. Med. Chem. 2005, 48, 661)

4
  • Topliss et al. demonstrated that the more
    independent variables (X) that are available for
    selection in a multiple linear regression model,
    the more likely it is that a model will be found
    by chance. These authors recommended that, in
    order to reduce the risk of chance correlations,
    there should be a certain ratio of data points to
    the number of independent variables available.
    Unfortunately, this ratio was often misinterpreted
    as the number of data points to the number of
    independent variables in the final model, a
    practice that did very little, if anything, to
    reduce chance effects.
  • D.W. Salt, S. Ajmani, R. Crichton, D.J.
    Livingstone, An improved approximation to the
    estimation of the critical F values in best
    subset regression. J. Chem. Inf. Model. 47 (2007)
    143-149.

5
Chance Correlation: How does it occur?
  • A trial example with random data
  • Characteristics
  • N (sample size) = 20
  • K (number of variables in data pool) = 10, 20, 50, 75, 100
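The trial above can be reproduced in spirit with a short Python sketch. It rests on assumptions that are ours, not the presentation's: every descriptor and the response are independent standard-normal noise, and the model is restricted to a single descriptor (P = 1), so "variable selection" simply picks the descriptor with the highest squared correlation with Y. The helper names (`r_squared`, `best_single_r2`, `mean_best_r2`) are illustrative only.

```python
import random
import statistics

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R2 of a one-descriptor linear fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def best_single_r2(n, k, rng):
    """Draw N samples of K random descriptors plus a random Y, then return
    the largest R2 found by selecting the single best descriptor."""
    y = [rng.gauss(0, 1) for _ in range(n)]
    pool = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
    return max(r_squared(x, y) for x in pool)

def mean_best_r2(n, k, runs=200, seed=1):
    """Average best-by-chance R2 over repeated random data sets."""
    rng = random.Random(seed)
    return statistics.fmean(best_single_r2(n, k, rng) for _ in range(runs))

if __name__ == "__main__":
    for k in (10, 20, 50, 75, 100):
        print(f"N = 20, K = {k:3d}: mean best R2 = {mean_best_r2(20, k):.3f}")
```

Although Y is pure noise, the R2 of the selected descriptor grows steadily with K at fixed N, which is the chance-correlation effect the following slides plot.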

6
N = 20, K = 10
7
N = 20, K = 20
8
N = 20, K = 50
9
N = 20, K = 75
10
N = 20, K = 100
11
Avoiding chance correlation
  • What should we do?

12
Solutions for detection of chance correlation
  • Fmax critical
  • Randomization of Y (input scrambling)
  • Validation procedures

13
Fmax Critical
  • Livingstone approach
  • The normal tabulated F is valid for judging
    significance ONLY WHEN K = P
  • K: number of variables in data pool
  • P: number of variables in model

14
Fmax Critical
  • However, in most cases K >> P
  • K: number of variables in data pool
  • P: number of variables in model
  • N: sample size
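A worked illustration of why the tabulated F breaks down when K >> P. The overall F of an MLR model with P descriptors and N samples follows from its R2 as in the first line below. As a rough approximation, assuming (our simplification) P = 1 and treating the K single-descriptor tests as independent (they are not exactly, since they share the same Y), the chance that at least one random descriptor passes the 5% test is given by the second line. For K = P = 1 this reduces to the nominal 0.05.

```latex
F = \frac{R^2 / P}{(1 - R^2) / (N - P - 1)}

\Pr\left(\max_{j \le K} F_j > F_{0.05}\right) \approx 1 - (1 - 0.05)^{K},
\qquad K = 10:\ \approx 0.40, \qquad K = 50:\ \approx 0.92
```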

15
Introduction of Fmax Critical
  • Simulated random data
  • Run 1000 times
  • Different N, K and P
  • Obtain Fmax for each combination
  • (for a significance level of 5%)
  • Checked against some known data sets
  • www.cmd.port.ac.uk
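The simulation steps above can be sketched in Python as a simplified illustration, not the authors' program. Assumptions: models are restricted to P = 1, so the best subset is the descriptor with the highest R2; F is computed from that R2 with the standard regression F formula; and the 95th percentile of the 1000 simulated maximum F values is taken as the Fmax critical value at the 5% level. The function names are hypothetical.

```python
import random
import statistics

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R2 of a one-descriptor linear fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def f_from_r2(r2, n, p):
    """Overall regression F statistic for p descriptors and n samples."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

def fmax_critical(n, k, runs=1000, alpha=0.05, seed=7):
    """Monte Carlo estimate of the critical Fmax when the best of k random
    descriptors is selected (P = 1 only, a simplifying assumption)."""
    rng = random.Random(seed)
    fmax_values = []
    for _ in range(runs):
        y = [rng.gauss(0, 1) for _ in range(n)]
        best_r2 = max(
            r_squared([rng.gauss(0, 1) for _ in range(n)], y) for _ in range(k)
        )
        fmax_values.append(f_from_r2(best_r2, n, 1))
    # 99 percentile cut points; pick the (1 - alpha) one, e.g. the 95th for alpha = 0.05
    cut_points = statistics.quantiles(fmax_values, n=100)
    return cut_points[round((1 - alpha) * 100) - 1]

if __name__ == "__main__":
    for k in (1, 10, 50):
        print(f"N = 20, K = {k:2d}, P = 1: Fmax critical ~ {fmax_critical(20, k):.2f}")
```

With K = 1 the estimate should land near the tabulated F(1, 18) value of about 4.41; as K grows, the critical value rises well above it, so a model judged against the ordinary table would be declared significant far too often.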

16
Randomization of Y
  • Y values are randomly reassigned to the samples,
    while the descriptor matrix is left unchanged
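A minimal Python sketch of Y-randomization for an already chosen model. It assumes a single-descriptor model and synthetic data, both our own choices for illustration: the Y values are shuffled among the samples, the same model is refitted, and the real R2 is compared with the R2 values obtained from the scrambled Ys. `y_randomization` is a hypothetical helper name.

```python
import random
import statistics

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R2 of a one-descriptor linear fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_randomization(x, y, runs=1000, seed=3):
    """Return the real R2 and the list of R2 values after shuffling Y."""
    rng = random.Random(seed)
    real = r_squared(x, y)
    scrambled = []
    for _ in range(runs):
        y_perm = y[:]        # copy so the original Y stays intact
        rng.shuffle(y_perm)  # randomly reassign Y values to samples
        scrambled.append(r_squared(x, y_perm))
    return real, scrambled

if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(20)]
    y = [2.0 * a + rng.gauss(0, 0.5) for a in x]  # Y genuinely depends on x
    real, scrambled = y_randomization(x, y)
    p_value = sum(r >= real for r in scrambled) / len(scrambled)
    print(f"real R2 = {real:.3f}, max scrambled R2 = {max(scrambled):.3f}, p = {p_value:.3f}")
```

A genuine model should score clearly above every scrambled run; if scrambled Ys reach a similar R2, the original fit may be a chance correlation.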

17
Y-randomization
  • However, this method should also be performed
    during the variable selection process
  • If R2max and Q2max obtained with the scrambled Ys
    are low, then the risk of chance correlation is low
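To include variable selection in the randomization, each scrambled Y has to go through the whole selection step again, not just a refit of the final model. The sketch below makes the same simplifying assumptions as before (P = 1, best-of-K selection by R2, synthetic data; function names hypothetical) and tracks only R2max; Q2max could be recorded in the same loop with a cross-validated Q2.

```python
import random
import statistics

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R2 of a one-descriptor linear fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def select_best(pool, y):
    """Toy variable selection: index and R2 of the single best descriptor."""
    scores = [r_squared(x, y) for x in pool]
    best = max(range(len(pool)), key=lambda j: scores[j])
    return best, scores[best]

def scrambled_r2max(pool, y, runs=200, seed=5):
    """Scramble Y, redo the whole selection each time, and return the
    largest R2 reached by chance (R2max)."""
    rng = random.Random(seed)
    r2max = 0.0
    for _ in range(runs):
        y_perm = y[:]
        rng.shuffle(y_perm)
        _, r2 = select_best(pool, y_perm)  # selection repeated on scrambled Y
        r2max = max(r2max, r2)
    return r2max

if __name__ == "__main__":
    rng = random.Random(11)
    n, k = 20, 50
    pool = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # pure noise: any fit is chance
    idx, r2 = select_best(pool, y)
    print(f"selected descriptor {idx}: R2 = {r2:.3f}")
    print(f"R2max over scrambled runs = {scrambled_r2max(pool, y):.3f}")
```

Because the selection is redone on every scrambled Y, R2max reflects how high R2 can climb by chance for this particular N and K; a high R2max warns that the selected model may not be meaningful.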

18
Cross-validation Process
  • Different N, K, P
  • N = 10, 20, 30, 40, 50, 80, 100
  • P = 1-8
  • Np, 10, 20, 30, 50, 100
  • Run 1000 times
  • Evaluation criteria
  • R2 of the training set
  • Q2_1: Q2 for leave-one-out (LOO) CV
  • Q2_20: Q2 for leave-20%-of-samples-out CV
  • Q2_50: Q2 for leave-50%-of-samples-out CV
  • R2_P: R2 of one random test set (25% of samples)
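The cross-validated statistics above can be computed as in the following sketch. It assumes a one-descriptor least-squares model and defines Q2 = 1 - PRESS / SS, with SS the total sum of squares of Y around its mean; this is a common convention, but the slides do not give the formula. For repeated leave-group-out splits, SS is rescaled to the number of predictions made (our assumption) so the values stay comparable with LOO. `fit_line` and `q2_cv` are illustrative names.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b

def q2_cv(x, y, fraction=None, repeats=100, seed=0):
    """Cross-validated Q2 = 1 - PRESS / SS for a one-descriptor model.

    fraction=None gives leave-one-out; otherwise a random `fraction` of the
    samples is held out in each of `repeats` splits (leave-group-out)."""
    n = len(y)
    my = statistics.fmean(y)
    ss = sum((v - my) ** 2 for v in y)
    if fraction is None:
        splits = [[i] for i in range(n)]  # each sample left out exactly once
    else:
        rng = random.Random(seed)
        size = max(1, round(fraction * n))
        splits = [rng.sample(range(n), size) for _ in range(repeats)]
    press = 0.0
    for held_out in splits:
        held = set(held_out)
        keep = [i for i in range(n) if i not in held]
        a, b = fit_line([x[i] for i in keep], [y[i] for i in keep])
        press += sum((y[i] - (a + b * x[i])) ** 2 for i in held)
    # Rescale SS to the number of predictions so repeated splits stay comparable
    n_pred = sum(len(s) for s in splits)
    return 1.0 - press / (ss * n_pred / n)

if __name__ == "__main__":
    rng = random.Random(2)
    x = [rng.gauss(0, 1) for _ in range(30)]
    y = [1.5 * a + rng.gauss(0, 0.3) for a in x]
    print(f"Q2 LOO  = {q2_cv(x, y):.3f}")
    print(f"Q2 L20O = {q2_cv(x, y, 0.2):.3f}")
    print(f"Q2 L50O = {q2_cv(x, y, 0.5):.3f}")
```

Running the same functions on descriptor pools of random data, for the N, K and P grid listed above, is how the behaviour of each criterion could be compared; the sketch itself only covers P = 1.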

19
(No Transcript)
20
Cross-validation Process
  • Leave-one-out vs. leave-group-out
  • Q2_L50O (leave-50%-out Q2) is independent of N, K
    and P
  • Hemmateenejad B, Mehdipour AR, Bagheri L, Miri R,
    Judging the significance of the multiple linear
    regression-based QSAR models by cross-validation.
    To be submitted

21
Concluding Remarks
  • Be aware of N to K ratio
  • Not only N to P ratio
  • Check different approaches for chance correlation

22
Models are not real but sometimes are helpful