Discrimination Models and Variance Stabilizing Transformations of Metabolomic NMR Data

Slides: 23
Provided by: Paru83
Learn more at: https://www.amstat.org

1
Discrimination Models and Variance Stabilizing
Transformations of Metabolomic NMR Data
  • Institute on Research and Statistics, Sacramento
  • 04/08/04
  • Parul Vora Purohit

2
Biodata and omics
  • Genome Project
  • Genomics - Study of Genes
  • Proteomics - Study of proteins
  • Metabolomics - Study of metabolites
  • cellomics, CHOmics, chromonoics, etc.
  • Analytical techniques
  • Microarray Spectroscopy
  • Mass Spectroscopy
  • NMR Spectroscopy

3
NMR Spectroscopy
  • Intense, homogeneous magnetic field
  • High-powered RF transmitter capable of delivering short pulses
  • 500 MHz pulses stimulate 1H nuclear spin transitions
  • Probe containing the coils used to excite and detect the signal
  • Plot of signal vs. shift in frequency from the original pulse
  • Shift measured in ppm (ratio to the original signal)
  • Courtesy: Joseph Medendorp / Public Information /
    University of Kentucky

4
NMR Data
  • Allows detection of compounds with H content
  • Shift characterizes the chemicals (metabolites)
  • Examples
  • 2.14 ppm - glutamine ? CH2 group
  • 2.27 ppm - valine β CH group
  • 6.91 ppm - tyrosine C3,5-H ring
  • 65,000 points (variables) per sample

5
Questions
  • Classification: Can we distinguish sick
    organisms from healthy ones?
  • Identification: Which metabolites play a role in
    the disease (biomarkers)?
  • DIFFERENCES IN THE DETAILS!

6
Abalone Data
  • A set of 18 abalone
  • 8 healthy, 5 stunted, 5 sick
  • Tissue from muscle
  • Questions
  • Can we classify the abalone accurately?
  • Can we detect any metabolites that are markers?

7
Problems / Solutions
  • Multivariate Techniques
  • Matrix of 65,000 (variables) x 18 (samples)
  • Too many variables as compared to the number of
    samples
  • Dimension Reduction by Binning
  • Classification and metabolite marker
    identification using PCA and Cluster Analysis
  • Methods assume that the data is normally
    distributed with a constant variance
  • Generalized Log Transformation improves results!

8
NMR Data Pre-Processing
  • Background Subtraction
  • TMSP peak removed (standard at 0 ppm)
  • Water Peak Removal
  • Region 4.72-4.96 ppm removed
  • Normalization
  • Integrated Intensity normalized to 1.0 to remove
    the effects of systematic intensity changes
    between abalone
  • Binning / Size
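
Two of the steps above (water peak removal and normalization) can be sketched as follows. This is a minimal illustration, assuming the spectra are held as a NumPy array with one row per abalone on a shared, evenly spaced ppm axis, so that a plain sum stands in for the integrated intensity. The function names and the toy data are hypothetical and do not come from the presentation.

```python
import numpy as np

def remove_region(ppm, spectra, lo, hi):
    """Drop all points whose chemical shift lies in [lo, hi] ppm."""
    keep = (ppm < lo) | (ppm > hi)
    return ppm[keep], spectra[:, keep]

def normalize_total_intensity(spectra):
    """Scale each row so that its integrated (summed) intensity is 1.0."""
    return spectra / spectra.sum(axis=1, keepdims=True)

# Toy data (hypothetical): 18 spectra on a coarse ppm axis.
rng = np.random.default_rng(0)
ppm = np.linspace(0.2, 10.0, 2000)
raw = rng.uniform(0.1, 1.0, size=(18, ppm.size))

# Remove the water peak region, then normalize each spectrum.
ppm_kept, no_water = remove_region(ppm, raw, 4.72, 4.96)
normalized = normalize_total_intensity(no_water)
```

Normalizing after the water region is removed keeps the large, variable water signal from dominating the scaling factor; the order used in the original analysis is not stated on the slide.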

9
Binned Spectrum
Bin size: 0.04 ppm, 239 bins
  • Bin size range: 0.00125 ppm to 0.7 ppm
  • Intensity of bin = integrated intensity of all
    points in the bin
  • Region of interest restricted to 0.2 ppm to 10.0
    ppm
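
A hedged sketch of fixed-width binning as described above. It assumes bins start at 0.2 ppm, that bins whose centers fall inside the 4.72-4.96 ppm water region are dropped, and that the integrated intensity of a bin is the sum of its points. The function name `bin_spectrum` and the toy spectrum are illustrative only.

```python
import numpy as np

def bin_spectrum(ppm, intensity, lo=0.2, hi=10.0, width=0.04, water=(4.72, 4.96)):
    """Sum intensities into fixed-width ppm bins, dropping bins in the water region."""
    n_bins = int(round((hi - lo) / width))           # 245 bins for the defaults
    edges = np.linspace(lo, hi, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Integrated intensity per bin: sum of all points that fall in the bin.
    # Points outside [lo, hi] are ignored by np.histogram.
    binned, _ = np.histogram(ppm, bins=edges, weights=intensity)
    keep = ~((centers > water[0]) & (centers < water[1]))
    return centers[keep], binned[keep]

# Toy spectrum (hypothetical) on a wider ppm axis than the region of interest.
rng = np.random.default_rng(0)
ppm = np.linspace(-1.0, 11.0, 6000)
intensity = rng.uniform(0.0, 1.0, size=ppm.size)

centers, binned = bin_spectrum(ppm, intensity)
```

Under these assumptions, (10.0 - 0.2) / 0.04 gives 245 bins, and dropping the six bins inside the water region leaves 239, which agrees with the bin count quoted on the slide. The presentation does not state how its bins were actually constructed.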

10
Principal Component Analysis (PCA)
  • Technique that allows for the explanation of the
    variance-covariance of the variables in terms of
    a linear combination of them
  • $X = t_1 p_1^T + t_2 p_2^T + \cdots + t_k p_k^T + E$,
    where the $p_i$ are eigenvectors
  • Projections of the original data matrix on these
    components give the relations between the samples
    (Scores Plot)
  • A plot of the eigenvectors of the covariance
    matrix gives a relationship between the variables
    (Loadings Plot)
  • Reduces the dimension of the problem: a few
    components suffice to explain the variance
  • Courtesy: Wise, B. M. and Gallagher, N. B.,
    PLS_Toolbox 2.1
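
The decomposition above can be sketched with a singular value decomposition. This is a minimal NumPy illustration, not the PLS_Toolbox implementation used in the presentation. It assumes the data matrix has samples as rows and bins as columns, and that it is mean-centered before decomposition; the random matrix is a stand-in for real spectra.

```python
import numpy as np

def pca(X, k):
    """PCA of X (samples x variables) via SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]               # t_i as columns (Scores Plot axes)
    loadings = Vt[:k].T                     # p_i as columns (Loadings Plot axes)
    explained = (s**2 / np.sum(s**2))[:k]   # fraction of variance per component
    residual = Xc - scores @ loadings.T     # E in the decomposition
    return scores, loadings, explained, residual

# Stand-in data (hypothetical): 18 samples x 239 bins.
rng = np.random.default_rng(1)
X = rng.normal(size=(18, 239))
scores, loadings, explained, E = pca(X, k=3)
```

The right singular vectors of the centered matrix are the eigenvectors of its covariance matrix, so `loadings` plays the role of the $p_i$ and `scores` the role of the $t_i$.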

11
PCA Results
Scores Plot
Loadings Plot
12
Cluster Analysis - Hierarchical
Transformed Data: Groups Clearly Identified
Untransformed Data
13
Generalized Log Transformation
  • It has been shown that a transformation of the form
  • $f(y) = \ln\left(y + \sqrt{y^2 + c}\right)$
  • can have a variance stabilizing effect on the
    data
  • The parameter c can be obtained by maximum
    likelihood or ANOVA methods and has the
    value
  • $c = s^2 / S^2$
  • where $s^2$ is the variance of the noise and $S^2$ the
    variance of the high peaks
  • Durbin, B., Hardin, J., Rocke, D. M.,
    Bioinformatics, 2002, 18, s105-s110
  • Sue Geller, Jeff Gregg, Paul Hagerman, David
    Rocke, Transformation and Normalization of
    Oligonucleotide Microarray Data, 2003
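
To illustrate the variance stabilizing effect, here is a small simulation sketch. It assumes a two-component error model, y = mu * exp(eta) + eps with eta ~ N(0, S^2) and eps ~ N(0, s^2), which the slide does not spell out; the values of s and S and the helper `simulate` are hypothetical choices for illustration.

```python
import numpy as np

def glog(y, c):
    """Generalized log: f(y) = ln(y + sqrt(y^2 + c)); defined for all real y when c > 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y + np.sqrt(y**2 + c))

# Assumed error model (not stated on the slide):
#   y = mu * exp(eta) + eps,  eta ~ N(0, S^2),  eps ~ N(0, s^2)
rng = np.random.default_rng(2)
s, S = 1.0, 0.1
c = s**2 / S**2                      # c = s^2 / S^2

def simulate(mu, n=20000):
    """Draw n replicate measurements of a peak with true level mu."""
    return mu * np.exp(rng.normal(0.0, S, n)) + rng.normal(0.0, s, n)

low, high = simulate(0.0), simulate(1000.0)

# Ratio of standard deviations, high peak vs. noise level.
raw_ratio = high.std() / low.std()                       # grows with the signal
glog_ratio = glog(high, c).std() / glog(low, c).std()    # comparable after glog
```

On the raw scale the spread of the high peak is far larger than that of the noise-level signal, while after the transformation the two spreads are of similar size. For large y the transformation behaves like ln(2y), and near zero it is approximately linear, which is why it handles both regimes.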

14
Maximum Likelihood
  • Replicates are needed to determine SSE(c) accurately
  • Find the c that minimizes the SSE
  • Step through values of c using Newton's method or
    educated intervals
  • Box, G. E. P. and Cox, D. R. (1964) An analysis of
    transformations. J. Roy. Stat. Soc., Series B
    (Methodological), 26, 211.
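
A hedged sketch of the search for c over "educated intervals" (a log-spaced grid). Following the Box-Cox approach, the SSE is rescaled by the geometric mean of the Jacobian of the transformation so that values for different c are comparable; whether the presentation used exactly this scaling is an assumption. The replicate data are simulated under an assumed two-component error model, and all names here are illustrative.

```python
import numpy as np

def glog(y, c):
    """Generalized log: f(y) = ln(y + sqrt(y^2 + c))."""
    return np.log(y + np.sqrt(y**2 + c))

def scaled_sse(Y, c):
    """Jacobian-scaled error sum of squares for replicates Y (replicates x bins)."""
    z = glog(Y, c)
    sse = np.sum((z - z.mean(axis=0))**2)    # spread of replicates about bin means
    # d glog/dy = 1/sqrt(y^2 + c). Dividing z by the geometric mean of this
    # derivative multiplies the SSE by exp(mean(log(y^2 + c))).
    return sse * np.exp(np.mean(np.log(Y**2 + c)))

def estimate_c(Y, grid):
    """Return the grid value of c with the smallest scaled SSE, plus the SSE curve."""
    sse = np.array([scaled_sse(Y, c) for c in grid])
    return grid[np.argmin(sse)], sse

# Simulated replicates (hypothetical): 10 replicates x 200 bins,
# half the bins at noise level and half containing high peaks.
rng = np.random.default_rng(3)
s, S = 1.0, 0.1
mu = np.repeat([0.0, 1000.0], 100)
Y = mu * np.exp(rng.normal(0.0, S, (10, 200))) + rng.normal(0.0, s, (10, 200))

grid = np.logspace(-3, 5, 81)
c_hat, sse_curve = estimate_c(Y, grid)
```

Plotting `sse_curve` against `grid` gives an error-sum-of-squares curve of the kind shown on this slide. In this simulation the grid minimum should fall in the neighborhood of s^2 / S^2 = 100, though the exact value depends on the random draw.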

Plot: Error Sum of Squares vs. c
15
Transformed Spectrum
  • Calculate c from the replicate data by maximum
    likelihood methods
  • Transform the data to stabilize the variance, using
    a transformation of the form
  • $f(y) = \ln\left(y + \sqrt{y^2 + c}\right)$
  • Bin size: 0.04 ppm, 239 bins, c = 2.7e-7
16
Stabilized Variance
Bin size: 0.04 ppm
Bin size: 0.04 ppm, c = 2.7e-7
17
Scores Plot: Transformation Effects
Untransformed Data
Transformed Data
18
Loadings Plot: Transformation Effects
Untransformed Data
Transformed Data
19
Cluster Analysis - Hierarchical
Transformed Data: Groups Clearly Identified
Untransformed Data
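
A minimal sketch of hierarchical clustering on sample-by-bin data, using SciPy. The average linkage, the Euclidean metric, and the synthetic three-group data are assumptions made for illustration; the presentation does not state which linkage or distance was used.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
group_sizes = {"healthy": 8, "stunted": 5, "sick": 5}

# Synthetic stand-in for transformed, binned spectra: one row per abalone,
# each group scattered tightly around its own (hypothetical) mean profile.
centers = rng.normal(0.0, 5.0, size=(3, 20))
X = np.vstack([centers[i] + rng.normal(0.0, 0.3, size=(n, 20))
               for i, n in enumerate(group_sizes.values())])
truth = np.repeat(np.arange(3), list(group_sizes.values()))

# Agglomerative clustering, then cut the tree into at most three clusters.
Z = linkage(X, method="average", metric="euclidean")
labels = fcluster(Z, t=3, criterion="maxclust")
```

The linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree like the ones on this slide. With real spectra, the same call would be applied once to the untransformed bins and once to the transformed bins to compare how cleanly the groups separate.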
20
Raw Spectra: Significant Bins
Panel labels (both panels): Healthy, Stunted, Sick
Glycogen, Sucrose, Fructose?
  • Bin 124: 5.38 ppm; Bin 125: 5.42 ppm; Bin 126: 5.46 ppm
  • Bin 76: 3.22 ppm; Bin 77: 3.26 ppm; Bin 78: 3.30 ppm
21
Conclusions
  • Demonstrated the use of data reduction and
    multivariate techniques for studying
    NMR and mass spectrometer data
  • Demonstrated the use of these techniques to
    identify metabolite and protein biomarkers
  • Showed the value of transformations in
    making the data more useful for analysis

22
Acknowledgements
  • David M. Rocke, CIPIC
  • David L. Woodruff, CIPIC
  • Mark R. Viant, U. of Birmingham, U. K.