Discrimination Models and Variance Stabilizing Transformations of Metabolomic NMR Data

Provided by: Paru83

Learn more at: https://www.amstat.org

Transcript and Presenter's Notes

1
Discrimination Models and Variance Stabilizing
Transformations of Metabolomic NMR Data

Institute on Research and Statistics, Sacramento
04/08/04
Parul Vora Purohit

2
Biodata and omics

Genome Project
Genomics - Study of Genes
Proteomics - Study of proteins
Metabolomics - Study of metabolites
cellomics, CHOmics, chromonomics, etc.
Analytical techniques
Microarray Spectroscopy
Mass Spectroscopy
NMR Spectroscopy

3
NMR Spectroscopy
Intense, homogeneous magnetic field
High-powered RF transmitter capable of delivering short pulses
500 MHz to stimulate 1H nuclear spin transitions
Probe that holds the coils used to excite and detect the signal
Plot of signal vs. shift in frequency from the original pulse
Shift measured in ppm (ratio to the original signal)

Courtesy: Joseph Medendorp / Public Information / University of Kentucky

4
NMR Data

Allows detection of compounds with H content
Shift characterizes the chemicals (metabolites)
Examples
2.14 ppm - glutamine ? CH2 group
2.27 ppm - valine β CH group
6.91 ppm - tyrosine C3,5H ring
65,000 points (variables) per sample

5
Questions

Classification: Can we distinguish sick organisms from the healthy ones?
Identification: Which metabolites play a role in the disease (biomarkers)?
DIFFERENCES IN THE DETAILS!

6
Abalone Data

A set of 18 abalone
8 healthy, 5 stunted, 5 sick
Tissue from muscle
Questions
Can we classify the abalone accurately?
Can we detect any metabolites that are markers?

7
Problems / Solutions

Multivariate Techniques
Matrix of 65,000 (variables) x 18 (samples)
Too many variables as compared to the number of
samples
Dimension Reduction by Binning
Classification and metabolite marker
identification using PCA and Cluster Analysis
Methods assume that the data is normally
distributed with a constant variance
Generalized Log Transformation improves results!

8
NMR Data Pre-Processing

Background Subtraction
TMSP Peak (standard at 0 ppm) removed
Water Peak Removal
(4.72-4.96 ppm removed)
Normalization
Integrated Intensity normalized to 1.0 to remove
the effects of systematic intensity changes
between abalone
Binning / Size
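
As a rough illustration of the removal and normalization steps listed above, the following Python sketch drops the TMSP and water regions and scales the integrated intensity to 1.0. The function name, the synthetic spectrum and the width of the TMSP window (0.05 ppm either side of 0 ppm) are assumptions made for this example. Only the water window (4.72-4.96 ppm) and the normalization to 1.0 come from the slide. Background subtraction is left out.

```python
import numpy as np

def preprocess(ppm, intensity, tmsp_halfwidth=0.05, water=(4.72, 4.96)):
    """Drop the TMSP and water regions, then scale total intensity to 1.0."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = np.abs(ppm) > tmsp_halfwidth                # remove TMSP standard near 0 ppm (assumed window)
    keep &= ~((ppm >= water[0]) & (ppm <= water[1]))   # remove the water peak region
    ppm, intensity = ppm[keep], intensity[keep]
    return ppm, intensity / intensity.sum()            # integrated intensity becomes 1.0

# Synthetic spectrum standing in for one abalone sample (hypothetical data).
rng = np.random.default_rng(0)
ppm_axis = np.linspace(-0.5, 10.5, 5000)
raw = rng.random(ppm_axis.size) + 0.1
ppm_clean, norm = preprocess(ppm_axis, raw)
```

Because every spectrum is scaled to the same total, differences in overall signal strength between animals no longer dominate the comparison.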

9
Binned Spectrum
Bin size 0.04 ppm, 239 bins

Bin size range: 0.00125 ppm to 0.7 ppm
Intensity of bin = integrated intensity of all points in the bin
Region of interest restricted to 0.2 ppm to 10.0 ppm
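
A minimal sketch of fixed-width binning, assuming bins start at 0.2 ppm, are 0.04 ppm wide, and that bins falling in the removed water region (4.72-4.96 ppm) are discarded. The function name and the flat test spectrum are hypothetical, and the exact bin-edge convention used in the talk is not stated, so this is one plausible reading rather than the presenter's procedure.

```python
import numpy as np

def bin_spectrum(ppm, intensity, lo=0.2, hi=10.0, width=0.04, water=(4.72, 4.96)):
    """Sum intensities into fixed-width ppm bins, skipping bins in the water region."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = int(round((hi - lo) / width))              # 245 bins over 0.2-10.0 ppm
    idx = np.floor((ppm - lo) / width).astype(int)      # bin index of every data point
    inside = (idx >= 0) & (idx < n_bins)                # ignore points outside the region
    sums = np.bincount(idx[inside], weights=intensity[inside], minlength=n_bins)
    centers = lo + (np.arange(n_bins) + 0.5) * width
    keep = (centers < water[0]) | (centers > water[1])  # drop bins inside the water region
    return centers[keep], sums[keep]

# Flat hypothetical spectrum with 65,000 points.
ppm_axis = np.linspace(0.0, 10.5, 65000)
spectrum = np.ones_like(ppm_axis)
centers, binned = bin_spectrum(ppm_axis, spectrum)
```

Under these assumptions the sketch produces 239 bins, with the 76th bin centered at 3.22 ppm and the 124th at 5.38 ppm, which agrees with the bin numbers quoted on the Significant Bins slide.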

10
Principal Component Analysis (PCA)

Technique that explains the variance-covariance structure of the variables in terms of linear combinations of them
$X = t_1 p_1^T + t_2 p_2^T + \dots + t_k p_k^T + E$
$p_i$ - eigenvectors
Projections of the original data matrix onto these components give the relations between the samples: Scores Plot
A plot of the eigenvectors of the covariance matrix gives the relationships between the variables: Loadings Plot
Reduces the dimension of the problem: a few components suffice to explain the variance

Courtesy Wise, B. M. and Gallagher, N. B.,
PLS_Toolbox 2.1
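
The decomposition above can be sketched with a singular value decomposition in NumPy. This is not the PLS_Toolbox routine used in the talk. The function name, the random 18 x 239 matrix and the choice of three components are assumptions for illustration, and X in the formula corresponds here to the mean-centered data matrix.

```python
import numpy as np

def pca(X, k):
    """k-component PCA of X (samples x variables): scores, loadings, residual, variance fractions."""
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :k] * s[:k]                         # scores t_i: projections of the samples
    P = Vt[:k].T                                 # loadings p_i: eigenvectors of the covariance matrix
    E = Xc - T @ P.T                             # residual left after k components
    explained = s[:k] ** 2 / np.sum(s ** 2)      # fraction of variance per component
    return T, P, E, explained

rng = np.random.default_rng(1)
X = rng.normal(size=(18, 239))                   # hypothetical 18 samples x 239 bins
T, P, E, explained = pca(X, k=3)
```

Plotting the columns of T against each other gives a scores plot, and plotting the columns of P against bin position gives a loadings plot.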

11
PCA Results
Scores Plot
Loadings Plot
12
Cluster Analysis - Hierarchical
Transformed Data: Groups Clearly Identified
Untransformed Data
13
Generalized Log Transformation

It has been shown that a transformation of the form
$f(y) = \ln\left(y + \sqrt{y^2 + c}\right)$
can have a variance-stabilizing effect on the data
The parameter c can be obtained by Maximum Likelihood or ANOVA methods and has the value
$c = s^2 / S^2$
where $s^2$ is the variance of the noise and $S^2$ the variance of the high peaks
Durbin, B., Hardin, J., Rocke, D. M.,
Bioinformatics, 2002, 18, s105-s110
Sue Geller, Jeff Gregg, Paul Hagerman, David
Rocke, Transformation and Normalization of
Oligonucleotide Microarray Data, 2003
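
A short sketch of the transformation itself, using the value of c reported later in the talk for 0.04 ppm bins. The function name and the sample intensities are illustrative assumptions.

```python
import numpy as np

def glog(y, c):
    """Generalized log transform f(y) = ln(y + sqrt(y^2 + c))."""
    y = np.asarray(y, dtype=float)
    return np.log(y + np.sqrt(y * y + c))

c = 2.7e-7                                  # value quoted in the talk for 0.04 ppm bins
y = np.array([0.0, 1e-4, 1e-2, 1.0])        # hypothetical bin intensities
z = glog(y, c)
```

For intensities much larger than sqrt(c) the transform behaves like ln(2y), while near zero it is approximately linear, so small and even slightly negative values remain defined, unlike with a plain logarithm.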

14
Maximum Likelihood

Replicates are needed to determine SSE(c) accurately
Find the c that gives the minimum SSE
Step through c using Newton's method or educated intervals
Box, G. and Cox, D. R. (1964) An analysis of transformations. J. Roy. Stat. Soc., Series B (Methodological), 26, 211.
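
One possible reading of this procedure, in the spirit of Box and Cox, is to rescale the transformed data by the geometric mean of the Jacobian so that the SSE is comparable across values of c, then pick the c with the smallest SSE over replicates. The Jacobian scaling, the grid search (standing in for Newton's method or educated intervals), the simulated replicate data and all function names are assumptions of this sketch and are not taken from the slides.

```python
import numpy as np

def normalized_glog(y, c):
    """glog divided by the geometric mean of its derivative 1/sqrt(y^2 + c)."""
    root = np.sqrt(y * y + c)
    return np.log(y + root) * np.exp(np.mean(np.log(root)))

def sse(y, c):
    """Error sum of squares about each variable's replicate mean; y is (replicates, variables)."""
    z = normalized_glog(y, c)
    return np.sum((z - z.mean(axis=0)) ** 2)

def estimate_c(y, grid):
    """Return the grid value of c with the smallest SSE, plus the whole SSE curve."""
    values = np.array([sse(y, c) for c in grid])
    return grid[np.argmin(values)], values

# Simulated replicates (hypothetical): multiplicative error on peaks plus additive noise.
rng = np.random.default_rng(2)
mu = rng.uniform(0.0, 1e-2, size=200)                 # true bin intensities
y = mu * np.exp(rng.normal(0.0, 0.2, size=(6, 200)))  # peak variability
y = y + rng.normal(0.0, 1e-4, size=(6, 200))          # baseline noise
grid = np.logspace(-9, -5, 41)                        # candidate values of c
c_hat, values = estimate_c(y, grid)
```

Plotting `values` against `grid` gives an SSE curve as a function of c, from which the minimum can be read off or refined on a finer grid.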

Plot: Error Sum of Squares vs. c
15
Transformed Spectrum
Calculate c from the replicate data by maximum likelihood methods
Transform the data to stabilize the variance, using a transformation of the form
$f(y) = \ln\left(y + \sqrt{y^2 + c}\right)$
Bin size 0.04 ppm, 239 bins, c = 2.7e-7
16
Stabilized Variance
Panel 1: bin size 0.04 ppm
Panel 2: bin size 0.04 ppm, c = 2.7e-7
17
Scores Plot: Transformation Effects
Untransformed Data
Transformed Data
18
Loadings Plot: Transformation Effects
Untransformed Data
Transformed Data
19
Cluster Analysis - Hierarchical
Transformed Data: Groups Clearly Identified
Untransformed Data
20
Raw Spectra: Significant Bins
Two panels, each comparing Healthy, Stunt. and Sick spectra
Glycogen, Sucrose, Fructose?
Bin 124: 5.38 ppm    Bin 76: 3.22 ppm
Bin 125: 5.42 ppm    Bin 77: 3.26 ppm
Bin 126: 5.46 ppm    Bin 78: 3.30 ppm
21
Conclusions

Demonstrated the use of data reduction and multivariate techniques for studying NMR and mass spectrometry data
Demonstrated the use of these techniques to identify metabolite and protein biomarkers
Showed that transformations make the data more useful for analysis

22
Acknowledgements