Metabolite fingerprinting: detecting biological features by independent component analysis - PowerPoint PPT Presentation

Provided by: bioinform8


1
Metabolite fingerprinting: detecting
biological features by independent component
analysis
PCA
ICA
  • 11/22/2004
  • IASL

2
Outline
  • Motivation
  • Introduction
  • Metabolite fingerprint
  • PCA
  • ICA
  • Materials and Methods
  • 96 samples
  • QTOF
  • Results
  • Kurtosis
  • Conclusion
  • Our strategy for ICAT (heavy or light
    labeling)

3
Motivation
  • Metabolite fingerprinting is a technique for
    extracting information from spectra of the total
    metabolite composition of a sample.
  • The question is how well metabolite fingerprints
    reflect the biological background.
  • In many applications, classical principal
    component analysis (PCA) is used to detect
    relevant information.
  • Because it optimizes an independence condition,
    independent component analysis (ICA) is better
    suited to our questions than PCA.
  • However, ICA was not designed for a small
    number of high-dimensional samples, so a
    strategy is needed to overcome this limitation.

5
Introduction - Metabolite fingerprinting
  • Not all of these analytical questions can be
    addressed by full-composition metabolomic
    analyses; many instead call for a cheaper and
    faster first-round screening method.
  • Such screening methods group data according to
    inherent biological characteristics and
    distinguish these from unrelated background
    noise.
  • Carried out without individually determining
    metabolite identities, they have been termed
    metabolite fingerprinting (Fiehn, 2001) and
    have been successfully applied to discriminate
    bacterial strains using time-of-flight mass
    spectrometry (Vaidyanathan et al., 2001) or
    other techniques such as infrared spectroscopy
    (Thomas et al., 2000).
  • In biomedical fields, the same strategy is
    applied using nuclear magnetic resonance (NMR)
    spectroscopy and is termed metabonomics.
  • One of the main questions in metabolite
    fingerprinting is what major information the
    spectra provide, and whether this information
    relates to the experimental conditions or to
    interfering signals.

6
PCA v.s. ICA
  • Principal component analysis seeks directions in
    feature space that best represent the data in the
    least-squares sense.
  • Independent component analysis seeks directions
    in the data along which the projections are as
    statistically independent of one another as
    possible.

7
Introduction - PCA
  • One well-established technique for dimensionality
    reduction and visualization is the classical
    principal component analysis (PCA), where the
    extracted information is represented by a set of
    new variables, termed components or features.
    Diamantaras and Kung (1996) give a good overview
    of different PCA-algorithms.
  • In the field of metabolomics, PCA has become a
    popular tool for visualizing datasets and for
    extracting relevant information (Ward et al.,
    2003; Urbanczyk-Wochniak et al., 2003).
  • However, PCA is only powerful if the biological
    question is related to the highest variance in
    the dataset. If this is not the case, other
    techniques of statistics or related fields may be
    more helpful, depending on the biological
    question, as shown by Goodacre et al. (2003) and
    Johnson et al. (2003) for supervised techniques
    in combination with validation and
    pre-processing.
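As a rough illustration of what PCA computes, the following NumPy sketch centres a samples × variables matrix and takes its singular value decomposition. It is an assumption for illustration, not the authors' code; the function name `pca` and the synthetic data are hypothetical. The components come out ordered by variance, which is why PCA only helps when the biological question is tied to the largest variance.

```python
import numpy as np

def pca(X, n_components):
    """PCA of a samples x variables matrix via SVD of the column-centred data.

    Returns (scores, loadings, explained_variance_ratio), with components
    ordered by decreasing variance."""
    Xc = X - X.mean(axis=0)                              # centre each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]      # sample coordinates on the PCs
    loadings = Vt[:n_components].T                       # variables x components
    var = S ** 2 / (X.shape[0] - 1)                      # variance along each PC
    return scores, loadings, var[:n_components] / var.sum()

# Hypothetical data: 92 samples x 50 variables; variable 0 has by far the largest variance
rng = np.random.default_rng(0)
X = rng.normal(size=(92, 50))
X[:, 0] *= 10
scores, loadings, ratio = pca(X, n_components=2)
```

Here the first PC is essentially variable 0, simply because it has the largest variance, whether or not that variable carries the biologically interesting signal.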

8
Introduction - ICA
  • In ICA, an independence condition is optimized,
    which often yields more meaningful components than
    optimizing only the variance, as PCA does.
  • The components of ICA are therefore termed
    independent components (ICs): different ICs
    represent different, non-overlapping
    information.
  • To apply ICA, we assume that the observed data
    have been generated by some unknown fundamental
    factors that are independent of each other.
  • By searching for components that are as
    statistically independent as possible, these
    factors can be detected. The fundamental factors
    are often termed sources, and the application
    field is called blind source separation (BSS).

9
The simple Cocktail Party Problem
[Figure: sources s1 and s2 are mixed by an unknown mixing matrix A into observations x1 and x2.]
Model: $\mathbf{x} = A\mathbf{s}$, with $n$ sources and $m = n$ observations.
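Written out, the cocktail-party model and the goal of ICA look as follows. This uses standard ICA textbook notation; the unmixing matrix $W$, the permutation matrix $P$ and the invertible diagonal scaling matrix $D$ are not on the slide. ICA estimates $W$ from the observations alone, so the sources can only be recovered up to their order and scale.

```latex
\mathbf{x} = A\,\mathbf{s}, \qquad
\hat{\mathbf{s}} = W\,\mathbf{x}, \qquad
W A = P D \;\Rightarrow\; \hat{\mathbf{s}} = P D\,\mathbf{s}
```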
11
Independent Component Analysis (ICA)
Without knowing the positions of the microphones or
what any person is saying, can you isolate each of
the voices?
12
Independent Component Analysis (ICA)
Assumption: each speaker's sound is unrelated to
the others (independent).
13
ICA Example
  • BSS of recorded speech and music signals.

http://www.cnl.salk.edu/~tewon/ica_cnl.html
14
ICA Separation
[Figure: two independent sources are recorded as a mixture at two microphones; ICA gets the independent signals out of the mixture.]
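The separation in this figure can be sketched with a deflationary FastICA using the kurtosis (cubic) nonlinearity. This is one standard ICA algorithm, chosen here for brevity; the slides do not say which ICA algorithm was used. The two sources (a sine and a sawtooth), the mixing matrix and the function names are hypothetical.

```python
import numpy as np

def whiten(X):
    """Centre X (rows = time points, columns = microphones) and transform it to identity covariance."""
    Xc = X - X.mean(axis=0)
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ E / np.sqrt(d)

def fastica_kurtosis(Z, n_components, max_iter=200, tol=1e-8, seed=0):
    """Deflationary FastICA with the cubic (kurtosis) nonlinearity on whitened data Z.

    Returns (estimated sources, unmixing matrix for Z)."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, Z.shape[1]))
    for p in range(n_components):
        w = rng.normal(size=Z.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = Z @ w
            w_new = (Z * y[:, None] ** 3).mean(axis=0) - 3 * w  # E{z (w'z)^3} - 3w
            w_new -= W[:p].T @ (W[:p] @ w_new)                  # stay orthogonal to earlier rows
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1) < tol           # sign flips are allowed
            w = w_new
            if converged:
                break
        W[p] = w
    return Z @ W.T, W

# Two hypothetical sources: a sine wave and a sawtooth
n = 2000
t = np.arange(n) / n
S = np.column_stack([np.sin(2 * np.pi * 5 * t),
                     2 * ((7 * t) % 1.0) - 1])
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])        # mixing matrix, unknown in practice
X = S @ A.T                       # each row is one observation x = A s
Y, W = fastica_kurtosis(whiten(X), n_components=2)
corr = np.abs(np.corrcoef(S.T, Y.T)[:2, 2:])  # |correlation| of true vs. recovered signals
```

Because ICA recovers sources only up to order, sign and scale, the check compares absolute correlations rather than the signals themselves.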
15
Independent Component Analysis
  • Possible applications for ICA include:
  • Neurobiological modeling
  • Radio and telephone communication
  • Preprocessing
  • EEG and MEG processing

17
Materials and Methods
Total samples: 96 (Arabidopsis thaliana)
MS: electrospray/QTOF mass spectra
Weighted density function → 763 variables, 92 samples
Hybrid vigour, or heterosis, displays interesting
features such as higher growth, better fitness
and improved resistance against biotic and
abiotic stress factors.
Therefore, we expected to find the largest
distance between the F1 groups and the parents,
the second-largest difference between the two
parents, and only a small difference, or none at
all, between the two F1 genotypes.
18
Materials and Methods
763 variables, 92 samples
PCA → PCs: by applying PCA for visualization, we have
to assume that the most interesting information is
directly related to the highest variance in the
data.
ICA (minimize the dependence) → ICs → sorted by
kurtosis → meaningful ICs: we have to define a
criterion for sorting these components according
to our interest.
19
Fig. 1. Mass spectra comparison of different
Arabidopsis lines and their crosses. The
intensities are plotted against the mass
(mass-to-charge ratios, m/z). From each group one
sample is arbitrarily taken. The global structure
of the spectra is very similar. However, there are
differences among masses of lower
intensity. Selecting the relevant information
is the challenge for our analysis.
20
Fig. 2. Combined spectral data. Above, the
intensities are plotted against the mass (m/z)
for all mass-intensity pairs (given by the
highest peak in the spectra) over all samples.
Only the mass range of 115 to 119 amu of the total
range of 50 to 1500 amu is shown. To assign the
mass values to a set of variables, a density
function is used, shown below. The peaks of the
density function (marked by a plus sign) point to
high concentrations of mass values. The masses
around one peak (marked by a circle) are
assigned to one variable; the remaining
mass-intensity pairs are removed.
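The variable-assignment step described in this caption can be sketched as follows: a kernel density is computed over all pooled m/z values, its sufficiently high local maxima define the variables, and each mass-intensity pair is assigned to the nearest peak if it lies within a window, otherwise it is dropped. The intensity weighting, the Gaussian kernel, the relative peak-height cutoff, keeping the maximum intensity per sample and variable, and all parameter values are assumptions for illustration, not the settings of the original study; `bin_masses` is a hypothetical name.

```python
import numpy as np

def bin_masses(mz, intensity, sample_id, n_samples,
               bandwidth=0.003, window=0.006, grid_step=0.0005, min_rel_height=0.2):
    """Turn pooled (m/z, intensity, sample) triples into a samples x variables matrix.

    Returns (peak_mz, X): peak_mz holds one m/z value per variable."""
    grid = np.arange(mz.min() - window, mz.max() + window, grid_step)
    # Intensity-weighted Gaussian kernel density of the pooled m/z values (assumed weighting)
    kernel = np.exp(-0.5 * ((grid[:, None] - mz[None, :]) / bandwidth) ** 2)
    density = kernel @ intensity
    # Local maxima that reach min_rel_height * global maximum become variables
    inner = density[1:-1]
    is_peak = ((inner > density[:-2]) & (inner >= density[2:])
               & (inner >= min_rel_height * density.max()))
    peak_mz = grid[1:-1][is_peak]
    # Assign each pair to its nearest peak; pairs outside the window are discarded
    nearest = np.abs(mz[:, None] - peak_mz[None, :]).argmin(axis=1)
    keep = np.abs(mz - peak_mz[nearest]) <= window
    X = np.zeros((n_samples, peak_mz.size))
    np.maximum.at(X, (sample_id[keep], nearest[keep]), intensity[keep])
    return peak_mz, X

# Hypothetical pooled peak list from three samples
mz = np.array([115.020, 115.021, 115.019, 115.050, 115.051, 115.049, 115.090])
intensity = np.array([10.0, 12.0, 11.0, 5.0, 6.0, 5.0, 1.0])
sample_id = np.array([0, 1, 2, 0, 1, 2, 0])
peak_mz, X = bin_masses(mz, intensity, sample_id, n_samples=3)
```

In this toy input the isolated, weak pair at 115.090 does not form a peak of its own and lies outside the window of both peaks, so it is removed, mirroring the "remaining mass-intensity pairs are removed" step.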
21
Fig. 3. PCA on normalized data. In each plot the
first two components of PCA are plotted against
each other. PCA is applied to different
normalized datasets. Without any normalization
there is no clear separation between the
different groups. By scaling the metabolites to
unit variance, the parent generation can be
separated from the F1 generation. By scaling the
samples to unit vector norm, even the
parent-lines can be separated.
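The two normalizations compared in Fig. 3 take only a few lines of NumPy. This sketch assumes rows are samples and columns are metabolites (mass variables); the function names are hypothetical. Scaling each sample to unit vector norm removes any overall multiplicative difference in signal between samples, which the example shows with two spectra that differ only by a factor of two.

```python
import numpy as np

def scale_variables_unit_variance(X):
    """Divide each metabolite (column) by its standard deviation across samples."""
    return X / X.std(axis=0, ddof=1)

def scale_samples_unit_norm(X):
    """Divide each sample (row) by its Euclidean norm."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

X = np.array([[1.0, 2.0, 0.5],
              [2.0, 4.0, 1.0],    # same profile as row 0, twice the total signal
              [3.0, 1.0, 2.0]])
Xv = scale_variables_unit_variance(X)
Xn = scale_samples_unit_norm(X)
```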
22
Fig. 4. PCA on vector normalized data. The first
three principal components (PCs) are plotted
pairwise against each other. Note that the first
PC (of highest variance) is not related to our
problem of separating the sample groups. Better
results are given by components of smaller
variance, PC 2 and PC 3.
23
Fig. 5. ICA compared to PCA. The best
PCA-visualization given by PC 2 and PC 3 is
plotted on the left. The different groups are
only partially separable. Compared to this the
ICA result, given by the two ICs of most negative
kurtosis, IC 1 and IC 2, is shown on the right.
ICA gives a projection of the data with a greater
separation between the different groups.
24
Fig. 6. The third component of ICA (IC 3) has no
information about the experimental groups (left).
However, it is related to the time at which
the samples were measured, shown on the right.
This technical factor could not be detected by
PCA.
26
Kurtosis
  • Kurtosis is a classical measure of
    non-Gaussianity and is relatively simple, both
    computationally and theoretically. It indicates
    whether the data are peaked or flat relative to a
    Gaussian (normal) distribution. A Gaussian
    distribution has a kurtosis of zero (measured as
    excess kurtosis, the fourth standardized moment
    minus 3). Positive kurtosis indicates a peaked
    distribution (super-Gaussian) and negative
    kurtosis indicates a flat distribution
    (sub-Gaussian).
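A minimal NumPy sketch of excess kurtosis (population moments, no small-sample bias correction; the function name is hypothetical), checked on large random samples from a Gaussian, a uniform (flat) and a Laplace (peaked) distribution, whose theoretical excess kurtosis values are 0, -1.2 and +3:

```python
import numpy as np

def excess_kurtosis(y):
    """Fourth standardized moment minus 3: 0 for a Gaussian, <0 flat, >0 peaked."""
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(0)
n = 200_000
k_gauss = excess_kurtosis(rng.normal(size=n))         # close to 0
k_unif = excess_kurtosis(rng.uniform(-1, 1, size=n))  # close to -1.2 (sub-Gaussian)
k_lap = excess_kurtosis(rng.laplace(size=n))          # close to +3 (super-Gaussian)
```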

High vs. Low Variance
These graphs illustrate the notion of variance.
The one on the left is more dispersed than the
one on the right, so it has a higher variance.
27
Kurtosis
  • Speech signals are examples of super-Gaussian
    distributions (highly positive kurtosis), because
    their values are predominantly close to zero.
  • Negative kurtosis can indicate:
  • Cluster structure: such a component can resolve
    two experimental conditions (e.g., high and low
    concentrations of metabolites).
  • A uniformly distributed factor: such a component
    can represent a continuously changed experimental
    factor such as temperature or light intensity.
  • Thus, the components with the most negative
    kurtosis could give us the most relevant
    information.
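To make the link between negative kurtosis and these two cases concrete, this sketch builds three hypothetical components (a two-group cluster signal, a uniformly varying factor and a peaked super-Gaussian signal) and sorts them from most negative to most positive kurtosis, as proposed for ranking ICs. The data and function names are illustrative only. A balanced two-group signal with little noise has an excess kurtosis close to -2, the lowest value possible, which is why cluster structure stands out under this criterion.

```python
import numpy as np

def excess_kurtosis(y):
    z = (y - y.mean()) / y.std()
    return np.mean(z ** 4) - 3.0

def sort_by_kurtosis(S):
    """Order the columns of a component matrix from most negative to most positive kurtosis."""
    k = np.array([excess_kurtosis(S[:, j]) for j in range(S.shape[1])])
    order = np.argsort(k)
    return S[:, order], k[order]

rng = np.random.default_rng(1)
n = 1000
two_groups = np.repeat([-1.0, 1.0], n // 2) + 0.1 * rng.normal(size=n)  # cluster structure
gradual = np.linspace(-1.0, 1.0, n)                                    # uniformly varied factor
spiky = rng.laplace(size=n)                                            # super-Gaussian signal
S_sorted, k_sorted = sort_by_kurtosis(np.column_stack([spiky, gradual, two_groups]))
```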

28
Fig. 7. Left: different numbers of PCs are used
for dimensionality reduction, and ICA is applied to
each of these reduced datasets. Plotted is the
number of extracted ICs with negative kurtosis.
By using the first 6 components of PCA, ICA can
extract the highest number of interesting ICs,
whereas the kurtosis of IC 4 is close to zero.
Right: for this six-dimensional reduced dataset,
the kurtosis of each extracted IC is plotted.
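The procedure behind Fig. 7 (reduce the data to its first k principal components, run ICA on the reduced data, then count the ICs with negative kurtosis) can be sketched as below. It uses a deflationary kurtosis-based FastICA, which is an assumption since the ICA algorithm is not specified here; the function names and the synthetic 92 × 40 dataset with two crossed two-level factors are hypothetical and will not reproduce the curve in the figure.

```python
import numpy as np

def excess_kurtosis(y):
    z = (y - y.mean()) / y.std()
    return np.mean(z ** 4) - 3.0

def pca_whiten(X, k):
    """Project centred X onto its first k PCs, each scaled to unit variance."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * np.sqrt(X.shape[0] - 1)

def fastica_kurtosis(Z, max_iter=200, tol=1e-8, seed=0):
    """Deflationary FastICA (cubic nonlinearity); returns one IC per column of Z."""
    rng = np.random.default_rng(seed)
    k = Z.shape[1]
    W = np.zeros((k, k))
    for p in range(k):
        w = rng.normal(size=k)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = Z @ w
            w_new = (Z * y[:, None] ** 3).mean(axis=0) - 3 * w
            w_new -= W[:p].T @ (W[:p] @ w_new)   # orthogonal to ICs already found
            w_new /= np.linalg.norm(w_new)
            done = abs(abs(w_new @ w) - 1) < tol
            w = w_new
            if done:
                break
        W[p] = w
    return Z @ W.T

def count_negative_kurtosis(X, k):
    S = fastica_kurtosis(pca_whiten(X, k))
    return int(sum(excess_kurtosis(S[:, j]) < 0 for j in range(k)))

# Hypothetical data: 92 samples, 40 variables, two crossed two-level factors plus noise
rng = np.random.default_rng(0)
f1 = np.repeat([-1.0, 1.0], 46)
f2 = np.tile(np.repeat([-1.0, 1.0], 23), 2)
X = np.column_stack([f1, f2]) @ (3 * rng.normal(size=(2, 40))) + 0.5 * rng.normal(size=(92, 40))
counts = {k: count_negative_kurtosis(X, k) for k in range(2, 9)}
```

With only the two factor-driven PCs kept, both ICs are cluster-like and have negative kurtosis; further PCs mostly add noise directions whose kurtosis scatters around zero.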
29
Table 1. The 10 masses of highest influence are shown for
different components. On the left, the masses
given by classical PCA are shown for PC 2 and
PC 3, the PCs closest to the
first two ICs of ICA, shown on the right. The
masses given by ICA differ from those of PCA
and are mostly assignable to only one IC. This
clearer separation of masses is shown in Fig. 8.
30
Fig. 8. Mass influences. For each mass from Table
1 the absolute influence on each component is
plotted. The masses in PCA have a greater
influence on both components than the masses in
ICA, which are assigned more to one or to the
other component.
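Ranking masses by their influence on a component, as in Table 1 and Fig. 8, amounts to sorting a weight vector by absolute value. This sketch assumes "influence" means the absolute weight of each mass in a component (for example a PCA loading or a row of the ICA unmixing matrix). The helper `dominance`, which measures how exclusively a mass belongs to one of two components, is a hypothetical numeric summary of what Fig. 8 shows visually.

```python
import numpy as np

def top_masses(weights, mz, n_top=10):
    """Return the n_top m/z values with the largest absolute weight, strongest first."""
    order = np.argsort(-np.abs(weights))[:n_top]
    return mz[order], weights[order]

def dominance(w_a, w_b):
    """Share of each mass's absolute influence that falls on its stronger component.

    0.5 means both components are influenced equally, 1.0 means only one.
    Undefined (division by zero) for masses with zero weight on both."""
    a, b = np.abs(w_a), np.abs(w_b)
    return np.maximum(a, b) / (a + b)

# Hypothetical weights of four masses on two components
mz = np.array([101.0, 102.0, 103.0, 104.0])
w_1 = np.array([0.1, -0.9, 0.3, 0.05])
w_2 = np.array([0.1, 0.0, -0.3, 0.6])
top_mz, top_w = top_masses(w_1, mz, n_top=2)
share = dominance(w_1, w_2)
```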
31
Fig. 9. Outlier detection by ICA. The last two
components with the most positive kurtosis are
plotted against each other. IC 6
clearly indicates an outlier, marked by an arrow.
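A single outlying sample produces a strongly positive kurtosis, because one extreme value dominates the fourth moment; this is why the components with the most positive kurtosis are the place to look for outliers. The sketch below flags samples whose standardized score on such a component exceeds a threshold. The threshold of 4, the function names and the data are hypothetical choices, not taken from the study.

```python
import numpy as np

def excess_kurtosis(y):
    z = (y - y.mean()) / y.std()
    return np.mean(z ** 4) - 3.0

def flag_outliers(component, threshold=4.0):
    """Indices of samples whose standardized score exceeds the (hypothetical) threshold."""
    z = (component - component.mean()) / component.std()
    return np.flatnonzero(np.abs(z) > threshold)

rng = np.random.default_rng(2)
ic = rng.normal(size=92)   # hypothetical IC scores for 92 samples
ic[17] = 12.0              # one hypothetical outlying sample
```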
33
Our strategy for ICAT (heavy or light
labeling)
ICA Separation
  • ICA
  • Separation
  • Recognition
  • IASL ICAT PROCEDURE

[Figure: heavy- and light-labeled signals as two independent sources, observed as a mixture in two experiments; ICA gets the independent signals out of the mixture.]
34
  • Denoising
  • Missing-data recovery
  • Speech enhancement and recognition
  • Voice Signal Noise 11000