Title: Data Analysis for Mouse MALDI Spectrum Data
1Data Analysis for Mouse MALDI Spectrum Data
- Xuelian Wei
- Department of Statistics
2Outline
- Data sets analyzed in this presentation
- C18 Q4 Low (73 spec + 3 control spec, 41 samples + control).
- C18 Q9 Low (84 spec + 10 control spec, 43 samples + control).
- C18 S4 Low (78 spec + 8 control spec, 40 samples + control).
- WCX Q4 Low (76 spec + 10 control spec, 42 samples + control).
- WCX Q9 High (85 spec + 9 control spec, 43 samples + control).
- Protein modification detection
- Two lipid modifications of proteins, 138.10 and 120.09 Da.
- Oxidation modification, 16 Da.
- Correlation analysis with phenotype data
- 23 phenotypes (ApoA1 and ApoA2 were excluded due to too many missing values).
3Pre-processing
- Adjusted intensity table for all spectra and all peaks.
- Each row represents a peak.
- Each column represents a spectrum.
- Adjusted intensity table for all samples and all peak clusters (used for subsequent analysis).
- Each row represents a peak cluster.
- Each column represents a sample.
- Each cell is the sum of the average adjusted intensities in that peak cluster.
- The minimal MZ in a cluster represents the MZ for that cluster.
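The step from the spectrum-level table to the sample-level peak cluster table can be sketched as below. This is a minimal illustration, not the code used for the slides. It assumes that "average" means averaging a peak's adjusted intensity over the spectra belonging to one sample, and that the per-peak averages are then summed within the cluster. The function name `cluster_table` and the input layouts are hypothetical.

```python
from collections import defaultdict


def cluster_table(intensity, spectrum_sample, peak_cluster):
    """Collapse a peak x spectrum table into a cluster x sample table.

    intensity:       {peak_mz: {spectrum_id: adjusted intensity}}
    spectrum_sample: {spectrum_id: sample_id}
    peak_cluster:    {peak_mz: cluster_id}

    Returns (table, cluster_mz), where table[cluster][sample] is the sum,
    over the peaks in the cluster, of the peak's average intensity across
    the sample's spectra (an assumed reading of the slide), and
    cluster_mz[cluster] is the minimal MZ in the cluster.
    """
    table = defaultdict(lambda: defaultdict(float))
    cluster_mz = {}
    for mz, row in intensity.items():
        cid = peak_cluster[mz]
        # The minimal MZ represents the cluster.
        cluster_mz[cid] = min(mz, cluster_mz.get(cid, mz))
        # Group this peak's intensities by sample.
        per_sample = defaultdict(list)
        for spec, value in row.items():
            per_sample[spectrum_sample[spec]].append(value)
        # Average within each sample, then accumulate into the cluster cell.
        for sample, values in per_sample.items():
            table[cid][sample] += sum(values) / len(values)
    return {c: dict(r) for c, r in table.items()}, cluster_mz
```

Control spectra would need to be dropped or mapped to their own sample id before calling this; the slides do not say which was done.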
4Pre-processing
- Adjusted intensity table for all spectra and all peaks.
- Adjusted intensity table for all samples and all peak clusters (used for subsequent analysis).
51. C18 Q4 Low
62. C18 Q9 Low
73. C18 S4 Low
84. WCX Q4 Low
95. WCX Q9 High
10Pre-Processing
- Refer to the complementary files for more detailed zoomed-in views of peak cluster detection, such as _plot_mapped_peak_cluster_5_6_7_8.pdf.
11Possible outlier spectra
12Protein modification detection: Method I
- Idea: based on pre-defined peak clusters.
- Algorithm
- A pair of peak clusters is matched when the difference between the MZs of their first peaks is within a tolerance (in Da) of the modification mass.
- Drawback
- The pre-defined peak clusters may not be perfect.
- Cluster sizes may differ considerably.
- Conclusion
- It only provides possible protein modifications.
- Subjective judgment is needed.
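A minimal sketch of the cluster-pair matching in Method I follows. The tolerance value is not recoverable from the slide, so `tol` is a hypothetical parameter, and `match_cluster_pairs` is an illustrative name rather than the function used in the analysis.

```python
def match_cluster_pairs(cluster_mz, mod_mass, tol):
    """Find pairs of peak clusters separated by a modification mass.

    cluster_mz: iterable of the first-peak (minimal) MZ of each cluster
    mod_mass:   modification mass in Da, e.g. 16, 120.09 or 138.10
    tol:        allowed deviation in Da (hypothetical; not given in the slides)

    Returns a list of (low_mz, high_mz) pairs whose MZ difference is
    within tol of mod_mass.
    """
    mzs = sorted(cluster_mz)
    pairs = []
    for i, low in enumerate(mzs):
        for high in mzs[i + 1:]:
            if abs((high - low) - mod_mass) <= tol:
                pairs.append((low, high))
    return pairs
```

For example, `match_cluster_pairs([1000.0, 1016.02, 1138.1], 16, 0.05)` pairs the clusters at 1000.0 and 1016.02. Each pair is only a candidate and still needs to be checked against the figures.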
13Protein modification detection: Method I
14Protein modification detection: Method II
- Idea: based on all peaks.
- Algorithm
- First, find all possible modification peak pairs.
- If 75% of the peaks in a peak cluster have matched modification peaks, mark it as a potential peak cluster with modification.
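Method II can be sketched as follows, under stated assumptions: a peak counts as matched when some other peak lies at its MZ plus the modification mass, within a hypothetical tolerance `tol`, and a cluster is flagged when at least 75% of its peaks are matched. The function name and input layout are illustrative only.

```python
from bisect import bisect_left, bisect_right


def clusters_with_modification(clusters, mod_mass, tol, min_fraction=0.75):
    """Flag peak clusters in which most peaks have a modification partner.

    clusters:     {cluster_id: [peak MZ, ...]}
    mod_mass:     modification mass in Da
    tol:          allowed deviation in Da (hypothetical parameter)
    min_fraction: share of matched peaks needed to flag a cluster

    Returns {cluster_id: matched fraction} for the flagged clusters.
    """
    all_peaks = sorted(mz for peaks in clusters.values() for mz in peaks)

    def has_partner(mz):
        # Is there any peak in [mz + mod_mass - tol, mz + mod_mass + tol]?
        lo = bisect_left(all_peaks, mz + mod_mass - tol)
        hi = bisect_right(all_peaks, mz + mod_mass + tol)
        return hi > lo

    flagged = {}
    for cid, peaks in clusters.items():
        if not peaks:
            continue  # an empty cluster cannot be evaluated
        fraction = sum(has_partner(mz) for mz in peaks) / len(peaks)
        if fraction >= min_fraction:
            flagged[cid] = fraction
    return flagged
```

Because matching runs over individual peaks, this does not depend on the cluster boundaries being perfect, which is the drawback noted for Method I.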
15Protein modification detection: Method II
16Protein modification detection: Method II
17Protein modification detection: Method II
18Protein modification detection
- Be careful! The number of detected modifications in the table above is meaningless by itself; make your own judgment based on the figures.
- Refer to the complementary files for more detailed figures, such as Protein_mod_138.10_2.pdf.
19Protein modification detection
- For mod 16 150, count the number of matched
peak cluster pairs.
20Protein modification detection
- For mod 16 150, count the number of matched
peak pairs.
21Correlation Analysis with phenotype data
- 25 Phenotypes
- ApoA1 and ApoA2 were excluded from this study due to too many missing values.
22Correlation Analysis with phenotype data
- Correlation
- First, a normal-score transformation is applied to all vectors to reduce the effect of outliers.
- For a given phenotype, say Insulin_ug_l, the correlation between Insulin_ug_l and every peak cluster profile is computed.
- Question: how to choose the cutoff point?
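The transformation and correlation step might look like the sketch below. The slides do not give the exact normal-score formula, so this assumes the common rank-based form, rank / (n + 1) mapped through the inverse normal CDF, with tied values sharing their average rank. Pearson correlation on the transformed vectors is also an assumption.

```python
from statistics import NormalDist


def normal_scores(values):
    """Rank-based normal scores: inverse normal CDF of rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf(r / (n + 1)) for r in ranks]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = (sxx * syy) ** 0.5
    return sxy / denom if denom else 0.0


def phenotype_correlations(phenotype, profiles):
    """Correlate one phenotype vector with every peak cluster profile."""
    p = normal_scores(phenotype)
    return [pearson(p, normal_scores(profile)) for profile in profiles]
```

Since only ranks enter the transformation, one extreme phenotype value cannot dominate the correlation, which is the outlier effect the slide refers to.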
23Correlation Analysis with phenotype data
- Permutation FDR
- The order of the samples is permuted, which breaks the internal relationship between phenotype and peak cluster; this shows how the correlations are distributed under such a permutation.
- Permute 1000 times.
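A sketch of the permutation step: the phenotype vector is shuffled across samples while the peak cluster profiles stay fixed, and the correlations are recomputed each time. The inputs are assumed to be already transformed vectors, and `permuted_correlations`, `n_perm` and `seed` are illustrative names.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = (sxx * syy) ** 0.5
    return sxy / denom if denom else 0.0


def permuted_correlations(phenotype, profiles, n_perm=1000, seed=0):
    """Null correlations: one list per permutation, one value per cluster."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    null = []
    for _ in range(n_perm):
        shuffled = list(phenotype)  # copy, so the input is left untouched
        rng.shuffle(shuffled)       # break the phenotype/sample pairing
        null.append([pearson(shuffled, profile) for profile in profiles])
    return null
```

Using the same shuffled phenotype for every cluster within one permutation preserves the correlation structure among the peak clusters.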
24Correlation Analysis with phenotype data
- FDR
- For a given cutoff point, say 0.5:
- D = number of discoveries: peak clusters with abs(correlation) higher than the cutoff point in our data.
- FD = number of false discoveries: peak clusters with abs(correlation) higher than the cutoff point in the 1000 permuted data sets / 1000.
- FDR = FD / D × 100%.
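A small sketch of the FDR estimate at one cutoff, taking FDR as the average number of false discoveries per permutation divided by the number of discoveries in the observed data, expressed as a percentage. The function name and argument layout are hypothetical.

```python
def permutation_fdr(observed, null, cutoff):
    """Estimate the FDR (in percent) at a correlation cutoff.

    observed: correlations in the real data, one per peak cluster
    null:     list of permutations, each a list of correlations
    cutoff:   threshold applied to abs(correlation)

    Returns (D, FD, FDR), where D counts observed discoveries, FD is the
    average count of discoveries per permutation, and FDR = FD / D * 100.
    FDR is NaN when nothing is discovered in the observed data.
    """
    d = sum(abs(r) > cutoff for r in observed)
    fd = sum(abs(r) > cutoff for perm in null for r in perm) / len(null)
    fdr = 100.0 * fd / d if d else float("nan")
    return d, fd, fdr
```

Evaluating this over a grid of cutoffs gives an FDR curve from which a cutoff can be chosen. The estimate can exceed 100% when the permuted data produce more hits than the real data.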
25Correlation Analysis with phenotype data
- Permuted p_value
- Permuted p_value = (number of peak clusters in the 1000 permutations with correlation exceeding the observed correlation) / (number of peak clusters × 1000).
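The pooled permutation p-value can be sketched as below. Comparing absolute correlations is an assumption carried over from the FDR definition; the slide only says "exceed". The function name is illustrative.

```python
def permutation_p_value(r_obs, null):
    """Pooled permutation p-value for one observed correlation.

    r_obs: observed correlation for a peak cluster
    null:  list of permutations, each a list of correlations
           (one per peak cluster)

    Counts null correlations whose absolute value exceeds abs(r_obs) and
    divides by (number of peak clusters x number of permutations).
    """
    total = sum(len(perm) for perm in null)
    exceed = sum(abs(r) > abs(r_obs) for perm in null for r in perm)
    return exceed / total
```

Pooling over all peak clusters gives a resolution of 1 / (number of peak clusters × 1000) instead of 1 / 1000, at the cost of assuming the clusters share a similar null distribution.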
26Correlation Analysis with phenotype data
27Correlation Analysis with phenotype data
- Refer to the complementary files for more FDR figures, such as _PlotFDR.ps.
- Refer to the complementary files for more scatter plots, such as _Scatter_plot_insulin_ug_l.ps.
28With large bin size
29With large bin size
30Result for bin.size = 60