Title: Canadian Bioinformatics Workshops
1Canadian Bioinformatics Workshops
3Module 6
- David Wishart
- Informatics and Statistics for Metabolomics
- June 16-17, 2011
4A Typical Metabolomics Experiment
52 Routes to Metabolomics
Quantitative (Targeted) Methods
Chemometric (Profiling) Methods
6Metabolomics Data Workflow
Chemometric Methods
- Data Integrity Check
- Spectral alignment or binning
- Data normalization
- Data QC/outlier removal
- Data reduction analysis
- Compound ID
Targeted Methods
- Data Integrity Check
- Compound ID and quantification
- Data normalization
- Data QC/outlier removal
- Data reduction analysis
7Data Integrity/Quality
- LC-MS and GC-MS have a high number of false positive peaks
- Problems with adducts (LC), extra derivatization products (GC), isotopes, breakdown products (ionization issues), etc.
- Not usually a problem with NMR
- Check using replicates and adduct calculators
MZedDB: http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html
HMDB: http://www.hmdb.ca/search/spectra?type=ms_search
8Data/Spectral Alignment
- Important for LC-MS and GC-MS studies
- Not so important for NMR (pH variation)
- Many programs available (XCMS, ChromA, MZmine)
- Most based on time warping algorithms
http://mzmine.sourceforge.net/
http://bibiserv.techfak.uni-bielefeld.de/chroma
http://metlin.scripps.edu/download/
9Binning (3000 pts to 14 bins)
(xi, yi): x = 232.1 (AOC), y = 10 (bin #)
bin1 bin2 bin3 bin4 bin5 bin6 bin7 bin8...
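A minimal sketch of equal-width binning, assuming the spectrum is a list of (x, y) points and that summed intensity per bin is an acceptable stand-in for area. The function name `bin_spectrum` and the toy spectrum are hypothetical and are not taken from any particular tool.

```python
def bin_spectrum(x, y, n_bins):
    """Collapse a spectrum into n_bins equal-width bins, summing intensity per bin."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    bins = [0.0] * n_bins
    for xi, yi in zip(x, y):
        # Clamp the index so the right-most point falls into the last bin.
        idx = min(int((xi - lo) / width), n_bins - 1)
        bins[idx] += yi
    return bins


# Toy spectrum (assumption): 3000 evenly spaced points with intensity 1.0
x = [i * 0.01 for i in range(3000)]
y = [1.0] * 3000
binned = bin_spectrum(x, y, 14)
print(len(binned))  # number of bins
```

Binning trades resolution for robustness: small shifts in peak position between samples tend to stay inside the same bin, so the resulting columns line up across samples.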
10Data Normalization/Scaling
Same or different?
- Can scale to sample or scale to feature
- Scaling to whole sample controls for dilution
- Normalize to integrated area, probabilistic quotient method, internal standard, sample specific (weight or volume of sample)
- Choice depends on sample circumstances
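Two of the sample-wise options can be sketched as follows: normalization to integrated area (here, the sum of all features) and probabilistic quotient normalization. This is a simplified illustration that assumes the median spectrum of all samples is used as the reference; the function names and example values are hypothetical.

```python
from statistics import median


def normalize_to_sum(sample, total=1.0):
    """Scale one sample so its feature values add up to `total`."""
    s = sum(sample)
    return [v * total / s for v in sample]


def pqn(samples):
    """Probabilistic quotient normalization against the median spectrum."""
    # Step 1: integral (sum) normalization of every sample.
    scaled = [normalize_to_sum(s) for s in samples]
    # Step 2: reference spectrum = feature-wise median across samples.
    reference = [median(col) for col in zip(*scaled)]
    out = []
    for s in scaled:
        # Step 3: most probable dilution factor = median of the quotients.
        quotients = [v / r for v, r in zip(s, reference) if r > 0]
        factor = median(quotients)
        out.append([v / factor for v in s])
    return out


# Made-up data: sample b is sample a diluted twofold
a = [10.0, 20.0, 30.0, 40.0]
b = [5.0, 10.0, 15.0, 20.0]
c = [12.0, 18.0, 33.0, 37.0]
normalized = pqn([a, b, c])
print(normalized[0])
print(normalized[1])
```

After normalization the diluted sample b should match sample a, which is the dilution control the slide refers to.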
11Data Normalization/Scaling
- Can scale to sample or scale to feature
- Scaling to feature(s) helps manage outliers
- Several feature scaling options available: log transformation, auto-scaling, Pareto scaling, probabilistic quotient, and range scaling
MetaboAnalyst: http://www.metaboanalyst.ca
Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.
12Data QC, Outlier Removal and Data Reduction
- Data filtering: remove solvent peaks, noise filtering, false positives, outlier removal (needs justification)
- Dimensional reduction or feature selection to reduce number of features or factors to consider (PCA or PLS-DA)
- Clustering to find similarity
13MetaboAnalyst
- Web server designed to handle large sets of LC-MS, GC-MS or NMR-based metabolomic data
- Supports both univariate and multivariate data processing, including t-tests, ANOVA, PCA, PLS-DA
- Identifies significantly altered metabolites, produces colorful plots, provides detailed explanations and summaries
- Links significant metabolites to pathways via SMPDB
http://www.metaboanalyst.ca
14Metabolite concentrations
MS / NMR peak lists
GC/LC-MS raw spectra
MS / NMR spectra bins
- Peak detection
- Retention time correction
Baseline filtering
Peak alignment
- Resources and utilities
- Peak searching
- Pathway mapping
- Name conversion
- Lipidomics
- Metabolite set libraries
- Data integrity check
- Missing value imputation
- Data normalization
- Row-wise normalization (4)
- Column-wise normalization (4)
- Statistical analysis
- Univariate analysis
- Dimension reduction
- Feature selection
- Cluster analysis
- Classification
- Pathway analysis
- Enrichment analysis
- Topology analysis
- Interactive visualization
- Time-series / two-factor
- Visualization
- Two-way ANOVA
- ASCA
- Temporal comparison
15MetaboAnalyst Overview
- Raw data processing
- Using MetaboAnalyst
- Data reduction and statistical analysis
- Using MetaboAnalyst
- Functional enrichment analysis
- Using MSEA in MetaboAnalyst
- Metabolic pathway analysis
- Using MetPA in MetaboAnalyst
16Example Datasets
17Example Datasets
18Metabolomic Data Processing
19Common Tasks
- Purpose: to convert various raw data forms into data matrices suitable for statistical analysis
- Supported data formats
- Concentration tables (Targeted Analysis)
- Peak lists (Untargeted)
- Spectral bins (Untargeted)
- Raw spectra (Untargeted)
20Data Upload
21Alternatively
22Data Set Selected
- Here we will be selecting a data set from dairy cattle fed different proportions of cereal grains (0, 15, 30, 45)
- The rumen was analyzed using NMR spectroscopy using quantitative metabolomic techniques
- High grain diets are thought to be stressful on cows
23Data Integrity Check
24Data Normalization
25Data Normalization
- At this point, the data has been transformed to a matrix with the samples in rows and the variables (compounds/peaks/bins) in columns
- MetaboAnalyst offers three types of normalization: row-wise normalization, column-wise normalization and combined normalization
- Row-wise normalization aims to make each sample (row) comparable to each other (e.g. urine samples with different dilution effects)
26Data Normalization
- Column-wise normalization aims to make each variable (column) comparable to each other. This procedure is useful when variables are of very different orders of magnitude. Four methods have been implemented for this purpose: log transformation, autoscaling, Pareto scaling and range scaling
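The four column-wise methods can be written out using their commonly cited definitions. This is a sketch and not MetaboAnalyst's own code: it assumes a base-10 log, the sample standard deviation, and mean-centering before division. The function names and the example column are hypothetical.

```python
import math
from statistics import mean, stdev


def log_transform(col):
    """Log10 transform; assumes strictly positive values."""
    return [math.log10(v) for v in col]


def autoscale(col):
    """Mean-center and divide by the standard deviation."""
    m, s = mean(col), stdev(col)
    return [(v - m) / s for v in col]


def pareto_scale(col):
    """Mean-center and divide by the square root of the standard deviation."""
    m, s = mean(col), stdev(col)
    return [(v - m) / math.sqrt(s) for v in col]


def range_scale(col):
    """Mean-center and divide by the range (max - min)."""
    m, r = mean(col), max(col) - min(col)
    return [(v - m) / r for v in col]


# One variable (column) spanning several orders of magnitude
col = [1.0, 10.0, 100.0, 1000.0]
logged = log_transform(col)
auto = autoscale(col)
pareto = pareto_scale(col)
ranged = range_scale(col)
print(logged)
```

Autoscaling gives every variable unit variance, so low-abundance compounds weigh as much as abundant ones. Pareto scaling is a milder compromise that keeps some of the original intensity structure.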
27Normalization Result
28Quality Control
- Dealing with outliers
- Detected mainly by visual inspection
- May be corrected by normalization
- May be excluded
- Noise reduction
- More of a concern for spectral bins/ peak lists
- Usually improves downstream results
29Visual Inspection
- What does an outlier look like?
30Outlier Removal
31Noise Reduction
32Noise Reduction (cont.)
- Characteristics of noise
- Low intensities
- Low variances (default)
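A variance-based noise filter might look like the sketch below. It assumes that features with the smallest variance across samples are the least informative. The function name and the 25% cut-off are hypothetical choices for illustration, not documented MetaboAnalyst defaults.

```python
from statistics import variance


def filter_low_variance(table, names, drop_fraction=0.25):
    """Drop the given fraction of features with the smallest variance.

    table: list of samples (rows), each a list of feature values.
    names: feature names, one per column.
    """
    columns = list(zip(*table))
    # Rank column indices from lowest to highest variance.
    ranked = sorted(range(len(names)), key=lambda j: variance(columns[j]))
    n_drop = int(len(names) * drop_fraction)
    keep = sorted(ranked[n_drop:])  # restore the original column order
    kept_names = [names[j] for j in keep]
    kept_table = [[row[j] for j in keep] for row in table]
    return kept_table, kept_names


# Made-up data: bin3 barely changes between samples
names = ["bin1", "bin2", "bin3", "bin4"]
table = [
    [1.0, 5.0, 0.10, 9.0],
    [2.0, 3.0, 0.11, 4.0],
    [4.0, 8.0, 0.10, 7.0],
    [3.0, 6.0, 0.11, 2.0],
]
kept_table, kept_names = filter_low_variance(table, names)
print(kept_names)
```

A filter on low intensity would follow the same pattern with the column mean or median as the ranking key.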
33Data Reduction and Statistical Analysis
34Common tasks
- To detect interesting patterns
- To identify important features
- To assess difference between the phenotypes
- Classification / prediction
36ANOVA
37View Individual Compounds
38Questions
- Q: Which compounds show significant difference among all the neighboring groups (0-15, 15-30, and 30-45)?
- Q: For Uracil, are groups 15, 30, 45 significantly different from each other?
39Template Matching
- Looking for compounds showing interesting
patterns of change
40Template Matching (cont.)
41Question
- Q: Identify compounds that decrease in the first three groups but increase in the last group
42PCA Scores Plot
43PCA Loading Plot
44Question
- Q: Identify compounds that contribute most to the separation between group 15 and 45
45PLS-DA Score Plot
46Determine Number of Components
47Important Compounds
48Model Validation
49Questions
- Q: What does p < 0.01 mean?
- Q: How many permutations need to be performed if you want to claim p value < 0.0001?
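To make the idea of a permutation p-value concrete, here is a minimal sketch. It assumes a simple test statistic (absolute difference of group means) in place of a PLS-DA model statistic, and it uses the (hits + 1) / (n_perm + 1) estimator, which is one common convention. The function name and data are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Empirical p-value for the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the class labels by shuffling the pooled values.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Counting the observed labeling itself keeps the estimate above zero.
    return (hits + 1) / (n_perm + 1)


# Made-up concentrations for two groups
low = [1.1, 0.9, 1.0, 1.2, 0.8]
high = [2.1, 1.9, 2.0, 2.2, 1.8]
p = permutation_p_value(low, high, n_perm=1000)
print(p)
```

With this estimator the smallest value that can be reported is 1 / (n_perm + 1), so the number of permutations sets the resolution of the p-value.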
50Heatmap Visualization
51Heatmap Visualization (cont.)
52Question
- Q: Identify compounds with a low concentration in groups 0 and 15 that increase in groups 30 and 45
- Q: Which compound is the only one significantly increased in group 45?
53Download Results
54Analysis Report
55Metabolite Set Enrichment Analysis
56Metabolite Set Enrichment Analysis (MSEA)
- Web tool designed to handle lists of metabolites (with or without concentration data)
- Modeled after Gene Set Enrichment Analysis (GSEA)
- Supports over-representation analysis (ORA), single sample profiling (SSP) and quantitative enrichment analysis (QEA)
- Contains a library of 6300 pre-defined metabolite sets including 85 pathway sets and 850 disease sets
http://www.msea.ca
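Over-representation analysis is commonly framed as a hypergeometric test: given a reference universe of metabolites, how likely is it to see at least the observed overlap between a submitted list and a metabolite set by chance? The sketch below illustrates that framing only. It is not MSEA's implementation, and the function name and all counts are hypothetical.

```python
from math import comb


def ora_p_value(universe_size, set_size, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: metabolites in the reference universe
    set_size:      metabolites in the metabolite set (e.g. one pathway)
    list_size:     metabolites in the submitted compound list
    overlap:       metabolites present in both the list and the set
    """
    upper = min(set_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 6 of 20 submitted compounds hit a 30-member set
p_ora = ora_p_value(universe_size=1000, set_size=30, list_size=20, overlap=6)
print(p_ora)
```

In practice one such p-value is computed per metabolite set and the results are then corrected for multiple testing.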
57Enrichment Analysis
- Purpose: to test whether any biologically meaningful groups of metabolites are significantly enriched in your data
- Biologically meaningful groups:
- Pathways
- Disease
- Localization
- Currently, only supports human metabolomic data
58MSEA
- Accepts 3 kinds of input files
- 1) A list of metabolite names only
- 2) A list of metabolite names and concentration data from a single sample
- 3) A concentration table with a list of metabolite names and concentrations for multiple samples/patients
59Start with a Compound List
60Upload Compound List
61Compound Name Standardization
62Name Standardization (cont.)
63Select a Metabolite Set Library
64Result
65Result (cont.)
66The Matched Metabolite Set
67Single Sample Profiling
68Single Sample Profiling (cont.)
69Concentration Comparison
70Concentration Comparison (cont.)
71Quantitative Enrichment Analysis
72Data Set Selected
- Here we are using a collection of metabolites identified by NMR (compound list and concentrations) from the urine of 77 lung and colon cancer patients, some of whom were suffering from cachexia (muscle wasting)
73Result
74The Matched Metabolite Set
75Question
- Q: Are these metabolites increased or decreased in the cachexia group?
76Metabolic Pathway Analysis with MetPA
77Pathway Analysis
- Purpose: to extend and enhance metabolite set enrichment analysis for pathways by
- Considering the pathway structures
- Supporting pathway visualization
- Currently supports 15 organisms
78Data Upload
79Data Set Selected
- Here we are using a collection of metabolites identified by NMR (compound list and concentrations) from the urine of 77 lung and colon cancer patients, some of whom were suffering from cachexia (muscle wasting)
80Normalization
81Pathway Libraries
82Network Topology Analysis
83Which Node is More Important?
High degree centrality
High betweenness centrality
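The two centrality measures can rank nodes very differently. The sketch below computes both on a small made-up undirected network, using Brandes' algorithm for betweenness. The graph, node names, and helper functions are hypothetical and do not represent a real pathway or MetPA's code.

```python
from collections import deque


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Number of direct neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def betweenness_centrality(graph):
    """Brandes' algorithm for an undirected, unweighted graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


# Two fully connected groups of four nodes joined through a bridge node X
left = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
right = [("E", "F"), ("E", "G"), ("E", "H"), ("F", "G"), ("F", "H"), ("G", "H")]
graph = build_graph(left + right + [("D", "X"), ("X", "E")])
deg = degree_centrality(graph)
btw = betweenness_centrality(graph)
print(deg["X"], btw["X"])
print(deg["D"], btw["D"])
```

In this toy network the bridge node X has the fewest neighbours yet the highest betweenness, because every shortest path between the two groups passes through it.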
84Pathway Visualization
85Pathway Visualization (cont.)
86Question
- Q: Which pathway do you think is likely to be affected the most? Why?
87Result
88Not Everything Was Covered
- Clustering (K-means, SOM)
- Classification (SVM, randomForests)
- Time-series data analysis
- Two factor data analysis
- Peak searching
- ...