MS Preprocessing and Evaluation

Geneva Artificial Intelligence Laboratory. Centre Universitaire d'Informatique ...

1
MS Preprocessing and Evaluation
  • Julien Prados

2
Introduction
  • Context
      • diagnosis and biomarker extraction from SELDI/MALDI mass spectra
  • Issues
      • preprocessing mass spectra
      • tuning preprocessing parameters
      • evaluating preprocessing quality

3
Work Flow
  • Preprocessing
      • Baseline Estimation
      • MS Normalisation (TIC)
      • Noise Estimation/Elimination
      • Peak Detection
      • Extract Peak Characteristics
      • Peak Alignment

[Workflow diagram labels: spectra with Control/Diseased labels; a learning dataset; list of discriminant features; machine learning; classification model for predicting a patient's state from their mass spectrum]
4
Signal Distortions Correction
  • Baseline estimated with the morphological opening operator (local
    maxima of the local minima) in a sliding window
  • Total Ion Current (TIC) normalization computed on the
    baseline-corrected signal, using only the part of the signal above
    2000 Da
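
A minimal sketch of the two corrections above, in plain Python. The opening is computed as a sliding minimum followed by a sliding maximum. The window half-width, the function names and the way the window is truncated at the spectrum edges are assumptions made for illustration, not values taken from the presentation.

```python
def sliding(values, half_width, reducer):
    """Apply `reducer` (min or max) over a centred window, truncated at the edges."""
    n = len(values)
    return [
        reducer(values[max(0, i - half_width):min(n, i + half_width + 1)])
        for i in range(n)
    ]


def opening_baseline(intensities, half_width):
    """Morphological opening: local maxima of the local minima."""
    eroded = sliding(intensities, half_width, min)
    return sliding(eroded, half_width, max)


def tic_normalize(mz, intensities, half_width, mz_min=2000.0):
    """Subtract the baseline, then divide by the total ion current above mz_min."""
    baseline = opening_baseline(intensities, half_width)
    corrected = [y - b for y, b in zip(intensities, baseline)]
    tic = sum(y for m, y in zip(mz, corrected) if m > mz_min)
    if tic <= 0:
        raise ValueError("no signal above mz_min after baseline correction")
    return [y / tic for y in corrected]
```

Because an opening never exceeds the original signal, the corrected intensities stay non-negative. The window should be wider than the widest expected peak, otherwise part of the peak is absorbed into the baseline.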

5
Noise Estimation / Elimination
  • Noise estimated by the standard deviation in a sliding window
  • Used to determine what is a peak and what is not
  • Possibility of signal smoothing (e.g. with wavelets or FFT)
  • Better to work with raw data
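
The sliding-window noise estimate can be sketched as follows. The half-width parameter and the function name are hypothetical, and the population standard deviation is an assumption, since the slide does not say which estimator is used.

```python
import statistics


def local_noise(intensities, half_width):
    """Standard deviation of the signal in a centred window around each point."""
    n = len(intensities)
    return [
        statistics.pstdev(intensities[max(0, i - half_width):min(n, i + half_width + 1)])
        for i in range(n)
    ]
```

Note that a large peak inside the window inflates this estimate, so the window is usually chosen much wider than a single peak.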

6
Peak Detection
7
Extracting Peak Area
  • p: fixed point found by peak detection
  • The signal is split into regions at the minima between two
    consecutive peaks
  • In each region, pl and pr are found by least-squares fitting of a
    piecewise linear model with two segments (horizontal and oblique)
  • The area of the peak is given by the area of the triangle
    (pl, p, pr)
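
One possible reading of this procedure, sketched in plain Python. Several details are assumptions not stated on the slide: the breakpoint is restricted to sample positions, the horizontal level is the mean of the samples it covers, and the oblique segment is forced through the apex p. The function names are hypothetical.

```python
def fit_foot(xs, ys):
    """Fit a horizontal-then-oblique model to one side of a peak.

    xs, ys: samples ordered from the region boundary toward the apex,
    so the last sample is p. Returns the (x, y) foot point where the
    horizontal segment meets the oblique one.
    """
    x_p, y_p = xs[-1], ys[-1]
    best = None
    for k in range(len(xs) - 1):  # candidate breakpoint index
        level = sum(ys[:k + 1]) / (k + 1)
        sse = sum((y - level) ** 2 for y in ys[:k + 1])
        # Oblique segment joins the foot (xs[k], level) to the apex p.
        for x, y in zip(xs[k + 1:-1], ys[k + 1:-1]):
            predicted = level + (y_p - level) * (x - xs[k]) / (x_p - xs[k])
            sse += (y - predicted) ** 2
        if best is None or sse < best[0]:
            best = (sse, (xs[k], level))
    return best[1]


def triangle_area(pl, p, pr):
    """Area of the triangle (pl, p, pr) by the shoelace formula."""
    return 0.5 * abs(
        (p[0] - pl[0]) * (pr[1] - pl[1]) - (pr[0] - pl[0]) * (p[1] - pl[1])
    )
```

For the right-hand side of a peak, the same function can be called with the samples listed from the right boundary back toward p, so that p is again the last sample.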

8
Peak Alignment and Missing Values
  • Peak alignment performed by hierarchical clustering (closest peaks
    are merged)
  • Two strategies for missing values
      • set missing values to zero because there is no peak
      • retrieve the signal intensity (not obvious for peak area)
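
A small sketch of alignment by agglomerative clustering on the m/z axis, followed by the two missing-value strategies. The stopping tolerance, the use of cluster means as centres, and the `signal_at` callback are assumptions for illustration. The slide only states that the closest peaks are merged.

```python
def align_peaks(peak_lists, tolerance):
    """Merge the closest m/z clusters until the smallest gap exceeds tolerance."""
    clusters = sorted([mz] for peaks in peak_lists for mz in peaks)

    def centre(cluster):
        return sum(cluster) / len(cluster)

    while len(clusters) > 1:
        gaps = [
            centre(clusters[i + 1]) - centre(clusters[i])
            for i in range(len(clusters) - 1)
        ]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] > tolerance:
            break
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return [centre(c) for c in clusters]


def feature_row(peaks, centres, tolerance, signal_at=None):
    """Build one spectrum's feature vector over the aligned peak centres.

    peaks: dict mapping m/z to intensity for the detected peaks.
    A missing peak becomes 0.0, or signal_at(centre) when a callback
    returning the raw signal intensity is supplied.
    """
    row = []
    for c in centres:
        matches = [v for mz, v in peaks.items() if abs(mz - c) <= tolerance]
        if matches:
            row.append(max(matches))
        elif signal_at is not None:
            row.append(signal_at(c))
        else:
            row.append(0.0)
    return row
```

Calling `feature_row` without `signal_at` gives zero filling, and passing a callback gives the signal-intensity filling.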

9
Data Representation
  • 3 possible data representations
      • peak intensity, with signal intensity for missing values (is)
      • peak intensity, with zero filling of missing values (iz)
      • peak area, with zero filling of missing values (az)

10
Preprocessing Evaluation
  • Solution 1 (ideal case): use samples with known content, or an MS
    simulator, and estimate peak detection performance.
  • Solution 2: acquire replicate spectra and estimate peak detection
    stability.
  • Solution 3: in diagnostic applications, choose the preprocessing
    parameters that minimise the generalisation error.
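
Solution 3 can be sketched as a parameter search scored by an estimate of the generalisation error. The example below uses leave-one-out error of a 1-nearest-neighbour classifier purely as a stand-in. The `preprocess` callback, the candidate list and the function names are hypothetical.

```python
def loo_error_1nn(rows, labels):
    """Leave-one-out error rate of a 1-nearest-neighbour classifier."""
    errors = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(row, rows[j])),
        )
        errors += labels[nearest] != labels[i]
    return errors / len(rows)


def select_parameter(candidates, preprocess, spectra, labels):
    """Return the candidate with the lowest estimated error, plus all scores.

    preprocess(spectrum, candidate) must return a fixed-length feature vector.
    """
    scores = {
        c: loo_error_1nn([preprocess(s, c) for s in spectra], labels)
        for c in candidates
    }
    return min(scores, key=scores.get), scores
```

The error of the selected parameter is optimistically biased, because the same data chose it. An unbiased figure needs an outer validation loop or a held-out set.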

11
Choosing Peak Detection Parameter
  • Compare detections between a normal MS spectrum and a blank one
  • Result: a peak detection parameter of 2.5 can be used

12
Preprocessing Evaluation in Diagnostic (1/2): Data Representation Evaluation
  • is/iz: filling missing values with signal intensity instead of
    zeroes retains more discriminatory information
  • iz/az: using area or intensity does not result in significant
    differences
  • is/raw: no significant information is lost in preprocessing, and
    the representation is much more compact

13
Preprocessing Evaluation in Diagnostic (2/2): Influence of Peak Detection Parameter
  • With the is representation and the SMO algorithm, how does the peak
    detection parameter influence the information content of the
    preprocessed datasets?

14
Preprocessing Evaluation in Presence of
Replicates (1/2)
  • perform peak detection and alignment

15
Preprocessing Evaluation in Presence of
Replicates (2/2)
  • Estimate the percentage of peaks found in at least 10/10, 9/10,
    8/10 ... 1/10 replicates
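
A minimal sketch of this stability estimate, assuming the aligned peaks are available as a presence matrix with one row per aligned peak and one column per replicate. The matrix layout and the function name are assumptions.

```python
def stability_profile(presence):
    """Percentage of aligned peaks detected in at least k replicates.

    presence: list of rows, one per aligned peak, each a list of 0/1
    flags, one per replicate. Returns {k: percentage} for k from the
    number of replicates down to 1.
    """
    n_replicates = len(presence[0])
    counts = [sum(row) for row in presence]
    return {
        k: 100.0 * sum(c >= k for c in counts) / len(counts)
        for k in range(n_replicates, 0, -1)
    }
```

A stable preprocessing gives a profile that stays high as k approaches the number of replicates.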

16
Conclusion
  • Results depend heavily on parameter tuning, which should be done in
    an informed manner
      • manual selection
      • automatic selection
  • We saw the preprocessing pipeline for SELDI data. Preprocessing
    LC-MS data brings new issues
      • More dimensions (2D, 3D)
      • How to perform real-time peak detection and alignment of LC-MS
        data?
      • How to perform real-time protein identification and guide MS/MS
        selection?
  • Challenge: how to build learning methods able to deal with raw
    data?

17
Peak Definition
  • Valley definition for a point p: the minimum points on the left
    and right of p such that there is no point with intensity higher
    than the intensity of p between them.
  • Peak definition: a spectrum point p is considered a peak if its
    left and right valleys are deeper than the noise level.
  • Remark
      • No assumption on peak width
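
The two definitions above can be sketched directly. The multiplicative `factor` applied to the noise level is an assumption about how a peak detection parameter might enter, and the tie-breaking rule for flat tops (only the rightmost plateau point qualifies) is a choice made here, not something the slides specify.

```python
def valley_depths(intensities, p):
    """Depth of the left and right valleys of point p.

    Each side is scanned outward from p until a strictly higher point
    (left) or an equal-or-higher point (right) is met; the valley is
    the minimum seen on the way.
    """
    top = intensities[p]
    left = top
    i = p - 1
    while i >= 0 and intensities[i] <= top:
        left = min(left, intensities[i])
        i -= 1
    right = top
    i = p + 1
    while i < len(intensities) and intensities[i] < top:
        right = min(right, intensities[i])
        i += 1
    return top - left, top - right


def detect_peaks(intensities, noise, factor=1.0):
    """Indices whose two valleys are both deeper than factor * local noise."""
    peaks = []
    for p in range(len(intensities)):
        left_depth, right_depth = valley_depths(intensities, p)
        threshold = factor * noise[p]
        if left_depth > threshold and right_depth > threshold:
            peaks.append(p)
    return peaks
```

As the remark says, nothing in this test depends on peak width: only the valley depths relative to the noise matter. This direct scan is quadratic in the worst case, which is fine for a sketch but slow for long spectra.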

18
Data Representation Evaluation
  • Error evaluation of 3 classification algorithms
      • Instance-Based Learning (IBk)
      • Decision Tree (J48)
      • Support Vector Machine (SMO)
  • On 3 data representations
      • peak intensity, with signal intensity for missing values (is)
      • peak intensity, with zero filling of missing values (iz)
      • peak area, with zero filling of missing values (az)
  • For 3 datasets
      • Stroke (Stk)
      • Prostate Cancer (Pro)
      • Ovarian Cancer (Ova)

19
Perspectives
  • Preprocessing pipeline used in a reproducibility study of
    MALDI-TOF MS: Zeferos et al., "Sample Preparation and
    Bioinformatics in MALDI Profiling of Urinary Proteins", submitted,
    JChromat, 2006
  • Peak detection algorithm has been extended to 2D (and even nD) and
    applied to Nano-LC MS

20
2D Nano-LC Peak Detection
21
Thank you !
22
Preprocessing Objectives
  • Correct signal distortions (baseline,
    normalization)
  • Reduce dimensionality of learning problem (peak
    detection)
  • Avoid removing discriminative information

23
vd vs Error