Title: A New Approach to Analyzing Gene Expression Time Series Data
1A New Approach to Analyzing Gene Expression Time
Series Data
- Ziv Bar-Joseph
- Georg Gerber
- David K. Gifford
- Tommi S. Jaakkola
- Itamar Simon
Learning Seminar Bioinformatics Other Applications
Prof. Nathan Intrator
Presented By Adam Segoli Schubert
May 16, 2005
2Overview
- Gene Expression
- Time Series
- Statistical Analysis of Time-Series
- DNA Microarray
- Gene Expression Time-Series
- Analyzing Gene Expression Time-Series Data
- Estimating Unobserved Expression Values and Time
Points
- What is a Spline?
- Using the Splines
- Parameters Analysis
- Aligning Time-Series Data
- Aligning Temporal Data Using Splines
- Results Unobserved Data Estimation
- Result - Aligning Temporal Data
- References
3Gene Expression
4Time-Series
- A series of values of variables taken in
successive periods of time
- Time Points
- Sampling Intervals (constant / inconstant)
- A well established area in statistical analysis
of data is dedicated to the study of time-series
5Statistical Analysis of Time-Series
- Two main goals
- Identifying the nature of the phenomenon
- Predicting unobserved values of the time-series
variable
6DNA Microarray
- Allows the monitoring of expression levels of
thousands of genes under a variety of
conditions.
- The data of microarray experiments is usually in
the form of a large matrix.
- Very expensive.
7Gene Expression Time-Series
- Determined by measuring mRNA levels or protein
concentrations
- Usually very short (e.g. 4 to 20 samples)
- Usually unevenly sampled
- The measuring techniques are extremely
noise-prone and/or subject to bias in the
biological measurements.
8Analyzing Gene Expression Time-Series Data
- Estimating Unobserved Expression Values and Time
Points
- Aligning Time-Series Data
9Estimating Unobserved Expression Values and Time
Points
- Row Average or Filling with Zeros
- Singular Value Decomposition (SVD)
- Weighted K-Nearest Neighbors
- Linear Interpolation
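Two of the simpler baselines in this list can be sketched in a few lines. This is a minimal illustration only: the function names, the use of `None` for a missing value, and the toy data are assumptions, not part of the original slides.

```python
def row_average_fill(row):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in row if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in row]


def linear_interpolate(times, values, t):
    """Estimate the value at time t from the two observed points that bracket it."""
    points = [(ti, vi) for ti, vi in zip(times, values) if vi is not None]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the observed time range")


# Toy expression profile of one gene, unevenly sampled, with one missing value.
times = [0, 10, 20, 40]
row = [1.0, None, 3.0, 5.0]

filled = row_average_fill(row)
estimate = linear_interpolate(times, row, 10)
```

Row averaging ignores the time axis entirely, while linear interpolation uses only the two neighbouring observations of the same gene.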
10A New Analysis Approach
11What is a Spline?
- A special curve defined piecewise by polynomials.
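- Given k points $t_i$, called knots, in an interval
$[a,b]$ with $a = t_0 < t_1 < \dots < t_{k-1} = b$
- The parametric curve $S : [a,b] \to \mathbb{R}$ is
called a Spline of degree n if $S \in C^{n-1}[a,b]$
and $S|_{[t_i, t_{i+1}]} \in P_n$ for $i = 0, \dots, k-2$
- A Cubic Spline if $n = 3$.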
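To make the definition concrete, here is a small sketch of a cubic spline that interpolates given knot values. It assumes "natural" boundary conditions (zero second derivative at both ends) and hypothetical function names; it is an interpolating spline written for illustration, not the control-point formulation used in the approach presented here.

```python
from bisect import bisect_right


def fit_natural_cubic_spline(xs, ys):
    """Return the second derivatives M[i] of the natural cubic spline at each knot."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    M = [0.0] * (n + 1)          # natural spline: M[0] = M[n] = 0
    if n < 2:
        return M
    # Tridiagonal system for the interior knots 1..n-1.
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Forward elimination (Thomas algorithm).
    for k in range(1, n - 1):
        factor = h[k] / diag[k - 1]
        diag[k] -= factor * h[k]
        rhs[k] -= factor * rhs[k - 1]
    # Back substitution.
    inner = [0.0] * (n - 1)
    inner[-1] = rhs[-1] / diag[-1]
    for k in range(n - 3, -1, -1):
        inner[k] = (rhs[k] - h[k + 1] * inner[k + 1]) / diag[k]
    M[1:n] = inner
    return M


def evaluate_spline(xs, ys, M, x):
    """Evaluate the piecewise cubic on the knot interval that contains x."""
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[i + 1] - xs[i]
    a, b = xs[i + 1] - x, x - xs[i]
    return ((M[i] * a ** 3 + M[i + 1] * b ** 3) / (6.0 * h)
            + (ys[i] / h - M[i] * h / 6.0) * a
            + (ys[i + 1] / h - M[i + 1] * h / 6.0) * b)


knots = [0.0, 1.0, 2.0, 3.0]
values = [0.0, 1.0, 0.0, 1.0]
M = fit_natural_cubic_spline(knots, values)
midpoint = evaluate_spline(knots, values, M, 1.5)
```

Each interval between knots gets its own cubic polynomial, and the pieces agree in value, slope and curvature at the interior knots, which is the $C^{2}$ condition for $n = 3$.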
12Using the Splines
- We obtain a continuous-time formulation by using
cubic splines to represent gene expression
curves.
- Spline control points are uniformly spaced.
- We constrain spline coefficients of co-expressed
genes to have the same covariance matrix.
13Estimating Unobserved Data Using Splines
- Given c gene classes.
- The value of gene i (of class j) as observed at
time t can be written as a function of the class
average and the gene-specific variation.
14Estimating Unobserved Data Using Splines
- Resampling gene i at any time t, including an
unobserved time point.
- Estimating Missing Values
- Averaging of the observed values using the class
covariance matrix, the class average and
the gene-specific variation.
- These quantities are determined by a
probabilistic model.
15Estimating Unobserved Data Using Splines
Parameters Analysis
- Yi: vector of observed expression values for
gene i.
- Si: m x q matrix for m observations.
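For orientation, the symbols on this slide fit a model of the following general form. This is a hedged sketch rather than the slide's own formula: $\mu_j$ (class average of the spline coefficients), $\gamma_i$ (gene-specific variation), $\Gamma_j$ (class covariance matrix), $\epsilon_i$ (measurement noise with variance $\sigma^2$) and $s(t)$ (the row of q spline basis values at time t) are assumed names, and reading $S_i$ as the basis evaluated at the m observation times of gene i is likewise an assumption.

```latex
\begin{align}
Y_i &= S_i\,(\mu_j + \gamma_i) + \epsilon_i,
  \qquad \gamma_i \sim \mathcal{N}(0, \Gamma_j),
  \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2 I) \\
\hat{Y}_i(t) &= s(t)\,(\mu_j + \hat{\gamma}_i)
\end{align}
```

Under this form, resampling gene i at an unobserved time t only requires evaluating the basis row $s(t)$, since the coefficients $\mu_j + \hat{\gamma}_i$ do not depend on t.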
16Aligning Time-Series Data
- Dynamic Time Warping
- Developed for voice recognition purposes in the
1970s.
- Dynamic Programming
- John Aach and George M. Church
- Operates on individual genes
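The dynamic-programming idea behind dynamic time warping can be sketched as follows. This is the textbook recurrence on two plain numeric sequences, with a hypothetical function name and absolute difference as the local cost; it is not the specific variant of Aach and Church.

```python
def dtw_distance(a, b):
    """Minimum cumulative cost of aligning sequence a to sequence b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] repeats against b
                                     cost[i][j - 1],      # b[j-1] repeats against a
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# The second series is the first with one time point stretched.
stretched = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
shifted = dtw_distance([0.0, 0.0], [1.0, 1.0])
```

Because the warping path is chosen point by point for a single pair of sequences, this kind of alignment works gene by gene and only at the sampled time points.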
17Aligning Temporal Data Using Splines
- Operates on a set of genes.
- Assume we have two spline curves for gene i
- We define a mapping function T(s) = t
18Aligning Temporal Data Using Splines
- We define the alignment error for each gene
- Alignment Limits
- Starting Point
- Ending Point
19Aligning Temporal Data Using Splines
- We define the error for a set of genes S of size
n as a weighted sum of the per-gene alignment
errors.
- The weighting coefficients sum to one
(uniform / nonuniform).
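One way to write these definitions down is sketched here. The notation is assumed, not taken from the slides: $s_i^{(1)}$ and $s_i^{(2)}$ denote the two spline curves of gene i, $\alpha$ and $\beta$ the starting and ending points of the alignment, and $w_i$ the weighting coefficients.

```latex
\begin{align}
e_i^2 &= \frac{1}{\beta - \alpha} \int_{\alpha}^{\beta}
  \bigl[\, s_i^{(2)}(T(s)) - s_i^{(1)}(s) \,\bigr]^2 \, ds \\
E_S &= \sum_{i=1}^{n} w_i \, e_i^2, \qquad \sum_{i=1}^{n} w_i = 1
\end{align}
```

Dividing by $\beta - \alpha$ makes the per-gene error an average over the aligned interval, so that short overlaps are not favoured merely for being short.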
20Aligning Temporal Data Using Splines
- The mapping function (T(s) = t) can then be found
by minimizing this error, using standard
non-linear optimization techniques.
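The search for a mapping can be illustrated with a small sketch. Several things here are assumptions made for the example: the mapping is restricted to a linear form T(s) = stretch * s + shift, the curves are ordinary Python functions standing in for fitted splines, the alignment limits are passed in directly, and a coarse grid search is swapped in for a real non-linear optimizer.

```python
import math


def alignment_error(ref, other, stretch, shift, start, end, steps=200):
    """Mean squared difference between ref(s) and other(T(s)) over [start, end]."""
    width = (end - start) / steps
    total = 0.0
    for k in range(steps):
        s = start + (k + 0.5) * width           # midpoint rule
        diff = other(stretch * s + shift) - ref(s)
        total += diff * diff
    return total / steps


def best_linear_mapping(pairs, weights, stretches, shifts, start, end):
    """Grid search for the (stretch, shift) with the lowest weighted error."""
    best = None
    for stretch in stretches:
        for shift in shifts:
            error = sum(w * alignment_error(ref, other, stretch, shift, start, end)
                        for (ref, other), w in zip(pairs, weights))
            if best is None or error < best[0]:
                best = (error, stretch, shift)
    return best


# Two toy "genes": the second series runs at half speed and starts one time unit late.
pairs = [
    (math.sin, lambda t: math.sin((t - 1.0) / 2.0)),
    (math.cos, lambda t: math.cos((t - 1.0) / 2.0)),
]
weights = [0.5, 0.5]                            # uniform weights that sum to one
error, stretch, shift = best_linear_mapping(
    pairs, weights,
    stretches=[1.0, 1.5, 2.0, 2.5], shifts=[0.0, 0.5, 1.0, 1.5],
    start=0.0, end=2.0 * math.pi)
```

Because all genes in the set share one mapping, the curves of many genes jointly constrain the two parameters, which is the practical benefit of operating on a set of genes rather than on each gene separately.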
21Results Unobserved Data Estimation
- Comparison of the new approach with
- Linear Interpolation
- Spline interpolation using individual genes
- K-Nearest neighbors (KNN)
- k = 20
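For reference, a weighted K-nearest-neighbors estimate of one missing entry might look like the sketch below. The function name, the use of `None` for missing values, the root-mean-square distance over shared columns and the inverse-distance weights are all assumptions made for illustration; the comparison above used k = 20, while the toy call uses k = 1.

```python
import math


def knn_impute(matrix, gene, column, k):
    """Estimate matrix[gene][column] from the k most similar genes that have it."""
    target = matrix[gene]
    candidates = []
    for index, row in enumerate(matrix):
        if index == gene or row[column] is None:
            continue
        shared = [(a, b) for a, b in zip(target, row)
                  if a is not None and b is not None]
        if not shared:
            continue
        distance = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
        candidates.append((distance, row[column]))
    if not candidates:
        raise ValueError("no other gene has an observed value in this column")
    nearest = sorted(candidates, key=lambda c: c[0])[:k]
    # Closer genes get larger weights; the small constant avoids division by zero.
    weights = [1.0 / (distance + 1e-9) for distance, _ in nearest]
    return sum(w * value for w, (_, value) in zip(weights, nearest)) / sum(weights)


# Rows are genes, columns are time points; gene 0 is missing its last value.
expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [5.0, 6.0, 9.0],
]
estimate = knn_impute(expression, gene=0, column=2, k=1)
```

Unlike the spline-based estimate, this borrows values from other genes at the same time point and cannot produce a value for a time point that no gene was measured at.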
24Result - Aligning Temporal Data
- Aligned three yeast cell-cycle gene expression
time series
26Thank You!
27References
- C. S. Moller-Levet. Clustering of Gene Expression
Time-Series Data.
- Neil A. Campbell, Jane B. Reece, and Lawrence G.
Mitchell. Biology. Fifth Edition.
- J. Aach and G. M. Church. Aligning gene
expression time series with time warping
algorithms. Bioinformatics, 17:495-508, 2001.
- C. de Boor. A practical guide to splines.
Springer, 1978.
- P. D'haeseleer, X. Wen, S. Fuhrman, and R.
Somogyi. Linear modeling of mRNA expression
levels during CNS development and injury. In
PSB99, 1999.
- G. James and T. Hastie. Functional linear
discriminant analysis for irregularly sampled
curves. Journal of the Royal Statistical Society,
to appear, 2001.
- R. Sharan and R. Shamir. Algorithmic approaches to
clustering gene expression data. Current Topics
in Computational Biology, to appear.
- O. Troyanskaya, M. Cantor, et al. Missing
value estimation methods for DNA microarrays.
Bioinformatics, 17:520-525, 2001.