Jacques van Helden jvanheld@ucmb.ulb.ac.be

Description:

each one of the p columns contains information ... SIC1. MAT. CLN2. Y' MET. Alpha. cdc15. cdc28. Elu. Spellman et al. (1998). Mol Biol Cell 9(12), 3273-97. ...

Transcript and Presenter's Notes


1
Visualization
  • Statistical Analysis of Microarray Data

2
Reduction in data dimension
  • Statistical Analysis of Microarray Data

3
Why reduce dimensionality?
  • A series of microarrays can be represented as an
    N x p matrix, where
  • each one of the p columns contains information
    about an experiment (different conditions,
    treatments, tissues)
  • each one of the N rows contains information about
    a spot (gene)
  • Object dimensions
  • Each gene can be considered as a p-dimensional
    object (one dimension per experiment).
  • Each experiment can be considered as a
    N-dimensional object (one dimension per gene).
  • Visualization
  • Visualization devices are restricted to 2
    (printer) or at best 3 (space explorer)
    dimensions.
  • One would thus like to display objects in 2D or
    3D, whilst retaining the maximum of information.
  • After reduction of dimensions, some clusters may
    already appear in the data set.
  • Analysis
  • Some analysis methods lose their accuracy when
    there are too many variables (over-fitting).
  • Reducing the data to a subset of dimensions
    allows a trade-off between the loss of information
    and the gain in accuracy. In this case, the
    appropriate number of dimensions may be higher
    than 3; its choice depends on the data itself
    (e.g. number of objects per training group).
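The two views of the data matrix described above can be illustrated with a minimal NumPy sketch. The array name `expression` and its random values are assumptions made for illustration only; they are not data from the presentation.

```python
import numpy as np

# Hypothetical toy data: N = 5 genes (rows) measured in p = 3 experiments (columns).
rng = np.random.default_rng(0)
expression = rng.normal(size=(5, 3))

n_genes, n_experiments = expression.shape

# Each gene is a p-dimensional object: one row of the matrix.
gene_profile = expression[0, :]

# Each experiment is an N-dimensional object: one column of the matrix.
experiment_profile = expression[:, 0]
```

Clustering genes works on the rows, whereas comparing experiments works on the columns (or on the rows of the transposed matrix).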

4
How to reduce dimensionality?
  • Several methods are available for reducing the
    number of dimensions of a data set
  • Principal Component Analysis
  • Singular Value Decomposition
  • Spring embedding

5
Principal component analysis
  • Multidimensional data
  • n objects, p variables (in this case p = 2)
  • Principal components
  • n objects, p factors
  • Each factor is a linear combination of variables
  • Reduction in dimensions
  • Selection of a subset of principal components
  • q factors, with q < p (in this case, q = 1)
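As a rough sketch of the steps listed above, the following NumPy code reduces n objects described by two variables to a single factor. The function name `pca` and the synthetic data are assumptions for illustration; the eigen-decomposition of the covariance matrix is one standard way to compute principal components, not necessarily the one used for the slides.

```python
import numpy as np

def pca(data, q):
    """Project an n x p table onto its first q principal components."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)        # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    weights = eigvecs[:, :q]                    # each factor = linear combination of variables
    factors = centered @ weights                # n x q coordinates
    return factors, weights, eigvals

# Hypothetical data: 200 objects, 2 strongly correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 0.5 * x + rng.normal(scale=0.1, size=200)])

factors, weights, variances = pca(data, q=1)
```

Each column of `weights` holds the coefficients of the linear combination that defines one factor, and `variances` gives the variance carried by each component.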

[Figure: panels A, B, C]
Gilbert, D., Schroeder, M. & van Helden, J.
(2000). Trends in Biotechnology 18(Dec), 487-495.
6
PCA example - gene expression data
7
PCA example - gene expression data
  • Data set
  • n = 114 objects (genes)
  • p = 8 variables (chips)
  • Drawing
  • The 2 most explanatory factors are used as X and
    Y axes
  • Red arrows represent the projection of the initial
    axes (variables) onto the plane of the first 2
    principal components.
  • The central cloud is made of MET and control
    genes, whereas the PHO genes are outside.
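The drawing described above combines two things: the coordinates of the genes on the first two components, and the projection of each original axis onto that plane. A minimal sketch of both, assuming a random placeholder table with the same shape as the slide's data set (the values are not the original expression data):

```python
import numpy as np

# Placeholder values with the slide's dimensions (114 genes x 8 chips).
rng = np.random.default_rng(1)
data = rng.normal(size=(114, 8))

centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Gene coordinates on the first two principal components (X and Y axes).
scores = centered @ vt[:2].T          # shape (114, 2)

# Projection of each original axis (chip) onto the same plane:
# row j gives the tip of the arrow drawn for variable j.
arrows = vt[:2].T                     # shape (8, 2)
```

An arrow close to unit length indicates a variable that is well represented in the plane; a short arrow indicates a variable mostly carried by the remaining components.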

8
PCA example - gene expression data
  • Data set
  • n = 5783 objects (genes)
  • p = 8 variables (chips)
  • Drawing
  • The 2 most explanatory factors are used as X and
    Y axes
  • A few points are clearly outside the cloud.

9
Data reduction with principal components
  • Data from Gasch (2000). Growth on alternate
    carbon sources (11 chips).
  • The plot represents the first two components
    after PCA transformation
  • Pink dots represent genes which are significantly
    regulated in at least one chip
  • Beware: the first 2 components are not sufficient
    to highlight all the regulated genes in the 11
    conditions
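One way to check how much is lost by keeping only two components is to look at the fraction of variance each component carries. The sketch below assumes a random placeholder table with 11 columns; the function name `explained_variance_ratio` is hypothetical and the numbers it yields say nothing about the Gasch data.

```python
import numpy as np

def explained_variance_ratio(data):
    """Fraction of the total variance carried by each principal component."""
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2      # proportional to component variances
    return variances / variances.sum()

# Placeholder table: 500 objects x 11 conditions of random values.
rng = np.random.default_rng(2)
data = rng.normal(size=(500, 11))

ratio = explained_variance_ratio(data)
cumulative = np.cumsum(ratio)

# Share of the variance retained by the first two components.
retained_by_two = cumulative[1]
```

When `retained_by_two` is well below 1, genes that vary mainly along the discarded components will not stand out in the 2D plot.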

10
Multidimensional scaling
  • Data from Gasch (2000). Growth on alternate
    carbon sources (11 chips).
  • Subset of 398 genes significantly regulated in at
    least one chip
  • Singular value decomposition on correlation
    matrix
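A minimal sketch of a singular value decomposition applied to a correlation matrix follows. It assumes that the correlation is computed between genes (rows), which the slide does not state explicitly, and it uses random placeholder values rather than the 398 selected genes.

```python
import numpy as np

# Placeholder table with the slide's dimensions (398 genes x 11 chips).
rng = np.random.default_rng(3)
data = rng.normal(size=(398, 11))

# Pearson correlation between gene profiles (rows are treated as variables).
corr = np.corrcoef(data)                 # shape (398, 398), symmetric

# Singular value decomposition of the correlation matrix.
u, s, vt = np.linalg.svd(corr)

# 2D coordinates for each gene, scaled by the square root of the singular values.
coords = u[:, :2] * np.sqrt(s[:2])
```

Because each gene profile has only 11 values, at most 10 singular values of `corr` are non-zero here; the rest are numerical noise.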

11
Singular value decomposition
[Figure panels: cell cycle data; random data]
  • Calculate a distance matrix between objects
  • in this case Pearson's coefficient of correlation
  • Assign 2D coordinates which best reflect the
    distances
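The two steps above can be sketched with classical (metric) multidimensional scaling, which is one common way of assigning coordinates from a distance matrix; the slides do not say which algorithm was actually used. The distance 1 - r (with r Pearson's correlation), the function name `classical_mds`, and the random profiles are all assumptions for illustration.

```python
import numpy as np

def classical_mds(dist, d=2):
    """Assign d-dimensional coordinates that approximate the given distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered matrix of squared distances (Gram matrix).
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:d]          # d largest eigenvalues
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Hypothetical profiles: 30 objects x 8 variables of random values.
rng = np.random.default_rng(4)
profiles = rng.normal(size=(30, 8))

# Step 1: distance matrix derived from Pearson's correlation coefficient.
dist = 1.0 - np.corrcoef(profiles)

# Step 2: 2D coordinates reflecting the distances as well as possible.
coords = classical_mds(dist, d=2)
```

When the input distances are exactly Euclidean and the objects lie in a plane, this procedure recovers the configuration up to rotation and reflection; otherwise the 2D distances are only an approximation.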

12
Singular value decomposition
Gilbert et al. (2000). Trends Biotech. 18(Dec),
487-495.
13
Adapted from Gilbert et al. (2000). Trends
Biotech. 18(Dec), 487-495.
Raw data
Visualization
Processing
  • Matrix
  • n rows
  • p columns
  • coloring
  • Ordering (optional)
  • row swapping
  • column swapping

Matrix viewer
  • Dendrogram
  • rooted
  • unrooted
  • n leaves

Tree drawing
Clusters, Tree
Clustering
  • Multivariate data matrix
  • n objects
  • p variables

Pairwise distance measurement
  • Distance matrix
  • n x n distances
  • symmetrical

Coloring (optional)
  • Euclidean space
  • 1D to 3D
  • n dots
  • coloring
  • dot volume
  • interactive
  • Multidimensional scaling
  • PCoA
  • spring embedding

Space explorer (VRML)
  • Coordinates
  • n elements
  • d dimensions

Principal component analysis
  • Normalization
  • mean
  • variance
  • covariance
  • Normalized table
  • n elements
  • p dimensions

Reduction to significant dimensions
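The normalization step listed under principal component analysis (mean, variance, covariance) can be sketched as a column-wise standardization. This is an assumed reading of the diagram; the function name `standardize` and the random table are hypothetical.

```python
import numpy as np

def standardize(table):
    """Center each column on 0 and scale it to unit variance."""
    mean = table.mean(axis=0)
    std = table.std(axis=0, ddof=1)      # sample standard deviation
    return (table - mean) / std

# Hypothetical table: n = 50 elements, p = 4 dimensions.
rng = np.random.default_rng(5)
table = rng.normal(loc=3.0, scale=2.0, size=(50, 4))

normalized = standardize(table)

# Covariance of the normalized table equals the correlation of the raw table.
covariance = np.cov(normalized, rowvar=False)
```

After this step every variable contributes with the same weight, so the principal components are not dominated by the variables with the largest raw variance.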