Model-based clustering of gene expression data from microarray experiments

1
Model-based clustering of gene expression data
from microarray experiments
  • Debashis Ghosh, Arul M. Chinnaiyan
  • Department of Biostatistics, University of
    Michigan
  • Department of Pathology, University of Michigan
  • yfhuang

2
Outline
  • Introduction
  • Systems and Methods
  • Algorithm
  • Implementation
  • Results
  • Discussion

3
Introduction
  • Large collections of genes will rapidly become
    available for parallel genomic studies
  • Two platforms currently dominate the microarray
    field: oligonucleotide arrays and spotted cDNA
    arrays

4
Introduction (Cont)
  • Clustering methods are useful when the goal is to
    discover groupings in the gene expression data
    and no external information exists.
  • An attractive feature of the model-based approach
    is that it provides a statistical criterion for
    assessing the number of true clusters in the
    dataset of interest.

5
Systems and Methods
  • Data Preprocessing
  • Model Specification
  • Density function
  • Multivariate normal density
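
The slide names the density function without writing it out. As a sketch, assuming the standard Gaussian mixture formulation used in model-based clustering (the symbols G, π_k, μ_k, Σ_k and p are notation chosen here, not taken from the slides), the density of an observation x would be:

```latex
f(x) = \sum_{k=1}^{G} \pi_k \, \phi(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{G} \pi_k = 1,

\phi(x \mid \mu_k, \Sigma_k)
  = (2\pi)^{-p/2} \, |\Sigma_k|^{-1/2}
    \exp\!\left\{ -\tfrac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right\}
```

Here G is the number of clusters, π_k the mixing proportion of cluster k, and p the dimension of x.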

6
Algorithm
  • Two steps in fitting the model to the data
  • Initialization by model-based hierarchical
    agglomerative clustering
  • Maximum likelihood estimation using the EM algorithm
  • A criterion for determining the number of clusters
    in the data

7
Hierarchical Agglomerative Clustering
  • Classification log-likelihood
  • Goal: find the maximizer of the classification
    log-likelihood
  • Based on a combination of the dissimilarity matrix
    and a method of defining distance between clusters
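
The formula for the classification log-likelihood did not survive in the transcript. A common form, assuming each observation x_i carries a hard cluster label γ_i in {1, ..., G} and each cluster is multivariate normal (notation chosen here for illustration), is:

```latex
\ell_C(\mu_1, \Sigma_1, \dots, \mu_G, \Sigma_G; \gamma_1, \dots, \gamma_n \mid x_1, \dots, x_n)
  = \sum_{i=1}^{n} \log \phi\!\left(x_i \mid \mu_{\gamma_i}, \Sigma_{\gamma_i}\right)
```

Under this reading, agglomerative clustering starts with every observation in its own cluster and, at each stage, merges the pair of clusters whose merger gives the largest value of this criterion.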

8
Expectation-Maximization Algorithm
  • Complete-data likelihood
  • Log-likelihood

9
Expectation-Maximization Algorithm (Cont)
  • Estimator
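
The equations on the two EM slides were not transcribed. The following is a minimal sketch of the E-step and M-step for a one-dimensional Gaussian mixture, written with the Python standard library only. It is an illustration under simplifying assumptions (univariate data, fixed iteration count, a small variance floor), not the authors' implementation; the function names and the toy data are invented for the example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Run EM for a 1-D Gaussian mixture; return (means, variances, weights, loglik)."""
    means, variances, weights = list(means), list(variances), list(weights)
    g, n = len(means), len(data)
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(g)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted maximum likelihood estimates of the parameters.
        for k in range(g):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor (an assumption) to avoid collapse
    # Observed-data log-likelihood at the final estimates.
    loglik = sum(
        math.log(sum(weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(g)))
        for x in data
    )
    return means, variances, weights, loglik

# Toy data (invented): two well-separated groups near 0 and near 5.
toy = [-0.2, 0.1, 0.0, 0.3, -0.1, 4.8, 5.1, 5.0, 5.3, 4.9]
est_means, est_vars, est_weights, est_loglik = em_gmm_1d(
    toy, means=[1.0, 4.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

In the model-based approach described on the slides, the starting values would come from the hierarchical agglomerative step rather than being chosen by hand as they are here.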

10
Selecting the number of clusters
  • Determining the number of clusters based on the
    Bayes factor
  • Bayes Factor
  • Bayesian Information Criterion (BIC)
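
As a sketch of how BIC can be used to choose the number of clusters, the snippet below assumes the "larger is better" convention, BIC = 2 log L − m log n, where L is the maximized likelihood, m the number of free parameters, and n the number of observations. The slides do not state which sign convention was used, so this is an assumption, and the log-likelihood values in the example are invented for illustration.

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2*loglik - n_params*log(n_obs)."""
    return 2.0 * loglik - n_params * math.log(n_obs)

def select_n_clusters(fits, n_obs):
    """Pick the cluster count with the highest BIC.

    fits maps number of clusters -> (maximized log-likelihood, free parameter count).
    """
    scores = {g: bic(loglik, m, n_obs) for g, (loglik, m) in fits.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical fits for 1, 2 and 3 clusters on 50 observations.
example_fits = {1: (-120.0, 2), 2: (-80.0, 5), 3: (-78.5, 8)}
best_g, bic_scores = select_n_clusters(example_fits, n_obs=50)
```

The difference in BIC between two candidate models is commonly used as an approximation to twice the log Bayes factor, which is why the two appear together on this slide.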

11
Implementation
  • Model-based clustering in microarray studies
  • Analysis of genes and ESTs (expressed sequence
    tags)
  • Number of samples profiled (n) << the number of
    genes on a microarray (p) → impossible to fit the
    model to these data
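
To make the n << p problem concrete, the sketch below counts the free parameters of a multivariate normal mixture with unrestricted covariance matrices. The function names are invented for this example; the counting rule (p mean terms plus p(p+1)/2 covariance terms per component) is the standard one for a full-covariance Gaussian.

```python
def gaussian_free_params(p):
    """Free parameters of one p-dimensional Gaussian with unrestricted covariance."""
    return p + p * (p + 1) // 2

def mixture_free_params(p, g):
    """Free parameters of a g-component mixture: g Gaussians plus g - 1 mixing proportions."""
    return g * gaussian_free_params(p) + (g - 1)

# Dimensions of the prostate cancer data reported later in the slides.
n_samples, n_genes = 26, 3955
per_component = gaussian_free_params(n_genes)
```

With 3955 genes and only 26 samples, a single component already needs millions of parameters, which motivates the dimension reduction described on the next slide.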

12
Implementation (Cont)
  • Model-based clustering of genes
  • Using k-means clustering as a preprocessing step
  • Model-based clustering of samples
  • Using principal components analysis for dimension
    reduction
  • Using Bayes factors to determine the number of
    clusters
  • Software
  • Using MCLUST

13
Results
  • Cutaneous melanoma data
  • Data: cDNA microarray experiments performed by
    Bittner et al.
  • 31 melanoma samples
  • Microarray contains 8150 human cDNAs, of which
    6912 were sequence verified.

14
Results (Cont)
  • Prostate cancer data
  • Data: 3955 genes across 26 samples
  • Original data: 9984 (5000 known genes from the
    Research Genetics human cDNA clone set, 4400
    ESTs, 500 control elements)

15
Cluster Dendrogram (Bittner)
16
PCA of Melanoma Data
18
PCA of Prostate Cancer Data
20
Discussion
  • An attractive feature of the model-based
    clustering methodology is that it computes a
    measure of the strength of evidence for the
    number of true clusters in the data
  • Bayes factor for model specification

21
Reference
  • Principal Components Analysis (PCA)
  • Bayes Factor