Examples of Clustering Biological Data

Provided by: leannecon
Learn more at: http://psrg.lcs.mit.edu

1
Examples of Clustering Biological Data
  • 6.892 / 7.93 Spring Term 2002
  • March 12, 2002
  • David Gifford

2
Student Projects
  • Ideally groups of four
  • Two biologists
  • Two computer scientists or quantitative experts
  • Pick a research project related to the class
  • You will have access to data, Matlab, etc.
  • Work product
  • Preliminary outline of plans (5 minutes in class)
  • Final presentation (30 minutes)
  • Talk and slides are graded

3
Ordering effect
4
The problem
"There are 2^(n-1) linear orderings consistent with
the structure of the tree. An optimal linear
ordering, one that maximizes the similarity of
adjacent elements in the ordering, is impractical
to compute."
Eisen et al., PNAS 1998
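The count in the quote can be checked on a small case. The sketch below is an illustration only: it assumes a binary tree written as nested Python tuples (a representation chosen here, not taken from the slides) and enumerates every leaf ordering reachable by flipping the two children of internal nodes.

```python
from itertools import product

def orderings(tree):
    """Yield every leaf ordering obtainable by flipping internal nodes.

    A tree is a nested 2-tuple; anything that is not a tuple is a leaf.
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for a, b in product(list(orderings(left)), list(orderings(right))):
        yield a + b  # children in their original order
        yield b + a  # children flipped

# Hypothetical 4-leaf tree, used only for illustration.
tree = (("a", "b"), ("c", "d"))
all_orders = list(orderings(tree))
print(len(all_orders))  # 2**(4 - 1) = 8
```

Each of the n-1 internal nodes contributes one independent flip, which is where the exponential count comes from.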
5
Problem definition
Denote by Φ the space of the possible linear
orderings consistent with the tree. Denote by
v_1, ..., v_n the tree leaves. Our goal is to find an
ordering φ ∈ Φ that maximizes the similarity of
adjacent elements,
  \max_{\phi \in \Phi} \sum_{i=1}^{n-1} S(v_{\phi(i)}, v_{\phi(i+1)}),
where S is the similarity matrix.
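As a minimal sketch of the quantity being maximized, the function below sums the similarity of each pair of neighbours in a given ordering. The matrix `S` and the name `ordering_score` are made up for this example; leaves are assumed to be integer indices into `S`.

```python
def ordering_score(order, S):
    """Sum of similarities between neighbouring leaves in a linear ordering."""
    return sum(S[order[i]][order[i + 1]] for i in range(len(order) - 1))

# Hypothetical symmetric similarity matrix for 4 leaves (indices 0..3).
S = [
    [0, 9, 1, 2],
    [9, 0, 8, 1],
    [1, 8, 0, 7],
    [2, 1, 7, 0],
]

print(ordering_score([0, 1, 2, 3], S))  # 9 + 8 + 7 = 24
print(ordering_score([1, 0, 2, 3], S))  # 9 + 1 + 7 = 17
```

Two orderings of the same leaves can score very differently, which is why the choice among the orderings consistent with the tree matters.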
6
Computing the optimal similarity
Recursively compute the optimal similarity
O_T(u,w) for any pair of leaves (u,w) which could
be on different corners (leftmost and rightmost)
of T. For a leaf u ∈ T, C_T(u) is the set of all
possible corner leaves of T when u is on one
corner of T.
[Figure: tree T with subtrees T1 and T2; leaves u and m in T1, leaves k and w in T2]
O_T(u,w) = \max_{m \in C_{T_1}(u),\, k \in C_{T_2}(w)} \left[ O_{T_1}(u,m) + O_{T_2}(k,w) + S(m,k) \right]
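The recursion can be sketched directly as a table built bottom-up over the tree. This is a plain, unoptimized illustration, not the implementation timed on the later slides: it assumes a nested-tuple tree with integer leaves indexing a made-up similarity matrix `S`, and the names `optimal_table` and `optimal_ordering` are invented here.

```python
def optimal_table(tree, S):
    """Map each (leftmost, rightmost) leaf pair to (best score, ordering)."""
    if not isinstance(tree, tuple):  # a single leaf
        return {(tree, tree): (0, [tree])}
    t1, t2 = (optimal_table(child, S) for child in tree)
    best = {}
    # Try both arrangements: T1 to the left of T2, then the flipped one.
    for left, right in ((t1, t2), (t2, t1)):
        for (u, m), (score_l, order_l) in left.items():
            for (k, w), (score_r, order_r) in right.items():
                # O_T(u,w) candidate: O_T1(u,m) + O_T2(k,w) + S(m,k)
                score = score_l + score_r + S[m][k]
                if (u, w) not in best or score > best[(u, w)][0]:
                    best[(u, w)] = (score, order_l + order_r)
    return best

def optimal_ordering(tree, S):
    """Return (score, ordering) maximizing adjacent similarity."""
    return max(optimal_table(tree, S).values(), key=lambda entry: entry[0])

# Hypothetical data: 4 leaves and a symmetric similarity matrix.
S = [
    [0, 9, 1, 2],
    [9, 0, 8, 1],
    [1, 8, 0, 7],
    [2, 1, 7, 0],
]
tree = ((0, 2), (1, 3))
score, order = optimal_ordering(tree, S)
print(score)  # 11
```

Only orderings consistent with the tree are considered, so leaves 0 and 2 stay adjacent even though 0 and 1 are the most similar pair; the flip of each subtree is what the recursion optimizes.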
7
Improvement
Worst-case running time is still O(n^4), but
8
Running time: biological datasets
Results obtained on a 700 MHz Pentium PC with 512 MB
of memory running Linux
9
Does it help?
Recall the statement we started with: "An
optimal linear ordering, one that maximizes the
similarity of adjacent elements in the
ordering, is impractical to compute."

10
Results: hand-generated data
Input
11
Biological results
  • Spellman et al. identified 800 genes as cell cycle
    regulated in Saccharomyces cerevisiae.
  • Genes were assigned to five groups, termed
    G1, S, S/G2, G2/M and M/G1, which approximate the
    commonly used cell cycle groups in the
    literature.
  • This assignment was performed using a phasing
    method, which is a supervised classification
    algorithm.
  • In addition to the phasing method, the authors
    clustered these genes using hierarchical
    clustering, noting:
  • There is no simple relationship between these
    two phasing and clustering methods, although
    there are common features in the results.

12
Cell cycle: 24 experiments of cdc15 temperature-sensitive
mutant
Hierarchical clustering
13
24 experiments of cdc15 temperature-sensitive
mutant
14
59 experiments, combining cdc15, cdc28 and α
factor arrest
15
Clustering of the 79 experiments in Eisen's
paper. The numbers to the right of each gene
represent the complex to which it belongs
according to the MIPS database.
Optimal ordering
Hierarchical clustering
16
Using optimal ordering to identify the different
clusters. 24 experiments of cdc15 mutant from
Spellman's paper.
17
Clustering Demos
  • K-means
  • Demo 1: 3 clusters in data, k = 3
  • Demo 2: 1 cluster in data, k = 2
  • Mixture Models
  • Demo 3: 1 cluster in data, k = 2 (same data as
    Demo 2)
  • Demo 4: 3 clusters in data, k = 2
  • Demo 5: 3 clusters in data, k = 3
  • I'll put the code on the web site
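The demo code itself is not part of this transcript. As a stand-in, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python, run on a situation like Demo 2: a single synthetic Gaussian cluster with k = 2. The data, seeds, and function name are all assumptions made for this example.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from k random data points
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, clusters

# Synthetic data: one 2-D Gaussian cluster, as in the "1 cluster, k = 2" case.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]

centers, clusters = kmeans(points, k=2)
print([len(cl) for cl in clusters])
```

K-means always returns k groups, so with k = 2 it splits the single cluster in two; choosing k is left to the user.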

18
Clustering Summary
  • Clustering allows you to organize data and see
    patterns
  • Can reduce the dimensionality of data (e.g. PCA)
  • Ultimately, we would like to use clusters to
    explain biological phenomena
  • The first step is classification