Biology-Driven Clustering of Microarray Data: - PowerPoint PPT Presentation

Title: Biology-Driven Clustering of Microarray Data Last modified by: Kevin Coombes Created Date: 9/30/1996 6:28:10 PM Document presentation format – PowerPoint PPT presentation

Slides: 14
Provided by: bioinforma2
Transcript and Presenter's Notes


1
Biology-Driven Clustering of Microarray Data
  • Applications to the NCI60 Data Set

K.R. Coombes, K.A. Baggerly, D.N. Stivers, J.
Wang, D. Gold, H.G. Sung, and S.J. Lee
2
Introduction
  • Most analyses of microarray data proceed as
    though it were simply a large, unstructured
    matrix. Such analyses ignore substantial amounts
    of existing biological information. In the study
    of cancer, we already know many important genes
    through their involvement in specific biological
    processes, and we know that reproducible
    chromosomal abnormalities play an important role.
    We see a need for developing analytic strategies
    that exploit this biological information.
Methods
  • We analyzed the NCI60 data set by first
    determining the chromosomal location and
    biological function of the genes on the
    microarray. We performed separate analyses using
    genes on individual chromosomes and genes
    involved in different biological processes. The
    fundamental advantage of this approach is that it
    provides results that are immediately and
    directly interpretable without resorting to ex
    post facto rationalizations.
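The "separate analyses" described above can be sketched as follows: split the expression table into one view per chromosome (or per functional category), then compute sample-to-sample distances within each view as input to hierarchical clustering. This is only an illustrative sketch. The slides do not state the distance measure or linkage used, so the correlation distance here is an assumption, and the gene names, chromosome labels, and expression values are made up.

```python
from collections import defaultdict
from math import sqrt


def split_by_annotation(expression, annotation):
    """Group gene profiles by an annotation key (e.g. chromosome or category).

    expression: {gene: [value per sample]}
    annotation: {gene: key}; genes without an annotation are dropped.
    """
    views = defaultdict(dict)
    for gene, profile in expression.items():
        key = annotation.get(gene)
        if key is not None:
            views[key][gene] = profile
    return views


def correlation_distance(x, y):
    """1 - Pearson correlation (an assumed choice of distance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / sqrt(sxx * syy)


def sample_distances(view, n_samples):
    """Pairwise distances between samples, using only the genes in one view."""
    columns = [[profile[j] for profile in view.values()] for j in range(n_samples)]
    return [
        [correlation_distance(columns[i], columns[j]) for j in range(n_samples)]
        for i in range(n_samples)
    ]


# Hypothetical toy data: four genes measured in three samples.
expression = {"g1": [1, 2, 3], "g2": [2, 4, 1], "g3": [3, 6, 0], "g4": [5, 5, 5]}
chromosome = {"g1": "2", "g2": "2", "g3": "2", "g4": "16"}

views = split_by_annotation(expression, chromosome)
distances_chr2 = sample_distances(views["2"], n_samples=3)
```

Each distance matrix could then be passed to any hierarchical clustering routine to produce one dendrogram per view.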

3
How many genes on the microarray have good
annotations?
  • Problem
    • I.M.A.G.E. clone IDs and GenBank accession
      numbers are archival.
    • UniGene clusters, gene names, descriptions, etc.,
      are changeable.
  • Solution
    • Download the latest version of UniGene (build
      137) and LocusLink (July 2001) to update
      annotations, using the GenBank accession numbers
      describing both 3′ and 5′ ends of the genes
      spotted on the microarrays.

Table 1: There are only 7478 spots (out of
10,000) on the array with valid, matching UniGene
cluster IDs. Genes with unknown or conflicting
annotations were eliminated before performing any
further analysis.
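One way to picture the filtering step is the sketch below. It assumes a spot is kept only when every accession number available for it (3′ and/or 5′) maps to a UniGene cluster and all mapped clusters agree. The exact rule used by the authors is not given in the slides, and all accession numbers and cluster IDs here are placeholders.

```python
def filter_spots(spots, accession_to_cluster):
    """Keep spots whose available accessions all map to one and the same cluster.

    spots: {spot_id: (accession_3prime, accession_5prime)}, either may be None
    accession_to_cluster: {accession: cluster_id} from the updated annotation
    Returns {spot_id: cluster_id} for the spots that pass (assumed rule).
    """
    kept = {}
    for spot_id, accessions in spots.items():
        present = [a for a in accessions if a is not None]
        clusters = {accession_to_cluster.get(a) for a in present}
        # Drop spots with no accession, an unmapped accession, or a conflict.
        if present and None not in clusters and len(clusters) == 1:
            kept[spot_id] = clusters.pop()
    return kept


# Hypothetical placeholder data.
spots = {
    "spot1": ("ACC_A3", "ACC_A5"),  # both ends agree
    "spot2": ("ACC_B3", None),      # only one end available
    "spot3": ("ACC_C3", "ACC_C5"),  # ends map to different clusters
    "spot4": ("ACC_D3", "ACC_D5"),  # one end has no cluster
}
accession_to_cluster = {
    "ACC_A3": "cluster_1",
    "ACC_A5": "cluster_1",
    "ACC_B3": "cluster_2",
    "ACC_C3": "cluster_3",
    "ACC_C5": "cluster_4",
    "ACC_D3": "cluster_5",
}

good_spots = filter_spots(spots, accession_to_cluster)
distinct_genes = set(good_spots.values())
```

Counting the distinct cluster IDs among the kept spots gives the number of distinct genes represented, since several spots can map to the same cluster.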
4
Where are the genes located?
We compared the number of genes on the microarray
that mapped to each chromosome with the number
known to be on the chromosome, based on current
figures from the NCBI. A chi-squared test was
used to test whether the genes on the array were
distributed across chromosomes in proportion to
the known gene counts.
Figure 1: Distribution of the genes on the array
by chromosome. Chromosomes 19 and Y are
substantially underrepresented when compared to
the numbers known to LocusLink; chromosomes 6 and
13 are overrepresented.
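The comparison on this slide can be sketched as a chi-squared goodness-of-fit calculation, where the expected count for each chromosome is the array total scaled by that chromosome's share of the known genes. The counts below are invented for illustration and are not the values behind Figure 1. The signed Pearson residuals are an added convenience for spotting over- and underrepresented chromosomes, not something the slides describe.

```python
def chi_squared_vs_expected(observed, known):
    """Chi-squared statistic for array counts against known per-chromosome counts.

    observed: {chromosome: number of array genes mapped there}
    known:    {chromosome: number of genes known on that chromosome}
    Returns (statistic, degrees of freedom, {chromosome: Pearson residual}).
    """
    total_observed = sum(observed.values())
    total_known = sum(known[c] for c in observed)
    statistic = 0.0
    residuals = {}
    for chrom, obs in observed.items():
        expected = total_observed * known[chrom] / total_known
        statistic += (obs - expected) ** 2 / expected
        # Positive residual: overrepresented; negative: underrepresented.
        residuals[chrom] = (obs - expected) / expected ** 0.5
    return statistic, len(observed) - 1, residuals


# Hypothetical counts for illustration only.
known = {"1": 2000, "6": 1000, "19": 1400, "Y": 100}
observed = {"1": 400, "6": 300, "19": 180, "Y": 20}

statistic, dof, residuals = chi_squared_vs_expected(observed, known)
```

A p-value would follow from comparing the statistic with a chi-squared distribution on the returned degrees of freedom, for example with a statistics library.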
5
How do we determine gene functions?
  • Using our updated UniGene clusters, we followed
    the links from UniGene to LocusLink to
    GeneOntology.
  • GeneOntology is a structured, hierarchical
    vocabulary to describe gene functions in three
    broad areas:
    • biological process (why)
    • molecular function (what)
    • cellular component (where)
  • The 7478 good spots on the array corresponded to
    6614 distinct genes, of which 5074 were known to
    LocusLink, and 2989 had at least one annotation
    in GeneOntology.
  • We focused on the biological process annotations
    in the GeneOntology vocabulary, since these had
    the most natural interpretation for application
    to the study of cancer. We counted the number of
    genes having annotations of functions at or below
    each level in the hierarchy, and selected a set
    of categories that each contained roughly one to
    a few hundred genes, with the categories as a
    whole accounting for more than 95% of all
    annotations (Table 2).
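Counting genes "at or below" each level of the hierarchy means that an annotation to a specific term also counts toward every more general term above it. A minimal sketch, assuming the hierarchy is available as a child-to-parents map, is shown here. The term names and gene IDs form a toy example and are not taken from the actual GeneOntology files.

```python
from collections import defaultdict


def genes_at_or_below(parents, annotations):
    """Map each term to the set of genes annotated to it or to any descendant.

    parents:     {term: [parent terms]} (a term may have several parents)
    annotations: {gene: [terms the gene is directly annotated to]}
    """
    genes = defaultdict(set)
    for gene, terms in annotations.items():
        stack = list(terms)
        seen = set()
        while stack:
            term = stack.pop()
            if term in seen:
                continue
            seen.add(term)
            genes[term].add(gene)
            # Propagate the gene upward to every ancestor of this term.
            stack.extend(parents.get(term, ()))
    return genes


# Hypothetical toy hierarchy and annotations.
parents = {
    "cell cycle": ["biological process"],
    "mitosis": ["cell cycle"],
    "apoptosis": ["biological process"],
}
annotations = {
    "g1": ["mitosis"],
    "g2": ["cell cycle"],
    "g3": ["apoptosis", "mitosis"],
}

counts = {term: len(g) for term, g in genes_at_or_below(parents, annotations).items()}
```

With counts like these in hand, categories of a workable size can be picked by scanning the hierarchy for terms whose gene counts fall in the desired range.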

6
What functional categories are represented on the
array?
Table 2: The number of annotations (Ann.) into,
and the number of spots on the array in, various
functional categories chosen from the biological
process annotations from LocusLink into
GeneOntology. Individual spots may have multiple
annotations into the same category; individual
genes may be represented by multiple spots.
7
How good is a dendrogram?
We introduced a quality grade, based on the
dendrograms, to describe how well each set of
genes used to produce a dendrogram classifies
each kind of cancer:
  • A: there is a cluster containing all samples of
    one kind of cancer and nothing else
  • B: all, with one or two extras
  • C: all except one
  • D: all except one, with extras
  • E: all except two
  • F: all except two, with extras
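The grading scheme above can be written down as a small function. This is a sketch under stated assumptions: clusters are supplied as lists of sample labels (how they are read off the dendrogram is not covered here), and the cap on "extras" for grades D and F, which the slide does not specify, is assumed to be the same "one or two" used for grade B.

```python
from collections import Counter

MAX_EXTRAS = 2  # assumption: the "one or two extras" limit applies to B, D, and F


def grade_cluster(cluster, cancer, total):
    """Grade one cluster for one cancer type, or return None if no grade applies.

    cluster: list of sample labels in the cluster
    cancer:  the cancer type being graded
    total:   number of samples of that type in the whole data set
    """
    counts = Counter(cluster)
    if counts[cancer] == 0:
        return None
    missing = total - counts[cancer]
    extras = len(cluster) - counts[cancer]
    if missing > 2 or extras > MAX_EXTRAS:
        return None
    clean, with_extras = {0: ("A", "B"), 1: ("C", "D"), 2: ("E", "F")}[missing]
    return clean if extras == 0 else with_extras


def best_grade(clusters, cancer, total):
    """Best (alphabetically first) grade over all candidate clusters."""
    grades = [grade_cluster(c, cancer, total) for c in clusters]
    grades = [g for g in grades if g is not None]
    return min(grades) if grades else None


# Hypothetical clusters for a type with seven samples in total.
clusters = [
    ["colon"] * 6,
    ["colon"] * 7 + ["breast"],
    ["breast", "lung", "lung"],
]
colon_grade = best_grade(clusters, "colon", total=7)
```

A blank entry in a grade table then corresponds to `None`: no cluster comes within two samples of capturing that cancer type.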

Grades for the dendrogram of Figure 2 are
displayed in the following table.
Figure 2: Dendrogram using all genes with valid
annotations and with expression levels
above those of the blank spots.
8
Heterogeneity of different types of cancer
  • Some cancers (colon, leukemia) are fairly
    homogeneous and easy to distinguish from others.
  • Some (breast, lung) are so heterogeneous as to be
    nearly impossible to distinguish.
  • Some chromosomes (1, 2, 6, 7, 9, 12, 17) can
    distinguish many types of cancer.
  • Some (16, 21) cannot accurately distinguish any
    kind of cancer. The dendrograms using genes from
    these chromosomes are equivalent to a random
    scrambling of the cancer cell lines.

Table 3: Grades given to dendrograms that cluster
samples by genes on specific chromosomes. Grades
range from A to F, with blanks indicating no
clustering for that type of sample.
Abbreviations: B = breast, C = colon, L = leukemia,
M = melanoma, N = non-small cell lung, O = ovarian,
P = prostate, R = renal, S = central nervous system.
9
Chromosome 2
Figure 3: The genes on chromosome 2 do
an excellent job of distinguishing cancer types.
We can also locate specific clusters of genes on
the chromosome with strong signatures
identifying leukemia, melanoma, and colon cancer.
10
Chromosome 16
Figure 4: Genes on chromosome 16 cannot
reliably distinguish any single kind of cancer in
this study. There are, nevertheless, strong gene
signatures driving the clustering, which does not
appear to match anything we know about the
biology of the samples.
11
Protein Metabolism
Figure 5: The genes involved in protein
metabolism do an excellent job of distinguishing
cancer types. We can also locate specific
clusters of genes with strong signatures
identifying leukemia, colon cancer,
lung cancer, and central nervous system cancer.
12
Apoptosis
Figure 6: The genes involved in apoptosis do a
poor job of distinguishing cancer types. This
suggests that the mechanisms by which cancers
overcome cell death cut across the normal
biological lines drawn by histology.
13
Conclusions
  • Functional categories that are good at
    distinguishing cancers include signal
    transduction, cell cycle, cell proliferation, and
    protein metabolism. Some differences result from
    the histology of the underlying tissue. Others
    reflect differences in the way particular kinds
    of cancers overcome limits on cell growth.
  • Categories that are poor at distinguishing
    cancers include energy pathways and apoptosis.
    The latter observation has potential implications
    for cancer therapies designed to trigger
    apoptosis, since it suggests that the mechanisms
    by which cancer cells avoid cell death are not
    linked to the general type of cancer but are
    either common across cancers or idiosyncratic.
  • Multiple views into the data provide substantial
    insight into differences in cancer types and gene
    sets.
  • Cancer types differ greatly in their degree of
    heterogeneity, ranging from homogeneous (colon,
    leukemia) through moderately heterogeneous
    (renal, melanoma) to extremely heterogeneous
    (breast and lung).
  • Homogeneous cancers exhibit strong identifying
    signals across most views of the data, regardless
    of function or chromosome.
  • There are large differences in the ability of
    genes on different chromosomes to distinguish
    cancer types. There are similar differences for
    genes involved in different biological processes
    (data not shown).