Title: More on Microarrays


1
More on Microarrays
  • Chitta Baral
  • Arizona State University

2
Some case studies
  • Yeast cell cycle
  • Goal: Find groups of yeast genes whose expression
    profiles are similar over a 24-hour period.
  • Approach: Obtain gene expression measurements
    using Affymetrix S98 genome microarrays for a
    synchronized sample of yeast cells over a 24-hr
    period by sampling the total RNA populations at
    30-minute intervals.
  • Each of the 48 time points was sampled twice
    (duplicate measurements), giving T1 to T48 plus
    replicates of T1 to T48.
  • Drug intervention study
  • Goal: Characterize the effect(s) of drug X on the
    expression levels of liver cell genes three hours
    after the drug is introduced into normal adult mice.
  • Approach: Gene expression profiles of liver cells
    from normal adult mice not treated with drug X
    are used as the control state.
  • Call the pre-intervention (control) state A and
    the post-intervention state B.
  • For replicate measurements, liver samples were
    obtained from M_A adult mice without drug X, and
    from another M_B adult mice after drug X was
    applied.

3
Some potential questions when trying to cluster
  • What uncategorized genes have an expression
    pattern similar to that of these well-characterized
    genes?
  • How different is the pattern of expression of
    gene X from other genes?
  • What genes closely share a pattern of expression
    with gene X?
  • What category of function might gene X belong to?
  • What are all the pairs of genes that closely
    share patterns of expression?
  • Are there subtypes of disease X discernible by
    tissue gene expression?
  • What tissue is this sample tissue closest to?

4
Questions cont.
  • Which are the different patterns of gene
    expression?
  • Which genes have a pattern that may have been a
    result of the influence of gene X?
  • What are all the gene-gene interactions present
    among these tissue samples?
  • Which genes best differentiate these two groups of
    tissues?
  • Which gene-gene interactions best differentiate
    these two groups of tissue samples?
  • DIFFERENT ALGORITHMS ARE BETTER SUITED TO
    ANSWERING SOME OF THESE QUESTIONS THAN OTHERS.

5
Bioinformatics algorithms and some known uses --
Unsupervised
  • Feature determination: determining genes with
    interesting properties, without looking for a
    particular pattern determined a priori.
  • Principal component analysis: determine the genes
    (or linear combinations of genes) explaining the
    majority of the variance in the data set.
  • Cluster determination: determine groups of genes
    or samples with similar patterns of gene
    expression.
  • Nearest-neighbour clustering: the number of
    clusters is decided first, the clusters are then
    computed, and each gene is assigned to a single
    cluster.
  • Self-organizing maps
  • Tamayo et al. used them to functionally cluster
    genes by their time-course patterns during
    macrophage differentiation of HL-60 cells.
  • Toronen used a hierarchical SOM to cluster yeast
    genes responsible for the diauxic shift.
  • K-means clustering
  • Soukas et al. used it to cluster genes involved
    in leptin signalling.
  • Tavazoie et al. used it to cluster genes with
    common regulatory sequences.
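A minimal sketch of the principal component analysis bullet, assuming a small matrix with samples as rows and genes as columns (a layout chosen here for illustration). It finds only the first principal component, by power iteration on the sample covariance matrix, and reports the fraction of the total variance that component explains; its loadings indicate which genes contribute most. Pure Python keeps it self-contained; a linear-algebra library would be the usual choice in practice. Function names and toy data are hypothetical.

```python
import math
import random


def covariance_matrix(data):
    """Sample covariance between columns (genes); rows of `data` are samples."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def first_principal_component(cov, iters=500, seed=0):
    """Power iteration: (largest eigenvalue, unit eigenvector) of `cov`."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]  # random positive start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all-zero covariance: no variance to explain
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v (v has unit length) estimates the eigenvalue.
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return eigenvalue, v


def explained_variance_ratio(data):
    """Fraction of total variance captured by PC1, plus PC1's gene loadings."""
    cov = covariance_matrix(data)
    eigenvalue, loadings = first_principal_component(cov)
    total = sum(cov[j][j] for j in range(len(cov)))
    return eigenvalue / total, loadings


# Hypothetical toy data: 4 samples x 3 genes; genes 0 and 1 co-vary, gene 2 is flat.
data = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.5], [3.0, 5.9, 0.6], [4.0, 8.0, 0.5]]
ratio, loadings = explained_variance_ratio(data)
print(round(ratio, 3), [round(x, 2) for x in loadings])
```

On this toy matrix the first component should explain nearly all of the variance, with large loadings on genes 0 and 1 and a near-zero loading on gene 2.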

6
k-means clustering: basic idea
  • Input: n objects (or points) and a number k
  • Output: a set of k clusters that minimizes the
    squared-error criterion (the sum of squared errors
    $E = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - m_i \rVert^2$,
    where $m_i$ is the mean of cluster $C_i$)
  • Algorithm complexity: O(nkt), where t is the
    number of iterations
  • Arbitrarily choose k objects as the initial
    cluster centers
  • Repeat
  • (Re)assign each object to the cluster to which
    the object is the most similar based on the mean
    value of the objects in the cluster.
  • Update the cluster means (i.e., calculate the
    mean value of the objects in each cluster)
  • Until no change.
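The k-means loop above can be sketched in Python as follows, assuming points are lists of numbers, Euclidean distance, and random selection of the initial centers (the seed is an arbitrary illustration choice). Names are hypothetical. This is plain iterative refinement, so depending on the initial centers it can stop at a local minimum of E.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Arbitrarily choose k objects as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # (Re)assign each object to the cluster whose mean is closest.
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                          for p in points]
        if new_assignment == assignment:  # until no change
            break
        assignment = new_assignment
        # Update the cluster means; an emptied cluster keeps its old center.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = mean(members)
    # Squared-error criterion E for the final clustering.
    sse = sum(sq_dist(p, centers[a]) for p, a in zip(points, assignment))
    return assignment, centers, sse


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers, sse = kmeans(points, k=2)
print(labels, sse)
```

On this toy data the two well-separated groups end up in different clusters, and E comes out at 8/3.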

7
Pluses and minuses of k-means
  • Pluses: low complexity
  • Minuses:
  • Mean of a cluster may not be easy to define (data
    with categorical attributes)
  • Necessity of specifying k
  • Not suitable for discovering clusters of
    non-convex shape or of very different sizes
  • Sensitive to noise and outlier data points (a
    small number of such data can substantially
    influence the mean value)
  • Some of the above objections (especially the last
    one) can be overcome by the k-medoid algorithm.
  • Instead of the mean value of the objects in a
    cluster as a reference point, the medoid can be
    used, which is the most centrally located object
    in a cluster.
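A small illustration of why the medoid is more robust than the mean, using hypothetical one-dimensional values with a single outlier and absolute difference as the distance. The helper names are illustrative only.

```python
def mean_1d(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def medoid(objects, dist):
    """The object with the smallest total distance to all objects in the set."""
    return min(objects, key=lambda c: sum(dist(c, o) for o in objects))


values = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical cluster with one outlier
print(mean_1d(values))                          # 22.0
print(medoid(values, lambda a, b: abs(a - b)))  # 3.0
```

The single outlier drags the mean far outside the bulk of the data, while the medoid stays at a central object that actually belongs to the cluster.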

8
K-Medoid clustering
  • Input: a set of objects (often points in a
    multi-dimensional space)
  • Output: these objects clustered into k clusters
  • Algorithm complexity: $O(n^2)$ per iteration
    when $k \ll n$
  • Select arbitrarily k representative objects.
  • Mark these objects as selected and mark the
    remaining as non-selected
  • Repeat
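  • For all selected objects $O_i$ do
    (complexity $O(k(n-k)n)$)
  • For all non-selected objects $O_h$ compute
    $C_{ih}$, the total cost of swapping $O_i$ with
    $O_h$: $C_{ih} = \sum_j C_{jih}$, where $C_{jih}$
    is the change in the distance from $O_j$ to its
    nearest selected object caused by the swap
    (e.g., $C_{jih} = d_{hj} - d_{ij}$ when $O_j$
    moves from $O_i$ to $O_h$), and
    $d_{ij} = d(O_i, O_j)$ is the distance between
    $O_i$ and $O_j$.
  • Select $i_{min}, h_{min}$ such that
    $C_{i_{min} h_{min}} = \min_{i,h} C_{ih}$
    (complexity $O(k(n-k))$)
  • If $C_{i_{min} h_{min}} < 0$, mark $O_{i_{min}}$
    as non-selected and $O_{h_{min}}$ as selected.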
  • Until no change
  • The selected objects now define the clusters.
  • A non-selected object $O_j$ belongs to the cluster
    represented by a selected object $O_i$ if
    $d(O_i, O_j) = \min_{O_e} d(O_j, O_e)$, where the
    minimum is taken over all selected objects $O_e$.
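A minimal sketch of the swap loop above, assuming objects indexed 0..n-1, a user-supplied distance function and k ≤ n. For clarity it computes each C_ih directly as the change in the total distance from every object to its nearest selected object, rather than summing incremental C_jih terms, so it is slower than the complexity quoted on the slide. Starting from the first k objects is an arbitrary choice, and all names are hypothetical.

```python
from itertools import product


def total_cost(objects, selected, dist):
    """Sum over all objects of the distance to the nearest selected object."""
    return sum(min(dist(o, objects[m]) for m in selected) for o in objects)


def pam(objects, k, dist):
    n = len(objects)
    selected = list(range(k))  # arbitrary start: the first k objects
    current = total_cost(objects, selected, dist)
    while True:
        non_selected = [h for h in range(n) if h not in selected]
        best_cost, best_swap = 0.0, None
        for i, h in product(selected, non_selected):
            swapped = [h if m == i else m for m in selected]
            c_ih = total_cost(objects, swapped, dist) - current  # C_ih
            if c_ih < best_cost:  # keep the most negative C_ih
                best_cost, best_swap = c_ih, (i, h)
        if best_swap is None:  # no swap lowers the cost: until no change
            break
        i_min, h_min = best_swap
        selected = [h_min if m == i_min else m for m in selected]
        current = total_cost(objects, selected, dist)
    # Each object joins the cluster of its nearest selected object.
    labels = [min(selected, key=lambda m: dist(o, objects[m])) for o in objects]
    return selected, labels


objects = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical 1-D values
medoids, labels = pam(objects, k=2, dist=lambda a, b: abs(a - b))
print(sorted(medoids), labels)
```

Here the loop settles on the objects 2.0 and 11.0 (indices 1 and 4) as representatives, and each label is the index of the representative an object is assigned to.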

9
Self organizing maps
  • A neural network algorithm that has been used for
    a wide variety of applications, mostly for
    engineering problems but also for data analysis.
  • SOM can be used at the same time both to reduce
    the amount of data by clustering, and for
    projecting the data nonlinearly onto a
    lower-dimensional display.
  • SOM vs. k-means
  • In the SOM, the distance of each input from all
    of the reference vectors, not just the closest
    one, is taken into account, weighted by the
    neighborhood kernel h. Thus, if the width of the
    neighborhood kernel is zero, the SOM functions as
    a conventional clustering algorithm.
  • Whereas in k-means clustering the number K of
    clusters should be chosen to match the number of
    clusters actually present in the data, in the SOM
    the number of reference vectors can be chosen to
    be much larger, irrespective of the number of
    clusters. The cluster structure then becomes
    visible on special displays of the map.
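A minimal online SOM on a one-dimensional chain of map units, to make the role of the neighborhood kernel h concrete. The Gaussian kernel, the linearly decreasing learning rate, the random initialisation and all parameter names are assumptions made for illustration. Setting sigma=0 restricts each update to the best-matching unit, which is the winner-take-all behavior the slide compares with conventional clustering.

```python
import math
import random


def train_som(data, n_units, epochs=50, lr=0.5, sigma=1.0, seed=0):
    """Online SOM on a 1-D chain of n_units map units; returns the reference vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each reference vector to a randomly chosen input.
    refs = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)  # linearly decreasing learning rate
        for x in rng.sample(data, len(data)):  # one shuffled pass over the inputs
            # Best-matching unit: the reference vector closest to x.
            bmu = min(range(n_units),
                      key=lambda u: sum((x[d] - refs[u][d]) ** 2 for d in range(dim)))
            for u in range(n_units):
                # Neighborhood kernel h over map (grid) distance |u - bmu|.
                if sigma == 0:
                    h = 1.0 if u == bmu else 0.0  # zero width: winner only
                else:
                    h = math.exp(-((u - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    refs[u][d] += alpha * h * (x[d] - refs[u][d])
    return refs


data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]  # hypothetical 1-D expression values
print(train_som(data, n_units=2, sigma=0))    # winner-take-all updates
print(train_som(data, n_units=5, sigma=1.0))  # more map units than clusters
```

With sigma=0 and two units, one reference vector should settle in each group of inputs; with sigma above zero, neighboring units are pulled along with the winner, which is what spreads a larger map over the data.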

10
Bioinformatics algorithms and some known uses
Unsupervised cont.
  • Cluster determination (cont.)
  • Agglomerative clustering: bottom-up method, where
    each gene starts in its own cluster and the
    closest clusters are successively merged
  • Dendrograms: groups are defined as sub-trees in a
    phylogenetic-type tree created by a comprehensive
    pairwise dissimilarity measure.
  • 2-D dendrograms
  • Divisive or partitional clustering: top-down
    method, where large clusters are successively
    broken down into smaller ones, until each
    sub-cluster contains only one object (gene)
  • Dendrograms and 2-D dendrograms
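A naive sketch of bottom-up (agglomerative) clustering, assuming a distance function over objects and single linkage (the distance between the closest members of two clusters) as the default; the linkage choice and all names are assumptions. It returns the merge history, which is the information a dendrogram draws; cutting the tree at a given height yields the groups as sub-trees. It recomputes distances at every step, so it is only practical for small inputs.

```python
def agglomerative(objects, dist, linkage=min):
    """Bottom-up clustering; returns the merge history (the dendrogram) as
    (cluster_a, cluster_b, merge_height) tuples."""
    clusters = [frozenset([i]) for i in range(len(objects))]  # one gene per cluster
    merges = []

    def cluster_dist(a, b):
        # linkage=min gives single linkage; linkage=max gives complete linkage.
        return linkage(dist(objects[i], objects[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters (the first pair found wins ties).
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((sorted(clusters[x]), sorted(clusters[y]), d))
        merged = clusters[x] | clusters[y]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)] + [merged]
    return merges


objects = [0.0, 1.0, 5.0, 6.0, 20.0]  # hypothetical 1-D expression values
merges = agglomerative(objects, lambda p, q: abs(p - q))
for a, b, height in merges:
    print(a, b, height)
```

On these values, objects 0 and 1 merge first at height 1.0, then objects 2 and 3, then those two pairs at 4.0, and object 4 joins last at 14.0.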