Context-Specific Bayesian Clustering for Gene Expression Data

1
Context-Specific Bayesian Clustering for Gene Expression Data
  • Yoseph Barash, Nir Friedman
  • School of Computer Science & Engineering
  • Hebrew University

2
Introduction
  • New experimental methods → abundance of data
  • Gene Expression
  • Genomic sequences
  • Protein levels
  • Data analysis methods are crucial for
    understanding such data
  • Clustering serves as a tool for organizing the
    data and finding patterns in it

3
This Talk
  • New method for clustering
  • Combines different types of data
  • Emphasis on learning context-specific description
    of the clusters
  • Application to gene expression data
  • Combine expression data with genomic information

4
The Data
[Figure: data matrix; labels: Genes (index i), Experiments, Binding Sites]
  • Goal
  • Understand interactions between transcription
    factors (TFs) and expression levels

5
Simple Clustering Model
  • Attributes are independent given the cluster
  • Simple model → computationally cheap
  • Genes are clustered according to both expression
    levels and binding sites
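
The independence assumption above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes Gaussian expression attributes and binary binding-site attributes, and the names `cluster_posterior`, `expr_params`, and `site_probs` as well as all parameter values are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def cluster_posterior(expr, sites, clusters):
    """P(cluster | gene) when all attributes are independent given the cluster."""
    log_joint = []
    for c in clusters:
        lp = math.log(c["prior"])
        # Expression levels: one Gaussian per experiment (assumed family).
        for x, (mean, var) in zip(expr, c["expr_params"]):
            lp += gaussian_logpdf(x, mean, var)
        # Binding sites: one Bernoulli per site (assumed binary encoding).
        for s, p in zip(sites, c["site_probs"]):
            lp += math.log(p if s else 1.0 - p)
        log_joint.append(lp)
    # Normalize in log space for numerical stability.
    top = max(log_joint)
    weights = [math.exp(lp - top) for lp in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-cluster model over two experiments and two binding sites.
clusters = [
    {"prior": 0.5, "expr_params": [(2.0, 1.0), (-1.0, 1.0)], "site_probs": [0.9, 0.1]},
    {"prior": 0.5, "expr_params": [(0.0, 1.0), (0.0, 1.0)], "site_probs": [0.1, 0.1]},
]
posterior = cluster_posterior([1.8, -0.7], [1, 0], clusters)
```

Because the joint factorizes over attributes, the cost per gene is linear in the number of attributes and clusters, which is what makes the simple model cheap.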

6
Local Probability Models
[Figure: local probability models; labels: TF1, TF2, Multinomial, Gaussian]

7
Structure in Local Probability Models
[Figure: labels: TF1, TF2]

8
Context-Specific Independence
  • Benefits
  • Identifies what features characterize each
    cluster
  • Reduces bias during learning
  • A compact and efficient representation
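
One way to picture the compactness benefit is the sketch below, under assumptions not taken from the slides: each attribute keeps a default parameter shared by most clusters, plus its own parameter only for the clusters it helps characterize. The attribute names, parameter values, and the helpers `csi_lookup` and `parameter_sets` are hypothetical.

```python
def csi_lookup(local_model, cluster):
    """Parameter of an attribute in a cluster: cluster-specific if present, else the default."""
    return local_model["specific"].get(cluster, local_model["default"])

def parameter_sets(local_models, n_clusters, use_csi=True):
    """Count distinct parameter sets across all attributes."""
    if not use_csi:
        # Without CSI, every attribute has one parameter set per cluster.
        return len(local_models) * n_clusters
    total = 0
    for model in local_models.values():
        n_specific = len(model["specific"])
        # The default is only needed if some cluster still falls back to it.
        total += n_specific + (1 if n_specific < n_clusters else 0)
    return total

# Hypothetical local models for five clusters (numbered 0..4).
local_models = {
    "TF1": {"default": 0.05, "specific": {2: 0.90}},
    "TF2": {"default": 0.10, "specific": {}},
    "heat_shock": {"default": 0.0, "specific": {0: 1.5, 2: -2.0}},
}
with_csi = parameter_sets(local_models, n_clusters=5)
without_csi = parameter_sets(local_models, n_clusters=5, use_csi=False)
```

In this toy setting, the clusters listed under `specific` are exactly the ones an attribute characterizes, so reading off those keys gives a description of each cluster.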

9
Scoring CSI Cluster Models
  • Represent conditional probabilities with
    different parametric families
  • Gaussian,
  • Multinomial,
  • Poisson
  • Choose parameter priors from appropriate
    conjugate prior families
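  • Score:
    $\mathrm{Score}(S : D) = \log P(D \mid S) + \log P(S)$
  • where $P(D \mid S) = \int P(D \mid S, \theta)\, P(\theta \mid S)\, d\theta$
    is the marginal likelihood and $P(S)$ is the
    prior over structures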
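
With conjugate priors the marginal likelihood has a closed form. The sketch below shows this for the multinomial case with a Dirichlet prior, and combines it with a structure prior into a score. The counts, the uniform Dirichlet hyperparameters, and the per-parameter-set log prior of -1.0 are illustrative assumptions, not values from the talk.

```python
from math import lgamma

def log_marginal_multinomial(counts, alphas):
    """Log marginal likelihood of an observed sequence with the given value counts,
    integrating the multinomial parameters against a Dirichlet(alphas) prior."""
    total_alpha = sum(alphas)
    result = lgamma(total_alpha) - lgamma(total_alpha + sum(counts))
    for n, alpha in zip(counts, alphas):
        result += lgamma(alpha + n) - lgamma(alpha)
    return result

def score(count_groups, alphas, log_prior_per_group=-1.0):
    """Marginal likelihood plus structure prior, both in log space.
    The structure prior here simply penalizes each parameter set (an assumption)."""
    log_likelihood = sum(log_marginal_multinomial(c, alphas) for c in count_groups)
    log_prior = log_prior_per_group * len(count_groups)
    return log_likelihood + log_prior

# Hypothetical counts of (site present, site absent) in two clusters.
cluster_a = [18, 2]
cluster_b = [3, 17]
split_score = score([cluster_a, cluster_b], alphas=[1.0, 1.0])
shared_score = score([[21, 19]], alphas=[1.0, 1.0])
```

Here the structure with a separate parameter set per cluster scores higher than the shared one, because the two clusters have clearly different site frequencies; with similar frequencies the prior penalty would favor sharing.
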
10
Learning Structure: Naive Approach
  • A hard problem
  • Standard approach
  • Basic problem: efficiency

11
Learning Structure: Structural EM
  • Given complete data, we can evaluate each edge's
    parameters separately
  • For MAP, we compute EM only once for each
    iteration
  • Guaranteed to converge to a local optimum
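
The loop can be sketched roughly as follows. This is a heavily simplified illustration, not the authors' algorithm: it assumes binary attributes only, a Beta(1,1) prior, a greedy per-attribute search, a fixed per-edge penalty standing in for the structure prior, and posterior-mean parameter estimates. All function names are hypothetical.

```python
import math
import random

def log_marginal(n1, n0, a=1.0):
    """Beta-Bernoulli log marginal likelihood; accepts fractional (expected) counts."""
    return (math.lgamma(2 * a) - math.lgamma(2 * a + n1 + n0)
            + math.lgamma(a + n1) + math.lgamma(a + n0) - 2 * math.lgamma(a))

def e_step(data, priors, probs):
    """Soft assignments P(cluster | gene) under the current model."""
    resp = []
    for row in data:
        logp = []
        for k, prior in enumerate(priors):
            lp = math.log(prior)
            for j, x in enumerate(row):
                lp += math.log(probs[k][j] if x else 1.0 - probs[k][j])
            logp.append(lp)
        top = max(logp)
        weights = [math.exp(v - top) for v in logp]
        total = sum(weights)
        resp.append([w / total for w in weights])
    return resp

def expected_counts(data, resp, j, n_clusters):
    """Expected (ones, zeros) of attribute j in each cluster."""
    counts = []
    for k in range(n_clusters):
        ones = sum(r[k] for row, r in zip(data, resp) if row[j])
        zeros = sum(r[k] for row, r in zip(data, resp) if not row[j])
        counts.append((ones, zeros))
    return counts

def pooled_counts(counts, specific):
    """Counts shared by all clusters that have no parameter of their own."""
    rest = [k for k in range(len(counts)) if k not in specific]
    return (sum(counts[k][0] for k in rest), sum(counts[k][1] for k in rest))

def attribute_score(counts, specific, penalty):
    """Score of one attribute: own parameter per 'specific' cluster, one shared by the rest."""
    return (sum(log_marginal(*counts[k]) for k in specific)
            + log_marginal(*pooled_counts(counts, specific))
            - penalty * len(specific))

def choose_specific(counts, penalty):
    """Greedily add the cluster whose own parameter raises the score most."""
    specific = set()
    while True:
        current = attribute_score(counts, specific, penalty)
        candidates = [(attribute_score(counts, specific | {k}, penalty), k)
                      for k in range(len(counts)) if k not in specific]
        if not candidates or max(candidates)[0] <= current:
            return specific
        specific.add(max(candidates)[1])

def structural_em(data, n_clusters, iterations=10, penalty=1.0, seed=0):
    rng = random.Random(seed)
    n_attrs = len(data[0])
    priors = [1.0 / n_clusters] * n_clusters
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_attrs)] for _ in range(n_clusters)]
    structure = [set() for _ in range(n_attrs)]
    for _ in range(iterations):
        resp = e_step(data, priors, probs)  # a single E-step per iteration
        priors = [(sum(r[k] for r in resp) + 1.0) / (len(data) + n_clusters)
                  for k in range(n_clusters)]
        for j in range(n_attrs):  # each attribute is scored on its own
            counts = expected_counts(data, resp, j, n_clusters)
            structure[j] = choose_specific(counts, penalty)
            pooled = pooled_counts(counts, structure[j])
            for k in range(n_clusters):
                ones, zeros = counts[k] if k in structure[j] else pooled
                probs[k][j] = (ones + 1.0) / (ones + zeros + 2.0)  # posterior mean
    return structure, priors, probs

# Hypothetical binary data: 50 genes, 4 attributes.
data = [[1, 1, 1, 0]] * 20 + [[0, 0, 1, 0]] * 20 + [[1, 1, 0, 1]] * 5 + [[0, 0, 0, 1]] * 5
structure, priors, probs = structural_em(data, n_clusters=2)
```

The point of the sketch is the decomposition: once the E-step has produced expected counts, each attribute's cluster dependencies are chosen independently, so no EM run is needed per candidate structure.
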
12
Results on Synthetic Data
  • Basic approach
  • Generate data from a known structure
  • Evaluate learned structures for different sample
    sizes (200-800)
  • Add noise of unrelated samples to the training
    set to simulate genes that do not fall into
    nice functional categories (10-30)
  • Test the learned model for structure recovery and
    for agreement between its cluster assignments
    and those of the original model
  • Main results
  • Cluster number: models with fewer clusters were
    sharply penalized. Models with 1-2 additional
    clusters often got a similar score, with
    degenerate clusters to which none of the real
    samples were classified
  • Structure accuracy: very few false negative
    edges, 10-20 false positive edges (score
    dependent)
  • Mutual information ratio: max for 800 samples,
    95-100 for 500 and 90 for 200 samples
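
The slides do not define the mutual information ratio. One plausible reading, assumed here, is the mutual information between the original and learned cluster labels divided by the entropy of the original labels, so that a perfect recovery (up to renaming of clusters) gives 1. The function names and label vectors below are hypothetical.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Empirical mutual information (in nats) between two label sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
               for (x, y), c in count_ab.items())

def information_ratio(true_labels, learned_labels):
    """I(true; learned) / H(true); the normalization is an assumption."""
    entropy = mutual_information(true_labels, true_labels)  # I(X; X) = H(X)
    return mutual_information(true_labels, learned_labels) / entropy

true_labels = [0, 0, 1, 1]
relabeled = information_ratio(true_labels, [1, 1, 0, 0])    # same partition, names swapped
unrelated = information_ratio(true_labels, [0, 1, 0, 1])    # independent of the truth
```

A measure of this kind is insensitive to how clusters are numbered, which matters because a learned model has no reason to number its clusters the same way as the generating model.
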
13
Yeast Stress Data (Gasch et al 2001)
  • Examines response of yeast to stress situations
  • Total 93 arrays
  • We selected 900 genes that changed in a
    selective manner
  • Treatment steps
  • Initial clustering
  • Found putative binding sites based on clusters
  • Re-clustered with these sites
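
The re-clustering step needs each putative site turned into a per-gene attribute. The slides do not say how this was done; the sketch below assumes a simple exact-match scan of each promoter on both strands. The promoter sequences and gene names are invented for illustration; the motif is the GCN4 site listed later in the talk.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def has_site(promoter, motif):
    """True if the motif occurs on either strand of the promoter."""
    promoter = promoter.upper()
    return motif in promoter or reverse_complement(motif) in promoter

# Hypothetical promoter sequences.
promoters = {
    "geneA": "TTATGACTCATCG",
    "geneB": "GGTGAGTCATTT",
    "geneC": "ACGTACGTACGT",
}
gcn4_attribute = {gene: int(has_site(seq, "TGACTCA")) for gene, seq in promoters.items()}
```

Each such 0/1 column can then be added to the gene's attribute vector alongside its expression levels before clustering again.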

14
Stress Data -- CSI Clusters

15
CSI Clusters

16
Promoters Analysis
  • Cluster 3
  • MIG1: CCCCGC, CGGACC, ACCCCG
  • GAL4: CGGGCC
  • Others: CCAATCA

17
Promoters Analysis
  • Cluster 7
  • GCN4: TGACTCA
  • Others: CGGAAAA, ACTGTGG

18
Discussion
  • Goals
  • Identify binding sites/transcription factors
  • Understand interactions among transcription
    factors
  • Combinatorial effects on expression
  • Predict role/function of the genes
  • Methods
  • Integration of a model of statistical patterns of
    binding sites (see Holmes & Bruno, ISMB '00)
  • Additional dependencies among attributes
  • Tree-augmented Naive Bayes
  • Probabilistic Relational Models (see poster)