Context-Specific Bayesian Clustering for Gene Expression Data

About This Presentation

Slides: 23

Provided by: Kyubae8

Transcript and Presenter's Notes

Title: Context-Specific Bayesian Clustering for Gene Expression Data

1
Context-Specific Bayesian Clustering for Gene Expression Data

2
Abstract

Clustering of genes by their expression levels and their putative TF binding sites
Extended naïve Bayes classifier
Context-specific independencies
Structural EM algorithm
Experiments on the yeast dataset

3
Introduction

A central goal of molecular biology
To understand the regulation of protein synthesis
New technologies
DNA sequencing (promoter regions that contain the binding sites of transcription factors)
DNA microarrays
The hypothesis
Genes with a common functional role have similar expression patterns across different experiments and similar binding sites
Clustering of genes

4
Naïve Bayesian Clustering

5
Naïve Bayesian Clustering (Cont'd)

6
Naïve Bayesian Clustering (Cont'd)

7
Selective Naïve Bayesian Models

8
Context-Specific Independence

9
Learning of Bayesian Clustering Models

10
The Bayesian Score

11
The Bayesian Score (Cont'd)

12
For Complete Data

13
For Incomplete Data

14
Learning CSI Clustering

15
Structural EM Procedure

Initialization
The full model
The random model
To escape local maxima
Perform a random first-ascent hill-climbing search
Restart the structural EM algorithm and find a local maximum again
If the score of the new model is better than that of the former one, restart the structural EM algorithm
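One reading of the escape strategy above can be sketched as a perturb-and-restart loop. This is a minimal illustration under stated assumptions, not the presenters' implementation: `local_search` is a hypothetical stand-in for a full structural EM run, `perturb` is a hypothetical stand-in for the random hill-climbing moves, and the toy landscape, the round count, and the seed are invented for the example.

```python
import random


def escape_local_maxima(init_model, local_search, perturb, score,
                        n_rounds=20, seed=0):
    """Perturb-and-restart wrapper around a local optimizer.

    local_search: hypothetical stand-in for a structural EM run.
    perturb:      hypothetical stand-in for the random hill-climbing moves.
    """
    rng = random.Random(seed)
    best = local_search(init_model)
    best_score = score(best)
    for _ in range(n_rounds):
        # Move away from the current local maximum, then re-optimize.
        candidate = local_search(perturb(best, rng))
        cand_score = score(candidate)
        # Keep the new model only if it strictly improves the score.
        if cand_score > best_score:
            best, best_score = candidate, cand_score
    return best, best_score


# Toy 1-D "model space": a model is an index into a rugged score landscape.
LANDSCAPE = [0, 2, 1, 3, 5, 4, 2, 6, 9, 7, 3, 1]


def toy_score(i):
    return LANDSCAPE[i]


def toy_local_search(i):
    # Greedy ascent: step to the better neighbor until none improves.
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]
        best_neighbor = max(neighbors, key=toy_score)
        if toy_score(best_neighbor) <= toy_score(i):
            return i
        i = best_neighbor


def toy_perturb(i, rng):
    # Random jump of up to three positions, clipped to the landscape.
    step = rng.randint(-3, 3)
    return min(max(i + step, 0), len(LANDSCAPE) - 1)


start = toy_local_search(0)
best, best_score = escape_local_maxima(0, toy_local_search, toy_perturb,
                                       toy_score)
print(start, toy_score(start), best, best_score)
```

Because a candidate is accepted only when it scores strictly higher, the returned model is never worse than the first local maximum, whatever the perturbations turn out to be.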

16
Simulation Studies

17
Result on the Simulation Studies

18
Biological Data

Budding yeast gene expression data
Spellman's cell-cycle data (800 genes and 77 experiments)
Gasch's stress data (1271 genes and 92 experiments)
The number of putative binding sites in the 1000 bp upstream of the ORF (from SCPD and TRANSFAC) for each gene
Found with the MatInspector program
Counted over the whole upstream region and four sub-regions
Initialization point
k-means clustering with only gene expression data
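The initialization step above, k-means on expression profiles alone, can be sketched as follows. This is a minimal illustration, not the presenters' code: the tiny expression matrix is invented, and the deterministic farthest-point seeding is an assumption made for reproducibility, since the slide does not say how k-means was seeded.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(rows):
    """Column-wise mean of a non-empty list of profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def kmeans(profiles, k, n_iter=100):
    """Lloyd's k-means on expression profiles only.

    Seeding is an assumption of this sketch: the first profile, then
    repeatedly the profile farthest from its nearest chosen center.
    """
    centers = [list(profiles[0])]
    while len(centers) < k:
        far = max(profiles,
                  key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each gene goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: recompute each center; keep it if the cluster is empty.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centers[j] = mean_profile(members)
    return labels, centers


# Invented toy data: rows are genes, columns are experiments.
expression = [
    [2.0, 1.8, -1.9, -2.1],
    [1.9, 2.1, -2.0, -1.8],
    [-2.0, -1.9, 2.1, 2.0],
    [-1.8, -2.2, 1.9, 2.1],
    [2.1, 2.0, -2.2, -1.9],
]
labels, centers = kmeans(expression, k=2)
print(labels)
```

The resulting labels would then serve as the starting cluster assignment for the structural EM procedure, with the binding-site attributes entering only once learning begins.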