APO-SYS workshop on data analysis and pathway charting

Provided by: ranie

1
APO-SYS workshop on data analysis and pathway charting

Igor Ulitsky
Ron Shamir's Computational Genomics Group

2
Part I: Presentations

EXPANDER
AMADEUS
SPIKE
MATISSE

3
Part II: Hands-on Session

EXPANDER
MATISSE
SPIKE

4
EXPression ANalyzer and DisplayER

Adi Maron-Katz
Chaim Linhart
Amos Tanay
Rani Elkon
Israel Steinfeld

Seagull Shavit
Igor Ulitsky
Roded Sharan
Yossi Shiloh
Ron Shamir
http://acgt.cs.tau.ac.il/expander
5
EXPANDER

Low level analysis:
Missing data estimation (KNN or manual)
Normalization: quantile, loess
Filtering: fold change, variation, t-test
Standardization: mean 0 and std 1, take log, fixed norm
High level gene partition analysis:
Clustering
Biclustering
Ascribing biological meaning to patterns:
Enriched functional categories (Gene Ontology)
Identifying transcriptional regulators: promoter analysis
Built-in support for 9 organisms: human, mouse, rat, chicken, zebrafish, fly, worm, Arabidopsis, yeast
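The KNN option for missing data estimation can be illustrated with a short sketch. It assumes a probe-by-condition matrix stored as nested Python lists with None marking missing values, a root-mean-square distance over the conditions both probes share, and a plain mean over the k nearest probes. The function name knn_impute and the default k=2 are hypothetical; this is not EXPANDER's actual implementation or default setting.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries of a probe-by-condition matrix (hypothetical helper).

    Each missing value is replaced by the mean of that condition's values in
    the k most similar probes, where distance is the root-mean-square
    difference over the conditions both probes have observed.
    """
    filled = [row[:] for row in matrix]  # leave the input untouched
    for i, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Rank every other probe by its distance to probe i.
        ranked = []
        for r, other in enumerate(matrix):
            if r == i:
                continue
            shared = [j for j in range(len(row))
                      if row[j] is not None and other[j] is not None]
            if not shared:
                continue
            dist = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
            ranked.append((dist, r))
        ranked.sort()
        for j in missing:
            # Use the closest probes that actually have a value in condition j.
            neighbours = [matrix[r][j] for _, r in ranked if matrix[r][j] is not None][:k]
            if neighbours:
                filled[i][j] = sum(neighbours) / len(neighbours)
    return filled


expression = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 5.0],
    [10.0, 20.0, 30.0],
]
print(knn_impute(expression)[0])  # → [1.0, 2.0, 4.0]
```

The gap in the first probe is filled with the mean of the two closest probes (3.0 and 5.0); the distant fourth probe is ignored.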

6
Input data
Normalization/Filtering
Links to public annotation databases
Visualization utilities
Clustering (CLICK, SOM, K-means, Hierarchical)
Biclustering (SAMBA)
Functional enrichment (TANGO)
Promoter signals (PRIMA)
7
EXPANDER - Preprocessing

Input data:
Expression matrix (rows: probes, columns: conditions)
One-channel data (e.g., Affymetrix)
Dual-channel data (cDNA microarrays; data are (log) ratios between the red and green channels)
.cel files
ID conversion file: maps probes to genes
Gene sets data

Data definitions:
Defining condition subsets
Data type: scale (log)

8
EXPANDER Preprocessing (II)

Data adjustments:
Missing value estimation (KNN or arbitrary)
Merging conditions
Normalization: removal of systematic biases from the analyzed chips
Implemented methods: quantile, lowess
Visualization: box plots, scatter plots (simple, M vs. A)
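Quantile normalization, one of the two implemented methods, forces every chip to share the same distribution of intensities. The following is a minimal sketch assuming one list of values per chip, all of equal length. Ties are broken by position rather than averaged, which production implementations usually handle more carefully, and the function name is hypothetical.

```python
def quantile_normalize(chips):
    """Return copies of the chips whose values all follow one common distribution.

    chips: list of equal-length lists, one list of probe intensities per chip.
    The reference distribution is the mean of the sorted chips: its i-th
    smallest value is the average of the i-th smallest value on every chip.
    """
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    reference = [sum(chip[i] for chip in sorted_chips) / len(chips)
                 for i in range(n_probes)]
    normalized = []
    for chip in chips:
        # Probe indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n_probes), key=lambda idx: chip[idx])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # each probe gets the reference value of its rank
        normalized.append(out)
    return normalized


chip_a = [5.0, 2.0, 3.0]
chip_b = [4.0, 1.0, 6.0]
print(quantile_normalize([chip_a, chip_b]))
# → [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both chips contain exactly the same set of values; only the assignment of values to probes differs, so their box plots become identical.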

9
EXPANDER Preprocessing (III)

Filtering: focus downstream analysis on the set of responding genes
Fold change
Variation
Statistical tests (t-test)
Standardization: create a common scale
For each probe: mean 0, std 1
Log data (base 2)
Fixed norm (divide by the norm of the probe vector)
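The three standardization options above are simple per-probe transformations. A minimal sketch follows, with hypothetical function names. It assumes positive, linear-scale values for the log option and uses the population standard deviation for "mean 0, std 1", which may differ from EXPANDER's exact convention.

```python
import math


def log2_transform(values):
    """Log data (base 2); values must be positive."""
    return [math.log2(v) for v in values]


def mean0_std1(values):
    """Shift and scale one probe's profile to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]  # a flat profile has no pattern to scale
    return [(v - mean) / std for v in values]


def fixed_norm(values):
    """Divide the probe vector by its Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        return list(values)  # an all-zero vector cannot be rescaled
    return [v / norm for v in values]


print(log2_transform([1.0, 2.0, 8.0]))  # → [0.0, 1.0, 3.0]
print(fixed_norm([3.0, 4.0]))           # → [0.6, 0.8]
```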

11
Cluster Analysis

Partition the responding genes into distinct sets, each with a particular expression pattern
Identify major patterns in the data: reduce the dimensionality of the problem
co-expression → co-function
co-expression → co-regulation
Partition the genes to achieve:
Homogeneity: genes inside a cluster show highly similar expression patterns.
Separation: genes from different clusters have different expression patterns.
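Homogeneity and separation can be turned into numbers. This sketch assumes Pearson correlation as the similarity measure, homogeneity as the average similarity of each gene to its cluster's mean profile (higher is better), and separation as the size-weighted average similarity between cluster mean profiles (lower is better). These definitions follow a common convention in the clustering literature and are assumptions here, not necessarily the exact scores EXPANDER reports; the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def centroid(profiles):
    """Mean expression pattern of a cluster."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def homogeneity(clusters):
    """Average similarity of each gene to its own cluster's centroid."""
    total, count = 0.0, 0
    for profiles in clusters:
        c = centroid(profiles)
        for p in profiles:
            total += pearson(p, c)
            count += 1
    return total / count


def separation(clusters):
    """Average similarity between cluster centroids, weighted by |X| * |Y|."""
    cents = [centroid(p) for p in clusters]
    num = den = 0.0
    for i, j in combinations(range(len(clusters)), 2):
        weight = len(clusters[i]) * len(clusters[j])
        num += weight * pearson(cents[i], cents[j])
        den += weight
    return num / den


rising = [[1, 2, 3], [2, 3, 4]]
falling = [[3, 2, 1], [4, 3, 2]]
print(homogeneity([rising, falling]), separation([rising, falling]))
```

For these two toy clusters homogeneity is about 1 (each gene matches its cluster's pattern) and separation is about -1 (the two mean patterns are opposite).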

12
Cluster Analysis (II)

Implemented algorithms: CLICK, K-means, SOM, Hierarchical
Visualization:
Mean expression patterns
Heat-maps
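Of the implemented algorithms, K-means is the simplest to sketch. The version below is plain Lloyd-style K-means with Euclidean distance, seeded naively with the first k profiles (an assumption; real implementations typically use random or smarter initialization and several restarts). It is not CLICK, which is a graph-based algorithm, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100):
    """Return (labels, centers) for a list of equal-length profiles."""
    centers = [list(p) for p in profiles[:k]]  # naive seeding with the first k profiles
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


profiles = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centers = kmeans(profiles, 2)
print(labels)  # → [0, 0, 1, 1]
```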

13
Example study: responses to ionizing radiation
[Figure: ionizing radiation → double strand breaks]
14
Example study: experimental design

Genotypes: Atm-/- and control w.t. mice
Tissue: lymph node
Treatment: ionizing radiation
Time points: 0, 30 min, 120 min
Microarrays: Affymetrix U74Av2 (12k probesets)

15
Test case - Data Analysis

Dataset: six conditions (2 genotypes, 3 time points)
Normalization
Filtering step: define the set of responding genes, i.e., genes whose expression level changed by at least 1.75-fold
Over 700 genes met this criterion
The set contains genes with various response patterns; we applied CLICK to this set of genes
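The filtering criterion used in this test case can be written as a short function. The slide does not say which comparison the 1.75-fold change refers to; this sketch assumes each irradiated time point is compared with the untreated (time 0) sample of the same genotype, on linear-scale expression values, and counts both up- and down-regulation. The data layout, gene values and function name are hypothetical.

```python
import math

FOLD_THRESHOLD = 1.75  # from the test case above


def is_responding(profile, threshold=FOLD_THRESHOLD):
    """profile: dict genotype -> [value at 0 min, 30 min, 120 min] (linear scale).

    A gene responds if, in any genotype, some irradiated time point differs
    from that genotype's time-0 value by at least `threshold`-fold, up or down.
    """
    limit = math.log2(threshold)
    for series in profile.values():
        baseline = series[0]
        for value in series[1:]:
            if abs(math.log2(value / baseline)) >= limit:
                return True
    return False


# Hypothetical profiles for the 2 genotypes x 3 time points design.
genes = {
    "gene_up": {"wt": [100.0, 210.0, 120.0], "Atm-/-": [100.0, 105.0, 98.0]},
    "gene_down": {"wt": [100.0, 95.0, 50.0], "Atm-/-": [100.0, 102.0, 97.0]},
    "gene_flat": {"wt": [100.0, 120.0, 90.0], "Atm-/-": [100.0, 95.0, 110.0]},
}
responding = [name for name, profile in genes.items() if is_responding(profile)]
print(responding)  # → ['gene_up', 'gene_down']
```

Working on the log2 scale makes the test symmetric: a 2-fold drop is treated the same as a 2-fold rise.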

16
Major Gene Clusters: Irradiated Lymph Node
Atm-dependent early responding genes
17
Major Gene Clusters: Irradiated Lymph Node
Atm-dependent 2nd wave of responding genes
19
Ascribe Functional Meaning to the Clusters

Gene Ontology (GO) annotations for human, mouse, rat, chicken, fly, worm, Arabidopsis, zebrafish and yeast.
TANGO: apply statistical tests that seek over-represented GO functional categories in the clusters.
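A basic version of such a test is the hypergeometric tail probability: how likely is it to see at least the observed number of category members in a cluster of that size if the cluster were a random draw from the background genes? The sketch below shows only this uncorrected test with hypothetical gene names and function names; TANGO's actual procedure also corrects for the dependencies between GO categories.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, top + 1))
    return hits / comb(population, draws)


def go_enrichment_p(cluster, background, category):
    """Uncorrected over-representation p-value of one GO category in one cluster."""
    background = set(background)
    cluster = set(cluster) & background
    category = set(category) & background
    observed = len(cluster & category)
    return hypergeom_tail(observed, len(background), len(category), len(cluster))


background = [f"g{i}" for i in range(20)]
category = ["g0", "g1", "g2", "g3", "g4"]  # hypothetical GO term members
cluster = ["g0", "g1", "g2", "g10"]        # 3 of 4 cluster genes carry the term
print(go_enrichment_p(cluster, background, category))  # 155/4845, about 0.032
```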

20
Enriched GO Functional Categories

Hierarchical structure → highly dependent categories.
Problems:
High redundancy
Multiple testing corrections assume independent tests
TANGO
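One way around the independence assumption is resampling: score many random gene sets of the same size against all categories and ask how often their best p-value is at least as small as the cluster's best p-value. Because the random sets are scored against the same overlapping categories, the dependencies between categories are built into the null distribution. This is a generic min-p permutation sketch under that assumption, with hypothetical names and category contents; it is not claimed to be TANGO's exact algorithm.

```python
import random
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, top + 1))
    return hits / comb(population, draws)


def best_p(genes, background_size, categories):
    """Smallest uncorrected p-value of a gene set over all categories.

    categories: dict name -> set of genes, each a subset of the background.
    """
    genes = set(genes)
    return min(hypergeom_tail(len(genes & members), background_size,
                              len(members), len(genes))
               for members in categories.values())


def resampled_p(cluster, background, categories, n_random=200, seed=0):
    """Fraction of random same-size gene sets whose best p-value is as extreme."""
    rng = random.Random(seed)
    background = sorted(background)  # fixed order so the seed is reproducible
    observed = best_p(cluster, len(background), categories)
    as_extreme = sum(
        best_p(rng.sample(background, len(cluster)), len(background), categories) <= observed
        for _ in range(n_random))
    return (as_extreme + 1) / (n_random + 1)  # +1 keeps the estimate above zero


background = [f"g{i}" for i in range(40)]
categories = {
    "term_A": {f"g{i}" for i in range(5)},
    "term_A_parent": {f"g{i}" for i in range(6)},  # overlaps term_A, as in the GO hierarchy
    "term_B": {f"g{i}" for i in range(20, 30)},
}
cluster = [f"g{i}" for i in range(5)]
print(resampled_p(cluster, background, categories))
```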

21
Functional Enrichment - Visualization
22
Functional Categories
Cell cycle control (p < 1x10^-6)
23
Functional Categories
Cell cycle control (p < 5x10^-6), Apoptosis (p = 0.001)
25
Clues are in the promoters
Identify Transcriptional Regulators
[Figure: ATM acts through a hidden layer of transcription factors (p53, TF-A, TF-B, TF-C, a NEW factor, and unknown regulators marked "?") on an observed layer of genes g1 to g13]
26
Reverse engineering of transcriptional networks

Infers regulatory mechanisms from gene expression data
Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements
Step 1: identification of co-expressed genes using microarray technology (clustering algorithms)
Step 2: computational identification of cis-regulatory elements that are over-represented in the promoters of the co-expressed genes

27
PRIMA: general description

Input:
Target set (e.g., co-expressed genes)
Background set (e.g., all genes on the chip)
Analysis:
Identify transcription factors whose binding site signatures are enriched in the target set with respect to the background set.
TF binding site models: TRANSFAC DB
Default: from -1000 bp to +200 bp relative to the TSS
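Binding site signatures of this kind are commonly modeled as position weight matrices. The following sketch shows two ingredients, under stated assumptions: extracting the default -1000 to +200 bp window around a TSS, and scanning it with a log-odds matrix built from per-position base counts against a uniform background. The toy TATA-like counts, the score threshold, the forward-strand-only scan and all function names are hypothetical; they are not TRANSFAC matrices or PRIMA's scoring scheme.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5):
    """counts: one dict per motif position mapping base -> observed count."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        # log2(P(base | site) / P(base | uniform background of 0.25))
        matrix.append({b: math.log2((column.get(b, 0) + pseudocount) / total / 0.25)
                       for b in BASES})
    return matrix


def promoter_window(sequence, tss, upstream=1000, downstream=200):
    """Bases from `upstream` bp before to `downstream` bp after a 0-based TSS index."""
    return sequence[max(0, tss - upstream): tss + downstream]


def scan(sequence, matrix, threshold):
    """Start positions (forward strand only) whose log-odds score passes the threshold."""
    width = len(matrix)
    starts = []
    for i in range(len(sequence) - width + 1):
        site = sequence[i:i + width]
        if any(b not in BASES for b in site):
            continue  # skip windows containing N or other ambiguity codes
        if sum(col[b] for col, b in zip(matrix, site)) >= threshold:
            starts.append(i)
    return starts


toy_counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]  # hypothetical motif
pwm = log_odds_matrix(toy_counts)
print(scan("GGTATAGGCC", pwm, threshold=7.0))  # → [2]
```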

28
Promoter Analysis - Visualization
29
PRIMA - Results
30
PRIMA Results
P-value     Enrichment factor   Transcription factor
6.0x10^-5   2.6                 CREB

P-value     Enrichment factor   Transcription factor
3.8x10^-8   5.1                 NF-κB
9.6x10^-7   4.2                 p53
5.4x10^-6   3.2                 STAT-1
6.5x10^-4   1.7                 Sp-1
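The two numbers reported per transcription factor can be computed from promoter counts. This sketch assumes the enrichment factor is the fraction of target promoters with at least one hit divided by the same fraction in the background, and that the p-value is a hypergeometric tail probability; PRIMA's exact statistic may differ, and the counts and the TF_X/TF_Y labels are hypothetical.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, top + 1))
    return hits / comb(population, draws)


def tf_row(tf, target_hits, target_size, background_hits, background_size):
    """(p-value, enrichment factor, TF) for one transcription factor."""
    enrichment = (target_hits / target_size) / (background_hits / background_size)
    p_value = hypergeom_tail(target_hits, background_size, background_hits, target_size)
    return p_value, enrichment, tf


# Hypothetical counts: 50 target promoters out of 1000 on the chip.
rows = [
    tf_row("TF_X", 20, 50, 100, 1000),
    tf_row("TF_Y", 8, 50, 120, 1000),
]
for p_value, enrichment, tf in sorted(rows):  # most significant first
    print(f"{p_value:.1e}  {enrichment:.1f}  {tf}")
```

Sorting the tuples by their first element lists the factors in the same order as the results table, from the smallest p-value upward.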
32
Biclustering

Clustering becomes too restrictive on large datasets:
It seeks a global partition of genes according to similarity in their expression across ALL conditions
Relevant knowledge can be revealed by identifying genes with a common pattern across a subset of the conditions
Biclustering: an algorithmic approach to finding such patterns

33
A. Tanay, R. Sharan, R. Shamir, RECOMB '02
Biclustering: SAMBA (Statistical-Algorithmic Method for Bicluster Analysis)

Bicluster (module): a subset of genes with similar behavior in a subset of conditions
Computationally challenging: has to consider many combinations of sub-conditions
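SAMBA represents the data as a bipartite graph between genes and conditions and looks for heavy subgraphs under a likelihood-based score. The sketch below is a simplified version of that idea: it assumes an edge whenever |value| ≥ 1 and a single background edge probability for every gene-condition pair, whereas SAMBA uses a richer, pair-specific background model and a search procedure that is omitted here. The names, the threshold, p_cluster=0.9 and the toy matrix are hypothetical.

```python
import math


def response_edges(matrix, threshold=1.0):
    """Bipartite graph: edge (gene, condition) when |value| >= threshold (assumed criterion)."""
    return {(g, c) for g, row in enumerate(matrix)
            for c, value in enumerate(row) if abs(value) >= threshold}


def bicluster_score(edges, genes, conditions, p_background, p_cluster=0.9):
    """Log-likelihood ratio of 'bicluster' vs 'background' for the submatrix genes x conditions."""
    edge_weight = math.log(p_cluster / p_background)                 # reward for each edge
    missing_weight = math.log((1 - p_cluster) / (1 - p_background))  # penalty for each non-edge
    return sum(edge_weight if (g, c) in edges else missing_weight
               for g in genes for c in conditions)


# Toy standardized data: genes 0 and 1 respond only in conditions 0 and 1.
data = [
    [2.0, 2.5, 0.1, -0.2],
    [1.8, 2.2, 0.0, 0.3],
    [0.1, 0.2, 0.1, 0.0],
    [-0.1, 0.0, 1.5, 0.2],
]
edges = response_edges(data)
density = len(edges) / (len(data) * len(data[0]))  # background edge probability
tight = bicluster_score(edges, {0, 1}, {0, 1}, density)
loose = bicluster_score(edges, {0, 1, 2}, {0, 1, 2}, density)
print(tight > 0 > loose)  # → True
```

The score rewards dense gene-condition blocks and penalizes missing edges, so adding a gene or condition that does not share the pattern lowers it; this is what lets a bicluster cover only a subset of the conditions.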

34
Biclustering Visualization
35
Expression Data Input File
[Figure: expression matrix with probes as rows and conditions as columns]
36
ID Conversion File
37
Normalization: Box Plots
38
Standardization of Expression Levels
39
Cluster Analysis - Visualization (I)
40
Cluster Analysis - Visualization (II)
