Title: EXPression ANalyzer and DisplayER
1 EXPression ANalyzer and DisplayER
- Adi Maron-Katz
- Igor Ulitsky
- Chaim Linhart
- Amos Tanay
- Rani Elkon
- Seagull Shavit
- Roded Sharan
- Israel Steinfeld
- Yossi Shiloh
- Ron Shamir
Ron Shamir's Computational Genomics Group
2 Schedule
- 10:15-11:10 Expander
- 11:10-11:30 Amadeus
- 11:30-11:45 Spike
- 11:45-12:10 Matisse, FAME
- 13:10-15:00 Hands-on
3
- EXPANDER: an integrative package for analysis of gene expression data
- Built-in support for 11 organisms
  - human, mouse, rat, chicken, zebrafish, fly, worm, arabidopsis, yeast (S. cerevisiae and S. pombe), E. coli
- Demonstration: on a dataset collected in our labs
4 What can it do?
- Low-level analysis
  - Missing data estimation (KNN or manual)
  - Data adjustments (merge conditions, divide by base, take log)
  - Normalization: quantile, loess
  - Filtering: fold change, variation, t-test
  - Standardization: mean 0, std 1
- High-level gene partition analysis
  - Clustering
  - Biclustering
  - Network-based clustering
5 What can it do? (II)
- Ascribing biological meaning to patterns
  - Functional analysis (enriched Gene Ontology terms)
  - Promoter analysis (over-represented transcription factor binding sites)
  - Chromosomal location analysis
  - miRNA targets enrichment analysis
  - Custom annotations enrichment analysis
  - Signaling pathway enrichment analysis and visualization
6 Input data
Normalization / Filtering
Visualization utilities
Links to public annotation databases
Grouping (Clustering / Biclustering / Network-based clustering)
Functional enrichment (TANGO)
Promoter signals (PRIMA)
Location enrichment
miRNA targets enrichment (FAME)
7 EXPANDER Data
- Input data
  - Expression matrix (probe-row, condition-column)
    - One-channel data (e.g., Affymetrix)
    - Dual-channel data, in which data is log R/G (e.g., cDNA microarrays)
  - .cel files
  - ID conversion file: maps probes to genes
  - Gene sets data: defines gene groups
8 EXPANDER Data (II)
- Data definitions
  - Defining condition subsets
  - Data type and scale (log)
- Data adjustments
  - Missing value estimation (KNN or arbitrary)
  - Merging conditions
  - Divide by base
  - Log data (base 2)
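The adjustments listed above can be illustrated with a minimal sketch in plain Python. The function names and the toy intensities are hypothetical and are not EXPANDER's own interface; the sketch assumes positive, non-log intensities and averages replicates when merging conditions.

```python
import math

def merge_conditions(row, groups):
    """Average the columns listed in each group (e.g., replicate chips)."""
    return [sum(row[i] for i in g) / len(g) for g in groups]

def divide_by_base(row, base_index=0):
    """Express every condition relative to a base condition (e.g., time 0)."""
    base = row[base_index]
    return [v / base for v in row]

def log2_transform(row):
    """Take log base 2, so a 2-fold increase becomes +1 and a 2-fold decrease -1."""
    return [math.log2(v) for v in row]

# Hypothetical intensities for one probe: two replicates for each of 3 time points.
raw = [100.0, 100.0, 180.0, 220.0, 40.0, 60.0]
merged = merge_conditions(raw, [(0, 1), (2, 3), (4, 5)])   # [100.0, 200.0, 50.0]
ratios = divide_by_base(merged)                            # [1.0, 2.0, 0.5]
log_ratios = log2_transform(ratios)                        # [0.0, 1.0, -1.0]
```

Working in log base 2 makes induction and repression symmetric around zero, which is one reason the log step usually follows the division by the base condition.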
9 EXPANDER Preprocessing
- Normalization: removal of systematic biases from the analyzed chips
  - Implemented methods: quantile, lowess
  - Visualization: box plots, scatter plots (simple, M vs. A)
- Filtering: focus downstream analysis on the set of responding genes
  - Fold change
  - Variation
  - Statistical tests (t-test)
  - SAM (Significance Analysis of Microarrays)
- Standardization: mean = 0, STD = 1 (visualization)
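As a rough illustration of two of the preprocessing steps above, the sketch below shows a textbook form of quantile normalization and of per-gene standardization. It is not EXPANDER's implementation: the function names are hypothetical, ties are broken by position, and the population standard deviation is an assumption (a constant row would need special handling).

```python
from statistics import mean, pstdev

def quantile_normalize(columns):
    """Force every chip (column) to share the same value distribution.

    columns: list of equal-length lists, one list per chip.
    Ties are broken by position, which is a simplification.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the r-th smallest value across all chips.
    rank_means = [mean(sc[r] for sc in sorted_cols) for r in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices, lowest value first
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result

def standardize(row):
    """Shift and scale one gene's pattern to mean 0 and standard deviation 1."""
    m, s = mean(row), pstdev(row)
    return [(v - m) / s for v in row]

chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]       # two toy chips, three probes
normalized = quantile_normalize(chips)           # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
z = standardize([1.0, 4.0, 7.0])                 # pattern with mean 0 and std 1
```

After quantile normalization both chips contain the same set of values (1.5, 3.5, 5.5), and only the ranking of the probes within each chip is preserved.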
11 Cluster Analysis
- Partition the responding genes into distinct sets, each with a particular expression pattern
  - Identify major patterns → reduce the dimensionality of the problem
  - co-expression → co-function
  - co-expression → co-regulation
- Partition the genes to achieve
  - High homogeneity within clusters
  - High separation between clusters
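One possible way to quantify the two goals above is to average the Pearson correlation over gene pairs that share a cluster (homogeneity) and over pairs that do not (a low value indicates good separation). This is only a sketch under that assumption; the definitions, function names, and toy patterns are illustrative and are not taken from EXPANDER or CLICK.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two expression patterns of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def homogeneity_and_separation(patterns, labels):
    """Average similarity of same-cluster pairs and of different-cluster pairs."""
    within, between = [], []
    for g1, g2 in combinations(patterns, 2):
        r = pearson(patterns[g1], patterns[g2])
        (within if labels[g1] == labels[g2] else between).append(r)
    return sum(within) / len(within), sum(between) / len(between)

patterns = {
    "g1": [0.0, 1.0, 2.0], "g2": [0.0, 2.0, 4.0],   # rising patterns
    "g3": [2.0, 1.0, 0.0], "g4": [4.0, 2.0, 0.0],   # falling patterns
}
labels = {"g1": "up", "g2": "up", "g3": "down", "g4": "down"}
homogeneity, avg_between = homogeneity_and_separation(patterns, labels)
```

For this toy partition the within-cluster average is close to 1 and the between-cluster average is close to -1, the ideal case of tight and well-separated clusters.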
12 Cluster Analysis (II)
- Implemented algorithms
  - CLICK, K-means, SOM, Hierarchical
- Visualization
  - Mean expression patterns
  - Heat maps
  - Chromosomal positions
  - Network sub-graph
13 Example study: responses to ionizing radiation
Ionizing Radiation
Double Strand Breaks
14 Example study: experimental design
- Genotypes: Atm-/- and control w.t. mice
- Tissue: lymph node
- Treatment: ionizing radiation
- Time points: 0, 30 min, 120 min
- Microarrays: Affymetrix U74Av2 (12k probesets)
15 Test case - Data Analysis
- Dataset: six conditions (2 genotypes, 3 time points)
- Normalization
- Filtering step: define the set of responding genes
  - genes whose expression level changed by at least 1.75-fold
  - 700 genes met this criterion
- The set contains genes with various response patterns; we applied CLICK to this set of genes
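The filtering step above can be sketched as follows. Comparing the highest level to the lowest across the six conditions is only one possible reading of "changed by at least 1.75-fold"; the function name, gene names, and intensities are hypothetical.

```python
def passes_fold_change(levels, threshold=1.75):
    """True if the highest level is at least `threshold` times the lowest.

    Assumes positive, non-log intensities. Using max/min over all
    conditions is an assumption, not necessarily the rule used in the study.
    """
    return max(levels) / min(levels) >= threshold

# Six conditions per gene (2 genotypes x 3 time points), made-up values.
expression = {
    "geneA": [100.0, 180.0, 210.0, 95.0, 110.0, 105.0],   # induced about 2-fold
    "geneB": [100.0, 110.0, 120.0, 105.0, 98.0, 101.0],   # essentially flat
    "geneC": [200.0, 100.0, 90.0, 190.0, 185.0, 195.0],   # repressed about 2-fold
}
responding = [g for g, levels in expression.items() if passes_fold_change(levels)]
# responding == ["geneA", "geneC"]
```

Only the genes that pass such a filter are handed to the clustering step, which keeps flat, uninformative patterns from dominating the partition.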
16 Major Gene Clusters: Irradiated Lymph Node
Atm-dependent early responding genes
17 Major Gene Clusters: Irradiated Lymph Node
Atm-dependent 2nd wave of responding genes
19 Ascribe functional meaning to clusters
- Gene Ontology (GO) annotations for human, mouse, rat, chicken, fly, worm, arabidopsis, zebrafish, yeast (S. cerevisiae and S. pombe) and E. coli.
- TANGO: applies statistical tests that seek over-represented GO functional categories in the clusters.
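A common way to test whether a functional category is over-represented in a cluster is the hypergeometric tail probability. The sketch below shows that generic test only; it is not TANGO's implementation, and the function name and the toy counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_category, n_cluster, n_overlap):
    """P(overlap >= n_overlap) when n_cluster genes are drawn at random
    from n_background genes, n_category of which carry the annotation."""
    total = comb(n_background, n_cluster)
    upper = min(n_category, n_cluster)
    tail = sum(
        comb(n_category, k) * comb(n_background - n_category, n_cluster - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy numbers: 20 background genes, 5 annotated with the category,
# a cluster of 4 genes of which 3 carry the annotation.
p = hypergeom_enrichment_p(20, 5, 4, 3)   # 155 / 4845, about 0.032
```

When many categories are tested for every cluster, each raw p-value of this kind still has to be corrected for multiple testing.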
20 Enriched GO Functional Categories
- Hierarchical structure → highly dependent categories.
- Problems
  - High redundancy
  - Multiple testing corrections assume independent tests
- TANGO
21 Functional Enrichment - Visualization
22 Functional Categories
Cell cycle control (p < 1x10^-6)
23 Functional Categories
Cell cycle control (p < 5x10^-6), Apoptosis (p = 0.001)
25 Clues are in the promoters
Identify Transcriptional Regulators
[Figure: two-layer diagram. ATM at the top; hidden layer of transcription factors: p53, TF-A, TF-B, TF-C and NEW, with links marked "?"; observed layer: genes g1 to g13.]
26 Reverse engineering of transcriptional networks
- Infers regulatory mechanisms from gene expression data
- Assumption
  - co-expression → transcriptional co-regulation → common cis-regulatory promoter elements
- Step 1: identification of co-expressed genes using microarray technology (clustering algorithms)
- Step 2: computational identification of cis-regulatory elements that are over-represented in promoters of the co-expressed genes
27 PRIMA: general description
- Input
  - Target set (e.g., co-expressed genes)
  - Background set (e.g., all genes on the chip)
- Analysis
  - Identify transcription factors whose binding site signatures are enriched in the Target set with respect to the Background set.
  - TF binding site models: TRANSFAC DB
  - Default: from -1000 bp to 200 bp relative to the TSS
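To make the default promoter region concrete, the sketch below computes the genomic window from 1000 bp upstream to 200 bp downstream of a transcription start site (TSS), mirroring the window for genes on the minus strand. The function name and coordinates are hypothetical, and details such as inclusive or exclusive endpoints are left open; this is not PRIMA's code.

```python
def promoter_window(tss, strand, upstream=1000, downstream=200):
    """Return (start, end) genomic coordinates of the promoter region.

    Upstream means against the direction of transcription, so the
    window is mirrored for genes on the minus strand.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")

plus_window = promoter_window(5000, "+")    # (4000, 5200)
minus_window = promoter_window(5000, "-")   # (4800, 6000)
```

The sequences extracted from such windows, for the target set and for the background set, are what a binding-site scan would then compare.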
28 Promoter Analysis - Visualization
29 PRIMA - Results
30 PRIMA Results
NF-κB    5.1    3.8x10^-8
p53      4.2    9.6x10^-7
STAT-1   3.2    5.4x10^-6
Sp-1     1.7    6.5x10^-4
31 Biclustering
- Clustering becomes too restrictive on large datasets
  - Seeks a global partition of genes according to similarity in their expression across ALL conditions
- Relevant knowledge can be revealed by identifying genes with a common pattern across a subset of the conditions
- A novel algorithmic approach is needed: biclustering
32 Biclustering: SAMBA (Statistical Algorithmic Method for Bicluster Analysis)
A. Tanay, R. Sharan, R. Shamir, RECOMB 02
- Bicluster (module): a subset of genes with similar behavior in a subset of conditions
- Computationally challenging: has to consider many combinations of sub-conditions
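The combinatorial difficulty can be seen in a brute-force sketch that enumerates every subset of conditions and collects the genes that are up-regulated in all of them. This is deliberately naive and exponential in the number of conditions; it is not the SAMBA algorithm, and the function name and toy data are hypothetical.

```python
from itertools import combinations

def brute_force_biclusters(up_in, conditions, min_genes=2, min_conditions=2):
    """List (genes, condition subset) pairs in which every gene is
    up-regulated in every condition of the subset."""
    found = []
    for size in range(min_conditions, len(conditions) + 1):
        for subset in combinations(conditions, size):
            genes = sorted(g for g, conds in up_in.items() if set(subset) <= conds)
            if len(genes) >= min_genes:
                found.append((genes, subset))
    return found

# For each gene, the conditions in which it is up-regulated (made-up data).
up_in = {
    "g1": {"c1", "c2"},
    "g2": {"c1", "c2", "c4"},
    "g3": {"c3", "c4"},
    "g4": {"c2", "c3", "c4"},
}
conditions = ["c1", "c2", "c3", "c4"]
biclusters = brute_force_biclusters(up_in, conditions)
# [(['g1', 'g2'], ('c1', 'c2')), (['g2', 'g4'], ('c2', 'c4')), (['g3', 'g4'], ('c3', 'c4'))]
```

Note that g2 and g4 each belong to two biclusters, something a single global partition of the genes cannot express. With m conditions there are on the order of 2^m subsets to examine, which is why exhaustive enumeration does not scale.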
33 Biclustering Visualization
34 Network-based clustering
- Goal: to identify modules using gene expression data and interaction networks.
- GE data + interactions file (.sif).
- MATISSE (Module Analysis via Topology of Interactions and Similarity SEts).
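Assuming the interactions file follows the Cytoscape-style simple interaction format (each line holds a source node, an interaction type, and one or more target nodes), a minimal reader could look like the sketch below. The function name and the example interactions are hypothetical, the interaction type is ignored, and edges are treated as undirected; this is not MATISSE's parser.

```python
from collections import defaultdict

def parse_sif(lines):
    """Build an undirected adjacency map from SIF-style lines:
    source, interaction type, then one or more targets per line."""
    neighbors = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        source = fields[0]
        neighbors[source]                 # register isolated nodes too
        for target in fields[2:]:
            neighbors[source].add(target)
            neighbors[target].add(source)
    return dict(neighbors)

sif_lines = ["ATM pp TP53", "TP53 pp MDM2 CDKN1A", "GADD45A"]
network = parse_sif(sif_lines)
# network["TP53"] == {"ATM", "MDM2", "CDKN1A"}
```

An adjacency map of this kind is the natural starting point for any search for connected sets of genes with similar expression.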
35 Network-based clustering visualization
- Similar to clustering visualization (gene list, mean patterns, heat maps, etc.).
- Interactions map
37 Location analysis
- Goal: detect genes that are located in the same area and are co-expressed.
- Search for over-represented chromosomal areas within gene groups.
- Statistical test.
- Redundancy filter
- Ignoring known gene clusters
38 Location analysis visualization
- Enrichment analysis visualization
- Positions view with color assignments
40 miRNA Analysis
- Goal: detect microRNAs whose binding sites are over- or under-represented in the 3' UTRs of a gene group.
- FAME algorithm
  - Empirical tests using a sampling technique (random permutations).
  - Accounting for biases in the 3' UTR sequences
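The idea of an empirical test based on random sampling can be sketched as follows: draw many random gene groups of the same size and count how often they contain at least as many targets of a given miRNA as the observed group. This simplified version samples genes uniformly and therefore ignores the 3' UTR biases mentioned above; it is not the FAME algorithm, and all names and data are hypothetical.

```python
import random

def empirical_p_value(group, targets, all_genes, n_permutations=1000, seed=0):
    """Estimate how often a random gene group of the same size contains
    at least as many miRNA targets as the observed group."""
    rng = random.Random(seed)
    observed = len(group & targets)
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(all_genes, len(group))
        if sum(1 for g in sample if g in targets) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)

all_genes = [f"g{i}" for i in range(200)]
targets = set(all_genes[:20])                    # made-up predicted targets of one miRNA
group = set(all_genes[:8]) | {"g150", "g151"}    # 8 of its 10 genes are targets
p = empirical_p_value(group, targets, all_genes)
```

The resolution of such an estimate is limited by the number of permutations: with 1000 samples the smallest reportable value is about 0.001.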
41 Thank you
42 Expression Data Input File
conditions (columns), probes (rows)
43 ID Conversion File
44 Gene Sets File
45 Normalization: Box plots
46 Standardization of Expression Levels
47 Cluster Analysis - Visualization (I)
48 Cluster Analysis - Visualization (II)
49 Positions visualization