EXPression ANalyzer and DisplayER - PowerPoint PPT Presentation

1
EXPression ANalyzer and DisplayER

Adi Maron-Katz
Igor Ulitsky
Chaim Linhart
Amos Tanay
Rani Elkon

Seagull Shavit
Roded Sharan
Israel Steinfeld
Yossi Shiloh
Ron Shamir
Ron Shamir's Computational Genomics Group
2
Schedule

10:15-11:10 Expander
11:10-11:30 Amadeus
11:30-11:45 Spike
11:45-12:10 Matisse, FAME
13:10-15:00 Hands-on

3
EXPANDER: an integrative package for the analysis of gene expression data

Built-in support for 11 organisms: human, mouse, rat, chicken, zebrafish, fly, worm, Arabidopsis, yeast (S. cerevisiae and S. pombe), and E. coli
Demonstration on a dataset collected in our labs

4
What can it do?

Low-level analysis:
Missing data estimation (KNN or manual)
Data adjustments (merge conditions, divide by base, take log)
Normalization: quantile, loess
Filtering: fold change, variation, t-test
Standardization: mean 0, std 1
High-level gene partition analysis:
Clustering
Biclustering
Network-based clustering

5
What Can it do? (II)

Ascribing biological meaning to patterns:
Functional analysis (enriched Gene Ontology terms)
Promoter analysis (over-represented transcription factor binding sites)
Chromosomal location analysis
miRNA targets enrichment analysis
Custom annotations enrichment analysis
Signaling pathway enrichment analysis and visualization

6
Input data
Normalization / Filtering
Visualization utilities
Links to public annotation databases
Grouping (clustering / biclustering / network-based clustering)
Functional enrichment (TANGO)
Promoter signals (PRIMA)
Location enrichment
miRNA targets enrichment (FAME)
7
EXPANDER Data

Input data
Expression matrix (rows are probes, columns are conditions):
One-channel data (e.g., Affymetrix)
Dual-channel data, in which each value is log R/G (e.g., cDNA microarrays)
.cel files
ID conversion file: maps probes to genes
Gene sets data: defines gene groups

8
EXPANDER Data (II)

Data definitions
Defining condition subsets
Data type and scale (log)
Data Adjustments
Missing value estimation (KNN or arbitrary)
Merging conditions
Divide by base
Log data (base 2)
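
The adjustments listed above can be illustrated with a minimal sketch. It assumes that merging conditions means averaging the merged columns (for example replicates), which the slide does not state; the function names and the sample values are hypothetical and are not EXPANDER's own.

```python
import math

def merge_conditions(row, groups):
    """Average the columns in each group (assumed meaning of 'merge')."""
    return [sum(row[i] for i in group) / len(group) for group in groups]

def divide_by_base(row, base_index):
    """Express every condition relative to a chosen base condition."""
    base = row[base_index]
    return [value / base for value in row]

def log2_transform(row):
    """Take log base 2 of every (positive) value."""
    return [math.log2(value) for value in row]

# Hypothetical probe with two replicates for each of two conditions.
row = [100.0, 120.0, 400.0, 440.0]
merged = merge_conditions(row, [[0, 1], [2, 3]])
ratios = divide_by_base(merged, base_index=0)
logs = log2_transform(ratios)
```

After dividing by the base and taking logs, the base condition sits at 0 and every other value reads as a log2 fold change relative to it.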

9
EXPANDER Preprocessing

Normalization: removal of systematic biases from the analyzed chips
Implemented methods: quantile, lowess
Visualization: box plots, scatter plots (simple, M vs. A)
Filtering: focus downstream analysis on the set of responding genes
Fold change
Variation
Statistical tests (t-test)
SAM (Significance Analysis of Microarrays)
Standardization: mean = 0, std = 1 (for visualization)
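
As a rough illustration of two of the steps named on this slide, the sketch below implements textbook quantile normalization (every chip is forced onto the same value distribution) and per-gene standardization to mean 0 and standard deviation 1. It is not EXPANDER's code: the tie handling (ties broken by row order) and the use of the population standard deviation are assumptions.

```python
import statistics

def quantile_normalize(matrix):
    """matrix: list of probe rows, one column per chip."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over all chips.
    reference = [sum(sc[k] for sc in sorted_cols) / n_cols for k in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda r: col[r])
        for rank, r in enumerate(order):
            result[r][c] = reference[rank]
    return result

def standardize(row):
    """Shift and scale one gene's profile to mean 0, std 1."""
    mean = statistics.fmean(row)
    std = statistics.pstdev(row)  # assumes the profile is not constant
    return [(value - mean) / std for value in row]

# Hypothetical 4 probes x 3 chips.
matrix = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(matrix)
z = standardize([2.0, 4.0, 6.0])
```

After normalization every column contains the same set of values, so chip-wide intensity differences no longer dominate comparisons between conditions.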

11
Cluster Analysis

Partition the responding genes into distinct sets, each with a particular expression pattern
Identify major patterns → reduce the dimensionality of the problem
Co-expression → co-function
Co-expression → co-regulation
Partition the genes to achieve:
High homogeneity within clusters
High separation between clusters
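
One common way to quantify these two goals, sketched here under the assumption that similarity is the Pearson correlation between expression profiles, is to average the similarity over gene pairs in the same cluster (homogeneity) and over gene pairs in different clusters (where a lower value means better separation). The profiles and labels are made up for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def homogeneity_and_separation(profiles, labels):
    """Average similarity within clusters and between clusters."""
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        r = pearson(profiles[i], profiles[j])
        (within if labels[i] == labels[j] else between).append(r)
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical profiles: two rising genes and two falling genes.
profiles = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.5],
    [3.0, 2.0, 1.0],
    [6.0, 4.5, 2.0],
]
labels = [0, 0, 1, 1]
within_similarity, between_similarity = homogeneity_and_separation(profiles, labels)
```

A good partition gives a within-cluster average close to 1 and a much lower between-cluster average, as in this toy example.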

12
Cluster Analysis (II)

Implemented algorithms
CLICK, K-means, SOM, Hierarchical
Visualization
Mean expression patterns
Heat-maps
Chromosomal positions
Network sub-graph

13
Example study: responses to ionizing radiation
Ionizing radiation
Double-strand breaks
14
Example study: experimental design

Genotypes: Atm-/- and control (w.t.) mice
Tissue: lymph node
Treatment: ionizing radiation
Time points: 0, 30 min, 120 min
Microarrays: Affymetrix U74Av2 (12k probesets)

15
Test case - Data Analysis

Dataset: six conditions (2 genotypes, 3 time points)
Normalization
Filtering step: define the set of responding genes, i.e. genes whose expression level changed by at least 1.75-fold
700 genes met this criterion
The set contains genes with various response patterns; we applied CLICK to this set of genes
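
The filtering criterion can be sketched as follows. The slide gives only the 1.75-fold threshold, so this example assumes the change is measured against a baseline condition (such as time 0) and counts both increases and decreases; gene names and values are hypothetical.

```python
def responds(levels, baseline_index=0, min_fold=1.75):
    """True if any condition differs from the baseline by at least min_fold."""
    base = levels[baseline_index]
    for i, value in enumerate(levels):
        if i == baseline_index:
            continue
        ratio = value / base
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            return True
    return False

# Hypothetical linear-scale expression at 0, 30 and 120 minutes.
expression = {
    "geneA": [100.0, 180.0, 150.0],  # induced 1.8-fold
    "geneB": [100.0, 110.0, 95.0],   # essentially unchanged
    "geneC": [200.0, 190.0, 100.0],  # repressed 2-fold
}
responding = sorted(g for g, levels in expression.items() if responds(levels))
```

Only the genes that pass this filter would be handed to the clustering step.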

16
Major Gene Clusters: Irradiated Lymph Node
Atm-dependent early responding genes
17
Major Gene Clusters: Irradiated Lymph Node
Atm-dependent second wave of responding genes
19
Ascribe functional meaning to clusters

Gene Ontology (GO) annotations for human, mouse, rat, chicken, fly, worm, Arabidopsis, zebrafish, yeast (S. cerevisiae and S. pombe) and E. coli.
TANGO: applies statistical tests that seek over-represented GO functional categories in the clusters.
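
A standard way to test whether a GO category is over-represented in a cluster is the hypergeometric tail probability, sketched below. The slides do not spell out TANGO's exact statistics, so this is a generic illustration with made-up numbers rather than TANGO's procedure.

```python
from math import comb

def enrichment_p_value(population, annotated, cluster_size, hits):
    """P(at least `hits` annotated genes) when `cluster_size` genes are
    drawn without replacement from `population` genes, of which
    `annotated` carry the GO term."""
    total = comb(population, cluster_size)
    upper = min(annotated, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20 genes on the chip, 5 annotated with the term,
# a cluster of 5 genes containing 3 of the annotated ones.
p = enrichment_p_value(population=20, annotated=5, cluster_size=5, hits=3)
```

Because many categories and clusters are tested at once, a raw p-value like this still needs a multiple-testing correction before it is interpreted.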

20
Enriched GO Functional Categories

Hierarchical structure → highly dependent categories.
Problems:
High redundancy
Multiple testing corrections assume independent tests
TANGO

21
Functional Enrichment - Visualization
22
Functional Categories
Cell cycle control (p < 1x10^-6)
23
Functional Categories
Cell cycle control (p < 5x10^-6), Apoptosis (p = 0.001)
25
Clues are in the promoters
Identify Transcriptional Regulators
[Diagram: ATM at the top; a hidden layer of transcriptional regulators (p53, TF-A, TF-B, TF-C, NEW, and unknown links marked "?"); an observed layer of genes g1-g13]
26
Reverse engineering of transcriptional networks

Infers regulatory mechanisms from gene expression data
Assumption:
co-expression → transcriptional co-regulation → common cis-regulatory promoter elements
Step 1: Identification of co-expressed genes using microarray technology (clustering algorithms)
Step 2: Computational identification of cis-regulatory elements that are over-represented in the promoters of the co-expressed genes

27
PRIMA general description

Input:
Target set (e.g., co-expressed genes)
Background set (e.g., all genes on the chip)
Analysis:
Identify transcription factors whose binding site signatures are enriched in the Target set with respect to the Background set.
TF binding site models: TRANSFAC DB
Default promoter region: from -1000 bp to 200 bp relative to the TSS
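
To make the idea of a binding site model concrete, here is a minimal sketch of scanning a promoter sequence with a position weight matrix and reporting windows that score above a threshold. The matrix, sequence, threshold and uniform background are all invented for illustration; they are not TRANSFAC data, and the slides do not describe PRIMA's actual scoring scheme.

```python
import math

BACKGROUND = 0.25  # assumed uniform base frequency

def log_odds(pwm):
    """Convert per-position base probabilities into log2-odds scores."""
    return [{b: math.log2(col[b] / BACKGROUND) for b in "ACGT"} for col in pwm]

def scan(sequence, pwm, threshold):
    """Return start positions whose window scores at or above threshold."""
    scores = log_odds(pwm)
    width = len(scores)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(scores[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits

# Hypothetical 4-column motif favouring "TGAC" (no zero probabilities).
toy_pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
hits = scan("AATGACGGTGACA", toy_pwm, threshold=5.0)
```

Counting such hits in the target promoters and in the background promoters gives the two numbers that an enrichment test then compares.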

28
Promoter Analysis - Visualization
29
PRIMA - Results
30
PRIMA Results
NF-κB    5.1    3.8x10^-8
p53      4.2    9.6x10^-7
STAT-1   3.2    5.4x10^-6
Sp-1     1.7    6.5x10^-4
31
Biclustering

Clustering becomes too restrictive on large datasets: it seeks a global partition of genes according to similarity in their expression across ALL conditions
Relevant knowledge can be revealed by identifying genes with a common pattern across a subset of the conditions
A novel algorithmic approach is needed: biclustering

32
Biclustering: SAMBA (Statistical-Algorithmic Method for Bicluster Analysis)
A. Tanay, R. Sharan, R. Shamir, RECOMB 02

Bicluster (module): a subset of genes with similar behavior in a subset of conditions
Computationally challenging: has to consider many combinations of sub-conditions

33
Biclustering Visualization
34
Network based clustering

Goal: to identify modules using gene expression data and interaction networks.
Input: GE data and an interactions file (.sif).
MATISSE (Module Analysis via Topology of Interactions and Similarity SEts).

35
Network based clustering visualization

Similar to clustering visualization (gene list,
mean patterns, heat maps, etc.).
Interactions map

37
Location analysis

Goal: detect genes that are located in the same chromosomal area and are co-expressed.
Search for over-represented chromosomal areas within gene groups.
Statistical test.
Redundancy filter
Ignoring known gene clusters

38
Location analysis visualization

Enrichment analysis visualization
Positions view with color assignments

40
miRNA Analysis

Goal: detect microRNAs whose binding sites are over- or under-represented in the 3' UTRs of a gene group.
FAME algorithm
Empirical tests using a sampling technique (random permutations).
Accounting for biases in the 3' UTR sequences
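
The notion of an empirical test by random sampling can be sketched as below: the number of predicted target sites in the gene group is compared with the same statistic in many random gene groups of equal size. This is a deliberately simplified stand-in, not the FAME algorithm; in particular it does not account for the 3' UTR biases mentioned above, and all names and counts are hypothetical.

```python
import random

def empirical_p_value(site_counts, group, n_samples=2000, seed=0):
    """Fraction of random gene groups with at least as many sites as `group`."""
    rng = random.Random(seed)
    genes = list(site_counts)
    observed = sum(site_counts[g] for g in group)
    at_least = 0
    for _ in range(n_samples):
        sample = rng.sample(genes, len(group))
        if sum(site_counts[g] for g in sample) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least + 1) / (n_samples + 1)

# Hypothetical data: 100 genes, 10 of which carry two sites for one miRNA.
site_counts = {f"gene{i}": 0 for i in range(100)}
for i in range(10):
    site_counts[f"gene{i}"] = 2
enriched_group = [f"gene{i}" for i in range(8)]
p_enriched = empirical_p_value(site_counts, enriched_group)
p_null = empirical_p_value(site_counts, [f"gene{i}" for i in range(50, 58)])
```

Testing for under-representation works the same way with the comparison reversed (counting random groups with at most as many sites).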

41
Thank you
42
Expression Data Input File
conditions
probes
43
ID Conversion File
44
Gene Sets File
45
Normalization Box plots
46
Standardization of Expression Levels
47
Cluster Analysis Visualization (I)
48
Cluster Analysis - Visualization (II)
49
Positions visualization
