APO-SYS workshop on data analysis and pathway charting

1
APO-SYS workshop on data analysis and pathway
charting
  • Igor Ulitsky
  • Ron Shamir's Computational Genomics Group

2
Part I: Presentations
  • EXPANDER
  • AMADEUS
  • SPIKE
  • MATISSE

3
Part II: Hands-on Session
  • EXPANDER
  • MATISSE
  • SPIKE

4
EXPression ANalyzer and DisplayER
  • Adi Maron-Katz
  • Chaim Linhart
  • Amos Tanay
  • Rani Elkon
  • Israel Steinfeld

  • Seagull Shavit
  • Igor Ulitsky
  • Roded Sharan
  • Yossi Shiloh
  • Ron Shamir

http://acgt.cs.tau.ac.il/expander
5
EXPANDER
  • Low level analysis
  • Missing data estimation (KNN or manual)
  • Normalization: quantile, loess
  • Filtering: fold change, variation, t-test
  • Standardization: mean 0 and std 1, take log, fixed
    norm
  • High level gene partition analysis
  • Clustering
  • Biclustering
  • Ascribing biological meaning to patterns
  • Enriched functional categories (Gene Ontology)
  • Identify transcriptional regulators: promoter
    analysis
  • Built-in support for 9 organisms:
  • human, mouse, rat, chicken, zebrafish, fly, worm,
    Arabidopsis, yeast

6
Input data
Normalization/ Filtering
Links to public annotation databases
Visualization utilities
Clustering (CLICK, SOM, K-means, Hierarchical)
Biclustering (SAMBA)
Functional enrichment (TANGO)
Promoter signals (PRIMA)
7
EXPANDER - Preprocessing
  • Input data
  • Expression matrix (probes in rows, conditions in
    columns)
  • One-channel data (e.g., Affymetrix)
  • Dual-channel data (cDNA microarrays; data are
    (log) ratios between the Red and Green channels)
  • .cel files
  • ID conversion file: maps probes to genes
  • Gene sets data
  • Data definitions
  • Defining condition subsets
  • Data type and scale (log)
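
The dual-channel bullet above can be made concrete with a minimal sketch. It assumes the standard definitions M = log2(R/G) and A = the mean of log2(R) and log2(G); the probe names and intensities are hypothetical, and nothing here describes EXPANDER's own file parsing.

```python
import math

def log_ratio(red, green):
    """M value: log2 ratio of the Red to the Green channel intensity."""
    return math.log2(red / green)

def mean_log_intensity(red, green):
    """A value: average log2 intensity of the two channels."""
    return 0.5 * (math.log2(red) + math.log2(green))

# Hypothetical (red, green) intensities for three probes.
spots = {
    "probe_1": (800.0, 200.0),
    "probe_2": (150.0, 600.0),
    "probe_3": (400.0, 400.0),
}
for probe, (red, green) in spots.items():
    print(probe, log_ratio(red, green), mean_log_intensity(red, green))
```

A probe with equal intensity in both channels gets M = 0, which is why log ratios are a convenient scale for dual-channel data.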

8
EXPANDER - Preprocessing (II)
  • Data adjustments
  • Missing value estimation (KNN or arbitrary)
  • Merging conditions
  • Normalization: removal of systematic biases from
    the analyzed chips
  • Implemented methods: quantile, lowess
  • Visualization: box plots, scatter plots (simple,
    M vs. A)
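
As an illustration of the quantile method named above, here is a minimal sketch of textbook quantile normalization: every chip is forced onto the same reference distribution, built by averaging the values of equal rank across chips. The function name and the tie handling (ties broken by probe order) are assumptions made for this sketch, not EXPANDER's implementation.

```python
def quantile_normalize(columns):
    """Give every chip (column) the same empirical distribution.

    columns: list of equal-length lists, one list of probe values per chip.
    Ties are broken by probe order, which is a simplification.
    """
    n_probes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over all chips.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n_probes)]
    normalized = []
    for col in columns:
        # Probe indices ordered from the lowest to the highest value.
        order = sorted(range(n_probes), key=lambda i: col[i])
        new_col = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            new_col[probe_index] = reference[rank]
        normalized.append(new_col)
    return normalized

# Two hypothetical chips measuring the same three probes.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(chips))
```

After normalization, each chip contains exactly the same set of values; only the assignment of values to probes differs, so box plots of the chips line up.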

9
EXPANDER - Preprocessing (III)
  • Filtering: focus downstream analysis on the set
    of responding genes
  • Fold change
  • Variation
  • Statistical tests (t-test)
  • Standardization: create a common scale
  • For each probe: mean = 0, STD = 1
  • Log data (base 2)
  • Fixed norm (divide by the norm of the probe vector)
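
The two standardization options listed above can be sketched as follows. This sketch assumes the population standard deviation and the Euclidean norm, since the slide specifies neither; the function names are illustrative only.

```python
import math

def standardize(profile):
    """Shift and scale one probe's profile to mean 0 and standard deviation 1."""
    n = len(profile)
    mean = sum(profile) / n
    # Population standard deviation (an assumption; a sample std would also work).
    std = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    # A constant profile has std 0 and must be filtered out before this step.
    return [(x - mean) / std for x in profile]

def fixed_norm(profile):
    """Divide one probe's profile by its Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in profile))
    return [x / norm for x in profile]

# Hypothetical log2 expression profile of one probe over four conditions.
profile = [7.0, 8.0, 10.0, 11.0]
print(standardize(profile))
print(fixed_norm(profile))
```

Both transforms put probes with very different absolute levels on a common scale, so that clustering compares the shape of the expression pattern and not its magnitude.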

11
Cluster Analysis
  • Partition the responding genes into distinct
    sets, each with a particular expression pattern
  • Identify major patterns in the data: reduce the
    dimensionality of the problem
  • co-expression → co-function
  • co-expression → co-regulation
  • Partition the genes to achieve:
  • Homogeneity: genes inside a cluster show highly
    similar expression patterns.
  • Separation: genes from different clusters have
    different expression patterns.
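
One possible way to quantify the two goals above is sketched here. It assumes Pearson correlation as the similarity measure, scores homogeneity as the average correlation between each gene and its cluster's mean pattern, and scores separation as the average correlation between mean patterns of different clusters. These definitions are assumptions for illustration; the slides do not state which measures EXPANDER reports.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def centroid(profiles):
    """Mean expression pattern of a cluster."""
    return [sum(values) / len(profiles) for values in zip(*profiles)]

def homogeneity(clusters):
    """Average correlation between each gene and its cluster's mean pattern."""
    scores = [pearson(p, centroid(c)) for c in clusters for p in c]
    return sum(scores) / len(scores)

def separation(clusters):
    """Average correlation between mean patterns of different clusters.

    Lower values mean the clusters are better separated.
    """
    cents = [centroid(c) for c in clusters]
    pairs = [(i, j) for i in range(len(cents)) for j in range(i + 1, len(cents))]
    return sum(pearson(cents[i], cents[j]) for i, j in pairs) / len(pairs)

# Hypothetical clusters: one induced pattern, one repressed pattern.
up = [[1.0, 2.0, 3.0], [1.0, 2.5, 3.5]]
down = [[3.0, 2.0, 1.0], [3.5, 2.0, 0.5]]
print(homogeneity([up, down]), separation([up, down]))
```

A good partition scores high on the first number and low on the second.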

12
Cluster Analysis (II)
  • Implemented algorithms
  • CLICK, K-means, SOM, Hierarchical
  • Visualization
  • Mean expression patterns
  • Heat-maps

13
Example study: responses to ionizing radiation
Ionizing Radiation
Double Strand Breaks
14
Example study: experimental design
  • Genotypes: Atm-/- and control (wild-type) mice
  • Tissue: lymph node
  • Treatment: ionizing radiation
  • Time points: 0, 30 min, 120 min
  • Microarrays: Affymetrix U74Av2 (12k probesets)

15
Test case - Data Analysis
  • Dataset: six conditions (2 genotypes, 3 time
    points)
  • Normalization
  • Filtering step: define the set of responding
    genes
  • Genes whose expression level changed by at
    least 1.75-fold
  • Over 700 genes met this criterion
  • The set contains genes with various response
    patterns; we applied CLICK to this set
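
The filtering step above can be sketched as a simple fold-change filter. The slide does not say how the fold change was measured, so this sketch assumes the ratio between a gene's highest and lowest level across the six conditions, on linear-scale data; the gene names and values are hypothetical.

```python
def responding_genes(expression, min_fold=1.75):
    """Keep genes whose highest level is at least min_fold times their lowest.

    expression: dict mapping gene name -> list of positive, linear-scale
    expression levels, one per condition.
    """
    return {gene: levels for gene, levels in expression.items()
            if max(levels) / min(levels) >= min_fold}

# Hypothetical levels over six conditions (2 genotypes x 3 time points).
data = {
    "gene_a": [100.0, 210.0, 180.0, 95.0, 100.0, 105.0],
    "gene_b": [300.0, 310.0, 290.0, 305.0, 295.0, 300.0],
    "gene_c": [400.0, 390.0, 200.0, 410.0, 405.0, 395.0],
}
print(sorted(responding_genes(data)))
```

Here gene_b stays nearly flat and is dropped, while gene_a (induced) and gene_c (repressed) pass the filter and go on to clustering.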

16
Major Gene Clusters: Irradiated Lymph Node
Atm-dependent early responding genes
17
Major Gene Clusters: Irradiated Lymph Node
Atm-dependent 2nd wave of responding genes
19
Ascribe Functional Meaning to the Clusters
  • Gene Ontology (GO) annotations for human, mouse,
    rat, chicken, fly, worm, Arabidopsis, zebrafish
    and yeast.
  • TANGO: applies statistical tests that seek
    over-represented GO functional categories in the
    clusters.
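
A common way to test a cluster for over-representation of a category is the hypergeometric tail probability, sketched below. The slide does not name the test TANGO applies, so treat this as a generic illustration; the function name and the example counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_category, n_cluster, n_overlap):
    """P(X >= n_overlap) for a cluster drawn at random from the background.

    n_background: number of genes in the background set
    n_category:   background genes annotated with the category
    n_cluster:    genes in the cluster
    n_overlap:    cluster genes annotated with the category
    """
    total = comb(n_background, n_cluster)
    upper = min(n_category, n_cluster)
    tail = sum(comb(n_category, k) * comb(n_background - n_category, n_cluster - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Hypothetical counts: 12000 background genes, 200 in the category,
# a cluster of 100 genes of which 10 carry the annotation.
print(hypergeom_enrichment_p(12000, 200, 100, 10))
```

A small p-value means that this many annotated genes would rarely land in a randomly chosen gene set of the same size.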

20
Enriched GO Functional Categories
  • Hierarchical structure → highly dependent
    categories.
  • Problems:
  • High redundancy
  • Multiple testing corrections assume independent
    tests
  • TANGO
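
One generic way to sidestep the independence assumption is a resampling-based correction: compare the cluster's best p-value with the best p-values obtained for random gene sets of the same size. The sketch below is a minimal illustration of that idea under assumed names and toy data; it is not a description of how TANGO is implemented.

```python
import random
from math import comb

def tail_p(n_bg, n_cat, n_set, n_hit):
    """Hypergeometric P(X >= n_hit)."""
    hits = range(n_hit, min(n_cat, n_set) + 1)
    return sum(comb(n_cat, k) * comb(n_bg - n_cat, n_set - k)
               for k in hits) / comb(n_bg, n_set)

def best_p(gene_set, background, categories):
    """Smallest enrichment p-value of gene_set over all categories."""
    return min(tail_p(len(background), len(cat), len(gene_set), len(cat & gene_set))
               for cat in categories.values())

def empirical_p(cluster, background, categories, n_random=200, seed=0):
    """Fraction of random gene sets whose best p-value matches the cluster's."""
    rng = random.Random(seed)
    observed = best_p(cluster, background, categories)
    as_good = sum(
        best_p(set(rng.sample(background, len(cluster))), background, categories)
        <= observed
        for _ in range(n_random))
    return (as_good + 1) / (n_random + 1)

# Hypothetical data: nested categories mimic the dependence in the GO hierarchy.
background = [f"g{i}" for i in range(40)]
categories = {"parent": set(background[:12]), "child": set(background[:8])}
cluster = set(background[:8])
print(empirical_p(cluster, background, categories))
```

Because the random sets are scored against the same dependent categories, the correction accounts for the redundancy without assuming independent tests.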

21
Functional Enrichment - Visualization
22
Functional Categories
Cell cycle control (p < 1x10^-6)
23
Functional Categories
Cell cycle control (p < 5x10^-6), Apoptosis (p = 0.001)
25
Clues are in the promoters
Identify Transcriptional Regulators
[Figure: hidden layer with ATM and transcription factors (p53, TF-A, TF-B, TF-C, NEW) joined by unknown links (?); observed layer with genes g1-g13]
26
Reverse engineering of transcriptional networks
  • Infers regulatory mechanisms from gene expression
    data
  • Assumption:
  • co-expression → transcriptional co-regulation →
    common cis-regulatory promoter elements
  • Step 1: Identification of co-expressed genes
    using microarray technology (clustering algorithms)
  • Step 2: Computational identification of
    cis-regulatory elements that are over-represented
    in promoters of the co-expressed genes

27
PRIMA - general description
  • Input:
  • Target set (e.g., co-expressed genes)
  • Background set (e.g., all genes on the chip)
  • Analysis:
  • Identify transcription factors whose binding site
    signatures are enriched in the Target set with
    respect to the Background set.
  • TF binding site models: TRANSFAC DB
  • Default: from -1000 bp to 200 bp relative to the TSS
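
The target-versus-background comparison above can be illustrated with an enrichment factor. The sketch assumes the factor is the fraction of target promoters containing a binding-site hit divided by the same fraction in the background; the gene names and hit sets are hypothetical, and finding the hits themselves (scanning promoters with binding site models) is outside this sketch.

```python
def enrichment_factor(target, background, genes_with_hit):
    """Ratio of the hit frequency in the target set to that in the background."""
    target_rate = len(target & genes_with_hit) / len(target)
    background_rate = len(background & genes_with_hit) / len(background)
    return target_rate / background_rate

# Hypothetical sets: all genes on the chip, one co-expressed cluster,
# and the genes whose promoter contains a hit for one transcription factor.
background = {f"g{i}" for i in range(1, 101)}
target = {f"g{i}" for i in range(1, 11)}
genes_with_hit = {f"g{i}" for i in range(1, 6)} | {f"g{i}" for i in range(50, 65)}
print(enrichment_factor(target, background, genes_with_hit))
```

A factor above 1 means the binding site is more frequent in the target promoters than on the chip as a whole; a statistical test is still needed to decide whether the excess is significant.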

28
Promoter Analysis - Visualization
29
PRIMA - Results
30
PRIMA - Results
Transcription factor   Enrichment factor   P-value
CREB                   2.6                 6.0x10^-5

Transcription factor   Enrichment factor   P-value
NF-κB                  5.1                 3.8x10^-8
p53                    4.2                 9.6x10^-7
STAT-1                 3.2                 5.4x10^-6
Sp-1                   1.7                 6.5x10^-4
32
Biclustering
  • Clustering becomes too restrictive on large
    datasets
  • It seeks a global partition of genes according to
    similarity in their expression across ALL
    conditions
  • Relevant knowledge can be revealed by identifying
    genes with a common pattern across a subset of the
    conditions
  • Biclustering: algorithmic approach

33
Biclustering: SAMBA (Statistical-Algorithmic
Method for Bicluster Analysis)
A. Tanay, R. Sharan, R. Shamir, RECOMB 02
  • Bicluster (module): subset of genes with
    similar behavior in a subset of conditions
  • Computationally challenging: has to consider
    many combinations of sub-conditions
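
To see why the search is computationally challenging, consider a naive exhaustive approach, sketched below under stated assumptions: the expression matrix has been reduced to a hypothetical binary view (each gene either responds or does not respond in a condition), and every subset of conditions is tried in turn. This is not SAMBA; it only illustrates the problem that a dedicated algorithm has to avoid.

```python
from itertools import combinations

def brute_force_biclusters(responds, min_genes=2, min_conditions=2):
    """Try every condition subset and collect genes responding in all of it.

    responds: dict gene -> set of conditions in which the gene responds
    (a hypothetical binarized view of the expression matrix).
    """
    conditions = sorted(set().union(*responds.values()))
    found = []
    for size in range(min_conditions, len(conditions) + 1):
        for subset in combinations(conditions, size):
            genes = sorted(g for g, conds in responds.items()
                           if set(subset) <= conds)
            if len(genes) >= min_genes:
                found.append((genes, list(subset)))
    return found

responds = {
    "g1": {"c1", "c2"},
    "g2": {"c1", "c2", "c4"},
    "g3": {"c3", "c4"},
    "g4": {"c3", "c4"},
}
for genes, conds in brute_force_biclusters(responds):
    print(genes, conds)
```

With m conditions there are on the order of 2^m subsets to try, so this approach is hopeless beyond a few dozen conditions.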

34
Biclustering - Visualization
35
Expression Data Input File (rows: probes, columns: conditions)
36
ID Conversion File
37
Normalization - Box plots
38
Standardization of Expression Levels
39
Cluster Analysis - Visualization (I)
40
Cluster Analysis - Visualization (II)